On Input Design for System Identification
Input Design Using Markov Chains

CHIARA BRIGHENTI

Master's Degree Project
Stockholm, Sweden, March 2009
XR-EE-RT 2009:002
Abstract

When system identification methods are used to construct mathematical models of real systems, it is important to collect data that reveal useful information about the system's dynamics. Experimental data are always corrupted by noise, and this causes uncertainty in the model estimate. Therefore, the design of input signals that guarantee a certain model accuracy is an important issue in system identification.

This thesis studies input design problems for system identification where time domain constraints have to be considered. A finite Markov chain is used to model the input of the system. This makes it possible to include input amplitude constraints directly in the input model, by properly choosing the state space of the Markov chain. The state space is defined so that the model generates a binary signal. The probability distribution of the Markov chain is shaped in order to minimize an objective function defined in the input design problem.

Two identification issues are considered in this thesis: parameter estimation and NMP zero estimation of linear systems. Stochastic approximation is needed to minimize the objective function in the parameter estimation problem, while an adaptive algorithm is used to consistently estimate NMP zeros.

One of the main advantages of this approach is that the input signal can easily be generated by drawing samples from the designed optimal distribution. No spectral factorization techniques or realization algorithms are required to generate the input signal.

Numerical examples show how these models can improve system identification with respect to other input realization techniques.
Acknowledgements

Working on my thesis at the Automatic Control department at KTH has been a great experience.

I would like to thank my supervisor Professor Bo Wahlberg and my advisor Dr. Cristian R. Rojas for giving me the opportunity to work on very interesting research topics. Thank you for your guidance. Special thanks go to Cristian R. Rojas, for the time he devoted to me and for the many ideas he shared with me.

Thanks to all the people I had the pleasure to meet during this work period: Mitra, Andre B., Fotis, Pedro, Mohammad, Andre, Pierluigi, Matteo, Davide and Alessandro. I really enjoyed your company.

I would like to thank Professor Giorgio Picci, who made this experience possible.

Finally, thanks to my family and my friends for their continuous help and support.
Acronyms
AR Autoregressive
FDSA Finite Difference Stochastic Approximation
FIR Finite Impulse Response
LMI Linear Matrix Inequality
LTI Linear Time Invariant
NMP Non Minimum Phase
PEM Prediction Error Method
PRBS Pseudo Random Binary Signal
RLS Recursive Least Squares
SISO Single Input Single Output
SPSA Simultaneous Perturbation Stochastic Approximation
WN White Noise
Contents

Abstract

1 Introduction
  1.1 Thesis outline and contributions

2 System Identification
  2.1 System and model description
  2.2 Identification method
  2.3 Estimate uncertainty
      2.3.1 Parameter uncertainty
      2.3.2 Frequency response uncertainty
  2.4 Conclusions

3 Input Design for System Identification
  3.1 Optimal input design problem
  3.2 Measures of estimate accuracy
      3.2.1 Quality constraint based on the parameter covariance
      3.2.2 Quality constraint based on the model variance
      3.2.3 Quality constraint based on the confidence region
  3.3 Input spectra parametrization
      3.3.1 Finite dimensional spectrum parametrization
      3.3.2 Partial correlation parametrization
  3.4 Covariance matrix parametrization
  3.5 Signal constraints parametrization
  3.6 Limitations of input design in the frequency domain
  3.7 Conclusions

4 Markov Chain Input Model
  4.1 Introduction
  4.2 Markov chains model
      4.2.1 Markov chain state space
      4.2.2 Markov chains spectra
  4.3 More general Markov chains
  4.4 Conclusions

5 Estimation Using Markov Chains
  5.1 Problem formulation
  5.2 Solution approach
  5.3 Cost function evaluation
  5.4 Algorithm description
  5.5 Numerical example
  5.6 Conclusions

6 Zero Estimation
  6.1 Problem formulation
  6.2 FIR
  6.3 ARX
  6.4 General linear SISO systems
  6.5 Adaptive algorithm for time domain input design
      6.5.1 A motivation
      6.5.2 Algorithm description
      6.5.3 Algorithm modification for Markov chain signals generation
      6.5.4 Numerical example
  6.6 Conclusions

7 Summary and future work
  7.1 Summary
  7.2 Future work

References
List of Figures

4.1 Graph representation of the two-state Markov chain S2.
4.2 Graph representation of the four-state Markov chain S4.
4.3 Graph representation of the three-state Markov chain S3.
5.1 Mass-spring-damper system.
5.2 Cost functions f2 and f3 in the interval of acceptable values of the variables umax and ymax, respectively.
5.3 Cost functions f1, f4, f5 in the interval of acceptable values.
5.4 Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 2.
5.5 Estimation of the best transition probability for the two-state Markov chain in case 2.
5.6 Estimation of the best transition probability p for the four-state Markov chain in case 2.
5.7 Estimation of the best transition probability r for the four-state Markov chain in case 2.
5.8 Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 1.
5.9 Estimate of the cost function on a discrete set of points for the four-state Markov chain in case 1.
5.10 Bode diagrams of the optimal spectra of the two-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.11 Bode diagrams of the optimal spectra of the four-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.12 Estimates of the frequency response of the system using the optimal two-state Markov chains for cases 1, 2 and 3 in Table 5.1.
5.13 Estimates of the frequency response of the system using the optimal four-state Markov chains for cases 1, 2 and 3 in Table 5.1.
6.1 Representation of the adaptive algorithm iteration at step k. uk denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction ŷ(k).
6.2 A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.
6.3 Normalized variance of the estimation error for the adaptive algorithms described in Sections 6.5.2 and 6.5.3.
List of Tables

4.1 Poles and zeros of the canonical spectral factor of the spectrum of s_m of Example 4.2.4.
5.1 Maximum threshold values in the three analyzed cases.
5.2 Results of 100 Monte Carlo simulations of the algorithm with the two-state Markov chain.
5.3 Optimal values of the transition probabilities in cases 1, 2 and 3, obtained after 30000 algorithm iterations.
5.4 Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.
5.5 Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function, and the optimal spectrum in case 2.
5.6 Total cost function value obtained with the optimal Markov inputs, a PRBS and white noise in case 3.
5.7 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two-state Markov chains.
5.8 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four-state Markov chains.
Chapter 1
Introduction
Mathematical models of systems are necessary in order to predict their behavior and to design their control systems. This work focuses on models constructed and validated from experimental input/output data by means of identification methods.
The information obtained through experiments on the real system depends on the input excitation, which is often limited by amplitude or power constraints. For this reason, experiment design is necessary in order to obtain system estimates within a given accuracy while saving experiment time and cost [1]. Robustness of input design for system identification is also one of the most important issues, especially when the model of the system is used for designing its control system. In [2]-[7] some studies on this problem are presented. The effects of undermodeling on input design are pointed out in [8] and [9].
Depending on the cost function considered in this setting, input design can typically be solved as a constrained optimization problem. In the Prediction Error Method (PEM) framework it is common to use, as a measure of the estimate accuracy, a function of the asymptotic covariance matrix of the parameter estimate. This matrix depends on the input spectrum, which can then be shaped in order to obtain a "small" covariance matrix and improve the estimate accuracy (see [10], [11]). Usually, a constraint on the input power is also included; in this way, time domain amplitude constraints are approximately translated into the frequency domain [12].
A first disadvantage of these methods is that they are strongly influenced by the initial knowledge of the system. Secondly, solving the problem in the frequency domain does not provide any further information on how to generate the input signal in the time domain: the input can be represented as filtered white noise, but many probability distributions can be used to generate white noise. Furthermore, in practical applications time domain constraints on the signals have to be considered, and the power constraint that is usually set in the frequency domain does not guarantee that these constraints are respected. For this reason, in [13] a method is proposed to generate a binary input with a prescribed correlation function; once an optimal spectrum or correlation function is found by solving the input design problem in the frequency domain, it is possible to generate a binary signal which approximates the optimal input. Also, in [14] a method is proposed that provides a locally optimal binary input in the time domain.
This thesis studies the input design problem in the probability domain. Compared to design methods in the frequency domain, a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by drawing samples from a given distribution. Inputs are modeled by finite stationary Markov chains which generate binary signals. Binary signals are often used in system identification, one reason being that they achieve the largest power in the set of all signals having the same maximum amplitude, which is well known to improve parameter estimation for linear models.
The idea of modeling the input by a finite Markov chain derives from the possibility of including input amplitude constraints directly in the input model, by suitably choosing the state space of the Markov chain. Furthermore, unlike design in the frequency domain, this approach keeps more degrees of freedom in the choice of the optimal spectrum, which in general is non-unique [12].
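As a minimal illustration (not taken from the thesis) of how such an input is generated, consider a two-state Markov chain on {−1, +1} that switches state with probability p at each step; sampling a trajectory directly yields a binary input that satisfies the amplitude constraint by construction:

```python
import numpy as np

def markov_binary_input(n_samples, p, seed=None):
    """Sample a binary +/-1 input from a two-state Markov chain.

    p is the probability of switching state at each step: p = 0.5 gives a
    white binary signal, p < 0.5 a slowly switching (low-frequency) input,
    p > 0.5 a rapidly switching (high-frequency) one.  The amplitude
    constraint |u(t)| <= 1 holds by construction.
    """
    rng = np.random.default_rng(seed)
    u = np.empty(n_samples)
    u[0] = rng.choice([-1.0, 1.0])
    for t in range(1, n_samples):
        u[t] = -u[t - 1] if rng.random() < p else u[t - 1]
    return u

u = markov_binary_input(50_000, p=0.1, seed=0)
# For this symmetric chain the lag-1 correlation is E[u(t) u(t+1)] = 1 - 2p.
```

Here `p` plays the role of the one-step transition probability: shaping such transition probabilities is exactly the design variable used in later chapters.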
Two identification problems are considered here: parameter estimation
and non-minimum phase zeros estimation for LTI systems.
For the first problem, the optimal distribution is found by minimizing the cost function defined in the input design problem with respect to the one-step transition probabilities of the Markov chain. In this analysis, a stochastic approximation algorithm is used, since a closed-form solution to the optimization problem is not available and the cost is a stochastic function of these transition probabilities, contaminated with noise (see [15], [16] for details).

For the second problem, it will be shown that the Markov chain input model has exactly the optimal spectrum for the zero estimation problem of a FIR or ARX model [6]. In general, the spectrum of a two-state Markov chain can be made equal to the spectrum of the AR process which guarantees a consistent estimate of the NMP zero of a linear SISO system [17]. Therefore, an optimal or consistent input can be generated in the time domain by Markov chain distributions. The adaptive algorithm introduced in [18] for input design in the time domain under undermodeling is modified here in order to generate Markov chain signals having the same spectrum as the general inputs designed in the original version of the algorithm. The advantage is that a binary signal is then used to identify the non-minimum phase zero, while keeping the same input variance and spectrum.
The outline of the thesis is presented in the next section.
1.1 Thesis outline and contributions

The subject of this thesis is input design for system identification. In particular, the objective of this study is to analyze a new approach to input design: working in the probability domain and modeling the input signal as a finite Markov chain. The first chapters summarize some known results on system identification in the PEM framework and on input design in the frequency domain, in order to compare the method proposed here with the classical input design methods in the frequency domain.
In Chapter 2 PEM and its asymptotic properties are reviewed.
In Chapter 3 the most commonly adopted input design methods are
described. These formulate input design problems as convex optimization
problems. The solution is given as an optimal input spectrum.
In Chapter 4 general Markov chains that model binary input signals are
defined. Some spectral properties are also described.
In Chapter 5, the input design problem for parameter estimation is presented, together with the solution approach based on the Markov chain input model.
In Chapter 6, input design for identification of NMP zeros of an LTI system is considered. The chapter discusses classical solutions in the frequency domain and adaptive solutions in the time domain under undermodeling conditions, where the input is modeled as an AR or a Markov chain process.
Chapter 7 concludes the thesis.
Chapter 2
System Identification
This chapter introduces system identification in the PEM framework for
parameter estimation of LTI models. Once a model structure is defined,
this method finds the model’s parameters that minimize the prediction error.
Even when the model structure is able to capture the true system dynamics, the estimation error will not be zero, since the data used in the identification procedure are finite and corrupted by noise. The purpose of input design is to minimize this error by minimizing the variance of the parameter estimates, assuming the estimation method is consistent. Section 2.1 defines the systems and model structures considered in this work, while Section 2.2 discusses PEM and its asymptotic properties.
2.1 System and model description

This thesis considers discrete-time LTI SISO systems, lying in the set $\mathcal{M}$ of parametric models

$$y(t) = G(q, \theta)\, u(t) + H(q, \theta)\, e(t), \qquad (2.1)$$

$$G(q, \theta) = q^{-n_k} \frac{b_1 + b_2 q^{-1} + \cdots + b_{n_b} q^{-n_b}}{1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}}, \qquad H(q, \theta) = \frac{1 + c_1 q^{-1} + \cdots + c_{n_c} q^{-n_c}}{1 + d_1 q^{-1} + \cdots + d_{n_d} q^{-n_d}},$$

$$\theta = [b_1, \ldots, b_{n_b}, a_1, \ldots, a_{n_a}, c_1, \ldots, c_{n_c}, d_1, \ldots, d_{n_d}]^T \in \mathbb{R}^{n \times 1},$$

with $n = n_b + n_a + n_c + n_d$, where $u(t)$ is the input, $y(t)$ is the output and $e(t)$ is zero-mean white noise with finite variance. The symbol $q^{-1}$ represents the delay operator ($q^{-1} u(t) = u(t-1)$). Assume $H(q, \theta)$ is stable, monic and minimum phase, i.e. its poles and zeros lie inside the unit circle.
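As an illustration (the system and all numerical values are assumptions for the example, not the thesis' system), a model of the form (2.1) with an ARX structure, $G(q,\theta) = q^{-1}(b_1 + b_2 q^{-1})/A(q)$ and $H(q,\theta) = 1/A(q)$ with $A(q) = 1 + a_1 q^{-1} + a_2 q^{-2}$, can be simulated recursively:

```python
import numpy as np

def simulate_arx(u, e, a, b):
    """Simulate y(t) = -a1*y(t-1) - a2*y(t-2) + b1*u(t-1) + b2*u(t-2) + e(t),
    i.e. an ARX instance of (2.1) with G = q^-1 (b1 + b2 q^-1)/A, H = 1/A."""
    y = np.zeros(len(u))
    for t in range(len(u)):
        y[t] = e[t]
        if t >= 1:
            y[t] += -a[0] * y[t - 1] + b[0] * u[t - 1]
        if t >= 2:
            y[t] += -a[1] * y[t - 2] + b[1] * u[t - 2]
    return y

rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=500)   # a binary input
e = 0.1 * rng.standard_normal(500)      # zero-mean white noise
y = simulate_arx(u, e, a=[-1.5, 0.7], b=[1.0, 0.5])  # A stable: poles inside unit circle
```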
The real system $\mathcal{S}$ is given as

$$y(t) = G_0(q)\, u(t) + H_0(q)\, e_0(t), \qquad (2.2)$$

where $e_0(t)$ has finite variance $\lambda_0$. Assume there exists a parameter vector $\theta_0$ such that $G(q, \theta_0) = G_0(q)$ and $H(q, \theta_0) = H_0(q)$, i.e. assume there is no undermodeling:

$$\mathcal{S} \in \mathcal{M}. \qquad (2.3)$$

This condition is hardly ever satisfied in practice, since real systems are often of high order or nonlinear. Nevertheless, as will be explained in the next section, this condition is crucial for the consistency and the asymptotic properties of PEM. In Chapter 5, regarding the parameter estimation problem, (2.3) will be assumed to hold, while in Chapter 6, where the zero estimation problem is analyzed, this condition will not necessarily be imposed.
2.2 Identification method

System identification aims at describing a real system through a mathematical model constructed and validated from experimental input-output data. The identification method considered here is PEM [19]. This method minimizes a criterion function of the prediction error $\varepsilon_F(t, \theta)$,

$$V_N(\theta, Z^N) = \frac{1}{2N} \sum_{t=1}^{N} \varepsilon_F^2(t, \theta), \qquad (2.4)$$

where $Z^N$ is a vector containing the collected input-output data, i.e. $Z^N = [y(1), u(1), y(2), u(2), \ldots, y(N), u(N)]$.

The prediction error is defined as $\varepsilon_F(t, \theta) = y(t) - \hat{y}(t, \theta)$, where the one-step-ahead predictor is given by

$$\hat{y}(t, \theta) = H^{-1}(q, \theta)\, G(q, \theta)\, u(t) + \left[1 - H^{-1}(q, \theta)\right] y(t).$$

Suppose all the hypotheses for the consistency of PEM are satisfied; in that case, the parameter estimate $\hat{\theta}_N$ converges to the true parameter vector $\theta_0$ as $N$ tends to infinity. Briefly, these conditions are:

1. Condition (2.3) holds, i.e. there is no undermodeling.

2. The signals $y(t)$ and $u(t)$ are jointly quasi-stationary.

3. $u(t)$ is persistently exciting of sufficiently large order.
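For ARX model structures the predictor $\hat{y}(t, \theta)$ is linear in $\theta$, so minimizing (2.4) reduces to linear least squares. A minimal sketch (illustrative first-order system and values, not from the thesis) showing the consistency of the estimate for a persistently exciting binary input:

```python
import numpy as np

rng = np.random.default_rng(1)
a0, b0 = -0.8, 1.0                     # "true" parameters: A(q) = 1 + a0 q^-1
N = 20_000
u = rng.choice([-1.0, 1.0], size=N)    # persistently exciting binary input
e = 0.5 * rng.standard_normal(N)       # white noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + e[t]

# One-step-ahead predictor: yhat(t) = -a*y(t-1) + b*u(t-1) = phi(t)^T theta,
# so the minimizer of (2.4) is the least-squares solution.
phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
a_hat, b_hat = theta_hat   # converges to (a0, b0) as N grows
```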
2.3 Estimate uncertainty

Measuring the quality of the model estimate is an important issue in system identification. The measure is chosen depending on the application for which the model is required. One possibility is to measure the estimate uncertainty through a function of the covariance matrix of the parameter estimate. In other cases, such as in control applications, it may be better to use the variance of the frequency response estimate in the frequency domain. These two cases are now presented in more detail.
2.3.1 Parameter uncertainty

Under the assumptions for the consistency of PEM, it holds that

$$\sqrt{N}\left(\hat{\theta}_N - \theta_0\right) \to \mathcal{N}(0, P_{\theta_0}) \quad \text{as } N \to \infty, \qquad (2.5)$$

$$P_{\theta_0}^{-1} = \frac{1}{\lambda_0}\, E\left[\psi(t, \theta_0)\, \psi^T(t, \theta_0)\right], \qquad \psi(t, \theta_0) = \left.\frac{\partial \hat{y}(t, \theta)}{\partial \theta}\right|_{\theta_0},$$

where $\mathcal{N}$ denotes the Normal distribution [19]. Therefore, when the model class is sufficiently flexible to describe the real system, the parameter estimate will converge to the true parameter vector as the number of data points $N$ used in the estimation goes to infinity, with covariance decaying as $1/N$.

From (2.5) it follows that a confidence region in which the parameter estimate will lie with probability $\alpha$ is

$$U_\theta = \left\{\theta \;\middle|\; N \left(\theta - \hat{\theta}_N\right)^T P_{\theta_0}^{-1} \left(\theta - \hat{\theta}_N\right) \le \chi_\alpha^2(n)\right\}. \qquad (2.6)$$

The covariance matrix defines an ellipsoid asymptotically centered at $\theta_0$.

Provided that $u$ and $e$ are independent (that is, data are collected in open loop), the asymptotic expression in the number of data points $N$ of the inverse of the covariance matrix of the parameter estimate is

$$P_{\theta_0}^{-1} = \frac{N}{2\pi\lambda_0} \int_{-\pi}^{\pi} F_u(e^{i\omega}, \theta_0)\, \Phi_u(\omega)\, F_u^{\star}(e^{i\omega}, \theta_0)\, d\omega + R_e(\theta_0), \qquad (2.7)$$

$$R_e(\theta_0) = \frac{N}{2\pi} \int_{-\pi}^{\pi} F_e(e^{i\omega}, \theta_0)\, F_e^{\star}(e^{i\omega}, \theta_0)\, d\omega,$$
where

$$F_u(e^{i\omega}, \theta_0) = H^{-1}(e^{i\omega}, \theta_0) \left.\frac{\partial G(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0}, \qquad (2.8)$$

$$F_e(e^{i\omega}, \theta_0) = H^{-1}(e^{i\omega}, \theta_0) \left.\frac{\partial H(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0}, \qquad (2.9)$$

and $\Phi_u(\omega)$ is the power spectral density of the input $u(t)$. Here $\star$ denotes the complex conjugate transpose.

Expression (2.7) shows that the asymptotic covariance matrix of the parameter estimate depends on the input spectrum. Therefore, by shaping $\Phi_u(\omega)$ it is possible to obtain estimates within a given accuracy.

Throughout the thesis it will be assumed that there is no feedback in the system, i.e. $u$ and $e$ are independent.
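The asymptotic covariance can be checked by Monte Carlo simulation. The sketch below (illustrative model and values, not from the thesis) uses an FIR model $y(t) = b_1 u(t-1) + b_2 u(t-2) + e(t)$ with a white $\pm 1$ input, for which $\psi(t) = [u(t-1), u(t-2)]^T$ and $E[\psi\psi^T] = I$, so the theory predicts $N\,\mathrm{Cov}(\hat{\theta}_N) \approx \lambda_0 I$:

```python
import numpy as np

rng = np.random.default_rng(2)
b_true = np.array([1.0, 0.5])
lam0, N, n_mc = 0.25, 400, 2000      # noise variance, data length, MC runs

estimates = np.empty((n_mc, 2))
for i in range(n_mc):
    u = rng.choice([-1.0, 1.0], size=N)
    e = np.sqrt(lam0) * rng.standard_normal(N)
    y = b_true[0] * np.roll(u, 1) + b_true[1] * np.roll(u, 2) + e
    phi = np.column_stack([np.roll(u, 1), np.roll(u, 2)])[2:]  # drop wrap-around
    estimates[i], *_ = np.linalg.lstsq(phi, y[2:], rcond=None)

emp_cov = np.cov(estimates.T) * N    # empirical N * Cov(theta_hat)
theory = lam0 * np.eye(2)            # P_theta0 = lam0 * (E[psi psi^T])^-1 = lam0 * I
```

The empirical and theoretical covariances agree up to Monte Carlo error, illustrating the $1/N$ decay predicted by (2.5).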
2.3.2 Frequency response uncertainty

In many applications, it may be preferable to measure the quality of the model estimate using the variance of the frequency response estimate, frequency by frequency. In [19] it is shown that under condition (2.3), the variance of $G(e^{i\omega}, \hat{\theta}_N)$ can be approximated by

$$\mathrm{Var}\left(G(e^{i\omega}, \hat{\theta}_N)\right) \approx \frac{m}{N} \frac{\Phi_v(\omega)}{\Phi_u(\omega)} \qquad (2.10)$$

for large but finite model order $m$ and number of data $N$, where $v$ is the process defined as $v(t) = H_0(q)\, e_0(t)$.

If the model order is not large enough, the previous expression is not a good approximation. Instead, by Gauss' approximation formula, it is possible to write

$$\mathrm{Var}\left(G(e^{i\omega}, \hat{\theta}_N)\right) \approx \frac{1}{N} \left.\frac{\partial G(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0}^{\star} P_{\theta_0} \left.\frac{\partial G(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0}. \qquad (2.11)$$

Equation (2.11) expresses the frequency response uncertainty in terms of the parameter uncertainty. Therefore, both equations (2.10) and (2.11) show that a proper choice of the input spectrum can reduce the variance of the frequency response estimate. This is the purpose of input design.
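Formula (2.11) is easy to evaluate numerically. For an illustrative FIR(2) model $G(q, \theta) = b_1 q^{-1} + b_2 q^{-2}$ (an assumption for this example, not the thesis' system), $\partial G/\partial\theta = [e^{-i\omega}, e^{-2i\omega}]^T$; with $P_{\theta_0} = \lambda_0 I$, which corresponds to a white $\pm 1$ input, the variance comes out flat over frequency:

```python
import numpy as np

lam0, N = 0.25, 400
P = lam0 * np.eye(2)      # parameter covariance for a white +/-1 input, FIR(2) model

def var_G(omega, P, N):
    """Evaluate (2.11) for G(q, theta) = b1 q^-1 + b2 q^-2,
    where dG/dtheta = [e^{-i w}, e^{-2 i w}]^T."""
    g = np.array([np.exp(-1j * omega), np.exp(-2j * omega)])
    return (g.conj() @ P @ g).real / N

variances = [var_G(w, P, N) for w in np.linspace(0.0, np.pi, 5)]
# For this P the variance equals 2 * lam0 / N at every frequency: a white
# input spreads the accuracy uniformly, while a shaped spectrum would
# redistribute it across frequencies.
```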
2.4 Conclusions

This chapter introduced PEM for system identification and its asymptotic properties, which are often used to solve input design problems.
Models constructed from experimental data are always affected by uncertainty. In Section 2.3, two possible measures of the model uncertainty were discussed: parameter uncertainty and frequency response uncertainty. The choice between the two depends on the application; typically, in control applications a measure in the frequency domain is preferable. This work considers parameter uncertainty.
Chapter 3
Input Design for System Identification
This chapter presents general input design problems for system identification. Typically, input design aims at optimizing some performance function under constraints on the estimate accuracy and on the input signal. The solution approach will be presented in detail, describing how input design problems can be formulated as convex optimization problems.

Ideas and drawbacks of the general input design framework are reviewed in Section 3.1. The most widely used measures of estimate accuracy are presented in Section 3.2, where it is also shown how quality constraints can be written as convex constraints. Sections 3.3 to 3.5 describe some techniques used for parametrizing spectra and signal constraints, needed to obtain finitely parametrized problems.
3.1 Optimal input design problem

In a general formulation, input design problems are constrained optimization problems, where the constraints are typically on the input signal spectrum or power and on the estimate accuracy. In this framework the objective function to be optimized can be any performance criterion, which usually depends on the practical application. For example, input power or experiment time can be minimized.

Common input design problem formulations are:

1. Optimize some measure of the estimate accuracy, under constraints on the input excitation.
2. Optimize some property of the input signal, under constraints on the estimate accuracy.
As will be discussed in the next section, typical measures of the estimate accuracy are functions of the uncertainty in the model estimate, like (2.7), (2.10) and (2.11). As was shown in Section 2.3, these functions depend on the input spectrum, which can therefore be used to optimize the objective function.
A formal expression of the first problem formulation is
min f (Pθ0 )
Φu
subject to
(3.1)
g (Φu ) ≤ α,
that can also be written as
min γ
Φu
subject to
(3.2)
f (Pθ0 ) ≤ γ
g (Φu ) ≤ α.
The next sections will show how this type of constraints can be formulated as convex constraints, upon certain conditions on the functions f and
g.
The following drawbacks of input design problems like (3.2) should be highlighted. First of all, notice that the asymptotic expression of the covariance matrix depends on the true parameter $\theta_0$, which is not known. Secondly, the constraints may be non-convex and infinite dimensional. In that case, a parametrization of the input spectrum is necessary in order to obtain a finitely parametrized optimization problem. Furthermore, once the optimal input spectrum has been found, an input signal having that optimal spectrum has to be generated. This can be done by filtering white noise with an input spectral factor¹. Nevertheless, no information on the probability distribution of the white noise is given by this solution approach.
3.2 Measures of estimate accuracy

In the usual input design framework, three types of quality measures are typically considered. These are described in the following sections.
¹ By spectral factor is meant an analytic function $L(z)$ such that $\Phi_u(z) = L(z)\, L(z^{-1})$.
3.2.1 Quality constraint based on the parameter covariance

In Section 2.3 it was shown that the asymptotic covariance matrix of the parameter estimate is an affine function of the input spectrum, through (2.7). If the purpose of the identification procedure is parameter estimation, then typical scalar measures of estimate accuracy are the trace, the determinant or the maximum eigenvalue of the covariance matrix [12]. The following quality constraints can then be introduced:

$$\mathrm{Tr}\, P_{\theta_0} \le \gamma, \qquad (3.3)$$
$$\det P_{\theta_0} \le \delta, \qquad (3.4)$$
$$\lambda_{\max}(P_{\theta_0}) \le \epsilon. \qquad (3.5)$$

It is possible to prove that these constraints can be manipulated to be convex in $P_{\theta_0}^{-1}$; proofs can be found in [20] and [21] for (3.4) and (3.5), respectively. For example, (3.5) is equivalent to

$$\begin{bmatrix} \epsilon I & I \\ I & P_{\theta_0}^{-1} \end{bmatrix} \ge 0, \qquad (3.6)$$

which is an LMI in $P_{\theta_0}^{-1}$.

The constraint (3.3) is a special case of the more general weighted trace constraint that will be considered in the next subsection.

Notice that all these quality constraints depend on the true parameter vector $\theta_0$. Many solutions have been presented in the literature to handle this problem, as will be discussed later.
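A toy version of the resulting design problem can be solved numerically. The sketch below (illustrative values; a grid search stands in for the SDP solver one would use in practice) minimizes $\lambda_{\max}(P_{\theta_0})$ for a two-parameter FIR model, where by (2.7) $P_{\theta_0}^{-1}$ is proportional to the input correlation matrix, under the power constraint $r_0 \le 1$ and the validity constraint that the correlation matrix be positive semidefinite:

```python
import numpy as np

N, lam0 = 400, 0.25      # illustrative data length and noise variance

def lambda_max_P(r0, r1):
    """lambda_max of P_theta0 for a two-parameter FIR model, where by (2.7)
    P^{-1} = (N / lam0) * [[r0, r1], [r1, r0]] (the input correlation matrix)."""
    P_inv = (N / lam0) * np.array([[r0, r1], [r1, r0]])
    min_eig = np.linalg.eigvalsh(P_inv)[0]
    return 1.0 / min_eig if min_eig > 1e-9 else np.inf   # inf: invalid spectrum

# Feasible set: input power r0 <= 1 and [[r0, r1], [r1, r0]] >= 0
best = min(
    ((lambda_max_P(r0, r1), r0, r1)
     for r0 in np.linspace(0.1, 1.0, 10)
     for r1 in np.linspace(-1.0, 1.0, 21)),
    key=lambda x: x[0],
)
# Minimizer: the white input r0 = 1, r1 = 0, with lambda_max = lam0 / N.
```

Since $\lambda_{\max}(P_{\theta_0}) = \lambda_0 / (N(r_0 - |r_1|))$ here, the optimum is indeed the full-power white input; richer models make the trade-off nontrivial.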
3.2.2 Quality constraint based on the model variance

Consider the quality constraint based on the variance of the frequency response,

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} F(\omega)\, \mathrm{Var}\left(G(e^{i\omega}, \hat{\theta}_N)\right) d\omega \le \gamma, \qquad (3.7)$$

where $F(\omega)$ is a weighting function. By substituting the variance expression (2.11), this quality constraint can be written as

$$\mathrm{Tr}\, W P_{\theta_0} \le \gamma, \qquad (3.8)$$

where

$$W = \frac{1}{2\pi N} \int_{-\pi}^{\pi} \left.\frac{\partial G(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0} F(\omega) \left.\frac{\partial G(e^{i\omega}, \theta)}{\partial \theta}\right|_{\theta_0}^{\star} d\omega. \qquad (3.9)$$

See [12] for details. The following lemma generalizes this result; a proof can be found in [6].
Lemma 3.2.1 The problem

$$\mathrm{Tr}\, W(\omega) P_{\theta_0} \le \gamma \quad \forall\omega, \qquad W(\omega) = V(\omega)\, V^{\star}(\omega) \ge 0 \quad \forall\omega, \qquad P_{\theta_0} \ge 0$$

can be formulated as an LMI of the form

$$\gamma - \mathrm{Tr}\, Z \ge 0, \qquad \begin{bmatrix} Z & V^{\star}(\omega) \\ V(\omega) & P_{\theta_0}^{-1} \end{bmatrix} \ge 0, \quad \forall\omega. \qquad (3.10)$$

Notice that this formulation is convex in the matrices $P_{\theta_0}^{-1}$ and $Z$ (see [12], [6]). Notice also that this type of constraint includes the constraint on the trace of the covariance matrix (3.3) as a special case, when $W$ is the identity matrix.
3.2.3 Quality constraint based on the confidence region

In control applications, it is often preferable to have frequency-by-frequency constraints on the estimation error. Consider the measure [6, 12]

$$\Delta(e^{i\omega}, \theta) = T(e^{i\omega})\, \frac{G_0(e^{i\omega}) - G(e^{i\omega}, \theta)}{G(e^{i\omega}, \theta)}. \qquad (3.11)$$

The input has to be designed so that

$$\left|\Delta(e^{i\omega}, \theta)\right| \le \gamma, \quad \forall\omega, \quad \forall\theta \in U_\theta. \qquad (3.12)$$

This constraint can also be formulated as a convex constraint in $P_{\theta_0}^{-1}$, as proven in [22].
3.3 Input spectra parametrization

As discussed in the last section, the typical measures of estimate accuracy are functions of the covariance matrix $P_{\theta_0}$. Expression (2.7), derived in the asymptotic analysis of PEM, shows that the input spectrum $\Phi_u$ can be used to optimize the estimation performance. The problem of finding an optimal input spectrum has an infinite number of parameters, since $\Phi_u(\omega)$ is a continuous function of the frequency $\omega$. Nevertheless, by a proper spectrum parametrization, it is possible to formulate the problem as a convex and finite dimensional optimization problem, since the parametrization of the input spectrum leads to a parametrization of the inverse of the covariance matrix [6, 12].

A spectrum can always be written in the general form

$$\Phi_u(\omega) = \sum_{k=-\infty}^{\infty} \tilde{c}_k B_k(e^{i\omega}), \qquad (3.13)$$

where $\{B_k(e^{i\omega})\}_{k=-\infty}^{\infty}$ are proper stable rational basis functions that span $L_2$ (the set of functions $f$ with $\int |f(x)|^2\, dx < \infty$). It is always possible to choose basis functions having the hermitian properties $B_{-k} = B_k^{\star}$ and $B_k(e^{-i\omega}) = B_k^{\star}(e^{i\omega})$ [6]. The coefficients $\tilde{c}_k$ satisfy the symmetry property $\tilde{c}_{-k} = \tilde{c}_k$ and must be such that

$$\Phi_u(\omega) \ge 0, \quad \forall\omega, \qquad (3.14)$$

otherwise $\Phi_u$ would not be a spectrum.

For example, the FIR representation of a spectrum is obtained by choosing $B_k(e^{i\omega}) = e^{-i\omega k}$; consequently, $\tilde{c}_k = r_k$, where $r_k$ is the correlation function of the process $u$.

By substituting (3.13) into (2.7), the inverse of the covariance matrix becomes an affine function of the coefficients $\tilde{c}_k$ of the form

$$P_{\theta_0}^{-1} = \sum_{k=-\infty}^{\infty} \tilde{c}_k Q_k + \bar{Q}. \qquad (3.15)$$

This parametrization of the input spectrum still leads to a denumerable but infinite number of parameters in the optimization problem. Two possible spectrum parametrizations that make the problem finitely parametrized are described in the following subsections.
3.3.1
Finite dimensional spectrum parametrization
The finite dimensional spectrum parametrization has the form
$$\Phi_u(\omega) = \Psi\left(e^{i\omega}\right) + \Psi^{\star}\left(e^{i\omega}\right), \qquad \Psi\left(e^{i\omega}\right) = \sum_{k=0}^{M-1} \tilde{c}_k B_k\left(e^{i\omega}\right). \qquad (3.16)$$
This parametrization forces the coefficients $\tilde{c}_M, \tilde{c}_{M+1}, \ldots$ to be zero. Therefore, the condition Φu(ω) ≥ 0 must be assured through the coefficients $\{\tilde{c}_k\}_{k=0}^{M-1}$. The following result, which derives from an application of the Positive Real Lemma [23], can be used to enforce the constraint (3.14).
² $L_2$ denotes the set $\{f \mid \int |f(x)|^2\, dx < \infty\}$.
Lemma 3.3.1 Let {A, B, C, D} be a controllable state-space realization of $\Psi(e^{i\omega})$. Then there exists a matrix $Q = Q^T$ such that
$$\begin{pmatrix} Q - A^T Q A & -A^T Q B \\ -B^T Q A & -B^T Q B \end{pmatrix} + \begin{pmatrix} 0 & C^T \\ C & D + D^T \end{pmatrix} \geq 0 \qquad (3.17)$$
if and only if
$$\Phi_u(\omega) \triangleq \sum_{k=0}^{M-1} \tilde{c}_k \left[ B_k\left(e^{i\omega}\right) + B_k^{\star}\left(e^{i\omega}\right) \right] \geq 0, \quad \forall \omega.$$
The state-space realization of the positive real part of the input spectrum can be easily constructed. For example (Example 3.5 in [6]), an FIR spectrum has positive real part given by
$$\Psi\left(e^{i\omega}\right) = \frac{1}{2} r_0 + \sum_{k=1}^{M-1} r_k e^{-i\omega k}. \qquad (3.18)$$
A controllable state-space realization of $\Psi(e^{i\omega})$ is
$$A = \begin{pmatrix} O_{1 \times (M-2)} & 0 \\ I_{M-2} & O_{(M-2) \times 1} \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}^T,$$
$$C = \begin{pmatrix} r_1 & r_2 & \ldots & r_{M-1} \end{pmatrix}, \qquad D = \frac{1}{2} r_0. \qquad (3.19)$$
Therefore, in this example the constraint (3.14) can be written as an LMI in Q and $r_1, \ldots, r_{M-1}$.
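To make the construction concrete, the realization (3.19) can be verified numerically. The following sketch (with M = 4 and arbitrarily chosen illustrative correlation values, not values from the thesis) evaluates Ψ(e^{iω}) = C(e^{iω}I − A)^{−1}B + D and checks that Ψ + Ψ* reproduces the FIR spectrum r0 + 2Σ rk cos(ωk):

```python
import numpy as np

# Hypothetical correlation coefficients r_0..r_{M-1} (M = 4), for illustration only
M = 4
r = [1.0, 0.5, 0.2, 0.1]

# Controllable realization of the positive real part Psi, as in (3.19)
A = np.zeros((M - 1, M - 1))
A[1:, :-1] = np.eye(M - 2)            # shift structure: identity on the subdiagonal
B = np.zeros((M - 1, 1)); B[0, 0] = 1.0
C = np.array([r[1:]])                 # (r_1 ... r_{M-1})
D = 0.5 * r[0]

def spectrum_from_realization(w):
    """Phi_u(w) = Psi(e^{iw}) + Psi*(e^{iw}) computed from the realization."""
    z = np.exp(1j * w)
    psi = (C @ np.linalg.inv(z * np.eye(M - 1) - A) @ B)[0, 0] + D
    return 2.0 * psi.real

def spectrum_fir(w):
    """Direct FIR spectrum r_0 + 2 * sum_k r_k cos(w k)."""
    return r[0] + 2.0 * sum(r[k] * np.cos(w * k) for k in range(1, M))

for w in np.linspace(0.0, np.pi, 7):
    assert abs(spectrum_from_realization(w) - spectrum_fir(w)) < 1e-10
print("realization matches the FIR spectrum")
```

The same realization is what enters the LMI (3.17): A and B are fixed, while C and D are affine in the decision variables, which is why the Positive Real Lemma yields a convex constraint.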
3.3.2 Partial correlation parametrization
The partial correlation parametrization uses the finite expansion
$$\sum_{k=-(M-1)}^{M-1} \tilde{c}_k B_k\left(e^{i\omega}\right) \qquad (3.20)$$
in order to design only the first M coefficients $\tilde{c}_k$. In this case, it is necessary to ensure that there exists a sequence $\tilde{c}_M, \tilde{c}_{M+1}, \ldots$ such that the complete sequence $\{\tilde{c}_k\}_{k=0}^{\infty}$ defines a spectrum, that is, such that condition (3.14) holds. This means that (3.20) does not necessarily define a spectrum itself, but the designed coefficients are extendable to a sequence that parametrizes a spectrum. As explained in [6], if an FIR spectrum is considered, a necessary and sufficient condition for (3.14) to hold is
$$\begin{pmatrix} r_0 & r_1 & \ldots & r_{M-1} \\ r_1 & r_0 & \ldots & r_{M-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{M-1} & r_{M-2} & \ldots & r_0 \end{pmatrix} \geq 0 \qquad (3.21)$$
(see [24] and [25]).
¡ ¢
This condition also applies to more general basis functions, like Bk eiω =
¡ iω ¢ −iωk
¡ iω ¢
L e e
, where L e
> 0 [6, 12].
The constraint (3.21) is an LMI in the first M correlation coefficients and is therefore convex in these variables. Notice that (3.21) is less restrictive than the condition imposed in Lemma 3.3.1 for the finite dimensional parametrization. Furthermore, as will be discussed in the next section, the finite spectrum parametrization makes it possible to handle spectral constraints on input and output signals, which the partial correlation parametrization cannot, since (3.20) is not itself a spectrum. An advantage of the latter parametrization, though, is that it uses the minimum number of free parameters.
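Checking the extendability condition (3.21) numerically amounts to an eigenvalue test on a Toeplitz matrix. A minimal sketch, with hypothetical correlation sequences chosen for illustration:

```python
import numpy as np

def is_extendable(r):
    """Condition (3.21): the Toeplitz matrix built from r_0..r_{M-1} is PSD."""
    M = len(r)
    T = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    return float(np.min(np.linalg.eigvalsh(T))) >= -1e-12

# r_k = 0.8^k is the correlation of a stable AR(1) process: extendable
assert is_extendable([0.8 ** k for k in range(5)])

# (1, 0.9, 0.5) is not extendable: the 3x3 Toeplitz matrix has a negative eigenvalue
assert not is_extendable([1.0, 0.9, 0.5])
print("Toeplitz PSD test matches expectations")
```

In a design procedure this test would appear as a semidefinite constraint on the decision variables r0, ..., r_{M−1} rather than as an a posteriori check.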
3.4 Covariance matrix parametrization
By using one of the input spectrum parametrizations in the expression (2.7), the inverse of the asymptotic covariance matrix can be written as
$$P_{\theta_0}^{-1} = \sum_{k=-(M-1)}^{M-1} \tilde{c}_k Q_k + \bar{Q}, \qquad (3.22)$$
where $Q_k = \frac{N}{2\pi\lambda_0} \int_{-\pi}^{\pi} F_u\left(e^{i\omega}, \theta_0\right) B_k\left(e^{i\omega}\right) F_u^{\star}\left(e^{i\omega}, \theta_0\right) d\omega$ and $\bar{Q} = R_e(\theta_0)$. Then, $P_{\theta_0}^{-1}$ is expressed as a linear and finitely parametrized function of the coefficients $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$ (since the symmetry condition $\tilde{c}_{-k} = \tilde{c}_k$ holds). Therefore, any quality constraint that is convex in $P_{\theta_0}^{-1}$ is also convex in $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$. Some common quality constraints have been introduced in Section 3.2, all of which were convex functions of $P_{\theta_0}^{-1}$.
3.5 Signal constraints parametrization
Constraints on the input spectrum are also considered in input design. Typically, they are frequency-by-frequency or power constraints. A detailed discussion of the power constraints parametrization is presented in [6]. Briefly, consider power constraints of the type
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \left| W_u\left(e^{i\omega}\right) \right|^2 \Phi_u(\omega)\, d\omega \leq \alpha_u \qquad (3.23)$$
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \left| W_y\left(e^{i\omega}\right) \right|^2 \Phi_y(\omega)\, d\omega \leq \alpha_y. \qquad (3.24)$$
By using a finite spectrum parametrization these constraints can be written as convex finite-dimensional functions of $\{\tilde{c}_k\}_{k=0}^{M-1}$. For example, a constraint on the input power for an FIR spectrum becomes $r_0 \leq \alpha_u$.
For frequency-by-frequency constraints of the form
$$\beta_u(\omega) \leq \Phi_u(\omega) \leq \gamma_u(\omega) \qquad (3.25)$$
$$\beta_y(\omega) \leq \Phi_y(\omega) \leq \gamma_y(\omega), \qquad (3.26)$$
Lemma 3.3.1 can be applied to write them as convex constraints in $\{\tilde{c}_k\}_{k=0}^{M-1}$, provided that the constraining functions are rational [6].
3.6 Limitations of input design in the frequency domain
The previous sections introduced input design problems where constraints on the signal spectra as well as on a measure of the estimate accuracy are considered. It has been shown that they can be formulated as finitely parametrized convex optimization problems, provided that the measure of the estimate accuracy is a convex function of $P_{\theta_0}^{-1}$ and the input spectrum is parametrized as proposed in Section 3.3. Therefore, by solving a constrained optimization problem, the optimal variables $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$ are found. The FIR spectrum representation is commonly used, so that the optimization procedure returns the first M terms of the correlation function. If a partial correlation parametrization is used, the optimal spectrum can be found by solving the Yule-Walker equations as described in [26].
From the optimal spectrum it is then necessary to generate a time domain signal to apply to the real system. This realization problem characterizes solutions obtained in the frequency domain. The input can be generated as filtered white noise, by spectral factorization of the optimal spectrum. Nevertheless, many different probability distributions can be used to generate white noise.
It should also be noticed that in general the optimal spectrum is not unique, and the input design approach considered so far finds only one of the optimal spectra. In fact, the finite dimensional spectrum parametrization forces the input correlation coefficients $r_M, r_{M+1}, \ldots$ to be zero; on the other hand, the partial correlation parametrization needs to complete the correlation sequence by solving the Yule-Walker equations, which give only one particular correlation sequence.
Furthermore, in practical applications time domain constraints on the signals have to be considered, and the power constraint that is usually set in the frequency domain does not assure that these constraints are respected. For these reasons, this thesis proposes to analyze the performance of an input design method in the probability domain, as will be presented in the next chapters.
3.7 Conclusions
Classical input design in the frequency domain has been presented. The
advantage of this approach is that input design problems can be formulated
as convex optimization problems. Some limitations of the method concern
time domain constraints on signals and realization techniques.
Chapter 4
Markov Chain Input Model
The drawbacks of input design in the frequency domain, presented in Section 3.6, motivate the study of a different approach.
This chapter will introduce the idea of input design in the probability
domain. In particular, reasons and advantages of modeling the input signal
as a Markov chain will be presented in Section 4.1. In Section 4.2 the Markov
chain input model will be described in detail.
4.1 Introduction
What is generally required for system identification in practical applications is a time domain input signal that guarantees a sufficiently accurate estimate of the system while respecting some amplitude constraints.
As discussed above, input design in the frequency domain does not handle time domain constraints on signals. Another disadvantage is that it does not define how to generate, from the optimal spectrum, the input signal to apply to the real system. Furthermore, input design in the frequency domain does not exploit the degrees of freedom in the choice of the optimal spectrum, which is generally not unique. That approach, in fact, only finds an optimal solution that fits the optimal correlation coefficients $r_0, \ldots, r_{M-1}$; all the other possible solutions are not considered.
The idea of input design in the probability domain arises from this observation: a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from the optimal distribution. In this way, no spectral factorization or realization algorithm is required.
Markov chains having a finite state space can then be used to directly include the time domain amplitude constraints into the input model, by suitably choosing the state space. The idea is to use Markov chain distributions to generate binary signals. Binary signals are often used in system identification; one of the reasons is that they achieve the largest power in the set of all signals having the same maximum amplitude, which improves parameter estimation for linear models.
Also, if the Markov chain has a spectrum of sufficiently high order (which depends on the state space dimension), there are more degrees of freedom in the choice of the optimal spectrum when designing the optimal probability distribution.
A finite stationary Markov chain is used as the input model for system identification. Its probability distribution will be shaped in order to optimize the objective function defined in the input design problem. This function is then minimized with respect to the transition probabilities of the Markov chain, which completely define its distribution [27].
4.2 Markov chain model
This section describes the state space structure of the general Markov chain
input model and some of its spectral features.
4.2.1 Markov chain state space
Consider a finite stationary Markov chain having states of the form
$$(u_{t-n}, u_{t-n+1}, \ldots, u_t) \qquad (4.1)$$
where $u_i$ represents the value of the input at time instant i; it can be equal to either $u_{max}$ or $-u_{max}$, where $u_{max}$ is the maximum tolerable input amplitude imposed by the real system. This model allows the present value of the input to depend on the last n past values, rather than only on the previous one. Note that at time instant t, the state can only transit to either the state $(u_{t-n+1}, u_{t-n+2}, \ldots, u_t, u_{max})$ or $(u_{t-n+1}, u_{t-n+2}, \ldots, u_t, -u_{max})$, with probabilities $p_{(u_{t-n},\ldots,u_t)}$ and $1 - p_{(u_{t-n},\ldots,u_t)}$, respectively. Not all transitions between states are possible; therefore the transition matrix will contain several zeros, corresponding to the forbidden state transitions.
The last component of the Markov chain state generates the binary signal to apply to the real system.
Example 4.2.1 Consider a Markov chain having state space $S_2 = \{1, -1\}$. The graph representation is shown in Figure 4.1 and the corresponding transition matrix is
$$\Pi_2 = \begin{pmatrix} p & 1-p \\ 1-q & q \end{pmatrix}. \qquad (4.2)$$

Figure 4.1: Graph representation of the two states Markov chain S2.

This simple model generates a binary signal where each sample at time t depends only on the previous value at time t − 1.
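Sampling an input realization from this chain requires only the two transition probabilities. A minimal sketch (the values of p, q, N and the seed are illustrative, not from the thesis experiments):

```python
import random

def two_state_markov_input(p, q, N, u_max=1.0, seed=0):
    """Sample a binary signal from the chain of Example 4.2.1.

    p = P(stay at +u_max), q = P(stay at -u_max)."""
    rng = random.Random(seed)
    state = 1
    u = []
    for _ in range(N):
        u.append(state * u_max)
        stay = p if state == 1 else q
        if rng.random() >= stay:  # leave the current state
            state = -state
    return u

u = two_state_markov_input(p=0.8, q=0.8, N=10000)
assert set(u) == {1.0, -1.0}          # binary signal, amplitude constraint built in
assert abs(sum(u) / len(u)) < 0.1     # with p = q the chain has (near) zero mean
```

This is the realization step that the frequency domain approach of Chapter 3 leaves open: here the time domain signal is obtained directly by sampling the designed distribution.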
Example 4.2.2 A more general model is the four states Markov chain, with state space $S = \{(1,1), (1,-1), (-1,-1), (-1,1)\}$. The transition matrix is
$$\Pi = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & s & 1-s \\ 0 & 0 & q & 1-q \\ r & 1-r & 0 & 0 \end{pmatrix} \qquad (4.3)$$
and the corresponding graph is shown in Figure 4.2.

Figure 4.2: Graph representation of the four states Markov chain S4.

Note that when p = r and s = q the four states Markov chain model is equivalent to the two states Markov chain of Example 4.2.1.
These examples show one of the advantages of the proposed Markov
chain input model: each model includes all the models of lower dimension
as special cases, for proper choices of the transition probabilities.
4.2.2 Markov chain spectra
This section presents a method to calculate the spectrum of a Markov chain; some examples will illustrate the type of signals these models generate.
By means of Markov chain theory and state space realization theory (see [27] and [28]), it is possible to derive a general expression for the spectrum of a finite stationary Markov chain $s_m$ having state space $S = \{S_1, S_2, \ldots, S_J\}$. For the general Markov chains considered in the previous section, each state has the form (4.1) and the number of states is $J = 2^{n+1}$.
Let Π denote the transition matrix, whose elements are the conditional probabilities $\Pi(i,j) = P\{s_{m+1} = S_j \mid s_m = S_i\}$, and let $\bar{p} = \begin{pmatrix} \bar{p}_1 & \ldots & \bar{p}_J \end{pmatrix}$ be the solution of the linear system $\bar{p} = \bar{p}\Pi$, containing the stationary probabilities $\bar{p}_i = P\{s_m = S_i\}$. Consider the states $S_i$ as column vectors. Defining
$$A_s = \begin{pmatrix} S_1 & \ldots & S_J \end{pmatrix} \quad \text{and} \quad D_s = \begin{pmatrix} \bar{p}_1 & & 0 \\ & \ddots & \\ 0 & & \bar{p}_J \end{pmatrix},$$
it is possible to write the correlation coefficients of the output signal in the matrix form
$$r_k = A_s D_s \Pi^k A_s^T, \qquad k = 0, 1, 2, \ldots \qquad (4.4)$$
For k < 0 the correlation can be obtained from the symmetry condition $r_k = r_{-k}$, since the process $s_m$ is real.
To calculate the spectrum of $s_m$ as the Fourier transform of the correlation function, note that $r_k$ can be viewed as the impulse response, for $k = 1, 2, \ldots$, of the linear system
$$\begin{aligned} x_{k+1} &= \Pi x_k + \Pi A_s^T u_k \\ y_k &= A_s D_s x_k. \end{aligned} \qquad (4.5)$$
Therefore, the transfer function $W(z) = A_s D_s (zI - \Pi)^{-1} \Pi A_s^T$ of the system (4.5) is the Z-transform of the causal part of the correlation function, that is, $\{r_k\}_{k=1}^{\infty}$.
Consequently, $W(z^{-1})$ is the Z-transform of the anticausal part of the correlation function, $\{r_k\}_{k=-\infty}^{-1}$.
The spectrum of the Markov chain signal $s_m$ can then be expressed as
$$\Phi_s(z) = W(z) + r_0 + W\left(z^{-1}\right). \qquad (4.6)$$
The correlation $r_k$ is in general a matrix function, i.e. for each k, $r_k$ is an n × n matrix. The correlation function of the input signal is given by the sequence obtained for the element in position (n, n): $\{r_k(n,n)\}_{k=0}^{\infty}$. The input signal spectrum is then given by the element in position (n, n) of the matrix $\Phi_s(z)$.
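Formula (4.4) is easy to evaluate numerically. As a sketch, for the two states chain of Example 4.2.1 with p = q (an illustrative value is used below), the correlation computed from (4.4) can be checked against the geometric sequence α^k with α = p + q − 1, which is the AR(1) correlation implied by Example 4.2.3:

```python
import numpy as np

# Two states chain of Example 4.2.1 with p = q = 0.8
p = q = 0.8
Pi = np.array([[p, 1 - p],
               [1 - q, q]])

# Stationary distribution: left eigenvector of Pi for eigenvalue 1, normalized
w, v = np.linalg.eig(Pi.T)
pbar = np.real(v[:, np.argmin(np.abs(w - 1))])
pbar = pbar / pbar.sum()

As = np.array([[1.0, -1.0]])   # scalar states side by side
Ds = np.diag(pbar)

def r(k):
    """Correlation coefficient r_k from equation (4.4)."""
    return (As @ Ds @ np.linalg.matrix_power(Pi, k) @ As.T)[0, 0]

alpha = p + q - 1
for k in range(6):
    assert abs(r(k) - alpha ** k) < 1e-12
print("r_k = alpha^k, as predicted for the two states chain")
```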
Consider now the two Markov chains in Examples 4.2.1 and 4.2.2 of the
previous section.
Example 4.2.3 By calculating the transfer function W(z) and the autocorrelation $r_0$, the following expression for the spectrum of the Markov chain of Figure 4.1 is obtained:
$$\Phi_s(z) = \frac{(1+\alpha)(1-\gamma)}{(z-\alpha)\left(z^{-1}-\alpha\right)}, \qquad (4.7)$$
where $\alpha = p + q - 1 \in [-1, 1]$ and $\gamma = \frac{(p+q-1)(p+q-2) - (p-q)^2}{p+q-2}$.
Notice that (4.7) is the spectrum of a first order AR process. This also means that the spectrum of any first order AR process can be generated through a two states Markov chain.
The mean value of $s_m$ is $E[s_m] = \frac{p-q}{2-p-q}$. By forcing p = q the mean value becomes zero and, since then γ = α, the spectrum depends on only one parameter. If p = q the variance of the Markov chain is 1.
Example 4.2.4 Consider Example 4.2.2 of the previous section, where the mean value of the Markov chain is set to zero. Then $s = \frac{(1-q)r}{1-p}$ and the stationary probabilities are
$$\bar{p} = \frac{1}{2(1-p)(1-q) + 2r(1-q)} \begin{pmatrix} r(1-q) \\ (1-p)(1-q) \\ r(1-q) \\ (1-p)(1-q) \end{pmatrix}^T.$$
By definition, $A_s = \begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$. However, only the second component of the Markov chain is of interest, since it represents the input signal. Therefore, the correlation $r_k$ of the input process can be calculated using the vector $\bar{A}_s = \begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}$ instead of $A_s$. The analytic calculation of the spectrum turns out to be too involved, so it has only been evaluated numerically. Some values of poles and zeros of the canonical spectral factor¹ are reported in Table 4.1. These data show that it is not possible to model $s_m$ by an AR process, because in general there are also non-null zeros in the spectrum. The four states Markov chain has a higher order spectrum than the two states Markov chain of Example 4.2.3; the number of poles and zeros depends on the values of the probabilities p, r and q and can be up to eight. For some values of the probabilities there are zero-pole cancellations that reduce the spectrum order; in particular, when p = q = r the spectrum has the same simple structure obtained in the previous case (see Table 4.1), as has already been shown in the previous section.
A particular choice of the transition matrix for this Markov chain is
$$\Pi_4 = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & r & 1-r \\ 0 & 0 & p & 1-p \\ r & 1-r & 0 & 0 \end{pmatrix}. \qquad (4.8)$$
This choice of the transition matrix makes the Markov chain symmetric, in the sense that the transition probabilities are invariant with respect to changes in the sign of the state components.
Even if the input is designed in the probability domain, the spectrum is shaped by the choice of n and of the transition probabilities of the Markov chain.
Notice that Φu is only subject to (4.6), and no other structural constraints are imposed, except that the transition probabilities have to lie in [0, 1]. This is not the case for the spectrum parametrizations described in Section 3.3: for example, if the FIR representation of the spectrum is used, the finite dimensional parametrization forces the positive real part of the spectrum to be an FIR system. The spectra of Markov chain signals have poles and zeros that are related to each other, since the number of free parameters in the problem is J (the transition probabilities), equal to the number of poles of W(z); however, the positive real part of the spectrum is not forced to be FIR. From these observations it is possible to conclude that, looking at input design using Markov chains in the frequency domain, this
¹ By canonical spectral factor is meant the analytic and minimum phase function L(z) such that $\Phi_s(z) = L(z) L\left(z^{-1}\right)$.
p    q    r    poles                        zeros
0.2  0.2  0.2  -0.6000                      0
0.8  0.2  0.2  -0.3557 ± 0.6161i, 0.7114    0, 0, 0
0.2  0.8  0.2  -0.7746, 0.7746              0, 0.4514
0.2  0.2  0.8  ±0.7746i                     0, 0
0.8  0.2  0.8  ±0.7746i                     0, 0
0.8  0.8  0.2  0.7746, -0.7746              0, 0
0.2  0.8  0.8  -0.3557 ± 0.6161i, 0.7114    0, 0, 0
0.8  0.8  0.8  0.6000                       0
0.3  0.5  0.8  -0.0302 ± 0.5049i, -0.1396   0, 0, -0.3008
0.2  0.5  0.8  -0.1500 ± 0.5268i            0, -0.3333

Table 4.1: Poles and zeros of the canonical spectral factor of the spectrum of sm of Example 4.2.4.
approach preserves more degrees of freedom for the choice of the optimal spectrum than the input design approach described in Chapter 3, since fewer constraints on the structure are imposed.
In order to optimize an objective function for input design purposes by
shaping the input process distribution, it is necessary to define the Markov
chain state space and its transition probabilities. That is, once the input
model structure is defined, the objective function is optimized with respect
to the transition probabilities. For the examples considered in this section,
the purpose of the input design problem would be to optimize the objective
function J (u, θ0 ) with respect to the transition probabilities, p in the first
case, p and r in the second.
4.3 More general Markov chains
Input signals generated by the Markov chains described in the previous sections are binary signals. It is also possible to extend this type of Markov model to more general input signals. For example, Markov chains with state space S = {0, 1, −1, 2, −2, 3, −3, . . .} would generate signals having more than two amplitude levels.
As a simple example, consider a three states Markov chain generating a
ternary input signal. The Markov chain has state space S3 = {0, 1, −1} and
transition graph in Figure 4.3.
Figure 4.3: Graph representation of the three states Markov chain S3 .
To easily calculate an expression for the spectrum, the mean value of the
process is set to zero, so that p = q.
By solving $\bar{p} = \bar{p}\Pi$, the vector
$$\bar{p} = \begin{pmatrix} \frac{1-p}{3-2r-p} & \frac{1-r}{3-2r-p} & \frac{1-r}{3-2r-p} \end{pmatrix}$$
is found. The resulting spectrum is
$$\Phi_s(z) = \frac{3(r-1)(p-1)(1+3p)}{2(3-2r-p)\left(z - \frac{3p-1}{2}\right)\left(z^{-1} - \frac{3p-1}{2}\right)}.$$
This spectrum has the same structure as the one found for the two states
Markov chain in Example 4.2.3. In this case, Φs depends on both r and p:
p determines the pole and r the gain of the spectrum.
The expression found for $\Phi_s(z)$ shows that this input model does not provide a higher order spectrum than the two states Markov chain. In this case, it is then preferable to use the simpler model.
This work focuses on input models that generate binary signals. The
input models described in this section will not be further considered.
4.4 Conclusions
This chapter defined finite stationary Markov chains generating binary signals. These processes have been introduced to model input signals for system
identification purposes. The main advantages of using Markov chains as input models are that amplitude constraints are directly included into the
input model and the input signal can be easily generated from the optimal
distribution, avoiding the realization problem.
Spectral properties of these processes have also been analyzed through some examples in Section 4.2.2.
Chapter 5
Estimation Using Markov Chains
This chapter proposes a method for parameter estimation of an LTI SISO
system by using the Markov chain input model presented in Chapter 4.
Section 5.1 defines the input design problem that will be studied here.
The solution approach is presented in Section 5.2. The design method is
described in detail in Sections 5.3 and 5.4. A numerical example, analyzed
in Section 5.5, concludes the chapter.
5.1 Problem formulation
Consider the system (2.2) and the parametric model (2.1) defined in Section 2.1. In this chapter, the objective of the identification procedure is the estimation of the parameter vector θ.
The input design problem considered in this study is to minimize a measure of the estimate error, $f(P_{\theta_0})$, where f is a convex function of the covariance matrix of the parameter estimate. In practice, it is often also necessary to take into account some constraints on the real signals. In that case, a general cost function can be considered:
$$J(u, \theta_0) = f(P_{\theta_0}(u)) + g, \qquad (5.1)$$
where g is a term which represents the cost of the experiment. As explained in Chapter 3, typical functions f are the trace, the determinant or the largest eigenvalue of the covariance matrix $P_{\theta_0}$ [1].
This problem formulation is slightly different from the one presented in Chapter 3. No constraints are set explicitly in the optimization problem; instead, they are included in the cost function through the term g. The reason for this is that the stochastic approximation framework is used to minimize the objective function, since no analytic convex formulation of the problem is available. This can be seen as a classical multiobjective optimization approach (an overview can be found in [29] and [30]).
5.2 Solution approach
Since the analytic expressions for the covariance matrix Pθ0 as a function of
the transition probabilities of the Markov chain modeling the input are quite
involved, simulation techniques are required to evaluate the cost function.
The estimate of Pθ0 is a stochastic function of the one-step transition probabilities and is contaminated with noise; therefore, it can only be evaluated
through randomly generated input and noise signals. In this framework,
stochastic approximation algorithms are needed to minimize the cost function (5.1) with respect to the transition probabilities of the Markov chain
(see [15], [16] for details).
An expression for $P_{\theta_0}$ that suits the stochastic approximation approach is found in Section 5.3, where $P_{\theta_0}$ is estimated as a function of input and noise data. The stochastic algorithm used to minimize the cost function (5.1) is described in Section 5.4.
5.3 Cost function evaluation
From the model expression (2.1) it is possible to write
$$e(t) = H(q, \theta)^{-1}\left(y(t) - G(q, \theta)\, u(t)\right)$$
and by linearizing the functions G(q, θ) and H(q, θ) at θ = θ0,
$$G(q, \theta) \approx G_0(q) + \Delta G(q, \theta), \qquad H(q, \theta) \approx H_0(q) + \Delta H(q, \theta),$$
where $\Delta G(q, \theta) = (\theta - \theta_0)^T \left.\frac{\partial G(q, \theta)}{\partial \theta}\right|_{\theta_0}$ and $\Delta H(q, \theta) = (\theta - \theta_0)^T \left.\frac{\partial H(q, \theta)}{\partial \theta}\right|_{\theta_0}$, the following expression is derived:
$$e(t) = \left(H_0(q) + \Delta H(q, \theta)\right)^{-1}\left(H_0(q)\, e_0(t) - \Delta G(q, \theta)\, u(t)\right). \qquad (5.2)$$
By substituting the Taylor expansion
$$\left(H_0(q) + \Delta H(q, \theta)\right)^{-1} \approx \frac{1}{H_0(q)} - \frac{\Delta H(q, \theta)}{H_0(q)^2}$$
and the expressions of ΔG(q, θ) and ΔH(q, θ), it results in
$$e_0(t) \approx (\theta - \theta_0)^T \left( \frac{1}{H_0(q)} \left.\frac{\partial H(q, \theta)}{\partial \theta}\right|_{\theta_0} e_0(t) + \frac{1}{H_0(q)} \left.\frac{\partial G(q, \theta)}{\partial \theta}\right|_{\theta_0} u(t) \right) + e(t). \qquad (5.3)$$
The problem of estimating the parameter θ for the model (2.1) is asymptotically equivalent to solving the least squares problem for (5.3), where $e_0$ and u are known, as the number of data points N used for estimation goes to infinity.
Therefore, the asymptotic expression (2.7) can be approximated as
$$P_{\theta_0}^{-1} = \frac{1}{\lambda_0} S^T S,$$
where $S = \begin{pmatrix} w_1 & \ldots & w_b \end{pmatrix} \in \mathbb{R}^{N \times b}$ and $w_i \in \mathbb{R}^{N \times 1}$ is the sequence obtained from
$$w_i(t) = \frac{1}{H_0(q)} \left.\frac{\partial G(q, \theta)}{\partial \theta_i}\right|_{\theta_0} u(t) + \frac{1}{H_0(q)} \left.\frac{\partial H(q, \theta)}{\partial \theta_i}\right|_{\theta_0} e_0(t).$$
Therefore, at each iteration of the algorithm, the cost function is evaluated using randomly generated input and noise signals.
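As an illustration of this evaluation, consider a hypothetical FIR output-error model $G(q, \theta) = \theta_1 q^{-1} + \theta_2 q^{-2}$ with $H_0(q) = 1$, so that $w_i(t) = u(t - i)$. The model and all numbers below are assumptions for the sketch, not the thesis example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam0 = 5000, 0.01

# Hypothetical FIR output-error model G(q, theta) = theta_1 q^-1 + theta_2 q^-2,
# H_0(q) = 1, so the regressor sequences are w_1(t) = u(t-1), w_2(t) = u(t-2).
u = rng.choice([-1.0, 1.0], size=N)          # binary white input realization
w1 = np.concatenate(([0.0], u[:-1]))         # u delayed by one sample
w2 = np.concatenate(([0.0, 0.0], u[:-2]))    # u delayed by two samples
S = np.column_stack([w1, w2])                # S in R^{N x b}, b = 2

P_inv = (S.T @ S) / lam0                     # sample estimate of P_theta0^{-1}

# For a white binary input, S^T S / lam0 should be close to (N / lam0) * I
assert abs(P_inv[0, 0] - N / lam0) / (N / lam0) < 0.02
assert abs(P_inv[0, 1]) < 0.1 * N / lam0
```

A different input distribution (for instance a Markov chain realization from Chapter 4) simply changes u here; averaging such estimates over many realizations gives the noisy cost function samples used by the stochastic algorithm of Section 5.4.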
5.4 Algorithm description
When evaluating the cost function by simulation, it is necessary to consider
that the cost function estimate is a stochastic variable that depends on the
transition probabilities of the Markov chains and on the noise process e (t).
Therefore, the cost function values generated through simulation have to
be considered as samples of that stochastic variable. The true value of the
cost function for a given transition probability would be the mean of that
stochastic variable. For these reasons, stochastic approximation is necessary in order to minimize the cost function with respect to the transition
probabilities of the Markov chain describing the input.
One of the most common stochastic approximation methods that do not require knowledge of the cost function gradient is finite difference stochastic approximation (FDSA) [15]. It uses the recursion
$$\hat{p}_{k+1} = \hat{p}_k - a_k \widehat{\nabla J}_k, \qquad (5.4)$$
where $\widehat{\nabla J}_k$ is an estimate of the gradient of J at the k-th step and $a_k$ is a sequence such that $\lim_{k \to \infty} a_k = 0$. The FDSA estimates the gradient of the cost function as
$$\widehat{\nabla J}_{ki} = \frac{J(\hat{p}_k + c_k e_i) - J(\hat{p}_k - c_k e_i)}{2 c_k},$$
where $e_i$ denotes the unit vector in the i-th direction, $\widehat{\nabla J}_{ki}$ is the i-th component of the gradient vector, and $c_k$ is a sequence of coefficients converging to zero as $k \to \infty$.
Depending on the number d of parameters with respect to which the cost function is minimized, simultaneous perturbation stochastic approximation (SPSA) may be more efficient than FDSA [31]: when d increases, the number of cost function evaluations in an FDSA procedure may become too large and the algorithm very slow. In that case the SPSA algorithm described in [31] gives better performance, since it requires only two evaluations of the cost function regardless of d. In fact, SPSA estimates the gradient by
$$\widehat{\nabla J}_{ki} = \frac{J(\hat{p}_k + c_k \Delta_k) - J(\hat{p}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}},$$
where $\Delta_k$ is a d-dimensional random perturbation vector whose components are independently generated from a Bernoulli ±1 distribution with probability 0.5 for each outcome [32].
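A minimal sketch of both gradient estimators and the recursion (5.4), applied to a noisy quadratic toy cost rather than the thesis cost function (the optimum p_star and all coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
p_star = np.array([0.8, 0.3])          # hypothetical optimum, for illustration

def J(p, noise=0.0):
    """Cost measurement: quadratic bowl plus optional measurement noise."""
    return np.sum((p - p_star) ** 2) + noise * rng.standard_normal()

def fdsa_gradient(p, c_k, noise=0.0):
    """Two-sided finite differences: 2*d cost evaluations."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p); e[i] = 1.0
        g[i] = (J(p + c_k * e, noise) - J(p - c_k * e, noise)) / (2 * c_k)
    return g

def spsa_gradient(p, c_k, noise=0.0):
    """Simultaneous perturbation: 2 cost evaluations regardless of d."""
    delta = rng.choice([-1.0, 1.0], size=len(p))
    return (J(p + c_k * delta, noise) - J(p - c_k * delta, noise)) / (2 * c_k * delta)

# Noise-free sanity check: FDSA recovers the analytic gradient 2*(p - p_star)
p0 = np.array([0.5, 0.5])
assert np.allclose(fdsa_gradient(p0, 1e-4), 2 * (p0 - p_star), atol=1e-6)

# SPSA recursion (5.4) on noisy measurements
p = p0.copy()
for k in range(3000):
    a_k = 0.5 / (10 + k + 1)           # a / (A + k + 1)
    c_k = 0.1 / (k + 1) ** (1 / 3)
    p = p - a_k * spsa_gradient(p, c_k, noise=0.001)
assert np.linalg.norm(p - p_star) < np.linalg.norm(p0 - p_star)
```

Note that SPSA uses 2 cost evaluations per iteration here, against 2d = 4 for FDSA, which is the efficiency argument made above.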
The iteration (5.4) is initialized by first evaluating the cost function on a discrete set of points and choosing the minimum in that set. At each point in this set, the cost function is evaluated only once; the value obtained is therefore a sample extracted from the stochastic variable describing the cost function at that point. As a consequence, the initial condition may not be close to the true minimum of the cost function, due to noise in the measurements. Nevertheless, in some cases the result of the initialization procedure may be sufficiently accurate, so there could be no need to run many algorithm iterations. This of course depends on the cost function shape and on the choice of the grid of points.
The sequences $a_k$ and $c_k$ can be chosen as $a_k = \frac{a}{A+k+1}$ and $c_k = \frac{c}{(k+1)^{1/3}}$, which are asymptotically optimal for the FDSA algorithm (see [15]). A method for choosing A, a and c is to estimate the gradient of the cost function at the initial condition, so that the product $a_0 \widehat{\nabla J}_0$ has magnitude approximately equal to the expected changes among the elements of $\hat{p}_k$ in the early iterations [32]. The coefficient c (as suggested in [32]) ought to be greater than the variance of the noise in the cost function measurements, in order to obtain a good estimate of the gradient. This variance may be estimated at the initial condition of the algorithm.
An analytic proof of the algorithm convergence can be found in [15].
Figure 5.1: Mass-spring-damper system.
5.5 Numerical example
Consider a mass-spring-damper system (Figure 5.1), where the input u is the force applied to the mass and the output y is the mass position. It is described by the transfer function
$$G_0(s) = \frac{1/m}{s^2 + \frac{c}{m}s + \frac{k}{m}}$$
with m = 100 kg, k = 10 N/m and c = 6.3246 Ns/m, resulting in the natural frequency $\omega_n = 0.3162$ rad/s and the damping ratio ξ = 0.1. The power is here defined as $p_w(t) = u(t)\, \dot{y}(t)$.
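The stated natural frequency and damping ratio follow directly from m, k and c; a quick check:

```python
import math

m, k, c = 100.0, 10.0, 6.3246          # kg, N/m, Ns/m (values from the example)

omega_n = math.sqrt(k / m)             # natural frequency [rad/s]
xi = c / (2.0 * math.sqrt(k * m))      # damping ratio

assert abs(omega_n - 0.3162) < 1e-3
assert abs(xi - 0.1) < 1e-4
```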
White noise with variance λ0 = 0.0001 is added at the output and an
output-error model is used [19]. Data are sampled with Ts = 1 s and the
number of data points generated is N = 1000.
As a measure of the estimate accuracy, the trace of the covariance matrix $P_{\theta_0}$ is used. In order to also consider some practical constraints on the amplitude of the input and output signals and on the maximum and mean input power, a general cost function is used:
$$J(u, \theta_0) = f_1\left(\mathrm{Tr}\, P_{\theta_0}(u)\right) + f_2(u_{max}) + f_3(y_{max}) + f_4(p_{w,max}) + f_5(p_{w,mean}), \qquad (5.5)$$
where $u_{max}$ and $y_{max}$ are the absolute maximum values of the input and output signals, and $p_{w,max}$ and $p_{w,mean}$ are the maximum and mean input power.
Thresholds have been set for $\mathrm{Tr}\, P_{\theta_0}$, $u_{max}$, $y_{max}$, $p_{w,max}$ and $p_{w,mean}$, which define the maximum values allowed for each of these variables. Figures 5.2 and 5.3 show the cost functions $f_2$, $f_3$ and $f_1$, $f_4$, $f_5$, respectively: when the variables reach their maximum acceptable value (100%), the cost is one. Outside the interval of acceptable values, the cost functions continue growing linearly.

Figure 5.2: Cost functions f2 and f3 in the interval of acceptable values of the variables umax and ymax, respectively.

Figure 5.3: Cost functions f1, f4, f5 in the interval of acceptable values.
As input models, the two simple examples of Markov chains introduced in Section 4.2 are considered here. They are described by the transition matrices
$$\Pi_2 = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}, \qquad \Pi_4 = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & r & 1-r \\ 0 & 0 & p & 1-p \\ r & 1-r & 0 & 0 \end{pmatrix}.$$
The set of points used for the algorithm initialization, as explained in Section 5.4, is {0.1, 0.2, . . . , 0.9}. In the case analyzed here, since the cost function depends on no more than two parameters, FDSA is used. The algorithm coefficients have been chosen by the method suggested in the previous section.
Three cases have been studied:
1. The cost associated to Tr Pθ0 and the costs associated to the physical constraints have comparable values.
2. No power and amplitude constraints are considered.
3. Very strict power constraints are considered.
These cases are summarized in Table 5.1.
Case  Tr Pθ0    umax   ymax   pwmax       pwmean
1     5 × 10⁻⁶  1 N    1 m    0.3 Nm/s    0.03 Nm/s
2     5 × 10⁻⁶  Inf N  Inf m  Inf Nm/s    Inf Nm/s
3     5 × 10⁻⁶  1 N    1 m    0.03 Nm/s   0.003 Nm/s

Table 5.1: Maximum threshold values in the three analyzed cases.
As a term of comparison for the performance of the Markov input model,
a pseudo-random binary signal and white noise with unit variance (the same
as the variance of the Markov chains) have been applied as inputs to the
system. The results of the simulation runs for the three cases listed above are
shown in Tables 5.4, 5.5 and 5.6. The cost function values are estimated by
evaluating the average of 100 simulation runs using the optimal input found
by the algorithm, the PRBS and white noise inputs. Table 5.5, related
to case 2, shows the optimal value of the trace of the covariance matrix
calculated by solving the LMI formulation of the input design problem in
the frequency domain, as explained in [12]. Furthermore, by the method
described in [13], a binary signal having the optimal correlation function is
generated. The minimum obtained with this input signal is also shown in
Table 5.5.
The second case, which is the most standard in input design problems, is
first analyzed in detail. Figure 5.4 presents the cost function, estimated on a
fine grid of points, as the average of 100 simulations. Table 5.2 exhibits the
Figure 5.4: Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 2.
results of two Monte-Carlo simulations (each consisting of 100 runs), which show that the variance of the algorithm output decreases approximately as 1/NIter, where NIter is the number of algorithm iterations; this supports the empirical convergence of the algorithm.

NIter              1000          2000
Mean value E p̂     0.8657        0.8671
Variance Var p̂     4.6 × 10−4    2.5 × 10−4

Table 5.2: Results of 100 Monte-Carlo simulations of the algorithm with the two-state Markov chain.

With 10000 iterations the algorithm
produces the results in Figure 5.5. The optimality of the probability p̂ found by the algorithm has been verified by using the expression of the two-state Markov chain spectrum in the asymptotic expression (2.7) and minimizing TrPθ0 with respect to α; the resulting optimal value p̂ = 0.8714 is very close to the one found by the stochastic algorithm after 30000 iterations, namely p̂ = 0.8712 (Table 5.3). This confirms that the stochastic algorithm converges to the true optimal value. In practice, it is not necessary to run the algorithm for 30000 iterations, since the cost function is already very close to its minimum at the initial condition and the variance of the estimate after 10000 iterations is of the order of 10−5; it has been done here only to show that the final value obtained is the true optimal one.
Figure 5.5: Estimation of the best transition probability for the two-state Markov chain in case 2.
Case    S2             S4
1       p̂ = 0.4720     [ p̂  r̂ ] = [ 0.4730  0.6794 ]
2       p̂ = 0.8712     [ p̂  r̂ ] = [ 0.8494  0.6445 ]
3       p̂ = 0.1100     [ p̂  r̂ ] = [ 0.0005  0.2981 ]

Table 5.3: Optimal values of the transition probabilities in cases 1, 2 and 3, obtained after 30000 algorithm iterations.
Notice from the results in Table 5.5 that the Markov chains give lower values of the trace of Pθ0(u) than all the other inputs, except the true optimal spectrum.

           S2        S4        PRBS      WN
J(u, θ0)   1.2758    1.2788    1.2564    20.1326

Table 5.4: Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.

           S2        S4        PRBS      WN        BI        Optimum
TrPθ0      1.43e-7   1.59e-7   4.35e-7   4.66e-7   2.18e-6   2.85e-8

Table 5.5: Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function and the optimal spectrum in case 2.

           S2        S4        PRBS      WN
J(u, θ0)   78.51     73.58     163.85    484.94

Table 5.6: Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 3.

The frequencies of the optimal input spectrum for case 2 have been estimated by means of the Multiple Signal Classification (MUSIC) method, described in [26]. It turns out that the optimal input consists of two
sinusoids with frequencies 0.3023 rad/s and 0.3571 rad/s, where the main contribution is given by the higher-frequency sinusoid, which has approximately 5.6 times the power of the first component. Note that these frequencies are very close to the natural frequency of the system and to the poles of the Markov chain spectra (Figures 5.10 and 5.11).
Figures 5.6 and 5.7 show the trajectories of the probability estimates obtained for the four-state Markov chain in case 2. The empirical speed of convergence is lower than for the two-state Markov chain. Nevertheless, the cost function value does not change significantly if the algorithm is stopped after 2000 iterations.
Case 1 analyzes the more practical situation in which amplitude and power constraints on the signals have to be considered. In this case, the cost functions obtained for the two-state and the four-state Markov chains are presented in Figures 5.8 and 5.9. Notice that despite the presence of noise, the cost function is convex; therefore, the problem has a solution.
Figure 5.6: Estimation of the best transition probability p for the four-state Markov chain in case 2.

Figure 5.7: Estimation of the best transition probability r for the four-state Markov chain in case 2.
Figure 5.8: Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 1.

Figure 5.9: Estimate of the cost function on a discrete set of points for the four-state Markov chain in case 1.
Power constraints move the minimum of the cost function to smaller probability values; that is, the transition probabilities cannot be too large, otherwise the input excitation would not respect the power constraints. In case 1, the Markov inputs and the PRBS signal give almost the same cost value (Table 5.4). This happens because the optimal values of the transition probabilities are approximately 0.5, which means that the Markov chain signal is essentially binary white noise (this depends on the choice of the threshold values). Note in Figures 5.10 and 5.11 that in this case the spectra of the Markov chains are almost constant. The white noise input, which is generated with a Gaussian distribution, gives a much higher cost value, due to its amplitude and power.
Also in the third case, when stricter power constraints are imposed in the problem, the use of a Markov chain is preferable (see the results in Table 5.6). Therefore, when amplitude and power constraints have to be considered in the input design problem, the Markov chain model can significantly improve system identification. The optimal distribution is easily estimated by simulation of the real system.
Notice that the two-state Markov chain performs slightly better than the four-state Markov chain in the first two cases, while in the third case, when stricter power constraints are considered, the four-state Markov chain achieves the lowest cost function value. The reason for this is that in cases 1 and 2 the optimal input structure is the two-state Markov chain; therefore, the stochastic approximation algorithm performs better if the simple input model is used, rather than a more general one that requires more parameters to be tuned. In fact, notice that in cases 1 and 2 (Table 5.3) the optimal transition probabilities p̂ for the four-state Markov chain are close to the corresponding optimal probabilities p̂ for the two-state Markov chain. However, the probabilities r̂ are not equal to the corresponding p̂, which explains the worse performance of the four-state Markov chain in those two cases.
Finally, after having calculated optimal transition probabilities for the
considered input models and compared them to other input signals, the
system parameters have to be identified. Their real values are:
1/m = 0.01 Kg⁻¹   (5.6)

c/m = 0.06325 Ns/m   (5.7)

k/m = 0.1 N/m   (5.8)
The results of the identification procedure using the optimal Markov chains
are shown in Tables 5.7 and 5.8, and Figures 5.12 and 5.13. Notice that both
the parameters and the frequency response of the real continuous system
are correctly estimated. The relative errors on the parameter estimates are mostly under 1%. When power constraints have to be included, the estimation error is slightly larger. The best results are achieved in case 2 with the two-state Markov chain.

Figure 5.10: Bode diagrams of the optimal spectra of the two-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.

Figure 5.11: Bode diagrams of the optimal spectra of the four-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
Case                         1           2           3
1/m  est. val. [Kg⁻¹]        0.0101      0.01        0.009688
     err. %                  1%          0%          3.12%
c/m  est. val. [Ns/m]        0.06355     0.06329     0.06023
     err. %                  0.4743%     0.0632%     4.7747%
k/m  est. val. [N/m]         0.09997     0.1         0.09948
     err. %                  0.03%       0%          0.52%

Table 5.7: Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two-state Markov chains.
Figure 5.12: Estimates of the frequency response of the system using the optimal two-state Markov chains for cases 1, 2 and 3 in Table 5.1.
Case                         1           2           3
1/m  est. val. [Kg⁻¹]        0.009936    0.01        0.01
     err. %                  0.64%       0%          0%
c/m  est. val. [Ns/m]        0.06261     0.06314     0.06266
     err. %                  1.0119%     0.1739%     0.9328%
k/m  est. val. [N/m]         0.1         0.0999      0.09996
     err. %                  0%          0.1%        0.04%

Table 5.8: Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four-state Markov chains.
Figure 5.13: Estimates of the frequency response of the system using the optimal four-state Markov chains for cases 1, 2 and 3 in Table 5.1.
5.6 Conclusions
This chapter proposed a solution to the input design problem for parameter
estimation using Markov chains. A stochastic approximation algorithm has
been used to minimize the objective function in the input design problem.
The solution is the probability distribution of the Markov chain, from which
it is possible to easily generate a binary input signal.
From the results in Table 5.5, the example in Section 5.5 showed that the Markov chain model gives a trace of the covariance matrix 10 times lower than the value obtained with a binary input which approximates the process having the optimal correlation function. The results in Tables 5.4 and 5.6 also show that the Markov chain model performs as well as or better than the most commonly used inputs, such as a PRBS and white noise. This means that the Markov chain input model can improve system identification considerably.
Chapter 6
Zero Estimation
In control applications it is often important to have accurate estimates of the
frequency response of a system, in a certain frequency band. Knowledge of a
non-minimum phase zero of a scalar linear system gives relevant information
in this sense; in fact, NMP zeros limit the achievable bandwidth [33]. In
identification procedures, this information may also be useful for estimating
the model structure and order. Input design for accurate zero identification
is then often required. The quantification of the variance of the estimated
zero is particularly important. Many expressions for the variance of the
estimated zero, which are mainly asymptotic expressions, can be found in
the literature.
Input design for NMP zero identification in LTI systems is presented in Section 6.1, following the problem formulation in [17]. For FIR and ARX systems, the optimal input spectrum can be found analytically, as shown in Sections 6.2 and 6.3, respectively. In these cases, the solution is asymptotic in the number of data, but it does not depend on the model order, provided the order is sufficiently high so that there is no undermodeling. An asymptotically optimal solution, in the number of data and in the model order, for general SISO LTI systems is presented in Section 6.4. The adaptive algorithm presented in [18], which generates an input signal to consistently estimate the NMP zero, is described in Section 6.5. It has then been modified to fit the Markov chain input model and generate an adaptive Markov chain signal. A numerical example compares the two methods.
6.1 Problem formulation
Consider the SISO LTI system (2.2) and the class of parametric models (2.1)
defined in Section 2.1. PEM is used to estimate the real parameter vector θ.
Assumptions 1, 2 and 3 in Section 2.2 are considered for the consistency of the method. These lead to the asymptotic normality of the parameter estimate and to the expression (2.7) for the inverse of the covariance matrix Pθ0.
Consider a particular zero of the polynomial B(q, θ) = b1 + b2 q⁻¹ + · · · + bnb q−nb, denoted zk(θ), and assume all zeros of B(q, θ) have multiplicity one. Let zk0 = zk(θ0) and ẑk = zk(θ̂N) denote the real value and the estimate of the zero zk(θ) (the same notation as in [17] is used here).
In [34] it is shown that the asymptotic variance (in the number of data) of the zero estimate can be expressed as

lim N→∞ E[ẑk − zk0]² = α² Γb*(zk0) Pb Γb(zk0),   (6.1)
where

α² = λ0 |zk0|² / ( N |B̃(zk0)|² ),   (6.2)

B̃(q, θ) = B(q, θ) / ( 1 − zk(θ) q⁻¹ ),   (6.3)

Γb = ( 1  q⁻¹  . . .  q−nb )T,   (6.4)

and Pb is the covariance matrix of the estimates of the parameter vector θb = ( b1 . . . bnb ).
If zk0 is a non-minimum phase zero, the following expression for the asymptotic variance of the zero estimate is found in [17]:

lim nb→∞ lim N→∞ E[ẑk − zk0]² = α² |H(zk0)|² |A(zk0)|² / ( ( 1 − |zk0|⁻² ) |Q(zk0)|² ),   (6.5)

where Q(q) is a minimum phase filter for the input, i.e. u(t) = Q(q) v(t), v being white noise with unit variance. Notice that this expression is asymptotic also in the model order.
The input design problem considered here is

minΦu (1/2π) ∫[−π, π] Φu(ω) dω   (6.6)
subject to Var(ẑk) ≤ γ.

The objective function to minimize is the input power, under a quality constraint on the variance of the zero estimate, which can be expressed as (6.1) or (6.5). In [17] it is shown that problem (6.6) can be formulated as a convex optimization problem in the input spectrum. The point is to express
the constraint as a convex function of Φu. Using the general asymptotic expression (6.1) of the variance of the zero estimate, the convex formulation of the problem is

minΦu (1/2π) ∫[−π, π] Φu(ω) dω   (6.7)
subject to Pθ0⁻¹ − (α²/γ) Γb0 Γb0* ≥ 0,

where Γb0 = ( ΓbT  0 )T.
The problem is feasible but infinitely parametrized. An input spectrum parametrization, as discussed in Section 3.3, is then needed to solve the problem numerically. Nevertheless, [17] shows that for some simple cases it is possible to derive analytical solutions to this problem, in particular for FIR and ARX models.
6.2 FIR
Consider the FIR system
y (t) = q −nk B (q, θ) u (t) + e (t)
(6.8)
and a non-minimum phase zero zk0 of B (q, θ0 ).
The optimal input in this case has the following spectrum [17]:

Φu(z) = Q(z) Q(z)*,   (6.9)

Q(z) = (α/√γ) · √( 1 − (zk0)⁻² ) / ( 1 − (zk0)⁻¹ z⁻¹ ).   (6.10)

The variance of the zero estimate is γ and the required input energy is α²/γ. Notice that the optimal input is independent of the model order, which nevertheless has to be equal to or greater than the true system order. Consequently, if there is no undermodeling and the optimal input is chosen, the variance of the zero estimate will not depend on the model order either [6].
6.3 ARX
In [6] it is proven that for the ARX model

y(t) = ( q−nk B(q, θb) / A(q, θa) ) u(t) + ( 1 / A(q, θa) ) e(t),   (6.11)

the optimal input spectrum is exactly the same as that found in Section 6.2.
The same observations made in the previous section, regarding the independence of the input spectrum and of the variance of the zero estimate from the model order, can be repeated in this case. Furthermore, it is possible to see that the optimal input spectrum and the variance of the zero estimate are also independent of the polynomial A.
6.4 General linear SISO systems
For general SISO models it is possible to derive analytical solutions to the input design problem based only on the asymptotic covariance expression (6.5). In that case, the optimal input spectrum found in [6] is

Φu(z) = Q(z) Q(z)*,   (6.12)

Q(z) = ( α |H(zk0) A(zk0)| / √γ ) · √( 1 − (zk0)⁻² ) / ( 1 − (zk0)⁻¹ z⁻¹ ).   (6.13)
The analytic expression (6.13) found for general SISO systems is correct when the model order is sufficiently high. If this is not the case, problem (6.7) should be solved numerically. By a suitable parametrization of the input spectrum, it is possible to write the problem as a finite-dimensional convex problem that can be easily solved by efficient numerical optimization methods [17]. Parametrizations of the input spectrum were discussed in Section 3.3.
In [17] a numerical example shows that when the model order equals
the real system order the optimal input is in general a sum of sinusoids
of different frequencies, while if the model order is increased, the optimal
spectrum has a different shape, given by the expression (6.13).
The same paper also shows that there is a significant benefit in using the optimal input instead of a white input: the variance of the estimated zero decreases, especially when the zero is close to the unit circle. It is also shown that the input spectrum obtained by replacing the true value of the zero with its estimate is robust with respect to the true zero position.
6.5 Adaptive algorithm for time domain input design

6.5.1 A motivation

So far, only the case in which the real system (2.2) is contained in the model class (2.1) has been considered, and under that condition an expression for the optimal input spectrum was given, valid when the model order is sufficiently high.
It has been shown in [35] that, under certain conditions, an input that is designed to be optimal for a scalar cost function and a full-order model yields experimental data for which reduced-order models can also be used to identify the property of interest. For example, in the zero estimation problem, [35] proves that the input
u (t) = z0−1 u (t − 1) + r (t − 1) ,
(6.14)
where z0 is a non-minimum phase zero of the real system and r is zero mean
white noise with unit variance, can be used to consistently estimate z0 by a
two-parameter FIR model
y (t) = θ1 u (t − 1) + θ2 u (t − 2) .
(6.15)
That means that if the input (6.14) is applied to the real system (2.2) and the NMP zero z0 is estimated as the zero of the FIR model (6.15), that is −θ2/θ1, then the estimation error converges to zero as the number of data goes to infinity.
Since the real value of the zero is not known, the optimal input ought
to be modified in order to be used in practical situations. There are two
main approaches: the first is to design the input taking into account that
the real system is uncertain a priori [3]; the second one is an adaptive design
where the input is successively updated as new information is available [18,
36]. It has been proven in [37] and [38] that when the real system is in
the model set and an ARX model is used, adaptive input design achieves
asymptotically the same accuracy as the optimal input (6.10) based on the
knowledge of the real system. The idea of an adaptive solution to input
design for general SISO systems in case of undermodeling is introduced
in [18]. It iteratively generates the input (6.10) where z0 is replaced by the
latest available estimate of the zero.
6.5.2 Algorithm description
Consider the system (2.2) defined in Section 2.1. Let z0 be a non-minimum
phase zero of B0(q) = B(q, θ0). An FIR model is used to estimate the zero,

y(t) = ϕ(t)T θ,

where ϕ(t)T = [ u(t − 1)  u(t − 2) ], and the input is

u(t) = ρt−1 u(t − 1) + √( 1 − ρt−1² ) r(t − 1),   (6.16)

where r(t) is zero mean and unit variance white noise and ρt = 1/ẑt [18]. The zero estimate is updated by the equations

θ̂(t) = θ̂(t − 1) + (1/t) R⁻¹(ρt−1) ϕ(t) [ y(t) − ŷ(t) ],   (6.17)

ρt = − θ̂1(t) / θ̂2(t),   (6.18)

where

R(ρ) = ( 1  ρ
         ρ  1 ).   (6.19)

The algorithm is a simplified version of the RLS algorithm for the FIR model above, where a constant matrix R(ρ) is used instead of a time-varying matrix Rt(ρ).
Notice that if a perfect estimate of the zero is available, i.e. ρt−1 = z0⁻¹, then the input is generated by the recursion (6.14), providing a consistent estimate of the zero (see Section 6.5.1). The method is therefore consistent: if the input is (6.14), then the zero estimate converges to the true value of the zero and, vice versa, if the zero estimate converges to the true zero, then the algorithm produces the input (6.14).
The proof of the algorithm convergence is given in [18] when G0 is a
stable rational transfer function with exactly one time delay and with at
least one real NMP zero, and when H0 is stable. Actually, a projection
mechanism ought to be introduced to assure convergence. The trajectory
of ẑt is forced to lie in a compact set that is strictly inside the domain of
attraction of the zero z0 .
6.5.3 Algorithm modification for Markov chain signal generation

Observe that the input (6.16) has the spectrum

Φu,adapt(z) = ( 1 − ρt−1² ) / ( (z − ρt−1)(z⁻¹ − ρt−1) ).   (6.20)
Figure 6.1: Representation of the adaptive algorithm iteration at step k. uk denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction ŷ(k).

From the results in Section 4.2, the same spectrum (6.20) can be obtained with a Markov chain signal having state space S = {1, −1} and described by the graph in Figure 4.1, with

p = q = ( ρt−1 + 1 ) / 2.   (6.21)
Therefore, it is possible to generate in the time domain a binary signal
having exactly the spectrum (6.20), by extracting samples from a Markov
chain probability distribution. In this setting, the probability distribution
will be time-dependent, since the transition probabilities are functions of the
zero estimate (then p = pt−1 ). The adaptive solution uses the current zero
estimate to update the probability pt−1 that is used then to generate the next
input value u (t). The algorithm iteration scheme is shown in Figure 6.1.
While the initial solution generates the input by filtering white noise through a time-varying filter, this solution extracts each input sample from a time-varying distribution. This method has the advantage that the input amplitude is constrained directly by its probabilistic model, while the same spectral properties as for the general non-binary inputs are kept.
Notice that when no undermodeling is considered, the optimal or asymptotically optimal input spectra for NMP zero estimation, (6.10) and (6.13), differ only by a scaling gain from the spectrum of the considered Markov chain with p = q = ( z0⁻¹ + 1 ) / 2.
Some numerical results are shown in the next subsection.
6.5.4 Numerical example

Consider the example taken from [18]. The real system is

y(t) = [ (q − 3)(q − 0.1)(q − 0.2)(q + 0.3) / ( q^4 (q − 0.5) ) ] u(t) + [ q / (q − 0.8) ] e0(t),

where e0 is Gaussian white noise with variance λ0 = 0.01. The NMP zero is z0 = 3.
Figure 6.2 presents one typical zero estimate trajectory evaluated by the algorithm presented in Section 6.5.2 and by its modification from Section 6.5.3 that uses a Markov chain input signal.

Figure 6.2: A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

The figure shows similar plots for the two algorithms, both reaching a final value very close to the true zero: ẑMarkov = 3.061 and ẑAR = 3.026.
Figure 6.3 compares the normalized variance of the estimation error of the two algorithms. The variance has been calculated through M = 1000 Monte-Carlo simulations, each consisting of N = 30000 iterations of (6.16)-(6.17). The normalized variance of the estimation error at each time instant k is evaluated by

var(ẑk − z0) = ( 1 / (kM) ) Σ i=1..M ( ẑk,i − z0 )²,   (6.22)

where ẑk,i denotes the estimate at time k in the i-th run.
Figure 6.3: Normalized variance of the estimation error for the adaptive
algorithms described in Sections 6.5.2 and 6.5.3.
The result shows the empirical convergence of both algorithms, since the
normalized variance converges to a constant value.
6.6 Conclusions

This chapter proposed a Markov chain model for NMP zero estimation. For ARX or FIR models, the considered Markov chain has exactly the asymptotically optimal (in the number of data) input spectrum for zero estimation. For general SISO models, the spectrum is asymptotically optimal in both the number of data and the model order.
It has also been shown that the binary signal generated through a Markov chain with a certain probability distribution has the same spectral properties as the AR signal which guarantees consistent estimates of NMP zeros under undermodeling. Therefore, a binary signal can be used to consistently estimate NMP zeros of LTI SISO systems using an FIR output model. The algorithm proposed in [18], which adaptively generates the AR input signal, has been modified to fit the Markov chain input model. A numerical example showed that the new solution has the same empirical convergence properties as the original one, with the advantage that a binary signal is used and therefore any input amplitude constraint is automatically satisfied.
Chapter 7
Summary and future work
This chapter summarizes the results obtained in this thesis and suggests some directions for future work.
7.1 Summary
Input design is an important issue in system identification, since it can optimize the quality of the model constructed from experimental data.
In Chapters 2 and 3, system identification and well-known input design techniques were reviewed.
The objective of this thesis has been to study the input design problem when finite Markov chains are used as models of the input signals. The main advantage of this approach with respect to the frequency domain approach is that the input model directly includes the input amplitude constraints that are always present in practical applications. Secondly, the solution in the probability domain makes it easier to generate the input signal, since its samples can be extracted from the optimal distribution. The Markov chain input model and its properties were described in Chapter 4. Chapter 5 showed that when the objective of the identification is parameter estimation of a linear system, a Markov chain model of the input signal can notably improve system identification: the trace of the covariance matrix of the parameter estimate is lower than the value obtained with the binary signal having the optimal input spectrum. For designing the input probability distribution, a stochastic approximation framework had to be introduced. Finally, in Chapter 6 the NMP zero estimation problem was analyzed. It turned out that a simple two-state Markov chain with a prescribed probability distribution makes it possible to consistently estimate an NMP zero of a linear system under undermodeling conditions.
7.2 Future work

From the results obtained in this thesis, some other potentially interesting problems may be analyzed in future work. Some suggestions are:
• More general Markov chains could be used to model non-binary signals.
• A more explicit analytic expression of the spectrum of the general Markov chain model introduced in Chapter 4 could be useful when choosing the input model state space and order.
• Hammerstein system identification requires the estimation of both the linear and the nonlinear parts of the system. It is known that in some cases this can be done separately, by using binary signals. The problem reduces to the design of the spectrum of a binary signal for the linear subsystem identification; the nonlinear subsystem identification is in general more complex and depends on the input probability distribution. Therefore, it could be interesting to study the performance of a binary signal generated from a Markov chain distribution for Hammerstein system identification.
• The L2-gain of a system gives important information on feedback stability. An accurate estimate of this gain is often necessary for control applications. Stochastic approximation could also be used in this case to design the optimal probability distribution of a Markov chain for L2-gain estimation.
Bibliography
[1] G. C. Goodwin, R. L. Payne, Dynamic system identification: experiment design and data analysis. New York: Academic Press, 1977.
[2] H. Hjalmarsson, “From experiment design to closed-loop control”, Automatica, vol. 41, no. 3, pp. 393-438, 2005.
[3] C. R. Rojas, J. S. Welsh, G. C. Goodwin, A. Feuer, “Robust optimal experiment design for system identification”, Automatica, vol. 43, pp. 993-1008, 2007.
[4] J. Mårtensson, H. Hjalmarsson, “Robust input design using sum of
squares constraints”, in IFAC Symposium on System Identification,
Newcastle, Australia, March 2006, pp.1352-1357.
[5] G. C. Goodwin, J. S. Welsh, A. Feuer, M. Derpich, “Utilizing prior knowledge in robust optimal experiment design”, in IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 1358-1363.
[6] H. Jansson, “Experiment Design with Applications in Identification for
Control”, Doctoral Thesis, KTH, Stockholm 2004.
[7] B. L. Cooley, J. H. Lee, S. P. Boyd, “Control-relevant experiment design: a plant-friendly, LMI-based approach”, in American Control Conference, Philadelphia, Pennsylvania, June 1998, pp. 1240-1244.
[8] H. Hjalmarsson, J. Mårtensson, B. Wahlberg, “On some robustness issues on input design”, in IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 511-516.
[9] X. Bombois, M. Gilson, “Cheapest identification experiment with guaranteed accuracy in the presence of undermodeling”, in IEEE Conference
on Decision and Control, Paradise Island, Bahamas, December 2004,
pp.505-510.
[10] H. Jansson, H. Hjalmarsson, “Input design via LMIs admitting
frequency-wise model specifications in confidence regions”, IEEE
Transactions on Automatic Control, vol. 50, no. 10, pp. 1534-1549,
2005.
[11] K. Lindqvist, H. Hjalmarsson, “Optimal input design using linear matrix inequalities”, in IFAC Symposium on System Identification, Santa
Barbara, California, USA, July 2000.
[12] M. Barenthin, “On Input Design in System Identification for Control”,
Licentiate Thesis in Automatic Control, KTH, Stockholm 2006.
[13] C. R. Rojas, J. S. Welsh, G. C. Goodwin, “A receding horizon algorithm to generate binary signals with a prescribed autocovariance”,
Proceedings of the ACC’07 Conference, 2007, New York, USA.
[14] H. Suzuki, T. Sugie, “On input design for system identification in time domain”, Proceedings of the European Control Conference 2007, Kos, Greece, July 2-5, 2007.
[15] G. C. Pflug, Optimization of Stochastic Models, Kluwer Academic Publishers, 1996.
[16] L. Ljung, G. Pflug, H. Walk, Stochastic approximation and optimization of random systems, Birkhäuser, 1991.
[17] J. Mårtensson, H. Jansson, H. Hjalmarsson, “Input design for identification of zeros”, Proceedings of the 16th IFAC World Congress on Automatic Control, 2005.
[18] C. R. Rojas, H. Hjalmarsson, L. Gerencsér, J. Mårtensson, “Consistent estimation of real NMP zeros in stable LTI systems of arbitrary complexity”, European Control Conference, 2009.
[19] L. Ljung, System identification: Theory for the user, Second Edition.
Prentice Hall, 1999.
[20] Y. Nesterov, A. Nemirovski, “Interior-point polynomial methods in
convex programming”, Studies in Applied Mathematics 13, SIAM,
Philadelphia, PA, 1994.
[21] S. Boyd, L. Ghaoui, E. Feron, V. Balakrishnan, “Linear matrix inequalities in system and control theory.”, Studies in Applied Mathematics,
SIAM, Philadelphia, 1994.
[22] H. Jansson, H. Hjalmarsson, “A framework for mixed H∞ and H2 input
design”, In MTNS, Leuven, Belgium, July 2004.
[23] V. A. Yakubovich, “Solution of certain matrix inequalities occurring
in the theory of automatic control”, Dokl. Akad. Nauk SSSR, pp.
1304-1307, 1962.
[24] U. Grenander, G. Szegö, Toeplitz forms and their applications. University of California Press, Berkeley, CA, 1958.
[25] A. Lindquist, G. Picci,“Canonical correlation analysis, approximate covariance extension, and identification of stationary time series”, Automatica, 32(5): 709-733, 1996.
[26] P. Stoica, R. Moses, Spectral analysis of signals. Prentice-Hall, Upper
Saddle River, New Jersey, 2005.
[27] J. L. Doob, Stochastic processes. Wiley, New York, 1953.
[28] T. Kailath, Linear systems. Prentice-Hall, Englewood Cliffs, NJ, 1980.
[29] M. Ehrgott, Multicriteria Optimization, Second Edition, Springer,
2006.
[30] E. Zitzler, “Evolutionary algorithms for multiobjective optimization:
methods and applications”, Doctoral thesis, Swiss Federal Institute of
Technology Zurich, 1999.
[31] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation”, IEEE Transactions on
Automatic Control, vol. 37, no. 3, March 1992.
[32] J. C. Spall, “Implementation of the simultaneous perturbation algorithm for stochastic optimization”, IEEE Transactions on Aerospace
and Electronic Systems, vol. 34, no. 3, pp. 817-823, July 1998.
[33] S. Skogestad, I. Postlethwaite, Multivariable Feedback Control: Analysis
and Design, John Wiley and Sons, 2005.
[34] K. Lindqvist, “On experiment design in identification of smooth linear
systems”, Licentiate thesis, TRITA-S3-REG-0103, 2001.
[35] J. Mårtensson, Geometric analysis of stochastic model errors in system
identification, Doctoral thesis, KTH, Stockholm, Sweden, 2007.
[36] L. Gerencsér, H. Hjalmarsson, “Adaptive input design in system identification”, Proceedings of the 44th IEEE Conference on Decision and
Control and European Control Conference, pages 4988-4993, Seville,
Spain, December 12-15 2005.
[37] L. Gerencsér, H. Hjalmarsson, “Identification of ARX systems with
non-stationary inputs - asymptotic analysis with application to adaptive
input design”, Automatica, 2008.
[38] L. Gerencsér, H. Hjalmarsson, J. Mårtensson, “Adaptive input design
for ARX systems”, European Control Conference, Kos, Greece, July 2-5
2007.