Experiment Design with Applications in Identification for Control

Henrik Jansson
TRITA-S3-REG-0404
ISSN 1404–2150
ISBN 91-7283-905-8
Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
Stockholm, Sweden, 2004
Submitted to the School of Electrical Engineering, Royal Institute of
Technology, in partial fulfillment of the requirements for the degree of
Doctor of Philosophy.
Copyright © 2004 by Henrik Jansson
Experiment Design with Applications in Identification for Control
Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
SE-100 44 Stockholm, Sweden
Abstract
The main part of this thesis focuses on optimal experiment design for
system identification within the prediction error framework.
A rather flexible framework for translating optimal experiment design into tractable convex programs is presented. The design variables are the spectral properties of the external excitations. The framework allows for any linear and finite-dimensional parametrization of the design spectrum, or a partial expansion thereof. This includes both continuous and discrete spectra. Constraints on these spectra can be included in the design formulation, either in terms of power bounds or as frequency-wise constraints. As quality constraints, general linear functions of the asymptotic covariance matrix of the estimated parameters can be included. Here, different types of frequency-by-frequency constraints on the frequency function estimate are expected to be an important contribution to the area of identification and control.
For a certain class of linearly parameterized frequency functions it is
possible to derive variance expressions that are exact for finite sample
sizes. Based on these variance expressions it is shown that the optimization over the square of the Discrete Fourier Transform (DFT) coefficients
of the input leads to convex optimization problems.
The optimal input designs are compared to the use of standard identification input signals for two benchmark problems. The results show significant benefits of appropriate input design.
Knowledge of the location of non-minimum phase zeros is very useful
when designing controllers. Both analytical and numerical results on
input design for accurate identification of non-minimum phase zeros are
presented.
A method is presented for the computation of an upper bound on the
maximum over the frequencies of a worst case quality measure, e.g. the
worst case performance achieved by a controller in an ellipsoidal uncertainty region. This problem has until now been solved by using a frequency gridding; here, this is avoided by using the Kalman-Yakubovich-Popov lemma.
The last chapter studies experiment design from the perspective of
controller tuning based on experimental data. Iterative Feedback Tuning
(IFT) is an algorithm that utilizes sensitivity information from closed-loop experiments for controller tuning. This method is experimentally
costly when multivariable systems are considered. Several methods are
proposed to reduce the experimental time by approximating the gradient
of the cost function. One of these methods uses the same technique of
shifting the order of operators as is used in IFT for scalar systems. This
method is further analyzed and sufficient conditions for local convergence
are derived.
Acknowledgments
The time as a PhD student has been a great experience. I would like to
express my sincere gratitude to my supervisors during these years; Professor Bo Wahlberg, Professor Håkan Hjalmarsson, and Docent Anders
Hansson. You have all been very helpful.
A special thanks goes to Håkan. It has been really inspiring working with you. You are really creative and I have learned a lot. I really appreciate the time you have spent on me. If I had to pay ”OB-tillägg” I would be a poor guy.
Many thanks go to my collaborators and co-authors Kristian Lindqvist, Jonas Mårtensson and Märta Barenthin. It has been great fun working
with you. Without you, there would be no SYSID Lab.
I would also like to express my gratitude to Karin Karlsson-Eklund
for helping me with different problems and being supportive.
It has been a pleasure to be at the department. For this, I would like
to thank all colleagues, former and present. I have really enjoyed your
company.
Finally, I would like to thank my family for their support.
Contents

1 Introduction
  1.1 System Identification - A Starter
  1.2 Optimal Experiment Design - A Background
    1.2.1 Design for Full Order Models
    1.2.2 Design for High Order Models
    1.2.3 The Return of Full Order Modeling
    1.2.4 Other Contributions
  1.3 Iterative Feedback Tuning
  1.4 Contributions and Outline

2 Parameter Estimation and Connections to Input Design
  2.1 Parameter Estimation
  2.2 Uncertainty in the Parameter Estimates
    2.2.1 Parameter Covariance
    2.2.2 Confidence Bounds for Estimated Parameters
  2.3 Uncertainty of Frequency Function Estimates
    2.3.1 Variance of Frequency Function Estimates
    2.3.2 Uncertainty Descriptions Based on Parametric Confidence Regions
  2.4 Summary

3 Fundamentals of Experiment Design
  3.1 Introduction
    3.1.1 Quality Constraints
    3.1.2 Signal Constraints
    3.1.3 Objective Functions
    3.1.4 An Introductory Example
  3.2 Parametrization and Realization of the Input Spectrum
    3.2.1 Introduction
    3.2.2 Finite Dimensional Spectrum Parametrization
    3.2.3 Partial Correlation Parametrization
    3.2.4 Summary
  3.3 Parametrizations of the Covariance Matrix
    3.3.1 Complete Parametrizations of the Covariance Matrix
    3.3.2 A Parametrization Based on a Finite Dimensional Spectrum
    3.3.3 Summary
  3.4 Parametrization of Signal Constraints
    3.4.1 Parametrization of Power Constraints
    3.4.2 Parametrization of Point-wise Constraints
  3.5 Experiment Design in Closed-loop
    3.5.1 Spectrum Representations
    3.5.2 Experiment Design in Closed-loop with a Fixed Controller
    3.5.3 Experiment Design in Closed-loop with a Free Controller
  3.6 Quality Constraints
    3.6.1 Convex Representation of Quality Constraints
    3.6.2 Application of the KYP-lemma to Quality Constraints
  3.7 Quality Constraints in Ellipsoidal Regions
    3.7.1 Reformulation as a Convex Problem
    3.7.2 A Finite Dimensional Formulation
  3.8 Biased Noise Dynamics
    3.8.1 Weighted Variance Constraints
    3.8.2 Parametric Confidence Ellipsoids
  3.9 Computational Aspects
  3.10 Robustness Aspects
    3.10.1 Input Spectrum Parametrization
    3.10.2 Working with Sets of a Priori Models
    3.10.3 Adaptation
    3.10.4 Low and High Order Models and Optimal Input Design
  3.11 Framework Review and Numerical Illustration

4 Finite Sample Input Design for Linearly Parametrized Models
  4.1 Introduction
  4.2 Discrete Fourier Transform Representation of Signals
  4.3 Least-squares Estimation
  4.4 Input Design
    4.4.1 Geometric Programming Solution
    4.4.2 LMI Solution
    4.4.3 Numerical Illustration
    4.4.4 Closed-form Solution
    4.4.5 Input Design Based on Over-parametrized Models
  4.5 Conclusions

5 Applications
  5.1 Introduction
  5.2 A Process Control Application
    5.2.1 Optimal Design Compared to White Input Signals
    5.2.2 Optimal Input Design in Practice
  5.3 A Mechanical System Application
    5.3.1 Optimal Design Compared to White Input Signals
    5.3.2 Input Design in a Practical Situation
    5.3.3 Sensitivity of the Optimal Design
  5.4 Conclusions

6 Input Design for Identification of Zeros
  6.1 Introduction
  6.2 Estimation of Parameters and Zeros
    6.2.1 Parameter Estimation
    6.2.2 Estimation of Zeros
  6.3 Input Design - Analytical Results
    6.3.1 Input Design for Finite Model Orders
    6.3.2 Input Design for High-order Systems
    6.3.3 Realization of Optimal Inputs
  6.4 Input Design - A Numerical Solution
    6.4.1 Numerical Example
  6.5 Sensitivity and Benefits
    6.5.1 Numerical Example
  6.6 Using Restricted Complexity Models for Identification of Zeros
  6.7 Conclusions

7 Convex Computation of Worst Case Criteria
  7.1 Motivation of Quality Measure
    7.1.1 Parametric Ellipsoidal Constraints
    7.1.2 Worst Case Performance of a Control Design
    7.1.3 The Worst Case Vinnicombe Distance
  7.2 Computation of Worst Case Criterion for a Fixed Frequency
  7.3 Computation of an Upper Bound
  7.4 Numerical Illustrations
    7.4.1 Computation of the Worst Case Vinnicombe Distance
    7.4.2 Computation of Worst Case Control Performance
  7.5 Conclusions

8 Gradient Estimation in IFT for Multivariable Systems
  8.1 Introduction
  8.2 System Description
  8.3 Gradient Estimation in the IFT Framework
  8.4 Gradient Approximations
  8.5 Analysis of Local Convergence
  8.6 Numerical Illustrations
  8.7 Conclusions
  8.A Proofs
    8.A.1 Proof of Theorem 8.1
    8.A.2 Proof of Corollary 8.1
    8.A.3 Proof of Theorem 8.2
    8.A.4 Proof of Corollary 8.2
    8.A.5 Proof of Theorem 8.3

9 Summary and Suggestions for Future Work
  9.1 Summary
  9.2 Future Work
Chapter 1
Introduction
In this thesis different ways of gathering system information based on experimental data are considered. More precisely, we will focus on methods
and situations where it is possible to manipulate some external excitation signals in order to make this information collection more efficient.
A typical system configuration is depicted in Figure 1.1, where S is an
unknown system, u is a known or measured input signal, v represents
unmeasurable external disturbances and y is the measured output of the
system. Possibly, there is also a feedback mechanism, here represented by
the controller K and external reference signals to the closed-loop system
represented by r.
We will consider experiment design from two different perspectives.
The first is how to construct external excitations that yield informative
experiments for system identification. System identification deals with
the topic of obtaining mathematical models of dynamic systems based on
measured data. The nature of the external excitations that act on the system plays an important role for the characteristics of the estimated
models in system identification. Sometimes, these excitations can be
manipulated by the model builder. This is very useful in order to obtain
as much qualitative information as possible. This will include design
in open-loop where the spectral properties of u are shaped. We will
also consider situations where there is a feedback mechanism involved that makes direct manipulation of u impossible, but where u can be manipulated indirectly by external reference signals and/or via the feedback mechanism itself.
Figure 1.1: A system S with output y, input u and disturbance v.

The second perspective is related to performance calibration of a controller by using experimental data, obtained from special experiments on
the real process in feedback with the controller to be tuned. The starting
point is the system depicted in Figure 1.1 where the feedback loop is
closed. By manipulating the reference signal r it is possible to retrieve
information about the closed-loop sensitivity to the parameters ρ that parametrize the controller K(ρ). Iterative Feedback Tuning (IFT) is a
method that uses such sensitivity information to tune controllers. However, the experiment load grows substantially when multivariable systems
are considered. We will study different ways of approximating the gradient estimate to reduce the number of experiments.
This chapter contains short introductions to experiment design for
system identification and to controller tuning using IFT.
1.1 System Identification - A Starter
System identification is a mature subject and there exist several methods. Applications of these can be found in almost all engineering disciplines. The theory has been thoroughly treated in several books, including
(Bohlin, 1991; Eykhoff, 1974; Goodwin and Payne, 1977; Hannan and
Deistler, 1988; Johansson, 1993; Pintelon and Schoukens, 2001). This
thesis considers only identification in the prediction error framework
for which (Ljung, 1999; Söderström and Stoica, 1989) serve as excellent
sources. Furthermore, we will restrict the attention to identification of
discrete time linear time-invariant single input/ single output systems. It
should be remarked that the prediction error method is not restricted to
this class of systems. The system can be both continuous-time and non-linear, as well as having several inputs and outputs.
The method consists of basically two steps. The first is to choose
a model class parametrized by some parameter vector. The second is
to find the model within the model class that minimizes the prediction
error. To illustrate this consider the following example.
Example 1.1
Consider that we have a set of observed input/output data {u(t), y(t)}_{t=1}^{N}
where u and y are real and scalar valued. Based on these observations we
want to find a mathematical relation that describes the coupling between
the input and output data. Let us use a prediction error approach to this
modelling. The first step is to define a parametrized model class that
corresponds to our assumptions of how the data has been generated. One
example is to propose the following second order Finite Impulse Response
(FIR) model
y(t, θ) = b0 u(t − 1) + b1 u(t − 2) + e(t).
(1.1)
Here b0 and b1 are unknown parameters to be estimated and e(t) represents an unmeasurable disturbance in terms of white noise with variance
λ. The parameters are typically collected in a vector θ^T = [b0  b1]. The
objective is thus to estimate θ which in the prediction error framework
is done by minimizing the difference between the measured output y and
the prediction of y based on past data. The best one-step ahead predictor
of the model (1.1) is given by
ŷ(t, θ) = b0 u(t − 1) + b1 u(t − 2)
        = [b0  b1] [u(t − 1) ; u(t − 2)]
        = θ^T ϕ(t).    (1.2)
A common way to obtain the estimate θ̂_N of θ is by minimizing a least-squares criterion, i.e.

θ̂_N = arg min_θ (1/(2N)) Σ_{t=1}^{N} (y(t) − ŷ(t, θ))².    (1.3)
Since the predictor is linear in the parameters, a closed-form solution of θ̂_N can be obtained. The solution to (1.3) is given by

θ̂_N = ( Σ_{t=1}^{N} ϕ(t)ϕ^T(t) )^{−1} Σ_{t=1}^{N} ϕ(t)y(t).    (1.4)
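The least-squares estimate (1.4) is straightforward to compute numerically. The following sketch (in Python with NumPy; the true parameter values and data lengths are illustrative, not from the thesis) estimates the FIR model (1.1) from simulated data:

```python
import numpy as np

def fir2_least_squares(u, y):
    """Least-squares estimate of the second order FIR model
    y(t) = b0*u(t-1) + b1*u(t-2) + e(t), cf. (1.1)-(1.4)."""
    N = len(y)
    # Regressor phi(t) = [u(t-1), u(t-2)]^T, available for t = 2, ..., N-1
    Phi = np.column_stack([u[1:N-1], u[0:N-2]])
    # Solving the normal equations (1.4) via a numerically stable lstsq
    theta_hat, *_ = np.linalg.lstsq(Phi, y[2:N], rcond=None)
    return theta_hat

# Simulate data from a hypothetical true FIR system with b0 = 1.0, b1 = 0.5
rng = np.random.default_rng(0)
N = 10000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
y[2:] = 1.0 * u[1:-1] + 0.5 * u[:-2] + e[2:]

b0_hat, b1_hat = fir2_least_squares(u, y)
```

With this much data and little noise, the estimates land very close to the simulated parameters, illustrating the consistency of the least-squares estimate for this model structure.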
As can be imagined, the identification includes several choices to be made by the user. These include how to perform the identification experiment, what model structure to choose, which identification method or criterion to use and what validation methods to apply. This is a quite complex process. It is important to stress that even though the identification can be separated into different sub-problems, these are in general dependent on each other. Example 1.1 gives only a simplified picture
of the model estimation step with a prediction error based method.
A prediction error method is a natural and logical framework for dealing with the system identification problem. Besides its intuitive appeal, there exists a large body of statistical results that supports the method. The following example shows some of the statistical properties
method. The following example shows some of the statistical properties
for the least-squares estimate (1.4) together with a simple illustration of
an optimal input design.
Example 1.2 (Example 1.1 continued)
Here we will illustrate some accuracy properties of the obtained estimate
in Example 1.1. For this we will assume that the true data has been
generated by

y(t) = b_0^o u(t − 1) + b_1^o u(t − 2) + e_o(t)    (1.5)

i.e. the true system has the same structure as the model (1.1). Now study the model estimate defined by (1.3). Introduce the quantities

θ_o^T = [b_0^o  b_1^o],    (1.6)

R_N = Σ_{t=1}^{N} ϕ(t)ϕ^T(t)    (1.7)

and

f_N = Σ_{t=1}^{N} ϕ(t)y(t).    (1.8)

Provided that R_N is invertible, the estimate θ̂_N can be written as

θ̂_N = R_N^{−1} f_N = θ_o + R_N^{−1} Σ_{t=1}^{N} ϕ(t)e_o(t)    (1.9)
Assume that u is a zero mean stationary stochastic process that is uncorrelated with e_o. Then θ̂_N is an unbiased estimate of θ_o, i.e. E θ̂_N = θ_o. Furthermore, it can be shown that the covariance of θ̂_N obeys

lim_{N→∞} N Cov θ̂_N = λ_o [ r_0  r_1 ; r_1  r_0 ]^{−1}    (1.10)

where λ_o is the noise variance, E e_o²(t) = λ_o, and r_k = E u(t)u(t − k). A common approximation of the covariance for finite data is thus

Cov θ̂_N ≈ (λ_o/N) [ r_0  r_1 ; r_1  r_0 ]^{−1}.    (1.11)

Consider e.g. the variance of each parameter:

Var b̂_0 ≈ λ_o r_0 / (N(r_0² − r_1²)) ≈ Var b̂_1.    (1.12)
Hence, the variance decays as 1/N. Furthermore, the larger the input variance r_0, the smaller the variance of the parameters. This holds in general for linear systems. Notice that if the objective is to minimize the variance of the parameters according to (1.12) and the input variance is constrained, then the optimal input is white noise, i.e. r_1 = 0. This is not necessarily true for other objectives though, see e.g. Example 3.4.
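The finite-data approximation (1.11) and the resulting parameter variance (1.12) are easy to check by simulation. A minimal Monte Carlo sketch, assuming a white input so that r_1 = 0 (the parameter values, noise level and run counts below are illustrative):

```python
import numpy as np

# Monte Carlo check of (1.11)/(1.12): with white input (r1 = 0) we expect
# Var b_hat ≈ lambda_o / (N * r0) for both parameters.
rng = np.random.default_rng(1)
N, runs = 500, 2000
b_true = np.array([1.0, 0.5])
lam, r0 = 0.25, 1.0               # noise variance lambda_o and input variance r0
est = np.empty((runs, 2))
for k in range(runs):
    u = np.sqrt(r0) * rng.standard_normal(N)
    e = np.sqrt(lam) * rng.standard_normal(N)
    y = np.zeros(N)
    y[2:] = b_true[0] * u[1:-1] + b_true[1] * u[:-2] + e[2:]
    Phi = np.column_stack([u[1:N-1], u[0:N-2]])
    est[k] = np.linalg.lstsq(Phi, y[2:], rcond=None)[0]

sample_var = est.var(axis=0)      # empirical variance over the runs
predicted = lam / (N * r0)        # (1.12) with r1 = 0
```

The sample variances come out close to the prediction of (1.12), and halving N roughly doubles them, illustrating the 1/N decay.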
Since the true covariance of θ̂N is typically difficult to explicitly compute
for finite N , it is common to use the approximation (1.11), that is based
on the asymptotic covariance, for experiment design purposes. This is
a good approximation provided the data length is sufficiently large and
that the model class captures the dynamics of the true system.
For the input design problem in Example 1.1 it was easy to compute an explicit solution. However, optimal input design problems in their original form are in general non-trivial to solve. In Chapter 3 we will study the fundamentals of the optimal experiment design problem in more detail.
In the next section, we will give a very brief historical background on
optimal experiment design.
1.2 Optimal Experiment Design - A Background
The focus in this section will be directed towards identification of linear systems within the prediction error framework, where variance errors are the only concern. Even though these assumptions may seem very restrictive, this is the most common setting when optimal experiment design is considered, largely because of the basis of theoretical understanding that exists for this setting. Some of this theory, developed for the prediction error framework regarding model uncertainty, will be presented in
Chapter 2.
Experiment design as a subject dates far back and is a much wider subject than that related to parameter estimation of dynamical systems. In statistics, a huge amount of literature exists, but only a smaller part is related to input design, of which (Box and Jenkins, 1970) is one example. This is largely due to the fact that in general there is no controllable input in statistical time series. There are, however, several important contributions in statistics that have been very useful for the work on input design for linear dynamic systems. Many of these results are related to the work on static linear regression models, see e.g. (Kiefer, 1959), (Kiefer and Wolfowitz, 1959), (Karlin and Studden, 1966a), (Karlin and Studden, 1966b) and (Fedorov, 1972).
1.2.1 Design for Full Order Models
Input design for linear dynamic systems started out around 1960, where (Levin, 1960) is one of the earliest contributions. In the 1970's there was a vigorous activity in this area, for which (Mehra, 1974), (Goodwin and Payne, 1977), (Zarrop, 1979), (Mehra, 1981) and (Goodwin, 1982) serve as excellent surveys.
Let us summarize some of the main points in this work. Let P denote
the asymptotic parameter covariance matrix. To measure the goodness
of different designs, different measures of the covariance matrix of the
estimated parameters have been used. The covariance matrix provides
a measure of the average difference between the estimate and the true
value. The classical approach has been to minimize some scalar function
of the asymptotic covariance matrix P with constraints on input and/or
output power. Examples of commonly used criteria are
A-optimality: min Tr P    (1.13)
E-optimality: min λ_max(P)    (1.14)
D-optimality: min det P    (1.15)
L-optimality: min Tr W P    (1.16)
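For a given covariance matrix P, these criteria are simple matrix functionals. A small sketch computing (1.13)-(1.16) with NumPy (the example matrix P and weighting W are hypothetical):

```python
import numpy as np

def design_criteria(P, W=None):
    """Scalar measures (1.13)-(1.16) of a parameter covariance matrix P."""
    W = np.eye(P.shape[0]) if W is None else W
    return {
        "A": np.trace(P),                     # A-optimality: Tr P
        "E": np.linalg.eigvalsh(P).max(),     # E-optimality: largest eigenvalue
        "D": np.linalg.det(P),                # D-optimality: determinant
        "L": np.trace(W @ P),                 # L-optimality: weighted trace
    }

# Hypothetical 2x2 covariance matrix
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
crit = design_criteria(P)
```

The different criteria generally rank designs differently: A- and E-optimality penalize the overall and the worst-case parameter spread, respectively, while D-optimality measures the volume of the confidence ellipsoid.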
Both design in the time domain and in the frequency domain have
been considered. In the time domain the design typically reduces to a
nonlinear optimal control problem with N free variables, i.e. the number
of variables equals the data length. The complexity was one of the reasons that motivated researchers to do the input design in the frequency
domain. By assuming large data lengths and by restricting the class of
allowable inputs to those having a spectral representation it is possible to
derive nice expressions for the asymptotic covariance matrix in terms of
its inverse, i.e. P −1 . Furthermore, design in the frequency domain makes
it in general easier to interpret the results.
Let the number of estimated parameters be p; then any P^{−1}, which is a symmetric matrix, can be characterized by p(p + 1)/2 parameters. An important issue of the work in the context of (Mehra, 1974), (Goodwin and Payne, 1977), (Zarrop, 1979) and others was to find a set of finitely parametrized inputs that parametrizes all achievable information matrices (P^{−1}). An important observation was that for input power constrained designs, all achievable information matrices can be obtained using a sum of sinusoids comprising not more than p(p + 1)/2 + 1 components, see (Mehra, 1974). This is a consequence of Carathéodory's theorem. For single input/single output linear systems this number can be further reduced by exploiting the structure of P^{−1}. It is shown in (Goodwin and Payne, 1977) that for an mth order transfer function with 2m parameters, 2m sinusoids are sufficient. Some further refinements of this are presented in (Zarrop, 1979), where under some geometric considerations together with the theory of Tschebycheff systems (Karlin and Studden, 1966b) it is possible to reduce the number of required sinusoids to m, which is the lowest possible number still yielding an informative experiment (Ljung, 1999).
Most of the work during this time period was on optimal input design
in open-loop. But there are also some contributions related to experiment
design in closed-loop, see e.g. (Ng et al., 1977a; Ng et al., 1977b; Gustavsson et al., 1977; Goodwin and Payne, 1977).
1.2.2 Design for High Order Models
In the 1980’s input design attained renewed interest when the control
community recognized the utility of experiment design as a way to obtain
suitable models for control design. The starting point was the derivation
in (Ljung, 1985) of an expression for the variance of the estimated frequency functions that is asymptotic in both the model order and the data length.
Assume that the true system and the model are described by the linear relation

y(t) = G(q, θ)u(t) + H(q, θ)e_o(t)
where G and H denote discrete transfer functions describing the system
and the noise dynamics, respectively. The novel contribution in (Ljung, 1985) has led to the well-known approximation

Cov [ G(e^{jω}, θ̂_N) ; H(e^{jω}, θ̂_N) ] ≈ (m/N) Φ_v(ω) [ Φ_u(ω)  Φ_{ue}(ω) ; Φ_{ue}(−ω)  λ_o ]^{−1}    (1.17)

for finite m and N, where m is the model order. Here Φ_u is the input spectrum, Φ_{ue} is the cross spectrum between input and noise, which is zero in open-loop operation, and Φ_v is the spectrum of the disturbance. Furthermore, the variance of e_o is λ_o.
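In open-loop operation Φ_{ue} = 0, and the (1,1) block of (1.17) reduces to Var G(e^{jω}, θ̂_N) ≈ (m/N) Φ_v(ω)/Φ_u(ω). A sketch of this specialization (the spectra chosen below are illustrative):

```python
import numpy as np

def var_G_high_order(omega, m, N, Phi_v, Phi_u):
    """Open-loop specialization of (1.17): with Phi_ue = 0 the covariance
    matrix in (1.17) is block diagonal, and the G-block gives
    Var G(e^{jw}) ≈ (m/N) * Phi_v(w) / Phi_u(w)."""
    return (m / N) * Phi_v(omega) / Phi_u(omega)

# Example: white unit-variance input and white output-error noise
# (lambda_o = 1, H = 1), so both spectra are flat
Phi_u = lambda w: np.ones_like(w)
Phi_v = lambda w: np.ones_like(w)
w = np.linspace(1e-3, np.pi, 500)
var_approx = var_G_high_order(w, m=1, N=1000, Phi_v=Phi_v, Phi_u=Phi_u)
```

With flat spectra the predicted variance is the constant m/N at every frequency, which makes the frequency-shaping role of Φ_u in (1.17) easy to see: boosting input power in a band lowers the predicted variance there proportionally.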
Optimal experiment designs based on different variants of the variance
expression (1.17) have appeared in (Ljung, 1985), (Yuan and Ljung, 1985),
(Yuan and Ljung, 1984) and (Gevers and Ljung, 1986). Some of the key
points in this line of research are:
• To put the intended use of the model into focus when assessing the quality measure in the input design. The performance degradation is measured by the function

J = ∫_{−π}^{π} Tr { C(ω) Φ_v(ω) [ Φ_u(ω)  Φ_{ue}(ω) ; Φ_{ue}(−ω)  λ_o ]^{−1} } dω    (1.18)

where the weighting matrix

C(ω) = [ C_11(ω)  C_12(ω) ; C_21(ω)  C_22(ω) ]

depends on the intended application, e.g. simulation, prediction or control, cf. (Ljung, 1985), (Gevers and Ljung, 1986) and (Ljung, 1999).
• The high order assumption introduces a certain insensitivity to the order of the true system.
• The optimal experiment design can be explicitly calculated.
Other contributions in this line of research are (Hjalmarsson et al.,
1996), (Forssell and Ljung, 2000) and (Zhu and van den Bosch, 2000)
where different aspects of closed-loop identification in the context of control applications have been studied.
The contributions based on the high order variance expression (1.17)
have proven very successful, both in exposing the fundamentals of the input design problem and in practical applications, see e.g. (Zhu, 1998).
As has been mentioned, basing a method on variance results that are asymptotic in the model order yields a certain robustness against the properties of the underlying system. However, it has been shown in e.g. (Ninness and Hjalmarsson, 2002b) that these high order expressions are not always accurate. This is the major drawback of these methods, especially for low-order modeling, where the accuracy of the high order variance expression in many cases is far from acceptable. This is illustrated in the following simple example.
Example 1.3
Consider modeling of the output-error system
y(t) = (0.1/(q − 0.9)) u(t) + e(t)    (1.19)
where both the input u and the noise e are realizations of zero mean, unit
variance white noise processes. Based on 1000 samples of input/output
data points, a first order output-error model G(ejω , θ̂N ) is estimated. The
sample variance of this frequency function estimate based on 1000 experiments with different noise realizations is plotted in Figure 1.2 together
with the high order approximation (1.17). The high order approximation
is clearly a bad approximation for the variance of these model estimates
except for frequencies around ω = 0.7 rad.
1.2.3 The Return of Full Order Modeling
The inaccuracy of the high order variance expression (1.17) for finite model orders has led to a renewed interest in full order modeling and experiment designs based on more accurate variance approximations. Contributions that do not rely on high order variance expressions are (Cooley and Lee, 2001), (Lee, 2003) and (Lindqvist and Hjalmarsson, 2001). They
are based on the first order Taylor expansion
Var G(e^{jω}, θ̂_N) ≈ (λ/N) (dG*(e^{jω}, θ_o)/dθ) P (dG(e^{jω}, θ_o)/dθ)    (1.20)
Figure 1.2: The variance comparison in Example 1.3. The sample variance of G(e^{jω}, θ̂_N) (solid) and the high order variance expression (dashed).
(P is the asymptotic covariance matrix for the parameter estimate) of the
variance of the frequency function estimate G(ejω , θ̂N ). The approximate
variance expression (1.20) is actually the basis for the high order approximation (1.17). However, (1.20) is a good approximation for any model
order (as long as the sample size N is large enough) and can therefore
be expected to be a better approximation than the associated high order
expression, provided of course that the chosen model order is no lower
than the true system order. We examine this next in a simple example.
Example 1.4 (Example 1.3 continued)
The variance approximation (1.20) is compared to the high order approximation and the true variability in Figure 1.3 for the system in Example 1.3. Since the true parameters typically are unknown, the expression
(1.20) has also been evaluated for two different pole locations, 0.85 and
0.95. The Taylor expansion (1.20) does actually only depend on the pole
location and not on the gain of the system for this example. This is
proved in (Ninness and Hjalmarsson, 2004).
Figure 1.3: The variance comparison in Example 1.4. The solid line is the sample variance of G(e^{jω}, θ̂_N). The dashed line is the high order variance expression (1.17). The dash-dotted line is the Taylor approximation (1.20) for the true system parameters. The dotted lines represent (1.20) based on the perturbed pole locations 0.85 (high static gain of variance) and 0.95 (low static gain of variance), respectively.
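The Taylor approximation (1.20) is easy to evaluate for a first order model G(q, θ) = b/(q − a), whose parameter gradient is available in closed form. A sketch (the covariance matrix P below is hypothetical, standing in for the asymptotic covariance of the parameter estimate):

```python
import numpy as np

def dG_dtheta(w, b, a):
    """Gradient of G(e^{jw}, theta) = b / (e^{jw} - a) w.r.t. theta = [b, a]."""
    z = np.exp(1j * w)
    return np.array([1.0 / (z - a), b / (z - a) ** 2])

def var_G_taylor(w, theta, P, lam, N):
    """First order Taylor approximation (1.20) of Var G(e^{jw}, theta_hat)."""
    g = dG_dtheta(w, *theta)
    # Hermitian quadratic form dG*/dtheta P dG/dtheta; real for symmetric P
    return (lam / N) * np.real(np.conj(g) @ P @ g)

# Parameters of the system in Example 1.3 and a hypothetical covariance P
theta_o = (0.1, 0.9)
P = np.array([[0.02, 0.01],
              [0.01, 0.03]])
v = var_G_taylor(0.5, theta_o, P, lam=1.0, N=1000)
```

Because (1.20) is a positive semidefinite quadratic form in the gradient, the predicted variance is nonnegative at every frequency, and it inherits the 1/N decay directly from the prefactor.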
The large discrepancy of the high order variance expression from the true
variance for the system in Example 1.4 suggests that basing the input
design on such an approximation may result in an inappropriate design
for this particular system. This is confirmed in the next example.
Example 1.5 (Example 1.4 continued)
Let

G(q) = 0.1 / (q − 0.9)

and consider an input design problem where there is a frequency-by-frequency bound on the variance of the first order model G(e^{jω}, θ̂N). This can be posed as

minimize_{Φu}   α
subject to      Var G(e^{jω}, θ̂N) ≤ |W(e^{jω})|²,  ∀ω
                (1/(2π)) ∫_{−π}^{π} Φu(ω) dω ≤ α.          (1.21)
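Under the high order variance approximation (1.17)/(2.24), the frequency-by-frequency constraint in (1.21) can be met with equality at every frequency, which gives a closed-form (if naive) design Φu(ω) = (m/N) Φv(ω)/|W(e^{jω})|². A minimal numerical sketch, where the weight W and the noise spectrum are hypothetical choices, not the ones used in the thesis:

```python
import numpy as np

m, N = 1, 1000
w = np.linspace(1e-3, np.pi, 2000)
Phi_v = np.ones_like(w)                          # hypothetical white noise, unit variance
W2 = 0.01 / np.abs(np.exp(1j * w) - 0.95) ** 2   # hypothetical variance bound |W(e^{jw})|^2
# smallest input spectrum meeting Var G ~ (m/N) Phi_v/Phi_u <= |W|^2 pointwise
Phi_u = (m / N) * Phi_v / W2
power = np.mean(Phi_u)                           # ~ (1/2pi) int_{-pi}^{pi} Phi_u dw for an even spectrum
print(power)
```

Note how this design simply inverts the variance bound; Example 1.5 shows that for a low true order such a design can be very conservative in input power.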
Figure 1.4: Illustration of Example 1.5. Thin solid line: the variance
bound |W (ejω )|2 which equals the designed variance for the solution
based on the high order variance expression. Dash-dotted line: designed
variance based on Taylor approximation. Thick solid line: sample variance for input design based on Taylor approximation. Dashed line: sample variance for input design based on high order variance approximation.
Here W (ejω ) is a stable transfer function and the objective is to minimize the input power. Now we compare two different designs. In both
designs the true variance is approximated. The first design is based on
the high order variance approximation (1.17) with m = 1 and the second
is based on the Taylor approximation (1.20). The designed variances of
the frequency function estimate together with the corresponding sample
variances based on 1000 Monte-Carlo simulations are illustrated in Figure 1.4. To use a model order as low as m = 1 in the high order expression is a bit unfair. Therefore, the previous designs are compared with a design based on the high order variance approximation but with m = 23. The sample variance of G(e^{jω}, θ̂N) for this design is shown in Figure 1.4. This design almost satisfies the variance bound. The price paid is that the input power is almost 8 times larger than for the design based on the Taylor approximation.
The previous example further motivates the interest in studying optimal experiment designs based on uncertainty descriptions that are more accurate for finite model orders than the high order expression (1.17).
Many of the quality constraints that have been suggested in the literature are based on L2-norms, see e.g. (1.18), from which no conclusions can be drawn about stability when control applications are concerned. From the perspective of identification for robust control, frequency-by-frequency bounds are in many cases more relevant. Different frequency-by-frequency constraints on the model quality have been incorporated into experiment designs in (Hildebrand and Gevers, 2003), (Jansson and Hjalmarsson, 2004c), (Jansson and Hjalmarsson, 2004b), (Bombois et al., 2004c) and (Bombois et al., 2004b), cf. Example 1.5.
The contributions (Hildebrand and Gevers, 2003), (Jansson and Hjalmarsson, 2004c) and (Jansson and Hjalmarsson, 2004b) also consider parametric uncertainties in terms of confidence ellipsoids. These ellipsoids take the form

Uθ = {θ | N (θ − θo)^T P^{-1} (θ − θo) ≤ χ²α(n)}.   (1.22)
The shape of the ellipsoids will depend on the experimental conditions,
see (Ljung, 1999). This means that it is possible to design inputs such
that the quality objective is satisfied for all systems in a confidence region resulting from the identification experiment. A variant of the uncertainty set (1.22) is considered in (Bombois et al., 2004c) and (Bombois et
al., 2004b) where a projection into the Nyquist plane of the parametric
uncertainty set (1.22) is used.
A major driver that has made it possible to use more refined model uncertainty descriptions such as (1.20) and (1.22) is the great advances within the optimization community on convex optimization, see e.g. (Boyd et al., 1994), (Boyd and Vandenberghe, 2003) and (Nesterov and Nemirovski, 1994).
1.2.4 Other Contributions
The introduction of optimal experiment design given in Section 1.2 gives
only a very brief overview of the field. Focus has been on contributions
related to linear systems where variance errors are the only concern. For
more detailed information and references, we refer to the good surveys
(Mehra, 1974; Goodwin and Payne, 1977; Zarrop, 1979; Goodwin, 1982; Ljung, 1999) and, for the design of periodic signals, (Pintelon and Schoukens, 2001). More details on this topic will also be presented in the subsequent chapters.

Figure 1.5: Feedback system with reference r, error e, controller C(ρ), input u, output y, disturbance v and true system Go.
So-called "plant-friendly" system identification has been on the agenda
within the chemical process control research community, see (Rivera et
al., 2002; Rivera et al., 2003; Lee, 2003) and the references therein. The
general objective in this line of research is to produce informative data
subject to constraints on the input/output amplitude and variation in the
time domain. Thus, more focus has been on design in the time domain.
1.3 Iterative Feedback Tuning
In this section we will turn from the topic of optimal experiment design for identification to experiment design from a different perspective. Here we will
give a short introduction to a method called Iterative Feedback Tuning
(IFT). The purpose of IFT is to tune a controller with known structure based on experimental data. The method was originally introduced
in (Hjalmarsson et al., 1994) and a general presentation has appeared in
(Hjalmarsson et al., 1998). For a recent overview see (Hjalmarsson, 2002).
The control performance objective can often be formulated as some
cost function. In IFT almost any signal-based criterion can be used. The
IFT algorithm provides estimates of the gradient of this criterion based
on closed-loop experiments. The cost function is then minimized by some
gradient-based search method.
Consider the closed-loop system in Figure 1.5 for which the output
can be expressed as
y(ρ) = (I + Go C(ρ))^{-1} (Go C(ρ) r + v).   (1.23)
Here we have assumed that the system is discrete, linear, multivariable and time-invariant. Furthermore, ρ is a real-valued vector that
parametrizes the controller. Now differentiate the output with respect
to the ith element of ρ. This gives
∂y(ρ)/∂ρi = (I + Go C(ρ))^{-1} Go (∂C(ρ)/∂ρi) (r − y(ρ)).   (1.24)
From (1.24) it is easy to see that a realization of ∂y(ρ)/∂ρi can be obtained by performing two experiments on the real closed-loop system. In
the first experiment the output y(ρ) is collected under normal operational
conditions. In the second experiment, the signal (∂C(ρ)/∂ρi )(r − y(ρ))
is added to the input u while the normal reference is set to zero. This is, however, a rather inefficient way to retrieve gradient information since we
have to repeat this procedure for each element of the vector ρ. The topic
of Chapter 8 is to study different methods to approximate the gradient
(1.24) in order to reduce the experimental burden.
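The two-experiment procedure can be illustrated on a scalar loop. The sketch below uses a hypothetical first-order plant Go(q) = 0.1 q^{-1}/(1 − 0.9 q^{-1}) and a proportional controller C(ρ) = ρ (so ∂C/∂ρ = 1): the second experiment injects (r − y) at the plant input with the reference set to zero, and the resulting output is checked against a finite-difference approximation of ∂y/∂ρ:

```python
import numpy as np

def closed_loop(rho, r, extra=None):
    """Simulate y(t) = 0.9 y(t-1) + 0.1 u(t-1) with u(t) = rho*(r(t) - y(t)) + extra(t)."""
    y = np.zeros(len(r))
    u = np.zeros(len(r))
    for t in range(len(r)):
        if t > 0:
            y[t] = 0.9 * y[t - 1] + 0.1 * u[t - 1]
        u[t] = rho * (r[t] - y[t]) + (extra[t] if extra is not None else 0.0)
    return y

rho = 0.5
r = np.random.default_rng(0).standard_normal(200)
y1 = closed_loop(rho, r)                                  # experiment 1: normal operation
grad = closed_loop(rho, np.zeros_like(r), extra=r - y1)   # experiment 2: realization of dy/drho
fd = (closed_loop(rho + 1e-6, r) - y1) / 1e-6             # finite-difference check
print(np.max(np.abs(grad - fd)))
```

In the noise-free case the two signals agree up to the finite-difference error; with disturbances present, the second experiment delivers an unbiased but noisy gradient estimate.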
1.4 Contributions and Outline
In this section brief overviews of the different chapters are given together
with the main contributions and the related publications.
Chapter 2
A brief introduction to identification in the prediction error framework
is given in Chapter 2. Several important expressions are reviewed that
quantify the variability in the estimated parameters as well as the corresponding frequency function estimates. These expressions are frequently
used in the experiment designs presented in the subsequent chapters.
Chapter 3
The chapter presents some of the fundamentals for optimal experiment
design within the prediction error framework of system identification.
The general ideas of a rather flexible framework for translating optimal experiment design problems into convex optimization programs are also introduced. The key role is played by the parametrization of the spectrum to be designed. Here we introduce two useful parametrizations of a spectrum that generalize previously suggested parametrizations in e.g. (Goodwin and Payne, 1977; Zarrop, 1979; Stoica and Söderström, 1982; Lindqvist and Hjalmarsson, 2001; Jansson and Hjalmarsson, 2004a). The framework can handle several quality constraints, of which some frequency-wise constraints are expected to be important contributions to the area of
identification and control. Part of the work in Chapter 3 has been submitted as
H. Jansson and H. Hjalmarsson, “A general framework for mixed
H∞ and H2 input design”, Submitted to IEEE Transactions on
Automatic Control, May 2004.
This work is also related to the conference papers
H. Jansson and H. Hjalmarsson, “A framework for mixed H∞ and
H2 input design”, Mathematical Theories of Networks and Systems,
Leuven, Belgium, (2004).
H. Jansson and H. Hjalmarsson, “Mixed H∞ and H2 input design
for identification”, IEEE Conference on Decision and Control, Bahamas, (2004).
H. Jansson and H. Hjalmarsson, “Optimal experiment design in
closed loop.”, Submitted to 16th IFAC World Congress, Prague,
Czech Republic, (2005).
Chapter 4
Input designs rely in general on uncertainty descriptions that are valid
asymptotically in the number of data. In this chapter we consider a class
of linearly parameterized frequency functions for which it is possible to
derive variance expressions that are exact for finite sample sizes. The
major contributions are two solutions to translate constraints based on
these variance expressions into convex formulations where optimization
over the square of the Discrete Fourier Transform (DFT) coefficients is
performed.
Chapter 5
This chapter applies the theory developed in Chapter 3 to two applications, a process plant and a resonant mechanical system. One objective is to study possible benefits of using optimal inputs compared to the use of standard identification input signals, for example PRBS signals, for the aforementioned applications. The second aspect is to highlight some robustness issues regarding the input design.
The main part of this chapter has resulted in
M. Barenthin, H. Jansson and H. Hjalmarsson, “Applications of
mixed H2 and H∞ input design for identification”, Submitted to
16th IFAC World Congress, Prague, Czech Republic, (2005).
Chapter 6
In this chapter we derive both analytical and numerical results on how to accurately identify system zeros. Special attention is given to non-minimum phase zeros. The sensitivity and benefits of optimal input design for identification of zeros are discussed and illustrated.
With minor changes, this chapter contains the contribution
J. Martensson, H. Jansson and H. Hjalmarsson, “Input design for
identification of zeros.”, Submitted to 16th IFAC World Congress,
Prague, Czech Republic, (2005).
Chapter 7
This chapter is not directly related to optimal experiment design. Instead
a method is presented for convex computation of a class of so-called worst
case criteria, of which the worst case Vinnicombe distance is one example.
These criteria can be used to evaluate performance and/or stability for
an ellipsoidal set of parametric models.
The work in this chapter has resulted in
H. Jansson and H. Hjalmarsson, “Convex computation of worst
case criteria with applications in identification for control”, IEEE
Conference on Decision and Control, Bahamas, (2004).
Chapter 8
The gradient estimation problem in Iterative Feedback Tuning (IFT) for
multivariable systems is reviewed in Chapter 8. Different approximations
are proposed with the purpose of reducing the experimental load. One
method, in which operators are shifted in a similar fashion as is used in
IFT for scalar systems, is analyzed in more detail.
The work in this chapter is published in
H. Jansson and H. Hjalmarsson, “Gradient approximations in iterative feedback tuning for multivariable processes”, International Journal of Adaptive Control and Signal Processing, October, 2004.
Chapter 2
Parameter Estimation and Connections to Input Design
A brief introduction to the prediction error method of system identification is given in this chapter. The method consists basically of two steps. The first is to choose a model class parameterized by some parameter vector. The second is to find the model within the model class that minimizes the prediction error. Due to unmeasurable disturbances and the finite amount of data, the estimated model will always contain errors
even in those cases where the model class is flexible enough to describe
the underlying system. In this chapter we will review results to quantify the variability of the parameter vector and the associated frequency
response estimate. It is the objective of the experiment design to shape
this variability.
2.1 Parameter Estimation
We will assume that the true system dynamics are captured by a linear
discrete time-invariant single-input single-output system given by
S :   y(t) = Go(q) u(t) + v(t),   v(t) = Ho(q) eo(t).   (2.1)
Here y(t) is the output, u(t) the input, v(t) is the disturbance and eo (t) is
zero mean white noise with variance λo . Furthermore, Go (q) and Ho (q)
are rational transfer functions in the forward shift operator q (qu(t) =
u(t + 1)) with Ho stable, monic and minimum phase. All signals are
assumed to have a spectral representation where the spectral densities of
u and v will be denoted Φu and Φv , respectively.
The true system is modelled by the parametric model
M :   y(t) = G(q, θ) u(t) + H(q, θ) e(t)   (2.2)
where e(t) represents white noise with zero mean and variance λ. It is
assumed that G and H have the rational forms
G(q, θ) = q^{-nk} B(q, θ)/A(q, θ),   H(q, θ) = C(q, θ)/D(q, θ)   (2.3)
where nk is the delay and
A(q, θ) = 1 + a1 q^{-1} + · · · + a_{na} q^{-na}   (2.4)
B(q, θ) = b1 + b2 q^{-1} + · · · + b_{nb} q^{-nb+1}   (2.5)
C(q, θ) = 1 + c1 q^{-1} + · · · + c_{nc} q^{-nc}   (2.6)
D(q, θ) = 1 + d1 q^{-1} + · · · + d_{nd} q^{-nd}   (2.7)
The polynomials A(q, θ)–D(q, θ) are parameterized by the real vector θ ∈ R^n given by

θ = [a1, · · ·, a_{na}, b1, · · ·, b_{nb}, c1, · · ·, c_{nc}, d1, · · ·, d_{nd}]^T   (2.8)
The one-step-ahead predictor for the model (2.2) is
ŷ(t, θ) = H −1 (q, θ)G(q, θ)u(t) + [1 − H −1 (q, θ)]y(t)
(2.9)
The prediction error framework of system identification (Ljung, 1999;
Söderström and Stoica, 1989) with a quadratic cost function aims at
minimizing the prediction errors
ε(t, θ) = y(t) − ŷ(t, θ)   (2.10)
with respect to the parameters in the following fashion
θ̂N = arg min_θ (1/(2N)) Σ_{t=1}^{N} ε(t, θ)²   (2.11)
Some results on the statistical properties of the estimate θ̂N will now be
given.
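As a concrete illustration of (2.10)–(2.11), the criterion can be minimized numerically. The sketch below uses a hypothetical first-order output-error system (so H(q, θ) = 1 and ŷ = G(q, θ)u) and a general-purpose optimizer rather than a dedicated identification routine:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 1000
u = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):                       # noise-free output of Go = 0.1 q^-1/(1 - 0.9 q^-1)
    x[t] = 0.9 * x[t - 1] + 0.1 * u[t - 1]
y = x + rng.standard_normal(N)              # output-error structure: y = Go u + e

def cost(theta):                            # V_N(theta) = (1/2N) sum eps(t, theta)^2, cf. (2.11)
    a, b = theta
    yhat = np.zeros(N)
    for t in range(1, N):
        yhat[t] = -a * yhat[t - 1] + b * u[t - 1]   # OE predictor, H = 1
    return np.sum((y - yhat) ** 2) / (2 * N)

theta_hat = minimize(cost, x0=[-0.5, 0.05], method="Nelder-Mead").x
print(theta_hat)   # should be near the true (a, b) = (-0.9, 0.1)
```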
2.2 Uncertainty in the Parameter Estimates
One common way to measure the quality of the estimates is to study their
asymptotic properties. That is, when the number of data N grows large,
the estimates will belong to some distribution. The properties of the
distribution will then determine the quality of the estimates. Assume
that the true system is in the model set (S ∈ M), i.e. there exists a
parameter θo such that G(θo ) = Go and H(θo ) = Ho . Then it can be
shown under mild assumptions, see (Ljung, 1999), that the prediction
error estimate θ̂N has an asymptotic distribution that obeys
√N (θ̂N − θo) → N(0, P) as N → ∞
lim_{N→∞} N E(θ̂N − θo)(θ̂N − θo)^T = P
P(θo) = λo ( E[ψ(t, θo) ψ^T(t, θo)] )^{-1}
ψ(t, θo) = ∂ŷ(t, θ)/∂θ |_{θ=θo}   (2.12)
Here N denotes the Normal distribution. So when the system is in the model set, the estimate will converge to the true parameters and the covariance of the estimation error decays as 1/N. Hence better and better estimates are obtained when more and more data samples are used in the estimation.
2.2.1 Parameter Covariance
However, it is not only the experiment length that will influence the
estimation accuracy. Introduce the spectrum
Φχo = [ Φu  Φue ; Φue^*  λo ]   (2.13)
where Φu is the spectrum of the input and Φue is the cross spectrum
between u and eo . The frequency distribution of the spectrum Φχo will
also influence the accuracy as is shown in the next lemma.
Lemma 2.1
The inverse of the covariance matrix, P −1 (θo ), is a linear function of the
spectrum Φχo given by
P^{-1}(θo) = (1/(2πλo)) ∫_{−π}^{π} F(e^{jω}, θo) Φχo(ω) F^*(e^{jω}, θo) dω   (2.14)
where F(q, θo) = [Fu(q, θo)  Fe(q, θo)] with

Fu(θo) = H^{-1}(θo) dG(θo)/dθ   (2.15)
Fe(θo) = H^{-1}(θo) dH(θo)/dθ   (2.16)
Proof: Insert (2.9) into (2.12) and use Parseval’s formula to obtain the
integral expression.
Since P is a measure of the size of the errors in the parameters,
Lemma 2.1 shows exactly how this error is related to the spectrum Φχo ,
and especially the input spectrum Φu and the cross spectrum Φue. Therefore Lemma 2.1 is very useful when considering experiment design. It is worth noticing that the only quantities that can be used to shape the
covariance P are actually the input spectrum Φu and the cross spectrum Φue . The other quantities in (2.14) are all dependent on the true
underlying system.
The cross spectrum is zero when the system is operating in open-loop. Hence, the input spectrum is the only quantity that influences the covariance matrix in open-loop. How different input spectra may influence the parameter covariance is illustrated in the following example.
Example 2.1
Consider identification of the output-error system defined by
y(t) = (0.1 q^{-1} / (1 − 0.9 q^{-1})) u(t) + e(t)   (2.17)
where e(t) is white noise with unit variance. To illustrate the influence
of different input spectra on the covariance of the parameters, two types
of input signals having equal energy but different frequency content are
considered. The first input is a white noise sequence with unit variance.
The second is low-pass filtered white noise. The spectra of these inputs
are given in Figure 2.1.
The result of 1000 identification experiments with different noise realizations is illustrated in Figure 2.2. In each experiment 1000 data points
are used. As this example shows, the frequency distribution of the input may have a large impact on the covariance of the parameter estimates.
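The scatter seen in such Monte-Carlo runs is predicted by Lemma 2.1. For the output-error model G(q, θ) = b q^{-1}/(1 + a q^{-1}) with H = 1, F reduces to dG/dθ, and P^{-1} can be evaluated by numerical integration of (2.14). A sketch for the white-input case; the gradient expressions are worked out by hand here and are an assumption to double-check:

```python
import numpy as np

a0, b0, lam, N = -0.9, 0.1, 1.0, 1000
w = np.linspace(-np.pi, np.pi, 20001)
zinv = np.exp(-1j * w)
dG_da = -b0 * zinv**2 / (1 + a0 * zinv) ** 2   # dG/da at theta_o
dG_db = zinv / (1 + a0 * zinv)                 # dG/db at theta_o
F = np.vstack([dG_da, dG_db])                  # F_u = H^{-1} dG/dtheta with H = 1
Phi_u = np.ones_like(w)                        # white input, unit variance
dw = w[1] - w[0]
# P^{-1} = (1/(2 pi lam)) int F Phi_u F* dw, cf. (2.14) in open loop
Pinv = np.real(np.einsum('iw,jw,w->ij', F, np.conj(F), Phi_u)) * dw / (2 * np.pi * lam)
PN = np.linalg.inv(Pinv) / N
print(np.sqrt(np.diag(PN)))    # predicted standard deviations of the (a, b) estimates
```

For the b-entry the integral has the closed form 1/(1 − a0²), which gives a simple sanity check of the discretization.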
Figure 2.1: Amplitude plots for Example 2.1: the open-loop system (thin solid), the spectrum of the input with low-pass characteristics (thick solid) and the spectrum of the white input (dashed).
2.2.2 Confidence Bounds for Estimated Parameters
From the asymptotic normality of the parameter estimates, see (2.12), it
follows that
(θ̂N − θo)^T PN^{-1} (θ̂N − θo) → χ²(n)  as N → ∞   (2.18)

with

PN = P/N   (2.19)

and, hence,

Uθ = {θ | (θ − θ̂N)^T PN^{-1} (θ − θ̂N) ≤ χ²α(n)}   (2.20)

is a confidence region which asymptotically includes the parameter θo
with probability α. Thus the estimates will asymptotically be centered
around θo and for a certain probability α they will be within an ellipsoid
defined by PN and χ2α (n).
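The ellipsoid (2.20) is easy to evaluate numerically: the threshold χ²α(n) comes from the χ² quantile function, and the ellipse geometry follows from an eigendecomposition of PN^{-1}. A sketch with a hypothetical 2×2 covariance PN:

```python
import numpy as np
from scipy.stats import chi2

PN = np.array([[4.0e-4, 1.0e-4],         # hypothetical covariance of the (a, b) estimates
               [1.0e-4, 2.0e-4]])
level = chi2.ppf(0.95, df=2)             # chi^2_0.95(2) ~ 5.99
vals, vecs = np.linalg.eigh(np.linalg.inv(PN))
# ellipse (theta - theta_hat)^T PN^{-1} (theta - theta_hat) <= level:
# semi-axis lengths sqrt(level/vals) along the eigenvector directions
semi_axes = np.sqrt(level / vals)
print(semi_axes)
```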
Figure 2.2: Monte-Carlo estimates of the system (2.17), whose true parameters are ao = −0.9 and bo = 0.1, using a colored and a white input signal respectively, see Example 2.1. The estimates based on the colored input are given by (+) and those based on the white input by (·).
Example 2.2
Reconsider Example 2.1. The 95% confidence region for a certain input
design is given by (2.20) with χ²_{0.95}(2) = 5.99, where PN depends on the
specific input spectrum. A comparison of these confidence regions and
the Monte-Carlo estimates of Example 2.1 is illustrated in Figure 2.3.
2.3 Uncertainty of Frequency Function Estimates
In many situations, e.g. control design applications, it is more useful to
express the uncertainty of the estimated model in the frequency domain
Figure 2.3: Confidence ellipses for the different input designs in Example 2.1, plotted for the white input (left plot) and for the colored input (right plot). These ellipses can be compared with the model estimates obtained from the Monte-Carlo simulations.
rather than in the parameter domain.
2.3.1 Variance of Frequency Function Estimates
Under the assumption S ∈ M, it can be shown, see (Ljung, 1985), that

√N (G(e^{jω}, θ̂N) − G(e^{jω}, θo)) → N(0, Π(ω))   (2.21)

when N → ∞ with

Π(ω) = (dG^*(e^{jω}, θo)/dθ) P (dG(e^{jω}, θo)/dθ).   (2.22)
Here ^* denotes complex conjugate transpose. Hence a useful approximation for finite data becomes

Var G(e^{jω}, θ̂N) ≈ (1/N) (dG^*(e^{jω}, θo)/dθ) P (dG(e^{jω}, θo)/dθ)   (2.23)
where the covariance matrix of θ̂N is approximately (1/N) P. The estimation
error will depend on the number of data, the parameter covariance and
the sensitivity of the true system to parameter changes.
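For a given PN, the approximation (2.23) is direct to evaluate. A sketch for the first-order model G(q, θ) = b q^{-1}/(1 + a q^{-1}) with a hypothetical parameter covariance; the gradient expressions are hand-derived assumptions:

```python
import numpy as np

def var_G(w, a, b, PN):
    """Delta-method variance (2.23): dG*/dtheta PN dG/dtheta for G = b q^-1/(1 + a q^-1)."""
    zinv = np.exp(-1j * w)
    dG = np.array([-b * zinv**2 / (1 + a * zinv) ** 2,   # dG/da
                   zinv / (1 + a * zinv)])               # dG/db
    return np.real(np.conj(dG) @ PN @ dG)

PN = np.array([[7.0e-4, 3.0e-4],        # hypothetical covariance of theta_hat
               [3.0e-4, 3.5e-4]])
print(var_G(0.1, -0.9, 0.1, PN))        # variance of G(e^{j 0.1}, theta_hat)
```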
An expression is derived in (Ljung, 1985) that is asymptotic in both the model order and the number of data. Under the assumption of open-loop identification, it has led to the well known approximation

Var G(e^{jω}, θ̂N) ≈ (m/N) (Φv(ω)/Φu(ω))   (2.24)
for finite m and N, where m is the model order¹. Due to its simple structure and the fact that it provides a certain robustness against the properties of the underlying system, the expression (2.24) has been widely
used in experiment design, see e.g. (Ljung and Yuan, 1985; Gevers and
Ljung, 1986; Hjalmarsson et al., 1996; Zhu and van den Bosch, 2000; Forssell and Ljung, 2000). However, since this approximation is derived from
an expression which is asymptotic in the model order, its accuracy for
finite model orders is not guaranteed. An intriguing fact is that for some
situations, the approximate expression (2.24) is quite accurate even for
model orders as low as two, see (Ljung, 1985; Ljung, 1999). But it is
also easy to construct examples where this expression fails for low model
orders, see e.g. (Ninness and Hjalmarsson, 2002b). This has been the
inspiration to derive expressions that are exact for finite model orders,
cf. (Ninness et al., 1999),(Xie and Ljung, 2001), (Ninness and Hjalmarsson, 2002b) and (Ninness and Hjalmarsson, 2003). The generalized result
in the case of independently parameterized dynamics and noise models
(Box-Jenkins models) reads as follows:
lim_{N→∞} N · Var G(q, θ̂N) = κ(ω) Φv(ω)/Φu(ω)   (2.25)

κ(ω) = Σ_{k=1}^{mκ} (1 − |ξk|²) / |e^{jω} − ξk|²   (2.26)
It can be shown that mκ and {ξk }, for some specific system configurations, will depend on the poles of G(θo ), the dynamics of the noise and
the input, see (Ninness and Hjalmarsson, 2003). The preceding result
suggests the following approximation of the variance of G for finite N

Var G(q, θ̂N) ≈ (κ(ω)/N) (Φv(ω)/Φu(ω)).   (2.27)

¹A more general expression is given in (1.17).
Notice that the model order m in (2.24) is here replaced by the frequency
dependent factor κ(ω). Since κ(ω) depends on the input spectrum in a
rather complicated way it is not straightforward to replace the approximation (2.24) by the more accurate expression (2.27) to design optimal
input signals for identification. One exception is for the case of periodic
inputs and linearly parameterized models as will be shown in Chapter 4.
We will later in this thesis show how to directly use the covariance approximation (2.23), instead of the approximations (2.24) and (2.27), for input design.
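The factor κ(ω) in (2.26) is itself simple to evaluate once the points {ξk} are known; a sketch with a hypothetical pole set:

```python
import numpy as np

def kappa(w, xi):
    """kappa(w) = sum_k (1 - |xi_k|^2) / |e^{jw} - xi_k|^2, cf. (2.26)."""
    z = np.exp(1j * np.atleast_1d(w))[:, None]
    xi = np.asarray(xi, dtype=complex)[None, :]
    return np.sum((1 - np.abs(xi) ** 2) / np.abs(z - xi) ** 2, axis=1)

print(kappa(0.0, [0.9, 0.5]))   # -> [22.]; large near a pole close to the unit circle
```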
2.3.2 Uncertainty Descriptions Based on Parametric Confidence Regions
The asymptotic properties (2.12) can be utilized to obtain confidence
intervals on the parameter estimates, see Section 2.2.2. This parametric
uncertainty corresponds to an uncertainty region in the space of transfer
functions denoted D:
D = {G(q, θ) |θ ∈ Uθ }.
(2.28)
An alternative to the covariance expression that is based on a Taylor
expansion (2.23), is to directly work with the uncertainty set (2.28) to
describe the uncertainty of the frequency function. This has been explored in (Bombois et al., 2000b; Bombois et al., 2001; Gevers et al., 2003)
where prediction error identification is connected with robust control theory. In Chapter 3, it is shown how different frequency function uncertainties based on the uncertainty set (2.28) can be transformed into convex constraints and included in experiment designs. The first contribution connected to experiment design and the variability of the frequency
function estimate viewed through the parameter uncertainty set (2.20)
is (Hildebrand and Gevers, 2003). In Chapter 3 a different approach is
taken compared to the contribution (Hildebrand and Gevers, 2003).
Confidence regions in the frequency domain
In (Wahlberg and Ljung, 1992; Bombois et al., 2000a) the image of (2.28) in the Nyquist plane is studied for linearly parameterized models. It is shown that this image is represented, for each frequency, by an ellipsoid in the Nyquist plane that is centered around the nominal frequency function estimate G(e^{jω}, θ̂N).
Let the model be represented by
G(q, θ) = ΓT (q)θ
(2.29)
and let

g(e^{jω}, θ) = [ Re G(e^{jω}, θ) ; Im G(e^{jω}, θ) ] = [ Re Γ^T(e^{jω}) ; Im Γ^T(e^{jω}) ] θ = Γc(ω) θ   (2.30)
Then the confidence ellipsoid for the frequency function estimate G(e^{jω}, θ̂N) at the frequency ω is defined by

UG(ω) = {g ∈ R² | (g − g(e^{jω}, θ̂N))^T ΠN(ω) (g − g(e^{jω}, θ̂N)) ≤ χ²α(n)}   (2.31)

where ΠN(ω) = (Γc(ω) PN Γc^T(ω))^{-1}. For models that are not linearly parameterized, the characterization holds approximately for large N when Γ(e^{jω}) is replaced by the linearization dG(e^{jω}, θ)/dθ evaluated at θo.
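For a linearly parameterized model, Γc(ω) and hence ΠN(ω) are straightforward to compute. A sketch for an FIR model G(q, θ) = Σ θk q^{-k}; both the model and the covariance PN are hypothetical illustrations:

```python
import numpy as np

def Pi_N(w, PN):
    """Pi_N(w) = (Gamma_c(w) PN Gamma_c^T(w))^{-1} for G(q, theta) = sum_k theta_k q^{-k}."""
    n = PN.shape[0]
    Gamma = np.exp(-1j * w * np.arange(1, n + 1))    # Gamma(e^{jw}), cf. (2.29)
    Gc = np.vstack([Gamma.real, Gamma.imag])         # Gamma_c(w), cf. (2.30)
    return np.linalg.inv(Gc @ PN @ Gc.T)

PN = 1e-3 * np.eye(3)            # hypothetical parameter covariance
print(Pi_N(0.5, PN))             # 2x2 weighting matrix in the ellipsoid (2.31)
```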
The gain error
The expression (2.31) characterizes frequency-by-frequency confidence regions for the frequency function estimate. In many control applications, it is often sufficient to consider the gain error

|G(e^{jω}, θ̂N) − Go(e^{jω})|   (2.32)

Based on (2.31) the gain error is bounded by

|G(e^{jω}, θ̂N) − Go(e^{jω})| ≤ √( χ²α(n) λmax(Γc(ω) PN Γc^T(ω)) )   (2.33)

see (Bombois et al., 2004a; Bombois et al., 2004d).
In (Hjalmarsson, 2004) a confidence region for the gain error (2.32) is derived based on Var G(e^{jω}, θ̂N). This confidence region is given by

|G(e^{jω}, θ̂N) − Go(e^{jω})| ≤ √( χ²α(n) Var G(e^{jω}, θ̂N) )   (2.34)
which holds with at least α·100% probability. A useful re-parametrization for experiment design purposes is to insert the variance approximation (2.23) into (2.34), which yields the description

|G(e^{jω}, θ̂N) − Go(e^{jω})| ≤ √( χ²α(n) (dG^*(e^{jω}, θo)/dθ) PN (dG(e^{jω}, θo)/dθ) )   (2.35)
Both (2.33) and (2.34) can be included in the framework that we present in Chapter 3. It is worth noticing that the confidence bound (2.33) has been used for input design in (Bombois et al., 2004d; Bombois et al., 2004c; Bombois et al., 2004b).
2.4 Summary
We have presented several results that quantify errors in the parameters as well as the corresponding frequency function estimates. The key
expression for experiment design is the expression for the asymptotic covariance matrix P given in (2.14) whose inverse is convex in Φχo . In the
subsequent chapters we will use different functions of P to quantify the
errors in the identified models, e.g. different versions of (2.23), (2.28) and
(2.34).
Notice that all these results are only valid for "large" N. There is no general limit on how large N has to be for the asymptotic results to be reliable. Monte-Carlo simulations indicate that for typical system identification applications, the results are quite reliable for N ≳ 300
(Ljung, 1999). Recent studies of the validity of the asymptotic prediction error theory are presented in (Bittanti et al., 2002) and (Garatti et
al., 2003). Non-asymptotic confidence ellipsoids have been considered in
(Campi and Weyer, 2002) and (Weyer and Campi, 2002).
It should also be emphasized that all considered uncertainty results
are based on the assumption that variance errors are the only concern.
This must be kept in mind when setting up an experiment design. For example, suppose an optimal design is based on the assumption that the true system is a second order linear system, and that the resulting optimal input is a sum of two sinusoids. Then the input carries no power that can invalidate a second order model; in other words, this input makes it impossible to check whether a third or a fourth order model is better. Optimal design must therefore be performed carefully, and the underlying assumptions on the true system should be checked.
Chapter 3
Fundamentals of Experiment Design
One of the main contributions of this thesis is to introduce a quite general
framework for translating optimal experiment design problems in system
identification into convex optimization programs. The aim of this chapter
is to introduce the fundamentals of this framework. Furthermore, the theory of optimal experiment design will be studied more thoroughly. For a historical background we refer to Section 1.2 and the references therein.
First we will give an overview of the main ideas of the framework. This introduction will also give a flavor of what kinds of experiment design problems are at present solvable.
3.1 Introduction
The experiment design problems we consider all have the general form

minimize_{Φχo}   objective
subject to       quality constraints, and
                 signal constraints          (3.1)
i.e. they can all be formulated as optimization problems that include some
constraints on the model quality together with signal constraints. The
quality constraints are typically functions of the asymptotic covariance
matrix P . Therefore, it is natural to use the input spectrum Φu and
possibly the cross spectrum Φue as design variables, cf. (2.14). The
signal constraints have to be included to obtain well-posed problems,
i.e. to prevent the use of infinite input power. The considered signal
constraints include energy as well as frequency-wise constraints.
As will be evidenced, typical experiment design problem formulations
are in their original form intractable for several reasons:
1. The constraints are typically non-convex and such optimization
problems may be difficult to solve.
2. The constraints are in many cases infinite-dimensional, which calls
for special care when undertaking the optimization procedure.
3. There is also the problem of finding a signal realization which has
the desired spectral properties. This is called spectral factorization.
Thus, a useful input design algorithm should contain a second step
that performs spectral factorization of the input spectrum.
4. The asymptotic variance typically depends on the true system parameters θo, i.e. P = P(θo), which are unknown.
It should be emphasized that these difficulties appear in many experiment design problems. This is one reason why there still are many interesting design problems to solve. However, due to the great advances in the optimization community during the last 15 years, there exist today many useful methods to reformulate and solve difficult optimization problems, several of which apply to our experiment design
problems.
In the following sections, we will show that the first three difficulties
listed above may be solved. This is done by introducing a finite dimensional linear parametrization of the input spectrum and, possibly, the cross spectrum. Due to this parametrization, several experiment design problems can be reduced to tractable convex optimization problems. Several different quality measures can be fit into the framework that will be derived, as long as they are convex in P^{-1}.
The last difficulty, that the optimal solution in general depends on the character of the system to be identified, is inherent in almost all optimal designs. This is unavoidable. In a real application this fact must be handled, and there are at present very few systematic ways to do so. In Section 3.10 we will discuss this topic further.
Let us now discuss and illustrate some of the considered constraints
in more detail.
3.1.1 Quality Constraints
In this section we will illustrate some possible and relevant quality constraints that can be included in (3.1). Since we assume that variance
errors are the main concern, it is natural that any considered quality
measure of our models is a function of the (asymptotic) covariance matrix P (2.14), provided the sample size is large. Examples of such quality
constraints are the classical alphabetical measures (1.13)-(1.16). The key
expression that shows how the covariance matrix can be manipulated is
(recapitulate (2.14))
P^{-1}(θo) = (1/(2πλo)) ∫_{−π}^{π} F(e^{jω}, θo) [Φu(ω), Φue(ω); Φ*ue(ω), λo] F*(e^{jω}, θo) dω    (3.2)
that shows that P −1 is affine in the input spectrum1 Φu and the cross
spectrum Φue . Since these two spectra are the only quantities that can
be used to shape P , the main step to obtain tractable quality measures
are consequently to make the constraint convex in P −1 . Let us illustrate
how this can be done by way of an example.
Example 3.1
Consider the constraint

λmax(P) ≤ γ    (3.3)

which is equivalent to

γI − P ≥ 0.    (3.4)

By Schur complements, (3.4) is equivalent to

[γI, I; I, P^{-1}] ≥ 0    (3.5)

which obviously is convex in P^{-1}.
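The equivalence in Example 3.1 can be checked numerically. The sketch below (Python, standard library only; the helper names `is_psd`, `lambda_max_leq` and `schur_block_psd` are ours, not from the thesis) tests λmax(P) ≤ γ both directly and via the Schur-complement block (3.5) for a 2×2 example:

```python
import math

def is_psd(M, tol=1e-9):
    """Positive-semidefiniteness test via symmetric Gaussian elimination:
    all pivots must be nonnegative (a zero pivot forces a zero row)."""
    n = len(M)
    A = [row[:] for row in M]
    for k in range(n):
        piv = A[k][k]
        if piv < -tol:
            return False
        if piv <= tol:
            if any(abs(A[k][j]) > 1e-6 for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = A[i][k] / piv
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return True

def lambda_max_leq(gamma, Pinv):
    """Direct check of (3.3): lambda_max(P) <= gamma, with P = inv(Pinv), 2x2."""
    a, b, c = Pinv[0][0], Pinv[0][1], Pinv[1][1]
    det = a * c - b * b
    # eigenvalues of P = (1/det) [[c, -b], [-b, a]]
    mean = 0.5 * (a + c) / det
    radius = math.hypot(0.5 * (a - c) / det, b / det)
    return mean + radius <= gamma + 1e-12

def schur_block_psd(gamma, Pinv):
    """Check (3.5): the block matrix [[gamma*I, I], [I, Pinv]] >= 0."""
    M = [[gamma, 0.0, 1.0, 0.0],
         [0.0, gamma, 0.0, 1.0],
         [1.0, 0.0, Pinv[0][0], Pinv[0][1]],
         [0.0, 1.0, Pinv[1][0], Pinv[1][1]]]
    return is_psd(M)
```

Both tests agree on the same boundary, which is the point of the Schur-complement rewriting: the second form is linear in P^{-1}.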
Even though (3.5) is convex in P −1 , it is an infinite-dimensional constraint due to the frequency dependence of Φχo . Therefore, it is not
straightforward to handle the constraint (3.5). Solutions to this will be
presented in Section 3.2 and Section 3.3. These are based on different
¹ The frequency argument will be frequently omitted in the presentation.

3 Fundamentals of Experiment Design
parametrizations of the spectrum Φχo that make it possible to obtain
linear and finite parametrizations of P^{-1} of the form

P^{-1} = Σ_{k=−M}^{M} c̃k Pk + P̄    (3.6)

where Pk, k = −M, . . . , M and P̄ are constant matrices. The original design variable Φχo is replaced by the variables c̃k, k = −M, . . . , M.
Consequently, with a parametrization of the form (3.6) inserted in (3.5),
a linear and finite dimensional constraint is obtained. The discussion
around Example 3.1 has unveiled two major steps to obtain tractable
quality constraints: first make the constraint convex in P −1 , then impose
a finite and linear parametrization of P −1 through a suitable parametrization of Φχo .
From now on we will assume that the cross spectrum is zero and only
consider input design in open-loop. The more general case of possible
feedback in the design will be treated in Section 3.5. Let us now illustrate,
by means of a very simple example, how the input spectrum may be
parametrized.
Example 3.2
Let the model and the true system have the structure
y(t, θ) = b0 u(t − 1) + b1 u(t − 2) + e(t)    (3.7)
i.e. a system where H = 1, see (2.2), and hence dH/dθ = 0. Now the
asymptotic covariance (3.2) becomes
P^{-1} = (1/(2πλo)) ∫_{−π}^{π} [e^{−jω}; e^{−j2ω}] [e^{jω}, e^{j2ω}] Φu(ω) dω    (3.8)
A general spectrum can be written as
Φu(ω) = Σ_{k=−∞}^{∞} rk e^{−jωk} ≥ 0, ∀ω    (3.9)
where rk are the auto-correlations of the input, i.e. rk = E u(t)u(t − k).
Assume that the noise variance λo = 1 and insert (3.9) into (3.8). This
yields
P^{-1} = [r0, r1; r1, r0] = r0 [1, 0; 0, 1] + r1 [0, 1; 1, 0]    (3.10)
which is a linear parametrization of P −1 in r0 and r1 of the form (3.6).
Notice that for this example, only the first two auto-correlation coefficients of the input influence P^{-1}. This is a very important fact that will
be further explored in the subsequent sections for more general model
structures.
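The computation in Example 3.2 can be reproduced numerically: integrating (3.8) with any spectrum of the form (3.9) yields exactly the matrix (3.10). A small Python sketch (standard library only; the function name `pinv_entries` is ours):

```python
import cmath
import math

def pinv_entries(phi, n_grid=4000):
    """Numerically evaluate (3.8) with lambda_o = 1:
    P^{-1} = (1/2pi) * integral of F(w) F*(w) phi(w) dw over (-pi, pi],
    where F(w) = [e^{-jw}, e^{-j2w}]^T for the second order FIR model (3.7)."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    dw = 2 * math.pi / n_grid
    for n in range(n_grid):
        w = -math.pi + (n + 0.5) * dw          # midpoint rule over one period
        F = [cmath.exp(-1j * w), cmath.exp(-2j * w)]
        for i in range(2):
            for j in range(2):
                P[i][j] += (F[i] * F[j].conjugate() * phi(w)).real * dw / (2 * math.pi)
    return P

r0, r1 = 2.0, 0.5
P_inv = pinv_entries(lambda w: r0 + 2 * r1 * math.cos(w))   # a spectrum of form (3.11), M = 1
```

The result is [r0, r1; r1, r0] up to numerical rounding, confirming that only r0 and r1 enter P^{-1}.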
In Section 3.2 two families of parametrizations of an input spectrum will be introduced. They are denoted “finite dimensional spectrum parametrization” and “partial correlation parametrization”. An example
of a finite dimensional spectrum parametrization is
Φu(ω) = Σ_{k=−M}^{M} rk e^{−jωk} ≥ 0, ∀ω    (3.11)
where the coefficients rk must be such that Φu (ω) ≥ 0, ∀ ω. Notice that
for all spectra of the form (3.11) with M ≥ 1 we obtain the parametrization (3.10) of P −1 . A corresponding partial correlation parametrization
can be characterized by the finite expansion
Σ_{k=−M}^{M} rk e^{−jωk}    (3.12)
which may not be a spectrum itself but where the coefficients rk are constrained such that there exists an expansion rM+1, rM+2, . . . that altogether yields a spectrum. This type of parametrization makes it possible
to work with infinite expansions of the spectrum. Let us now illustrate
these two types of parametrizations on the second order FIR system (3.7).
Example 3.3 (Example 3.2 continued)
Consider the following design problem.
minimize_{Φu} (1/(2π)) ∫_{−π}^{π} Φu(ω) dω
subject to det P^{-1} ≥ 1    (3.13)
With the parametrization (3.11) we obtain the optimization problem
minimize_{r0,...,rM} r0
subject to det [r0, r1; r1, r0] ≥ 1    (3.14)
           r0 + 2r1 cos ω + · · · + 2rM cos Mω ≥ 0, ∀ω    (3.15)
The first constraint corresponds to the determinant of P^{-1} and the second constrains the new optimization variables r0, . . . , rM to represent a
finite spectrum. For M = 0, (3.15) gives r1 = 0 and for M = 1 the
corresponding condition becomes r0 ≥ 2|r1 |. When M increases the feasible region for r1 increases as well. Asymptotically the feasible region
approaches r0 ≥ |r1 |. The boundaries for (3.15) for different values of
M are depicted in Figure 3.1 together with the constraint (3.14). It is
easy to verify that the optimal design is given by r0 = 1 and r1 = 0,
i.e. the optimal input is white noise with unit variance. With a partial
correlation parametrization of the form (3.12) with M = 1, the design
problem (3.13) becomes
minimize_{r0, r1} r0
subject to det [r0, r1; r1, r0] ≥ 1
           [r0, r1; r1, r0] ≥ 0    (3.16)
The second constraint ensures that r0 and r1 are correlation coefficients.
The bound of this constraint is r0 ≥ |r1 |. The optimal design is thus
given by r0 = 1 and r1 = 0.
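The feasibility regions discussed in Example 3.3 are easy to verify numerically. A Python sketch (standard library only; function names are ours, and the frequency-wise condition is checked on a grid, so it is only approximate between grid points):

```python
import math

def det_constraint(r0, r1):
    """(3.14): det [[r0, r1], [r1, r0]] = r0^2 - r1^2 >= 1."""
    return r0 * r0 - r1 * r1 >= 1.0

def finite_spectrum_feasible_M1(r0, r1, n_grid=2000):
    """(3.15) with M = 1: r0 + 2 r1 cos(w) >= 0 for all w, checked on a
    grid; analytically the condition is r0 >= 2|r1|."""
    return all(r0 + 2 * r1 * math.cos(2 * math.pi * n / n_grid) >= -1e-12
               for n in range(n_grid))

def partial_corr_feasible(r0, r1):
    """Second constraint of (3.16): the Toeplitz matrix [[r0, r1], [r1, r0]]
    is positive semidefinite, i.e. r0 >= |r1|."""
    return r0 >= 0 and r0 * r0 - r1 * r1 >= 0
```

The partial correlation region r0 ≥ |r1| strictly contains the finite-spectrum region r0 ≥ 2|r1|, yet the optimum (r0, r1) = (1, 0) lies in both, as claimed.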
This is a very interesting example. It shows that the partial correlation
parametrization yields a complete parametrization of P −1 with a minimal
number of parameters (2 free parameters for this example). The finite
spectrum parametrization does not yield a complete parametrization for
any finite M. The optimal solution is, however, retrieved for both types of input parametrization and for any choice of M. This will of
course depend on the chosen quality constraint as will be illustrated in
the next example.
Let us now go beyond the classical alphabetical measures (1.13)–(1.16).
Example 3.4 (Example 3.3 continued)
The variance of the frequency response for the second order FIR system
(3.7) can be approximated by Var{b0 e−jω + b1 e−2jω } ≈ Γ∗ P Γ/N where
Figure 3.1: Illustration of the optimization problem (3.13). The boundary of (3.14) is given by the dashed line. The thin lines correspond to the bounds of (3.15) for M = 0, 1, 2, 4. The thick line is the boundary of (3.16). The optimal point is given by the cross.
Γ∗ (ω) = [ejω e2jω ]. Now consider the design problem
minimize_{Φu} (1/(2π)) ∫_{−π}^{π} Φu(ω) dω
subject to ((1 − a²)/|1 − ae^{−jω}|²) Γ*(ω) P Γ(ω) ≤ 1, ∀ω    (3.17)
where the quality constraint is represented by a frequency-wise constraint
on the variance of the system’s frequency response. The constraint in
(3.17) can be rewritten by Schur complements as
P^{-1} − ((1 − a²)/|1 − ae^{−jω}|²) Γ(ω)Γ*(ω) ≥ 0, ∀ω
which with the parametrization (3.11) or (3.12) gives the constraint
|1 − ae^{−jω}|² [r0, r1; r1, r0] − (1 − a²) [1, e^{jω}; e^{−jω}, 1] ≥ 0, ∀ω    (3.18)
First of all the performance constraint is now a linear inequality in the
variables r0 and r1 . The complication is then the frequency-wise constraint. One solution is to sample the frequency axis and obtain one
inequality constraint for each sample point. An alternative solution is to
use the fact that (3.18) can be seen as a constraint on a finite-dimensional
multivariable spectrum. Such constraints can be converted into linear
finite dimensional constraints as will be shown in Section 3.2. Solutions and constraints to (3.17) are presented in Figure 3.2. Solutions
for a = 0, 0.2, 0.4, 0.6, 0.8 with the partial correlation parametrization
(3.12) of the input spectrum are given by (*). Corresponding solutions for a white input are given by (+). For a = 0 the solutions coincide. For larger values of a, the optimal solution is obtained for r1 ≠ 0; the white input is thus not optimal when a ≠ 0.
We can also see what the solutions are when the finite spectrum
parametrization (3.11) is used. The bound of (3.18) for a = 0.6 is
given in Figure 3.2 by the dotted line. Furthermore, the bounds for
the parametrization (3.11) are given by the thin lines as in Figure 3.1.
The solutions will lie on the dotted line left of the optimal solution depending on the value of M . The solutions for M = 0 and M = 1 are
plotted. It is easy to verify that for a large enough M the solution with
the finite spectrum parametrization (3.11) will coincide with the optimal
solution obtained by the partial correlation parametrization. Solutions
for the other values of a can easily be extracted from the figure in a
similar manner.
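The frequency-wise matrix inequality (3.18) is straightforward to evaluate on a grid, using the standard scalar test for positive semidefiniteness of a 2×2 Hermitian matrix (nonnegative diagonal and determinant). A Python sketch (standard library only; `constraint_318_holds` is our name, and grid sampling only approximates the for-all-ω condition):

```python
import cmath
import math

def constraint_318_holds(r0, r1, a, n_grid=2000, tol=1e-9):
    """Check (3.18) on a frequency grid: the 2x2 Hermitian matrix
    |1 - a e^{-jw}|^2 [[r0, r1], [r1, r0]] - (1 - a^2) [[1, e^{jw}], [e^{-jw}, 1]]
    must be positive semidefinite for every w."""
    for n in range(n_grid):
        w = 2 * math.pi * n / n_grid
        weight = abs(1 - a * cmath.exp(-1j * w)) ** 2
        p = weight * r0 - (1 - a * a)                 # both diagonal entries
        q = weight * r1 - (1 - a * a) * cmath.exp(1j * w)
        if p < -tol or p * p - abs(q) ** 2 < -tol:
            return False
    return True
```

For a = 0 this reproduces the solution r0 = 2, r1 = 0 quoted in the caption of Figure 3.2, and it shows how much more power a white input (r1 = 0) needs when a is large, in line with the discussion above.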
In Section 3.2 we will further elaborate on and generalize the concepts
of finite spectrum parametrizations and partial correlation parametrizations. Example 3.4 illustrates some of the main differences between
these two concepts. The finite spectrum parametrization does in general
require more parameters than the partial correlation parametrization to
yield the same solution. The main motivation to use the finite spectrum
parametrization is its capability to handle frequency-wise constraints on
the input or output spectra. Such constraints can not be treated by a
partial correlation parametrization since we are not working with the
complete spectrum.
In Example 3.4, we introduced a very simple frequency-by-frequency
constraint. In control applications, it is common to have frequency-by-frequency conditions on the error in the frequency function estimate. To
illustrate this consider the weighted relative error
∆(e^{jω}, θ) = T(e^{jω}) (Go(e^{jω}) − G(e^{jω}, θ)) / G(e^{jω}, θ)    (3.19)
Figure 3.2: Illustration of Example 3.4. Solid lines correspond to different bounds of spectrum parametrizations as in Figure 3.1. Solutions are given for partial correlation parametrizations (*) and white noise inputs (+) for a ∈ {0, 0.2, 0.4, 0.6, 0.8}. The solution for a = 0 is given by r0 = 2 and r1 = 0. The dotted line describes the boundary for the quality constraint in (3.17) for a = 0.6. From this boundary, the solution for the finite spectrum parametrization with M = 1 can be graphically obtained (+).
where Go and G(θ) are the true system and the model, respectively,
and where T is a weighting function. When T is equal to the designed
complementary sensitivity function, the H∞ −norm of (3.19) has been
considered as a relevant measure of both robust stability and robust performance (Morari and Zafiriou, 1989; Zhou et al., 1996; Hjalmarsson and
Jansson, 2003); e.g. ‖∆(θ)‖∞ < 1 is a classical robust stability condition.
When the model G(θ) is obtained from an identification experiment it
will lie in an uncertainty set. A reasonable objective is therefore to design the identification experiment such that ∆(θ) becomes small for
all models in such an uncertainty set.
One way to measure the size of ∆(θ) is by considering its variance
which, using a first order Taylor approximation, can be expressed as
Var ∆(e^{jω}, θ̂N) ≈ |T(e^{jω})/Go(e^{jω})|² Var G(e^{jω}, θ̂N).

By using the variance approximation (2.23) of G, the variance of ∆(θ) can be approximated as
Var ∆(e^{jω}, θ̂N) ≈ |T(e^{jω})/Go(e^{jω})|² (1/N) (dG*(e^{jω}, θo)/dθ) P (dG(e^{jω}, θo)/dθ)    (3.20)
which is an explicit function of P . Different quality measures can now
be formulated as constraints on some norm of (3.20). Some examples of
constraints are
∫_{−π}^{π} |T(e^{jω})/Go|² (dG*(e^{jω}, θo)/dθ) P (dG(e^{jω}, θo)/dθ) dω ≤ 1    (3.21)
and
max_ω |T(e^{jω})/Go(e^{jω})|² (dG*(e^{jω}, θo)/dθ) P (dG(e^{jω}, θo)/dθ) ≤ 1.    (3.22)
In Section 3.6, we will show how to treat quality constraints such as (3.21)
and (3.22).
One alternative to different measures of the variance of ∆ is to use
the confidence bound in (2.20). This gives
|∆(e^{jω}, θ)| ≤ 1, ∀ω, ∀θ ∈ Uθ    (3.23)
Uθ = {θ : N(θ − θo)^T P^{-1}(Φu)(θ − θo) ≤ χ}.
This constraint implies that ∆∞ ≤ 1 for all models in the confidence
region associated with the (to be) identified model. Constraints like (3.23)
are treated in Section 3.7.
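Constraint (3.23) couples every frequency with the whole parameter ellipsoid. For the two-parameter FIR example of this chapter, the worst case over Uθ can be explored by gridding the ellipsoid boundary and the frequency axis. The Python sketch below is only a heuristic sanity check, not a certificate: with G(θ) in the denominator the maximization is not convex, and all numbers used are illustrative assumptions of ours (T = 1, made-up θo, P^{-1}, N and χ):

```python
import cmath
import math

def worst_delta(theta_o, Pinv, N, chi, n_dir=180, n_freq=200):
    """Grid search for max over w and over theta on the boundary of
    U_theta = {theta : N (theta - theta_o)^T Pinv (theta - theta_o) <= chi}
    of |Delta| = |Go - G(theta)| / |G(theta)| (weighting T = 1), for the
    FIR model G(q, theta) = theta_1 q^{-1} + theta_2 q^{-2}."""
    b0, b1 = theta_o
    worst = 0.0
    for k in range(n_dir):
        t = 2 * math.pi * k / n_dir
        u0, u1 = math.cos(t), math.sin(t)
        quad = Pinv[0][0] * u0 * u0 + 2 * Pinv[0][1] * u0 * u1 + Pinv[1][1] * u1 * u1
        s = math.sqrt(chi / (N * quad))      # step length to the ellipsoid boundary
        th0, th1 = b0 + s * u0, b1 + s * u1
        for m in range(n_freq):
            w = math.pi * (m + 0.5) / n_freq
            z1, z2 = cmath.exp(-1j * w), cmath.exp(-2j * w)
            Go = b0 * z1 + b1 * z2
            G = th0 * z1 + th1 * z2
            worst = max(worst, abs((Go - G) / G))
    return worst

theta_o = (1.0, 0.5)
Pinv = [[2.0, 0.5], [0.5, 2.0]]
```

Shrinking χ shrinks Uθ, and for χ = 0 the set collapses to θo where ∆ vanishes, which is what an experiment design based on (3.23) exploits: the input shapes P^{-1} so that the whole set satisfies the bound.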
Figure 3.3 illustrates different measures of ∆ that today can be handled in an optimal experiment design scheme. The constraints in the right column all relate back to the high order variance expression (2.24).
For designs based on this type of constraints we refer to Section 1.2.2
and the references therein. It should be emphasized that the developed
methods are not restricted to the ∆-function. The constraints such as
(3.21), (3.22) and (3.23) only serve as good illustrations of the potential
of the newly derived methods.
Figure 3.3: The diagram presents different ways to incorporate a performance measure in terms of ∆, the weighted relative model error, into performance constraints for input design. Five possible performance constraints are represented. The two expressions in the upper shaded area represent H₂-norms of the covariance of ∆. The three expressions in the lower shaded area correspond to different H∞-norms of ∆.
3.1.2 Signal Constraints
To obtain well-posed designs, signal constraints must be included. The
framework allows for several different types of signal constraints as long
as they are linear in the spectrum Φχo . Examples of constraints are
energy constraints like
(1/(2π)) ∫_{−π}^{π} |Wu(e^{jω})|² Φu(ω) dω ≤ 1
or frequency-wise constraints like
αu(ω) ≤ Φu(ω) ≤ βu(ω), ∀ω.

Which signal constraints can be included will depend on the parametrization of the involved spectra. Different signal constraints will be treated
in Section 3.4.
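For an FIR-type parametrization, a frequency-wise signal constraint of this kind can be evaluated directly. A small Python sketch (standard library only; `spectrum_within_bounds` is our name, and the grid only approximates the for-all-ω condition):

```python
import math

def spectrum_within_bounds(r, alpha, beta, n_grid=1000, tol=1e-9):
    """Check alpha(w) <= Phi_u(w) <= beta(w) on a frequency grid for the
    parametrization Phi_u(w) = r[0] + 2 * sum_k r[k] cos(k w)."""
    for n in range(n_grid):
        w = 2 * math.pi * n / n_grid
        phi = r[0] + 2 * sum(r[k] * math.cos(k * w) for k in range(1, len(r)))
        if phi < alpha(w) - tol or phi > beta(w) + tol:
            return False
    return True
```

Since Φu is linear in the coefficients r, such constraints stay linear in the design variables, which is what makes them easy to include in the convex programs of this chapter.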
3.1.3 Objective Functions
Traditionally, the objective in experiment design has been to optimize
some performance criterion subject to signal constraints, see e.g. (Goodwin
and Payne, 1977; Yuan and Ljung, 1985; Gevers and Ljung, 1986; Hjalmarsson et al., 1996; Zhu and van den Bosch, 2000). For industrial applications, however, perhaps more relevant measures are the excitation level and
experiment time, treated e.g. in (Bombois et al., 2004d) and (Rivera et
al., 2003). In the presented framework for experiment design there is a
large flexibility in the choice of objective function. One can, e.g. choose
whether the objective is to optimize some model quality measure (given
bounds on, e.g. input energy) or to optimize some signal quantity such
as the input energy (given some model quality constraints).
3.1.4 An Introductory Example
An optimal experiment design problem can now be formulated based
on the different parts described in Section 3.1.1, Section 3.1.2 and Section 3.1.3. To illustrate this, consider input design for the system
described by
y(t) = (Bo(q)/Ao(q)) u(t) + (1/Ao(q)) e(t)    (3.24)
Figure 3.4: Input design for the resonant system. Thin solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system.
where

Ao(q) = 1 − 1.99185q^{−1} + 2.20265q^{−2} − 1.84083q^{−3} + 0.89413q^{−4}
Bo(q) = 0.10276q^{−3} + 0.18123q^{−4}.
Here e(t) is zero mean white noise with variance 0.05. This is a resonant
system that describes a flexible transmission system, proposed in (Landau
et al., 1995a) as a benchmark problem for robust control design. The
input design problem is formulated as:
minimize_{Φu} α
subject to |∆(e^{jω}, θ)| ≤ γ, ∀ω, ∀θ : N(θ − θo)^T P^{-1}(Φu)(θ − θo) ≤ χ
           (1/(2π)) ∫_{−π}^{π} Φu(ω) dω ≤ α
           0 ≤ Φu(ω) ≤ β(ω), ∀ω    (3.25)
Figure 3.5: Input design for the resonant system. Thin solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thick solid line: upper bound on Φu.
The objective of this input design problem is to find the input spectrum with the least energy that satisfies (3.23). There may also exist a frequency-by-frequency constraint on the input spectrum, here represented by β(ω). We choose the input spectrum to be parametrized as
Φu(ω) = Σ_{k=−M}^{M} rk e^{−jωk} ≥ 0, ∀ω.    (3.26)
The solution to (3.25) without a frequency-by-frequency upper bound
on the spectrum is shown in Figure 3.4 (i.e. β(ω) = ∞). Here we have
used M = 35. To reduce the required input power, the design concentrates most of the input power around the first resonance peak, which
intuitively seems reasonable. In practice there may exist constraints on
the input excitation in different frequency bands. Figure 3.5 illustrates a solution when there is a frequency-wise bound on the input spectrum that, e.g., constrains possible excitation around the first resonance peak. In
this setting we have used M = 50. This resonant system will also appear
in Chapter 5.
After this introduction we will now give a more theoretical description of the different parts of the framework. The subsequent sections are organized as follows. Section 3.2 introduces two methods for parametrizing a spectrum. These parametrizations are used for parametrizing the asymptotic covariance matrix in Section 3.3. Different signal constraints are considered in Section 3.4 and experiment design in closed-loop is treated in Section 3.5. How to handle different quality constraints is described in Section 3.6 and Section 3.7. In Section 3.8 we will consider optimal design in open-loop for the case of biased noise models. Some comments on the computational complexity are given in Section 3.9 and different aspects of robustness regarding the dependence of the solution on the true system are discussed in Section 3.10. Finally, a summary of the framework together with a numerical example is given in Section 3.11.
3.2 Parametrization and Realization of the Input Spectrum
We will consider identification in open-loop in Sections 3.2-3.4. The generalization to closed-loop operation is given in Section 3.5.
3.2.1 Introduction
The expression (2.14) for the inverse of the asymptotic covariance matrix
P shows that the spectrum is the only input related quantity that has
any influence on P . Since any quality measure is a function of P asymptotically in the sample size when only variance errors are present, this
means that the input spectrum is the only quantity that can be used to
negotiate quality constraints. Generally, a spectrum may be written
Φu(ω) = Σ_{k=−∞}^{∞} c̃k Bk(e^{jω})    (3.27)
for some basis functions {Bk}_{k=−∞}^{∞} which span L2. Without loss of
generality we will assume that B−k = Bk∗ such that c̃−k = c̃k so that the
spectrum is uniquely characterized by c̃0 , c̃1 , c̃2 , . . .. We will also assume
46
3 Fundamentals of Experiment Design
that Bk (e−jω ) = Bk∗ (ejω ) so that c̃k ∈ R. The coefficients c̃k must be such
that

Φu(ω) ≥ 0, ∀ω.    (3.28)
For the most common choice of basis functions, Bk (ejω ) = e−jωk , the
coefficients c̃k have the interpretation as auto-correlations:
c̃k ≜ rk = E u(t)u(t − k).
We will therefore reserve the notation rk for the parameters when working
with complex exponential basis functions.
As we will see in Sections 3.3, 3.6 and 3.7 many quality constraints
can be transformed such that they become linear in the input spectrum.
It is thus natural to parametrize the spectrum in terms of the coefficients
c̃k . However, it is impractical to use an infinite number of parameters so
the parametrization has to be restricted. One possibility is to use a finite
dimensional parametrization2 , i.e.
M
−1
Φu (ω) =
c̃|k| Bk (ejω )
(3.29)
k=−(M −1)
for some positive integer M . Here one has to impose the condition (3.28)
to ensure that Φu indeed is a spectrum. We will denote this type of
approach as a “finite dimensional spectrum parametrization”.
Instead of parametrizing the spectrum, one may equivalently work
with a parametrization of the positive real part
Φu(ω) = Ψ(e^{jω}) + Ψ*(e^{jω}),   Ψ(e^{jω}) = Σ_{k=0}^{M−1} ck Bk(e^{jω}).    (3.30)
This will be our preferred choice when it comes to finite dimensional
spectrum parametrizations.
Alternatively, one may use a partial expansion
Σ_{k=−(M−1)}^{M−1} c̃|k| Bk(e^{jω})    (3.31)
² We will frequently use the notation c̃|k| for the free parameters to emphasize that the basis functions have been chosen such that c̃−k = c̃k.
which may not be a spectrum in itself, but constrained such that there exist additional coefficients c̃|k|, k = M, M + 1, . . . such that the expansion (3.27) satisfies the non-negativity condition (3.28). This approach,
which we denote as a “partial correlation parametrization”, thus enables
one to work with infinite dimensional expansions.
When the input design problem under consideration only depends on
the first M expansion coefficients c̃k , k = 0, 1, . . . , M − 1, the partial
correlation parametrization has lower complexity than the finite dimensional spectrum parametrization. On the other hand, when there are
frequency-by-frequency constraints on the input (and output) spectrum
of the system, a finite dimensional spectrum parametrization is in general
required.
Below we will discuss these two approaches in more detail.
3.2.2 Finite Dimensional Spectrum Parametrization
With a finite dimensional spectrum parametrization such as (3.30), the
parameters ck have to be constrained such that the positivity condition (3.28) holds. To handle this condition, we will use ideas from FIR-filter design (Wu et al., 1996). This approach is based on the positive
real lemma which is a consequence of the Kalman-Yakubovich-Popov
(KYP) lemma, see e.g. (Yakubovich, 1962; Rantzer, 1996). The key idea
is to postulate that the spectrum should be realizable using an M -th
order FIR-filter. Using this restriction on the input spectrum, finite-dimensional constraints can be used to represent constraints on the input and the output. Since any spectrum can be approximated by an FIR-process to any demanded accuracy, the approach is in principle generally applicable. However, when M becomes too large, computational
complexity becomes an issue.
This idea was originally introduced for input design in (Lindqvist and
Hjalmarsson, 2001). Here we will generalize the idea by imposing a finite-dimensional linear parametrization of the input spectrum in which the
parametrization used in (Lindqvist and Hjalmarsson, 2001) appears as
a special case. We also point out that this type of parametrization has
been used in parameter identification (Stoica et al., 2000).
We will employ the parametrization (3.30) of the positive real part
of the spectrum, where {Bk } is a set of known proper stable finite dimensional rational transfer functions, e.g. Laguerre functions (Wahlberg,
1991) or Kautz functions (Wahlberg, 1994). When Bk (ejω ) = e−jωk we
have the FIR case with the sequence {ck } corresponding to the correlation
coefficients {rk }. Notice that it is not necessary for the basis functions
to be orthogonal; but, naturally, they should be linearly independent.
To ensure that the spectral constraint (3.28) is satisfied, the following
result may be used.
Lemma 3.1
Let A, B, C, D be a controllable state-space realization of the positive real part of the input spectrum, Ψ(e^{jω}) = Σ_{k=0}^{M−1} ck Bk(e^{jω}). Then there exists a Q = Q^T such that

K(Q, {A, B, C, D}) ≜ [Q − A^T QA, −A^T QB; −B^T QA, −B^T QB] + [0, C^T; C, D + D^T] ≥ 0    (3.32)

if and only if

Φu(ω) ≜ Σ_{k=0}^{M−1} ck [Bk(e^{jω}) + Bk*(e^{jω})] ≥ 0, ∀ω.
Proof: This is an application of the Positive Real Lemma (Yakubovich, 1962).

The idea is to let A, B, C, D be a state-space realization of the positive real part of the input spectrum, Ψ(e^{jω}) = Σ_{k=0}^{M−1} ck Bk(e^{jω}), where {ck} appears linearly in C and D. It is easy to construct such a realization since {ck} appears linearly in Ψ(e^{jω}). Given this realization and the symmetric matrix Q, the constraint Φu(ω) ≥ 0 may be replaced with the linear matrix inequality (3.32). We illustrate this with an example.
Example 3.5
When the input is shaped by an FIR filter, the positive real part of the spectrum becomes

Ψ(e^{jω}) = (1/2) r0 + Σ_{k=1}^{M−1} rk e^{−jωk}.

For a FIR system, a natural choice of state-space description for the positive real part is the controllable form:

A = [0, O_{1×(M−2)}; I_{M−2}, O_{(M−2)×1}],   B = [1 0 . . . 0]^T,
C = [r1 r2 . . . rM−1],   D = (1/2) r0,    (3.33)
where Om×k is the zero matrix of size m by k and Im is the identity matrix
of size m by m. With this parametrization, the correlation sequence
appears linearly in the inequality (3.32) through C and D. Hence (3.32)
becomes an LMI in Q and r = [r0 , . . . , rM −1 ]T .
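The realization (3.33) can be sanity-checked by direct evaluation: Ψ(e^{jω}) + Ψ*(e^{jω}) = 2 Re Ψ(e^{jω}) should equal Φu(ω) = r0 + 2 Σ rk cos kω. A Python sketch for the case M = 3 (standard library only; our own function names; the 2×2 lower-shift A makes (zI − A)^{-1}B solvable by forward substitution, which is hard-coded below):

```python
import cmath
import math

def psi_via_state_space(r, w):
    """Psi(e^{jw}) = D + C (zI - A)^{-1} B for the realization (3.33) with
    M = 3: A is the 2x2 lower shift, B = [1, 0]^T, C = [r1, r2], D = r0/2."""
    z = cmath.exp(1j * w)
    # (zI - A) x = B by forward substitution: z*x1 = 1, -x1 + z*x2 = 0
    x1 = 1.0 / z
    x2 = x1 / z
    return 0.5 * r[0] + r[1] * x1 + r[2] * x2

def phi_direct(r, w):
    """Phi_u(w) = r0 + 2 r1 cos(w) + 2 r2 cos(2w)."""
    return r[0] + 2 * r[1] * math.cos(w) + 2 * r[2] * math.cos(2 * w)

r = (2.0, 0.5, -0.3)
errors = [abs(2 * psi_via_state_space(r, w).real - phi_direct(r, w))
          for w in (0.0, 0.7, 1.9, 3.0)]
```

Since {rk} enters only through C and D, the KYP inequality (3.32) built from this realization is indeed linear in (Q, r).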
When a linear and finite-dimensional parametrization of the input
spectrum is used, it becomes easy to construct a state-space description
of the positive real part of the spectrum, see e.g. Example 3.5. Given
that a state-space description of the positive real part of the spectrum is
available there exist directly applicable results to perform spectral factorization. By solving an algebraic Riccati equation it is possible to obtain
an innovations representation corresponding to the desired spectral properties, see (Anderson and Moore, 1979, Chap. 9) and (Lindqvist, 2001).
One may also use discrete spectra, i.e. spectra from periodic inputs.
A discrete spectrum with K non-negative spectral lines distributed over (−π, π] is given by

Φ(ω) = Σ_{k=−∞}^{∞} c̃k δ(ω − ωk)    (3.34)

where

c̃k ≥ 0,   c̃k+K = c̃k,   c̃−k = c̃k.    (3.35)

For the frequencies it holds that

ωk+K = ωk + 2π,   ω−k = −ωk.    (3.36)
For such spectra the positive real lemma simplifies to the first condition in (3.35). This type of parametrization will be used in Chapter 4.
3.2.3 Partial Correlation Parametrization
When using a partial correlation parametrization one must ensure that
there exists an extension c̃M , c̃M +1 , c̃M +2 , . . . of the sequence c̃0 , . . . , c̃M −1
such that the corresponding basis function expansion (3.27) defines a
spectrum.
Here, we will restrict attention to the case where the basis functions are complex exponentials so that the parameters {c̃k } are the autocorrelation coefficients {rk }. The correlation extension problem3 is then
known as the trigonometric moment problem, or as the Carathéodory
extension problem. It is well known that a necessary and sufficient condition for the existence of such an extension is that the Toeplitz matrix
T = [r0, r1, · · · , rM−1; r−1, r0, · · · , rM−2; ⋮ ; r−(M−1), r−(M−2), · · · , r0]    (3.37)
is positive definite (Grenander and Szegö, 1958; Byrnes et al., 2001).
Notice that T ≥ 0 is an LMI in the rk and hence a convex constraint.
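The Toeplitz condition can be checked with a few lines of Python (standard library only; `is_psd` is a Sylvester-type pivot test and `extendable` is our name, not established terminology):

```python
def is_psd(M, tol=1e-9):
    """Positive-semidefiniteness test via symmetric Gaussian elimination:
    all pivots must be nonnegative (a zero pivot forces a zero row)."""
    n = len(M)
    A = [row[:] for row in M]
    for k in range(n):
        piv = A[k][k]
        if piv < -tol:
            return False
        if piv <= tol:
            if any(abs(A[k][j]) > 1e-6 for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = A[i][k] / piv
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return True

def extendable(r):
    """(3.37): r = (r0, ..., r_{M-1}) is a partial correlation sequence iff
    the M x M symmetric Toeplitz matrix built from it is positive semidefinite."""
    n = len(r)
    T = [[r[abs(i - j)] for j in range(n)] for i in range(n)]
    return is_psd(T)
```

For M = 2 this reduces to the condition r0 ≥ |r1| used in Example 3.3.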
A complete characterization of all spectra having rk , k = 0, . . . , M −1
as first expansion coefficients is given by the Schur parameters (Schur,
1918; Porat, 1994).
For rational expansions, the so-called maximum entropy solution is
to use an all-pole, or AutoRegressive (AR), filter of order M − 1. The
solution is easily obtained via the Yule-Walker equations (Stoica and
Moses, 1997). When there are prescribed zeros, the solution can be computed via convex optimization (Byrnes et al., 2001).
It is also possible to use a discrete spectrum of the form (3.34). All
partial correlation sequences r0 , . . . , rM −1 can be generated by a discrete
spectrum (Grenander and Szegö, 1958; Payne and Goodwin, 1974). However, the frequencies ωk do not necessarily coincide with the fundamental
frequencies 2πk/N . In (Hildebrand and Gevers, 2003) one solution is
presented where M sinusoids are determined. The procedure becomes
quite involved since, apart from the amplitudes, also the locations of the
frequencies ωk have to be determined.
One may also attempt to realize a discrete spectrum of the form (3.34)
³ Or, as is more common, the covariance extension problem.
with the frequencies evenly distributed, ωk = 2πk/K. It holds that

rk = Σ_{l=0}^{K−1} c̃l e^{j(2π/K)kl},   k = 0, . . . , M − 1.    (3.38)
Hence, if, for given r0 , . . . , rM −1 , there exists a feasible solution {c̃k }
to the constraints (3.35) and (3.38), the spectrum (3.34) has the desired
partial correlation sequence. Notice that there is no guarantee that there
exists such a solution. In general, increasing the number of spectral lines
K may increase the chances of a feasible solution.
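One simple feasible construction, when the finite expansion itself happens to be nonnegative, is to sample it: take c̃l = Φ(ωl)/K at ωl = 2πl/K. A Python sketch (standard library only; our own function names; by DFT orthogonality the partial correlations are reproduced exactly provided K ≥ 2M − 1):

```python
import math

def discrete_spectrum_weights(r, K):
    """Weights c_l = Phi(w_l)/K at w_l = 2*pi*l/K, where
    Phi(w) = r[0] + 2 * sum_k r[k] cos(k w). If all c_l >= 0, this
    discrete spectrum realizes the partial correlations r via (3.38)
    (assumes K >= 2*len(r) - 1)."""
    M = len(r)
    return [(r[0] + 2 * sum(r[k] * math.cos(2 * math.pi * l * k / K)
                            for k in range(1, M))) / K for l in range(K)]

def realized_correlation(c, k):
    """r_k = sum_l c_l e^{j w_l k}, real by symmetry, cf. (3.38)."""
    K = len(c)
    return sum(c[l] * math.cos(2 * math.pi * l * k / K) for l in range(K))

c = discrete_spectrum_weights([2.0, 0.5], K=8)
```

When some sampled weight comes out negative this particular construction fails, illustrating the remark above that feasibility is not guaranteed for a fixed grid.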
Before we proceed we remark that the discussion in this section also
applies to parametrizations of the type
Bk(e^{jω}) = L(e^{jω}) e^{−jωk},   L(e^{jω}) > 0.    (3.39)

Since L is positive it must hold that Σk c̃k e^{−jωk} also is positive and, hence, the discussion above applies to this factor. This type of basis
functions will appear in Example 3.8 below.
3.2.4 Summary
We have in this section described different ways of parametrizing the
input spectrum. The starting point has been the finite expansion
Σ_{k=−(M−1)}^{M−1} c̃|k| Bk(e^{jω}).    (3.40)
Two principally different approaches have been delineated:
• Finite dimensional spectrum parametrization. In this approach, one
includes the constraint that (3.40) is a spectrum. As shown in
Section 3.2.2, doing this via the KYP lemma leads to a convex
constraint. For discrete spectra, the condition is simplified to scalar
non-negativity conditions (3.35).
• Partial correlation parametrization. Here one includes a convex constraint that ensures that {c̃k}_{k=−(M−1)}^{M−1} is a partial correlation sequence. These parameters are shaped according to the design criteria that exist. Then, in a second step, stochastic realization
theory is used to extend the resulting expansion (3.40) to a bona
fide spectrum
Φu(ω) = Σ_{k=−∞}^{∞} c̃|k| Bk(e^{jω}).
In Section 3.2.3 we saw that there exist a number of methods for this.
The partial correlation parametrization has the advantage of keeping
the number of free parameters to a minimum. One disadvantage is that
spectral constraints on signals cannot be handled as one is not working
with the complete spectrum. This is, as we will see in Section 3.4, not a
problem with the finite dimensional spectrum parametrization.
The positivity condition employed in the finite dimensional spectrum
parametrization is more restrictive than the partial correlation sequence
condition. Hence the former approach leads in general to more conservative results than the latter approach.
A key insight is that both types of parametrizations are based on a
finite dimensional parametrization of the type (3.40) which leads to a
linear parametrization of P −1 in the free variables. We shall therefore
in Section 3.6 and Section 3.7 see how various quality constraints can be
transformed to be linear in P −1 . Before this, however, we analyze how
the asymptotic covariance matrix P , defined in (2.12), is parametrized
by the input spectrum.
3.3 Parametrizations of the Covariance Matrix
In this section we will consider how to obtain a linear and finite dimensional parametrization of the inverse covariance matrix P^{-1}. In Section 3.3.1 the parametrizations are based on partial correlation parametrizations of the input spectrum. This leads to complete parametrizations of
P −1 . In Section 3.3.2 another type of parametrization based on a finite
dimensional spectrum parametrization is presented.
3.3.1 Complete Parametrizations of the Covariance Matrix
In this section we will consider how to parametrize the input spectrum
such that all possible covariance matrices P , defined in (2.12), can be
generated. In particular we are interested in minimal parametrizations,
i.e. those using a minimal number of parameters. Such parametrizations
are of interest from a computational point of view. The starting point is
the expression (2.14) for the inverse of the covariance matrix P that for
open-loop operation is given by
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} F_u(e^{j\omega},\theta_o)\,\Phi_u(\omega)\,F_u^*(e^{j\omega},\theta_o)\,d\omega + R_o(\theta_o) \tag{3.41}
\]
where Φu is the spectrum of the input and F_u(θo) = H^{-1}(θo)\,dG(θo)/dθ.
Furthermore, Ro is defined as
\[
R_o(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi} F_e(e^{j\omega},\theta_o)\,F_e^*(e^{j\omega},\theta_o)\,d\omega \tag{3.42}
\]
where F_e(θo) = H^{-1}(θo)\,dH(θo)/dθ. Since the elements of F_u span a linear subspace, it follows that the set of all covariance matrices can be
parametrized in terms of finite dimensional parametrizations of Φu . In
the sub-sections that follow we will discuss some possibilities that exist
to this end. As a preparation we characterize the space spanned by the
elements of Fu (θo )Fu∗ (θo ). We denote this space by X(θo ) and we will
assume that it has dimension 2nG − 1 for some non-negative integer nG .
We motivate this assumption with an example.
Example 3.6
Consider a Box-Jenkins model structure
y(t) =
C(q, θ)
q −nk B(q, θ)
u(t) +
e(t)
A(q, θ)
D(q, θ)
where A(q, θ), B(q, θ), C(q, θ) and D(q, θ) are defined by (2.4)-(2.7).
Then
\[
F_u(q,\theta_o) = H^{-1}(q,\theta_o)\,\frac{dG(q,\theta_o)}{d\theta}
= H^{-1}(q,\theta_o)
\begin{pmatrix}
\dfrac{q^{-n_k}}{A(q,\theta_o)}\\[1mm]
\vdots\\[1mm]
\dfrac{q^{-n_k-n_b+1}}{A(q,\theta_o)}\\[1mm]
-\dfrac{q^{-1}B(q,\theta_o)}{A^2(q,\theta_o)}\\[1mm]
\vdots\\[1mm]
-\dfrac{q^{-n_a}B(q,\theta_o)}{A^2(q,\theta_o)}\\[1mm]
0\\
\vdots\\
0
\end{pmatrix}
\tag{3.43}
\]
which, assuming B(θo) and A(θo) to be coprime, implies that
\[
X(\theta_o) = \operatorname{Span}\left\{ e^{-j(n_G-1)\omega}L^{-1}(e^{j\omega},\theta_o),\;\ldots,\; e^{j(n_G-1)\omega}L^{-1}(e^{j\omega},\theta_o) \right\} \tag{3.44}
\]
where n_G = n_b + n_a and
\[
L(e^{j\omega},\theta_o) = |H(e^{j\omega},\theta_o)|^2\,|A(e^{j\omega},\theta_o)|^4. \tag{3.45}
\]
Since the elements of Fu are rational and have the poles bounded
away from the unit circle it follows that X(θo ) ⊂ L2 . Let {Bk (θo ), k =
−(nG − 1), . . . , nG − 1} denote an orthonormal basis for X(θo ) which is
such that⁴ B_{-k}(θo) = B_k^*(θo). We can thus write
\[
F_u(e^{j\omega},\theta_o)\,F_u^*(e^{j\omega},\theta_o) = \sum_{k=-(n_G-1)}^{n_G-1} F_k(\theta_o)\,B_k(e^{j\omega},\theta_o) \tag{3.46}
\]
for some matrices Fk (θo ) ∈ Rn×n . Next, we will use (3.46) to characterize
the set of asymptotic covariance matrices defined in (3.41).
⁴Due to the symmetry of F_u(e^{jω},θo)F_u^*(e^{jω},θo) this is always possible, cf. Example 3.6.
Subspace expansions
We can write any input spectrum in the form
\[
\Phi_u(\omega) = \sum_{k=-(n_G-1)}^{n_G-1} \tilde c_{|k|}\,B_k(e^{j\omega},\theta_o) + \Phi_u^{\perp}(\omega) \tag{3.47}
\]
where Φ_u^⊥ is orthogonal to X(θo). Using (3.46) and (3.47) in (3.41) gives
\[
P^{-1}(\theta_o) = \frac{1}{\lambda_o}\sum_{k=-(n_G-1)}^{n_G-1} \tilde c_{|k|}\,F_k(\theta_o) + R_o(\theta_o). \tag{3.48}
\]
This shows that the finite sequence c̃_0, …, c̃_{n_G−1} completely parametrizes all possible covariance matrices. It is also clear that the expansion does not have to be orthonormal as long as the basis functions in (3.47) span X(θo). Notice that (3.48) then no longer holds; linear combinations of the F_l(θo), l = −(n_G−1), …, n_G−1, will replace F_k(θo).
Example 3.7 (Example 3.6 continued)
The elements in (3.44) can be used as the first basis functions in (3.47)
in the Box-Jenkins case.
Oblique expansions
It is not necessary that the basis functions are elements of the subspace
X(θo ). We illustrate this with an example.
Example 3.8 (Example 3.6 continued)
Let the spectrum be parametrized with the basis functions
\[
B_k(e^{j\omega},\theta_o) = L(e^{j\omega},\theta_o)\,e^{-j\omega k} \tag{3.49}
\]
where L is given by (3.45). It then follows from (3.41) and (3.43) that
\[
P^{-1}(\theta_o) = \sum_{k=-(n_G-1)}^{n_G-1} \tilde c_{|k|}\,L_k(\theta_o) + R_o(\theta_o) \tag{3.50}
\]
for some matrices L_k(θo) ∈ R^{n×n}. Hence, c̃_0, …, c̃_{n_G−1} parametrize all possible covariance matrices.
Notice that L(ejω , θo )e−jωk is not orthogonal to X(θo ) when |k| < nG
but that these terms in general do not belong to X(θo ). The remaining
terms (corresponding to |k| ≥ nG ) are orthogonal to X(θo ).
This type of parametrization has been considered in (Goodwin and
Payne, 1977), and been employed in, e.g. , (Hildebrand and Gevers, 2003).
We will now present a slight modification of this parametrization
that is useful for Box-Jenkins model structures. This parametrization
was originally introduced in (Stoica and Söderström, 1982).
Example 3.9 (Example 3.6 continued)
Define L̃ and {l̃_k} as
\[
\tilde L(e^{j\omega},\theta_o) = |C(e^{j\omega},\theta_o)|^2\,|A(e^{j\omega},\theta_o)|^4 = \sum_{k=-n_l}^{n_l} \tilde l_{|k|}\,e^{-j\omega k} \tag{3.51}
\]
where n_l = 2n_a + n_c − 1. Furthermore, introduce the auto-correlations c̃_k defined by
\[
\tilde c_{|k|} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)}\,e^{j\omega k}\,d\omega \tag{3.52}
\]
and let n_p = n_a + n_b + n_d − 1. It then follows from (3.41) and (3.43) that there exist matrices L̃_k(θo) ∈ R^{n×n} such that
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} \frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)} \sum_{k=-n_p}^{n_p} \tilde L_k(\theta_o)\,e^{-j\omega k}\,d\omega + R_o(\theta_o)
\]
which by (3.52) is equivalent to
\[
P^{-1}(\theta_o) = \frac{1}{\lambda_o}\sum_{k=-n_p}^{n_p} \tilde c_{|k|}\,\tilde L_k(\theta_o) + R_o(\theta_o). \tag{3.53}
\]
This parametrization can equivalently be obtained by using a spectrum with the basis functions
\[
B_k(e^{j\omega},\theta_o) = \tilde L(e^{j\omega},\theta_o)\,e^{-j\omega k}. \tag{3.54}
\]
The parametrization (3.53) is also complete but not minimal when nd >
0. The motivation for this over-parametrization is that it becomes possible to obtain finite dimensional parametrizations of input and output
power constraints also when nd > 0. This will be illustrated in Section 3.4.1.
Remark on robustness issues
In the preceding examples the basis functions depended on the true parameter θo and in general it is necessary to know θo in order to guarantee
that the basis functions constitute a complete parametrization. However, it is sufficient that these functions are such that their projections
on X(θo ) span X(θo ) themselves. Hence, even if θo is unknown, the set of
basis functions which do not completely parametrize the set of all covariance matrices has Lebesgue measure zero and, thus, the choice of basis
functions is not that critical.
3.3.2 A Parametrization Based on a Finite Dimensional Spectrum
With a finite dimensional spectrum parametrization of the input spectrum on the form (3.30), it is possible to express the inverse covariance
matrix P −1 (θo ) as an affine function of the variables {ck }, the sequence
that parametrizes the input spectrum.
Lemma 3.2
When the input signal has the spectrum (3.30), the inverse covariance
matrix is given by
\[
P^{-1}(\theta_o) = R_o(\theta_o) + \sum_{k=0}^{M-1} c_k\,B_k^P(\theta_o) \tag{3.55}
\]
where
\[
B_k^P(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} F_u(e^{j\omega},\theta_o)\,[B_k(e^{j\omega}) + B_k^*(e^{j\omega})]\,F_u^*(e^{j\omega},\theta_o)\,d\omega.
\]
Proof: Use the definition of the input spectrum, (3.30), and insert this
into (3.41).
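As a numerical illustration of Lemma 3.2, the sketch below builds the weight matrices B_k^P by numerical integration for a hypothetical first-order output-error model (G(q) = bq^{-1}/(1+aq^{-1}), H = 1, so F_e = 0 and Ro = 0) with the assumed basis B_k(e^{jω}) = e^{-jωk}, and checks that the affine expression (3.55) matches a direct evaluation of (3.41). The model, basis and coefficient values are illustrative assumptions, not from the thesis.

```python
import numpy as np

# Hypothetical first-order output-error model: G(q) = b q^-1/(1 + a q^-1), H = 1.
a, b, lam = -0.5, 1.0, 1.0                # "true" parameters and noise variance
w = np.linspace(-np.pi, np.pi, 4001)      # frequency grid for integration
z = np.exp(1j * w)
den = 1 + a / z
Fu = np.stack([-(b / z**2) / den**2,      # dG/da (H = 1)
               (1 / z) / den])            # dG/db

avg = lambda f: f.mean()                  # (1/2pi) * integral on a uniform grid

M = 3
ck = np.array([1.0, 0.3, -0.1])           # spectrum coefficients
Bk = [z**-k for k in range(M)]            # assumed basis B_k(e^{jw}) = e^{-jwk}
Phi_u = np.real(sum(c * (B + np.conj(B)) for c, B in zip(ck, Bk)))
assert Phi_u.min() > 0                    # a valid (positive) spectrum

# Direct evaluation of (3.41) with Ro = 0
Pinv_direct = np.array([[avg(np.real(Fu[i] * Phi_u * np.conj(Fu[j])))
                         for j in range(2)] for i in range(2)]) / lam

# Affine form (3.55): P^{-1} = sum_k c_k B_k^P, weights computed once
BkP = [np.array([[avg(np.real(Fu[i] * (B + np.conj(B)) * np.conj(Fu[j])))
                  for j in range(2)] for i in range(2)]) / lam for B in Bk]
Pinv_affine = sum(c * B for c, B in zip(ck, BkP))
assert np.allclose(Pinv_direct, Pinv_affine)
```

Because Φu is by construction a linear combination of the basis terms, the two evaluations agree exactly up to the (shared) discretization of the integral.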
3.3.3 Summary

It is possible to derive a finite dimensional parametrization of P^{-1} of the form
\[
P^{-1}(\theta_o) = \sum_{k=-M}^{M} \tilde c_{|k|}\,L_k(\theta_o) + R_o(\theta_o). \tag{3.56}
\]
The expression (3.56) can actually be obtained for both finite dimensional
spectrum parametrizations and partial correlation parametrizations of
the input spectrum. The difference between these two types of spectrum parametrizations is the constraints that are imposed on the variables {c̃k}. The consequence of these constraints is that finite dimensional
spectrum parametrizations typically lead to incomplete parametrizations
of P −1 . A good illustration of this is given in Example 3.3. This example
shows that it may not be necessary to parametrize all covariance matrices
to achieve the optimal design. The choice of parametrization depends
also on what type of input/output constraints that are considered in the
design, as will be illustrated in the next section. A partial correlation parametrization cannot, for example, handle frequency-wise constraints on the spectrum.
3.4 Parametrization of Signal Constraints
Here we will further explore the parametrization of the input spectrum,
defined in Section 3.2. The objective is to formulate different frequency
domain signal constraints as finite-dimensional affine functions in the
variables that define the input spectrum. The considered signal constraints are limitations on the input and/or output spectra in terms of
power as well as frequency-by-frequency constraints.
3.4.1 Parametrization of Power Constraints
For a finite dimensional spectrum parametrization the following result
applies regarding the parametrization of input and output variance constraints.
Lemma 3.3
The variance of z = Wu u, where Wu is a stable linear filter and u has
the spectrum (3.30), can be expressed as
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_u(e^{j\omega})|^2\,\Phi_u(\omega)\,d\omega = \sum_{k=0}^{M-1} c_k\,B_k^u \tag{3.57}
\]
with
\[
B_k^u = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_u(e^{j\omega})|^2\,[B_k(e^{j\omega}) + B_k^*(e^{j\omega})]\,d\omega. \tag{3.58}
\]
The variance of zy = Wy y, where Wy is a stable linear filter and y is the
output corresponding to the input u above, when the system is operating
in open-loop is given by
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=0}^{M-1} c_k\,B_k^y(\theta_o) + R_v(\theta_o) \tag{3.59}
\]
with
\[
B_k^y(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})\,G(e^{j\omega},\theta_o)|^2\,[B_k(e^{j\omega}) + B_k^*(e^{j\omega})]\,d\omega \tag{3.60}
\]
and
\[
R_v(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})|^2\,\Phi_v(\omega)\,d\omega. \tag{3.61}
\]
Proof: The affine expression (3.57) is an immediate consequence of the
definition of the spectrum. The expression (3.59) is obtained using the
fact that y = G(θo )u+v where u and v are uncorrelated due to open-loop
operation and thus the output spectrum becomes Φy = |G(θo )|2 Φu + Φv .
Thus, the power constraints can be formulated as linear inequalities
in the new variables {ck }.
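The linearity in Lemma 3.3 is easy to verify numerically. The sketch below assumes the basis B_k(e^{jω}) = e^{-jωk} and an arbitrary first-order weighting filter W_u (both illustrative choices, not fixed by the thesis), computes the weights B_k^u of (3.58) once, and checks that (3.57) reproduces the directly integrated filtered input power.

```python
import numpy as np

# Numerical check of (3.57)-(3.58): the filtered input power is linear in the
# spectrum coefficients c_k. The basis and W_u below are illustrative assumptions.
w = np.linspace(-np.pi, np.pi, 4001)
z = np.exp(1j * w)
Wu2 = np.abs(1.0 / (1 - 0.7 / z))**2           # |W_u(e^{jw})|^2, first-order filter

M = 3
ck = np.array([1.0, 0.4, 0.1])
Bk = [z**-k for k in range(M)]
Phi_u = np.real(sum(c * (B + np.conj(B)) for c, B in zip(ck, Bk)))

avg = lambda f: f.mean()                        # (1/2pi) integral on a uniform grid

# Weights (3.58), computed once; the power constraint is then linear in {c_k}
Bku = np.array([avg(Wu2 * np.real(B + np.conj(B))) for B in Bk])
power_affine = ck @ Bku                         # sum_k c_k B_k^u, cf. (3.57)
power_direct = avg(Wu2 * Phi_u)                 # (1/2pi) int |W_u|^2 Phi_u dw

assert np.isclose(power_affine, power_direct)
```

A bound such as `power_affine <= gamma` is thus an ordinary linear inequality in the design variables {c_k}.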
Example 3.10
When the input is shaped by an FIR filter and the weighting filters W_u and W_y are unity, the power constraints in Lemma 3.3 become
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega = r_0 \tag{3.62}
\]
and
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_y(\omega)\,d\omega = R_v(\theta_o) + \sum_{k=-(M-1)}^{M-1} r^G_{|k|}\,r_{|k|}
\]
where r^G_{|k|} is defined by
\[
|G(e^{j\omega},\theta_o)|^2 = \sum_{k=-\infty}^{\infty} r^G_{|k|}\,e^{-j\omega k}.
\]
For some partial correlation parametrizations of the input, it is possible to derive finite linear parametrizations of the input or the output
power, see e.g. (Stoica and Söderström, 1982). Here we will illustrate
how to do this for a Box-Jenkins model structure.
Example 3.11 (Example 3.9 continued)
Consider the partial correlation parametrization defined by (3.51)-(3.53).
For this parametrization it holds that
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{k=-n_l}^{n_l}\tilde l_{|k|}\,e^{-j\omega k}\,\frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)}\,d\omega
= \sum_{k=-n_l}^{n_l}\tilde c_{|k|}\,\tilde l_{|k|} \tag{3.63}
\]
i.e. the input energy is a linear function of {c̃_k}, the variables that parametrize P^{-1}, see (3.53).
3.4.2 Parametrization of Point-wise Constraints
To handle point-wise constraints
\[
\alpha_u(\omega) \le \Phi_u(\omega) \le \beta_u(\omega) \quad \forall\,\omega,
\qquad
\alpha_y(\omega) \le \Phi_y(\omega) \le \beta_y(\omega) \quad \forall\,\omega \tag{3.64}
\]
on the input and output spectra, the KYP-lemma may be used to transform these into linear matrix inequalities, just as for the input spectrum constraint, cf. Lemma 3.1, when the constraints are rational functions. Alternatively, they may be added as sampled frequency domain constraints. This has no drastic consequences for the performance of the algorithm, since a violation of these constraints between the sampling points will not cause the input design to be non-realizable. Notice that constraints such as (3.64) can only be handled by finite dimensional spectrum parametrizations.
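As a sketch of how sampled frequency domain constraints yield a tractable program, the following minimizes the input power over the coefficients of an assumed cosine spectrum parametrization, subject to gridded lower and upper bounds of the type (3.64). The basis Φu(ω) = Σ 2c_k cos(kω), the grid and the bound profiles are all illustrative assumptions; with a linear objective and gridded linear constraints the design is an ordinary LP (the quality constraints of Section 3.6 would add LMIs).

```python
import numpy as np
from scipy.optimize import linprog

# Assumed parametrization: Phi_u(w) = sum_{k=0}^{M-1} 2 c_k cos(k w).
M = 5
wg = np.linspace(0, np.pi, 200)                   # constraint grid (even in w)
A = 2 * np.cos(np.outer(wg, np.arange(M)))        # A @ c -> Phi_u on the grid

alpha = np.where(wg <= 1.0, 1.0, 0.0)             # lower bound: passband floor
beta = 10.0 * np.ones_like(wg)                    # upper bound everywhere

obj = np.zeros(M); obj[0] = 2.0                   # input power = 2 c_0
A_ub = np.vstack([-A, A])                         # alpha <= A c <= beta
b_ub = np.concatenate([-alpha, beta])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * M)

assert res.success
Phi = A @ res.x
assert np.all(Phi >= alpha - 1e-7) and np.all(Phi <= beta + 1e-7)
# Note: positivity is only enforced on the grid, as discussed in the text.
```

The flat spectrum c = (0.5, 0, …, 0) is feasible with power 1, so the optimum cannot exceed that value.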
3.5 Experiment Design in Closed-loop
In this section we will consider experiment design in closed-loop. Some
early contributions on experiment design in closed-loop are (Ng et al.,
1977a; Ng et al., 1977b; Gustavsson et al., 1977). An interesting question
is when it is beneficial to consider design in closed-loop. There is no clear
cut answer to this. It will certainly depend on choice of objectives and
constraints. For the case of constrained input variance, it is shown in (Ng et al., 1977b) that there is always a D-optimal design in open loop when the system and the noise dynamics are independently parametrized. However, it is also shown that it may be beneficial to use feedback when they have parameters in common.
In the context of control applications, various arguments for the advantages of identification in closed-loop have been brought forward. In
(Gevers and Ljung, 1986) it was shown that for situations when the high
order variance expression is valid, closed loop experiments under minimum variance control are optimal if the model is to be used for minimum variance control. For similar problem formulations it was shown in
(Hjalmarsson et al., 1996) that it is possible to outperform any fixed input
design by a sequence of closed loop designs, provided the experimentation
time is long enough. Again relying on the high-order variance expression,
(Forssell and Ljung, 2000) showed that, typically, closed-loop experiments
are optimal when the output variance is constrained during the experiments. However, there are system configurations, e.g. output-error systems, for which open-loop designs are still optimal, even for the case of
constrained output variance. The results in (Forssell and Ljung, 2000)
show this clearly. It is, however, stated in the conclusions of the same paper that closed-loop experiments are always optimal when the output variance is constrained. This is not correct, as explained in the preceding discussion.
In another line of research focusing on the bias error it has also been
shown that closed-loop experiments can be beneficial, see e.g. (Zang et
al., 1995; Lee et al., 1993). Closed loop experimentation can also be
motivated by practical arguments. Most industrial processes are being
operated in closed-loop and it is often not possible to open the loop.
Therefore, identification experiments often have to be performed in closed
loop with an existing controller in the loop.
Considering the above, it is of interest to re-examine optimal closed
loop experiment design for finite model order. In this section, we will
generalize the results of Sections 3.2-3.4 to include feedback in the input.
The main differences between design in open-loop and design in closed-loop are the parametrizations of the signal constraints and of the inverse covariance matrix. Therefore, we will focus on these issues in this section.
We will assume that the input is generated as
\[
u(t) = -K(q)\,y(t) + r(t) \tag{3.65}
\]
where r(t) is an external reference that is uncorrelated with the noise
e(t). The controller K(q) is assumed to be linear and causal.
3.5.1 Spectrum Representations
We will use the concepts of spectrum parametrizations introduced in
Section 3.2. Thus the spectrum (2.13) can be written as
\[
\Phi_{\chi_o}(\omega) = \sum_{k=-\infty}^{\infty} C_k\,B_k(e^{j\omega}). \tag{3.66}
\]
The major difference is that the coefficients are matrix-valued, C_k ∈ R^{2×2}. Furthermore, the (2,2)-element of C_k cannot in general be manipulated since it depends solely on the noise variance λo. Typically B_0 = I; since E u(t)e(t) = 0 we then have
\[
C_0 = \begin{pmatrix} c_o & 0\\ 0 & \lambda_o \end{pmatrix}
\]
which implies that the (2,2)-element of C_k equals 0 for |k| > 0.
3.5.2 Experiment Design in Closed-loop with a Fixed Controller
We will start with the simplest case where the controller is fixed but where
the spectrum of the reference signal is at the designer’s disposal. We
begin by observing that the input spectrum given the feedback (3.65)
becomes
\[
\Phi_u(\omega) = |S_o(e^{j\omega})|^2\,\Phi_r(\omega) + |K(e^{j\omega})S_o(e^{j\omega})H_o(e^{j\omega})|^2\,\lambda_o
\]
where So = 1/(1 + Go K) is the sensitivity function. Together with (2.13)
this gives that the spectrum Φχo is affine in the reference spectrum Φr .
Consequently, as is evidenced by (2.14), the inverse covariance matrix
P −1 is affine in the same quantity. With the input spectrum substituted
for the reference spectrum, this is exactly the basis for the design techniques for open-loop systems, see Sections 3.2-3.4. It is, hence, straightforward to modify existing open-loop design techniques to handle design
of the reference spectrum when the controller is fixed.
3.5.3 Experiment Design in Closed-loop with a Free Controller
We will now generalize the scenario to the case where, in addition to the
reference spectrum, also the feedback mechanism K(q) in (3.65) can be
chosen. We thus have both Φr and K at our disposal. However, it will
turn out to be more natural to instead use the input spectrum Φu and
the cross spectrum Φue as design variables. Since there is a one-to-one
relation between these two sets of variables this imposes no restrictions.
This will be the set-up in the remaining part of the section.
A parametrization in terms of partial correlations
Since the elements of F span a linear subspace, cf. (3.43), it follows
that the set of all covariance matrices can be parametrized in terms of
finite dimensional parametrizations of Φχo , see Section 3.3. Here we will
characterize one such parametrization. This parametrization is based on
an idea originally presented in (Payne and Goodwin, 1974) with further
developments in (Zarrop, 1979) and (Stoica and Söderström, 1982) for
input design in open-loop, see Example 3.9. Here we will present the
generalization to experiment design in closed-loop.
Consider the case where we want to obtain a linear and finite parametrization of P −1 and the input energy
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega. \tag{3.67}
\]
For this we will use a partial correlation parametrization of Φχo . The
starting point is (2.14), i.e.
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\mathcal F(e^{j\omega},\theta_o)\,\Phi_{\chi_o}(\omega)\,\mathcal F^*(e^{j\omega},\theta_o)\,d\omega. \tag{3.68}
\]
Then parametrize \(\mathcal F\) in the form
\[
\mathcal F(e^{j\omega},\theta_o) = \frac{1}{F^D(e^{j\omega},\theta_o)}\sum_{k=0}^{m} M_k^F(\theta_o)\,e^{-j\omega k} \tag{3.69}
\]
where the scalar transfer function F^D(θo) corresponds to the least common denominator of \(\mathcal F(\theta_o)\) and where M_k^F(θo), k = 0, …, m, are some real matrices. Then introduce the parametrization
\[
\Phi_{\chi_o}(\omega) = |F^D(e^{j\omega},\theta_o)|^2 \sum_{k=-\infty}^{\infty} C_k\,e^{-j\omega k} \tag{3.70}
\]
i.e. a parametrization of the form (3.66) with
\[
B_k(e^{j\omega},\theta_o) = |F^D(e^{j\omega},\theta_o)|^2\,e^{-j\omega k}.
\]
Now it is straightforward to rewrite P^{-1} as a linear function of the auto-correlations C_k as follows:
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\mathcal F(e^{j\omega},\theta_o)\,\Phi_{\chi_o}(\omega)\,\mathcal F^*(e^{j\omega},\theta_o)\,d\omega
= \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\sum_{k=0}^{m}\sum_{l=0}^{m}\frac{M_k^F(\theta_o)\,\Phi_{\chi_o}(\omega)\,(M_l^F(\theta_o))^T}{|F^D(e^{j\omega},\theta_o)|^2}\,e^{j\omega(l-k)}\,d\omega
\]
\[
= \frac{1}{\lambda_o}\Bigg[\sum_{k=0}^{m} M_k^F(\theta_o)\,C_0\,(M_k^F(\theta_o))^T
+ \sum_{k=1}^{m}\sum_{l=0}^{m-k} M_l^F(\theta_o)\,C_k\,(M_{l+k}^F(\theta_o))^T
+ \sum_{k=1}^{m}\sum_{l=0}^{m-k} M_{l+k}^F(\theta_o)\,C_k^T\,(M_l^F(\theta_o))^T\Bigg]. \tag{3.71}
\]
Remark: Notice that all covariance matrices can be generated by a finite
number of the auto-correlations Ck . Hence, it is sufficient to work with
a partial correlation parametrization of the spectrum Φχo .
With the parametrization (3.70) it is possible to obtain a parametrization of the input energy as a linear function of Ck . From (3.70) it follows
that
\[
C_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Phi_{\chi_o}(\omega)}{|F^D(e^{j\omega},\theta_o)|^2}\,e^{j\omega k}\,d\omega \tag{3.72}
\]
which together with
\[
|F^D(e^{j\omega},\theta_o)|^2 = \sum_{k=-m}^{m} f_{|k|}\,e^{-j\omega k} \tag{3.73}
\]
gives
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\begin{pmatrix}\Phi_u(\omega) & \Phi_{ue}(\omega)\\ \Phi_{ue}^*(\omega) & \lambda_o\end{pmatrix}d\omega = \sum_{k=-m}^{m} f_{|k|}\,C_k. \tag{3.74}
\]
The input energy is easily extracted from (3.74).
The use of feedback increases the flexibility of the experiment design. It has also been argued that for some system settings, closed-loop
outperforms open-loop experiment design when there are constraints on
the output variance. The parametrizations in (3.69)-(3.74) have to be
slightly changed to be able to handle variance constraints on the output.
However, the main idea remains.
The output energy is here measured by the function
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega \tag{3.75}
\]
where W_y is a stable scalar transfer function. The spectrum of the output is defined by
\[
\Phi_y = \begin{pmatrix} G(\theta_o) & H(\theta_o) \end{pmatrix}\Phi_{\chi_o}\begin{pmatrix} G^*(\theta_o)\\ H^*(\theta_o)\end{pmatrix}. \tag{3.76}
\]
Now introduce the spectrum \(\tilde\Phi_{\chi_o}\) defined by
\[
\tilde\Phi_{\chi_o} = |W_y|^2\begin{pmatrix} G(\theta_o) & 0\\ 0 & H(\theta_o)\end{pmatrix}\Phi_{\chi_o}\begin{pmatrix} G^*(\theta_o) & 0\\ 0 & H^*(\theta_o)\end{pmatrix}. \tag{3.77}
\]
Notice that
\[
|W_y|^2\,\Phi_y(\theta_o) = \begin{pmatrix}1 & 1\end{pmatrix}\tilde\Phi_{\chi_o}\begin{pmatrix}1\\ 1\end{pmatrix}. \tag{3.78}
\]
Now let \(\tilde{\mathcal F} = [F_u/G,\ F_e/H]/W_y\) where F_u and F_e are defined in (2.15) and (2.16), respectively. Together with (3.77) and (3.68) this gives
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\tilde{\mathcal F}(e^{j\omega},\theta_o)\,\tilde\Phi_{\chi_o}(\omega)\,\tilde{\mathcal F}^*(e^{j\omega},\theta_o)\,d\omega. \tag{3.79}
\]
Then parametrize \(\tilde{\mathcal F}\) as
\[
\tilde{\mathcal F}(e^{j\omega},\theta_o) = \frac{1}{\tilde F^D(e^{j\omega},\theta_o)}\sum_{k=0}^{m}\tilde M_k^F(\theta_o)\,e^{-j\omega k}. \tag{3.80}
\]
Then introduce the parametrization
\[
\tilde\Phi_{\chi_o}(\omega) = |\tilde F^D(e^{j\omega},\theta_o)|^2\sum_{k=-\infty}^{\infty}\tilde C_k\,e^{-j\omega k}. \tag{3.81}
\]
A linear finite dimensional parametrization of P^{-1} is now obtained by following the calculations in (3.71) based on (3.79). This gives
\[
P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\tilde{\mathcal F}(e^{j\omega},\theta_o)\,\tilde\Phi_{\chi_o}(\omega)\,\tilde{\mathcal F}^*(e^{j\omega},\theta_o)\,d\omega
\]
\[
= \frac{1}{\lambda_o}\Bigg[\sum_{k=0}^{m}\tilde M_k^F(\theta_o)\,\tilde C_0\,(\tilde M_k^F(\theta_o))^T
+ \sum_{k=1}^{m}\sum_{l=0}^{m-k}\tilde M_l^F(\theta_o)\,\tilde C_k\,(\tilde M_{l+k}^F(\theta_o))^T
+ \sum_{k=1}^{m}\sum_{l=0}^{m-k}\tilde M_{l+k}^F(\theta_o)\,\tilde C_k^T\,(\tilde M_l^F(\theta_o))^T\Bigg] \tag{3.82}
\]
i.e. a parametrization of P^{-1} in the auto-correlations C̃_k obtained from the parametrization (3.81) based on \(\tilde\Phi_{\chi_o}\).
By following the steps in (3.72)-(3.74) it is straightforward to compute the energy of \(\tilde\Phi_{\chi_o}\). From the specific parametrization in (3.81), together with (3.76) and (3.78), it is easy to verify that (3.75) can be expressed as
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=-m}^{m}\tilde f_{|k|}\begin{pmatrix}1 & 1\end{pmatrix}\tilde C_k\begin{pmatrix}1\\ 1\end{pmatrix}. \tag{3.83}
\]
Notice that (3.83) is a linear function of C̃_k.
We have presented two general finite linear parametrizations of P −1
that are globally optimal since they parametrize all covariance matrices.
The parametrizations have to be adapted depending on the considered
signal constraint. Here, we have considered parametrizations of the input
energy and a weighted output variance criterion. However, this type of
parametrization can in principle handle all variance constraints that are
linear in Φχo .
To ensure that the free variables C_k and C̃_k correspond to auto-correlation coefficients, we must include the constraint
\[
\begin{pmatrix}
X_0 & \cdots & X_m\\
\vdots & \ddots & \vdots\\
X_m^T & \cdots & X_0
\end{pmatrix} \ge 0
\]
whenever the parametrizations in this subsection are used. Here X_k is either C_k or C̃_k.
A parametrization based on a finite dimensional spectrum
So far we have considered parametrizations of P −1 based on partial correlation parametrizations of the spectrum. Here we will introduce a flexible
parametrization based on a finite dimensional spectrum parametrization
instead, i.e.
\[
\Phi_{\chi_o}(\omega) = \sum_{k=-M}^{M} C_k\,B_k(e^{j\omega}). \tag{3.84}
\]
It turns out to be natural to split the parametrization of (3.84) into one
for Φu and one for Φue . The input spectrum will be parametrized as
\[
\Phi_u(\omega) = \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_k^u(e^{j\omega}) \tag{3.85}
\]
for some stable basis functions B_k^u that we assume satisfy (B_k^u)^* = B_{-k}^u.
The cross spectrum is defined by
\[
\Phi_{ue}(\omega) = -\frac{H_o(e^{j\omega})\,\lambda_o}{G_o(e^{j\omega})}\,T_o(e^{j\omega}) \tag{3.86}
\]
where T_o is the complementary sensitivity function defined by T_o = 1 − S_o. Throughout this section we will assume that G_o is stable and minimum phase. There are different ways of parametrizing Φ_ue. One obvious choice is a parametrization similar to (3.85). A different choice is the parametrization
\[
\Phi_{ue}(\omega) = -\frac{H_o\lambda_o}{G_o}\sum_{k=0}^{M_c}s_k\,B_k^c(e^{j\omega}) \tag{3.87}
\]
where {Bkc (ejω )} represents a set of stable basis functions. Notice that the
parametrization in (3.87) corresponds to a linear and finite parametrization of To . This has some advantages. For example, since To corresponds
to the closed-loop response, certain properties like the bandwidth can be
taken into account already in the choice of basis functions. Furthermore,
it is important that the design yields a stable closed-loop system. The
parametrization (3.87) will actually yield a stabilizing controller for any
sequence {sk } when Go is stable and minimum phase together with the
natural requirement that the basis functions are stable. To realize this,
we need to check the stability of To , So , Go So and KSo . First consider
T_o, which given the parametrization (3.87) equals
\[
T_o(e^{j\omega}) = \sum_{k=0}^{M_c}s_k\,B_k^c(e^{j\omega}).
\]
Hence T_o is stable when the basis functions are, and then S_o = 1 − T_o is also stable. S_oG_o will also be stable as long as G_o is stable. The last quantity we need to check is KS_o = T_o/G_o, which is stable when G_o is minimum phase.
For a sequence {s_k}, the controller is given by
\[
K(q) = \frac{\sum_{k=0}^{M_c}s_k\,B_k^c(q)}{G_o(q)\left(1-\sum_{k=0}^{M_c}s_k\,B_k^c(q)\right)}. \tag{3.88}
\]
Remark: Integral action in the controller can be imposed by the constraint \(\sum_{k=0}^{M_c}s_k\,B_k^c(1) = 1\).
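The stability argument and the controller formula (3.88) can be checked numerically. The sketch below uses the illustrative choices B_k^c(q) = q^{-k-1}, a stable minimum-phase first-order G_o and a short sequence {s_k} (all assumptions made for the example), and verifies on a frequency grid that closing the loop with K from (3.88) reproduces the designed T_o and S_o.

```python
import numpy as np

# Illustrative design data (assumptions): delay basis and a first-order plant.
w = np.linspace(0, np.pi, 400)
q = np.exp(1j * w)

Go = 0.5 / (q - 0.8)                      # stable (pole 0.8), minimum phase
s = np.array([0.5, 0.3])                  # design sequence {s_k}; |T_o| < 1 here
To = sum(sk * q**-(k + 1) for k, sk in enumerate(s))   # designed closed loop
So = 1 - To

K = To / (Go * (1 - To))                  # controller (3.88)
# Closing the loop with this K must reproduce S_o and T_o:
assert np.allclose(1 / (1 + Go * K), So)
assert np.allclose(Go * K / (1 + Go * K), To)
```

Since Σ s_k B_k^c(1) = 0.8 ≠ 1 here, this particular controller has no integral action; imposing the constraint in the remark would change that.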
When the input spectrum and the cross spectrum are defined by (3.85)
and (3.87), respectively, the inverse covariance matrix (3.68) is given by
\[
P^{-1}(\theta_o) = R_o(\theta_o) + \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_P^u(k) - \sum_{k=0}^{M_c}s_k\big(B_P^c(k) + (B_P^c(k))^T\big). \tag{3.89}
\]
Notice that (3.89) is a linear and finite parametrization in c̃_k and s_k. In (3.89), R_o is defined by (3.42),
\[
B_P^u(k) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}F_u(e^{j\omega},\theta_o)\,B_k^u(e^{j\omega})\,F_u^*(e^{j\omega},\theta_o)\,d\omega
\]
and
\[
B_P^c(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}F_u(e^{j\omega},\theta_o)\,F_e^*(e^{j\omega},\theta_o)\,\frac{H_o(e^{j\omega})}{G_o(e^{j\omega})}\,B_k^c(e^{j\omega})\,d\omega.
\]
The variance of z_y = W_y y, where W_y is a stable linear filter and y has the spectrum (3.76), can be expressed by the linear relation
\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_y^u(k) - \sum_{k=0}^{M_c}s_k\,B_y^c(k) + R_v(\theta_o) \tag{3.90}
\]
where
\[
B_y^u(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})\,G_o(e^{j\omega})|^2\,B_k^u(e^{j\omega})\,d\omega,
\]
\[
B_y^c(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_v(\omega)\,[B_k^c(e^{j\omega}) + (B_k^c)^*(e^{j\omega})]\,d\omega
\]
and
\[
R_v(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_v(\omega)\,d\omega
\]
where Φ_v = |H(θ_o)|^2 λ_o.
The expression (3.89) is another finite linear parametrization of P −1 ,
based on a finite spectrum parametrization of Φχo represented by (3.85)
and (3.87). As for the open-loop case, this type of parametrization will in
general have a higher complexity than the partial correlation parametrizations. However, this class of parametrizations can handle a larger class
of constraints. Besides variance constraints such as (3.90), point-wise constraints, see (3.64), can also be treated, which is not the case for parametrizations based on partial correlations. Furthermore, as illustrated, it is possible to impose certain characteristics of the closed-loop system and the controller directly in the design stage, see (3.87), (3.88) and the remark above.
Since the parametrization is based on a finite spectrum parametrization of Φχo , the free variables {c̃k } and {sk } must be constrained such
that Φχo ≥ 0, ∀ ω. For this, Lemma 3.1 is useful.
3.6 Quality Constraints
The results in Section 3.3 and Section 3.5 show that there are several
possibilities to obtain a linear and finite dimensional parametrization of
the inverse covariance matrix P −1 in a set of variables {xk }. Thus, all
constraints that are convex in P −1 also become convex in {xk }. This is
a very important observation that we will further explore in this section
and in Section 3.7 in order to incorporate different quality constraints
into our optimal experiment designs.
3.6.1 Convex Representation of Quality Constraints
There are several classical performance criteria for input design that are
convex in P −1 . For example, λmax (P ) ≤ γ, (Boyd et al., 1994), where
the operator λmax extracts the largest eigenvalue. Another example is
det P ≤ γ (Nesterov and Nemirovski, 1994). A type of criterion that has been suggested several times in input design for control is the weighted
trace criterion, Tr W P ≤ γ. This type of criterion is convex in P −1 as
the following result shows. For generality we allow the weighting function
W to be frequency dependent.
Lemma 3.4
The constraints
\[
\operatorname{Tr}\,W(\omega)P \le \gamma \quad \forall\,\omega,\qquad
W(\omega) = V(\omega)V^*(\omega) \ge 0 \quad \forall\,\omega,\qquad
P \ge 0 \tag{3.91}
\]
may be written as the following constraints:
\[
\gamma - \operatorname{Tr} Z \ge 0,\qquad
\Gamma(\omega) \triangleq \begin{pmatrix} Z & V(\omega)\\ V^*(\omega) & P^{-1}\end{pmatrix} \ge 0 \quad \forall\,\omega. \tag{3.92}
\]
Proof: The factorization of W(ω) leads to
\[
\operatorname{Tr}\,W(\omega)P = \operatorname{Tr}\,V^*(\omega)P\,V(\omega). \tag{3.93}
\]
Introduce the slack variable Z ∈ R^{z×z}. Then the constraint Tr W(ω)P ≤ γ together with (3.93) can be written as
\[
\operatorname{Tr} Z \le \gamma,\qquad Z - V^*(\omega)P(\Phi_u)V(\omega) \ge 0. \tag{3.94}
\]
Using the Schur complement, (3.94) may be written as (3.92), which is linear in γ, Z and P^{-1}; thus convexity follows.
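A quick numerical sanity check of Lemma 3.4 at a single frequency (with randomly generated real-valued P ≻ 0 and V, purely illustrative data): choosing the minimal slack Z = V^*PV makes the block matrix in (3.92) positive semidefinite while Tr Z = Tr W(ω)P.

```python
import numpy as np

# Single-frequency check of the Schur-complement reformulation in Lemma 3.4.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
P = A @ A.T + np.eye(n)                       # P > 0
V = rng.standard_normal((n, 2))               # real V (n x z) for simplicity
W = V @ V.T                                   # W = V V*

Z = V.T @ P @ V                               # minimal slack: Tr Z = Tr(W P)
assert np.isclose(np.trace(Z), np.trace(W @ P))

Gamma = np.block([[Z, V.T], [V, np.linalg.inv(P)]])
# The Schur complement of P^{-1} in Gamma is Z - V^T P V = 0, so Gamma is PSD:
assert np.linalg.eigvalsh(Gamma).min() >= -1e-9
```

Any larger slack (Z plus a PSD perturbation) keeps Γ PSD but increases Tr Z, which is why the minimal choice attains equality in the trace bound.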
The following example illustrates a situation where the weighted trace criterion may appear as a quality constraint.
Example 3.12
Consider the constraint (3.22), which is one example that can be written as a frequency dependent weighted trace constraint:
\[
\frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2 \frac{dG^*(\theta_o)}{d\theta}\,P\,\frac{dG(\theta_o)}{d\theta} \le 1
\;\Leftrightarrow\;
\operatorname{Tr}\,\frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2 \frac{dG(\theta_o)}{d\theta}\frac{dG^*(\theta_o)}{d\theta}\,P \le 1
\;\Leftrightarrow\; \operatorname{Tr}\,W(\omega)P \le 1 \tag{3.95}
\]
where
\[
W(\omega) = \frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2 \frac{dG(\theta_o)}{d\theta}\frac{dG^*(\theta_o)}{d\theta}.
\]
Corollary 3.1 The constraints (3.91) can be written as
\[
P^{-1} - \frac{1}{\gamma}V(\omega)V^*(\omega) \ge 0 \quad \forall\,\omega \tag{3.96}
\]
when V(ω) is a vector.
Proof: The slack variable Z in (3.92) becomes a scalar when V (ω) is
a vector. Hence, Z can be replaced by γ. Then it is straightforward to
use Schur complements to obtain (3.96) from (3.92).
3.6.2 Application of the KYP-lemma to Quality Constraints

Even though many quality constraints are convex, they will not necessarily become finite-dimensional. One example is the frequency-by-frequency weighted trace criterion in Lemma 3.4. In some situations it is
possible to treat such constraints as positiveness constraints on spectra
and use the idea of Lemma 3.1.
Lemma 3.5
Suppose that V (ω) is a frequency function with controllable state-space
realization {AV , BV , CV , DV }. Let Γ(ω) be defined as in Lemma 3.4.
It then holds that Γ(ω) ≥ 0 ∀ω if and only if there exists Q_Γ = Q_Γ^T such that (recall the definition (3.32) of K)
\[
K(Q_\Gamma, \{A_\Gamma, B_\Gamma, C_\Gamma, D_\Gamma\}) \ge 0 \tag{3.97}
\]
where
\[
A_\Gamma = A_V,\qquad B_\Gamma = \begin{pmatrix}0 & B_V\end{pmatrix},\qquad C_\Gamma = \begin{pmatrix}C_V\\ 0\end{pmatrix}
\]
and
\[
D_\Gamma + D_\Gamma^T = \begin{pmatrix} Z & D_V\\ D_V^T & P^{-1}\end{pmatrix}.
\]
Proof: The state-space realization {A_Γ, B_Γ, C_Γ, D_Γ} defined in Lemma 3.5 gives Γ(ω) = Γ₊(e^{jω}) + Γ₊^*(e^{jω}) where
\[
\Gamma_+(e^{j\omega}) = C_\Gamma(e^{j\omega}I - A_\Gamma)^{-1}B_\Gamma + D_\Gamma.
\]
Furthermore, if {A_V, B_V, C_V, D_V} is controllable then so is the realization {A_Γ, B_Γ, C_Γ, D_Γ}. Thus, the Positive Real Lemma (Yakubovich, 1962) can be applied.

Notice that since the elements of Z and P^{-1} appear linearly in D_Γ + D_Γ^T, independently of the realization of V, the only constraint on the realization {A_V, B_V, C_V, D_V} is that it is controllable.
3.7 Quality Constraints in Ellipsoidal Regions
In this section we will focus on quality constraints that are based on the
uncertainty set (2.28), defined in Section 2.3.2. To illustrate this, consider
the frequency function
\[
\Delta(e^{j\omega},\theta) = T(e^{j\omega})\,\frac{G_o(e^{j\omega}) - G(e^{j\omega},\theta)}{G(e^{j\omega},\theta)}
\]
introduced in (3.19), and consider the worst-case measure of Δ over the set of models in the confidence region:
\[
\max_{\omega,\;\theta\in U}|\Delta|^2,\qquad U = \{\theta \mid N(\theta-\theta_o)^T P^{-1}(\theta-\theta_o) \le \chi\}. \tag{3.98}
\]
With this setup it is possible to guarantee that, say, 95% of all identified
models will satisfy the quality constraint. In this section we will introduce
a new family of quality measures of the model G which includes (3.98).
Let W_n, W_d, X_n and X_d be finite-dimensional stable transfer functions. Let Y_n(ω) be defined by
\[
Y_n = \mathcal Y_n^*\,\mathcal Y_n \tag{3.99}
\]
where \(\mathcal Y_n\) is some stable finite-dimensional transfer function. Let Y_d, K_n and K_d be defined analogously. Furthermore, let R be a positive definite matrix.
The generalized quality measure is defined as
\[
F(\omega,\eta) \le \gamma \quad \forall\,\omega \text{ and } \forall\,\eta\in\Upsilon,\qquad
\Upsilon = \{\eta \mid (\eta-\eta_o)^T R(\eta-\eta_o) \le 1\} \tag{3.100}
\]
where
\[
F(\omega,\eta) = \frac{[W_nG(\eta) + X_n]^*\,Y_n\,[W_nG(\eta) + X_n] + K_n}{[W_dG(\eta) + X_d]^*\,Y_d\,[W_dG(\eta) + X_d] + K_d}. \tag{3.101}
\]
The quality measure (3.100) is a max-norm constraint on F with
respect to ω and it has to be satisfied for all η in the ellipsoid Υ. Uncertainty sets such as Υ are e.g. delivered by the prediction error method,
see Section 2.2.2.
We illustrate the usefulness of the measure via two examples.
Example 3.13
Taking W_n = W_d = 1, X_n = −G_o, \(\mathcal Y_n = T\), and X_d = K_n = K_d = 0 gives
\[
F(\omega,\eta) = \frac{|G(\eta) - G_o|^2\,|T|^2}{|G(\eta)|^2} = |\Delta|^2.
\]
Thus the generalized quality measure (3.100) includes (3.98).
Example 3.14 (Worst-case chordal distance)
The square of the chordal distance (Vinnicombe, 1993) between G(η) and G_o can be written as
\[
\kappa^2(\omega,\eta) = \frac{|G(e^{j\omega},\eta) - G_o(e^{j\omega})|^2}{(1 + |G(e^{j\omega},\eta)|^2)(1 + |G_o(e^{j\omega})|^2)}. \tag{3.102}
\]
Taking Y_n = W_n = W_d = 1, X_n = −G_o, X_d = 0, K_n = 0 and K_d = Y_d = 1 + |G_o|^2 gives
\[
F(\omega,\eta) = \kappa^2(\omega,\eta).
\]
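Example 3.14 is easy to verify numerically; the sketch below evaluates F in (3.101) with the stated choices for two hypothetical first-order models (illustrative assumptions) and checks that it coincides with the squared chordal distance (3.102) on a frequency grid.

```python
import numpy as np

# Check Example 3.14: with Wn = Wd = 1, Xn = -Go, Xd = Kn = 0, Yn = 1 and
# Kd = Yd = 1 + |Go|^2, the measure F in (3.101) equals kappa^2 in (3.102).
w = np.linspace(-np.pi, np.pi, 200)
q = np.exp(1j * w)
Go = 1.0 / (q - 0.5)                      # hypothetical nominal model
G = 1.1 / (q - 0.45)                      # a perturbed model G(eta)

num = np.abs(G - Go)**2                   # [G + Xn]* Yn [G + Xn] + Kn
den = (1 + np.abs(Go)**2) * np.abs(G)**2 + (1 + np.abs(Go)**2)  # Yd |G|^2 + Kd
F = num / den

kappa2 = np.abs(G - Go)**2 / ((1 + np.abs(G)**2) * (1 + np.abs(Go)**2))
assert np.allclose(F, kappa2)
```

The denominator factorizes as (1 + |Go|²)(1 + |G|²), which is exactly the normalization in the chordal distance.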
As the aforementioned example illustrated, the generalized quality measure (3.100) also includes the worst case chordal distance. Thus the
proposed methods in this section will be able to handle a quality constraint in terms of an upper bound on the worst case chordal distance
as well. The objective of this section is to develop tools such that constraints like (3.100) can be incorporated in the framework developed in
Section 3.2-3.6. Thus we want to transform (3.100) into linear matrix
inequalities.
Input design based on the worst-case chordal distance as the objective function has been treated in (Hildebrand and Gevers, 2003). This is the first contribution to consider input design with respect to parametric uncertainties in terms of confidence ellipsoids. The method in (Hildebrand and Gevers, 2003) is an iterative procedure consisting of two steps in each iteration. In the first step, the worst-case chordal distance is computed for a fixed input design by a method proposed in (Bombois et al., 1999). This method solves a convex optimization problem for each frequency on a frequency grid, and the maximum over this grid is then taken as the worst-case chordal distance; see Section 7.2. Based on this solution, a cutting plane is defined for the input design variables, and in the second step they are updated by an ellipsoid algorithm based on the cutting plane. Then the first step is repeated. Consequently, the method obtains a solution by solving several convex optimization problems. In this section we will propose an alternative solution in which only one optimization problem is solved. A second difference is that we consider a fixed bound on the quality constraint, while in (Hildebrand and Gevers, 2003) the input power is fixed.
3.7.1 Reformulation as a Convex Problem
Consider the model structure (2.2), which is parametrized by the vector $\theta$. Partition $\theta^T = [\eta^T\ \xi^T]$ such that $G(\theta) = G(\eta)$ and let $G(\eta)$ be parametrized as
$$G(\eta) = \frac{q^{-n_k}(b_1 + \cdots + b_{n_b} q^{-n_b+1})}{1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}} = \frac{Z_N(q)\eta}{1 + Z_D(q)\eta} \qquad (3.103)$$
where $n_k$ is the delay, $\eta^T = [a_1 \ldots a_{n_a}\ b_1 \ldots b_{n_b}] \in \mathbf{R}^{n_G}$, $n_G = n_a + n_b$,
$$Z_N(q) = q^{-n_k}[0 \ \ldots \ 0 \ \ 1 \ q^{-1} \ \ldots \ q^{-n_b+1}] \qquad (3.104)$$
and
$$Z_D(q) = [q^{-1} \ \ldots \ q^{-n_a} \ \ 0 \ \ldots \ 0] \qquad (3.105)$$
are row vectors of size $n_G$. Let $\bar{A}$ denote the complex conjugate of $A$.
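The linear-fractional form in (3.103) can be checked with a short numerical sketch (the orders and coefficients below are hypothetical illustration values, not from the thesis):

```python
import numpy as np

# Check that G = Z_N(q) eta / (1 + Z_D(q) eta) with Z_N, Z_D as in
# (3.104)-(3.105) matches the rational transfer function in (3.103).
na, nb, nk = 2, 2, 1
a = np.array([-0.7, 0.1])       # hypothetical a_1, a_2
b = np.array([0.36, 0.05])      # hypothetical b_1, b_2
eta = np.concatenate([a, b])    # eta = [a_1 ... a_na  b_1 ... b_nb]

q = np.exp(1j * 0.3)            # evaluate at one frequency, q = e^{jw}

ZN = q**(-nk) * np.concatenate([np.zeros(na), q**(-np.arange(nb))])  # (3.104)
ZD = np.concatenate([q**(-np.arange(1, na + 1)), np.zeros(nb)])      # (3.105)

G_lfr = (ZN @ eta) / (1 + ZD @ eta)

# Direct evaluation of the rational form in (3.103)
G_dir = q**(-nk) * (b[0] + b[1] / q) / (1 + a[0] / q + a[1] / q**2)

assert np.isclose(G_lfr, G_dir)
```

Both evaluations agree, which is exactly the property the subsequent lemmas exploit: $G$ is a ratio of two expressions that are affine in $\eta$.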
Lemma 3.6
Let $F(\omega, \eta)$ be defined by (3.101) and let $G(\eta)$ be parametrized as (3.103). Then
$$F \leq \gamma \ \Leftrightarrow\ \begin{bmatrix} \eta \\ 1 \end{bmatrix}^T (\gamma F_0(\omega) - F_1(\omega)) \begin{bmatrix} \eta \\ 1 \end{bmatrix} \geq 0 \qquad (3.106)$$
where $F_0(\omega) = f(\omega)(M_d(\omega) + \bar{M}_d(\omega))$ and $F_1(\omega) = f(\omega)(M_n(\omega) + \bar{M}_n(\omega))$ and where
$$\begin{aligned} M_d(\omega) &= Z_V^* (Y_d(\omega)\, x_d x_d^* + K_d(\omega)\, v v^T) Z_V \\ M_n(\omega) &= Z_V^* (Y_n(\omega)\, x_n x_n^* + K_n(\omega)\, v v^T) Z_V \end{aligned} \qquad (3.107)$$
with
$$Z_V^* = \begin{bmatrix} Z_N^* & Z_D^* & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad v^T = [0\ 1\ 1], \qquad x_d^* = [W_d(e^{j\omega})\ \ X_d(e^{j\omega})\ \ X_d(e^{j\omega})]$$
and $x_n$ defined analogously to $x_d$. Furthermore, $f(\omega)$ is the least common denominator of $M_d(\omega)$ and $M_n(\omega)$.
Proof: Using that both the numerator and the denominator of $F$ have the quadratic form $(WG + X)^* Y (WG + X) + K$ and exploiting the parametrization of $G$ in (3.103), we obtain
$$F \leq \gamma \ \Leftrightarrow\ \begin{bmatrix} \eta \\ 1 \end{bmatrix}^T (\gamma M_d(\omega) - M_n(\omega)) \begin{bmatrix} \eta \\ 1 \end{bmatrix} \geq 0$$
where $M_d$ and $M_n$ are defined by (3.107). The equivalence still holds when the inequality is multiplied by $f(\omega)$, and since $\eta$ is real this is equivalent to (3.106).
The equivalence in (3.106) will be further exploited in the next theorem.
Theorem 3.1
Assume that $F(\omega, \eta) < \infty$ for all $\omega$. Furthermore assume that $\gamma F_0(\omega) - F_1(\omega)$ is not positive semidefinite. Then the following two statements are equivalent:

1. $$F(\omega, \eta) \leq \gamma \quad \forall\, \omega \in [-\pi, \pi] \ \text{and}\ \forall\, \eta \in \Upsilon, \qquad \Upsilon = \{\eta \mid (\eta - \eta_o)^T R (\eta - \eta_o) \leq 1\} \qquad (3.108)$$

2. $\exists\, \tau(\omega) > 0$, $\tau(\omega) \in \mathbf{R}$, such that
$$\tau(\omega)(\gamma F_0(\omega) - F_1(\omega)) - E \geq 0 \quad \forall\, \omega, \qquad E \triangleq \begin{bmatrix} -R & R\eta_o \\ \eta_o^T R & 1 - \eta_o^T R \eta_o \end{bmatrix}. \qquad (3.109)$$
Proof: Lemma 3.6 gives that $F \leq \gamma$ is equivalent to
$$\sigma_0(\eta) \triangleq \begin{bmatrix} \eta \\ 1 \end{bmatrix}^T (\gamma F_0(\omega) - F_1(\omega)) \begin{bmatrix} \eta \\ 1 \end{bmatrix} \geq 0. \qquad (3.110)$$
Expression (3.110) is equivalent to $F \leq \gamma$ for a particular $\eta$. Now this must be true for all $\eta \in \Upsilon$. The ellipsoid $\Upsilon$ can be parametrized as
$$\sigma_1(\eta) \triangleq \begin{bmatrix} \eta \\ 1 \end{bmatrix}^T \begin{bmatrix} -R & R\eta_o \\ \eta_o^T R & 1 - \eta_o^T R \eta_o \end{bmatrix} \begin{bmatrix} \eta \\ 1 \end{bmatrix} \geq 0. \qquad (3.111)$$
Hence the condition $F \leq \gamma$, $\forall\, \omega \in [-\pi, \pi]$ and $\eta \in \Upsilon$, is equivalent to $\sigma_0(\eta) \geq 0$ for all $\omega$ and for all $\eta$ such that $\sigma_1(\eta) \geq 0$. Such a problem can be handled by the S-procedure (Boyd et al., 1994), which states the following equivalence for each $\omega$: $\sigma_0(\eta) \geq 0$ $\forall\, \eta \in \mathbf{R}^k$ such that $\sigma_1(\eta) \geq 0$ $\Leftrightarrow$ $\exists\, \beta \geq 0$, $\beta \in \mathbf{R}$, such that $\sigma_0(\eta) - \beta \sigma_1(\eta) \geq 0$ $\forall\, \eta \in \mathbf{R}^k$. Since there has to exist one $\beta \geq 0$ for each $\omega$, we can rewrite $\beta$ as a function of $\omega$ which has to fulfill $\beta(\omega) \geq 0$ for all $\omega$. Finally, to obtain the expression (3.109) we change the variable $\beta(\omega)$ as $\tau(\omega) = \frac{1}{\beta(\omega)}$, which is valid if we can show that $\beta(\omega) \neq 0$. The expression $\sigma_0(\eta) - \beta \sigma_1(\eta) \geq 0$ is true for $\beta(\omega) = 0$ only if $\gamma F_0(\omega) - F_1(\omega) \geq 0$ for some $\omega$. The assumption that $\gamma F_0(\omega) - F_1(\omega)$ is not positive semidefinite guarantees that $\beta(\omega) \neq 0$. Hence, we obtain the condition $\tau(\omega) > 0$ and we arrive at the statement in (3.109).
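The ellipsoid parametrization (3.111) used in the proof is easy to sanity-check numerically. The sketch below (random data, illustration only) verifies that membership in $\Upsilon$ coincides with nonnegativity of the quadratic form:

```python
import numpy as np

# Check that eta is in Upsilon = {eta : (eta - eta_o)^T R (eta - eta_o) <= 1}
# exactly when [eta; 1]^T S [eta; 1] >= 0 for the matrix S of (3.111).
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)              # a positive definite shape matrix
eta_o = rng.standard_normal(n)

S = np.block([[-R, (R @ eta_o)[:, None]],
              [(R @ eta_o)[None, :], np.array([[1 - eta_o @ R @ eta_o]])]])

for _ in range(200):
    eta = eta_o + rng.standard_normal(n)
    x = np.append(eta, 1.0)
    in_ellipsoid = (eta - eta_o) @ R @ (eta - eta_o) <= 1
    assert in_ellipsoid == (x @ S @ x >= 0)
```

The identity behind the check is $[\eta;1]^T S [\eta;1] = 1 - (\eta - \eta_o)^T R (\eta - \eta_o)$, which is exactly what makes the S-procedure applicable.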
Theorem 3.1 is very interesting from an input design point of view. When considering input design, the variable $R$ in (3.109) will be proportional to the inverse covariance matrix of the parameters, see (2.20). Furthermore, the inverse covariance matrix is affine in the input spectrum $\Phi_u$ and the cross spectrum $\Phi_{ue}$, see (2.14). With a linear parametrization of these spectra, $R$ also becomes linearly parametrized. Hence, Theorem 3.1 states that the worst-case (over all models in an ellipsoidal model set) max-norm performance constraint (3.108) can be translated into the condition (3.109), which is a linear matrix inequality in $\tau$ and $R$ for each $\omega$ when $\gamma$ is fixed and $F(\cdot)$ is given by (3.101).
There are two problems associated with this. First, $\tau$ is an unknown function of $\omega$. Second, this is an infinite-dimensional constraint, similar to the performance constraint in (3.92), since it has to hold for all frequencies. In the next subsection we will address these issues.
3.7.2 A Finite Dimensional Formulation
It was shown in Lemma 3.1 that when the constraint can be viewed as
a positiveness constraint on a spectrum, the KYP-lemma can be applied
to make the constraint finite-dimensional. We now introduce some conditions that will allow the KYP-lemma to be applied to (3.109) such
that this constraint can be reduced to a finite dimensional linear matrix
inequality.
Lemma 3.7
Let $F_0(\omega)$, $F_1(\omega)$ and $E$ be defined by (3.106), (3.107) and (3.109), and introduce
$$\Lambda(\omega) \triangleq \tau(\omega)(\gamma F_0(\omega) - F_1(\omega)) - E. \qquad (3.112)$$
When $\tau(\omega)$ is defined by
$$\tau(\omega) = \Psi(e^{j\omega}) + \Psi^*(e^{j\omega}), \qquad \Psi(e^{j\omega}) = \sum_{k=0}^{K-1} \tau_k B_k(e^{j\omega}) \qquad (3.113)$$
for some linearly independent basis functions $B_k$, $k = 0, \ldots, K-1$, there exists a sequence $\{\Lambda_k\}$ such that
$$\Lambda(\omega) \geq 0 \ \ \forall\, \omega \quad \Leftrightarrow \quad \sum_{k=0}^{p} \Lambda_k (e^{-kj\omega} + e^{kj\omega}) \geq 0 \ \ \forall\, \omega \qquad (3.114)$$
where the variables $\{\tau_k\}$ and the elements of $R$ appear linearly in $\{\Lambda_k\}$.
Proof: Both $F_0$ and $F_1$ have the structure $\sum_k F_k e^{-kj\omega}$. Multiplying both sides of $\Lambda(\omega) \geq 0$ by the least common denominator of $\tau(\omega)$ gives the equivalence in (3.114).
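The effect of the parametrization (3.113) can be seen numerically. The sketch below uses the basis $B_k(e^{j\omega}) = e^{-j\omega k}$ (an assumed choice; the same basis is used in the numerical example of Section 3.11) with hypothetical coefficients:

```python
import numpy as np

# With B_k(e^{jw}) = e^{-jwk}, tau(w) = Psi + Psi* in (3.113) becomes a real
# cosine polynomial, so positivity of tau over all frequencies is a
# positivity constraint on a (scalar) spectrum.
tau_k = np.array([1.0, 0.4, 0.1])        # hypothetical coefficients, K = 3
w = np.linspace(-np.pi, np.pi, 1001)

Psi = sum(t * np.exp(-1j * k * w) for k, t in enumerate(tau_k))
tau = Psi + np.conj(Psi)

assert np.allclose(tau.imag, 0)          # tau(w) is real ...
assert np.allclose(                      # ... and equals 2 sum_k tau_k cos(kw)
    tau.real, 2 * sum(t * np.cos(k * w) for k, t in enumerate(tau_k)))
```

This is what makes the KYP-lemma applicable to $\tau(\omega) \geq 0$: the constraint has the same trigonometric-polynomial structure as a spectrum positivity constraint.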
The special parametrization (3.113) of τ (ω) will, according to Lemma 3.7,
imply that the condition Λ(ω) ≥ 0 ∀ ω can be replaced by a positiveness
constraint on a spectrum, see (3.114). This fact can now be used together
with Lemma 3.1 to transform the infinite dimensional constraint (3.100)
into a linear matrix inequality in the variables {τk } and the elements of
R.
Theorem 3.2
Assume that $F(\omega, \eta) < \infty$ for all $\omega$ and for all $\eta \in \Upsilon$. Let $F_0(\omega)$ and $F_1(\omega)$ be defined by (3.106) and (3.107). Assume that $\gamma F_0(\omega) - F_1(\omega)$ is not positive semidefinite. Let $\tau(\omega)$ be defined as in Lemma 3.7.

Then there exists a state-space representation $\{A_\tau, B_\tau, C_\tau, D_\tau\}$ of the positive real part of $\tau(\omega)$ where $\{\tau_k\}$ appears linearly in $C_\tau$ and $D_\tau$. Similarly, there exists a state-space representation $\{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}$ of the positive real part of the spectrum (3.114) where $\{\tau_k\}$ and the elements of $R$ appear linearly in $C_\Lambda$ and $D_\Lambda$.

Furthermore, it holds that
$$F(\omega, \eta) \leq \gamma \quad \forall\, \omega \in [-\pi, \pi] \ \text{and}\ \forall\, \eta \in \Upsilon, \qquad \Upsilon = \{\eta \mid (\eta - \eta_o)^T R (\eta - \eta_o) \leq 1\} \qquad (3.115)$$
if there exist $Q_\tau = Q_\tau^T$ and $Q_\Lambda = Q_\Lambda^T$ such that
$$\begin{cases} K(Q_\tau, \{A_\tau, B_\tau, C_\tau, D_\tau\}) \geq 0 \\ K(Q_\Lambda, \{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}) \geq 0. \end{cases} \qquad (3.116)$$
Proof: Due to the parametrization of $\tau(\omega)$, the constraints (3.116) assure that $\tau(\omega) \geq 0$ $\forall\, \omega$ and $\Lambda(\omega) \geq 0$ $\forall\, \omega$, according to Lemma 3.1 and Lemma 3.7. Whenever $\tau(\omega) \geq 0$ and $\Lambda(\omega) \geq 0$ for all $\omega$, Theorem 3.1 implies (3.115).
Theorem 3.2 is quite powerful in experiment design problems. Notice that the inequalities (3.116) are LMIs in $Q_\tau$, $Q_\Lambda$, $\tau_k$, $k = 0, \ldots, K-1$, and the elements of $R$. Furthermore, when $R = P^{-1}$ this quantity becomes linearly parametrized when the input spectrum is linearly parametrized as in Section 3.2. The use of this theorem will be illustrated in Section 3.11.
3.8 Biased Noise Dynamics
We have so far treated the case where both the system model $G$ and the noise model $H$ are flexible enough to capture the corresponding quantities of the true system. Here we will relax this assumption and only assume that the system dynamics are captured by the model structure. Thus, $H$ is allowed to be biased. We will further assume that $G$ and $H$ are independently parametrized. Introduce the parameter vector
$$\theta = \begin{bmatrix} \eta \\ \xi \end{bmatrix}$$
such that
$$G(\theta) = G(\eta) \quad \text{and} \quad H(\theta) = H(\xi).$$
The true noise dynamics are represented by $H_o$, and we let $\Phi_v(\omega) = |H_o(e^{j\omega})|^2 \lambda_o$ denote the noise spectrum. Furthermore, there exists a parameter vector $\eta_o$ such that $G(\eta_o) = G_o$, since we assume that $G$ captures the structure of the true system $G_o$. Then the prediction error estimate (2.11) will under mild assumptions be such that
$$\lim_{N \to \infty} \begin{bmatrix} \hat{\eta}_N \\ \hat{\xi}_N \end{bmatrix} = \bar{\theta} = \begin{bmatrix} \eta_o \\ \bar{\xi} \end{bmatrix} \qquad (3.117)$$
where $\bar{\theta}$ minimizes the variance of the prediction errors, see (Ljung, 1999).
Furthermore, the covariance of $\hat{\eta}_N$ is, under the assumption of open-loop operation, approximately
$$\operatorname{Cov} \hat{\eta}_N \approx \frac{1}{N} R_\eta^{-1} Q_\eta R_\eta^{-1} \qquad (3.118)$$
where
$$R_\eta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{F}_u(e^{j\omega}, \eta_o)\, \tilde{F}_u^*(e^{j\omega}, \eta_o)\, \Phi_u(\omega)\, d\omega, \qquad (3.119)$$
$$Q_\eta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\Phi_v(\omega)}{|H(e^{j\omega}, \bar{\xi})|^2}\, \tilde{F}_u(e^{j\omega}, \eta_o)\, \tilde{F}_u^*(e^{j\omega}, \eta_o)\, \Phi_u(\omega)\, d\omega, \qquad (3.120)$$
and
$$\tilde{F}_u(e^{j\omega}, \eta_o) = \frac{1}{H(e^{j\omega}, \bar{\xi})}\, \frac{dG(e^{j\omega}, \eta_o)}{d\eta}. \qquad (3.121)$$
Remark: Both Rη and Qη are linear in the input spectrum Φu .
We will now utilize the observation in the last remark in order to make different quality constraints based on the property (3.118) convex in the input spectrum. Notice that $R_\eta$ and $Q_\eta$ have the same structure as $P^{-1}$ in (3.41) with $R_o = 0$. From this, it is easy to realize that the methods to parametrize the covariance matrix $P^{-1}$, presented in Section 3.3, also apply to $R_\eta$ and $Q_\eta$. Thus, it is possible to obtain a linear and finite-dimensional parametrization of $R_\eta$ and $Q_\eta$. This is useful when formulating quality constraints for optimal input design.
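The sandwich structure of (3.118) can be checked with a small Monte Carlo simulation. The sketch below (all numerical values are illustrative, not from the thesis) fits an FIR model by least squares, so the noise model is $H = 1$ while the true noise is colored; the observed parameter covariance then follows $R^{-1} Q R^{-1}/N$ rather than the expression valid for a consistent noise model:

```python
import numpy as np

# Monte Carlo sketch of the sandwich formula (3.118): biased noise model
# (least squares, H = 1), true noise colored by H_o = 1 + 0.9 q^{-1}.
rng = np.random.default_rng(1)
N, runs = 2000, 400
theta_o = np.array([0.36, 0.2])          # true FIR parameters (illustration)

est = []
for _ in range(runs):
    u = rng.standard_normal(N + 2)       # white unit-variance input
    e = rng.standard_normal(N + 1)
    v = e[1:] + 0.9 * e[:-1]             # colored noise
    Phi = np.column_stack([u[1:-1], u[:-2]])   # regressors u(t-1), u(t-2)
    y = Phi @ theta_o + v
    est.append(np.linalg.lstsq(Phi, y, rcond=None)[0])
cov_mc = np.cov(np.array(est).T)

# Sandwich prediction: for white unit input R = I, so Cov ~ Q / N with
# Q_{ij} = r_v(i - j), the noise autocovariance r_v(0) = 1.81, r_v(1) = 0.9.
cov_th = np.array([[1.81, 0.9], [0.9, 1.81]]) / N

assert np.max(np.abs(cov_mc - cov_th)) < 4e-4
```

The off-diagonal covariance terms, which the naive formula $\lambda R^{-1}/N$ would miss entirely, are reproduced by the sandwich expression.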
3.8.1 Weighted Variance Constraints
The first example of where (3.118) can be utilized is when we consider the variance of $G(e^{j\omega}, \hat{\eta}_N)$, which, using a first order Taylor approximation, can be expressed as
$$\operatorname{Var} G(e^{j\omega}, \hat{\eta}_N) \approx \frac{1}{N}\, \frac{dG^*(e^{j\omega}, \eta_o)}{d\eta}\, R_\eta^{-1} Q_\eta R_\eta^{-1}\, \frac{dG(e^{j\omega}, \eta_o)}{d\eta}. \qquad (3.122)$$
Now consider the constraint
$$|W(e^{j\omega})|^2\, \frac{dG^*(e^{j\omega}, \eta_o)}{d\eta}\, R_\eta^{-1} Q_\eta R_\eta^{-1}\, \frac{dG(e^{j\omega}, \eta_o)}{d\eta} \leq 1 \quad \forall\, \omega, \qquad (3.123)$$
where $W(e^{j\omega})$ is a stable transfer function. This is a frequency-by-frequency constraint on the variance of $G(e^{j\omega}, \hat{\eta}_N)$. Notice that Corollary 3.1 now applies to the constraint (3.123) with $P^{-1} = R_\eta Q_\eta^{-1} R_\eta$. Thus, (3.123) is equivalent to
$$R_\eta Q_\eta^{-1} R_\eta - V(e^{j\omega}) V^*(e^{j\omega}) \geq 0 \quad \forall\, \omega, \qquad (3.124)$$
where
$$V(e^{j\omega}) = W(e^{j\omega})\, \frac{dG(e^{j\omega}, \eta_o)}{d\eta}. \qquad (3.125)$$
The constraint (3.124) can, by the use of Schur complements, be expressed as
$$\begin{bmatrix} V(e^{j\omega}) V^*(e^{j\omega}) & R_\eta \\ R_\eta & Q_\eta \end{bmatrix} \leq 0 \quad \forall\, \omega. \qquad (3.126)$$
Remark: The constraint (3.126) is a linear matrix inequality in $R_\eta$ and $Q_\eta$.
By suitable parametrizations of $R_\eta$ and $Q_\eta$, see Section 3.3, the constraint becomes a finite-dimensional LMI for each fixed frequency. To handle the frequency dependence, the function (3.126) can either be sampled or Lemma 3.5 may be used, resulting in LMIs of the type (3.97).
3.8.2 Parametric Confidence Ellipsoids
It is also possible to define confidence regions for the estimates $\hat{\eta}_N$ as
$$N (\hat{\eta}_N - \eta_o)^T R_\eta Q_\eta^{-1} R_\eta (\hat{\eta}_N - \eta_o) \leq \chi. \qquad (3.127)$$
In Section 3.7, we considered the generalized quality measure
$$F(\omega, \eta) \leq \gamma \quad \forall\, \omega \ \text{and}\ \forall\, \eta \in \Upsilon, \qquad \Upsilon = \{\eta \mid (\eta - \eta_o)^T R (\eta - \eta_o) \leq 1\}, \qquad (3.128)$$
where F was defined by (3.101). In Section 3.7, it was shown how the
constraints defined by (3.128) could be transformed into constraints that
are linear in the matrix R, the matrix that determines the shape of
the confidence ellipsoid (3.127) can be fit into the theory developed in Section 3.7. For this we let $R = R_\eta Q_\eta^{-1} R_\eta$, and the objective is to obtain a constraint that is linear in $Q_\eta$ and $R_\eta$.
Under the assumptions stated in Theorem 3.1, we know that the quality constraint (3.128) is equivalent to
$$\tau(\omega)(\gamma F_0(\omega) - F_1(\omega)) - E \geq 0 \quad \forall\, \omega, \qquad E \triangleq \begin{bmatrix} -R & R\eta_o \\ \eta_o^T R & 1 - \eta_o^T R \eta_o \end{bmatrix}, \qquad (3.129)$$
for some positive, real and scalar valued function $\tau(\omega) > 0$. Notice that
(3.129) is linear in $R$. Introduce the matrix $M(\omega)$ defined by
$$M(\omega) = \tau(\omega)(\gamma F_0(\omega) - F_1(\omega)) - \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}. \qquad (3.130)$$
Then (3.129) is equivalent to
$$M(\omega) + \begin{bmatrix} I \\ -\eta_o^T \end{bmatrix} R \begin{bmatrix} I & -\eta_o \end{bmatrix} \geq 0 \quad \forall\, \omega. \qquad (3.131)$$
Now let $R = R_\eta Q_\eta^{-1} R_\eta$ and insert this into (3.129). Then (3.129) is, using Schur complements, equivalent to
$$\begin{bmatrix} M(\omega) & \begin{bmatrix} I \\ -\eta_o^T \end{bmatrix} R_\eta \\ R_\eta \begin{bmatrix} I & -\eta_o \end{bmatrix} & -Q_\eta \end{bmatrix} \geq 0 \quad \forall\, \omega, \qquad (3.132)$$
which now is a constraint that is linear in $Q_\eta$ and $R_\eta$. One possibility to handle the frequency dependence in the constraint (3.132) is to apply the theory in Section 3.7.2 to obtain a finite-dimensional formulation of (3.132). An alternative is to sample the constraint (3.132) along the frequency axis. For parametrizations of $R_\eta$ and $Q_\eta$, we refer to Section 3.3.
3.9 Computational Aspects
We have in this chapter frequently used the KYP-lemma to embed positivity constraints on finite auto-correlation sequences into LMI descriptions, cf. Lemma 3.1, Lemma 3.5 and Theorem 3.2. KYP constraints appear in many control and signal processing applications, e.g. linear system design and analysis (Boyd and Barratt, 1991; Hindi et al., 1998), robust control design using integral quadratic constraints (Jönsson, 1996; Megretski and Rantzer, 1997), and quadratic Lyapunov function search (Boyd et al., 1994). Applications related to finite auto-correlation sequences are filter design (Wu et al., 1996) and MA-estimation (Stoica et al., 2000).

However, embedding KYP constraints in semidefinite programs is often computationally very costly. The computational complexity is of the order $O(n^6)$ using standard solvers, where $n$ is the number of free parameters. This has led to a number of contributions that study different possibilities to reduce the complexity for this type of problem, see e.g. (Hansson and Vandenberghe, 2001; Wallin et al., 2003; Gillberg and Hansson, 2003; Kao et al., 2004). Methods related to finite auto-correlation sequences are reported in (Dumitrescu et al., 2001; Alkire and Vandenberghe, 2002). It is shown that by solving a dual problem the complexity can be reduced to $O(n^4)$, or even further in some cases where the structure of the dual is exploited.
3.10 Robustness Aspects
Solutions to most optimal input design problems depend on the true, and
unknown, underlying system. One common way to overcome this is to
replace the true system in the design by some estimate of the system that,
e.g. , is obtained from an initial identification experiment. However, due
to the estimation error, there is no guarantee that a design based on such
an estimate will yield a solution that is satisfactory when applied to the
true system. Hence there is a need to develop methods that are robust
with respect to the true system.
Below we will discuss some important issues: the parametrization of the input spectrum, mini-max solutions with respect to a set of initial models, and the influence of input design on estimated low-order models; finally, we will point out adaptation as a useful tool for experiment design.
3.10.1 Input Spectrum Parametrization
As we have discussed in Section 3.2 it may happen that the input design
problem only depends on c̃0 , . . . , c̃M −1 , for some finite positive integer
M , in a certain expansion of the input spectrum (3.27). The additional
degrees of freedom, c̃M , c̃M +1 , . . . can then be used to increase robustness
of the design. For the partial correlation parametrization, it is clear that
different correlation extensions yield different robustness properties. Using a discrete spectrum with a minimal number of non-zero spectral lines
for the correlation extension may lead to a design that is very vulnerable
to errors in a priori assumptions about the system behavior. For example, if the true system order turns out to be higher than the number of
non-zero spectral frequencies (over the interval [0, 2π)) the system will
not be identifiable if a separately parametrized noise model is used. Furthermore, even if, say, an ARX-model is used so that the denominator
polynomial in the system dynamics may be estimated from the noise,
certain directions of the parameters associated with the system dynamics are not improved upon by the input in this situation. An all-pole
realization, on the other hand, will yield a spectrum solution that is nonzero everywhere and can be used to identify models of any order. It is
hence important to consider robustness issues when deciding upon which
correlation extension method to use.
In the finite dimensional spectrum parametrization, additional constraints related to robustness can be included in the original program, e.g. frequency-by-frequency bounds to guarantee a certain excitation level.
An alternative way to make the design less sensitive to the system estimate that is used for the design is to restrict the degrees of freedom in the parametrized spectrum. This restriction prevents the design from adapting too closely to the system the design is based on. Consider the design for a resonant system as illustrated in Section 3.1.4. When there was no frequency-by-frequency bound on the input spectrum, the design concentrated most of the input power around the first resonance peak. Such a design may be very vulnerable with respect to bad estimates of this peak, cf. Section 5.3.3. Hence, a flatter spectrum is less dependent on accurate knowledge of the location of certain narrow frequency bands of the underlying true system. A way to force the design to produce flat spectra is to restrict the flexibility in the parametrization.
3.10.2 Working with Sets of a Priori Models
Robustness can be achieved by posing the design problem such that performance objectives and constraints are satisfied for all systems within
some prior model set. Unfortunately, at present there exists no input
design method with this property. However, for the approach described
in this thesis it is possible to include constraints and objective functions
for several systems simply by adding LMIs for each separate system to
the overall problem. Even though no guarantees can be given in general
that the objectives are met for the true system, this approach provides
improved robustness compared to a design problem that is based on a
single prior model.
Given an initial estimate θ̂i and associated covariance matrix PN one
may for example pick models in the corresponding uncertainty set Uθ̂i
(see (2.20) for a definition of Uθ ). One may also draw samples from the
corresponding Normal distribution.
In the numerical example given in the next section, we will compare the optimal solution, based on knowledge of the true system, with a design where the true system is replaced by normally distributed samples lying in a confidence region obtained from an initial identification experiment.
3.10.3 Adaptation
Another way to combat the uncertainty is to adapt the experiment design as more data, and thus more information about the system, is gathered. This has e.g. been suggested in (Forssell and Ljung, 2000; Lindqvist and Hjalmarsson, 2000; Samyudiya and Lee, 2000). In (Hjalmarsson et al., 1996), a closed-loop experiment design is used in a two-step procedure. There are, however, few contributions on adaptive experiment design. One exception is presented in (Lindqvist, 2001), where an FIR input filter is adaptively updated. The algorithm is guaranteed to converge under mild assumptions on the input filter together with the assumption of the system belonging to the model class. Furthermore, the algorithm is stable as long as the open-loop system is stable and provided that the input variance is constrained. Other recent contributions related to adaptive input design are (Rivera et al., 2003) and (Lacy et al., 2003).
3.10.4 Low and High Order Models and Optimal Input Design
In reality we do not know the complexity of the true system. Here we will illustrate, by means of a very simple example, that optimal input design can be useful to obtain models of both low and high order that have the same statistical accuracy as the corresponding full order model estimate.
Example 3.15
Suppose that the objective is to estimate the static gain of the FIR system
$$y(t) = \sum_{k=1}^{n} b_k^o u(t-k) + e_o(t) \qquad (3.133)$$
where $e_o$ is Gaussian white noise with variance $\lambda_o$. Furthermore, assume that the input power is bounded by $\frac{1}{N}\sum_{t=1}^{N} u^2(t) \leq \lambda_u$. It is easy to realize that a constant input with amplitude $\sqrt{\lambda_u}$ is optimal for estimating the static gain; cf. Example 3.3 and let $a$ tend to one. This will of course lead to a poorly conditioned problem if we use a model of order two or larger. Let us study the properties of a linear regression model of arbitrary order when the input is static. The model is thus represented by
$$y(t) = \varphi^T(t)\theta + e(t) = \begin{bmatrix} u(t-1) & \cdots & u(t-m) \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} + e(t) \qquad (3.134)$$
where $e$ represents white noise. Thus the one-step ahead output prediction is
$$\hat{y}(t) = \varphi^T(t)\theta_m \qquad (3.135)$$
where $\theta_m = [b_1, \ldots, b_m]^T$. The least-squares estimate $\hat{\theta}_m$ is a solution to
the normal equations
$$\sum_{t=1}^{N} \varphi(t) y(t) = \sum_{t=1}^{N} \varphi(t) \varphi^T(t)\, \hat{\theta}_m. \qquad (3.136)$$
Since $u$ is static with amplitude $|u(t)| = \sqrt{\lambda_u}$, we can study the first row of (3.136), from which we obtain that
$$[1, \ldots, 1]\hat{\theta}_m = [1, \ldots, 1]\theta_o + \frac{1}{\sqrt{\lambda_u}\, N} \sum_{t=1}^{N} e_o(t) \qquad (3.137)$$
where $\theta_o = [b_1^o, \ldots, b_m^o]^T$. Hence, the least-squares estimate $\hat{\theta}_m$ will provide an unbiased estimate of the true static gain since
$$E\{[1, \ldots, 1]\hat{\theta}_m\} = [1, \ldots, 1]\theta_o. \qquad (3.138)$$
This is independent of the model order, i.e. it holds both for models of lower order than the true system and for over-parametrized models. Furthermore, the variance of the estimate of the static gain will be $\lambda_o / (N \lambda_u)$, also independently of the model order, i.e. the estimates have the same accuracy as the static gain estimate based on a full-order model.
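A Monte Carlo sketch of this example (hypothetical numerical values; plain least squares is used, with the minimum-norm solution taken when the constant input makes the regression rank-deficient):

```python
import numpy as np

# Example 3.15, numerically: with a constant input of amplitude sqrt(lam_u),
# the least-squares static-gain estimate is unbiased with variance
# lam_o / (N * lam_u) for under-, exactly- and over-parametrized models.
rng = np.random.default_rng(2)
N, lam_u, lam_o = 400, 2.0, 0.5
b_o = np.array([0.5, 0.3, -0.2])         # true FIR coefficients, n = 3
u_amp = np.sqrt(lam_u)

for m in (1, 2, 3, 5):                   # model orders to try
    gains = []
    for _ in range(400):
        e = np.sqrt(lam_o) * rng.standard_normal(N)
        y = u_amp * b_o.sum() + e        # constant input: y(t) = sqrt(lam_u) sum(b_o) + e(t)
        Phi = np.full((N, m), u_amp)     # regressors u(t-1), ..., u(t-m), all constant
        theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
        gains.append(theta.sum())        # static gain estimate [1, ..., 1] theta
    gains = np.array(gains)
    assert abs(gains.mean() - b_o.sum()) < 0.02
    assert abs(gains.var() - lam_o / (N * lam_u)) < 3e-4
```

Both assertions pass for every model order, matching (3.137)-(3.138) and the variance claim above.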
This example is further illustrated in (Hjalmarsson, 2004). It is a very interesting example. It illustrates that it is possible to obtain an accurate estimate of the static gain when the model order is lower than that of the true system, by applying a suitable input. Normally, undermodeling leads to a bias contribution that makes the total error larger than the error obtained with a full-order model. Furthermore, we obtain the same accuracy even in the case of over-parametrization, although the variance error usually increases with the model order. In Chapter 6, we will see a similar phenomenon when we apply optimal input design for identification of system zeros.
3.11 Framework Review and Numerical Illustration
The necessary pieces for a quite flexible framework for experiment design have been presented in the preceding sections. Here we will briefly recapitulate the main points. Furthermore, an example illustrating some of the features will be presented.

The formulation of an experiment design problem can be decomposed into the following parts:
• Spectrum parametrization.
Here, the framework allows for any linear and finite-dimensional
parametrization of the spectrum or a partial expansion thereof.
This includes e.g. all-zero (FIR) and all-pole (AR) spectra, as well
as discrete spectra.
The choice of whether to use a finite dimensional spectrum parametrization or a partial correlation parametrization is governed by:
– optimality aspects,
– computational aspects,
– signal constraint aspects, and
– robustness aspects.
The partial correlation parametrization is globally optimal and may use a minimal number of parameters, leading to less computational complexity. However, certain signal constraints cannot be guaranteed, and the parametrization may depend on the true system.
The finite dimensional spectrum parametrization does not in general yield a globally optimal solution but the basis functions need
not be functions of the true system and this approach can handle
frequency-by-frequency signal constraints.
• Quality constraints.
General linear, possibly frequency dependent, functions of the asymptotic covariance matrix P can be used. Such functions take the form
(3.92) in Lemma 3.4. To handle frequency dependence, the function
(3.92) can either be sampled or Lemma 3.5 may be used, resulting
in LMIs of the type (3.97). These types of quality measures can
either be included as fixed constraints or included in the objective
function, e.g. γ in (3.92) can be either fixed or minimized.
It is also possible to use certain types of quality constraints that
are guaranteed to hold in a confidence region. These functions take
the form (3.101) and include, e.g. , weighted frequency function
errors (see Example 3.13) and worst-case chordal distance measures
(see Example 3.14). The resulting constraint is given by (3.109).
Also here, frequency dependence can be handled either by sampling
the constraints or by application of the KYP-lemma. The latter
approach requires a finite dimensional parametrization of a certain
variable τ (ω), cf. (3.113). This parametrization may introduce a
certain conservatism in the design. The resulting constraints are
the two LMIs in (3.116).
• Signal constraints.
Input and output energy constraints are expressed by (3.57) and
(3.59), respectively. One may, e.g. , use (3.57) as the objective
function to minimize the energy used in a certain frequency band.
This may be of interest for systems such as flexible structures where
certain excitation frequencies may be highly damaging to the system.
Frequency-by-frequency constraints can easily be included as discussed in Section 3.4.2.
As for the quality constraints, signal constraints can either be included as fixed constraints or in the objective function.
• Robustness constraints.
We refer to Section 3.10 for a discussion of different robustness
aspects.
We will now illustrate some of the features via an example related to
identification for control.
Example 3.16
In this example we will illustrate the machinery of the framework for an
identification for control problem.
Let the true system be given by
$$y(t) = G_o(q)u(t) + e(t) \qquad (3.139)$$
with $G_o(q) = \frac{0.36\, q^{-1}}{1 - 0.7\, q^{-1}}$ and where $e(t)$ is zero-mean white noise with variance 0.1. The magnitude of the frequency function of $G_o$ is shown as the dash-dotted line in Figure 3.6.

The modeling objective is to be able to design a controller such that the resulting closed-loop system is stable and the complementary sensitivity function is close to
$$T = \frac{(1 - 0.1353)^2\, q^{-1}}{(1 - 0.1353\, q^{-1})^2}.$$
Figure 3.6: Thick solid line: Optimal spectrum based on (3.143). Dashed line: Input spectrum obtained from robust algorithm. Thin solid line: Weighting function $T$. Dash-dotted line: Open loop system $G_o$. Dotted line: White noise input with variance $\Phi_u(\omega) = 0.26$.
The amplitude curve of $T$ is shown as the thin solid line in Figure 3.6. A sufficient condition for this is that the weighted relative model error (3.19) is sufficiently small (in particular $\|\Delta(\theta)\|_\infty \leq 1$ guarantees stability) (Hjalmarsson and Jansson, 2003). Here we choose the condition
$$\|\Delta(\theta)\|_\infty \leq 0.1. \qquad (3.140)$$
As model structure we choose
$$G(\theta) = \frac{b\, q^{-1}}{1 - a\, q^{-1}}, \qquad \theta = [a\ b]^T.$$
The sample size is set to $N = 500$ samples.⁵ The objective is to find the minimum energy required, and the corresponding input spectrum, such that (3.140) is satisfied for all models in the resulting 95% confidence region. We therefore use the criterion (3.98). We thus have the problem
$$\begin{aligned}
\underset{\Phi_u}{\text{minimize}} \quad & \alpha \\
\text{subject to} \quad & \left| T\, \frac{G(\theta_o) - G(\theta)}{G(\theta)} \right|^2 \leq \gamma^2 \quad \forall\, \omega,\ \ \forall\, \theta:\ (\theta - \theta_o)^T P_N^{-1} (\theta - \theta_o) \leq \chi \\
& \Phi_u(\omega) \geq 0 \quad \forall\, \omega \\
& \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega \leq \alpha
\end{aligned} \qquad (3.141)$$
where $\gamma = 0.1$ and where the elliptical constraint in (3.141) is defined by (2.20) with $\chi = 5.99$, the 95% quantile of the $\chi^2(2)$ distribution.

⁵ The sample size only acts as a scaling factor for the covariance matrix and it is easy to modify the results for arbitrary sample size.
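The confidence-level constant can be checked by hand: for two degrees of freedom the $\chi^2$ distribution has the closed-form CDF $1 - e^{-x/2}$, so the 95% quantile is $-2 \ln 0.05$:

```python
import numpy as np

# chi-square(2) CDF is 1 - exp(-x/2), so the 95% quantile used in (3.141)
# is -2 ln(0.05) ~= 5.99.
chi = -2 * np.log(0.05)
assert round(chi, 2) == 5.99
```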
When we restrict the structure of $\tau$ to $\tau(\omega) = \sum_{k=0}^{9} \tau_k (e^{-j\omega k} + e^{j\omega k})$, Theorem 3.2 can be applied, giving the following approximation to (3.141) (recall the definition (3.32) of $K$):
$$\begin{aligned}
\underset{\Phi_u,\, Q_\tau,\, Q_\Lambda,\, \tau_0, \ldots, \tau_9}{\text{minimize}} \quad & \alpha \\
\text{subject to} \quad & K(Q_\tau, \{A_\tau, B_\tau, C_\tau, D_\tau\}) \geq 0 \\
& K(Q_\Lambda, \{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}) \geq 0 \\
& Q_\tau^T = Q_\tau, \quad Q_\Lambda^T = Q_\Lambda \\
& \Phi_u(\omega) \geq 0 \quad \forall\, \omega \\
& \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega \leq \alpha
\end{aligned} \qquad (3.142)$$
where $C_\tau$ and $D_\tau$ depend linearly on $\tau_k$, $k = 0, \ldots, 9$, and $C_\Lambda$ and $D_\Lambda$ depend linearly on $\tau_k$, $k = 0, \ldots, 9$, and $\Phi_u$. The above problem is thus convex in all free variables.
In order to reduce the problem to a finite-dimensional one, the spectrum is parametrized as in (3.85) with $B_k(e^{j\omega}) = e^{-j\omega k}$ and $M = 20$. Example 3.10 gives that this corresponds to shaping the input spectrum with an FIR filter of order 20, and that the parameters $c_k$ in (3.85) correspond to the auto-correlation sequence $r_k$ of the input. Lemma 3.1 gives that the positivity constraint on $\Phi_u$ now is equivalent to
$$K(Q, \{A, B, C, D\}) \geq 0$$
and Lemma 3.3 gives that the input variance constraint can be expressed as $r_0 \leq \alpha$. Thus the input design problem (3.142) is equivalent to
$$\begin{aligned}
\underset{Q_\tau,\, Q_\Lambda,\, \tau_0, \ldots, \tau_9,\, r_0, \ldots, r_{19}}{\text{minimize}} \quad & \alpha \\
\text{subject to} \quad & K(Q_\tau, \{A_\tau, B_\tau, C_\tau, D_\tau\}) \geq 0 \\
& K(Q_\Lambda, \{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}) \geq 0 \\
& K(Q, \{A, B, C, D\}) \geq 0 \\
& Q_\tau^T = Q_\tau, \quad Q_\Lambda^T = Q_\Lambda, \quad Q^T = Q \\
& r_0 \leq \alpha.
\end{aligned} \qquad (3.143)$$
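The identity behind the power constraint $r_0 \leq \alpha$ can be verified directly (a sketch, assuming the correlation coefficients enter the spectrum as in Example 3.10, with hypothetical values $r_k$):

```python
import numpy as np

# With Phi_u(w) = r_0 + 2 * sum_{k>=1} r_k cos(kw), the total input power
# (1/2pi) int_{-pi}^{pi} Phi_u dw equals r_0: every cosine term integrates
# to zero over a full period. Hence the power bound in (3.143) is r_0 <= alpha.
r = np.array([1.0, 0.5, 0.2])            # hypothetical r_0, r_1, r_2
w = np.linspace(-np.pi, np.pi, 20001)
Phi = r[0] + 2 * sum(rk * np.cos(k * w) for k, rk in enumerate(r) if k > 0)
power = np.mean(Phi)                      # ~ (1/2pi) int Phi_u dw on a uniform grid
assert abs(power - r[0]) < 1e-3
```

This is why the energy objective in the example reduces to a single linear constraint on the design variables.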
Solving (3.143) gives the input spectrum shown as the thick solid line in Figure 3.6. The solution conforms quite well to common intuition: most of the energy is distributed around the desired bandwidth of the closed-loop system. The minimum power required is $\alpha = 0.26$. Figure 3.7 shows the parameter estimates of 1000 Monte Carlo runs for the optimal design based on (3.143). We see that the estimates are clustered inside the contour $\|\Delta\|_\infty = 0.1$ as desired. In fact, 96% of the estimated models satisfy the quality constraint, since some of the estimates outside the confidence ellipse are still inside the level curve $\|\Delta\|_\infty = 0.1$.
Remark: Given a feasible solution of (3.143), we know that (3.109) is satisfied, and according to Theorem 3.1 this implies that $\|\Delta\|_\infty \leq 0.1$ for all models in the uncertainty set. However, the restriction on $\tau(\omega)$ will lead to a conservative input design, since the representation of $\tau(\omega)$ only corresponds to a subclass of all $\tau(\omega) \geq 0$. In this example, however, the solution is not very conservative at all.
For comparison purposes, confidence ellipses for the optimal design and a white noise design with the same variance (corresponding to $\Phi_u(\omega) = 0.26$, shown as the dotted line in Figure 3.6) are also shown in Figure 3.7. These ellipses are all based on estimates of the covariance matrix obtained from Monte Carlo simulations. We clearly see that the approximation of (3.141) made in (3.143) performs well, whereas the white input design clearly does not meet the objective and, in this case, is uniformly worse than the optimal design. In fact, only 67% of the models satisfy the quality constraint. To obtain a white input design with a confidence ellipsoid that is completely inside the contour curve corresponding to $\|\Delta\|_\infty = 0.1$, an input variance of $\Phi_u(\omega) = 0.85$ is required.
Figure 3.7: Dots: Estimated model parameters from 1000 Monte-Carlo runs based on optimal design. Dashed ellipse: Estimated 95% confidence bound for the parameter estimates. Dash-dotted ellipse: Confidence bound for white noise input with $\Phi_u = 0.26$. Contour lines with interval 0.025 are plotted for $\|\Delta\|_\infty$, and $\|\Delta\|_\infty = 0.1$ corresponds to the thick solid contour.
In calculating the optimal design above, knowledge of the true system
was used. Both the quality constraint and the covariance matrix PN in
(3.141) depend on the true system. This manifests itself in (3.143) in
that AΛ , BΛ , CΛ and DΛ all depend on θo .
In a practical situation the true system is unknown. In a second design
we have used an initial identification experiment with Φu (ω) = 0.1 and
N = 500 to obtain an estimate, θ̂i of the true parameters and an estimate
of the covariance of the parameters, P̂i . With this information at hand,
9 additional parameter estimates were drawn from a Normal distribution
with mean θ̂i and covariance P̂i . A total of 10 parameter estimates have
been used as replacements for θo in (3.143). These estimates are shown
as circles in Figure 3.8. This leads to an input design problem with 10
quality constraints instead of one. The resulting input spectrum is shown
3.11 Framework Review and Numerical Illustration
Figure 3.8: Dots: Estimated model parameters from 1000 Monte-Carlo runs based on the robust design. Dashed ellipse: Estimated 95% confidence bound for the parameter estimates. Dash-dotted ellipse: Confidence bound for white noise input with Φu = 0.48. Contour lines for ∆∞ are plotted with interval 0.025; ∆∞ = 0.1 corresponds to the thick solid contour. The square is the initial estimate θ̂i and the circles are the randomly picked estimates used in the robust design.
in Figure 3.6. Here we see that more total energy is required in order to
satisfy the quality constraint for these 10 systems compared to the single
system design considered previously. The power of the input obtained
from this robustified design is α = 0.48. Estimates from 1000 Monte Carlo runs with this design are shown in Figure 3.8. This figure also
includes confidence ellipses for this design and a white noise design with
the same power, i.e. Φu (ω) = 0.48.
The price of having 10 different replacements for the unknown θo in
the design is in this case that the solution becomes conservative compared
to the design solely based on the true system. This is evidenced by the
simulations. About 99.5% of the models satisfy the constraint ∆∞ ≤
0.1. However, the robustified algorithm yields a solution that is more
effective compared to a white noise input with the same power. For a
white noise input with Φu (ω) = 0.48, only 87% of the models satisfy the
quality constraint.
Chapter 4
Finite Sample Input Design for Linearly Parametrized Models
Most experiment designs rely on uncertainty descriptions (variance expressions or confidence regions) that are valid asymptotically in the sample size N . In this chapter we take another step towards more reliable
input designs, by considering a finite sample size. For the considered
class of linearly parametrized frequency functions it is possible to derive
variance expressions that are exact for finite sample sizes. Based on these
variance expressions it is shown that the optimization over the square of
the Discrete Fourier Transform (DFT) coefficients of the input leads to
convex optimization problems.
Two different approaches are considered. The first method is based
on a recently developed explicit variance expression which is valid for
a class of model structures, which includes FIR, Laguerre and Kautz
models. A restriction with this expression is that the number of non-zero DFT coefficients of the input must equal the number of estimated
parameters. The other method is directly based on the covariance matrix
for the parameter estimates, which allows for an arbitrary number of non-zero DFT coefficients. This method relates to the framework presented
in Chapter 3.
4.1 Introduction
In this chapter we will again consider input design when the true system is
in the model set. Let G(ejω , θ) be the frequency function to be estimated
with θ ∈ Rn the unknown parameter vector. Denote the parameter estimate based on a sample size N of the input-output data set by θ̂N , and
the corresponding frequency function estimate by ĜN (ejω ) = G(ejω , θ̂N ).
The close connection between the variance of the frequency function estimate, Var ĜN (ejω ), and the model uncertainty has prompted the use
of this variance as the key variable in input design.
The key expression for the variance is (2.23), i.e. the first order Taylor approximation

Var ĜN (ejω ) ≈ (1/N ) [ dG∗ (ejω , θo )/dθ ] P [ dG(ejω , θo )/dθ ],   (4.1)
where θo is the true parameter vector. Based on the expression (4.1) it
is difficult to interpret the influence of the experiment conditions such as
the input spectrum on the variance. This has been one of the drives to
find alternative expressions for (4.1). By introducing

κn,N (ω) = Var(ĜN (ejω )) · N Φu (ω)/Φv (ω),

where Φu (ω) and Φv (ω) denote the input and noise spectra, respectively, one can write

Var ĜN (ejω ) = κn,N (ω) Φv (ω)/(N Φu (ω)).   (4.2)
In one very fruitful line of research, represented by (Gevers and Ljung,
1986; Hjalmarsson et al., 1996; Forssell and Ljung, 2000; Zhu, 2001), the
high-order approximation
κn,N (ω) ≈ mo   (4.3)

for large enough model order mo and sample size N , has been used for input design. The direct frequency-by-frequency dependence of the variance on the input spectrum leads to explicit frequency-wise solutions for the input spectrum. The approximation (4.3) is motivated
by the asymptotic result
lim_{m→∞} lim_{N→∞} (N/m) Var ĜN (ejω ) = Φv (ω)/Φu (ω),   (4.4)
originally derived in (Ljung and Yuan, 1985; Ljung, 1985).
Starting with (Ninness et al., 1999), there have been a number of
contributions towards more exact expressions for κn,N (ω) for different
model structures. The case of a model with fixed denominator and fixed
moving average noise model excited by an auto-regressive (AR) input is
studied in (Xie and Ljung, 2001) and an exact expression for κn (ω) ≜ limN →∞ κn,N (ω) is derived. This expression is thus not asymptotic in
the number of parameters. A generalization of this result, including Box-Jenkins models, is presented in (Ninness and Hjalmarsson, 2004; Ninness
and Hjalmarsson, 2002a). For model classes that are linear in the parameters, the paper (Hjalmarsson and Ninness, 2004) presents an expression
that is valid for finite sample sizes when the number of non-zero spectral
lines of the input equals the number of estimated parameters.
Here we will exploit the variance results in (Hjalmarsson and Ninness,
2004) for input design. This leads to a geometric programming problem.
A related approach is presented in (Lee, 2003) and we will comment more
on the relation to this approach later in the chapter.
We will also derive a method that is based on (4.1). This method
is not restricted to the conditions in (Hjalmarsson and Ninness, 2004),
i.e. more than n of the spectral lines can be taken non-zero. The price
paid for this is that the design procedure itself is somewhat more abstract.
Furthermore, the computational complexity is larger and becomes an
issue for large numbers of excited spectral lines.
The outline of the chapter is as follows. Preliminaries concerning
Fourier representation of signals are covered in Section 4.2. The model
structure, and least-squares estimation of its parameters, is introduced in
Section 4.3. The input design problem is then introduced in Section 4.4
and solution methods are discussed in the following subsections. Section
4.5 contains a summary.
4.2 Discrete Fourier Transform Representation of Signals
Let z(t) ∈ R be defined for t = −(n − 1), . . . , N − 1 (where n > 0) and
satisfy
z(t + N ) = z(t),   t = −(n − 1), . . . , −1.   (4.5)
Then we can write

z(t) = Σ_{k=0}^{N−1} Zk e^{jω◦ kt} ,   t = −(n − 1), . . . , N − 1   (4.6)

where ω◦ = 2π/N . Since z(t) is real, it holds that ZN−k = Zk∗ , k = 1, . . . , N − 1, and Z0 ∈ R.
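As a small numerical illustration (a sketch, assuming NumPy's FFT convention with the normalization Zk = (1/N ) Σt z(t)e^{−jω◦ kt} so that (4.6) holds), the representation and the conjugate symmetry can be checked directly:

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)
z = rng.standard_normal(N)                  # one period of a real signal

# DFT coefficients Z_k such that z(t) = sum_k Z_k e^{j w0 k t}, w0 = 2*pi/N
Z = np.fft.fft(z) / N
w0 = 2 * np.pi / N
t = np.arange(N)
z_rec = sum(Z[k] * np.exp(1j * w0 * k * t) for k in range(N))

assert np.allclose(z_rec.real, z) and np.allclose(z_rec.imag, 0, atol=1e-12)
# Conjugate symmetry of a real signal: Z_{N-k} = conj(Z_k)
assert np.allclose(Z[1:][::-1], np.conj(Z[1:]))
```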
We define the covariance function for z(t) according to
rzz (τ ) = (1/N ) Σ_{t=0}^{N−1} z(t)z(t − |τ |),   |τ | ≤ n − 1.
Using (4.6) we have
rzz (τ ) = Σ_{k=0}^{N−1} |Zk |2 e^{jω◦ kτ} ,   |τ | ≤ n − 1.   (4.7)
When only m < N spectral lines of z(t) are non-zero, we will express
(4.6) as
z(t) = Σ_{k=1}^{m} Z̃k e^{jωk t} ,   t = −(n − 1), . . . , N − 1   (4.8)
where Z̃k = Zl and ωk = ω◦ l for some 0 ≤ l < N . Notice that the
covariance function then can be written
rzz (τ ) = Σ_{k=1}^{m} |Z̃k |2 e^{jωk τ} ,   |τ | ≤ n − 1.   (4.9)
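The relation (4.7) between the covariance function and the squared DFT magnitudes can be verified for a periodically extended signal; the sketch below assumes the normalization Zk = FFT(z)k /N and an arbitrary signal length:

```python
import numpy as np

N = 32
rng = np.random.default_rng(1)
z = rng.standard_normal(N)
Z = np.fft.fft(z) / N                      # z(t) = sum_k Z_k e^{j w0 k t}
w0 = 2 * np.pi / N

def r_time(tau):
    # (1/N) sum_t z(t) z(t - |tau|) under the periodic extension (4.5)
    return np.mean(z * np.roll(z, abs(tau)))

for tau in range(5):
    r_freq = np.sum(np.abs(Z) ** 2 * np.exp(1j * w0 * np.arange(N) * tau))
    assert np.isclose(r_time(tau), r_freq.real) and abs(r_freq.imag) < 1e-12
```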
4.3 Least-squares Estimation
It will be assumed that the true system is single-input/single-output and
given by
y(t) = G◦ (q)u(t) + e◦ (t)
(4.10)
where {e◦ (t)} is a zero mean i.i.d. process that satisfies E{|e◦ (t)|2 } < ∞
and has variance λ◦ . The system dynamics is given by
G◦ (q) = θ◦T Γ(q) F (q)
(4.11)
where F (q) is a known stable transfer function and where
Γ(q) = [q −1 , . . . , q −n ]T
(4.12)
(q −1 is the time-shift operator). By introducing
z(t) = F (q)u(t)
(4.13)
ϕ(t) = [z(t − 1), z(t − 2), . . . z(t − n)]T
(4.14)
and
we may rewrite (4.10) as
y(t) = θ◦T ϕ(t) + e◦ (t).
(4.15)
It is assumed that the model structure is of the same form
y(t) = θT ϕ(t) + e(t).
(4.16)
Notice that several common model structures fit into this form. FIR
systems correspond to F (q) = 1. Furthermore, fixed denominator structures such as Laguerre and Kautz models correspond to F (q) = 1/A(q)
for some suitably chosen polynomial A(q) (Ninness et al., 1999).
The available data are {y(t)}_{t=1}^{N} and {ϕ(t)}_{t=1}^{N} , and based on these samples the parameter vector θ is estimated using the least-squares method. This results in the estimate

θ̂N = [ (1/N ) Σ_{t=1}^{N} ϕ(t)ϕT (t) ]^{−1} (1/N ) Σ_{t=1}^{N} ϕ(t) y(t).   (4.17)
The function

ĜN (ejω ) = θ̂N^T Γ(ejω )F (ejω )   (4.18)

is an estimate of the true frequency function G◦ (ejω ).
Before we proceed we introduce the notation

T (t(τ ))   (4.19)

for an n × n Toeplitz matrix with elements tij = t(i − j).
For (4.14) it follows that
(1/N ) Σ_{t=1}^{N} ϕ(t)ϕT (t) = T (rzz (τ )) .   (4.20)
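Identity (4.20) and the estimate (4.17) are straightforward to check numerically; the sketch below uses a hypothetical signal length and a made-up true parameter vector:

```python
import numpy as np

N, n = 64, 4
rng = np.random.default_rng(2)
zp = rng.standard_normal(N)                       # one period of z(t)
t = np.arange(1, N + 1)
# Regressors phi(t) = [z(t-1), ..., z(t-n)]^T under the periodic extension
Phi = np.column_stack([zp[(t - i) % N] for i in range(1, n + 1)])

# Identity (4.20): (1/N) sum_t phi(t) phi(t)^T = T(r_zz)
r = [np.mean(zp * np.roll(zp, tau)) for tau in range(n)]
T = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
assert np.allclose(Phi.T @ Phi / N, T)

# Least-squares estimate (4.17) on data simulated from (4.15)
theta_o = np.array([0.5, -0.2, 0.1, 0.05])        # made-up true parameters
y = Phi @ theta_o + 0.1 * rng.standard_normal(N)
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
assert np.linalg.norm(theta_hat - theta_o) < 0.5
```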
The input design methods to be presented below are based on the finite
sample variance expression in the following lemma.
Lemma 4.1
Suppose that ϕ(t) is given by (4.14) where z(t), t = −(n − 1), . . . , N − 1
is deterministic and defined by (4.6). Suppose further that the number of
estimated parameters n is not larger than the number of non-zero spectral
lines for z(t). Then the least-squares estimate θ̂N ∈ Rn defined by (4.17)
is well defined and has covariance matrix
Cov θ̂N = (λ◦ /N ) T −1 (rzz (τ ))   (4.21)

where rzz is the covariance function (4.7) of z.
Furthermore, the variance of ĜN (ejω ), defined in (4.18), is given by
Var ĜN (ejω ) = (λ◦ /N ) |F (ejω )|2 Γ∗ (ejω ) T −1 (rzz (τ )) Γ(ejω ).   (4.22)
Proof: See (Hjalmarsson and Ninness, 2004)
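The covariance expression (4.21) can be cross-checked both algebraically (Cov θ̂N = λ◦ (ΦT Φ)−1 for deterministic regressors) and by Monte Carlo simulation; the sketch below uses arbitrarily chosen values of λ◦ , N and n:

```python
import numpy as np

N, n, lam = 64, 3, 0.5
rng = np.random.default_rng(8)
zp = rng.standard_normal(N)                      # one period of z(t)
t = np.arange(1, N + 1)
Phi = np.column_stack([zp[(t - i) % N] for i in range(1, n + 1)])
r = [np.mean(zp * np.roll(zp, tau)) for tau in range(n)]
T = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

# Exact relation: lam * (Phi^T Phi)^{-1} = (lam/N) T^{-1}(r_zz), cf. (4.21)
theory = lam / N * np.linalg.inv(T)
assert np.allclose(lam * np.linalg.inv(Phi.T @ Phi), theory)

# Monte Carlo cross-check
R = 20000
E = np.sqrt(lam) * rng.standard_normal((R, N))   # i.i.d. noise realizations
Dev = E @ Phi @ np.linalg.inv(Phi.T @ Phi)       # theta_hat - theta_o, per run
sample_cov = Dev.T @ Dev / R
assert np.max(np.abs(sample_cov - theory)) < 0.1 * np.max(np.abs(theory))
```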
Define

vk (ω) = Π_{l=1, l≠k}^{m} | (ejω − ejωl )/(ejωk − ejωl ) |2 = Π_{l=1, l≠k}^{m} sin2 ((ω − ωl )/2) / sin2 ((ωk − ωl )/2) .   (4.23)
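The two forms of vk (ω) in (4.23) agree, since |ejω − ejωl |2 = 4 sin2 ((ω − ωl )/2) and the factors of 4 cancel; a quick numerical check with arbitrarily chosen frequencies:

```python
import numpy as np

m = 5
rng = np.random.default_rng(3)
wk = np.sort(rng.uniform(0, np.pi, m))      # distinct spectral-line frequencies

def v_complex(k, w):
    l = np.delete(np.arange(m), k)
    return np.prod(np.abs((np.exp(1j * w) - np.exp(1j * wk[l])) /
                          (np.exp(1j * wk[k]) - np.exp(1j * wk[l]))) ** 2)

def v_sine(k, w):
    l = np.delete(np.arange(m), k)
    return np.prod(np.sin((w - wk[l]) / 2) ** 2 /
                   np.sin((wk[k] - wk[l]) / 2) ** 2)

for w in np.linspace(0.1, 3.0, 7):
    for k in range(m):
        assert np.isclose(v_complex(k, w), v_sine(k, w))
```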
As the following theorem shows, it is possible to rewrite (4.22) such that
the DFT-coefficients appear explicitly in the variance expression when
the number of non-zero DFT-coefficients equals the number of estimated
parameters.
Theorem 4.1
Let ϕ(t) and θ̂N be defined by (4.14) and (4.17), respectively, with z(t)
defined in (4.13) satisfying z(t + N ) = z(t) for t = −(n − 1), . . . , −1 such
that this signal can be written as (4.8). Furthermore, let the number of
non-zero spectral lines m equal the number of estimated parameters n.
Then it holds that the variance of the frequency function estimate (4.18)
is given by
Var ĜN (ejω ) = (λ◦ /N ) |F (ejω )|2 Σ_{k=1}^{m} vk (ω) / |Z̃k |2 .   (4.24)

Proof: See (Hjalmarsson and Ninness, 2004)
Corollary 4.1 (FIR-models) Let ϕ(t) and θ̂N be defined by (4.14) and
(4.17), respectively, with F (q) = 1 and u(t) satisfying u(t + N ) = u(t)
for t = −(n − 1), . . . , −1 such that this signal can be written as
u(t) = Σ_{k=1}^{m} Ũk e^{jωk t} ,   t = −(n − 1), . . . , N − 1.   (4.25)
Furthermore, let the number of non-zero spectral lines m equal the number of estimated parameters n. Then it holds that the variance of the
frequency function estimate (4.18) is given by
Var ĜN (ejω ) = (λ◦ /N ) Σ_{k=1}^{m} vk (ω) / |Ũk |2   (4.26)

where vk is defined by (4.23).
Proof: See (Hjalmarsson and Ninness, 2004)
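The equality between the T −1 form (4.22) and the line-wise form (4.24) when m = n can be checked numerically; the sketch below uses hypothetical conjugate-symmetric line frequencies (as required for a real z(t)) and F (q) = 1, so |Z̃k |2 are the design variables:

```python
import numpy as np

p = 2; n = 2 * p                              # m = n lines, in conjugate pairs
w_pos = np.array([0.5, 1.7])                  # hypothetical line frequencies
wk = np.concatenate([w_pos, 2 * np.pi - w_pos])
a = np.array([1.0, 0.7, 1.0, 0.7])            # a_k = |Z~_k|^2 (pairwise equal)

# T(r_zz) built from (4.9): r_zz(tau) = sum_k |Z~_k|^2 e^{j w_k tau}
r = lambda tau: np.sum(a * np.exp(1j * wk * tau))
T = np.array([[r(i - j) for j in range(n)] for i in range(n)])

def v(k, w):                                  # v_k(omega) from (4.23)
    l = np.delete(np.arange(n), k)
    return np.prod(np.abs((np.exp(1j * w) - np.exp(1j * wk[l])) /
                          (np.exp(1j * wk[k]) - np.exp(1j * wk[l]))) ** 2)

for w in [0.3, 1.1, 2.5]:
    Gam = np.exp(-1j * w * np.arange(1, n + 1))          # Gamma(e^{jw})
    lhs = (Gam.conj() @ np.linalg.solve(T, Gam)).real    # T^{-1} form, cf. (4.22)
    rhs = sum(v(k, w) / a[k] for k in range(n))          # line-wise form, cf. (4.24)
    assert np.isclose(lhs, rhs)
```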
The condition on the input sequence in Corollary 4.1 implies that the
input has to be tapered, i.e. u(t) = 0 for t = N − n + 1, . . . , N , for a
system that is started at the time of data collection.
We next state the variance expression when the input is periodic.
Corollary 4.2 (Periodic input) Let u(t) be periodic with Fourier representation (4.25) extended to hold also for t < −(n−1). Let ϕ(t) and θ̂N
be defined by (4.14) and (4.17). Furthermore, let the number of non-zero
spectral lines m equal the number of estimated parameters n. Then it
holds that the variance of the frequency function estimate (4.18) is given
by
Var ĜN (ejω ) = (λ◦ /N ) |F (ejω )|2 Σ_{k=1}^{m} vk (ω) / |F (ejωk )Ũk |2   (4.27)
where vk is defined by (4.23).
Proof: See (Hjalmarsson and Ninness, 2004)
4.4 Input Design
Suppose that the maximum input amplitude
umax = max_{−(n−1)≤t≤N −1} |u(t)|   (4.28)
is restricted, i.e. umax ≤ Cmax for some positive constant Cmax . It is then
natural to consider the input design problem where the time it takes to
satisfy some condition on the variance is minimized, i.e.
minimize_{u(t), t=−(n−1),...,N −1}  N   (4.29)
subject to  Var ĜN (ejω ) ≤ b(ω)   (4.30)
            umax ≤ Cmax
In view of (4.7), Lemma 4.1 shows that the input influences the accuracy of ĜN only through the squared magnitudes |Uk |2 of the spectral
lines. This is true when F = 1, i.e. for FIR models, when u(t) = u(t+N ),
t = −(n − 1), . . . , −1, and for the general case when the input is periodic.
The phases are immaterial exactly as in the asymptotic case where only
the input spectrum plays a role. Hence, it is natural to use αk = |Uk |2
as optimization variables.
The freedom in the choice of phases can be used to reduce the maximum amplitude of the input. Various techniques exist to find the phases
of Uk , k = 1, . . . , n such that umax is minimized given |Uk |, k = 1, . . . , n
(Pintelon and Schoukens, 2001). There is a strong connection between
the minimum and the root mean square (RMS) value
uRMS = ( (1/N ) Σ_{t=1}^{N} |u(t)|2 )^{1/2} = ( Σ_{k=0}^{N −1} |Uk |2 )^{1/2} .   (4.31)
The crest factor

Cr(u) = umax / uRMS   (4.32)

has a minimum that typically lies in the range 1 ≤ Cr(u) ≤ 2. Hence, the constraint umax ≤ Cmax can be replaced by the constraint uRMS ≤ CRMS for some positive constant CRMS .
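To illustrate the role of the phases, the sketch below compares a zero-phase multisine with one using Schroeder's phase rule φk = −πk(k − 1)/m (a standard low-crest choice for flat amplitudes; the excited bins and amplitudes here are illustrative assumptions), and also checks the Parseval relation (4.31):

```python
import numpy as np

N, m = 256, 32
amp = np.ones(m)                              # assumed flat amplitudes |U_k|
kl = np.arange(1, m + 1)                      # excited DFT bins (hypothetical choice)

def multisine(phases):
    t = np.arange(N)
    return sum(2 * A * np.cos(2 * np.pi * k * t / N + ph)
               for k, A, ph in zip(kl, amp, phases))

def crest(u):
    return np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2))

u0 = multisine(np.zeros(m))                   # all phases zero: large peak
us = multisine(-np.pi * kl * (kl - 1) / m)    # Schroeder phases: low crest factor

# Parseval check of (4.31): u_RMS^2 equals the sum of squared DFT magnitudes
U = np.fft.fft(us) / N
assert np.isclose(np.mean(us ** 2), np.sum(np.abs(U) ** 2))
assert crest(us) < crest(u0)                  # same RMS, much smaller u_max
```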
Thus a simpler, and more tractable, problem is to minimize the RMS value for a given sample size N :

minimize_{αk , k=0,...,N −1}  Σ_{k=0}^{N −1} αk   (4.33)
subject to  (λ◦ /N ) Γ∗ (ejω ) T −1 (rzz (τ )) Γ(ejω ) ≤ b(ω)/|F (ejω )|2   (4.34)
            αk = |Uk |2

where, referring to Lemma 4.1, the variance constraint (4.30) is captured by the constraint (4.34). The problem (4.33) is not tractable due to the inequality (4.34), which is non-convex in |Uk |. In the subsections that follow we will discuss two methods to convert this problem into a convex programming problem.
4.4.1 Geometric Programming Solution
Below we will show that a convex formulation of the problem (4.33)–(4.34) exists if the number of spectral lines is restricted to the number of estimated parameters. Therefore, let z(t) be given by (4.8) with m = n. Let the corresponding DFT-coefficients of u(t) be denoted by Ũk and assume that the input is such that Z̃k = F (ejωk )Ũk (i.e. u is periodic).
Define

fk (ω) = λ◦ vk (ω) / (N |F (ejωk )|2 ),   1 ≤ k ≤ n   (4.35)
α̃k = |Ũk |2 ,   1 ≤ k ≤ n   (4.36)

with vk defined by (4.23). From Theorem 4.1, we have that

Var ĜN (ejω ) = |F (ejω )|2 Σ_{k=1}^{n} fk (ω) α̃k^{−1} .   (4.37)
The input design problem can thus be written as

minimize_{α̃k , k=1,...,n}  Σ_{k=1}^{n} α̃k   (4.38)
subject to  Σ_{k=1}^{n} fk (ω) α̃k^{−1} ≤ b(ω)/|F (ejω )|2 ,   0 ≤ ω ≤ π   (4.39)
where the free variables are the n frequency components ωk , k = 1, . . . , n
and the n squared amplitudes α̃k of the non-zero spectral lines.
The constraint (4.39) is semi-infinite but can be approximated by a
discretization over the frequency axis.
For each fixed selection of the n non-zero spectral lines, (4.38) is
a geometric program in α̃k , k = 1, . . . , n. This problem can be
converted into a convex problem via the transformation ηk = log(α̃k )
and by taking the logarithm of the constraints (Boyd and Vandenberghe,
2003). This leads to the following convex problem
minimize_{ηk , k=1,...,n}  Σ_{k=1}^{n} e^{ηk}   (4.40)
subject to  log( Σ_{k=1}^{n} ( fk (ω̄l )|F (ej ω̄l )|2 / b(ω̄l ) ) e^{−ηk} ) ≤ 0,   l = 1, . . . , M
where M is the number of discretized frequencies ω̄l .
The convexity of (4.40) implies that the global optimum of this problem can be computed with arbitrary accuracy. Hence, with the frequencies of the non-zero spectral lines fixed, a suitable input can be determined.
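The constraint of (4.40) is a log-sum-exp function of η and hence convex; this can be sanity-checked numerically via the midpoint inequality, with arbitrary positive coefficients standing in for fk (ω̄l )|F (ej ω̄l )|2 /b(ω̄l ):

```python
import numpy as np

rng = np.random.default_rng(5)
c = rng.uniform(0.1, 2.0, 6)               # stands in for f_k(w)|F|^2 / b(w) > 0

def g(eta):                                 # constraint function of (4.40)
    return np.log(np.sum(c * np.exp(-eta)))

# Midpoint convexity: g((x+y)/2) <= (g(x)+g(y))/2 for random points
for _ in range(100):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    assert g((x + y) / 2) <= 0.5 * (g(x) + g(y)) + 1e-12
```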
Returning to the problem (4.38), we see that the solution of this
problem can be found by solving (4.40) for all selections of non-zero
spectral lines. Given the sample size N , there are (N choose n) possible selections of non-zero spectral lines. Hence, when N is large an exhaustive search over all combinations becomes time consuming. However, a reasonable starting point is to use evenly distributed spectral lines.
This is also supported by some observations regarding the sensitivity of
the variance with respect to the choice of spectral lines in the input made
in (Hjalmarsson and Ninness, 2004).
It is of interest to compare this approach to the frequency domain
method proposed in (Lee, 2003) which is based on FIR-modeling and
finite data of the same type as in this contribution. In (Lee, 2003) it
is conjectured that for FIR models the variance (4.1) can be written in
a very simple form when n of the spectral lines have to be non-zero,
i.e. exactly the same situation as considered in the present subsection.
Based on this simple expression, an explicit expression for the optimal
input spectrum is derived in (Lee, 2003). Comparing with the approach
above, a limitation in (Lee, 2003) is that the frequency behaviour outside
of the non-zero spectral lines cannot be accounted for in the design.
It can also be remarked that (Lee, 2003) proposes to optimize directly
over the input sequence. This leads to bilinear matrix inequalities which
are significantly harder to solve.
4.4.2 LMI Solution
Using Schur complements the constraint (4.34) can be written as
Λ(ω) = [ N b(ω)/(λ◦ |F (ejω )|2 )   Γ∗ (ejω ) ; Γ(ejω )   T (rzz (τ )) ] ≥ 0,   0 ≤ ω ≤ π.   (4.41)
When F (q) = 1 or the input is periodic it holds that the squared magnitudes of the DFT-coefficients of the input appear linearly in the Toeplitz matrix T (rzz (τ )), see (4.19), (4.9), (4.26) and (4.27). Thus the constraint (4.41) becomes convex but semi-infinite, since it has to hold for all frequencies between 0 and π. This problem can be handled by the KYP-lemma (Yakubovich, 1962), similar to the variance constraints in Section 3.6.
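The Schur-complement equivalence behind (4.41) is easy to verify numerically; in the sketch below a random positive definite matrix and a random complex vector stand in for T (rzz (τ )) and Γ(ejω ):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
M = rng.standard_normal((n, n)); T = M @ M.T + np.eye(n)   # stand-in for T(r_zz) > 0
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # stand-in for Gamma(e^{jw})
q = (g.conj() @ np.linalg.solve(T, g)).real                # Gamma* T^{-1} Gamma

def is_psd(b):
    # Block matrix [[b, Gamma*], [Gamma, T]] as in (4.41)
    Lam = np.block([[np.array([[b + 0j]]), g.conj()[None, :]],
                    [g[:, None], T.astype(complex)]])
    return np.min(np.linalg.eigvalsh(Lam)) >= -1e-9

# PSD exactly when b >= Gamma* T^{-1} Gamma, i.e. the variance constraint holds
assert is_psd(q + 0.1) and not is_psd(q - 0.1)
```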
Lemma 4.2
Assume there exists a rational transfer function, L(ejω ), such that
b(ω)/|F (ejω )|2 = L(ejω ) + L∗ (ejω ).
Let (A, B, C, D) be a controllable state-space representation of the positive real part of Λ(ω) in (4.41). Then the constraint
Λ(ω) ≥ 0,
0≤ω≤π
is equivalent to the LMI
[ Q − AT QA   C T − AT QB ; C − B T QA   −B T QB ] + [ 0   0 ; 0   D + DT ] ≥ 0   (4.42)
with Q = Q∗ and where the optimization variables |Uk |2 appear linearly
in D.
Proof: When b(ω)/|F (ejω )|2 can be represented by a spectrum, it is
possible to find a controllable state-space description of Λ(ω). Given this
realization it is a consequence of the KYP-lemma that the constraint
Figure 4.1: Variance Var ĜN (ejω ) for the design based on the geometric programming solution in Section 4.4.1. Solid line: Sample variance.
Dashed line: Theoretical expression (4.22). Dotted line: The variance
bound b(ω).
(4.41) is fulfilled for each frequency if and only if ∃Q = Q∗ that satisfies
(4.42).
This means that when there exists a rational transfer function, L(ejω ),
such that b(ω)/|F (ejω )|2 = L(ejω ) + L∗ (ejω ), the constraint (4.41) can
be replaced by a single LMI. Thus the input design problem (4.33)–
(4.34) can be solved exactly. With this approach there is no restriction
on the number m of non-zero DFT-coefficients (other than m ≥ n which
is required for identifiability).
4.4.3
Numerical Illustration
Suppose that an FIR model of order n = 40 is to be estimated using
N = 600 input/output samples. The problem is to design the input such
that the variance is below the dotted curve in Figure 4.1. We will do
Figure 4.2: Variance Var ĜN (ejω ) for the design based on the LMI
solution in Section 4.4.2. Solid line: Sample variance. Dashed line: Theoretical expression (4.22). Dotted line: The variance bound b(ω).
two different designs, one based on the geometric programming solution
corresponding to (4.40) and one based on the LMI solution suggested in
Section 4.4.2.
Choosing the spectral lines to be evenly spaced, 2π/n apart, and using the Matlab optimization toolbox command fmincon to
solve the problem (4.40) results in the amplitudes Ũk for the spectral lines
given in Figure 4.3. Figure 4.1 shows the variance. The result of Monte
Carlo simulations, based on 5000 noise realizations, and the theoretical
expression (4.22) are shown.
The advantage with the LMI solution is that it is possible to choose
m, the number of non-zero DFT-coefficients, to be larger than the order of the system n. This has been utilized in the LMI design where
m = 60. Figure 4.2 shows the variance, comparing the result of Monte
Carlo simulations, based on 5000 noise realizations, and the theoretical
expression (4.22). The optimal amplitudes are shown in Figure 4.3. The
spectral lines are evenly spaced, 2π/m apart. In Figure
Figure 4.3: Amplitudes |Ũk |: geometric programming design ('x') and LMI design ('o').
4.4 two different inputs with these amplitudes are shown. The dotted
curve corresponds to choosing the phase equal to zero for all components
in u whereas the solid line corresponds to random phases in the interval
[0, 2π). Clearly the crest factor, and therefore the maximal amplitude
umax , is very sensitive to the choice of phases.
The extra degree of freedom that the LMI design has compared to the geometric programming solution is beneficial in this example. The energy of the input based on the LMI design with m = 60 > n is about 21% less than for the solution based on m = n. For the optimization procedures that have been presented in this chapter, the excited frequencies are pre-specified, and they are in general non-optimal since we do not know the location of the optimal frequencies in advance. Thus, the more frequencies that are excited, the higher the probability that the solution is close to the optimal one. Since the sample size N is finite, the available frequencies are restricted to the finite set Ω = {2πk/N, k = 0, . . . , N − 1}.
Therefore, the optimal solution can be found by setting m = N but then
Figure 4.4: Two input signals that correspond to the amplitudes in
Figure 4.3 for the LMI solution. The solid line corresponds to a random
selection of phases. The dotted line corresponds to Ũk = |Ũk |.
computational complexity may become an issue. The input energy typically decays exponentially with m, which is illustrated in Figure 4.5.
Therefore, m can in most cases be chosen less than N .
4.4.4 Closed-form Solution
An interesting question is when there exists a solution containing n frequencies, where ntrue ≤ n < N , on the frequency grid Ω, where ntrue is the order of the true system and n is the order of the model. To partly answer this question, consider a frequency function b(ω) defined by
b(ω) = (λ◦ /N ) Σ_{k=1}^{n} vk (ω) / |Bk |2   (4.43)

where vk is defined by (4.23) and where ωk ≠ ωl for k ≠ l. Assume that
(4.43) is used as the frequency-by-frequency bound in the design problem
(4.33)-(4.34), then the following theorem applies.
Figure 4.5: Required energy of the input to identify an FIR system of
order 40 as a function of the number of excited frequencies (m). The frequencies are equidistantly distributed. The energy levels are normalized
by the obtained level for m = 40.
Theorem 4.2
Assume that the conditions in Lemma 4.1 hold. Further assume that
F (q) = 1 and that u(t) satisfies u(t + N ) = u(t) for t = −(n − 1), . . . , −1
such that this signal can be written as (4.25). When the variance bound
b(ω) is given by (4.43), the solution to the design problem (4.33)-(4.34) is given by |Uk |2 = |Bk |2 .
Proof: The variance bound (4.43) can, according to Corollary 4.1, be written

b(ω) = (λ◦ /N ) Γ∗ (ejω ) T −1 (rb (τ )) Γ(ejω ),   (4.44)

where rb (τ ) is given by (4.9). Insert (4.44) into (4.34), which gives the condition T (rzz (τ )) − T (rb (τ )) ≥ 0. The covariance function rzz (τ ) is defined by (4.9) with m = n and |Z̃k |2 = |Uk |2 = αk . Since the objective is to minimize Σ_{k=0}^{N −1} αk = rzz (0), the solution is given by T (rzz (τ )) = T (rb (τ )). Furthermore, since ωk ≠ ωl for k ≠ l in the bound (4.43), there is a one-to-one relation between the variables rb (τ ), τ = 0, . . . , n − 1 and the coefficients |Bk |2 , k = 0, . . . , n − 1. Since rzz (τ ), τ = 0, . . . , n − 1 and the DFT-coefficients |Uk |2 , k = 0, . . . , n − 1 are related in the same way, see (4.9), we have that |Uk |2 = |Bk |2 .
Theorem 4.2 shows that, for the bound defined by (4.43), there exists an optimal solution comprising n different frequencies. These frequencies coincide with the frequencies ωk that define the bound (4.43).
4.4.5 Input Design Based on Over-parametrized Models
Consider the case where the input design is based on the model order n, but the true order is ntrue < n. Then the input design will be conservative if the number of estimated parameters n̄ is less than n, provided n̄ ≥ ntrue . Let ĜN (ejω , n) denote the frequency response estimate where n parameters have been estimated.
Theorem 4.3
Suppose that z(t), t = −(n−1), . . . , N −1 is deterministic and defined by
(4.8), where only n spectral lines are non-zero. Suppose further that
the variance of ĜN (ejω , n) obeys Var ĜN (ejω , n) ≤ b(ω). Then when
n̄ ≤ n parameters are estimated it holds that Var ĜN (ejω , n̄) ≤ b(ω),
provided n̄ ≥ ntrue where ntrue is the order of the true system.
Proof: Lemma 4.1 states that

Var ĜN (ejω , n) = (λ◦ /N ) |F (ejω )|2 Γ∗ (ejω ) T −1 (rzz (τ )) Γ(ejω ).

Introduce b̄(ω) = N b(ω)/(λ◦ |F (ejω )|2 ). Then by partitioning of Γ and T we have the following equivalence

Var ĜN (ejω , n) ≤ b(ω)
⇔ [ Γ1 (ejω ) ; Γ2 (ejω ) ]∗ [ T11   T12 ; T12∗   T22 ]^{−1} [ Γ1 (ejω ) ; Γ2 (ejω ) ] ≤ b̄(ω)   (4.45)
By Schur complements (4.45) is equivalent to

[ b̄(ω)   Γ1 (ejω )∗   Γ2 (ejω )∗ ; Γ1 (ejω )   T11   T12 ; Γ2 (ejω )   T12∗   T22 ] ≥ 0
⇒ [ b̄(ω)   Γ1 (ejω )∗ ; Γ1 (ejω )   T11 ] ≥ 0
⇔ Γ1 (ejω )∗ T11^{−1} Γ1 (ejω ) ≤ b̄(ω).

The last line will, according to Lemma 4.1, correspond to Var ĜN (ejω , n̄) ≤ b(ω) when Γ1 (q) = [q −1 , . . . , q −n̄ ] and n̄ ≥ ntrue . Hence Var ĜN (ejω , n̄) ≤ b(ω) when Var ĜN (ejω , n) ≤ b(ω).
Theorem 4.3 is illustrated in Figure 4.6.
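The submatrix step in the proof (a principal submatrix of a positive semidefinite matrix is positive semidefinite) can be checked numerically; random matrices stand in for T and Γ in the sketch below:

```python
import numpy as np

rng = np.random.default_rng(7)
n, nbar = 6, 3
M = rng.standard_normal((n, n)); T = M @ M.T + 0.1 * np.eye(n)  # stand-in for T > 0
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)        # stand-in for Gamma

full = (g.conj() @ np.linalg.solve(T, g)).real                  # Gamma* T^{-1} Gamma
sub = (g[:nbar].conj() @ np.linalg.solve(T[:nbar, :nbar], g[:nbar])).real

# Estimating fewer parameters can only decrease this variance-type quantity
assert sub <= full + 1e-9
```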
4.5 Conclusions
In this chapter we have considered input design for system identification
problems with finite sample size. The objective of the input design has
been to minimize the root mean square of the input sequence subject
to a frequency wise constraint on the variance of the frequency function
estimate. Two solutions, that perform the optimization over the square
of the DFT coefficients of the input, have been presented.
The first solution is based on an explicit variance expression. This
leads to a convex geometric programming problem which has the main
restriction that the number of non-zero DFT coefficients must equal the
number of estimated parameters.
The second solution is based on the covariance matrix for the parameter estimates. This solution allows for an arbitrary number of DFT coefficients in the optimization. Furthermore, this method can handle frequency-by-frequency constraints, in contrast to the geometric programming solution, which has to discretize the frequency axis.
Figure 4.6: Variance Var ĜN (ejω ) for the design based on the LMI solution in Section 4.4.2, where the design is based on n = 40 but only 20
parameters have been estimated. Solid line: Sample variance. Dashed
line: Theoretical expression (4.22). Dotted line: The variance bound
b(ω).
Chapter 5
Applications
The objective of this chapter is to quantify the benefits of optimal input design, compared to the use of standard identification input signals such as PRBS signals, for some common and important application areas of system identification. Two benchmark problems, taken from process control and control of flexible mechanical structures, are considered.
We present results both when the design is based on knowledge of the
true system (in general the optimal design depends on the system itself)
and for a practical two-step procedure where an initial model estimate
is used in the design instead of the true system. The results show that
there is a substantial reduction in experiment time and input excitation
level. A discussion on the sensitivity of the optimal input design to model
estimates is provided.
This chapter is organized as follows. Section 5.1 gives an introduction
together with some technical assumptions including the optimal input
design setup. The optimal input design is applied on a process plant
in Section 5.2 and on a resonant system in Section 5.3. The chapter is
concluded in Section 5.4.
5.1 Introduction
Many industrial processes have (very) slow responses, which leads to long and expensive identification experiments (Ogunnaike, 1996). It is thus important to design the experiments carefully so as to maximize the information content. Another area where input design can be crucial is when
identifying flexible mechanical structures. Here, time is not crucial but
the experiments are usually severely constrained in order to not damage
equipment.
Input design has a long history and (Zhu, 2001; Rivera et al., 2003;
Lee, 2003; Jacobsen, 1994) are some recent contributions related to process control. Typical design problems correspond to non-convex programs
and, hence, computational aspects have limited the applicability of optimal input design. One purpose with this chapter is to examine more
closely what the framework of Chapter 3 has to offer for the aforementioned application areas. More precisely, the objective is twofold:
I. The first aspect is to quantify possible benefits of optimal input
design for the two applications. The use of input signals with optimal frequency distribution will be compared to the use of standard
identification input signals e.g. PRBS signals. The benefits will
be quantified in terms of saved experiment time and in possible
reduction of input excitation.
Since process modelling may be very time consuming we will in this chapter illustrate possible time savings by using an optimal strategy
for the considered process model. Here the time it takes to obtain a
certain quality of the model is measured and compared for different
inputs when the input energy is held constant.
For the mechanical system, the experiment time is in many cases
not an issue. Instead, we will study possible savings in the level of
input excitation with an optimal strategy.
II. The second aspect is to highlight some robustness issues regarding
the input design. Optimal input designs in general depend on the
unknown true system. This is typically circumvented by replacing
the true system with some estimate in the design procedure. But
there exist very few hard results on the sensitivity and robustness
of these optimal designs with respect to uncertainties in the
model estimate used in the design.
Here we will illustrate situations where input designs are very sensitive
to model errors entering the design. Furthermore, we will
redo the comparison in (I), but in a more realistic setting where the
optimal design philosophy is replaced by a two-step procedure. The
approach taken is inspired by the work in (Lindqvist and Hjalmarsson,
2001) and (Lindqvist, 2001). In the first step an initial model
is estimated based on a PRBS input. An input design based on this
estimate is then applied to the process. This adaptive approach is
compared to an approach which only uses PRBS as input. The
comparison is based on Monte-Carlo simulations in order to study
the average gain in performance.
Usually the comparison between different input signals is made in terms
of confidence bounds, see e.g. (Gevers and Ljung, 1986), (Forssell and
Ljung, 2000) and (Shirt et al., 1994). For industrial applications,
however, more relevant measures are perhaps excitation level and
experiment time, treated e.g. in (Bombois et al., 2004d) and (Rivera et
al., 2003).
We will assume that the systems obey the discrete-time linear relation

y(t) = Go(q)u(t) + Ho(q)e(t)    (5.1)

where Go and Ho represent the system and the noise dynamics, respectively,
with q being the delay operator. Furthermore, y is the output,
u is the input and e is white noise. For the modelling of (5.1) we will
consider identification within the prediction error framework as described
in Chapter 2. We will assume that we have a full-order model structure
and hence only variance errors occur. Therefore, for large data lengths,
the model error can be characterized by some function of the parameter
covariance P. To define a proper quality function one has to take the
intended use of the model into account. Here we will consider control
design and we will adopt the following quality measure
|∆(θ)| ≤ γ   ∀ω, ∀θ ∈ Uθ    (5.2)

Uθ = {θ : N (θ − θo)^T P^{-1}(Φu) (θ − θo) ≤ χ}    (5.3)

where

∆(θ) = T (Go − G(θ)) / G(θ).    (5.4)
The constraint (5.2) implies that ‖∆‖∞ ≤ γ for all models in the
confidence region (5.3). Based on the quality measure (5.2), we will pose
the input design problem as
the input design problem as
minimize
α
subject to
|∆(θ)| ≤ γ ∀ ω
N (θ − θo )T P −1 (Φu )(θ − θo ) ≤ χ
π
1
2π −π Φu (ω)dω ≤ α
0 ≤ Φu (ω) ≤ β(ω)
Φu
(5.5)
118
5 Applications
The objective of this input design problem is to find the input spectrum
with the least energy that satisfies (5.2). There may also be a
frequency-by-frequency constraint on the input spectrum, here represented
by β(ω). The input design problem is a non-convex and non-trivial
optimization problem, but by applying the theory in Chapter 3 it is
possible to obtain a finite-dimensional convex program. To obtain the
convex program we will mainly work with the parametrization

Φu(ω) = Σ_{k=0}^{M−1} ck cos(kω).    (5.6)

This parametrization is rather flexible. For example, both power and
frequency-by-frequency constraints on the input spectrum can be handled,
which will be shown in the examples. Furthermore, any spectrum
can be approximated to any demanded accuracy provided that the order
M is sufficiently large. However, when M becomes too large, computational
complexity becomes an issue. This parametrization was originally
introduced in (Lindqvist and Hjalmarsson, 2001). Notice that (5.6)
corresponds to white noise when M = 1.
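To make the parametrization concrete, the sketch below (plain Python; the coefficient values are hypothetical, not taken from the thesis) evaluates a spectrum of the form (5.6) on a frequency grid, checks the frequency-wise constraint Φu(ω) ≥ 0, and confirms numerically that the input power (1/2π)∫Φu dω equals c0, since every cos(kω) term with k ≥ 1 integrates to zero over a full period.

```python
import math

def spectrum(c, w):
    # Phi_u(w) = sum_{k=0}^{M-1} c_k cos(k*w), cf. (5.6)
    return sum(ck * math.cos(k * w) for k, ck in enumerate(c))

c = [1.0, 0.8, 0.3]                   # hypothetical coefficients, M = 3
dw = math.pi / 500
grid = [i * dw for i in range(501)]   # [0, pi]; Phi_u is even in w
vals = [spectrum(c, w) for w in grid]

nonneg = min(vals) >= 0.0             # frequency-wise check 0 <= Phi_u(w)
# (1/2pi) * integral over [-pi, pi] = (1/pi) * integral over [0, pi]
power = sum(vals[:-1]) * dw / math.pi # left Riemann sum, approx c_0
```

For M = 1 the spectrum is constant over frequency, i.e. white noise, in line with the remark above.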
The parameters in (5.5) are specified such that |∆| has to be less than
0.1 for at least 95% of the estimated models. Hence γ = 0.1, and the size
of the confidence region χ is determined such that Pr(χ²(n) ≤ χ) = 0.95,
where n denotes the number of parameters in the model¹. For example,
χ = 9.49 when n = 4. The frequency weighting T in (3.19) will be a
discretization of T̃ or T̃², where

T̃(s) = ω0² / (s² + 2ξω0 s + ω0²)    (5.7)

with the damping ξ = 0.7 and where ω0 will be used to change the
bandwidth of T̃.
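The threshold χ can be checked by simulation. The sketch below (standard-library Python; the sample size and seed are arbitrary choices made here) draws sums of n = 4 squared standard Gaussians, i.e. χ²(4) variates, and reads off the empirical 95% quantile, which should land near the value χ = 9.49 quoted above.

```python
import random

random.seed(0)
n, n_draws = 4, 200_000
# a chi^2(n) variate is the sum of n squared independent N(0,1) variables
draws = sorted(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
               for _ in range(n_draws))
chi = draws[int(0.95 * n_draws)]   # empirical 95% quantile
```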
5.2 A Process Control Application
The main objective of this section is to give a flavor of the usefulness of
optimal input design for identification of models for process control
design. The process plant is defined by (5.1) with the ARX structure

Go(q) = B(q)/A(q),   Ho(q) = 1/A(q)    (5.8)

¹ χ²(n) denotes the χ²-distribution with n degrees of freedom.
where

A(q) = 1 − 1.511 q^{-1} + 0.5488 q^{-2}   and
B(q) = 0.02059 q^{-1} + 0.01686 q^{-2}.

The sampling time is 10 seconds and e has variance 0.01. This is a
slight modification of a typical process control application considered
in (Skogestad, 2003). The process has a rise time of 227 seconds, and
consequently, collecting data samples for the identification takes a long
time. Therefore the objective of using optimal input design for this plant
is to keep the experiment time to a minimum.
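As a sanity check on these numbers, the difference-equation form of (5.8), A(q)y(t) = B(q)u(t) + e(t), can be simulated directly. The sketch below (plain Python; the noise-free unit-step input is an assumption made here for illustration) shows the slow response: the output settles at the static gain B(1)/A(1) ≈ 0.99.

```python
# A(q) y(t) = B(q) u(t) + e(t) with the coefficients given above
def simulate(u, e):
    y = [0.0, 0.0]
    for t in range(2, len(u)):
        y.append(1.511 * y[t - 1] - 0.5488 * y[t - 2]
                 + 0.02059 * u[t - 1] + 0.01686 * u[t - 2] + e[t])
    return y

n = 200                                 # 200 samples = 2000 s at Ts = 10 s
y = simulate([1.0] * n, [0.0] * n)      # noise-free unit step
gain = (0.02059 + 0.01686) / (1.0 - 1.511 + 0.5488)  # B(1)/A(1) ~ 0.9907
```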
5.2.1 Optimal Design Compared to White Input Signals
Here we will compare the experiment time required for an optimal input
design to achieve a certain quality constraint with the corresponding time
required for a white noise input. The optimal input design will be based
on knowledge of the true system. In reality, of course, this is not a feasible
solution since the true system is unknown. However, the motivation for
this analysis is to investigate what could in the best case be achieved with
optimal input design.
The optimal design is based on a data length of Nopt = 500. Furthermore,
T is a discretization of T̃² in (5.7) and there is no frequency-by-frequency
bound β(ω) on Φu. In the comparison we have normalized the
power of the white input to equal the power of the obtained optimal
input. The data length of the white input has then been increased
until 95% of the obtained models satisfy |∆| ≤ 0.1. This data length
is denoted Nwhite, and the ratio Nwhite/Nopt is plotted in Figure 5.1 for
different bandwidths of T. This example shows that the white input
requires about 10 times more data to satisfy the quality constraint, which
is a quite substantial amount of data. In other words, the optimal
experiment takes less than one and a half hours, compared to almost 14
hours with a white input. The input spectra corresponding to a high
bandwidth (0.0324 rad/s) of T are shown in Figure 5.2.
The optimal spectrum has a very clear low-pass character. Since the
quality measure penalizes the relative model error harder in the low
frequency range, this agrees well with common intuition: excite
the frequencies where you want small errors. Instead of the white input,
a signal with most of its energy in the frequency band of the weighting
function T would perhaps have been a more appropriate choice in this
example.

Figure 5.1: The ratio Nwhite/Nopt as a function of the bandwidth ωB of T for the process plant.

One example is depicted in Figure 5.2, which corresponds to
the spectrum of a 3rd order Butterworth filter, where the cutoff frequency
equals the bandwidth of T. This design approximately achieves the
specified quality constraint and its energy is 1.7 times that of the
optimal design. Even though this is larger than the optimal, it is still
significantly less than the energy of the white input.
5.2.2 Optimal Input Design in Practice
To handle the more realistic situation where the true system is unknown,
we will replace the optimal design strategy by a two-step procedure. In
the first step an initial model is estimated based on a PRBS input².
This model estimate is used as a replacement for the true system in the
design problem (5.5). The obtained sub-optimal solution is then applied
to the process in the second step. This adaptive approach is compared
to an approach which only uses PRBS as input. The two strategies are
illustrated in Figure 5.3. The main objective is to investigate whether
there are any benefits of using a sub-optimal design approach or not.

² PRBS is a periodic, deterministic signal with white-noise-like properties.
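For reference, a PRBS of this kind can be generated with a linear feedback shift register. The sketch below (plain Python; the register length and tap positions are illustrative choices, not the settings used in the thesis) produces a maximum-length ±1 sequence with period 2^5 − 1 = 31.

```python
def prbs(length, n=5, taps=(5, 3)):
    # Fibonacci LFSR; taps (5, 3) give a maximum-length sequence,
    # so the state cycles through all 2^5 - 1 nonzero values
    state = [1] * n
    seq = []
    for _ in range(length):
        seq.append(1.0 if state[-1] else -1.0)   # map bit -> +/- amplitude
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return seq

s = prbs(62)
periodic = s[:31] == s[31:]                      # full period is 31
```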
Figure 5.2: The process plant, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum. Dotted line: low-pass approximation of optimal spectrum.
The comparison is based on Monte-Carlo simulations in order to study
the average gain in performance.
First consider the two-step adaptive input design approach. We use
a PRBS with length Ninit = 300 and amplitude 3 to estimate an initial
model Gm of the true system. This model is used for input design based
on (5.5), with no frequency-by-frequency upper bound on the input
spectrum and experiment length Nopt = 500. This strategy is compared
to the approach where a single set of PRBS data is used in each
Monte-Carlo run. For the comparison's sake, the amplitude of the
PRBS is tuned so that the signal has the same input power as the average
power of the input in the two-step approach. After 1000 Monte-Carlo
simulations with different noise realizations, 98.3% of the models
obtained with the two-step procedure satisfy ‖∆‖∞ ≤ 0.1. With an
experiment length of N = 3600, 96.4% of the models satisfy the
constraint for the PRBS
approach.

Figure 5.3: Two strategies for input design. (a) The ad hoc solution: a PRBS is used as input signal for the entire identification experiment. (b) Adaptive input design: the input data set is split in two parts. The first part, a PRBS, is used for identification of an initial model estimate. The second part is an input optimally designed with respect to the initial model.
One realization of the input sequence for each strategy is plotted
versus time in hours in Figure 5.4. We clearly see that the experiment
time when input design is involved is less than 2 hours and 15 minutes,
but more than 10 hours for the PRBS input. We conclude that for
the considered quality constraint, the experiment time can be shortened
substantially even when the sub-optimal design is used.
5.3 A Mechanical System Application
In this section, input design is applied to a resonant mechanical system.
The system is represented by a slightly modified version of the
half-load flexible transmission system proposed in (Landau et al., 1995b)
as a benchmark problem for control design. It has been used for input
design illustrations in (Bombois et al., 2004b). The system is defined by
the ARX structure (5.8) with

A(q) = 1 − 1.99185 q^{-1} + 2.20265 q^{-2} − 1.84083 q^{-3} + 0.89413 q^{-4}   and
B(q) = 0.10276 q^{-3} + 0.18123 q^{-4}.

Furthermore, e is white noise with variance 0.01. The sampling time is
Ts = 0.05 seconds.
Figure 5.4: The process plant. Above: the input sequence not involving optimal input design. Below: the input sequence when involving optimal input design. The first part of the signal is used to identify an initial model estimate.
The experiment time is not an issue for this system. Therefore, the
objective of the design is to obtain an input that, for a given data length,
has as low an excitation level as possible.
5.3.1 Optimal Design Compared to White Input Signals
The optimal design will be based on the true system, as in the example in
Section 5.2.1. Here T is a discretization of T̃ and the data length is 500
for the optimal design as well as for the white input. The reason is that
we will compare excitation levels rather than experiment times.
First, consider the optimal input design when there is no upper bound
imposed on the input spectrum. The input power αopt for the optimal
input is plotted versus the bandwidth of T in Figure 5.5. It is clear that
the input excitation increases with increasing bandwidth of T. This has
to do with the definition of ∆. When the bandwidth increases, the
relative error around the first resonance peak starts to dominate ‖∆‖∞
and more
input power has to be injected. The optimal spectra for low and high
bandwidth, respectively, are plotted in Figure 5.6 and Figure 5.7. Here
we see that the input power is concentrated around the first resonance
peak for high bandwidths.
Let αwhite be the required input power for a white input to achieve
the specified model quality. The ratio αopt/αwhite is plotted versus the
bandwidth of T in Figure 5.5. We can conclude that, when the total
power is compared, there are certainly benefits to using an optimal
strategy compared to the white input, especially for high bandwidths,
where the white input requires about ten times the energy required for
the optimal design. This is due to the large impact of the first resonance
peak and the capability of the optimal design to distribute much energy
around this peak and less at other frequencies. However, in practice it
may be unacceptable to concentrate the input power around the
resonance peak.
The presented optimal input design method can handle frequency-wise
conditions on the input spectrum, see β(ω) in (5.5). This possibility
is now used. A frequency-wise upper bound that restricts the possibility
to inject energy around the first resonance peak is included in the design
problem (5.5). The upper bound and the obtained spectrum are shown
in Figure 5.8. This is a good illustration of the impact of the first
resonance peak on the required input power. With this restriction we
need about 14 times more energy than without the bound. So there is a
delicate trade-off between demanding a small relative model error around
the resonance peak, the possibility to excite this frequency band and the
required total input power. It is worth noticing that the suggested
framework allows the designer to control this.
5.3.2 Input Design in a Practical Situation
In this section, the parameters of the true system are assumed to be
unknown. As for the process application we will handle this situation
by replacing the optimal design procedure by an adaptive two-step approach. For the mechanical system, the two-step adaptive approach goes
as follows. A PRBS with length Ninit = 300 and amplitude 0.5 is used
for the estimation of the initial model Gm . An input is designed based
on Gm using (5.5) with Nopt = 500. This strategy is compared with
the approach using a single set of PRBS data of length 800, i.e. the data
lengths are equalized. The signal power of the PRBS is set to 6 times that
of the two-stage sub-optimal input in each Monte-Carlo run. One
realization of the input sequence for each strategy is plotted versus time
in Figure 5.9.

Figure 5.5: The mechanical system. The input power αopt and the ratio αopt/αwhite as functions of the bandwidth ωB of T.

For 1000 Monte-Carlo simulations, 92.3% of the obtained
models from the two-step procedure passed the constraint ‖∆‖∞ ≤ 0.1.
The corresponding figure for the PRBS approach was 91.8%. Thus the
input excitation for the PRBS approach needs to be about 6 times the
optimal input power to produce equally good models. We conclude that
for the given quality constraint, the excitation level of the input signal
can be reduced significantly using the illustrated sub-optimal input design.
5.3.3 Sensitivity of the Optimal Design
It was illustrated in the previous section that the input design performed
well even when the true system was replaced by an estimated model.
However, some caution is warranted. In this section we will illustrate,
by means of an example, that the optimal input design method may be
sensitive to the quality of the initial model.
Let us again consider the mechanical system. Assume that the true
system is unknown, but that we have a preliminary model Gm that deviates from the true system Go . The magnitude plots of Gm and Go are
shown in Figure 5.10. Now assume that the true system is replaced with
Gm in the design problem (5.5). The resulting sub-optimal input
spectrum is plotted in Figure 5.10 together with the optimal input
spectrum based on Go. The bandwidth of T is 8 rad/s.

Figure 5.6: The mechanical system, low bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum.
We see from Figure 5.10 that the input power is distributed differently
for the sub-optimal design compared to the optimal one. However, the
energy is in both cases concentrated around the first resonance peak.
This is completely in line with the observations in Section 5.3.1 where
it was recognized that it is effective to inject energy around the first
resonance peak for high bandwidths of T . However, an input design that
concentrates the power around a resonance peak may be very vulnerable
with respect to bad model estimates of this peak. The model Gm is
one such example. For example, out of 100 Monte-Carlo identification
experiments, only 23 of the obtained models with the sub-optimal design
achieve the quality constraint ‖∆‖∞ < 0.1. Optimally it should be at
least 95%.

Figure 5.7: The mechanical system, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum.

We conclude that it is important that the initial experiment
captures the resonance peaks of importance. The reason why the
sub-optimal method in Section 5.3.2 performed well is probably that the
initial experiment with the PRBS signal excited these peaks, yielding
proper initial model estimates. A way to make the solution less sensitive
to this peak is to restrict the flexibility of the input spectrum, i.e. to
reduce M. Another solution is to include constraints on the input
spectrum in terms of frequency-wise bounds or a frequency weighting on
the input power, see (3.57). See also the related discussion in Section 3.10.
5.4 Conclusions
Figure 5.8: The mechanical system, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: upper bound on Φu.

In this chapter we have illustrated and quantified possible benefits of
optimal input design in identification for two applications. We have
compared optimally designed input signals with white input signals. The
results show significant benefits of appropriate input design. Either the
experiment time can be shortened or the input power can be reduced.
Through Monte-Carlo simulations it is illustrated that there are
advantages also in the case where the true system is replaced by a model
estimate in the design.
Here we have only touched upon adaptive input design. For practical
implementations of optimal input design, adaptation is typically required.
There are, though, only a few contributions related to adaptive input
design, see e.g. (Hjalmarsson et al., 1996), (Lindqvist and Hjalmarsson,
2001) and (Lindqvist, 2001). More contributions are certainly welcome.
Another important issue is the sensitivity of the input design with
respect to the underlying system properties, since the optimal design
typically depends on them. Such knowledge can be used to make the
input more robust with respect to uncertainties in the underlying system.
For this analysis, the variance expressions (2.25) may provide some
insights.

Figure 5.9: The mechanical system. Above: the input sequence not involving optimal input design. Below: the input sequence when involving optimal input design. The first part of the signal is used to identify an initial model estimate. The second part is the optimal input signal.
Figure 5.10: The mechanical system. Thick solid line: the optimal input spectrum. Thin solid line: the sub-optimal input spectrum (see text). Dashed line: weight function T. Dotted line: the model. Dash-dotted line: the true system.
Chapter 6

Input Design for Identification of Zeros
In this chapter we will consider input design for accurate identification
of non-minimum phase zeros in linear systems. Recently, several variance
results regarding the estimation of non-minimum phase zeros have been
presented, see (Lindqvist, 2001), (Hjalmarsson and Lindqvist, 2002) and
(Mårtensson and Hjalmarsson, 2003). Based on these results, we will
show how to design the input with the least energy content required
to keep the variance of an estimated zero below a certain limit. Both
analytical and numerical results are presented. A striking feature of the
analytical results is that the variance of an estimated zero is independent
of the model order when the optimal input is applied.
We will also quantify the benefits of using the optimal design compared
to using a white input signal or a square wave. Robustness issues
will also be covered in this chapter. The optimal design depends on the
location of the true unknown zero and is therefore infeasible. This is
typically circumvented by replacing the true zero by an estimate. The
sensitivity of the solution to this estimate is investigated.
The chapter is organized as follows. Section 6.2 contains the system
assumptions and the identification framework used. Asymptotic variance
expressions for an estimated zero are also given. Based on these variance
expressions, the optimal input design problem is formulated, and both
analytical and numerical solutions to this problem are presented in
Section 6.3 and Section 6.4. Sensitivity and benefits of optimal input
design for identification of zeros are discussed and illustrated in Section
6.5. Section 6.6 studies the connections between models of restricted
complexity and variance-optimal input design. The chapter is concluded
in Section 6.7.
6.1 Introduction
A model is often used in control design for both analysis and synthesis
purposes. Consequently, system identification with focus on control
design has been a very active research area. The overall objective of
identification for control is to deliver models suitable for control design.
See (Gevers, 1993), (Van den Hof and Schrama, 1995) and
(Hjalmarsson, 2004) for overviews of the area.
For scalar linear systems, the model should be accurate in the frequency
bands important for the control design, and it is generally acknowledged
that the region around the cross-over frequency of the loop gain is
of particular importance. Since the loop gain depends on the controller
yet to be designed, the cross-over frequency is in general unknown.
However, for systems that contain performance limitations, e.g.
non-minimum phase zeros and time delays, the achievable bandwidth is
restricted. For example, a single real continuous-time non-minimum
phase zero at z restricts the achievable bandwidth to approximately z/2
(Skogestad and Postlethwaite, 1996).
Knowledge of a non-minimum phase zero is therefore very useful, since
it gives valuable information about what control specifications can be
defined and a hint on a reasonable choice of excitation. Furthermore, it
is also valuable in the identification step, since knowing an important
frequency range simplifies the choice of model structure, model order,
noise model and pre-filters.
Spurred by this observation, expressions for the variance of an estimated
non-minimum phase zero have been derived in (Lindqvist, 2001) for FIR
models and in (Hjalmarsson and Lindqvist, 2002) for ARX models. This
work is generalized to general linear single-input/single-output
(SISO) model structures in (Mårtensson and Hjalmarsson, 2003). A key
result in these contributions is that estimated non-minimum phase
zeros are not subject to the usual increase in variance when the model
order is increased.
Based on these variance results, we will in this contribution consider
input design for accurate identification of non-minimum phase zeros. The
input design problem is formulated as an optimization problem where the
objective is to minimize the input effort required to keep the variance of
the non-minimum phase zero below a certain limit.
6.2 Estimation of Parameters and Zeros
We will assume that the true system is linear and contains a non-minimum
phase zero. For identification of this system we will use the prediction
error framework presented in Chapter 2. For the reader’s convenience we
will repeat some of the basics in this framework. We will also introduce
variance expressions for the estimated zeros.
6.2.1 Parameter Estimation
The model of our single input/single output system is defined by

y(t) = G(q, θ)u(t) + H(q, θ)e(t)    (6.1)

where G and H are parameterized by the real-valued parameter vector θ.
Furthermore, y is the output, u is the input and e is zero mean white
noise with variance λ. It is assumed that G and H have the rational
forms

G(q, θ) = q^{-nk} B(q, θ) / A(q, θ),   H(q, θ) = C(q, θ) / D(q, θ)    (6.2)

where

A(q, θ) = 1 + a1 q^{-1} + · · · + a_{na} q^{-na}    (6.3)
B(q, θ) = b1 + b2 q^{-1} + · · · + b_{nb} q^{-nb+1}    (6.4)
C(q, θ) = 1 + c1 q^{-1} + · · · + c_{nc} q^{-nc}    (6.5)
D(q, θ) = 1 + d1 q^{-1} + · · · + d_{nd} q^{-nd}    (6.6)

with q being the delay operator. We will assume that there exists a
description of the true system within the model class, defined by θ = θo
and λ = λo. The one-step-ahead predictor for the model (6.1) is

ŷ(t, θ) = H^{-1}(q, θ)G(q, θ)u(t) + [1 − H^{-1}(q, θ)] y(t)    (6.7)

and the prediction error is

ε(t, θ) = H^{-1}(q, θ) (y(t) − G(q, θ)u(t)).    (6.8)
The parameters are estimated with the prediction error method, using a
least-squares criterion on the prediction errors. The parameter estimate
is

θ̂N = arg min_θ (1/(2N)) Σ_{t=1}^{N} ε²(t, θ),    (6.9)

where N denotes the number of data used for the estimation. Under
mild assumptions the parameter estimate has an asymptotic distribution
(Ljung, 1999) that obeys

√N (θ̂N − θo) → N(0, P) as N → ∞,  where
P = lim_{N→∞} N E(θ̂N − θo)(θ̂N − θo)^T,
P(θo) = λo (E[ψ(t, θo) ψ^T(t, θo)])^{-1},
ψ(t, θo) = ∂ŷ(t, θ)/∂θ |_{θ=θo}.    (6.10)
Here N(0, P) denotes the normal distribution with zero mean and
covariance P. Using (6.7) we obtain

ψ(t, θo) = Fu(q, θo)u(t) + Fe(q, θo)eo(t)    (6.11)

where

Fu(q, θ) = (1/H(q, θ)) ∂G(q, θ)/∂θ    (6.12)

and

Fe(q, θ) = (1/H(q, θ)) ∂H(q, θ)/∂θ.    (6.13)

Under the assumption of open-loop operation, i.e. that u and e are
uncorrelated, we can write

P^{-1}(θo) = (1/2π) ∫_{−π}^{π} Fu(θo) Φu Fu^*(θo) dω + Ro(θo)    (6.14)

where Φu is the spectrum of the input and where

Ro(θo) = (λo/2π) ∫_{−π}^{π} Fe(θo) Fe^*(θo) dω.    (6.15)

The connection between the asymptotic covariance and the input
spectrum in (6.14) will be further exploited for input design for
identification of zeros. But first we will review some results regarding
the accuracy of identified zeros.
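The integral (6.14) can be evaluated numerically on a frequency grid. The sketch below (plain Python; the FIR example with nb = 2, H = 1 and the spectrum Φu(ω) = 1 + cos ω is a hypothetical choice for illustration) does so; for an FIR model Fu reduces to Γb(e^{jω}), and the result is the Toeplitz matrix of the input covariances, here r0 = 1 and r1 = 1/2.

```python
import cmath
import math

K = 1024                                  # uniform frequency grid on [-pi, pi)
ws = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]

def phi_u(w):
    return 1.0 + math.cos(w)              # input spectrum, c = [1, 1] in (5.6)

nb = 2
P_inv = [[0.0] * nb for _ in range(nb)]
for w in ws:
    g = [cmath.exp(-1j * w * k) for k in range(nb)]   # Gamma_b(e^{jw})
    for i in range(nb):
        for j in range(nb):
            # Riemann-sum approximation of (1/2pi) * integral of
            # Gamma_b Gamma_b^* Phi_u dw, cf. (6.14) with H = 1
            P_inv[i][j] += (g[i] * g[j].conjugate() * phi_u(w)).real / K
# P_inv is now approx [[1.0, 0.5], [0.5, 1.0]]
```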
6.2.2 Estimation of Zeros
Consider identification of a system defined by (6.1) and (6.2). Let
θb^T = [b1, ..., b_{nb}] and introduce the polynomial

p(z, θb) = b1 z^{nb−1} + b2 z^{nb−2} + · · · + b_{nb}.    (6.16)

A zero zi(θ) of the system (6.1) is defined by p(zi(θ), θb) = 0.
Now we consider one particular zero, z(θ), which is assumed to have
multiplicity one. Introduce the notation

ẑ = z(θ̂N),   zo = z(θo),
B̃(q, θ) = B(q, θ) / (1 − z(θ) q^{-1})   and   F(z) = F(z(θ), θ),

where F corresponds to a general transfer function evaluated at q = z(θ).
Further, introduce the vector

Γb(q) = [1  q^{-1}  · · ·  q^{-nb+1}]^T.    (6.17)

In (Lindqvist, 2001) it is established that the variance of an estimated
zero is

lim_{N→∞} N E(ẑ − zo)² = (λo |zo|² / |B̃(zo)|²) Γb^*(zo) Pb Γb(zo)    (6.18)

where Pb is the asymptotic covariance matrix of θb.
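For the mechanical system of Section 5.3, where B(q) = 0.10276 q^{-3} + 0.18123 q^{-4}, the polynomial (6.16) has degree one and its zero can be read off directly. A minimal sketch:

```python
b = [0.10276, 0.18123]             # (b1, b2) for the mechanical system, nb = 2
# p(z, theta_b) = b1*z + b2, cf. (6.16), so the single zero is
z = -b[1] / b[0]                   # approx -1.7636
non_minimum_phase = abs(z) > 1.0   # zero outside the unit circle
```

The zero lies outside the unit circle, so the mechanical system is indeed non-minimum phase.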
If we consider non-minimum phase zeros and increase the model order
we can simplify the expression (6.18). Let u(t) = Q(q)v(t), where v(t) is
a white noise sequence with variance 1 and Q(q) is a minimum phase
filter. Then, according to (Mårtensson and Hjalmarsson, 2003), for
general linear SISO systems we have

lim_{nb→∞} lim_{N→∞} N E(ẑ − zo)² = λo |zo|² |H(zo)|² |A(zo)|² / ((1 − |zo|^{-2}) |B̃(zo)|² |Q(zo)|²).    (6.19)
6.3 Input Design - Analytical Results
In this section we will use the variance expressions (6.18) and (6.19) in
order to determine suitable inputs for accurate identification of zeros.
The input design will be formulated as an optimization problem where
we seek the input spectrum with the least energy that keeps the variance
below a certain limit. This can be stated as follows:

minimize_{Φu}  (1/2π) ∫_{−π}^{π} Φu(ω) dω
subject to  Var ẑ ≤ γ.    (6.20)

The choice of optimization variable is natural because, asymptotically
in N, the only quantity that can be used to shape the variance is the
spectrum of the input, cf. (6.14), (6.18) and (6.19).
6.3.1 Input Design for Finite Model Orders
The first step to solve (6.20) is to rewrite the original problem
formulation into a convex program with respect to the input spectrum.
The objective function is already convex but the constraint is not. Let

Γb0 = [Γb^T  0]^T    (6.21)

and

α² = λo |zo|² / (N |B̃(zo)|²).    (6.22)

The variance constraint in (6.20), using (6.18), now becomes

γ/α² − Γb0^*(zo) P Γb0(zo) ≥ 0,    (6.23)

which by Corollary 3.1 is equivalent to

P^{-1} − (α²/γ) Γb0 Γb0^* ≥ 0.    (6.24)

Since the inverse of the covariance matrix is affine in Φu, the constraint
(6.24) is convex with respect to Φu. Thus, the convex formulation of
(6.20) is

minimize_{Φu}  (1/2π) ∫_{−π}^{π} Φu(ω) dω
subject to  P^{-1} − (α²/γ) Γb0 Γb0^* ≥ 0.    (6.25)
This means that if (6.25) is feasible it has a globally optimal solution.
The program (6.25) is convex but in general infinite-dimensional, which
calls for special care when undertaking the optimization. But as will be
shown in Section 6.4, by imposing certain parameterizations of the input
spectrum it is possible to reformulate (6.25) as a finite-dimensional
convex optimization problem. Today, there exist several numerical
optimization routines that solve such problems to any demanded
accuracy. But first we will show that it is possible to derive analytical
solutions to (6.25) for FIR and for ARX model structures.
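Before turning to the analytical solutions, the semidefinite constraint (6.24) can be checked numerically for a candidate covariance sequence. The sketch below (plain Python; the values zo = 2, nb = 3 and α²/γ = 1 are illustrative assumptions) tests positive semidefiniteness of P⁻¹ − (α²/γ)ΓΓᵀ by symmetric Gaussian elimination, for the FIR case where P⁻¹ is the Toeplitz matrix of the input covariances. The covariance sequence rm = zo^{-m} of Theorem 6.1 below passes, while a white input of the same power r0 fails, i.e. white noise needs strictly more energy.

```python
def is_psd(A, tol=1e-9):
    # symmetric Gaussian elimination (LDL^T pivots): PSD iff every pivot
    # is >= 0 and a zero pivot has only zeros below it in its column
    A = [row[:] for row in A]
    n = len(A)
    for k in range(n):
        if A[k][k] < -tol:
            return False
        if A[k][k] <= tol:
            if any(abs(A[i][k]) > tol for i in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j] / A[k][k]
    return True

zo, nb = 2.0, 3                        # real non-minimum phase zero, FIR order
v = [zo ** -k for k in range(nb)]      # Gamma_b(zo) = [1, 1/2, 1/4]

def constraint_matrix(r):
    # P^{-1} - (alpha^2/gamma) v v^T with alpha^2/gamma = 1, cf. (6.24)
    return [[r[abs(i - j)] - v[i] * v[j] for j in range(nb)] for i in range(nb)]

optimal = constraint_matrix([zo ** -m for m in range(nb)])  # r_m = zo^{-m}
white = constraint_matrix([1.0, 0.0, 0.0])                  # white, same r0
# is_psd(optimal) -> True, is_psd(white) -> False
```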
Theorem 6.1
Consider the FIR system

y(t) = q^{-nk} B(q, θb)u(t) + e(t).    (6.26)

For a non-minimum phase zero zo, the input design problem (6.25) is
solved by filtering unit variance white noise through the first order AR
filter

Q(q) = (α/√γ) √(1 − zo^{-2}) / (1 − zo^{-1} q^{-1}),    (6.27)

i.e. by placing a pole in zo^{-1}. The minimal required input energy is
α²/γ.

For a minimum phase zero zo, the input design problem (6.25) is
solved by filtering unit variance white noise through the first order AR
filter

Q(q) = (α / (√γ |zo|^{nb−1})) √(1 − zo²) / (1 − zo q^{-1}),    (6.28)

i.e. by placing a pole in zo. The minimal required input energy is
α² / (γ |zo|^{2(nb−1)}).
Proof: With the covariance function

    rk = (1/2π) ∫_{−π}^{π} Φu(ω) e^{jωk} dω                         (6.29)

the energy of the input can be expressed as

    (1/2π) ∫_{−π}^{π} Φu(ω) dω = r0.                                (6.30)

For notational convenience, let the variable n̄b be defined as n̄b = nb − 1. Then the asymptotic covariance matrix becomes

    P⁻¹ = [ r0   ···  rn̄b ]
          [  ⋮    ⋱    ⋮  ]  = Ru.                                  (6.31)
          [ rn̄b  ···  r0  ]
In this case the constraint in (6.25) becomes

    [ r0   ···  rn̄b ]           [ 1          ···  zo^{−n̄b}  ]
    [  ⋮    ⋱    ⋮  ]  − (α²/γ) [ ⋮           ⋱      ⋮      ]  ≥ 0.     (6.32)
    [ rn̄b  ···  r0  ]           [ zo^{−n̄b}  ···  zo^{−2n̄b} ]
To satisfy (6.32) we need that r0 ≥ α²/γ. If we can find a covariance function rm with r0 = α²/γ that satisfies (6.32) we have a solution. In the following we prove that the covariance function

    rm = (α²/γ) zo^{−m}                                             (6.33)

is such a solution when zo is a non-minimum phase zero. First we note that this particular choice of rm gives Ru ≥ 0 and that the first row and column of (6.32) are zero. Now we need to show that
    [ 1           ···  zo^{1−n̄b} ]   [ zo^{−1}  ]
    [ ⋮            ⋱      ⋮      ] − [    ⋮     ] [ zo^{−1}  ···  zo^{−n̄b} ]  ≥ 0.     (6.34)
    [ zo^{1−n̄b}  ···      1      ]   [ zo^{−n̄b} ]

Using Schur complements this is equivalent to

    (α²/γ) [ 1          ···  zo^{−n̄b} ]
           [ ⋮           ⋱      ⋮     ]  = Ru ≥ 0,                  (6.35)
           [ zo^{−n̄b}  ···      1     ]
which is true as noted before. A signal with the covariance function (6.33)
can be generated by filtering unit variance white noise with the first order
AR-filter (6.27). This proves the first part of Theorem 6.1.
Minimum phase zeros can be treated in the same way. We first note that (6.32) requires that

    r0 ≥ α² zo^{−2n̄b} / γ.                                          (6.36)

The problem (6.20) is solved by letting the input signal have the covariance function

    rm = (α²/γ) zo^{m−2n̄b},                                         (6.37)

which is obtained by filtering white noise with the filter

    Q(q) = (α/(√γ |zo|^{nb−1})) · √(1 − zo²) / (1 − zo q^{−1}).     (6.38)

This concludes the proof of Theorem 6.1.
Remark: The filter (6.27) is constructed such that the variance of the
estimated zero will be γ. Thus, the variance constraint in (6.20) is tight.
Notice that the optimal filter is independent of the model order. From this it is easy to conclude that the variance of the estimated non-minimum phase zero will also be independent of the model order¹ when optimal input design is used. This is a very interesting fact: it means that there is no loss in accuracy by using an over-parametrized model when optimal input design is applied, cf. Example 3.15.
Remark: The optimal filter for minimum phase zeros (6.28) depends on the model order. Hence, the order independence of the variance that holds for non-minimum phase zeros when optimal input design is applied does not hold for minimum phase zeros.
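As a numerical sanity check of Theorem 6.1, the sketch below builds the Toeplitz matrix of the candidate covariances (6.33) and verifies that the constraint (6.32) is positive semidefinite and tight. The values of zo, n̄b, α² and γ are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Sanity check of Theorem 6.1 (zo, nbar, alpha2 and gamma are illustrative).
zo, gamma, alpha2 = 1.5, 0.01, 1.0     # non-minimum phase zero, |zo| > 1
nbar = 4                               # nbar = nb - 1
idx = np.arange(nbar + 1)
# candidate covariances (6.33): r_m = (alpha^2/gamma) * zo^{-m}
Ru = (alpha2 / gamma) * zo ** (-np.abs(idx[:, None] - idx[None, :]).astype(float))
Gam = zo ** (-idx.astype(float))       # Gamma_b evaluated at zo
M = Ru - (alpha2 / gamma) * np.outer(Gam, Gam)   # left-hand side of (6.32)
print(np.linalg.eigvalsh(M).min() > -1e-8)   # True: (6.32) holds
print(abs(M[0, 0]) < 1e-12)                  # True: first entry zero, so tight
```

Increasing nbar leaves the check unchanged, which is the order-independence noted in the remark above.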
The result in Theorem 6.1 will now be generalized to ARX model
structures.
Theorem 6.2
Consider the ARX-system

    y(t) = q^{−nk} (B(q, θb)/A(q, θa)) u(t) + (1/A(q, θa)) e(t).    (6.39)
For a non-minimum phase zero, the input design problem (6.25) has the
same solution as for a FIR-system, see Theorem 6.1.
¹ The model order must be equal to or greater than the true system order.
Proof: For the ARX-system (6.39) we have

    Fu(q) = q^{−nk} [ Γb(q) ; −(B(q)/A(q)) Γa(q) ],   Fe(q) = [ 0 ; −(1/A(q)) Γa(q) ].     (6.40)

Let G = B/A. Now we have that

    P⁻¹ = (1/2π) ∫_{−π}^{π} [ ΓbΓb*       −G* ΓbΓa*  ]            [ 0   0  ]
                            [ −G ΓaΓb*   |G|² ΓaΓa*  ] Φu dω  +  [ 0   R0 ]

        = [ Ru    Ruy ]
          [ Ryu   Ry  ].                                            (6.41)

The constraint in (6.25) becomes

    [ Ru    Ruy ]           [ ΓbΓb*   0 ]
    [ Ryu   Ry  ] − (α²/γ)  [ 0       0 ]  ≥ 0.                     (6.42)
The upper left corner of (6.42) is the same as (6.32). In the following we
prove that the solution to the FIR-case, (6.33), also satisfies (6.42). We
start by examining Ryu in detail:
    Ryu = −(1/2π) ∫_{−π}^{π} ΓaΓb* G Φu dω
        = −(1/2π) ∫_{−π}^{π} ΓaΓb* (B̃(1 − zo e^{−jω})/A) · (α²(1 − zo^{−2})/γ) / |1 − zo^{−1}e^{−jω}|² dω     (6.43)
        = −(1/2π) ∫_{−π}^{π} ΓaΓb* (B̃/A) · (α²(1 − zo^{−2})/γ) / (1 − zo e^{jω}) dω.

Let

    (B̃/A) α²(1 − zo^{−2})/γ = Σ_{κ=0}^{∞} fκ e^{−jωκ}               (6.44)

and

    1/(1 − zo e^{jω}) = −e^{−jω} zo^{−1}/(1 − zo^{−1}e^{−jω}) = −e^{−jω} Σ_{τ=0}^{∞} gτ e^{−jωτ}.     (6.45)
This gives us

    Ryu[m, n] = (1/2π) ∫_{−π}^{π} e^{jω(n−m−2)} Σ_{κ=0}^{∞} fκ e^{−jωκ} Σ_{τ=0}^{∞} gτ e^{−jωτ} dω = 0   for m ≥ n − 1,     (6.46)

i.e. Ryu vanishes on and below its first superdiagonal; in particular its first column is zero:

    Ryu = [ 0   0   ∗  ···  ∗ ]
          [ ⋮   ⋮   ⋱   ⋱   ⋮ ]                                     (6.47)
          [ 0   0   ···  0  0 ]
Now we look at the constraint (6.42). Here we also need r0 ≥ α²/γ. With the covariance function

    rk = (α²/γ) zo^{−k}                                             (6.48)

the first row and column of (6.42) are zero and we need

    [ (α²/γ) Tn̄b   R̄uy ]
    [ R̄yu          Ry  ]  −  (α²/γ) v v^T  ≥ 0,                     (6.49)

where Tn̄b is the n̄b × n̄b Toeplitz matrix with first row (1, zo^{−1}, ..., zo^{1−n̄b}), v = (zo^{−1}, ..., zo^{−n̄b}, 0, ..., 0)^T, and R̄uy, R̄yu denote Ruy, Ryu with their first row respectively column deleted. This is by Schur complements equivalent to

    [ Ru    Ruy ]
    [ Ryu   Ry  ]  ≥ 0,                                             (6.50)
which is true. This proves that the covariance function (6.48) solves the
input design problem (6.20) for the ARX-system (6.39).
Remark: As for FIR models, the variance of the estimated zeros will
be independent of the model order when we use optimal input design.
Furthermore, the solution in Theorem 6.2 makes the variance constraint tight with a filter that is independent of the A-polynomial.
Hence, it is easy to conclude that the variance of the zero is independent
of the A-polynomial as well. However, it is important to estimate the
A-polynomial for the asymptotic properties (2.12) to hold.
6.3.2 Input Design for High-order Systems

For general linear SISO models it is possible to derive an analytical solution of (6.20) based on the asymptotic variance expression (6.19).
Theorem 6.3
The input design problem (6.20), where the variance of a non-minimum phase zero is defined by (6.19), is solved by filtering unit variance white noise with the first order AR-filter

    Q(q) = (α|H(zo)A(zo)|/√γ) · √(1 − zo^{−2}) / (1 − zo^{−1} q^{−1}).     (6.51)
Proof: Introduce

    A = [a0  ···  aη]^T,    Γ = [1  zo^{−1}  ···  zo^{−η}]^T

and let the input filter Q(q) be the FIR-filter

    Qη(zo) = A^T Γ;                                                 (6.52)

then the input energy is

    (1/2π) ∫_{−π}^{π} |Qη(e^{jω})|² dω = Σ_{i=0}^{η} ai² = A^T A.   (6.53)

With

    αh² = α² |H(zo) A(zo)|² / γ

the input design problem (6.20) can be stated

    minimize over A:   A^T A
    subject to:        A^T Γ Γ^T A ≥ αh² / (1 − |zo|^{−2}).         (6.54)
(6.54)
The minimum is achieved when A is an eigenvector corresponding to the largest eigenvalue of ΓΓ^T. The matrix ΓΓ^T has rank one and the only non-zero eigenvalue is Γ^TΓ with the eigenvector Γ, giving the optimal solution

    A = αh / (√(1 − zo^{−2}) Γ^T Γ) · Γ.                            (6.55)

The optimal FIR-filter is

    Qη(q) = αh Σ_{i=0}^{η} zo^{−i} q^{−i} / ( √(1 − zo^{−2}) Σ_{i=0}^{η} zo^{−2i} )     (6.56)

and if we let the order of the FIR-filter go to infinity we get the optimal filter

    Q(q) = lim_{η→∞} Qη(q) = αh √(1 − zo^{−2}) / (1 − zo^{−1} q^{−1}).     (6.57)
This proves Theorem 6.3.
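The rank-one eigenvalue argument in the proof is easy to confirm numerically; in this sketch zo and η are arbitrary illustrative values.

```python
import numpy as np

# The matrix Gamma Gamma^T from (6.54) has rank one; its only non-zero
# eigenvalue is Gamma^T Gamma, with eigenvector Gamma (zo, eta illustrative).
zo, eta = 1.5, 6
Gam = zo ** (-np.arange(eta + 1).astype(float))
M = np.outer(Gam, Gam)
w, V = np.linalg.eigh(M)                      # eigenvalues in ascending order
print(np.isclose(w[-1], Gam @ Gam))           # True: largest eigenvalue
v = V[:, -1]
print(np.allclose(np.abs(v), np.abs(Gam) / np.linalg.norm(Gam)))   # True
```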
Notice that the optimal filter coincides with (6.27) for FIR and ARX models. This is completely in line with the observation that the optimal filter for any finite model order is actually given by (6.27) for these model structures. The solution for other model structures is in principle the same, i.e. a pole placed in zo^{−1}, when the model order is sufficiently large. The only difference is the gain of the filters.
6.3.3 Realization of Optimal Inputs
So far we have realized the optimal input by filtering unit variance white noise through a specific first order AR-filter. This is by no means the only way to realize the optimal input. In fact, all inputs with a spectrum that coincides with that of the optimal first order AR-filter are optimal. This follows from the asymptotic theory. For all the cases presented in Theorems 6.1-6.3, the optimal input is characterized by auto-correlations of the form rm = β η^{−|m|}. Apart from filtering white noise, such inputs can also be realized by, e.g., binary signals, see (Söderström and Stoica, 1989; Tulleken, 1990).
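The exponentially decaying correlation structure of the AR-filter realization can be checked from the impulse response; the sketch below uses an illustrative pole a = zo^{−1} with zo = 1.5.

```python
import numpy as np

# Autocorrelation of unit variance white noise filtered by the first order
# AR-filter 1/(1 - a q^{-1}); a = 1/zo for an illustrative zo = 1.5.
a = 1.0 / 1.5
h = a ** np.arange(2000).astype(float)        # impulse response (truncated)
# r_m = sum_i h_i h_{i+m}: autocorrelation of the filter output
r = np.array([np.dot(h[: len(h) - m], h[m:]) for m in range(5)])
beta = 1.0 / (1.0 - a * a)
# matches r_m = beta * a^{|m|}, i.e. the form r_m = beta * eta^{-|m|} above
print(np.allclose(r, beta * a ** np.arange(5).astype(float)))   # True
```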
6.4 Input Design - A Numerical Solution
We have so far presented analytical solutions to (6.25) for FIR and ARX
model structures and for general linear model structures if we let the
model order tend to ∞. Here we will show how to solve (6.25) numerically
for a Box-Jenkins model structure defined by (6.1)-(6.6). The key is to
rewrite (6.25) into a finite-dimensional convex program which indeed can
be obtained by a suitable parametrization of the input spectrum. For an
overview of different parameterizations of the input spectrum, we refer to
Chapter 3.2. Here we will use a partial correlation parametrization of the
input spectrum, which was introduced in (Stoica and Söderström, 1982),
see also Example 3.9. Define L and {lk } as
    L(e^{jω}, θ) = |C(e^{jω}, θ)|² |A(e^{jω}, θ)|⁴ = Σ_{k=−nl}^{nl} l_{|k|} e^{−jωk}     (6.58)

where nl = 2na + nc − 1. Furthermore, introduce the auto-correlations ck defined by

    ck = (1/2π) ∫_{−π}^{π} (Φu(ω)/L(e^{jω}, θo)) e^{jωk} dω         (6.59)

and let np = na + nb + nd − 1.
Lemma 6.1
Let L(e^{jω}, θ) be defined by (6.58). Furthermore assume that the polynomials A, B, C and D in the Box-Jenkins model are coprime. Then there exist matrices Mk ∈ R^{n×n} such that the inverse covariance matrix P⁻¹ defined by (6.14) can be expressed as

    P⁻¹(θo) = Σ_{k=−np}^{np} c_{|k|}(θo) Mk(θo) + Ro(θo).           (6.60)
Proof: See (Stoica and Söderström, 1982) or Example 3.9.
With this particular parametrization it is possible to express the input
power as a linear function.
Lemma 6.2
The power of the input u(t) with power spectrum Φu(ω) can be expressed as

    (1/2π) ∫_{−π}^{π} Φu(ω) dω = Σ_{k=−nl}^{nl} c_{|k|} l_{|k|}.    (6.61)
Proof: See (Stoica and Söderström, 1982) or Example 3.11.
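Lemma 6.2 can be verified numerically for a small example; the polynomials and the spectrum below are illustrative choices, not the thesis's example. The coefficients lk and ck are computed on a frequency grid and the two sides of (6.61) compared.

```python
import numpy as np

# Numerical check of (6.61) for an illustrative case: C = 1, A = 1 - 0.5 q^{-1},
# so L = |A|^4 is a trigonometric polynomial with lags up to 2.
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
avg = lambda f: f.mean()                      # (1/2pi) * integral over [-pi, pi]
A = 1.0 - 0.5 * np.exp(-1j * w)
L = np.abs(A) ** 4
Phi = 1.0 + 0.8 * np.cos(w)                   # some valid input spectrum >= 0
lk = [avg(L * np.cos(k * w)) for k in range(3)]          # l_0, l_1, l_2
ck = [avg(Phi / L * np.cos(k * w)) for k in range(3)]    # c_0, c_1, c_2
power = avg(Phi)                              # left-hand side of (6.61)
rhs = ck[0] * lk[0] + 2 * sum(ck[k] * lk[k] for k in (1, 2))
print(abs(power - rhs) < 1e-8)                # True
```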
Now it is possible to rewrite the original input design formulation
(6.25).
Theorem 6.4
Under the assumptions stated in Lemma 6.1 and Lemma 6.2, the input design problem (6.25) is equivalent to the following finite-dimensional convex program

    minimize over c0, ..., cm:   Σ_{k=−nl}^{nl} c_{|k|} l_{|k|}

    subject to:   Σ_{k=−np}^{np} c_{|k|} Mk + Ro(θo) − (α²/γ) Γb0 Γb0* ≥ 0     (6.62)

                  [ c0     c1      ···  cm     ]
                  [ c1     c0      ···  cm−1   ]
                  [  ⋮      ⋮       ⋱    ⋮     ]  ≥ 0
                  [ cm     cm−1    ···  c0     ]

where m = max(nl, np).
Proof: Direct application of the results in Lemma 6.1 and Lemma 6.2 to (6.25). The constraint on the Toeplitz matrix in (6.62) assures that the optimization variables c0, ..., cm are indeed auto-correlations of a quasi-stationary process.
The input design problem (6.62) is now convex and finite-dimensional
and there are several efficient numerical optimization methods that solve
such problems. Let us illustrate the results of this section.
[Figure: magnitude (dB) versus frequency (rad/s)]
Figure 6.1: Optimal spectra for nb = 2 (solid) and for nb = 3 (dashed).
6.4.1 Numerical Example
We will assume that the dynamics of the system are defined by the continuous time system

    Gc(s) = (1 − s) / ((s + 1)(2s + 1)),                            (6.63)

i.e. there is a continuous time non-minimum phase zero in 1. With a zero-order hold discretization with sampling time Ts = 0.25 s this corresponds to the discrete non-minimum phase zero zd = 1.29. Furthermore, we will assume that the input/output relation is defined by an output-error (OE) model structure, the data length is N = 500 and λo = 0.1. When the model order equals the true system order, i.e. the order is two, the solution to (6.62) is basically a sum of two sinusoids. When the order of the B-polynomial, nb, is increased, the solution coincides with the first order AR-filter defined in (6.51). This is illustrated in Figure 6.1, where the optimal spectra for nb = 2 and for nb = 3 are shown. Notice that there is a quite dramatic difference between the optimal spectra for different model orders.
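The discrete zero zd ≈ 1.29 can be reproduced with a small zero-order hold computation. This is a sketch using a state-space realization of (6.63); for the 2×2 case the adjugate identity adj(zI − M) = zI + M − tr(M)I gives the numerator of the discrete transfer function directly.

```python
import numpy as np

# Zero-order hold discretization of Gc(s) = (1-s)/((s+1)(2s+1)) with Ts = 0.25,
# via a controllable canonical state-space realization.
A = np.array([[-1.5, -0.5], [1.0, 0.0]])
B = np.array([1.0, 0.0])
C = np.array([-0.5, 0.5])        # Gc = (-0.5 s + 0.5)/(s^2 + 1.5 s + 0.5)
Ts = 0.25
lam, V = np.linalg.eig(A)        # eigenvalues -0.5 and -1 (distinct)
Ad = np.real(V @ np.diag(np.exp(lam * Ts)) @ np.linalg.inv(V))   # expm(A*Ts)
Bd = np.linalg.solve(A, (Ad - np.eye(2)) @ B)
# numerator of C (zI - Ad)^{-1} Bd is (C Bd) z + C (Ad - tr(Ad) I) Bd for 2x2
zd = -(C @ (Ad - np.trace(Ad) * np.eye(2)) @ Bd) / (C @ Bd)
print(round(zd, 2))   # 1.29
```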
6.5 Sensitivity and Benefits
Recapitulate the asymptotic variance expression

    Var ẑ ≜ lim_{nb→∞} lim_{N→∞} N E(ẑN − zo)² = λo |zo|² |H(zo)|² |A(zo)|² / ((1 − |zo|^{−2}) |B̃(zo)|² |Q(zo)|²)     (6.64)

that was introduced in (6.19). Based on (6.64) we will in this section try to quantify possible benefits of using an optimal or a sub-optimal design instead of a white input. The obtained variance levels will be compared with the input power normalized to one for all the designs. We will also study how the location of the zero affects the result.
In the first comparison, the optimal input filter with unit power, i.e.

    Qopt(q, zo) = √(1 − zo^{−2}) / (1 − zo^{−1} q^{−1}),            (6.65)

is compared with Q = 1. From (6.64) we have that

    Var ẑ(Qopt) / Var ẑ(Q = 1) = 1/|Qopt(zo)|² = 1 − zo^{−2}.       (6.66)
The thick solid line in Figure 6.2 corresponds to 1 − zo^{−2} as a function of the zero location. Thus there is a substantial decrease in variance for zeros close to the unit circle when the optimal input design is used instead of a white input. This comparison also indicates that when the zero is located far from the unit circle (|zo| ≳ 4), there is no benefit in using optimal input design; a white input performs almost as well as the optimal design. One interpretation of this relates to the location of the discrete zero with respect to the sampling time. Consider the continuous system (6.63), which for Ts = 0.25 has a discrete zero in 1.29. If the sampling time is increased the discrete zero will move away from the unit circle, and hence the effect of the non-minimum phase zero will, e.g., be less visible in the discrete measurements of a step response. Consequently, the benefits of optimal input design are reduced.
In a practical situation the location of the true zero is unknown and an estimate of the zero may be used for input design. Given the optimal filter (6.65) and an estimate of the zero, ẑ, a natural choice of input filter is

    Qapp(q, ẑ) = √(1 − ẑ^{−2}) / (1 − ẑ^{−1} q^{−1}).               (6.67)
[Figure: variance reduction versus zero location]
Figure 6.2: The thick solid line represents the optimal variance reduction as a function of the zero location, see (6.66). The dashed lines correspond to (6.68) and illustrate the variance reduction with a sub-optimal design.
A reasonable question is how the uncertainty in the zero location will affect the estimation accuracy. This is also illustrated in Figure 6.2. The dashed lines correspond to the ratio

    Var ẑ(Qapp) / Var ẑ(Q = 1) = (1 − ẑ^{−1} zo^{−1})² / (1 − ẑ^{−2})     (6.68)

as a function of ẑ for four different locations of the true zero (corresponding to the circles in the figure). These curves show that there is a quite large tolerance with respect to the estimated zero location.
Table 6.1: Comparison of the variance for an estimated non-minimum phase zero.

    Model order    PRBS      Qopt      Qapp      Square-wave
    2              0.0022    0.0011    0.0012    0.0017
    5              0.0027    0.0011    0.0012    0.0023

6.5.1 Numerical Example
Let the dynamics of the system be defined by the continuous system in Section 6.4.1. The sampling time and the data length are the same as in Section 6.4.1, but we will assume that the true system is of ARX type with a noise variance of 0.0025.
Now we will compare, by means of an example, the obtained accuracy when using four different types of inputs. The first input is a Pseudo-Random Binary Signal (PRBS), which has white-noise-like properties. The second input is the optimal one and hence the optimal input filter is given by (6.27). We know from (6.66) that the optimal gain in accuracy when using the optimal input compared to a white input is approximately a factor 2.5 when the model order tends to infinity. These two input signals will be compared to a sub-optimal input given by (6.67) with the zero estimate ẑ = 1.6, and a square-wave signal that is constant for 10 s before switching level. This square-wave signal, which takes the values ±1, is constructed such that the typical dip of the step response for a system with a non-minimum phase zero is clearly visible. The power of all inputs is normalized to one. We have used a model structure of order two (the true order) and one of order five, i.e. an over-parametrization. The result of 10000 Monte-Carlo simulations is given in Table 6.1. The theoretical value of the variance for the optimal input is, asymptotically in data, 0.0010, independently of the model order provided it is larger than the true system order. The Monte-Carlo simulations confirm this well: the variance of the optimal input remains constant when the order is increased, and the variance is close to the asymptotic value. Also the sub-optimal input performs well. From
(6.64) we have that

    Var ẑ(Qopt) / Var ẑ(Qapp) = |Qapp(zo)|² / |Qopt(zo)|² = (1 − zo^{−2})(1 − ẑ^{−2}) / (1 − (ẑ zo)^{−1})².     (6.69)
With zo = 1.29 and ẑ = 1.6, (6.69) equals 0.915, which can be compared
with 0.0011/0.0012 ≈ 0.917 obtained from the Monte-Carlo simulations.
For model order two, the variance reduction using the optimal input
instead of the square-wave is about 1.5. When the order is increased to
five, this factor increases to 2. The corresponding figures for the PRBS
input are 2 and 2.5, respectively.
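The factor 2.5 and the value 0.915 follow directly from (6.66) and (6.69); the quick computation below uses only the numbers stated above.

```python
# Variance ratios from (6.66) and (6.69) for zo = 1.29, zhat = 1.6.
zo, zhat = 1.29, 1.6
opt_gain = 1.0 / (1.0 - zo ** -2)   # Var(white)/Var(Qopt), cf. (6.66)
ratio = (1.0 - zo ** -2) * (1.0 - zhat ** -2) / (1.0 - 1.0 / (zhat * zo)) ** 2
print(round(opt_gain, 1))   # 2.5
print(round(ratio, 3))      # 0.915
```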
6.6 Using Restricted Complexity Models for Identification of Zeros
The optimal correlation sequence of the input for accurate identification of zeros for FIR or ARX systems is given by rk = zo^{−|k|}, where zo denotes a unique non-minimum phase zero, see Theorem 6.1 and Theorem 6.2. This also holds for general linear models asymptotically in the order of the numerator polynomial, see Theorem 6.3. These results are derived under the assumptions of full or high order modelling. Now we will study the use of models of lower order than the true system.
Assume that the true system is linear and described by

    y(t) = Σ_{k=1}^{n} gk u(t − k) + v(t).                          (6.70)

Let the model be the linear regression

    y(t) = ϕ^T(t) θ + e(t) = [ u(t − 1)  ···  u(t − m) ] [ b1 ; ⋮ ; bm ] + e(t)     (6.71)
where e represents white noise. Thus the one-step ahead output prediction is
    ŷ(t) = ϕ^T(t) θ.                                                (6.72)

The least-squares estimate is obtained as

    θ̂N = ( Σ_{t=1}^{N} ϕ(t) ϕ^T(t) )^{−1} Σ_{t=1}^{N} ϕ(t) y(t).    (6.73)
In Example 3.15, we saw that we obtained a consistent estimate of the static gain when we used optimal input design, independently of the model order. Now, we will study θ̂N when N tends to infinity and check the consistency.
Theorem 6.5
Let the true system be described by (6.70) and assume that the true system has one non-minimum phase zero zo. Furthermore, assume that u and v are statistically independent and that u is a stationary stochastic process with auto-correlations E u(t)u(t − k) = zo^{−|k|}. Then the least-squares estimate (6.73), based on the model (6.71) with 1 ≤ m < n, gives a consistent estimate of zo.
Proof: Notice that the true system is a linear regression in gk. Use this and insert (6.70) into (6.73). Since u and v are independent the limit estimate θ* becomes

    θ* = [ r0       ···  r_{m−1} ]⁻¹ [ r0       r1   ···  r_{n−1} ]
         [  ⋮        ⋱     ⋮     ]   [ r1       r0   ···  r_{n−2} ]  θo     (6.74)
         [ r_{m−1}  ···  r0      ]   [  ⋮            ⋱      ⋮     ]

where θo^T = [g1, g2, ..., gn]. Now insert the correlations rk = zo^{−|k|} into (6.74). This gives
    θ* = [ 1   0   ···  0   0        0        ··· ]
         [ 0   1   ···  0   0        0        ··· ]
         [ ⋮            ⋱                         ]  θo.            (6.75)
         [ 0   ···  0   1   zo^{−1}  zo^{−2}  ··· ]

A zero of the limit model is defined by b1 x^{m−1} + ··· + bm = 0, which by (6.75) is equivalent to

    g1 x^{m−1} + ··· + g_{m−1} x + gm + g_{m+1} zo^{−1} + ··· + gn zo^{m−n} = 0.     (6.76)

Thus x = zo is a solution to (6.76).
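Theorem 6.5 can be checked numerically: the limit estimate (6.74) is computed for an illustrative true system with a zero at zo (the coefficients and orders below are arbitrary choices), and the low-order model polynomial is then evaluated at zo.

```python
import numpy as np

# Consistency check of Theorem 6.5 (illustrative system; zo, n, m arbitrary).
zo, n, m = 1.4, 6, 3
p = np.array([1.0, 0.7, -0.3, 2.0, 0.5])        # arbitrary coefficients
g = np.convolve(np.array([1.0, -zo]), p)        # true impulse response, zero at zo
r = lambda k: zo ** (-abs(float(k)))            # optimal correlations r_k = zo^{-|k|}
Rm = np.array([[r(i - j) for j in range(m)] for i in range(m)])
Rc = np.array([[r(i - j) for j in range(n)] for i in range(m)])
b = np.linalg.solve(Rm, Rc @ g)                 # limit estimate, cf. (6.74)
val = sum(b[k] * zo ** (m - 1 - k) for k in range(m))   # b-polynomial at x = zo
print(abs(val) < 1e-9)   # True: zo is a zero of the limit model
```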
6.7 Conclusions
Analytical solutions have been derived for FIR and ARX model structures that present the most efficient input, in terms of input energy, to estimate a discrete non-minimum phase zero zo. The optimal input can be characterized by a first order AR-filter with a pole in zo^{−1}. This solution is independent of the model order. Thus, the variance of the estimated non-minimum phase zero will be independent of the model order when the optimal input is applied. A similar analytic solution is obtained for general linear models based on a variance expression that is asymptotic in model order and data. A numerical solution has been presented for general linear SISO models of finite orders. It was illustrated that the optimal input may be very different depending on model structure and order.
Possible benefits of optimal design have been presented. It was shown that the variance can be reduced significantly compared to white inputs and square-waves, especially when the model is over-parameterized. It was also shown that a solution based on the optimal AR-filter, in which the true zero is replaced by an estimated zero, is quite robust with respect to the estimated zero location.
Chapter 7
Convex Computation of Worst Case Criteria
Let G(η) be a discrete linear time-invariant single input/single output model of Go. Here η ∈ Rⁿ is a vector that parameterizes G. The model typically deviates from the true system. There are different ways of representing the uncertainty in the model. The uncertainty will in this chapter be represented by parametric uncertainties. The parameters will lie in an ellipsoid centered around a nominal estimate ηo. The ellipsoid is described by

    Υ = {η | (η − ηo)^T R(η − ηo) ≤ 1}.                             (7.1)

Parametric uncertainties in terms of ellipsoids such as Υ appear e.g. in identification in the prediction error framework (Ljung, 1999).
To evaluate the quality of the estimated models and the possible performance degradation that the model errors may induce, it is important
to quantify them. The impact of the errors depends very much on the
intended application of the model and consequently a quality measure
has to take the intended use into account.
In Chapter 3.7 we introduced the following frequency function

    F(ω, η) ≜ ( [Wn G(η) + Xn]* Yn [Wn G(η) + Xn] + Kn ) / ( [Wd G(η) + Xd]* Yd [Wd G(η) + Xd] + Kd ),     (7.2)

where Wn, Wd, Xn and Xd are finite-dimensional stable transfer functions. Furthermore Yn(ω) was defined by

    Yn = 𝒴n* 𝒴n                                                     (7.3)

where 𝒴n is a stable finite-dimensional transfer function. The quantities Yd, Kn and Kd have definitions analogous to Yn. Here A* denotes the complex conjugate transpose of A.
In this chapter we will consider quality measures of G(η) that can be expressed as

    JF ≜ sup_{ω, η∈Υ} F(ω, η),                                      (7.4)

whenever F is well defined.
The quality measure (7.4) was introduced in Chapter 3.7. It was shown that (7.4) can be included in the experiment design as a fixed constraint: the upper bound of JF was constrained to be less than a fixed value, while the parametric confidence region was manipulable via R due to its dependence on the input spectrum.
The objective in this chapter is to compute JF given a fixed confidence ellipsoid. It is a non-trivial optimization problem to compute JF. However, when the frequency is fixed, the computation of max_η F(ω, η), η ∈ Υ, can be turned into a convex optimization problem. Hence, one way to approximate JF is to compute max_η F(ω, η), η ∈ Υ, for a finite number of frequencies and pick the largest value as an approximation of JF. This approach has been used in (Bombois et al., 2000b) to compute the worst-case ν-gap. A similar approach has been used in (Bombois et al., 2001) to compute the worst-case performance of a control design.
There are two main contributions of this chapter that extend previous
results presented in (Bombois et al., 1999), (Bombois et al., 2000b) and
(Bombois et al., 2001). The first is the introduction of the generalized
cost function (7.4). The second is the introduction of a method that gives
an upper bound on JF without discretization of the frequency axis.
The outline of the chapter is as follows. Some examples are shown in
Section 7.1 to motivate the structure of the quality measure JF in (7.4).
It is shown that JF can be computed for a fixed frequency in Section 7.2.
By imposing some limitations on one of the optimization variables it is
possible to compute an upper bound of JF , considering all frequencies.
This is shown in Section 7.3. The method is illustrated by numerical
examples in Section 7.4 and some conclusions are drawn in Section 7.5.
7.1 Motivation of Quality Measure
In this section we will give some specific examples of problems that can
be expressed as (7.4). First we will consider the parametric uncertainty
in (7.4).
7.1.1 Parametric Ellipsoidal Constraints
One way of obtaining models is to use the prediction error framework
of system identification, see (Ljung, 1999) and (Söderström and Stoica, 1989). Based on N observed input-output data points this framework
delivers a frequency response estimate G(ejω , θ̂N ), where θ̂N is the prediction error estimate of a vector θ ∈ Rn that parameterizes a set of
transfer functions, G(ejω , θ), together with an uncertainty region. When
the model is flexible enough to capture the underlying dynamics, i.e. the
true system is in the model class, the uncertainty in the estimate is only
due to noise and other stochastic disturbances in the data. These errors
are denoted variance errors. There exist strong analytic results for the
variance error of prediction error estimates in terms of the covariance of
the parameters, see (Ljung, 1999) and Chapter 2.
The covariance matrix can be used to derive confidence bounds for the estimates. These confidence bounds take the form of ellipsoids such as Υ defined in (7.1), with R being proportional to the inverse of the covariance matrix of the parameters. The confidence ellipsoid can be used to determine the distance between the estimates and the true system. But the confidence region can also be used together with an estimate, say θ̂N, to obtain an uncertainty ellipsoid which contains the true parameters with probability p. Such a confidence ellipsoid for the estimate θ̂N is defined by

    Uθ = {θ | (θ − θ̂N)^T PN⁻¹ (θ − θ̂N) ≤ α}                         (7.5)

where PN is the covariance matrix of θ̂N and α is obtained from the χ²-distribution such that Pr(χ²(n) ≤ α) = p for the pre-specified level of confidence, e.g. p = 0.95.
The parametric uncertainty region Uθ corresponds to an uncertainty region in the space of transfer functions, denoted D:

    D = {G(q, θ) | θ ∈ Uθ}.                                         (7.6)
It has been the main objective of (Bombois et al., 1999; Bombois et al., 2000b; Bombois et al., 2001; Gevers et al., 2003) to derive results based on the uncertainty description (7.6) to bridge the gap between identification and robust control theory.
Parametric uncertainty regions in terms of ellipsoids do not only appear in identification in the prediction error framework. They also appear in e.g. set-membership identification techniques, see (Milanese and Vicino, 1991).
7.1.2 Worst Case Performance of a Control Design
Assume that an identified model, G, will be used for control design where
the objective is to control the open-loop system y = Go u + v where y is
the output, u the input, and v is some disturbance. The control law is
defined by
u = K(r − y) + w
(7.7)
where r and w are external excitation signals. With the model G and
the controller K, the designed closed-loop system becomes
GK
G
r
y
1+GK
1+GK
=
1
K
w
u
1+GK
1+GK
1
r
G K 1 T (G, K)
(7.8)
=
w
1
1 + GK
if the disturbance v is neglected. There are several ways of defining the performance of the closed-loop system (7.8). When the controller K stabilizes T(G, K), the performance measure adopted here is defined by

    J(G, K, Wl, Wr) ≜ ‖Wl T(G, K) Wr‖_∞                             (7.9)

where Wl = diag(Wl1, Wl2) and Wr = diag(Wr1, Wr2) are diagonal frequency weights and ‖A‖_∞ is the H∞-norm (Zhou et al., 1996) of the stable transfer function A.
The most interesting performance measure is J(Go , K, Wl , Wr ) since
this will measure the achieved performance when the controller is applied
to the true system Go . However, Go is unknown.
Assume that the model G has been obtained from a prediction error identification experiment, which together with the model estimate also delivers an estimate of the covariance matrix for the estimated parameters. Further, assume that the true system is contained in the model class; then the covariance matrix can be used to define an uncertainty set D as in (7.6). Modulo an error in the covariance estimate, this uncertainty set will also contain the true system with a prespecified probability. Thus, one way to do a worst case estimation of the achieved performance is to consider the following criterion

    JWC(D, K, Wl, Wr) ≜ max_{G(θ)∈D} ‖Wl T(G, K) Wr‖_∞              (7.10)

where the uncertainty region D is defined in (7.6). By the definition of D
we know that the probability of JWC(D, K, Wl, Wr) ≥ J(Go, K, Wl, Wr) at least equals the degree of confidence, provided the asymptotic theory of prediction error identification is accurate.
As will be evidenced in Section 7.2, the computation of the worst case performance over the model set D becomes a tractable convex optimization problem when the frequency is fixed. Hence a lower bound of JWC can be obtained by solving this optimization problem for a finite set of frequencies and then taking the maximum of all the obtained objective values. This is the solution suggested in (Bombois et al., 2001).
Now consider the square of JWC:

    JWC²(D, K, Wl, Wr) = max_{ω, θ∈Uθ} σ̄(TW* TW),   TW = Wl T(G, K) Wr,     (7.11)

where σ̄(A) is the largest singular value of A. Then

    σ̄(TW* TW) = tr(TW* TW) = (|Wl1 G|² + |Wl2|²)(|Wr1 K|² + |Wr2|²) / |1 + GK|²     (7.12)

since T(·) is a rank one matrix. Notice that (7.12) can be fit into the expression for F(ω) defined in (7.2) by choosing Wn = Wl1, Yn = |Wr1 K|² + |Wr2|², Kn = |Wl2|²(|Wr1 K|² + |Wr2|²), Wd = K, Xd = Yd = 1 and Xn = Kd = 0. Hence the computation of JWC²(·) becomes a special case of computing JF defined in (7.4).
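The rank-one identity (7.12) is easy to confirm at a single frequency; G, K and the weights below are illustrative scalar values, i.e. frequency responses at one fixed ω.

```python
import numpy as np

# Check of (7.12) at one frequency (all values illustrative).
G, K = 0.8 - 0.3j, 1.2 + 0.5j
Wl1, Wl2, Wr1, Wr2 = 1.3, 0.7, 0.9, 1.1
T = (1.0 / (1.0 + G * K)) * np.outer([G, 1.0], [K, 1.0])   # T(G,K), cf. (7.8)
TW = np.diag([Wl1, Wl2]) @ T @ np.diag([Wr1, Wr2])
direct = np.linalg.svd(TW, compute_uv=False)[0] ** 2       # sigma_bar(TW* TW)
closed = (abs(Wl1 * G) ** 2 + Wl2 ** 2) * (abs(Wr1 * K) ** 2 + Wr2 ** 2) / abs(1.0 + G * K) ** 2
print(np.isclose(direct, closed))   # True
```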
7.1.3 The Worst Case Vinnicombe Distance
Now consider the worst case ν-gap between the identified model G(θ̂N) and the uncertainty set D, defined by

    δWC(G(θ̂N), D) = sup_{θ∈Uθ} δν(G(θ̂N), G(θ)),

where δν denotes the Vinnicombe ν-gap between two transfer functions introduced in (Vinnicombe, 1993). Here Uθ obeys (7.5). Furthermore, when θ̂N ∈ Uθ the worst case ν-gap can be expressed as

    δWC(G(θ̂N), D) = sup_{ω, θ∈Uθ} κ(G(θ̂N), G(θ)),                  (7.13)
see Lemma 5.1 in (Bombois et al., 2000b). Here κ is the chordal distance

    κ(G(θ̂N), G(θ)) = |G(θ̂N) − G(θ)| / ( √(1 + |G(θ̂N)|²) √(1 + |G(θ)|²) )     (7.14)

between the two transfer functions G(θ̂N) and G(θ). Now consider the square of the chordal distance

    κ²(G(θ̂N), G(θ)) = |G(θ̂N) − G(θ)|² / ((1 + |G(θ̂N)|²)(1 + |G(θ)|²)),

which can also be fit into the expression for F(ω) defined in (7.2) by choosing Xn = −G(θ̂N), Wn = Wd = Yn = 1, Yd = Kd = 1 + |G(θ̂N)|² and Xd = Kn = 0. Hence the computation of the worst-case chordal distance becomes yet another example of a problem that can be fit into the computation of JF.
7.2 Computation of Worst Case Criterion for a Fixed Frequency
The computation of JF can be restated as the following optimization problem

    minimize over γ:   γ
    subject to:        F(ω, η) ≤ γ   ∀ ω, ∀ η : (η − ηo)^T R(η − ηo) ≤ 1.     (7.15)

Let γopt be the minimizer of (7.15); then JF = γopt. As stated, the optimization problem in (7.15) is intractable since it involves constraints that are infinite-dimensional and non-convex. In this section we will show that for each frequency, the optimization problem (7.15) can be restated as a convex problem that has a unique solution if feasible. The way this
is shown exploits techniques developed in (Bombois et al., 2000b) and
(Bombois et al., 2001) and the result can be seen as a generalization of
previous contributions.
The first step is to exploit the special structure of F since both the
numerator and the denominator of F are quadratic functions in G(η).
For this we will reuse the result in Lemma 3.6 that states
F ≤ γ  ⇔  [η; 1]^T (γF0(ω) − F1(ω)) [η; 1] ≥ 0    (7.16)
for some matrices F0 (ω) and F1 (ω). The equivalence in (7.16) will be
further exploited in the next theorem.
Theorem 7.1
Assume that F (ω, η) < ∞ for all ω and for all η ∈ Υ. Then the following
two statements are equivalent:
I. F(ω, η) ≤ γ ∀ ω ∈ [−π, π] and ∀ η ∈ Υ,
   Υ = {η | (η − ηo)^T R(η − ηo) ≤ 1}    (7.17)

II. ∃ τ(ω) ≥ 0, τ(ω) ∈ R such that
   γF0(ω) − F1(ω) − τ(ω)E ≥ 0 ∀ ω,
   E ≜ [ −R       Rηo
         ηo^T R   1 − ηo^T Rηo ]    (7.18)
Proof: The equivalence (7.16) gives that F ≤ γ is equivalent to
σ0(η) ≜ [η; 1]^T (γF0(ω) − F1(ω)) [η; 1] ≥ 0.    (7.19)
Expression (7.19) is equivalent to F ≤ γ for a particular η. Now this must be true for all η ∈ Υ. The ellipsoid Υ can be parameterized as
σ1(η) ≜ [η; 1]^T E [η; 1] ≥ 0    (7.20)
with E defined in (7.18). Hence the condition F ≤ γ ∀ ω ∈ [−π, π] and η ∈ Υ is equivalent to σ0(η) ≥ 0 for all ω and for all η such that σ1(η) ≥ 0. Such a problem
can be handled by the S-procedure (Boyd et al., 1994), which states the following equivalence for each ω: σ0(η) ≥ 0 ∀ η ∈ R^k such that σ1(η) ≥ 0 ⇔ ∃ τ ≥ 0, τ ∈ R, such that σ0(η) − τσ1(η) ≥ 0 ∀ η ∈ R^k. Since there has to exist one τ ≥ 0 for each ω, we can rewrite τ as a function of ω which has to fulfill τ(ω) ≥ 0 for all ω. The condition σ0(η) − τ(ω)σ1(η) ≥ 0 ∀ ω and ∀ η ∈ R^k now corresponds to (7.18).
Theorem 7.1 can be used to rewrite the optimization problem (7.15)
as
minimize_{γ, τ(ω)}  γ
subject to  γF0(ω) − F1(ω) − τ(ω)E ≥ 0 ∀ ω
            τ(ω) ≥ 0 ∀ ω.    (7.21)
Remark: For a fixed frequency, ω = Ω, the problem (7.21) becomes a
tractable convex optimization problem in the variables γ and τ .
This has been recognized in (Bombois et al., 2000b) where the worst
case ν-gap is computed for a fixed frequency. A similar approach is used
in (Bombois et al., 2001) to compute the worst case control performance
as defined in (7.10) but for a fixed frequency.
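To make the fixed-frequency case concrete, the following sketch solves a scalar toy instance of (7.21): F(η) = η² over the "ellipsoid" (η − 1)² ≤ 1, i.e. η ∈ [0, 2], so that JF = sup η² = 4. The matrices F0, F1 and E below are constructed for this toy problem only (they are assumptions for illustration, not from the thesis), and a simple bisection over γ with a grid over τ stands in for a proper SDP solver.

```python
import numpy as np

# Toy instance of (7.21) at a fixed frequency: F(eta) = eta^2 and the
# ellipsoid (eta - 1)^2 <= 1, so J_F = sup_{0 <= eta <= 2} eta^2 = 4.
F0 = np.array([[0.0, 0.0], [0.0, 1.0]])   # [eta;1]^T (g*F0 - F1) [eta;1] = g - eta^2
F1 = np.array([[1.0, 0.0], [0.0, 0.0]])
E  = np.array([[-1.0, 1.0], [1.0, 0.0]])  # R = 1, eta_o = 1 in (7.18)

def feasible(g, taus=np.linspace(0.0, 20.0, 2001)):
    """Is there a tau >= 0 with g*F0 - F1 - tau*E >= 0 (S-procedure)?"""
    return any(np.linalg.eigvalsh(g * F0 - F1 - t * E).min() >= -1e-9
               for t in taus)

lo, hi = 0.0, 100.0                       # bisection over gamma
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
print(hi)   # close to 4, matching sup over the ellipsoid
```

The bisection is valid because the feasible set grows with γ (F0 is positive semidefinite in this toy instance); with an SDP solver the inner grid over τ would be unnecessary.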
The main difficulty with (7.21), when the frequency is free, is the unknown positive function τ(ω). In the next section, the optimization problem will be relaxed by restricting τ(ω) to be a linearly parameterized spectrum. With this restriction on τ(ω) it is possible to obtain a finite-dimensional convex optimization problem using the Kalman-Yakubovich-Popov lemma (Yakubovich, 1962), whose minimizer will be an upper bound on JF.
7.3 Computation of an Upper Bound
Introduce the following parametrization of τ (ω):
τ(ω) = Ψ(e^{jω}) + Ψ*(e^{jω}),   Ψ(e^{jω}) = ½τ0 + Σ_{k=1}^{M−1} τk Bk(e^{jω})    (7.22)
where {Bk(e^{jω})} is a set of stable and finite-dimensional transfer functions, e.g. Laguerre functions (Wahlberg, 1991) or Kautz functions (Wahlberg, 1994). With the restriction τ(ω) ≥ 0 ∀ ω, τ(ω) becomes a spectrum whose positive real part corresponds to Ψ(e^{jω}). When
{Bk (ejω )} = {e−kjω } the sequence {τk } corresponds to the correlation
coefficients. The main reason to study such parameterizations of τ (ω) is
that infinite-dimensional spectral constraints such as τ (ω) ≥ 0 ∀ ω may
be replaced by finite-dimensional linear matrix inequalities when the positive real part of the spectrum is linearly parameterized, see Lemma 3.1.
The idea is to let {A, B, C, D} be a state-space realization of the positive real part of the spectrum, Ψ(e^{jω}) = ½τ0 + Σ_{k=1}^{M−1} τk Bk(e^{jω}), where {τk} appears linearly in C and D. It is easy to construct such a realization since {τk} appears linearly in Ψ(e^{jω}). Given this realization, the inequality
τ(ω) = τ0 + Σ_{k=1}^{M−1} τk [Bk(e^{jω}) + Bk*(e^{jω})] ≥ 0 ∀ ω    (7.23)
can, according to Lemma 3.1, be replaced by
K(Q, {A, B, C, D}) ≜ [ Q − A^T QA   −A^T QB
                       −B^T QA      −B^T QB ] + [ 0   C^T
                                                  C   D + D^T ] ≥ 0.    (7.24)
Notice that (7.24) is a linear matrix inequality in Q and {τk }.
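A minimal numerical sanity check of (7.24) can be made in the simplest case M = 2, where τ(ω) = τ0 + 2τ1 cos ω and the positive real part Ψ(z) = ½τ0 + τ1 z^{−1} has the scalar realization A = 0, B = 1, C = τ1, D = ½τ0 (this concrete instance is an illustration, not from the thesis). For τ0 = 2, τ1 = 1 the spectrum is nonnegative and Q = 1 is a certificate found by inspection:

```python
import numpy as np

# M = 2: tau(w) = tau0 + 2*tau1*cos(w) with tau0 = 2, tau1 = 1, which is
# nonnegative for all w.  Positive-real part Psi(z) = tau0/2 + tau1/z is
# realized by the scalars A = 0, B = 1, C = tau1, D = tau0/2.
tau0, tau1 = 2.0, 1.0
A, B, C, D = 0.0, 1.0, tau1, tau0 / 2

def K(Q):
    """The LMI matrix in (7.24) for this scalar realization."""
    return (np.array([[Q - A*Q*A, -A*Q*B], [-B*Q*A, -B*Q*B]])
            + np.array([[0.0, C], [C, D + D]]))

Q = 1.0                                            # certificate by inspection
assert np.linalg.eigvalsh(K(Q)).min() >= -1e-12    # K(Q) >= 0 holds

# Consistency check: tau(w) really is nonnegative on a frequency grid.
w = np.linspace(-np.pi, np.pi, 1001)
tau = tau0 + 2 * tau1 * np.cos(w)
print(tau.min())   # approximately 0, attained at w = +/- pi
```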
Lemma 7.1
Let F0(ω), F1(ω) and E be defined by (7.16), (3.107) and (7.18), and introduce
Λ(ω) ≜ γF0(ω) − F1(ω) − τ(ω)E.    (7.25)
When τ(ω) is defined by (7.23) there exists a sequence {Λk} such that
Λ(ω) ≥ 0 ∀ ω  ⇔  Σ_{k=0}^{p} Λk(e^{−kjω} + e^{kjω}) ≥ 0 ∀ ω    (7.26)
where the variables {τk} and γ appear linearly in {Λk}.
Proof: Both F0 and F1 have the structure Σ_k Fk e^{−kjω}. Multiply both sides of Λ(ω) ≥ 0 by the least common denominator of τ(ω). This gives the equivalence in (7.26).
The special parametrization of τ(ω) introduced in (7.23) will, according to Lemma 7.1, imply that the condition Λ(ω) ≥ 0 ∀ ω can be replaced by a positivity constraint on a spectrum, see (7.26). This fact can now be used together with Lemma 3.1.
Theorem 7.2
Assume that F(ω, η) < ∞ for all ω and for all η ∈ Υ. Let τ(ω) be defined by (7.23) and let {Aτ, Bτ, Cτ, Dτ} be a state-space representation of the positive real part of τ(ω) where {τk} appears linearly in Cτ and Dτ. Let {AΛ, BΛ, CΛ, DΛ} be the corresponding state-space representation of the positive real part of the spectrum (7.26) where {τk} and γ appear linearly in CΛ and DΛ. Then
F(ω, η) ≤ γ ∀ ω ∈ [−π, π] and ∀ η ∈ Υ,  Υ = {η | (η − ηo)^T R(η − ηo) ≤ 1}    (7.27)
if there exist Qτ and QΛ such that
K(Qτ, {Aτ, Bτ, Cτ, Dτ}) ≥ 0
K(QΛ, {AΛ, BΛ, CΛ, DΛ}) ≥ 0.    (7.28)
Proof: Due to the parametrization of τ (ω), the constraints (7.28) will
assure that τ (ω) ≥ 0 ∀ ω and Λ(ω) ≥ 0 ∀ ω according to Lemma 3.1
and Lemma 7.1. Whenever τ (ω) ≥ 0 ∀ ω and Λ(ω) ≥ 0 ∀ ω this will,
according to Theorem 7.1, imply (7.27).
Theorem 7.2 can now be used to state a finite-dimensional convex
optimization problem to compute an upper bound on JF .
Theorem 7.3
Consider the assumptions established in Theorem 7.2. The optimal value γopt of the optimization problem
minimize_{γ, {τk}, Qτ, QΛ}  γ
subject to  K(Qτ, {Aτ, Bτ, Cτ, Dτ}) ≥ 0
            K(QΛ, {AΛ, BΛ, CΛ, DΛ}) ≥ 0    (7.29)
is an upper bound of JF defined in (7.4).
Proof: Apply Theorem 7.2 to the constraints in (7.21). The obtained optimization problem will, if feasible, provide an upper bound due to the restriction on τ(ω).
This is a very powerful result, which will be illustrated in Section 7.4.
7.4 Numerical Illustrations
The true linear discrete-time system obeys
y(t) = Go(q)u(t) + e(t)    (7.30)
where Go(q) = 0.8q^{−1}/(1 − q^{−1} + 0.16q^{−2}) and where q is the time-shift operator, q^{−1}u(t) = u(t − 1). Furthermore y is the output, u the input and e is zero-mean white noise with variance 0.1. A model is estimated using the prediction error method based on 1000 samples of input/output data from the system (3.139). The obtained model is
G(η̂) = 0.8064q^{−1}/(1 − 1.018q^{−1} + 0.1839q^{−2})    (7.31)
where η̂ = [−1.018 0.1839 0.8064]. The covariance matrix of η̂ is
Pη̂ = 10^{−3} · [  1.112  −0.987   0.739
                 −0.987   0.900  −0.614
                  0.739  −0.614   0.791 ].    (7.32)
Now define the parametric uncertainty ellipsoid Uη as
Uη = {η | (η − η̂)^T Pη̂^{−1} (η − η̂) ≤ 7.82}.    (7.33)
The right hand side of (7.33) is determined by Pr(χ²(3) ≤ 7.82) = 0.95.
Let ηo = [−1 0.16 0.8] be the vector that parameterizes the true system
Go . The ellipsoid Uη will contain ηo with a probability of 95% provided
the asymptotic theory of the prediction error method is accurate, see
(Ljung, 1999). The vector ηo is indeed contained in Uη since
(ηo − η̂)T Pη̂−1 (ηo − η̂) = 3.21 < 7.82.
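The membership test is easy to reproduce, as sketched below with NumPy. Note that with the rounded values of η̂ and Pη̂ printed above, the quadratic form comes out close to, but not exactly at, the quoted 3.21 (presumably because the displayed entries are rounded); it is in any case well below 7.82.

```python
import numpy as np

eta_hat = np.array([-1.018, 0.1839, 0.8064])
eta_o   = np.array([-1.0,   0.16,   0.8])
P = 1e-3 * np.array([[ 1.112, -0.987,  0.739],
                     [-0.987,  0.900, -0.614],
                     [ 0.739, -0.614,  0.791]])

d = eta_o - eta_hat
q = d @ np.linalg.solve(P, d)    # (eta_o - eta_hat)^T P^{-1} (eta_o - eta_hat)
# 7.82 is (approximately) the 95% quantile of the chi-square(3) distribution.
print(q, q < 7.82)
```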
7.4.1 Computation of the Worst Case Vinnicombe Distance
Assume that we want to compute the worst case ν-gap
δWC = max_{ω, η∈Uη}  |G(η̂) − G(η)|² / ( (1 + |G(η̂)|²)(1 + |G(η)|²) ).
Computing δWC corresponds to computing JF for appropriate choices of weights in the general cost function F, as was shown in Section 7.1.3.
An upper bound on JF can be computed, according to Theorem 7.3,
by solving a finite-dimensional convex optimization problem. We will
now use Theorem 7.3 to compute an upper bound on δW C . First the
parametrization of τ (ω) must be specified. We have chosen
τ(ω) = Σ_{k=−(M−1)}^{M−1} τk e^{−jωk}    (7.34)
with M = 3. There is a trade-off between flexibility and computational
complexity in the choice of M . The positive real part of (7.34) can be
realized by
Aτ = [ O_{1×(M−2)}   0
       I_{M−2}       O_{(M−2)×1} ],   Bτ = [1 0 … 0]^T,
Cτ = [τ1 τ2 … τ_{M−1}],   Dτ = ½τ0,    (7.35)
where Om×k is the zero matrix of size m by k and Im is the identity
matrix of size m by m. Hence the constraint τ (ω) ≥ 0 ∀ ω can now be
replaced by the linear matrix inequality K(Qτ , {Aτ , Bτ , Cτ , Dτ }) ≥ 0.
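The realization (7.35) can be checked numerically. The sketch below (with the block layout as reconstructed above, and arbitrary illustrative coefficients) builds {Aτ, Bτ, Cτ, Dτ} for a given M and verifies that Cτ(zI − Aτ)^{−1}Bτ + Dτ reproduces Ψ(e^{jω}) = ½τ0 + Σ_{k≥1} τk e^{−jωk}.

```python
import numpy as np

def fir_realization(taus):
    """State-space {A, B, C, D} of Psi(z) = tau0/2 + sum_k tau_k z^{-k}, cf. (7.35)."""
    M = len(taus)                       # taus = [tau0, tau1, ..., tau_{M-1}]
    A = np.diag(np.ones(M - 2), -1)     # shift matrix: zeros with subdiagonal ones
    B = np.zeros(M - 1); B[0] = 1.0
    C = np.array(taus[1:], dtype=float)
    D = 0.5 * taus[0]
    return A, B, C, D

taus = [2.0, 0.7, -0.3]                 # M = 3, illustrative coefficients
A, B, C, D = fir_realization(taus)

w = 0.83                                # any test frequency
z = np.exp(1j * w)
Psi = C @ np.linalg.solve(z * np.eye(len(B)) - A, B) + D
direct = 0.5 * taus[0] + sum(t * z**-k for k, t in enumerate(taus) if k > 0)
print(abs(Psi - direct))   # ~0: the realization matches Psi(e^{jw})
```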
The special parametrization of τ(ω) will imply that
Λ(ω) = γF0(ω) − F1(ω) − τ(ω)E    (7.36)
     = Σ_{k=0}^{p} Λk(e^{−jωk} + e^{jωk}),    (7.37)
see Lemma 7.1. The positive real part of (7.36) can be realized in the
same manner as τ (ω), i.e. the realization follows from (7.35). Given the
realization {AΛ , BΛ , CΛ , DΛ } of Λ(ω), the constraint Λ(ω) ≥ 0 ∀ ω can
now be replaced by the linear matrix inequality
K(QΛ , {AΛ , BΛ , CΛ , DΛ }) ≥ 0.
Now it is straightforward to compute an upper bound on δW C using
Theorem 7.3. The result is shown in Figure 7.1. The obtained upper
bound is δWC ≤ 0.06348. This upper bound can be compared with a method that divides the frequency axis into a finite number of frequencies and where the worst case chordal distance, defined as maxη κ(G(η̂), G(η)), is computed for each frequency according to (7.21). Here κ is defined by (7.14). This frequency-by-frequency method yields δWC ≥ 0.06344. Hence the obtained upper bound is relatively tight to the true value of δWC. This is somewhat surprising since the parametrization of τ(ω) is quite restrictive.
Figure 7.1: Computation of the worst case ν-gap as a function of ω (rad). Solid line: obtained upper bound based on (7.29). Dotted line: the worst case chordal distance, frequency by frequency.
7.4.2 Computation of Worst Case Control Performance
Assume that we want to control the open-loop system by the PI controller K(q) = (0.1 − 0.05q^{−1})/(1 − q^{−1}). This controller stabilizes the nominal model (7.31). It will actually stabilize all models in the set Gη = {G(η) : η ∈ Uη} since δWC < ∥T(G(η̂), K)∥_∞^{−1}, see (Vinnicombe, 1993).
The worst case performance when the controller is applied to the systems in the set Gη can be evaluated by the criterion (7.10). Here we will compute an upper bound on the worst case sensitivity function
JWCS = max_{G(η)∈Gη} | 1/(1 + KG(η)) |
by using the result of Theorem 7.3. The parametrization of τ(ω) follows
Figure 7.2: Worst case sensitivity as a function of ω (rad). Thick solid line: upper bound on the sensitivity function. Dotted line: worst case sensitivity function, frequency by frequency. Thin solid line: achieved sensitivity with K applied to G(ηo).
from (7.34) with M = 2. The result of the computation of JWCS is shown in Figure 7.2. The obtained upper bound is JWCS ≤ 1.35297. A frequency-by-frequency optimization method yields JWCS ≥ 1.35295.
7.5 Conclusions
In this chapter we have presented a method, in terms of a convex program, to compute different worst case criteria, e.g. the worst case Vinnicombe distance. Until now, such problems have been solved using a frequency gridding. The main contributions are the introduction of a generalized cost function, which e.g. includes the worst case Vinnicombe distance, and a method that computes an upper bound of this cost function without discretizing the frequency axis. The disadvantage is that the method imposes a relaxation that may introduce conservativeness. It is of great interest to further investigate how conservative these results may be. However, numerical results show that the conservativeness, at least in a large number of cases, will not be an issue.
Chapter 8

Gradient Estimation in IFT for Multivariable Systems
Iterative Feedback Tuning (IFT) is a model-free control tuning method using closed-loop experiments, see (Hjalmarsson et al., 1994; Hjalmarsson et al., 1998; Hjalmarsson, 2002). For single-input single-output (SISO) systems only 2 or 3 closed-loop experiments, depending on the controller structure, are required. However, for multivariable systems the number of experiments grows in proportion to the dimension of the controller. In this chapter several methods are proposed to reduce the experimental time by approximating the gradient of the cost function. One of these methods uses the same technique of shifting operators as is used in IFT for SISO systems. This method is further analyzed and sufficient conditions for local convergence are derived. It is shown that even when there are commutation errors due to the approximation method, the numerical optimization may still converge to the true optimum.
8.1 Introduction
In order to develop control systems that both guarantee stability and provide good performance, some knowledge about the process to be controlled is necessary. One important source of information is experimental data collected from the process. Hence, means to map the
experimental information into the controller are essential ingredients in
any control design procedure.
Often, but not necessarily, an explicit model of the process is used
as an intermediate for this. This model is typically obtained by fitting
parameters in a prespecified model structure so that the model behaves
similarly to the true process for some available data set. The usefulness
of a model lies in the fact that it can be used to predict the process
behavior for other operating conditions than those in the data set. Based
on this predicted behavior, a suitable controller can be designed. At the
same time as the ability to extrapolate makes a model-based approach
very powerful, it is also its main weakness. An incorrect model may lead
to undesirable, and even catastrophic, results. The issue of how to ensure
that an identified model is suited for control design has been subject to
intense study over the last fifteen years (Gevers, 1993; Van den Hof and
Schrama, 1995; Hjalmarsson, 2003).
A more cautious approach is to limit the extrapolation ability of the
model. Suppose that the control design is embodied in terms of a design criterion which is a function of some parameters in the controller.
Suppose also that there already is a controller operating in the feedback
loop. One could then try to model the design criterion locally around
the present controller parameters and use this local model to adjust the
controller so that performance is improved. Iterative Feedback Tuning
(IFT) is a tuning method which is based on this philosophy. IFT extracts
information about the closed-loop sensitivity to the model/controller parameters (gradient information). This corresponds to local information
about the controlled process around the current closed-loop trajectory
and hence, implicitly, to a local model.
The price for using accurate local system information is first of all that only gradual changes are allowed in a controller tuning algorithm. This makes tuning slower than when a model that (accurately) captures the complete process characteristics is available. Furthermore, a local model cannot in general be used to predict the stability border. Finally, to obtain this information, special experiments may have to be performed on the true process, possibly disturbing the normal operating conditions. In IFT, for SISO systems, the experimental load can be kept at a maximum of two experiments in each iteration, independently of the number of parameters to tune. However, for nonlinear systems and multi-input/multi-output (MIMO) systems the experimental load grows substantially.
With IFT transformed into a nonlinear setting it is shown in (Sjöberg
and Agarwal, 1996) and (De Bruyne et al., 1997) that the gradient of
the control cost can be computed by performing additional experiments.
The number of experiments turns out to be proportional to the number of
parameters. To reduce the number of experiments, identification-based
methods to approximate the gradient are proposed in (De Bruyne et
al., 1997) and (Sjöberg and Agarwal, 1997). A hybrid version between
the ideas of the original IFT and the model-based approximations is
presented in (Sjöberg and Bruyne, 1999), where a model of the linearized
closed-loop system is introduced in order to compensate for the errors
that occur when the original IFT is used for nonlinear systems.
It is shown in (Hjalmarsson, 1999) that for linear time-invariant MIMO systems, the experimental load can be reduced to be proportional to the dimension of the controller. From a practical point of view, the experimental load might still be too high for large systems. Hence it is of great interest to further reduce the experimental load. It is the main objective of this chapter to discuss, suggest and evaluate some methods that exist for doing this.
IFT was introduced in (Hjalmarsson et al., 1994) and a general presentation has appeared in (Hjalmarsson et al., 1998). For a recent overview, see (Hjalmarsson, 2002). The Special Section (Hjalmarsson and Gevers, 2003) contains a number of applications of IFT.
The system setup is introduced in Section 8.2. Section 8.3 briefly describes how the gradients are estimated within the framework of IFT. Since the gradient estimation is experimentally costly, some approximations of the gradient estimate are introduced in Section 8.4. One of the methods, which uses the same technique as is used in IFT for SISO systems, is analyzed in Section 8.5, where conditions for local convergence are stated. The performance of this method is then studied on three numerical examples in Section 8.6. Finally, some concluding remarks are given in Section 8.7.
8.2 System Description
The overall closed-loop system, depicted in Figure 8.1 is described by
yt = G0 ut + vt
ut (ρ) = C(ρ)(rt − yt (ρ)),
(8.1)
Figure 8.1: Feedback system.
where the process to be controlled, G0 , is assumed to be a discrete linear
time-invariant multivariable system, yt ∈ Rp is the output, ut ∈ Rm is the
input and vt ∈ Rp represents unmeasurable signals such as disturbances
and noise. The disturbance vt is assumed to be a zero-mean discrete-time
stochastic process and it is also assumed that sequences from different
experiments are mutually independent. The controller C(ρ) is an m × p transfer function matrix parameterized by some parameter vector ρ ∈ R^f. The reference rt is an external vector. The subscript t denotes the
discrete time instants. Notice that signals originating from measurements
on the closed-loop system are functions of ρ. The parametrization of C(ρ)
is such that all signals in the closed-loop are differentiable with respect to
ρ. To ease the notation the time argument will from now on be omitted
whenever possible. The performance of the controller is measured by the
following quadratic criterion
J(ρ) = (1/2N) E[ Σ_{k=0}^{N} ỹk(ρ)^T ỹk(ρ) ]    (8.2)
where ỹ(ρ) = y(ρ)−yd is the difference between the achieved output and
the desired output yd . The desired output is assumed to be generated as
yd = Td r, where Td is the reference model. The expectation E[·] is with
respect to the probability distribution of the disturbance v. The optimal
controller parameterized by ρc is defined by
ρc = arg min_ρ J(ρ).    (8.3)
Hence the target of a controller tuning algorithm is to minimize (8.2)
to find the optimal controller setting represented by ρc . In general, the
problem of minimizing J(ρ) is not convex and one has to be content with a
local minimum. The stationary points of J(ρ) are given as solutions to
0 = ∂J(ρ)/∂ρ = (1/N) E[ Σ_{k=1}^{N} (∂yk(ρ)/∂ρ)^T ỹk(ρ) ].    (8.4)
With computed gradients the solution can be obtained by gradient-based methods, e.g. the Gauss-Newton search algorithm
ρ_{j+1} = ρ_j − γ_j R_j^{−1} ∂J(ρ_j)/∂ρ,    (8.5)
where Rj is an approximation of the Hessian of J(ρ) and γj is the adjustable step length.
As stated, the problem (8.4) is intractable since it involves expectations that are unknown. However, such problems can be handled by classical results on stochastic approximation algorithms, provided there exists an unbiased estimate of the gradient ∂J(ρ)/∂ρ. The key contribution in the first derivation of IFT for SISO systems (Hjalmarsson et al., 1994) was to show that an unbiased estimate of ∂J(ρ)/∂ρ can indeed be obtained by performing experiments on the true closed-loop system. How this is done is the topic of the next section.
8.3 Gradient Estimation in the IFT Framework
Here we will give a brief introduction to how the gradient is obtained in the framework of IFT. For a more detailed description of IFT in general we refer to (Hjalmarsson et al., 1998; Hjalmarsson, 2002) and to (Hjalmarsson, 1999) for the special implications regarding MIMO systems.
With the achieved sensitivity function and the complementary sensitivity function, respectively defined by
S0(ρ) ≜ [I + G0 C(ρ)]^{−1}    (8.6)
T0(ρ) ≜ S0(ρ) G0 C(ρ),    (8.7)
the expression for the output y(ρ) from the closed-loop system (8.1) is
y(ρ) = T0(ρ)r + S0(ρ)v.    (8.8)
Figure 8.2: Setup for exact gradient experiments.
The gradient of y(ρ) with respect to an arbitrary element of ρ, denoted by y′(ρ), is then
y′(ρ) = S0(ρ)G0 C′(ρ)(r − y(ρ))    (8.9)
where y(ρ) is the output collected from the closed-loop system operating under normal operating conditions. Defining the control error as e(ρ) = r − y(ρ), the gradient of y(ρ) is ideally obtained by running the closed-loop experiment shown in Figure 8.2. In practice a perturbed estimate ŷ′(ρ) = y′(ρ) + S0(ρ)v is obtained due to the non-zero disturbance. It can be shown (Hjalmarsson et al., 1994) that an unbiased estimate of ∂J(ρ)/∂ρ, denoted by ∂Ĵ(ρ)/∂ρ, is obtained if v is a zero-mean stationary stochastic signal such that sequences collected from different experiments are mutually independent. Unbiased means that E[∂Ĵ(ρ)/∂ρ] = ∂J(ρ)/∂ρ.
However, this is a rather inefficient way of generating the gradient: with this setup one has to perform one gradient experiment for each parameter in the vector ρ. This number can be reduced drastically using the fact that scalar linear operators commute. For SISO systems (8.9) can be rewritten as
y′(ρ) = C(ρ)^{−1} C′(ρ) S0(ρ) G0 C(ρ) e(ρ).    (8.10)
Thus, to obtain the gradient signal ∂y(ρ)/∂ρ only two experiments are needed, independent of the number of parameters. The first one collects y(ρ) under normal operating conditions, and in the second one the control error e(ρ) is set to be the reference; the output of this experiment is then filtered through C(ρ)^{−1}C′(ρ). The last operation can be done off-line since C(ρ) is a known function of ρ.
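The two-experiment procedure is easy to verify in simulation for a toy SISO loop. In the sketch below the plant, the pure-gain controller and all numerical values are illustrative choices, not taken from the thesis; the noise-free IFT gradient signal from the second experiment is compared with a central finite-difference approximation of ∂y(ρ)/∂ρ.

```python
import numpy as np

# Illustrative first-order plant G0 = 0.1 q^{-1} / (1 - 0.9 q^{-1}) and a pure
# gain controller C(rho) = rho, so that C^{-1} C' = 1/rho.
def closed_loop(rho, r):
    """Simulate y_t = 0.9 y_{t-1} + 0.1 u_{t-1}, with u_t = rho*(r_t - y_t)."""
    y = np.zeros(len(r)); u_prev = 0.0
    for t in range(len(r)):
        if t > 0:
            y[t] = 0.9 * y[t - 1] + 0.1 * u_prev
        u_prev = rho * (r[t] - y[t])
    return y

rng = np.random.default_rng(0)
r = rng.standard_normal(400)
rho = 2.0

# Experiment 1: normal operation gives y and the control error e = r - y.
y = closed_loop(rho, r)
e = r - y
# Experiment 2: feed e as reference (gives T0(rho) e); filter by C^{-1}C' = 1/rho.
dy_ift = closed_loop(rho, e) / rho

# Reference: central finite difference of the closed-loop response w.r.t. rho.
h = 1e-6
dy_fd = (closed_loop(rho + h, r) - closed_loop(rho - h, r)) / (2 * h)
print(np.max(np.abs(dy_ift - dy_fd)))   # small: the two gradient signals agree
```

For scalar signals the commutation in (8.10) is exact, so the only discrepancy comes from the finite-difference approximation itself.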
For MIMO systems the operators in (8.9) do not typically commute. However, since each element in the controller corresponds to a SISO system, this commutation trick can be used for each input/output connection if special experiments are performed. It is shown in (Hjalmarsson, 1999) that this limits the maximum number of required gradient experiments to m × p, i.e. equal to the dimension of the controller. Despite this reduction, the experiments may be prohibitively long for slow processes, such as distillation columns, since they are performed on the true plant, possibly disturbing the normal operating conditions. Hence it is of great interest to further reduce the number of experiments. It is the objective of this chapter to discuss some options that exist for doing this.
8.4 Gradient Approximations
One way to reduce the number of experiments further is to approximate the signal ∂y(ρ)/∂ρ. There is a large body of theory on approximation errors in optimization; contributions related to IFT are e.g. (Bruyne and Carrette, 1997) for linear systems and (Sjöberg and Bruyne, 1999) for
nonlinear systems. In (Hjalmarsson, 1998) it is remarked that in practice
IFT seems to be robust with respect to the gradient estimate for many
systems. As long as this estimate corresponds to a descent direction, performance may be improved by a suitable choice of the step-length. The
estimate thus does not have to be exact. In fact, for convergence aspects,
it is more important that the quality of the estimate is good in a vicinity
of the optimum than in the surrounding regions. Hence a mix of methods
for obtaining a gradient estimate may be useful. In the first phase, far
away from the optimum, one could use a method which produces rough
but cheap gradient estimates. Provided these estimates are reasonable,
a suboptimal controller is obtained. If this controller is not satisfactory,
one can continue with the original IFT method, which provides unbiased
estimates of the gradient and hence guarantees convergence to a local
optimum (provided stability is maintained).
We will here suggest some methods to approximate ∂y(ρ)/∂ρ for MIMO systems. Rewrite (8.9) as
∂y(ρ)/∂ρi = S0(ρ)G0 C(ρ) C(ρ)^{−1} (∂C(ρ)/∂ρi) e(ρ) = T0(ρ) Ai(ρ) e(ρ),    (8.11)
where Ai(ρ) ≜ C(ρ)^{−1} ∂C(ρ)/∂ρi. The following approximation techniques are considered:
M1. ∂y(ρ)/∂ρi ≈ Ai(ρ) T0(ρ) e(ρ)
This is the same approach as is used for SISO systems, which means that a maximum of two gradient experiments are required no matter what the dimension of the controller is. For MIMO systems there is almost always an error due to the commutation error when Ai(ρ) and T0(ρ) are shifted. The motivation for using this method is that T0(ρ) is typically close to the identity in the pass band, which should reduce the commutation error. The local convergence of this method will be further studied in the next section.

M2. ∂y(ρ)/∂ρi ≈ Td Ai(ρ) T0(ρ) Td^{−1} e(ρ)
This method is introduced to decrease the commutation error. When T0(ρ) tends to Td, T0(ρ)Td^{−1} tends to the identity which, obviously, commutes with all matrices. An implementation issue is that Td^{−1} in many cases is non-causal. Since both Td^{−1} and e(ρ) are known, non-causal filtering can be applied to compute Td^{−1}e(ρ).

M3. ∂y(ρ)/∂ρi ≈ Td Ai(ρ) e(ρ)
The objective is to tune T0(ρ) towards Td. By assuming that T0(ρ) ≈ Td one could replace the unknown T0(ρ) with Td. This approximation has been applied in model reference adaptive control (Whitaker et al., 1958). This method is a naive alternative to the previous one. When T0(ρ) is far from Td we cannot expect the approximation to be good. However, it has the nice property that the approximation error decreases as T0(ρ) tends to Td.

M4. ∂y(ρ)/∂ρi ≈ T̂0(ρ) Ai(ρ) e(ρ)
Here T̂0(ρ) is an identified model of the closed-loop system. This idea is used in (Bruyne and Carrette, 1997), where IFT is applied to a resonant system for which the gradient experiment was not physically realizable due to high excitation of resonant modes. The identification is assumed to be simplified by the observation that the closed-loop system is typically of low order and that possible nonlinear effects in G0 are reduced by the feedback. Some of the drawbacks are that the method relies on the identified model, that extra signals possibly need to be injected to excite the system during identification, and that it requires more knowledge of the user to perform the identification properly.

M5. ∂y(ρ)/∂ρi ≈ T̂0(ρ) Ai(ρ) T0(ρ) T̂0(ρ)^{−1} e(ρ)
This method is inspired by the second method, M2. Here Td is replaced by an identified model T̂0(ρ).
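The size of the commutation error exploited or incurred by these methods can be illustrated with a static example. In the sketch below, the 2×2 matrices G0 and C and the error sample e are illustrative numbers only; it compares the exact gradient direction T0 Ai e from (8.11) with the M1 approximation Ai T0 e.

```python
import numpy as np

# Static 2x2 example with illustrative numbers: for MIMO systems A_i(rho)
# and T0(rho) do not commute in general, which is exactly the error made by M1.
G0 = np.array([[1.0, 0.4], [0.2, 0.8]])
C  = np.array([[2.0, 0.1], [0.3, 1.5]])
dC = np.array([[0.0, 1.0], [0.0, 0.0]])      # derivative w.r.t. the (1,2) gain

S0 = np.linalg.inv(np.eye(2) + G0 @ C)       # sensitivity, cf. (8.6)
T0 = S0 @ G0 @ C                             # complementary sensitivity, (8.7)
Ai = np.linalg.inv(C) @ dC                   # A_i(rho) as defined in (8.11)

e = np.array([1.0, -0.5])                    # a control error sample
exact = T0 @ Ai @ e                          # exact gradient direction, (8.11)
m1    = Ai @ T0 @ e                          # approximation M1
print(np.linalg.norm(exact - m1) / np.linalg.norm(exact))  # noticeable error
```

The relative error is far from negligible here, which is consistent with the discussion above: M1 is only expected to work well when T0(ρ) is close to the identity (or to a matrix that commutes with Ai(ρ)).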
The list of different gradient approximations can be extended much
further, e.g. we can think of estimating T̂0 (ρ) based on an identified
model Ĝ0 of the plant. This approach is used in (Trulsson and Ljung,
1985). The main drawback is that the plant might be both nonlinear and of high order, which complicates the identification and the subsequent control design.
Notice that there is a difference in the number of experiments between the methods presented above. In every method at least one on-line experiment is needed to generate e(ρ). The first two and the last (M1, M2 and M5) also need a second on-line experiment where the signal e(ρ), Td^{−1}e(ρ) or T̂0(ρ)^{−1}e(ρ), depending on the method, is filtered through the closed-loop system T0(ρ). The identification based methods (M4 and M5) possibly also need some extra on-line experiments to carry out the identification of T̂0(ρ). Finally, the gradient signal ∂y(ρ)/∂ρ is obtained by off-line filtering through Ai(ρ), Td Ai(ρ) or T̂0(ρ)Ai(ρ) for each element i in ρ. This can be compared with 1 + m × p on-line experiments for the true gradient.
8.5 Analysis of Local Convergence
We will study the local convergence of the gradient approximation method
M 1 introduced in the previous section, i.e. the method which applies the
same technique of shifting operators as is used for SISO systems. The
purpose of the analysis is to provide some conditions under which the
proposed gradient approximation method will work. In case of failing
to go downhill with the gradient approximations, a full set of gradient
experiments can be carried out.
In the analysis we will consider the case where the requirement on the controller to attenuate noise is subordinate to its ability of model reference tracking. Hence we are not interested in tuning the controller with respect to the noise contributions. This situation can be handled by a slight modification of the original criterion (8.2). Consider the quadratic
design criterion
J(ρ) = ½ E[ ỹ1(ρ)^T ỹ2(ρ) ],    (8.12)
where ỹ1 and ỹ2 are based on different realizations of the output. Since
the disturbance is mutually independent between different experiments
the influence of the disturbance will be decorrelated using such a criterion.
Notice that if there exists a parameter ρc such that T0 (ρc ) = Td then
this is a stationary point for the design criterion. The objective of this
section is to analyze local convergence for an iterative method which uses
the gradient approximation method M 1 around the point ρ = ρc where
T0 (ρc ) = Td .
There is no loss of generality in letting the system be noise-free in the
analysis of local convergence when considering the criterion (8.12) and
the stationary point ρ = ρc . Thus in the analysis the closed-loop system obeys (8.1) when the disturbance v is assumed to be zero. We will
consider the quadratic design criterion
J(ρ) = ½ E[ ỹ(ρ)^T ỹ(ρ) ].    (8.13)
The criterion (8.13) becomes a good approximation of (8.2) when the number of data points, N, becomes large. The analysis will be based on so-called ODE analysis (Ljung, 1977), which relates the evolution of an iterative algorithm like
ρ_{j+1} = ρ_j − γ_j ∂Ĵ(ρ_j)/∂ρ    (8.14)
to the trajectories of a differential equation. The ODE corresponding to (8.14) is
dρ/dt = −∂Ĵ(ρ)/∂ρ.    (8.15)
The idea is that when the step size γj tends to zero, the numerical iteration method will asymptotically behave as the corresponding ODE. Consider the approximation of the gradient of (8.13)
∂Ĵ(ρ)/∂ρ = E[ (∂y(ρ)/∂ρ)^T ỹ(ρ) ] = E[ (∂y(ρ)/∂ρ)^T (T0(ρ) − Td) r ].    (8.16)
Notice that if there exists a parameter ρc such that T0(ρc) = Td then this is a stationary point for the design criterion and furthermore ∂Ĵ(ρc)/∂ρ = ∂J(ρc)/∂ρ. The question is whether this is a stable stationary point or not. We will state some results that give sufficient conditions for local convergence around T0(ρc) = Td.
Introduce the general linear reference model
Td(e^{jω}) = [ T11(e^{jω}) … T1n(e^{jω})
               ⋮            ⋱  ⋮
               Tn1(e^{jω}) … Tnn(e^{jω}) ]    (8.17)
where the elements [Tij] are some arbitrary SISO transfer functions. Furthermore let the linear controller have the structure
C(ρ, e^{jω}) = [ C11(ρ11, e^{jω}) … C1n(ρ1n, e^{jω})
                 ⋮                 ⋱  ⋮
                 Cn1(ρn1, e^{jω}) … Cnn(ρnn, e^{jω}) ]    (8.18)
where
Cij(ρij, e^{jω}) = ρij^T Γ(e^{jω}),    (8.19)
ρij = [ρij1, …, ρijm]^T and Γ(e^{jω}) = [B1(e^{jω}), …, Bm(e^{jω})]^T is a vector of basis functions. For example, the discrete PID controller
CP ID (q) =
1
q −1
q −2
ρ1 + ρ2 q −1 + ρ3 q −2
=
ρ
+
ρ
+
ρ
1
2
3
1 − q1
1 − q −1
1 − q −1
1 − q −1
can be represented in this setting.
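As a quick check of the representation (8.19), the sketch below (with illustrative parameter values, not from the thesis) evaluates the PID controller both directly and as ρᵀΓ(e^{jω}), using the basis Bk(e^{jω}) = q^{−(k−1)}/(1 − q⁻¹):

```python
import numpy as np

rho = np.array([0.5, -0.2, 0.1])   # illustrative PID parameters rho_1..rho_3
w = 0.7                            # an arbitrary test frequency (rad/sample)
z = np.exp(1j * w)                 # e^{j omega}, so q^{-1} -> z**-1

# Basis functions B_k(e^{jw}) = z^{-(k-1)} / (1 - z^{-1}), cf. (8.19)
Gamma = np.array([z ** -k / (1 - z ** -1) for k in range(3)])

C_basis = rho @ Gamma              # rho^T Gamma(e^{jw})
C_direct = (rho[0] + rho[1] * z ** -1 + rho[2] * z ** -2) / (1 - z ** -1)

print(abs(C_basis - C_direct) < 1e-12)  # the two evaluations agree
```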
Without loss of generality, the complexity of each element in (8.18),
m, is assumed to be equal for all elements. The controller parameters are
collected in the vector ρ = [ ρ11ᵀ … ρn1ᵀ ρ12ᵀ … ρnnᵀ ]ᵀ, i.e. each
element in (8.18) is assumed to be individually parameterized. The spectrum of e(ρc) is defined by

Φe(ω) = S0(ρc) Φr S0*(ρc)    (8.20)
where Φr is the spectrum of r. Let Ā and A∗ denote the conjugate and
the conjugate transpose of A, respectively. The Kronecker product will
be denoted ⊗. From now on, the frequency argument will be omitted.
8 Gradient Estimation in IFT for Multivariable Systems
Theorem 8.1
Assume there exists a parameter vector ρ = ρc such that T0(ρc) = Td,
where Td is defined by (8.17). When the controller is defined by (8.18),
ρ = ρc is a stable stationary point of the ODE (8.15) using the gradient
approximation

∂y(ρ)/∂ρjkl = C⁻¹(ρ) (∂C(ρ)/∂ρjkl) T0(ρ) e(ρ)

if the matrix

(Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ > 0    (8.21)

for all ω ∈ [−π, π].
Proof: See Appendix 8.A.1.
This result shows that even if the operators C⁻¹C′ and T0 do not
commute, the descent method (8.14) may still be locally convergent
using this approximation method.
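Condition (8.21) can be checked numerically on a frequency grid via the smallest eigenvalue of the Hermitian matrix on the left-hand side. A minimal sketch, using an illustrative diagonal 2×2 reference model and a flat disturbance spectrum (both assumptions for the example, not data from the thesis):

```python
import numpy as np

def condition_8_21(Td, Phi_e):
    """Smallest eigenvalue of (Td Phi_e) kron conj(Td) + (Phi_e Td^*) kron Td^T.

    The matrix is Hermitian (the two terms are conjugate transposes of
    each other when Phi_e is Hermitian), so eigvalsh applies."""
    M = np.kron(Td @ Phi_e, Td.conj()) + np.kron(Phi_e @ Td.conj().T, Td.T)
    return np.linalg.eigvalsh(M).min()

holds = True
for w in np.linspace(-np.pi, np.pi, 101):
    zinv = np.exp(-1j * w)
    # illustrative diagonal reference model: two stable first-order lags
    Td = np.diag([0.2 / (1 - 0.8 * zinv), 0.3 / (1 - 0.7 * zinv)])
    Phi_e = np.eye(2)                     # flat disturbance spectrum
    holds = holds and condition_8_21(Td, Phi_e) > 0

print(holds)   # (8.21) holds for this pair of reference models
```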
It might be argued that the condition T0(ρc) = Td is restrictive,
in that it may be difficult to ensure a priori that the controller structure
makes this possible. However, in (Hjalmarsson et al., 1998) it
is argued that the design specifications should be adapted to the chosen
controller complexity so that Td is achievable.
When the convergence point corresponds to a diagonal closed-loop
system, the following can be stated about how much the dynamics of the
different channels may deviate from each other while still guaranteeing
convergence.
Corollary 8.1  Assume that the reference signal r = [r1, …, rn]ᵀ obeys
E[rk rl] = 0, k ≠ l, i.e. the different channels of the reference are independent. Then, when T0(ρc) = Td = diag{Tkk} is diagonal,

(Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ > 0    ∀ ω ∈ [−π, π]

if and only if, for all ω ∈ [−π, π],

| arg Tkk(e^{jω}) − arg Tll(e^{jω}) | < π/2    ∀ k, l.

Proof: See Appendix 8.A.2.
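For a diagonal Td, the corollary thus reduces the matrix condition to a pairwise phase comparison, which is easy to test directly. A sketch with illustrative first-order models (not the thesis example):

```python
import numpy as np

def phase_condition(T1, T2, n=512):
    """Corollary 8.1 check: |arg T1 - arg T2| < pi/2 on [-pi, pi]."""
    w = np.linspace(-np.pi, np.pi, n)
    z = np.exp(1j * w)
    d = np.angle(T1(z)) - np.angle(T2(z))
    d = np.angle(np.exp(1j * d))          # wrap the difference to (-pi, pi]
    return np.max(np.abs(d)) < np.pi / 2

lag = lambda z: 0.2 / (1 - 0.8 / z)       # a stable first-order lag
flipped = lambda z: -lag(z)               # same dynamics, opposite sign

print(phase_condition(lag, lag))          # True: zero phase difference
print(phase_condition(lag, flipped))      # False: 180 degree difference
```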
When the controller has a diagonal form with a decoupling element,
the structure can be further exploited.
Theorem 8.2
Assume there exists a parameter vector ρ = ρc such that T0(ρc) = Td,
where Td is defined by (8.17). When the controller is defined by C(ρ) =
W Cd(ρ), where W is some arbitrary multivariable transfer operator that
is independent of ρ and Cd(ρ) is diagonal, then ρ = ρc is a stable
stationary point of the ODE (8.15) using the gradient approximation

∂y(ρ)/∂ρjkl = C⁻¹(ρ) (∂C(ρ)/∂ρjkl) T0(ρ) e(ρ)

if

(Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ > 0    (8.22)

for all ω ∈ [−π, π].
Proof: See Appendix 8.A.3.
Here ⊙ denotes the Hadamard product.
Corollary 8.2  Assume that the reference signal r = [r1, …, rn]ᵀ obeys
E[rk rl] = 0, k ≠ l, and that the controller is defined by C(ρ) = W Cd(ρ) as
in Theorem 8.2. When T0(ρc) = Td = diag{Tkk} is diagonal,

(Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ > 0

for all ω ∈ [−π, π].
Proof: See Appendix 8.A.4.
Theorem 8.3
Under the same assumptions as in Theorem 8.1, with the exception
that the controller is a proportional controller, ρ = ρc becomes a stable
stationary point of the ODE (8.15) using the gradient approximation

∂y(ρ)/∂ρjkl = C⁻¹(ρ) (∂C(ρ)/∂ρjkl) T0(ρ) e(ρ)

if

∫_{−π}^{π} ( (Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ ) dω > 0

when C = W Cd(ρ), or

∫_{−π}^{π} ( (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ ) dω > 0

when C is of full order.
Proof: See Appendix 8.A.5.
We have given sufficient conditions for local convergence around the
point T0(ρc) = Td for a steepest descent method (8.14) in which the
step size tends to zero and where the gradient of the output with respect
to the controller parameters is approximated using method M1. The
method is based on the assumption that T0(ρ)C(ρ)⁻¹C′(ρ) ≈
C(ρ)⁻¹C′(ρ)T0(ρ). All the results state conditions on the convergence
point T0(ρc) = Td. The most promising aspect of these results is that
even if Td and C⁻¹C′ do not commute, we may have convergence to the
true optimum.
In the analysis, a pure model reference criterion has been studied, and
only output-based cost functions have been considered. It is desirable to
extend the work in the future to include cost functions which also contain
input weighting, especially since the standard IFT approach can handle
all these different types of criteria, not just the criterion (8.13).
8.6 Numerical Illustrations
We will now use the gradient approximation method M1 in a numerical
example in order to demonstrate some of the results in Section 8.5. The
performance of this method will be compared with the original gradient
estimation method of IFT for MIMO systems (Hjalmarsson, 1999) (denoted the original method), which produces unbiased gradient estimates.
The parameters will be updated according to

ρj+1 = ρj − γj ∂Ĵ(ρj)/∂ρ,

where

∂Ĵ(ρj)/∂ρ = (1/N) Σ_{k=1}^{N} (∂yk(ρj)/∂ρ)ᵀ ỹk(ρj).
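One such parameter update can be sketched as a function of recorded signals (generic array shapes; the gradient signals ∂yk/∂ρ are assumed to come from a separate gradient experiment, and the synthetic check below is purely illustrative, not the thesis simulation):

```python
import numpy as np

def ift_update(rho, gamma, dy_drho, y_tilde):
    """rho_{j+1} = rho_j - gamma * (1/N) * sum_k (dy_k/drho)^T ytilde_k.

    dy_drho : (N, p, d) array of output-gradient samples
    y_tilde : (N, p) array of output errors y_k - (Td r)_k
    """
    grad = np.einsum('kpd,kp->d', dy_drho, y_tilde) / y_tilde.shape[0]
    return rho - gamma * grad

# Synthetic sanity check: with y_tilde linear in (rho - rho_star), the
# recursion drives rho toward rho_star (illustrative data only).
rng = np.random.default_rng(0)
rho_star = np.array([0.5, -0.3])
rho = np.zeros(2)
for j in range(1, 301):
    phi = rng.standard_normal((200, 1, 2))               # (N, p, d)
    y_tilde = np.einsum('kpd,d->kp', phi, rho - rho_star)
    rho = ift_update(rho, 1.0 / j, phi, y_tilde)

print(np.allclose(rho, rho_star, atol=0.01))   # converged close to rho_star
```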
The process to be controlled is the following two-by-two system

G0(q) = ⎡ −2.25q⁻¹/(1 − q⁻¹)    (−2.5 + 3q⁻¹)/(1 − 1.4q⁻¹ + 0.4q⁻²) ⎤    (8.23)
        ⎣  2.25q⁻¹/(1 − q⁻¹)    (0.5 − 0.6q⁻¹)/(1 − 1.4q⁻¹ + 0.4q⁻²) ⎦

which has a non-minimum phase zero at 1.2. The reference model is
defined by

Td(p) = ⎡ (1 − p)q⁻¹/(1 − pq⁻¹)                  0                       ⎤    (8.24)
        ⎣           0             (−0.2 + 0.24q⁻¹)/(1 − 1.6q⁻¹ + 0.64q⁻²) ⎦
Figure 8.3: Solid line: J(1, 2) as a function of p. Dashed line: J(2, 1)
as a function of p.
The system is controlled with the P-controller

C(ρ) = ⎡ ρ   0.1 ⎤
       ⎣ 5ρ  0.1 ⎦

which has only one parameter free to tune. The optimal controller will be a
function of p, see (8.24). The optimal parameter is given by ρc = (1 − p)/9.
In the simulation example we use a reference signal with the spectrum
Φr = 0.25 I, i.e. white noise.
Since a P-controller is used, Theorem 8.3 applies, which states local
convergence if

∫_{−π}^{π} ( (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ ) dω > 0.

When Td and Φr are diagonal, this condition reduces to checking

J(k, l) = ∫_{−π}^{π} |Skk|² (Tkk T̄ll + T̄kk Tll) dω > 0    ∀ k, l,

where Skk and Tkk denote the kth diagonal elements of Sd and Td,
Figure 8.4: Illustration of a small perturbation on ρ, with p =
0.8. Solid line: original method; dashed line: approximation M1; dotted
horizontal line: optimal ρ.
respectively. Figure 8.3 shows J(1, 2) and J(2, 1) as functions of the pole
p. This plot indicates that local convergence cannot be guaranteed if
p < 0.917. It is worth noticing that the analytical results only give
sufficient conditions. Furthermore, they are only valid for a sufficiently
large number of data points N. In this example N = 330.
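The quantities J(k, l) can be evaluated by numerical quadrature once the diagonal entries of Td and Sd are specified. The sketch below assumes the standard sensitivity relation Sd = I − Td; this assumption, and the quadrature itself, are an illustration rather than a reproduction of Figure 8.3:

```python
import numpy as np

def J(Tk, Tl, Sk, n=4000):
    """Midpoint-rule approximation of
    J(k,l) = int_{-pi}^{pi} |S_kk|^2 (T_kk conj(T_ll) + conj(T_kk) T_ll) dw."""
    w = np.linspace(-np.pi, np.pi, n, endpoint=False) + np.pi / n
    z = np.exp(1j * w)
    f = np.abs(Sk(z)) ** 2 * 2.0 * np.real(Tk(z) * np.conj(Tl(z)))
    return f.sum() * (2 * np.pi / n)

p = 0.8
T11 = lambda z: (1 - p) / z / (1 - p / z)          # cf. (8.24)
T22 = lambda z: (-0.2 + 0.24 / z) / (1 - 1.6 / z + 0.64 / z ** 2)
S11 = lambda z: 1 - T11(z)                         # assumed: Sd = I - Td

print(J(T11, T11, S11) > 0)   # diagonal terms J(k,k) are always positive
print(J(T11, T22, S11))       # sign of the cross term decides convergence
```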
We will now compare two different reference models, Td(p = 0.8)
and Td(p = 0.95). When we start with a controller that is perturbed
from the optimum as ρ = ρc − 0.005, the descent algorithm converges to
the true optimum for both p = 0.8 and p = 0.95, see Figure 8.4 and
Figure 8.5. Notice that the descent algorithm converges even though
p = 0.8 < 0.916, but that the convergence rate is slow compared to the
standard IFT method, which uses the true gradient.
When the initial parameter is perturbed more, ρ = ρc − 0.1, we obtain
the results in Figure 8.6 and Figure 8.7. Thus, when p = 0.8 the algorithm
diverges, but when p = 0.95 it still converges.
Figure 8.5: Illustration of a small perturbation on ρ, with p =
0.93. Solid line: original method; dashed line: approximation M1; dotted
horizontal line: optimal ρ.
For this particular example, it seems that a reference model which
obeys the sufficient condition J(k, l) > 0 has a larger region of convergence
than a convergence point for which J(k, l) < 0. The same conclusion seems
to hold when the convergence rate is considered.
To evaluate for which reference models the approximation method
M1 may work, the result of Corollary 8.1 might be useful. It states that
the phase difference between the diagonal elements of the reference model
should be less than π/2. Figure 8.8 shows the Bode plots of the different
elements of the reference models considered here. It is worth noticing
that the difference in gain between T11(0.8) and T22 is less than the difference
between T11(0.95) and T22; still, the reference model Td(0.95) is better
with respect to convergence. When considering the phase, we
see that the difference between T11(0.95) and T22 is less than the difference
Figure 8.6: Illustration of a large perturbation on ρ, with p = 0.8. Solid
line: original method; dashed line: approximation M1; dotted horizontal
line: optimal ρ.
between T11(0.8) and T22, which may explain why Td(0.95) is better than
Td(0.8).
The simulation example shows that a descent method using the approximation method M1 may converge to the true optimum even though
the commutation error is non-zero. It also illustrates that the choice of
reference model may have a large impact on the convergence of such an
algorithm.
8.7 Conclusions
In this chapter we have examined the gradient estimation problem in IFT
for MIMO systems. Since the original IFT algorithm requires 1 + m × p
experiments to compute the gradient of a system with m control signals
and p sensed outputs, we have proposed several methods to approximate
Figure 8.7: Illustration of a large perturbation on ρ, with p = 0.93. Solid
line: original method; dashed line: approximation M1; dotted horizontal
line: optimal ρ.
the gradient in order to decrease the experimental load. In particular, we
have studied a method in which operators are shifted in a similar fashion
as in IFT for SISO systems. This reduces the number of experiments
to two, regardless of the complexity of the system and the controller. The
shifting of multivariable operators almost always introduces an error,
because such operators typically do not commute. However, the analysis gives
sufficient conditions for local convergence, which show that the numerical
gradient search may still converge to the true optimum even if the commutation error is non-zero. Nevertheless, some caution has to be taken, as
illustrated in the simulation example.
So far, only output-based cost functions have been considered. It is
desirable to extend the work in the future to include cost functions which
also contain input weighting.
Figure 8.8: Bode plot of T11(0.8) (dashed), T11(0.95) (solid) and T22
(dotted).
8.A Proofs

8.A.1 Proof of Theorem 8.1
A sufficient condition for the ODE (8.15) to be locally stable is that the
linearized system

dρ/dt = − [ ∂/∂ρ (∂Ĵ(ρ)/∂ρ) ]_{ρ=ρc} ρ    (8.25)

is stable at the stationary point, i.e. the eigenvalues of ∂/∂ρ (∂Ĵ(ρc)/∂ρ)
must have positive real parts. A sufficient condition for the real parts of
the eigenvalues of ∂/∂ρ (∂Ĵ(ρc)/∂ρ) to be positive is that

Ĵpp(ρc) = ∂/∂ρ (∂Ĵ(ρc)/∂ρ) + [ ∂/∂ρ (∂Ĵ(ρc)/∂ρ) ]ᵀ    (8.26)
is positive definite. Introduce

Ajkl(ρ) = C⁻¹(ρ) ∂C(ρ)/∂ρjkl    (8.27)

and let C⁻¹ = [ a1 … an ], where ai is a column vector. Since each
element of (8.18) is individually parameterized, Ajkl will be of rank one
and thus can be expressed as

Ajkl = Bl aj εkᵀ    (8.28)
where εk is the unit vector with the kth element equal to one. Using
these expressions, an arbitrary element of ∂/∂ρ (∂Ĵ(ρc)/∂ρ) can be
expressed as

∂/∂ρjkl E[ (∂y(ρ)/∂ρrst)ᵀ (T0(ρ) − Td) r ] |_{ρ=ρc}
    = E[ (Arst(ρc) Td e(ρc))ᵀ Td Ajkl(ρc) e(ρc) ]
    = (1/2π) ∫_{−π}^{π} Tr{ Td Ajkl(ρc) Φe(ω) (Arst(ρc) Td)* } dω.    (8.29)
With (8.28) we obtain

Tr{Td Ajkl Φe (Arst Td)*} = Bl B̄t Tr{Td aj εkᵀ Φe Td* εs ar*}
                         = Bl B̄t εkᵀ Φe Td* εs ar* Td aj    (8.30)
                         = Bl B̄t (Φe Td*)sk ar* Td aj.
Let Q = Td Φe = [Qij] and introduce the matrix

P = ⎡ Q̄11 a1* Td a1  · · ·  Q̄11 a1* Td an   Q̄12 a1* Td a1  · · · ⎤
    ⎢       ⋮        ⋱          ⋮                 ⋮             ⎥
    ⎢ Q̄11 an* Td a1  · · ·  Q̄11 an* Td an   Q̄12 an* Td a1  · · · ⎥    (8.31)
    ⎢ Q̄21 a1* Td a1  · · ·  Q̄21 a1* Td an                  · · · ⎥
    ⎣       ⋮                     ⋮                         ⋱   ⎦

i.e. the block matrix whose (i, j) block is Q̄ij [ar* Td as]_{r,s=1,…,n};
then, using (8.29) together with (8.30) and (8.31), ∂/∂ρ (∂Ĵ(ρc)/∂ρ) can
be expressed as

∂/∂ρ (∂Ĵ(ρc)/∂ρ) = (σ²/2π) ∫_{−π}^{π} Γ̄D P ΓDᵀ dω
                 = (σ²/2π) ∫_{−π}^{π} Γ̄D Ξ* ((Td Φe) ⊗ Td) Ξ ΓDᵀ dω    (8.32)
with ΓD and Ξ being block diagonal matrices with each diagonal block
equal to Γ and C⁻¹, respectively. Using the definition (8.26) and
(8.32), we obtain

Ĵpp(ρc) = (σ²/2π) ∫_{−π}^{π} MJ dω    (8.33)

where MJ is the factorization

MJ = ΓD Ξᵀ ( (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ ) Ξ̄ ΓD*,    (8.34)

in which the middle factor will be denoted H. If x ∈ R^{mn²}, then

xᵀ Ĵpp x = (σ²/2π) ∫_{−π}^{π} F(e^{jω}) H(ω) F(e^{jω})* dω    (8.35)

where F(e^{jω}) = xᵀ ΓD is a multiple-input single-output filter, and thus
xᵀ Ĵpp x ≥ 0 with equality if and only if x = 0. This holds under the
assumption that H > 0 for all ω ∈ [−π, π]. Furthermore, H > 0 if
and only if (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ > 0, since Ξ is square and has
full rank for all ω ∈ [−π, π]. This concludes the proof.
8.A.2 Proof of Corollary 8.1

When E[rk rl] = 0, k ≠ l, and Td is diagonal, Φe becomes diagonal,
i.e. Φe = diag[|Skk|² Φrk]. This implies that (Td Φe) ⊗ T̄d + (Φe Td*) ⊗
Tdᵀ = diag[|Skk|² Φrk (Tkk T̄ll + Tll T̄kk)]. Hence (Td Φe) ⊗ T̄d + (Φe Td*) ⊗
Tdᵀ > 0 if and only if Tkk T̄ll + Tll T̄kk > 0 for all k, l and ω, and
Tkk T̄ll + Tll T̄kk > 0 if and only if | arg Tkk(e^{jω}) − arg Tll(e^{jω}) | < π/2 for
all k, l and ω.
8.A.3 Proof of Theorem 8.2

Introduce the extended process G̃0 = G0 W and let Cd = diag[Cii];
then the outline of the proof of Theorem 8.1 can be followed. Let Q =
Td Φe = [Qij]. By exploiting the diagonal structure of the controller,
(8.30) becomes

Tr{Td Ajjt Φe (Arrl Td)*} = Bl B̄t Cjj⁻¹ C̄rr⁻¹ Trj Q̄rj    (8.36)

and again Ĵpp(ρc) can be written as (8.33), but now with

MJ = ΓD C⁻¹ ( (Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ ) C̄⁻¹ ΓD*.    (8.37)

Then Ĵpp(ρc) > 0 if (Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ > 0 for all ω ∈ [−π, π].
This follows from the factorization (8.37) and the proof of Theorem 8.1.
8.A.4 Proof of Corollary 8.2

When E[rk rl] = 0, k ≠ l, and Td is diagonal, Φe becomes diagonal,
i.e. Φe = diag[|Skk|² Φrk]. This implies that (Td Φe) ⊙ T̄d =
diag[|Skk|² Φrk |Tkk|²], which is obviously positive, and hence
(Td Φe) ⊙ T̄d + (Φe Td*) ⊙ Tdᵀ > 0.
8.A.5 Proof of Theorem 8.3

In the proof of Theorem 8.1, use the fact that a P-controller is frequency
independent. Then a sufficient condition for (8.33) to be positive definite
is that

∫_{−π}^{π} ( (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Tdᵀ ) dω > 0.

A similar reasoning holds for the diagonal case.
Chapter 9

Summary and Suggestions for Future Work
Several different problems related to experiment design have been studied
in this thesis. The main focus has been on optimal experiment design for
identification but also experiment design for controller tuning based on
closed-loop data has been considered. This chapter briefly summarizes
the main contributions and presents suggestions for future work.
9.1 Summary
Chapter 3 – Fundamentals of Optimal Experiment Design
Typical optimal experiment design formulations are non-convex and infinite-dimensional, which makes them intractable from an optimization point of view.
A flexible framework for translating such problems into finite-dimensional
convex programs is introduced in this chapter. The framework allows for
a large class of quality constraints, including e.g. the classical A-,
E-, D- and L-optimal constraints. A more important contribution from
a control perspective is the introduction of different types of frequency-by-frequency constraints on the frequency function estimate. Here both
variance constraints and certain types of constraints that are guaranteed
to hold in a confidence region are considered.
The key to obtaining tractable experiment design formulations lies in the
parametrization of the design spectrum. Two general representations are
introduced that nicely generalize previously presented parametrizations.
They have in common that they are linear and finite-dimensional
parametrizations of the spectrum, or of a partial expansion thereof. Both
continuous and discrete spectra can be handled, and constraints on these
spectra can be included in the design formulation, either in terms of
power bounds or as frequency-wise constraints.
Design in both open and closed loop is considered. Furthermore, results on input design for models with biased noise dynamics are presented.
Chapter 4 – Finite Sample Input Design for Linearly Parametrized
Models
Most experiment designs rely on uncertainty descriptions that are valid
asymptotically in the sample size. In this chapter, a certain class of linearly parametrized models is considered for which exact variance expressions can be obtained for finite sample sizes. Based on these expressions,
two solutions that perform the optimization over the square of the DFT
coefficients of the input are presented.
Chapter 5 – Applications
A comparison is performed between the use of optimal inputs and the use
of standard identification input signals, for example PRBS signals, for two
benchmark problems taken from process control and control of flexible
mechanical structures. The results show a substantial reduction in experiment time and input excitation level when optimal inputs are used.
Monte-Carlo simulations indicate that there are advantages also
in the case where the true system is replaced by a model estimate in the
input design.
Chapter 6 – Input Design for Identification of Zeros
Accurate identification of a non-minimum phase zero, zo, is the topic of Chapter 6. It is shown that the optimal input is characterized by the autocorrelations E[u(t)u(t − k)] = α zo^{−|k|} for FIR and ARX model structures.
This is also true for general linear models, asymptotically in the model
order. Furthermore, it is shown that the variance of an estimated zero is
independent of the model order when the optimal input is applied, provided the model order is larger than or equal to the true order. A numerical solution
has been derived for general linear models of finite order. An example
illustrates that the optimal input may be very different depending on
the model structure and order.
It is also shown that the variance can be reduced significantly using
optimally designed inputs compared to white inputs and square waves, especially when the model is over-parameterized. Moreover, a
solution where the true zero is replaced by an estimated zero is quite
robust with respect to the estimated zero location.
Chapter 7 – Convex Computation of Worst Case Criteria
The main contributions are the introduction of a generalized cost function, which e.g. includes the worst case Vinnicombe distance, and a method
that computes an upper bound on this cost function without using frequency gridding. The gridding is avoided by introducing a relaxation of
a certain variable and by using the Kalman-Yakubovich-Popov lemma.
Chapter 8 – Gradient Estimation in IFT for Multivariable Systems
The gradient estimation problem in IFT for MIMO systems may be experimentally very costly. Several methods to approximate the gradient are
suggested in Chapter 8. A method in which the order of operators is
shifted, in a similar fashion as in IFT for SISO systems, is analyzed in
more depth. This reduces the number of experiments to two regardless of the
complexity of the system and the controller. Multivariable operators do
not in general commute. However, the analysis gives sufficient conditions
for local convergence, which show that the numerical gradient search may
still converge to the true optimum even if the commutation error is non-zero. Nevertheless, caution has to be taken, as illustrated in the simulation
example.
9.2 Future Work
The work in this thesis has generated several ideas for future work. Some
suggestions are:
• Extend the results and study optimal experiment design for multi-input/multi-output systems.
• The Kalman-Yakubovich-Popov lemma is frequently used to handle frequency-wise constraints. The existing formulations of these
constraints are computationally costly, and a reformulation would
be useful.
• Optimal experiment designs depend in most cases on the true underlying system. Further knowledge of the sensitivity of a design
based on an estimate of the true system may be useful for designing
more robust experiments.
• Examples show that accurate models can be obtained by a proper
choice of excitation even though the model order is lower than the
true order. This should be further explored and connections to
non-linear systems are of high relevance.
• In Chapter 6, input design for identification of zeros is considered.
In the future, other quantities such as poles, as well as important
properties such as gain and phase margins, would be interesting to
study.
• The method for the computation of an upper bound in Chapter 7
has the disadvantage that a relaxation is imposed, leading to conservatism. It is of great interest to further investigate how conservative these results may be and whether there are ways around this.
This also applies to the results in Section 3.7.
• For the gradient estimation problem in Chapter 8 only output based
cost functions have been considered. It is desirable to extend the
work in the future to include cost functions which also contain input
weighting.
Bibliography
Alkire, B. and L. Vandenberghe (2002). Convex optimization problems
involving finite autocorrelation sequences. Mathematical Programming Series A 93, 331–359.
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Prentice-Hall. New Jersey.
Bittanti, S., M. C. Campi and S. Garatti (2002). New results on the
asymptotic theory of system identification for the assessment of the
quality of estimated models. In: Proc. 41st IEEE Conf. on Decision
and Control. Las Vegas, Nevada, USA.
Bohlin, T. (1991). Interactive System Identification: Prospects and Pitfalls. Springer-Verlag.
Bombois, X., B. D. O. Anderson and M. Gevers (2000a). Mapping parametric confidence ellipsoids to Nyquist plane for linearly
parametrized transfer functions. In: Model Identification and Adaptive Control. pp. 53–71. Springer Verlag.
Bombois, X., B.D.O. Anderson and M. Gevers (2004a). Quantification of
frequency domain error bounds with guaranteed confidence level in
Prediction Error Identification. Systems and Control Letters. Submitted.
Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den
Hof (2004b). Cheapest open-loop identification for control. In: Proc.
43th IEEE Conf. on Decision and Control. Bahamas.
Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den
Hof (2004c). Least costly identification experiment for control. In:
MTNS’04. Leuven, Belgium.
Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den Hof
(2004d). Least costly identification experiment for control. Automatica. Submitted.
Bombois, X., M. Gevers and G. Scorletti (1999). Controller validation
based on an identified model. In: Proc. 38th IEEE Conf. on Decision and Control. Phoenix, Arizona, USA.
Bombois, X., M. Gevers and G. Scorletti (2000b). A measure of robust stability for an identified set of parametrized transfer functions.
IEEE Trans. Automatic Control 45, 2141–2145.
Bombois, X., M. Gevers, G. Scorletti and B. D. O. Anderson (2001). Robustness analysis tool for an uncertainty set obtained by prediction
error identification. Automatica 37, 1629–1636.
Box, G. E. P. and G. M. Jenkins (1970). Time Series Analysis, Forecasting and Control. Holden-Day. San Francisco.
Boyd, S. and C. Barratt (1991). Linear controller design: Limits of performance. Prentice-Hall.
Boyd, S. and L. Vandenberghe (2003). Convex Optimization. Cambridge
University Press.
Boyd, S., L. El Ghaoui, E. Feron and V. Balakrishnan (1994). Linear matrix
inequalities in systems and control theory, Studies in Applied
Mathematics. SIAM. Philadelphia.
De Bruyne, F. and P. Carrette (1997). Synthetic generation of the gradient
for an iterative controller optimization method. In: Proceedings of the
European Control Conference. Brussels, Belgium.
Byrnes, C.I., S.V. Gusev and A. Lindquist (2001). From finite covariance
windows to modeling filters: A convex optimization approach. SIAM
Review 43, 645–675.
Campi, M. C. and E. Weyer (2002). Finite sample properties of system
identification methods. IEEE Trans. Automatic Control 47, 1329–
1334.
Cooley, B. L. and J. H. Lee (2001). Control-relevant experiment design
for multivariable systems described by expansions in orthonormal
bases.. Automatica 37, 273–281.
De Bruyne, F., B.D.O. Anderson, M. Gevers and N. Linard (1997). Iterative controller optimization for nonlinear systems. In: Proc. 36th
IEEE Conf. on Decision and Control. San Diego, California.
Dumitrescu, B., I. Tăbuş and P. Stoica (2001). On the Parameterization
of Positive Real Sequences and MA Parameter Estimation. IEEE
Transactions on Signal Processing AC-49, 2630–2639.
Eykhoff, P. (1974). System Identification, Parameter and State Estimation. Wiley.
Fedorov, V. V. (1972). Theory of Optimal Experiments, volume 12 of
Probability and Mathematical Statistics. Academic Press.
Forssell, U. and L. Ljung (2000). Some results on optimal experiment
design. Automatica 36(5), 749–756.
Garatti, S., M. C. Campi and S. Bittanti (2003). Model quality assessment for instrumental variable methods: use of the asymptotic theory in practice. In: Proc. 42nd IEEE Conf. on Decision and Control.
Maui, Hawaii, USA.
Gevers, M. (1993). Towards a joint design of identification and control?
In: Essays on Control: Perspectives in the Theory and its Applications (H. L. Trentelman and J. C. Willems, Eds.). Birkhäuser.
Gevers, M. and L. Ljung (1986). Optimal experiment designs with respect
to the intended model application. Automatica 22, 543–554.
Gevers, M., X. Bombois, B. Codrons, G. Scorletti and B. D. O. Anderson
(2003). Model validation for control and controller validation in a
prediction error identification framework–Part I: theory. Automatica
39, 403–415.
Gillberg, Jonas and Anders Hansson (2003). Polynomial complexity for a
Nesterov-Todd potential-reduction method with inexact search directions. In: Proceedings of thet 42nd IEEE Conference on Decision
and Control. Maui, Hawaii, USA. p. 6.
Goodwin, G. C. (1982). Experiment design. In: 6th IFAC Symposium on
System Identification. Washington D.C. pp. 19–26.
Goodwin, G.C. and R.L. Payne (1977). Dynamic System Identification:
Experiment Design and Data Analysis, volume 136 of Mathematics
in Science and Engineering. Academic Press.
Grenander, U. and G. Szegö (1958). Toeplitz Forms and Their Applications. University of California Press. Berkley, CA.
Gustavsson, I., L. Ljung and T. Söderström (1977). Identification of processes in closed loop – identifiability and accuracy aspects. Automatica 13, 59–75.
Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear
Systems. J. Wiley & Sons. N. Y.
Hansson, Anders and L. Vandenberghe (2001). A primal-dual potential
reduction method for integral quadratic constraints. In: 2001 American Control Conference. Arlington, Virginia. pp. 3013–3018.
Hildebrand, R. and M. Gevers (2003). Identification for control: Optimal
input design with respect to a worst case ν-gap cost function. SIAM
Journal on Control and Optimization 41(5), 1586–1608.
Hindi, H., B. Hassibi and S. Boyd (1998). Multiobjective H2 /H∞ optimal control via finite-dimensional Q-parametrization and linear
matrix inequalities. In: American Control Conference (ACC 1998).
Philadelphia, Pennsylvania, USA.
Hjalmarsson, H. (1998). Control of nonlinear systems using Iterative
Feedback Tuning. In: Proc. 1998 American Control Conference.
Philadelphia. pp. 2083–2087.
Hjalmarsson, H. (1999). Efficient tuning of linear multivariable controllers
using iterative feedback tuning. International Journal on Adaptive
Control and Signal Processing 13, 553–572.
Hjalmarsson, H. (2002). Iterative feedback tuning–an overview. International Journal on Adaptive Control and Signal Processing 16, 373–
395.
Hjalmarsson, H. (2003). From experiments to closed loop control. In:
13th IFAC Symposium on System Identification. Rotterdam, The
Netherlands.
Hjalmarsson, H. (2004). From experiments to control. Automatica.
Hjalmarsson, H. and B. Ninness (2004). An exact finite sample variance
expression for linearly parameterized frequency function estimates.
Automatica. Submitted.
Hjalmarsson, H. and H. Jansson (2003). Using a sufficient condition to analyze the interplay between identification and control. In: 13th IFAC
Symposium on System Identification. Rotterdam, The Netherlands.
Hjalmarsson, H. and K. Lindqvist (2002). Identification of performance
limitations in control using ARX-models. In: Proceedings of The
15th IFAC World Congress.
Hjalmarsson, H. and M. Gevers (2003). Algorithms and applications of
iterative feedback tuning. Special Section in Control Engineering
Practice 11, 1021.
Hjalmarsson, H., M. Gevers and F. De Bruyne (1996). For model-based
control design, closed loop identification gives better performance.
Automatica 32(12), 1659–1673.
Hjalmarsson, H., M. Gevers and O. Lequin (1998). Iterative Feedback
Tuning: Theory and Applications. IEEE Control Systems Magazine
18(4), 26–41.
Hjalmarsson, H., S. Gunnarsson and M. Gevers (1994). A convergent
iterative restricted complexity control design scheme. In: Proc. 33rd
IEEE Conf. on Decision and Control. Orlando, FL. pp. 1735–1740.
Jacobsen, E. W. (1994). Identification for control of strongly interactive
plants. In: AIChe Annual Meeting. San Francisco, CA.
Jansson, H. and H. Hjalmarsson (2004a). A framework for mixed H∞
and H2 input design. In: MTNS’04. Leuven, Belgium.
Jansson, H. and H. Hjalmarsson (2004b). A general framework for mixed
H∞ and H2 input design. IEEE Trans. Aut. Contr. submitted.
Jansson, H. and H. Hjalmarsson (2004c). Mixed H∞ and H2 input design
for identification. In: CDC’04. Bahamas.
Johansson, R. (1993). System Modeling and Identification. Information
and System Sciences Series. Prentice-Hall. New Jersey.
Jönsson, U. (1996). Robustness analysis of uncertain and nonlinear systems. Phd thesis. Lund Institute of Technology, Lund, Sweden.
Kao, C.Y., A. Megretski and U. Jönsson (2004). Specialized fast algorithms for IQC feasibility and optimization problems. Automatica
40(2), 239–252.
Karlin, S. and W. Studden (1966a). Optimal experimental designs. Ann.
Math. Stat. 37, 783–815.
Karlin, S. and W. Studden (1966b). Tscebycheff Systems with Applications to Analysis and Statistics. Wiley-Interscience.
Kiefer, J. (1959). Optimum experimental designs. J. Royal Stat. Soc.
21, 273–319.
Kiefer, J. and J. Wolfowitz (1959). Optimum designs in regression problems. Ann. Math. Stat. 30, 271–294.
Lacy, S.L., D.S. Bernstein and R.S. Erwin (2003). Finite-horizon input
selection for system identification. In: IEEE Conference on Decision
and Control. IEEE. Maui, Hawaii, USA.
Landau, I. D., D. Rey, A. Karimi, A. Voda and A. Franco (1995a). A flexible transmission system as a benchmark for robust digital control.
European Journal of Control 1(2), 77–96.
Landau, I.D., D. Rey, A. Karimi, A. Voda and A. Franco (1995b). A flexible transmission system as a benchmark for robust digital control.
Journal of Process Control 1, 77–96.
Lee, J. H. (2003). Control-relevant design of periodic test input signals for iterative open-loop identification of multivariable FIR systems. In:
13th IFAC Symposium on System Identification. Rotterdam, The
Netherlands.
Lee, W. S., B. D. O. Anderson, R. L. Kosut and I. M. Y. Mareels (1993). A new approach to adaptive robust control. Int. J. of Adaptive Control and Signal Processing 7, 183–211.
Levin, M. J. (1960). Optimal estimation of impulse response in the presence of noise. IRE Transactions on Circuit Theory CT-7, 50–56.
Lindqvist, K. (2001). On experiment design in identification of smooth
linear systems. Licentiate thesis, TRITA-S3-REG-0103.
Lindqvist, K. and H. Hjalmarsson (2000). Optimal input design using linear matrix inequalities. In: Proc. 12th IFAC Symposium on System
Identification. Santa Barbara, California, USA.
Lindqvist, K. and H. Hjalmarsson (2001). Identification for control:
Adaptive input design using convex optimization. In: Conference
on Decision and Control. IEEE. Orlando, Florida, USA.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE
Transactions on Automatic Control AC-22(4), 551–575.
Ljung, L. (1985). Asymptotic variance expressions for identified black-box
transfer function models. IEEE Transactions on Automatic Control
AC-30, 834–844.
Ljung, L. (1999). System Identification: Theory for the User, 2nd ed.
PTR Prentice Hall. Upper Saddle River, N.J.
Ljung, L. and Z.D. Yuan (1985). Asymptotic properties of black-box
identification of transfer functions. IEEE Transactions on Automatic
Control AC-30, 514–530.
Mårtensson, J. and H. Hjalmarsson (2003). Identification of performance limitations in control using general SISO-models. In: 13th IFAC Symposium on System Identification. Rotterdam, The Netherlands.
Megretski, A. and A. Rantzer (1997). System analysis via integral
quadratic constraints. IEEE Transactions on Automatic Control
AC-42(6), 819–830.
Mehra, R. K. (1974). Optimal input signals for parameter estimation
in dynamic systems–survey and new results. IEEE Transactions on
Automatic Control AC-19, 753–768.
Mehra, R. K. (1981). Choice of input signals. In: Trends and Progress in
System Identification (P. Eykhoff, Ed.). Pergamon Press, Oxford.
Milanese, M. and A. Vicino (1991). Optimal estimation theory for dynamic systems with set membership uncertainty: An overview. Automatica 27(6), 997–1009.
Morari, M. and E. Zafiriou (1989). Robust Process Control. Prentice-Hall.
Englewood Cliffs, NJ.
Nesterov, Y. and A. Nemirovski (1994). Interior-Point Polynomial Algorithms in Convex Programming. Studies in Applied Mathematics 13. SIAM, Philadelphia, PA.
Ng, T. S., G. C. Goodwin and R. L. Payne (1977a). On maximal accuracy
estimation with output power constraints. IEEE Trans. Automatic
Control 22, 133–134.
Ng, T. S., G. C. Goodwin and T. Söderström (1977b). Optimal experiment design for linear systems with input-output constraints. Automatica 13, 571–577.
Ninness, B. and H. Hjalmarsson (2002a). Exact quantification of variance
error. In: 15th World Congress on Automatic Control. Barcelona,
Spain.
Ninness, B. and H. Hjalmarsson (2002b). Quantification of variance error.
In: 15th World Congress on Automatic Control. Barcelona, Spain.
Ninness, B. and H. Hjalmarsson (2003). The analysis of variance error:
Quantifications exact for finite model order. In: IEEE Conference
on Decision and Control. IEEE. Maui, Hawaii, USA.
Ninness, B. and H. Hjalmarsson (2004). Variance error quantifications
that are exact for finite model order. IEEE Transactions on Automatic Control.
Ninness, B., H. Hjalmarsson and F. Gustafsson (1999). The fundamental role of general orthonormal bases in system identification. IEEE
Transactions on Automatic Control AC-44, 1384–1406.
Ogunnaike, B. A. (1996). A contemporary industrial perspective on control theory and practice. Annual Reviews in Control 20, 1–8.
Payne, R.L. and G.C. Goodwin (1974). Simplification of frequency domain experiment design for SISO systems. Technical Report 74/3.
Imperial College, London.
Pintelon, R. and J. Schoukens (2001). System Identification: A Frequency Domain Approach. IEEE Press.
Porat, B. (1994). Digital Processing of Random Signals. Prentice-Hall.
Englewood Cliffs, NJ.
Rantzer, A. (1996). On the Kalman-Yakubovich-Popov lemma. Systems and Control Letters 28(1), 7–10.
Rivera, D.E., H. Lee, M.W. Braun and H.D. Mittelmann (2003). Plant-friendly system identification: a challenge for the process industries.
In: 13th IFAC Symposium on System Identification. Rotterdam, The
Netherlands. pp. 917–922.
Rivera, D.E., M.W. Braun and H.D. Mittelmann (2002). Constrained multisine inputs for plant-friendly identification of chemical processes. In: 15th IFAC World Congress. Barcelona, Spain.
Samyudia, Y. and J.H. Lee (2000). A two-step approach to control-relevant design of test input signals for iterative system identification. In: 12th IFAC Symposium on System Identification. Santa
Barbara, California, USA.
Schur, I. (1918). On power series which are bounded in the interior of the unit circle I and II. J. Reine Angew. Math. 148, 122–145.
Shirt, R.W., T.J. Harris and D.W. Bacon (1994). Experimental design considerations for dynamic systems. Ind. Eng. Chem. Res. 33, 2656–2667.
Sjöberg, J. and F. De Bruyne (1999). On a nonlinear controller tuning
strategy. In: 14th IFAC World Congress. Vol. I. Beijing, P.R. China.
pp. 343–348.
Sjöberg, J. and M. Agarwal (1996). Model-free repetitive control design
for nonlinear systems. In: Proc. 35th IEEE Conf. on Decision and
Control. Vol. 4. Kobe, Japan. pp. 2824–2829.
Sjöberg, J. and M. Agarwal (1997). Nonlinear controller tuning based
on linearized time-variant model. In: Proc. of American Control
Conference. Albuquerque, New Mexico.
Skogestad, S. (2003). Simple analytic rules for model reduction and PID
controller tuning. Journal of Process Control 13, 291–309.
Skogestad, S. and I. Postlethwaite (1996). Multivariable Feedback Control: Analysis and Design. John Wiley and Sons.
Söderström, T. and P. Stoica (1989). System Identification. Prentice-Hall
International. Hemel Hempstead, UK.
Stoica, P. and R. Moses (1997). Introduction to Spectral Analysis. Prentice
Hall. New Jersey.
Stoica, P. and T. Söderström (1982). A useful parameterization for optimal experiment design. IEEE Trans. Aut. Contr. AC-27(4), 986–989.
Stoica, P., T. McKelvey and J. Mari (2000). MA estimation in polynomial time. IEEE Transactions on Signal Processing 48, 1999–2012.
Trulsson, E. and L. Ljung (1985). Adaptive control based on explicit
criterion minimization. Automatica 21(4), 385–399.
Tulleken, H. J. A. (1990). Generalized binary noise test-signal concept for improved identification-experiment design. Automatica 26(1), 37–49.
Van den Hof, P. M. J. and R.J.P. Schrama (1995). Identification and
control - closed loop issues. Automatica 31(12), 1751–1770.
Vinnicombe, G. (1993). Frequency domain uncertainty and the graph topology. IEEE Transactions on Automatic Control AC-38(9), 1371–1383.
Wahlberg, B. (1991). System identification using Laguerre models. IEEE Transactions on Automatic Control AC-36(5), 551–562.
Wahlberg, B. (1994). System identification using Kautz models. IEEE Transactions on Automatic Control AC-39(6), 1276–1281.
Wahlberg, B. and L. Ljung (1992). Hard frequency-domain model error bounds from least-squares like identification techniques. IEEE Transactions on Automatic Control AC-37(7), 900–912.
Wallin, R., A. Hansson and L. Vandenberghe (2003). Comparison of two structure-exploiting optimization algorithms for integral quadratic constraints. In: 4th IFAC Symposium on Robust Control Design. Milan, Italy.
Weyer, E. and M. C. Campi (2002). Non-asymptotic confidence ellipsoids
for the least-square estimate. Automatica 38, 1539–1547.
Whitaker, H.P., J. Yamron and A. Kezer (1958). Design of model-reference adaptive control systems for aircraft. Technical Report R–164, Instrumentation Laboratory. MIT, MA.
Wu, S-P., S. Boyd and L. Vandenberghe (1996). FIR filter design via semidefinite programming and spectral factorization. In: Proc. 35th IEEE Conf. on Decision and Control. Kobe, Japan. pp. 271–276.
Xie, L.-L. and L. Ljung (2001). Asymptotic variance expressions for estimated frequency functions. IEEE Transactions on Automatic Control AC-46, 1887–1899.
Yakubovich, V. A. (1962). Solution of certain matrix inequalities occurring in the theory of automatic control. Dokl. Akad. Nauk SSSR pp. 1304–1307.
Yuan, Z. D. and L. Ljung (1984). Black-box identification of multivariable
transfer functions – asymptotic properties and optimal input design.
Int. J. Control 40(2), 233–256.
Yuan, Z. D. and L. Ljung (1985). Unprejudiced optimal open loop
input design for identification of transfer functions. Automatica
21(6), 697–708.
Zang, Z., R. R. Bitmead and M. Gevers (1995). Iterative weighted least-squares identification and weighted LQG control design. Automatica
31, 1577–1594.
Zarrop, M. (1979). Optimal Experiment Design for Dynamic System Identification. Lecture Notes in Control and Information Sciences 21. Springer-Verlag, Berlin.
Zhou, K., J.C. Doyle and K. Glover (1996). Robust and Optimal Control.
Prentice Hall, New Jersey.
Zhu, Y. C. (1998). Multivariable process identification for MPC: the
asymptotic method and its applications. J. Proc. Control 8(2), 101–
115.
Zhu, Y. C. (2001). Multivariable System Identification for Process Control. Pergamon. Oxford.
Zhu, Y. C. and P. P. J. van den Bosch (2000). Optimal closed-loop identification test design for internal model control. Automatica 36, 1237–
1241.