Experiment Design with Applications in Identification for Control

Henrik Jansson

TRITA-S3-REG-0404
ISSN 1404-2150
ISBN 91-7283-905-8

Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
Stockholm, Sweden, 2004

Submitted to the School of Electrical Engineering, Royal Institute of Technology, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright 2004 by Henrik Jansson
Experiment Design with Applications in Identification for Control
Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
SE-100 44 Stockholm, Sweden

Abstract

The main part of this thesis focuses on optimal experiment design for system identification within the prediction error framework. A rather flexible framework for translating optimal experiment design into tractable convex programs is presented. The design variables are the spectral properties of the external excitations. The framework allows for any linear and finite-dimensional parametrization of the design spectrum, or a partial expansion thereof; this includes both continuous and discrete spectra. Constraints on these spectra can be included in the design formulation, either as power bounds or as frequency-wise constraints. As quality constraints, general linear functions of the asymptotic covariance matrix of the estimated parameters can be included. Here, different types of frequency-by-frequency constraints on the frequency function estimate are expected to be an important contribution to the area of identification and control. For a certain class of linearly parametrized frequency functions it is possible to derive variance expressions that are exact for finite sample sizes. Based on these variance expressions, it is shown that optimization over the squared Discrete Fourier Transform (DFT) coefficients of the input leads to convex optimization problems.
The optimal input designs are compared to standard identification input signals for two benchmark problems. The results show significant benefits of appropriate input design. Knowledge of the location of non-minimum phase zeros is very useful when designing controllers. Both analytical and numerical results on input design for accurate identification of non-minimum phase zeros are presented. A method is presented for computing an upper bound on the maximum over frequency of a worst case quality measure, e.g. the worst case performance achieved by a controller over an ellipsoidal uncertainty region. Until now, this problem has been solved by frequency gridding; here, gridding is avoided by using the Kalman-Yakubovich-Popov lemma. The last chapter studies experiment design from the perspective of controller tuning based on experimental data. Iterative Feedback Tuning (IFT) is an algorithm that utilizes sensitivity information from closed-loop experiments for controller tuning. The method is experimentally costly when multivariable systems are considered. Several methods are proposed to reduce the experimental time by approximating the gradient of the cost function. One of these methods uses the same technique of shifting the order of operators as is used in IFT for scalar systems. This method is further analyzed and sufficient conditions for local convergence are derived.

Acknowledgments

The time as a PhD student has been a great experience. I would like to express my sincere gratitude to my supervisors during these years: Professor Bo Wahlberg, Professor Håkan Hjalmarsson, and Docent Anders Hansson. You have all been very helpful. A special thanks goes to Håkan. It has been really inspiring working with you. You are truly creative and I have learned a lot. I really appreciate the time you have spent on me. If I had to pay "OB-tillägg" (the Swedish supplement for inconvenient working hours), I would be a poor guy.
Many thanks go to my collaborators and co-authors Kristian Lindqvist, Jonas Mårtensson and Märta Barenthin. It has been great fun working with you. Without you, there would be no SYSID Lab. I would also like to express my gratitude to Karin Karlsson-Eklund for helping me with different problems and for being supportive. It has been a pleasure to be at the department. For this, I would like to thank all colleagues, former and present. I have really enjoyed your company. Finally, I would like to thank my family for their support.

Contents

1 Introduction
  1.1 System Identification - A Starter
  1.2 Optimal Experiment Design - A Background
    1.2.1 Design for Full Order Models
    1.2.2 Design for High Order Models
    1.2.3 The Return of Full Order Modeling
    1.2.4 Other Contributions
  1.3 Iterative Feedback Tuning
  1.4 Contributions and Outline

2 Parameter Estimation and Connections to Input Design
  2.1 Parameter Estimation
  2.2 Uncertainty in the Parameter Estimates
    2.2.1 Parameter Covariance
    2.2.2 Confidence Bounds for Estimated Parameters
  2.3 Uncertainty of Frequency Function Estimates
    2.3.1 Variance of Frequency Function Estimates
    2.3.2 Uncertainty Descriptions Based on Parametric Confidence Regions
  2.4 Summary

3 Fundamentals of Experiment Design
  3.1 Introduction
    3.1.1 Quality Constraints
    3.1.2 Signal Constraints
    3.1.3 Objective Functions
    3.1.4 An Introductory Example
  3.2 Parametrization and Realization of the Input Spectrum
    3.2.1 Introduction
    3.2.2 Finite Dimensional Spectrum Parametrization
    3.2.3 Partial Correlation Parametrization
    3.2.4 Summary
  3.3 Parametrizations of the Covariance Matrix
    3.3.1 Complete Parametrizations of the Covariance Matrix
    3.3.2 A Parametrization Based on a Finite Dimensional Spectrum
    3.3.3 Summary
  3.4 Parametrization of Signal Constraints
    3.4.1 Parametrization of Power Constraints
    3.4.2 Parametrization of Point-wise Constraints
  3.5 Experiment Design in Closed-loop
    3.5.1 Spectrum Representations
    3.5.2 Experiment Design in Closed-loop with a Fix Controller
    3.5.3 Experiment Design in Closed-loop with a Free Controller
  3.6 Quality Constraints
    3.6.1 Convex Representation of Quality Constraints
    3.6.2 Application of the KYP-lemma to Quality Constraints
  3.7 Quality Constraints in Ellipsoidal Regions
    3.7.1 Reformulation as a Convex Problem
    3.7.2 A Finite Dimensional Formulation
  3.8 Biased Noise Dynamics
    3.8.1 Weighted Variance Constraints
    3.8.2 Parametric Confidence Ellipsoids
  3.9 Computational Aspects
  3.10 Robustness Aspects
    3.10.1 Input Spectrum Parametrization
    3.10.2 Working with Sets of a Priori Models
    3.10.3 Adaptation
    3.10.4 Low and High Order Models and Optimal Input Design
  3.11 Framework Review and Numerical Illustration

4 Finite Sample Input Design for Linearly Parametrized Models
  4.1 Introduction
  4.2 Discrete Fourier Transform Representation of Signals
  4.3 Least-squares Estimation
  4.4 Input Design
    4.4.1 Geometric Programming Solution
    4.4.2 LMI Solution
    4.4.3 Numerical Illustration
    4.4.4 Closed-form Solution
    4.4.5 Input Design Based on Over-parametrized Models
  4.5 Conclusions

5 Applications
  5.1 Introduction
  5.2 A Process Control Application
    5.2.1 Optimal Design Compared to White Input Signals
    5.2.2 Optimal Input Design in Practice
  5.3 A Mechanical System Application
    5.3.1 Optimal Design Compared to White Input Signals
    5.3.2 Input Design in a Practical Situation
    5.3.3 Sensitivity of the Optimal Design
  5.4 Conclusions

6 Input Design for Identification of Zeros
  6.1 Introduction
  6.2 Estimation of Parameters and Zeros
    6.2.1 Parameter Estimation
    6.2.2 Estimation of Zeros
  6.3 Input Design - Analytical Results
    6.3.1 Input Design for Finite Model Orders
    6.3.2 Input Design for High-order Systems
    6.3.3 Realization of Optimal Inputs
  6.4 Input Design - A Numerical Solution
    6.4.1 Numerical Example
  6.5 Sensitivity and Benefits
    6.5.1 Numerical Example
  6.6 Using Restricted Complexity Models for Identification of Zeros
  6.7 Conclusions

7 Convex Computation of Worst Case Criteria
  7.1 Motivation of Quality Measure
    7.1.1 Parametric Ellipsoidal Constraints
    7.1.2 Worst Case Performance of a Control Design
    7.1.3 The Worst Case Vinnicombe Distance
  7.2 Computation of Worst Case Criterion for a Fixed Frequency
  7.3 Computation of an Upper Bound
  7.4 Numerical Illustrations
    7.4.1 Computation of the Worst Case Vinnicombe Distance
    7.4.2 Computation of Worst Case Control Performance
  7.5 Conclusions

8 Gradient Estimation in IFT for Multivariable Systems
  8.1 Introduction
  8.2 System Description
  8.3 Gradient Estimation in the IFT Framework
  8.4 Gradient Approximations
  8.5 Analysis of Local Convergence
  8.6 Numerical Illustrations
  8.7 Conclusions
  8.A Proofs
    8.A.1 Proof of Theorem 8.1
    8.A.2 Proof of Corollary 8.1
    8.A.3 Proof of Theorem 8.2
    8.A.4 Proof of Corollary 8.2
    8.A.5 Proof of Theorem 8.3

9 Summary and Suggestions for Future Work
  9.1 Summary
  9.2 Future Work

Chapter 1

Introduction

In this thesis, different ways of gathering system information based on experimental data are considered. More precisely, we focus on methods and situations where it is possible to manipulate some external excitation signals in order to make this information collection more efficient. A typical system configuration is depicted in Figure 1.1, where S is an unknown system, u is a known or measured input signal, v represents unmeasurable external disturbances and y is the measured output of the system. Possibly, there is also a feedback mechanism, here represented by the controller K, and external reference signals to the closed-loop system, represented by r.

We will consider experiment design from two different perspectives. The first is how to construct external excitations that yield informative experiments for system identification. System identification deals with obtaining mathematical models of dynamic systems based on measured data. The nature of the external excitations that act on the system plays an important role for the characteristics of the estimated models. Sometimes these excitations can be manipulated by the model builder, which is very useful for obtaining as much qualitative information as possible. This will include design in open loop, where the spectral properties of u are shaped.
We will also consider situations where a feedback mechanism makes direct manipulation of u impossible, but where u can be influenced indirectly by external reference signals and/or via the feedback mechanism itself. The second perspective is related to performance calibration of a controller using experimental data, obtained from special experiments on the real process in feedback with the controller to be tuned.

Figure 1.1: A system S with output y, input u and disturbance v.

The starting point is the system depicted in Figure 1.1 with the feedback loop closed. By manipulating the reference signal r it is possible to retrieve information about the sensitivity of the closed loop to the parameters ρ that parametrize the controller K(ρ). Iterative Feedback Tuning (IFT) is a method that uses such sensitivity information to tune controllers. However, the experimental load grows substantially when multivariable systems are considered. We will study different ways of approximating the gradient estimate in order to reduce the number of experiments.

This chapter contains short introductions to experiment design for system identification and to controller tuning using IFT.

1.1 System Identification - A Starter

System identification is a mature subject and there exist several methods; applications can be found in almost all engineering disciplines. The theory has been firmly treated in several books, including (Bohlin, 1991; Eykhoff, 1974; Goodwin and Payne, 1977; Hannan and Deistler, 1988; Johansson, 1993; Pintelon and Schoukens, 2001). This thesis considers only identification in the prediction error framework, for which (Ljung, 1999; Söderström and Stoica, 1989) serve as excellent sources. Furthermore, we restrict the attention to identification of discrete-time linear time-invariant single input/single output systems.
It should be remarked that the prediction error method is not restricted to this class of systems: the system can be continuous-time and non-linear, as well as have several inputs and outputs.

The method consists of basically two steps. The first is to choose a model class parametrized by some parameter vector. The second is to find the model within the model class that minimizes the prediction error. To illustrate this, consider the following example.

Example 1.1 Suppose we have a set of observed input/output data {u(t), y(t)}, t = 1, ..., N, where u and y are real and scalar valued. Based on these observations we want to find a mathematical relation that describes the coupling between the input and output data. Let us use a prediction error approach to this modelling. The first step is to define a parametrized model class that corresponds to our assumptions on how the data has been generated. One example is to propose the following second order Finite Impulse Response (FIR) model

y(t, θ) = b_0 u(t−1) + b_1 u(t−2) + e(t).    (1.1)

Here b_0 and b_1 are unknown parameters to be estimated and e(t) represents an unmeasurable disturbance in terms of white noise with variance λ. The parameters are typically collected in a vector θ^T = [b_0  b_1]. The objective is thus to estimate θ, which in the prediction error framework is done by minimizing the difference between the measured output y and the prediction of y based on past data. The best one-step ahead predictor of the model (1.1) is given by

ŷ(t, θ) = b_0 u(t−1) + b_1 u(t−2) = [b_0  b_1] [u(t−1) ; u(t−2)] = θ^T ϕ(t).    (1.2)

A common way to obtain the estimate θ̂_N of θ is by minimizing a least-squares criterion, i.e.

θ̂_N = arg min_θ (1/2N) Σ_{t=1}^{N} (y(t) − ŷ(t, θ))².    (1.3)

Since the predictor is linear in the parameters, a closed form solution of θ̂_N can be obtained. The solution to (1.3) is given by

θ̂_N = ( Σ_{t=1}^{N} ϕ(t)ϕ^T(t) )^{−1} Σ_{t=1}^{N} ϕ(t)y(t).    (1.4)

As can be imagined, the identification includes several choices to be made by the user: how to perform the identification experiment, what model structure to choose, which identification method or criterion to use, and what validation methods to apply. This is a quite complex process. It is important to stress that even though the identification can be separated into different sub-problems, these are in general dependent on each other. Example 1.1 gives only a simplified picture of the model estimation step with a prediction error based method.

A prediction error method is a natural and logical framework for dealing with the system identification problem. Besides its intuitive appeal, there exists a large body of statistical results that support the method. The following example shows some of the statistical properties of the least-squares estimate (1.4), together with a simple illustration of optimal input design.

Example 1.2 (Example 1.1 continued) Here we illustrate some accuracy properties of the estimate obtained in Example 1.1. For this we assume that the true data has been generated by

y(t) = b_0^o u(t−1) + b_1^o u(t−2) + e_o(t),    (1.5)

i.e. the true system has the same structure as the model (1.1). Now study the model estimate defined by (1.3). Introduce the quantities

θ_o^T = [b_0^o  b_1^o],    (1.6)

R_N = Σ_{t=1}^{N} ϕ(t)ϕ^T(t)    (1.7)

and

f_N = Σ_{t=1}^{N} ϕ(t)y(t).    (1.8)

Provided that R_N is invertible, the estimate θ̂_N can be written as

θ̂_N = R_N^{−1} f_N = θ_o + R_N^{−1} Σ_{t=1}^{N} ϕ(t)e_o(t).    (1.9)

Assume that u is a zero mean stationary stochastic process that is uncorrelated with e_o. Then θ̂_N is an unbiased estimate of θ_o, i.e. E θ̂_N = θ_o. Furthermore, it can be shown that the covariance of θ̂_N obeys

lim_{N→∞} N Cov θ̂_N = λ_o [ r_0  r_1 ; r_1  r_0 ]^{−1}    (1.10)

where λ_o is the noise variance, E e_o²(t) = λ_o, and r_k = E u(t)u(t−k). A common approximation of the covariance for finite data is thus

Cov θ̂_N ≈ (λ_o/N) [ r_0  r_1 ; r_1  r_0 ]^{−1}.    (1.11)

Consider, e.g., the variance of each parameter:

Var b̂_0 ≈ λ_o r_0 / (N(r_0² − r_1²)) ≈ Var b̂_1.    (1.12)

Hence, the variance decays as 1/N. Furthermore, the larger the input variance r_0, the smaller the variance of the parameters. This holds in general for linear systems. Notice that if the objective is to minimize the variance of the parameters according to (1.12) and the input variance is constrained, then the optimal input is white noise, i.e. r_1 = 0. This is not necessarily true for other objectives though, see e.g. Example 3.4.

Since the true covariance of θ̂_N is typically difficult to compute explicitly for finite N, it is common to use the approximation (1.11), based on the asymptotic covariance, for experiment design purposes. This is a good approximation provided the data length is sufficiently large and the model class captures the dynamics of the true system.

For the input design problem in Example 1.1 it was easy to compute an explicit solution. However, optimal input design problems are, in their original form, generally non-trivial to solve. In Chapter 3 we will study the fundamentals of the optimal experiment design problem in more detail. In the next section, we give a very brief historical background on optimal experiment design.

1.2 Optimal Experiment Design - A Background

The focus in this section is directed towards identification of linear systems within the prediction error framework where variance errors
In statistics, a huge amount of literature exists but only a smaller part is related to input design where (Box and Jenkins, 1970) is one example. This is much due to the fact that in general there is no controllable input in statistical time series. There are however several important contributions in statistics that have been very useful for the work on input design for linear dynamic systems. Many of these results are related to the work on static linear regression models, see e.g. (Kiefer, 1959), (Kiefer and Wolfowitz, 1959), (Kiefer and Wolfowitz, 1959), (Karlin and Studden, 1966a), (Karlin and Studden, 1966b) and (Fedorov, 1972). 1.2.1 Design for Full Order Models Input design for linear dynamic systems started out around 1960 where (Levin, 1960) is one of the earliest contributions. In the 1970’s there was a vigourous activity in this area for which (Mehra, 1974), (Goodwin and Payne, 1977), (Zarrop, 1979), (Mehra, 1981) and (Goodwin, 1982) act as excellent surveys. Let us summarize some of main points in this work. Let P denote the asymptotic parameter covariance matrix. To measure the goodness of diﬀerent designs, diﬀerent measures of the covariance matrix of the estimated parameters have been used. The covariance matrix provides a measure of the average diﬀerence between the estimate and the true value. The classical approach has been to minimize some scalar function of the asymptotic covariance matrix P with constraints on input and/or output power. Examples of commonly used criteria are A-optimality : min Tr P E-optimality : λmax (P ) (1.13) (1.14) D-optimality : det P L-optimality : Tr W P (1.15) (1.16) 1.2 Optimal Experiment Design - A Background 7 Both design in the time domain and in the frequency domain have been considered. In the time domain the design typically reduces to a nonlinear optimal control problem with N free variables, i.e. the number of variables equals the data length. 
The complexity was one of the reasons that motivated researchers to perform the input design in the frequency domain instead. By assuming large data lengths and restricting the class of allowable inputs to those having a spectral representation, it is possible to derive convenient expressions for the asymptotic covariance matrix in terms of its inverse, the information matrix P^{−1}. Furthermore, design in the frequency domain generally makes the results easier to interpret. Let the number of estimated parameters be p; then any P^{−1}, which is a Toeplitz matrix, can be characterized by p(p+1)/2 parameters. An important issue in the work of (Mehra, 1974), (Goodwin and Payne, 1977), (Zarrop, 1979) and others was to find a set of finitely parametrized inputs that parametrizes all achievable information matrices P^{−1}. An important observation was that for input power constrained designs, all achievable information matrices can be obtained using a sum of sinusoids comprising no more than p(p+1)/2 + 1 components, see (Mehra, 1974). This is a consequence of Carathéodory's theorem. For single input/single output linear systems this number can be further reduced by exploiting the structure of P^{−1}. It is shown in (Goodwin and Payne, 1977) that for an mth order transfer function with 2m parameters, 2m sinusoids are sufficient. Further refinements are presented in (Zarrop, 1979), where geometric considerations together with the theory of Tschebycheff systems (Karlin and Studden, 1966b) make it possible to reduce the number of required sinusoids to m, the lowest possible number that still yields an informative experiment (Ljung, 1999).

Most of the work during this period was on optimal input design in open loop, but there are also some contributions on experiment design in closed loop, see e.g. (Ng et al., 1977a; Ng et al., 1977b; Gustavsson et al., 1977; Goodwin and Payne, 1977).
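The sum-of-sinusoids inputs appearing in these classical results are easy to realize in practice. A minimal sketch (the helper name and its parameters are our own, not from the thesis):

```python
import numpy as np

def multisine(N, freqs, amps, phases=None):
    """Realize u(t) = sum_k a_k cos(w_k t + p_k), t = 0, ..., N-1:
    a sum of sinusoids, as used in the classical input design results."""
    if phases is None:
        phases = np.zeros(len(freqs))
    t = np.arange(N)
    return sum(a * np.cos(w * t + p)
               for w, a, p in zip(freqs, amps, phases))

# A single sinusoid of amplitude sqrt(2) has unit average power
u = multisine(10_000, freqs=[0.5], amps=[np.sqrt(2)])
```

Each sinusoid of amplitude a_k contributes a_k²/2 to the input power, so power-constrained designs translate directly into a choice of amplitudes at the optimal frequencies.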
1.2.2 Design for High Order Models

In the 1980s input design attracted renewed interest when the control community recognized the utility of experiment design as a way to obtain suitable models for control design. The starting point was the derivation in (Ljung, 1985) of an expression for the variance of estimated frequency functions that is asymptotic both in the model order and in the data length. Assume that the true system and the model are described by the linear relation

y(t) = G(q, θ)u(t) + H(q, θ)e_o(t)

where G and H denote discrete-time transfer functions describing the system and the noise dynamics, respectively. The novel contribution in (Ljung, 1985) has led to the well known approximation

Cov [ G(e^{jω}, θ̂_N) ; H(e^{jω}, θ̂_N) ] ≈ (m/N) Φ_v(ω) [ Φ_u(ω)  Φ_{ue}(ω) ; Φ_{ue}(−ω)  λ_o ]^{−1}    (1.17)

for finite m and N, where m is the model order. Here Φ_u is the input spectrum, Φ_{ue} is the cross spectrum between input and noise (zero in open-loop operation) and Φ_v is the spectrum of the disturbance. Furthermore, λ_o is the variance of e_o. Optimal experiment designs based on different variants of the variance expression (1.17) have appeared in (Ljung, 1985), (Yuan and Ljung, 1985), (Yuan and Ljung, 1984) and (Gevers and Ljung, 1986). Some of the key points in this line of research are:

• To put the intended use of the model into focus when assessing the quality measure in the input design. The performance degradation is measured by the function

J = ∫_{−π}^{π} Tr { C(ω) · Φ_v(ω) [ Φ_u(ω)  Φ_{ue}(ω) ; Φ_{ue}(−ω)  λ_o ]^{−1} } dω    (1.18)

where the weighting matrix

C(ω) = [ C_11(ω)  C_12(ω) ; C_21(ω)  C_22(ω) ]

depends on the intended application, e.g. simulation, prediction or control, cf. (Ljung, 1985), (Gevers and Ljung, 1986) and (Ljung, 1999).

• The high order assumption introduces a certain insensitivity to the order of the true system.

• The optimal experiment design can be explicitly calculated.
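In open loop (Φ_{ue} = 0) the model part of (1.17) collapses to the familiar noise-to-input ratio, Var G(e^{jω}, θ̂_N) ≈ (m/N) Φ_v(ω)/Φ_u(ω), which is easy to evaluate on a frequency grid. A sketch, with an assumed first-order noise filter (the numbers are illustrative, not from the thesis):

```python
import numpy as np

m, N, lam = 4, 1000, 1.0           # model order, data length, noise variance
w = np.linspace(1e-2, np.pi, 500)  # frequency grid
z = np.exp(1j * w)

# Assumed low-pass noise dynamics H(q) = 1 / (1 - 0.5 q^{-1}), illustrative only
H = 1.0 / (1.0 - 0.5 / z)
Phi_v = lam * np.abs(H) ** 2       # disturbance spectrum Phi_v = lam * |H|^2
Phi_u = np.ones_like(w)            # unit-variance white input

# Open-loop specialization of (1.17): Var G ~ (m/N) * Phi_v / Phi_u
var_G = (m / N) * Phi_v / Phi_u
```

The expression says that, under the high order assumption, the variance is large wherever the noise-to-input ratio is large; shaping Φ_u to be large where Φ_v is large is the classical remedy.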
Other contributions in this line of research are (Hjalmarsson et al., 1996), (Forssell and Ljung, 2000) and (Zhu and van den Bosch, 2000), where different aspects of closed-loop identification in the context of control applications have been studied. The contributions based on the high order variance expression (1.17) have proven very successful, both in exposing the fundamentals of the input design problem and in practical applications, see e.g. (Zhu, 1998). As mentioned, basing a method on variance results that are asymptotic in the model order yields a certain robustness against the properties of the underlying system. However, it has been shown in e.g. (Ninness and Hjalmarsson, 2002b) that these high order expressions are not always accurate. This is the major drawback of these methods, especially for low-order modeling, where the accuracy of the high order variance expression is in many cases far from acceptable. This is illustrated in the following simple example.

Example 1.3 Consider modeling of the output-error system

y(t) = (0.1/(q − 0.9)) u(t) + e(t)    (1.19)

where both the input u and the noise e are realizations of zero mean, unit variance white noise processes. Based on 1000 samples of input/output data, a first order output-error model G(e^{jω}, θ̂_N) is estimated. The sample variance of this frequency function estimate over 1000 experiments with different noise realizations is plotted in Figure 1.2 together with the high order approximation (1.17). The high order approximation is clearly poor for the variance of these model estimates, except for frequencies around ω = 0.7 rad.

1.2.3 The Return of Full Order Modeling

The inaccuracy of the high order variance expression (1.17) for finite model orders has led to a renewed interest in full order modeling and in experiment designs based on more accurate variance approximations.
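One such more accurate approximation is the first-order Taylor expansion of the frequency-function variance, introduced below as (1.20). For a first-order output-error model G(q, θ) = b/(q − a) with white unit-variance input, it can be evaluated numerically as sketched here. This is purely illustrative: the information matrix is approximated by grid integration, and P is taken as its inverse so that the noise variance enters only through the λ/N factor:

```python
import numpy as np

b, a = 0.1, 0.9     # the system of Example 1.3: G(q) = 0.1 / (q - 0.9)
lam, N = 1.0, 1000  # noise variance and data length

w = np.linspace(1e-3, np.pi, 2000)
z = np.exp(1j * w)

# Gradient of G(z, theta) = b / (z - a) with respect to theta = [b, a]
dG = np.stack([1.0 / (z - a), b / (z - a) ** 2])

# Per-sample information matrix for white unit-variance input,
# M = (1/pi) int_0^pi Re{ dG dG^* } dw, approximated on the grid;
# P = M^{-1}, with lam entering through the lam/N factor below
dw = w[1] - w[0]
M = np.real(dG[:, None, :] * dG[None, :, :].conj()).sum(axis=-1) * dw / np.pi
P = np.linalg.inv(M)

# First-order Taylor approximation of Var G(e^{jw}, theta_hat_N)
var_G = (lam / N) * np.einsum('iw,ij,jw->w', dG.conj(), P, dG).real
```

Unlike the high order expression, this approximation varies with the pole location a, which is what produces the frequency dependence seen in Figure 1.3.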
Contributions that do not rely on high order variance expressions include (Cooley and Lee, 2001), (Lee, 2003) and (Lindqvist and Hjalmarsson, 2001). They are based on the first order Taylor expansion

Var G(e^{jω}, θ̂_N) ≈ (λ/N) (dG(e^{jω}, θ_o)/dθ)^* P (dG(e^{jω}, θ_o)/dθ)    (1.20)

of the variance of the frequency function estimate G(e^{jω}, θ̂_N), where P is the asymptotic covariance matrix of the parameter estimate.

Figure 1.2: The variance comparison in Example 1.3. The sample variance of G(e^{jω}, θ̂_N) (solid) and the high order variance expression (dashed).

The approximate variance expression (1.20) is actually the basis for the high order approximation (1.17). However, (1.20) is a good approximation for any model order (as long as the sample size N is large enough) and can therefore be expected to be better than the associated high order expression, provided of course that the chosen model order is no lower than the true system order. We examine this next in a simple example.

Example 1.4 (Example 1.3 continued) The variance approximation (1.20) is compared to the high order approximation and the true variability in Figure 1.3 for the system in Example 1.3. Since the true parameters are typically unknown, the expression (1.20) has also been evaluated for two different pole locations, 0.85 and 0.95. For this example, the Taylor expansion (1.20) actually depends only on the pole location and not on the gain of the system; this is proved in (Ninness and Hjalmarsson, 2004).

Figure 1.3: The variance comparison in Example 1.4. The solid line is the sample variance of G(e^{jω}, θ̂_N). The dashed line is the high order variance expression (1.17). The dash-dotted line is the Taylor approximation (1.20) for the true system parameters. The dotted lines represent (1.20) based on the perturbed pole locations 0.85 (high static gain of variance) and 0.95 (low static gain of variance), respectively.

The large discrepancy between the high order variance expression and the true variance for the system in Example 1.4 suggests that basing the input design on such an approximation may result in an inappropriate design for this particular system. This is confirmed in the next example.

Example 1.5 (Example 1.4 continued) Let

G(q) = 0.1/(q − 0.9)

and consider an input design problem with a frequency-by-frequency bound on the variance of the first order model G(e^{jω}, θ̂_N). This can be posed as

minimize_{Φ_u}  α
subject to  Var G(e^{jω}, θ̂_N) ≤ |W(e^{jω})|²  for all ω,
            (1/2π) ∫_{−π}^{π} Φ_u(ω) dω ≤ α.    (1.21)

Here W(e^{jω}) is a stable transfer function and the objective is to minimize the input power. Now we compare two different designs, in both of which the true variance is approximated. The first design is based on the high order variance approximation (1.17) with m = 1 and the second on the Taylor approximation (1.20). The designed variances of the frequency function estimate, together with the corresponding sample variances based on 1000 Monte Carlo simulations, are illustrated in Figure 1.4.

Figure 1.4: Illustration of Example 1.5. Thin solid line: the variance bound |W(e^{jω})|², which equals the designed variance for the solution based on the high order variance expression. Dash-dotted line: designed variance based on the Taylor approximation. Thick solid line: sample variance for the input design based on the Taylor approximation. Dashed line: sample variance for the input design based on the high order variance approximation.

To use a model order as low as m = 1 in the high order expression is a bit unfair. Therefore, the previous designs are also compared with a design based on the high order variance approximation but with m = 23.
The sample variance of G(e^{jω}, θ̂_N) for this design is shown in Figure 1.4. This design almost satisfies the variance bound. The price paid is that the input power is almost 8 times larger than for the design based on the Taylor approximation.

The previous example further motivates the interest in studying optimal experiment designs based on uncertainty descriptions that are more accurate for finite model orders than the high order expression (1.17). Many of the quality constraints that have been suggested in the literature are based on L2-norms, see e.g. (1.18), from which no conclusions can be drawn about stability when control applications are concerned. From the perspective of identification for robust control, frequency-by-frequency bounds are in many cases more relevant. Different frequency-by-frequency constraints on the model quality have been incorporated into experiment designs in (Hildebrand and Gevers, 2003), (Jansson and Hjalmarsson, 2004c), (Jansson and Hjalmarsson, 2004b), (Bombois et al., 2004c) and (Bombois et al., 2004b), cf. Example 1.5. The contributions (Hildebrand and Gevers, 2003), (Jansson and Hjalmarsson, 2004c) and (Jansson and Hjalmarsson, 2004b) also consider parametric uncertainties in terms of confidence ellipsoids. These ellipsoids take the form

U_θ = {θ | N(θ − θ_o)^T P^{−1}(θ − θ_o) ≤ χ²_α(n)}.   (1.22)

The shape of the ellipsoids will depend on the experimental conditions, see (Ljung, 1999). This means that it is possible to design inputs such that the quality objective is satisfied for all systems in a confidence region resulting from the identification experiment. A variant of the uncertainty set (1.22) is considered in (Bombois et al., 2004c) and (Bombois et al., 2004b), where a projection into the Nyquist plane of the parametric uncertainty set (1.22) is used.
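The gain-independence observed in Example 1.4 can be checked numerically. The sketch below (assuming numpy is available; the pole 0.9, the gains 0.1 and 1.0 and the unit-variance white input are illustrative choices) evaluates the Taylor approximation (1.20) for the first order model G(q, θ) = b/(q − a) with H = 1, computing P from the open-loop covariance formula on a frequency grid:

```python
import numpy as np

def param_covariance(a, b, lam=1.0, n_grid=20000):
    """Asymptotic covariance P for the model G(q) = b/(q - a), H = 1,
    identified in open loop with unit-variance white input (Phi_u = 1):
    P^{-1} = 1/(2*pi*lam) * int (dG/dtheta)(dG/dtheta)^* Phi_u dw."""
    w = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    z = np.exp(1j * w)
    dG = np.stack([b / (z - a) ** 2,   # dG/da
                   1.0 / (z - a)])     # dG/db
    # Grid average approximates the frequency integral; imaginary parts
    # cancel by symmetry, so only the real part is kept.
    Pinv = np.real(np.einsum('iw,jw->ij', dG, dG.conj())) / (n_grid * lam)
    return np.linalg.inv(Pinv)

def taylor_variance(a, b, w0, N, lam=1.0):
    """Taylor approximation (1.20) of Var G(e^{j*w0}, theta_hat_N)."""
    P = param_covariance(a, b, lam)
    z = np.exp(1j * w0)
    dG = np.array([b / (z - a) ** 2, 1.0 / (z - a)])
    return lam / N * np.real(dG.conj() @ P @ dG)

# Same pole, different gains: the approximated variance coincides.
v1 = taylor_variance(0.9, 0.1, w0=0.5, N=1000)
v2 = taylor_variance(0.9, 1.0, w0=0.5, N=1000)
```

Evaluating the approximation for b = 0.1 and b = 1.0 at the same pole gives the same value, in line with the statement in Example 1.4.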
A major driving force behind the possibility to use more refined model uncertainty descriptions such as (1.20) and (1.22) has been the great advances within the optimization community on convex optimization, see e.g. (Boyd et al., 1994), (Boyd and Vandenberghe, 2003) and (Nesterov and Nemirovski, 1994).

1.2.4 Other Contributions

The introduction of optimal experiment design given in Section 1.2 gives only a very brief overview of the field. Focus has been on contributions related to linear systems where variance errors are the only concern. For more detailed information and references, we refer to the good surveys (Mehra, 1974; Goodwin and Payne, 1977; Zarrop, 1979; Goodwin, 1982; Ljung, 1999) and, for the design of periodic signals, (Pintelon and Schoukens, 2001). More details on this topic will also be presented in the subsequent chapters. So called "plant-friendly" system identification has been on the agenda within the chemical process control research community, see (Rivera et al., 2002; Rivera et al., 2003; Lee, 2003) and the references therein. The general objective in this line of research is to produce informative data subject to constraints on the input/output amplitude and variation in the time domain. Thus, more focus has been on design in the time domain.

Figure 1.5: Feedback system.

1.3 Iterative Feedback Tuning

In this section we will turn from the topic of optimal experiment design for identification to experiment design from a different perspective. Here we will give a short introduction to a method called Iterative Feedback Tuning (IFT). The purpose of IFT is to tune a controller with known structure based on experimental data. The method was originally introduced in (Hjalmarsson et al., 1994) and a general presentation has appeared in (Hjalmarsson et al., 1998). For a recent overview see (Hjalmarsson, 2002). The control performance objective can often be formulated as some cost function.
In IFT almost any signal-based criterion can be used. The IFT algorithm provides estimates of the gradient of this criterion based on closed-loop experiments. The cost function is then minimized by some gradient-based search method. Consider the closed-loop system in Figure 1.5, for which the output can be expressed as

y(ρ) = (I + G_o C(ρ))^{−1}(G_o C(ρ)r + v).   (1.23)

Here we have assumed that the system is discrete, linear, multivariable and time-invariant. Furthermore, ρ is a real valued vector that parametrizes the controller. Now differentiate the output with respect to the ith element of ρ. This gives

∂y(ρ)/∂ρ_i = (I + G_o C(ρ))^{−1} G_o (∂C(ρ)/∂ρ_i)(r − y(ρ)).   (1.24)

From (1.24) it is easy to see that a realization of ∂y(ρ)/∂ρ_i can be obtained by performing two experiments on the real closed-loop system. In the first experiment the output y(ρ) is collected under normal operating conditions. In the second experiment, the signal (∂C(ρ)/∂ρ_i)(r − y(ρ)) is added to the input u while the normal reference is set to zero. This is, however, a rather inefficient way to retrieve gradient information since we have to repeat this procedure for each element of the vector ρ. The topic of Chapter 8 is to study different methods to approximate the gradient (1.24) in order to reduce the experimental burden.

1.4 Contributions and Outline

In this section brief overviews of the different chapters are given together with the main contributions and the related publications.

Chapter 2

A brief introduction to identification in the prediction error framework is given in Chapter 2. Several important expressions are reviewed that quantify the variability in the estimated parameters as well as the corresponding frequency function estimates. These expressions are frequently used in the experiment designs presented in the subsequent chapters.
Chapter 3

The chapter presents some of the fundamentals of optimal experiment design within the prediction error framework of system identification. The general ideas of a rather flexible framework for translating optimal experiment design problems into convex optimization programs are also introduced. The key role is played by the parametrization of the spectrum to be designed. Here we introduce two useful parametrizations of a spectrum that generalize previously suggested parametrizations in e.g. (Goodwin and Payne, 1977; Zarrop, 1979; Stoica and Söderström, 1982; Lindqvist and Hjalmarsson, 2001; Jansson and Hjalmarsson, 2004a). The framework can handle several quality constraints, of which some frequency wise constraints are expected to be important contributions to the area of identification and control. Part of the work in Chapter 3 has been submitted as

H. Jansson and H. Hjalmarsson, "A general framework for mixed H∞ and H2 input design", Submitted to IEEE Transactions on Automatic Control, May 2004.

This work is also related to the conference papers

H. Jansson and H. Hjalmarsson, "A framework for mixed H∞ and H2 input design", Mathematical Theory of Networks and Systems, Leuven, Belgium, (2004).

H. Jansson and H. Hjalmarsson, "Mixed H∞ and H2 input design for identification", IEEE Conference on Decision and Control, Bahamas, (2004).

H. Jansson and H. Hjalmarsson, "Optimal experiment design in closed loop", Submitted to the 16th IFAC World Congress, Prague, Czech Republic, (2005).

Chapter 4

Input designs rely in general on uncertainty descriptions that are valid asymptotically in the number of data. In this chapter we consider a class of linearly parameterized frequency functions for which it is possible to derive variance expressions that are exact for finite sample sizes.
The major contributions are two solutions for translating constraints based on these variance expressions into convex formulations, where the optimization is performed over the square of the Discrete Fourier Transform (DFT) coefficients of the input.

Chapter 5

This chapter applies the theory developed in Chapter 3 to two applications, a process plant and a resonant mechanical system. One of the objectives is to study possible benefits of using optimal inputs compared to the use of standard identification input signals, for example PRBS signals, for the aforementioned applications. The second aspect is to highlight some robustness issues regarding the input design. The main part of this chapter has resulted in

M. Barenthin, H. Jansson and H. Hjalmarsson, "Applications of mixed H2 and H∞ input design for identification", Submitted to the 16th IFAC World Congress, Prague, Czech Republic, (2005).

Chapter 6

In this chapter we derive both analytical and numerical results on how to accurately identify system zeros. Special attention is given to non-minimum phase zeros. The sensitivity and benefits of optimal input design for identification of zeros are discussed and illustrated. With minor changes, this chapter contains the contribution

J. Martensson, H. Jansson and H. Hjalmarsson, "Input design for identification of zeros", Submitted to the 16th IFAC World Congress, Prague, Czech Republic, (2005).

Chapter 7

This chapter is not directly related to optimal experiment design. Instead, a method is presented for convex computation of a class of so-called worst case criteria, of which the worst case Vinnicombe distance is one example. These criteria can be used to evaluate performance and/or stability for an ellipsoidal set of parametric models. The work in this chapter has resulted in

H. Jansson and H. Hjalmarsson, "Convex computation of worst case criteria with applications in identification for control", IEEE Conference on Decision and Control, Bahamas, (2004).
Chapter 8

The gradient estimation problem in Iterative Feedback Tuning (IFT) for multivariable systems is reviewed in Chapter 8. Different approximations are proposed with the purpose of reducing the experimental load. One method, in which operators are shifted in a similar fashion as in IFT for scalar systems, is analyzed in more detail. The work in this chapter is published in

H. Jansson and H. Hjalmarsson, "Gradient approximations in iterative feedback tuning for multivariable processes", International Journal of Adaptive Control and Signal Processing, October 2004.

Chapter 2

Parameter Estimation and Connections to Input Design

A brief introduction to the prediction error method of system identification is given in this chapter. This method basically consists of two steps. The first is to choose a model class parameterized by some parameter vector. The second is to find the model within the model class that minimizes the prediction error. Due to unmeasurable disturbances and the finite amount of data, the estimated model will always contain errors, even in those cases where the model class is flexible enough to describe the underlying system. In this chapter we will review results that quantify the variability of the parameter vector and the associated frequency response estimate. It is the objective of the experiment design to shape this variability.

2.1 Parameter Estimation

We will assume that the true system dynamics are captured by a linear discrete time-invariant single-input single-output system given by

S: y(t) = G_o(q)u(t) + v(t),  v(t) = H_o(q)e_o(t).   (2.1)

Here y(t) is the output, u(t) the input, v(t) is the disturbance and e_o(t) is zero mean white noise with variance λ_o. Furthermore, G_o(q) and H_o(q) are rational transfer functions in the forward shift operator q (qu(t) = u(t + 1)) with H_o stable, monic and minimum phase.
All signals are assumed to have a spectral representation, where the spectral densities of u and v will be denoted Φ_u and Φ_v, respectively. The true system is modelled by the parametric model

M: y(t) = G(q, θ)u(t) + H(q, θ)e(t)   (2.2)

where e(t) represents white noise with zero mean and variance λ. It is assumed that G and H have the rational forms

G(q, θ) = q^{−nk} B(q, θ)/A(q, θ),  H(q, θ) = C(q, θ)/D(q, θ)   (2.3)

where nk is the delay and

A(q, θ) = 1 + a_1 q^{−1} + · · · + a_{na} q^{−na}   (2.4)
B(q, θ) = b_1 + b_2 q^{−1} + · · · + b_{nb} q^{−nb+1}   (2.5)
C(q, θ) = 1 + c_1 q^{−1} + · · · + c_{nc} q^{−nc}   (2.6)
D(q, θ) = 1 + d_1 q^{−1} + · · · + d_{nd} q^{−nd}   (2.7)

The polynomials A(q, θ)–D(q, θ) are parameterized by the real vector θ ∈ R^n given by

θ = [a_1, · · · , a_{na}, b_1, · · · , b_{nb}, c_1, · · · , c_{nc}, d_1, · · · , d_{nd}]^T   (2.8)

The one-step-ahead predictor for the model (2.2) is

ŷ(t, θ) = H^{−1}(q, θ)G(q, θ)u(t) + [1 − H^{−1}(q, θ)]y(t)   (2.9)

The prediction error framework of system identification (Ljung, 1999; Söderström and Stoica, 1989) with a quadratic cost function aims at minimizing the prediction errors

ε(t, θ) = y(t) − ŷ(t, θ)   (2.10)

with respect to the parameters in the following fashion

θ̂_N = arg min_θ (1/2N) Σ_{t=1}^{N} ε(t, θ)²   (2.11)

Some results on the statistical properties of the estimate θ̂_N will now be given.

2.2 Uncertainty in the Parameter Estimates

One common way to measure the quality of the estimates is to study their asymptotic properties. That is, when the number of data N grows large, the estimates will belong to some distribution. The properties of the distribution will then determine the quality of the estimates. Assume that the true system is in the model set (S ∈ M), i.e. there exists a parameter θ_o such that G(θ_o) = G_o and H(θ_o) = H_o.
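For an ARX structure (C = D = 1, so H = 1/A) the predictor (2.9) is linear in θ, and the criterion (2.11) reduces to linear least squares. A minimal sketch, assuming numpy is available and using illustrative parameter values:

```python
import numpy as np

# ARX data: y(t) = -a_o*y(t-1) + b_o*u(t-1) + e(t), with illustrative
# values a_o = -0.9, b_o = 0.1, white input and small white noise.
rng = np.random.default_rng(0)
N = 5000
a_o, b_o = -0.9, 0.1
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a_o * y[t - 1] + b_o * u[t - 1] + e[t]

# Predictor (2.9) with H = 1/A: y_hat(t) = [-y(t-1), u(t-1)] @ theta,
# so minimizing (2.11) is ordinary least squares.
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
```

With N = 5000 data points the estimate lands close to (a_o, b_o), consistent with the 1/N decay of the covariance discussed next.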
Then it can be shown under mild assumptions, see (Ljung, 1999), that the prediction error estimate θ̂_N has an asymptotic distribution that obeys

√N(θ̂_N − θ_o) → N(0, P) as N → ∞
lim_{N→∞} N E(θ̂_N − θ_o)(θ̂_N − θ_o)^T = P
P(θ_o) = λ_o [E ψ(t, θ_o)ψ^T(t, θ_o)]^{−1}
ψ(t, θ_o) = ∂ŷ(t, θ)/∂θ |_{θ=θ_o}   (2.12)

Here N denotes the Normal distribution. So when the system is in the model set, the estimate will converge to the true parameter and the covariance of the estimation error decays as 1/N. Hence better and better estimates are obtained when more and more data samples are used in the estimation.

2.2.1 Parameter Covariance

However, it is not only the experiment length that will influence the estimation accuracy. Introduce the spectrum

Φ_{χo} = [Φ_u, Φ_{ue}; Φ*_{ue}, λ_o]   (2.13)

where Φ_u is the spectrum of the input and Φ_{ue} is the cross spectrum between u and e_o. The frequency distribution of the spectrum Φ_{χo} will also influence the accuracy, as is shown in the next lemma.

Lemma 2.1 The inverse of the covariance matrix, P^{−1}(θ_o), is a linear function of the spectrum Φ_{χo} given by

P^{−1}(θ_o) = (1/2πλ_o) ∫_{−π}^{π} F(θ_o)Φ_{χo}(θ_o)F*(θ_o) dω   (2.14)

where F(q, θ_o) = [F_u(q, θ_o)  F_e(q, θ_o)] with

F_u(θ_o) = H^{−1}(θ_o) dG(θ_o)/dθ   (2.15)
F_e(θ_o) = H^{−1}(θ_o) dH(θ_o)/dθ   (2.16)

Proof: Insert (2.9) into (2.12) and use Parseval's formula to obtain the integral expression.

Since P is a measure of the size of the errors in the parameters, Lemma 2.1 shows exactly how this error is related to the spectrum Φ_{χo}, and especially the input spectrum Φ_u and the cross spectrum Φ_{ue}. Therefore Lemma 2.1 is very useful when considering experiment design. It is worth noticing that the only quantities that can be used to shape the covariance P are actually the input spectrum Φ_u and the cross spectrum Φ_{ue}. The other quantities in (2.14) all depend on the true underlying system. The cross spectrum is zero when the system is operating in open-loop.
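Lemma 2.1 can be verified numerically in a simple case. The sketch below (assuming numpy; the two-tap FIR model and the MA(1) input are illustrative) evaluates the integral (2.14) on a frequency grid. With H = 1 we have F = F_u = [e^{−jω}, e^{−j2ω}]^T, and for the input u(t) = e_u(t) + 0.5 e_u(t − 1) the correlations are r_0 = 1.25, r_1 = 0.5, so P^{−1} should equal [1.25, 0.5; 0.5, 1.25] when λ_o = 1:

```python
import numpy as np

# FIR model y = b1*u(t-1) + b2*u(t-2) + e (H = 1, open loop), so
# F = F_u = [e^{-jw}, e^{-j2w}]^T in Lemma 2.1.
lam_o = 1.0
n_grid = 20000
w = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
Phi_u = np.abs(1.0 + 0.5 * np.exp(-1j * w)) ** 2   # spectrum of the MA(1) input
Fu = np.stack([np.exp(-1j * w), np.exp(-2j * w)])
# Grid average of F_u Phi_u F_u^* approximates (2.14); the result is real.
Pinv = np.real(np.einsum('iw,w,jw->ij', Fu, Phi_u, Fu.conj())) / (n_grid * lam_o)
```

The diagonal entries recover r_0 and the off-diagonal entries recover r_1, i.e. the frequency integral reproduces the time-domain correlations, as Parseval's formula promises.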
Hence, the input spectrum is the only quantity that influences the covariance matrix in open-loop. How different input spectra may influence the parameter covariance is illustrated in the following example.

Example 2.1 Consider identification of the output-error system defined by

y(t) = (0.1 q^{−1} / (1 − 0.9 q^{−1})) u(t) + e(t)   (2.17)

where e(t) is white noise with unit variance. To illustrate the influence of different input spectra on the covariance of the parameters, two types of input signals having equal energy but different frequency content are considered. The first input is a white noise sequence with unit variance. The second is low-pass filtered white noise. The spectra of these inputs are given in Figure 2.1. The result of 1000 identification experiments with different noise realizations is illustrated in Figure 2.2. In each experiment 1000 data points are used. As this example shows, the frequency distribution of the input may have a large impact on the covariance of the parameter estimates.

Figure 2.1: Amplitude plots for Example 2.1. The open-loop system (thin solid), the spectrum of the input with low-pass characteristics (thick solid) and the spectrum of the white input (dashed).

2.2.2 Confidence Bounds for Estimated Parameters

From the asymptotic normality of the parameter estimates, see (2.12), it follows that

(θ̂_N − θ_o)^T P_N^{−1}(θ̂_N − θ_o) → χ²(n) as N → ∞   (2.18)

with

P_N = P/N   (2.19)

and, hence,

U_θ = {θ | (θ − θ̂_N)^T P_N^{−1}(θ − θ̂_N) ≤ χ²_α(n)}   (2.20)

is a confidence region which asymptotically includes the parameter θ_o with probability α. Thus the estimates will asymptotically be centered around θ_o and, for a certain probability α, they will be within an ellipsoid defined by P_N and χ²_α(n).
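A Monte-Carlo sketch of the confidence region (2.20) for a two-parameter FIR model (assuming numpy; the model and sample sizes are illustrative): with unit-variance white input and noise, P = I and P_N = I/N, and roughly 95% of the estimates should satisfy the ellipsoid inequality with χ²_{0.95}(2) = 5.99.

```python
import numpy as np

# FIR model y(t) = b1*u(t-1) + b2*u(t-2) + e(t), unit-variance white input
# and noise, so P = I and P_N = I/N.  Count how often the true parameter
# vector falls inside the ellipsoid (2.20) around the estimate.
rng = np.random.default_rng(1)
N, runs, chi2_95 = 500, 2000, 5.99
theta_o = np.array([0.1, 0.05])
inside = 0
for _ in range(runs):
    u = rng.standard_normal(N)
    e = rng.standard_normal(N)
    y = theta_o[0] * np.roll(u, 1) + theta_o[1] * np.roll(u, 2) + e
    Phi = np.column_stack([np.roll(u, 1), np.roll(u, 2)])
    th, *_ = np.linalg.lstsq(Phi[2:], y[2:], rcond=None)  # drop wrapped samples
    d = th - theta_o
    inside += (N * d @ d) <= chi2_95   # (theta-theta_o)^T P_N^{-1} (theta-theta_o)
coverage = inside / runs
```

The empirical coverage comes out close to the nominal 95%, illustrating that the asymptotic region (2.20) is already reliable at N = 500 for this simple model.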
Figure 2.2: Monte-Carlo estimates of the system (2.17), where the true parameters are a_o = −0.9 and b_o = 0.1, using a colored and a white input signal respectively, see Example 2.1. The estimates based on the colored input are given by (+) and those based on the white input are given by (·).

Example 2.2 Reconsider Example 2.1. The 95% confidence region for a certain input design is given by (2.20) with χ²_{0.95}(2) = 5.99, where P_N depends on the specific input spectrum. A comparison of these confidence regions and the Monte-Carlo estimates of Example 2.1 is illustrated in Figure 2.3.

2.3 Uncertainty of Frequency Function Estimates

In many situations, e.g. control design applications, it is more useful to express the uncertainty of the estimated model in the frequency domain rather than in the parameter domain.

Figure 2.3: Confidence ellipses for the different input designs in Example 2.1, plotted for the white input (left plot) and for the colored input (right plot). These ellipses can be compared with the model estimates obtained from the Monte-Carlo simulations.

2.3.1 Variance of Frequency Function Estimates

Under the assumption S ∈ M, it can be shown, see (Ljung, 1985), that

√N(G(e^{jω}, θ̂_N) − G(e^{jω}, θ_o)) → N(0, Π(ω))   (2.21)

when N → ∞ with

Π(ω) = (dG*(e^{jω}, θ_o)/dθ) P (dG(e^{jω}, θ_o)/dθ).   (2.22)

Here * denotes complex conjugate transpose. Hence a useful approximation for finite data becomes

Var G(e^{jω}, θ̂_N) ≈ (1/N)(dG*(e^{jω}, θ_o)/dθ) P (dG(e^{jω}, θ_o)/dθ)   (2.23)

where the covariance matrix of θ̂_N is approximately (1/N)P.
The estimation error will depend on the number of data, the parameter covariance and the sensitivity of the true system to parameter changes. An expression is derived in (Ljung, 1985) that is asymptotic in both the model order and the number of data. Under the assumption of open-loop identification, it has led to the well known approximation

Var G(e^{jω}, θ̂_N) ≈ (m/N) Φ_v(ω)/Φ_u(ω)   (2.24)

for finite m and N, where m is the model order.¹ Due to its simple structure, and since it provides a certain robustness against the properties of the underlying system, the expression (2.24) has been widely used in experiment design, see e.g. (Ljung and Yuan, 1985; Gevers and Ljung, 1986; Hjalmarsson et al., 1996; Zhu and van den Bosch, 2000; Forssell and Ljung, 2000). However, since this approximation is derived from an expression which is asymptotic in the model order, its accuracy for finite model orders is not guaranteed. An intriguing fact is that in some situations the approximate expression (2.24) is quite accurate even for model orders as low as two, see (Ljung, 1985; Ljung, 1999). But it is also easy to construct examples where this expression fails for low model orders, see e.g. (Ninness and Hjalmarsson, 2002b). This has been the inspiration to derive expressions that are exact for finite model orders, cf. (Ninness et al., 1999), (Xie and Ljung, 2001), (Ninness and Hjalmarsson, 2002b) and (Ninness and Hjalmarsson, 2003). The generalized result in the case of independently parameterized dynamics and noise models (Box-Jenkins models) reads as follows:

lim_{N→∞} N · Var G(q, θ̂_N) = κ(ω) Φ_v(ω)/Φ_u(ω)   (2.25)

κ(ω) = Σ_{k=1}^{mκ} (1 − |ξ_k|²)/|e^{jω} − ξ_k|².   (2.26)

It can be shown that mκ and {ξ_k} will, for some specific system configurations, depend on the poles of G(θ_o), the dynamics of the noise and the input, see (Ninness and Hjalmarsson, 2003). The preceding result suggests the following approximation of the variance of G for finite N

Var G(q, θ̂_N) ≈ (κ(ω)/N) Φ_v(ω)/Φ_u(ω).   (2.27)

¹A more general expression is given in (1.17).

Notice that the model order m in (2.24) is here replaced by the frequency dependent factor κ(ω). Since κ(ω) depends on the input spectrum in a rather complicated way, it is not straightforward to replace the approximation (2.24) by the more accurate expression (2.27) when designing optimal input signals for identification. One exception is the case of periodic inputs and linearly parameterized models, as will be shown in Chapter 4. We will later in this thesis show how to directly use the covariance approximation (2.23) instead of the approximations (2.24) and (2.27) for input design.

2.3.2 Uncertainty Descriptions Based on Parametric Confidence Regions

The asymptotic properties (2.12) can be utilized to obtain confidence intervals on the parameter estimates, see Section 2.2.2. This parametric uncertainty corresponds to an uncertainty region in the space of transfer functions, denoted D:

D = {G(q, θ) | θ ∈ U_θ}.   (2.28)

An alternative to the covariance expression (2.23), which is based on a Taylor expansion, is to work directly with the uncertainty set (2.28) to describe the uncertainty of the frequency function. This has been explored in (Bombois et al., 2000b; Bombois et al., 2001; Gevers et al., 2003), where prediction error identification is connected with robust control theory. In Chapter 3, it is shown how different frequency function uncertainties based on the uncertainty set (2.28) can be transformed into convex constraints and included in experiment designs. The first contribution connecting experiment design and the variability of the frequency function estimate viewed through the parameter uncertainty set (2.20) is (Hildebrand and Gevers, 2003). In Chapter 3 a different approach is taken compared to the contribution (Hildebrand and Gevers, 2003).
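Each term of the factor κ(ω) in (2.26) is a Poisson kernel, which averages to one over the frequency axis, so (1/2π)∫κ(ω)dω = mκ: κ(ω) has the same mean as the constant model order in (2.24) but a frequency dependent shape. A small numerical check, assuming numpy; the locations ξ_k below are illustrative:

```python
import numpy as np

def kappa(w, xis):
    """The factor kappa(w) of (2.26) for given locations xi_k (|xi_k| < 1)."""
    return sum((1.0 - abs(xi) ** 2) / np.abs(np.exp(1j * w) - xi) ** 2
               for xi in xis)

xis = [0.9, -0.5, 0.3 + 0.4j]            # illustrative locations, m_kappa = 3
w = np.linspace(-np.pi, np.pi, 100000, endpoint=False)
mean_kappa = kappa(w, xis).mean()        # approximates (1/2pi) * int kappa dw
```

For locations close to the unit circle (e.g. ξ = 0.9), κ(ω) is strongly peaked, which is exactly where (2.27) can deviate sharply from the constant-order approximation (2.24).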
Confidence regions in the frequency domain

In (Wahlberg and Ljung, 1992; Bombois et al., 2000a) the image of (2.28) in the Nyquist plane is studied for linearly parameterized models. It is shown that this image is represented, for each frequency, by an ellipsoid in the Nyquist plane that is centered around the nominal frequency function estimate G(e^{jω}, θ̂_N). Let the model be represented by

G(q, θ) = Γ^T(q)θ   (2.29)

and let

g(e^{jω}, θ) = [Re G(e^{jω}, θ); Im G(e^{jω}, θ)] = Γ_c(ω)θ,  Γ_c(ω) = [Re Γ^T(e^{jω}); Im Γ^T(e^{jω})].   (2.30)

Then the confidence ellipsoid for the frequency function estimate G(e^{jω}, θ̂_N) at the frequency ω is defined by

U_G(ω) = {g ∈ R² | (g − g(e^{jω}, θ̂_N))^T Π_N(ω)(g − g(e^{jω}, θ̂_N)) ≤ χ²_α(n)}   (2.31)

where Π_N(ω) = (Γ_c(ω)P_N Γ_c^T(ω))^{−1}. For models that are not linearly parameterized, the characterization holds approximately for large N when Γ(e^{jω}) is replaced by the linearization dG(e^{jω}, θ)/dθ evaluated at θ_o.

The gain error

The expression (2.31) characterizes frequency-by-frequency confidence regions for the frequency function estimate. In many control applications it is often sufficient to consider the gain error

|G(e^{jω}, θ̂_N) − G_o(e^{jω})|.   (2.32)

Based on (2.31) the gain error is bounded by

|G(e^{jω}, θ̂_N) − G_o(e^{jω})| ≤ √(χ²_α(n) λ_max(Γ_c(ω)P_N Γ_c^T(ω)))   (2.33)

where λ_max denotes the largest eigenvalue, see (Bombois et al., 2004a; Bombois et al., 2004d). In (Hjalmarsson, 2004) a confidence region for the gain error (2.32) is derived based on Var G(e^{jω}, θ̂_N). This confidence region is given by

|G(e^{jω}, θ̂_N) − G_o(e^{jω})| ≤ √(χ²_α(n) Var G(e^{jω}, θ̂_N))   (2.34)

which holds with at least α·100% probability. A useful re-parametrization for experiment design purposes is to insert the variance approximation (2.23) into (2.34), which yields the description

|G(e^{jω}, θ̂_N) − G_o(e^{jω})| ≤ √(χ²_α(n) (dG*(e^{jω}, θ_o)/dθ) P_N (dG(e^{jω}, θ_o)/dθ))   (2.35)

Both (2.33) and (2.34) can be included in the framework that we present in Chapter 3.
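A sketch of the frequency-wise gain error bound (2.33) for a linearly parameterized two-tap FIR model, reading the scaling factor as the largest eigenvalue λ_max and assuming, for illustration only, a parameter covariance P_N = pI (numpy assumed):

```python
import numpy as np

def gain_error_bound(w, PN, chi2=5.99):
    """Bound (2.33) for G(q, theta) = theta_1*q^{-1} + theta_2*q^{-2},
    i.e. Gamma(e^{jw}) = [e^{-jw}, e^{-j2w}]^T."""
    Gamma = np.array([np.exp(-1j * w), np.exp(-2j * w)])
    Gamma_c = np.vstack([Gamma.real, Gamma.imag])   # the 2 x n matrix of (2.30)
    M = Gamma_c @ PN @ Gamma_c.T
    return np.sqrt(chi2 * np.linalg.eigvalsh(M).max())

p = 1.0 / 500.0               # assumed P_N = p*I, e.g. P = I with N = 500
bound = gain_error_bound(0.5, p * np.eye(2))
```

Since the trace of Γ_c P_N Γ_c^T equals 2p for this model, the bound can never exceed √(2p χ²_α(n)); shaping P_N through the input design shrinks it frequency by frequency.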
It is worth noticing that the confidence bound (2.33) has been used for input design in (Bombois et al., 2004d; Bombois et al., 2004c; Bombois et al., 2004b).

2.4 Summary

We have presented several results that quantify errors in the parameters as well as in the corresponding frequency function estimates. The key expression for experiment design is the expression for the asymptotic covariance matrix P given in (2.14), whose inverse is convex in Φ_{χo}. In the subsequent chapters we will use different functions of P to quantify the errors in the identified models, e.g. different versions of (2.23), (2.28) and (2.34). Notice that all these results are only valid for "large" N. There is no general limit on how large N has to be for the asymptotic results to be reliable. Monte-Carlo simulations indicate that for typical system identification applications, the results are quite reliable for N ≥ 300 (Ljung, 1999). Recent studies of the validity of the asymptotic prediction error theory are presented in (Bittanti et al., 2002) and (Garatti et al., 2003). Non-asymptotic confidence ellipsoids have been considered in (Campi and Weyer, 2002) and (Weyer and Campi, 2002). It should also be emphasized that all considered uncertainty results are based on the assumption that variance errors are the only concern. This must be kept in mind when designing experiments. For example, suppose an optimal design is based on the assumption that the true system is a second order linear system and the resulting optimal input is a sum of two sinusoids. Then there is no invalidation power in the input: it is impossible to check whether a third or a fourth order model would be better. Therefore, optimal design must be performed carefully, and the underlying assumptions on the true system must be checked.
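The two-sinusoid remark can be made concrete: such an input is persistently exciting of order four only, so the covariance matrix of n lagged inputs is singular for n > 4 and at most four parameters can be discriminated. A small check with exact (time-average) correlations, assuming numpy and illustrative frequencies 0.3 and 1.1 rad:

```python
import numpy as np

def pe_cov(n, w1=0.3, w2=1.1):
    """Exact covariance of n lagged samples of u(t) = sin(w1*t) + sin(w2*t);
    the time-average correlations are r(k) = 0.5*cos(w1*k) + 0.5*cos(w2*k)."""
    r = lambda k: 0.5 * np.cos(w1 * k) + 0.5 * np.cos(w2 * k)
    return np.array([[r(i - j) for j in range(n)] for i in range(n)])

rank4 = np.linalg.matrix_rank(pe_cov(4), tol=1e-8)   # full rank: 4
rank5 = np.linalg.matrix_rank(pe_cov(5), tol=1e-8)   # singular: still rank 4
```

The rank saturates at four however many lags are added, which is exactly why a model with more than four parameters cannot be confronted with such data.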
Chapter 3

Fundamentals of Experiment Design

One of the main contributions of this thesis is to introduce a quite general framework for translating optimal experiment design problems in system identification into convex optimization programs. The aim of this chapter is to introduce the fundamentals of this framework. Furthermore, the theory of optimal experiment design will be studied more thoroughly. For a historical background we refer to Section 1.2 and the references therein. First we will give an overview of the main ideas of the framework. This introduction will also give a flavor of what kinds of experiment design problems are solvable at present.

3.1 Introduction

The experiment design problems we consider all have the general form

minimize_{Φ_{χo}}  objective
subject to  quality constraints and signal constraints   (3.1)

i.e. they can all be formulated as optimization problems that include some constraints on the model quality together with signal constraints. The quality constraints are typically functions of the asymptotic covariance matrix P. Therefore, it is natural to use the input spectrum Φ_u and possibly the cross spectrum Φ_{ue} as design variables, cf. (2.14). The signal constraints have to be included to obtain well-posed problems, i.e. to prevent the use of infinite input power. The considered signal constraints include energy constraints as well as frequency-wise constraints. As will be evidenced, typical experiment design problem formulations are in their original form intractable for several reasons:

1. The constraints are typically non-convex, and such optimization problems may be difficult to solve.

2. The constraints are in many cases infinite-dimensional, which calls for special care when undertaking the optimization procedure.

3. There is also the problem of finding a signal realization which has the desired spectral properties. This is called spectral factorization.
Thus, a useful input design algorithm should contain a second step that performs a spectral factorization of the input spectrum.

4. The asymptotic variance typically depends on the true system parameters θ_o, i.e. P = P(θ_o), which are unknown.

It should be emphasized that these difficulties appear in many experiment design problems. This is why there still are many interesting design problems to solve. However, due to the great advances in the optimization community during the last 15 years, there exist today many useful methods to reformulate and solve difficult optimization problems, several of which apply to our experiment design problems. In the following sections, we will show that the first three difficulties listed above may be solved. This is done by introducing a finite dimensional linear parametrization of the input spectrum and possibly the cross spectrum. Due to this parametrization, several experiment design problems can be reduced to tractable convex optimization problems. Many different quality measures can be fit into the framework that will be derived, as long as they are convex in P^{−1}. The last difficulty, that the optimal solution in general depends on the characteristics of the system to be identified, is inherent in almost all optimal designs. This is unavoidable. In a real application this fact must be handled, and there are at present very few systematic ways to do so. In Section 3.10 we will further discuss this topic. Let us now discuss and illustrate some of the considered constraints in more detail.

3.1.1 Quality Constraints

In this section we will illustrate some possible and relevant quality constraints that can be included in (3.1). Since we assume that variance errors are the main concern, it is natural that any considered quality measure of our models is a function of the (asymptotic) covariance matrix P in (2.14), provided the sample size is large.
Examples of such quality constraints are the classical alphabetical measures (1.13)-(1.16). The key expression that shows how the covariance matrix can be manipulated is (recapitulating (2.14))

P^{−1}(θ_o) = (1/2πλ_o) ∫_{−π}^{π} F(e^{jω}, θ_o) [Φ_u(ω), Φ_{ue}(ω); Φ*_{ue}(ω), λ_o] F*(e^{jω}, θ_o) dω   (3.2)

which shows that P^{−1} is affine in the input spectrum¹ Φ_u and the cross spectrum Φ_{ue}. Since these two spectra are the only quantities that can be used to shape P, the main step in obtaining tractable quality measures is consequently to make the constraint convex in P^{−1}. Let us illustrate how this can be done by way of an example.

Example 3.1 Consider the constraint

λ_max(P) ≤ γ   (3.3)

which is equivalent to

γI − P ≥ 0.   (3.4)

By Schur complements, (3.4) is equivalent to

[γI, I; I, P^{−1}] ≥ 0   (3.5)

which obviously is convex in P^{−1}.

Even though (3.5) is convex in P^{−1}, it is an infinite-dimensional constraint due to the frequency dependence of Φ_{χo}. Therefore, it is not straightforward to handle the constraint (3.5). Solutions to this will be presented in Section 3.2 and Section 3.3. These are based on different parametrizations of the spectrum Φ_{χo} that make it possible to obtain linear and finite parametrizations of P^{−1} of the form

P^{−1} = Σ_{k=−M}^{M} c̃_k P_k + P̄   (3.6)

where P_k, k = −M, · · · , M, and P̄ are constant matrices. The original design variable Φ_{χo} is replaced by the variables c̃_k, k = −M, · · · , M. Consequently, with a parametrization of the form (3.6) inserted in (3.5), a linear and finite dimensional constraint is obtained. The discussion around Example 3.1 has unveiled two major steps to obtain tractable quality constraints: first make the constraint convex in P^{−1}, then impose a finite and linear parametrization of P^{−1} through a suitable parametrization of Φ_{χo}. From now on we will assume that the cross spectrum is zero and only consider input design in open-loop.

¹The frequency argument will frequently be omitted in the presentation.
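The Schur complement step in Example 3.1 can be checked numerically (numpy assumed; P is a random positive definite matrix): λ_max(P) ≤ γ holds exactly when the block matrix in (3.5) is positive semidefinite.

```python
import numpy as np

def lmi_holds(P, gamma, tol=1e-9):
    """True iff [[gamma*I, I], [I, P^{-1}]] >= 0, cf. (3.5)."""
    n = P.shape[0]
    M = np.block([[gamma * np.eye(n), np.eye(n)],
                  [np.eye(n), np.linalg.inv(P)]])
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
P = A @ A.T + 0.1 * np.eye(3)              # a random positive definite P
lam_max = np.linalg.eigvalsh(P).max()
ok_above = lmi_holds(P, 1.05 * lam_max)    # gamma above lambda_max: feasible
ok_below = lmi_holds(P, 0.95 * lam_max)    # gamma below lambda_max: infeasible
```

The feasibility flips exactly at γ = λ_max(P), which is the equivalence that makes (3.3) usable inside a semidefinite program once P^{−1} is parametrized as in (3.6).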
The more general case of possible feedback in the design will be treated in Section 3.5. Let us now illustrate, by means of a very simple example, how the input spectrum may be parametrized.

Example 3.2 Let the model and the true system have the structure

y(t, \theta) = b_0 u(t-1) + b_1 u(t-2) + e(t)    (3.7)

i.e. a system where H = 1, see (2.2), and hence dH/dθ = 0. Now the asymptotic covariance (3.2) becomes

P^{-1} = \frac{1}{2\pi\lambda_o} \int_{-\pi}^{\pi} \begin{bmatrix} e^{-j\omega} \\ e^{-j2\omega} \end{bmatrix} \begin{bmatrix} e^{j\omega} & e^{j2\omega} \end{bmatrix} \Phi_u(\omega)\, d\omega.    (3.8)

A general spectrum can be written as

\Phi_u(\omega) = \sum_{k=-\infty}^{\infty} r_k e^{-j\omega k} \ge 0 \quad \forall \omega    (3.9)

where r_k are the auto-correlations of the input, i.e. r_k = E u(t)u(t-k). Assume that the noise variance λo = 1 and insert (3.9) into (3.8). This yields

P^{-1} = \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} = r_0 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + r_1 \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}    (3.10)

which is a linear parametrization of P^{-1} in r_0 and r_1 of the form (3.6).

Notice that for this example, only the first two auto-correlation coefficients of the input influence P^{-1}. This is a very important fact that will be explored further in the subsequent sections for more general model structures.

In Section 3.2 two families of parametrizations of an input spectrum will be introduced. They are denoted "finite dimensional spectrum parametrization" and "partial correlation parametrization". An example of a finite dimensional spectrum parametrization is

\Phi_u(\omega) = \sum_{k=-M}^{M} r_k e^{-j\omega k} \ge 0 \quad \forall \omega    (3.11)

where the coefficients r_k must be such that Φu(ω) ≥ 0, ∀ω. Notice that for all spectra of the form (3.11) with M ≥ 1 we obtain the parametrization (3.10) of P^{-1}. A corresponding partial correlation parametrization can be characterized by the finite expansion

\sum_{k=-M}^{M} r_k e^{-j\omega k}    (3.12)

which may not be a spectrum itself, but where the coefficients r_k are constrained such that there exists an extension r_{M+1}, r_{M+2}, ... that altogether yields a spectrum. This type of parametrization makes it possible to work with infinite expansions of the spectrum.
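The structure of (3.10) can be verified by simulation. The sketch below (MA(1) input with an illustrative coefficient c = 0.5; not a design from the thesis) estimates the sample covariance of the regressor φ(t) = [u(t−1), u(t−2)]^T for the FIR model (3.7) and confirms that it matches the Toeplitz matrix built from r_0 and r_1:

```python
import random

random.seed(0)
c, N = 0.5, 100_000            # MA(1) input u(t) = e(t) + c*e(t-1), e white N(0,1)
e = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
u = [e[t] + c * e[t - 1] for t in range(1, N + 1)]

# Sample autocorrelations r0 = E u(t)^2 and r1 = E u(t)u(t-1).
r0 = sum(x * x for x in u) / N
r1 = sum(u[t] * u[t - 1] for t in range(1, N)) / (N - 1)

# For the FIR model y = b0*u(t-1) + b1*u(t-2) + e the regressor is
# phi(t) = [u(t-1), u(t-2)], and (with lambda_o = 1) P^{-1} = E phi phi^T,
# which should reproduce the Toeplitz structure of (3.10).
Pinv = [[0.0, 0.0], [0.0, 0.0]]
for t in range(2, N):
    phi = (u[t - 1], u[t - 2])
    for i in range(2):
        for j in range(2):
            Pinv[i][j] += phi[i] * phi[j] / (N - 2)

# Theoretical values for this MA(1) input: r0 = 1 + c^2, r1 = c.
assert abs(r0 - (1 + c * c)) < 0.05 and abs(r1 - c) < 0.05
assert abs(Pinv[0][0] - r0) < 0.05 and abs(Pinv[0][1] - r1) < 0.05
```

Only r_0 and r_1 enter P^{-1}, exactly as noted above: any input with the same first two correlation coefficients yields the same asymptotic accuracy for this model.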
Let us now illustrate these two types of parametrizations on the second order FIR system (3.7).

Example 3.3 (Example 3.2 continued) Consider the following design problem.

\text{minimize}_{\Phi_u} \; \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega
\text{subject to} \; \det P^{-1} \ge 1    (3.13)

With the parametrization (3.11) we obtain the optimization problem

\text{minimize}_{r_0, \dots, r_M} \; r_0
\text{subject to} \; \det \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} \ge 1    (3.14)
r_0 + 2 r_1 \cos\omega + \dots + 2 r_M \cos M\omega \ge 0 \quad \forall \omega    (3.15)

The first constraint corresponds to the determinant of P^{-1} and the second constrains the new optimization variables r_0, ..., r_M to represent a finite spectrum. For M = 0, (3.15) gives r_1 = 0 and for M = 1 the corresponding condition becomes r_0 ≥ 2|r_1|. When M increases the feasible region for r_1 increases as well. Asymptotically the feasible region approaches r_0 ≥ |r_1|. The boundaries of (3.15) for different values of M are depicted in Figure 3.1 together with the constraint (3.14). It is easy to verify that the optimal design is given by r_0 = 1 and r_1 = 0, i.e. the optimal input is white noise with unit variance.

With a partial correlation parametrization of the form (3.12) with M = 1, the design problem (3.13) becomes

\text{minimize} \; r_0
\text{subject to} \; \det \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} \ge 1, \quad \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} \ge 0    (3.16)

The second constraint ensures that r_0 and r_1 are correlation coefficients. The boundary of this constraint is r_0 ≥ |r_1|. The optimal design is thus given by r_0 = 1 and r_1 = 0.

This is a very interesting example. It shows that the partial correlation parametrization yields a complete parametrization of P^{-1} with a minimal number of parameters (2 free parameters for this example). The finite spectrum parametrization does not yield a complete parametrization for any finite M. The optimal solution is, however, retrieved for both types of input parametrization and for any choice of M. This will of course depend on the chosen quality constraint, as will be illustrated in the next example.
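The gap between the two feasible regions in Example 3.3 can be checked directly. The sketch below (grid density and test point (r_0, r_1) = (1, 0.8) are illustrative choices) confirms that a pair satisfying the 2×2 Toeplitz condition of (3.16) need not be a nonnegative trigonometric polynomial in the sense of (3.15) with M = 1:

```python
import math

def min_spectrum_value(r, grid=2000):
    """Minimum over omega of Phi_u(w) = r[0] + 2*sum_k r[k]*cos(k*w), cf. (3.15)."""
    lo = float("inf")
    for i in range(grid):
        w = -math.pi + 2 * math.pi * i / grid
        lo = min(lo, r[0] + 2 * sum(r[k] * math.cos(k * w) for k in range(1, len(r))))
    return lo

def toeplitz2_psd(r0, r1):
    """2x2 Toeplitz PSD condition of (3.16), i.e. r0 >= |r1|."""
    return r0 >= 0 and r0 * r0 - r1 * r1 >= 0

# (r0, r1) = (1, 0.8): a valid partial correlation pair (extendable to a
# spectrum), but NOT itself a nonnegative spectrum with M = 1.
assert toeplitz2_psd(1.0, 0.8)
assert min_spectrum_value([1.0, 0.8]) < 0

# The optimum of (3.13): white noise, r0 = 1, r1 = 0, det P^{-1} = 1.
r0, r1 = 1.0, 0.0
assert r0 * r0 - r1 * r1 >= 1 and min_spectrum_value([r0, r1]) >= 0
```

This is the geometry of Figure 3.1: for M = 1 the finite spectrum region is r_0 ≥ 2|r_1|, while the partial correlation region is the larger wedge r_0 ≥ |r_1|.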
Let us now go beyond the classical alphabetical measures (1.13)-(1.16).

Example 3.4 (Example 3.3 continued) The variance of the frequency response for the second order FIR system (3.7) can be approximated by

\mathrm{Var}\{b_0 e^{-j\omega} + b_1 e^{-2j\omega}\} \approx \Gamma^* P \Gamma / N

where Γ*(ω) = [e^{jω} e^{2jω}].

Figure 3.1: Illustration of the optimization problem (3.13). The boundary of (3.14) is given by the dashed line. The thin lines correspond to the bounds of (3.15) for M = 0, 1, 2, 4. The thick line is the boundary of (3.16). The optimal point is given by the cross.

Now consider the design problem

\text{minimize}_{\Phi_u} \; \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega
\text{subject to} \; \frac{1-a^2}{|1-ae^{-j\omega}|^2} \Gamma^*(\omega) P \Gamma(\omega) \le 1 \quad \forall \omega    (3.17)

where the quality constraint is a frequency-wise constraint on the variance of the system's frequency response. The constraint in (3.17) can be rewritten by Schur complements as

P^{-1} - \frac{1-a^2}{|1-ae^{-j\omega}|^2} \Gamma(\omega) \Gamma^*(\omega) \ge 0 \quad \forall \omega

which with the parametrization (3.11) or (3.12) gives the constraint

|1-ae^{-j\omega}|^2 \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix} - (1-a^2) \begin{bmatrix} 1 & e^{j\omega} \\ e^{-j\omega} & 1 \end{bmatrix} \ge 0 \quad \forall \omega    (3.18)

First of all, the performance constraint is now a linear inequality in the variables r_0 and r_1. The complication is then the frequency-wise constraint. One solution is to sample the frequency axis and obtain one inequality constraint for each sample point. An alternative solution is to use the fact that (3.18) can be seen as a constraint on a finite-dimensional multivariable spectrum. Such constraints can be converted into linear finite dimensional constraints, as will be shown in Section 3.2.

Solutions and constraints for (3.17) are presented in Figure 3.2. Solutions for a = 0, 0.2, 0.4, 0.6, 0.8 with the partial correlation parametrization (3.12) of the input spectrum are given by (*). Corresponding solutions for a white input are given by (x). For a = 0 the solutions coincide.
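The frequency-gridding route mentioned above is easy to sketch. The following fragment (the grid density, tolerance, and test points are ad hoc illustrative choices, not values from the thesis) checks (3.18) on a grid for a = 0.6, using the fact that a 2×2 Hermitian matrix is PSD iff its diagonal entries and determinant are nonnegative. It shows that a white input needs r_0 = 8 at the binding frequency ω = 0, while a correlated pair achieves the constraint with less power:

```python
import math

def feasible(r0, r1, a=0.6, grid=2000, tol=1e-9):
    """Grid check of (3.18): |1-a e^{-jw}|^2 [[r0,r1],[r1,r0]]
    - (1-a^2) [[1, e^{jw}],[e^{-jw}, 1]] >= 0 for all w."""
    c = 1.0 - a * a
    for i in range(grid + 1):
        w = -math.pi + 2.0 * math.pi * i / grid
        m = 1.0 - 2.0 * a * math.cos(w) + a * a      # |1 - a e^{-jw}|^2
        diag = m * r0 - c
        # off-diagonal magnitude: |m*r1 - c*e^{jw}|^2
        det = diag * diag - (m * m * r1 * r1 - 2.0 * m * r1 * c * math.cos(w) + c * c)
        if diag < -tol or det < -tol:
            return False
    return True

# White input (r1 = 0): the binding frequency is w = 0, where the constraint
# requires r0 >= 2(1 + a)/(1 - a) = 8 for a = 0.6.
assert feasible(8.01, 0.0) and not feasible(7.9, 0.0)
# A positively correlated input satisfies the same constraint with less power:
assert feasible(6.05, 2.0) and not feasible(5.9, 2.0)
```

This reproduces the qualitative message of Figure 3.2: for a ≠ 0 the white input is not the cheapest spectrum meeting the frequency-wise variance bound.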
For larger values of a, the optimal solution is obtained for r_1 ≠ 0. Thus the white input is not optimal when a ≠ 0.

We can also see what the solutions are when the finite spectrum parametrization (3.11) is used. The boundary of (3.18) for a = 0.6 is given in Figure 3.2 by the dotted line. Furthermore, the bounds for the parametrization (3.11) are given by the thin lines as in Figure 3.1. The solutions will lie on the dotted line to the left of the optimal solution, depending on the value of M. The solutions for M = 0 and M = 1 are plotted. It is easy to verify that for a large enough M the solution with the finite spectrum parametrization (3.11) will coincide with the optimal solution obtained by the partial correlation parametrization. Solutions for the other values of a can easily be extracted from the figure in a similar manner.

In Section 3.2 we will elaborate further on and generalize the concepts of finite spectrum parametrizations and partial correlation parametrizations. Example 3.4 illustrates some of the main differences between these two concepts. The finite spectrum parametrization does in general require more parameters than the partial correlation parametrization to yield the same solution. The main motivation to use the finite spectrum parametrization is its capability to handle frequency-wise constraints on the input or output spectra. Such constraints cannot be treated by a partial correlation parametrization, since we are not working with the complete spectrum.

In Example 3.4, we introduced a very simple frequency-by-frequency constraint. In control applications, it is common to have frequency-by-frequency conditions on the error of the frequency function estimate. To illustrate this, consider the weighted relative error

\Delta(e^{j\omega}, \theta) = T(e^{j\omega}) \frac{G_o(e^{j\omega}) - G(e^{j\omega}, \theta)}{G(e^{j\omega}, \theta)}    (3.19)

Figure 3.2: Illustration of Example 3.4.
Solid lines correspond to different bounds of spectrum parametrizations as in Figure 3.1. Solutions are given for partial correlation parametrizations (*) and white noise inputs (+) for a ∈ {0, 0.2, 0.4, 0.6, 0.8}. The solution for a = 0 is given by r_0 = 2 and r_1 = 0. The dotted line describes the boundary of the quality constraint in (3.17) for a = 0.6. From this boundary, the solution for the finite spectrum parametrization with M = 1 can be graphically obtained (+).

where G_o and G(θ) are the true system and the model, respectively, and where T is a weighting function. When T is equal to the designed complementary sensitivity function, the H∞-norm of (3.19) has been considered a relevant measure of both robust stability and robust performance (Morari and Zafiriou, 1989; Zhou et al., 1996; Hjalmarsson and Jansson, 2003); e.g. ‖∆(θ)‖∞ < 1 is a classical robust stability condition. When the model G(θ) is obtained from an identification experiment it will lie in an uncertainty set. A reasonable objective is therefore to design the identification experiment such that ∆(θ) becomes small for all models in such an uncertainty set.

One way to measure the size of ∆(θ) is by considering its variance, which, using a first order Taylor approximation, can be expressed as

\mathrm{Var}\,\Delta(e^{j\omega}, \hat{\theta}_N) \approx \left| \frac{T(e^{j\omega})}{G_o(e^{j\omega})} \right|^2 \mathrm{Var}\, G(e^{j\omega}, \hat{\theta}_N).

By using the variance approximation (2.23) of G, the variance of ∆(θ) can be approximated as

\mathrm{Var}\,\Delta(e^{j\omega}, \hat{\theta}_N) \approx \left| \frac{T(e^{j\omega})}{G_o(e^{j\omega})} \right|^2 \frac{1}{N} \frac{dG^*(e^{j\omega}, \theta_o)}{d\theta} P \frac{dG(e^{j\omega}, \theta_o)}{d\theta}    (3.20)

which is an explicit function of P. Different quality measures can now be formulated as constraints on some norm of (3.20). Some examples of constraints are

\int_{-\pi}^{\pi} \left| \frac{T(e^{j\omega})}{G_o(e^{j\omega})} \right|^2 \frac{dG^*(e^{j\omega}, \theta_o)}{d\theta} P \frac{dG(e^{j\omega}, \theta_o)}{d\theta}\, d\omega \le 1    (3.21)

and

\max_{\omega} \left| \frac{T(e^{j\omega})}{G_o(e^{j\omega})} \right|^2 \frac{dG^*(e^{j\omega}, \theta_o)}{d\theta} P \frac{dG(e^{j\omega}, \theta_o)}{d\theta} \le 1.    (3.22)

In Section 3.6, we will show how to treat quality constraints such as (3.21) and (3.22).
One alternative to different measures of the variance of ∆ is to use the confidence bound in (2.20). This gives

|\Delta(e^{j\omega}, \theta)| \le 1 \quad \forall \omega, \; \forall \theta \in U_\theta, \qquad U_\theta = \{\theta : N(\theta - \theta_o)^T P^{-1}(\Phi_u)(\theta - \theta_o) \le \chi\}.    (3.23)

This constraint implies that ‖∆‖∞ ≤ 1 for all models in the confidence region associated with the (to be) identified model. Constraints like (3.23) are treated in Section 3.7.

Figure 3.3 illustrates different measures of ∆ that today can be handled in an optimal experiment design scheme. The constraints in the right column all relate back to the high order variance expression (2.24). For designs based on this type of constraints we refer to Section 1.2.2 and the references therein. It should be emphasized that the developed methods are not restricted to the ∆-function. Constraints such as (3.21), (3.22) and (3.23) only serve as good illustrations of the potential of the newly derived methods.

Figure 3.3: The diagram presents different ways to incorporate a performance measure in terms of ∆, the weighted relative model error, into performance constraints for input design. Five possible performance constraints are represented. The two expressions in the upper shaded area represent H2-norms of the covariance of ∆. The three expressions in the lower shaded area correspond to different H∞-norms of ∆.

3.1.2 Signal Constraints

To obtain well-posed designs, signal constraints must be included. The framework allows for several different types of signal constraints as long as they are linear in the spectrum Φχo.
Examples of constraints are energy constraints like

\frac{1}{2\pi} \int_{-\pi}^{\pi} |W_u(e^{j\omega})|^2 \Phi_u(\omega)\, d\omega \le 1

or frequency-wise constraints like

\alpha_u(\omega) \le \Phi_u(\omega) \le \beta_u(\omega) \quad \forall \omega.

Which signal constraints can be included will depend on the parametrization of the involved spectra. Different signal constraints will be treated in Section 3.4.

3.1.3 Objective Functions

Traditionally, the objective in experiment design has been to optimize some performance criterion subject to signal constraints, see e.g. (Goodwin and Payne, 1977; Yuan and Ljung, 1985; Gevers and Ljung, 1986; Hjalmarsson et al., 1996; Zhu and van den Bosch, 2000). For industrial applications, however, perhaps more relevant measures are excitation level and experiment time, treated e.g. in (Bombois et al., 2004d) and (Rivera et al., 2003). In the presented framework for experiment design there is a large flexibility in the choice of objective function. One can, e.g., choose whether the objective is to optimize some model quality measure (given bounds on, e.g., input energy) or to optimize some signal quantity such as the input energy (given some model quality constraints).

3.1.4 An Introductory Example

An optimal experiment design problem can now be formulated based on the different parts described in Section 3.1.1, Section 3.1.2 and Section 3.1.3. To illustrate this, consider input design for the system described by

y(t) = \frac{B_o(q)}{A_o(q)} u(t) + \frac{1}{A_o(q)} e(t)    (3.24)

where

A_o(q) = 1 - 1.99185q^{-1} + 2.20265q^{-2} - 1.84083q^{-3} + 0.89413q^{-4}
B_o(q) = 0.10276q^{-3} + 0.18123q^{-4}

and e(t) is zero mean white noise with variance 0.05.

Figure 3.4: Input design for the resonant system. Thin solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system.
This is a resonant system that describes a flexible transmission system, proposed in (Landau et al., 1995a) as a benchmark problem for robust control design. The input design problem is formulated as:

\text{minimize}_{\Phi_u} \; \alpha
\text{subject to} \; |\Delta(e^{j\omega}, \theta)| \le \gamma \quad \forall \omega, \; \forall \theta : N(\theta - \theta_o)^T P^{-1}(\Phi_u)(\theta - \theta_o) \le \chi
\frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega \le \alpha
0 \le \Phi_u(\omega) \le \beta(\omega)    (3.25)

The objective of this input design problem is to find the input spectrum with the least energy that satisfies (3.23). There may also exist a frequency-by-frequency constraint on the input spectrum, here represented by β(ω). We choose the input spectrum to be parametrized as

\Phi_u(\omega) = \sum_{k=-M}^{M} r_k e^{-j\omega k} \ge 0 \quad \forall \omega.    (3.26)

The solution to (3.25) without a frequency-by-frequency upper bound on the spectrum (i.e. β(ω) = ∞) is shown in Figure 3.4. Here we have used M = 35. To reduce the required input power, the design concentrates most of the input power around the first resonance peak, which intuitively seems reasonable. In practice there may exist constraints on the input excitation in different frequency bands. Figure 3.5 illustrates a solution when there is a frequency-wise bound on the input spectrum that, e.g., constrains possible excitation around the first resonance peak. In this setting we have used M = 50. This resonant system will also appear in Chapter 5.

Figure 3.5: Input design for the resonant system. Thin solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thick solid line: upper bound on Φu.

After this introduction we will now give a more theoretical description of the different parts of the framework. The subsequent sections are organized as follows. Section 3.2 introduces two methods for parametrizing a spectrum. These parametrizations are used for parametrizing the asymptotic covariance matrix in Section 3.3.
Different signal constraints are considered in Section 3.4, and experiment design in closed-loop is treated in Section 3.5. How to handle different quality constraints is described in Section 3.6 and Section 3.7. In Section 3.8 we will consider optimal design in open-loop for the case of biased noise models. Some comments on the computational complexity are given in Section 3.9, and different aspects of robustness regarding the dependence of the solution on the true system are discussed in Section 3.10. Finally, a summary of the framework together with a numerical example is given in Section 3.11.

3.2 Parametrization and Realization of the Input Spectrum

We will consider identification in open-loop in Sections 3.2-3.4. The generalization to closed-loop operation is given in Section 3.5.

3.2.1 Introduction

The expression (2.14) for the inverse of the asymptotic covariance matrix P shows that the spectrum is the only input related quantity that has any influence on P. Since any quality measure is a function of P asymptotically in the sample size when only variance errors are present, this means that the input spectrum is the only quantity that can be used to negotiate quality constraints.

Generally, a spectrum may be written

\Phi_u(\omega) = \sum_{k=-\infty}^{\infty} \tilde{c}_k B_k(e^{j\omega})    (3.27)

for some basis functions {B_k}_{k=-\infty}^{\infty} which span L2. Without loss of generality we will assume that B_{-k} = B_k^* and c̃_{-k} = c̃_k, so that the spectrum is uniquely characterized by c̃_0, c̃_1, c̃_2, .... We will also assume that B_k(e^{-j\omega}) = B_k^*(e^{j\omega}) so that c̃_k ∈ R. The coefficients c̃_k must be such that

\Phi_u(\omega) \ge 0, \quad \forall \omega.    (3.28)

For the most common choice of basis functions, B_k(e^{j\omega}) = e^{-j\omega k}, the coefficients c̃_k have the interpretation of auto-correlations: c̃_k = r_k ≜ E u(t)u(t-k). We will therefore reserve the notation r_k for the parameters when working with complex exponential basis functions.
As we will see in Sections 3.3, 3.6 and 3.7, many quality constraints can be transformed such that they become linear in the input spectrum. It is thus natural to parametrize the spectrum in terms of the coefficients c̃_k. However, it is impractical to use an infinite number of parameters, so the parametrization has to be restricted. One possibility is to use a finite dimensional parametrization², i.e.

\Phi_u(\omega) = \sum_{k=-(M-1)}^{M-1} \tilde{c}_{|k|} B_k(e^{j\omega})    (3.29)

for some positive integer M. Here one has to impose the condition (3.28) to ensure that Φu indeed is a spectrum. We will denote this type of approach a "finite dimensional spectrum parametrization". Instead of parametrizing the spectrum, one may equivalently work with a parametrization of the positive real part

\Phi_u(\omega) = \Psi(e^{j\omega}) + \Psi^*(e^{j\omega}), \qquad \Psi(e^{j\omega}) = \sum_{k=0}^{M-1} c_k B_k(e^{j\omega}).    (3.30)

This will be our preferred choice when it comes to finite dimensional spectrum parametrizations.

Alternatively, one may use a partial expansion

\sum_{k=-(M-1)}^{M-1} \tilde{c}_{|k|} B_k(e^{j\omega})    (3.31)

which may not be a spectrum in itself, but whose coefficients are constrained such that there exist additional coefficients c̃_{|k|}, k = M, M+1, ... such that the expansion (3.27) satisfies the non-negativity condition (3.28). This approach, which we denote a "partial correlation parametrization", thus enables one to work with infinite dimensional expansions.

When the input design problem under consideration only depends on the first M expansion coefficients c̃_k, k = 0, 1, ..., M-1, the partial correlation parametrization has lower complexity than the finite dimensional spectrum parametrization. On the other hand, when there are frequency-by-frequency constraints on the input (and output) spectrum of the system, a finite dimensional spectrum parametrization is in general required.

²We will frequently use the notation c̃_{|k|} for the free parameters to emphasize that the basis functions have been chosen such that c̃_{-k} = c̃_k.
Below we will discuss these two approaches in more detail.

3.2.2 Finite Dimensional Spectrum Parametrization

With a finite dimensional spectrum parametrization such as (3.30), the parameters c_k have to be constrained such that the positivity condition (3.28) holds. To handle this condition, we will use ideas from FIR-filter design (Wu et al., 1996). This approach is based on the positive real lemma, which is a consequence of the Kalman-Yakubovich-Popov (KYP) lemma, see e.g. (Yakubovich, 1962; Rantzer, 1996). The key idea is to postulate that the spectrum should be realizable using an M-th order FIR-filter. Using this restriction on the input spectrum, finite-dimensional constraints can be used to represent constraints on the input and the output. Since any spectrum can be approximated by an FIR-process to any desired accuracy, the approach is in principle generally applicable. However, when M becomes too large, computational complexity becomes an issue.

This idea was originally introduced for input design in (Lindqvist and Hjalmarsson, 2001). Here we will generalize the idea by imposing a finite-dimensional linear parametrization of the input spectrum in which the parametrization used in (Lindqvist and Hjalmarsson, 2001) appears as a special case. We also point out that this type of parametrization has been used in parameter identification (Stoica et al., 2000).

We will employ the parametrization (3.30) of the positive real part of the spectrum, where {B_k} is a set of known proper stable finite dimensional rational transfer functions, e.g. Laguerre functions (Wahlberg, 1991) or Kautz functions (Wahlberg, 1994). When B_k(e^{jω}) = e^{-jωk} we have the FIR case, with the sequence {c_k} corresponding to the correlation coefficients {r_k}. Notice that it is not necessary for the basis functions to be orthogonal; but, naturally, they should be linearly independent.
To ensure that the spectral constraint (3.28) is satisfied, the following result may be used.

Lemma 3.1 Let A, B, C, D be a controllable state-space realization of the positive real part of the input spectrum, \Psi(e^{j\omega}) = \sum_{k=0}^{M-1} c_k B_k(e^{j\omega}). Then there exists a Q = Q^T such that

K(Q, \{A, B, C, D\}) \triangleq \begin{bmatrix} Q - A^T Q A & -A^T Q B \\ -B^T Q A & -B^T Q B \end{bmatrix} + \begin{bmatrix} 0 & C^T \\ C & D + D^T \end{bmatrix} \ge 0    (3.32)

if and only if

\Phi_u(\omega) = \sum_{k=0}^{M-1} c_k [B_k(e^{j\omega}) + B_k^*(e^{j\omega})] \ge 0 \quad \forall \omega.

Proof: This is an application of the Positive Real Lemma (Yakubovich, 1962).

The idea is to let A, B, C, D be a state-space realization of the positive real part of the input spectrum, \Psi(e^{j\omega}) = \sum_{k=0}^{M-1} c_k B_k(e^{j\omega}), where {c_k} appears linearly in C and D. It is easy to construct such a realization since {c_k} appears linearly in Ψ(e^{jω}). Given this realization and the symmetric matrix Q, the constraint Φu(ω) ≥ 0 may be replaced with the linear matrix inequality (3.32). We illustrate this with an example.

Example 3.5 When the input is shaped by an FIR filter, the positive real part of the spectrum becomes

\Psi(e^{j\omega}) = \frac{1}{2} r_0 + \sum_{k=1}^{M-1} r_k e^{-j\omega k}.

For an FIR system, a natural choice of state-space description for the positive real part is the controllable form:

A = \begin{bmatrix} O_{1\times(M-2)} & 0 \\ I_{M-2} & O_{(M-2)\times 1} \end{bmatrix}, \quad B = [1\; 0\; \dots\; 0]^T, \quad C = [r_1\; r_2\; \dots\; r_{M-1}], \quad D = \frac{1}{2} r_0,    (3.33)

where O_{m×k} is the zero matrix of size m by k and I_m is the identity matrix of size m by m. With this parametrization, the correlation sequence appears linearly in the inequality (3.32) through C and D. Hence (3.32) becomes an LMI in Q and r = [r_0, ..., r_{M-1}]^T.

When a linear and finite-dimensional parametrization of the input spectrum is used, it is easy to construct a state-space description of the positive real part of the spectrum, see e.g. Example 3.5. Given that such a state-space description is available, there exist directly applicable results to perform spectral factorization.
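The delay-line realization (3.33) can be sanity-checked numerically: the transfer function C(zI − A)^{-1}B + D of the realization must reproduce Ψ(e^{jω}) evaluated directly. The sketch below (the correlation values are illustrative, and the Gaussian-elimination solver is a self-contained stand-in for a linear algebra library) performs this check for M = 4:

```python
import cmath

def solve(Ac, b):
    """Gaussian elimination with partial pivoting for small complex systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(Ac)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# FIR case of Example 3.5 with M = 4: correlations r0..r3 (illustrative values).
r = [2.0, 0.5, 0.3, 0.1]
n = len(r) - 1                         # state dimension M - 1 = 3
# Controllable (delay-line) realization (3.33): A shifts the state down,
# B injects the input, C holds r1..r_{M-1}, D = r0/2.
A = [[0.0] * n for _ in range(n)]
for i in range(1, n):
    A[i][i - 1] = 1.0
B = [1.0] + [0.0] * (n - 1)
C = r[1:]
D = 0.5 * r[0]

for w in [0.0, 0.7, 2.0, 3.1]:
    z = cmath.exp(1j * w)
    # Psi(z) = C (zI - A)^{-1} B + D via the realization ...
    zIA = [[z * (i == j) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(zIA, [complex(b) for b in B])
    psi_ss = sum(C[i] * x[i] for i in range(n)) + D
    # ... must equal the positive real part r0/2 + sum_k r_k z^{-k} directly.
    psi_direct = 0.5 * r[0] + sum(r[k] * z ** (-k) for k in range(1, len(r)))
    assert abs(psi_ss - psi_direct) < 1e-9
```

Once such a realization is in hand, the positivity of Φu = Ψ + Ψ* is encoded by the LMI (3.32) in Q and r, which is what an SDP solver would consume.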
By solving an algebraic Riccati equation it is possible to obtain an innovations representation corresponding to the desired spectral properties, see (Anderson and Moore, 1979, Chap. 9) and (Lindqvist, 2001).

One may also use discrete spectra, i.e. spectra of periodic inputs. A discrete spectrum with K spectral lines distributed over (−π, π] is given by

\Phi(\omega) = \sum_{k=-\infty}^{\infty} \tilde{c}_k \delta(\omega - \omega_k)    (3.34)

where

\tilde{c}_k \ge 0, \qquad \tilde{c}_{k+K} = \tilde{c}_k, \qquad \tilde{c}_{-k} = \tilde{c}_k.    (3.35)

For the frequencies it holds that

\omega_{k+K} = \omega_k + 2\pi, \qquad \omega_{-k} = -\omega_k.    (3.36)

For such spectra the positive real lemma simplifies to the first condition in (3.35). This type of parametrization will be used in Chapter 4.

3.2.3 Partial Correlation Parametrization

When using a partial correlation parametrization one must ensure that there exists an extension c̃_M, c̃_{M+1}, c̃_{M+2}, ... of the sequence c̃_0, ..., c̃_{M-1} such that the corresponding basis function expansion (3.27) defines a spectrum. Here, we will restrict attention to the case where the basis functions are complex exponentials, so that the parameters {c̃_k} are the autocorrelation coefficients {r_k}. The correlation extension problem³ is then known as the trigonometric moment problem, or as the Carathéodory extension problem. It is well known that a necessary and sufficient condition for the existence of such an extension is that the Toeplitz matrix

T = \begin{bmatrix} r_0 & r_1 & \cdots & r_{M-1} \\ r_{-1} & r_0 & \cdots & r_{M-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{-(M-1)} & r_{-(M-2)} & \cdots & r_0 \end{bmatrix}    (3.37)

is positive definite (Grenander and Szegö, 1958; Byrnes et al., 2001). Notice that T ≥ 0 is an LMI in the r_k:s and hence a convex constraint.

A complete characterization of all spectra having r_k, k = 0, ..., M-1 as first expansion coefficients is given by the Schur parameters (Schur, 1918; Porat, 1994). For rational expansions, the so-called maximum entropy solution is to use an all-pole, or AutoRegressive (AR), filter of order M − 1.

³or, as is more common, the covariance extension problem.
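The maximum entropy (AR) extension can be computed with the classical Levinson-Durbin recursion, which solves the Yule-Walker equations and succeeds (all reflection coefficients strictly inside the unit interval) exactly when the Toeplitz matrix (3.37) is positive definite. A minimal sketch (the AR(1) test sequence is an illustrative choice, not from the thesis):

```python
def levinson(r):
    """Levinson-Durbin recursion: AR coefficients of the maximum entropy
    extension of the partial correlation sequence r[0..M-1], together with
    the prediction error variance E."""
    a, E = [], r[0]                       # AR coefficients; innovation variance
    for m in range(1, len(r)):
        k = (r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))) / E
        # a_new[i] = a[i] - k * a[m-2-i], then append the reflection coeff k.
        a = [ai - k * aj for ai, aj in zip(a, reversed(a))] + [k]
        E *= 1.0 - k * k
    return a, E

# Autocorrelations of an AR(1) process x(t) = 0.5 x(t-1) + e(t), Var e = 1:
# r_k = 0.5^k / (1 - 0.25).
r = [1 / 0.75, 0.5 / 0.75, 0.25 / 0.75]
a, E = levinson(r)
# The recursion recovers the generating filter: a = [0.5, 0], innovation var 1.
assert abs(a[0] - 0.5) < 1e-12 and abs(a[1]) < 1e-12
assert abs(E - 1.0) < 1e-12
```

The returned all-pole filter driven by white noise of variance E realizes a spectrum whose first M correlation coefficients match the given partial sequence.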
The solution is easily obtained via the Yule-Walker equations (Stoica and Moses, 1997). When there are prescribed zeros, the solution can be computed via convex optimization (Byrnes et al., 2001).

It is also possible to use a discrete spectrum of the form (3.34). All partial correlation sequences r_0, ..., r_{M-1} can be generated by a discrete spectrum (Grenander and Szegö, 1958; Payne and Goodwin, 1974). However, the frequencies ω_k do not necessarily coincide with the fundamental frequencies 2πk/N. In (Hildebrand and Gevers, 2003) one solution is presented where M sinusoids are determined. The procedure becomes quite involved since, apart from the amplitudes, also the locations of the frequencies ω_k have to be determined. One may also attempt to realize a discrete spectrum of the form (3.34) with the frequencies evenly distributed, ω_k = 2πk/K. It holds that

r_k = \sum_{l=0}^{K-1} \tilde{c}_l e^{j\frac{2\pi}{K} kl}, \quad k = 0, \dots, M-1.    (3.38)

Hence, if, for given r_0, ..., r_{M-1}, there exists a feasible solution {c̃_k} to the constraints (3.35) and (3.38), the spectrum (3.34) has the desired partial correlation sequence. Notice that there is no guarantee that such a solution exists. In general, increasing the number of spectral lines K may increase the chances of a feasible solution.

Before we proceed, we remark that the discussion in this section also applies to parametrizations of the type

B_k(e^{j\omega}) = L(e^{j\omega}) e^{-j\omega k}, \quad L(e^{j\omega}) > 0.    (3.39)

Since L is positive, it must hold that \sum_k \tilde{c}_k e^{-j\omega k} also is positive and, hence, the discussion above applies to this factor. This type of basis functions will appear in Example 3.8 below.

3.2.4 Summary

We have in this section described different ways of parametrizing the input spectrum. The starting point has been the finite expansion

\sum_{k=-(M-1)}^{M-1} \tilde{c}_{|k|} B_k(e^{j\omega}).    (3.40)

Two principally different approaches have been delineated:

• Finite dimensional spectrum parametrization. In this approach, one includes the constraint that (3.40) is a spectrum. As shown in Section 3.2.2, doing this via the KYP lemma leads to a convex constraint. For discrete spectra, the condition simplifies to the scalar non-negativity conditions (3.35).

• Partial correlation parametrization. Here one includes a convex constraint that ensures that {c̃_k}_{k=-(M-1)}^{M-1} is a partial correlation sequence. These parameters are shaped according to the design criteria that exist. Then, in a second step, stochastic realization theory is used to extend the resulting expansion (3.40) to a bona fide spectrum

\Phi_u(\omega) = \sum_{k=-\infty}^{\infty} \tilde{c}_{|k|} B_k(e^{j\omega}).

In Section 3.2.3 we saw that there exist a number of methods for this.

The partial correlation parametrization has the advantage of keeping the number of free parameters to a minimum. One disadvantage is that spectral constraints on signals cannot be handled, as one is not working with the complete spectrum. This is, as we will see in Section 3.4, not a problem with the finite dimensional spectrum parametrization. The positivity condition employed in the finite dimensional spectrum parametrization is more restrictive than the partial correlation sequence condition. Hence the former approach leads in general to more conservative results than the latter.

A key insight is that both types of parametrizations are based on a finite dimensional parametrization of the type (3.40), which leads to a linear parametrization of P^{-1} in the free variables. We shall therefore in Section 3.6 and Section 3.7 see how various quality constraints can be transformed to be linear in P^{-1}. Before this, however, we analyze how the asymptotic covariance matrix P, defined in (2.12), is parametrized by the input spectrum.
3.3 Parametrizations of the Covariance Matrix

In this section we will consider how to obtain a linear and finite dimensional parametrization of the inverse covariance matrix P^{-1}. In Section 3.3.1 the parametrizations are based on partial correlation parametrizations of the input spectrum. This leads to complete parametrizations of P^{-1}. In Section 3.3.2 another type of parametrization, based on a finite dimensional spectrum parametrization, is presented.

3.3.1 Complete Parametrizations of the Covariance Matrix

In this section we will consider how to parametrize the input spectrum such that all possible covariance matrices P, defined in (2.12), can be generated. In particular we are interested in minimal parametrizations, i.e. those using a minimal number of parameters. Such parametrizations are of interest from a computational point of view. The starting point is the expression (2.14) for the inverse of the covariance matrix P, which for open-loop operation is given by

P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o} \int_{-\pi}^{\pi} F_u(e^{j\omega}, \theta_o) \Phi_u(\omega) F_u^*(e^{j\omega}, \theta_o)\, d\omega + R_o(\theta_o)    (3.41)

where Φu is the spectrum of the input and F_u(\theta_o) = H^{-1}(\theta_o) \frac{dG(\theta_o)}{d\theta}. Furthermore, R_o is defined as

R_o(\theta_o) = \frac{1}{2\pi} \int_{-\pi}^{\pi} F_e(e^{j\omega}, \theta_o) F_e^*(e^{j\omega}, \theta_o)\, d\omega    (3.42)

where F_e(\theta_o) = H^{-1}(\theta_o) \frac{dH(\theta_o)}{d\theta}. Since the elements of F_u span a linear subspace, it follows that the set of all covariance matrices can be parametrized in terms of finite dimensional parametrizations of Φu. In the sub-sections that follow we will discuss some possibilities that exist to this end. As a preparation we characterize the space spanned by the elements of F_u(\theta_o) F_u^*(\theta_o). We denote this space by X(θo) and we will assume that it has dimension 2n_G − 1 for some non-negative integer n_G. We motivate this assumption with an example.
Example 3.6 Consider a Box-Jenkins model structure

y(t) = \frac{q^{-n_k} B(q, \theta)}{A(q, \theta)} u(t) + \frac{C(q, \theta)}{D(q, \theta)} e(t)

where A(q, θ), B(q, θ), C(q, θ) and D(q, θ) are defined by (2.4)-(2.7). Then

F_u(q, \theta_o) = H^{-1}(q, \theta_o) \frac{dG(q, \theta_o)}{d\theta} = H^{-1}(q, \theta_o) \left[ \frac{q^{-n_k}}{A(q,\theta_o)}, \;\dots,\; \frac{q^{-n_k-n_b+1}}{A(q,\theta_o)}, \; -\frac{q^{-1} B(q,\theta_o)}{A^2(q,\theta_o)}, \;\dots,\; -\frac{q^{-n_a} B(q,\theta_o)}{A^2(q,\theta_o)}, \; 0, \;\dots,\; 0 \right]^T    (3.43)

which, assuming B(θo) and A(θo) to be coprime, implies that

X(\theta_o) = \mathrm{Span}\left\{ e^{-j(n_G-1)\omega} L^{-1}(e^{j\omega}, \theta_o), \;\dots,\; e^{j(n_G-1)\omega} L^{-1}(e^{j\omega}, \theta_o) \right\}    (3.44)

where n_G = n_b + n_a and

L(e^{j\omega}, \theta_o) = |H(e^{j\omega}, \theta_o)|^2 |A(e^{j\omega}, \theta_o)|^4.    (3.45)

Since the elements of F_u are rational and have their poles bounded away from the unit circle, it follows that X(θo) ⊂ L2. Let {B_k(θo), k = −(n_G−1), ..., n_G−1} denote an orthonormal basis for X(θo) which is such that⁴ B_{-k}(θo) = B_k^*(θo). We can thus write

F_u(e^{j\omega}, \theta_o) F_u^*(e^{j\omega}, \theta_o) = \sum_{k=-(n_G-1)}^{n_G-1} F_k(\theta_o) B_k(e^{j\omega}, \theta_o)    (3.46)

for some matrices F_k(θo) ∈ R^{n×n}. Next, we will use (3.46) to characterize the set of asymptotic covariance matrices defined in (3.41).

Subspace expansions

We can write any input spectrum on the form

\Phi_u(\omega) = \sum_{k=-(n_G-1)}^{n_G-1} \tilde{c}_{|k|} B_k(e^{j\omega}, \theta_o) + \Phi_u^{\perp}(\omega)    (3.47)

where Φu⊥ is orthogonal to X(θo). Using (3.46) and (3.47) in (3.41) gives

P^{-1}(\theta_o) = \frac{1}{\lambda_o} \sum_{k=-(n_G-1)}^{n_G-1} \tilde{c}_{|k|} F_k(\theta_o) + R_o(\theta_o).    (3.48)

This shows that the finite sequence c̃_0, ..., c̃_{n_G-1} completely parametrizes all possible covariance matrices. It is also clear that the expansion does not have to be orthonormal as long as the basis functions in (3.47) span X(θo). Notice that then (3.48) no longer holds as written; linear combinations of F_l(θo), l = −(n_G−1), ..., n_G−1 will replace F_k(θo).

⁴Due to the symmetry of F_u(e^{jω}, θo) F_u^*(e^{jω}, θo) this is always possible, cf. Example 3.6.
Example 3.7 (Example 3.6 continued). The elements in (3.44) can be used as the first basis functions in (3.47) in the Box-Jenkins case.

Oblique expansions

It is not necessary that the basis functions are elements of the subspace $X(\theta_o)$. We illustrate this with an example.

Example 3.8 (Example 3.6 continued). Let the spectrum be parametrized with the basis functions

$$B_k(e^{j\omega},\theta_o) = L(e^{j\omega},\theta_o)\,e^{-j\omega k} \quad (3.49)$$

where $L$ is given by (3.45). It then follows from (3.41) and (3.43) that

$$P^{-1}(\theta_o) = \sum_{k=-(n_G-1)}^{n_G-1} \tilde c_{|k|}\,L_k(\theta_o) + R_o(\theta_o) \quad (3.50)$$

for some matrices $L_k(\theta_o) \in \mathbb{R}^{n\times n}$. Hence, $\tilde c_0,\ldots,\tilde c_{n_G-1}$ parametrize all possible covariance matrices. Notice that $L(e^{j\omega},\theta_o)e^{-j\omega k}$ is not orthogonal to $X(\theta_o)$ when $|k| < n_G$, but these terms in general do not belong to $X(\theta_o)$. The remaining terms (corresponding to $|k| \ge n_G$) are orthogonal to $X(\theta_o)$.

This type of parametrization has been considered in (Goodwin and Payne, 1977), and has been employed in, e.g., (Hildebrand and Gevers, 2003). We will now present a slight modification of this parametrization that is useful for Box-Jenkins model structures. This parametrization was originally introduced in (Stoica and Söderström, 1982).

Example 3.9 (Example 3.6 continued). Define $\tilde L$ and $\{\tilde l_k\}$ by

$$\tilde L(e^{j\omega},\theta_o) = |C(e^{j\omega},\theta_o)|^2\,|A(e^{j\omega},\theta_o)|^4 = \sum_{k=-n_l}^{n_l} \tilde l_{|k|}\,e^{-j\omega k} \quad (3.51)$$

where $n_l = 2n_a + n_c - 1$. Furthermore, introduce the auto-correlations $\tilde c_k$ defined by

$$\tilde c_{|k|} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)}\,e^{j\omega k}\,d\omega \quad (3.52)$$

and let $n_p = n_a + n_b + n_d - 1$. It then follows from (3.41) and (3.43) that there exist matrices $\tilde L_k(\theta_o)\in\mathbb{R}^{n\times n}$ such that

$$P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} \frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)} \sum_{k=-n_p}^{n_p} \tilde L_k(\theta_o)\,e^{-j\omega k}\,d\omega + R_o(\theta_o)$$

which by (3.52) is equivalent to

$$P^{-1}(\theta_o) = \frac{1}{\lambda_o}\sum_{k=-n_p}^{n_p} \tilde c_{|k|}\,\tilde L_k(\theta_o) + R_o(\theta_o). \quad (3.53)$$

This parametrization can equivalently be obtained by using a spectrum with the basis functions

$$B_k(e^{j\omega},\theta_o) = \tilde L(e^{j\omega},\theta_o)\,e^{-j\omega k}. \quad (3.54)$$

The parametrization (3.53) is also complete but not minimal when $n_d > 0$. The motivation for this over-parametrization is that it becomes possible to obtain finite-dimensional parametrizations of input and output power constraints also when $n_d > 0$. This will be illustrated in Section 3.4.1.

Remark on robustness issues

In the preceding examples the basis functions depend on the true parameter $\theta_o$ and, in general, it is necessary to know $\theta_o$ in order to guarantee that the basis functions constitute a complete parametrization. However, it is sufficient that these functions are such that their projections on $X(\theta_o)$ span $X(\theta_o)$ themselves. Hence, even if $\theta_o$ is unknown, the set of basis functions which do not completely parametrize the set of all covariance matrices has Lebesgue measure zero and, thus, the choice of basis functions is not that critical.

3.3.2 A Parametrization Based on a Finite Dimensional Spectrum

With a finite-dimensional parametrization of the input spectrum on the form (3.30), it is possible to express the inverse covariance matrix $P^{-1}(\theta_o)$ as an affine function of the variables $\{c_k\}$, the sequence that parametrizes the input spectrum.

Lemma 3.2. When the input signal has the spectrum (3.30), the inverse covariance matrix is given by

$$P^{-1}(\theta_o) = R_o(\theta_o) + \sum_{k=0}^{M-1} c_k\,B_k^P(\theta_o) \quad (3.55)$$

where

$$B_k^P(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} F_u(e^{j\omega},\theta_o)\left[B_k(e^{j\omega}) + B_k^*(e^{j\omega})\right]F_u^*(e^{j\omega},\theta_o)\,d\omega.$$

Proof: Use the definition of the input spectrum, (3.30), and insert it into (3.41).

3.3.3 Summary

It is possible to derive a finite-dimensional parametrization of $P^{-1}$ on the form

$$P^{-1}(\theta_o) = \sum_{k=-M}^{M} \tilde c_{|k|}\,L_k(\theta_o) + R_o(\theta_o). \quad (3.56)$$

The expression (3.56) can be obtained both for finite-dimensional spectrum parametrizations and for partial correlation parametrizations of the input spectrum.
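The affine structure of Lemma 3.2 can be checked numerically: the matrices $B_k^P$ are computed once, after which $P^{-1}$ is assembled by a weighted sum over the design variables $\{c_k\}$. A minimal sketch, again for the hypothetical two-parameter FIR model ($F_u = [e^{-j\omega}, e^{-2j\omega}]^T$, $\lambda_o = 1$, $R_o = 0$) and with the assumed basis functions $B_k(e^{j\omega}) = e^{-j\omega k}$:

```python
import math

# Check of Lemma 3.2 on a hypothetical example: precompute B_k^P, then
# verify that the affine assembly matches direct evaluation of (3.41).

def integrate(fun, N=4000):
    s = [[0.0] * 2, [0.0] * 2]
    for i in range(N):
        w = -math.pi + 2 * math.pi * i / N
        m = fun(w)
        for r in range(2):
            for c in range(2):
                s[r][c] += m[r][c] / N
    return s

def Fu(w):
    return [complex(math.cos(-(r + 1) * w), math.sin(-(r + 1) * w))
            for r in range(2)]

def BkP(k):
    def fun(w):
        f = Fu(w)
        bk = 2 * math.cos(w * k)  # B_k + B_k^* = e^{-jwk} + e^{jwk}
        return [[(f[r] * f[c].conjugate() * bk).real
                 for c in range(2)] for r in range(2)]
    return integrate(fun)

Bmat = [BkP(k) for k in range(3)]   # precomputed once
ck = [0.5, 0.3, -0.1]               # design variables
Pinv = [[sum(ck[k] * Bmat[k][r][c] for k in range(3))
         for c in range(2)] for r in range(2)]

# Direct evaluation of (3.41) with Phi_u(w) = sum_k c_k (B_k + B_k^*)
phi = lambda w: sum(ck[k] * 2 * math.cos(w * k) for k in range(3))
Pdir = integrate(lambda w: [[(Fu(w)[r] * Fu(w)[c].conjugate() * phi(w)).real
                             for c in range(2)] for r in range(2)])
err = max(abs(Pinv[r][c] - Pdir[r][c]) for r in range(2) for c in range(2))
print(err < 1e-9)  # prints True
```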
The difference between these two types of spectrum parametrizations is the constraints that are imposed on the variables $\{\tilde c_k\}$. The consequence of these constraints is that finite-dimensional spectrum parametrizations typically lead to incomplete parametrizations of $P^{-1}$. A good illustration of this is given in Example 3.3, which shows that it may not be necessary to parametrize all covariance matrices to achieve the optimal design. The choice of parametrization also depends on which type of input/output constraints are considered in the design, as will be illustrated in the next section. A partial correlation parametrization cannot, for example, handle frequency-wise constraints on the spectrum.

3.4 Parametrization of Signal Constraints

Here we further explore the parametrization of the input spectrum, defined in Section 3.2. The objective is to formulate different frequency-domain signal constraints as finite-dimensional affine functions in the variables that define the input spectrum. The considered signal constraints are limitations on the input and/or output spectra in terms of power, as well as frequency-by-frequency constraints.

3.4.1 Parametrization of Power Constraints

For a finite-dimensional spectrum parametrization, the following result applies regarding the parametrization of input and output variance constraints.

Lemma 3.3. The variance of $z = W_u u$, where $W_u$ is a stable linear filter and $u$ has the spectrum (3.30), can be expressed as

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_u(e^{j\omega})|^2\,\Phi_u(\omega)\,d\omega = \sum_{k=0}^{M-1} c_k\,B_k^u \quad (3.57)$$

with

$$B_k^u = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_u(e^{j\omega})|^2\left[B_k(e^{j\omega}) + B_k^*(e^{j\omega})\right]d\omega. \quad (3.58)$$

The variance of $z_y = W_y y$, where $W_y$ is a stable linear filter and $y$ is the output corresponding to the input $u$ above, when the system is operating in open loop, is given by

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=0}^{M-1} c_k\,B_k^y(\theta_o) + R_v(\theta_o) \quad (3.59)$$

with

$$B_k^y(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})\,G(e^{j\omega},\theta_o)|^2\left[B_k(e^{j\omega}) + B_k^*(e^{j\omega})\right]d\omega \quad (3.60)$$

and

$$R_v(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |W_y(e^{j\omega})|^2\,\Phi_v(\omega)\,d\omega. \quad (3.61)$$

Proof: The affine expression (3.57) is an immediate consequence of the definition of the spectrum. The expression (3.59) is obtained using the fact that $y = G(\theta_o)u + v$, where $u$ and $v$ are uncorrelated due to open-loop operation, so that the output spectrum becomes $\Phi_y = |G(\theta_o)|^2\Phi_u + \Phi_v$.

Thus, the power constraints can be formulated as linear inequalities in the new variables $\{c_k\}$.

Example 3.10. When the input is shaped by an FIR filter and the weighting filters $W_u$ and $W_y$ are unity, the power constraints in Lemma 3.3 become

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega = r_0 \quad (3.62)$$

and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_y(\omega)\,d\omega = R_v(\theta_o) + \sum_{k=-(M-1)}^{M-1} r_{|k|}\,r^G_{|k|}$$

where $r^G_{|k|}$ is defined by $|G(e^{j\omega},\theta_o)|^2 = \sum_{k=-\infty}^{\infty} r^G_{|k|}\,e^{-j\omega k}$.

For some partial correlation parametrizations of the input, it is possible to derive finite linear parametrizations of the input or the output power, see e.g. (Stoica and Söderström, 1982). Here we illustrate how to do this for a Box-Jenkins model structure.

Example 3.11 (Example 3.9 continued). Consider the partial correlation parametrization defined by (3.51)-(3.53). For this parametrization it holds that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Phi_u(\omega)}{\tilde L(e^{j\omega},\theta_o)}\sum_{k=-n_l}^{n_l}\tilde l_{|k|}\,e^{-j\omega k}\,d\omega = \sum_{k=-n_l}^{n_l}\tilde c_{|k|}\,\tilde l_{|k|} \quad (3.63)$$

i.e. the input energy is a linear function of $\{\tilde c_k\}$, the variables that parametrize $P^{-1}$, see (3.53).
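The output-power identity in Example 3.10 is easy to verify numerically. A minimal sketch with a hypothetical first-order FIR plant $G(q) = g_0 + g_1 q^{-1}$ and no noise term (so $R_v = 0$); the plant and numbers are illustrative assumptions:

```python
import math, cmath

# Output power as a linear function of the input autocorrelations r_k:
#   (1/2pi) int |G|^2 Phi_u dw = sum_{k=-1}^{1} r_|k| * rG_|k|.

g0, g1 = 1.0, -0.4
r0, r1 = 2.0, 0.7                          # input autocorrelations
phi_u = lambda w: r0 + 2 * r1 * math.cos(w)  # Phi_u > 0 since r0 > 2*|r1|

N = 4000
direct = sum(abs(g0 + g1 * cmath.exp(-1j * w)) ** 2 * phi_u(w)
             for w in (-math.pi + 2 * math.pi * i / N for i in range(N))) / N

rG0, rG1 = g0 * g0 + g1 * g1, g0 * g1      # |G|^2 = rG0 + 2*rG1*cos(w)
linear = r0 * rG0 + 2 * r1 * rG1
print(abs(direct - linear) < 1e-9)  # prints True
```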
3.4.2 Parametrization of Point-wise Constraints

To handle point-wise constraints

$$\alpha_u(\omega) \le \Phi_u(\omega) \le \beta_u(\omega) \quad \forall\,\omega$$
$$\alpha_y(\omega) \le \Phi_y(\omega) \le \beta_y(\omega) \quad \forall\,\omega \quad (3.64)$$

on the input and output spectra, the KYP-lemma may be used to transform these into linear matrix inequalities, just as for the input spectrum constraint, cf. Lemma 3.1, when the constraints are rational functions. Alternatively, they may be added as sampled frequency-domain constraints. This will not have any drastic consequences on the performance of the algorithm, as a violation of these constraints in between the sampling points will not cause the input design to be non-realizable. Notice that constraints such as (3.64) can only be handled by finite-dimensional spectrum parametrizations.

3.5 Experiment Design in Closed-loop

In this section we consider experiment design in closed loop. Some early contributions on experiment design in closed loop are (Ng et al., 1977a; Ng et al., 1977b; Gustavsson et al., 1977). An interesting question is when it is beneficial to consider design in closed loop. There is no clear-cut answer to this; it will certainly depend on the choice of objectives and constraints. For the case of constrained input variance, it is shown in (Ng et al., 1977b) that there is always a D-optimal design in open loop when the system and the noise dynamics are independently parametrized. However, it is also shown that it may be beneficial to use feedback when they have parameters in common. In the context of control applications, various arguments for the advantages of identification in closed loop have been brought forward. In (Gevers and Ljung, 1986) it was shown that, in situations where the high-order variance expression is valid, closed-loop experiments under minimum variance control are optimal if the model is to be used for minimum variance control.
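Sampled frequency-domain constraints amount to evaluating the bounds on a grid; since the spectrum is affine in the design variables, each grid point contributes a linear inequality. A minimal sketch (hypothetical coefficients and bounds) for a second-order FIR-parametrized input spectrum:

```python
import math

# Sampled version of the point-wise input constraint in (3.64):
# Phi_u(w) = c0 + 2*c1*cos(w) + 2*c2*cos(2w), checked on a frequency grid.

def phi_u(w, c):
    return c[0] + 2 * c[1] * math.cos(w) + 2 * c[2] * math.cos(2 * w)

def satisfies(c, alpha, beta, n_grid=200):
    """True if alpha <= Phi_u(w) <= beta at every grid frequency."""
    grid = [math.pi * i / (n_grid - 1) for i in range(n_grid)]
    return all(alpha <= phi_u(w, c) <= beta for w in grid)

print(satisfies([1.0, 0.3, 0.1], 0.1, 2.0))  # True
print(satisfies([1.0, 0.6, 0.3], 0.1, 2.0))  # False: upper bound violated at w = 0
```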
For similar problem formulations it was shown in (Hjalmarsson et al., 1996) that it is possible to outperform any fixed input design by a sequence of closed-loop designs, provided the experimentation time is long enough. Again relying on the high-order variance expression, (Forssell and Ljung, 2000) showed that, typically, closed-loop experiments are optimal when the output variance is constrained during the experiments. However, there are system configurations, e.g. output-error systems, for which open-loop designs are still optimal, even in the case of constrained output variance. The results in (Forssell and Ljung, 2000) show this clearly. It is, however, stated in the conclusions of the same paper that closed-loop experiments are always optimal when the output variance is constrained. This is not correct, as explained in the previous discussion. In another line of research, focusing on the bias error, it has also been shown that closed-loop experiments can be beneficial, see e.g. (Zang et al., 1995; Lee et al., 1993). Closed-loop experimentation can also be motivated by practical arguments. Most industrial processes are operated in closed loop and it is often not possible to open the loop. Therefore, identification experiments often have to be performed in closed loop with an existing controller in the loop.

Considering the above, it is of interest to re-examine optimal closed-loop experiment design for finite model order. In this section, we will generalize the results of Sections 3.2-3.4 to include feedback in the input. The main differences between design in open loop and design in closed loop are the parametrizations of the signal constraints and of the inverse covariance matrix. Therefore, we will focus on these issues in this section. We will assume that the input is generated as

$$u(t) = -K(q)\,y(t) + r(t) \quad (3.65)$$

where $r(t)$ is an external reference that is uncorrelated with the noise $e(t)$. The controller $K(q)$ is assumed to be linear and causal.
3.5.1 Spectrum Representations

We will use the concepts of spectrum parametrizations introduced in Section 3.2. Thus the spectrum (2.13) can be written as

$$\Phi_{\chi_o}(\omega) = \sum_{k=-\infty}^{\infty} C_k\,B_k(e^{j\omega}). \quad (3.66)$$

The major difference is that the coefficients are matrix-valued, $C_k \in \mathbb{R}^{2\times 2}$. Furthermore, the (2,2)-element of $C_k$ cannot in general be manipulated since it depends solely on the noise variance $\lambda_o$. Typically $B_0 = I$; since $\mathrm{E}\,u(t)e(t) = 0$ we then have

$$C_0 = \begin{bmatrix} c_0 & 0 \\ 0 & \lambda_o \end{bmatrix}$$

and the (2,2)-element of $C_k$ is zero for $|k| > 0$.

3.5.2 Experiment Design in Closed-loop with a Fixed Controller

We start with the simplest case, where the controller is fixed but the spectrum of the reference signal is at the designer's disposal. We begin by observing that the input spectrum under the feedback (3.65) becomes

$$\Phi_u(\omega) = |S_o(e^{j\omega})|^2\,\Phi_r(\omega) + |K(e^{j\omega})S_o(e^{j\omega})H_o(e^{j\omega})|^2\,\lambda_o$$

where $S_o = 1/(1 + G_o K)$ is the sensitivity function. Together with (2.13), this shows that the spectrum $\Phi_{\chi_o}$ is affine in the reference spectrum $\Phi_r$. Consequently, as is evidenced by (2.14), the inverse covariance matrix $P^{-1}$ is affine in the same quantity. With the input spectrum replaced by the reference spectrum, this is exactly the basis for the design techniques for open-loop systems, see Sections 3.2-3.4. It is, hence, straightforward to modify existing open-loop design techniques to handle design of the reference spectrum when the controller is fixed.

3.5.3 Experiment Design in Closed-loop with a Free Controller

We now generalize the scenario to the case where, in addition to the reference spectrum, also the feedback mechanism $K(q)$ in (3.65) can be chosen. We thus have both $\Phi_r$ and $K$ at our disposal. However, it turns out to be more natural to instead use the input spectrum $\Phi_u$ and the cross-spectrum $\Phi_{ue}$ as design variables. Since there is a one-to-one relation between these two sets of variables, this imposes no restrictions.
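The affinity of the closed-loop input spectrum in $\Phi_r$ is easy to check frequency by frequency. A minimal sketch with a hypothetical first-order plant $G_o(q) = 0.5q^{-1}/(1 - 0.3q^{-1})$, $H_o = 1$, a fixed gain $K = 0.8$ and $\lambda_o = 0.1$ (all numbers are illustrative assumptions):

```python
import cmath

# Phi_u = |S_o|^2 * Phi_r + |K S_o H_o|^2 * lambda_o is affine in Phi_r.

lam, K = 0.1, 0.8
def Go(z): return (0.5 / z) / (1 - 0.3 / z)
def Ho(z): return 1.0

def phi_u(w, phi_r):
    z = cmath.exp(1j * w)
    S = 1.0 / (1.0 + Go(z) * K)  # sensitivity function
    return abs(S) ** 2 * phi_r + abs(K * S * Ho(z)) ** 2 * lam

w = 1.3
a = phi_u(w, 1.0)
b = phi_u(w, 4.0)
mid = phi_u(w, 2.5)  # midpoint of the two reference powers
print(abs(mid - 0.5 * (a + b)) < 1e-12)  # prints True: affine in Phi_r
```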
This will be the set-up in the remaining part of the section.

A parametrization in terms of partial correlations

Since the elements of $\mathcal{F}$ span a linear subspace, cf. (3.43), it follows that the set of all covariance matrices can be parametrized in terms of finite-dimensional parametrizations of $\Phi_{\chi_o}$, see Section 3.3. Here we characterize one such parametrization. It is based on an idea originally presented in (Payne and Goodwin, 1974), with further developments in (Zarrop, 1979) and (Stoica and Söderström, 1982) for input design in open loop, see Example 3.9. Here we present the generalization to experiment design in closed loop. Consider the case where we want to obtain a linear and finite parametrization of $P^{-1}$ and the input energy

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Phi_u(\omega)\,d\omega. \quad (3.67)$$

For this we will use a partial correlation parametrization of $\Phi_{\chi_o}$. The starting point is (2.14), i.e.

$$P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\mathcal{F}(e^{j\omega},\theta_o)\,\Phi_{\chi_o}(\omega)\,\mathcal{F}^*(e^{j\omega},\theta_o)\,d\omega. \quad (3.68)$$

Then parametrize $\mathcal{F}$ on the form

$$\mathcal{F}(e^{j\omega},\theta_o) = \frac{1}{F^D(e^{j\omega},\theta_o)}\sum_{k=0}^{m} M_k^F(\theta_o)\,e^{-j\omega k} \quad (3.69)$$

where the scalar transfer function $F^D(\theta_o)$ is the least common denominator of $\mathcal{F}(\theta_o)$, and $M_k^F(\theta_o)$, $k = 0,\ldots,m$, are real matrices. Then introduce the parametrization

$$\Phi_{\chi_o}(\omega) = |F^D(e^{j\omega},\theta_o)|^2\sum_{k=-\infty}^{\infty} C_k\,e^{-j\omega k} \quad (3.70)$$

i.e. a parametrization of the form (3.66) with $B_k(e^{j\omega},\theta_o) = |F^D(e^{j\omega},\theta_o)|^2\,e^{-j\omega k}$. Now it is straightforward to rewrite $P^{-1}$ as a linear function of the auto-correlations $C_k$:

$$P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\mathcal{F}(e^{j\omega},\theta_o)\,\Phi_{\chi_o}(\omega)\,\mathcal{F}^*(e^{j\omega},\theta_o)\,d\omega = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\sum_{k=0}^{m}\sum_{l=0}^{m}\frac{M_k^F(\theta_o)\,\Phi_{\chi_o}(\omega)\,(M_l^F(\theta_o))^T}{|F^D(e^{j\omega},\theta_o)|^2}\,e^{j\omega(l-k)}\,d\omega$$

$$= \frac{1}{\lambda_o}\Bigg[\sum_{k=0}^{m} M_k^F(\theta_o)\,C_0\,(M_k^F(\theta_o))^T + \sum_{k=1}^{m}\sum_{l=0}^{m-k} M_l^F(\theta_o)\,C_k\,(M_{l+k}^F(\theta_o))^T + \sum_{k=1}^{m}\sum_{l=0}^{m-k} M_{l+k}^F(\theta_o)\,C_k^T\,(M_l^F(\theta_o))^T\Bigg]. \quad (3.71)$$

Remark: Notice that all covariance matrices can be generated by a finite number of the auto-correlations $C_k$.
Hence, it is sufficient to work with a partial correlation parametrization of the spectrum $\Phi_{\chi_o}$. With the parametrization (3.70) it is possible to obtain the input energy as a linear function of the $C_k$. From (3.70) it follows that

$$C_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Phi_{\chi_o}(\omega)}{|F^D(e^{j\omega},\theta_o)|^2}\,e^{j\omega k}\,d\omega \quad (3.72)$$

which together with

$$|F^D(e^{j\omega},\theta_o)|^2 = \sum_{k=-m}^{m} f_{|k|}\,e^{-j\omega k} \quad (3.73)$$

gives

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\begin{bmatrix}\Phi_u(\omega) & \Phi_{ue}(\omega) \\ \Phi_{ue}^*(\omega) & \lambda_o\end{bmatrix}d\omega = \sum_{k=-m}^{m} f_{|k|}\,C_k. \quad (3.74)$$

The input energy is easily extracted from (3.74) as its (1,1)-element.

The use of feedback increases the flexibility of the experiment design. It has also been argued that, for some system settings, closed-loop experiment design outperforms open-loop design when there are constraints on the output variance. The parametrizations in (3.69)-(3.74) have to be slightly changed to handle variance constraints on the output, but the main idea remains. The output energy is here measured by

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega \quad (3.75)$$

where $W_y$ is a stable scalar transfer function. The spectrum of the output is given by

$$\Phi_y = \begin{bmatrix} G(\theta_o) & H(\theta_o) \end{bmatrix}\Phi_{\chi_o}\begin{bmatrix} G^*(\theta_o) \\ H^*(\theta_o)\end{bmatrix}. \quad (3.76)$$

Now introduce the spectrum $\tilde\Phi_{\chi_o}$ defined by

$$\tilde\Phi_{\chi_o} = |W_y|^2\begin{bmatrix} G(\theta_o) & 0 \\ 0 & H(\theta_o)\end{bmatrix}\Phi_{\chi_o}\begin{bmatrix} G^*(\theta_o) & 0 \\ 0 & H^*(\theta_o)\end{bmatrix} \quad (3.77)$$

and notice that

$$|W_y|^2\,\Phi_y = \begin{bmatrix}1 & 1\end{bmatrix}\tilde\Phi_{\chi_o}\begin{bmatrix}1 \\ 1\end{bmatrix}. \quad (3.78)$$

Now let $\tilde{\mathcal{F}} = [F_u/G,\ F_e/H]/W_y$, where $F_u$ and $F_e$ are defined in (2.15) and (2.16), respectively. Together with (3.77), equation (3.68) gives

$$P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\tilde{\mathcal{F}}(e^{j\omega},\theta_o)\,\tilde\Phi_{\chi_o}(\omega)\,\tilde{\mathcal{F}}^*(e^{j\omega},\theta_o)\,d\omega. \quad (3.79)$$

Introduce $\tilde{\mathcal{F}}$ parametrized as

$$\tilde{\mathcal{F}}(e^{j\omega},\theta_o) = \frac{1}{\tilde F^D(e^{j\omega},\theta_o)}\sum_{k=0}^{m}\tilde M_k^F(\theta_o)\,e^{-j\omega k} \quad (3.80)$$

and then introduce the parametrization

$$\tilde\Phi_{\chi_o}(\omega) = |\tilde F^D(e^{j\omega},\theta_o)|^2\sum_{k=-\infty}^{\infty}\tilde C_k\,e^{-j\omega k}. \quad (3.81)$$

A linear finite-dimensional parametrization of $P^{-1}$ is now obtained by following the calculations in (3.71), based on (3.79).
This gives

$$P^{-1}(\theta_o) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi}\tilde{\mathcal{F}}(e^{j\omega},\theta_o)\,\tilde\Phi_{\chi_o}(\omega)\,\tilde{\mathcal{F}}^*(e^{j\omega},\theta_o)\,d\omega$$

$$= \frac{1}{\lambda_o}\Bigg[\sum_{k=0}^{m}\tilde M_k^F(\theta_o)\,\tilde C_0\,(\tilde M_k^F(\theta_o))^T + \sum_{k=1}^{m}\sum_{l=0}^{m-k}\tilde M_l^F(\theta_o)\,\tilde C_k\,(\tilde M_{l+k}^F(\theta_o))^T + \sum_{k=1}^{m}\sum_{l=0}^{m-k}\tilde M_{l+k}^F(\theta_o)\,\tilde C_k^T\,(\tilde M_l^F(\theta_o))^T\Bigg] \quad (3.82)$$

i.e. a parametrization of $P^{-1}$ in the auto-correlations $\tilde C_k$ obtained from the parametrization (3.81) of $\tilde\Phi_{\chi_o}$. By following the steps in (3.72)-(3.74) it is straightforward to compute the energy of $\tilde\Phi_{\chi_o}$. From the parametrization (3.81), together with (3.76) and (3.78), it is easy to verify that (3.75) can be expressed as

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=-m}^{m}\tilde f_{|k|}\begin{bmatrix}1 & 1\end{bmatrix}\tilde C_k\begin{bmatrix}1 \\ 1\end{bmatrix}. \quad (3.83)$$

Notice that (3.83) is a linear function of $\tilde C_k$.

We have presented two general finite linear parametrizations of $P^{-1}$ that allow globally optimal designs, since they parametrize all covariance matrices. The parametrizations have to be adapted depending on the considered signal constraint. Here, we have considered parametrizations of the input energy and of a weighted output variance criterion. However, this type of parametrization can in principle handle all variance constraints that are linear in $\Phi_{\chi_o}$. To ensure that the free variables $C_k$ and $\tilde C_k$ correspond to auto-correlation coefficients, we must include the constraint

$$\begin{bmatrix} X_0 & \cdots & X_m \\ \vdots & \ddots & \vdots \\ X_m^T & \cdots & X_0 \end{bmatrix} \ge 0$$

whenever the parametrizations in this subsection are used. Here $X_k$ is either $C_k$ or $\tilde C_k$.

A parametrization based on a finite dimensional spectrum

So far we have considered parametrizations of $P^{-1}$ based on partial correlation parametrizations of the spectrum. Here we instead introduce a flexible parametrization based on a finite-dimensional spectrum parametrization, i.e.

$$\Phi_{\chi_o}(\omega) = \sum_{k=-M}^{M} C_k\,B_k(e^{j\omega}). \quad (3.84)$$

It turns out to be natural to split the parametrization of (3.84) into one for $\Phi_u$ and one for $\Phi_{ue}$.
The input spectrum will be parametrized as

$$\Phi_u(\omega) = \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_k^u(e^{j\omega}) \quad (3.85)$$

for some stable basis functions $B_k^u$ that we assume satisfy $B_{-k}^u = (B_k^u)^*$. The cross-spectrum is given by

$$\Phi_{ue}(\omega) = -\frac{H_o(e^{j\omega})\,T_o(e^{j\omega})}{G_o(e^{j\omega})}\,\lambda_o \quad (3.86)$$

where $T_o$ is the complementary sensitivity function defined by $T_o = 1 - S_o$. Throughout this section we assume that $G$ is stable and minimum phase. There are different ways of parametrizing $\Phi_{ue}$. One obvious choice is a parametrization similar to (3.85). A different choice is the parametrization

$$\Phi_{ue}(\omega) = -\frac{H_o\lambda_o}{G_o}\sum_{k=0}^{M_c} s_k\,B_k^c(e^{j\omega}) \quad (3.87)$$

where $\{B_k^c(e^{j\omega})\}$ is a set of stable basis functions. Notice that the parametrization (3.87) corresponds to a linear and finite parametrization of $T_o$. This has some advantages. For example, since $T_o$ is the closed-loop response, certain properties like the bandwidth can be taken into account already in the choice of basis functions. Furthermore, it is important that the design yields a stable closed-loop system. The parametrization (3.87) will in fact yield a stabilizing controller for any sequence $\{s_k\}$ when $G_o$ is stable and minimum phase, together with the natural requirement that the basis functions are stable. To realize this, we need to check the stability of $T_o$, $S_o$, $G_o S_o$ and $K S_o$. First consider $T_o$, which given the parametrization (3.87) equals

$$T_o(e^{j\omega}) = \sum_{k=0}^{M_c} s_k\,B_k^c(e^{j\omega}).$$

Hence $T_o$ is obviously stable when the basis functions are. Then $S_o = 1 - T_o$ will also be stable, and $S_o G_o$ will be stable as long as $G_o$ is stable. The last quantity to check is $K S_o = T_o/G_o$, which is stable when $G_o$ is minimum phase. For a given sequence $\{s_k\}$, the controller is

$$K(q) = \frac{\sum_{k=0}^{M_c} s_k B_k^c(q)}{G_o(q)\left(1 - \sum_{k=0}^{M_c} s_k B_k^c(q)\right)}. \quad (3.88)$$

Remark: Integral action in the controller can be imposed by the constraint $\sum_{k=0}^{M_c} s_k B_k^c(1) = 1$.

When the input spectrum and the cross-spectrum are defined by (3.85) and (3.87), respectively, the inverse covariance matrix (3.68) is given by

$$P^{-1}(\theta_o) = R_o(\theta_o) + \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_P^u(k) - \sum_{k=0}^{M_c} s_k\left[B_P^c(k) + (B_P^c(k))^T\right]. \quad (3.89)$$

Notice that (3.89) is a linear and finite parametrization in $\tilde c_k$ and $s_k$. In (3.89), $R_o$ is defined by (3.42),

$$B_P^u(k) = \frac{1}{2\pi\lambda_o}\int_{-\pi}^{\pi} F_u(e^{j\omega},\theta_o)\,B_k^u(e^{j\omega})\,F_u^*(e^{j\omega},\theta_o)\,d\omega$$

and

$$B_P^c(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} F_u(e^{j\omega},\theta_o)\,F_e^*(e^{j\omega},\theta_o)\,\frac{H_o(e^{j\omega})}{G_o(e^{j\omega})}\,B_k^c(e^{j\omega})\,d\omega.$$

The variance of $z_y = W_y y$, where $W_y$ is a stable linear filter and $y$ has the spectrum (3.76), can be expressed by the linear relation

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_y(\omega)\,d\omega = \sum_{k=-M_u}^{M_u}\tilde c_{|k|}\,B_y^u(k) - \sum_{k=0}^{M_c} s_k\,B_y^c(k) + R_v(\theta_o) \quad (3.90)$$

where

$$B_y^u(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})\,G_o(e^{j\omega})|^2\,B_k^u(e^{j\omega})\,d\omega,$$

$$B_y^c(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_v(\omega)\left[B_k^c(e^{j\omega}) + (B_k^c(e^{j\omega}))^*\right]d\omega$$

and

$$R_v(\theta_o) = \frac{1}{2\pi}\int_{-\pi}^{\pi}|W_y(e^{j\omega})|^2\,\Phi_v(\omega)\,d\omega$$

where $\Phi_v = |H(\theta_o)|^2\lambda_o$. The expression (3.89) is another finite linear parametrization of $P^{-1}$, based on the finite spectrum parametrization of $\Phi_{\chi_o}$ represented by (3.85) and (3.87). As in the open-loop case, this type of parametrization will in general have a higher complexity than the partial correlation parametrizations. However, this class of parametrizations can handle a larger class of constraints. Besides variance constraints such as (3.90), also point-wise constraints, see (3.64), can be treated, which is not the case for parametrizations based on partial correlations. Furthermore, as illustrated, it is possible to impose certain characteristics of the closed loop and the controller directly in the design stage, see (3.87), (3.88) and the remark above. Since the parametrization is based on a finite spectrum parametrization of $\Phi_{\chi_o}$, the free variables $\{\tilde c_k\}$ and $\{s_k\}$ must be constrained such that $\Phi_{\chi_o} \ge 0$ for all $\omega$. For this, Lemma 3.1 is useful.
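The construction (3.88) can be sanity-checked numerically: with $T_o$ parametrized as a finite sum of stable basis functions, the resulting controller reproduces $T_o$ as the complementary sensitivity, frequency by frequency. A minimal sketch with an assumed basis $B_k^c(q) = q^{-k-1}$ and a hypothetical stable, minimum-phase $G_o$ (both illustrative choices, not from the thesis):

```python
import cmath

# Check of (3.88): Go*K/(1 + Go*K) should equal To = sum_k s_k B_k^c.

s = [0.4, 0.3, 0.1]                     # design variables {s_k}; |To| <= 0.8 < 1
def To(z): return sum(sk * z ** (-(k + 1)) for k, sk in enumerate(s))
def Go(z): return (0.5 / z) / (1 - 0.3 / z)  # stable and minimum phase
def K(z):  return To(z) / (Go(z) * (1 - To(z)))

ok = True
for i in range(50):
    z = cmath.exp(1j * (0.05 + 0.06 * i))
    closed_loop = Go(z) * K(z) / (1 + Go(z) * K(z))
    ok = ok and abs(closed_loop - To(z)) < 1e-9
print(ok)  # prints True
```

Note that this choice of $\{s_k\}$ does not impose integral action, since $\sum_k s_k B_k^c(1) = 0.8 \neq 1$.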
3.6 Quality Constraints

The results in Section 3.3 and Section 3.5 show that there are several ways to obtain a linear and finite-dimensional parametrization of the inverse covariance matrix $P^{-1}$ in a set of variables $\{x_k\}$. Thus, all constraints that are convex in $P^{-1}$ also become convex in $\{x_k\}$. This is a very important observation that we will explore further in this section and in Section 3.7, in order to incorporate different quality constraints into our optimal experiment designs.

3.6.1 Convex Representation of Quality Constraints

There are several classical performance criteria for input design that are convex in $P^{-1}$. One example is $\lambda_{\max}(P) \le \gamma$ (Boyd et al., 1994), where the operator $\lambda_{\max}$ extracts the largest eigenvalue. Another example is $\det P \le \gamma$ (Nesterov and Nemirovski, 1994). A type of criterion that has repeatedly been suggested in input design for control is the weighted trace criterion, $\mathrm{Tr}\,WP \le \gamma$. This type of criterion is convex in $P^{-1}$, as the following result shows. For generality we allow the weighting function $W$ to be frequency dependent.

Lemma 3.4. The constraints

$$\mathrm{Tr}\,W(\omega)P \le \gamma \quad \forall\,\omega, \qquad W(\omega) = V(\omega)V^*(\omega) \ge 0 \quad \forall\,\omega, \qquad P \ge 0 \quad (3.91)$$

may be written as the following constraints:

$$\gamma - \mathrm{Tr}\,Z \ge 0, \qquad \Gamma(\omega) \triangleq \begin{bmatrix} Z & V(\omega) \\ V^*(\omega) & P^{-1} \end{bmatrix} \ge 0 \quad \forall\,\omega. \quad (3.92)$$

Proof: The factorization of $W(\omega)$ leads to

$$\mathrm{Tr}\,W(\omega)P = \mathrm{Tr}\,V^*(\omega)\,P\,V(\omega). \quad (3.93)$$

Introduce the slack variable $Z \in \mathbb{R}^{z\times z}$. Then the constraint $\mathrm{Tr}\,W(\omega)P \le \gamma$ together with (3.93) can be written as

$$\mathrm{Tr}\,Z \le \gamma, \qquad Z - V^*(\omega)\,P\,V(\omega) \ge 0. \quad (3.94)$$

Using the Schur complement, (3.94) may be written as (3.92), which is linear in $\gamma$, $Z$ and $P^{-1}$; thus convexity follows.

The following example illustrates a situation where the weighted trace criterion appears as a quality constraint.
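The Schur-complement step in the proof of Lemma 3.4 can be checked in the scalar case, where PSD-ness of the $2\times 2$ block reduces to sign conditions on the entries and the determinant. A minimal sketch with hypothetical numbers:

```python
# Scalar Schur-complement check: for P > 0, the block [[Z, v], [v, 1/P]]
# is positive semidefinite exactly when Z - v*P*v >= 0.

def psd2(a, b, c, d):
    """PSD test for the real symmetric 2x2 matrix [[a, b], [c, d]]."""
    return a >= 0 and d >= 0 and a * d - b * c >= 0

P = 2.0
for Z, v in [(3.0, 1.0), (1.5, 1.0), (8.0, 2.0), (7.9, 2.0)]:
    schur = Z - v * P * v >= 0
    block = psd2(Z, v, v, 1.0 / P)
    assert schur == block  # the two conditions agree on every test case
print("Schur equivalence holds on all test cases")
```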
Example 3.12. The constraint (3.22) is one example that can be written as a frequency-dependent weighted trace constraint:

$$\frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2\frac{dG^*(\theta_o)}{d\theta}\,P\,\frac{dG(\theta_o)}{d\theta} \le 1 \;\Leftrightarrow\; \mathrm{Tr}\left[\frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2\frac{dG(\theta_o)}{d\theta}\frac{dG^*(\theta_o)}{d\theta}\,P\right] \le 1 \;\Leftrightarrow\; \mathrm{Tr}\,W(\omega)P \le 1 \quad (3.95)$$

where

$$W(\omega) = \frac{1}{N}\left|\frac{T(e^{j\omega})}{G_o(e^{j\omega})}\right|^2\frac{dG(\theta_o)}{d\theta}\frac{dG^*(\theta_o)}{d\theta}.$$

Corollary 3.1. The constraints (3.91) can be written as

$$P^{-1} - \frac{1}{\gamma}\,V(\omega)V^*(\omega) \ge 0 \quad \forall\,\omega \quad (3.96)$$

when $V(\omega)$ is a vector.

Proof: The slack variable $Z$ in (3.92) becomes a scalar when $V(\omega)$ is a vector, so $Z$ can be replaced by $\gamma$. It is then straightforward to use Schur complements to obtain (3.96) from (3.92).

3.6.2 Application of the KYP-lemma to Quality Constraints

Even though many quality constraints are convex, they are not necessarily finite-dimensional. One example is the frequency-by-frequency weighted trace criterion in Lemma 3.4. In some situations it is possible to treat such constraints as positiveness constraints on spectra and use the idea of Lemma 3.1.

Lemma 3.5. Suppose that $V(\omega)$ is a frequency function with controllable state-space realization $\{A_V, B_V, C_V, D_V\}$. Let $\Gamma(\omega)$ be defined as in Lemma 3.4. It then holds that $\Gamma(\omega) \ge 0$ for all $\omega$ if and only if there exists $Q_\Gamma = Q_\Gamma^T$ such that (recall the definition (3.32) of $K$)

$$K(Q_\Gamma, \{A_\Gamma, B_\Gamma, C_\Gamma, D_\Gamma\}) \ge 0 \quad (3.97)$$

where

$$A_\Gamma = A_V, \quad B_\Gamma = \begin{bmatrix} 0 & B_V \end{bmatrix}, \quad C_\Gamma = \begin{bmatrix} C_V \\ 0 \end{bmatrix} \quad\text{and}\quad D_\Gamma + D_\Gamma^T = \begin{bmatrix} Z & D_V \\ D_V^T & P^{-1} \end{bmatrix}.$$

Proof: The state-space realization $\{A_\Gamma, B_\Gamma, C_\Gamma, D_\Gamma\}$ defined in Lemma 3.5 gives $\Gamma(\omega) = \Gamma_+(e^{j\omega}) + \Gamma_+^*(e^{j\omega})$, where $\Gamma_+(e^{j\omega}) = C_\Gamma(e^{j\omega}I - A_\Gamma)^{-1}B_\Gamma + D_\Gamma$. Furthermore, if $\{A_V, B_V, C_V, D_V\}$ is controllable then so is the realization $\{A_\Gamma, B_\Gamma, C_\Gamma, D_\Gamma\}$. Thus the Positive Real Lemma (Yakubovich, 1962) can be applied. Notice that since the elements of $Z$ and $P^{-1}$ appear linearly in $D_\Gamma + D_\Gamma^T$, independently of the realization of $V$, the only constraint on the realization $\{A_V, B_V, C_V, D_V\}$ is that it is controllable.

3.7 Quality Constraints in Ellipsoidal Regions

In this section we focus on quality constraints that are based on the uncertainty set (2.28), defined in Section 2.3.2. To illustrate this, consider the frequency function

$$\Delta(e^{j\omega},\theta) = T(e^{j\omega})\,\frac{G_o(e^{j\omega}) - G(e^{j\omega},\theta)}{G(e^{j\omega},\theta)}$$

introduced in (3.19), and consider the worst-case measure of $\Delta$ over the set of models in the confidence region:

$$\max_{\omega,\,\theta\in U}|\Delta|^2, \qquad U = \{\theta \mid N(\theta-\theta_o)^T P^{-1}(\theta-\theta_o) \le \chi\}. \quad (3.98)$$

With this setup it is possible to guarantee that, say, 95% of all identified models will satisfy the quality constraint. In this section we introduce a new family of quality measures of the model $G$ which includes (3.98). Let $W_n$, $W_d$, $X_n$ and $X_d$ be finite-dimensional stable transfer functions. Let $Y_n(\omega)$ be defined by

$$Y_n = \mathcal{Y}_n^*\,\mathcal{Y}_n \quad (3.99)$$

where $\mathcal{Y}_n$ is some stable finite-dimensional transfer function. Let $Y_d$, $K_n$ and $K_d$ be defined analogously. Furthermore, let $R$ be a positive definite matrix. The generalized quality measure is defined as

$$F(\omega,\eta) \le \gamma \quad \forall\,\omega\ \text{and}\ \forall\,\eta\in\Upsilon, \qquad \Upsilon = \{\eta \mid (\eta-\eta_o)^T R(\eta-\eta_o) \le 1\} \quad (3.100)$$

where

$$F(\omega,\eta) = \frac{[W_n G(\eta) + X_n]^*\,Y_n\,[W_n G(\eta) + X_n] + K_n}{[W_d G(\eta) + X_d]^*\,Y_d\,[W_d G(\eta) + X_d] + K_d}. \quad (3.101)$$

The quality measure (3.100) is a max-norm constraint on $F$ with respect to $\omega$, and it has to be satisfied for all $\eta$ in the ellipsoid $\Upsilon$. Uncertainty sets such as $\Upsilon$ are, e.g., delivered by the prediction error method, see Section 2.2.2. We illustrate the usefulness of the measure via two examples.

Example 3.13. Taking $W_n = W_d = 1$, $X_n = -G_o$, $\mathcal{Y}_n = T$ (so that $Y_n = |T|^2$), $Y_d = 1$, and $X_d = K_n = K_d = 0$ gives

$$F(\omega,\eta) = \frac{|G(\eta) - G_o|^2\,|T|^2}{|G(\eta)|^2} = |\Delta|^2.$$

Thus the generalized quality measure (3.100) includes (3.98).

Example 3.14 (Worst-case chordal distance). The square of the chordal distance (Vinnicombe, 1993) between $G(\eta)$ and $G_o$ can be written as

$$\kappa^2(\omega,\eta) = \frac{|G(e^{j\omega},\eta) - G_o(e^{j\omega})|^2}{(1 + |G(e^{j\omega},\eta)|^2)(1 + |G_o(e^{j\omega})|^2)}. \quad (3.102)$$

Taking $Y_n = W_n = W_d = 1$, $X_n = -G_o$, $X_d = 0$, $K_n = 0$ and $K_d = Y_d = 1 + |G_o|^2$ gives $F(\omega,\eta) = \kappa^2(\omega,\eta)$.

As the example above illustrates, the generalized quality measure (3.100) also includes the worst-case chordal distance. Thus the methods proposed in this section can handle a quality constraint in the form of an upper bound on the worst-case chordal distance as well. The objective of this section is to develop tools such that constraints like (3.100) can be incorporated in the framework developed in Sections 3.2-3.6. Thus we want to transform (3.100) into linear matrix inequalities.

Input design based on the worst-case chordal distance as the objective function has been treated in (Hildebrand and Gevers, 2003). That was the first contribution to consider input design with respect to parametric uncertainties in terms of confidence ellipsoids. The method in (Hildebrand and Gevers, 2003) is an iterative procedure that consists of two steps in each iteration. In the first step the worst-case chordal distance is computed for a fixed input design by a method proposed in (Bombois et al., 1999). This method solves a convex optimization problem for each frequency on a frequency grid, and the maximum over this grid is taken as the worst-case chordal distance, see Section 7.2. Based on this solution, a cutting plane is defined for the input design variables, and in the second step these are updated by an ellipsoid algorithm based on the cutting plane. Then the first step is repeated. Consequently, the method obtains a solution by solving several convex optimization problems. In this section we propose an alternative solution in which only one optimization problem is solved. A second difference is that we consider a fixed bound on the quality constraint, while in (Hildebrand and Gevers, 2003) the input power is fixed.
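The substitution in Example 3.14 can be verified numerically: with those choices, (3.101) evaluates to the squared chordal distance at every frequency. A minimal sketch using hypothetical first-order frequency responses for $G_o$ and $G(\eta)$:

```python
import cmath

# Check of Example 3.14: with Wn = Wd = 1, Yn = 1, Xn = -Go, Xd = 0,
# Kn = 0 and Kd = Yd = 1 + |Go|^2, the measure (3.101) equals (3.102).

def F_measure(G, Go):
    num = abs(1 * G + (-Go)) ** 2 * 1 + 0                    # numerator of (3.101)
    den = abs(1 * G + 0) ** 2 * (1 + abs(Go) ** 2) + (1 + abs(Go) ** 2)
    return num / den

def chordal2(G, Go):
    return abs(G - Go) ** 2 / ((1 + abs(G) ** 2) * (1 + abs(Go) ** 2))

ok = True
for i in range(40):
    z = cmath.exp(1j * (0.07 + 0.077 * i))
    Go = (0.5 / z) / (1 - 0.3 / z)   # hypothetical "true" response
    G = (0.55 / z) / (1 - 0.25 / z)  # hypothetical model response
    ok = ok and abs(F_measure(G, Go) - chordal2(G, Go)) < 1e-12
print(ok)  # prints True
```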
3.7.1 Reformulation as a Convex Problem Consider the model structure (2.2), which is parametrized by the vector θ. Partition θT = [η T ξ T ] such that G(θ) = G(η) and let G(η) be parametrized as G(η) = q −nk (b1 + · · · + bnb q −nb +1 ) ZN (q)η = 1 + · · · + ana q −na 1 + ZD (q)η (3.103) where nk is the delay, η T = [a1 . . . ana b1 . . . bnb ] ∈ RnG na +nb , ZN (q) = q −nk [0 . . . 0 1 q −1 . . . q −nb +1 ] and (3.104) 3.7 Quality Constraints in Ellipsoidal Regions ZD (q) = [q −1 . . . q −na 0 . . . 0] 75 (3.105) are row vectors of size nG . Let A denote the complex conjugate of A. Lemma 3.6 Let F (ω, η) be deﬁned by (3.101) and let G(η) be parametrized as (3.103). Then T η η (γF0 (ω) − F1 (ω)) ≥0 (3.106) F ≤γ⇔ 1 1 where F0 (ω) = f (ω)(Md (ω)+Md (ω)) and F1 (ω) = f (ω)(Mn (ω)+Mn (ω)) and where Md (ω) = ZV∗ (Yd (ω)xd x∗d + Kd (ω)vv T )ZV Mn (ω) = ZV∗ (Yn (ω)xn x∗n + Kn (ω)vv T )ZV (3.107) ∗ ∗ ZN ZD 0 with = , v T = [0 1 1], x∗d = [Wd (ejω ) Xd (ejω ) Xd (ejω )] 0 0 1 and xn deﬁned analogous to xd . Furthermore f (ω) is the least common denominator of Md (ω) and Mn (ω). ZV∗ Proof: Using that both the numerator and the denominator of F have the quadratic form (W G + X)∗ Y (W G + X) + K and exploiting the parametrization of G in (3.103), we obtain F ≤γ⇔ T η η (γMd (ω) − Mn (ω)) ≥0 1 1 where Md and Mn are deﬁned by (3.107). The equivalence still holds when it is multiplied by f (ω) and since η is real this will be equivalent to (3.106). The equivalence in (3.106) will be further exploited in the next theorem. Theorem 3.1 Assume that F (ω, η) < ∞ for all ω. Furthermore assume that γF0 (ω) − F1 (ω) is not positive semideﬁnite. Then the following two statements are equivalent: 76 3 Fundamentals of Experiment Design 1. F (ω, η) ≤ γ ∀ ω ∈ [−π, π] and ∀ η ∈ Υ, Υ = {η | (η − ηo )T R(η − ηo ) ≤ 1} (3.108) 2. 
∃ τ(ω) > 0, τ(ω) ∈ R, such that

τ(ω)(γF_0(ω) − F_1(ω)) − E ≥ 0  ∀ ω,   E = [ −R        Rη_o
                                              η_o^T R   1 − η_o^T R η_o ]   (3.109)

Proof: Lemma 3.6 gives that F ≤ γ is equivalent to

σ_0(η) = [η ; 1]^T (γF_0(ω) − F_1(ω)) [η ; 1] ≥ 0.   (3.110)

Expression (3.110) states that F ≤ γ for a particular η. Now this must be true for all η ∈ Υ. The ellipsoid Υ can be parametrized as

σ_1(η) = [η ; 1]^T [ −R        Rη_o
                     η_o^T R   1 − η_o^T R η_o ] [η ; 1] ≥ 0.   (3.111)

Hence the condition F ≤ γ, ∀ ω ∈ [−π, π] and η ∈ Υ, is equivalent to σ_0(η) ≥ 0 for all ω and for all η such that σ_1(η) ≥ 0. Such a problem can be handled by the S-procedure (Boyd et al., 1994), which states the following equivalence for each ω:

σ_0(η) ≥ 0 ∀ η ∈ R^k such that σ_1(η) ≥ 0  ⇔  ∃ β ≥ 0, β ∈ R, such that σ_0(η) − βσ_1(η) ≥ 0 ∀ η ∈ R^k.

Since there has to exist one β ≥ 0 for each ω, we can rewrite β as a function of ω which has to fulfill β(ω) ≥ 0 for all ω. Finally, to obtain the expression (3.109) we change the variable β(ω) as τ(ω) = 1/β(ω), which is valid if we can show that β(ω) ≠ 0. The expression σ_0(η) − βσ_1(η) ≥ 0 is true for β(ω) = 0 only if γF_0(ω) − F_1(ω) ≥ 0 for some ω. The assumption that γF_0(ω) − F_1(ω) is not positive semidefinite thus guarantees that β(ω) ≠ 0. Hence, we obtain the condition τ(ω) > 0 and we arrive at the statement in (3.109).

Theorem 3.1 is very interesting from an input design point of view. When considering input design, the variable R in (3.109) will be proportional to the inverse covariance matrix of the parameters, see (2.20). Furthermore, the inverse covariance matrix is affine in the input spectrum Φ_u and the cross spectrum Φ_{ue}, see (2.14). With a linear parametrization of these spectra, R also becomes linearly parametrized. Hence, Theorem 3.1 states that the worst-case (over all models in an ellipsoidal model set) max-norm performance constraint (3.108) can be translated into the condition (3.109), which is a linear matrix inequality in τ and R for each ω when γ is fixed and F(·) is given by (3.101).
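The S-procedure step in the proof can be checked numerically on a toy scalar example. The matrices S_0, S_1 and the multiplier β below are illustrative choices, not taken from the text; the point is that S_0 − βS_1 ⪰ 0 with β ≥ 0 forces σ_0(η) ≥ 0 wherever σ_1(η) ≥ 0.

```python
import numpy as np

# Quadratic forms sigma_i(eta) = [eta; 1]^T S_i [eta; 1] for a scalar eta.
# S1 encodes the ellipsoid (eta - eta_o)^2 R <= 1 exactly as in (3.111).
eta_o, R = 0.5, 4.0
S1 = np.array([[-R, R * eta_o],
               [R * eta_o, 1 - R * eta_o ** 2]])

# Pick some S0 and a multiplier beta >= 0 with S0 - beta*S1 positive semidefinite;
# the S-procedure then guarantees sigma_0 >= 0 wherever sigma_1 >= 0.
beta = 2.0
S0 = beta * S1 + np.array([[1.0, 0.0], [0.0, 0.1]])  # PSD remainder by construction
assert np.all(np.linalg.eigvalsh(S0 - beta * S1) >= -1e-12)

quad = lambda S, eta: np.array([eta, 1.0]) @ S @ np.array([eta, 1.0])
violations = [eta for eta in np.linspace(-2, 3, 1001)
              if quad(S1, eta) >= 0 and quad(S0, eta) < -1e-9]
print(len(violations))  # expected: 0, sigma_0 is nonnegative on the ellipsoid
```

Here σ_1(η) = 1 − R(η − η_o)², so the sampled check runs over the interval [0, 1] where σ_1 ≥ 0.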
There are two problems associated with this. First, τ is an unknown function of ω. Second, this is an infinite-dimensional constraint, similar to the performance constraint in (3.92), since the constraint has to hold for all frequencies. In the next subsection we address these issues.

3.7.2 A Finite Dimensional Formulation

It was shown in Lemma 3.1 that when a constraint can be viewed as a positivity constraint on a spectrum, the KYP-lemma can be applied to make the constraint finite-dimensional. We now introduce conditions that allow the KYP-lemma to be applied to (3.109) such that this constraint can be reduced to a finite-dimensional linear matrix inequality.

Lemma 3.7  Let F_0(ω), F_1(ω) and E be defined by (3.106), (3.107) and (3.109) and introduce

Λ(ω) = τ(ω)(γF_0(ω) − F_1(ω)) − E.   (3.112)

When τ(ω) is defined by

τ(ω) = Ψ(e^{jω}) + Ψ^∗(e^{jω}),   Ψ(e^{jω}) = Σ_{k=0}^{K−1} τ_k B_k(e^{jω})   (3.113)

for some linearly independent basis functions B_k, k = 0, . . . , K − 1, there exists a sequence {Λ_k} such that

Λ(ω) ≥ 0 ∀ ω  ⇔  Σ_{k=0}^{p} Λ_k (e^{−kjω} + e^{kjω}) ≥ 0 ∀ ω   (3.114)

where the variables {τ_k} and the elements of R appear linearly in {Λ_k}.

Proof: Both F_0 and F_1 have the structure Σ_k F_k e^{−kjω}. Multiplying both sides of Λ(ω) ≥ 0 by the least common denominator of τ(ω) gives the equivalence in (3.114).

The special parametrization (3.113) of τ(ω) will, according to Lemma 3.7, imply that the condition Λ(ω) ≥ 0 ∀ ω can be replaced by a positivity constraint on a spectrum, see (3.114). This fact can now be used together with Lemma 3.1 to transform the infinite-dimensional constraint (3.100) into a linear matrix inequality in the variables {τ_k} and the elements of R.

Theorem 3.2  Assume that F(ω, η) < ∞ for all ω and for all η ∈ Υ. Let F_0(ω) and F_1(ω) be defined by (3.106) and (3.107). Assume that γF_0(ω) − F_1(ω) is not positive semidefinite. Let τ(ω) be defined as in Lemma 3.7.
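To make the parametrization (3.113) concrete, here is a minimal sketch with B_k(e^{jω}) = e^{−jωk} and made-up real coefficients τ_k. In that case τ(ω) collapses to a real trigonometric polynomial, and its positivity can be checked on a frequency grid; the point of the KYP-lemma is precisely to replace such a grid check by a single finite-dimensional LMI.

```python
import numpy as np

def tau(omega, coeffs):
    # tau(w) = Psi(e^{jw}) + Psi*(e^{jw}) with Psi = sum_k tau_k e^{-jwk},
    # which collapses to the real trigonometric polynomial 2*sum_k tau_k cos(k*w).
    k = np.arange(len(coeffs))
    return 2.0 * np.cos(np.outer(omega, k)) @ np.asarray(coeffs)

w = np.linspace(-np.pi, np.pi, 2001)
coeffs = [1.0, 0.4, 0.1]      # tau_0, tau_1, tau_2 (illustrative values)
vals = tau(w, coeffs)
print(vals.min() >= 0)        # grid check of tau(w) >= 0 over [-pi, pi]
```

The same structure, with matrix coefficients Λ_k, is what makes the right-hand side of (3.114) a spectrum to which Lemma 3.1 applies.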
Then there exists a state-space representation {A_τ, B_τ, C_τ, D_τ} of the positive real part of τ(ω), where {τ_k} appears linearly in C_τ and D_τ. Similarly, there exists a state-space representation {A_Λ, B_Λ, C_Λ, D_Λ} of the positive real part of the spectrum (3.114), where {τ_k} and the elements of R appear linearly in C_Λ and D_Λ. Furthermore, it holds that

F(ω, η) ≤ γ  ∀ ω ∈ [−π, π] and ∀ η ∈ Υ,  Υ = {η | (η − η_o)^T R(η − η_o) ≤ 1}   (3.115)

if there exist Q_τ = Q_τ^T and Q_Λ = Q_Λ^T such that

K(Q_τ, {A_τ, B_τ, C_τ, D_τ}) ≥ 0
K(Q_Λ, {A_Λ, B_Λ, C_Λ, D_Λ}) ≥ 0   (3.116)

Proof: Due to the parametrization of τ(ω), the constraints (3.116) assure that τ(ω) ≥ 0 ∀ ω and Λ(ω) ≥ 0 ∀ ω according to Lemma 3.1 and Lemma 3.7. Whenever τ(ω) ≥ 0 ∀ ω and Λ(ω) ≥ 0 ∀ ω, this will, according to Theorem 3.1, imply (3.115).

Theorem 3.2 is quite powerful in experiment design problems. Notice that the inequalities (3.116) are LMIs in Q_τ, Q_Λ, τ_k, k = 0, . . . , K − 1, and the elements of R. Furthermore, when R = P^{−1} this quantity becomes linearly parametrized when the input spectrum is linearly parametrized as in Section 3.2. The use of this theorem will be illustrated in Section 3.11.

3.8 Biased Noise Dynamics

We have so far treated the case where both the system model G and the noise model H are flexible enough to capture the corresponding quantities of the true system. Here we will relax this assumption and only assume that the system dynamics are captured by the model structure. Thus, H is allowed to be biased. We will further assume that G and H are independently parametrized. Introduce the parameter vector

θ = [η ; ξ]

such that G(θ) = G(η) and H(θ) = H(ξ). The true noise dynamics are represented by H_o, and let Φ_v(ω) = |H_o(e^{jω})|² λ_o denote the noise spectrum. Furthermore, there exists a parameter vector η_o such that G(η_o) = G_o, since we assume that G captures the structure of the true system G_o.
Then the prediction error estimate (2.11) will, under mild assumptions, be such that

lim_{N→∞} [η̂_N ; ξ̂_N] = θ̄ = [η_o ; ξ̄]   (3.117)

where θ̄ minimizes the variance of the prediction errors, see (Ljung, 1999). Furthermore, the covariance of η̂_N is, under the assumption of open-loop operation, approximately

Cov η̂_N ≈ (1/N) R_η^{−1} Q_η R_η^{−1}   (3.118)

where

R_η = (1/2π) ∫_{−π}^{π} F̃_u(e^{jω}, η_o) F̃_u^∗(e^{jω}, η_o) Φ_u(ω) dω,   (3.119)

Q_η = (1/2π) ∫_{−π}^{π} (Φ_v(ω) / |H(e^{jω}, ξ̄)|²) F̃_u(e^{jω}, η_o) F̃_u^∗(e^{jω}, η_o) Φ_u(ω) dω,   (3.120)

and

F̃_u(e^{jω}, η_o) = (1 / H(e^{jω}, ξ̄)) dG(e^{jω}, η_o)/dη.   (3.121)

Remark: Both R_η and Q_η are linear in the input spectrum Φ_u.

We will now utilize the observation in the last remark in order to make different quality constraints that are based on the property (3.118) convex in the input spectrum. Notice that R_η and Q_η have the same structure as P^{−1} in (3.41) with R_o = 0. From this, it is easy to realize that the methods to parametrize the covariance matrix P^{−1}, presented in Section 3.3, also apply to R_η and Q_η. Thus, it is possible to obtain a linear and finite-dimensional parametrization of R_η and Q_η. This is useful when formulating quality constraints for optimal input design.

3.8.1 Weighted Variance Constraints

The first example of where (3.118) can be utilized is when we consider the variance of G(e^{jω}, η̂_N), which, using a first-order Taylor approximation, can be expressed as

Var G(e^{jω}, η̂_N) ≈ (1/N) (dG^∗(e^{jω}, η_o)/dη) R_η^{−1} Q_η R_η^{−1} (dG(e^{jω}, η_o)/dη).   (3.122)

Now consider the constraint

|W(e^{jω})|² (dG^∗(e^{jω}, η_o)/dη) R_η^{−1} Q_η R_η^{−1} (dG(e^{jω}, η_o)/dη) ≤ 1 ∀ ω,   (3.123)

where W(e^{jω}) is a stable transfer function. This is a frequency-by-frequency constraint on the variance of G(e^{jω}, η̂_N). Notice that Corollary 3.1 now applies to the constraint (3.123) with P^{−1} = R_η Q_η^{−1} R_η. Thus, (3.123) is equivalent to

R_η Q_η^{−1} R_η − V(e^{jω}) V^∗(e^{jω}) ≥ 0 ∀ ω,   (3.124)

where

V(e^{jω}) = W(e^{jω}) dG(e^{jω}, η_o)/dη.   (3.125)
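The expressions (3.118)-(3.121), as reconstructed above, can be evaluated by simple numerical quadrature. The sketch below does so for a hypothetical one-parameter example: an FIR model G(q, η) = ηq^{−1} with a fixed, biased noise model H ≡ 1, while the true noise dynamics are H_o(q) = 1/(1 − 0.5q^{−1}) with λ_o = 0.1 and a white input Φ_u = 1. All values are made up for illustration; note in particular that R_η and Q_η are linear in Φ_u.

```python
import numpy as np

# Uniform frequency grid over [-pi, pi); for smooth periodic integrands the
# plain mean is a highly accurate approximation of (1/2pi) * integral.
w = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
avg = lambda f: np.mean(f)

lam_o = 0.1
Phi_u = np.ones_like(w)                                    # white input
Phi_v = lam_o * np.abs(1.0 / (1.0 - 0.5 * np.exp(-1j * w))) ** 2
H_bar = np.ones_like(w)                                    # biased noise model H = 1
F_u = np.exp(-1j * w) / H_bar                              # (dG/deta) / H(xi_bar), cf. (3.121)

R_eta = avg(np.abs(F_u) ** 2 * Phi_u)                                # (3.119)
Q_eta = avg(Phi_v / np.abs(H_bar) ** 2 * np.abs(F_u) ** 2 * Phi_u)   # (3.120)

N = 500
cov = Q_eta / (R_eta ** 2 * N)   # sandwich formula (3.118), scalar case
print(round(R_eta, 4), round(Q_eta, 4))
```

Because both R_η and Q_η are plain averages of Φ_u against fixed frequency weights, scaling or reshaping Φ_u acts linearly on them, which is the property exploited when the spectra are parametrized as in Section 3.3.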
The constraint (3.124) can, by the use of Schur complements, be expressed as

[ V(e^{jω}) V^∗(e^{jω})   R_η
  R_η                     Q_η ] ≤ 0 ∀ ω.   (3.126)

Remark: The constraint (3.126) is a linear matrix inequality in R_η and Q_η. By suitable parametrizations of R_η and Q_η, see Section 3.3, the constraint becomes a finite-dimensional LMI for each fixed frequency. To handle the frequency dependence, the function (3.126) can either be sampled or Lemma 3.5 may be used, resulting in LMIs of the type (3.97).

3.8.2 Parametric Confidence Ellipsoids

It is also possible to define confidence regions for the estimates η̂_N as

N (η̂_N − η_o)^T R_η Q_η^{−1} R_η (η̂_N − η_o) ≤ χ.   (3.127)

In Section 3.7, we considered the generalized quality measure

F(ω, η) ≤ γ  ∀ ω and ∀ η ∈ Υ,  Υ = {η | (η − η_o)^T R(η − η_o) ≤ 1},   (3.128)

where F was defined by (3.101). In Section 3.7, it was shown how the constraints defined by (3.128) could be transformed into constraints that are linear in the matrix R, the matrix that determines the shape of the confidence ellipsoid. Now we will show how the confidence ellipsoid (3.127) can be fit into the theory developed in Section 3.7. For this we let R = R_η Q_η^{−1} R_η and the objective is to obtain a constraint linear in Q_η and R_η.

Under the assumptions stated in Theorem 3.1, we know that the quality constraint (3.128) is equivalent to

τ(ω)(γF_0(ω) − F_1(ω)) − E ≥ 0 ∀ ω,   E = [ −R        Rη_o
                                             η_o^T R   1 − η_o^T R η_o ]   (3.129)

for some positive, real and scalar-valued function τ(ω) > 0. Notice that (3.129) is linear in R. Introduce the matrix M(ω) defined by

M(ω) = τ(ω)(γF_0(ω) − F_1(ω)) − [0 · · · 0 1]^T [0 · · · 0 1].   (3.130)

Then (3.129) is equivalent to

M(ω) + [I  η_o]^T R [I  η_o] ≥ 0 ∀ ω.   (3.131)

Now let R = R_η Q_η^{−1} R_η and insert this into (3.129). Then (3.129) is, using Schur complements, equivalent to

[ M(ω)            [R_η ; η_o^T R_η]
  [R_η  R_η η_o]  −Q_η             ] ≥ 0 ∀ ω,   (3.132)

which is a constraint that is linear in Q_η and R_η.
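The Schur-complement manipulations above rest on the standard equivalence: for C ≻ 0, the block matrix [A B; B^T C] ⪰ 0 if and only if A − BC^{−1}B^T ⪰ 0. A quick randomized sanity check of this fact (illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def psd(M, tol=1e-9):
    # Positive semidefiniteness test via the symmetric eigenvalue solver.
    return np.all(np.linalg.eigvalsh((M + M.T) / 2) >= -tol)

# Random test of the Schur-complement equivalence:
# for C > 0, [[A, B], [B^T, C]] >= 0  <=>  A - B C^{-1} B^T >= 0.
for _ in range(200):
    n = 3
    B = rng.standard_normal((n, n))
    Cs = rng.standard_normal((n, n))
    C = Cs @ Cs.T + np.eye(n)          # C positive definite
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2
    M = np.block([[A, B], [B.T, C]])
    assert psd(M) == psd(A - B @ np.linalg.inv(C) @ B.T)
print("Schur-complement equivalence verified on random instances")
```

This is the mechanism that trades the inverse Q_η^{−1} in (3.124) and (3.129) for an extra block row and column, at the price of a larger but linear matrix inequality.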
One possibility to handle the frequency dependence in the constraint (3.132) is to apply the theory in Section 3.7.2 to obtain a finite-dimensional formulation of (3.132). An alternative is to sample the constraint (3.132) along the frequency axis. For parametrizations of R_η and Q_η, we refer to Section 3.3.

3.9 Computational Aspects

We have in this chapter frequently used the KYP-lemma to embed positivity constraints on finite auto-correlation sequences into LMI descriptions, cf. Lemma 3.1, Lemma 3.5 and Theorem 3.2. KYP constraints appear in many control and signal processing applications, e.g. linear system design and analysis (Boyd and Barratt, 1991; Hindi et al., 1998), robust control design using integral quadratic constraints (Jönsson, 1996; Megretski and Rantzer, 1997), and quadratic Lyapunov function search (Boyd et al., 1994). Applications related to finite autocorrelation sequences are filter design (Wu et al., 1996) and MA-estimation (Stoica et al., 2000). However, embedding KYP-constraints via semidefinite programs is often computationally costly. The computational complexity is of the order O(n⁶) using standard solvers, where n is the number of free parameters. This has led to a number of contributions that study different possibilities to reduce the complexity for this type of problem, see e.g. (Hansson and Vandenberghe, 2001; Wallin et al., 2003; Gillberg and Hansson, 2003; Kao et al., 2004). Methods related to finite autocorrelation sequences are reported in (Dumitrescu et al., 2001; Alkire and Vandenberghe, 2002). It is shown that by solving a dual problem the complexity can be reduced to O(n⁴), or even further in some cases where the structure of the dual is exploited.

3.10 Robustness Aspects

Solutions to most optimal input design problems depend on the true, and unknown, underlying system. One common way to overcome this is to replace the true system in the design by some estimate of the system that, e.g.,
is obtained from an initial identification experiment. However, due to the estimation error, there is no guarantee that a design based on such an estimate will yield a solution that is satisfactory when applied to the true system. Hence there is a need to develop methods that are robust with respect to the true system. Below we discuss some important issues: the parametrization of the input spectrum, min-max solutions with respect to a set of initial models, the influence of input design on estimated low-order models, and finally adaptation as a useful tool for experiment design.

3.10.1 Input Spectrum Parametrization

As discussed in Section 3.2, it may happen that the input design problem only depends on c̃_0, . . . , c̃_{M−1}, for some finite positive integer M, in a certain expansion of the input spectrum (3.27). The additional degrees of freedom, c̃_M, c̃_{M+1}, . . . can then be used to increase the robustness of the design. For the partial correlation parametrization, it is clear that different correlation extensions yield different robustness properties. Using a discrete spectrum with a minimal number of non-zero spectral lines for the correlation extension may lead to a design that is very vulnerable to errors in a priori assumptions about the system behavior. For example, if the true system order turns out to be higher than the number of non-zero spectral frequencies (over the interval [0, 2π)), the system will not be identifiable if a separately parametrized noise model is used. Furthermore, even if, say, an ARX-model is used so that the denominator polynomial in the system dynamics may be estimated from the noise, certain directions of the parameters associated with the system dynamics are not improved upon by the input in this situation. An all-pole realization, on the other hand, will yield a spectrum that is nonzero everywhere and can be used to identify models of any order.
It is hence important to consider robustness issues when deciding which correlation extension method to use. In the finite-dimensional spectrum parametrization, additional constraints related to robustness can be included in the original program, e.g. frequency-by-frequency bounds to guarantee a certain excitation level. An alternative way to make the design less sensitive to the system estimate it is based on is to restrict the degrees of freedom in the parametrized spectrum. This restriction prevents the design from adapting too closely to the system the design is based on. Consider the design for a resonant system as illustrated in Section 3.1.4. When there was no frequency-by-frequency bound on the input spectrum, the design concentrated most of the input power around the first resonance peak. Such a design may be very vulnerable to bad estimates of this peak, cf. Section 5.3.3. Hence, a flatter spectrum will be less dependent on accurate knowledge of certain narrow frequency bands of the underlying true system. A way to force the design to produce flat spectra is to restrict the flexibility of the parametrization.

3.10.2 Working with Sets of a Priori Models

Robustness can be achieved by posing the design problem such that performance objectives and constraints are satisfied for all systems within some prior model set. Unfortunately, at present there exists no input design method with this property. However, for the approach described in this thesis it is possible to include constraints and objective functions for several systems simply by adding LMIs for each separate system to the overall problem. Even though no guarantees can be given in general that the objectives are met for the true system, this approach provides improved robustness compared to a design problem that is based on a single prior model.
Given an initial estimate θ̂_i and associated covariance matrix P_N, one may for example pick models in the corresponding uncertainty set U_{θ̂_i} (see (2.20) for a definition of U_θ). One may also draw samples from the corresponding normal distribution. In the numerical example given in the next section we will compare the optimal solution, based on knowledge of the true system, with a design where the true system is replaced by normally distributed samples lying in a confidence region obtained from an initial identification experiment.

3.10.3 Adaptation

Another way to combat the uncertainty is to adapt the experiment design as more data, and thus more information about the system, is gathered. This has e.g. been suggested in (Forssell and Ljung, 2000; Lindqvist and Hjalmarsson, 2000; Samyudia and Lee, 2000). In (Hjalmarsson et al., 1996), a closed-loop experiment design is used in a two-step procedure. There are, however, few contributions on adaptive experiment design. One exception is presented in (Lindqvist, 2001), where an FIR input filter is adaptively updated. The algorithm is guaranteed to converge under mild assumptions on the input filter together with the assumption of the system belonging to the model class. Furthermore, the algorithm is stable as long as the open-loop system is stable and provided that the input variance is constrained. Other recent contributions related to adaptive input design are (Rivera et al., 2003) and (Lacy et al., 2003).

3.10.4 Low and High Order Models and Optimal Input Design

In reality we do not know the complexity of the true system. Here we will illustrate, by means of a very simple example, that optimal input design can be useful to obtain models of both low and high order that have the same statistical accuracy as the corresponding full-order model estimate.
Example 3.15  Suppose that the objective is to estimate the static gain of the FIR system

y(t) = Σ_{k=1}^{n} b_k^o u(t − k) + e_o(t)   (3.133)

where e_o is Gaussian white noise with variance λ_o. Furthermore, assume that the input power is bounded by (1/N) Σ_{t=1}^{N} u²(t) ≤ λ_u. It is easy to realize that a constant input with amplitude √λ_u is optimal for estimating the static gain, cf. Example 3.3 and let a tend to one. This will of course lead to a poorly conditioned problem if we use a model of order two or larger. Let us study the properties of a linear regression model of arbitrary order when the input is static. The model is thus represented by

y(t) = φ^T(t)θ + e(t) = [u(t − 1) · · · u(t − m)] [b_1 ; . . . ; b_m] + e(t)   (3.134)

where e represents white noise. Thus the one-step-ahead output prediction is

ŷ(t) = φ^T(t)θ   (3.135)

where θ_m = [b_1, . . . , b_m]^T. The least-squares estimate θ̂_m is a solution to the normal equations

Σ_{t=1}^{N} φ(t) y(t) = Σ_{t=1}^{N} φ(t) φ^T(t) θ_m.   (3.136)

Since u is static with amplitude |u(t)| = √λ_u, we can study the first row of (3.136), from which we obtain that

[1, . . . , 1] θ̂_m = [1, . . . , 1] θ_o + (1/(√λ_u N)) Σ_{t=1}^{N} e_o(t)   (3.137)

where θ_o = [b_1^o, . . . , b_m^o]^T. Hence, the least-squares estimate θ̂_m provides an unbiased estimate of the true static gain, since

E{[1, . . . , 1] θ̂_m} = [1, . . . , 1] θ_o.   (3.138)

This is independent of the model order, i.e. it holds both for models of lower order than the true system and for over-parametrized models. Furthermore, the variance of the estimate of the static gain is λ_o/(N λ_u), also independently of the model order, i.e. the estimates have the same accuracy as the static gain estimate based on a full-order model. This example is further illustrated in (Hjalmarsson, 2004). This is a very interesting example.
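Example 3.15 is easy to reproduce in simulation. The sketch below (all numerical values invented for illustration) applies a constant input of amplitude √λ_u to an FIR(3) system and fits linear regression models of orders m = 1 and m = 3. Since the regressor matrix has rank one under a constant input, the minimum-norm least-squares solution of the normal equations is used; the static-gain estimate [1, . . . , 1]θ̂_m comes out unbiased with variance close to λ_o/(Nλ_u) for both orders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam_u, lam_o = 200, 1.0, 0.5
b_o = np.array([0.5, 0.3, -0.2])            # true FIR(3) coefficients, static gain 0.6
u = np.full(N + len(b_o), np.sqrt(lam_u))   # constant input with amplitude sqrt(lam_u)

def gain_estimate(m, e):
    # Simulate outputs in steady state and fit an FIR(m) model by least squares.
    y = np.convolve(u, b_o)[len(b_o):len(b_o) + N] + e
    Phi = np.column_stack(
        [u[len(b_o) - k - 1:len(b_o) - k - 1 + N] for k in range(m)])
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]   # minimum-norm solution
    return theta.sum()                               # static gain [1,...,1] theta

for m in (1, 3):                                     # under- and exactly parametrized
    g = np.array([gain_estimate(m, rng.normal(0, np.sqrt(lam_o), N))
                  for _ in range(3000)])
    # Mean should be near 0.6; normalized variance g.var()*N*lam_u/lam_o near 1.
    print(m, round(g.mean(), 2), round(g.var() * N * lam_u / lam_o, 2))
```

Both model orders give essentially the same empirical mean and normalized variance, in line with (3.137) and (3.138).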
It illustrates that it is possible to obtain an accurate estimate of the static gain when the model order is lower than that of the true system by applying a suitable input. Normally, undermodeling leads to a bias contribution that makes the total error larger than the error obtained with a full-order model. Furthermore, we obtain the same accuracy even in the case of over-parametrization, whereas the variance error usually increases with the model order. In Chapter 6, we will see a similar phenomenon when we apply optimal input design for the identification of system zeros.

3.11 Framework Review and Numerical Illustration

The necessary pieces of a quite flexible framework for experiment design have been presented in the preceding sections. Here we briefly recapitulate the main points. Furthermore, an example illustrating some of the features will be presented.

The formulation of an experiment design problem can be decomposed into the following parts:

• Spectrum parametrization. Here, the framework allows for any linear and finite-dimensional parametrization of the spectrum or a partial expansion thereof. This includes e.g. all-zero (FIR) and all-pole (AR) spectra, as well as discrete spectra. The choice of whether to use a finite-dimensional spectrum parametrization or a partial correlation parametrization is governed by:

– optimality aspects,
– computational aspects,
– signal constraint aspects, and
– robustness aspects.

The partial correlation parametrization is globally optimal and may use a minimal number of parameters, leading to lower computational complexity. However, certain signal constraints cannot be guaranteed and the parametrization may depend on the true system. The finite-dimensional spectrum parametrization does not in general yield a globally optimal solution, but the basis functions need not be functions of the true system and this approach can handle frequency-by-frequency signal constraints.

• Quality constraints.
General linear, possibly frequency-dependent, functions of the asymptotic covariance matrix P can be used. Such functions take the form (3.92) in Lemma 3.4. To handle frequency dependence, the function (3.92) can either be sampled or Lemma 3.5 may be used, resulting in LMIs of the type (3.97). These types of quality measures can either be included as fixed constraints or included in the objective function, e.g. γ in (3.92) can be either fixed or minimized. It is also possible to use certain types of quality constraints that are guaranteed to hold in a confidence region. These functions take the form (3.101) and include, e.g., weighted frequency function errors (see Example 3.13) and worst-case chordal distance measures (see Example 3.14). The resulting constraint is given by (3.109). Also here, frequency dependence can be handled either by sampling the constraints or by application of the KYP-lemma. The latter approach requires a finite-dimensional parametrization of a certain variable τ(ω), cf. (3.113). This parametrization may introduce a certain conservatism in the design. The resulting constraints are the two LMIs in (3.116).

• Signal constraints. Input and output energy constraints are expressed by (3.57) and (3.59), respectively. One may, e.g., use (3.57) as the objective function to minimize the energy used in a certain frequency band. This may be of interest for systems such as flexible structures, where certain excitation frequencies may be highly damaging to the system. Frequency-by-frequency constraints can easily be included as discussed in Section 3.4.2. As for the quality constraints, signal constraints can either be included as fixed constraints or in the objective function.

• Robustness constraints. We refer to Section 3.10 for a discussion of different robustness aspects.

We will now illustrate some of the features via an example related to identification for control.
Example 3.16  In this example we illustrate the machinery of the framework on an identification for control problem. Let the true system be given by

y(t) = G_o(q)u(t) + e(t)   (3.139)

with G_o(q) = 0.36q^{−1}/(1 − 0.7q^{−1}) and where e(t) is zero-mean white noise with variance 0.1. The magnitude of the frequency function of G_o is shown as the dash-dotted line in Figure 3.6. The modeling objective is to be able to design a controller such that the resulting closed-loop system is stable and the complementary sensitivity function is close to

T = (1 − 0.1353)² q^{−1} / (1 − 0.1353 q^{−1})².

[Figure 3.6. Thick solid line: Optimal spectrum based on (3.143). Dashed line: Input spectrum obtained from the robust algorithm. Thin solid line: Weighting function T. Dash-dotted line: Open-loop system G_o. Dotted line: White noise input with variance Φ_u(ω) = 0.26.]

The amplitude curve of T is shown as the thin solid line in Figure 3.6. A sufficient condition for this is that the weighted relative model error (3.19) is sufficiently small (in particular ‖∆(θ)‖_∞ ≤ 1 guarantees stability) (Hjalmarsson and Jansson, 2003). Here we choose the condition

‖∆(θ)‖_∞ ≤ 0.1.   (3.140)

As model structure we choose

G(θ) = b q^{−1} / (1 − a q^{−1}),  θ = [a b]^T.

The sample size is set to N = 500 samples. (The sample size only acts as a scaling factor for the covariance matrix and it is easy to modify the results for arbitrary sample size.) The objective is to find the minimum energy required, and the corresponding input spectrum, such that (3.140) is satisfied for all models in the resulting 95% confidence region. We therefore use the criterion (3.98).
We thus have the problem

minimize_{Φ_u}  α
subject to  |T (G(θ_o) − G(θ)) / G(θ)|² ≤ γ²  ∀ ω,  ∀ θ : (θ − θ_o)^T P_N^{−1} (θ − θ_o) ≤ χ
            Φ_u(ω) ≥ 0  ∀ ω
            (1/2π) ∫_{−π}^{π} Φ_u(ω) dω ≤ α   (3.141)

where γ = 0.1 and where the elliptical constraint in (3.141) is defined by (2.20) with χ = 5.99, the 95% quantile of the χ²(2) distribution. When we restrict the structure of τ to τ(ω) = Σ_{k=0}^{9} τ_k (e^{−jωk} + e^{jωk}), Theorem 3.2 can be applied, giving the following approximation of (3.141) (recall the definition (3.32) of K):

minimize_{Φ_u, Q_τ, Q_Λ, τ_0, ..., τ_9}  α
subject to  K(Q_τ, {A_τ, B_τ, C_τ, D_τ}) ≥ 0
            K(Q_Λ, {A_Λ, B_Λ, C_Λ, D_Λ}) ≥ 0
            Q_τ^T = Q_τ,  Q_Λ^T = Q_Λ
            Φ_u(ω) ≥ 0  ∀ ω
            (1/2π) ∫_{−π}^{π} Φ_u(ω) dω ≤ α   (3.142)

where C_τ and D_τ depend linearly on τ_k, k = 0, . . . , 9, and C_Λ and D_Λ depend linearly on τ_k, k = 0, . . . , 9, and Φ_u. The above problem is thus convex in all free variables. In order to reduce the problem to a finite-dimensional one, the spectrum is parametrized as in (3.85) with B_k(e^{jω}) = e^{−jωk} and M = 20. Example 3.10 gives that this corresponds to shaping the input spectrum with an FIR filter of order 20 and that the parameters c_k in (3.85) correspond to the auto-correlation sequence r_k of the input. Lemma 3.1 gives that the positivity constraint on Φ_u now is equivalent to

K(Q, {A, B, C, D}) ≥ 0

and Lemma 3.3 gives that the input variance constraint can be expressed as r_0 ≤ α. Thus the input design problem (3.142) is equivalent to

minimize_{Q_τ, Q_Λ, τ_0, ..., τ_9, r_0, ..., r_19}  α
subject to  K(Q_τ, {A_τ, B_τ, C_τ, D_τ}) ≥ 0
            K(Q_Λ, {A_Λ, B_Λ, C_Λ, D_Λ}) ≥ 0
            K(Q, {A, B, C, D}) ≥ 0
            Q_τ^T = Q_τ,  Q_Λ^T = Q_Λ,  Q^T = Q
            r_0 ≤ α   (3.143)

Solving (3.143) gives the input spectrum shown as the thick solid line in Figure 3.6. The solution conforms quite well to common intuition: most of the energy is distributed around the desired bandwidth of the closed-loop system. The minimum power required is α = 0.26.
Figure 3.7 shows the parameter estimates of 1000 Monte Carlo runs for the optimal design based on (3.143). We see that the estimates are clustered inside the contour ‖∆‖_∞ = 0.1 as desired. In fact, 96% of the estimated models satisfy the quality constraint, since some of the estimates outside the confidence ellipse still are inside the level curve ‖∆‖_∞ = 0.1.

[Figure 3.7. Dots: Estimated model parameters from 1000 Monte Carlo runs based on the optimal design. Dashed ellipse: Estimated 95% confidence bound for the parameter estimates. Dash-dotted ellipse: Confidence bound for white noise input with Φ_u = 0.26.]

Remark: Given a feasible solution of (3.143), we know that (3.109) is satisfied, and according to Theorem 3.1 this implies that ‖∆‖_∞ ≤ 0.1 for all models in the uncertainty set. However, the restriction on τ(ω) will lead to a conservative input design, since the representation of τ(ω) only corresponds to a subclass of all τ(ω) ≥ 0. In this example, however, the solution is not very conservative at all.

For comparison purposes, confidence ellipses for the optimal design and a white noise design with the same variance (corresponding to Φ_u(ω) = 0.26, shown as the dotted line in Figure 3.6) are also shown in Figure 3.7. These ellipses are all based on estimates of the covariance matrix obtained from Monte Carlo simulations. We clearly see that the approximation of (3.141) made in (3.143) performs well, whereas the white input design clearly does not meet the objective and, in this case, is uniformly worse than the optimal design. In fact, only 67% of the models satisfy the quality constraint. To obtain a white input design with a confidence ellipsoid that is completely inside the contour curve corresponding to ‖∆‖_∞ = 0.1, an input variance of Φ_u(ω) = 0.85 is required.
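The acceptance test behind the quoted percentages can be sketched as follows: evaluate the weighted relative error ∆(θ) = T(G(θ_o) − G(θ))/G(θ) on a frequency grid and compare the grid maximum with 0.1. The candidate parameter values below are made up; only θ_o = [0.7, 0.36] and T come from the example.

```python
import numpy as np

w = np.linspace(1e-4, np.pi, 4000)
q = np.exp(1j * w)                                   # evaluate at q = e^{jw}

G = lambda a, b: b / (q - a)                         # b q^-1 / (1 - a q^-1)
c = 0.1353
T = (1 - c) ** 2 / (q * (1 - c / q) ** 2)            # T = (1-c)^2 q^-1 / (1 - c q^-1)^2

def delta_inf(a, b, ao=0.7, bo=0.36):
    # Grid approximation of ||Delta(theta)||_inf, the weighted relative model error.
    return np.max(np.abs(T * (G(ao, bo) - G(a, b)) / G(a, b)))

print(delta_inf(0.7, 0.36))        # true parameters give zero error
ok = delta_inf(0.69, 0.365) <= 0.1 # a nearby estimate counted as satisfying (3.140)
```

A Monte Carlo estimate θ̂ is counted toward the reported percentage precisely when its `delta_inf` value stays below 0.1.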
[Figure 3.7 caption, continued: Contour lines with interval 0.025 are plotted for ‖∆‖_∞; the thick solid contour corresponds to ‖∆‖_∞ = 0.1.]

In calculating the optimal design above, knowledge of the true system was used. Both the quality constraint and the covariance matrix P_N in (3.141) depend on the true system. This manifests itself in (3.143) in that A_Λ, B_Λ, C_Λ and D_Λ all depend on θ_o. In a practical situation the true system is unknown. In a second design we have used an initial identification experiment with Φ_u(ω) = 0.1 and N = 500 to obtain an estimate θ̂_i of the true parameters and an estimate P̂_i of their covariance. With this information at hand, 9 additional parameter estimates were drawn from a normal distribution with mean θ̂_i and covariance P̂_i. A total of 10 parameter estimates have been used as replacements for θ_o in (3.143). These estimates are shown as circles in Figure 3.8. This leads to an input design problem with 10 quality constraints instead of one.

[Figure 3.8. Dots: Estimated model parameters from 1000 Monte Carlo runs based on the robust design. Dashed ellipse: Estimated 95% confidence bound for the parameter estimates. Dash-dotted ellipse: Confidence bound for white noise input with Φ_u = 0.48. Contour lines with interval 0.025 are plotted for ‖∆‖_∞; the thick solid contour corresponds to ‖∆‖_∞ = 0.1. The square is the initial estimate θ̂_i and the circles are the randomly picked estimates used in the robust design.]

The resulting input spectrum is shown in Figure 3.6. Here we see that more total energy is required in order to satisfy the quality constraint for these 10 systems, compared to the single-system design considered previously. The power of the input obtained from this robustified design is α = 0.48. Estimates from 1000 Monte Carlo runs with this design are shown in Figure 3.8.
This figure also includes confidence ellipses for this design and a white noise design with the same power, i.e. Φ_u(ω) = 0.48. The price of having 10 different replacements for the unknown θ_o in the design is in this case that the solution becomes conservative compared to the design based solely on the true system. This is evidenced by the simulations: about 99.5% of the models satisfy the constraint ‖∆‖_∞ ≤ 0.1. However, the robustified algorithm yields a solution that is more effective than a white noise input with the same power. For a white noise input with Φ_u(ω) = 0.48, only 87% of the models satisfy the quality constraint.

Chapter 4

Finite Sample Input Design for Linearly Parametrized Models

Most experiment designs rely on uncertainty descriptions (variance expressions or confidence regions) that are valid asymptotically in the sample size N. In this chapter we take another step towards more reliable input designs by considering a finite sample size. For the considered class of linearly parametrized frequency functions it is possible to derive variance expressions that are exact for finite sample sizes. Based on these variance expressions, it is shown that optimization over the square of the Discrete Fourier Transform (DFT) coefficients of the input leads to convex optimization problems. Two different approaches are considered. The first method is based on a recently developed explicit variance expression which is valid for a class of model structures that includes FIR, Laguerre and Kautz models. A restriction with this expression is that the number of nonzero DFT coefficients of the input must equal the number of estimated parameters. The other method is directly based on the covariance matrix of the parameter estimates, which allows for an arbitrary number of nonzero DFT coefficients. This method relates to the framework presented in Chapter 3.
4.1 Introduction

In this chapter we will again consider input design when the true system is in the model set. Let G(e^{jω}, θ) be the frequency function to be estimated, with θ ∈ R^n the unknown parameter vector. Denote the parameter estimate based on a sample size N of the input-output data set by θ̂_N, and the corresponding frequency function estimate by Ĝ_N(e^{jω}) = G(e^{jω}, θ̂_N). The close connection between the variance of the frequency function estimate, Var Ĝ_N(e^{jω}), and the model uncertainty has prompted the use of this variance as the key variable in input design. The key expression for the variance is (2.23), i.e. the first-order Taylor approximation

    Var Ĝ_N(e^{jω}) ≈ (1/N) (dG^*(e^{jω}, θ_o)/dθ) P (dG(e^{jω}, θ_o)/dθ),    (4.1)

where θ_o is the true parameter vector. Based on the expression (4.1) it is difficult to interpret the influence of the experimental conditions, such as the input spectrum, on the variance. This has been one of the drivers in the search for alternative expressions for (4.1). By introducing

    κ_{n,N}(ω) = Var(Ĝ_N(e^{jω})) · N Φ_u(ω)/Φ_v(ω),

where Φ_u(ω) and Φ_v(ω) denote the input and noise spectra, respectively, one can write

    Var Ĝ_N(e^{jω}) = κ_{n,N}(ω) Φ_v(ω)/(N Φ_u(ω)).    (4.2)

In one very fruitful line of research, represented by (Gevers and Ljung, 1986; Hjalmarsson et al., 1996; Forssell and Ljung, 2000; Zhu, 2001), the high-order approximation

    κ_{n,N}(ω) ≈ m_o    (4.3)

for large enough model order m_o and sample size N has been used for input design. The direct frequency-by-frequency dependence of the variance on the input spectrum leads to explicit frequency-wise solutions for the input spectrum. The approximation (4.3) is motivated by the asymptotic result

    lim_{m→∞} lim_{N→∞} (N/m) Var Ĝ_N(e^{jω}) = Φ_v(ω)/Φ_u(ω),    (4.4)

originally derived in (Ljung and Yuan, 1985; Ljung, 1985).
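For orientation, there is one simple case where (4.3) holds exactly rather than approximately. The sketch below (toy values, not from the thesis) assumes an FIR model excited by an input with a flat spectrum, so that the noise spectrum is Φ_v(ω) = λ and the input covariance is white; the finite-sample expression derived later in this chapter then gives κ_{n,N}(ω) = n at every frequency.

```python
import numpy as np

# Minimal numerical sketch (toy values): for an FIR model (F = 1) with a
# flat input spectrum, the Toeplitz matrix of input covariances reduces
# to Phi_u * I, and the exact finite-sample variance equals
# (n/N) * lambda / Phi_u, i.e. kappa_{n,N}(omega) = n exactly.
n, N, lam, phi_u = 4, 64, 0.5, 2.0

r = np.zeros(n)
r[0] = phi_u                  # flat spectrum: r_zz(0) = Phi_u, else 0
T = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

omega = 1.3
Gamma = np.exp(-1j * omega * np.arange(1, n + 1))
var_exact = lam / N * np.real(Gamma.conj() @ np.linalg.solve(T, Gamma))
var_asymptotic = (n / N) * lam / phi_u   # (4.2) with kappa = n, Phi_v = lam
assert abs(var_exact - var_asymptotic) < 1e-12
print("kappa equals the model order for a flat input spectrum")
```

For shaped input spectra the two expressions differ, which is exactly what the exact finite-sample results below are designed to capture.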
Starting with (Ninness et al., 1999), there have been a number of contributions towards more exact expressions for κ_{n,N}(ω) for different model structures. The case of a model with fixed denominator and fixed moving average noise model excited by an auto-regressive (AR) input is studied in (Xie and Ljung, 2001), where an exact expression for κ_n(ω) ≜ lim_{N→∞} κ_{n,N}(ω) is derived. This expression is thus not asymptotic in the number of parameters. A generalization of this result, including Box-Jenkins models, is presented in (Ninness and Hjalmarsson, 2004; Ninness and Hjalmarsson, 2002a). For model classes that are linear in the parameters, the paper (Hjalmarsson and Ninness, 2004) presents an expression that is valid for finite sample sizes when the number of non-zero spectral lines of the input equals the number of estimated parameters.

Here we will exploit the variance results in (Hjalmarsson and Ninness, 2004) for input design. This leads to a geometric programming problem. A related approach is presented in (Lee, 2003), and we will comment further on the relation to this approach later in the chapter. We will also derive a method that is based on (4.1). This method is not restricted to the conditions in (Hjalmarsson and Ninness, 2004), i.e. more than n of the spectral lines can be taken non-zero. The price paid for this is that the design procedure itself is somewhat more abstract. Furthermore, the computational complexity is larger and becomes an issue for large numbers of excited spectral lines.

The outline of the chapter is as follows. Preliminaries concerning the Fourier representation of signals are covered in Section 4.2. The model structure, and least-squares estimation of its parameters, is introduced in Section 4.3. The input design problem is then introduced in Section 4.4 and solution methods are discussed in the following subsections. Section 4.5 contains a summary.
4.2 Discrete Fourier Transform Representation of Signals

Let z(t) ∈ R be defined for t = −(n − 1), . . . , N − 1 (where n > 0) and satisfy

    z(t + N) = z(t),  t = −(n − 1), . . . , −1.    (4.5)

Then we can write

    z(t) = Σ_{k=0}^{N−1} Z_k e^{jω_◦ k t},  t = −(n − 1), . . . , N − 1,    (4.6)

where ω_◦ = 2π/N. Since z(t) is real, it holds that Z_{N−k} = Z̄_k (complex conjugate), k = 1, . . . , N − 1, and Z_0 ∈ R. We define the covariance function for z(t) according to

    r_{zz}(τ) = (1/N) Σ_{t=0}^{N−1} z(t) z(t − |τ|),  |τ| ≤ n − 1.

Using (4.6) we have

    r_{zz}(τ) = Σ_{k=0}^{N−1} |Z_k|² e^{jω_◦ k τ},  |τ| ≤ n − 1.    (4.7)

When only m < N spectral lines of z(t) are non-zero, we will express (4.6) as

    z(t) = Σ_{k=1}^{m} Z̃_k e^{jω_k t},  t = −(n − 1), . . . , N − 1,    (4.8)

where Z̃_k = Z_l and ω_k = ω_◦ l for some 0 ≤ l ≤ N − 1. Notice that the covariance function can then be written

    r_{zz}(τ) = Σ_{k=1}^{m} |Z̃_k|² e^{jω_k τ},  |τ| ≤ n − 1.    (4.9)

4.3 Least-squares Estimation

It will be assumed that the true system is single-input/single-output and given by

    y(t) = G_◦(q) u(t) + e_◦(t),    (4.10)

where {e_◦(t)} is a zero mean i.i.d. process that satisfies E{|e_◦(t)|²} < ∞ and has variance λ_◦. The system dynamics is given by

    G_◦(q) = θ_◦^T Γ(q) F(q),    (4.11)

where F(q) is a known stable transfer function and where

    Γ(q) = [q^{−1}, . . . , q^{−n}]^T    (4.12)

(q^{−1} is the time-shift operator). By introducing

    z(t) = F(q) u(t)    (4.13)

and

    ϕ(t) = [z(t − 1), z(t − 2), . . . , z(t − n)]^T    (4.14)

we may rewrite (4.10) as

    y(t) = θ_◦^T ϕ(t) + e_◦(t).    (4.15)

It is assumed that the model structure is of the same form,

    y(t) = θ^T ϕ(t) + e(t).    (4.16)

Notice that several common model structures fit into this form. FIR systems correspond to F(q) = 1. Furthermore, fixed denominator structures such as Laguerre and Kautz models correspond to F(q) = 1/A(q) for some suitably chosen polynomial A(q) (Ninness et al., 1999).
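The identity (4.7) between the time-average covariance and the squared DFT magnitudes is easy to verify numerically; a minimal sketch with random toy data (the periodic extension (4.5) is implemented with a circular shift):

```python
import numpy as np

# Numerical check of (4.7): for a real signal z(t) extended periodically
# to negative times, the time-average covariance equals the sum of
# squared DFT magnitudes times complex exponentials. Toy data only.
rng = np.random.default_rng(0)
N, n = 32, 5
z = rng.normal(size=N)               # one period of z(t), t = 0..N-1
Z = np.fft.fft(z) / N                # Z_k in z(t) = sum_k Z_k e^{j w0 k t}
w0 = 2 * np.pi / N

for tau in range(n):                 # |tau| <= n-1
    r_time = np.mean(z * np.roll(z, tau))          # periodic extension
    r_dft = np.sum(np.abs(Z) ** 2 * np.exp(1j * w0 * np.arange(N) * tau))
    assert abs(r_time - r_dft) < 1e-10
print("covariance identity (4.7) verified")
```

The same circular-shift trick is what makes the Toeplitz structure in the next section exact rather than approximate.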
The available data are {y(t)}_{t=1}^{N} and {ϕ(t)}_{t=1}^{N}, and based on these samples the parameter vector θ is estimated using the least-squares method. This results in the estimate

    θ̂_N = [ (1/N) Σ_{t=1}^{N} ϕ(t) ϕ^T(t) ]^{−1} (1/N) Σ_{t=1}^{N} ϕ(t) y(t).    (4.17)

The function

    Ĝ_N(e^{jω}) = θ̂_N^T Γ(e^{jω}) F(e^{jω})    (4.18)

is an estimate of the true frequency function G_◦(e^{jω}). Before we proceed we introduce the notation

    T(t(τ))    (4.19)

for an n × n Toeplitz matrix with elements t_{ij} = t(i − j). For (4.14) it follows that

    (1/N) Σ_{t=1}^{N} ϕ(t) ϕ^T(t) = T(r_{zz}(τ)).    (4.20)

The input design methods to be presented below are based on the finite sample variance expression in the following lemma.

Lemma 4.1 Suppose that ϕ(t) is given by (4.14), where z(t), t = −(n − 1), . . . , N − 1, is deterministic and defined by (4.6). Suppose further that the number of estimated parameters n is not larger than the number of non-zero spectral lines of z(t). Then the least-squares estimate θ̂_N ∈ R^n defined by (4.17) is well defined and has covariance matrix

    Cov θ̂_N = (λ_◦/N) T^{−1}(r_{zz}(τ)),    (4.21)

where r_{zz} is the covariance function (4.7) of z. Furthermore, the variance of Ĝ_N(e^{jω}), defined in (4.18), is given by

    Var Ĝ_N(e^{jω}) = (λ_◦/N) |F(e^{jω})|² Γ^*(e^{jω}) T^{−1}(r_{zz}(τ)) Γ(e^{jω}).    (4.22)

Proof: See (Hjalmarsson and Ninness, 2004).

Define

    v_k(ω) = Π_{l=1, l≠k}^{m} |e^{jω} − e^{jω_l}|² / |e^{jω_k} − e^{jω_l}|²
           = Π_{l=1, l≠k}^{m} sin²((ω − ω_l)/2) / sin²((ω_k − ω_l)/2).    (4.23)

As the following theorem shows, it is possible to rewrite (4.22) such that the DFT coefficients appear explicitly in the variance expression when the number of non-zero DFT coefficients equals the number of estimated parameters.

Theorem 4.1 Let ϕ(t) and θ̂_N be defined by (4.14) and (4.17), respectively, with z(t) defined in (4.13) satisfying z(t + N) = z(t) for t = −(n − 1), . . . , −1, such that this signal can be written as (4.8). Furthermore, let the number of non-zero spectral lines m equal the number of estimated parameters n.
Then it holds that the variance of the frequency function estimate (4.18) is given by

    Var Ĝ_N(e^{jω}) = (λ_◦/N) |F(e^{jω})|² Σ_{k=1}^{m} v_k(ω)/|Z̃_k|².    (4.24)

Proof: See (Hjalmarsson and Ninness, 2004).

Corollary 4.1 (FIR models) Let ϕ(t) and θ̂_N be defined by (4.14) and (4.17), respectively, with F(q) = 1 and u(t) satisfying u(t + N) = u(t) for t = −(n − 1), . . . , −1, such that this signal can be written as

    u(t) = Σ_{k=1}^{m} Ũ_k e^{jω_k t},  t = −(n − 1), . . . , N − 1.    (4.25)

Furthermore, let the number of non-zero spectral lines m equal the number of estimated parameters n. Then it holds that the variance of the frequency function estimate (4.18) is given by

    Var Ĝ_N(e^{jω}) = (λ_◦/N) Σ_{k=1}^{m} v_k(ω)/|Ũ_k|²,    (4.26)

where v_k is defined by (4.23).

Proof: See (Hjalmarsson and Ninness, 2004).

The condition on the input sequence in Corollary 4.1 implies that the input has to be tapered, i.e. u(t) = 0 for t = N − n + 1, . . . , N, for a system that is started at the time of data collection. We next state the variance expression when the input is periodic.

Corollary 4.2 (Periodic input) Let u(t) be periodic with the Fourier representation (4.25) extended to hold also for t < −(n − 1). Let ϕ(t) and θ̂_N be defined by (4.14) and (4.17). Furthermore, let the number of non-zero spectral lines m equal the number of estimated parameters n. Then it holds that the variance of the frequency function estimate (4.18) is given by

    Var Ĝ_N(e^{jω}) = (λ_◦/N) |F(e^{jω})|² Σ_{k=1}^{m} v_k(ω)/|F(e^{jω_k}) Ũ_k|²,    (4.27)

where v_k is defined by (4.23).

Proof: See (Hjalmarsson and Ninness, 2004).

4.4 Input Design

Suppose that the maximum input amplitude

    u_max = max_{−(n−1) ≤ t ≤ N−1} |u(t)|    (4.28)

is restricted, i.e. u_max ≤ C_max for some positive constant C_max. It is then natural to consider the input design problem where the time it takes to satisfy some condition on the variance is minimized, i.e.
    minimize_{u(t), t=−(n−1),...,N−1}  N    (4.29)
    subject to  Var Ĝ_N(e^{jω}) ≤ b(ω),    (4.30)
                u_max ≤ C_max.

In view of (4.7), Lemma 4.1 shows that the input influences the accuracy of Ĝ_N only through the squared magnitudes |U_k|² of the spectral lines. This is true when F = 1, i.e. for FIR models, when u(t) = u(t + N), t = −(n − 1), . . . , −1, and in the general case when the input is periodic. The phases are immaterial, exactly as in the asymptotic case where only the input spectrum plays a role. Hence, it is natural to use α_k = |U_k|² as optimization variables. The freedom in the choice of phases can be used to reduce the maximum amplitude of the input. Various techniques exist to find the phases of U_k, k = 1, . . . , n, such that u_max is minimized given |U_k|, k = 1, . . . , n (Pintelon and Schoukens, 2001). There is a strong connection between this minimized u_max and the root mean square (RMS) value

    u_RMS = sqrt( (1/N) Σ_{t=1}^{N} |u(t)|² ) = sqrt( Σ_{k=0}^{N−1} |U_k|² ).    (4.31)

The crest factor

    Cr(u) = u_max / u_RMS    (4.32)

has a minimum of typically 1 ≤ Cr(u) ≤ 2. Hence, the constraint u_max ≤ C_max can be replaced by the constraint u_RMS ≤ C_RMS for some positive constant C_RMS. Thus a simpler, and more tractable, problem is to minimize the RMS for a given sample size N:

    minimize_{α_k, k=0,...,N−1}  Σ_{k=0}^{N−1} α_k    (4.33)
    subject to  (λ_◦/N) Γ^*(e^{jω}) T^{−1}(r_{zz}(τ)) Γ(e^{jω}) ≤ b(ω)/|F(e^{jω})|²,    (4.34)
                α_k = |U_k|²,

where, referring to Lemma 4.1, the variance constraint (4.30) is captured by the constraint (4.34). The problem (4.33) is not tractable due to the inequality (4.34), which is non-convex in |U_k|. In the subsections that follow we will discuss two methods to convert this problem into a convex programming problem.

4.4.1 Geometric Programming Solution

Below we will show that a convex formulation of the problem (4.33)–(4.34) exists if the number of spectral lines is restricted to the number of estimated parameters. Therefore, let z(t) be given by (4.8) with m = n.
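The phase freedom mentioned above is easy to illustrate numerically. In the sketch below (toy values, not from the thesis), a multisine with fixed amplitudes is generated twice: once with all phases zero, and once with Schroeder phases, a standard low-crest-factor choice from the multisine literature. The crest factor (4.32) differs dramatically.

```python
import numpy as np

# Crest-factor sensitivity to phases: same |U_k|, different phases.
# Zero phases stack all cosines at t = 0; Schroeder phases
# phi_k = -pi*k*(k-1)/m spread the signal out. Toy values throughout.
N, m = 512, 16
t = np.arange(N)
w0 = 2 * np.pi / N

def multisine(phases):
    return sum(np.cos(w0 * k * t + phases[k - 1]) for k in range(1, m + 1))

def crest(u):
    return np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2))

cr_zero = crest(multisine(np.zeros(m)))
schroeder = np.array([-np.pi * k * (k - 1) / m for k in range(1, m + 1)])
cr_schr = crest(multisine(schroeder))
print(cr_zero, cr_schr)      # Schroeder phases give a much smaller crest
assert cr_schr < cr_zero
```

This is what justifies replacing u_max ≤ C_max by an RMS constraint: after the magnitudes have been optimized, the phases can be chosen to push the crest factor towards its lower range.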
Let the corresponding DFT coefficients of u(t) be denoted by Ũ_k and assume that the input is such that Z̃_k = F(e^{jω_k}) Ũ_k (i.e. u is periodic). Define

    f_k(ω) = (λ_◦/N) v_k(ω)/|F(e^{jω_k})|²,  1 ≤ k ≤ n,    (4.35)
    α̃_k = |Ũ_k|²,  1 ≤ k ≤ n,    (4.36)

with v_k defined by (4.23). From Theorem 4.1 we have that

    Var Ĝ_N(e^{jω}) = |F(e^{jω})|² Σ_{k=1}^{n} f_k(ω) α̃_k^{−1}.    (4.37)

The input design problem can thus be written as

    minimize_{α̃_k, k=1,...,n}  Σ_{k=1}^{n} α̃_k    (4.38)
    subject to  Σ_{k=1}^{n} f_k(ω) α̃_k^{−1} ≤ b(ω)/|F(e^{jω})|²,  0 ≤ ω ≤ π,    (4.39)

where the free variables are the n frequencies ω_k, k = 1, . . . , n, and the n squared amplitudes α̃_k of the non-zero spectral lines. The constraint (4.39) is semi-infinite but can be approximated by a discretization over the frequency axis. For each fixed selection of the n non-zero spectral lines, (4.38) is a geometric program in α̃_k, k = 1, . . . , n. This problem can be converted into a convex problem via the transformation η_k = log(α̃_k) and by taking the logarithm of the constraints (Boyd and Vandenberghe, 2003). This leads to the following convex problem:

    minimize_{η_k, k=1,...,n}  log Σ_{k=1}^{n} e^{η_k}    (4.40)
    subject to  log Σ_{k=1}^{n} [ f_k(ω̄_l) |F(e^{jω̄_l})|² / b(ω̄_l) ] e^{−η_k} ≤ 0,  l = 1, . . . , M,

where M is the number of discretized frequencies ω̄_l. The convexity of (4.40) implies that the global optimum of this problem can be computed with arbitrary accuracy. Hence, with the frequencies of the non-zero spectral lines fixed, a suitable input can be determined. Returning to the problem (4.38), we see that its solution can be found by solving (4.40) for all selections of non-zero spectral lines. Given the sample size N, there are (N choose n) possible selections of non-zero spectral lines. Hence, when N is large an exhaustive search over all combinations becomes time consuming. However, a reasonable starting point is to use evenly distributed spectral lines.
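A small instance of the transformed problem (4.40) can be solved with a general-purpose NLP solver. Everything in the sketch below is an illustrative assumption: two fixed spectral lines, F = 1, λ_◦/N absorbed into a constant bound b, and scipy's SLSQP in place of a dedicated geometric programming solver. Minimizing the plain sum of exp(η_k) has the same minimizer as minimizing its logarithm.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of the log-transformed geometric program (4.40).
wk = np.array([0.5, 2.0])                      # fixed non-zero lines
grid = np.linspace(0.05, np.pi - 0.05, 30)     # discretization of (4.39)
b = 0.05                                       # constant variance bound

def v(k, w):
    """Weight v_k(w) from (4.23)."""
    others = np.delete(wk, k)
    return np.prod(np.sin((w - others) / 2) ** 2
                   / np.sin((wk[k] - others) / 2) ** 2)

F = np.array([[v(k, w) for k in range(len(wk))] for w in grid])

obj = lambda eta: np.sum(np.exp(eta))          # total input power
cons = [{"type": "ineq",                       # -log(sum f_k e^{-eta_k}/b) >= 0
         "fun": lambda eta, fl=fl: -np.log(np.sum(fl * np.exp(-eta)) / b)}
        for fl in F]
res = minimize(obj, x0=np.log([200.0, 200.0]), method="SLSQP",
               constraints=cons, options={"maxiter": 300})

alpha = np.exp(res.x)                          # optimal squared amplitudes
assert res.success and np.all(F @ (1.0 / alpha) <= b * 1.01)
print("optimal powers:", np.round(alpha, 2))
```

In the convex variables η_k any local solver converges to the global optimum, which is the whole point of the transformation.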
This is also supported by some observations in (Hjalmarsson and Ninness, 2004) regarding the sensitivity of the variance with respect to the choice of spectral lines in the input.

It is of interest to compare this approach to the frequency domain method proposed in (Lee, 2003), which is based on FIR modelling and finite data of the same type as in this contribution. In (Lee, 2003) it is conjectured that for FIR models the variance (4.1) can be written in a very simple form when n of the spectral lines are non-zero, i.e. exactly the same situation as considered in the present subsection. Based on this simple expression, an explicit expression for the optimal input spectrum is derived in (Lee, 2003). Compared with the approach above, a limitation in (Lee, 2003) is that the frequency behaviour outside of the non-zero spectral lines cannot be accounted for in the design. It can also be remarked that (Lee, 2003) proposes to optimize directly over the input sequence. This leads to bilinear matrix inequalities which are significantly harder to solve.

4.4.2 LMI Solution

Using Schur complements, the constraint (4.34) can be written as

    Λ(ω) ≜ [ N b(ω)/(λ_◦ |F(e^{jω})|²)   Γ^*(e^{jω}) ;  Γ(e^{jω})   T(r_{zz}(τ)) ] ≥ 0,  0 ≤ ω ≤ π.    (4.41)

When F(q) = 1, or when the input is periodic, the squared magnitudes of the DFT coefficients of the input appear linearly in the Toeplitz matrix T(r_{zz}(τ)); see (4.19), (4.9), (4.26) and (4.27). Thus the constraint (4.41) becomes convex, but semi-infinite since it has to hold for all frequencies between 0 and π. This problem can be handled by the KYP lemma (Yakubovich, 1962), similarly to the variance constraints in Chapter 3.6.

Lemma 4.2 Assume there exists a rational transfer function L(e^{jω}) such that b(ω)/|F(e^{jω})|² = L(e^{jω}) + L^*(e^{jω}). Let (A, B, C, D) be a controllable state-space representation of the positive real part of Λ(ω) in (4.41).
Then the constraint

    Λ(ω) ≥ 0,  0 ≤ ω ≤ π,

is equivalent to the LMI

    [ Q − A^T Q A    C^T − A^T Q B ;  C − B^T Q A    D + D^T − B^T Q B ] ≥ 0    (4.42)

with Q = Q^*, and where the optimization variables |U_k|² appear linearly in D.

Proof: When b(ω)/|F(e^{jω})|² can be represented by a spectrum, it is possible to find a controllable state-space description of Λ(ω). Given this realization, it is a consequence of the KYP lemma that the constraint (4.41) is fulfilled for each frequency if and only if there exists Q = Q^* satisfying (4.42).

This means that when there exists a rational transfer function L(e^{jω}) such that b(ω)/|F(e^{jω})|² = L(e^{jω}) + L^*(e^{jω}), the constraint (4.41) can be replaced by one single LMI. Thus the input design problem (4.33)–(4.34) can be solved exactly. With this approach there is no restriction on the number m of non-zero DFT coefficients (other than m ≥ n, which is required for identifiability).

Figure 4.1: Variance Var Ĝ_N(e^{jω}) for the design based on the geometric programming solution in Section 4.4.1. Solid line: Sample variance. Dashed line: Theoretical expression (4.22). Dotted line: The variance bound b(ω).

4.4.3 Numerical Illustration

Suppose that an FIR model of order n = 40 is to be estimated using N = 600 input/output samples. The problem is to design the input such that the variance is below the dotted curve in Figure 4.1. We will do two different designs, one based on the geometric programming solution corresponding to (4.40) and one based on the LMI solution suggested in Section 4.4.2.

Figure 4.2: Variance Var Ĝ_N(e^{jω}) for the design based on the LMI solution in Section 4.4.2. Solid line: Sample variance. Dashed line: Theoretical expression (4.22). Dotted line: The variance bound b(ω).
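The Schur-complement step that produced (4.41) can be checked numerically. The sketch below uses a random positive definite matrix in place of the actual Toeplitz structure (an illustrative assumption) and verifies that the block matrix is positive semidefinite exactly when the scalar variance constraint holds.

```python
import numpy as np

# Schur-complement check: with T > 0, the block matrix
# [[c, Gamma*], [Gamma, T]] is PSD iff Gamma* T^{-1} Gamma <= c.
rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
T = A @ A.T + n * np.eye(n)           # positive definite stand-in for T(r_zz)
w = 1.2
Gamma = np.exp(-1j * w * np.arange(1, n + 1))
q = np.real(Gamma.conj() @ np.linalg.solve(T, Gamma))   # Gamma* T^{-1} Gamma

for c, expect_psd in [(q * 1.1, True), (q * 0.9, False)]:
    Lam = np.block([[np.array([[c + 0j]]), Gamma.conj()[None, :]],
                    [Gamma[:, None], T.astype(complex)]])
    psd = np.min(np.linalg.eigvalsh(Lam)) >= -1e-9
    assert psd == expect_psd
print("Schur complement equivalence confirmed")
```

Since the design variables enter T linearly, this block form turns the non-convex scalar inequality (4.34) into a linear matrix inequality in |U_k|².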
Choosing the spectral lines to be evenly spaced, with a distance of 2π/n apart, and using the Matlab Optimization Toolbox command fmincon to solve the problem (4.40) results in the amplitudes Ũ_k for the spectral lines given in Figure 4.3. Figure 4.1 shows the variance: the result of Monte Carlo simulations, based on 5000 noise realizations, is compared with the theoretical expression (4.22).

The advantage of the LMI solution is that it is possible to choose m, the number of non-zero DFT coefficients, to be larger than the order of the system n. This has been utilized in the LMI design, where m = 60. Figure 4.2 shows the variance, comparing the result of Monte Carlo simulations, based on 5000 noise realizations, with the theoretical expression (4.22). The optimal amplitudes are shown in Figure 4.3. The spectral lines are evenly spaced, with a distance of 2π/m apart.

Figure 4.3: Amplitudes |Ũ_k|. Geometric programming design ('x') and LMI design ('o').

In Figure 4.4 two different inputs with these amplitudes are shown. The dotted curve corresponds to choosing the phase equal to zero for all components in u, whereas the solid line corresponds to random phases in the interval [0, 2π). Clearly the crest factor, and therefore the maximal amplitude u_max, is very sensitive to the choice of phases. The extra degree of freedom that the LMI design has compared to the geometric programming solution is beneficial in this example: the energy of the input based on the LMI design with m = 60 > n is about 21% less than for the solution based on m = n.

For the optimization procedures that have been presented in this chapter, the excited frequencies are pre-specified, and they are in general non-optimal since we do not know the location of the optimal frequencies in advance.
Thus, the more frequencies that are excited, the larger the probability that the solution is close to the optimum. Since the sample size N is finite, the available frequencies are restricted to the finite set Ω = {2πk/N, k = 0, . . . , N − 1}. Therefore, the optimal solution can be found by setting m = N, but then computational complexity may become an issue.

Figure 4.4: Two input signals that correspond to the amplitudes in Figure 4.3 for the LMI solution. The solid line corresponds to a random selection of phases. The dotted line corresponds to Ũ_k = |Ũ_k|.

The required input energy typically decays exponentially with m, which is illustrated in Figure 4.5. Therefore, m can in most cases be chosen less than N.

Figure 4.5: Required energy of the input to identify an FIR system of order 40 as a function of the number of excited frequencies m. The frequencies are equidistantly distributed. The energy levels are normalized by the level obtained for m = 40.

4.4.4 Closed-form Solution

An interesting question is when there exists a solution containing n frequencies, where n_true ≤ n < N, on the frequency grid Ω, where n_true is the order of the true system and n is the order of the model. To partly answer this question, consider a frequency function b(ω) defined by

    b(ω) = (λ_◦/N) Σ_{k=1}^{n} v_k(ω)/|B_k|²,    (4.43)

where v_k is defined by (4.23) and where ω_k ≠ ω_l for k ≠ l. Assume that (4.43) is used as the frequency-by-frequency bound in the design problem (4.33)–(4.34); then the following theorem applies.

Theorem 4.2 Assume that the conditions in Lemma 4.1 hold. Further assume that F(q) = 1 and that u(t) satisfies u(t + N) = u(t) for t = −(n − 1), . . . , −1, such that this signal can be written as (4.25).
When the variance bound b(ω) is given by (4.43), the solution to the design problem (4.33)–(4.34) is given by |U_k|² = |B_k|².

Proof: The variance bound (4.43) can, according to Corollary 4.1, be written

    b(ω) = (λ_◦/N) Γ^*(e^{jω}) T^{−1}(r_b(τ)) Γ(e^{jω}),    (4.44)

where r_b(τ) is given by (4.9). Inserting (4.44) into (4.34) gives the condition T(r_{zz}(τ)) − T(r_b(τ)) ≥ 0. The covariance function r_{zz}(τ) is defined by (4.9) with m = n and |Z̃_k|² = |U_k|² = α_k. Since the objective is to minimize Σ_{k=0}^{N−1} α_k = r_{zz}(0), the solution is given by T(r_{zz}(τ)) = T(r_b(τ)). Furthermore, since ω_k ≠ ω_l for k ≠ l in the bound (4.43), it follows that there is a one-to-one relation between the variables r_b(τ), τ = 0, . . . , n − 1, and the coefficients |B_k|², k = 0, . . . , n − 1. Since r_{zz}(τ), τ = 0, . . . , n − 1, and the DFT coefficients |U_k|², k = 0, . . . , n − 1, are related in the same one-to-one way, see (4.9), we have that |U_k|² = |B_k|².

Theorem 4.2 shows that for the bound defined by (4.43) there exists an optimal solution comprising n different frequencies. These frequencies coincide with the frequencies ω_k that define the bound (4.43).

4.4.5 Input Design Based on Over-parametrized Models

Suppose that the input design is based on the model order n, but that the true order is n_true < n. Then the input design will be conservative if the number of estimated parameters n̄ is less than n, provided n̄ ≥ n_true. Let Ĝ_N(e^{jω}, n) denote the frequency response estimate when n parameters have been estimated.

Theorem 4.3 Suppose that z(t), t = −(n − 1), . . . , N − 1, is deterministic and defined by (4.8), where only n spectral lines are non-zero. Suppose further that the variance of Ĝ_N(e^{jω}, n) obeys Var Ĝ_N(e^{jω}, n) ≤ b(ω). Then, when n̄ ≤ n parameters are estimated, it holds that Var Ĝ_N(e^{jω}, n̄) ≤ b(ω), provided n̄ ≥ n_true, where n_true is the order of the true system.
Proof: Lemma 4.1 states that

    Var Ĝ_N(e^{jω}, n) = (λ_◦/N) |F(e^{jω})|² Γ^*(e^{jω}) T^{−1}(r_{zz}(τ)) Γ(e^{jω}).

Introduce b̄(ω) = N b(ω)/(λ_◦ |F(e^{jω})|²). Then, by partitioning Γ and T, we have the following equivalence:

    Var Ĝ_N(e^{jω}, n) ≤ b(ω)
    ⇔ [Γ_1(e^{jω}); Γ_2(e^{jω})]^* [T_11  T_12; T_12^*  T_22]^{−1} [Γ_1(e^{jω}); Γ_2(e^{jω})] ≤ b̄(ω).    (4.45)

By Schur complements, (4.45) is equivalent to

    [ b̄(ω)   Γ_1(e^{jω})^*   Γ_2(e^{jω})^* ;  Γ_1(e^{jω})   T_11   T_12 ;  Γ_2(e^{jω})   T_12^*   T_22 ] ≥ 0
    ⇒ [ b̄(ω)   Γ_1(e^{jω})^* ;  Γ_1(e^{jω})   T_11 ] ≥ 0
    ⇔ Γ_1(e^{jω})^* T_11^{−1} Γ_1(e^{jω}) ≤ b̄(ω).

The last line corresponds, according to Lemma 4.1, to Var Ĝ_N(e^{jω}, n̄) ≤ b(ω) when Γ_1(q) = [q^{−1}, . . . , q^{−n̄}]^T and n̄ ≥ n_true. Hence Var Ĝ_N(e^{jω}, n̄) ≤ b(ω) whenever Var Ĝ_N(e^{jω}, n) ≤ b(ω). Theorem 4.3 is illustrated in Figure 4.6.

4.5 Conclusions

In this chapter we have considered input design for system identification problems with finite sample size. The objective of the input design has been to minimize the root mean square of the input sequence subject to a frequency-wise constraint on the variance of the frequency function estimate. Two solutions, which perform the optimization over the squared DFT coefficients of the input, have been presented. The first solution is based on an explicit variance expression. This leads to a convex geometric programming problem, whose main restriction is that the number of non-zero DFT coefficients must equal the number of estimated parameters. The second solution is based on the covariance matrix of the parameter estimates. This solution allows for an arbitrary number of DFT coefficients in the optimization. Furthermore, this method can handle frequency-by-frequency constraints, in contrast to the geometric programming solution, which has to discretize the frequency axis.
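The ordering exploited in the proof of Theorem 4.3 can be illustrated numerically: for any positive definite matrix, the quadratic form built from a leading sub-block never exceeds the full one. A toy sketch (random matrix in place of the Toeplitz structure, an illustrative assumption):

```python
import numpy as np

# Check: Gamma1* T11^{-1} Gamma1 <= Gamma* T^{-1} Gamma for T > 0,
# where T11 is the leading block and Gamma1 the matching part of Gamma.
# This is the Schur-complement argument of Theorem 4.3 in miniature.
rng = np.random.default_rng(3)
n, nbar = 6, 3
A = rng.normal(size=(n, n))
T = A @ A.T + n * np.eye(n)           # positive definite toy matrix

for w in np.linspace(0.1, 3.0, 8):
    Gamma = np.exp(-1j * w * np.arange(1, n + 1))
    full = np.real(Gamma.conj() @ np.linalg.solve(T, Gamma))
    sub = np.real(Gamma[:nbar].conj() @
                  np.linalg.solve(T[:nbar, :nbar], Gamma[:nbar]))
    assert sub <= full + 1e-12
print("reduced-order variance never exceeds full-order variance")
```

In words: discarding estimated parameters can only shrink the variance of the frequency function estimate, which is why a design made for the larger model order remains feasible (if conservative) for the smaller one.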
Figure 4.6: Variance Var Ĝ_N(e^{jω}) for a design based on the LMI solution in Section 4.4.2, where the design is based on n = 40 but where only 20 parameters have been estimated. Solid line: Sample variance. Dashed line: Theoretical expression (4.22). Dotted line: The variance bound b(ω).

Chapter 5

Applications

The objective of this chapter is to quantify the benefits of optimal input design, compared to the use of standard identification input signals such as PRBS signals, for some common and important application areas of system identification. Two benchmark problems, taken from process control and control of flexible mechanical structures, are considered. We present results both when the design is based on knowledge of the true system (in general the optimal design depends on the system itself) and for a practical two-step procedure where an initial model estimate is used in the design instead of the true system. The results show that there is a substantial reduction in experiment time and input excitation level. A discussion on the sensitivity of the optimal input design to model estimates is provided.

This chapter is organized as follows. Section 5.1 gives an introduction together with some technical assumptions, including the optimal input design setup. The optimal input design is applied to a process plant in Section 5.2 and to a resonant system in Section 5.3. The chapter is concluded in Section 5.4.

5.1 Introduction

Many industrial processes have (very) slow responses, which leads to long and expensive identification experiments (Ogunnaike, 1996). It is thus important to design the experiments carefully so as to maximize the information content. Another area where input design can be crucial is the identification of flexible mechanical structures.
Here, time is not crucial, but the experiments are usually severely constrained in order not to damage equipment. Input design has a long history, and (Zhu, 2001; Rivera et al., 2003; Lee, 2003; Jacobsen, 1994) are some recent contributions related to process control. Typical design problems correspond to non-convex programs and, hence, computational aspects have limited the applicability of optimal input design. One purpose of this chapter is to examine more closely what the framework of Chapter 3 has to offer for the aforementioned application areas. More precisely, the objective is twofold:

I. The first aspect is to quantify the possible benefits of optimal input design for the two applications. The use of input signals with optimal frequency distribution will be compared to the use of standard identification input signals, e.g. PRBS signals. The benefits will be quantified in terms of saved experiment time and in possible reduction of input excitation. Since process modelling may be very time consuming, we will illustrate possible time savings by using an optimal strategy for the considered process model. Here the time it takes to obtain a certain model quality is measured and compared for different inputs when the input energy is held constant. For the mechanical system, the experiment time is in many cases not an issue. Instead, we will study possible savings in the level of input excitation with an optimal strategy.

II. The second aspect is to highlight some robustness issues regarding the input design. Optimal input designs in general depend on the unknown true system. This is typically circumvented by replacing the true system with some estimate in the design procedure. But there exist very few hard results on the sensitivity and robustness of these optimal designs with respect to uncertainties in the model estimate used in the design.
Here we will illustrate situations where input designs are very sensitive to model errors in the design. Furthermore, we will redo the comparison in (I), but in a more realistic setting where the optimal design philosophy is replaced by a two-step procedure. The approach taken is inspired by the work in (Lindqvist and Hjalmarsson, 2001) and (Lindqvist, 2001). In the first step an initial model is estimated based on a PRBS input. An input design based on this estimate is then applied to the process. This adaptive approach is compared to an approach which only uses PRBS as input. The comparison is based on Monte Carlo simulations in order to study the average gain in performance. Usually the comparison between different input signals is made in terms of confidence bounds, see e.g. (Gevers and Ljung, 1986), (Forssell and Ljung, 2000) and (Shirt et al., 1994). For industrial applications, however, perhaps more relevant measures are excitation level and experiment time, treated e.g. in (Bombois et al., 2004d) and (Rivera et al., 2003).

We will assume that the systems obey the discrete-time linear relation

    y(t) = G_o(q) u(t) + H_o(q) e(t),    (5.1)

where G_o and H_o represent the system and the noise dynamics, respectively, with q being the delay operator. Furthermore, y is the output, u is the input and e is white noise. For the modelling of (5.1) we will consider identification within the prediction error framework as described in Chapter 2. We will assume that we have a full-order model structure and hence only variance errors occur. Therefore, for large data lengths, the model error can be characterized by some function of the parameter covariance P. To define a proper quality function one has to take the intended use of the model into account. Here we will consider control design, and we will adopt the following quality measure:

    |∆(θ)| ≤ γ  ∀ω, ∀θ ∈ U_θ,    (5.2)
    U_θ = {θ : N(θ − θ_o)^T P^{−1}(Φ_u)(θ − θ_o) ≤ χ},    (5.3)

where

    ∆(θ) = T (G_o − G(θ))/G(θ).    (5.4)
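To make the quality measure concrete, the following toy sketch (everything here is an illustrative assumption: a two-parameter FIR system, an invented covariance P, and the weighting T = 1) samples the boundary of the confidence ellipsoid (5.3) and evaluates the worst case of |∆(θ)| over a frequency grid by brute force:

```python
import numpy as np

# Brute-force evaluation of the worst-case relative model error (5.4)
# over the confidence ellipsoid (5.3). Toy two-parameter FIR system.
theta_o = np.array([1.0, 0.5])
P = np.array([[0.02, 0.005], [0.005, 0.01]])
N, chi = 500, 5.99                      # 95% level for n = 2 parameters

L = np.linalg.cholesky(P * chi / N)     # maps the unit circle to the ellipsoid
w = np.linspace(0.05, np.pi, 60)

def G(th):
    return th[0] * np.exp(-1j * w) + th[1] * np.exp(-2j * w)

worst = 0.0
for phi in np.linspace(0, 2 * np.pi, 200):
    th = theta_o + L @ np.array([np.cos(phi), np.sin(phi)])
    delta = np.max(np.abs((G(theta_o) - G(th)) / G(th)))  # T = 1 here
    worst = max(worst, delta)
print(worst)   # worst-case relative error over the ellipsoid boundary
```

The convex machinery of Chapter 3 replaces this sampling, but the sketch shows concretely what the constraint (5.2) demands: the worst member of the uncertainty set must still satisfy the error bound.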
G(θ) (5.4) The constraint (5.2) implies that ∆∞ ≤ γ for all models in the conﬁdence region (5.3). Based on the quality measure (5.2), we will pose the input design problem as minimize α subject to |∆(θ)| ≤ γ ∀ ω N (θ − θo )T P −1 (Φu )(θ − θo ) ≤ χ π 1 2π −π Φu (ω)dω ≤ α 0 ≤ Φu (ω) ≤ β(ω) Φu (5.5) 118 5 Applications The objective of this input design problem is to ﬁnd the input spectrum with the least energy that satisﬁes (5.2). There may also exist a frequency by frequency constraint on the input spectrum here represented by β(ω). The input design problem is a non-convex and non-trivial optimization problem but by applying the theory in Chapter 3 it is possible to obtain a ﬁnite-dimensional convex program. To obtain the convex program we will mainly work with the parametrization Φu = M −1 ck cos(kω) (5.6) k=0 This parametrization is rather ﬂexible. For example, both power and frequency by frequency constraints on the input spectrum can be handled, which will be shown in the examples. Furthermore, any spectrum can be approximated to any demanded accuracy provided that the order M is suﬃciently large. However, when M becomes too large, computational complexity becomes an issue. This parametrization was originally introduced in (Lindqvist and Hjalmarsson, 2001). Notice that (5.6) corresponds to white noise when M = 1. The parameters in (5.5) are speciﬁed such that |∆| has to be less than 0.1 for at least 95% of the estimated models. Hence γ = 0.1 and the size of the conﬁdence region χ is determined such that P r(χ2 (n) ≤ χ) = 0.95 where n denotes the number of parameters in the model1 . For example, χ = 9.49 when n = 4. The frequency weighting T in (3.19) will be a discretization of T̃ or T̃ 2 where T̃ (s) = ω02 (s2 + 2ξω0 s + ω02 ) (5.7) with the damping ξ = 0.7 and where ω0 will be used to change the bandwidth of T̃ . 
¹χ²(n) denotes the χ²-distribution with n degrees of freedom.

5.2 A Process Control Application

The main objective of this section is to give a flavor of the usefulness of optimal input design for identification of models for process control design. The process plant is defined by (5.1) with the ARX structure

G_o(q) = B(q)/A(q),  H_o(q) = 1/A(q)  (5.8)

where

A(q) = 1 − 1.511q^{−1} + 0.5488q^{−2}
B(q) = 0.02059q^{−1} + 0.01686q^{−2}.

The sampling time is 10 seconds and e has variance 0.01. This is a slight modification of a typical process control application considered in (Skogestad, 2003). The process has a rise time of 227 seconds, and consequently, collecting data samples for the identification takes a long time. Therefore the objective of using optimal input design for this plant is to keep the experiment time to a minimum.

5.2.1 Optimal Design Compared to White Input Signals

Here we will compare the experiment time required for an optimal input design to achieve a certain quality constraint with the corresponding time required for a white noise input. The optimal input design will be based on knowledge of the true system. In reality, of course, this is not a feasible solution since the true system is unknown. However, the motivation for this analysis is to investigate what could, in the best case, be achieved with optimal input design. The optimal design is based on a data length of N_opt = 500. Furthermore, T is a discretization of T̃² in (5.7) and there is no frequency-by-frequency bound β(ω) on Φ_u. In the comparison we have normalized the power of the white input to be equal to the power of the obtained optimal input. The data length of the white input has then been increased until 95% of the obtained models satisfy |Δ| ≤ 0.1. This data length is denoted N_white, and the ratio N_white/N_opt is plotted in Figure 5.1 for different bandwidths of T.
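As an aside (not from the thesis), the identification step underlying these Monte-Carlo comparisons is easy to reproduce: simulate the ARX plant above and estimate its parameters by linear least squares. The white input and the data length N below are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)

# True plant (5.8): y(t) = -a1*y(t-1) - a2*y(t-2) + b1*u(t-1) + b2*u(t-2) + e(t)
a1, a2 = -1.511, 0.5488
b1, b2 = 0.02059, 0.01686

N = 20000
u = rng.standard_normal(N)                  # white input (illustrative)
e = np.sqrt(0.01) * rng.standard_normal(N)  # noise variance 0.01 as in the text
y = np.zeros(N)
for t in range(2, N):
    y[t] = -a1 * y[t - 1] - a2 * y[t - 2] + b1 * u[t - 1] + b2 * u[t - 2] + e[t]

# ARX least squares: regressors [-y(t-1), -y(t-2), u(t-1), u(t-2)]
X = np.column_stack([-y[1:-1], -y[:-2], u[1:-1], u[:-2]])
theta = np.linalg.lstsq(X, y[2:], rcond=None)[0]  # estimates of [a1, a2, b1, b2]
```

With a full-order ARX structure and white equation noise, least squares coincides with the prediction error estimate, so theta approaches (a1, a2, b1, b2) as N grows.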
This comparison shows that the white input requires about 10 times more data to satisfy the quality constraint, which is a quite substantial amount of data. In other words, the optimal experiment takes less than one and a half hours, compared to almost 14 hours with a white input. The input spectra corresponding to a high bandwidth (0.0324 rad/s) of T are shown in Figure 5.2. The optimal spectrum has a very clear low-pass character. Since the quality measure penalizes the relative model error harder in the low frequency range, this agrees well with common intuition: excite the frequencies where you want small errors.

Figure 5.1: The ratio N_white/N_opt as a function of the bandwidth of T for the process plant.

Instead of the white input, a signal with most of its energy in the frequency band of the weighting function T would perhaps have been a more appropriate choice in this example. One example is depicted in Figure 5.2, which corresponds to the spectrum of a 3rd order Butterworth filter whose cutoff frequency equals the bandwidth of T. This design approximately achieves the specified quality constraint, and its energy is 1.7 times larger than the optimal. Even though this is larger than the optimal, it is still significantly less than the energy of the white input.

5.2.2 Optimal Input Design in Practice

To handle the more realistic situation where the true system is unknown, we will replace the optimal design strategy by a two-step procedure. In the first step an initial model is estimated based on a PRBS input². This model estimate is used as a replacement for the true system in the design problem (5.5). The obtained sub-optimal solution is then applied to the process in the second step. This adaptive approach is compared to an approach which only uses PRBS as input. The two strategies are illustrated in Figure 5.3.
The main objective is to investigate whether there are any benefits of using a sub-optimal design approach or not.

²PRBS is a periodic, deterministic signal with white-noise-like properties.

Figure 5.2: The process plant, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum. Dotted line: low-pass approximation of optimal spectrum.

The comparison is based on Monte-Carlo simulations in order to study the average gain in performance. First consider the two-step adaptive input design approach. We use a PRBS with length N_init = 300 and amplitude 3 to estimate an initial model Gm of the true system. This model is used for input design based on (5.5), with no frequency-by-frequency upper bound on the input spectrum and experiment length N_opt = 500. This strategy is compared to the approach where a single set of PRBS is used in each Monte-Carlo run. For the comparison's sake, the amplitude of the PRBS is tuned so that the signal has the same input power as the average power of the input in the two-step approach. After 1000 Monte-Carlo simulations with different noise realizations, 98.3% of the models from the two-step procedure satisfy ‖Δ‖_∞ ≤ 0.1. With an experiment length of N = 3600, 96.4% of the models satisfy the constraint for the PRBS approach.

Figure 5.3: Two strategies for input design. (a) The ad hoc solution. A PRBS is used as input signal for the entire identification experiment. (b) Adaptive input design. The input data set is split in two parts. The first part, a PRBS, is used for identification of an initial model estimate. The second part is an input optimally designed with respect to the initial model.
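PRBS signals such as the ones used above are typically generated by a linear feedback shift register. A minimal sketch (not from the thesis; the 9-bit generator polynomial x⁹ + x⁵ + 1, period 511, is one standard choice and not necessarily the one used here):

```python
import numpy as np

def prbs9(n, seed=0b111111111):
    """Length-n +/-1 PRBS from the LFSR with polynomial x^9 + x^5 + 1.
    The sequence is periodic with period 2^9 - 1 = 511."""
    state = seed & 0x1FF                  # 9-bit state, must be nonzero
    out = np.empty(n)
    for i in range(n):
        out[i] = 2.0 * (state & 1) - 1.0  # map bit {0,1} -> {-1,+1}
        fb = (state ^ (state >> 4)) & 1   # feedback from taps 9 and 5
        state = (state >> 1) | (fb << 8)
    return out

u_init = 3 * prbs9(300)  # e.g. an amplitude-3, length-300 initial input
```

Scaling the ±1 sequence sets the amplitude; since the signal is binary, the power equals the squared amplitude.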
One realization of the input sequences for both strategies is plotted versus time in hours in Figure 5.4. We clearly see that the experiment time when input design is involved is less than 2 hours and 15 minutes, but more than 10 hours for the PRBS input. We conclude that for the considered quality constraint, the experiment time can be shortened substantially even when the sub-optimal design is used.

5.3 A Mechanical System Application

In this section, input design is applied to a resonant mechanical system. The system is represented by a slightly modified version of the half-load flexible transmission system proposed in (Landau et al., 1995b) as a benchmark problem for control design. It has been used for input design illustrations in (Bombois et al., 2004b). The system is defined by the ARX structure (5.8) with

A(q) = 1 − 1.99185q^{−1} + 2.20265q^{−2} − 1.84083q^{−3} + 0.89413q^{−4}
B(q) = 0.10276q^{−3} + 0.18123q^{−4}.

Furthermore, e is white noise with variance 0.01. The sampling time is Ts = 0.05 seconds.

Figure 5.4: The process plant. Above: the input sequence not involving optimal input design. Below: the input sequence when involving optimal input design. The first part of the signal is used to identify an initial model estimate.

The experiment time is not an issue for this system. Therefore, the objective of the design is to obtain an input that, for a given data length, has as low an excitation level as possible.

5.3.1 Optimal Design Compared to White Input Signals

The optimal design will be based on the true system, as in the example in Section 5.2.1. Here T is a discretization of T̃, and the data length is 500 for the optimal design as well as for the white input. The reason is that we will compare excitation levels rather than experiment times. First, consider the optimal input design when there is no upper bound imposed on the input spectrum.
The input power α_opt for the optimal input is plotted versus the bandwidth of T in Figure 5.5. It is clear that the input excitation increases with increasing bandwidth of T. This has to do with the definition of Δ. When the bandwidth increases, the relative error around the first resonance peak starts to dominate ‖Δ‖_∞ and more input power has to be injected. The optimal spectra for low and high bandwidth, respectively, are plotted in Figure 5.6 and Figure 5.7. Here we see that the input power is concentrated around the first resonance peak for high bandwidths. Let α_white be the required input power for a white input to achieve the specified model quality. The ratio α_opt/α_white is plotted versus the bandwidth of T in Figure 5.5. We can conclude that, when the total power is compared, there are certainly benefits obtained using an optimal strategy compared to the white input, especially for high bandwidths where the white input requires about ten times the energy required for the optimal design. This is due to the large impact of the first resonance peak and the capability of the optimal design to distribute much energy around this peak and less at other frequencies. However, in practice it may be unacceptable to concentrate the input power around the resonance peak. The presented optimal input design method can handle frequency-wise conditions on the input spectrum, see β(ω) in (5.5). This possibility is now used. A frequency-wise upper bound on the input spectrum is included in the design problem (5.5) that restricts the possibility to inject energy around the first resonance peak. The upper bound and the obtained spectrum are shown in Figure 5.8. This is a good illustration of the impact of the first resonance peak on the required input power. With this restriction we need about 14 times more energy than without the bound.
So there is a delicate trade-off between demanding a small relative model error around the resonance peak, the possibility to excite this frequency band, and the required total input power. It is worth noticing that the suggested framework allows the designer to control this trade-off.

5.3.2 Input Design in a Practical Situation

In this section, the parameters of the true system are assumed to be unknown. As for the process application, we will handle this situation by replacing the optimal design procedure by an adaptive two-step approach. For the mechanical system, the two-step adaptive approach goes as follows. A PRBS with length N_init = 300 and amplitude 0.5 is used for the estimation of the initial model Gm. An input is designed based on Gm using (5.5) with N_opt = 500. This strategy is compared with the approach using a single set of PRBS data of length 800, i.e. the data lengths are equalized. The signal power of the PRBS is set to 6 times that of the two-stage sub-optimal input in each Monte-Carlo run. One realization of the input sequences for both strategies is plotted versus time in Figure 5.9.

Figure 5.5: The mechanical system. The input power α_opt and the ratio α_opt/α_white as functions of the bandwidth of T.

For 1000 Monte-Carlo simulations, 92.3% of the obtained models from the two-step procedure passed the constraint ‖Δ‖_∞ ≤ 0.1. The corresponding figure for the PRBS approach was 91.8%. Thus the input excitation for the PRBS approach needs to be about 6 times the optimal input power to produce equally good models. We conclude that for the given quality constraint, the excitation level of the input signal can be reduced significantly using the illustrated sub-optimal input design.

5.3.3 Sensitivity of the Optimal Design

It was illustrated in the previous section that the input design performed well even when the true system was replaced by an estimated model.
However, some caution must be taken. In this section we will illustrate, by means of an example, that the optimal input design method may be sensitive to the quality of the initial model. Let us again consider the mechanical system. Assume that the true system is unknown, but that we have a preliminary model Gm that deviates from the true system Go. The magnitude plots of Gm and Go are shown in Figure 5.10. Now assume that the true system is replaced with Gm in the design problem (5.5). The resulting sub-optimal input spectrum is plotted in Figure 5.10 together with the optimal input spectrum based on Go. The bandwidth of T is 8 rad/s.

Figure 5.6: The mechanical system, low bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum.

We see from Figure 5.10 that the input power is distributed differently for the sub-optimal design compared to the optimal one. However, the energy is in both cases concentrated around the first resonance peak. This is completely in line with the observations in Section 5.3.1, where it was recognized that it is effective to inject energy around the first resonance peak for high bandwidths of T. However, an input design that concentrates the power around a resonance peak may be very vulnerable with respect to bad model estimates of this peak. The model Gm is one such example. For example, out of 100 Monte-Carlo identification experiments, only 23 of the obtained models with the sub-optimal design achieve the quality constraint ‖Δ‖_∞ < 0.1. Optimally it should be at least 95%.

Figure 5.7: The mechanical system, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: white input spectrum.

We conclude that it is important that the initial experiment captures the resonance peaks of importance. The reason why the sub-optimal method in Section 5.3.2 performed well is probably that the initial experiment with the PRBS signal excited these peaks, yielding proper initial model estimates. A way to make the solution less sensitive to this peak is to restrict the flexibility of the input spectrum, i.e. reduce M. Another solution is to include constraints on the input spectrum in terms of frequency-wise bounds or a frequency weighting on the input power, see (3.57). See also the related discussion in Section 3.10.

5.4 Conclusions

In this chapter we have illustrated and quantified possible benefits of optimal input design in identification for two applications. We have compared optimally designed input signals with white input signals. The results show significant benefits of appropriate input designs: either the experiment time can be shortened or the input power can be reduced. Through Monte-Carlo simulations it is illustrated that there are advantages also in the case where the true system is replaced by a model estimate in the design.

Figure 5.8: The mechanical system, high bandwidth of T. Thick solid line: optimal input spectrum. Dashed line: transfer function T. Dash-dotted line: open-loop system. Thin solid line: upper bound on Φ_u.

Here we have only touched upon adaptive input design. For practical implementations of optimal input design, adaptation is typically required. There are, though, only a few contributions related to adaptive input design, see e.g. (Hjalmarsson et al., 1996), (Lindqvist and Hjalmarsson, 2001) and (Lindqvist, 2001). More contributions are certainly welcome.
Another important issue is the sensitivity of the input design with respect to the underlying system properties, since the optimal design typically depends on those. Such knowledge can be used to make the input more robust with respect to uncertainties of the underlying system. For this analysis the variance expressions (2.25) may provide some insights.

Figure 5.9: The mechanical system. Above: the input sequence not involving optimal input design. Below: the input sequence when involving optimal input design. The first part of the signal is used to identify an initial model estimate. The second part is the optimal input signal.

Figure 5.10: The mechanical system. Thick solid line: the optimal input spectrum. Thin solid line: the sub-optimal input spectrum (see text). Dashed line: weight function T. Dotted line: the model. Dash-dotted line: the true system.

Chapter 6

Input Design for Identification of Zeros

In this chapter we will consider input design for accurate identification of non-minimum phase zeros in linear systems. Recently, several variance results regarding estimation of non-minimum phase zeros have been presented, see (Lindqvist, 2001), (Hjalmarsson and Lindqvist, 2002) and (Mårtensson and Hjalmarsson, 2003). Based on these results, we will show how to design the input that has the least energy content required to keep the variance of an estimated zero below a certain limit. Both analytical and numerical results are presented. A striking fact of the analytical results is that the variance of an estimated zero is independent of the model order when the optimal input is applied. We will also quantify the benefits of using the optimal design compared to using a white input signal or a square-wave. Robustness issues will also be covered in this chapter.
The optimal design depends on the location of the true, unknown zero and is therefore infeasible. This is typically circumvented by replacing the true zero by an estimate. The sensitivity of the solution to this estimate is investigated.

The chapter is organized as follows. Section 6.2 contains information about the system assumptions and the identification framework used. Asymptotic variance expressions for an estimated zero are also given. Based on these variance expressions, the optimal input design problem is formulated, and both analytical and numerical solutions to this problem are presented in Section 6.3 and Section 6.4. Sensitivity and benefits of optimal input design for identification of zeros are discussed and illustrated in Section 6.5. Section 6.6 studies the connections between models of restricted complexity and variance optimal input design. The chapter is concluded in Section 6.7.

6.1 Introduction

A model is often used in control design for both analysis and synthesis purposes. Consequently, system identification with focus on control design has been a research area with a lot of activity. The overall objective of identification for control is to deliver models suitable for control design. See (Gevers, 1993), (Van den Hof and Schrama, 1995) and (Hjalmarsson, 2004) for overviews of the area. For scalar linear systems, the model should be accurate in the frequency bands important for the control design, and it is generally acknowledged that the region around the cross-over frequency of the loop gain is of particular importance. Since the loop gain depends on the controller yet to be designed, the cross-over frequency is in general unknown. However, for systems that contain performance limitations, e.g. non-minimum phase zeros and time-delays, the achievable bandwidth is restricted.
For example, a single real continuous-time non-minimum phase zero at z restricts the achievable bandwidth to approximately z/2 (Skogestad and Postlethwaite, 1996). Therefore, knowledge of a non-minimum phase zero is very useful, since it gives valuable information about which control specifications can be defined and a hint on a reasonable choice of excitation. Furthermore, it is also valuable in the identification step, since it specifies an important frequency range and thereby simplifies the task of deciding on model structure, model order, noise model and pre-filters. Spurred by this observation, expressions for the variance of an estimated non-minimum phase zero have been derived in (Lindqvist, 2001) for FIR models and in (Hjalmarsson and Lindqvist, 2002) for ARX models. This work is generalized to general linear single-input/single-output (SISO) model structures in (Mårtensson and Hjalmarsson, 2003). A key result in these contributions is that estimated non-minimum phase zeros are not subject to the usual increase in variance when the model order is increased. Based on these variance results, we will in this contribution consider input design for accurate identification of non-minimum phase zeros. The input design problem is formulated as an optimization problem where the objective is to minimize the input effort required to keep the variance of the non-minimum phase zero below a certain limit.

6.2 Estimation of Parameters and Zeros

We will assume that the true system is linear and contains a non-minimum phase zero. For identification of this system we will use the prediction error framework presented in Chapter 2. For the reader's convenience we will repeat some of the basics of this framework. We will also introduce variance expressions for the estimated zeros.
6.2.1 Parameter Estimation

The model of our single-input/single-output system is defined by

y(t) = G(q, θ)u(t) + H(q, θ)e(t)  (6.1)

where G and H are parameterized by the real valued parameter vector θ. Furthermore, y is the output, u is the input and e is zero mean white noise with variance λ. It is assumed that G and H have the rational forms

G(q, θ) = q^{−nk} B(q, θ)/A(q, θ),  H(q, θ) = C(q, θ)/D(q, θ)  (6.2)

where

A(q, θ) = 1 + a_1 q^{−1} + · · · + a_{na} q^{−na}  (6.3)
B(q, θ) = b_1 + b_2 q^{−1} + · · · + b_{nb} q^{−nb+1}  (6.4)
C(q, θ) = 1 + c_1 q^{−1} + · · · + c_{nc} q^{−nc}  (6.5)
D(q, θ) = 1 + d_1 q^{−1} + · · · + d_{nd} q^{−nd}  (6.6)

with q being the delay operator. We will assume that there exists a description of the true system within the model class, defined by θ = θ_o and λ = λ_o. The one-step-ahead predictor for the model (6.1) is

ŷ(t, θ) = H^{−1}(q, θ)G(q, θ)u(t) + (1 − H^{−1}(q, θ))y(t)  (6.7)

and the prediction error is

ε(t, θ) = H^{−1}(q, θ)(y(t) − G(q, θ)u(t)).  (6.8)

The parameters are estimated with the prediction error method, using a least squares criterion on the prediction error. The parameter estimate is

θ̂_N = arg min_θ (1/2N) Σ_{t=1}^{N} ε²(t, θ)  (6.9)

where N denotes the number of data used for the estimation. Under mild assumptions the parameter estimate has an asymptotic distribution (Ljung, 1999) that obeys

√N(θ̂_N − θ_o) → N(0, P) as N → ∞
lim_{N→∞} N E(θ̂_N − θ_o)(θ̂_N − θ_o)^T = P
P(θ_o) = λ_o (E[ψ(t, θ_o)ψ^T(t, θ_o)])^{−1}
ψ(t, θ_o) = ∂ŷ(t, θ)/∂θ |_{θ=θ_o}  (6.10)

Here N denotes the normal distribution. Using (6.7) we obtain

ψ(t, θ_o) = F_u(q, θ_o)u(t) + F_e(q, θ_o)e_o(t)  (6.11)

where

F_u(q, θ) = (1/H(q, θ)) ∂G(q, θ)/∂θ  (6.12)

and

F_e(q, θ) = (1/H(q, θ)) ∂H(q, θ)/∂θ.  (6.13)

Under the assumption of open-loop operation, i.e. that u and e are uncorrelated, we can write

P^{−1}(θ_o) = (1/2π) ∫_{−π}^{π} F_u(θ_o) Φ_u F_u^*(θ_o) dω + R_o(θ_o)  (6.14)

where Φ_u is the spectrum of the input and where

R_o(θ_o) = (λ_o/2π) ∫_{−π}^{π} F_e(θ_o) F_e^*(θ_o) dω.  (6.15)

The connection between the asymptotic covariance and the input spectrum in (6.14) will be further exploited for input design for identification of zeros. But first we will review some results regarding the accuracy of identified zeros.

6.2.2 Estimation of Zeros

Consider identification of a system defined by (6.1) and (6.2). Let θ_b^T = [b_1, ..., b_{nb}] and introduce the polynomial

p(z, θ_b) = b_1 z^{nb−1} + b_2 z^{nb−2} + · · · + b_{nb}.  (6.16)

A zero z_i(θ) of the system (6.1) is defined by p(z_i(θ), θ_b) = 0. Now we consider one particular zero, z(θ), which is assumed to have multiplicity one. Introduce the notation

ẑ = z(θ̂_N),  z_o = z(θ_o),  B(q, θ) = (1 − z(θ)q^{−1}) B̃(q, θ),  F̃(θ) = F(z(θ), θ),

where F̃ corresponds to a general transfer function F evaluated at q = z(θ). Further, introduce the vector

Γ_b(q) = [1  q^{−1}  · · ·  q^{−nb+1}]^T.  (6.17)

In (Lindqvist, 2001) it is established that the variance of an estimated zero is

lim_{N→∞} N E(ẑ − z_o)² = (λ_o |z_o|² / |B̃(z_o)|²) Γ_b^*(z_o) P_b Γ_b(z_o)  (6.18)

where P_b is the asymptotic covariance matrix of θ_b. If we consider non-minimum phase zeros and increase the model order, we can simplify the expression (6.18). Let u(t) = Q(q)v(t), where v(t) is a white noise sequence with variance 1 and Q(q) is a minimum phase filter. Then, according to (Mårtensson and Hjalmarsson, 2003), we have

lim_{nb→∞} lim_{N→∞} N E(ẑ − z_o)² = λ_o |z_o|² |H(z_o)|² |A(z_o)|² / ((1 − |z_o|^{−2}) |B̃(z_o)|² |Q(z_o)|²)  (6.19)

for general linear SISO systems.

6.3 Input Design - Analytical Results

In this section we will use the variance expressions (6.18) and (6.19) in order to determine suitable inputs for accurate identification of zeros. The input design will be formulated as an optimization problem where we seek the input spectrum with least energy that keeps the variance below a certain limit.
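For a concrete feel for (6.14), consider an FIR model with H = 1: then F_u = Γ_b, and P^{−1} is the Toeplitz matrix of the input autocorrelations r_k = (1/2π)∫Φ_u e^{jωk} dω. A numerical sketch (not from the thesis; the spectrum Φ_u(ω) = 1 + cos ω is an arbitrary example, giving r_0 = 1, r_1 = 0.5, r_k = 0 otherwise):

```python
import numpy as np

nb = 3
w = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
Phi_u = 1.0 + np.cos(w)

# Gamma_b(e^{jw}) stacked over the grid: row k holds e^{-jwk}
Gb = np.exp(-1j * np.outer(np.arange(nb), w))

# (1/2pi) Int Gamma_b Phi_u Gamma_b^* dw via the rectangle rule over one period
Pinv = (Gb[:, None, :] * Phi_u * Gb[None, :, :].conj()).mean(axis=-1)

# Expected result: the Toeplitz matrix of r = [1, 0.5, 0]
r = np.array([1.0, 0.5, 0.0])
Toep = np.array([[r[abs(i - j)] for j in range(nb)] for i in range(nb)])
```

Entry (m, n) of the integral is (1/2π)∫ e^{jω(n−m)}Φ_u dω = r_{|m−n|}, which is exactly the Toeplitz structure used in the proofs below.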
This can be stated as follows:

minimize_{Φ_u}  (1/2π) ∫_{−π}^{π} Φ_u(ω) dω
subject to  Var ẑ ≤ γ.  (6.20)

The choice of optimization variable is natural because, asymptotically in N, the only quantity that can be used to shape the variance is the spectrum of the input, cf. (6.14), (6.18) and (6.19).

6.3.1 Input Design for Finite Model Orders

The first step in solving (6.20) is to rewrite the original problem formulation into a convex program with respect to the input spectrum. The objective function is already convex but the constraint is not. Let

Γ_{b0} = [Γ_b; 0]  (6.21)

and

α² = λ_o |z_o|² / (N |B̃(z_o)|²).  (6.22)

The variance constraint in (6.20), using (6.18), now becomes

γ/α² − Γ_{b0}^*(z_o) P Γ_{b0}(z_o) ≥ 0,  (6.23)

which by Corollary 3.1 is equivalent to

P^{−1} − (α²/γ) Γ_{b0} Γ_{b0}^* ≥ 0.  (6.24)

Since the inverse of the covariance matrix is affine in Φ_u, the constraint (6.24) is convex with respect to Φ_u. Thus, the convex formulation of (6.20) is

minimize_{Φ_u}  (1/2π) ∫_{−π}^{π} Φ_u(ω) dω
subject to  P^{−1} − (α²/γ) Γ_{b0} Γ_{b0}^* ≥ 0.  (6.25)

This means that if (6.25) is feasible it has a globally optimal solution. The problem (6.25) is convex but in general infinite-dimensional, which calls for special care when undertaking the optimization. But, as will be shown in Section 6.4, by imposing certain parameterizations of the input spectrum it is possible to reformulate (6.25) as a finite-dimensional convex optimization problem. Today, there exist several numerical optimization routines that solve such problems to any demanded accuracy. But first we will show that it is possible to derive analytical solutions to (6.25) for FIR and for ARX model structures.

Theorem 6.1 Consider the FIR-system

y(t) = q^{−nk} B(q, θ_b)u(t) + e(t).  (6.26)

For a non-minimum phase zero, z_o, the input design problem (6.25) is solved by filtering unit variance white noise with the first order AR-filter

Q(q) = (α/√γ) √(1 − z_o^{−2}) / (1 − z_o^{−1}q^{−1}),  (6.27)

i.e. by placing a pole in z_o^{−1}.
The minimal required input energy is α²/γ. For a minimum phase zero, z_o, the input design problem (6.25) is solved by filtering unit variance white noise with the first order AR-filter

Q(q) = (α/(√γ |z_o|^{nb−1})) √(1 − z_o²) / (1 − z_o q^{−1}),  (6.28)

i.e. by placing a pole in z_o. The minimal required input energy is α²/(γ |z_o|^{2(nb−1)}).

Proof: With the covariance function

r_k = (1/2π) ∫_{−π}^{π} Φ_u(ω) e^{jωk} dω  (6.29)

the energy of the input can be expressed as

(1/2π) ∫_{−π}^{π} Φ_u(ω) dω = r_0.  (6.30)

For notational convenience, let the variable n̄_b be defined as n̄_b = nb − 1. Then the asymptotic covariance matrix becomes

P^{−1} = [r_0 ⋯ r_{n̄b}; ⋮ ⋱ ⋮; r_{n̄b} ⋯ r_0] = R_u.  (6.31)

In this case the constraint in (6.25) becomes

[r_0 ⋯ r_{n̄b}; ⋮ ⋱ ⋮; r_{n̄b} ⋯ r_0] − (α²/γ) [1 ⋯ z_o^{−n̄b}; ⋮ ⋱ ⋮; z_o^{−n̄b} ⋯ z_o^{−2n̄b}] ≥ 0.  (6.32)

To satisfy (6.32) we need r_0 ≥ α²/γ. If we can find a covariance function r_m with r_0 = α²/γ that satisfies (6.32), we have a solution. In the following we prove that the covariance function

r_m = (α²/γ) z_o^{−m}  (6.33)

is such a solution when z_o is a non-minimum phase zero. First we note that this particular choice of r_m gives R_u ≥ 0 and that the first row and column of (6.32) are zero. Now we need to show that

[1 ⋯ z_o^{1−n̄b}; ⋮ ⋱ ⋮; z_o^{1−n̄b} ⋯ 1] − [z_o^{−1}; ⋮; z_o^{−n̄b}] [z_o^{−1} ⋯ z_o^{−n̄b}] ≥ 0.  (6.34)

Using Schur complements this is equivalent to

(α²/γ) [1 ⋯ z_o^{−n̄b}; ⋮ ⋱ ⋮; z_o^{−n̄b} ⋯ 1] = R_u ≥ 0,  (6.35)

which is true as noted before. A signal with the covariance function (6.33) can be generated by filtering unit variance white noise with the first order AR-filter (6.27). This proves the first part of Theorem 6.1.

Minimum phase zeros can be treated in the same way. We first note that (6.32) requires that

r_0 ≥ (α²/γ) z_o^{−2n̄b}.  (6.36)
The problem (6.20) is solved by letting the input signal have the covariance function

r_m = (α²/γ) z_o^{m−2n̄b},  (6.37)

which is obtained by filtering white noise with the filter

Q(q) = (α/(√γ |z_o|^{nb−1})) √(1 − z_o²) / (1 − z_o q^{−1}).  (6.38)

This concludes the proof of Theorem 6.1.

Remark: The filter (6.27) is constructed such that the variance of the estimated zero will be exactly γ. Thus, the variance constraint in (6.20) is tight. Notice that the optimal filter is independent of the model order. From this it is easy to conclude that the variance of the estimated non-minimum phase zero also will be independent of the model order¹ when optimal input design is used. This is a very interesting fact. It means that there is no loss in accuracy from using an over-parametrized model when optimal input design is applied, cf. Example 3.15.

Remark: The optimal filter for minimum phase zeros, (6.28), depends on the model order. Hence, the order independence of the variance, which holds for non-minimum phase zeros when optimal input design is applied, does not hold for minimum phase zeros.

¹The model order must be equal to or greater than the true system order.

The result in Theorem 6.1 will now be generalized to ARX model structures.

Theorem 6.2 Consider the ARX-system

y(t) = q^{−nk} (B(q, θ_b)/A(q, θ_a)) u(t) + (1/A(q, θ_a)) e(t).  (6.39)

For a non-minimum phase zero, the input design problem (6.25) has the same solution as for a FIR-system, see Theorem 6.1.

Proof: For the ARX-system (6.39) we have

F_u(q) = q^{−nk} [Γ_b(q); −(B(q)/A(q)) Γ_a(q)],  F_e(q) = [0; −(1/A(q)) Γ_a(q)].  (6.40)

Let G = B/A. Now we have that

P^{−1} = (1/2π) ∫_{−π}^{π} [Γ_b Γ_b^*  −G^* Γ_b Γ_a^*; −G Γ_a Γ_b^*  |G|² Γ_a Γ_a^*] Φ_u dω + [0 0; 0 R_0] = [R_u R_{uy}; R_{yu} R_y].  (6.41)

The constraint in (6.25) becomes

[R_u R_{uy}; R_{yu} R_y] − (α²/γ) [Γ_b(z_o)Γ_b^*(z_o) 0; 0 0] ≥ 0.  (6.42)

The upper left corner of (6.42) is the same as (6.32). In the following we prove that the solution for the FIR-case, (6.33), also satisfies (6.42).
We start by examining R_{yu} in detail:

R_{yu} = −(1/2π) ∫_{−π}^{π} Γ_a Γ_b^* G Φ_u dω
       = −(1/2π) ∫_{−π}^{π} Γ_a Γ_b^* (B̃(1 − z_o e^{−jω})/A) · (α²(1 − z_o^{−2})/γ) / |1 − z_o^{−1}e^{−jω}|² dω
       = −(1/2π) ∫_{−π}^{π} Γ_a Γ_b^* (α²(z_o² − 1) B̃ / (γA)) · 1/(1 − z_o e^{jω}) dω.  (6.43)

Let

(α²(z_o² − 1)/γ) (B̃/A) = Σ_{κ=0}^{∞} f_κ e^{−jωκ}  (6.44)

and

1/(1 − z_o e^{jω}) = −e^{−jω} z_o^{−1}/(1 − z_o^{−1} e^{−jω}) = −e^{−jω} Σ_{τ=0}^{∞} g_τ e^{−jωτ}.  (6.45)

This gives us

R_{yu}[m, n] = (1/2π) ∫_{−π}^{π} e^{jω(n−m−1−nk)} Σ_{τ=0}^{∞} g_τ e^{−jωτ} Σ_{κ=0}^{∞} f_κ e^{−jωκ} dω = 0 for m ≥ n − nk,

i.e., since nk ≥ 1, the first column of R_{yu} is zero:  (6.46)

R_{yu} = [0 ⋆ ⋯ ⋆; ⋮ ⋮ ⋱ ⋮; 0 ⋆ ⋯ ⋆].  (6.47)

Now we look at the constraint (6.42). Here we also need r_0 ≥ α²/γ. With the covariance function

r_k = (α²/γ) z_o^{−k}  (6.48)

the first row and column of (6.42) are zero, and we need

[R̄_u R̄_{uy}; R̄_{yu} R_y] − (α²/γ) [z̄ z̄^T 0; 0 0] ≥ 0,  z̄ = [z_o^{−1} ⋯ z_o^{−n̄b}]^T,  (6.49)

where the bars denote deletion of the first row and column of the corresponding blocks in (6.42). By Schur complements this is equivalent to

[R_u R_{uy}; R_{yu} R_y] ≥ 0,  (6.50)

which is true. This proves that the covariance function (6.48) solves the input design problem (6.20) for the ARX-system (6.39).

Remark: As for FIR models, the variance of the estimated zeros will be independent of the model order when we use optimal input design. Furthermore, the solution in Theorem 6.2 gives a tight bound on the variance constraint with a filter that is independent of the A-polynomial. Hence, it is easy to conclude that the variance of the zero is independent of the A-polynomial as well. However, it is important to estimate the A-polynomial for the asymptotic properties (2.12) to hold.
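The key step in the proofs above is easy to check numerically: with r_m = (α²/γ)z_o^{−m}, the Toeplitz matrix R_u satisfies the constraint (6.32) with its first row and column identically zero. A numpy sketch with arbitrary example values (z_o = 1.5, α²/γ = 2, n_b = 5 are illustrative choices only):

```python
import numpy as np

zo, a2g, nb = 1.5, 2.0, 5        # |zo| > 1: a non-minimum phase zero
m = np.arange(nb)

# r_m = (alpha^2/gamma) * zo^(-m), arranged as a Toeplitz matrix (6.31)
Ru = a2g * zo ** (-np.abs(m[:, None] - m[None, :]).astype(float))
Gb = zo ** (-m.astype(float))    # Gamma_b evaluated at zo
M = Ru - a2g * np.outer(Gb, Gb)  # left-hand side of the constraint (6.32)

eigs = np.linalg.eigvalsh(M)     # all eigenvalues should be >= 0
```

The zero first row and column show that the constraint is tight, i.e. the variance bound is achieved with equality, in line with the remark after Theorem 6.1.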
6.3.2 Input Design for High-order Systems

For general linear SISO models it is possible to derive an analytical solution of (6.20) based on the asymptotic variance expression (6.19).

Theorem 6.3 The input design problem (6.20), where the variance of a non-minimum phase zero is defined by (6.19), is solved by filtering unit variance white noise with the first order AR-filter

$$ Q(q) = \frac{\alpha|H(z_o)A(z_o)|}{\sqrt{\gamma}}\,\frac{\sqrt{1-z_o^{-2}}}{1-z_o^{-1}q^{-1}}. \qquad (6.51) $$

Proof: Introduce

$$ A = \begin{bmatrix} a_0 & \cdots & a_\eta \end{bmatrix}^T, \qquad \Gamma = \begin{bmatrix} 1 & z_o^{-1} & \cdots & z_o^{-\eta} \end{bmatrix}^T \qquad (6.52) $$

and let the input filter $Q(q)$ be the FIR-filter $Q_\eta(q) = \sum_{i=0}^{\eta} a_i q^{-i}$, so that $Q_\eta(z_o) = A^T\Gamma$. Then the input energy is

$$ \frac{1}{2\pi}\int_{-\pi}^{\pi} |Q_\eta(e^{j\omega})|^2\,d\omega = \sum_{i=0}^{\eta} a_i^2 = A^TA. \qquad (6.53) $$

With $\alpha_h^2 = \alpha^2|H(z_o)A(z_o)|^2/\gamma$, the input design problem (6.20) can be stated as

$$ \underset{A}{\text{minimize}} \quad A^TA \qquad \text{subject to} \quad A^T\Gamma\Gamma^TA \ge \frac{\alpha_h^2}{1-|z_o|^{-2}}. \qquad (6.54) $$

The minimum is achieved when $A$ is an eigenvector corresponding to the largest eigenvalue of $\Gamma\Gamma^T$. The matrix $\Gamma\Gamma^T$ has rank one and the only non-zero eigenvalue is $\Gamma^T\Gamma$, with eigenvector $\Gamma$, giving the optimal solution

$$ A = \frac{\alpha_h}{\sqrt{1-z_o^{-2}}\;\Gamma^T\Gamma}\,\Gamma. \qquad (6.55) $$

The optimal FIR-filter is

$$ Q_\eta(q) = \frac{\alpha_h\sum_{i=0}^{\eta} z_o^{-i}q^{-i}}{\sqrt{1-z_o^{-2}}\,\sum_{i=0}^{\eta} z_o^{-2i}} \qquad (6.56) $$

and if we let the order of the FIR-filter go to infinity we get the optimal filter

$$ Q(q) = \lim_{\eta\to\infty} Q_\eta(q) = \frac{\alpha_h\sqrt{1-z_o^{-2}}}{1-z_o^{-1}q^{-1}}. \qquad (6.57) $$

This proves Theorem 6.3.

Notice that the optimal filter coincides with (6.27) for FIR and ARX models. This is completely in line with the observation that the optimal filter for any finite model order is actually given by (6.27) for these model structures. The solution for other model structures is in principle the same, i.e. a pole placed in $z_o^{-1}$, when the model order is sufficiently large. The only difference is the gain of the filters.

6.3.3 Realization of Optimal Inputs

So far we have realized the optimal input by filtering unit variance white noise through a specific first order AR-filter. This is by no means the only way to realize the optimal input.
In fact, all inputs with a spectrum that coincides with that of the optimal first order AR-filter are optimal. This follows from the asymptotic theory. For all the cases presented in Theorems 6.1-6.3, the optimal input is characterized by auto-correlations of the form $r_m = \beta\eta^{-|m|}$. Such inputs can, apart from by filtering white noise, also be realized by e.g. binary signals, see (Söderström and Stoica, 1989; Tulleken, 1990).

6.4 Input Design - A Numerical Solution

We have so far presented analytical solutions to (6.25) for FIR and ARX model structures, and for general linear model structures when the model order tends to infinity. Here we will show how to solve (6.25) numerically for a Box-Jenkins model structure defined by (6.1)-(6.6). The key is to rewrite (6.25) into a finite-dimensional convex program, which indeed can be obtained by a suitable parametrization of the input spectrum. For an overview of different parameterizations of the input spectrum, we refer to Chapter 3.2. Here we will use a partial correlation parametrization of the input spectrum, which was introduced in (Stoica and Söderström, 1982), see also Example 3.9. Define $L$ and $\{l_k\}$ by

$$ L(e^{j\omega},\theta) = |C(e^{j\omega},\theta)|^2|A(e^{j\omega},\theta)|^4 = \sum_{k=-n_l}^{n_l} l_{|k|}e^{-j\omega k} \qquad (6.58) $$

where $n_l = 2n_a + n_c - 1$. Furthermore, introduce the auto-correlations $c_k$ defined by

$$ c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\Phi_u(\omega)}{L(e^{j\omega},\theta_o)}\,e^{j\omega k}\,d\omega \qquad (6.59) $$

and let $n_p = n_a + n_b + n_d - 1$.

Lemma 6.1 Let $L(e^{j\omega},\theta)$ be defined by (6.58). Furthermore, assume that the polynomials A, B, C and D in the Box-Jenkins model are coprime. Then there exist matrices $M_k \in \mathbb{R}^{n\times n}$ such that the inverse covariance matrix $P^{-1}$ defined by (6.14) can be expressed as

$$ P^{-1}(\theta_o) = \sum_{k=-n_p}^{n_p} c_{|k|}(\theta_o)M_k(\theta_o) + R_o(\theta_o). \qquad (6.60) $$

Proof: See (Stoica and Söderström, 1982) or Example 3.9.

With this particular parametrization it is possible to express the input power as a linear function.
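The coefficients $c_k$ in (6.59) are autocorrelations of a (filtered) quasi-stationary signal, so any candidate sequence must make the associated symmetric Toeplitz matrix positive semidefinite; this is exactly the constraint imposed in the numerical program of the next section. A small numpy check with hypothetical numbers (not from the thesis):

```python
import numpy as np

# A sequence c_0,...,c_m is a valid autocorrelation only if the symmetric
# Toeplitz matrix built from it is positive semidefinite (PSD).
# Illustration with hypothetical numbers, not taken from the thesis.

def toeplitz_psd(c, tol=1e-9):
    """True if the symmetric Toeplitz matrix with first row c is PSD."""
    c = np.asarray(c, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(len(c)), np.arange(len(c))))
    return bool(np.linalg.eigvalsh(c[idx]).min() >= -tol)

# Geometric decay (the AR(1) autocorrelation, e.g. c_k = z_o^{-k}): valid.
print(toeplitz_psd([1.29**-k for k in range(4)]))   # True
# A sequence that is not an autocorrelation of any quasi-stationary process:
print(toeplitz_psd([1.0, 0.9, 0.2]))                # False
```

The eigenvalue test is a direct, if not the most efficient, way to check the semidefinite constraint appearing in the convex program.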
Lemma 6.2 The power of the input u(t) with power spectrum $\Phi_u(\omega)$ can be expressed as

$$ \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_u(\omega)\,d\omega = \sum_{k=-n_l}^{n_l} c_{|k|}l_{|k|}. \qquad (6.61) $$

Proof: See (Stoica and Söderström, 1982) or Example 3.11.

Now it is possible to rewrite the original input design formulation (6.25).

Theorem 6.4 Under the assumptions stated in Lemma 6.1 and Lemma 6.2, the input design problem (6.25) is equivalent to the following finite-dimensional convex program

$$ \begin{aligned} \underset{c_0,\ldots,c_m}{\text{minimize}} \quad & \sum_{k=-n_l}^{n_l} c_{|k|}l_{|k|} \\ \text{subject to} \quad & \sum_{k=-n_p}^{n_p} c_{|k|}M_k + R_o(\theta_o) - \alpha^2\,\Gamma_{b0}\Gamma_{b0}^* \ge 0 \\ & \begin{bmatrix} c_0 & c_1 & \cdots & c_m \\ c_1 & c_0 & \cdots & c_{m-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_m & c_{m-1} & \cdots & c_0 \end{bmatrix} \ge 0 \end{aligned} \qquad (6.62) $$

where $m = \max(n_l, n_p)$.

Proof: Direct application of the results in Lemma 6.1 and Lemma 6.2 to (6.25).

The constraint on the Toeplitz matrix in (6.62) assures that the optimization variables $c_0,\ldots,c_m$ are indeed auto-correlations of a quasi-stationary process. The input design problem (6.62) is now convex and finite-dimensional, and there are several efficient numerical optimization methods that solve such problems. Let us illustrate the results of this section.

[Figure 6.1: Optimal spectra for $n_b = 2$ (solid) and for $n_b = 3$ (dashed).]

6.4.1 Numerical Example

We will assume that the dynamics of the system are defined by the continuous time system

$$ G_c(s) = \frac{1-s}{(s+1)(2s+1)}, \qquad (6.63) $$

i.e. there is a continuous time non-minimum phase zero in 1. With a zero-order hold discretization with sampling time $T_s = 0.25$ s this corresponds to the discrete non-minimum phase zero $z_d = 1.29$. Furthermore, we will assume that the input/output relation is defined by an output-error (OE) model structure, the data length is N = 500 and $\lambda_o = 0.1$. When the model order equals the true system order, i.e. the order is two, the solution to (6.62) is basically a sum of two sinusoids.
When the order of the B-polynomial, $n_b$, is increased, the solution coincides with the first order AR-filter defined in (6.51). This is illustrated in Figure 6.1, where the optimal spectra for $n_b = 2$ and for $n_b = 3$ are shown. Notice that there is a quite dramatic difference between the optimal spectra for different model orders.

6.5 Sensitivity and Benefits

Recapitulate the asymptotic variance expression

$$ \operatorname{Var}\hat{z} \triangleq \lim_{n_b\to\infty}\lim_{N\to\infty} N\,E(\hat{z}_k - z_o)^2 = \frac{\lambda_o|z_o|^2|H(z_o)|^2|A(z_o)|^2}{(1-|z_o|^{-2})\,|\tilde{B}(z_o)|^2\,|Q(z_o)|^2} \qquad (6.64) $$

that was introduced in (6.19). Based on (6.64) we will in this section try to quantify the possible benefits of using an optimal or a sub-optimal design instead of a white input. The obtained variance levels will be compared with the input power normalized to one for all designs. We will also study how the location of the zero affects the result. In the first comparison, the optimal input filter with unit power, i.e.

$$ Q_{opt}(q, z_o) = \frac{\sqrt{1-z_o^{-2}}}{1-z_o^{-1}q^{-1}}, \qquad (6.65) $$

is compared with $Q = 1$. From (6.64) we have that

$$ \frac{\operatorname{Var}\hat{z}(Q_{opt})}{\operatorname{Var}\hat{z}(Q=1)} = \frac{1}{|Q_{opt}(z_o)|^2} = 1 - z_o^{-2}. \qquad (6.66) $$

The thick solid line in Figure 6.2 corresponds to $1 - z_o^{-2}$ as a function of the zero location. Thus there is a substantial decrease in variance for zeros close to the unit circle when the optimal input design is used instead of a white input. This comparison also indicates that when the zero is located far from the unit circle ($|z_o| \gtrsim 4$), there is no benefit in using optimal input design. A white input performs almost as well as the optimal design. One interpretation of this relates to the location of the discrete zero with respect to the sampling time. Consider the continuous system (6.63), which for $T_s = 0.25$ has a discrete zero in 1.29. If the sampling time is increased the discrete zero will move away from the unit circle, and hence the effect of the non-minimum phase zero will, e.g.,
be less visible in the discrete measurements of a step response. Consequently, the benefits of optimal input design are reduced. In a practical situation the location of the true zero is unknown and an estimate of the zero may be used for input design. Given the optimal filter (6.65) and an estimate of the zero, $\hat{z}$, a natural choice of input filter is

$$ Q_{app}(q, \hat{z}) = \frac{\sqrt{1-\hat{z}^{-2}}}{1-\hat{z}^{-1}q^{-1}}. \qquad (6.67) $$

[Figure 6.2: The thick solid line represents the optimal variance reduction as a function of the zero location, see (6.66). The dashed lines correspond to (6.68) and illustrate the variance reduction with a sub-optimal design.]

A reasonable question is how the uncertainty in the zero location will affect the estimation accuracy. This is also illustrated in Figure 6.2. The dashed lines correspond to the ratio

$$ \frac{\operatorname{Var}\hat{z}(Q_{app})}{\operatorname{Var}\hat{z}(Q=1)} = \frac{\big(1-\hat{z}^{-1}z_o^{-1}\big)^2}{1-\hat{z}^{-2}} \qquad (6.68) $$

as a function of $\hat{z}$ for four different locations of the true zero (corresponding to the circles in the figure). These curves show that there is quite a large tolerance with respect to the estimated zero location.

Table 6.1: Comparison of the variance for an estimated non-minimum phase zero.

Model order | PRBS   | Qopt   | Qapp   | Square-wave
2           | 0.0022 | 0.0011 | 0.0012 | 0.0017
5           | 0.0027 | 0.0011 | 0.0012 | 0.0023

6.5.1 Numerical Example

Let the dynamics of the system be defined by the continuous system in Section 6.4.1. The sampling time and the data length are the same as in Section 6.4.1, but we will assume that the true system is of ARX type with a noise variance of 0.0025. Now we will compare, by means of an example, the obtained accuracy when using four different types of inputs. The first input is a Pseudo-Random Binary Signal (PRBS), which has white-noise-like properties. The second input is the optimal one and hence the optimal input filter is given by (6.27).
We know from (6.66) that the optimal gain in accuracy when using the optimal input compared to a white input is approximately a factor 2.5 when the model order tends to infinity. These two input signals will be compared to a sub-optimal input given by (6.67) with the zero estimate $\hat{z} = 1.6$, and to a square-wave signal that is constant for 10 s before switching level. This square-wave signal, which takes the values ±1, is constructed such that the typical dip of the step response of a system with a non-minimum phase zero is clearly visible. The power of all inputs is normalized to one. We have used a model structure of order two (the true order) and one of order five, i.e. an over-parametrization. The result of 10000 Monte-Carlo simulations is given in Table 6.1. The theoretical value of the variance for the optimal input is, asymptotically in data, 0.0010, independently of the model order provided it is at least the true system order. The Monte-Carlo simulations confirm this well: the variance for the optimal input remains constant when the order is increased, and it is close to the asymptotic value. The sub-optimal input also performs well. From (6.64) we have that

$$ \frac{\operatorname{Var}\hat{z}(Q_{opt})}{\operatorname{Var}\hat{z}(Q_{app})} = \frac{|Q_{app}(z_o)|^2}{|Q_{opt}(z_o)|^2} = \frac{\big(1-z_o^{-2}\big)\big(1-\hat{z}^{-2}\big)}{\big(1-(\hat{z}z_o)^{-1}\big)^2}. \qquad (6.69) $$

With $z_o = 1.29$ and $\hat{z} = 1.6$, (6.69) equals 0.915, which can be compared with 0.0011/0.0012 ≈ 0.917 obtained from the Monte-Carlo simulations. For model order two, the variance reduction using the optimal input instead of the square-wave is about 1.5. When the order is increased to five, this factor increases to 2. The corresponding figures for the PRBS input are 2 and 2.5, respectively.
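The factors quoted above follow directly from (6.66) and (6.69); a stdlib-only sketch (not thesis code) reproduces them:

```python
import math

z_o = 1.29    # true discrete non-minimum phase zero (Section 6.4.1)
z_hat = 1.6   # zero estimate used in the sub-optimal filter Q_app

# (6.66): optimal vs white input; the variance shrinks by 1 - z_o^{-2},
# i.e. the accuracy gain factor is 1 / (1 - z_o^{-2}).
gain = 1.0 / (1.0 - z_o**-2)
print(round(gain, 2))        # ~ 2.51, the "factor 2.5" quoted in the text

# (6.69): optimal vs sub-optimal filter with an estimated zero.
ratio = (1 - z_o**-2) * (1 - z_hat**-2) / (1 - 1.0 / (z_hat * z_o))**2
print(round(ratio, 3))       # ~ 0.915, close to 0.0011/0.0012 from Table 6.1
```

Both numbers agree with the Monte-Carlo results in Table 6.1 to the precision reported there.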
6.6 Using Restricted Complexity Models for Identification of Zeros

The optimal correlation sequence of the input for accurate identification of zeros for FIR or ARX systems is given by $r_k = z_o^{-|k|}$, where $z_o$ denotes a unique non-minimum phase zero, see Theorem 6.1 and Theorem 6.2. This also holds for general linear models asymptotically in the order of the numerator polynomial, see Theorem 6.3. These results are derived under the assumption of full or high order modelling. Now we will study the use of models of lower order than the true system. Assume that the true system is linear and described by

$$ y(t) = \sum_{k=1}^{n} g_k u(t-k) + v(t). \qquad (6.70) $$

Let the model be the linear regression

$$ y(t) = \varphi^T(t)\theta + e(t) = \begin{bmatrix} u(t-1) & \cdots & u(t-m) \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} + e(t) \qquad (6.71) $$

where $e$ represents white noise. Thus the one-step ahead output prediction is

$$ \hat{y}(t) = \varphi^T(t)\theta. \qquad (6.72) $$

The least-squares estimate is obtained as

$$ \hat{\theta}_N = \Big(\sum_{t=1}^{N} \varphi(t)\varphi^T(t)\Big)^{-1}\sum_{t=1}^{N} \varphi(t)y(t). \qquad (6.73) $$

In Example 3.15 we saw that we obtained a consistent estimate of the static gain when we used optimal input design, independently of the model order. Now we will study $\hat{\theta}_N$ when N tends to infinity and check the consistency.

Theorem 6.5 Let the true system be described by (6.70) and assume that it has one non-minimum phase zero $z_o$. Furthermore, assume that u and v are statistically independent and that u is a stationary stochastic process with auto-correlations $E\,u(t)u(t-k) = z_o^{-|k|}$. Then the least-squares estimate (6.73), based on the model (6.71) with $1 \le m < n$, gives a consistent estimate of $z_o$.

Proof: Notice that the true system is a linear regression in $g_k$. Use this and insert (6.70) into (6.73). Since u and v are independent, the limit estimate $\theta^*$ becomes

$$ \theta^* = \begin{bmatrix} r_0 & r_1 & \cdots & r_{m-1} \\ r_1 & r_0 & \cdots & r_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m-1} & r_{m-2} & \cdots & r_0 \end{bmatrix}^{-1} \begin{bmatrix} r_0 & r_1 & \cdots & r_{n-1} \\ r_1 & r_0 & \cdots & r_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m-1} & r_{m-2} & \cdots & r_{n-m} \end{bmatrix}\theta_o \qquad (6.74) $$

where $\theta_o^T = [g_1, g_2, \ldots, g_n]$.
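The limit expression (6.74) is easy to evaluate for a concrete case. The following sketch (a hypothetical third-order FIR truth, not from the thesis; numpy assumed) confirms the claim of Theorem 6.5 that the under-parameterized limit model retains the non-minimum phase zero:

```python
import numpy as np

# Numerical check of Theorem 6.5 via the limit estimate (6.74).
# Hypothetical true FIR system (not from the thesis) of order n = 3 with
# non-minimum phase zero z_o = 1.29:  B(z) = (z - 1.29)(z + 0.5),
# i.e. g = [1, -0.79, -0.645].  Fit an under-parameterized model of order m = 2.
z_o = 1.29
g = np.array([1.0, -0.79, -0.645])
n, m = 3, 2

r = lambda k: z_o ** -abs(k)   # input autocorrelation E u(t)u(t-k) = z_o^{-|k|}
R = np.array([[r(i - j) for j in range(m)] for i in range(m)])            # E[phi phi^T]
c = np.array([sum(g[k] * r(k - i) for k in range(n)) for i in range(m)])  # E[phi y]

theta = np.linalg.solve(R, c)   # limit least-squares estimate, cf. (6.74)
print(theta)                    # -> [ 1.   -1.29]
print(np.roots(theta))          # -> [ 1.29]  : the NMP zero is recovered
```

The limit model's B-polynomial is $x - 1.29$, so the zero $z_o$ survives the order reduction exactly, as the theorem asserts.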
Now insert the correlations $r_k = z_o^{-|k|}$ into (6.74). This gives

$$ \theta^* = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & 0 & \cdots \\ 0 & 1 & \cdots & 0 & 0 & 0 & \cdots \\ \vdots & & \ddots & & & & \\ 0 & \cdots & 0 & 1 & z_o^{-1} & z_o^{-2} & \cdots \end{bmatrix}\theta_o. \qquad (6.75) $$

A zero of the limit model is defined by $b_1x^{m-1} + \cdots + b_m = 0$, which by (6.75) is equivalent to

$$ g_1x^{m-1} + \cdots + g_{m-1}x + g_m + g_{m+1}z_o^{-1} + \cdots + g_nz_o^{m-n} = 0. \qquad (6.76) $$

Thus $x = z_o$ is a solution to (6.76).

6.7 Conclusions

Analytical solutions have been derived for FIR and ARX model structures that present the most efficient input, in terms of input energy, for estimating a discrete non-minimum phase zero $z_o$. The optimal input can be characterized by a first order AR-filter with a pole in $z_o^{-1}$. This solution is independent of the model order. Thus, the variance of the estimated non-minimum phase zero will be independent of the model order when the optimal input is applied. A similar analytical solution is obtained for general linear models, based on a variance expression that is asymptotic in model order and data. A numerical solution has been presented for general linear SISO models of finite order. It was illustrated that the optimal input may be very different depending on model structure and order. Possible benefits of optimal design have been quantified. It was shown that the variance can be reduced significantly compared to white inputs and square-waves, especially when the model is over-parameterized. It was also shown that a solution based on the optimal AR-filter, in which the true zero is replaced by an estimated zero, is quite robust with respect to the estimated zero location.

Chapter 7

Convex Computation of Worst Case Criteria

Let $G(\eta)$ be a discrete linear time-invariant single input/single output model of $G_o$. Here $\eta \in \mathbb{R}^n$ is a vector that parameterizes G. The model typically deviates from the true system. There are different ways of representing the uncertainty in the model. In this chapter the uncertainty will be represented by parametric uncertainties.
The parameters will lie in an ellipsoid centered around a nominal estimate $\eta_o$. The ellipsoid is described by

$$ \Upsilon = \{\eta \mid (\eta-\eta_o)^TR(\eta-\eta_o) \le 1\}. \qquad (7.1) $$

Parametric uncertainties in terms of ellipsoids such as $\Upsilon$ appear e.g. in identification in the prediction error framework (Ljung, 1999). To evaluate the quality of the estimated models and the possible performance degradation that the model errors may induce, it is important to quantify them. The impact of the errors depends very much on the intended application of the model, and consequently a quality measure has to take the intended use into account. In Chapter 3.7 we introduced the frequency function

$$ F(\omega,\eta) \triangleq \frac{[W_nG(\eta)+X_n]^*\,Y_n\,[W_nG(\eta)+X_n] + K_n}{[W_dG(\eta)+X_d]^*\,Y_d\,[W_dG(\eta)+X_d] + K_d}, \qquad (7.2) $$

where $W_n$, $W_d$, $X_n$ and $X_d$ are finite-dimensional stable transfer functions. Furthermore, $Y_n(\omega)$ was defined by

$$ Y_n = \mathcal{Y}_n^*\mathcal{Y}_n \qquad (7.3) $$

where $\mathcal{Y}_n$ is a stable finite-dimensional transfer function. The quantities $Y_d$, $K_n$ and $K_d$ have definitions analogous to $Y_n$. Here $A^*$ denotes the complex conjugate transpose of A. In this chapter we will consider quality measures of $G(\eta)$ that can be expressed as

$$ J_F \triangleq \sup_{\omega,\,\eta\in\Upsilon} F(\omega,\eta), \qquad (7.4) $$

whenever F is well defined. The quality measure (7.4) was introduced in Chapter 3.7. It was shown that (7.4) can be included in the experiment design as a fixed constraint: the upper bound of $J_F$ was constrained to be less than a fixed value, while the parametric confidence region could be manipulated via R due to its dependence on the input spectrum. The objective in this chapter is to compute $J_F$ given a fixed confidence ellipsoid. Computing $J_F$ is a non-trivial optimization problem. However, when the frequency is fixed, the computation of $\max_\eta F(\omega,\eta)$, $\eta\in\Upsilon$, can be turned into a convex optimization problem.
Hence, one way to approximate $J_F$ is to compute $\max_\eta F(\omega,\eta)$, $\eta\in\Upsilon$, for a finite number of frequencies and pick the largest value as an approximation of $J_F$. This approach has been used in (Bombois et al., 2000b) to compute the worst-case ν-gap. A similar approach has been used in (Bombois et al., 2001) to compute the worst-case performance of a control design. There are two main contributions of this chapter that extend previous results presented in (Bombois et al., 1999), (Bombois et al., 2000b) and (Bombois et al., 2001). The first is the introduction of the generalized cost function (7.4). The second is the introduction of a method that gives an upper bound on $J_F$ without discretization of the frequency axis. The outline of the chapter is as follows. Some examples are given in Section 7.1 to motivate the structure of the quality measure $J_F$ in (7.4). It is shown in Section 7.2 that $J_F$ can be computed for a fixed frequency. By imposing some limitations on one of the optimization variables it is possible to compute an upper bound on $J_F$ over all frequencies. This is shown in Section 7.3. The method is illustrated by numerical examples in Section 7.4 and some conclusions are drawn in Section 7.5.

7.1 Motivation of Quality Measure

In this section we will give some specific examples of problems that can be expressed as (7.4). First we will consider the parametric uncertainty in (7.4).

7.1.1 Parametric Ellipsoidal Constraints

One way of obtaining models is to use the prediction error framework of system identification, see (Ljung, 1999) and (Söderström and Stoica, 1989). Based on N observed input-output data points this framework delivers a frequency response estimate $G(e^{j\omega},\hat{\theta}_N)$, where $\hat{\theta}_N$ is the prediction error estimate of a vector $\theta \in \mathbb{R}^n$ that parameterizes a set of transfer functions, $G(e^{j\omega},\theta)$, together with an uncertainty region. When the model is flexible enough to capture the underlying dynamics, i.e.
the true system is in the model class, the uncertainty in the estimate is only due to noise and other stochastic disturbances in the data. These errors are denoted variance errors. There exist strong analytic results for the variance error of prediction error estimates in terms of the covariance of the parameters, see (Ljung, 1999) and Chapter 2. The covariance matrix can be used to derive confidence bounds for the estimates. These confidence bounds take the form of ellipsoids such as $\Upsilon$ defined in (7.1), with R proportional to the inverse of the covariance matrix of the parameters. The confidence ellipsoid can be used to determine the distance between the estimates and the true system. But the confidence region can also be used together with an estimate, say $\hat{\theta}_N$, to obtain an uncertainty ellipsoid which contains the true parameters with probability p. Such a confidence ellipsoid for the estimate $\hat{\theta}_N$ is defined by

$$ U_\theta = \{\theta \mid (\theta-\hat{\theta}_N)^TP_N^{-1}(\theta-\hat{\theta}_N) \le \alpha\} \qquad (7.5) $$

where $P_N$ is the covariance matrix of $\hat{\theta}_N$ and α is obtained from the $\chi^2$-distribution such that $\Pr(\chi^2(n) \le \alpha) = p$ for the pre-specified level of confidence, e.g. p = 0.95. The parametric uncertainty region $U_\theta$ corresponds to an uncertainty region in the space of transfer functions, denoted D:

$$ D = \{G(q,\theta) \mid \theta \in U_\theta\}. \qquad (7.6) $$

It has been the main objective of (Bombois et al., 1999; Bombois et al., 2000b; Bombois et al., 2001; Gevers et al., 2003) to derive results based on the uncertainty description (7.6) to bridge the gap between identification and robust control theory. Parametric uncertainty regions in terms of ellipsoids do not only appear in identification in the prediction error framework. They also appear in e.g. set-membership identification techniques, see (Milanese and Vicino, 1991).
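The threshold α in (7.5) needs no statistical tables: for n = 3 the χ² CDF has the closed form $\Pr(\chi^2(3)\le x) = \operatorname{erf}(\sqrt{x/2}) - \sqrt{2x/\pi}\,e^{-x/2}$, so a stdlib-only bisection recovers the 95% level used later in Section 7.4 (an illustrative sketch, not thesis code):

```python
import math

# chi-square(3) CDF in closed form, then bisect for the p = 0.95 quantile.
# This reproduces the threshold used in Section 7.4 (7.82 as rounded there).
def chi2_cdf_3(x):
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi2_ppf_3(p, lo=0.0, hi=50.0):
    for _ in range(100):              # plain bisection; ample accuracy
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf_3(mid) < p else (lo, mid)
    return (lo + hi) / 2

alpha = chi2_ppf_3(0.95)
print(round(alpha, 2))                # -> 7.81
```

For general n one would use a regularized incomplete gamma function (e.g. `scipy.stats.chi2.ppf`); the closed form above is special to odd, small degrees of freedom.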
7.1.2 Worst Case Performance of a Control Design

Assume that an identified model G will be used for control design, where the objective is to control the open-loop system $y = G_ou + v$, where y is the output, u the input, and v some disturbance. The control law is defined by

$$ u = K(r-y) + w \qquad (7.7) $$

where r and w are external excitation signals. With the model G and the controller K, the designed closed-loop system becomes

$$ \begin{bmatrix} y \\ u \end{bmatrix} = \begin{bmatrix} \dfrac{GK}{1+GK} & \dfrac{G}{1+GK} \\ \dfrac{K}{1+GK} & \dfrac{1}{1+GK} \end{bmatrix}\begin{bmatrix} r \\ w \end{bmatrix} = \frac{1}{1+GK}\begin{bmatrix} G \\ 1 \end{bmatrix}\begin{bmatrix} K & 1 \end{bmatrix}\begin{bmatrix} r \\ w \end{bmatrix} \triangleq T(G,K)\begin{bmatrix} r \\ w \end{bmatrix} \qquad (7.8) $$

if the disturbance v is neglected. There are several ways of defining the performance of the closed-loop system (7.8). When the controller K stabilizes T(G,K), the performance measure adopted here is defined by

$$ J(G,K,W_l,W_r) \triangleq \|W_l\,T(G,K)\,W_r\|_\infty \qquad (7.9) $$

where $W_l = \operatorname{diag}(W_{l1}, W_{l2})$ and $W_r = \operatorname{diag}(W_{r1}, W_{r2})$ are diagonal frequency weights and $\|A\|_\infty$ is the $H_\infty$-norm (Zhou et al., 1996) of the stable transfer function A. The most interesting performance measure is $J(G_o,K,W_l,W_r)$, since this measures the achieved performance when the controller is applied to the true system $G_o$. However, $G_o$ is unknown. Assume that the model G has been obtained from a prediction error identification experiment, which together with the model estimate also delivers an estimate of the covariance matrix of the estimated parameters. Further, assume that the true system is contained in the model class; then the covariance matrix can be used to define an uncertainty set D as in (7.6). Modulo an error in the covariance estimate, this uncertainty set will also contain the true system with a prespecified probability. Thus, one way to estimate the worst case achieved performance is to consider the criterion

$$ J_{WC}(D,K,W_l,W_r) \triangleq \max_{G(\theta)\in D} \|W_l\,T(G,K)\,W_r\|_\infty \qquad (7.10) $$

where the uncertainty region D is defined in (7.6).
By the definition of D we know that the probability of $J_{WC}(D,K,W_l,W_r) \ge J(G_o,K,W_l,W_r)$ at least equals the degree of confidence, provided the asymptotic theory of prediction error identification is accurate. As will be evidenced in Section 7.2, the computation of the worst case performance over the model set D becomes a tractable convex optimization problem when the frequency is fixed. Hence a lower bound on $J_{WC}$ can be obtained by solving this optimization problem for a finite set of frequencies and then taking the maximum of the obtained objective values. This is the solution suggested in (Bombois et al., 2001). Now consider the square of $J_{WC}$:

$$ J_{WC}^2(D,K,W_l,W_r) = \max_{\omega,\,\theta\in U_\theta} \bar{\sigma}(T_W^*T_W), \qquad T_W \triangleq W_l\,T(G,K)\,W_r, \qquad (7.11) $$

where $\bar{\sigma}(A)$ is the largest singular value of A. Then

$$ \bar{\sigma}(T_W^*T_W) = \frac{(|W_{l1}G|^2 + |W_{l2}|^2)(|W_{r1}K|^2 + |W_{r2}|^2)}{|1+GK|^2} \qquad (7.12) $$

since $T(\cdot)$ is a rank one matrix. Notice that (7.12) can be fit into the expression for $F(\omega)$ defined in (7.2) by choosing $W_n = W_{l1}$, $Y_n = |W_{r1}K|^2 + |W_{r2}|^2$, $K_n = |W_{l2}|^2(|W_{r1}K|^2 + |W_{r2}|^2)$, $W_d = K$, $X_d = Y_d = 1$ and $X_n = K_d = 0$. Hence the computation of $J_{WC}^2(\cdot)$ becomes a special case of computing $J_F$ defined in (7.4).

7.1.3 The Worst Case Vinnicombe Distance

Now consider the worst case ν-gap between the identified model $G(\hat{\theta}_N)$ and the uncertainty set D, defined by

$$ \delta_{WC}(G(\hat{\theta}_N), D) = \sup_{\theta\in U_\theta} \delta_\nu(G(\hat{\theta}_N), G(\theta)) $$

where $\delta_\nu$ denotes the Vinnicombe ν-gap between two transfer functions, introduced in (Vinnicombe, 1993). Here $U_\theta$ obeys (7.5). Furthermore, when $\hat{\theta}_N \in U_\theta$ the worst case ν-gap can be expressed as

$$ \delta_{WC}(G(\hat{\theta}_N), D) = \sup_{\omega,\,\theta\in U_\theta} \kappa(G(\hat{\theta}_N), G(\theta)), \qquad (7.13) $$

see Lemma 5.1 in (Bombois et al., 2000b). Here κ is the chordal distance

$$ \kappa(G(\hat{\theta}_N), G(\theta)) = \frac{|G(\hat{\theta}_N) - G(\theta)|}{\sqrt{1+|G(\hat{\theta}_N)|^2}\,\sqrt{1+|G(\theta)|^2}} \qquad (7.14) $$

between the two transfer functions $G(\hat{\theta}_N)$ and $G(\theta)$.
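The chordal distance (7.14) is cheap to evaluate pointwise; a minimal helper (an illustrative sketch, assuming the two frequency-response values are given as complex numbers):

```python
import math

def chordal(g1, g2):
    """Chordal distance (7.14) between two frequency-response values."""
    return abs(g1 - g2) / math.sqrt((1 + abs(g1)**2) * (1 + abs(g2)**2))

# kappa is symmetric and bounded by 1; for example:
print(chordal(0, 1j))        # -> 0.7071...
print(chordal(1 + 0j, 1 + 0j))   # -> 0.0
```

Geometrically this is the distance between the stereographic projections of the two points onto the Riemann sphere, which is why it stays bounded even for large $|G|$.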
Now consider the square of the chordal distance

$$ \kappa^2(G(\hat{\theta}_N), G(\theta)) = \frac{|G(\hat{\theta}_N) - G(\theta)|^2}{(1+|G(\hat{\theta}_N)|^2)(1+|G(\theta)|^2)} $$

which can also be fit into the expression for $F(\omega)$ defined in (7.2) by choosing $X_n = -G(\hat{\theta}_N)$, $W_n = W_d = Y_n = 1$, $Y_d = K_d = 1 + |G(\hat{\theta}_N)|^2$ and $X_d = K_n = 0$. Hence the computation of the worst-case chordal distance becomes yet another example of a problem that can be fit into the computation of $J_F$.

7.2 Computation of Worst Case Criterion for a Fixed Frequency

The computation of $J_F$ can be restated as the following optimization problem

$$ \begin{aligned} \underset{\gamma}{\text{minimize}} \quad & \gamma \\ \text{subject to} \quad & F(\omega,\eta) \le \gamma \quad \forall\,\omega,\ \forall\,\eta:\ (\eta-\eta_o)^TR(\eta-\eta_o) \le 1. \end{aligned} \qquad (7.15) $$

Let $\gamma_{opt}$ be the optimal value of (7.15); then $J_F = \gamma_{opt}$. As stated, the optimization problem (7.15) is intractable since it involves constraints that are infinite-dimensional and non-convex. In this section we will show that for each frequency, the optimization problem (7.15) can be restated as a convex problem that has a unique solution if feasible. The way this
1 1 (7.19) Expression (7.19) is equivalent to that F ≤ γ for a particular η. Now this must be true for all η ∈ Υ. The ellipsoid Υ can be parameterized as T −R η σ1 (η) 1 −ηoT R −Rηo 1 − ηoT Rη η 1 (7.20) Hence the condition F ≤ γP ∀ ω ∈ [−π, π] and η ∈ Υ is equivalent to σ0 (ω) ≥ 0 for all ω and for all η such that σ1 (η) ≥ 0. Such a problem 160 7 Convex Computation of Worst Case Criteria can be handled by the S−procedure (Boyd et al., 1994) which states the following equivalence for each ω: σ0 (η) ≥ 0 ∀ η ∈ Rk such that σ1 (η) ≥ 0 ⇔ ∃ τ ≥ 0, τ ∈ R, such that σ0 (η) − τ σ1 (η) ≥ 0 ∀η ∈ Rk . Since there has to exist one τ ≥ 0 for each ω, we can rewrite τ as a function of ω which has to fulﬁll τ (ω) ≥ 0 for all ω. The condition σ0 (η) − τ σ1 (η) ≥ 0 ∀ ω & ∀ η ∈ Rk now corresponds to (7.18). Theorem 7.1 can be used to rewrite the optimization problem (7.15) as minimize γ subject to γF0 (ω) − F1 (ω) − τ (ω)E ≥ 0 ∀ ω τ (ω) ≥ 0 ∀ ω. γ,τ (ω) (7.21) Remark: For a ﬁxed frequency, ω = Ω, the problem (7.21) becomes a tractable convex optimization problem in the variables γ and τ . This has been recognized in (Bombois et al., 2000b) where the worst case ν-gap is computed for a ﬁxed frequency. A similar approach is used in (Bombois et al., 2001) to compute the worst case control performance as deﬁned in (7.10) but for a ﬁxed frequency. The main diﬃculty with (7.21), when the frequency is free, is the unknown positive function τ (ω). In the next section, the optimization problem will be relaxed by limiting τ (ω) to be a linearly parameterized spectrum. With this restriction on τ (ω) it is possible to obtain a ﬁnite dimensional convex optimization problem using the Kalman-YakubovichPopov (Yakubovich, 1962), whose minimizer will be an upper bound on JF . 
7.3 Computation of an Upper Bound

Introduce the following parametrization of τ(ω):

$$ \tau(\omega) = \Psi(e^{j\omega}) + \Psi^*(e^{j\omega}), \qquad \Psi(e^{j\omega}) = \frac{1}{2}\tau_0 + \sum_{k=1}^{M-1} \tau_k\mathcal{B}_k(e^{j\omega}) \qquad (7.22) $$

where $\{\mathcal{B}_k(e^{j\omega})\}$ is a set of stable and finite-dimensional transfer functions, e.g. Laguerre functions (Wahlberg, 1991) or Kautz functions (Wahlberg, 1994). With the restriction $\tau(\omega) \ge 0\ \forall\,\omega$, τ(ω) becomes a spectrum, with $\Psi(e^{j\omega})$ corresponding to its positive real part. When $\{\mathcal{B}_k(e^{j\omega})\} = \{e^{-jk\omega}\}$ the sequence $\{\tau_k\}$ corresponds to the correlation coefficients. The main reason to study such parameterizations of τ(ω) is that infinite-dimensional spectral constraints such as $\tau(\omega) \ge 0\ \forall\,\omega$ may be replaced by finite-dimensional linear matrix inequalities when the positive real part of the spectrum is linearly parameterized, see Lemma 3.1. The idea is to let {A, B, C, D} be a state-space realization of the positive real part $\Psi(e^{j\omega})$, where $\{\tau_k\}$ appears linearly in C and D. It is easy to construct such a realization since $\{\tau_k\}$ appears linearly in $\Psi(e^{j\omega})$. Given this realization, the inequality

$$ \tau(\omega) \triangleq \tau_0 + \sum_{k=1}^{M-1} \tau_k\big[\mathcal{B}_k(e^{j\omega}) + \mathcal{B}_k^*(e^{j\omega})\big] \ge 0 \quad \forall\,\omega \qquad (7.23) $$

can, according to Lemma 3.1, be replaced by

$$ K(Q, \{A,B,C,D\}) \triangleq \begin{bmatrix} Q - A^TQA & -A^TQB \\ -B^TQA & -B^TQB \end{bmatrix} + \begin{bmatrix} 0 & C^T \\ C & D + D^T \end{bmatrix} \ge 0. \qquad (7.24) $$

Notice that (7.24) is a linear matrix inequality in Q and $\{\tau_k\}$.

Lemma 7.1 Let $F_0(\omega)$, $F_1(\omega)$ and E be defined by (7.16), (3.107) and (7.18), and introduce

$$ \Lambda(\omega) \triangleq \gamma F_0(\omega) - F_1(\omega) - \tau(\omega)E. \qquad (7.25) $$

When τ(ω) is defined by (7.23) there exists a sequence $\{\Lambda_k\}$ such that

$$ \Lambda(\omega) \ge 0\ \forall\,\omega \iff \sum_{k=0}^{p} \Lambda_k\big(e^{-jk\omega} + e^{jk\omega}\big) \ge 0\ \forall\,\omega \qquad (7.26) $$

where the variables $\{\tau_k\}$ and γ appear linearly in $\{\Lambda_k\}$.

Proof: Both $F_0$ and $F_1$ have the structure $\sum_k F_ke^{-jk\omega}$. Multiply both sides of $\Lambda(\omega) \ge 0$ by the least common denominator of τ(ω).
This gives the equivalence in (7.26).

The special parametrization of τ(ω) introduced in (7.23) will, according to Lemma 7.1, imply that the condition $\Lambda(\omega) \ge 0\ \forall\,\omega$ can be replaced by a positivity constraint on a spectrum, see (7.26). This fact can now be used together with Lemma 3.1.

Theorem 7.2 Assume that $F(\omega,\eta) < \infty$ for all ω and all $\eta\in\Upsilon$. Let τ(ω) be defined by (7.23) and let $\{A_\tau, B_\tau, C_\tau, D_\tau\}$ be a state-space representation of the positive real part of τ(ω), where $\{\tau_k\}$ appears linearly in $C_\tau$ and $D_\tau$. Let $\{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}$ be the corresponding state-space representation of the positive real part of the spectrum in (7.26), where $\{\tau_k\}$ and γ appear linearly in $C_\Lambda$ and $D_\Lambda$. Then

$$ F(\omega,\eta) \le \gamma \quad \forall\,\omega\in[-\pi,\pi] \text{ and } \forall\,\eta\in\Upsilon, \qquad \Upsilon = \{\eta \mid (\eta-\eta_o)^TR(\eta-\eta_o) \le 1\} \qquad (7.27) $$

if

$$ \begin{cases} K(Q_\tau, \{A_\tau, B_\tau, C_\tau, D_\tau\}) \ge 0 \\ K(Q_\Lambda, \{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}) \ge 0. \end{cases} \qquad (7.28) $$

Proof: Due to the parametrization of τ(ω), the constraints (7.28) assure that $\tau(\omega) \ge 0\ \forall\,\omega$ and $\Lambda(\omega) \ge 0\ \forall\,\omega$, according to Lemma 3.1 and Lemma 7.1. Whenever $\tau(\omega) \ge 0\ \forall\,\omega$ and $\Lambda(\omega) \ge 0\ \forall\,\omega$, Theorem 7.1 implies (7.27).

Theorem 7.2 can now be used to state a finite-dimensional convex optimization problem that computes an upper bound on $J_F$.

Theorem 7.3 Consider the assumptions established in Theorem 7.2. The optimal value $\gamma_{opt}$ of the optimization problem

$$ \begin{aligned} \underset{\gamma,\{\tau_k\},Q_\tau,Q_\Lambda}{\text{minimize}} \quad & \gamma \\ \text{subject to} \quad & K(Q_\tau, \{A_\tau, B_\tau, C_\tau, D_\tau\}) \ge 0 \\ & K(Q_\Lambda, \{A_\Lambda, B_\Lambda, C_\Lambda, D_\Lambda\}) \ge 0 \end{aligned} \qquad (7.29) $$

is an upper bound on $J_F$ defined in (7.4).

Proof: Apply Theorem 7.2 to the constraints in (7.21). Due to the restriction on τ(ω), the resulting optimization problem provides an upper bound whenever it is feasible.

This is a very powerful result, which will be illustrated in Section 7.4.
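The mechanics of replacing a frequency-wise positivity constraint by the LMI (7.24) can be seen concretely in the smallest nontrivial case M = 2, where $\tau(\omega) = \tau_0 + 2\tau_1\cos\omega$ and the positive real part has the scalar realization A = 0, B = 1, C = τ1, D = τ0/2; the LMI then reduces to the 2×2 matrix below, and Q = τ0/2 is the best choice of the free variable in this scalar case. The sketch (not thesis code, numpy assumed) checks the equivalence against nonnegativity of the spectrum on a frequency grid:

```python
import numpy as np

# KYP/positive-real check for tau(omega) = tau0 + 2*tau1*cos(omega)  (M = 2).
# Realization of the positive real part: A = 0, B = 1, C = tau1, D = tau0/2.
# The LMI (7.24) then reads  [[Q, tau1], [tau1, tau0 - Q]] >= 0, and for this
# scalar case Q = tau0/2 maximizes the smallest eigenvalue, so it decides
# feasibility.
def kyp_feasible(tau0, tau1):
    Q = tau0 / 2
    K = np.array([[Q, tau1], [tau1, tau0 - Q]])
    return bool(np.linalg.eigvalsh(K).min() >= -1e-12)

def spectrum_nonneg(tau0, tau1):
    w = np.linspace(-np.pi, np.pi, 2001)
    return bool((tau0 + 2 * tau1 * np.cos(w)).min() >= -1e-12)

for tau0, tau1 in [(1.0, 0.4), (1.0, 0.6)]:
    print(kyp_feasible(tau0, tau1), spectrum_nonneg(tau0, tau1))
# -> True True    (tau0 >= 2|tau1|: spectrum nonnegative, LMI feasible)
# -> False False  (spectrum dips negative, no Q makes the LMI hold)
```

Both tests agree because $\tau(\omega) \ge 0$ for all ω is equivalent to $\tau_0 \ge 2|\tau_1|$ here, which is exactly what the LMI encodes without any frequency gridding.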
7.4 Numerical Illustrations

The true linear discrete-time system obeys

y(t) = G_o(q)u(t) + e(t)  (7.30)

where G_o(q) = 0.8q^{−1} / (1 − q^{−1} + 0.16q^{−2}) and where q is the time-shift operator, q^{−1}u(t) = u(t − 1). Furthermore, y is the output, u the input and e is zero-mean white noise with variance 0.1. A model is estimated using the prediction error method based on 1000 samples of input/output data from the system (7.30). The obtained model is

G(η̂) = 0.8064q^{−1} / (1 − 1.018q^{−1} + 0.1839q^{−2})  (7.31)

where η̂ = [−1.018 0.1839 0.8064]. The covariance matrix of η̂ is

P_η̂ = 10^{−3} · [ 1.112 −0.987 0.739 ; −0.987 0.900 −0.614 ; 0.739 −0.614 0.791 ].  (7.32)

Now define the parametric uncertainty ellipsoid U_η as

U_η = {η | (η − η̂)ᵀ P_η̂^{−1} (η − η̂) ≤ 7.82}.  (7.33)

The right-hand side of (7.33) is determined by Pr(χ²(3) ≤ 7.82) = 0.95. Let η_o = [−1 0.16 0.8] be the vector that parameterizes the true system G_o. The ellipsoid U_η will contain η_o with a probability of 95% provided the asymptotic theory of the prediction error method is accurate, see (Ljung, 1999). The vector η_o is indeed contained in U_η since (η_o − η̂)ᵀ P_η̂^{−1} (η_o − η̂) = 3.21 < 7.82.

7.4.1 Computation of the Worst Case Vinnicombe Distance

Assume that we want to compute the worst case ν-gap

δ_WC = max_{ω, η∈U_η} |G(η̂) − G(η)|² / ((1 + |G(η̂)|²)(1 + |G(η)|²)).

Computing δ_WC corresponds to computing J_F for appropriate choices of weights in the general cost function F, as was shown in Section 7.1.3. An upper bound on J_F can be computed, according to Theorem 7.3, by solving a finite-dimensional convex optimization problem. We will now use Theorem 7.3 to compute an upper bound on δ_WC. First the parametrization of τ(ω) must be specified. We have chosen

τ(ω) = Σ_{k=−(M−1)}^{M−1} τ_k e^{−jωk}  (7.34)

with M = 3. There is a trade-off between flexibility and computational complexity in the choice of M.
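A crude sanity check of δ_WC is possible without any convex program: sample the uncertainty ellipsoid and grid the frequency axis; every sampled value is then a lower bound on the worst case. The sketch below uses the numerical values of η̂, P_η̂ and η_o quoted above; the sampling scheme, the number of samples and the grid density are arbitrary choices, not the thesis method.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(eta, z):
    # G(eta) = eta3 * z^{-1} / (1 + eta1 * z^{-1} + eta2 * z^{-2})
    return eta[2] * z**-1 / (1.0 + eta[0] * z**-1 + eta[1] * z**-2)

eta_hat = np.array([-1.018, 0.1839, 0.8064])
P = 1e-3 * np.array([[ 1.112, -0.987,  0.739],
                     [-0.987,  0.900, -0.614],
                     [ 0.739, -0.614,  0.791]])

# Consistency check against the text, which reports a value of about 3.21
# (small deviations are expected since P is quoted to a few decimals).
eta_o = np.array([-1.0, 0.16, 0.8])
d2 = (eta_o - eta_hat) @ np.linalg.solve(P, eta_o - eta_hat)

L = np.linalg.cholesky(7.82 * P)      # maps the unit sphere onto the ellipsoid boundary
w = np.logspace(-3, np.log10(np.pi), 400)
z = np.exp(1j * w)
G0 = G(eta_hat, z)

worst = 0.0
for _ in range(3000):
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)            # random boundary sample of the ellipsoid
    Ge = G(eta_hat + L @ d, z)
    val = np.abs(G0 - Ge)**2 / ((1 + np.abs(G0)**2) * (1 + np.abs(Ge)**2))
    worst = max(worst, val.max())

print(worst)                          # a sampled lower bound on the worst-case measure
```

Such a gridding/sampling bound is exactly what the convex program of Theorem 7.3 is designed to replace: the LMI formulation gives a guaranteed upper bound, valid for all frequencies at once.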
The positive real part of (7.34) can be realized by

A_τ = [ O_{1×(M−2)}  0 ; I_{M−2}  O_{(M−2)×1} ],  B_τ = [1 0 … 0]ᵀ,  C_τ = [τ_1 τ_2 … τ_{M−1}],  D_τ = (1/2)τ_0,  (7.35)

where O_{m×k} is the zero matrix of size m by k and I_m is the identity matrix of size m by m. Hence the constraint τ(ω) ≥ 0 ∀ω can now be replaced by the linear matrix inequality K(Q_τ, {A_τ, B_τ, C_τ, D_τ}) ≥ 0. The special parametrization of τ(ω) implies that

Λ(ω) = γF_0(ω) − F_1(ω) − τ(ω)E  (7.36)
     = Σ_{k=0}^{p} Λ_k (e^{−jωk} + e^{jωk}),  (7.37)

see Lemma 7.1. The positive real part of (7.36) can be realized in the same manner as τ(ω), i.e. the realization follows from (7.35). Given the realization {A_Λ, B_Λ, C_Λ, D_Λ} of Λ(ω), the constraint Λ(ω) ≥ 0 ∀ω can now be replaced by the linear matrix inequality K(Q_Λ, {A_Λ, B_Λ, C_Λ, D_Λ}) ≥ 0.

Now it is straightforward to compute an upper bound on δ_WC using Theorem 7.3. The result is shown in Figure 7.1. The obtained upper bound is δ_WC ≤ 0.06348. This upper bound can be compared with a method that divides the frequency axis into a finite number of frequencies and where the worst case chordal distance, defined as max_η κ(G(η̂), G(η)), is computed for each frequency according to (7.21). Here κ is defined by (7.14). This frequency-by-frequency method yields δ_WC ≥ 0.06344. Hence the obtained upper bound is relatively tight to the true value of δ_WC. This is somewhat surprising since the parametrization of τ(ω) is quite restrictive.

Figure 7.1: Computation of the worst case ν-gap. Solid line: Obtained upper bound based on (7.29). Dotted line: The worst case chordal distance, frequency by frequency.

7.4.2 Computation of Worst Case Control Performance

Assume that we want to control the open-loop system by the PI controller K(q) = (0.1 − 0.05q^{−1})/(1 − q^{−1}). This controller stabilizes the nominal model (7.31). The controller will actually stabilize all models in the set G_η = {G(η) : η ∈ U_η} since δ_WC < ||T(G(η̂), K)||_∞^{−1}, see (Vinnicombe, 1993). The worst case performance when the controller is applied to the systems in the set G_η can be evaluated by the criterion (7.10). Here we will compute an upper bound on the worst case sensitivity function

J_WCS = max_{G(η)∈G_η} |1 / (1 + KG(η))|

by using the result of Theorem 7.3. The parametrization of τ(ω) follows from (7.34) with M = 2. The result of the computation of J_WCS is shown in Figure 7.2. The obtained upper bound is J_WCS ≤ 1.35297. A frequency-by-frequency optimization method yields J_WCS ≥ 1.35295.

Figure 7.2: Thick solid line: Upper bound on the sensitivity function. Dotted line: Worst case sensitivity function, frequency by frequency. Thin solid line: Achieved sensitivity with K applied to G(η_o).

7.5 Conclusions

In this chapter we have presented a method, in terms of a convex program, to compute different worst case criteria, e.g. the worst case Vinnicombe distance. Until now, such problems have been solved by frequency gridding. The main contributions are the introduction of a generalized cost function, which e.g. includes the worst case Vinnicombe distance, and a method that computes an upper bound on this cost function without a discretization of the frequency axis. The disadvantage is that the method imposes a relaxation that may introduce conservativeness. It is of great interest to further investigate how conservative these results may be. However, numerical results show that the conservativeness, at least in a large number of cases, will not be an issue.
Chapter 8

Gradient Estimation in IFT for Multivariable Systems

Iterative Feedback Tuning (IFT) is a model-free control tuning method using closed-loop experiments, see (Hjalmarsson et al., 1994; Hjalmarsson et al., 1998; Hjalmarsson, 2002). For single-input single-output (SISO) systems only two or three closed-loop experiments, depending on the controller structure, are required. For multivariable systems, however, the number of experiments grows in proportion to the dimension of the controller. In this chapter several methods are proposed to reduce the experimental time by approximating the gradient of the cost function. One of these methods uses the same technique of shifting operators as is used in IFT for SISO systems. This method is further analyzed and sufficient conditions for local convergence are derived. It is shown that even when there are commutation errors due to the approximation method, the numerical optimization may still converge to the true optimum.

8.1 Introduction

In order to develop control systems that both guarantee stability and provide good performance, some knowledge about the process to be controlled is necessary. One important source of information is experimental data collected from the process. Hence, means to map the experimental information into the controller are essential ingredients in any control design procedure. Often, but not necessarily, an explicit model of the process is used as an intermediate step. This model is typically obtained by fitting parameters in a prespecified model structure so that the model behaves similarly to the true process on some available data set. The usefulness of a model lies in the fact that it can be used to predict the process behavior for operating conditions other than those in the data set. Based on this predicted behavior, a suitable controller can be designed.
The ability to extrapolate is what makes a model-based approach very powerful, but it is also its main weakness. An incorrect model may lead to undesirable, and even catastrophic, results. The issue of how to ensure that an identified model is suited for control design has been subject to intense study over the last fifteen years (Gevers, 1993; Van den Hof and Schrama, 1995; Hjalmarsson, 2003). A more cautious approach is to limit the extrapolation ability of the model. Suppose that the control design is embodied in a design criterion which is a function of some parameters in the controller. Suppose also that there already is a controller operating in the feedback loop. One could then model the design criterion locally around the present controller parameters and use this local model to adjust the controller so that performance is improved. Iterative Feedback Tuning (IFT) is a tuning method based on this philosophy. IFT extracts information about the closed-loop sensitivity to the model/controller parameters (gradient information). This corresponds to local information about the controlled process around the current closed-loop trajectory and hence, implicitly, to a local model. The price for using accurate local system information is first of all that only gradual changes are allowed in a controller tuning algorithm. This makes tuning slower than if a model that (accurately) captures the complete process characteristics were available. Furthermore, a local model cannot in general be used to predict the stability border. Finally, to obtain this information, special experiments may have to be performed on the true process, possibly disturbing the normal operating conditions. In IFT for SISO systems, the experiment load can be kept at a maximum of two experiments in each iteration, independent of the number of parameters to tune.
However, for nonlinear systems and multi-input/multi-output (MIMO) systems the experiment load grows substantially. With IFT transformed into a nonlinear setting, it is shown in (Sjöberg and Agarwal, 1996) and (De Bruyne et al., 1997) that the gradient of the control cost can be computed by performing additional experiments. The number of experiments turns out to be proportional to the number of parameters. To reduce the number of experiments, identification-based methods to approximate the gradient are proposed in (De Bruyne et al., 1997) and (Sjöberg and Agarwal, 1997). A hybrid version between the ideas of the original IFT and the model-based approximations is presented in (Sjöberg and Bruyne, 1999), where a model of the linearized closed-loop system is introduced in order to compensate for the errors that occur when the original IFT is used for nonlinear systems. It is shown in (Hjalmarsson, 1999) that for linear time-invariant MIMO systems, the experimental load can be reduced to be proportional to the dimension of the controller. From a practical point of view, the experimental load might still be too high for large systems. Hence it is of great interest to further reduce the experimental load. It is the main objective of this chapter to discuss, suggest and evaluate some methods that exist for doing this. IFT was introduced in (Hjalmarsson et al., 1994) and a general presentation has appeared in (Hjalmarsson et al., 1998). For a recent overview, see (Hjalmarsson, 2002). The Special Section (Hjalmarsson and Gevers, 2003) contains a number of applications of IFT. The system setup is introduced in Section 8.2. In Section 8.3 there is a brief description of how the gradients are estimated within the framework of IFT. Since the gradient estimation is experimentally costly, some approximations of the gradient estimate are introduced in Section 8.4.
One of the methods, which uses the same technique as is used in IFT for SISO systems, is analyzed in Section 8.5, where conditions for local convergence are stated. The performance of this method is then studied on three numerical examples in Section 8.6. Finally, there are some concluding remarks in Section 8.7.

8.2 System Description

The overall closed-loop system, depicted in Figure 8.1, is described by

y_t = G_0 u_t + v_t,  u_t(ρ) = C(ρ)(r_t − y_t(ρ)),  (8.1)

where the process to be controlled, G_0, is assumed to be a discrete linear time-invariant multivariable system, y_t ∈ R^p is the output, u_t ∈ R^m is the input and v_t ∈ R^p represents unmeasurable signals such as disturbances and noise. The disturbance v_t is assumed to be a zero-mean discrete-time stochastic process and it is also assumed that sequences from different experiments are mutually independent. The controller C(ρ) is an m × p transfer function matrix parameterized by some parameter vector ρ ∈ R^f. The reference r_t is an external vector. The subindex t denotes the discrete time instants. Notice that signals originating from measurements on the closed-loop system are functions of ρ. The parametrization of C(ρ) is such that all signals in the closed loop are differentiable with respect to ρ. To ease the notation, the time argument will from now on be omitted whenever possible.

Figure 8.1: Feedback system.

The performance of the controller is measured by the quadratic criterion

J(ρ) = (1/(2N)) E[ Σ_{k=0}^{N} ỹ_k(ρ)ᵀ ỹ_k(ρ) ]  (8.2)

where ỹ(ρ) = y(ρ) − y_d is the difference between the achieved output and the desired output y_d. The desired output is assumed to be generated as y_d = T_d r, where T_d is the reference model. The expectation E[·] is with respect to the probability distribution of the disturbance v. The optimal controller, parameterized by ρ_c, is defined by

ρ_c = arg min_ρ J(ρ).  (8.3)

Hence the target of a controller tuning algorithm is to minimize (8.2) to find the optimal controller setting represented by ρ_c. In general, the problem of minimizing J(ρ) is not convex and one has to be content with a local minimum. The stationary points of J(ρ) are given as solutions to

0 = ∂J(ρ)/∂ρ = (1/N) E[ Σ_{k=1}^{N} (∂y_k(ρ)/∂ρ)ᵀ ỹ_k(ρ) ].  (8.4)

With computed gradients the solution can be obtained by gradient-based methods, e.g. the Gauss-Newton search algorithm

ρ_{j+1} = ρ_j − γ_j R_j^{−1} ∂J(ρ_j)/∂ρ,  (8.5)

where R_j is an approximation of the Hessian of J(ρ) and γ_j is the adjustable step length. As stated in (8.4), the problem is intractable since it involves expectations that are unknown. However, such problems can be handled by classical results on stochastic approximation algorithms, provided there exists an unbiased estimate of the gradient ∂J(ρ)/∂ρ. The key contribution in the first derivation of IFT for SISO systems (Hjalmarsson et al., 1994) was to show that an unbiased estimate of ∂J(ρ)/∂ρ can indeed be obtained by performing experiments on the true closed-loop system. How this is done is the topic of the next section.

8.3 Gradient Estimation in the IFT Framework

Here we give a brief introduction to how the gradient is obtained in the framework of IFT. For a more detailed description of IFT in general we refer to (Hjalmarsson et al., 1998; Hjalmarsson, 2002), and to (Hjalmarsson, 1999) for the special implications regarding MIMO systems. With the achieved sensitivity function and the complementary sensitivity function, respectively defined by

S_0(ρ) ≜ [I + G_0 C(ρ)]^{−1},  (8.6)
T_0(ρ) ≜ S_0(ρ) G_0 C(ρ),  (8.7)

the expression for the output y(ρ) of the closed-loop system (8.1) is

y(ρ) = T_0(ρ)r + S_0(ρ)v.
(8.8)

The gradient of y(ρ) with respect to an arbitrary element of ρ, denoted by y'(ρ), is then

y'(ρ) = S_0(ρ) G_0 C'(ρ)(r − y(ρ))  (8.9)

where y(ρ) is the output collected from the closed-loop system under normal operating conditions. Defining the control error as e(ρ) = r − y(ρ), the gradient of y(ρ) is ideally obtained by running the closed-loop experiment shown in Figure 8.2. In practice, a perturbed estimate ŷ'(ρ) = y'(ρ) + S_0(ρ)v is obtained due to the non-zero disturbance. It can be shown (Hjalmarsson et al., 1994) that an unbiased estimate of ∂J(ρ)/∂ρ, denoted by ∂Ĵ(ρ)/∂ρ, is obtained if v is a stochastic stationary signal with zero mean that is mutually independent between sequences collected from different experiments. Unbiased means that E[∂Ĵ(ρ)/∂ρ] = ∂J(ρ)/∂ρ.

Figure 8.2: Setup for exact gradient experiments.

However, this is a rather inefficient way of generating the gradient. With this setup one has to perform one gradient experiment for each parameter in the vector ρ. This amount can be reduced drastically using the fact that scalar linear operators commute. For SISO systems (8.9) can be rewritten as

y'(ρ) = C(ρ)^{−1} C'(ρ) S_0(ρ) G_0 C(ρ) e(ρ).  (8.10)

Thus, to obtain the gradient signal ∂y(ρ)/∂ρ only two experiments are needed, independent of the number of parameters. The first one collects y(ρ) under normal operating conditions, and in the second one the control error e(ρ) is set to be the reference; the output of this experiment is then filtered through C(ρ)^{−1}C'(ρ). The last operation can be done off-line since C(ρ) is a known function of ρ. For MIMO systems the operators in (8.9) do not typically commute. However, since each element in the controller corresponds to a SISO system, this commutation trick can be used for each input/output connection if special experiments are performed.
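The commutation that justifies (8.10) is easy to verify numerically for scalar LTI operators: applying two scalar filters in either order gives the same signal. The sketch below uses arbitrary FIR impulse responses as stand-ins for C(ρ)^{−1}C'(ρ) and S_0(ρ)G_0C(ρ); the numbers have no relation to any particular plant or controller.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two scalar LTI operators as FIR impulse responses (illustrative stand-ins
# for C(rho)^{-1} C'(rho) and T0(rho) = S0 G0 C in (8.10)).
f = np.array([1.0, -0.4, 0.25])
g = np.array([0.5, 0.3, -0.2, 0.1])

e = rng.standard_normal(50)          # a control-error sequence e(rho)

# Applying f then g equals applying g then f: scalar convolution commutes.
y1 = np.convolve(g, np.convolve(f, e))
y2 = np.convolve(f, np.convolve(g, e))
assert np.allclose(y1, y2)
```

For matrix-valued (MIMO) operators this identity fails in general, which is exactly why the element-wise trick with special experiments is needed.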
It is shown in (Hjalmarsson, 1999) that this limits the maximum number of required gradient experiments to m × p, i.e. equal to the dimension of the controller. Despite this reduction, the experiments may be prohibitively long for slow processes, such as distillation columns, since they are performed on the true plant, possibly disturbing the normal operating conditions. Hence it is of great interest to further reduce the number of experiments. It is the objective of this chapter to discuss some options that exist for doing this.

8.4 Gradient Approximations

One way to reduce the number of experiments further is to approximate the signal ∂y(ρ)/∂ρ. There is a large body of theory on approximation errors in optimization; contributions related to IFT are e.g. (Bruyne and Carrette, 1997) for linear systems and (Sjöberg and Bruyne, 1999) for nonlinear systems. In (Hjalmarsson, 1998) it is remarked that in practice IFT seems to be robust with respect to the gradient estimate for many systems. As long as this estimate corresponds to a descent direction, performance may be improved by a suitable choice of the step length. The estimate thus does not have to be exact. In fact, for convergence, it is more important that the quality of the estimate is good in a vicinity of the optimum than in the surrounding regions. Hence a mix of methods for obtaining a gradient estimate may be useful. In the first phase, far away from the optimum, one could use a method which produces rough but cheap gradient estimates. Provided these estimates are reasonable, a suboptimal controller is obtained. If this controller is not satisfactory, one can continue with the original IFT method, which provides unbiased estimates of the gradient and hence guarantees convergence to a local optimum (provided stability is maintained). We will here suggest some methods to approximate ∂y(ρ)/∂ρ for MIMO systems.
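The remark that a biased gradient estimate suffices as long as it is a descent direction can be illustrated on a toy problem. In the sketch below, the exact gradient of a quadratic cost is rotated by a fixed 60 degrees (less than 90, so it remains a descent direction), and the iteration still converges to the optimum; the cost, the angle and the step length are invented for illustration only.

```python
import numpy as np

rho_c = np.array([1.0, -0.5])           # optimum of J(rho) = 0.5*||rho - rho_c||^2

theta = np.deg2rad(60)                  # fixed 60-degree gradient error (< 90 deg)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rho = np.array([3.0, 2.0])
gamma = 0.2
for _ in range(300):
    g = rho - rho_c                     # exact gradient of the quadratic cost
    rho = rho - gamma * (R @ g)         # rotated (biased) gradient, still descent
assert np.linalg.norm(rho - rho_c) < 1e-6
```

With a rotation beyond 90 degrees the update is no longer a descent direction and the same iteration diverges, which mirrors the situation where a gradient approximation fails.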
Rewrite (8.9) as

∂y(ρ)/∂ρ_i = S_0(ρ) G_0 C(ρ) C(ρ)^{−1} (∂C(ρ)/∂ρ_i) e(ρ) = T_0(ρ) A_i(ρ) e(ρ),  A_i(ρ) ≜ C(ρ)^{−1} ∂C(ρ)/∂ρ_i.  (8.11)

The following approximation techniques are considered:

M1. ∂y(ρ)/∂ρ_i = A_i(ρ) T_0(ρ) e(ρ)
This is the same approach as is used for SISO systems, which means that a maximum of two gradient experiments are required no matter what the dimension of the controller is. For MIMO systems there is almost always an error due to the commutation error when A_i(ρ) and T_0(ρ) are shifted. The motivation for using this method is that T_0(ρ) is typically close to the identity in the pass band, which should reduce the commutation error. The local convergence of this method will be further studied in the next section.

M2. ∂y(ρ)/∂ρ_i = T_d A_i(ρ) T_0(ρ) T_d^{−1} e(ρ)
This method is introduced to decrease the commutation error. When T_0(ρ) tends to T_d, T_0(ρ)T_d^{−1} tends to the identity which, obviously, commutes with all matrices. An implementation issue is that T_d^{−1} is in many cases non-causal. Since both T_d^{−1} and e(ρ) are known, non-causal filtering can be applied to compute T_d^{−1} e(ρ).

M3. ∂y(ρ)/∂ρ_i = T_d A_i(ρ) e(ρ)
The objective is to tune T_0(ρ) towards T_d. By assuming that T_0(ρ) ≈ T_d one could replace the unknown T_0(ρ) with T_d. This approximation has been applied in model reference adaptive control (Whitaker et al., 1958). This method is a naive alternative to the previous one. When T_0(ρ) is far from T_d we cannot expect the approximation to be good. However, it has the nice property that when T_0(ρ) tends to T_d, the approximation error decreases.

M4. ∂y(ρ)/∂ρ_i = T̂_0(ρ) A_i(ρ) e(ρ)
Here T̂_0(ρ) is an identified model of the closed-loop system. This idea is used in (Bruyne and Carrette, 1997), where IFT is applied to a resonant system where the gradient experiment was not physically realizable due to high excitation of resonant modes.
The identification is assumed to be simplified by the fact that the closed-loop system is typically of low order and that possible nonlinear effects in G_0 are reduced by the feedback. Some of the drawbacks are that the method relies on the identified model, that extra signals possibly need to be injected to excite the system during identification, and that it requires more knowledge from the user to perform the identification properly.

M5. ∂y(ρ)/∂ρ_i = T̂_0(ρ) A_i(ρ) T_0(ρ) T̂_0(ρ)^{−1} e(ρ)
This method is inspired by the second method, M2. Here T_d is replaced by an identified model T̂_0(ρ).

The list of different gradient approximations can be extended much further, e.g. we can think of estimating T̂_0(ρ) based on an identified model Ĝ_0 of the plant. This approach is used in (Trulsson and Ljung, 1985). The main drawback is that the plant might be both nonlinear and of high order, which complicates the identification and the subsequent control design. Notice that there is a difference in the number of experiments between the methods presented above. In every method at least one on-line experiment is needed to generate e(ρ). The first two and the last (M1, M2 and M5) also need a second on-line experiment where the signal e(ρ), T_d^{−1}e(ρ) or T̂_0(ρ)^{−1}e(ρ), depending on the method, is filtered through the closed-loop system T_0(ρ). The identification-based methods (M4 and M5) possibly also need some extra on-line experiments to carry out the identification of T̂_0(ρ). Finally, the gradient signal ∂y(ρ)/∂ρ is obtained by off-line filtering through A_i(ρ), T_d A_i(ρ) or T̂_0(ρ)A_i(ρ) for each element i in ρ. This can be compared with 1 + m × p on-line experiments for the true gradient.

8.5 Analysis of Local Convergence

We will study the local convergence of the gradient approximation method M1 introduced in the previous section, i.e. the method which applies the same technique of shifting operators as is used for SISO systems.
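The commutation error behind M1, and the claim that it shrinks as T_0 approaches the identity, can be seen already in a static (single-frequency) 2×2 example. The matrices below are arbitrary illustrative values, not derived from any system in this chapter.

```python
import numpy as np

A_i = np.array([[0.0, 1.0],
                [0.5, 0.0]])            # plays C^{-1} dC/drho_i at one frequency

def commutation_error(T0):
    return np.linalg.norm(A_i @ T0 - T0 @ A_i)

T_far  = np.array([[0.9, 0.4],
                   [0.1, 0.3]])         # closed loop far from the identity
T_near = np.eye(2) + 0.01 * np.array([[0.1, -0.2],
                                      [0.3,  0.1]])   # T0 close to I

err_far, err_near = commutation_error(T_far), commutation_error(T_near)
assert err_near < err_far               # error shrinks as T0 -> I
assert commutation_error(np.eye(2)) == 0.0
```

This is exactly the trade-off the analysis below quantifies: M1 is exact when T_0 commutes with A_i, and the sufficient conditions bound how far from that ideal case the scheme still converges.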
The purpose of the analysis is to provide some conditions under which the proposed gradient approximation method will work. In case the gradient approximations fail to produce a descent direction, a full set of gradient experiments can be carried out. In the analysis we consider the case where the requirement that the controller attenuate noise is subordinated to its ability of model reference tracking. Hence we are not interested in tuning the controller with respect to the noise contributions. This situation can be handled by a slight modification of the original criterion (8.2). Consider the quadratic design criterion

J(ρ) = (1/2) E[ỹ_1ᵀ(ρ) ỹ_2(ρ)],  (8.12)

where ỹ_1 and ỹ_2 are based on different realizations of the output. Since the disturbance is mutually independent between different experiments, the influence of the disturbance will be decorrelated using such a criterion. Notice that if there exists a parameter ρ_c such that T_0(ρ_c) = T_d, then this is a stationary point of the design criterion. The objective of this section is to analyze local convergence of an iterative method which uses the gradient approximation method M1 around the point ρ = ρ_c where T_0(ρ_c) = T_d. There is no loss of generality in letting the system be noise-free in the analysis of local convergence when considering the criterion (8.12) and the stationary point ρ = ρ_c. Thus in the analysis the closed-loop system obeys (8.1) with the disturbance v assumed to be zero. We will consider the quadratic design criterion

J(ρ) = (1/2) E[ỹᵀ(ρ) ỹ(ρ)].  (8.13)

The criterion (8.13) becomes a good approximation of (8.2) when the number of data points, N, becomes large. The analysis is based on the so-called ODE analysis (Ljung, 1977), which relates the evolution of an iterative algorithm like

ρ_{j+1} = ρ_j − γ_j ∂J(ρ_j)/∂ρ  (8.14)

to the trajectories of a differential equation. The ODE corresponding to (8.14) is

dρ/dt = −∂J(ρ)/∂ρ.  (8.15)

The idea is that when the step size γ_j tends to zero, the numerical iteration method asymptotically behaves as the corresponding ODE. Consider the approximation of the gradient of (8.13), obtained with the approximated gradient signal of the output:

∂Ĵ(ρ)/∂ρ ≜ E[ (∂ŷ(ρ)/∂ρ)ᵀ ỹ(ρ) ] = E[ (∂ŷ(ρ)/∂ρ)ᵀ (T_0(ρ) − T_d) r ].  (8.16)

Notice that if there exists a parameter ρ_c such that T_0(ρ_c) = T_d, then this is a stationary point of the design criterion and furthermore ∂Ĵ(ρ_c)/∂ρ = ∂J(ρ_c)/∂ρ. The question is whether this is a stable stationary point or not. We will state some results that give sufficient conditions for local convergence around T_0(ρ_c) = T_d. Introduce the general linear reference model

T_d(e^{jω}) = [ T_11(e^{jω}) … T_1n(e^{jω}) ; ⋮ ⋱ ⋮ ; T_n1(e^{jω}) … T_nn(e^{jω}) ]  (8.17)

where the elements [T_ij] are arbitrary SISO transfer functions. Furthermore, let the linear controller have the structure

C(ρ, e^{jω}) = [ C_11(ρ_11, e^{jω}) … C_1n(ρ_1n, e^{jω}) ; ⋮ ⋱ ⋮ ; C_n1(ρ_n1, e^{jω}) … C_nn(ρ_nn, e^{jω}) ]  (8.18)

where

C_ij(ρ_ij, e^{jω}) = ρ_ijᵀ Γ(e^{jω}),  (8.19)

ρ_ij = [ρ_ij1, …, ρ_ijm]ᵀ, and Γ(e^{jω}) = [B_1(e^{jω}), …, B_m(e^{jω})]ᵀ is some vector of basis functions. For example, the discrete PID controller

C_PID(q) = (ρ_1 + ρ_2 q^{−1} + ρ_3 q^{−2}) / (1 − q^{−1}) = ρ_1 · 1/(1 − q^{−1}) + ρ_2 · q^{−1}/(1 − q^{−1}) + ρ_3 · q^{−2}/(1 − q^{−1})

can be represented in this setting. Without loss of generality, the complexity m of each element in (8.18) is assumed to be equal for all elements. The controller parameters are collected in the vector ρ = [ρ_11ᵀ … ρ_n1ᵀ ρ_12ᵀ … ρ_nnᵀ]ᵀ, i.e. each element in (8.18) is individually parameterized. The spectrum of e(ρ_c) is defined by

Φ_e(ω) = S_0(ρ_c) Φ_r S_0*(ρ_c)  (8.20)

where Φ_r is the spectrum of r. Let Ā and A* denote the conjugate and the conjugate transpose of A, respectively. The Kronecker product will be denoted ⊗. From now on, the frequency argument will be omitted.
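The core idea of the ODE analysis, that small-step iterates of (8.14) track the trajectory of (8.15), can be checked on the simplest possible cost. The scalar cost, step size and horizon below are illustrative choices only.

```python
import numpy as np

# J(rho) = 0.5*rho^2, so dJ/drho = rho and the ODE drho/dt = -rho
# has the solution rho(t) = rho0 * exp(-t).
rho0, gamma, steps = 2.0, 1e-3, 5000

rho = rho0
for _ in range(steps):
    rho -= gamma * rho                  # iteration (8.14) with a constant small step

t = gamma * steps                       # elapsed "ODE time"
ode = rho0 * np.exp(-t)                 # trajectory of the ODE (8.15) at time t
assert abs(rho - ode) < 1e-2 * rho0     # small-step iterates track the ODE
```

As γ is decreased further, the gap between the iterate and the ODE trajectory at matched time t = jγ shrinks, which is what licenses analyzing the iteration through the stability of the ODE's stationary points.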
Theorem 8.1 Assume there exists a parameter vector ρ = ρ_c such that T_0(ρ_c) = T_d, where T_d is defined by (8.17). When the controller is defined by (8.18), ρ = ρ_c is a stable stationary point of the ODE (8.15) using the gradient approximation ∂y(ρ)/∂ρ_jkl = C^{−1}(ρ) (∂C(ρ)/∂ρ_jkl) T_0(ρ) e(ρ) if the matrix inequality

(T_d Φ_e) ⊗ T̄_d + (Φ_e T_d*) ⊗ T_dᵀ > 0  (8.21)

holds for all ω ∈ [−π, π].

Proof: See Appendix 8.A.1.

This result shows that even if the operators C^{−1}C' and T_0 do not commute, the descent method (8.14) might still be locally convergent using this approximation method. It might be argued that the condition T_0(ρ_c) = T_d is restrictive in that it may a priori be difficult to ensure that the controller structure is such that this is possible. However, in (Hjalmarsson et al., 1998) it is argued that the design specifications should be adapted to the chosen controller complexity so that T_d is achievable. In case the convergence point corresponds to a diagonal closed-loop system, the following can be stated about how much the dynamics of the channels can deviate from each other while still guaranteeing convergence.

Corollary 8.1 Assume that the reference signal r = [r_1, …, r_n]ᵀ obeys E[r_k r_l] = 0, k ≠ l, i.e. the different channels of the reference are independent. Then, when T_0(ρ_c) = T_d = diag{T_kk} is diagonal,

(T_d Φ_e) ⊗ T̄_d + (Φ_e T_d*) ⊗ T_dᵀ > 0 ∀ω ∈ [−π, π]

if and only if for all ω ∈ [−π, π]

|arg T_kk(e^{jω}) − arg T_ll(e^{jω})| < π/2 ∀k, l.

Proof: See Appendix 8.A.2.

When the controller has a diagonal form with a decoupling element, the structure can be further exploited.

Theorem 8.2 Assume there exists a parameter vector ρ = ρ_c such that T_0(ρ_c) = T_d, where T_d is defined by (8.17).
When the controller is defined by C(ρ) = W C_d(ρ), where W is some arbitrary multivariable transfer operator that is independent of ρ and where C_d(ρ) is diagonal, then ρ = ρ_c is a stable stationary point of the ODE (8.15) using the gradient approximation ∂y(ρ)/∂ρ_jkl = C^{−1}(ρ) (∂C(ρ)/∂ρ_jkl) T_0(ρ) e(ρ) if

(T_d Φ_e) ⊙ T̄_d + (Φ_e T_d*) ⊙ T_dᵀ > 0  (8.22)

for all ω ∈ [−π, π]. Here ⊙ denotes the Hadamard product.

Proof: See Appendix 8.A.3.

Corollary 8.2 Assume that the reference signal r = [r_1, …, r_n]ᵀ obeys E[r_k r_l] = 0, k ≠ l, and that the controller is defined by C(ρ) = W C_d(ρ) as in Theorem 8.2. When T_0(ρ_c) = T_d = diag{T_kk} is diagonal,

(T_d Φ_e) ⊙ T̄_d + (Φ_e T_d*) ⊙ T_dᵀ > 0

for all ω ∈ [−π, π].

Proof: See Appendix 8.A.4.

Theorem 8.3 Under the same assumptions as in Theorem 8.1, with the exception that the controller is a proportional controller, ρ = ρ_c becomes a stable stationary point of the ODE (8.15) using the gradient approximation ∂y(ρ)/∂ρ_jkl = C^{−1}(ρ) (∂C(ρ)/∂ρ_jkl) T_0(ρ) e(ρ) if

∫_{−π}^{π} ((T_d Φ_e) ⊙ T̄_d + (Φ_e T_d*) ⊙ T_dᵀ) dω > 0

when C = W C_d(ρ), or

∫_{−π}^{π} ((T_d Φ_e) ⊗ T̄_d + (Φ_e T_d*) ⊗ T_dᵀ) dω > 0

when C is of full order.

Proof: See Appendix 8.A.5.

We have given sufficient conditions for local convergence around the point T_0(ρ_c) = T_d for a method using the steepest descent method (8.14) in which the step size tends to zero and where the gradient of the output with respect to the controller parameters is approximated using approximation method M1. The method is based on the assumption that T_0(ρ)C(ρ)^{−1}C'(ρ) ≈ C(ρ)^{−1}C'(ρ)T_0(ρ). All the results state conditions on the convergence point T_0(ρ_c) = T_d. The most promising aspect of these results is that even if T_d and C^{−1}C' do not commute, we may have convergence to the true optimum. In the analysis a pure model reference criterion has been studied. Furthermore, only output-based cost functions have been considered.
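The phase condition of Corollary 8.1 is easy to probe numerically. For a diagonal T_d = diag{e^{jφ_k}} and, purely for illustration, Φ_e = I at one frequency, the matrix in (8.21) becomes diagonal with entries 2cos(φ_k − φ_l), so positive definiteness holds exactly when all phase differences are below π/2. The helper name and the phase values below are illustrative.

```python
import numpy as np

def condition_matrix(phases):
    """(Td*Phi_e) kron conj(Td) + (Phi_e*Td^H) kron Td^T, cf. (8.21), for
    diagonal Td = diag(exp(1j*phases)) and (for illustration) Phi_e = I."""
    Td = np.diag(np.exp(1j * np.asarray(phases)))
    return np.kron(Td, Td.conj()) + np.kron(Td.conj().T, Td.T)

# Phase difference below pi/2: condition (8.21) holds.
M_ok = condition_matrix([0.0, 1.0])        # |diff| = 1.0 < pi/2
assert np.min(np.linalg.eigvalsh(M_ok)) > 0

# Phase difference above pi/2: the condition fails.
M_bad = condition_matrix([0.0, 1.8])       # |diff| = 1.8 > pi/2
assert np.min(np.linalg.eigvalsh(M_bad)) <= 0
```

In a full check, the matrix would of course be formed with the actual Φ_e(ω) of (8.20) and evaluated over a grid of frequencies.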
It is desirable to extend the work in the future to include cost functions which also contain input weighting. Especially since the standard IFT approach can handle all these diﬀerent types of criteria, not just the criterion (8.13). 8.6 Numerical Illustrations We will now use the gradient approximation method M 1 in a numerical example in order to demonstrate some of the results in Section 8.5. The performance of this method will be compared with the original gradient estimation method of IFT for MIMO systems (Hjalmarsson, 1999) (denoted: the original method), which produces unbiased gradient estimates. The parameters will be updated according to ρj+1 = ρj − γj j ) ∂J(ρ , ∂ρ where N ) j 1T j ) ∂J(ρ 1 ∂y k (ρ ) = ỹk (ρj ). ∂ρ N ∂ρ k=1 The process to be controlled is the following two by two system −1 −1 G0 (q) = −2.25q 1−q −1 −2.5+3q −1 1−1.4q −1 +0.4q −2 2.25q 1−q −1 0.5−0.6q −1 1−1.4q −1 +0.4q −2 (8.23) which has a non-minimum phase zero in 1.2. The reference model is deﬁned by (1−p)q −1 0 1−pq −1 . (8.24) Td (p) = −0.2+0.24q −1 0 1−1.6q −1 +0.64q −2 183 8.6 Numerical Illustrations 0.2 0 −0.2 −0.4 −0.6 −0.8 0.75 0.8 0.85 0.9 0.95 Figure 8.3: Solid line: J(1, 2) as a function of p. Dashed line: J(2, 1) as a function of p. ρ 0.1 which 5ρ 0.1 has only one parameter free to tune. The optimal controller will be a function of p, see (8.24). The optimal parameter is given by ρc = (1−p)/9. In the simulation example we use a reference signal with the spectrum Φr = 0.25I, i.e. white noise. Since a P-controller is used Theorem 8.3 applies, which states local convergence if π ((Td Φe ) ⊗ T̄d + (Φe T∗d ) ⊗ TTd )dω > 0. 
When Td and Φr are diagonal this condition reduces to checking

    J(k, l) = ∫_{−π}^{π} |Skk|² (Tkk T̄ll + T̄kk Tll) dω > 0    for all k, l,

where Skk and Tkk denote the kth diagonal elements of Sd and Td, respectively.

Figure 8.4: Illustration of a small perturbation on ρ and with p = 0.8. Solid line: original method; dashed line: approximation M1; dotted horizontal line: optimal ρ.

Figure 8.3 shows J(1, 2) and J(2, 1) as functions of the pole p. The plot indicates that local convergence cannot be guaranteed if p < 0.917. It is worth noticing that the analytical results only give sufficient conditions, and that they are only valid for a sufficiently large number of data N; in this example N = 330. We will now compare two different reference models, Td(p = 0.8) and Td(p = 0.95). When we start with a controller that is perturbed from the optimum as ρ = ρc − 0.005, the descent algorithm converges to the true optimum for both p = 0.8 and p = 0.95, see Figure 8.4 and Figure 8.5. Notice that the descent algorithm converges even though p = 0.8 < 0.917, but the convergence rate is slow compared with the standard IFT method, which uses the true gradient. When the initial parameter is perturbed more, ρ = ρc − 0.1, we obtain the results in Figure 8.6 and Figure 8.7: when p = 0.8 the algorithm diverges, but when p = 0.95 it still converges.

Figure 8.5: Illustration of a small perturbation on ρ and with p = 0.93. Solid line: original method; dashed line: approximation M1; dotted horizontal line: optimal ρ.
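The quantities J(k, l) can be approximated by a Riemann sum over a frequency grid. The sketch below uses the reference model (8.24) and assumes Sd = I − Td for the sensitivity; the grid density is a choice of this sketch, and the resulting signs should be compared with the behavior plotted in Figure 8.3.

```python
import numpy as np

# Frequency grid on [-pi, pi]; z = e^{-jw} plays the role of q^{-1}.
w = np.linspace(-np.pi, np.pi, 4001)
z = np.exp(-1j * w)

def T11(p):
    return (1 - p) * z / (1 - p * z)

T22 = (-0.2 + 0.24 * z) / (1 - 1.6 * z + 0.64 * z**2)

def J(Tk, Tl):
    """Riemann-sum approximation of
    J(k,l) = \\int |S_kk|^2 (T_kk conj(T_ll) + conj(T_kk) T_ll) dw,
    with S_kk = 1 - T_kk (sensitivity of the diagonal reference model)."""
    Skk = 1 - Tk
    integrand = np.abs(Skk)**2 * 2.0 * np.real(Tk * np.conj(Tl))
    return np.sum(integrand) * (w[1] - w[0])

for p in (0.80, 0.90, 0.917, 0.95):
    print(p, J(T11(p), T22), J(T22, T11(p)))
```

The diagonal terms J(k, k) are always positive; it is the cross terms J(1, 2) and J(2, 1) that can change sign with p.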
In this particular example it seems that a reference model that obeys the sufficient condition J(k, l) > 0 has a larger region of convergence than a convergence point for which J(k, l) < 0. The same conclusion seems to hold when the convergence rate is considered. To evaluate for which reference models the approximation method M1 may work, the result of Corollary 8.1 might be useful. It states that the phase difference between each pair of diagonal elements in the reference model should be less than π/2. Figure 8.8 shows the Bode plots of the different elements in the reference models considered here. It is worth noticing that the difference in gain between T11(0.8) and T22 is smaller than the difference between T11(0.95) and T22. Still, the reference model Td(0.95) is better with respect to convergence. Considering the phase, we see that the difference between T11(0.95) and T22 is smaller than the difference between T11(0.8) and T22, which may explain why Td(0.95) is better than Td(0.8).

Figure 8.6: Illustration of a large perturbation on ρ and p = 0.8. Solid line: original method; dashed line: approximation M1; dotted horizontal line: optimal ρ.

The simulation example shows that a descent method using the approximation method M1 may converge to the true optimum even though the commutation error is non-zero. It has also been illustrated that the choice of reference model may have a large impact on the convergence of such an algorithm.

8.7 Conclusions

In this chapter we have examined the gradient estimation problem in IFT for MIMO systems.
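The phase comparison discussed above can be made quantitative on a grid. The sketch below computes the largest principal-value phase gap between T11(p) and T22 from (8.24); the frequency grid is an assumption of this sketch.

```python
import numpy as np

# Frequency grid (omega = 0 excluded to avoid the integrator-like behavior
# of intermediate quantities); z = e^{-jw} plays the role of q^{-1}.
w = np.linspace(1e-3, np.pi, 2000)
z = np.exp(-1j * w)

def max_phase_gap(p):
    """Largest |arg T11 - arg T22| over the grid, taken modulo 2*pi via the
    principal angle of T11 * conj(T22)."""
    T11 = (1 - p) * z / (1 - p * z)
    T22 = (-0.2 + 0.24 * z) / (1 - 1.6 * z + 0.64 * z**2)
    return np.max(np.abs(np.angle(T11 * np.conj(T22))))

for p in (0.8, 0.95):
    print(p, max_phase_gap(p), max_phase_gap(p) < np.pi / 2)
```

A gap exceeding π/2 at some frequency means the pointwise condition of Corollary 8.1 fails there, even though the integrated condition of Theorem 8.3 may still hold.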
Since the original IFT algorithm requires 1 + m × p experiments to compute the gradient for a system with m control signals and p sensed outputs, we have proposed several methods to approximate the gradient in order to decrease the experimental load.

Figure 8.7: Illustration of a large perturbation on ρ and p = 0.93. Solid line: original method; dashed line: approximation M1; dotted horizontal line: optimal ρ.

In particular, we have studied a method in which operators are shifted in a similar fashion as in IFT for SISO systems. This reduces the number of experiments to two, regardless of the complexity of the system and the controller. The shifting of multivariable operators almost always introduces an error, since such operators typically do not commute. However, the analysis gives sufficient conditions for local convergence which show that the numerical gradient search may still converge to the true optimum even if the commutation error is non-zero. Some caution has to be taken, though, as illustrated in the simulation example. So far only output-based cost functions have been considered. In the future it is desirable to extend the work to include cost functions that also contain input weighting.

Figure 8.8: Bode plot of T11(0.8) (dashed), T11(0.95) (solid) and T22 (dotted).

8.A Proofs

8.A.1 Proof of Theorem 8.1

A sufficient condition for the ODE (8.15) to be locally stable is that the linearized system

    dρ/dt = − ∂/∂ρ [∂J(ρ)/∂ρ] |_{ρ=ρc} ρ    (8.25)

is stable at the stationary point, i.e. the eigenvalues of ∂/∂ρ [∂J(ρc)/∂ρ] must have positive real parts.
A sufficient condition for the real parts of the eigenvalues of ∂/∂ρ [∂J(ρc)/∂ρ] to be positive is that

    Ĵpp(ρc) = ∂/∂ρ [∂J(ρc)/∂ρ] + (∂/∂ρ [∂J(ρc)/∂ρ])^T    (8.26)

is positive definite. Introduce

    Ajkl(ρ) = C⁻¹(ρ) ∂C(ρ)/∂ρjkl    (8.27)

and let C⁻¹ = [a1 … an], where ai is a column vector. Since each element of (8.18) is individually parameterized, Ajkl will be of rank one and can thus be expressed as

    Ajkl = Bl aj εk^T    (8.28)

where εk is the unit vector with the kth element equal to one. Using these expressions, an arbitrary element of ∂/∂ρ [∂J(ρc)/∂ρ] can be expressed as

    ∂/∂ρjkl E[ (∂y(ρ)/∂ρrst)^T (T0(ρ) − Td) r ] |_{ρ=ρc}
        = E[ (Arst(ρc) Td e(ρc))^T Td Ajkl(ρc) e(ρc) ]
        = (1/2π) ∫_{−π}^{π} Tr{ Td Ajkl(ρc) Φe(ω) (Arst(ρc) Td)* } dω.    (8.29)

With (8.28) we obtain

    Tr{Td Ajkl Φe (Arst Td)*} = Bl B̄t Tr{Td aj εk^T Φe Td* εs ar*}
        = Bl B̄t εk^T Φe Td* εs ar* Td aj
        = Bl B̄t (Φe Td*)sk ar* Td aj.    (8.30)

Let Q = Td Φe = [Qij] and introduce the block matrix

    P = [ Q̄11 a1* Td a1  ⋯  Q̄11 a1* Td an   Q̄12 a1* Td a1  ⋯
          ⋮                  ⋮                ⋮
          Q̄11 an* Td a1  ⋯  Q̄11 an* Td an   Q̄12 an* Td a1  ⋯
          Q̄21 a1* Td a1  ⋯  Q̄21 a1* Td an                  ⋯
          ⋮                                                  ⋱ ]    (8.31)

Then, using (8.29) together with (8.30) and (8.31), ∂/∂ρ [∂J(ρc)/∂ρ] can be expressed as

    ∂/∂ρ [∂J(ρc)/∂ρ] = (σ²/2π) ∫_{−π}^{π} Γ̄D P ΓD^T dω
        = (σ²/2π) ∫_{−π}^{π} Γ̄D Ξ* ((Td Φe) ⊗ Td) Ξ ΓD^T dω,    (8.32)

with ΓD and Ξ being block diagonal matrices with each diagonal block equal to Γ and C⁻¹, respectively. Using the definition (8.26) and (8.32), we obtain

    Ĵpp(ρc) = (σ²/2π) ∫_{−π}^{π} MJ dω    (8.33)

where MJ admits the factorization

    MJ = ΓD Ξ^T ((Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T) Ξ̄ ΓD*.    (8.34)

Let H(ω) = Ξ^T ((Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T) Ξ̄, so that MJ = ΓD H ΓD*. If x ∈ R^{mn²}, then

    x^T Ĵpp x = (σ²/2π) ∫_{−π}^{π} F(e^{jω}) H(ω) F(e^{jω})* dω    (8.35)

where F(e^{jω}) = x^T ΓD is a multiple-input single-output filter, and thus x^T Ĵpp x ≥ 0 with equality if and only if x = 0. This holds under the assumption that H > 0 for all ω ∈ [−π, π].
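As a numerical companion to the positivity argument for H, the sketch below evaluates the Kronecker-product condition for hypothetical diagonal Td and Φe, where positivity reduces to the phase gaps between the diagonal entries (cf. Corollary 8.1).

```python
import numpy as np

def kron_condition(Td, Phi_e):
    """Smallest eigenvalue of the Hermitian part of
    (Td Phi_e) kron conj(Td) + (Phi_e Td^H) kron Td^T."""
    K = np.kron(Td @ Phi_e, np.conj(Td)) + np.kron(Phi_e @ Td.conj().T, Td.T)
    K = 0.5 * (K + K.conj().T)
    return np.linalg.eigvalsh(K).min()

phi = np.diag([0.25, 0.25])
# Diagonal Td with phase gap below pi/2: condition should hold.
Td_ok = np.diag([np.exp(0.2j), np.exp(0.5j)])
# Diagonal Td with phase gap above pi/2: condition should fail.
Td_bad = np.diag([np.exp(0.0j), np.exp(2.0j)])
print(kron_condition(Td_ok, phi) > 0, kron_condition(Td_bad, phi) < 0)
```

In the diagonal case the matrix is diagonal with entries 2 Φ_{r_k} |Skk|² |Tkk||Tll| cos(arg Tkk − arg Tll), which makes the π/2 phase threshold explicit.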
Furthermore, H > 0 if and only if (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T > 0, since Ξ is square and has full rank for all ω ∈ [−π, π]. This concludes the proof.

8.A.2 Proof of Corollary 8.1

When E[rk rl] = 0, k ≠ l, and Td is diagonal, Φe becomes diagonal, i.e. Φe = diag[|Skk|² Φrk]. It follows that

    (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T = diag[|Skk|² Φrk (Tkk T̄ll + Tll T̄kk)],

where the diagonal runs over all pairs (k, l). Hence (Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T > 0 if and only if Tkk T̄ll + Tll T̄kk > 0 for all k, l and ω. Moreover, Tkk T̄ll + Tll T̄kk > 0 if and only if |arg Tkk(e^{jω}) − arg Tll(e^{jω})| < π/2 for all k, l and ω.

8.A.3 Proof of Theorem 8.2

Introduce the extended process G̃0 = G0 W and let Cd = diag[Cii]; then the outline of the proof of Theorem 8.1 can be followed. Let Q = Td Φe = [Qij]. By exploiting the diagonal structure of the controller, (8.30) becomes

    Tr{Td Ajjt Φe (Arrl Td)*} = Bl B̄t Cjj⁻¹ C̄rr⁻¹ Trj Q̄rj    (8.36)

and again Ĵpp(ρc) can be written as (8.33), but now

    MJ = ΓD C⁻¹ ((Td Φe) ∘ T̄d + (Φe Td*) ∘ Td^T) C̄⁻¹ ΓD*.    (8.37)

Then Ĵpp(ρc) > 0 if (Td Φe) ∘ T̄d + (Φe Td*) ∘ Td^T > 0 for all ω ∈ [−π, π]. This follows from the factorization (8.37) and the proof of Theorem 8.1.

8.A.4 Proof of Corollary 8.2

When E[rk rl] = 0, k ≠ l, and Td is diagonal, Φe becomes diagonal, i.e. Φe = diag[|Skk|² Φrk]. It follows that

    (Td Φe) ∘ T̄d = diag[|Skk|² Φrk |Tkk|²],

which is clearly positive, and hence (Td Φe) ∘ T̄d + (Φe Td*) ∘ Td^T > 0.

8.A.5 Proof of Theorem 8.3

In the proof of Theorem 8.1, use the fact that a P-controller is frequency independent. Then a sufficient condition for (8.33) to be positive definite is that

    ∫_{−π}^{π} ((Td Φe) ⊗ T̄d + (Φe Td*) ⊗ Td^T) dω > 0.

A similar reasoning holds for the diagonal case.

Chapter 9

Summary and Suggestions for Future Work

Several different problems related to experiment design have been studied in this thesis.
The main focus has been on optimal experiment design for identification, but experiment design for controller tuning based on closed-loop data has also been considered. This chapter briefly summarizes the main contributions, and suggestions for future work are presented.

9.1 Summary

Chapter 3 – Fundamentals of Optimal Experiment Design

Typical optimal experiment design formulations are non-convex and infinite dimensional, and thus intractable from an optimization point of view. A flexible framework for translating such problems into finite-dimensional convex programs is introduced in this chapter. The framework allows for a large class of quality constraints, including e.g. the classical A-, E-, D- and L-optimal constraints. A more important contribution from a control perspective is the introduction of different types of frequency-by-frequency constraints on the frequency function estimate. Here both variance constraints and certain types of constraints that are guaranteed to hold in a confidence region are considered.

The key to obtaining tractable experiment design formulations lies in the parametrization of the design spectrum. Two general representations are introduced that generalize previously presented parametrizations. They have in common that they are linear and finite-dimensional parametrizations of the spectrum or a partial expansion thereof. Both continuous and discrete spectra can be handled, and constraints on these spectra can be included in the design formulation, either in terms of power bounds or as frequency-wise constraints. Design in both open and closed loop is considered. Furthermore, results on input design for models with biased noise dynamics are presented.

Chapter 4 – Finite Sample Input Design for Linearly Parametrized Models

Most experiment designs rely on uncertainty descriptions that are valid asymptotically in the sample size.
In this chapter a certain class of linearly parametrized models is considered for which exact variance expressions can be obtained for finite sample sizes. Based on these expressions, two solutions that perform the optimization over the square of the DFT coefficients of the input are presented.

Chapter 5 – Applications

A comparison is performed between the use of optimal inputs and the use of standard identification input signals, for example PRBS signals, for two benchmark problems taken from process control and control of flexible mechanical structures. The results show that there is a substantial reduction in experiment time and input excitation level when optimal inputs are used. Monte-Carlo simulations indicate that there are advantages also in the case where the true system is replaced by a model estimate in the input design.

Chapter 6 – Input Design for Identification of Zeros

Accurate identification of a non-minimum phase zero, zo, is the topic of Chapter 6. It is shown that the optimal input is characterized by the autocorrelations E[u(t)u(t − k)] = α zo^{−|k|} for FIR and ARX model structures. This is also true for general linear models, asymptotically in the model order. Furthermore, it is shown that the variance of an estimated zero is independent of the model order when the optimal input is applied, provided the model order is larger than or equal to the true order. A numerical solution has been derived for general linear models of finite order. An example illustrates that the optimal input may be very different depending on model structure and order. It is also shown that the variance can be reduced significantly using optimally designed inputs compared to white inputs and square-waves, especially when the model is over-parameterized. Finally, a solution where the true zero is replaced by an estimated zero is shown to be quite robust with respect to the estimated zero location.
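An input with the autocorrelation E[u(t)u(t − k)] = α zo^{−|k|} can be realized by filtering white noise through a first-order autoregressive filter with pole 1/zo. A minimal sketch, where the zero location and sample length are illustrative and not the thesis benchmark values:

```python
import numpy as np

z0 = 1.2                 # non-minimum phase zero to be identified
a = 1.0 / z0             # AR coefficient giving r_k proportional to z0**(-|k|)
rng = np.random.default_rng(1)
N = 200_000

# AR(1) realization u(t) = a*u(t-1) + e(t), e white noise.
e = rng.standard_normal(N)
u = np.empty(N)
u[0] = e[0]
for t in range(1, N):
    u[t] = a * u[t - 1] + e[t]

# Sample autocorrelations; theory gives r1/r0 = a exactly.
r0 = np.dot(u, u) / N
r1 = np.dot(u[1:], u[:-1]) / (N - 1)
print(abs(r1 / r0 - a) < 0.05)
```

The scaling α only sets the input power; the exponential decay rate zo^{−|k|} is what concentrates the excitation where it is informative about the zero.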
Chapter 7 – Convex Computation of Worst Case Criteria

The main contributions are the introduction of a generalized cost function, which e.g. includes the worst case Vinnicombe distance, and a method that computes an upper bound on this cost function without using frequency gridding. The gridding is avoided by introducing a relaxation of a certain variable and by using the Kalman-Yakubovich-Popov lemma.

Chapter 8 – Gradient Estimation in IFT for Multivariable Systems

The gradient estimation problem in IFT for MIMO systems may be very costly experimentally. Several methods to approximate the gradient are suggested in Chapter 8. A method in which the order of operators is shifted, in a similar fashion as in IFT for SISO systems, is analyzed in more depth. This reduces the number of experiments to two, regardless of the complexity of the system and the controller. Multivariable operators do not in general commute. However, the analysis gives sufficient conditions for local convergence which show that the numerical gradient search may still converge to the true optimum even if the commutation error is non-zero. Some caution has to be taken, though, as illustrated in the simulation example.

9.2 Future Work

The work in this thesis has generated some ideas for future work. Some suggestions are:

• Extend the results and study optimal experiment design for multi-input/multi-output systems.

• The Kalman-Yakubovich-Popov lemma is frequently used to handle frequency-wise constraints. The existing formulations of these constraints are computationally costly and a reformulation would be useful.

• Optimal experiment designs depend in most cases on the true underlying system. Further knowledge of the sensitivity of a design based on an estimate of the true system may be useful for designing more robust experiments.
• Examples show that accurate models can be obtained by a proper choice of excitation even though the model order is lower than the true order. This should be explored further, and connections to non-linear systems are of high relevance.

• In Chapter 6, input design for identification of zeros is considered. In the future, other quantities such as poles, but also important properties such as gain and phase margins, would be interesting to study.

• The method for the computation of an upper bound in Chapter 7 has the disadvantage that a relaxation is imposed, leading to conservativeness. It is of great interest to investigate further how conservative these results may be and whether there are ways around this. This also applies to the results in Section 3.7.

• For the gradient estimation problem in Chapter 8, only output-based cost functions have been considered. It is desirable to extend the work in the future to include cost functions which also contain input weighting.

Bibliography

Alkire, B. and L. Vandenberghe (2002). Convex optimization problems involving finite autocorrelation sequences. Mathematical Programming Series A 93, 331–359.

Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Prentice-Hall. New Jersey.

Bittanti, S., M. C. Campi and S. Garatti (2002). New results on the asymptotic theory of system identification for the assessment of the quality of estimated models. In: Proc. 41st IEEE Conf. on Decision and Control. Las Vegas, Nevada, USA.

Bohlin, T. (1991). Interactive System Identification: Prospects and Pitfalls. Springer-Verlag.

Bombois, X., B. D. O. Anderson and M. Gevers (2000a). Mapping parametric confidence ellipsoids to Nyquist plane for linearly parametrized transfer functions. In: Model Identification and Adaptive Control. pp. 53–71. Springer Verlag.

Bombois, X., B. D. O. Anderson and M. Gevers (2004a). Quantification of frequency domain error bounds with guaranteed confidence level in Prediction Error Identification. Systems and Control Letters.
Submitted.

Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den Hof (2004b). Cheapest open-loop identification for control. In: Proc. 43rd IEEE Conf. on Decision and Control. Bahamas.

Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den Hof (2004c). Least costly identification experiment for control. In: MTNS'04. Leuven, Belgium.

Bombois, X., G. Scorletti, M. Gevers, R. Hildebrand and P. Van den Hof (2004d). Least costly identification experiment for control. Automatica. Submitted.

Bombois, X., M. Gevers and G. Scorletti (1999). Controller validation based on an identified model. In: Proc. 38th IEEE Conf. on Decision and Control. Phoenix, Arizona, USA.

Bombois, X., M. Gevers and G. Scorletti (2000b). A measure of robust stability for an identified set of parametrized transfer functions. IEEE Trans. Automatic Control 45, 2141–2145.

Bombois, X., M. Gevers, G. Scorletti and B. D. O. Anderson (2001). Robustness analysis tool for an uncertainty set obtained by prediction error identification. Automatica 37, 1629–1636.

Box, G. E. P. and G. M. Jenkins (1970). Time Series Analysis, Forecasting and Control. Holden-Day. San Francisco.

Boyd, S. and C. Barratt (1991). Linear Controller Design: Limits of Performance. Prentice-Hall.

Boyd, S. and L. Vandenberghe (2003). Convex Optimization. Cambridge University Press.

Boyd, S., L. El Ghaoui, E. Feron and V. Balakrishnan (1994). Linear Matrix Inequalities in System and Control Theory. Studies in Applied Mathematics. SIAM. Philadelphia.

Bruyne, F. De and P. Carrette (1997). Synthetic generation of the gradient for an iterative controller optimization method. In: Proceedings of the European Control Conference. Brussels, Belgium.

Byrnes, C. I., S. V. Gusev and A. Lindquist (2001). From finite covariance windows to modeling filters: A convex optimization approach. SIAM Review 43, 645–675.

Campi, M. C. and E. Weyer (2002). Finite sample properties of system identification methods. IEEE Trans.
Automatic Control 47, 1329–1334.

Cooley, B. L. and J. H. Lee (2001). Control-relevant experiment design for multivariable systems described by expansions in orthonormal bases. Automatica 37, 273–281.

De Bruyne, F., B. D. O. Anderson, M. Gevers and N. Linard (1997). Iterative controller optimization for nonlinear systems. In: Proc. 36th IEEE Conf. on Decision and Control. San Diego, California.

Dumitrescu, B., I. Tăbuş and P. Stoica (2001). On the parameterization of positive real sequences and MA parameter estimation. IEEE Transactions on Signal Processing 49, 2630–2639.

Eykhoff, P. (1974). System Identification, Parameter and State Estimation. Wiley.

Fedorov, V. V. (1972). Theory of Optimal Experiments. Volume 12 of Probability and Mathematical Statistics. Academic Press.

Forssell, U. and L. Ljung (2000). Some results on optimal experiment design. Automatica 36(5), 749–756.

Garatti, S., M. C. Campi and S. Bittanti (2003). Model quality assessment for instrumental variable methods: use of the asymptotic theory in practice. In: Proc. 42nd IEEE Conf. on Decision and Control. Maui, Hawaii, USA.

Gevers, M. (1993). Towards a joint design of identification and control? In: Essays on Control: Perspectives in the Theory and its Applications (H. L. Trentelman and J. C. Willems, Eds.). Birkhäuser.

Gevers, M. and L. Ljung (1986). Optimal experiment designs with respect to the intended model application. Automatica 22, 543–554.

Gevers, M., X. Bombois, B. Codrons, G. Scorletti and B. D. O. Anderson (2003). Model validation for control and controller validation in a prediction error identification framework – Part I: theory. Automatica 39, 403–415.

Gillberg, J. and A. Hansson (2003). Polynomial complexity for a Nesterov-Todd potential-reduction method with inexact search directions. In: Proceedings of the 42nd IEEE Conference on Decision and Control. Maui, Hawaii, USA. p. 6.

Goodwin, G. C. (1982). Experiment design.
In: 6th IFAC Symposium on System Identification. Washington D.C. pp. 19–26.

Goodwin, G. C. and R. L. Payne (1977). Dynamic System Identification: Experiment Design and Data Analysis. Volume 136 of Mathematics in Science and Engineering. Academic Press.

Grenander, U. and G. Szegö (1958). Toeplitz Forms and Their Applications. University of California Press. Berkeley, CA.

Gustavsson, I., L. Ljung and T. Söderström (1977). Identification of processes in closed loop – identifiability and accuracy aspects. Automatica 13, 59–75.

Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. J. Wiley & Sons. N.Y.

Hansson, A. and L. Vandenberghe (2001). A primal-dual potential reduction method for integral quadratic constraints. In: 2001 American Control Conference. Arlington, Virginia. pp. 3013–3018.

Hildebrand, R. and M. Gevers (2003). Identification for control: Optimal input design with respect to a worst case ν-gap cost function. SIAM Journal on Control and Optimization 41(5), 1586–1608.

Hindi, H., B. Hassibi and S. Boyd (1998). Multiobjective H2/H∞ optimal control via finite-dimensional Q-parametrization and linear matrix inequalities. In: American Control Conference (ACC 1998). Philadelphia, Pennsylvania, USA.

Hjalmarsson, H. (1998). Control of nonlinear systems using Iterative Feedback Tuning. In: Proc. 1998 American Control Conference. Philadelphia. pp. 2083–2087.

Hjalmarsson, H. (1999). Efficient tuning of linear multivariable controllers using iterative feedback tuning. International Journal of Adaptive Control and Signal Processing 13, 553–572.

Hjalmarsson, H. (2002). Iterative feedback tuning – an overview. International Journal of Adaptive Control and Signal Processing 16, 373–395.

Hjalmarsson, H. (2003). From experiments to closed loop control. In: 13th IFAC Symposium on System Identification. Rotterdam, The Netherlands.

Hjalmarsson, H. (2004). From experiments to control. Automatica.

Hjalmarsson, H. and B. Ninness (2004).
An exact finite sample variance expression for linearly parameterized frequency function estimates. Automatica. Submitted.

Hjalmarsson, H. and H. Jansson (2003). Using a sufficient condition to analyze the interplay between identification and control. In: 13th IFAC Symposium on System Identification. Rotterdam, The Netherlands.

Hjalmarsson, H. and K. Lindqvist (2002). Identification of performance limitations in control using ARX-models. In: Proceedings of the 15th IFAC World Congress.

Hjalmarsson, H. and M. Gevers (2003). Algorithms and applications of iterative feedback tuning. Special Section in Control Engineering Practice 11, 1021.

Hjalmarsson, H., M. Gevers and F. De Bruyne (1996). For model-based control design, closed loop identification gives better performance. Automatica 32(12), 1659–1673.

Hjalmarsson, H., M. Gevers and O. Lequin (1998). Iterative Feedback Tuning: Theory and Applications. IEEE Control Systems Magazine 18(4), 26–41.

Hjalmarsson, H., S. Gunnarsson and M. Gevers (1994). A convergent iterative restricted complexity control design scheme. In: Proc. 33rd IEEE Conf. on Decision and Control. Orlando, FL. pp. 1735–1740.

Jacobsen, E. W. (1994). Identification for control of strongly interactive plants. In: AIChE Annual Meeting. San Francisco, CA.

Jansson, H. and H. Hjalmarsson (2004a). A framework for mixed H∞ and H2 input design. In: MTNS'04. Leuven, Belgium.

Jansson, H. and H. Hjalmarsson (2004b). A general framework for mixed H∞ and H2 input design. IEEE Trans. Aut. Contr. Submitted.

Jansson, H. and H. Hjalmarsson (2004c). Mixed H∞ and H2 input design for identification. In: CDC'04. Bahamas.

Johansson, R. (1993). System Modeling and Identification. Information and System Sciences Series. Prentice-Hall. New Jersey.

Jönsson, U. (1996). Robustness analysis of uncertain and nonlinear systems. PhD thesis. Lund Institute of Technology, Lund, Sweden.

Kao, C. Y., A. Megretski and U. Jönsson (2004).
Specialized fast algorithms for IQC feasibility and optimization problems. Automatica 40(2), 239–252.

Karlin, S. and W. Studden (1966a). Optimal experimental designs. Ann. Math. Stat. 37, 783–815.

Karlin, S. and W. Studden (1966b). Tchebycheff Systems: With Applications in Analysis and Statistics. Wiley-Interscience.

Kiefer, J. (1959). Optimum experimental designs. J. Royal Stat. Soc. 21, 273–319.

Kiefer, J. and J. Wolfowitz (1959). Optimum designs in regression problems. Ann. Math. Stat. 30, 271–294.

Lacy, S. L., D. S. Bernstein and R. S. Erwin (2003). Finite-horizon input selection for system identification. In: IEEE Conference on Decision and Control. Maui, Hawaii, USA.

Landau, I. D., D. Rey, A. Karimi, A. Voda and A. Franco (1995a). A flexible transmission system as a benchmark for robust digital control. European Journal of Control 1(2), 77–96.

Landau, I. D., D. Rey, A. Karimi, A. Voda and A. Franco (1995b). A flexible transmission system as a benchmark for robust digital control. Journal of Process Control 1, 77–96.

Lee, J. H. (2003). Control-relevant design of periodic test input signals for iterative open-loop identification of multivariable FIR systems. In: 13th IFAC Symposium on System Identification. Rotterdam, The Netherlands.

Lee, W. S., B. D. O. Anderson, R. L. Kosut and I. M. Y. Mareels (1993). A new approach to adaptive robust control. IJACSP 7, 183–211.

Levin, M. J. (1960). Optimal estimation of impulse response in the presence of noise. IRE Transactions on Circuit Theory CT-7, 50–56.

Lindqvist, K. (2001). On experiment design in identification of smooth linear systems. Licentiate thesis, TRITA-S3-REG-0103.

Lindqvist, K. and H. Hjalmarsson (2000). Optimal input design using linear matrix inequalities. In: Proc. 12th IFAC Symposium on System Identification. Santa Barbara, California, USA.

Lindqvist, K. and H. Hjalmarsson (2001). Identification for control: Adaptive input design using convex optimization.
In: Conference on Decision and Control. Orlando, Florida, USA.

Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control AC-22(4), 551–575.

Ljung, L. (1985). Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control AC-30, 834–844.

Ljung, L. (1999). System Identification – Theory for the User, 2nd ed. PTR Prentice Hall. Upper Saddle River, N.J.

Ljung, L. and Z. D. Yuan (1985). Asymptotic properties of black-box identification of transfer functions. IEEE Transactions on Automatic Control AC-30, 514–530.

Mårtensson, J. and H. Hjalmarsson (2003). Identification of performance limitations in control using general SISO-models. In: 13th IFAC Symposium on System Identification.

Megretski, A. and A. Rantzer (1997). System analysis via integral quadratic constraints. IEEE Transactions on Automatic Control AC-42(6), 819–830.

Mehra, R. K. (1974). Optimal input signals for parameter estimation in dynamic systems – survey and new results. IEEE Transactions on Automatic Control AC-19, 753–768.

Mehra, R. K. (1981). Choice of input signals. In: Trends and Progress in System Identification (P. Eykhoff, Ed.). Pergamon Press, Oxford.

Milanese, M. and A. Vicino (1991). Optimal estimation theory for dynamic systems with set membership uncertainty: An overview. Automatica 27(6), 997–1009.

Morari, M. and E. Zafiriou (1989). Robust Process Control. Prentice-Hall. Englewood Cliffs, NJ.

Nesterov, Y. and A. Nemirovski (1994). Interior-Point Polynomial Methods in Convex Programming. Studies in Applied Mathematics 13. SIAM. Philadelphia, PA.

Ng, T. S., G. C. Goodwin and R. L. Payne (1977a). On maximal accuracy estimation with output power constraints. IEEE Trans. Automatic Control 22, 133–134.

Ng, T. S., G. C. Goodwin and T. Söderström (1977b). Optimal experiment design for linear systems with input-output constraints. Automatica 13, 571–577.
Ninness, B. and H. Hjalmarsson (2002a). Exact quantification of variance error. In: 15th World Congress on Automatic Control. Barcelona, Spain.

Ninness, B. and H. Hjalmarsson (2002b). Quantification of variance error. In: 15th World Congress on Automatic Control. Barcelona, Spain.

Ninness, B. and H. Hjalmarsson (2003). The analysis of variance error: Quantifications exact for finite model order. In: IEEE Conference on Decision and Control. Maui, Hawaii, USA.

Ninness, B. and H. Hjalmarsson (2004). Variance error quantifications that are exact for finite model order. IEEE Transactions on Automatic Control.

Ninness, B., H. Hjalmarsson and F. Gustafsson (1999). The fundamental role of general orthonormal bases in system identification. IEEE Transactions on Automatic Control AC-44, 1384–1406.

Ogunnaike, B. A. (1996). A contemporary industrial perspective on control theory and practice. Annual Reviews in Control 20, 1–8.

Payne, R. L. and G. C. Goodwin (1974). Simplification of frequency domain experiment design for SISO systems. Technical Report 74/3. Imperial College, London.

Pintelon, R. and J. Schoukens (2001). System Identification: A Frequency Domain Approach. IEEE Press.

Porat, B. (1994). Digital Processing of Random Signals. Prentice-Hall. Englewood Cliffs, NJ.

Rantzer, A. (1996). On the Kalman-Yakubovich-Popov lemma. Systems and Control Letters 28(1), 7–10.

Rivera, D. E., H. Lee, M. W. Braun and H. D. Mittelmann (2003). Plant-friendly system identification: a challenge for the process industries. In: 13th IFAC Symposium on System Identification. Rotterdam, The Netherlands. pp. 917–922.

Rivera, D. E., M. W. Braun and H. D. Mittelmann (2002). Constrained multisine inputs for plant-friendly identification of chemical processes. In: 15th IFAC World Congress. Barcelona, Spain.

Samyudia, Y. and J. H. Lee (2000). A two-step approach to control-relevant design of test input signals for iterative system identification. In: 12th IFAC Symposium on System Identification.
Santa Barbara, California, USA.

Schur, I. (1918). On power series which are bounded in the interior of the unit circle I and II. J. Reine Angew. Math. 148, 122–145.

Shirt, R. W., T. J. Harris and D. W. Bacon (1994). Experimental design considerations for dynamic systems. Ind. Eng. Chem. Res. 33, 2656–2667.

Sjöberg, J. and F. De Bruyne (1999). On a nonlinear controller tuning strategy. In: 14th IFAC World Congress. Vol. I. Beijing, P.R. China. pp. 343–348.

Sjöberg, J. and M. Agarwal (1996). Model-free repetitive control design for nonlinear systems. In: Proc. 35th IEEE Conf. on Decision and Control. Vol. 4. Kobe, Japan. pp. 2824–2829.

Sjöberg, J. and M. Agarwal (1997). Nonlinear controller tuning based on linearized time-variant model. In: Proc. of American Control Conference. Albuquerque, New Mexico.

Skogestad, S. (2003). Simple analytic rules for model reduction and PID controller tuning. Journal of Process Control 13, 291–309.

Skogestad, S. and I. Postlethwaite (1996). Multivariable Feedback Control: Analysis and Design. John Wiley and Sons.

Söderström, T. and P. Stoica (1989). System Identification. Prentice-Hall International. Hemel Hempstead, UK.

Stoica, P. and R. Moses (1997). Introduction to Spectral Analysis. Prentice Hall. New Jersey.

Stoica, P. and T. Söderström (1982). A useful parameterization for optimal experiment design. IEEE Trans. Aut. Contr. AC-27(4), 986–989.

Stoica, P., T. McKelvey and J. Mari (2000). MA estimation in polynomial time. IEEE Transactions on Signal Processing 48, 1999–2012.

Trulsson, E. and L. Ljung (1985). Adaptive control based on explicit criterion minimization. Automatica 21(4), 385–399.

Tulleken, H. J. A. (1990). Generalized binary noise test-signal concept for improved identification-experiment design. Automatica 26(1), 37–49.

Van den Hof, P. M. J. and R. J. P. Schrama (1995). Identification and control – closed loop issues. Automatica 31(12), 1751–1770.

Vinnicombe, G. (1993).
Frequency domain uncertainty and the graph topology. IEEE Transactions on Automatic Control AC38(9), 1371–1383. Wahlberg, B. (1991). System Identiﬁcation Using Laguerre Models. IEEE Transactions on Automatic Control AC-36(5), 551–562. Wahlberg, B. (1994). System Identiﬁcation Using Kautz Models. IEEE Transactions on Automatic Control AC-39(6), 1276–1281. Wahlberg, B. and L. Ljung (1992). Hard Frequency-Domain Model Error Bounds from Least-Squares Like Identiﬁcation Techniques. IEEE Transactions on Automatic Control AC-37(7), 900–912. Wallin, Ragnar, Anders Hansson and L. Vandenberghe (2003). Comparison of two structure-exploiting optimization algorithms for integral quadratic constraints. In: 4th IFAC symposium on robust control design. Milan, Italy. Weyer, E. and M. C. Campi (2002). Non-asymptotic conﬁdence ellipsoids for the least-square estimate. Automatica 38, 1539–1547. Whitaker, H.P., J. Yamron and A. Kezer (1958). Design of modelreference adaptive control systems for aircraft.. In: Technical report, Report R–164, Instrumentation Laboratory. MIT, MA. Wu, S-P., S. Boyd and L. Vandenberghe (1996). FIR Filter Design via Semideﬁnite Programming and Spectral Factorization. In: Proc. 35th IEEE Conf. on Decision and Control. Kobe, Japan. pp. 271– 276. Xie, L.-L and L. Ljung (2001). Asymptotic variance expressions for estimated frequency functions. IEEE Transactions on Automatic Control AC-46, 1887–1899. Bibliography 207 Yakubovich, V. A. (1962). Solution of certain matrix inequalities occuring in the theory of automatic control. Docl. Acad. Nauk. SSSR pp. 1304–1307. Yuan, Z. D. and L. Ljung (1984). Black-box identiﬁcation of multivariable transfer functions – asymptotic properties and optimal input design. Int. J. Control 40(2), 233–256. Yuan, Z. D. and L. Ljung (1985). Unprejudiced optimal open loop input design for identiﬁcation of transfer functions. Automatica 21(6), 697–708. Zang, Z., R. R. Bitmead and M. Gevers (1995). 
Iterative weighted leastsquares identiﬁcation and weighted LQG control design. Automatica 31, 1577–1594. Zarrop, M. (1979). Design for Dynamic System Identiﬁcation. Lecture Notes in Control and Information Sciences. Sci. 21. Springer Verlag, Berlin. Zhou, K., J.C. Doyle and K. Glover (1996). Robust and Optimal Control. Prentice Hall, New Jersey. Zhu, Y. C. (1998). Multivariable process identiﬁcation for MPC: the asymptotic method and its applications. J. Proc. Control 8(2), 101– 115. Zhu, Y. C. (2001). Multivariable System Identiﬁcation for Process Control. Pergamon. Oxford. Zhu, Y. C. and P. P. J. van den Bosch (2000). Optimal closed-loop identiﬁcation test design for internal model control. Automatica 36, 1237– 1241.
