On Input Design for System Identification
Input Design Using Markov Chains

CHIARA BRIGHENTI

Master's Degree Project
Stockholm, Sweden, March 2009
XR-EE-RT 2009:002

Abstract

When system identification methods are used to construct mathematical models of real systems, it is important to collect data that reveal useful information about the system dynamics. Experimental data are always corrupted by noise, and this causes uncertainty in the model estimate. Therefore, the design of input signals that guarantee a certain model accuracy is an important issue in system identification.

This thesis studies input design problems for system identification where time domain constraints have to be considered. A finite Markov chain is used to model the input of the system. This makes it possible to include input amplitude constraints directly in the input model, by properly choosing the state space of the Markov chain. The state space is defined so that the model generates a binary signal. The probability distribution of the Markov chain is shaped in order to minimize an objective function defined in the input design problem.

Two identification issues are considered in this thesis: parameter estimation and NMP zeros estimation of linear systems. Stochastic approximation is needed to minimize the objective function in the parameter estimation problem, while an adaptive algorithm is used to consistently estimate NMP zeros.

One of the main advantages of this approach is that the input signal can be easily generated by extracting samples from the designed optimal distribution. No spectral factorization techniques or realization algorithms are required to generate the input signal. Numerical examples show how these models can improve system identification with respect to other input realization techniques.

Acknowledgements

Working on my thesis at the Automatic Control department at KTH has been a great experience. I would like to thank my supervisor Professor Bo Wahlberg and my advisor Dr.
Cristian R. Rojas for giving me the opportunity to work on very interesting research topics. Thank you for your guidance. Special thanks go to Cristian R. Rojas, for the time he spent with me and for the many ideas he shared with me. Thanks to all the people I had the pleasure of getting to know during this work period: Mitra, Andre B., Fotis, Pedro, Mohammad, Andre, Pierluigi, Matteo, Davide and Alessandro. I really enjoyed your company. I would like to thank Professor Giorgio Picci, who made this experience possible. Finally, thanks to my family and my friends for their continuous help and support.

Acronyms

AR    Autoregressive
FDSA  Finite Difference Stochastic Approximation
FIR   Finite Impulse Response
LMI   Linear Matrix Inequality
LTI   Linear Time Invariant
NMP   Non Minimum Phase
PEM   Prediction Error Method
PRBS  Pseudo Random Binary Signal
RLS   Recursive Least Squares
SISO  Single Input Single Output
SPSA  Simultaneous Perturbation Stochastic Approximation
WN    White Noise

Contents

Abstract
1 Introduction
  1.1 Thesis outline and contributions
2 System Identification
  2.1 System and model description
  2.2 Identification method
  2.3 Estimate uncertainty
    2.3.1 Parameter uncertainty
    2.3.2 Frequency response uncertainty
  2.4 Conclusions
3 Input Design for System Identification
  3.1 Optimal input design problem
  3.2 Measures of estimate accuracy
    3.2.1 Quality constraint based on the parameter covariance
    3.2.2 Quality constraint based on the model variance
    3.2.3 Quality constraint based on the confidence region
  3.3 Input spectra parametrization
    3.3.1 Finite dimensional spectrum parametrization
    3.3.2 Partial correlation parametrization
  3.4 Covariance matrix parametrization
  3.5 Signal constraints parametrization
  3.6 Limitations of input design in the frequency domain
  3.7 Conclusions
4 Markov Chain Input Model
  4.1 Introduction
  4.2 Markov chains model
    4.2.1 Markov chain state space
    4.2.2 Markov chains spectra
  4.3 More general Markov chains
  4.4 Conclusions
5 Estimation Using Markov Chains
  5.1 Problem formulation
  5.2 Solution approach
  5.3 Cost function evaluation
  5.4 Algorithm description
  5.5 Numerical example
  5.6 Conclusions
6 Zero Estimation
  6.1 Problem formulation
  6.2 FIR
  6.3 ARX
  6.4 General linear SISO systems
  6.5 Adaptive algorithm for time domain input design
    6.5.1 A motivation
    6.5.2 Algorithm description
    6.5.3 Algorithm modification for Markov chain signals generation
    6.5.4 Numerical example
  6.6 Conclusions
7 Summary and future work
  7.1 Summary
  7.2 Future work
References

List of Figures

4.1 Graph representation of the two states Markov chain S2.
4.2 Graph representation of the four states Markov chain S4.
4.3 Graph representation of the three states Markov chain S3.
5.1 Mass-spring-damper system.
5.2 Cost functions f2 and f3 in the interval of acceptable values of the variables umax and ymax, respectively.
5.3 Cost functions f1, f4, f5 in the interval of acceptable values.
5.4 Estimate of the cost function on a discrete set of points for the two states Markov chain in the case 2.
5.5 Estimation of the best transition probability for the two states Markov chain in the case 2.
5.6 Estimation of the best transition probability p for the four states Markov chain in the case 2.
5.7 Estimation of the best transition probability r for the four states Markov chain in the case 2.
5.8 Estimate of the cost function on a discrete set of points for the two states Markov chain in the case 1.
5.9 Estimate of the cost function on a discrete set of points for the four states Markov chain in the case 1.
5.10 Bode diagrams of the optimal spectra of the 2 states Markov chains in the cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.11 Bode diagrams of the optimal spectra of the 4 states Markov chains in the cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.12 Estimates of the frequency response of the system using the optimal two states Markov chains for the cases 1, 2 and 3 in Table 5.1.
5.13 Estimates of the frequency response of the system using the optimal four states Markov chains for the cases 1, 2 and 3 in Table 5.1.
6.1 Representation of the adaptive algorithm iteration at step k. uk denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction ŷ(k).
6.2 A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.
6.3 Normalized variance of the estimation error for the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

List of Tables

4.1 Poles and zeros of the canonical spectral factor of the spectrum of s_m of Example 4.2.4.
5.1 Maximum threshold values in the three analyzed cases.
5.2 Results of 100 Monte-Carlo simulations of the algorithm with the 2 states Markov chain.
5.3 Optimal values of the transition probabilities in the cases 1, 2 and 3, obtained after 30000 algorithm iterations.
5.4 Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.
5.5 Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function and the optimal spectrum in case 2.
5.6 Total cost function value obtained with the optimal Markov inputs, a PRBS and white noise in case 3.
5.7 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two states Markov chains.
5.8 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four states Markov chains.

Chapter 1
Introduction

Mathematical models for systems are necessary in order to predict their behavior and as parts of their control systems. This work focuses on models constructed and validated from experimental input/output data, by means of identification methods.

The information obtained through experiments on the real system depends on the input excitation, which is often limited by amplitude or power constraints. For this reason, experiment design is necessary in order to obtain system estimates within a given accuracy, saving time and cost of the experiment [1]. Robustness of input design for system identification is also one of the most important issues, especially when the model of the system is used for designing its control system. Some studies on this problem are presented in [2]-[7]. The effects of undermodeling on input design are pointed out in [8] and [9].

Depending on the cost function considered in this setting, input design can typically be solved as a constrained optimization problem. In the Prediction Error Method (PEM) framework it is common to use, as a measure of the estimate accuracy, a function of the asymptotic covariance matrix of the parameter estimate. This matrix depends on the input spectrum, which can then be shaped in order to obtain a "small" covariance matrix and improve the estimate accuracy (see [10], [11]). Usually, a constraint on the input power is also included; in this way, time domain amplitude constraints are approximately translated into the frequency domain [12].
A first disadvantage of these methods is that they are strongly influenced by the initial knowledge of the system. Secondly, solving the problem in the frequency domain does not provide any further information on how to generate the input signal in the time domain: the input can be represented as filtered white noise, but many probability distributions can be used to generate white noise. Furthermore, in practical applications time domain constraints on signals have to be considered, and the power constraint that is usually set in the frequency domain does not ensure that these constraints are respected. For this reason, a method is proposed in [13] to generate a binary input with a prescribed correlation function; once an optimal spectrum or correlation function is found by solving the input design problem in the frequency domain, it is possible to generate a binary signal which approximates the optimal input. A method that provides a locally optimal binary input in the time domain is also proposed in [14].

This thesis studies the input design problem in the probability domain. Compared to design methods in the frequency domain, a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from a given distribution. Inputs are modeled by finite stationary Markov chains which generate binary signals. Binary signals are often used in system identification; one of the reasons is that they achieve the largest power in the set of all signals having the same maximum amplitude, and it is well known that this improves parameter estimation for linear models. The idea of modeling the input by a finite Markov chain derives from the possibility of including input amplitude constraints directly into the input model, by suitably choosing the state space of the Markov chain.
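To make the idea of sampling a binary input from a Markov chain concrete, the following is a minimal sketch in Python. It assumes a symmetric two-state chain with states +amplitude and -amplitude and a single probability `p_stay` of keeping the current state; this parametrization and the function names `markov_binary_signal` and `lag1_correlation` are illustrative assumptions, not the notation used later in the thesis.

```python
import random


def markov_binary_signal(n_samples, p_stay, amplitude=1.0, seed=None):
    """Sample a binary signal from a symmetric two-state Markov chain.

    The states are +amplitude and -amplitude (an assumed parametrization);
    p_stay is the probability of keeping the current state at each step.
    """
    if not 0.0 <= p_stay <= 1.0:
        raise ValueError("p_stay must lie in [0, 1]")
    rng = random.Random(seed)
    # The symmetric chain has a uniform stationary distribution, so drawing
    # the initial state uniformly makes the generated signal stationary.
    state = amplitude if rng.random() < 0.5 else -amplitude
    signal = []
    for _ in range(n_samples):
        signal.append(state)
        # rng.random() lies in [0, 1): switch with probability 1 - p_stay.
        if rng.random() >= p_stay:
            state = -state
    return signal


def lag1_correlation(signal):
    """Sample correlation between consecutive samples, normalized by power."""
    power = sum(u * u for u in signal) / len(signal)
    lag1 = sum(a * b for a, b in zip(signal, signal[1:])) / (len(signal) - 1)
    return lag1 / power


u = markov_binary_signal(20000, p_stay=0.9, amplitude=2.0, seed=1)
print(max(abs(x) for x in u))  # the amplitude bound holds by construction
print(lag1_correlation(u))     # close to 2 * p_stay - 1 for a long signal
```

The amplitude constraint is satisfied by construction, since every sample is one of the two states, while the single parameter `p_stay` shapes the correlation (and hence the spectrum) of the signal.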
Furthermore, unlike the design in the frequency domain, this approach keeps more degrees of freedom in the choice of the optimal spectrum, which in general is non-unique [12].

Two identification problems are considered here: parameter estimation and non-minimum phase zeros estimation for LTI systems. For the first problem, the optimal distribution is found by minimizing the cost function defined in the input design problem with respect to the one-step transition probabilities of the Markov chain. In this analysis, a stochastic approximation algorithm is used, since a closed-form solution to the optimization problem is not available and the cost is a stochastic function of these transition probabilities, contaminated with noise (see [15], [16] for details).

For the second problem, it will be shown that the Markov chain input model has exactly the optimal spectrum for the zero estimation problem of a FIR or ARX model [6]. In general, the spectrum of a two-state Markov chain can be made equal to the spectrum of the AR process which guarantees a consistent estimate of the NMP zero of a linear SISO system [17]. Therefore, an optimal or consistent input can be generated in the time domain by Markov chain distributions. The adaptive algorithm introduced in [18] for input design in the time domain when undermodeling is considered is modified here in order to generate Markov chain signals having the same spectrum as the general inputs designed in the original version of the algorithm. The advantage is that a binary signal is then used to identify the non-minimum phase zero, while keeping the same input variance and spectrum.

The outline of the thesis is presented in the next section.

1.1 Thesis outline and contributions

The subject of this thesis is input design for system identification.
In particular, the objective of this study is to analyze a new approach to input design: working in the probability domain and modeling the input signal as a finite Markov chain. The first chapters summarize some known results of system identification in the PEM framework and of input design in the frequency domain, so that the classical frequency domain input design method can be compared with the method proposed here.

In Chapter 2, PEM and its asymptotic properties are reviewed.

In Chapter 3, the most commonly adopted input design methods are described. These formulate input design problems as convex optimization problems. The solution is given as an optimal input spectrum.

In Chapter 4, general Markov chains that model binary input signals are defined. Some spectral properties are also described.

In Chapter 5, the input design problem for parameter estimation and the solution approach based on the Markov chain input model are presented.

In Chapter 6, input design for identification of NMP zeros of an LTI system is considered. The chapter discusses classical solutions in the frequency domain and adaptive solutions in the time domain in undermodeling conditions, where the input is modeled as an AR or a Markov chain process.

Chapter 7 concludes the thesis.

Chapter 2
System Identification

This chapter introduces system identification in the PEM framework for parameter estimation of LTI models. Once a model structure is defined, this method finds the model parameters that minimize the prediction error. Even when the model structure is able to capture the true system dynamics, the estimation error will not be zero, since the data used in the identification procedure are finite and corrupted by noise. The purpose of input design is to minimize the estimation error by minimizing the variance of the parameter estimates, assuming the estimation method is consistent.
Section 2.1 defines the systems and model structures considered in this work, while Section 2.2 discusses PEM and its asymptotic properties.

2.1 System and model description

This thesis considers discrete-time LTI SISO systems, lying in the set $\mathcal{M}$ of parametric models

$$ y(t) = G(q,\theta)\,u(t) + H(q,\theta)\,e(t), \tag{2.1} $$

$$ G(q,\theta) = q^{-n_k}\,\frac{b_1 + b_2 q^{-1} + \cdots + b_{n_b} q^{-n_b}}{1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}}, \qquad H(q,\theta) = \frac{1 + c_1 q^{-1} + \cdots + c_{n_c} q^{-n_c}}{1 + d_1 q^{-1} + \cdots + d_{n_d} q^{-n_d}}, $$

$$ \theta = [b_1, \ldots, b_{n_b}, a_1, \ldots, a_{n_a}, c_1, \ldots, c_{n_c}, d_1, \ldots, d_{n_d}]^T \in \mathbb{R}^{b \times 1}, $$

where $u(t)$ is the input, $y(t)$ is the output and $e(t)$ is zero mean white noise with finite variance. The symbol $q^{-1}$ represents the delay operator ($q^{-1}u(t) = u(t-1)$). Assume $H(q,\theta)$ is stable, monic and minimum phase, i.e. its poles and zeros lie inside the unit circle.

The real system $\mathcal{S}$ is given as

$$ y(t) = G_0(q)\,u(t) + H_0(q)\,e_0(t), \tag{2.2} $$

where $e_0(t)$ has finite variance $\lambda_0$. Assume there exists a parameter vector $\theta_0$ such that $G(q,\theta_0) = G_0(q)$ and $H(q,\theta_0) = H_0(q)$, i.e. assume there is no undermodeling:

$$ \mathcal{S} \in \mathcal{M}. \tag{2.3} $$

This condition is rarely satisfied in practice, since real systems are often of high order or nonlinear. Nevertheless, as will be explained in the next section, this condition is crucial for the consistency and the asymptotic properties of PEM. In Chapter 5, regarding the parameter estimation problem, (2.3) will be assumed to hold, while in Chapter 6, where the zero estimation problem is analyzed, this condition will not necessarily be assumed.

2.2 Identification method

System identification aims at describing a real system through a mathematical model constructed and validated from experimental input-output data. The identification method considered here is PEM [19].
This method minimizes the following function of the prediction error $\varepsilon_F(t,\theta)$:

$$ V_N\left(\theta, Z^N\right) = \frac{1}{2N} \sum_{t=1}^{N} \varepsilon_F^2(t,\theta), \tag{2.4} $$

where $Z^N$ is a vector containing the collected input-output data, i.e. $Z^N = [y(1), u(1), y(2), u(2), \ldots, y(N), u(N)]$. The prediction error is defined as $\varepsilon_F(t,\theta) = y(t) - \hat{y}(t,\theta)$, where the one-step ahead predictor is given by

$$ \hat{y}(t,\theta) = H^{-1}(q,\theta)\,G(q,\theta)\,u(t) + \left[1 - H^{-1}(q,\theta)\right] y(t). $$

Suppose all the hypotheses for the consistency of PEM are satisfied; in that case, the parameter estimate $\hat{\theta}_N$ converges to the true parameter vector $\theta_0$ as $N$ tends to infinity. Briefly, these conditions are:

1. Condition (2.3) holds, i.e. there is no undermodeling.
2. The signals $y(t)$ and $u(t)$ are jointly quasi-stationary.
3. $u(t)$ is persistently exciting of sufficiently large order.

2.3 Estimate uncertainty

Measuring the quality of the model estimate is an important issue in system identification. The measure is chosen depending on the application for which the model is required. One way to measure the estimate uncertainty is to use a function of the covariance matrix of the parameter estimate. In other cases, such as in control applications, it may be better to use the variance of the frequency response estimate in the frequency domain. These two cases are now presented in more detail.

2.3.1 Parameter uncertainty

Under the assumptions for the consistency of PEM, it holds that

$$ \sqrt{N}\left(\hat{\theta}_N - \theta_0\right) \to \mathcal{N}\left(0, P_{\theta_0}\right) \quad \text{as } N \to \infty, \tag{2.5} $$

$$ P_{\theta_0}^{-1} = \frac{1}{\lambda_0}\, E\left[\psi(t,\theta_0)\,\psi^T(t,\theta_0)\right], \qquad \psi(t,\theta_0) = \left.\frac{\partial \hat{y}(t,\theta)}{\partial \theta}\right|_{\theta_0}, $$

where $\mathcal{N}$ denotes the Normal distribution [19]. Therefore, when the model class is sufficiently flexible to describe the real system, the parameter estimate converges to the true parameter vector as the number of data $N$ used in the estimation goes to infinity, with covariance decaying as $1/N$.
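The $1/N$ decay can be checked numerically on a deliberately simple case. The sketch below is an illustration under stated assumptions, not an example from the thesis: it takes the static gain model $y(t) = b\,u(t) + e(t)$ (that is, (2.1) with $G = b$ and $H = 1$), a white binary input $u(t) = \pm 1$ and Gaussian noise, for which $\psi(t,\theta_0) = u(t)$ and (2.5) predicts a variance of about $\lambda_0/N$ for the least squares estimate. The function names and default values are hypothetical.

```python
import random
import statistics


def estimate_gain(n_samples, true_gain, noise_std, rng):
    """Least squares estimate of b in y(t) = b * u(t) + e(t), with u(t) = +/-1."""
    num = 0.0
    den = 0.0
    for _ in range(n_samples):
        u = rng.choice((-1.0, 1.0))
        y = true_gain * u + rng.gauss(0.0, noise_std)
        num += u * y
        den += u * u
    return num / den


def estimate_variance(n_samples, n_runs=2000, true_gain=0.5, noise_std=1.0, seed=0):
    """Monte Carlo variance of the gain estimate over n_runs experiments."""
    rng = random.Random(seed)
    estimates = [
        estimate_gain(n_samples, true_gain, noise_std, rng) for _ in range(n_runs)
    ]
    return statistics.pvariance(estimates)


for n in (100, 400):
    # n * variance should stay close to the noise variance (here 1.0).
    print(n, n * estimate_variance(n, seed=n))
```

Quadrupling the number of samples should reduce the Monte Carlo variance by roughly a factor of four, up to simulation error.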
From (2.5) it follows that a confidence region in which the parameter estimate will lie with probability $\alpha$ is

$$ U_\theta = \left\{ \theta \;\middle|\; N \left(\theta - \hat{\theta}_N\right)^T P_{\theta_0}^{-1} \left(\theta - \hat{\theta}_N\right) \le \chi_\alpha^2(n) \right\}. \tag{2.6} $$

The covariance matrix defines an ellipsoid asymptotically centered in $\theta_0$.

Provided that $u$ and $e$ are independent (that is, data are collected in open loop), the asymptotic expression, in the number of data points $N$, of the inverse of the covariance matrix of the parameter estimate is

$$ P_{\theta_0}^{-1} = \frac{N}{2\pi\lambda_0} \int_{-\pi}^{\pi} F_u\left(e^{i\omega},\theta_0\right) \Phi_u(\omega)\, F_u^\star\left(e^{i\omega},\theta_0\right) d\omega + R_e(\theta_0), \tag{2.7} $$

$$ R_e(\theta_0) = \frac{N}{2\pi} \int_{-\pi}^{\pi} F_e\left(e^{i\omega},\theta_0\right) F_e^\star\left(e^{i\omega},\theta_0\right) d\omega, $$

where

$$ F_u\left(e^{i\omega},\theta_0\right) = H^{-1}\left(e^{i\omega},\theta_0\right) \left.\frac{\partial G\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0}, \tag{2.8} $$

$$ F_e\left(e^{i\omega},\theta_0\right) = H^{-1}\left(e^{i\omega},\theta_0\right) \left.\frac{\partial H\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0}, \tag{2.9} $$

and $\Phi_u(\omega)$ is the power spectral density of the input $u(t)$. Here $\star$ denotes the complex conjugate transpose. Expression (2.7) shows that the asymptotic covariance matrix of the parameter estimate depends on the input spectrum. Therefore, by shaping $\Phi_u(\omega)$ it is possible to obtain estimates within a given accuracy. Throughout the thesis it is assumed that there is no feedback in the system, i.e. $u$ and $e$ are independent.

2.3.2 Frequency response uncertainty

In many applications, it may be preferable to measure the quality of the model estimate using the variance of the frequency response estimate, frequency by frequency. In [19] it is shown that, under condition (2.3), the variance of $G\left(e^{i\omega},\hat{\theta}_N\right)$ can be approximated by

$$ \mathrm{Var}\left(G\left(e^{i\omega},\hat{\theta}_N\right)\right) \approx \frac{m}{N}\,\frac{\Phi_v(\omega)}{\Phi_u(\omega)} \tag{2.10} $$

for large but finite model order $m$ and number of data $N$, where $v$ is the process defined as $v(t) = H_0(q)\,e_0(t)$. If the model order is not large enough, the previous expression is not a good approximation. Instead, by Gauss' approximation formula, it is possible to write

$$ \mathrm{Var}\left(G\left(e^{i\omega},\hat{\theta}_N\right)\right) \approx \frac{1}{N} \left.\frac{\partial G\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0}^{\star} P_{\theta_0} \left.\frac{\partial G\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0}. \tag{2.11} $$

Equation (2.11) expresses the frequency response uncertainty in terms of the parameter uncertainty. Both (2.10) and (2.11) show that a proper choice of the input spectrum can reduce the variance of the frequency response estimate. This is the purpose of input design.

2.4 Conclusions

This chapter introduced PEM for system identification and its asymptotic properties, which are often used to solve input design problems. Models constructed from experimental data are always affected by uncertainty. In Section 2.3 two possible measures of the model uncertainty were discussed: parameter uncertainty and frequency response uncertainty. The choice between the two depends on the application. Typically, in control applications a measure in the frequency domain is preferable. This work considers parameter uncertainty.

Chapter 3
Input Design for System Identification

This chapter presents general input design problems for system identification. Typically, input design aims at optimizing some performance function under constraints on the estimate accuracy and on the input signal. The solution approach will be presented in detail, describing how input design problems can be formulated as convex optimization problems.

Ideas and drawbacks of the general input design framework are reviewed in Section 3.1. The most widely used measures of estimate accuracy are presented in Section 3.2, which also shows how quality constraints can be written as convex constraints. Sections 3.3 to 3.5 describe some techniques used for the parametrization of spectra and signal constraints, needed for handling finitely parametrized problems.

3.1 Optimal input design problem

In a general formulation, input design problems are constrained optimization problems, where the constraints are typically on the input signal spectrum or power and on the estimate accuracy.
In this framework the objective function to be optimized can be any performance criterion, which usually depends on the practical application. For example, input power or experiment time can be minimized. Common input design problem formulations are:

1. Optimize some measure of the estimate accuracy, under constraints on the input excitation.
2. Optimize some property of the input signal, given constraints on the estimate accuracy.

As will be discussed in the next section, typical measures of the estimate accuracy are functions of the uncertainty in the model estimate, like (2.7), (2.10) and (2.11). As was shown in Section 2.3, these functions depend on the input spectrum, which can therefore be used to optimize the objective function. A formal expression of the first problem formulation is

$$ \min_{\Phi_u} \; f\left(P_{\theta_0}\right) \quad \text{subject to} \quad g\left(\Phi_u\right) \le \alpha, \tag{3.1} $$

which can also be written as

$$ \min_{\Phi_u} \; \gamma \quad \text{subject to} \quad f\left(P_{\theta_0}\right) \le \gamma, \quad g\left(\Phi_u\right) \le \alpha. \tag{3.2} $$

The next sections will show how constraints of this type can be formulated as convex constraints, under certain conditions on the functions $f$ and $g$.

The following drawbacks of input design problems like (3.2) must be pointed out. First of all, the asymptotic expression of the covariance matrix depends on the true parameter $\theta_0$, which is not known. Secondly, the constraints may be non-convex and infinite dimensional. In that case, a parametrization of the input spectrum is necessary in order to handle finitely parametrized optimization problems. Furthermore, once the optimal input spectrum has been found, an input signal having that optimal spectrum has to be generated. This can be done by filtering white noise with an input spectral factor¹. Nevertheless, this solution approach gives no information on the probability distribution of the white noise.

3.2 Measures of estimate accuracy

In the usual input design framework, three types of quality measures are typically considered.
These are described in the following sections.

¹ By spectral factor is meant an analytic function $L(z)$ such that $\Phi_u(z) = L(z)\,L\left(z^{-1}\right)$.

3.2.1 Quality constraint based on the parameter covariance

In Section 2.3 it has been shown that the inverse of the asymptotic covariance matrix of the parameter estimate is an affine function of the input spectrum, through (2.7). If the purpose of the identification procedure is parameter estimation, then typical scalar measures of estimate accuracy are the trace, the determinant or the maximum eigenvalue of the covariance matrix [12]. The following quality constraints can then be introduced:

$$ \mathrm{Tr}\, P_{\theta_0} \le \gamma \tag{3.3} $$

$$ \det P_{\theta_0} \le \delta \tag{3.4} $$

$$ \lambda_{\max}\left(P_{\theta_0}\right) \le \epsilon. \tag{3.5} $$

It is possible to prove that these constraints can be manipulated to be convex in $P_{\theta_0}^{-1}$; proofs can be found in [20] and [21] for (3.4) and (3.5), respectively. For example, (3.5) is equivalent to

$$ \begin{bmatrix} \epsilon I & I \\ I & P_{\theta_0}^{-1} \end{bmatrix} \ge 0, \tag{3.6} $$

which is an LMI in $P_{\theta_0}^{-1}$. The constraint (3.3) is a special case of the more general weighted trace constraint that will be considered in the next subsection.

Notice that all these quality constraints depend on the true parameter vector $\theta_0$. Many solutions have been presented in the literature to handle this problem, as will be discussed later.

3.2.2 Quality constraint based on the model variance

Consider the quality constraint based on the variance of the frequency response

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} F(\omega)\, \mathrm{Var}\left(G\left(e^{i\omega},\hat{\theta}_N\right)\right) d\omega \le \gamma, \tag{3.7} $$

where $F(\omega)$ is a weighting function. By substituting the variance expression (2.11), this quality constraint can be written as

$$ \mathrm{Tr}\, W P_{\theta_0} \le \gamma, \tag{3.8} $$

where

$$ W = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{N}\, F(\omega) \left.\frac{\partial G\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0} \left.\frac{\partial G\left(e^{i\omega},\theta\right)}{\partial \theta}\right|_{\theta_0}^{\star} d\omega. \tag{3.9} $$

See [12] for details. The following lemma generalizes the previous result. A proof can be found in [6].

Lemma 3.2.1 The problem

$$ \mathrm{Tr}\, W(\omega) P_{\theta_0} \le \gamma \;\; \forall\omega, \qquad W(\omega) = V(\omega) V^\star(\omega) \ge 0 \;\; \forall\omega, \qquad P_{\theta_0} \ge 0 $$

can be formulated as an LMI of the form

$$ \gamma - \mathrm{Tr}\, Z \ge 0, \qquad \begin{bmatrix} Z & V^\star(\omega) \\ V(\omega) & P_{\theta_0}^{-1} \end{bmatrix} \ge 0, \quad \forall\omega. \tag{3.10} $$

Notice that this formulation is convex in the matrices $P_{\theta_0}^{-1}$ and $Z$ (see [12], [6]). Notice also that this type of constraint includes the constraint on the trace of the covariance matrix (3.3) as a special case, when $W$ is the identity matrix.

3.2.3 Quality constraint based on the confidence region

In control applications, it is often preferable to have frequency by frequency constraints on the estimation error. Consider the measure [6, 12]

$$ \Delta\left(e^{i\omega},\theta\right) = T\left(e^{i\omega}\right) \frac{G_0\left(e^{i\omega}\right) - G\left(e^{i\omega},\theta\right)}{G\left(e^{i\omega},\theta\right)}. \tag{3.11} $$

The input has to be designed so that

$$ \left|\Delta\left(e^{i\omega},\theta\right)\right| \le \gamma, \quad \forall\omega, \;\; \forall\theta \in U_\theta. \tag{3.12} $$

This constraint can also be formulated as a convex constraint in $P_{\theta_0}^{-1}$, as proven in [22].

3.3 Input spectra parametrization

As discussed in the last section, the typical measures of estimate accuracy are functions of the covariance matrix $P_{\theta_0}$. Expression (2.7), derived in the asymptotic analysis of PEM, shows that the input spectrum $\Phi_u$ can be used to optimize the estimate performance. The problem of finding an optimal input spectrum has an infinite number of parameters, since $\Phi_u(\omega)$ is a continuous function of the frequency $\omega$. Nevertheless, by a proper spectrum parametrization, it is possible to formulate the problem as a convex and finite dimensional optimization problem, since the parametrization of the input spectrum leads to a parametrization of the inverse of the covariance matrix [6, 12].

A spectrum can always be written in the general form

$$ \Phi_u(\omega) = \sum_{k=-\infty}^{\infty} \tilde{c}_k\, B_k\left(e^{i\omega}\right), \tag{3.13} $$

where $\left\{B_k\left(e^{i\omega}\right)\right\}_{k=-\infty}^{\infty}$ are proper stable rational basis functions that span² $L_2$. It is always possible to choose basis functions having the hermitian properties $B_{-k} = B_k^\star$ and $B_k\left(e^{-i\omega}\right) = B_k^\star\left(e^{i\omega}\right)$ [6].
The coefficients $\tilde{c}_k$ satisfy the symmetry property $\tilde{c}_{-k} = \tilde{c}_k$ and must be such that

$$ \Phi_u(\omega) \ge 0, \quad \forall\omega, \tag{3.14} $$

otherwise $\Phi_u$ would not be a spectrum. For example, the FIR representation of a spectrum is obtained by choosing $B_k\left(e^{i\omega}\right) = e^{-i\omega k}$; consequently, $\tilde{c}_k = r_k$, where $r_k$ is the correlation function of the process $u$.

By substituting (3.13) into (2.7), the inverse of the covariance matrix becomes an affine function of the coefficients $\tilde{c}_k$ of the form

$$ P_{\theta_0}^{-1} = \sum_{k=-\infty}^{\infty} \tilde{c}_k\, Q_k + \bar{Q}. \tag{3.15} $$

This parametrization of the input spectrum leads to a countably infinite number of parameters in the optimization problem. Two possible spectrum parametrizations, which make the problem finitely parametrized, are described in the following subsections.

3.3.1 Finite dimensional spectrum parametrization

The finite dimensional spectrum parametrization has the form

$$ \Phi_u(\omega) = \Psi\left(e^{i\omega}\right) + \Psi^\star\left(e^{i\omega}\right), \qquad \Psi\left(e^{i\omega}\right) = \sum_{k=0}^{M-1} \tilde{c}_k\, B_k\left(e^{i\omega}\right). \tag{3.16} $$

This parametrization forces the coefficients $\tilde{c}_M, \tilde{c}_{M+1}, \ldots$ to be zero. Therefore, the condition $\Phi_u(\omega) \ge 0$ must be ensured through the coefficients $\{\tilde{c}_k\}_{k=0}^{M-1}$. The following result, derived from an application of the Positive Real Lemma [23], can be used to ensure the constraint (3.14).

² $L_2$ denotes the set $\left\{ f \mid \int |f(x)|^2\, dx < \infty \right\}$.

Lemma 3.3.1 Let $\{A, B, C, D\}$ be a controllable state-space realization of $\Psi\left(e^{i\omega}\right)$. Then there exists a matrix $Q = Q^T$ such that

$$ \begin{pmatrix} Q - A^T Q A & -A^T Q B \\ -B^T Q A & -B^T Q B \end{pmatrix} + \begin{pmatrix} 0 & C^T \\ C & D + D^T \end{pmatrix} \ge 0 \tag{3.17} $$

if and only if

$$ \Phi_u(\omega) \triangleq \sum_{k=0}^{M-1} \tilde{c}_k \left[ B_k\left(e^{i\omega}\right) + B_k^\star\left(e^{i\omega}\right) \right] \ge 0, \quad \forall\omega. $$

The state-space realization of the positive real part of the input spectrum can be easily constructed. For example (Example 3.5 in [6]), an FIR spectrum has positive real part given by

$$ \Psi\left(e^{i\omega}\right) = \frac{1}{2} r_0 + \sum_{k=1}^{M-1} r_k\, e^{-i\omega k}. \tag{3.18} $$

A controllable state-space realization of $\Psi\left(e^{i\omega}\right)$ is

$$ A = \begin{pmatrix} O_{1 \times (M-2)} & 0 \\ I_{M-2} & O_{(M-2) \times 1} \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}^T, \quad C = \begin{pmatrix} r_1 & r_2 & \ldots & r_{M-1} \end{pmatrix}, \quad D = \frac{1}{2} r_0. \tag{3.19} $$

Therefore, in this example the constraint (3.14) can be written as an LMI in $Q$ and $r_1, \ldots, r_{M-1}$.

3.3.2 Partial correlation parametrization

The partial correlation parametrization uses the finite expansion

$$ \sum_{k=-(M-1)}^{M-1} \tilde{c}_k\, B_k\left(e^{i\omega}\right) \tag{3.20} $$

in order to design only the first $M$ coefficients $\tilde{c}_k$. In this case, it is necessary to ensure that there exists a sequence $\tilde{c}_M, \tilde{c}_{M+1}, \ldots$ such that the complete sequence $\{\tilde{c}_k\}_{k=0}^{\infty}$ defines a spectrum. That is, the condition (3.14) must hold. This means that (3.20) does not necessarily define a spectrum itself, but the designed coefficients are extendable to a sequence that parametrizes a spectrum. As explained in [6], if an FIR spectrum is considered, a necessary and sufficient condition for (3.14) to hold is

$$ \begin{pmatrix} r_0 & r_1 & \ldots & r_{M-1} \\ r_1 & r_0 & \ldots & r_{M-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{M-1} & r_{M-2} & \ldots & r_0 \end{pmatrix} \ge 0 \tag{3.21} $$

(see [24] and [25]). This condition also applies to more general basis functions, like $B_k\left(e^{i\omega}\right) = L\left(e^{i\omega}\right) e^{-i\omega k}$, where $L\left(e^{i\omega}\right) > 0$ [6, 12]. The constraint (3.21) is an LMI in the first $M$ correlation coefficients and is therefore convex in these variables.

Notice that (3.21) is less restrictive than the condition imposed in Lemma 3.3.1 for the finite dimensional parametrization. On the other hand, as will be discussed in the next section, the finite spectrum parametrization makes it possible to handle spectral constraints on input and output signals, which the partial correlation parametrization cannot handle, since the parametrization (3.20) is not a spectrum. An advantage of the latter parametrization, though, is that it uses the minimum number of free parameters.

3.4 Covariance matrix parametrization

By using one of the input spectrum parametrizations in the expression (2.7), the inverse of the asymptotic covariance matrix can be written as

$$ P_{\theta_0}^{-1} = \sum_{k=-(M-1)}^{M-1} \tilde{c}_k\, Q_k + \bar{Q}, \tag{3.22} $$

where $Q_k = \frac{N}{2\pi\lambda_0} \int_{-\pi}^{\pi} F_u\left(e^{i\omega},\theta_0\right) B_k\left(e^{i\omega}\right) F_u^\star\left(e^{i\omega},\theta_0\right) d\omega$ and $\bar{Q} = R_e(\theta_0)$. Then $P_{\theta_0}^{-1}$ is expressed as an affine and finitely parametrized function of the coefficients $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$ (since the symmetry condition $\tilde{c}_{-k} = \tilde{c}_k$ holds). Therefore, any quality constraint that is convex in $P_{\theta_0}^{-1}$ is also convex in $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$. Some common quality constraints were introduced in Section 3.2, all of which were convex functions of $P_{\theta_0}^{-1}$.

3.5 Signal constraints parametrization

Constraints on the input spectrum are also considered in input design. Typically, they are frequency by frequency or power constraints. A detailed discussion of the power constraints parametrization is presented in [6]. Briefly, consider power constraints of the type

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left|W_u\left(e^{i\omega}\right)\right|^2 \Phi_u(\omega)\, d\omega \le \alpha_u \tag{3.23} $$

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left|W_y\left(e^{i\omega}\right)\right|^2 \Phi_y(\omega)\, d\omega \le \alpha_y. \tag{3.24} $$

By using a finite spectrum parametrization, these constraints can be written as convex finite-dimensional functions of $\{\tilde{c}_k\}_{k=0}^{M-1}$. For example, a constraint on the input power for an FIR spectrum becomes $r_0 \le \alpha_u$. For frequency by frequency constraints of the form

$$ \beta_u(\omega) \le \Phi_u(\omega) \le \gamma_u(\omega) \tag{3.25} $$

$$ \beta_y(\omega) \le \Phi_y(\omega) \le \gamma_y(\omega), \tag{3.26} $$

Lemma 3.3.1 can be applied to write them as convex constraints in $\{\tilde{c}_k\}_{k=0}^{M-1}$, provided that the constraining functions are rational [6].

3.6 Limitations of input design in the frequency domain

The previous sections introduced input design problems where constraints on the signal spectra as well as on a measure of the estimate accuracy are considered. It has been shown that they can be formulated as finitely parametrized convex optimization problems, provided that the measure of the estimate accuracy is a convex function of $P_{\theta_0}^{-1}$ and the input spectrum is parametrized as proposed in Section 3.3. Therefore, by solving a constrained optimization problem, the optimal variables $\tilde{c}_0, \ldots, \tilde{c}_{M-1}$ are found.
The FIR spectrum representation is commonly used, so that the optimization procedure returns the first $M$ terms of the correlation function. If a partial correlation parametrization is used, the optimal spectrum can be found by solving the Yule-Walker equations as described in [26]. From the optimal spectrum it is then necessary to generate a signal in the time domain to apply to the real system. This is a realization problem that characterizes solutions in the frequency domain. The input can be generated as filtered white noise, by spectral factorization of the optimal spectrum. Nevertheless, many probability distributions can be used to generate white noise. Also, it has to be noticed that in general the optimal spectrum is not unique, and the input design approach considered so far finds only one of the optimal spectra. In fact, a finite dimensional spectrum parametrization forces the input correlation coefficients $r_M, r_{M+1}, \ldots$ to be zero; on the other hand, the partial correlation parametrization needs to complete the correlation sequence by solving the Yule-Walker equations, which give only one particular correlation sequence. Furthermore, in practical applications time domain constraints on the signals have to be considered, and the power constraint that is usually set in the frequency domain does not assure that these constraints are respected. For these reasons, this thesis proposes to analyze the performance of an input design method in the probability domain, as will be presented in the next chapters.

3.7 Conclusions

Classical input design in the frequency domain has been presented. The advantage of this approach is that input design problems can be formulated as convex optimization problems. Some limitations of the method concern time domain constraints on signals and realization techniques.
Chapter 4 Markov Chain Input Model

The drawbacks of input design in the frequency domain, presented in Section 3.6, suggest studying a different approach. This chapter introduces the idea of input design in the probability domain. In particular, the reasons for and advantages of modeling the input signal as a Markov chain are presented in Section 4.1. In Section 4.2 the Markov chain input model is described in detail.

4.1 Introduction

What is generally required for system identification in practical applications is an input signal in the time domain that guarantees a sufficiently accurate estimate of the system while respecting some amplitude constraints. As discussed above, input design in the frequency domain does not handle time domain constraints on signals. Another disadvantage is that it does not define how to generate, from the optimal spectrum, the input signal to apply to the real system. Furthermore, input design in the frequency domain does not use the degrees of freedom in the choice of the optimal spectrum, which is generally not unique. That approach, in fact, finds only one optimal solution that fits the optimal correlation coefficients $r_0, \ldots, r_{M-1}$. All the other possible solutions are not considered. The idea of input design in the probability domain arises from this observation: a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from the optimal distribution. In this way, no spectral factorization or realization algorithm is required. Markov chains having finite state space could then be used in order to directly include the time domain amplitude constraints into the input model, by suitably choosing the state space. The idea is to use Markov chain distributions to generate binary signals.
Binary signals are often used in system identification. One of the reasons is that they achieve the largest power in the set of all signals having the same maximum amplitude, which improves parameter estimation for linear models. Also, if the Markov chain has a spectrum of sufficiently high order (which depends on the state space dimension), there are more degrees of freedom in the choice of the optimal spectrum when designing the optimal probability distribution. A finite stationary Markov chain is used as input model for system identification. The probability distribution is shaped in order to optimize the objective function defined in the input design problem. This function is minimized with respect to the transition probabilities of the Markov chain, which completely define its distribution [27].

4.2 Markov chains model

This section describes the state space structure of the general Markov chain input model and some of its spectral features.

4.2.1 Markov chain state space

Consider a finite stationary Markov chain having states of the form

$$(u_{t-n}, u_{t-n+1}, \ldots, u_t) \tag{4.1}$$

where $u_i$ represents the value of the input at the time instant $i$; it can be equal to either $u_{max}$ or $-u_{max}$, where $u_{max}$ is the maximum tolerable input amplitude, imposed by the real system. This model allows the present value of the input to depend on the last $n$ past values, rather than only on the previous one. Note that at the time instant $t$, the state can transit only to either the state $(u_{t-n+1}, u_{t-n+2}, \ldots, u_t, u_{max})$ or the state $(u_{t-n+1}, u_{t-n+2}, \ldots, u_t, -u_{max})$, with probabilities $p_{(u_{t-n}, \ldots, u_t)}$ and $1 - p_{(u_{t-n}, \ldots, u_t)}$, respectively. Not all the transitions between states are possible; therefore the transition matrix contains several zeros, corresponding to the forbidden state transitions. The last component of the Markov chain state generates the binary signal to apply to the real system.
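The state structure (4.1) can be illustrated with a minimal Python sketch. The function names `build_chain` and `simulate_input`, the dictionary `prob_up` and the ordering of the states are hypothetical choices made for illustration; they are not taken from the thesis. Each state is a tuple of the last n + 1 input values, and each row of the transition matrix has exactly two nonzero entries, as described above.

```python
import itertools
import numpy as np

def build_chain(n, prob_up, u_max=1.0):
    """Build states (u_{t-n}, ..., u_t) and the transition matrix.

    prob_up[state] is the probability that the next input equals +u_max
    (an illustrative way to store the transition probabilities).
    """
    states = list(itertools.product([u_max, -u_max], repeat=n + 1))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        p = prob_up[s]
        # Shift the window by one sample and append the new input value.
        P[index[s], index[s[1:] + (u_max,)]] = p
        P[index[s], index[s[1:] + (-u_max,)]] = 1.0 - p
    return states, P

def simulate_input(states, P, length, rng):
    """Sample the chain; the last state component is the input signal."""
    i = rng.integers(len(states))
    u = np.empty(length)
    for t in range(length):
        i = rng.choice(len(states), p=P[i])
        u[t] = states[i][-1]
    return u

# Usage: a chain with n = 1 (four states) and the same probability everywhere.
prob_up = {s: 0.3 for s in itertools.product([1.0, -1.0], repeat=2)}
states, P = build_chain(1, prob_up)
u = simulate_input(states, P, 200, np.random.default_rng(0))
```

Because the amplitude levels are fixed by the state space, every sample of `u` respects the amplitude constraint by construction.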
Example 4.2.1 Consider a Markov chain having state space $S_2 = \{1, -1\}$. The graph representation is shown in Figure 4.1 and the corresponding transition matrix is

$$\Pi_2 = \begin{pmatrix} p & 1-p \\ 1-q & q \end{pmatrix}. \tag{4.2}$$

Figure 4.1: Graph representation of the two states Markov chain $S_2$.

This simple model generates a binary signal where each sample at time $t$ depends only on the previous value at time $t-1$.

Example 4.2.2 A more general model is the four states Markov chain, with state space $S_4 = \{(1, 1), (1, -1), (-1, -1), (-1, 1)\}$. The transition matrix is

$$\Pi = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & s & 1-s \\ 0 & 0 & q & 1-q \\ r & 1-r & 0 & 0 \end{pmatrix} \tag{4.3}$$

and the corresponding graph is shown in Figure 4.2.

Figure 4.2: Graph representation of the four states Markov chain $S_4$.

Note that when $p = r$ and $s = q$ the four states Markov chain model is equivalent to the two states Markov chain of Example 4.2.1.

These examples show one of the advantages of the proposed Markov chain input model: each model includes all the models of lower dimension as special cases, for proper choices of the transition probabilities.

4.2.2 Markov chains spectra

This section presents a method to calculate the expression of the Markov chains spectra. Some examples illustrate the type of signals these models generate. By means of Markov chains and state space realization theories (see [27] and [28]), it is possible to derive a general expression for the spectrum of a finite stationary Markov chain $s_m$ having state space $S = \{S_1, S_2, \ldots, S_J\}$. For the general Markov chains considered in the previous section, each state has the form (4.1) and the number of states is $J = 2^{n+1}$. Let $\Pi$ denote the transition matrix, whose elements are the conditional probabilities $\Pi(i, j) = P\{s_{m+1} = S_j \mid s_m = S_i\}$, and let $\bar{p} = \begin{pmatrix} \bar{p}_1 & \ldots & \bar{p}_J \end{pmatrix}$ be the solution of the linear system $\bar{p} = \bar{p}\Pi$, containing the stationary probabilities $\bar{p}_i = P\{s_m = S_i\}$. Consider the states $S_i$ as column vectors. Defining

$$A_s = \begin{pmatrix} S_1 & \ldots & S_J \end{pmatrix} \quad \text{and} \quad D_s = \begin{pmatrix} \bar{p}_1 & & 0 \\ & \ddots & \\ 0 & & \bar{p}_J \end{pmatrix},$$

it is possible to write the correlation coefficients of the output signal in the matrix form

$$r_k = A_s D_s \Pi^k A_s^T, \quad k = 0, 1, 2, \ldots \tag{4.4}$$

For $k < 0$ the correlation can be obtained by the symmetry condition $r_k = r_{-k}$, since the process $s_m$ is real. To calculate the spectrum of $s_m$ as the Fourier transform of the correlation function, note that $r_k$ can be viewed as the impulse response, for $k = 1, 2, \ldots$, of the linear system

$$x_{k+1} = \Pi x_k + \Pi A_s^T u_k, \qquad y_k = A_s D_s x_k. \tag{4.5}$$

Therefore, the transfer function $W(z) = A_s D_s (zI - \Pi)^{-1} \Pi A_s^T$ of the system (4.5) is the Z-transform of the causal part of the correlation function, that is $\{r_k\}_{k=1}^{\infty}$. Consequently, $W(z^{-1})$ is the Z-transform of the anticausal part of the correlation function, $\{r_k\}_{k=-\infty}^{-1}$. The spectrum of the Markov chain signal $s_m$ can then be expressed as

$$\Phi_s(z) = W(z) + r_0 + W(z^{-1}). \tag{4.6}$$

The correlation $r_k$ is in general a matrix function, i.e. for each $k$, $r_k$ is an $n \times n$ matrix. The correlation function of the input signal is given by the sequence obtained for the element in position $(n, n)$: $\{r_k(n, n)\}_{k=0}^{\infty}$. Then, the input signal spectrum is given by the element in position $(n, n)$ of the matrix $\Phi_s(z)$. Consider now the two Markov chains in Examples 4.2.1 and 4.2.2 of the previous section.

Example 4.2.3 By calculating the transfer function $W(z)$ and the autocorrelation $r_0$, the following expression for the spectrum of the Markov chain of Figure 4.1 is obtained:

$$\Phi_s(z) = \frac{\sqrt{(1+\alpha)(1-\gamma)} \, \sqrt{(1+\alpha)(1-\gamma)}}{(z - \alpha)(z^{-1} - \alpha)}, \tag{4.7}$$

where $\alpha = p + q - 1 \in [-1, 1]$ and $\gamma = \dfrac{(p+q-1)(p+q-2) - (p-q)^2}{p+q-2}$. Notice that (4.7) is the spectrum of a first order AR process. This also means that it is possible to generate any first order AR process through a two states Markov chain. The mean value of $s_m$ is $E[s_m] = \dfrac{p-q}{2-p-q}$. By forcing $p = q$ the mean value would be zero and, since then $\gamma = \alpha$, the spectrum would depend on only one parameter.
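As a numerical illustration of (4.4), the following Python sketch evaluates $r_k$ for the two states chain of Example 4.2.1 and compares it with $E[s_m]^2 + (1 - E[s_m]^2)\alpha^k$. This closed form is a standard property of a two-state chain with values ±1 and is used here only as a cross-check; it is not stated in the text. The helper names `stationary` and `correlation` and the values of p and q are illustrative assumptions.

```python
import numpy as np

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def correlation(A_s, P, k):
    """r_k = A_s D_s P^k A_s^T, as in (4.4)."""
    D_s = np.diag(stationary(P))
    return A_s @ D_s @ np.linalg.matrix_power(P, k) @ A_s.T

p, q = 0.7, 0.4                       # illustrative values
P2 = np.array([[p, 1 - p], [1 - q, q]])
A_s = np.array([[1.0, -1.0]])         # states 1 and -1 as columns
alpha = p + q - 1
mean = (p - q) / (2 - p - q)          # mean value given in Example 4.2.3

for k in range(5):
    r_k = correlation(A_s, P2, k)[0, 0]
    closed_form = mean**2 + (1 - mean**2) * alpha**k
    print(k, r_k, closed_form)
```

The geometric decay with ratio α is what produces the single pole at z = α in (4.7).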
If $p = q$ the variance of the Markov chain is 1.

Example 4.2.4 Consider Example 4.2.2 of the previous section, where the mean value of the Markov chain is set to zero. Then $s = \dfrac{(1-q) r}{1-p}$ and the stationary probabilities are

$$\bar{p} = \frac{1}{2(1-p)(1-q) + 2r(1-q)} \begin{pmatrix} r(1-q) & (1-p)(1-q) & r(1-q) & (1-p)(1-q) \end{pmatrix}^T.$$

By definition,

$$A_s = \begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}.$$

However, only the second component of the Markov chain is of interest, since it represents the input signal. Therefore, the correlation $r_k$ of the input process can be calculated using the vector $\bar{A}_s = \begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}$ instead of $A_s$. The analytic calculation of the spectrum turns out to be too involved, so it has only been evaluated numerically. Some values of poles and zeros of the canonical spectral factor (see footnote 1) are reported in Table 4.1. These data show that it is not possible to model $s_m$ by an AR process, because in general there are also nonzero zeros in the spectrum. The four states Markov chain has a higher order spectrum than the two states Markov chain of Example 4.2.3; the number of poles and zeros depends on the values of the probabilities $p$, $r$ and $q$ and can be up to eight. For some values of the probabilities $p$, $r$ and $q$, there are zero-pole cancellations that reduce the spectrum order; in particular, when $p = q = r$ the spectrum has the same simple structure obtained for the previous case (see Table 4.1), as has already been shown in the previous section. A particular choice of the transition matrix for this Markov chain is

$$\Pi_4 = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & r & 1-r \\ 0 & 0 & p & 1-p \\ r & 1-r & 0 & 0 \end{pmatrix}. \tag{4.8}$$

This choice of the transition matrix makes the Markov chain symmetric, in the sense that the transition probabilities are invariant with respect to exchanges in the sign of the state components. Even if the input is designed in the probability domain, the spectrum is shaped by the choice of $n$ and of the transition probabilities of the Markov chain.
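The stationary probabilities of Example 4.2.4 and the symmetric matrix (4.8) can be checked numerically. The Python sketch below is only a sanity check under illustrative parameter values; the function name `four_state_matrix` is hypothetical. It verifies that the stated vector satisfies $\bar{p} = \bar{p}\Pi$, that the input has zero mean, and it computes the eigenvalues of $\Pi_4$, which are the candidate poles of $W(z)$ before any zero-pole cancellation.

```python
import numpy as np

def four_state_matrix(p, q, r, s):
    # State order as in Example 4.2.2: (1,1), (1,-1), (-1,-1), (-1,1)
    return np.array([
        [p, 1 - p, 0.0, 0.0],
        [0.0, 0.0, s, 1 - s],
        [0.0, 0.0, q, 1 - q],
        [r, 1 - r, 0.0, 0.0],
    ])

# Zero-mean chain: s is fixed by p, q and r (illustrative values).
p, q, r = 0.3, 0.5, 0.8
s = (1 - q) * r / (1 - p)
P = four_state_matrix(p, q, r, s)

den = 2 * (1 - p) * (1 - q) + 2 * r * (1 - q)
pi = np.array([r * (1 - q), (1 - p) * (1 - q),
               r * (1 - q), (1 - p) * (1 - q)]) / den
a_bar = np.array([1.0, -1.0, -1.0, 1.0])   # second state component

print("stationary:", np.allclose(pi @ P, pi))
print("input mean:", a_bar @ pi)

# Symmetric chain (4.8) with p = 0.8 and r = 0.2, i.e. q = p and s = r.
P_sym = four_state_matrix(0.8, 0.8, 0.2, 0.2)
eigs = np.sort(np.real(np.linalg.eigvals(P_sym)))
print("eigenvalues of the symmetric chain:", eigs)
```

For the symmetric chain, two of the eigenvalues equal plus and minus the square root of 0.6, approximately ±0.7746, a value that also appears among the poles listed in Table 4.1.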
Notice that $\Phi_u$ is only subject to (4.6) and no other structural constraints are imposed, except for the constraint that the transition probabilities have to lie in $[0, 1]$. This is not the case for the spectrum parametrizations described in Section 3.3. For example, if the FIR representation of the spectrum is used, the finite dimensional parametrization forces the positive real part of the spectrum to be an FIR system. The spectra of Markov chain signals have poles and zeros that are related to each other, since the number of free parameters in the problem (the transition probabilities) is $J$, the same as the number of poles of $W(z)$. However, the positive real part of the spectrum is not forced to be FIR. From these observations it is possible to conclude that, looking at input design using Markov chains in the frequency domain, this approach preserves more degrees of freedom for the choice of the optimal spectrum than the input design approach described in Chapter 3, since fewer constraints on the structure are imposed.

Footnote 1: By canonical spectral factor is meant the analytic and minimum phase function $L(z)$ such that $\Phi_s(z) = L(z) L(z^{-1})$.

| p | q | r | poles | zeros |
|---|---|---|---|---|
| 0.2 | 0.2 | 0.2 | −0.6000 | 0 |
| 0.8 | 0.2 | 0.2 | −0.3557 + 0.6161i, −0.3557 − 0.6161i, 0.7114 | 0, 0, 0 |
| 0.2 | 0.8 | 0.2 | −0.7746, 0.7746 | 0, 0.4514 |
| 0.2 | 0.2 | 0.8 | 0 + 0.7746i, 0 − 0.7746i | 0, 0 |
| 0.8 | 0.2 | 0.8 | 0 + 0.7746i, 0 − 0.7746i | 0, 0 |
| 0.8 | 0.8 | 0.2 | 0.7746, −0.7746 | 0, 0 |
| 0.2 | 0.8 | 0.8 | −0.3557 + 0.6161i, −0.3557 − 0.6161i, 0.7114 | 0, 0, 0 |
| 0.8 | 0.8 | 0.8 | 0.6000 | 0 |
| 0.3 | 0.5 | 0.8 | −0.0302 + 0.5049i, −0.0302 − 0.5049i, −0.1396 | 0, 0, −0.3008 |
| 0.2 | 0.5 | 0.8 | −0.1500 + 0.5268i, −0.1500 − 0.5268i | 0, −0.3333 |

Table 4.1: Poles and zeros of the canonical spectral factor of the spectrum of $s_m$ of Example 4.2.4.

In order to optimize an objective function for input design purposes by shaping the input process distribution, it is necessary to define the Markov chain state space and its transition probabilities.
That is, once the input model structure is defined, the objective function is optimized with respect to the transition probabilities. For the examples considered in this section, the purpose of the input design problem would be to optimize the objective function $J(u, \theta_0)$ with respect to the transition probabilities: $p$ in the first case, $p$ and $r$ in the second.

4.3 More general Markov chains

Input signals generated by the Markov chains described in the previous sections are binary signals. It is also possible to extend this type of Markov models to more general input signals. For example, Markov chains with state space $S = \{0, 1, -1, 2, -2, 3, -3, \ldots\}$ would generate signals having more than two amplitude levels. As a simple example, consider a three states Markov chain generating a ternary input signal. The Markov chain has state space $S_3 = \{0, 1, -1\}$ and the transition graph in Figure 4.3.

Figure 4.3: Graph representation of the three states Markov chain $S_3$.

To easily calculate an expression for the spectrum, the mean value of the process is set to zero, so that $p = q$. By solving $\bar{p} = \bar{p}\Pi$, the vector

$$\bar{p} = \begin{pmatrix} \dfrac{1-r}{3-2r-p} & \dfrac{1-p}{3-2r-p} & \dfrac{1-r}{3-2r-p} \end{pmatrix}$$

is found. The resulting spectrum is

$$\Phi_s(z) = \frac{3 (r-1)(p-1)(1+3p)}{2 (3-2r-p) \left( z - \frac{3p-1}{2} \right) \left( z^{-1} - \frac{3p-1}{2} \right)}.$$

This spectrum has the same structure as the one found for the two states Markov chain in Example 4.2.3. In this case, $\Phi_s$ depends on both $r$ and $p$: $p$ determines the pole and $r$ the gain of the spectrum. The expression found for $\Phi_s(z)$ shows that this input model does not provide a higher order spectrum than the two states Markov chain. In this case, it is therefore preferable to use a simpler model. This work focuses on input models that generate binary signals. The input models described in this section will not be further considered.

4.4 Conclusions

This chapter defined finite stationary Markov chains generating binary signals.
These processes have been introduced to model input signals for system identification purposes. The main advantages of using Markov chains as input models are that amplitude constraints are directly included into the input model and that the input signal can be easily generated from the optimal distribution, avoiding the realization problem. Spectral properties of these processes have also been analyzed through some examples in Section 4.2.2.

Chapter 5 Estimation Using Markov Chains

This chapter proposes a method for parameter estimation of an LTI SISO system by using the Markov chain input model presented in Chapter 4. Section 5.1 defines the input design problem that is studied here. The solution approach is presented in Section 5.2. The design method is described in detail in Sections 5.3 and 5.4. A numerical example, analyzed in Section 5.5, concludes the chapter.

5.1 Problem formulation

Consider the system (2.2) and the parametric model (2.1) defined in Section 2.1. In this chapter, the objective of the identification procedure is the estimation of the parameter vector $\theta$. The input design problem considered in this study is to minimize a measure of the estimate error, $f(P_{\theta_0})$, where $f$ is a convex function of the covariance matrix of the parameter estimate. Often in practice, it is also necessary to take into account some constraints on the real signals. In that case, a general cost function can be considered:

$$J(u, \theta_0) = f(P_{\theta_0}(u)) + g, \tag{5.1}$$

where $g$ is a term which represents the cost of the experiment. As explained in Chapter 3, typical functions $f$ are the trace, the determinant or the largest eigenvalue of the covariance matrix $P_{\theta_0}$ [1]. This problem formulation is slightly different from the one presented in Chapter 3. No constraints are set explicitly in the optimization problem; they are instead included into the cost function through the term $g$. The reason for this is that the stochastic approximation framework is used to minimize the objective function, since no analytic convex formulation of the problem is available. This can be seen as a classical multiobjective optimization approach (an overview can be found in [29] and [30]).

5.2 Solution approach

Since the analytic expressions for the covariance matrix $P_{\theta_0}$ as a function of the transition probabilities of the Markov chain modeling the input are quite involved, simulation techniques are required to evaluate the cost function. The estimate of $P_{\theta_0}$ is a stochastic function of the one-step transition probabilities and is contaminated with noise; therefore, it can only be evaluated through randomly generated input and noise signals. In this framework, stochastic approximation algorithms are needed to minimize the cost function (5.1) with respect to the transition probabilities of the Markov chain (see [15], [16] for details). An expression for $P_{\theta_0}$ that suits the stochastic approximation approach is derived in Section 5.3; $P_{\theta_0}$ is estimated as a function of input and noise data. The stochastic algorithm used to minimize the cost function (5.1) is described in Section 5.4.

5.3 Cost function evaluation

From the model expression (2.1) it is possible to write

$$e(t) = H(q, \theta)^{-1} \left( y(t) - G(q, \theta) u(t) \right)$$

and, by linearizing the functions $G(q, \theta)$ and $H(q, \theta)$ at $\theta = \theta_0$,

$$G(q, \theta) \approx G_0(q) + \Delta G(q, \theta), \qquad H(q, \theta) \approx H_0(q) + \Delta H(q, \theta),$$

where $\Delta G(q, \theta) = (\theta - \theta_0)^T \left. \dfrac{\partial G(q, \theta)}{\partial \theta} \right|_{\theta_0}$ and $\Delta H(q, \theta) = (\theta - \theta_0)^T \left. \dfrac{\partial H(q, \theta)}{\partial \theta} \right|_{\theta_0}$, the following expression is derived:

$$e(t) = \left( H_0(q) + \Delta H(q, \theta) \right)^{-1} \left( H_0(q) e_0(t) - \Delta G(q, \theta) u(t) \right). \tag{5.2}$$

By substituting the Taylor expansion

$$\left( H_0(q) + \Delta H(q, \theta) \right)^{-1} \approx \frac{1}{H_0(q)} - \frac{\Delta H(q, \theta)}{H_0(q)^2}$$

and the expressions of $\Delta G(q, \theta)$ and $\Delta H(q, \theta)$, and neglecting second order terms, it results

$$e_0(t) \approx (\theta - \theta_0)^T \left( \frac{1}{H_0(q)} \left. \frac{\partial H(q, \theta)}{\partial \theta} \right|_{\theta_0} e_0(t) + \frac{1}{H_0(q)} \left. \frac{\partial G(q, \theta)}{\partial \theta} \right|_{\theta_0} u(t) \right) + e(t). \tag{5.3}$$

The problem of estimating the parameter $\theta$ for the model (2.1) is asymptotically equivalent to solving the least squares problem for (5.3), where $e_0$ and $u$ are known, when the number of data points $N$ used for estimation goes to infinity. Therefore, the asymptotic expression (2.7) can be approximated as

$$P_{\theta_0}^{-1} = \frac{1}{\lambda_0} \left( S^T S \right),$$

where $S = \begin{pmatrix} w_1 & \ldots & w_b \end{pmatrix} \in \mathbb{R}^{N \times b}$ and $w_i \in \mathbb{R}^{N \times 1}$ is the sequence obtained from

$$w_i^t = \frac{1}{H_0(q)} \left. \frac{\partial G(q, \theta)}{\partial \theta_i} \right|_{\theta_0} u(t) + \frac{1}{H_0(q)} \left. \frac{\partial H(q, \theta)}{\partial \theta_i} \right|_{\theta_0} e_0(t).$$

Therefore, at each iteration of the algorithm, the cost function is evaluated using randomly generated input and noise signals.

5.4 Algorithm description

When evaluating the cost function by simulation, it is necessary to consider that the cost function estimate is a stochastic variable that depends on the transition probabilities of the Markov chain and on the noise process $e(t)$. Therefore, the cost function values generated through simulation have to be considered as samples of that stochastic variable. The true value of the cost function for a given transition probability would be the mean of that stochastic variable. For these reasons, stochastic approximation is necessary in order to minimize the cost function with respect to the transition probabilities of the Markov chain describing the input. One of the most common stochastic approximation methods that do not require knowledge of the cost function gradient is the finite difference stochastic approximation (FDSA) [15]. It uses the recursion

$$\hat{p}_{k+1} = \hat{p}_k - a_k \widehat{\nabla J}_k, \tag{5.4}$$

where $\widehat{\nabla J}_k$ is an estimate of the gradient of $J$ at the $k$-th step and $a_k$ is a sequence such that $\lim_{k \to \infty} a_k = 0$. The FDSA estimates the gradient of the cost function as

$$\widehat{\nabla J}_{ki} = \frac{J(\hat{p}_k + c_k e_i) - J(\hat{p}_k - c_k e_i)}{2 c_k},$$

where $e_i$ denotes the unit vector in the $i$-th direction, $\widehat{\nabla J}_{ki}$ is the $i$-th component of the gradient vector and $c_k$ is a sequence of coefficients converging to zero as $k \to \infty$.
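The recursion (5.4) with the finite difference gradient estimate can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the thesis: the function name `fdsa_minimize`, the default gain values, the clipping of the iterate to [0, 1] (so that it remains a valid probability) and the noisy quadratic `toy_cost`, which stands in for a simulation-based evaluation of (5.1), are all assumptions.

```python
import numpy as np

def fdsa_minimize(cost, p0, n_iter, a=1.0, A=10.0, c=0.1):
    """FDSA recursion (5.4) using only noisy evaluations of the cost."""
    p = np.atleast_1d(np.asarray(p0, dtype=float)).copy()
    d = p.size
    for k in range(n_iter):
        a_k = a / (A + k + 1)               # step size, decays to zero
        c_k = c / (k + 1) ** (1.0 / 3.0)    # perturbation size, decays to zero
        grad = np.empty(d)
        for i in range(d):
            e_i = np.zeros(d)
            e_i[i] = 1.0
            # Two cost evaluations per parameter: 2*d evaluations per step.
            grad[i] = (cost(p + c_k * e_i) - cost(p - c_k * e_i)) / (2.0 * c_k)
        # Assumption: project back onto [0, 1] to keep valid probabilities.
        p = np.clip(p - a_k * grad, 0.0, 1.0)
    return p

# Toy stand-in for the simulated cost: a quadratic observed in noise.
rng = np.random.default_rng(0)

def toy_cost(p):
    return float(np.sum((p - 0.87) ** 2) + 0.01 * rng.standard_normal())

p_hat = fdsa_minimize(toy_cost, p0=[0.5], n_iter=2000)
print(p_hat)
```

Each call to `cost` here would correspond to one simulation with freshly generated input and noise signals, so the value returned at a given point changes from call to call.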
Depending on the number $d$ of parameters with respect to which the cost function is minimized, a simultaneous perturbation stochastic approximation (SPSA) may be more efficient than the FDSA [31]: the FDSA needs $2d$ cost function evaluations per iteration, so when $d$ increases the number of evaluations may become too large and the algorithm very slow. In that case the SPSA algorithm described in [31] gives better performance, since it requires only two evaluations of the cost function per iteration regardless of $d$. In fact, SPSA estimates the gradient by

$$\widehat{\nabla J}_{ki} = \frac{J(\hat{p}_k + c_k \Delta_k) - J(\hat{p}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}},$$

where $\Delta_k$ is a $d$-dimensional random perturbation vector, whose components are independently generated from a Bernoulli $\pm 1$ distribution with probability 0.5 for each outcome [32]. The iteration (5.4) is initialized by a first evaluation of the cost function on a discrete set of points, choosing the minimum in that set. At any point in this set, the cost function is evaluated only once; therefore, the value obtained is a sample extracted from the stochastic variable describing the cost function at that point. As a consequence, the initial condition may turn out not to be close to the true minimum of the cost function, due to noise in the measurements. Nevertheless, in some cases the result of the initialization procedure may be sufficiently accurate, so that there is no need to run many algorithm iterations. This of course depends on the shape of the cost function and on the choice of the grid of points. The sequences $a_k$ and $c_k$ can be chosen as $a_k = \dfrac{a}{A + k + 1}$ and $c_k = \dfrac{c}{(k+1)^{1/3}}$, which are asymptotically optimal for the FDSA algorithm (see [15]). A method for choosing $A$, $a$ and $c$ may be to estimate the gradient of the cost function at the initial condition, so that the product $a_0 \widehat{\nabla J}_0$ has magnitude approximately equal to the expected changes among the elements of $\hat{p}_k$ in the early iterations [32].
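The simultaneous perturbation gradient estimate can be sketched in Python in the same spirit. Again this is a hedged illustration rather than the thesis implementation: the function name `spsa_minimize`, the gain values, the clipping to [0, 1] and the two-parameter noisy quadratic `toy_cost` are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def spsa_minimize(cost, p0, n_iter, a=1.0, A=10.0, c=0.1, seed=0):
    """SPSA: two cost evaluations per iteration, regardless of dimension."""
    rng = np.random.default_rng(seed)
    p = np.atleast_1d(np.asarray(p0, dtype=float)).copy()
    for k in range(n_iter):
        a_k = a / (A + k + 1)
        c_k = c / (k + 1) ** (1.0 / 3.0)
        # Bernoulli +/-1 perturbation, each outcome with probability 0.5.
        delta = rng.choice([-1.0, 1.0], size=p.size)
        diff = cost(p + c_k * delta) - cost(p - c_k * delta)
        grad = diff / (2.0 * c_k * delta)   # elementwise division by delta_i
        # Assumption: project back onto [0, 1] to keep valid probabilities.
        p = np.clip(p - a_k * grad, 0.0, 1.0)
    return p

# Toy stand-in for a simulated cost depending on two probabilities.
noise_rng = np.random.default_rng(1)
target = np.array([0.85, 0.64])             # illustrative optimum

def toy_cost(p):
    return float(np.sum((p - target) ** 2) + 0.01 * noise_rng.standard_normal())

p_hat = spsa_minimize(toy_cost, p0=[0.5, 0.5], n_iter=3000)
print(p_hat)
```

Compared with a finite difference scheme, the per-iteration cost does not grow with the number of transition probabilities, at the price of a noisier gradient estimate in each step.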
The coefficient $c$ (as suggested in [32]) ought to be greater than the variance of the noise in the cost function measurements in order to have a good estimate of the gradient. This variance may be estimated at the initial condition of the algorithm. An analytic proof of the algorithm convergence can be found in [15].

5.5 Numerical example

Consider a mass-spring-damper system (Figure 5.1), where the input $u$ is the force applied to the mass and the output $y$ is the mass position.

Figure 5.1: Mass-spring-damper system.

It is described by the transfer function

$$G_0(s) = \frac{\frac{1}{m}}{s^2 + \frac{c}{m} s + \frac{k}{m}}$$

with $m = 100$ kg, $k = 10$ N/m and $c = 6.3246$ N s/m, resulting in the natural frequency $\omega_n = 0.3162$ rad/s and the damping $\xi = 0.1$. The power here is defined as $pw(t) = u(t) \dot{y}(t)$. White noise with variance $\lambda_0 = 0.0001$ is added at the output and an output-error model is used [19]. Data are sampled with $T_s = 1$ s and the number of data points generated is $N = 1000$. As a measure of the estimate accuracy, the trace of the covariance matrix $P_{\theta_0}$ is used. In order to consider also some practical constraints on the amplitude of the input and output signals and on the maximum and mean input power, a general cost function is used:

$$J(u, \theta_0) = f_1(\mathrm{Tr} P_{\theta_0}(u)) + f_2(u_{max}) + f_3(y_{max}) + f_4(pw_{max}) + f_5(pw_{mean}), \tag{5.5}$$

where $u_{max}$ and $y_{max}$ are the absolute maximum values of the input and output signals, and $pw_{max}$ and $pw_{mean}$ are the maximum and mean input power. Thresholds for $\mathrm{Tr} P_{\theta_0}$, $u_{max}$, $y_{max}$, $pw_{max}$ and $pw_{mean}$ have been set, which define the maximum values allowed for each of these variables. Figures 5.2 and 5.3 show the cost functions $f_2$, $f_3$ and $f_1$, $f_4$, $f_5$, respectively: when the variables $\mathrm{Tr} P_{\theta_0}$, $u_{max}$, $y_{max}$, $pw_{max}$ and $pw_{mean}$ reach their maximum acceptable value (100%), the cost is one. Outside the interval of acceptable values, the cost functions continue growing linearly.

Figure 5.2: Cost functions $f_2$ and $f_3$ in the interval of acceptable values of the variables $u_{max}$ and $y_{max}$, respectively.

Figure 5.3: Cost functions $f_1$, $f_4$, $f_5$ in the interval of acceptable values.

As input models, the two simple examples of Markov chains introduced in Section 4.2 are considered here. They are described by the transition matrices

$$\Pi_2 = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}, \qquad \Pi_4 = \begin{pmatrix} p & 1-p & 0 & 0 \\ 0 & 0 & r & 1-r \\ 0 & 0 & p & 1-p \\ r & 1-r & 0 & 0 \end{pmatrix}.$$

The set of points used for the algorithm initialization, as explained in Section 5.4, is $\{0.1, 0.2, \ldots, 0.9\}$. In the case analyzed here, since the cost function depends on no more than two parameters, FDSA is used. The algorithm coefficients have been chosen by the method suggested in the previous section. Three cases have been studied:

1. The cost associated to $\mathrm{Tr} P_{\theta_0}$ and the costs associated to the physical constraints have comparable values.
2. No power and amplitude constraints are considered.
3. Very strict power constraints are considered.

These cases are summarized in Table 5.1.

| Case | $\mathrm{Tr} P_{\theta_0}$ | $u_{max}$ | $y_{max}$ | $pw_{max}$ | $pw_{mean}$ |
|---|---|---|---|---|---|
| 1 | 5 × 10⁻⁶ | 1 N | 1 m | 0.3 N m/s | 0.03 N m/s |
| 2 | 5 × 10⁻⁶ | Inf N | Inf m | Inf N m/s | Inf N m/s |
| 3 | 5 × 10⁻⁶ | 1 N | 1 m | 0.03 N m/s | 0.003 N m/s |

Table 5.1: Maximum threshold values in the three analyzed cases.

As a term of comparison for the performance of the Markov input model, a pseudo-random binary signal and white noise with unit variance (the same as the variance of the Markov chains) have been applied as inputs to the system. The results of the simulation runs for the three cases listed above are shown in Tables 5.4, 5.5 and 5.6. The cost function values are estimated as the average of 100 simulation runs using the optimal input found by the algorithm, the PRBS and the white noise inputs. Table 5.5, related to case 2, also shows the optimal value of the trace of the covariance matrix calculated by solving the LMI formulation of the input design problem in the frequency domain, as explained in [12]. Furthermore, by the method described in [13], a binary signal having the optimal correlation function is generated. The minimum obtained with this input signal is also shown in Table 5.5. The second case, which is the most standard in input design problems, is analyzed in detail first. Figure 5.4 presents the cost function, estimated on a fine grid of points as the average of 100 simulations.

Figure 5.4: Estimate of the cost function on a discrete set of points for the two states Markov chain in the case 2.

Table 5.2 exhibits the results of two Monte-Carlo simulations (each consisting of 100 runs), which show that the variance of the algorithm output decreases approximately as $\frac{1}{N_{Iter}}$, where $N_{Iter}$ is the number of algorithm iterations; this supports the empirical convergence of the algorithm.

| $N_{Iter}$ | Mean value $E\hat{p}$ | Variance $\mathrm{Var}\,\hat{p}$ |
|---|---|---|
| 1000 | 0.8657 | 4.6 × 10⁻⁴ |
| 2000 | 0.8671 | 2.5 × 10⁻⁴ |

Table 5.2: Results of 100 Monte-Carlo simulations of the algorithm with the 2 states Markov chain.

With 10000 iterations the algorithm produces the results in Figure 5.5. The optimality of the probability $\hat{p}$ found by the algorithm has been verified by using the expression of the two states Markov chain spectrum in the asymptotic expression (2.7) and minimizing $\mathrm{Tr} P_{\theta_0}$ with respect to $\alpha$; it turns out that the optimal value $\hat{p} = 0.8714$ is very close to the one found by the stochastic algorithm after 30000 iterations, that is $\hat{p} = 0.8712$ (Table 5.3). This confirms that the stochastic algorithm converges to the true optimal value. In practice, it is not necessary to run the algorithm for 30000 iterations, since already at the initial condition the cost function is very close to the minimum and the variance of the estimate after 10000 iterations is of the order of $10^{-5}$. It has been done here, anyway, to show that the final value obtained is the true optimal one.
Figure 5.5: Estimation of the best transition probability for the two states Markov chain in the case 2.

Case   S2            S4
1      p̂ = 0.4720    [p̂ r̂] = [0.4730 0.6794]
2      p̂ = 0.8712    [p̂ r̂] = [0.8494 0.6445]
3      p̂ = 0.1100    [p̂ r̂] = [0.0005 0.2981]

Table 5.3: Optimal values of the transition probabilities in the cases 1, 2 and 3, obtained after 30000 algorithm iterations.

Notice from the results in Table 5.5 that the Markov chains give lower values of the trace of $P_{\theta_0}(u)$ than all the other inputs, except the true optimal spectrum. The frequencies of the optimal input spectrum for the case 2 have been estimated by means of the Multiple Signal Classification (MUSIC) method, described in [26]. The optimal input turns out to consist of two sinusoids of frequencies 0.3023 rad/s and 0.3571 rad/s, respectively, where the main contribution is given by the higher-frequency sinusoid, which has approximately 5.6 times the power of the first component. Note that these frequencies are very close to the natural frequency of the system and to the poles of the Markov chains spectra (Figures 5.10 and 5.11).

J(u, θ0)   S2       S4       PRBS     WN
           1.2758   1.2788   1.2564   20.1326

Table 5.4: Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.

Tr P_θ0    S2        S4        PRBS      WN        BI        Optimum
           1.43e-7   1.59e-7   4.35e-7   4.66e-7   2.18e-6   2.85e-8

Table 5.5: Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function and the optimal spectrum in case 2.

J(u, θ0)   S2      S4      PRBS     WN
           78.51   73.58   163.85   484.94

Table 5.6: Total cost function value obtained with the optimal Markov inputs, a PRBS and white noise in case 3.

Figures 5.6 and 5.7 show the trajectories of the probability estimates obtained for the four states Markov chain in case 2. The empirical speed of convergence is lower than for the two states Markov chain.
Nevertheless, the cost function value does not change significantly if the algorithm is stopped after 2000 iterations.

Figure 5.6: Estimation of the best transition probability p for the four states Markov chain in the case 2.

Figure 5.7: Estimation of the best transition probability r for the four states Markov chain in the case 2.

Case 1 analyzes the more practical situation in which amplitude and power constraints on the signals have to be considered. In this case, the cost functions obtained for the two and the four states Markov chains are presented in Figures 5.8 and 5.9. Notice that, despite the presence of noise, the estimated cost function appears convex; therefore, the problem has a solution.

Figure 5.8: Estimate of the cost function on a discrete set of points for the two states Markov chain in the case 1.

Figure 5.9: Estimate of the cost function on a discrete set of points for the four states Markov chain in the case 1.

Power constraints move the minimum of the cost function to smaller probability values; that is, the transition probabilities cannot be too large, otherwise the input excitation would not respect the power constraints. In case 1, the Markov inputs and the PRBS signal give almost the same cost value (Table 5.4). This happens because the optimal values of the transition probabilities are approximately 0.5, which means that the Markov chain signal is essentially binary white noise (and this depends on the choice of the threshold values). Note in Figures 5.10 and 5.11 that in this case the spectra of the Markov chains are almost constant. The white input, which is generated with a Gaussian distribution, gives a much higher cost value, due to its amplitude and power.

Also in the third case, when stricter power constraints are imposed in the problem, the use of a Markov chain is preferable (see results in Table 5.6).
Therefore, when amplitude and power constraints have to be considered in the input design problem, the Markov chain model can significantly improve system identification. The optimal distribution is easily estimated by simulation of the real system.

Notice that the two states Markov chain performs slightly better than the four states Markov chain in the first two cases, while in the third case, when stricter power constraints are considered, the four states Markov chain achieves the lowest cost function value. The reason is that in cases 1 and 2 the optimal input structure is the two states Markov chain; therefore, the stochastic approximation algorithm performs better if the simple input model is used, rather than a more general one that requires more parameters to be tuned. In fact, notice that in cases 1 and 2 (Table 5.3) the optimal transition probabilities $\hat{p}$ for the four states Markov chain are close to the corresponding optimal probabilities $\hat{p}$ for the two states Markov chain. However, the probabilities $\hat{r}$ are not equal to the corresponding $\hat{p}$, and this explains the worse performance of the four states Markov chain in those two cases.

Finally, after having calculated optimal transition probabilities for the considered input models and compared them to other input signals, the system parameters have to be identified. Their real values are:

$$\frac{1}{m} = 0.01\ \mathrm{Kg}^{-1} \qquad (5.6)$$
$$\frac{c}{m} = 0.06325\ \frac{\mathrm{Ns}}{\mathrm{m}} \qquad (5.7)$$
$$\frac{k}{m} = 0.1\ \frac{\mathrm{N}}{\mathrm{m}} \qquad (5.8)$$

The results of the identification procedure using the optimal Markov chains are shown in Tables 5.7 and 5.8, and Figures 5.12 and 5.13. Notice that both the parameters and the frequency response of the real continuous system are correctly estimated. The relative errors on the parameter estimates are mostly under 1%. When power constraints have to be included, the estimate error is a bit larger. The best results are achieved in case 2 with the two states Markov chain.

Figure 5.10: Bode diagrams of the optimal spectra of the 2 states Markov chains in the cases 1, 2 and 3 of Table 5.3, and of the real discrete system.

Figure 5.11: Bode diagrams of the optimal spectra of the 4 states Markov chains in the cases 1, 2 and 3 of Table 5.3, and of the real discrete system.

Case   1/m [Kg^-1]            c/m [Ns/m]             k/m [N/m]
       est. val.   err. %     est. val.   err. %     est. val.   err. %
1      0.0101      1%         0.06355     0.4743%    0.09997     0.03%
2      0.01        0%         0.06329     0.0632%    0.1         0%
3      0.009688    3.12%      0.06023     4.7747%    0.09948     0.52%

Table 5.7: Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two states Markov chains.

Figure 5.12: Estimates of the frequency response of the system using the optimal two states Markov chains for the cases 1, 2 and 3 in Table 5.1.

Case   1/m [Kg^-1]            c/m [Ns/m]             k/m [N/m]
       est. val.   err. %     est. val.   err. %     est. val.   err. %
1      0.009936    0.64%      0.06261     1.0119%    0.1         0%
2      0.01        0%         0.06314     0.1739%    0.0999      0.1%
3      0.01        0%         0.06266     0.9328%    0.09996     0.04%

Table 5.8: Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four states Markov chains.

Figure 5.13: Estimates of the frequency response of the system using the optimal four states Markov chains for the cases 1, 2 and 3 in Table 5.1.

5.6 Conclusions

This chapter proposed a solution to the input design problem for parameter estimation using Markov chains. A stochastic approximation algorithm has been used to minimize the objective function in the input design problem. The solution is the probability distribution of the Markov chain, from which it is possible to easily generate a binary input signal.
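Generating the input from the designed distribution amounts to sampling a Markov chain. The sketch below assumes the two states chain keeps its current state with probability p and switches with probability 1 - p, on the state space {1, -1}; the function name and the sample size are illustrative choices. It uses the case 2 value of Table 5.3 and compares the empirical lag-one correlation with 2p - 1, the value implied by that transition structure.

```python
import random


def simulate_two_state_chain(p, n_samples, rng, states=(1, -1)):
    """Sample a binary signal; p is the probability of keeping the current state."""
    idx = rng.randrange(2)          # start from the uniform stationary distribution
    signal = []
    for _ in range(n_samples):
        signal.append(states[idx])
        if rng.random() >= p:       # switch state with probability 1 - p
            idx = 1 - idx
    return signal


rng = random.Random(1)
p_hat = 0.8712                      # two states optimum reported for case 2
u = simulate_two_state_chain(p_hat, 50_000, rng)

# Empirical E[u(t) u(t+1)], to be compared with 2p - 1.
lag1 = sum(a * b for a, b in zip(u, u[1:])) / (len(u) - 1)
print(sorted(set(u)), round(lag1, 3), round(2 * p_hat - 1, 3))
```

No filtering or spectral factorization step appears: the amplitude constraint is satisfied by construction, since every sample is one of the two states.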
From the results in Table 5.5, the example in Section 5.5 showed that the Markov chain model gives a trace of the covariance matrix about 10 times lower than the value obtained with a binary input which approximates the process having the optimal correlation function. The results in Tables 5.4 and 5.6 also show that the Markov chain model performs comparably to or better than commonly used inputs such as PRBS and white noise. This means that the Markov chain input model can improve system identification considerably.

Chapter 6
Zero Estimation

In control applications it is often important to have accurate estimates of the frequency response of a system in a certain frequency band. Knowledge of a non-minimum phase zero of a scalar linear system gives relevant information in this sense; in fact, NMP zeros limit the achievable bandwidth [33]. In identification procedures, this information may also be useful for estimating the model structure and order. Input design for accurate zero identification is then often required.

The quantification of the variance of the estimated zero is particularly important. Many expressions for the variance of the estimated zero, mainly asymptotic ones, can be found in the literature.

Input design for NMP zero identification for LTI systems is presented in Section 6.1, following the problem formulation in [17]. For FIR and ARX systems, the optimal input spectrum can be found analytically, as shown in Sections 6.2 and 6.3, respectively. In these cases, the solution is asymptotic in the number of data, but it does not depend on the model order, provided the order is sufficiently high so that there is no undermodeling. An asymptotically optimal solution, in the number of data and in the model order, for general SISO LTI systems is presented in Section 6.4. The adaptive algorithm presented in [18], which generates an input signal to consistently estimate the NMP zero, is described in Section 6.5.
It has then been modified to fit with the Markov chain input model and generate an adaptive Markov chain signal. A numerical example compares the two methods.

6.1 Problem formulation

Consider the SISO LTI system (2.2) and the class of parametric models (2.1) defined in Section 2.1. PEM is used to estimate the real parameter vector $\theta$. The assumptions 1, 2 and 3 in Section 2.2 are considered for the consistency of the method. These lead to the asymptotic normality of the parameter estimate and the expression (2.7) of the inverse of the covariance matrix $P_{\theta_0}$.

Consider a particular zero of the polynomial $B(q, \theta) = b_1 + b_2 q^{-1} + \cdots + b_{n_b} q^{-n_b}$, denoted $z_k(\theta)$. Assume all zeros of $B(q, \theta)$ have multiplicity one. Let $z_k^0 = z_k(\theta_0)$ and $\hat{z}_k = z_k(\hat{\theta}_N)$ denote the real value and the estimate of the zero $z_k(\theta)$ (the same notation as in [17] is used here). In [34] it is shown that the asymptotic variance of the zero estimate in the number of data can be expressed as

$$\lim_{N \to \infty} \mathrm{E}\left[\hat{z}_k - z_k^0\right]^2 = \alpha^2\, \Gamma_b^{*}(z_k^0)\, P_b\, \Gamma_b(z_k^0), \qquad (6.1)$$

where

$$\alpha^2 = \frac{\lambda_0 \left|z_k^0\right|^2}{N \left|\tilde{B}(z_k^0)\right|^2} \qquad (6.2)$$

$$\tilde{B}(q, \theta) = \frac{B(q, \theta)}{1 - z_k(\theta)\, q^{-1}} \qquad (6.3)$$

$$\Gamma_b = \begin{pmatrix} 1 & q^{-1} & \dots & q^{-n_b} \end{pmatrix}^T \qquad (6.4)$$

and $P_b$ is the covariance matrix of the estimates of the parameter vector $\theta_b = \begin{pmatrix} b_1 & \dots & b_{n_b} \end{pmatrix}$.

If $z_k^0$ is a non-minimum phase zero, the following expression for the asymptotic variance of the zero estimate is found in [17]:

$$\lim_{n_b \to \infty} \lim_{N \to \infty} \mathrm{E}\left[\hat{z}_k - z_k^0\right]^2 = \frac{\alpha^2 \left|H(z_k^0)\right|^2 \left|A(z_k^0)\right|^2}{\left(1 - \left|z_k^0\right|^{-2}\right) \left|Q(z_k^0)\right|^2}, \qquad (6.5)$$

where $Q(q)$ is a minimum phase filter for the input, i.e. $u(t) = Q(q)\, v(t)$, $v$ being white noise with unit variance. Notice that this expression is asymptotic also in the model order.
The input design problem considered here is:

$$\min_{\Phi_u}\ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega \quad \text{subject to} \quad \mathrm{Var}(\hat{z}_k) \le \gamma \qquad (6.6)$$

The objective function to minimize is the input power, under a quality constraint on the variance of the zero estimate, which can be expressed as (6.1) or (6.5). In [17] it is shown that the problem (6.6) can be formulated as a convex optimization problem in the input spectrum. The point is to express the constraint as a convex function of $\Phi_u$. Using the general asymptotic expression of the variance of the zero estimate (6.1), the convex formulation of the problem is:

$$\min_{\Phi_u}\ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_u(\omega)\, d\omega \quad \text{subject to} \quad P_{\theta_0}^{-1} - \frac{\alpha^2}{\gamma}\, \Gamma_{b0} \Gamma_{b0}^{*} \ge 0 \qquad (6.7)$$

where $\Gamma_{b0} = \begin{pmatrix} \Gamma_b^T & 0 \end{pmatrix}^T$. The problem is feasible but infinitely parametrized. An input spectrum parametrization as discussed in Section 3.3 is then needed to solve the problem numerically. Nevertheless, [17] shows that for some simple cases it is possible to derive analytical solutions to this problem. These are in particular FIR and ARX models.

6.2 FIR

Consider the FIR system

$$y(t) = q^{-n_k} B(q, \theta)\, u(t) + e(t) \qquad (6.8)$$

and a non-minimum phase zero $z_k^0$ of $B(q, \theta_0)$. The optimal input in this case has the following spectrum [17]:

$$\Phi_u(z) = Q(z)\, Q(z)^{*} \qquad (6.9)$$

$$Q(z) = \frac{\alpha}{\sqrt{\gamma}}\, \frac{\sqrt{1 - (z_k^0)^{-2}}}{1 - (z_k^0)^{-1} z^{-1}}. \qquad (6.10)$$

The variance of the zero estimate is $\gamma$ and the required input energy is $\alpha^2 / \gamma$. Notice that the optimal input is independent of the model order, which, however, has to be equal to or greater than the true system order. Consequently, if there is no undermodeling and the optimal input is chosen, the variance of the zero estimate will not depend on the model order either [6].

6.3 ARX

In [6] it is proven that for the ARX model

$$y(t) = \frac{q^{-n_k} B(q, \theta_b)}{A(q, \theta_a)}\, u(t) + \frac{1}{A(q, \theta_a)}\, e(t), \qquad (6.11)$$

the optimal input spectrum is exactly the same as the one found in Section 6.2.
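The two claims attached to (6.10), input power $\alpha^2/\gamma$ and zero variance $\gamma$, can be checked numerically. The sketch below is a consistency check under stated assumptions, not part of the original derivation: the zero is taken real, the values of `z0`, `alpha` and `gamma` are hypothetical, and the variance is evaluated from the asymptotic expression (6.5) with $H = A = 1$, which is how the FIR case is read here.

```python
import math


def optimal_input_filter(z0, alpha, gamma):
    """Return (gain, pole) of Q(z) = gain / (1 - pole * z^-1), following (6.10)."""
    pole = 1.0 / z0
    gain = alpha / math.sqrt(gamma) * math.sqrt(1.0 - z0 ** -2)
    return gain, pole


z0, alpha, gamma = 3.0, 0.2, 0.01   # hypothetical illustration values
gain, pole = optimal_input_filter(z0, alpha, gamma)

# Input power: sum of squared impulse response coefficients gain * pole**k
# (white noise with unit variance is assumed at the filter input).
power = sum((gain * pole ** k) ** 2 for k in range(200))

# Variance from (6.5) with H = A = 1 and Q evaluated at z = z0.
q_at_z0 = gain / (1.0 - pole / z0)
variance = alpha ** 2 / ((1.0 - abs(z0) ** -2) * abs(q_at_z0) ** 2)

print(power, alpha ** 2 / gamma)
print(variance, gamma)
```

The filter is a first order autoregression with pole at $1/z_k^0$, so the closer the zero is to the unit circle, the more the input power is concentrated at low frequencies.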
The same observations made in the previous section, regarding the independence of the input spectrum and the variance of the zero estimate from the model order, can be repeated in this case. Furthermore, the optimal input spectrum and the variance of the zero estimate are also independent of the polynomial $A$.

6.4 General linear SISO systems

For general SISO models it is possible to derive analytical solutions to the input design problem based only on the asymptotic covariance expression (6.5). In that case, the optimal input spectrum found in [6] is

$$\Phi_u(z) = Q(z)\, Q(z)^{*} \qquad (6.12)$$

$$Q(z) = \frac{\alpha}{\sqrt{\gamma}}\, \frac{\sqrt{1 - (z_k^0)^{-2}}\ \left|H(z_k^0)\, A(z_k^0)\right|}{1 - (z_k^0)^{-1} z^{-1}}. \qquad (6.13)$$

The analytic expression (6.13) found for general SISO systems is correct when the model order is sufficiently high. If this is not the case, problem (6.7) should be solved numerically. By a suitable parametrization of the input spectrum, it is possible to write the problem as a finite dimensional convex problem that can be solved by efficient numerical optimization methods [17]. Parametrizations of the input spectrum have been discussed in Section 3.3.

In [17] a numerical example shows that when the model order equals the real system order the optimal input is in general a sum of sinusoids of different frequencies, while if the model order is increased, the optimal spectrum has a different shape, given by the expression (6.13). The same paper also shows that there is a significant benefit in using the optimal input instead of a white input: the variance of the estimated zero decreases, especially when the zero is close to the unit circle. Also, it is shown that the input spectrum obtained by substituting the true value of the zero by the zero estimate is robust with respect to the true zero position.

6.5 Adaptive algorithm for time domain input design

6.5.1 A motivation

So far, only the case in which the real system (2.2) is contained in the model class (2.1) has been considered, and under that condition an expression for the optimal input spectrum was given, valid when the model order is sufficiently high. It has been shown in [35] that, under certain conditions, an input that is designed to be optimal for a scalar cost function and a full order model results in experimental data for which reduced order models can also be used to identify the property of interest. For example, in the zero estimation problem, [35] proves that the input

$$u(t) = z_0^{-1} u(t-1) + r(t-1), \qquad (6.14)$$

where $z_0$ is a non-minimum phase zero of the real system and $r$ is zero mean white noise with unit variance, can be used to consistently estimate $z_0$ by a two parameter FIR model

$$y(t) = \theta_1 u(t-1) + \theta_2 u(t-2). \qquad (6.15)$$

That means that if the input (6.14) is applied to the real system (2.2) and the NMP zero $z_0$ is estimated as the zero of the FIR model (6.15), that is $-\theta_2 / \theta_1$, then the estimation error converges to zero as the number of data goes to infinity.

Since the real value of the zero is not known, the optimal input has to be modified in order to be used in practical situations. There are two main approaches: the first is to design the input taking into account that the real system is uncertain a priori [3]; the second is an adaptive design where the input is successively updated as new information becomes available [18, 36]. It has been proven in [37] and [38] that when the real system is in the model set and an ARX model is used, adaptive input design achieves asymptotically the same accuracy as the optimal input (6.10) based on the knowledge of the real system. The idea of an adaptive solution to input design for general SISO systems in case of undermodeling is introduced in [18].
It iteratively generates the input (6.10), where $z_0$ is replaced by the latest available estimate of the zero.

6.5.2 Algorithm description

Consider the system (2.2) defined in Section 2.1. Let $z_0$ be a non-minimum phase zero of $B_0(q) = B(q, \theta_0)$. An FIR model is used to estimate the zero,

$$y(t) = \varphi(t)^T \theta,$$

where $\varphi(t)^T = \begin{bmatrix} u(t-1) & u(t-2) \end{bmatrix}$, and the input is

$$u(t) = \rho_{t-1}\, u(t-1) + \sqrt{1 - \rho_{t-1}^2}\; r(t-1), \qquad (6.16)$$

where $r(t)$ is zero mean and unit variance white noise and $\rho_t = 1/\hat{z}_t$ [18]. The zero estimate is updated by the equations

$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{1}{t}\, R^{-1}(\rho_{t-1})\, \varphi(t) \left[ y(t) - \hat{y}(t) \right] \qquad (6.17)$$

$$\rho_t = -\frac{\hat{\theta}_1(t)}{\hat{\theta}_2(t)} \qquad (6.18)$$

where

$$R(\rho) = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. \qquad (6.19)$$

The algorithm is a simplified version of the RLS-algorithm for the model (6.5.2), where a constant $R(\rho)$ is used instead of a time-varying matrix $R_t(\rho)$.

Notice that if a perfect estimate of the zero is available, i.e. $\rho_{t-1} = z_0^{-1}$, then the input is generated by the recursion (6.14), providing a consistent estimate of the zero (see Section 6.5.1). Therefore, the method is consistent: if the input is (6.14), then the zero estimate converges to the true value of the zero and, vice versa, if the zero estimate converges to the true zero, then the algorithm produces the input (6.14). The proof of the algorithm convergence is given in [18] when $G_0$ is a stable rational transfer function with exactly one time delay and with at least one real NMP zero, and when $H_0$ is stable. In practice, a projection mechanism has to be introduced to ensure convergence: the trajectory of $\hat{z}_t$ is forced to lie in a compact set that is strictly inside the domain of attraction of the zero $z_0$.

6.5.3 Algorithm modification for Markov chain signals generation

Observe that the input (6.16) has the spectrum

$$\Phi_{u_{adapt}}(z) = \frac{1 - \rho_{t-1}^2}{(z - \rho_{t-1})\left(z^{-1} - \rho_{t-1}\right)}. \qquad (6.20)$$

Figure 6.1: Representation of the adaptive algorithm iteration at step k.
$u_k$ denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction $\hat{y}(k)$.

From the results in Section 4.2, the same spectrum (6.20) can be obtained with a Markov chain signal having state space $S = \{1, -1\}$ and described by the graph in Figure 4.1, with

$$p = q = \frac{\rho_{t-1} + 1}{2}. \qquad (6.21)$$

Therefore, it is possible to generate in the time domain a binary signal having exactly the spectrum (6.20), by extracting samples from a Markov chain probability distribution. In this setting, the probability distribution will be time-dependent, since the transition probabilities are functions of the zero estimate (hence $p = p_{t-1}$). The adaptive solution uses the current zero estimate to update the probability $p_{t-1}$, which is then used to generate the next input value $u(t)$. The algorithm iteration scheme is shown in Figure 6.1. While the initial solution generates the input by filtering white noise through a time varying filter, this solution extracts each input sample from a time varying distribution. This method has the advantage that the input amplitude is constrained directly by its probabilistic model, while keeping the same spectral properties as the general non binary inputs.

Notice that when no undermodeling is considered, the optimal or asymptotically optimal input spectra for NMP zero estimation, (6.10) and (6.13), differ only by a scaling gain from the spectrum of the considered Markov chain with $p = q = \frac{z_0^{-1} + 1}{2}$. Some numerical results are shown in the next subsection.

6.5.4 Numerical example

Consider the example taken from [18]. The real system is

$$y(t) = \frac{(q - 3)(q - 0.1)(q - 0.2)(q + 0.3)}{q^4 (q - 0.5)}\, u(t) + \frac{q}{q - 0.8}\, e_0(t)$$

where $e_0$ is Gaussian white noise with variance $\lambda_0 = 0.01$. The NMP zero is $z_0 = 3$.
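For this example the spectral equivalence stated in Section 6.5.3 can be verified exactly for a fixed value of $\rho$. The sketch below assumes the symmetric two states chain keeps its state with probability p, and computes its autocorrelation from powers of the transition matrix; it is compared with $\rho^k$, the autocorrelation of the unit-variance input (6.16) when $\rho$ is held constant at $1/z_0$. The helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def chain_autocorrelation(p, max_lag, states=(1.0, -1.0)):
    """Exact E[u(t) u(t+k)] for the symmetric two states chain, k = 0..max_lag."""
    trans = [[p, 1.0 - p], [1.0 - p, p]]
    stationary = (0.5, 0.5)               # uniform by symmetry
    power = [[1.0, 0.0], [0.0, 1.0]]      # trans to the power 0
    out = []
    for _ in range(max_lag + 1):
        out.append(sum(stationary[i] * states[i] * power[i][j] * states[j]
                       for i in range(2) for j in range(2)))
        power = mat_mul(power, trans)
    return out


z0 = 3.0
rho = 1.0 / z0
p = (rho + 1.0) / 2.0                     # equation (6.21) with a fixed rho
r_markov = chain_autocorrelation(p, max_lag=5)
r_ar1 = [rho ** k for k in range(6)]      # input (6.16) with constant rho
print([round(x, 4) for x in r_markov])
print([round(x, 4) for x in r_ar1])
```

Equal autocorrelation sequences imply equal spectra, so for $z_0 = 3$ the binary signal with $p = 2/3$ has the same second order properties as the filtered white noise input.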
Figure 6.2 presents one typical zero estimate trajectory evaluated by the algorithm presented in Section 6.5.2 and its modification from Section 6.5.3 that uses a Markov chain input signal. The figure shows similar plots for the two algorithms, both reaching a final value very close to the true zero:

$$\hat{z}_{Markov} = 3.061, \qquad \hat{z}_{AR} = 3.026$$

Figure 6.2: A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

Figure 6.3 compares the normalized variance of the estimation error of the two algorithms. The variance has been calculated through M = 1000 Monte-Carlo simulations, each consisting of N = 30000 iterations of (6.16)-(6.17). The normalized variance of the estimation error at each time instant is evaluated by

$$\mathrm{var}(\hat{z}_k - z_0) = \frac{1}{kM} \sum_{i=1}^{M} \left(\hat{z}_k^i - z_0\right)^2. \qquad (6.22)$$

Figure 6.3: Normalized variance of the estimation error for the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

The result shows the empirical convergence of both algorithms, since the normalized variance converges to a constant value.

6.6 Conclusions

This chapter proposed a Markov chain model for NMP zero estimation. For ARX or FIR models the considered Markov chain has exactly the asymptotically optimal (in the number of data) input spectrum for zero estimation. For general SISO models, the spectrum is asymptotically optimal in the number of data and in the model order.

It has also been shown that the binary signal generated through a Markov chain with a certain probability distribution has the same spectral properties as the AR signal which guarantees consistent estimates of NMP zeros in undermodeling conditions. Therefore, a binary signal can be used to consistently estimate NMP zeros of LTI SISO systems using an FIR output model. The algorithm proposed in [18], which adaptively generates the AR input signal, has been modified in order to fit with a Markov chain input model.
A numerical example showed that the new solution has the same empirical convergence properties as the previous one. The advantage is that a binary signal can be used, and therefore any input amplitude constraint is satisfied.

Chapter 7
Summary and future work

This chapter summarizes the results obtained in this thesis and suggests some directions for future work.

7.1 Summary

Input design is an important issue in system identification, since it allows the quality of the model constructed from experimental data to be optimized. In Chapters 2 and 3 system identification and well known input design techniques were reviewed.

The objective of this thesis has been to study the input design problem when finite Markov chains are used as models of the input signals. The main advantage of this approach with respect to the one in the frequency domain is that the input model directly includes the input amplitude constraints that are always present in practical applications. Secondly, the solution in the probability domain makes it easier to generate the input signal, since its samples can be extracted from the optimal distribution. The Markov chain input model and its properties were described in Chapter 4.

Chapter 5 has shown that when the objective of the identification is parameter estimation of a linear system, a Markov chain model of the input signal can notably improve system identification. The trace of the covariance matrix of the parameter estimate, used as the measure of estimation accuracy, is lower than the value obtained with the binary signal having the optimal input spectrum. For designing the input probability distribution, a stochastic approximation framework had to be introduced.

Finally, in Chapter 6 the NMP zero estimation problem has been analyzed. It turned out that a simple two states Markov chain with a prescribed probability distribution allows an NMP zero of a linear system to be consistently estimated in undermodeling conditions.
7.2 Future work

From the results obtained in this thesis, some other possibly interesting problems may be analyzed in future work. Some suggestions are:

• More general Markov chains could be used to model non binary signals.
• A more explicit analytic expression of the spectrum of the general Markov chain model introduced in Chapter 4 could be useful when choosing the input model state space and order.
• Hammerstein system identification requires the estimation of both the linear and the non linear parts of the system. It is known that in some cases this can be done separately, by using binary signals. The problem reduces to the design of the spectrum of a binary signal for the linear subsystem identification; the non linear subsystem identification is in general more complex and depends on the input probability distribution. Therefore, it could be interesting to study the performance of a binary signal generated from a Markov chain distribution for Hammerstein system identification.
• The $L_2$-gain of a system gives important information on the feedback stability. An accurate estimate of this gain is often necessary for control applications. Stochastic approximation could be used also in this case to design the optimal probability distribution of a Markov chain for $L_2$-gain estimation.

Bibliography

[1] G. C. Goodwin, R. L. Payne, Dynamic system identification: experiment design and data analysis. New York: Academic Press, 1977.
[2] H. Hjalmarsson, “From experiment design to closed-loop control”, Automatica, vol. 41, no. 3, pp. 393-438, 2005.
[3] C. R. Rojas, J. S. Welsh, G. C. Goodwin, A. Feuer, “Robust optimal experiment design for system identification”, Automatica 43, 2007, pages 993-1008.
[4] J. Mårtensson, H. Hjalmarsson, “Robust input design using sum of squares constraints”, in IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 1352-1357.
[5] G. C. Goodwin, J. S. Welsh, A. Feuer, M.
Derpich, “Utilizing prior knowledge in robust optimal experiment design”, in IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 1358-1363.
[6] H. Jansson, “Experiment Design with Applications in Identification for Control”, Doctoral Thesis, KTH, Stockholm 2004.
[7] B. L. Cooley, J. H. Lee, S. P. Boyd, “Control-relevant experiment design: a plant-friendly, LMI-based approach”, in American Control Conference, Philadelphia, Pennsylvania, June 1998, pp. 1240-1244.
[8] H. Hjalmarsson, J. Mårtensson, B. Wahlberg, “On some robustness issues on input design”, in IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 511-516.
[9] X. Bombois, M. Gilson, “Cheapest identification experiment with guaranteed accuracy in the presence of undermodeling”, in IEEE Conference on Decision and Control, Paradise Island, Bahamas, December 2004, pp. 505-510.
[10] H. Jansson, H. Hjalmarsson, “Input design via LMIs admitting frequency-wise model specifications in confidence regions”, IEEE Transactions on Automatic Control, vol. 50, no. 10, pp. 1534-1549, 2005.
[11] K. Lindqvist, H. Hjalmarsson, “Optimal input design using linear matrix inequalities”, in IFAC Symposium on System Identification, Santa Barbara, California, USA, July 2000.
[12] M. Barenthin, “On Input Design in System Identification for Control”, Licentiate Thesis in Automatic Control, KTH, Stockholm 2006.
[13] C. R. Rojas, J. S. Welsh, G. C. Goodwin, “A receding horizon algorithm to generate binary signals with a prescribed autocovariance”, Proceedings of the ACC’07 Conference, 2007, New York, USA.
[14] H. Suzuki, T. Sugie, “On input design for system identification in time domain”, Proceedings of the European Control Conference 2007, Kos, Greece, July 2-5, 2007.
[15] G. C. Pflug, Optimization of Stochastic Models, Kluwer Academic Publishers, 1996.
[16] L. Ljung, G. Pflug, H. Walk, Stochastic approximation and optimization of random systems, Birkhauser, 1991.
[17] J. Mårtensson, H. Jansson, H. Hjalmarsson, “Input design for identification of zeros”, Proceedings of the 16th IFAC World Congress on Automatic Control, 2005.
[18] C. R. Rojas, H. Hjalmarsson, L. Gerencsér, J. Mårtensson, “Consistent estimation of real NMP zeros in stable LTI systems of arbitrary complexity”, European Control Conference, 2009.
[19] L. Ljung, System identification: Theory for the user, Second Edition. Prentice Hall, 1999.
[20] Y. Nesterov, A. Nemirovski, “Interior-point polynomial methods in convex programming”, Studies in Applied Mathematics 13, SIAM, Philadelphia, PA, 1994.
[21] S. Boyd, L. Ghaoui, E. Feron, V. Balakrishnan, “Linear matrix inequalities in system and control theory”, Studies in Applied Mathematics, SIAM, Philadelphia, 1994.
[22] H. Jansson, H. Hjalmarsson, “A framework for mixed H∞ and H2 input design”, in MTNS, Leuven, Belgium, July 2004.
[23] V. A. Yakubovich, “Solution of certain matrix inequalities occurring in the theory of automatic control”, Docl. Acad. Nauk. SSSR, pages 1304-1307, 1962.
[24] U. Grenander, G. Szegö, Toeplitz forms and their applications. University of California Press, Berkeley, CA, 1958.
[25] A. Lindquist, G. Picci, “Canonical correlation analysis, approximate covariance extension, and identification of stationary time series”, Automatica, 32(5): 709-733, 1996.
[26] P. Stoica, R. Moses, Spectral analysis of signals. Prentice-Hall, Upper Saddle River, New Jersey, 2005.
[27] J. L. Doob, Stochastic processes. Wiley, New York, 1953.
[28] T. Kailath, Linear systems. Prentice-Hall, Englewood Cliffs, NJ, 1980.
[29] M. Ehrgott, Multicriteria Optimization, Second Edition, Springer, 2006.
[30] E. Zitzler, “Evolutionary algorithms for multiobjective optimization: methods and applications”, Doctoral thesis, Swiss Federal Institute of Technology Zurich, 1999.
[31] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation”, IEEE Transactions on Automatic Control, vol. 37, no. 3, March 1992.
[32] J. C. Spall, “Implementation of the simultaneous perturbation algorithm for stochastic optimization”, IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817-823, July 1998.
[33] S. Skogestad, I. Postlethwaite, Multivariable Feedback Control, Analysis and Design, John Wiley and Sons, 2005.
[34] K. Lindqvist, “On experiment design in identification of smooth linear systems”, Licentiate thesis, TRITA-S3-REG-0103, 2001.
[35] J. Mårtensson, Geometric analysis of stochastic model errors in system identification, Doctoral thesis, KTH, Stockholm, Sweden, 2007.
[36] L. Gerencsér, H. Hjalmarsson, “Adaptive input design in system identification”, Proceedings of the 44th IEEE Conference on Decision and Control and European Control Conference, pages 4988-4993, Seville, Spain, December 12-15 2005.
[37] L. Gerencsér, H. Hjalmarsson, “Identification of ARX systems with non-stationary inputs-asymptotic analysis with application to adaptive input design”, Automatica, 2008.
[38] L. Gerencsér, H. Hjalmarsson, J. Mårtensson, “Adaptive input design for ARX systems”, European Control Conference, Kos, Greece, July 2-5 2007.
