
Linköping studies in science and technology. Dissertations. No. 1528

A Nonlinear Optimization Approach to $\mathcal{H}_2$-Optimal Modeling and Control

Daniel Petersson

[email protected]
www.control.isy.liu.se

Division of Automatic Control
Department of Electrical Engineering
Linköping University, SE–581 83 Linköping, Sweden

ISBN 978-91-7519-567-4    ISSN 0345-7524

Copyright © 2013 Daniel Petersson

Printed by LiU-Tryck, Linköping, Sweden 2013

To Maria, Wilmer and Elsa!

Abstract

Mathematical models of physical systems are pervasive in engineering. These models can be used to analyze properties of the system, to simulate the system, or to synthesize controllers. However, many of these models are too complex or too large for standard analysis and synthesis methods to be applicable. Hence, there is a need to reduce the complexity of models. In this thesis, techniques for reducing the complexity of large linear time-invariant (lti) state-space models and linear parameter-varying (lpv) models are presented. Additionally, a method for synthesizing controllers is also presented.

The methods in this thesis all revolve around a system-theoretical measure called the $\mathcal{H}_2$-norm, and the minimization of this norm using nonlinear optimization. Since the optimization problems rapidly grow large, significant effort is spent on understanding and exploiting the inherent structures in the problems to reduce the computational complexity of the optimization.

The first part of the thesis addresses the classical model-reduction problem for lti state-space models. Various $\mathcal{H}_2$ problems are formulated and solved using the proposed structure-exploiting nonlinear optimization technique. The standard problem formulation is extended to also incorporate frequency-weighted problems and norms defined on finite frequency intervals, for both continuous-time and discrete-time models. Additionally, a regularization-based method to account for uncertainty in data is explored. Several examples reveal that the method is highly competitive with alternative approaches.

Techniques for finding lpv models from data, and for reducing the complexity of lpv models, are presented. The basic ideas introduced in the first part of the thesis are extended to the lpv case, once again covering a range of different setups.

lpv models are commonly used for analysis and synthesis of controllers, but the efficiency of these methods depends highly on a particular algebraic structure in the lpv models. A method to account for this structure and derive models suitable for controller synthesis is proposed. Many of the methods are thoroughly tested on a realistic modeling problem arising in the design and flight clearance of an Airbus aircraft model.

Finally, output-feedback $\mathcal{H}_2$ controller synthesis for lpv models is addressed by generalizing the ideas and methods used for modeling. One of the ideas here is to skip the lpv modeling phase before creating the controller, and instead synthesize the controller directly from the data that classically would have been used to generate a model for the controller synthesis problem. The method specializes to standard output-feedback $\mathcal{H}_2$ controller synthesis in the lti case, and favorable comparisons with alternative state-of-the-art implementations are presented.


Populärvetenskaplig sammanfattning (Popular Science Summary)

In many scientific and engineering fields, mathematical models are used to describe different systems, for example to describe how an aircraft will move given that the pilot commands a certain control-surface deflection. These mathematical models can, for instance, be used to save resources by testing different prototypes in simulation without needing the physical prototype. The models can be created from physical principles or built from collected data.

Today's modern and complex systems can lead to very large and complicated mathematical models, which can sometimes be too large to simulate or analyze. One then needs to reduce the complexity of these models for it to be possible to use them. The requirement on the reduced model is that it should describe the large complex model sufficiently well for the intended purpose.

There are many kinds of mathematical models of varying degrees of complexity. The simplest type of model is the linear model, and for these models it is possible to analyze properties and draw important conclusions about the system. Linear models, however, have the drawback of being limited in how much they can describe. If we again take an aircraft as an example, a linear model can describe what happens to the aircraft as long as it stays at a specific altitude and a specific speed. However, the linear model cannot describe what happens if the aircraft deviates too much from these specific values of speed and altitude. Another type of model is the linear parameter-varying model. These models depend on one or more parameters that can describe certain conditions. The aircraft that we previously described with a linear model for a specific speed and altitude could now instead be described with a parameter-varying model. This parameter-varying model can, for example, depend on the parameters altitude and speed, and can then also describe what happens when the aircraft climbs to a new altitude and changes speed.

In this thesis, we develop methods for reducing large complex linear and linear parameter-varying models to smaller, more manageable models. The requirement is that these models should still describe the original system well, so that they can be used, for example, to analyze the system. With the methods developed for reducing large complex models to smaller models as a starting point, methods for designing controllers to control these large complex systems have also been developed.


Acknowledgments

First of all, I would like to thank my supervisor Dr. Johan Löfberg and my co-supervisor Professor Lennart Ljung for all your patience and support. Especially Johan, for his vast (this time I got it right) knowledge in optimization and for always having an open door and taking time to answer my questions.

I would like to thank Professor Lennart Ljung again, as the former head of the Division of Automatic Control, for the privilege of letting me join the Automatic Control group, and also our current head of the Division of Automatic Control, Professor Svante Gunnarsson, for always being able to improve on an already excellent workplace and research environment. Of course, I would also like to thank our current administrator, Ninna Stensgård, and her predecessors Ulla Salaneck and Åsa Karmelind for always keeping track of everything and always being helpful.

This thesis has been proofread by Dr. Johan Löfberg, Dr. Christian Lyzell, Lic. Sina Khoshfetrat Pakazad and Lic. Patrik Axelsson. Thank you for your invaluable comments. I would also like to thank Dr. Henrik Tidefelt and Dr. Gustaf Hendeby for the LaTeX template that was used when writing this thesis.

There have been many joys on the journey as a Ph.D. student, both at work and in private. The colleagues that I have shared office with, Dr. Henrik Tidefelt and Lic. Zoran Sjanic, deserve an extra thanks for being very good company at the beginning of this journey, maybe not in the mornings but at least after lunch. Lic. Rikard Falkeborn, Dr. Ragnar Wallin and Dr. Christian Lyzell also deserve an extra thanks for always being there to discuss anything and everything, both work-related and (mostly) irrelevant subjects.

Another person I would like to thank is Dr. Elina Rönnberg. We started at Y together a long time ago and have ever since not been able to leave the university. All the "onsdagslunchar" (Wednesday lunches) and "fika" have meant a lot. Thank you.

A few more people deserve my gratitude: Lic. Fredrik Lindsten and Dr. Jonas Callmer. As the journey got closer to the end, and the anxiety over the fact that a thesis should be written started to grow, Dr. Jonas Callmer, my "Bother in arms" [sic!], helped me by sharing the anxiety, also writing his thesis at the same time.

What also helped was that I found out that Lic. Fredrik Lindsten and I shared a common interest, Beer!, which we like to both talk about and drink. I hope there will be more beer tastings in the future.

For financial support, I would like to thank the European Commission under contract No. AST5-CT-2006-030768-COFCLUO.

Finally, I would like to thank the person who has meant the most. Thank you Maria! Thank you for all the support and encouragement, and thank you for bringing me two of the most important persons in my life: Wilmer and Elsa.

Linköping, August 2013

Daniel Petersson


Contents

Notation

1 Introduction
1.1 Outline of the Thesis
1.2 Contributions

2 Preliminaries
2.1 System Theory
2.1.1 Basic Theory and Notation
2.1.2 Gramians
2.1.3 System Norms
2.1.4 Output-Feedback Controller
2.1.5 lpv Systems
2.2 Optimization
2.2.1 Local Methods
2.3 Matrix Theory
2.3.1 Properties for Dynamical Systems
2.3.2 Matrix Functions

3 Frequency-Limited $\mathcal{H}_2$-Norm
3.1 Frequency-Limited Gramians
3.1.1 Continuous Time
3.1.2 Discrete Time
3.2 Frequency-Limited $\mathcal{H}_2$-Norm
3.2.1 Continuous Time
3.2.2 Discrete Time
3.3 Concluding Remarks

4 Model Reduction
4.1 Introduction
4.2 Balanced Truncation
4.3 Overview of Model-Reduction Methods using the $\mathcal{H}_2$-Norm
4.4 Model Reduction using an $\mathcal{H}_2$-Measure
4.4.1 Standard Model Reduction
4.4.2 Robust Model Reduction
4.4.3 Frequency-Limited Model Reduction
4.5 Computational Aspects of the Optimization Problems
4.5.1 Structure in Variables
4.5.2 Initialization
4.5.3 Structure in Equations
4.6 Examples
4.7 Conclusions
4.A Gradient of $V_{\text{rob}}$
4.B Equations for Frequency-Weighted Model Reduction
4.B.1 Continuous Time
4.B.2 Discrete Time
4.C Gradient of the Frequency-Limited Case

5 lpv Modeling
5.1 Introduction
5.2 Global Methods
5.3 Local Methods
5.4 lpv Modeling using an $\mathcal{H}_2$-Measure
5.4.1 General Properties
5.4.2 The Optimization Problem
5.5 Computational Aspects of the Optimization Problems
5.5.1 Structure in Variables and Equations
5.5.2 Initialization
5.6 Examples
5.7 Conclusions

6 Controller Synthesis
6.1 Overview
6.2 Static Output-Feedback $\mathcal{H}_2$-Controllers
6.2.1 Continuous Time
6.2.2 Discrete Time
6.3 Static Output-Feedback $\mathcal{H}_2$ lpv Controllers
6.4 Computational Aspects
6.5 Examples
6.6 Conclusions

7 Examples of Applications
7.1 Aircraft Example
7.1.1 lpv Simplification
7.1.2 Model Reduction
7.2 Model Reduction in System Identification
7.3 Conclusions

8 Concluding Remarks

Bibliography

Notation

Symbols, Operators and Functions

$\mathbb{N}$ : the set of natural numbers
$\mathbb{R}$ : the set of real numbers
$\mathbb{C}$ : the set of complex numbers
$\mathcal{O}$ : Ordo (order of magnitude)
$\in$ : belongs to
$[a, b]$ : the closed interval from $a$ to $b$
$a^*$ : the complex conjugate of $a$
$\operatorname{Re} a$ : the real part of $a$
$\operatorname{Im} a$ : the imaginary part of $a$
$\dot{x}(t)$ : the time derivative of the function $x(t)$
$e_i$ : the unit vector with a one in the $i$:th element
$\bar{a}$ : the element-wise complex conjugate of the vector $a$
$A$ : matrices are denoted by bold, upright, capitalized letters
$I$ : the identity matrix
$0$ : a matrix with only zeros
$[A]_{ij}$ : element $(i, j)$ of the matrix $A$
$A^T$ : the transpose of $A$
$A^*$ : the complex conjugate transpose of $A$
$A^{-1}$ : the inverse of $A$
$A \succ 0$ ($\succeq 0$) : $A$ is a positive (semi-)definite matrix
$A \prec 0$ ($\preceq 0$) : $A$ is a negative (semi-)definite matrix
$\operatorname{tr} A$ : the trace of the matrix $A$
$\operatorname{rank} A$ : the rank of the matrix $A$
$\frac{\partial A}{\partial a}$ : the element-wise differentiation of the matrix $A$ with respect to the scalar variable $a$
$\|\cdot\|_2$ : for vectors the two-norm, and for matrices the induced two-norm
$\|\cdot\|_F$ : the Frobenius norm
$\|\cdot\|_{\mathcal{H}_2}$ : the $\mathcal{H}_2$-norm for dynamical systems
$\|\cdot\|_{\mathcal{H}_2}$ (frequency-limited) : the frequency-limited $\mathcal{H}_2$-norm for dynamical systems, defined in Chapter 3
$\|\cdot\|_{\mathcal{H}_\infty}$ : the $\mathcal{H}_\infty$-norm for dynamical systems
$\mathcal{N}(\mu, \sigma^2)$ : the Gaussian distribution with mean $\mu$ and variance $\sigma^2$
$\operatorname{E}(X)$ : the expected value of the random variable $X$
$\operatorname{Cov}(X)$ : the covariance matrix of the random variable $X$

Abbreviations

lti : Linear time-invariant
lpv : Linear parameter-varying
ltv : Linear time-varying
lft : Linear fractional transformation
lfr : Linear fractional representation
siso : Single input single output
miso : Multiple input single output
simo : Single input multiple output
mimo : Multiple input multiple output
oe : Output error
qp : Quadratic programming
sdp : Semidefinite programming
nlp : Nonlinear programming
lmi : Linear matrix inequality
bmi : Bilinear matrix inequality
bfgs : Broyden-Fletcher-Goldfarb-Shanno
cofcluo : Clearance of flight control laws using optimization
ls : Least squares
lasso : Least absolute shrinkage and selection operator
svd : Singular value decomposition

1 Introduction

Mathematical models of physical systems are pervasive in engineering. These models can be used to analyze properties of the systems, to simulate the systems, or to synthesize controllers. However, many of these models are too complex or too large for standard analysis and synthesis methods to be applicable. Hence, there is a need to be able to reduce the complexity of models. The main goal of this thesis is to develop methods for reducing the complexity of different systems by minimizing the $\mathcal{H}_2$-norm between the large complex system and the reduced system.

Many of the early methods for controller synthesis and model reduction rely on linear algebra and solutions to Lyapunov and Riccati equations. Later, when solvers for more general and advanced optimization methods were developed, it became possible to formulate many of the problems in control theory as, for example, semidefinite programs to be solved using interior-point solvers. However, many of these programs included not only linear matrix inequalities (lmis), but also bilinear matrix inequalities (bmis), which make the problems non-convex. This, and the fact that semidefinite programs generally do not scale well with the number of variables, sometimes makes these problems time consuming and difficult to solve. In this thesis, we take a step back, and instead try to keep the original structure of the problem, formulate a general nonlinear optimization problem using linear algebra and Lyapunov equations, and use a general quasi-Newton solver to solve the problem. The problems formulated in this thesis are still non-convex, but since the original structure of the problem is kept and a more direct approach is used, it is possible, for example, to impose certain structural constraints on the system matrices and still be able to use the methods for medium-scale systems.


1.1 Outline of the Thesis

Most of the results in this thesis concern the minimization of the $\mathcal{H}_2$-norm of various linear time-invariant (lti) systems with different structures, and how to utilize the different characteristics of the different problems. Most of the results are based on standard concepts in matrix theory, linear systems theory and optimization. A brief overview of the necessary concepts in matrix theory, linear systems theory and optimization is presented in Chapter 2.

In Chapter 3, the concept of frequency-limited Gramians is presented. Additionally, complete derivations for both the discrete-time case and the continuous-time case are presented. These are then used to form a frequency-limited $\mathcal{H}_2$-norm, which is later used in some of the proposed algorithms.

In Chapter 4, a short overview of the model-reduction problem is presented, before a number of model-reduction algorithms are presented. These algorithms all try to utilize the different structures of the equations to be able to solve the problems efficiently using quasi-Newton methods.

In Chapter 5, a number of methods for generating linear parameter-varying models, using the model-reduction methods in Chapter 4 as a foundation, are presented.

In Chapter 6, methods for designing $\mathcal{H}_2$ controllers, both for linear time-invariant systems and linear parameter-varying systems, are presented. These methods are based on the same procedure as the methods in Chapter 4 and Chapter 5.

Chapter 7 presents two larger examples that highlight some properties and applications of the model-reduction and linear parameter-varying algorithms. One example shows a flight clearance application of an Airbus aircraft model, and the other example highlights the connections between $\mathcal{H}_2$ model reduction and system identification.

Finally, in Chapter 8, some concluding remarks about the results and suggestions for future research directions are presented.

1.2 Contributions

The first main contributions in the thesis are the model-reduction methods presented in Chapter 4, especially the frequency-limited model reduction in Section 4.4.3, and the unified and complete derivation of the frequency-limited Gramians and the frequency-limited $\mathcal{H}_2$-norm in Chapter 3, which are based on the publication

Daniel Petersson and Johan Löfberg. Model reduction using a frequency-limited $\mathcal{H}_2$-cost. arXiv preprint arXiv:1212.1603, URL http://arxiv.org/abs/1212.1603, December 2012a,

which has been submitted to Systems and Control Letters.


The second main contributions in the thesis are the linear parameter-varying generating methods in Chapter 5. To be able to reduce the complexity of a linear parameter-varying model, the idea of model reduction is used to have methods that are invariant to state transformations. These results are based on the publication

Daniel Petersson and Johan Löfberg. Optimization based lpv-approximation of multi-model systems. In Proceedings of the European Control Conference, pages 3172–3177, Budapest, Hungary, 2009,

which was extended with

Daniel Petersson and Johan Löfberg. Robust generation of lpv state-space models using a regularized $\mathcal{H}_2$-cost. In Proceedings of the IEEE International Symposium on Computer-Aided Control System Design, pages 1170–1175, Yokohama, Japan, 2010,

to be able to handle uncertainties in the data. These publications, with some extensions, have also been published in

Daniel Petersson. Nonlinear optimization approaches to $\mathcal{H}_2$-norm based lpv modelling and control. Licentiate thesis no. 1453, Department of Electrical Engineering, Linköping University, 2010,

and

Daniel Petersson and Johan Löfberg. Optimization Based Clearance of Flight Control Laws - A Civil Aircraft Application, chapter Identification of lpv State-Space Models Using $\mathcal{H}_2$-Minimisation, pages 111–128. Springer, 2012b,

and have been submitted as

Daniel Petersson and Johan Löfberg. Optimization-based modeling of lpv systems using an $\mathcal{H}_2$ objective. Submitted to International Journal of Control, December 2012c.

Additionally, an extension of the linear parameter-varying generating methods is presented, where it is possible to control the rank of the coefficient matrices in the resulting linear parameter-varying model.

The third main contributions are the $\mathcal{H}_2$ controller-synthesis methods in Chapter 6, which use similar ideas as the other contributions, but to synthesize $\mathcal{H}_2$ controllers instead. This chapter is partly based on the publication

Daniel Petersson and Johan Löfberg. lpv $\mathcal{H}_2$-controller synthesis using nonlinear programming. In Proceedings of the 18th IFAC World Congress, pages 6692–6696, Milan, Italy, 2011.

2 Preliminaries

This chapter begins by presenting some theory and concepts from system theory. Some basic optimization background, with focus on quasi-Newton methods, will then be presented. The chapter finishes with some matrix theory that will be used in the thesis, where, for example, the concept of matrix functions is presented.

2.1 System Theory

This section reviews some of the standard system theoretical concepts and explains some system norms that will be used in the thesis.

2.1.1 Basic Theory and Notation

In engineering, mathematical models are often described, in continuous time, by ordinary differential equations. An important subclass of these models is the class of systems of linear ordinary differential equations with constant coefficients. The models in this class, which are called linear time-invariant models, lti models, can mathematically be described, for a continuous-time model, as

$$\dot{x}(t) = A x(t) + B u(t), \qquad (2.1a)$$
$$y(t) = C x(t) + D u(t), \qquad (2.1b)$$

and for a discrete-time model with sample time $T_S$ as

$$x(t + T_S) = A x(t) + B u(t), \qquad (2.2a)$$
$$y(t) = C x(t) + D u(t), \qquad (2.2b)$$

where $x(t) \in \mathbb{R}^{n_x}$ is a vector containing the states of the system, $u(t) \in \mathbb{R}^{n_u}$ is a vector containing the input to the system, and $y(t) \in \mathbb{R}^{n_y}$ is a vector containing the output of the system. The matrices $A$, $B$, $C$ and $D$ are constant matrices of suitable dimensions, where $A$ describes the dynamics of the system, $B$ describes how the input enters the system, and $C$ and $D$ describe what is being measured from the system. The system in (2.1) is expressed in state-space form; the corresponding transfer-function form, for the system from $u(t)$ to $y(t)$, is

$$Y(s) = G(s) U(s),$$

where $U(s)$ and $Y(s)$ are the Laplace transforms of $u(t)$ and $y(t)$ and

$$G(s) = C (sI - A)^{-1} B + D \triangleq \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right].$$

Here, the notation $\left[\begin{smallmatrix} A & B \\ \hline C & D \end{smallmatrix}\right]$ is introduced as the transfer function of the system given a particular realization, $A$, $B$, $C$ and $D$.

In discrete time, difference equations are used to describe the dynamics of the system, (2.2), and consequently the $z$-transform is used instead of the Laplace transform to express the transfer function, i.e., given the discrete-time system in (2.2), the transfer function becomes

$$G(z) = C (zI - A)^{-1} B + D.$$

The vector $x$, describing the states, can be transformed into a new basis, $\hat{x}$, using an invertible matrix, $T$, i.e., $\hat{x} = Tx$. This yields the realization

$$\dot{\hat{x}}(t) = T A T^{-1} \hat{x}(t) + T B u(t), \qquad (2.3a)$$
$$y(t) = C T^{-1} \hat{x}(t) + D u(t). \qquad (2.3b)$$

The transfer function for this system is

$$\hat{G}(s) = C T^{-1} (sI - T A T^{-1})^{-1} T B + D = C (sI - A)^{-1} B + D = G(s); \qquad (2.4)$$

thus, there exist infinitely many realizations of a system.
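For concreteness, the invariance in (2.4) can be checked numerically. The sketch below is illustrative only: the matrices and the helper function tf are arbitrary choices, not taken from the thesis.

```python
import numpy as np

def tf(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D at a complex point s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

# illustrative realization and an arbitrary invertible transformation T
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
T = np.array([[2.0, 1.0], [0.0, 1.0]])
Ti = np.linalg.inv(T)

s = 1j * 0.7  # an arbitrary point on the imaginary axis
G1 = tf(A, B, C, D, s)                    # original realization
G2 = tf(T @ A @ Ti, T @ B, C @ Ti, D, s)  # transformed realization, cf. (2.3)
assert np.allclose(G1, G2)                # same transfer function, (2.4)
```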

2.1.2 Gramians

Two important entities when it comes to system theory and determining system properties are the controllability Gramian, $P$, and the observability Gramian, $Q$. The equations for these differ in continuous and discrete time, and the rest of the section is split up into two subsections, one for continuous time and one for discrete time.


Continuous-Time Systems

Definition 2.1. The controllability and observability Gramians, in the continuous-time domain, of the system (2.1) are defined as

$$P \triangleq \int_0^\infty e^{A\tau} B B^T e^{A^T \tau}\, d\tau, \qquad (2.5a)$$
$$Q \triangleq \int_0^\infty e^{A^T \tau} C^T C e^{A\tau}\, d\tau. \qquad (2.5b)$$

The Gramians in (2.5) can also be written as the stationary solutions to the differential equations

$$\dot{P} = A P + P A^T + B B^T, \qquad (2.6a)$$
$$\dot{Q} = A^T Q + Q A + C^T C, \qquad (2.6b)$$

with $\dot{P} = \dot{Q} = 0$, thus becoming solutions to the algebraic equations, called Lyapunov equations,

$$0 = A P + P A^T + B B^T, \qquad (2.7a)$$
$$0 = A^T Q + Q A + C^T C. \qquad (2.7b)$$
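As a side note, the Lyapunov equations (2.7) are directly solvable with standard numerical software; a minimal sketch using SciPy, with illustrative matrices that are not from the thesis:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 0.5], [0.0, -2.0]])   # Hurwitz, so (2.7) is solvable
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

# solve_continuous_lyapunov(a, q) solves a x + x a^H = q, so the
# controllability and observability Gramians in (2.7) become:
P = solve_continuous_lyapunov(A, -B @ B.T)    # 0 = A P + P A^T + B B^T
Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # 0 = A^T Q + Q A + C^T C
```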

By using Parseval's identity on (2.5), the Gramians can be expressed in the frequency domain.

Definition 2.2. The controllability and observability Gramians, in the frequency domain, for the system (2.1) are defined as

$$P \triangleq \frac{1}{2\pi} \int_{-\infty}^{\infty} H(i\nu)\, B B^T H^*(i\nu)\, d\nu, \qquad (2.8a)$$
$$Q \triangleq \frac{1}{2\pi} \int_{-\infty}^{\infty} H^*(i\nu)\, C^T C\, H(i\nu)\, d\nu, \qquad (2.8b)$$

where $H(i\nu) = (i\nu I - A)^{-1}$ and $H^*$ denotes the conjugate transpose of $H$.

One important observation to make, both for the Gramians in continuous time and in discrete time (see Section 2.1.2), is that the Gramians depend on which realization is used. If the state is transformed, $x = T\hat{x}$, with $T$ invertible, the Gramians change as

$$P_T = T^{-1} P T^{-T}, \qquad (2.9a)$$
$$Q_T = T^{T} Q T. \qquad (2.9b)$$


Hence, the eigenvalues of the Gramians change if a state transformation is performed. However, the eigenvalues of the product of the Gramians, $\lambda(PQ)$, are invariant to state transformations, since

$$\lambda_i(P_T Q_T) = \lambda_i\!\left(T^{-1} P T^{-T}\, T^{T} Q T\right) = \lambda_i\!\left(T^{-1} P Q T\right) = \lambda_i(PQ) \triangleq \sigma_i^2, \qquad (2.10)$$

where $\sigma_i$ is called a Hankel singular value of the system.
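A short sketch of (2.9) and (2.10), reusing P and Q from the Lyapunov sketch above: the individual Gramians change under a state transformation, but the Hankel singular values do not.

```python
import numpy as np

# Hankel singular values: sigma_i = sqrt(lambda_i(P Q)), cf. (2.10)
sigma = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])

# transform the Gramians as in (2.9) with an arbitrary invertible T
T = np.array([[2.0, 1.0], [0.0, 1.0]])
Ti = np.linalg.inv(T)
PT, QT = Ti @ P @ Ti.T, T.T @ Q @ T

# the eigenvalues of the product, and hence sigma_i, are invariant
assert np.allclose(np.sort(np.linalg.eigvals(PT @ QT).real),
                   np.sort(np.linalg.eigvals(P @ Q).real))
```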

The Gramians, both in continuous time and in discrete time, can be interpreted physically (see, e.g., Skogestad and Postlethwaite [2007] or Antoulas [2005]). Given a state $x$, the smallest amount of energy needed to steer the system from $0$ to $x$ is given by

$$x^T P^{-1} x, \qquad (2.11)$$

and the observability Gramian describes the energy obtained by observing the output of the system with initial condition $x$ and no other input,

$$x^T Q x. \qquad (2.12)$$

This holds for both continuous- and discrete-time systems.

Discrete-Time Systems

Definition 2.3. The controllability and observability Gramians, in discrete time, of the system (2.2) are defined as

$$P \triangleq \sum_{k=0}^{\infty} A^k B B^T \left(A^k\right)^T, \qquad (2.13a)$$
$$Q \triangleq \sum_{k=0}^{\infty} \left(A^k\right)^T C^T C\, A^k. \qquad (2.13b)$$

These Gramians also satisfy the discrete Lyapunov equations

$$0 = A P A^T - P + B B^T, \qquad (2.14a)$$
$$0 = A^T Q A - Q + C^T C. \qquad (2.14b)$$
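The discrete counterparts (2.14) are just as direct to solve numerically; a minimal sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.1], [0.0, -0.3]])   # Schur, so (2.14) is solvable
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])

# solve_discrete_lyapunov(a, q) solves a x a^H - x + q = 0
P = solve_discrete_lyapunov(A, B @ B.T)     # 0 = A P A^T - P + B B^T
Q = solve_discrete_lyapunov(A.T, C.T @ C)   # 0 = A^T Q A - Q + C^T C
```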

The definition of the discrete-time Gramians in the frequency domain becomes

Definition 2.4. The controllability and observability Gramians, in the frequency domain, for the system (2.2) are defined as

$$P \triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} H(e^{i\nu})\, B B^T H^*(e^{i\nu})\, d\nu, \qquad (2.15a)$$
$$Q \triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} H^*(e^{i\nu})\, C^T C\, H(e^{i\nu})\, d\nu, \qquad (2.15b)$$

where $H(e^{i\nu}) = (e^{i\nu} I - A)^{-1}$ and $H^*$ denotes the conjugate transpose of $H$.

2.1.3 System Norms

System norms are important tools when it comes to comparing and analyzing systems. In this thesis, mainly the $\mathcal{H}_2$-norm will be used. In this section, the two most commonly used norms in system theory, namely the $\mathcal{H}_2$-norm and the $\mathcal{H}_\infty$-norm, are presented and defined.

Given a system $G = \left[\begin{smallmatrix} A & B \\ \hline C & D \end{smallmatrix}\right]$ such that

$$\dot{x}(t) = A x(t) + B w(t), \qquad (2.16a)$$
$$z(t) = C x(t) + D w(t), \qquad (2.16b)$$

where $x$ is the state, $w$ is a disturbance and $z$ is the output of interest. Suppose a system that guarantees a certain performance is wanted, e.g., $w$ should not influence $z$ too much. The system norms are functions that quantify this into something computationally tractable, with different interpretations. System norms can be interpreted as norms that answer the question: "given information about the allowed input, how large can the output be?".

To be able to do this, two signal norms that will be used to interpret the system norms are defined.

Definition 2.5 ($\mathcal{L}_2$, 2-norm in time). The $\mathcal{L}_2$-norm for square integrable signals is defined by

$$\|e(t)\|_{\mathcal{L}_2} \triangleq \left(\int_0^\infty \|e(\tau)\|_2^2\, d\tau\right)^{1/2}. \qquad (2.17)$$

$\|e(t)\|_{\mathcal{L}_2}$ is also referred to as the energy of the signal $e(t)$.

Definition 2.6 ($\mathcal{L}_\infty$, $\infty$-norm in time). The $\mathcal{L}_\infty$-norm for magnitude-bounded signals is defined as

$$\|e(t)\|_{\mathcal{L}_\infty} \triangleq \sup_{\tau \geq 0} \|e(\tau)\|_2. \qquad (2.18)$$

For a scalar signal $e(t)$, $\|e(t)\|_{\mathcal{L}_\infty}$ is simply the peak of the signal.

These signal norms are used to define some system norms in the next section.


Continuous-Time $\mathcal{H}_2$-Norm

For a siso system $G$, which has the realization (2.16) with $A$ Hurwitz and $D = 0$, the $\mathcal{H}_2$-norm can be defined as

$$\|G\|_{\mathcal{H}_2} \triangleq \sup_{\|w(t)\|_{\mathcal{L}_2} \leq 1} \|z(t)\|_{\mathcal{L}_\infty}. \qquad (2.19)$$

For some physical interpretations of the $\mathcal{H}_2$-norm, see for example Skogestad and Postlethwaite [2007], Skelton et al. [1998] or Zhou et al. [1996]. However, the definition that will be used mostly in this thesis is

Definition 2.7 ($\mathcal{H}_2$-norm). For an asymptotically stable ($A$ Hurwitz) and strictly proper ($D = 0$) continuous-time system, $G$, the $\mathcal{H}_2$-norm is defined as

$$\|G\|_{\mathcal{H}_2} \triangleq \left(\frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{tr} G^*(i\nu)\, G(i\nu)\, d\nu\right)^{1/2}. \qquad (2.20)$$

One important thing to note about the $\mathcal{H}_2$-norm is that it is, in contrast to the $\mathcal{H}_\infty$-norm (see Section 2.1.3), not an induced norm and does not, in general, satisfy the multiplicative property $\|GF\|_{\mathcal{H}_2} \leq \|G\|_{\mathcal{H}_2} \|F\|_{\mathcal{H}_2}$, with $G$ and $F$ being two lti systems. This property, when it holds, makes it possible to analyze individual systems in series to draw conclusions about the interconnected system.

The forms in (2.19) and (2.20) are not suitable for actual evaluation of the norm. However, the $\mathcal{H}_2$-norm can be expressed in a more computationally friendly form. The $\mathcal{H}_2$-norm in (2.20) can be rewritten, given a system $G$ with a realization as in (2.16), using the Gramians in (2.5), to

$$\|G\|_{\mathcal{H}_2}^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{tr} G^*(i\nu) G(i\nu)\, d\nu = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{tr} B^T H^*(i\nu) C^T C H(i\nu) B\, d\nu = \operatorname{tr} B^T Q B, \qquad (2.21a)$$
$$\|G\|_{\mathcal{H}_2}^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{tr} G(i\nu) G^*(i\nu)\, d\nu = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{tr} C H(i\nu) B B^T H^*(i\nu) C^T\, d\nu = \operatorname{tr} C P C^T, \qquad (2.21b)$$

where $P$ and $Q$ satisfy

$$0 = A P + P A^T + B B^T, \qquad (2.22a)$$
$$0 = A^T Q + Q A + C^T C. \qquad (2.22b)$$
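A minimal sketch of (2.21), assuming the illustrative matrices from the Gramian sketches above; both Gramian-based expressions give the same value of the norm.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

P = solve_continuous_lyapunov(A, -B @ B.T)
Q = solve_continuous_lyapunov(A.T, -C.T @ C)

h2_from_P = np.sqrt(np.trace(C @ P @ C.T))   # ||G||_H2 = sqrt(tr C P C^T)
h2_from_Q = np.sqrt(np.trace(B.T @ Q @ B))   # ||G||_H2 = sqrt(tr B^T Q B)
assert np.isclose(h2_from_P, h2_from_Q)
```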

Discrete-Time $\mathcal{H}_2$-Norm

All the material for the continuous-time case readily extends to the discrete-time case.


Definition 2.8 ($\mathcal{H}_2$-norm). For an asymptotically stable ($A$ Schur) discrete-time system, $G$, the $\mathcal{H}_2$-norm is defined as

$$\|G\|_{\mathcal{H}_2}^2 \triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} \operatorname{tr} G^*(e^{i\nu})\, G(e^{i\nu})\, d\nu. \qquad (2.23)$$

An important observation here is that the system does not have to be strictly proper for the $\mathcal{H}_2$-norm to be defined. As in the continuous-time case, the above definition is not in a computationally friendly form, and (2.23) can be reformulated using the definitions of the discrete-time Gramians, (2.13), which yields

$$\|G\|_{\mathcal{H}_2}^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \operatorname{tr} G^*(e^{i\nu}) G(e^{i\nu})\, d\nu = \operatorname{tr}\!\left(B^T Q B + D^T D\right), \qquad (2.24a)$$
$$\|G\|_{\mathcal{H}_2}^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \operatorname{tr} G(e^{i\nu}) G^*(e^{i\nu})\, d\nu = \operatorname{tr}\!\left(C P C^T + D D^T\right), \qquad (2.24b)$$

where $P$ and $Q$ satisfy

$$0 = A P A^T - P + B B^T, \qquad (2.25a)$$
$$0 = A^T Q A - Q + C^T C. \qquad (2.25b)$$
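The corresponding discrete-time computation (2.24), where the direct term $D$ now contributes; a sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, 0.1], [0.0, -0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.2]])                          # D need not be zero here

P = solve_discrete_lyapunov(A, B @ B.T)
h2 = np.sqrt(np.trace(C @ P @ C.T + D @ D.T))  # (2.24b)
```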

Continuous-Time $\mathcal{H}_\infty$-Norm

Although our proposed methods revolve around the $\mathcal{H}_2$-measure, the $\mathcal{H}_\infty$-measure will be used in various comparisons. Hence, its definition is presented in this section. As with the $\mathcal{H}_2$-norm, the $\mathcal{H}_\infty$-norm can be defined using the signal norms presented in Section 2.1.3. Given an asymptotically stable ($A$ Hurwitz) continuous-time system, $G$, the $\mathcal{H}_\infty$-norm is

$$\|G\|_{\mathcal{H}_\infty} \triangleq \max_{w(t) \neq 0} \frac{\|z(t)\|_{\mathcal{L}_2}}{\|w(t)\|_{\mathcal{L}_2}} = \max_{\|w(t)\|_{\mathcal{L}_2} = 1} \|z(t)\|_{\mathcal{L}_2}. \qquad (2.26)$$

Looking at (2.26), it can be observed that the $\mathcal{H}_\infty$-norm is indeed an induced norm, and hence satisfies the multiplicative property

$$\|GF\|_{\mathcal{H}_\infty} \leq \|G\|_{\mathcal{H}_\infty} \|F\|_{\mathcal{H}_\infty}.$$

This is one reason for the popularity of this norm.

The definition of the $\mathcal{H}_\infty$-norm in the frequency domain is

Definition 2.9 ($\mathcal{H}_\infty$-norm). For an asymptotically stable ($A$ Hurwitz) continuous-time system, $G$, the $\mathcal{H}_\infty$-norm is, in the frequency domain, defined as

$$\|G\|_{\mathcal{H}_\infty} \triangleq \max_{\omega \in \mathbb{R}} \bar{\sigma}\left(G(i\omega)\right). \qquad (2.27)$$


Observe that for the $\mathcal{H}_\infty$-norm, the system does not have to be strictly proper. The $\mathcal{H}_\infty$-norm is, however, not as straightforward to compute as the $\mathcal{H}_2$-norm. One way to compute the $\mathcal{H}_\infty$-norm is to compute the smallest value $\gamma$ such that the Hamiltonian matrix $W$ has no eigenvalues on the imaginary axis, where

$$W = \begin{bmatrix} A + B R^{-1} D^T C & B R^{-1} B^T \\ -C^T \left(I + D R^{-1} D^T\right) C & -\left(A + B R^{-1} D^T C\right)^T \end{bmatrix} \qquad (2.28)$$

and $R \triangleq \gamma^2 I - D^T D$.
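A minimal bisection sketch built on (2.28): gamma is an upper bound on the norm exactly when W has no purely imaginary eigenvalues. The function names, tolerances and matrices are illustrative assumptions, and the eigenvalue test is a crude numerical threshold rather than a production-quality criterion.

```python
import numpy as np

def hamiltonian(A, B, C, D, gamma):
    """The matrix W in (2.28) for a given gamma."""
    R = gamma**2 * np.eye(D.shape[1]) - D.T @ D
    Ri = np.linalg.inv(R)
    Ac = A + B @ Ri @ D.T @ C
    return np.block([
        [Ac, B @ Ri @ B.T],
        [-C.T @ (np.eye(D.shape[0]) + D @ Ri @ D.T) @ C, -Ac.T]])

def hinf_norm(A, B, C, D, hi=1e3, tol=1e-6):
    lo = 0.0   # assumes sigma_max(D) is below every tested gamma
    while hi - lo > tol:
        gamma = 0.5 * (lo + hi)
        eigs = np.linalg.eigvals(hamiltonian(A, B, C, D, gamma))
        if np.any(np.abs(eigs.real) < 1e-8):
            lo = gamma   # imaginary-axis eigenvalue: gamma below the norm
        else:
            hi = gamma   # no such eigenvalue: gamma is an upper bound
    return hi

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
print(hinf_norm(A, B, C, D))
```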

Discrete-Time $\mathcal{H}_\infty$-Norm

The material for the continuous-time case readily extends to the discrete-time case. The definition of the $\mathcal{H}_\infty$-norm in discrete time becomes

Definition 2.10 ($\mathcal{H}_\infty$-norm). For an asymptotically stable ($A$ Schur) discrete-time system, $G$, the $\mathcal{H}_\infty$-norm is, in the frequency domain, defined as

$$\|G\|_{\mathcal{H}_\infty} \triangleq \max_{\omega \in [-\pi, \pi]} \bar{\sigma}\left(G(e^{i\omega})\right). \qquad (2.29)$$

2.1.4 Output-Feedback Controller

An output-feedback controller, $K$, of order $n_K$ can be described as a linear system

$$\dot{x}_K(t) = K_A x_K(t) + K_B y(t), \qquad (2.30a)$$
$$u(t) = K_C x_K(t) + K_D y(t), \qquad (2.30b)$$

where $x_K \in \mathbb{R}^{n_K}$ is the state vector of the controller, $y \in \mathbb{R}^{n_y}$ the measurement signal and $u \in \mathbb{R}^{n_u}$ the control signal. A commonly used model for analyzing systems and measuring performance, which will be used in this thesis, is

$$\begin{pmatrix} \dot{x} \\ z \\ y \end{pmatrix} = \begin{pmatrix} A & B_1 & B_2 \\ C_1 & D_{11} & D_{12} \\ C_2 & D_{21} & D_{22} \end{pmatrix} \begin{pmatrix} x \\ w \\ u \end{pmatrix}, \qquad (2.31)$$

where $x \in \mathbb{R}^{n_x}$ is the state vector, $w \in \mathbb{R}^{n_w}$ the disturbance signal, $z \in \mathbb{R}^{n_z}$ the performance measure, $u \in \mathbb{R}^{n_u}$ the control signal and $y \in \mathbb{R}^{n_y}$ the measurement signal. Here, the matrix $D_{22}$ is assumed, without loss of generality, to be zero, see Zhou et al. [1996]. Combining equations (2.31) and (2.30) yields a state-space representation of the closed-loop system from $w$ to $z$, see Figure 2.1,

Zhou et al. [1996]. Combine equations (2.31) and (2.30) to arrive at a state-space representation of the closed-loop system from

T w,z

=

⎢⎢⎢⎢

C

A

1

+

+ B

K

D

2

B

12

K

C

K

D

2

D

C

C

2

2

B

2

K

D

K

A

12

C

K

C

w

D

B

1

11 to

+

+ z , see Figure 2.1,

K

B

2

B

D

K

D

D

12

21

K

D

D

21

D

21

⎥⎥⎥⎥

.

(2.32)
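The closed-loop realization (2.32) is mechanical to assemble; a minimal sketch (the function name and signature are illustrative assumptions):

```python
import numpy as np

def close_loop(A, B1, B2, C1, C2, D11, D12, D21, KA, KB, KC, KD):
    """Assemble the realization of T_{w,z} in (2.32), assuming D22 = 0."""
    Acl = np.block([[A + B2 @ KD @ C2, B2 @ KC],
                    [KB @ C2,          KA]])
    Bcl = np.vstack([B1 + B2 @ KD @ D21,
                     KB @ D21])
    Ccl = np.hstack([C1 + D12 @ KD @ C2, D12 @ KC])
    Dcl = D11 + D12 @ KD @ D21
    return Acl, Bcl, Ccl, Dcl
```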

The two types of controllers that will be mentioned in this thesis are $\mathcal{H}_2$ and $\mathcal{H}_\infty$ controllers. These controllers are designed to minimize the $\mathcal{H}_2$- or $\mathcal{H}_\infty$-norm of the closed-loop system, $T_{w,z}$.


[Figure 2.1: The feedback interconnection of the plant $G$ and the controller $K$, with disturbance $w$, performance output $z$, control signal $u$ and measurement $y$.]

The problem of finding an $\mathcal{H}_2$ or $\mathcal{H}_\infty$ controller can be divided into three cases. The simple case, for both $\mathcal{H}_2$ and $\mathcal{H}_\infty$ controllers, is to find a full-order controller, $n_K = n_x$, see e.g., Skogestad and Postlethwaite [2007] or Zhou et al. [1996]. The two more difficult cases are to find a reduced-order controller, $0 < n_K < n_x$, or a static output-feedback controller, $n_K = 0$. However, the problem of computing a reduced-order controller can be reformulated as a static controller problem; this is shown in El Ghaoui et al. [1997] and restated here for clarification.

[1997] and restated here for clarification.

To see that the problem of finding a reduced-order controller can be reformulated as a static output-feedback controller, first create the augmented system,

G aug

.

G aug

=

⎢⎢⎢⎢

A

aug

=

A

0

A

aug

C

1

,aug

C

2

,aug

D

B

1

,aug

11

,aug

D

21

,aug

0

0

,

B

1

,aug

=

B

1

0

B

2

,aug

D

12

,aug

D

22

,aug

,

B

2

,aug

=

⎥⎥⎥⎥

,

where

0

I

B

0

2

,

C

1

,aug

= C

1

C

2

,aug

=

0

,

D

11

,aug

= D

11

,

D

12

,aug

=

0

C

2

I

0

,

D

21

,aug

=

D

0

21

,

D

0 D

22

,aug

12

= 0

,

,

with the new state space vector augmented with x

K

∈ R

n

K

, x

aug

= x x

K

, the new control signal augmented with u

K

∈ R

n

K

, u

aug

= u

K

u and the new measurement signal augmented with y

K

∈ R

n

K

sizes with all elements zero and

, y

aug

= y y

K

. The 0 ’s are matrices of compatible

I are identity matrices of compatible sizes.

Now use the static controller $u_{\text{aug}} = K_{\text{aug}} y_{\text{aug}}$ on $G_{\text{aug}}$, where $K_{\text{aug}}$ has the structure

$$K_{\text{aug}} = \begin{bmatrix} K_A & K_B \\ K_C & K_D \end{bmatrix},$$

where $K_A$, $K_B$, $K_C$ and $K_D$ are the matrices from the controller in (2.30). Computing the closed-loop equations for this feedback system leads to the same equations as in (2.32). This shows that any method for computing a static output-feedback controller can also be used to compute a reduced-order controller.
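The augmentation is likewise mechanical; a sketch of a helper (names and shapes are illustrative assumptions) that builds the augmented plant from the matrices in (2.31) for a desired controller order n_K:

```python
import numpy as np

def augment(A, B1, B2, C1, C2, D11, D12, D21, nK):
    """Build the augmented plant so that a static controller K_aug
    of size (nK + n_u) x (nK + n_y) acts as an order-nK controller."""
    n, nu, ny = A.shape[0], B2.shape[1], C2.shape[0]
    A_aug = np.block([[A, np.zeros((n, nK))],
                      [np.zeros((nK, n + nK))]])
    B1_aug = np.vstack([B1, np.zeros((nK, B1.shape[1]))])
    B2_aug = np.block([[np.zeros((n, nK)), B2],
                       [np.eye(nK), np.zeros((nK, nu))]])
    C1_aug = np.hstack([C1, np.zeros((C1.shape[0], nK))])
    C2_aug = np.block([[np.zeros((nK, n)), np.eye(nK)],
                       [C2, np.zeros((ny, nK))]])
    D12_aug = np.hstack([np.zeros((D12.shape[0], nK)), D12])
    D21_aug = np.vstack([np.zeros((nK, D21.shape[1])), D21])
    return (A_aug, B1_aug, B2_aug, C1_aug, C2_aug,
            D11, D12_aug, D21_aug)
```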

2.1.5 lpv Systems

A natural generalization of lti systems is linear time-varying systems, ltv systems, where the state-space matrices can depend on time. The drawback is that ltv systems are very hard to analyze and work with. This raises the need for an intermediate system representation, and this is where linear parameter-varying systems, lpv systems, come in. lpv systems depend on scheduling parameters, $p$, that vary with time but are measurable. A general lpv system can be written, in state-space representation, in continuous time (see Tóth [2008]), as

$$G(p): \quad \dot{x}(t) = A(p) x(t) + B(p) u(t), \qquad y(t) = C(p) x(t) + D(p) u(t), \qquad (2.33)$$

where $p$ is the vector of scheduling parameters. Note that there is no restriction on how the lpv system depends on the scheduling parameters; the dependence can be nonlinear and can also involve the time derivative of $p$.

lpv systems have the property that if the scheduling parameters in the lpv system are kept constant, the system becomes a regular lti system.

As with ordinary lti systems, the state-space representation of an lpv system is not unique, and it is possible, by applying a state transformation, to change the basis of the states. As with the system matrices, when generalizing from lti to lpv systems, the state transformations can depend on the scheduling parameters, i.e.,

$$x = T(p)\hat{x}, \qquad (2.34)$$

where $T(p)$ is a nonsingular, continuously differentiable matrix for all $t$. Applying this similarity transformation to the system in (2.33) yields

$$\hat{G}(p) = \left[\begin{array}{c|c} T^{-1}(p) A(p) T(p) - T^{-1}(p) \dot{T}(p) & T^{-1}(p) B(p) \\ \hline C(p) T(p) & D(p) \end{array}\right]. \qquad (2.35)$$

Note that there is a term in the new $A$-matrix that depends on the time derivative of the state transformation.

A general discrete-time state-space lpv system can be written as, see Kulcsar and Tóth [2011],

$$G(\mathcal{P}_k) = \left[\begin{array}{c|c} A(\mathcal{P}_k) & B(\mathcal{P}_k) \\ \hline C(\mathcal{P}_k) & D(\mathcal{P}_k) \end{array}\right], \qquad (2.36)$$

where $\mathcal{P}_k = \{p_{k+j}\}_{j=-\infty}^{\infty}$. By applying a similarity transformation (which can depend on the parameters), i.e.,

$$x_k = T(p_k)\hat{x}_k, \qquad (2.37)$$

where $T(p_k)$ is a nonsingular and bounded matrix for all $k$, an lpv system with the same behavior but with another state-space representation is constructed,

$$\hat{G}(\mathcal{P}_k) = \left[\begin{array}{c|c} T^{-1}(p_{k+1}) A(\mathcal{P}_k) T(p_k) & T^{-1}(p_{k+1}) B(\mathcal{P}_k) \\ \hline C(\mathcal{P}_k) T(p_k) & D(\mathcal{P}_k) \end{array}\right]. \qquad (2.38)$$

Looking at how the state transformations work for the lpv systems above, one realizes that in one state basis the state-space matrices can depend on only the current value of the parameter, while in another they can also depend on its derivative (in discrete time, on parameter values at time steps other than the current one). Similar behavior can be seen when going from a state-space lpv form to an input-output model structure of the system. For example, study an example from Tóth et al. [2012], where a second-order state-space representation of an lpv system is used,

$$x_{k+1} = \begin{bmatrix} 0 & -a_2(p_k) \\ 1 & -a_1(p_k) \end{bmatrix} x_k + \begin{bmatrix} b_2(p_k) \\ b_1(p_k) \end{bmatrix} u_k, \qquad y_k = \begin{bmatrix} 0 & 1 \end{bmatrix} x_k.$$

This system only depends on the current parameter value, i.e., $p_k$. However, the equivalent input-output form becomes

$$y_k = -a_1(p_{k-1})\, y_{k-1} - a_2(p_{k-2})\, y_{k-2} + b_1(p_{k-1})\, u_{k-1} + b_2(p_{k-2})\, u_{k-2},$$

which is clearly dependent on more than only the current parameter value. Hence, when working with lpv systems, it is important to note whether one is working with state-space or input-output forms, since these can give rise to different dependencies on the parameters.
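The state-space/input-output discrepancy above is easy to verify in simulation; a minimal sketch, with illustrative (made-up) coefficient functions and zero initial conditions:

```python
import numpy as np

# illustrative parameter-dependent coefficients
a1 = lambda p: 0.3 * p
a2 = lambda p: 0.1 * p
b1 = lambda p: 1.0 + 0.2 * p
b2 = lambda p: 0.5 * p

rng = np.random.default_rng(0)
N = 50
p = rng.uniform(-1.0, 1.0, N)   # scheduling trajectory
u = rng.standard_normal(N)      # input signal

# state-space form: depends only on the current p_k
x = np.zeros(2)
y_ss = np.zeros(N)
for k in range(N):
    y_ss[k] = x[1]
    Ak = np.array([[0.0, -a2(p[k])], [1.0, -a1(p[k])]])
    Bk = np.array([b2(p[k]), b1(p[k])])
    x = Ak @ x + Bk * u[k]

# input-output form: depends on shifted parameter values
y_io = np.zeros(N)
y_io[:2] = y_ss[:2]
for k in range(2, N):
    y_io[k] = (-a1(p[k-1]) * y_io[k-1] - a2(p[k-2]) * y_io[k-2]
               + b1(p[k-1]) * u[k-1] + b2(p[k-2]) * u[k-2])

assert np.allclose(y_ss, y_io)   # same behavior, different dependence
```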

2.2 Optimization

This section starts by giving a brief presentation of optimization and some methods that can be used to solve optimization problems. The presentation closely follows the relevant sections in Nocedal and Wright [2006].

Most optimization problems can mathematically be written as

$$\begin{aligned} \underset{x}{\text{minimize}} \quad & f(x) \\ \text{subject to} \quad & g_{I,i}(x) \leq 0, \quad i = 1, \ldots, m_I, \\ & g_{E,i}(x) = 0, \quad i = 1, \ldots, m_E, \end{aligned}$$

where $f(x)$ is the cost function, $f : \mathbb{R}^n \to \mathbb{R}$ and $x \in \mathbb{R}^n$, and $g_{I,i}(x)$, $g_{E,i}(x)$ are constraint functions. A vector $x^\star$ is called optimal if it produces the smallest value of the cost function of all the $x$ that satisfy the constraints. In this thesis, the problems will mostly be unconstrained, i.e., problems without any $g_{I,i}(x)$ or $g_{E,i}(x)$. The value attained at the solution, $f(x^\star)$, is called a minimum of the optimization problem. This can be either a local or a global minimum, and the point where this value is attained, $x^\star$, is called a minimizer (local or global). One way to classify when a minimum is attained is to use first-order necessary conditions.

Optimization problems can be divided into two classes, convex optimization problems and non-convex optimization problems. The problems of interest in this thesis will be non-convex. To explain what a non-convex problem is, a convex problem is presented first.

First, define a convex set. A convex set, $\mathcal{N}$, is a set such that any point, $z$, on a line between any two points, $x, y$, in the set also lies in the set, i.e.,

$$\theta x + (1 - \theta) y = z \in \mathcal{N}, \quad \forall \theta \in [0, 1], \quad \forall x, y \in \mathcal{N}. \qquad (2.39)$$

A convex function is defined in the same manner. A function is convex if it satisfies

$$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$$

for all $x, y \in \mathcal{N}$ and $\theta \in [0, 1]$, where $\mathcal{N}$ is a convex set.

A convex optimization problem is an optimization problem where both the cost function and the feasible set, the set of $x$'s defined by the constraints, are convex. Convex optimization problems have the feature that a local minimizer is always a global minimizer. This means that when a minimum is found in a convex optimization problem, it is the global minimum. This guarantee does not exist in general for non-convex optimization problems. The problem of finding the global minimizer for a general non-convex optimization problem is difficult, and often only local minimizers are sought. For further reading see, e.g., Nocedal and Wright [2006].

2.2.1 Local Methods

One approach to solving non-convex optimization problems is to use local methods, methods that seek a local minimizer, i.e., a point that, in a neighborhood of feasible points, has the smallest value of the cost function. A class of local methods which is widely used today for solving nonlinear non-convex problems is the class of quasi-Newton line-search methods. These methods typically require that the cost function is twice continuously differentiable, at least for the convergence theory to hold. However, in practice, these methods have been shown to work well on certain non-smooth problems as well, see for example Lewis and Overton [2012].

The line-search strategy is to find a direction $p_k$ and a step length $\alpha_k$ such that

$$f_k \triangleq f(x_k) > f(x_k + \alpha_k p_k). \qquad (2.40)$$

There exist many suggestions for how to find the direction $p_k$ and the step length $\alpha_k$. One suggestion, and maybe the most obvious, is to take the steepest-descent direction, which is $p_k = -\nabla f_k / \|\nabla f_k\|$, and choose $\alpha_k$ as

$$\alpha_k \triangleq \arg\min_{\alpha} f(x_k + \alpha p_k).$$

A benefit with the choice $p_k = -\nabla f_k / \|\nabla f_k\|$ is that only information about the gradient is needed and no second-order information, i.e., no information about the Hessian. The problem with choosing the steepest-descent direction is that the convergence can be extremely slow.

By exploiting second-order information about the cost function, a better search direction can be produced. Assume a model function

$$m_k(p) \triangleq f_k + p^T \nabla f_k + \tfrac{1}{2}\, p^T \nabla^2 f_k\, p,$$

that approximates the function $f$ well in a neighborhood of $x_k$, and define $p_k$ to be the solution to $\min_p m_k(p)$, i.e., $p_k = -(\nabla^2 f_k)^{-1} \nabla f_k$, where $\alpha_k$ is chosen according to some conditions; for more detail see, for example, Nocedal and Wright [2006]. A method with this choice of direction is called a Newton method. There are, however, two major drawbacks with this method: the Hessian has to be computed, which can be very time consuming, and the Hessian has to be positive definite.

Quasi-Newton Methods

Quasi-Newton methods are methods that resemble Newton methods but in some way try to approximate the Hessian in a computationally efficient manner. As in the Newton method, start with a quadratic model function

$$m_k(p) \triangleq f_k + \nabla f_k^T p + \tfrac{1}{2}\, p^T B_k\, p,$$

where $B_k$ is a symmetric positive definite matrix. Instead of computing a new $B_k$ for every iteration, only an update of $B_k$ is wanted to obtain $B_{k+1}$. As for the Newton method, the minimizer of the model function is $p_k = -B_k^{-1} \nabla f_k$, which is then used to calculate $x_{k+1}$ as

$$x_{k+1} = x_k + \alpha_k p_k.$$

As in the Newton method, $\alpha_k$ is chosen according to some conditions which will not be further discussed here; see e.g., Nocedal and Wright [2006] for further reading.

One way of updating $B_k$ is to let $B_{k+1}$ be the solution to the optimization problem

$$\begin{aligned} \underset{B}{\text{minimize}} \quad & \|B - B_k\|_{G_k^{-1}} \\ \text{subject to} \quad & B = B^T, \quad B s_k = y_k, \end{aligned} \qquad (2.41)$$

where $s_k \triangleq \alpha_k p_k$ and $y_k \triangleq \nabla f_{k+1} - \nabla f_k$. The norm used in the optimization problem is the weighted Frobenius norm,

$$\|B\|_{G_k^{-1}} \triangleq \left\| G_k^{-1/2}\, B\, G_k^{-1/2} \right\|_F, \qquad G_k \triangleq \int_0^1 \nabla^2 f(x_k + \tau \alpha_k p_k)\, d\tau.$$

The structure of the optimization problem (2.41) can be explained as follows. The constraint that $B$, which is an approximation of the Hessian, should be symmetric is obvious for a twice differentiable function. The second constraint, the secant equation, ensures that $B$ generates a consistent expression for a first-order approximation of the Hessian using the gradient. To determine $B_{k+1}$ uniquely, the $B$ that is, in some sense, closest to $B_k$ is chosen. Additionally, the minimization problem is made scale-invariant and dimensionless, which explains the choice of norm and weights.

The optimization problem (2.41) has a closed-form solution,

$$B_{k+1} = \left(I - \rho_k y_k s_k^T\right) B_k \left(I - \rho_k s_k y_k^T\right) + \rho_k y_k y_k^T, \qquad \rho_k \triangleq \frac{1}{y_k^T s_k}.$$

This update of $B_k$ is called the dfp (Davidon-Fletcher-Powell) updating formula. To compute the direction $p_k = -B_k^{-1} \nabla f_k$, the inverse of $B_k$ is needed. Since $B_{k+1}$ is a rank-two update of $B_k$, the inverse $H_{k+1} \triangleq B_{k+1}^{-1}$ can be expressed in closed form as

$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}.$$

An even better updating formula is the bfgs (Broyden-Fletcher-Goldfarb-Shanno) updating formula, where a similar optimization problem as before, but for $H_{k+1}$ instead, is solved. $H_{k+1}$ is the solution to the optimization problem

$$\begin{aligned} \underset{H}{\text{minimize}} \quad & \|H - H_k\|_{G_k} \\ \text{subject to} \quad & H = H^T, \quad H y_k = s_k, \end{aligned}$$

which has the solution

$$H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T.$$

The benefit with quasi-Newton methods is that every iteration in the optimization scheme can now be performed with complexity $\mathcal{O}(n^2)$, not including function and gradient evaluations. The bfgs scheme will be used extensively in the strategies proposed in this thesis.
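To make the scheme concrete, a minimal sketch of one bfgs iteration with the inverse-Hessian update above. The step length is fixed here for brevity; in practice, alpha_k comes from a line search satisfying suitable conditions.

```python
import numpy as np

def bfgs_step(x, H, grad_f, alpha=1.0):
    """One quasi-Newton step with the BFGS inverse-Hessian update."""
    g = grad_f(x)
    p = -H @ g                    # search direction p_k = -H_k grad(f_k)
    x_new = x + alpha * p
    s = x_new - x                 # s_k = alpha_k p_k
    y = grad_f(x_new) - g         # y_k = grad(f_{k+1}) - grad(f_k)
    rho = 1.0 / (y @ s)
    I = np.eye(len(x))
    H_new = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
             + rho * np.outer(s, s))
    return x_new, H_new
```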

2.3 Matrix Theory

This section will briefly present, for the sake of easy reference in the later chapters, some basic matrix-theory concepts and definitions. The presented theory can also be found in Higham [2008], Skelton et al. [1998] and Lancaster and Tismenetsky [1985].

2.3.1 Properties for Dynamical Systems

In this thesis, linear dynamical systems play an important role, especially asymptotically stable linear systems. Two useful matrix definitions for discrete- and continuous-time linear systems are:

Definition 2.11. Let $\lambda_i$ be the eigenvalues of the square matrix $A$. If $\operatorname{Re} \lambda_i < 0$, $\forall i$, then $A$ is called Hurwitz.

Definition 2.12. Let $\lambda_i$ be the eigenvalues of the square matrix $A$. If $|\lambda_i| < 1$, $\forall i$, then $A$ is called Schur.

For a continuous-time (discrete-time) linear system it holds that, if the $A$-matrix is Hurwitz (Schur), then the system is asymptotically stable.

As was explained in Section 2.1.2, the Gramians of linear systems are an important part of this thesis. To compute these Gramians, a number of Lyapunov equations (both continuous and discrete), as in (2.7) and (2.14), have to be solved. An important question to ask is: when do these equations have a unique solution?

Theorem 2.1 (Corollary 3.3.3 in Skelton et al. [1998]). A matrix $X$ solving a Lyapunov equation

$$0 = A X + X A^T + Y, \qquad Y \succeq 0, \qquad (2.42)$$

is unique if and only if there are no two eigenvalues of $A$ that are symmetrically located about the imaginary axis.

Proof: The left eigenvectors $v_i$ of $A$ satisfy $v_i^* A = \lambda_i v_i^*$. Multiplying (2.42) from the left and right by $v_i^*$ and $v_j$, respectively, gives

$$0 = v_i^* A X v_j + v_i^* X A^T v_j + v_i^* Y v_j = v_i^* X v_j \left(\lambda_i + \lambda_j\right) + v_i^* Y v_j. \qquad (2.43)$$

This yields unique values for the elements of the transformed $\hat{X} \triangleq V^{-1} X V^{-*}$, with $V^{-*} = [v_1 \cdots v_n]$:

$$[\hat{X}]_{ij} = v_i^* X v_j = -\frac{v_i^* Y v_j}{\lambda_i + \lambda_j}, \quad \forall i, j, \qquad (2.44)$$

if and only if $\lambda_i + \lambda_j \neq 0$ for all $i$ and $j$.


Theorem 2.2 (Corollary 3.4.1 in Skelton et al. [1998]). A matrix $X$ solving the discrete Lyapunov equation

$$0 = A X A^T - X + Y, \qquad Y \succeq 0, \qquad (2.45)$$

is unique if and only if $\lambda_i(A)\,\lambda_j(A) \neq 1$ for all $i$ and $j$.

Proof: Multiply (2.45) from the left and right with the matrix of left eigenvectors of $A$ (where $\lambda_i v_i^* = v_i^* A$, $V^{-*} = [v_1\, v_2 \cdots v_n]$, $V^{-1} A V = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$), as follows,

$$V^{-1} X V^{-*} = V^{-1} \left(A X A^T + Y\right) V^{-*} = V^{-1} A V\, V^{-1} X V^{-*}\, V^* A^T V^{-*} + V^{-1} Y V^{-*} = \Lambda\, V^{-1} X V^{-*}\, \Lambda + V^{-1} Y V^{-*}.$$

This yields unique values for the elements of the transformed $\hat{X} \triangleq V^{-1} X V^{-*}$,

$$[\hat{X}]_{ij} = v_i^* X v_j = \frac{1}{1 - \lambda_i \lambda_j}\, v_i^* Y v_j, \qquad (2.46)$$

if and only if $\lambda_i \lambda_j \neq 1$ for all $i$ and $j$.

The two theorems above tell us that, given an asymptotically stable system ($A$ Hurwitz in continuous time and $A$ Schur in discrete time), the solutions to the Lyapunov equations for the Gramians are unique.

2.3.2 Matrix Functions

This section will give some definitions of matrix functions and present some theory that will be useful in the later chapters of the thesis.

As stated in Higham [2008], there exist many ways of defining matrix functions, $f(A)$. Presented here is the definition via the Jordan canonical form, which exists for all matrices, see for example Lancaster and Tismenetsky [1985].

Definition 2.13 (Definition 1.1 in Higham [2008]). The function $f$ is said to be defined on the spectrum of $A \in \mathbb{C}^{n \times n}$ if the values

$$f^{(j)}(\lambda_i), \quad j = 0, 1, \ldots, n_i - 1, \quad i = 1, 2, \ldots, s, \qquad (2.47)$$

exist. These are called the values of the function $f$ on the spectrum of $A$. Here $n_i$ are the sizes of the individual Jordan blocks in $A$ and $s$ is the number of distinct eigenvalues.

Now, if $f$ is defined on the spectrum of the matrix, then it is possible to define $f(A)$.

Definition 2.14 (Definition 1.2 in Higham [2008]). Let $f$ be defined on the spectrum of $A \in \mathbb{C}^{n \times n}$, let $J_k$ denote a Jordan block in $A = Z J Z^{-1} = Z \operatorname{diag}(J_k) Z^{-1}$, and let $\lambda_k$ denote an eigenvalue of $A$. Then

$$f(A) \triangleq Z f(J) Z^{-1} = Z \operatorname{diag}\left(f(J_k)\right) Z^{-1}, \qquad (2.48)$$

where

$$f(J_k) \triangleq \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(n_k - 1)}(\lambda_k)}{(n_k - 1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}. \qquad (2.49)$$

For example, given the function $f(x) = \sin x$, and a diagonalizable matrix $A = Z D Z^{-1} = Z \operatorname{diag}(\lambda_i) Z^{-1}$, the definition above can be used to compute $f(A)$ as

$$\sin A = Z (\sin D) Z^{-1} = Z \operatorname{diag}(\sin \lambda_i)\, Z^{-1}. \qquad (2.50)$$

A number of properties of general matrix functions can be derived that allow them to be used more efficiently.

Theorem 2.3 (Theorem 1.18 in Higham [2008]). Let $f$ be analytic on an open subset $\Omega \subseteq \mathbb{C}$ such that each connected component of $\Omega$ is closed under conjugation. Consider the corresponding matrix function $f$ on its natural domain in $\mathbb{C}^{n \times n}$, the set $\mathcal{D} = \{A \in \mathbb{C}^{n \times n} : \Lambda(A) \subseteq \Omega\}$. Then the following are equivalent:

(a) $f(A^*) = f(A)^*$ for all $A \in \mathcal{D}$.
(b) $f(\bar{A}) = \overline{f(A)}$ for all $A \in \mathcal{D}$.
(c) $f(\mathbb{R}^{n \times n} \cap \mathcal{D}) \subseteq \mathbb{R}^{n \times n}$.
(d) $f(\mathbb{R} \cap \Omega) \subseteq \mathbb{R}$.

Theorem 2.4 (Theorem 1.19 in Higham [2008]). Let $\mathcal{D}$ be an open subset of $\mathbb{R}$ or $\mathbb{C}$ and let $f$ be $n - 1$ times continuously differentiable on $\mathcal{D}$. Then $f(A)$ is a continuous matrix function on the set of matrices $A \in \mathbb{C}^{n \times n}$ with spectrum in $\mathcal{D}$.

Theorem 2.5 (Theorem 1.20 in Higham [2008]). Let $f$ satisfy the conditions of Theorem 2.4. Then $f(A) = 0$ for all $A \in \mathbb{C}^{n \times n}$ with spectrum in $\mathcal{D}$ if and only if $f(A) = 0$ for all diagonalizable $A \in \mathbb{C}^{n \times n}$ with spectrum in $\mathcal{D}$.

Theorem 2.5 (together with Theorem 2.4) can be interpreted as follows: if a function satisfies some mild continuity conditions (see Theorem 2.4), then to check the validity of a matrix identity it is sufficient to check it only for diagonalizable matrices.

One matrix function that will be used extensively in this thesis is the matrix logarithm, defined below.

22

2 Preliminaries

Definition 2.15.

R

. Let A

Assume A

∈ satisfy the equation

C

n

×

n

A and that

= e

B

A does not have any eigenvalues on for a matrix B

∈ C

n

×

n

, then it holds that

B = ln A , where ln denotes the principal logarithm.

This means, for a diagonalizable matrix A = logarithm of the matrix A can be written as

ZDZ

1

= Z diag(

λ i

) Z

1

, the complex ln A = Z diag (ln

|

λ i

|

+

i

arg

λ i

) Z

1

.

(2.51)

Since computing the matrix logarithm can be computationally heavy, it can be beneficial, when having a sum of logarithm evaluations, to combine them, when possible, to one matrix logarithm computation, e.g., ln two theorems will guide us to when this is possible.

A + ln B = ln AB . The next

Theorem 2.6 (Theorem 11.2 in Higham [2008]).

values on

− ln A

and

R

− ln

and

A

1

/

2

α

=

1

2

[

− ln

1

A

,

.

1]

it holds that

ln A

α

=

α

For

ln

A

A

∈ C

n

×

n with no eigen-

. In particular,

ln A

1

=

Theorem 2.7 (Theorem 11.3 in Higham [2008]).

no eigenvalues on

R

and that

BC = CB

Suppose

B

,

C

. If for every eigenvalue corresponding eigenvalue μ j of

C

,

C

n

×

n

λ j of both have

B

and the

arg

λ j

+ arg

μ j

< π,

(2.52)

then

ln BC = ln B + ln C

.

The methods that will be derived in this thesis will be gradient-based optimization algorithms. Hence, it will be required to compute the Fréchet derivative of the matrix logarithm. The Fréchet derivative can be seen as generalization of the ordinary derivative for matrix functions.

Theorem 2.8 (See Chapter 11 in Higham [2008]).

Let L

( A

,

E )

denote the Fréchet derivative of the matrix logarithm, defined in Definition 2.15, at in the direction

E

∈ C

n

×

n

. Then it holds that

A

∈ C

n

×

n

1

L

( A

,

E ) =

(

t

( A

− I

) +

I

)

1

E

(

t

( A

− I

) +

I

)

1 d

t.

(2.53)

0

As written in (2.51) and (2.53), these equations are not suitable for computational evaluation. Thankfully, there exists computationally e ffi cient and stable algorithms to compute these entities, e.g., the Schur-Parlett algorithm (see, e.g.,

Higham [2008]) can be used to compute ln( A ), and all other functions that are analytic, and an algorithm for computing the Fréchet derivative of the matrix logarithm is described in Al-Mohy et al. [2012].

Frequency-Limited

H

3

2

-Norm

In this chapter, a new

H

2

-measure that, instead of taking the whole frequency interval into account, only focuses on pre-specified intervals is presented. The chapter starts by defining some new Gramians that are based on the ordinary

Gramians in Section 2.1.2, but are limited to a limited frequency interval. These new Gramians are then used to define a new

H

2

-measure that computes the

H

2

norm for a limited frequency interval.

3.1

Frequency-Limited Gramians

This section presents the framework that the new measure, that is presented in Section 3.2, is based on, the frequency-limited Gramians. These Gramians were introduced in Gawronski and Juang [1990] (continuous time) and Horta et al. [1993] (discrete time). The section starts by defining the frequency-limited

Gramians and continues by deriving some properties of the Gramians. Ways to e ffi ciently compute the Gramians are also presented. The results for the continuous-time case, which are also presented in Gawronski and Juang [1990] and

Gawronski [2004], are presented, both for the sake of completeness, and to give a more thorough derivation. Theorem 3.1 and Theorem 3.2, describing the frequency-limited Gramians, are results that already exist in Gawronski [2004]. However, in this section, the results are presented using the given notation and in more detail. The reformulations of S

ω

and S

Ω

3.1 have not been published elsewhere.

presented in Theorem 3.3 and Corollary

The results for the discrete-time case contain a new derivation which di ff ers from

Horta et al. [1993], both in approach and result.

23

24

3 Frequency-Limited

H

2

-Norm

3.1.1

Continuous Time

In this section, it is assumed that the system that is used, stable, with a realization

G

, is asymptotically

˙ (

t

) = Ax (

t

) + Bu (

t

)

,

y (

t

) = Cx (

t

) + Du (

t

)

.

(3.1a)

(3.1b)

G

being asymptotically stable is equivalent to having A Hurwitz. For this system we have that the standard controllability and observability Gramians are

P

Q

1

2

π

−∞

1

H

H

2

π

−∞

iν iν

BB

T

H

C

T

CH

iν iν

d

ν,

d

ν,

(3.2a)

(3.2b) where H

(

I

A )

1

. The controllability and observability Gramians also satisfy the Lyapunov equations

0 = AP + PA

T

+ BB

T

,

0 = A

T

Q + QA + C

T

C

.

(3.3a)

(3.3b)

Narrowing the frequency band in (3.2), from (

−∞

,

) to (

ω, ω

), where

ω <

, leads to the definition of the frequency-limited Gramians, see Gawronski and

Juang [1990].

Definition 3.1.

The frequency-limited controllability and observability Gramians for the system (3.1), are defined as

P

Q

ω

ω

ω

1

2

π

ω

ω

1

H

H

2

π

ω iν iν

BB

T

H

C

T

CH

iν iν

d

ν,

d

ν,

(3.4a)

(3.4b) with

ω <

.

As with the ordinary Gramians, the frequency-limited Gramians can also be written as solutions to two Lyapunov equations.

Theorem 3.1.

Given a system G

=

A

C

B

D

, where

A

is Hurwitz, it holds that

P

ω where

AP + PA

T

+ BB

T

= 0

and

S

ω

=

S

ω

P + PS

T

ω

,

1

2

π

ω

ω

H

(3.5) d

ν . Furthermore,

P

ω can also

3.1

Frequency-Limited Gramians

25

be computed as a solution to

AP

ω

+ P

ω

A

T

+ S

ω

BB

T

+ BB

T

S

T

ω

= 0

.

(3.6)

Lemma 3.1.

For the ordinary controllability and observability Gramians,

Q

, in (3.3), it holds that

P

and

H

BB

T

H

H

C

T

CH

= PH

= QH

iν iν

+ H

P

,

+ H

Q

.

(3.7a)

(3.7b)

Proof: Using the definition of side of (3.7a), it holds that

H

H

1

P + PH

−∗

= (

I −

A ) P + and starting with a variant of the right hand

P

I −

A

T

=

AP + PA

T

= BB

T

,

(3.8) which can be written as (3.7a) by multiplying with H and right, respectively. Similarly, it holds that

H

−∗

Q + QH

1

=

I −

A

T

Q + Q (

I −

A ) =

− and H

A

T

Q + QA which can be written as (3.7b) by multiplying with and right, respectively.

H

and H

from left

= C

T

C

(3.9) from left

Proof of Theorem 3.1: can be written as

P

ω

=

=

1

ω

2

π

PS

ω

ω

+

H

S

ω iν

P

.

Using the definition of

BB

T

H

d

ν

ω

1

= P

2

π

ω

P

ω

H

∗ in (3.4a) and Lemma 3.1, P

ω iν

d

ν

+

ω

1

2

π

ω

H

d

ν

P

Hence, it holds that P

ω

= PS + S

ω

P , with S

ω

=

1

2

π

ω

ω

H

d

ν

.

Before showing that (3.6) holds, observe that

AS

ω

= A

⎜⎜⎜⎝

2

1

π

ω

ω

H

d

ν

⎟⎟⎟⎟

⎟⎟⎟⎠

= A

⎜⎜⎜⎝

⎜⎜⎜⎜

ω

=

⎜⎜⎜⎝

⎜⎜⎜⎜

1

2

π

ω

(

I −

A )

1 d

ν

⎟⎟⎟⎠

⎟⎟⎟⎟

A =

1

2

π

ω

S

ω

ω

A

,

(

I −

A

)

1 d

ν

⎟⎟⎟⎠

⎟⎟⎟⎟ i.e., the matrices

S

ω

P

A and S

ω

commute. Using the newly shown result together with the fact that A and S

ω

commute, AP

ω

+ P

ω

A

T

P

ω

= PS

ω

+ can be written

26

3 Frequency-Limited

H

2

-Norm as

AP

ω

+ P

ω

A

T

=

=

A ( S

S

ω

ω

P

AP

+ PS

ω

+ PA

T

) + (

+

S

ω

P

AP +

+ PS

ω

) A

T

PA

T

S

ω

=

S

ω

BB

T

BB

T

S

ω

.

Hence, (3.6) holds.

The same can be stated for the observability Gramian

Theorem 3.2.

Given a system G

=

A

C

B

D

, where

A

is Hurwitz, it holds that

Q

ω where

A

T

Q + QA + B

T

B = 0

be computed as a solution to and

S

ω

=

S

T

ω

Q + QS

ω

,

1

2

π

ω

ω

H

(3.10) d

ν

. Furthermore,

Q

ω can also

A

T

Q

ω

+ Q

ω

A + S

T

ω

C

T

C + C

T

CS

ω

= 0

,

(3.11)

Proof: The proof is analogous with the proof in the previous theorem, with the controllability Gramian.

To be able to compute the limited-frequency Gramians P

ω

and have a more computationally tractable expression for the matrix S

Q

ω

ω

.

we need to

Theorem 3.3.

The matrix

S

ω

S

ω

=

1

2

π

= Re

ω

ω i

π

H

ln (

A d

ν

can be written as iω

I

)

!

.

(3.12)

Proof: We have that

S

ω

1

ω

2

π

ω

H

d

ν

=

1

ω

2

π

ω

(

I −

A )

1 d

ν f

( A )

.

(3.13)

With

f

(

x

) =

1

2

π

ω

ω

(

I −

x

)

1 d

ν

, Theorem 2.5 states that it is su late the function on the spectrum of A . Let

λ

ffi be an eigenvalue of cient to calcu-

A and since A is Hurwitz, it holds that Re

λ <

0. Hence

1

ω

2

π

ω iν

1

λ

d

ν

=

1

2

π

[

i

ln (

λ

) ]

ω

ω

=

2

1

π

(

i

ln (

λ

)

i

ln (

λ

) )

,

where ln ln

|

λ

|

+

i

λ

(3.14) denotes the principal branch of the complex logarithm, namely ln arg

λ

,

π <

arg

λ

π

. Going back to the matrix form entails

λ

=

S

ω

=

ω

1

2

π

ω

H

d

ν

=

1

2

π

[

i

ln (

A )

i

ln (

A ) ]

.

(3.15)

3.1

Frequency-Limited Gramians

27

Since the principal branch of the logarithm is used, Theorem 2.3 is applicable, which for this case means that given a matrix C

∈ C

n

×

n

it holds that ln C = ln C .

S

ω

becomes

S

ω

=

2

i

π

ln(

A

I

) +

2

i

π

ln(

A

I

) = Re

i

π

ln(

A

I

)

!

.

Remark 3.1.

An interesting property to investigate is what happens when finity. First note that if

x

∈ C \ R

ω

tends to in-

, then Re [

i

ln

x

] =

− arg

x

. Now, let

λ

be an eigenvalue to

A with Re

λ <

0 since A is Hurwitz, then Re proaches infinity. Hence, S

ω i

π

ln(

A

I

) will approach

1

2 when

ω

apwill approach

I

2 and the Lyapunov equations (3.6) and (3.11) will approach the Lyapunov equations for the regular Gramians (3.3) when

ω

approaches infinity.

Until now, only a single frequency band (

ω, ω

) around 0 has been considered.

It is also possible to have arbitrary segments in the frequency domain, e.g.,

Ω

= [

ω

4

,

ω

3

]

[

ω

2

,

ω

1

]

[

ω

1

, ω

2

]

[

ω

3

, ω

4

], 0

< ω

1

< ω

2

< ω

3

< ω

4

.

Q

Ω

,

Corollary 3.1.

For a union of disjunct frequency intervals

Ω

=

"

[

ω

2

k

,

ω

2

k

1

]

k

=1

[

ω

2

k

1

, ω

2

k

]

, with

0

ω

1

< ω

2

<

· · ·

< ω

2

N

<

, it holds that

1

P

Ω

=

2

π

Ω

1

Q

Ω

=

2

π

Ω

H

H

iν iν

BB

T

H

C

T

CH

iν iν

d d

ν,

ν,

(3.16a)

(3.16b)

satisfy the Lyapunov equations

0

0

=

=

AP

Ω

A

T

Q

+

ω

+

P

Ω

A

Q

ω

T

+

A +

S

Ω

S

T

Ω

BB

T

+

C

T

C +

BB

T

S

T

Ω

C

T

CS

,

Ω

, where

S

Ω

= Re

i

π

ln ⎢⎣

⎢⎢⎢⎢

k

=1

(

A

2

k

I

) (

A

2

k

1

I

)

1

⎥⎦

⎥⎥⎥⎥

.

(3.17a)

(3.17b)

(3.18)

Proof: The corollary is proven for the observability Gramian, the proof for the controllability Gramian follows the same procedure. Splitting the integral in

(3.16b) into two di ff erent sums with limits of the integral centered around 0,

28

3 Frequency-Limited

H

2

-Norm yields

Q

Ω

1

2

π

Ω

H

BB

T

H

ω

2

k

1

1

2

π

ω

2

k

1

H

N

1

ω

2

k

d

ν

=

k

=1

2

π

ω

2

k iν

BB

T

H

H d

ν iν

=

BB

N k

=1

T

H

Q

ω

2

k iν

− d

Q

ν

ω

2

k

1

.

(3.19)

Define entails

L

ω i

A

T

Q

ω i

+ Q

ω i

A + S

i

C

T

C + C

T

CS

ω i

= 0 . Using the fact that L

ω i

= 0

0 =

N k

=1

L

ω

2

k

+

L

ω

⎜⎝

⎜⎜⎜⎜

2

k

1

N k

=1

S

=

ω

2

k

A

T

⎜⎝

⎜⎜⎜⎜

N k

=1

Q

ω

S

ω

2

k

1

⎟⎠

⎟⎟⎟⎟

T

2

k

C

T

Q

ω

2

k

1

C + C

T

⎟⎠

⎟⎟⎟⎟

C

= A

T

Q

Ω

+

+

⎜⎝

⎜⎜⎜⎜

N k

=1

Q

Ω

⎜⎝

⎜⎜⎜⎜

N k

=1

A

S

+

ω

Q

2

k

S

ω

2

k

T

Ω

C

T

C

Q

+

ω

2

k

1

S

ω

2

k

1

⎟⎟⎟⎟

⎟⎠

C

T

⎟⎟⎟⎟

⎟⎠

CS

A

Ω

.

(3.20)

Hence, it is proven that (3.17b) holds. If

2

N

S

Ω

Lyapunov equation has to be solved to obtain can be computed, then only one

Q

Ω

.

S

Ω is for the moment a sum of matrix logarithms, which, using Theorem 2.6, can be rewritten as

S

Ω

=

N k

=1

S

ω

2

k

S

ω

2

k

1

= Re

= Re

i

N

π k

=1

i

π

N k

=1

[ln (

A

2

k

I

)

− ln (

A

2

k

1

I

) ] ln (

A

2

k

I

) + ln (

A

2

k

1

I

)

1

(3.21)

Now, we want to show that this sum can be combined into one matrix logarithm evaluation. Theorem 2.5 states that it is su ffi cient to calculate the function on the spectrum of A to show this. Let it holds that Re

λ <

π/

2

N k

=1

<

ln arg

x

2

k x i

<

+ ln

x

1

2

k

1

0. Define arg

x j

< π/

2 for

λ x i

be an eigenvalue to

=

λ

and reorder the terms

i

, with

A

ω i > j

. Note that arg

x

and since

i i

1

>

=

A is Hurwitz,

0 then it holds that

− arg

x i

. Start with

N k

=1 ln

x

2

k

+ ln

x

1

2

k

1

= ln

x

2

N

+ ln

x

1

1

+

N

1 ln

x

2

k k

=1

+ ln

x

1

2

k

+1

.

(3.22)

3.1

Frequency-Limited Gramians

29

Analyzing the argument of the first two terms,

π <

arg

x

2

N x

2

N

+ arg

x

1

1

<

and

0

, x

1

, gives hence, using Theorem 2.7, ln

x

2

N

+ ln

x

1

1

= ln

x

2

N x

1

1

,

π <

arg

x

2

N x

1

1

<

0

.

(3.23)

(3.24)

Analyzing the argument for the last sum in (3.22), yields that for all that 0

0

< ω

1

<

arg

x

< ω

2

2

k

<

+ arg

x

· · ·

< ω

N

1

2

k

+1

< π

and all

. Hence, ln

x i x

2

k

+ ln

x

1

2

k

+1

= ln

x

2

k x

1

2

k

+1

k

, it holds

. Now, since are in the open right half plane, it holds that

0

<

N

1 arg

x

2

k x

1

2

k

+1

k

=1

< π.

Hence, using Theorem 2.7,

N

1 ln

x

2

k

+ ln

x

1

2

k

+1

= ln

k

=1

k

=1

x

2

k x

1

2

k

+1

,

0

<

arg

'

1

x

2

k x

1

2

k

+1

k

=1

< π.

(3.25)

(3.26)

Returning to (3.22),

N k

=1 ln

x

2

k

+ ln

x

1

2

k

1

= ln

x

2

N

= ln

x

+ ln

x

2

N x

1

1

1

1

+

+ ln

N

1 ln

x

2

k

+ ln

x

1

2

k

+1

k

=1

'

1

x

2

k x

1

2

k

+1

= ln

k

=1

k

=1

x

2

k x

1

2

k

1

(3.27) since

π <

arg

x

2

N x

1

1

+ arg

,

N

1

k

=1

A , and therefore it also holds that

x

2

k x

2

k

+1

< π

. This holds for all eigenvalues of

S

Ω

=

N k

=1

S

ω

2

k

S

ω

2

k

1

= Re

= Re

i

N

[ln (

A

i

π

π

ln

k

⎢⎣

⎢⎢⎢⎢

k

=1

(

A

iω iω

2

k

2

k

I

)

− ln (

I

) (

A

A

2

k

1

2

k

1

I

)

1

⎥⎥⎥⎥

⎥⎦

I

) ]

.

(3.28)

Theorem 3.1 tells us that, by using addition of two or more frequency-limited

Gramians corresponding to di ff erent frequency intervals, it is possible to construct a frequency-limited Gramian for a combined frequency interval, e.g., you can construct the frequency-limited controllability Gramian,

ω

∈ Ω

=

Ω

1

∪ Ω

2

, with

Ω

1

= [

ω

2

,

ω

1

]

[

ω

1

, ω

2

] and

Ω

2

P

Ω

= [

ω

4

, for the interval

,

ω

3

]

[

ω

3

, ω

4

]

30

3 Frequency-Limited

H

2

-Norm as

AP

Ω

+ P

Ω

A

T

+ S

Ω

BB

T

+ BB

T

S

T

Ω

= 0

,

(3.29) with S

Ω computed as in Corollary 3.1.

Remark 3.2.

It is also possible to use, with abuse of notation,

ω

in that case the ordinary controllability Gramian, P

=

∞ as the end frequency, can be used in combination with the frequency-limited Gramians.

3.1.2

Discrete Time

The equations for the discrete-time frequency-limited Gramians are similar to the ones in the continuous-time case. However, since the derivation in Horta et al.

[1993] is not as straightforward and yields an erroneous result, we will present our derivation in this section.

Given an asymptotically stable system stable means having A

G

=

A

C

B

D

.

G

being asymptotically

Schur. For this system the frequency-limited controllability and observability Gramians can be defined.

Definition 3.2.

The frequency-limited controllability and observability Gramians for the system

G

=

A

C

B

D

, are defined as with

ω < π

and H e

P

ω

Q

ω

=

I e

ω

1

2

π

ω

ω

1

H

H

2

π

ω

e

e

BB

T

H

C

T

CH

A

1

.

e

e

d

ν,

d

ν,

(3.30)

(3.31)

Inspired by the continuous-time case, the frequency-limited Gramians in discrete-time can be written as solutions to two discrete-time Lyapunov equations.

Theorem 3.4.

Given a discrete-time system G

=

A

C

B

D

, where holds that

P

ω

S

ω

P + PS

T

ω

, where more,

APA

P

ω

T

P + BB

T

= 0

and

S

ω

=

1

4

π

ω

ω can be computed as a solution to

AP

ω

A

T

P

ω

+ S

ω

BB

T

I − e

A

+ BB

T

S

T

ω

1

= 0

.

I

+ A e

A d

is Schur, it

ν

(3.32)

. Further-

(3.33)

To prove Theorem 3.4, a lemma is first presented.

3.1

Frequency-Limited Gramians

Lemma 3.2.

For the ordinary Gramians

P

and

Q

, in

(2.14)

, it holds that

31

H e

BB

T

H

∗ e

H

∗ e

C

T

CH e

=e

PH

∗ e

=e

QH e

+ e

H e

P

P

,

+ e

H

∗ e

Q

Q

.

(3.34a)

(3.34b)

Proof: Using the definition of tions yields

H e

= e

I −

A

1

. Straightforward calculae

PH

∗ e

+ e

H e

P

P

= e

= e

H

1 e

I −

A e

P

P + e

+ e

P

PH e

−∗

I − e

A

H

1

− e

I e

A

PH

−∗

P e

e

I −

A

=

APA

T

P = BB

T

,

(3.35) which can be written as (3.34a) by multiplying with H and right, respectively. Similarly, it holds that e

and H

∗ e

from left e

QH e

+ e

H

∗ e

Q

Q

= e

= e

H

−∗

e

I − e

A

Q + e

Q + e

Q

QH

1 e

I e

A

=

A

T

QA

Q

H

−∗ e

e

I −

A

QH

1

Q e

e

I −

A

= C

T

C

(3.36) which can be written as (3.34b) by multiplying with H

∗ and right, respectively.

e

and H e

from left

Proof of Theorem 3.4: can be written as

Using the definition of P

ω

in (3.30) and Lemma 3.2, P

ω

32

3 Frequency-Limited

H

2

-Norm

=

P

ω

1

ω

=

ω

4

π

ω

1

H e

BB

T

H

∗ e

d

ν

2

π

ω

=

1

ω

2

π

ω

I − e

A

e

H

1

I

+ e

A e

I

2

.

!

= d d

ν

ν

1

P

P

ω

2

π

ω

e

ω

1

+ P

2

π

ω

ω

1

+ P

4

π

ω

PH

e

I

e

H

e e

+ e

A

1

H

I

2

.

I e

d

+

ν

A

P e

P

!

= S

ω

P + PS

ω

.

d

ν

d

ν

(3.37)

Hence, it holds that P

ω

= S

ω

P + PS

S

ω

=

, with

1

4

π

ω

ω

I − e

A

1

I

+ A e

!

d

ν.

Before showing that (3.33) holds, observe that

AS

ω

= A

⎜⎜⎜⎝

⎜⎜⎜⎜

4

1

π

ω

ω

I − e

A

1

I

+ A e

!

d

ν

⎟⎟⎟⎠

=

⎜⎜⎜⎝

⎜⎜⎜⎜

1

4

π

ω

ω

I − e

A

1

I

+ A e

!

d

ν

⎟⎟⎟⎠

⎟⎟⎟⎟

A = S

ω

A

,

(3.38) i.e., the matrices that A and S

ω

A and commute,

S

ω

AP commute. Using that

ω

A

T

P

ω

P

ω

= can be written as

S

ω

P + PS

ω

and the fact

AP

ω

A

T

P

ω

= A ( S

S

ω

ω

P +

APA

T

PS

ω

P

) A

T

+

( S

ω

APA

T

P +

P

PS

ω

S

ω

)

=

S

ω

BB

T

+ BB

T

S

ω

.

Hence, (3.33) holds.

(3.39)

The same can be shown for the observability Gramian.

Theorem 3.5.

Given a discrete-time system G

=

A

C

B

D

holds that where

A

T

QA

Q + C

T

C = 0

Q

ω and

S

ω

S

T

ω

Q + QS

ω

,

=

1

4

π

ω

ω

I − e

A

1

, where

A

is Schur, it

I

+ A e

d

ν

(3.40)

. Fur-

3.1

Frequency-Limited Gramians

33

thermore,

Q

ω can be computed as a solution to

A

T

Q

ω

A

Q

ω

+ S

T

ω

C

T

B + C

T

CS

ω

= 0

.

(3.41)

Proof: The proof is analogous to the one for the controllability Gramian.

Theorem 3.6.

as

The matrix

S

ω

=

1

4

π

S

ω

=

1

2

π

Re

ω

ω

I − e

A

1

I

+ A e

ω

I −

2

i

ln

I −

A e

.

d

ν can be written

(3.42)

Proof: We have that

S

ω

=

ω

1

4

π

ω

I − e

A

1

I

+ A e

d

ν f

( A )

.

(3.43)

With

f

(

x

) =

1

ω

4

π

ω

I − e

iν x

1

I

+

x

e

d

ν,

Theorem 2.5 states that it is su ffi cient to calculate the function on the spectrum of A . Let

λ

be an eigenvalue to A and since A is Schur, it holds that

|

λ

|

<

1. Hence

ω

ω

=

1 +

λ

e

1

λ

e

d

ν

ν

2

i

ln

=

i

ln e

1

λ

e

ω

ω

2

i

ln

= 2

ω

1

2

λ

e

ω

ω i

ln 1

λ

e

i

ln 1

λ

e

(3.44) where ln ln

|

z

|

+

z i

arg denotes the principal branch of the complex logarithm, namely ln

z z

,

π <

arg

z

π

. Going back to the matrix equation entails

=

S (

ω

) =

=

2

π

ω

1

4

π

ω

1

ω

I

I

i

e

A ln

I −

1

A e

I

+

A e

− ln d

I

ν

A e

.

(3.45)

Since the principal branch of the logarithm is used, Theorem 2.3 is applicable.

For this case it means that given a matrix C

∈ C

n

×

n

it holds that ln C = ln C .

S

ω

becomes

S (

ω

) =

=

=

2

2

2

1

π

1

π

1

π

/

ω

ω

Re

I

I

i i

ln

ω

I − ln

2

i

I −

I − ln

A

A e e

I −

A e

+

ln

i

ln

.

I −

A

I − e

A e

! 0

34

3 Frequency-Limited

H

2

-Norm

Remark 3.3.

If

ω

=

π

, then S

ω

=

I

2

− matrix is a real matrix, it follows that

1

π

S

Re [

i

ln (

I

ω

coincides with the regular Gramians when

=

ω

=

I

2

+ A ) ], and since the logarithm of a real

. Thus, the frequency-limited Gramians

π

.

3.2

Frequency-Limited

H

2

-Norm

In this section, we will introduce a new frequency-limited

H

2

-norm that uses the frequency-limited Gramians defined in the previous section. This new measure can for example be used to compare di ff erent models on limited frequency intervals, instead of the whole frequency domain.

3.2.1

Continuous Time

As presented in Section 2.1.3, the

A

C

B

D

H

2

-norm of a continuous-time system

, which is asymptotically stable (

0 ), can be described by

A

G

is Hurwitz) and strictly proper ( D

=

=

||

G

|| 2

H

2

=

=

=

1

2

π

tr

−∞

1 tr

2

π

−∞

1 tr

2

π

−∞

G

(

)

G

(

)d

ν

CH

B

T

H

iν iν

BB

C

T

T

H

CH

iν iν

C

T d

ν

B d

ν

= tr CPC

T

= tr B

T

QB

.

(3.46a)

(3.46b)

(3.46c)

In this section, a new frequency-limited

H

2

-like norm, that uses the frequencylimited Gramians presented in the previous section, is defined and is denoted as

||

G

||

H

2

.

Definition 3.3.

For an asymptotically stable system

G

and 0

< ω <

, define

||

G

||

2

H

2

ω

1

2

π

tr

ω

G

(

)

G

(

)d

ν.

(3.47)

To be able to use the limited-frequency

H

2

-norm in practice, it has to be expressed in a more computationally friendly way.

3.2

Frequency-Limited

H

2

-Norm

35

Theorem 3.7.

For an asymptotically stable system G

=

, the limited-frequency

||

G

|| 2

H

2

H

2

= tr

-norm can be computed as

-

CP

ω

C

T

+ 2 tr CS

ω

B + D

ω

2

π

.

A

C

D

T

!

B

D

, or

||

G

||

2

H

2

= tr B

T

Q

ω

B + 2 tr

-

CS

ω

B + D

ω

2

π

.

D

T

!

, where

S

0

0

ω

= AP

ω

+

= A

T

Q

i

ω

= Re

π

P

ω

A

T

+ S

ω

BB

T

+

+ Q ln (

ω

A +

A

S

T

ω iω

I

)

C

T

!

C +

.

BB

T

S

T

ω

C

T

CS

,

ω

, and

0

< ω <

(3.48)

(3.49)

(3.50a)

(3.50b)

(3.50c)

Proof: Using Theorem 3.1 we can rewrite equation (3.47),

||

G

||

2

H

2

ω

=

1 tr

G

(

)

G

(

)d

ν

2

π

=

1

2

π

tr

ω

ω

CH

B + D B

T

H

C

T

+ D

T

ω

ω

1

= tr C

2

π

ω

+ tr

= tr

⎜⎜⎜⎝

⎜⎜⎜⎜

C

1

ω

2

π

ω

CP

ω

C

T

BB

T

H

ω

H

H

+ 2 tr

d

ν

C

T

+ tr

1

2

π

ω iν

d

CS

ν

ω

BD

B

T

+ D

+

ω

2

π

.

ω

DB

T

1

D

2

π

T

!

.

ω

H

∗ d

ν

DD

d

T

ν

d

ν

C

T

⎟⎟⎟⎠

⎟⎟⎟⎟

The same procedure can be used, using Theorem 3.2 and the fact that also can be written as

||

G

||

2

H

2

Theorem 3.3 shows how S

ω

||

G

||

2

H

2

=

1

2

π

tr

ω

ω

G

(

can be computed.

)

G

(

)d

ν

, to show equation (3.49).

Using Corollary 3.1 it is possible, also for the limited-frequency compute the

Ω

= [

ω

4

,

ω

H

2

3

]

[

ω

2

,

ω

1

]

[

ω

1

, ω

2

]

[

ω

3

, ω

4

], 0

< ω

1

< ω

2

< ω

3

H

2

-norm, to

-norm on arbitrary segments in the frequency domain,

< ω

||

4

G

.

|| 2

H

2

,

Ω

,

One important thing to note that di ff ers between the limited-frequency

H

2 and the ordinary i.e., include

ω

=

H

2

-norm

-norm, is that, if we do not include an infinite interval in

Ω

, as the end frequency, then the system does not have to be strictly proper. This means that it is possible, in this case, to have D 0 .

36

3 Frequency-Limited

H

2

-Norm

3.2.2

Discrete Time

In this section, the new frequency-limited

H

2

-like norm for discrete-time systems, that uses the frequency-limited Gramians presented in Section 3.1.2, is defined.

Definition 3.4.

π

, define

For an asymptotically stable discrete-time system

G

and 0

< ω <

||

G

|| 2

H

2

ω

1

2

π

tr

ω

G

(

)

G

(

)d

ν.

(3.51)

Analogous to the continuous-time case, (3.51) can be expressed in a more computationally friendly way.

Theorem 3.8.

For an asymptotically stable discrete-time system G

=

and

0

< ω < π , the limited-frequency

||

G

|| 2

H

2

= tr CP

ω

C

T

H

2

-norm can be computed as

+ 2 tr

-

CR

ω

B + D

2

ω

π

.

D

T

!

, or

||

G

||

2

H

2

= tr B

T

Q

ω

B + 2 tr

-

CR

ω

B +

ω

D

2

π

.

D

T

!

, where

S

R

0

0

ω

ω

= AP

ω

A

T

P

ω

+ S

ω

BB

T

+ BB

T

S

T

ω

,

=

=

=

A

T

Q

ω

A

Q

ω

1

2

π

1

π

Re

A

1

ω

I −

Re

i

2

i

+ S

T

ω

ln

I

C

T

C +

A e

C

T

CS

,

ω

ln

I −

A e

.

,

A

C

B

D

(3.52)

(3.53)

(3.54a)

(3.54b)

(3.54c)

(3.54d)

Proof: to

By using Theorem 3.4 and Theorem 3.5,

||

G

|| 2

H

2

can easily be rewritten

||

G

||

2

H

2

||

G

|| 2

H

2

-

= tr CP

ω

C

T

= tr B

T

Q

ω

B

+ 2 tr

+ 2 tr

-

CR

ω

B +

CR

ω

B +

D

ω

D

2

π

ω

2

π

.

.

D

T

!

!

,

D

T

,

(3.55a)

(3.55b) where

R

ω

ω

1

=

2

π

ω

H e

d

ν

=

ω

1

2

π

ω

e

I −

A

1 d

ν.

(3.56)

This integral can be computed and simplified similarly to what is shown in the

3.3

Concluding Remarks proof for Theorem 3.4, which leads to

R

ω

=

1

π

A

1

Re

i

ln

I −

A e

.

37

(3.57)

3.3

Concluding Remarks

In this chapter, the frequency-limited Gramians and their derivations have been presented. Computationally more e ffi cient expressions than those presented in the original papers (Gawronski and Juang [1990] and Horta et al. [1993]), were derived. A detailed derivation of the discrete-time frequency-limited Gramians was presented, using the same notation and framework as in the continuous-time case and correcting errors in the available literature. Additionally, the frequencylimited

H

2

-norm that uses these Gramians, both for continuous and discrete time, were presented. This frequency-limited

H

2

-norm will be used for frequencylimited model reduction in Chapter 4.

4

Model Reduction

This chapter starts by introducing the model-reduction problem in Section 4.1.

In Section 4.2, one of the most commonly used methods, balanced truncation

(including frequency weighted and frequency limited), will be presented. Then in Section 4.3 some existing methods that use an

H

2

-measure for model reduction are presented. Then the proposed methods for ordinary, robust, frequencyweighted and frequency-limited model reduction will be presented in Section 4.4.

The material in this chapter is based on an extended version of the results in Petersson and Löfberg [2012a].

4.1

Introduction

Direct numerical simulation of dynamical systems has been a successful strategy for studying complex physical phenomena. However, deriving su ffi ciently detailed mathematical models, e.g., for designing controllers or analyzing performance, can be extremely di ffi cult and can result in large and unnecessarily complicated models. This is the case particularly for systems pertaining to circuit simulations or dynamical systems coming from discretized partial di ff erential equations. These large-scale models can make it di ffi cult to analyze the system, due to memory-limitations, time-limitations, ill-conditioning or computationally expensive analysis methods. Hence, there is a need for smaller models that can describe large complex systems well. One way of creating these low-order models is through model reduction.

Given an lti model,

G

:

˙ (

t

) = y (

t

) =

Ax (

t

) + Bu (

t

)

,

Cx (

t

) + Du (

t

)

,

39

40

4 Model Reduction y e

+

+

G

u

Figure 4.1:

Model reduction

where A

∈ R

n

×

n

, B

∈ R

n

×

m

, C

∈ R

p

×

n

and D

∈ R

p

×

m

reduction problem is to find a reduced-order model

. For this model, the model-

:

˙ˆ

(

t

) = ˆ ˆ (

t

Bu (

t

)

,

ˆ (

t

) = ˆ ˆ (

t

Du (

t

)

,

∈ R

ˆ

×

n

∈ R

ˆ

×

m

∈ R

p

×

ˆ

∈ R

p

×

m

and ˆ , where this reducedorder model, ˆ , describes the original model, to quantify the discrepancy between

G G

G

, well in some metric. One way

, is through the di respective outputs. Particularly, given a certain input, ff erence in their u (

t

), the di ff erence in the output, e (

t

) = y (

t

)

ˆ (

t

), should be small in some norm, see Figure 4.1.

This can be written as an optimization problem minimize

G

,

tem, and

G

H

∞ denotes the size of the system, i.e., the number of states in the sysor

H

2 are two examples of norms that could be used. There are a number of methods that address this problem, for example using balanced truncation (see Section 4.2), e.g., Enns [1984], Moore [1981], Glover [1984], or using optimization, e.g., Flagg et al. [2010], Beattie and Gugercin [2007], Beattie and

Gugercin [2009], Antoulas [2005], Poussot-Vassal [2011], Helmersson [1994] and the material in Section 4.4.

In many applications one is mainly interested in a low-order model that describes the system well only in a certain frequency interval. This leads us to investigate frequency-weighted model reduction. For the frequency-weighted model reduction, weighting filters are utilized, and in order to also facilitate mimo

-systems an input-filter (

W i

) and an output-filter (

W o

) are needed. Example of such methods are, for example, Enns [1984], Diab et al. [2000], Halevi [1992], Sreeram and

Sahlan [2009], Zhou [1995]. Writing the frequency-weighted model-reduction problem as an optimization problem, results in minimize

W o

(

G

− ˆ

)

W i

,

In the frequency-weighted case, the weights have to be given by the user and are in practice often di ffi cult to choose. However, in many applications it is the case that a system should be approximated over a limited frequency interval, while the other frequencies are not important at all. In this case one would like to use

4.2

Balanced Truncation

41 an ideal band-pass filter, but approximating an ideal band-pass filter requires a large number of states in the weighting filters, and can lead to other problems.

To address this issue there are methods, that could be classified as a special class of frequency-weighted model-reduction methods, that will be called frequencylimited model reduction. This class of methods uses approaches that behave as though ideal band-pass filters have been used, e.g., Gawronski and Juang [1990],

Huang et al. [2001], Horta et al. [1993], Sahlan et al. [2012] and Poussot-Vassal and Vuillemin [2012], and we will introduce a new method using this strategy in

Section 4.4.3.

4.2

Balanced Truncation

One of the most commonly used model-reduction schemes is called balanced truncation, introduced in Moore [1981]. The physical interpretation of the balanced truncation is very simple, remove the states that induce a small amount of energy in the output and at the same time require a large amount of energy to excite. By understanding how the observability and controllability Gramians connect to these energies, see Section 2.1.2, one realizes that the system has to be expressed in a basis where the observability and controllability Gramians are equal and diagonal. Recall that the elements on the diagonal in the Gramians are the Hankel singular values of the system, see Section 2.1.2. This basis describes the states that can be classified as both di ffi cult to control and observe, these states that can be removed. These are the states that correspond to the small Hankel singular values. When a system is expressed in such a basis the system is called

balanced

controllability Gramian

U

QU = K

Σ

2

K

. Given a system with the observability Gramian

P , where P have the Cholesky factor U , P = UU

Q

∗ and

, and

, it can be shown that the transformation needed to balance the system can be written as

T =

Σ 1

/

2

K

U

1 and T

1

= UK

Σ

1

/

2

,

(4.1)

= Tx , see for example Antoulas [2005].

Theorem 4.1 (Balanced reduction, Theorem 7.9 in Antoulas [2005]).

balanced system G

=

A

C

B

D

Given a

, which is asymptotically stable, with the Gramians equal to

Σ

and given the partitioning

A =

A

A

11

21

A

A

12

22

∈ R

n

×

n

,

B =

B

B

1

2

∈ R

n

×

m

,

C = C

1

C

2

∈ R

p

×

n

,

Σ

=

Σ

0

1

0

Σ

2

.

(4.2)

Then

=

A

11

C

1

B

D

1

,

A

11

∈ R ˆ

×

n is a reduced-order system of order which is both stable and balanced. Additionally, it holds that

ˆ

,

G

n

H

2

i

= ˆ +1

σ i

,

(4.3)

42

4 Model Reduction

where σ i are the Hankel singular values of the system in descending order of magnitude.

Proof: See Theorem 7.9 in Antoulas [2005]

There are several variations of the balanced-truncation method, which allow us to perform model reduction in a more computationally robust and e ffi cient manner, e.g., Safonov and Chiang [1989], Safonov et al. [1990], Glover [1984]. Two properties that most of the balanced-truncation methods have in common (which make them very popular) are the preservation of stability and the

a priori

computable error bounds. Important to note is that a system resulting from a balanced truncation scheme is not a minimizer to a specific system norm optimization (for example

H

2 and

H

).

As mentioned in Section 4.1, one important class of balanced-truncation methods are the frequency-weighted balanced-truncation methods and they are described in the following way. Let

G

=

A

C

B

D

, be an asymptotically stable system to be reduced. Also assume that an input weighting,

W o

W i

(

s

), and an output weighting,

(

s

), are given. Define the weighted controllability and observability Gramians as

P

Q

i o

1

=

2

π

−∞

1

=

2

π

−∞

(

(

iω iω

I

I

A

A

)

)

1

−∗

B

W i

C

W o

(

)

W i

(

) B

(

)

W o

(

I −

A )

(

) C (

I −

A )

−∗

1 d d

ω,

ω.

(4.4a)

(4.4b) and compute the state transformation that simultaneously diagonalizes

Q

o

P

i

and

. Frequency-weighted balanced-truncation methods then utilize this transformation that diagonalizes P

i

and Q

o

, to do a balanced truncation. This approach to frequency-weighted balanced truncation was first introduced in Enns [1984].

If either

W i

=

I or

W o

=

I this method guarantee stability of the reduced model.

However, if both input and output weightings are used at the same time nothing can be guaranteed. Modifications of this method that guarantee stability when both input and output weights are used are discussed in, e.g., Lin and Chiu

[1992], Varga and Anderson [2001].

Another important class of model-reduction methods, which was mentioned in

Section 4.1, is frequency-limited balanced truncation. This was introduced by

Gawronski and Juang [1990] for continuous-time systems and Horta et al. [1993] for discrete-time systems. In these articles they use frequency-limited Gramians

(see Section 3.1) and simultaneously diagonalize these, to obtain a basis in which the truncation is done. The method in Gawronski and Juang [1990] can be seen as a special case of the method in Enns [1984] by choosing the weighting filters to be ideal bandpass filters (see Gugercin and Antoulas [2004]). However, the method

4.3

Overview of Model-Reduction Methods using the

H

2

-Norm

43 in Gawronski and Juang [1990] cannot guarantee stability. A modification to this method that guarantee stability, has been presented in Gugercin and Antoulas

[2004].

4.3

Overview of Model-Reduction Methods using the

H

2

-Norm

The problem of finding a reduced-order model that, in

H

2 sense, resembles the original model well has been a goal in many investigations. Especially since the work of Meier and Luenberger [1967], and especially Wilson [1970], in which they derive first-order optimality conditions for minimization of the

H

2

-norm, see also, for example, Lepschy et al. [1991], Beattie and Gugercin [2007], Fulcheri and Olivi [1998], Yan and Lam [1999] and references therein. One reason for this could be the fact that the

H

2 criterion provides a meaningful characterization of the error, both in deterministic and stochastic contexts. For example, given two discrete-time asymptotically-stable

y

(

Φ

u t

) and ˆ

(

ω

(

t

siso

) respectively, and a white-noise input

) = 1), then it holds that systems

G

and ˆ , with the outputs

u

(

t

) (i.e., the input spectrum is minimize E (

y

y

ˆ )

2

= minimize

= minimize

π

G

(e

)

ˆ

(e

)

2

Φ

u

(

ω

)d

ω

π

π

G

(e

)

ˆ

(e

)

2 d

ω

= minimize

G

π

2

H

2

.

(4.5)

Finding global minimizers for the

H

2 approximation problem is very di ffi cult, it is in fact a nonlinear non-convex optimization problem (see Example 4.1). The existing methods for

H

2 approximation have the more modest goal of finding local minimizers and can crudely be categorized into two categories; methods using tangential interpolation techniques or methods using gradient-flow techniques.

Example 4.1: Non-Convexity

To show that the cost function

V

=

ˆ

G

true

2

H

2 is non-convex, we start with the system

G

true

=

1

1

1

0

.

A system

=

a c b

0

,

that approximates the system

G

true

, is sought, where

a, b

and

c

are the decision

44

4 Model Reduction variables. Consider an initial guess in an optimization formulation to be the system

G

0

=

8

2

4

0

.

Now, given the system example (

δa, δb, δc

)

T

G

0

, pick a descent direction for the cost function

= (7

,

5

,

5)

T

, such that

V

(

t

), for

ˆ

(

t

) =

8 + 7

t

2 + 5

t

4 + 5

t

0

, t

[0

,

1]

,

then the value of the cost function,

V

direction is non-convex, see Figure 4.2.

(

t

) =

ˆ

(

t

)

G

true

2

H

2

, along the descent

V

(

t

) along the search direction

3

2

.

5

2

1

.

5

1

0

.

5

0

0 0

.

1 0

.

2 0

.

3 0

.

4 0

.

5

t

0

.

6 0

.

7 0

.

8 0

.

9 1

Figure 4.2:

The value of the cost function along the search direction described in Example 4.1. The function clearly demonstrates the presence of local minimas along the search direction.

The gradient-flow algorithms use the gradients of

G

H

2 with respect to the state-space matrices, derived in Wilson [1974] and let these evolve in time to find a local approximation of the given system, see for example Yan and Lam [1999],

Fulcheri and Olivi [1998] and Huang et al. [2001]. The di ff erent algorithms in this class use di ff erent techniques to assure that the reduced model is stable, to speed up the process and to guarantee convergence.

4.4

Model Reduction using an

H

2

-Measure

45

The interpolation-based

H

2 model-reduction techniques tries to find a model whose transfer function interpolates the transfer function of the full-order system (and its derivative) at selected interpolation points. These methods often use computationally e ff ective Krylov-based algorithms which makes these techniques suitable for large-scale problems. Examples of these algorithms are Xu and Zeng [2011], Beattie and Gugercin [2007] and Poussot-Vassal [2011].

4.4

Model Reduction using an

H

2

-Measure

In this section, the proposed methods for model reduction are presented. We consider the following description for the model-reduction problem. Given a system

G

, search for the system ˆ such that

= arg min

W o

G

W i

2

H

2

.

(4.6)

It is assumed that the systems

G

G

=

A

C

B

D where

A

R

n

R

×

n

ˆ

×

n

,

,

B

R

n

×

m

R

ˆ

×

m

,

,

,

have the state-space realizations

C

=

R

p

R

p

×

n

,

×

ˆ

,

D

,

R

p

×

m

R

p

×

m

,

.

(4.7)

(4.8)

Since the

G

H

2

-norm is used, it is also assumed that the system that is to be reduced,

, is asymptotically stable. Since, otherwise, the

H

2

-norm is not defined.

The idea with the proposed methods is to try an approach that tries to tackle the model-reduction problem head on. In Helmersson [1994] the model reduction problem (in

H

-norm) is rewritten as an sdp problem with bmi s, which, even for small models, leads to large optimization problems that are hard to solve. In Ani ć et al. [2013] they rewrite the model-reduction problem to an interpolation problem which makes it hard to incorporate structure in the system matrices. The proposed technique to solve the model-reduction problem is instead to use a nonlinear optimization approach and simply use a quasi-Newton algorithm. Using this technique, the problem is not rewritten in any other format, which makes it possible to both use and incorporate structure in the system matrices. Additionally, by taking caution when di ff erentiating the di ff erent cost functions, and using the structure, the computational complexity can be kept low (in general an overhead cost of

O

(

n

3

) and

O

(

n

2

+

n

ˆ

2

) per iteration).

4.4.1

Standard Model Reduction

The method presented in this section was proposed already in Wilson [1970] for continuous time, however as a special case. The derivation in this section will include weighting filters and also the discrete-time case. In this thesis, a di ff erent derivation will be used, compared to Wilson [1970], with focus on being

46

4 Model Reduction computationally e ffi cient and also laying a foundation for the methods to come in the following sections.

The objective is to minimize the error between the given model,

G

, and the sought reduced-order model, ˆ , in the

H

2

-norm with weighting filters,

W i

and

W o

, i.e.,

= arg min

||

E

|| 2

H

2

, E

=

W o

G

W i

,

(4.9) where it is assumed that

W i

and

W o

are given by the user and have the realizations

W i

=

A

i

C

i

B

i

D

i

, W o

=

A

o

C

o

B

o

D

o

,

(4.10) where

A

i

A

o

R

n

R

n o i

×

n i

×

n o

,

,

B

i

B

o

R

n

R

n i o

×

m

×

m

,

,

C

i

C

o

R

p

×

n

R

p

×

n i o

,

,

D

D

o i

R

R

p

×

m p

×

m

,

.

(4.11)

Using the realizations of

E

=

A

C

E

E

B

D

E

E

=

⎢⎢⎢⎢

⎢⎢⎢⎢

⎢⎢⎢⎢

G,

⎜⎜⎜⎜

⎜⎜⎜⎜

G, W

A

⎜⎜⎜⎝

B

0

0

o

C

i

D

o

C and

0

0

B

o

D

o

W o

,

E

BC

i

BC

A

0

i i

can be realized as

0

0

0

A

o

⎟⎟⎟⎠

⎟⎟⎟⎟

⎟⎟⎟⎟

⎜⎜⎝

⎜⎜⎜⎜

⎜⎜⎜⎜ BD

ˆ

B

0

i i i

0 C

o

D

o

D

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

D

i

⎥⎥⎥⎥

⎥⎥⎥⎥

⎥⎥⎥⎥

.

(4.12)

To be able to use the structure in the realization of

E

, a partitioning of the Gramians, P

E

P

E

and

Q

E

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎜⎝

P

P

P

P

T

12

T

13

T

14

, is introduced

P

P

P

12

T

23

T

24

P

P

P

P

13

23

T

i

34

P

14

P

24

P

34

P

o

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎟⎠

,

Q

E

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎜⎝

Q

Q

Q

Q

T

12

T

13

T

14

Q

Q

Q

12

T

23

T

24

Q

Q

Q

Q

13

23

T

i

34

Q

Q

Q

Q

14

24

34

o

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎟⎠

.

(4.13)

Since there will be some di ff erences between the continuous and the discretetime cases, both cases will be presented. However, due to many similarities between the two, the continuous-time case will be presented in more detail than the discrete-time case.

Continuous Time

In the continuous-time case, it is assumed that the system is strictly proper, otherwise the

H

2

-norm will be unbounded, i.e., D

o

D

D

i

= 0 . Assuming this, the cost function in (4.9) can be written as, see Section 2.1.3,

||

E

||

2

H

2

= tr B

T

E

Q

E

= tr C

E

B

E

P

E

C

T

E

,

(4.14a)

(4.14b) which are two equivalent ways of computing the cost function, where

Q

E

P

E

and are the controllability and observability Gramians respectively, for the error

4.4

Model Reduction using an

H

2

-Measure

47 system

E

, satisfying the equations

A

E

A

T

E

P

Q

E

E

+

+

P

E

Q

E

A

T

E

A

E

+

+

B

E

B

T

E

C

T

E

C

E

= 0

,

= 0

.

(4.15a)

(4.15b)

Using (4.14) and (4.15) it is possible to state the general necessary conditions for optimality, in which the gradients of the problem readily can be extracted to be used in a quasi-Newton algorithm. In order to be as general as possible, we first neglect the structure in (4.12).

Theorem 4.2 (Necessary conditions for optimality).

W o are asymptotically stable and that

E

Assume that is strictly proper, for the

H

2

defined, i.e.,

A

,

ˆ

,

A

i and

A

o are Hurwitz and

D

o

D

D

i

= 0

G, G, W i and

-norm to be

. In order for the matrices

ˆ

,

ˆ

, to be optimal for the problem (4.9), it is necessary that they satisfy the equations in (4.15) and that

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

E

E

=

2

T

T

Q

E

B

Q

P

E

E

T

o

E

T

o

P

E

Q

E

i

E

=

P

0

C

i

T

E

,

+

+ Q

E

B

E

D

i

D

T

o

C

E

P

E

= 0

,

= 0

,

(4.16a)

(4.16b)

(4.16c)

where

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

I

0

0

n n n

×

ˆ

ˆ

×

n i o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

i

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝ I

0

0

n

×

ˆ

0

n

ˆ

×

n i n o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

o

=

⎜⎜⎜⎜

⎜⎜⎜⎜

0

0

⎜⎜⎝

0

n

×

ˆ

ˆ

×

n

I

n n i o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

.

(4.17)

Before proving the theorem above, two lemmas are needed to simplify the proof.

Lemma 4.1.

If

M

and

N

satisfy the Sylvester equations

AM + MB + C = 0

,

NA + BN + D = 0

, then

tr CN = tr DM

.

Proof of Lemma 4.1:

N

Multiplying the first Sylvester equation from the left with and the second from the right with M , entails

NAM + NMB + NC = 0

,

NAM + BNM + DM = 0

.

Now taking the trace of both equations yields

− tr ( NAM + NMB ) = tr CN

,

− tr ( NAM + NMB ) = tr DM

.

Hence, it holds that tr CN = tr DM .

48

4 Model Reduction

Lemma 4.2.

that

tr

If

A

A

B

∂a ij

C

∈ R

n

×

p

,

B

= B

T

C

T

ij

R

m

×

n and

C

i, j

R

p

×

m and a ij or equivalently

A

= [ A ]

ij

, then it holds

(tr

BAC

) =

B

T

C

T

.

Proof of Lemma 4.2: First note that

A

∂a ij

= e

i

e

T

j

, which is a matrix with a one in element (

i, j

) and zeros elsewhere. Now, it holds that tr B

A

∂a ij

C = tr Be

i

e

T

j

C = tr e

T

j

CBe

i

= e

T

j

CBe

i

= [ CB ]

ji

= B

T

C

T

ij

.

Now, continuing with the proof for Theorem 4.2.

Proof of Theorem 4.2: If A

,

ˆ

,

A

i

and A

o

are Hurwitz, then all the equations in

(4.15) are uniquely solvable. The solutions to the equations in (4.15) are needed to compute the cost function and its gradient. Now, the gradient of the cost function with respect to ˆ ment (

i, j

) in ˆ

,

,

ˆ

,

have to be computed. Let

a ij

, b ij

and

c ij

denote elerespectively, now di ff erentiating (4.14) with respect to

a ij

, b ij

and

c ij

entails

||

E

|| 2

H

2

∂a ij

||

E

||

2

H

2

∂b ij

||

E

||

2

H

2

∂c ij

= tr

Q

E

∂a ij

B

E

B

T

E

,

= tr

= tr

2

B

T

E

∂b ij

Q

E

B

E

+

2

C

T

E

∂c ij

C

E

P

E

+

Q

∂b

E ij

B

E

B

P

E

∂c ij

C

T

E

C

E

T

E

.

,

(4.18a)

(4.18b)

(4.18c)

Di ff erentiate (4.15) with respect to

a ij

, b ij

and

c ij

,

A

T

E

A

T

E

A

∂a

E

Q

Q

∂b

E ij

E ij

P

E

∂c ij

+

+

+

∂a

Q

Q

∂b

E ij

E ij

P

E

∂c ij

A

A

E

E

A

T

E

+

+

+

A

T

E

∂a ij

A

T

E

∂b ij

Q

E

+ Q

E

A

∂a

E ij

A

E

∂c ij

Q

E

P

E

+

+

Q

E

A

E

,

P

E

∂b ij

A

T

E

∂c ij

.

,

(4.19a)

(4.19b)

(4.19c)

4.4

Model Reduction using an

H

2

-Measure

49

Using Lemma 4.1 with (4.18) and (4.19) yields

||

E

||

2

H

2

∂a ij

||

E

|| 2

H

2

∂b ij

||

E

||

2

H

2

∂c ij

=2 tr

A

T

E

∂a ij

Q

E

P

E

,

=2 tr

=2 tr

A

T

E

∂b ij

Q

E

P

E

+

A

E

∂c ij

Q

E

P

E

+

B

T

E

∂b ij

Q

E

B

E

C

T

E

∂c ij

C

E

P

E

.

,

(4.20a)

(4.20b)

(4.20c)

Using the structure in the realization of

E

, (4.12), and Lemma 4.2, entails

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

E

E

=

2

T

T

Q

E

B

Q

P

E

E

T

o

E

T

o

P

E

Q

E

i

E

=

P

0

C

i

T

E

,

+

+ Q

E

B

E

D

i

D

T

o

C

E

P

E

=

=

0

,

0

,

(4.21a)

(4.21b)

(4.21c) where

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

0

I

n

×

ˆ

ˆ

×

n n i

0

n o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

i

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

0

n

×

n

0

I

n

ˆ

×

n i

×

n i i n o i

×

n i

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

o

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

I

0

0

n n n

×

n o

ˆ

×

n o o i

×

n

×

n o o

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

.

(4.22)

At a first glance, it can seem restrictive to have a technique that operates on system matrices, since one is given a model in a specific realization. Does this influence the realization of the resulting model or in other ways restrict the sought model? As can be seen in Theorem 4.3 below, this is not the case since the optimization problem becomes invariant to the realization of the given model to be reduced.

Theorem 4.3.

The cost function in the optimization problem (4.6) and its gradient, given in Theorem 4.2, are invariant under state transformations of the systems

G

,

W i and

W o

.

Proof: Given the realizations of

G

,

W i

and in (4.7) and (4.10). The realizations of the transformed systems, given the transformations matrices T , T

i

and

T

o

, become

W o

G

= =

T

1

AT

CT

T

1

B

D

,

50

4 Model Reduction

W i

=

W o

=

C

i i o

C

o

B

o

D

o i i

=

=

T

i

1

C

i

A

T

i i

T

i

T

1

A

o

T

o

C

o

T

o

T

1

i

D

i

B

i

T

1

B

o

D

o

,

.

This can be written as

E

=

A

E

C

E

B

E

D

E

=

T

1

E

A

C

E

E

T

E

T

E

T

1

E

B

D

E

E

,

T

E

=

⎜⎜⎜⎜

⎜⎜⎜⎜ T

⎜⎜⎝

0

0

0

0

I

0

0

0

0

T

i

0

0

0

0

T

o

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

.

The matrices P

E

and Q

E

will be transformed as

(4.23)

P

E

= T

E E

T

T

E

,

Q

E

= T

− T

E

Q

E

T

1

E

.

(4.24)

Now it is easy to see that the cost function (4.14) is invariant under the transformation T

E

, since

||

E

|| 2

H

2

= tr B

T

E

Q

E

B

E

T

E

T

T

E

T

− T

E

E

T

1

E

T

E E

T

E

E

B

E

.

(4.25)

E

T

T

− T

E

, T

T

E

E

i

and E

T

o

T

− T

E

are evaluated,

E

T

T

− T

E

E

T

,

T

T

E

E

i

= E

i

T

i

T

,

E

T

o

T

− T

E

= T

− T

o

E

T

o

.

Using (4.26) when computing the gradient entails,

(4.26)

||

E

||

2

H

2

ˆ

E

T

Q

E

P

E

E

T

T

− T

E

E

T

1

E

T

E E

T

T

E

E

T

Q

E E

ˆ

,

||

E

||

2

H

2

ˆ

E

T

Q

E

P

E

E

i

C

i

T

+ Q

E

B

E

D

i

E

T

T

− T

E

Q

E

T

1

E

T

E E

T

T

E

E

i

T

i

− T

i

T

+ Q

E

T

1

E

T

E

B

E

D

i

E

T

E

P

E

E

i i

T

Q

E

B

E i

,

4.4

Model Reduction using an

H

2

-Measure

||

E

||

2

H

2

ˆ

=

2 B

=

2

T

o

E

T

o

Q

E

T

o

P

E

T

T

o

E

T

o

+ D

T

o

C

E

P

E

T

− T

E

Q

E

T

1

E

T

E E

=

T

o

C

E

T

1

E

T

E

2

T

o

E

T

o

E

E

T

T

E

E

T

o

C

E

P

E

ˆ

.

51

Looking at the special case when not having any weighting filters, i.e.,

W o

=

I

,

n i

=

n o

= 0, yields the cost function

W i

=

I and

||

E

|| 2

H

2

||

E

||

2

H

2

= tr

= tr

B

T

QB + 2 B

T

Q

12

ˆ

+ ˆ

T

CPC

T

2 CP

12

C

T

C

ˆ

C

T

,

,

and the first-order conditions for the gradient simplify to

(4.27a)

(4.27b)

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

= 2

= 2

= 2

Q

ˆ

+ Q

T

12

P

12

Q

ˆ

+ Q

T

12

B

C

ˆ −

CP

12

=

=

= 0

,

0

.

0

,

P

,

Q

,

ˆ

,

ˆ

,

P

12 and Q

12 satisfy the equations

(4.28a)

(4.28b)

(4.28c)

AP + PA

T

+ BB

T

= 0

,

AP

12

+ P

12

A

T

A

ˆ

P A

T

+ B

ˆ

T

B

ˆ

T

= 0

= 0

,

A

T

Q + QA + C

T

C = 0

,

,

A

T

Q

12

A

T

+ Q

12

ˆ −

C

T

ˆ

+ ˆ

T

=

=

0

0

.

,

(4.29a)

(4.29b)

(4.29c)

(4.29d)

(4.29e)

(4.29f)

Note that P and Q satisfy the Lyapunov equations for the controllability and observability Gramians for the given system,

G

, and ˆ satisfy the Lyapunov equations for the controllability and observability Gramians for the sought system, ˆ .

52

4 Model Reduction

For this special case it is also quite straightforward to derive the Hessian for the cost function. Using di ff erentiated (with respect to

a ij

, b ij

, c ij

equations in (4.29) and using Lemma 4.1 and Lemma 4.2, yields

) versions of the

2

V

∂a ij

∂a kl

=2

2

V

∂b ij

∂b kl

2

V

∂c ij

∂c kl

=

2

=

⎪⎪⎨

2

2

∂a ij

V

∂b kl

2

V

∂c ij

∂a kl

2

V

∂c ij

∂b kl

=2

=2

=2

ˆ

∂a ij

+

kl

0

, ik

, l l

=

j j

,

ˆ

∂a kl ij

+

0

, lj

, i i

=

k k

,

ˆ

∂b kl

ˆ

∂a kl

ˆ

∂b kl ij ij

+ 2 Q

T

12

P

12

∂b ij

2

ij

2

C

P

12

∂a kl

C

P

12

∂b kl ij

, ij

.

kl

,

Q

T

12

P

12

∂a ij kl

+ Q

T

12

P

12

∂a kl ij

,

(4.30a)

(4.30b)

(4.30c)

(4.30d)

(4.30e)

(4.30f)

The explicit equations for the cost function, the gradient and the Lyapunov equations for the case when having both input and output filters are included in Appendix 4.B.1.

Discrete Time

In the discrete-time case the cost function in (4.6) can be rewritten as, see Section 2.1.3,

||

E

|| 2

H

2

= tr B

T

E

Q

E

B

E

= tr C

E

P

E

C

T

E

+ D

T

E

D

E

+ D

E

D

T

E

,

(4.31a)

(4.31b) which are two equivalent ways of computing the cost function. The matrices and Q

E

P

E

are the controllability and observability Gramians respectively, for the error system

E

, and in this case they satisfy the discrete Lyapunov equations

A

E

P

E

A

T

E

A

T

E

Q

E

A

E

P

E

Q

E

+

+

B

E

B

T

E

C

T

E

C

E

= 0

,

= 0

.

(4.32a)

(4.32b)

Note that in the discrete-time case, the system

E

does not any longer have to be strictly proper, however it still has to be asymptotically stable for the

H

2

-norm to be defined.

Theorem 4.4 (Necessary conditions for optimality).

W o are asymptotically stable, for the are Schur. In order for the matrices

H

2

ˆ

,

-norm to be defined, i.e.,

ˆ

, and

Assume that

A

,

G,

ˆ

,

G, W i

A

i and and

A

o to be optimal for the problem

4.4

Model Reduction using an

H

2

-Measure

||

E

||

2

H

2

||

E

|| 2

H

2

= tr

= tr

B

T

QB

CPC

T

B

T

Q

T

12

T

12

B

C

T

B

T

Q

ˆ

+ D

T

D

D

T

D

C

ˆ

C

T

+ DD

T

2 D D

T and the first-order conditions for the gradient simplify to

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

= 2

= 2

= 2

= 2

Q

ˆ

ˆ −

D

+ Q

Q

ˆ

+ Q

T

12

B

C

ˆ −

CP

12

=

T

12

0

,

AP

12

=

= 0

0

,

,

= 0

,

D

T

D

T

,

,

53

(4.9), it is necessary that they satisfy the equations in (4.32) and that

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

=

E

E

2

T

T

Q

E

Q

= 2 D

T

o

D

o

A

E

P

E

E

A

B

T

o

E

T

o

E

P

Q

E

E

D

=

E

i

A

E

0

,

C

T

i

P

E

D

i

+

+ Q

E

B

E

D

i

D

T

i

D

T

o

=

C

E

0

,

P

E

=

=

0

,

0

, where

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

I

0

0

n n n

×

ˆ

ˆ

×

n i o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

i

=

⎜⎜⎜⎜

⎜⎜⎜⎜

0

0

I

n

×

ˆ

ˆ

×

n

⎜⎜⎝

0

n n o i

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

o

=

⎜⎜⎜⎜

⎜⎜⎜⎜

0

0

⎜⎜⎝

0

n

×

ˆ

ˆ

×

n

I

n n i o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

.

(4.33a)

(4.33b)

(4.33c)

(4.33d)

(4.34)

Proof: The proof is analogous to the proof of Theorem 4.2 for the continuous time case.

Theorem 4.5.

The cost function of the optimization problem (4.6) and its gradient, given in Theorem 4.4, are invariant under state transformations of the systems G , W i and W o

.

Proof: The proof is analogous with the proof for Theorem 4.3.

Now looking at the special case when not having any weighting filters, i.e., and

W o

=

I

,

n i

=

n o

= 0, yields the cost function

W i

=

I

(4.35a)

(4.35b)

(4.36a)

(4.36b)

(4.36c)

(4.36d)

54 where P

,

Q

,

ˆ

,

ˆ

,

P

12 and Q

12 satisfy the equations

4 Model Reduction

APA

T

P

AP

12

A

T

P

12

A

ˆ

A

T

A

T

QA

Q

+

+

BB

B

ˆ

B

ˆ

T

T

T

+ C

T

C

T

Q

T

12

A

T

ˆ

Q

T

12

C

T

C

C

T

= 0

,

= 0

,

= 0

,

= 0

,

=

=

0

,

0

.

(4.37a)

(4.37b)

(4.37c)

(4.37d)

(4.37e)

(4.37f)

Note that P and Q satisfy the Lyapunov equations for the controllability and observability Gramians for the given system,

G

, and ˆ satisfy the Lyapunov equations for the controllability and observability Gramians for the sought system, ˆ . For this special case, in discrete time, it is also quite straightforward to derive the Hessian for the cost function. Using di ff erentiated (with respect to

a ij

, b ij

, c ij

) versions of the equations in (4.37) and using Lemma 4.1 and Lemma

4.2, entails

∂a

∂ ij

2

V

∂a kl

=2 Q

T

12

A

P

12

∂a ij

+ 2

ˆ

Q

ˆ

∂a ij kl kl

+ 2

+ 2

Q

T

12

A

P

∂a

12

kl

ˆ

Q

ˆ

∂a kl ij

.

ij

+ 2

∂b

2

V

∂c ij

∂c kl

2

V

∂d ij

∂d kl

2

V

∂a ij

∂b kl

=

∂c

∂c

∂a

∂ ij

∂ ij

∂ ij

∂ ij

2

2

2

2

V

∂b

V

∂a

V

∂b

V

∂d kl kl kl kl

=

2

=

⎪⎪⎨

2

0

, ik

, l

0

,

2

, i

0

, lj

, i

=

i l k, j

=

=

=

k k j j

otherwise

l

,

,

,

=2

=2

=2

=

∂b

ˆ

Q

ˆ

∂b kl

ˆ

∂a kl ij

2

ij

ˆ

∂b kl

V

∂d kl ij

+ 2

2

2

Q

ij

=

2

V

∂c ij

∂d kl

T

12

C

P

12

∂a kl

C

P

12

∂b kl

A

= 0

P

.

∂b ij ij

,

,

12

kl ij

, ik lj

(4.38a)

(4.38b)

(4.38c)

(4.38d)

(4.38e)

(4.38f)

(4.38g)

(4.38h)

4.4

Model Reduction using an

H

2

-Measure

55

The explicit equations for the cost function, the gradient and the Lyapunov equations for the case when having both input and output filters are included in Appendix 4.B.2.

4.4.2

Robust Model Reduction

In the previous section, it has been tacitly assumed that the given data, (i.e., the state-space matrices) are exact. In a more realistic setting, the presence of errors (e.g., modeling, truncation or round-o ff

) in these data can be assumed. The question is how to cope with these errors and take them into account. This can for example be done using

robust optimization

. However, this is a very di ffi cult problem, see, e.g., Bertsimas et al. [2011] or Ben-Tal and Nemirovski [2002]. In this section, a di ff erent view of robust optimization is investigated, that is to use

regularization

as a proxy for robust optimization, which can be seen as a worstcase optimization approach.

Before presenting the equations for the regularized model-reduction problem, the idea is first presented by using a more general description to get an intuition for the idea. The idea is then exemplified using a least-squares ( ls

) problem and a quadratic programming ( qp

) problem.

Regularization can be used to make ill-posed problems well posed or to make a solution less sensitive when having small amount of data. Commonly used regularization methods are for example problems referred to, in the

1

-case as

1

- and

2

-regularization, for least-squares lasso and in the

2

-case Tikhonov regularization or ridge regression, see e.g., Hastie et al. [2001]. In these regularizations an extra term,

1

- or

V

rob

( x ), is added to the cost function,

2

-norm of the sought variables, i.e.,

V

original

( x ), to penalize the

V

reg

( x ) =

V

original

( x ) +

λV

rob

( x )

.

(4.39)

The regularization parameter, here denoted

λ

, is seen as a design parameter and is in most cases hard to tune (see for example Bauer and Lukas [2011]).

In many applications, there is no

a priori

knowledge about the variables, e.g., that they should be small (typically achieved by should be sparse (typically achieved using

2

1

-regularization) or that the solution

-regularization). Instead, one would like to make the solution less sensitive to uncertainties. As mentioned above, in this section, regularization will be used as a proxy for robust optimization. The idea is to penalize the first-order derivative (with respect to

data

) of the cost function to make it less sensitive to uncertainties in data. This can be interpreted as doing a first-order approximation of the general robust optimization problem minimize x max

|| Λ ||

2

λ

V

( x

,

ˆ )

,

y +

Λ

,

(4.40)

∈ R

m

is the given data, y

∈ sents the uncertainty in the data and

R

m

x

∈ is the unperturbed data,

R

n

Λ ∈ R

m

repreis the sought variable. To see how a regularization can be an approximation of the robust optimization problem, a

Taylor expansion of the cost function with respect to the data is made. Assuming that the cost function is di ff erentiable in the data variables, the cost function can

56

4 Model Reduction be expressed as

V

( x

,

ˆ

) =

V

( x

,

y ) + ( ˆ

− y )

T

V

( x

,

y ) +

O

(

||

ˆ

− y

|| 2

2

)

=

V

( x

,

y ) +

Λ

T

f

( x

,

y ) +

O

(

|| Λ || 2

)

.

(4.41)

Limiting the uncertainty to be bounded, i.e., mum of (4.41), yields

|| Λ ||

2

λ

, and computing the maximax

|| Λ ||

2

λ

V

( x

,

ˆ

) = max

|| Λ ||

2

λ

V

( x

,

y ) +

Λ

T

V

( x

,

y ) +

O

(

|| Λ || 2

2

)

=

V

( x

,

y ) +

λ

V

( x

,

y )

2

+

O

(

λ

2

)

.

(4.42)

To make this more clear, some examples are presented for an ls qp problem.

problem and a

Example 4.2: Robust ls and qp

Let us start with one of the most common problems, an ls problem. Assume that the

data

A and b are given and a solution x , fulfilling x arg min x

V

( x

,

A

,

b ) = arg min x

( Ax

− b )

T

( Ax

− b )

,

(4.43) is sought. To see how, for example, the A -matrix influence the cost function, the cost function is di ff erentiated with respect to A , i.e.

∂V

( x

,

A

,

b )

= 2 tr

∂a ij

e

j

e

i

T

[ Ax

− b ] x

T

,

(4.44) where

a ij

is the (

i, j

) element in A . This yields

∂V

( x

,

A

,

b )

A

= 2 ( Ax

− b ) x

T

.

(4.45)

Hence,

∂V

( x

,

A

A

,

b )

2

= 2

||

Ax

− b

||

2

|| x

||

2

.

An interesting fact about the term in (4.46) is that it can be rewritten as

2

||

Ax

− b

||

2

|| x

||

2

= 2

||

Ax

|| x

||

2 b

||

2

|| x

||

2

2

=

μ

( x )

|| x

||

2

2

,

(4.46) where

μ

( x ) resembles Miller’s choice of regularization parameter (see El Ghaoui and Lebret [1997] or Miller [1970]). In Miller [1970] the regularization parameter

μ

( x ) is determined iteratively.

It is also possible to di ff erentiate with respect to b in the ls problem. This term, together with the terms coming from di ff erentiating with respect to H and f in a qp problem

V

( x ; H

,

f ) = x

T

Hx + f

T x

,

(4.47) are collected in Table 4.1.

4.4

Model Reduction using an

H

2

-Measure

Table 4.1:

The di ff erent regularization terms for the di ff erent variables in the special cases, ls problem and qp problem

Problem Variable Uncertainty in ls ls qp qp qp

H (not sym.)

H

A b

(sym.) f

||

||

||

|| Λ ||

||

Λ

Λ

Λ

Λ

||

||

||

||

F

2

F

F

2

λ

λ

λ

λ

λ

Reg. term

λ

|| x

||

2

λ

||

||

Ax x

||

2

2

− b

||

2

=

λ

||

Ax

||

1

1

2

λ

|| x

||

2

2

2

1

3

4 tr( xx

T

|| x

|| 4

2

λ

|| x

||

2

λ

||

Ax

− b

||

2

|| x

||

2 xx

T

)

|| x

||

2

2

57

Now, the regularization strategy explained above will be used as an extension to the special case of the model-reduction method in Section 4.4.1, having no weighting filters. To reduce the influence of errors in data, the unregularized cost function (4.27) is regularized by adding three new terms. These are the Frobenius norms of the derivatives of the cost function with respect to

the given data

, A , B , C and in

D , i.e., the solution obtained is inclined to be less sensitive to uncertainties

the data

.

The optimization problem with these new terms becomes

ˆ

,

min

ˆ

,

ˆ

,

||

E

||

2

H

2

+

V

rob

, E

=

G

(4.48) where

V

rob

=

A

||

E

|| 2

H

2

A

+

B

||

E

|| 2

H

2

B

+

C

||

E

|| 2

H

2

C

+

D

||

E

|| 2

H

2

D

.

(4.49)

F F F F

Note that here the term and

D

.

V

rob includes the regularization parameters,

A

,

B

,

C

V

rob becomes di ff erent in the continuous-time case and the discrete-time case. By exploiting the symmetry in (4.27), (4.28) and (4.29) with respect to ( ˆ

,

B

,

ˆ

) and

( A

,

B

,

C ) we obtain, in continuous time that

V

rob is

V

rob

= 2

A

QP + Q

12

P

T

12

F

+

B

QB + Q

12

F

+

C

CP

T

12

F

,

(4.50) and in the discrete-time cases it becomes

V

rob

= 2

A

QAP + Q

12

T

12

F

+

B

+

C

QB + Q

12

CP

T

12

F

F

+

D

D

F

.

(4.51)

By di ff erentiating the cost function (4.48) it is possible to state the necessary conditions for optimality, both for the continuous-time case and the discrete-time case.

58

4 Model Reduction

Theorem 4.6 (Necessary conditions for optimality in continuous time).

sume that

H

2

G and G are asymptotically stable and that

-norm to be defined. In order for the matrices

ˆ

,

E

B

is strictly proper, for the and

Asto be optimal for

(4.48), in continuous time, it is necessary that they satisfy the equations in

(4.29)

and the equations

T

W

1

+

AW

2

+

W

1

A

W

2

ˆ

T

+ Q

+

T

12

QP + Q

12

P

T

12

QP + Q

12

P

T

12

P

12

AW

3

+ W

3

A

T

+ QB + Q

12

= 0

,

= 0

,

B

T

= 0

,

A

T

W

4

+ W

4

A C

T T

12

CP = 0

,

(4.52a)

(4.52b)

(4.52c)

(4.52d)

and that

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

+

+

∂V rob

ˆ

∂V

∂ rob

ˆ

+

∂V rob

ˆ

= 0

,

= 0

,

= 0

.

(4.53a)

(4.53b)

(4.53c)

With

∂V rob

ˆ

∂V rob

ˆ

∂V rob

ˆ

=4

A

=4

A

W

1

P

12

+

||

E

||

A

H

2

Q

T

12

W

2

F

W

1

B

||

E

||

A

2

H

2

F

+ 4

+ 4

B

Q

T

12

B

QB

Q

T

12

||

E

||

B

2

H

2

W

3

||

E

||

B

H

2

+ Q

12

F

F

=

4

A

CW

2

||

E

||

A

H

2

F

4

B

CW

3

||

E

||

B

H

2

F

+ 4

C

W

4

P

12

||

E

||

C

H

2

F

,

4

C

+ 4

C

W

4

B

||

E

||

C

2

H

2

,

F

CP

||

E

||

C

H

2

T

12

F

P

12

, and

||

E

||

H

2

ˆ

,

||

E

||

H

2

ˆ

and

||

E

||

H

2

ˆ

as in

(4.28)

.

Proof: If

G

are asymptotically stable, the equations in (4.29) and (4.52) are uniquely solvable. The solutions to the equations in (4.29) and (4.52) are needed to compute the cost function and its gradient. Now the gradient of the cost function with respect to ˆ , ˆ and ˆ has to be computed. The first part of the gradient

||

E

||

H

2

ˆ

,

||

E

||

H

2

ˆ and

||

E

||

H

2

ˆ has been computed in Theorem 4.2 and can be found in (4.28). Only the equations for the gradient of the

V

rob

-part is left to be calculated, since this part enters as an additive term in the cost function. The calculations of this part of the gradient are moved to Appendix 4.A.

4.4

Model Reduction using an

H

2

-Measure

59

An analogous result can be stated in discrete time.

Theorem 4.7 (Necessary conditions for optimality in discrete time).

that

G and

G are asymptotically stable, for the for the matrices

A

,

ˆ

, and

H

2

Assume

-norm to be defined. In order to be optimal for (4.48), in discrete time, it is necessary that they satisfy the equations in

(4.37)

and the equations

A

T

W

1

A

AW

2

A

T

W

W

1

2

+

AW

3

A

T

T

Q

T

12

QAP +

QAP + Q

12

Q

12

T

12

T

12

P

12

A

T

W

3

+ QB + Q

12

B

T

= 0

,

= 0

,

= 0

,

A

T

W

4

A

Q

T

12

W

4

Q

12

C

T T

12

CP

T

12

+ QAP P

12

= 0

,

= W

5

,

(4.55a)

(4.55b)

(4.55c)

(4.55d)

(4.55e)

and that

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

+

∂V rob

ˆ

+

+

∂V rob

ˆ

∂V rob

ˆ

=

=

=

0

0

0

.

,

,

(4.56a)

(4.56b)

(4.56c)

With and

∂V

∂V

∂V

∂V

∂ rob

ˆ

rob

ˆ

rob

ˆ

rob

ˆ

=4

A

=4

A

W

5

+

W

1

B

||

E

||

A

H

2

W

1

F

AP

12

+ Q

T

12

AW

2

||

E

||

H

2

A

+ 4

B

F

Q

T

12

QB

+ 4

+

||

E

||

B

H

2

Q

12

F

B

=

4

A

CW

2

||

E

||

A

H

2

F

4

B

CW

3

||

E

||

B

H

2

F

Q

T

12

AW

3

||

E

||

H

2

B

F

+ 4

C

W

4

4

C

+ 4

C

W

4

B

||

E

||

C

H

2

,

F

CP

||

E

||

C

H

2

T

12

F

P

12

,

AP

12

||

E

||

H

2

C

F

,

=4

D

D

||

E

||

D

2

H

2

F

,

||

E

||

2

H

2

ˆ

,

||

E

||

2

H

2

ˆ

and

||

E

||

2

H

2

ˆ

as in

(4.36)

.

Proof: The proof is analogous with the one for Theorem 4.6.

60

4 Model Reduction

4.4.3

Frequency-Limited Model Reduction

The method proposed in this section is a new method that was introduced in

Petersson and Löfberg [2012a]. The method relies heavily on the theory in Chapter 3. The variants of this method for continuous and discrete time are similar and, therefore, the continuous-time case will be presented in full detail and we will not provide as much detail for the discrete-time case.

The method proposed in this section is a model-reduction method that given a model

G

G

, finds a reduced order model ˆ , which is a good approximation of on a chosen frequency interval, e.g., [0

, ω

]. The objective is to minimize the discrepancy between the given model and the sought reduced-order model in a frequency-limited

H

2

-norm, using the frequency-limited Gramians. Correspondingly, the optimization problem for this purpose is as follows

= arg min

||

E

||

2

H

2

, E

=

G

(4.58) where

||

E

||

2

H

2

is defined in Chapter 3.

Given the realization in (4.7), the error system can be realized, in state-space form, as

E

:

A

C

E

E

B

D

E

E

=

⎢⎢⎢⎢

A

0

C

0

ˆ

D

B

⎥⎥⎥⎥

.

(4.59)

Continuous Time

In the continuous-time case, the cost function of the optimization problem in

(4.58) can be rewritten as, see Section 3.2.1

-

||

E

||

2

H

2

= tr C

E

P

E,ω

C

T

E

+ 2 tr S

E,ω

-

C

E

= tr B

T

E

Q

E,ω

B

E

+ 2 tr C

E

S

E,ω

B

B

E

E

+

+

D

E

D

E

2

ω

π

ω

2

π

.

.

!

D

T

E

D

T

E

!

.

(4.60a)

(4.60b) where

A

A

E

P

E,ω

T

E

Q

E,ω

+

+ P

E,ω

A

T

E

Q

E,ω

A

E

+ S

E,ω

B

E

B

T

E

+ S

E,ω

C

T

E

C

E

+

+

B

E

B

T

E

S

E,ω

C

T

E

C

E

S

E,ω

= 0

,

= 0

,

(4.61a)

(4.61b) with

S

E,ω

= Re

i

2

π

ln (

A

E

I

)

!

.

(4.62)

Now, the cost function (4.60) can be rewritten using the inherent structure in the problem. This is done by using the realization given in (4.59) and by partitioning

4.4

Model Reduction using an

H

2

-Measure

61 the Gramians P

E,ω

and Q

E,ω

as

P

E,ω

=

P

P

T

ω

12

P

12

P

ω

,

and S

E,ω

as

S

E,ω

=

S

ω

0

Q

E,ω

=

Q

Q

T

ω

12

S

0

ω

.

Q

12

Q

ω

,

(4.63)

(4.64)

P

ω

,

Q

ω

,

equations

ω

,

Q

ω

,

P

12

and Q

12

satisfy, by (4.61), the Sylvester and Lyapunov

AP

AP

12

A

ˆ

ω

+

ω

A

T

Q

ω

A

T

Q

12

+

A

T

Q

ω

+

P

P

ω

12

P

ω

A

A

A

T

T

T

+

+

S

S

S

ω

ω

ω

BB

B

B

ˆ

ˆ

T

T

T

+

+

BB

T

S

ω

B

ˆ

T

B

ˆ

T

ω

ω

+

Q

Q

ω

12

ω

A

ˆ

+ S

ω

S

ω

ω

C

C

T

C

T

T

C +

ˆ

ˆ

C

C

+ ˆ

T

T

T

CS

C

C

ˆ

ˆ

ω

ω

ω

= 0

,

=

=

=

=

=

0

0

0

0

0

,

,

,

,

,

(4.65a)

(4.65b)

(4.65c)

(4.65d)

(4.65e)

(4.65f) with

S

ω

= Re

i

2

π

ln (

A

I

)

!

,

S

ω

= Re

i

2

π

ln

− ˆ −

I

!

.

(4.66)

Note that P

ω

and Q

ω

satisfy the Lyapunov equations for the frequency-limited

ω

Q

ω

satisfy the Lyapunov equations for the frequency-limited controllability and observability Gramians for the sought model, see Section 3.1.1.

With the partitioning of alternative forms

P

E,ω

and Q

E,ω

, it is possible to rewrite (4.60) in two

||

E

|| 2

H

2

||

E

||

2

H

2

= tr B

T

+ 2 tr

Q

ω

CS

B + 2 B

T

ω

B +

Q

12

D

ω

2

π

-

C

ˆ

B

T

Q

ω

ω

= tr CP

ω

+ 2 tr

C

T

CS

ω

2 CP

12

B + D

2

ω

π

C

T

-

C

C

ˆ

ω

ˆ

ω

C

T

ω

2

π

. !

D

T

D

T

ω

2

π

. !

D

T

D

T

.

,

(4.67a)

(4.67b)

Of course, as in Chapter 3, it is possible to have arbitrary segments in the frequency domain, e.g.,

0

< ω

1

< ω

2

< ω

3

||

E

|| 2

H

2

< ω

4

,

Ω

,

Ω

= [

ω

4

,

ω

3

]

[

ω

2

,

ω

. Important to note, is that if

1

]

Ω

[

ω

1

, ω

2

]

[

ω

3

, ω

4

], does not contain an infinite interval, then neither the given system to be reduced,

G

, nor the reduced system, ˆ , have to be strictly proper.

An appealing feature of the proposed optimization problem (4.58), is that the corresponding cost function, (4.67), is di ff erentiable in the system matrices, ˆ

,

ˆ

,

62

4 Model Reduction and ˆ . In addition, the closed-form expressions obtained when di ff erentiating the cost function is expressed in the given data ( A

,

B

,

C and variables ( ˆ

,

B

,

and ˆ

D ), the optimization

) and solutions to the equations in (4.65). This makes it possible to formulate necessary conditions for optimality for the optimization problem (4.58).

Theorem 4.8 (Necessary conditions for optimality).

asymptotically stable, for the frequency-limited and

A

are Hurwitz. In order for the matrices

ˆ

,

ˆ

H

,

ˆ

2

C

Assume that

G and are

-norm to be defined, i.e., and

D

A

to be optimal for the problem (4.58), it is necessary that they satisfy the equations in (4.65) and the equations in (4.29) and that

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

||

2

H

2

ˆ

=2

=2

Q

Q

T

12

ω

P

+

12

Q

Q

T

12

ω

B

T

ω

2 W

C

T

=

D

0

=2 C

ˆ

ω

CP

12

D

=

2

-

CS

ω

B + D

ω

π

C

ˆ

ω

ˆ −

B

T

,

= 0

T

ω

= 0

,

ω

.

π

= 0

,

,

(4.68a)

(4.68b)

(4.68c)

(4.68d)

where

W

V

-

i

= Re

π

C

T

C

ˆ

L

− ˆ −

C

T

CP

12

I

,

V

.

T

,

C

T

D

B

T

(4.68e)

(4.68f)

with the function L

(

· ,

Higham [2008].

·

)

being the Frechét derivative of the matrix logarithm, see

Proof: If A and ˆ are Hurwitz, then the equations in (4.65) are uniquely solvable, see Theorem 2.1. These are needed to compute the cost function and its gradient.

Now, the gradient of the cost function with respect to ˆ

,

ˆ

,

have to be calculated. However, this is done in Appendix 4.C, since the calculations are quite long.

As in Section 4.4.1 the optimization problem in this section also becomes invariant to the realization of the given model to be reduced, as can be seen in the following theorem.

Theorem 4.9.

The cost function in the optimization problem

(4.58)

and its gradient, given in Theorem 4.8, are invariant under state transformations of the system

G

.

Proof: Given the realization of

G

in (4.7) and a transformations matrix T , the

4.4

Model Reduction using an

H

2

-Measure realization of the transformed system becomes

G

= =

T

1

AT

CT

T

1

B

D

.

Realizing that

S

ω

= Re

i

2

π

S

ω

= T

1

S

ω

T , since

!

ln (

A

I

) = Re

=

2

i

π

T

1 ln

Re

T

1

i

2

π

− ¯ ln

I

− ¯ −

T

!

I

!

T = T

1

S

ω

T

,

the proof is analogous to the proof in Theorem 4.3.

63

Discrete Time

In the discrete-time case, the cost function in (4.58) can be written as, see Section 3.2.2,

||

G

||

2

H

2

= tr CP

+ 2 tr

-

C

T

CR

ω

B

C

+

ˆ

ω

2

ω

C

D

T

= tr B

T

Q

+ 2 tr

ω

D

B + 2 tr B

T

T

-

Q

12

CR

ω

B +

2 tr CP

C

ˆ

ω

ˆ

12

ω

2

D

C

T

.

D

ˆ

+ tr ˆ

T

ω

ω

2

D

C

ˆ

ω

ˆ −

ω

2

T

.

,

(4.69a)

(4.69b) where

A

AP

ω

A

T

P

ω

AP

A

T

Q

ω

12

A

T

A

P

Q

ω

12

T

Q

12

A

Q

12

+ S

ω

+ S

T

ω

BB

T

C

T

C +

+ BB

T

S

T

ω

C

T

CS

ω

+ S

ω

BB

T

+ BB

T

S

T

ω

+ S

T

ω

C

T

C + C

T

CS

ω

= 0

,

=

=

=

0

0

0

,

,

,

(4.70a)

(4.70b)

(4.70c)

(4.70d) with

S

ω

=

1

2

π

Re

ω

I −

2

i

ln

I −

A e

,

S

ω

=

1

2

π

Re

ω

I −

2

i

ln

I − ˆ e

,

R

ω

=

1

π

A

1

Re

i

ln

I −

A e

,

R

ω

=

1

π

A

1

Re

i

ln

I − ˆ e

.

(4.70e)

(4.70f)

For the discrete-time case it is also possible to calculate a closed form expression for the gradient of the cost function, and again this makes it possible to formulate necessary conditions for optimality.

Theorem 4.10 (Necessary conditions for optimality).

are asymptotically stable, for the frequency-limited

A

and

A

are Schur. In order for the matrices

ˆ

,

ˆ

,

H

2

-norm to be defined, i.e., and

Assume that

G and to be optimal for the problem in (4.58), it is necessary that they satisfy the equations in (4.70) and the

64

4 Model Reduction

equations in (4.37) and that

||

E

||

2

H

2

ˆ

=2 Q

T

12

AP

12

ω

A

ˆ

+ W

A

− T

C

T

D

B

T

R

T

= 0

,

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

=2

=2

=

Q

C

2

ω

ˆ

ω

+ Q

T

12

B

CP

12

D

T

ω

C

T

D

CR

ω

B + D

ω

C

ˆ

ω

ˆ −

B

T

D

ˆ

ω

T

ω

=

=

= 0

,

0

0

,

, where

W

V

= Re

-

i

π

e

iπω

L

I −

C

T

ˆ −

P

T

12

C

T

ˆ −

ˆ e

,

V

.

T

,

D

T

C

ˆ

1

(4.71a)

(4.71b)

(4.71c)

(4.71d)

(4.72a)

(4.72b)

with the function L

(

· ,

Higham [2008].

·

)

being the Frechét derivative of the matrix logarithm, see

Proof: The proof is analogous to the proof for Theorem 4.8 for continuous time.

Theorem 4.11.

The cost function to the optimization problem (4.6) and its gradient, given in Theorem 4.10, are invariant under state transformations of the system G .

Proof: Realizing that S

ω

= T

1

S gous to the proof in Theorem 4.3.

ω

T and R

ω

= T

1

R

ω

T , makes the proof analo-

4.5

Computational Aspects of the Optimization

Problems

In this section, suggestions for how to initialize the optimization and how the optimization can be performed e ffi ciently, by using the inherent structure to speed up the computations, will be presented.

For all the methods that have been presented in Section 4.4, a cost function has been given and necessary conditions for optimality. The gradients for all the methods are readily extracted from the necessary conditions for optimality for the methods. With this information it is straightforward to, for example, use any quasi-Newton solver, see Section 2.2.1, to solve the optimization problem in (4.6).

For two special cases, the Hessians were also calculated, which can be used to

4.5

Computational Aspects of the Optimization Problems y y

1

G

1

+

+

+ y

2

G

2

G

u

Figure 4.3:

Models in parallel

65 initialize the Hessian in the quasi-Newton solver. Computing the Hessian in all iterations would be to computationally expensive.

4.5.1

Structure in Variables

In some cases, the system matrices A

,

B

,

C and D have a certain structure, that is desired to preserve while computing ˆ . In other words, it is desirable to have a similar structure in the system matrices for ˆ A

,

ˆ

,

and ˆ . For example, assume that

G

has the structure as given in Figure 4.3, with two systems in parallel where we want to use model reduction on the system

G

, but also keep the internal parallel structure. In this case a block diagonal ˆ -matrix is desired.

Looking at all the cost functions in Section 4.4, there is nothing holding us back from introducing structure in the system matrices, e.g., block diagonal ˆ , when formulating our optimization problem. The question is if the derived gradients are still usable when having structure in the system matrices, and the answer is, yes. This is because all the steps in deriving the gradients have been done element is desirable, only are relevant and are hence used. In general, for this purpose, the so called structure variables S

,

S

,

S and S , are introduced, which holds the structure of the system matrices, i.e., element (

i, j

) in

S is 1 if element (

i, j

) is a variable in the sought system matrix and 0 otherwise.

The gradients now become where

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

S

S

ˆ

,

,

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

S

S

ˆ

,

,

denotes the Hadamard (element wise) product of two matrices.

Furthermore, with ˆ and S

ˆ

,

ˆ

,

ˆ

D initialized with structure according to S

ˆ

,

S

, the structure will remain when moving along a quasi-Newton step.

,

S

4.5.2

Initialization

The optimization problem in (4.6), is both nonlinear and non-convex, see, for instance, Example 4.1. This makes the initialization an important part of the

66

4 Model Reduction problem. For the methods proposed in this chapter, the model used for initialization has to be asymptotically stable. Since there exists numerous methods for model reduction, which are easily computed and produces asymptotically stable reduced models, e.g., balanced truncation, see Section 4.2, any of them can be used to create a model for initialization. In the special cases, in Section 4.4.1, where there are no input or output filters, even more can be done for the initialization. Looking at the cost functions, (4.27) and (4.35), one sees that the cost

(or ˆ A and ˆ ) are fixed, and since ˆ P ) is positive semidefinite, the quadratic program is solvable. Hence, first a basic initialization is used to obtain a model with the correct number of states, e.g., using balanced truncation. This model is then used in the quadratic and ˆ .

4.5.3

Structure in Equations

In this section, the inherent structure in the equations will be used to speed up the computations. First, remember that the problem is a model

reduction

problem, and in most cases ˆ . The analysis in this section will be based on the continuous-time case, but the same results are also valid for the discrete-time case. Consider the cost function for the general case, when using input and output filters, (4.98). The terms D

i

T

B

T

QBD

i

and D

o

CPC

T

D do not depend on any of the optimization variables and are the only terms that include the matrices P and Q (see (4.96), (4.97) and (4.98)). Hence, puted. The same applies for the terms B

T

Q

ω

and Q

ω

in (4.65).

P

B and and

Q

CP does not have to be com-

ω

C

T and the matrices P

ω

In all the presented methods, for every iteration in the solver, both the cost function and its gradient have to be computed. To do this a number of Lyapunov and

Sylvester equations have to be solved. This is where most of the computational time is spent. Therefore, before starting to analyze what is done in every iteration, a brief explanation on how to solve a general Sylvester equation is presented. A general Sylvester equation can be written as

AX + XB + C = 0

,

A

∈ R

n

×

n

,

B

∈ R ˆ

×

n

,

C

∈ R

n

×

ˆ

.

(4.73)

The first main step when solving a Sylvester equation is to Schur factorize (see e.g., Golub and Van Loan [1996] or Bartels and Stewart [1972]) can be done in

O

(

n

3

) operations for A and

O

( ˆ

3

) operations for

A and B , which

B . Now the equation

A

S

X

S

+ X

S

B

S

= C

S

(4.74) has to be solved, where A

S

= U

T

AU and B

S

computed using the Schur factorization and

=

C

V

T

S

BV

= U

T are block upper triangular,

CV and X

S

= U

T

XV . It is not hard to verify that the new system of linear equations, (4.74), can be solved in

O

(

n

2

+

n

2

) complexity, and the solution to (4.73) is computed as, which also costs

O

(

n

2

+

n

ˆ

2

X = UX

S

V

T

). It can be concluded that when solving several

Sylvester equations with the same factors A and B but di ff erent C :s, speed can be gained in the computations if A and B are Schur factorized before solving the

4.6

Examples

67 equations. It can also be concluded that it is computationally much more e ffi cient to use the structure in the realizations (4.12) and (4.59) and split up the large

Lyapunov equations for P

E

and Q

E

in a number of smaller Lyapunov/Sylvester equations, as described in (4.96) and (4.97), which can be solved much more e ffi

ciently.

For the methods in Section 4.4.1 and Section 4.4.3, which are invariant under state transformations, the given system

G

(and the input and/or output filter if they are present) can be transformed to a basis such that the A -matrices are upper triangular (Schur factorize the factorization of A , such that A = U

¯

A -matrices). In other words, given a Schur

T is block upper triangular and U is orthogonal, we can transform the system as follows,

G

=

U

T

AU

CU

U

T

D

B

=

,

(4.75) and use this realization during the iterations. Additionally, looking at the Lyapunov/Sylvester equations needed to be solved (equations (4.96) and (4.97) or equations (4.65) or (4.52)), one observes that they all have the same underlying structure, i.e., their factors in the equations are A , ˆ , A

i

, and A

o

. Assuming that

A (and A

i

and A

o

) is given in real Schur form, then for every iteration only the has to be Schur factorized, which is small compared to A , to be able to solve all Lyapunov/Sylvester equations at a maximum cost of

O

(

n

2

ˆ +

n

2

).

4.6

Examples

In this section, some examples that show the applicability of the proposed methods will be presented. Where it is possible, comparisons with other relevant methods will be made. To be able to measure how well di ff erent methods perform, the relative error for the particular norm in use will be utilized, i.e.,

G

||

G

||

H

H

.

(4.76)

To shorten the names and make the figures more readable our proposed methods will be denoted as

• h

2 nl

– the ordinary model-reduction method without weights, described in Section 4.4.1

• wh

2 nl

– the ordinary model-reduction method with weights, described in

Section 4.4.1

• flh

2 nl

– the frequency-limited model-reduction method, described in Section 4.4.3

• rh

2 nl

– the robust model-reduction method, described in Section 4.4.2

The methods that will be used for comparison, in the di ff erent examples, are

68

4 Model Reduction

• bt

– ordinary balanced truncation, the implementation used is the function schurmr in Robust Control Toolbox in

M atlab

• wbt

– weighted balanced truncation, an implementation of the method in

Enns [1984]

• flbt

– frequency-limited balanced truncation, an implementation of the method in Gawronski and Juang [1990]

• mflbt

– modified frequency-limited balanced truncation, an implementation of the method in Gugercin and Antoulas [2004]

• itia

– iterative tangential interpolation algorithm, the implementation in the more

-toolbox is used (see Poussot-Vassal and Vuillemin [2012])

• istia

– iterative svd

-tangential interpolation algorithm (see Poussot-Vassal and Vuillemin [2012]), the implementation in the more

-toolbox is used

• flistia

– frequency-limited iterative tangential interpolation algorithm(see

Vuillemin et al. [2013]), the implementation in the more

-toolbox is used

We start with an example to illustrate that the balanced truncation method can be used for initialization of the proposed methods.

Example 4.3:

H

2

Model Reduction

In this example 10000 random asymptotically stable and strictly proper siso systems with 20 states using the function rss in Control System Toolbox in

M atlab are generated. On each of these systems, the number of states are reduced to 10 with h

2 nl and model from bt step on top of bt

. When reducing the order of a system with h

2 is used as the initial point. In this case h

2 nl

, the reduced nl works as a refinement bt

.

In Figure 4.4, two histograms are plotted. They show the histograms of the entities

||

G

− ˆ

G

− ˆ h bt

||

H

2 and

||

G

G

− ˆ

ˆ bt

||

H∞ respectively. In other words, they show how

2 norm, using nl h

H

2

2 nl

.

h

2 nl h

2 nl H∞ much the systems reduced using bt have been improved, in

H

2

-norm and

H

∞ works well as a model-reduction method and can in

H

2

-

most cases decrease the model reduction error 1-6 times, measured in the norm. The average improvement in

H

2

-norm is 4.15. Observe that also the

H

∞ is not

norm can be improved when using h a solution to a minimum norm, takes 1.82 seconds and with bt

H

2

2 or nl

, this is because of the fact that bt

H

, problem. In average a run with it takes 0.07 seconds.

h

2 nl

We continue with two more examples based on a medium-scale model of a clamped beam. For the first example we use ordinary model reduction without weights and for the second one the frequency-limited model-reduction method is utilized.

4.6

Examples

69

Ratio for

H

2

-norm

400

300

200

100

0

0 5 10

Ratio for

H

-norm

15 20

600

400

200

0

0 0

.

5 1 1

.

5

Ratio

2 2

.

5

Figure 4.4: reduced using between the

The figure illustrates, in two histograms, how much a system

H

bt

has been improved using

-norm and the error system from using

H

2

h

2

-norm of the error system from using nl

, i.e.,

||

G

h

G

− ˆ

h

2

2

nl

ˆ

bt

||

nl

H

H

. The

.

x -axis is the quotient bt and

Example 4.4: Clamped Beam Model, varying order

In this example a model of a clamped beam, a siso model with 348 states which can be found in Leibfritz and Lipinski [2003], is used. The model will be reduced to di ff erent orders,

n r

[4

,

30], with h

2 nl be compared with models reduced using

. The reduced models using istia

, itia and bt h

2 nl will

. In the left plot of

Figure 4.5, it can be observed that for small than bt

, for the

H

2

-norm, and for larger

n r n r

, h

2 nl

, itia and istia are better the error approaches zero for all methods. It can also be observed, in the right plot of Figure 4.5, that, even though we are minimizing the

H

2

-norm, the

H

-norm remains small for all the methods.

Example 4.5: Clamped Beam Model, limited frequency interval

In this example, the model of the clamped beam from the previous example is reused. This time, instead of trying di ff erent orders, the focus will be on finding reduced models for di ff erent frequency intervals, [0

, ω

]

, ω

[2

,

40] and fix the reduced-order model to have 12 states,

n r

= 12. The proposed method flh

2 nl will be used and it will be compared with the frequency-limited methods flistia

, flbt and mflbt

. Additionally, the methods wh

2 nl and wbt will be used, both with a tenth order Butterworth low-pass filter, with the cut-o ff frequency equal to

ω

. Looking at the left plot of Figure 4.6, it can be observed that for small all the

H

2 optimal methods do very well. However, for

ω >

7, h

2 nl

ω

gives better

, result than all the other methods. As in the previous example, the relative

H

-

70

4 Model Reduction

10

− 2

Relative error for H

2 norm

10

− 2

10

− 3

Relative error for

H

∞ norm bt itia istia h

2 nl

10

− 3

10

− 4

10

n r

20 30

10

5

10

n r

20 30

Figure 4.5: h 2 nl

, itia the relative

,

Reduction of a clamped beam model to di ff erent orders using istia and bt

. To the left, the relative

H

2 error and to the right

H

∞ error.

norm remains low, for almost all

H

ω

, the H

2 optimal methods have better relative

-error than the methods using balanced truncation.

Now, two smaller examples are presented to show how models coming from frequency-limited methods can look in the frequency region of interest and outside this region. We start with a small toy example.

Example 4.6: Small toy example

This example considers a small model with four states. The model is composed of two second-order models in series, one with a resonance frequency at and the other at

ω

= 3. The frequency range is limited to capture the first model. The model used is

ω

∈ [0

,

ω

= 1

2] to try to only

G

=

G

1

G

2

=

s

2

+ 0

1

.

2

s

+ 1

s

2

+ 0

.

9

003

s

+ 9

.

(4.77)

The methods flh

2 nl

, flistia

, are also compared with the methods pass Butterworth filter with a cut-o ff from the di ff erent methods can be seen in Figure 4.8, Figure 4.9 and Table 4.2.

As can be seen in the result, flbt flh

2 nl

, and wh

2 frequency of 2, see Figure 4.7. The results wh mflbt nl

2 nl

, and are compared. These methods wbt flistia using a tenth order lowand flbt in finding a good model for the relevant frequencies, especially are successful flh

2 nl

, which is almost six times better, in

Table 4.2.

mflbt

H

2

-norm, than the second best model, wh

2 nl

, see captures the wrong resonance mode (from our perspective) and fails completely in the lower frequency region, and wbt gain at both the resonance frequency and at the cut-o ff misses to capture the frequency. Interesting to note is how the methods, that does a good job, sacrifices the model fit at higher frequencies for the lower.

4.6 Examples

1

.

4

1

.

2

1

0

.

8

0

.

6

0

.

4

0

.

2

0

Relative error for

· 10

− 2

H

2 norm

10 20

ω

30

40

1

.

4

1

.

2

1

0

.

8

0

.

6

0

.

4

0

.

2

0

Relative error for

· 10

− 3

H

∞ norm wbt flbt h

2 nl mflbt flistia wh

2 nl

10 20

ω

30

40

Figure 4.6: Reduction of a clamped beam model to 12 states with focus on the frequency interval

[0

, ω

]

, ω

[2

,

40] using flh

2 nl

, wh

2 nl

, flistia

, flbt and mflbt

. The filter used for the weighted methods is a tenth order

Butterworth low-pass filter with cut-o ff frequency

ω

. To the left, the relative

H

2 error and to the right the relative

H

∞ error.

71

Magnitude plot for the filter, the true model and the filtered true model

20

0

− 20

− 40

10

− 1

True

Filter

Filtered True

10

0

Frequency [rad

/

s]

10

1

Figure 4.7: The true and filtered model and the low-pass filter for Example

4.6. The dashed vertical line denotes

ω

= 2

.

72

4 Model Reduction

40

20

0

Magnitude plot for the true and the reduced models wbt mflbt flbt flistia flh

2 nl wh

2 nl

True

− 20

40

10

1

10

0

Frequency [rad

/

s]

10

1

Figure 4.8: The true and reduced-order models for Example 4.6. The dashed vertical line denotes

ω

= 2

.

flh 2 nl

, wh 2 nl

, flistia and flbt are successful in finding a good model for the relevant frequencies while mflbt and wbt fails.

40

20

0

Magnitude plot for the error models wbt mflbt flbt flistia flh

2 wh

2 nl nl

− 20

40

− 60

10

− 1

10

0

Frequency [rad

/

s]

10

1

Figure 4.9: The error models for the di ff erent methods for Example 4.6. The dashed vertical line denotes

ω

= 2

.

flh

2 nl

, wh

2 nl

, flistia and flbt are successful in finding a good model for the relevant frequencies while mflbt and wbt fails.

4.6 Examples

73

Table 4.2: mflbt flbt flistia flh wh wbt

2

2 nl nl

Numerical results for Example 4.6

||

G

||

||

G

||

H

2

H

2

3.01e-01

1.00e+00

6.31e-02

6.38e-02

1.02e-02

5.97e-02

||

G

||

H∞

||

G

||

H∞

2.91e-01

1.00e+00

4.00e-02

3.96e-02

1.15e-02

3.95e-02

Re

λ

max

-1.00e-01

-1.51e-03

-9.93e-02

-9.99e-02

-1.01e-01

-1.00e-01

Magnitude plot for the filter, the true model and the filtered true model

40

20

True

Filter

Filtered True

0

− 20

− 40

− 60

10

0

10

1

10

2

10

3

Frequency [rad

/

s]

10

4

10

5

Figure 4.10: The true and filtered model and the band-pass filter for Example 4.7. The dashed vertical lines denote

ω

= 10 and

ω

= 10000

.

Example 4.7: CD player

This example uses a slightly larger model, a model of a compact-disc player with

120 states and two inputs and two outputs, see Leibfritz and Lipinski [2003]. In this example, to illustrate the result in the same way as in the previous example, only one siso part of the transfer function is chosen, namely the transfer function from the second input to the first output of the model. Here, focus will be on a banded frequency interval,

ω

∈ [10

,

1000] where the main peak gain is, see Figure 4.10. The methods that will be compared are the frequency-limited methods flbt

, mflbt

, flistia and flh

2 nl and the weighted methods with a tenth order Butterworth band-pass filter with cut-o ff wbt and wh

2 nl frequencies equal to

ω

= 10 and

ω

= 1000. Looking at the results in Figure 4.11, Figure 4.12 and Table 4.3 all the methods, except flistia

, does a good job, and again nl finds the best model.

flh

2

74

4 Model Reduction

40

Magnitude plot for the true and the reduced models

20

0 wbt mflbt flbt flistia flh

2 nl wh

2 nl

True

20

− 40

− 60

10

0

10

1

10

2

10

3

Frequency [rad

/

s]

10

4

10

5

Figure 4.11: The true and reduced order models for Example 4.7. The dashed vertical lines denote

ω

= 10 and

ω

= 10000

.

flh 2 nl

, wh 2 nl

, mflbt

, wbt and flbt are successful in finding a good model for the relevant frequencies. However, in this example the method flistia fails.

20

0

Magnitude plot for the error models wbt mflbt flbt flistia flh

2 wh

2 nl nl

20

40

60

10

0

10

1

10

2

10

3

Frequency [rad

/

s]

10

4

10

5

Figure 4.12: The error models for the di ff erent methods for Example 4.7.

The dashed vertical lines denote

ω

= 10 and

ω

= 10000

.

flh

2 nl

, wh

2 nl

, mflbt

, wbt and flbt are successful in finding a good model for the relevant frequencies. However, in this example the method flistia fails.

4.7

Conclusions

75

Table 4.3:

wbt

||

Numerical results for Example 4.7

G

− ˆ

||

||

G

||

H

2

H

2

1.24e-03

||

G

||

H∞

||

G

||

H∞

9.50e-04

Re

λ

max

-5.55e+00 mflbt flbt

1.25e-03

1.24e-03

9.43e-04

9.41e-04

-5.54e+00

-5.54e+00 flistia flh wh

2

2 nl nl

8.23e-02

6.95e-04

8.94e-04

5.64e-02

6.83e-04

7.76e-04

-2.26e-01

-5.63e+00

-5.80e+00

Example 4.8: CD player with perturbed poles

In this example, the model of the CD player from the last example is used again.

However, in this case the system matrices of the model are perturbed such that

A pert

= A + E

A

B pert

= B + E

B

C pert

= C + E

C

A

B

C

,

,

,

where the elements in the distribution

N

E

A

,

0

,

0

.

05

2

E

B and E

C are independent random variables with

. The perturbed model will be reduced to a fifteenth order model using rh

2 h

2 nl

. This will be compared with reducing the model with nl with di ff erent values of the regularization parameter. This procedure is repeated 250 times with di ff erent realizations of the random variables and the average is computed. The result from the optimization can be seen in Figure 4.13.

In this figure, the average relative error between the true, unperturbed, model and the reduced models, as a function of the regularization parameter, for rh

2 nl and h

2 nl are plotted.

In Figure 4.13 one observes that for the tested values of the regularization parameters it is possible, in this case, to find a better model. Even for the

H

-norm, it is possible to find a model that performs better than the unregularized method.

Some more examples using model reduction methods will be performed in Chapter 7, where two larger examples are presented which need more background.

4.7

Conclusions

In this chapter, three model-reduction methods (in both continuous and discrete time) based on minimizing the

H

2

-norm using optimization have been presented.

For these methods, both cost functions and gradients have been derived, which makes it possible to e ffi ciently use of-the-shelves quasi-Newton solvers. For a few cases the Hessians have been derived, which also can be utilized in the quasi-

Newton solver. The derivation of the methods enables us to impose structural

76

4 Model Reduction

Relative error for H

2 norm Relative error for

H

∞ norm

4

3

2

4

3

2

1

10

3

10

2

10

1

α

10

0

10

1

1

10

3

10

2

10

1

α

10

0

10

1

Figure 4.13: turbations, for and the blue line is the average relative error, over di ff erent perturbations, for h

2 nl

(

H

2

The black line is the average relative error, over di ff erent perrh

2 nl using di ff erent values of the regularization parameter

-norm in the left plot and

H

-norm in the right).

constraints, e.g., block diagonal ˆ -matrix, in the system matrices. Additionally, a number of examples showing the applicability of the methods, both for small and medium-scale problems have been presented, for which the methods have performed well.

One of the drawbacks with the methods is the non-convexity of the problem.

One way to possibly reduce the influence of the non-convexity is to have a better initialization, which is a subject of further research. However, for the examples presented in this chapter the proposed initialization procedure seems to work.

Appendix

4.A

Gradient of

V

rob

In this appendix, the derivation of the gradient, with respect to ˆ B

V

rob in (4.49) in Section 4.4.2 will be presented, where and ˆ , for

V

rob

=

A

||

E

||

2

H

2

A

F

+

B

||

E

||

2

H

2

B

F

+

C

||

E

||

2

H

2

C

F

+

D

||

E

||

2

H

2

D

F

.

(4.78)

To di ff erentiate function

f

V

rob

, we first need the definition of the Frobenius norm. Given a

(

x

), the Frobenius norm is defined as

2

||

f

(

x

)

||

F

tr

f

(

x

)

T

f

(

x

)

.

(4.79)

Di ff erentiating

||

f

(

x

)

||

F

with respect to

x

, yields

||

f

(

x

)

||

F

∂x

=

∂x

tr

f

2

||

f

(

x

)

T

f

(

x

)

||

F

(

x

)

.

(4.80)

This means that the still unknown part when calculating numerator part. Given the structure of

V

rob

||

f

(

x

)

||

F

∂x

given

f

(

x

), is the in (4.78) this means that to obtain an expression for

V

rob

, we need to calculate, for example, terms like

ˆ tr

⎜⎜⎜⎜

⎜⎜⎝

⎢⎣

⎢⎢⎢⎢

||

E

||

A

2

H

2

⎥⎥⎥⎥

⎥⎦

T

⎢⎢⎢⎢

⎢⎣

||

E

||

A

2

H

2

⎥⎥⎥⎥

⎥⎦

⎟⎟⎟⎟

⎟⎟⎠

.

In this appendix, elements in the matrices ˆ

,

and

c ij

respectively.

ˆ

C will be denoted with

a ij

, b ij

77

78

4 Model Reduction

To simplify the equations later on, four new Sylvester equations are defined,

T

W

1

+

AW

2

+

W

1

A

W

2

ˆ

T

+ Q

+

T

12

QP + Q

12

P

T

12

QP + Q

12

P

T

12

P

12

AW

3

+ W

3

A

T

+ QB + Q

12

= 0

,

= 0

,

B

T

= 0

,

A

T

W

4

+ W

4

A C

T T

12

CP = 0

,

(4.81a)

(4.81b)

(4.81c)

(4.81d) whose origin will become clear soon. Di ff erentiated versions of the equations in

(4.29) will also be needed

A

T

A

P

12

A

T

∂a ij

Q

T

12

∂a ij

∂a

ˆ

ij

+

ˆ

∂a ij

+

+

P

ˆ

+

12

∂a ij

Q

T

12

∂a ij

A +

ˆ

T

∂a ij

T

+

ˆ

P

12

ˆ

T

∂a ij

Q

∂a

Q

T

12

ˆ

∂a

ˆ

T

ij ij

A

P

∂b

12

ij

+

P

∂b

12

ij

A

T

+ B

ˆ

T

∂b ij

A

T

Q

T

12

∂c ij

+

Q

∂c

T

12

ij

A

ˆ

∂c

T

ij

C

= 0

,

=

=

=

=

0

0

0

0

.

,

,

,

(4.82a)

(4.82b)

(4.82c)

(4.82d)

(4.82e)

We start with the terms containing

||

E

||

A

2

H

2

, tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

||

2

H

2

A

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

||

2

H

2

A

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= tr

-

4 QP + Q

12

P

T

12

T

QP + Q

12

P

T

12

.

= 4 tr PQQP + 2 P

12

Q

T

12

QP + P

12

Q

T

12

Q

12

P

T

12

.

(4.83)

Di ff erentiating with respect to ˆ :

∂a ij

tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

=

= 8 tr

P

12

∂a ij

Q

T

12

QP + Q

12

P

T

12

+

Q

T

12

∂a ij

QP + Q

12

P

T

12

P

12

.

(4.84)

Di ff erentiating with respect to ˆ :

∂b ij

tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

P

12

∂b ij

Q

T

12

QP + Q

12

P

T

12

.

(4.85)

4.A

Gradient of

V

rob

Di ff erentiating with respect to ˆ :

∂c ij

tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

A

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

Q

T

12

∂x ij

QP + Q

12

P

T

12

P

12

.

79

(4.86)

Now, continue with the terms containing

||

E

||

B

2

H

2

, tr ⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

||

2

H

2

B

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

||

2

H

2

B

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= tr

-

4 QB + Q

12

T

QB + Q

12

B

.

= tr B

T

QQB + 2 ˆ

T

Q

T

12

QB B

T

Q

T

12

Q

12

.

(4.87)

Di ff erentiating with respect to ˆ :

∂a ij

tr ⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

||

2

H

2

B

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

||

2

H

2

B

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

Q

T

12

∂a ij

QB + Q

12

B

T

.

(4.88)

Di ff erentiating with respect to ˆ :

∂b ij

tr

⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

|| 2

H

2

B

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

B

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

ˆ

T

∂b ij

Q

T

12

QB + Q

12

.

(4.89)

Di ff erentiating with respect to ˆ :

∂c ij

tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

B

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

B

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

Q

T

12

∂c ij

QB + Q

12

B

T

(4.90)

Continuing with the terms containing

||

E

||

C

2

H

2

, tr

⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= tr

-

4 CP

T

12

T

CP

T

12

.

= 4 tr PC

T

CP

2 P

12

C

T

CP + P

12

C

T T

12

(4.91)

Di ff erentiating with respect to ˆ :

∂a ij

tr

⎜⎜⎝

⎜⎜⎜⎜

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

P

12

C

T

∂a ij

T

12

CP (4.92)

80

4 Model Reduction

Di ff erentiating with respect to ˆ :

∂b ij

tr

⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

|| 2

H

2

C

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

= 8 tr

P

12

C

T

∂b ij

T

12

CP (4.93)

Di ff erentiating with respect to ˆ :

∂c ij

tr

⎢⎣

⎢⎢⎢⎢

⎜⎜⎝

⎜⎜⎜⎜

||

E

||

2

H

2

C

⎥⎦

⎥⎥⎥⎥

T

⎢⎣

⎢⎢⎢⎢

||

E

||

2

H

2

C

⎥⎦

⎥⎥⎥⎥

⎟⎟⎠

⎟⎟⎟⎟

=

8 tr

ˆ

T

∂c ij

CP

T

12

P

12

(4.94)

Here is where the equations for W

1

, W

2

, W

3 and W

4 from (4.81) comes in. Using

Lemma 4.1 with the equations in (4.81) and (4.82) together with the equations above entails

∂V

∂V

∂V

rob

ˆ rob

ˆ rob

ˆ

= 4

A

= 4

A

W

1

P

12

+ Q

T

12

W

2

||

E

||

A

H

2

W

1

B

||

E

||

A

2

H

2

F

+ 4

F

+ 4

B

Q

T

12

B

QB

Q

T

12

||

E

||

B

2

H

2

W

3

||

E

||

B

H

2

+ Q

12

F

F

=

4

A

CW

2

||

E

||

A

H

2

F

4

B

CW

3

||

E

||

B

H

2

F

+ 4

C

W

4

P

12

||

E

||

C

H

2

F

,

4

C

+ 4

C

W

4

B

||

E

||

C

2

H

2

,

F

CP

||

E

||

C

H

2

T

12

F

P

12

.

(4.95a)

(4.95b)

(4.95c)

4.B

Equations for Frequency-Weighted Model

Reduction

In this appendix, the equations that comes from partitioning P

E

and Q

E

as in

(4.13) and using the realization (4.12) of

E

, will be presented, both for continuous and discrete time.

4.B

Equations for Frequency-Weighted Model Reduction

4.B.1

Continuous Time

81

Splitting the equations in (4.15) using the partitioning in (4.13) yields the equations

A

o

P

o

+ P

o

A

AP

T

o

AP +

A

12

+

AP

ˆ

+

B

o

14

P

PA

P

CP

14

12

+ P

ˆ

A

T

T

T

14

+

+

BC

BC

P

i i

T

14

P

P

T

13

T

23

C

T

+

+

B

T

o

P

P

13

23

A

B

i

C

C

o

P

i

T

i

T

i

B

T

B

T

+

24

+

P

BD

i i i

A

i

T

P

+

T

24

D

D

i

T

i

T

B

T

B

T

B

i

C

T

B

B

T

o i

T

+

A

T

o

BC

+

i

AP

13

P

T

23

+ P

BC

i

P

+

13

34

P

13

A

i

T

+

C

i

T

+

B

BC

PC

T

T

B

i

T

o

+

P

i

P

BD

+

i

D

i

T

B

T

BD

12

i

C

T

B

B

T

o i

T

AP

24

+ P

A

i

24

P

A

T

o

34

AP

23

+ P

+

BC

i

34

P

P

A

T

o

23

34

+

A

i

T

+

P

P

T

12

T

13

C

BC

i

C

T

B

T

T

o

P

B

T

o

i

P

T

23

BD

i

B

P

ˆ

T

B

T

o i

T

C

T

B

T

o

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

= 0

,

(4.96a)

(4.96b)

(4.96c)

(4.96d)

(4.96e)

(4.96f)

(4.96g)

(4.96h)

(4.96i)

(4.96j) and

Q

i

A

i

+ A

i

T

QA

Q

Q

i

ˆ

+

+ A

T

Q +

+ ˆ

Q

T

13

T

BC

i

Q

Q

14

24

B

B

o o

C

ˆ

+

+ C

i

T

B

T

Q

13

C

T

B

T

o

C

T

B

T

o

+ Q

T

23

Q

T

14

Q

T

24

BC

i

+

Q

o

A

o

+ A

C

C

T

T

D

D

T

o

T

o

D

D

o o

C

+ C

i

T

T

o

Q

o

+

B

T

Q

23

C

T

o

C

o

Q

12

ˆ

+

Q

13

A

T

A

i

Q

12

+ A

T

Q

Q

14

14

Q

A

13

o

+

+

B

o

ˆ

+

QBC

i

A

T

Q

14

C

T

B

T

o

+ Q

12

Q

T

24

BC

i

C

T

D

T

o

+ C

T

B

T

o

D

o

Q

T

34

+ C

T

B

T

o

Q

o

+ C

T

D

T

o

C

o

Q

23

A

i

Q

Q

34

T

Q

23

24

A

o

A

o

+

+ Q

T

12

BC

i

Q

ˆ

i

C

T

B

T

o

Q

T

34

A

i

T

A

T

Q

Q

34

24

+ C

i

T

C

T

B

B

T

T

o

Q

Q

o

14

C

+ C

i

T

T

D

B

T

T

o

C

o

Q

24

= 0

,

=

=

=

=

=

=

=

=

=

0

0

0

0

0

0

0

0

0

.

,

,

,

,

,

,

,

,

(4.97a)

(4.97b)

(4.97c)

(4.97d)

(4.97e)

(4.97f)

(4.97g)

(4.97h)

(4.97i)

(4.97j)

Splitting the cost function, (4.14), using the realization of

E

, (4.12) and the partitioning of P

E

and Q

E

, yields

||

E

||

2

H

2

= tr D

i

T

B

T

QBD

i

+ 2 D

i

T

B

T

Q

T

12

BD

i

+ B

i

T

Q

i

B

i

+ D

T

i

B

T

BD

+ 2 B

i

T

Q

T

13

BD

i i

+ 2 B

T

i

Q

T

23

i

,

(4.98a)

||

E

||

2

H

2

= tr D

o

CPC

T

D

T

o

2 D

o

+ C

o

T

12

C

T

D

T

o

P

o

C

T

o

+ D

o

C

ˆ

C

T

D

T

o

+ 2 D

o

CP

14

C

T

o

2 D

o

CP

24

C

T

o

.

(4.98b)

82

4 Model Reduction

The gradient becomes

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

= 2

= 2

+ 2

Q

ˆ

+ Q

T

12

P

12

+ Q

23

P

T

23

+ Q

24

P

T

24

Q

T

12

P

13

Q

ˆ

i

QP

23

+ Q

23

+ Q

T

12

BD

i

P

i

+ Q

+ Q

23

B

i

24

D

P

i

T

T

34

,

,

C

i

T

||

E

||

2

H

2

ˆ

=

2 B

T

o

+ 2 D

T

o

Q

T

14

D

o

P

C

12

ˆ

+ Q

T

24

D

o

ˆ

+

CP

12

Q

T

34

P

T

23

C

o

P

T

24

+

.

Q

o

P

T

24

(4.99a)

(4.99b)

(4.99c)

4.B.2

Discrete Time

Splitting the equations in (4.32) using the partitioning in (4.13) yields the equations

APA

T

P + BC

i

P

T

13

A

T

+ AP

13

C

i

T

B

T

+ BC

i

P

i

C

i

T

B

T

+ BD

i

D

i

T

B

T

= 0

,

(4.100a)

A

ˆ

T

BC

i

P

T

23

A

T

AP

23

C

i

T

B

T

BC

i

P

i

C

i

T

B

T

BD

i

D

i

T

B

T

= 0

,

(4.100b)

A

i

P

i

A

i

T

P

i

+ B

i

B

i

T

= 0

,

(4.100c)

A

o

P

o

A

T

o

P

o

B

o

+ B

o

CP

CP

12

14

C

T

B

A

T

o

T

o

+ A

B

o o

P

T

14

CP

T

12

C

T

B

T

o

C

T

B

o

+

B

o

B

o

CP

24

A

T

o

CPC

T

B

T

o

+

A

o

P

T

24

C

T

B

o

C

ˆ

C

T

B

T

o

B

T

o

= 0

,

(4.100d)

AP

12

T

P

12

+ BC

i

P

T

23

A

T

+ AP

13

C

i

T

B

T

+ BC

i

P

i

C

i

T

B

T

+ BD

i

D

i

T

B

T

= 0

,

(4.100e)

AP

13

A

i

T

P

13

+ BC

i

P

i

A

i

T

+ BD

i

B

i

T

= 0

,

(4.100f)

AP

14

A

T

o

P

14

+ BC

i

P

34

A

T

o

+ BC

i

+

P

T

13

APC

T

B

T

o

C

T

B

T

o

AP

BC

i

P

12

T

23

C

T

C

T

B

T

o

B

T

o

= 0

,

(4.100g)

AP

24

A

T

o

P

24

AP

23

A

i

T

i

B

i

T

= 0

,

(4.100h)

BC

i

P

34

A

T

o

BC

i

P

T

13

T

12

C

T

B

T

o

C

T

B

T

o

BC

i

A

ˆ

C

T

P

T

23

C

T

B

T

o

B

T

o

= 0

,

(4.100i)

A

i

P

34

A

T

o

P

34

+

P

A

i

23

BC

i

P

T

13

C

T

B

T

o

P

i

A

i

T

A

i

P

T

23

C

T

B

T

o

= 0

,

(4.100j)

4.B

Equations for Frequency-Weighted Model Reduction

83 and

A

T

QA

Q + A

T

Q

14

B

o

C + C

T

B

T

o

Q

T

14

A + C

T

B

T

o

Q

o

B

o

C + C

T

D

T

o

D

o

C = 0

,

(4.101a)

A

T

ˆ − ˆ −

A

T

Q

24

B

o

ˆ −

C

T

B

T

o

Q

T

24

ˆ

+ ˆ

T

B

T

o

Q

o

B

o

ˆ

+ ˆ

T

D

T

o

D

o

ˆ

= 0

,

(4.101b)

A

i

T

Q

i

A

i

Q

i

+ C

i

T

+ A

i

T

B

T

Q

Q

T

13

T

12

BC

BC

i i

+

+

C

T

i

C

i

T

B

B

T

T

Q

Q

13

12

A

BC

i i

+

+

A

C

T

i

T

i

Q

T

23

BC

B

T

QBC

i i

+ C

+ C

i

T

i

T

B

T

B

T

Q

23

A

Q

ˆ

i i

= 0

,

(4.101c)

A

T

o

Q

o

A

o

Q

o

+ C

T

o

C

o

= 0

,

(4.101d)

A

T

Q

12

ˆ −

Q

12

C

A

T

T

B

Q

T

o

14

Q

o

B

o

B

o

ˆ

+

ˆ −

C

T

C

T

B

T

o

D

T

o

Q

T

24

D

o

ˆ

= 0

,

(4.101e)

A

T

Q

13

A

i

Q

13

+ A

T

QBC

i

+ C

T

+ A

T

B

T

o

Q

Q

T

14

12

BC

i

BC

i

+ C

T

B

T

o

+ C

T

B

T

o

Q

Q

T

24

T

34

A

BC

i i

= 0

,

(4.101f)

A

T

Q

14

A

o

Q

14

+ C

T

B

T

o

Q

o

A

o

+ C

T

D

T

o

C

o

= 0

,

(4.101g)

T

Q

23

A

i

Q

23

A

T

Q

T

12

BC

i

− ˆ

T

B

T

o

A

T

Q

ˆ

i

Q

T

14

BC

i

C

T

B

T

o

Q

T

34

A

i

T

o

Q

24

BC

i

= 0

,

(4.101h)

T

Q

24

A

o

Q

24

C

T

B

T

o

Q

o

A

o

C

T

D

T

o

C

o

= 0

,

(4.101i)

A

i

T

Q

34

A

o

Q

34

+ C

T

i

B

T

Q

14

A

o

+ C

i

T

B

T

Q

24

A

o

= 0

.

(4.101j)

Using the partitioning in (4.13) again, yields the cost function

||

E

|| 2

H

2

= tr D

+ 2 B

i

T

Q

T

13

i

T

B

T

QBD

BD

i

+ 2

i

B

+ 2

i

T

D

Q

T

23

i

T

B

T

BD

i

Q

+

T

12

BD

i

D

T

i

+

D

T

D

T

i

B

T

D

T

BD

i

D

T

o

D

o

+ B

i

T

D

Q

i

B

i

D

i

,

(4.102a)

||

E

||

2

H

2

= tr D

o

CPC

T

+ 2 D

o

CQ

14

C

T

o

D

T

o

2 D

o

2 D

o

CQ

24

C

T

o

T

12

C

T

D

T

o

+ D

o

D

+

D

o

C

ˆ

C

T

D

T

o

D

i

D

i

T

D

T

+ C

o

P

o

C

T

o

D

T

D

T

o

.

(4.102b)

84

4 Model Reduction

The gradient becomes

||

E

|| 2

H

2

ˆ

= 2 Q

ˆ

+ Q

T

12

AP

12

+ Q

23

A

i

P

T

23

+ Q

24

A

o

P

T

24

+ Q

24

B

o

CP

12

C

ˆ

,

(4.103a)

||

E

||

2

H

2

ˆ

= 2 Q

T

12

AP

13

+ Q

24

B

o

Q

ˆ

23

+ Q

23

A

i

P

i

+ Q

24

A

o

Q

T

34

CP

13

CP

23

+ 2

+ Q

T

12

B

BD

i

C

i

P

i

+ Q

T

12

BD

i

C

i

T

+ Q

23

B

i

D

i

T

,

(4.103b)

||

E

||

2

H

2

ˆ

=

2 B

T

o

Q

T

14

AP

12

+ Q

T

24

A

ˆ

+ Q

T

34

A

i

P

T

23

+ Q

o

A

o

P

T

24

+ Q

o

B

o

CP

12

C

ˆ

+

+ 2 D

T

o

Q

T

14

B +

D

o

C

ˆ

Q

T

24

D

o

C

i

P

CP

12

T

23

C

o

P

T

24

,

(4.103c)

||

E

||

2

H

2

ˆ

= 2 D

T

o

D

o

ˆ −

D D

i

D

i

T

.

(4.103d)

4.C

Gradient of the Frequency-Limited Case

In this section, the derivation of the gradient of the cost function (4.67) will be presented. We start by di ff erentiating the cost function (4.67) with respect to

B

,

B and ˆ . First, note that neither Q

ω

,

Q

12

. This means that (4.67a) is quadratic in ˆ

Q

ω

in equation (4.67a) depend

B . Analogous observations can be made with equation (4.67b) and the variable ˆ D . Hence, the derivative of the cost function with respect ˆ

,

becomes

||

E

||

2

H

2

ˆ

||

E

|| 2

H

2

ˆ

||

E

||

2

H

2

ˆ

= 2

= 2

=

2

Q

ω

+ Q

T

12

B

ˆ

C

ˆ

ω

CP

12

CS

ω

B + D

ω

T

ω

C

T

D

C

ˆ

ω

D

ˆ −

B

ˆ

D

,

T

ω

T

ω

+

,

D

ω .

(4.104a)

(4.104b)

(4.104c)

When di ff erentiating with respect to ˆ

Q

12

depend on ˆ .

Q

ω

and

4.C

Gradient of the Frequency-Limited Case

85

||

E

|| 2

H

2

∂a ij

= tr BB

T

Q

12

∂a ij

B

ˆ

T

ˆ

∂a

ω ij

2 ˆ

ˆ

ω

∂a ij

D

T

D

T

,

(4.105) where

Q

12

∂a ij

in (4.65), and

ˆ

ω

∂a ij

depend on ˆ via the di ff erentiated versions of the equations

A

T

Q

T

12

∂a ij

+

Q

T

12

∂a ij

A +

ˆ

T

∂a ij

Q

T

12

ˆ

T

ω

∂a ij

C

T

C = 0

,

(4.106a)

A

T

ˆ

ω

∂a ij

+

ˆ

ω

∂a ij

ˆ

+

ˆ

T

∂a ij

Q

ω

ˆ

ω

∂a ij

+

ˆ

T

ω

∂a ij

C

T

ˆ

+ ˆ

T

ˆ

ω

∂a ij

= 0

.

(4.106b)

Using Lemma 4.1 on (4.105) with the equations in (4.29) and (4.106) yields

||

E

||

2

H

2

∂a ij

= 2 tr

ˆ

T

∂a ij

Y

T

ω

X

ω

+

ˆ

∂a ij

C

T

C

ˆ

2 tr

ˆ

ω

∂a ij

C

T

CX

D

T

D

T

C

.

(4.107)

What remains is to rewrite the two last terms in (4.107), which includes

ˆ

ω

∂a ij

ˆ

∂a ij

S

ω

,

S

ω

Re

-

i

π

ln

− ˆ −

I

.

and

(4.108) and di ff erentiate with respect to an element in ˆ , i.e.,

a ij

. This yields

ˆ

∂a

ω ij

= Re

= Re

2

π i i

2

π

L

L

− ˆ

− ˆ

I

,

∂a ij

ˆ

I

,

∂a ij

− ˆ

,

I

(4.109) where

L

( A

,

E ) is the Frechét derivative of the matrix logarithm with

L

( A

,

E ) =

0

1

(

t

( A

− I

) +

I

)

1

E (

t

( A

− I

) +

I

)

1 d

t,

see Higham [2008].

(4.110)

The function

L

( A

,

E ) can be e ffi ciently evaluated using the algorithm in Higham

[2008] or Al-Mohy et al. [2012]. Substituting (4.109) into (4.107) and using

(4.110) with the fact that the tr-operator and the integral can be interchanged,

86

4 Model Reduction yields

||

E

||

∂a

2

H

2

,ω ij

= 2 tr

= 2 tr

ˆ

∂a

T

ij

ˆ

T

∂a ij

Q

T

12

P

12

Q

T

12

P

12

2 tr

ˆ

ω

Q

ω

∂a ij

P

ω

D

2 tr

+

T

ˆ

∂a

ˆ

∂a

T

ω ij

D

T

T

ij

C

C

T

C

ˆ

Re

i

π

L

C

T

CP

12

− ˆ −

I

,

V

!

T

= tr

∂a

ˆ

T

ij

2 Q

T

12

P

12

Q

ω

2 W

,

(4.111) where

W

V

= Re

-

i

π

L

C

T

C

ˆ −

− ˆ −

C

T

CP

12

I

,

V

.

T

,

C

T

D

B

T

.

(4.112)

(4.113)

LPV

5

Modeling

In this chapter, local methods to approximate lpv models are developed. The methods use an approach that tries to preserve the input-output relations from the given models in the resulting lpv model. This is done by minimizing the sum of the lpv

H

2

-norms of the di ff erence between the given models and a parametrized model. When developing the methods, large e ff ort is made on making the method computationally e ffi cient. The material in this chapter is largely based on Petersson and Löfberg [2012c].

5.1

Introduction

In the last decades, intensive research has been carried out on linear parametervarying models ( lpv models), see e.g., Rugh and Shamma [2000], Leith and Leithead [2000], Tóth [2008], Lovera et al. [2011] or Mohammadpour and Scherer

[2012]. An important reason for this interest is that it is a powerful tool for modeling and analysis of nonlinear systems, such as aircrafts (see Marcos and Balas

[2004]) or wafer stages (see Wassink et al. [2005]). Some advanced robustness analysis methods, such as IQC-analysis and

μ

-analysis, see e.g., Megretski and

Rantzer [1997], Zhou et al. [1996], require a conversion of the lpv model into a linear fractional representation ( lfr

), see e.g., Zhou et al. [1996]. For this to be possible it is necessary that the parametric matrices A ( p ), B ( p ), C ( p ) and D ( p ) of the lpv model are rational in p . This requirement is often violated in lpv models generated directly from a non-fractional model description, either due to presence of non-fractional parametric expressions or tabulated data in the model.

In both cases, rational approximations must be used to obtain a suitable model.

This motivates a method that both can approximate a nonlinear model with an lpv model and approximate a complex lpv model with a less complex one.

87

88

5 lpv

Modeling

As described in Section 2.1.5, lpv

-models can be described by linear di ff erential equations whose coe ffi cients depend on scheduling parameters,

˙ (

t

) = A ( p ) x (

t

) + B ( p ) u (

t

)

,

y (

t

) = C ( p ) x (

t

) + D ( p ) u (

t

)

,

(5.1)

(5.2) where x (

t

) is the state, u (

t

) and y (

t

) are the input and output signals and p (

t

) is the vector of scheduling parameters. For example, in flight control applications, the components of p (

t

) are typically mass, position of centre of gravity and various aerodynamic coe ffi cients, but can also include state dependent parameters such as altitude and velocity, specifying current flight conditions.

Generation of lpv models can simplistically be divided into two main families of methods, global methods (see e.g., Nemani et al. [1995], Lee and Poolla [1999],

Bamieh and Giarre [2002], Felici et al. [2007], Tóth [2008]) and local methods (see e.g., Steinbuch et al. [2003], Wassink et al. [2005], Lovera and Mercere [2007],

De Caigny et al. [2011], Pfifer and Hecker [2008], De Caigny et al. [2012]). A survey of existing methods can be found in Tóth [2008]. The global methods will only be mentioned briefly, since the main focus will be on local methods.

5.2

Global Methods

In the class of global methods, a global identification experiment is performed by exciting the system while the scheduling parameters change the dynamics of the system. An advantage with this approach, of generating lpv models, is that it is also possible to capture the rate of change of the parameters and how they can vary between di ff erent operating points. However, one drawback is that it is sometimes, for example in some flight applications, not possible to perform such an experiment.

5.3

Local Methods

A

i

C

i

3

N

In the class of local methods, a set of lti models,

M

=

G i

= are interpolated, or in some other way combined, to generate an

B

D

i i

lpv

,

p

i i

=1 model.

,

These local models,

G i

, can, for example, have been identified using a set of inputoutput measurements where the parameters have been kept constant, for which there exists several methods, see e.g., Ljung [1999], or by linearizing a nonlinear model in di ff erent operating points.

In this family of methods it is assumed that the system can operate at di ff erent fixed operating points, where the scheduling parameters are “frozen”. There are of course systems where this is not possible and where this family of methods is inapplicable, requiring the use of global methods. Another drawback with this family of methods is that it does not take time variations of the scheduling parameters into account, thus limiting local methods to systems where the scheduling

5.4

lpv

Modeling using an

H

2

-Measure

89 parameters vary slowly in time, which is a commonly used assumption in gain scheduling, see Shamma and Athans [1992]. To see this more clearly, write the lpv system as

G

( p

,

˙

, . . .

) =

A

C

S

S

(

( p p

)

)

B

D

S

S

(

( p p

)

)

+

A

D

( p

,

C

D

( p

,

˙

, . . .

)

˙

, . . .

)

B

D

( p

,

D

D

( p

,

˙

, . . .

)

˙

, . . .

)

=

G

S

( p ) +

G

D

( p

,

˙

, . . .

)

,

(5.3) where

G

S

( p ) only depends on the current parameter value and does not include any dynamic dependence of the parameters, and namic dependence of the parameters.

G

D

G

D

( p

,

˙

, . . .

) includes all the dyhas the property that

G

D

( p

,

0

,

0

, . . .

) =

0. If the parameters are kept constant and the models,

G i

, are generated

G

( p

i

,

0

,

0

, , . . .

) =

G

S

( p

i

) +

G

D

( p

i

,

0

,

0

, . . .

) =

G

S

( p

i

)

,

one observes that the information in

G

D

is lost. This is one reason why one has to be careful when doing model interpolation. A paper that explains the pitfalls of interpolation is Tóth et al. [2007].

A common drawback of many of the local methods is that they need the local models to be given in the same state-space basis, see e.g., Pfifer and Hecker [2008].

However, the lti models given in

M are related to the true lpv system as

G i

=

A

C

i i

B

D

i i

=

T

1

i

A

C

S

S

( p

( p ) T

)

i

T

i

T

i

1

D

S

B

S

( p

(

) p )

,

for some invertible matrices T

i

eral, assume that the given lti

, which are unknown. Hence, one cannot, in genmodels are described in the same state-space basis.

be able to transform the lti models,

G i

T

i

to

, to a common basis that encourage interpolation, usually some canonical form, see e.g., Steinbuch et al. [2003]. However, these lti models in canonical forms may su ff er from bad numerics. In De Caigny et al. [2012] they solve this problem by fixing one of the given models as a reference model and transforming the other models to state-space bases that are consistent with the reference model.

5.4

LPV

Modeling using an

H

2

-Measure

The methods that will be described in this section are based on the model-reduction techniques introduced in Section 4.4 and are in the family of local methods.

The goal with the methods proposed in this section is to try to preserve the inputoutput relations of the given lti models in

M

, instead of doing direct interpolation of system matrices. Let

G

( p ) denote the true lpv system, then ideally the goal would be to find an lpv model, ˆ ( p ), that is optimal with respect to some global discrepancy measure on the model error, for instance the following integral

90

5 lpv

Modeling

ˆ

( p )

,

min

ˆ

( p )

,

ˆ

( p )

,

ˆ

( p )

G

( p )

− ˆ

( p )

2

H

2

d p

,

(5.4) where

ˆ

( p ) :

˙ (

t

) y (

t

)

=

=

ˆ

( p ) x (

t

B ( p ) u (

t

)

ˆ

( p ) x (

t

D ( p ) u (

t

)

.

(5.5)

This is not always practical or even tractable. In many applications, e.g., flight applications, one often only have a simulation model available or a model that is used for computational fluid-dynamic calculations and not an analytical nonlinear model and it is only possible to extract linearized models for discrete values of the scheduling parameters, p

i

, i.e., we are given the model set

M

=

{

G i

,

p

i

}

N i

=1

.

Having this in mind (5.4) is changed into a discretized, in the parameters, version,

N

ˆ

( p )

,

min

ˆ

( p )

,

ˆ

( p )

,

ˆ

( p )

i

=1

G i

− ˆ

( p

i

)

2

H

2

.

(5.6)

The two most widely used norms in system theory are the

(5.6), the norm that will be used here is the

H

2

- and

H

-norms, both capturing the input-output relation of the system. As indicated in (5.4) and

H

2

-norm (or the frequency-limited

H

2

-norm). The main reason for this choice is, as in Chapter 4, that the cost function, again, becomes di ff erentiable with respect to the optimization variables, with readily computed gradients.

5.4.1

General Properties

Since the lpv methods in this section will be based on the methods in Section 4.4, they also inherit the property that they are invariant under state transformations of the given lti systems. This was useful in the model-reduction scheme since it does not matter in which state basis the given system is described. For the lpv methods, this fact can be utilized again. As explained in Section 5.3, what we are searching for in the local methods is the related to the model set

M as

G

S

( p )-part of the lpv model, which is

M

=

{

G i

,

p

i

}

N i

=1

, G i

=

A

C

i i

B

D

i i

=

T

i

1

C

S

A

S

( p

(

) p

T

)

i

T

i

T

i

1

D

S

B

S

( p

(

) p )

,

where T

i

are some unknown invertible matrices, which, generally, are not related to each other. Since the methods are invariant under state transformations we do not seek to find these other local methods.

T

i

, only

G

S

( p ), which is an advantage compared to most

One thing that has been left out so far, is how the system matrices ˆ ( p ), ˆ ( p ), ˆ ( p ) and ˆ ( p ) are parametrized. These matrices are taken to be linear combinations of some basis functions

w k

( p ), e.g., in the polynomial case, monomials. The system

5.4

lpv

Modeling using an

H

2

-Measure

91 matrices in the lpv model, ˆ ( p ), will then depend on p as

ˆ

( p ) =

w k

( p ) ˆ

(

k

)

,

(5.7a)

k

ˆ

( p ) =

w k

( p ) ˆ

(

k

)

,

(5.7b)

k

ˆ

( p ) =

w k

( p ) ˆ

(

k

)

,

(5.7c)

k

ˆ

( p ) =

w k

( p ) ˆ

(

k

)

,

(5.7d)

k

where the functions

w k

( p ) are design choices that can be hard to choose to not make the model class to restrictive. However, it is not as restrictive as one might think. To see this, start by looking at how an lpv model changes when doing a state transformation, which can depend on the parameters. Given the state transformation

= ¯ ( p ) x

,

(5.8) where ¯ ( p ), in the continuous-time case is a nonsingular continuously di ff erentiable matrix for all valid parameter values, and in the discrete-time case is a matrix rational function of case, given an lpv p and invertible for all model as in (5.3), entails p

k

. For the continuous-time

¯

( p

,

˙

, . . .

+

) =

¯

( p

C

)

S

A

(

S

p

( p ) ¯

) ¯

1

(

1 p )

( p )

¯

( p ) B

S

D

S

(

( p ) p )

¯

( p ) A

D

( p

,

C

D

˙

, . . .

) ¯

1

( p

,

( p ) + ˙¯ ( p ) ¯

1

˙

, . . .

) ¯

1

( p )

( p )

¯

( p ) B

D

D

D

( p

,

( p

,

˙

˙

, . . .

, . . .

)

)

G

S

( p ) + ¯

D

( p

,

˙

, . . .

)

.

(5.9)

Important to note here is that the part

G

S

( p ) is transformed using only a static dependence in the parameters and, hence, it will, after the transformation, still only depend statically on the parameters. This fact can be used to realize that the choices of

w k

( p ) in (5.7) are not as restrictive as one can think. Let us illustrate this with an example.

Example 5.1: E ff ect of State Transformations

Assume samples from the continuous-time lpv model,

G

(

p

) are given.

do not have any dynamic dependence of the parameters, i.e.,

G

(

p

) =

G

S

G

(

p

)

(

p

) =

A (

p

)

C (

p

)

B (

p

)

D (

p

)

, where

A (

p

) =

⎜⎜⎜⎜

⎜⎜⎜⎜

0

.

4

p

0

.

4

p

2

2

+ 3

p

+ 3

.

6

p

3

.

6

3

.

2

1

.

6

p

1

.

6

0

.

4(

p

3

24

p

40)

0

.

2(2

p

3

+3

p p p

2

46

p

0

.

2(8

p

2

p

33

p

5)

10)

0

.

2(27

p

3

+55

p

2

+37

p

160)

0

.

2(27

p

3

0

.

2(23

p p

+23

p

2

p

68

p

10)

p

2

96

p

20)

⎟⎟⎟⎟

⎟⎟⎟⎟

,

92

5 lpv

Modeling

B (

p

) =

C (

p

) =

⎜⎜⎝

⎜⎜⎜⎜

8 + 7

p

6 + 2

p

3

-

+

+

0

.

2 + 0

.

2

p p

2

p

2 ⎟⎟⎠

⎟⎟⎟⎟

,

0

.

2(

9

p

+

p

2

10)

p

D (

p

) = 0

.

0

.

8(

p

+4

p

2

5)

p

.

,

This lpv model does not have any dynamic dependence in the parameter and to be certain to be in the correct model class we can use the basis functions,

{

p

1

,

1

, p, p

2 }

. However, a di ff erent realization of this model is given by

A

B

T

T

(

(

p p

) = ¯

) = ¯

(

(

p p

)

)

A

B (

(

p p

) ¯

) =

1

⎜⎜⎝

⎜⎜⎜⎜

(

p

) =

1 +

p

2 +

3

p

⎟⎟⎠

⎟⎟⎟⎟

⎜⎜⎝

⎜⎜⎜⎜

,

2 +

p

2 + 2

p

8 + 8

p

3 +

4 + 3

p

1 + 5

p p

5 + 2

p

1 + 5

p

2 + 3

p

⎟⎟⎠

⎟⎟⎟⎟

, w k

(

p

) =

C

T

(

p

) = C (

p

) ¯

1

(

p

) = 1 +

p

2 + 2

p

3 + 3

p ,

D

T

(

p

) =

¯

(

p

) =

⎜⎜⎝

⎜⎜⎜⎜

D (

p

) = 0

,

5

p

0

0

p

0

1

2

1

⎟⎟⎠

.

Obviously, in this realization, the model is only a ffi ne in

p

. This means that it is also possible to find the correct model using only the basis functions

{

1

, p

}

.

In the example above, it can be observed that the choice of

w k

( p ) can sometimes be a little forgiving, since we have methods that are invariant to the state basis that the given models are represented in.

5.4.2

The Optimization Problem

The general optimization problem that will be studied can be written as

N

minimize

A

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

i

=1

W o,i

G i

− ˆ

( p

i

)

W i,i

2

H

2

=

N

minimize

(

k

)

,

B

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

i

=1

W o,i

E i

W i,i

2

H

2

, E i

=

G i

− ˆ

( p

i

)

.

(5.10)

To study the problem in (5.10), start by looking at the case when there is only one model and see what can be concluded. This problem becomes, almost, identical to the problems in Section 4.4. The only di ff erence is that the system matrices

5.4

lpv

Modeling using an

H

2

-Measure

93

A

(

k

)

B

(

k

)

,

C

(

k

)

D

(

k

) enter linearly in ˆ ( p ), ˆ ( p ), ˆ ( p D ( p ) which makes it easy to express the gradient in the new variables instead, for example,

2

∂ W o,i

G i

− ˆ

( p

i

)

W i,i

ˆ

(

k

)

H

2

= 2

w k

( p

i

E

T

Q

E,i

P

E,i

ˆ

.

(5.11)

Now returning to the original problem, (5.10), when having a number of lti models given, instead of just one. This is also a simple extension of the problems in Section 4.4, since this is a sum of the

H

2

-norm over a number of lti models, which yields the structure

+

i

W o,i

G i

ˆ

(

k

)

ˆ

( p

i

)

W i,i

2

H

2

=

N i

=1

∂ W o,i

G i

ˆ

ˆ

( p

i

)

(

k

)

W i,i

2

H

2

= 2

N i

=1

w k

( p

i

T

Q

E,i

P

E,i

ˆ

.

(5.12)

When converting the model-reduction methods in Section 4.4 into lpv methods, they will not only inherit the properties, but also the prerequisites of the methods, that is, when extending all the methods, it is required that the given in

M lti models are all asymptotically stable, and additionally for the continuous-time case and the methods in Section 4.4.1 and Section 4.4.2, the lti models require the error system to be strictly proper, i.e., of finding ˆ

D

i

= ˆ ( p

i

). For these methods, the problem

D ( p ) can be seen as a separate problem.

Before stating the necessary conditions for optimality for the proposed lpv methods (derived from the model-reduction methods), some notation has to be established. The given systems,

G i

in the set

M are assumed to have the realizations

G i

=

A

i

C

i

B

i

D

i

,

(5.13) and correspond to the parameter values p

i

. The notation and partitioning will be the same as in Section 4.4, with the exception that all variables will have a subscript

i

corresponding to the parameter value considered, p

i

. Only the necessary conditions for the continuous-time cases are stated, since most of the details are covered in Section 4.4 and the discrete-time cases are analogous with the continuous-time case. From the necessary conditions for optimality, the expressions for the gradients can be readily extracted to be used in, for example, a quasi-Newton algorithm.

The necessary conditions for optimality for the

Section 4.4.1 can be stated as follows.

lpv version of the method in

Theorem 5.1 (Necessary conditions for optimality).

and W o,i are asymptotically stable and that E i

Assume that G i is strictly proper, for the

,

H

2

i

, W i,i

-norm

94

to be defined, i.e., all

A

i

,

A

i

,

A

i,i i . In order for the matrices and

A

o,i

A

(

k

)

,

B

are Hurwitz and

(

k

)

,

C

(

k

)

D

o,i

D

i

D

i

D

i,i to be optimal for the problem

= 0

for

minimize

A

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

||

E i

|| 2

H

2

, E i

=

W o,i

G i

i i it is necessary that they satisfy the equations

A

E,i

P

E,i

A

T

E,i

Q

E,i

+ P

E,i

A

T

E,i

+ Q

E,i

A

E,i

+

+

B

E,i

B

T

E,i

C

T

E,i

C

E,i

= 0

,

= 0

, for all i

:s, and that

||

E

||

2

H

2

ˆ

(

k

)

= 2

||

E

|| 2

H

2

ˆ

(

k

)

= 2

||

E

||

2

H

2

ˆ

(

k

)

=

2

i i i w k

( p

i

E

T

Q

E,i

P

E,i w k

( p

i

E

T

w k

( p

i

)

Q

E,i

B

T

o

E

T

o

P

Q

E,i

E,i

=

E

i

P

0

C

i

T

E,i

,

+

+ Q

E,i

B

E,i

D

i

D

T

o

C

E,i

P

E,i

W i,i

,

=

=

0

,

0

,

(5.14)

(5.15a)

(5.15b)

(5.16a)

(5.16b)

(5.16c)

where

=

⎜⎜⎜⎜

⎜⎜⎜⎜ 0

n

×

ˆ

⎜⎜⎝

I

0

0

n

ˆ

×

n n o i

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

i

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

0

I

0

n

×

ˆ

n i n

ˆ

×

n o

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

,

E

o

=

⎜⎜⎜⎜

⎜⎜⎜⎜

⎜⎜⎝

0

0

0

I

n n

×

ˆ

ˆ

×

n n o i

×

ˆ

×

ˆ

⎟⎟⎟⎟

⎟⎟⎟⎟

⎟⎟⎠

.

(5.17)

Proof: The proof is analogous with Theorem 4.2.

5 lpv

Modeling

The necessary conditions for optimality for the lpv version of the method presented in Section 4.4.2, the robust extension, can be stated as

Theorem 5.2 (Necessary conditions for optimality).

and

W o,i are asymptotically stable and that to be defined, i.e.,

A

i

,

A

i

,

A

i,i and

A

o,i

E i

Assume that

D

i

D

i

G i is strictly proper, for the are Hurwitz and

D

o,i

D

,

H

i,i

G i

2

-norm

=

, W

0

i,i for all i . In order for the matrices

A

(

k

)

,

B

(

k

)

,

C

(

k

)

to be optimal for the problem

A

(

k

) min

,

ˆ (

k

)

,

ˆ (

k

)

i

||

E i

||

2

H

2

+

V rob

,

V rob

= 2

A

Q

i

P

i

+ Q

12

,i

P

T

12

,i

F i

+ 2

B

Q

i

B

i

+ Q

12

,i

B

i

F it is necessary that they satisfy the equations in

+ 2

C

(5.15)

C

i

P

i

(for

W

C

i i,i

P

=

T

12

,i

W o,i

F

,

(5.18)

=

I

) and

5.4

lpv

Modeling using an

H

2

-Measure

95

the equations

T

i

W

1

,i

A

i

W

2

,i

A

i

+

+

W

1

,i

A

i

W

2

,i i

T

+ Q

+

T

12

,i

Q

i

P

i

Q

i

+

P

i

+ Q

12

,i

P

T

12

,i

Q

12

,i

P

T

12

,i

P

12

,i

W

3

,i

+ W

3

,i i

T

+ Q

i

B

i

+ Q

12

,i

B

i i

T

i

T

W

4

,i

+ W

4

,i

A

i i

T

C

i

P

T

12

,i

C

i

P

i

= 0

,

= 0

,

= 0

,

= 0

.

(5.19a)

(5.19b)

(5.19c)

(5.19d)

for all i and that

||

E

|| 2

H

2

ˆ

(

k

)

||

E

||

2

H

2

ˆ

(

k

)

||

E

||

2

H

2

ˆ

(

k

)

+

∂V rob

ˆ

(

k

)

+

+

∂V rob

ˆ

(

k

)

∂V rob

ˆ

(

k

)

= 0

,

= 0

,

= 0

.

(5.20a)

(5.20b)

(5.20c)

With

∂V

ˆ

∂V

ˆ

∂V

ˆ

rob

(

k

)

rob

(

k

)

rob

(

k

)

= 4

= 4

=

4

i i i w k

( p

i

)

A

W

1

,i

Q

i

P

12

,i

P

i

+

+

Q

Q

12

,i

T

12

,i

P

W

T

12

,i

2

,i

F

+

B

Q

i

Q

T

12

,i

B

i

W

+ Q

3

,i

12

,i

B

i w i

( p

i

)

A

Q

i

P

i

+

W

1

,i

+ Q

B

i

12

,i

P

T

12

,i

F

B

Q

T

12

,i

Q

i

Q

B

i i

B

i

+

+ Q

Q

12

,i

12

,i i

B

i

F w k

( p

i

)

A

Q

i

P

i

C

+

i

W

2

,i

Q

12

,i

P

T

12

,i

F

F

+

B

Q

i

B

i

C

i

+

W

3

,i

Q

12

,i

B

i

+

+

C

C

C

C

i i

P

W

4

,i

P

i i

P

i

12

,i

P

T

12

,i

W

4

,i

C

i

B

i

P

T

12

,i

F

F

F

+

C

C

i

P

C

i i

P

i

C

i

P

T

12

,i

C

i

P

T

12

,i

P

12

,i

F

.

,

, and

||

E

||

ˆ

2

H

(

k

)

2

,

||

E

||

ˆ

2

H

(

k

)

2

and

||

E

||

ˆ

2

H

(

k

)

2

as in

(5.16)

.

Proof: The proof is analogous with the proof for Theorem 4.6.

For the lpv version of the frequency-limited method, described in Section 4.4.3, the necessary conditions for optimality can be stated as

96

5 lpv

Modeling

Theorem 5.3.

ited

H

2

Assume that all G i and

-norm to be defined, i.e., all the matrices

A

(

k

)

,

B

(

k

)

,

C

(

k

)

and

(

k

)

A

i

G i are asymptotically stable, for the limand

A

i are Hurwitz for all to be optimal for the problem i . In order for

minimize

ˆ

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

,

ˆ

(

k

)

||

E i

||

2

H

2

, E i

=

G i

i

, i where

+

+

i i

||

tions in (4.65) and the equations in (4.29) for all

||

||

E

E

ˆ

E

ˆ

i i i

||

||

(

k

)

||

(

k

)

2

H

2

2

H

2

2

H

2

,ω is defined in Chapter 3, it is necessary that they satisfy the equa-

=2

=2

i i w w k k

(

( p p

i i

)

)

Q

Q

T

12

,ω,i

ω,i

B

i

P

+

12

,i

Q

Q

T

12

,ω,i

ω,i

B

i i and that

P

i

T

ω,i

W

i i

T

=

D

i

0

,

i

= 0

,

(5.23a)

(5.23b)

+

i

+

i

||

||

E

ˆ

E

ˆ

i

(

k

)

i

||

||

(

k

)

2

H

2

2

H

2

=2

=

2

i i w k

(

w

p

k i

(

) p

i

C

)

i

-

P

C

ω,i i

S

ω,i

C

B

i i

P

+

12

,ω,i

D

i

ω

π

D

C

i i

S

D

ω,i i

B

i

i

T

D

T

ω,i i

ω

π

.

=

=

0

,

0

,

(5.23c)

(5.23d)

where

W

i

V

i

-

i

= Re

T

i

C

i

π

P

i

L

− ˆ

i i

T

C

i iω

I

P

12

,i

,

V

i

.

T

i

T

,

D

i

i i

T

.

(5.22)

(5.23e)

(5.23f)

With the function see Higham [2008].

L

(

· , ·

)

being the Frechét derivative of the matrix logarithm,

Proof: The proof is analogous with the proof for Theorem 4.8.

Low Rank Coefficient Matrices

For some applications it can be preferable to be able to control the rank of some of

A

(

k

)

,

B

(

k

)

,

C

(

k

) (

k

)

. See, for instance, the example in Section 7.1, where this is important.

One way of controlling the rank of the coe ffi cient matrices, is to parametrize them as

A

B

C

(

k

)

(

k

)

(

k

)

= V

(

k

)

A

W

= V

(

k

)

B

W

= V

(

k

)

C

W

(

k

)

T

A

(

k

)

T

B

(

k

)

T

C

,

,

.

(5.24a)

(5.24b)

(5.24c)

If, for example, it is assumed that the resulting lpv model should have

n r

states,

5.5

Computational Aspects of the Optimization Problems

97

A

R

(

k

)

n r

×

n k

R

n r

×

n

and

r

,

W

A k

(

k

)

, and the rank of the matrix ˆ

∈ R

n r

×

n k

(

k

) should be

n k

< n r

, then V

(

k

)

A

∈ is chosen. This type of parametrization have, with success, been used in, for example, Burer and Monteiro [2003] for semidefinite programs.

If this new parametrization is introduced, the only change in Theorem 5.1, Theorem 5.2 and Theorem 5.3, will be a small change in the gradients. For example,

A

(

k

) in Theorem 5.1 was computed as

||

E

||

2

H

2

ˆ

(

k

)

= 2

w k

( p

i

T

Q

E,i

P

E,i

ˆ

.

i

The new equations for the gradient, given the parametrization in (5.24), would be

||

E

|| 2

H

2

V

(

k

)

A

||

E

|| 2

H

2

W

(

k

)

A

= 2

= 2

i i w w k k

( p

( p

i i

)

T

Q

E

T

E,i

Q

P

E,i

E,i

P

EW

E,i

(

k

)

A

T

,

V

(

k

)

A

.

The equations for V

(

k

)

B

, W

(

k

)

T

B

, V

(

k

)

B

and W

(

k

)

T

C

follow analogously.

(5.25a)

(5.25b)

5.5

Computational Aspects of the Optimization

Problems

In this section, as in Section 4.5, an initialization will be suggested and again how to use both the structure in the variables and the equations to speed up the computations is shown.

As with the methods in Section 4.5, both the cost functions and gradients are given for the lpv methods and it is straightforward to use, for example, any quasi-

Newton solver to solve the optimization problem.

5.5.1

Structure in Variables and Equations

What was explained in Section 4.5.1, about structure in the sought system matrices, is applicable, with the same motivation, for the lpv methods. Hence, it is easy to impose structure in the system matrices, e.g., block-diagonal A -matrix.

In Section 4.5.3, it was explained how to use the inherent structure of the equations in the problem to, more e ffi ciently, compute the Lyapunov/Sylvester equations that is needed to compute the cost function and the gradient. For the modelreduction case it was possible to reduce the complexity for every iteration to

O

n

2

+

n

2

. For the lpv case, the same structure can be utilized for every lti be model in

O

N n

2

M

+

n

and iteration. This means that the complexity per iteration will

ˆ

2

, where

N

is the number of lti models in

M

.

98

5 lpv

Modeling

5.5.2

Initialization

A subject that needs more attention though, is the initialization. It is assumed, for the initializations described here, that one basis function for the sought system matrices is

w k

( p ) = 1, i.e., there is a constant term in the parametrization (5.7). A simple initialization is to use one of the given models in

M and set the constant matrix coe ffi cient terms to this model.

As with the model reduction problem, a bit more can be done in then case when there are no input or output filters. The cost functions in this case becomes

||

E i

||

2

H

2

= tr B

i

T

Q

i

B

i

+ 2 B

i

T

Q

12

,i

B

i i

T

i

B

i

,

(5.26a)

i i

||

E i

||

2

H

2

= tr C

i

P

i

C

i

T

2 C

i

P

12

,i

T

i

C

i

P

i i

T

.

(5.26b)

i i

A

(

k

)

B

(

k

)

C

(

k

) (

k

)

C

(

k

)

(or

C

(

k

)

) matrices are fixed. First, to have a system to start from, any of the given lti models in

M

= . Now set

ˆ

( p ) = ˜ , i.e., choose ˆ to be a constant matrix that does not depend on do the same thing for ˆ . The problem of finding ˆ

(

k

) p , and is now a quadratic problem which can be solved as explained below.

B

i

can be written as

B

i

= where size.

I

w w k

1

( p

i

( p

i

)

I

w

2

( p

×

i

-

)

I

B

(1)

w

3

T

( p

i

)

B

(2)

. . .

T

I

w

N w

( p

i

)

B

(3)

T

. . .

(

N w

)

T

.

T

i

¯

,

(5.27) are defined in (5.7) and

I is the identity matrix of compatible

Now rewrite the cost function (5.26a) as

V

= tr B

i

T

Q

i

B

i

+ 2 B

i

T

Q

12

,i

B

i i

T

i

Q

i

B

i

= tr B

i

T

Q

i

B

i

+ 2 B

i

T

Q

12

,i

p

i

¯

+ ¯

T

i

= tr

⎜⎝

⎜⎜⎜⎜

-

= tr

i

b

1

+

B

T

Q

i

B

i

b

2

+

+

⎢⎣

⎢⎢⎢⎢

2

1

2

B

T b

3

i

B

.

.

T

i

Q

12

,i i

⎥⎦

⎥⎥⎥⎥

i

T

¯

+

Q

i i

1

2

B

T

⎢⎣

⎢⎢⎢⎢

2

i i

T

Q

i i

The solution to the problem min

¯

V

, which always exists since b

3

⎥⎦

⎥⎥⎥⎥

B

⎟⎠

(5.28) is positive

5.6

Examples

99 semidefinite, is the solution to the linear system of equations b

3

=

− b

T

2

.

(5.29)

C

(

k

)

; define

= C

(1)

,

C

(2)

, . . . ,

(

N w

)

B

(

k

) that was found solving the quadratic problem described above. Now, the equations are obtained, where c

1

=

V

+

i

= tr

C

i

P

-

i

c

1

C

i

T

+

,

c c

2

2

C

=

T

+

2

1

2

+

Cc

i

3

C

i

C

P

T

.

12

,i

p

i

and c

3

= 2

+

i

(5.30)

i

T

P

i i

The solution to the quadratic problem in this case, which also always exists since

.

c

3 is positive semidefinite, is the solution to the system of linear equations

Cc

3

=

− c

2

.

(5.31)

These are suggestions for finding initial values for ˆ B and ˆ .

A

(

k

)

B

(

k

)

C

(

k

) matrices can be controlled, the initialization strategy above has to be used with caution. In the above strategy, using the parametrization

ˆ

( p ) = ˆ

(1)

+

w k

( p ) V

(

k

)

A

W

(

k

)

T

A

,

(5.32)

k

(

k

)

W

(

k

) are initialized as matrices with all zeros. Looking at (5.25), it can

V

(

k

)

A

W

(

k

)

A

(

k

) to stay zero

W

(

k

) to zero and the other one to a matrix with random values, or more generally to two orthogonal matrices. This will avoid the problem described above.

5.6

Examples

In this section, an illustrative example to shed light on some properties of the proposed methods will be presented. A larger more extensive example using the methods in this chapter will be presented in Chapter 7, since it requires more background material.

When solving the example, the function fminunc in

M atlab is used as the quasi-

Newton solver framework. To generate a starting point for the solver, which is an extremely important problem in need of significant amounts of research, the initialization procedure explained in Section 5.5.2 is used.

As a comparison, a method that will be called smile is used. The method is described in detail in De Caigny et al. [2012]. This method uses interpolation of the system matrices, by first changing all the given lti models to a common basis and then do a standard interpolation of the elements in the system matrices.

100

Example 5.2: Small lpv

Approximation Example

5 lpv

Modeling

To show the potential of the lpv approximation and illustrate the importance of addressing system properties, a small example is studied.

The system in this example is defined by a connection of two second-order systems, i.e., a system with four states, with parameter dependent damping,

G

ζ

1

=

G

1

G

2

,

= 0

.

1 + 0

.

9

p, ζ

2

1 where

G

1

= 0

.

=

s

1 + 0

2

.

+ 2

ζ

1

9(1

s

+ 1

p

)

, p

, G

2

[0

,

=

1]

.

s

2

9

+ 6

ζ

2

s

+ 9

,

(5.33a)

(5.33b)

The system was sampled by selecting 10 equidistant points in

p

∈ linear models with four states each are given as data to the method.

[0

,

1], i.e., 10

The data is given in a state basis where all the lti models are balanced. The elements in the system matrices happen to depend nonlinearly on the parameter

p

, see the gray dashed lines in Figure 5.1. The interesting and obvious property of this example is that there exists state bases (for example, observable canonical form) for which the model has a ffi ne dependence on of the system matrix A are a ffi ne in

p p

; in fact only two elements while all other matrix elements in A

,

B and

C are constants, see the black solid lines in Figure 5.1.

The method h

2 nl will be used with an a ffi ne parametrization with respect to the parameters, and we investigate if it is possible to find a representation of the true system with this structure, given the data where the individual elements in the system matrices depends nonlinearly on the parameter. Additionally, the method h

2 nl

, where we control the rank in ˆ

(1)

, where ˆ (

p

) = ˆ

(0)

+

p

ˆ

(1)

, will also be used. We will choose the rank to be two, since there exists a state-space basis where only two elements of the system matrix A are a ffi ne in

p

and all other elements are constant, see the black lines in Figure 5.1.

From the results in Table 5.1 it can be observed that when h

2 nl with rank 4 and rank 2, a high accuracy low order (indeed a ffi ne) the system can be found.

is used, both lpv model of

Using smile with an a ffi ne parametrization, a much worse model is obtained.

Achieving comparable results using the smile strategy requires polynomials of order two. To further illustrate the accuracy, 100 validation points are generated from (5.33) and the relative

H

2

-norm for the error model in these points is shown in Figure 5.2.

In this example, the importance of addressing the behavior of the system instead of interpolating the system matrices can be seen. First of all, it is hard to find base transformations such that all the given lti models are represented in the same basis (called a coherent state basis in De Caigny et al. [2012]), and second you cannot control how the system depends on the parameters in this basis, as is illustrated with the smile method using di ff erent orders in the polynomial of the parameter.

5.7

Conclusions

101

10

0

10

10

0

10

10

0

10

10

0

10

0 0

.

5 1 0 0

.

5 1 0 0

.

5 1 0 0

.

5 1

Figure 5.1: lpv system

The elements in the

(5.33)

A

-matrices as function of p for the four state for two di ff erent state bases. The gray dashed lines represents the elements in the

A

-matrix when any lti model extracted is given in a balanced form. For this state basis, the elements depend nonlinearly on p , This is also the basis for which the lti models that are given as data are extracted from. The black lines represents the elements in the another state basis when only two elements depend a ffi ne on p

A

-matrix for and the rest are constant. This state basis is shown here to show that there exist another, input-output equivalent, system which has a simple structure.

5.7

Conclusions

In this chapter, new local methods for computing an lpv model, given a set of lti models are proposed. These methods use a nonlinear optimization approach that is based on the model-reduction techniques in Chapter 4. The proposed methods try to preserve the input-output behavior of the given systems by minimizing the

H

2

-norm of the error systems. The cost functions and their gradients are derived to be computationally e ffi cient. This enables us to have a measure of first order optimality and to e ffi ciently use standard quasi-Newton solvers to solve the problem. The method has been shown to work both conceptually, on small examples, and on real-world examples, as we will see in Chapter 7.

There are two main advantages with the proposed methods, compared to existing local methods. The first one is that it is possible to impose structure in the elements in the system matrices. The other one is that the method tries to capture the input-output behavior of the given systems. However, this comes at the price of computational burden, which makes the method slower than many existing local methods. The fact that the methods consider the input-output behavior, using the

H

2

-norm, implies that the method is invariant to which state-space bases

102

5 lpv

Modeling

0

.

4

0

.

2

0

Table 5.1:

Method h

2 nl

, rank 2 h

2 nl

, rank 4 smile smile

i

||

E i

||

1

.

44 · 10

2

.

54 · 10

H

2

4

5

2

.

54 · 10

13

6

.

70

Degree

1

1

2

1 smile

, degree 1

· 10

14 smile

, degree 2

1

.

5

1

0

.

5

0

0

.

8

0

.

6

0

.

4

0

.

2

1

· 10 h

6

2 nl

, degree 1, rank 4

0 0

.

2 0

.

4

p

0

.

6 0

.

8

6

4

2

1

0

0

· 10 h

6

2 nl

, degree 1, rank 2

0

.

2 0

.

4

p

0

.

6 0

.

8 1

Figure 5.2:

The figure illustrates the relative

H

2

-norm of the error system in

100 validation points for the di ff erent methods. Note the di ff erent scales and that it takes a polynomial of order two using the smile approach to obtain a satisfactory result, as with the proposed method using an a ffi ne function.

the given local lti models are represented in and even how many states the given models have. It also implies that it is possible to find an lpv model with low dependence on the parameters, despite apparently complex dependence of the parameter.

6

Controller Synthesis

Let us start by quoting a sentence from Syrmos et al. [1997]: “

The static output feedback problem is one of the basic problems in feedback design, which, in the multivariable case, is still analytically unsolved.

” In Blondel and Tsitsiklis

[1997] they show that the static output-feedback stabilization problem is indeed

NP-hard if one constrains the coe ffi cients of the controller to lie in prespecified intervals. They also conjecture that already the unconstrained problem is NP-hard.

This chapter does not include a revolutionary solution to this problem, instead it proposes a computational method for finding locally optimal solutions to the mentioned problem and as will be shown, the method works for medium-scale systems and for controllers that have structural constraints. A method for synthesizing controllers for lpv systems, based on the first method, is also presented.

The methods use, as the methods in the previous chapters, a general nonlinear optimization approach.

6.1

Overview

The problem of finding an unstructured state-feedback

H

2 or

H

∞ controller is well known to be a problem that, under certain assumptions, see, e.g., Zhou et al.

[1996], easily can be solved. However, the problem of finding a static outputfeedback

H

2

(or

H

) controller is generally a non-convex problem and not solved as easily. The problem of finding an

H

2 controller is closely related to the problem of finding an optimal controller with a quadratic performance criterion. This problem was introduced in Kalman [1960] and has been studied since then. The problem has been attacked in di ff erent ways, both using direct general-purpose minimization, see, e.g., Rautert and Sachs [1997], and using semidefinite pro-

103

104

6 Controller Synthesis grams ( sdp

) see, e.g., Stingl [2006]. These methods can handle problems of moderate sizes but can experience problems already for small-scale systems.

sdp has been a hot topic during the last years, but the problem with the sdp approach is that it scales badly with the dimension of the problem. When formulating this particular optimization problem, of finding a reduced-order controller, it involves bilinear matrix inequalities ( bmi s) that makes the problem even more di ffi cult to solve, see Mesbahi et al. [1995]. Another approach that very recently has been published is the more direct approach in Lin et al. [2009] (and Fardad et al. [2009]) that formulates the problem as a general nonlinear optimization problem and uses a dedicated quasi-Newton algorithm to solve the problem. The first method presented in this chapter resembles closely the method presented in Lin et al. [2009], but has been independently derived with an, in our opinion, more straightforward derivation. The main focus in Lin et al. [2009] is on the ability to create structured controllers, e.g., interconnected systems subject to architectural constraints on the distributed controller. In this chapter the main goal is to find a method that is applicable to medium-scale systems and is expandable to a framework for creating robust

H

2 controllers or controllers for lpv system, e.g., controllers for systems with parametric uncertainties. The first method is then extended to handle controller synthesis for lpv systems, much as how the methods in Chapter 5 are extensions of the methods in Chapter 4.

The methods proposed in this chapter, for controller synthesis, both for lpv lti and systems, will of course have at least two drawbacks. The first one is that the methods need a stabilizing controller to be able to start the optimization and finding a stabilizing controller is most likely an NP-hard problem. The second one is that, given a stabilizing controller, the problem of finding a static outputfeedback

H

2 controller is a non-convex problem, therefore the proposed methods can not guarantee to find a globally optimal controller but only a locally optimal one.

6.2

Static Output-Feedback

H

2

-Controllers

In this section, a method for synthesizing static output-feedback for lti

H

2 controllers systems will be presented, and as explained in Section 2.1.4, this method can also be used to synthesize reduced-order controllers. The proposed method will, as the methods presented in Chapter 4 and Chapter 5, be based on minimizing the

H

2

-norm.

The goal with the optimization problem in this section is to formulate an optimization problem for synthesizing a static output-feedback controller. When formulating this optimization problem, great care need to be taken when deriving the expression for the cost function and its gradient to make sure that the expressions can be evaluated e ffi ciently. The method presented in this section is designed to work on medium-scale systems, which will be shown later, and it also works with structural constraints in the controller.

As described in Section 2.1.4, the model that will be used to measure the perfor-

6.2

Static Output-Feedback

H

2

-Controllers

105 mance of a system is

⎜⎜⎝

⎜⎜⎜⎜ x

⎞ z y

⎟⎟⎟⎟

⎟⎟⎠

=

⎜⎜⎝

⎜⎜⎜⎜

A

C

1

C

2

B

1

D

11

D

21

B

2

D

12

0

⎟⎟⎠

⎟⎟⎟⎟

⎜⎜⎝

⎜⎜⎜⎜ x w

⎞ u

⎟⎟⎠

,

(6.1) where x

∈ R

n x

control signal, signal.

is the state vector, z

∈ R

n z

w

∈ R

n w

the disturbance signal, the performance measure and y

∈ R

n y

u

∈ R

n u

the the measurement

Closing the loop with a static output-feedback controller, u = matrix describing the controller, yields the closed-loop system

Ky , where K is a

T w,z

=

A

T

C

T

B

T

D

T

=

C

A +

1

+

B

2

KC

2

D

12

KC

2

B

1

D

11

+

+

B

2

KD

21

D

12

KD

21

.

(6.2)

Now, let us formulate the optimization problem of minimizing the the closed-loop system from w to z ,

T w,z

, in (6.2), i.e.,

H

2

-norm of min

K

T w,z

2

H

2

.

(6.3)

Since the equations will di ff er in continuous and discrete time but the general ideas are the same, both versions will be presented but with less detail in the discrete-time case.

6.2.1

Continuous Time

For the

H

2

-norm to be defined, the system and strictly proper, i.e., A + B

2

KC

2

T w,z

has to be asymptotically stable has to be Hurwitz and D

11

+ D

12

KD

21

=

0 . Note that already the problem of finding a K that stabilizes the system is, as explained in the beginning of this chapter, most likely an NP-hard problem.

Because of this, for the rest of the chapter, if nothing else is mentioned, it will be assumed that K stabilizes the system.

To compute the cost function for the optimization problem (6.3), the cost function have to be expressed in a more suitable form for evaluation. Using (2.21), the cost function for the optimization problem (6.3) can be expressed as

T w,z

2

H

2

= tr B

T

T

Q

T

B

T

= tr C

T

P

T

C

T

T

,

where Q

T

and P

T

satisfy the Lyapunov equations

A

T

P

T

A

T

T

Q

T

+

+ P

T

A

T

T

Q

T

A

T

+

+

B

T

B

T

T

C

T

T

C

T

= 0

,

= 0

.

(6.4a)

(6.4b)

(6.5a)

(6.5b)

Now, with the equations in (6.4) and (6.5) it is possible to state necessary conditions for optimality for (6.3). In the theorem below, which states the necessary conditions for optimality, the gradient of the cost function for the optimization

106

6 Controller Synthesis problem (6.3) can be readily extracted to be used in, for example, a quasi-Newton algorithm.

Theorem 6.1 (Necessary conditions for optimality).

(6.1)

Given a system and a static output-feedback controller, described by the matrix

G as in

K

, such that

u = Ky

. The system G and the controller are given such that the closedloop system,

Hurwitz and

T w,z

D

T in

=

(6.2)

0

, is asymptotically stable and strictly proper, i.e.,

. In order for the matrix

(6.3), it is necessary that

K

K

satisfies the equations in (6.5) and that

A

T is to be optimal for the problem

∂ T w,z

K

2

H

2

= 2 B

T

2

Q

T

P

T

C

T

2

+ B

T

2

Q

T

B

T

D

T

21

+ D

T

21

C

T

P

T

C

T

2

= 0

.

(6.6)

Proof: If A

T

is Hurwitz, then the equations in (6.5) are uniquely solvable. These are needed to compute the cost function and its gradient. Now the gradient of the cost function with respect to K has to be computed. Let

(

i, j

) in K . First di ff erentiate (6.5b) with respect to

k ij k ij

denote element

, which will be needed later on, which entails

A

T

T

Q

T

∂k ij

+

Q

T

∂k ij

A

T

+

A

T

T

∂k ij

Q

T

+ Q

T

A

T

∂k ij

+

C

T

T

∂k ij

C

T

+ C

T

T

C

T

∂k ij

= 0

.

Now di ff erentiate the cost function (6.4a) with respect to

k ij

,

(6.7)

∂ T w,z

∂k ij

2

H

2

= 2 tr

B

T

T

∂k ij

Q

T

B

T

+ tr

Q

T

∂k ij

B

T

B

T

T

.

(6.8)

Using Lemma 4.1 on the equation above together with equations (6.5a) and (6.7) entails

∂ T w,z

∂k ij

2

H

2

= 2 tr

A

T

T

∂k ij

Q

T

P

T

+

B

T

T

∂k ij

Q

T

B

T

+

C

T

T

∂k ij

C

T

P

T

.

(6.9)

Using the structure of the variables A

T

, B

T

and C

T

in (6.2) and Lemma 4.2 yields

∂ T w,z

K

2

H

2

= 2 B

T

2

Q

T

P

T

C

T

2

+ B

T

2

Q

T

B

T

D

T

21

+ D

T

12

C

T

P

T

C

T

2

.

(6.10)

6.2

Static Output-Feedback

H

2

-Controllers

107

For the optimization problem (6.3) it is also quite straightforward to derive the

Hessian, in the same manner as deriving the gradient, ending up in

2

T w,z

∂k ij

∂k kl

2

H

2

= 2 tr

+ B

T

2

Q

T

P

∂k

T kl

C

K

T

∂k

T

2

ij

+

B

T

2

Q

T

∂k

B

T

2

Q

T

B

kl

2

B

T

K

∂k kl

D

T

21

D

21

+

D

D

T

21

T

21

+

C

T

P

∂k

T kl

D

T

21

D

21

C

T

2

+

K

∂k kl

B

T

2

C

2

Q

∂k

P

T

T

P

T kl

C

T

2

C

T

2

= 2 D

T

12

C

T

P

∂k

T kl

C

T

2

ij

+ 2 D

T

12

C

T

P

T

∂k ij

C

T

2

kl

+ 2 B

T

2

Q

T

P

∂k

T kl

C

T

2

ij

+ 2 B

T

2

Q

T

P

T

∂k ij

C

T

2

kl

+ 2 B

T

2

Q

T

B

2

ik

D

21

D

T

21

lj

+ 2 D

T

12

D

12

ik

C

2

P

T

C

T

2

lj

.

(6.11)

6.2.2

Discrete Time

In discrete time, for the totically stable, i.e., A +

H

2

B

2

-norm to be defined, the system

KC

2

T w,z

must be asymphas to be Schur. To compute the cost function in

(6.3) for discrete-time systems the equations

T w,z

2

H

2

= tr B

T

T

Q

T

B

T

= tr C

T

P

T

C

T

T

+ tr D

T

D

T

T

+ tr D

T

T

D

T

,

(6.12a)

(6.12b) can be used, where Q

T

and P

T

satisfy the discrete-time Lyapunov equations

A

T

A

T

T

P

Q

T

T

A

A

T

T

T

P

T

+ Q

T

+ B

T

+ C

T

T

B

T

T

C

T

=

=

0

0

.

,

(6.13a)

(6.13b)

Theorem 6.2 (Necessary conditions for optimality).

(6.1)

Given a system and a static output-feedback controller, described by the matrix

G as in

K

, such that

u = Ky

. The system G and the controller are given such that the closed-loop system, matrix

T w,z

K

in

(6.2)

, is asymptotically stable, i.e.,

A

T is Schur. In order for the to be optimal for the problem (6.3), it is necessary that it satisfies the equations in (6.13) and that

∂ T w,z

K

2

H

2

= 2 B

T

2

Q

T

A

T

P

T

C

T

2

+ B

T

2

Q

T

B

T

D

T

21

+ D

T

21

C

T

PC

T

2

+ D

T

12

D

T

D

T

21

= 0

.

(6.14)

Proof: The proof is analogous to the proof for Theorem 6.1

As in the continuous-time case it also here possible to compute the Hessian,

108

6 Controller Synthesis which becomes

2

T w,z

∂k ij

∂k kl

2

H

2

= 2 D

T

12

C

T

P

∂k

T kl

C

T

2

ij

+ 2 D

T

12

C

T

P

T

∂k ij

C

T

2

kl

+2

+ 2

B

T

2

Q

B

T

2

Q

T

A

T

P

T

∂k kl

C

T

2

T

B

2

ik

D

21

D

T

21

ij lj

+ 2

+2 D

B

T

2

Q

T

A

T

T

12

D

12

ik

P

∂k

T ij

C

T

2

C

2

P

T kl

C

T

2

+ 2

lj

+2

B

T

2

Q

T

B

2

D

T

12

D

12

ik ik

C

2

P

T

C

T

2

lj

D

21

D

T

21

.

lj

(6.15)

6.3

Static Output-Feedback

H

2

LPV

Controllers

The controller synthesis method for lpv systems presented in this section will be an extension of the method presented in the previous section, much as how the methods in Chapter 5 are extensions of the methods in Chapter 4. The goal with the optimization problem in this section, is to synthesize a static outputfeedback linear parameter-varying

H

2 controller. The idea is to be able to directly synthesize a controller using data, instead of first identifying an lpv model and then from that model synthesize a controller. As talked about in Chapter 5, given an lpv model,

G

( p ) :

⎜⎜⎝

⎜⎜⎜⎜ z x y

⎟⎟⎠

⎟⎟⎟⎟

=

⎜⎜⎝

⎜⎜⎜⎜

A

C

1

C

2

(

(

( p p p

)

)

)

B

D

D

1

11

21

(

(

( p p p

)

)

)

B

2

D

12

( p )

( p )

0

⎟⎟⎠ ⎜⎜⎝

⎜⎜⎜⎜ x w

⎞ u

⎟⎟⎠

(6.16) what ideally is wanted, is to minimize the integral min

K ( p )

T w,z

( p )

2

H

2 d p

,

(6.17) where

G

( p ) with the a set,

T w,z

( p

M

, of

) is the closed-loop system when closing the loop for the

N

lpv lti controller models,

G i

lpv model

K

( p ). However, in this section, it is assumed that

, for di ff erent fixed parameter values, p

i

, just as in Chapter 5, is given. This will of course lead to the fact that it is not possible to control the dynamic behavior coming from when the parameters are not fixed, as discussed in Chapter 5, since this information is not present in the data given. However, this is a common problem when working with gain-scheduling and it is assumed in this thesis that the parameters move slowly such that the dynamics from the parameters do not influence the system much, a commonly used assumption, see Shamma and Athans [1992]. The optimization problem now becomes

N

minimize

K ( p )

i

=1

T w,z

( p

i

)

2

H

2

,

(6.18) which for a fixed

i

becomes equivalent to the problem in Section 6.2.

6.4

Computational Aspects

109

The parametrization of the controller

K

( p ) : u (

t

) = K ( p ) y (

t

) with respect to the parameters is taken as

K ( p ) =

w k

( p ) K

(

k

)

.

k

(6.19)

(6.20)

As when identifying lpv models, the functions

w k

( p ) are design choices that can be hard to choose. However, given such a parametrization and a set of lti models, as in (6.16), where the given as

K

( p

i

) =

K i

problem can be written as lti models are denoted as and the closed loop system as

T w,z

( p

i

G

( p

i

) =

T

) =

w,z,i

G i

, the controller

, the optimization

N

minimize

K

(

k

)

i

=1

T w,z,i

2

H

2

(6.21) where Q

T ,i

and P

T ,i

satisfy the Lyapunov equations

A

T ,i

P

T ,i

A

T

T ,i

Q

T ,i

+

+ P

T ,i

A

T

T ,i

Q

T ,i

A

T ,i

+

+

B

T ,i

B

T

T ,i

C

T

T ,i

C

T ,i

= 0

,

= 0

,

(6.22a)

(6.22b) for the continuous-time case and their discrete-time counterpart in the discretetime case.

Now we formulate the necessary conditions for optimality for this method in continuous time, the conditions for the discrete-time case are analogous.

Theorem 6.3 (Necessary conditions for optimality).

Assume that

K

i stabilizes the system G i is Hurwitz and and that all closed-loop systems,

D

T ,i

= 0

for all i

T w,z,i are strictly proper, i.e.,

A

T ,i

. In order for the matrices

K

(

k

)

to be optimal for the problem (6.21), it is necessary that

K (

p

)

satisfies the equations in (6.22) for all i

, and that

+

i

T w,z,i

K

2

H

2

= 2

N w k

( p

i

)

i

=1

B

T

2

,i

Q

T ,i

P

T ,i

C

T

2

,i

+ B

T

2

,i

Q

T ,i

B

T ,i

D

T

21

,i

+ D

T

12

,i

C

T ,i

P

T ,i

C

T

2

,i

= 0

.

(6.23)

Proof: The proof is analogous with the proof for Theorem 6.1.

6.4

Computational Aspects

In this section, a suggestion of how the methods in this chapter can be initialized and how to speed up the computations will be presented.

As with the methods in the previous chapters, both cost functions and their gra-

110

6 Controller Synthesis dients have been calculated and can easily be used in, e.g., a quasi-Newton algorithm to solve the optimization problem. For the methods described in this chapter also the Hessians have been calculated, which can be utilized in a quasi-

Newton algorithm to initialize the Hessian approximation in, e.g., bfgs

. We do not want to use the Hessian information in every iteration since this would be too heavy, computationally.

The derivations for the gradients and the Hessians in Section 6.2, have been done element wise, as with the methods in the previous chapters. This means that it is possible, also for these methods, to introduce structure in the controller, e.g. a diagonal controller.

For the methods in this chapter it is, however, not as straightforward to utilize the structure in the Lyapunov equations (6.22) (or (6.5)) since, in the realization

(6.2), there is no obvious structure that can be exploited. What can be used, is that if both the cost function and the gradient have to be computed, both

P

T

must be computed, and the fact that A

T

Q

T

and P

T

Q

T

and can be solved e ffi ciently together by using is the factor in both of the Lyapunov equations, see for example

Benner et al. [1998].

The optimization problems (6.3) and (6.21) are both non-convex and nonlinear, which makes the initialization an important problem. Additionally, it is required that the initializing controller is stabilizing, which is probably an NP-hard problem, see Blondel and Tsitsiklis [1997]. If the given system (or systems if given a set of models) is asymptotically stable, then the initialization used is a controller with all zeros. However, if given an unstable system for which an

H

2 controller should be computed we take use of other, existing, methods/algorithms to try and stabilize the system and then start our method with this stabilizing controller.

The algorithm used to find a stabilizing controller is hifoo

(see Gumussoy et al.

[2009]).

6.5

Examples

In this section, we will try to show the applicability of the methods presented in this chapter using some examples. We begin with an example where the method presented in Section 6.2 is used on some systems in the COMPl e collection (see Leibfritz and Lipinski [2003]).

ib benchmark

Example 6.1: COMPl e ib-Systems

In this example our goal is to compare the method presented in Section 6.2, which will be called h

2 nlctrl

, with the sdp

-method described in Stingl [2006], called stingl

, and the method described in Arzelier et al. [2011], called hifoo

.

The systems used in this example comes from COMPl e ib (see Leibfritz and Lipinski [2003]), and are systems ranging from 2 to 1100 states. In all systems we have

D

11 the

=

H

2

0 and D

12

= 0 or D

21

= 0 , to make sure that

-norm of the closed-loop system is defined.

D

11

+ D

12

KD

21

= 0 , so that

6.5

Examples

111

To initialize h

2 nlctrl and hifoo

, a heuristic approach is used. First it is checked if the system is open-loop stable and if that is the case, then the optimization is initialized with K = 0 . If this does not hold then the optimization package hi

foo

, see Gumussoy et al. [2009], is called to minimize the real part of the largest eigenvalue of the matrix A + B

2 found are reported in the tables.

KC

2

. Only cases where a stabilizing controller is

In Table 6.1, Table 6.2, Table 6.3, Table 6.4 and Table 6.5, the results from the numerical benchmark are presented. The name of the COMPleib-system is displayed in the first column. In the second column, the relevant sizes, i.e., the numbers of states, outputs and inputs, denoted $n_x$, $n_y$ and $n_u$ respectively, are displayed. In columns three, four and five the $\mathcal{H}_2$-norms of the resulting closed-loop systems are displayed, and in columns six, seven and eight the times it took for the methods h2nlctrl, hifoo and stingl to find the controller are displayed. For the first 31 systems, which also occur in the test performed in Stingl [2006], the results are compared to the results reported in Stingl [2006].

For the remainder of the systems, a "–" in the fourth column denotes that we do not have any results from Stingl [2006] to compare with. In Stingl [2006] they could not find a controller for these systems, mostly because of numerical problems and the rapid growth of the sdps, since they optimize over both $K$ and the Lyapunov matrices $P$ or $Q$.

When the results using hifoo and h2nlctrl are compared with the results from stingl, hifoo and h2nlctrl find, for almost all systems, the same value of $\|T_{w,z}\|_{\mathcal{H}_2}$, however, generally, much faster. When comparing h2nlctrl and hifoo, they perform very similarly for most of the systems regarding the value of $\|T_{w,z}\|_{\mathcal{H}_2}$. However, for a large number of the systems, h2nlctrl is able to find the controller faster than hifoo, and for a few systems hifoo is not able to compute a controller due to running out of memory, denoted with "–" in the fifth column, where h2nlctrl can.

The results in the tables below also show the benefit of the new method: apart from being able to handle structure in the controller, it can handle medium-scale systems. The number of optimization variables does not grow with the number of states in the systems, as in the sdp case used by Stingl [2006], but depends only on the size of the controller.


Table 6.6: Numerical values for the coefficients in the lpv controllers in Example 6.2 and the time to compute them.

Model       K(0)     K(1)     K(2)     Time [s]
Constant    0.1638   –        –        0.09
Linear      0.259    -0.305   –        0.12
Quadratic   0.2936   -0.935   0.6903   0.11

A small example of an lpv controller synthesis problem is now presented to show the potential of the method proposed in Section 6.3.

Example 6.2

The system in this example is the same as in Example 5.2,

$G = G_1 G_2$,   (6.24a)

where

$G_1 = \dfrac{1}{s^2 + 2\zeta_1 s + 1}$,  $\zeta_1 = 0.2 + 0.1p$,   (6.24b)

$G_2 = \dfrac{9}{s^2 + 6\zeta_2 s + 9}$,  $\zeta_2 = 0.1 + 0.9(1 - p)$,  $p \in [0, 1]$.   (6.24c)

From these equations we obtain $A(p)$, $B_2(p)$, $C_2(p)$ and $D_{22}(p)$, using the notation in (2.31), that represent the dynamical system. Then we create the matrices

$B_1(p) = I_{4\times4}$,  $C_1(p) = I_{4\times4}$,  $D_{11}(p) = 0_{4\times4}$,  $D_{12}(p) = \begin{bmatrix} 0_{3\times1} \\ 1 \end{bmatrix}$,  $D_{21}(p) = 0_{1\times4}$,

to have a fully defined performance measure for the system. From this system we extract five systems representing five equidistant points in $p \in [0, 1]$, i.e., we are given five lti models, extracted from the lpv system (6.24), with four states each.

The lpv system is expressed in a balanced state basis. In this state basis the lpv system depends nonlinearly on the parameter $p$, see Figure 6.1. Hence, judging from the given data, one could easily suspect that a complex lpv controller would be required. However, in this example, using the proposed method from Section 6.3, we will try to find three static output-feedback lpv controllers of different complexity: one that is constant and independent of the parameter $p$, one that is linear in $p$ and one that is quadratic in $p$. For example, the quadratic lpv controller has the structure

$u(t) = K(p)\, y(t)$,  $K(p) = K^{(0)} + K^{(1)} p + K^{(2)} p^2$.   (6.25)

The specific values for the resulting lpv controllers and the times for computing them can be found in Table 6.6.
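To make the parametrization in (6.25) concrete, the sketch below (our illustration, not the thesis code) evaluates the multi-model objective that sums the squared closed-loop $\mathcal{H}_2$ norms over the sampled lti models; it reuses the hypothetical h2_cost_and_gradient routine sketched in the previous section.

```python
# A minimal sketch, assuming the sampled lti models are given as dicts
# of state-space matrices keyed by their parameter value p.
def K_of_p(coeffs, p):
    # coeffs = [K0, K1, K2, ...] as in (6.25): K(p) = sum_k Kk * p**k
    return sum(Kk * p**k for k, Kk in enumerate(coeffs))

def multi_model_h2_cost(coeffs, models):
    # Sum of squared closed-loop H2 norms over the sampled models.
    total = 0.0
    for m in models:
        K = K_of_p(coeffs, m["p"])
        cost, _ = h2_cost_and_gradient(K, m["A"], m["B1"], m["B2"],
                                       m["C1"], m["C2"], m["D12"])
        total += cost
    return total
```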

To validate the controllers, 100 validation points were generated from (6.24), for $p \in [0, 1]$.



Figure 6.1: The elements in the $A$-matrices as functions of $p$ for the four-state lpv system (6.24) in the given state basis.

For each of these 100 models, an optimal static output-feedback controller was created (the associated optimization problem is a scalar problem, hence trivially solved using, e.g., gridding). In Figure 6.2 the ratio between the $\mathcal{H}_2$-performance of the different lpv controllers and the $\mathcal{H}_2$-performance of the optimal static output-feedback controller in the different validation points is shown, i.e., the closer the curve is to the value one, the closer the lpv controller is to the optimal controller. In Figure 6.2, we see that the method is able to find lpv controllers that, depending on the complexity of the lpv controller, are close to the optimal reference controller in the validation points.

In Figure 6.3, the reference controller and the resulting lpv controllers (constant, linear and quadratic in $p$) are plotted. Looking at both Figure 6.2 and Figure 6.3, one can see that with an lpv controller that is quadratic in $p$ we find a controller that is very similar to the globally optimal one.

6.6 Conclusions

In this chapter, two methods for synthesizing $\mathcal{H}_2$ controllers have been presented, one for lti systems and one for lpv systems. The methods use a direct nonlinear optimization approach to solve the problem, which makes it possible to control the structure of the controller and create, e.g., a diagonal or bidiagonal controller.

For these methods, the cost functions, gradients and Hessians have all been derived, which makes it possible to effectively use off-the-shelf quasi-Newton solvers and to solve problems of medium-scale size. One of the drawbacks of the methods is the non-convexity of the problems, together with the fact that finding a stabilizing controller is possibly an NP-hard problem. However, this is a problem that the methods have in common with other methods too, and it is one of the problems that needs more attention in the future.



Figure 6.2: The ratio between the $\mathcal{H}_2$-performance of the different lpv controllers and the $\mathcal{H}_2$-performance of the optimal static output-feedback controller in the different validation points. The closer the curve is to the value one, the closer the lpv controller is to the optimal controller.


Figure 6.3: The reference controller (solid line) and the resulting lpv controllers (constant, dashed line; linear in $p$, dash-dotted line; quadratic in $p$, dotted line) plotted as functions of the parameter $p$.

One possible direct extension that has not been tested is to use the idea of controlling the rank of the system matrices, as in Section 5.4.2. By using a method that can control the rank, one could, for example, enforce the controller to have integrators.

7 Examples of Applications

In this chapter, the methods from Chapter 4 and Chapter 5 are illustrated with two more elaborate examples. In the first example, both the model-reduction methods from Chapter 4 and the lpv generation methods from Chapter 5 are used on an Airbus aircraft model, to show the applicability of the methods on a real-world example. In the second example, we show how model-reduction methods can be used in system identification to obtain better estimates for certain model structures.

7.1 Aircraft Example

The models used in this section are models of an Airbus aircraft that were developed and used in an EU project called cofcluo (Clearance Of Flight Control Laws Using Optimization, see http://cofcluo.isy.liu.se/ and Varga et al. [2012]). The main objective of the cofcluo project was to develop methods that use optimization techniques to make the clearance of flight control laws more efficient and reliable, see for example Garulli et al. [2013]. The clearance of flight control laws is an important part of the certification and qualification process in the aircraft industry. The models used in the examples below are three lpv models that, with different complexity, describe an airplane in closed loop in the longitudinal direction. All models are siso lpv models with 22 states, and all depend polynomially on the parameters. The difference between the lpv models is that they depend on one (different configurations for the center tank), two (different configurations for the center tank and the outer tank) or three parameters (different configurations for the center tank, the outer tank and the payload), respectively.


7.1.1 LPV Simplification

To be able to use certain analysis methods for evaluating performance criteria for flight clearance, the lpv models have to be represented as linear fractional representations, lfrs (see, e.g., Zhou et al. [1996] or Hecker [2006]). To be able to use the analysis methods efficiently, the lfrs have to be of low order. Generally, any lpv model with rational dependence on the parameters can be turned into an lfr. However, it is a difficult problem to guarantee that the resulting lfr is of minimal order. There exist some special cases where this is possible, for example when the lpv model depends affinely on the parameters, see Hecker [2006].

Take an lpv model

$$G(p) = \begin{bmatrix} A(p) & B(p) \\ C(p) & D(p) \end{bmatrix},$$

where the system matrices depend affinely on the parameters, i.e.,

$$A(p) = A^{(0)} + A^{(1)} p_1 + A^{(2)} p_2 + \cdots + A^{(N)} p_N,$$

and the same for $B(p)$, $C(p)$ and $D(p)$. Now create the matrices $F^{(0)}, F^{(1)}, \ldots, F^{(N)}$ as

$$F^{(i)} = \begin{bmatrix} A^{(i)} & B^{(i)} \\ C^{(i)} & D^{(i)} \end{bmatrix}, \qquad i = 0, 1, \ldots, N.$$

The minimal order that the lfr generated from $G(p)$ can have is $\sum_{i=1}^{N} \operatorname{rank} F^{(i)}$, and this quantity is easy to compute, see Hecker [2006].
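As a small illustration of this order computation, the sketch below (our illustration; helper names are ours) forms the coefficient blocks and evaluates the rank-sum bound.

```python
# A minimal sketch: the minimal lfr order for an affine lpv model is
# the sum of the ranks of the coefficient blocks F(1), ..., F(N); the
# parameter-independent block F(0) does not enter the sum.
import numpy as np

def coefficient_block(Ai, Bi, Ci, Di):
    return np.block([[Ai, Bi], [Ci, Di]])

def min_lfr_order(F_blocks, tol=1e-9):
    # F_blocks = [F1, ..., FN]; ranks computed with an SVD tolerance.
    return sum(np.linalg.matrix_rank(F, tol=tol) for F in F_blocks)
```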

In this example, the lpv generation methods described in Chapter 5 will be used to reduce the complexity, with respect to the parameters, of the original lpv models. The strategy is to sample a number of lti models from the three given lpv models and choose an affine parametrization for the generated lpv models, to be able to guarantee that a low order lfr can be computed from the generated lpv models.

The given lpv models are not strictly proper, which is a problem when using methods based on the $\mathcal{H}_2$-norm, since the $\mathcal{H}_2$-norm is infinite if $D \neq 0$. To circumvent this problem, the $D$ matrices are first ignored and an affine lpv model is computed using only the $A$, $B$ and $C$ matrices. To find the resulting $D$ matrices, a simple element-wise interpolation problem is solved. However, since the $D$ matrices are unaffected by state transformations, the complexity cannot as easily be reduced for the $D$ matrices, and a higher order polynomial might be necessary in the interpolation to obtain a sufficiently good approximation.
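A minimal sketch of such an element-wise fit, assuming the sampled $D$ matrices are stacked in an array and fitted with a polynomial in $p$ by least squares (the helper name and the degree are our choices):

```python
# A minimal sketch: fit every element of D(p) with a polynomial in p.
import numpy as np

def fit_D_polynomial(p_samples, D_samples, degree=2):
    # D_samples has shape (n_samples, ny, nu); one least-squares solve
    # covers all elements simultaneously via a Vandermonde basis.
    V = np.vander(np.asarray(p_samples), degree + 1)
    flat = np.asarray(D_samples).reshape(len(p_samples), -1)
    coeffs, *_ = np.linalg.lstsq(V, flat, rcond=None)
    return coeffs.reshape(degree + 1, *np.shape(D_samples)[1:])
```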

As mentioned above, an lfr of low order is preferred. The first step towards this was to use an affine parametrization. In addition, by using the rank controlling method described in Section 5.4.2, it is possible to control the rank of the coefficient matrices ($F^{(1)}, F^{(2)}, \ldots$) in the generated lpv model. Hence, the complexity of the resulting lfr can be lowered even further by constraining the appropriate matrices to have low rank.

In this example, we sample 10, 100 and 125 lti models from the one, two and three parameter lpv models, respectively. The lti models are sampled equidistantly in the parameter space and are used as inputs to the proposed methods. Two lpv models will be generated for each of the data sets from the models with one and two parameters: one with full rank in all the coefficient matrices and one with rank deficient $A^{(1)}$ and $A^{(2)}$ matrices.

A few different ranks for the $A^{(1)}$, $A^{(2)}$ and $A^{(3)}$ matrices were tested. For the one parameter model set, rank two was chosen for the matrix $A^{(1)}$, and for the two parameter model set, rank eleven was used for both $A^{(1)}$ and $A^{(2)}$. For the three parameter model set, no sufficiently good model was found for the ranks tested, and only the result using coefficient matrices with full rank will be presented.

The validity of the resulting lpv models is evaluated by sampling a new, different, set of lti models from each of the given lpv models and comparing these with the generated lpv models. The models are compared both using the relative $\mathcal{H}_2$-norm, ignoring the $D$ matrices, and the relative $\mathcal{H}_\infty$-norm, including the $D$ matrices. The results from the lpv generation are displayed for the one parameter case in Figure 7.1. For the two parameter case, the full rank case is displayed in Figure 7.2 and the low rank case in Figure 7.3. The result for the three parameter case is displayed in Figure 7.4.

In Figures 7.1–7.4, we can see that all the generated lpv models have a low relative $\mathcal{H}_2$-norm for all validation models. This suggests that we have found good approximations of the original lpv models. Not only is the relative $\mathcal{H}_2$-norm low, but so is the relative $\mathcal{H}_\infty$-norm, which gives another certificate that the generated models approximate the given lpv models well. Looking at Table 7.1, we can also see that the complexity of the resulting lfrs has decreased in most cases, especially in the cases where we were able to find lpv models with rank deficient coefficient matrices. These facts suggest that the proposed lpv methods can be used to reduce the complexity of lpv models and their lfrs. Another interesting observation in Figure 7.1 is that, for the one parameter model, the model using a rank deficient coefficient matrix is better than the one with full rank. Two likely explanations are the non-convexity of the problem, and that the full rank case is an over-parametrization for which the low rank method works as a regularization.

7.1.2 Model Reduction

The three lpv models described in the previous section describe an aircraft, more precisely a flexible aircraft. The original models were computed using finite element computations and were very large. These models were then reduced such that the dynamics above 15 rad/s were truncated. Hence, the given lpv models are only valid up to 15 rad/s, which makes them suitable for testing the frequency-limited model-reduction method described in Section 4.4.3. As can be seen in Figure 7.5, which plots the magnitude curve of one of the lti models, it would be beneficial to be able to ignore the dynamics above 15 rad/s when doing model reduction.

For this example we extract one lti model from the one parameter lpv model at the nominal value $p = 0$. This model will be reduced using the methods described in Chapter 4 and compared with other model-reduction methods.



Figure 7.1: Relative error in $\mathcal{H}_2$- and $\mathcal{H}_\infty$-norm at 100 validation points, in the one parameter case. The gray line comes from the case where the coefficient matrix $A^{(1)}$ has full rank and the black dashed line from the case where $A^{(1)}$ has rank two. It is interesting to note that the low rank model performs better than the full rank one. This could, for example, be due to the non-convexity of the problem or to over-parametrization.


Figure 7.2: Relative error in $\mathcal{H}_2$- and $\mathcal{H}_\infty$-norm at 1225 validation points, in the two parameter case when the coefficient matrices $A^{(1)}$ and $A^{(2)}$ have full rank.



Figure 7.3: Relative error in $\mathcal{H}_2$- and $\mathcal{H}_\infty$-norm at 1225 validation points, in the two parameter case when the coefficient matrices $A^{(1)}$ and $A^{(2)}$ have rank 11.


Figure 7.4: A histogram over the relative error in $\mathcal{H}_2$- and $\mathcal{H}_\infty$-norm at 3375 validation points, in the three parameter case when the coefficient matrices $A^{(1)}$, $A^{(2)}$ and $A^{(3)}$ have full rank.


Table 7.1: The time it took to compute the different models from Section 7.1.1 and the sizes of the corresponding lfrs. $n_\Delta$ represents the size of the resulting lfr coming from the proposed methods and $\bar{n}_\Delta$ represents the size of the resulting lfr from the original lpv model.

lpv model                  n_Δ    n̄_Δ    Time
1 parameter, full rank     26     20     7m 56s
1 parameter, rank 2        6      20     8m 56s
2 parameters, full rank    52     62     1h 35m 31s
2 parameters, rank 11      32     62     50m 44s
3 parameters, full rank    94     98     1h 55m 53s


Figure 7.5: A magnitude plot for a sampled lti model from the one parameter lpv model. The dashed vertical line denotes $\omega = 15$ rad/s.


Figure 7.6: The error models resulting from the different methods from Section 7.1.2. The dashed vertical line denotes $\omega = 15$ rad/s. The red line (flbt) seems to have found the best model; however, this model is unstable. The best model in $\mathcal{H}_2$-norm is then the green model, which is our proposed method from Section 4.4.3.

The methods compared are flh2nl (our proposed frequency-limited model-reduction method, see Section 4.4.3), flistia, flbt and mflbt. These methods are also compared with the methods wh2nl (our proposed frequency-weighted model-reduction method, see Section 4.4.1) and wbt, both using a tenth order low-pass Butterworth filter with a cut-off frequency of 15 rad/s as the weight. The model is reduced from 22 states to 16 states.
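For reference, a weight of the kind used by the frequency-weighted methods above can be constructed as in the following sketch (using scipy; the variable names are ours):

```python
# A minimal sketch: a tenth order low-pass Butterworth weight with
# cut-off frequency 15 rad/s, as an analog transfer function W(s).
from scipy import signal

b, a = signal.butter(10, 15.0, btype='low', analog=True)
W = signal.TransferFunction(b, a)
```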

The results from the different methods can be seen in Figure 7.6, showing the error models, in Figure 7.7, showing the true and reduced models, and in Table 7.2. In Figure 7.6 it seems that flbt has found a good approximation. However, looking at Table 7.2 we see that the model from flbt is unstable. All the other methods find models that are acceptable in the relevant frequency range, and, as in the examples in Section 4.6, flh2nl finds the model with the best $\mathcal{H}_2$ fit.

In this example we had a model that was only valid up to a certain frequency, and looking at the results in Figure 7.6, Figure 7.7 and Table 7.2, we see that the frequency-limited model-reduction methods sacrifice the model fit at the upper frequencies for the valid, lower, frequency regions. Hence, we see the importance of using methods that are able to focus on the relevant region.



Figure 7.7: The true and the reduced-order models for the different methods from Section 7.1.2. The dashed vertical line denotes $\omega = 15$ rad/s.

Table 7.2: Numerical results for the example in Section 7.1.2. The columns show the relative errors $\|G - \hat{G}\|_{\mathcal{H}_2} / \|G\|_{\mathcal{H}_2}$ and $\|G - \hat{G}\|_{\mathcal{H}_\infty} / \|G\|_{\mathcal{H}_\infty}$, and the real part of the rightmost pole of the reduced model, $\mathrm{Re}\,\lambda_{\max}$.

Method     Rel. H2 error   Rel. H∞ error   Re λ_max
wbt        9.90e-03        2.07e-02        -1.63e-01
mflbt      2.90e-02        3.87e-04        -1.19e-01
flbt       7.79e-03        9.33e-03        6.12e+00
flistia    1.68e-03        5.11e-03        -1.35e-01
flh2nl     8.12e-03        1.37e-02        -1.90e-01
wh2nl      1.12e-02        –               -1.82e-01

7.2 Model Reduction in System Identification

In this example we will show how model reduction can be used in system identification to obtain parameter estimates with a smaller covariance matrix than with direct system identification. The example is taken from Tjärnström [2003], where the theoretical results are also presented.

We will work with a siso discrete-time output-error (oe, see Ljung [1999]) model with $T_s = 1$. Let $y(t)$ denote the output of the system, $u(t)$ the input, and $N$ the total number of measured data. The signal $y(t)$ is assumed to be generated from the true system, $G_0(q)$, as

$$y(t) = G_0(q)\, u(t) + e(t),$$

where $q$ is the discrete-time shift operator and the additive noise, $e(t)$, is a zero-mean, white-noise sequence, independent of the input. The sought system is parametrized as an oe model, denoted $\hat{G}_{\text{oe}}(q, \theta)$, where $\theta$ is a vector holding the parameters of the model. To identify a model from the input-output data, the prediction-error method (pem, see Ljung [1999]) can be used. One cost function that is commonly used when doing system identification with pem is

$$V_N(\theta) = \frac{1}{2N} \sum_{t=1}^{N} \varepsilon^2(t, \theta), \qquad \varepsilon(t, \theta) = y(t) - \hat{G}_{\text{oe}}(q, \theta)\, u(t),$$

and the estimate of $\theta$ given $N$ data, $\hat{\theta}_N$, is taken as

$$\hat{\theta}_N = \arg\min_{\theta} V_N(\theta).$$

Using the notation and definitions above, we can state a connection between system identification, using pem, and model reduction, using the $\mathcal{H}_2$-norm. Under weak conditions it holds that

$$\hat{\theta}_N \to \theta^{*} = \arg\min_{\theta} \bar{V}(\theta) \quad \text{as } N \to \infty, \qquad \bar{V}(\theta) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \mathbf{E}\left[ \tfrac{1}{2} \bar{\varepsilon}^{2}(t, \theta) \right],$$

and, using Parseval's formula and an oe model structure, we have that

$$\bar{V}(\theta) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left| G(e^{i\omega}) - \hat{G}(e^{i\omega}, \theta) \right|^{2} \Phi_u(\omega)\, \mathrm{d}\omega = \frac{1}{2} \left\| G - \hat{G}(\theta) \right\|^{2}_{\Phi_u, \mathcal{H}_2},$$

where $\Phi_u$ denotes the spectrum of the input.
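The identity above can be checked numerically; the sketch below (our illustration, with $\Phi_u \equiv 1$, i.e., a white input) approximates the integral by a Riemann sum over a frequency grid.

```python
# A minimal sketch: approximate Vbar(theta) for a white input, where
# G and Ghat are callables evaluating the transfer functions at points
# z = exp(i*omega) on the unit circle.
import numpy as np

def vbar_white_input(G, Ghat, n_grid=4096):
    w = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    z = np.exp(1j * w)
    err2 = np.abs(G(z) - Ghat(z)) ** 2
    # (1/(4*pi)) * integral over [-pi, pi] is approximately mean/2.
    return err2.mean() / 2.0
```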

Results in Tjärnström and Ljung [2002] and Tjärnström [2003] state that, when estimating an oe model of low order (undermodeling), it is better to estimate the low-order model by model reduction of a high-order model than to estimate the low-order model directly from data. This was exemplified already in Tjärnström [2003], however not by using an $\mathcal{H}_2$ model-reduction algorithm but by using a first-order approximation of the covariance expression for the parameters. First in this example, we will use the method proposed in Section 4.4.1 to do the model reduction when having a white-noise input. Secondly, we will use an input signal with a frequency-limited spectrum, which requires the method proposed in Section 4.4.3.

In this example the true system is given by

$$y(t) = \frac{B(q)}{F(q)}\, u(t) + e(t),$$

where

$$B(q) = 2q^{-1} - q^{-2}, \qquad F(q) = 1 - 0.7q^{-1} + 0.52q^{-2} - 0.092q^{-3} - 0.1904q^{-4}.$$


The input, $u$, and the noise, $e$, are jointly independent. The noise is a zero-mean white-noise process with variance 1.

First we will use a zero-mean white-noise process with variance 1 for the input. The system is simulated with this input with $N = 250$ to obtain a data set with input and output data. This data set is first used to directly estimate, using pem, three low-order oe models with orders $\{n_b = 1, n_f = 1, n_k = 1\}$, $\{n_b = 2, n_f = 2, n_k = 1\}$ and $\{n_b = 3, n_f = 3, n_k = 1\}$, respectively. Then, using the same data set, an oe model with order $\{n_b = 4, n_f = 4, n_k = 1\}$ is estimated using pem, and this estimated model is reduced, using h2nl, to three oe models with orders $\{n_b = 1, n_f = 1, n_k = 1\}$, $\{n_b = 2, n_f = 2, n_k = 1\}$ and $\{n_b = 3, n_f = 3, n_k = 1\}$, respectively. This procedure is repeated 500 times, and from the obtained estimates, Monte Carlo based estimates of the covariance matrices are computed. From each of the six covariance matrices, as in Tjärnström [2003], the eigenvalues are determined to represent the size of the covariance matrices. The results are presented in Table 7.3.
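The Monte Carlo procedure can be summarized by the following sketch (the helpers simulate_system, estimate_oe and h2nl_reduce are hypothetical stand-ins for the simulation, the pem estimation and the proposed reduction method; they are ours, not library functions):

```python
# A minimal sketch of the Monte Carlo comparison for one model order.
# simulate_system, estimate_oe and h2nl_reduce are hypothetical.
import numpy as np

thetas_direct, thetas_reduced = [], []
for run in range(500):
    u, y = simulate_system(N=250)               # white-noise input
    thetas_direct.append(estimate_oe(u, y, nb=2, nf=2, nk=1))
    high = estimate_oe(u, y, nb=4, nf=4, nk=1)  # high-order model
    thetas_reduced.append(h2nl_reduce(high, nb=2, nf=2, nk=1))

# Monte Carlo covariance estimates; their eigenvalues summarize size.
cov_direct = np.cov(np.array(thetas_direct).T)
cov_reduced = np.cov(np.array(thetas_reduced).T)
print(np.linalg.eigvalsh(cov_direct), np.linalg.eigvalsh(cov_reduced))
```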

Table 7.3: Numerical results for the example in Section 7.2 using a zero-mean white-noise process with variance 1 for the input. "Direct" means that the model comes from directly using pem; "reduced" means that first a fourth order model is identified using pem and this model is then reduced, using model reduction, to the desired order.

Model – Method        λ1      λ2      λ3      λ4      λ5      λ6
oe(1,1,1) – direct    0.930   0.0859  –       –       –       –
oe(1,1,1) – reduced   0.924   0.0671  –       –       –       –
oe(2,2,1) – direct    1.87    0.916   0.0919  0.0440  –       –
oe(2,2,1) – reduced   1.81    0.910   0.0871  0.0431  –       –
oe(3,3,1) – direct    233     3.57    0.915   0.355   0.0413  0.0276
oe(3,3,1) – reduced   179     2.11    0.952   0.305   0.0407  0.0265

In a second experiment we use an input with a limited spectrum. The input in this case is a zero-mean Gaussian signal with variance 1 and a non-zero spectrum on the frequency interval $[0, \pi/2]$. The same procedure as above is used to estimate six different oe models using the direct and the reduced approach. The difference compared to the case above is that the proposed method from Section 4.4.3 is used instead. From each of the six covariance matrices, as in Tjärnström [2003], the eigenvalues are determined to represent the size of the covariance matrices. The results are presented in Table 7.4.

This example repeats the result from Tjärnström [2003] that $\mathcal{H}_2$ model reduction can in some cases be used to find better estimates in system identification, in the sense of smaller covariance matrices, see Table 7.3 and Table 7.4. This time, however, an actual $\mathcal{H}_2$ model-reduction algorithm was used, both for the case of a white-noise input and for an input with limited spectrum. The example is meant to highlight the connection between system identification and $\mathcal{H}_2$ model reduction, and to illustrate yet another application of our results.


Table 7.4: Numerical results for the example in Section 7.2 using a zero-mean Gaussian process with a limited spectrum and variance 1 for the input. "Direct" means that the model comes from directly using pem; "reduced" means that first a fourth order model is identified using pem and this model is then reduced, using model reduction, to the desired order.

Model – Method        λ1      λ2      λ3      λ4      λ5      λ6
oe(1,1,1) – direct    43.2    0.411   –       –       –       –
oe(1,1,1) – reduced   40.6    0.400   –       –       –       –
oe(2,2,1) – direct    1290    80.9    10.1    0.214   –       –
oe(2,2,1) – reduced   1210    65.9    8.18    0.246   –       –
oe(3,3,1) – direct    3590    595     466     128     3.63    0.170
oe(3,3,1) – reduced   1940    530     488     99.6    3.51    0.180

7.3 Conclusions

The two examples in this chapter have been chosen to highlight some properties and applications of the model reduction and lpv algorithms and to show their applicability on a real-world example. In the aircraft example in Section 7.1 we saw how the lpv generating algorithms can be used to lower the complexity of an existing lpv model, and how the frequency-limited model-reduction algorithm can be used to capture relevant frequency regions when performing model reduction. In the system identification example in Section 7.2 we highlighted the connection between system identification and $\mathcal{H}_2$ model reduction, using an example that shows how the covariance matrix of the estimates can be made smaller using model reduction together with system identification.

8 Concluding Remarks

The previous chapters have introduced, and shown the applicability of, some new methods for reducing the complexity of lti and lpv systems and for synthesizing $\mathcal{H}_2$ controllers. All methods are based on the same technique: minimizing the $\mathcal{H}_2$-norm of different systems and utilizing the structure of the problems to make the methods more efficient. The methods have been developed such that an off-the-shelf quasi-Newton solver can be used to solve the problems using the equations derived in the thesis.

In Section 4.4.1 a method for model reduction, for which the basic idea is not new, was presented. However, we showed how to utilize the structure of the problem, and the method laid the foundation for the other methods in the thesis.

In Section 4.4.2 a model-reduction method that tries to cope with errors in the given data was presented. The method uses the foundation laid in Section 4.4.1 together with a different view of robust optimization, namely using regularization as a proxy for robust optimization.

In Chapter 3 a more complete and uniform derivation of frequency-limited Gramians than in the existing literature was presented. In Section 4.4.3 a frequency-limited model-reduction method was presented, based on the derivations in Chapter 3 together with the foundation laid in Section 4.4.1.

All the model-reduction methods in Chapter 4 were then extended into an lpv framework, to be able to handle lpv systems and to reduce the complexity both in the states and in the parameters of the lpv systems. Many of the existing lpv generating methods have one drawback in common: they are not invariant to the state basis the lti models are given in. This drawback makes it hard for the existing methods to reduce the complexity of the lpv model. However, by using a model-reduction method as the foundation for the lpv generating methods in Chapter 5, this drawback is eliminated.

The model-reduction problem is closely related to the controller-synthesis problem, and using the same techniques as in Chapter 4 and Chapter 5, $\mathcal{H}_2$ controller-synthesis methods were developed in Chapter 6. As discussed in Chapter 6, a possible extension of the methods for synthesizing controllers could be to use the idea of controlling the rank of the system matrices, as in Section 5.4.2, which would make it possible, for example, to enforce the controller to have integrators.

The methods have been shown to work well on the presented examples, which include both small academic examples and relevant real-world examples, such as a model of an Airbus aircraft.

All the methods described in this thesis try to solve non-convex optimization problems, which are difficult problems for which only local solutions can be guaranteed. Hence, initialization is a very important part of the methods presented in this thesis. We have presented some suggestions for initializing the methods, and in our examples they have worked well. However, this is a part of the problem that is in need of further research; much can be gained from better initializations, e.g., faster and more reliable computations, since we can hopefully start even closer to an optimum.

Another problem in need of further research is that of finding a stabilizing controller, which has not been discussed much in this thesis. Finding stabilizing controllers is crucial to be able to use the methods in Chapter 6, and in this thesis only one simple suggestion, relying on existing methods, is presented.

Bibliography

Awad H. Al-Mohy, Nicholas J. Higham, and Samuel D. Relton. Computing the Fréchet derivative of the matrix logarithm and estimating the condition number. MIMS EPrint 2012.72, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, 2012. URL http://eprints.ma.man.ac.uk/1852/. Cited on pages 22 and 85.

Branimir Anić, Christopher Beattie, Serkan Gugercin, and Athanasios C. Antoulas. Interpolatory weighted-$\mathcal{H}_2$ model reduction. Automatica, 49(5):1275–1280, 2013. Cited on page 45.

Athanasios C. Antoulas. Approximation of Large-Scale Dynamical Systems. Advances in Design and Control. Society for Industrial and Applied Mathematics, 2005. ISBN 0898715296. Cited on pages 8, 40, 41, and 42.

Denis Arzelier, Georgia Deaconu, Suat Gumussoy, and Didier Henrion. $\mathcal{H}_2$ for hifoo. In Proceedings of the 2011 International Conference on Control and Optimization with Industrial Applications, Ankara, Turkey, 2011. Cited on page 110.

Bassam Bamieh and Laura Giarre. Identification of linear parameter varying models. International Journal of Robust and Nonlinear Control, 12(9):841–853, 2002. Cited on page 88.

Richard H. Bartels and G. W. Stewart. Algorithm 432: The solution of the matrix equation AX + XB = C. Communications of the ACM, 15(9):820–826, 1972. Cited on page 66.

Frank Bauer and Mark A. Lukas. Comparing parameter choice methods for regularization of ill-posed problems. Mathematics and Computers in Simulation, 81(9):1795–1841, May 2011. Cited on page 55.

Christopher A. Beattie and Serkan Gugercin. Krylov-based minimization for optimal $\mathcal{H}_2$ model reduction. In Proceedings of the 46th IEEE Conference on Decision and Control, pages 4385–4390, New Orleans, USA, 2007. Cited on pages 40, 43, and 45.

Christopher A. Beattie and Serkan Gugercin. A trust region method for optimal $\mathcal{H}_2$ model reduction. In Proceedings of the 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, pages 5370–5375, Shanghai, China, 2009. Cited on page 40.

Aharon Ben-Tal and Arkadi Nemirovski. Robust optimization – methodology and applications. Mathematical Programming (Series B), 92:453–480, 2002. Cited on page 55.

Peter Benner, Jose M. Claver, and Enrique S. Quintana-Orti. Efficient solution of coupled Lyapunov equations via matrix sign function iteration. In Proceedings of the 3rd Portuguese Conference on Automatic Control, pages 205–210, Coimbra, Portugal, 1998. Cited on page 110.

Dimitris Bertsimas, David B. Brown, and Constantine Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464–501, 2011. Cited on page 55.

Vincent Blondel and John N. Tsitsiklis. NP-hardness of some linear control design problems. SIAM Journal on Control and Optimization, 35(6):2118–2127, 1997. Cited on pages 103 and 110.

Samuel Burer and Renato D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, Series B, 95(2):329–357, 2003. Cited on page 97.

Jan De Caigny, Juan F. Camino, and Jan Swevers. Interpolation-based modeling of mimo lpv systems. IEEE Transactions on Control Systems Technology, 19(1):46–63, 2011. Cited on page 88.

Jan De Caigny, Rik Pintelon, Juan F. Camino, and Jan Swevers. Interpolated modeling of lpv systems based on observability and controllability. In Proceedings of the 16th IFAC Symposium on System Identification, pages 1773–1778, Brussels, Belgium, 2012. Cited on pages 88, 89, 99, and 100.

M. Diab, W. Q. Liu, and V. Sreeram. Optimal model reduction with a frequency weighted extension. Dynamics and Control, 10:255–276, 2000. Cited on page 40.

Laurent El Ghaoui and Hervé Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18(4):1035–1064, 1997. Cited on page 56.

Laurent El Ghaoui, Francois Oustry, and Mustapha AitRami. A cone complementarity linearization algorithm for static output-feedback and related problems. IEEE Transactions on Automatic Control, 42(8):1171–1176, August 1997. Cited on page 13.

Dale F. Enns. Model reduction with balanced realizations: An error bound and a frequency weighted generalization. In Proceedings of the 23rd IEEE Conference on Decision and Control, pages 127–132, Las Vegas, USA, 1984. Cited on pages 40, 42, and 68.

Makan Fardad, Fu Lin, and Mihailo R. Jovanovic. On the optimal design of structured feedback gains for interconnected systems. In Proceedings of the 48th IEEE Conference on Decision and Control, held jointly with the 28th Chinese Control Conference, pages 978–983, Shanghai, China, 2009. Cited on page 104.

Federico Felici, Jan-Willem van Wingerden, and Michel Verhaegen. Subspace identification of mimo lpv systems using a periodic scheduling sequence. Automatica, 43(10):1684–1697, 2007. Cited on page 88.

Garret M. Flagg, Serkan Gugercin, and Christopher A. Beattie. An interpolation-based approach to $\mathcal{H}_\infty$ model reduction of dynamical systems. In Proceedings of the 49th IEEE Conference on Decision and Control, pages 6791–6796, Atlanta, GA, USA, 2010. Cited on page 40.

Pascale Fulcheri and Martine Olivi. Matrix rational $\mathcal{H}_2$ approximation: A gradient algorithm based on Schur analysis. SIAM Journal on Control and Optimization, 36(6):2103–2127, 1998. Cited on pages 43 and 44.

Andrea Garulli, Anders Hansson, Sina Khoshfetrat Pakazad, Alfio Masi, and Ragnar Wallin. Robust finite-frequency $\mathcal{H}_2$ analysis of uncertain systems with application to flight comfort analysis. Control Engineering Practice, 21(6):887–897, 2013. Cited on page 121.

Wodek Gawronski and Jer-Nan Juang. Model reduction in limited time and frequency intervals. International Journal of Systems Science, 21(2):349–376, 1990. Cited on pages 23, 24, 37, 41, 42, 43, and 68.

Wodek K. Gawronski. Advanced Structural Dynamics and Active Control of Structures. Mechanical Engineering Series. Springer, 2004. Cited on page 23.

Keith Glover. All optimal Hankel-norm approximations of linear multivariable systems and their $\mathcal{L}_\infty$-error bounds. International Journal of Control, 39(6):1115–1193, 1984. Cited on pages 40 and 42.

Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996. ISBN 0-8018-5413-8. Cited on page 66.

Serkan Gugercin and Athanasios C. Antoulas. A survey of model reduction by balanced truncation and some new results. International Journal of Control, 77(8):748–766, 2004. Cited on pages 42, 43, and 68.

Suat Gumussoy, Didier Henrion, Marc Millstone, and Michael L. Overton. Multiobjective robust control with hifoo 2.0. In Proceedings of the IFAC Symposium on Robust Control Design, Haifa, Israel, 2009. Cited on pages 110, 111, 112, 113, 114, 115, and 116.

Yoram Halevi. Frequency weighted model reduction via optimal projection. IEEE Transactions on Automatic Control, 37(10):1537–1542, 1992. Cited on page 40.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001. ISBN 0-387-95284-5. Cited on page 55.

Simon Hecker. Generation of Low Order lft Representations for Robust Control Applications. PhD thesis, Technical University of Munich, 2006. Cited on page 122.

Anders Helmersson. Model reduction using lmis. In Proceedings of the 33rd IEEE Conference on Decision and Control, volume 4, pages 3217–3222, Lake Buena Vista, USA, 1994. Cited on pages 40 and 45.

Nicholas J. Higham. Functions of Matrices: Theory and Computation. SIAM, 2008. Cited on pages 19, 20, 21, 22, 62, 64, 85, and 96.

Lucas G. Horta, Jer-Nan Juang, and Richard W. Longman. Discrete-time model reduction in limited frequency ranges. Journal of Guidance, Control and Dynamics, 16(6):1125–1130, 1993. Cited on pages 23, 30, 37, 41, and 42.

Xue-Xiang Huang, Wei-Yong Yan, and K. L. Teo. $\mathcal{H}_2$ near-optimal model reduction. IEEE Transactions on Automatic Control, 46(8):1279–1284, 2001. Cited on pages 41 and 44.

Rudolf E. Kalman. Contributions to the theory of optimal control. Boletín de la Sociedad Matemática Mexicana, 5:102–119, 1960. Cited on page 103.

Balazs Kulcsar and Roland Tóth. On the similarity state transformation for linear parameter-varying systems. In Proceedings of the 18th IFAC World Congress, pages 4155–4160, Milan, Italy, 2011. Cited on page 14.

Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Computer Science and Scientific Computing. Academic Press, second edition, 1985. Cited on pages 19 and 20.

Lawton H. Lee and Kameshwar Poolla. Identification of linear parameter-varying systems using nonlinear programming. Journal of Dynamic Systems, Measurement and Control, 121(1):71–78, 1999. Cited on page 88.

Friedemann Leibfritz and W. Lipinski. Description of the benchmark examples in COMPleib 1.0. Technical report, University of Trier, Department of Mathematics, Germany, 2003. Cited on pages 69, 73, 110, 112, 113, 114, 115, and 116.

Douglas J. Leith and William E. Leithead. Survey of gain-scheduling analysis and design. International Journal of Control, 73(11):1001–1025, 2000. Cited on page 87.

Antonio Lepschy, Gian Antonio Mian, G. Pinato, and Umberto Viaro. Rational $\mathcal{L}_2$ approximation: a non-gradient algorithm. In Proceedings of the 30th IEEE Conference on Decision and Control, volume 3, pages 2321–2323, Brighton, UK, 1991. Cited on page 43.

Adrian S. Lewis and Michael L. Overton. Nonsmooth optimization via quasi-Newton methods. Mathematical Programming, 2012. Cited on page 16.

Ching-An Lin and Tai-Yih Chiu. Model reduction via frequency weighted balanced realization. Control Theory and Advanced Technology, 8:341–351, 1992. Cited on page 42.

Fu Lin, Makan Fardad, and Mihailo R. Jovanovic. Synthesis of $\mathcal{H}_2$ optimal static structured controllers: primal and dual formulations. In Proceedings of the 47th Annual Allerton Conference, pages 340–346, Urbana-Champaign, USA, 2009. Cited on page 104.

Lennart Ljung. System Identification: Theory for the User. Prentice Hall, second edition, 1999. ISBN 0-13-656695-2. Cited on pages 88, 128, and 129.

Marco Lovera and Guillaume Mercere. Identification for gain-scheduling: a balanced subspace approach. In Proceedings of the American Control Conference, pages 858–863, New York, USA, 2007. Cited on page 88.

Marco Lovera, Carlo Novara, Paulo Lopes dos Santos, and Daniel Rivera. Guest editorial special issue on applied lpv modeling and identification. IEEE Transactions on Control Systems Technology, 19(1):1–4, 2011. Cited on page 87.

Andres Marcos and Gary J. Balas. Development of linear-parameter-varying models for aircraft. Journal of Guidance, Control and Dynamics, 27(2):218–228, 2004. Cited on page 87.

Alexandre Megretski and Anders Rantzer. System analysis via integral quadratic constraints. IEEE Transactions on Automatic Control, 42(6):819–830, 1997. Cited on page 87.

Lewis Meier and David G. Luenberger. Approximation of linear constant systems. IEEE Transactions on Automatic Control, 12(5):585–588, 1967. Cited on page 43.

Mehran Mesbahi, George P. Papavassilopoulos, and Michael G. Safonov. Matrix cones, complementarity problems, and the bilinear matrix inequality. In Proceedings of the 34th IEEE Conference on Decision and Control, volume 3, pages 3102–3107, New Orleans, USA, 1995. Cited on page 104.

Keith Miller. Least squares methods for ill-posed problems with a prescribed bound. SIAM Journal on Mathematical Analysis, 1(1):52–74, 1970. Cited on page 56.

Javad Mohammadpour and Carsten W. Scherer, editors. Control of Linear Parameter Varying Systems with Applications. Springer US, 2012. ISBN 978-1-4614-1833-7. Cited on page 87.

Bruce C. Moore. Principal component analysis in linear systems: Controllability, observability and model reduction. IEEE Transactions on Automatic Control, AC-26(1):17–32, 1981. Cited on pages 40 and 41.

Mahadevamurty Nemani, Rayadurgam Ravikanth, and Bassam A. Bamieh. Identification of linear parametrically varying systems. In Proceedings of the 34th IEEE Conference on Decision and Control, volume 3, pages 2990–2995, New Orleans, USA, 1995. Cited on page 88.

Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 2006. ISBN 978-0387-30303-1. Cited on pages 15, 16, and 17.

Daniel Petersson. Nonlinear optimization approaches to $\mathcal{H}_2$-norm based lpv modelling and control. Licentiate thesis no. 1453, Department of Electrical Engineering, Linköping University, 2010. Not cited.

Daniel Petersson and Johan Löfberg. Optimization based lpv-approximation of multi-model systems. In Proceedings of the European Control Conference, pages 3172–3177, Budapest, Hungary, 2009. Not cited.

Daniel Petersson and Johan Löfberg. Robust generation of lpv state-space models using a regularized $\mathcal{H}_2$-cost. In Proceedings of the IEEE International Symposium on Computer-Aided Control System Design, pages 1170–1175, Yokohama, Japan, 2010. Not cited.

Daniel Petersson and Johan Löfberg. lpv $\mathcal{H}_2$-controller synthesis using nonlinear programming. In Proceedings of the 18th IFAC World Congress, pages 6692–6696, Milan, Italy, 2011. Not cited.

Daniel Petersson and Johan Löfberg. Model reduction using a frequency-limited $\mathcal{H}_2$-cost. arXiv preprint arXiv:1212.1603, December 2012a. URL http://arxiv.org/abs/1212.1603. Cited on pages 39 and 60.

Daniel Petersson and Johan Löfberg. Optimization Based Clearance of Flight Control Laws – A Civil Aircraft Application, chapter Identification of lpv State-Space Models Using $\mathcal{H}_2$-Minimisation, pages 111–128. Springer, 2012b. Not cited.

Daniel Petersson and Johan Löfberg. Optimization-based modeling of lpv systems using an $\mathcal{H}_2$ objective. Submitted to International Journal of Control, December 2012c. Cited on page 87.

Harald Pfifer and Simon Hecker. Generation of optimal linear parametric models for lft-based robust stability analysis and control design. In Proceedings of the 47th IEEE Conference on Decision and Control, pages 3866–3871, Cancun, Mexico, 2008. Cited on pages 88 and 89.

Charles Poussot-Vassal. An iterative SVD-tangential interpolation method for medium-scale mimo systems approximation with application on flexible aircraft. In Proceedings of the 50th IEEE Conference on Decision and Control and the European Control Conference, pages 7117–7122, Orlando, FL, USA, 2011. Cited on pages 40 and 45.

Charles Poussot-Vassal and Pierre Vuillemin. Introduction to MORE: a MOdel REduction Toolbox. In Proceedings of the IEEE Multi Systems Conference (MSC CCA'12), pages 776–781, Dubrovnik, Croatia, 2012. Cited on pages 41 and 68.

T. Rautert and Ekkehard W. Sachs. Computational design of optimal output feedback controllers. SIAM Journal on Optimization, 7(3):837–852, 1997. Cited on page 103.

Wilson J. Rugh and Jeff S. Shamma. Research on gain scheduling. Automatica, 36(10):1401–1425, 2000. Cited on page 87.

M. G. Safonov and R. Y. Chiang. A Schur method for balanced-truncation model reduction. IEEE Transactions on Automatic Control, 34(7):729–733, 1989. Cited on page 42.

M. G. Safonov, R. Y. Chiang, and D. J. N. Limebeer. Optimal Hankel model reduction for nonminimal systems. IEEE Transactions on Automatic Control, 35(4):496–502, 1990. Cited on page 42.

Shafishuhaza Sahlan, Abdul Ghafoor, and Victor Sreeram. A new method for the model reduction technique via a limited frequency interval impulse response gramian. Mathematical and Computer Modelling, 55(3-4):1034–1040, 2012. Cited on page 41.

Jeff S. Shamma and Michael Athans. Gain scheduling: potential hazards and possible remedies. IEEE Control Systems Magazine, 12(3):101–107, 1992. Cited on pages 89 and 108.

Robert E. Skelton, Tetsuya Iwasaki, and Karolos M. Grigoriadis. A Unified Algebraic Approach to Linear Control Design. Taylor and Francis, 1998. ISBN 0-7484-0592-5. Cited on pages 10, 19, and 20.

Sigurd Skogestad and Ian Postlethwaite. Multivariable Feedback Control: Analysis and Design. Wiley, second edition, 2007. ISBN 0-470-01167-6. Cited on pages 8, 10, and 13.

Victor Sreeram and Shafishuhaza Sahlan. Improved results on frequency weighted balanced truncation. In Proceedings of the 48th IEEE Conference on Decision and Control, held jointly with the 28th Chinese Control Conference, pages 3250–3255, Shanghai, China, 2009. Cited on page 40.

Maarten Steinbuch, Rene van de Molengraft, and Aart-Jan van der Voort. Experimental modelling and lpv control of a motion system. In Proceedings of the American Control Conference, volume 2, pages 1374–1379, Denver, USA, 2003. Cited on pages 88 and 89.

Michael Stingl. On the Solution of Nonlinear Semidefinite Programs by Augmented Lagrangian Methods. PhD thesis, University of Erlangen, 2006. Cited on pages 104, 110, 111, 112, 113, 114, 115, and 116.

Vassili L. Syrmos, Chaouki T. Abdallah, Peter Dorato, and Karolos Grigoriadis. Static output feedback – a survey. Automatica, 33(2):125–137, 1997. Cited on page 103.

Fredrik Tjärnström. Variance analysis of $\mathcal{L}_2$ model reduction when undermodeling – the output error case. Automatica, 39(10):1809–1815, 2003. Cited on pages 128, 129, and 130.

Fredrik Tjärnström and Lennart Ljung. $\mathcal{L}_2$ model reduction and variance reduction. Automatica, 38(9):1517–1530, September 2002. Cited on page 129.

Roland Tóth. Modeling and Identification of Linear Parameter-Varying Systems, an Orthonormal Basis Function Approach. PhD thesis, Delft University of Technology, 2008. Cited on pages 14, 87, and 88.

Roland Tóth, Federico Felici, Peter S. C. Heuberger, and Paul Van den Hof. Discrete time lpv I/O and state-space representations, differences of behavior and pitfalls of interpolation. In Proceedings of the European Control Conference, pages 5418–5425, Kos, Greece, 2007. Cited on page 89.

Roland Tóth, Hossam Seddik Abbas, and Herbert Werner. On the state-space realization of lpv input-output models: Practical approaches. IEEE Transactions on Control Systems Technology, 20(1):139–153, 2012. Cited on page 15.

Andras Varga and Brian D. O. Anderson. Accuracy enhancing methods for the frequency-weighted balancing related model reduction. In Proceedings of the 40th IEEE Conference on Decision and Control, pages 3659–3664, Orlando, USA, 2001. Cited on page 42.

Andreas Varga, Anders Hansson, and Guilhem Puyou, editors. Optimization Based Clearance of Flight Control Laws. Lecture Notes in Control and Information Science. Springer, 2012. Cited on page 121.

Pierre Vuillemin, Charles Poussot-Vassal, and Daniel Alazard. $\mathcal{H}_2$ optimal and frequency limited approximation methods for large-scale lti dynamical systems. In Proceedings of the 5th IFAC Symposium on System Structure and Control, pages 719–724, Grenoble, France, 2013. Cited on page 68.

Matthijs Groot Wassink, Marc van de Wal, Carsten Scherer, and Okko Bosgra. lpv control for a wafer stage: beyond the theoretical solution. Control Engineering Practice, 13(2):231–245, 2005. Cited on pages 87 and 88.

D. A. Wilson. Optimum solution of model reduction problem. Proceedings of the Institution of Electrical Engineers, 117(6):1161–1165, 1970. Cited on pages 43 and 45.

D. A. Wilson. Model reduction for multivariable systems. International Journal of Control, 20(1):57–64, 1974. Cited on page 44.

Yuesheng Xu and Taishan Zeng. Optimal $\mathcal{H}_2$ model reduction for large scale mimo systems via tangential interpolation. International Journal of Numerical Analysis and Modeling, 8(1):174–188, 2011. Cited on page 45.

Wei-Yong Yan and James Lam. An approximate approach to $\mathcal{H}_2$ optimal model reduction. IEEE Transactions on Automatic Control, 44(7):1341–1358, 1999. Cited on pages 43 and 44.

Kemin Zhou. Frequency-weighted $\mathcal{L}_\infty$ norm and optimal Hankel norm model reduction. IEEE Transactions on Automatic Control, 40(10):1687–1699, 1995. Cited on page 40.

Kemin Zhou, John C. Doyle, and Keith Glover. Robust and Optimal Control. Prentice-Hall, Inc., 1996. ISBN 0-13-456567-3. Cited on pages 10, 12, 13, 87, 103, and 122.

PhD Dissertations

Division of Automatic Control

Linköping University

M. Millnert: Identification and control of systems subject to abrupt changes. Thesis

No. 82, 1982. ISBN 91-7372-542-0.

A. J. M. van Overbeek: On-line structure selection for the identification of multivariable systems. Thesis No. 86, 1982. ISBN 91-7372-586-2.

B. Bengtsson:

593-5.

S. Ljung:

On some control problems for queues. Thesis No. 87, 1982. ISBN 91-7372-

Fast algorithms for integral equations and least squares identification problems.

Thesis No. 93, 1983. ISBN 91-7372-641-9.

H. Jonson: A Newton method for solving non-linear optimal control problems with general constraints. Thesis No. 104, 1983. ISBN 91-7372-718-0.

E. Trulsson: Adaptive control based on explicit criterion minimization. Thesis No. 106, 1983. ISBN 91-7372-728-8.

K. Nordström: Uncertainty, robustness and sensitivity reduction in the design of single input control systems. Thesis No. 162, 1987. ISBN 91-7870-170-8.

B. Wahlberg: On the identification and approximation of linear systems. Thesis No. 163, 1987. ISBN 91-7870-175-9.

S. Gunnarsson: Frequency domain aspects of modeling and control in adaptive systems. Thesis No. 194, 1988. ISBN 91-7870-380-8.

A. Isaksson: On system identification in one and two dimensions with signal processing applications. Thesis No. 196, 1988. ISBN 91-7870-383-2.

M. Viberg: Subspace fitting concepts in sensor array processing. Thesis No. 217, 1989. ISBN 91-7870-529-0.

K. Forsman: Constructive commutative algebra in nonlinear control theory. Thesis No. 261, 1991. ISBN 91-7870-827-3.

F. Gustafsson: Estimation of discrete parameters in linear systems. Thesis No. 271, 1992. ISBN 91-7870-876-1.

P. Nagy: Tools for knowledge-based signal processing with applications to system identification. Thesis No. 280, 1992. ISBN 91-7870-962-8.

T. Svensson: Mathematical tools and software for analysis and design of nonlinear control systems. Thesis No. 285, 1992. ISBN 91-7870-989-X.

S. Andersson: On dimension reduction in sensor array signal processing. Thesis No. 290, 1992. ISBN 91-7871-015-4.

H. Hjalmarsson: Aspects on incomplete modeling in system identification. Thesis No. 298, 1993. ISBN 91-7871-070-7.

I. Klein: Automatic synthesis of sequential control schemes. Thesis No. 305, 1993. ISBN 91-7871-090-1.

J.-E. Strömberg: A mode switching modelling philosophy. Thesis No. 353, 1994. ISBN 91-7871-430-3.

K. Wang Chen: Transformation and symbolic calculations in filtering and control. Thesis No. 361, 1994. ISBN 91-7871-467-2.

T. McKelvey: Identification of state-space models from time and frequency data. Thesis No. 380, 1995. ISBN 91-7871-531-8.

J. Sjöberg: Non-linear system identification with neural networks. Thesis No. 381, 1995. ISBN 91-7871-534-2.

R. Germundsson: Symbolic systems – theory, computation and applications. Thesis No. 389, 1995. ISBN 91-7871-578-4.

P. Pucar: Modeling and segmentation using multiple models. Thesis No. 405, 1995. ISBN 91-7871-627-6.

H. Fortell: Algebraic approaches to normal forms and zero dynamics. Thesis No. 407, 1995. ISBN 91-7871-629-2.

A. Helmersson: Methods for robust gain scheduling. Thesis No. 406, 1995. ISBN 91-7871-628-4.

P. Lindskog: Methods, algorithms and tools for system identification based on prior knowledge. Thesis No. 436, 1996. ISBN 91-7871-424-8.

J. Gunnarsson: Symbolic methods and tools for discrete event dynamic systems. Thesis No. 477, 1997. ISBN 91-7871-917-8.

M. Jirstrand: Constructive methods for inequality constraints in control. Thesis No. 527, 1998. ISBN 91-7219-187-2.

U. Forssell: Closed-loop identification: Methods, theory, and applications. Thesis No. 566, 1999. ISBN 91-7219-432-4.

A. Stenman: Model on demand: Algorithms, analysis and applications. Thesis No. 571, 1999. ISBN 91-7219-450-2.

N. Bergman: Recursive Bayesian estimation: Navigation and tracking applications. Thesis No. 579, 1999. ISBN 91-7219-473-1.

K. Edström: Switched bond graphs: Simulation and analysis. Thesis No. 586, 1999. ISBN 91-7219-493-6.

M. Larsson: Behavioral and structural model based approaches to discrete diagnosis. Thesis No. 608, 1999. ISBN 91-7219-615-5.

F. Gunnarsson: Power control in cellular radio systems: Analysis, design and estimation. Thesis No. 623, 2000. ISBN 91-7219-689-0.

V. Einarsson: Model checking methods for mode switching systems. Thesis No. 652, 2000. ISBN 91-7219-836-2.

M. Norrlöf: Iterative learning control: Analysis, design, and experiments. Thesis No. 653, 2000. ISBN 91-7219-837-0.

F. Tjärnström: Variance expressions and model reduction in system identification. Thesis No. 730, 2002. ISBN 91-7373-253-2.

J. Löfberg: Minimax approaches to robust model predictive control. Thesis No. 812, 2003. ISBN 91-7373-622-8.

J. Roll: Local and piecewise affine approaches to system identification. Thesis No. 802, 2003. ISBN 91-7373-608-2.

J. Elbornsson: Analysis, estimation and compensation of mismatch effects in A/D converters. Thesis No. 811, 2003. ISBN 91-7373-621-X.

O. Härkegård: Backstepping and control allocation with applications to flight control. Thesis No. 820, 2003. ISBN 91-7373-647-3.

R. Wallin: Optimization algorithms for system analysis and identification. Thesis No. 919, 2004. ISBN 91-85297-19-4.

D. Lindgren: Projection methods for classification and identification. Thesis No. 915, 2005. ISBN 91-85297-06-2.

R. Karlsson: Particle Filtering for Positioning and Tracking Applications. Thesis No. 924, 2005. ISBN 91-85297-34-8.

J. Jansson: Collision Avoidance Theory with Applications to Automotive Collision Mitigation. Thesis No. 950, 2005. ISBN 91-85299-45-6.

E. Geijer Lundin: Uplink Load in CDMA Cellular Radio Systems. Thesis No. 977, 2005. ISBN 91-85457-49-3.

M. Enqvist: Linear Models of Nonlinear Systems. Thesis No. 985, 2005. ISBN 91-85457-64-7.

T. B. Schön: Estimation of Nonlinear Dynamic Systems — Theory and Applications. Thesis No. 998, 2006. ISBN 91-85497-03-7.

I. Lind: Regressor and Structure Selection — Uses of ANOVA in System Identification. Thesis No. 1012, 2006. ISBN 91-85523-98-4.

J. Gillberg: Frequency Domain Identification of Continuous-Time Systems: Reconstruction and Robustness. Thesis No. 1031, 2006. ISBN 91-85523-34-8.

M. Gerdin: Identification and Estimation for Models Described by Differential-Algebraic Equations. Thesis No. 1046, 2006. ISBN 91-85643-87-4.

C. Grönwall: Ground Object Recognition using Laser Radar Data – Geometric Fitting, Performance Analysis, and Applications. Thesis No. 1055, 2006. ISBN 91-85643-53-X.

A. Eidehall: Tracking and threat assessment for automotive collision avoidance. Thesis No. 1066, 2007. ISBN 91-85643-10-6.

F. Eng: Non-Uniform Sampling in Statistical Signal Processing. Thesis No. 1082, 2007. ISBN 978-91-85715-49-7.

E. Wernholt: Multivariable Frequency-Domain Identification of Industrial Robots. Thesis No. 1138, 2007. ISBN 978-91-85895-72-4.

D. Axehill: Integer Quadratic Programming for Control and Communication. Thesis No. 1158, 2008. ISBN 978-91-85523-03-0.

G. Hendeby: Performance and Implementation Aspects of Nonlinear Filtering. Thesis No. 1161, 2008. ISBN 978-91-7393-979-9.

J. Sjöberg: Optimal Control and Model Reduction of Nonlinear DAE Models. Thesis No. 1166, 2008. ISBN 978-91-7393-964-5.

D. Törnqvist: Estimation and Detection with Applications to Navigation. Thesis No. 1216, 2008. ISBN 978-91-7393-785-6.

P-J. Nordlund: Efficient Estimation and Detection Methods for Airborne Applications. Thesis No. 1231, 2008. ISBN 978-91-7393-720-7.

H. Tidefelt: Differential-algebraic equations and matrix-valued singular perturbation. Thesis No. 1292, 2009. ISBN 978-91-7393-479-4.

H. Ohlsson: Regularization for Sparseness and Smoothness — Applications in System Identification and Signal Processing. Thesis No. 1351, 2010. ISBN 978-91-7393-287-5.

S. Moberg: Modeling and Control of Flexible Manipulators. Thesis No. 1349, 2010. ISBN 978-91-7393-289-9.

J. Wallén: Estimation-based iterative learning control. Thesis No. 1358, 2011. ISBN 978-91-7393-255-4.

J. Hol: Sensor Fusion and Calibration of Inertial Sensors, Vision, Ultra-Wideband and GPS. Thesis No. 1368, 2011. ISBN 978-91-7393-197-7.

D. Ankelhed: On the Design of Low Order H-infinity Controllers. Thesis No. 1371, 2011. ISBN 978-91-7393-157-1.

C. Lundquist: Sensor Fusion for Automotive Applications. Thesis No. 1409, 2011. ISBN 978-91-7393-023-9.

P. Skoglar: Tracking and Planning for Surveillance Applications. Thesis No. 1432, 2012. ISBN 978-91-7519-941-2.

K. Granström: Extended target tracking using PHD filters. Thesis No. 1476, 2012. ISBN 978-91-7519-796-8.

C. Lyzell: Structural Reformulations in System Identification. Thesis No. 1475, 2012. ISBN 978-91-7519-800-2.

J. Callmer: Autonomous Localization in Unknown Environments. Thesis No. 1520, 2013. ISBN 978-91-7519-620-6.
