Linköping studies in science and technology. Dissertations.
No. 1292
Differential-algebraic equations and
matrix-valued singular perturbation
Henrik Tidefelt
Department of Electrical Engineering
Linköping University, SE–581 83 Linköping, Sweden
Linköping 2009
Cover illustration: Stereo pair showing the entries of a sampled uncertain lti dae of nominal index 2 in its canonical form, displayed as $\left[\begin{smallmatrix} A \\ E \end{smallmatrix}\right]$. The uniform distributions correspond to the intervals in table 7.3, signs were ignored, and values transformed by the map $x \mapsto \frac{1}{2}( |x| + |x|^{1/7} )$ to enhance resolution near 0 and 1. Sampled values are encoded both as area of markers and as height in the image. Left eye's view to the left, right eye's view to the right.
Linköping studies in science and technology. Dissertations.
No. 1292
Differential-algebraic equations and matrix-valued singular perturbation
Henrik Tidefelt
[email protected]
www.control.isy.liu.se
Division of Automatic Control
Department of Electrical Engineering
Linköping University
SE–581 83 Linköping
Sweden
ISBN 978-91-7393-479-4
ISSN 0345-7524
Copyright © 2009 Henrik Tidefelt
Printed by LiU-Tryck, Linköping, Sweden 2009
To Nina
Abstract
With the arrival of modern component-based modeling tools for dynamic systems, the differential-algebraic equation form is increasing in popularity as it is general enough to handle the resulting models. However, if uncertainty is allowed in the equations, no matter how small, this thesis stresses that such equations generally become ill-posed. Rather than deeming the general differential-algebraic structure useless up front for this reason, the suggested approach to the problem is to ask what assumptions can be made in order to obtain well-posedness. Here, well-posedness is used in the sense that the uncertainty in the solutions should tend to zero as the uncertainty in the equations tends to zero.
The main theme of the thesis is to analyze how the uncertainty in the solution to a differential-algebraic equation depends on the uncertainty in the equation. In particular, uncertainty in the leading matrix of linear differential-algebraic equations leads to a new kind of singular perturbation, which is referred to as matrix-valued singular perturbation. Though a natural extension of existing types of singular perturbation problems, this topic has not been studied in the past. As it turns out that assumptions about the equations have to be made in order to obtain well-posedness, it is stressed that the assumptions should be selected carefully in order to be realistic to use in applications. Hence, it is suggested that any assumptions (not counting properties which can be checked by inspection of the uncertain equations) should be formulated in terms of coordinate-free system properties. In the thesis, the location of system poles has been the chosen target for assumptions.
Three chapters are devoted to the study of uncertain differential-algebraic equations and the associated matrix-valued singular perturbation problems. Only linear equations without forcing function are considered. For both time-invariant and time-varying equations of nominal differentiation index 1, the solutions are shown to converge as the uncertainties tend to zero. For time-invariant equations of nominal index 2, convergence has not been shown to occur except for an academic example. However, the thesis contains other results for this type of equations, including the derivation of a canonical form for the uncertain equations.
While uncertainty in differential-algebraic equations has been studied in depth, two related topics have been studied in passing.
One chapter considers the development of point-mass filters for state estimation on
manifolds. The highlight is a novel framework for general algorithm development
with manifold-valued variables. The connection to differential-algebraic equations is that one of their characteristics is the underlying manifold structure imposed on the solution.
One chapter presents a new index closely related to the strangeness index of a
differential-algebraic equation. Basic properties of the strangeness index are shown
to be valid also for the new index. The definition of the new index is conceptually
simpler than that of the strangeness index, hence making it potentially better suited
for both practical applications and theoretical developments.
Populärvetenskaplig sammanfattning (Popular Scientific Summary)
The thesis is mainly about computing how uncertainty in so-called differential-algebraic equations affects the uncertainty in the solutions of the equations. By studying problems that allow uncertainties with less structure compared to earlier research, the problem quickly leads on to the study of a new class of singular perturbation problems, here called matrix-valued singular perturbation problems.
Besides being a tool for understanding uncertainty in the solutions of equations with uncertainty, the analysis in the thesis aims to create tools that can be used also for problems without uncertainty. As a first example of such problems, one can mention equations formulated in symbolic software for differential-algebraic equations, where the software cannot always be trusted to prove that a certain expression will be zero along the solution trajectory of the equation. It can then be advantageous to be able to regard the expression as an uncertain value near zero. As a second example of such problems, one can mention time-dependent differential-algebraic equations where the leading matrix, depending continuously on time, loses rank at a certain point in time. It can then be advantageous to be able to approximate the leading matrix by another one which has the lower rank in a whole interval of time points shortly before and shortly after the time point where the true rank is lower.
Besides the results concerning uncertainty in differential-algebraic equations, the thesis contains a chapter with results about estimating, from uncertain measurements, an unknown variable that belongs to a manifold (a sphere is used as an example). The proposed method is based on partitioning the manifold into small pieces and computing the probability that the variable is located in each piece. The method as such is not new; rather, the focus is on proposing a framework for algorithms for this type of problem. Problems with manifold structure appear regularly in connection with differential-algebraic equations.
Another chapter of the thesis concerns a new so-called index concept for differential-algebraic equations. The new index is closely related to another well-established index, but is defined in a simpler way. The new index may be of value both in itself and as a way to shed light on that which is well established.
Acknowledgments
My thanks to Professor Lennart Ljung, head of the Division of Automatic Control,
for generously allowing me to conduct research in his group. Lennart has been my
co-supervisor, and my work would only be half-finished by now if it was not for his
efforts to make me complete it. My thanks also to Professor Torkel Glad for being my
supervisor; I'll get back to his name soon. Ulla Salaneck, secretary at the group, is a
Swiss army knife capable of solving any practical issue you can think of. Everybody
knows this, but what is probably less known is that she is also very good at poking
Lennart when it’s about time to prod slow students into finishing their theses.
Johan Sjöberg has been an important source of inspiration for the work on differential-algebraic equations, in particular the strangeness index. Marcus Gerdin was also
there with experienced advice when it all started.
Gustaf Hendeby has served the group with technical expertise in many areas related to computer software, including being the LaTeX guru for many years. He developed the rtthesis class used to typeset this thesis, and taught me enough about LaTeX so that I was able to tweak the class to my own taste. Gustaf also helped out with proofreading. Martin Enqvist and Daniel Petersson contributed with outstandingly thorough proofreading. Thanks go to Thomas Schön for letting me work with him in the popular field of state estimation. As you, my dear reader, might already have guessed,
Torkel has been involved in proofreading most of the chapters. Those of you who
know him can easily imagine the value of this contribution to a thesis in the region
of automatic control where there is only little connection to reality. Umut Orguner
in the office on the opposite side of the corridor knows too much, so everyone asks
him all the questions, and he never refuses to answer.
Christian Lyzell brings his great attitude to work, is a hobby hard core Guitar Hero
guru, and has also provided valuable feedback on mpscatter, the Matlab toolbox
used to create most plots in this thesis. I also like our discussions on numerical maths,
even though it isn’t really related to my research. Talking about good discussions,
Daniel Petersson deserves a second mention for his interest in and many solutions to
a long list of mathematical problems I’ve had during the last years here.
There are many things I like to do in my spare time, and I’m very happy to have
had so many nice persons in the group to share my interests with. I’m thinking of
surfing waves and wind, nightly work on Shapes, gathering people for eating and
play, hiking and kayaking, disc golf, climbing, coffee and lunch breaks, and more.
It’s been great, and I hope that completing this thesis is not the end of it!
I am indebted to the Swedish Research Council for financial support of this work.
Nina, you make me ☺ and laugh. In addition, your contribution to this thesis as the
excellent chef behind it all has been worth a lot, so many thanks to you too. I’m sad
that all the writing has prevented me so much lately from sharing my time with you,
but I’m yours now!
Linköping, November 2009
Henrik Tidefelt
Contents

Notation

I Background

1 Introduction
  1.1 Differential-algebraic equations in automatic control
  1.2 Introduction to matrix-valued singular perturbation
    1.2.1 Linear time-invariant examples
    1.2.2 Application to quasilinear shuffling
    1.2.3 A missing piece in singular perturbation of ode
    1.2.4 How to approach the nominal equations
    1.2.5 Final remarks
  1.3 Problem formulation
  1.4 Contributions
  1.5 Thesis outline
  1.6 Notation
    1.6.1 Mathematical notation
    1.6.2 dae and ode terminology

2 Theoretical background
  2.1 Models in automatic control
    2.1.1 Examples
    2.1.2 Use in estimation
    2.1.3 Use in control
    2.1.4 Model classes
    2.1.5 Model reduction
    2.1.6 Scaling
  2.2 Differential-algebraic equations
    2.2.1 Motivation
    2.2.2 Common forms
    2.2.3 Indices and their deduction
    2.2.4 Transformation to quasilinear form
    2.2.5 Structure algorithm
    2.2.6 lti dae, matrix pencils, and matrix pairs
    2.2.7 Initial conditions
    2.2.8 Numerical integration
    2.2.9 Existing software
  2.3 Initial condition response bounds
    2.3.1 lti ode
    2.3.2 ltv ode
    2.3.3 Uncertain lti ode
  2.4 Regular perturbation theory
    2.4.1 lti ode
    2.4.2 ltv ode
    2.4.3 Nonlinear ode
  2.5 Singular perturbation theory
    2.5.1 lti ode
    2.5.2 Generalizations of scalar singular perturbation
    2.5.3 Multiparameter singular perturbation
    2.5.4 Perturbation of dae
  2.6 Contraction mappings
  2.7 Interval analysis
  2.8 Gaussian elimination
  2.9 Miscellaneous results

3 Shuffling quasilinear dae
  3.1 Index reduction by shuffling
    3.1.1 The structure algorithm
    3.1.2 Quasilinear shuffling
    3.1.3 Time-invariant input affine systems
    3.1.4 Quasilinear structure algorithm
  3.2 Proposed algorithm
    3.2.1 Algorithm
    3.2.2 Zero tests
    3.2.3 Longevity
    3.2.4 Seminumerical twist
    3.2.5 Monitoring
    3.2.6 Sufficient conditions for correctness
  3.3 Consistent initialization
    3.3.1 Motivating example
    3.3.2 A bootstrap approach
    3.3.3 Comment
  3.4 Conclusions

II Results

4 Point-mass filtering on manifolds
  4.1 Introduction
  4.2 Background and related work
  4.3 Dynamic systems on manifolds
  4.4 Point-mass filter
    4.4.1 Point-mass distributions on a manifold
    4.4.2 Measurement update
    4.4.3 Time update in general
    4.4.4 Dynamics that simplify time update
  4.5 Point estimates
    4.5.1 Intrinsic point estimates
    4.5.2 Extrinsic point estimates
  4.6 Algorithm and implementation
    4.6.1 Base tessellations (of spheres)
    4.6.2 Software design
    4.6.3 Supporting software
  4.7 Example
  4.8 Conclusions and future work
  4.A Populating the spheres

5 A new index close to strangeness
  5.1 Two definitions
    5.1.1 Derivative array equations and the strangeness index
    5.1.2 Analysis based on the strangeness index
    5.1.3 The simplified strangeness index
  5.2 Relations
  5.3 Uniqueness and existence of solutions
  5.4 Implementation
    5.4.1 Computational complexity
    5.4.2 Notes from experiments
  5.5 Conclusions and future work

6 lti dae of nominal index 1
  6.1 Introduction
  6.2 Schematic overview of nominal index 1 analysis
  6.3 Decoupling transforms and initial conditions
  6.4 A matrix result
  6.5 An lti ode result
  6.6 The fast and uncertain subsystem
  6.7 The coupled system
  6.8 Extension to non-zero pointwise index
  6.9 Examples
  6.10 Conclusions
  6.A Details of proof of lemma 6.8

7 lti dae of nominal index 2
  7.1 Canonical form
    7.1.1 Derivation based on Weierstrass decomposition
    7.1.2 Derivation without use of Weierstrass decomposition
    7.1.3 Example
  7.2 Initial conditions
  7.3 Growth of eigenvalues
  7.4 Case study: a small system
    7.4.1 Eigenvalues
    7.4.2 Transition matrix
    7.4.3 Simultaneous consideration of initial conditions and transition matrix bounds
  7.5 Conclusions
  7.A Decoupling transforms
    7.A.1 Eliminating slow variables from uncertain dynamics
    7.A.2 Eliminating uncertain variables from slow dynamics
    7.A.3 Remarks on duality
  7.B Example data

8 ltv dae of nominal index 1
  8.1 Slowly varying systems
  8.2 Time-varying systems with timescale separation
    8.2.1 Overview
    8.2.2 Eliminating slow variables from uncertain dynamics
    8.2.3 Eliminating uncertain variables from slow dynamics
  8.3 Comparison with scalar perturbation
  8.4 The decoupled system
  8.5 Conclusions
  8.A Dynamics of related systems

9 Concluding remarks

A Sampling perturbations
  A.1 Time-invariant perturbations
  A.2 Time-varying perturbations

Bibliography

Index
Notation

These tables are provided as quickly accessible complements to the lengthier explanations of notation in section 1.6.

Some sets, manifolds, and groups

  N : Set of natural numbers.
  R : Set of real numbers.
  C : Set of complex numbers.
  $R^n$ : Set of n-tuples of real numbers, or n-dimensional Euclidean space.
  $S^n$ : The n-sphere, that is, the sphere of dimension n.
  SO(n) : Special orthogonal group of dimension n. SO(3) is the group of rigid body rotations.
  M : Standard manifold in chapter 4.
  $L_\nu$ : Solution set of $F^S_\nu( x, \dot{x}, \ldots, \dot{x}^{(\nu+1)}, t ) \overset{!}{=} 0$, in chapter 5.

Matrix properties

  λ( X ) : The set of eigenvalues of the matrix X.
  α( X ) : $\max\{ \operatorname{Re} \lambda : \lambda \in \lambda( X ) \}$
  $\lambda_{\min}( X )$ : $\min\{ |\lambda| : \lambda \in \lambda( X ) \}$
  $\lambda_{\max}( X )$ : $\max\{ |\lambda| : \lambda \in \lambda( X ) \}$
  max( X ) : $\max_{i,j} |X_{ij}|$
  $\|X\|_2$ : Induced 2-norm of X, $\sup_{u \neq 0} |X u| / |u|$.
  $\|X\|_I$ : $\sup_{t \in I} \|X(t)\|_2$, where I is a given interval of time.
  X ⪰ Y : X − Y is positive semidefinite.
  n : Matrix dimension, see section 1.6.1.

Basic functions and operators

  I : Identity matrix or the identity map.
  δ : Dirac delta "function", in chapter 4.
  $e^x$ : Exponential function evaluated at x.
  $e_p^{t v}$ : Exponential map based in p, evaluated at t v.
  |x| : Modulus (absolute value) of x if x is scalar, 2-norm of x if x is vector.
  ⌊x⌋ : Floor of x, that is, the largest integer not greater than x.
  ⌈x⌉ : Ceiling of x, that is, the smallest integer not less than x.
  d( x, y ) : Distance between x and y in induced Riemannian metric.
  X \ Y : Set difference, set of elements in X that are not in Y.
  ∂X : Boundary of the set X.
  • : Argument of function in bullet notation. Example: f( x, •, z ) = y ↦ f( x, y, z ).

Differentiation and shift operators

  x′ : Derivative of x, with x being a function of a single real argument. Avoid confusion with ẋ.
  x′(i) : Derivative of x of order i. Avoid confusion with ẋ(i).
  x′(i+) : Sequence or concatenation of x′(i), x′(i+1), ..., x′(ν+1), for some ν determined by context. Analogous definition for ẋ(i+).
  x′{i} : Sequence or concatenation of x, x′, ..., x′(i). Analogous definition for ẋ{i}.
  q : Shift operator for sequences. ( q x )( n ) = x( n + 1 ).
  ∇x : Gradient of x, see section 1.6.1.
  $\nabla_i x$ : Gradient of x with respect to argument number i, see section 1.6.1.
  $\nabla_{i+} x$ : Concatenated gradients of x with respect to all arguments starting from number i, see section 1.6.1.

Probability theory and filtering

  P( H ) : Probability of the event H.
  $f_x$ : Probability density function for stochastic variable x.
  $f_{x|y}$ : $f_{x|Y=y}$, with x and Y stochastic variables, and y a point.
  N( m, C ) : Gaussian distribution with mean m and covariance C.
  $\operatorname{Var}_X( x )$ : Variance of X at the point x, see (4.7).
  $y_{0..t}$ : Measurements up to time t, in chapter 4.
  $x_{s|t}$ : State estimate at time s given $y_{0..t}$, in chapter 4.

Ordo notation

  y = O( x ) : $\exists\, \delta > 0,\, k < \infty : |x| < \delta \implies |y| / |x| < k$
  y = O( $x^0$ ) : Exception! $\exists\, \delta > 0,\, k < \infty : |x| < \delta \implies |y| < k$
  y = o( x ) : $\lim_{x \to 0} |y| / |x| = 0$

Intervals

  [ a, b ] : { x ∈ R : a ≤ x ≤ b }
  ( a, b ) : { x ∈ R : a < x < b }
  ( a, b ] : { x ∈ R : a < x ≤ b }
  [ a, b ) : { x ∈ R : a ≤ x < b }

Logical operators

  P ∧ Q : Logical conjunction of P and Q, "P and Q".
  P ∨ Q : Logical disjunction of P and Q, "P or Q".

Abbreviations

  bdf : Backwards difference formula
  dae : Differential-algebraic equation(s)
  irk : Implicit Runge-Kutta
  lti : Linear time-invariant
  ltv : Linear time-varying
  ode : Ordinary differential equation(s)
Part I
Background

1 Introduction
This chapter gives an introduction to the thesis by explaining very briefly the field
in which it has been carried out, presenting the contributions in view of a problem
formulation, and giving some reading directions and explanations of notation.
1.1 Differential-algebraic equations in automatic control
This thesis has been carried out at the Division of Automatic Control, Linköping University, Sweden, within the research area nonlinear and hybrid systems. Differential-algebraic equations is one of a small number of research topics in this area. We shall
not dwell on whether these equations are particularly nonlinear or related to hybrid
systems; much of the research so far in this group has been on linear time-invariant
differential-algebraic equations, although there has lately been some research also on
differential-algebraic equations that are not linear. From here on, the abbreviation
dae will be used for differential-algebraic equation(s).
In the field of automatic control, various kinds of mathematical descriptions are used
to build models of the objects to be controlled. Sometimes, the equations are used
primarily to compute information about the object (estimation), sometimes the equations are used primarily to compute control inputs to the object (control), and often
both tasks are performed in combination. From the automatic control point of view
the dae are thus of interest due to their ability to model objects. Not only are they
able to model many objects, but in several situations they provide a very convenient
way of modeling these objects, as is further discussed in section 2.2. In practice, the
dae generally contain parameters that need to be estimated using measurements on
the object; this process is called identification.
In this thesis the concern is neither primarily with estimation, control, nor identification of objects modeled by dae. Rather, we focus on the more fundamental questions regarding how the equations relate to their solution in so-called initial value problems•. It is believed that this will be beneficial for future development of the other three tasks.

• The problem of computing the future trajectory of the variables given external inputs and sufficient information about the variables at the initial time.
1.2 Introduction to matrix-valued singular perturbation
Section 2.5 will give some background on scalar and multi-parameter singular perturbation problems, and in chapters 6, 7 and 8 methods from scalar singular perturbation theory (Kokotović et al., 1986) will play a key role in the theoretical development. In view of the problems encountered when analyzing dae under uncertainty,
we have coined the term matrix-valued singular perturbation to denote the generalization of the singular perturbation problems to the case when the uncertainties form
a whole matrix of small values.
For nonlinear time-invariant systems, the basic concern is systems in the form˜
$$\begin{aligned} x'(t) + g_x( x(t), z(t) ) &\overset{!}{=} 0 \\ E\, z'(t) + g_z( x(t), z(t) ) &\overset{!}{=} 0 \end{aligned} \tag{1.1}$$
where E is an unknown small matrix; max( E ) ≤ m. For time-varying systems, E is also allowed to be time-varying, and even more general nonlinear systems are obtained by allowing E to also depend on x(t). The problem is to analyze the solutions to the equations as m → 0. However, the nonlinear form (1.1) is much more general than the forms addressed in the thesis. Below, we give examples, clarifications, and further motivation for the study of matrix-valued singular perturbation.

˜ Note that the subscripts on g are just meaningful ornaments, and do not denote partial derivatives.
1.2.1 Linear time-invariant examples
The linear time-invariant (often abbreviated lti) examples below are typical in that
the equations are not given in matrix-valued singular perturbation form. Instead, the
form appears after some well-conditioned operations on the equations and changes
of variables, which will allow the solution to the original problem to be reconstructed
from the solution to the matrix-valued singular perturbation problem.
1.1 Example
Starting from an index 0 dae in two variables,
$$\begin{bmatrix} 1. & 7. \\ 1. & 3. \end{bmatrix} x'(t) + \begin{bmatrix} 3. & 2. \\ 2. & 1. \end{bmatrix} x(t) \overset{!}{=} 0$$
an index 1 dae in three variables is formed by making a copy of the second variable; $\bar{x}_1 = x_1$, $\bar{x}_2 = x_2$, $\bar{x}_3 = x_2$. In the leading matrix (in front of $\bar{x}'(t)$), the second variable
is replaced to 70% by the new variable. In the trailing matrix (in front of $\bar{x}(t)$) we add a row with the coefficients of the copy relation $\bar{x}_2(t) - \bar{x}_3(t) \overset{!}{=} 0$.
$$\underbrace{\begin{bmatrix} 1. & 2.1 & 4.9 \\ 1. & 0.9 & 2.1 \\ 0 & 0 & 0 \end{bmatrix}}_{E} \bar{x}'(t) + \underbrace{\begin{bmatrix} 3. & 2. & 0 \\ 2. & 1. & 0 \\ 0 & 1 & -1 \end{bmatrix}}_{A} \bar{x}(t) \overset{!}{=} 0 \tag{1.2}$$
To analyze this dae, it is noted that the leading matrix E is row reduced (its non-zero rows are linearly independent), and the row where the leading matrix has only zeros can be differentiated without introducing higher order derivatives. This leads to
$$\begin{bmatrix} 1. & 2.1 & 4.9 \\ 1. & 0.9 & 2.1 \\ 0 & 1 & -1 \end{bmatrix} \bar{x}'(t) + \begin{bmatrix} 3. & 2. & 0 \\ 2. & 1. & 0 \\ 0 & 0 & 0 \end{bmatrix} \bar{x}(t) \overset{!}{=} 0$$
where $\bar{x}'(t)$ can be solved for, so that an ode is obtained. In the terminology of dae,
index reduction has successfully revealed an underlying (implicit) ode.
Now, instead of performing index reduction on (1.2) directly, consider first applying the well-conditioned change of equations given by the matrix
$$T \coloneqq 4. \cdot \begin{bmatrix} 2. & -9. & 0. \\ 8. & -5. & 3. \\ 1. & -5. & 7. \end{bmatrix}^{-1}$$
It is natural to expect that this should not make a big difference to the difficulty in
solving the dae via an underlying ode, but when the computation is performed on a
computer, the picture is not quite as clear. The new dae has the matrices T E and T A.
By computing a QR factorization (using standard computer software) of the leading
matrix, a structurally upper triangular leading matrix was obtained together with an
orthogonal matrix Q associated with this form. The corresponding trailing matrix is
obtained as $Q^T T A$. This leads to
$$\begin{bmatrix} -0.62 & -0.95 & -2.2 \\ 0 & 0.62 & 1.4 \\ 0 & 0 & 3.4 \cdot 10^{-16} \end{bmatrix} \bar{x}'(t) + \begin{bmatrix} -1.6 & -0.53 & -0.41 \\ 0.51 & 0.56 & -0.048 \\ -7.2 \cdot 10^{-17} & 0.46 & -0.46 \end{bmatrix} \bar{x}(t) \overset{!}{=} 0$$
where a well-conditioned change of variables can bring the equations into the linear time-invariant form of (1.1) with $E \in \mathbb{R}^{1 \times 1}$. (One can just as easily construct examples where E is of dimension larger than 1 × 1.)
Although looking like an implicit ode, this view is unacceptable for two reasons.
First, the system of equations is extremely stiff. (Even worse, the fast mode happens
to be unstable this time, not at all like the original system.) Second, considering
numerical precision in hardware, it would not make sense to compute a solution that
depends so critically on a coefficient that is not distinctly non-zero.
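The computation in this example is easy to reproduce. The following minimal sketch (in Mathematica, the language in which the algorithms of this thesis are implemented; the variable names are chosen for this illustration only, and the exact tiny values will vary between software versions) carries out the change of equations and the QR factorization:

```mathematica
(* Data from example 1.1. *)
e = {{1., 2.1, 4.9}, {1., 0.9, 2.1}, {0., 0., 0.}};
a = {{3., 2., 0.}, {2., 1., 0.}, {0., 1., -1.}};
t = 4. Inverse[{{2., -9., 0.}, {8., -5., 3.}, {1., -5., 7.}}];
(* QRDecomposition returns {q, r} with t.e == ConjugateTranspose[q].r. *)
{q, r} = QRDecomposition[t.e];
r        (* upper triangular leading matrix; r[[3, 3]] is on the order of 10^-16 *)
q.(t.a)  (* corresponding trailing matrix *)
```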
The ad hoc solution to the problem in the example is to replace the tiny coefficient
in the leading matrix by zero, and then proceed as usual, but suppose ad hoc is not
good enough. How can one then determine if $3.4 \cdot 10^{-16}$ is sufficiently tiny, or just
looks tiny due to equation and variable scalings? What is the theoretical excuse for
the replacement of small numbers by zeros? What assumptions have to be made?
The next example suggests that the ill-posedness may be possible to deal with. The assumptions made here are deliberately chosen to be theoretically insufficient; the point is that making even the simplest assumptions seems to solve the problem. The example also contains some very preliminary observations regarding how to scale the equations in order to make it possible to make decisions based on the absolute size of the perturbations.
1.2 Example
Having equations in the form
$$E\, x'(t) + A\, x(t) \overset{!}{=} 0$$
modelling a two-timescale system (see section 2.5) where the slow dynamics is known to be stable, we now decide that unstable fast dynamics is unreasonable for the system at hand. In terms of assumptions, we assume that the fast dynamics of the system is stable. We then generate random perturbations in the equation coefficients that we need to replace by zero, discarding any instances of the equations that disagree with our assumption, and use standard software to solve the remaining instances. Two trailing matrices are used, given by selecting δ from $\{ 1, 10^{-2} \}$ in the pattern
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \coloneqq \begin{bmatrix} 0.29 & 0.17 & 0.046 \\ 0.34\,\delta & 0.66\,\delta & 0.66 \\ 0.87\,\delta & 0.83\,\delta & 0.14 \end{bmatrix}$$
0.87 δ 0.83 δ 0.14
and then scaling the last two rows so they get the same norm as the first row. In the
leading matrix,


  1. 1. 1. 


 E11 E12  
 B  0 ?11 ?12 
E = 

0 E22  
0 ?21 ?22 
it is the block E22 that will be instantiated with small random perturbations. As in the
previous example, the form of E is just a well-conditioned change of variables away
from the lti form of (1.1). In order to illustrate what happens when the perturbations
become smaller, the perturbations are generated such that max(E22 ) = m, for a few
values of m. To achieve this, an intermediate matrix T of the same dimensions as E22
is generated by sampling each entry from a uniform distribution over [ −1, 1 ], and
m
T.
then E22 B max(T
)
The example is chosen such that m = 0 yields a stable slow system. Thus the perturbations of interest are those that make all modes of the stiff system stable. The initial
conditions are chosen with x1 (0) = 1 and consistent with m = 0.
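The sampling of $E_{22}$ is straightforward to express in code. The sketch below (again Mathematica; the function name sampleE22 and the explicit dimension argument are inventions for this illustration) draws one instance with $\max( E_{22} ) = m$:

```mathematica
(* Draw a random E22 of dimension n with max(E22) == m, where max is the largest absolute entry. *)
sampleE22[m_, n_] := Module[{t = RandomReal[{-1, 1}, {n, n}]},
  m t / Max[Abs[t]]]
```

Instances whose fast dynamics turn out to be unstable would then be discarded, in accordance with the assumption made in the example.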
[Figure 1.1: Solutions for $x_1$ obtained by generating 50 random perturbations of given magnitudes. Details are given in the text. Left: A defined by δ = 1. Right: A defined by $\delta = 10^{-2}$. Top: $m = 1. \cdot 10^{-1}$. Middle: $m = 1. \cdot 10^{-3}$. Bottom: $m = 1. \cdot 10^{-5}$.]
Simulation results are shown in figure 1.1. By choosing a threshold for m based on visual appearance, the threshold can be related to δ. Finding that $1. \cdot 10^{-3}$ and $1. \cdot 10^{-5}$ could be reasonable choices for δ being 1 and $10^{-2}$, respectively, it is tempting to conclude that it would be wise to base the scaling of the last two rows on $A_{22}$ alone.
1.2.2 Application to quasilinear shuffling
In theory, index reduction of equations in the quasilinear form
$$E( x(t), t )\, x'(t) + A( x(t), t ) \overset{!}{=} 0 \tag{1.3}$$
is simple. Similar to how the linear time-invariant equations were analyzed in example 1.1, the equations are manipulated using invertible row operations so that the leading matrix becomes separated into one block which is completely zeroed, and one block with independent rows. The discovered non-differential equations are then differentiated, and the procedure is repeated until the leading matrix gets full rank. As examples of the practical ramifications of this in-theory description, consider the following list:
• It may be difficult to perform the row reduction in a numerically well-conditioned way.
• The produced equations may involve very big expressions.
• Testing whether an expression is zero is highly non-trivial.
The forthcoming discussion applies to the last of these ramifications. Typical examples in the literature have leading matrices whose rank is determined solely by
a zero-pattern. For instance, if some variable does not appear differentiated in any
equation, the corresponding column of the leading matrix will be structurally zero.
It is then easy to see that this column will remain zero after arbitrarily complex row
operations, so if the operations are chosen to create structural zeros in the other
columns at some row, it will follow that the whole row is structurally zero. Thus
a non-differential equation is revealed, and when differentiating this equation, the
presence of variables in the equation determines the structural zero-pattern of the
newly created row in the leading matrix, and so the index reduction may be continued.
Now, recall how the zero-pattern was lost by a seemingly harmless transform of the
equations in example 1.1. Another situation when linear dependence between rows
in the leading matrix is not visible in a zero-pattern, is when a user happens to write
down equations that are dependent up to available accuracy. It must be emphasized
here that available accuracy is often not a mere question of floating point number
representation in numerical hardware (as in our example), but a consequence of uncertainties in estimated model parameters.
In chapter 3, it is proposed that a numerical approach be taken to zero-testing whenever tracking of structural zeros does not hold the answer; an expression is then taken to be (rewritable to) zero if it evaluates to zero at some trial point. Clearly, a tolerance will have to be used in this test, and showing that a meaningful threshold even exists is one of the main topics in the thesis. When there are many entries in the leading matrix which need numerical evaluation at the same time, well-conditioned operations on the equations (row operations and a change of variables) lead to the form (1.1) where E contains all the small expressions, and generally depends on both x(t) and t.
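As a concrete illustration, a numerical zero test of this kind could look as follows (Mathematica; the function name zeroQ, the use of a single random trial point, and the tolerance argument are assumptions for this sketch, not the complete scheme of chapter 3):

```mathematica
(* Accept expr as (rewritable to) zero if it is within tol of zero at a random trial point. *)
zeroQ[expr_, vars_List, tol_] := Module[
  {trial = Thread[vars -> RandomReal[{-1, 1}, Length[vars]]]},
  Abs[expr /. trial] <= tol]

(* For instance, zeroQ[(x + y)^2 - x^2 - 2 x y - y^2, {x, y}, 10.^-10] yields True. *)
```

Whether a meaningful value of the tolerance even exists is, as noted above, one of the main topics in the thesis.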
1.2.3 A missing piece in singular perturbation of ode
Our final attempt to convince the reader that our topic is interesting is to remark that
matrix-valued singular perturbations are not only a delicate problem in the world of
dae. These problems are also interesting in their own right when the leading matrix
of a dae (or E in (1.1)) is known to be non-singular so that the dae is really just
an implicit ode. Then the matrix-valued singular perturbation problem is a natural
generalization of existing singular perturbation problems for ode (see section 2.5).
In the language of the thesis, we say that these equations are dae of pointwise index 0, and most of the singular perturbation results in the thesis will actually be
restricted to this type of equations.
1.2.4 How to approach the nominal equations
The perturbation results in the thesis are often formulated using $O( \max( E ) )$ (where E is the matrix-valued perturbation, and not the complete leading matrix) or formulated to apply as m → 0, where m is a bound on all uncertainties in the equations. It is natural to ask what the practical implications of such bounds really are, and there are several answers.
In some situations, the uncertainties in the equations will be given without any possibility of being reduced. In that case, our results ensure that we will be able to test the size of the uncertainties to see if they are small enough for the perturbation analysis to apply, and if they are, there will be a bound on the uncertainty in the solutions (or whatever other property the result at hand concerns). On the other hand, there is always the alternative to artificially increase uncertainty in the model. Increasing the uncertainty of a near-zero interval with large relative uncertainty, so that it includes zero, may sometimes be interpreted as model reduction, where very stiff equations are approximated by non-stiff equations.
In other situations, there may be a possibility to reduce the size of the uncertainties, typically at the cost of spending more resources of some kind. Then, our results may be interpreted as saying that if enough resources are spent, such and such a property can be obtained. Examples of how spending more resources may reduce uncertainty are given below.
• If the equations contain parameters that are the result of a system identification
procedure, uncertainty can often be reduced by using more estimation data or
by investing in more accurate sensors.
• If the uncertainty in the equations is due to uncertainties in floating point
arithmetic, a multi-precision library for floating point arithmetic (such as GMP
(2009)) may be used to reduce uncertainty.
• In a time-varying system (and hopefully non-linear systems in the future), where the matrix-valued singular perturbation problem arises when the leading matrix loses rank at some point, the size of the matrix-valued perturbation can be reduced by integrating the increasingly stiff lower-index equations closer to the point where the rank drops. Due to the increasing stiffness, it will be computationally costly to integrate the lower-index equations with adequate precision.
1.2.5 Final remarks
There is also another application of results on matrix-valued singular perturbation,
more closely related to the field of automatic control. This concerns the use of unstructured dae as models of dynamic systems; not until well-posedness of solutions
to such equations has been established does it make sense to consider problems such
as system identification or control for such models.
It should also be mentioned that we are not aware of any strong connection between
physical models from any field, and matrix-valued singular perturbation. Electrical
systems, for instance, have scalar quantities, and the singular perturbation problems
one encounters are generally of multiparameter type. A natural place to search for
matrix-valued singular perturbations would be inertia-matrices of rod-like objects.
However, these matrices are normal, and a norm-preserving linear (although typically uncertain) change of variables can be used to make the inertia-matrix diagonal.• Hence, only if one is unwilling to make use of the uncertain change of variables and ignores the normality constraint which is satisfied by all inertia matrices, will the matrix-valued singular perturbation be necessary to deal with in its full generality. Furthermore, even if a single rod would be handled as a matrix-valued singular perturbation, the dimension of the uncertain subsystem is just 1, so scalar singular perturbation techniques would apply. The rotation of point-like objects, on the other hand, does not have a non-trivial nominal solution, making also these objects unsuitable for demonstrations.

• More generally, this suggests that matrix-valued singular perturbation can be avoided in rigid body mechanics as long as the kinetic energy is a quadratic form in the time derivative of the generalized coordinates.
While we are not aware of physical modeling in any field which, if done carefully, would require the use of matrix-valued singular perturbation theory, we are aware that many models are not developed carefully enough to avoid matrix-valued singular perturbation problems. Matrix-valued singular perturbation theory needs to be developed to come to the rescue of these models, as well as of all algorithms and software which systematically produce such problems.
1.3 Problem formulation
The long term goal of the work in this thesis is a better understanding of uncertainty
in differential-algebraic equations used as models in automatic control and related
fields. While we were originally concerned with dae in the quasilinear form (1.3),
the questions arising regarding uncertainty in this form turned out to be unanswered
also for the much more restricted lti dae.
In order to understand the solutions of a dae one generally applies some kind of
index reduction scheme involving differentiation of the equations with respect to
time. One of the more recent approaches to analysis of general nonlinear dae centers
around the strangeness index, and one of the problems considered in the thesis is that
a better understanding of this analysis is needed in order to even see the structure of
the associated perturbation problems arising from the uncertainty in the equations.
The main problem addressed in the thesis is related to the less sophisticated index
reduction schemes associated with the differentiation index and the shuffle algorithm. Here the perturbation problems turn out to be readily transformable into the
matrix-valued singular perturbation form, and we ask how these problems can be
approached, what qualitative properties they possess, and how the relation between
uncertainty in the equations and the uncertainty in the solution may be quantified.
Another problem considered in the thesis, related to differential-algebraic equations
used as models in automatic control, is how to develop geometrically sound algorithms with manifold-valued variables.
1.4 Contributions
The main contributions in this thesis are, in approximate order of appearance:
• Introduction of the so-called matrix-valued singular perturbation problem as a
natural generalization of existing singular perturbation problem classes, with
applications to uncertainty and approximation in differential-algebraic equations.
• An application related to modeling with differential-algebraic equations: point-mass filtering on manifolds.
• The proposed simplified strangeness index along with basic properties and its
relation to the closely related strangeness index.
• Extension of previous perturbation results for linear time-invariant differential-algebraic equations of nominal index 1, introducing assumptions about eigenvalues as the main tool to obtain convergence.
• A canonical form for uncertain matrix pairs of nominal index 2.
• Generalizations of some of the linear time-invariant perturbation results from
nominal index 1 to nominal index 2.
• Perturbation results for linear time-varying differential-algebraic equations of
nominal index 1.
1.5 Thesis outline
The thesis is divided into two parts: theoretical background (first) and new results (second).
Some notation is explained in the next section, completing the first chapter in the first part. Most readers will probably find it worthwhile to skim through that section before proceeding to later chapters. The theoretical background of the thesis is, with very few exceptions, given in chapter 2. When exceptions are made in the second part of the thesis, this will be made clear so that there is no risk of confusion with new results. Chapter 3 contains material from the author's licentiate thesis Tidefelt (2007), and is included in the present thesis mainly to show the connection between nonlinear systems and the matrix-valued singular perturbation results for linear systems in the second part of the thesis. Readers interested in index reduction of quasilinear dae may find some of the ideas in the chapter interesting, but the chapter is put in the first part of the thesis since the seminumerical schemes it proposes will remain incomplete until the related singular perturbation problems are better understood. Other readers may safely skip chapter 3.
Turning to the second part, the first two chapters are only loosely related to the title of the thesis. Chapter 4 presents a state estimation technique with potential application to systems described by differential-algebraic equations. Then, chapter 5 proposes a new index concept which is closely related to the strangeness index, but unlike the index reduction scheme of chapter 3, the structure of the perturbation problems associated with the strangeness-like indices is not yet analyzed. Hence, it is not clear whether the results on matrix-valued singular perturbation in the following three chapters will find applications in solution techniques related to the strangeness index.
Chapter 6 extends the early results on matrix-valued singular perturbation that appeared in Tidefelt (2007). These results apply to lti dae of nominal index 1, and
lti dae of nominal index 2 are considered in chapter 7. In chapter 8 some of the
nominal index 1 results are extended to time-varying equations.
Chapter 9 contains conclusions and directions for future research.
1.6 Notation
The present section introduces basic terminology and notation used throughout the thesis. Not all the terminology is defined here, though. Abbreviations and some symbols are defined in the tables in the Notation chapter, and the reader will find references to all definitions (including those given here) in the subject index at the end of the thesis, after the bibliography.
1.6.1 Mathematical notation
The terms and factors of sums and products over index sets have unlimited extent to the right. For example, $( \prod_i |\lambda_i| ) + 1 \neq \prod_i |\lambda_i| + 1 = \prod_i ( |\lambda_i| + 1 )$.

If α is a scalar, Σ is a set of scalars, and ∼ is a relation between scalars, then α ∼ Σ (or Σ ∼ α) means $\forall\, \sigma \in \Sigma : \alpha \sim \sigma$ (or $\forall\, \sigma \in \Sigma : \sigma \sim \alpha$). For instance, Re λ( X ) < 0 means that all eigenvalues of X have negative real parts. In the example, we also used that functions automatically map over sets (in Mathematica, the function is said to thread over its argument) if there is no ambiguity.

The symbol $\overset{!}{=}$ is used to indicate an equality that shall be thought of as an equation. Compare this to the plain =, which is used to indicate that expressions are equal in the sense that one can be rewritten as the other, possibly using context-dependent assumptions. For example, assuming x ≥ 0, we may write $\sqrt{x^2} = x$.
The symbol ≔ is used to introduce names for values or expressions. The meaning of expressions can be defined using the symbol $\overset{\triangle}{=}$. Note that the difference between $f \coloneqq x \mapsto x^2$ and $f( x ) \overset{\triangle}{=} x^2$ is mainly conceptual; in many contexts both would work equally well.
If x is a function of one variable (typically thought of as time), the derivative of x with respect to its only argument is written x′. The composed symbol ẋ shall be used to denote a function which is independent of x, but intended to coincide with x′. For example, in numeric integration of $x'' \overset{!}{=} u$, where u is a forcing function, we write the ordinary differential equation as
$$\begin{cases} \dot{x}' = u \\ x' = \dot{x} \end{cases}$$
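To make the convention concrete, the rewrite can be handed directly to a numerical integrator. The following minimal sketch (Mathematica; the forcing function u = Sin, the time span, and the initial conditions are assumptions for this illustration) uses the symbol xd in the role of ẋ:

```mathematica
(* Integrate x'' == u via the first-order rewrite; xd plays the role of the dotted symbol. *)
sol = NDSolve[{xd'[t] == Sin[t], x'[t] == xd[t], x[0] == 0, xd[0] == 0},
  {x, xd}, {t, 0, 6}];
```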
Higher order derivatives are denoted x′′, x′(3), ..., or ẍ, ẋ(3), .... When the highest order of dots, say ẋ(ν+1), is determined by context, ẋ(i+) is a shorthand for the sequence or concatenation of ẋ(i), ..., ẋ(ν+1). Conversely, the sequence or concatenation of x, x′, ..., x′(i) is denoted x′{i}, and we define ẋ{i} analogously. Making the distinction between x′ and ẋ this way, and not the other way around, is partly for consistency with the syntax of the Mathematica language, in which our algorithms are implemented.
Gradients (Jacobians) are written using the operator ∇. For example, ∇f is the gradient (Jacobian) of f, assuming f takes one vector-valued argument. If a function takes several arguments, a subscript on the operator is used to denote with respect to which argument the gradient is computed. For example, if f is a function of 3 arguments, then
$$\nabla_2 f = ( x, y, z ) \mapsto \nabla( w \mapsto f( x, w, z ) )( y )$$
The notation $\nabla_{i+}$ is used to denote concatenated gradients with respect to all arguments starting from number i. For example, with f as before
$$\nabla_{2+} f = ( x, y, z ) \mapsto \begin{bmatrix} \nabla( w \mapsto f( x, w, z ) )( y ) & \nabla( w \mapsto f( x, y, w ) )( z ) \end{bmatrix}$$
Bullet notation is used for compact notation of functions of one unnamed argument. The expression which becomes the "body" of the function is the smallest complete expression containing the bullet. For example, let the first argument of f be real, then
$$f( \bullet, y, z )'( x ) = \nabla_1 f( x, y, z )$$
For a time series $( x_n )_n$, the forward shift operator q is defined as $q x_n \overset{\triangle}{=} x_{n+1}$.
Matrices are constructed within square brackets. Vectors are constructed by vertical alignment within parentheses. A single row of a matrix, thought of as an object with only one index, is constructed by horizontal alignment within parentheses. If a column of a matrix is thought of as having only one index, it is constructed using the same notation as a vector. There is no distinction in notation between contravariant and covariant vectors. (Square brackets are, however, also used sometimes in the same way as parentheses are used to indicate grouping in equations.) Square brackets and parentheses are also used with two real scalars separated by a comma to denote intervals of real numbers, see the notation tables in the Notation chapter. Tuples (lists) are also denoted using parentheses and elements separated by commas, but it will be clear from the context when ( a, b ) is an open interval and when it is a two-tuple.
If the variable n has no other meaning in the current context, but there is a square matrix that can be associated with the current context, then n denotes the dimension of this matrix.

1.6.2 dae and ode terminology
In accordance with most literature on this subject, equations not involving differentiated variables will often be denoted algebraic equations, although non-differential equations, a better notation from a mathematical point of view, will also be used interchangeably.•

• Seeking a notation which is both short and not misleading, the author would prefer static equations, but this notation is avoided to make the text more accessible.

The quasilinear form of dae has already been introduced, repeated here,
$$E( x(t), t )\, x'(t) + A( x(t), t ) \overset{!}{=} 0 \tag{1.3}$$
The matrix-valued function E which determines the coefficients for the differentiated variables, as well as the expression E( x(t), t ), will be referred to as the leading matrix. This terminology is also used for the important subtypes of quasilinear dae being the linear dae, see below. The function A as well as the expression A( x(t), t ) will be referred to as the algebraic term.˜ This terminology will only be used when the algebraic term is not affine in x(t), for otherwise the terminology of linear dae is more precise. This brings us to the linear dae.

˜ By this definition, the algebraic term with reversed sign is sometimes referred to as the right hand side of the quasilinear dae.
An autonomous lti dae has the form
$$E\, x'(t) + A\, x(t) \overset{!}{=} 0 \tag{1.4}$$
where E and A are constant matrices. By autonomous, we mean that there is no way external inputs can enter this equation, so the system evolves in a way completely defined by its initial conditions. Adding a forcing function (often representing external inputs) while maintaining the lti property— leads to the general lti dae form
$$E\, x'(t) + A\, x(t) + B\, u(t) \overset{!}{=} 0 \tag{1.5}$$
where u is a vector-valued function representing external inputs to the model, and B is a constant matrix.–

— The solution will be linear in initial conditions (regardless of the initial time) if the forcing function is zero, and linear in the forcing function (regardless of the initial time, if the forcing function is suitably time-shifted) if the initial conditions are zero.

– In the terminology of quasilinear dae, the expression A x(t) + B u(t) would constitute the algebraic term here. However, it is affine in x(t) so we prefer to use the more specific terminology of linear dae.
In the linear dae (1.5) and (1.4), the matrix A of coefficients for the non-differentiated variables is denoted the trailing matrix.† It may be a function of time, if the linear dae is time-varying.

† This terminology may seem in analogy with the term leading matrix. However, the reason why the leading matrix has received its name is unknown to the author, and trailing matrix was invented for the thesis to avoid ambiguity with the state feedback matrix, so there is no common source of analogy. Rather, the term trailing matrix appears natural in view of the leading matrix being the matrix which is listed first in a matrix pair, see section 2.2.6.
To complement the terminology that has been introduced for dae, we shall introduce
some corresponding terminology for ode. In the nonlinear ode (sometimes written
with "=" instead of "$\overset{!}{=}$" to stress that the differentiated variables are trivial to solve for)
$$x'(t) \overset{!}{=} f( x(t), t ) \tag{1.6}$$
the function f as well as the expression f( x(t), t ) are called the right hand side of the ode. If f is affine in its first argument, that is,
$$f( x, t ) = M( t )\, x + b( t )$$
the matrix M as well as the expression M( t ) are called the state feedback matrix.•

• This notation is borrowed from Kailath (1980). We hereby avoid the perhaps more commonly used notation system matrix, because of the other, yet related, meanings this term also bears.
When an ode or a dae has a term which only depends on time, such as b(t) here,
this term will be denoted the forcing term of the equation. Often, the forcing term
is in the form b(t) = β( u(t) ), where the function β is considered fixed, while u is
considered an unknown external input to the equation. The function u is then denoted the forcing function or input to the equation. If b(t) is linear in u(t), that is,
b(t) = B(t) u(t) where B(t) is a matrix, then this matrix as well as B are called the
input matrix of the equation. In case of ode, this leads to the ltv ode form
    x'(t) \overset{!}{=} M(t) x(t) + B(t) u(t)        (1.7)
and the lti ode form
    x'(t) \overset{!}{=} M x(t) + B u(t)        (1.8)
When f does not depend on its second argument, the ode is said to be time-invariant˜. The autonomous counterparts of (1.7) and (1.8) are hence obtained by setting u ≔ 0.—
1.3 Example
As an example of the notation, note that if E in (1.5) is non-singular, then there is a corresponding ode in x with state feedback matrix −E^{−1}A. Since the term input matrix is being used both for dae and ode, care must be taken when using the term in a context where a system is being represented both as a dae and an ode; the input matrix of the ode here would be −E^{−1}B, while the input matrix of the dae (1.5) is B.
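As a concrete illustration of this bookkeeping, the following minimal sketch (not from the thesis; the matrices are made up) computes the state feedback matrix and the ode input matrix from an lti dae with non-singular E:

    import numpy as np

    # Made-up lti dae E x'(t) + A x(t) + B u(t) = 0 with non-singular E.
    E = np.array([[2.0, 0.0], [0.0, 1.0]])
    A = np.array([[1.0, -1.0], [0.0, 3.0]])
    B = np.array([[1.0], [2.0]])

    M = -np.linalg.solve(E, A)     # state feedback matrix -E^{-1} A of the ode
    Bode = -np.linalg.solve(E, B)  # input matrix -E^{-1} B of the ode
    print(M, Bode)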
For dae, the autonomous ltv form is

    E( t ) x'(t) + A( t ) x(t) \overset{!}{=} 0        (1.9)

and the general ltv dae form with forcing function is

    E( t ) x'(t) + A( t ) x(t) + B( t ) u(t) \overset{!}{=} 0        (1.10)
• This notation is borrowed from Kailath (1980). We hereby avoid the perhaps more commonly used notation system matrix, because of the other — yet related — meanings this term also bears.
˜ While this terminology is widely used in the automatic control community, mathematicians tend to denote the ode autonomous rather than time-invariant. Our use of autonomous indicates the absence of a forcing term in the equations, and is only used with equation forms where there is a natural counterpart with forcing function.
— Note that our definition of autonomous linear time-varying ode is not autonomous in the sense often used by mathematicians. For linear time-invariant ode, however, the two uses of autonomous are compatible.
While the solution x to the ode is referred to as the state vector or just the state of
the ode, the elements of the solution x to the dae are referred to as the variables of
the dae.
A dae is denoted square if the number of equations and variables match. When
a set of equations characterizing the solution manifold has been derived, these are
sometimes completed with differential equations so that a square dae of strangeness
index 0 is obtained. This dae will then be referred to as the reduced equation.
By an initial value problem we refer to the problem of computing trajectories of the variables of a dae (or ode), over an interval [ t_0, t_1 ], given sufficient information about the variables and their derivatives at time t_0.
2 Theoretical background
The intended audience of this thesis is not expected to have prior experience with
both automatic control and differential-algebraic equations. For those without background in automatic control, we start the chapter in section 2.1 by providing general
motivation for why we study equations, and dae in particular. For those with background in automatic control, but with only very limited experience with dae, we try
to fill that gap in section 2.2.
The remaining sections of the chapter present other theoretical background material
that will be used in later chapters. To keep it clear what the contributions of the
thesis are, there are just a few exceptions (most notably in chapter 5, as is explained
in the introduction to that chapter) to the rule that existing results used in the second
part of the thesis are presented here.
2.1 Models in automatic control
Automatic control tasks are often solved by engineers without explicit mathematical
models of the controlled or estimated object. For instance, a simple low pass filter
may be used to get rid of measurement noise on the signal from a sensor, and this can
work well even without saying “Assume that the correct measurement is distorted by zero mean additive high frequency noise”. Spelling out that phrase would express
the use of a simple model of the sensor (whether it could be called mathematical is a
matter of taste). As another example, many processes in industry are controlled by a
so-called pid controller, which has a small number of parameters that can be tuned to
obtain good performance. Often, these parameters are set manually by a person with
experience of how these parameters relate to production performance, and this can
be done without awareness of mathematical models. Most advances in control and
estimation theory do, however, build on the assumption that a more or less accurate
mathematical model of the object is available, and how such models may be used,
simplified, and tuned for good numerical properties is the subject of this section.
2.1.1 Examples
The model of the sensor above was only expressed in words. Our first example of a
mathematical model will be to say the same thing with equations. Since equations
are typically more precise than words, we will lose some of the generality, a price we
are often willing to pay to get to the equations which we need to be able to apply our
favorite methods for estimation and/or control. Denote, at time t, the measurement
by y(t), the true value by x(t), and let e be a white noise• source with variance σ². Let v(t) be an internal variable of our model:

    y(t) \overset{!}{=} x(t) + v(t)        (2.1a)
    v(t) + v'(t) \overset{!}{=} e'(t)        (2.1b)
A drawback of using a precise model like this is that our methods may depend too heavily on this being the correct model; we need to be aware of how sensitive our methods are to errors in the mathematical model. Imagine, for instance, that we build a device that can remove disturbances at 50 Hz caused by the electric power supply. If this device is too good at this, it will be useless if we move to a country where the alternating current frequency is 60 Hz, and will even destroy information of good
quality at 50 Hz. The model (2.1) is often written more conveniently in the Laplace
transform domain, which is possible since the differential equations are linear:
    Y(s) \overset{!}{=} X(s) + V(s)        (2.2a)
    V(s) \overset{!}{=} \frac{s}{1+s} E(s)        (2.2b)
Here, the factor s/( 1 + s ) is often referred to as a filter; the white noise is turned into high frequency noise by sending it through the filter.
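As a numerical illustration (not from the thesis), the filtering interpretation can be mimicked with scipy; note that the sampled pseudo-random sequence below is only a crude stand-in for continuous-time white noise.

    import numpy as np
    from scipy import signal

    # The filter s / (1 + s), driven by a crude stand-in for white noise.
    filt = signal.TransferFunction([1.0, 0.0], [1.0, 1.0])
    t = np.linspace(0.0, 10.0, 1001)
    e = np.random.default_rng(0).standard_normal(t.size)
    tout, v, _ = signal.lsim(filt, U=e, T=t)  # v plays the role of v(t) in (2.1)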
As a second example of a mathematical model we consider a laboratory process often
used in basic courses in automatic control. The process consists of a cylindrical water
tank, with a drain at the bottom. Water can be pumped from a reservoir to the tank,
and the drain leads water back to the reservoir. There is also a gauge that senses the
level of water in the tank. The task for the student is to control the level of water
in the tank, and what makes the task interesting is that the flow of water through
the drain varies with the level of water; the larger the level of water, the higher the
flow. Limited performance can be achieved using, for instance, a manually tuned
pid controller, but to get good performance at different desired levels of water, a
model-based controller is the natural choice. Let x denote the level of water, and
u the flow we demand from the pump. A common approximation is that the flow
through the drain is proportional to the square root of the level of water. Denote the
corresponding constant c_d, and let the constant relating the flow of water to the time derivative of x (that is, this constant is the inverse of the bottom area of the tank) be denoted c_a. Then we get the following mathematical model with two parameters to be determined from some kind of experiment:

    x'(t) = c_a ( u(t) − c_d \sqrt{x(t)} )        (2.3)

• White noise and how it is used in the example models is a non-trivial subject, but to read this chapter it should suffice to know that white noise is a concept which is often used as a building block of more sophisticated models of noise.
The constant ca could be determined by plugging the drain, adding a known volume
of water to the tank, and measuring the resulting level. The other constant can also
be determined from simple experiments.
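For readers who want to experiment, here is a minimal simulation sketch of (2.3); the parameter values and the demanded flow are made up for illustration.

    import numpy as np
    from scipy.integrate import solve_ivp

    ca, cd = 0.5, 1.0                 # made-up parameter values
    u = lambda t: 0.8                 # constant demanded flow

    def tank(t, x):
        # x'(t) = ca * ( u(t) - cd * sqrt(x(t)) ), see (2.3)
        return [ca * (u(t) - cd * np.sqrt(max(x[0], 0.0)))]

    sol = solve_ivp(tank, (0.0, 30.0), [0.1], max_step=0.1)
    # The level settles where u = cd * sqrt(x), here x = (0.8 / 1.0)**2 = 0.64.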
2.1.2 Use in estimation
The first model example above was introduced with a very easy estimation problem
in mind. Let us instead consider the task of computing an accurate estimate of the
level of water, given a sensor that is both noisy and slow. We will not go into details
here, but just mention the basic idea of how the model can be used.
Since the flow we demand from the pump, u, is something we choose, it is a known
quantity in (2.3). Hence, if we were given a correct value of x(0) and the model were correct, we could compute all future values of x simply by integration of (2.3).
However, our model will never be correct, so the estimate will only be good during a
short period of time, before the estimate has drifted away from the true value. The
errors in our model are not only due to the limited precision in the experiments used
to determine the constants, but more importantly because the square root relation
is a rather coarse approximation. In addition, it is unrealistic to assume that we get
exactly the flow we want from the pump. This is where the sensor comes into play;
even though it is slow and noisy, it is sufficient to take care of the drift. The best of
both worlds can then be obtained by combining the simulation of (2.3) with use of
the sensor in a clever way. A very popular method for this is the so-called extended
Kalman filter (for instance, Jazwinski (1970, theorem 8.1)).
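The following sketch (not from the thesis) illustrates the basic idea with a simple constant-gain observer instead of a full extended Kalman filter; the gain K and the noise level are made up.

    import numpy as np

    ca, cd, K = 0.5, 1.0, 0.3           # made-up model parameters and gain
    dt, u = 0.01, 0.8
    rng = np.random.default_rng(1)
    x, xhat = 0.3, 0.1                  # true level and its estimate
    for _ in range(2000):
        x += dt * ca * (u - cd * np.sqrt(x))        # the true tank, see (2.3)
        y = x + 0.05 * rng.standard_normal()        # slow, noisy sensor reading
        # Dead reckoning from the model, corrected by the measurement residual:
        xhat += dt * (ca * (u - cd * np.sqrt(xhat)) + K * (y - xhat))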
2.1.3 Use in control
Let us consider the laboratory process (2.3) again. The task was to control the level
of water, and this time we assume that the errors in the measurements are negligible.
There is a maximal flow, umax , that can be obtained from the pump, and it is impossible to pump water backwards from the tank to the reservoir, so we shall demand a
flow subject to the constraints 0 ≤ u(t) ≤ umax . We denote the desired level of water
the set point, symbolized by x_{\text{ref}}. The theoretically valid control law,

    u(t) = \begin{cases} 0, & \text{if } x(t) \geq x_{\text{ref}}(t) \\ u_{\max}, & \text{otherwise} \end{cases}        (2.4)
will be optimal in theory (when changes in xref cannot be foreseen) in the sense that
deviations from the set point are eliminated as quickly as possible. However, this
type of control law will quickly wear the pump since it will be switching rapidly
between off and full speed once the level gets to about the right level. Although still
unrealistically naïve, at least the following control law somewhat reduces wear of the
pump, at the price of allowing slow and bounded drift away from the set point. It has three modes, called the drain mode, the fill mode, and the open-loop mode:

    Drain mode:     u(t) = 0;  switch to open-loop mode if x(t) < x_{\text{ref}}(t)
    Fill mode:      u(t) = u_{\max};  switch to open-loop mode if x(t) > x_{\text{ref}}(t)        (2.5)
    Open-loop mode: u(t) = c_d \sqrt{x_{\text{ref}}(t)};  switch to drain mode if x(t) > ( 1 + δ ) x_{\text{ref}}(t);  switch to fill mode if x(t) < ( 1 − δ ) x_{\text{ref}}(t)
where δ is a small parameter chosen by considering the trade-off between performance and wear of the pump. In the open-loop mode, the flow demanded from the pump is chosen to match the flow through the drain to the best of our knowledge. Note that if δ is sufficiently large, errors in the model will make the level of water settle at the wrong level; to each fixed flow there is a corresponding level where the water will settle, and errors in the model will make c_d \sqrt{x_{\text{ref}}(t)} correspond to something slightly different from x_{\text{ref}}(t). More sophisticated controllers can remedy this.
2.1.4 Model classes
When developing theory, be it system identification, estimation or control, one has
to specify the structure of the models to work with. We shall use the term model
class to denote a set of models which can be easily characterized. A model class is
thus a rather vague term such as, for instance, a linear system with white noise on
the measurements. Depending on the number of states in the linear system, and how
the linear system is parameterized, various model structures are obtained. When developing theory, a parameter such as the number of states is typically represented by
a symbol in the calculations — this way, several model structures can be treated in
parallel, and it is often possible to draw conclusions regarding how such a parameter affects some performance measure. In the language of system identification, one
would thus say that theory is developed for a parameterized family of model structures. Since such a family is a model class, we will often have such a family in mind
when speaking of model classes. The concepts of models, model sets, and model
structures are rigorously defined in the standard reference Ljung (1999, section 4.5) on system
identification, but we shall allow these concepts to be used in a broader sense here.
In system identification, the choice of model class affects the ability to approximate
the true process as well as how efficiently or accurately the parameters of the model
may be determined. In estimation and control, applicability of the results is related
to how likely it is that a user will choose to work with the treated model structure,
in light of the power of the results; a user may be willing to identify a model from
a given class if that will enable the user to use a more powerful method. The choice
of model class will also allow various amounts of elaboration of the theory; a model
class with much structural information will generally allow a more precise analysis,
at the cost of increased complexity, both in terms of theory and implementation of
the results.
Before we turn to some examples of model classes, it should be mentioned that models often describe a system in discrete time. However, this thesis is predominantly concerned with continuous time models, so the examples will all be of this kind.
Continuing on our first example of a model class, in the sense of a parameterized family of model structures, it could be described as all systems in the linear state space form

    x'(t) = A x(t) + B u(t)
    y(t) = C x(t) + D u(t) + v(t)        (2.6)

where u is the vector of system inputs, y the vector of measured outputs, v is a vector with white noise, and x is a finite-dimensional vector of states. For a given number of states, n, a model is obtained by instantiating the matrices A, B, C, and D with numerical values.
It turns out that the class (2.6) is over-parameterized in the sense that it contains
many equivalent models. If the system has just one input and one output, it is well-known that it can be described by 2n + 1 parameters, and it is possible to restrict
the structure of the matrices such that they only contain this number of unknown
parameters without restricting the possible input-output relations.
Our second and final example of a model class is obtained by allowing more freedom
in the dynamics than in (2.6), while removing the part of the model that relates the
system output to its states. In a model of this type, all states are considered outputs:
    x'(t) = A( x(t) ) + B u(t)        (2.7)
Here, we might pose various types of constraints on the function A. For instance, assuming Lipschitz continuity is very natural since it ensures that the model uniquely
defines the trajectory of x as a function of u and initial conditions. Another interesting choice for A is the polynomials, and if the degree is at most 2 one obtains a
small but natural extension of the linear case. Another important way of extending
the model class (2.6) is to look into how the system inputs u are allowed to enter the
dynamics.
2.1.5 Model reduction
Sophisticated methods in estimation and control may result in very computationally expensive implementations when applied to large models. By large models, we
generally refer to models with many states. For this reason methods and theory for
approximating large models by smaller ones have emerged. This approximation process is referred to as model reduction. Our interest in model reduction owes to its
relation to index reduction (explained in section 2.2), a relation which may not be
widely recognized, but one which this thesis tries to bring attention to. This section
provides a small background on some available methods.
In view of the dae for which index reduction is considered in detail in later chapters,
we shall only look at model reduction of lti systems here, and we assume that the
large model is given in state space form as in (2.6).
If the states of the model have physical meaning it might be desirable to produce a
smaller model where the set of states is a subset of the original set of states. It then
becomes a question of which states to remove, and how to choose the system matrices
A, B, C, and D for the smaller system. Let the states and matrices be partitioned such
that x2 are the states to be removed (this requires the states to be reordered if the
states to be removed are not the last components of x), and denote the blocks of the
partitioned matrices according to
    \begin{pmatrix} x_1'(t) \\ x_2'(t) \end{pmatrix} \overset{!}{=} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} u(t)
    y(t) \overset{!}{=} \begin{pmatrix} C_1 & C_2 \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + D u(t) + v(t)        (2.8)
If x2 is selected to consist of states that are expected to be unimportant due to the
small values those states take under typical operating conditions, one conceivable
approximation is to set x2 = 0 in the model. This results in the truncated model
    x_1'(t) = A_{11} x_1(t) + B_1 u(t)
    y(t) = C_1 x_1(t) + D u(t) + v(t)        (2.9)
Although — at first glance — this might seem like a reasonable strategy for model
reduction, it is generally hard to tell how the reduced model relates to the original
model. Also, selecting which states to remove based on the size of the values they
typically take is in fact a meaningless criterion, since any state can be made small by
scaling, see section 2.1.6.
Another approximation is obtained by formally replacing x_2'(t) by 0 in (2.8). The underlying assumption is that the dynamics of the states x_2 is very fast compared to that of x_1. A necessary condition for this to make sense is that A_{22} be Hurwitz, which also makes it possible to solve for x_2 in the obtained equation A_{21} x_1(t) + A_{22} x_2(t) + B_2 u(t) \overset{!}{=} 0. Inserting the solution in (2.8) results in the residualized model

    x_1'(t) = ( A_{11} − A_{12} A_{22}^{−1} A_{21} ) x_1(t) + ( B_1 − A_{12} A_{22}^{−1} B_2 ) u(t)
    y(t) = ( C_1 − C_2 A_{22}^{−1} A_{21} ) x_1(t) + ( D − C_2 A_{22}^{−1} B_2 ) u(t) + v(t)        (2.10)
It can be shown that this model gives the same output as (2.8) for constant inputs u.
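The two approximations are easy to compare numerically. The sketch below (with made-up matrices) implements (2.9) and (2.10) and checks the constant-input property just mentioned via the dc gains C(−A)^{−1}B + D.

    import numpy as np

    # Made-up partitioned system (2.8); x2 is the single state to remove.
    A = np.array([[-1.0, 0.5], [0.2, -10.0]])
    B = np.array([[1.0], [2.0]]); C = np.array([[1.0, 0.3]]); D = np.array([[0.0]])
    A11, A12, A21, A22 = A[:1, :1], A[:1, 1:], A[1:, :1], A[1:, 1:]
    B1, B2, C1, C2 = B[:1], B[1:], C[:, :1], C[:, 1:]

    # Truncation (2.9): simply drop x2.
    At, Bt, Ct, Dt = A11, B1, C1, D

    # Residualization (2.10): solve A21 x1 + A22 x2 + B2 u = 0 for x2.
    iA22 = np.linalg.inv(A22)
    Ar = A11 - A12 @ iA22 @ A21
    Br = B1 - A12 @ iA22 @ B2
    Cr = C1 - C2 @ iA22 @ A21
    Dr = D - C2 @ iA22 @ B2

    dc = lambda A_, B_, C_, D_: C_ @ np.linalg.solve(-A_, B_) + D_
    print(dc(A, B, C, D), dc(Ar, Br, Cr, Dr))  # equal: same output for constant u
    print(dc(At, Bt, Ct, Dt))                  # truncation generally differs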
If the states of the original model do not have interpretations that we are keen to
preserve, the above two methods for model reduction can produce an infinite number of approximations if combined with a change of variables applied to the states;
applying the change of variables x = T ξ to (2.6) results in
    ξ'(t) = T^{−1} A T ξ(t) + T^{−1} B u(t)
    y(t) = C T ξ(t) + D u(t) + v(t)        (2.11)
and the approximations will be better or worse depending on the choice of T . Conversely, by certain choices of T , it will be possible to say more regarding how close
the approximations are to the original model. If T is chosen to bring the matrix A
in Jordan form, truncation is referred to as modal truncation, and residualization is
then equivalent to singular perturbation approximation (see section 2.5). (Skogestad
and Postlethwaite, 1996)
The most well developed change of variables T is the one which brings the system into
balanced form. When performing truncation or residualization on a system in this
form, the difference between the approximation and the original system can be expressed in terms of the system’s Hankel singular values. We shall not go into details
about what these values are, but the largest of them defines the Hankel norm of a
system. Neither shall we give interpretations of this norm, but it turns out that it is
actually possible to compute the reduced model of a given order which minimizes the
Hankel norm of the difference between the original system and the approximation.
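As a pointer for experiments, the third-party python-control package (an assumption; the thesis does not mention it) provides balanced reduction and Hankel singular values:

    import numpy as np
    import control

    # A made-up stable system with one fast, weakly coupled state.
    A = np.array([[-1.0, 0.1, 0.0], [0.0, -2.0, 0.2], [0.0, 0.0, -30.0]])
    B = np.array([[1.0], [1.0], [1.0]])
    C = np.array([[1.0, 1.0, 1.0]])
    sys = control.ss(A, B, C, 0)

    print(control.hsvd(sys))                          # Hankel singular values
    red = control.balred(sys, 2, method="truncate")   # balanced truncation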
By now we have seen that there are many ways to compute smaller approximations
of a system, ranging from rather arbitrary choices to those which are clearly defined
as minimizers of a coordinate-independent objective function.
Some model reduction techniques have been extended to lti dae. (Stykel, 2004)
However, although the main question in this thesis is closely related to model reduction, these techniques cannot readily be applied in our framework since we are
interested in defending a given model reduction (this view should become clear in
later chapters) rather than finding one with good properties.
2.1.6 Scaling
In section 2.1.5, we mentioned that model reduction of a system in state space form, (2.6), was a rather arbitrary process unless one thinks in terms of some suitable coordinate system for the state space. The first example of this was selecting which states to
truncate based on the size of the values that the state attains under typical operating
conditions, and here we do the simple maths behind that statement. Partition the
states such that x2 is a single state which is to be scaled by the factor a. This results
in
    \begin{pmatrix} x_1'(t) \\ x_2'(t) \end{pmatrix} \overset{!}{=} \begin{pmatrix} A_{11} & \frac{1}{a} A_{12} \\ a A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + \begin{pmatrix} B_1 \\ a B_2 \end{pmatrix} u(t)
    y(t) \overset{!}{=} \begin{pmatrix} C_1 & \frac{1}{a} C_2 \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} + D u(t) + v(t)        (2.12)
(not writing out that the initial conditions also have to be scaled accordingly). Note that the scalar A_{22} on the diagonal does not change (if it did, that would change the trace of A, but the trace is known to be invariant under similarity transforms).
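The scaling argument is quickly verified numerically; the sketch below (made-up matrix) applies the similarity transform behind (2.12) and confirms that the diagonal, and hence the trace, is untouched.

    import numpy as np

    A = np.array([[-1.0, 4.0], [0.5, -2.0]])
    a = 1e-3                       # scale factor for the single state x2
    T = np.diag([1.0, 1.0 / a])    # x = T xi, so the new state is a * x2
    As = np.linalg.inv(T) @ A @ T  # off-diagonal blocks scale as in (2.12)
    print(np.trace(A), np.trace(As), As[1, 1])  # trace and A22 unchanged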
In the index reduction procedure studied in later chapters, the situation is reversed:
it is not a question of which states are small, but of which coefficients are small.
The situation is even worse for lti dae than for the state space systems considered so
far, since in a dae there is also the possibility to scale the equations independently of
the states. Again, it becomes obvious that this cannot be answered in a meaningful
way unless the coordinate systems for the state space and the equation residuals are
chosen suitably. Just like in model reduction, the user may be keen to preserve the
interpretation of the model states, and may hence be reluctant to use methods that
apply variable transforms to the states. However, unlike model reduction of ordinary
differential equations, the dae may still be transformed by changing coordinates of
the equation residuals. In fact, changing the coordinate system of the equation residuals is the very core of the index reduction algorithm.
Pure scaling of the equation residuals is also an important part of the numerical
method for integration of dae that will be introduced in section 2.2.8. There, scaling
is important not because it facilitates analysis, but because it simply improves the numeric quality of the solution. To see how this works, we use the well-known (see, for instance, Golub and Van Loan (1996)) bound on the relative error in the solution to a linear system of equations A x \overset{!}{=} b, which basically says that the relative errors in A and b are propagated to x by a factor bounded by the (infinity norm) condition number of A. Now consider the linear system of equations in the variable qx (that is, x is given)

    \begin{pmatrix} \frac{1}{\epsilon} E_1 + A_1 \\ A_2 \end{pmatrix} qx \overset{!}{=} \begin{pmatrix} \frac{1}{\epsilon} E_1 \\ 0 \end{pmatrix} x        (2.13)

where ε is a small but exactly known parameter. If we assume that the relative errors in E and A are of similar magnitudes, smallness of ε gives both that the matrix on the left hand side is ill-conditioned, and that the relative error of this matrix is approximately the same as the relative error in E_1 alone. Scaling the upper row of the equations by ε will hence make the matrix on the left hand side better conditioned, while not making the relative error significantly larger. On the right hand side, scaling of the upper block by ε is the same as scaling all of the right hand side by ε, and hence the relative error does not change. Hence, scaling by ε will give a smaller bound on the relative error in the solution. Although the scaling by ε was performed for the sake of numerics, it should be mentioned that, generally, the form (2.13) is only obtained after choosing a suitable coordinate system for the dae residuals.
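A small numerical experiment (with made-up data) illustrates the effect: scaling the upper block row of (2.13) by ε improves the condition number dramatically while leaving the solution unchanged.

    import numpy as np

    eps = 1e-8
    E1, A1 = np.array([[1.0, 2.0]]), np.array([[0.5, 0.0]])
    A2 = np.array([[0.0, 1.0]])
    x = np.array([3.0, 4.0])

    M = np.vstack([E1 / eps + A1, A2])            # left hand side of (2.13)
    b = np.concatenate([(E1 / eps) @ x, [0.0]])
    Ms = np.vstack([E1 + eps * A1, A2])           # upper row scaled by eps
    bs = np.concatenate([E1 @ x, [0.0]])

    print(np.linalg.cond(M), np.linalg.cond(Ms))  # huge versus moderate
    print(np.linalg.solve(M, b), np.linalg.solve(Ms, bs))  # same solution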
Another important situation we would like to mention — when scaling matters — is
when gradient-based methods are used in numerical optimization. (Numerical optimization in one form or another is the basic tool for system identification.) Generally,
the issue is how the space of optimization variables is explored, not so much the numerical errors in the evaluation of the objective function and its derivatives. It turns
out that the success of the optimization algorithm depends directly on how the optimization variables (that is, model parameters to be identified) are scaled. One of the
important advantages of optimization schemes that also make use of the Hessian of
the objective function is that they are unaffected by linear changes of variables.
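To see why this matters, consider minimizing a quadratic with badly scaled parameters: plain gradient descent crawls along the flat direction, while a Newton step is unaffected by the scaling. A minimal illustration, not from the thesis:

    import numpy as np

    H = np.diag([1.0, 1e6])    # Hessian of f(p) = 0.5 * p' H p; badly scaled
    p = np.array([1.0, 1.0])
    for _ in range(100):
        p = p - (1.0 / 1e6) * (H @ p)   # step size limited by the stiff direction
    print(p)                            # the first component has barely moved

    p = np.array([1.0, 1.0])
    print(p - np.linalg.solve(H, H @ p))  # one Newton step reaches the optimum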
2.2 Differential-algebraic equations
Differential-algebraic equations (generally written just dae) are a rather general kind of equations, suitable for describing systems which evolve over time. The advantage they offer over the more often used ordinary differential equations is that they are generally easier to formulate. The price paid is that they are more difficult to deal with.
The first topic of the background we give in this section is to try to clarify why dae
can be a convenient way of modeling systems in automatic control. After looking
at some common forms of dae, we then turn to the basic elements of analysis and
solution of dae. Finally, we mention some existing software tools. For recent results
on how to carry out applied tasks such as system identification and estimation for
dae models, see Gerdin (2006), or for optimal control, see Sjöberg (2008).
2.2.1 Motivation
Nonlinear differential-algebraic equations are the natural outcome of component-based modeling of complex dynamic systems. Often, there is some known structure to the equations; for instance, it was mentioned in chapter 1 that we would like to understand a method that applies to equations in quasilinear form,

    E( x(t), t ) x'(t) + A( x(t), t ) \overset{!}{=} 0        (2.14)

In the next section, we approach this form by looking at increasingly general types of equations.
Within many fields, equations emerge in the form (2.14) without being recognized as such. The reason is that when x'(t) is sufficiently easy to solve for, the equation is converted to the state space form, which can formally be written as

    x'(t) \overset{!}{=} −E( x(t), t )^{−1} A( x(t), t )

Sometimes, the leading matrix may be well conditioned, but nevertheless non-trivial to invert. It may then be preferable to leave the equations in the form (2.14). In this case, the form (2.14) is referred to as an implicit ode or an index 0 dae. One reason for not converting to state space form is that one may lose sparsity patterns.• Hence, the state space form may require much more storage than the implicit ode, and may also be a much more expensive way of obtaining x'(t). Besides, even when the inverse of a sparse symbolic matrix is also sparse, the expressions in the inverse matrix are generally of much higher complexity.˜
• Here is an example that shows that the inverse of a sparse matrix may be full:

    \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}^{-1} = \frac{1}{2} \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & 1 & 1 \end{pmatrix}

˜ If the example above is extended to a 5 by 5 matrix with unique symbolic constants at the non-zero positions, the memory required to store the original matrix in Mathematica (Wolfram Research, Inc., 2008) is 528 bytes. If the inverse is represented with the inverse of the determinant factored out, the memory requirement is 1648 bytes, and without the factorization the memory requirement is 6528 bytes.
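The same phenomenon is quickly demonstrated numerically; the following sketch builds the matrix from the footnote and shows that its inverse has no zero entries.

    import numpy as np

    M = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])   # sparse: one zero in every row
    print(np.linalg.inv(M))           # full: every entry is +1/2 or -1/2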
Although an interesting case by itself, the implicit ode form is not the topic of this thesis. What remains is the case when the leading matrix is singular. Such equations
appear naturally in many fields, and we will finish this section by looking briefly at
some examples.
As was mentioned above, quasilinear equations are the natural outcome of component-based modeling, and these will generally have a singular leading matrix. This
type of modeling refers to the bottom-up process, where one begins by making small
models of simple components. The small models are then combined to form bigger
models, and so on. Each component, be it small or large, has variables that are
thought of as inputs and outputs, and when models are combined to make models
at a higher level, this is done by connecting outputs with inputs. Each connection
renders a trivial equation where two variables are “set” equal. These equations contain no differentiated variables, and will hence have a corresponding zero row of the
leading matrix. The leading matrix must then be singular, but the problem has a
prominent structure which is easily exploited.
Our next example is models of electric networks. Here, many components (or subnetworks) may be connected in one node, where all electric potentials are equal and
Kirchhoff's Current Law provides the glue for currents. While the equations for the potentials are trivial equalities between pairs of variables, the equations for the currents will generate linear equations involving several variables. Still, the corresponding part of the leading matrix is a zero row, and the coefficients of the currents are ±1, when present. This structure is also easy to exploit.
The previous example is often recognized as one of the canonical applications of the
so-called bond graph theory. Other domains where (one-dimensional) bond graphs
are used are mechanical translation, mechanical rotation, hydraulics (pneumatics),
some thermal systems, and some systems in chemistry. While the one-dimensional bond graphs are the most widely known, there is an extension which, among other applications, overcomes the limitation in mechanical systems to objects which either translate along a given line or rotate about a given axis. This generalization is known as multi-bond graphs or vector bond graphs, see Breedveld (1982), Ingrim
and Y. (1991), and references therein. In the bond graph framework, the causality of
a model needs to be determined in order to generate model equations in ode form.
However, the most frequently used technique for assigning causality to the bond
graph, named Sequential Causality Assignment Procedure (Rosenberg and Karnopp,
1983, section 4.3), suffers from a potential problem with combinatorial blow-up. One
way of avoiding this problem is to generate a dae instead.
Although some chemical processes can be modeled using bond graphs, this framework is rarely mentioned in recent literature on dae modeling in the chemistry domain. Rather, equation-based formulations prevail, and according to Unger et al.
(1995), most models have the quasilinear form. The amount of dae research within the field of chemistry is remarkable, which is likely due to its extensive applicability in a profitable business where high fidelity models are a key to better control strategies.
2.2.2 Common forms
Having presented the general idea of finding suitable model classes to work with in
section 2.1.4, this section contains some common cases from the dae world. As we
are moving our focus away from the automatic control applications that motivate
our research, towards questions of more generic mathematical kind, our notation
changes; instead of using model class, we will now speak of the form of an equation.
We begin with some repetition of notation defined in section 1.6.
Beginning with the overly simple, an autonomous lti dae has the form
    E x'(t) + A x(t) \overset{!}{=} 0        (2.15)
where E and A are constant matrices. A large part of this thesis is devoted to the
study of this form. Adding forcing functions (often representing external inputs)
while maintaining the lti property, leads to the general lti dae form
    E x'(t) + A x(t) + B u(t) \overset{!}{=} 0        (2.16)
where u is a vector-valued function representing external inputs to the model, and B
is a constant matrix. The function u may be subject to various assumptions.
2.1 Example
In automatic control, system inputs are often computed as functions of the system
state or an estimate thereof — this is called feedback — but such inputs are not
external. To see how such feedback loops may be conveniently modeled using dae
models, let
!
h
i u (t) !
0
1
=0
(2.17)
EG x (t) + AG x(t) + BG1 BG2
u2 (t)
be a model of the system without the feedback control. Here, the inputs to the system
has been partitioned into one part, u1 , which will later be given by feedback, and one
part, u2 , which will be the truly external inputs to the feedback loop. Let
!
h
i u (t) !
0
1
EH x̂ (t) + AH x̂(t) + BH1 BH2
=0
(2.18)
u2 (t)
be the equations of the observer, generating the estimate x̂ of the true state x. Finally,
let a simple feedback be given by
u1 (t) = L x̂(t)
(2.19)
Now, it is more of a matter of taste whether to consider the three equations (2.17), (2.18), and (2.19) to be in form (2.16) or not; if not, it just remains to note that if u_1 is made an internal variable of the model, the equations can be written

    \begin{pmatrix} E_G & & \\ & E_H & \\ & & 0 \end{pmatrix} \begin{pmatrix} x'(t) \\ \hat{x}'(t) \\ u_1'(t) \end{pmatrix} + \begin{pmatrix} A_G & & B_{G1} \\ & A_H & B_{H1} \\ & -L & I \end{pmatrix} \begin{pmatrix} x(t) \\ \hat{x}(t) \\ u_1(t) \end{pmatrix} + \begin{pmatrix} B_{G2} \\ B_{H2} \\ 0 \end{pmatrix} u_2(t) \overset{!}{=} 0        (2.20)
Of course, eliminating u_1 from these equations would be trivial;

    \begin{pmatrix} E_G & \\ & E_H \end{pmatrix} \begin{pmatrix} x'(t) \\ \hat{x}'(t) \end{pmatrix} + \begin{pmatrix} A_G & -B_{G1} L \\ & A_H - B_{H1} L \end{pmatrix} \begin{pmatrix} x(t) \\ \hat{x}(t) \end{pmatrix} + \begin{pmatrix} B_{G2} \\ B_{H2} \end{pmatrix} u_2(t) \overset{!}{=} 0

but the purpose of this example is to show how the model can be written in a form that is both a little easier to formulate and better at displaying the logical structure of the model.
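Assembling (2.20) from the component models is pure block bookkeeping, as the following sketch shows; the function and argument names are made up for illustration.

    import numpy as np

    def closed_loop_dae(EG, AG, BG1, BG2, EH, AH, BH1, BH2, L):
        """Stack (2.17)-(2.19) into the combined dae (2.20), with
        variables (x, xhat, u1) and external input u2."""
        n, m, p = EG.shape[0], EH.shape[0], L.shape[0]
        Z = np.zeros
        E = np.block([[EG,        Z((n, m)), Z((n, p))],
                      [Z((m, n)), EH,        Z((m, p))],
                      [Z((p, n)), Z((p, m)), Z((p, p))]])
        A = np.block([[AG,        Z((n, m)), BG1],
                      [Z((m, n)), AH,        BH1],
                      [Z((p, n)), -L,        np.eye(p)]])
        B = np.vstack([BG2, BH2, Z((p, BG2.shape[1]))])
        return E, A, B   # E X'(t) + A X(t) + B u2(t) = 0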
One way to generalize the form (2.16) is to remove the restriction to time-invariant equations. This leads to the linear time-varying form of dae:

    E( t ) x'(t) + A( t ) x(t) + B( t ) u(t) \overset{!}{=} 0        (2.21)

While this form explicitly displays what part of the system's time variability is due to "external inputs", one can, without loss of generality, assume that the equations are in the form

    E( t ) x'(t) + A( t ) x(t) \overset{!}{=} 0        (2.22)
This is seen by (rather awkwardly) writing (2.21) as

    \begin{pmatrix} E( t ) & \\ & I \end{pmatrix} \begin{pmatrix} x'(t) \\ \alpha'(t) \end{pmatrix} + \begin{pmatrix} A( t ) & B( t ) u(t) \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x(t) \\ \alpha(t) \end{pmatrix} \overset{!}{=} 0
    \alpha(t_0) \overset{!}{=} 1

where the variable α has been included as an awkward way of denoting the constant 1. Still, the form (2.21) is interesting as it stands since it can express logical structure
in a model, and if algorithms exploit that structure one may obtain more efficient implementations or results that are easier to interpret. In addition, it should be noted
that the model structures are not fully specified without telling what constraints the
various parts of the equations must satisfy. If one can handle a larger class of functions representing external inputs in the form (2.21) than the class of functions at the
algebraic term in (2.22), there are actually systems in the form (2.21) which cannot
be represented in the form (2.22). The same kind of considerations should be made
when considering the form
    E( t ) x'(t) + A( t ) x(t) + f(t) \overset{!}{=} 0        (2.23)
as a substitute for (2.21).
A natural generalization of (2.23) is to allow dependency on all variables where (2.23) only allows dependency on t. At the risk of losing structure in problems with external inputs etc., the resulting equations are then in the quasilinear form, repeated here,

    E( x(t), t ) x'(t) + A( x(t), t ) \overset{!}{=} 0        (2.14)
The most general form of dae is

    f( x'(t), x(t), t ) \overset{!}{=} 0        (2.24)

but it takes some analysis to realize why writing this equation as

    f( \dot{x}(t), x(t), t ) \overset{!}{=} 0
    \dot{x}(t) − x'(t) \overset{!}{=} 0        (2.25)

does not show that (2.14) is the most general form we need to consider.
So far, we have considered increasingly general forms of dae without considering
how the equations can be analyzed. For instance, modeling often leads to equations
which are clearly separated into differential and non-differential equations, and this
structure is often possible to exploit. Since discussion of the following forms requires
the reader to be familiar with the contents of section 2.2.3, the forms will only be
mentioned quickly to give some intuition about what forms with this type of structural properties may look like. What follows is a small and rather arbitrary selection
of the forms discussed in Brenan et al. (1996).
The semi-explicit form looks like

    x_1'(t) \overset{!}{=} f_1( x_1(t), x_2(t), t )
    0 \overset{!}{=} f_2( x_1(t), x_2(t), t )        (2.26)

and one often speaks of semi-explicit index 1 dae (the concept of an index will be discussed further in section 2.2.3), which means that the function f_2 is such that x_2 can be solved for:

    ∇_2 f_2 is square and non-singular        (2.27)
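Condition (2.27) is exactly what lets a numerical routine recover x_2 from x_1 at each time instant, so that the dae can be integrated as an ode in x_1. A minimal sketch with made-up functions f_1 and f_2:

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import fsolve

    f1 = lambda x1, x2, t: -x1 + x2          # made-up dynamics
    f2 = lambda x1, x2, t: x2**3 + x2 - x1   # d f2 / d x2 = 3 x2^2 + 1, never 0

    def rhs(t, x1):
        x2 = fsolve(lambda z: f2(x1[0], z, t), 0.0)[0]  # solve 0 = f2 for x2
        return [f1(x1[0], x2, t)]

    sol = solve_ivp(rhs, (0.0, 5.0), [1.0])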
Another often used form is the Hessenberg form of size r,

    x_1'(t) \overset{!}{=} f_1( x_1(t), x_2(t), \ldots, x_r(t), t )
    x_2'(t) \overset{!}{=} f_2( x_1(t), x_2(t), \ldots, x_{r-1}(t), t )
    \vdots
    x_i'(t) \overset{!}{=} f_i( x_{i-1}(t), x_i(t), \ldots, x_{r-1}(t), t )
    \vdots
    0 \overset{!}{=} f_r( x_{r-1}(t), t )        (2.28)

where it is required that

    \frac{\partial f_r( x_{r-1}, t )}{\partial x_{r-1}} \, \frac{\partial f_{r-1}( x_{r-2}, t )}{\partial x_{r-2}} \cdots \frac{\partial f_2( x_1, x_2, \ldots, x_{r-1}, t )}{\partial x_1} \, \frac{\partial f_1( x_1, x_2, \ldots, x_r, t )}{\partial x_r}        (2.29)

is non-singular.
2.2.3 Indices and their deduction
In the previous sections, we have spoken of the index of a dae and index reduction,
and we have used the notions as if they were well defined. This is not the case;
there are many definitions of indices. In this section, we will mention some of these
definitions, and define what shall be meant by just index (without qualification) in
the remainder of the thesis. We shall do this in some more length than what is needed
for the following chapters, since this is a good way of introducing readers with no or very limited experience with dae to typical dae issues.
At least three categories of indices can be identified:
• For equations that relate forcing functions to the equation variables, there are
indices that are equal for any two equivalent equations. In other words, these
indices are not a property of the equations per se, but of the abstract system
defined by the equations.
• For equations written in particular forms, one can introduce perturbations or
forcing functions at predefined slots in the equations, and then define indices
that tell how the introduced elements are propagated to the solution. Since
equivalence of equations generally does not account for the slots, these indices
are generally not the same for two equations considered equivalent. In other
words, these indices are a property of the equations per se, but are still defined
abstractly without reference to how they are computed.
• Analysis (for instance, revealing the underlying ordinary differential equation
on a manifold) and solution of dae has given rise to many methods, and one
can typically identify some natural number for each method as a measure of
how involved the equations are. This defines indices based on methods. Basically these are a property of the equations, but can generally not be defined
abstractly without reference to how to compute them.
The above categorization is not clear-cut in every case. For instance, an index which
was originally formulated in terms of a method may later be given an equivalent but
more abstract definition.
Sometimes, when modeling follows certain patterns, the resulting equations may be
of known index (of course, one has to specify which index is referred to). It may then
be possible to design special-purpose algorithms for automatic control tasks such as
simulation, system identification or state estimation.
In this thesis, we regard the solution of initial value problems as a key to understanding other aspects of dae in automatic control. We are not so much interested
in the mathematical questions of exactly when solutions exist or how the solutions
may be described abstractly, but often think in terms of numerical implementation.
For equations of unknown, higher index, all existing approaches to numerical solution of initial value problems that we know of perform index reduction so that one
obtains equations of low index (typically 0 or 1), which can then be fed to one of
the many available solvers for such equations. The index reduction algorithm used
in the following chapters on singular perturbation (described in chapter 3) relates to
the differentiation index, which we will define first in terms of this algorithm. We
will then show an equivalent but more abstract definition. See Campbell and Gear
(1995) for a survey (although incomplete today) of various index definitions and for
examples of how different indices may be related.
The algorithm that we use to reveal the differentiation index is a so-called elimination-differentiation approach. Such approaches have been in use for a long time,
and as is often the case in the area of dynamic systems, the essence of the idea is best
introduced by looking at linear time-invariant (lti) systems, while the extension
to nonlinearities brings many subtleties to the surface. The linear case was considered in Luenberger (1978), and the algorithm is commonly known as the shuffle
algorithm.
For notational convenience in algorithm 2.1, we recall the following definition from section 1.6.1:

    u'^{\{i\}} \triangleq \begin{pmatrix} u \\ u' \\ \vdots \\ u^{(i)} \end{pmatrix}
In the algorithm, there is a clear candidate for an index: the final value of i. We make
this our definition of the differentiation index.
2.2 Definition (Differentiation index). The differentiation index of a square lti
dae is given by the final value of i in algorithm 2.1.
While the compact representation of lti systems makes the translation of theory to
computer programs rather straightforward, the implementation of nonlinear theory
is not at all as straightforward. This seems, at least in part, to be explained by the fact that there are no widespread computer tools for working with the mathematical concepts from differential geometry. A theoretical counterpart (called the
structure algorithm, see section 2.2.5) of the shuffle algorithm, but applying to general nonlinear dae, was used in Rouchon et al. (1992). However, its implementation
is nontrivial since it requires a computable representation of the function whose existence is granted by the implicit function theorem. For quasilinear dae, on the other
hand, an implicit function can be computed explicitly, and our current interest in
these methods owes to this fact. For references to implementation-oriented index
reduction of quasilinear dae along these lines, see for example Visconti (1999) or
Steinbrecher (2006). Instead of extending the above definition of the differentiation
index of square lti dae to the quasilinear form, we shall make a more general definition, which we will prove is a generalization of the former.
The following definition of the differentiation index of a general nonlinear dae can be
found in Campbell and Gear (1995). It should be mentioned, though, that the authors
of Campbell and Gear (1995) are not in favor of using this index to characterize a
model, and define replacements. On the other hand, in the context of particular
algorithms, the differentiation index may nevertheless be a relevant characterization.
Algorithm 2.1 The shuffle algorithm.
Input: A square lti dae,

    E x'(t) + A x(t) + B u(t) \overset{!}{=} 0

Output: An equivalent non-square dae consisting of a square lti dae with non-singular leading matrix (and redefined forcing function) and a set C = \bigcup_i C_i of linear equality constraints involving x and u'^{\{i\}} for some i.

Algorithm:
    E_0 ≔ E
    A_0 ≔ A
    B_0 ≔ B
    i ≔ 0
    while E_i is singular
        Manipulate the equations by row operations so that E_i becomes partitioned as \begin{pmatrix} \bar{E}_i \\ \tilde{E}_i \end{pmatrix}, where \bar{E}_i has full rank and \tilde{E}_i = 0. This can be done by, for instance, Gaussian elimination or QR factorization.
        Perform the same row operations on the other matrices, and partition the result similarly.
        C_i ≔ { \tilde{A}_i x + \tilde{B}_i u'^{\{i\}} \overset{!}{=} 0 }
        E_{i+1} ≔ \begin{pmatrix} \bar{E}_i \\ \tilde{A}_i \end{pmatrix}
        A_{i+1} ≔ \begin{pmatrix} \bar{A}_i \\ 0 \end{pmatrix}
        B_{i+1} ≔ \begin{pmatrix} \bar{B}_i & 0 \\ 0 & \tilde{B}_i \end{pmatrix}
        i ≔ i + 1
        if i > dim x
            abort with “ill-posed”
        end
    end

Remark: The new matrices computed in each iteration simply correspond to differentiating the equations from which the differentiated variables have been removed by the row operations. (This should clarify the notation used in the construction of the B_i.) Since the row operations generate equivalent equations, and the equations that get differentiated are also kept unaltered in C, it is seen that the output equations are equivalent to the input equations.
See the notes in algorithm 2.2 regarding geometric differentiation, and note that assumptions about constant Jacobians are trivially satisfied in the lti case.
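A compact numerical transcription of the loop in algorithm 2.1 may be helpful. This sketch (not from the thesis) uses an SVD-based rank decision in place of Gaussian elimination or QR, keeps only what is needed to read off the differentiation index, and omits the constraint sets C_i and the bookkeeping of the B_i:

    import numpy as np

    def differentiation_index(E, A, tol=1e-9):
        """Final value of i in (a stripped-down) algorithm 2.1 for the
        square lti dae E x' + A x + B u = 0."""
        n = E.shape[0]
        E, A = E.astype(float), A.astype(float)
        i = 0
        while np.linalg.matrix_rank(E, tol) < n:
            # Row operations bringing E to [Ebar; 0]: use an orthogonal U'.
            U, s, _ = np.linalg.svd(E)
            r = int(np.sum(s > tol))       # rank of E_i, so Ebar has r rows
            E, A = U.T @ E, U.T @ A        # same row operations on both
            # Differentiate the non-differential rows: E_{i+1} = [Ebar; Atilde].
            E = np.vstack([E[:r], A[r:]])
            A = np.vstack([A[:r], np.zeros((n - r, n))])
            i += 1
            if i > n:
                raise ValueError("ill-posed")
        return i

    # Example: a pair of differentiation index 2 (nilpotent E, identity A).
    print(differentiation_index(np.array([[0.0, 1.0], [0.0, 0.0]]), np.eye(2)))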
Consider the general nonlinear dae

    f( x'(t), x(t), t ) \overset{!}{=} 0        (2.30)

By using the notation

    \dot{x}^{\{i\}}(t) = ( x(t), \dot{x}(t), \ldots, \dot{x}^{(i)}(t) )        (2.31)

the general form can be written f_0( \dot{x}^{\{1\}}(t), t ) \overset{!}{=} 0. Note that differentiation with respect to t yields an equation which can be written f_1( \dot{x}^{\{2\}}(t), t ) \overset{!}{=} 0. Introducing the derivative array

    F_i( \dot{x}^{\{i+1\}}(t), t ) = \begin{pmatrix} f_0( \dot{x}^{\{1\}}(t), t ) \\ \vdots \\ f_i( \dot{x}^{\{i+1\}}(t), t ) \end{pmatrix}        (2.32)

the implied equation

    F_i( \dot{x}^{\{i+1\}}(t), t ) \overset{!}{=} 0        (2.33)

is called the derivative array equations accordingly.
2.3 Definition (Differentiation index). Suppose (2.30) is solvable. If ẋ(t) is uniquely determined given x(t) and t in the non-differential equation (2.33), for all x(t) and t such that a solution exists, and ν_D is the smallest i for which this is possible, then ν_D is denoted the differentiation index of (2.30).
Next, we show that the two definitions of the differentiation index are compatible.
2.4 Theorem. Definition 2.3 generalizes definition 2.2.
Proof: Consider the derivative array equations F_i( \dot{x}^{\{i+1\}}, t ) \overset{!}{=} 0 for the square lti dae of definition 2.2:

    \begin{pmatrix} A_0 & E_0 & & & \\ & A_0 & E_0 & & \\ & & \ddots & \ddots & \\ & & & A_0 & E_0 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \\ \vdots \\ \dot{x}^{(i+1)} \end{pmatrix} + \begin{pmatrix} B u(t) \\ B u'(t) \\ \vdots \\ B u^{(i)}(t) \end{pmatrix} \overset{!}{=} 0        (2.34)
Suppose definition 2.2 defines the index as i. Then E_i in algorithm 2.1 is non-singular by definition. The first row elimination of the shuffle algorithm on (2.34) yields

    \begin{pmatrix} \bar{A}_0 & \bar{E}_0 & & \\ \tilde{A}_0 & & & \\ & \bar{A}_0 & \bar{E}_0 & \\ & \tilde{A}_0 & & \\ & & \ddots & \ddots \\ & & \bar{A}_0 & \bar{E}_0 \\ & & \tilde{A}_0 & \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \\ \vdots \\ \dot{x}^{(i+1)} \end{pmatrix} + \begin{pmatrix} \bar{B} u(t) \\ \tilde{B} u(t) \\ \bar{B} u'(t) \\ \tilde{B} u'(t) \\ \vdots \\ \bar{B} u^{(i)}(t) \\ \tilde{B} u^{(i)}(t) \end{pmatrix} \overset{!}{=} 0
Reordering the rows as

    \begin{pmatrix} \bar{A}_0 & \bar{E}_0 & & \\ & \tilde{A}_0 & & \\ & \bar{A}_0 & \bar{E}_0 & \\ & & \tilde{A}_0 & \\ & & \ddots & \ddots \\ & & \bar{A}_0 & \bar{E}_0 \\ \tilde{A}_0 & & & \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \\ \vdots \\ \dot{x}^{(i+1)} \end{pmatrix} + \begin{pmatrix} \bar{B} u(t) \\ \tilde{B} u'(t) \\ \vdots \\ \bar{B} u^{(i-1)}(t) \\ \tilde{B} u^{(i)}(t) \\ \bar{B} u^{(i)}(t) \\ \tilde{B} u(t) \end{pmatrix} \overset{!}{=} 0        (2.35)
and ignoring the last two rows, this can be written

    \begin{pmatrix} A_1 & E_1 & & \\ & A_1 & E_1 & \\ & & \ddots & \ddots \\ & & A_1 & E_1 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \\ \vdots \\ \dot{x}^{(i)} \end{pmatrix} + \cdots \overset{!}{=} 0

using the notation in algorithm 2.1. The forcing function u has been suppressed for brevity. After repeating this procedure i times, one obtains

    \begin{pmatrix} A_i & E_i \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \end{pmatrix} + \cdots \overset{!}{=} 0

which shows that definition 2.2 gives an upper bound on the index defined by definition 2.3.
Conversely, it suffices to show that the last two rows of (2.35) do not contribute to the determination of ẋ. The last row only restricts the feasible values for x, which is considered given in the equation. The second last row contains no information that can be propagated to ẋ, since it can be solved for any ẋ^{(i)} by a suitable choice of ẋ^{(i+1)} (which appears in no other equation). Since this shows that no information about ẋ was discarded, we have also found that if the index as defined by definition 2.2 is greater than i, then E_i is singular, and hence the index as defined by definition 2.3 must also be greater than i. That is, definition 2.2 gives a lower bound on the index defined by definition 2.3.
Many other variants of differentiation index definitions can be found in Campbell
and Gear (1995), which also provides the relevant references. However, they avoid discussion of geometric definitions of the differentiation index. While not being important for lti dae, where the representation by numeric matrices successfully captures
the geometry of the equations, geometric definitions turn out to be important for
nonlinear dae. This is emphasized in Thomas (1996), as it summarizes results by
other authors. (Rabier and Rheinboldt, 1994; Reich, 1991; Szatkowski, 1990, 1992)
It is noted that the geometrically defined differentiation index is bounded by the
dimension of the equations, and cannot be computed reliably using numerical methods; the indices which can be computed numerically are not geometric and may not
be bounded even for well-posed equations. The presentation in Thomas (1996) is
further developed in Reid et al. (2001) to apply also to partial differential-algebraic
equations.
Having discussed the differentiation index with its strong connection to algorithms, we now turn to an index concept of another kind, namely the perturbation index. The following definition is taken from Campbell and Gear (1995), which refers to Hairer et al. (1989).

2.5 Definition. The dae f( x'(t), x(t), t ) \overset{!}{=} 0 has perturbation index ν_P along a solution x on the interval I = [ 0, T ] if ν_P is the smallest integer such that if

    f( \hat{x}'(t), \hat{x}(t), t ) \overset{!}{=} δ(t)

for sufficiently smooth δ, then there is an estimate•

    \| \hat{x}(t) − x(t) \| \leq C \left( \| \hat{x}(0) − x(0) \| + \| δ \|^t_{ν_P − 1} \right)
Clearly, one can define a whole range of perturbation indices by considering various
“slots” in the equations, and each form of the equations may have its own natural
slots. There are two aspects of these indices we would like to emphasize. First, they
are defined completely without reference to a method for computing them, and in
this sense they seem closer to capturing intrinsic features of the system described by
the equations, than indices that are defined by how they are computed. Second, and
on the other hand, the following example shows that these indices may be strongly
related to which set of equations are used to describe a system.
2.6 Example
Consider computing the perturbation index of the dae

    f( x'(t), x(t), t ) \overset{!}{=} 0

We must then examine how the solution depends on the forcing perturbation function δ in

    f( x'(t), x(t), t ) \overset{!}{=} δ(t)

Now, let the matrix K( x(t), t ) define a smooth, non-singular transform of the equations, leading to

    K( x(t), t ) f( x'(t), x(t), t ) \overset{!}{=} 0

with perturbation index defined by examination of

    K( x(t), t ) f( x'(t), x(t), t ) \overset{!}{=} δ(t)
• Here, the norm with ornaments is defined by

    \| δ \|^t_m = \sum_{i=0}^{m} \sup_{τ \in [ 0, t ]} \| δ^{(i)}(τ) \| ,   m \geq 0
    \| δ \|^t_{-1} = \left\| \int_0^t δ(τ) \, dτ \right\|
Trying to relate this to the original perturbation index, we could try rewriting the equations as

    f( x'(t), x(t), t ) \overset{!}{=} K( x(t), t )^{−1} δ(t)

but this introduces x(t) on the right hand side, and is no good. Further, since the perturbation index does not give bounds on the derivative of the estimate error, there are no readily available bounds on the derivatives of the factor K( x(t), t )^{−1} that depend only on t.

In the special case when the perturbation index is 0, however, a bound on K allows us to translate a bound in terms of \| K( x(t), t )^{−1} δ(t) \|^t_{-1} to a bound in terms of \| δ(t) \|^t_{-1}. This shows that, at least, this way of rewriting the equations does not change the perturbation index.
It would be interesting to relate the differentiation index to the perturbation index,
but we have already seen an example of how different index definitions can be related, and shall not dwell more on this. Instead, there is one more index we would
like to mention since it is instrumental to a well developed theory and will be the
starting point for chapter 5. This is the strangeness index, developed for time-varying
linear dae in Kunkel and Mehrmann (1994), see also Kunkel and Mehrmann (2006).
Perhaps due to its ability to reveal a more intelligent characterization of a system compared to, for instance, the differentiation index, the strangeness index is somewhat expensive to compute. This becomes particularly evident in the associated method for solving initial value problems, where the index computations are performed at each step of the solution. This is addressed in the relatively recent Kunkel and Mehrmann (2004), see also Kunkel and Mehrmann (2006, remark 6.7 and remark 6.9). However, one caveat remains: the implications of determining ranks numerically are not understood, see for instance Kunkel and Mehrmann (2006, remark 6.7) or Kunkel et al. (2001, remark 8). The kind of results that are missing here is highly related to the matrix-valued perturbation problems considered in this thesis, although our analysis is related to the differentiation index rather than the strangeness index.
A quite different method which reduces the index is Pantelides' algorithm (Pantelides, 1988) and the dummy derivatives extension thereof (Mattsson and Söderlind, 1993). This technique is in extensive use in component-based modeling and simulation software for the Modelica language, such as Dymola (Mattsson et al., 2000; Brück et al., 2002) and OpenModelica (Fritzson et al., 2006a,b). A major difference between the previously discussed index reduction algorithms and Pantelides' algorithm is that the former use mathematical analysis to derive the new form, while the latter uses only the structure of the equations (the equation–variable graph). Since the equation–variable graph does not require the equations to be in any particular form, the technique is applicable to general nonlinear dae. While the graph-based technique is expected to be misled by a change of variables and other manipulations of the equations (see section 1.2.1), it is well suited for the equations as they arise in the software systems mentioned above.
Hereafter, when speaking of just the index (without qualification), we refer to the
differentiation index, often thinking of it as the number of steps required to shuffle
the equations to an implicit ode.
In the presence of uncertainty, there are two more index concepts which we need to
define. Thinking of the uncertain dae as some element (point) in a set of dae, we
may use the term point dae to denote an exact dae. When a dae is uncertain, its
index also becomes uncertain in the general case, which is emphasized by the next
definition.
2.7 Definition (Pointwise index). Let just “index” refer to one of the notions of index for a point dae. The pointwise index of the uncertain dae

    f( x(t), x'(t), t ) \overset{!}{=} 0    for some fixed f ∈ F

is defined as the set

    \{ \text{index of } f( x(t), x'(t), t ) \overset{!}{=} 0 \; : \; f ∈ F \}

If the set contains exactly one element by construction or by assumption, we will make reuse of notation and let the pointwise index refer to this element instead of the set containing it.
We will often make assumptions regarding the pointwise index of a dae, and it is important to see that this is a way of removing unwanted point dae from the uncertain
dae.
The second index concept appears when the uncertain dae is being approximated
by one with less (or none) uncertainty, typically of higher index than the dae being
approximated.
2.8 Definition (Nominal index). When an uncertain dae is being approximated by
another dae where some of the uncertainty is removed, the nominal index refers to
the index of the latter dae.
The nominal index is thus something defined by how the uncertain dae is being
approximated, and there will generally be a trade-off between stiffness in an approximation of low nominal index, and large uncertainty bounds on the solution to an
approximation of higher nominal index.
2.2.4 Transformation to quasilinear form
In this section, the transformation of a general nonlinear dae to quasilinear form is
considered. This may seem like a topic for section 2.2.2, but since we need to refer to the index concept, it has been deferred until after section 2.2.3.
For ease of notation, we shall only deal with equations without explicit dependence
on the time variable in this section. This way, it makes sense to write a time-invariant
nonlinear dae as
f( x, x′, x″, … ) = 0    (2.36)
The variable in this equation is the function x, and the zero on the right hand side
must be interpreted as the mapping from all of the time domain to the constant
real vector 0. We choose to interpret the equality relation of the equation pointwise, although other measure-zero interpretations could be made (we are not seeking
new semantics, only a shorter notation compared to (2.24)). Including higher order
derivatives in the form (2.36) may seem just like a minor convenience compared to
using only first order derivatives in (2.24), but some authors remark that this is not
always the case (see, for instance, Mehrmann and Shi (2006)), and this is a topic for
the discussion below.
The time-invariant quasilinear form looks like
E( x ) x′ + A( x ) = 0    (2.37)
Assuming that (2.24) has index νD but is not in the form (2.37), can we say something
about the index of the corresponding (2.37)?
Not being in the form (2.37) can be for two reasons:
• There are higher-order derivatives.
• The residuals are not linear in the derivatives.
To remedy the first, one simply introduces new variables for all derivatives of order at least 1 except those of the highest order. Of course, one also adds the equations relating the introduced variables to the derivatives they represent; each new variable gets one associated equation. This procedure does not raise the index, since the derivatives
which have to be solved for really have not changed. If the highest order derivatives
could be solved for in terms of lower-order derivatives after νD differentiations of
(2.24), they will be possible to solve for in terms of the augmented set of variables
after νD differentiations of (2.37) (of course, there is no need to differentiate the introduced trivial equations). The introduced variables’ derivatives that must also be
solved for are trivial (that is why the definitions of index do not have to mention solution of the lower-order derivatives).
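
To make the reduction to first order concrete, here is a minimal SymPy sketch (ours, not from the thesis; the equation and the variable name v are hypothetical) for the scalar equation x″ + sin( x ) = 0:

import sympy as sp

# Order reduction: introduce v := x' and substitute x'' -> v'.
t = sp.symbols("t")
x = sp.Function("x")(t)
v = sp.Function("v")(t)

f = sp.Derivative(x, t, 2) + sp.sin(x)      # hypothetical residual: x'' + sin(x)
f_reduced = sp.Matrix([
    sp.Derivative(x, t) - v,                # the trivial equation associated with v
    f.subs(sp.Derivative(x, t, 2), sp.Derivative(v, t)),  # highest derivative rewritten
])
print(f_reduced)    # [x' - v, v' + sin(x)]: first order, and the index is unchanged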
Turning to the less trivial reason, nonlinearity in derivatives, the fix is still easy;
introduce new variables for the derivatives that appear nonlinearly and add the linear
(trivial) equations that relate the new variables to derivatives of the old variables;
change
f( x, x′ ) = 0    (2.38)
to
x′ = ẋ
f( x, ẋ ) = 0
Note the important difference to the previous case: this time we introduce new variables for some highest-order derivatives. This may have implications for the index.
If the index was previously defined as the number of differentiations required to be
able to solve for x′, we must now be able to solve for ẋ′ = x″. Clearly, this can be
obtained by one more differentiation once x0 has been solved for, as in the following
example.
2.9 Example
Consider the index-0 dae
e^{x2′} = e^{x1}
x1′ = −x2
Taking this into the form (2.37) brings us to
x2′ = ẋ
e^{ẋ} = e^{x1}
x1′ = −x2
where ẋ′ cannot be solved for immediately since it does not even appear. However,
after differentiating the purely algebraic equation once, all derivatives can be solved
for;
x2′ = ẋ
e^{ẋ} ẋ′ = e^{x1} x1′
x1′ = −x2
However, the index is not raised in general; it is only in case the nonlinearly appearing derivatives could not be solved for in less than νD steps that the index will be raised.
The following example shows a typical case where the index is not raised.
2.10 Example
By modifying the previous example we get a system that is originally index-1,
e^{x2′} = e^{x1}
x1′ = −x2
x3 = 1
Taking this into the form (2.37) brings us to
x2′ = ẋ
e^{ẋ} = e^{x1}
x1′ = −x2
x3 = 1
which is still index-1 since all derivatives can be solved for after one differentiation
of the algebraic equations:
x2′ = ẋ
e^{ẋ} ẋ′ = e^{x1} x1′
x1′ = −x2
x3′ = 0
Although the transformation discussed here may raise the index, it may still be a
useful tool in case the equations and forcing functions are sufficiently differentiable.
The transformation has been implemented as a part of a tool for finding the quasilinear structure in equations represented in general form. However, even though
automatic transformation to quasilinear form is possible, it should be noted that formulating equations in the quasilinear form is a critical part of the modeling process,
and should be done carefully. This is emphasized in the works on dae with properly stated leading terms by März and coworkers (Higueras and März, 2004; März and
Riaza, 2006, 2007, 2008).
2.2.5 Structure algorithm
The application of the structure algorithm to dae described in this section is due to
Rouchon et al. (1992), which relies on results in Li and Feng (1987).
The structure algorithm was developed for the purpose of computing inverse systems; that is, to find the input signal that produces a desired output. It assumes that
the system’s state evolution is given by an ode and that the output is a function of
the state and current input. Since the desired output is a known function, it can be
included in the output function; that is, it can be assumed without loss of generality
that the desired output is zero. The algorithm thus provides a means to determine u
in the setup
x′(t) = h( x(t), u(t), t )
0 = f( x(t), u(t), t )
The algorithm produces a new function η such that u can be determined from 0 = η( x, u, t ). By taking h( x, u, t ) = u, this reduces to a means for determining the derivatives of the variables x in the dae
0 = f( x(t), x′(t), t )
In algorithm 2.2 we give the algorithm applied to the dae setup. It is assumed that
dim f = dim x, that is, that the system is square.
Algorithm 2.2 The structure algorithm.
Input: A square dae,
  f( x(t), x′(t), t ) = 0
Output: An equivalent non-square dae consisting of a square dae from which x′ can be solved for, and a set of constraints C = ⋀i ( Φi( x(t), t, 0 ) = 0 ). Let α be the smallest integer such that ∇₂ fα( x, ẋ, t ) has full rank, or ∞ if no such number exists.
Invariant: The sequence of fk shall be such that the solution is always determined by fk( x, ẋ, t ) = 0, which is fulfilled for f0 by definition. Reversely, this will make fk( x, ẋ, t ) = 0 along solutions.
Algorithm:
  f0 = f
  i ≔ 0
  while ∇₂ fi( x, ẋ, t ) is singular
    Since the rank of ∇₂ fi( x, ẋ, t ) is not full, it makes sense to split fi into two parts; f̄i being a selection of components of fi such that ∇₂ f̄i( x, ẋ, t ) has full and maximal rank (that is, the same rank as ∇₂ fi( x, ẋ, t )), and f̃i being the remaining components.
    Locally (and as all results of this kind are local anyway, this will not be further emphasized), this has the interpretation that the dependency of f̃i on ẋ can be expressed in terms of f̄i( x, ẋ, t ) instead of ẋ; there exists a function Φi such that f̃i( x, ẋ, t ) = Φi( x, t, f̄i( x, ẋ, t ) ).
    Since f̄i( x, ẋ, t ) = 0 along solutions, we replace the equations given by f̃i by the residuals obtained by differentiating Φi( x(t), t, 0 ) with respect to t and substituting ẋ for x′;
      fi+1 = [ f̄i ; ( x, ẋ, t ) ↦ ∇₁ Φi( x, t, 0 ) ẋ + ∇₂ Φi( x, t, 0 ) ]
    i ≔ i + 1
    if i > dim x
      abort with “ill-posed”
    end
  end
Remark: Assuming that all ranks of Jacobian matrices are constant, it is safe to
abort after dim x iterations (Rouchon et al., 1992). Basically, this condition means
that the equations are not used pointwise, but rather as geometrical (algebraic) objects. Hence, in the phrasing of Thomas (1996), differentiations are geometric, and α
becomes analogous to the geometric differentiation index.
In Rouchon et al. (1992), additional assumptions on the selection of components to
constitute f¯i are made, but we will not use those here.
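
The iteration is easy to follow on a small example. The sketch below (ours, not from the thesis) represents ẋ by the plain symbols dx1, dx2 so that ∇₂ fi is an ordinary Jacobian, and traces the algorithm on the hypothetical dae x1′ = x2, 0 = x1 − sin t, for which two differentiations are needed, so α = 2:

import sympy as sp

t, x1, x2, dx1, dx2 = sp.symbols("t x1 x2 dx1 dx2")

# f0( x, xdot, t ): one differential and one algebraic residual.
f0 = sp.Matrix([dx1 - x2, x1 - sp.sin(t)])
print(f0.jacobian([dx1, dx2]).rank())   # 1 < 2: split off the algebraic row

# Phi_0( x, t, 0 ) = x1 - sin(t); replace the row by its total time derivative.
f1 = sp.Matrix([dx1 - x2, dx1 - sp.cos(t)])
print(f1.jacobian([dx1, dx2]).rank())   # still 1: the new row depends on xdot only
                                        # through fbar_1 = dx1 - x2

# Phi_1( x, t, 0 ) = x2 - cos(t); differentiate once more.
f2 = sp.Matrix([dx1 - x2, dx2 + sp.sin(t)])
print(f2.jacobian([dx1, dx2]).rank())   # 2: full rank, hence alpha = 2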
2.2.6 LTI DAE, matrix pencils, and matrix pairs
The linear time-invariant dae
E x′(t) + A x(t) = 0    (2.39)
is closely connected to the concepts of matrix pencils and matrix pairs.
To the equation we may associate the matrix pencil s 7→ s E + A, and a large amount of
dae analysis in the literature is formulated using matrix pencil theory. The sign convention we use (with addition instead of subtraction in the expression for the pencil)
differs from much of the literature on lti dae (for instance, Stewart and Sun (1990)),
but is natural in view of how the dae (2.39) is written, and is also the convention
which generalizes to higher order matrix polynomials, compare Higham et al. (2006).
We will not go deep into this theory in this background, since the theory is basically
concerned with exactly known matrices, while the theme of this thesis is uncertain
matrices. Just to show the close connection between (2.39) and the corresponding
matrix pencil, note that the Laplace transform of the equation is
E ( s X(s) − x(0) ) + A X(s) = 0
or
( s E + A ) X(s) = E x(0)
If the matrix pencil is invertible at some s, it will be invertible at almost every s, the
pencil is called regular, X(s) can be evaluated at almost every point, and it will be
possible to find x(t) by inverse Laplace transform. In the other case, when the pencil
is singular at every s, the pencil is called singular, and the next theorem explains
why we will avoid singular pencils in this thesis.
2.11 Theorem. If the matrix pencil associated with the dae (2.39) is singular, the
dae with initial conditions x(0) = 0 has an infinite number of solutions, including
both bounded and exponentially increasing functions. Among the non-zero bounded
solutions, there are solutions with arbitrarily slow exponential decay.
Proof: The following proof is similar to that of Kunkel and Mehrmann (2006, theorem 2.14).
Since λ E + A is singular for every λ, we can take n + 1 numbers { λi }_{i=1}^{n+1} and corresponding vectors { vi ≠ 0 }_{i=1}^{n+1} such that ( λi E + A ) vi = 0 for all i. To construct real solutions, we make the selection such that if Im λi ≠ 0, then the complex conjugate of λi appears as λj for some j, and the corresponding vi and vj are also related by complex conjugation. Since the number of elements in { vi }_{i=1}^{n+1} exceeds the dimension of the space, they are linearly dependent, and there is a linear combination which vanishes,
∑i αi vi = 0
where not all αi are zero. It follows that the function
x(t) ≜ ∑i αi vi e^{λi t}
is real-valued, satisfies x(0) = 0, is not identically zero, and solves (2.39).
Since the choice of { λi }_{i=1}^{n+1} was arbitrary up to the pairing of complex conjugates, and two disjoint such sets cannot produce the same solution, the number of solutions is infinite, and since the Re λi may be chosen all negative as well as all positive, there are both bounded and exponentially increasing solutions. Arbitrarily slow exponential decay is obtained by selecting all Re λi negative, but close to zero, and by setting one eigenvalue to zero, the solutions will have a non-zero constant asymptote.
From the linearity of the differential equation (2.39), it follows that if there exists a
solution with non-zero initial conditions, it will not be unique since all the solutions
with zero initial conditions may be added to it to produce new solutions. In view of
this, we call the dae singular (regular) if its matrix pencil is singular (regular).
2.12 Corollary. An lti dae with singular pencil does not have a finite index.
Proof: If it had, the definition of index would imply that the dae had a unique solution. This contradicts singularity of its pencil.
The next lemma helps us detect singular pencils.
2.13 Lemma. If the respective right null spaces of E and A intersect non-trivially, or the respective left null spaces do, the pencil s ↦ s E + A is singular.
Proof: First, the case of intersecting right null spaces. Take a v ≠ 0 belonging to the intersection. Then v is a non-trivial solution to ( s E + A ) v = 0 for any s, and hence s E + A is not invertible for any s.
For intersecting left null spaces, take v ≠ 0 from the intersection. Then ( s E + A )ᵀ v = 0 has a nontrivial solution for every s. Hence det ( s E + A )ᵀ = det ( s E + A ) = 0 for every s.
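
The lemma translates into a simple numerical test, sketched below (ours, not from the thesis; the tolerance is arbitrary). A vector lies in both null spaces exactly when it lies in the null space of the stacked matrix, which is detected from the smallest singular value. Note that the test only covers the sufficient conditions of the lemma; a pencil can be singular without either intersection.

import numpy as np

def nullspaces_intersect(M1, M2, tol=1e-10):
    # v is in null(M1) and null(M2) iff [M1; M2] v = 0.
    s = np.linalg.svd(np.vstack([M1, M2]), compute_uv=False)
    return s[-1] < tol * max(s[0], 1.0)

def pencil_singular_by_lemma(E, A, tol=1e-10):
    # Right null spaces, then left null spaces (transposed matrices).
    return nullspaces_intersect(E, A, tol) or nullspaces_intersect(E.T, A.T, tol)

For the pair (2.41) in example 2.18 below, the test reports a common left null space, in agreement with the example.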
The consequence of intersecting right null spaces is easy to characterize. It means
that a change of variables will reveal one or more transformed variables that do not
appear in the equations at all. Clearly, any differentiable functions passing through
the origin will do as a solution for these variables, if the problem has a solution at all.
However, the dae will generally have no solutions since the remaining variables will
be over-determined.
The case of intersecting left null spaces is illustrated as an example of theorem 2.11
at the end of the section.
For many purposes, the asymmetric roles of E and A in the matrix pencil may be
disturbing, and it may make more sense to speak of the matrix pair ( E, A ) instead.
In accordance with lti dae, the first matrix in the pair (here, E) will be denoted
the leading matrix of the pair, while the second matrix (here, A) will be denoted
the trailing matrix of the pair. The notions of regular and singular carry over from
matrix pencils to matrix pairs by the obvious association.
2.14 Definition (Eigenvalues of a matrix pair). The eigenvalues of the matrix pair ( E, A ) are defined as the equivalence classes of complex scalar pairs ( α, β ) ≠ ( 0, 0 ) such that there exists a vector x ≠ 0 for which
α E x + β A x = 0
Here, equivalence is defined as two pairs being equivalent if one equals the other times some complex scalar.•
By identifying the eigenvalue [ ( α, 0 ) ] with ∞, and any eigenvalue [ ( α, β ) ] where β ≠ 0 with the ratio α/β, we may also consider the eigenvalues as belonging to ℂ ∪ { ∞ }.
Two matrix pairs ( E1 , A1 ) and ( E2 , A2 ) are said to be equivalent if there exist nonsingular matrices T and V such that E2 V = T E1 and A2 V = T A1 . From the definition of eigenvalues, it is easy to see that two equivalent matrix pairs have the same
eigenvalues.
The symmetric view on an eigenvalue as an equivalence class of pairs of scalars is
the natural choice when the symmetric relation between the two matrices in a matrix
pair is to be maintained. However, in our view of the matrix pair as a representation
of an lti dae, the matrices in the pair do not have a truly symmetric relation — it is
always the leading one which is multiplied by the scalar parameter in a matrix pencil.
The following trivial theorem justifies the other view on matrix pair eigenvalues in
this thesis.
2.15 Theorem. If E is non-singular in the matrix pair ( E, A ), then the eigenvalues
of the pair are the same as the eigenvalues of the matrix −E −1 A.
Proof: The pair ( E, A ) is equivalent to ( I, E⁻¹ A ). Clearly [ ( α, 0 ) ] is not an eigenvalue, so any eigenvalue is of the form [ ( α, 1 ) ], satisfying the equation
α x = −E⁻¹ A x
This shows that [ ( α, 1 ) ], identified with α/1 = α, is also a matrix eigenvalue of −E⁻¹ A. The argument also works in the converse direction.
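
Numerically, the eigenvalues of a pair are ordinary generalized eigenvalues. A small check (ours, with a hypothetical pair where E is non-singular) using SciPy: since α E x + β A x = 0 with β = 1 reads λ E x = −A x, the call eig( -A, E ) returns the ratios α/β directly, and they agree with the eigenvalues of −E⁻¹ A.

import numpy as np
from scipy.linalg import eig

E = np.array([[2.0, 0.0], [0.0, 1.0]])
A = np.array([[1.0, 1.0], [0.0, 3.0]])

print(np.sort_complex(eig(-A, E, right=False)))                    # pencil eigenvalues
print(np.sort_complex(np.linalg.eigvals(-np.linalg.solve(E, A))))  # matrix eigenvalues

When E is singular, the same call typically signals the infinite eigenvalues [ ( α, 0 ) ] as inf, matching the identification above.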
The following theorem generalizes the Jordan canonical form for matrices to matrix
pairs.
• The definition follows Stewart and Sun (1990), although the sign conventions differ due to different sign conventions for matrix pencils.
2.16 Theorem (Weierstrass canonical form). Let ( E, A ) be a regular matrix pair.
Then it is equivalent to a pair in the form
( [ I 0; 0 N ], [ J 0; 0 I ] )    (2.40)
where J is in Jordan canonical form, and N is a nilpotent matrix in Jordan canonical form.
Proof: This result is easy to find in the literature, and we suggest Stewart and Sun
(1990, chapter vi, theorem 1.13) since it has other connections to this thesis as well.
It is easy to see that the index of nilpotency of N in the canonical form coincides
with the differentiation index of the corresponding dae, with the convention that the
index of nilpotency of an empty matrix is 0.
2.17 Definition (Singular/regular uncertain matrix pair). An uncertain matrix
pair is said to be singular if it admits at least one singular point matrix pair. Compare
singular interval matrix. Otherwise, it is said to be regular.
The definitions of singular/regular are also applied to uncertain lti dae in the obvious manner.
The section ends with an example of a dae with singular matrix pair.
2.18 Example
Row and column reductions of the matrix pair are often useful tools to discover structure in linear dae. Row reduction corresponds to replacing equations by equivalent
ones, while column reduction corresponds to invertible changes of coordinates. Suppose row reduction of the leading matrix of some pair resulted in
( [ 1 1 1 1; 0 1 1 1; 0 0 0 0; 0 0 0 0 ], [ 1 0 1 0; 0 1 1 2; 1 1 1 1; 1 1 1 1 ] )    (2.41)
where the lower part of the trailing matrix does not have full rank since it has linearly
dependent rows. By also performing row reduction on the trailing matrix, a common
left null space is revealed,
( [ 1 1 1 1; 0 1 1 1; 0 0 0 0; 0 0 0 0 ], [ 1 0 1 0; 0 1 1 2; 1 1 1 1; 0 0 0 0 ] )
That is, the vector ( 0 0 0 1 ) proves that the matrix pencil is singular according to lemma 2.13, and next we will construct some of the solutions whose existence is given by theorem 2.11.
Before we start solving for the right null space, column operations (that is, a change
variables) are applied to the leading matrix to reveal as many zeros as possible,

 

  1 0 0 0   1 −1 1 0  
  0 1 0 0   0 1 0 1  
 
 

  0 0 0 0  ,  1 0 0 0  
 
 
 
0 0 0 0
0 0 0 0
The pencil, at some point λ,
[ 1+λ −1 1 0; 0 1+λ 0 1; 1 0 0 0; 0 0 0 0 ]
can now be column reduced using a change of variables that will depend on λ, revealing its right null space;
[ 1+λ −1 1 0; 0 1+λ 0 1; 1 0 0 0; 0 0 0 0 ] [ 1 0 0 0; 0 1 0 0; −(1+λ) 1 1 0; 0 −(1+λ) 0 1 ] = [ 0 0 1 0; 0 0 0 1; 1 0 0 0; 0 0 0 0 ]
Hence,
v( λ ) ≜ ( 0, 1, 1, −( 1 + λ ) )ᵀ
shows how to find non-trivial elements in the right null space. Inspection shows that
only one of the components of v( λ ) actually depends on λ, so any set of three or more
such vectors will be linearly dependent (form a matrix with the v( λ ) as columns, and
consider the row rank). Denoting three values for λ by { λi }_{i=1}^{3}, it can be seen that for every α,
α ( λ2 − λ3 ) v( λ1 ) + α ( λ3 − λ1 ) v( λ2 ) + α ( λ1 − λ2 ) v( λ3 ) = 0
For the coefficients to be real, α must be purely imaginary if there is a pair of complex
conjugates.
Hence,
x( t, α, λ1, λ2, λ3 ) ≜ α [ 1 −1 0 0; 0 1 −1 −1; 0 0 1 0; 0 0 0 1 ] ( ( λ2 − λ3 ) v( λ1 ) e^{λ1 t} + ( λ3 − λ1 ) v( λ2 ) e^{λ2 t} + ( λ1 − λ2 ) v( λ3 ) e^{λ3 t} )    (2.42)
is a family of nontrivial solutions (the matrix undoes the change of variables used to eliminate entries in the leading matrix). Figure 2.1 shows a random selection of bounded solutions.
Figure 2.1: First coordinate of randomly generated solutions to the singular dae
(2.41), given by (2.42). Random numbers have been sampled uniformly from
the interval [ 0.1, 1 ]. Random real parts of exponents have been chosen with
negative sign to produce bounded solutions. The number α in (2.42) is chosen
with modulus 1, and such that real solutions are produced. Upper left, one
random real exponent, and one pair of random complex conjugates. Upper right,
one exponent at 0, and one pair of complex conjugates. Lower left three random
real exponents. Lower right, one exponent at 0, two random real exponents.
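
The construction lends itself to a numerical spot-check; the sketch below (ours, not part of the thesis) assembles (2.42) for three arbitrarily chosen negative exponents and verifies that the initial conditions are zero and that the residual of (2.41) vanishes.

import numpy as np

E = np.array([[1., 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]])
A = np.array([[1., 0, 1, 0], [0, 1, 1, 2], [1, 1, 1, 1], [1, 1, 1, 1]])
W = np.array([[1., -1, 0, 0], [0, 1, -1, -1], [0, 0, 1, 0], [0, 0, 0, 1]])  # undoes the column operations

v = lambda lam: np.array([0.0, 1.0, 1.0, -(1.0 + lam)])
lams = (-0.2, -0.5, -0.9)
coef = (lams[1] - lams[2], lams[2] - lams[0], lams[0] - lams[1])

x    = lambda t: W @ sum(c * v(l) * np.exp(l * t) for c, l in zip(coef, lams))
xdot = lambda t: W @ sum(c * l * v(l) * np.exp(l * t) for c, l in zip(coef, lams))

print(np.allclose(x(0.0), 0))                        # zero initial conditions
print(np.allclose(E @ xdot(1.3) + A @ x(1.3), 0))    # (2.41) is satisfied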
The example shows that if a singular dae is within the set of dae defined by an uncertain dae, the infinite set of solutions produced by the singular dae will contain
solutions which could be the output of a regular system with any eigenvalues. Hence,
the type of assumptions used in the thesis to restrict the set of solutions to the uncertain system, formulated in terms of system poles, will not be capable of ruling out
the solutions of the singular dae. Indeed, the eigenvalues of a singular pencil are not
even defined. On the other hand, as will be seen in section 7.1.2, the uncertain dae
that admit singular ones will be possible to detect without additional assumptions,
and this allows us to disregard this case as one which is not covered by our theory.
Conversely, when our methods (including assumptions we have to make) show that
the solutions to the dae are converging uniformly as the uncertainties tend to zero,
this shows that the uncertain dae is regular, and hence that the singular uncertain
dae that are excluded from our theory are exceptional in this sense.
2.2.7 Initial conditions
The reader might have noticed that the shuffle algorithm (on page 30) not only produces an index and an implicit ode, but also a set of constraints. These constrain the
solution at any point in time, and the implicit ode is only to be used where the constraints are satisfied. The constraints are often referred to as the algebraic constraints
which emphasizes that they are non-differential equations. They can be explicit as
in the case of non-differential equations in the dae as it is posed, or implicit as in
the case of the output from the shuffle algorithm. Of course, the constraint equations
are not unique, and it may well happen that some of the equations output from the
shuffle algorithm were explicit in the original dae.
Making sure that numerical solutions to dae do not leave the manifold defined by
the algebraic constraints is a problem in itself, and several methods to ensure this
exist. However, in theory, no special methods are required, since the produced implicit ode is such that an exact solution starting on the manifold will remain on the
manifold. This brings up another practical issue, namely that initial value problems
are ill-posed if the initial conditions they specify are inconsistent with the algebraic
constraints.
Knowing that a dae can contain implicit algebraic constraints, how can we know that
all implicit constraints have been revealed at the end of the index reduction procedure? If the original dae is square, any algebraic constraints will be present in differentiated form in the index 0 square dae. This implies that the solution trajectory will
be tangent to the manifold defined by the algebraic constraints, and hence it is sufficient that the initial conditions for an initial value problem are consistent with the
algebraic constraints for the whole trajectory to remain consistent. In other words,
there exist solutions to the dae starting at any point which is consistent with the
algebraic constraints, and this shows that there can be no other implicit constraints.
We shall take a closer look at this problem in section 3.3. Until then, we just note
that rather than rejecting initial value problems as ill-posed if the initial conditions
they specify are inconsistent with algebraic constraints, one usually interprets the
initial conditions as a guess, and then applies some scheme to find truly consistent
initial conditions that are close to the guess in some sense. The importance of this
task is suggested by the fact that the influential Pantelides (1988) addressed exactly
this, and it is no surprise (Chow, 1998) since knowing where a dae can be initialized
entails having a characterization of the manifold to which all of the solution must
belong. Another structural approach to system analysis is presented in Unger et al.
(1995). Their approach is similar to the one we propose in chapter 3. However, just
as Pantelides’ algorithm, it considers only the equation–variable graph, although it is
not presented as a graph theoretical approach. A later algorithm which is presented
as graph theoretical, is given in Leitold and Hangos (2001), although a comparison to Pantelides’ algorithm seems to be missing.
In Leimkuhler et al. (1991), consistent initial conditions are computed using difference approximation of derivatives, assuming that the dae is quasilinear and of
index 1. Later, Vieira and Biscaia Jr. (2000) give an overview of methods to compute consistent initial conditions. It is noted that several successful approaches have
been developed for specific applications where the equations are in a well understood form, and among other approaches (including one of their own) they mention that the method in Leimkuhler et al. (1991) has been extended by combining
it with Pantelides’ algorithm to analyze the system structure rather than assuming
the quasilinear index 1 form. Their own method, presented in some more detail in
Vieira and Biscaia Jr. (2001), is used to find initial conditions for systems starting in
steady state, but allows for a discontinuity in forcing functions at the initial time.
Of all previously presented methods for analysis of dae, the one which most resembles that proposed in chapter 3 is found in Chowdhry et al. (2004). They propose a
method similar to that in Unger et al. (1995), but take it one step further by making
a distinction between linear and nonlinear dependencies in the dae. This allows lti
dae to be treated exactly, which is an improvement over Unger et al. (1995), while
performing at least as well in the presence of nonlinearities. In view of our method,
the partitioning into structural zeros, constant coefficients, and nonlinearities seems
somewhat arbitrary. However, they suggest that even more categories could be added
to extend the class of systems for which the method is exact. The need for a rigorous
analysis of how tolerances affect the algorithm is not mentioned.
2.2.8 Numerical integration
There are several techniques in use for the solution of dae. In this section, we mention some of them briefly, and explain one in a bit more detail. A classic accessible
introduction to this subject is Brenan et al. (1996), which contains many references
to original papers and further theory.
The method we focus on in this section is applicable to equations with differentiation
index 1, and this is the one we describe first. It belongs to a family referred to as
backward difference formulas or bdf methods. The formula of the method tells how
to treat x′(t) in
f( x′(t), x(t), t ) = 0
when the problem is discretized. By discretizing a problem we refer to replacing
the infinite-dimensional problem of computing the value of x at each point of an
interval, with a finite-dimensional problem from which the solution to the original
problem can be approximately reconstructed. The most common way of discretizing
problems is to replace the continuous function x by a time series which approximates
x at discrete points in time:
xi ≈ x(ti )
Reconstruction can then be performed by interpolation. A common approach to the
interpolation is to do linear interpolation between the samples, but this will give a
function which is not even differentiable at the sample points. To remedy this, interpolating splines can be used. This suggests another way to discretize problems,
namely to represent the discretized solution directly in spline coefficients, which
makes both reconstruction as well as treatment of x0 trivial. However, solving for
such a discretization is a much more intricate problem than to solve for a pointwise
approximation.
Before presenting the bdf methods, let us just mention how the simple (forward)
Euler step for ode fits into this framework. The problem is discretized by pointwise approximation, and the ode x′(t) = g( x(t), t ) is written as a dae by defining f( ẋ, x, t ) ≜ −ẋ + g( x, t ). Replacing x′(tn) by the approximation ( xn+1 − xn )/( tn+1 − tn ) then yields the familiar integration method:
0 = f( ( xn+1 − xn )/( tn+1 − tn ), xn, tn )    ⟺
0 = −( xn+1 − xn )/( tn+1 − tn ) + g( xn, tn )    ⟺
xn+1 = xn + ( tn+1 − tn ) g( xn, tn )
The k-step bdf method also discretizes the problem by pointwise approximation, but
replaces x′(tn) by the derivative at tn of the polynomial which interpolates the points
( tn , xn ), ( tn−1 , xn−1 ), . . . , ( tn−k , xn−k ). (Brenan et al., 1996, section 3.1) We shall take
a closer look at the 1-step bdf method, which given the solution up to ( tn−1 , xn−1 )
and a time tn > tn−1 solves the equation
f( ( xn − xn−1 )/( tn − tn−1 ), xn, tn ) = 0
to obtain xn. Of course, deciding how far from tn−1 we may place tn without getting
too large errors in the solution is a very important question, but it is outside the scope
of this background to cover this. A related topic of great importance is to ensure
that the discretized solution converges to the true solution as the step size tends to
zero, and when it does, to investigate the order of this convergence. Such analyses
reveal how the choice of k affects the quality of the solution, and will generally also
give results that depend on the index of the equations. The following example does not give any theoretical insights, but just shows the importance of the index when
solving a dae by the 1-step bdf method.
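
As a concrete illustration (ours; the dae and all names are hypothetical), the 1-step bdf method amounts to solving the discretized residual for xn at each step, here with a general-purpose nonlinear solver:

import numpy as np
from scipy.optimize import fsolve

def bdf1(f, x0, ts):
    # Solve f( ( x_n - x_{n-1} )/h, x_n, t_n ) = 0 step by step.
    xs = [np.asarray(x0, dtype=float)]
    for t_prev, t in zip(ts, ts[1:]):
        h = t - t_prev
        g = lambda x: f((x - xs[-1]) / h, x, t)   # residual in the unknown x_n
        xs.append(fsolve(g, xs[-1]))              # previous value as initial guess
    return np.array(xs)

# Hypothetical index-1 dae: x1' + x1 - x2 = 0, x2 - sin(t) = 0.
f = lambda xd, x, t: np.array([xd[0] + x[0] - x[1], x[1] - np.sin(t)])
ts = np.linspace(0.0, 1.0, 101)
xs = bdf1(f, [0.0, 0.0], ts)                      # consistent initial conditions
print(xs[-1][0], (np.sin(1) - np.cos(1) + np.exp(-1)) / 2)   # compare exact x1(1)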
2.19 Example
Consider applying the 1-step bdf method to the square index 1 lti dae
E x′(t) + A x(t) + B u(t) = 0
Discretization leads to
E ( xn − xn−1 )/hn + A xn + B u(tn) = 0
where hn = tn − tn−1. By writing this as
( E + hn A ) xn = E xn−1 − hn B u(tn)
we see that the iteration matrix
E + hn A    (2.43)
must be non-singular for the solution to be well defined. Recalling that the differentiation index is revealed by the shuffle algorithm, we know that there exists a nonsingular matrix K such that
K ( E + hn A ) = [ Ē; 0 ] + hn [ Ā; Ã ] = [ I 0; 0 hn I ] ( [ Ē; Ã ] + hn [ Ā; 0 ] )
where the first term within the parentheses is non-singular. This proves the non-singularity of the iteration matrix (2.43) in general, since the parenthesized factor is non-singular for hn = 0, and will hence only be
singular for finitely many values of hn. Had the index been higher than 1, interpretation of the index via the shuffle algorithm reveals that the iteration matrix is singular for hn = 0, and hence ill-conditioned for small hn. (It can be shown that it is precisely the dae where the iteration matrix is singular for all hn that are not solvable at all; see Brenan et al. (1996, theorem 2.3.1).) This shows that this method is limited to systems of index no more than 1.
Note that the row operations that revealed the non-singularity also have practical
use, since if applied before solving the dae, the condition number of the iteration
matrix is typically improved significantly, and this condition is directly related to
how errors in the estimate xn−1 are propagated to errors in xn .
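
The remark is easy to illustrate numerically (our sketch; both pairs are hypothetical): after the row scaling, the iteration matrix of an index-1 dae stays well conditioned as hn → 0, while for an index-2 pair the condition number grows without bound.

import numpy as np

# Index 1: E = [[1,0],[0,0]], A = [[1,-1],[0,1]]; index 2: E = [[0,1],[0,0]], A = I.
# Iteration matrices E + h A with the algebraic row rescaled by 1/h:
for h in [1e-1, 1e-3, 1e-5]:
    M1 = np.array([[1 + h, -h], [0.0, 1.0]])   # index 1: condition number -> 1
    M2 = np.array([[h, 1.0], [0.0, 1.0]])      # index 2: condition number ~ 1/h
    print(h, np.linalg.cond(M1), np.linalg.cond(M2))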
The following example shows how to combine the shuffle algorithm with the 1-step
bdf method to solve lti dae of arbitrary index.
2.20 Example
Consider solving an initial value problem for the square higher-index (solvable) lti
dae
E x′(t) + A x(t) + B u(t) = 0
After some iterations of the shuffle algorithm (it can be shown that the index is
bounded by the dimension of x for well-posed problems, see the remark in algo-
rithm 2.1), we will obtain the square dae
[ Ē_{νD−1}; 0 ] x′(t) + [ Ā_{νD−1}; Ã_{νD−1} ] x(t) + · · · = 0
where the dependence on u and its derivatives has been omitted for brevity. At this
stage, the full set of algebraic constraints has been revealed, which we write
C_{νD} x(t) + · · · = 0
It is known that
[ Ē_{νD−1}; Ã_{νD−1} ]
is full-rank, where the lower block is contained in C_{νD}. This shows that it is possible
to construct a square dae of index 1 which contains all the algebraic constraints, by
selecting as many independent equations as possible from the algebraic constraints,
and completing with differential equations from the upper block of the index 0 system.
Note that the resulting index 1 system has a special structure; there is a clear separation into differential and non-differential equations. This is valuable when the
equations are integrated, since it allows row scaling of the equations so as to improve
the condition of the iteration matrix — compare the previous example.
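
For lti dae, the shuffle-based procedure above is straightforward to sketch in code (ours, not from the thesis; rank decisions use an SVD with an arbitrary tolerance, and forcing functions are omitted):

import numpy as np

def shuffle_index(E, A, tol=1e-10):
    # Differentiation index of E x' + A x = 0 via the shuffle algorithm.
    n = E.shape[0]
    E, A = E.astype(float), A.astype(float)
    for nu in range(n + 1):
        U, s, _ = np.linalg.svd(E)
        r = int(np.sum(s > tol * max(s[0], 1.0)))
        if r == n:
            return nu                     # E non-singular: implicit ode reached
        KE, KA = U.T @ E, U.T @ A         # row compression: rows r.. of KE vanish
        E = np.vstack([KE[:r], KA[r:]])   # differentiate the algebraic equations
        A = np.vstack([KA[:r], np.zeros((n - r, n))])
    raise ValueError("no finite index; the pencil may be singular")

print(shuffle_index(np.array([[0., 1], [0, 0]]), np.eye(2)))   # nilpotent example: 2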
In the previous example, a higher index dae was transformed to a square index 1
dae which contained all the algebraic constraints. Why not just compute the implicit ode and apply an ode solver, or apply a bdf method to the index 1 equations
just before the last iteration of the shuffle algorithm? The reason is that there is no
magic in the ode solvers or the bdf method; they cannot guarantee that algebraic
constraints which are not present in the equations they see remain satisfied even
though the initial conditions are consistent. Still, the algebraic constraints are not
violated arbitrarily; for consistent initial conditions, the true solution will remain on
the manifold defined by the algebraic constraints, and it is only due to numerical errors that the computed solution will drift away from this manifold. By including the
algebraic constraints in the index 1 system, it is ensured that they will be satisfied at
each sampling instant of the computed solution.
There is another approach to integration of dae which seems to be gradually replacing the bdf methods in many implementations. These are the implicit Runge–Kutta methods, and early work on their application to dae includes Petzold (1986)
and Roche (1989). Although these methods are basically applicable to dae of higher
index, poor convergence is prohibitive unless the index is low. (Compare the 1-step
bdf method which is not at all applicable unless the index is at most 1.) The class of
irk methods is large, and this is where the popular Radau IIa belongs.
Having seen that higher index dae require some kind of index-reducing treatment,
we finish this section by recalling that index reduction and index deduction are
closely related, and that both the shuffle algorithm (revealing the differentiation in-
dex) and the algorithm that is used to compute the strangeness index may be used
to produce equations of low index. In the latter context, one speaks of producing
strangeness-free equations.
2.2.9 Existing software
To round off our introductory background on dae topics, some existing software for
the numerical integration of dae will be mentioned. However, as numerical integration is merely one of the applications of the work in this thesis, the methods will only
be mentioned very briefly just to give an idea of what sort of tools there are.
The first report on dassl (Brenan et al., 1996) was written by Linda Ruth Petzold in September 1982. It is probably the best known dae solver, but has been superseded by
an extension called daspk (Brown et al., 1994). Both dassl and daspk use a bdf
method with dynamic selection of order (1-step through 5-step) and step size, but
the latter is better at handling large and sparse systems, and is also better at finding
consistent initial conditions.
The methods in daspk can also be found in the more recent ida (dating 2005) (Hindmarsh et al., 2004), which is part of the software package sundials (Hindmarsh
et al., 2005). The name of this software package is an abbreviation of SUite of Nonlinear and DIfferential/Algebraic equation Solvers, and the emphasis is on the movement from Fortran source code to C. The ida solver is the dae solver used by the
general-purpose scientific computing tool Mathematica•.
While the bdf methods in the software mentioned so far require that the user ensures that the index is sufficiently reduced, the implementations built around the
strangeness index perform index reduction on the fly. Another interesting difference
is that the solvers we find here implement also irk methods beside bdf. In 1995,
the first version of gelda (Kunkel et al., 1995) (A GEneral Linear Differential Algebraic equation solver) appeared. It applies to linear time-varying dae, and there
is an extension called genda (Kunkel and Mehrmann, 2006) which applies to general nonlinear systems. The default choice for integration of the strangeness-free
equations is the Radau IIa irk method implemented in radau5 (Hairer and Wanner,
1991).
2.3 Initial condition response bounds
The initial condition response of a system is the solution to the dynamic equations
of the system when all forcing functions have been set to zero, given an initial state.
Since setting all forcing functions to zero yields an autonomous system, the study of
initial condition responses is the study of autonomous systems. For linear systems,
the output of a system with forcing functions is the sum of the initial condition response, and the response to the forcing functions from a zero initial state. Hence, initial condition responses are also important for the understanding of systems which
are not autonomous.
• Version 7 being the current version, see http://reference.wolfram.com/mathematica/tutorial/NDSolveIDAMethod.html, or the corresponding part of the on-line help.
One of the key problems is to bound the largest possible gain from initial conditions to the state at any later time. For linear systems, the state at time t is given by
the transition matrix (Rugh, 1996, chapter 3), sometimes known as the fundamental
matrix,
x(t) = Φ(t, 0) x(0)
Hence, the gain to be bounded may be expressed as
sup_{t≥0} ‖Φ(t, 0)‖₂
and we will often use language in terms of the transition matrix and initial condition
responses interchangeably. We are interested in systems which are asymptotically
stable (below, we will use stronger stability conditions, such as definition 2.37), for
which it is meaningful to seek bounds that hold at all future times.
2.3.1 LTI ODE
For linear time-invariant systems,
x′(t) = M x(t)
the transition matrix is given by Φ(t, 0) = e^{M t}. Bounding the matrix e^{M t} is a fundamental problem which has been studied by many, and this section contains a selection of results from the literature. Before we start, however, we recall one of the basic results by Lyapunov.
2.21 Theorem (An inverse Lyapunov theorem). If M is a Hurwitz matrix (that is,
α( M ) < 0), then there exists a symmetric positive definite matrix P satisfying the
(time-invariant) Lyapunov equation
Mᵀ P + P M = −I    (2.44)
(The matrix I may be replaced by any positive definite matrix.) The solution is given by
P = ∫₀^∞ e^{Mᵀ t} e^{M t} dt    (2.45)
Proof: This is a well-known result; for instance, see Rugh (1996, theorem 7.11).
Generally speaking, a Lyapunov function is a function used to prove stability properties of a system. They are used for lti, ltv, as well as nonlinear systems. The
idea is that the function shall be a continuously differentiable non-negative function
of the state, being 0 only at the origin, and such that when it is composed with the
state trajectory, it becomes a decreasing function of time. If a function is intended
to be a Lyapunov function, but has not yet been proven to be one (for instance, because it has
some parameters to be determined first) it is often referred to as a Lyapunov function candidate. See Khalil (2002, chapter 4) for an introduction in the general context
of nonlinear systems. The primary purpose of theorem 2.21 is to use x ↦ xᵀ P x as
a Lyapunov function for the system x′ = M x, and doing so it is easy to derive a constant bound on ‖e^{M t}‖₂ for Hurwitz M.
2.22 Theorem. If M is a Hurwitz matrix, then
‖e^{M t}‖₂ ≤ √( ‖P‖₂ ‖P⁻¹‖₂ )   for all t ≥ 0
where P is the symmetric positive definite matrix whose existence is ensured by theorem 2.21.
Proof: Consider solutions to the differential equation x′(t) = M x(t) with initial conditions x(0) = x0, and let t ≥ 0. Then |x(t)| = |e^{M t} x0|. Since x ↦ xᵀ P x is a Lyapunov function, it follows that x(t)ᵀ P x(t) ≤ x(0)ᵀ P x(0). Knowing that P is a symmetric positive definite matrix, we may conclude that x(t)ᵀ P x(t) ≥ σmin(P) |x(t)|², and x(0)ᵀ P x(0) ≤ σmax(P) |x(0)|². Noting that σmin(P) = ‖P⁻¹‖₂⁻¹ and σmax(P) = ‖P‖₂, one obtains |e^{M t} x0| = |x(t)| ≤ √( ‖P‖₂ ‖P⁻¹‖₂ ) |x0|. Since x0 was arbitrary, this implies the result.
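
The bound is straightforward to evaluate; a small sketch (ours, with a hypothetical Hurwitz matrix) using SciPy's Lyapunov solver:

import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

M = np.array([[0.0, 1.0], [-1.0, -1.0]])          # hypothetical Hurwitz matrix
P = solve_continuous_lyapunov(M.T, -np.eye(2))    # M^T P + P M = -I
bound = np.sqrt(np.linalg.norm(P, 2) * np.linalg.norm(np.linalg.inv(P), 2))
for t in [0.0, 1.0, 5.0]:
    print(t, np.linalg.norm(expm(M * t), 2), "<=", bound)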
In Gurfil and Jodorkovsky (2003), the Lyapunov method of theorem 2.22 was applied
in combination with convex optimization techniques to find a matrix P with small
condition number. The beauty of the method of using Lyapunov functions is that it
is not restricted to linear systems.
Theorem 2.22 is very coarse. For instance, at t = 0 it is clear that ‖e^{M·0}‖₂ = 1, while the theorem completely fails to capture this.• Additionally, it is well known that M being Hurwitz implies that ‖e^{M t}‖₂ → 0 as t → ∞, and the theorem fails to capture this too. A common technique is to obtain decaying bounds by using shifts.
2.23 Lemma. For any scalar z,
‖e^{M t}‖₂ = e^{−Re z} ‖e^{M t + I z}‖₂    (2.46)
Proof: Since M t and I z commute, e^{M t + I z} = e^{M t} e^{I z} = e^{z} e^{M t}. Taking norms on both sides and solving for ‖e^{M t}‖₂ gives the result.
Applying lemma 2.23 to theorem 2.22 with z = a t, a ∈ ( 0, −α( M ) ), gives
‖e^{M t}‖₂ ≤ e^{−a t} √( ‖Pa‖₂ ‖Pa⁻¹‖₂ )    (2.47)
where Pa is the solution to (2.44) with the Hurwitz M + a I instead of M. Note the form of this bound; it is the product of one finite expression which depends only on M, and one expression which is exponentially decaying for Hurwitz M.
Better bounds were derived in Van Loan (1977). For instance, the next theorem gives
a bound which is able to capture the exponential decay.
• In Veselić (2003, equation (13)) there is a reference to a similar result, where the same bound as above is multiplied with the exponentially decaying factor e^{−t/( 2 ‖P‖₂ )}.
2.24 Theorem. If M has Schur decomposition M = Q ( D + N ) Qᴴ (Q will be unitary, D diagonal, and N strictly upper triangular), then
‖e^{M t}‖₂ ≤ e^{α( M ) t} ∑_{k=0}^{n−1} ‖N t‖₂ᵏ / k!    (2.48)
Proof: See the derivation of Van Loan (1977, equation (2.11)).
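
Again, the bound is easy to evaluate numerically; the sketch below (ours) uses a hypothetical Hurwitz matrix that is far from normal:

import numpy as np
from math import factorial
from scipy.linalg import expm, schur

M = np.array([[-1.0, 5.0], [0.0, -2.0]])
n = M.shape[0]
T, Q = schur(M, output="complex")   # M = Q T Q^H with T = D + N upper triangular
N = np.triu(T, k=1)                 # strictly upper part: departure from normality
alpha = np.max(np.diag(T).real)     # spectral abscissa alpha(M)
for t in [0.5, 2.0, 5.0]:
    rhs = np.exp(alpha * t) * sum(np.linalg.norm(N * t, 2)**k / factorial(k) for k in range(n))
    print(t, np.linalg.norm(expm(M * t), 2), "<=", rhs)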
The theorem captures the fact that the problem is trivial if M is normal, as this implies N = 0, but this will not be the case in this thesis — this is a matter of what we are
willing to assume about M. More generally, if we were willing to make assumptions
about ‖N‖₂, being the departure from normality of M, this would also immediately
yield a bound by theorem 2.24. However, we are inclined to only make assumptions
about system features, and this measure’s being invariant under norm-preserving
changes of variables does not convince us that it could rightfully be considered a system feature; it is the restriction to norm-preserving transformations which bothers
us.
Making shifts in (2.48) makes no difference, but a bound comparable with (2.47) is
still easy to derive. The following two results (except for the shifted bound) appeared
in Tidefelt and Glad (2008) and are simple rather than tight.
2.25 Corollary. The matrix exponential is bounded according to
‖e^{M t}‖₂ ≤ e^{α( M ) t} ∑_{i=0}^{n−1} ( 2 ‖M‖₂ )ⁱ tⁱ / i!    (2.49)
Proof: Let Qᴴ M Q = D + N be a Schur decomposition of M, and use ‖N‖₂ = ‖Qᴴ M Q − D‖₂ ≤ ‖M‖₂ + ‖M‖₂ in theorem 2.24.
2.26 Lemma. If the map M is Hurwitz, that is, α( M ) < 0, then for t ≥ 0,
‖e^{M t}‖₂ ≤ e^{ 2 e⁻¹ n ‖M‖₂ / ( −α( M ) ) }    (2.50)
Further, shifting with a ∈ ( 0, −α( M ) ) results in
‖e^{M t}‖₂ ≤ e^{−a t} e^{ 2 e⁻¹ n ‖M‖₂ / ( −( α( M ) + a ) ) }    (2.51)
Proof: Let f( t ) ≜ ‖e^{M t}‖₂. From corollary 2.25 we have that
f( t ) ≤ ∑_{i=0}^{n−1} ( 2 ‖M‖₂ )ⁱ tⁱ / i! e^{α( M ) t} ≕ ∑_i f_i( t )
Each f_i( t ) can easily be bounded globally since they are smooth, tend to 0 from above as t → ∞, and the only stationary point is found via f_i′( t ). From
f_i′( t ) = e^{α( M ) t} ( ( 2 ‖M‖₂ )ⁱ t^{i−1} / i! ) ( t α( M ) + i )
it follows that the stationary point is t = −i / α( M ). Hence,
f_i( t ) ≤ f_i( −i / α( M ) ) = ( ( 2 ‖M‖₂ )ⁱ / i! ) ( i / ( −α( M ) ) )ⁱ e^{−i} ≤ ( 2 e⁻¹ n ‖M‖₂ / ( −α( M ) ) )ⁱ / i!
and it follows that
f( t ) ≤ ∑_{i=0}^{n−1} ( 2 e⁻¹ n ‖M‖₂ / ( −α( M ) ) )ⁱ / i! ≤ ∑_{i=0}^{∞} ( 2 e⁻¹ n ‖M‖₂ / ( −α( M ) ) )ⁱ / i! = e^{ 2 e⁻¹ n ‖M‖₂ / ( −α( M ) ) }
The shifted result follows immediately from (2.50).
However, after the development of this result, the theorem below was found in the
literature. The bounds provided by the two theorems are both functions of the same
ratio between a matrix’s norm and the smallest distance from any eigenvalue to the
imaginary axis, and hence they are equivalent for our qualitative convergence results. However, for practical purposes, when quality must be turned into quantity,
the theorem below offers a tremendous advantage.
2.27 Theorem. For a Hurwitz matrix M ∈ ℝ^{n×n} and t ≥ 0, the matrix exponential is bounded as
‖e^{M t}‖₂ ≤ γ( n ) ( ‖M‖₂ / ( −α( M ) ) )^{n−1} e^{α( M ) t / 2}    (2.52)
where
γ( n ) = 1 + ∑_{i=1}^{n−1} 4ⁱ ( i e⁻¹ )ⁱ / i!
Proof: This is Godunov (1997, proposition 3.3, p 20) extended with the expression for γ( n ), which can easily be extracted from the proof.
Comparing (2.47) with (2.51), we note that the former bound involves the non-trivial dependence on M through the solution to the Lyapunov equation (2.44), while the latter often grossly over-estimates the norm it bounds, but uses only very elementary properties of the matrix. However, the condition number of the solution to the Lyapunov equation may be bounded without actually solving the equation, by application of bounds listed in the survey Kwon et al. (1996, equation (70) and equation (87)) (in their notation, ‖Pa‖₂ = α₁, ‖Pa⁻¹‖₂ = αn⁻¹, and ‖M + a I‖₂ = γ₁). The only upper bound they list for α₁ makes use of twice the logarithmic norm (see Ström (1975) for properties of this norm and further references) of M + a I, being α( M + Mᵀ + 2 a I ), and requires this to be negative. When this is the case the following bound is obtained,
√( ‖Pa‖₂ ‖Pa⁻¹‖₂ ) ≤ √( ‖M + a I‖₂ / ( −α( M + Mᵀ + 2 a I ) ) )
but unfortunately the logarithmic norm may be positive even though α( M + a I ) is
negative. Hence, we are unable to derive from (2.47) a bound that is both exponentially decaying whenever α( M ) is negative, and expressed without direct reference
to the solution to the Lyapunov equation.
While theorem 2.24 both gives a tight estimate at t = 0 and exhibits the true rate of
the exponential decay, the polynomial coefficient makes the estimate very conservative, even for small t. Since the tightness of bounds for the matrix exponential is
directly related to how well our results in this thesis are connected to applications (by
means of deriving useful quantitative bounds), we shall end our discussion of matrix
exponential bounds with a recent result which appears in Veselić (2003). However,
while the original presentation is concerned with exponentially stable semigroups
(which may be infinite-dimensional), the results are stated here in terms of matrices
to make the results more accessible to readers unfamiliar with the original framework.
The bounds are formulated using the following two scalar functions of a matrix:
δ( M ) ≜ 2 sup_{|x|=1} Re xᴴ M x
γ( M ) ≜ 2 inf_{|x|=1} Re xᴴ M x
For their forthcoming analysis, it is assumed that γ( M ) < δ( M ). The first of these definitions, δ( M ), may be recognized as twice the logarithmic norm of M. Among the properties for the logarithmic norm in Ström (1975, lemma 1c), we note
α( M ) ≤ ½ δ( M ) ≤ ‖M‖₂    (2.53)
and the following alternative formulation shows its close connection to the norm of the matrix exponential
δ( M ) = 2 lim_{h→0+} ( ‖e^{M h}‖₂ − 1 ) / h
Veselić (2003) also reminds that δ( M ) = α( M + Mᵀ ), and regarding the second of the definitions, it is shown that γ( M ) ≤ −‖P‖₂⁻¹, where P is the solution to (2.44).
2.28 Theorem. For Hurwitz M,
‖e^{M t}‖₂ ≤ e^{δ( M ) t / 2}   for t ≤ h₀( M )
‖e^{M t}‖₂ ≤ √( ( 1 + δ( M )² ‖P‖₂² + 2 δ( M ) ‖P‖₂ ) / ( 1 − δ( M ) / γ( M ) ) ) e^{−t / ( 2 ‖P‖₂ )}   for h₀( M ) ≤ t    (2.54)
where
h₀( M ) = ( 1 / δ( M ) ) ln( ( 1 + δ( M ) ‖P‖₂ ) / ( 1 − δ( M ) / γ( M ) ) )    (2.55)
and P is the solution to (2.44).
Proof: See Veselić (2003, theorem 4).
2.3.2 LTV ODE
When we consider linear time-varying systems in chapter 8, we extend results from
Kokotović et al. (1986, section 5.2) and we shall make use of some results from there.
2.29 Lemma. Let φ( t, s ) denote the transition matrix of the time-scaled ltv system
m z′(t) = M(t) z(t)
Assume that there exist a time interval I and constants c₁ > 0, c₂, β, such that
∀ t ∈ I : α( M(t) ) ≤ −c₁
∀ t ∈ I : ‖M(t)‖₂ ≤ c₂
∀ t ∈ I : ‖M′(t)‖₂ ≤ β
Then there exist positive constants m₀, a, K, such that for all m < m₀, and s, t in I,
t ≥ s ⇒ ‖φ( t, s ) − e^{M(s) ( t−s ) / m}‖₂ ≤ m K e^{−a ( t−s ) / m}    (2.56)
Proof: See Kokotović et al. (1986, lemma 5:2.2), with further references to similar results given in Kokotović et al. (1986, section 5:10).
While discussing linear time-varying systems, we take the opportunity to give a definition related to transformations of such systems, even though it is not particularly related to the bounding of initial condition responses. Consider the change of variables
T(t) z(t) = x(t)    (2.57)
in the system
x′(t) = M(t) x(t)    (2.58)
Via the intermediate dae,
T′(t) z(t) + T(t) z′(t) = M(t) T(t) z(t)
the ode in z is found to be
z′(t) = ( T(t)⁻¹ M(t) T(t) − T(t)⁻¹ T′(t) ) z(t)    (2.59)
2.30 Definition (Lyapunov transformation). The square time-varying matrix T is called a Lyapunov transformation if it is continuously differentiable, T(t) is invertible for every t, and there are time-invariant constants bounding ‖T(t)‖₂ and ‖T(t)⁻¹‖₂ for all t.
Knowing that a transformation matrix is a Lyapunov transformation allows us to
work with the transformed system instead of the original one, knowing that the qualitative properties will be the same, and when we are done, we apply the reverse transformation to obtain results for the original system. For a theoretical application of
this definition, see for instance Rugh (1996, theorem 6.15).
2.3.3 Uncertain LTI ODE
Bounds on the initial condition response of an uncertain system are closely related to
perturbation theory, so there is a strong connection between the present section and
section 2.4.1 below.
The bounds mentioned so far only apply to exactly known systems, while the applications in this thesis will concern uncertain systems. In Boyd et al. (1994), a bound for
linear differential inclusions (see section 2.4.2) is given as a linear matrix inequality
optimization problem. The technique is based on the idea to use Lyapunov functions
as described above. However, the classes of uncertainty that can be handled cannot
cater for the problems we encounter in later chapters. An alternative to convex optimization might be to use the plethora of bounds on the eigenvalues (or, equivalently,
the singular values) of the solution to the Lyapunov equation. The survey Kwon et al.
(1996) contains many such bounds, including the following theorem.
2.31 Theorem. Let M be Hurwitz. The solution P to the Lyapunov equation (2.44) satisfies
‖P‖₂ ≥ 1 / ( 2 ‖M‖₂ )    (2.60)
Proof: This is a special case of the main result in Shapiro (1974); the general case allows for the right hand side of (2.44) to be an arbitrary negative definite matrix instead of just −I. However, the current case is trivial since
1 = ‖−I‖₂ = ‖P M + Mᵀ P‖₂ ≤ 2 ‖P‖₂ ‖M‖₂
2.4 Regular perturbation theory
By a regular perturbation we refer to perturbations of the expression for the time derivative in an ode. In the literature, the perturbations often occur in just one small
parameter, but we shall not restrict our notion to this case. Instead, we let perturbation theory refer to any theory which aims to describe how the solutions to equations
depend on small parameters in the equations. The perturbation parameters may be
used to model uncertainties, but the theory may also be useful for known quantities. In this and the following sections, we only consider perturbations of differential
equations (compare lemma 2.46 which concerns the problem perturbation in matrix
inversion). Like the perturbed equations themselves, the perturbation parameters
may or may not be allowed to be time-varying.
2.4.1 LTI ODE
Since the solution to the initial value problem
x′(t) = M x(t),   x(0) = x0
is given by
x(t) = e^{M t} x0    (2.61)
(2.61)
2.4
59
Regular perturbation theory
understanding the perturbed problem
z 0 (t) = ( M + F ) z(t)
z(0) = x0
(2.62)
becomes a matter of understanding the sensitivity of the matrix exponential with
respect to perturbations.
The sensitivity of the norm of the matrix exponential was the theme of Van Loan
(1977), to which we have referred previously regarding results on bounds on the matrix exponential. It turns out that it is the bound on the matrix exponential (section 2.3.1) which is the key to the relative sensitivity, formalized by the following
lemma.
2.32 Lemma. Assume there exists a monotonically increasing function γ on [ 0, ∞ ) and a constant β such that t ≥ 0 implies
‖e^{M t}‖₂ ≤ γ(t) e^{β t}
Then
‖e^{( M + F ) t} − e^{M t}‖₂ / ‖e^{M t}‖₂ ≤ ‖F‖₂ t γ(t)² e^{[ β − α( M ) + ‖F‖₂ γ(t) ] t}    (2.63)
Proof: This is Van Loan (1977, lemma 1).
The lemma should be compared with the outer approximations of the reachable sets
in example 2.36 on page 61. For any choice of β and γ, the lemma implies that
restriction to a finite time interval makes the perturbations in the solutions O( ‖F‖₂ ). While the lemma bounds the perturbation relative to the size of the nominal solution, we are often interested in a different bound, namely the absolute difference between the two, |z(t) − x(t)|.
2.33 Lemma. Assume that the nominal system (2.61) is stable, and that there exist a polynomial γ and a constant β < 0 such that t ≥ 0 implies
‖e^{( M + F ) t}‖₂ ≤ γ(t) e^{β t}
Then there is a finite constant k such that the solution to the perturbed system (2.62) satisfies
sup_{t≥0} |z(t) − x(t)| ≤ k ‖F‖₂
Proof: Introducing y = z − x, we find
y′(t) = ( M + F ) y(t) + F x(t),   y(0) = 0
with solution
y(t) = ∫₀ᵗ e^{( M + F )( t − τ )} F x(τ) dτ
Since the nominal system is stable, |x| will be a bounded function, say sup_{t≥0} |x(t)| ≤ x̄. Then we get the estimate
|y(t)| ≤ ‖F‖₂ x̄ ∫₀ᵗ ‖e^{( M + F ) τ}‖₂ dτ ≤ ‖F‖₂ x̄ ∫₀ᵗ γ(τ) e^{β τ} dτ
Here, the integrand has a primitive function which is also a polynomial (with coefficients depending on β) times e^{β τ}. Hence, the integral will be bounded independently of t, which completes the proof.
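
The linear scaling in ‖F‖₂ is easy to observe empirically; a sketch (ours, with a hypothetical stable system and a fixed perturbation direction):

import numpy as np
from scipy.linalg import expm

M = np.array([[0.0, 1.0], [-1.0, -1.0]])   # hypothetical Hurwitz nominal system
x0 = np.array([0.0, 1.0])
ts = np.linspace(0.0, 30.0, 301)
F0 = np.array([[0.3, -0.1], [0.2, 0.4]])
F0 /= np.linalg.norm(F0, 2)                # fixed direction with unit 2-norm
for eps in [1e-2, 1e-3, 1e-4]:
    dev = max(np.linalg.norm((expm((M + eps * F0) * t) - expm(M * t)) @ x0)
              for t in ts)
    print(eps, dev / eps)                  # the ratio approaches a constant k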
Several possible choices of polynomials and exponents to use with lemma 2.33 are
listed in Van Loan (1977), but we find theorem 2.27 particularly convenient since it
only relies on two basic properties of the perturbed matrix. Clearly the number k provided by the lemma would be possible to improve if we also used that x will satisfy
an exponentially decaying bound, which may be important to utilise in applications
when quantitative perturbation bounds need to be as tight as possible.
2.34 Lemma. Consider the perturbed solution over the finite interval [ 0, tf ]. Then there is a finite constant k such that the solution to the perturbed system (2.62) satisfies
sup_{t ∈ [ 0, tf ]} |z(t) − x(t)| ≤ k ‖F‖₂
Proof: Compare the proof of lemma 2.33. Since the nominal solution x will be defined on a compact interval, it will be a bounded function. For the integral, we may use the coarse over-estimate
∫₀ᵗ ‖e^{( M + F ) τ}‖₂ dτ ≤ ∫₀ᵗ e^{‖M + F‖₂ τ} dτ = ( e^{‖M + F‖₂ t} − 1 ) / ‖M + F‖₂
2.4.2 LTV ODE
We now turn to perturbations of the system
x′(t) = M(t) x(t),   x(t₀) = x0    (2.64)
with transition matrix φ. Let the time interval I be defined as [ t₀, ∞ ), so that ‖M‖_I ≤ α is the same as saying that ‖M(t)‖₂ ≤ α for all t ≥ t₀. We will often consider t₀ = 0 without loss of generality. The following definition turns out to be useful.
2.35 Definition (Uniformly bounded-input, bounded-state stable). The ltv system
x′(t) = M(t) x(t) + B(t) u(t),   x(t₀) = 0
with input u is called uniformly bounded-input, bounded-state stable if there exists a finite constant γ such that for any t₀ ≥ 0,
sup_{t≥t₀} |x(t)| ≤ γ sup_{t≥t₀} |u(t)|
See Rugh (1996, note 12.1) regarding some subtleties of definition 2.35.
Three types of results for the perturbation of (2.64) dominate the literature. The first
type is confined to only consider the stability properties of the perturbed system, and
the amount of results shows that this is both important and non-trivial. Some results
which will be useful in later chapters are included below. The second type, well
explained in Khalil (2002, chapter 10), addresses the effect that a scalar perturbation
has on the solutions, and here the amount of literature is likely to be related to the
many application areas where corresponding methods have been successful. Since
we are mainly interested in non-scalar perturbations in this thesis, we shall not give
an account of the scalar perturbation results, but turn attention to the solutions of
the perturbed equation
\[ z'(t) = [\, M(t) + F(t)\, ]\, z(t), \qquad z(0) = x_0 \tag{2.65} \]
where there is a bound $\|F\|_I \le f_0$.
Introducing $y = z - x$ yields the system
\[ y'(t) = [\, M(t) + F(t)\, ]\, y(t) + F(t)\, x(t), \qquad y(0) = 0 \tag{2.66} \]
which can be handled by showing that (2.66) is uniformly bounded-input, bounded-state stable from the input F(t) x(t). Since the input to the system decays with the
size of F, uniform convergence of y to zero follows (provided that the input-state
relation is not only bounded uniformly in the input, but also in the perturbation).
We refer to Rugh (1996, chapter 12) for the definitions and basic results. Clearly,
using the gain provided by the uniformly bounded-input, bounded-state stability
property will result in very conservative perturbation bounds, since they will only
depend on the peak value of |x|, even though x is a known function.
The third way to analyze perturbations is to approximate (2.65) conservatively using
differential inclusions (Filippov, 1985),
\[ z'(t) \in \{\, [\, M(t) + \Delta\, ]\, z(t) : \| \Delta \|_2 \le f_0 \,\}, \qquad z(0) \in z_0 \tag{2.67} \]
That is, we include all the solutions obtained by letting F(t) vary arbitrarily from
one t to another. Hence, the differential inclusion approximation corresponds to
ignoring all differentiability and continuity properties that we may have for F. The
solution to the problem is represented by a set-valued solution function, at each t
giving the reachable set at that time. If these sets can be computed conservatively
(outer approximations), we have a means to deal with quite general perturbations
of ltv systems. The concepts are illustrated by the following example, applied to a
time-invariant system. In the book Boyd et al. (1994) on linear matrix inequalities,
four out of ten chapters are devoted to the study of linear differential inclusions, and
the text should be accessible to a broad audience.
2.36 Example
Consider the perturbed lti system
\[
\begin{pmatrix} z_1'(t) \\ z_2'(t) \end{pmatrix}
= \left( \begin{bmatrix} 0 & 1 \\ -1 & -1 \end{bmatrix} + F \right)
\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix},
\qquad
\begin{pmatrix} z_1(0) \\ z_2(0) \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\tag{2.68}
\]
where $\max(F) \le \varepsilon$. Let $\varepsilon \triangleq 0.1$. The set-valued function $f$ defined by
\[
f\!\left( \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \right)
\triangleq
\begin{bmatrix} [\,-0.1,\, 0.1\,] & [\,0.9,\, 1.1\,] \\ [\,-1.1,\, -0.9\,] & [\,-1.1,\, -0.9\,] \end{bmatrix}
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\]
maps any point z to a point with interval coordinates, that is, a (convex) rectangle
in the ( z1 , z2 ) plane. It is easy to see that the image of a convex set under f is also
a convex set, and according to Kurzhanski and Vályi (1997, lemma 1.2.1) it follows
that the reachable sets are also convex.
We shall approach the perturbation problem in three ways, all illustrated in figure 2.2:
• Making a grid of points in the 4-dimensional uncertainty space, and generating the corresponding solutions. This is supposed to be a reasonably good inner approximation of the perturbation problem (2.68) we aim to solve.
• Computing an outer approximation of the reachable sets of the differential inclusion, by making an interval approximation of the reachable set at each time instant. This results in an ode in 4 variables, being the lower and upper interval bounds on $z_1$, $z_2$ (a sketch of this construction is given after the list).
• Computing an inner approximation of the reachable sets by discretizing time and computing a set of points in the interior of the reachable set at each time instant. Since the reachable sets will be convex, the points will actually represent their convex hull. To “integrate” from one time instant to the next, each vertex in the convex set is mapped to several new points by evaluating all possible combinations of minimum, mid, and maximum values in the uncertain intervals. When all vertices have been mapped, points that are not vertices of the new convex hull are first removed, and then a selection of the remaining vertices is made so that the number of vertices is never more than 10 at any time instant.
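The sketch referred to in the second item: a minimal Python illustration of the 4-variable interval ode for (2.68), assuming the entrywise bound ε = 0.1 from the example. This is not the implementation behind the figures; the bound propagation is deliberately crude but conservative, and exhibits exactly the exponential blowup discussed below.

```python
import numpy as np
from scipy.integrate import solve_ivp

eps = 0.1
M = np.array([[0.0, 1.0], [-1.0, -1.0]])
M_lo, M_hi = M - eps, M + eps              # interval matrix of the inclusion

def ivmul(a_lo, a_hi, b_lo, b_hi):
    """Interval product [a_lo, a_hi] * [b_lo, b_hi]."""
    c = np.array([a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi])
    return c.min(), c.max()

def rhs(t, y):
    # y = (z1_lo, z2_lo, z1_hi, z2_hi); conservative entrywise bounds on z'.
    z_lo, z_hi = y[:2], y[2:]
    d_lo, d_hi = np.zeros(2), np.zeros(2)
    for i in range(2):
        for j in range(2):
            lo, hi = ivmul(M_lo[i, j], M_hi[i, j], z_lo[j], z_hi[j])
            d_lo[i] += lo
            d_hi[i] += hi
    return np.concatenate([d_lo, d_hi])

# Initial point (0, 1) as a degenerate box; integrate the 4-variable ode.
sol = solve_ivp(rhs, (0.0, 6.0), [0.0, 1.0, 0.0, 1.0], max_step=0.01)
print(sol.y[:, -1])   # (z1_lo, z2_lo, z1_hi, z2_hi) at t = 6
```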
Additional outer approximations for smaller values of ε are shown in figure 2.3. It is
seen that the outer approximation is useful during a short time interval, but that it
explodes exponentially — this is typical for this type of interval analysis. Of course,
other outer approximations could also be considered. For instance, in the context of
linear matrix inequalities ellipsoids are the obvious choice (see Boyd et al. (1994)),
and ellipsoids were also used for hybrid systems in Jönsson (2002). The inner approximation seems to approximate the solution to the original problem (2.68) well.
Formalizing the differential inclusion idea gives a constructive method to prove that
the solutions to the perturbed problem converge uniformly as the perturbation tends
to zero, see Filippov (1985, theorem 8.2). Analogously to the method using the
bounded-input, bounded-state stability above, application of Filippov (1985, theorem 8.2) to perturbed linear systems requires us to conservatively use a bound on the peak norm of the nominal solution. Unlike when uniform bounded-input, bounded-state stability is applied to (2.66), the cited theorem for differential inclusions applies only to bounded intervals of time and does not guarantee a rate of convergence.
Hence, in view of the fundamental theorems that establish continuous dependence of
Figure 2.2: Inner and outer approximation of the reachable sets of the differential inclusion corresponding to (2.68) with ε = 0.1. The converging trajectories (gray) were generated by deterministically replacing the uncertainty F by the 3^4 matrices obtained by enumerating all matrices with entries in { −ε, 0, ε }. This provides a heuristic inner approximation of the original perturbation problem, to which the differential inclusion should be compared. The diverging trajectories are the bounds of the outer interval approximation of the reachable sets, found by integrating an ode. The vertical lines are the projections of the inner approximations of the reachable sets, obtained by replacing the uncertainty in the differential inclusion by fixed choices of F over short intervals of time. Here, the same 3^4 matrices were used again, over time intervals of length 0.01 (to enhance readability in the plot, the projections are only shown at a sparse selection of time instants).
Figure 2.3: Outer approximations of the reachable sets of the differential inclusion corresponding to (2.68), for smaller values of ε (ε = 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}) compared to figure 2.2.
solutions as functions of parameters in the system (uniformly in time), the strength of
the theorem lies in its applicability to equations with discontinuous right hand side.
Since we will not encounter such equations in this thesis, the method of differential
inclusion is here mainly considered a computational tool for perturbed systems.
The results below are slightly more precise formulations of results in
Rugh (1996). They are based on time-varying Lyapunov functions, and in their original form they provide conditions for uniform exponential stability, explained by the
following definition according to Rugh (1996, definition 6.5).
2.37 Definition (Uniformly exponentially stable). The system (2.64) is said to be
uniformly exponentially stable if there exist finite positive constants γ, λ such that
for any t0 , x0 , and t ≥ t0 the solution x satisfies
\[ | x(t) | \le \gamma\, e^{-\lambda ( t - t_0 )}\, | x_0 | \]
For lti systems, uniform exponential stability is easily characterized in terms of
eigenvalues.
2.38 Theorem. The lti system
\[ x'(t) = M\, x(t) \]
is uniformly exponentially stable if and only if M is Hurwitz.
Proof: The time-invariance of the system allows us to use $t_0 = 0$ without loss of generality. If $M$ is Hurwitz, then we may take $\lambda \in ( 0, -\alpha( M ) )$, and it remains to show that
\[ \frac{ | x(t) | }{ | x_0 | }\, e^{\lambda t} \]
can be bounded by some constant $\gamma$. Since
\[ | x(t) | \le \| e^{M t} \|_2\, | x_0 | \]
and theorem 2.24 shows that there exists a polynomial $p$ such that
\[ \| e^{M t} \|_2 \le p(t)\, e^{\alpha( M )\, t} \]
it follows that
\[ \frac{ | x(t) | }{ | x_0 | }\, e^{\lambda t} \le p(t)\, e^{( \lambda + \alpha( M ) )\, t} \]
where $\lambda + \alpha( M ) < 0$. Since the exponential decay will dominate the polynomial growth as $t \to \infty$, and the function to be bounded is continuous, the function is bounded.
Conversely, if M is not Hurwitz, then there exists at least one eigenvalue λ with
Re λ ≥ 0. Taking x0 as the corresponding eigenvector shows that |x(t)| does not tend
to zero as t → ∞, showing that the system cannot be uniformly exponentially stable.
The additional precision needed in this thesis is captured by the next definition,
which requires the constants of the exponential decay to be made visible. For systems without uncertainty, the difference to the usual uniform exponential stability
above is minor, but for uncertain systems, the new definition means that the uniform
exponential stability is uniform with respect to the uncertainty.
2.39 Definition (Uniformly $[\, \gamma\, e^{-\lambda \bullet} \,]$-stable). The system (2.64) is said to be uniformly $[\, \gamma\, e^{-\lambda \bullet} \,]$-stable if it is uniformly exponentially stable with the parameters $\gamma$, $\lambda$ used in definition 2.37.
We now rephrase three theorems in Rugh (1996) using our new definition.
2.40 Theorem. The system (2.64) is uniformly $\left[ \sqrt{\rho / \eta}\; e^{-\frac{\nu}{2\rho} \bullet} \right]$-stable if there exist a symmetric matrix-valued, continuously differentiable function $P$ and constants $\eta > 0$, $\rho \ge \eta$, and $\nu > 0$, satisfying for all $t$
\[ \eta I \preceq P(t) \preceq \rho I \tag{2.69a} \]
\[ M(t)^T P(t) + P(t)\, M(t) + P'(t) \preceq -\nu I \tag{2.69b} \]
Proof: The proof of Rugh (1996, theorem 7.4) applies.
2.41 Theorem. Suppose the system (2.64) is uniformly $[\, \gamma\, e^{-\lambda \bullet} \,]$-stable and $\|M\|_I \le \alpha$. Then the matrix-valued function $P$ defined by
\[ P(t) = \int_t^{\infty} \phi(\tau, t)^T\, \phi(\tau, t)\, d\tau \tag{2.70} \]
is symmetric for all $t$, continuously differentiable, and satisfies (2.69) with
\[ \eta = \frac{1}{2\alpha}, \qquad \rho = \frac{\gamma^2}{2\lambda}, \qquad \nu = 1 \]
Proof: The proof of Rugh (1996, theorem 7.8) applies.
For the perturbation
\[ z'(t) = [\, M(t) + F(t)\, ]\, z(t) \tag{2.71} \]
of (2.64) we now have the following theorem.
2.42 Theorem. If the system (2.64) satisfies the assumptions of theorem 2.41, then there exists a constant $\beta > 0$ such that $\|F\|_I \le \beta$ implies that the perturbed system (2.71) is uniformly $[\, \tilde{\gamma}\, e^{-\tilde{\lambda} \bullet} \,]$-stable with
\[ \tilde{\gamma} = \gamma \sqrt{\frac{\alpha}{\lambda}}, \qquad \tilde{\lambda} = \frac{\lambda}{2 \gamma^2} \]
Proof: Follows by the proof of Rugh (1996, theorem 8.6) with minor addition of detail. The main idea is to take the $P$ which theorem 2.41 provides for the nominal system (2.64), and use it in theorem 2.40 applied to the perturbed system (2.71). Of the two conditions $P$ must satisfy, (2.69a) is trivial since it does not involve the perturbation. For the other condition, (2.69b), the cited proof shows that $\nu = 1/2$ does the job. The proof is completed by inserting the values for $\eta$, $\rho$, $\nu$ in theorem 2.40.
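For completeness, the final insertion step can be spelled out (this small computation is implied by the proof, using the stability parameters $\sqrt{\rho / \eta}$ and $\nu / ( 2 \rho )$ of theorem 2.40 with $\eta = 1/(2\alpha)$, $\rho = \gamma^2/(2\lambda)$, $\nu = 1/2$):
\[
\tilde{\gamma} = \sqrt{ \frac{\rho}{\eta} } = \sqrt{ \frac{ \gamma^2 / ( 2 \lambda ) }{ 1 / ( 2 \alpha ) } } = \gamma \sqrt{ \frac{\alpha}{\lambda} },
\qquad
\tilde{\lambda} = \frac{\nu}{2 \rho} = \frac{ 1/2 }{ 2\, \gamma^2 / ( 2 \lambda ) } = \frac{\lambda}{2 \gamma^2}
\]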
The strength of theorem 2.42 compared to Rugh (1996, theorem 8.6) is that the exponential convergence parameters of the perturbed system are expressed solely in terms of the exponential convergence parameters of the nominal system and the norm bound on M. This will be useful in chapter 8, where the “nominal” system is unknown up to the specifications required by theorem 2.42.
2.4.3 Nonlinear ODE
That a whole chapter in the classic Coddington and Levinson (1985, chapter 17) is devoted to the perturbations of a nonlinear system in two dimensions signals that perturbation of nonlinear systems is in general a very difficult problem. Nevertheless, Lyapunov-based stability results similar to those in the previous section exist,
see, for instance, Khalil (2002, chapter 9). However, since perturbations of nonlinear systems will not be considered in later chapters, we will not present any of the
Lyapunov-based results here. Instead, we will just quickly show how a standard perturbation form can be derived.
Consider adding a small perturbation $g( x(t), t )$ to the right hand side of the nominal system
\[ x'(t) = f( x(t), t ) \tag{2.72} \]
yielding
\[ z'(t) = f( z(t), t ) + g( z(t), t ) \tag{2.73} \]
Introducing $y = z - x$, and subtracting (2.72) from (2.73), results in
\[ y'(t) = f( x(t) + y(t), t ) - f( x(t), t ) + g( x(t) + y(t), t ) \]
Series expansion of the first term, regarding $x(t)$ as a given function of $t$, shows that
\[ y'(t) = M(t)\, y(t) + h( y(t), t ) + g( x(t) + y(t), t ) \tag{2.74} \]
where $h( y, t ) = o( y )$ for each $t$. To help the analysis, it is typically assumed that $h( y, t ) + g( x(t) + y, t ) = o( y )$ uniformly in $t$. Additionally assuming that $M$ is time-invariant helps even more, leading to the standard form
\[ y'(t) = M\, y(t) + \hat{f}( y(t), t ) \tag{2.75} \]
where $M$ is assumed Hurwitz and $\hat{f}( y, t ) = o( y )$.
Results regarding the solutions to (2.75) can be found in standard text books on ode,
such as Coddington and Levinson (1985, chapter 13), Cesari (1971, chapter 6), or
Khalil (2002).
2.5 Singular perturbation theory
Recall the model reduction technique called residualization (section 2.1.5). In singular perturbation theory, a similar reduction can be seen as the limiting system as some dynamics become arbitrarily fast (Kokotović et al., 1986). However, some of the
assumptions made in the singular perturbation framework are not always satisfied
in the presence of matrix-valued singular perturbations, and this is a major concern
in this thesis. The connection to model reduction and singular perturbation theory is
interesting also for another reason, namely that the classical motivation in those areas
is that the underlying system being modeled is singularly perturbed in itself, and one
is interested in studying how this can be handled in modeling and model-based techniques. Although that framework is built around ordinary differential equations, the
situation is just as likely when dae are used to model the same systems. It is a goal
of this thesis to highlight the relation between matrix-valued singular perturbations
that are due to stiffness in the system being modeled, and the treatment of matrix-valued singular perturbations that are artifacts of numerical errors and the like. In
view of this, this section not only provides background for forthcoming chapters, but
also contains theory with which later development is to be contrasted.
Singular perturbation theory has already been mentioned when speaking of singular
perturbation approximation in section 2.1.5. However, singular perturbation theory
is far more important for this thesis than just being an example of something reminiscent of index reduction in dae. First, it provides a theorem which is fundamental
for the analysis in the second part of the thesis. Second, the way it is developed in
Kokotović et al. (1986) contains the key ideas used in our development from chapter 6 on. In this section, we begin by stating a main theorem for lti systems. We
then briefly indicate how the lti scalar singular perturbation problem has been generalized, as some of these generalizations provide important directions for future
developments of our work. We then give a more detailed account on the work on the
so-called multiparameter singular perturbations, since this generalization relative to
scalar singular perturbation resembles the generalization to matrix-valued singular
perturbation initiated in this thesis. A fairly recent overview of singular perturbation
problems and techniques is presented in Naidu (2002).
2.5.1 LTI ODE
The following (scalar) singular perturbation theorem found in Kokotović et al. (1986,
chapter 2, theorem 5.1) will be useful. Consider the singularly perturbed lti ordinary differential equation
\[
\begin{pmatrix} x'(t) \\ \varepsilon\, z'(t) \end{pmatrix}
= \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
\begin{pmatrix} x(t) \\ z(t) \end{pmatrix},
\qquad
\begin{pmatrix} x(t_0) \\ z(t_0) \end{pmatrix}
= \begin{pmatrix} x^0 \\ z^0 \end{pmatrix}
\tag{2.76}
\]
where we are interested in small $\varepsilon > 0$. Define $M_0 \triangleq M_{11} - M_{12} M_{22}^{-1} M_{21}$, and denote
\[ x_s'(t) = M_0\, x_s(t), \qquad x_s(t_0) = x^0 \tag{2.77} \]
the slow model (obtained by setting $\varepsilon \triangleq 0$ and eliminating $z$ using the non-differential equations thereby obtained), and denote
\[ z_f'(\tau) = M_{22}\, z_f(\tau), \qquad z_f(0) = z^0 + M_{22}^{-1} M_{21}\, x^0 \tag{2.78} \]
the fast model (which is expressed in the timescale given by $\tau \sim ( t - t_0 ) / \varepsilon$).
2.43 Theorem. If $\alpha( M_{22} ) < 0$, there exists an $\varepsilon^* > 0$ such that, for all $\varepsilon \in ( 0, \varepsilon^* ]$, the states of the original system (2.76) starting from any bounded initial conditions $x^0$ and $z^0$, $| x^0 | < c_1$, $| z^0 | < c_2$, where $c_1$ and $c_2$ are constants independent of $\varepsilon$, are approximated for all finite $t \ge t_0$ by
\[
\begin{aligned}
x(t) &= x_s(t) + O( \varepsilon ) \\
z(t) &= -M_{22}^{-1} M_{21}\, x_s(t) + z_f(\tau) + O( \varepsilon )
\end{aligned}
\tag{2.79}
\]
where $x_s(t)$ and $z_f(\tau)$ are the respective states of the slow model (2.77) and the fast model (2.78). If also $\alpha( M_0 ) < 0$ then (2.79) holds for all $t \in [ t_0, \infty )$.
Moreover, the boundary layer correction $z_f(\tau)$ is significant only during the initial short interval $[ t_0, t_1 ]$, $t_1 - t_0 = O( \varepsilon \log \varepsilon )$, after which
\[ z(t) = -M_{22}^{-1} M_{21}\, x_s(t) + O( \varepsilon ) \]
Among the applications of this theorem, numerical integration of the equations is probably the simplest example. The theorem says that for every acceptable tolerance δ > 0 in the solution, there exists a threshold for ε such that for smaller ε, the contribution to the global error from the timescale separation is at most, say, δ/2. If the timescale separation is feasible, one can apply solvers for non-stiff problems to the fast and slow models separately, and then combine the results according to (2.79). This approach is likely to be much more efficient than applying a solver for stiff systems to the original problem.
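A minimal sketch of this procedure in Python (an illustration added here; the block data is an arbitrary two-state example, and solve_ivp's default non-stiff RK45 method plays the role of the non-stiff solver):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Block-partitioned singularly perturbed LTI system (arbitrary example data).
M11, M12 = np.array([[-1.0]]), np.array([[1.0]])
M21, M22 = np.array([[1.0]]), np.array([[-2.0]])   # M22 Hurwitz
eps, tf = 1e-3, 5.0
x0, z0 = np.array([1.0]), np.array([0.0])

M0 = M11 - M12 @ np.linalg.solve(M22, M21)         # slow model matrix (2.77)
zf0 = z0 + np.linalg.solve(M22, M21 @ x0)          # fast initial state (2.78)

ts = np.linspace(0.0, tf, 200)
xs = solve_ivp(lambda t, x: M0 @ x, (0, tf), x0, t_eval=ts, rtol=1e-8).y
# Fast model in the stretched time tau = (t - t0) / eps, with t0 = 0.
zf = solve_ivp(lambda tau, z: M22 @ z, (0, tf / eps), zf0,
               t_eval=ts / eps, rtol=1e-8).y

x_approx = xs                                      # x(t) = xs(t) + O(eps)
z_approx = -np.linalg.solve(M22, M21 @ xs) + zf    # composition in (2.79)
print(x_approx[:, -1], z_approx[:, -1])
```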
However, note that the theorem only states that there exist certain constants (each
O-expression has two inherent constants), as opposed to giving explicit expressions
for these. That is, even though the constants can be computed, doing so requires a fair amount of calculation, which underlines the qualitative nature of the result. Similarly, the perturbation results to be presented in later chapters of this
thesis are also formulated qualitatively, even though the constructive nature of the
proofs allows error estimates to be computed.
2.5.2 Generalizations of scalar singular perturbation
As an indication of how our results in this thesis may be extended in the future, we
devote some space here to listing a few directions in which theorem 2.43 has been
extended. The extensions in this section are still concerned with the case of just one
small perturbation parameter, and are all found in Kokotović et al. (1986).
The first extension is that the O( ε ) expressions in (2.79) can be refined so that the first order dependency on ε is explicit. Neglecting the higher order terms in ε, this makes it possible to approximate the thresholds which are needed to keep track of
the global error when integrating the equations in separate timescales. However, it is still not clear when ε is sufficiently small for the O( ε² ) terms to be neglected.
The other extension we would like to mention is that of theorem 2.43 to time-varying
linear systems. That such results exist may not be surprising, but it should be noted
that time-varying systems have an additional source of timescale separation compared to time-invariant systems. This must be taken care of in the analysis, and is a
potential difficulty if these ideas are used to analyze a general nonlinear system by
linearizing the equations along a solution trajectory (because of the interactions between timescale separation in the solution itself and in the linearized equations that
determine it). The decoupling transform for time-varying systems appears in Chang
(1969, 1972). For the problem of finding a bound on the perturbation parameter such
that the asymptotic stability of the coupled system is ensured, Abed (1985b) contains
a relatively recent result.
In Khalil (1984), singularly perturbed systems are also derived from ode with singular trailing matrices. The so-called semisimple null structure assumption they make enables the formulation of a corresponding scalar singular perturbation problem.
Another related problem class is obtained if stochastic forcing functions are added
to the singularly perturbed systems (see, for instance, Ladde and Sirisaengtaksin
(1989)). The properties of the resulting stochastic solution processes may then be
studied using decoupling techniques similar to those used for deterministic systems.
Looking at statistical properties such as mean and variance will give quite different
results compared to the L∞ function measure often used for deterministic systems,
and this relates to yet another related deterministic problem class, obtained by replacing the L∞ function measure by, for instance, the L2 function measure. For many
applications such norms may be more relevant than the maximum-norm-over-time
measure of L∞. While the L∞ measure remains an important general-purpose measure that should be supported by general-purpose numerical integration software,
it appears that both developers and users of numerical integration software would
benefit from also allowing other error estimates for the computed solution. We note
that both stochastic formulation and alternative function measures constitute relevant generalizations of the singular perturbation results derived in the thesis.
2.5.3 Multiparameter singular perturbation
The multiparameter singular perturbation problem is closely related to the subject of
the present work in that it considers several small parameters at the same time. The
multiparameter singular perturbation form arises when small parasitics are included
in a physical model. For instance, a parasitic parameter may be the capacitance of
a wire in an electric model where such capacitances are expected to be negligible.
Since the parasitics have physical origin, they are known to be greater than zero, and
this requirement is part of the multiparameter singular perturbation form.
In its linear formulation, the autonomous multiparameter singular perturbation form may be written
\[
\begin{aligned}
x' &= A_{xx}\, x + A_{xz}\, z \\
\begin{bmatrix} \varepsilon_1 I & & \\ & \ddots & \\ & & \varepsilon_N I \end{bmatrix} z' &= A_{zx}\, x + A_{zz}\, z
\end{aligned}
\tag{2.80}
\]
Here, all the $\varepsilon_i > 0$ are the small singular perturbation parameters, and the goal is to understand how the system behaves as $\max_i \varepsilon_i$ tends to zero. By introducing the parameter $\mu = \max_i \varepsilon_i$, the system may be written in a form which is closer to the scalar singular perturbation form,
\[
\begin{aligned}
x' &= A_{xx}\, x + A_{xz}\, z \\
\mu\, z' &= \underbrace{ \begin{bmatrix} \frac{\mu}{\varepsilon_1} I & & \\ & \ddots & \\ & & \frac{\mu}{\varepsilon_N} I \end{bmatrix} }_{D} \left( A_{zx}\, x + A_{zz}\, z \right)
\end{aligned}
\tag{2.81}
\]
In the early work on multiparameter singular perturbation in Khalil and Kokotović (1979), all singular perturbation parameters are assumed to be of the same order of magnitude, corresponding to assuming a bound on the ratios $\mu / \varepsilon_i$ in (2.81) (another common choice of $\mu$ is the geometric mean of all $\varepsilon_i$, and then the ratios $\mu / \varepsilon_i$ need to be bounded both from above and below to imply the equal order of magnitude condition). The condition about equal orders of magnitude was originally formulated as
\[ m \le \frac{\varepsilon_i}{\varepsilon_j} \le M \quad \text{for all } i, j \tag{2.82} \]
and this remains the most popular way to state the assumption. The condition should be seen in contrast to the case when the singular perturbation parameters are assumed to be of different orders of magnitude in the sense that
\[ \lim_{\varepsilon_i \to 0} \frac{\varepsilon_{i+1}}{\varepsilon_i} = 0 \quad \text{for all } i = 1, 2, \ldots, N - 1 \tag{2.83} \]
Such problems can be analyzed by applying a sequence of scalar singular perturbation results, and are said to have multiple time scales (where multiple refers to the
number of fast time scales).
The main assumption used in Khalil and Kokotović (1979) is the so-called D-stability, which means that a system remains stable (with some positive, fixed, margin between poles and the imaginary axis) if the state feedback matrix is left-multiplied by any D of (2.81). They remark that the condition (2.82) is not realistic
in many applications. In Khalil (1981), the results are extended to the nonlinear
setting using Lyapunov methods, and further refinement was made in Khorasani
and Pai (1985).
Later, the condition (2.82) was removed for lti systems in Abed (1985a), and the so-called strong D-stability condition was then introduced in Abed (1986) to simplify
the analysis. However, when ltv systems are treated in Abed (1986), the condition (2.82) is still used.
In Khalil (1987), the condition (2.82) used in Khalil and Kokotović (1979) has been
removed. Further, both slow and fast subsystems are allowed to be nonlinear. Instead of (2.82), assumptions are used to ensure the existence of Lyapunov functions
with certain additional constraints. This technique was used for scalar singular perturbation in Saberi and Khalil (1984), where references to other authors’ early work
on singular perturbations based on Lyapunov functions can be found.
In Coumarbatch and Gajic (2000), the algebraic Riccati equation is analyzed for a
multiparameter singularly perturbed system. The system under consideration has
two small singular perturbation parameters of the same order of magnitude. Understanding the properties of the solutions of Riccati equations for singularly perturbed systems seems a promising tool for future developments in singular perturbation theory,
and the replacement of the equal order of magnitude assumption by something more
realistic from an application point of view would be a valuable development by itself.
Within the class of multiparameter singularly perturbed problems, the class of multiple time scale singularly perturbed problems allows the small singular perturbation
parameters to belong to different groups depending on order of magnitude. Within
a group of singular perturbation parameters, all parameters satisfy a condition of
the kind (2.82), while there is an ordering among the groups such that dividing a parameter from one group by a parameter from a succeeding group yields a ratio which
tends to zero as the latter parameter tends to zero. This problem was studied for
two fast time scales using partial decoupling in Ladde and S̆iljak (1989) and later
with full decoupling in Ladde and Rajalakshmi (1985). The generalization to more
than two fast time scales was later presented with partial decoupling in Ladde and
Rajalakshmi (1988), and with full decoupling and a flowchart for the decoupling
procedure in Kathirkamanayagan and Ladde (1988). In Abed and Tits (1986), the
strong D-stability property is extended to the context of multiple time scales. They
also highlight an example showing that for asymptotic stability in the multiple time
scale setting (2.83), asymptotic stability given (2.82) is a sufficient condition if and only if N = 2 and dim z = 2.
Comparing the multiparameter singular perturbation theory with the matrix-valued
singular perturbation results in the second part of the thesis, there are a few things
to mention here. Assumptions are often used to restrict the singular perturbation
parameters within the basic multiparameter singular perturbation form, but authors
agree that some of these assumptions are not realistic in view of typical applications
for the theory. Hence, there is a constant drive to do away with such assumptions,
replacing them by conditions that can be verified by inspection of properties of the
unperturbed system. For lti systems, the (strong) D-stability definition is such a
condition. We remark that the requirement that all singular perturbation parameters
be positive adds important structure to the problem, and this is essential for the D-stability concept.
When we consider matrix-valued singular perturbations, the lack of structure in the
perturbation makes conditions such as D-stability much harder to come up with.
The kind of properties which can be meaningfully verified are simple things such as
norm bounds on matrices in the unperturbed system. Everything else will have to be
assumed, and finding assumptions which are reasonable in view of applications will
be key to a successful theory. Unlike multiparameter singular perturbation, however,
matrix-valued singular perturbation cannot be handled without assumptions, and
this is in line with the non-physical origin of the matrix-valued singular perturbation
problems that we know of. That is, it is primarily for problems of physical origin
that imposing assumptions can be inappropriate — for perturbation problems that
are due to modeling and software artifacts, it is less surprising that assumptions may
be necessary to mitigate the effects of those artifacts.
2.5.4 Perturbation of DAE
The issue with perturbations in dae has been considered previously in Mattheij and
Wijckmans (1998). While they consider perturbations of the trailing matrix and not
in the leading matrix, we share many of their observations regarding the possibility
of ill-posed problems. It is due to this similarity, and to the fact that the dae perturbations we study turn out to be of singular perturbation type, that the current section resides
under section 2.5.
The above-mentioned work on perturbations in the trailing matrix is referred to in
Kunkel and Mehrmann (2006, remark 6.7), as the latter authors remark that a perturbation analysis is still lacking in their influential framework for numerical solution
of dae based on the strangeness index. Although this thesis deals with perturbations related more to index reduction by shuffling than to the methods of Kunkel and
Mehrmann (2006), it is hoped that our work will inspire the development of similar
perturbation analyses in other contexts as well.
When the study of matrix-valued singular perturbation was motivated in section 1.2.4, one of the applications was to handle a change of rank in the leading
matrix of a time-varying system. For a recent alternative approach to singularities
in time-varying systems, see März and Riaza (2007), where the assumed existence of
smooth projector functions is used to mitigate isolated rank drops.
An interesting topic in perturbation of dae is the study of how sensitive the eigenvalues are to perturbations. The eigenvalue problems generalize naturally from matrix
pencils (or pairs) to matrix polynomials, with immediate application to higher order
dae. Some recent results on the conditioning of the eigenvalue problem appear in
Higham et al. (2006). Although eigenvalues are very central to our treatment of perturbations in dae, the assumptions we use allow the “fast and uncertain” eigenvalues
to be very sensitive to perturbation. This behavior is very different from the setting
where eigenvalue perturbation results can be used to bound the sensitivity, and unfortunately the difference has hindered us from seeing applications of the eigenvalue
perturbation theory in our work.
2.6 Contraction mappings
Contraction mappings can provide elegant proofs of existence and uniqueness of the
solution to an equation. In this section, we state the fundamental theorem, which can
be found in standard text books on real analysis. The theorem is illustrated with one
example and one lemma which will be useful in later chapters.
2.44 Theorem (Contraction principle). Let $X$ be a complete metric space, with metric $d$. Let $T$ be a mapping from $X$ into itself such that $d( T(x_2), T(x_1) ) \le c\, d( x_2, x_1 )$ for all $x_1, x_2 \in X$ and some constant $c < 1$. Then there exists a unique $x \in X$ such that $T(x) = x$.
Proof: See Rudin (1976, theorem 9.23).
2.45 Example
Consider the equation
\[ x^2 = 4 + \varepsilon, \qquad x > 0 \tag{2.84} \]
for small values of the parameter, $| \varepsilon | \le m$. Although we know that the solution is given by $x = \sqrt{4 + \varepsilon}$, we shall estimate the quality of the first order approximation to the solution.
As the first order approximation is given by $x_0(\varepsilon) = 2 + \tfrac{1}{4} \varepsilon$, we set
\[ x(\varepsilon) = x_0(\varepsilon) + m^2\, y(\varepsilon) \]
where $y(\varepsilon)$ shall be bounded independently of $\varepsilon$ (for sufficiently small $m$) using a contraction mapping argument.
Inserting the ansatz in (2.84) yields (dropping the argument of $y$)
\[ 4 + \varepsilon + \tfrac{1}{16} \varepsilon^2 + 2 \left( 2 + \tfrac{1}{4} \varepsilon \right) m^2 y + m^4 y^2 = 4 + \varepsilon \]
which is rewritten with $y$ alone on one side of the equation (but $y$ may still appear on the other side of the equation as well)
\[ y = - \frac{ \tfrac{1}{16} \varepsilon^2 }{ 2 \left( 2 + \tfrac{1}{4} \varepsilon \right) m^2 + m^4 y } \tag{2.85} \]
Now assume that $| y | \le \rho$, where $\rho$ is to be selected soon, and define the mapping
\[ T y \triangleq - \frac{ \tfrac{1}{16} \varepsilon^2 }{ 2 \left( 2 + \tfrac{1}{4} \varepsilon \right) m^2 + m^4 y } \]
so that a fixed point of $T$ is a solution to (2.85).
In view of
\[ | T y | \le \frac{1}{16}\, \frac{ \varepsilon^2 }{ m^2 \left| 4 + \tfrac{1}{2} \varepsilon + m^2 y \right| } \]
we take $\rho = 1/16$; the conditions $\tfrac{1}{2} m < 2$ and $\tfrac{1}{16} m^2 < 1$ then keep the denominator greater than 1, and hence $| T y | \le \tfrac{1}{16} \varepsilon^2 / m^2 \le \rho$ since $| \varepsilon | \le m$. Solving the requirements for $m$ reveals that $m$ shall be chosen less than the smallest of $\sqrt{16}$ and 4 (both equal to 4).

Figure 2.4: Approximating the square root near 4 using a contraction mapping argument. Using $\varepsilon$ to denote the deviation from 4, and introducing the first order approximation $x_0(\varepsilon) = 2 + \tfrac{1}{4} \varepsilon$, the square root is expressed in the variable $y$ through the equation $( x_0(\varepsilon) + y )^2 = 4 + \varepsilon$. The computed upper and lower bounds on $y$ are added to $x_0$ and shown in the figure, and are known to be valid for all $| \varepsilon | \le 2.9$.
To prove that $T$ is a contraction, note that it is continuously differentiable with positive and decreasing derivative (since $T y$ tends to zero from below as $y$ grows). Hence, the modulus of the derivative at the low end of the domain is a valid Lipschitz constant in all of the domain. In view of this, the domain shall be selected such that the derivative at $-\rho$ is less than 1. This yields the condition
\[ \frac{ \tfrac{1}{16} \varepsilon^2\, m^4 }{ \left( 2 \left( 2 + \tfrac{1}{4} \varepsilon \right) m^2 - m^4 \rho \right)^{2} } < 1 \]
or
\[ \frac{ m^2 }{ \left( 4 + \tfrac{1}{2} \varepsilon - m^2 \rho \right)^{2} } < 16 \]
which is implied by $m \le 2.9$ (the optimal bound is somewhere between 2.9 and 3.0).
Using theorem 2.44 it may be concluded that for $| \varepsilon | \le 2.9$,
\[ | x(\varepsilon) - x_0(\varepsilon) | \le \tfrac{1}{16}\, | \varepsilon |^2 \]
illustrated in figure 2.4.
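The fixed point is also easy to compute numerically. The following sketch (an illustration under the notation of the example, not code from the thesis) iterates T and confirms both that the fixed point solves (2.84) and that the stated bound holds:

```python
import math

def check(eps, m=2.9, iters=50):
    # Iterate T y = -(eps^2/16) / (2 (2 + eps/4) m^2 + m^4 y) from y = 0.
    y = 0.0
    for _ in range(iters):
        y = -(eps**2 / 16.0) / (2.0 * (2.0 + eps / 4.0) * m**2 + m**4 * y)
    x = 2.0 + eps / 4.0 + m**2 * y                 # x0(eps) + m^2 y
    fp_err = abs(x - math.sqrt(4.0 + eps))         # fixed point solves (2.84)
    apx_err = abs(2.0 + eps / 4.0 - math.sqrt(4.0 + eps))
    print(f"eps = {eps:+.2f}: |x - sqrt(4+eps)| = {fp_err:.1e}, "
          f"|x0 - sqrt(4+eps)| = {apx_err:.2e} <= eps^2/16 = {eps**2/16:.2e}")

for eps in (-2.9, -1.0, 0.5, 2.9):
    check(eps)
```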
The example deserves a few remarks. First, note that $\rho$ could have been selected arbitrarily close to $1/64$ at the price of obtaining very small bounds on $m$. Also note
that other ways of isolating y would lead to other operators, and possibly to improved
bounds. Another feature of the example is that the operator could be defined without
reference to the square root function, which is the function we set out to approximate
in the first place.
The method can be contrasted with series expansion techniques. Using the contraction mapping principle we first guess the approximation $x_0(\varepsilon)$, and then prove bounds on the remainder term. In contrast, a Taylor expansion requires the existence of derivatives of the function being approximated. Here,
\[ x''(\varepsilon) = - \frac{1}{4\, ( 4 + \varepsilon )^{3/2}} \]
and bounding $| x''(\varepsilon) |$ for $| \varepsilon | \le 2$ yields
\[ | x(\varepsilon) - x_0(\varepsilon) | \le \underbrace{ \frac{1}{2} \cdot \frac{1}{8 \sqrt{2}} }_{ = \frac{1}{16 \sqrt{2}} }\, | \varepsilon |^2 \]
Note that this bound is stronger than that obtained in the example.
Having tried the technique on the scalar example, we now turn to matrix equations,
and the result is general enough to be stated as a lemma.
2.46 Lemma. Let the non-singular matrix $X$ have its inverse bounded as $\| X^{-1} \|_2 \le c$. Then
\[ \|F\|_2 \le \frac{\rho}{\rho + c}\, \frac{1}{c} \quad \Longrightarrow \quad \left\| ( X + F )^{-1} - X^{-1} \right\|_2 \le \rho \tag{2.86} \]
Proof: Assume $\|F\|_2 \le \frac{\rho}{\rho + c} \frac{1}{c}$, define the operator
\[ T G \triangleq - \left( X^{-1} F X^{-1} + G F X^{-1} \right) \]
and consider the set $G = \{\, G : \|G\|_2 \le \rho \,\}$. Then
\[ \| T G \|_2 \le \|F\|_2\, ( c + \rho )\, c \le \rho \]
so $T G \subset G$. Since
\[ \| T G_2 - T G_1 \|_2 = \| G_2 F X^{-1} - G_1 F X^{-1} \|_2 \le \|F\|_2\, c\, \| G_2 - G_1 \|_2 \le \frac{\rho}{\rho + c}\, \| G_2 - G_1 \|_2 < \| G_2 - G_1 \|_2 \]
$T$ is a contraction on $G$.
By theorem 2.44 there is a unique solution $G \in G$. Hence, $G$ satisfies
\[ G = - \left( X^{-1} F X^{-1} + G F X^{-1} \right) \]
and multiplying by $X$ from the right reveals
\[ X^{-1} F + G X + G F = 0 \]
Adding $I$ to both sides allows us to write
\[ \left( X^{-1} + G \right) ( X + F ) = I \]
where it is seen that $X + F$ indeed is invertible, and
\[ G = ( X + F )^{-1} - X^{-1} \]
shows that $( X + F )^{-1} - X^{-1} \in G$.
2.47 Corollary. Let $X$ be a non-singular matrix with $\| X^{-1} \|_2 \le c$ and let $\rho > 0$ be given. Then there exists a constant $m > 0$ such that
\[ \|F\|_2 \le m \quad \Longrightarrow \quad \| ( X + F )^{-1} \|_2 \le c + \rho \tag{2.87} \]
Proof: Take $m = \frac{\rho}{\rho + c} \frac{1}{c}$ and use $\| ( X + F )^{-1} \|_2 = \| ( X + F )^{-1} - X^{-1} + X^{-1} \|_2 \le \| ( X + F )^{-1} - X^{-1} \|_2 + \| X^{-1} \|_2$.
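The bound of lemma 2.46 is easily probed numerically. A minimal sketch (an illustration with randomly drawn example matrices) samples perturbations at the admissible norm limit in (2.86) and checks that the difference of inverses stays below ρ:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # comfortably non-singular
c = np.linalg.norm(np.linalg.inv(X), 2)
rho = 0.5
fmax = rho / (rho + c) / c                  # admissible ||F||_2 per (2.86)

worst = 0.0
for _ in range(1000):
    F = rng.standard_normal((4, 4))
    F *= fmax / np.linalg.norm(F, 2)        # scale to the norm limit
    diff = np.linalg.norm(np.linalg.inv(X + F) - np.linalg.inv(X), 2)
    worst = max(worst, diff)
print(f"rho = {rho}, worst observed difference = {worst:.4f}")
```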
In chapter 8, when contraction mapping arguments are applied to time-varying systems, the fixed-point equations will be integral equations. We end this section with
an example that will come in handy when reading those arguments.
2.48 Example
Consider equations in time-varying matrices over the time interval $[ 0, t_f )$. Let $\phi$ be the transition matrix of the system
\[ x'(t) = M(t)\, x(t) \]
so that
\[ \phi(t, t) = I, \qquad \phi(\bullet, \tau)'(t) = M(t)\, \phi(t, \tau), \qquad \phi(\tau, \bullet)'(t) = -\phi(\tau, t)\, M(t) \]
Define the operators $S$ and $T$ according to
\[ (S R)(t) \triangleq \frac{1}{a} \int_0^t \phi(t, \tau)\, P(\tau)\, d\tau \qquad\qquad (T R)(t) \triangleq \frac{1}{a} \int_t^{t_f} P(\tau)\, \phi(\tau, t)\, d\tau \]
where a is a constant and P is a matrix-valued function of time.
The fixed-point equation
\[ S R = R \]
then implies (swapping the sides, multiplying by $a$, and differentiating)
\[
a\, R'(t) = P(t) + \int_0^t M(t)\, \phi(t, \tau)\, P(\tau)\, d\tau
= P(t) + M(t) \int_0^t \phi(t, \tau)\, P(\tau)\, d\tau
= P(t) + a\, M(t)\, (S R)(t) = P(t) + a\, M(t)\, R(t)
\tag{2.88}
\]
Similarly, the fixed-point equation
\[ T R = R \]
implies
\[
a\, R'(t) = -P(t) - \int_t^{t_f} P(\tau)\, \phi(\tau, t)\, M(t)\, d\tau
= -P(t) - \left( \int_t^{t_f} P(\tau)\, \phi(\tau, t)\, d\tau \right) M(t)
= -P(t) - a\, (T R)(t)\, M(t) = -P(t) - a\, R(t)\, M(t)
\tag{2.89}
\]
Hence, by identifying the forms of (2.88) or (2.89) with some equation in time-varying matrices, we will be able to formulate corresponding fixed-point integral
equations.
2.7 Interval analysis
While perturbation analysis is the theoretical tool used to prove convergence results
in this thesis, it relies on an uncertainty model that is too coarse to be successful in applications. In the much more fine-grained uncertainty model used in interval analysis, one
uncertainty interval is used for each scalar quantity. A survey and quick introduction
is given in Kearfott (1996). Another quick introduction including many algorithms
is given in Jaulin et al. (2002), written jointly with two of the authors of the popular
book Jaulin et al. (2001).
Though superior to “implemented perturbation analysis”, interval analysis is often blamed for producing error bounds so pessimistic that they are useless. Unfortunately, pessimistic error bounds (which are not always useless) are the price one has to pay to be certain that the result of the uncertain computation really includes every possible outcome of the uncertain problem. Methods to improve performance include use of preconditioning matrices, coordinate transformations, and uncertainty models which capture some of the correlation between uncertain quantities. For instance, even if the uncertainty in the quantity x is large, as long as x is
known to be non-zero it holds that
\[ \frac{x}{x} = 1, \qquad x - x = 0, \qquad \frac{x + x}{x} = 2, \qquad 2\, x - x = x \]
but without support from symbolic math computations these relations may be difficult to maintain. This problem is addressed in Neumaier (1987), but in this thesis we shall only use the following trivial observation.
shall only use the following trivial observation.
Notation. In the language of interval analysis, scalars, vectors, and matrices, represented by box constraints on each entry, are denoted intervals, interval vectors, and
interval matrices, respectively. In contrast — when the difference needs to be emphasized — the denotation of the exactly known corresponding objects are real number,
point vector, and point matrix. The notion of point objects is natural in view of the
uncertain objects being technically defined as sets of point objects. Since this thesis
is not in the field of interval analysis (we merely use the technique for illustration),
we tend to use the terms uncertain and exact instead.
A function containing uncertainty is also thought of as a set of exact functions. This enables us to define the solution to the uncertain equation in the variable $x \in \mathbb{R}^n$
\[ f(x) = 0 \]
as
\[ \left\{\, x \in \mathbb{R}^n : \exists\, \hat{f} \in f : \hat{f}( x ) = 0 \,\right\} \]
Notation. For a general set we speak of inner and outer approximations, referring to
its subsets and supersets. In the context of equations, we simply write inner/outer
solution when referring to inner/outer approximations of the solution.
An uncertain matrix is called regular if it only admits non-singular point matrices.
Otherwise, it is called singular. The next definition is non-standard but will be convenient in the thesis.
2.49 Definition (Pointwise non-singular uncertain matrix). An uncertain matrix
is said to have the additional property of being pointwise non-singular if it admits
only non-singular point matrices.
Clearly every regular uncertain matrix has the property of being pointwise non-singular, but for singular uncertain matrices this is the property which allows the
inverse to be formed formally, even though the inverse is a matrix with at least one
unbounded interval of uncertainty. For the inverse to be useful, additional assumptions will have to be added, for instance, a bound on the norm of the inverse.
It is important to note that an interval matrix with additional constraints, such as
pointwise non-singularity or boundedness of inverse, is generally not possible to represent as an interval matrix, and it becomes more appropriate to use the more general
notion of uncertain matrix.
If $X$ is known to be a regular matrix, so that $X^{-1} X = I$, the following column reduction is possible
\[
\begin{bmatrix} X & Y \end{bmatrix}
\begin{bmatrix} X^{-1} & -X^{-1} Y \\ 0 & I \end{bmatrix}
= \begin{bmatrix} I & 0 \end{bmatrix}
\]
This also makes use of the trivial identity $-X X^{-1} Y + Y = -I\, Y + Y = 0$. Thinking of the column reduction as a coordinate transformation, the uncertainty in the matrix $[\, X \;\; Y \,]$ was transferred to uncertainty in the coordinate transform. Of course, row reduction can be carried out analogously.
While approximate intervals for $X^{-1}$ are easily obtained by a first order expansion of $X^{-1}$ around some point matrix $X_0 \in X$, good bounds that are guaranteed to contain $X^{-1}$ are more demanding. We shall not go into details, since we will rely on Mathematica to deliver accurate results, and interested readers are referred to Neumaier (1990, theorem 4.1.11) for a theoretical result.
The following theorem is characteristic for interval arithmetic. It is a simple form of
constraint propagation, and proves itself.
2.50 Theorem. Consider the fixed-point equation
\[ x = f(x) \]
where uncertainty in $f$ implies uncertainty in the solution, and where the solution set $x$ is known to be a subset of $x_1$. If conservative evaluation of $f( x_1 )$ results in $x_2$, that is,
\[ f( x_1 ) \subset x_2 \]
then
\[ x \subset x_1 \cap x_2 \]
The theorem immediately gives a technique for iterative refinement of outer solutions
to a fixed-point equation. Hence, even very conservative outer solutions may be very
valuable as they can serve as a starting point for the iterative refinement.
2.51 Example
Consider the uncertain polynomial (the brackets denote intervals here, not matrices)
\[ f( x ) \triangleq [\, 0.9, 1.1 \,]\, x^2 + [\, 9.5, 10.5 \,]\, x + [\, -8.5, -7.5 \,] \]
which has a solution near 0.75. The plot of $f$ in figure 2.5 allows the solution to be read off as the interval where $0 \in f( x )$, and it is easy to find that the true solution set is approximately $[\, 0.6676, 0.8295 \,]$.
Let $x_0 = [\, 0.0, 2.0 \,]$ be a given outer solution, and rewrite the equation $f( x ) = 0$ as the fixed-point equation $x = T( x )$, where
\[ T( x ) \triangleq - \frac{ [\, 0.9, 1.1 \,]\, x^2 + [\, -8.5, -7.5 \,] }{ [\, 9.5, 10.5 \,] } \]
Figure 2.5: The uncertain function f in example 2.51. Since no uncertain quantities appear more than once in the expression for f, it is straightforward to compute the set f( x_1 ) at any point x_1.
The iterative refinement produces the following sequence of outer solutions.

Iterate    Outer solution
x0         [ 0.00, 2.00 ]
x1         [ 0.55, 0.89 ]
x2         [ 0.63, 0.87 ]
x3         [ 0.64, 0.86 ]
Further iteration gives little improvement; x6 = [ 0.6375, 0.8562 ], sharing its four
most significant digits with x200 . Hence, the iterative refinement produced significant improvement of the initial outer solution in just a few iterations, but was unable
to converge to the true solution set.
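The refinement loop is straightforward to reproduce. The following sketch is an illustration only, not the computation behind the table; it uses naive interval arithmetic without outward rounding, so intermediate iterates need not match the table digit for digit, but the same rapid initial contraction and the same limit appear:

```python
def ivmul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi)."""
    c = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(c), max(c))

def T(x):
    # T(x) = -([0.9, 1.1] x^2 + [-8.5, -7.5]) / [9.5, 10.5]
    x2 = ivmul(x, x) if x[0] >= 0 else (0.0, max(x[0] ** 2, x[1] ** 2))
    num = ivmul((0.9, 1.1), x2)
    num = (num[0] - 8.5, num[1] - 7.5)      # add the interval [-8.5, -7.5]
    neg = (-num[1], -num[0])                # negate
    return ivmul(neg, (1 / 10.5, 1 / 9.5))  # divide by [9.5, 10.5] > 0

x = (0.0, 2.0)
for k in range(1, 7):
    y = T(x)
    x = (max(x[0], y[0]), min(x[1], y[1]))  # intersect as in theorem 2.50
    print(f"x{k} = [{x[0]:.4f}, {x[1]:.4f}]")
```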
2.8 Gaussian elimination
Although it is assumed that the reader is familiar with Gaussian elimination, in this section some aspects of particular interest for the proposed algorithm in chapter 3 will
be discussed.
The shuffling algorithm in chapter 3 makes use of row reduction. The most well
known row reduction method is perhaps Gaussian elimination, and although infamous for its numerical properties, it is sufficiently simple to be a realistic choice for
implementations. In fact, the proposed algorithm makes this particular choice, and
among the many variations of Gaussian elimination, a fraction-free scheme is used.
This technique for taking a matrix to row echelon form• uses only addition and multiplication operations. In contrast, a fraction-producing scheme involves also division.
• A matrix is said to be in row echelon form if each non-zero row has more leading zeros than the previous row. Actually, in order to account for the outcome when full pivoting is used, one should really say that the matrix is in row echelon form after suitable reordering of variables. In the current setting of elimination where it makes sense to speak of structural zeros, the reference to reordering of variables can be avoided by saying that the reduced matrix is such that each non-zero row has more structural zeros than the previous row.
The difference is explained by example. Consider performing row reduction on a
matrix of integers of the same order of magnitude:
\[ \begin{bmatrix} 5 & 7 \\ 3 & -4 \end{bmatrix} \]
A fraction-free scheme will produce a new matrix of integers,
\[ \begin{bmatrix} 5 & 7 \\ 5 \cdot 3 - 3 \cdot 5 & 5 \cdot (-4) - 3 \cdot 7 \end{bmatrix} = \begin{bmatrix} 5 & 7 \\ 0 & -41 \end{bmatrix} \]
while a fraction-producing scheme generally will produce a matrix of rational numbers,
\[ \begin{bmatrix} 5 & 7 \\ 3 - (3/5) \cdot 5 & (-4) - (3/5) \cdot 7 \end{bmatrix} = \begin{bmatrix} 5 & 7 \\ 0 & -(41/5) \end{bmatrix} \]
The fraction-free scheme thus has the advantage that it is able to preserve the integer
structure present in the original matrix. On the other hand, if the original matrix is a
matrix of rational numbers, both schemes generally produce a new matrix of rational
numbers, so there is no advantage in using the fraction-free scheme. Note that it is
necessary not to allow the introduction of new integer entries in order to keep the
distinction clear, since any matrix of rational numbers can otherwise be converted
to a matrix of integers. Further, introducing non-integer scalars would destroy the
integer structure. The two schemes should also be compared by the numbers they
produce. The number −41 in comparison with the original numbers is a sign of the
typical blowup of entries caused by the fraction-free scheme. The number −(41/5) =
−8.2 does not indicate the same tendency.
When the matrix is interpreted as the coefficients of a linear system of equations to
be solved in the floating point domain, the blowup of entries implies bad numeric
condition, which in turn has negative implications for the quality of the computed
solution. Unfortunately, this is not the only drawback of the fraction-free scheme,
since the operations involved in the row reduction are ill-conditioned themselves.
This means that there may be poor correspondence between the original equations
and the row reduced equations, even before attempting to solve them.
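For reference, the fraction-free idea generalizes to Bareiss elimination, where an exact division by the previous pivot keeps the blowup of entries polynomial rather than exponential. A minimal integer-matrix sketch (an illustration only, without the pivoting a robust implementation would need; this is not the thesis's implementation):

```python
def bareiss(rows):
    """Fraction-free (Bareiss) elimination of an integer matrix.
    The division by the previous pivot is exact, which keeps the growth
    of entries polynomial instead of exponential."""
    a = [row[:] for row in rows]
    n, m, prev = len(a), len(a[0]), 1
    for k in range(min(n, m) - 1):
        if a[k][k] == 0:
            continue   # a robust implementation would pivot here
        for i in range(k + 1, n):
            for j in range(k + 1, m):
                a[i][j] = (a[k][k] * a[i][j] - a[i][k] * a[k][j]) // prev
            a[i][k] = 0
        prev = a[k][k]
    return a

print(bareiss([[5, 7], [3, -4]]))   # [[5, 7], [0, -41]]
```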
Fraction-free Gaussian elimination can also be applied to a matrix of polynomials,
and will then preserve the polynomial structure. Note also that the structure is not
destroyed by allowing the introduction of new scalars. This can be used locally to
drastically improve the numerical properties of the reduction scheme by making them approximately the same as those of the fraction-producing scheme. That is, multiplication by scalars is used to locally make the pivot polynomial approximately equal
to 1, and then fraction-free operations are used to eliminate below the pivot as usual.
Finally, recall that Gaussian elimination also takes different flavors in the pivoting
dimension. However, this dimension is not explored when proposing the algorithm
in chapter 3.
2.9 Miscellaneous results
The chapter ends with a collection of results that are included here only to be referenced from subsequent chapters. Please refer to cited references for discussion of
these results.
The following theorem gives an upper bound on the roots of a polynomial, purely
in terms of the moduli of the coefficients. Of course, there are many ways to obtain
tighter estimates by using more knowledge about the coefficients, but we do not have
that kind of information in this thesis, and the bounds we get from this theorem are
tight enough for our purposes.
2.52 Theorem. The moduli of the roots of the polynomial
\[ f(z) = \sum_{i=0}^{n} a_i z^i \]
are bounded by
\[ 2 \max\left( \left\{ \left| \frac{a_{n-i}}{a_n} \right|^{1/i} \right\}_{i=1}^{n-1} \cup\, \left\{ \left| \frac{a_0}{2\, a_n} \right|^{1/n} \right\} \right) \]
Proof: The result is included in the form of an exercise in the thorough Marden
(1966, exercise 30.5).
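As a quick sanity check of the bound (an added illustration with arbitrary example coefficients):

```python
import numpy as np

def root_bound(a):
    """Bound of theorem 2.52 for sum_i a[i] z^i; requires a[-1] != 0."""
    n = len(a) - 1
    terms = [abs(a[n - i] / a[n]) ** (1.0 / i) for i in range(1, n)]
    terms.append(abs(a[0] / (2.0 * a[n])) ** (1.0 / n))
    return 2.0 * max(terms)

a = [-8.0, 10.0, 1.0]               # f(z) = z^2 + 10 z - 8
roots = np.roots(a[::-1])           # numpy expects highest degree first
print(max(abs(roots)), "<=", root_bound(a))
```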
Motivated by theorem 2.27, the bounding of matrices is very important in our work,
but it turns out that we actually know more about the inverse of the matrix than
about the matrix itself. Since bounding the norm of the inverse of a matrix is related
to bounding the condition number, several useful results are presented with condition number bounds in mind. The survey Higham (1987) lists many such results.
One of them which is useful to us and applies to upper triangular matrices is given
as the next theorem.
2.53 Theorem. For the upper triangular matrix $U$, it holds that
\[ \| U \|_2 \le \frac{ \sqrt{ ( a + 1 )^{2 n} + 2 n ( a + 2 ) - 1 } }{ ( a + 2 )\, b } \tag{2.90} \]
where, with $J = U^{-1}$,
\[ a = \max_{i<j} \frac{ | J_{ij} | }{ | J_{ii} | } = \max_{i<j} | \lambda_i\, J_{ij} | \le \| U^{-1} \|\, \lambda_{\max}( U ), \qquad b = \min_i | J_{ii} | = \min_i \frac{1}{ | \lambda_i | } = \frac{1}{ \lambda_{\max}( U ) } \]
Proof: This is an immediate consequence of results in Lemeire (1975) and builds on
the theory of M-matrices.
3 Shuffling quasilinear DAE
Methods for index reduction of general nonlinear differential-algebraic equations are
generally difficult to implement due to the recurring use of functions defined only
via the implicit function theorem. The problem is avoided in chapter 5, but instead
of implementing the implicit functions, additional variables are introduced, and an
under-determined system of equations needs to be solved each time the derivative (or
the next iterate of a discretized solution) is computed. As an alternative, additional
structure may be added to the equations in order to make the implicit functions
possible to implement, thereby avoiding additional variables and under-determined
systems of equations. In particular, this is possible for the quasilinear and linear
time-invariant (lti) structures, and it turns out that there exists an algorithm for
the quasilinear form that is a generalization of the shuffle algorithm for the lti form
in the sense that, when applied to the lti form, it reduces to the shuffle algorithm.
For this reason, the more general algorithm is referred to as a quasilinear shuffle
algorithm.
This chapter is devoted to quasilinear shuffling. It is included in the background
part of the thesis since it was the application to quasilinear shuffling that was the
original motivation behind the study of matrix-valued singular perturbations. This
connection was mentioned in section 1.2.2 and will be discussed below when the
seminumerical approach is presented in section 3.2.4. This chapter also contains a discussion on how the quasilinear shuffle algorithm can be used to find consistent initial
conditions, and touches upon the issue of the algorithm complexity.
The contents of the chapter present only minor improvements compared to the
chapter with the same title in the author’s licentiate thesis, Tidefelt (2007, chapter 3),
except that the section on algorithm complexity has been removed due to its weak
connection to the current thesis.
Notation. We use a star to mark that a symbol denotes a constant. For instance, the symbol $E^*$ denotes a constant matrix, while a symbol like $E$ would in general refer to a matrix-valued function. A few times, we will encounter the gradient of a matrix-valued function. This object will be a function with 3 indices, but rather than adopting tensor notation with the Einstein summation convention, we shall permute indices using generalized transposes denoted $(\bullet)^{T}$ and $(\bullet)^{\bar{T}}$. Since their operation will be clear from the context, they will not be defined formally in this thesis.
3.1 Index reduction by shuffling
In section 2.2.3, algorithm 2.1 provided a way of reducing the differentiation index of
lti dae. The extension of that algorithm to the quasilinear form is immediate, but to
put this extension in a broader context, we will take the view of it as a specialization
instead. In this section, we mainly present the algorithm as it applies to equations
which are known exactly, and are to be index reduced exactly.
3.1.1 The structure algorithm
In section 2.2.5 we presented the structure algorithm (algorithm 2.2) as a means for index reduction of general nonlinear dae,
\[ f( x'(t), x(t), t ) = 0 \tag{3.1} \]
This method is generally not possible to implement, since the recurring use of the implicit function theorem often leaves the user with functions whose existence is given
by the theorem, but whose implementation is very involved (to the author’s knowledge, there are to date no available implementations serving this need). However, it
is possible to implement for the quasilinear form, as was done, for instance, using
Gauss-Bareiss elimination (Bareiss, 1968) in Visconti (1999), or outlined in Steinbrecher (2006).
3.1.2 Quasilinear shuffling
Even though algorithms for quasilinear dae exist, the results they produce may be
computationally demanding, partly because the problems they apply to are still very
general. This should be compared with the linear time-invariant (lti) case,
\[ E\, x'(t) + A\, x(t) + B\, u(t) = 0 \tag{3.2} \]
to which the very simple and certainly tractable shuffle algorithm (see section 2.2.3)
applies — at least as long as there is no uncertainty in the equation coefficients. Interestingly, the algorithm for quasilinear dae described in Visconti (1999) uses the
same idea, and it generalizes the shuffle algorithm in the sense that, when applied
to the lti form, it reduces to the shuffle algorithm. For this reason, the algorithm in
Visconti (1999) is referred to as a quasilinear shuffle algorithm.•
•
Note that it is not referred to as the quasilinear shuffle algorithm, since there are many options regarding
how to do the generalization. There are also some variations on the theme of the lti shuffle algorithm,
leading to slightly different generalizations.
In the next two sections, the alternative view of quasilinear shuffling as a specialization of the structure algorithm is taken. Before doing so, we show using a small
example what index reduction of quasilinear dae can look like.
3.1 Example
This example illustrates how row reduction can be performed for a quasilinear dae.
The aim is to present an idea rather than an algorithm, which will be a later topic.
Consider the dae (dropping the dependency of $x$ on $t$ from the notation)
\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4 t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
\sin( x_1 ) & x_2 \cos( x_1 ) & 4 t \cos( x_1 ) - e^{x_3}
\end{bmatrix} x'
+
\begin{pmatrix} 5 \\ x_2 + x_3 \\ x_1 e^{x_3} + t^3 x_2 \end{pmatrix}
= 0
\]
The leading matrix is singular at any point since the first row times $\cos( x_1 )$ less the second row yields the third row. Adding the second row to the third, and then subtracting $\cos( x_1 )$ times the first, is an invertible operation and thus yields the equivalent equations:
\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4 t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
0 & 0 & 0
\end{bmatrix} x'
+
\begin{pmatrix} 5 \\ x_2 + x_3 \\ x_1 e^{x_3} + t^3 x_2 + x_2 + x_3 - 5 \cos( x_1 ) \end{pmatrix}
= 0
\]
This reveals the implicit constraint of this iteration,
\[ x_1(t)\, e^{x_3(t)} + t^3 x_2(t) + x_2(t) + x_3(t) - 5 \cos( x_1(t) ) = 0 \]
Then, differentiation yields the new dae
\[
\begin{bmatrix}
2 + \tan( x_1 ) & x_2 & 4 t \\
2 \cos( x_1 ) & 0 & e^{x_3} \\
e^{x_3} + 5 \sin( x_1 ) & t^3 + 1 & x_1 e^{x_3} + 1
\end{bmatrix} x'
+
\begin{pmatrix} 5 \\ x_2 + x_3 \\ 3 t^2 x_2 \end{pmatrix}
= 0
\]
Here, the leading matrix is generally non-singular, and the dae is essentially an ode bundled with the derived implicit constraint.
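The steps of the example are easily reproduced symbolically. A minimal sketch using sympy (an illustration only; the thesis does not prescribe this tool):

```python
import sympy as sp

t = sp.symbols('t')
x1, x2, x3 = (sp.Function(f'x{i}')(t) for i in (1, 2, 3))

E = sp.Matrix([[2 + sp.tan(x1), x2, 4*t],
               [2*sp.cos(x1), 0, sp.exp(x3)],
               [sp.sin(x1), x2*sp.cos(x1), 4*t*sp.cos(x1) - sp.exp(x3)]])
A = sp.Matrix([5, x2 + x3, x1*sp.exp(x3) + t**3*x2])

print(sp.simplify(E.det()))          # 0: the leading matrix is singular

# Row operation row3 + row2 - cos(x1)*row1, expressed as a left multiplier:
w = sp.Matrix([[-sp.cos(x1), 1, 1]])
print(sp.simplify(w * E))            # [0, 0, 0]
constraint = sp.expand((w * A)[0])   # the revealed implicit constraint
print(constraint)

# Differentiating the constraint gives the replacement third equation.
dcon = sp.expand(sp.diff(constraint, t))
print(sp.collect(dcon, [sp.Derivative(xi, t) for xi in (x1, x2, x3)]))
```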
3.1.3 Time-invariant input affine systems
In this section, the structure algorithm is applied to equations
\[ 0 = f( x(t), x'(t), t ) \]
where $f$ is in the form
\[ f( x, \dot{x}, t ) = E( x )\, \dot{x} + A( x ) + B( x )\, u(t) \tag{3.3} \]
with u being a given forcing function. This system is considered time-invariant since
time only enters the equation via u.
After one iteration of the structure algorithm, we will see what requirements (3.3)
must fulfill in order for the equations after one iteration to be in the same form as the
original equations. This will show that (3.3) is not a natural form for dae treated by
the structure algorithm. In the next section, a more successful attempt will be made,
starting from a more general form than (3.3).
The system is rewritten in the form

$$ x'(t) \overset{!}{=} \dot x(t) \tag{3.4a} $$
$$ 0 \overset{!}{=} f(\, x(t), \dot x(t), t \,) \tag{3.4b} $$
to match the setup in Rouchon et al. (1992) (recall that the notation ẋ is not defined
to denote the derivative of x; it is a composed symbol denoting a newly introduced
function which is required to equal the derivative of x by (3.4a)). As is usual in the
analysis of dae, the analysis is only valid locally, giving just a local solution. As
is also customary, all matrix ranks that appear are assumed to be constant in the
neighborhood of the initial point defining the meaning of local solution.
We will now follow one iteration of the structure algorithm applied to this system
(compare algorithm 2.2).
Let $f_0 \triangleq f$, and introduce $E_0$, $A_0$ and $B_0$ accordingly. Let $\mu_k = \operatorname{rank} E_k$ (that is, the rank of the “ẋ-gradient” of $f_k$, which by assumption may be evaluated at any point in the neighborhood), and let $\bar f_k$ denote $\mu_k$ components of $f_k$ such that $\bar E_k$ denotes $\mu_k$ linearly independent rows of $E_k$. Let $\tilde f_k$ denote the remaining components of $f_k$. By the constant rank assumption it follows that, locally, the rows of $\bar E_k(x)$ span the rows of $E_k(x)$, and hence there exists a function $\varphi_k$ such that

$$ \tilde E_k(x) = \varphi_k(x)\, \bar E_k(x) $$
Hence,

$$
\begin{aligned}
\tilde f_k(\, x, \dot x, t \,) &= \tilde E_k(x)\, \dot x + \tilde A_k(x) + \tilde B_k(x)\, u(t) \\
&= \varphi_k(x)\, \bar E_k(x)\, \dot x + \tilde A_k(x) + \tilde B_k(x)\, u(t) \\
&= \varphi_k(x)\,\bigl(\, \bar E_k(x)\, \dot x + \bar A_k(x) + \bar B_k(x)\, u(t) \,\bigr) \\
&\qquad - \varphi_k(x)\, \bar A_k(x) - \varphi_k(x)\, \bar B_k(x)\, u(t) + \tilde A_k(x) + \tilde B_k(x)\, u(t) \\
&= \varphi_k(x)\, \bar f_k(\, x, \dot x, t \,) + \bigl(\, \tilde A_k(x) - \varphi_k(x)\, \bar A_k(x) \,\bigr) + \bigl(\, \tilde B_k(x) - \varphi_k(x)\, \bar B_k(x) \,\bigr)\, u(t)
\end{aligned}
$$
Define

$$
\begin{aligned}
\hat A_k(x) &\triangleq \tilde A_k(x) - \varphi_k(x)\, \bar A_k(x) \\
\hat B_k(x) &\triangleq \tilde B_k(x) - \varphi_k(x)\, \bar B_k(x) \\
\Phi_k(\, x, t, y \,) &\triangleq \varphi_k(x)\, y + \hat A_k(x) + \hat B_k(x)\, u(t)
\end{aligned} \tag{3.5}
$$
and note that along solutions,

$$ \Phi_k(\, x, t, 0 \,) = \Phi_k(\, x, t, \bar f_k(\, x, \dot x, t \,) \,) = \tilde f_k(\, x, \dot x, t \,) = 0 $$
In particular, the expression is constant over time, so it can be differentiated with
respect to time to obtain a substitute for the (locally) uninformative equations given
by f̃k. Thus, let

$$ f_{k+1}(\, x, \dot x, t \,) \triangleq \begin{pmatrix} \bar f_k(\, x, \dot x, t \,) \\[4pt] \dfrac{\partial\; t \mapsto \Phi_k(\, x(t), t, 0 \,)}{\partial t}(t) \end{pmatrix} \tag{3.6} $$
Expanding the differentiation using the chain rule, it follows that

$$
\begin{aligned}
\frac{\partial\; t \mapsto \Phi_k(\, x(t), t, 0 \,)}{\partial t}(t)
&= \nabla_1 \Phi_k(\, x(t), t, 0 \,)\, x'(t) + \nabla_2 \Phi_k(\, x(t), t, 0 \,) \\
&= \bigl(\, \nabla \hat A_k(\, x(t) \,) + \nabla^{\mathsf T} \hat B_k(\, x(t) \,)\, u(t) \,\bigr)\, x'(t) + \hat B_k(\, x(t) \,)\, u'(t)
\end{aligned} \tag{3.7}
$$
However, since x' = ẋ along solutions, the following defines a valid replacement for fk:

$$ f_{k+1}(\, x, \dot x, t \,) = \begin{bmatrix} \bar E_k(x) \\ \nabla \hat A_k(x) + \nabla^{\mathsf T} \hat B_k(x)\, u(t) \end{bmatrix} \dot x + \begin{bmatrix} \bar A_k(x) \\ 0 \end{bmatrix} + \begin{bmatrix} \bar B_k(x) & 0 \\ 0 & \hat B_k(x) \end{bmatrix} \begin{bmatrix} u(t) \\ \dot u(t) \end{bmatrix} \tag{3.8} $$
We have now completed one iteration of the structure algorithm, and turn to finding conditions on (3.3) that make (3.8) fulfill the same conditions.
In (3.8), the product between u(t) and ẋ(t) is unwanted, so the structure is restricted by requiring

$$ \nabla \hat B_k(\, x(t) \,) = 0 \tag{3.9} $$

that is, $\hat B_k$ is constant; $\hat B_k(x) = \hat B_k^*$.
Unfortunately, the conflict has just been shifted to a new location by this requirement. The structure of $f_{k+1}$ does not match the structure in (3.3) together with the requirement (3.9), since $\hat B_k(x)$ includes the non-constant expression $\varphi_k(x)$. Hence it is also required that $E_k$ be constant, so that $\varphi_k(x)$ may be chosen constant. This is written as $E_k(x) = E_k^*$. Then, if structure is to be maintained,

$$ \begin{bmatrix} \bar E_k^{\,*} \\ \nabla \hat A_k(x) \end{bmatrix} $$

would have to be constant. Again, this condition is not met since $\nabla \hat A_k(x)$ is generally not constant. Finally, we are led to also requiring that $\nabla \hat A_k(x)$ be constant; in other words, that

$$ \hat A_k(x) = \hat A_k^*\, x $$

so the structure of (3.3) is really

$$ f(\, x, \dot x, t \,) = E^* \dot x + A^* x + B^* u(t) $$

which is a standard lti dae.
Note that another way to obtain conditions on (3.3) which become fulfilled also by
(3.8) is to remove the forcing function u.
The key point of this section, however, is that we have seen that in order to be able
to run the structure algorithm repeatedly on equations in the form (3.3), an implementation that is designed for one iteration on (3.3) is insufficient. In other words, if
an implementation that can be iterated exists, it must apply to a more general form
than (3.3).
3.1.4 Quasilinear structure algorithm
Seeking a replacement for (3.3) such that an implementation for one step of the structure algorithm can be iterated, a look at (3.8) suggests that the form should allow for dependency on time in the leading matrix. Further, since the forcing function u has entered the leading matrix, the feature of u entering the equations in a simple way has been lost. Hence it is no longer motivated to keep Ak( x ) and Bk( x ) u(t) separate, but we might as well turn to the quasilinear form in its full generality,

$$ f_k(\, x, \dot x, t \,) = E_k(\, x, t \,)\, \dot x + A_k(\, x, t \,) $$
The reader is referred to the previous section for the notation used below. This time, the constant rank assumption leads to the existence of a $\varphi_k$ such that

$$ \tilde E_k(\, x, t \,) = \varphi_k(\, x, t \,)\, \bar E_k(\, x, t \,) $$
Such a ϕk can be obtained from a row reduction of E, and corresponds to the row
reduction performed in a quasilinear shuffle algorithm.
It follows that

$$
\begin{aligned}
\tilde f_k(\, x, \dot x, t \,) &= \tilde E_k(\, x, t \,)\, \dot x + \tilde A_k(\, x, t \,) \\
&= \varphi_k(\, x, t \,)\, \bar E_k(\, x, t \,)\, \dot x + \tilde A_k(\, x, t \,) \\
&= \varphi_k(\, x, t \,)\,\bigl(\, \bar E_k(\, x, t \,)\, \dot x + \bar A_k(\, x, t \,) \,\bigr) - \varphi_k(\, x, t \,)\, \bar A_k(\, x, t \,) + \tilde A_k(\, x, t \,) \\
&= \varphi_k(\, x, t \,)\, \bar f_k(\, x, \dot x, t \,) + \tilde A_k(\, x, t \,) - \varphi_k(\, x, t \,)\, \bar A_k(\, x, t \,)
\end{aligned}
$$
Define

$$
\begin{aligned}
\hat A_k(\, x, t \,) &\triangleq \tilde A_k(\, x, t \,) - \varphi_k(\, x, t \,)\, \bar A_k(\, x, t \,) \\
\Phi_k(\, x, t, y \,) &\triangleq \varphi_k(\, x, t \,)\, y + \hat A_k(\, x, t \,)
\end{aligned} \tag{3.10}
$$
and note that along solutions,

$$ \Phi_k(\, x, t, 0 \,) = \Phi_k(\, x, t, \bar f_k(\, x, \dot x, t \,) \,) = \tilde f_k(\, x, \dot x, t \,) = 0 $$
Taking a quasilinear shuffle algorithm perspective on this, we see that Φ k ( x, t, 0 ) =
Âk ( x, t ) is computed by applying the same row operations to A as were used to find
the function ϕk above.
The expression Φ k ( x, t, 0 ) is constant over time, so it can be differentiated with respect to time to obtain a substitute for the (locally) uninformative equations given
by f̃k. Thus, let

$$ f_{k+1}(\, x, \dot x, t \,) \triangleq \begin{pmatrix} \bar f_k(\, x, \dot x, t \,) \\[4pt] \dfrac{\partial\; t \mapsto \Phi_k(\, x(t), t, 0 \,)}{\partial t}(t) \end{pmatrix} \tag{3.11} $$
Expanding the differentiation using the chain rule, it follows that

$$
\frac{\partial\; t \mapsto \Phi_k(\, x(t), t, 0 \,)}{\partial t}(t)
= \nabla_1 \Phi_k(\, x(t), t, 0 \,)\, x'(t) + \nabla_2 \Phi_k(\, x(t), t, 0 \,)
= \nabla_1 \hat A_k(\, x(t), t \,)\, x'(t) + \nabla_2 \hat A_k(\, x(t), t \,)
$$

Again, since x' = ẋ along solutions, fk may be replaced by

$$ f_{k+1}(\, x, \dot x, t \,) = \begin{bmatrix} \bar E_k(\, x, t \,) \\ \nabla_1 \hat A_k(\, x, t \,) \end{bmatrix} \dot x + \begin{bmatrix} \bar A_k(\, x, t \,) \\ \nabla_2 \hat A_k(\, x, t \,) \end{bmatrix} \tag{3.12} $$
This completes one iteration of the structure algorithm, and it is clear that this can
also be seen as the completion of one iteration of a quasilinear shuffle algorithm. As
opposed to the outcome in the previous section, this time (3.12) is in the form we
started with, so the procedure can be iterated.
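As a sketch of how one such iteration could be implemented symbolically, consider the following function (ours, not the thesis implementation). It assumes that exact arithmetic suffices, that the independent rows of E are given, and it obtains φ via a right pseudo-inverse of Ē rather than the fraction-free row reduction discussed in the next section.

```python
import sympy as sp

def quasilinear_step(E, A, xs, t, indep):
    """One iteration (3.12) for E(x,t) x' + A(x,t) = 0. `indep` lists
    the independent rows of E; Ē is assumed to have full row rank and
    Ẽ = φ Ē to hold exactly."""
    dep = [i for i in range(E.rows) if i not in indep]
    Ebar, Abar = E[indep, :], A[indep, :]
    Etil, Atil = E[dep, :], A[dep, :]
    # φ from Ẽ = φ Ē, using the right pseudo-inverse of Ē
    phi = Etil * Ebar.T * (Ebar * Ebar.T).inv()
    Ahat = sp.simplify(Atil - phi * Abar)  # Φ(x,t,0): zero along solutions
    # f_{k+1}: [Ē ; ∇₁Â] x' + [Ā ; ∇₂Â]
    Enew = Ebar.col_join(Ahat.jacobian(xs))
    Anew = Abar.col_join(Ahat.diff(t))
    return Enew, Anew, Ahat                # Ahat are the found constraints
```

Applied to example 3.1 with indep = [0, 1], this should reproduce, up to simplification, the constraint and the new third row found there.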
3.2 Proposed algorithm
Having seen how the structure algorithm can be implemented as an index reduction
method for (exact) quasilinear dae, and that this results in an immediate generalization of the shuffle algorithm for lti dae, we now turn to the task of detailing the
algorithm to make it applicable in a practical setting. Issues to be dealt with include
finding a suitable row reduction method and determining whether an expression is
zero along solutions.
The problem of adapting algorithms for revealing hidden constraints in exact equations to a practical setting has previously been addressed in Reid et al. (2002). The geometrical awareness in their work is convincing, and the work was extended in Reid et al. (2005). For examples of other approaches to system analysis and/or index reduction which are reminiscent of ours, see for instance Unger et al. (1995) or Chowdhry et al. (2004).
3.2.1 Algorithm
The reason to do index reduction in the following particular way is that it is simple enough to make the analysis easy, and also that it does not rule out some of the candidate forms (Tidefelt, 2007, section 4.2) already in the row reduction step by producing a leading matrix outside the form. If maintaining invariant forms were not a goal in itself, the algorithm could easily be given better numeric properties (compare section 2.8), and/or better performance in terms of computation time (by reuse of expressions and similar techniques).
Algorithm 3.1 Quasilinear shuffling iteration for invariant forms.

Input: A square dae,
$$ E(\, x(t), t \,)\, x'(t) + A(\, x(t), t \,) \overset{!}{=} 0 $$
It is assumed that the leading matrix is singular (when the leading matrix is non-singular, the index is 0 and index reduction is neither needed nor possible).

Output: An equivalent square dae of lower index, and additional algebraic constraints.

Iteration:
Select a set of independent rows in E( x(t), t ).
Perform fraction-free row reduction of the equations such that exactly the rows that were not selected in the previous step are zeroed. The produced algebraic terms, corresponding to the zero rows in the leading matrix, define algebraic equations restricting the solution manifold.
Differentiate the newly found algebraic equations with respect to time, and join the resulting equations with the ones selected in the first step to obtain the new square dae.
Remarks: The most important remark to make here is that the differentiation is not
guaranteed to be geometric (recall the remark in algorithm 2.2). Hence, the termination criterion based on the number of iterations in algorithm 2.2 cannot be used
safely in this context. If that termination criterion is met, our algorithm aborts with
“non-geometric differentiation” instead of “ill-posed”, but no conclusion regarding
the existence of solutions to the dae can be drawn.
Although there are choices regarding how to perform the fraction-free row reduction, a conservative approach is taken by not assuming anything more fancy than fraction-free Gaussian elimination, with pivoting used only when so required and done in the most naïve way. This way, it is ensured that the tailoring of the reduction algorithms is really just a tailoring rather than something requiring elaborate extension.
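For concreteness, a naïve fraction-free elimination of this kind could look as follows. This is a sketch of ours using cross-multiplication only; full Bareiss elimination would additionally divide out the previous pivot to curb expression swell, and a real implementation would replace the structural zero test with the seminumerical one of section 3.2.2.

```python
import sympy as sp

def fraction_free_reduce(M, n_lead):
    """Fraction-free forward elimination of the compound matrix
    M = [E | A], eliminating only the first n_lead columns (the
    E-part): row_i <- p*row_i - m*row_p, with pivot p and entry m.
    Pivoting is done the naive way: the first usable row is taken."""
    M = M.copy()
    r = 0
    for c in range(n_lead):
        # NOTE: a structural zero test; the seminumerical test of
        # section 3.2.2 belongs here in a practical implementation.
        piv = next((i for i in range(r, M.rows)
                    if sp.simplify(M[i, c]) != 0), None)
        if piv is None:
            continue
        M.row_swap(r, piv)
        for i in range(r + 1, M.rows):
            if M[i, c] != 0:
                M[i, :] = (M[r, c]*M.row(i) - M[i, c]*M.row(r)).expand()
        r += 1
    return M   # rows with zero E-part now expose algebraic equations
```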
As an alternative to the fraction-free row reduction, the same step may be seen as a matrix factorization (Steinbrecher, 2006). This view hides the reduction process in the factorization abstraction, and may therefore be better suited for high-level reasoning about the algorithm, while the current presentation may be more natural from an implementation point of view and easier to reason about on a lower level of abstraction.
It would be of no consequence for the analysis in the current chapter to require that
the set of equations chosen in the first step always include the equations selected in
the previous iteration, as is done in Rouchon et al. (1992).
We stress again that an index reduction algorithm is typically run repeatedly until a
low index is obtained (compare, for instance, algorithm 2.2). Here, only one iteration
is described, but this is sufficient since the algorithm output is in the same form as
the algorithm input was assumed to be.
Recall the discussion on fraction-producing versus fraction-free row reduction schemes in section 2.8. The proposed algorithm uses a fraction-free scheme for two reasons. Most importantly in this chapter, it does so in order to hold more invariant forms (to be defined). Of subordinate importance is that it can be seen as a heuristic for producing simpler expressions. The body of the index reduction loop is given in algorithm 3.1.
3.2 Example
Here, one iteration is performed on the following quasilinear dae:
$$
\begin{pmatrix}
x_1(t)\,x_2(t) & \sin(t) & 0 \\
e^{x_3(t)} & x_1(t) & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix} +
\begin{pmatrix} x_2(t)^3 \\ \cos(t) \\ 4 \end{pmatrix} \overset{!}{=} 0
$$
The leading matrix is clearly singular, and has rank 2.
For the first step in the algorithm, there is freedom to pick any two rows as the independent ones. For instance, the rows { 1, 3 } are chosen. The remaining row can then
be eliminated using the following series of fraction-free row operations. First
$$
\begin{pmatrix}
x_1(t)\,x_2(t) - t\sin(t) & 0 & 0 \\
e^{x_3(t)} - t\,x_1(t) & 0 & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix} +
\begin{pmatrix} x_2(t)^3 - 4\sin(t) \\ \cos(t) - 4\,x_1(t) \\ 4 \end{pmatrix} \overset{!}{=} 0
$$
Then

$$
\begin{pmatrix}
x_1(t)\,x_2(t) - t\sin(t) & 0 & 0 \\
0 & 0 & 0 \\
t & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix} +
\begin{pmatrix} x_2(t)^3 - 4\sin(t) \\ e(\, x(t), t \,) \\ 4 \end{pmatrix} \overset{!}{=} 0
$$
where the algebraic equation discovered is given by

$$ e(\, x, t \,) \triangleq \bigl(\, x_1 x_2 - t\sin(t) \,\bigr)\bigl(\, \cos(t) - 4\,x_1 \,\bigr) - \bigl(\, e^{x_3} - t\,x_1 \,\bigr)\bigl(\, x_2^3 - 4\sin(t) \,\bigr) $$
Differentiating the derived equation with respect to time yields a new equation with residual in the form

$$ \begin{pmatrix} a_1(\, x(t), t \,) & a_2(\, x(t), t \,) & a_3(\, x(t), t \,) \end{pmatrix} \begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix} + b(\, x(t), t \,) $$
where

$$
\begin{aligned}
a_1(\, x, t \,) &= x_2\,\bigl( \cos(t) - 4\,x_1 \bigr) - 4\,\bigl( x_1 x_2 - t\sin(t) \bigr) + t\,\bigl( x_2^3 - 4\sin(t) \bigr) \\
a_2(\, x, t \,) &= x_1\,\bigl( \cos(t) - 4\,x_1 \bigr) - 3\,x_2^2\,\bigl( e^{x_3} - t\,x_1 \bigr) \\
a_3(\, x, t \,) &= -e^{x_3}\,\bigl( x_2^3 - 4\sin(t) \bigr) \\
b(\, x, t \,) &= -\bigl( \sin(t) + t\cos(t) \bigr)\bigl( \cos(t) - 4\,x_1 \bigr) - \sin(t)\,\bigl( x_1 x_2 - t\sin(t) \bigr) \\
&\qquad + x_1\,\bigl( x_2^3 - 4\sin(t) \bigr) + 4\cos(t)\,\bigl( e^{x_3} - t\,x_1 \bigr)
\end{aligned}
$$
Joining the new equation with the ones selected previously yields the following output from the algorithm (dropping some notation for brevity):

$$
\begin{pmatrix}
x_1(t)\,x_2(t) & \sin(t) & 0 \\
t & 1 & 0 \\
a_1 & a_2 & a_3
\end{pmatrix}
\begin{pmatrix} x_1'(t) \\ x_2'(t) \\ x_3'(t) \end{pmatrix} +
\begin{pmatrix} x_2(t)^3 \\ 4 \\ b \end{pmatrix} \overset{!}{=} 0
$$
Unfortunately, the expression swell seen in this example is typical for the investigated
algorithm. Compare with the neat outcome in example 3.1, where some intelligence
was used to find a parsimonious row reduction.
3.2.2 Zero tests
The crucial step in algorithm 3.1 is the row reduction, but exactly how this can be
done has not been discussed yet. One of the important topics for the row reduction
to consider is how it should detect when it is finished. For many symbolic matrices
whose rank is determined by the zero-pattern, the question is easy; the matrix is
row reduced when the rows which are not independent by construction consist of
only structural zeros. This was the case in example 3.2. However, the termination
criterion is generally more complicated since there may be expressions in the matrix
which are identically zero, although this is hard to detect using symbolic software.
It is proposed that structural zeros are tracked in the algorithm, making many of
the zero tests trivially affirmative. An expression which is not a structural zero is
tested against zero by evaluating it (and possibly its derivative with respect to time)
at the point where the index reduction is being performed. If this test is passed, the
expression is assumed rewritable to zero, but anticipating that this will be wrong
occasionally, the expression is also kept in a list of expressions that are assumed to be
zero for the index reduction to be valid. Numerical integrators and the like can then
monitor this list of expressions, and take appropriate action when an expression no
longer evaluates to zero.
Note that there are some classes of quasilinear dae where all expressions can be put
in a canonical form where expressions that are identically zero can be detected. For
instance, this is the case when all expressions are polynomials.
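A sketch (ours) of the proposed test: structural zeros pass trivially, anything else is evaluated at the point of index reduction and, if small, recorded on the list of assumptions to be monitored. The tolerance is a placeholder, since, as noted next, no scientific guideline for it is offered.

```python
import sympy as sp

def is_zero(expr, point, assumed_zero, tol=1e-9):
    """Seminumerical zero test. `point` maps symbols to numbers at the
    point of index reduction; `assumed_zero` collects the expressions
    whose zeroness is an assumption rather than a structural fact."""
    expr = sp.simplify(expr)
    if expr == 0:                      # structural zero: trivially yes
        return True
    val = float(expr.subs(point).evalf())
    if abs(val) < tol:                 # passes the test at this point...
        assumed_zero.append(expr)      # ...but must be monitored later
        return True
    return False

# The pendulum failure mode of example 3.3 below: xdot evaluates to
# zero at the rest point, so it ends up on the assumption list.
xdot = sp.Symbol('xdot')
assumptions = []
print(is_zero(xdot, {xdot: 0.0}, assumptions), assumptions)  # True [xdot]
```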
Of course, some tolerance must be used when comparing the value of an expression against zero. Setting this tolerance is non-trivial, and at this stage we have no scientific guidelines to offer.• This need was the original motivation for the research on matrix-valued singular perturbations.
The following example exhibits the weakness of the numerical evaluation approach.
It will be commented on in section 3.2.5.
3.3 Example
Let us consider numerical integration of the inevitable pendulum˜, modeled by

$$
\begin{aligned}
x'' &\overset{!}{=} \lambda x \\
y'' &\overset{!}{=} \lambda y - g \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$
where we take $g \triangleq 10.0$. Index reduction will be performed at two points (the time
part of these points is immaterial and will not be written out); one at rest, the other
not at rest. Note that it is quite common to begin a simulation of a pendulum (as well
as many other systems) at rest. The following values give approximately an initial
angle of 0.5 rad below the positive x axis:
x0,rest : { x(0) = 0.87, ẋ(0) = 0, y(0) = −0.50, ẏ(0) = 0, λ(0) = −5.0 }
x0,moving : { x(0) = 0.87, ẋ(0) = −0.055, y(0) = −0.50, ẏ(0) = −0.1, λ(0) = −4.8 }
Clearly, if ẋ or ẏ constitutes an entry of any of the intermediate leading matrices,
the algorithm will be in trouble, since these values are not typically zero. After two
reduction steps which are equal for both points, the equations look as follows (not
showing the already deduced algebraic constraints):

$$
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
2\dot x & 2x & 2\dot y & 2y & 0
\end{pmatrix}
\begin{pmatrix} x' \\ \dot x' \\ y' \\ \dot y' \\ \lambda' \end{pmatrix} +
\begin{pmatrix} -\lambda x \\ 10 - \lambda y \\ -\dot x \\ -\dot y \\ 0 \end{pmatrix} \overset{!}{=} 0
$$
Reducing these equations at x0,rest, the algorithm produces the algebraic equation

$$ 20\, y \overset{!}{=} 2\,\bigl( x^2 + y^2 \bigr)\, \lambda $$

but the correct equation, produced at x0,moving, is

$$ 20\, y \overset{!}{=} 2\,\bigl( x^2 + y^2 \bigr)\, \lambda + \dot x^2 + \dot y^2 $$
• The perturbation results in the second part of the thesis are limited to linear systems, but here a theory for non-linear systems is needed.
˜ The two-dimensional pendulum in Cartesian coordinates is used so often in the study of dae that it was given this nickname in Mattsson and Söderlind (1993). While their model contains parameters for the length and mass of the pendulum, our pendulum is of unit length and mass to simplify notation.
Our intuition about the mechanics of the problem immediately gives that ẋ′ and ẏ′ are non-zero at x0,rest. Hence, computing the derivatives of all variables using a reduction to index 0 would reveal the mistake.
As a final note on the failure at x0,rest , note that ẋ and ẏ would be on the list of
expressions that had been assumed zero. Checking these conditions after integrating
the equations for a small period of time would detect the problem, so delivery of an
erroneous result is avoided.
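The failure can also be reproduced numerically; the following plain-Python check (ours) evaluates the two derived constraints at the two points, and their difference is exactly the term ẋ² + ẏ², which vanishes at the rest point.

```python
# Residuals of the two derived constraints at the two points of
# example 3.3 (our own check; values copied from the example).
rest   = dict(x=0.87, xdot=0.0,    y=-0.50, ydot=0.0,  lam=-5.0)
moving = dict(x=0.87, xdot=-0.055, y=-0.50, ydot=-0.1, lam=-4.8)

def c_rest(p):     # constraint produced at x0_rest
    return 20*p['y'] - 2*(p['x']**2 + p['y']**2)*p['lam']

def c_correct(p):  # constraint produced at x0_moving
    return c_rest(p) - (p['xdot']**2 + p['ydot']**2)

for name, p in (('rest', rest), ('moving', moving)):
    print(name, c_rest(p), c_correct(p))
# The two constraints coincide at the rest point; once xdot and ydot
# leave zero, the monitored assumption list {xdot, ydot} flags the
# reduction performed at x0_rest as invalid.
```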
3.2.3 Longevity
At the point ( x(t0 ), t0 ), the proposed algorithm performs tasks such as row operations, index reduction, selection of independent equations, etc. Each of these may be
valid at the point they were computed, but fail to be valid at future points along the
solution trajectory. By the longevity of such an operation, or the product thereof, we
refer to the duration until validity is lost.
A row reduction operation becomes invalid when its pivot entry becomes zero. A selection of equations to be part of the square index 1 system becomes invalid when the iteration matrix loses rank. An index reduction becomes invalid if an expression which was assumed to be zero becomes non-zero. The importance of longevity considerations is shown by an example.
3.4 Example
A circular motion is described by the following equations (think of ζ as “zero”):

$$
\begin{aligned}
\zeta &\overset{!}{=} x'\, x + y'\, y \\
1 &\overset{!}{=} (x')^2 + (y')^2 \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$
This system is square but not in quasilinear form. The trivial conversion to quasilinear form described in section 2.2.4 yields a square dae of size 5 with new variables introduced for the derivatives x' and y'. By the geometrical interpretation of the equations we know that the solution manifold is one-dimensional and equal to the two disjoint sets (distinguished by the sign choices below, of which one has been chosen to work with)
$$ \Bigl\{\; ( x, \dot x, y, \dot y, \zeta ) \in \mathbb{R}^5 \;:\;\; x = \cos(\beta),\;\; \dot x = \mp \sin(\beta),\;\; y = \sin(\beta),\;\; \dot y = \pm \cos(\beta),\;\; \zeta = 0,\;\; \beta \in [\, 0, 2\pi \,] \;\Bigr\} $$

(the upper signs correspond to the chosen set).
Let us consider the initial conditions given by β = 1.4 in the set characterization. The
quasilinear equations are:

$$
\begin{aligned}
\dot x &\overset{!}{=} x' \\
\dot y &\overset{!}{=} y' \\
\zeta &\overset{!}{=} \dot x\, x + \dot y\, y \\
1 &\overset{!}{=} \dot x^2 + \dot y^2 \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$
Note that there are three algebraic equations here.• The equations are already row
reduced, and after performing one differentiation of the algebraic constraints and
one row reduction, the dae looks like
$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
\dot x & \dot y & -1 & x & y \\
0 & 0 & 0 & 2\dot x & 2\dot y \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x' \\ y' \\ \zeta' \\ \dot x' \\ \dot y' \end{pmatrix} +
\begin{pmatrix} -\dot x \\ -\dot y \\ 0 \\ 0 \\ 2\, x\, \dot x + 2\, y\, \dot y \end{pmatrix} \overset{!}{=} 0
$$
Differentiation of the derived algebraic constraints will yield a full-rank leading matrix, so the index reduction algorithm terminates here. There are now four differential equations,
$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
\dot x & \dot y & -1 & x & y \\
0 & 0 & 0 & 2\dot x & 2\dot y
\end{pmatrix}
\begin{pmatrix} x' \\ y' \\ \zeta' \\ \dot x' \\ \dot y' \end{pmatrix} +
\begin{pmatrix} -\dot x \\ -\dot y \\ 0 \\ 0 \end{pmatrix}
$$
and four algebraic equations
$$
\begin{aligned}
\zeta &\overset{!}{=} \dot x\, x + \dot y\, y \\
1 &\overset{!}{=} \dot x^2 + \dot y^2 \\
1 &\overset{!}{=} x^2 + y^2 \\
0 &\overset{!}{=} 2\, x\, \dot x + 2\, y\, \dot y
\end{aligned}
$$
with Jacobian

$$
\begin{pmatrix}
\dot x & \dot y & -1 & x & y \\
0 & 0 & 0 & 2\dot x & 2\dot y \\
2x & 2y & 0 & 0 & 0 \\
2\dot x & 2\dot y & 0 & 2x & 2y
\end{pmatrix}
$$
• Another quasilinear formulation would be obtained by replacing the third equation by $\zeta \overset{!}{=} x\, x' + y\, y'$, containing only two explicit algebraic equations. The corresponding leading matrix would not be row reduced, so row reduction would reveal an implicit algebraic equation, and the result would be the same in the end.
The algebraic equations are independent, so they shall be completed with one of the differential equations to form a square index 1 system. The last two differential equations are linearly dependent on the algebraic equations by construction, but either of the first two differential equations is a valid choice. Depending on the choice, the first row of the iteration matrix will be either of

$$ \begin{pmatrix} 1 & 0 & 0 & -h & 0 \end{pmatrix} \qquad\text{or}\qquad \begin{pmatrix} 0 & 1 & 0 & 0 & -h \end{pmatrix} $$
After a short time, the four other rows of the iteration matrix (which are simply the Jacobian of the algebraic constraints) will approach

$$
\begin{pmatrix}
-\sin(\frac{\pi}{2}) & \cos(\frac{\pi}{2}) & -1 & \cos(\frac{\pi}{2}) & \sin(\frac{\pi}{2}) \\
0 & 0 & 0 & -2\sin(\frac{\pi}{2}) & 2\cos(\frac{\pi}{2}) \\
2\cos(\frac{\pi}{2}) & 2\sin(\frac{\pi}{2}) & 0 & 0 & 0 \\
-2\sin(\frac{\pi}{2}) & 2\cos(\frac{\pi}{2}) & 0 & 2\cos(\frac{\pi}{2}) & 2\sin(\frac{\pi}{2})
\end{pmatrix}
=
\begin{pmatrix}
-1 & 0 & -1 & 0 & 1 \\
0 & 0 & 0 & -2 & 0 \\
0 & 2 & 0 & 0 & 0 \\
-2 & 0 & 0 & 0 & 2
\end{pmatrix}
$$
In particular, the third row will be very aligned with $\begin{pmatrix} 0 & 1 & 0 & 0 & -h \end{pmatrix}$, which means that it is better to select the differential equation $\dot x \overset{!}{=} x'$ than $\dot y \overset{!}{=} y'$. This holds not only on paper; numerical simulation using widespread index 1 solution software (Hindmarsh et al., 2004, through the Mathematica interface) demands that the former differential equation be chosen.
This example illustrated the fact that, if an implementation derives the reduced equations without really caring about the choices it makes, things such as the ordering of
variables may influence the end result. Hence, the usefulness of the reduced equations would depend on implementation details in the algorithm, even though the
result does not feature any numerically computed entities.
Even though repeated testing of the numerical conditioning while the equations are
integrated is sufficient to detect numerical ill-conditioning, the point made here is
that at the point ( x0 , t0 ) one wants to predict what will be the good ways of performing the row reduction and selecting equations to appear in the square index 1
form.
While it is difficult to foresee when the expressions which are assumed rewritable to zero cease to be zero (the optimistic longevity estimation is simply that they will remain zero forever), there is more to be done concerning the longevity of the row reduction operations. For each entry used as a pivot, it is possible to formulate scalar conditions that are to be satisfied as long as the pivot stays in use. For instance, it can be required that the pivot be no smaller in magnitude than half the magnitude of the largest value it is used to cancel.
Using the longevity predictions, each selection of a pivot can be made to maximize
longevity. Clearly, this is a non-optimal greedy strategy (since only one pivot selection is considered at a time, compared to considering all possible sequences of pivot
selections at once), but it can be implemented with little effort and at a reasonable
runtime cost.
3.2.4 Seminumerical twist
In section 3.2.2 it was suggested that numerical evaluation of expressions (combined
with tracking of structural zeros) should be used to determine whether an expression
can be rewritten to zero or not. That added a bit of numerics to an otherwise symbolic
index reduction algorithm, but this combination of symbolic and numeric techniques
is more of a necessity than a nice twist. We now suggest that numerical evaluation
should also be the basic tool when predicting longevity. While the zero tests are accompanied by difficult questions about tolerances but are otherwise rather straightforward to perform, it is expected that the numeric decisions discussed in this section allow more sophistication while not requiring intricate analysis of how tolerances shall be set.
Without loss of generality, it is assumed that the scalar tests compare an expression, e, with the constant 0. The simplest way to estimate (predict) the longevity of

$$ e(\, x(t), t \,) < 0 $$

at the point ( x0, t0 ) is to first compute the derivatives x'(t0) using a method that does not care about longevity, and use linear extrapolation to find the longevity. In detail, the longevity, denoted Le( x0, t0 ), may be estimated as

$$ \dot e(\, x_0, t_0 \,) \triangleq \nabla_1 e(\, x_0, t_0 \,)\, x'(t_0) + \nabla_2 e(\, x_0, t_0 \,) $$
$$
\hat L_e(\, x_0, t_0 \,) \triangleq
\begin{cases}
-\dfrac{ e(\, x_0, t_0 \,) }{ \dot e(\, x_0, t_0 \,) } & \text{if } \dot e > 0 \\[6pt]
\infty & \text{otherwise}
\end{cases}
$$
In case of several alternatives having infinite longevity estimates by the calculation
above, the selection criterion needs to be refined. The natural extension of the above
procedure would be to compute higher order derivatives to be able to estimate the
first zero-crossing, but that would typically involve more differentiation of the equations than is needed otherwise, and is therefore not a good option. Rather, some
other heuristic should be used. One heuristic would be to disregard signs, but one
could also altogether ignore derivatives when this situation occurs and fall back on
the usual pivot selection based on magnitudes only.
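In code, the linear-extrapolation estimate is a one-liner once the derivatives are available. This is a sketch of ours; the gradient callables are assumptions, standing for whatever symbolic or numeric machinery provides ∇₁e and ∇₂e.

```python
import numpy as np

def longevity(e, grad_x, grad_t, x0, t0, xdot0):
    """Estimated duration for which e(x(t), t) < 0 remains true, by
    linear extrapolation from (x0, t0); xdot0 = x'(t0) is assumed
    precomputed by a method that does not care about longevity."""
    edot = grad_x(x0, t0) @ xdot0 + grad_t(x0, t0)
    return -e(x0, t0) / edot if edot > 0 else np.inf
```

For a pivot p used to cancel an entry m, the condition of the previous section corresponds to taking e = |m|/2 - |p|.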
A very simple way to select equations for the square index 1 system is to greedily add
one equation at a time, picking the one which has the largest angle to its projection in
the space spanned by the equations already selected. If the number of possible combinations is not overwhelmingly large, it may also be possible to check the condition
number for each combination, possibly also taking into account the time derivative
of the condition number.
3.2.5 Monitoring
Since the seminumerical algorithm may make false judgements regarding what expressions are identically zero, expressions which are not structurally zero but have passed the zero-test anyway need to be monitored. It may not be necessary to evaluate these expressions after each time step, but as was seen in example 3.3, it is wise
to be alert during the first few iterations after the point of index reduction.
For the (extremely) basic bdf method applied to equations of index 1, the local integration accuracy is limited by the condition number of the iteration matrix for a time step of size h. In the quasilinear index 1 case and for small h, the matrix should have at least as good condition as

$$ \begin{bmatrix} \bar E(\, x, t \,) \\ \nabla_1 \tilde A(\, x, t \,) \end{bmatrix} \tag{3.13} $$
To see where this comes from•, consider solving for xn in

$$ \begin{bmatrix} \bar E(\, x_n, t_n \,) \\ 0 \end{bmatrix} \frac{ x_n - x_{n-1} }{ h_n } + \begin{bmatrix} \bar A(\, x_n, t_n \,) \\ \tilde A(\, x_n, t_n \,) \end{bmatrix} \overset{!}{=} 0 $$
where xn−1 is the iterate at time tn − hn . The equations being index 1 guarantees
that this system has a locally unique solution for hn small enough. Any method
of some sophistication will perform row (and possibly column) scaling at this stage
to improve numerical conditioning (Brenan et al., 1996, section 5.4.3; Golub and Van Loan, 1996). It is assumed that any implementation will achieve at least as good
condition as is obtained by scaling the first group of equations by hn .
For small hn the equations may be approximated by their linearized counterpart for
which the numerical conditioning is simply given by the condition number of the
coefficient matrix for xn . See for example Golub and Van Loan (1996) for a discussion
of error analysis for linear equations. This coefficient of the linearized equation is˜
$$ \begin{bmatrix} \nabla^{\mathsf T} \bar E(\, x_n, t_n \,) \cdot ( x_n - x_{n-1} ) + \bar E(\, x_n, t_n \,) \\ \nabla_1 \tilde A(\, x_n, t_n \,) \end{bmatrix} + h_n \begin{bmatrix} \nabla_1 \bar A(\, x_n, t_n \,) \\ 0 \end{bmatrix} $$
Using the approximation

$$ x_n - x_{n-1} \approx h_n\, x'(t_n) \approx -h_n \begin{bmatrix} \bar E(\, x_n, t_n \,) \\ \nabla_1 \tilde A(\, x_n, t_n \,) \end{bmatrix}^{-1} \begin{bmatrix} \bar A(\, x_n, t_n \,) \\ \nabla_2 \tilde A(\, x_n, t_n \,) \end{bmatrix} $$

gives the coefficient
$$ \begin{bmatrix} \bar E(\, x_n, t_n \,) \\ \nabla_1 \tilde A(\, x_n, t_n \,) \end{bmatrix} + h_n \begin{bmatrix} \nabla_1 \bar A(\, x_n, t_n \,) - \nabla^{\mathsf T} \bar E(\, x_n, t_n \,) \begin{bmatrix} \bar E(\, x_n, t_n \,) \\ \nabla_1 \tilde A(\, x_n, t_n \,) \end{bmatrix}^{-1} \begin{bmatrix} \bar A(\, x_n, t_n \,) \\ \nabla_2 \tilde A(\, x_n, t_n \,) \end{bmatrix} \\ 0 \end{bmatrix} $$
As hn approaches 0, the matrix tends to (3.13). This limit will be used to monitor numerical integration in examples, but rather than looking at the raw condition number
κ(t) as a function of time t, a static transform, φ, will be applied to this value in order
to facilitate prediction of when the iteration matrix becomes singular. If possible, φ
should be chosen such that φ( κ(t) ) is approximately affine in t near a singularity.
• Note that the iteration matrix of example 2.19 was found for an lti dae, while we are currently considering the more general quasilinear form.
˜ The notation used is not widely accepted. Neither will it be explained here since the meaning should be quite intuitive, and the terms involving inverse transposes will be discarded in just a moment.
[Figure: the transformed condition number −1/κ(t) plotted against t ∈ [0, 3], decreasing from 0 (where κ = ∞) down to about −0.15.]

Figure 3.1: The condition of the iteration matrix for the better choice of square index 1 system in example 3.4. The strictly increasing transform of the condition number makes it roughly linear over time near the singularity. Helper lines are drawn to show the longevity predictions at the times 1.7 (pessimistic), 2.0 (optimistic), and 2.5 (rather accurate).
Since the ∞-norm and 2-norm condition numbers are essentially the same, the static transform will be heuristically developed for the 2-norm condition number. Approximating the singular values to first order as functions of time, it follows that, near a singularity, the condition number can be expected to grow as $t \mapsto \frac{1}{t_1 - t}$, where $t_1$ is the time of the singularity.
The following observation is useful to see how φ can be chosen to match the behavior
of the condition number near a singularity. Suppose φ is unbounded above, that is,
φ(κ) → ∞ as κ → ∞. Then every affine approximation of φ( κ(t) ) will be bad near
the singularity, since the affine approximation cannot tend to infinity. Hence, one
should consider strictly increasing functions that map infinity to a finite value, and
one may normalize the finite value to be 0 without loss of generality. Similarly, the
slope of the affine approximation may be normalized to 1. Given the assumed growth
of the condition number near a singularity, this leads to the following equation for
φ( κ ).
$$ \varphi\!\left( \frac{1}{t_1 - t} \right) \overset{!}{=} t - t_1 = -\frac{1}{\frac{1}{t_1 - t}} \quad\Longleftrightarrow\quad \varphi( \kappa ) = -\frac{1}{\kappa} $$
Since κ is always at least 1, this will squeeze the half-open interval [ 1, ∞ ) onto [ −1, 0 ). As is seen in figure 3.1, the first order approximation is useful well in advance of the singularity. However, further away it is not; for example, the prediction based on the linearization at time 2 would be rather optimistic.
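The prediction in figure 3.1 then amounts to extrapolating the transformed curve to its zero crossing (a sketch of ours):

```python
import numpy as np

def predict_singularity(t_prev, kappa_prev, t_now, kappa_now):
    """Predict when the iteration matrix becomes singular by affine
    extrapolation of phi(kappa(t)) = -1/kappa(t) to zero."""
    phi_prev, phi_now = -1.0/kappa_prev, -1.0/kappa_now
    slope = (phi_now - phi_prev) / (t_now - t_prev)
    return t_now - phi_now/slope if slope > 0 else np.inf

# kappa would be monitored as the condition number of the limit
# matrix (3.13), e.g. np.linalg.cond(np.vstack([E_bar, grad1_A_tilde]))
```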
3.2.6 Sufficient conditions for correctness
It may not be obvious that the seminumerical row reduction algorithm above really
does the desired job. After all, it may seem a bit simplistic to reduce a symbolic matrix based on its numeric values evaluated at a certain point. In this section, sufficient
(while perhaps conservative) conditions for correctness will be presented. Some new
nomenclature will be introduced, but only for the purpose of making the theorem
below readable.
Consider the quasilinear dae

$$ E(\, x(t), t \,)\, x'(t) + A(\, x(t), t \,) \overset{!}{=} 0 $$

Replacing a row

$$ e_i(\, x(t), t \,)\, x'(t) + a_i(\, x(t), t \,) \overset{!}{=} 0 $$

by (dropping some “(t)” in the notation for compactness)

$$ \bigl[\, \omega(\, x, t \,)\, e_i(\, x, t \,) + \eta(\, x, t \,)\, e_j(\, x, t \,) \,\bigr]\, x' + \bigl[\, \omega(\, x, t \,)\, a_i(\, x, t \,) + \eta(\, x, t \,)\, a_j(\, x, t \,) \,\bigr] \overset{!}{=} 0 $$

where ω and η are both continuous at ( x0, t0 ) and ω is non-zero at this point, is called a non-singular row operation on the dae.
Since the new dae is obtained by multiplication from the left by a non-singular matrix, the non-singular row operation on the quasilinear dae does not alter the rank
of the leading matrix.
Let x be a solution to the dae on the interval I, and assume that the rank of E( x(t), t ) is constant as a function of t on I. A valid row reduction at ( x0, t0 ) of the original (quasilinear) dae

$$ E(\, x(t), t \,)\, x'(t) + A(\, x(t), t \,) \overset{!}{=} 0 $$

is a sequence of non-singular row operations such that the resulting (quasilinear) dae

$$ E_{\mathrm{rr}}(\, x(t), t \,)\, x'(t) + A_{\mathrm{rr}}(\, x(t), t \,) \overset{!}{=} 0 $$

has the following properties:
• A solution x is locally a solution to the original dae if and only if it is a solution to the resulting dae.
• Err( x, t ) has only as many non-zero rows as E( x, t ) has rank.
3.5 Theorem. Consider the time interval I with inf I = t0, and the dae with initial condition $x(t_0) \overset{!}{=} x_0$. Assume
1. The dae with initial condition is consistent, and the solution is unique and differentiable on I.
2. The dae is sufficiently differentiable for the purpose of running the row reduction algorithm.
3. Entries of E( x0, t0 ) that are zero, are zero in E( x(t), t ) for all t ∈ I. Further, this condition shall hold for intermediate results as well.
Then there exists a time t1 ∈ I with t1 > t0 such that the row reduction of the symbolic matrix E( x, t ) based on the numeric guide E( x0, t0 ) will compute a valid row reduction where the non-zero rows of the reduced leading matrix Err( x(t), t ) are linearly independent for all t ∈ [ t0, t1 ].
Proof: The first two assumptions ensure that each entry of E( x(t), t ) is continuous
as a function of t at every iteration. Since the row reduction will produce no more
intermediate matrices than there are entries in the matrix, the total number of entries
in question is finite, and each of these will be non-zero for a positive amount of time.
Further, the non-zero rows of Err ( x0 , t0 ) are independent by construction (as this is
the reduced form of the guiding numeric matrix). Therefore they will contain a nonsingular sub-block. The determinant of this block will hence be non-zero at time t0 ,
and will be a continuous function of time.
Hence, there exists a time t1 ∈ I with t1 > t0 such that all those expressions that are
non-zero at time t0 remain non-zero for all t ∈ [ t0 , t1 ]. In particular, the determinant
will remain non-zero in this interval, thus ensuring linear independence of the nonzero reduced rows.
The last assumption ensures the constant rank condition required by the definition of valid row reduction; this is a consequence of each step in the row reduction preserving the original rank, while the rank revealed by the reduced form has already been shown to be constant.
Beginning with the part of the definition of valid row reduction concerning the number of zero rows, note first that the number of non-zero rows will match the rank at time t0, since the row reduction of the numeric guide will precisely reveal its rank as the number of non-zero rows. It then suffices to show that the zero-pattern of the symbolic matrix contains that of the numeric guide during each iteration of the row reduction algorithm. However, this follows quite directly from the assumptions, since the zero-pattern will match at E( x(t0), t0 ), and the assumption about zeros staying zero will ensure that no spurious non-zeros appear in the symbolic matrix evaluated at later points in time.
It remains to show that a function x is a solution of the reduced dae if and only if it is
a solution of the original dae. However, this is trivial since the result of the complete
row reduction process may be written as a multiplication from the left by a sequence
of non-singular matrices. Hence, the equations are equivalent.
Note that, in example 3.3, the conditions of theorem 3.5 were not satisfied since the
expressions ẋ and ẏ were zero at ( x0 , t0 ), but do not stay zero. Since their deviation from zero is continuous, they will stay close to zero during the beginning of the
solution interval. Hence, it might be expected that the computed solution is approximately correct near t0 , and this is confirmed by experiments. However, to show that
this observation is generally valid, and to quantify the degree of approximation, we
need a kind of perturbation theory which remains to be developed. This problem was
part of the original motivation for studying matrix-valued singular perturbations, the
main topic of the thesis.
3.3 Consistent initialization
The importance of being able to find a point on the solution manifold of a dae which is in some sense close to a point suggested or guessed by a user, was explained in section 2.2.7. In this section, this problem is addressed using the proposed seminumerical quasilinear shuffle algorithm. While existing approaches (see section 2.2.7) separate the structural analysis from the determination of initial conditions, we note that the structural analysis may depend on where the dae is to be initialized. The interconnection can readily be seen in the seminumerical approach to index reduction, and a simple bootstrap approach can be attempted to handle it.
3.3.1 Motivating example
Before turning to discussing the relation between guessed initial conditions and algebraic constraints derived by the seminumerical quasilinear shuffle algorithm, we
give an illustration to keep in mind in the sections below.
3.6 Example
Let us return to the inevitable pendulum in example 3.3,
$$
\begin{aligned}
x'' &\overset{!}{=} \lambda x \\
y'' &\overset{!}{=} \lambda y - g \\
1 &\overset{!}{=} x^2 + y^2
\end{aligned}
$$
where g = 10.0, with guessed initial conditions given by
x0,guess : { x(0) = cos(−0.5), ẋ(0) = 0, y(0) = sin(−0.5), ẏ(0) = −0.1, λ(0) = 0 }
Running the seminumerical quasilinear shuffle algorithm• at x0,guess produces the algebraic constraints

$$
C_{x_{0,\mathrm{guess}}} = \left\{
\begin{aligned}
1 &\overset{!}{=} x^2 + y^2 \\
0 &\overset{!}{=} 2\, x\, \dot x + 2\, y\, \dot y \\
0 &\overset{!}{=} 2\, x^2 \lambda + 2\, y\, ( g - y\, \lambda ) + 2\, \dot y^2
\end{aligned}
\right.
$$

• The implementation used here does not make the effort to compute the derivatives needed to make better zero tests and longevity estimates.
The residuals of these equations at x0,guess are

$$ \begin{pmatrix} 0.0 \\ 0.0959 \\ 9.61 \end{pmatrix} $$

so either the algorithm simply failed to produce the correct algebraic constraints although x0,guess was consistent, or x0,guess is simply not consistent. Assuming the latter, we try to find another point by modifying the initial conditions for the three variables ẋ, y, and λ to satisfy the equally many equations in Cx0,guess. This yields
x0,second :
{ x(0) = cos(−0.5), ẋ(0) = −0.055, y(0) = −0.48, ẏ(0) = −0.1, λ(0) = −4.804 }
(This point does satisfy Cx0,guess ; solving the equations could be difficult, but in this
case it was not.) At this point the algorithm produces another set of algebraic constraints:
$$
C_{x_{0,\mathrm{second}}} = \left\{
\begin{aligned}
1 &\overset{!}{=} x^2 + y^2 \\
0 &\overset{!}{=} 2\, x\, \dot x + 2\, y\, \dot y \\
0 &\overset{!}{=} 2\, x^2 \lambda + 2\, y\, ( g - y\, \lambda ) + 2\, \dot x^2 + 2\, \dot y^2
\end{aligned}
\right.
$$
with residuals at x0,second:

$$ \begin{pmatrix} 0.0 \\ 0.0 \\ 0.0060 \end{pmatrix} $$
By modifying the same components of the initial conditions again, we obtain
x0,final :
{ x(0) = cos(−0.5), ẋ(0) = −0.055, y(0) = −0.48, ẏ(0) = −0.1, λ(0) = −4.807 }
This point satisfies Cx0,second , and generates the same algebraic constraints as x0,second .
Further, the algorithm encountered no non-trivial expressions which had to be assumed rewritable to zero, so the index reduction was performed without seminumerical decisions. Hence, the index reduction is locally valid, and the reduced equations
provide a way to construct a solution starting at x0,final . In other words, x0,final is
consistent.
3.3.2 A bootstrap approach
A seminumerical shuffle algorithm maps any guessed initial conditions to a set of
algebraic constraints. Under certain circumstances, including that the initial conditions are truly consistent, the set of algebraic constraints will give a local description
of the solution manifold. Hence, truly consistent initial conditions will be consistent
with the derived algebraic constraints, and our primary objective is to find points
with this property. Of course, if a correct characterization of the solution manifold
is available, finding consistent initial conditions is easy given a reasonable guess —
there are several ways to search for a point which minimizes some norm of the residuals
of the algebraic constraints. If the minimization fails to find a point where all residuals are zero, the guess was simply not good enough, and an implementation may
require a better guess from the user.
If the guessed initial conditions are not consistent with the derived algebraic constraints, the guess cannot be truly consistent either, and we are interested in finding a
nearby point which is truly consistent. In hope that the derived algebraic constraints
could be a correct characterization of the solution manifold, even though they were
derived at an inconsistent point, the proposed action to take is to find a nearby point
which satisfies the derived constraints.
What shall be considered nearby is often very application-specific. Variables may
be of different types, defying a natural metric. Instead, if the solution manifold is
characterized by m independent equations, a user may prefer to keep all but m variables constant, and adjust the remaining to make the residuals zero. This avoids in a
natural way the need to produce an artificial metric.
No matter how nearby is defined, we may assume that the definition implies a mapping from the guessed point to a point which satisfies the derived algebraic constraints (or fails to find such a point, see the remark above). Noting that a guessed
point of initial conditions is mapped to a set of algebraic constraints, which then
maps the guessed point to a new point, we propose that this procedure be iterated
until convergence or cycling is either detected or suspected.
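Schematically, the iteration could look as follows (a sketch of ours, where `derive` stands for a run of the seminumerical shuffle algorithm at a point and `project` for the application-specific mapping onto the derived constraints; both are assumptions here, and points are assumed comparable, e.g. tuples).

```python
def consistent_init(guess, derive, project, max_iter=10):
    """Bootstrap iteration: derive constraints at the current point,
    project the point onto them, and repeat until a point satisfies
    the constraints derived at itself, or cycling is detected."""
    visited = [guess]
    point = guess
    for _ in range(max_iter):
        constraints = derive(point)
        new_point = project(point, constraints)
        if new_point == point:      # consistent with its own constraints
            return point
        if new_point in visited:    # cycling detected
            break
        visited.append(new_point)
        point = new_point
    raise RuntimeError("no consistent point found; a better guess is needed")
```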
3.3.3 Comment
The algebraic constraints produced by the proposed seminumerical quasilinear shuffle algorithm are a function of the original equations and a number of decisions (pivot
selections and termination criteria) that depend on the point at which index reduction is performed. Since the number of index reduction steps before the algorithm
gives up is bounded given the number of equations, and the number of pivot selections and termination criterion evaluations is bounded given the number of index
reduction steps, the total number of decisions that depend on the point of index
reduction is bounded (although the algorithm has to give up for some problems).
Hence, any given original equations can only give rise to finitely many sets of algebraic constraints. The space of initial conditions (almost all of which are inconsistent) can thus be partitioned into finitely many regions according to what algebraic
constraints the algorithm produces.
Defining the index of a dae without assuming that certain ranks are constant in
the neighborhood of solutions can be difficult, and motivates the use of so-called
uniform and maximum indices, see Campbell and Gear (1995). For the bootstrap approach above, constant ranks near solutions imply that the algorithm will produce the correct algebraic constraints if only the guessed initial conditions are close
enough to the solution manifold. To see how the approach suffers if constant ranks
near solutions are not granted, it suffices to note that even finding a point which gen-
erates constraints at level 0 which it satisfies can be hard. In other words, consistent
initialization can then be hard even for problems of index 1.
3.4 Conclusions
The shuffle algorithm for lti dae can be generalized so that it applies to quasilinear
dae. The current chapter highlights that numerical evaluation may be both necessary and convenient, and proposes several aspects of index reduction and numerical
solution where seminumerical methods are useful.
The numerical evaluation introduces uncertainty in the equations, and the related
perturbation problems do not have the structure of any of the well known perturbation problems in the literature. New perturbation problems also arise near points
where there is a change in matrix ranks.
Since the quasilinear form is a nonlinear form, the perturbation theory needed to complete the proposed algorithms will also be nonlinear. Although the matrix-valued perturbation problems studied in the second part of the thesis are all linear,
it has always been a long term goal to derive results for nonlinear systems so that
the quasilinear shuffle algorithm can be theoretically justified. Having emphasized
the connection between nonlinear problems and the linear theory developed in the
second part of the thesis, it must be mentioned that the results for linear systems in
the second part of the thesis have an immediate application in the shuffle algorithm
for linear systems.
Part II
Results
4
Point-mass filtering on manifolds
Life in differential-algebraic equations is full of constraints. What if the index-reduction based on matrix-valued singular perturbations revealed a constraint showing that we might as well consider ourselves living on a sphere?! If not knowing where we are on this sphere is a problem we often have to deal with, we will surely write a lot of algorithms to find out.
Writing algorithms in terms of coordinate tuples was so convenient when
we thought the world was flat, but now what? Trying to use the same
interpretation of the coordinate tuple at any point in the world causes
singularity in our algorithms, and we get lost again. We need new tools to shield us from the traps of the curvature in space.
The current chapter is an extended version of
Henrik Tidefelt and Thomas B. Schön. Robust point-mass filters on manifolds. In Proceedings of the 15th IFAC Symposium on System Identification, pages 540–545, Saint-Malo, France, July 2009.
and presents a framework for algorithm development for objects on manifolds.
The only connection with the rest of the thesis is that index reduction of nonlinear
differential-algebraic equations may reveal a manifold structure in the problem.
There are two main approaches for how to deal with this. The first approach is to
keep the equations in their differential-algebraic form, and try to invent (or search in
the literature) dae versions of all the theory and algorithms we need for the task at
hand (for example, finding out where we are on the sphere). The second approach is
to use the ordinary differential equations that result from index reduction together
with the manifold structure implied by the discovered non-differential constraints.
We then need tools for working with ordinary differential equations on a manifold,
and the present chapter is a contribution to this field.
4.1 Introduction
State estimation on manifolds is commonly performed by embedding the manifold
in a linear space of higher dimension, combining estimation techniques for linear
spaces with some projection scheme (Brun et al., 2007; Törnqvist et al., 2009; Crassidis et al., 2007; Lo and Eshleman, 1979). Obvious drawbacks of such schemes are
that computations are carried out in the wrong space, and that the arbitrary choice of
embedding has an undesirable effect on the projection operation. Another common
approach is to let the filter run in a linear space of local coordinates on the manifold.
Drawbacks include the local nature of coordinates, the nonlinearities introduced by
the curved nature of the manifold, and the dependency on the choice of coordinates.
Despite the drawbacks of these two approaches, it should be admitted that they work
well for many “natural” choices of embeddings and local coordinates, as long as the
uncertainty about the state is concentrated to a small — and hence approximately
flat — part of the manifold. Still, the strong dependency on embeddings and local
coordinates suggests that the estimation algorithms are not defined within the appropriate framework. The Monte-Carlo technique called the particle filter lends itself
naturally to a coordinate-free formulation (as in Kwon et al. (2007)). However, the
stochastic nature of the technique makes it unreliable•, and addressing this problem
motivates the word robust in the title of this work. With a growing geometric awareness among state estimation practitioners, geometrically sound algorithms tailored
for particular applications are emerging. A very common application is that of orientations of rigid bodies (for instance, Lee and Shin (2002)), and this is also a guiding
application in our work.
Our interest in this work is to examine how robust state estimation on compact manifolds of low dimension can be performed while honoring the geometric nature of the
problem. The robustness should be with respect to uncertainties which are not concentrated to a small part of the manifold, and is obtained by using a non-parametric
representation of stochastic variables on the manifold. By honoring the geometric
nature we mean that we intend to minimize references to embeddings and local coordinates in our algorithms. We say minimize since, under a layer of abstraction,
we too will employ embeddings to implement the manifold structure, and local coordinates are the natural way for users to interact with the filter. Still, the proposed
framework for state estimation can be characterized by the abstraction barrier that
separates the details of the embedding from the filter algorithm. For example, in the
context of estimation of orientations, rather than speaking of filters for unit quaternions or rotation matrices, this layer of abstraction enables us to simply speak of
filters for S O(3) — both unit quaternions and rotation matrices may be used to implement the low-level details of the manifold structure, but this is invisible to the
higher-level estimation algorithm.
• The technique is unreliable in the sense that the produced estimate depends on samples from random variables inside the algorithm, and the estimate can at most be expected to be correct on average. If the random samples come out very unfortunate, the produced estimate may come out very far from the correct result.

Pursuing non-parametric filtering in curved space comes at some computational costs compared to the linear space setting. Most notably, equidistant meshes do not
exist, but on the other hand our restriction to compact manifolds means that the
whole manifold can be “covered” by a mesh with finitely many nodes. One of the
practical benefits of the proposed non-parametric filter is the ability to dynamically
adapt the mesh to enhance the degree of detail in regions of interest, for instance,
where the probability density is high.
The proposed point-mass-based solution for filtering in curved space has three main
components:
• Compute — and possibly update — a tessellation of the manifold. Each region
of the tessellation is required to be associated with a point that will represent
the location of the region in calculations, and the volume of each region must
be known.
• Implement measurement and time updates. This requires a system model
which, unlike when filtering in Euclidean space, cannot have additive noise
on the state.
• Provide the user with a point estimate. There is always the option to compute a
cheap extrinsic estimate (typically the extrinsic mean), but honoring geometric
reasoning in this work, we also look into intrinsic estimates.
Each of these components will be considered in the following sections, including special treatment for the case of spheres where the general situation lacks detail. A more
detailed, algorithmic, description of the proposed solution is given in section 4.6.
Terminology. By manifold, we refer to a differentiable, Riemannian manifold. Loosely speaking, a (contravariant) vector is a velocity on the manifold, belonging to the tangent space (which is a vector space) at some point on the manifold, and is basically valid only at that point. A curve on the manifold which locally connects points along the shortest path between the points is called a geodesic, and the exponential map maps vectors to points on the manifold in such a way that, for a vector v at p, the curve $t \mapsto \exp_p( t\, v )$ has velocity v at t = 0 and is a geodesic. When needed, we shall assume that the manifold is geodesically complete, meaning that the exponential map shall be defined for all vectors. We recommend Frankel (2004) for an introduction to these concepts from differential geometry. A tessellation of the manifold is a set $\{ R_i \}_i$ of subsets of the manifold, such that “there is no overlap and no gap” between regions; the union of all regions shall be the whole manifold, and the intersection of any two regions shall have measure zero. We shall additionally require that each region Ri be simply connected.
Notation. The manifold on which the estimated state evolves is denoted M. We make no distinction in notation between ordinary and stochastic variables; x may refer both to a stochastic variable over the manifold and a particular point on the manifold. The probability of a statement, such as x ∈ R, is written P( x ∈ R ). The probability density function for a stochastic variable x is written fx. When conditioning on a variable taking on a particular value, we usually drop the stochastic variable from the notation; for instance, fx|y is a shorthand for fx|Y=y, where the distinction between the
stochastic variable, Y , and the value it takes, y, had to be made clear. The distance
in the induced Riemannian metric, between the points x and y, is written d( x, y ).
The symbol δ is used to denote the Dirac delta “function”. A Gaussian distribution
over a vector space, with mean m and covariance C, is denoted N ( m, C ), and if the
variable x is distributed according to this distribution, we write x ∼ N ( m, C ). (The
covariance is a symmetric, positive semidefinite, linear mapping of pairs of vectors
to scalars, and it should be emphasized that a covariance is basically only compatible
with vectors at a certain point on the manifold.) In relation to plans for future work,
we should also mention that group structure on the manifold is not used in this work,
although such manifolds, Lie groups, are often a suitable setting for estimation of
dynamic systems.
4.2 Background and related work
For models with continuous-time dynamics, the evolution of the probability distribution of the state is given by the Fokker-Planck equation, and a great amount of
research has been aimed at solving this partial differential equation under varying
assumptions and approximation schemes. Daum (2005) gives a good overview that
should be accessible to a broad audience. In the present discrete-time setting, the
corresponding relation is the Chapman-Kolmogorov equation. It tells how the distribution of the state at the next time step (given all available measurements up till the
present) depends on the distribution of the state at the current time step (given all
available measurements up till the present) and the process noise in the model. Let
y0..t be the measurements up to time t, and xs|t be the state estimate at time s given
y0..t . Conditioned on the measurements y0..t , and using that xt+1 is conditionally
independent of y0..t given xt , the Chapman-Kolmogorov equation states the familiar
$$ f_{x_{t+1|t}}( x_{t+1} ) = \int f_{x_{t+1} | x_t}( x_{t+1} )\, f_{x_{t|t}}( x_t )\; dx_t \tag{4.1} $$
In combination with Bayes’ rule for taking the information in new measurements
into account,
$$ f_{x_{t|t}}( x_t ) = \frac{ f_{x_{t|t-1}}( x_t )\, f_{y_t | x_t}( y_t ) }{ f_{y_{t|t-1}}( y_t ) } $$
this describes exactly the equations that the discrete-time filtering problem is all
about.
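On a finite grid, which is where the point-mass filter of this chapter lives, the two relations discretize into a pair of one-line updates. This is a sketch of ours; `trans` and `lik` stand for the model densities f(x_{t+1} | x_t) and f(y_t | x_t) evaluated at the grid nodes, and `vol` for the region volumes required in section 4.1.

```python
import numpy as np

def time_update(w, vol, trans):
    """Discretized Chapman-Kolmogorov (4.1) on grid nodes x_i with
    region volumes vol_i: f_{t+1|t}(x_j) = sum_i trans[i, j] w_i vol_i."""
    return trans.T @ (w * vol)       # trans[i, j] = f(x_j | x_i)

def measurement_update(w, vol, lik):
    """Bayes' rule followed by renormalization over the grid."""
    w = w * lik                      # lik[i] = f(y_t | x_i)
    return w / np.sum(w * vol)
```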
To mention just a few references for the particular application of filtering on S O(3), a
filter for random walks on the tangent bundle (with the only system noise being additive noise in the Lie algebra corresponding to velocities) was developed in Chiuso and
Soatto (2000), a quaternion representation was used with projection and a Kalman
filter adapted to the curved space in Choukroun et al. (2006), and Lee et al. (2008)
proposes a method to propagate uncertainty under continuous-time dynamics in a
noise-free setting. The particle filter approach in Kwon et al. (2007) has already been
mentioned.
A solid account of the most commonly used methods for filtering on S O(3) is provided by Crassidis et al. (2007). In Lo and Eshleman (1979) the authors present an interesting representation of probability density functions on S O(3), making use of exponential Fourier densities.
4.3 Dynamic systems on manifolds
The filter is designed to track the discrete-time stochastic process x, evolving on some
manifold of low dimension. That the dimension is low is instrumental to enabling
the use of filter techniques that, in higher dimensions, break down performance-wise
due to the curse of dimensionality (Bergman, 1999, section 5.1). We use discrete-time
models• in the form
$$ qx \sim W_{g( x, u )} $$
$$ y \sim V_x $$
where Wg( x, u ) is the random distribution of process noise taking values on the manifold, u is a known external input, and the measurement y is distributed according to
the random distribution Vx . Not being aware of a standard name for a distribution
over the manifold, parameterized by a point on the same manifold, we shall use distribution field for W• (here, the bullet indicates that there is a free parameter — for
a fixed value of this parameter, we have an ordinary random distribution).
For example, the measurement equation could be given by
$$ V_x = \mathcal{N}\big( h( x ), C_y( x ) \big) $$
That is, we have Gaussian white noise added to the nominal measurements $h( x )$, and we allow the noise covariance to depend on the state.
A less general example of the dynamic equation could be to combine Gaussian distributions with the exponential map
$$ qx \sim \exp\big( \mathcal{N}( 0, C_{g( x, u )} ) \big) $$
Here, $\mathcal{N}( 0, C_{g( x, u )} )$ is our way of denoting a zero mean Gaussian distribution of
vectors at g( x, u ). However, (without the structure of a Lie group) the simplicity of
this expression is misleading, since the Gaussian distributions at different points on
the manifold are defined in different tangent spaces. Hence, a common matrix will
not be sufficient to describe the covariance in all points.
To really obtain simple equations for the dynamic equation, we may employ distributions that only depend on the distance
$$ f_{qX}( qx ) = f_d\big( d( qx, g( x, u ) ) \big) $$
• We avoid the term state space model here since this notion is so strongly associated with models in terms
of a state vector which is just a coordinate tuple; our models shall be stated in a coordinate-free manner.
4.4 Point-mass filter
The main idea of the point-mass filter is to model the probability distribution of the
state x being estimated as a sum of weighted Dirac delta functions. The Dirac deltas
are located at fixed positions in a uniform grid, and the idea dates back to the seminal
work by Bucy and Senne (1971). When the filter is run, a sequence of such random
variables will be produced and there is a need to distinguish between the variables
before and after measurement and time updates; recall the notation introduced in section 4.2.
Readers familiar with the particle filter will notice many similarities to the proposed
filter, but should also pay attention to the differences. To mention a few, the proposed filter is deterministic (and in this sense robust), does not require resampling,
associates each probability term (compare particle) with a region in the domain of
the estimated variable, and calculates with the volumes of these regions. One notable drawback compared to the particle filter is that when the estimated probability
is concentrated to a small part of the domain, the particle filter will automatically
adapt to provide estimates with smaller uncertainty, while the proposed filter would
require a non-trivial extension to do so.
In this section, we first discuss the representation of stochastic variables, and then
turn to deriving equations for the time and measurement updates, expressed using
the proposed representation.
4.4.1 Point-mass distributions on a manifold
In this section, we consider how any random variable on the manifold may be represented, and omit time subscripts to keep the notation clear. That the idea is termed
point-mass is due to the sometimes used assumption that the probability is distributed discretely at certain points. Written using the Dirac delta, the probability
density function for x is then given by
$$ f_X( x ) = \sum_i p^i\, \delta( x - x^i ) $$
where the sum is over some finite number of points with probability $p^i$ located at $x^i$. While this makes several operations on the distribution feasible, which would
be extremely computationally demanding using other models, this is clearly very
different from what we would expect the density function to look like.
To be able to make other interpretations of the pairs $( p^i, x^i )$, each such pair needs to be associated with a region $R^i$ of the probability space, and we require that the set of regions, $\{ R^i \}$, be a tessellation. Let $\mu^i = \mu( R^i )$, where $\mu( \bullet )$ measures volume.
That our definition of tessellation did not require that the overlaps between regions
be empty forces us to use only the interior of the regions for many purposes, that
is, $R^i \setminus \partial R^i$ instead of $R^i$. For the sake of brevity, however, we shall abuse notation and often write simply $R^i$ when it is actually the interior that is referred to — the reader should be able to see where this applies.
Given a tessellation $\{ R^i \}$ (of cardinality $N$), a more relaxed interpretation of the probabilities $p^i$ is obviously
$$ P( X \in R^i ) = p^i \qquad (4.2) $$
and a more realistic model of the distribution is that it is piecewise constant;
$$ f_X( x ) = \sum_{i :\, x \in R^i} \frac{p^i}{\mu^i} $$
Note that the sum may expand to more than one term, but only on a set of measure
zero.
Given the tessellation, including the $\mu^i$, it is clear that the numbers $p^i$ may be replaced by $f^i \triangleq \frac{p^i}{\mu^i}$. Since this is a more natural representation of piecewise constant functions in general, we choose to use this also for the probability density function estimate. For completeness, we state the above equations again, now using $f^i$ instead of $p^i$:
$$ P( X \in R^i ) = f^i \mu^i \qquad (4.3) $$
$$ f_X( x ) = \begin{cases} \sum_i f^i \mu^i\, \delta( x - x^i ), & \text{(Point-mass)} \\ \sum_{i :\, x \in R^i} f^i, & \text{(Piecewise constant)} \end{cases} \qquad (4.4) $$
The point-mass filter is a meshless method in that it does not make use of a connection graph describing neighbor relations between the nodes $x^i$. (A connection graph
is implicit in the tessellation, but it is not used.) While meshless methods in many finite element method applications would use interpolation (of, for instance, Sibson or
Laplace type, see Sukumar (2003) for an overview of these) instead of the piecewise
constant (4.4), our choice makes it easy to ensure that the density is non-negative
and integrates to 1. Furthermore, both computation of the interpolation itself, and
use of the interpolated density, would drastically increase the computational cost of
the algorithm.
It turns out that computing good tessellations is a major task of the implementation
of point-mass filters on manifolds, just like mesh generation is a major task when
using finite element methods. It may also be a time-consuming task, but a basic
implementation may do this once for all, offline. Since the number of regions greatly
influences the runtime cost of the filter, a tessellation computed offline will have
to be rather coarse. For models where large uncertainty is inherent in the filtering
problem, this may be sufficient, but if noise levels are low and accurate estimation is
theoretically achievable, the tessellation should be adapted to have smaller regions
in areas where the probability density is high.•
If each region $R^i$ is given as the set of points being closer to $x^i$ than to all other $x^j$, $j \neq i$,
the tessellation is called a Voronoi diagram of the manifold (in case of the 2-sphere,
see for instance Augenbaum and Peskin (1985); Na et al. (2002)). Since this will make
• This statement is based on intuition; it is a topic for future research to provide a theoretical foundation for
how to best adapt the tessellation.
the point-mass interpretation more reasonable, it seems to be a desirable property of
the tessellation, although a formal investigation of this strategy remains a relevant
topic for future research.
To make transitions between tessellations easy, we require that adaptation is performed by either splitting a region into smaller regions, or by recombining the parts
of a split region. Following this scheme, two kinds of tessellation operations are
needed; first one to compute a base tessellation of the whole manifold, and then
one to split a region into smaller parts. When the base tessellation is computed, it will be necessary to consider the curved shape of the manifold on a global scale. The base tessellation should be fine enough to make flat approximations of each region feasible. Such approximations should be useful to the algorithm that then splits regions
into smaller parts. How to compute good base tessellations will generally require
some understanding of the particular manifold at hand, and will therefore require
specialized algorithms, while the splitting of approximately flat regions should be
possible in a general setting.
Finally, a scheme for when to split and when to recombine will be required. This
scheme shall ensure that the regions are small where appropriate, while keeping the
total number of regions below some given bound.
4.4.2 Measurement update
Just as for particle filters, the measurement update is a straightforward application
of Bayes’ rule. To incorporate a new measurement of the random variable Y ∼ Vx
modeling the output, we have•
$$ P( X \in R^i \mid y ) = \frac{f_{Y|X\in R^i}( y )\, P( X \in R^i )}{f_Y( y )} \approx \frac{f_{Y|X=x^i}( y )\, P( X \in R^i )}{f_Y( y )} $$
where the measurement prior $f_Y( y )$ need not be known since it is a common factor to all probabilities on the mesh, and will just act as a normalizing constant. Converting to our favorite representation $f^i$, adding time indices, conditioning on $Y_{0..t-1}$, and
• To see this, let $B_y(r)$ denote a ball of radius $r$ centered at $y$. The relation follows directly from
$$ P( X \in R^i \mid y )\, f_Y( y ) = \lim_{r\to 0} \frac{P( X \in R^i \mid Y \in B_y(r) )\, P( Y \in B_y(r) )}{\mu( B_y(r) )} = \lim_{r\to 0} \frac{P( X \in R^i \wedge Y \in B_y(r) )}{\mu( B_y(r) )} = \lim_{r\to 0} \frac{P( Y \in B_y(r) \mid X \in R^i )\, P( X \in R^i )}{\mu( B_y(r) )} = f_{Y|X\in R^i}( y )\, P( X \in R^i ) $$
using conditional independence of Y and Y0..t−1 given X, this reads
$$ f^i_{t|t} = \frac{P( X \in R^i \mid y_{0..t} )}{\mu^i} \approx \frac{f_{Y|X=x^i}( y )\, f^i_{t|t-1}}{f_{Y_{t|t-1}}( y )} $$
By defining
$$ \operatorname{BayesRule}( f, g ) \triangleq \frac{f\, g}{\int f\, g} $$
and noting that the result will always be a proper probability distribution (and hence integrate to 1, just as the result of the BayesRule operator) we can write:
$$ f_{X_{t|t}} = \operatorname{BayesRule}\big( f_{X_{t|t-1}},\ f_{Y|X=\bullet}( y ) \big) $$
Note how the volumes of regions enter the computation of the BayesRule operator:
$$ \operatorname{BayesRule}( f, g )( x^i ) \approx \frac{f( x^i )\, g( x^i )}{\sum_j f( x^j )\, g( x^j )\, \mu^j} \qquad (4.5) $$
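To make the computation concrete, the following is a minimal C++ sketch of (4.5) (our own illustration, not the thesis implementation); lik[i] is assumed to hold the measurement likelihood $f_{Y|X=x^i}( y )$ and mu[i] the volume $\mu^i$:

// Minimal sketch of the measurement update (4.5); not the thesis code.
// f holds the piecewise constant density values f^i and is updated in place.
#include <cstddef>
#include <vector>

void bayesRule(std::vector<double>& f,
               const std::vector<double>& lik,  // lik[i] = f_{Y|X=x^i}(y)
               const std::vector<double>& mu)   // mu[i]  = volume of R^i
{
    double norm = 0.0;
    for (std::size_t j = 0; j < f.size(); ++j)
        norm += f[j] * lik[j] * mu[j];          // denominator of (4.5)
    for (std::size_t i = 0; i < f.size(); ++i)
        f[i] = f[i] * lik[i] / norm;            // f^i_{t|t}
}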
4.4.3 Time update in general
The time update can be described by the relation
$$ P( qX \in R^i ) = \int_M \int_{R^i} f_{W_{g( x, u )}}( \bar{x} )\, f_X( x )\, d\bar{x}\, dx $$
In the filtering application, the stochastic entities in this relation will be conditioned
on y0..t , but since the conditioning is the same on both sides, it may be dropped for
the sake of a more compact notation in this section. By the mean value theorem, we
find
$$ P( qX \in R^i ) = \mu^i \int_M f_{W_{g( x, u )}}( \bar{x} )\, f_X( x )\, dx $$
for some $\bar{x} \in R^i$, and dividing both sides by $\mu^i$ and fitting the region in a shrinking ball centered at $x^i$, we obtain
$$ \frac{P( qX \in R^i )}{\mu^i} \to f_{qX}( x^i ) $$
and
$$ \int_M f_{W_{g( x, u )}}( \bar{x} )\, f_X( x )\, dx \to \int_M f_{W_{g( x, u )}}( x^i )\, f_X( x )\, dx $$
Hence, we obtain the Chapman-Kolmogorov equation (4.1) in the limit,
$$ f_{qX}( x^i ) = \int_M f_X( x )\, f_{W_{g( x, u )}}( x^i )\, dx $$
and we take this as the definition of the convolution:
$$ f_{qX} = f_X * f_{W_{g( \bullet, u )}} $$
The convolution of a distribution field and a probability density function is a new
probability density function. We shall think of the time update as implementing this
relation.
By approximating the probability density functions as constant over small regions
(assuming all the regions $R^i$ are small), we get the time update approximation
$$ \begin{aligned} P( qX \in R^i ) &= \int_M \int_{R^i} f_{W_{g( x, u )}}( \bar{x} )\, f_X( x )\, d\bar{x}\, dx \\ &\approx \mu^i \int_M f_{W_{g( x, u )}}( x^i )\, f_X( x )\, dx \\ &= \mu^i \sum_j \int_{R^j} f_{W_{g( x, u )}}( x^i )\, f_X( x )\, dx \\ &\approx \mu^i \sum_j f_{W_{g( x^j, u )}}( x^i ) \int_{R^j} f_X( x )\, dx \\ &= \mu^i \sum_j f_{W_{g( x^j, u )}}( x^i )\, P( X \in R^j ) \end{aligned} $$
This is readily converted to an implementation of the convolution (here, the conditioning is written out for future reference):
$$ f^i_{t+1|t} = \frac{P( qX \in R^i \mid y_{0..t} )}{\mu^i} \approx \sum_j f_{W_{g( x^j, u )}}( x^i )\, P( X \in R^j \mid y_{0..t} ) = \sum_j f_{W_{g( x^j, u )}}( x^i )\, f^j_{t|t}\, \mu^j \qquad (4.6) $$
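In code, (4.6) is a dense double sum over the regions. The following minimal C++ sketch (ours, not the thesis code) assumes a callback noiseDensity(j, i) evaluating $f_{W_{g( x^j, u )}}( x^i )$:

// Minimal sketch of the time update (4.6); not the thesis code.
#include <cstddef>
#include <functional>
#include <vector>

std::vector<double> timeUpdate(
    const std::vector<double>& f,     // f^j_{t|t}
    const std::vector<double>& mu,    // region volumes mu^j
    const std::function<double(std::size_t, std::size_t)>& noiseDensity)
{
    const std::size_t N = f.size();
    std::vector<double> fNext(N, 0.0); // f^i_{t+1|t}
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            fNext[i] += noiseDensity(j, i) * f[j] * mu[j];
    return fNext;
}

Note the $N^2$ density lookups; section 4.4.4 below discusses keeping the cost of each lookup low.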
4.4.4 Dynamics that simplify time update
Since the number of regions may be large, and computing the time update convolution involves $N^2$ lookups of the probability density $f_{W_{g( x^j, u )}}( x^i )$, we should consider means to keep the cost of each such lookup low.
First, if the system is autonomous (that is, $g( x^j, u )$ does not depend on $u$), all transitions may be computed offline and stored in a stochastic matrix•. The $\mu^j$ could also be included in this matrix, reducing the convolution computation to a matrix multiplication.
• This matrix is often called the transition matrix, but this notion has a different meaning in the thesis.
As was noted above, one class of distributions for the noise on the state, which makes
the expression simple, is that where the density depends only on the distance from
the nominal point;
$$ f^i_{t+1|t} \approx \sum_j f_d\big( d( g( x^j, u ), x^i ) \big)\, f^j_{t|t}\, \mu^j $$
This will be the structure in our example.
4.5 Point estimates
The distinction between intrinsic and extrinsic was introduced in Srivastava and
Klassen (2002), where a mean value of a distribution on a manifold was estimated by first estimating the mean of the distribution on the manifold embedded in Euclidean space, and then projecting the mean back to the manifold. This, they termed
the extrinsic estimator. In contrast, an intrinsic estimator was defined without reference to an embedding in Euclidean space. While this may seem a hard contrast at
first, Brun et al. (2007) shows that both kinds of estimates may be meaningful from
a maximum likelihood point of view, for some manifolds with “natural embedding”.
4.5.1 Intrinsic point estimates
A common intrinsic generalization of the usual mean in Euclidean space is defined as a point where the variance attains a global minimum, where the variance "only" requires a distance to be defined:
$$ \operatorname{Var}_X( x ) \triangleq \int d( \bar{x}, x )^2\, f_X( \bar{x} )\, d\bar{x} \qquad (4.7) $$
Unfortunately, such a mean may not be unique, but if the support of the distribution
is compact, there will be at least one.
Other intrinsic point estimates may also be defined, but these alternatives will not
be discussed further. The reason is that the motivation for the current discussion is
just to illustrate that it is possible to define algorithms aimed at computing intrinsic
point estimates based on the proposed probability density representation.
Since distributions with a globally unique minimum may be arbitrarily close to distributions with several distinct global minima, it is our understanding that schemes based on local search, devised to find one good local minimizer, are reasonable approximations of the definition. Hence, there are two tasks to consider: implementation of the local search, and a scheme that uses the local search in order to find a good
local minimizer.
Given an implementation of the local search, we propose that it be run just once,
initiated at the region representative $x^i$ with the least variance. Since the region representatives are assumed to be reasonably spread over the whole manifold, there is good hope that at least one of them is in the region of attraction of the global minimum. However, even if this is the case, it may not include the $x^i$ with least variance, which directly leads to more robust schemes where the local search is initiated at several (possibly all) $x^i$. A completely different approach to initialization of the local search is to use an extrinsic estimate of the mean, if available. Since the extrinsic
mean may be extremely cheap to compute compared to even evaluating the variance
at one point, and may at the same time be a good approximator of the intrinsic mean,
it is very beneficial to use, while the major drawback is that it requires us to go outside the geometric framework.
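As an illustration of the first scheme, here is a minimal C++ sketch (ours; dist(j, i) is an assumed callback returning $d( x^j, x^i )$) that evaluates the discretized variance (4.7) at every representative and returns the index of the minimizer:

// Minimal sketch; not the thesis code. Returns the index of the region
// representative x^i with the least discretized variance (4.7).
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

std::size_t leastVarianceIndex(
    const std::vector<double>& f,
    const std::vector<double>& mu,
    const std::function<double(std::size_t, std::size_t)>& dist)
{
    std::size_t best = 0;
    double bestVar = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < f.size(); ++i) {
        double var = 0.0;
        for (std::size_t j = 0; j < f.size(); ++j) {
            const double d = dist(j, i);
            var += d * d * f[j] * mu[j];   // discretization of (4.7)
        }
        if (var < bestVar) { bestVar = var; best = i; }
    }
    return best;
}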
To implement a local search, one must be able to compute search directions and to
perform line searches. For this, we rely on the exponential map, which allows these
tasks to be carried out in the tangent space of the current search iterate. The search
direction used is steepest descent computed using finite difference approximation,
although more sophisticated methods exist in the literature (Pennec, 2006).
4.5.2 Extrinsic point estimates
The extrinsic mean estimator proposed in Srivastava and Klassen (2002) is defined
by replacing the distance d( x̄, x ) in (4.7) by the distance obtained by embedding
the manifold in Euclidean space and measuring in this space instead. It is argued
that if the support of the distribution is small, this should give results similar to the
intrinsic estimate. However, considering how arbitrary the choice of embedding is,
it is clear that the procedure as a whole is rather arbitrary as well. (Nevertheless, a
good embedding seems likely to produce useful results, see for instance the examples
in Srivastava and Klassen (2002).)
Recall that the algorithm for computing the extrinsic mean is very efficient: first compute the mean in the embedding space, and then project back to the manifold. The projection step is defined to yield the point on the manifold which is closest to the mean in the embedding space, and clearly assumes that this point will be unique.
To give an example of how sensitive the extrinsic mean is to the selection of embedding, and why we find it worthwhile to spend effort on intrinsic estimates, consider embedding $S^2$ in $\mathbb{R}^3$. However, instead of identifying $S^2$ with the sphere in $\mathbb{R}^3$, we magnify the sphere in some direction.
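For the standard embedding of $S^2$ in $\mathbb{R}^3$, the extrinsic mean of the point-mass representation reduces to a weighted average followed by a projection onto the sphere; a minimal sketch (ours, not the thesis code), where pts[i] holds the embedding coordinates of $x^i$:

// Minimal sketch of the extrinsic mean on S^2; not the thesis code.
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

std::array<double, 3> extrinsicMean(
    const std::vector<std::array<double, 3>>& pts,
    const std::vector<double>& f,
    const std::vector<double>& mu)
{
    std::array<double, 3> m{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t k = 0; k < 3; ++k)
            m[k] += f[i] * mu[i] * pts[i][k];   // mean in the embedding space
    const double norm = std::sqrt(m[0]*m[0] + m[1]*m[1] + m[2]*m[2]);
    for (double& c : m) c /= norm;              // project back onto the sphere
    return m;
}

Feeding the same code coordinates from a stretched embedding would generally move the returned point, which is exactly the sensitivity discussed above.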
4.6 Algorithm and implementation
The final component to discuss before putting the theory of the previous sections together in an algorithm is how tessellations are computed. In this section, we do
this, present the algorithm in a compact form in algorithm 4.1 on page 123, and
include some notes on the software design and implementation.
4.6.1 Base tessellations (of spheres)
To be more specific about how a base tessellation may be computed, we have considered how this can be done for spheres, but the technique we employ does not only
work for spheres.
The first step is to generate the set of points $x^i$. Here, the user is given the ability to
affect the number of points generated, but precise control is sacrificed for the sake
of more evenly spread points. The basic idea is to use knowledge of a sphere’s total
volume to compute a desired volume of each region. Then we use spherical coordinates in nested loops, with the number of steps in each loop being a function of the
current coordinates of the loops outside. The details for the 2-sphere and 3-sphere
are provided in section 4.A.
The remaining steps are general and do not only apply to spheres. First, equations for the half-space containing the manifold and being bordered by the tangent space at each point $x^i$ are computed. This comes down to finding a basis for the space orthogonal to the tangent space at $x^i$ — for spheres, this is trivial. The intersection of these half-spaces is a polytope with a one-to-one correspondence between facets and generating points. (We rely on existing software here; please refer to section 4.6.3 at this
point.) Projecting the facets towards the origin will generate a tessellation, and for
spheres this will be a Voronoi tessellation if the “natural” embedding is used. Each
region is given by the set of projected vertices of the corresponding polytope facet.
As part of the tessellation task, the volume of each region must also be computed. For
the 2-sphere this can be done exactly thanks to the simple formula giving the area of
the region bounded inside the geodesics between three points on the sphere (Beger,
1978, p 198). In the general case we approximate the volumes on the manifold by
the volume of the polytope facets. (Note that a facet can be reconstructed from the
projected vertices by projecting back to the (embedded) tangent space at the generating point.) For spheres the ideal total volume is known, and any mismatch between
the sum of the volumes of the regions and the ideal total volume is compensated by
scaling all volumes by a normalizing constant.
4.6.2 Software design
Our implementation is written in C++ for fast execution. Still, there is a strong emphasis on careful representation of the concepts of geometry in the source code. Perhaps most notably, a manifold is implemented as a C++ type, and allows elements
to be handled in a coordinate-free manner. By providing a framework for writing
coordinate-free algorithms, we try to guide algorithm development in a direction that
makes sense from a geometric point of view. Quite obviously, there is an overhead
associated with the use of our framework, but it is our understanding that if the developed algorithms are to be put in production units, they shall be rewritten directly
in terms of the underlying embedding — our framework is aimed at research and
development, and it is an attempt to increase awareness of geometry in the filtering
community.
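To indicate the flavor of such a design, here is a minimal, hypothetical C++ sketch; the names and signatures are ours and need not match the actual framework:

// Hypothetical sketch of a coordinate-free manifold interface; the actual
// framework in the thesis may differ substantially.
#include <vector>

struct Point   { std::vector<double> data; };   // element of the manifold
struct Tangent { std::vector<double> data; };   // tangent at some point

class Manifold {
public:
    virtual ~Manifold() = default;
    // Distance in the induced Riemannian metric.
    virtual double distance(const Point& a, const Point& b) const = 0;
    // Exponential map: follow the geodesic from p in direction v.
    virtual Point exponential(const Point& p, const Tangent& v) const = 0;
};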
Other concepts of geometric relevance that are represented in the software design
are:
• Scalar functions, that is, mappings from a manifold to the set of real numbers.
• Coordinate maps, that is, invertible mappings from a part of the manifold to
tuples of real numbers.
• Tangent spaces, that is, the linear spaces of directional derivatives at a certain
point of the manifold. As with the manifold elements, elements of the tangent
spaces are handled in a coordinate-free manner. The basic means for construction of tangents is to form the partial derivative with respect to a coordinate
function.
• Euclidean spaces are implemented as special cases of manifolds.
4.6.3 Supporting software
A very important part of the tessellation procedure for spheres and other manifolds with a convex interior seen in the embedding space is the conversions between polytope representations. That is, given a set of bounding hyperplanes, we want a vertex representation of all the faces, and given a set of vertices, we want the corresponding set of hyperplanes. In our work, these tasks were carried out using cddlib (Fukuda, 2008), distributed under the GNU general public licence.
Although several algorithms for computing the volume of polytopes of arbitrary dimension exist (Büeler et al., 2000), no freely available implementation compatible
with C++ was found. We would like to encourage the development, the sharing, and
the advertisement of such software. The authors’ implementation for this task is a
very simple triangulation-based recursion scheme.
4.7 Example
To illustrate the proposed filtering technique, a manifold of dimension 2 was chosen
so that the probability distributions are amenable to illustration. We consider the
bearing-tracking problem in 3 dimensions, that is, the state evolves on the 2-sphere.
This may be a robust alternative to tracking the position of an object when range information cannot be determined reliably. It is also a good example to mention
when discussing models without dynamics (velocities are not part of the state), since
the lack of (Lie) group structure makes the extension to dynamic models non-trivial.
As an example of a bearing-sensor in 3 dimensions, we may consider a camera and an
object recognition algorithm, which returns image coordinates in each image frame,
which are then converted to the three components of a unit vector in the corresponding direction. The example is about the higher-level considerations of the filtering
problem, and not the low-level details of implementing the manifold at hand.
The deterministic part of the dynamic equation, g, does not depend on any external input, and just maps any state to itself. The noise in the equation is given by a von Mises-Fisher distribution field (see the overview Schaeben (1992)) with concentration parameter κ = 12 everywhere.
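On $S^2$ with the standard embedding, the von Mises-Fisher density with mean direction $m$ and concentration $\kappa$ is $\frac{\kappa}{4\pi \sinh \kappa}\, e^{\kappa\, m^T x}$; a minimal sketch (ours, not the thesis code) of the lookup used by the time update:

// Minimal sketch; not the thesis code. Von Mises-Fisher density on S^2,
// with mean direction m = g(x, u) and concentration kappa = 12.
#include <array>
#include <cmath>

double vonMisesFisher(const std::array<double, 3>& m,
                      const std::array<double, 3>& x,
                      double kappa = 12.0)
{
    const double pi = std::acos(-1.0);
    const double dot = m[0]*x[0] + m[1]*x[1] + m[2]*x[2];
    return kappa / (4.0 * pi * std::sinh(kappa)) * std::exp(kappa * dot);
}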
The three scalar relations in the measurement equation are clearly dependent, as the
manifold has only two dimensions. Also, the fact that the noise in the estimate from
the object recognition has only two dimensions, implies that the noise on the three
components in the measurement equation will be correlated. Besides the dependencies and correlations, noise levels should be state-dependent, as the uncertainty for a
Algorithm 4.1 Summary of point-mass filter on a manifold.
Input:
• A model of a system with state belonging to a manifold.
• An a priori probability distribution for the state at the initial time.
• A sequence of measurement data.
Output: A sequence of probability density estimates for the filtered state, possibly
along with or replaced by point estimates.
Notation: The numbers $f^i_{t|t-1}$ are the (approximate) values of the probability density function at the point $x^i$, at time $t$ given the measurements from time 0 to time $t-1$. The numbers $f^i_{t|t}$ are the (approximate) values of the probability density function at time $t$, given also the measurements available at time $t$.
Initialization:
Compute a tessellation with regions $R^i$ of the manifold. Assign a representative point $x^i$ to each region, and measure the volumes $\mu^i$. In case of spheres, see section 4.6.1.
Let $f^i_{0|-1}$ be the a priori distribution. That is, each $f^i_{0|-1}$ is assigned a non-negative value, and all values jointly satisfy $\sum_i f^i_{0|-1} \mu^i = 1$.
Process measurements:
for t = 0, 1, 2, . . .
Compute a point prediction from ft|t−1 , for instance, by minimizing (4.7).
Use the measurements yt to compute ft|t using BayesRule, see (4.5) for details.
Compute a point estimate from ft|t , for instance, by minimizing (4.7).
Make a time update to compute ft+1|t using (4.6).
Possibly update the tessellation. (Details are subject for future work.)
end
given direction component is at minimum (though not zero) when the tracked object
is in line with the component, and at maximum when at a right angle. Despite our
awareness of this structure, we make the model as simple as possible by assuming
independent and identically distributed Gaussian noise on the three components,
hence parameterized by the single scalar σ = 0.4.
Given an initial state, a simulation of the model equations (compare with simulating
a moving object in 3 dimensions, with measurement noise entering in a simulated
object recognition algorithm) is run, resulting in a sequence of measurements. The
manifold is tessellated into N = 200 approximately equally sized regions, and the
filter is initialized with a uniform probability density. The probability density estimate is then updated as measurements are made available to the filter. The result is
illustrated in figure 4.1.
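For concreteness, a minimal sketch (our own glue code, reusing the bayesRule and timeUpdate sketches from section 4.4) of the filter run in this example:

// Minimal sketch of the filter loop in the example; not the thesis code.
// lik[t][i] holds the Gaussian likelihood of measurement y_t at x^i.
#include <cstddef>
#include <functional>
#include <vector>

void runFilter(std::vector<double>& f,                 // f^i, updated per step
               const std::vector<double>& mu,
               const std::vector<std::vector<double>>& lik,
               const std::function<double(std::size_t, std::size_t)>& noiseDensity)
{
    // Uniform initial density: f^i = 1 / (total volume).
    double total = 0.0;
    for (double m : mu) total += m;
    for (double& fi : f) fi = 1.0 / total;

    for (const auto& likT : lik) {
        bayesRule(f, likT, mu);              // measurement update (4.5)
        f = timeUpdate(f, mu, noiseDensity); // time update (4.6)
    }
}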
4.8 Conclusions and future work
We have shown that point-mass filters can be used to construct robust filters on compact manifolds. By separating the implementation of the low-level manifold structure from the higher-level filter algorithm, we are able to formulate and implement
much of the algorithm without reference to a particular embedding. The technique
has been demonstrated by considering a simple application on the 2-sphere.
Future work includes application to S O(3), that is, the manifold of orientations,
adaptation of the tessellation, and utilizing Lie group structure when available. In
order to cope with the substantial increase of dimension that would result from augmenting the state of our models to also include physical quantities such as angular
momentum, the filter should be tailored to tangent or cotangent bundles.
Figure 4.1: Estimated probability density function. Left: predictions before a
measurement becomes available. Right: estimates after measurement update.
Rows correspond to successive time steps. Patches are colored proportional to
the density in each region, and random samples are marked with dots. The color
of the patches is scaled so that white corresponds to zero density, while black
corresponds to the maximum density of the distribution (hence, the scale differs
from one figure to another). It is seen how the uncertainty increases when time
is incremented, and decreases when a measurement becomes available, and that
the uncertainty decreases over time as the information from several measurements is fused.
Appendix
4.A Populating the spheres
This appendix contains the two algorithms we use to populate spheres S 2 (algorithm 4.2) and S 3 (algorithm 4.3) with points such that the density of points is approximately constant over the whole space. The method contains a minor random
element, but this is not crucial for the quality of the result, and could easily be replaced by deterministic choice.
The idea for populating spheres generalizes to higher dimensions. The number of
steps to take in each loop is found by computing the length of the curve obtained
by sweeping the corresponding coordinate over its range while the other coordinates
are held fixed, and the curve length is divided by the side length of a hypercube of
desired volume. The curve length is found as the width of the coordinate’s span,
times the product of the cosines of the other coordinates, and the hypercube volume
times the desired number of points in the population should equal the total volume•
of the sphere (the formula for the volume can be found under the entry for sphere
in Hazewinkel (1992)). Denoting the side of the hypercube δ0 , the dimension of the
sphere N , and the desired number of points n, this corresponds to setting
$$ \delta_0 := \sqrt[N]{\frac{2\, \pi^{\frac{N+1}{2}}}{n\, \Gamma\!\left( \frac{N+1}{2} \right)}} $$
where Γ is the standard gamma function.
• The volume of $S^2$ is often denoted the area of the sphere.
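In code, $\delta_0$ follows directly from the standard-library gamma function; a minimal sketch (ours):

// Minimal sketch: side length delta0 of a hypercube whose N-dimensional
// volume, times the desired number of points n, equals the volume of S^N.
#include <cmath>

double delta0(int N, int n)
{
    const double pi = std::acos(-1.0);
    const double volume = 2.0 * std::pow(pi, (N + 1) / 2.0)
                        / std::tgamma((N + 1) / 2.0);   // volume of S^N
    return std::pow(volume / n, 1.0 / N);
}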
Algorithm 4.2 Populating the 2-sphere.
Input: The desired number $n$ of points in the population.
Output: A set $P$ of points on $S^2$, approximately of cardinality $n$.
Notation: Let $\varphi$ denote the usual polar coordinate map, mapping a point on $S^2$ to the tuple $( \theta, \phi )$, where $\theta \in [ -\frac{\pi}{2}, \frac{\pi}{2} ]$, $\phi \in [ 0, 2\pi ]$. That is, embedding $S^2$ in $\mathbb{R}^3$, the inverse map is identified with
$$ \varphi^{-1}( \theta, \phi ) = \begin{pmatrix} \cos( \theta ) \cos( \phi ) \\ \cos( \theta ) \sin( \phi ) \\ \sin( \theta ) \end{pmatrix} $$
Algorithm body:
$P \leftarrow \{\}$
$\delta_0 := \sqrt{4\pi/n}$ (Compute the desired volume belonging to each point, and compute an approximation of the angle which produces a square on the sphere with this volume.)
$i_\theta^{\max} := \lceil \pi / \delta_0 \rceil$ (Compute the number of steps to take in the $\theta$ coordinate.)
$\Delta_\theta := -\pi / i_\theta^{\max}$ (Compute the corresponding step size in the $\theta$ coordinate.)
$\theta_0 := \pi/2$
for $i_\theta = 0, 1, \ldots, ( i_\theta^{\max} - 1 )$
  $\theta := \theta_0 + i_\theta \Delta_\theta$
  $i_\phi^{\max} := \max\{ 1, \lfloor 2\pi \cos( \theta ) / \delta_0 \rfloor \}$ (Compute the circumference of the sphere at the current $\theta$ coordinate, and find the number of $\phi$ steps by dividing by the desired step length.)
  $\Delta_\phi := 2\pi / i_\phi^{\max}$
  $\phi_0 := x$, where $x$ is a random sample from $[ 0, 2\pi ]$.
  for $i_\phi = 0, 1, \ldots, ( i_\phi^{\max} - 1 )$
    $\phi := \phi_0 + i_\phi \Delta_\phi$
    $P \leftarrow P \cup \{ \varphi^{-1}( \theta, \phi ) \}$
  end
end
Remark: A deterministic replacement for the random initialization of $\phi_0$ in each $i_\theta$ iteration would be to add
$$ \phi \leftarrow \phi - \tfrac{1}{2} \Delta_\phi $$
just before the update of $\theta$, and then use the final $\phi$ at the end of one $i_\theta$ iteration as the initial $\phi$ in the next $i_\theta$ iteration.
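For reference, a direct C++ transcription of algorithm 4.2 (our own sketch; the random phase is drawn with std::mt19937):

// Minimal transcription of algorithm 4.2; not the thesis code.
#include <algorithm>
#include <array>
#include <cmath>
#include <random>
#include <vector>

std::vector<std::array<double, 3>> populateSphere2(int n)
{
    const double pi = std::acos(-1.0);
    const double delta0 = std::sqrt(4.0 * pi / n);
    const int iThetaMax = static_cast<int>(std::ceil(pi / delta0));
    const double dTheta = -pi / iThetaMax;

    std::mt19937 rng(0);
    std::uniform_real_distribution<double> phase(0.0, 2.0 * pi);

    std::vector<std::array<double, 3>> P;
    for (int iTheta = 0; iTheta < iThetaMax; ++iTheta) {
        const double theta = pi / 2.0 + iTheta * dTheta;
        const int iPhiMax = std::max(1, static_cast<int>(
            std::floor(2.0 * pi * std::cos(theta) / delta0)));
        const double dPhi = 2.0 * pi / iPhiMax;
        const double phi0 = phase(rng);          // random phase per ring
        for (int iPhi = 0; iPhi < iPhiMax; ++iPhi) {
            const double phi = phi0 + iPhi * dPhi;
            P.push_back({ std::cos(theta) * std::cos(phi),
                          std::cos(theta) * std::sin(phi),
                          std::sin(theta) });
        }
    }
    return P;
}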
Algorithm 4.3 Populating the 3-sphere.
Input: The desired number $n$ of points in the population.
Output: A set $P$ of points on $S^3$, approximately of cardinality $n$.
Notation: Let $\varphi$ denote the usual polar coordinate map, mapping a point on $S^3$ to the tuple $( \theta, \phi, \gamma )$, where $\theta \in [ -\frac{\pi}{2}, \frac{\pi}{2} ]$, $\phi \in [ -\frac{\pi}{2}, \frac{\pi}{2} ]$, $\gamma \in [ 0, 2\pi ]$. That is, embedding $S^3$ in $\mathbb{R}^4$, the inverse map is identified with
$$ \varphi^{-1}( \theta, \phi, \gamma ) = \begin{pmatrix} \cos( \theta ) \cos( \phi ) \cos( \gamma ) \\ \cos( \theta ) \cos( \phi ) \sin( \gamma ) \\ \cos( \theta ) \sin( \phi ) \\ \sin( \theta ) \end{pmatrix} $$
Algorithm body: Compare the body of algorithm 4.2. This algorithm has the same structure, and we shall only indicate how the important quantities are computed, namely the number of steps to take in the different loops.
...
$\delta_0 := \sqrt[3]{2\pi^2/n}$
$i_\theta^{\max} := \lceil \pi / \delta_0 \rceil$
$\Delta_\theta := -\pi / i_\theta^{\max}$
...
for $i_\theta$ ...
  $\theta := \ldots$
  $i_\phi^{\max} := \max\{ 1, \lfloor \pi \cos( \theta ) / \delta_0 \rfloor \}$
  $\Delta_\phi := \pi / i_\phi^{\max}$
  ...
  for $i_\phi$ ...
    $\phi := \ldots$
    $i_\gamma^{\max} := \max\{ 1, \lfloor 2\pi \cos( \theta ) \cos( \phi ) / \delta_0 \rfloor \}$
    $\Delta_\gamma := 2\pi / i_\gamma^{\max}$
    ...
    for $i_\gamma$ ...
      $\gamma := \ldots$
      $P \leftarrow P \cup \{ \varphi^{-1}( \theta, \phi, \gamma ) \}$
    end
  end
end
Remark: The random choices in this algorithm can be made deterministic in the same way as in algorithm 4.2.
5 A new index close to strangeness
Kunkel and Mehrmann have developed a theory for analysis and numerical solution
of differential-algebraic equations. The theory centers around the strangeness index,
which differs from the differentiation index in that it does not consider the derivatives
of the solution to be independent of the solution itself at each time instant. Instead,
it takes the tangent space of the manifold of solutions into account, thereby reducing
the number of dimensions in which the derivative has to be determined. The book
Kunkel and Mehrmann (2006) covers the theory well and will be the predominant
reference used in the current chapter.
The numerical solution procedure applies to general nonlinear differential-algebraic
equations of higher indices, and is currently the only one we know of that can handle
such problems, although it does not provide a sensitivity analysis. Our interest in
this matter is mostly due to this capability.
Since the theme of the chapter is to relate a new index to the closely related
strangeness index, parts of the background theory have been included in the present
chapter instead of chapter 2 in order to put the two definitions side by side. Care has
been taken to make it clear what the contributions of the chapter are.
5.1 Two definitions
In this section, two index definitions will be presented along with some basic properties of each. The one to be presented first is the strangeness index, found in Kunkel
and Mehrmann (2006). The second, which is proposed as an alternative, is called the
simplified strangeness index. Both are based on the derivative array equations.
5.1.1 Derivative array equations and the strangeness index
As always when working with dae, it is crucial to be aware that the solutions are
restricted to a manifold. In practice, one is interested in obtaining equations describing that manifold, and the way this is done in the present chapter is by using the
derivative array introduced in Campbell (1993), see section 2.2.3.
Consider the dae
$$ f( x(t), x'(t), t ) \stackrel{!}{=} 0 \qquad (5.1) $$
Assuming sufficient differentiability of f and of x, the idea is that the original dae is completed with derivatives of the equations with respect to time. This will introduce higher order derivatives of the solution, but the key idea is that, given values of $x(t)$, it suffices to be able to determine $x'(t)$ in order to compute a numerical solution to the equations. That is, higher order derivatives such as $x''(t)$ may appear in the equations, but are not necessary to determine.
Conversely, the choice of $x(t)$ will affect the possibility to determine $x'(t)$, and the set of points $x(t)$ where the derivative array equations can be solved for $x'(t)$ is the solution manifold. Hence, the derivative array equations can be used as a characterization of the solution manifold.
If the completion procedure is continued until the derivative array equations are one-full with respect to $x'(t)$, the procedure has revealed the differentiation index of the dae, see definition 2.3. The meaning of one-full is defined in terms of the equation considered pointwise in time, so that a variable and its derivative become independent variables. We emphasize the independence by using the variable $\dot{x}(t)$ instead of $x'(t)$, where the dot is just an ornament, while the prime is an operator. The equations are then said to be one-full if they determine $\dot{x}(t)$ uniquely within some open ball, given $x(t)$ and $t$. An equivalent characterization can be made in terms of the Jacobian of the derivative array with respect to its differentiated variables; then the equations are one-full if and only if row operations can bring the Jacobian into block diagonal form, with a non-singular block in the block column corresponding to derivatives with respect to $\dot{x}(t)$ (clearly, this shows that it is possible to solve for $\dot{x}(t)$ without knowing the variables corresponding to higher order derivatives of $x$ at time $t$).
However, instead of requiring that the completed equations be one-full, it turns out that there are good reasons for using the weaker requirement that the equations display the strangeness index instead. The definition of strangeness index is the topic of the current chapter, and will soon be considered in detail. It turns out that equations displaying the strangeness index determine $x'(t)$ uniquely if one takes into account the connection between $x(t)$ and $x'(t)$ being imposed by the non-differential constraints which locally describe the solution manifold. Strangeness-free equations (strangeness index 0) are suitable for numerical integration. (Kunkel and Mehrmann, 1996)
In the sequel, it will be convenient to speak of properties which hold on non-empty
open balls inside the set
$$ \mathbb{L}_{\nu_S} \triangleq \left\{ \left( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} \right) : F_{\nu_S}( x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)}, t ) \stackrel{!}{=} 0 \right\} \qquad (5.2) $$
5.1 Definition (Strangeness index). The strangeness index νS at ( t0 , x0 ) is defined
as the smallest number (or ∞ if no such number exists) such that the derivative array
equations•
$$ F_{\nu_S}( t, x(t), x'(t), x'^{(2)}(t), \ldots, x'^{(\nu_S+1)}(t) ) \stackrel{!}{=} 0 \qquad (5.3) $$
satisfy the following properties on
$$ \mathbb{L}_{\nu_S} \cap \underbrace{\left\{ \left( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} \right) : t \in B_{t_0}(\delta) \wedge x \in B_{x_0}(\delta) \right\}}_{b_\delta} $$
for some $\delta > 0$.
• P1a–[5.1] There shall exist a constant number $n_a$ such that the rank of
$$ M_{\nu_S}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} ) \triangleq \begin{bmatrix} \frac{\partial F_{\nu_S}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} )}{\partial \dot{x}} & \cdots & \frac{\partial F_{\nu_S}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} )}{\partial \dot{x}^{(\nu_S+1)}} \end{bmatrix} $$
is pointwise equal to $( \nu_S + 1 )\, n_x - n_a$, and there shall exist a smooth matrix-valued function $Z_2$ with $n_a$ pointwise linearly independent columns such that $Z_2^T M_{\nu_S} = 0$.
• P1b–[5.1] Let $n_d = n_x - n_a$, and let $A_{\nu_S} = Z_2^T N_{\nu_S}$ where
$$ N_{\nu_S}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} ) \triangleq \frac{\partial F_{\nu_S}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_S+1)} )}{\partial x} $$
Then the rank of $A_{\nu_S}$ shall equal $n_a$, and there shall exist a smooth matrix-valued function $X$ with $n_d$ pointwise linearly independent columns such that $A_{\nu_S} X = 0$.
• P1c–[5.1] The rank of $\nabla_2 f\, X$ shall be full, and there shall exist a smooth matrix-valued function $Z_1$ with $n_d$ pointwise linearly independent columns such that $Z_1^T \nabla_2 f\, X$ is non-singular.
In section 5.1.2 we present the well-known result that the derivative $x'(t)$ is uniquely defined by $F_{\nu_S} \stackrel{!}{=} 0$, and that we can construct a square strangeness-free dae with the same solution for $x'(t)$. The square and strangeness-free equations are referred to as
the reduced equation. Then, it is not surprising that the reduced equations can also
be used for numerical integration, see Kunkel and Mehrmann (1996, 1998).
In section 5.1.3 we propose the new index, seen directly from the viewpoint of discretized equations. In sections 5.2 and 5.3 it is shown that the two views are closely
related.
• The notation is defined such that $x'^{(1)} = x'$.
When working with the strangeness index for nonlinear dae, the next theorem is
important to keep in mind.
5.2 Theorem (Kunkel and Mehrmann (2006, theorem 4.13)). Let the strangeness index of (5.1) be $\nu_S$. If the conditions of definition 5.1 also hold with $( \nu_S + 1, n_a, n_d )$, and there is a point $\left( t_0, x_0, \dot{x}_0, \ldots, \dot{x}^{(\nu_S+2)} \right) \in \mathbb{L}_{\nu_S+1}$, then the reduced square and strangeness-free dae obtained from $F_{\nu_S} \stackrel{!}{=} 0$ has a unique solution passing through the given point, and this solution also solves (5.1).
Proof: This is Kunkel and Mehrmann (2006, theorem 4.13), and we shall just give
a very brief overview of their proof. A closely related result for the simplified
strangeness index is proved in section 5.3.
Since the reduced equation is implied by the original equation, the original equation
cannot have more solutions than the reduced equation. It follows that it suffices to
show that the reduced equation has a unique solution, and that this solution satisfies
the original equation. In particular, it needs to be shown that the derivatives of the
algebraic variables are consistent with the original equation.
The thing to notice about theorem 5.2 is that only knowing that (5.1) has strangeness
index νS at the point ( t0 , x0 ) is not enough to ensure that there is a solution passing
through this point which also solves (5.1). This fact is well illustrated by Kunkel and
Mehrmann (2006, exercise 4.11).
However, if the reduced equation (or the full $F_{\nu_S} \stackrel{!}{=} 0$) has a unique solution, an
approximate alternative to using theorem 5.2 is simply to test the obtained solution
(at a finite number of points along the trajectory) against the original equation. In
view of this, we mainly consider definition 5.1 as a means for determining when the
derivative array equations can be used to show uniqueness of a solution, if one exists
at all.
In the next section, we elaborate what might be obvious, namely that definition 5.1
corresponds to a procedure to determine a solution uniquely, if it exists. Then, in
section 5.1.3 we make an alternative characterization of νS .
5.1.2 Analysis based on the strangeness index
The first step in the analysis, relying on P1a–[5.1], is to determine the local nature of
the non-differential constraints that can be deduced from the derivative array equations. By definition, these constraints do not involve any differentiated variables,
so the local nature of these constraints is obtained as linear combinations of the
derivative array equations such that the gradient with respect to derivatives vanishes. P1a–[5.1] states that there are na such linear combinations, and that the linear
combinations, the columns of Z2 , can be selected smoothly and linearly independent.
Since the Gram-Schmidt procedure can be carried out smoothly, it follows that the
columns of Z2 can be selected of unit length and orthogonal to each other.
Since the linear combinations $Z_2^T F_{\nu_S}$ are smooth functions on a non-empty open set,
with zero derivative with respect to the differentiated variables everywhere, these
linear combinations do not depend on the differentiated variables at all. Hence, they
are pure non-differential constraints that give a local characterization of the solution
manifold. To see the local nature of these constraints, their gradient with respect
to x is computed, and P1b–[5.1] states that the so obtained normal directions to the
solution manifold are linearly independent. Since they are linearly independent, the
dimension of the solution manifold is nd = nx − na . P1b–[5.1] then states that it is
possible to construct a local coordinate map x(t) = φ−1 ( xd (t), t ) with coordinates xd ,
determined by the partial differential equation
$$ \nabla_1 \varphi^{-1}( x_d(t), t ) \stackrel{!}{=} X( x(t), t ) \qquad (5.4) $$
where the columns of X are smooth functions and pointwise linearly independent.
Again, they can be selected of unit length and orthogonal to each other. That is,
the columns of X can be selected as an orthonormal basis for the right null space of
the matrix A. The local coordinates xd are denoted the dynamic variables. (If the
requirement that X have orthonormal columns is dropped, the dynamic variables
can be selected as a subset of the original variables x, but this may lead to numeric
ill-conditioning.)
The last property, P1c–[5.1], is finally there to ensure that the time derivative of the local coordinates on the solution manifold is determined by the original equation (5.1). Replacing (5.1) by an equation with residual expressed only through the dynamic variables,
$$ f_d( x_d, \dot{x}_d, t ) \triangleq f\big( \varphi^{-1}( x_d, t ),\ \nabla_1 \varphi^{-1}( x_d, t )\, \dot{x}_d + \nabla_2 \varphi^{-1}( x_d, t ),\ t \big) \qquad (5.5) $$
property P1c–[5.1] states that the Jacobian with respect to $\dot{x}_d$,
$$ \nabla_2 f_d( x_d, \dot{x}_d, t ) = \nabla_2 f\big( \varphi^{-1}( x_d, t ),\ \nabla_1 \varphi^{-1}( x_d, t )\, \dot{x}_d + \nabla_2 \varphi^{-1}( x_d, t ),\ t \big)\, X \qquad (5.6) $$
is full-rank. Since there are only nd derivatives to be determined, and there are nx
equations, there are na more equations than unknowns. The property P1c–[5.1] also
states that nd linear combinations, given by the columns of Z1 , of the equations in
(5.1) can be chosen smoothly and linearly independent (and hence orthonormal), so
that these linear combinations are sufficient to determine the time derivatives of the
dynamic variables.
The reduced equations can now be constructed by joining the $n_d$ residuals $Z_1^T f$ with the $n_a$ residuals $Z_2^T F_{\nu_S}$. The resulting system has $n_x$ equations and $( \nu_S + 1 )\, n_x$ unknowns ($x$ and $t$ being known variables), but $\nu_S\, n_x$ of these cannot and need not be
solved for. Hence, the system may be considered square, and it is easy to see that it
is strangeness-free (with trivial choices of Z1 and Z2 , and the same X as when the
strangeness index of (5.1) was determined).
Unfortunately, despite the theoretical appeal of the reduced equations, and while
it is sufficient to approximate Z1 in numeric implementations, the practical implementation of Z2 needs to make the non-differential equations truly independent of
the differentiated variables. This presents severe difficulties unless it can be shown
that Z2 only depends on t, for then it can be computed pointwise in time. It follows
that the reduced equations will generally not be suitable for numerical integration.
We shall now show how the analysis above can be used for numerical integration
without using the reduced equation. Although Kunkel and Mehrmann (2006, section 6.2) addresses this, we give another (although similar) argument for the case
here.
Consider numerical integration via a first order bdf method. We then need to show
that this formula uniquely determines the next iterate, qx, given x. The equations to
which the bdf method is applied are the full derivative array equations, where each
derivative is considered an unknown, and without discretizing any of the derivatives.
To these equations the nd selected linear combinations of f are added, discretizing
all derivatives.
We know that the derivative array equations constrain qx to lie on a manifold which
can be parameterized locally using xd as coordinates. It remains to show that these
coordinates are uniquely determined by the dynamic equations. However, thinking
of the dynamic equations in terms of the dynamic variables, it is readily seen that being able to solve for the derivatives (which is possible by definition 5.1) is equivalent
to being able to solve for qxd for sufficiently small step lengths.
5.1.3 The simplified strangeness index
The final remarks on numerical integration in the previous section relate closely to the analysis in this section. Here we begin, not with reasoning about the ability to solve for derivatives, but going directly to the topic of finding equations that uniquely determine the next iterate in a bdf method. Later, it will be shown in lemma 5.13 that the resulting index definition can also be interpreted as a condition to make $x'(t)$ uniquely determined by $x(t)$ and $t$.
By discretizing the derivatives (using a bdf method) in the original equation (5.1)
(and scaling the equations by the step length), we get that the gradient of these equations with respect to x tends to ∇2 f ( x, ẋ, t ) as the step length tends to zero. Hence,
joining these equations with the full derivative array equations (where no derivatives
are discretized) yields a set of equations which (locally) shall determine x uniquely.
This leads to the following definition.
5.3 Definition (Simplified strangeness index). The simplified strangeness index $\nu_q$ at $( t_0, x_0 )$ is defined as the smallest number (or $\infty$ if no such number exists) such that the derivative array equations
$$ F_{\nu_q}( t, x(t), x'(t), x'^{(2)}(t), \ldots, x'^{(\nu_q+1)}(t) ) \stackrel{!}{=} 0 \qquad (5.7) $$
satisfy the following property on
$$ \mathbb{L}_{\nu_q} \cap \underbrace{\left\{ \left( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_q+1)} \right) : t \in B_{t_0}(\delta) \wedge x \in B_{x_0}(\delta) \right\}}_{b_\delta} $$
for some $\delta > 0$.
• P2–[5.3] Let
$$ H_{\nu_q} \triangleq \begin{bmatrix} \frac{\partial f( x, \dot{x}, t )}{\partial \dot{x}} & 0 & \cdots & 0 \\ \frac{\partial F_{\nu_q}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_q+1)} )}{\partial x} & \frac{\partial F_{\nu_q}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_q+1)} )}{\partial \dot{x}} & \cdots & \frac{\partial F_{\nu_q}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_q+1)} )}{\partial \dot{x}^{(\nu_q+1)}} \end{bmatrix} = \begin{bmatrix} \nabla_2 f & 0 \\ N_{\nu_q} & M_{\nu_q} \end{bmatrix} $$
where $N_{\nu_q}$ and $M_{\nu_q}$ are defined as in definition 5.1. Then it shall hold that
$$ \operatorname{rank} \begin{bmatrix} I & 0 \\ \nabla_2 f & 0 \\ N_{\nu_q} & M_{\nu_q} \end{bmatrix} \stackrel{!}{=} \operatorname{rank} \begin{bmatrix} \nabla_2 f & 0 \\ N_{\nu_q} & M_{\nu_q} \end{bmatrix} $$
That is, the basis vectors corresponding to $x$ shall be in the span of the rows of $H_{\nu_q}$, which may be recognized as the property of $H_{\nu_q}$ being one-full.
The property P2–[5.3] can be interpreted as saying that there is no freedom in the $x$ components of the solution to
$$ \begin{bmatrix} h\, f( x, \tfrac{1}{h}( x - q^{-1}x ), t ) \\ F_{\nu_q}( t, x, \dot{x}, \ldots, \dot{x}^{(\nu_q+1)} ) \end{bmatrix} \stackrel{!}{=} 0 $$
since adding additional equations for the x variables alone does not decrease the
solution space of the linearized equations. For theoretic considerations, however, the
continuous-time interpretation provided by lemma 5.13 below is more relevant.
Of course, we must show what the simplified strangeness index is for the inevitable
pendulum.
5.4 Example
Let us once more consider the pendulum from example 3.3. To match the notation of
the present chapter we define
$$ f\left( \begin{pmatrix} \xi \\ u \\ y \\ v \\ \lambda \end{pmatrix}, \begin{pmatrix} \dot{\xi} \\ \dot{u} \\ \dot{y} \\ \dot{v} \\ \dot{\lambda} \end{pmatrix}, t \right) \triangleq \begin{pmatrix} \lambda\, \xi - \dot{u} \\ \lambda\, y - g - \dot{v} \\ \xi^2 + y^2 - 1 \\ \dot{\xi} - u \\ \dot{y} - v \end{pmatrix} $$
We consider initial conditions where the pendulum is in motion and neither $\xi$ nor $y$
is zero.
To check P2–[5.3] we look at the projection of a basis for the right null space of $H_i$ onto the space spanned by the basis vectors corresponding to $x$, for $i = 1, 2, \ldots, \nu_q$. (The projection is implemented by just keeping the five first entries of the vectors.) The basis for the null space is computed using Mathematica, and for $i = 0, 1, 2$ the
projected basis vectors are, in order, vectors that vanish except possibly in the $\lambda$ component for $i = 0$ and $i = 1$, the non-zero entries being rational expressions with the common denominator $\dot{y}\,\xi - \dot{\xi}\,y$, and all-zero vectors for $i = 2$.
Assuming that the symbolic null space computations are valid in some neighborhood of the initial conditions inside $\mathbb{L}_i$, it is seen that the $\lambda$ component is undetermined for $i = 0$ and $i = 1$, and as all components are determined for $i = 2$ we get $\nu_q = 2$.
To verify that the symbolic computations of the null space are actually valid, it must be checked that the denominators are non-zero. The expressions which were removed by the projections are also rational with the denominator $\dot{y}\,\xi - \dot{\xi}\,y$. Since the length of the vector $\begin{pmatrix} \xi \\ y \end{pmatrix}$ is 1 on $\mathbb{L}_i$, a geometric interpretation shows that the denominator expression is the scalar product of the vector $\begin{pmatrix} \dot{\xi} \\ \dot{y} \end{pmatrix}$ and a unit vector which is tangent to the unit circle at the point $( \xi, y )$. Our intuition about the problem gives that the initial conditions for $\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \dot{\xi} \\ \dot{y} \end{pmatrix}$ may actually be chosen parallel with the tangent, and the restriction to a neighborhood of the initial conditions inside $\mathbb{L}_i$ gives that any $\begin{pmatrix} u \\ v \end{pmatrix}$ will at least be close to parallel with the tangent. Hence, the denominator expression is zero precisely when the velocity variables are zero. Since we chose to analyze the equations for initial conditions where the pendulum is in motion, the velocity will remain non-zero in a neighborhood of the initial conditions, proving the validity of the null space computations.
Since our prior understanding of the problem makes it easy to compute points inside
$\mathbb{L}_i$ for any $i$, the simplified strangeness index can also be computed numerically if
we either assume or make sure that certain critical ranks are not sensitive to small
perturbations of the variables. The method is not pursued in the example, in order
to keep focus in the current chapter on methods and theory for exact dae.
The following lemma is an example of how easy the simplified strangeness index is
to work with.
5.5 Lemma. If P2–[5.3] is satisfied for $\nu_q$ on (the obvious projection of) $\mathbb{L}_{\nu_q+i} \cap b_\delta$ for some $i \geq 1$, then P2–[5.3] is also satisfied for $\nu_q + i$ on the same set.
Proof: If the basis vectors corresponding to $x$ are in the span of the rows of $H_{\nu_q}$, they (extended to the appropriate dimension) will also be in the span of the rows of $H_{\nu_q+i}$, since the upper part of this matrix equals $\begin{bmatrix} H_{\nu_q} & 0 \end{bmatrix}$.
While lemma 5.13 below quite intuitively will show that a finite simplified
strangeness index implies uniqueness of solutions to the dae, we postpone until
section 5.3 to consider how P2–[5.3] may be used to also test existence of solutions.
For now, we concentrate on how to compute the solution if it exists; recall that there
is always a possibility to test any solution numerically (at a finite set of points along
the trajectory) against the original equation, which should yield a good indication of
true existence.
Once νq has been determined, the next task is to select which equations to use (that
is, pick a suitable Z1 ), and which variables to discretize (possibly via a change of
variables, using an approximation of X). It is important to not make the discretized
equations over-determined by including too many independent columns in Z1 , as this
may compromise the non-differential constraints of the dae. Since the discretized
derivatives are approximations, as few variables as possible should be discretized.
The procedure prescribed by definition 5.3 may become demanding if one tries to
use it directly to find a subset of components of f which is sufficient to determine
x, since a null space of a large map has to be computed for each candidate subset of
components. The following constructive method remedies this.
First, a basis for the right null space of $\begin{bmatrix} N_{\nu_q} & M_{\nu_q} \end{bmatrix}$ (that is, the gradient of the derivative array equations with respect to all unknown variables) is computed. The basis vectors are then chopped to get the tangent space of the solution manifold seen in x-space. Note that the chopped vectors will span the tangent space, but always contain too many elements to be a basis (there are $( \nu_q + 1 )\, n_f$ equations and $( \nu_q + 2 )\, n_x$ variables, and the equations will generally be dependent). To simplify the argument, a
basis X for the tangent space is constructed, and the number of elements in this basis
is the number of dynamic variables. Hence, it is possible to locally parameterize x in
$x_d$, and more equations are needed in order to determine $x_d$ given $q^{-1}x$.
As in section 5.1.2, we are led to rewriting the equations given by f in terms of
xd , and require that the gradient of these equations (where derivatives have been
discretized) with respect to xd be non-singular. By the chain rule, this means that the
product ∇2 f X shall have full column rank. Selecting a subset of components of f
directly translates to selecting a subset of rows in this matrix product, and selecting
a non-singular such subset is relatively cheap compared to computing the large null
spaces in P2–[5.3].
It can be seen that computing the null space basis $X$ is not necessary, but at least the dimension $n_d$ of the null space has to be known, because instead of requiring that the product $\nabla_2 f\, X$ be non-singular, we shall require that its rank agrees with $n_d$.
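A minimal numerical sketch of this test (ours, assuming the Eigen library and, as in example 5.4, that the critical ranks are insensitive to small perturbations):

// Minimal sketch; not the thesis code. Tests whether a selected subset of
// equations determines the dynamic variables: the rank of grad2f * X must
// equal the dimension n_d of the tangent space seen in x-space.
#include <Eigen/Dense>

bool sufficientEquations(const Eigen::MatrixXd& NM,      // [ N_vq  M_vq ]
                         const Eigen::MatrixXd& grad2f,  // gradient of f w.r.t. x-dot
                         int nx)                          // number of x variables
{
    Eigen::FullPivLU<Eigen::MatrixXd> lu(NM);
    const Eigen::MatrixXd kernel = lu.kernel();          // right null space basis
    const Eigen::MatrixXd X = kernel.topRows(nx);        // chop to x-space
    const int nd = Eigen::FullPivLU<Eigen::MatrixXd>(X).rank();
    return Eigen::FullPivLU<Eigen::MatrixXd>(grad2f * X).rank() == nd;
}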
Note that if a consistent point is given (with as many derivatives as we may need),
νq can be determined using definition 5.3, and then the constructive method can be
used to determine a sufficient subset of components of f (or, in general, determine
the matrix Z1 ).
For lti dae, definition 5.3 is easily related to the differentiation index νD (definition 2.3), as the next theorem shows.
5.6 Theorem. For the lti dae
$$ E\, x'(t) + A\, x(t) + B\, u(t) \stackrel{!}{=} 0 $$
it holds that $\nu_q = \max\{ 0, \nu_D - 1 \}$.
Proof: The residual function of the dae is given by
f ( x, ẋ, t ) = E ẋ + A x + B u(t)
If νD = 0, ∇2 f ( x, ẋ, t ) = E is non-singular, and it follows that νq = 0. It remains to
consider νD > 0.
Recall the special structure of the derivative array equations for lti dae, seen in (2.34),

$$
F_{\nu_D}(t, x, \dot{x}, \ldots, \dot{x}^{(\nu_D+1)}) =
\underbrace{\begin{bmatrix} A \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{N_{\nu_D}} x +
\underbrace{\begin{bmatrix} E & & & \\ A & E & & \\ & \ddots & \ddots & \\ & & A & E \end{bmatrix}}_{M_{\nu_D}}
\begin{bmatrix} \dot{x} \\ \ddot{x} \\ \vdots \\ \dot{x}^{(\nu_D+1)} \end{bmatrix} +
\begin{bmatrix} B\, u(t) \\ B\, u'(t) \\ \vdots \\ B\, u^{(\nu_D)}(t) \end{bmatrix}
$$
By definition of νD, ẋ is uniquely determined by FνD = 0, which is a condition only in terms of MνD. Partitioning this matrix as

$$
M_{\nu_D} =
\begin{bmatrix} E & & & \\ A & E & & \\ & \ddots & \ddots & \\ & & A & E \end{bmatrix} =
\begin{bmatrix} \nabla_2 f & 0 \\ N_{\nu_D - 1} & M_{\nu_D - 1} \end{bmatrix}
$$

shows that νq = νD − 1.
The strangeness index νS is known to have the same relation to the differentiation
index also for ltv dae (Kunkel and Mehrmann, 2006, section 3.3) and nonlinear
dae in Hessenberg form (Kunkel and Mehrmann, 2006, theorem 4.23), but some of
the proofs are lengthy and instead of making more comparisons of νq versus νD , we
now turn to the direct relation between νq and νS .
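The relation νq = max{ 0, νD − 1 } is easy to check numerically for concrete lti instances by building the blocks of (2.34) and testing the lti specialization of P2–[5.3] directly. The sketch below is illustrative only (dense matrices, crude SVD-based rank decisions); all names are assumptions.

```python
import numpy as np

def nu_q_lti(E, A, nu_max=10, tol=1e-10):
    """Smallest nu such that every right null vector of
    [[E, 0], [N_nu, M_nu]] has zero x-part (the lti form of P2-[5.3])."""
    nx = E.shape[0]
    for nu in range(nu_max + 1):
        k = (nu + 1) * nx
        N = np.zeros((k, nx))
        N[:nx, :] = A
        M = np.zeros((k, k))
        for i in range(nu + 1):
            M[i*nx:(i+1)*nx, i*nx:(i+1)*nx] = E      # diagonal blocks E
            if i > 0:
                M[i*nx:(i+1)*nx, (i-1)*nx:i*nx] = A  # sub-diagonal blocks A
        stacked = np.vstack([np.hstack([E, np.zeros((nx, k))]),
                             np.hstack([N, M])])
        _, s, Vt = np.linalg.svd(stacked)
        rank = int(np.sum(s > tol * max(s[0], 1.0)))
        null_basis = Vt[rank:].T
        if null_basis.size == 0 or np.max(np.abs(null_basis[:nx, :])) < tol:
            return nu
    return None  # not determined within nu_max

# Hand-checked examples: E = [[1, 0], [0, 0]], A = I (nu_D = 1) gives 0,
# and E = [[0, 1], [0, 0]], A = I (nu_D = 2) gives 1, as the theorem states.
```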
5.2 Relations
In this section, the two indices νS and νq will be shown to be closely related. This
is done by means of a matrix decomposition developed for this purpose. We first
show the matrix decomposition, and then interpret the two definitions in terms of
this decomposition.
5.7 Lemma. The matrix $[\,N \;\; M\,]$, where N ∈ ℝ^{k×l}, M ∈ ℝ^{k×k}, rank M = k − a, a ≥ 1, can be decomposed as

$$
\begin{bmatrix} N & M \end{bmatrix} =
\begin{bmatrix} Q_{1,1} & Q_{1,2} \end{bmatrix}
\begin{bmatrix} 0 & 0 & \Sigma & 0 \\ A & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} & 0 \\ Q_{3,2}^{\mathsf T} & 0 \\ \Sigma^{-1} Q_{1,1}^{\mathsf T} N & Q_{2,1}^{\mathsf T} \\ 0 & Q_{2,2}^{\mathsf T} \end{bmatrix}
$$

In this decomposition, the left matrix is unitary, as are the diagonal blocks of the right matrix. The matrix Σ is a diagonal matrix of the non-zero singular values of M. The matrix A is square.
Proof: Introducing the singular value decomposition

$$
M = Q_1 \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} Q_2^{\mathsf T}
= \begin{bmatrix} Q_{1,1} & Q_{1,2} \end{bmatrix}
\begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} Q_{2,1}^{\mathsf T} \\ Q_{2,2}^{\mathsf T} \end{bmatrix}
$$
we get

$$
\begin{bmatrix} N & M \end{bmatrix} =
Q_1 \begin{bmatrix} Q_{1,1}^{\mathsf T} N & \Sigma & 0 \\ Q_{1,2}^{\mathsf T} N & 0 & 0 \end{bmatrix}
\begin{bmatrix} I & 0 \\ 0 & Q_{2,1}^{\mathsf T} \\ 0 & Q_{2,2}^{\mathsf T} \end{bmatrix}
$$
By the QR decomposition

$$
Q_{1,2}^{\mathsf T} N = \begin{bmatrix} A & 0 \end{bmatrix} \begin{bmatrix} Q_{3,1}^{\mathsf T} \\ Q_{3,2}^{\mathsf T} \end{bmatrix}
$$

where A is square and may contain dependent columns (in particular, some of the columns may be zero), we then get

$$
\begin{bmatrix} N & M \end{bmatrix} =
Q_1 \begin{bmatrix} Q_{1,1}^{\mathsf T} N Q_{3,1} & Q_{1,1}^{\mathsf T} N Q_{3,2} & \Sigma & 0 \\ A & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} & 0 \\ Q_{3,2}^{\mathsf T} & 0 \\ 0 & Q_{2,1}^{\mathsf T} \\ 0 & Q_{2,2}^{\mathsf T} \end{bmatrix}
$$
Finally, the relation

$$
\begin{bmatrix} Q_{1,1}^{\mathsf T} N Q_{3,1} & Q_{1,1}^{\mathsf T} N Q_{3,2} & \Sigma & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} \\ Q_{3,2}^{\mathsf T} \\ 0 \\ 0 \end{bmatrix}
= Q_{1,1}^{\mathsf T} N
= \begin{bmatrix} 0 & 0 & \Sigma & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} \\ Q_{3,2}^{\mathsf T} \\ \Sigma^{-1} Q_{1,1}^{\mathsf T} N \\ 0 \end{bmatrix}
$$
enables us to write

$$
\begin{bmatrix} N & M \end{bmatrix} =
\begin{bmatrix} Q_{1,1} & Q_{1,2} \end{bmatrix}
\begin{bmatrix} 0 & 0 & \Sigma & 0 \\ A & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} & 0 \\ Q_{3,2}^{\mathsf T} & 0 \\ \Sigma^{-1} Q_{1,1}^{\mathsf T} N & Q_{2,1}^{\mathsf T} \\ 0 & Q_{2,2}^{\mathsf T} \end{bmatrix}
$$

as desired.
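The proof is constructive, and translates directly into a numerical routine built from one singular value decomposition and one QR factorization. The following illustrative sketch (which assumes rank M = k − a with a ≥ 1 and l ≥ a, and makes a crude tolerance-based rank decision) returns the three factors and verifies the reconstruction:

```python
import numpy as np

def lemma_5_7(N, M, tol=1e-12):
    """Numerical realization of the decomposition in lemma 5.7."""
    k, l = M.shape[0], N.shape[1]
    U, s, V2t = np.linalg.svd(M)                # M = U diag(s) V2t
    r = int(np.sum(s > tol * max(s[0], 1.0)))   # r = k - a
    a = k - r
    Q11, Q12, S = U[:, :r], U[:, r:], np.diag(s[:r])
    # Q12.T N = [A 0] [Q31.T; Q32.T], realized via QR of (Q12.T N).T
    Q3, R = np.linalg.qr((Q12.T @ N).T, mode='complete')
    A = R[:a, :a].T
    Q31, Q32 = Q3[:, :a], Q3[:, a:]
    left = np.hstack([Q11, Q12])
    middle = np.block([
        [np.zeros((r, a)), np.zeros((r, l - a)), S, np.zeros((r, a))],
        [A, np.zeros((a, l - a)), np.zeros((a, r)), np.zeros((a, a))]])
    right = np.block([
        [Q31.T, np.zeros((a, k))],
        [Q32.T, np.zeros((l - a, k))],
        [np.linalg.solve(S, Q11.T @ N), V2t[:r, :]],
        [np.zeros((a, l)), V2t[r:, :]]])
    assert np.allclose(left @ middle @ right, np.hstack([N, M]))
    return left, middle, right
```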
5.8 Theorem. Definition 5.1 and definition 5.3 satisfy the relation νS ≥ νq.

Proof: Suppose that the strangeness index is νS and finite, as the infinite case is trivial. Let the matrices N and M in lemma 5.7 correspond to NνS and MνS as in definition 5.1.
First, let us consider νS in view of this decomposition. The left null space of M is spanned by Q1,2, and making these linear combinations of N results in

$$
Q_{1,2}^{\mathsf T} N =
\begin{bmatrix} A & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} Q_{3,1}^{\mathsf T} \\ Q_{3,2}^{\mathsf T} \\ \Sigma^{-1} Q_{1,1}^{\mathsf T} N \\ 0 \end{bmatrix}
= \begin{bmatrix} A & 0 \end{bmatrix} \begin{bmatrix} Q_{3,1}^{\mathsf T} \\ Q_{3,2}^{\mathsf T} \end{bmatrix}
$$

where A has full rank due to P1b–[5.1]. This matrix determines the tangent space of the non-differential constraints as being its null space, spanned by the independent columns of Q3,2. Hence, we can parameterize x as x = Q3,2 xd.
Turning to νq, we follow the constructive interpretation of P2–[5.3] in section 5.1.3. The right null space of $[\,N \;\; M\,]$ is spanned by the second and fourth rows of the right factor in the decomposition;

$$
\begin{bmatrix} N & M \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0
\quad\Longleftrightarrow\quad
\exists z_1, z_2 : \begin{pmatrix} x \\ y \end{pmatrix} =
\begin{bmatrix} Q_{3,2} & 0 \\ 0 & Q_{2,2} \end{bmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\tag{5.8}
$$

Extracting the part of this equation which only involves x, we find that it can be parameterized in z1 alone, and since the columns of Q3,2 are independent, we can use z1 as dynamic variables; x = Q3,2 xd.
Since the strangeness index is νS, ∇2 f Q3,2 has full column rank according to P1c–[5.1]. Hence,

$$
\begin{bmatrix} \nabla_2 f & 0 \\ N & M \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0
\quad\Longleftrightarrow\quad
\exists z_2 : \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ Q_{2,2}\, z_2 \end{pmatrix}
\tag{5.9}
$$

which is exactly the condition captured by P2–[5.3]. Since νq is the smallest index such that this condition is satisfied, it is no greater than νS.
5.9 Theorem. Given the property

• P3–[5.9] The matrix $[\,N_{\nu_q} \;\; M_{\nu_q}\,]$ has full row rank on the set Lνq ∩ bδ in definition 5.3. That is,

$$
\operatorname{rank} \begin{bmatrix} N_{\nu_q} & M_{\nu_q} \end{bmatrix} = (\nu_q + 1)\, n_x
\tag{5.10}
$$

it holds that definition 5.1 and definition 5.3 satisfy the relation νS = νq.
Proof: Due to theorem 5.8 it suffices to show νS ≤ νq. To this end, suppose the equations have finite simplified strangeness index νq, as the infinite case is trivial. Let the matrices N and M in lemma 5.7 correspond to Nνq and Mνq as in definition 5.3. The rank condition (5.10) implies that A in lemma 5.7 is non-singular.

Consider (5.8) and (5.9). Since adding the equation ∇2 f x = 0 is sufficient to conclude x = 0 given x = Q3,2 z1, it is seen that ∇2 f Q3,2 z1 = 0 must imply z1 = 0. This is only true if ∇2 f Q3,2 has full column rank, which shows that P1c–[5.1] holds. Since νS is the smallest index such that this condition is satisfied, it is no greater than νq.
We now have an alternative to the procedure of definition 5.1 for computing the strangeness index νS. First, one computes νq according to definition 5.3. If P3–[5.9] is satisfied for νq and some selection of bδ in definition 5.3, then νS = νq according to theorem 5.9. If P3–[5.9] is not satisfied for any choice of bδ, then νS = ∞. In the remaining case, when P3–[5.9] holds on some set where P2–[5.3] does not hold, νS > νq may still be finite. According to lemma 5.5 it can then be found as the smallest number νS such that P3–[5.9] holds on the intersection of LνS and bδ ⊂ ℝ^{nx(νS+2)+1}, while P2–[5.3] holds on the obvious projection of this set.
5.3 Uniqueness and existence of solutions
The present section gives a result corresponding to what Kunkel and Mehrmann
(2006, theorem 4.13) states for the strangeness index. As the difference between the
two index definitions is basically a matter of whether P3–[5.9] is required or not, the
main ideas in Kunkel and Mehrmann (2006) apply here as well.
5.10 Lemma. If the simplified strangeness index νq is finite, there exist matrix functions Z1, Z2, X, similar to those in definition 5.1. They are all smooth with pointwise linearly independent columns, satisfying

$$Z_2^{\mathsf T} M_{\nu_q} = 0 \quad\text{and the columns of } Z_2 \text{ span the left null space of } M_{\nu_q} \tag{5.11a}$$
$$Z_2^{\mathsf T} N_{\nu_q} X = 0 \quad\text{and the columns of } X \text{ span the right null space of } Z_2^{\mathsf T} N_{\nu_q} \tag{5.11b}$$
$$Z_1^{\mathsf T} \nabla_2 f\, X \text{ is non-singular} \tag{5.11c}$$

Proof: Using the decomposition of lemma 5.7, we may take Z2 ≔ Q1,2 and X = Q3,2. As in the proof of theorem 5.9, (5.8) and (5.9) then imply that ∇2 f X has full column rank, and the existence of Z1 follows.
Multiplying the relations in (5.11) by smooth pointwise non-singular matrix functions shows that the matrix functions Z1 , Z2 , X are not unique, but they can be replaced by any smooth matrices with columns spanning the same linear spaces. For
numerical purposes, the smooth Gram-Schmidt orthonormalization procedure may
be used to obtain matrices with good numerical properties, while the theoretical argument of the present section benefits from another choice, to be derived next.
Select the non-singular constant matrix P = [ Pd  Pa ] such that Z2ᵀ Nνq Pa is non-singular in a neighborhood of the initial conditions, and make a change of the undotted variables in Lνq according to

$$
x = \begin{bmatrix} P_d & P_a \end{bmatrix} \begin{pmatrix} x_d \\ x_a \end{pmatrix}
\tag{5.12}
$$
The following notation will turn out to be convenient later (note that Nνq^a is non-singular)

$$
N_{\nu_q}^{d} \triangleq Z_2^{\mathsf T} N_{\nu_q} P_d
\qquad
N_{\nu_q}^{a} \triangleq Z_2^{\mathsf T} N_{\nu_q} P_a
\tag{5.13}
$$
The next result corresponds to Kunkel and Mehrmann (2006, corollary 4.10) for the
strangeness index.
5.11 Lemma. There exists a smooth function R such that

$$x_a = R(x_d, t) \tag{5.14}$$

inside Lνq, in a neighborhood of the initial conditions.
Proof: In Lνq it holds that Fνq = 0 and Z2ᵀ Mνq = 0, and it follows that

$$
\frac{\partial\, Z_2^{\mathsf T} F_{\nu_q}}{\partial \dot{x}^{(1+)}}
= Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial \dot{x}^{(1+)}}
+ \frac{\partial Z_2^{\mathsf T}}{\partial \dot{x}^{(1+)}} F_{\nu_q} = 0
$$

Hence, the construction of Z2 is such that Z2ᵀ Fνq only depends on t and x, and the change of variables (5.12) was selected so that the part of the Jacobian corresponding to xa is non-singular. It follows that xa can be expressed locally as a function of xd and t.
We now introduce the function φ⁻¹ to describe the local parameterization of x using the coordinates xd and t,

$$
x = \varphi^{-1}(x_d, t) \triangleq P \begin{pmatrix} x_d \\ R(x_d, t) \end{pmatrix}
\tag{5.15}
$$

and the next lemma shows an important coupling between φ⁻¹ and lemma 5.10.
5.12 Lemma. The matrix X in lemma 5.10 can be chosen in the form

$$
\hat{X} = P \begin{bmatrix} I \\ \nabla_1 R(x_d, t) \end{bmatrix} = \nabla_1 \varphi^{-1}(x_d, t)
\tag{5.16}
$$
Proof: Clearly, the columns are linearly independent and smooth. By verifying that the matrix is in the right null space of Z2ᵀ Nνq we will show that its columns span the same linear space as X. It will then follow that X and X̂ are related by a relation in the form

$$\hat{X} = X\, W$$

for some smooth non-singular matrix function W. Using the form X W then shows that (5.11c) is also satisfied. Hence, it remains to show that X̂ is in the right null space of Z2ᵀ Nνq.
Using (5.14) and allowing also the dotted variables ẋ(1+) to depend on xd in (suppressing arguments)

$$\frac{\partial\, Z_2^{\mathsf T} F_{\nu_q}}{\partial x_d} = 0$$

it follows that

$$
\frac{\partial Z_2^{\mathsf T}}{\partial x_d} F_{\nu_q} + Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial x_d}
+ \left( \frac{\partial Z_2^{\mathsf T}}{\partial x_a} F_{\nu_q} + Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial x_a} \right) \frac{\partial x_a}{\partial x_d}
+ \left( \frac{\partial Z_2^{\mathsf T}}{\partial \dot{x}^{(1+)}} F_{\nu_q} + Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial \dot{x}^{(1+)}} \right) \frac{\partial \dot{x}^{(1+)}}{\partial x_d}
= 0
$$

Here, Fνq = 0 and Z2ᵀ ∂Fνq/∂ẋ(1+) = Z2ᵀ Mνq = 0 implies that

$$
Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial x_d} + Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial x_a} \nabla_1 R
= Z_2^{\mathsf T} \nabla_2 F_{\nu_q} \begin{bmatrix} P_d & P_a \end{bmatrix} \begin{bmatrix} I \\ \nabla_1 R \end{bmatrix}
= Z_2^{\mathsf T} N_{\nu_q} \hat{X} = 0
$$
Back in section 5.1.3 it was indicated that we would be able to show that a finite
simplified strangeness index implies local uniqueness of solutions. With lemma 5.7
at our disposal this statement can now be shown rather easily.
5.13 Lemma. If the simplified strangeness index is finite and x is a solution to the
dae for some initial conditions in Lνq ∩ bδ , then the solution x is locally unique.
Proof: Using the parameterization of x given by (5.15), it suffices to show that the coordinates xd are uniquely defined. By the smoothness assumptions and the analytic implicit function theorem, Hörmander (1966), it will be sufficient to show that xd′(t) is uniquely determined given xd(t) and t, since then the corresponding ode will have a right hand side which is continuously differentiable, and hence locally Lipschitz on any compact set. One may then complete the argument by applying a basic local uniqueness theorem for ode, such as Coddington and Levinson (1985, theorem 2.2).

Reusing (5.5) for the current context, xd′(t) is seen to be uniquely determined if ∇2 fd( xd, ẋd, t ) is non-singular (in some neighborhood Lνq ∩ bδ of the initial conditions). Identifying (5.6) in (5.11c), lemma 5.12 completes the proof.
With X̂ according to (5.16) it follows that

$$
Z_2^{\mathsf T} N_{\nu_q} \hat{X} = N_{\nu_q}^{d} + N_{\nu_q}^{a} \nabla_1 R = 0
\tag{5.17}
$$
using the notation (5.13). Before stating the main theorem of the section we derive one more equation. Using (5.14) and allowing also the dotted variables ẋ(1+) to depend on t in (suppressing arguments)

$$Z_2^{\mathsf T} \frac{\partial F_{\nu_q}}{\partial t} = 0$$

it follows that

$$
Z_2^{\mathsf T} \left( \nabla_1 F_{\nu_q} + \nabla_2 F_{\nu_q} \nabla_2 \varphi^{-1} + M_{\nu_q} \frac{\partial \dot{x}^{(1+)}}{\partial t} \right)
= Z_2^{\mathsf T} \left( \nabla_1 F_{\nu_q} + \nabla_2 F_{\nu_q} \nabla_2 \varphi^{-1} \right) = 0
\tag{5.18}
$$
5.14 Theorem. Consider a sufficiently smooth dae (5.1), repeated here,

$$f(x(t), x'(t), t) = 0 \tag{5.1}$$

with finite simplified strangeness index νq and where the un-dotted variables in Lνq form a manifold of dimension nd. If the set where P2–[5.3] holds is the projection of a similar Lνq+1 ∩ bδ+, and P2–[5.3] also holds on Lνq+1 ∩ bδ+ with the same dimension nd, then there is a unique solution to (5.1) for any initial conditions in Lνq+1 ∩ bδ+.
Proof: Considering how Fνq+1 is obtained from Fνq, it is seen that the equality Fνq+1 = 0 can be written

$$\nabla_1 F_{\nu_q} + \nabla_2 F_{\nu_q}\, \dot{x} + \nabla_{3+} F_{\nu_q}\, \dot{x}^{(2+)} = 0$$

Multiplying by Z2ᵀ from the left and identifying the expressions for Nνq and Mνq, one obtains

$$Z_2^{\mathsf T} \left( \nabla_1 F_{\nu_q} + \nabla_2 F_{\nu_q}\, \dot{x} \right) = 0$$
Using (5.18) and the change of variables (compare (5.12))

$$
\dot{x} = \begin{bmatrix} P_d & P_a \end{bmatrix} \begin{pmatrix} \dot{x}_d \\ \dot{x}_a \end{pmatrix}
\tag{5.19}
$$
leads to (using the notation introduced in (5.13))

$$
\begin{bmatrix} N_{\nu_q}^{d} & N_{\nu_q}^{a} \end{bmatrix}
\left( -P^{-1} \nabla_2 \varphi^{-1} + \begin{pmatrix} \dot{x}_d \\ \dot{x}_a \end{pmatrix} \right) = 0
$$
Using (5.15) and (5.17) yields

$$
-N_{\nu_q}^{a} \nabla_2 R(x_d, t) - N_{\nu_q}^{a} \nabla_1 R(x_d, t)\, \dot{x}_d + N_{\nu_q}^{a} \dot{x}_a = 0
$$

and since Nνq^a is non-singular, it must hold that

$$
\dot{x} = P \begin{pmatrix} \dot{x}_d \\ \dot{x}_a \end{pmatrix}
= P \begin{bmatrix} I \\ \nabla_1 R(x_d, t) \end{bmatrix} \dot{x}_d + P \begin{pmatrix} 0 \\ \nabla_2 R(x_d, t) \end{pmatrix}
= \nabla_1 \varphi^{-1}(x_d, t)\, \dot{x}_d + \nabla_2 \varphi^{-1}(x_d, t)
$$

Since f( x, ẋ, t ) = 0 holds by definition on Lνq, it follows that

$$
f\left( \varphi^{-1}(x_d, t),\; \nabla_1 \varphi^{-1}(x_d, t)\, \dot{x}_d + \nabla_2 \varphi^{-1}(x_d, t),\; t \right) = 0
$$
where ẋd is uniquely determined given xd and t by (5.11c) with ∇1 φ−1 = X̂ in place
of X.
Hence, the dae

$$
f\left( \varphi^{-1}(x_d(t), t),\; \nabla_1 \varphi^{-1}(x_d(t), t)\, x_d'(t) + \nabla_2 \varphi^{-1}(x_d(t), t),\; t \right) = 0
$$

has a (locally unique) solution, and the trajectory generated by

$$x(t) = \varphi^{-1}(x_d(t), t)$$

is a solution to the original dae (5.1).
5.4 Implementation
The definition of the simplified strangeness index does not prescribe that a basis
for the tangent space of x should be computed in the same way as the definition of
the strangeness index does. We have seen, however, that this basis is an important
intermediate object for the selection of equations to be discretized during numerical
integration. Two ways to compute this basis will be presented in this section, and
their computational complexity will be compared.
We shall assume that occurring matrices are full, so that there is no particular structure that can be utilized in the problem. This assumption may be highly questionable in many applications, but that just opens up for a refined analysis in the future, taking sparsity into account. We assume QR decomposition is used both for computing a well-conditioned basis for a null space, and for computing a well-conditioned basis for a range space. If sparsity is not utilized, the QR decomposition is preferably computed using Householder reflections, while Givens rotations would be used to take advantage of sparsity.
A conceptually simple way to determine the basis is to compute a basis for the right null space of $[\,N_{\nu_q} \;\; M_{\nu_q}\,]$, and project these vectors on the x-space; these are the directions in which x will be free to move under the algebraic constraints. The set of projected vectors will always contain at least nx elements, which is typically too many for the set to be a basis, and the vectors may also be poorly conditioned. Hence, to obtain a well-conditioned basis one additional computation has to be performed. This method will be referred to as the projection method below.
An alternative way to determine the basis is to follow the definition of the strangeness
index, except that one does not require the matrix Aνq to have independent rows.
This method requires computation of the left null space of Mνq and the right null
space of Aνq . Lemma 5.7 was originally developed to show that the two ways of computing the basis are equivalent. This method will be referred to as the strangeness
index method below.
5.4.1 Computational complexity
Both methods will perform two QR decompositions, one large and one small. The small one would not be required for the projection method unless a (well-conditioned) basis was sought in the end, but it is not here the big difference in computational burden is to be sought. Note that computing a complete QR decomposition involves more than twice the number of multiplications compared to only computing the upper triangular factor. Hence, for the strangeness index method, it will be more efficient to apply the same row operations to Nνq as are applied when row-reducing Mνq, than to first compute a matrix spanning the left null space and then apply it to Nνq. Similarly, for the projection method, where only the projection of a null space onto x-space is needed, it suffices to compute just the first columns of the unitary matrix. This can be implemented by row reducing the left part of the matrix

$$
\begin{bmatrix} N_{\nu_q}^{\mathsf T} & I \\ M_{\nu_q}^{\mathsf T} & 0 \end{bmatrix}
$$

and then reading the lower right block of the resulting matrix. This will, however, always involve more computation than to do the row reduction of the left block of the matrix

$$
\begin{bmatrix} M_{\nu_q} & N_{\nu_q} \end{bmatrix}
$$

since both reductions involve the same number of columns, but the former has nx more rows to take care of and will not terminate early (hence requiring ( νq + 1 ) nx − 1 Householder reflections), while the latter will terminate when the na last rows of the left block are found to be zero (thus requiring ( νq + 1 ) nx − na − 1 Householder reflections).
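As an illustration of the augmented-matrix trick (with a library QR factorization standing in for the Householder sweep, so the early-termination saving discussed above is not visible at this level), consider the following sketch; the names are illustrative:

```python
import numpy as np

def x_projection_of_nullspace(N, M, nx, tol=1e-12):
    """Projection onto x-space of the right null space of W = [N M],
    read off via the augmented-matrix trick: after row reduction of
    [W.T, [I; 0]], the sought vectors sit (transposed) in the lower
    right block."""
    W = np.hstack([N, M])
    Q, R = np.linalg.qr(W.T, mode='complete')   # row reduction of W.T
    p = min(W.shape)
    d = np.abs(np.diag(R[:p, :p]))
    r = int(np.sum(d > tol * max(d.max(), 1.0)))  # numerical rank of W
    E1 = np.zeros((W.shape[1], nx))
    E1[:nx, :] = np.eye(nx)
    lower_right = Q.T[r:, :] @ E1   # rows of Q.T below the rank, times [I; 0]
    return lower_right.T            # columns span the projected null space
```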
The comparison shows that the strangeness index method has an advantage. Still, we
think that the conceptual simplicity of the projection method adds valuable insight.
5.4.2 Notes from experiments
Experimental results are not included in the section in the usual sense, the reason
being that the two methods are equivalent. Therefore, we only include some brief
remarks based on experience from tests with our experimental implementations of
the two methods. In all examples, the simplified strangeness index has been equal to
the strangeness index.
Since the construction of Z1 is not canonical, it will generally depend on X via ∇2 f X, or more generally, on the columns used to span the tangent space of the solution manifold. We have seen that the two methods yield the same linear space spanned by the columns of X, but the construction of X differs. Hence, in our experimental setup, where Z1 has been chosen using a greedy method to pick out a subset of the rows of ∇2 f X with good condition number, the selection of rows is not always the same for the two methods. However, comparing the resulting difference in the numerical solution is meaningless, since the differences will be due to the greedy algorithm and not due to conceptual differences between the two methods.
We remark that consistent initialization (that is, finding a root of the residual Fνq+1, compare Kunkel and Mehrmann (2006, theorem 4.13)) has been a major concern for the numeric experiments. However, the Mathematica function FindMinimum has been a very useful tool, while finding a good enough initial guess for Mathematica's local search method FindRoot has turned out to be notoriously hard.
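For readers working outside Mathematica, the same workaround (minimizing the squared residual norm instead of calling a root finder directly) can be sketched with SciPy as follows. The residual callable F_nu1 for Fνq+1 and the tolerance are assumed ingredients, not part of the thesis:

```python
import numpy as np
from scipy.optimize import least_squares

def consistent_point(F_nu1, guess):
    """Find a root of the derivative array residual F_{nu_q + 1} by
    minimizing its squared norm; a Gauss-Newton-type solver is often
    more forgiving about the initial guess than a pure root finder."""
    sol = least_squares(F_nu1, guess)
    if not sol.success or np.linalg.norm(sol.fun) > 1e-8:
        raise RuntimeError("no consistent point found near the guess")
    return sol.x
```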
5.5 Conclusions and future work
We have proposed what is, in our view, a simpler way of computing the strangeness index. It gives a lower bound on the strangeness index, and when the auxiliary property P3–[5.9] holds, it gives an equivalent test. While the original definition follows a three-step procedure, the proposed definition has just one step (except that P3–[5.9] needs to be verified separately). The new index definition is also appealing due to its immediate interpretation from a numerical integration perspective.
Just as the strangeness index, the simplified strangeness index emphasizes the parameterization of all variables via a smaller number of “differential” variables, but
the corresponding dimensions are not required to be visible by looking only at Mνq .
Analogues of central results for the original strangeness index have been derived
for the simplified strangeness index. In particular, it has been shown that a finite
simplified strangeness index implies that if a solution exists, it will be unique, and
existence of a solution can be established by checking the property that defines the
index for two successive values of the index parameter.
For the simplified strangeness index, the computational complexities of two methods for computing a basis for the x tangent space have been compared. The outcome was favorable for a method closely related to the definition of the original strangeness index. The other method offers superior conceptual simplicity, and adds insight to the more efficient definition, and hence also to the original strangeness index. These observations ought to be useful when the strangeness index concept is being taught.
An important aspect of the analysis of the strangeness index provided in Kunkel and
Mehrmann (2006, chapter 4) is that the strangeness index is shown to be invariant
under some transformations of the equations which are known to yield equivalent
formulations of the same problem. It is an important topic for future research to find
out whether the simplified strangeness index is also invariant under these transformations. Another interesting topic for future work is to seek examples where νq ≠ νS in order to get a better understanding of this exceptional case.
Finally, in view of the emphasis that this thesis puts on the singular perturbation
problems arising from uncertainties in dae, it would be very interesting to derive
the singular perturbation problems related to the results of the present chapter.
6 LTI ODE of nominal index 1
This is the first chapter of three with results for uncertain dae and the related matrix-valued singular perturbation problems.
The current chapter considers the same problem as was considered in Tidefelt (2007,
chapter 6), and deals with the two major deficiencies discussed there. At the same
time, the next chapter will show that some of the central ideas here are limited to the
nominal index 1 case, so this is the chapter where the strongest results appear. The
chapter contains the results of the two papers
Henrik Tidefelt and Torkel Glad. Index reduction of index 1 dae under uncertainty. In Proceedings of the 17th IFAC World Congress, pages
5053–5058, Seoul, Korea, July 2008.
Henrik Tidefelt and Torkel Glad. On the well-posedness of numerical
dae. In Proceedings of the European Control Conference 2009, pages
826–831, Budapest, Hungary, August 2009.
as well as some variations of results in Tidefelt (2007, chapter 6), that were omitted
from Tidefelt and Glad (2008) due to space constraints. At the end, the chapter
contains a new example which indicates better applicability of the theoretical results,
adds important insights to the problem, and connects the current chapter with the
next.
The singular perturbation theory, as presented in (Kokotović et al., 1986), provides
very relevant background for this chapter. In particular, theorem 6.21 herein should
be compared with Kokotović et al. (1986, chapter 2, theorem 5.1).
The chapter is organized as follows (compare figure 6.1). Section 6.1 introduces the
problem and derives the related matrix-valued singular perturbation problem. To begin with, it is assumed that the nominal index 1 dae is pointwise index 0. Then the
new section 6.2 gives a schematic overview of the analysis and captures the essence
of the current chapter as well as chapter 8. Section 6.3 considers the decoupling of
nominal and fast uncertain dynamics. In section 6.4, we derive a matrix result which
will be the main tool for the formulation of assumptions in terms of eigenvalues.
It is applied in section 6.5 when we formulate results for ode which will then be
used when we study the fast and uncertain subsystem in section 6.6. In section 6.7
we draw conclusions regarding the original coupled system using results from previous sections. Then, section 6.8 considers what happens if the pointwise index 0
assumption is dropped. Two examples of the theory are given in section 6.9, before
the chapter is concluded in section 6.10.
6.1 Introduction
We are interested in the utility of modeling dynamic systems as unstructured uncertain differential-algebraic equations. Consider the linear time-invariant dae

$$\bar{E}\, \bar{x}'(t) + \bar{A}\, \bar{x}(t) = 0 \tag{6.1}$$
If Ē is a regular matrix, this equation is readily turned into an ordinary differential equation, but our interest is with the other case. By saying that this dae is unstructured, we mean that we cannot assume any of the matrix entries to be known exactly. Instead, we assume that there is an uncertainty model with independent uncertainty descriptions for the matrix entries, and then example 1.1 showed that additional assumptions are needed to turn the equation into an uncertain ode. To see what kind of assumptions we might find reasonable to add, we recall that the equations represent a dynamic system, and so we might be willing to make assumptions regarding system features; that is, properties of a dynamic system which are not dependent on the particular equations used to describe the system. Invertibility of Ē is not a system feature. The poles are a system feature, and the poles of an ode are given by the finite eigenvalues• of the matrix pair ( Ē, Ā ).
One way to analyze the equations is to apply a row reduction procedure to the equations, trying to bring Ē into upper triangular form (variables may be reordered as needed). Such a procedure (we think of Gaussian elimination, or QR decomposition using Givens rotations or Householder reflections) can only proceed as long as the lower right block (which remains to reduce to upper triangular form) contains an entry which can be distinguished from zero. If the uncertain matrix is not regular, the procedures will at some point fail to find an entry which can be distinguished from zero (or else the procedure would prove the regularity of the matrix), and will be unable to continue beyond that point:

$$
\begin{bmatrix} \tilde{E}_{11} & \tilde{E}_{12} \\ 0 & \tilde{E}_{22} \end{bmatrix}
\begin{pmatrix} \bar{x}_1'(t) \\ \bar{x}_2'(t) \end{pmatrix}
+
\begin{bmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix}
\begin{pmatrix} \bar{x}_1(t) \\ \bar{x}_2(t) \end{pmatrix}
= 0
\tag{6.2}
$$
•
Since all variables in the dae are considered outputs, the system is trivially observable — that is, the
eigenvalues cannot correspond to un-observable modes without corresponding system poles.
where Ẽ11 is regular, and Ẽ22 has no entries which can be distinguished from zero.
The dae is considered ill-posed if the family of solutions does not converge as the
uncertainty tends to zero, for any initial conditions in some set of interest. Hence,
showing well-posedness in this sense is a first step towards replacing the ad hoc procedure of neglecting Ẽ22 with an analysis that accounts for the error in the solution
introduced by substituting zero for Ẽ22 .
The remaining part of this introduction contains just enough analysis to reach the fundamental matrix-valued singular perturbation problem.

It is assumed that the equations are of nominal index• 1, where nominal is taken to refer to max( Ẽ22 ) = 0. This means that the matrix

$$
\begin{bmatrix} \tilde{E}_{11} & \tilde{E}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix}
\tag{6.3}
$$

is regular. When Ẽ22 = 0, the second group of equations becomes a static relation between the variables. We also assume that the initial conditions x̄(0) satisfy this relation.
Next, a change of variables leads to

$$
\begin{bmatrix} I & 0 \\ 0 & E \end{bmatrix}
\begin{pmatrix} x'(t) \\ z'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{pmatrix} x(t) \\ z(t) \end{pmatrix}
= 0
\tag{6.4}
$$

From the assumed regularity of (6.3), it follows that the new matrix

$$\begin{bmatrix} I & 0 \\ A_{21} & A_{22} \end{bmatrix}$$

and hence A22, is regular.
Maintaining the style of Tidefelt and Glad (2008), the applicability of the results in this chapter is increased by generalizing (6.4) slightly, allowing the trailing matrix to depend on E. Doing so leads to the following lti matrix-valued singular perturbation form (in chapter 7 a somewhat different form is used, given by (7.8))

$$x'(t) + A_{11}(E)\, x(t) + A_{12}(E)\, z(t) = 0 \tag{6.5x}$$
$$E\, z'(t) + A_{21}(E)\, x(t) + A_{22}(E)\, z(t) = 0 \tag{6.5z}$$

where the following properties will be used throughout the chapter.
where the following properties will be used throughout the chapter.
P1–[6.1] Property. The functions Aij shall be analytic functions, without uncertainty at 0, and with a known, finite, Lipschitz constant.
P2–[6.2] Property. The (nominal) matrix A22 (0) shall be non-singular.
That is, the uncertainty in the form (6.5) is in the matrix E and how the trailing
matrices Aij depend on E under the respective Lipschitz conditions.
• That is, the differentiation index, see definition 2.2.
Note that the Lipschitz condition for A22 together with corollary 2.47 provides that the inverse of A22(E) can be bounded by a constant, if E is required to be sufficiently small.• To ease notation, we shall often not write out the dependency of the Aij on E.
For reference to the published work behind this chapter, the notion of unstructured perturbation has to be explained. It refers to the lack of structure in the uncertainty E compared to the singular perturbation problems studied in the past (see section 2.5), where E has either been in the form ε I for a small scalar ε > 0, or a diagonal matrix with small but positive entries on the diagonal. In this thesis, the lack of structure is marked by the use of matrix-valued uncertainty instead.
6.2 Schematic overview of nominal index 1 analysis
The analysis and convergence proofs (where present) in the present and subsequent
chapters have much in common. To emphasize this, and to enhance the reading of
these chapters, this section contains a schematic overview of the common structure.
The schematic overview is given in figure 6.1, annotated below.
• (a) The solution to the uncertain part of the decoupled system will generally
not be known beyond boundedness. Hence, it is not necessary that the change
of variables converge to a known transform as max(E) → 0, but what matters
is that the relation between the solution to the slow dynamics and the original
variables converges to a known entity, and that the influence of the solution to
the uncertain dynamics on the original variables is bounded independently of
max(E).
In chapter 8, where the pointwise index of the equations is assumed zero, showing the existence of a decoupling transform with the required properties is the
main concern.
• (b) Showing that the eigenvalues of the uncertain dynamics must grow as max(E) → 0 is the main concern in chapter 7, as the rest of that chapter has much in common with the present chapter.
• (c) Since the solution η will have a non-vanishing influence on some of the
original variables, it is necessary for convergence in the original variables that
η converges uniformly to zero as max(E) → 0. In particular, it has to be shown
that there is uniform convergence at the initial time.
• (d) Making assumptions about eigenvalues is a key ingredient in the convergence proofs. In the end, these assumptions will restrict the uncertainty sets
of the uncertain entities in the equations, but we prefer making the restrictions
indirectly via the eigenvalues since these can be related to system properties.
Besides stating the assumptions, it needs to be verified that the restricted uncertainty sets are non-empty, or else any further reasoning will be meaningless.
• Had it not been for maintaining the style of Tidefelt and Glad (2008), the uncertainty model of chapter 7 would have been used instead. There, all deviations from some nominal matrices are assumed bounded by a common number m, and instead of considering max(E) → 0, one considers m → 0.
[Figure 6.1: Schematic overview of convergence proofs. The crucial steps have been marked with a thick border. The labels link to annotations in the text. The flowchart connects the blocks “Uncertain dae”, “Decoupling transform”, “Slow ode in ξ”, “Uncertain dae in η”, “Eigenvalue assumptions”, “Show |λ| → ∞”, “Show η(0) → 0”, “λ ∈ fast region”, “Bound ‖Φ(t, τ)‖2”, “η → 0 uniformly”, “Combine solutions”, and “Convergence of solutions”, with edge labels a–g.]
• (e) Using that the eigenvalues of the uncertain dynamics must tend to infinity as max(E) → 0, it can be concluded that for sufficiently small max(E) the
eigenvalues of the uncertain dynamics must belong to a subset of the assumed
region, strictly included in the left half of the complex plane.
• (f) The last crucial step is to show that the location of the eigenvalues of the
uncertain dynamics imply that the initial condition response to the uncertain
dynamics can be bounded independently of E, for sufficiently small max(E).
(Then, uniform convergence of initial conditions to zero implies that the whole
solution converges uniformly to zero.)
In the lti cases, we use results from the theory of M-matrices. In the ltv case
we also need Lyapunov-based methods.
• (g) This is the final conclusion of the analysis. While we are mainly concerned
with the qualitative property of convergence in this thesis, a review of the
proofs will indicate how the convergence may be quantified. However, some
steps in the analysis seem to give rise to gross over-estimation, making the
quantitative results unpleasant to use in real applications (the alternative being
to ignore the issue with matrix-valued singular perturbation altogether, or try
to avoid the issue by re-deriving the equations with more attention to structure,
recall section 1.2.5).
6.3 Decoupling transforms and initial conditions

Following the scheme outlined in the previous section, we now start by deriving the decoupling transform. Similarly to how this is done in most of the literature on other singular perturbation problems, the transform is divided into two steps. Since the initial conditions for the decoupled system are a direct consequence of the decoupling transform (and the initial conditions for the coupled system), results on initial conditions are also included in the present section. With this short introduction, we now begin with a lemma for the first decoupling step.
6.3 Lemma. There exists an analytic matrix-valued function L such that, for sufficiently small max(E), the change of variables

$$
\begin{pmatrix} x \\ z \end{pmatrix} =
\begin{bmatrix} I & 0 \\ L(E) & I \end{bmatrix}
\begin{pmatrix} x \\ \eta \end{pmatrix}
\tag{6.6}
$$

transforms (6.5) into the system

$$
\begin{bmatrix} I & 0 \\ 0 & E \end{bmatrix}
\begin{pmatrix} x'(t) \\ \eta'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11} + A_{12} L(E) & A_{12} \\ 0 & A_{22} - E\, L(E)\, A_{12} \end{bmatrix}
\begin{pmatrix} x(t) \\ \eta(t) \end{pmatrix}
= 0
\tag{6.7}
$$

The matrix L(E) satisfies

$$L(0) = -A_{22}(0)^{-1} A_{21}(0) \tag{6.8}$$

and a Lipschitz condition.
Proof: Applying the change of variables shows that x is eliminated from the η′ equation provided L(E) satisfies

$$
0 = A_{21}(E) + A_{22}(E)\, L(E) - E\, L(E) \left( A_{11}(E) + A_{12}(E)\, L(E) \right)
\tag{6.9}
$$

For E = 0 there is the solution

$$L(0) = -A_{22}(0)^{-1} A_{21}(0)$$

The derivative of the right hand side of (6.9) with respect to each column of L at E = 0 is A22(0), which is non-singular. It follows from the analytic implicit function theorem, (Hörmander, 1966), that the equation can be solved to give an analytic L in some neighborhood of 0. On a closed ball of positive radius within that neighborhood, L will be bounded due to its continuity. Since L′ will also be analytic (see, for instance, Krantz and Parks (2002, proposition 1.1.14)), it follows that L′ will also be bounded on the same closed ball, implying the Lipschitz condition.•
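Although the proof is non-constructive, the defining equation (6.9) is easy to solve numerically for small max(E) by a fixed-point iteration around the nominal solution L(0), in the spirit of the fixed-point methods used in chapter 7. A minimal illustrative sketch, with names that are assumptions rather than thesis notation:

```python
import numpy as np

def solve_L(E, A11, A12, A21, A22, tol=1e-12, max_iter=200):
    """Solve (6.9) by fixed-point iteration
    L <- -A22^{-1} (A21 - E L (A11 + A12 L)),
    starting from the nominal solution L(0) = -A22^{-1} A21."""
    L = -np.linalg.solve(A22, A21)
    for _ in range(max_iter):
        L_new = -np.linalg.solve(A22, A21 - E @ L @ (A11 + A12 @ L))
        if np.max(np.abs(L_new - L)) < tol:
            return L_new
        L = L_new
    raise RuntimeError("no convergence; max(E) may be too large")
```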
Let the initial conditions for (6.5) be

$$x(0) = x^0 \qquad z(0) = z^0$$
6.4 Lemma. If the initial conditions x0 and z0 are chosen to make the dae consistent for E = 0, that is,

$$0 = A_{21}(0)\, x^0 + A_{22}(0)\, z^0 \tag{6.10}$$

then the initial condition η(0) = η0(E) for the lower part of (6.7) satisfies

$$\eta^0(E) = \left[ -A_{22}(0)^{-1} A_{21}(0) - L(E) \right] x^0 \tag{6.11}$$

In particular η0 is analytic• with

$$\eta^0(E) = O\!\left( \|E\|_2 \right) \tag{6.12}$$
• In chapter 7, the decoupling transforms will be established using fixed-point methods rather than analytic calculus. Then the neighborhoods — whose existence is the only thing we care about in this chapter — will be balls with radii that are obtained constructively. This will bring theory much closer to application, and the presentation in this chapter is using analytic calculus to maintain the flavor of the published works that the chapter builds upon. We shall briefly indicate the type of results that fixed-point methods provide by looking at how a bound on L over a closed ball may be constructed.

Let a1(E), …, am(E) denote the columns of A21(E), and let l1(E), …, lm(E) be the columns of L(E). Impose a bound on E so that there exist constants k11 and k12 such that ‖A11(E)‖2 ≤ k11 and ‖A12(E)‖2 ≤ k12. Let ρ > ‖A22(0)⁻¹A21(0)‖2 denote the bound on ‖L(E)‖2 to be established and write (6.9) as

$$
-\begin{bmatrix} a_1(E) \\ \vdots \\ a_m(E) \end{bmatrix}
=
\left(
\begin{bmatrix} A_{22}(E) & & \\ & \ddots & \\ & & A_{22}(E) \end{bmatrix}
+ F(E)
\right)
\begin{bmatrix} l_1(E) \\ \vdots \\ l_m(E) \end{bmatrix}
$$

where ‖F(E)‖2 ≤ ‖E‖2 ρ ( k11 + k12 ρ ). According to corollary 2.47, the solution L(E) will satisfy the bound ‖L(E)‖2 ≤ ρ if ‖F(E)‖2 is made sufficiently small. In other words, for each ρ > ‖A22(0)⁻¹A21(0)‖2 we can construct an open ball for E, within which the bound ‖L(E)‖2 ≤ ρ holds. The derivative can be bounded similarly.
Proof: From the definition of the change of variables z = L(E) x + η it follows that

$$\eta^0(E) = z^0 - L(E)\, x^0$$

Substituting z0 from (6.10) gives (6.11), while (6.8) and the Lipschitz condition for L give (6.12).
Introduce the notation

$$A_\eta(E) \triangleq A_{22}(E) - E\, L(E)\, A_{12}(E) \tag{6.13}$$

where Aη(0) is the non-singular matrix A22(0), and a Lipschitz condition for Aη is obtained by taking E sufficiently small.
To emphasize the difference between uniform and “directional” convergence with respect to the uncertainty, the following theorem gives a result to be contrasted with lemma 6.20.

6.5 Theorem. Assume E = m E∗, where E∗ is a non-singular matrix with max(E∗) = 1, such that −E∗⁻¹ Aη(0) is Hurwitz. Then,

$$\sup_{t \ge 0} \left\| \eta(t) \right\| = O(m) \tag{6.14}$$
Proof: With the change of variables t = mτ we get

$$\frac{\partial \eta}{\partial \tau} = -E_*^{-1} A_\eta(m E_*)\, \eta, \qquad \eta(0) = \eta^0$$

with solution

$$\eta(\tau) = e^{-E_*^{-1} A_\eta(m E_*)\, \tau}\, \eta^0$$

Since −E∗⁻¹ Aη(0) is a Hurwitz point matrix, the time-scaled system with m = 0 can be shown to be uniformly $\gamma\, e^{\lambda\, \bullet}$-stable for some α( −E∗⁻¹ Aη(0) ) < λ < 0 according to theorem 2.38. Since the matrix in the exponent (as a function of m) satisfies a Lipschitz condition, and ‖−E∗⁻¹ Aη(0)‖2 is bounded since it is a point matrix, theorem 2.41 shows that there exist positive constants C1 and m0 (ignoring the exponential decay rate) such that

$$m \le m_0 \implies \left\| e^{-E_*^{-1} A_\eta(m E_*)\, \tau} \right\|_2 \le C_1$$

Since ‖η0‖ = O( ‖E‖2 ) = O( m ) the result follows.
If E∗ had been singular in theorem 6.5, other estimates would be possible, but note that the “directional” convergence is not the type of convergence we seek. Consequently, theorem 6.5 has no applications in the thesis. The following example gives a better picture of the problem we have to address.
6.6 Example
In this example, the bounding of η over time is considered in case η has two components. For simplicity, we shall assume that η is given by

$$\eta'(t) = E^{-1} \eta(t)$$

where

$$E = \epsilon \begin{pmatrix} -\delta & 1 - \delta \\ 0 & -\delta \end{pmatrix}$$

where ε ∈ ( 0, m ] and δ > 0 is a small uncertainty parameter so that max(E) ≤ m. Since

$$E^{-1} = \epsilon^{-1} \begin{pmatrix} -1/\delta & 1/\delta - 1/\delta^2 \\ 0 & -1/\delta \end{pmatrix}$$

it is seen that both eigenvalues are perfectly stable and far into the left half plane, while the off-diagonal entry is at the same time arbitrarily big. It is easy to verify using software that the maximum norm of the matrix exponential grows without bound as δ tends to zero, for any bound m. Hence, knowing that the initial conditions η(0) must tend to zero with max(E) is not sufficient to obtain a uniform bound on η which tends to zero with max(E).
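The claimed growth of the matrix exponential is easy to verify with a few lines of code. The sketch below estimates the supremum of ‖e^{E⁻¹t}‖2 over a logarithmic time grid; the parameter values are arbitrary illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

def peak_gain(eps, delta):
    """sup_t ||exp(E^{-1} t)||_2 for the E of example 6.6,
    estimated on a logarithmic time grid."""
    Einv = (1.0 / eps) * np.array(
        [[-1.0 / delta, 1.0 / delta - 1.0 / delta**2],
         [0.0, -1.0 / delta]])
    return max(np.linalg.norm(expm(Einv * t), 2)
               for t in np.logspace(-8.0, 1.0, 400))

for delta in [0.5, 0.1, 0.01, 0.001]:
    # the peak grows roughly like 1/delta, independently of eps
    print(delta, peak_gain(eps=0.1, delta=delta))
```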
In Tidefelt and Glad (2008), the unboundedness of sup_{t≥0} ‖η(t)‖ was remedied by assuming that the condition number of E be bounded, and it is easy to see that this would imply a lower bound on δ in this example, thereby solving the problem. However, assuming a bound on the condition number of E is very artificial and will not be done in the thesis.
So far, the presentation in the present chapter has been rather close to Tidefelt and Glad (2008). However, we will now omit Tidefelt and Glad (2008, lemma 5) and take a different route thanks to the improved results in Tidefelt and Glad (2009). It is the topic of section 6.6 to find conditions that can be used to provide a uniform bound on the solution η which tends to zero with E, when η satisfies

$$E\, \eta'(t) + A_\eta(E)\, \eta(t) = 0$$

The vanishing function η may then be regarded as an external input to the ode for x in (6.7), and regular perturbation techniques may be used to show that the solutions x converge as E tends to zero. However, it is also illustrative to apply a second decoupling transform which isolates the slow dynamics; the transform is guaranteed to exist by the next lemma.

Continuing on the result of lemma 6.3, the following lemma shows that the influence of η on x is small.
6.7 Lemma. There exists a matrix-valued function H such that, for sufficiently small max(E), the change of variables

$$
\begin{pmatrix} x \\ \eta \end{pmatrix} =
\begin{bmatrix} I & H(E)\, E \\ 0 & I \end{bmatrix}
\begin{pmatrix} \xi \\ \eta \end{pmatrix}
\tag{6.15}
$$

transforms (6.7) into the system

$$
\begin{bmatrix} I & 0 \\ 0 & E \end{bmatrix}
\begin{pmatrix} \xi'(t) \\ \eta'(t) \end{pmatrix}
+
\begin{bmatrix} A_{11}(E) + A_{12}(E)\, L(E) & 0 \\ 0 & A_\eta(E) \end{bmatrix}
\begin{pmatrix} \xi(t) \\ \eta(t) \end{pmatrix}
= 0
\tag{6.16}
$$

where ‖H(E)‖2 is bounded by a constant independently of E.
Proof: Applying the change of variables and then performing row operations on the equations to eliminate η′ from the first group of equations, leads to the condition defining H(E) (dropping other dependencies on E from the notation):

$$
0 = \left[ A_{11} + A_{12} L \right] H(E)\, E + A_{12} - H(E) \left[ A_{22} - E\, L\, A_{12} \right]
\tag{6.17}
$$

It follows that

$$H(0) = A_{12}(0)\, A_{22}(0)^{-1}$$

which is clearly bounded independently of E. The equation is linear in H(E), invertible at E = 0, and the coefficients depend smoothly on E, so as for L it follows that H is analytic in some neighborhood of 0. Hence, restricting H to a sufficiently small closed ball (via the selection of a sufficiently small bound on max(E)) will make ‖H(E)‖2 bounded independently of E.•
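Since (6.17) is linear in H(E) — a generalized Sylvester equation — it can also be solved directly by Kronecker vectorization for the small dense problems considered here. An illustrative sketch (not the analytic-function argument of the proof; names are assumptions):

```python
import numpy as np

def solve_H(E, A11, A12, A22, L):
    """Solve (6.17): (A11 + A12 L) H E + A12 - H (A22 - E L A12) = 0
    via vec(B H E) = kron(E.T, B) vec(H) with column-major vec."""
    B = A11 + A12 @ L                      # coefficient of H on the left
    C = A22 - E @ L @ A12                  # = A_eta(E)
    n, k = A12.shape                       # H is n-by-k
    K = np.kron(E.T, B) - np.kron(C.T, np.eye(n))
    h = np.linalg.solve(K, -A12.flatten(order='F'))
    return h.reshape((n, k), order='F')
```

At E = 0 the system matrix reduces to −A22(0)ᵀ ⊗ I, which is invertible by P2–[6.2], consistent with the lemma.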
The results Tidefelt (2007, lemma 6.4, corollary 6.1) consider the derivatives of η 0 ( E )
with respect to E, and belong in the present section. However, since these have no
counterparts in later chapters, we prefer to present them using the more constructive
fixed-point methods of later chapters, rather than using the original framework of
real analytic functions.
6.8 Lemma. Irrespectively of the rank of E, the solution L to (6.9) has the form

$$L_0(E) = -A_{22}(E)^{-1} A_{21}(E)$$
$$
L(E) = L_0(E) + A_{22}(E)^{-1} E \left[ L_0(E) \left( A_{11}(E) + A_{12}(E)\, L_0(E) \right) + m\, R_L(E) \right]
\tag{6.18}
$$

where m is an upper bound on E, and R_L(E) can be bounded independently of E, for m sufficiently small.
Proof: The following proof will never make use of the rank or pointwise non-singularity of E, which makes it valid for any rank.

Inserting (6.18) in (6.9), repeated here,

$$0 = A_{21}(E) + A_{22}(E)\, L(E) - E\, L(E) \left( A_{11}(E) + A_{12}(E)\, L(E) \right)$$

• As in lemma 6.3, more constructive formulations are easily obtained using corollary 2.47.
yields

$$
0 = A_{21}(E) - A_{22}(E)\, A_{22}(E)^{-1} A_{21}(E)
+ E \left[ L_0(E) \left( A_{11}(E) + A_{12}(E)\, L_0(E) \right) + m\, R_L(E) \right]
- E\, L(E) \left( A_{11}(E) + A_{12}(E)\, L(E) \right)
$$

and then

$$
\begin{aligned}
0 = {} & m\, E\, R_L(E) \\
& + E\, L_0(E) \left( A_{11}(E) + A_{12}(E)\, L_0(E) \right) \\
& - E\, L_0(E) \left( A_{11}(E) + A_{12}(E)\, L_0(E) \right) \\
& - E\, L_0(E)\, A_{12}(E) \left( L(E) - L_0(E) \right) \\
& - E \left( L(E) - L_0(E) \right) \left( A_{11}(E) + A_{12}(E)\, L(E) \right)
\end{aligned}
$$

Cancelling a factor of E on the left means that the above equation is implied by the one below.

$$
0 = m\, R_L(E)
- L_0(E)\, A_{12}(E) \left( L(E) - L_0(E) \right)
- \left( L(E) - L_0(E) \right) \left( A_{11}(E) + A_{12}(E)\, L(E) \right)
$$

The rest of the proof is a contraction mapping argument showing that there is a ρL > 0, such that ‖R_L(E)‖2 ≤ ρL if max(E) is required to be sufficiently small. The argument is given in section 6.A.
In addition to some details of the proof of lemma 6.8, section 6.A contains an example
of the lemma.
6.9 Corollary. If the trailing matrices in lemma 6.4 are independent of E, then the following relation gives a more precise account of the relation (6.12).

$$
\eta^0(E) = -A_{22}^{-1} E\, L_0 \left[ A_{11} + A_{12} L_0 + O(m) \right] x^0
\tag{6.19}
$$

Proof: Use lemma 6.8. That the trailing matrices are independent of E implies that the difference A22(E)⁻¹A21(E) − A22(0)⁻¹A21(0) vanishes.
6.4 A matrix result
Before we take on the problem of analyzing the dynamics of η, we need to switch context for a while and see how eigenvalue conditions can help to bound the inverse of
a small matrix. The result we need is quickly derived, and we then turn to examples
in an attempt to illustrate its qualities.
6.10 Lemma. For an invertible upper triangular matrix U, it holds that

$$
\|U\|_2 \le \lambda_{\max}(U)\, \frac{\sqrt{(a+1)^2\, n + 2\, n\, (a+2) - 1}}{a+2}
\tag{6.20}
$$

where a = max( U⁻¹ ) λmax( U ).

Proof: The difference to theorem 2.53 is only minor. Here one uses that the bound (2.90) of theorem 2.53 is increasing with a, so the bound is overestimated by inserting the expression for b and the upper bound for a.
Extending the result for upper triangular matrices to the general case is easy and
relies on the Schur decomposition of a matrix. Unfortunately, the nice property of
the Schur decomposition that all factors but one are unitary, is not quite as beneficial
as if our results had been assuming bounds on the induced 2-norm of a matrix instead
of entry-wise maximum.
6.11 Theorem. For an invertible matrix X ∈ ℝ^{n×n}, it holds that

$$
\|X\|_2 \le \lambda_{\max}(X)\, \frac{\sqrt{(n\, a + 1)^2\, n + 2\, n\, (n\, a + 2) - 1}}{n\, a + 2}
\tag{6.21}
$$

where a = max( X⁻¹ ) λmax( X ).

Proof: Let Q U Q^H = X be a Schur decomposition of X. Then ‖X‖2 = ‖U‖2, and λmax( U ) = λmax( X ), so a bound on max( U⁻¹ ) will yield a bound on ‖X‖2 by lemma 6.10. From

$$
\max(U^{-1}) \le \|U^{-1}\|_2 = \|Q\, U^{-1} Q^{\mathsf H}\|_2 = \|X^{-1}\|_2 \le n \max(X^{-1})
\tag{6.22}
$$

the result follows by substituting n max( X⁻¹ ) for max( U⁻¹ ) in (6.20).
We now turn to the examples. An exact treatment of the optimization problem bounded by (6.21),

$$
\begin{aligned}
\underset{X \in \mathbb{R}^{n\times n}}{\text{maximize}} \quad & \|X\|_2 \\
\text{subject to} \quad & \max(X^{-1}) \le m \\
& \lambda_{\max}(X) \le \hat{\lambda}
\end{aligned}
$$

appears difficult, even for as low dimension as n = 2. Instead, in example 6.12 we turn to numeric (nonlinear, global) optimization in order to obtain examples which we provide as indications of how tight the bound may actually be. The section then ends with example 6.13, where we plot the true norm against the bound for a large number of randomly generated matrices.
6.12 Example
Since we are only interested in finding good examples here, we choose to consider the problem

$$
\begin{aligned}
\underset{X \in \mathbb{R}^{n\times n}}{\text{minimize}} \quad & \max(X^{-1}) \\
\text{subject to} \quad & \lambda_{\max}(X) \le \hat{\lambda} \\
& \|X\|_2 = \|X^0\|_2
\end{aligned}
$$

where ‖X0‖2 is a feasible objective function value in the original problem for bounds m as low as indicated by solutions to this problem. The optimization problem is further simplified by a restrictive parameterization of X⁻¹ as Q U⁻¹ Qᵀ. Here, U⁻¹ is chosen as

$$
U^{-1} =
\begin{bmatrix}
\lambda^{-1} & \eta & 0 & \ldots & 0 \\
0 & \lambda^{-1} & \eta & & \vdots \\
& & \lambda^{-1} & \ddots & 0 \\
\vdots & & & \ddots & \eta \\
0 & \ldots & & 0 & \lambda^{-1}
\end{bmatrix}
\tag{6.23}
$$

with λ = −λ̂ and η chosen as some small constant (this will define X0), and the orthogonal• Q is parameterized as a composition of Givens rotations and reflections. Different combinations of reflections are enumerated, and for each enumerated combination, simulated annealing is applied to find rotations that yield a small max( Q U⁻¹ Qᵀ ).

Application of this scheme for some choices of n, λ̂, and η, gives the results shown in table 6.1. The table shows that the bound is no or only a few orders of magnitude from the ideal bound in cases of practical importance.

The following matrix manifests one of the rows of table 6.1. The reader is encouraged to verify this, but is warned that the precision in the printed matrix entries is insufficient to reproduce the λmax( X ) column with more than one accurate digit, causing the bound (6.21) to have zero accurate digits.

$$
X^{-1} =
\begin{bmatrix}
0.1518712275 & -0.1522399412 & -2.388603613 \cdot 10^{-3} & 2.33223507 \cdot 10^{-3} \\
0.1524043923 & -0.143406358 & -2.337599229 \cdot 10^{-3} & 2.282425336 \cdot 10^{-3} \\
-0.1524043968 & -0.1487240742 & -0.1475359243 & 0.1479232046 \\
0.1465683972 & 0.1522222484 & -0.152030243 & 0.1524043881
\end{bmatrix}
$$

• Orthogonal instead of unitary ensures that X⁻¹ is real.
dim X   λmax( X )    η          max( X⁻¹ )    ‖X‖2         (6.21)
  2     0.3          0.3        3.33          0.314        0.735
  2     0.3          30.        18.           2.73         3.27
  2     30.          0.3        0.18          2.73·10²     3.27·10²
  2     30.          30.        15.           2.7·10⁴      2.71·10⁴
  4     30.1         0.3        0.157         2.21·10⁴     2.23·10⁵
  4     30.          3·10⁻²     3.33·10⁻²     76.1         3.13·10³
  4     3.03·10²     0.3        0.152         2.19·10⁸     1.93·10⁹
  4     3.01·10²     3·10⁻²     1.61·10⁻²     2.21·10⁵     2.4·10⁶

Table 6.1: Some examples of the bound (6.21) of theorem 6.11 compared to the true norm. The parameters are explained in the text. All λmax entries should be 3 times 10 to an integer power; exceptions are due to numeric ill-conditioning. Note that the bound is very tight where the inequality (6.22) is tight.
6.13 Example
Another way to illustrate the bound (6.21) is to generate a large number of random instantiations of X⁻¹ and plot the true norm ‖X‖2 against the bound. Since a scaling of X⁻¹ results in the same scaling of the bound, the scaling should be fixed to make the two-dimensional nature of the plot meaningful (otherwise, a histogram over the ratios would be a better illustration). Here, the freedom to scale is used to fix the value of max( X⁻¹ ), making the x-values in the plot a monotone function of λmax( X ). Figures 6.2 and 6.3 show the result of this procedure for two different values of n.
Looking at the figures — figure 6.3 in particular — it is tempting to conclude that
there must be a possibility to tighten the bound by some factor which is independent
of λmax ( X ).
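The sampling procedure is simple to reproduce. The illustrative sketch below generates the data behind a plot like figure 6.2 for n = 2 (only the plotting is omitted; names and the random seed are assumptions):

```python
import numpy as np

def bound_6_21(Xinv):
    """Right hand side of (6.21), evaluated from X^{-1}."""
    n = Xinv.shape[0]
    X = np.linalg.inv(Xinv)
    lam = np.max(np.abs(np.linalg.eigvals(X)))   # lambda_max(X)
    na = n * np.max(np.abs(Xinv)) * lam          # n a
    return lam * np.sqrt((na + 1)**2 * n + 2*n*(na + 2) - 1) / (na + 2)

rng = np.random.default_rng(0)
pairs = []
for _ in range(5000):
    Y = rng.uniform(-1.0, 1.0, size=(2, 2))
    Xinv = Y / np.max(np.abs(Y))                 # fixes max(X^{-1}) = 1
    pairs.append((bound_6_21(Xinv),
                  np.linalg.norm(np.linalg.inv(Xinv), 2)))
# plot the true norm against the bound on logarithmic axes, as in figure 6.2
```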
[Figure 6.2: The true norm in (6.21) against the corresponding bound, plotted on logarithmic axes for n = 2 and a fixed value of max( X⁻¹ ), for 5000 random instantiations of X⁻¹. In order to obtain the fixed value (here 1) for max( X⁻¹ ), each matrix X⁻¹ was generated via an intermediate matrix Y according to X⁻¹ = max(Y)⁻¹ Y, where the entries of Y were sampled from independent uniform distributions over the interval [ −1, 1 ].]
[Figure 6.3: Analogous to figure 6.2, but with n = 3.]
[Figure 6.4: The region (white color) in the complex plane where the eigenvalues are assumed to reside, illustrating the condition (A1–6.26). The slow dynamics are assumed to have eigenvalues smaller than R0, while the fast and uncertain dynamics are assumed to have eigenvalues smaller than ā m⁻¹ and damping no less than cos(φ0). The dashed line emphasizes the upper bound on the real part of all eigenvalues larger than R0.]
6.5 An LTI ODE result
To simplify matters and to obtain results that are not limited to dae, we shall begin by assuming that the invertible −Aη is the identity matrix, leading to the fundamental question of bounding the initial condition response of the system

$$E\, \eta'(t) = \eta(t) \tag{6.24}$$

given stability, a bound on E, and some additional assumptions which remain to be stated.
Although a bound on the induced matrix norm ‖E‖2 would be convenient for the analysis, we observe that it is much more useful from an application point of view to assume a bound on the maximum size of any entry in E, and this was why theorem 6.11 was formulated accordingly. However, since max(E) ≤ ‖E‖2, any result which assumes max(E) ≤ m immediately follows if ‖E‖2 ≤ m. Hence, the results below can readily be rewritten using an ordinary induced norm instead of max(•), but doing so would introduce additional slack in the derived inequalities.
In the rest of this section, (6.24) is written in ode form,

$$\eta'(t) = M\, \eta(t) \tag{6.25}$$

and we let m > 0 be the known bound on max( M⁻¹ ), that is, max(E) ≤ m. We are ultimately interested in giving conditions under which the solutions converge as the upper bound m on max(E) tends to zero.
We shall assume that the poles of the dynamic system being modeled satisfy the
following condition, illustrated in figure 6.4,
A1–[6.14] Assumption. Let λ denote any eigenvalue of M. Assume there exist constants R0 > 0, φ0 ∈ [ 0, π/2 ), and ā > 1 such that

$$
|\lambda|\, m < \bar{a}
\qquad \text{and} \qquad
|\lambda| > R_0 \implies |\arg(-\lambda)| \le \phi_0
\tag{A1–6.26}
$$

where m is an upper bound for max( M⁻¹ ), and ā presents a trade-off between generality of the assumption and the quantitative properties of the forthcoming convergence results.
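For a sampled point matrix M, the assumption is straightforward to test numerically. A small illustrative helper (names are assumptions, not thesis notation):

```python
import numpy as np

def satisfies_A1(M, m, R0, phi0, abar):
    """Check (A1-6.26): every eigenvalue lambda of M must satisfy
    |lambda| m < abar, and |lambda| > R0 must imply
    |arg(-lambda)| <= phi0."""
    lams = np.linalg.eigvals(M)
    if np.max(np.abs(lams)) * m >= abar:
        return False
    big = lams[np.abs(lams) > R0]
    return bool(np.all(np.abs(np.angle(-big)) <= phi0))
```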
Note that if A1–[6.14] is acceptable for the value of m at hand, the assumption will
only become weaker (the feasible region for λ will grow) as we imagine smaller values
of m. It would be a subtle mistake to propose assumptions which are not satisfied for
any M, which motivates the following simple lemma.
6.15 Lemma. The condition ā > 1 for (A1–6.26) is sufficient (but not necessary) for the assumption to be possible to fulfill with some M, for arbitrarily small m. It is necessary that ā ≥ n⁻¹.

Proof: Sufficiency• follows by noting that M = −m⁻¹ I obviously satisfies max( M⁻¹ ) ≤ m with |arg(−λ)| = 0 and |λ| m = 1 for every eigenvalue λ.

That the condition ā > 1 is not necessary — at least not for n = 2 and φ0 ≥ π/4 — is demonstrated by the following example,

$$M = \begin{pmatrix} -m & m \\ -m & -m \end{pmatrix}^{-1}$$

with |λ| m = 1/√2 and |arg(−λ)| = π/4 for both eigenvalues.

Necessity of ā ≥ n⁻¹ is a consequence of any eigenvalue being greater than

$$\left\| M^{-1} \right\|_2^{-1} \ge \left( n \max(M^{-1}) \right)^{-1} \ge (n\, m)^{-1}$$

so for any eigenvalue λ it holds that |λ| m ≥ n⁻¹, and hence ā < n⁻¹ would be a contradiction.
The section now ends with a bound on the initial condition response of (6.25).
• It should be noted here that the uncertainty model max( M⁻¹ ) ≤ m is often just a coarse over-estimation of a more fine-grained model with individual uncertainty bounds on each matrix entry. Hence, some of the best known uncertainty intervals for individual matrix entries may be much smaller than m, and the matrix used to prove sufficiency here may actually not fall within these bounds. Note then, that the coarser uncertainty model has a family of solution trajectories which includes those of the more fine-grained uncertainty model. Hence, at some point, the fine-grained uncertainty model may be abandoned, and the coarser uncertainty model, which is more tractable, be used instead — it is in this situation the present lemma is to be applied.
6.16 Theorem (Main theorem for ode). Consider the ordinary differential equation

$$x'(t) = M\, x(t) \tag{6.27}$$

where max( M⁻¹ ) ≤ m. Assuming A1–[6.14], there exist constants m0 > 0 and γ < ∞ such that

$$m < m_0 \implies \sup_{t \ge 0} \|x(t)\|_2 \le \gamma\, \|x(0)\|_2$$
Proof: In view of theorem 2.27, it is seen that the initial condition response of (6.27) gets bounded if

$$\frac{\|M\|_2}{-\alpha(M)} = \frac{m\, \|M\|_2}{-m\, \alpha(M)}$$

can be bounded.

To see that the denominator is bounded from below by a constant independent of m, if m is sufficiently small, we use A1–[6.14] and lemma 6.15. Then, any eigenvalue is greater than ( n max( M⁻¹ ) )⁻¹ ≥ ( n m )⁻¹, showing that m < ( n R0 )⁻¹ implies that all eigenvalues are greater than R0, and hence −m α( M ) > m ( n m )⁻¹ cos( φ0 ) = n⁻¹ cos( φ0 ).

That the numerator m ‖M‖2 can be bounded from above follows from theorem 6.11, as it shows that

$$
m\, \|M\|_2 \le \bar{a}\, \frac{\sqrt{(n\bar{a}+1)^2\, n + 2\, n\, (n\bar{a}+2) - 1}}{n\bar{a}+2}
\tag{6.28}
$$

Combining the bound for the denominator with the bound for the numerator, one obtains

$$
\frac{\|M\|_2}{-\alpha(M)} \le \frac{\sqrt{(n\bar{a}+1)^2\, n + 2\, n\, (n\bar{a}+2) - 1}}{n\bar{a}+2}\, \frac{n\, \bar{a}}{\cos(\phi_0)}
\tag{6.29}
$$
6.17 Remark. Note the trade-off present in the selection of ā. If ā is selected very large, (A1–6.26) is easier to justify, while the bound on the gain of the initial condition response becomes larger. At the other end, as ā → n⁻¹, (A1–6.26) becomes increasingly restrictive (recall that lemma 6.15 does not even ensure consistency for ā < 1), while our bound for the gain of the initial condition response tends to (here γ is referring to the notation of theorem 2.27)

$$
\left\| e^{M t} \right\|_2 \le \gamma(n) \left( \frac{\sqrt{2^2\, n + 6\, n - 1}}{3 \cos(\phi_0)} \right)^{n-1}
= \gamma(n) \left( \frac{\sqrt{2^2\, n + 6\, n - 1}}{3} \right)^{n-1} \frac{1}{\cos(\phi_0)^{n-1}}
$$

The part of this expression that only depends on n is 1 for n = 1, 4.3 for n = 2, and grows rapidly. For n = 5, it is 9.7 · 10⁵, so even for very well dampened systems for which 1/cos( φ0 )^{n−1} ≈ 1, the bound will be huge. This highlights the qualitative nature of this work; the quantitative relation implied by the proof of theorem 6.16 will give so poor error bounds for the solution that they are rarely meaningful to apply. This problem can be handled both by removing slack in the derivation of bounds under the current assumptions, or by taking advantage of stronger assumptions.
6.6 The fast and uncertain subsystem
As we now return to dae after the two preceding sections on matrices and ode, we
make the assumption that the uncertain dae is pointwise index• 0. This assumption
will be removed in section 6.8.
Let us now give A1–[6.14] a new interpretation when considering the differential-algebraic equation

$$E\, x'(t) + A\, x(t) = 0 \tag{6.30}$$

where A is known and non-singular (the unknown but regular case will be considered soon), while E is unknown but assumed pointwise non-singular (by definition of pointwise index). For this system, the eigenvalues λ in A1–[6.14] naturally refer to the eigenvalues of ( E, A ), and m is an upper bound for max(E). In this setup, we require that ā > max(A).
6.18 Lemma. The condition ā > max(A) for (A1–6.26) in the context of (6.30) is sufficient for the assumption to be possible to fulfill with some E, for arbitrarily small m.

Proof: Just take

$$E = \frac{m}{\max(A)}\, A$$

This satisfies max(E) ≤ m with all eigenvalues of ( E, A ) equal to −max(A)\, m⁻¹.
We now obtain a corollary to theorem 6.16 by making minor changes to its proof.
6.19 Corollary. Consider (6.30). Assuming (A1–6.26) with ā > max(A), there exist constants m0 > 0 and γ < ∞ such that

$$m < m_0 \implies \sup_{t \ge 0} \|x(t)\|_2 \le \gamma\, \|x(0)\|_2$$

Proof: Compare the proof of theorem 6.16. Writing the equation as an ode,

$$x'(t) = -E^{-1} A\, x(t)$$

the ratio which needs to be bounded is seen to be

$$
\frac{\left\| -E^{-1} A \right\|_2}{-\alpha(-E^{-1} A)} = \frac{m\, \left\| -E^{-1} A \right\|_2}{-m\, \alpha(-E^{-1} A)}
$$

For the denominator, any eigenvalue is greater than

$$\left\| (-E^{-1} A)^{-1} \right\|_2^{-1} \ge \left\| A^{-1} \right\|_2^{-1} (m\, n)^{-1}$$
•
Recall definition on page 35.
168
6 lti ode of nominal index 1
so for sufficiently small m, any eigenvalue will be greater than R0 , and hence
−1
−1
−m α −E −1 A ≥ m A−1 2 ( m n )−1 cos( φ0 ) = A−1 2 n−1 cos( φ0 )
In order to apply theorem 6.11 for the numerator, we note that
−1 ≤ |λ| A−1 E 2 ≤ |λ| A−1 2 m n ≤ ā A−1 2 n C ā∗
|λ| max −E −1 A
yielding
m −E −1 A2 ≤ ā
p
( n ā∗ + 1 )2 n + 2 n ( n ā∗ + 2 ) − 1
n ā∗ + 2
Combining denominator and numerator bounds, one obtains
p
−E −1 A
( n ā∗ + 1 )2 n + 2 n ( n ā∗ + 2 ) − 1
ā∗
2
≤
∗
−1
cos( φ0 )
−α( −E A ) n ā + 2
6.20 Lemma. Compare the definition of Aη in (6.13). The conclusion of corollary 6.19 still holds if (6.30) is replaced by
!
E x0 (t) + A( E ) x(t) = 0
(6.31)
where the analytic A satisfies a Lipschitz condition in a neighborhood of zero, and
A( 0 ) is without uncertainty and non-singular.
Proof: By corollary 2.47 it follows that we can choose m0 so small that A( E )−1 2
can be bounded. The conclusion is now reached by following the steps of the proof
of corollary 6.19.
6.7
The coupled system
In Kokotović et al. (1986), results come in two flavors; one where approximations are
valid on any finite time interval, and one where stability of the slow dynamics in the
system makes the approximations valid without restriction to finite time intervals.
Compare lemma 2.34 and lemma 2.33, for the respective cases. Here, only finite time
intervals are considered, but the other case is treated just as easily.
Recall from section 6.1 how transformations of bounded condition number were
used to bring the original system (6.1) into the matrix-valued singular perturbation
form (6.5), repeated here,
!
x0 (t) + A11 (E) x(t) + A12 (E) z(t) = 0
!
E z 0 (t) + A21 (E) x(t) + A22 (E) z(t) = 0
(6.5x)
(6.5z)
6.7
169
The coupled system
Let the x̊ be the solution to
x̊0 = ( A11 (0) + A12 (0) L( 0 ) ) x̊
x̊(0) = x0
(6.32)
where L( 0 ) = −A22 (0)−1 A21 (0) according to (6.8), and let the solution to (6.5) at time
t be denoted x( t, E ), z( t, E ). For E = 0 (the nominal system), we have
x( t, 0 ) = x̊(t)
(6.33z)
z( t, 0 ) = L( 0 ) x̊(t)
(6.33z)
Summarizing the result of previous sections leads to the following theorem.
6.21 Theorem (Main theorem for pointwise index 0). Consider the form (6.5)
where E is pointwise non-singular, but otherwise unknown. The matrix expressions
Aij (E) have to satisfy a Lipschitz condition with respect to E, and A22 (0) is nonsingular (that is, the nominal dae is index 1). Assume that the initial conditions are
consistent with E = 0, and that A1–[6.14] holds with ā > max(A). Let I = [ 0, tf ] be a
finite interval of time. Then
sup |x( t, E ) − x( t, 0 )| = O(max(E))
(6.34x)
t∈I
sup |z( t, E ) − z( t, 0 )| = O(max(E))
(6.34z)
t∈I
Proof: Define L( E ) and H( E ) as in section 6.3,
the solution in terms of
and consider
ξ and η in (6.16). According to lemma 6.4, η( 0, E ) = O( kEk2 ) = O( max(E) ), and
then lemma 6.20 shows that supt≥0 η( t, E ) = O( max(E) ).
Note that x( t, 0 ) coincides with ξ( t, 0 ), so the left hand side of (6.34x) can be
bounded as
sup |x( t, E ) − x( t, 0 )| = sup ξ( t, E ) + H( E ) E η( t, E ) − ξ( t, 0 )
t∈I
t∈I
≤ sup |ξ( t, E ) − ξ( t, 0 )| + O( max(E)2 )
t∈I
To see that the first of these terms is O( max(E) ), note first that lemmas 6.4 and 6.7
give that the initial conditions for ξ are only O( max(E)2 ) away from x0 . Hence,
the restriction to a finite time interval gives that the contribution from initial conditions is negligible. The difference between the state feedback matrix of ξ( •, E )
and ξ( •, 0 ) in (6.16) is seen to be O( max(E) ) by using the Lipschitz conditions in
A11 (E) + A12 (E) L( E ). Hence, the contribution from perturbation of the state feedback matrix for ξ is O( max(E) ) according to lemma 2.34.
Concerning z,
sup |z( t, E ) − z( t, 0 )| = sup z( t, E ) + A22 (0)−1 A21 (0) x( t, 0 )
t∈I
t∈I
≤ sup z( t, E ) + A22 (0)−1 A21 (0) x( t, E )
t∈I
+ sup A22 (0)−1 A21 (0) ( x( t, 0 ) − x( t, E ) ) t∈I
170
6 lti ode of nominal index 1
The proof is completed by noting that
sup A22 (0)−1 A21 (0) ( x( t, 0 ) − x( t, E ) ) ≤ A22 (0)−1 A21 (0) 2 O( max(E) )
t∈I
= O( max(E) )
and
sup z( t, E ) + A22 (0)−1 A21 (0) x( t, E ) ≤ sup |z( t, E ) − L( E ) x( t, E )|
t∈I
t∈I
+ O( max(E) ) sup |x( t, E )|
t∈I
= sup η( t, E ) + O( max(E) ) sup |x( t, E )|
t∈I
t∈I
= O( max(E) )
since |x( t, E )| can be bounded over any finite time interval.
An immediate consequence of theorem 6.21 is the establishment of an O max Ẽ22
bound for the error introduced by neglecting Ẽ22 in (6.2). From a practical point of
view, though, this observation appears to be only of minor interest since determining
A22 (E) (or at least A22 (0)) seems necessary for any quantitative analysis of the fast
and uncertain dynamics.
6.8
Extension to non-zero pointwise index
With the exceptions of some results (including lemmas 6.3, 6.4, 6.7, and 6.8), the results so far require, via lemma 6.20, that E (or Ẽ22 in (6.2)) be pointwise non-singular.
However, we are able to obtain some results also when some singular values of E are
exactly zero. To that end, the results of the previous section will be extended to this
situation by revisiting the relevant proofs.
Since there are only finitely many choices of rank for E (that is, how many non-zero
singular values there are), showing convergence for an arbitrary value of the rank
immediately implies convergence independently of the rank.
6.22 Lemma. (Compare lemma 6.20.) In addition to the assumptions of lemma 6.4,
assume the perturbed dae has pointwise index no more than 1, and that its poles
(that is, the finite eigenvalues of the associated matrix pair) satisfy A1–[6.14]. Then,
sup E η( t, E ) = O( max(E)2 )
t≥0
Proof: The case of pointwise index 0, when E is full-rank, was treated in lemma 6.20,
so it remains to consider the case of pointwise index 1. When the rank of E is zero,
E = 0 and it is immediately seen from (6.7) that η must be identically zero and the
conclusion follows trivially. Hence, assume that the rank is neither full nor zero and
6.8
171
Extension to non-zero pointwise index
let
h
E = U1 (E)
U2 (E)
"
#"
#
i Σ(E) 0 V (E)T
1
0
0 V2 (E)T
be an SVD of E where Σ(E) is pointwise non-singular and
! of known dimensions.
η̄1
Applying the unknown change of variables η = V (E)
and the row operations
η̄2
represented by U (E)T, (6.7) turns into (dropping E from the notation)


 0  

I
  ξ (t)  A11 + A12 L A12 V1 A12 V2   ξ(t)  !



Σ 0 η̄10 (t) + 
K22
K23  η̄1 (t) = 0


 0  

0 0 η̄2 (t)
K32
K33
η̄2 (t)
where, for instance and in particular,
4
K33 = U2T A22 V2 − U2T E L A12 V2
= U2T A22 V2
Since the dae is known to be pointwise index 1, differentiation of the last group of
equations shows that K33 (E) is non-singular, and hence the change of variables
!
! "
#
I
0 η̄¯1 (t)
η̄1 (t)
(6.35)
=
η̄2 (t)
−K33 (E)−1 K32 (E) I η̄¯2 (t)
leads to the dae in ( ξ, η̄¯1 , η̄¯2 ) with matrix pair

 
−1
 I
 A11 + A12 LE A12 V1 − A12 V2 K33 K32


−1
 


Σ 0 , 
K22 − K23 K33 K32
 
 
0 0
0

A12 V2 

K23 

K33





It is seen that η̄¯2 = 0 and that η̄¯1 is given by an ode with state feedback matrix
4
Mη̄¯1 (E) = −Σ(E)−1 K22 (E) − K23 (E) K33 (E)−1 K32 (E)
Just like in lemma 6.20 it needs to be shown that the eigenvalues of this matrix grow
as max(E)−1 , but here we need to recall that E is not only present in Σ(E), but also in
the unknown unitary matrices U (E) and V (E).
Let m be a bound on max(E). Then kΣ(E)k2 = kEk2 ≤ m n.
From the partial block matrix inversion formula
"
#−1 
−1
−1

K22 (E) K23 (E)
=  K22 (E) − K23 (E) K33 (E) K32 (E)
K32 (E) K33 (E)
?
it follows that
"
K22 (E)
K (E)
32
K23 (E)
K33 (E)

?

?
#−1 −1 −1
≥ K22 (E) − K23 (E) K33 (E) K32 (E)
2
2
172
6 lti ode of nominal index 1
and hence
K22 (E) − K23 (E) K33 (E)−1 K32 (E) −1 2
−1 T
≤ U (E) A22 (E) − E L( E ) A12 (E) V (E)
2
−1
T
= V (E) A22 (E) − E L( E ) A12 (E)
U (E)
2
−1 = A22 (E) − E L( E ) A12 (E) (6.36)
2
This means that the eigenvalues are bounded from below by
−1
−1
−1 Mη̄¯1 (E)−1 = K22 (E) − K23 (E) K33 (E)−1 K32 (E)
Σ(E)
2
2
−1 −1
≥ m−1 n−1 A22 (E) − E L( E ) A12 (E) 2
and just like in lemma 6.20 the expression gives that the eigenvalues of Mη̄¯1 (E) grow
as m−1 . Before reaching the same conclusion as in lemma 6.20, it remains to show
that the constant ā∗ in the proof of corollary 6.19• can be chosen finite, but this
also
follows from (6.36). Hence, E can be chosen sufficiently small to make supt≥0 η̄¯1 (t)
bounded by some factor times η̄¯1 (0). Further,
! ! !
η̄¯ (0) η̄ (0) η̄ (0) 1
1
1
= η 0 ( E ) = O( max(E) )
¯
η̄1 (0) = ¯
=
≤
(6.37)
η̄2 (0) 0 η̄2 (0) Using this, the conclusion finally follows by taking such a small bound m on max(E),
!
!
¯1 (t) I
0
η̄
E η( t, E ) = E V
−1
0 −K33
K32 I
! !
I
0 Σ 0 T
O( max(E) )
≤ U
V V
−1
0 0
−K33
K32 I 2
!
Σ 0 O( max(E) ) = O( max(E)2 )
=
0 0 2
6.23 Corollary. Lemma 6.22 can be strengthened when z has only two components.
Then, just like in lemma 6.20, the conclusion is
sup η( t, E ) = O( max(E) )
t≥0
Proof: The only rank of E that needs to be considered is 1, and then η̄¯1 will be a
scalar and commute with the corresponding transition matrix φη̄¯1 . From (6.35) and
the last two equalities of (6.37) it follows that K33 (E)−1 K32 (E) η̄¯1 (0) = O( max(E) ),
•
The matrix A in the proof of corollary 6.19 is the trailing matrix of (6.30), here corresponding to the
trailing matrix of the dae for η̄¯1 .
6.8
173
Extension to non-zero pointwise index
and hence
!
! η̄1 (t) η̄¯1 (t)
sup = sup −K (E)−1 K (E) φ ¯ ( t, 0 ) η̄¯ (0) η̄
(t)
33
32
η̄1
1
2
t≥0
t≥0
−1
¯
¯
≤ sup η̄1 (t) + K33 (E) K32 (E) η̄1 (0) sup φη̄¯1 ( t, 0 )2
t≥0
t≥0
= O( max(E) )
It just remains to use η(t) = η̄(t). An alternative proof is included as a footnote.•
Theorem 6.21 can be extended as follows.
6.24 Theorem. Consider the setup (6.5), but rather than assuming that E be pointwise non-singular, it is assumed that E is a matrix with max(E) ≤ m, and that the
dae has pointwise index no more than 1. Except regarding E, the same assumptions
that were made in theorem 6.21 are made here. Then
sup |x( t, E ) − x( t, 0 )| = O( max(E) )
(6.38x)
t∈I
sup |E [z( t, E ) − z( t, 0 ) ] | = O( max(E)2 )
(6.38z)
t∈I
where the rather useless second equation is included for comparison with theorem 6.21.
Proof: Define L(E) and H(E) as above, and consider the solution expressed in the
variables ξ and η. Lemma 6.22 shows how E η is bounded uniformly over time. Note
that x( t, 0 ) coincides with ξ( t, 0 ), so the left hand side of (6.38x) can be bounded as
sup |x( t, E ) − x( t, 0 )| = sup ξ( t, E ) + H( E ) E η( t, E ) − ξ( t, 0 )
t∈I
t∈I
≤ sup |ξ( t, E ) − ξ( t, 0 )| + O( max(E)2 )
t∈I
The conclusion concerning x then follows by an identical argument to that found in
the proof of theorem 6.21.
For the weak conclusion regarding E z, the relation z = L( E ) x + η together with
(6.38x) and lemma 6.22 immediately yields the bound.
•
The alternative proof uses that K33 (E)−1 K32 (E) is a scalar, and hence of perfect condition number. It
follows that
sup η̄2 (t) ≤ K33 (E)−1 K32 (E)2 sup η̄¯1 (t)
t≥0
t≥0
≤ K33 (E)−1 K32 (E)2 sup φη̄¯1 ( t, 0 ) η̄¯1 (0)
t≥0
2
≤ sup φη̄¯1 ( t, 0 ) K33 (E)−1 K32 (E)2
2
t≥0
|
K (E)−1 K (E) −1 η̄ (0)
33
32
2
2
{z
=1
}
174
6 lti ode of nominal index 1
The following result reminds of the example concerning multiple time scale singular
perturbations given in Abed and Tits (1986), see section 2.5.3. That is, this is not the
first time in the history of singular perturbations that a result has been shown only
in the case when the fast dynamics has dimension two. Unlike Abed and Tits (1986),
however, we have not been able to show that the result holds only in this case.
6.25 Corollary. Theorem 6.24 can be strengthened in case z has only two components. Then (6.38z) is replaced by
sup |(z( t, E ) − z( t, 0 ) ) | = O( max(E) )
t∈I
Proof: Follows by repeating the argument of theorem 6.21 using corollary 6.23.
6.26 Remark. Regarding the failure to show convergence in z unless it has at most two components: Having excluded the possibility of bounding η by looking at the matrix exponential
alone, it remains to explore the fact that we are actually not interested in knowing the maximum gain from initial conditions to later states of the trajectory of η. That is, since the initial
conditions are a function of E, it might be sufficient to maximize over a subset of initial conditions. Here, it is expected that lemma 6.8 will come to use. Compare section 7.4.
6.9
Examples
The examples given here are primarily meant to illustrate the convergence property
being established in this chapter. We shall consider an uncertain dae and plot trajectories of randomly selected instances of the dae which are in agreement with the
assumptions proposed in this chapter. In order to make a close connection to theory,
the spread of these trajectories should be related to the bounds which can be constructed from the proofs herein. However, as has been indicated above, these bounds
will be overly pessimistic, and as we work them out it will be clear that they give
bounds which do not fit into our plots. Again, this stresses the qualitative nature of
our results.
The first example is constructed from the index 1 dae in the variable x̄, with matrix
pair (written A
E)
1.3 0.17 4.6 · 10−2
0.34 0.66 0.66
0.87 0.83 0.14
111
000
000
with the only finite eigenvalue −2.32. By operating on the equation with random
orthogonal matrices from both sides, an equally well behaved system of equations
should be obtained. The rows and columns are ordered so that the best pivot entry
is at the (1, 1) position — otherwise numerics would come out worse than necessary.
Finally an interval uncertainty of ±1.0 · 10−2 is added to each matrix entry. This results
6.9
175
Examples
in the pair
0.21±1 · 10−2 1.3±1 · 10−2 3.2 · 10−2 ±1 · 10−2
0.69±1 · 10−2 0.38±1 · 10−2
0.65±1 · 10−2
0.85±1 · 10−2 0.78±1 · 10−2
0.14±1 · 10−2
!
1.1±1 · 10−2
0.95±1 · 10−2
0.99±1 · 10−2
6.5 · 10−2 ±1 · 10−2 5.9 · 10−2 ±1 · 10−2 6.2 · 10−2 ±1 · 10−2
−5.2 · 10−2 ±1 · 10−2 −4.7 · 10−2 ±1 · 10−2 −4.9 · 10−2 ±1 · 10−2
!
6.27 Example
To illustrate theorem 6.21, we first take the equations into the form (6.5), propagating
interval uncertainties,
!
0.2±1.1 · 10−2
1.1±3.5 · 10−2
−0.16±2.4 · 10−2
−2
−2
−3
−2
0.68±1.3 · 10
−0.31±4.8 · 10
8.7 · 10 ±3.6 · 10
0.86±1.3 · 10−2 6.5 · 10−2 ±5.1 · 10−2 −0.68±3.9 · 10−2
(6.39)
1
0
0
0 −1.9 · 10−4 ±2 · 10−2 −2 · 10−4 ±2.1 · 10−2
0 1.9 · 10−4 ±2 · 10−2 1.9 · 10−4 ±2 · 10−2
This gives us the bound m for max(E), and noting that what is A in (A1–6.26) will be
close to A22 (E) in (6.5), we get approximately the lower bound for ā. Picking values
for ā, φ0 , and with R0 = 5 (this is big enough to encompass the slow eigenvalue, as
will be seen soon), the region where eigenvalues are allowed has been determined.
Ignoring the O(max(E)) terms in the transform to the form (6.7), we still obtain an
approximation of the interval uncertainties in (6.7), and are thus able to obtain an
approximate interval for the eigenvalue of the slow dynamics. It turns out to be −2.5±
1.2, so at least the uncertainties do not destroy the stability of the slow dynamics.
Next, the interval uncertainties in (6.5) are replaced by independent rectangular random variables, and the random equation is then sampled. The samples in the leading
matrix are scaled such that max(E) = m (expecting the constraint max(E) = m to be
active in the worst case). Samples which do not fulfill the eigenvalue condition are
rejected, while valid samples are used to compute and plot the trajectory of x̄1 to give
an idea of the uncertainty in the solution, see figure 6.5.
For comparison, we outline how the bound based on the proofs herein can be approximated. Since the uncertain leading matrix of the fast dynamics will be of the same
order of magnitude as the interval uncertainty, and since this will also be the order of
magnitude of the initial conditions for the fast and uncertain dynamics, the crucial
question is how the initial condition response gain of the fast and uncertain
dynam
ics can be bounded. The tightest bound is obtained with ā∗ = max(A) A−1 2 n,
where A is the trailing matrix of thefast and uncertain dynamics. Conservatively
over-estimating each of max(A) and A−1 2 independently of the other (where the
latter norm is approximated using local optimization), it is found that ā∗ < 17.5. This
corresponds to a gain of 4.2 · 104 . The inverse of this gain is an upper bound for the
order of magnitude that can be tolerated in the original uncertainty, if the computed
uncertainty estimates shall be at all usable. Being two orders of magnitude smaller
than the size of the uncertainties used to generate figure 6.5, this appears overly conservative.
176
6 lti ode of nominal index 1
1
0.5
0
0
0.5
1
1.5
Figure 6.5: Random trajectories from systems in the form (6.5), satisfying the
eigenvalue condition. The uncertainty intervals around each region is due to the
uncertainty in the change of variables leading to (6.5).
We shall now consider the same uncertain dae once more. This time, we will
bring the equations into a form where theorem 6.16 can be applied instead of
corollary 6.19.
In reaching (6.39), uncertain row and column operations had to be performed on the
equations. Since the elimination operations are given by simple rational expressions
in the uncertain entries of the matrices, it is straightforward to compute the uncertainty in the transforms. At this point, however, it is time to apply the decoupling
transforms, being given as the solution to second order matrix equations, and accurately computing the resulting interval uncertainties is non-trivial. We note the
following — two approximate and one conservative — options
• Compute the nominal solution, and then find an approximate interval solution
by solving the first order approximations at the nominal solution.
• Compute the nominal solution, and use it as a starting point in a local optimization method which optimizes the lower and upper interval bounds for each entry in the solution. Though a theoretically sound approach, this method has the
disadvantages that it is both time consuming and relies on local optimization
in possibly non-convex problems.
• Derive L and H using the same technique as is done later in section 7.A. Then
outer solutions for the uncertainties follow from the constructive style of the
fixed-point argument, and iterative refinement may then be applied to decrease
the uncertainties while maintaining the property of being outer approximations. This technique will be applied in example 7.3 on page 194.
Since the only conservative option relies on a technique which was not used in the
present chapter, we opt for one of the approximations in the present section. For
the particular problem of finding the interval solution to the matrix inverse problem,
6.9
177
Examples
x1
x2
0
0.5
1
1.5 t
x3
0
0.5
1
1.5 t
1
−0.2
0.5
−0.5
0
−1
−0.4
−0.6
0.5
1
1.5 t
Figure 6.6: Bounds on the uncertainty in the solution. That the bounds do not
converge as the solution components tend to zero is a consequence of the emphasis being on uniform convergence. Converging bounds are easily obtained
by using the time dependency of the bound in theorem 2.27.
examples show that the two approximation approaches often produce very similar
results, while the result as computed by Mathematica is distinctively more conservative. Even though we are just computing approximate solutions, we shall use the result computed by Mathematica since it presumably uses techniques which have been
more carefully developed than the two approximations listed above. In case a first
order approximation leads to a problem which can be solved via matrix inversion, we
will do so.
Now we have the tools to proceed with the decoupling transforms.
6.28 Example
In this example, we will be able to derive bounds on the uncertainty in the solution
to the coupled system, which — if not tight — are at least possible to visualize in
the same plots as the nominal solution. To this end we will use smaller interval
uncertainties than in the previous example; instead of ±1.0 · 10−2 we add just ±1.0 · 10−6
to the unperturbed matrix entries. Instead of (6.39), we now obtain
!
0.2±1.1 · 10−6
1.1±3.5 · 10−6
−0.16±2.4 · 10−6
0.68±1.3 · 10−6 −0.31±4.8 · 10−6 9.1 · 10−3 ±3.6 · 10−6
0.86±1.3 · 10−6 6.5 · 10−2 ±5 · 10−6 −0.68±3.9 · 10−6
(6.40)
1
0
0
0 −1.9 · 10−12 ±2 · 10−6 −2 · 10−12 ±2.1 · 10−6
0 1.9 · 10−12 ±2 · 10−6 1.9 · 10−12 ±2 · 10−6
Recall the equations for the decoupling transforms, (6.9) and (6.17). In the nota−1
tion of these equations, knowing that L( E ) = −M22
M21 + O( max(E) ) and H( E ) =
−1
M12 M22 + O( max(E) ), the first order approximations are seen to be
!
−1
−1
0 = A21 + A22 L + E M22
M21 A11 − A12 M22
M21
! −1
0 = A11 + A12 L( 0 ) M12 M22
E + A12 − H A22 − E L( 0 ) A12
178
6 lti ode of nominal index 1
These equations are solved using matrix inversion, and after application of the two
decoupling transforms, the pair becomes
!
2.3±1.8 · 10−4
0
0
0
−0.31±1.3 · 10−5 9.1 · 10−3 ±4.8 · 10−6
0
6.5 · 10−2 ±1.3 · 10−5 −0.68±5.1 · 10−6
(6.41)
1
0
0
0 −1.9 · 10−12 ±2 · 10−6 −2 · 10−12 ±2.1 · 10−6
0 1.9 · 10−12 ±2 · 10−6 1.9 · 10−12 ±2 · 10−6
As a last step, a matrix inverse is applied to the rows of the fast and uncertain dynamics to bring it into the form of theorem 6.16;
2.3±1.8 · 10−4 0 0
0
0
1
10
01
0
0
0 6.1 · 10−12 ±6.6 · 10−6 6.2 · 10−12 ±6.7 · 10−6
0 −2.2 · 10−12 ±3.6 · 10−6 −2.3 · 10−12 ±3.7 · 10−6
(6.42)
In the previous example, the initial conditions were never stated explicitly. Given
the unperturbed system at hand, the set of initial conditions which are valid for arbitrarily small perturbations form a one-dimensional
linear space. Fixing the first
0
component to 1 implies x̄ = 1. −0.923 −0.619 . Transforming to the variables of
(6.42), the initial conditions are given by
!  −0.415 ± 7.87 · 10−5 


ξ0
=  1.95 · 10−9 ± 3.4 · 10−4  , η 0 ≤ 4.24 · 10−4
0


η
1.59 · 10−9 ± 2.52 · 10−4
With n = 2, ā = 1.1, and φ0 = 1.4, (6.29) used in theorem 2.27 gives the bound
83.7 on the gain from η 0 to η(t). This allows each of the components of η(t) to be
bounded by a small constant. Concerning ξ(t), it is a scalar system, so upper and
lower bounds are easy to compute given the intervals of ξ 0 and the corresponding
eigenvalue (seen in (6.42)). Inverting the variable transforms one by one•, we are
finally able to compute interval uncertainties in the original variables. The bounds
are shown in figure 6.6.
6.10
Conclusions
The chapter has derived a matrix-valued singular perturbation problem related to
the analysis of uncertain lti dae of nominal index 1. The perturbation problem has
been solved using assumptions in terms of system features, namely its poles. Depending on whether we also made the assumption that the dae be pointwise index 0
or not, the convergence results come out a bit different except for when the fast and
uncertain dynamics has at most two states.
The decoupling transformations related to the matrix-valued singular perturbation
•
It is also possible to compute the aggregated variable transform by multiplying the individual transforms,
and then compute just one inverse, but it turns out that this causes loss of precision compared to computing the inverses of the individual transforms.
6.10
Conclusions
179
problem have been analyzed using asymptotic arguments. The reason for not deploying the more constructive methods used in the following chapters has been to
preserve the style of the original published work that the chapter builds upon. (Except for that, though, the style of presentation has been changed substantially for
better compatibility with the following chapters.)
The problem of bounding the norm of a matrix, given a bound on the moduli of its
eigenvalues and an entry-wise max bound on its inverse, has been motivated as a
useful tool in analysis of uncertain dae. A bound has been derived, and its quality
has been addressed in examples.
An example was used to illustrate the usefulness of analyzing the equations in a form
where the uncertainties of the fast and uncertain sub-system were brought entirely
to the leading matrix. This idea will appear again in the next chapter, when we set
out to understand perturbed equations of nominal index 2.
Appendix
6.A
Details of proof of lemma 6.8
This section proves that there exists a ρL > 0 such that the equation
!
0 = m RL (E)
− L0 ( E ) A12 (E) L( E ) − L0 ( E )
− L( E ) − L0 ( E ) A11 (E) + A12 (E) L( E )
appearing in lemma 6.8 has a solution satisfying kRL (E)k2 ≤ ρL if max(E) is required
to be sufficiently small.
The equation is written as the fixed-point form
!
RL (E) = TL ( RL (E), E )
by defining
1 0
L ( E ) A12 (E) L( E ) − L0 ( E )
m
1 +
L( E ) − L0 ( E ) A11 (E) + A12 (E) L( E )
m
where the dependence on RL (E) is through L( E ).
4
TL ( RL (E), E ) =
From here on, the dependency of RL (E) and TL ( RL (E), E ) on E is dropped from
the notation, so instead of TL ( RL (E), E ) we just write TL RL . Consider RL ∈ L =
RL : kRL k2 ≤ ρL .
By the bounded derivative of all matrices in the problem that depend on E, it follows
that my requiring max(E) to be sufficiently small, it will be possible to find c3 , c0 , c11 ,
180
6.A
181
Details of proof of lemma 6.8
+
c11
, c12 which fulfill
A22 (E)−1 ≤ c3
2
L0 ( E ) ≤ c0
2
kA11 (E)k2 ≤ c11
+
A11 (E) + A12 (E) L0 ( E ) ≤ c11
2
kA12 (E)k2 ≤ c12
Applied to (6.18) these bounds yield
+
L( E ) − L0 ( E ) = m c3 c0 c11
+
m
ρ
L
2
To ensure that TL maps L into itself, first note that
1 L( E ) − L0 ( E ) c1
kTL RL k2 ≤
2
m
where upper bounds on m and ρL are used to ensure that c1 can be chosen to fulfill
(dropping dependencies on E from the notation)
+
L0 A12 + A11 + A12 L0 + m c3 c0 c11
+ m ρL kA12 k2 ≤ c1
2
2
for the lowest of all bounds imposed on max(E).
Setting
+
ρL B ( 1 + αL ) c3 c0 c11
c1
(6.43)
for some αL > 0, will then yield the condition
m≤
+
αL c0 c11
αL
1
=
ρL
1 + αL c3 c1
(6.44)
for TL to map L into itself.
For the contraction part of the argument, let L1 and L2 denote the expressions for
L( E ) corresponding to RL,1 and RL,2 , respectively. Then
( L2 − L0 )( A11 + A12 L2 ) − ( L1 − L0 )( A11 + A12 L1 )
= ( L1 − L0 ) A12 ( L2 − L1 ) + ( L2 − L1 ) ( A11 + A12 L2 )
As (6.18) shows that L( E ) is affine in m RL with the matrix coefficient A22 (E)−1 E
acting from the left, one obtains
1 0
TL RL,2 − TL RL,1 =
L A12 A−1
22 E m RL,2 − m RL,1
m
1
+ ( L1 − L0 ) A12 A−1
22 E m RL,2 − m RL,1
m
1
+ A−1
22 E m RL,2 − m RL,1 ( A11 + A12 L2 )
m
= L1 A12 A−1
E
R
−
R
L,2
L,1
22
−1
+ A22 E RL,2 − RL,1 ( A11 + A12 L2 )
182
6 lti ode of nominal index 1
Using upper bounds on m and ρL to ensure that cL may be chosen to fulfill
+
+ m ρL ≤ cL
kLk2 ≤ c0 + m c3 c0 c11
one obtains (using nz to denote the dimension of E)
TL RL,2 − TL RL,1 ≤ m nz c3 ( c11 + 2 c12 cL ) RL,2 − RL,1 2
2
Hence, the condition
m<
1
nz c3 ( c11 + 2 c12 cL )
(6.45)
implies that TL is a contraction on L, and the contraction principle (theorem 2.44)
gives that there is a unique solution RL ∈ L.
This concludes the argument, and the section ends with an small example.
6.29 Example
To illustrate lemma 6.8, let us consider a small example with constant matrices in
(6.5) given by
#
#
"
"
0.1 0.5 2.
1. 0.5
A12 =
A11 =
0.5 0.1 1.
0.1 1.




1. 
1.
2. 
−1.
 0.5
 3.


0.3  A22 =  0
1. 0.5
A21 = 




0.5 −0.5
−0.5 0.5 0
Sampling the matrix E randomly a large number of times, indicates that ρL might be
as low as 3.5 · 103 for m < 1.0 · 10−3 .•
Supposing (guided by the Monte Carlo analysis) that the bounds to be derived will
show that ρL ≤ 10 · 103 , and require m to be less than 0.01, one obtains the following
numeric values of the constants used in the proof of lemma 6.8.
c3 = 3.561
c0 = 7.598
c12 = 2.311
c1 = 28.06
c11 = 1.32
+
= 6.31
c11
cL = 12.87
Taking αL = 0.5 yields
ρL = 7.185 · 103
and the following two bounds on m
m ≤ 3.336 · 10−3
and
m < 1.54 · 10−3
These values are in accordance with the supposed bounds.
For a particular value of m, we may now use interval arithmetic fixed-point iteration
to improve the bound on RL . In this example, m = 1.0 · 10−3 and five fixed point
•
The entries of E are sampled from independent uniform distributions over [ −m, m ]. For each E, we first
compute L, and then solve for RL in (6.18). In case E is not full rank, it is still possible to solve for RL by
using pseudo inverse techniques.
6.A
Details of proof of lemma 6.8
iterations results in


[ −6.476 · 103 , 6.476 · 103 ] [ −1.817 · 102 , 1.817 · 102 ]


RL ∈ [ −5.322 · 103 , 5.322 · 103 ] [ −1.517 · 102 , 1.517 · 102 ]


3
3
2
2
[ −4.716 · 10 , 4.716 · 10 ] [ −1.331 · 10 , 1.331 · 10 ]
The improvement of the entry-wise bounds is larger for smaller values of m.
183
7
LTI ODE
of nominal index 2
In chapter 6 the convergence of uncertain dae of true and nominal indices at most
1 was considered. In this chapter, the nominal index 2 case will be considered. As
in the nominal index 1 case, the analysis depends on the true index, and to simplify
matters true indices higher than 0 will not be considered.
For many purposes, it turns out that it is useful to distinguish between dae based
on their index; lower index dae or higher index dae. Only index-0 and index 1 are
considered low (recall that these have strangeness index 0), and these are generally
considered easy to deal with in comparison with the higher indices. Hence, the current chapter opens the door to the analysis of equations which are expected to be
difficult to deal with in comparison to the equations in previous chapters.
The chapter is organized as follows. In section 7.1 a canonical form for perturbed
matrix pairs of nominal index 2 is proposed. Section 7.3 contains an analysis of the
growth of the uncertain eigenvalues as the uncertainties tend to zero, and the initial
conditions of the fast and uncertain subsystem is the topic of section 7.2. Then, in
section 7.4 we take a closer look at a very small index 2 system, and we will see both
that it is possible to prove convergence of solutions in this case, and that the index 2
case really is a lot harder to deal with than the lower index systems. Section 7.5
summarizes our conclusions from the chapter.
7.1
Canonical form
In chapter 6, we were able to analyze the equations without bringing the equations
into a form where we could really define the size of the uncertainties; scaling rows
and columns in the equation could change the absolute size of the uncertainties arbi185
186
7 lti ode of nominal index 2
trarily. However, (6.16) is only a small step away from
"
# 0 ! "
#
!
I
ξ (t) ! Mξξ + O( max(E) )
ξ(t)
=
E η 0 (t)
I η(t)
where E is still O( m ), although different compared to (6.16). It would be possible to
add some more structure by, for instance, bringing Mξξ + O( max(E) ) into the form
Jξξ + O( max(E) ) where Jξξ would be the Jordan form of the nominal Mξξ , but since
there are many choices, and the choice has no implications for our understanding of
the pair (or dae) aspects of the matrix pair, we prefer to defer the choice of structure
for Mξξ + O( max(E) ). As this form can be reached from the original, coupled, equation using row and column operations which are nominally non-singular, and with
uncertainties of size O( m ), this may be considered a canonical form for perturbed
nominal index 1 lti dae — recall how the use of this form was intrumental for the
improvement in example 6.28 over example 6.27. In this section, a corresponding
canonical form is derived for lti dae of nominal index 2. The theorem is formulated
in terms of matrix pairs.
7.1 Theorem (Perturbed index 2 canonical form). Consider the parameterized set
of uncertain matrix pairs
( E(m), A(m) ) ,
m≥0
satisfying
E(m) − E 0 = O( m )
A(m) − A0 = O( m )
for some point matrix pair E 0 , A0 of index 2.
Then there exist a number m0 > 0 and uncertain regular matrices T (m) and V (m)
with
cond T (m) = O( m0 )
cond V (m) = O( m0 )
such that for all m ≤ m0
( E(m), A(m) ) ⊂



  I

  I

 

 
E
E
T (m)  
 ,
(m)
(m)

 
33
34

 
E
E

43
(m) 44
(m) 


A
 J + 11

(m)




I 





I


A

I
44
(m) 
where J is a point matrix and
Proof: See section 7.1.1.
E
ij
= O( m ),
i, j ∈ { 3, 4 }
A
ii
= O( m ),
i ∈ { 1, 4 }





 V (m)−1



7.1
Canonical form
187
The canonical form will first be derived by assuming that the Weierstrass form of
the nominal matrix pair is known. Except for the initial step where the Weierstrass
decomposition is used (see comment below), the form is derived constructively by
prescribing a sequence of transformations which each bring some additional structure to the matrix pair representing the equations. Each transformation is nominally
invertible, with O( m ) uncertainty, m as usual being the entry-wise bound on the
matrix entries in the original matrix pair. The sequence ends at a stage where the
perturbed nominal index 2 nature is obvious and we are unable to add more structure using nominally invertible transformations.
What makes the Weierstrass decoupling step non-constructive is that the nominal
matrix to be decomposed is typically not obvious in applications. Rather, the nominal
matrix is something which is selected as a means to obtain as small perturbation
bounds as possible, and it is not until the pair has been transformed into a form
which reveals more of the structure in the pair that the selection should take place.
Motivated by the practical shortcomings of the derivation based on the Weierstrass
form, the same form will be derived again using a sequence of steps which is possible
to use in applications.
Before we start, we also remark that the proposed form — similar to the Weierstrass
form — may be best suited for theoretical arguments regarding perturbed matrix
pairs. Once their theory is better understood, other forms based on approximate orthogonal matrix decompositions may be both be more applicable (allowing for larger
uncertainties) and able to deliver higher accuracy. We shall not explore such forms
in this chapter, but the idea was present back in theorem 6.24.
7.1.1
Derivation based on Weierstrass decomposition
The Weierstrass canonical form (recall theorem 2.16) allows us to identify any nominal index 2 dae with a pair in the form
"
#
"
#
!
I
J
−1
0
−1
0
V +E , T
V +F
T
N
I
where T and V are non-singular matrices, and E 0 and F 0 (here, non-negative superscripts are used as ornaments, while the superscript
−1 denotes
inverse) are
matrix
0
0
the uncertain perturbations satisfying max E ≤ m and max F ≤ m. Note that
while the nominal index is 2, the pointwise index is generally 0 since the perturbation E 0 generally makes the leading-matrix non-singular. By application of T −1 from
the left and V from the right, the pair transforms into
"
# "
#!
1
1
1
1
I + E11
E12
J + F11
F12
,
1
1
1
1
E21
N + E22
F21
I + F22
where E 1 = T −1 E V = O( m ) and F 1 = T −1 F V = O( m ) since T and V were assumed
non-singular point matrices.
1
Since I +E11
= I +O( m ), taking m sufficiently small will make it invertible according
to corollary 2.47, and applying the inverse as a small but uncertain row operation
188
7 lti ode of nominal index 2
produces
"
I
1
E21
# "
2
2
J + F11
E12
1
1 ,
F21
N + E22
2
F12
1
I + F22
#!
where, for instance,
1
1
1 −1
1
1 −1
2
J = O( m )
− E11
F11
− J = I + E11
J + F11
= I + E11
F11
It will be necessary to apply corollary 2.47 in nearly every transformation we make,
so from here on we take its use for granted at any point where needed. Since the
number of applications will be finite, the smallest of all imposed bounds on m will
still be positive.
We are now only one near-identity row operation and one near-identity column operation from
"
# "
#!
2
3
I
J + F11
F12
3 ,
3
3
N + E22
F21
I + F22
The O( m ) property of E 3 and F 3 is maintained.
Since the Jordan blocks of N are of size 1 or 2, with at least one of size 2 (or the
nominal index would be less than 2), there are numbers n1 and n2 such that n1 + 2 n2
equals the size of N , with n2 being the total number of off-diagonal 1 entries. Let
us consider permutations of rows and columns in the pair ( N , I ) for a while. By
permuting rows and columns in the same way, it is possible to bring the n1 Jordan
blocks of size 1 to the lower right part of N :
#!
" 1
# "
I
N2
,
I
0
By permuting columns in N21 so that columns 2, 4, . . . appears before columns
1, 3, . . . , and permuting rows so that rows 1, 3, . . . appears before rows 2, 4, . . . , one
obtains the form

 

 
 I 0
  0 I
 I 0
 
  0 0
 
 
 , 
 


0
I
and we finally swap the second and third block rows and columns to obtain
 


I  
 I
 

 
 

 , 
I
0
 
 
 


0
I
Using these permutations in the perturbed pair results in

 
2
4
4
F12
F13
 I
 J + F11
 
 4
4
4
4 
4
4

I + E22 E23 E24   F21
 
F22
F23
 
,  4

4
4
4 
4
4
 

E32
E33 E34   F31
F32
I + F33
 
4
4
4  
4
4
4
E42
E43 E44
F41
I + F42
F43

4
F14

4 

I + F24

4
F34 

4
F44







7.1
189
Canonical form
4
Similar to the first transforms, we now use that I + E22
can be reduced to I using a
matrix inverse for sufficiently small m, and then be used to eliminate below and to
the right in the leading matrix.


 
2
5
5
5
F12
F13
F14
 I
 
 J + F11
 
5
5
5
5 
 
  F21
I
F
F
I
+
F
 
 
22
23
24


,

 F5
5
5  
5
5
5
 
 
E
E

F
I
+
F
F


33
34
32
33
34 
 
 
 31
5
5  
5
5
5
5
E43 E44
F41
I + F42
F43
F44
Yet three more rounds of two inversions and elimination results in


 
2
5
5
6
F12
F13
F14
 I
 
 J + F11
 


5
5
5
6
  F21
I
 
F22
F23 I + F24  

 

 
,


6
6
6
6
 
 
E33 E34   F31
F32
I
 
 


6
6
6
6
6
E43 E44
F41
I + F42
F44

 

7
7
7
7
7
 I + E11 E12
 J + F11 F12 F13
 
 


 
7
5
5
  F 7
  E21
 
I
F
F
I


21
22
23
 


 
,
 7
6
7 
7
E33
E34
F32
I
 
  F31
 

7
7  
7  
E43 E44
I
F44


 
8
8
8
 I
 
 J + F11 F12 F13
 

  F 8
8
8
I
 
F22
F23
I  


21

 

 
,


6
7
7
8
 
 
E33 E34   F31
F32
I
 



7
7
7  
E43
E44
I
F44
8
8
This is the form in which the decouplings of section 7.A apply. Due to F21
, F31
being O( m ), also the decoupling transforms will only be O( m ) away from identity
transforms. After the transforms, the pair has the form



  J + F9
 
  I

11
 
 
 
 
 
  I
9
9
9 
F22
F23
I + F24
 
 
 
 
 
6
7 
(7.1)
 , 

 
E34
E33
9
9
9
 
 
F32 I + F33 F34  
 
 
7
7 
 
 

E43
E44

9
9
9
I + F42
F43
F44
9
The first block to eliminate is F22
, it takes two rounds.




9
 J + F11

  I


 
 

10
10
  I E E  
10
10 

F
I
+
F
23
24 

 
23
 

24 

 
,



6
7
 
9
9
9

E33 E34  
 
F
I
+
F
F
 

32
33
34 
 



7
7  

 

9
9
9
E43
E44
I + F42
F43
F44



  J + F9
  I


11
 
 

  I
 

10
10 
 
 

F
I
+
F
23
 



24
6
7 
 
 , 

E
E
9
11
11
 

33
34 
 
F
I
+
F
F



32
33
34
 
7
7 
 


E43
E44

9
11
11 
I + F42 F43
F44 






















190
7 lti ode of nominal index 2
Finally, four more O( m ) blocks are removed.


 
9
  I
 
  J + F11
 
 
 

  I


I  
 
 
 
 

12
12 
,

12
12
 
E33 E34  
I + F33 F34  
 
 
 
 
 
12
12
12
12 

 
E43 E44 
I
F43
F44



  I

  I

 

 
13
13 
 
E33 E34  ,
 

 
13
13 


E43
E44


9

 J + F11




I 





I


13 


I
F44









(7.2)
Equation (7.2) is the proposed canonical form. At the cost of a substantial increase
13
13
in the uncertainties, one could additionally obtain E33
and F44
in real Schur form.
The reason they cannot be put in Jordan canonical form is that the condition number
of the similarity transform must be possible to bound in order to maintain the O( m )
size of the uncertainties.
7.1.2
Derivation without use of Weierstrass decomposition
When we now derive the same canonical form again, we will be able to make reuse
of the latter part of the derivation in the previous section. The interesting part of the
derivation is how to get started without knowing the nominal pair.
The notation in this section is independent of that introduced in the previous section.
While corollary 2.47 was used repeatedly in the previous section to invert perturbed
identity matrices. We start by making a similar remark regarding perturbed full
rank matrices. Consider the perturbed matrix X + F with no less rows than columns,
where X being the nominal matrix has full column rank, and F is a perturbation of
size O( m ). Then a QR decomposition brings the perturbed matrix into the form
"
#
R + Q1T F
Q2T F
where R is an invertible point matrix, and QT F is still an O( m ) perturbation. It
follows by corollary 2.47 that taking m sufficiently small will allow the upper block
to be inverted using a column operation of bounded condition number, leading to


I


−1 


QT F R + QT F
2
1
where the lower block is still of size O( m ). Hence, a row operation of bounded
condition number brings the matrix into the final form
" #
I
0
7.1
191
Canonical form
The procedure of bounding m to be sufficiently small for this reduction to be possible
will be applied several times in the following derivation, and we take its use for
granted at any point where needed. The transposed case is analogous, and since the
total number of applications is finite, the smallest of all imposed bounds on m will
still be positive, and the respective products of all row and column operations will
still have bounded condition numbers.
Since the nominal matrices are unknown to us this time, we simply write the original
matrix pair as
E 0 , A0
(7.3)
where both E 0 and A0 are uncertain matrices.
As always, we start by applying row operations and column permutations until we
reach the form

 

  E 1 E 1   A1 A1  
  11 12  ,  11 12  

 
 
E 1   A1 A1  
22
1
E11
21
22
1
E22
1
where
is non-singular while
= O( m ). Typically, E22
will be so small that
it has no entries which can be distinguished from 0 — one proceeds with row operations as long as there exists non-zero entries to pivot on. When transforms are
1
applied to given pair data, however, there is no m which tends to zero, and E22
may
even contain non-zero intervals as long as they are sufficiently small — rather than
pivoting on a very small entry with large relative uncertainty, it may be wiser to artificially increase the uncertainty so that zero is within the interval of uncertainty in
order to avoid a very large uncertainties in other parts of the equations. For further
discussion of how to think of O( m ), see section 1.2.4.
Next, column operations are applied to yield


  2
2 
  I
 A
A

 
  11 12 
 

1 
 ,  2

E22
A21 A222 





and if A222 would be non-singular, we know from chapter 6 that the pair can be decoupled and that there is a natural choice of nominal equations of index 1. Since we
are considering dae of nominal index 2 in this section, it follows that A222 is singular
(that is, the uncertainties allows for an instantiation of A222 which is singular in the
ordinary sense of point matrices).
Since A222 is singular, it is possible to apply row and column operations which decomposes A222 in the same way as E 0 was decomposed.


  2
 A11 A312 A313  
  I

 
 
 
 
  E 3 E 3   3
 
 A21 I
 
,
(7.4)
22
23 

 
 
 
  E 3 E 3   3

3  
A
F
32
33
31
33
3
where F33
= O( m ) (and the same remark regarding given data applies again).
192
7 lti ode of nominal index 2
If A331 would not have full row rank, it would be possible to row reduce to reveal
a row with only small and uncertain coefficients, and the corresponding row in the
leading matrix would also contain only small and uncertain entries. Hence, the unit
basis vector which corresponds to this row would be in the left null space of both
the leading and the trailing matrix, showing that the matrix pair is singular. This
would contradict the index 2 assumption according to corollary 2.12, and it follows
that A331 must at least have full row rank in the nominal case. It follows that there
exists a positive bound on m which will make also the uncertain A331 have full row
rank. In particular, this means that A331 must have at least as many columns as rows.
By symmetry, it follows that A313 has the transposed size. If A313 would not have full
column rank, column operations would reveal a vector in the right null space of both
matrices, again showing that the pair is singular. However, we shall soon see that
A313 having full column rank is implied by the property that the nominal index of the
dae is 2.
7.2 Remark. That the nominal A331 has full row rank is not particularly related to the index 2,
but will be necessary for any finite index.
We now know the existence of column operations on A331 which are applied together
4
4
with row operations which maintain the leading matrix (the matrices Eij
and F44
introduced here are identical to matrices in the previous step, but the new notation
is introduced to avoid confusing subscripts in disagreement with the block structure),
yielding



  A4 A4 A4 A4  
  I

 
  11 12 13 14  
  I
  4
 
 
  A
A422 A423 A424  
21
 


 
4
4 
(7.5)
 
 ,  4
 
E33
E34
 
  A31 A432 I
 
 
 
4
4 
 

E33
E44

 
4 
I
F44
Until now, we have not made use of the property that the nominal index of the pair
be 2, but at this point the pair has enough structure to directly relate it to its index via
the shuffle algorithm. Let us temporarily consider the following nominal equation.



 A411 A412 A413 A414  
 


 
  I
  4
 
  A21 A422 A423 A424  
  I

, 
 
 
 
 
0 0   4
 
 A31 A432 I

 

0 0 
I
0
Shuffling the last two rows leads to


  I

 

 
I

 
,
  4
  A31 A432 I 
 


I
0


 A411 A412 A413 A414 


 4

4
4
4 
 A

 21 A22 A23 A24 


 0
0
0
0 

0
0
0
0









7.1
193
Canonical form
which is row reduced to


  I

 

 
I

 
,
  4
  A31 A432 I 
 


0
0

 A411

 4
 A
 21

 0

 −A4
21
A412
A413
A422 A423
0
0
−A422 −A423

A414  
 

A424  
 
0  
 
−A4  
24
Shuffling the last row shows that the nominal A424 will have to be non-singular for
the index 2 property to hold, and we now return to the non-nominal equations. For
sufficiently small m, A424 will be regular, enabling the following three steps to be
carried out.




  A4 A4 A4 A5  
  I

  11 12 13 14  
 
  4
 
  I
  A
A422 A423 I  
 
21


 , 
4
5 
 
 
  4
E33
E34
 
 
  A31 A432 I
 
 
 
4
5 
 
 

E
E


5 
33
44
I
F 
44


  I E 6

 
12

 

 
I

 
,
 
4
5 

 
E33
E34

 

 
4
5 


E33
E44


  I
 

  I

 

 

4
5 
 
,
E
E
 

33
34 
 

4
5 
 
E33 E44 



 A611 A612 A613



 4

4
4
 A

A
A
I
 21 22 23

 4

4
 A

 31 A32 I




5 
I
F44 


 A611 A712 A613



 4

7
4
 A

A
A
I
 21 22 23

 4

7
 A

 31 A32 I




5 
I
F44 






















(7.6)
Here, the form where the decoupling transforms of section 7.A apply has been
reached, and the remaining steps towards the canonical form are exactly the same as
in the previous section.
Note that, in the previous section, both the regularity and the nominal index of the
matrix pair were ensured by using the Weierstrass canonical form as a starting point.
In the current section, these properties were added as requirements along the way
in order to be able to proceed with the reduction towards the canonical form. The
regularity property will always be necessary in order to rule out system where the
uncertainties have destroyed the nominal solution. The restriction to system of nominal index at most 2, on the other hand, would make sense to relax.
7.1.3
Example
As an illustration of the two approaches to the canonical form, we shall use the Weierstrass form to construct an example of nominal index 2, and then use the constructive approach to rediscover the nominal structure. Among other things, the example
194
7 lti ode of nominal index 2
will show that the O( m ) of the theoretical development can be turned into concrete
quantities when our techniques are applied to data.
Due to space constraints, we are unable to present matrix entries to sufficient precision to make it possible to repeat the computations. The data is given in section 7.B,
but readers interested in the full precision will receive the complete computation as
a Mathematica notebook upon request.
7.3 Example
In order to avoid trivial dimensions, let us take as example a matrix pair in Weierstrass form (recall theorem 2.16) with
Eigenvalue
0
−0.5
∞
∞
∞
∞
Size of Jordan block
1
2
1
1
2
2
That is, the slow dynamics has 3 states, the index of the pair is 2, there are 2 index 1
variables, and 2 index 2 subsystems with 2 variables each. The Weierstrass form
is mutliplied by random matrices from the left (condition number 2.7) and right
(condition number 1.9), so that a pair with known eigenvalue structure, but no visible
structure, is obtained. Finally, an interval uncertainty of ±1.0 · 10−9 is added to each
entry. The added uncertainty may be strikingly small, but this is necessitated by the
conservative estimates we will make to obtain the initial outer solution in the first
decoupling step. The resulting matrix pair is show in table 7.2 on page 222.
Carrying out the transformation steps given in section 7.1.2, the matrix pair in table 7.3 is obtained, along with two chains of uncertain transformation matrices (one
operating from the left, and one from the right). Multiplying together the factors in
each chain, the condition numbers can be bounded as 40.0 (left) and 69.5 (right).
Regarding the two decoupling steps, implemented based the derivation in section 7.A.1 (applied to the transposed matrices in the second step), it is the first one
which turns out critical here, requiring the uncertainties to be very small to ensure
that the transformation is valid. As the O( m ) expressions in the derivation are
replaced by interval quantities in computations, there is no use of the parameter m,
and it may — without loss of generality — be set to 1. The items below provide some
insight into the computations.
• Bounding constants: cL0 B 1.00 · 10−3 , cL B 13.0 · 100 , cE B 9.03 · 10−7 , αL B
8.36 · 10−2 , ρL B 1.41 · 10−2 .
• Constraint on m: 1 ≤ 1.55.
• After a few rounds of iterative refinement (using (7.35) solved with respect to
RL by inverting L), the uncertainty kRL k2 comes down to 3.59 · 10−3 .
7.2
195
Initial conditions
The rather large potential for improvement of the initial outer solution for RL indicates that there is also potential for relaxing the constraint on m.
To verify that the decomposition is valid, the transformation chains are applied to the
pair in the canonical form, which shall result in a matrix pair containing the original
matrix pair. Consider the leading matrix, given by an expression like
T1 T2 · · · Tn E Vm · · · V2 V1
For point matrices, the order in which the multiplications are carried out would not
matter, but as an illustration of the nature of interval arithmetic, we compare two
different ways of carrying them out.
• Collapse the transformation matrices first, with multiplication associating to
the left, that is
[( ( T1 T2 ) · · · Tn ) E ] ( ( Vm · · · V2 ) V1 )
The result is shown in table 7.4, and completely includes the original matrix
pair.
• Apply the transformation matrices one by one, that is
[ [ [ ( T1 ( T2 · · · ( Tn E ) ) ) Vm ] · · · V2 ] V1 ]
The result is shown in table 7.5, and completely includes the original matrix
pair as well as the pair in table 7.4.
The difference is remarkable; the median interval width in table 7.5 is approximately
7 times that in table 7.4, and the ratio between the widest intervals is near 30. It
is a topic for future research to investigate whether the collapsed transformation
matrices can be given even higher accuracy by iterative refinement methods such
as forward-backward propagation (Jaulin et al., 2001, section 4.2.4).
7.2
Initial conditions
We now turn to the question whether nominally consistent initial conditions of the
original coupled system imply that the initial conditions of the fast and uncertain
subsystem tend to zero with m.
For the purposes of this section, the transformations leading to the canonical form
are divided into three groups. The variables of the original, coupled, form (7.3) are
denoted x̄,
!
Ex̄ x̄ x̄0 + Ax̄ x̄ x̄ = 0
(7.7)
196
7 lti ode of nominal index 2
The first group of transformations brings us to the form (7.6), where variables are
denoted
 
!  x 
v 
x
=  1 
v
v2 
 
v3
The corresponding dae manifests the lti matrix-valued singular perturbation form
of the present chapter (compare (6.5)), written
# !
# 0! "
"
x
Axx Axv x !
I
=0
(7.8)
+
Avx Avv v
Evv v 0
It is easy to check that the variable transforms have O( m0 ) norm, so
! x v = O( |x̄| )
The second group of transforms are the decoupling transforms, leading to the
form (7.1), with variables
 
!  ξ 
η 
ξ
=  1 
η
η2 
 
η3
belonging to
"
#
I
Eηη
! "
A
ξ0
+ ξξ
η0
#
Aηη
!
ξ !
=0
η
Here, v = L( m ) x + η relates η to the variable before the transforms, where L( m ) is
the matrix analyzed in section 7.A.1.
Finally, the last group of transforms, which only operates on the last three block rows
and columns, lead to the form (7.2), with variables
 
!  ξ 
ξ
η̄ 
=  1 
η̄
η̄2 
 
η̄3
and
"
#
I
Eη̄ η̄
! "
A
ξ0
+ ξξ
η̄ 0
#
Aη̄ η̄
!
ξ !
=0
η̄
(7.9)
Again, it is easy to check that the variable transforms have O( m0 ) norm, so
η̄ = O( η )
and hence we shall focus on the initial conditions for η in the rest of this section.
In section 7.A.1 is was shown that Evv L = O( m ), and it follows that
Eηη = Avv − Evv L Axv = Avv + O( m )
7.2
Initial conditions
197
We now end this section with a variation of lemma 6.4, establishing that the initial
conditions of the uncertain system tend to zero with m. The initial conditions for η
given by η(0) = v 0 − L x0 depend on the choice of x0 and v 0 , and becomes uncertain
due to the uncertainty in L. Note that it is generally not possible to establish convergence of the solutions if x0 and v 0 are set to fixed values without consideration of the
algebraic relations imposed by the dae. Rather, an integrator for dae need freedom
to select suitable initial conditions from some region or in the neighborhood of some
“guess” that a user may provide.
7.4 Lemma. Consider the fast and uncertain subsystem in (7.9), obtained from (7.7)
using the transformations of section 7.1. Allow for O( m ) uncertainty
in
equation
coefficients as well as initial conditions. The initial conditions satisfy η̄ 0 = O(m) if
and only if
!
Avx x0 + Avv v 0 = O(m)
(7.10)
Proof: The statement may be proved for η instead of η̄ without loss of generality. For
the sufficiency, consider the expression for η 0 ,
0
0
− m RL x0
(7.11)
η 0 = v 0 − L x0 = A−1
vv Avx x + Avv v
Here, A−1
vv 2 can be bounded by taking m sufficiently small, so the implication follows.
For the necessity, rearranging (7.11), we find that
Avx x0 + Avv v 0 = Avv η 0 + m RL x0
from which the O(m) size of (7.10) follows.
7.5 Corollary. The degrees of freedom in assignment of initial conditions under
(7.10) can be expressed
already
at the stage of (7.4). Denoting the variables belong
ing to this form x̆ v̆1 v̆2 , and writing Av̆2 x̆ in place of A331 , the condition may be
expressed as
!
Av̆2 x̆ x̆0 = O( m )
leaving no degrees of freedom for v̆1 and v̆2 . Equations to determine v̆1 and v̆2 in
terms of x̆0 are available in (7.5)
Proof: The statements can be verified by first checking that the involved block rows
are not changed by row operations between the stage of use and (7.6). Then using
3
that F33
in (7.4), the condition on x̆0 follows. That there is no degrees of freedom
for the remaining initial states, and that they can be determined from (7.5) follows
immediately from the non-singularity of
" 4
#
A23 A424
I
198
7.3
7 lti ode of nominal index 2
Growth of eigenvalues
This section amounts to some tedious bookkeeping of the characteristic polynomial
of the pair belonging to the η̄ dynamics (compare (7.2)). Since we are only concerned
with the pair in the final form in this section, we drop the superscripts on the symbols,




I 
I


!




0
E33 E34  η̄ (t) + 
I
(7.12)

 η̄(t) = 0




E43 E44
I
F44
The proofs are not difficult to understand, but may take some time to read.
We begin by stating some properties of the characteristic polynomial of (7.12) with
the nominal trailing matrix;




I 
I


!


E33 E34  η̄ 0 (t) + 
I
(7.13)
 η̄(t) = 0




E43 E44
I
0
| {z }
CAη̄ η̄ (0)
For future reference, the corresponding determinant is written separately in the variable λ


I 
λ I

λ E33 + I λ E34 
det 
(7.14)


I
λ E43
λ E44
We now make three observations. First, the modulus of a product of k entries from
E can be written as k for some ≤ max(E). Second, the term in the characteristic
polynomial which is free of factors from λ and E, is the determinant of the trailing
matrix. Third, with every factor from E follows a factor λ. Hence, the characteristic
polynomial expanded as a sum of monomials can be written


I  X

n

 +
I
det
σi λmi i i
(7.15)


I
0
i
|
{z
}
CD
where |D| = 1, |σi | = 1, |i | is some number smaller than max(E), and mi ≥ ni ≥ 1.
(The number of terms in the sum over i will depend on the matrix block dimensions.)
The ratios
bound.
mi
ni
turn out to be important, and in particular we will need a good upper
7.6 Lemma. The ratios
attained.
mi
ni
are bounded from above by 2, and this bound is generally
Proof: Please refer to (7.14) during the following argument.
7.3
199
Growth of eigenvalues
Since we are trying to maximize the power of λ relative to the power of i , we are
interested in the terms in the determinant which contain one or more factors from
the upper left block. From the structure of the matrix, with identity matrices in the
lower left and upper right blocks, any selection of n factors from the diagonal upper
left block will remove the corresponding rows and columns from the lower and right
block rows and columns from the remaining determinant factor. The lowest order
term in this remaining determinant will be a product containing all the positions
along the I in the middle block, and hence all remaining factors must come from the
lower right block. Adding up, the lowest order terms containing n factors from the
upper left block will be in the form


 X
Y
λn  (E33 )ii + 1
λn det( (E44 )ii )
i⊂Nn
i
where the last sum is over all minors with symmetric choice of rows and columns.
Hence, the generality of the constructions depends on at least one of these sums to
be non-zero.
To see that 2 is an upper bound, note that the ratio is made big by including factors
with a λ that do not come with an entry from E. Such factors only exist in the upper
left block, but it is clear from the argument
h above that ifor every such pure factor λ,
one factor from the corresponding row of λ E43 λ E44 must also be in the product.
The remaining factors in the term will be from the middle block row where there are
no factors λ that do not come with an entry from E, and hence the power of will
always be at least twice the power of λ.
Next, the lemma is used to show that the eigenvalues of the uncertain subsystem
must grow as max(E) → 0.
7.7 Theorem. There exists a constant k > 0 such that |λ| ≥ k max(E)−1/2 .
Proof: Let nη be the dimension of η̄ (and hence also of η), and hence the degree of
the characteristic polynomial (7.15). Lemma 7.6 allows us to write ni = mi /2 + ri for
some ri > 0. Rearranging the characteristic polynomial as a sum of monomials in λ,
we obtain
nη
X
X
m /2+ri
D+
λd
σi i i
d=1
i : mi =d
|
{z
}
Cad
r
Using i ≤ max(E) and assuming max(E) ≤ 1 (so that i i ≤ 1), the coefficients may
be bounded as
X
1
|ad | ≤ max(E)d/2
i : mi =d
| {z }
Cαd
200
7 lti ode of nominal index 2
where αd ∈ N only depends on the matrix block dimensions. Dividing the polynomial by λnη , and writing it as
nη−1
−1 nη
D (λ )
+
X
anη−d (λ−1 )d
d=0
an upper bound on λ−1 may be obtained using theorem 2.52. It gives that λ−1 is
bounded by 2 times the maximum of the expressions
a 1/d
1/d
d ≤ max(E)1/2 αd , d = 1, . . . , nη − 1
D
D
a 1/nη
α 1/nη
nη nη ≤ max(E)1/2 2 D 2D
Inverting the bound, the proof is completed by taking

( )
2 D 1/nη
 D 1/d nη−1 [ 

1


k = min  

 αnη 2
αd
d=1











This was for the case of the nominal trailing matrix. Note that typical perturbation
theory (for instance, Stewart and Sun (1990, chapter vi)) may be hard to apply here,
since we only have knowledge of the egienvalue magnitues so far, and only care about
the magnitudes. Typical perturbation analyses will study the perturbations of the
eigenvalues themselves, but these perturbations may actually be large in the present
situation without conflicting with our needs. So, instead of trying to use existing
eigenvalue perturbation theory, we consider the characteristic equation agian, this
time with perturbed coefficients.
The perturbed determinant det [ λ E − ( A + m F ) ] where max(F) = O( max(E)0 ), and
be rewritten
det [ λ E − ( A + m F ) ]
h
i
= det ( A + m F ) A−1 A ( A + m F )−1 [ λ E − ( A + m F ) ]
h
i
h
i
= det I + m F A−1 det λ A ( A + m F )−1 E − A
h
i
h h
i
i
= det I + m F A−1 det λ I − m F ( A + m F )−1 E − A
Here,


−F44 I  ( A + m F )−1 = 
I  = O( m0 )
2
 
I
2
h
i −1
Hence bounding det I + m F A
from below makes it possible to regard
h
i
I − m F ( A + m F )−1 E
as a new unstructured uncertainty which is still O( m ). If the special structure of the
leading matrix would not have been used in theorem 7.7, it would have been possible
7.3
201
Growth of eigenvalues
to apply again. However, the use of the blocked structure of the leading matrix does
not permit this, and even though the proof of the next theorem is far from as elegant
as the idea that we just ruled out, it does the job.
7.8 Lemma. Let the uncertainties in (7.12) be bounded by m by setting
m B max { max(E) , max(F) }
The characteristic polynomial can then be written


I  X

n


I
det
σi0 λmi i i
 +


I
F44
i
|
{z
}
CD 0
where |D 0 | = 1 + O( m ), σi0 = 1, |i | < m, and mi ≥ ni ≥ 1.
Just as for the case of nominal trailing (F44 = 0), it holds that mi ≤ 2 ni .
Proof: Compare lemma 7.6. The characteristic polynomial is now given by the determinant


I
λ I


λ E33 + I
λ E34 
det 


I
λ E43
λ E44 + F44
Trying to construct monomials in the determinant with as high degree in λ relative
to the degree in the entries of E and F, leads to the same reasoning as in lemma 7.6.
That is, with any
in the monomial, there must be one factor
h pure factor λ included
i
from the block λ E43 λ E44 + F44 . The only difference to the previous case is that
the included factor may be in the form λ e + f instead of just λ e this time. Hence,
there will be more monomials than before, but the added ones will never be one of
those which maximize the degree in λ relative to the degree in the entries of E and
F. Hence, the old result of lemma 7.6 obtains.
7.9 Example
For the matrix block dimensions corresponding to a nominal system with 3 index 1
subsystems and 4 index 2 subsystems, the dimension of η̄ is 1 · 3 + 2 · 4 = 11, and
depending on whether F is included or not, the characteristic polynomial is characterized by the numbers in table 7.1. It is seen that the table is in agreement with
lemma 7.8.
7.10 Corollary. Let the uncertainties in (7.12) satisfy
max(E) = O( m )
max(F) = O( m )
Then there are constants k > 0 and m0 > 0 such that for m < m0 , |λ| ≥ k m−1/2 for
every eigenvalue λ belonging to (7.12).
202
7 lti ode of nominal index 2
d
1
2
3
4
5
6
7
8
9
10
11
F=0
2 min { ni : mi = d }
2
2
4
4
6
6
8
8
10
12
14
αd
3
10
30
84
204
456
1008
1464
3240
2160
5040
d
1
2
3
4
5
6
7
8
9
10
11
F,0
2 min { ni : mi = d }
2
2
4
4
6
6
8
8
10
12
14
αd
7
34
138
492
1524
3984
8592
14424
17640
13680
5040
Table 7.1: Data for example 7.9. Compare the proof of theorem 7.7.
Proof: The O( m ) size of the uncertainties implies the existence of the constant m0 >
0, and numbers lE and lF such that m ≤ m0 implies
max(E) ≤ lE m
max(F) ≤ lF m
By defining m0 B max { lE , lF } m, and repeating the proof of theorem 7.7 with
m0 in place of max(E), it is seen that there exists k 0 > 0 such that |λ| ≥ k 0 m0 =
k 0 max { lE , lF } m. Hence, setting k B k 0 max { lE , lF } completes the proof.
7.4
Case study: a small system
At this point, we have established some results for index 2 lti dae that remind of
the results obtained for index 1 lti dae in chapter 6. Unfortunately, this route comes
to an end here. We shall investigate the reasons for this in the present section, but
as a first sign of the difficulties which arise for index 2 systems, note that the state
feedback matrix of (7.12) written as an ode will inevitably grow as max(E)−1 , while
the eigenvalues of this system have only been shown to grow at least as max(E)−1/2 .
Hence, it appears hard to establish a bound like (6.29) which allows the transition
matrix of the η̄ subsystem to be bounded by a constant. Indeed, we shall soon see
that no such bound exists.
Consider the smallest possible perturbed nominal index 2 fast and uncertain subsystem,
"
#
"
#
1
1
!
0
η̄ +
η̄ = 0
(7.16)
e
1 f
where both |e| and |f | are O( m ).
7.4
203
Case study: a small system
7.4.1
Eigenvalues
The characteristic polynomial in λ is


!2
!2

f
f

−1 

e λ + f λ − 1 = e  λ +
−
− e 
2e
2e
2
(7.17)
For both eigenvalues to be in the left half plane, it is required that e and f have equal
signs. For complex conjugate eigenvalues −e−1 = λ1 λ2 > 0, implying e < 0. For real
poles, stability requires
s
!2
f
f
+ e−1 < 0
+
−
2e
2e
which also simplifies to e < 0. Hence, both e and f must always be negative for the
eigenvalues to be in the left half plane.
For complex conjugate poles,
|λ| = (−e)−1/2
f
Re λ = −
2e
Since the modulus does not depend on f , f will only affect the argument of complex
conjugate eigenvalues, and when
f
≤ −(−e)−1/2
2e
the eigenvalues will become real. This condition simplifies to
−
− f ≥ 2 (−e)1/2
(7.18)
In case the eigenvalues are complex, the m−1/2 growth is obvious, but in the case of
real roots we resort to theorem 2.52. Applied to
!
−1 (λ−1 )2 + f (λ−1 ) + e = 0
the theorem immediately provides
(
|λ| ≥ min
1
1
,
2 |f | |e|1/2
)
which proves the m−1/2 growth regardless of poles being complex conjugates or not.
As usual we make the assumption there is a φ0 < π/2 such that arg(−λ) ≤ φ0 . Since
the argument is given by the ratio between imaginary and real parts, this effectively
puts an upper bound on |f | in relation to |e| via
f
cos( φ0 ) ≤
− Re λ
2e
=
|λ|
(−e)−1/2
204
7 lti ode of nominal index 2
That is,
|f | = −f ≥ 2 (−e)1/2 cos( φ0 )
In case they are real, the characteristic polynomial is written
e λ + (−e)−1/2 r
λ + (−e)−1/2 r −1
! f
e,
where r ≥ 1 must satisfy (−e)−1/2 ( r + r −1 ) =
!
r + r −1 =
that is, r is the bigger solution to
−f
≥2
(−e)1/2
(7.19)
where the left hand side is increasing for r ≥ 1, but r can also be found directly from
the expression for the smaller eigenvalue,
r
f 2 + 4e
1
1
r = − (−e)−1/2 f +
(7.20)
2
2
−e
In chapter 6, A1–[6.14] also imposed the bound
|λ| m < ā
(7.21)
on the eigenvalue moduli. We do not make this assumption yet in the current case
study, but we just note what it would imply. For complex eigenvalues, it would imply
1
= max |λ| ≤ ā m−1
(−e)1/2
(7.22)
(which is equivalent to a lower bound on |e| proportional to m2 ). For real eigenvalues,
it would imply
1
r = max |λ| ≤ ā m−1
(7.23)
−
1/2
(−e)
and hence,
r ≤ ā m−1 (−e)1/2 = O( m−1/2 )
(7.24a)
min |λ| = (−e)−1/2 r −1 ≥ ā−1 m (−e)−1 = O( m0 )
(7.24b)
As an alternative to the bound on moduli in A1–[6.14], we shall also consider bounding of r here. As an upper bound on r can be expressed in terms of the fast eigenvalues,
max { |λi | : |λi | ≥ R0 }
≤ r02
(7.25)
min { |λi | : |λi | ≥ R0 }
where the known growth of eigenvalues ensures that all the eigenvalues of the η̄
subsystem will satisfy |λ| ≥ R0 for sufficiently small m. Of course, (7.25) would imply
r ≤ r0 here. While φ ≤ φ0 imposes a lower bound on f relative to e, the bound on r
imposes an upper bound on |f | relative to |e| via (7.19) (the eigenvalues will always
7.4
205
Case study: a small system
be real near this limit),
−f
≤ r0 + r0−1
(−e)1/2
(7.26)
While the bound r ≤ r0 is a much stronger assumption than (7.21) for real eigenvalues
(compare (7.24a)), the bound adds no information given that the eigenvalues form a
single complex conjugate pair (compare (7.22)).
A lower bound on r would be quite artificial, and will not be considered an option.
7.4.2
Transition matrix
The transition matrix is computed using the Laplace transform. In the Laplace variable s the transition matrix is given by
" −1
#
1
e f + s −1
(7.27)
e−1
s
−e−1 + e−1 f s + s2
The gain of the transition matrix will be no smaller than the entries in the second
row. The form of the corresponding time functions expressed in the real domain will
depend on whether the eigenvalues are complex or not.
Let us first consider the case of complex conjugate eigenvalues. Using φ ∈ [ 0, π/2 )
to denote the common value of |arg( −λ )|, we introduce
α B −(−e)−1/2 cos( φ )
β B (−e)−1/2 sin( φ )
The characteristic polynomial can now be expressed using
α 2 + β 2 = −e−1
−2 α = e−1 f
Then the last row of transition matrix is
αt
αt
e
sin( β t )
eβ
e
[ β cos( β t )+α sin( β t ) ]
β
Optimizing out t, the largest gains are found to be (these values are attained, not just
upper bounds)
e−φ cot( φ )
−φ cot( φ ) cos( φ )
2
e
(−e)1/2
The expression e−φ cot( φ ) is a monotonically increasing function of φ, tending to e−1
as φ → 0, and equals 0 at φ = π/2. The second entry is a monotonically decreasing
function which tends to 2 e−1 as φ → 0, and equals 0 at φ = π/2. Since
e−1
≥ e−1 m−1/2
(−e)1/2
206
7 lti ode of nominal index 2
the maximum entry implies that the transition matrix is bounded from below,
sup kΦ(t)k2 ≥ e−1 m−1/2
(7.28)
t≥0
Using (7.22), we would obtain the bound
e−1
≤ e−1 ā m−1
(−e)1/2
but the bound grows too fast as m → 0 to be successful in combination with the O( m )
convergence of initial conditions.
Let us see if it helps to constrain the eigenvalues to be real. Again, optimizing out t
from the result yields expressions where the dependency on (−e)−1/2 can be factored
out, but this time the remaining factor depends on r ≥ 1 instead of φ, r 2 being the
ratio between the larger and smaller eigenvalues,
g1 ( r )
g
(
r
)
1
(−e)1/2
Here, both g1 and g2 are monotonically decreasing functions tending to zero, with
|g1 ( 1 )| = e−1 and |g2 ( 1 )| = e−2 . As we are unwilling to assume a lower bound on r, it
is seen that restriction to real eigenvalues would not lower the lower bound (7.28) on
the transition matrix.
Without an upper bound on eigenvalue magnitudes or r, (7.20) allows us to evaluate
the limit
g (r)
1
lim 1
=
e→0− (−e)1/2
f
which that the transition matrix will grow as
sup kΦ(t)k2 ≥ m−1
t≥0
However, letting e → 0− alone would violate (7.23) since the eigenvalues would tend
to infinity while the fixed f prevents m from approaching zero. On the other hand,
if the constraint (7.21) is added (7.23) yields
g (r)
g1 ( r )
≤ 1
ā m−1 ≤ e−1 ā m−1
1/2
r
(−e)
which is the same bound as for complex eigenvalues. An upper bound on r would
merely produce the same kind of lower bound as (7.28), but with g1 ( r0 ) replacing e−1 .
Since the transition matrix is bounded from below when the tight upper bounds in
−1
this section are expressed in terms
of m, and the upper bounds grow as m , it is not
possible to conclude that supt≥0 η̄(t) → 0 as m → 0, even though η̄(0) → 0. The
difficulty in separating the bounding of initial conditions from the bounding of the
initial condition response gain, is a major obstacle for the analysis of nominal index 2
dae.
7.4
207
Case study: a small system
7.4.3
Simultaneous consideration of initial conditions and
transition matrix bounds
We now end this case study by proving that supt≥0 η̄(t) → 0 as m → 0 in a very
simple case, despite how difficult this appears to us in the general case due to the
inability to consider bounds on initial conditions and initial condition response gain
in terms of m. The key to the problem is to consider how initial conditions and the
tight transition matrix upper bounds depend directly on e and f .
Noting that it is the gain from the initial condition of the first state which cannot be
bounded in terms of m, and that this gain grows as (−e)−1/2 , it will suffice to show
that the initial condition tends to 0 as |e|. It is the utter lack of generality that makes
us present this result as an example, even though the whole case study is a kind of
example in itself.
7.11 Example
Consider the coupled nominal index 2 system described by the pair

 

 1
 1
 

 


1  ,  a
1  
 

 

e
1 f
(7.29)
The system is decoupled using the matrix L = L0 + m RL , with exact solutions given
by
" #
af
L0 =
−a
"
#
a
e−f
L=
e−f −1 1
and thanks to the structure of the original pair, the decoupling will lead directly to
the canonical form, with the same parameters as in the rest of the current case study.
That is, η̄ = η in this case.
If the initial conditions x0 and v 0 of (7.29) are chosen nominally consistent, (7.11)
gives that the initial conditions η 0 are given by
η 0 = −m RL x0
Computing m RL as L − L0 yields
η0 = −
#
"
a
e(1 − f ) + f 2 0
x
e−f
e−f −1
Hence, for the first state to be O( |e| ), it is required that that f 2 = O( |e| ).
It the eigenvalues are not real, (7.18) directly gives
f 2 ≤ 4 |e|
208
7 lti ode of nominal index 2
It the case of real eigenvalues, (7.19) gives
2
f 2 = r + r −1 (−e)
Now,
the bound (7.25) on r is the most natural assumption to add in order to obtain
f 2 = O( |e| ). However, by taking m = |f |, the other bound (7.21) is still a valid
alternative since
(−e)1/2
r ≤ ā m−1 (−e)1/2 = ā
|f |
(−e)1/2
where |f | ≤ 1/2 for real eigenvalues. (Clearly, one would have to take ā > 2, but it
is expected that there will be a lower bound on how small ā can be chosen, compare
the sufficient condition for index 1 systems given in lemma 6.15.)
We argue that it is more elegant to use the assumption on r rather than m |λ|, since
the former does not involve the parameter m. Also note that it was due to the special
structure of the system under consideration that we were able to derive a bound on
r from the bound on m |λ| — the example does not show that this can be done for
general systems of nominal index 2.
From the simple example, we learnt that for systems of nominal index 2, it may be
necessary to bound the ratio between the moduli of the fast and uncertain eigenvalues (in one way or another), in order to obtain converging solutions of the fast and
uncertain subsystem. We find it more elegant to assume this directly, rather than
via the bound on m |λ| currently used for systems of nominal index 1. To assume a
bound on r may have many other applications as well, for instance, it might be a possible and more elegant replacement for the m |λ| bound also for systems of nominal
index 1.
7.5
Conclusions
This chapter was devoted to the analysis of autonomous index 0 lti dae of nominal
index 2. As the case study in section 7.4 shows in theory, and as do numerical experiments (not included in the thesis) indicate in other cases, there can be and generally
seems to be uniform convergence of solutions — just as we were able to prove generally for nominal index 1 in chapter 6. Though we have not been able to prove this
generally (the case study of a particular form of small system being the exception),
the chapter has contributed with several findings which we think will help in future
research on these systems.
First, the existence of a canonical form for perturbed lti dae of nominal index 2 has
been proposed. It is closely related to the Weierstrass form for exact matrix pairs,
and can be seen as a statement of where non-trivial perturbations of this form need
to be considered. The derivation includes existence proofs for the decoupling transforms which isolates the fast and uncertain dynamics from the slow and regularly
perturbed dynamics.
7.5
Conclusions
209
Second, it was shown that the eigenvalues of the fast and uncertain subsystem must
grow as m → 0. This makes it possible to formulate assumptions about the eigenvalues of the fast and uncertain subsystem.
Third, the case study of a small system has contributed with three findings. On the
one hand, the study shows that at least there can be uniform convergence of solutions, which should inspire research on how to prove this also in the general case.
On the other hand, the study revealed some drastic differences between the nominal
index 1 and nominal index 2 cases. In particular, the basic idea of limiting the initial
condition response gain of the fast and uncertain subsystem by a constant independent of the size of the perturbations turned out to be useless in this case. Finally, the
example in the case study showed that while the assumed bound on m |λ| used in
chapter 6 was still sufficient to obtain convergence, bounding the ratio of eigenvalue
moduli for the fast and uncertain subsystem can be an elegant replacement.
Appendix
7.A
Decoupling transforms
Inspection of the equations that L and H must satisfy to yield a decoupling transform
reveal that the results from chapter 6 are not readily applicable. Most notably, the
leading matrix of the lower part of the dae does not vanish with max(E).
In this section, we shall use notation which is unrelated to other sections in this
chapter, considering a matrix pair which is in the form

 

 I
 A11 A12 A13
 
 
 A21 A22 A23 I24  
I
 
 , 
 
(7.30)
 
 
E33 E34  A31 A32 I33
 
 
 
E43 E44
I42
F44
|
{z
}
CP
where
• max Eij and max(F44 ) are both O( m ).
• I24 , I33 , and I42 each have a corresponding non-singular nominal matrix Iij0
such that max Iij − Iij0 = O( m ).
• Aij are bounded independently of m.
In preceding sections of this chapter, it has been shown how this form can be reached
using non-singular transforms from more general starting points. Combining these
transforms with the transforms of the present section would result in decoupling
transforms applicable to more general forms than (7.30). However, we prefer the
notion of decoupling transforms applying to (7.30) avoid unnecessary clutter in this
section which is already lengthy.
For an application of these decoupling transforms to a concrete example, see example 7.3.
210
7.A
211
Decoupling transforms
The idea to use a fixed-point theorem to prove the existence of decoupling transforms
for lti systems appears in Kokotović (1975), where tighter estimates are provided
compared to the more general ltv results in Chang (1972).
Notation. For brevity, we shall often omit the for m sufficiently small in this section.
For instance, when we say that there exists a constant which gives a bound on something, this typically means that there is a m0 > 0 such that the bound is valid for all
m < m0 .
7.A.1
Eliminating slow variables from uncertain dynamics
In the first decoupling step we seek a matrix L partitioned as
 
L1 
 
L = L2 
 
L3
such that the blocks of

 
 I
 
 
− 
 
|
I
E33
E43

 "

 I


P


E34  L I  L


E44
{z
#
I
}
CPL
(which is a pair with the same leading matrix as P ) below the “11-position” in the
trailing matrix are zero. Writing
L = L0 + m RL
with L0 denoting the nominal solution corresponding to m = 0, we shall prove
uniqueness of L0 and that kRL k2 = O( m0 ).
Recall (6.9), the equation that L must satisfy. In the current context given by (7.30),
a corresponding residual function is defined by


 


A21  A22 A23 I24 
I


 h
i 

4 
 
 L − 
 L A11 + A12 A13 0 L
E
E
δL ( L ) = A31  + A32 I33
33
34




 
0
I42
F44
E43 E44
(7.31)
so that the equation is written
!
δL ( L ) = 0
(7.32)
Let us first consider the nominal solution to this equation, in which case the last of
the tree groups of equations reads
!
0 0
0 = I42
L1
0
and the known non-singularity of I42
gives that L01 = 0. Hence, the third term in
(7.32) vanishes in the nominal case, and L0 is obtained by either of the following
212
7 lti ode of nominal index 2
expressions


0
 "

#
#
"


−1
0
L0 =  A023 I24
A021 
− 0

A031
I33
 0
0 −1  0 
A22 A023 I24
 A21 
 0
  0 
0
 A31 
= − A32 I33
 

 0
0
I42
0
(7.33)
Note that the second form is exactly the same as in the index 1 case.
Since L0 only solves the nominal equation, the residual δL ( L0 ) is generally non-zero.
However, using that




I

0




0
E33 E34  L = 
E33 E34  L0





E43 E44
E43 E44
it is seen that δL ( L0 ) = O( m ), so by taking m sufficiently small, we know the existence of
∃ cL < ∞ : δ ( L0 ) ≤ cL m
0
L
2
0
Leaving the nominal case, we seek an O( m0 ) bound (that is, a bound which is independent of m for m sufficiently small) on kRL k2 . Inserting the decomposed L in (7.32)
and cancelling a factor of m in the equation, it reads




A22 A23 I24 
I



h
i 

! 


0 0 RL A11 + A12 A13 0 L0
0 = A32 I33
 RL − 




I42
F44
0 0
− m−1 δL ( L0 )


0

h
i

E33 E34  L0 A12 A13 0 RL
− 


E43 E44



0
h
i 
E33 E34  RL A11 + A12 A13 0 L0
− 


E43 E44


I

h
i

E33 E34  RL A12 A13 0 RL
− m 


E43 E44
To simplify notation, we introduce the linear function L given by




A22 A23 I24 
I



h


4 
 RL − 
 RL A11 + A12 A13
0
0
L( RL ) = A32 I33




I42
F44
0 0
(7.34)
i 0 L0
7.A
213
Decoupling transforms
and name the remaining terms according to


0

h
i
4 

E33 E34  L0 A12 A13 0 RL
g1 ( RL ) = 


E43 E44



0
h

E33 E34  RL A11 + A12 A13
+ 


E43 E44


I

h
i
4 

E33 E34  RL A12 A13 0 RL
g2 ( RL ) = 


E43 E44
i 0 L0
This allows us to write (7.34) as
!
L( RL ) = m−1 δL ( L0 ) + g1 ( RL ) + m g2 ( RL )
(7.35)
As it was possible to solve for L0 in the nominal equation, it is seen that the matrix of
L is only O( m ) away from an invertible matrix, so taking m sufficiently small allows
the induced 2-norm of the operator’s inverse to be bounded.
P1–[7.12] Property. The constant cL < ∞ shall be chosen so that
L−1 ≤ c
2
L
(P1–7.36)
For instance, the bound may be computed using
def
L−1 R
L−1 R
√ F
2
−1
L = sup
≤ 1
= n (vec L)−1 2
2
kRk2
√ kRkF
R,0
n
n
o
where n = min nη, nξ (nη by nξ being the dimensions of L), and vec L refers to a
vectorized version of L.
The following constants are also readily available
h
i ∃ c1 < ∞ : A11 + A12 A13 0 L0 ≤ c1
2
h
i ∃ c2 < ∞ : A12 A13 0 ≤ c2
2
Further, the O( m ) property of the Eij implies the existence of

 0
 
m−1 E33 m−1 E34  ≤ cE
∃ cE < ∞ : 


m−1 E43 m−1 E44 2
214
7 lti ode of nominal index 2
Now consider RL ∈ L = RL : kRL k2 ≤ ρL , where ρL is to be selected later. The the
following bounds are obtained for m small enough
m−1 δL ( L0 ) ≤ c0L
2
kg1 ( RL )k2 ≤ m cE L0 2 c2 + c1 ρL
kg2 ( RL )k2 ≤ c2 ρL2
Using the matrix equality
X2 Q X2 − X1 Q X1 = ( X2 − X1 ) Q X2 + X1 Q ( X2 − X1 )
(7.37)
(used already in Chang (1969)), one also obtains
g1 ( RL,2 ) − g1 ( RL,1 ) ≤ m cE L0 c2 + c1 RL,2 − RL,1 2
2
2
g2 ( RL,2 − RL,1 ) ≤ 2 c2 ρL RL,2 − RL,1 2
2
In search of a contraction mapping to prove the existence of abounded solution RL ∈
L, the operator TL is defined by
4
TL RL = L−1 m−1 δL ( L0 ) + g1 ( RL ) + m g2 ( RL )
(7.38)
Setting
ρL = ( 1 + αL ) cL c0L
(7.39)
for some αL > 0, and considering
i
h kTL RL k2 ≤ cL c0L 1 + m [ 1 + αL ] cE L0 2 c2 + c1 + c2 ρL
it is seen that TL maps L to itself if
i
h m [ 1 + αL ] cE L0 2 c2 + c1 + c2 ρL ≤ αL
or, equivalently,
m≤
1
αL
cE L0 2 c2 + c1 + c2 ρL 1 + αL
(7.40)
From
TL RL,2 − TL RL,1 ≤ m cL cE L0 c2 + c1 + 2 c2 ρL RL,2 − RL,1 2
2
2
it is seen that TL is a contraction if
m≤
1
1
cE L0 2 c2 + c1 + 2 c2 ρL cL
and the conjunction of the two conditions (7.40) and (7.41) is equivalent to
(
)
1
αL
1
m≤
min
,
c L 1 + αL
cE L0 2 c2 + c1 + 2 c2 ( 1 + αL ) cL c0L
(7.41)
(7.42)
7.A
215
Decoupling transforms
If the parameter αL is tuned for maximizing the bound on m, the optimal choice
in (7.42) can be given in closed form. In case c2 = 0 the best choice is that which
makes the two bounds equal (larger values will only worsen the bound on ρL without
improving the bound on m), so we consider the more interesting case when c2 , 0.
One first computes the optimum of (7.40) alone,
v
t
cE L0 2 c2 + c1
αL1 B 1 +
2 c2 cL c0L
h
i
The objective function in (7.40) is increasing in the range 0, αL1 , but the combined
objective in (7.42) is only increasing up to the point where
αL ! 1
=
1 + αL
cL
This can only happen if cL > 1, with solution αL = c 1−1 . Hence, the bound (7.42) is
L
maximized by


α1
1


if c2 , 0 and L 1 ≤ c1
αL ,
L
1+α
(7.43)
αL = 
L


 c 1−1 , otherwise
L
The choice (7.43) should be used with care, since if cL < 1 and c2 approaches zero,
the rule will assign arbitrarily large values to αL , ignoring the consequences for the
bound on ρL .
For future reference, we note that c2 = 0 and maximizing the bound in (7.42) with
respect to αL yields the bound
1
m≤
(7.44)
cE c1 c L
This concludes the proof of existence of the approximation
L = L0 + m RL
valid for sufficiently small m. Given a choice of the tuning parameter αL > 0, the
bound on m is given in (7.42), while the bound on kRL k2 in (7.39) is no less than
cL c0L . If the bound on kRL k2 is not critical, (7.43) may be used to set the tuning
parameter.
Now that the approximation has been proved, we may additionally conclude that


 

 I
0
 
I
  0
 
 

E
E
E
E
E
E
=
L
+
m
L



 RL 33
34
33
34
33
34

 

 
(7.45)
E43 E44 2 E43 E44
E43 E44
2
≤ L0 c + ρ m
2
E
L
216
7.A.2
7 lti ode of nominal index 2
Eliminating uncertain variables from slow dynamics
In the second decoupling step we seek a matrix H partitioned as
h
i
H = H1 H2 H3
such that the blocks of
"
I


I
−H

PL 
I


#

I


E33
E43
I
 
 
E34  H 
 
E44


(which again is a pair with the same leading matrix as P ) below and to the right of
the “11-position” in the trailing matrix are zero. While the objective is primarily to
show just that H = O( m0 ), we will still write
H = H 0 + m RH
with H 0 denoting the nominal solution corresponding to m = 0, in order to be able
to provide better estimates of the size of H. Hence, we shall prove uniqueness of H 0
and that kRH k2 = O( m0 ).
This section is to a large extent analogous to the previous section. This should come
as no surprise since the decoupling can be implemented with the same kind of transform used in the previous section, only applied to the transposed matrix pair this
time. While this proves the existence, the pair PLT has some structural differences to
the pair P , and we aim to exploit this to get insight into the problem and hopefully
obtain tighter bounds. We shall return to the duality between the two decoupling
steps in section 7.A.3 when we have the bounding expressions of the two steps at
hand, and the reader who is not interested in the minor details of obtaining good
bounds should skip to section 7.A.3 at this point.
The condition that H must satisfy has a corresponding residual function



i h
i I
4 h
E33 E34 
δH ( H ) = A12 A13 0 + A11 + A12 A13 0 L H 


E43 E44


 
 h
 A22 A23 I24  I
i
 

E33 E34  L A12 A13 0
− H  A32 I33
 − 
 


I42
F44
E43 E44



 (7.46)

so that the equation is written
!
δH ( H ) = 0
Using knowledge about L0 , the equation for H 0 simplifies to
 0

0 
A22 A023 I24
 h
I
i h
i 

!
0
 = A0 A0 0 + A0 + A0 A0 0 L0 H 0 
H 0 A032 I33

11
12
13
12
13

 0

I42
0
(7.47)
0
0


0

0
7.A
217
Decoupling transforms
Reading off the last block column of the equation reveals the readily invertible
!
0
H10 I24
=0
From H10 = 0 it follows that

I

H 0 

E33
E43
(7.48)



0

E34  = H 0 


E44
E33
E43


E34 

E44
and hence that H 0 is given by either of the following two expressions
"
"
# #
h
i A0 I 0 −1
0
0
0
32
33
H = 0 A12 A13
0
I42
 0
0
0 −1
h
i A22 A23 I24 
0

= A012 A013 0 A032 I33

 0
I42
0
(7.49)
(7.50)
Since H 0 only solves the nominal equation, δH ( H 0 ) is generally non-zero. However,
using (7.49) and (7.45) it is seen that δH ( H 0 ) = O( m ), so by taking m sufficiently
small, we know the existence of
∃ cH < ∞ : δ ( H 0 ) ≤ cH m
0
H
2
0
Note that it was possible to solve for H 0 without introducing additional assumptions
about distinct eigenvalues, as is typically needed when the equation is in the form
!
H 0 A + B H 0 = C. In particular, this means that the linear operator H defined by




A22 A23 I24  I

h
i
4

 − A11 + A12 A13 0 L H 0 

0
0
H( H 0 ) = H 0 A32 I33




I42
F44
0 0
has a matrix which is only O( m ) away from an invertible one, and hence taking m
sufficiently small will enable us to bound the inverse of the operator,
P2–[7.13] Property. The constant cH < ∞ shall be chosen so that
∃ : H−1 ≤ c
2
H
(P2–7.51)
Now that the existence and uniqueness of a nominal solution has been established,
we turn to RH . Inserting H = H 0 + m RH in (7.47) and cancelling a factor of m in the
equation, one obtains


I
 h
i
!

E33 E34  L A12 A13 0
H( RH ) = m−1 δH ( H 0 ) + RH 


E43 E44


0

h
i 
E33 E34  (7.52)
+ A11 + A12 A13 0 L RH 


E43 E44
218
7 lti ode of nominal index 2
which is written
!
H( RH ) = m−1 δH ( H 0 ) + m h( RH )
(7.53)
by means of the definition

I
4

h( RH ) = RH m−1 

h
E33
E43
h
+ A11 + A12

 h
E34  L A12

E44
A13
A13

0
i i
0 L RH m−1 

i
0
E33
E43


E34 

E44
Restricting the analysis to m so small that (7.45) is valid and using that
h
h
i i
A11 + A12 A13 0 L ≤ c1 + m c2 ρL
2
the gain of the linear function h can be bounded as
kh( RH )k2 0 ≤ L 2 cE + ρL c2 + [ c1 + m c2 ρL ] cE C ch
kRH k2
(7.54)
For the operator TH defined by
4
(7.55)
TH RH = H−1 m−1 δH ( H 0 ) + m h( RH )
and restricted to RH ∈ H = RH : kRH k2 ≤ ρH , where ρH is to be selected later, we
then obtain
kTH RH k2 ≤ cH c0H + m ch ρH
TH RH,2 − TH RH,1 = m H−1 h( RH,2 − RH,1 ) 2
2
R
≤ m c H ch
H,2 − RH,1 2
It just remains to set ρH = ( 1 + αH ) cH c0H with αH > 0, so that
m≤
1
αH
cH ch 1 + αH
(7.56)
ensures that TH maps H into itself, and since this implies that
m c H ch < 1
the contraction property imposes no additional requirements on m.
This concludes the proof of existence of the approximation
H = H 0 + m RH
valid for sufficiently small m. The provided bound on kRH k2 is no less than cH c0H ,
and for each choice of the bound, a corresponding bound on m follows from (7.56).
7.A
219
Decoupling transforms
We now end the section with a last analogy to the previous section.

 


 I
 0

I
 H 
E33 E34  = H 0 
E33 E34  + m RH 
E33 E34  






E43 E44 2 E43 E44
E43 E44 2
≤ H 0 2 cE + ρH m
7.A.3
(7.57)
Remarks on duality
We indicated in the beginning of section 7.A.2 that the existence of the second decoupling step could simply be obtained by applying the decoupling developed in
section 7.A.1 to the transposed pair PLT. In this section we shall make some comparisons that will show whether or not the development in section 7.A.2 was any good.
The theory provides two bounds, one on m which should be large for wider applicability, and one on ρH which should be small for increased precision in the results.
In view of n view of theorem 2.50, however, obtaining a tight bound on ρH may be
of little importance in applications as iterative refinement will generally produce RH
with kRH k2 < ρH anyway. Hence, the crucial bound for the comparison at hand is
that of m.
We need some notation to indicate what the expressions in section 7.A.1 would be if
applied to the decoupling step in section 7.A.2. We will let
{expr}7.A.1
denote the expression or quantity we would use in place of expr in section 7.A.1.
The pair PLT is given by

 I
 

 
I
 

 
E33

 


E43
|


 ,
E34 

E44
h

 A11 + A12 A13

 T 

A12 

AT 

 13 



0
{z
i T
0 L





T
M 









}
PLT
where

A22

M = A32

I42
A23
I33
 
I24  I
 
 − 
 
F44
E33
E43

 h
E34  L A12

E44
A13
0
i
That is, to use the notation of section 7.A.1 we have to make the replacements



T

I


 
I





 


E
E
E
E
∼


33
34
33
34

 







E43 E44 7.A.1
E43 E44
h
i T
{A11 }7.A.1 ∼ A11 + A12 A13 0 L
nh
io
A12 A13 0
∼0
7.A.1
220
7 lti ode of nominal index 2


A


 22


A32


 I42


 T 

A 


A12 

 21  

AT 
A
∼


 13 


31










 0 
0
7.A.1

A23 I24  


 
I33
∼ MT
 



F44
7.A.1
Using the replacement rules, we find that
n
h
i o
h
A11 + A12 A13 0 L0
∼ A11 + A12
7.A.1
i T
0 L
A13
and hence the only difference between the two operators L and HT is that


A22 A23 I24 
A

 32 I33



I42
F44
has been replaced by M, which we know is an O( m ) difference.
Noting that H−1 2 = (HT)−1 2 , we see that the larger uncertainty in M generally
implies that
{cL }7.A.1 ≥ cH ,
{cL }7.A.1 − cH = O( m )
Next, from δH ( −H )T = {δL }7.A.1 ( H ) it follows that
n o
c0H = c0L
7.A.1
Hence for a given value of the trade-off parameter α, applying section 7.A.2 would
yield the better bound on ρH , but the difference is small.
To see if there are any interesting differences in the more crucial bound on m, we
assume that αL7.A.1 is selected optimal in this respect. In section section 7.A.2, the
bound on (7.56) approaches
1
c H ch
from below as αH grows. Since {c2 }7.A.1 ∼ 0, the condition (7.44) shall be used in
section 7.A.1,
(
)
1
m≤
cE c1 cL 7.A.1
where
{c1 }7.A.1 ∼ c1 + m c2 ρL
This means that {cE c1 }7.A.1 constitutes just one of the two positive terms in ch (see
(7.54)), and hence the bound (7.56) is more restricting nthan (7.44),
o even for large
.
values of αH . At the same time, (7.44) is valid as soon as αL ≥ c 1−1
L
7.A.1
7.B
221
Example data
To conclude this section, the most promising approach to the second decoupling step
is that in section 7.A.1. In a particular problem with uncertain data, however, many
of the triangle inequalities used to establish bounds in this section, take (7.54) as an
example, may be unnecessarily conservative; more direct methods for computing upper bounds on the gain are likely to produce tighter bounds. Hence, we are unable
to tell a priori which approach would superior for some given data. In the implementation behind some of the examples in this thesis, we have only implemented
section 7.A.1.
7.B
Example data
This section contains tables with matrix pair data referenced from section 7.1.3. To
fit the matrices on the page, the pair
h
i h
i
E 1 E 2 , A1 A2
where the partitioning is related with space constraints — and not with the real structure in the pair — will be typeset as
[A1
[E1
···
A2 ]
E2 ]
and rotated.
···
−0.56744±1 · 10−9
0.10134±1 · 10−9
1.0545±1 · 10−9
0.63513±1 · 10−9
−0.81415±1 · 10−9
0.34382±1 · 10−9
0.8838±1 · 10−9
1.1997±1 · 10−9
−0.45008±1 · 10−9
−7.1453 · 10−2 ±1 · 10−9
0.1257±1 · 10−9
0.29268±1 · 10−9
6.6423 · 10−2 ±1 · 10−9 4.5755 · 10−2 ±1 · 10−9 0.44589±1 · 10−9
−0.47065±1 · 10−9
0.14189±1 · 10−9
0.29152±1 · 10−9
0.7618±1 · 10−9
0.69394±1 · 10−9
0.16941±1 · 10−9
−0.63687±1 · 10−9
0.92851±1 · 10−9
0.51348±1 · 10−9
1.0851±1 · 10−9
−0.43322±1 · 10−9 −1.2893±1 · 10−9
0.33946±1 · 10−9 5.5605 · 10−2 ±1 · 10−9
−0.45697±1 · 10−9
−0.39125±1 · 10−9 −0.62017±1 · 10−9
5.8362 · 10−2 ±1 · 10−9
0.48852±1 · 10−9
0.15069±1 · 10−9
−0.22055±1 · 10−9
0.28734±1 · 10−9
−0.14599±1 · 10−9
0.59524±1 · 10−9
0.68238±1 · 10−9 6.2673 · 10−2 ±1 · 10−9
0.24208±1 · 10−9
−0.18688±1 · 10−9
0.86988±1 · 10−9
−0.59554±1 · 10−9
−0.33192±1 · 10−9
0.64177±1 · 10−9
−0.62088±1 · 10−9
0.74071±1 · 10−9 9.9538 · 10−2 ±1 · 10−9 −4.7671 · 10−3 ±1 · 10−9
0.10846±1 · 10−9
−0.41661±1 · 10−9
2.5639 · 10−2 ±1 · 10−9

0.54801±1 · 10−9
−0.16533±1 · 10−9
−1.543±1 · 10−9
−6.5648 · 10−2 ±1 · 10−9
0.31343±1 · 10−9
−0.57061±1 · 10−9
−7.7208 · 10−2 ±1 · 10−9
−1.3632±1 · 10−9
−2.6105 · 10−2 ±1 · 10−9
−0.51955±1 · 10−9
1.0882±1 · 10−9
−0.66258±1 · 10−9
−0.15435±1 · 10−9
−0.57407±1 · 10−9
9.1776 · 10−2 ±1 · 10−9
−0.11226±1 · 10−9
−1.4279±1 · 10−9
0.33673±1 · 10−9
−0.2089±1 · 10−9
−1.3841±1 · 10−9
0.22632±1 · 10−9
0.57909±1 · 10−9

0.29551±1 · 10−9
1.6751±1 · 10−9
0.1552±1 · 10−9
0.31229±1 · 10−9

−9
−2
−9
−9
−9

0.21854±1 · 10
2.4901 · 10 ±1 · 10
0.62085±1 · 10
−0.25087±1 · 10

−9
−9
−9
−9

0.46305±1 · 10
0.44453±1 · 10
−0.86007±1 · 10
−0.29519±1 · 10

−9
−9
−9
−9
−0.55363±1 · 10
−0.40435±1 · 10
0.61376±1 · 10
0.71604±1 · 10

−2
−9
−9
−9
−2
−9

0.10724±1 · 10
0.10682±1 · 10
−3.1899 · 10 ±1 · 10 
5.9122 · 10 ±1 · 10

−9
−2
−9
−9
−2
−9
−0.75859±1 · 10
−5.3715 · 10 ±1 · 10
−0.84413±1 · 10
7.6798 · 10 ±1 · 10

−9
−9
−9
−9

−0.82638±1 · 10
−0.31475±1 · 10
0.36745±1 · 10
−0.23708±1 · 10
0.18827±1 · 10−9
−0.72398±1 · 10−9
−6.0146 · 10−2 ±1 · 10−9
1.176±1 · 10−9

−0.10204±1 · 10−9 4.0286 · 10−2 ±1 · 10−9 2.2346 · 10−2 ±1 · 10−9
−0.48771±1 · 10−9 
−0.28646±1 · 10−9
−1.242±1 · 10−9
−0.13023±1 · 10−9
0.15927±1 · 10−9 


−0.13891±1 · 10−9
0.32844±1 · 10−9
0.32631±1 · 10−9
−3.034 · 10−4 ±1 · 10−9 

−9
−9
−2
−9
−9
0.92142±1 · 10
−0.73052±1 · 10
−4.9639 · 10 ±1 · 10
0.51766±1 · 10


0.36114±1 · 10−9
−0.28374±1 · 10−9
0.52565±1 · 10−9
0.84354±1 · 10−9 

−0.18868±1 · 10−9
0.58109±1 · 10−9
0.38569±1 · 10−9
0.87639±1 · 10−9 

1.0234±1 · 10−9
−0.24213±1 · 10−9
−0.78102±1 · 10−9
−0.58414±1 · 10−9 

0.16812±1 · 10−9
0.9353±1 · 10−9
2.6376 · 10−2 ±1 · 10−9
−0.89432±1 · 10−9 
−0.29085±1 · 10−9 −0.73002±1 · 10−9
0.20136±1 · 10−9
0.25038±1 · 10−9
 1.1712±1 · 10−9

 1.5482±1 · 10−9
 0.49793±1 · 10−9

 0.40038±1 · 10−9
 −5.0966 · 10−2 ±1 · 10−9
 0.29889±1 · 10−9

 −0.33093±1 · 10−9
 −5.9865 · 10−3 ±1 · 10−9
−0.13283±1 · 10−9
 0.7529±1 · 10−9

 5.1293 · 10−2 ±1 · 10−9
 0.58978±1 · 10−9

 5.2723 · 10−2 ±1 · 10−9
 0.71232±1 · 10−9

 0.62964±1 · 10−9
 0.64198±1 · 10−9

 8.108 · 10−2 ±1 · 10−9
0.37423±1 · 10−9
222
7 lti ode of nominal index 2
Table 7.2: The initial matrix pair.
0
0
0
0
0
−16
1.5999 · 10 ±6.2549 · 10−6
−17
−7.8679 · 10 ±6.5123 · 10−6
−5.2925 · 10−17 ±2.2497 · 10−6
5.9823 · 10−17 ±2.5663 · 10−6
···
−1.0818±7.6346 · 10−4 −1.3864±1.0081 · 10−3
−0.5101±1.3366 · 10−3 −0.75049±1.9634 · 10−3
0.52916±1.8735 · 10−3 0.6205±2.7679 · 10−3
0
0
0
0
0
0
0
0
0
0
0
0
1 0 0 0 0
 0 1 0 0 0
 0 0 1 0 0
 0 0 0 1 0
 0 0 0 0 1
 0 0 0 0 0
 0 0 0 0 0

00000
00000
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
1

0
0
0

0
0
0

0
0
0

0
1
0

0
0
1


0
0
0

1
0
0

−13
−6
−12
−6
0 3.0652 · 10 ±4.3756 · 10
3.145 · 10 ±4.7619 · 10
0 −4.235 · 10−13 ±5.6642 · 10−6 −3.818 · 10−12 ±6.2896 · 10−6

0
0
0

0
0
0

0
0
0

0
0
0

0
0
0

1.5112 · 10−16 ±1.7972 · 10−6 2.3882 · 10−16 ±4.0818 · 10−7 9.2357 · 10−17 ±6.2096 · 10−7 

−17
−6
−16
−7
−16
−7
−3.0947 · 10 ±1.8711 · 10 −2.6407 · 10 ±4.2497 · 10
−1.393 · 10 ±6.4651 · 10 

−16
−7
−16
−7
−16
−7
4.2506 · 10 ±6.4639 · 10
−2.4822 · 10 ±1.4681 · 10 −2.3711 · 10 ±2.2334 · 10 
−16
−7
−16
−7
−16
−7
−2.9196 · 10 ±7.3733 · 10
3.8433 · 10 ±1.6747 · 10
2.5761 · 10 ±2.5477 · 10
0
0
0
0
0
1
0
0
0
0

−3
 0.88959±1.0295 · 10−3
 0.53515±1.9335 · 10
 −0.36626±2.7152 · 10−3

0

0


0

0

0
7.B
Example data
223
Table 7.3: The decomposed pair in the proposed canonical form.

1.171±6.99 · 10−2
−0.5674±2.231 · 10−2
0.1013±1.708 · 10−2
1.055±1.25 · 10−2
0.548±7.038 · 10−2

0.6351±2.005 · 10−2
−0.8142±1.576 · 10−2
0.3438±1.136 · 10−2
−0.1653±6.651 · 10−2
 1.548±6.788 · 10−2
 0.498±6.367 · 10−2
0.8838±2.512 · 10−2
1.2±1.842 · 10−2
−0.4501±1.359 · 10−2
−1.543±7.037 · 10−2

 0.4004±6.809 · 10−2 −7.146 · 10−2 ±3.18 · 10−2 0.1257±2.203 · 10−2 0.2927±1.646 · 10−2 −6.563 · 10−2 ±8.163 · 10−2
 −5.113 · 10−2 ±6.54 · 10−2 6.645 · 10−2 ±4.252 · 10−2 4.574 · 10−2 ±2.834 · 10−2 0.4459±2.138 · 10−2
0.3135±9.378 · 10−2

 0.299±4.122 · 10−2
−0.4707±2.686 · 10−2
0.1419±1.845 · 10−2
0.2915±1.425 · 10−2
−0.5707±5.858 · 10−2

−0.3309±0.1286
0.7618±3.414 · 10−2
0.6939±2.721 · 10−2
0.1694±1.981 · 10−2
−7.719 · 10−2 ±0.1211

 −6.025 · 10−3 ±6.027 · 10−2 −0.6369±2.08 · 10−2
0.9285±1.597 · 10−2
0.5135±1.189 · 10−2
−1.363±6.252 · 10−2
−0.1327±7.339 · 10−2
1.085±3.024 · 10−2
−0.4332±2.179 · 10−2 −1.289±1.608 · 10−2 −2.617 · 10−2 ±8.285 · 10−2
 0.7529±4.42 · 10−3
0.3395±1.008 · 10−3 5.56 · 10−2 ±8.774 · 10−4
−0.457±5.882 · 10−4
−0.5196±3.921 · 10−3
 5.129 · 10−2 ±3.431 · 10−3 −0.3913±8.262 · 10−4 −0.6202±7.223 · 10−4 5.836 · 10−2 ±4.827 · 10−4
1.088±3.102 · 10−3

−0.2205±6.36 · 10−4
−0.6626±4.462 · 10−3
 0.5898±4.859 · 10−3 0.4885±1.194 · 10−3 0.1507±9.969 · 10−4
 5.273 · 10−2 ±7.656 · 10−3 0.2873±1.82 · 10−3
−0.146±1.524 · 10−3
0.5952±9.755 · 10−4
−0.1544±6.941 · 10−3

 0.7123±1.083 · 10−2 0.6824±2.659 · 10−3 6.267 · 10−2 ±2.172 · 10−3
0.2421±1.34 · 10−3
−0.5741±9.992 · 10−3
 0.6296±5.721 · 10−3 −0.1869±1.442 · 10−3 0.8699±1.195 · 10−3
−0.5955±7.426 · 10−4
9.178 · 10−2 ±5.344 · 10−3

 0.642±7.147 · 10−3 −0.3319±1.637 · 10−3 0.6418±1.499 · 10−3
−0.6209±1.062 · 10−3
· 10−3
 8.108 · 10−2 ±3.986 · 10−3 0.7407±9.501 · 10−4 9.954 · 10−2 ±8.377 · 10−4 −4.767 · 10−3 ±5.591 · 10−4 −0.1123±6.284
−1.428±3.596 · 10−3
0.3742±6.36 · 10−3
0.1085±1.555 · 10−3
−0.4166±1.292 · 10−3
2.564 · 10−2 ±8.148 · 10−4
0.3367±5.836 · 10−3
···

−2
−2
−0.209±8.443 · 10
−1.384±4.042 · 10
0.2263±1.645 · 10−2
0.5792±5.876 · 10−2 
−2
−2
−2
−2

0.2955±8.149 · 10
1.675±3.565 · 10
0.1552±1.418 · 10
0.3123±5.735 · 10

0.2186±7.705 · 10−2
2.489 · 10−2 ±4.706 · 10−2
0.6209±1.947 · 10−2
−0.2509±5.317 · 10−2 


−2
−2
−2
−2
0.4631±8.31 · 10
0.4445±6.101 · 10
−0.8601±2.569 · 10
−0.2953±5.555 · 10

−2
−2
−2
−2

−0.5539±8.098 · 10
−0.4044±8.413 · 10
0.6138±3.618 · 10
0.7162±5.261 · 10
−2
−2
−2
−2
−2
−2

5.917 · 10 ±5.139 · 10
0.1073±5.293 · 10
0.1068±2.26 · 10
−3.191 · 10 ±3.39 · 10 
−2
−2
−2
−2

−0.7586±0.154
−5.371 · 10 ±5.87 · 10
−0.8441±2.269 · 10
7.675 · 10 ±0.1076 

−0.8265±7.291 · 10−2
−0.3148±3.756 · 10−2
0.3675±1.509 · 10−2
−0.237±5.086 · 10−2 
0.1884±8.888 · 10−2
−0.724±5.711 · 10−2
−6.014 · 10−2 ±2.386 · 10−2
1.176±6.08 · 10−2

−0.102±5.491 · 10−3 4.029 · 10−2 ±1.863 · 10−3 2.235 · 10−2 ±8.336 · 10−4
−0.4877±3.916 · 10−3 

−0.2865±4.26 · 10−3
−1.242±1.536 · 10−3
−0.1302±6.852 · 10−4
0.1593±3.059 · 10−3


−0.1389±6.07 · 10−3
0.3284±2.296 · 10−3
0.3263±1.03 · 10−3
−3.025 · 10−4 ±4.316 · 10−3 

0.9214±9.599 · 10−3
−0.7305±3.488 · 10−3 −4.964 · 10−2 ±1.591 · 10−3
0.5177±6.789 · 10−3


0.3611±1.363 · 10−2
−0.2837±5.204 · 10−3
0.5257±2.361 · 10−3
0.8435±9.615 · 10−3

−0.1887±7.11 · 10−3
0.5811±2.797 · 10−3
0.3857±1.189 · 10−3
0.8764±5.113 · 10−3


1.023±8.84 · 10−3
−0.2421±2.904 · 10−3
−0.781±1.287 · 10−3
−0.5841±6.386 · 10−3 

−3
−3
−2
−4
−3
0.1681±4.911 · 10
0.9353±1.741 · 10
2.638 · 10 ±7.367 · 10
−0.8943±3.56 · 10
−0.2908±7.978 · 10−3
−0.73±3.009 · 10−3
0.2014±1.371 · 10−3
0.2504±5.647 · 10−3
224
7 lti ode of nominal index 2
Table 7.4: Reconstruction of the original pair by applying the reverse transformations to the pair in its canonical form. Transformations are collapsed to just
one matrix on each side of the pair, before the pair itself is transformed.
 1.171±1.161
−0.5674±0.1622
0.1013±0.1536
1.055±0.1129
0.548±0.9027
 1.548±2.023
0.6351±0.2839
−0.8141±0.2683 0.3438±0.1969
−0.1653±1.574
 0.4979±2.43
0.8838±0.3404
1.2±0.3215
−0.4501±0.2361
−1.543±1.889

 0.4004±1.821 −7.145 · 10−2 ±0.2545 0.1257±0.2403 0.2927±0.1768 −6.566 · 10−2 ±1.415
 −5.1 · 10−2 ±3.839 6.641 · 10−2 ±0.5376 4.574 · 10−2 ±0.5076 0.4459±0.373
0.3135±2.985
 0.2989±2.961
−0.4706±0.4161
0.1419±0.3929
0.2915±0.2882
−0.5706±2.304

 −0.3309±2.288
0.7618±0.3214
0.6939±0.3032
0.1694±0.2226 −7.721 · 10−2 ±1.779
 −5.987 · 10−3 ±1.566 −0.6369±0.2184
0.9285±0.2065
0.5135±0.1523
−1.363±1.217
−0.1328±1.501
1.085±0.2107
−0.4332±0.1989
−1.289±0.146 −2.61 · 10−2 ±1.168
 0.7529±2.272 · 10−2 0.3395±3.392 · 10−3 5.561 · 10−2 ±3.576 · 10−3
−0.457±2.39 · 10−3
−0.5196±1.793 · 10−2

1.088±3.035 · 10−2
 5.129 · 10−2 ±3.855 · 10−2 −0.3913±5.704 · 10−3 −0.6202±5.892 · 10−3 5.836 · 10−2 ±4.016 · 10−3
 0.5898±4.695 · 10−2 0.4885±6.988 · 10−3 0.1507±7.263 · 10−3
−0.2205±4.909 · 10−3
−0.6626±3.702 · 10−2

 5.272 · 10−2 ±3.834 · 10−2 0.2873±5.695 · 10−3 −0.146±5.991 · 10−3
0.5952±4.008 · 10−3
−0.1544±3.024 · 10−2
 0.7123±7.614 · 10−2
−2 6.267 · 10−2 ±1.197 · 10−2
−3
0.6824±1.14
·
10
0.2421±7.988
·
10
−0.5741±6.013
· 10−2

 0.6296±5.635 · 10−2 −0.1869±8.424 · 10−3 0.8699±8.747 · 10−3
−0.5955±5.886 · 10−3
9.178 · 10−2 ±4.447 · 10−2
 0.642±4.694 · 10−2 −0.3319±7.059 · 10−3 0.6418±7.31 · 10−3
−0.6209±4.972 · 10−3
−0.1123±3.707 · 10−2

 8.108 · 10−2 ±2.801 · 10−2 0.7407±4.207 · 10−3 9.954 · 10−2 ±4.454 · 10−3 −4.767 · 10−3 ±2.946 · 10−3 −1.428±2.214 · 10−2
0.3742±3.396 · 10−2
0.1085±5.004 · 10−3
−0.4166±5.186 · 10−3
2.564 · 10−2 ±3.529 · 10−3
0.3367±2.673 · 10−2
···

−0.2089±1.369
−1.384±0.2295
0.2263±7.882 · 10−2
0.5791±0.9775

0.2955±2.385
1.675±0.4023
0.1552±0.1375
0.3123±1.704


0.2185±2.864 2.49 · 10−2 ±0.4825
0.6209±0.1649
−0.2509±2.046 
0.4631±2.146
0.4445±0.3601
−0.8601±0.1232
−0.2952±1.532 


−0.5537±4.525
−0.4043±0.7619
0.6138±0.2604
0.7161±3.232

5.913 · 10−2 ±3.49
0.1072±0.59
0.1068±0.2016
−3.191 · 10−2 ±2.494 

−0.7586±2.697 −5.372 · 10−2 ±0.455
−0.8441±0.155
7.68 · 10−2 ±1.927 

−0.8264±1.847
−0.3147±0.3083
0.3674±0.1056
−0.2371±1.317 
0.1883±1.77
−0.724±0.2986 −6.014 · 10−2 ±0.1019
1.176±1.265

−0.102±2.702 · 10−2 4.029 · 10−2 ±5.151 · 10−3 2.235 · 10−2 ±2.046 · 10−3
−0.4877±1.993 · 10−2 

−0.2865±4.578 · 10−2 −1.242±8.507 · 10−3
−0.1302±3.247 · 10−3
0.1593±3.35 · 10−2


−0.1389±5.579 · 10−2
0.3284±1.049 · 10−2
0.3263±4.045 · 10−3
−3.032 · 10−4 ±4.094 · 10−2 

0.9214±4.56 · 10−2
−0.7305±8.636 · 10−3 −4.964 · 10−2 ±3.421 · 10−3
0.5177±3.358 · 10−2


0.3611±9.055 · 10−2
−0.2837±1.726 · 10−2
0.5257±6.771 · 10−3
0.8435±6.671 · 10−2

−0.1887±6.696 · 10−2
0.5811±1.269 · 10−2
0.3857±4.833 · 10−3
0.8764±4.916 · 10−2

1.023±5.579 · 10−2
−0.2421±1.048 · 10−2
−0.781±3.975 · 10−3
−0.5841±4.088 · 10−2 

−2
−3
−2
−3
−2

0.1681±3.334 · 10
0.9353±6.413 · 10
2.638 · 10 ±2.573 · 10
−0.8943±2.465 · 10
−2
−3
−3
−2
−0.2908±4.034 · 10
−0.73±7.492 · 10
0.2014±2.893 · 10
0.2504±2.954 · 10
7.B
Example data
225
Table 7.5: Reconstruction of the original pair by applying the reverse transformations to the pair in its canonical form. Transformations are applied one by
one.
8
LTV ODE
of nominal index 1
In the previous chapter, we explored some of the difficulties in generalizing the results for lti systems of nominal index 1 to nominal index 2. In this chapter we take
on another generalization of the lti nominal index 1 results, namely that to timevarying systems. In view of the failure to produce a general convergence result for
the lti systems of nominal index 2, treating ltv systems of nominal index 1 is the
best we can hope for. Unlike chapter 6, the technicalities of dealing with non-zero
pointwise indicies are avoided in the current chapter (compare with section 6.8).
The idea to use a fixed-point theorem to prove the existence of decoupling transforms
for ltv systems appears in Chang (1969, 1972). When it appears again in Kokotović
et al. (1986, section 5:2), it has been modified slightly, and we shall remark on the
difference in due time.
The chapter is organized as follows. Section 8.1 prepares the analysis of systems
with timescale separation by considering systems where only the fast time scale is
present. For ltv dae of nominal index 1, the first steps of analysis (corresponding
to section 6.1 for lti systems) lead to the linear time-varying matrix-valued singular
perturbation form
!
x0 (t) + A11 (t) x(t) + A12 (t) z(t) = 0
!
E(t) z 0 (t) + A21 (t) x(t) + A22 (t) z(t) = 0
The decoupling of these equations into slow and uncertain subsystems is the topic
of section 8.2, and section 8.3 contains some remarks on the difference compared
to the scalar perturbation case. In section 8.4 the results of previous sections are
summarized in a theorem for ltv dae of nominal index 1. Section 8.5 concludes the
chapter.
227
228
8.1
8 ltv ode of nominal index 1
Slowly varying systems
Here, the results in Kokotović et al. (1986, section 5.2) are generalized to matrixvalued singular perturbations. The form of equations to be analyzed is
!
E(t) z 0 (t) + A(t) z(t) = 0
(8.1)
where E(t) is an unknown square matrix, which is at least assumed non-singular and
with a known bound on the entries, max(E(t)) ≤ m. (For comparison, the uncertainty
has the form E(t) = I in Kokotović et al. (1986).) Our interest is restricted to systems
whose time-invariant approximations at each time instant are stable, as formalized
by the following assumption about the eigenvalues λ of the pair ( E(t), A(t) ) for a
fixed t:
A1–[8.1] Assumption. Assume there exist constants R0 > 0, φ0 ∈ [ 0, π/2 ), and
ā > supt max(A(t)) such that
|λ| m < ā
and
|λ| > R0 =⇒ |arg(−λ)| ≤ φ0
(A1–8.2)
where ā presents a trade-off between generality of the assumption and the quantitative properties of the forthcoming convergence results.
We refer to section 6.5, A1–[6.14], and lemma 6.18 for illustration and discussion of
this assumption. The method used in experiments to produce time-varying perturbations in agreement with A1–[8.1] is described in appendix A.
Two more constants are introduced to specify properties of A.
P1–[8.2] Property. The constant c2 < ∞ shall be chosen so that
(P1–8.3)
kAkI ≤ c2
P2–[8.3] Property. The constant c3 < ∞ shall be chosen so that
A−1 ≤ c
I
3
(P2–8.4)
The bound on kAkI is used also in Kokotović et al. (1986), while the bound on A−1 I
is a consequence of the need to deal with the matrix-valued uncertainty E instead of
a scalar.
8.4 Remark. To depend on a bound on A−1 I is actually very natural for the present setup.
Since a bound on kAkI is needed, it is realized that the smaller this bound is, the stronger the
conclusions regarding convergence will be. Further, any convergence result should be such
that smaller values of the bound
m on E(t) also leads to stronger conclusions. Hence, if there
would be no need to bound A−1 I , scaling both E and A by some positive factor less than
would yield stronger results! This is clearly contradictory, and we may consider bounding
1
A−1 as a way of fixing the scaling of the problem so that an absolute interpretation of m
I
becomes meaningful.
8.5 Remark. That P2–[8.3] should relate to the scaling of the problem is also well in agreement with example 6.28, where
the
was fixed
by inverting the trailing
scaling of the problem
matrix. Then, a bounded A(t)−1 2 ensures that max A(t)−1 E(t) will still be O( m ). In view
8.1
229
Slowly varying systems
of the possibility to invert the trailing matrix, and in view of the success of this approach in
!
example 6.28, it would also make sense to use E(t) z 0 (t) + z(t) = 0 as a starting point instead
of (8.1). On the other hand, the inversion of the trailing matrix will generally introduce additional uncertainty in the problem, and it is therefore of value to not assume that this has been
done beforehand.
The assumption A1–[8.1] is recognized as (A1–6.26) in chapter 6, to which we refer
for illustration and discussion of this condition. The following results are readily
extracted from corollary 6.19 in the same chapter.
8.6 Lemma. Under A1–[8.1], there is a constant $k_1$ such that
$$\left\| m\, E(t)^{-1} A(t) \right\|_2 \le k_1 \qquad (8.5)$$
Proof: This is readily extracted from the proof of corollary 6.19.
Since
$$\lambda_{\min}\!\left( -E(t)^{-1} A(t) \right) \ge \left\| A(t)^{-1} E(t) \right\|_2^{-1} \ge \left\| A(t)^{-1} \right\|_2^{-1} \left\| E(t) \right\|_2^{-1} \ge c_3^{-1}\, n^{-1}\, m^{-1}$$
(where n is the dimension of (8.1)) we will only consider
$$m \le R_0^{-1}\, c_3^{-1}\, n^{-1} \qquad (8.6)$$
in the rest of the chapter, so that the argument bound in A1–[8.1] on the uncertain eigenvalues applies.
8.7 Lemma. Assume A1–[8.1] and take m according to (8.6). Then there exist constants $K_*$ and $a_* > 0$ such that for all $\theta \ge 0$,
$$\left\| e^{-E(t)^{-1} A(t)\, \theta} \right\|_2 \le K_*\, e^{-a_*\, \theta} \qquad (8.7)$$
Proof: As in corollary 6.19, we use that A1–[8.1] together with (8.6) implies that there exists a constant $a_* > 0$ such that
$$\alpha\!\left( -E(t)^{-1} A(t) \right) < -\underbrace{ \tfrac{1}{2}\, m^{-1}\, c_3^{-1}\, n^{-1} \cos( \phi_0 ) }_{\ge\, a_*} < 0$$
Hence (using lemma 8.6) the ratio
$$\frac{ \left\| -E(t)^{-1} A(t) \right\|_2 }{ -\alpha\!\left( -E(t)^{-1} A(t) \right) }$$
is also bounded by a constant independent of t, and the bound (8.7) is now a consequence of theorem 2.27.
8.8 Corollary. Assuming A1–[8.1], using P2–[8.3], and taking m according to lemma 8.7, there is a bound on $\left\| m\, E^{-1} \right\|_I$.
Proof:
$$\left\| m\, E^{-1} \right\|_I = \left\| m\, E^{-1} A\, A^{-1} \right\|_I \le c_3\, k_1$$
The most important thing in lemma 8.7 is that the exponential decay rate (with respect to θ) is independent of t. From here, it would be possible to derive results along a path parallel to that taken in Kokotović et al. (1986), but instead we shall build upon those results.
Since it has been assumed that E(t) is invertible, the system (8.1) can be written in ode form. Scaling the equation by m, the ode reads
$$m\, z'(t) = -m\, E(t)^{-1} A(t)\, z(t) \qquad (8.8)$$
which is reminiscent of the standard singular perturbation setup, although it contains the matrix-valued uncertainty E(t) on the right-hand side. However, not much has to be known about the right-hand side in order to apply the results in Kokotović et al. (1986), and the assumptions 2.1 and 2.2 made there have already been treated in the current context as part of the proof of lemma 8.7. We now make an additional assumption regarding the time-variability, corresponding to assumption 2.3 in Kokotović et al. (1986).
A2–[8.9] Assumption. Assume
$$\left\| \frac{d}{dt}\!\left( m\, E(t)^{-1} A(t) \right) \right\|_I \le \beta_1 \qquad (A2–8.9)$$
for some constant $\beta_1$.
This assumption involves the time variability of E(t), and may seem hard to justify in applications. However, in seminumerical approaches to index reduction in dae, the uncertainty E(t) may be a symbolic expression which one is unable (or unwilling to try) to reduce to zero, and then it may be possible to compute a true bound on the time variability of E(t). By writing
$$\left( m\, E^{-1} A \right)' = m\, E^{-1} A\; A^{-1} A' - m\, E^{-1} A\; A^{-1} \left( m^{-1} E' \right) m\, E^{-1} A$$
it is seen that (using lemma 8.6)
$$\left\| \frac{d}{dt}\!\left( m\, E(t)^{-1} A(t) \right) \right\|_2 \le k_1\, c_3\, \left\| A'(t) \right\|_2 + k_1^2\, c_3\, \left\| m^{-1} E'(t) \right\|_2$$
It follows that the following two conditions may be a useful alternative to A2–[8.9].
P3–[8.10] Property. The constant $\beta_2 < \infty$ shall be chosen so that
$$\left\| A' \right\|_I \le \beta_2 \qquad (P3–8.10)$$
A3–[8.11] Assumption. Assume
$$\left\| E' \right\|_I \le m\, \beta_3 \qquad (A3–8.11)$$
for some constant $\beta_3$.
The second of these should be interpreted as a requirement that the bound on $\| E' \|_I$ scales with the bound on $\| E \|_I$, which should be reasonable in many situations.
8.12 Lemma. Given a consistent choice of eigenvalue conditions, selecting $\beta_3 \ge \frac{\beta_2}{c_2}$ in A3–[8.11] is sufficient to ensure the existence of a perturbation E.
Proof: It suffices to note that the instantiation $E(t) = \frac{m}{c_2}\, A(t)$ satisfies (for all t) both $\max( E(t) ) \le m$ (since $\sup_t \max( A(t) ) \le c_2$) and
$$\max( E'(t) ) = \frac{m}{c_2}\, \max( A'(t) ) \le m\, \frac{\beta_2}{c_2}$$
The assumptions made so far allow us, according to lemma 2.29, to approximate the solution to the time-varying system (8.1) by a time-invariant system, as m → 0. The lemma would have given a rather detailed account of the convergence if $-m\, E(s)^{-1} A(s)$ had been known — in the usual singular perturbation setup $m\, E(s)^{-1} = I$ and A is assumed to be a known slowly varying matrix — but here the presence of the matrix-valued uncertainty E(s) implies that convergence to zero is the only kind of convergence we can hope for. Since t > s implies
$$\left\| e^{-m\, E(s)^{-1} A(s)\, ( t - s )/m} \right\|_2 \to 0, \quad\text{as } m \to 0$$
according to theorem 2.27, pointwise convergence of $\phi( t, s )$ for $t \ge s$ is established (here, we used $\phi( s, s ) = I$). However, we shall also include another proof of this fact without taking the detour via lemma 2.29.
Let P(t) be the solution to the time-invariant Lyapunov equation (2.44) with M substituted by $m\, E(t)^{-1} A(t)$ (so that $z' = -\frac{1}{m}\, M\, z$). Then
$$V( t, z ) = z^{\mathrm T} P(t)\, z$$
is a time-dependent Lyapunov function candidate. Since
$$\frac{d}{dt} V( t, z(t) ) = -\frac{1}{m}\, z(t)^{\mathrm T} \left( P(t)\, M(t) + M(t)^{\mathrm T} P(t) \right) z(t) + z(t)^{\mathrm T} P'(t)\, z(t) \le -\frac{1}{m} \left( 1 - m\, \left\| P'(t) \right\|_2 \right) | z(t) |^2$$
this can be made negative by taking m sufficiently small if there is a bound on $\| P' \|_I$. In addition to such a bound, a bound on $\| P \|_I$ will allow V( 0, z(0) ) to be bounded (in relation to z(0), of course, and for this we only need a bound on $\| P(0) \|_2$), and a bound on $\| P^{-1} \|_I$ will show that |z(t)| is bounded by a decreasing function. The upper bound on $\| P(0) \|_2$ is readily obtained by use of (8.7) in the formal solution (2.45) to the Lyapunov equation. The corresponding lower bound is established by theorem 2.31. To bound P'(t), the time-dependent Lyapunov equation may be differentiated with respect to t, which yields a new Lyapunov equation in P'(t), whose formal solution turns out to be bounded by $2\, \beta_1$ times the square of the bound for $\| P \|_I$. These results are applied in the next lemma.
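As a numerical cross-check of the construction above, P can be computed with a standard Lyapunov solver. The sketch is our own frozen-time illustration (hypothetical data); the equation $M^{\mathrm T} P + P M = I$ is passed to scipy in the form $( -M^{\mathrm T} ) X + X ( -M ) = -I$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

m = 0.01
A = np.array([[2.0, 1.0], [0.0, 3.0]])
E = m * np.eye(2)                                # stand-in for the uncertain E(t)
M = m * np.linalg.inv(E) @ A                     # M = m E^{-1} A
P = solve_continuous_lyapunov(-M.T, -np.eye(2))  # solves M^T P + P M = I

z = np.array([1.0, -1.0])
first_term = -(1.0 / m) * z @ (P @ M + M.T @ P) @ z
print(first_term, -(1.0 / m) * z @ z)            # equal, by the Lyapunov identity
```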
8.13 Lemma. Under P1–[8.2], P2–[8.3], A2–[8.9], the time-varying system (8.1) is uniformly $\left[\, \gamma_w\, e^{-\tilde\lambda_w( m )\, \bullet / m} \,\right]$-stable, with
$$\gamma_w = K_* \sqrt{ \frac{c_2}{a_*} } \qquad \tilde\lambda_w( m ) = c_2 \left( 1 - \frac{m}{2\, a_*^2 / ( \beta_1\, K_*^4 )} \right) \qquad (8.12)$$
(The choice of notation for the parameters is motivated by the context where the lemma is applied in section 8.2.)
Proof: Inserting the bounds on $\| P \|_I$ (upper and lower) and $\| P' \|_I$ (upper), the convergence can be stated via the coupled system in |z(t)| and V(t) = V( t, z(t) ):
$$| z(t) | \le \sqrt{ 2\, c_2\, V(t) } \qquad V(0) \le \frac{K_*^2}{2\, a_*}\, | z(0) |^2 \qquad V'(t) \le -\frac{1}{m} \left( 1 - \frac{m}{2\, a_*^2 / ( \beta_1\, K_*^4 )} \right) | z(t) |^2$$
Solving this system with equalities everywhere will give upper bounds as functions of t. With $\bar V(t)$ being the upper bound for V(t), one obtains
$$V(t) \le \bar V(t) = V(0)\, e^{ -\frac{2\, c_2}{m} \left( 1 - \frac{m}{2\, a_*^2 / ( \beta_1\, K_*^4 )} \right) t }$$
and it just remains to take a square root.
8.14 Corollary. Under the assumptions of lemma 8.13, bounding m by a constant less than $2\, a_*^2 / ( \beta_1\, K_*^4 )$ makes the system (8.1) uniformly $\left[\, \gamma_w\, e^{-\lambda_w\, \bullet / m} \,\right]$-stable, where $\lambda_w > 0$ is independent of m.
Proof: Follows immediately from lemma 8.13.
8.2 Time-varying systems with timescale separation
In the last section, the main tool was to use bounds for $-m\, E(t)^{-1} A(t)$, which allowed immediate application of previous results valid for scalar perturbations. In this section we shall study the decoupling transform in the presence of a matrix-valued uncertainty. For scalar perturbations in time-varying systems, a common technique is to study series expansions in the (scalar) perturbation variable (Naidu, 2002). However, the technique is demanding to generalize, since expanding a multivariable function can easily result in an overwhelming amount of bookkeeping.
For the system
$$x'(t) + A_{11}(t)\, x(t) + A_{12}(t)\, z(t) \overset{!}{=} 0 \qquad (8.13x)$$
$$E(t)\, z'(t) + A_{21}(t)\, x(t) + A_{22}(t)\, z(t) \overset{!}{=} 0 \qquad (8.13z)$$
our goal is to study the decoupling transform, that is, the change(s) of variables that isolate the fast and uncertain dynamics from the slow dynamics which we wish to approximate. In a fashion similar to Kokotović et al. (1986), the existence of these transforms will be established constructively so that their approximation properties for small m are made visible, where m now is the upper bound on E in (8.13) rather than in (8.1).
8.2.1 Overview
The first decoupling step serves to eliminate x from (8.13z). Making the change of variables (compare (6.15) in the time-invariant case)
$$z(t) = L(t)\, x(t) + \eta(t) \qquad (8.14)$$
and eliminating x'(t) from (8.13z) by row operations, one obtains
$$x' + ( A_{11} + A_{12}\, L )\, x + A_{12}\, \eta \overset{!}{=} 0 \qquad (8.15x)$$
$$E\, \eta' + N_1\, x + ( A_{22} - E\, L\, A_{12} )\, \eta \overset{!}{=} 0 \qquad (8.15\eta)$$
where $N_1$ is an expression that is to be eliminated by the choice of L. Equating $N_1$ with 0 gives
$$E\, L' \overset{!}{=} -A_{21} - A_{22}\, L + E\, L\, ( A_{11} + A_{12}\, L ) \qquad (8.16)$$
Assuming that this equation is solved by the choice of L (we will have to ensure that the solution can be approximated well for sufficiently small m, even though E is unknown), we can proceed to the second decoupling step; elimination of η from (8.15x) by the change of variables
$$x(t) = \xi(t) + m\, H(t)\, \eta(t) \qquad (8.17)$$
Making the change of variables and eliminating η'(t) from (8.15x) by row operations, one obtains
$$\xi' + ( A_{11} + A_{12}\, L )\, \xi + N_2\, \eta \overset{!}{=} 0 \qquad (8.18\xi)$$
$$E\, \eta' + ( A_{22} - E\, L\, A_{12} )\, \eta \overset{!}{=} 0 \qquad (8.18\eta)$$
where $N_2$ is to be eliminated by the choice of H. Equating $N_2$ with 0 gives
$$m\, H' \overset{!}{=} m\, H \left( E^{-1} A_{22} - L\, A_{12} \right) - A_{12} - m\, ( A_{11} + A_{12}\, L )\, H \qquad (8.19)$$
Until this point, the steps taken to derive the decoupling transform have been almost identical to those in Kokotović et al. (1986, section 5.3), but as they make a series expansion of L and H in the scalar perturbation parameter (with the coefficients being functions of time), the close similarity must end here.
To make use of the results in the previous section we need to replace P2–[8.3]. It
would be an unnecessary restriction to require that the slow dynamics of the coupled
system must not have eigenvalues at the origin, and the property we need is another
one, stated next.
P4–[8.15] Property. The constant $c_3 < \infty$ shall be chosen so that
$$\left\| A_{22}^{-1} \right\|_I \le c_3 \qquad (P4–8.20)$$
8.2.2 Eliminating slow variables from uncertain dynamics
Since we are interested in convergence as m → 0, rather than forming an expansion of L (and soon H) in E, we write L in the form
$$L(t) = m^N R_L(t) + \sum_{j=0}^{N-1} m^j\, L^j(t) \qquad (8.21)$$
(Unlike the previous chapters on ltv systems, we now write $L^j$ instead of $L_j$ to denote the different functions in the expansion. This notation allows the usual notation for the differentiated function to be used conveniently, and unlike the nominal index 2 case, there is no partitioning of L with blocks to be referred to using subscripts.)
j=0
where, in order for the expansion to be meaningful, we must ensure that RL is
bounded independently of m, and that each Lj can be approximated sufficiently well
independently of m. In this work, the main focus is to prove convergence and it
suffices to consider N = 1 — larger values of N are of interest when a more accurate
solution to the original equations is sought. In general, it will only be possible to
show that the expansion is valid for sufficiently small m, but it is of interest to find an
upper bound on m where validity is known. To find equations for RL and Lj , (8.21)
is used in (8.16), occurrences of E are rewritten as m ( m−1 E ), and equal powers in m
(considering m−1 E as one unit) are identified. For N = 1 (8.16) is written
$$m\, ( m^{-1} E ) \left( m\, R_L' + L^{0\prime} \right) \overset{!}{=} -A_{21} - A_{22} \left[ m\, R_L + L^0 \right] + m\, ( m^{-1} E ) \left[ m\, R_L + L^0 \right] \left( A_{11} + A_{12} \left[ m\, R_L + L^0 \right] \right)$$
Gathering the $m^0$ terms and collecting what remains, the following two equations are obtained
$$0 \overset{!}{=} A_{21} + A_{22}\, L^0 \qquad (8.22a)$$
$$m\, R_L' + L^{0\prime} \overset{!}{=} -m\, E^{-1} A_{22}\, R_L + \left[ m\, R_L + L^0 \right] \left( A_{11} + A_{12} \left[ m\, R_L + L^0 \right] \right)$$
$$\phantom{m\, R_L' + L^{0\prime}} = -m \left[ E^{-1} A_{22} - L^0 A_{12} \right] R_L + \underbrace{ L^0 \left( A_{11} + A_{12}\, L^0 \right) }_{f_1} + m\, \underbrace{ R_L \left( A_{11} + A_{12} ( m\, R_L + L^0 ) \right) }_{f_2( R_L )} \qquad (8.22b)$$
By this identification, $L^0(t) = -A_{22}(t)^{-1} A_{21}(t)$ is completely independent of E, and assuming $A_{21}$ and $A_{22}$ are bounded with bounded derivatives, P2–[8.3] implies that both $L^0(t)$ and $L^{0\prime}(t)$ can be bounded independently of t. It follows that there exists a $c_4$ such that
$$\left\| -L^{0\prime} + f_1 \right\|_I \le c_4 \qquad (8.23)$$
It remains to establish boundedness of $R_L$, and in the spirit of Kokotović et al. (1986) and section 7.A.1 we do so by a contraction mapping argument. We shall define an operator S to act on $R_L \in \mathcal L = \{\, R_L : \| R_L \|_I < \rho_L \,\}$ ($\rho_L$ will be chosen later) such that
• $S\, R_L = R_L$ implies that $R_L$ solves (8.22b).
• S maps $\mathcal L$ into itself if $\rho_L$ is chosen small enough.
• $\left\| S\, R_{L,1} - S\, R_{L,2} \right\|_I < c\, \left\| R_{L,1} - R_{L,2} \right\|_I$ for some c < 1.
When these conditions hold, it follows that the fixed-point equation $S\, R_L = R_L$ has a unique solution in $\mathcal L$, hence establishing boundedness of the solution to (8.22b).
We now introduce the approximation of the η subsystem (8.18η) obtained by replacing the decoupling function L by all terms in the expansion (8.21) except for the so far unknown rest term $m\, R_L$,
$$E\, w' + ( A_{22} - E\, L^0 A_{12} )\, w \overset{!}{=} 0 \qquad (8.24)$$
It is for this system that the assumption A1–[8.1] will be made in the current context. We must remark that this is not quite satisfying, since (8.24) has not yet been related to the features of the system modeled by the equations. We shall discuss this choice briefly soon hereafter, and then discuss the issue in more detail in section 8.A.
We now check P1–[8.2], P2–[8.3], and P3–[8.10] for the trailing matrix in (8.24) in place of the A of section 8.1, as follows.
• P2–[8.3] may be checked by first bounding $\| A_{22}^{-1} \|_I$ and $\| L^0 A_{12} \|_I$, and then using corollary 2.47. Less restrictive bounds on m can be obtained by considering the bound as a function of time.
• P1–[8.2] is checked directly by inspection of $A_{22}$, $L^0 A_{12} = -A_{22}^{-1} A_{21}\, A_{12}$ and any of the bounds imposed on m (for instance the one used to check P2–[8.3]).
• P3–[8.10] is checked using A3–[8.11] and inspection of $A_{22}'$ and the time derivative of $L^0 A_{12}$.
Applying corollary 8.14 to (8.24) instead of (8.1) now gives that, for sufficiently small m, (8.24) is uniformly $\left[\, \gamma_w\, e^{-\lambda_w\, \bullet / m} \,\right]$-stable for some constants $\gamma_w$ and $\lambda_w > 0$.
Let φ denote the transition matrix of (8.24), so that taking m sufficiently small gives
$$\phi( t, t ) = I$$
$$\phi( \bullet, \tau )'(t) = -\left( E(t)^{-1} A_{22}(t) - L^0(t)\, A_{12}(t) \right) \phi( t, \tau )$$
$$\phi( \tau, \bullet )'(t) = \phi( \tau, t ) \left( E(t)^{-1} A_{22}(t) - L^0(t)\, A_{12}(t) \right)$$
$$\left\| \phi( t, \tau ) \right\|_2 \le \gamma_w\, e^{ -\frac{\lambda_w}{m} ( t - \tau ) }$$
Regarding the choice of system associated with φ, Kokotović et al. (1986) differs from
the original Chang (1969), and our choice is a compromise. We prefer not to follow
Chang (1969) since that would require the properties of (8.18η), where L appears,
to be checked in the process of determining L. On the other hand, we prefer not
to follow Kokotović et al. (1986) since that would make the discrepancy larger than
necessary between equations for which A1–[8.1] is assumed and equations which can
be related to the system being modeled. A feature of this work shared by Kokotović
et al. (1986) but not by Chang (1969), is the splitting of L into a nominal part and
higher order terms, providing better insight into approximation properties and the
decoupled problem.
By differentiation with respect to t, the following choice of S is verified to be compatible with (8.22b) (compare example 2.48, (2.88))
$$( S\, R_L )(t) \triangleq \frac{1}{m} \int_0^t \phi( t, \tau ) \left( -L^{0\prime}(\tau) + f_1(\tau) + m\, f_2( R_L )(\tau) \right) d\tau \qquad (8.25)$$
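Numerically, (8.25) suggests a Picard (fixed-point) iteration for $R_L$. The following sketch is our own (hypothetical data layout, assuming numpy); trajectories are sampled on a uniform grid and the integral is approximated by a left-endpoint Riemann sum:

```python
import numpy as np

def apply_S(phi, integrand, m, dt):
    """One application of the operator S in (8.25) on a uniform time grid.
    phi[i][j] holds phi(t_i, t_j); integrand[j] is the bracket at t_j."""
    out = np.zeros_like(integrand)
    for i in range(len(integrand)):
        for j in range(i):                  # Riemann sum over tau in [0, t_i)
            out[i] += phi[i][j] @ integrand[j] * dt
    return out / m

def picard_RL(phi, L0prime, f1, f2, m, dt, n_iter=20):
    """Iterate R_L <- S R_L; contracts for m below the bounds derived below."""
    RL = np.zeros_like(L0prime)
    for _ in range(n_iter):
        RL = apply_S(phi, -L0prime + f1 + m * f2(RL), m, dt)
    return RL
```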
In addition to (8.23), $R_L \in \mathcal L$ implies that
$$\left\| f_2( R_L ) \right\|_I \le c_{\xi\xi}\, \rho_L + m\, c_{xz}\, \rho_L^2 \qquad (8.26)$$
for some $c_{\xi\xi}$, $c_{xz}$ that satisfy
$$\left\| A_{11} + A_{12}\, L^0 \right\|_I < c_{\xi\xi} \qquad \left\| A_{12} \right\|_I < c_{xz} \qquad (8.27)$$
To ensure that S maps $\mathcal L$ into itself, we note that for $R_L \in \mathcal L$,
$$\left\| S\, R_L \right\|_I \le \frac{1}{m} \int_0^t \gamma_w\, e^{ -\frac{\lambda_w}{m} ( t - \tau ) } \left[ c_4 + m\, c_{\xi\xi}\, \rho_L + m^2\, c_{xz}\, \rho_L^2 \right] d\tau \le \frac{\gamma_w\, c_4}{\lambda_w} + \frac{\gamma_w}{\lambda_w} \left( m\, c_{\xi\xi}\, \rho_L + m^2\, c_{xz}\, \rho_L^2 \right)$$
The second of these terms will be made small by imposing a bound on m, so by taking $\alpha_L > 0$ and setting
$$\rho_L = ( 1 + \alpha_L )\, \frac{\gamma_w\, c_4}{\lambda_w} \qquad (8.28)$$
we obtain $\| S\, R_L \|_I < \rho_L$ whenever $m\, c_{\xi\xi}\, \rho_L + m^2\, c_{xz}\, \rho_L^2 < \alpha_L\, c_4$, or
$$m < \frac{\lambda_w}{( 1 + \alpha_L )\, \gamma_w\, c_4} \left( -\frac{c_{\xi\xi}}{2\, c_{xz}} + \sqrt{ \left( \frac{c_{\xi\xi}}{2\, c_{xz}} \right)^{\!2} + \frac{\alpha_L\, c_4}{c_{xz}} } \,\right) = \alpha_L\, \frac{\lambda_w}{c_{\xi\xi}\, \gamma_w} + O( \alpha_L^2 ) \qquad (8.29)$$
To establish the contraction on $\mathcal L$, note that
$$S\, R_{L,2}(t) - S\, R_{L,1}(t) = \int_0^t \phi( t, \tau ) \left( f_2( R_{L,2} )(\tau) - f_2( R_{L,1} )(\tau) \right) d\tau$$
Hence, a sufficient condition for S to be a contraction on $\mathcal L$ is available in (using (7.37) to express the difference between the terms quadratic in $R_L$)
$$\left\| S\, R_{L,2} - S\, R_{L,1} \right\|_I \le \frac{m\, \gamma_w}{\lambda_w} \left( c_{\xi\xi} + 2\, c_{xz}\, \rho_L \right) \left\| R_{L,2} - R_{L,1} \right\|_I = m \left( \frac{c_{\xi\xi}\, \gamma_w}{\lambda_w} + 2\, c_{xz}\, c_4\, ( 1 + \alpha_L ) \right) \left\| R_{L,2} - R_{L,1} \right\|_I$$
As $\alpha_L \to 0$, the condition for contraction tends to a constant positive bound on m (we use 0.99 < 1 just to ensure strict contraction),
$$m < \frac{0.99}{ \frac{c_{\xi\xi}\, \gamma_w}{\lambda_w} + 2\, c_{xz}\, c_4\, ( 1 + \alpha_L ) } = \frac{0.99}{ \frac{c_{\xi\xi}\, \gamma_w}{\lambda_w} + 2\, c_{xz}\, c_4 } + O( \alpha_L ) \qquad (8.30)$$
Hence, for small $\alpha_L$, (8.30) will be implied by (8.29), but while the bound on m in (8.29) is initially increasing with $\alpha_L$, (8.30) is decreasing for all values of $\alpha_L$, so one generally has to consider both bounds.
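The trade-off over $\alpha_L$ can be made explicit with a small helper; the sketch below is ours (arbitrary constants for illustration, with cxx standing for $c_{\xi\xi}$):

```python
import numpy as np

def m_bound(alpha_L, c4, cxx, cxz, gamma_w, lambda_w):
    """Combined admissible m: the minimum of the self-map bound (8.29)
    and the contraction bound (8.30)."""
    half = cxx / (2.0 * cxz)
    b_829 = (lambda_w / ((1.0 + alpha_L) * gamma_w * c4)) * \
            (-half + np.sqrt(half**2 + alpha_L * c4 / cxz))
    b_830 = 0.99 / (cxx * gamma_w / lambda_w + 2.0 * cxz * c4 * (1.0 + alpha_L))
    return min(b_829, b_830)

# Scan alpha_L for the least restrictive combined bound.
alphas = np.linspace(1e-3, 2.0, 400)
best = max(m_bound(a, c4=1.0, cxx=1.0, cxz=1.0, gamma_w=2.0, lambda_w=1.0)
           for a in alphas)
```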
This concludes the contraction mapping argument for $R_L$ in the expansion $L = L^0 + m\, R_L$. That is, by taking m less than all of the finitely many positive bounds imposed on it, we obtain $\| R_L \|_I \le \rho_L$. The corresponding decoupling transformation will be applied in section 8.4.
8.2.3 Eliminating uncertain variables from slow dynamics
We now turn to the second change of variables given by (8.17), where H satisfies (8.19), which can be partitioned either as
$$m\, H' \overset{!}{=} m\, H \left( E^{-1} A_{22} - L\, A_{12} \right) - A_{12} - m\, \left[ A_{11} + A_{12}\, L \right] H$$
or
$$m\, H' \overset{!}{=} m\, H \left( E^{-1} A_{22} - L^0 A_{12} \right) - A_{12} - m\, \underbrace{ \left( m\, H\, R_L\, A_{12} + \left[ A_{11} + A_{12}\, L \right] H \right) }_{g_2( H )}$$
depending on whether one prefers to reuse the transition matrix φ from the previous section, or if one rather makes assumptions regarding the (8.18η) system which has been isolated as a subsystem of the system being modeled. Using the first partitioning, the expression for $g_2$ becomes easier to work with, and to check P3–[8.10] one can use A3–[8.11] and corollary 8.8 in (8.16). On the other hand, assumptions have already been made corresponding to the latter partitioning, and this is the choice made here.
The following approximate expression for the solution can be derived by making an expansion of H in powers of m,
$$H = A_{12}\, A_{22}^{-1}\, m^{-1} E + O( m )$$
Since this expression is dominated by a term which can only be bounded, but not further approximated, there is little use for an expansion of H in powers of m. Instead, we aim directly for a bound on $\| H \|_I$, and introduce an operator for this purpose. Let the following operator T be defined for $H \in \mathcal H = \{\, H : \| H \|_I < \rho_H \,\}$ ($\rho_H$ will be chosen soon). (Compare example 2.48, (2.89).)
$$( T\, H )(t) \triangleq \frac{1}{m} \int_t^{t_f} \left[ A_{12}(\tau) + m\, g_2( H )(\tau) \right] \phi( \tau, t )\, d\tau \qquad (8.31)$$
For m sufficiently small to make the previous approximation of L valid, it holds that
$$\frac{ \left\| g_2( H ) \right\|_I }{ \left\| H \right\|_I } \le m\, \rho_L\, \left\| A_{12} \right\|_I + \left\| A_{11} + A_{12}\, L^0 \right\|_I + m\, \rho_L\, \left\| A_{12} \right\|_I \le c_{\xi\xi} + 2\, m\, \rho_L\, c_{xz} \qquad (8.32)$$
and we obtain
$$\left\| T\, H \right\|_I \le \left[ \left\| A_{12} \right\|_I + m \left( c_{\xi\xi} + 2\, m\, \rho_L\, c_{xz} \right) \rho_H \right] \frac{1}{m} \left\| \int_t^{t_f} \phi( \tau, t )\, d\tau \right\|_2 \le \frac{\gamma_w}{\lambda_w} \left( c_{xz} + m\, c_{\xi\xi}\, \rho_H + 2\, m^2\, \rho_L\, c_{xz}\, \rho_H \right)$$
We now pick $\alpha_H > 0$ and set
$$\rho_H = ( 1 + \alpha_H )\, \frac{\gamma_w\, c_{xz}}{\lambda_w}$$
(This bound should be compared with the approximate expression for H, suggesting that the bound could actually be as small as $c_{xz}\, c_3\, n_z$.) This means that T maps $\mathcal H$ into itself whenever $m\, c_{\xi\xi}\, \rho_H + 2\, m^2\, \rho_L\, c_{xz}\, \rho_H \le \alpha_H\, c_{xz}$.
That is, we require
$$m \le \frac{1}{\rho_H} \left( -\frac{c_{\xi\xi}\, \rho_H}{4\, c_{xz}\, \rho_L} + \sqrt{ \left( \frac{c_{\xi\xi}\, \rho_H}{4\, c_{xz}\, \rho_L} \right)^{\!2} + \alpha_H\, \frac{\rho_H}{2\, \rho_L} } \,\right) = \alpha_H\, \frac{c_{xz}}{c_{\xi\xi}\, \rho_H} + O( \alpha_H^2 ) \qquad (8.33)$$
Since $g_2( H )$ is linear in H, $g_2( H_2 ) - g_2( H_1 ) = g_2( H_2 - H_1 )$, and hence
$$\left\| g_2( H_2 ) - g_2( H_1 ) \right\|_I \le \left( c_{\xi\xi} + 2\, m\, \rho_L\, c_{xz} \right) \left\| H_2 - H_1 \right\|_I$$
In
$$\left\| T\, H_2 - T\, H_1 \right\|_I \le \frac{m\, \gamma_w}{\lambda_w} \left( c_{\xi\xi} + 2\, m\, \rho_L\, c_{xz} \right) \left\| H_2 - H_1 \right\|_I$$
it is seen that for m < 1, the requirement for contraction here is weaker than that for S.
For completeness, $\rho_L$ is expressed using $\alpha_L$, yielding the following expression for the bound on m (again 0.99 < 1 is an arbitrary choice just to ensure strict contraction)
$$m \le \frac{0.99}{1 + \alpha_L}\, \frac{\lambda_w}{\gamma_w\, c_{\xi\xi}} \left( -( 1 + \alpha_L )\, \frac{c_{\xi\xi}^2}{4\, c_4\, c_{xz}} + \sqrt{ \left( ( 1 + \alpha_L )\, \frac{c_{\xi\xi}^2}{4\, c_4\, c_{xz}} \right)^{\!2} + ( 1 + \alpha_L )\, \frac{c_{\xi\xi}^2}{2\, c_4\, c_{xz}} } \,\right)$$
$$\phantom{m} = \frac{0.99\, \lambda_w}{\gamma_w\, c_{\xi\xi}} \underbrace{ \left( -\frac{c_{\xi\xi}^2}{4\, c_4\, c_{xz}} + \sqrt{ \left( \frac{c_{\xi\xi}^2}{4\, c_4\, c_{xz}} \right)^{\!2} + \frac{c_{\xi\xi}^2}{2\, c_4\, c_{xz}} } \,\right) }_{\le\, 1} + O( \alpha_L ) \qquad (8.34)$$
This concludes the contraction mapping argument for H. That is, by taking m less than all of the finitely many positive bounds imposed on it, we obtain $\| H \|_I \le \rho_H$. The corresponding decoupling transformation will be applied in section 8.4.
8.3 Comparison with scalar perturbation
Since the preceding section is similar to the treatment for a scalar perturbation found in Kokotović et al. (1986), we would like to highlight some of the differences.
• The matrix-valued uncertainty E(t) does not commute with other matrices in the way a scalar would.
• In the change of variables (8.17), the bound m on the perturbation is used rather than the perturbation E(t) itself.
• In the contraction operators, the transition matrix φ is associated with an approximation of what will turn out to be the η subsystem, rather than with the ( E, A22 ) system. See section 8.A for a discussion.
• In the equation for H, there is now a term $m\, H\, E^{-1} A_{22}$ where there used to be just $H\, A_{22}$.
• The "nominal" (or "reduced") solution for H, that is $H^0$, is no longer the known entity $A_{12}\, A_{22}^{-1}$, but instead $A_{12}\, A_{22}^{-1}\, m^{-1} E$. However, since all that can be said about the solution η is that it will vanish with m, not knowing $H^0$ is no limitation.
8.4 The decoupled system
Starting from the ltv dae
$$\bar E(t)\, \bar x' + \bar A(t)\, \bar x(t) \overset{!}{=} 0 \qquad \bar x(0) = \bar x^0 \qquad (8.35)$$
time-varying row and column reductions are applied in the time-varying analog of the transforms for lti dae in section 6.1. The resulting system is (8.13), for which section 8.2 shows the existence and approximation properties of the two decoupling
matrices L and H. The two changes of variables given by L and H can be written compactly as
$$\begin{pmatrix} \xi \\ \eta \end{pmatrix} = \begin{pmatrix} I & -m\, H \\ 0 & I \end{pmatrix} \begin{pmatrix} x \\ \eta \end{pmatrix} = \begin{pmatrix} I & -m\, H \\ 0 & I \end{pmatrix} \begin{pmatrix} I & 0 \\ -L & I \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} = \underbrace{ \begin{pmatrix} I + m\, H\, L & -m\, H \\ -L & I \end{pmatrix} }_{T^{-1}} \begin{pmatrix} x \\ z \end{pmatrix} \qquad (8.36)$$
with inverse given by
$$T = \begin{pmatrix} I & m\, H \\ L & m\, L\, H + I \end{pmatrix} \qquad (8.37)$$
From the two factors making up $T^{-1}$ it is readily seen that $\det T^{-1} = 1$, and with L and H bounded, any bound on m gives that $T^{-1}$ is bounded, and hence that $T^{-1}$ defines a Lyapunov transformation (recall definition 2.30).
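The block structure of (8.36)–(8.37) is easy to verify numerically; the sketch below is our own (arbitrary block sizes):

```python
import numpy as np

def decoupling_transform(L, H, m):
    """Assemble T and T^{-1} of (8.36)-(8.37); L is nz-by-nx, H is nx-by-nz."""
    nx, nz = H.shape
    Tinv = np.block([[np.eye(nx) + m * H @ L, -m * H],
                     [-L,                      np.eye(nz)]])
    T = np.block([[np.eye(nx), m * H],
                  [L,          m * L @ H + np.eye(nz)]])
    return T, Tinv

T, Tinv = decoupling_transform(L=np.array([[0.5, -0.2]]),
                               H=np.array([[0.1], [0.3]]), m=0.01)
assert np.isclose(np.linalg.det(Tinv), 1.0)   # product of two unit-determinant factors
assert np.allclose(T @ Tinv, np.eye(3))
```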
Applying the transformation yields the system (8.18), which we repeat here:
$$\xi' = -( A_{11} + A_{12}\, L )\, \xi = -\left( A_{11} - A_{12}\, A_{22}^{-1} A_{21} + m\, A_{12}\, R_L \right) \xi \qquad (8.38\xi)$$
$$E\, \eta' + ( A_{22} - E\, L\, A_{12} )\, \eta \overset{!}{=} 0 \qquad (8.38\eta)$$
This system has two isolated parts, where (8.38ξ) is a regularly perturbed problem which may be addressed with any available method (see section 2.4). To be able to establish a rate of convergence in this section, we demand one more condition on the system (which will have to be verified in applications). Let x̊ denote the nominal solution for ξ (and for x, as it turns out), that is, the solution to
$$\mathring x' = -\left( A_{11} - A_{12}\, A_{22}^{-1} A_{21} \right) \mathring x \qquad (8.39)$$
P5–[8.16] Property. The system (8.39) is uniformly exponentially stable.
By theorem 2.42, P5–[8.16] lets us conclude that (8.38ξ) is uniformly $\left[\, \gamma_\xi\, e^{-\lambda_\xi\, \bullet} \,\right]$-stable for some constants $\gamma_\xi, \lambda_\xi > 0$, independently of $R_L$. Then, Rugh (1996, theorem 12.2) provides that (8.38ξ) is uniformly bounded-input, bounded-state stable, which means that the reformulated system (2.66) can be used to conclude that $\sup_t | \xi(t) - \mathring x(t) | = O( m )$.
The system (8.38η) is the kind of system considered in section 8.1, and since it is a true subsystem of the real system, our assumptions apply and lemma 8.13 provides parameters of uniform exponential stability for this system.
Since $T^{-1}$ in (8.36) is bounded, it follows that bounded initial conditions for x(0) and z(0) imply bounded initial conditions for η(0). Multiplying by the exponential convergence parameter $\gamma_w$, one obtains a bound on η(t), valid for all $t \ge 0$. Looking at (8.17), it is seen that the boundedness of H then implies that x converges to ξ uniformly as m → 0.
In order to also obtain convergence of z to $\bar z = -A_{22}^{-1} A_{21}\, \mathring x$, it is necessary to show that η is not only bounded, but converges uniformly to 0 as m → 0 (compare with (8.14) and recall that $L = -A_{22}^{-1} A_{21} + O( m )$). This can only follow if the initial conditions for η(0) converge to 0 with m (and then the uniform convergence follows). The convergence was the subject of lemma 6.4 as well as lemma 7.4. By identifying (8.13) with (7.8), lemma 7.4 applies in the index 1 case as well, and the time-variability here does not matter for initial conditions. Hence, there is no need to derive the convergence again, and we just remind that the choice $z^0 = -A_{22}(0)^{-1} A_{21}(0)\, x^0$ is the only fixed choice for $z^0$ that can be used for arbitrarily small m.
The section is concluded with a theorem summarizing the convergence result for ltv
dae of nominal index 1.
8.17 Theorem. Consider the nominal index 1 ltv dae (8.35), repeated here,
$$\bar E(t)\, \bar x' + \bar A(t)\, \bar x(t) \overset{!}{=} 0 \qquad \bar x(0) = \bar x^0 \qquad (8.35)$$
and the corresponding partitioned equations (8.13), repeated here,
$$x'(t) + A_{11}(t)\, x(t) + A_{12}(t)\, z(t) \overset{!}{=} 0 \qquad x(0) = x^0$$
$$E(t)\, z'(t) + A_{21}(t)\, x(t) + A_{22}(t)\, z(t) \overset{!}{=} 0 \qquad z(0) = z^0$$
over the time interval $I = [\, 0, t_f \,)$. Let the nominal equation refer to the same equation, but with E(t) replaced by 0.
Let $x^0$, $z^0$ satisfy the nominal equation, and let x̊ denote the solution to (8.39) (that is, the nominal differential equation for x). Let $\max( E(t) ) \le m$ for all t, and make the pointwise-in-time assumption A1–[8.1] regarding the eigenvalues of the approximation (8.24) of the fast and uncertain subsystem, as well as either of the assumptions A2–[8.9] or A3–[8.11] regarding the time variability of E(t). Assume that the properties P1–[8.2], P3–[8.10]–P5–[8.16] are also satisfied.
Then there exist constants k and $m_0 > 0$ such that $m \le m_0$ implies
$$\sup_{t \in I} | x(t) - \mathring x(t) | \le k\, m \qquad \sup_{t \in I} \left| z(t) + A_{22}(t)^{-1} A_{21}(t)\, \mathring x(t) \right| \le k\, m$$
and the solution to (8.35) converges at the same rate.
Proof: This is a summary of results obtained in the present chapter.
Since the convergence of the full system depends on the convergence in the subsystem (8.38ξ), the requirement of P5–[8.16] can be replaced by other conditions which enable convergence in (8.38ξ) to be established, and if the established convergence is not O( m ) uniformly in time, this convergence rate will replace the rates in theorem 8.17. On the other hand, in a particular problem where the uncertainties are given, the rate of convergence is not of importance, since it is only the fixed bound on $\sup_t | \xi(t) - \mathring x(t) |$ that matters.
8.5 Conclusions
The solutions of the two-timescale system (8.13) with a matrix-valued singular perturbation have been shown to converge as the bound on the perturbation tends to zero. Aside from properties that can be verified in applications, the eigenvalue assumption used for lti systems in chapter 6 is assumed to hold pointwise in time here, and is formulated with respect to an approximation of the fast and uncertain subsystem rather than the true system. Further, compared to chapter 6, the time-variability of the system has led to the use of an additional assumption bounding the time derivative of the perturbation.
Regarding directions for future research, there are some results in chapter 7 which might be possible to generalize to ltv systems, and it remains to relax the assumption that E(t) be pointwise non-singular, so that the theory covers not only nominal index 1, but also true index 1 systems. (In doing so, the svd decomposition of Steinbrecher (2006, theorem 2.4.1) is expected to be a key tool.) However, while general convergence results for nominal index 2 lti systems still remain to be derived, extending the results in this chapter to higher nominal indices should wait. In the meantime, there is also plenty of work to be done on the numeric implementation.
Appendix
8.A Dynamics of related systems
This section contains some remarks concerning the choice of system to which the
transition matrix φ in section 8.2 belongs. Recall that we would ideally formulate
the eigenvalue assumptions for the matrix pair of (8.38η), but we were led to use
the matrix pair of (8.24) instead since (8.38η) was not available at the time when the
assumptions were needed. In retrospect, we would like to indicate how assumptions
about (8.38η) can justify the use of (8.24).
In the original reference on this method, Chang (1969), the transition matrix φ is associated with the fast (and uncertain) subsystem (8.38η), while Kokotović et al. (1986) associates it with the system
$$E\, w_0' + A_{22}\, w_0 \overset{!}{=} 0 \qquad (8.40)$$
(but with E = I, of course). Our choice (8.24), repeated here,
$$E\, w' + ( A_{22} - E\, L^0 A_{12} )\, w \overset{!}{=} 0 \qquad (8.24)$$
is a third option, and it was explained in section 8.2.2 why this was preferred.
With the (8.38η) subsystem, repeated here,
$$E\, \eta' + ( A_{22} - E\, L\, A_{12} )\, \eta \overset{!}{=} 0 \qquad (8.38\eta)$$
isolated by the choice of L given in section 8.2.2, we now consider associating φ with this system instead, just like in Chang (1969), so that the eigenvalue assumptions really concern instantaneous poles of the slowly varying fast subsystem.
First of all, once a crude estimate of L is available, the (8.38η) subsystem becomes
isolated, and it would be possible to start over from the beginning of section 8.2.2
with φ associated with (8.38η) instead of (8.24). Doing so would not justify the assumptions about (8.24), but there may be other ways of obtaining the initial crude
estimate.
We require that, for m small enough, the decoupling matrix satisfies a constant bound,
$$\| L \|_I \le \hat l \qquad (8.41)$$
(Note that we do not assume convergence as m → 0.) Then there is also a number $\rho \le \hat l + \| L^0 \|_I$ such that
$$\| m\, R_L \|_I \le \rho \qquad (8.42)$$
(It is seen that (8.42) also implies (8.41), so either one may be used as a starting point.)
As on page 235, we must check P1–[8.2], P2–[8.3], and P3–[8.10] for the trailing matrix in (8.38η) in place of the A of section 8.1. P1–[8.2] and P2–[8.3] are readily checked using (8.41). To check P3–[8.10], the product E L is treated as one unit in (8.38η), and in
$$\frac{d( E(t)\, L(t) )}{dt} = E'(t)\, L(t) + E(t)\, L'(t)$$
it is seen that the time derivative is bounded by inserting (8.16) for $E\, L'$ and using A3–[8.11] to bound $E'\, L$.
Making assumption A1–[8.1] for (8.38η), corollary 8.14 provides that the system (8.38η) is uniformly $\left[\, \gamma_\eta\, e^{-\lambda_\eta\, \bullet / m} \,\right]$-stable for some constants $\gamma_\eta, \lambda_\eta > 0$. The following lemma then shows that we can obtain the same qualitative exponential convergence for the approximation (8.24).
8.18 Lemma. The system (8.24) is uniformly $\left[\, \gamma_w\, e^{-\lambda_w\, \bullet / m} \,\right]$-stable for some constants $\gamma_w, \lambda_w > 0$.
Proof: Rewrite (8.24) as a perturbed system,
$$w' = -\left( E^{-1} A_{22} - L^0 A_{12} \right) w \overset{!}{=} -\left( E^{-1} A_{22} - L\, A_{12} + m\, R_L\, A_{12} \right) w$$
and time-scale by means of $\bar w(t) \triangleq w( m\, t )$. This yields
$$\bar w' = -\left( m \left( E^{-1} A_{22} - L\, A_{12} \right) + m\, ( m\, R_L )\, A_{12} \right) \bar w$$
Time-scaling the corresponding "nominal" system
$$\bar u' = -m \left( E^{-1} A_{22} - L\, A_{12} \right) \bar u$$
by means of $\bar u(t) \triangleq u( m\, t )$ yields
$$u' = -\left( E^{-1} A_{22} - L\, A_{12} \right) u$$
which is recognized as the η subsystem (8.38η), with known uniform exponential convergence parameters.
Due to P4–[8.15], (8.41), and corollary 2.47 it can be seen that
$$\left( E^{-1} A_{22} - L\, A_{12} \right)^{-1} = ( A_{22} - E\, L\, A_{12} )^{-1} E$$
can be bounded by some constant $\hat\alpha_u$ times m, for m sufficiently small. This, together with A1–[8.1], implies that $\left\| m \left( E^{-1} A_{22} - L\, A_{12} \right) \right\|_I \le \alpha_{\bar u}$ for some constant $\alpha_{\bar u}$ according to theorem 6.11 (compare (6.28)).
Since the η subsystem is uniformly $\left[\, \gamma_\eta\, e^{-\lambda_\eta\, \bullet / m} \,\right]$-stable, the ū system is uniformly $\left[\, \gamma_\eta\, e^{-\lambda_\eta\, \bullet} \,\right]$-stable. Since the state feedback matrix of the same system has a norm bound of $\alpha_{\bar u}$ and we have $\| m\, R_L \|_I \le \rho$, theorem 2.42 provides a bound on m which makes the w̄ system uniformly $\left[\, \gamma_w\, e^{-\lambda_w\, \bullet} \,\right]$-stable for some positive constants $\gamma_w, \lambda_w$. It then follows that the system (8.24) is uniformly $\left[\, \gamma_w\, e^{-\lambda_w\, \bullet / m} \,\right]$-stable.
The lemma shows that, making assumptions about the eigenvalues of (8.38η) and using the bound (8.41) (which may either be derived or postulated, and does not require L to converge as m → 0), the necessary convergence property of the transition matrix φ used in section 8.2.2 follows.
We end the section with a corollary which provides a possible substitute for lemma 8.6, applicable in the context of the coupled system instead of the slowly varying system in section 8.1.
8.19 Corollary. Taking m sufficiently small will provide a bound on $\left\| m\, E^{-1} A_{22} \right\|_I$.
Proof: Using the $\alpha_{\bar u}$ from the proof of lemma 8.18 we obtain
$$\left\| m\, E^{-1} A_{22} \right\|_I = \left\| m \left( E^{-1} A_{22} - L\, A_{12} \right) + m\, L\, A_{12} \right\|_I \le \alpha_{\bar u} + m\, \hat l\, \left\| A_{12} \right\|_I$$
9 Concluding remarks
Looking back on the previous chapters of the thesis, we let the self-contained chapter 4 on filtering and chapter 5 on the new index speak for themselves. Here, we wrap up our findings concerning matrix-valued singular perturbation problems related to uncertain dae, because this is where the emphasis of the thesis has been.
The matrix-valued singular perturbation problems were introduced in section 1.2, and chapter 3 detailed an application of future nonlinear results. The results in the thesis have been limited to autonomous linear dae. Using assumptions regarding the system poles, convergence of solutions has been established in the nominal index 1 case, for both lti and ltv dae. For lti dae of nominal index 2 we have not (except for a very small example) been able to establish convergence of solutions, but several results that are expected to be useful in the future have been derived. These results include a Weierstrass-like canonical form for uncertain matrix pairs of nominal index 2. Most results assume that the pointwise index of the uncertain dae is 0, but for lti dae of nominal index 1 the results were partly extended to pointwise index 1.
Some directions for future research have been mentioned in earlier chapters, but the
following short list contains some which we think are particularly interesting.
• The canonical form for lti dae of nominal index 2 should be extended to higher
indices, and a reliable numeric implementation should be developed.
• The results for ltv systems of nominal index 1 should be extended from pointwise index 0 to pointwise index 1, as was done in the lti case.
• Other function measures or stochastic formulations of the problems may both
be relevant in applications and result in better error estimates. To consider
alternative function measures appears to be a good option also for systems with
inputs.
A Sampling perturbations
Being unable to compute tight bounds in the analysis of matrix-valued singularly perturbed systems is closely related to the inability to construct worst-case perturbations. To illustrate our results, we are left with the option to sample randomly from the set of perturbations that agree with our assumptions, and to observe how the corresponding solution set changes as a function of the parameters in our assumptions. In this chapter, we detail how the random samples were generated, so that our examples can be reproduced and readers can try their own examples.
Since our aim in the examples is to illustrate convergence of the solutions as the bound on the size of the perturbation tends to zero, it is desirable that the perturbations are such that there is not much slack in this constraint.
A.1 Time-invariant perturbations
Our sampling strategy for time-invariant perturbations is trivial in the index 0 case; details follow. Given m > 0 and the parameters $\bar a$, $R_0$, and $\phi_0$ in
$$\max( E ) \le m$$
$$\forall \lambda :\; |\lambda|\, m < \bar a \quad\text{and}\quad |\lambda| > R_0 \implies |\arg( -\lambda )| < \phi_0$$
we sample each entry of the matrix E from a uniform distribution over the interval [ −m, m ], and then the whole matrix is scaled to satisfy $\max( E ) \overset{!}{=} m$. We then compute the eigenvalues (possibly taking also the slow dynamics into account), and reject any samples that do not satisfy the eigenvalue constraints.
Clearly, one must not select $\bar a$ too small, or an infinite loop of rejections will occur.
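A minimal sketch of this rejection sampler, in our own notation (the thesis does not list code), could read:

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)

def sample_E(A, m, a_bar, R0, phi0, max_tries=10000):
    """Sample a time-invariant E: uniform entries, rescale, reject on eigenvalues."""
    n = A.shape[0]
    for _ in range(max_tries):
        E = rng.uniform(-m, m, size=(n, n))
        E *= m / np.max(np.abs(E))             # scale so that max(E) = m
        lam = eig(-A, E, right=False)          # eigenvalues of the pair (E, A)
        if np.any(np.abs(lam) * m >= a_bar):
            continue
        big = np.abs(lam) > R0
        if np.all(np.abs(np.angle(-lam[big])) < phi0):
            return E
    raise RuntimeError("too many rejections; a_bar may be too small")
```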
[Plot omitted: four entry trajectories over $t \in [\, 0, 3.5 \,]$, vertical axis from −0.01 to 0.01.]
Figure A.1: The trajectories of the entries of the perturbation E produced in example A.1.
A.2 Time-varying perturbations
In the time-varying case, it is not only desirable to have little slack in the constraint max(E) ≤ m, but also that there is little slack in the time variability constraint. This makes sampling time-varying perturbations considerably harder than in the time-invariant case.
The word sample has two meanings in the current section, and to remove some of the confusion we shall refer to the time-varying samples of E as realizations. (In the time-invariant setting, it was natural to think of E as a sample from a random variable. Extending this use of sample to the time-varying setting, we use it to refer to E as a realization of some stochastic process, and to produce such realizations is what this chapter is all about. The other meaning of sample is to evaluate functions of time at certain sampling instants in time.)
To obtain a computable test, the continuous-time eigenvalue constraint is relaxed by only requiring it to hold at a limited number of sampling instants. If a realization would produce unexpected results in examples, it is important to remember this relaxation and to check the conditions more carefully before drawing any conclusions.
Our algorithm, presented in algorithm A.1 (page 251) and algorithm A.2, constructs trajectories for E (that is, realizations) by a sequence of steps. It works with time samples of the realization, and uses entry-wise linear interpolation between sampling instants. The algorithm is initialized with the trivial trajectory given by $E(t) = \frac{m}{\max( M_{22}(t) )}\, M_{22}(t)$ (which is assumed to fulfill the time-variability constraint), and maintains feasibility of the trajectory during the process of repeated perturbation of individual samples along the trajectory.
To simplify notation, we give the algorithm for the case when there is no slow dynamics, but the extension to also include the slow dynamics is straightforward. Following the algorithms, the chapter ends with an example.
Algorithm A.1 Sampling perturbation trajectories for ltv systems.
Input:
• An interval I of time for which the realization is to be computed.
• The trailing matrix (as a matrix-valued function of time) $M_{22}$.
• The bound m and a bound on the time-variability. The following two types of time-variability constraints will be considered below:
$$\forall t :\; \left\| \frac{d}{dt}\!\left( m\, E(t)^{-1} A(t) \right) \right\|_2 \le \beta_1 \qquad (A.1a)$$
$$\forall t :\; \max( E'(t) ) \le m\, \beta_3 \qquad (A.1b)$$
Output: A continuous, piecewise linear trajectory E, which satisfies max(E) ≤ m and the time-variability constraint (at all times for (A.1b), or at a finite set of sampling instants for (A.1a)), and satisfies the sampling-relaxation of the eigenvalue constraint.
Initialization:
Select an initial number of sampling instants (not counting the initial time), and distribute the sampling instants evenly over I.
Initialize the trajectory as a linear interpolation of the function $t \mapsto \frac{m}{\max( M_{22}(t) )}\, M_{22}(t)$, sampled at the sampling instants.
Main loop:
repeat 2 or 3 times
Increase the number of sampling instants by a positive integer factor.
Distribute the sampling instants evenly over I (denote the sampling interval $t_\Delta$), and sample the current trajectory accordingly. This results in a sequence of matrices $\{ ( t_i, E_i ) \}_i$.
Perturb the sequence by performing a fixed number of minor iterations, see algorithm A.2.
Reconstruct the continuous trajectory by linear interpolation.
end
Remarks: By keeping the initial number of sampling instants low, large but slow variations are obtained at a moderate computational cost. However, the time-variability is constrained by the sampling interval, which is why the number of sampling instants is increased in each iteration. It should be validated that the final number of sampling instants is sufficiently large to allow the time variability constraint to be activated.
By multiplying the number of sampling instants by an integer in each major iteration, and distributing the sampling instants evenly over I, it is ensured that the up-sampled trajectory is identical to the trajectory at the end of the previous major iteration. This is important for maintaining feasibility of the trajectory.
Algorithm A.2 Details of algorithm A.1 — minor iterations.
Input/initialization: This algorithm is just a step in algorithm A.1.
Minor iteration: The number of minor iterations during one major iteration is typically of the same order of magnitude as the number of sampling instants, but may also be bounded by computational time considerations. In each minor iteration, the following steps are taken to perturb the sequence of matrix samples.
Select which matrix sample to perturb at random (denoting the corresponding index i) and note the neighboring matrices (at the end points, there is just one neighbor).
if using (A.1b)
Find the intervals of radius $\beta_3\, t_\Delta$ centered at each entry in the neighboring matrices, intersect the intervals entry-wise, and intersect also with [ −m, m ].
Using $M_{22}( t_i )$, draw a random sample from the derived intervals in the same manner as for time-invariant systems. That is, draw random samples uniformly from each interval until the eigenvalue conditions are satisfied.
optional: Scale the obtained matrix so that it has at least one entry at the boundary of the corresponding interval.
else (That is, in case of (A.1a).)
Using $M_{22}( t_i )$, draw a random sample matrix in exactly the same manner as for time-invariant systems. Denote the resulting matrix $E^*$.
Starting with 1, try successively smaller values of the scalar parameter a in $E_i + a\, ( E^* - E_i )$ until the linear interpolation between the neighbors and the new matrix satisfies the time variability constraint at a set of intermediate points. (Note that maintaining feasibility of the trajectory implies that a = 0 is a feasible — though pointless — choice.)
end
For each neighbor, indexed by j, select a small number of time instants evenly distributed between $t_i$ and $t_j$ (excluding those of $t_i$ and $t_j$ at which the eigenvalue conditions have already been checked), and use linear interpolation between the neighbor and the new matrix to check the eigenvalue condition at the intermediate time instants.
if any eigenvalue condition fails
Discard the new matrix (and do not update the sequence of matrix samples).
else
Replace $E_i$ by the new matrix.
end
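A sketch of one minor iteration for the (A.1b) constraint, in our own notation (eig_ok is assumed to implement the time-invariant eigenvalue test):

```python
import numpy as np

rng = np.random.default_rng(0)

def minor_iteration_A1b(ts, Es, m, beta3, eig_ok):
    """Perturb one sample of the trajectory; Es is a list of matrices at times ts."""
    i = int(rng.integers(len(ts)))
    dt = ts[1] - ts[0]
    lo = np.full_like(Es[i], -m)               # start from [-m, m] entry-wise
    hi = np.full_like(Es[i], m)
    for j in (i - 1, i + 1):                   # intersect with neighbor intervals,
        if 0 <= j < len(ts):                   # radius m*beta3*dt from (A.1b)
            lo = np.maximum(lo, Es[j] - m * beta3 * dt)
            hi = np.minimum(hi, Es[j] + m * beta3 * dt)
    if np.any(lo > hi):
        return False                           # empty intersection; keep old sample
    for _ in range(100):                       # rejection sampling as in A.1
        cand = rng.uniform(lo, hi)
        if eig_ok(cand, ts[i]):
            Es[i] = cand
            return True
    return False
```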
A.1 Example
As an illustration of the sampling algorithm for time-varying perturbations, we consider the trailing matrix given by
$$A( t ) \triangleq \begin{pmatrix} 0.94 & 0.94 - 0.05 \log( t + 1 ) \\ -0.46 & 0.23 \end{pmatrix}$$
over the time interval I = [ 0, 3.5 ].
The constraints on the time-varying perturbation are given by the following parameters:
$$m = 0.01 \qquad \beta_1 = 10.0 \qquad \bar a = 10.0 \qquad \phi_0 = 1.4 \qquad R_0 = 5.0$$
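A hypothetical driver for reproducing this example, assuming an implementation sample_trajectory of algorithms A.1 and A.2 (the function name and signature are ours, not from the thesis):

```python
import numpy as np

def A(t):
    return np.array([[0.94, 0.94 - 0.05 * np.log(t + 1.0)],
                     [-0.46, 0.23]])

# sample_trajectory: hypothetical wrapper around algorithms A.1 and A.2
E_traj = sample_trajectory(
    M22=A, interval=(0.0, 3.5),
    m=0.01, beta1=10.0, a_bar=10.0, phi0=1.4, R0=5.0,
    schedule=[(0.3, 30), (0.03, 300)],  # (sampling interval, minor iterations)
)
```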
The initial trajectory was first sampled with a sampling interval of 0.3, and 30 minor iterations were performed. Then the trajectory was sampled with a sampling interval of 0.03, and 300 minor iterations were performed. The four entries of the final E are shown in figure A.1. The eigenvalue assumption is verified in figure A.2, and the time variability assumption is verified in figure A.3. The figures show that while the constraints given by m and $\beta_1$ are satisfied with little or even negative slack, the other constraints have large slacks. (The algorithm may violate constraints since it only checks validity at a finite number of points. In the current example, careful inspection of the plots shows that the violation occurs for a part of the trajectory which was generated in the first iteration of algorithm A.1. The current implementation checks the constraints at two intermediate points between the sampling instants, meaning that the constraints are checked at points 0.1 apart in the first iteration. In the next iteration, where the time intervals are ten times shorter, the violation gets detected, and attempts to modify this part of the trajectory are very likely to be rejected.) Since the eigenvalues grow as m → 0, the slack in the lower bound given by $R_0$ can always be made large by selecting m small, and hence the large slack in this constraint does not have to be considered a deficiency of the sampling algorithm.
Based on corresponding eigenvalue plots for 30 realizations of E, it was seen that the constraint given by $\phi_0$ can also obtain small slack at some point, even though the current algorithm has no component to increase the chance that this will happen. Regarding the upper bound on the eigenvalues given by $m^{-1} \bar a$, the large slack seen in figure A.2 appears typical for the proposed algorithm; in all 30 realizations, the eigenvalue moduli were less than 400 at all times. In view of example 6.13, and considering that the time-variability constraint is locally independent of the pointwise-in-time constraints, this is expected to be a deficiency of the algorithm, and not due to the nature of the problem.
It would be a possible future extension of the perturbation sampling algorithm to add components which try to minimize the slack in the constraints given by $\phi_0$ and $\bar a$.
[Plot omitted: eigenvalues in the complex plane, Re roughly from −300 to −100, Im up to 100.]
Figure A.2: Verification of the eigenvalue assumptions in example A.1. The eigenvalues of ( E(t), A(t) ) have been sampled with a time interval of 0.01. The dashed rays show the angle constraint given by $\phi_0$. The arc near the origin is the lower bound on the eigenvalues given by $R_0$, while the upper bound $m^{-1} \bar a = 1000$ is outside the figure. All constraints are seen to be satisfied.
[Plot omitted: $\left\| \frac{d}{dt}( m\, E(t)^{-1} A(t) ) \right\|_2$ versus $t \in [\, 0, 3.5 \,]$, vertical axis from 0 to $\beta_1$.]
Figure A.3: Verification of the time-variability assumption in example A.1. Time instants with sampling interval 0.01 are marked with dots. The horizontal marks are used to label the points where the assumption is checked during the first iteration of algorithm A.1. The dashed line shows the assumed bound, which is only violated between the points checked during the first iteration.
Bibliography
Eyad H. Abed. Multiparameter singular perturbation problems: Iterative expansions
and asymptotic stability. Systems & Control Letters, 5(4):279–282, February 1985a.
Cited on page 71.
Eyad H. Abed. A new parameter estimate in singular perturbations. Systems &
Control Letters, 6(3):153–222, August 1985b. Cited on page 69.
Eyad H. Abed. Decomposition and stability of multiparameter singular perturbation
problems. IEEE Transactions on Automatic Control, AC-31(10):925–934, October
1986. Cited on page 71.
Eyad H. Abed and André L. Tits. On the stability of multiple time-scale systems.
International Journal of Control, 44(1):211–218, 1986. Cited on pages 71 and 174.
Jeffrey M. Augenbaum and Charles S. Peskin. On the construction of the Voronoi diagram on a sphere. Journal of Computational Physics, 59(2):177–192, June 1985. Cited on page 115.
Erwin H. Bareiss. Sylvester’s identity and multistep integer-preserving Gaussian
elimination. Mathematics of Computation, 22(103):565–578, July 1968. Cited on
page 86.
William H. Beyer, editor. CRC Handbook of mathematical sciences. CRC Press, Inc., 5th edition, 1978. Cited on page 121.
Niclas Bergman. Recursive Bayesian estimation — Navigation and tracking applications. PhD thesis, Linköping University, May 1999. Cited on page 113.
Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan.
Linear matrix inequalities in system and control theory. SIAM Studies in Applied
Mathematics, 1994. Cited on pages 58, 61, and 62.
P. C. Breedveld. Proposition for an unambiguous vector bond graph notation. Journal of Dynamic Systems, Measurement, and Control, 104(3):267–270, September 1982. Cited on page 24.
255
256
Bibliography
Kathryn Eleda Brenan, Stephen L. Campbell, and Linda Ruth Petzold. Numerical
solution of initial-value problems in differential-algebraic equations. SIAM, 1996.
Classics edition. Cited on pages 27, 47, 48, 49, 51, and 100.
Peter N. Brown, Alan C. Hindmarsh, and Linda Ruth Petzold. Using Krylov methods in the solution of large-scale differential-algebraic systems. SIAM Journal on
Scientific Computation, 15(6):1467–1488, 1994. Cited on page 51.
Anders Brun, Carl-Fredrik Westin, Magnus Herberthson, and Hans Knutsson. Intrinsic and extrinsic means on the circle — a maximum likelihood interpretation.
In IEEE Conference on Acoustics, Speech, and Signal Processing, volume 3, pages
1053–1056, Honolulu, HI, USA, April 2007. Cited on pages 110 and 119.
Dag Brück, Hilding Elmqvist, Hans Olsson, and Sven Erik Mattsson. Dymola for
multi-engineering modeling and simulation. 2nd International Modelica Conference, Proceedings, pages 55–1–55–8, March 2002. Cited on page 34.
R. S. Bucy and K. D. Senne. Digital synthesis of nonlinear filters. Automatica, 7:
287–298, 1971. Cited on page 114.
Benno Büeler, Andreas Enge, and Komei Fukuda. Polytopes: Combinatorics and
computation, pages 131–154. Number 29 in DMV Seminar. Birkhäuser, 2000.
Chapter title: Exact volume computation for polytopes: A practical study. Cited
on page 122.
Stephen L. Campbell. Least squares completions for nonlinear differential algebraic
equations. Numerische Mathematik, 65(1):77–94, December 1993. Cited on page
130.
Stephen L. Campbell and C. William Gear. The index of general nonlinear daes.
Numerische Mathematik, 72:173–196, 1995. Cited on pages 29, 32, 33, and 106.
Lamberto Cesari. Asymptotic behavior and stability problems in ordinary differential
equations. Springer-Verlag, third edition, 1971. Cited on page 66.
Kok Wah Chang. Remarks on a certain hypothesis in singular perturbations. Proceedings of the American Mathematical Society, 23(1):41–45, October 1969. Cited
on pages 69, 214, 227, 236, and 243.
Kok Wah Chang. Singular perturbations of a general boundary value problem. SIAM
Journal on Mathematical Analysis, 3(3):520–526, August 1972. Cited on pages 69,
211, and 227.
Alessandro Chiuso and Stefano Soatto. Monte Carlo filtering on Lie groups. In Proceedings of the 39th IEEE Conference on Decision and Control, pages 304–309,
Sydney, Australia, December 2000. Cited on page 112.
Daniel Choukroun, Itzhack Y. Bar-Itzhack, and Yaakov Oshman. Novel quaternion Kalman filter. IEEE Transactions on Aerospace and Electronic Systems, 42(1):174–190, January 2006. Cited on page 112.
Bibliography
257
Timothy Y. Chow. The surprise examination or unexpected hanging paradox. American Mathematical Monthly, 105(1):41–51, January 1998. Cited on page 47.
Shantanu Chowdhry, Helmut Krendl, and Andreas A. Linninger. Symbolic numeric
index analysis algorithm for differential algebraic equations. Industrial & Engineering Chemistry Research, 43(14):3886–3894, 2004. Cited on pages 47 and 91.
Earl A. Coddington and Norman Levinson. Theory of ordinary differential equations.
Robert E. Krieger Publishing Company, Inc., third edition, 1985. Cited on pages
66 and 143.
Cyril Coumarbatch and Zoran Gajic. Exact decomposition of the algebraic Riccati
equation of deterministic multimodeling optimal control problems. IEEE Transactions on Automatic Control, 45(4):790–794, April 2000. Cited on page 71.
John L. Crassidis, F. Landis Markley, and Yang Cheng. Survey of nonlinear attitude
estimation methods. Journal of Guidance, Control, and Dynamics, 30(1):12–28,
January 2007. Cited on pages 110 and 113.
Fred Daum. Nonlinear filters: Beyond the Kalman filter. IEEE Aerospace and Electronic Systems Magazine, 20(8:2):57–69, 2005. Cited on page 112.
Aleksei Fedorovich Filippov. Differential equations with discontinuous righthand sides. Mathematics and its applications. Kluwer Academic Publishers, 1985. Cited on pages 61 and 62.
Theodore Frankel, editor. The geometry of physics — an introduction. Cambridge
University Press, 2nd edition, 2004. Cited on page 111.
Peter Fritzson, Peter Aronsson, Adrian Pop, David Akhvlediani, Bernhard Bachmann,
David Broman, Anders Fernström, Daniel Hedberg, Elmin Jagudin, Håkan Lundvall, Kaj Nyström, Andreas Remar, and Anders Sandholm. OpenModelica system
documentation — preliminary draft, 2006-12-14, for OpenModelica 1.4.3 beta.
Technical report, Programming Environment Laboratory — PELAB, Department
of Computer and Information Science, Linköping University, Sweden, 2006a. Cited
on page 34.
Peter Fritzson, Peter Aronsson, Adrian Pop, David Akhvlediani, Bernhard Bachmann,
David Broman, Anders Fernström, Daniel Hedberg, Elmin Jagudin, Håkan Lundvall, Kaj Nyström, Andreas Remar, and Anders Sandholm. OpenModelica users
guide — preliminary draft, 2006-09-28, for OpenModelica 1.4.2. Technical report,
Programming Environment Laboratory — PELAB, Department of Computer and
Information Science, Linköping University, Sweden, 2006b. Cited on page 34.
Komei Fukuda. cddlib reference manual, version 0.94. Institute for Operations
Research and Institute of Theoretical Computer Science, ETH Zentrum, CH8092 Zurich, Switzerland, 2008. URL http://www.ifor.math.ethz.ch/
~fukuda/cdd_home/cdd.html. Cited on page 122.
Markus Gerdin. Identification and estimation for models described by differential-algebraic equations. PhD thesis, Linköping University, 2006. Cited on page 23.
258
Bibliography
Developers of GMP. The GNU multiple precision arithmetic library, version 4.3.1.
Free Software Foundation, 2009. URL http://gmplib.org/. Cited on page 7.
Sergei Konstantinovich Godunov. Ordinary differential equations with constant coefficient, volume 169 of Translations of mathematical monographs. American Mathematical Society, 1997. Cited on page 55.
Gene H. Golub and Charles F. Van Loan. Matrix computations. The Johns Hopkins
University Press, third edition, 1996. Cited on pages 22 and 100.
P. Gurfil and M. Jodorkovsky. Unified initial condition response analysis of Lur’e systems and linear time-invariant systems. International Journal of Systems Science,
34(1):49–62, 2003. Cited on page 53.
Ernst Hairer and Gerhard Wanner. Solving ordinary differential equations II — Stiff
and differential-algebraic problems, volume 14. Springer-Verlag, 1991. Cited on
page 51.
Ernst Hairer, Christian Lubich, and Michel Roche. The numerical solution of
differential-algebraic systems by Runge-Kutta methods. Lecture Notes in Mathematics, 1409, 1989. Cited on page 33.
Michiel Hazewinkel, editor. Encyclopedia of mathematics, volume 8. Kluwer Academic Publishers, 1992. URL http://eom.springer.de/. Cited on page 126.
Nicholas J. Higham. A survey of condition number estimation for triangular matrices.
SIAM Review, 29(4):575–596, December 1987. Cited on page 82.
Nicholas J. Higham, D. Steven Mackey, and Françoise Tisseur. The conditioning of
linearizations of matrix polynomials. SIAM Journal on Matrix Analysis and Applications, 28(4):1005–1028, 2006. Cited on pages 40 and 72.
Inmaculada Higueras and Roswitha März. Differential algebraic equations with
properly stated leading terms. Computers & Mathematics with Applications, 28
(1–2):215–235, 2004. Cited on page 38.
Alan C. Hindmarsh, Radu Serban, and Aaron Collier. User documentation for IDA v2.4.0. Technical report, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 2004. Cited on pages 51 and 98.
Alan C. Hindmarsh, Peter N. Brown, Keith E. Grant, Steven L. Lee, Radu Serban,
Dan E. Shumaker, and Carol S. Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software,
31(3):363–396, 2005. Cited on page 51.
Lars Hörmander. An introduction to complex analysis in several variables. The
University Series in Higher Mathematics. D. Van Nostrand, Princeton, New Jersey,
1966. Cited on pages 143 and 155.
M. E. Ingrim and G. Y. Masada. The extended bond graph notation. Journal of Dynamic Systems, Measurement, and Control, 113(1):113–117, March 1991. Cited on page 24.
Bibliography
259
Luc Jaulin, Michel Kieffer, Olivier Didrit, and Éric Walter. Applied interval analysis. Springer-Verlag, London, 2001. Cited on pages 77 and 195.
Luc Jaulin, Isabelle Braems, and Eric Walter. Interval methods for nonlinear identification and robust control. In Proceedings of the 41st IEEE Conference on Decision
and Control, pages 4676–4681, Las Vegas, NV, USA, December 2002. Cited on
page 77.
Andrew H. Jazwinski. Stochastic processes and filtering theory, volume 64 of Mathematics in science and engineering. Academic Press, New York and London, 1970.
Cited on page 17.
Ulf T. Jönsson. On reachability analysis of uncertain hybrid systems. In Proceedings
of the 41st IEEE Conference on Decision and Control, pages 2397–2402, Las Vegas,
NV, USA, December 2002. Cited on page 62.
Thomas Kailath. Linear Systems. Prentice-Hall, Inc., 1980. Cited on page 13.
M. Kathirkamanayagan and G. S. Ladde. Diagonalization and stability of large-scale
singularly perturbed linear system. Journal of Mathematical Analysis and Applications, 135(1):38–60, October 1988. Cited on page 71.
R. Baker Kearfott. Interval computations: Introduction, uses, and resources. Euromath Bulletin, 1(2):95–112, 1996. Cited on page 77.
Hassan K. Khalil. Asymptotic stability of nonlinear multiparameter singularly perturbed systems. Automatica, 17(6):797–804, November 1981. Cited on page 70.
Hassan K. Khalil. Time scale decomposition of linear implicit singularly perturbed
systems. IEEE Transactions on Automatic Control, AC-29(11):1054–1056, November 1984. Cited on page 69.
Hassan K. Khalil. Stability of nonlinear multiparameter singularly perturbed systems. IEEE Transactions on Automatic Control, AC-32(3):260–263, March 1987.
Cited on page 71.
Hassan K. Khalil. Nonlinear systems. Prentice Hall, Inc., third edition, 2002. Cited
on pages 52, 61, and 66.
Hassan K. Khalil and Peter V. Kokotović. D-stability and multi-parameter singular perturbation. SIAM Journal on Control and Optimization, 17(1):56–65, 1979.
Cited on pages 70 and 71.
K. Khorasani and M. A. Pai. Asymptotic stability improvements of multiparameter
nonlinear singularly perturbed systems. IEEE Transactions on Automatic Control,
AC-30(8):802–804, 1985. Cited on page 70.
Petar V. Kokotović. A Riccati equation for block-diagonalization of ill-conditioned
systems. IEEE Transactions on Automatic Control, 20(6):812–814, December 1975.
Cited on page 211.
Petar V. Kokotović, Hassan K. Khalil, and John O’Reilly. Singular perturbation methods in control: Analysis and applications. Academic Press Inc., 1986. Cited on
pages 2, 57, 67, 68, 149, 168, 227, 228, 230, 233, 235, 236, 239, and 243.
Steven G. Krantz and Harold R. Parks. A primer of real analytic functions.
Birkhäuser, Boston, second edition, 2002. Cited on page 155.
Peter Kunkel and Volker Mehrmann. Canonical forms for linear differential-algebraic
equations with variable coefficients. Journal of Computational and Applied Mathematics, 56(3):225–251, 1994. Cited on page 34.
Peter Kunkel and Volker Mehrmann. A new class of discretization methods for the
solution of linear differential-algebraic equations with variable coefficients. SIAM
Journal on Numerical Analysis, 33(5):1941–1961, October 1996. Cited on pages
130 and 131.
Peter Kunkel and Volker Mehrmann. Regular solutions of nonlinear differential-algebraic equations. Numerische Mathematik, 79(4):581–600, June 1998. Cited on
page 131.
Peter Kunkel and Volker Mehrmann. Index reduction for differential-algebraic equations by minimal extension. ZAMM — Journal of Applied Mathematics and Mechanics, 84(9):579–597, 2004. Cited on page 34.
Peter Kunkel and Volker Mehrmann. Differential-algebraic equations — Analysis
and numerical solution. European Mathematical Society, 2006. Cited on pages 34,
40, 51, 72, 129, 132, 134, 138, 141, 142, and 147.
Peter Kunkel, Volker Mehrmann, Werner Rath, and Jörg Weickert. GELDA: A software package for the solution of general linear differential algebraic equations,
1995. Cited on page 51.
Peter Kunkel, Volker Mehrmann, and Werner Rath. Analysis and numerical solution of control problems in descriptor form. Mathematics of Control, Signals, and
Systems, 14(1):29–61, 2001. Cited on page 34.
Alexander B. Kurzhanski and István Vályi. Ellipsoidal calculus for estimation and
control. Birkhäuser, 1997. Cited on page 62.
Junghyun Kwon, Minseok Choi, F. C. Park, and Changmook Chun. Particle filtering
on the Euclidean group: Framework and applications. Robotica, 25:725–737, 2007.
Cited on pages 110 and 112.
Wook Hyun Kwon, Young Soo Moon, and Sang Chul Ahn. Bounds in algebraic Riccati
and Lyapunov equations: A survey and some new results. International Journal of
Control, 64(3):377–389, June 1996. Cited on pages 55 and 58.
G. S. Ladde and S. G. Rajalakshmi. Diagonalization and stability of multi-time-scale
singularly perturbed linear systems. Applied Mathematics and Computation, 16
(2):115–140, February 1985. Cited on page 71.
G. S. Ladde and S. G. Rajalakshmi. Singular perturbations of linear systems with
multiparameters and multiple time scales. Journal of Mathematical Analysis and
Applications, 129(2):457–481, February 1988. Cited on page 71.
G. S. Ladde and O. Sirisaengtaksin. Large-scale stochastic singularly perturbed systems. Mathematics and Computers in Simulation, 31(1–2):31–40, February 1989.
Cited on page 69.
G. S. Ladde and D. D. Šiljak. Multiparameter singular perturbations of linear systems
with multiple time scales. Automatica, 19(4):385–394, July 1983. Cited on page
71.
Jehee Lee and Sung Yong Shin. General construction of time-domain filters for orientation data. IEEE Transactions on Visualization and Computer Graphics, 8(2):
119–128, April 2002. Cited on page 110.
Taeyoung Lee, Melvin Leok, and Harris McClamroch. Global symplectic uncertainty
propagation on SO(3). In Proceedings of the 47th IEEE Conference on Decision
and Control, pages 61–66, Cancun, Mexico, December 2008. Cited on page 112.
Ben Leimkuhler, Linda Ruth Petzold, and C. William Gear. Approximation methods
for the consistent initialization of differential-algebraic equations. SIAM Journal
on Numerical Analysis, 28(1):204–226, February 1991. Cited on page 47.
Adrien Leitold and Katalin M. Hangos. Structural solvability analysis of dynamic
process models. Computers and Chemical Engineering, 25(11–12):1633–1646,
2001. Cited on page 47.
Frans Lemeire. Bounds for condition numbers of triangular and trapezoid matrices.
BIT Numerical Mathematics, 15(1):58–64, March 1975. Cited on page 83.
C.-W. Li and Y.-K. Feng. Functional reproducibility of general multivariable analytic
nonlinear systems. International Journal of Control, 45(1):255–268, 1987. Cited
on page 38.
Lennart Ljung. System identification: Theory for the user. Prentice-Hall, Inc., second edition, 1999.
Cited on page 18.
James Ting-Ho Lo and Linda R. Eshleman. Exponential Fourier densities on SO(3)
and optimal estimation and detection for rotational processes. SIAM Journal on
Applied Mathematics, 36(1):73–82, February 1979. Cited on pages 110 and 113.
David G. Luenberger. Time-invariant descriptor systems. Automatica, 14(5):473–
480, 1978. Cited on page 29.
Morris Marden. Geometry of polynomials. American Mathematical Society, second
edition, 1966. Cited on page 82.
R. M. M. Mattheij and P. M. E. J. Wijckmans. Sensitivity of solutions of linear dae
to perturbations of the system matrices. Numerical Algorithms, 19(1–4):159–171,
1998. Cited on page 72.
Sven Erik Mattsson and Gustaf Söderlind. Index reduction in differential-algebraic
equations using dummy derivatives. SIAM Journal on Scientific Computing, 14
(3):677–692, May 1993. Cited on pages 34 and 95.
Sven Erik Mattsson, Hans Olsson, and Hilding Elmqvist. Dynamic selection of states
in dymola. Modelica Workshop, pages 61–67, October 2000. Cited on page 34.
Volker Mehrmann and Chunchao Shi. Transformation of high order linear
differential-algebraic systems to first order. Numerical Algorithms, 42(3–4):281–
307, July 2006. Cited on page 36.
Roswitha März and Ricardo Riaza. Linear differential-algebraic equations with properly stated leading term: Regular points. Journal of Mathematical Analysis and
Applications, 323(2):1279–1299, December 2006. Cited on page 38.
Roswitha März and Ricardo Riaza. Linear differential-algebraic equations with properly stated leading term: A-critical points. Mathematical and Computer Modelling
of Dynamical Systems, 13(3):291–314, 2007. Cited on pages 38 and 72.
Roswitha März and Ricardo Riaza. Linear differential algebraic equations with properly stated leading terms: B-critical points. Dynamical Systems: An International
Journal, 23(4):505–522, 2008. Cited on page 38.
Hyeon-Suk Na, Chung-Nim Lee, and Otfried Cheong. Voronoi diagrams on the
sphere. Computational Geometry, 23:183–194, 2002. Cited on page 115.
D. Subbaram Naidu. Singular perturbations and time scales in control theory and
applications: An overview. Dynamics of Continuous, Discrete and Impulsive Systems, 9(2):233–278, 2002. Cited on pages 67 and 232.
Arnold Neumaier. Overestimation in linear interval equations. SIAM Journal on
Numerical Analysis, 24:207–214, 1987. Cited on page 78.
Arnold Neumaier. Interval methods for systems of equations. Cambridge University
Press, 1990. Cited on page 79.
Constantinos Pantelides. The consistent initialization of differential-algebraic systems. SIAM Journal on Scientific and Statistical Computing, 9(2):213–231, March
1988. Cited on pages 34 and 47.
Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric
measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, July
2006. Cited on page 120.
Linda Ruth Petzold. Order results for Runge-Kutta methods applied to differential/algebraic systems. SIAM Journal on Numerical Analysis, 23(4):837–852, 1986.
Cited on page 50.
P. J. Rabier and W. C. Rheinboldt. A geometric treatment of implicit differential-algebraic equations. Journal of Differential Equations, 109(1):110–146, April 1994.
Cited on page 32.
S. Reich. On an existence and uniqueness theory for non-linear differential-algebraic
equations. Circuits, Systems, and Signal Processing, 10(3):344–359, 1991. Cited
on page 32.
Gregory J. Reid, Ping Lin, and Allan D. Wittkopf. Differential elimination-completion algorithms for dae and pdae. Studies in Applied Mathematics, 106:
1–45, 2001. Cited on page 32.
Gregory J. Reid, Chris Smith, and Jan Verschelde. Geometric completion of differential systems using numeric-symbolic continuation. ACM SIGSAM Bulletin, 36(2):
1–17, June 2002. Cited on page 91.
Gregory J. Reid, Jan Verschelde, Allan Wittkopf, and Wenyuan Wu. Symbolic-numeric completion of differential systems by homotopy continuation. Proceedings of the 2005 international symposium on symbolic and algebraic computation,
pages 269–276, 2005. Cited on page 91.
Michel Roche. Implicit Runge-Kutta methods for differential algebraic equations.
SIAM Journal on Numerical Analysis, 26(4):963–975, 1989. Cited on page 50.
Ronald C. Rosenberg and Dean C. Karnopp. Introduction to physical system dynamics. McGraw-Hill Book Company, 1983. Cited on page 24.
P. Rouchon, M. Fliess, and J. Lévine. Kronecker’s canonical forms for nonlinear implicit differential systems. In Proceedings of the 2nd IFAC Workshop on Systems
Structure and Control, pages 248–251, Prague, Czech Republic, September 1992.
Cited on pages 29, 38, 39, 88, and 92.
Walter Rudin. Principles of mathematical analysis. McGraw-Hill, third edition, 1976.
Cited on page 73.
Wilson J. Rugh. Linear system theory. Prentice-Hall, Inc., second edition, 1996. Cited
on pages 52, 57, 61, 64, 65, 66, and 240.
A. Saberi and Hassan K. Khalil. Quadratic-type Lyapunov functions for singularly
perturbed systems. IEEE Transactions on Automatic Control, AC-29(6):542–550,
June 1984. Cited on page 71.
Helmut Schaeben. “Normal” orientation distributions. Texture, Stress, and Microstructure, 19(4):197–202, 1992. The journal changed name in 2008 from Textures and Microstructures.
Cited on page 122.
Eliezer Y. Shapiro. On the Lyapunov matrix equation. IEEE Transactions on Automatic Control, 19(5):594–596, October 1974. Cited on page 58.
Johan Sjöberg. Optimal control and model reduction of nonlinear dae models. PhD
thesis, Linköping University, 2008. Cited on page 23.
Sigurd Skogestad and Ian Postlethwaite. Multivariable feedback control. John Wiley
& Sons, 1996. Cited on page 21.
Anuj Srivastava and Eric Klassen. Monte Carlo extrinsic estimators of manifold-valued parameters. IEEE Transactions on Signal Processing, 50(2):299–308, February 2002. Cited on pages 119 and 120.
Andreas Steinbrecher. Numerical solution of quasi-linear differential-algebraic equations and industrial simulation of multibody systems. PhD thesis, Technische
Universität Berlin, 2006. Cited on pages 29, 86, 92, and 242.
G. W. Stewart and Ji-guang Sun. Matrix perturbation theory. Computer Science and
Scientific Computing. Academic Press, 1990. Cited on pages 40, 42, 43, and 200.
Torsten Ström. On logarithmic norms. SIAM Journal on Numerical Analysis, 12(5):
741–753, 1975. Cited on pages 55 and 56.
Tatjana Stykel. Gramian based model reduction for descriptor systems. Mathematics
of Control, Signals, and Systems, 16(4):297–319, 2004. Cited on page 21.
N. Sukumar. Voronoi cell finite difference method for the diffusion operator on arbitrary unstructured grids. International Journal for Numerical Methods in Engineering, 57(1):1–34, May 2003. Cited on page 115.
Andrzej Szatkowski. Generalized dynamical systems: Differentiable dynamic complexes and differential dynamic systems. International Journal of Systems Science,
21(8):1631–1657, August 1990. Cited on page 32.
Andrzej Szatkowski. Geometric characterization of singular differential algebraic
equations. International Journal of Systems Science, 23(2):167–186, February
1992. Cited on page 32.
G. Thomas. Symbolic computation of the index of quasilinear differential-algebraic
equations. Proceedings of the 1996 international symposium on Symbolic and
algebraic computation, pages 196–203, 1996. Cited on pages 32 and 39.
Henrik Tidefelt. Structural algorithms and perturbations in differential-algebraic
equations. Licentiate thesis No. 1318, Division of Automatic Control, Linköping University, 2007. Cited on pages 9, 10, 85, 91, 149, and 158.
Henrik Tidefelt and Torkel Glad. Index reduction of index 1 dae under uncertainty.
In Proceedings of the 17th IFAC World Congress, pages 5053–5058, Seoul, Korea,
July 2008. Cited on pages 54, 149, 151, 152, and 157.
Henrik Tidefelt and Torkel Glad. On the well-posedness of numerical dae. In
Proceedings of the European Control Conference 2009, pages 826–831, Budapest,
Hungary, August 2009. Cited on page 157.
Henrik Tidefelt and Thomas B. Schön. Robust point-mass filters on manifolds. In
Proceedings of the 15th IFAC Symposium on System Identification, pages 540–
545, Saint-Malo, France, July 2009. Not cited.
David Törnqvist, Thomas B. Schön, Rickard Karlsson, and Fredrik Gustafsson. Particle filter slam with high dimensional vehicle model. Journal of Intelligent and
Robotic Systems, 55(4–5):249–266, August 2009. Cited on page 110.
J. Unger, A. Kröner, and W. Marquardt. Structural analysis of differential-algebraic
equation systems — theory and applications. Computers and Chemical Engineering, 19(8):867–882, 1995. Cited on pages 24, 47, and 91.
Charles F. Van Loan. The sensitivity of the matrix exponential. SIAM Journal on
Numerical Analysis, 14(6):971–981, December 1977. Cited on pages 53, 54, 59,
and 60.
R. C. Vieira and E. C. Biscaia Jr. An overview of initialization approaches for
differential-algebraic equations. Latin American Applied Research, 30(4):303–313,
2000. Cited on page 47.
R. C. Vieira and E. C. Biscaia Jr. Direct methods for consistent initialization of dae
systems. Computers and Chemical Engineering, 25(9–10):1299–1311, September
2001. Cited on page 47.
Krešimir Veselić. Bounds for exponentially stable semigroups. Linear Algebra and
its Applications, 358:309–333, 2003. Cited on pages 53 and 56.
Josselin Visconti. Numerical solution of differential algebraic equations, global error estimation and symbolic index reduction. PhD thesis, Institut d’Informatique
et Mathématiques Appliquées de Grenoble, November 1999. Cited on pages 29
and 86.
Wolfram Research, Inc. Mathematica. Wolfram Research, Inc., Champaign, Illinois,
2008. Version 7.0.0. Cited on page 23.
Index
abstraction barrier, 110
algebraic constraints, 46
algebraic equation, 12
algebraic term, 12
autonomous, 12, 13, 118
backward difference formulas, see bdf method
balanced form, 21
Bayes' rule, 112
  operator, 117
bdf
  abbreviation, xvii
  method, 47, 134
bond graph, 24
boundary layer, 68
Chapman-Kolmogorov equation, 112, 117
component-based model, 24
constraint propagation, 79
contraction
  mapping, 73, 182, 214, 218, 235, 238
  notation, xvi
  principle, 73
contravariant vector, 11, 111
convolution, 118
coordinate map, 121, 133
covariant vector, 11
D-stability, 70
dae, 22
  abbreviation, xvii
  quasilinear, 23
daspk, 51
dassl, 51
decoupling transform
  lti index 1, 177
  lti index 2, 210
  ltv index 1, 232
departure from normality, 54
derivative array, 31, 130
  equations, 31, 131, 134, 138
differential inclusion, 61
differential-algebraic equation, see dae
differentiation index, 29, 31, 32, 35, 43, 130, 137
drift, 17, 50
dummy derivatives, 34
eigenvalues of matrix pair, 42
elimination-differentiation, 29
embedding, 110, 124, 127, 128
  "natural", 119, 121
Euclidean space, 122
  notation, xv
exponential map, 111
extrinsic mean, 120
Fokker-Planck equation, 112
forcing function, 12, 13
form
  balanced, 21
  implicit ode, 23
  quasilinear, 23
  state space, 19
fraction-free, 80
fraction-producing, 81
fundamental matrix, 52
  synonym, see transition matrix
Gaussian distribution, 112
gelda, 51
genda, 51
geodesic, 111
Hankel norm, 21
ida, 51
ill-posed
  in quasilinear shuffle algorithm, 92
  in shuffle algorithm, 30
  in structure algorithm, 39
  initial value problem, 46
  uncertain dae, 151
implicit ode, 23
implicit Runge-Kutta methods, see irk
index, 28
  (unqualified), 35
  differentiation, see differentiation index
  nominal, see nominal index
  perturbation, see perturbation index
  pointwise, see pointwise, index
  simplified strangeness, see simplified strangeness index
  strangeness, see strangeness index
index reduction, 3, 28, 34, 51, 85, 109
  seminumerical, 99, 105
inevitable pendulum, 95
initial value problem, 14
  consistent initial conditions, 47
  ill-posed, see ill-posed initial value problem
inner approximation, 78
input (to differential equation), 13
input matrix, 13
interval
  matrix, 78
  real, 78
  vector, 78
intrinsic mean, 119
irk, 50, 51
  abbreviation, xvii
iteration matrix, 49
leading matrix
  of (quasi)linear dae, 12
  of matrix pair, 42
Lie group, 112, 113
local coordinates, 110, 133
longevity, 96
lti
  abbreviation, xvii
  autonomous dae, 12, 25
  autonomous ode, 13
  dae, 12, 25
  ode, 13
ltv
  abbreviation, xvii
  autonomous dae, 13, 26
  autonomous ode, 13
  dae, 13, 26
  ode, 13
Lyapunov equation, 52
Lyapunov function, 52
  candidate, 52, 231
Lyapunov transformation, 57, 240
manifold, 111
Mathematica, 11, 23n, 51, 79, 135, 177
matrix pair, 41, 186
matrix pencil, 40
matrix-valued singular perturbation, 2, 71, 151, 196, 227
matrix-valued uncertainty, 152
measurement update, 111, 116
meshless, 115
model, 15
  component-based, 24
  residualized, 20
  truncated, 20
model class, 18
model reduction, 7, 19
model structure, 18
multiparameter singular perturbation, 69
multiple time scale singular perturbation, 71
nominal index, 35
  notation, xvi
non-differential equation, 12
normal, 7, 54
  departure from, see departure from normality
ode
  abbreviation, xvii
  autonomous, 13n
  implicit, 23
  time-invariant, 13
one-full, 130, 135
outer approximation, 78
pair, matrix, 41
Pantelides' algorithm, 34
particle filter, 110
pencil, matrix, 40
perturbation
  regular, see regular perturbation
  singular, see singular perturbation
perturbation index, 33
point
  matrix, 78
  vector, 78
point estimate, 111, 119
point-mass distribution, 114
point-mass filter, 111, 114
pointwise
  index, 35
  non-singular, 78
properly stated leading term, 38
quasilinear form, 5, 12, 23
quasilinear shuffle algorithm, 5, 29, 86, 92
radau5, 51
reduced equation, 14, 98, 131
regular
  lti dae, 41
  matrix pair, 42
  matrix pencil, 40
  uncertain matrix, 42, 78
  uncertain matrix pair, 43
regular perturbation, 58
residualization, 20
residualized model, 20
right hand side
  of ode, 13
  of quasilinear dae, 12
scalar singular perturbation, 67
shuffle algorithm, 29, 30
  quasilinear, see quasilinear shuffle algorithm
simplified strangeness index, 134
singular
  lti dae, 41
  matrix pair, 42
  matrix pencil, 40
  uncertain matrix, 78
  uncertain matrix pair, 43
singular perturbation, 67
  matrix-valued, see matrix-valued singular perturbation
  multiparameter, see multiparameter singular perturbation
  multiple time scale, see multiple time scale singular perturbation
  scalar, see scalar singular perturbation
singular perturbation approximation, 21
square (dae), 14
state (vector), 14
state feedback matrix, 13
state space model, 113n
strangeness index, 34, 72, 131
structural zero, 6, 81n, 94
structure algorithm, 29, 38, 39
tangent space, 111, 122
tessellation, 111, 114, 120
time update, 111, 117, 118
trailing matrix
  of linear dae, 12
  of matrix pair, 42
transition matrix, 52, 118n, 205, 235
truncated model, 20
truncation, 20
uniformly bounded-input, bounded-output stable, 60
uniformly exponentially stable, 64
unstructured perturbation, 152
variable, 14
Voronoi diagram, 115, 121
PhD Dissertations
Division of Automatic Control
Linköping University
M. Millnert: Identification and control of systems subject to abrupt changes. Thesis No. 82,
1982. ISBN 91-7372-542-0.
A. J. M. van Overbeek: On-line structure selection for the identification of multivariable systems. Thesis No. 86, 1982. ISBN 91-7372-586-2.
B. Bengtsson: On some control problems for queues. Thesis No. 87, 1982. ISBN 91-7372-593-5.
S. Ljung: Fast algorithms for integral equations and least squares identification problems.
Thesis No. 93, 1983. ISBN 91-7372-641-9.
H. Jonson: A Newton method for solving non-linear optimal control problems with general
constraints. Thesis No. 104, 1983. ISBN 91-7372-718-0.
E. Trulsson: Adaptive control based on explicit criterion minimization. Thesis No. 106, 1983.
ISBN 91-7372-728-8.
K. Nordström: Uncertainty, robustness and sensitivity reduction in the design of single input
control systems. Thesis No. 162, 1987. ISBN 91-7870-170-8.
B. Wahlberg: On the identification and approximation of linear systems. Thesis No. 163, 1987.
ISBN 91-7870-175-9.
S. Gunnarsson: Frequency domain aspects of modeling and control in adaptive systems. Thesis No. 194, 1988. ISBN 91-7870-380-8.
A. Isaksson: On system identification in one and two dimensions with signal processing applications. Thesis No. 196, 1988. ISBN 91-7870-383-2.
M. Viberg: Subspace fitting concepts in sensor array processing. Thesis No. 217, 1989.
ISBN 91-7870-529-0.
K. Forsman: Constructive commutative algebra in nonlinear control theory. Thesis No. 261,
1991. ISBN 91-7870-827-3.
F. Gustafsson: Estimation of discrete parameters in linear systems. Thesis No. 271, 1992.
ISBN 91-7870-876-1.
P. Nagy: Tools for knowledge-based signal processing with applications to system identification. Thesis No. 280, 1992. ISBN 91-7870-962-8.
T. Svensson: Mathematical tools and software for analysis and design of nonlinear control
systems. Thesis No. 285, 1992. ISBN 91-7870-989-X.
S. Andersson: On dimension reduction in sensor array signal processing. Thesis No. 290,
1992. ISBN 91-7871-015-4.
H. Hjalmarsson: Aspects on incomplete modeling in system identification. Thesis No. 298,
1993. ISBN 91-7871-070-7.
I. Klein: Automatic synthesis of sequential control schemes. Thesis No. 305, 1993. ISBN 91-7871-090-1.
J.-E. Strömberg: A mode switching modelling philosophy. Thesis No. 353, 1994. ISBN 91-7871-430-3.
K. Wang Chen: Transformation and symbolic calculations in filtering and control. Thesis
No. 361, 1994. ISBN 91-7871-467-2.
T. McKelvey: Identification of state-space models from time and frequency data. Thesis
No. 380, 1995. ISBN 91-7871-531-8.
J. Sjöberg: Non-linear system identification with neural networks. Thesis No. 381, 1995.
ISBN 91-7871-534-2.
R. Germundsson: Symbolic systems – theory, computation and applications. Thesis No. 389,
1995. ISBN 91-7871-578-4.
P. Pucar: Modeling and segmentation using multiple models. Thesis No. 405, 1995. ISBN 91-7871-627-6.
H. Fortell: Algebraic approaches to normal forms and zero dynamics. Thesis No. 407, 1995.
ISBN 91-7871-629-2.
A. Helmersson: Methods for robust gain scheduling. Thesis No. 406, 1995. ISBN 91-7871-628-4.
P. Lindskog: Methods, algorithms and tools for system identification based on prior knowledge. Thesis No. 436, 1996. ISBN 91-7871-424-8.
J. Gunnarsson: Symbolic methods and tools for discrete event dynamic systems. Thesis
No. 477, 1997. ISBN 91-7871-917-8.
M. Jirstrand: Constructive methods for inequality constraints in control. Thesis No. 527, 1998.
ISBN 91-7219-187-2.
U. Forssell: Closed-loop identification: Methods, theory, and applications. Thesis No. 566,
1999. ISBN 91-7219-432-4.
A. Stenman: Model on demand: Algorithms, analysis and applications. Thesis No. 571, 1999.
ISBN 91-7219-450-2.
N. Bergman: Recursive Bayesian estimation: Navigation and tracking applications. Thesis
No. 579, 1999. ISBN 91-7219-473-1.
K. Edström: Switched bond graphs: Simulation and analysis. Thesis No. 586, 1999. ISBN 91-7219-493-6.
M. Larsson: Behavioral and structural model based approaches to discrete diagnosis. Thesis
No. 608, 1999. ISBN 91-7219-615-5.
F. Gunnarsson: Power control in cellular radio systems: Analysis, design and estimation. Thesis No. 623, 2000. ISBN 91-7219-689-0.
V. Einarsson: Model checking methods for mode switching systems. Thesis No. 652, 2000.
ISBN 91-7219-836-2.
M. Norrlöf: Iterative learning control: Analysis, design, and experiments. Thesis No. 653,
2000. ISBN 91-7219-837-0.
F. Tjärnström: Variance expressions and model reduction in system identification. Thesis
No. 730, 2002. ISBN 91-7373-253-2.
J. Löfberg: Minimax approaches to robust model predictive control. Thesis No. 812, 2003.
ISBN 91-7373-622-8.
J. Roll: Local and piecewise affine approaches to system identification. Thesis No. 802, 2003.
ISBN 91-7373-608-2.
J. Elbornsson: Analysis, estimation and compensation of mismatch effects in A/D converters.
Thesis No. 811, 2003. ISBN 91-7373-621-X.
O. Härkegård: Backstepping and control allocation with applications to flight control. Thesis
No. 820, 2003. ISBN 91-7373-647-3.
R. Wallin: Optimization algorithms for system analysis and identification. Thesis No. 919,
2004. ISBN 91-85297-19-4.
D. Lindgren: Projection methods for classification and identification. Thesis No. 915, 2005.
ISBN 91-85297-06-2.
R. Karlsson: Particle Filtering for Positioning and Tracking Applications. Thesis No. 924,
2005. ISBN 91-85297-34-8.
J. Jansson: Collision Avoidance Theory with Applications to Automotive Collision Mitigation.
Thesis No. 950, 2005. ISBN 91-85299-45-6.
E. Geijer Lundin: Uplink Load in CDMA Cellular Radio Systems. Thesis No. 977, 2005.
ISBN 91-85457-49-3.
M. Enqvist: Linear Models of Nonlinear Systems. Thesis No. 985, 2005. ISBN 91-85457-64-7.
T. B. Schön: Estimation of Nonlinear Dynamic Systems — Theory and Applications. Thesis
No. 998, 2006. ISBN 91-85497-03-7.
I. Lind: Regressor and Structure Selection — Uses of ANOVA in System Identification. Thesis
No. 1012, 2006. ISBN 91-85523-98-4.
J. Gillberg: Frequency Domain Identification of Continuous-Time Systems Reconstruction
and Robustness. Thesis No. 1031, 2006. ISBN 91-85523-34-8.
M. Gerdin: Identification and Estimation for Models Described by Differential-Algebraic
Equations. Thesis No. 1046, 2006. ISBN 91-85643-87-4.
C. Grönwall: Ground Object Recognition using Laser Radar Data – Geometric Fitting, Performance Analysis, and Applications. Thesis No. 1055, 2006. ISBN 91-85643-53-X.
A. Eidehall: Tracking and threat assessment for automotive collision avoidance. Thesis
No. 1066, 2007. ISBN 91-85643-10-6.
F. Eng: Non-Uniform Sampling in Statistical Signal Processing. Thesis No. 1082, 2007.
ISBN 978-91-85715-49-7.
E. Wernholt: Multivariable Frequency-Domain Identification of Industrial Robots. Thesis
No. 1138, 2007. ISBN 978-91-85895-72-4.
D. Axehill: Integer Quadratic Programming for Control and Communication. Thesis
No. 1158, 2008. ISBN 978-91-85523-03-0.
G. Hendeby: Performance and Implementation Aspects of Nonlinear Filtering. Thesis
No. 1161, 2008. ISBN 978-91-7393-979-9.
J. Sjöberg: Optimal Control and Model Reduction of Nonlinear DAE Models. Thesis No. 1166,
2008. ISBN 978-91-7393-964-5.
D. Törnqvist: Estimation and Detection with Applications to Navigation. Thesis No. 1216,
2008. ISBN 978-91-7393-785-6.
P-J. Nordlund: Efficient Estimation and Detection Methods for Airborne Applications. Thesis
No. 1231, 2008. ISBN 978-91-7393-720-7.