ALMA MATER STUDIORUM
UNIVERSITÀ DI BOLOGNA
Dottorato di Ricerca in Modellistica Fisica per la Protezione dell’Ambiente – XX Ciclo
Dipartimento di Scienze della Terra e Geologico-Ambientali – Dipartimento di Fisica
Settore Scientifico-Disciplinare FIS/06 – Fisica per il Sistema Terra e il Mezzo Circumterrestre
Coordinator: Prof. Ezio Todini
A Dynamical System Approach
to Data Assimilation
in Chaotic Models
PhD Candidate:
Dott. Massimo Pilolli
Advisors:
Prof. Rolando Rizzi
Dott.ssa Anna Trevisan
Year 2008
To my father and my mother
“Une cause très petite, qui nous échappe, détermine un effet considérable que nous ne
pouvons pas ne pas voir, et alors nous disons que cet effet est dû au hasard. Si nous connaissions
exactement les lois de la nature et la situation de l’univers à l’instant initial, nous pourrions
prédire exactement la situation de ce même univers à un instant ultérieur. Mais, lors même que
les lois naturelles n’auraient plus de secret pour nous, nous ne pourrons connaître la situation
initiale qu’approximativement. Si cela nous permet de prévoir la situation ultérieure avec la
même approximation, c’est tout ce qu’il nous faut, nous disons que le phénomène a été prévu,
qu’il est régi par des lois; mais il n’en est pas toujours ainsi, il peut arriver que de petites
différences dans les conditions initiales en engendrent de très grandes dans les phénomènes
finaux; une petite erreur sur les premières produirait une erreur énorme sur les derniers. La
prédiction devient impossible et nous avons le phénomène fortuit.”
Jules-Henri Poincaré – Science et Méthode, 1908
Acknowledgements
First of all, I wish to write a note in memory of Edward Norton Lorenz, who died in Cambridge, MA, USA, on April 16, 2008. He was not only the father of Chaos Theory and the butterfly effect, one of the great scientific revolutions of the past century, but also the grandfather of all meteorologists. I regret that I never had the chance to meet him.
I wish to thank my advisor Anna Trevisan (ISAC-CNR), one of the strongest and most motivated women I have ever met: her deep insight into data assimilation, her constant support and
enlightening discussions made this work possible. A special thank you to Alberto Carrassi and
Francesco Uboldi for their frequent, friendly help. I also wish to thank Prof. Rolando Rizzi and
Prof. Ezio Todini for their support, useful suggestions and comments.
Truly unforgettable have been the kind hospitality and the useful discussions with Prof.
Catherine Rouvas-Nicolis and Dr. Stéphane Vannitsem, at the Department of Meteorological
Research and Development of the Institut Royal Météorologique/Koninklijk Meteorologisch Instituut, the Royal Meteorological Institute of Belgium, Brussels, where I met very nice and
friendly people: merci beaucoup/dank u wel!
And finally, a huge thank you to Stefania, for her patience and helpfulness!
The software used in this work has been developed on the basis of software written at the NERC Data Assimilation Research Centre, Department of Meteorology, University of Reading, United Kingdom. All experiments were run on a Debian 3.1 Linux computer, using both Matlab 6.5.0.180913a Release 13 and the gcc/g++ 3.3.5 C/C++ compiler. Some figures were produced with gnuplot 4.0.
M. P., Bologna, April 2008
Contents

Acknowledgements
List of Figures
List of Tables
Abbreviations
Partial List of Symbols

1 Introduction
  1.1 A historical perspective
  1.2 Data Assimilation
  1.3 Framing the problem
    1.3.1 Basic features of chaotic systems, a historical note
    1.3.2 Conservative, nonconservative dynamical systems
    1.3.3 Probability Density Functions and Liouville equation
    1.3.4 The Markov processes and the Fokker-Planck equation
    1.3.5 The Liouville Theorem
    1.3.6 Dissipative systems, attractors and strange attractors
    1.3.7 Small perturbations dynamics, tangent linear model, adjoint model
    1.3.8 Lyapunov vectors and Lyapunov exponents
    1.3.9 Bred vectors
    1.3.10 Dynamical systems described by maps
  1.4 Lorenz’s three dimensional chaotic system (1963)
    1.4.1 The equations
    1.4.2 The meaning of variables and parameters
    1.4.3 Why it is so important: chaos implies limited predictability
    1.4.4 Lyapunov exponents, dimensionality and doubling time
  1.5 The aim of this study
  1.6 Notation and conventions
    1.6.1 Model space and observational space
    1.6.2 The observation operator H and its linear approximation H
    1.6.3 Error vectors
    1.6.4 Error covariance matrices
    1.6.5 Operators and vectors: a low dimensional example

2 Data Assimilation: state of the art
  2.1 Variational assimilation
    2.1.1 Maximum likelihood approach
    2.1.2 Bayesian approach
    2.1.3 3D-Var scheme
  2.2 Sequential assimilation
    2.2.1 Kalman Filter
    2.2.2 Extended Kalman Filter
    2.2.3 Ensemble Kalman Filter
  2.3 AUS: Assimilation in the Unstable Subspace
    2.3.1 AUS: how it works
    2.3.2 AUS: a simple example
    2.3.3 Refresh procedure
    2.3.4 Using a single bred vector for assimilation
    2.3.5 Adaptive observation strategy
  2.4 BDAS: Breeding on the Data Assimilation System
    2.4.1 Standard breeding method
    2.4.2 BDAS: how it works
    2.4.3 BDAS, an example of practical implementation

3 Assimilation in the Lorenz 63 model: comparison among different methods
  3.1 Experimental setups
  3.2 Extended Kalman Filters
    3.2.1 EKF
    3.2.2 Evensen’s version of EKF
    3.2.3 Yang’s version of EKF
  3.3 Assimilation in the Unstable Subspace: further developments
    3.3.1 AUS-γ0: no use of observations in the estimate of the forecast error amplitude
    3.3.2 Estimate of the amplitude γ of the forecast error from observations
    3.3.3 AUS-γ: using the estimate of γ from observations
    3.3.4 Iterating
    3.3.5 Iterating and using a quasi-static Pf in stable zones of the attractor
  3.4 Comparing results
    3.4.1 Synchronization: one perfect or quasi-perfect observation, case studies
    3.4.2 Noisy observations with variance σ² = 2
    3.4.3 Noisy observations with variance σ² = 1
    3.4.4 Noisy observations with variance σ² = 0.1
    3.4.5 Root Mean Square forecast error: time dependence
  3.5 Some illustrative examples
    3.5.1 Comparing AUS with EKF assimilation schemes: case study
    3.5.2 AUS-γ: a 3D example
    3.5.3 AUS-iterating: a step by step description
  3.6 Adding two types of Model Error
    3.6.1 Random error
    3.6.2 Random error: comparing performances
    3.6.3 Systematic error
    3.6.4 Systematic error: comparing performances

4 Conclusions

Appendices
A Euler and Runge-Kutta numerical integration methods
  A.1 First order Euler method
  A.2 RK2: second order Runge-Kutta scheme
  A.3 RK4: fourth order Runge-Kutta scheme
B Normalization factors for random variables
C EKF, 1-Dimensional example
  C.1 Forecast and analysis covariance matrices
  C.2 Lyapunov exponents for free and forced systems
D Inverse and pseudo-inverse matrices
  D.1 Inverse matrix
  D.2 Pseudo-inverse matrix

Bibliography
List of Figures

1.1 A 6-hour data assimilation cycle for weather forecasts. Analyses are computed every 6 hours, typically at 0000 ZT, 0600 ZT, 1200 ZT, 1800 ZT.
1.2 A set of Brownian processes starting at x(0) = 0. The random forcing has zero mean and standard deviation σ = 1.
1.3 Bifurcation diagram for the Logistic map, 0 ≤ r ≤ 4.
1.4 Bifurcation diagram for the Logistic map, 2.8 ≤ r ≤ 4. Notice the period-3 window around r = 3.83.
1.5 Bifurcation diagram for the Logistic map, a zoom. We can observe the self-similarity properties of this diagram.
1.6 Hénon map: strange attractor for the classical parameters a and b (see text).
1.7 Lorenz’s 1963 model: a comparison among the 1st-order Euler, 2nd- and 4th-order Runge-Kutta schemes for 1000 time steps ∆t = 0.01. The initial point belongs to the attractor and is the same for all schemes: (x0, y0, z0) = (14.2041, 15.0165, 34.7172).
1.8 Lorenz’s 1963 model: a comparison among the 1st-order Euler, 2nd- and 4th-order Runge-Kutta schemes for the variable x. The initial point belongs to the attractor and is the same for all schemes: (x0, y0, z0) = (14.2041, 15.0165, 34.7172).
1.9 Lorenz’s 1963 model attractor.
1.10 Lorenz’s 1963 model attractor projected on the xz plane.
1.11 Lorenz’s 1963 model attractor projected on the xy plane.
1.12 Lorenz’s 1963 model attractor projected on the yz plane.
1.13 Lorenz’s 1963 model: time dependence of x.
1.14 Lorenz’s 1963 model: time dependence of y.
1.15 Lorenz’s 1963 model: time dependence of z.
1.16 Lorenz’s 1963 map: values of the relative maximum zmax(n) and successive relative maximum zmax(n+1). See Fig. 1.15.
1.17 A simple case: 3 grid points (e, f, g) and 2 observations (1, 2).
2.1 How the Kalman Filter works.
3.1 Synchronization with truth for both the EKF/EKF-Evensen and AUS-iterating assimilation schemes. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.25.
3.2 EKF and EKF-Evensen (same plot in these conditions, see text and eq. 3.40) find it hard to synchronize with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
3.3 AUS-iterating: synchronization with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
3.4 EKF/EKF-Evensen and AUS-iterating RMS errors. While the former fail to converge to the truth, the latter synchronizes very quickly. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
3.5 EKF/EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.4.
3.6 EKF: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6; in this case and in these conditions the EKF fails to synchronize with the truth. Note also the poor performance around time 14.
3.7 EKF-Evensen: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6. A better performance than pure EKF.
3.8 AUS-iterating: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6. A far better performance.
3.9 EKF, EKF-Evensen and AUS-iterating RMS errors: respective performances. The only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6.
3.10 EKF, EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.9.
3.11 EKF does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
3.12 EKF-Evensen does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
3.13 AUS-iterating does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
3.14 The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25: in these conditions no assimilation scheme under investigation actually converges to the truth, but AUS-iterating performs better than EKF-Evensen, which in turn is far better than pure EKF.
3.15 A zoom of the previous Fig. 3.14. Even if the global performance of EKF is poor because of filter divergence, this does not mean that EKF is always worse than the other DA schemes at all times; the global performance of AUS-iterating, however, remains better.
3.16 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
3.17 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 2 noisy observations with variance σ² = 2.
3.18 RMS Forecast Error distribution at time T+0.25: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
3.19 RMS Forecast Error distribution at time T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
3.20 RMS Forecast Error distribution at time T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
3.21 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
3.22 RMS Forecast Error distribution at time T+0.25: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
3.23 RMS Forecast Error distribution at time T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
3.24 RMS Forecast Error distribution at time T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
3.25 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
3.26 RMS Forecast Error distribution at time T+0.25: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
3.27 RMS Forecast Error distribution at time T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
3.28 RMS Forecast Error distribution at time T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
3.29 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 2 ⇒ σ = 1.414.
3.30 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance 1.
3.31 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 0.1 ⇒ σ = 0.316.
3.32 EKF assimilation scheme: solution for x. The green line shows the time of the last available observation.
3.33 EKF-Evensen assimilation scheme: solution for x.
3.34 EKF-Yang assimilation scheme: solution for x.
3.35 AUS-γ0 assimilation scheme: solution for x.
3.36 AUS-γ assimilation scheme: solution for x.
3.37 AUS-iterating assimilation scheme: solution for x.
3.38 AUS-iterating+ assimilation scheme: solution for x.
3.39 How AUS-γ assimilates observations.
3.40 A qualitative comparison between EKF and AUS-γ.
3.41 The zone of the attractor considered in the next Figure 3.42.
3.42 How AUS-iterating works: a zoom of the previous Fig. 3.41.
3.43 More details on the AUS-iterating assimilation scheme, including the evolving unstable vectors in the final forecast trajectory, recomputed after iteration.
List of Tables

2.1 Breeding on the Data Assimilation System: introducing the perturbations and estimating the unstable subspace. This is a specific example in which the breeding time is ∆t = 2τ, where τ = tk+1 − tk is the assimilation window.
3.1 The different data assimilation schemes under investigation with their main features.
3.2 RMS analysis error, an average over 100,000 assimilations. 3 and 2 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.
3.3 RMS forecast error, an average over 100,000 assimilations. The mean RMS analysis error is the same as in Table 3.2 and is shown here again for comparison. 3 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.
3.4 RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 1. Assimilation window τ = 0.25.
3.5 RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 0.1 ⇒ σ = 0.316. Assimilation window τ = 0.25.
3.6 Random error in the assimilation model: 3 observations with variance σ² = 2 ⇒ σ = 1.414 in each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations, for the different DA schemes.
3.7 Random error in the assimilation model: 2 observations with variance σ² = 2 ⇒ σ = 1.414 in each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations.
3.8 Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Three observations with variance σ² = 2 in each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations.
3.9 Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Two observations with variance σ² = 2 in each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations.
Abbreviations

3D-Var    3-dimensional variational analysis
4D-Var    4-dimensional variational analysis
AUS       Assimilation in the Unstable Subspace
BDAS      Breeding on the Data Assimilation System
BV        Bred Vector
CFL       Courant-Friedrichs-Lewy
DA        Data Assimilation
ECMWF     European Centre for Medium-Range Weather Forecasts
EKF       Extended Kalman Filter
EnKF      Ensemble Kalman Filter
ENIAC     Electronic Numerical Integrator And Computer
FPE       Fokker-Planck Equation
GCM       General Circulation Model
KF        Kalman Filter
L63       Lorenz’s 1963 three-variable convective system
LAM       Local Area Model
LE        Liouville Equation
LLV       Local Lyapunov Vector
LT        Liouville Theorem
LV        Lyapunov Vector
NWP       Numerical Weather Prediction
OI        Optimal Interpolation
PDF       Probability Density Function
QG        Quasi-Geostrophic
RK        Runge-Kutta
RK2       2nd-order Runge-Kutta scheme
RK4       4th-order Runge-Kutta scheme
RMS       Root Mean Square
SCM       Successive Correction Method
TLM       Tangent Linear Model
UTC       Universal Time, Coordinated. Also referred to as ZT
ZT        Zulu Time. Also referred to as UTC
Partial List of Symbols

Symbols are listed in alphabetical order, first the Latin alphabet, then the Greek one. Uppercase symbols precede lowercase ones.

E        matrix whose columns are the unstable vectors
H        observation operator
H        linearized observation operator
I        identity matrix
K        gain matrix
M        nonlinear model operator
M        linear model operator, tangent linear model operator
Pa       analysis error covariance matrix
Pf       forecast error covariance matrix
Q        model error covariance matrix
R        observational error covariance matrix
S^Mod    model space
S^ob     observational space
e        leading local Lyapunov vector, single unstable column vector
f        generic function, Coriolis parameter
xa       analysis state, column vector
xb       background, or “first guess”, state, column vector
xf       forecast state, column vector
yb       background, or “first guess”, values at observation points
yo       observation vector (includes noise), column vector
yt       true values of the observed variables, column vector
Γ        diagonal matrix whose elements are the estimates of the amplitudes of the errors along the unstable vectors
Λ        diagonal matrix whose elements are the amplification factors of the unstable vectors
Ω        Earth’s angular rotation vector
δxa      analysis increment xa − xb
δyo      innovation yo − H(xb)
γ        estimate of the error amplitude
ηM       model error
ηa       analysis error xa − xt
ηf       forecast error xf − xt
ηo       observational error yo − H(xt)
λi       i-th Lyapunov exponent
τ        assimilation window
Chapter 1
Introduction
The atmosphere plays a central role in the weather and climate system, together with the hydrosphere (oceans), the cryosphere (ice and snow), the lithosphere (soil) and the biosphere (living systems). All these components interact with each other [8]. The atmosphere is also well known to be a chaotic system with an enormous number of degrees of freedom and many scales of motion, both in space and time: its predictability has a finite time horizon [31]. In this context, to accurately estimate the state of the system we need to extract as much information as possible from observational data, when available, and from the equations governing the evolution of the system, to the extent that they are known. This is the goal of Data Assimilation, a fundamental step in Numerical Weather Prediction (NWP). To develop new theoretical work, we address the problem of data assimilation in chaotic systems in the framework of Lorenz’s three-variable convective model (1963). The aim is to develop data assimilation schemes that are applicable to different contexts and computationally affordable in operational environments.
1.1 A historical perspective
It was a Norwegian meteorologist, Vilhelm Bjerknes, who first realized in 1904 that weather forecasting is actually an initial value problem, to be solved by integrating the set of governing equations of the atmosphere, starting from an accurate estimate of the initial conditions obtained from observations. This was the first explicit recognition that the future state of the atmosphere can be calculated completely and deterministically from its initial state and known boundary conditions, together with seven equations: Newton’s equations of motion (three equations for the three velocity components), the continuity equation (conservation of mass), the equation of state for ideal gases, the first law of thermodynamics (conservation of energy) and a conservation equation for water mass [16]. Bjerknes was able to persuade the Norwegian authorities to build a network of surface observation stations, founded the renowned Bergen School of synoptic and dynamic meteorology and proposed the famous polar front theory of cyclogenesis.
Lewis F. Richardson suggested in 1922 that the equations of motion of the atmosphere could be integrated numerically, and described exactly how this could be done. His first attempt at a weather forecast was actually unsuccessful, but this did not diminish the value of his seminal work. The growing development of observational networks on the one hand, and the introduction of reliable computing machines on the other, boosted new interest in Richardson’s approach.
An important issue to address was the balancing of initial conditions: if they are not in quasi-geostrophic balance¹, inertia-gravity waves arise and propagate horizontally. After a while, their amplitude decreases drastically, leaving a field in quasi-geostrophic balance: this process is called geostrophic adjustment. The time scale for this process to take place is of the order² of f⁻¹, approximately 12 h [16]. Moreover, in Richardson’s first attempt, the integration of the equations resulted in computational instability, due to a violation of the Courant-Friedrichs-Lewy (CFL) condition, which requires that the time step be smaller than the grid size divided by the speed of the fastest waves (sound waves, moving at about 340 m/s).
To address these problems, in 1948-49 Jule G. Charney and Eliassen introduced “filtered” equations of motion, based on quasi-geostrophic balance — i.e. slowly varying, thus filtering out gravity and sound waves — and based on pressure fields alone. In 1950 Charney, R. Fjørtoft and J. von Neumann performed on the ENIAC, one of the first computers ever built³, the first historical 24-hour weather forecast, using a barotropic one-layer filtered model: the results were encouraging. The initial values, though, were still set by subjective analysis, relying on the judgment of experienced analysts. In 1949 Panofsky made the first attempt to overcome subjective analysis with an automatic procedure, called objective analysis, although the degree of “objectivity” actually depends on the algorithms used. This procedure used a polynomial expansion to fit all the observations of several grid points in a given area.
The first operational numerical weather forecasts were issued in Sweden by Rossby and his group in September 1954. In the meanwhile, new advances were introduced by Gilchrist and Cressman (1954), who also used a polynomial expansion, but with a local rather than areal fit of observations: the idea of a radius of influence was born.
¹ The quasi-geostrophic equations hold for large-scale, low-frequency motions, except at low latitudes [1].
² The Coriolis parameter is f ≈ 2Ω sin φ0, where Ω is the Earth’s angular velocity and φ0 the latitude.
³ The ENIAC (Electronic Numerical Integrator And Computer) was built in 1946. It was indeed not the first computer ever built and operated. The first fully functional, freely programmable computers in the world were actually the German Konrad Zuse’s Z1, Z2 and Z3, electro-mechanical machines built between 1936 and 1941. The next generation used vacuum tubes: the Atanasoff-Berry Computer, built in 1939 at Iowa State University, USA, and the British Colossus, operating since December 1943 at Bletchley Park, Milton Keynes, UK. Both the Zuse and Atanasoff-Berry computers were conceptual milestones in Computer Science, because the former introduced the idea of programmable machines and the latter the use of the binary system and other innovations. Nevertheless, it was the Colossus that had a truly enormous impact on human history, because it greatly helped in defeating Nazism: it was used during World War II to decrypt the most important messages transmitted between the German Army field marshals and their Central High Command in Berlin.
At each grid point they used a polynomial function to approximate the fields, taking into account only the observations near the grid point, i.e. within the radius of influence. Two more elements, introduced by the same authors, were later adopted in subsequent works: an automatic check of data quality, and an a priori estimate of the analysis, obtained from a previous numerical forecast. This preliminary estimate is now referred to as the background field, first-guess field, or prior estimate.
During these pioneering efforts by meteorologists to run reliable NWP models, it quickly turned out that the accuracy of a model strongly depends on its spatial resolution. In general, the higher the resolution, the higher the accuracy of the model, but — of course — the higher the computational cost as well. This is because, due to computational stability requirements, doubling the resolution in the three spatial dimensions also requires doubling the time resolution, which implies a factor of 2⁴ in the total computational cost (three spatial dimensions and one time dimension). As a result, running atmospheric models has always been a challenging task for supercomputers, and the model size has always been driven, in turn, by the available computing capacity.
Further improvements came along: in 1955 Bergthorsson and Döös developed an analysis method that eventually became known as “successive correction”. They reduced the computational cost of the interpolation procedures by specifying an a priori weight for each observation, the weight being determined on a statistical basis.
Thompson in 1961 proposed to take full advantage of the propagation of information from well observed regions to data-void ones. A quite common situation in the global observing system is the presence of zones where new observations are regularly available (e.g. densely populated regions) and regions where new observational data are scarce (e.g. oceans and deserts). An objective analysis can be made for the former areas, and some sort of educated guess for the latter. Then the integration of an NWP model will provide a forecast valid at the next observation time. At that point we have new observations for the data-rich areas and model-output data for the data-poor ones: the information has propagated.
When computers became sufficiently fast, scientists turned to the primitive equations instead of the filtered ones, and introduced regional models (or Local Area Models, LAMs) alongside General Circulation Models (GCMs), with a GCM setting the boundary conditions for the regional one. A regional model can therefore be used for short-range forecasts only, as its high-quality initial conditions will be lost through the “information advection” of the ever-changing boundary conditions driven by the GCM. Furthermore, as a consequence of increasing computing power, meteorologists now tend not to use the hydrostatic approximation — in which the vertical acceleration is neglected with respect to gravity — in regional models.
1.2 Data Assimilation
In his pioneering paper cited in the section above [4], Charney stressed the need for an objective analysis of meteorological data, so as not to rely upon time-consuming human activities and subjective interpretations. This process is nowadays referred to as Data Assimilation (DA): its goal is to produce an optimal, automatic estimate of the state of a dynamical system from incomplete, noisy observations and from (approximate) knowledge of the laws governing the evolution of the system. In the initialization of forecast models of the Ocean and the Atmosphere, DA is performed cyclically. A specific term, the Data Assimilation Cycle, has been introduced to describe this cyclic procedure, which encompasses the following steps:
• Quality control of observational data
• Objective analysis
• Initialization of the forecast model
• Short-range forecast to be used to estimate the next background field
Quality control is a very delicate step, because it has been shown that the analysis can be highly sensitive to quality control decisions [8]. Generally speaking, the quality of a datum is checked against its neighbors, and a further requirement of spatial and temporal consistency must be fulfilled. Observations can also be checked against the background.
The objective analysis step exploits both the available observational data, yo, and the background field, xb, numerically computed at the previous observation time. Obtaining the background or first-guess “observations” is a matter of interpolating the model forecast to the observational stations: model variables are converted to observed variables. The first-guess “observations” are thus H(xb), where H is the operator performing the required interpolation and conversion from model variables to observation space. It should be noted that, in general, the operator H is not linear. The difference yo − H(xb) between the observations and the first-guess “observations” is usually called the observational increment, or innovation [16]. The analysis state xa is computed by adding the innovation to the model background field, with weights W to be determined by estimating the statistical error covariances of the forecast and the observations:

xa = xb + W [yo − H(xb)]     (1.1)

Many analysis techniques, such as the Successive Correction Method (SCM), Optimal Interpolation (OI), 3-dimensional variational analysis (3D-Var) and the Kalman Filter (KF), use eq. 1.1, but with different ways of calculating the weights W.
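As a purely illustrative aid (not part of the original text), the following Python/NumPy sketch applies eq. 1.1 to a three-variable toy state with two directly observed components; the background and observation error covariances B and R, and the Optimal-Interpolation-like weight matrix W = B Hᵀ(H B Hᵀ + R)⁻¹, are assumptions made only for this example.

```python
import numpy as np

def analysis_update(xb, yo, H, W):
    """Compute the analysis state xa = xb + W [yo - H(xb)] (eq. 1.1)."""
    innovation = yo - H(xb)          # observational increment
    return xb + W @ innovation       # weighted correction of the background

# Toy example: 3 model variables, 2 of them observed directly.
H_lin = np.array([[1.0, 0.0, 0.0],   # linear observation operator (assumed)
                  [0.0, 1.0, 0.0]])
H = lambda x: H_lin @ x

B = np.diag([1.0, 1.0, 1.0])         # background error covariance (assumed)
R = np.diag([0.5, 0.5])              # observation error covariance (assumed)
W = B @ H_lin.T @ np.linalg.inv(H_lin @ B @ H_lin.T + R)   # OI-like weights

xb = np.array([1.0, 2.0, 3.0])       # background state
yo = np.array([1.5, 1.8])            # noisy observations
print(analysis_update(xb, yo, H, W)) # analysis state xa
```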
Figure 1.1: A 6-hour data assimilation cycle for weather forecasts. Analyses are computed every 6 hours, typically at 0000 ZT, 0600 ZT, 1200 ZT, 1800 ZT.

After calculating the analysis state xa, the forecast model can be initialized to obtain the routine forecast xf: this is actually the aim of the entire procedure. The numerical short-range forecast used to estimate the background field for the next observation time is usually the output of a high-resolution model that implements the primitive equations. This model, called the assimilation model, has a complex set of parameterizations such that, if no new observations become available, the model climate — computed by time-averaging a long run of the model — will approximate the true climate. Short-range numerical forecasts, typically 6 h ahead, have replaced the simple use of climatology as the background field.
One approach to assimilating observations at various times is 4D-Var (e.g. Lewis and Derber [17], Courtier and Talagrand [7]). This assimilation system is the operational DA scheme used at ECMWF and Météo-France, among other important meteorological centers.
1.3 Framing the problem

This work focuses on data assimilation in chaotic systems. In order to introduce chaotic systems, in this section we briefly describe the general properties of conservative and nonconservative systems; we then survey the basics of dissipative and chaotic systems, attractors, Lyapunov vectors and exponents, and bred vectors. We also provide classic examples of chaotic dynamical systems, described both by differential equations and by maps.
1.3.1 Basic features of chaotic systems, a historical note
The fundamental property of a chaotic system is the sensitive dependence on initial conditions, discovered by Jules-Henri Poincaré in 1897 for a simplified three-body problem: a planetary system including two stars and a small “asteroid” [16]. Later, in a remarkable 1908 monograph — Science et Méthode — he wrote down these basic concepts about chaos (Livre premier, § IV):
“A very small cause which escapes our notice determines a considerable effect that
we cannot fail to see, and then we say that the effect is due to chance. If we knew
exactly the laws of nature and the situation of the universe at the initial moment,
we could predict exactly the situation of the same universe at a succeeding moment.
But even if it were the case that the natural laws had no longer any secret for us, we
could still know the situation only approximately. If that enables us to predict the
succeeding situation with the same approximation, that is all we require, and
we will say that the phenomenon has been predicted, that it is governed by the laws.
But it is not always so; it may happen that small differences in the initial conditions
produce very great ones in the final phenomena. A small error in the former will
produce an enormous error in the latter. Prediction becomes impossible and we have
the fortuitous phenomenon.” – Jules-Henri Poincaré, 1908
Although this note was written in 1908, it has never become dated. Thanks to the increasing computational capability of computers, the meteorologist Edward N. Lorenz, Professor Emeritus at the Massachusetts Institute of Technology, “rediscovered” chaos in the early 1960s while examining a relatively simple mathematical model of the weather. That is the same model we will survey in section 1.4 below.
1.3.2 Conservative, nonconservative dynamical systems
Generally speaking, a dynamical system changes its state depending on time. A dynamical system that evolves continuously in time is known as a flow [19]. It can be described by a set of differential equations giving the evolution of the state of the system, given its previous states:

ẋ = F(x)     (1.2)

where x(t) = (x1(t), x2(t), . . . , xN(t)) is the N-dimensional vector — depending on time t — describing the state of the system at time t: the vector x(t) can be unambiguously mapped, through a bijection, to a point in the phase space of the system. The term ẋ = dx/dt is the time derivative of x(t), and F(x) = (F1(x), F2(x), . . . , FN(x)) is an N-dimensional vector function of the state x of the system. Given the initial state x(0), we can deterministically calculate the trajectory, or orbit, x(t) of the system for all future times. Here the time variable t is continuous.
In a system where particles move without friction, called a Hamiltonian system, Liouville’s theorem (see subsection 1.3.5) ensures that the volume of any subset of points in the phase space is conserved: if each point of the initial subset is evolved forward in time, the resulting set of points has the same volume as the initial one. The system is therefore also called conservative. In a nonconservative system, instead, the time evolution does not preserve volumes in phase space.
1.3.3 Probability Density Functions and Liouville equation
In a dynamical system of the form dx/dt = F(x, t), there may be uncertainties due to a poor estimate of the initial conditions, or even to imperfect knowledge of the model used: a statistical approach is standard practice. The probability density function (PDF) associated with the continuous variable x(t) can be thought of as describing the set of possible realizations, in phase space, of the outcome of the model:

ρ = ρ(x(t), t)     (1.3)

If the PDF is Gaussian — a usual assumption in data assimilation — it is completely defined by the mean of the state x(t) and by the second moment about the mean, i.e. its variance:

⟨x⟩ ≡ ∫···∫_{−∞}^{+∞} x ρ(x(t), t) dx     (1.4)

⟨(x − ⟨x⟩)²⟩ ≡ ∫···∫_{−∞}^{+∞} (x − ⟨x⟩)² ρ(x(t), t) dx     (1.5)

The Liouville equation (LE) is the probabilistic description of the time-dependent evolution of an ensemble of solutions of the numerical model dx/dt = F(x, t) started from different initial conditions [9]. It governs the time evolution of the PDF ρ(x(t), t) associated with the model state x(t). It may be written in a simplified form as follows:

∂ρ/∂t + ∇·(ρF) = 0     (1.6)

or in the more complete form:

∂ρ(x(t), t)/∂t + Σ_{i=1..N} ∂[ρ(x(t), t) Fⁱ(x(t), t)]/∂xⁱ = 0     (1.7)

where Fⁱ is the i-th component of the vector function F. The Liouville equation is a conservation equation: it states that the local change of ρ — at a particular point of phase space — must equal the net flux of realizations across the faces of an infinitesimal volume around the point under examination; equivalently, the Liouville equation states that the phase-space integral of the realization density is constant with respect to time. It is an inhomogeneous partial differential equation, linear in the PDF ρ, which is the single dependent variable.
1.3.4 The Markov processes and the Fokker-Planck equation
If our dynamical system is forced by some sort of stochastic noise, which could arise from the misrepresentation of the system by our model, the general equation reads:

dx(t)/dt = F(x(t), q(t), t)     (1.8)

where q(t) is the vector of random disturbances. If q(t) is a Markov process, its probability law in the future depends only on the given state, not on how the system reached that state. An example of a Markov process is Brownian motion:

x(tn) = x(tn−1) + w(tn)     (1.9)

where w is a white Gaussian forcing. A few examples of Brownian motion, with standard deviation of the Gaussian forcing σ = 1, are shown in Fig. 1.2.
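A minimal sketch (an illustration added here, with the number of paths and steps chosen arbitrarily) of eq. 1.9, generating Brownian paths of the kind shown in Fig. 1.2:

```python
import numpy as np

rng = np.random.default_rng(42)

def brownian_paths(n_paths=5, n_steps=1000, sigma=1.0):
    """Generate Brownian processes x(t_n) = x(t_{n-1}) + w(t_n), with x(0) = 0."""
    w = sigma * rng.standard_normal((n_paths, n_steps))   # white Gaussian forcing
    return np.concatenate([np.zeros((n_paths, 1)), np.cumsum(w, axis=1)], axis=1)

paths = brownian_paths()
print(paths.shape)   # (5, 1001): five realizations, 1001 time points each
```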
Furthermore, if in eq. 1.8 q(t) represents an additive white Gaussian forcing function, we can write [15]:

dx(t)/dt = F(x(t), t) + dq(t)/dt     (1.10)

because white Gaussian noise can be thought of as the derivative of Brownian motion [15]. Equation 1.10 is sometimes called the Langevin equation. It can be shown that if q(t) is a Markov process, then so is x(t). We can now write the solution of 1.10 in the form

x(t) = M[x(t0)] + q(t)     (1.11)
Figure 1.2: A set of Brownian processes starting at x(0) = 0. The random forcing has zero mean and standard deviation σ = 1.

The Fokker-Planck equation (FPE) describes the evolution of the PDF associated with these stochastic systems: it includes a random term — such as, for example, a model error — in the Liouville equation 1.7. This term has the form of a diffusion component:

∂ρ(x(t), t)/∂t = − Σ_{i=1..N} ∂[ρ(x(t), t) Fⁱ(x(t))]/∂xⁱ + (1/2) Σ_{i,j=1..N} ∂²[ρ(x(t), t) (Q)ᵢⱼ]/(∂xⁱ ∂xʲ)     (1.12)

where the first sum refers to the drift and the second to the diffusion: the Fokker-Planck equation describes the time evolution of a PDF due to both of them. The matrix Q is the stochastic noise covariance matrix. It should be noted that the FPE is linear in ρ, which is the only dependent variable.
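To illustrate the drift-plus-diffusion behaviour that the FPE describes, the following sketch integrates the Langevin equation (eq. 1.10) for an ensemble of realizations with the simple Euler-Maruyama method; the drift F(x) = −x, the noise amplitude and all numerical values are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_ensemble(n_members=10000, n_steps=2000, dt=0.01, q_std=0.5):
    """Euler-Maruyama integration of dx/dt = F(x) + noise, with F(x) = -x (assumed)."""
    x = np.full(n_members, 2.0)                      # all members start at x = 2
    for _ in range(n_steps):
        drift = -x                                   # deterministic part F(x)
        noise = q_std * np.sqrt(dt) * rng.standard_normal(n_members)
        x = x + drift * dt + noise                   # stochastic update
    return x

members = langevin_ensemble()
# The ensemble mean and variance summarize the evolved PDF (drift plus diffusion).
print(members.mean(), members.var())
```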
1.3.5 The Liouville Theorem
As we already mentioned in subsection 1.3.2, the Liouville Theorem (LT) states that a conservative system ẋ = F(x) conserves in time the volume of any subset of points in the phase space. In nonconservative systems, instead, this does not occur: in particular, as we will prove below, an initial volume V0 = V(0) of a given phase-space region D0 changes according to [26, 2]:

dV/dt |_{t=0} = ∫_{D0} ∇·F(x) dx     (1.13)

If ∇·F(x) does not depend on the vector x, eq. 1.13 simplifies:

dV/dt |_{t=0} = ∇·F ∫_{D0} dx     (1.14)

dV0/dt = V0 ∇·F     (1.15)

Rearranging and integrating from time t0 = 0 to t:

∫_{V0}^{V} dV0/V0 = ∇·F ∫_{t0}^{t} dt     (1.16)

ln(V/V0) = t ∇·F     (1.17)

So finally, in this particular case:

V(t) = V0 e^{t ∇·F}     (1.18)
Proof of eq. 1.13. In the phase space of a dynamical system described by equation 1.2 we can define the phase flux gᵗ:

gᵗ : x(0) → x(t)     (1.19)

and, by definition of the Jacobian ∂gᵗx/∂x, ∀t we have:

V(t) = ∫_{D0} det(∂gᵗx/∂x) dx     (1.20)

For t → 0:

gᵗ(x) = x + F(x) t + O(t²)     (1.21)

Thus

∂gᵗx/∂x = I + (∂F/∂x) t + O(t²)     (1.22)

But for any N × N square matrix A = (aij) and for t → 0 the following relation holds:

det(I + At) = 1 + t tr(A) + O(t²)     (1.23)

where tr(A) = Σ_{i=1..N} aᵢᵢ is the trace of the matrix A. So we have:

det(∂gᵗx/∂x) = 1 + t tr(∂F/∂x) + O(t²)     (1.24)

Now we notice that, of course:

tr(∂F/∂x) = Σ_{i=1..N} ∂Fᵢ/∂xᵢ = ∇·F     (1.25)

so eq. 1.20 becomes:

V(t) = ∫_{D0} [1 + t ∇·F + O(t²)] dx     (1.26)

which proves equation 1.13. If the divergence ∇·F = 0, the phase flux gᵗ preserves volumes: ∀t, V(t) = V(0), and the Liouville theorem is proved. Because of this independence of the volume on time, the system is called conservative.
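A quick numerical check of eq. 1.18 (illustrative only, under the assumption of a linear flow ẋ = Ax, for which ∇·F = tr(A) is constant): the edge vectors of an initial unit cube are evolved with the flow and the resulting volume is compared with V0 e^{t ∇·F}.

```python
import numpy as np

A = np.array([[-1.0, 2.0, 0.0],
              [ 0.0,-0.5, 1.0],
              [ 0.0, 0.0,-1.5]])   # linear flow dx/dt = A x, with tr(A) = -3 (assumed)

dt, n_steps = 1e-3, 2000
E = np.eye(3)                      # edge vectors of an initial unit cube, so V0 = 1
for _ in range(n_steps):
    E = E + dt * (A @ E)           # each edge vector is advected by the flow

t = dt * n_steps
print(abs(np.linalg.det(E)))       # numerical phase-space volume V(t)
print(np.exp(t * np.trace(A)))     # analytical prediction V0 * exp(t * div F)
```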
1.3.6 Dissipative systems, attractors and strange attractors
Systems which exhibit volume contraction in phase space are called dissipative, because friction, viscosity or other energy-dissipating processes are commonly involved. The volume contraction also indicates the existence of a bounded attracting set of points, the attractor, toward which all trajectories converge after an appropriate transient time: if we consider initial conditions in an adequate region of phase space, for increasing time t they will eventually converge to the attractor.
More formally, an attractor is a closed set A satisfying the following properties [31]:
• A is an invariant set: any trajectory x(t) starting in A will remain in A for all time:
∀x(t), x(0) ∈ A ⇒ x(t) ∈ A, ∀t
• A attracts an open set of initial conditions: this means that A attracts all trajectories
starting sufficiently close to it: ∃B, with B an open set, so that x(0) ∈ B ⇒ d(x(t), A) → 0
as t → ∞, where d(x(t), A) is the distance from x(t) to A. The largest B is called the
basin of attraction of A
• A is minimal: no proper subset of A will satisfy the above properties
In many cases, as for example in Lorenz’s 1963 convective system (see below), we have a strange attractor: the “strangeness” refers to the sensitive dependence on initial conditions of the nonperiodic flow, although initially strange attractors were named in this way because of their common fractal dimensionality [31]. A solution which is stable in the sense of Lyapunov is one such that any other solution sufficiently close to it will remain close for increasing time. Thus “sensitive dependence on initial conditions” actually means “unstable in the sense of Lyapunov”. It can be shown that a solution possessing Lyapunov stability must be periodic or quasi-periodic [19].
Lorenz’s 1963 system clearly shows a lack of periodicity, as we can see for example in Fig. 1.13 below, which in turn implies a limited predictability of the system because of the sensitive dependence on initial conditions [19]. The general behavior is chaotic, even if we can always find unstable periodic orbits arbitrarily close to the aperiodic ones [32].
1.3.7 Small perturbations dynamics, tangent linear model, adjoint model
Lyapunov instability is a matter of small perturbation growth. A small perturbation δx(t) of a trajectory x(t) is assumed to evolve in a linear way. That is:

δx(tk+1) = Mk δx(tk)     (1.27)

which is the Tangent Linear Model (TLM). Here the operator Mk, which depends on time tk, is an operator linearized around the base trajectory. It is called the resolvent, or propagator, of the TLM. Since Mk is an operator defined on real numbers, its adjoint is simply its transpose, the operator Mkᵀ.
In order to justify eq. 1.27, consider a nonlinear discrete model that can be written as a set of N nonlinear coupled ordinary differential equations

dx/dt = F(x)     (1.28)

where x is an N-dimensional vector and F an N-dimensional vector function. The model is written in differential form. When a time-difference scheme is chosen, eq. 1.28 becomes a set of difference equations. If, for example, a Crank-Nicolson approach is implemented, this set of equations takes the form [16]:

x(tk+1) = x(tk) + ∆t · F[(x(tk) + x(tk+1))/2]     (1.29)

So we can integrate eq. 1.28 by running the model between an initial time t0 and a final time t, recursively using eq. 1.29: the solution x(t) will depend only on the initial condition at time t0:

x(t) = M[x(t0)]     (1.30)

Here the operator M — which in general is nonlinear — represents the time integration between t0 and t.
If we add a small perturbation δx(t0) to the reference model integration x(t0) we can write:

M[x(t0) + δx(t0)] = M[x(t0)] + (∂M/∂x) δx(t0) + O(δx(t0)²)     (1.31)
                  = x(t) + δx(t) + O(δx(t0)²)     (1.32)

where we have used 1.30 and the small perturbation dynamics:

δx(t) = (∂M/∂x) δx(t0)     (1.33)
      = M δx(t0)     (1.34)

which is the same as eq. 1.27. Here M = ∂M/∂x is the N × N matrix called the resolvent, or propagator, of the tangent linear model; it propagates an initial small perturbation at time t0 into a perturbation at time t. Since it is linearized from t0 to t, M depends on the reference trajectory x(t) but not on the perturbation δx(t0). The linearized evolution of δx(t0) is given by

dδx(t)/dt = (∂F[x(t)]/∂x) δx(t),   ∀t ∈ [t0, t]     (1.35)

where ∂F[x(t)]/∂x is the Jacobian of F. This system (eq. 1.35) defines the tangent linear model in differential form [16].
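As a sanity check of the tangent linear model (a sketch added for illustration, not taken from the thesis software), the code below integrates a one-dimensional nonlinear model and eq. 1.35 with simple Euler steps, and compares the linearly evolved perturbation with the difference between a perturbed and an unperturbed nonlinear run; the logistic tendency F(x) = r x(1 − x) and all numerical values are assumptions.

```python
r, dt, n_steps = 2.0, 0.01, 500

def f(x):            # nonlinear tendency F(x) = r x (1 - x)
    return r * x * (1.0 - x)

def jac(x):          # Jacobian dF/dx = r (1 - 2x)
    return r * (1.0 - 2.0 * x)

x, dx_tlm = 0.1, 1e-6          # reference state and infinitesimal perturbation
x_pert = x + dx_tlm            # perturbed nonlinear trajectory
for _ in range(n_steps):       # simple Euler steps for both model and TLM
    dx_tlm = dx_tlm + dt * jac(x) * dx_tlm   # eq. 1.35, linearized dynamics
    x      = x      + dt * f(x)              # reference nonlinear trajectory
    x_pert = x_pert + dt * f(x_pert)         # perturbed nonlinear trajectory

print(dx_tlm, x_pert - x)      # TLM perturbation vs. nonlinear difference
```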
1.3.8 Lyapunov vectors and Lyapunov exponents
In order to have a more precise, quantitative idea of the “sensitive dependence on initial conditions”, we summarize here the concepts of Lyapunov vectors and Lyapunov exponents.
Let us consider a trajectory on the attractor (after an appropriate transient time): here a state at time t is described by the vector x(t). Now we consider a very close point, x(t) + δx(t), where δx(t) is a separation vector whose initial length ‖δx(0)‖ is very small. We are interested in how δx(t) will grow. One finds [31] that close trajectories, starting on a sphere of infinitesimal radius, diverge exponentially fast:

‖δx(t)‖ ≃ ‖δx(0)‖ e^{λt}     (1.36)

where λ is called the global leading (or largest) Lyapunov exponent. It describes the long-term growth of the resulting hyper-ellipsoid, and can be estimated by [16]:

λ = lim_{t→+∞} (1/t) lim_{δx(0)→0} ln(‖δx(t)‖ / ‖δx(0)‖)     (1.37)
In practice, the leading Lyapunov exponent is computed as follows (a minimal sketch is given after this list):
• we perturb the trajectory x(t) with an infinitesimal random vector δx(t);
• we evolve it from time t to time t + ∆t using the Tangent Linear Model:

δx(t + ∆t) = M δx(t)     (1.38)

where we have dropped — for the sake of lighter notation — the obvious dependence of M on t and ∆t;
• we repeat the previous step for a long time, scaling down the perturbation vector at regular intervals to avoid computational overflow.
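A minimal sketch of the procedure above, assuming the Lorenz 1963 model with its standard parameters and a simple Euler integration of both the trajectory and the tangent linear model; all numerical settings are illustrative.

```python
import numpy as np

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0             # standard Lorenz 1963 parameters (assumed)

def f(s):                                        # nonlinear tendency F(x)
    x, y, z = s
    return np.array([SIGMA * (y - x), x * (R - z) - y, x * y - B * z])

def jac(s):                                      # Jacobian dF/dx, used by the TLM
    x, y, z = s
    return np.array([[-SIGMA, SIGMA, 0.0],
                     [R - z, -1.0, -x],
                     [y, x, -B]])

rng = np.random.default_rng(1)
dt, n_steps, rescale_every = 0.001, 200000, 100
s = np.array([1.0, 1.0, 20.0])
for _ in range(20000):                           # transient: let the trajectory settle on the attractor
    s = s + dt * f(s)

d = rng.standard_normal(3)
d /= np.linalg.norm(d)                           # infinitesimal (unit) initial perturbation
log_growth = 0.0
for k in range(1, n_steps + 1):
    d = d + dt * (jac(s) @ d)                    # tangent linear model step (eq. 1.38)
    s = s + dt * f(s)                            # reference trajectory step
    if k % rescale_every == 0:                   # rescale to avoid overflow, accumulate growth
        g = np.linalg.norm(d)
        log_growth += np.log(g)
        d /= g

print(log_growth / (n_steps * dt))               # estimate of the leading Lyapunov exponent
```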
Other Lyapunov exponents can be computed in the same way, except that we must periodically perform a Gram-Schmidt orthogonalization of the set of perturbations that defines the evolving hyper-ellipsoid: otherwise, they will all converge to the first Lyapunov exponent. It should be noted, indeed, that for an N-dimensional system there are N Lyapunov exponents: an initially infinitesimal N-dimensional hyper-sphere will be distorted, by the evolution of the system, into an infinitesimal hyper-ellipsoid. If we define δᵢx(t), i = 1, ..., N, as the lengths of the N principal axes of this hyper-ellipsoid, then the λᵢ are their growth rates, and equation 1.36 is replaced by:

‖δᵢx(t)‖ ≃ ‖δᵢx(0)‖ e^{λᵢ t}     (1.39)

For large t the stretching of the hyper-ellipsoid will be dominated by the most positive λᵢ. Since the Lyapunov exponents depend weakly on the trajectory, we must actually average over many different points to get an estimate of λᵢ:

λᵢ = lim_{t→+∞} (1/t) lim_{δᵢx(0)→0} ln(‖δᵢx(t)‖ / ‖δᵢx(0)‖)     (1.40)
The Lyapunov exponents defined so far are global properties of the flow: we are often interested in local dynamical properties. We therefore define the leading Local Lyapunov Vector (LLV) at time t: it is the vector e toward which all random perturbations δx(t − ∆T), started a long time ∆T before t, converge. It may be defined using the Tangent Linear Model:

e(t) = lim_{∆T→+∞} M(t − ∆T, t) δx(t − ∆T)     (1.41)

After computing the leading local Lyapunov vector, the corresponding local Lyapunov exponent may be computed from the change of its norm.
Another feature related to the Lyapunov exponents is the dimensionality of the attractor. The Kaplan-Yorke dimension of the system is defined by:

D ≡ k + (λ1 + λ2 + . . . + λk) / |λk+1|     (1.42)

where λ1 > λ2 > . . . > λN are the Lyapunov characteristic exponents in decreasing order and k is the integer for which λ1 + λ2 + . . . + λk > 0 and λ1 + λ2 + . . . + λk + λk+1 < 0.
An intuitive justification of eq. 1.42 is the following. The sum of all the exponents is the rate at which the volume of the hyper-ellipsoid increases or decreases: it is zero for conservative systems and negative for dissipative ones. If we consider an N-dimensional box containing the attractor, the sum λ1 + λ2 + . . . + λk of the first k Lyapunov exponents accounts for the rate at which the k-dimensional hyper-volume of the projection of an infinitesimal hyper-ellipsoid on a k-dimensional face of the box increases or decreases. If λ1 > 0 but λ1 + λ2 < 0, then the projection of the hyper-ellipsoid on one edge of the box will grow, while the projection on a 2-dimensional face will shrink: we may expect the attractor to consist of complex curves, without surfaces.
On the other hand, if λ1 + λ2 > 0 but λ1 + λ2 + λ3 < 0, then the projection of the hyper-ellipsoid on a 2-dimensional face of the box will grow, while the projection on a 3-dimensional face will shrink: we may expect the attractor to consist of complex surfaces, but no 3-dimensional manifolds. In general, if λ1 + λ2 + . . . + λk > 0 and λ1 + λ2 + . . . + λk + λk+1 < 0, the attractor may be thought of as consisting of complex k-dimensional manifolds [19].
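A minimal sketch of eq. 1.42 (the exponent values below are illustrative, of the order of those typically quoted for a three-variable chaotic system):

```python
def kaplan_yorke_dimension(exponents):
    """Compute D = k + (λ1 + ... + λk)/|λ(k+1)| from exponents given in any order."""
    lams = sorted(exponents, reverse=True)
    partial = 0.0
    for k, lam in enumerate(lams):
        if partial + lam < 0.0:       # first index where the cumulative sum turns negative
            return k + partial / abs(lam)
        partial += lam
    return float(len(lams))           # volume never contracts: dimension equals N

# Example with illustrative exponents (λ1 > 0, λ2 = 0, λ3 < 0, and λ1 + λ2 + λ3 < 0)
print(kaplan_yorke_dimension([0.9, 0.0, -14.6]))   # ≈ 2.06
```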
A stable system will have all Lyapunov exponents less than or equal to zero, while a chaotic one, whether dissipative or not, will have at least one positive Lyapunov exponent. Furthermore, a chaotic bounded flow must have a zero Lyapunov exponent, with the corresponding local Lyapunov vector aligned with the trajectory.
1.3.9 Bred vectors
Bred vectors (BVs) represent finite amplitude perturbations. Lyapunov vectors (LVs), instead,
represent infinitesimal perturbations by definition.
The BVs are computed in a similar manner as the LVs, but using the nonlinear model and
a finite renormalization amplitude [16]:
• we perturb the vector trajectory x(t) with a given finite amplitude random vector δx(t);
this random perturbation is introduced only once, at the beginning of the breeding cycle. The size of the initial perturbation is the only tunable parameter of the breeding
procedure, and in operational NWP models it can be used to filter out unwanted fast
instabilities, such as convection or even Brownian motion;
• we evolve the resulting perturbed trajectory by using the nonlinear model M ; the same
will be done for the unperturbed trajectory. At fixed time intervals ∆t we subtract the
unperturbed trajectory from the perturbed one:
δx(t + ∆t) = M (x(t) + δx(t)) − M (x(t))
(1.43)
where we dropped again the obvious dependence of the nonlinear model M on t and ∆t.
• we scale down the resulting difference by dividing it by its amplification factor, in order
to keep its size the same as the initial one.
As we can see from their definition, BVs are closely related to leading Local Lyapunov Vectors
(LLVs), since after an infinite breeding time, infinitesimal amplitude bred vectors are identical
to LLVs. They share some features, such as the independence on the norm and on the rescaling
time, but not others: for example, even without orthogonalization, BVs don’t converge to a
single leading BV, because of nonlinearity.
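To fix ideas, the following is a minimal numerical sketch of a breeding cycle; it is not the software used in this work, and the two-dimensional quadratic map of eq. 1.47 below is used only as a stand-in for a generic nonlinear model:

import numpy as np

def model(x):
    # stand-in nonlinear model: one iteration of the quadratic (Henon) map of eq. 1.47
    return np.array([x[1] + 1.0 - 1.4 * x[0]**2, 0.3 * x[0]])

def breed(x0, n_cycles=1000, amp=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    dx = amp * rng.standard_normal(x.size)   # random perturbation, introduced only once
    for _ in range(n_cycles):
        xp = model(x + dx)                   # perturbed trajectory
        x = model(x)                         # control (unperturbed) trajectory
        dx = xp - x                          # difference after one rescaling interval
        dx *= amp / np.linalg.norm(dx)       # scale back to the initial amplitude
    return dx / np.linalg.norm(dx)           # bred vector (unit norm)

print(breed([0.0, 0.0]))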
1.3.10
Dynamical systems described by maps
Also very important are those dynamical systems where the time is a discrete variable. In such
a case the dynamical system is described by an N -dimensional map
xn+1 = G(xn )
(1.44)
where n is the discrete time variable, xn is the N -dimensional state vector of the system and
G is the N -dimensional vector function of the state vector xn ; i.e. G evolves the state vector
xn at time n, into the new state vector xn+1 at time n + 1. It should be noted that a map can
be created by any flow, described by eq. 1.2, simply by observing the flow only at regular time
intervals.
As an example of a one-dimensional nonlinear map, consider the logistic map:
xn+1 = r xn (1 − xn )
(1.45)
which is a simple population dynamics model akin to the classic logistic equation:
dx/dt = r x(1 − x)   (1.46)
Figure 1.3: Bifurcation diagram for the Logistic map, 0 ≤ r ≤ 4.
that describes the logistic population growth [22]. Here r is the intrinsic per capita growth
rate, and x is the population density with respect to the total carrying capacity, i.e. 0 ≤ x ≤ 1.
The logistic map is chaotic for the parameter value r = 4, but it actually displays many different
behaviors depending on the value of r: we can have stable, periodic or chaotic
solutions. For example, as we can see in the bifurcation diagrams shown in Figures 1.3, 1.4 and
1.5, we have a period-1 solution for 0 < r < 3, while at r = 3 we note the first bifurcation
from period 1 to period 2. We observe another bifurcation, from period 2 to period 4, at
r = 1 + √6 ≈ 3.45. The onset of chaos occurs at r ≈ 3.57, but there are also periodic
windows beyond it, as for example the period-3 window at r = 1 + 2√2 ≈ 3.83. The values of the
parameter r for which the behavior is periodic form an infinite number of finite intervals,
while the values for which it is chaotic, with 3.57 < r ≤ 4, form a Cantor set [19].
The bifurcation diagrams for the logistic map shown in Figures 1.3, 1.4 and 1.5 — with
different zoom windows — have been obtained by plotting, as a function of the parameter r, a
set of values of xn resulting from the evolution of a random initial value x0: we iterated the map many times
and discarded the first points, corresponding to the transient time before convergence to
the attractor.
The logistic map is not invertible, because every point of its image — except the maximum — has
two preimages rather than a unique past. For r = 4 its Lyapunov exponent is λ = ln 2 ≈ 0.693147.
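This value can be checked numerically with a few lines of code (a simple illustration, not the thesis software): iterate the map, discard the transient — exactly as done for the bifurcation diagrams — and average ln|f′(xn)| = ln|r(1 − 2xn)| along the orbit.

import numpy as np

def logistic_lyapunov(r, n_transient=1000, n_iter=200000, x0=0.1):
    x = x0
    for _ in range(n_transient):        # discard the transient
        x = r * x * (1.0 - x)
    s = 0.0
    for _ in range(n_iter):             # average the log of the local stretching factor
        x = r * x * (1.0 - x)
        s += np.log(abs(r * (1.0 - 2.0 * x)))
    return s / n_iter

print(logistic_lyapunov(4.0))   # close to ln 2 = 0.6931...
print(logistic_lyapunov(3.2))   # negative: the orbit is periodic (period 2)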
Figure 1.4: Bifurcation diagram for the Logistic map, 2.8 ≤ r ≤ 4. Notice the period-3 window
around r = 3.83.
Figure 1.5: Bifurcation diagram for the Logistic map, a zoom. We can observe the self-similarity
properties of this diagram.
Figure 1.6: Hénon map: strange attractor for classical parameters a and b (see text).
An example of a two-dimensional, dissipative map is the following Hénon map [12][31]:
xn+1 = yn + 1 − a xn²
yn+1 = b xn   (1.47)
This map, shown in Fig. 1.6, is also chaotic for the two canonical parameter values a = 1.4, b =
0.3. To plot its attractor, 100,000 points have been drawn after a transient of
10,000 time steps, starting from the point (x0 , y0 ) = (0, 0). The Jacobian J of a generic map
(xn+1 , yn+1 ) = (f (xn , yn ), g(xn , yn )) is:
J = [ ∂f/∂xn  ∂f/∂yn ; ∂g/∂xn  ∂g/∂yn ]   (1.48)
For the Hénon map we have |det(J)| < 1 for all xn :
|det(J)| = |det [ −2a xn  1 ; b  0 ]| = |−b| < 1
If −1 < b < 1 — as in the case of classic parameter b = 0.3 — this means that the Hénon map
is area contracting by a constant factor |b| for each iteration.
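A minimal sketch (again illustrative only, not the thesis code) of the computation of both Lyapunov exponents of the Hénon map, using the Jacobian of eq. 1.48 and the periodic Gram-Schmidt re-orthonormalization mentioned in subsection 1.3.8 (implemented here through a QR factorization):

import numpy as np

def henon_lyapunov(a=1.4, b=0.3, n=200000):
    x, y = 0.0, 0.0
    Q = np.eye(2)                             # two orthonormal tangent vectors
    s = np.zeros(2)
    for _ in range(n):
        J = np.array([[-2.0 * a * x, 1.0],
                      [b,            0.0]])   # Jacobian, eq. 1.48, at the current point
        x, y = y + 1.0 - a * x * x, b * x     # advance the state
        Q, R = np.linalg.qr(J @ Q)            # propagate and re-orthonormalize
        s += np.log(np.abs(np.diag(R)))       # accumulate the stretching factors
    return s / n

lam = henon_lyapunov()
print(lam, lam.sum(), np.log(0.3))   # the two exponents must sum to ln|det J| = ln|b|

With the canonical parameters this gives approximately λ1 ≈ 0.42 and λ2 ≈ −1.62, consistent with the constant area contraction by |b| per iteration noted above.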
1.4
Lorenz’s three dimensional chaotic system (1963)
The Lorenz’s three dimensional model (1963), hereafter referred to L63, was found by the
MIT meteorologist Edward N. Lorenz in early 60s by studying a simplified version of a set of
equations modelling a convective fluid motion driven by heating from below.
1.4.1
The equations
The flow occurs in a uniform depth layer of fluid, and the temperature difference between upper
and lower surfaces has a constant value ∆T . The system has a steady-state solution, where
no motion occurs and the temperature varies linearly with depth. If, depending on physical
conditions, the solution is unstable, a convective motion will arise [18]. If the motion is
assumed to be two-dimensional, with no variations along the axis of the convective rolls, the upper and lower boundaries are taken to be free
and an abrupt truncation is performed, the system simplifies to a set of only 3 equations.
Thus, today L63 is known as a dynamical system described by the following dimensionless
equations:
dx/dt = σ (y − x)
dy/dt = r x − y − x z
dz/dt = x y − b z   (1.49)
where σ = 10, r = 28, b = 8/3 are positive parameters and t represents the dimensionless time
[18]. In these equations the variables x, y, z depend on time alone. In our experiments, we
will set the integration step ∆t = 0.01, and the numerical integration method will be a second
order Runge-Kutta scheme (see Appendix A).
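For reference, a minimal sketch of such an integration step is given below; it is only an illustration under the assumption of a Heun-type second-order scheme, not the actual implementation of Appendix A.

import numpy as np

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0

def lorenz63(v):
    x, y, z = v
    return np.array([SIGMA * (y - x), R * x - y - x * z, x * y - B * z])

def rk2_step(v, dt=0.01):
    # Heun variant of the 2nd-order Runge-Kutta scheme (assumed here for illustration)
    k1 = lorenz63(v)
    k2 = lorenz63(v + dt * k1)
    return v + 0.5 * dt * (k1 + k2)

v = np.array([14.2041, 15.0165, 34.7172])   # initial point of Figures 1.7 and 1.8
for _ in range(1000):
    v = rk2_step(v)
print(v)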
A qualitative comparison among three popular numerical integration schemes, (1-st order)
Euler, 2-nd order Runge-Kutta and 4-th order Runge-Kutta, is shown in Figures 1.7 and 1.8
for a particular case. The initial condition is on the attractor set and is the same for all
schemes: (x0 , y0 , z0 ) = (14.2041, 15.0165, 34.7172). In both plots the numerical integration is
performed 1000 integration steps ahead, with a time step ∆t = 0.01. Since they are different
order integration schemes, of course they lead to slightly different evolutions of the system.
Due to the chaotic dynamics of the system, these differences tend to amplify. A generic initial
condition will not lie on the attractor of the system: it needs a transient time to converge
to it. So, in practice, in our plots (see Figures 1.9, 1.10, 1.11, 1.12) we set a transient time of
10,000 integration steps.
Figure 1.7: Lorenz’s 1963 model: a comparison among 1-st order Euler, 2-nd and 4-th order
Runge-Kutta schemes for 1000 time steps ∆t = 0.01. The initial point belongs to the attractor
and is the same for all schemes: (x0 , y0 , z0 ) = (14.2041, 15.0165, 34.7172).
Figure 1.8: Lorenz’s 1963 model: a comparison among 1-st order Euler, 2-nd and 4-th order
Runge-Kutta schemes for the variable x. The initial point belongs to the attractor and is the
same for all schemes: (x0 , y0 , z0 ) = (14.2041, 15.0165, 34.7172).
Figure 1.9: Lorenz’s 1963 model attractor.
In Fig. 1.16 we show the so-called Lorenz map, where the (n + 1)-th local maximum zmax (n + 1) is plotted versus
the previous one, zmax (n). See also Fig. 1.15. Note that the Lorenz map is not actually a well defined function, since there may be more than one output
zmax (n + 1) for a given input zmax (n) [31], depending on the other variables x and y. So it has a
thickness, which prevents us from using this map to deterministically forecast the state of the system.
On the other hand, it should also be noticed that sometimes there are dots in the map lying well
apart from the great bulk of the others. Figure 1.16 has been obtained with a 50 × 10⁶ time
step integration after a 10,000 time step transient, and it contains more than 600,000 dots.
1.4.2
The meaning of variables and parameters
In particular, the parameter σ is the Prandtl number, i.e. the ratio of viscosity to thermal conductivity of the fluid; its value is σ = 10, which is typical
for cold water and approximately twice that of warm water [30][18]. The parameter r is the ratio Ra /Rc between
the Rayleigh number Ra and its critical value Rc ; the Rayleigh number describes the kind of heat transfer within a fluid: below a critical value (characteristic of that fluid)
we have basically conduction, while above it convection sets in and the bulk of the heat transfer is due to it.
Figure 1.10: Lorenz’s 1963 model attractor projected on xz plane.
Figure 1.11: Lorenz’s 1963 model attractor projected on xy plane.
Figure 1.12: Lorenz’s 1963 model attractor projected on yz plane.
Figure 1.13: Lorenz’s 1963 model: time dependence of x.
Figure 1.14: Lorenz’s 1963 model: time dependence of y.
Figure 1.15: Lorenz’s 1963 model: time dependence of z.
Figure 1.16: Lorenz’s 1963 map: values of the relative maximum zmax (n) and successive relative
maximum zmax (n + 1). See Fig. 1.15.
The critical value of r for instability of steady convection to occur is r = 470/19 = 24.74 [18]. So, a value of r = 28 is slightly supercritical and
the flow exhibits unstable convection. The parameter b = 8/3 is proportional to the geometry
of the convective cell. For the meaning of the 3 variables x, y and z, let’s quote Lorenz himself
[18]:
“In these equations x is proportional to the intensity of the convective motion,
while y is proportional to the temperature difference between the ascending and
descending currents, similar signs of x and y denoting that warm fluid is rising
and cold fluid is descending. The variable z is proportional to the distortion of
the vertical temperature profile from linearity, a positive value indicating that the
strongest gradients occur near the boundaries.”
Incidentally, note that an initial condition x = y = z = 0 means that the convective system
exhibits no convection, and the trajectory reduces to a stationary point (a trivial "periodic" solution).
This system was derived by Lorenz as a drastically simplified model of convection rolls in
a fluid heated from below in a gravitational field, but the same equations can also be derived
for other physical systems: the equations also exactly describe, for example, the motion of a
specific water-wheel [31].
1.4.3
Why it is so important: chaos implies limited predictability
Despite its apparent simplicity, the Lorenz’s deterministic, nonlinear system exhibits a chaotic
dynamics: the solutions oscillate in an irregular way, never exactly repeating themselves and
bounded in a particular region of phase space [31, 26]. There are no analytic solutions of the
system, so the solutions have to be found by numerical integration only. Lorenz felt that the
really important finding was the fact that under fairly general conditions a lack of periodicity
implied limited predictability [19]. We deal with this problem even for deterministic flows such
as, for example, the NWP models. A fortiori the atmosphere itself has a finite horizon of
predictability, which Lorenz estimated to be approximately two weeks. That is due to deep,
dynamical reasons — the chaotic behavior of the system — and cannot be overcome by increased
power of computational devices.
1.4.4
Lyapunov exponents, dimensionality and doubling time
Lorenz’s system L63 is a nonlinear but autonomous system: in eq. 1.2 the function F does
not depend explicitly on time. L63 is also a dissipative, nonconservative system, i.e. the time
evolution does not preserve volumes in phase space. In general, a volume V (t) of a phase space
region D shrinks according to 1.13:
dV/dt |_{t=0} = ∫_D ∇ · F(x) dx
For L63 the divergence of the flow is:
∇ · F = −(σ + 1 + b)
(1.50)
and an initial volume V (0) in phase space will shrink according to eq. 1.18:
V(t) = V(0) exp[−t (σ + 1 + b)]   (1.51)
     = V(0) exp(−13.67 t)   (1.52)
The Lorenz’s system has three Lyapunov exponents, which drive its dynamical features:
λ1
= 0.9056
(1.53)
λ2
= 0
(1.54)
λ3
= −14.5723
(1.55)
Since λ1 + λ2 > 0 and λ1 + λ2 + λ3 < 0, the Kaplan-Yorke dimension of the attractor of the
system (eq. 1.42) is:
D = 2 + (λ1 + λ2)/|λ3|   (1.56)
  = 2.06215   (1.57)
The doubling time is the average time after which a small perturbation will double. It may
be calculated from eq. 1.36 coupled with the largest Lyapunov exponent λ1 in the following
way:
2 ‖δ0‖ = ‖δ0‖ exp(λ1 τdouble)   (1.58)
⟹ λ1 τdouble = ln 2   (1.59)
⟹ τdouble = ln 2 / λ1 = ln 2 / 0.9056 ≃ 0.765   (1.60)
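These values can be checked with a simple two-trajectory experiment, sketched below with illustrative settings (this is not the thesis software): a reference trajectory and a slightly perturbed one are evolved in parallel, the perturbation is rescaled at every step, and the logarithmic growth rate is averaged.

import numpy as np

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0
def f(v): return np.array([SIGMA*(v[1]-v[0]), R*v[0]-v[1]-v[0]*v[2], v[0]*v[1]-B*v[2]])
def rk2(v, dt=0.01):
    k1 = f(v); k2 = f(v + dt*k1)
    return v + 0.5*dt*(k1 + k2)

def largest_exponent(n_steps=200000, dt=0.01, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    v = np.array([1.0, 1.0, 1.0])
    for _ in range(10000):                 # transient: reach the attractor
        v = rk2(v, dt)
    w = v + eps * rng.standard_normal(3)   # perturbed trajectory
    total = 0.0
    for _ in range(n_steps):
        v, w = rk2(v, dt), rk2(w, dt)
        d = np.linalg.norm(w - v)
        total += np.log(d / eps)
        w = v + (eps / d) * (w - v)        # rescale the perturbation
    lam1 = total / (n_steps * dt)
    return lam1, np.log(2.0) / lam1        # largest exponent and doubling time

print(largest_exponent())                   # should be roughly (0.9, 0.77)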
1.5
The aim of this study
As we stated in section 1.2, the goal of Data Assimilation is to produce an optimal estimate
of the state of a dynamical system from incomplete, noisy observations and (approximate)
knowledge of the laws governing the evolution of the system. An important application is the
initialization of forecast models of the Atmosphere and Ocean.
In this work we address the problem of data assimilation in chaotic systems in the framework of Lorenz’s three variable convective model. The aim is to investigate the theory for
an advanced formulation of the Assimilation in the Unstable Subspace (AUS, see section 2.3),
a data assimilation scheme relying upon the system’s dynamics which is applicable to different contexts and computationally affordable in operational environments. AUS has already
provided encouraging results with realistic models and observational configurations. The low
dimensionality of the model used here allows for the direct comparison with a "gold standard"
data assimilation scheme, the Extended Kalman Filter, and for development of the theory.
1.6
Notation and conventions
Throughout this study, and regardless of the scheme under consideration, we will conform to
the notation proposed by Ide et al. [14], widely used by the scientific community for data
assimilation.
1.6.1
Model space and observational space
The evolution of a dynamical discrete system from time tk to time tk+1 is described by the
equation
xf (tk+1 ) = Mk [xa (tk )]
(1.61)
where the column vector xf (tk+1 ) is the forecast state at time tk+1 and the column vector
xa (tk ) is the analysis state at time tk . Both have dimension N and are defined in the model
space S Mod : if the system is described by K variables at L locations, then N = K × L. If we
have M observations of the state of the system, we can define the observations column vector
yo , and its observational space S ob , i.e.: yo ∈ S ob . In an operational context, in general the
available observations are much fewer than the dimensionality of the NWP model, so M ≪ N.
The assimilation algorithm, aimed at initializing our model with the analysis state xa , can be
thought of as an application F , that combines both the forecast state xf and the observations
yo to provide the analysis state xa :
F : S Mod × S ob → S Mod   (1.62)
(xf , yo ) ↦ xa   (1.63)
In this work we will follow the common practice introduced by Rutherford in 1972 [14], who
proposed xb = xf , since forecast states had become better background fields than climatology.
Other choices may be taken for the vector field xb , for example by averaging over an ensemble
of different forecasts.
1.6.2
The observation operator H and its linear approximation H
The observation operator H is defined in the model space S Mod :
H : S Mod −→ S ob
(1.64)
It transforms an N -dimensional state vector x ∈ S Mod into an “observational” M -dimensional
vector. Its structure contains all mathematical and physical relations allowing for an a priori
estimate of the observations: of course, in general H is not linear. But the nonlinear operator
H can be linearized around the trajectory x, which we assume to be well approximated by our
forecast trajectory xf , with
H(x + δx) = H(x) + H δx
(1.65)
where δx is a small perturbation of the state and H may be represented by an M × N matrix,
whose elements are [16]:
(H)ij = ∂Hi/∂xj ,   with i ∈ {1, . . . , M}, j ∈ {1, . . . , N}   (1.66)
So the operator H is the Jacobian of the operator H, and transforms vectors in model space
into their corresponding vectors in observation space. Its transpose HT transforms vectors in
observation space into vectors in model space.
Note that, if the operator H is linear, by definition of linearity it holds:
H(x + δx) = H(x) + H δx   (1.67)
and, by using eq. 1.65, we have
H = H   (1.68)
1.6.3
Error vectors
The model operator Mk in eq. 1.61 evolves the state from time tk to time tk+1 . If xt (tk ) is the
true state of the system at time tk , the corresponding analysis error and the forecast error will
be the column vectors
η a (tk ) = xa (tk ) − xt (tk )
(1.69)
ηf (tk ) = xf (tk ) − xt (tk )
(1.70)
These errors are defined in the model space S Mod .
Generally speaking, even the observation operator H may be misrepresented:
η H (tk ) = yt (tk ) − Hxt (tk )
(1.71)
where yt (tk ) is the vector of true values of the observed variables at time tk . The vector η H is
defined in the observational space S ob , as well as the observational error:
η o (tk ) = yo (tk ) − Hxt (tk )
(1.72)
The observational error ηo implicitly includes the measurements error, the observation operator
misrepresentation and the model’s error of representativeness due to subgrid-scale processes not
represented in our grid-averaged values of the model and analysis [16].
The model error between the times tk and tk+1 is defined by
η M (tk ) = Mk xt (tk ) − xt (tk+1 )
(1.73)
Rearranging it with eq. 1.61, xf (tk+1 ) = Mk [xa (tk )], we have:
xf (tk+1 ) − xt (tk+1 ) = Mk [xa (tk )] − Mk xt (tk ) + η M (tk )
(1.74)
The lhs is the forecast error at time tk+1 . So, assuming Mk [xa (tk )] − Mk [xt (tk )] to be a small
term, we can calculate the new forecast error with the tangent linear model Mk :
η f (tk+1 ) = Mk η a (tk ) + η M (tk )   (1.75)
1.6.4
Error covariance matrices
Now we can define from the above column vectors their covariance matrices, built by right-multiplying each of them by its transpose and taking the expectation values. In practice, the
expectation values are estimated by averaging on many cases. The analysis error covariance
matrix Pa , the forecast error covariance matrix Pf and the observation error covariance matrix
R then read:
Pa =< ηa (η a )T >
(1.76)
Pf =< η f (η f )T >
(1.77)
R =< η o (η o )T >
(1.78)
More explicitly, the elements of these matrices are
(Pa)ij = ⟨ηia ηja⟩   (1.79)
(Pf)ij = ⟨ηif ηjf⟩   (1.80)
with i, j ∈ {1, . . . , N}, and
(R)ij = ⟨ηio ηjo⟩,   i, j ∈ {1, . . . , M}   (1.81)
Since each vector depends on time tk all the above error covariance matrices depend on time tk
as well. Note also that Pa and Pf are N × N matrices, while R is an M × M one. Note that the
observation error covariance matrix R can be actually thought of as including three components
[16]: instrument error covariance matrix Rinstr , the representativeness error covariance matrix
Rrepr , both assumed to be uncorrelated, and the observation operator H error covariance matrix
RH :
R = Rinstr + Rrepr + RH
(1.82)
From eq. 1.75, η f (tk+1 ) = Mk η a (tk ) + η M (tk ), we have [14]:
Pf_{k+1} = ⟨ [Mk η a (tk ) + η M (tk )] [Mk η a (tk ) + η M (tk )]^T ⟩   (1.83)
         = Mk Pa_k Mk^T + Qk   (1.84)
where the term Qk refers to the model error covariance matrix between the times tk and tk+1 .
It is defined, in a similar way as Pa , Pf and R, by the model error η M (tk ):
Q =< η M (η M )T >
(1.85)
(here we dropped the subscript k). That is:
(Q)ij = ⟨ηiM ηjM⟩,   i, j ∈ {1, . . . , N}   (1.86)
1.6.5
Operators and vectors: a low dimensional example
As an example, consider a 3-dimensional model space S Mod and a 2-dimensional observation
space S ob : the truth vector is 3 × 1, with components expressed in terms of the grid points e, f,
g (see Fig. 1.17):
xt = [ xte  xtf  xtg ]^T   (1.87)
and similarly for analysis, background and forecast column vectors:
xa = [ xae  xaf  xag ]^T   (1.88)
xb = [ xbe  xbf  xbg ]^T   (1.89)
xf = [ xfe  xff  xfg ]^T   (1.90)
The observation vector, instead, is a 2 × 1 vector expressed in terms of the observation points 1
and 2 (see Fig. 1.17 again):
yo = [ y1o  y2o ]^T   (1.91)
The analysis and forecast covariance matrices are (semicolons separate the rows of a matrix):
Pa = [ paee  paef  paeg ; pafe  paff  pafg ; page  pagf  pagg ]   (1.92)
Pf = [ pfee  pfef  pfeg ; pffe  pfff  pffg ; pfge  pfgf  pfgg ]   (1.93)
while the observation error covariance matrix will be:
R = [ r11  r12 ; r21  r22 ]   (1.94)
If measurement errors at different locations are uncorrelated, R will be a diagonal matrix.
The linearized observation operator H projects vectors in model space S Mod (grid points)
into vectors in observation space S ob (observation points). Its components may be simple
Figure 1.17: A simple case: 3 grid points (e, f, g) and 2 observations (1, 2).
interpolation coefficients:
H = [ h1e  h1f  h1g ; h2e  h2f  h2g ]   (1.95)
So for example the background values at observation points are the following vector yb :
Hxb = [ h1e  h1f  h1g ; h2e  h2f  h2g ] [ xbe ; xbf ; xbg ]   (1.96)
    = [ y1b ; y2b ] = yb   (1.97)
In the Kalman Filter (see subsection 2.2.1) we will also have the term:
Pf HT = [ pfee  pfef  pfeg ; pffe  pfff  pffg ; pfge  pfgf  pfgg ] [ h1e  h2e ; h1f  h2f ; h1g  h2g ] = [ pfe1  pfe2 ; pff1  pff2 ; pfg1  pfg2 ]   (1.98)
that is a grid to observation points approximation, by interpolation, of the forecast error covariance matrix Pf ; as for example in pfe2 = pfee h2e + pfef h2f + pfeg h2g . We will also see the
term:
HPf HT = [ h1e  h1f  h1g ; h2e  h2f  h2g ] [ pfee  pfef  pfeg ; pffe  pfff  pffg ; pfge  pfgf  pfgg ] [ h1e  h2e ; h1f  h2f ; h1g  h2g ] = [ pf11  pf12 ; pf21  pf22 ]   (1.99)
which in turn is an approximation by back interpolation of the forecast error covariance matrix
Pf between observation points [16].
We will also find the gain matrix
K = Pf HT [ HPf HT + R ]^{-1}   (1.100)
that is an N × M matrix. In this low dimensional example it is 3 × 2:
K = [ pfe1  pfe2 ; pff1  pff2 ; pfg1  pfg2 ] ( [ pf11  pf12 ; pf21  pf22 ] + [ r11  r12 ; r21  r22 ] )^{-1}   (1.101)
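The algebra above can be made concrete with a few lines of code; the numbers below are purely illustrative placeholders (they are not taken from the thesis), only the shapes of eqs. 1.95-1.101 matter:

import numpy as np

H  = np.array([[0.7, 0.3, 0.0],      # interpolation weights of observation 1
               [0.0, 0.4, 0.6]])     # interpolation weights of observation 2
Pf = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.5],
               [0.2, 0.5, 1.0]])     # forecast error covariance, 3 x 3
R  = np.diag([0.1, 0.1])             # uncorrelated observation errors, 2 x 2

PfHT  = Pf @ H.T                          # eq. 1.98, 3 x 2
HPfHT = H @ Pf @ H.T                      # eq. 1.99, 2 x 2
K     = PfHT @ np.linalg.inv(HPfHT + R)   # gain matrix, eq. 1.100, 3 x 2
print(K)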
Chapter 2
Data Assimilation: state of the
art
A NWP forecast is an initial value problem coupled with a boundary problem: carefully initialized and with the appropriate boundary conditions, a NWP model outputs the atmospheric
evolution (forecast). Of course, to improve the quality of the forecast we need to better estimate
the initial conditions, or the present state of the system. As we already stated in paragraph
1.2, the purpose of Data Assimilation in NWP models is “using all the available information, to
determine as accurately as possible the state of the atmospheric (or oceanic) flow” (Talagrand,
1997, [16]). The available information is a statistical combination of (noisy) observations and a
short-range forecast.
In this chapter we will survey the most important techniques devised to combine this information, without any attempt to be exhaustive. In particular, we will talk about Variational
and Sequential assimilation techniques. The former has been applied since the early 70s, greatly
enhanced at the end of the 80s, and still largely used in operational contexts. The latter have
an appealing probabilistic approach, but their implementation in realistic geophysical models
turns out to be problematic, for reasons to be discussed below.
2.1
Variational assimilation
In the variational method the problem is basically to find the model trajectory which best fits
the observational data within a given time interval τ , often called the assimilation window, while
satisfying the dynamical constraints: in the strong constraint formulation the constraints have
to be satisfied exactly; in the weak constraint formulation they have to be satisfied only approximately [8]. Since the model equations are deterministic, all we need is the initial state and
the boundary conditions, and the variational problem may be restated from a constrained to
an unconstrained form. Thus, the idea is to find the initial state x(t0 ) whose evolution under
the model equations best fits the observational data within the assimilation window.
In practice, we are looking for the initial state x(t0 ) that minimizes a so called cost function,
which is a scalar functional of the trajectory x(t), to be minimized under the model equations
constraint 1.2. The cost function J (x(t)) is defined as the sum of the squares of the distance
between x(t) and the forecast state xf (t) weighted by the inverse of the forecast error covariance
matrix Pf (t), plus the distance between the “first guess” observations Hx(t) and the true
observations yo (t), weighted by the inverse of the observations error covariance matrix R [16].
J(x(t)) = ½ [x(t) − xf(t)]^T [Pf(t)]^{-1} [x(t) − xf(t)] + ½ [Hx(t) − yo(t)]^T R^{-1} [Hx(t) − yo(t)]   (2.1)
2.1.1
Maximum likelihood approach
Let’s see a maximum likelihood motivation for eq. 2.1. Consider first, as an illustrative example,
a 1-dimensional problem: we have two independent temperature observations T1 and T2 , both
assumed to have normally distributed errors with standard deviations σ1 and σ2 . The analysis
temperature T is the most likely value, given the two observations T1 and T2 and their statistical
errors. The cost function, in this particular case, is:
J(T) = ½ [ (T − T1)²/σ1² + (T − T2)²/σ2² ]   (2.2)
The probability distribution of an observation T1 given a true value T and a standard
deviation σ1 for T1 is
pσ1(T1|T) = (1/(√(2π) σ1)) exp[−(T1 − T)²/(2σ1²)]   (2.3)
A similar relation holds for T2 :
pσ2(T2|T) = (1/(√(2π) σ2)) exp[−(T2 − T)²/(2σ2²)]   (2.4)
The likelihood of a true value T given an observation T1 with a standard deviation σ1 is given
by [16]:
Lσ1(T|T1) = pσ1(T1|T) = (1/(√(2π) σ1)) exp[−(T1 − T)²/(2σ1²)]   (2.5)
and — in the same way — the likelihood of a true value T given an observation T2 with a
standard deviation σ2 is:
Lσ2(T|T2) = pσ2(T2|T) = (1/(√(2π) σ2)) exp[−(T2 − T)²/(2σ2²)]   (2.6)
Since the two observations T1 and T2 are independent, the likelihood of a true value T given
both T1 and T2 is the product:
Lσ1σ2(T|T1, T2) = Lσ1(T|T1) · Lσ2(T|T2)   (2.7)
= (1/(2π σ1 σ2)) exp[ −(T1 − T)²/(2σ1²) − (T2 − T)²/(2σ2²) ]   (2.8)
Thus, given the two measurements T1 and T2 and their standard deviations σ1 and σ2 , we can
find the most likely value of T by maximizing the likelihood 2.8, or even its logarithm:
max_T ln Lσ1σ2(T|T1, T2) = max_T [ constant − (T1 − T)²/(2σ1²) − (T2 − T)²/(2σ2²) ]   (2.9)
This maximization leads to the minimization of
J(T) = ½ [ (T1 − T)²/σ1² + (T2 − T)²/σ2² ]   (2.10)
which is the cost function 2.1 for this simplified case.
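Although not needed in what follows, it is instructive to carry out the minimization explicitly: setting dJ(T)/dT = 0 in eq. 2.10 gives
T = (T1/σ1² + T2/σ2²) / (1/σ1² + 1/σ2²),
i.e. the analysis temperature is the average of the two observations weighted by the inverse of their error variances, with analysis error variance (1/σ1² + 1/σ2²)^{-1}.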
Let’s consider now the more general cost function (eq. 2.1): we define the likelihood of
the true state x(t) given the forecast field xf (t) (used as background field) or given the new
observations yo in the following way [16]:
LPf(x|xf) = pPf(xf|x) = (2π)^{-N/2} |Pf|^{-1/2} exp[ −½ (xf − x)^T (Pf)^{-1} (xf − x) ]   (2.11)
LR(x|yo) = pR(yo|x) = (2π)^{-M/2} |R|^{-1/2} exp[ −½ (yo − Hx)^T R^{-1} (yo − Hx) ]   (2.12)
where N is the number of components of the vectors x(t) and xf (t), while M is that of the
vector yo . Since the forecast xf and the new observations yo are independent, the joint likelihood is the product of the two Gaussian likelihoods:
L(x|xf, yo) = LPf(x|xf) LR(x|yo)   (2.13)
= (2π)^{-(N+M)/2} |Pf|^{-1/2} |R|^{-1/2} exp[ −½ (xf − x)^T (Pf)^{-1} (xf − x) − ½ (yo − Hx)^T R^{-1} (yo − Hx) ]   (2.14)
The most likely analysis state xa , which maximizes the joint likelihood and its logarithm as
well, also minimizes the cost function 2.1.
2.1.2
Bayesian approach
Let’s go back to our simplified 1-dimensional case, discussed in subsection 2.1.1: there is also
a Bayesian derivation for 2.10. We made the observation T1 (the forecast state in the assimilation cycle) with an a priori probability distribution of the truth — that is, before the second
observation:
pT1σ1(T) = (1/(√(2π) σ1)) exp[−(T1 − T)²/(2σ1²)]   (2.15)
The Bayes Theorem for the a posteriori probability of the truth given the new measurement
T2 is:
pσ2(T|T2) = pσ2(T2|T) pT1σ1(T) / pσ2(T2)
= { (1/(√(2π) σ2)) exp[−(T2 − T)²/(2σ2²)] · (1/(√(2π) σ1)) exp[−(T1 − T)²/(2σ1²)] } / pσ2(T2)   (2.16)
Since the denominator
pσ2(T2) = ∫ (1/(√(2π) σ2)) exp[−(T2 − T*)²/(2σ2²)] dT*
is independent of T , maximizing the a posteriori probability 2.16 means maximizing the logarithm of the numerator, that leads again to the minimization of the cost function 2.10.
In the more general case (eq. 2.1), we suppose that the truth vector x(t) is the result of a
stochastic process defined by the following a priori probability distribution function, given the
forecast field xf (t) (used as background):
pPf(x) = (2π)^{-N/2} |Pf|^{-1/2} exp[ −½ (xf − x)^T (Pf)^{-1} (xf − x) ]   (2.17)
When we get new observations yo , the Bayes theorem gives us the a posteriori probability:
p(x|yo) = pR(yo|x) pPf(x) / p(yo)   (2.18)
where p(yo ) is the climatological observations distribution. Eq. 2.18 gives:
p(x|yo) = exp{ −½ [ (yo − Hx)^T R^{-1} (yo − Hx) + (xf − x)^T (Pf)^{-1} (xf − x) ] } / [ (2π)^{(N+M)/2} |R|^{1/2} |Pf|^{1/2} p(yo) ]   (2.19)
Since p(yo ) does not depend on the current state x(t), the maximum of the a posteriori probability 2.19 will coincide with the maximum numerator, or with the minimum of the cost function
2.1.
2.1.3
3D-Var scheme
The analysis state that minimizes the cost function J(x) in eq. 2.1 is given by:
∇x J(xa ) = 0
(2.20)
If we assume that the analysis is a good approximation to the truth and to the observations,
we can linearize the observation operator H around the background, or around the forecast if
we use this as the background field:
yo − H(x) = yo − H[xf + (x − xf)]   (2.21)
= yo − H(xf) − H(x − xf)   (2.22)
After dropping the time dependence for sake of clarity, we can rearrange the expression for the
cost function:
J(x) = ½ (x − xf)^T (Pf)^{-1} (x − xf) + ½ [H(x) − yo]^T R^{-1} [H(x) − yo]   (2.23)
= ½ (x − xf)^T (Pf)^{-1} (x − xf) + ½ [yo − H(xf) − H(x − xf)]^T R^{-1} [yo − H(xf) − H(x − xf)]   (2.24)
= ½ (x − xf)^T (Pf)^{-1} (x − xf) + ½ (x − xf)^T H^T R^{-1} H (x − xf) + ½ [H(xf) − yo]^T R^{-1} H (x − xf) + ½ (x − xf)^T H^T R^{-1} [H(xf) − yo] + ½ [H(xf) − yo]^T R^{-1} [H(xf) − yo]   (2.25)
and see that it’s a quadratic function of the analysis increment x − xf . Now, consider the
general quadratic function
F(x) = ½ x^T A x + v^T x + k   (2.26)
where x and v are vectors of the same N -dimensional vector space, A is an N × N symmetric
matrix and k is a scalar. It can be shown [16] that its gradient is
∇x F(x) = A x + v   (2.27)
So, for the cost function 2.25 the gradient with respect to x is the same as that with respect to
x − xf :
∇J(x) = [ (Pf)^{-1} + H^T R^{-1} H ] (x − xf) + H^T R^{-1} [ H(xf) − yo ]   (2.28)
In order to minimize it and to calculate the analysis state xa , we set ∇J(xa ) = 0, so:
[ (Pf)^{-1} + H^T R^{-1} H ] (xa − xf) = H^T R^{-1} [ yo − H(xf) ]   (2.29)
or alternatively:
xa = xf + [ (Pf)^{-1} + H^T R^{-1} H ]^{-1} H^T R^{-1} [ yo − H(xf) ]   (2.30)
The last equation can be expressed in terms of the analysis increment δxa = xa − xf and the
innovation δyo = yo − H(xf ):
δxa = [ (Pf)^{-1} + H^T R^{-1} H ]^{-1} H^T R^{-1} δyo   (2.31)
which is the solution of the variational analysis problem, called 3D-Var. It can also be shown
that
[ (Pf)^{-1} + H^T R^{-1} H ]^{-1} H^T R^{-1} = Pf H^T [ H Pf H^T + R ]^{-1}   (2.32)
so that
δxa = Pf H^T [ H Pf H^T + R ]^{-1} δyo   (2.33)
or equivalently:
xa = xf + Pf H^T [ H Pf H^T + R ]^{-1} [ yo − H(xf) ]   (2.34)
Subtracting the true state vector xt in both lhs and rhs, we get an expression for the analysis
error:
ηa = ηf + Pf H^T [ H Pf H^T + R ]^{-1} [ yo − H(xf) ]   (2.35)
If we define the gain matrix
K = Pf H^T [ H Pf H^T + R ]^{-1}   (2.36)
the equations 2.33 and 2.34 become:
δxa = K δyo   (2.37)
xa = [I − KH] xf + K yo   (2.38)
If the forecast error η f is small, we can linearize the observation operator H and the equation
H(x + δx) = H(x) + H δx holds true; so equation 2.35 yields:
ηa = ηf + K yo − K H(xf)   (2.39)
= ηf + K yo − K H(xt + xf − xt)   (2.40)
= ηf + K yo − K H(xt + ηf)   (2.41)
= ηf + K yo − K H(xt) − K H ηf   (2.42)
= [I − KH] ηf + K ηo   (2.43)
Since we may assume that the forecast error and the observation error are uncorrelated, the
analysis covariance matrix may be written as follows:
Pa = ⟨ηa (ηa)^T⟩   (2.44)
= [I − KH] Pf [I − KH]^T + K R K^T   (2.45)
= [I − KH] Pf   (2.46)
where the last equation has been derived by inserting into eq. 2.45 the expression for the gain
matrix K = Pf H^T [ H Pf H^T + R ]^{-1}. Indeed, from eq. 2.45 we have:
Pa = [I − KH] Pf − [I − KH] Pf H^T K^T + K R K^T   (2.47)
= [I − KH] Pf − [ Pf H^T − K H Pf H^T ] K^T + K R K^T   (2.48)
Now, the two last terms in eq. 2.48 actually vanish, because
K = Pf H^T [ H Pf H^T + R ]^{-1}   ⟺   K R = Pf H^T − K H Pf H^T   (2.49)
2.2
Sequential assimilation
The sequential assimilation methods have a probabilistic approach to estimate the state of a
system. The basic idea is to project information ahead in time and to assimilate observational
data when available. We don’t need to compute the adjoint model: sequential assimilation
schemes are suitable for different models.
In the following subsections we will talk about Kalman Filter for linear systems and Extended
Kalman Filter for nonlinear ones. Finally, we will add a note about the very promising
Ensemble Kalman Filter.
2.2.1
Kalman Filter
The Kalman Filter, hereafter referred to as KF, was first formulated by Kalman (1960) and
Kalman and Bucy (1961), so sometimes it is called Kalman-Bucy Filter. The KF deals with
linear stochastic dynamical systems where noisy observations are taken at discrete times. It is
an optimal recursive data processing algorithm, where ’optimal’ refers to the fact that it uses all
available information we can provide: it processes all measurements, when available, regardless
of their precision, by using [23]:
• the knowledge of the dynamics of the system and measurement device dynamics
• the statistical description of the system noises, measurement errors and model error
• any available information about initial conditions of the system
• it does not need old data to be kept in storage
The KF is basically a set of equations that implements a prediction-correction estimator: if
some conditions are met, the estimator minimizes the estimated error covariance [23]. If the
conditions are not fully satisfied, often the KF still works quite well. A sketch description of
how it works is shown in Fig. 2.1: the prediction stage consists of the first two equations, which
basically project ahead the analysis state and the analysis error covariance matrix, providing
the forecast vector and the forecast error covariance matrix. The correction stage consists of
further three equations, and takes advantage of new observations to provide the new analysis
vector and the analysis error covariance matrix. Then, recursively, a new projection ahead is
performed till new observations become available.
Let’s consider a linear discrete stochastic dynamical system of the form of the eq. 1.11, i.e.
xtk+1 = Mxtk + qk+1
(2.50)
Here the linear model operator M projects information ahead from time tk to time tk+1 , so
actually
M = M(tk , tk+1 )
(2.51)
We will use this simplified notation in equations below. The term qk+1 in eq. 2.50 is a white
Gaussian noise with zero mean and covariance matrix Qk+1 , namely
< qk+1 > =
Qk+1
=
0
(2.52)
< qk+1 qTk+1 >
(2.53)
and may represent subgrid-scale processes not resolved by the model [14]. Under these conditions the KF is described by the following set of equations:
xf_{k+1} = M xa_k   (2.54)
Pf_{k+1} = M Pa_k M^T + Q_{k+1}   (2.55)
K_{k+1} = Pf_{k+1} H^T (H Pf_{k+1} H^T + R)^{-1}   (2.56)
xa_{k+1} = (I − K_{k+1} H) xf_{k+1} + K_{k+1} yo_{k+1}   (2.57)
Pa_{k+1} = (I − K_{k+1} H) Pf_{k+1}   (2.58)
where, following usual conventions:
xt : the true state
xa : the analysis state
xf : the forecast state
yo : the observation vector
Pa : the analysis error covariance matrix
Pf : the forecast error covariance matrix
R : the observational error covariance matrix
H : the (possibly nonlinear) observation operator
H : the linearized observation operator (it transforms vectors in model space to vectors in observation space)
M : the linear model operator
Q : the forecast model error covariance matrix
K : the gain matrix
I : the identity matrix.
Subscripts in the KF equations indicate the time steps where new observations are available.
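The following is a minimal numerical sketch of one prediction-correction cycle (eqs. 2.54-2.58); the model, observation operator and error statistics are hypothetical placeholders used only to illustrate the algebra, not a setup from this work:

import numpy as np

def kf_cycle(xa, Pa, M, Q, H, R, yo):
    # One Kalman Filter cycle, eqs. 2.54-2.58 (eq. 2.57 is written here in the
    # equivalent form xa = xf + K (yo - H xf)).
    xf = M @ xa                                      # eq. 2.54
    Pf = M @ Pa @ M.T + Q                            # eq. 2.55
    K  = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # eq. 2.56
    xa_new = xf + K @ (yo - H @ xf)                  # eq. 2.57
    Pa_new = (np.eye(xa.size) - K @ H) @ Pf          # eq. 2.58
    return xa_new, Pa_new

# hypothetical 2-variable linear model, observing only the first component
M = np.array([[1.0, 0.1], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
xa, Pa = np.zeros(2), np.eye(2)
xa, Pa = kf_cycle(xa, Pa, M, Q, H, R, np.array([1.0]))
print(xa, Pa)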
If we compute the expectation value of eq. 2.50 we get:
⟨xt_{k+1}⟩ = ⟨M xt_k + q_{k+1}⟩   (2.59)
= ⟨M xt_k⟩   (2.60)
= M ⟨xt_k⟩   (2.61)
Figure 2.1: How the Kalman Filter works.
or, in the usual data assimilation notation:
xfk+1 = Mxak
(2.62)
which is eq. 2.54. It should be noticed that, due to linearity of this special case, if the initial
PDF is Gaussian, so it will remain for any future time: we can have a complete description of
the future PDF by its mean and covariance. So we need the covariance matrix Pf evolution,
from time tk to time tk+1 . It can be derived by rearranging the forecast error definition, together
with the linearity of the model operator M, and by using equations 2.50, 2.62:
ηf_{k+1} = xf_{k+1} − xt_{k+1}   (2.63)
= M xa_k − M xt_k − q_{k+1}   (2.64)
= M (xa_k − xt_k) − q_{k+1}   (2.65)
= M ηa_k − q_{k+1}   (2.66)
Since qk+1 is a white Gaussian noise with zero mean, the expectation value of η fk+1 reads:
⟨ηf_{k+1}⟩ = M ⟨ηa_k⟩   (2.67)
Now, if the analysis error and the model error are uncorrelated, multiplying η fk+1 by its transpose and taking the expectation value, we get:
Pf_{k+1} = ⟨ηf_{k+1} (ηf_{k+1})^T⟩   (2.68)
= ⟨(M ηa_k − q_{k+1})(M ηa_k − q_{k+1})^T⟩   (2.69)
= M ⟨ηa_k (ηa_k)^T⟩ M^T + ⟨q_{k+1} q_{k+1}^T⟩   (2.70)
= M Pa_k M^T + Q_{k+1}   (2.71)
which is eq. 2.55.
The analysis state is computed as in the 3D-Var assimilation scheme, eq. 2.34:
xa = xf + K_{k+1} [yo − H(xf)]   (2.72)
or even
xa = [I − K_{k+1} H] xf + K_{k+1} yo   (2.73)
where the Kalman gain Kk+1 is defined by
K_{k+1} = Pf H^T [ H Pf H^T + R ]^{-1}   (2.74)
The analysis error covariance matrix is given by eq. 2.46, i.e.:
Pa = (I − K_{k+1} H) Pf_{k+1}   (2.75)
2.2.2
Extended Kalman Filter
One of the methods devised to address the problem of estimating the initial conditions for a
forecast model is the Extended Kalman Filter (EKF), where the term ’extended’ refers to the
Kalman Filter’s approximation for nonlinear systems.
The EKF is described by the following set of equations (see for example [14]):
xf_{k+1} = M xa_k   (2.76)
Pf_{k+1} = M Pa_k M^T + Q_k   (2.77)
K_{k+1} = Pf_{k+1} H^T (H Pf_{k+1} H^T + R)^{-1}   (2.78)
xa_{k+1} = (I − K_{k+1} H) xf_{k+1} + K_{k+1} yo_{k+1}   (2.79)
Pa_{k+1} = (I − K_{k+1} H) Pf_{k+1}   (2.80)
where we used the common conventions as for the KF, and the nonlinear model operator
M . Here M is no longer the linear model operator, but the Tangent Linear Model operator.
Subscripts indicate the time steps where new observations are available and, again, we dropped
them for operators M , M, H, H.
For EKF, due to the lack of linearity of the model operator M , equation 2.66 for the forecast
error is no longer valid; nonetheless it may be rewritten in an approximate form, by using the
tangent linear model operator and equation xtk+1 = M xtk + qk+1 :
ηf_{k+1} = xf_{k+1} − xt_{k+1}   (2.81)
= M xa_k − M xt_k − q_{k+1}   (2.82)
≃ M (xa_k − xt_k) − q_{k+1}   (2.83)
So, since
η fk+1 ' Mηak − qk+1
(2.84)
is only an approximate expression, the corresponding expression for the forecast error covariance
matrix (eq. 2.71) will be approximate as well:
Pfk+1 ' MPak MT + Qk
(2.85)
The EKF, after an initial transient, should give both the best linear unbiased estimate of the
state of the system and its error covariance. But if the system is (locally) highly nonlinear, or
if the observations are not sufficiently frequent, the linearization hypothesis may no longer be
fulfilled: that can jeopardize the stability of the filter, and the filter
may diverge [16]. Furthermore, for realistic NWP models the EKF cannot be implemented
due to both the prohibitive computational costs in estimating the covariance matrices and the
uncertainties about the model error.
2.2.3
Ensemble Kalman Filter
In the Ensemble Kalman Filter (EnKF) approach, proposed by Evensen in 1994 [10], an ensemble of Nens data assimilation cycles is run simultaneously and independently: all of them assimilate the same set of observations, but a different random perturbation of the observations is added for each member of the ensemble. This ensemble can be used to estimate the forecast error covariance matrix Pf : after computing the analysis state xaj (tk ), j ∈ {1, . . . , Nens },
for each member of the ensemble, we can obtain the forecast states:
xfj (tk+1 ) = Mkj xaj (tk )
j ∈ {1, . . . , Nens }
(2.86)
the ensemble average xf (tk+1 ) and an estimate of the forecast error covariance matrix, a sort
of average of the Nens forecast error covariance matrices, e.g.:
Pf_{k+1} ≃ 1/(Nens − 1) Σ_{j=1}^{Nens} [xf_j(t_{k+1}) − x̄f(t_{k+1})] [xf_j(t_{k+1}) − x̄f(t_{k+1})]^T   (2.87)
This will actually tend to underestimate the forecast error covariance matrix Pf : other estimates
can be devised [16].
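As an illustration of eq. 2.87 (with an arbitrary synthetic ensemble, unrelated to any experiment of this work):

import numpy as np

rng = np.random.default_rng(0)
n_ens = 30
Xf = rng.standard_normal((n_ens, 3))      # forecast ensemble: one member per row
xmean = Xf.mean(axis=0)                   # ensemble average
A = Xf - xmean                            # forecast anomalies
Pf = A.T @ A / (n_ens - 1)                # eq. 2.87
print(np.allclose(Pf, np.cov(Xf, rowvar=False)))   # identical to the sample covariance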
The EnKF approach has many advantages, among which:
• since typically Nens is somewhere between 10 and 100, the computational cost of EnKF is
increased by the same factor with respect to 3D-Var, for example. But it’s much smaller
compared to that of an EKF
• EnKF does not need a linear or adjoint model
• it does not even require the linearization of the evolution of the forecast error covariance
matrix Pf
Although the EnKF is not yet implemented in operational NWP forecasting, it seems nowadays
to be one of the most promising assimilation schemes for the future.
2.3
AUS: Assimilation in the Unstable Subspace
The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi
in 2004 [33], hereafter referred to as TU, and developed by Trevisan, Uboldi and Carrassi
[34, 35, 3], to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced
by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment δxa = xa − xf in the unstable subspace of the forecast-analysis cycle system so that
it will have the same structure of the dominant instabilities of the system. In such a way the
dynamically unstable components, present in the forecast error, which are responsible for error
growth, are in principle systematically reduced or eliminated. The unstable subspace will be
estimated by breeding on the data assimilation system (BDAS), a technique to be discussed
below. TU showed that AUS is a reliable and efficient approach in the 40 variables Lorenz 1996
model [20], while the subsequent studies proved the same for different, more realistic models
and observational configurations, including a Quasi-Geostrophic model with 14784 degrees of
freedom [3], and a high dimensional, primitive equation ocean model with 301120 degrees of
freedom [35]; the experiments encompassed fixed and “adaptive”, or “targeted”, observations.
In these contexts, the AUS-BDAS dynamical system approach greatly reduces the analysis error, with reasonable computational costs for data assimilation with respect, for example, to a
prohibitive full Extended Kalman Filter approach. This is a follow-up study in which we will
revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective
model.
In practice, in the same spirit as of the EKF approach (equations 2.76-2.80), the AUS
assimilation algorithm is aimed at finding a simplified form of the forecast covariance matrix
Pf by exploiting the local unstable structures of the forecast-analysis cycle system, which in
turn are estimated by BDAS. Once Pf has been estimated, an approximated gain matrix K may
be computed, so finally — with some knowledge of the observational error covariance matrix
R and the observation operator H — we can estimate the analysis vector xa .
2.3.1
AUS: how it works
The forecast error is regarded as made of two components, the first on the unstable subspace,
and the other one on the complementary subspace [35]:
η f = Eγ + ξ
(2.88)
where the matrix E stores in its columns the normalized estimated unstable directions, the
column vector γ represents the forecast error component in the unstable basis: so Eγ is the
linear combination of the unstable directions that represents the forecast error component on
the unstable subspace. The correspondent forecast covariance matrix may be derived:
Pf = ⟨ηf (ηf)^T⟩   (2.89)
= E ⟨γ γ^T⟩ E^T + E ⟨γ ξ^T⟩ + ⟨ξ γ^T⟩ E^T + ⟨ξ ξ^T⟩   (2.90)
If we set Γ = ⟨γ γ^T⟩ and assume that the forecast error component in the complementary
subspace is small, we can neglect in eq. 2.90 all terms containing ξ, and Pf may be approximated as:
Pf ≃ E Γ E^T   (2.91)
The corresponding gain matrix is approximated as well:
K = Pf H^T (H Pf H^T + R)^{-1}   (2.92)
≃ E Γ E^T H^T (H E Γ E^T H^T + R)^{-1}   (2.93)
where R is the usual observational error covariance matrix and H is the Jacobian of the possibly
nonlinear observation operator H. The analysis vector expression reads:
xa = xf + K [yo − H(xf)]   (2.94)
≃ xf + E Γ E^T H^T (H E Γ E^T H^T + R)^{-1} [yo − H(xf)]   (2.95)
The approximated equation 2.91, written by neglecting the component of Pf out of the unstable
subspace spanned by the basis vector stored in the matrix E, results in a gain matrix K
computed in a subspace (eq. 2.93). The resulting analysis of eq. 2.95 reduces the error
component in such subspace [35].
If M is the tangent linear propagator (between time tk to time tk+1 , [3]) we can write down:
MEk = Ek+1 Λk
where Λk is the diagonal matrix whose elements are the amplification factors exp[∫_{tk}^{tk+1} λi(t) dt],
with the λi the local Lyapunov exponents, in decreasing order, corresponding to the i-th column vector of Ek . The forecast error will then evolve according to:
ηf_{k+1} = E_{k+1} Λk E_k^{-1} ηa_k   (2.96)
2.3.2
AUS: a simple example
For example, if the state vector has dimension 3, the number of observations at each assimilation
step is 2 and the number of unstable directions is 2, the 3 × 2 matrix E is:
E = [ e11  e12 ; e21  e22 ; e31  e32 ]   (2.97)
where e1 = [e11 e21 e31 ]T and e2 = [e12 e22 e32 ]T are the 2 unstable directions. The 2 × 1
column vector γ is
γ = [ γ1  γ2 ]^T   (2.98)
so the 3 × 1 product Eγ is
Eγ = [ e11  e12 ; e21  e22 ; e31  e32 ] [ γ1 ; γ2 ]   (2.99)
= [ e11 γ1 + e12 γ2 ; e21 γ1 + e22 γ2 ; e31 γ1 + e32 γ2 ]   (2.100)
The forecast error covariance matrix is 3 × 3:
Pf ≃ E Γ E^T   (2.101)
≃ [ e11  e12 ; e21  e22 ; e31  e32 ] [ γ1 ; γ2 ] [ γ1  γ2 ] [ e11  e21  e31 ; e12  e22  e32 ]   (2.102)
where Γ is the 2 × 2 matrix
Γ = [ γ1 ; γ2 ] [ γ1  γ2 ]   (2.103)
= [ γ1 γ1  γ1 γ2 ; γ1 γ2  γ2 γ2 ]   (2.104)
As we already mentioned in subsection 1.6.5 for a similar case, the observation error covariance
R is 2 × 2, the observation operator H is 2 × 3, and the gain matrix K is 3 × 2.
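A numerical sketch of the reduced-rank gain of eq. 2.93 for this 3-variable, 2-observation, 2-unstable-direction case is given below; the unstable directions, Γ and R are arbitrary illustrative values (and the columns of E are taken orthonormal only for convenience), not quantities from any experiment of this work:

import numpy as np

rng = np.random.default_rng(1)
E = np.linalg.qr(rng.standard_normal((3, 2)))[0]   # two normalized unstable directions
Gamma = np.array([[0.5, 0.1],
                  [0.1, 0.2]])                     # Gamma = < gamma gamma^T >
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                    # observe the first two components
R = 0.1 * np.eye(2)

Pf = E @ Gamma @ E.T                               # eq. 2.91: rank-2 approximation of Pf
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)     # eq. 2.93: a 3 x 2 gain matrix
print(K)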
2.3.3
Refresh procedure
After the analysis, part of the information in the bred vectors, used to estimate the unstable
structures of the system, will be no longer available: that’s why a “refresh” procedure may
improve our capabilities to capture the system’s instabilities. “Refresh” means that a new
random perturbation is introduced in the place of bred vectors already used in assimilation;
which in turn may be simply discarded or “recycled” by adding them to the other vectors.
Which strategy is better depends on the complexity of the system, and an "in between"
approach may also be implemented [35].
2.3.4
Using a single bred vector for assimilation
When we use a single bred vector to estimate the unstable 1-dimensional subspace, the matrix
E reduces to a single column vector e, the matrix Γ becomes the scalar γ 2 and the expressions
discussed in subsection 2.3.1 for forecast error, forecast error covariance matrix and gain matrix
reduce to:
ηf = e γ + ξ   (2.105)
Pf ≃ γ² e e^T   (2.106)
K = γ² e e^T H^T (γ² H e e^T H^T + R)^{-1}   (2.107)
2.3.5
Adaptive observation strategy
The basic idea underlying adaptive, or targeted, observations is to take measurements where
the unstable structures have the maximum amplitude. The same structures exploited to locate
adaptive observations are used to estimate the Pf , K and analysis state xa , through equations
2.91, 2.93 and 2.95 respectively. This approach has been already tested in previous works by
Trevisan, Uboldi and Carrassi and proved to be highly efficient in realistic contexts [33, 34, 3].
2.4
BDAS: Breeding on the Data Assimilation System
Breeding on the Data Assimilation System (BDAS) is a method devised to estimate the unstable structures of the data assimilation system, that can be thought of as a system forced
by observations. It is a modified formulation of breeding: the basic idea of BDAS is to breed
initially random perturbations of the analysis and to impose on them the same dynamics as the
analysis-forecast solution, including assimilation of the observations whenever available. The
perturbations need to be evolved for a sufficiently long time for a reliable estimate of the instabilities: the corresponding time is the breeding time.
2.4.1
Standard breeding method
The breeding method is a nonlinear, finite-amplitude generalization of the algorithm used to
compute the leading Lyapunov vector: as we already mentioned in subsection 1.3.9, bred vectors
are indeed closely related to Local Lyapunov Vectors. This is the basic idea: a small perturbation of the state, if its amplitude is periodically scaled down to be kept small, will evolve
in a linear combination of unstable directions [35]. These directions are estimated by integrating the nonlinear model and by using one or more perturbed states. In realistic geophysical
models the normalization amplitude, the length of the breeding time and the frequency of the
normalization procedure may be tuned to filter out unwanted instabilities, such as convection
[16].
In practice, standard breeding works as follows. Given a dynamical system in the form
of a flow, a breeding cycle is started by adding a random initial perturbation with a fixed
initial amplitude, which is introduced only once, at the beginning of the procedure. The same
nonlinear model is integrated from the unperturbed (“control”) and from the perturbed initial
conditions. At regular time intervals the control forecast is subtracted from the perturbed
forecast, and the resulting difference is scaled down to the initial amplitude. Then it is added
to the corresponding new analysis or model state [16]. The forecast state can be computed by
xfk+1 = M xak
(2.108)
and the perturbation dynamics will be described by
δxfk+1 = M δxak
(2.109)
where M is the model and M = M(x(t)) is its Jacobian evaluated along the forecast trajectory
x(t), for all t ∈ ]tk , tk+1 ].
2.4.2
BDAS: how it works
Since BDAS is a particular implementation of the breeding method, it shares with it the basic
principles, i.e. the evolution, by using the nonlinear model, of a random initial perturbation
added to the control trajectory. In the analysis step we have observational data to assimilate,
and the evolved random initial perturbation will undergo the assimilation procedure.
Whenever observations become available, the analysis state is computed by
xa_{k+1} = [I − KH] xf_{k+1} + K yo_{k+1}   (2.110)
or even, using eq. 2.108:
xa_{k+1} = [I − KH] M xa_k + K yo_{k+1}   (2.111)
where K is the gain matrix and H is the (possibly nonlinear) observation operator. The
expression 2.110 is the same in Kalman Filters and 3D-Var schemes. The perturbation equation
for the system undergoing observational forcing reads:
δxa_{k+1} = [I − KH] δxf_{k+1}   (2.112)
= [I − KH] M δxa_k   (2.113)
where we used the linearized observation operator H, i.e. the Jacobian of the observation
operator H. Breeding on the Data Assimilation System is based, rather than on eq. 2.109,
on eq. 2.113, in which the matrix operator [I − KH] has a general stabilizing effect: the
assimilation will reduce the amplifying components of the error.
So, basically: in an assimilation system where observations are available once in a while,
during the forecast time the free system instabilities dominate the error growth; in the analysis
step the assimilation of observations will in general reduce some fast-growing components of
the error.
Another important issue is the breeding time, i.e. the time needed for the perturbations
to capture the most unstable structures. It cannot be infinite, but should be long enough to
provide a meaningful (set of) bred vector(s). Typically, the breeding time ∆t is longer than the
assimilation window τ , and often it is set as a multiple of τ :
∆t = n τ   (2.114)
2.4.3
BDAS, an example of practical implementation
Just to focus on a specific example, let’s suppose that we deal with a simple low dimensional
system for which the breeding time ∆t = 2τ , where τ = tk+1 − tk is the fixed assimilation
window. We assume that:
• the unstable subspace is estimated by using a single bred vector
• we can discard bred vector after use
If tk is the previous assimilation step, we use the general equation 1.34, that is:
δx(t) = (∂M/∂x) δx(tk ) = M δx(tk ),   t ∈ [tk , tk+1 ]   (2.115)
to estimate the evolution of the perturbation δx(t), for t spanning from the previous assimilation step tk up to the new one tk+1 . At new assimilation step tk+1 we will use the evolved
perturbation δx(tk−1 ) to estimate the unstable direction of the system, after which it is discarded.
Table 2.1: Breeding on the Data Assimilation System: introducing the perturbations and estimating the unstable subspace. This is a specific example in which the breeding time is ∆t = 2τ ,
where τ = tk+1 − tk is the assimilation window.
Time              | introduced  | evolving             | used to assimilate and then discarded | undergoing assimilation
tk−1              | δx(tk−1)    | —                    | evolved δx(tk−3)                      | evolved δx(tk−2)
tk−1 < t ≤ tk     | —           | δx(tk−2) & δx(tk−1)  | —                                     | —
tk                | δx(tk)      | —                    | evolved δx(tk−2)                      | evolved δx(tk−1)
tk < t ≤ tk+1     | —           | δx(tk−1) & δx(tk)    | —                                     | —
tk+1              | δx(tk+1)    | —                    | evolved δx(tk−1)                      | evolved δx(tk)
tk+1 < t ≤ tk+2   | —           | δx(tk) & δx(tk+1)    | —                                     | —
tk+2              | δx(tk+2)    | —                    | evolved δx(tk)                        | evolved δx(tk+1)
The already grown perturbation δx(tk ) will undergo the same assimilation process
used to evaluate the new analysis state. Furthermore, a new random perturbation δx(tk+1 ) is
introduced (refresh, see subsection 2.3.3), to be used at assimilation step tk+3 . Let’s summarize:
In the particular example at hand, with a breeding time ∆t = 2τ , we will recursively follow
this procedure (see Table 2.1):
1. At each new assimilation step tk+1 :
(a) We exploit the evolved perturbation δx(tk−1 ), previously introduced at time tk−1 and
assimilated at time tk , to estimate the dominant part of the forecast error covariance
matrix Pfk+1
(b) After use, the evolved perturbation δx(t_{k−1}) is discarded
(c) At the same time tk+1 , through the same assimilation scheme we calculate both the
analysis state of the system xak+1 and assimilate the perturbation δx(tk ), introduced
the previous assimilation time tk and evolved for all t ∈ ]tk , tk+1 ]
(d) Furthermore, also at time tk+1 , we randomly perturb the analysis state xak+1 with a
new, small vector δx(tk+1 ); that’s the refresh
2. For all t ∈ ]tk+1 , tk+2 ]:
(a) We evolve this new perturbation δx(tk+1 ) with the same dynamics of the system,
that is M = M(xak+1 ), with the TLM operator M evaluated along the forecast
trajectory x(t)
(b) We also evolve δx(t_k), which was introduced at time t_k and which underwent evolution for t ∈ ]t_k, t_{k+1}] as well as an assimilation procedure at the assimilation step t_{k+1}
(see item 1.c)
(c) We let the new perturbation δx(tk+1 ) grow for a suitable time before use, in this
example for a breeding time ∆t = 2τ , so it will be used at assimilation time tk+3 .
Since the time interval ∆t is greater than the assimilation window τ, the evolved perturbation δx(t_{k+1}) will undergo the assimilation process at time t_{k+2}. That assimilation will be performed by using the perturbation δx(t_k), already evolved and assimilated at the previous assimilation step t_{k+1}
3. At assimilation step tk+2 the cycle is repeated
Throughout this work we will use a single bred vector δx(tk ). When it is ready, after a breeding
time ∆t = 2τ , it is assumed to capture the most unstable structure of the forced system. It
may be normalized or not, depending on the particular scheme adopted: if not normalized, a suitable initial length has to be chosen.
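To make the bookkeeping of Table 2.1 concrete, the following minimal Python sketch outlines one possible way to organize a BDAS cycle with breeding time ∆t = 2τ. It is only a schematic: the propagators step_model and step_tlm, the analysis routine analyse and the operator apply_ikh (which applies (I − KH) to a perturbation) are placeholder callables and do not refer to the software actually used in this work.

```python
import numpy as np

def bdas_step(x_a, perts, step_model, step_tlm, analyse, apply_ikh,
              alpha=1.0, ndim=3):
    """One BDAS assimilation step for a breeding time of 2*tau (schematic).

    x_a        : current analysis state (ndim,)
    perts      : list of bred perturbations, oldest first (at most two here)
    step_model : nonlinear propagator over one assimilation window
    step_tlm   : tangent-linear propagator of a perturbation along the trajectory
    analyse    : returns the new analysis given the forecast and the unit bred vector
    apply_ikh  : applies the analysis operator (I - KH) to a perturbation
    """
    # Forecast: propagate the analysis state and every stored perturbation
    x_f = step_model(x_a)
    perts = [step_tlm(dx, x_a) for dx in perts]

    # The oldest perturbation (bred for 2*tau) gives the unstable direction;
    # it is discarded after use
    e = perts.pop(0)
    e = e / np.linalg.norm(e)
    x_a_new = analyse(x_f, e)

    # The younger perturbation undergoes the same assimilation (breeding on the DAS)
    perts = [apply_ikh(dx) for dx in perts]

    # Refresh: introduce a new random perturbation, to be used two windows later
    perts.append(alpha * np.random.standard_normal(ndim))
    return x_a_new, perts
```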
Chapter 3
Assimilation in the Lorenz 63 model: comparison among different methods
In the context of the Lorenz convective 3-dimensional model (1963), we will run observing system simulation experiments in a perfect-model setting and, in section 3.6, with some kind of model error as well.
In section 3.2, we will describe the algorithms used for different flavors of Extended Kalman
Filters, while in section 3.3 we will do the same for different AUS assimilation schemes with
increasing capabilities. We will then illustrate the results in section 3.4. In section 3.5 we will
show some examples about the different behavior of the EKF and AUS assimilation schemes in
some specific circumstances. Finally, in section 3.6, we will test our methods in the presence of
model error.
3.1
Experimental setups
Since we are mainly interested in comparing different data assimilation schemes in critical
circumstances, we choose hard setups for our experiments. Basically we will perform three kinds of experiments, concerning:
• Synchronization to the “truth”
• Noisy observations
• Model error
In synchronization experiments we will qualitatively show, for a few typical cases, the capability to converge to the truth of the EKF, of Evensen's flavor of the EKF (see section 3.2), and of the best-performing AUS scheme (see section 3.3). In particular we will use a mix of perfect or
quasi-perfect observations (σ 2 = 0, σ 2 = 0.01 or σ 2 = 0.1) with long or very long assimilation
windows (τ = 0.25 or τ = 0.6). In these case studies we will observe only the y variable, which
is the most valuable one.
In noisy observation experiments the goal is to show the average RMS analysis and forecast
error for all the different DA schemes with a long assimilation window (τ = 0.25) and noisy
observations (σ 2 = 2). The results shown are the mean values of 100,000 assimilations, with
3 or 2 variables observed. When 2 observations are used, the most valuable two (y and z) are
used for the EKF schemes, while for the AUS schemes we will use two adaptive observations
(see subsection 2.3.5). Similar trials with different noisy observations (σ 2 = 1 or σ 2 = 0.1) will
be performed, with 3 observed variables.
In model error experiments, first we will perform trials by adding a random error to the
model equations at each integration step, then we will test the effects of systematic error by
varying one model parameter. For these experiments the results shown are the mean values of
20,000 assimilations.
3.2
Extended Kalman Filters
When the EKF is applied to a strongly nonlinear system, such as Lorenz’s three variables model,
filter divergence can occur.
Different empirical techniques have been devised to overcome this difficulty: Evensen in
1997 added a term Q akin to the model error covariance term in EKF [11]; Yang et al. in 2006
perturbed the analysis error covariance matrix and inflated the background error covariance
matrix [36].
3.2.1
EKF
The assimilation window is set to τ = 0.25, which is quite a large value for this dynamical system: due to the limits of the linear approximation, in these conditions the EKF tends to perform poorly compared with other assimilation schemes, and filter divergence often occurs.
These experiments are based on standard EKF equations 2.76-2.80, with the covariance matrix Q set to zero, so:

x^f_{k+1} = M x^a_k                                          (3.1)
P^f_{k+1} = M P^a_k M^T                                      (3.2)
K_{k+1}   = P^f_{k+1} H^T (H P^f_{k+1} H^T + R)^{−1}         (3.3)
x^a_{k+1} = (I − K_{k+1} H) x^f_{k+1} + K_{k+1} y^o_{k+1}    (3.4)
P^a_{k+1} = (I − K_{k+1} H) P^f_{k+1}                        (3.5)
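As an illustration of equations 3.1-3.5, a minimal Python sketch of one EKF cycle is given below. It assumes that the tangent linear propagator over the assimilation window is available as a matrix M_jac; function and variable names are illustrative and do not refer to the code actually used for the experiments.

```python
import numpy as np

def ekf_step(x_a, P_a, y_obs, M, M_jac, H, R):
    """One EKF forecast/analysis cycle (eqs. 3.1-3.5), with Q = 0.

    M     : nonlinear model propagator over one assimilation window
    M_jac : tangent-linear (Jacobian) propagator over the same window, as a matrix
    H     : linear observation operator matrix
    """
    # Forecast state and error covariance (3.1)-(3.2)
    x_f = M(x_a)
    P_f = M_jac @ P_a @ M_jac.T

    # Kalman gain (3.3)
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)

    # Analysis state and covariance (3.4)-(3.5)
    x_a_new = x_f + K @ (y_obs - H @ x_f)
    P_a_new = (np.eye(len(x_a)) - K @ H) @ P_f
    return x_a_new, P_a_new
```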
Here the observation operator is the same as its Jacobian, H = H. If we observe 3 variables at each assimilation step, H reduces to the 3 × 3 identity matrix:

H_xyz = I = [ 1 0 0
              0 1 0
              0 0 1 ]    (3.6)

If we observe 2 variables, they will be y and z, the most useful. The operator H will be

H_yz = [ 0 1 0
         0 0 1 ]    (3.7)

3.2.2  Evensen's version of EKF
In Evensen [11], where the model is considered perfect, standard EKF equations 2.76-2.80 are
still used, but the covariance matrix Q, an additive term akin to the model error covariance,
is used as a correction to avoid filter divergence. The matrix Q has been estimated after
optimization:


Q = [ 0.1491  0.1505  0.0007
      0.1505  0.9048  0.0014
      0.0007  0.0014  0.9180 ]    (3.8)

3.2.3  Yang's version of EKF
In Yang et al. (2006) the analysis error covariance has been perturbed and the background
error covariance inflated [36]:
• Random noise: small random perturbations uniformly distributed between 0 and 1, and
multiplied by µ = 0.1 (when 3 variables are observed) or µ = 0.2 (when 2 variables are
observed) are added to the diagonal of the analysis error covariance matrix obtained with
the EKF at every assimilation step
• Inflation: the background error covariance is inflated by a factor of 1 + δ = 1.1, prior to
the analysis step
Once the empirical parameters have been optimized for a specific assimilation interval and observation noise variance, these techniques provide satisfactory results, particularly for sufficiently
short assimilation intervals.
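The two empirical fixes can be summarized in a few lines of Python; this is only a schematic rendering of the recipe of Yang et al. [36] as described above, with illustrative parameter names.

```python
import numpy as np

def yang_regularize(P_a, P_f, mu=0.1, delta=0.1, rng=np.random.default_rng()):
    """Empirical EKF fixes in the spirit of Yang et al. (2006) [36], schematic.

    mu    : amplitude of the uniform noise added to the diagonal of P_a
    delta : multiplicative inflation applied to the background covariance
    """
    n = P_a.shape[0]
    P_a_pert = P_a + np.diag(mu * rng.uniform(0.0, 1.0, n))  # perturb diag(P_a)
    P_f_infl = (1.0 + delta) * P_f                           # inflate P_f before the analysis
    return P_a_pert, P_f_infl
```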
3.3
Assimilation in the Unstable Subspace: further developments
In this section we will apply, to the L63 dynamical system, different formulations of the AUS
scheme, with increasing capabilities. We use a single unstable vector of the analysis-forecast
system, estimated by BDAS (see section 2.4). So the recurrence equations are [3]:
x^f_{k+1} = M x^a_k                                                                          (3.9)
P^f_{k+1} = γ² e_{k+1} e^T_{k+1}                                                             (3.10)
K_{k+1}   = γ²_{k+1} e_{k+1} (H e_{k+1})^T [γ²_{k+1} (H e_{k+1})(H e_{k+1})^T + R]^{−1}      (3.11)
x^a_{k+1} = (I − K_{k+1} H) x^f_{k+1} + K_{k+1} y^o_{k+1}                                    (3.12)
P^a_{k+1} = (I − K_{k+1} H) P^f_{k+1}                                                        (3.13)
where the symbols have the usual meaning as in EKF equations, except for:
• the normalized column vector ek+1 , that is the single unstable unit vector, at assimilation
step tk+1 , estimated by BDAS. The bulk of the forecast error covariance matrix Pfk+1 is
computed in the unstable 1-dimensional subspace defined by the unit vector ek+1 , and
assumed to be small in the complementary subspace
• the amplitude of the forecast error γ
Note also that here the observation operator H is actually linear (we simply observe a variable or not), so it coincides with its linearized version, H = H. The analysis covariance
matrix Pak+1 (eq. 3.13) has been shown here for completeness, but it’s not actually used in the
estimate of the successive Pfk+2 .
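A minimal sketch of the corresponding AUS analysis step with a single bred vector (eqs. 3.10-3.12) is given below; it is schematic, assumes a linear observation operator given as a matrix, and uses illustrative names.

```python
import numpy as np

def aus_analysis(x_f, e, gamma, H, R, y_obs):
    """AUS analysis with a single bred vector (eqs. 3.10-3.12), schematic.

    x_f   : forecast state
    e     : unit vector spanning the estimated unstable direction
    gamma : amplitude of the forecast error along e
    """
    Pf = gamma**2 * np.outer(e, e)            # rank-one forecast covariance (3.10)
    S = H @ Pf @ H.T + R
    K = Pf @ H.T @ np.linalg.inv(S)           # gain confined to the unstable direction (3.11)
    x_a = x_f + K @ (y_obs - H @ x_f)         # analysis state (3.12)
    return x_a, K
```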
A key problem in AUS is the estimate of the amplitude of the forecast error γ. We begin in
subsection 3.3.1 with a very basic AUS-γ0 approach, where no use of observations is made in the
estimate of γ, then in subsection 3.3.2 we introduce a new estimate of γ from observations, that
will be used in subsection 3.3.3. Then a new formulation of AUS is introduced in subsections
3.3.4 and 3.3.5.
3.3.1
AUS-γ0 : no use of observations in the estimate of the forecast
error amplitude
This DA scheme is a simplified formulation of the AUS approach. The main reason why we
implement this assimilation scheme is to show that we actually need a way to estimate the
amplitude γ of the forecast error in the unstable subspace from observations. In this assimilation
scheme, instead, we simply optimize the initial amplitude α used in the re-normalization of the
random perturbation. By breeding on the data-assimilation system (BDAS), we obtain the
perturbation in the unstable 1-dimensional subspace, whose amplitude is γ, at the time it
is used in the assimilation. During the forecast steps this perturbation will amplify and its
amplitude will be reduced at assimilation time.
Let’s suppose that the breeding time has been chosen equal to 2 assimilation windows: the
unstable vector may need more time than a single assimilation window to grow and to capture
the instability of the flow. So, at each assimilation step t_{k+1} (see Table 2.1):
• we use in the assimilation process the vector γ ek+1 , i.e. the evolution, through the tangent
linear model propagator, of the random perturbation α δxk−1 introduced two assimilation
steps earlier, that underwent also the assimilation process in the previous step tk through
the (I − Kk H) operator
• in the same way, we assimilate the observations in the trajectory obtained by adding α δxk
to the control at the previous assimilation step. The evolved perturbation will be used at
the next assimilation step, time tk+2 , to estimate Pfk+2
• we introduce a new perturbation α δxk+1 , where δxk+1 is a random vector whose components are Gaussian, with zero mean and standard deviation 1; its evolution will be used
at assimilation step tk+3 , after having undergone the assimilation process at time tk+2
The parameter α needs to be tuned. In the present application the optimized value is α = 1.
The estimate of the forecast error covariance matrix is
P^f_{k+1} = γ² e_{k+1} e^T_{k+1}    (3.14)
The Kalman gain, in the same spirit as in the EKF, is

K_{k+1} = P^f_{k+1} H^T (H P^f_{k+1} H^T + R)^{−1}    (3.15)

It will be used to estimate the analysis state, by correcting the forecast state, via

x^a_{k+1} = (I − K_{k+1} H) x^f_{k+1} + K_{k+1} y^o_{k+1}    (3.16)

and the analysis covariance matrix

P^a_{k+1} = (I − K_{k+1} H) P^f_{k+1}    (3.17)

The unstable vector δx_k, introduced at the previous assimilation step t_k, will be corrected as well:

δx^a_k = (I − K_{k+1} H) M δx_{k−1}    (3.18)

3.3.2  Estimate of the amplitude γ of the forecast error from observations
An estimate of the amplitude γ of the forecast error in a single unstable direction e can be
obtained in the following way, by rearranging the definition of the forecast error and by assuming
that the bulk of it is along the vector e:
Hη^f = H(x^f − x^t) = −d + η^o    (3.19)
x^f − x^t = γ e + δ f             (3.20)
H(x^f − x^t) = γ H e + δ H f      (3.21)

where we dropped the subscripts k+1 referring to the assimilation time and where

d   = y^o − H x^f     (innovation)
η^o = y^o − H x^t     (observational error)
η^f = x^f − x^t       (forecast error)
The vector f spans the subspace complementary to the unstable subspace. So:

γ H e + δ H f = −d + η^o    (3.22)

Now, if we left-multiply by e^T H^T:

γ e^T H^T H e + δ e^T H^T H f = −e^T H^T d + e^T H^T η^o    (3.23)

Neglecting the terms δ e^T H^T H f and e^T H^T η^o (which is zero, on average), we obtain the following estimate of γ:

γ ≈ −(e^T H^T H e)^{−1} e^T H^T d    (3.24)

Our estimate of P^f is the same as in eq. 2.106, but with this new estimate of γ:

P^f ≈ γ² e e^T    (3.25)

This new estimate of γ also affects the corresponding gain matrix (see eq. 2.107):

K ≈ γ² e e^T H^T (γ² H e e^T H^T + R)^{−1}    (3.26)

Accordingly, the analysis increment of eq. 2.95 becomes:

δx^a = x^a − x^f = K (y^o − H x^f)                              (3.27)
     = γ² e e^T H^T (γ² H e e^T H^T + R)^{−1} (y^o − H x^f)     (3.28)
     = e c                                                      (3.29)

In equation 3.29, c is a scalar coefficient.
When the assimilation interval is sufficiently long with respect to the typical doubling time
and the forecast error becomes large with respect to the observation error, the estimate of γ
from observations leads to significant improvement of assimilation performance.
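The estimate of eq. 3.24 amounts to a one-line projection of the innovation onto the observed unstable direction; a minimal sketch follows (illustrative function name, linear observation operator assumed).

```python
import numpy as np

def estimate_gamma(e, H, d):
    """Estimate the forecast-error amplitude along e from the innovation (eq. 3.24).

    e : unit vector along the unstable direction
    H : linear observation operator (matrix)
    d : innovation vector, d = y_obs - H @ x_f
    """
    He = H @ e
    # gamma ~ -(e^T H^T H e)^(-1) e^T H^T d
    return -float(He @ d) / float(He @ He)
```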
3.3.3
AUS-γ: using the estimate of γ from observations
In this scheme we use the same approach as in AUS-γ0 , but with three important improvements:
• The unstable direction e (normalized to 1) is again estimated via BDAS, by evolving an infinitesimal perturbation δx_{k−1}, so that we can expect the perturbed trajectory to still be on the attractor (or at least not far from it)
• An estimate of the amplitude of the forecast error γ from observations, as described in
subsection 3.3.2; the amplitude of the forecast error is intended to be in the unstable
direction e
• A new estimate of Pfk+1
The second and third improvements have been discussed in subsection 3.3.2: after estimating
the amplitude of the forecast error γ from observations, the forecast error covariance matrix
will be
P^f_{k+1} = γ² e_{k+1} e^T_{k+1}    (3.30)

The gain matrix will be computed with

K_{k+1} = P^f_{k+1} H^T (H P^f_{k+1} H^T + R)^{−1}    (3.31)

which is formally the same as eq. 3.15, but the better estimate of P^f_{k+1} will in turn provide the following better approximated expression, already mentioned in eq. 3.26:

K_{k+1} = γ² e_{k+1} e^T_{k+1} H^T (γ² H e_{k+1} e^T_{k+1} H^T + R)^{−1}

We are now able to compute the analysis state x^a_{k+1} and the analysis covariance matrix P^a_{k+1} through equations 3.16 and 3.17, as usual.

It will be shown in section 3.4 that this algorithm greatly enhances the performance of the AUS scheme.
3.3.4
Iterating
This assimilation scheme is a further improvement to AUS-γ. After calculating the coefficient c at time t_{k+1} (eq. 3.29), we apply to the previous analysis state x^a_k the perturbation

∆x^a_k = c exp(−∫_{t_k}^{t_{k+1}} λ(t) dt) e_k

that, if the error behaved linearly, would lead exactly to the analysis x^a_{k+1} obtained from the analysis increment expression (3.27-3.29). In fact, from time t_k to t_{k+1}, we have:

M e_k = exp(∫_{t_k}^{t_{k+1}} λ(t) dt) e_{k+1}    (3.32)
where λ(t) is the leading local Lyapunov exponent. Therefore:
M ∆x^a_k = c exp(−∫_{t_k}^{t_{k+1}} λ(t) dt) M e_k                                             (3.33)
         = c exp(−∫_{t_k}^{t_{k+1}} λ(t) dt) exp(∫_{t_k}^{t_{k+1}} λ(t) dt) e_{k+1}            (3.34)
         = c e_{k+1}                                                                           (3.35)

We now integrate the system nonlinearly from this new estimate of the previous analysis state x^a_k, evolve the perturbation e_k following the updated nonlinear trajectory from t_k to t_{k+1} and perform the final analysis at time t_{k+1}. This procedure is hereafter referred to as AUS-iterating: it is a major improvement with respect to the AUS-γ scheme.
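A schematic rendering of the back-correction follows. It exploits the fact that, since e_k has unit length, the growth factor exp(∫λ dt) in eq. 3.32 can be estimated from the norm of the evolved bred vector; this shortcut, together with the function name, is an assumption of the sketch and not necessarily the implementation used in this work.

```python
import numpy as np

def iterate_back_correction(x_a_prev, e_prev, c, Me_prev):
    """Back-correction of the previous analysis used by AUS-iterating (schematic).

    x_a_prev : analysis state at the previous step t_k
    e_prev   : unit bred vector e_k at t_k
    c        : scalar analysis-increment coefficient of eq. 3.29, computed at t_{k+1}
    Me_prev  : the evolved vector M e_k at t_{k+1}
    """
    # exp(int lambda dt) = ||M e_k|| because e_{k+1} has unit norm (eq. 3.32)
    growth = np.linalg.norm(Me_prev)
    dx = (c / growth) * e_prev          # Delta x^a_k = c exp(-int lambda dt) e_k
    return x_a_prev + dx
```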
3.3.5
Iterating and using a quasi-static Pf in stable zones of the attractor
This technique is a refinement of AUS-iterating. If the dynamical system is passing through a
zone of the attractor where there are stable trajectories, i.e. where they do not diverge, we can
assume that errors do not actually grow.
So, in these stable situations (where the amplification factor during forecast step is ≤ 1) we
use a quasi-static, diagonal Pf . If:
‖M δx_k‖ < ‖δx_k‖    (3.36)

then we set:

P^f_{k+1} = a² (‖M δx_k‖ / ‖δx_k‖)² I    (3.37)

where M δx_k is the evolution from time t_k to t_{k+1} of the perturbation δx_k, which in turn is the evolution of a random perturbation, introduced at time t_{k−1}, which has grown from time t_{k−1} to t_k and undergone an assimilation process at time t_k. The coefficient a² is proportional to the square of the average analysis error. In particular, it has been chosen as

a² = (1/2) ⟨η^a⟩²    (3.38)
where η^a is the analysis error. This procedure is hereafter referred to as AUS-iterating+: it provides the best performance.
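The switch between the rank-one and the quasi-static forecast covariance (eqs. 3.30 and 3.36-3.37) can be sketched as follows; the names and the way a² is supplied are illustrative.

```python
import numpy as np

def forecast_covariance(e_next, gamma, dx_evolved, dx_prev, a2, ndim=3):
    """Choose P^f as in AUS-iterating+ (schematic).

    In stable zones (amplification factor <= 1) a quasi-static diagonal P^f is used
    (eq. 3.37); otherwise the rank-one unstable-subspace estimate of eq. 3.30 is kept.
    """
    amp = np.linalg.norm(dx_evolved) / np.linalg.norm(dx_prev)
    if amp < 1.0:
        return a2 * amp**2 * np.eye(ndim)          # quasi-static, diagonal P^f (3.37)
    return gamma**2 * np.outer(e_next, e_next)     # unstable-subspace estimate (3.30)
```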
3.4
Comparing results
In this section we will compare the performances of the different assimilation schemes under
investigation. In the following remarks we will use the acronyms of Table 3.1.
We will begin with some case studies of synchronization to the truth with a perfect or quasi-perfect observation (the variable y) and a long assimilation window; then we will deal with
Table 3.1: The different data assimilation schemes under investigation with their main features.

EKF             | pure EKF
EKF-Evensen     | Evensen's EKF, with an optimized Q [11]
EKF-Yang        | Yang's EKF, with random noise in P^a and inflation [36]
AUS-γ0          | AUS with no use of observations for the estimate of the amplitude of the forecast error γ; P^f = γ² e e^T
AUS-γ           | AUS with γ = −(e^T H^T H e)^{−1} e^T H^T d, P^f = γ² e e^T
AUS-iterating   | AUS-γ with iterations
AUS-iterating+  | AUS-iterating with static P^f_{k+1} if amplification factor ≤ 1 during forecast step
shorter assimilation windows and noisy observations in different configurations.
3.4.1
Synchronization: one perfect or quasi-perfect observation, case
studies
In this subsection we present a few case studies about synchronization with truth for perfect
or quasi-perfect observations, for different assimilation windows. The case studies shown have
been chosen for the clarity of the resulting plots and because they are neither particularly advantageous nor pathological for any assimilation scheme. We’re going to compare only EKF,
EKF-Evensen and AUS-iterating assimilation schemes, which do not need any tuning of parameters. We will notice that all of them eventually converge to the truth, but AUS-iterating
converges faster than EKF or EKF-Evensen. Furthermore, for critical situations, in terms of
assimilation window length or observational error amplitude, AUS-iterating still converges to
the truth in cases when the EKF or EKF-Evensen do not.
It’s known that, even with a single perfect observation only, the EKF may synchronize with
the truth, if the assimilation window is not too long. This is shown in Fig. 3.1, where we
observe y with no observational error (σ² = 0) and assimilation window τ = 0.25. You can notice how the AUS-iterating DA scheme converges much faster.
The pure EKF and the EKF-Evensen version give the same results for σ 2 = 0, since the
gain matrix and the analysis vector will reduce in both cases to the same simpler expression.
Figure 3.1 (RMS analysis error vs. time): Synchronization with truth for both EKF/EKF-Evensen and AUS-iterating assimilation schemes. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.25.
From the Kalman gain expression (eq. 2.78):
K_{k+1} = P^f_{k+1} H^T (H P^f_{k+1} H^T + R)^{−1}

if the matrix R = 0, in observation space:

H K_{k+1} = H P^f_{k+1} H^T (H P^f_{k+1} H^T)^{−1}    (3.39)
          = I                                         (3.40)

Thus, from the analysis state expression (eq. 2.79):

x^a_{k+1} = (I − K_{k+1} H) x^f_{k+1} + K_{k+1} y^o_{k+1}

we get:

H x^a_{k+1} = y^o_{k+1}    (3.41)
Just as an example, in Figures 3.2-3.5 the EKF and EKF-Evensen solution for x and the assimilation performance are shown for σ² = 0 and assimilation window τ = 0.6. The vertical dotted line is the time of the last observation. In Figures 3.6-3.10 the same is shown for σ² = 0.01 and assimilation window τ = 0.6, while Figures 3.11-3.15 show the case with σ² = 0.1 and assimilation window τ = 0.25.

Figure 3.2 (solution for x: truth, observations, analysis): EKF and EKF-Evensen (same plot in these conditions, see text and eq. 3.40) find it hard to synchronize with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
We can see that, when synchronization occurs for all DA schemes, AUS-iterating has the
most rapid convergence to the truth; when it does not occur, AUS-iterating has an overall better
performance. Furthermore, there are circumstances in which AUS-iterating synchronizes to the
truth while neither EKF nor EKF-Evensen do.
3.4.2
Noisy observations with variance σ 2 = 2
These experiments are performed with the following setup:
• A 10^5 assimilations statistics
• 3 or 2 noisy observations at each assimilation step, with variance σ 2 = 2 ⇒ σ = 1.414
• an assimilation window τ = 0.25
Figure 3.3 (solution for x): AUS-iterating: synchronization with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
Figure 3.4 (RMS error vs. time): EKF/EKF-Evensen and AUS-iterating RMS errors. While the former fail to converge to the truth, the latter synchronizes very quickly. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.
Figure 3.5 (RMS error vs. time): EKF/EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.4.
Figure 3.6 (solution for x): EKF: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6; in this case and in these conditions the EKF fails to synchronize to the truth. Note also the bad performance around time 14.
Figure 3.7 (solution for x): EKF-Evensen: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6. A better performance than pure EKF.
Figure 3.8 (solution for x): AUS-iterating: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6. A far better performance.
Figure 3.9 (RMS error vs. time): EKF, EKF-Evensen and AUS-iterating RMS errors: respective performances. The only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6.
Figure 3.10 (RMS error vs. time): EKF, EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.9.
Figure 3.11 (solution for x): EKF does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
Figure 3.12 (solution for x): EKF-Evensen does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
Figure 3.13 (solution for x): AUS-iterating does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.
Figure 3.14 (RMS error vs. time): The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25: in these conditions, no assimilation scheme under investigation actually converges to the truth, but AUS-iterating has a better performance than EKF-Evensen, which in turn is far better than pure EKF.
Figure 3.15 (RMS error vs. time): A zoom of the previous Fig. 3.14. Even if the overall performance of the EKF is poor because of filter divergence, this does not mean that the EKF is always worse than the other DA schemes at all times; the overall performance of AUS-iterating, however, remains better.
When 2 observations are used, the most valuable two (y and z) are used for the EKF schemes;
for the AUS schemes two adaptive observations are used (see subsection 2.3.5). In Figures 3.16-3.20 we show the average RMS analysis error distributions for 3 and 2 observed variables, and the forecast error distributions at times T+0.25, T+0.5 and T+0.75, where T is the assimilation
time. All distributions are simply truncated at RMS error equal to 10.
These distributions show that the AUS-iterating schemes are not only better on average than their competitors, but also that their right tails are much less populated: the capability of tracking regime changes has been greatly improved.
Numerical results are also shown in Tables 3.2 and 3.3: we can see that the AUS-iterating schemes outperform the other techniques, with an average RMS analysis error even well below the observation standard deviation. Similar relative performances can be seen for the average
RMS forecast error. A further conclusion may be drawn: from the results of AUS-γ0 and AUS-γ
schemes, we see that the proposed estimate of the forecast error amplitude from observations
is really helpful, since it greatly boosts the assimilation performance.
Table 3.2: RMS analysis error, an average over 100,000 assimilations. 3 and 2 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.

assimilation technique | τ = 0.25, 3 obs | τ = 0.25, 2 obs
EKF            | 15.5 | 15.6
EKF-Evensen    | 1.72 | 1.79
EKF-Yang       | 3.77 | 3.90
AUS-γ0         | 7.02 | 7.37
AUS-γ          | 2.27 | 2.52
AUS-iterating  | 1.38 | 1.58
AUS-iterating+ | 1.16 | 1.33
Table 3.3: RMS forecast error, an average over 100,000 assimilations. The mean RMS analysis error is the same as in Table 3.2 and is shown here again for comparison. 3 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.

Assimilation technique | <RMS analysis error> | <RMS forecast error> @ T+0.25 | @ T+0.50 | @ T+0.75
EKF            | 15.5 | 16.6 | 16.9 | 16.9
EKF-Evensen    | 1.72 | 3.27 | 6.26 | 7.24
EKF-Yang       | 3.77 | 5.36 | 6.97 | 7.64
AUS-γ0         | 7.02 | 8.81 | 10.4 | 11.1
AUS-γ          | 2.27 | 3.79 | 6.23 | 7.33
AUS-iterating  | 1.38 | 2.75 | 4.94 | 5.61
AUS-iterating+ | 1.16 | 2.33 | 4.06 | 4.71
Figure 3.16: RMS Analysis Error distribution: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
Figure 3.17: RMS Analysis Error distribution: an average on 100,000 assimilations with an assimilation window τ = 0.25, 2 noisy observations with variance σ² = 2.
Figure 3.18: RMS Forecast Error distribution @ time=T+0.25: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
Figure 3.19: RMS Forecast Error distribution @ time=T+0.50: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
Figure 3.20: RMS Forecast Error distribution @ time=T+0.75: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.
3.4.3
Noisy observations with variance σ 2 = 1
Now we’re going to diminish the variance of the observational error. Since the relative performance are similar, we will only show the results for 3 observations probing. So the experimental
context is:
• A 100,000 assimilations statistics
• 3 noisy observations at each assimilation step, with variance σ 2 = 1
• an assimilation window τ = 0.25
The results, shown in Table 3.4 and in Figures 3.21-3.24, confirm those with σ 2 = 2. Note
that EKF is still under-performing, and still experiencing frequent filter divergence.
3.4.4
Noisy observations with variance σ 2 = 0.1
Let’s continue to lower observational noise variance: all the experiments are now performed
under the following conditions:
• A 100,000 assimilations statistics
Table 3.4: RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 1. Assimilation window τ = 0.25.

Assimilation technique | <RMS analysis error> | <RMS forecast error> @ T+0.25 | @ T+0.50 | @ T+0.75
EKF            | 15.2 | 16.3 | 16.6 | 16.6
EKF-Evensen    | 1.30 | 2.39 | 4.94 | 5.93
EKF-Yang       | 0.99 | 1.78 | 3.04 | 3.63
AUS-γ0         | 3.55 | 4.93 | 6.65 | 7.50
AUS-γ          | 1.42 | 2.52 | 4.56 | 5.57
AUS-iterating  | 0.96 | 1.94 | 3.77 | 4.39
AUS-iterating+ | 0.80 | 1.63 | 3.07 | 3.65
Figure 3.21: RMS Analysis Error distribution: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
Figure 3.22: RMS Forecast Error distribution @ time=T+0.25: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
Figure 3.23: RMS Forecast Error distribution @ time=T+0.50: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
Figure 3.24: RMS Forecast Error distribution @ time=T+0.75: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.
• 3 noisy observations each assimilation step, with variances σ 2 = 0.1 ⇒ σ = 0.316
• an assimilation window τ = 0.25
The results are shown in Table 3.5 and in Figures 3.25-3.28, where all distributions are simply
truncated where RMS error is equal to 5.
3.4.5
Root Mean Square forecast error: time dependence
Now we show the average growth in time of the RMS forecast error. The average is computed over 100,000 assimilation steps; 3 variables are observed, and the assimilation window is τ = 0.25.
At time t = 0 the curves show the average RMS analysis error (see Tables 3.3, 3.4 and 3.5).
The plots are listed on the basis of the variance of the observational error:
• In the 1st plot, Fig. 3.29: σ 2 = 2
• in the 2nd plot, Fig. 3.30: σ 2 = 1
• in the 3rd plot, Fig. 3.31: σ 2 = 0.1
The EKF curve has been intentionally plotted only in the first one. Again, the AUS-iterating
schemes, in particular AUS-iterating+, set the benchmark.
Table 3.5: RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 0.1 ⇒ σ = 0.316. Assimilation window τ = 0.25.

Assimilation technique | <RMS analysis error> | <RMS forecast error> @ T+0.25 | @ T+0.50 | @ T+0.75
EKF            | 15.0 | 16.0 | 16.4 | 16.4
EKF-Evensen    | 0.49 | 0.79 | 1.86 | 2.67
EKF-Yang       | 0.25 | 0.53 | 1.21 | 1.62
AUS-γ0         | 0.54 | 1.02 | 1.99 | 2.74
AUS-γ          | 0.36 | 0.72 | 1.56 | 2.20
AUS-iterating  | 0.30 | 0.61 | 1.37 | 1.85
AUS-iterating+ | 0.26 | 0.52 | 1.14 | 1.53

Figure 3.25: RMS Analysis Error distribution: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
Figure 3.26: RMS Forecast Error distribution @ time=T+0.25: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
Figure 3.27: RMS Forecast Error distribution @ time=T+0.50: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
Figure 3.28: RMS Forecast Error distribution @ time=T+0.75: an average on 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.
Figure 3.29 (RMS analysis and forecast error vs. time; the horizontal line marks RMS error = σ): an average on 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 2 ⇒ σ = 1.414.
Figure 3.30 (RMS analysis and forecast error vs. time): an average on 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance 1.
3.5
Some illustrative examples
In this section we show some examples of noisy data assimilation, outputs of the different
schemes probed in this work: in subsection 3.5.1 we present a case study with the same truth
trajectory of the system (the variable x only), and the assimilation performances of our DA
schemes. In subsection 3.5.2 we show a 3D plot of the AUS technique with some comments, while in subsection 3.5.3 we provide a step-by-step description of the AUS-iterating scheme.
3.5.1
Comparing AUS with EKF assimilation schemes: case study
In Figures 3.32-3.38 we show, for the same case, the solution for the variable x and the analysis
performed. The vertical green line shows the time of the last observation available for the
assimilation. The experiment setup is:
• assimilation window τ = 0.25
• 3 variables observed with error variance σ 2 = 2
We can see that, in this context, the EKF tends not to capture the regime changes, which is consistent with the average results over 100,000 assimilations shown in Table 3.3.
Figure 3.31 (RMS analysis and forecast error vs. time): an average on 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 0.1 ⇒ σ = 0.316.
Figure 3.32 (solution for x: truth, background, observations, analysis): EKF assimilation scheme: solution for x. The green line shows the time of the last observation available.
Figure 3.33 (solution for x): EKF-Evensen assimilation scheme: solution for x.
Figure 3.34 (solution for x): EKF-Yang assimilation scheme: solution for x.
Figure 3.35 (solution for x): AUS-γ0 assimilation scheme: solution for x.
Figure 3.36 (solution for x): AUS-γ assimilation scheme: solution for x.
Figure 3.37 (solution for x): AUS-iterating assimilation scheme: solution for x.
Figure 3.38 (solution for x): AUS-iterating+ assimilation scheme: solution for x.
Figure 3.39 (3D view of the truth and AUS-γ trajectories with the evolving bred vectors; assimilation window 0.8, perfect observation of y, σ² = 0): How AUS-γ assimilates observations.
EKF-Evensen and EKF-Yang perform definitely better, particularly the former. As for the AUS schemes we
can appreciate the improvements from the under-performing AUS-γ0 up to AUS-iterating.
Note that EKF-Evensen has here a better performance in the forecast period than the AUS-iterating techniques, but this is not the general case: as already mentioned in Table 3.3, the
average performance of EKF-Evensen is worse than that of AUS-iterating, both for analysis
error and for forecast error.
3.5.2
AUS-γ: a 3D example
In Fig. 3.39 we show what is the essence of the AUS approach. We choose:
• a very long assimilation window, τ = 0.8
• only one observed variable: y
• perfect observation, σ 2 = 0
In the plot, the blue trajectory is the truth, the red one is the AUS-γ assimilation trajectory.
The red circles, labelled with t = 0, mark the beginning of both trajectories. The black circles
(t = T ) show the same trajectories after one assimilation window, and the green circles (t = 2T )
one assimilation window ahead. The black vectors are the bred vectors estimated by BDAS (see section 2.4) and evolved by the Tangent Linear Model operator, plotted every ∆t = 0.05. When t = T the assimilation is done along the corresponding bred vector, assumed to capture the most unstable structure of the system (see section 2.3).

Figure 3.40 (3D view of the truth, EKF and AUS-γ trajectories; assimilation window 0.8, noisy observations with σ² = 2, all three variables observed): A qualitative comparison between EKF and AUS-γ.
In Fig. 3.40 we show, as an example, the 3D comparison between EKF and AUS-γ, in the
following context:
• a very long assimilation window, τ = 0.8
• all 3 variables observed
• noisy observations, with σ 2 = 2
Here the bred vectors for AUS-γ have not been plotted, for the sake of clarity. The initial condition is the same for all schemes. After each DA technique does its job for 3 assimilation windows, we plot the trajectories starting from the red circles (t = 0). The blue trajectory is the
truth, the red one is the AUS-γ trajectory, while the green one is that of EKF. Both EKF and
AUS-γ are far from the truth, at the beginning, but at t = T (black circles) the EKF is not
able to capture the regime change, while AUS-γ does.
Figure 3.41 (3D view of the L63 attractor): The zone of the attractor considered in the next Figure 3.42.

3.5.3  AUS-iterating: a step-by-step description
In Fig. 3.41 a 3D plot of the L63 system has been shown, with the highly unstable zone of the
attractor, which is enlarged in Fig. 3.42. Here we set:
• a long assimilation window, τ = 0.25
• 2 adaptive observations
• noisy observations, with σ 2 = 2
The blue trajectory is the truth. The green star is the forecast state, with which the first analysis can be computed (red circle); the red trajectory is the "first-attempt" forecast trajectory. Black vectors are the bred vectors estimated by BDAS (see section 2.4) and evolved by the Tangent Linear Model operator, plotted every ∆t = 0.05. When a new observation becomes available, a new assimilation can be done. In the iteration we go back to the previous assimilation step, correct the previous analysis according to equations 3.28-3.29 and to ∆x^a_k = c exp(−∫_{t_k}^{t_{k+1}} λ(t) dt) e_k (see subsection 3.3.4), and compute the final forecast trajectory (the green one). Of course new bred vectors are computed as well, but are shown only in Fig. 3.43 (in magenta).
Figure 3.42 (zoom of the attractor zone, with the forecast state, first analysis, final analysis and truth marked; assimilation window 0.25, noisy observations with σ² = 2, 2 observations): How AUS-iterating works: a zoom of the previous Fig. 3.41.
Figure 3.43: More details on the AUS-iterating assimilation scheme, including the evolving unstable vectors in the final forecast trajectory, recomputed after iteration.
3.6
Adding two types of Model Error
We now wish to run experiments with some kind of model error in the equations of our L63
system. In subsection 3.6.1, we will start by simply adding a random term to each equation
and at each integration step. In subsection 3.6.3, we will introduce a systematic error in our
model by changing the value of the parameter r, which drives the convection instability (see
subsection 1.4.2).
In short, we will compute the “true” trajectory of the system, we will get noisy observations
of the true state of the system, but we will perform data assimilation using an imperfect model.
3.6.1
Random error
In these experiments we will simulate model error by adding, at each integration step dt = 0.01, a random term to the model equations (eq. 1.49):
"Truth" System (3.42):

dx/dt = σ(y − x)
dy/dt = rx − y − xz
dz/dt = xy − bz

Assimilation Model (3.43):

dx/dt = σ(y − x) + ε1
dy/dt = rx − y − xz + ε2
dz/dt = xy − bz + ε3

where ε1, ε2, ε3 are random additive terms, normally distributed with standard deviation A × 0.1σ, with the parameter A assuming different values.
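For reference, a minimal Python sketch of the imperfect assimilation model of eq. 3.43 is given below. A simple Euler step is used for brevity (the experiments in this work rely on a higher-order integrator, see Appendix A), and noise_std stands for the A × 0.1σ standard deviation mentioned above; both choices are assumptions of the sketch.

```python
import numpy as np

def lorenz63(state, r=28.0, sigma=10.0, b=8.0/3.0):
    """Lorenz 63 tendencies (truth system, eq. 3.42)."""
    x, y, z = state
    return np.array([sigma * (y - x), r * x - y - x * z, x * y - b * z])

def imperfect_step(state, dt=0.01, noise_std=0.0, rng=np.random.default_rng()):
    """One integration step of the assimilation model (eq. 3.43): a random term,
    normally distributed with standard deviation noise_std, is added to each
    tendency. An Euler step is used here only for brevity."""
    eps = noise_std * rng.standard_normal(3)
    return state + dt * (lorenz63(state) + eps)
```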
Table 3.6: Random error in the assimilation model: 3 observations with variance σ² = 2 ⇒ σ = 1.414 at each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations, for the different DA schemes.

assimilation technique | A = 0 | A = 1 | A = 2
EKF            | 15.6 | 15.5 | 15.7
EKF-Evensen    | 1.71 | 1.73 | 1.77
EKF-Yang       | 3.54 | 7.80 | 9.98
AUS-γ0         | 7.02 | 8.87 | 11.0
AUS-γ          | 2.24 | 2.72 | 3.89
AUS-iterating  | 1.35 | 1.94 | 2.75
AUS-iterating+ | 1.15 | 1.80 | 2.44

3.6.2  Random error: comparing performances
All the experiments are performed under the following conditions:
• A 20,000 assimilations statistics
• 3 or 2 noisy observations with variance σ 2 = 2
• an assimilation window τ = 0.25
• random additive terms ε1, ε2, ε3 are added at each integration step dt = 0.01. They are normally distributed with standard deviation A × 0.1σ; the parameter A will assume the values A = 1 and A = 2
• in the 2 observation experiments, we will observe the two most convenient variables, y
and z, for EKF schemes. For AUS schemes, we will observe the two variables associated
with the two largest components of the unstable vector
• In EKF-Evensen scheme, the term Q is the same as in eq. 3.8
• In EKF-Yang, no further tuning of the parameters has been done: the algorithm is the
same as in experiments without model error
We show the results in Tables 3.6 and 3.7. The case with A = 0, i.e. without random error at all, is shown here for reference (even if it was already shown in section 3.4 with longer statistics).
The best AUS schemes are quite robust compared with almost all the other schemes. EKF-Evensen is actually more stable under these conditions, but it needs an estimate of the model error covariance matrix Q.
Table 3.7: Random error in the assimilation model: 2 observations with variance σ² = 2 ⇒ σ = 1.414 at each assimilation window τ = 0.25. We show the mean RMS analysis error, an average over 20,000 assimilations.

assimilation technique | A = 0 | A = 1 | A = 2
EKF            | 15.5 | 15.6 | 15.6
EKF-Evensen    | 1.78 | 1.82 | 1.92
EKF-Yang       | 3.67 | 7.08 | 9.47
AUS-γ0         | 7.46 | 9.35 | 10.9
AUS-γ          | 2.50 | 3.04 | 4.19
AUS-iterating  | 1.54 | 2.22 | 3.27
AUS-iterating+ | 1.28 | 2.11 | 3.07

3.6.3  Systematic error
In these experiments we will simulate a systematic error by augmenting the value of the parameter r in the model equations (eq. 1.49). As we have already seen in subsection 1.4.2, the
parameter r is related to the intensity of the convection instability. We have to keep in mind
that a value r = 28 is a slightly supercritical value for unstable convection to occur. Thus the
equations to be used are:
"Truth" System (3.44):

dx/dt = σ(y − x)
dy/dt = rx − y − xz
dz/dt = xy − bz

Assimilation Model (3.45):

dx/dt = σ(y − x)
dy/dt = (r + ∆r)x − y − xz
dz/dt = xy − bz
The term ∆r will introduce a systematic error in the equations used for data assimilation, by
increasing the instability of convective motion in the model.
Table 3.8: Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Three observations with variance σ² = 2 at each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations.

assimilation technique | r = 28 | r = 30 | r = 33
EKF            | 15.6 | 16.7 | 17.6
EKF-Evensen    | 1.71 | 1.74 | 1.86
EKF-Yang       | 3.54 | 11.4 | 13.9
AUS-γ0         | 7.02 | 11.5 | 14.6
AUS-γ          | 2.24 | 3.91 | 8.01
AUS-iterating  | 1.35 | 2.53 | 4.90
AUS-iterating+ | 1.15 | 2.24 | 3.87

3.6.4  Systematic error: comparing performances
All the experiments are performed under the following conditions:
• A 20,000 assimilations statistics
• 3 or 2 noisy observations with variance σ 2 = 2
• an assimilation window τ = 0.25
• systematic errors ∆r = 0 (for reference), ∆r = 2 and ∆r = 5
• in the 2 observation experiments we will observe the two most convenient variables, y and
z, for EKF schemes; while for AUS schemes we will observe the two variables associated
with the two largest components of the unstable vector
• In EKF-Evensen scheme, the term Q is the same as in eq. 3.8
• In EKF-Yang, no further tuning of the parameters has been done: the algorithm is the
same as in experiments without model error
We show the results in Tables 3.8 and 3.9. The case with r = 28, i.e. without systematic error, is included here for comparison (even if it was already shown in section 3.4 with longer statistics).
As in the random-error case, the schemes considered show a similar relative behavior: the EKF-Evensen scheme does outperform the others, but it needs an estimate of the model error covariance, which is not easy to obtain in realistic models. In these contexts, the best AUS schemes are still quite robust.
Table 3.9: Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Two observations with variance σ² = 2 at each assimilation window τ = 0.25. The mean RMS analysis error is shown, an average over 20,000 assimilations.

assimilation technique | r = 28 | r = 30 | r = 33
EKF            | 15.5 | 16.9 | 17.9
EKF-Evensen    | 1.78 | 1.81 | 1.96
EKF-Yang       | 3.67 | 10.9 | 13.8
AUS-γ0         | 7.46 | 11.6 | 14.8
AUS-γ          | 2.50 | 4.26 | 8.21
AUS-iterating  | 1.54 | 2.76 | 5.38
AUS-iterating+ | 1.28 | 2.53 | 4.88
Chapter 4
Conclusions
In this work we focused on the data assimilation problem for the highly nonlinear, chaotic
Lorenz’s 63 system. Specifically, we tried to estimate the state of the system given a set of
noisy observations at regular time intervals and the equations of the model.
In the contexts we have examined, including those with a model error, we have seen that
Assimilation in the Unstable Subspace (AUS) once again has shown better efficiency than other
advanced data assimilation schemes. Furthermore, at least for sufficiently long assimilation
windows, our proposed approach for the estimate of the forecast error amplitude leads to a
significant improvement of the assimilation performance. In the cases we have considered,
the iterating extension of AUS improves over the standard AUS. In particular, it boosts the efficiency of tracking regime changes, at a low computational cost.
Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned
for the specific model at hand. In NWP models, tuning of parameters — and in particular an
estimate of the model error covariance matrix Q — may turn out to be quite difficult. The
estimate of Q, on the other hand, is at the basis of the good performance of the Evensen model
in particular in the presence of model error. As a final remark, we should note that the proposed
approach may well be implemented in operational NWP models.
Appendices
Appendix A
Euler and Runge-Kutta numerical integration methods
The Runge-Kutta methods are used to numerically approximate the solution of ordinary differential equations: they have much better performance than the first-order Euler method.
The kind of problems we want to solve is the following. Let’s suppose we have a vector x
and a vector function f (x) so that:
dx/dt = f(x)    (A.1)

Given the initial condition x(t_0) = x_0, we seek a systematic way to approximate the solution x(t).
A.1
First order Euler method
The formula for the Euler method to advance a solution from t_n to t_{n+1} ≡ t_n + ∆t, that is, to estimate x(t_n + ∆t) ≈ x_{n+1} given x_n, comes straightforwardly from the definition of derivative:

x_{n+1} = x_n + f(x_n) ∆t + O(∆t²)    (A.2)
where the error E = |x(tn + ∆t) − xn+1 | will tend to zero with ∆t, i.e. E ∝ ∆t.
In the Euler method the solution x_{n+1}, advanced through an interval ∆t, is computed with the derivative information f(x_n) calculated at the initial time t_n only: it is not recommended for practical use because it is not very accurate compared with other methods, nor is it very stable.
A.2
RK2: second order Runge-Kutta scheme
A better convergence to zero of the error can be obtained with the 2nd order Runge-Kutta
scheme, also known as improved Euler method, where the error E = |x(tn +∆t)−xn+1 | ∝ (∆t)2 .
Here the solution xn+1 is computed by using an average for the derivative:
x_{n+1} = x_n + (1/2)(k_1 + k_2) + O(∆t³)    (A.3)

where

k_1 = f(x_n) ∆t                           (A.4)
k_2 = f(x_n + k_1) ∆t = f(x̃_{n+1}) ∆t     (A.5)
The meaning of k1 , k2 and x̃n+1 is the following:
• k1 is a trial step to evaluate (xn+1 − xn ) through the Euler method, as in eq. A.2; the
first evaluation of xn+1 has been called x̃n+1
• k2 is another estimate of (xn+1 − xn ) through the same Euler method, but using f (x̃n+1 ),
which is the derivative in the estimated end of the time interval
So RK2 (eq. A.3) exploits an average between the derivative f (xn ) calculated at the initial
time tn and the derivative f (x̃n+1 ) = f (xn + k1 ) which is an estimate of the derivative f (xn+1 )
calculated at the final time.
A.3
RK4: fourth order Runge-Kutta scheme
We need not limit ourselves to the second order. A very popular Runge-Kutta method is the
fourth order one, in which the error E = |x(tn + ∆t) − xn+1 | ∝ (∆t)4 . Here we use a weighted
average for the derivative:
x_{n+1} = x_n + (1/6)(k_1 + 2k_2 + 2k_3 + k_4) + O(∆t⁵)    (A.6)

where

k_1 = f(x_n) ∆t             (A.7)
k_2 = f(x_n + k_1/2) ∆t     (A.8)
k_3 = f(x_n + k_2/2) ∆t     (A.9)
k_4 = f(x_n + k_3) ∆t       (A.10)
Higher-order RK schemes are conceivable, but not necessarily better: they require more computational effort.
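A compact Python implementation of the RK4 step of eqs. A.6-A.10 is given below for reference; the function name is illustrative.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step (eqs. A.6-A.10)."""
    k1 = dt * f(x)
    k2 = dt * f(x + 0.5 * k1)
    k3 = dt * f(x + 0.5 * k2)
    k4 = dt * f(x + k3)
    return x + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Example: one step of dx/dt = -x starting from x = 1
print(rk4_step(lambda x: -x, np.array([1.0]), 0.01))
```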
Appendix B
Normalization factors for random variables
If x is a random variable with standard deviation σ and Probability Density Function (PDF):
P(x) = (1/(√(2π) σ)) exp(−x²/(2σ²))    (B.1)
and x is a vector whose components are random with standard deviation σ and the same PDF,
it can be shown that:
∫_{−∞}^{+∞} |x| P(x) dx = √(2/π) σ                                     (1-dimensional case)

∫∫_{−∞}^{+∞} ‖x‖ P(x1)P(x2) dx1 dx2 = √(π/2) σ                         (2-dimensional case)

∫∫∫_{−∞}^{+∞} ‖x‖ P(x1)P(x2)P(x3) dx1 dx2 dx3 = √(8/π) σ               (3-dimensional case)
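These factors are easy to verify by Monte Carlo sampling; the following short Python check, not part of the thesis software, is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 1_000_000
for dim, factor in [(1, np.sqrt(2/np.pi)), (2, np.sqrt(np.pi/2)), (3, np.sqrt(8/np.pi))]:
    samples = sigma * rng.standard_normal((n, dim))
    mean_norm = np.linalg.norm(samples, axis=1).mean()
    print(dim, mean_norm, factor * sigma)   # the two values should nearly coincide
```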
Appendix C
EKF, 1-Dimensional example
In the Extended Kalman Filter, if the system is described by a single variable only, the forecast
and analysis covariance matrices and Kalman gain reduce to scalars.
C.1
Forecast and analysis covariance matrices
From the Extended Kalman Filter equations we have:

K_k   = p^f_k / (σ² + p^f_k)                        (C.1)
p^f_k = α_k p^a_{k−1}                               (C.2)
p^a_k = (1 − K_k) p^f_k = σ² p^f_k / (σ² + p^f_k)   (C.3)

Let's find the forecast and analysis covariance matrices step by step:

p_0   ≡ p^a_0
p^f_1 = α_1 p_0
p^a_1 = σ² p^f_1 / (σ² + p^f_1) = σ² α_1 p_0 / (σ² + α_1 p_0)
p^f_2 = α_2 p^a_1 = σ² p_0 α_1 α_2 / (σ² + α_1 p_0)
p^a_2 = σ² p^f_2 / (σ² + p^f_2) = σ² p_0 α_1 α_2 / (σ² + p_0 (α_1 + α_1 α_2))
p^f_3 = α_3 p^a_2 = σ² p_0 α_1 α_2 α_3 / (σ² + p_0 (α_1 + α_1 α_2))
p^a_3 = σ² p^f_3 / (σ² + p^f_3) = σ² p_0 α_1 α_2 α_3 / (σ² + p_0 (α_1 + α_1 α_2 + α_1 α_2 α_3))
...
p^f_k = σ² p_0 α_1 α_2 … α_k / (σ² + p_0 (α_1 + α_1 α_2 + … + α_1 α_2 … α_{k−1}))
      = σ² p_0 ∏_{j=1}^{k} α_j / (σ² + p_0 Σ_{i=1}^{k−1} ∏_{j=1}^{i} α_j)
p^a_k = σ² p_0 α_1 α_2 … α_k / (σ² + p_0 (α_1 + α_1 α_2 + … + α_1 α_2 … α_k))
      = σ² p_0 ∏_{j=1}^{k} α_j / (σ² + p_0 Σ_{i=1}^{k} ∏_{j=1}^{i} α_j)
K_k   = p^a_k / σ²

In these equations we have also shown the result for the Kalman gain K_k, which can be easily computed from K_k = p^f_k / (σ² + p^f_k). The analysis, background and observational errors are bound by the following general equation [16]:

ε^a = ε^b + (B^{−1} + H^T R^{−1} H)^{−1} H^T R^{−1} (ε^o − H ε^b)    (C.4)
In this 1-dimensional example, if H = 1 and R = σ², it becomes:

ε^a = ε^b + σ^{−2} / ((p^f_k)^{−1} + σ^{−2}) (ε^o − H ε^b)
    = ε^b + p^f_k / (p^f_k + σ²) (ε^o − ε^b)
    = (1 − K_k) ε^b + K_k ε^o                                   (C.5)

If α_k = α = constant > 1, then

lim_{k→∞} p^f_k = σ² (α − 1)         (C.6)
lim_{k→∞} p^a_k = σ² (α − 1)/α       (C.7)
lim_{k→∞} K_k   = (α − 1)/α

Indeed:

lim_{k→∞} p^f_k = lim_{k→∞} σ² p_0 α^k / (σ² + p_0 (α + α² + … + α^{k−1}))
                = lim_{k→∞} σ² p_0 α / (σ²/α^{k−1} + p_0 (1/α^{k−2} + … + 1/α + 1))
                = σ² p_0 α / (p_0 /(1 − 1/α))
                = σ² (α − 1)
and

lim_{k→∞} p^a_k = lim_{k→∞} σ² p_0 α^k / (σ² + p_0 (α + α² + … + α^k))
                = lim_{k→∞} σ² p_0 / (σ²/α^k + p_0 (1/α^{k−1} + … + 1/α + 1))
                = σ² p_0 / (p_0 /(1 − 1/α))
                = σ² (α − 1)/α

since, for α > 1, 1/α < 1. So:

lim_{k→∞} K_k = σ² (α − 1) / (σ² + σ² (α − 1)) = (α − 1)/α

or, equivalently:

lim_{k→∞} (1 − K_k) = 1/α

C.2  Lyapunov exponents for free and forced systems
Now we redefine α as the error growth (instead of the analysis error growth), so now 1 − K ≈ 1/α², or 1 − K ≈ exp(−2λτ). The error growth of the forced system will be:

e^{fτ} = (1 − K) e^{λτ}    (C.8)

where f and λ are the greatest Lyapunov exponents of the forced and free systems respectively. So, if α_k = α = constant > 1:

e^{fτ} = (1 − K) e^{λτ} = e^{λτ} / e^{2λτ} = e^{−λτ}   ⟹   f = −λ

If the α_k are not constant the results are almost the same. If α ≫ 1 then 1 − K ≈ 0, or K ≈ 1, so the forecast will be almost ignored and the analysis error covariance will tend to the observational error: p^a = σ²(1 − 1/α) ≈ σ².

If we diminish the observational interval τ (τ → 0):

p^a = σ²(1 − exp(−2λτ)) → 0    (C.9)
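The constant-α limits (C.6)-(C.7) are easy to verify numerically; the following short Python check, not part of the thesis software, simply iterates the scalar recursion.

```python
# Numerical check of the scalar EKF limits (C.6)-(C.7): iterate the recursion
# p_f = alpha * p_a, p_a = sigma2 * p_f / (sigma2 + p_f) with constant alpha > 1.
sigma2, alpha, p_a = 1.0, 2.0, 0.1
for _ in range(200):
    p_f = alpha * p_a
    p_a = sigma2 * p_f / (sigma2 + p_f)
print(p_f, sigma2 * (alpha - 1.0))            # both converge to 1.0
print(p_a, sigma2 * (alpha - 1.0) / alpha)    # both converge to 0.5
```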
Appendix D
Inverse and pseudo-inverse matrices
An inverse matrix can be defined for square matrices only. The pseudo-inverse matrix, instead,
is a generalization that can also be defined for rectangular matrices.
D.1
Inverse matrix
Given an N × N square matrix A, the inverse matrix A−1 , if it does exist, has the same
dimensions as A and is such that
A^{−1} A = A A^{−1} = I    (D.1)

where I is the N × N identity matrix. The matrix A has the inverse matrix A^{−1} if and only if A is non-singular, i.e.:

det(A) = |A| ≠ 0    (D.2)
The matrix A−1 can be written as follows:
A^{−1} = (1/|A|) C^T    (D.3)

where

C^T = [ c11  c21  ⋯  cN1
        c12  c22  ⋯  cN2
         ⋮    ⋮   ⋱   ⋮
        c1N  c2N  ⋯  cNN ]    (D.4)

is the transpose of the cofactor matrix of A.
Example. Given a $3 \times 3$ non-singular matrix $A$
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\tag{D.5}
\]
with $\det(A) = |A| \neq 0$, the cofactor matrix $C$ is
\[
C = \begin{pmatrix}
+\det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} &
-\det\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} &
+\det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \\[2ex]
-\det\begin{pmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{pmatrix} &
+\det\begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix} &
-\det\begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix} \\[2ex]
+\det\begin{pmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{pmatrix} &
-\det\begin{pmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{pmatrix} &
+\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\end{pmatrix}
\]
so its transpose is
\[
C^T = \begin{pmatrix}
+(a_{22}a_{33} - a_{32}a_{23}) & -(a_{12}a_{33} - a_{32}a_{13}) & +(a_{12}a_{23} - a_{22}a_{13}) \\
-(a_{21}a_{33} - a_{31}a_{23}) & +(a_{11}a_{33} - a_{31}a_{13}) & -(a_{11}a_{23} - a_{21}a_{13}) \\
+(a_{21}a_{32} - a_{31}a_{22}) & -(a_{11}a_{32} - a_{31}a_{12}) & +(a_{11}a_{22} - a_{21}a_{12})
\end{pmatrix}
\]
and the inverse matrix $A^{-1}$ is
\[
A^{-1} = \frac{1}{|A|}
\begin{pmatrix}
+(a_{22}a_{33} - a_{32}a_{23}) & -(a_{12}a_{33} - a_{32}a_{13}) & +(a_{12}a_{23} - a_{22}a_{13}) \\
-(a_{21}a_{33} - a_{31}a_{23}) & +(a_{11}a_{33} - a_{31}a_{13}) & -(a_{11}a_{23} - a_{21}a_{13}) \\
+(a_{21}a_{32} - a_{31}a_{22}) & -(a_{11}a_{32} - a_{31}a_{12}) & +(a_{11}a_{22} - a_{21}a_{12})
\end{pmatrix}
\]
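The explicit transposed-cofactor expression above translates directly into code; the following C sketch (the test matrix is an arbitrary non-singular choice, not taken from this work) builds $A^{-1}$ from (D.3) and checks that $AA^{-1} = I$:
\begin{verbatim}
/* Inverse of a 3x3 matrix from A^{-1} = C^T / det(A), with the transposed
 * cofactor matrix written out explicitly as above.                          */
#include <stdio.h>

int main(void)
{
    double a[3][3] = { {2.0, 1.0, 0.0},
                       {1.0, 3.0, 1.0},
                       {0.0, 1.0, 2.0} };   /* arbitrary non-singular matrix */
    double inv[3][3], det, s;
    int i, j, k;

    det = a[0][0]*(a[1][1]*a[2][2] - a[2][1]*a[1][2])
        - a[0][1]*(a[1][0]*a[2][2] - a[2][0]*a[1][2])
        + a[0][2]*(a[1][0]*a[2][1] - a[2][0]*a[1][1]);

    /* entries of C^T divided by det(A) */
    inv[0][0] =  (a[1][1]*a[2][2] - a[2][1]*a[1][2]) / det;
    inv[0][1] = -(a[0][1]*a[2][2] - a[2][1]*a[0][2]) / det;
    inv[0][2] =  (a[0][1]*a[1][2] - a[1][1]*a[0][2]) / det;
    inv[1][0] = -(a[1][0]*a[2][2] - a[2][0]*a[1][2]) / det;
    inv[1][1] =  (a[0][0]*a[2][2] - a[2][0]*a[0][2]) / det;
    inv[1][2] = -(a[0][0]*a[1][2] - a[1][0]*a[0][2]) / det;
    inv[2][0] =  (a[1][0]*a[2][1] - a[2][0]*a[1][1]) / det;
    inv[2][1] = -(a[0][0]*a[2][1] - a[2][0]*a[0][1]) / det;
    inv[2][2] =  (a[0][0]*a[1][1] - a[1][0]*a[0][1]) / det;

    /* check: A A^{-1} should print the identity matrix */
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            s = 0.0;
            for (k = 0; k < 3; k++)
                s += a[i][k] * inv[k][j];
            printf("%8.4f", s);
        }
        printf("\n");
    }
    return 0;
}
\end{verbatim}
For larger matrices the cofactor formula is of course impractical, and a factorization (LU, or the SVD) is preferable.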
Lastly, we recall some other useful facts. It can be shown that if the matrix $A$ can be written as a product of two invertible matrices, then the inverse $A^{-1}$ can be expressed in terms of the product of their inverses:
\[
A = BC \quad\Longrightarrow\quad A^{-1} = C^{-1}B^{-1}
\tag{D.6}
\]
If the matrix $A$ admits the inverse $A^{-1}$, then we also have:
\[
(A^T)^{-1} = (A^{-1})^T
\tag{D.7}
\]
\[
(kA)^{-1} = k^{-1}A^{-1}
\tag{D.8}
\]

D.2  Pseudo-inverse matrix
Given an $M \times N$ matrix $A$, the pseudo-inverse $A^{+}$ has the same dimensions as $A^T$ and satisfies the following relations:
\[
AA^{+}A = A
\tag{D.9}
\]
\[
A^{+}AA^{+} = A^{+}
\tag{D.10}
\]
\[
(AA^{+})^{*} = AA^{+} \quad\Longrightarrow\quad AA^{+}\ \text{is Hermitian}
\tag{D.11}
\]
\[
(A^{+}A)^{*} = A^{+}A \quad\Longrightarrow\quad A^{+}A\ \text{is Hermitian}
\tag{D.12}
\]
where $A^{*}$ is the conjugate transpose of the matrix $A$, i.e. the matrix obtained by taking the transpose of $A$ and then the complex conjugate of each element. If the matrix $A$ is real, then $A^{*} = A^T$. If the inverse of $A^{*}A$ exists, then
\[
A^{+} = (A^{*}A)^{-1}A^{*}
\tag{D.13}
\]
which for a real matrix reduces to $A^{+} = (A^T A)^{-1}A^T$. If the inverse $A^{-1}$ of the matrix $A$ exists, then it can be shown that $A^{-1} = A^{+}$.
Example. Given a $3 \times 2$ real matrix $A$
\[
A = \begin{pmatrix} 1 & 0 \\ 2 & 5 \\ 7 & 3 \end{pmatrix}
\]
then its pseudo-inverse $A^{+}$ is
\[
A^{+} = \begin{pmatrix} 0.0389 & -0.0994 & 0.1657 \\ -0.0354 & 0.2377 & -0.0629 \end{pmatrix}
\]
which satisfies the four properties above:
\[
(AA^{+})^T = AA^{+} = \begin{pmatrix}
0.0389 & -0.0994 & 0.1657 \\
-0.0994 & 0.9897 & 0.0171 \\
0.1657 & 0.0171 & 0.9714
\end{pmatrix}
\]
\[
(A^{+}A)^T = A^{+}A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
Furthermore, since
\[
A^T A = \begin{pmatrix} 54 & 31 \\ 31 & 34 \end{pmatrix}
\]
then
\[
(A^T A)^{-1} = \begin{pmatrix} 0.0389 & -0.0354 \\ -0.0354 & 0.0617 \end{pmatrix}
\]
and
\[
A^{+} = (A^T A)^{-1} A^T = \begin{pmatrix} 0.0389 & -0.0994 & 0.1657 \\ -0.0354 & 0.2377 & -0.0629 \end{pmatrix}
\]
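A short C sketch (values taken from the example above) reproduces $A^{+}$ through the real-case formula $A^{+} = (A^T A)^{-1}A^T$ of (D.13), inverting the $2 \times 2$ matrix $A^T A$ with the cofactor formula of Section D.1:
\begin{verbatim}
/* Pseudo-inverse of the 3x2 example above via A+ = (A^T A)^{-1} A^T
 * (real matrix of full column rank).                                   */
#include <stdio.h>

int main(void)
{
    double A[3][2]   = { {1.0, 0.0}, {2.0, 5.0}, {7.0, 3.0} };
    double AtA[2][2] = { {0.0, 0.0}, {0.0, 0.0} };
    double inv[2][2], Ap[2][3], det;
    int i, j, k;

    for (i = 0; i < 2; i++)                 /* A^T A  (2x2)          */
        for (j = 0; j < 2; j++)
            for (k = 0; k < 3; k++)
                AtA[i][j] += A[k][i] * A[k][j];

    det = AtA[0][0]*AtA[1][1] - AtA[0][1]*AtA[1][0];
    inv[0][0] =  AtA[1][1] / det;           /* (A^T A)^{-1}          */
    inv[0][1] = -AtA[0][1] / det;
    inv[1][0] = -AtA[1][0] / det;
    inv[1][1] =  AtA[0][0] / det;

    for (i = 0; i < 2; i++) {               /* A+ = (A^T A)^{-1} A^T */
        for (j = 0; j < 3; j++) {
            Ap[i][j] = 0.0;
            for (k = 0; k < 2; k++)
                Ap[i][j] += inv[i][k] * A[j][k];
            printf("%9.4f", Ap[i][j]);
        }
        printf("\n");
    }
    return 0;
}
\end{verbatim}
The printed entries coincide, to the quoted precision, with the pseudo-inverse given above.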
Bibliography
[1] Andrews, D. G., 2000: An Introduction to Atmospheric Physics. Cambridge University
Press. Cambridge, UK.
[2] Arnold, V. I., 1986: Metodi Matematici della Meccanica Classica. Editori Riuniti Edizioni
MIR, Roma, Italia.
[3] Carrassi, A., Trevisan, A. and Uboldi, F. 2006: Adaptive observations and assimilation in
the unstable subspace by breeding on the data assimilation system. Tellus, 59A-1, 101-113.
[4] Charney, J. G., 1951: Dynamical forecasting by numerical process. Compendium of meteorology. American Meteorological Society, Boston, MA, USA.
[5] Chin, T. M., Turmon, M. J., Jewell, J. B., Ghil, M. 2005. An ensemble-based smoother
with optimized weights for highly nonlinear systems, in print.
[6] Corazza, M., Kalnay, E., Patil, D. J., Yang, S. C., Morss, R., Cai, M., Szunyogh, I., Hunt,
B. R. and Yorke, J. A. 2003: Use of the breeding technique to estimate the structure of
the analysis “error of the day”. Nonlin. Processes Geophys. 10, 223-243.
[7] Courtier, P. and O. Talagrand, 1990: Variational assimilation of meteorological observations with the direct and adjoint shallow water equations. Tellus, 42A, 531-549.
[8] Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press. Cambridge, UK.
[9] Ehrendorfer, M., 2002: The Liouville equation in atmospheric predictability. ECMWF
Seminar Proceedings series 2002, Predictability of Weather and Climate.
[10] Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99 (C5), 10143-10162.
[11] Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon.
Wea. Rev., 125, 1342-1354.
[12] Hénon, M., 1976: A two-dimensional mapping with a strange attractor. Comm. Math.
Phys. 50, 69-77.
[13] Holton, J. R., An Introduction to Dynamic Meteorology. Academic Press. Orlando, USA.
[14] Ide, K., P. Courtier, M. Ghil and A. C. Lorenc, 1997: Unified notation for data assimilation:
Operational, sequential and variational. J. Meteor. Soc. Japan, 75, 181-189.
[15] Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, New
York, NY, USA.
[16] Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge
University Press. Cambridge, UK.
[17] Lewis, J. and J. Derber, 1985: The use of adjoint equations to solve a variational adjustment
problem with advective constraint. Tellus, 37A, 309-322.
[18] Lorenz, E. N., 1963: Deterministic non-periodic flows. J. Atmos. Sci. 20, 130-141.
[19] Lorenz, E. N., 1993: The Essence of Chaos. University of Washington Press. Seattle, USA.
[20] Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, United Kingdom, ECMWF, 1-18.
[21] Lorenz, E. N. and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: simulation with a small model. J. Atmos. Sci. 55, 399-414.
[22] May, R. M., 1974: Stability and Complexity in Model Ecosystems. Princeton University
Press. Princeton, NJ, USA.
[23] Maybeck, P. S., 1979: Stochastic Models: Estimation and Control, Volume 1. Academic
Press, New York, NY, USA.
[24] Nicolis, C., 2003: Dynamics of Model Error: Some Generic Features. J. Atmos. Sci. 60,
2208-2218.
[25] Nicolis, C., 2004: Dynamics of Model Error: The Role of Unresolved Scales Revisited. J.
Atmos. Sci. 61, 1740-1753.
[26] Ott E., Sauer T. and Yorke J. A., 1994: Coping with Chaos: Analysis of Chaotic Data and
the Exploitation of Chaotic Systems. John Wiley & Sons Inc., New York.
[27] Peixoto, J. P. and A. H. Oort, 1992: Physics of Climate. American Institute of Physics,
New York, USA.
[28] Poincaré, J. H., 1997 (reprint): Scienza e Metodo. Giulio Einaudi editore SpA, Torino,
Italia.
[29] Pratt, W. J., Raiffa H. and Schlaifer R., 1995: Introduction to Statistical Decision Theory.
The MIT Press, Cambridge, MA, USA.
[30] Saltzman, B., 1962: Finite amplitude free convection as an initial value problem–I. J.
Atmos. Sci. 19, 329-341.
[31] Strogatz, S. H., 1994: Nonlinear Dynamics and Chaos. Westview Press. Cambridge, USA.
[32] Trevisan, A. and F. Pancotti, 1998: Periodic orbits, Lyapunov vectors and singular vectors
in the Lorenz system. J. Atmos. Sci. 55, 390-398.
[33] Trevisan, A. and F. Uboldi, 2004: Assimilation of Standard and Targeted Observations
within the Unstable Subspace of the Observation-Analysis-Forecast Cycle System. J. Atmos. Sci. 61, 103-113.
[34] Uboldi, F., Trevisan, A. and Carrassi, A. 2005: Developing a dynamically based assimilation method for targeted and standard observations. Nonlin. Processes in Geophys. 12,
149-156.
[35] Uboldi, F. and A. Trevisan, 2006: Detecting unstable structures and controlling error
growth by assimilation of standard and adaptive observations in a primitive equation ocean
model. Nonlin. Processes in Geophys. 13, 67-81.
[36] Yang et al., 2006: Data Assimilation as Synchronization of Truth and Model: Experiments
with the Three-Variable Lorenz System. J. Atmos. Sci. 63, 2340-2354.