# Tesi Pilolli Massimo

ALMA MATER STUDIORUM, UNIVERSITÀ DI BOLOGNA

PhD Programme in Physical Modelling for Environmental Protection (Dottorato di Ricerca in Modellistica Fisica per la Protezione dell'Ambiente), XX Cycle

Department of Earth and Geo-Environmental Sciences, Department of Physics

Scientific-Disciplinary Sector FIS/06: Physics for the Earth System and the Circumterrestrial Medium

Coordinator: Prof. Ezio Todini

A Dynamical System Approach to Data Assimilation in Chaotic Models

PhD candidate: Dott. Massimo Pilolli

Advisors: Prof. Rolando Rizzi, Dott.ssa Anna Trevisan

Year 2008

To my father and my mother

"A very small cause, which escapes us, determines a considerable effect that we cannot fail to see, and then we say that this effect is due to chance. If we knew exactly the laws of nature and the situation of the universe at the initial moment, we could predict exactly the situation of that same universe at a later moment. But even if the laws of nature held no more secrets for us, we could still know the initial situation only approximately. If that allows us to predict the later situation with the same approximation, that is all we need, and we say that the phenomenon has been predicted, that it is governed by laws. But it is not always so: it may happen that small differences in the initial conditions produce very great ones in the final phenomena; a small error in the former would produce an enormous error in the latter. Prediction becomes impossible, and we have the fortuitous phenomenon."

Jules-Henri Poincaré, Science et Méthode, 1908

## Acknowledgements

First of all, I wish to write a note in memory of Edward Norton Lorenz, who died in Cambridge, MA, USA, on April 16, 2008. He was not only the father of Chaos Theory and the butterfly effect, one of the great scientific revolutions of the past century, but also the grandfather of all meteorologists. I regret that I never had the chance to meet him.
I wish to thank my advisor Anna Trevisan (ISAC-CNR), one of the strongest and most motivated women I have ever met: her deep insight into data assimilation, her constant support and our enlightening discussions made this work possible. Special thanks go to Alberto Carrassi and Francesco Uboldi for their frequent, friendly help. I also wish to thank Prof. Rolando Rizzi and Prof. Ezio Todini for their support, useful suggestions and comments. Truly unforgettable were the kind hospitality of, and the useful discussions with, Prof. Catherine Rouvas-Nicolis and Dr. Stéphane Vannitsem at the Department of Meteorological Research and Development of the Institut Royal Météorologique/Koninklijk Meteorologisch Instituut (the Royal Meteorological Institute of Belgium) in Brussels, where I met very nice and friendly people: thank you very much (merci beaucoup/dank u wel)! And finally, a huge thank-you to Stefania, for her patience and helpfulness!

The software used in this work was developed on the basis of software written at the NERC Data Assimilation Research Centre, Department of Meteorology, University of Reading, United Kingdom. All experiments were run on a Debian 3.1 Linux computer, using both Matlab 6.5.0.180913a Release 13 and the gcc/g++ 3.3.5 C/C++ compiler. Some figures were produced with gnuplot 4.0.

M. P., Bologna, April 2008

## Contents

- Acknowledgements
- List of Figures
- List of Tables
- Abbreviations
- Partial List of Symbols
- 1 Introduction
- 1.1 A historical perspective
- 1.2 Data Assimilation
- 1.3 Framing the problem
- 1.3.1 Basic features of chaotic systems, a historical note
- 1.3.2 Conservative, nonconservative dynamical systems
- 1.3.3 Probability Density Functions and Liouville equation
- 1.3.4 The Markov processes and the Fokker-Planck equation
- 1.3.5 The Liouville Theorem
- 1.3.6 Dissipative systems, attractors and strange attractors
- 1.3.7 Small perturbations dynamics, tangent linear model, adjoint model
- 1.3.8 Lyapunov vectors and Lyapunov exponents
- 1.3.9 Bred vectors
- 1.3.10 Dynamical systems described by maps
- 1.4 Lorenz's three dimensional chaotic system (1963)
- 1.4.1 The equations
- 1.4.2 The meaning of variables and parameters
- 1.4.3 Why it is so important: chaos implies limited predictability
- 1.4.4 Lyapunov exponents, dimensionality and doubling time
- 1.5 The aim of this study
- 1.6 Notation and conventions
- 1.6.1 Model space and observational space
- 1.6.2 The observation operator $\mathcal{H}$ and its linear approximation $\mathbf{H}$
- 1.6.3 Error vectors
- 1.6.4 Error covariance matrices
- 1.6.5 Operators and vectors: a low dimensional example
- 2 Data Assimilation: state of the art
- 2.1 Variational assimilation
- 2.1.1 Maximum likelihood approach
- 2.1.2 Bayesian approach
- 2.1.3 3D-Var scheme
- 2.2 Sequential assimilation
- 2.2.1 Kalman Filter
- 2.2.2 Extended Kalman Filter
- 2.2.3 Ensemble Kalman Filter
- 2.3 AUS: Assimilation in the Unstable Subspace
- 2.3.1 AUS: how it works
- 2.3.2 AUS: a simple example
- 2.3.3 Refresh procedure
- 2.3.4 Using a single bred vector for assimilation
- 2.3.5 Adaptive observation strategy
- 2.4 BDAS: Breeding on the Data Assimilation System
- 2.4.1 Standard breeding method
- 2.4.2 BDAS: how it works
- 2.4.3 BDAS, an example of practical implementation
- 3 Assimilation in the Lorenz 63 model: comparison among different methods
- 3.1 Experimental setups
- 3.2 Extended Kalman Filters
- 3.2.1 EKF
- 3.2.2 Evensen's version of EKF
- 3.2.3 Yang's version of EKF
- 3.3 Assimilation in the Unstable Subspace: further developments
- 3.3.1 AUS-$\gamma_0$: no use of observations in the estimate of the forecast error amplitude
- 3.3.2 Estimate of the amplitude $\gamma$ of the forecast error from observations
- 3.3.3 AUS-$\gamma$: using the estimate of $\gamma$ from observations
- 3.3.4 Iterating
- 3.3.5 Iterating and using a quasi-static $\mathbf{P}^f$ in stable zones of the attractor
- 3.4 Comparing results
- 3.4.1 Synchronization: one perfect or quasi-perfect observation, case studies
- 3.4.2 Noisy observations with variance $\sigma^2 = 2$
- 3.4.3 Noisy observations with variance $\sigma^2 = 1$
- 3.4.4 Noisy observations with variance $\sigma^2 = 0.1$
- 3.4.5 Root Mean Square forecast error: time dependence
- 3.5 Some illustrative examples
- 3.5.1 Comparing AUS with EKF assimilation schemes: case study
- 3.5.2 AUS-$\gamma$: a 3D example
- 3.5.3 AUS-iterating: a step by step description
- 3.6 Adding two types of Model Error
- 3.6.1 Random error
- 3.6.2 Random error: comparing performances
- 3.6.3 Systematic error
- 3.6.4 Systematic error: comparing performances
- 4 Conclusions
- Appendices
- A Euler and Runge-Kutta numerical integration methods
- A.1 First order Euler method
- A.2 RK2: second order Runge-Kutta scheme
- A.3 RK4: fourth order Runge-Kutta scheme
- B Normalization factors for random variables
- C EKF, 1-Dimensional example
- C.1 Forecast and analysis covariance matrices
- C.2 Lyapunov exponents for free and forced systems
- D Inverse and pseudo-inverse matrices
- D.1 Inverse matrix
- D.2 Pseudo-inverse matrix
- Bibliography

## List of Figures

- 1.1 A 6-hour data assimilation cycle for weather forecasts. Analyses are computed every 6 hours, typically at 0000 ZT, 0600 ZT, 1200 ZT, 1800 ZT.
- 1.2 A set of Brownian processes starting at $x(0) = 0$. The random forcing has zero mean and standard deviation $\sigma = 1$.
- 1.3 Bifurcation diagram for the Logistic map, $0 \le r \le 4$.
- 1.4 Bifurcation diagram for the Logistic map, $2.8 \le r \le 4$. Notice the period-3 window around $r = 3.83$.
- 1.5 Bifurcation diagram for the Logistic map, a zoom. We can observe the self-similarity properties of this diagram.
- 1.6 Hénon map: strange attractor for classical parameters $a$ and $b$ (see text).
- 1.7 Lorenz's 1963 model: a comparison among first-order Euler and second- and fourth-order Runge-Kutta schemes for 1000 time steps $\Delta t = 0.01$. The initial point belongs to the attractor and is the same for all schemes: $(x_0, y_0, z_0) = (14.2041, 15.0165, 34.7172)$.
- 1.8 Lorenz's 1963 model: a comparison among first-order Euler and second- and fourth-order Runge-Kutta schemes for the variable $x$.
  The initial point belongs to the attractor and is the same for all schemes: $(x_0, y_0, z_0) = (14.2041, 15.0165, 34.7172)$.
- 1.9 Lorenz's 1963 model attractor.
- 1.10 Lorenz's 1963 model attractor projected on the $xz$ plane.
- 1.11 Lorenz's 1963 model attractor projected on the $xy$ plane.
- 1.12 Lorenz's 1963 model attractor projected on the $yz$ plane.
- 1.13 Lorenz's 1963 model: time dependence of $x$.
- 1.14 Lorenz's 1963 model: time dependence of $y$.
- 1.15 Lorenz's 1963 model: time dependence of $z$.
- 1.16 Lorenz's 1963 map: values of the relative maximum $z_{\max}(n)$ and successive relative maximum $z_{\max}(n + 1)$. See Fig. 1.15.
- 1.17 A simple case: 3 grid points (e, f, g) and 2 observations (1, 2).
- 2.1 How the Kalman Filter works.
- 3.1 Synchronization with truth for both EKF/EKF-Evensen and AUS-iterating assimilation schemes. The only observed variable is $y$ with variance $\sigma^2 = 0$ and assimilation window $\tau = 0.25$.
- 3.2 EKF and EKF-Evensen (same plot in these conditions, see text and eq. 3.40) find it hard to synchronize with truth. The only observed variable is $y$ with variance $\sigma^2 = 0$ and assimilation window $\tau = 0.6$.
- 3.3 AUS-iterating: synchronization with truth. The only observed variable is $y$ with variance $\sigma^2 = 0$ and assimilation window $\tau = 0.6$.
- 3.4 EKF/EKF-Evensen and AUS-iterating RMS errors. While the former fail to converge to the truth, the latter synchronizes very quickly.
  The only observed variable is $y$ with variance $\sigma^2 = 0$ and assimilation window $\tau = 0.6$.
- 3.5 EKF/EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.4.
- 3.6 EKF: the only observed variable is $y$ with variance $\sigma^2 = 0.01$ and assimilation window $\tau = 0.6$; in this case and in these conditions the EKF fails to synchronize to the truth. Note also the bad performance around time 14.
- 3.7 EKF-Evensen: the only observed variable is $y$ with variance $\sigma^2 = 0.01$ and assimilation window $\tau = 0.6$. A better performance than pure EKF.
- 3.8 AUS-iterating: the only observed variable is $y$ with variance $\sigma^2 = 0.01$ and assimilation window $\tau = 0.6$. A far better performance.
- 3.9 EKF, EKF-Evensen and AUS-iterating RMS errors: respective performances. The only observed variable is $y$ with variance $\sigma^2 = 0.01$ and assimilation window $\tau = 0.6$.
- 3.10 EKF, EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.9.
- 3.11 EKF does not synchronize with truth. The only observed variable is $y$ with variance $\sigma^2 = 0.1$ and assimilation window $\tau = 0.25$.
- 3.12 EKF-Evensen does not synchronize with truth. The only observed variable is $y$ with variance $\sigma^2 = 0.1$ and assimilation window $\tau = 0.25$.
- 3.13 AUS-iterating does not synchronize with truth. The only observed variable is $y$ with variance $\sigma^2 = 0.1$ and assimilation window $\tau = 0.25$.
- 3.14 The only observed variable is $y$ with variance $\sigma^2 = 0.1$ and assimilation window $\tau = 0.25$: in these conditions, no assimilation scheme under investigation actually converges to the truth, but AUS-iterating performs better than EKF-Evensen, which in turn is far better than pure EKF.
- 3.15 A zoom of the previous Fig. 3.14. Even if the global performance of EKF is poor because of filter divergence, this does not mean that EKF is worse than the other DA schemes at all times; the overall performance of AUS-iterating is nevertheless better.
- 3.16 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 2$.
- 3.17 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 2 noisy observations with variance $\sigma^2 = 2$.
- 3.18 RMS Forecast Error distribution at time $T + 0.25$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 2$.
- 3.19 RMS Forecast Error distribution at time $T + 0.50$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 2$.
- 3.20 RMS Forecast Error distribution at time $T + 0.75$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 2$.
- 3.21 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 1$.
- 3.22 RMS Forecast Error distribution at time $T + 0.25$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 1$.
- 3.23 RMS Forecast Error distribution at time $T + 0.50$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 1$.
- 3.24 RMS Forecast Error distribution at time $T + 0.75$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 1$.
- 3.25 RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 0.1$.
- 3.26 RMS Forecast Error distribution at time $T + 0.25$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 0.1$.
- 3.27 RMS Forecast Error distribution at time $T + 0.50$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 0.1$.
- 3.28 RMS Forecast Error distribution at time $T + 0.75$: an average over 100,000 assimilations with an assimilation window $\tau = 0.25$, 3 noisy observations with variance $\sigma^2 = 0.1$.
- 3.29 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window $\tau = 0.25$, 3 variables observed with variance $\sigma^2 = 2 \Rightarrow \sigma = 1.414$.
- 3.30 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window $\tau = 0.25$, 3 variables observed with variance 1.
- 3.31 RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window $\tau = 0.25$, 3 variables observed with variance $\sigma^2 = 0.1 \Rightarrow \sigma = 0.316$.
- 3.32 EKF assimilation scheme: solution for $x$. The green line shows the time of the last observation available.
- 3.33 EKF-Evensen assimilation scheme: solution for $x$.
- 3.34 EKF-Yang assimilation scheme: solution for $x$.
- 3.35 AUS-$\gamma_0$ assimilation scheme: solution for $x$.
- 3.36 AUS-$\gamma$ assimilation scheme: solution for $x$.
- 3.37 AUS-iterating assimilation scheme: solution for $x$.
- 3.38 AUS-iterating+ assimilation scheme: solution for $x$.
- 3.39 How AUS-$\gamma$ assimilates observations.
- 3.40 A qualitative comparison between EKF and AUS-$\gamma$.
- 3.41 The zone of the attractor considered in Fig. 3.42.
- 3.42 How AUS-iterating works: a zoom of Fig. 3.41.
- 3.43 More details on the AUS-iterating assimilation scheme, including the evolving unstable vectors in the final forecast trajectory, recomputed after iteration.

## List of Tables

- 2.1 Breeding on the Data Assimilation System: introducing the perturbations and estimating the unstable subspace. This is a specific example in which the breeding time is $\Delta t = 2\tau$, where $\tau = t_{k+1} - t_k$ is the assimilation window.
- 3.1 The different data assimilation schemes under investigation with their main features.
- 3.2 RMS analysis error, an average over 100,000 assimilations. 3 and 2 noisy observations with variance $\sigma^2 = 2 \Rightarrow \sigma = 1.414$. Assimilation window $\tau = 0.25$.
- 3.3 RMS forecast error, an average over 100,000 assimilations. The mean RMS analysis error is the same as in Table 3.2 and is shown here again for comparison. 3 noisy observations with variance $\sigma^2 = 2 \Rightarrow \sigma = 1.414$. Assimilation window $\tau = 0.25$.
- 3.4 RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance $\sigma^2 = 1$. Assimilation window $\tau = 0.25$.
- 3.5 RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance $\sigma^2 = 0.1 \Rightarrow \sigma = 0.316$. Assimilation window $\tau = 0.25$.
- 3.6 Random error in the assimilation model: 3 observations with variance $\sigma^2 = 2 \Rightarrow \sigma = 1.414$ at each assimilation window $\tau = 0.25$. The mean RMS analysis error, an average over 20,000 assimilations, is shown for the different DA schemes.
- 3.7 Random error in the assimilation model: 2 observations with variance $\sigma^2 = 2 \Rightarrow \sigma = 1.414$ at each assimilation window $\tau = 0.25$. The mean RMS analysis error, an average over 20,000 assimilations, is shown.
- 3.8 Systematic error in the assimilation model, $r = 28$ (i.e. no error, for reference), $r = 30$, $r = 33$. Three observations with variance $\sigma^2 = 2$ at each assimilation window $\tau = 0.25$. The mean RMS analysis error, an average over 20,000 assimilations, is shown.
- 3.9 Systematic error in the assimilation model, $r = 28$ (i.e. no error, for reference), $r = 30$, $r = 33$. Two observations with variance $\sigma^2 = 2$ at each assimilation window $\tau = 0.25$. The mean RMS analysis error, an average over 20,000 assimilations, is shown.
## Abbreviations

- 3D-Var: 3-dimensional variational analysis
- 4D-Var: 4-dimensional variational analysis
- AUS: Assimilation in the Unstable Subspace
- BDAS: Breeding on the Data Assimilation System
- BV: Bred Vector
- CFL: Courant-Friedrichs-Lewy
- DA: Data Assimilation
- ECMWF: European Centre for Medium-Range Weather Forecasts
- EKF: Extended Kalman Filter
- EnKF: Ensemble Kalman Filter
- ENIAC: Electronic Numerical Integrator And Computer
- FPE: Fokker-Planck Equation
- GCM: General Circulation Model
- KF: Kalman Filter
- L63: Lorenz's 1963 three-variable convective system
- LAM: Local Area Model
- LE: Liouville Equation
- LLV: Local Lyapunov vector
- LT: Liouville Theorem
- LV: Lyapunov vector
- NWP: Numerical Weather Prediction
- OI: Optimal Interpolation
- PDF: Probability Density Function
- QG: Quasi-Geostrophic
- RK: Runge-Kutta
- RK2: second-order Runge-Kutta scheme
- RK4: fourth-order Runge-Kutta scheme
- RMS: Root Mean Square
- SCM: Successive Correction Method
- TLM: Tangent Linear Model
- UTC: Universal Time, Coordinated. Also referred to as ZT
- ZT: Zulu Time. Also referred to as UTC

## Partial list of symbols

Symbols are listed in alphabetical order, first the Latin alphabet, then the Greek one. Uppercase symbols precede lowercase ones.
- $\mathbf{E}$: matrix whose columns are the unstable vectors
- $\mathcal{H}$: observation operator
- $\mathbf{H}$: linearized observation operator
- $\mathbf{I}$: identity matrix
- $\mathbf{K}$: gain matrix
- $\mathcal{M}$: nonlinear model operator
- $\mathbf{M}$: linear model operator, tangent linear model operator
- $\mathbf{P}^a$: analysis error covariance matrix
- $\mathbf{P}^f$: forecast error covariance matrix
- $\mathbf{Q}$: model error covariance matrix
- $\mathbf{R}$: observational error covariance matrix
- $S^{\mathrm{Mod}}$: model space
- $S^{\mathrm{ob}}$: observational space
- $\mathbf{e}$: leading local Lyapunov vector, single unstable column vector
- $f$: generic function, Coriolis parameter
- $\mathbf{x}^a$: analysis state column vector
- $\mathbf{x}^b$: background, or "first guess", state column vector
- $\mathbf{x}^f$: forecast state column vector
- $\mathbf{y}^b$: background, or "first guess", values at observation points
- $\mathbf{y}^o$: observation column vector (includes noise)
- $\mathbf{y}^t$: true values of the observed variables, column vector
- $\boldsymbol{\Gamma}$: diagonal matrix whose elements are the estimates of the amplitudes of the errors along the unstable vectors
- $\boldsymbol{\Lambda}$: diagonal matrix whose elements are the amplification factors of the unstable vectors
- $\boldsymbol{\Omega}$: Earth's angular rotation vector
- $\delta\mathbf{x}^a$: analysis increment $\mathbf{x}^a - \mathbf{x}^b$
- $\delta\mathbf{y}^o$: innovation $\mathbf{y}^o - \mathcal{H}(\mathbf{x}^b)$
- $\gamma$: estimate of error amplitude
- $\boldsymbol{\eta}^M$: model error
- $\boldsymbol{\eta}^a$: analysis error $\mathbf{x}^a - \mathbf{x}^t$
- $\boldsymbol{\eta}^f$: forecast error $\mathbf{x}^f - \mathbf{x}^t$
- $\boldsymbol{\eta}^o$: observational error $\mathbf{y}^o - \mathcal{H}(\mathbf{x}^t)$
- $\lambda_i$: $i$-th Lyapunov exponent
- $\tau$: assimilation window

## Chapter 1 Introduction

The atmosphere plays an important role in the weather and climate system, which also includes the hydrosphere (oceans), cryosphere (ice and snow), lithosphere (soil) and biosphere (living systems). All these components interact with each other [8]. The atmosphere is also well known to be a chaotic system with an enormous number of degrees of freedom and with many scales of motion, in both space and time: its predictability has a finite time horizon [31].
In this context, to accurately estimate the state of the system we need to extract as much information as possible from observational data, when available, and from the equations governing the evolution of the system, to the extent that they are known: this is the goal of Data Assimilation, a fundamental step in Numerical Weather Prediction (NWP). To develop new theoretical work, we address the problem of data assimilation in chaotic systems in the framework of Lorenz's three-variable convective model (1963). The aim is to develop data assimilation schemes that are applicable to different contexts and computationally affordable in operational environments.

### 1.1 A historical perspective

It was a Norwegian meteorologist, Vilhelm Bjerknes, who first realized in 1904 that weather forecasting is an initial value problem, to be solved by integrating the set of governing equations of the atmosphere, starting from an accurate estimate of the initial conditions obtained from observations. This was the first explicit recognition that the future state of the atmosphere can be completely and deterministically calculated from its initial state and known boundary conditions, together with seven equations: Newton's equations of motion (three equations for the three velocity components), the continuity equation (conservation of mass), the equation of state for ideal gases, the first law of thermodynamics (conservation of energy) and a conservation equation for water mass [16]. Bjerknes was able to persuade Norwegians to build a network of surface observation stations, founded the renowned Bergen School of synoptic and dynamic meteorology and proposed the famous polar front theory of cyclogenesis.

In 1922 Lewis F. Richardson suggested integrating the equations of motion of the atmosphere numerically and described exactly how this could be done. His first attempt at a weather forecast was actually unsuccessful, but this did not diminish the value of his seminal work.
The growth of the observational network on the one hand, and the introduction of reliable computing machines on the other, revived interest in Richardson's approach. An important issue to address was the balancing of initial conditions: if they are not in quasi-geostrophic balance[^1], inertia-gravity waves will arise and propagate horizontally. After a while, their amplitude drops drastically, leaving a field in quasi-geostrophic balance: this process is called geostrophic adjustment. The time scale for this process is of the order[^2] of $f^{-1}$, approximately 12 h [16]. Moreover, in Richardson's first attempt, the integration of the equations resulted in computational instability, due to a violation of the Courant-Friedrichs-Lewy (CFL) condition, which requires the time step to be smaller than the grid size divided by the speed of the fastest waves (sound waves, moving at about 340 m/s).

To address these problems, in 1948-49 Jule C. Charney and Eliassen introduced "filtered" equations of motion, based on quasi-geostrophic balance (i.e. slowly varying, thus filtering out gravity and sound waves) and on pressure fields alone. In 1950 Charney, R. Fjørtoft and J. von Neumann performed on ENIAC, one of the first computers built[^3], the first historical 24 h weather forecast, using a barotropic one-layer filtered model: the results were encouraging. The initial values, though, were still set by subjective analysis, relying on the judgment of experienced analysts.

In 1949 Panofsky made the first attempt to replace subjective analysis with an automatic procedure, called objective analysis, although how "objective" it is actually depends on the algorithms used. This procedure used a polynomial expansion to fit all the observations around several grid points in a given area. The first operational numerical weather forecasts were issued in Sweden by Rossby and his group in September 1954.
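The CFL condition stated above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the thesis software: the function names, the 200 km grid spacing and the 6-hour step are hypothetical values chosen only for illustration, while the sound speed of about 340 m/s comes from the text.

```python
def max_stable_time_step(grid_spacing_m, wave_speed_m_s):
    """Upper bound (in seconds) on the time step from the CFL bound dt < dx / c."""
    if grid_spacing_m <= 0 or wave_speed_m_s <= 0:
        raise ValueError("grid spacing and wave speed must be positive")
    return grid_spacing_m / wave_speed_m_s


def satisfies_cfl(dt_s, grid_spacing_m, wave_speed_m_s):
    """True if the time step is smaller than grid size / fastest wave speed."""
    return dt_s < max_stable_time_step(grid_spacing_m, wave_speed_m_s)


SOUND_SPEED_M_S = 340.0      # fastest waves quoted in the text
GRID_SPACING_M = 200_000.0   # hypothetical 200 km grid, for illustration only

dt_max = max_stable_time_step(GRID_SPACING_M, SOUND_SPEED_M_S)  # roughly 588 s
five_minutes_ok = satisfies_cfl(300.0, GRID_SPACING_M, SOUND_SPEED_M_S)
six_hours_ok = satisfies_cfl(6 * 3600.0, GRID_SPACING_M, SOUND_SPEED_M_S)
```

Under these assumed numbers a step of a few minutes respects the bound, whereas a step of several hours violates it by more than an order of magnitude. Filtering out sound and gravity waves lowers the fastest wave speed and therefore relaxes the bound.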
[^1]: The quasi-geostrophic equations hold for large-scale, low-frequency motions, except at low latitudes [1].

[^2]: The Coriolis parameter is $f \approx 2\Omega \sin\varphi_0$, where $\Omega$ is the Earth's angular velocity and $\varphi_0$ the latitude.

[^3]: The ENIAC (Electronic Numerical Integrator And Computer) was built in 1946. It was not, in fact, the first working computer ever built. The first fully functional, freely programmable computers in the world were the Z1, Z2 and Z3 of the German engineer Konrad Zuse, electro-mechanical machines built between 1936 and 1941. The next generation used vacuum tubes: the Atanasoff-Berry Computer, built in 1939 at Iowa State University, USA, and the British Colossus, operating since December 1943 in Bletchley Park, Milton Keynes, UK. Both the Zuse and the Atanasoff-Berry computers were conceptual milestones in Computer Science, because the former introduced the idea of programmable machines and the latter the use of the binary system and other innovations. Nevertheless, it was the Colossus that had a truly enormous impact on human history, because it contributed greatly to the defeat of Nazism: it was used during World War II to decrypt the most important messages transmitted between the German Army field marshals and their Central High Command in Berlin.

In the meantime, new advances were introduced by Gilchrist and Cressman (1954), who also used a polynomial expansion, but with a local rather than areal fit of observations: the idea of the radius of influence was born. At each grid point they used a polynomial function to approximate the fields, taking into account only the observations near the grid point, i.e. within the radius of influence. Two more elements introduced by the same authors were adopted in later works: an automatic check of data quality, and an a priori estimate of the analysis, obtained from a previous numerical forecast. This preliminary estimate is now referred to as the background field, first guess field, or prior estimate.
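The idea of a local fit within a radius of influence can be illustrated with a small sketch. It is a deliberate simplification and not the scheme of Gilchrist and Cressman: it works in one dimension, uses a first-degree polynomial fitted by least squares, and the function name `local_linear_analysis` and the sample data are hypothetical.

```python
def local_linear_analysis(grid_x, observations, radius):
    """Value at grid_x of a least-squares line fitted to nearby observations.

    observations is a list of (position, value) pairs. Only observations
    within `radius` of the grid point are used. Returns None when fewer
    than two observations fall inside the radius of influence.
    """
    # Positions are taken relative to the grid point, so the fitted
    # intercept is directly the analysis value at the grid point.
    nearby = [(x - grid_x, v) for x, v in observations if abs(x - grid_x) <= radius]
    n = len(nearby)
    if n < 2:
        return None
    sum_d = sum(d for d, _ in nearby)
    sum_v = sum(v for _, v in nearby)
    sum_dd = sum(d * d for d, _ in nearby)
    sum_dv = sum(d * v for d, v in nearby)
    det = n * sum_dd - sum_d * sum_d
    if det == 0:
        # All nearby observations share one position: fall back to their mean.
        return sum_v / n
    # Intercept of the normal equations for v = a + b * d.
    return (sum_v * sum_dd - sum_d * sum_dv) / det


# Three observations lie on the line v = 2x + 1; the fourth is far away
# and is ignored because it falls outside the radius of influence.
obs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (10.0, 100.0)]
value_at_grid_point = local_linear_analysis(1.5, obs, radius=2.0)
```

With the sample data the distant observation has no effect on the grid-point value, which is the practical meaning of a local rather than areal fit.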
During these pioneering efforts to run reliable NWP models, it quickly turned out that the accuracy of a model strongly depends on its spatial resolution. In general, the higher the resolution, the higher the accuracy of the model but, of course, the higher the computational cost as well. This is because, due to computational stability requirements, doubling the 3-dimensional space resolution also requires doubling the time resolution. This implies a factor of $2^4 = 16$ in the total computational cost (three spatial dimensions and one time dimension). As a result, running atmospheric models has always been a challenging task for supercomputers, and the model size has always been driven, in turn, by the available computing capacity.

Further improvements came along: in 1955 Bergthorsson and Döös developed an analysis method that eventually became known as “successive correction”. They reduced the computational cost of the interpolation procedures by specifying an a priori weight for each observation, to be determined on a statistical basis. In 1961 Thompson proposed to take full advantage of the propagation of information from well-observed regions to data-void ones. A common situation in the global observing system is the presence of zones where new observations are regularly available (e.g. densely populated regions) and regions where new observational data are scarce (e.g. oceans and deserts). An objective analysis can be made for the former areas, and some sort of educated guess for the latter. The integration of an NWP model then provides a forecast valid at the next observation time. At that time we have new observations for the data-rich areas and model-output data for the data-poor ones: information has propagated.
When computers became sufficiently fast, scientists turned to the primitive equations instead of the filtered ones, and introduced regional models (or Local Area Models, LAMs) alongside General Circulation Models (GCMs), with a GCM setting the boundary conditions for the regional model. Thus, a regional model can be used for short-range forecasts only, as its high-quality initial conditions are lost due to the “information advection” of the ever-changing boundary conditions driven by the GCM. Furthermore, as a consequence of increasing computing power, meteorologists now tend to drop the hydrostatic approximation (in which the vertical acceleration is neglected with respect to the gravitational one) in regional models.

1.2 Data Assimilation

In his pioneering paper cited in the section above [4], Charney stressed the need for an objective analysis of meteorological data, so as not to rely upon time-consuming human activities and subjective interpretations. This process is nowadays referred to as Data Assimilation (DA): its goal is to produce an optimal, automatic estimate of the state of a dynamical system from incomplete, noisy observations and (approximate) knowledge of the laws governing the evolution of the system. In the initialization of forecast models of the ocean and the atmosphere, DA is performed cyclically. The term Data Assimilation Cycle describes this cyclic procedure, which encompasses the following steps:

• Quality control of observational data
• Objective analysis
• Initialization of the forecast model
• Short-range forecast, used to estimate the next background field

Quality control is a very delicate step, because it has been shown that the analysis can be highly sensitive to quality control decisions [8]. Generally speaking, each datum is checked against its neighbors, and is further required to be spatially and temporally consistent. Observations can be checked against the background, too.
The objective analysis step exploits both the available observational data, $\mathbf{y}^o$, and the background field, $\mathbf{x}^b$, computed numerically starting from the previous observation time. Obtaining the background or first guess “observations” is a matter of interpolating the model forecast to the observational stations and converting model variables to observed variables. The first guess “observations” are thus $H(\mathbf{x}^b)$, where $H$ is the operator performing the required interpolation and conversion from model variables to observation space. It should be noted that the operator $H$ is not linear, in general. The difference $\mathbf{y}^o - H(\mathbf{x}^b)$ between the observations and the first guess “observations” is usually called the observational increment or innovation [16]. The analysis state $\mathbf{x}^a$ is computed by adding the innovation to the model background field, with weights $\mathbf{W}$ determined by estimating the statistical error covariances of the forecast and of the observations:

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{W}\left[\mathbf{y}^o - H(\mathbf{x}^b)\right] \tag{1.1}$$

Many analysis techniques, such as the Successive Correction Method (SCM), Optimal Interpolation (OI), 3-dimensional variational analysis (3D-Var) and the Kalman Filter (KF), use eq. 1.1, but with different ways of calculating the weights $\mathbf{W}$.

Figure 1.1: A 6-hour data assimilation cycle for weather forecasts. Analyses are computed every 6 hours, typically at 0000 ZT, 0600 ZT, 1200 ZT, 1800 ZT.

After calculating the analysis state $\mathbf{x}^a$, the forecast model can be initialized to obtain the routine forecast $\mathbf{x}^f$: this is actually the aim of the entire procedure. The numerical short-range forecast used to estimate the background field for the next observation time is usually the output of a high-resolution model that implements the primitive equations. This model, called the assimilation model, has a complex set of parameterizations such that, if no new observations become available, the model climate (computed by time-averaging a long run of the model) approximates the true climate.
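As a hedged illustration of eq. 1.1, consider a single scalar variable that is observed directly, so that H reduces to the identity. The weight used below, W = σb²/(σb² + σo²), is the standard least-squares choice for this scalar case; it is an assumption of this sketch, not a formula given in the text, and the function name and numerical values are hypothetical.

```python
def analysis_update(xb, yo, var_b, var_o):
    """Scalar form of eq. 1.1, assuming H is the identity.

    xb    : background (first guess) value
    yo    : observed value
    var_b : background error variance (assumed known)
    var_o : observation error variance (assumed known)
    """
    innovation = yo - xb                  # yo - H(xb)
    weight = var_b / (var_b + var_o)      # assumed least-squares weight W
    return xb + weight * innovation, weight


# Hypothetical numbers: equally reliable background and observation.
xa, w = analysis_update(xb=10.0, yo=12.0, var_b=1.0, var_o=1.0)
print(xa, w)
```

With equal error variances the analysis falls halfway between background and observation; as the background error variance goes to zero the weight vanishes and the observation is ignored.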
Short-range numerical forecasts, typically 6 h ahead, have replaced the simple use of climatology as the background field. One approach to assimilating observations taken at various times is 4D-Var (e.g. Lewis and Derber [17], Courtier and Talagrand [7]). This assimilation system is the operational DA scheme used at ECMWF and Météo-France, among other important meteorological centers.

1.3 Framing the problem

This work is focused on data assimilation in chaotic systems. In order to introduce chaotic systems, in this section we briefly describe the general properties of conservative and nonconservative systems; then we survey the basics of dissipative and chaotic systems, attractors, Lyapunov vectors and exponents, and bred vectors. We also provide classic examples of chaotic dynamical systems, described both by differential equations and by maps.

1.3.1 Basic features of chaotic systems, a historical note

The fundamental property of a chaotic system is the sensitive dependence on initial conditions, discovered by Jules-Henri Poincaré in 1897 for a simplified three-body problem: a planetary system including 2 stars and a small “asteroid” [16]. Later, in a remarkable 1908 monograph (Science et Méthode), he wrote down these basic concepts about chaos (Livre premier, § IV):

“A very small cause which escapes our notice determines a considerable effect that we cannot fail to see, and then we say that the effect is due to chance. If we knew exactly the laws of nature and the situation of the universe at the initial moment, we could predict exactly the situation of the same universe at a succeeding moment. But even if it were the case that the natural laws had no longer any secret for us, we could still know the situation only approximately. If that enables us to predict the succeeding situation with the same approximation, that is all we require, and we will say that the phenomenon has been predicted, that it is governed by the laws.
But it is not always so; it may happen that small differences in the initial conditions produce very great ones in the final phenomena. A small error in the former will produce an enormous error in the latter. Prediction becomes impossible and we have the fortuitous phenomenon.”
Jules-Henri Poincaré, 1908

Although this note was written in 1908, it has not become dated. Thanks to the increasing computational capability of computers, the meteorologist Edward N. Lorenz, now Professor Emeritus at the Massachusetts Institute of Technology, “rediscovered” chaos in the early 1960s while examining a relatively simple mathematical model of weather: the same model we will survey in section 1.4 below.

1.3.2 Conservative, nonconservative dynamical systems

Generally speaking, the state of a dynamical system changes with time. A dynamical system that evolves continuously in time is known as a flow [19]. It can be described by a set of differential equations giving the evolution of the state of the system from its previous states:

$$\dot{\mathbf{x}} = \mathbf{F}(\mathbf{x}) \tag{1.2}$$

where $\mathbf{x}(t) = (x_1(t), x_2(t), \ldots, x_N(t))$ is the $N$-dimensional vector describing the state of the system at time $t$: the vector $\mathbf{x}(t)$ can be unambiguously mapped, through a bijection, to a point in the phase space of the system. The term $\dot{\mathbf{x}} = d\mathbf{x}/dt$ is the time derivative of $\mathbf{x}(t)$, and $\mathbf{F}(\mathbf{x}) = (F_1(\mathbf{x}), F_2(\mathbf{x}), \ldots, F_N(\mathbf{x}))$ is an $N$-dimensional vector function of the state $\mathbf{x}$ of the system. Given the initial state $\mathbf{x}(0)$ we can deterministically calculate the trajectory, or orbit, $\mathbf{x}(t)$ of the system for all future times. Here the time variable $t$ is continuous. In a system where particles move without friction, called a Hamiltonian system, Liouville’s theorem (see subsection 1.3.5) ensures that the volume of any subset of points in phase space is conserved: if each point of the initial subset is evolved forward in time, the resulting set of points has the same volume as the initial one.
So the system is also called conservative. In a nonconservative system, instead, time evolution does not preserve volumes in phase space.

1.3.3 Probability Density Functions and Liouville equation

In a dynamical system of the form $d\mathbf{x}/dt = \mathbf{F}(\mathbf{x}, t)$ there may be uncertainties due to a poor estimate of the initial conditions, or even to imperfect knowledge of the model used: a statistical approach is then standard practice. The probability density function (PDF) associated with the continuous variable $\mathbf{x}(t)$ can be thought of as describing the set of possible realizations, in phase space, of the outcome of the model:

$$\rho = \rho(\mathbf{x}(t), t) \tag{1.3}$$

If the PDF is Gaussian, a usual assumption in data assimilation, it is completely defined by the mean of the state $\mathbf{x}(t)$ and by the second moment about the mean, i.e. its variance:

$$\langle \mathbf{x} \rangle \equiv \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \mathbf{x}\, \rho(\mathbf{x}(t), t)\, d\mathbf{x} \tag{1.4}$$

$$\left\langle (\mathbf{x} - \langle \mathbf{x} \rangle)^2 \right\rangle \equiv \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (\mathbf{x} - \langle \mathbf{x} \rangle)^2\, \rho(\mathbf{x}(t), t)\, d\mathbf{x} \tag{1.5}$$

The Liouville equation (LE) is the probabilistic description of the time-dependent evolution of an ensemble of solutions of the numerical model $d\mathbf{x}/dt = \mathbf{F}(\mathbf{x}, t)$ started from different initial conditions [9]. It governs the time evolution of the PDF $\rho(\mathbf{x}(t), t)$ associated with the model state $\mathbf{x}(t)$. It may be written in a simplified form as follows:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{F}) = 0 \tag{1.6}$$

or in the more complete form:

$$\frac{\partial \rho(\mathbf{x}(t), t)}{\partial t} + \sum_{i=1}^{N} \frac{\partial}{\partial x^i} \left[ \rho(\mathbf{x}(t), t)\, F^i(\mathbf{x}(t), t) \right] = 0 \tag{1.7}$$

where $F^i$ is the $i$-th component of the vector function $\mathbf{F}$. The Liouville equation is a conservation equation: it states that the local change of $\rho$ at a particular point of phase space must be equal to the net flux of realizations across the faces of an infinitesimal volume around that point; equivalently, the Liouville equation means that the phase space integral of the realization density is constant in time. It is an inhomogeneous partial differential equation, linear in the PDF $\rho$, which is the single dependent variable.
1.3.4 The Markov processes and the Fokker-Planck equation

If our dynamical system is forced by some sort of stochastic noise, which could arise from the model’s misrepresentation of reality, the general equation reads:

$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{F}(\mathbf{x}(t), \mathbf{q}(t), t) \tag{1.8}$$

where $\mathbf{q}(t)$ is the vector of random disturbances. If $\mathbf{q}(t)$ is a Markov process, its future probability law depends only on the present state, not on how the system reached that state. An example of a Markov process is Brownian motion:

$$x(t_n) = x(t_{n-1}) + w(t_n) \tag{1.9}$$

where $w$ is a white Gaussian forcing. A few examples of Brownian motions, with standard deviation of the Gaussian forcing $\sigma = 1$, are shown in Fig. 1.2. Furthermore, if in eq. 1.8 $\mathbf{q}(t)$ represents an additive white Gaussian forcing function, we can write [15]:

$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{F}(\mathbf{x}(t), t) + \frac{d\mathbf{q}(t)}{dt} \tag{1.10}$$

because white Gaussian noise can be thought of as the derivative of Brownian motion [15]. Equation 1.10 is sometimes called the Langevin equation. It can be shown that, since $\mathbf{q}(t)$ is a Markov process, so is $\mathbf{x}(t)$. We can now write the solution of eq. 1.10 in the form

$$\mathbf{x}(t) = M[\mathbf{x}(t_0)] + \mathbf{q}(t) \tag{1.11}$$

Figure 1.2: A set of Brownian processes starting at x(0) = 0. The random forcing has zero mean and standard deviation σ = 1. (Axes: t from 0 to 1000, x(t).)

The Fokker-Planck equation (FPE) describes the evolution of the PDF associated with these stochastic systems: it adds a random term, such as for example a model error, to the Liouville equation 1.7. This term has the form of a diffusion component:

$$\frac{\partial \rho(\mathbf{x}(t), t)}{\partial t} = - \sum_{i=1}^{N} \frac{\partial \left[ \rho(\mathbf{x}(t), t)\, F^i(\mathbf{x}(t)) \right]}{\partial x^i} + \frac{1}{2} \sum_{i,j=1}^{N} \frac{\partial^2 \left[ \rho(\mathbf{x}(t), t)\, (\mathbf{Q})_{i,j} \right]}{\partial x^i\, \partial x^j} \tag{1.12}$$

where the first sum refers to the drift and the second to diffusion: the Fokker-Planck equation describes the time evolution of a PDF due to both of them. The matrix $\mathbf{Q}$ is the stochastic noise covariance matrix. It should be noted that the FPE is linear in $\rho$, which is the only dependent variable.
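The discrete Brownian motion of eq. 1.9 can be sketched in a few lines. This is a minimal illustration, not the code used for Fig. 1.2: the function names, the seed and the ensemble size are hypothetical choices. Each new value depends only on the current one (the Markov property), and the variance of the position after n steps is expected to be close to nσ².

```python
import random


def brownian_path(n_steps, sigma=1.0, rng=None):
    """Return [x(t_0), ..., x(t_n)] for eq. 1.9, starting at x(t_0) = 0."""
    rng = rng or random.Random()
    path = [0.0]
    for _ in range(n_steps):
        # x(t_n) = x(t_{n-1}) + w(t_n), with w white Gaussian forcing
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path


def final_variance(n_paths, n_steps, sigma=1.0, seed=0):
    """Sample variance of the end point over an ensemble of paths."""
    rng = random.Random(seed)
    finals = [brownian_path(n_steps, sigma, rng)[-1] for _ in range(n_paths)]
    mean = sum(finals) / n_paths
    return sum((v - mean) ** 2 for v in finals) / (n_paths - 1)


# Hypothetical ensemble: 2000 paths of 100 steps, sigma = 1.
var = final_variance(n_paths=2000, n_steps=100)
print(var)  # expected to be close to n_steps * sigma**2 = 100
```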
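1.3.5 The Liouville Theorem

As already mentioned in subsection 1.3.2, the Liouville Theorem (LT) states that a conservative system $\dot{\mathbf{x}} = \mathbf{F}(\mathbf{x})$ conserves in time the volume of any subset of points in phase space. In nonconservative systems this does not occur: in particular, as we will prove below, an initial volume $V_0 = V(0)$ of a given phase space region $D_0$ evolves according to [26, 2]:

$$\left. \frac{dV}{dt} \right|_{t=0} = \int_{D_0} \nabla \cdot \mathbf{F}(\mathbf{x})\, d\mathbf{x} \tag{1.13}$$

If $\nabla \cdot \mathbf{F}(\mathbf{x})$ does not depend on the vector $\mathbf{x}$, eq. 1.13 simplifies:

$$\left. \frac{dV}{dt} \right|_{t=0} = \nabla \cdot \mathbf{F} \int_{D_0} d\mathbf{x} \tag{1.14}$$

$$\frac{dV_0}{dt} = V_0\, \nabla \cdot \mathbf{F} \tag{1.15}$$

Since the choice of the initial time is arbitrary, the same relation holds for the volume $V$ at any later time. Rearranging and integrating from time $t_0 = 0$ to $t$:

$$\int_{V_0}^{V} \frac{dV'}{V'} = \nabla \cdot \mathbf{F} \int_{t_0}^{t} dt' \tag{1.16}$$

$$\ln \frac{V}{V_0} = t\, \nabla \cdot \mathbf{F} \tag{1.17}$$

So finally, in this particular case:

$$V(t) = V_0\, e^{t\, \nabla \cdot \mathbf{F}} \tag{1.18}$$

Proof of eq. 1.13. In the phase space of a dynamical system described by eq. 1.2 we can define the phase flow $g^t$:

$$g^t : \mathbf{x}(0) \to \mathbf{x}(t) \tag{1.19}$$

and, by definition of the Jacobian $\partial g^t \mathbf{x} / \partial \mathbf{x}$, for all $t$ we have:

$$V(t) = \int_{D_0} \det \left( \frac{\partial g^t \mathbf{x}}{\partial \mathbf{x}} \right) d\mathbf{x} \tag{1.20}$$

For $t \to 0$:

$$g^t(\mathbf{x}) = \mathbf{x} + \mathbf{F}(\mathbf{x})\, t + O(t^2) \tag{1.21}$$

Thus

$$\frac{\partial g^t \mathbf{x}}{\partial \mathbf{x}} = \mathbf{I} + \frac{\partial \mathbf{F}}{\partial \mathbf{x}}\, t + O(t^2) \tag{1.22}$$

But for any $N \times N$ square matrix $\mathbf{A} = (a_{ij})$ and for $t \to 0$ the following relation holds:

$$\det(\mathbf{I} + \mathbf{A} t) = 1 + t\, \mathrm{tr}(\mathbf{A}) + O(t^2) \tag{1.23}$$

where $\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{N} a_{ii}$ is the trace of the matrix $\mathbf{A}$. So we have:

$$\det \left( \frac{\partial g^t \mathbf{x}}{\partial \mathbf{x}} \right) = 1 + t\, \mathrm{tr} \left( \frac{\partial \mathbf{F}}{\partial \mathbf{x}} \right) + O(t^2) \tag{1.24}$$

Now we notice that:

$$\mathrm{tr} \left( \frac{\partial \mathbf{F}}{\partial \mathbf{x}} \right) = \sum_{i=1}^{N} \frac{\partial F_i}{\partial x_i} = \nabla \cdot \mathbf{F} \tag{1.25}$$

so eq. 1.20 becomes:

$$V(t) = \int_{D_0} \left[ 1 + t\, \nabla \cdot \mathbf{F} + O(t^2) \right] d\mathbf{x} \tag{1.26}$$

and differentiating with respect to $t$ at $t = 0$ proves equation 1.13. If the divergence $\nabla \cdot \mathbf{F} = 0$, the phase flow $g^t$ preserves volumes: for all $t$, $V(t) = V(0)$, and the Liouville theorem is proved. Because the volume does not depend on time, the system is called conservative.

1.3.6 Dissipative systems, attractors and strange attractors

Systems which exhibit volume contraction in phase space are called dissipative, because friction, viscosity or other processes dissipating energy are commonly involved.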
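As a worked instance of eq. 1.18, take the three-variable Lorenz system of section 1.4 (eq. 1.49), whose right-hand side has a divergence that does not depend on the state. The numbers follow from the parameter values σ = 10 and b = 8/3 quoted there; the rounding is ours.

```latex
\nabla\cdot\mathbf{F}
  = \frac{\partial}{\partial x}\bigl[\sigma(y-x)\bigr]
  + \frac{\partial}{\partial y}\bigl[rx-y-xz\bigr]
  + \frac{\partial}{\partial z}\bigl[xy-bz\bigr]
  = -(\sigma+1+b) = -\frac{41}{3} \approx -13.67
\qquad\Longrightarrow\qquad
V(t) = V_0\,e^{-(\sigma+1+b)\,t},
\qquad
\frac{V(1)}{V_0} = e^{-41/3} \approx 1.2\times 10^{-6}
```

In one dimensionless time unit any phase space volume therefore shrinks by about six orders of magnitude, which is why this system is strongly dissipative.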
Volume contraction also implies the existence of a bounded attracting set of points, the attractor, toward which all trajectories converge after an appropriate transient time: if we consider initial conditions in a suitable region of phase space, for increasing time $t$ they will eventually converge to the attractor. More formally, an attractor is a closed set $A$ satisfying the following properties [31]:

• $A$ is an invariant set: any trajectory $\mathbf{x}(t)$ starting in $A$ remains in $A$ for all time: $\forall \mathbf{x}(t),\ \mathbf{x}(0) \in A \Rightarrow \mathbf{x}(t) \in A,\ \forall t$
• $A$ attracts an open set of initial conditions: this means that $A$ attracts all trajectories starting sufficiently close to it: there exists an open set $B$ such that $\mathbf{x}(0) \in B \Rightarrow d(\mathbf{x}(t), A) \to 0$ as $t \to \infty$, where $d(\mathbf{x}(t), A)$ is the distance from $\mathbf{x}(t)$ to $A$. The largest such $B$ is called the basin of attraction of $A$
• $A$ is minimal: no proper subset of $A$ satisfies the above properties

In many cases, as for example in Lorenz’s 1963 convective system (see below), we have a strange attractor: the “strangeness” refers to the sensitive dependence on initial conditions of the nonperiodic flow, though initially strange attractors were so called because of their common fractal dimensionality [31]. A solution is stable in the sense of Lyapunov if any other solution sufficiently close to it remains close for increasing time. Thus “sensitive dependence on initial conditions” actually means “unstable in the sense of Lyapunov”. It can be shown that a solution possessing Lyapunov stability must be periodic or quasi-periodic [19].

Lorenz’s 1963 system clearly shows a lack of periodicity, as we can see for example in Fig. 1.13 below, which in turn implies a limited predictability of the system because of the sensitive dependence on initial conditions [19]. The general behavior is chaotic, even though we can always find unstable periodic orbits arbitrarily close to aperiodic ones [32].
1.3.7 Small perturbations dynamics, tangent linear model, adjoint model

Lyapunov instability is a matter of small perturbation growth. A small perturbation $\delta\mathbf{x}(t)$ of a trajectory $\mathbf{x}(t)$ is assumed to evolve linearly. That is:

$$\delta\mathbf{x}(t_{k+1}) = \mathbf{M}_k\, \delta\mathbf{x}(t_k) \tag{1.27}$$

which is the Tangent Linear Model (TLM). Here the operator $\mathbf{M}_k$, which depends on time $t_k$, is linearized around the base trajectory. It is called the resolvent, or propagator, of the TLM. Since $\mathbf{M}_k$ is a real matrix, its adjoint is simply its transpose, $\mathbf{M}_k^T$. In order to justify eq. 1.27, consider a nonlinear model that can be written as a set of $N$ coupled nonlinear ordinary differential equations

$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}) \tag{1.28}$$

where $\mathbf{x}$ is an $N$-dimensional vector and $\mathbf{F}$ an $N$-dimensional vector function. The model is written in differential form. When a time-difference scheme is chosen, eq. 1.28 becomes a set of difference equations. If for example a Crank-Nicolson approach is implemented, this set of equations is of the form [16]:

$$\mathbf{x}(t_{k+1}) = \mathbf{x}(t_k) + \Delta t\, \mathbf{F}\!\left( \frac{\mathbf{x}(t_k) + \mathbf{x}(t_{k+1})}{2} \right) \tag{1.29}$$

So we can integrate eq. 1.28 by running the model between an initial time $t_0$ and a final time $t$, recursively using eq. 1.29: the solution $\mathbf{x}(t)$ depends only on the initial conditions at time $t_0$:

$$\mathbf{x}(t) = M[\mathbf{x}(t_0)] \tag{1.30}$$

Here the operator $M$, which in general is nonlinear, represents the time integration between $t_0$ and $t$. If we add a small perturbation $\delta\mathbf{x}(t_0)$ to the initial state $\mathbf{x}(t_0)$ of the reference model integration we can write:

$$M[\mathbf{x}(t_0) + \delta\mathbf{x}(t_0)] = M[\mathbf{x}(t_0)] + \frac{\partial M}{\partial \mathbf{x}}\, \delta\mathbf{x}(t_0) + O\!\left( \delta\mathbf{x}(t_0)^2 \right) \tag{1.31}$$

$$= \mathbf{x}(t) + \delta\mathbf{x}(t) + O\!\left( \delta\mathbf{x}(t_0)^2 \right) \tag{1.32}$$

where we are using eq. 1.30 and the small perturbation dynamics:

$$\delta\mathbf{x}(t) = \frac{\partial M}{\partial \mathbf{x}}\, \delta\mathbf{x}(t_0) \tag{1.33}$$

$$= \mathbf{M}\, \delta\mathbf{x}(t_0) \tag{1.34}$$

which is the same as eq. 1.27. Here $\mathbf{M} = \partial M / \partial \mathbf{x}$ is the $N \times N$ matrix called the resolvent or propagator of the tangent linear model: it propagates an initial small perturbation at time $t_0$ to a perturbation at time $t$.
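The expansion of eqs. 1.31 to 1.34 suggests a simple numerical check: the difference between the nonlinear evolution of a perturbed state and the tangent linear evolution of the perturbation should shrink quadratically with the perturbation size. The sketch below is only an illustration under stated assumptions: it uses the Lorenz 1963 equations (eq. 1.49) as test model and, for simplicity, a single forward-Euler step rather than the Crank-Nicolson scheme of eq. 1.29; all function names are hypothetical.

```python
SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0   # L63 parameters of eq. 1.49
DT = 0.01                             # time step (assumed)


def rhs(s):
    """Right-hand side F(x) of the L63 equations."""
    x, y, z = s
    return [SIGMA * (y - x), R * x - y - x * z, x * y - B * z]


def jacobian(s):
    """Jacobian dF/dx evaluated at state s."""
    x, y, z = s
    return [[-SIGMA, SIGMA, 0.0],
            [R - z, -1.0, -x],
            [y, x, -B]]


def step(s):
    """Nonlinear model M: one forward-Euler step (assumption of this sketch)."""
    return [si + DT * fi for si, fi in zip(s, rhs(s))]


def tlm_step(s, d):
    """Tangent linear model: (I + DT * J(s)) applied to the perturbation d."""
    jac = jacobian(s)
    return [d[i] + DT * sum(jac[i][j] * d[j] for j in range(3))
            for i in range(3)]


def linearization_error(s, d):
    """Norm of M(s + d) - M(s) - M_tlm d, the O(d^2) remainder of eq. 1.31."""
    perturbed = step([si + di for si, di in zip(s, d)])
    nonlinear = [p - c for p, c in zip(perturbed, step(s))]
    linear = tlm_step(s, d)
    return sum((n - l) ** 2 for n, l in zip(nonlinear, linear)) ** 0.5


x0 = [14.2041, 15.0165, 34.7172]      # point on the attractor quoted in section 1.4
err_big = linearization_error(x0, [1e-2] * 3)
err_small = linearization_error(x0, [1e-3] * 3)
print(err_big / err_small)            # close to 100 if the error is quadratic
```

Reducing the perturbation by a factor of 10 should reduce the remainder by a factor of about 100, which is the behavior expected when the tangent linear model is coded consistently with the nonlinear one.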
Since it is linearized from $t_0$ to $t$, $\mathbf{M}$ depends on the reference trajectory $\mathbf{x}(t)$ but not on the perturbation $\delta\mathbf{x}(t_0)$. The linearized evolution of $\delta\mathbf{x}(t_0)$ is given by

$$\frac{d\, \delta\mathbf{x}(t)}{dt} = \frac{\partial \mathbf{F}[\mathbf{x}(t)]}{\partial \mathbf{x}}\, \delta\mathbf{x}(t) \qquad \forall t \in [t_0, t] \tag{1.35}$$

where $\partial \mathbf{F}[\mathbf{x}(t)] / \partial \mathbf{x}$ is the Jacobian of $\mathbf{F}$. This system (eq. 1.35) defines the tangent linear model in differential form [16].

1.3.8 Lyapunov vectors and Lyapunov exponents

In order to have a more precise, quantitative idea of the “sensitive dependence on initial conditions”, we summarize the concepts of Lyapunov vectors and Lyapunov exponents. Consider a trajectory on the attractor (after an appropriate transient time): here a state at time $t$ is described by the vector $\mathbf{x}(t)$. Now consider a very close point, $\mathbf{x}(t) + \delta\mathbf{x}(t)$, where $\delta\mathbf{x}(t)$ is a separation vector whose initial length $\|\delta\mathbf{x}(0)\|$ is very small. We are interested in how $\delta\mathbf{x}(t)$ grows. One finds [31] that close trajectories, starting on a sphere of infinitesimal radius, diverge exponentially fast:

$$\|\delta\mathbf{x}(t)\| \simeq \|\delta\mathbf{x}(0)\|\, e^{\lambda t} \tag{1.36}$$

where $\lambda$ is called the global leading (or largest) Lyapunov exponent. It describes the long-term growth of the resulting hyper-ellipsoid, and can be estimated by [16]:

$$\lambda = \lim_{t \to +\infty} \frac{1}{t} \lim_{\delta\mathbf{x}(0) \to 0} \ln \frac{\|\delta\mathbf{x}(t)\|}{\|\delta\mathbf{x}(0)\|} \tag{1.37}$$

In practice, the leading Lyapunov exponent is computed as follows:

• we perturb the trajectory $\mathbf{x}(t)$ with an infinitesimal random vector $\delta\mathbf{x}(t)$
• we evolve it from time $t$ to time $t + \Delta t$ using the Tangent Linear Model:

$$\delta\mathbf{x}(t + \Delta t) = \mathbf{M}\, \delta\mathbf{x}(t) \tag{1.38}$$

where we dropped, for the sake of lighter notation, the obvious dependence of $\mathbf{M}$ on $t$ and $\Delta t$
• we repeat the previous step for a long time, scaling down the perturbation vector at regular intervals to avoid computational overflow

The other Lyapunov exponents can be computed in the same way, except that we must periodically perform a Gram-Schmidt orthogonalization of the set of perturbations that defines the hyper-ellipsoid: otherwise, they would all converge to the direction associated with the first Lyapunov exponent.
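The three steps listed above can be sketched for a simple test system. As an assumption of this sketch we use the two-dimensional Hénon map (eq. 1.47, with the canonical parameters a = 1.4 and b = 0.3), because for a map the tangent linear model is just the Jacobian matrix and no time-stepping scheme has to be chosen. The perturbation is rescaled at every iteration, the initial perturbation direction is fixed rather than random, and the function names are hypothetical.

```python
import math

A, B = 1.4, 0.3   # canonical Hénon parameters (assumed test system)


def henon(x, y):
    """One iteration of the nonlinear map."""
    return y + 1.0 - A * x * x, B * x


def tangent(x, y, dx, dy):
    """Tangent linear model: Jacobian at (x, y) applied to (dx, dy)."""
    return -2.0 * A * x * dx + dy, B * dx


def leading_exponent(n_steps=100000, n_transient=1000):
    x, y = 0.0, 0.0
    for _ in range(n_transient):          # discard the transient
        x, y = henon(x, y)
    dx, dy = 1.0, 0.0                      # arbitrary unit perturbation
    log_sum = 0.0
    for _ in range(n_steps):
        dx, dy = tangent(x, y, dx, dy)     # evolve the perturbation (eq. 1.38)
        x, y = henon(x, y)                 # evolve the reference trajectory
        norm = math.hypot(dx, dy)
        log_sum += math.log(norm)          # accumulate the log of the growth
        dx, dy = dx / norm, dy / norm      # rescale to avoid overflow
    return log_sum / n_steps               # exponent per iteration


lam = leading_exponent()
print(lam)
```

The result is a growth rate per iteration of the map; a positive value signals sensitive dependence on initial conditions.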
It should be noted, indeed, that for an $N$-dimensional system there are $N$ Lyapunov exponents: an initially infinitesimal $N$-dimensional hyper-sphere is distorted by the evolution of the system into an infinitesimal hyper-ellipsoid. If $\delta_i\mathbf{x}(t)$, $i = 1, \ldots, N$, denote the $N$ principal axes of this hyper-ellipsoid, then the $\lambda_i$ are their growth rates, and equation 1.36 is replaced by:

$$\|\delta_i\mathbf{x}(t)\| \simeq \|\delta_i\mathbf{x}(0)\|\, e^{\lambda_i t} \tag{1.39}$$

For large $t$ the stretching of the hyper-ellipsoid is driven by the most positive $\lambda_i$. Since the estimated exponents depend weakly on the trajectory, in practice we must average over many different points to get an estimate of $\lambda_i$:

$$\lambda_i = \lim_{t \to +\infty} \frac{1}{t} \lim_{\delta_i\mathbf{x}(0) \to 0} \ln \frac{\|\delta_i\mathbf{x}(t)\|}{\|\delta_i\mathbf{x}(0)\|} \tag{1.40}$$

The Lyapunov exponents defined so far are global properties of the flow, but we are often interested in local dynamical properties. So we define the leading Local Lyapunov Vector (LLV) at time $t$: it is the vector $\mathbf{e}$ towards which all random perturbations $\delta\mathbf{x}(t - \Delta T)$, started a long time $\Delta T$ before $t$, converge. It may be defined using the Tangent Linear Model:

$$\mathbf{e}(t) = \lim_{\Delta T \to +\infty} \mathbf{M}(t - \Delta T, t)\, \delta\mathbf{x}(t - \Delta T) \tag{1.41}$$

After computing the leading local Lyapunov vector, the corresponding local Lyapunov exponent may be computed from the change of its norm.

Another feature related to the Lyapunov exponents is the dimensionality of the attractor. The Kaplan-Yorke dimension of the system is defined by:

$$D \equiv k + \frac{\lambda_1 + \lambda_2 + \ldots + \lambda_k}{|\lambda_{k+1}|} \tag{1.42}$$

where $\lambda_1 > \lambda_2 > \ldots > \lambda_N$ are the Lyapunov characteristic exponents in decreasing order and $k$ is the integer for which $\lambda_1 + \lambda_2 + \ldots + \lambda_k > 0$ and $\lambda_1 + \lambda_2 + \ldots + \lambda_k + \lambda_{k+1} < 0$. An intuitive justification of eq. 1.42 is the following. The sum of all the exponents is the rate at which the volume of the hyper-ellipsoid increases or decreases: it is zero for conservative systems and negative for dissipative ones. If we consider an $N$-dimensional box containing the attractor, the sum $\lambda_1 + \lambda_2 + \ldots + \lambda_k$ of the first $k$ Lyapunov exponents is the rate at which the $k$-dimensional hyper-volume of the projection of an infinitesimal hyper-ellipsoid on a $k$-dimensional face of the box increases or decreases. If $\lambda_1 > 0$ but $\lambda_1 + \lambda_2 < 0$, then the projection of the hyper-ellipsoid on one edge of the box grows, while the projection on a 2-dimensional face shrinks: we may expect the attractor to consist of complex curves, without surfaces. On the other hand, if $\lambda_1 + \lambda_2 > 0$ but $\lambda_1 + \lambda_2 + \lambda_3 < 0$, then the projection of the hyper-ellipsoid on a 2-dimensional face of the box grows, while the projection on a 3-dimensional face shrinks: we may expect the attractor to consist of complex surfaces, but no 3-dimensional manifolds. In general, if $\lambda_1 + \lambda_2 + \ldots + \lambda_k > 0$ and $\lambda_1 + \lambda_2 + \ldots + \lambda_k + \lambda_{k+1} < 0$, the attractor may be thought of as consisting of complex $k$-dimensional manifolds [19].

A stable system has all Lyapunov exponents less than or equal to zero, while a chaotic one, whether dissipative or not, has at least one positive Lyapunov exponent. Furthermore, a chaotic bounded flow must have a zero Lyapunov exponent, with the corresponding local Lyapunov vector aligned with the trajectory.

1.3.9 Bred vectors

Bred vectors (BVs) represent finite amplitude perturbations. Lyapunov vectors (LVs), instead, represent infinitesimal perturbations by definition. The BVs are computed in a similar manner to the LVs, but using the nonlinear model and a finite renormalization amplitude [16]:

• we perturb the trajectory $\mathbf{x}(t)$ with a random vector $\delta\mathbf{x}(t)$ of given finite amplitude; this random perturbation is introduced only once, at the beginning of the breeding cycle.
The size of the initial perturbation is the only tunable parameter of the breeding procedure, and in operational NWP models it can be used to filter out unwanted fast instabilities, such as convection or even Brownian motion;
• we evolve the resulting perturbed trajectory by using the nonlinear model $M$; the same is done for the unperturbed trajectory. At a fixed time interval $\Delta t$ we subtract the unperturbed trajectory from the perturbed one:

$$\delta\mathbf{x}(t + \Delta t) = M(\mathbf{x}(t) + \delta\mathbf{x}(t)) - M(\mathbf{x}(t)) \tag{1.43}$$

where we dropped again the obvious dependence of the nonlinear model $M$ on $t$ and $\Delta t$;
• we scale down the resulting difference by dividing it by its amplification factor, in order to keep its size the same as the initial one.

As we can see from their definition, BVs are closely related to leading Local Lyapunov Vectors (LLVs), since after an infinite breeding time, bred vectors of infinitesimal amplitude are identical to LLVs. They share some features, such as the independence of the norm and of the rescaling time, but not others: for example, even without orthogonalization, BVs do not converge to a single leading BV, because of nonlinearity.

1.3.10 Dynamical systems described by maps

Also very important are those dynamical systems where time is a discrete variable. In such a case the dynamical system is described by an $N$-dimensional map

$$\mathbf{x}_{n+1} = \mathbf{G}(\mathbf{x}_n) \tag{1.44}$$

where $n$ is the discrete time variable, $\mathbf{x}_n$ is the $N$-dimensional state vector of the system and $\mathbf{G}$ is an $N$-dimensional vector function of the state vector $\mathbf{x}_n$; i.e. $\mathbf{G}$ evolves the state vector $\mathbf{x}_n$ at time $n$ into the new state vector $\mathbf{x}_{n+1}$ at time $n + 1$. It should be noted that a map can be created from any flow described by eq. 1.2, simply by observing the flow only at regular time intervals.
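Taking a map as the nonlinear model M, the breeding cycle of subsection 1.3.9 can be sketched as follows. This is a minimal illustration under stated assumptions: the test model is the Hénon map (eq. 1.47) with canonical parameters, the rescaling is done at every iteration, and the initial perturbation has a fixed direction instead of a random one so that the run is reproducible. The function names and the chosen amplitude are hypothetical.

```python
import math

A, B = 1.4, 0.3   # canonical Hénon parameters (assumed test model)


def model(state):
    """Nonlinear model M: one iteration of the Hénon map."""
    x, y = state
    return (y + 1.0 - A * x * x, B * x)


def breed(amplitude=1e-6, n_cycles=20000, n_transient=1000):
    """Return the final bred vector and its mean log growth per cycle."""
    control = (0.0, 0.0)
    for _ in range(n_transient):               # reach the attractor first
        control = model(control)
    bv = (amplitude, 0.0)                      # perturbation introduced only once
    log_growth = 0.0
    for _ in range(n_cycles):
        perturbed = model((control[0] + bv[0], control[1] + bv[1]))
        control = model(control)
        # eq. 1.43: perturbed run minus unperturbed (control) run
        diff = (perturbed[0] - control[0], perturbed[1] - control[1])
        size = math.hypot(*diff)
        log_growth += math.log(size / amplitude)
        # rescale the difference back to the initial amplitude
        bv = (diff[0] * amplitude / size, diff[1] * amplitude / size)
    return bv, log_growth / n_cycles


bv, rate = breed()
print(bv, rate)
```

For a very small amplitude the mean growth rate is expected to be close to the leading Lyapunov exponent of the map; increasing `amplitude` lets the nonlinear terms of the model influence the bred vector.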
As an example of a one-dimensional nonlinear map, consider the logistic map:

$$x_{n+1} = r\, x_n (1 - x_n) \tag{1.45}$$

which is a simple population dynamics model akin to the classic logistic equation:

$$\frac{dx}{dt} = r\, x (1 - x) \tag{1.46}$$

that describes logistic population growth [22]. Here $r$ is the intrinsic per capita growth rate, and $x$ is the population density with respect to the total carrying capacity, i.e. $0 \le x \le 1$.

Figure 1.3: Bifurcation diagram for the Logistic map, 0 ≤ r ≤ 4. (Axes: r, x.)

The logistic map is chaotic for the parameter value $r = 4$, but it actually displays many different behaviors depending on the value of $r$: we can have stable, periodic or chaotic solutions. For example, as we can see in the bifurcation diagram shown in Figures 1.3, 1.4 and 1.5, we have a period 1 solution for $0 < r < 3$, while at $r = 3$ we note the first bifurcation from period 1 to period 2. We observe another bifurcation, from period 2 to period 4, at $r = 1 + \sqrt{6} \approx 3.45$. The onset of chaos occurs at $r \simeq 3.57$, but there are also periodic windows beyond it, as for example the period 3 window at $r = 1 + 2\sqrt{2} \approx 3.83$. The values of the parameter $r$ for which the behavior is periodic form an infinite number of finite intervals, while the values for which it is chaotic, with $3.57 < r \le 4$, form a Cantor set [19]. The bifurcation diagrams for the logistic map shown in Figures 1.3, 1.4 and 1.5, with different zoom windows, have been obtained by plotting, as a function of the parameter $r$, a set of values of $x_n$ resulting from the evolution of a random initial value $x_0$: we iterated many times, and discarded the first points, corresponding to the transient before the convergence to the attractor. The logistic map is not invertible, because each point, except the maximum, does not have a unique past. For $r = 4$ its Lyapunov exponent is $\lambda = \ln 2 \approx 0.693147$.
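The Lyapunov exponent of the logistic map can be estimated as the orbit average of the logarithm of the absolute derivative of the map, ln |r (1 − 2 x_n)|. The sketch below is a minimal illustration: the function name, the initial value, the number of iterations and the small floor that protects the logarithm are all assumptions of this example.

```python
import math


def logistic_lyapunov(r, x0=0.3, n_steps=100000, n_transient=1000):
    """Orbit average of ln|d/dx [r x (1 - x)]| = ln|r (1 - 2 x)|."""
    x = x0
    for _ in range(n_transient):           # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        slope = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(slope, 1e-300))   # floor guards against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps


lam_chaotic = logistic_lyapunov(4.0)   # expected close to ln 2
lam_stable = logistic_lyapunov(2.5)    # negative: stable period 1 solution
print(lam_chaotic, lam_stable)
```

For r = 2.5 the orbit settles on the fixed point x = 1 − 1/r = 0.6, where the slope is −0.5, so the estimate is negative; for r = 4 the estimate is positive, consistent with chaotic behavior.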
Figure 1.4: Bifurcation diagram for the Logistic map, 2.8 ≤ r ≤ 4. Notice the period-3 window around r = 3.83. (Axes: r, x.)

Figure 1.5: Bifurcation diagram for the Logistic map, a zoom (3.84 ≤ r ≤ 3.856). We can observe the self-similarity properties of this diagram. (Axes: r, x.)

An example of a two-dimensional, dissipative map is the following Hénon map [12][31]:

$$\begin{aligned} x_{n+1} &= y_n + 1 - a\, x_n^2 \\ y_{n+1} &= b\, x_n \end{aligned} \tag{1.47}$$

This map is also chaotic for the two canonical parameter values $a = 1.4$, $b = 0.3$. Its attractor is shown in Fig. 1.6: 100,000 points have been plotted after a transient of 10,000 time steps, from the starting point $(x_0, y_0) = (0, 0)$.

Figure 1.6: Hénon map: strange attractor for classical parameters a and b (see text). (Axes: x, y.)

The Jacobian $\mathbf{J}$ of a generic map $(x_{n+1}, y_{n+1}) = (f(x_n, y_n), g(x_n, y_n))$ is:

$$\mathbf{J} = \begin{pmatrix} \dfrac{\partial f}{\partial x_n} & \dfrac{\partial f}{\partial y_n} \\[2mm] \dfrac{\partial g}{\partial x_n} & \dfrac{\partial g}{\partial y_n} \end{pmatrix} \tag{1.48}$$

For the Hénon map the determinant does not depend on $x_n$:

$$|\det(\mathbf{J})| = \left| \det \begin{pmatrix} -2 a x_n & 1 \\ b & 0 \end{pmatrix} \right| = |-b|$$

If $-1 < b < 1$, as in the case of the classic parameter $b = 0.3$, then $|\det(\mathbf{J})| < 1$ for all $x_n$, which means that the Hénon map is area contracting by a constant factor $|b|$ at each iteration.

1.4 Lorenz’s three dimensional chaotic system (1963)

Lorenz’s three dimensional model (1963), hereafter referred to as L63, was found by the MIT meteorologist Edward N. Lorenz in the early 1960s while studying a simplified version of a set of equations modelling convective fluid motion driven by heating from below.

1.4.1 The equations

The flow occurs in a layer of fluid of uniform depth, and the temperature difference between the upper and lower surfaces has a constant value $\Delta T$. The system has a steady-state solution, in which no motion occurs and the temperature varies linearly with depth.
If, depending on physical conditions, the solution is unstable, a convective motion will arise [18]. In case the motion is completely vertical, with no deviations, if the upper and lower boundaries are taken to be free and an abrupt truncation is performed, the system turns out to simplify in a set of 3 equations only. Thus, today L63 is known as a dynamical system described by the following dimensionless equations: where σ = 10, r = 28, b = 8 3 dx dt = σ(y − x) dy dt = rx − y − xz dz dt = xy − bz (1.49) are positive parameters and t represents the dimensionless time [18]. In these equations the variables x, y, z depend on time alone. In our experiments, we will set the integration step ∆t = 0.01, and the numerical integration method will be a second order Runge-Kutta scheme (see Appendix A). A qualitative comparison among three popular numerical integration schemes, (1-st order) Euler, 2-nd order Runge-Kutta and 4-th order Runge-Kutta, is shown in Figures 1.7 and 1.8 for a particular case. The initial condition is on the attractor set and is the same for all schemes: (x0 , y0 , z0 ) = (14.2041, 15.0165, 34.7172). In both plots the numerical integration is performed 1000 integration steps ahead, with a time step ∆t = 0.01. Since they are different order integration schemes, of course they lead to slightly different evolutions of the system. Due to the chaotic dynamics of the system, these differences tend to amplify. A generic initial condition won’t be in the attractor set of the system: it will need a transient time to converge to it: so, in practice, in our plots (see Figures 1.9, 1.10, 1.11, 1.12) we set a transient time of 20 Figure 1.7: Lorenz’s 1963 model: a comparison among 1-st order Euler, 2-nd and 4-th order Runge-Kutta schemes for 1000 time steps ∆t = 0.01. The initial point belongs to the attractor and is the same for all schemes: (x0 , y0 , z0 ) = (14.2041, 15.0165, 34.7172). 
10,000 integration steps. In Fig. 1.16 we show the so-called Lorenz map, where the $(n+1)$-th local maximum of $z$, $z_{max}(n+1)$, is plotted versus the previous one, $z_{max}(n)$. See also Fig. 1.15. Note that the Lorenz map is not actually a well-defined function, since there may be more than one output $z_{max}(n+1)$ for a given input $z_{max}(n)$ [31], depending on the other variables $x$ and $y$. So the map has a thickness, which prevents us from using it to forecast the state of the system deterministically. It should also be noticed that a few dots in the map lie well apart from the bulk of the others. Figure 1.16 has been obtained with an integration of $50 \times 10^6$ time steps after a transient of 10,000 time steps, and it contains more than 600,000 dots.

Figure 1.8: Lorenz's 1963 model: a comparison among first-order Euler, second-order and fourth-order Runge-Kutta schemes for the variable $x$ (horizontal axis: time, from 0 to 10; vertical axis: $x$). The initial point belongs to the attractor and is the same for all schemes: $(x_0, y_0, z_0) = (14.2041, 15.0165, 34.7172)$.

Figure 1.9: Lorenz's 1963 model attractor (axes: $x$, $y$, $z$).

1.4.2 The meaning of variables and parameters

In particular, the parameter $\sigma$ is the Prandtl number (see footnote 4); its value $\sigma = 10$ is typical for cold water, or approximately twice that of warm water [30][18]. The parameter $r$ is the ratio $R_a / R_c$ between the Rayleigh number $R_a$ (see footnote 5) and its critical value $R_c$.

Footnote 4: The Prandtl number is the ratio of the kinematic viscosity to the thermal diffusivity of the fluid.

The critical value of $r$ for instability of steady
convection to occur is $r = 470/19 \approx 24.74$ [18]. So a value of $r = 28$ is slightly supercritical, and the flow exhibits unstable convection. The parameter $b = 8/3$ is related to the geometry of the convective cell.

Footnote 5: The Rayleigh number describes the kind of heat transfer within a fluid: below a critical value for that fluid, heat is transferred basically by conduction; above the critical value convection sets in, and the bulk of the heat transfer is due to it.

Figure 1.10: Lorenz's 1963 model attractor projected on the $xz$ plane.

Figure 1.11: Lorenz's 1963 model attractor projected on the $xy$ plane.

Figure 1.12: Lorenz's 1963 model attractor projected on the $yz$ plane.

Figure 1.13: Lorenz's 1963 model: time dependence of $x$ (time from 0 to 30).

Figure 1.14: Lorenz's 1963 model: time dependence of $y$ (time from 0 to 30).

Figure 1.15: Lorenz's 1963 model: time dependence of $z$ (time from 0 to 30).

Figure 1.16: Lorenz's 1963 map: values of the relative maximum $z_{max}(n)$ (horizontal axis) and of the successive relative maximum $z_{max}(n+1)$ (vertical axis). See Fig. 1.15.

For the meaning of the three variables $x$, $y$ and $z$, let us quote Lorenz himself [18]: “In these equations x is proportional to the intensity of the convective motion, while y is proportional to the temperature difference between the ascending and descending currents, similar signs of x and y denoting that warm fluid is rising and cold fluid is descending.
The variable z is proportional to the distortion of the vertical temperature profile from linearity, a positive value indicating that the strongest gradients occur near the boundaries.” Incidentally, note that the initial condition $x = y = z = 0$ corresponds to a fluid at rest, with no convection: the state never leaves the origin, which is a stationary solution of the system (an unstable one for $r > 1$, hence also for $r = 28$). This system was derived by Lorenz as a drastically simplified model of convection rolls in a fluid heated from below in a gravitational field, but the same equations can also be derived for other physical systems: for example, they exactly describe the motion of a specific water-wheel [31].

1.4.3 Why it is so important: chaos implies limited predictability

Despite its apparent simplicity, Lorenz's deterministic, nonlinear system exhibits chaotic dynamics: the solutions oscillate in an irregular way, never exactly repeating themselves, while remaining bounded in a particular region of phase space [31, 26]. The system has no analytic solutions, so the solutions have to be found by numerical integration. Lorenz felt that the really important finding was the fact that, under fairly general conditions, a lack of periodicity implies limited predictability [19]. We face this problem even for deterministic flows such as, for example, NWP models. A fortiori the atmosphere itself has a finite horizon of predictability, which Lorenz estimated to be approximately two weeks. This is due to deep dynamical reasons (the chaotic behavior of the system) and cannot be overcome by increasing the power of computational devices.

1.4.4 Lyapunov exponents, dimensionality and doubling time

Lorenz's system L63 is a nonlinear but autonomous system: in eq. 1.2 the function $\mathbf{F}$ does not depend explicitly on time. L63 is also a dissipative, nonconservative system, i.e. the time evolution does not preserve volumes in phase space.
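As an illustration of the numerical integration mentioned above, the following minimal Python sketch implements the right-hand side of eq. 1.49 and a second-order Runge-Kutta step with $\Delta t = 0.01$. It is only a sketch under stated assumptions: the midpoint variant of the second-order scheme is assumed here (the scheme of Appendix A may differ), and the function names `l63_rhs`, `rk2_step`, `integrate` and `divergence` are hypothetical. The last function estimates the divergence of the right-hand side by finite differences; its negative value is what makes the system dissipative.

```python
# Sketch of the L63 equations (eq. 1.49) with a second-order Runge-Kutta step.
# Assumption: the midpoint variant of RK2 is used; all names are illustrative.

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0  # parameter values quoted in the text
DT = 0.01                            # integration step quoted in the text


def l63_rhs(state):
    """Right-hand side F(x, y, z) of the L63 equations (eq. 1.49)."""
    x, y, z = state
    return (SIGMA * (y - x), R * x - y - x * z, x * y - B * z)


def rk2_step(state, dt=DT):
    """Advance the state by one midpoint (second-order Runge-Kutta) step."""
    k1 = l63_rhs(state)
    midpoint = tuple(s + 0.5 * dt * k for s, k in zip(state, k1))
    k2 = l63_rhs(midpoint)
    return tuple(s + dt * k for s, k in zip(state, k2))


def integrate(state, n_steps, dt=DT):
    """Return the list of visited states, including the initial one."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk2_step(state, dt)
        trajectory.append(state)
    return trajectory


def divergence(state, eps=1e-6):
    """Centred finite-difference estimate of the divergence of F."""
    total = 0.0
    for i in range(3):
        plus, minus = list(state), list(state)
        plus[i] += eps
        minus[i] -= eps
        total += (l63_rhs(plus)[i] - l63_rhs(minus)[i]) / (2.0 * eps)
    return total


# Initial condition on the attractor, as quoted in the text.
start = (14.2041, 15.0165, 34.7172)
trajectory = integrate(start, 1000)
```

Starting from the origin, `rk2_step` returns the origin again, consistent with the stationary solution discussed in subsection 1.4.2.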
In general, the volume $V(t)$ of a phase-space region $D$ evolves according to eq. 1.13:

$$ \left. \frac{dV}{dt} \right|_{t=0} = \int_D \nabla \cdot \mathbf{F}(\mathbf{x}) \, d\mathbf{x} $$

For L63 the divergence of the flow is:

$$ \nabla \cdot \mathbf{F} = -(\sigma + 1 + b) \tag{1.50} $$

and an initial volume $V(0)$ in phase space will shrink according to eq. 1.18:

$$ V(t) = V(0) \exp\left[ -t \, (\sigma + 1 + b) \right] \tag{1.51} $$
$$ \phantom{V(t)} = V(0) \exp(-13.67 \, t) \tag{1.52} $$

Lorenz's system has three Lyapunov exponents, which drive its dynamical features:

$$ \lambda_1 = 0.9056 \tag{1.53} $$
$$ \lambda_2 = 0 \tag{1.54} $$
$$ \lambda_3 = -14.5723 \tag{1.55} $$

Since $\lambda_1 + \lambda_2 > 0$ and $\lambda_1 + \lambda_2 + \lambda_3 < 0$, the Kaplan-Yorke dimension of the attractor of the system (eq. 1.42) is:

$$ D = 2 + \frac{\lambda_1 + \lambda_2}{|\lambda_3|} \tag{1.56} $$
$$ \phantom{D} = 2.06215 \tag{1.57} $$

The doubling time is the average time after which a small perturbation doubles in size. It may be calculated from eq. 1.36, using the largest Lyapunov exponent $\lambda_1$, in the following way:

$$ 2 \, \|\delta_0\| = \|\delta_0\| \, e^{\lambda_1 \tau_{double}} \tag{1.58} $$
$$ \Longrightarrow \quad \lambda_1 \tau_{double} = \ln 2 \tag{1.59} $$
$$ \Longrightarrow \quad \tau_{double} = \frac{\ln 2}{0.9056} \approx 0.7654 \tag{1.60} $$

1.5 The aim of this study

As we stated in section 1.2, the goal of Data Assimilation is to produce an optimal estimate of the state of a dynamical system from incomplete, noisy observations and (approximate) knowledge of the laws governing the evolution of the system. An important application is the initialization of forecast models of the atmosphere and ocean. In this work we address the problem of data assimilation in chaotic systems in the framework of Lorenz's three-variable convective model. The aim is to investigate the theory for an advanced formulation of the Assimilation in the Unstable Subspace (AUS, see section 2.3), a data assimilation scheme that relies upon the system's dynamics, is applicable to different contexts and is computationally affordable in operational environments. AUS has already provided encouraging results with realistic models and observational configurations.
The low dimensionality of the model used here allows for a direct comparison with a “gold standard” data assimilation scheme, the Extended Kalman Filter, and for the development of the theory.

1.6 Notation and conventions

Throughout this study, and regardless of the scheme under consideration, we will conform to the notation proposed by Ide et al. [14], widely used by the data assimilation community.

1.6.1 Model space and observational space

The evolution of a discrete dynamical system from time $t_k$ to time $t_{k+1}$ is described by the equation

$$ \mathbf{x}^f(t_{k+1}) = \mathcal{M}_k[\mathbf{x}^a(t_k)] \tag{1.61} $$

where the column vector $\mathbf{x}^f(t_{k+1})$ is the forecast state at time $t_{k+1}$ and the column vector $\mathbf{x}^a(t_k)$ is the analysis state at time $t_k$. Both have dimension $N$ and are defined in the model space $S^{Mod}$: if the system is described by $K$ variables at $L$ locations, then $N = K \times L$. If we have $M$ observations of the state of the system, we can define the observation column vector $\mathbf{y}^o$ and its observational space $S^{ob}$, i.e. $\mathbf{y}^o \in S^{ob}$. In an operational context the number of available observations is in general much smaller than the dimensionality of the NWP model, so $M \ll N$. The assimilation algorithm, aimed at initializing our model with the analysis state $\mathbf{x}^a$, can be thought of as a map $\mathcal{F}$ that combines the forecast state $\mathbf{x}^f$ and the observations $\mathbf{y}^o$ to provide the analysis state $\mathbf{x}^a$:

$$ \mathcal{F}: S^{Mod} \times S^{ob} \longrightarrow S^{Mod} \tag{1.62} $$
$$ (\mathbf{x}^f, \mathbf{y}^o) \longmapsto \mathbf{x}^a \tag{1.63} $$

In this work we will follow the common practice introduced by Rutherford in 1972 [14], who proposed $\mathbf{x}^b = \mathbf{x}^f$, since forecast states had become better background fields than climatology. Other choices may be made for the background field $\mathbf{x}^b$, for example by averaging over an ensemble of different forecasts.
1.6.2 The observation operator $\mathcal{H}$ and its linear approximation $\mathbf{H}$

The observation operator $\mathcal{H}$ is defined in the model space $S^{Mod}$:

$$ \mathcal{H}: S^{Mod} \longrightarrow S^{ob} \tag{1.64} $$

It transforms an $N$-dimensional state vector $\mathbf{x} \in S^{Mod}$ into an “observational” $M$-dimensional vector. Its structure contains all the mathematical and physical relations allowing for an a priori estimate of the observations; in general, $\mathcal{H}$ is not linear. The nonlinear operator $\mathcal{H}$ can however be linearized around the trajectory $\mathbf{x}$, which we assume to be well approximated by our forecast trajectory $\mathbf{x}^f$, so that, to first order in the perturbation,

$$ \mathcal{H}(\mathbf{x} + \delta\mathbf{x}) = \mathcal{H}(\mathbf{x}) + \mathbf{H} \, \delta\mathbf{x} \tag{1.65} $$

where $\delta\mathbf{x}$ is a small perturbation of the state and $\mathbf{H}$ may be represented by an $M \times N$ matrix, whose elements are [16]:

$$ (\mathbf{H})_{ij} = \frac{\partial \mathcal{H}_i}{\partial x_j} \qquad \text{with } i \in \{1, \ldots, M\}, \; j \in \{1, \ldots, N\} \tag{1.66} $$

So the operator $\mathbf{H}$ is the Jacobian of the operator $\mathcal{H}$, and transforms vectors in model space into their corresponding vectors in observation space. Its transpose $\mathbf{H}^T$ transforms vectors in observation space into vectors in model space. Note that, if the operator $\mathcal{H}$ is linear, by definition of linearity it holds:

$$ \mathcal{H}(\mathbf{x} + \delta\mathbf{x}) = \mathcal{H}(\mathbf{x}) + \mathcal{H} \, \delta\mathbf{x} \tag{1.67} $$

and, by using eq. 1.65, we have

$$ \mathcal{H} = \mathbf{H} \tag{1.68} $$

1.6.3 Error vectors

The model operator $\mathcal{M}_k$ in eq. 1.61 evolves the state from time $t_k$ to time $t_{k+1}$. If $\mathbf{x}^t(t_k)$ is the true state of the system at time $t_k$, the corresponding analysis error and forecast error are the column vectors

$$ \boldsymbol{\eta}^a(t_k) = \mathbf{x}^a(t_k) - \mathbf{x}^t(t_k) \tag{1.69} $$
$$ \boldsymbol{\eta}^f(t_k) = \mathbf{x}^f(t_k) - \mathbf{x}^t(t_k) \tag{1.70} $$

These errors are defined in the model space $S^{Mod}$. Generally speaking, even the observation operator $\mathcal{H}$ may be misrepresented:

$$ \boldsymbol{\eta}^H(t_k) = \mathbf{y}^t(t_k) - \mathcal{H}\mathbf{x}^t(t_k) \tag{1.71} $$

where $\mathbf{y}^t(t_k)$ is the vector of true values of the observed variables at time $t_k$.
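The vector $\boldsymbol{\eta}^H$ is defined in the observational space $S^{ob}$, as is the observational error:

$$ \boldsymbol{\eta}^o(t_k) = \mathbf{y}^o(t_k) - \mathcal{H}\mathbf{x}^t(t_k) \tag{1.72} $$

The observational error $\boldsymbol{\eta}^o$ implicitly includes the measurement error, the misrepresentation of the observation operator and the model's error of representativeness, due to subgrid-scale processes not represented in the grid-averaged values of the model and analysis [16].

The model error between the times $t_k$ and $t_{k+1}$ is defined by

$$ \boldsymbol{\eta}^M(t_k) = \mathcal{M}_k \mathbf{x}^t(t_k) - \mathbf{x}^t(t_{k+1}) \tag{1.73} $$

Combining it with eq. 1.61, $\mathbf{x}^f(t_{k+1}) = \mathcal{M}_k[\mathbf{x}^a(t_k)]$, we have:

$$ \mathbf{x}^f(t_{k+1}) - \mathbf{x}^t(t_{k+1}) = \mathcal{M}_k[\mathbf{x}^a(t_k)] - \mathcal{M}_k \mathbf{x}^t(t_k) + \boldsymbol{\eta}^M(t_k) \tag{1.74} $$

The lhs is the forecast error at time $t_{k+1}$. So, assuming the analysis error to be small, so that the difference $\mathcal{M}_k[\mathbf{x}^a(t_k)] - \mathcal{M}_k[\mathbf{x}^t(t_k)]$ can be linearized, we can calculate the new forecast error with the tangent linear model $\mathbf{M}_k$:

$$ \boldsymbol{\eta}^f(t_{k+1}) = \mathbf{M}_k \boldsymbol{\eta}^a(t_k) + \boldsymbol{\eta}^M(t_k) \tag{1.75} $$

1.6.4 Error covariance matrices

From the above column vectors we can now define their covariance matrices, built by right-multiplying each vector by its transpose and taking the expectation value. In practice, the expectation values are estimated by averaging over many cases. The analysis error covariance matrix $\mathbf{P}^a$, the forecast error covariance matrix $\mathbf{P}^f$ and the observation error covariance matrix $\mathbf{R}$ then read:

$$ \mathbf{P}^a = \langle \boldsymbol{\eta}^a (\boldsymbol{\eta}^a)^T \rangle \tag{1.76} $$
$$ \mathbf{P}^f = \langle \boldsymbol{\eta}^f (\boldsymbol{\eta}^f)^T \rangle \tag{1.77} $$
$$ \mathbf{R} = \langle \boldsymbol{\eta}^o (\boldsymbol{\eta}^o)^T \rangle \tag{1.78} $$

More explicitly:

$$ \mathbf{P}^a = \begin{pmatrix} \langle \eta^a_1 \eta^a_1 \rangle & \langle \eta^a_1 \eta^a_2 \rangle & \cdots & \langle \eta^a_1 \eta^a_N \rangle \\ \langle \eta^a_2 \eta^a_1 \rangle & \langle \eta^a_2 \eta^a_2 \rangle & \cdots & \langle \eta^a_2 \eta^a_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \eta^a_N \eta^a_1 \rangle & \langle \eta^a_N \eta^a_2 \rangle & \cdots & \langle \eta^a_N \eta^a_N \rangle \end{pmatrix} \tag{1.79} $$

$$ \mathbf{P}^f = \begin{pmatrix} \langle \eta^f_1 \eta^f_1 \rangle & \langle \eta^f_1 \eta^f_2 \rangle & \cdots & \langle \eta^f_1 \eta^f_N \rangle \\ \langle \eta^f_2 \eta^f_1 \rangle & \langle \eta^f_2 \eta^f_2 \rangle & \cdots & \langle \eta^f_2 \eta^f_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \eta^f_N \eta^f_1 \rangle & \langle \eta^f_N \eta^f_2 \rangle & \cdots & \langle \eta^f_N \eta^f_N \rangle \end{pmatrix} \tag{1.80} $$

$$ \mathbf{R} = \begin{pmatrix} \langle \eta^o_1 \eta^o_1 \rangle & \langle \eta^o_1 \eta^o_2 \rangle & \cdots & \langle \eta^o_1 \eta^o_M \rangle \\ \langle \eta^o_2 \eta^o_1 \rangle & \langle \eta^o_2 \eta^o_2 \rangle & \cdots & \langle \eta^o_2 \eta^o_M \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \eta^o_M \eta^o_1 \rangle & \langle \eta^o_M \eta^o_2 \rangle & \cdots & \langle \eta^o_M \eta^o_M \rangle \end{pmatrix} \tag{1.81} $$

Since each vector depends on time $t_k$, all the above error covariance matrices depend on time $t_k$ as well.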
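The remark that expectation values are estimated in practice by averaging over many cases can be made concrete with a short sketch. The Python function below (a hypothetical helper, `sample_error_covariance`, not taken from the software used in this work) averages the outer products $\boldsymbol{\eta} \boldsymbol{\eta}^T$ over a list of error vectors, following the definitions 1.76 to 1.78 literally; it assumes unbiased errors, so no mean is subtracted. The four error vectors are made-up illustrative values.

```python
def sample_error_covariance(errors):
    """Estimate <eta eta^T> by averaging outer products over many cases.

    `errors` is a list of equally long error vectors; the errors are
    assumed to be unbiased, so no mean is removed.
    """
    n_cases = len(errors)
    dim = len(errors[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for eta in errors:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += eta[i] * eta[j] / n_cases
    return cov


# Hypothetical error vectors for a 2-dimensional state (illustrative values).
errors = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
P = sample_error_covariance(errors)
```

In this made-up sample the two components never err together, so the off-diagonal elements vanish and `P` is diagonal, with the error variances 0.5 and 2.0 on the diagonal.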
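Note also that $\mathbf{P}^a$ and $\mathbf{P}^f$ are $N \times N$ matrices, while $\mathbf{R}$ is an $M \times M$ one. The observation error covariance matrix $\mathbf{R}$ can actually be thought of as including three components [16]: the instrument error covariance matrix $\mathbf{R}_{instr}$ and the representativeness error covariance matrix $\mathbf{R}_{repr}$, both assumed to be uncorrelated, and the error covariance matrix $\mathbf{R}_H$ of the observation operator $\mathcal{H}$:

$$ \mathbf{R} = \mathbf{R}_{instr} + \mathbf{R}_{repr} + \mathbf{R}_H \tag{1.82} $$

From eq. 1.75, $\boldsymbol{\eta}^f(t_{k+1}) = \mathbf{M}_k \boldsymbol{\eta}^a(t_k) + \boldsymbol{\eta}^M(t_k)$, assuming the analysis error and the model error to be uncorrelated, we have [14]:

$$ \mathbf{P}^f_{k+1} = \left\langle \left[ \mathbf{M}_k \boldsymbol{\eta}^a(t_k) + \boldsymbol{\eta}^M(t_k) \right] \left[ \mathbf{M}_k \boldsymbol{\eta}^a(t_k) + \boldsymbol{\eta}^M(t_k) \right]^T \right\rangle \tag{1.83} $$
$$ \phantom{\mathbf{P}^f_{k+1}} = \mathbf{M}_k \mathbf{P}^a_k \mathbf{M}_k^T + \mathbf{Q}_k \tag{1.84} $$

where $\mathbf{Q}_k$ is the model error covariance matrix between the times $t_k$ and $t_{k+1}$. It is defined, in a similar way as $\mathbf{P}^a$, $\mathbf{P}^f$ and $\mathbf{R}$, from the model error $\boldsymbol{\eta}^M(t_k)$:

$$ \mathbf{Q} = \langle \boldsymbol{\eta}^M (\boldsymbol{\eta}^M)^T \rangle \tag{1.85} $$

(here we dropped the subscript $k$). That is:

$$ \mathbf{Q} = \begin{pmatrix} \langle \eta^M_1 \eta^M_1 \rangle & \langle \eta^M_1 \eta^M_2 \rangle & \cdots & \langle \eta^M_1 \eta^M_N \rangle \\ \langle \eta^M_2 \eta^M_1 \rangle & \langle \eta^M_2 \eta^M_2 \rangle & \cdots & \langle \eta^M_2 \eta^M_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \eta^M_N \eta^M_1 \rangle & \langle \eta^M_N \eta^M_2 \rangle & \cdots & \langle \eta^M_N \eta^M_N \rangle \end{pmatrix} \tag{1.86} $$

1.6.5 Operators and vectors: a low dimensional example

As an example, consider a 3-dimensional model space $S^{Mod}$ and a 2-dimensional observation space $S^{ob}$: the truth vector is $3 \times 1$, with components expressed in terms of the grid points $e$, $f$, $g$ (see Fig. 1.17):

$$ \mathbf{x}^t = \begin{pmatrix} x^t_e \\ x^t_f \\ x^t_g \end{pmatrix} \tag{1.87} $$

and similarly for the analysis, background and forecast column vectors:

$$ \mathbf{x}^a = \begin{pmatrix} x^a_e & x^a_f & x^a_g \end{pmatrix}^T \tag{1.88} $$
$$ \mathbf{x}^b = \begin{pmatrix} x^b_e & x^b_f & x^b_g \end{pmatrix}^T \tag{1.89} $$
$$ \mathbf{x}^f = \begin{pmatrix} x^f_e & x^f_f & x^f_g \end{pmatrix}^T \tag{1.90} $$

The observation vector, instead, is a $2 \times 1$ one, expressed in terms of the observation points 1 and 2 (see Fig. 1.17 again):

$$ \mathbf{y}^o = \begin{pmatrix} y^o_1 \\ y^o_2 \end{pmatrix} \tag{1.91} $$

The analysis and forecast error covariance matrices are:

$$ \mathbf{P}^a = \begin{pmatrix} p^a_{ee} & p^a_{ef} & p^a_{eg} \\ p^a_{fe} & p^a_{ff} & p^a_{fg} \\ p^a_{ge} & p^a_{gf} & p^a_{gg} \end{pmatrix} \tag{1.92} $$

$$ \mathbf{P}^f = \begin{pmatrix} p^f_{ee} & p^f_{ef} & p^f_{eg} \\ p^f_{fe} & p^f_{ff} & p^f_{fg} \\ p^f_{ge} & p^f_{gf} & p^f_{gg} \end{pmatrix} \tag{1.93} $$

while the observation error covariance matrix is:

$$ \mathbf{R} = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} \tag{1.94} $$

If measurement errors at different locations are uncorrelated, $\mathbf{R}$ will be a diagonal matrix.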
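The linearized observation operator $\mathbf{H}$ projects vectors in model space $S^{Mod}$ (grid points) into vectors in observation space $S^{ob}$ (observation points). Its components may be simple interpolation coefficients:

$$ \mathbf{H} = \begin{pmatrix} h_{1e} & h_{1f} & h_{1g} \\ h_{2e} & h_{2f} & h_{2g} \end{pmatrix} \tag{1.95} $$

Figure 1.17: A simple case: 3 grid points ($e$, $f$, $g$) and 2 observations (1, 2).

So for example the background values at observation points are the following vector $\mathbf{y}^b$:

$$ \mathbf{H}\mathbf{x}^b = \begin{pmatrix} h_{1e} & h_{1f} & h_{1g} \\ h_{2e} & h_{2f} & h_{2g} \end{pmatrix} \begin{pmatrix} x^b_e \\ x^b_f \\ x^b_g \end{pmatrix} \tag{1.96} $$
$$ \phantom{\mathbf{H}\mathbf{x}^b} = \begin{pmatrix} y^b_1 \\ y^b_2 \end{pmatrix} = \mathbf{y}^b \tag{1.97} $$

In the Kalman Filter (see subsection 2.2.1) we will also have the term:

$$ \mathbf{P}^f \mathbf{H}^T = \begin{pmatrix} p^f_{ee} & p^f_{ef} & p^f_{eg} \\ p^f_{fe} & p^f_{ff} & p^f_{fg} \\ p^f_{ge} & p^f_{gf} & p^f_{gg} \end{pmatrix} \begin{pmatrix} h_{1e} & h_{2e} \\ h_{1f} & h_{2f} \\ h_{1g} & h_{2g} \end{pmatrix} = \begin{pmatrix} p^f_{e1} & p^f_{e2} \\ p^f_{f1} & p^f_{f2} \\ p^f_{g1} & p^f_{g2} \end{pmatrix} \tag{1.98} $$

that is a grid to observation points approximation, by interpolation, of the forecast error covariance matrix $\mathbf{P}^f$; for example, $p^f_{e2} = p^f_{ee} h_{2e} + p^f_{ef} h_{2f} + p^f_{eg} h_{2g}$. We will also see the term:

$$ \mathbf{H}\mathbf{P}^f\mathbf{H}^T = \begin{pmatrix} h_{1e} & h_{1f} & h_{1g} \\ h_{2e} & h_{2f} & h_{2g} \end{pmatrix} \begin{pmatrix} p^f_{ee} & p^f_{ef} & p^f_{eg} \\ p^f_{fe} & p^f_{ff} & p^f_{fg} \\ p^f_{ge} & p^f_{gf} & p^f_{gg} \end{pmatrix} \begin{pmatrix} h_{1e} & h_{2e} \\ h_{1f} & h_{2f} \\ h_{1g} & h_{2g} \end{pmatrix} = \begin{pmatrix} p^f_{11} & p^f_{12} \\ p^f_{21} & p^f_{22} \end{pmatrix} \tag{1.99} $$

which in turn is an approximation by back interpolation of the forecast error covariance matrix $\mathbf{P}^f$ between observation points [16].

We will also find the gain matrix

$$ \mathbf{K} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \tag{1.100} $$

that is an $N \times M$ matrix. In this low dimensional example it is $3 \times 2$:

$$ \mathbf{K} = \begin{pmatrix} p^f_{e1} & p^f_{e2} \\ p^f_{f1} & p^f_{f2} \\ p^f_{g1} & p^f_{g2} \end{pmatrix} \left[ \begin{pmatrix} p^f_{11} & p^f_{12} \\ p^f_{21} & p^f_{22} \end{pmatrix} + \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} \right]^{-1} \tag{1.101} $$

Chapter 2

Data Assimilation: state of the art

A NWP forecast is an initial value problem coupled with a boundary problem: carefully initialized and with the appropriate boundary conditions, a NWP model outputs the atmospheric evolution (forecast). Of course, to improve the quality of the forecast we need to better estimate the initial conditions, or the present state of the system.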
As we already stated in section 1.2, the purpose of Data Assimilation in NWP models is “using all the available information, to determine as accurately as possible the state of the atmospheric (or oceanic) flow” (Talagrand, 1997, [16]). The available information consists of (noisy) observations and a short-range forecast, which are combined statistically. In this chapter we will survey the most important techniques devised to combine this information, without any attempt to be exhaustive. In particular, we will talk about Variational and Sequential assimilation techniques. The former have been applied since the early 1970s, were greatly enhanced at the end of the 1980s, and are still largely used in operational contexts. The latter have an appealing probabilistic approach, but their implementation in realistic geophysical models turns out to be problematic, for reasons to be discussed below.

2.1 Variational assimilation

In the variational method the problem is basically to find the model trajectory which best fits the observational data within a given time interval $\tau$, often called the assimilation window, while satisfying the dynamical constraints: in the strong-constraint formulation the constraints have to be satisfied exactly; in the weak-constraint formulation they have to be satisfied only approximately [8]. Since the model equations are deterministic, all we need is the initial state and the boundary conditions, and the variational problem may be restated from a constrained to an unconstrained form. Thus, the idea is to find the initial state $\mathbf{x}(t_0)$ whose evolution under the model equations best fits the observational data within the assimilation window. In practice, we are looking for the initial state $\mathbf{x}(t_0)$ that minimizes a so-called cost function, which is a scalar functional of the trajectory $\mathbf{x}(t)$, to be minimized under the model equations constraint 1.2.
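The cost function $J(\mathbf{x}(t))$ is defined as the sum of the squared distance between $\mathbf{x}(t)$ and the forecast state $\mathbf{x}^f(t)$, weighted by the inverse of the forecast error covariance matrix $\mathbf{P}^f(t)$, plus the squared distance between the “first guess” observations $\mathcal{H}\mathbf{x}(t)$ and the actual observations $\mathbf{y}^o(t)$, weighted by the inverse of the observation error covariance matrix $\mathbf{R}$ [16]:

$$ J(\mathbf{x}(t)) = \frac{1}{2} \left[ \mathbf{x}(t) - \mathbf{x}^f(t) \right]^T \left[ \mathbf{P}^f(t) \right]^{-1} \left[ \mathbf{x}(t) - \mathbf{x}^f(t) \right] + \frac{1}{2} \left[ \mathcal{H}\mathbf{x}(t) - \mathbf{y}^o(t) \right]^T \mathbf{R}^{-1} \left[ \mathcal{H}\mathbf{x}(t) - \mathbf{y}^o(t) \right] \tag{2.1} $$

2.1.1 Maximum likelihood approach

Let us see a maximum likelihood motivation for eq. 2.1. Consider first, as an illustrative example, a 1-dimensional problem: we have two independent temperature observations $T_1$ and $T_2$, both assumed to have normally distributed errors, with standard deviations $\sigma_1$ and $\sigma_2$. The analysis temperature $T$ is the most likely value, given the two observations $T_1$ and $T_2$ and their statistical errors. The cost function, in this particular case, is:

$$ J(T) = \frac{1}{2} \left[ \frac{(T - T_1)^2}{\sigma_1^2} + \frac{(T - T_2)^2}{\sigma_2^2} \right] \tag{2.2} $$

The probability distribution of an observation $T_1$, given a true value $T$ and a standard deviation $\sigma_1$ for $T_1$, is

$$ p_{\sigma_1}(T_1 | T) = \frac{1}{\sqrt{2\pi} \, \sigma_1} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}} \tag{2.3} $$

A similar relation holds for $T_2$:

$$ p_{\sigma_2}(T_2 | T) = \frac{1}{\sqrt{2\pi} \, \sigma_2} \, e^{-\frac{(T_2 - T)^2}{2\sigma_2^2}} \tag{2.4} $$

The likelihood of a true value $T$, given an observation $T_1$ with a standard deviation $\sigma_1$, is given by [16]:

$$ L_{\sigma_1}(T | T_1) = p_{\sigma_1}(T_1 | T) = \frac{1}{\sqrt{2\pi} \, \sigma_1} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}} \tag{2.5} $$

and, in the same way, the likelihood of a true value $T$ given an observation $T_2$ with a standard deviation $\sigma_2$ is:

$$ L_{\sigma_2}(T | T_2) = p_{\sigma_2}(T_2 | T) = \frac{1}{\sqrt{2\pi} \, \sigma_2} \, e^{-\frac{(T_2 - T)^2}{2\sigma_2^2}} \tag{2.6} $$

Since the two observations $T_1$ and $T_2$ are independent, the likelihood of a true value $T$ given both $T_1$ and $T_2$ is the product:

$$ L_{\sigma_1 \sigma_2}(T | T_1, T_2) = L_{\sigma_1}(T | T_1) \cdot L_{\sigma_2}(T | T_2) = \frac{1}{\sqrt{2\pi} \, \sigma_1} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}} \; \frac{1}{\sqrt{2\pi} \, \sigma_2} \, e^{-\frac{(T_2 - T)^2}{2\sigma_2^2}} \tag{2.7} $$
$$ \phantom{L_{\sigma_1 \sigma_2}(T | T_1, T_2)} = \frac{1}{2\pi \, \sigma_1 \sigma_2} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2} - \frac{(T_2 - T)^2}{2\sigma_2^2}} \tag{2.8} $$

Thus, given the two measurements $T_1$ and $T_2$ and their standard deviations $\sigma_1$ and $\sigma_2$, we can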
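find the most likely value of $T$ by maximizing the likelihood 2.8, or equivalently its logarithm:

$$ \max_T \, \ln L_{\sigma_1 \sigma_2}(T | T_1, T_2) = \max_T \left[ \text{constant} - \frac{(T_1 - T)^2}{2\sigma_1^2} - \frac{(T_2 - T)^2}{2\sigma_2^2} \right] \tag{2.9} $$

This maximization is equivalent to the minimization of

$$ J(T) = \frac{1}{2} \left[ \frac{(T_1 - T)^2}{\sigma_1^2} + \frac{(T_2 - T)^2}{\sigma_2^2} \right] \tag{2.10} $$

which is the cost function 2.1 for this simplified case.

Let us now consider the more general cost function (eq. 2.1): we define the likelihood of the true state $\mathbf{x}(t)$ given the forecast field $\mathbf{x}^f(t)$ (used as background field), or given the new observations $\mathbf{y}^o$, in the following way [16]:

$$ L_{P^f}(\mathbf{x} | \mathbf{x}^f) = p_{P^f}(\mathbf{x}^f | \mathbf{x}) = \frac{1}{(2\pi)^{N/2} \, |\mathbf{P}^f|^{1/2}} \, e^{-\frac{1}{2} \left[ (\mathbf{x}^f - \mathbf{x})^T (\mathbf{P}^f)^{-1} (\mathbf{x}^f - \mathbf{x}) \right]} \tag{2.11} $$

$$ L_{R}(\mathbf{x} | \mathbf{y}^o) = p_{R}(\mathbf{y}^o | \mathbf{x}) = \frac{1}{(2\pi)^{M/2} \, |\mathbf{R}|^{1/2}} \, e^{-\frac{1}{2} \left[ (\mathbf{y}^o - \mathcal{H}\mathbf{x})^T \mathbf{R}^{-1} (\mathbf{y}^o - \mathcal{H}\mathbf{x}) \right]} \tag{2.12} $$

where $N$ is the number of components of the vectors $\mathbf{x}(t)$ and $\mathbf{x}^f(t)$, while $M$ is that of the vector $\mathbf{y}^o$. Since the forecast $\mathbf{x}^f$ and the new observations $\mathbf{y}^o$ are independent, the joint likelihood is the product of the two Gaussian likelihoods:

$$ L(\mathbf{x} | \mathbf{x}^f, \mathbf{y}^o) = L_{P^f}(\mathbf{x} | \mathbf{x}^f) \, L_{R}(\mathbf{x} | \mathbf{y}^o) \tag{2.13} $$
$$ \phantom{L(\mathbf{x} | \mathbf{x}^f, \mathbf{y}^o)} = \frac{ e^{-\frac{1}{2} \left[ (\mathbf{x}^f - \mathbf{x})^T (\mathbf{P}^f)^{-1} (\mathbf{x}^f - \mathbf{x}) \right] - \frac{1}{2} \left[ (\mathbf{y}^o - \mathcal{H}\mathbf{x})^T \mathbf{R}^{-1} (\mathbf{y}^o - \mathcal{H}\mathbf{x}) \right]} }{ (2\pi)^{(N+M)/2} \, |\mathbf{P}^f|^{1/2} \, |\mathbf{R}|^{1/2} } \tag{2.14} $$

The most likely analysis state $\mathbf{x}^a$, which maximizes the joint likelihood and its logarithm as well, also minimizes the cost function 2.1.

2.1.2 Bayesian approach

Let us go back to our simplified 1-dimensional case, discussed in subsection 2.1.1: there is also a Bayesian derivation for 2.10.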
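Before turning to it, it may help to note that the cost function 2.10 can be minimized in closed form. Setting $dJ/dT = 0$ gives $(T - T_1)/\sigma_1^2 + (T - T_2)/\sigma_2^2 = 0$, i.e. the most likely temperature is the average of the two observations weighted by their inverse error variances. The Python sketch below illustrates this; the function names and the numerical values of the temperatures and standard deviations are hypothetical, chosen only for illustration.

```python
def cost(t, t1, sigma1, t2, sigma2):
    """Cost function J(T) of eq. 2.10 for two independent observations."""
    return 0.5 * ((t1 - t) ** 2 / sigma1 ** 2 + (t2 - t) ** 2 / sigma2 ** 2)


def most_likely_temperature(t1, sigma1, t2, sigma2):
    """Minimizer of J(T): the inverse-variance weighted mean of t1 and t2."""
    w1 = 1.0 / sigma1 ** 2
    w2 = 1.0 / sigma2 ** 2
    return (w1 * t1 + w2 * t2) / (w1 + w2)


# Hypothetical observations: 20 degrees with sigma 1, 22 degrees with sigma 2.
t_analysis = most_likely_temperature(20.0, 1.0, 22.0, 2.0)
```

With these illustrative values the analysis lies much closer to the more accurate observation (20 degrees) than to the less accurate one, as expected from the weights.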
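The observation $T_1$ (the forecast state in the assimilation cycle) provides an a priori probability distribution of the truth, that is, a distribution valid before the second observation is made:

$$ p_{T_1 \sigma_1}(T) = \frac{1}{\sqrt{2\pi} \, \sigma_1} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}} \tag{2.15} $$

Bayes' theorem for the a posteriori probability of the truth, given the new measurement $T_2$, is:

$$ p_{\sigma_2}(T | T_2) = \frac{p_{\sigma_2}(T_2 | T) \; p_{T_1 \sigma_1}(T)}{p_{\sigma_2}(T_2)} = \frac{ \dfrac{1}{\sqrt{2\pi} \, \sigma_2} \, e^{-\frac{(T_2 - T)^2}{2\sigma_2^2}} \; \dfrac{1}{\sqrt{2\pi} \, \sigma_1} \, e^{-\frac{(T_1 - T)^2}{2\sigma_1^2}} }{ p_{\sigma_2}(T_2) } \tag{2.16} $$

Since the denominator

$$ p_{\sigma_2}(T_2) = \int_{T^*} \frac{1}{\sqrt{2\pi} \, \sigma_2} \, e^{-\frac{(T_2 - T^*)^2}{2\sigma_2^2}} \, dT^* $$

is independent of $T$, maximizing the a posteriori probability 2.16 means maximizing the logarithm of the numerator, which leads again to the minimization of the cost function 2.10.

In the more general case (eq. 2.1), we suppose that the truth vector $\mathbf{x}(t)$ is the result of a stochastic process defined by the following a priori probability distribution function, given the forecast field $\mathbf{x}^f(t)$ (used as background):

$$ p_{P^f}(\mathbf{x}) = \frac{1}{(2\pi)^{N/2} \, |\mathbf{P}^f|^{1/2}} \, e^{-\frac{1}{2} \left[ (\mathbf{x}^f - \mathbf{x})^T (\mathbf{P}^f)^{-1} (\mathbf{x}^f - \mathbf{x}) \right]} \tag{2.17} $$

When we get new observations $\mathbf{y}^o$, Bayes' theorem gives us the a posteriori probability:

$$ p(\mathbf{x} | \mathbf{y}^o) = \frac{p_R(\mathbf{y}^o | \mathbf{x}) \; p_{P^f}(\mathbf{x})}{p(\mathbf{y}^o)} \tag{2.18} $$

where $p(\mathbf{y}^o)$ is the climatological distribution of the observations. Eq. 2.18 gives:

$$ p(\mathbf{x} | \mathbf{y}^o) = \frac{ e^{-\frac{1}{2} \left[ (\mathbf{y}^o - \mathcal{H}\mathbf{x})^T \mathbf{R}^{-1} (\mathbf{y}^o - \mathcal{H}\mathbf{x}) + (\mathbf{x}^f - \mathbf{x})^T (\mathbf{P}^f)^{-1} (\mathbf{x}^f - \mathbf{x}) \right]} }{ (2\pi)^{(N+M)/2} \, |\mathbf{R}|^{1/2} \, |\mathbf{P}^f|^{1/2} \; p(\mathbf{y}^o) } \tag{2.19} $$

Since $p(\mathbf{y}^o)$ does not depend on the current state $\mathbf{x}(t)$, the maximum of the a posteriori probability 2.19 coincides with the maximum of the numerator, i.e. with the minimum of the cost function 2.1.

2.1.3 3D-Var scheme

The analysis state that minimizes the cost function $J(\mathbf{x})$ in eq.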
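2.1 satisfies:

$$ \nabla_{\mathbf{x}} J(\mathbf{x}^a) = 0 \tag{2.20} $$

If we assume that the analysis is a good approximation to the truth and to the observations, we can linearize the observation operator $\mathcal{H}$ around the background, or around the forecast if we use this as the background field:

$$ \mathbf{y}^o - \mathcal{H}(\mathbf{x}) = \mathbf{y}^o - \mathcal{H}[\mathbf{x}^f + (\mathbf{x} - \mathbf{x}^f)] \tag{2.21} $$
$$ \phantom{\mathbf{y}^o - \mathcal{H}(\mathbf{x})} = \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) - \mathbf{H}(\mathbf{x} - \mathbf{x}^f) \tag{2.22} $$

After dropping the time dependence for the sake of clarity, we can rearrange the expression for the cost function:

$$ J(\mathbf{x}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}^f)^T (\mathbf{P}^f)^{-1} (\mathbf{x} - \mathbf{x}^f) + \frac{1}{2} [\mathcal{H}(\mathbf{x}) - \mathbf{y}^o]^T \mathbf{R}^{-1} [\mathcal{H}(\mathbf{x}) - \mathbf{y}^o] \tag{2.23} $$

$$ = \frac{1}{2} (\mathbf{x} - \mathbf{x}^f)^T (\mathbf{P}^f)^{-1} (\mathbf{x} - \mathbf{x}^f) + \frac{1}{2} [\mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) - \mathbf{H}(\mathbf{x} - \mathbf{x}^f)]^T \mathbf{R}^{-1} [\mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) - \mathbf{H}(\mathbf{x} - \mathbf{x}^f)] \tag{2.24} $$

$$ = \frac{1}{2} (\mathbf{x} - \mathbf{x}^f)^T (\mathbf{P}^f)^{-1} (\mathbf{x} - \mathbf{x}^f) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^f)^T \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} (\mathbf{x} - \mathbf{x}^f) + \frac{1}{2} [\mathcal{H}(\mathbf{x}^f) - \mathbf{y}^o]^T \mathbf{R}^{-1} \mathbf{H} (\mathbf{x} - \mathbf{x}^f) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^f)^T \mathbf{H}^T \mathbf{R}^{-1} [\mathcal{H}(\mathbf{x}^f) - \mathbf{y}^o] + \frac{1}{2} [\mathcal{H}(\mathbf{x}^f) - \mathbf{y}^o]^T \mathbf{R}^{-1} [\mathcal{H}(\mathbf{x}^f) - \mathbf{y}^o] \tag{2.25} $$

and see that it is a quadratic function of the analysis increment $\mathbf{x} - \mathbf{x}^f$. Now, consider the general quadratic function

$$ F(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} + \mathbf{v}^T \mathbf{x} + k \tag{2.26} $$

where $\mathbf{x}$ and $\mathbf{v}$ are vectors of the same $N$-dimensional vector space, $\mathbf{A}$ is an $N \times N$ symmetric matrix and $k$ is a scalar. It can be shown [16] that its gradient is

$$ \nabla_{\mathbf{x}} F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{v} \tag{2.27} $$

So, for the cost function 2.25, whose gradient with respect to $\mathbf{x}$ is the same as that with respect to $\mathbf{x} - \mathbf{x}^f$, we have:

$$ \nabla J(\mathbf{x}) = \left[ (\mathbf{P}^f)^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right] (\mathbf{x} - \mathbf{x}^f) + \mathbf{H}^T \mathbf{R}^{-1} \left[ \mathcal{H}(\mathbf{x}^f) - \mathbf{y}^o \right] \tag{2.28} $$

In order to minimize it and to calculate the analysis state $\mathbf{x}^a$, we set $\nabla J(\mathbf{x}^a) = 0$, so:

$$ \left[ (\mathbf{P}^f)^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right] (\mathbf{x}^a - \mathbf{x}^f) = \mathbf{H}^T \mathbf{R}^{-1} \left[ \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) \right] \tag{2.29} $$

or alternatively:

$$ \mathbf{x}^a = \mathbf{x}^f + \left[ (\mathbf{P}^f)^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right]^{-1} \mathbf{H}^T \mathbf{R}^{-1} \left[ \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) \right] \tag{2.30} $$

The last equation can be expressed in terms of the analysis increment $\delta\mathbf{x}^a = \mathbf{x}^a - \mathbf{x}^f$ and the innovation $\delta\mathbf{y}^o = \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f)$:

$$ \delta\mathbf{x}^a = \left[ (\mathbf{P}^f)^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right]^{-1} \mathbf{H}^T \mathbf{R}^{-1} \, \delta\mathbf{y}^o \tag{2.31} $$

which is the solution of the variational analysis problem, called 3D-Var.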
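Equation 2.31 can be illustrated with a small numerical sketch. The Python code below evaluates the analysis increment for a hypothetical system with $N = 2$ state variables and $M = 1$ observation of the first variable only; the matrices, the innovation and the function names are made-up assumptions for illustration, and the $2 \times 2$ inverses are written out by hand to keep the example self-contained.

```python
def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as a list of two rows."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def three_d_var_increment(p_f, h, r, innovation):
    """Analysis increment of eq. 2.31 for N = 2 and M = 1.

    p_f: 2x2 forecast error covariance matrix (list of rows)
    h: the single row of the linearized observation operator H
    r: scalar observation error variance
    innovation: scalar y^o - H(x^f)
    """
    p_inv = inverse_2x2(p_f)
    # (P^f)^-1 + H^T R^-1 H
    hessian = [[p_inv[i][j] + h[i] * h[j] / r for j in range(2)]
               for i in range(2)]
    hessian_inv = inverse_2x2(hessian)
    # H^T R^-1 delta y^o
    rhs = [h[i] * innovation / r for i in range(2)]
    return [sum(hessian_inv[i][j] * rhs[j] for j in range(2))
            for i in range(2)]


# Hypothetical values: correlated forecast errors, first variable observed.
p_f = [[1.0, 0.5], [0.5, 1.0]]
increment = three_d_var_increment(p_f, h=[1.0, 0.0], r=1.0, innovation=2.0)
```

For these values the increment is approximately (1.0, 0.5): the second variable, although not observed, is corrected through the off-diagonal element of $\mathbf{P}^f$.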
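It can also be shown that

$$ \left[ (\mathbf{P}^f)^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right]^{-1} \mathbf{H}^T \mathbf{R}^{-1} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \tag{2.32} $$

so that

$$ \delta\mathbf{x}^a = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \delta\mathbf{y}^o \tag{2.33} $$

or equivalently:

$$ \mathbf{x}^a = \mathbf{x}^f + \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \left[ \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) \right] \tag{2.34} $$

Subtracting the true state vector $\mathbf{x}^t$ from both the lhs and the rhs, we get an expression for the analysis error:

$$ \boldsymbol{\eta}^a = \boldsymbol{\eta}^f + \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \left[ \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) \right] \tag{2.35} $$

If we define the gain matrix

$$ \mathbf{K} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \tag{2.36} $$

the equations 2.33 and 2.34 become:

$$ \delta\mathbf{x}^a = \mathbf{K} \, \delta\mathbf{y}^o \tag{2.37} $$
$$ \mathbf{x}^a = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{x}^f + \mathbf{K}\mathbf{y}^o \tag{2.38} $$

If the forecast error $\boldsymbol{\eta}^f$ is small, we can linearize the observation operator $\mathcal{H}$ and the equation $\mathcal{H}(\mathbf{x} + \delta\mathbf{x}) = \mathcal{H}(\mathbf{x}) + \mathbf{H} \, \delta\mathbf{x}$ holds true; so equation 2.35 yields:

$$ \boldsymbol{\eta}^a = \boldsymbol{\eta}^f + \mathbf{K}\mathbf{y}^o - \mathbf{K}\mathcal{H}(\mathbf{x}^f) \tag{2.39} $$
$$ \phantom{\boldsymbol{\eta}^a} = \boldsymbol{\eta}^f + \mathbf{K}\mathbf{y}^o - \mathbf{K}\mathcal{H}(\mathbf{x}^t + \mathbf{x}^f - \mathbf{x}^t) \tag{2.40} $$
$$ \phantom{\boldsymbol{\eta}^a} = \boldsymbol{\eta}^f + \mathbf{K}\mathbf{y}^o - \mathbf{K}\mathcal{H}(\mathbf{x}^t + \boldsymbol{\eta}^f) \tag{2.41} $$
$$ \phantom{\boldsymbol{\eta}^a} = \boldsymbol{\eta}^f + \mathbf{K}\mathbf{y}^o - \mathbf{K}\mathcal{H}(\mathbf{x}^t) - \mathbf{K}\mathbf{H}\boldsymbol{\eta}^f \tag{2.42} $$
$$ \phantom{\boldsymbol{\eta}^a} = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \boldsymbol{\eta}^f + \mathbf{K}\boldsymbol{\eta}^o \tag{2.43} $$

where in the last step we used the definition of the observational error, $\boldsymbol{\eta}^o = \mathbf{y}^o - \mathcal{H}\mathbf{x}^t$ (eq. 1.72). Since we may assume that the forecast error and the observation error are uncorrelated, the analysis error covariance matrix may be written as follows:

$$ \mathbf{P}^a = \langle \boldsymbol{\eta}^a (\boldsymbol{\eta}^a)^T \rangle \tag{2.44} $$
$$ \phantom{\mathbf{P}^a} = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{P}^f \, [\mathbf{I} - \mathbf{K}\mathbf{H}]^T + \mathbf{K}\mathbf{R}\mathbf{K}^T \tag{2.45} $$
$$ \phantom{\mathbf{P}^a} = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{P}^f \tag{2.46} $$

where the last equation has been derived by inserting into eq. 2.45 the expression for the gain matrix $\mathbf{K} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1}$. Indeed, from eq. 2.45 we have:

$$ \mathbf{P}^a = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{P}^f - [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{P}^f \mathbf{H}^T \mathbf{K}^T + \mathbf{K}\mathbf{R}\mathbf{K}^T \tag{2.47} $$
$$ \phantom{\mathbf{P}^a} = [\mathbf{I} - \mathbf{K}\mathbf{H}] \, \mathbf{P}^f - \left[ \mathbf{P}^f \mathbf{H}^T - \mathbf{K}\mathbf{H}\mathbf{P}^f\mathbf{H}^T \right] \mathbf{K}^T + \mathbf{K}\mathbf{R}\mathbf{K}^T \tag{2.48} $$

Now, the last two terms in eq. 2.48 cancel each other, because

$$ \mathbf{K} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \iff \mathbf{K}\mathbf{R} = \mathbf{P}^f \mathbf{H}^T - \mathbf{K}\mathbf{H}\mathbf{P}^f\mathbf{H}^T \tag{2.49} $$

2.2 Sequential assimilation

The sequential assimilation methods take a probabilistic approach to estimating the state of a system. The basic idea is to project information ahead in time and to assimilate observational data when they become available. We do not need to compute the adjoint model: sequential assimilation schemes are suitable for different models. In the following subsections we will talk about the Kalman Filter for linear systems and the Extended Kalman Filter for nonlinear ones. At the end, we will add a note about the very promising Ensemble Kalman Filter.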
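Before turning to the individual filters, the equivalence between eqs. 2.45 and 2.46 derived in subsection 2.1.3 can be checked numerically, since the same gain matrix reappears in the Kalman Filter. The Python sketch below does so for a hypothetical case with two state variables and one observation; the matrices and the helper functions (`matmul`, `transpose`, `add`, `sub`) are illustrative assumptions, not part of the software used in this work.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two matrices of equal shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical matrices: N = 2 state variables, M = 1 observation.
P_F = [[1.0, 0.5], [0.5, 1.0]]
H = [[1.0, 0.0]]
R = [[1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Gain matrix (eq. 2.36); H P^f H^T + R is 1x1 here, so it is inverted
# by a simple division.
PHT = matmul(P_F, transpose(H))
S = add(matmul(H, PHT), R)
K = [[row[0] / S[0][0]] for row in PHT]

I_KH = sub(IDENTITY, matmul(K, H))
# Eq. 2.45: [I - KH] P^f [I - KH]^T + K R K^T
P_A_LONG = add(matmul(matmul(I_KH, P_F), transpose(I_KH)),
               matmul(matmul(K, R), transpose(K)))
# Eq. 2.46: [I - KH] P^f
P_A_SHORT = matmul(I_KH, P_F)
```

The two expressions agree only because `K` is the gain of eq. 2.36; for any other gain matrix, only the longer form 2.45 remains valid.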
2.2.1 Kalman Filter

The Kalman Filter, hereafter referred to as KF, was first formulated by Kalman (1960) and Kalman and Bucy (1961), so it is sometimes called the Kalman-Bucy Filter. The KF deals with linear stochastic dynamical systems where noisy observations are taken at discrete times. It is an optimal recursive data processing algorithm, where “optimal” refers to the fact that it uses all the available information we can provide: it processes all measurements, when available, regardless of their precision, by using [23]:

• the knowledge of the dynamics of the system and of the measurement device

• the statistical description of the system noises, measurement errors and model error

• any available information about the initial conditions of the system

Moreover, it does not need old data to be kept in storage.

The KF is basically a set of equations that implements a prediction-correction estimator: if some conditions are met, the estimator minimizes the estimated error covariance [23]. If the conditions are not fully satisfied, the KF often still works quite well. A sketch of how it works is shown in Fig. 2.1: the prediction stage consists of the first two equations, which project ahead the analysis state and the analysis error covariance matrix, providing the forecast vector and the forecast error covariance matrix. The correction stage consists of three further equations, and takes advantage of new observations to provide the new analysis vector and the new analysis error covariance matrix. Then, recursively, a new projection ahead is performed until new observations become available.

Let us consider a linear discrete stochastic dynamical system of the form of eq. 1.11, i.e.

$$ \mathbf{x}^t_{k+1} = \mathbf{M}\mathbf{x}^t_k + \mathbf{q}_{k+1} \tag{2.50} $$

Here the linear model operator $\mathbf{M}$ projects information ahead from time $t_k$ to time $t_{k+1}$, so actually

$$ \mathbf{M} = \mathbf{M}(t_k, t_{k+1}) \tag{2.51} $$

We will use this simplified notation in the equations below. The term $\mathbf{q}_{k+1}$ in eq.
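2.50 is a white Gaussian noise with zero mean and covariance matrix $\mathbf{Q}_{k+1}$, namely

$$ \langle \mathbf{q}_{k+1} \rangle = 0 \tag{2.52} $$
$$ \mathbf{Q}_{k+1} = \langle \mathbf{q}_{k+1} \mathbf{q}^T_{k+1} \rangle \tag{2.53} $$

and may represent subgrid-scale processes not resolved by the model [14]. Under these conditions the KF is described by the following set of equations:

$$ \mathbf{x}^f_{k+1} = \mathbf{M}\mathbf{x}^a_k \tag{2.54} $$
$$ \mathbf{P}^f_{k+1} = \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T + \mathbf{Q}_{k+1} \tag{2.55} $$
$$ \mathbf{K}_{k+1} = \mathbf{P}^f_{k+1}\mathbf{H}^T (\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \tag{2.56} $$
$$ \mathbf{x}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}) \, \mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \tag{2.57} $$
$$ \mathbf{P}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}) \, \mathbf{P}^f_{k+1} \tag{2.58} $$

where, following usual conventions:

• $\mathbf{x}^t$ is the true state
• $\mathbf{x}^a$ is the analysis state
• $\mathbf{x}^f$ the forecast state
• $\mathbf{y}^o$ the observation vector
• $\mathbf{P}^a$ the analysis error covariance matrix
• $\mathbf{P}^f$ the forecast error covariance matrix
• $\mathbf{R}$ the observational error covariance matrix
• $\mathcal{H}$ the (possibly nonlinear) observation operator
• $\mathbf{H}$ the linearized observation operator (it transforms vectors in model space to vectors in observation space)
• $\mathbf{M}$ the linear model operator
• $\mathbf{Q}$ the forecast model error covariance matrix
• $\mathbf{K}$ the gain matrix
• $\mathbf{I}$ the identity matrix.

Subscripts in the KF equations indicate the time steps where new observations are available.

Figure 2.1: How the Kalman Filter works.

If we compute the expectation value of eq. 2.50 we get:

$$ \langle \mathbf{x}^t_{k+1} \rangle = \langle \mathbf{M}\mathbf{x}^t_k + \mathbf{q}_{k+1} \rangle \tag{2.59} $$
$$ \phantom{\langle \mathbf{x}^t_{k+1} \rangle} = \langle \mathbf{M}\mathbf{x}^t_k \rangle \tag{2.60} $$
$$ \phantom{\langle \mathbf{x}^t_{k+1} \rangle} = \mathbf{M} \langle \mathbf{x}^t_k \rangle \tag{2.61} $$

or, in the usual data assimilation notation:

$$ \mathbf{x}^f_{k+1} = \mathbf{M}\mathbf{x}^a_k \tag{2.62} $$

which is eq. 2.54. It should be noticed that, due to linearity of this special case, if the initial PDF is Gaussian, so it will remain for any future time: we can have a complete description of the future PDF by its mean and covariance. So we need the covariance matrix $\mathbf{P}^f$ evolution, from time $t_k$ to time $t_{k+1}$.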
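It can be derived by rearranging the forecast error definition, together with the linearity of the model operator $\mathbf{M}$, and by using equations 2.50 and 2.62:

$$ \boldsymbol{\eta}^f_{k+1} = \mathbf{x}^f_{k+1} - \mathbf{x}^t_{k+1} \tag{2.63} $$
$$ \phantom{\boldsymbol{\eta}^f_{k+1}} = \mathbf{M}\mathbf{x}^a_k - \mathbf{M}\mathbf{x}^t_k - \mathbf{q}_{k+1} \tag{2.64} $$
$$ \phantom{\boldsymbol{\eta}^f_{k+1}} = \mathbf{M}(\mathbf{x}^a_k - \mathbf{x}^t_k) - \mathbf{q}_{k+1} \tag{2.65} $$
$$ \phantom{\boldsymbol{\eta}^f_{k+1}} = \mathbf{M}\boldsymbol{\eta}^a_k - \mathbf{q}_{k+1} \tag{2.66} $$

Since $\mathbf{q}_{k+1}$ is a white Gaussian noise with zero mean, the expectation value of $\boldsymbol{\eta}^f_{k+1}$ reads:

$$ \langle \boldsymbol{\eta}^f_{k+1} \rangle = \mathbf{M} \langle \boldsymbol{\eta}^a_k \rangle \tag{2.67} $$

Now, if the analysis error and the model error are uncorrelated, the cross terms vanish when we multiply $\boldsymbol{\eta}^f_{k+1}$ by its transpose and take the expectation value, and we get:

$$ \mathbf{P}^f_{k+1} = \langle \boldsymbol{\eta}^f_{k+1} (\boldsymbol{\eta}^f_{k+1})^T \rangle \tag{2.68} $$
$$ \phantom{\mathbf{P}^f_{k+1}} = \langle (\mathbf{M}\boldsymbol{\eta}^a_k - \mathbf{q}_{k+1})(\mathbf{M}\boldsymbol{\eta}^a_k - \mathbf{q}_{k+1})^T \rangle \tag{2.69} $$
$$ \phantom{\mathbf{P}^f_{k+1}} = \mathbf{M} \langle \boldsymbol{\eta}^a_k (\boldsymbol{\eta}^a_k)^T \rangle \mathbf{M}^T + \langle \mathbf{q}_{k+1} (\mathbf{q}_{k+1})^T \rangle \tag{2.70} $$
$$ \phantom{\mathbf{P}^f_{k+1}} = \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T + \mathbf{Q}_{k+1} \tag{2.71} $$

which is eq. 2.55. The analysis state is computed as in the 3D-Var assimilation scheme, eq. 2.34:

$$ \mathbf{x}^a = \mathbf{x}^f + \mathbf{K}_{k+1} \left[ \mathbf{y}^o - \mathcal{H}(\mathbf{x}^f) \right] \tag{2.72} $$

or equivalently

$$ \mathbf{x}^a = [\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}] \, \mathbf{x}^f + \mathbf{K}_{k+1}\mathbf{y}^o \tag{2.73} $$

where the Kalman gain $\mathbf{K}_{k+1}$ is defined by

$$ \mathbf{K}_{k+1} = \mathbf{P}^f \mathbf{H}^T \left[ \mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R} \right]^{-1} \tag{2.74} $$

The analysis error covariance matrix is given by eq. 2.46, i.e.:

$$ \mathbf{P}^a = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}) \, \mathbf{P}^f_{k+1} \tag{2.75} $$

2.2.2 Extended Kalman Filter

One of the methods devised to address the problem of estimating the initial conditions for a forecast model is the Extended Kalman Filter (EKF), where the term “extended” refers to the extension of the Kalman Filter, by approximation, to nonlinear systems. The EKF is described by the following set of equations (see for example [14]):

$$ \mathbf{x}^f_{k+1} = \mathcal{M}\mathbf{x}^a_k \tag{2.76} $$
$$ \mathbf{P}^f_{k+1} = \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T + \mathbf{Q}_k \tag{2.77} $$
$$ \mathbf{K}_{k+1} = \mathbf{P}^f_{k+1}\mathbf{H}^T (\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \tag{2.78} $$
$$ \mathbf{x}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}) \, \mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \tag{2.79} $$
$$ \mathbf{P}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H}) \, \mathbf{P}^f_{k+1} \tag{2.80} $$

where we used the same conventions as for the KF, together with the nonlinear model operator $\mathcal{M}$. Here $\mathbf{M}$ is no longer the linear model operator, but the Tangent Linear Model operator. Subscripts indicate the time steps where new observations are available and, again, we dropped them for the operators $\mathcal{M}$, $\mathbf{M}$, $\mathcal{H}$, $\mathbf{H}$.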
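One cycle of equations 2.76 to 2.80 can be sketched in a few lines for a scalar state. In the Python example below everything specific is a hypothetical assumption made for illustration: the nonlinear model (a single logistic-growth step), its derivative used as tangent linear model, the linear observation operator `h`, and all numerical values. The sketch only shows the order of the operations; it is not the implementation used in this work.

```python
def model(x):
    """Hypothetical nonlinear model step (one logistic-growth increment)."""
    return x + 0.1 * x * (1.0 - x)


def tangent_linear(x):
    """Derivative of model() at x, i.e. the scalar tangent linear model."""
    return 1.0 + 0.1 * (1.0 - 2.0 * x)


def ekf_cycle(x_a, p_a, y_o, q, r, h=1.0):
    """One forecast-analysis cycle of the scalar EKF (eqs. 2.76 to 2.80)."""
    x_f = model(x_a)                          # forecast state, eq. 2.76
    m = tangent_linear(x_a)                   # linearization around x_a
    p_f = m * p_a * m + q                     # forecast error variance, eq. 2.77
    k = p_f * h / (h * p_f * h + r)           # gain, eq. 2.78
    x_a_new = (1.0 - k * h) * x_f + k * y_o   # analysis state, eq. 2.79
    p_a_new = (1.0 - k * h) * p_f             # analysis error variance, eq. 2.80
    return x_a_new, p_a_new, k


# Hypothetical values for one cycle.
x_a_new, p_a_new, gain = ekf_cycle(x_a=0.5, p_a=0.04, y_o=0.625, q=0.01, r=0.05)
```

With these values the forecast and observation error variances are equal, so the gain is one half and the analysis falls midway between the forecast (0.525) and the observation (0.625).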
For the EKF, due to the lack of linearity of the model operator $\mathcal{M}$, equation 2.66 for the forecast error is no longer valid; nonetheless it may be rewritten in an approximate form, by using the tangent linear model operator and the equation $\mathbf{x}^t_{k+1} = \mathcal{M}(\mathbf{x}^t_k) + \mathbf{q}_{k+1}$:

\begin{align}
\boldsymbol{\eta}^f_{k+1} &= \mathbf{x}^f_{k+1} - \mathbf{x}^t_{k+1} \tag{2.81}\\
&= \mathcal{M}(\mathbf{x}^a_k) - \mathcal{M}(\mathbf{x}^t_k) - \mathbf{q}_{k+1} \tag{2.82}\\
&\simeq \mathbf{M}(\mathbf{x}^a_k - \mathbf{x}^t_k) - \mathbf{q}_{k+1} \tag{2.83}
\end{align}

So, since

\[ \boldsymbol{\eta}^f_{k+1} \simeq \mathbf{M}\boldsymbol{\eta}^a_k - \mathbf{q}_{k+1} \tag{2.84} \]

is only an approximate expression, the corresponding expression for the forecast error covariance matrix (eq. 2.71) will be approximate as well:

\[ \mathbf{P}^f_{k+1} \simeq \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T + \mathbf{Q}_k \tag{2.85} \]

The EKF, after an initial transient, should give both the best linear unbiased estimate of the state of the system and its error covariance. But if the system is (locally) highly nonlinear, or if the observations are not sufficiently frequent, the linearization hypothesis may not actually be fulfilled: this can jeopardize the stability of the filter, and the filter may diverge [16]. Furthermore, for realistic NWP models the EKF cannot be implemented, due to both the prohibitive computational cost of estimating the covariance matrices and the uncertainties about the model error.

#### 2.2.3 Ensemble Kalman Filter

In the Ensemble Kalman Filter (EnKF) approach, proposed by Evensen in 1994 [10], an ensemble of $N_{ens}$ data assimilation cycles is run simultaneously and independently: all of them assimilate the same set of observations, but a different random perturbation is added to the observations for each member of the ensemble. This ensemble can be used to estimate the forecast error covariance matrix $\mathbf{P}^f$: after computing the analysis state $\mathbf{x}^a_j(t_k)$, $j \in \{1, \dots, N_{ens}\}$, for each member of the ensemble, we can obtain the forecast states:

\[ \mathbf{x}^f_j(t_{k+1}) = \mathcal{M}^j_k\,\mathbf{x}^a_j(t_k), \qquad j \in \{1, \dots, N_{ens}\} \tag{2.86} \]

the ensemble average $\overline{\mathbf{x}}^f(t_{k+1})$ and an estimate of the forecast error covariance matrix, a sort of average of the $N_{ens}$ forecast error covariance matrices, e.g.:

\[ \mathbf{P}^f_{k+1} \simeq \frac{1}{N_{ens} - 1}\sum_{j=1}^{N_{ens}} \left[\mathbf{x}^f_j(t_{k+1}) - \overline{\mathbf{x}}^f(t_{k+1})\right]\left[\mathbf{x}^f_j(t_{k+1}) - \overline{\mathbf{x}}^f(t_{k+1})\right]^T \tag{2.87} \]

This will actually tend to underestimate the forecast error covariance matrix $\mathbf{P}^f$; other estimates can be devised [16]. The EnKF approach has many advantages, among which:

• since $N_{ens}$ is typically somewhere between 10 and 100, the computational cost of the EnKF is increased by the same factor with respect to 3D-Var, for example, but it is much smaller than that of an EKF
• the EnKF does not need a linear or adjoint model
• it does not even require the linearization of the evolution of the forecast error covariance matrix $\mathbf{P}^f$

Although the EnKF is not yet implemented in operational NWP forecasting, it currently seems one of the most promising assimilation schemes for the future.

### 2.3 AUS: Assimilation in the Unstable Subspace

The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi in 2004 [33], hereafter referred to as TU, and developed by Trevisan, Uboldi and Carrassi [34, 35, 3], to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment $\delta\mathbf{x}^a = \mathbf{x}^a - \mathbf{x}^f$ to the unstable subspace of the forecast-analysis cycle system, so that it has the same structure as the dominant instabilities of the system. In this way the dynamically unstable components present in the forecast error, which are responsible for error growth, are in principle systematically reduced or eliminated. The unstable subspace is estimated by breeding on the data assimilation system (BDAS), a technique to be discussed below.
TU showed that AUS is a reliable and efficient approach in the 40-variable Lorenz 1996 model [20], while the subsequent studies proved the same for different, more realistic models and observational configurations, including a Quasi-Geostrophic model with 14784 degrees of freedom [3] and a high-dimensional, primitive equation ocean model with 301120 degrees of freedom [35]; the experiments encompassed fixed and "adaptive", or "targeted", observations. In these contexts, the AUS-BDAS dynamical system approach greatly reduces the analysis error, at a reasonable computational cost compared, for example, with a prohibitive full Extended Kalman Filter approach. This is a follow-up study in which we revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective model.

In practice, in the same spirit as the EKF approach (equations 2.76-2.80), the AUS assimilation algorithm aims at finding a simplified form of the forecast error covariance matrix $\mathbf{P}^f$ by exploiting the local unstable structures of the forecast-analysis cycle system, which in turn are estimated by BDAS. Once $\mathbf{P}^f$ has been estimated, an approximate gain matrix $\mathbf{K}$ may be computed, so that finally, with some knowledge of the observational error covariance matrix $\mathbf{R}$ and of the observation operator $\mathcal{H}$, we can estimate the analysis vector $\mathbf{x}^a$.

#### 2.3.1 AUS: how it works

The forecast error is regarded as made of two components, the first on the unstable subspace and the other on the complementary subspace [35]:

\[ \boldsymbol{\eta}^f = \mathbf{E}\boldsymbol{\gamma} + \boldsymbol{\xi} \tag{2.88} \]

where the matrix $\mathbf{E}$ stores in its columns the normalized estimated unstable directions and the column vector $\boldsymbol{\gamma}$ represents the forecast error component in the unstable basis: so $\mathbf{E}\boldsymbol{\gamma}$ is the linear combination of the unstable directions that represents the forecast error component on the unstable subspace.
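The corresponding forecast error covariance matrix may be derived:

\begin{align}
\mathbf{P}^f &= \langle \boldsymbol{\eta}^f(\boldsymbol{\eta}^f)^T \rangle \tag{2.89}\\
&= \mathbf{E}\langle \boldsymbol{\gamma}\boldsymbol{\gamma}^T \rangle\mathbf{E}^T + \mathbf{E}\langle \boldsymbol{\gamma}\boldsymbol{\xi}^T \rangle + \langle \boldsymbol{\xi}\boldsymbol{\gamma}^T \rangle\mathbf{E}^T + \langle \boldsymbol{\xi}\boldsymbol{\xi}^T \rangle \tag{2.90}
\end{align}

If we set $\boldsymbol{\Gamma} = \langle \boldsymbol{\gamma}\boldsymbol{\gamma}^T \rangle$ and assume that the forecast error component in the complementary subspace is small, we can neglect in eq. 2.90 all terms containing $\boldsymbol{\xi}$, and $\mathbf{P}^f$ may be approximated as:

\[ \mathbf{P}^f \simeq \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T \tag{2.91} \]

The corresponding gain matrix is approximated as well:

\begin{align}
\mathbf{K} &= \mathbf{P}^f\mathbf{H}^T(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R})^{-1} \tag{2.92}\\
&\simeq \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T\mathbf{H}^T(\mathbf{H}\mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T\mathbf{H}^T + \mathbf{R})^{-1} \tag{2.93}
\end{align}

where $\mathbf{R}$ is the usual observational error covariance matrix and $\mathbf{H}$ is the Jacobian of the possibly nonlinear observation operator $\mathcal{H}$. The analysis vector expression reads:

\begin{align}
\mathbf{x}^a &= \mathbf{x}^f + \mathbf{K}\left[\mathbf{y}^o - \mathcal{H}(\mathbf{x}^f)\right] \tag{2.94}\\
&\simeq \mathbf{x}^f + \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T\mathbf{H}^T(\mathbf{H}\mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T\mathbf{H}^T + \mathbf{R})^{-1}\left[\mathbf{y}^o - \mathcal{H}(\mathbf{x}^f)\right] \tag{2.95}
\end{align}

The approximate equation 2.91, written by neglecting the component of $\mathbf{P}^f$ outside the unstable subspace spanned by the basis vectors stored in the matrix $\mathbf{E}$, results in a gain matrix $\mathbf{K}$ computed in a subspace (eq. 2.93). The resulting analysis of eq. 2.95 reduces the error component in that subspace [35]. If $\mathbf{M}$ is the tangent linear propagator (from time $t_k$ to time $t_{k+1}$, [3]) we can write:

\[ \mathbf{M}\mathbf{E}_k = \mathbf{E}_{k+1}\boldsymbol{\Lambda}_k \]

where $\boldsymbol{\Lambda}_k$ is the matrix whose diagonal elements are the amplification factors $\exp\left(\int_{t_k}^{t_{k+1}} \lambda_i(t)\,dt\right)$, where $\lambda_i$ are the local Lyapunov exponents, in decreasing order, corresponding to the $i$-th column vector of $\mathbf{E}_k$. The forecast error will then evolve according to:

\[ \boldsymbol{\eta}^f_{k+1} = \mathbf{E}_{k+1}\boldsymbol{\Lambda}_k\mathbf{E}_k^{-1}\boldsymbol{\eta}^a_k \tag{2.96} \]

#### 2.3.2 AUS: a simple example

For example, if the state vector has dimension 3, the number of observations at each assimilation step is 2 and the number of unstable directions is 2, the $3 \times 2$ matrix $\mathbf{E}$ is:

\[ \mathbf{E} = \begin{pmatrix} e_{11} & e_{12}\\ e_{21} & e_{22}\\ e_{31} & e_{32} \end{pmatrix} \tag{2.97} \]

where $\mathbf{e}_1 = [e_{11}\ e_{21}\ e_{31}]^T$ and $\mathbf{e}_2 = [e_{12}\ e_{22}\ e_{32}]^T$ are the 2 unstable directions.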
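The $2 \times 1$ column vector $\boldsymbol{\gamma}$ is

\[ \boldsymbol{\gamma} = \begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix} \tag{2.98} \]

so the $3 \times 1$ product $\mathbf{E}\boldsymbol{\gamma}$ is

\begin{align}
\mathbf{E}\boldsymbol{\gamma} &= \begin{pmatrix} e_{11} & e_{12}\\ e_{21} & e_{22}\\ e_{31} & e_{32} \end{pmatrix}\begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix} \tag{2.99}\\
&= \begin{pmatrix} e_{11}\gamma_1 + e_{12}\gamma_2\\ e_{21}\gamma_1 + e_{22}\gamma_2\\ e_{31}\gamma_1 + e_{32}\gamma_2 \end{pmatrix} \tag{2.100}
\end{align}

The forecast error covariance matrix is $3 \times 3$:

\begin{align}
\mathbf{P}^f &\simeq \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T \tag{2.101}\\
&\simeq \begin{pmatrix} e_{11} & e_{12}\\ e_{21} & e_{22}\\ e_{31} & e_{32} \end{pmatrix}\begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix}\begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix}\begin{pmatrix} e_{11} & e_{21} & e_{31}\\ e_{12} & e_{22} & e_{32} \end{pmatrix} \tag{2.102}
\end{align}

where $\boldsymbol{\Gamma}$ is the $2 \times 2$ matrix

\begin{align}
\boldsymbol{\Gamma} &= \begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix}\begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix} \tag{2.103}\\
&= \begin{pmatrix} \gamma_1\gamma_1 & \gamma_1\gamma_2\\ \gamma_1\gamma_2 & \gamma_2\gamma_2 \end{pmatrix} \tag{2.104}
\end{align}

As we already mentioned in subsection 1.6.5 for a similar case, the observation error covariance $\mathbf{R}$ is $2 \times 2$, the observation operator $\mathbf{H}$ is $2 \times 3$, and the gain matrix $\mathbf{K}$ is $3 \times 2$.

#### 2.3.3 Refresh procedure

After the analysis, part of the information in the bred vectors, used to estimate the unstable structures of the system, is no longer available: that is why a "refresh" procedure may improve our ability to capture the system's instabilities. "Refresh" means that a new random perturbation is introduced in place of the bred vectors already used in the assimilation; these in turn may be simply discarded or "recycled" by adding them to the other vectors. Which strategy is better depends on the complexity of the system, and an "in between" approach may also be implemented [35].

#### 2.3.4 Using a single bred vector for assimilation

When we use a single bred vector to estimate the unstable 1-dimensional subspace, the matrix $\mathbf{E}$ reduces to a single column vector $\mathbf{e}$, the matrix $\boldsymbol{\Gamma}$ becomes the scalar $\gamma^2$ and the expressions discussed in subsection 2.3.1 for the forecast error, the forecast error covariance matrix and the gain matrix reduce to:

\begin{align}
\boldsymbol{\eta}^f &= \mathbf{e}\gamma + \boldsymbol{\xi} \tag{2.105}\\
\mathbf{P}^f &\simeq \gamma^2\,\mathbf{e}\,\mathbf{e}^T \tag{2.106}\\
\mathbf{K} &= \gamma^2\,\mathbf{e}\,\mathbf{e}^T\mathbf{H}^T(\gamma^2\,\mathbf{H}\,\mathbf{e}\,\mathbf{e}^T\mathbf{H}^T + \mathbf{R})^{-1} \tag{2.107}
\end{align}

#### 2.3.5 Adaptive observation strategy

The basic idea underlying adaptive, or targeted, observations is to take measurements where the unstable structures have the maximum amplitude. The same structures exploited to locate adaptive observations are used to estimate $\mathbf{P}^f$, $\mathbf{K}$ and the analysis state $\mathbf{x}^a$, through equations 2.91, 2.93 and 2.95 respectively.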
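As a numerical companion to the 3-variable example of subsection 2.3.2, the following NumPy sketch builds $\mathbf{P}^f \simeq \mathbf{E}\boldsymbol{\Gamma}\mathbf{E}^T$ (eq. 2.91) and the gain of eq. 2.93, and checks that the analysis increment stays in the subspace spanned by the columns of $\mathbf{E}$. The function name `aus_gain` and all numerical values are hypothetical and chosen only for illustration; $\boldsymbol{\Gamma}$ is formed from a single $\boldsymbol{\gamma}$ as in eqs. 2.103-2.104, without any averaging.

```python
import numpy as np

def aus_gain(E, Gamma, H, R):
    """Approximate forecast covariance (eq. 2.91) and gain matrix (eq. 2.93)."""
    P_f = E @ Gamma @ E.T                     # P^f ~ E Gamma E^T
    S = H @ P_f @ H.T + R                     # H E Gamma E^T H^T + R
    K = P_f @ H.T @ np.linalg.inv(S)
    return P_f, K

# Hypothetical unstable directions (columns of E), normalised to unit length.
E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E /= np.linalg.norm(E, axis=0)
gamma = np.array([0.5, -0.2])                 # illustrative error components
Gamma = np.outer(gamma, gamma)                # eqs. 2.103-2.104
H = np.array([[0.0, 1.0, 0.0],                # observe the 2nd and 3rd variables
              [0.0, 0.0, 1.0]])
R = 0.1 * np.eye(2)

P_f, K = aus_gain(E, Gamma, H, R)
d = np.array([0.3, -0.1])                     # illustrative innovation y^o - H x^f
dx = K @ d                                    # analysis increment
coeffs = np.linalg.lstsq(E, dx, rcond=None)[0]
print(P_f.shape, K.shape)                     # shapes 3x3 and 3x2, as stated in the text
print(np.allclose(E @ coeffs, dx))            # increment lies in span(E)
```

Because $\mathbf{K}$ has $\mathbf{E}$ as its leftmost factor, every increment $\mathbf{K}\mathbf{d}$ is a linear combination of the estimated unstable directions, which is the confinement property described in section 2.3.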
This approach has already been tested in previous works by Trevisan, Uboldi and Carrassi and proved to be highly efficient in realistic contexts [33, 34, 3].

### 2.4 BDAS: Breeding on the Data Assimilation System

Breeding on the Data Assimilation System (BDAS) is a method devised to estimate the unstable structures of the data assimilation system, which can be thought of as a system forced by observations. It is a modified formulation of breeding: the basic idea of BDAS is to breed initially random perturbations of the analysis and to subject them to the same dynamics as the analysis-forecast solution, including the assimilation of observations whenever they are available. The perturbations need to be evolved for a sufficiently long time to give a reliable estimate of the instabilities: the corresponding time is the breeding time.

#### 2.4.1 Standard breeding method

The breeding method is a nonlinear, finite-amplitude generalization of the algorithm used to compute the leading Lyapunov vector: as we already mentioned in subsection 1.3.9, bred vectors are indeed closely related to Local Lyapunov Vectors. The basic idea is that a small perturbation of the state, if its amplitude is periodically scaled down so that it is kept small, will evolve into a linear combination of unstable directions [35]. These directions are estimated by integrating the nonlinear model and by using one or more perturbed states. In realistic geophysical models the normalization amplitude, the length of the breeding time and the frequency of the normalization procedure may be tuned to filter out unwanted instabilities, such as convection [16].

In practice, standard breeding works as follows. Given a dynamical system in the form of a flow, a breeding cycle is started by adding a random initial perturbation with a fixed initial amplitude, which is introduced only once, at the beginning of the procedure. The same nonlinear model is integrated from the unperturbed ("control") and from the perturbed initial conditions.
At regular time intervals the control forecast is subtracted from the perturbed forecast, and the resulting difference is scaled down to the initial amplitude. Then it is added to the corresponding new analysis or model state [16]. The forecast state can be computed by

\[ \mathbf{x}^f_{k+1} = \mathcal{M}(\mathbf{x}^a_k) \tag{2.108} \]

and the perturbation dynamics will be described by

\[ \delta\mathbf{x}^f_{k+1} = \mathbf{M}\,\delta\mathbf{x}^a_k \tag{2.109} \]

where $\mathcal{M}$ is the model and $\mathbf{M} = \mathbf{M}(\mathbf{x}(t))$ its Jacobian evaluated along the forecast trajectory $\mathbf{x}(t)$, $\forall t \in\, ]t_k, t_{k+1}]$.

#### 2.4.2 BDAS: how it works

Since BDAS is a particular implementation of the breeding method, it shares its basic principles, i.e. the evolution, by means of the nonlinear model, of a random initial perturbation added to the control trajectory. In the analysis step we have observational data to assimilate, and the evolved random initial perturbation will undergo the assimilation procedure. Whenever observations become available, the analysis state is computed by

\[ \mathbf{x}^a_{k+1} = [\mathbf{I} - \mathbf{K}\mathcal{H}]\,\mathbf{x}^f_{k+1} + \mathbf{K}\mathbf{y}^o_{k+1} \tag{2.110} \]

or even, using eq. 2.108:

\[ \mathbf{x}^a_{k+1} = [\mathbf{I} - \mathbf{K}\mathcal{H}]\,\mathcal{M}(\mathbf{x}^a_k) + \mathbf{K}\mathbf{y}^o_{k+1} \tag{2.111} \]

where $\mathbf{K}$ is the gain matrix and $\mathcal{H}$ is the (possibly nonlinear) observation operator. The expression 2.110 is the same in Kalman Filters and 3D-Var schemes. The perturbation equation for the system undergoing observational forcing reads:

\begin{align}
\delta\mathbf{x}^a_{k+1} &= [\mathbf{I} - \mathbf{K}\mathbf{H}]\,\delta\mathbf{x}^f_{k+1} \tag{2.112}\\
&= [\mathbf{I} - \mathbf{K}\mathbf{H}]\,\mathbf{M}\,\delta\mathbf{x}^a_k \tag{2.113}
\end{align}

where we used the linearized observation operator $\mathbf{H}$, i.e. the Jacobian of the observation operator $\mathcal{H}$. Breeding on the Data Assimilation System is based on eq. 2.113 rather than on eq. 2.109; in eq. 2.113 the matrix operator $[\mathbf{I} - \mathbf{K}\mathbf{H}]$ has a general stabilizing effect: the assimilation reduces the amplifying components of the error. So, basically: in an assimilation system where observations are available only once in a while, the free system instabilities dominate the error growth during the forecast time, while in the analysis step the assimilation of observations will in general reduce some fast-growing components of the error.
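The standard breeding cycle of subsection 2.4.1 can be sketched for the Lorenz 1963 model as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ are the classical ones, while the fourth-order Runge-Kutta scheme, the time step, the rescaling interval and the function names are choices made here for illustration. In BDAS the rescaled difference would additionally be multiplied by $[\mathbf{I} - \mathbf{K}\mathbf{H}]$ at each analysis time, as in eq. 2.113.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 1963 equations (classical parameters assumed)."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step of the nonlinear model."""
    k1 = lorenz63(x)
    k2 = lorenz63(x + 0.5 * dt * k1)
    k3 = lorenz63(x + 0.5 * dt * k2)
    k4 = lorenz63(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(x, n_steps, dt):
    for _ in range(n_steps):
        x = rk4_step(x, dt)
    return x

def breed(x0, amplitude=1e-3, n_cycles=50, steps_per_cycle=8, dt=0.01, seed=0):
    """Standard breeding: evolve control and perturbed runs, then rescale the difference."""
    rng = np.random.default_rng(seed)
    control = np.array(x0, dtype=float)
    pert = rng.standard_normal(3)
    pert *= amplitude / np.linalg.norm(pert)      # random perturbation, introduced once
    growth = []
    for _ in range(n_cycles):
        perturbed = integrate(control + pert, steps_per_cycle, dt)
        control = integrate(control, steps_per_cycle, dt)
        diff = perturbed - control                # perturbed minus control forecast
        growth.append(np.linalg.norm(diff) / amplitude)
        pert = diff * (amplitude / np.linalg.norm(diff))   # scale back to initial amplitude
    return control, pert, growth

control, bred_vector, growth = breed([1.0, 1.0, 1.0])
print(bred_vector / np.linalg.norm(bred_vector))  # estimated unstable direction
```

The list `growth` stores the amplification factor of each cycle, which gives a rough indication of how unstable the flow is along the control trajectory.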
Another important issue is the breeding time, i.e. the time needed for the perturbations to capture the most unstable structures. It cannot be infinite, but it should be long enough to provide a meaningful (set of) bred vector(s). Typically, the breeding time $\Delta t$ is longer than the assimilation window $\tau$, and it is often set to a multiple of $\tau$:

\[ \Delta t = n\tau \tag{2.114} \]

#### 2.4.3 BDAS, an example of practical implementation

To focus on a specific example, let us suppose that we deal with a simple low-dimensional system for which the breeding time is $\Delta t = 2\tau$, where $\tau = t_{k+1} - t_k$ is the fixed assimilation window. We assume that:

• the unstable subspace is estimated by using a single bred vector
• we can discard the bred vector after use

Table 2.1: Breeding on the Data Assimilation System: introducing the perturbations and estimating the unstable subspace. This is a specific example in which the breeding time is $\Delta t = 2\tau$, where $\tau = t_{k+1} - t_k$ is the assimilation window.

| Time | Introduced | Evolving | Used to assimilate and then discarded | Undergoing assimilation |
|---|---|---|---|---|
| $t_{k-1}$ | $\delta\mathbf{x}(t_{k-1})$ | - | evolved $\delta\mathbf{x}(t_{k-3})$ | evolved $\delta\mathbf{x}(t_{k-2})$ |
| $t_{k-1} < t \le t_k$ | - | $\delta\mathbf{x}(t_{k-2})$ & $\delta\mathbf{x}(t_{k-1})$ | - | - |
| $t_k$ | $\delta\mathbf{x}(t_k)$ | - | evolved $\delta\mathbf{x}(t_{k-2})$ | evolved $\delta\mathbf{x}(t_{k-1})$ |
| $t_k < t \le t_{k+1}$ | - | $\delta\mathbf{x}(t_{k-1})$ & $\delta\mathbf{x}(t_k)$ | - | - |
| $t_{k+1}$ | $\delta\mathbf{x}(t_{k+1})$ | - | evolved $\delta\mathbf{x}(t_{k-1})$ | evolved $\delta\mathbf{x}(t_k)$ |
| $t_{k+1} < t \le t_{k+2}$ | - | $\delta\mathbf{x}(t_k)$ & $\delta\mathbf{x}(t_{k+1})$ | - | - |
| $t_{k+2}$ | $\delta\mathbf{x}(t_{k+2})$ | - | evolved $\delta\mathbf{x}(t_k)$ | evolved $\delta\mathbf{x}(t_{k+1})$ |

If $t_k$ is the previous assimilation step, we use the general equation 1.34, that is:

\[ \delta\mathbf{x}(t) = \frac{\partial \mathcal{M}}{\partial \mathbf{x}}\,\delta\mathbf{x}(t_k) = \mathbf{M}\,\delta\mathbf{x}(t_k), \qquad t \in [t_k, t_{k+1}] \tag{2.115} \]

to estimate the evolution of the perturbation $\delta\mathbf{x}(t)$, for $t$ spanning from the previous assimilation step $t_k$ up to the new one $t_{k+1}$. At the new assimilation step $t_{k+1}$ we use the evolved perturbation $\delta\mathbf{x}(t_{k-1})$ to estimate the unstable direction of the system, after which it is discarded. The already grown perturbation $\delta\mathbf{x}(t_k)$ will undergo the same assimilation process used to evaluate the new analysis state.
Furthermore, a new random perturbation $\delta\mathbf{x}(t_{k+1})$ is introduced (refresh, see subsection 2.3.3), to be used at assimilation step $t_{k+3}$. To summarize: in the particular example at hand, with a breeding time $\Delta t = 2\tau$, we recursively follow this procedure (see Table 2.1):

1. At each new assimilation step $t_{k+1}$:
   (a) We exploit the evolved perturbation $\delta\mathbf{x}(t_{k-1})$, previously introduced at time $t_{k-1}$ and assimilated at time $t_k$, to estimate the dominant part of the forecast error covariance matrix $\mathbf{P}^f_{k+1}$
   (b) After use, the evolved perturbation $\delta\mathbf{x}(t_{k-1})$ is discarded
   (c) At the same time $t_{k+1}$, through the same assimilation scheme, we both calculate the analysis state of the system $\mathbf{x}^a_{k+1}$ and assimilate the perturbation $\delta\mathbf{x}(t_k)$, introduced at the previous assimilation time $t_k$ and evolved for all $t \in\, ]t_k, t_{k+1}]$
   (d) Furthermore, also at time $t_{k+1}$, we randomly perturb the analysis state $\mathbf{x}^a_{k+1}$ with a new, small vector $\delta\mathbf{x}(t_{k+1})$; this is the refresh
2. For all $t \in\, ]t_{k+1}, t_{k+2}]$:
   (a) We evolve this new perturbation $\delta\mathbf{x}(t_{k+1})$ with the same dynamics as the system, that is $\mathbf{M} = \mathbf{M}(\mathbf{x}^a_{k+1})$, with the TLM operator $\mathbf{M}$ evaluated along the forecast trajectory $\mathbf{x}(t)$
   (b) We also evolve $\delta\mathbf{x}(t_k)$, which was introduced at time $t_k$ and which underwent evolution for $t \in\, ]t_k, t_{k+1}]$ as well as an assimilation procedure at the assimilation step $t_{k+1}$ (see item 1.c)
   (c) We let the new perturbation $\delta\mathbf{x}(t_{k+1})$ grow for a suitable time before use, in this example for a breeding time $\Delta t = 2\tau$, so it will be used at assimilation time $t_{k+3}$. Since the time interval $\Delta t$ is greater than the assimilation window $\tau$, the evolved perturbation $\delta\mathbf{x}(t_{k+1})$ will undergo the assimilation process at time $t_{k+2}$. The assimilation will be done by using the perturbation $\delta\mathbf{x}(t_k)$, evolved and already assimilated at the previous assimilation step $t_{k+1}$
3. At assimilation step $t_{k+2}$ the cycle is repeated

Throughout this work we will use a single bred vector $\delta\mathbf{x}(t_k)$.
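The bookkeeping of this procedure can be made explicit with a small scheduling sketch that reproduces the rows of Table 2.1 for the assimilation times. The function `bdas_schedule` and its dictionary keys are hypothetical names introduced here; the integers are assimilation-step indices, so that `6` stands for a perturbation introduced at step 6. The generalization to a breeding time of $n$ windows through the `breeding_windows` argument is an assumption that follows $\Delta t = n\tau$ and is not spelled out in the text.

```python
def bdas_schedule(first_step, last_step, breeding_windows=2):
    """List, for each assimilation step, the role of every perturbation.

    Each perturbation is identified by the step at which it was introduced.
    """
    rows = []
    for k in range(first_step, last_step + 1):
        rows.append({
            "step": k,
            # new random perturbation added to the analysis (refresh)
            "introduced": k,
            # fully bred perturbation: estimates P^f, then is discarded
            "used_then_discarded": k - breeding_windows,
            # younger perturbations: pass through the same assimilation
            "assimilated": list(range(k - breeding_windows + 1, k)),
            # perturbations evolved until the next assimilation step
            "evolving_afterwards": list(range(k - breeding_windows + 1, k + 1)),
        })
    return rows

for row in bdas_schedule(5, 7):
    print(row)
```

With `breeding_windows=2`, the row for step 6 says that the perturbation introduced at step 4 is used and discarded, the one introduced at step 5 is assimilated, and a new one is introduced at step 6, which matches the $t_{k+1}$ row of Table 2.1.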
When it is ready, after a breeding time $\Delta t = 2\tau$, it is assumed to capture the most unstable structure of the forced system. It may be normalized or not, depending on the particular scheme adopted: if it is not normalized, a suitable initial length has to be chosen.

## Chapter 3 Assimilation in the Lorenz 63 model: comparison among different methods

In the context of the Lorenz 3-dimensional convective model (1963), we will run observation system simulation experiments in a perfect model setting and, in section 3.6, with some kind of model error as well. In section 3.2 we describe the algorithms used for different flavors of Extended Kalman Filters, while in section 3.3 we do the same for different AUS assimilation schemes with increasing capabilities. We then illustrate the results in section 3.4. In section 3.5 we show some examples of the different behavior of the EKF and AUS assimilation schemes in some specific circumstances. Finally, in section 3.6, we test our methods in the presence of model error.

### 3.1 Experimental setups

Since we are mainly interested in comparing different data assimilation schemes in critical circumstances, we choose hard setups for our experiments. Basically we will perform three kinds of experiments, concerning:

• Synchronization to the "truth"
• Noisy observations
• Model error

In the synchronization experiments we will qualitatively show, for a few typical cases, the capability to converge to the truth of the EKF, of Evensen's flavor of the EKF (see section 3.2), and of the best-performing AUS scheme (see section 3.3). In particular we will use a mix of perfect or quasi-perfect observations ($\sigma^2 = 0$, $\sigma^2 = 0.01$ or $\sigma^2 = 0.1$) with long or very long assimilation windows ($\tau = 0.25$ or $\tau = 0.6$). In these case studies we will observe only the $y$ variable, which is the most valuable one.
In the noisy observation experiments the goal is to show the average RMS analysis and forecast error for all the different DA schemes with a long assimilation window ($\tau = 0.25$) and noisy observations ($\sigma^2 = 2$). The results shown are the mean values over 100,000 assimilations, with 3 or 2 variables observed. When 2 observations are used, the two most valuable ones ($y$ and $z$) are used for the EKF schemes, while for the AUS schemes we will use two adaptive observations (see subsection 2.3.5). Similar trials with different noisy observations ($\sigma^2 = 1$ or $\sigma^2 = 0.1$) will be performed, with 3 observed variables.

In the model error experiments, we will first perform trials by adding a random error to the model equations at each integration step, then we will test the effects of systematic error by varying one model parameter. For these experiments the results shown are the mean values over 20,000 assimilations.

### 3.2 Extended Kalman Filters

When the EKF is applied to a strongly nonlinear system, such as Lorenz's three-variable model, filter divergence can occur. Different empirical techniques have been devised to overcome this difficulty: Evensen in 1997 added a term $\mathbf{Q}$ akin to the model error covariance term in the EKF [11]; Yang et al. in 2006 perturbed the analysis error covariance matrix and inflated the background error covariance matrix [36].

#### 3.2.1 EKF

The assimilation window is set to $\tau = 0.25$, which is quite a large value for this dynamical system: due to the limits of the linear approximation, in these conditions the EKF tends to perform poorly with respect to other assimilation schemes, and filter divergence often occurs. These experiments are based on the standard EKF equations 2.76-2.80, with the covariance matrix $\mathbf{Q}$ set to zero, so:

\begin{align}
\mathbf{x}^f_{k+1} &= \mathcal{M}(\mathbf{x}^a_k) \tag{3.1}\\
\mathbf{P}^f_{k+1} &= \mathbf{M}\mathbf{P}^a_k\mathbf{M}^T \tag{3.2}\\
\mathbf{K}_{k+1} &= \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \tag{3.3}\\
\mathbf{x}^a_{k+1} &= (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \tag{3.4}\\
\mathbf{P}^a_{k+1} &= (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{P}^f_{k+1} \tag{3.5}
\end{align}

Here the observation operator is the same as its Jacobian, $\mathcal{H} = \mathbf{H}$.
If we observe 3 variables at each assimilation step, $\mathbf{H}$ reduces to the $3 \times 3$ identity matrix:

\[ \mathbf{H}_{xyz} = \mathbf{I} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \tag{3.6} \]

If we observe 2 variables, they will be $y$ and $z$, the most useful ones. The operator $\mathbf{H}$ will be

\[ \mathbf{H}_{yz} = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \tag{3.7} \]

#### 3.2.2 Evensen's version of EKF

In Evensen [11], where the model is considered perfect, the standard EKF equations 2.76-2.80 are still used, but the covariance matrix $\mathbf{Q}$, an additive term akin to the model error covariance, is used as a correction to avoid filter divergence. The matrix $\mathbf{Q}$ has been estimated by optimization:

\[ \mathbf{Q} = \begin{pmatrix} 0.1491 & 0.1505 & 0.0007\\ 0.1505 & 0.9048 & 0.0014\\ 0.0007 & 0.0014 & 0.9180 \end{pmatrix} \tag{3.8} \]

#### 3.2.3 Yang's version of EKF

In Yang et al. (2006) the analysis error covariance is perturbed and the background error covariance is inflated [36]:

• Random noise: small random perturbations, uniformly distributed between 0 and 1 and multiplied by $\mu = 0.1$ (when 3 variables are observed) or $\mu = 0.2$ (when 2 variables are observed), are added to the diagonal of the analysis error covariance matrix obtained with the EKF at every assimilation step
• Inflation: the background error covariance is inflated by a factor of $1 + \delta = 1.1$, prior to the analysis step

Once the empirical parameters have been optimized for a specific assimilation interval and observation noise variance, these techniques provide satisfactory results, particularly for sufficiently short assimilation intervals.

### 3.3 Assimilation in the Unstable Subspace: further developments

In this section we apply different formulations of the AUS scheme, with increasing capabilities, to the L63 dynamical system. We use a single unstable vector of the analysis-forecast system, estimated by BDAS (see section 2.4).
So the recurrence equations are [3]:

\begin{align}
\mathbf{x}^f_{k+1} &= \mathcal{M}(\mathbf{x}^a_k) \tag{3.9}\\
\mathbf{P}^f_{k+1} &= \gamma^2\,\mathbf{e}_{k+1}\mathbf{e}^T_{k+1} \tag{3.10}\\
\mathbf{K}_{k+1} &= \gamma^2_{k+1}\,\mathbf{e}_{k+1}(\mathbf{H}\mathbf{e}_{k+1})^T\left[\gamma^2_{k+1}(\mathbf{H}\mathbf{e}_{k+1})(\mathbf{H}\mathbf{e}_{k+1})^T + \mathbf{R}\right]^{-1} \tag{3.11}\\
\mathbf{x}^a_{k+1} &= (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \tag{3.12}\\
\mathbf{P}^a_{k+1} &= (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{P}^f_{k+1} \tag{3.13}
\end{align}

where the symbols have the same meaning as in the EKF equations, except for:

• the normalized column vector $\mathbf{e}_{k+1}$, that is the single unstable unit vector, at assimilation step $t_{k+1}$, estimated by BDAS. The bulk of the forecast error covariance matrix $\mathbf{P}^f_{k+1}$ is computed in the unstable 1-dimensional subspace defined by the unit vector $\mathbf{e}_{k+1}$, and it is assumed to be small in the complementary subspace
• the amplitude of the forecast error $\gamma$

Note also that here the observation operator $\mathcal{H}$ is actually linear (we simply observe a variable or not) and it has therefore been replaced by its linearized version: $\mathcal{H} = \mathbf{H}$. The analysis covariance matrix $\mathbf{P}^a_{k+1}$ (eq. 3.13) is shown here for completeness, but it is not actually used in the estimate of the successive $\mathbf{P}^f_{k+2}$.

A key problem in AUS is the estimate of the amplitude of the forecast error $\gamma$. We begin in subsection 3.3.1 with a very basic AUS-$\gamma_0$ approach, where no use of observations is made in the estimate of $\gamma$; then in subsection 3.3.2 we introduce a new estimate of $\gamma$ from observations, which is used in subsection 3.3.3. A new formulation of AUS is then introduced in subsections 3.3.4 and 3.3.5.

#### 3.3.1 AUS-$\gamma_0$: no use of observations in the estimate of the forecast error amplitude

This DA scheme is a simplified formulation of the AUS approach. The main reason why we implement it is to show that we actually need a way to estimate the amplitude $\gamma$ of the forecast error in the unstable subspace from observations. In this assimilation scheme, instead, we simply optimize the initial amplitude $\alpha$ used in the re-normalization of the random perturbation.
By breeding on the data assimilation system (BDAS), we obtain the perturbation in the unstable 1-dimensional subspace, whose amplitude is $\gamma$ at the time it is used in the assimilation. During the forecast steps this perturbation will amplify, and its amplitude will be reduced at assimilation time. Let us suppose that the breeding time has been chosen equal to 2 assimilation windows: the unstable vector may need more than a single assimilation window to grow and to capture the instability of the flow. So, at each assimilation step $t_{k+1}$ (see Table 2.1):

• we use in the assimilation process the vector $\gamma\,\mathbf{e}_{k+1}$, i.e. the evolution, through the tangent linear model propagator, of the random perturbation $\alpha\,\delta\mathbf{x}_{k-1}$ introduced two assimilation steps earlier, which also underwent the assimilation process at the previous step $t_k$ through the $(\mathbf{I} - \mathbf{K}_k\mathbf{H})$ operator
• in the same way, we assimilate the observations in the trajectory obtained by adding $\alpha\,\delta\mathbf{x}_k$ to the control at the previous assimilation step. The evolved perturbation will be used at the next assimilation step, time $t_{k+2}$, to estimate $\mathbf{P}^f_{k+2}$
• we introduce a new perturbation $\alpha\,\delta\mathbf{x}_{k+1}$, where $\delta\mathbf{x}_{k+1}$ is a random vector whose components are Gaussian, with zero mean and standard deviation 1; its evolution will be used at assimilation step $t_{k+3}$, after having undergone the assimilation process at time $t_{k+2}$

The parameter $\alpha$ needs to be tuned. In the present application the optimized value is $\alpha = 1$.
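The estimate of the forecast error covariance matrix is

\[ \mathbf{P}^f_{k+1} = \gamma^2\,\mathbf{e}_{k+1}\mathbf{e}^T_{k+1} \tag{3.14} \]

The Kalman gain, in the same spirit as in the EKF, is

\[ \mathbf{K}_{k+1} = \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \tag{3.15} \]

It is used to estimate the analysis state, by correcting the forecast state, via

\[ \mathbf{x}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\,\mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \tag{3.16} \]

and the analysis covariance matrix

\[ \mathbf{P}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{P}^f_{k+1} \tag{3.17} \]

The unstable vector $\delta\mathbf{x}_k$, introduced at the previous assimilation step $t_k$, will be corrected as well:

\[ \delta\mathbf{x}^a_k = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\,\mathbf{M}\,\delta\mathbf{x}_{k-1} \tag{3.18} \]

#### 3.3.2 Estimate of the amplitude $\gamma$ of the forecast error from observations

An estimate of the amplitude $\gamma$ of the forecast error in a single unstable direction $\mathbf{e}$ can be obtained in the following way, by rearranging the definition of the forecast error and by assuming that the bulk of it is along the vector $\mathbf{e}$:

\begin{align}
\mathbf{H}\boldsymbol{\eta}^f = \mathbf{H}(\mathbf{x}^f - \mathbf{x}^t) &= -\mathbf{d} + \boldsymbol{\eta}^o \tag{3.19}\\
\mathbf{x}^f - \mathbf{x}^t &= \gamma\,\mathbf{e} + \delta\,\mathbf{f} \tag{3.20}\\
\mathbf{H}(\mathbf{x}^f - \mathbf{x}^t) &= \gamma\,\mathbf{H}\mathbf{e} + \delta\,\mathbf{H}\mathbf{f} \tag{3.21}
\end{align}

where we dropped the subscripts $k+1$ referring to the assimilation time and where

• $\mathbf{d} = \mathbf{y}^o - \mathbf{H}\mathbf{x}^f$ is the innovation
• $\boldsymbol{\eta}^o = \mathbf{y}^o - \mathbf{H}\mathbf{x}^t$ is the observational error
• $\boldsymbol{\eta}^f = \mathbf{x}^f - \mathbf{x}^t$ is the forecast error

The vector $\mathbf{f}$ spans the subspace complementary to the unstable subspace. So:

\[ \gamma\,\mathbf{H}\mathbf{e} + \delta\,\mathbf{H}\mathbf{f} = -\mathbf{d} + \boldsymbol{\eta}^o \tag{3.22} \]

Now, if we left-multiply by $\mathbf{e}^T\mathbf{H}^T$:

\[ \gamma\,\mathbf{e}^T\mathbf{H}^T\mathbf{H}\mathbf{e} + \delta\,\mathbf{e}^T\mathbf{H}^T\mathbf{H}\mathbf{f} = -\mathbf{e}^T\mathbf{H}^T\mathbf{d} + \mathbf{e}^T\mathbf{H}^T\boldsymbol{\eta}^o \tag{3.23} \]

Neglecting the terms $\delta\,\mathbf{e}^T\mathbf{H}^T\mathbf{H}\mathbf{f}$ and $\mathbf{e}^T\mathbf{H}^T\boldsymbol{\eta}^o$ (which is zero, on average), we obtain the following estimate of $\gamma$:

\[ \gamma \simeq -(\mathbf{e}^T\mathbf{H}^T\mathbf{H}\mathbf{e})^{-1}\,\mathbf{e}^T\mathbf{H}^T\mathbf{d} \tag{3.24} \]

Our estimate of $\mathbf{P}^f$ is the same as in eq. 2.106, but with this new estimate of $\gamma$:

\[ \mathbf{P}^f \simeq \gamma^2\,\mathbf{e}\,\mathbf{e}^T \tag{3.25} \]

This new estimate of $\gamma$ also affects the corresponding gain matrix (see eq. 2.107):

\[ \mathbf{K} \simeq \gamma^2\,\mathbf{e}\,\mathbf{e}^T\mathbf{H}^T(\gamma^2\,\mathbf{H}\,\mathbf{e}\,\mathbf{e}^T\mathbf{H}^T + \mathbf{R})^{-1} \tag{3.26} \]

Accordingly, the analysis increment of eq. 2.95 becomes:

\begin{align}
\delta\mathbf{x}^a = \mathbf{x}^a - \mathbf{x}^f &= \mathbf{K}\,(\mathbf{y}^o - \mathbf{H}\mathbf{x}^f) \tag{3.27}\\
&= \gamma^2\,\mathbf{e}\mathbf{e}^T\mathbf{H}^T(\gamma^2\,\mathbf{H}\mathbf{e}\mathbf{e}^T\mathbf{H}^T + \mathbf{R})^{-1}(\mathbf{y}^o - \mathbf{H}\mathbf{x}^f) \tag{3.28}\\
&= \mathbf{e}\,c \tag{3.29}
\end{align}

In equation 3.29, $c$ is a scalar coefficient.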
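The estimate of eq. 3.24 and the increment of eqs. 3.26-3.29 can be checked on a synthetic case in which the forecast error lies exactly along $\mathbf{e}$ and the observations are perfect, so that $\mathbf{d} = -\gamma\,\mathbf{H}\mathbf{e}$. The sketch below is an illustration only: the function names `estimate_gamma` and `aus_increment` and all numbers are assumptions made here.

```python
import numpy as np

def estimate_gamma(e, H, d):
    """Amplitude of the forecast error along e, eq. 3.24."""
    He = H @ e
    return -(He @ d) / (He @ He)

def aus_increment(e, gamma, H, R, d):
    """Gain of eq. 3.26, analysis increment of eq. 3.27 and scalar c of eq. 3.29."""
    He = H @ e
    S = gamma**2 * np.outer(He, He) + R           # gamma^2 H e e^T H^T + R
    K = gamma**2 * np.outer(e, He) @ np.linalg.inv(S)
    dx = K @ d                                    # increment, parallel to e
    c = e @ dx                                    # valid because e has unit norm
    return K, dx, c

# Synthetic test case (illustrative values).
e = np.array([1.0, 2.0, 2.0]) / 3.0               # unit vector
H = np.array([[0.0, 1.0, 0.0],                    # observe y and z, as in eq. 3.7
              [0.0, 0.0, 1.0]])
true_gamma = 1.5
d = -true_gamma * (H @ e)                         # innovation for x^f - x^t = gamma e, no obs error

gamma = estimate_gamma(e, H, d)
K, dx, c = aus_increment(e, gamma, H, 0.1 * np.eye(2), d)
print(gamma)                                      # recovers the true amplitude
print(np.allclose(dx, c * e))                     # increment is a multiple of e
```

In this idealized case $c$ is negative and slightly smaller in magnitude than $\gamma$, so the analysis removes most, but not all, of the error along $\mathbf{e}$; the fraction removed grows as $\mathbf{R}$ becomes small compared with $\gamma^2\,\mathbf{H}\mathbf{e}\mathbf{e}^T\mathbf{H}^T$.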
When the assimilation interval is sufficiently long with respect to the typical doubling time and the forecast error becomes large with respect to the observation error, the estimate of $\gamma$ from observations leads to a significant improvement of the assimilation performance.

#### 3.3.3 AUS-$\gamma$: using the estimate of $\gamma$ from observations

In this scheme we use the same approach as in AUS-$\gamma_0$, but with three important improvements:

• The unstable direction $\mathbf{e}$ (normalized to 1), which is again estimated via BDAS, is obtained by evolving an infinitesimal perturbation $\delta\mathbf{x}_{k-1}$, so that we can expect the perturbed trajectory to still be on the attractor (or at least not far from it)
• An estimate of the amplitude of the forecast error $\gamma$ from observations, as described in subsection 3.3.2; the amplitude of the forecast error is intended to be in the unstable direction $\mathbf{e}$
• A new estimate of $\mathbf{P}^f_{k+1}$

The second and third improvements have been discussed in subsection 3.3.2: after estimating the amplitude of the forecast error $\gamma$ from observations, the forecast error covariance matrix will be

\[ \mathbf{P}^f_{k+1} = \gamma^2\,\mathbf{e}_{k+1}\mathbf{e}^T_{k+1} \tag{3.30} \]

The gain matrix will be computed with

\[ \mathbf{K}_{k+1} = \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \tag{3.31} \]

which is formally the same as eq. 3.15, but the better estimate of $\mathbf{P}^f_{k+1}$ will in turn provide the following better approximate expression, already mentioned in eq. 3.26:

\[ \mathbf{K}_{k+1} = \gamma^2\,\mathbf{e}_{k+1}\mathbf{e}^T_{k+1}\mathbf{H}^T(\gamma^2\,\mathbf{H}\,\mathbf{e}_{k+1}\mathbf{e}^T_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \]

We are now able to compute the analysis state $\mathbf{x}^a_{k+1}$ and the analysis covariance matrix $\mathbf{P}^a_{k+1}$ through equations 3.16 and 3.17, as usual. It will be shown in section 3.4 that this algorithm greatly enhances the performance of the AUS scheme.

#### 3.3.4 Iterating

This assimilation scheme is a further improvement to AUS-$\gamma$. After calculating the coefficient $c$ at time $t_{k+1}$ (eq. 3.29), we apply to the previous analysis state $\mathbf{x}^a_k$ the perturbation $\Delta\mathbf{x}^a_k = c\,\exp\left(-\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\mathbf{e}_k$ that, if the error behaved linearly, would lead exactly to the analysis $\mathbf{x}^a_{k+1}$ obtained from the analysis increment expression (3.27-3.29). In fact, from time $t_k$ to $t_{k+1}$, we have:

\[ \mathbf{M}\mathbf{e}_k = \exp\left(\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\mathbf{e}_{k+1} \tag{3.32} \]

where $\lambda(t)$ is the leading local Lyapunov exponent. Therefore:

\begin{align}
\mathbf{M}\Delta\mathbf{x}^a_k &= c\,\exp\left(-\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\mathbf{M}\mathbf{e}_k \tag{3.33}\\
&= c\,\exp\left(-\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\exp\left(\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\mathbf{e}_{k+1} \tag{3.34}\\
&= c\,\mathbf{e}_{k+1} \tag{3.35}
\end{align}

We now integrate the system nonlinearly from this new estimate of the previous analysis state $\mathbf{x}^a_k$, evolve the perturbation $\mathbf{e}_k$ following the updated nonlinear trajectory from $t_k$ to $t_{k+1}$ and perform the final analysis at time $t_{k+1}$. This procedure is hereafter referred to as AUS-iterating: it is a major improvement with respect to the AUS-$\gamma$ scheme.

#### 3.3.5 Iterating and using a quasi-static $\mathbf{P}^f$ in stable zones of the attractor

This technique is a refinement of AUS-iterating. If the dynamical system is passing through a zone of the attractor where the trajectories are stable, i.e. where they do not diverge, we can assume that errors do not actually grow. So, in these stable situations (where the amplification factor during the forecast step is $\le 1$) we use a quasi-static, diagonal $\mathbf{P}^f$. If:

\[ \|\mathbf{M}\,\delta\mathbf{x}_k\| < \|\delta\mathbf{x}_k\| \tag{3.36} \]

then we set:

\[ \mathbf{P}^f_{k+1} = a^2\left(\frac{\|\mathbf{M}\,\delta\mathbf{x}_k\|}{\|\delta\mathbf{x}_k\|}\right)^2\mathbf{I} \tag{3.37} \]

where $\mathbf{M}\,\delta\mathbf{x}_k$ is the evolution from time $t_k$ to $t_{k+1}$ of the perturbation $\delta\mathbf{x}_k$, which in turn is the evolution of a random perturbation, introduced at time $t_{k-1}$, which has grown from time $t_{k-1}$ to $t_k$ and undergone an assimilation process at time $t_k$. The coefficient $a^2$ is proportional to the square of the average analysis error. In particular, it has been chosen as

\[ a^2 = \frac{1}{2}\langle \eta^a \rangle^2 \tag{3.38} \]

where $\eta^a$ is the analysis error. This procedure is hereafter referred to as AUS-iterating+: it provides the best performance.
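Two ingredients of subsections 3.3.4 and 3.3.5 lend themselves to a short numerical sketch: the back-propagated perturbation $\Delta\mathbf{x}^a_k$, and the switch between the rank-one $\mathbf{P}^f$ of eq. 3.30 and the quasi-static diagonal $\mathbf{P}^f$ of eq. 3.37. The code is only illustrative: the function names, the argument `mean_analysis_error` (standing for $\langle \eta^a \rangle$) and the use of $\|\mathbf{M}\mathbf{e}_k\|$ as the amplification factor $\exp\int\lambda\,dt$ (which follows from eq. 3.32 when $\mathbf{e}_k$ and $\mathbf{e}_{k+1}$ are unit vectors) are assumptions made here.

```python
import numpy as np

def back_propagated_perturbation(c, e_k, M):
    """Perturbation of the previous analysis that maps onto c * e_{k+1} (eqs. 3.33-3.35)."""
    amplification = np.linalg.norm(M @ e_k)       # exp(integral of lambda), from eq. 3.32
    return c * e_k / amplification

def forecast_covariance(e_next, gamma, dx_prev, dx_evolved, mean_analysis_error):
    """Choose between the rank-one P^f (eq. 3.30) and the quasi-static one (eq. 3.37)."""
    amplification = np.linalg.norm(dx_evolved) / np.linalg.norm(dx_prev)
    if amplification < 1.0:                       # stable zone, eq. 3.36
        a2 = 0.5 * mean_analysis_error**2         # eq. 3.38
        return a2 * amplification**2 * np.eye(len(e_next))
    return gamma**2 * np.outer(e_next, e_next)    # unstable zone, eq. 3.30

# Illustrative numbers: a diagonal propagator whose first direction is unstable.
M = np.diag([2.0, 0.5, 0.1])
e_k = np.array([1.0, 0.0, 0.0])
dxa = back_propagated_perturbation(0.7, e_k, M)
print(M @ dxa)                                    # equals 0.7 * e_{k+1}

P_unstable = forecast_covariance(e_k, 2.0, np.array([1.0, 0.0, 0.0]),
                                 np.array([3.0, 0.0, 0.0]), 0.4)
P_stable = forecast_covariance(e_k, 2.0, np.array([2.0, 0.0, 0.0]),
                               np.array([1.0, 0.0, 0.0]), 0.4)
print(np.linalg.matrix_rank(P_unstable), np.linalg.matrix_rank(P_stable))
```

In the stable branch the covariance is full rank, so the analysis can correct all observed components, whereas in the unstable branch the correction remains confined to the direction $\mathbf{e}_{k+1}$.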
### 3.4 Comparing results

In this section we compare the performance of the different assimilation schemes under investigation. In the following remarks we use the acronyms of Table 3.1. We begin with some case studies of synchronization to the truth with a perfect or quasi-perfect observation (the variable $y$) and a long assimilation window; then we deal with shorter assimilation windows and noisy observations in different configurations.

Table 3.1: The different data assimilation schemes under investigation with their main features.

| Scheme | Main features |
|---|---|
| EKF | pure EKF |
| EKF-Evensen | Evensen's EKF, with an optimized $\mathbf{Q}$ [11] |
| EKF-Yang | Yang's EKF, with random noise in $\mathbf{P}^a$ and inflation [36] |
| AUS-$\gamma_0$ | AUS with no use of observations for the estimate of the amplitude of the forecast error $\gamma$; $\mathbf{P}^f = \gamma^2\mathbf{e}\mathbf{e}^T$ |
| AUS-$\gamma$ | AUS with $\gamma = -(\mathbf{e}^T\mathbf{H}^T\mathbf{H}\mathbf{e})^{-1}\mathbf{e}^T\mathbf{H}^T\mathbf{d}$, $\mathbf{P}^f = \gamma^2\mathbf{e}\mathbf{e}^T$ |
| AUS-iterating | AUS-$\gamma$ with iterations |
| AUS-iterating+ | AUS-iterating with static $\mathbf{P}^f_{k+1}$ if amplification factor $\le 1$ during forecast step |

#### 3.4.1 Synchronization: one perfect or quasi-perfect observation, case studies

In this subsection we present a few case studies about synchronization with the truth for perfect or quasi-perfect observations, for different assimilation windows. The case studies shown have been chosen for the clarity of the resulting plots and because they are neither particularly advantageous nor pathological for any assimilation scheme. We compare only the EKF, EKF-Evensen and AUS-iterating assimilation schemes, which do not need any tuning of parameters. We will see that, in the cases where all of them eventually converge to the truth, AUS-iterating converges faster than EKF or EKF-Evensen. Furthermore, in critical situations, in terms of assimilation window length or observational error amplitude, AUS-iterating still converges to the truth in cases where the EKF or EKF-Evensen do not.
It is known that, even with a single perfect observation, the EKF may synchronize with the truth if the assimilation window is not too long. This is shown in Fig. 3.1, where we observe y with no observational error (σ² = 0) and assimilation window τ = 0.25. Note that the AUS-iterating DA scheme converges much faster. The pure EKF and the EKF-Evensen version give the same results for σ² = 0, since the gain matrix and the analysis vector reduce in both cases to the same simpler expression.

[Figure 3.1: Synchronization with truth for both EKF/EKF-Evensen and AUS-iterating assimilation schemes (RMS error of analysis minus truth against time). The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.25.]

From the Kalman gain expression (eq. 2.78):
\[ \mathbf{K}_{k+1} = \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R})^{-1} \]
if the matrix $\mathbf{R} = 0$, in observation space:
\[ \mathbf{H}\mathbf{K}_{k+1} = \mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T)^{-1} \tag{3.39} \]
\[ = \mathbf{I} \tag{3.40} \]
Thus, from the analysis state expression (eq. 2.79):
\[ \mathbf{x}^a_{k+1} = (\mathbf{I} - \mathbf{K}_{k+1}\mathbf{H})\mathbf{x}^f_{k+1} + \mathbf{K}_{k+1}\mathbf{y}^o_{k+1} \]
we get:
\[ \mathbf{H}\mathbf{x}^a_{k+1} = \mathbf{y}^o_{k+1} \tag{3.41} \]

As an example, Figures 3.2-3.5 show the solution for x and the assimilation performance for σ² = 0 and assimilation window τ = 0.6. The vertical dotted line marks the time of the last observation. Figures 3.6-3.10 show the same for σ² = 0.01 and assimilation window τ = 0.6, while Figures 3.11-3.15 show the case with σ² = 0.1 and assimilation window τ = 0.25.

[Figure 3.2: EKF and EKF-Evensen (same plot in these conditions, see text and eq. 3.40) find it hard to synchronize with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.]
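The R = 0 argument of eqs. 3.39-3.41 is easy to check numerically. The following sketch uses an arbitrary symmetric positive definite forecast covariance and observes only the second variable (y); the helper name `kalman_analysis` and all numerical values are illustrative assumptions, not taken from the thesis experiments.

```python
import numpy as np

def kalman_analysis(x_f, P_f, H, R, y_o):
    """Standard Kalman gain (eq. 2.78) and analysis state (eq. 2.79)."""
    S = H @ P_f @ H.T + R                     # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)
    x_a = x_f + K @ (y_o - H @ x_f)           # same as (I - K H) x_f + K y_o
    return x_a, K

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P_f = A @ A.T + np.eye(3)                     # arbitrary SPD forecast covariance
H = np.array([[0.0, 1.0, 0.0]])               # observe y only
R = np.zeros((1, 1))                          # perfect observation
x_f = np.array([1.0, 2.0, 3.0])
y_o = np.array([-4.0])

x_a, K = kalman_analysis(x_f, P_f, H, R, y_o)
```

With a perfect observation the analysis reproduces the observed component exactly, whatever the forecast covariance is, while the unobserved components are corrected through the off-diagonal terms of P_f.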
We can see that, when synchronization occurs for all DA schemes, AUS-iterating has the most rapid convergence to the truth; when it does not occur, AUS-iterating has a better overall performance. Furthermore, there are circumstances in which AUS-iterating synchronizes to the truth while neither EKF nor EKF-Evensen does.

3.4.2 Noisy observations with variance σ² = 2

These experiments are performed with the following setup:
• statistics over 10^5 assimilations
• 3 or 2 noisy observations at each assimilation step, with variance σ² = 2 ⇒ σ = 1.414
• an assimilation window τ = 0.25

[Figure 3.3: AUS-iterating: synchronization with truth. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.]

[Figure 3.4: EKF/EKF-Evensen and AUS-iterating RMS errors. While the former fail to converge to the truth, the latter synchronizes very quickly. The only observed variable is y with variance σ² = 0 and assimilation window τ = 0.6.]

[Figure 3.5: EKF/EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.4.]

[Figure 3.6: EKF: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6; in this case and in these conditions the EKF fails to synchronize to the truth. Note also the bad performance around time 14.]

Figure 3.7: EKF-Evensen: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6.
A better performance than pure EKF.

[Figure 3.8: AUS-iterating: the only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6. A far better performance.]

[Figure 3.9: EKF, EKF-Evensen and AUS-iterating RMS errors: respective performances. The only observed variable is y with variance σ² = 0.01 and assimilation window τ = 0.6.]

[Figure 3.10: EKF, EKF-Evensen and AUS-iterating RMS errors: a zoom of the previous Fig. 3.9.]

[Figure 3.11: EKF does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.]

[Figure 3.12: EKF-Evensen does not synchronize with truth. The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.]

Figure 3.13: AUS-iterating does not synchronize with truth.
The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25.

[Figure 3.14: The only observed variable is y with variance σ² = 0.1 and assimilation window τ = 0.25: in these conditions, no assimilation scheme under investigation actually converges to the truth, but AUS-iterating has a better performance than EKF-Evensen, which in turn is far better than pure EKF.]

[Figure 3.15: A zoom of the previous Fig. 3.14. Even if the global performance of EKF is poor because of filter divergence, this does not mean that EKF is worse than the other DA schemes at all times; AUS-iterating, however, performs better overall.]

When 2 observations are used, the most valuable two (y and z) are used for the EKF schemes; for the AUS schemes two adaptive observations are used (see subsection 2.3.5). In Figures 3.16-3.20 we show the average RMS analysis error distributions for 3 and 2 observed variables, and the forecast error distribution at times T+0.25, T+0.5 and T+0.75, where T is the assimilation time. All distributions are simply truncated at an RMS error equal to 10. These distributions show that not only are the AUS-iterating schemes better, on average, than their competitors, but their right tails are also much less populated: the ability to track regime changes is greatly improved.

Numerical results are also shown in Tables 3.2 and 3.3: the AUS-iterating schemes outperform the other techniques, with an average RMS analysis error well below the standard deviation of the observations. Similar relative performances can be seen for the average RMS forecast error.
A further conclusion may be drawn: from the results of the AUS-γ0 and AUS-γ schemes, we see that the proposed estimate of the forecast error amplitude from observations is really helpful, since it greatly boosts the assimilation performance.

Table 3.2: RMS analysis error, an average over 100,000 assimilations. 3 and 2 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.

Assimilation technique   τ = 0.25, 3 obs   τ = 0.25, 2 obs
EKF                      15.5              15.6
EKF-Evensen              1.72              1.79
EKF-Yang                 3.77              3.90
AUS-γ0                   7.02              7.37
AUS-γ                    2.27              2.52
AUS-iterating            1.38              1.58
AUS-iterating+           1.16              1.33

Table 3.3: RMS forecast error, an average over 100,000 assimilations. The mean RMS analysis error is the same as in Table 3.2 and is shown here again for comparison. 3 noisy observations with variance σ² = 2 ⇒ σ = 1.414. Assimilation window τ = 0.25.

                         <RMS analysis error>   <RMS forecast error>
Assimilation technique                          @ T+0.25   @ T+0.50   @ T+0.75
EKF                      15.5                   16.6       16.9       16.9
EKF-Evensen              1.72                   3.27       6.26       7.24
EKF-Yang                 3.77                   5.36       6.97       7.64
AUS-γ0                   7.02                   8.81       10.4       11.1
AUS-γ                    2.27                   3.79       6.23       7.33
AUS-iterating            1.38                   2.75       4.94       5.61
AUS-iterating+           1.16                   2.33       4.06       4.71

[Figure 3.16: RMS Analysis Error distribution (number of realizations against RMS analysis error): an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.]

[Figure 3.17: RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 2 noisy observations with variance σ² = 2.]
[Figure 3.18: RMS Forecast Error distribution @ time=T+0.25 (number of realizations against RMS forecast error): an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.]

[Figure 3.19: RMS Forecast Error distribution @ time=T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.]

[Figure 3.20: RMS Forecast Error distribution @ time=T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 2.]

3.4.3 Noisy observations with variance σ² = 1

We now reduce the variance of the observational error. Since the relative performance of the schemes is similar, we show the results for 3 observations only. The experimental setup is:
• statistics over 100,000 assimilations
• 3 noisy observations at each assimilation step, with variance σ² = 1
• an assimilation window τ = 0.25

The results, shown in Table 3.4 and in Figures 3.21-3.24, confirm those with σ² = 2. Note that EKF is still under-performing, and still experiencing frequent filter divergence.
Table 3.4: RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 1. Assimilation window τ = 0.25.

                         <RMS analysis error>   <RMS forecast error>
Assimilation technique                          @ T+0.25   @ T+0.50   @ T+0.75
EKF                      15.2                   16.3       16.6       16.6
EKF-Evensen              1.30                   2.39       4.94       5.93
EKF-Yang                 0.99                   1.78       3.04       3.63
AUS-γ0                   3.55                   4.93       6.65       7.50
AUS-γ                    1.42                   2.52       4.56       5.57
AUS-iterating            0.96                   1.94       3.77       4.39
AUS-iterating+           0.80                   1.63       3.07       3.65

[Figure 3.21: RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.]

[Figure 3.22: RMS Forecast Error distribution @ time=T+0.25: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.]

[Figure 3.23: RMS Forecast Error distribution @ time=T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.]

3.4.4 Noisy observations with variance σ² = 0.1

We continue to lower the observational noise variance. All the experiments are now performed under the following conditions:
• statistics over 100,000 assimilations
• 3 noisy observations at each assimilation step, with variance σ² = 0.1 ⇒ σ = 0.316
• an assimilation window τ = 0.25

The results are shown in Table 3.5 and in Figures 3.25-3.28, where all distributions are simply truncated at an RMS error equal to 5.

[Figure 3.24: RMS Forecast Error distribution @ time=T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 1.]

3.4.5 Root Mean Square forecast error: time dependence

We now show how the average RMS error grows with forecast time, starting from the analysis. The average is computed over 100,000 assimilation steps. Three variables are observed at each assimilation step, with assimilation window τ = 0.25. At time t = 0 the curves show the average RMS analysis error (see Tables 3.3, 3.4 and 3.5). The plots are ordered by the variance of the observational error:
• in the 1st plot, Fig. 3.29: σ² = 2
• in the 2nd plot, Fig. 3.30: σ² = 1
• in the 3rd plot, Fig. 3.31: σ² = 0.1

The EKF curve has been intentionally plotted only in the first one. Again, the AUS-iterating schemes, in particular AUS-iterating+, perform best.

Table 3.5: RMS analysis and forecast error, an average over 100,000 assimilations. 3 noisy observations with variance σ² = 0.1 ⇒ σ = 0.316. Assimilation window τ = 0.25.
                         <RMS analysis error>   <RMS forecast error>
Assimilation technique                          @ T+0.25   @ T+0.50   @ T+0.75
EKF                      15.0                   16.0       16.4       16.4
EKF-Evensen              0.49                   0.79       1.86       2.67
EKF-Yang                 0.25                   0.53       1.21       1.62
AUS-γ0                   0.54                   1.02       1.99       2.74
AUS-γ                    0.36                   0.72       1.56       2.20
AUS-iterating            0.30                   0.61       1.37       1.85
AUS-iterating+           0.26                   0.52       1.14       1.53

[Figure 3.25: RMS Analysis Error distribution: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.]

[Figure 3.26: RMS Forecast Error distribution @ time=T+0.25: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.]

[Figure 3.27: RMS Forecast Error distribution @ time=T+0.50: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.]

[Figure 3.28: RMS Forecast Error distribution @ time=T+0.75: an average over 100,000 assimilations with an assimilation window τ = 0.25, 3 noisy observations with variance σ² = 0.1.]
[Figure 3.29: RMS analysis and forecast error (RMS error against time, from the analysis to T+0.75; the level RMS error = σ is marked): an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 2 ⇒ σ = 1.414.]

[Figure 3.30: RMS analysis and forecast error: an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 1.]

3.5 Some illustrative examples

In this section we show some examples of noisy data assimilation produced by the different schemes examined in this work. In subsection 3.5.1 we present a case study in which all DA schemes are applied to the same truth trajectory of the system (only the variable x is shown). In subsection 3.5.2 we show a 3D plot of the AUS technique with some comments, while in subsection 3.5.3 we provide a step by step description of the AUS-iterating scheme.

3.5.1 Comparing AUS with EKF assimilation schemes: case study

In Figures 3.32-3.38 we show, for the same case, the solution for the variable x and the analysis performed. The vertical green line shows the time of the last observation available for the assimilation. The experimental setup is:
• assimilation window τ = 0.25
• 3 variables observed with error variance σ² = 2

We can see that, in this context, the EKF tends not to capture the regime changes, which is consistent with the average results over 100,000 assimilations shown in Table 3.3.
[Figure 3.31: RMS analysis + forecast error: an average over 100,000 assimilation steps, assimilation window τ = 0.25, 3 variables observed with variance σ² = 0.1 ⇒ σ = 0.316.]

[Figure 3.32: EKF assimilation scheme: solution for x (truth, background, observations, analysis). The green line shows the time of the last observation available.]

[Figure 3.33: EKF-Evensen assimilation scheme: solution for x.]

[Figure 3.34: EKF-Yang assimilation scheme: solution for x.]

[Figure 3.35: AUS-γ0 assimilation scheme: solution for x.]

[Figure 3.36: AUS-γ assimilation scheme: solution for x.]

[Figure 3.37: AUS-iterating assimilation scheme: solution for x.]

[Figure 3.38: AUS-iterating+ assimilation scheme: solution for x.]

[Figure 3.39: How AUS-γ assimilates observations. Assimilation window τ = 0.8, perfect observation (σ² = 0), observed variable: y.]

EKF-Evensen
and EKF-Yang perform definitely better, particularly the former. As for the AUS schemes, we can appreciate the improvements from the under-performing AUS-γ0 up to AUS-iterating. Note that EKF-Evensen has here a better performance in the forecast period than the AUS-iterating techniques, but this is not the general case: as already shown in Table 3.3, the average performance of EKF-Evensen is worse than that of AUS-iterating, both for analysis error and for forecast error.

3.5.2 AUS-γ: a 3D example

In Fig. 3.39 we show the essence of the AUS approach. We choose:
• a very long assimilation window, τ = 0.8
• only one observed variable: y
• perfect observation, σ² = 0

In the plot, the blue trajectory is the truth and the red one is the AUS-γ assimilation trajectory. The red circles, labelled t = 0, mark the beginning of both trajectories. The black circles (t = T) show the same trajectories after one assimilation window, and the green circles (t = 2T) after one further assimilation window. The black vectors are the bred vectors estimated by BDAS (see section 2.4) and evolved by the Tangent Linear Model operator, plotted every ∆t = 0.05. At t = T the assimilation is done along the corresponding bred vector, assumed to capture the most unstable structure of the system (see section 2.3).

[Figure 3.40: A qualitative comparison between EKF and AUS-γ. Assimilation window τ = 0.8, noisy observations (σ² = 2), observed variables: x, y, z.]

In Fig. 3.40 we show, as an example, the 3D comparison between EKF and AUS-γ, in the following context:
• a very long assimilation window, τ = 0.8
• all 3 variables observed
• noisy observations, with σ² = 2

Here the bred vectors for AUS-γ have not been plotted for the sake of clarity. The initial condition is the same for all schemes.
After each DA technique has run for 3 assimilation windows, we plot the trajectories starting from the red circles (t = 0). The blue trajectory is the truth, the red one is the AUS-γ trajectory, while the green one is that of EKF. Both EKF and AUS-γ are far from the truth at the beginning, but at t = T (black circles) the EKF is not able to capture the regime change, while AUS-γ is.

[Figure 3.41: The zone of the attractor considered in the next Figure 3.42.]

3.5.3 AUS-iterating: a step by step description

Fig. 3.41 shows a 3D plot of the L63 system and highlights a highly unstable zone of the attractor, which is enlarged in Fig. 3.42. Here we set:
• a long assimilation window, τ = 0.25
• 2 adaptive observations
• noisy observations, with σ² = 2

The blue trajectory is the truth. The green star is the forecast state, from which the first analysis is computed (red circle); the red trajectory is the “first-attempt” forecast trajectory. The black vectors are the bred vectors estimated by BDAS (see section 2.4) and evolved by the Tangent Linear Model operator, plotted every ∆t = 0.05. When a new observation becomes available, a new assimilation can be done. In the iteration we go back to the previous assimilation step, correct the previous analysis according to equations 3.28-3.29 and to
\[ \Delta\mathbf{x}^a_k = c\,\exp\left(-\int_{t_k}^{t_{k+1}} \lambda(t)\,dt\right)\mathbf{e}_k \]
(see subsection 3.3.4), and compute the final forecast trajectory (the green one). Of course new bred vectors are computed as well, but they are shown only in Fig. 3.43 (in magenta).

[Figure 3.42: How AUS-iterating works: a zoom of previous Fig. 3.41. Assimilation window τ = 0.25, noisy observations (σ² = 2), 2 observations.]
[Figure 3.43: More details on the AUS-iterating assimilation scheme, including the evolving unstable vectors in the final forecast trajectory, recomputed after iteration.]

3.6 Adding two types of Model Error

We now wish to run experiments with some kind of model error in the equations of our L63 system. In subsection 3.6.1, we start by simply adding a random term to each equation at each integration step. In subsection 3.6.3, we introduce a systematic error in our model by changing the value of the parameter r, which drives the convection instability (see subsection 1.4.2). In short, we compute the “true” trajectory of the system and take noisy observations of the true state, but we perform data assimilation using an imperfect model.

3.6.1 Random error

In these experiments we simulate a random model error by adding, at each integration step dt = 0.01, a random term to the model equations (eq. 1.49):

“Truth” system:
\[ \begin{aligned} \frac{dx}{dt} &= \sigma(y - x) \\ \frac{dy}{dt} &= rx - y - xz \\ \frac{dz}{dt} &= xy - bz \end{aligned} \tag{3.42} \]
Assimilation model:
\[ \begin{aligned} \frac{dx}{dt} &= \sigma(y - x) + \varepsilon_1 \\ \frac{dy}{dt} &= rx - y - xz + \varepsilon_2 \\ \frac{dz}{dt} &= xy - bz + \varepsilon_3 \end{aligned} \tag{3.43} \]
where $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are random additive terms, normally distributed with standard deviation A × 0.1σ, with the parameter A assuming different values.

Table 3.6: Random error in the assimilation model: 3 observations with variance σ² = 2 ⇒ σ = 1.414 at each assimilation step, assimilation window τ = 0.25. The mean RMS analysis error, an average over 20,000 assimilations, is shown for the different DA schemes.
3 obs
Assimilation technique   A = 0   A = 1   A = 2
EKF                      15.6    15.5    15.7
EKF-Evensen              1.71    1.73    1.77
EKF-Yang                 3.54    7.80    9.98
AUS-γ0                   7.02    8.87    11.0
AUS-γ                    2.24    2.72    3.89
AUS-iterating            1.35    1.94    2.75
AUS-iterating+           1.15    1.80    2.44

3.6.2 Random error: comparing performances

All the experiments are performed under the following conditions:
• statistics over 20,000 assimilations
• 3 or 2 noisy observations with variance σ² = 2
• an assimilation window τ = 0.25
• random additive terms $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are added at each integration step dt = 0.01. They are normally distributed with standard deviation A × 0.1σ; the parameter A assumes the values A = 1 and A = 2
• in the 2 observation experiments, we observe the two most convenient variables, y and z, for the EKF schemes. For the AUS schemes, we observe the two variables associated with the two largest components of the unstable vector
• in the EKF-Evensen scheme, the term Q is the same as in eq. 3.8
• in EKF-Yang, no further tuning of the parameters has been done: the algorithm is the same as in the experiments without model error

We show the results in Tables 3.6 and 3.7. The case with A = 0, i.e. without any random error, is shown here for reference (it was already shown in section 3.4 with a larger sample). The best AUS schemes are more robust than almost all the others. EKF-Evensen is actually more stable under these conditions, but it needs an estimate of the model error covariance matrix Q.

Table 3.7: Random error in the assimilation model: 2 observations with variance σ² = 2 ⇒ σ = 1.414 at each assimilation step, assimilation window τ = 0.25. We show the mean RMS analysis error, an average over 20,000 assimilations.
2 obs
Assimilation technique   A = 0   A = 1   A = 2
EKF                      15.5    15.6    15.6
EKF-Evensen              1.78    1.82    1.92
EKF-Yang                 3.67    7.08    9.47
AUS-γ0                   7.46    9.35    10.9
AUS-γ                    2.50    3.04    4.19
AUS-iterating            1.54    2.22    3.27
AUS-iterating+           1.28    2.11    3.07

3.6.3 Systematic error

In these experiments we simulate a systematic error by increasing the value of the parameter r in the model equations (eq. 1.49). As we have already seen in subsection 1.4.2, the parameter r is related to the intensity of the convection instability. We have to keep in mind that r = 28 is a slightly supercritical value for unstable convection to occur. Thus the equations to be used are:

“Truth” system:
\[ \begin{aligned} \frac{dx}{dt} &= \sigma(y - x) \\ \frac{dy}{dt} &= rx - y - xz \\ \frac{dz}{dt} &= xy - bz \end{aligned} \tag{3.44} \]
Assimilation model:
\[ \begin{aligned} \frac{dx}{dt} &= \sigma(y - x) \\ \frac{dy}{dt} &= (r + \Delta r)x - y - xz \\ \frac{dz}{dt} &= xy - bz \end{aligned} \tag{3.45} \]
The term ∆r introduces a systematic error in the equations used for data assimilation, by increasing the instability of convective motion in the model.

Table 3.8: Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Three observations with variance σ² = 2 at each assimilation step, assimilation window τ = 0.25. The mean RMS analysis error, an average over 20,000 assimilations, is shown.
3 obs
Assimilation technique   r = 28   r = 30   r = 33
EKF                      15.6     16.7     17.6
EKF-Evensen              1.71     1.74     1.86
EKF-Yang                 3.54     11.4     13.9
AUS-γ0                   7.02     11.5     14.6
AUS-γ                    2.24     3.91     8.01
AUS-iterating            1.35     2.53     4.90
AUS-iterating+           1.15     2.24     3.87

3.6.4 Systematic error: comparing performances

All the experiments are performed under the following conditions:
• statistics over 20,000 assimilations
• 3 or 2 noisy observations with variance σ² = 2
• an assimilation window τ = 0.25
• systematic errors ∆r = 0 (for reference), ∆r = 2 and ∆r = 5
• in the 2 observation experiments we observe the two most convenient variables, y and z, for the EKF schemes, while for the AUS schemes we observe the two variables associated with the two largest components of the unstable vector
• in the EKF-Evensen scheme, the term Q is the same as in eq. 3.8
• in EKF-Yang, no further tuning of the parameters has been done: the algorithm is the same as in the experiments without model error

We show the results in Tables 3.8 and 3.9. The case with r = 28, i.e. without systematic error, is included here for comparison (it was already shown in section 3.4 with a larger sample). The picture is similar to the random error case: the EKF-Evensen scheme outperforms the others, but it needs an estimate of the model error covariance, which is not easy to obtain in realistic models. In these contexts, the best AUS schemes are still quite robust.

Table 3.9: Systematic error in the assimilation model, r = 28 (i.e. no error, for reference), r = 30, r = 33. Two observations with variance σ² = 2 at each assimilation step, assimilation window τ = 0.25. The mean RMS analysis error, an average over 20,000 assimilations, is shown.
2 obs
Assimilation technique   r = 28   r = 30   r = 33
EKF                      15.5     16.9     17.9
EKF-Evensen              1.78     1.81     1.96
EKF-Yang                 3.67     10.9     13.8
AUS-γ0                   7.46     11.6     14.8
AUS-γ                    2.50     4.26     8.21
AUS-iterating            1.54     2.76     5.38
AUS-iterating+           1.28     2.53     4.88

Chapter 4
Conclusions

In this work we focused on the data assimilation problem for the highly nonlinear, chaotic Lorenz 63 system. Specifically, we tried to estimate the state of the system given a set of noisy observations at regular time intervals and the equations of the model.

In the contexts we have examined, including those with model error, Assimilation in the Unstable Subspace (AUS) has once again shown better efficiency than other advanced data assimilation schemes. Furthermore, at least for sufficiently long assimilation windows, our proposed approach for the estimate of the forecast error amplitude leads to a significant improvement of the assimilation performance.

In the cases we have considered, the iterating extension of AUS improves over the standard AUS. In particular, it improves the tracking of regime changes at a low computational cost.

Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned for the specific model at hand. In NWP models, tuning of parameters, and in particular an estimate of the model error covariance matrix Q, may turn out to be quite difficult. The estimate of Q, on the other hand, is at the basis of the good performance of the EKF-Evensen scheme, in particular in the presence of model error.

As a final remark, we note that the proposed approach may well be implemented in operational NWP models.

Appendices

Appendix A
Euler and Runge-Kutta numerical integration methods

The Runge-Kutta methods are used to numerically approximate the solution of ordinary differential equations: they perform much better than the first order Euler method. The kind of problem we want to solve is the following.
Let us suppose we have a vector $\mathbf{x}$ and a vector function $\mathbf{f}(\mathbf{x})$ such that:

$$\frac{d\mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}) \tag{A.1}$$

Given the initial condition $\mathbf{x}(t_0) = \mathbf{x}_0$, we want a systematic way to approximate the solution $\mathbf{x}(t)$.

### A.1 First order Euler method

The formula for the Euler method to advance a solution from $t_n$ to $t_{n+1} \equiv t_n + \Delta t$, that is, to estimate $\mathbf{x}(t_n + \Delta t) \simeq \mathbf{x}_{n+1}$ given $\mathbf{x}_n$, follows directly from the definition of the derivative:

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{f}(\mathbf{x}_n)\,\Delta t + O(\Delta t^2) \tag{A.2}$$

where $O(\Delta t^2)$ is the error of a single step; accumulated over the steps of a fixed integration interval, the error $E = |\mathbf{x}(t_n + \Delta t) - \mathbf{x}_{n+1}|$ tends to zero with $\Delta t$, i.e. $E \propto \Delta t$. In the Euler method the solution $\mathbf{x}_{n+1}$, advanced through an interval $\Delta t$, is computed with the derivative information $\mathbf{f}(\mathbf{x}_n)$ calculated at the initial time $t_n$ only. It is not recommended for practical use, because it is less accurate than other methods and not very stable either.

### A.2 RK2: second order Runge-Kutta scheme

The error converges to zero faster with the 2nd order Runge-Kutta scheme, also known as the improved Euler method, where the accumulated error $E = |\mathbf{x}(t_n + \Delta t) - \mathbf{x}_{n+1}| \propto (\Delta t)^2$. Here the solution $\mathbf{x}_{n+1}$ is computed by using an average for the derivative:

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \frac{1}{2}(\mathbf{k}_1 + \mathbf{k}_2) + O(\Delta t^3) \tag{A.3}$$

where

$$\mathbf{k}_1 = \mathbf{f}(\mathbf{x}_n)\,\Delta t \tag{A.4}$$

$$\mathbf{k}_2 = \mathbf{f}(\mathbf{x}_n + \mathbf{k}_1)\,\Delta t = \mathbf{f}(\tilde{\mathbf{x}}_{n+1})\,\Delta t \tag{A.5}$$

The meaning of $\mathbf{k}_1$, $\mathbf{k}_2$ and $\tilde{\mathbf{x}}_{n+1}$ is the following:

• $\mathbf{k}_1$ is a trial step that evaluates $(\mathbf{x}_{n+1} - \mathbf{x}_n)$ through the Euler method, as in eq. A.2; this first evaluation of $\mathbf{x}_{n+1}$ is called $\tilde{\mathbf{x}}_{n+1}$
• $\mathbf{k}_2$ is another estimate of $(\mathbf{x}_{n+1} - \mathbf{x}_n)$ through the same Euler method, but using $\mathbf{f}(\tilde{\mathbf{x}}_{n+1})$, which is the derivative at the estimated end of the time interval

So RK2 (eq. A.3) exploits an average between the derivative $\mathbf{f}(\mathbf{x}_n)$ calculated at the initial time $t_n$ and the derivative $\mathbf{f}(\tilde{\mathbf{x}}_{n+1}) = \mathbf{f}(\mathbf{x}_n + \mathbf{k}_1)$, which is an estimate of the derivative $\mathbf{f}(\mathbf{x}_{n+1})$ at the final time.

### A.3 RK4: fourth order Runge-Kutta scheme

We need not limit ourselves to the second order.
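Before moving on, the Euler step of eq. A.2 and the RK2 step of eqs. A.3-A.5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in this work: the test problem dx/dt = -x and the helper names `integrate`, `decay` and `final_error` are assumptions introduced here. Halving the step should roughly halve the Euler error and reduce the RK2 error by a factor of about four.

```python
import math


def euler_step(f, x, dt):
    """Advance x by one first order Euler step (eq. A.2)."""
    return x + f(x) * dt


def rk2_step(f, x, dt):
    """Advance x by one RK2 (improved Euler) step (eqs. A.3-A.5)."""
    k1 = f(x) * dt          # trial Euler increment
    k2 = f(x + k1) * dt     # increment using the derivative at the trial end point
    return x + 0.5 * (k1 + k2)


def integrate(step, f, x0, t_end, n_steps):
    """Apply `step` n_steps times to go from t = 0 to t = t_end."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = step(f, x, dt)
    return x


def decay(x):
    """Hypothetical test problem dx/dt = -x, whose exact solution is x0 * exp(-t)."""
    return -x


def final_error(step, n_steps):
    """Absolute error at t = 1 for x0 = 1 against the exact value exp(-1)."""
    return abs(integrate(step, decay, 1.0, 1.0, n_steps) - math.exp(-1.0))


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, final_error(euler_step, n), final_error(rk2_step, n))
```

The scalar test problem keeps the sketch free of dependencies; the same step functions also work when `x` and `f(x)` are array types that support addition and multiplication by a scalar.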
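A very popular Runge-Kutta method is the fourth order one, in which the accumulated error $E = |\mathbf{x}(t_n + \Delta t) - \mathbf{x}_{n+1}| \propto (\Delta t)^4$. Here we use a weighted average for the derivative:

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \frac{1}{6}(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4) + O(\Delta t^5) \tag{A.6}$$

where

$$\mathbf{k}_1 = \mathbf{f}(\mathbf{x}_n)\,\Delta t \tag{A.7}$$

$$\mathbf{k}_2 = \mathbf{f}\left(\mathbf{x}_n + \tfrac{1}{2}\mathbf{k}_1\right)\Delta t \tag{A.8}$$

$$\mathbf{k}_3 = \mathbf{f}\left(\mathbf{x}_n + \tfrac{1}{2}\mathbf{k}_2\right)\Delta t \tag{A.9}$$

$$\mathbf{k}_4 = \mathbf{f}(\mathbf{x}_n + \mathbf{k}_3)\,\Delta t \tag{A.10}$$

Higher order RK schemes are conceivable, but not necessarily better: they require more computational effort.

## Appendix B Normalization factors for random variables

If $x$ is a random variable with standard deviation $\sigma$ and Probability Density Function (PDF):

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right) \tag{B.1}$$

and $\mathbf{x}$ is a vector whose components are random with standard deviation $\sigma$ and the same PDF, it can be shown that:

$$\int_{-\infty}^{+\infty} |x|\,P(x)\,dx = \sqrt{\frac{2}{\pi}}\,\sigma \qquad \text{(1-dimensional case)}$$

$$\iint_{-\infty}^{+\infty} \|\mathbf{x}\|\,P(x_1)P(x_2)\,dx_1\,dx_2 = \sqrt{\frac{\pi}{2}}\,\sigma \qquad \text{(2-dimensional case)}$$

$$\iiint_{-\infty}^{+\infty} \|\mathbf{x}\|\,P(x_1)P(x_2)P(x_3)\,dx_1\,dx_2\,dx_3 = \sqrt{\frac{8}{\pi}}\,\sigma \qquad \text{(3-dimensional case)}$$

## Appendix C EKF, 1-Dimensional example

In the Extended Kalman Filter, if the system is described by a single variable only, the forecast and analysis covariance matrices and the Kalman gain reduce to scalars.

### C.1 Forecast and analysis covariance matrices

From the Extended Kalman Filter equations, we have:

$$K_k = \frac{p^f_k}{\sigma^2 + p^f_k} \tag{C.1}$$

$$p^f_k = \alpha_k\,p^a_{k-1}$$

$$p^a_k = (1 - K_k)\,p^f_k = \frac{\sigma^2 p^f_k}{\sigma^2 + p^f_k} \tag{C.2}$$

Let us find the forecast and analysis covariance matrices step by step, with $p_0 \equiv p^a_0$:

$$
\begin{aligned}
p^f_1 &= \alpha_1 p_0 \\
p^a_1 &= \frac{\sigma^2 p^f_1}{\sigma^2 + p^f_1} = \frac{\sigma^2 \alpha_1 p_0}{\sigma^2 + \alpha_1 p_0} \\
p^f_2 &= \alpha_2 p^a_1 = \frac{\sigma^2 p_0\,\alpha_1\alpha_2}{\sigma^2 + \alpha_1 p_0} \\
p^a_2 &= \frac{\sigma^2 p^f_2}{\sigma^2 + p^f_2} = \frac{\sigma^2 p_0\,\alpha_1\alpha_2}{\sigma^2 + p_0(\alpha_1 + \alpha_1\alpha_2)} \\
p^f_3 &= \alpha_3 p^a_2 = \frac{\sigma^2 p_0\,\alpha_1\alpha_2\alpha_3}{\sigma^2 + p_0(\alpha_1 + \alpha_1\alpha_2)} \\
p^a_3 &= \frac{\sigma^2 p^f_3}{\sigma^2 + p^f_3} = \frac{\sigma^2 p_0\,\alpha_1\alpha_2\alpha_3}{\sigma^2 + p_0(\alpha_1 + \alpha_1\alpha_2 + \alpha_1\alpha_2\alpha_3)} \\
&\;\;\vdots
\end{aligned}
$$

$$
\begin{aligned}
p^f_k &= \frac{\sigma^2 p_0\,\alpha_1\alpha_2\cdots\alpha_k}{\sigma^2 + p_0(\alpha_1 + \alpha_1\alpha_2 + \alpha_1\alpha_2\alpha_3 + \ldots + \alpha_1\alpha_2\cdots\alpha_{k-1})} = \frac{\sigma^2 p_0 \prod_{j=1}^{k}\alpha_j}{\sigma^2 + p_0\left(\sum_{i=1}^{k-1}\prod_{j=1}^{i}\alpha_j\right)} \\
p^a_k &= \frac{\sigma^2 p_0\,\alpha_1\alpha_2\cdots\alpha_k}{\sigma^2 + p_0(\alpha_1 + \alpha_1\alpha_2 + \alpha_1\alpha_2\alpha_3 + \ldots + \alpha_1\alpha_2\cdots\alpha_k)} = \frac{\sigma^2 p_0 \prod_{j=1}^{k}\alpha_j}{\sigma^2 + p_0\left(\sum_{i=1}^{k}\prod_{j=1}^{i}\alpha_j\right)} \\
K_k &= \frac{p^a_k}{\sigma^2}
\end{aligned}
\tag{C.3}
$$

In these equations we have also shown the result for the Kalman gain $K_k$, which is easily computed from $K_k = p^f_k/(\sigma^2 + p^f_k)$.

The analysis, background and observational errors are bound by the following general equation [16]:

$$\epsilon_a = \epsilon_b + (\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}(\epsilon_o - \mathbf{H}\epsilon_b) \tag{C.4}$$

In this 1-dimensional example, if $H = 1$ and $R = \sigma^2$, it becomes:

$$
\begin{aligned}
\epsilon_a &= \epsilon_b + \frac{\sigma^{-2}}{(p^f_k)^{-1} + \sigma^{-2}}\,(\epsilon_o - \epsilon_b) \\
&= \epsilon_b + \frac{p^f_k}{p^f_k + \sigma^2}\,(\epsilon_o - \epsilon_b) \\
&= (1 - K_k)\,\epsilon_b + K_k\,\epsilon_o
\end{aligned}
$$

If $\alpha_k = \alpha = \text{constant} > 1$, then

$$\lim_{k\to\infty} p^f_k = \sigma^2(\alpha - 1) \tag{C.5}$$

$$\lim_{k\to\infty} p^a_k = \sigma^2\,\frac{\alpha - 1}{\alpha} \tag{C.6}$$

$$\lim_{k\to\infty} K_k = \frac{\alpha - 1}{\alpha} \tag{C.7}$$

Indeed:

$$
\begin{aligned}
\lim_{k\to\infty} p^f_k &= \lim_{k\to\infty} \frac{\sigma^2 p_0\,\alpha^k}{\sigma^2 + p_0(\alpha + \alpha^2 + \ldots + \alpha^{k-1})} \\
&= \lim_{k\to\infty} \frac{\sigma^2 p_0\,\alpha}{\dfrac{\sigma^2}{\alpha^{k-1}} + p_0\left(\dfrac{1}{\alpha^{k-2}} + \ldots + \dfrac{1}{\alpha} + 1\right)} \\
&= \frac{\sigma^2\alpha}{\dfrac{1}{1 - \frac{1}{\alpha}}} = \sigma^2(\alpha - 1)
\end{aligned}
$$

and

$$
\begin{aligned}
\lim_{k\to\infty} p^a_k &= \lim_{k\to\infty} \frac{\sigma^2 p_0\,\alpha^k}{\sigma^2 + p_0(\alpha + \alpha^2 + \ldots + \alpha^k)} \\
&= \lim_{k\to\infty} \frac{\sigma^2 p_0}{\dfrac{\sigma^2}{\alpha^k} + p_0\left(\dfrac{1}{\alpha^{k-1}} + \dfrac{1}{\alpha^{k-2}} + \ldots + \dfrac{1}{\alpha} + 1\right)} \\
&= \frac{\sigma^2}{\dfrac{1}{1 - \frac{1}{\alpha}}} = \sigma^2\,\frac{\alpha - 1}{\alpha}
\end{aligned}
$$

because if $\alpha > 1$ then $1/\alpha < 1$ and the geometric series converges. So:

$$\lim_{k\to\infty} K_k = \frac{\sigma^2(\alpha - 1)}{\sigma^2 + \sigma^2(\alpha - 1)} = \frac{\alpha - 1}{\alpha}$$

or, equivalently:

$$\lim_{k\to\infty} (1 - K_k) = \frac{1}{\alpha}$$

### C.2 Lyapunov exponents for free and forced systems

Now we redefine $\alpha$ as the growth of the error (instead of the growth of the analysis error variance), so that $1 - K \approx 1/\alpha^2$, or $1 - K \approx \exp(-2\lambda\tau)$. The error growth of the forced system will be:

$$e^{f\tau} = (1 - K)\,e^{\lambda\tau} \tag{C.8}$$

where $f$ and $\lambda$ are the greatest Lyapunov exponents of the forced and free systems respectively. So, if $\alpha_k = \alpha = \text{constant} > 1$:

$$e^{f\tau} = (1 - K)\,e^{\lambda\tau} = \frac{1}{e^{2\lambda\tau}}\,e^{\lambda\tau} = e^{-\lambda\tau} \;\Longrightarrow\; f = -\lambda$$

If the $\alpha_k$ are not constant, the results are almost the same. If $\alpha \gg 1$ then $1 - K \approx 0$, or $K \approx 1$, so the forecast will be almost ignored and the analysis error covariance matrix will tend to the observational error: $p^a = \sigma^2(1 - 1/\alpha^2) \approx \sigma^2$. If we shorten the observational interval $\tau$ ($\tau \to 0$):

$$p^a = \sigma^2\left(1 - \exp(-2\lambda\tau)\right) \longrightarrow 0 \tag{C.9}$$

## Appendix D Inverse and pseudo-inverse matrices

An inverse matrix can be defined for square matrices only. The pseudo-inverse matrix, instead, is a generalization that can also be defined for rectangular matrices.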
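As a quick numerical illustration of this distinction, here is a minimal sketch that assumes NumPy is available. The two matrices and the helper `has_ordinary_inverse` are hypothetical, chosen only for illustration, and the thesis software itself was written in Matlab and C/C++. `np.linalg.inv` accepts only square matrices, while `np.linalg.pinv` also accepts rectangular ones.

```python
import numpy as np

# Hypothetical matrices, chosen only for illustration.
square = np.array([[2.0, 1.0],
                   [1.0, 3.0]])   # 2 x 2, determinant 5, hence non-singular
rect = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])     # 3 x 2, rectangular

square_inv = np.linalg.inv(square)   # ordinary inverse: square input only
rect_pinv = np.linalg.pinv(rect)     # pseudo-inverse: any shape is accepted


def has_ordinary_inverse(matrix):
    """Return True if np.linalg.inv accepts the matrix, False if it raises."""
    try:
        np.linalg.inv(matrix)
        return True
    except np.linalg.LinAlgError:
        return False


if __name__ == "__main__":
    print(square_inv @ square)          # product of the inverse and the matrix
    print(rect_pinv.shape)              # shape of the pseudo-inverse
    print(has_ordinary_inverse(rect))   # whether inv accepts the 3 x 2 matrix
```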
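### D.1 Inverse matrix

Given an $N \times N$ square matrix $\mathbf{A}$, the inverse matrix $\mathbf{A}^{-1}$, if it exists, has the same dimensions as $\mathbf{A}$ and is such that

$$\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \tag{D.1}$$

where $\mathbf{I}$ is the $N \times N$ identity matrix. The matrix $\mathbf{A}$ has the inverse matrix $\mathbf{A}^{-1}$ if and only if $\mathbf{A}$ is non-singular, i.e.:

$$\det(\mathbf{A}) = |\mathbf{A}| \neq 0 \tag{D.2}$$

The matrix $\mathbf{A}^{-1}$ can be written as follows:

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}\,\mathbf{C}^T \tag{D.3}$$

where

$$
\mathbf{C}^T = \begin{pmatrix}
c_{11} & c_{21} & \cdots & c_{N1} \\
c_{12} & c_{22} & \cdots & c_{N2} \\
\vdots & \vdots & \ddots & \vdots \\
c_{1N} & c_{2N} & \cdots & c_{NN}
\end{pmatrix}
\tag{D.4}
$$

is the transpose of the cofactor matrix of $\mathbf{A}$.

Example. Given a $3 \times 3$ non-singular matrix $\mathbf{A}$

$$
\mathbf{A} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\tag{D.5}
$$

with $\det(\mathbf{A}) = |\mathbf{A}| \neq 0$, the cofactor matrix $\mathbf{C}$ is

$$
\mathbf{C} = \begin{pmatrix}
+\det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} &
-\det\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} &
+\det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix} \\[2ex]
-\det\begin{pmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{pmatrix} &
+\det\begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix} &
-\det\begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix} \\[2ex]
+\det\begin{pmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{pmatrix} &
-\det\begin{pmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{pmatrix} &
+\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\end{pmatrix}
$$

so its transpose is

$$
\mathbf{C}^T = \begin{pmatrix}
+(a_{22}a_{33} - a_{32}a_{23}) & -(a_{12}a_{33} - a_{32}a_{13}) & +(a_{12}a_{23} - a_{22}a_{13}) \\
-(a_{21}a_{33} - a_{31}a_{23}) & +(a_{11}a_{33} - a_{31}a_{13}) & -(a_{11}a_{23} - a_{21}a_{13}) \\
+(a_{21}a_{32} - a_{31}a_{22}) & -(a_{11}a_{32} - a_{31}a_{12}) & +(a_{11}a_{22} - a_{21}a_{12})
\end{pmatrix}
$$

and the inverse matrix $\mathbf{A}^{-1}$ is

$$
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}\begin{pmatrix}
+(a_{22}a_{33} - a_{32}a_{23}) & -(a_{12}a_{33} - a_{32}a_{13}) & +(a_{12}a_{23} - a_{22}a_{13}) \\
-(a_{21}a_{33} - a_{31}a_{23}) & +(a_{11}a_{33} - a_{31}a_{13}) & -(a_{11}a_{23} - a_{21}a_{13}) \\
+(a_{21}a_{32} - a_{31}a_{22}) & -(a_{11}a_{32} - a_{31}a_{12}) & +(a_{11}a_{22} - a_{21}a_{12})
\end{pmatrix}
$$

Lastly, we recall some other useful facts. It can be shown that if the matrix $\mathbf{A}$ can be written as a product of two invertible matrices, then the inverse matrix $\mathbf{A}^{-1}$ can be expressed as the product of their inverses in reverse order:

$$\mathbf{A} = \mathbf{B}\mathbf{C} \;\Longrightarrow\; \mathbf{A}^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1} \tag{D.6}$$

If the matrix $\mathbf{A}$ admits the inverse $\mathbf{A}^{-1}$, then, for any scalar $k \neq 0$, we have:

$$(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T \tag{D.7}$$

$$(k\mathbf{A})^{-1} = k^{-1}\mathbf{A}^{-1} \tag{D.8}$$

### D.2 Pseudo-inverse matrix

Given an $M \times N$ matrix $\mathbf{A}$, the pseudo-inverse $\mathbf{A}^+$ has the same dimensions as $\mathbf{A}^T$, and satisfies the following relations:

$$\mathbf{A}\mathbf{A}^+\mathbf{A} = \mathbf{A} \tag{D.9}$$

$$\mathbf{A}^+\mathbf{A}\mathbf{A}^+ = \mathbf{A}^+ \tag{D.10}$$

$$(\mathbf{A}\mathbf{A}^+)^* = \mathbf{A}\mathbf{A}^+ \;\Longrightarrow\; \mathbf{A}\mathbf{A}^+ \text{ is hermitian} \tag{D.11}$$

$$(\mathbf{A}^+\mathbf{A})^* = \mathbf{A}^+\mathbf{A} \;\Longrightarrow\; \mathbf{A}^+\mathbf{A} \text{ is hermitian} \tag{D.12}$$

where $\mathbf{A}^*$ is the conjugate transpose of the matrix $\mathbf{A}$, i.e.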
the matrix computed by taking the transpose of $\mathbf{A}$ and then the complex conjugate of each element. If the matrix $\mathbf{A}$ is real, then $\mathbf{A}^* = \mathbf{A}^T$. If the inverse of $\mathbf{A}^*\mathbf{A}$ exists, then

$$\mathbf{A}^+ = (\mathbf{A}^*\mathbf{A})^{-1}\mathbf{A}^* \tag{D.13}$$

If the inverse $\mathbf{A}^{-1}$ of the matrix $\mathbf{A}$ exists, then it can be shown that $\mathbf{A}^{-1} = \mathbf{A}^+$.

Example. Given a $3 \times 2$ real matrix $\mathbf{A}$

$$
\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 2 & 5 \\ 7 & 3 \end{pmatrix}
$$

then its pseudo-inverse $\mathbf{A}^+$ is

$$
\mathbf{A}^+ = \begin{pmatrix} 0.0389 & -0.0994 & 0.1657 \\ -0.0354 & 0.2377 & -0.0629 \end{pmatrix}
$$

which satisfies the four properties above; in particular:

$$
(\mathbf{A}\mathbf{A}^+)^T = \mathbf{A}\mathbf{A}^+ = \begin{pmatrix} 0.0389 & -0.0994 & 0.1657 \\ -0.0994 & 0.9897 & 0.0171 \\ 0.1657 & 0.0171 & 0.9714 \end{pmatrix}
$$

$$
(\mathbf{A}^+\mathbf{A})^T = \mathbf{A}^+\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
$$

Furthermore, since

$$
\mathbf{A}^T\mathbf{A} = \begin{pmatrix} 54 & 31 \\ 31 & 34 \end{pmatrix}
$$

then

$$
(\mathbf{A}^T\mathbf{A})^{-1} = \begin{pmatrix} 0.0389 & -0.0354 \\ -0.0354 & 0.0617 \end{pmatrix}
$$

and

$$
\mathbf{A}^+ = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T = \begin{pmatrix} 0.0389 & -0.0994 & 0.1657 \\ -0.0354 & 0.2377 & -0.0629 \end{pmatrix}
$$

## Bibliography

[1] Andrews, D. G., 2000: An Introduction to Atmospheric Physics. Cambridge University Press, Cambridge, UK.

[2] Arnold, V. I., 1986: Metodi Matematici della Meccanica Classica. Editori Riuniti Edizioni MIR, Roma, Italia.

[3] Carrassi, A., Trevisan, A. and Uboldi, F., 2006: Adaptive observations and assimilation in the unstable subspace by breeding on the data assimilation system. Tellus, 59A-1, 101-113.

[4] Charney, J. G., 1951: Dynamical forecasting by numerical process. Compendium of Meteorology. American Meteorological Society, Boston, MA, USA.

[5] Chin, T. M., Turmon, M. J., Jewell, J. B. and Ghil, M., 2005: An ensemble-based smoother with optimized weights for highly nonlinear systems, in print.

[6] Corazza, M., Kalnay, E., Patil, D. J., Yang, S. C., Morss, R., Cai, M., Szunyogh, I., Hunt, B. R. and Yorke, J. A., 2003: Use of the breeding technique to estimate the structure of the analysis “error of the day”. Nonlin. Processes Geophys. 10, 223-243.

[7] Courtier, P. and O. Talagrand, 1990: Variational assimilation of meteorological observations with the direct and adjoint shallow water equations. Tellus, 42A, 531-549.

[8] Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, Cambridge, UK.
[9] Ehrendorfer, M., 2002: The Liouville equation in atmospheric predictability. ECMWF Seminar Proceedings series 2002, Predictability of Weather and Climate.

[10] Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99 (C5), 10143-10162.

[11] Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon. Wea. Rev., 125, 1342-1354.

[12] Hénon, M., 1976: A two-dimensional mapping with a strange attractor. Comm. Math. Phys. 50, 69-77.

[13] Holton, J. R., An Introduction to Dynamic Meteorology. Academic Press, Orlando, USA.

[14] Ide, K., P. Courtier, M. Ghil and A. C. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. J. Meteor. Soc. Japan, 75, 181-189.

[15] Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, New York, NY, USA.

[16] Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, Cambridge, UK.

[17] Lewis, J. and J. Derber, 1985: The use of adjoint equations to solve a variational adjustment problem with advective constraint. Tellus, 37A, 309-322.

[18] Lorenz, E. N., 1963: Deterministic non-periodic flows. J. Atmos. Sci. 20, 130-141.

[19] Lorenz, E. N., 1993: The Essence of Chaos. University of Washington Press, Seattle, USA.

[20] Lorenz, E. N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, United Kingdom, ECMWF, 1-18.

[21] Lorenz, E. N. and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: simulation with a small model. J. Atmos. Sci. 55, 399-414.

[22] May, R. M., 1974: Stability and Complexity in Model Ecosystems. Princeton University Press, Princeton, NJ, USA.

[23] Maybeck, P. S., 1979: Stochastic Models: Estimation and Control, Volume 1. Academic Press, New York, NY, USA.
[24] Nicolis, C., 2003: Dynamics of Model Error: Some Generic Features. J. Atmos. Sci. 60, 2208-2218.

[25] Nicolis, C., 2004: Dynamics of Model Error: The Role of Unresolved Scales Revisited. J. Atmos. Sci. 61, 1740-1753.

[26] Ott, E., Sauer, T. and Yorke, J. A., 1994: Coping with Chaos: Analysis of Chaotic Data and the Exploitation of Chaotic Systems. John Wiley & Sons Inc., New York.

[27] Peixoto, J. P. and A. H. Oort, 1992: Physics of Climate. American Institute of Physics, New York, USA.

[28] Poincaré, J. H., 1997 (reprint): Scienza e Metodo. Giulio Einaudi editore SpA, Torino, Italia.

[29] Pratt, W. J., Raiffa, H. and Schlaifer, R., 1995: Introduction to Statistical Decision Theory. The MIT Press, Cambridge, MA, USA.

[30] Saltzman, B., 1962: Finite amplitude free convection as an initial value problem–I. J. Atmos. Sci. 19, 329-341.

[31] Strogatz, S. H., 1994: Nonlinear Dynamics and Chaos. Westview Press, Cambridge, USA.

[32] Trevisan, A. and F. Pancotti, 1998: Periodic orbits, Lyapunov vectors and singular vectors in the Lorenz system. J. Atmos. Sci. 55, 390-398.

[33] Trevisan, A. and F. Uboldi, 2004: Assimilation of Standard and Targeted Observations within the Unstable Subspace of the Observation-Analysis-Forecast Cycle System. J. Atmos. Sci. 61, 103-113.

[34] Uboldi, F., Trevisan, A. and Carrassi, A., 2005: Developing a dynamically based assimilation method for targeted and standard observations. Nonlin. Processes in Geophys. 12, 149-156.

[35] Uboldi, F. and A. Trevisan, 2006: Detecting unstable structures and controlling error growth by assimilation of standard and adaptive observations in a primitive equation ocean model. Nonlin. Processes in Geophys. 13, 67-81.

[36] Yang et al., 2006: Data Assimilation as Synchronization of Truth and Model: Experiments with the Three-Variable Lorenz System. J. Atmos. Sci. 63, 2340-2354.
