DIAGNOSTICS AND GENERALIZATIONS FOR PARAMETRIC STATE ESTIMATION

by

Grey Stephen Nearing

_____________________

Copyright © Grey Nearing 2013

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF HYDROLOGY AND WATER RESOURCES

In Partial Fulfillment of the Requirements For the Degree of

DOCTOR OF PHILOSOPHY WITH A MAJOR IN HYDROLOGY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2013

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Grey Nearing entitled Diagnostics and Generalizations for Parametric State Estimation and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy

____________________________________________________________ Date: 3/25/13
Hoshin V. Gupta

____________________________________________________________ Date: 3/25/13
C. Larry Winter

____________________________________________________________ Date: 3/25/13
Ty P.A. Ferré

____________________________________________________________ Date: 3/25/13
Wade T. Crow

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copies of the dissertation to the Graduate College. I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

____________________________________________________________ Date: 3/25/13
Dissertation Director: Hoshin V. Gupta

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that an accurate acknowledgement of the source is made.
Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.

SIGNED: Grey Nearing

ACKNOWLEDGEMENTS

Thanks to Hoshin and Susan for showing me that patience and respect are the most important ingredients for teaching, learning, and communicating.

DEDICATION

To my father, who gave me exactly what I needed when I needed it.

TABLE OF CONTENTS

ABSTRACT
CHAPTER 1: INTRODUCTION
    1.1. Review of Literature and Statement of Problems
    1.2. Philosophy and Organization of This Dissertation
    1.3. Statement of Author's Contribution
CHAPTER 2: PRESENT STUDIES
    2.1. Assimilating Remote Sensing Observations of Leaf Area Index and Soil Moisture for Wheat Yield Estimates: An Observing System Simulation Experiment
    2.2. An Approach to Quantifying the Efficiency of a Bayesian Filter
    2.3. Information Loss in Estimation of Agricultural Yield: A Comparison of Generative and Discriminative Approaches
    2.4. Measuring Information about Model Structure Introduced During System Identification
    2.5. Kalman Filtering with a Gaussian Process Observation Function
CHAPTER 3: DISCUSSION AND FUTURE WORK
REFERENCES
APPENDIX A: ASSIMILATING REMOTE SENSING OBSERVATIONS OF LEAF AREA INDEX AND SOIL MOISTURE FOR WHEAT YIELD ESTIMATES: AN OBSERVING SYSTEM SIMULATION EXPERIMENT
    Abstract
    1. Introduction
    2. Methods
        2.1. The DSSAT CropSim Ceres Wheat Model
        2.2. The Ensemble Kalman Filter
        2.3. The Sequential Importance Resampling Filter
        2.4. Observing System Simulation Experiments
            2.4.1. Modeling Uncertainty Distributions
            2.4.2. Generating Synthetic Observations
            2.4.3. Ensemble Size Experiments
            2.4.4. Modeling Uncertainty Experiments
            2.4.5. Observation Uncertainty Experiments
    3. Results
        3.1. Ensemble Size Experiments
        3.2. Modeling Uncertainty Experiments
        3.3. Observation Uncertainty Experiments
    4. Discussion
    Acknowledgement
    Tables
    Figures
    References
APPENDIX B: AN APPROACH TO QUANTIFYING THE EFFICIENCY OF A BAYESIAN FILTER
    Abstract
    1. Introduction
    2. Theory: Background and Proposed Metrics
        2.1. Bayes Filters
        2.2. Ensemble Kalman Filter
        2.3. Observing System Simulation Experiments
        2.4. Quantifying Observation Utility and Filter Efficiency
    3. Demonstration: An OSSE for Estimating Root-Zone Soil Moisture
        3.1. A 3-Layer Soil Moisture Model
            3.1.1. State Transition Function
            3.1.2. Simulation Period and Boundary Conditions
            3.1.3. Observation Function
        3.2. Ensemble Size
        3.3. Methods of OSSE Analysis
        3.4. Assessing the Effects of Simulator Uncertainty
        3.5. OSSE Results
    4. Summary and Discussion
    Acknowledgement
    Figures
    References
APPENDIX C: INFORMATION LOSS IN ESTIMATION OF AGRICULTURAL YIELD: A COMPARISON OF GENERATIVE AND DISCRIMINATIVE APPROACHES
    Abstract
    1. Introduction
    2. Methods
        2.1. A Crop Development Simulator
            2.1.1. The State Transition Function
            2.1.2. The Observation Function
            2.1.3. Simulation Period and Uncertainty Sampling
        2.2. Data Assimilation and the Ensemble Kalman Filter
        2.3. A Gaussian Process Regression Interpolator
        2.4. Observing System Simulation Experiments
        2.5. Measuring Information in Observations
        2.6. Measuring Information Loss
    3. Results
    4. Conclusions and Discussion
    Appendix A: The Crop Development Simulator State Transition Function
    Acknowledgement
    Tables
    Figures
    References
APPENDIX D: MEASURING INFORMATION ABOUT MODEL STRUCTURE INTRODUCED DURING SYSTEM IDENTIFICATION
    Abstract
    1. Introduction
    2. Methods
        2.1. Dynamic System Simulators
        2.2. An EM System Identification Algorithm
            2.2.1. E-Step: The Ensemble Kalman Smoother
            2.2.2. M-Step: Sparse Gaussian Process Regression
            2.2.3. Why We Cannot Use Filters
        2.3. Measuring Information in the Conceptual and Mathematical Models
            2.3.1. Information about the Model Extracted from Observations
            2.3.2. Information Contained in the Model about a Hydrologic Process
        2.4. An Application Experiment
            2.4.1. Leaf River Data and Simulation Period
            2.4.2. The HyMod Simulator
            2.4.3. Implementing the Learning Algorithm
    3. Results
    4. Discussion
    Appendix A: The HyMod Simulator
    Tables
    Figures
    References
APPENDIX E: KALMAN FILTERING WITH A GAUSSIAN PROCESS OBSERVATION FUNCTION
    Abstract
    1. Introduction
    2. Methods
        2.1. Overview of Data Assimilation
        2.2. GPR Observation Function
        2.3. gpEnKF Posterior Likelihood and Gradient
    3. Demonstrations
        3.1. Streamflow Forecasting
        3.2. Root-Zone Soil Moisture Estimation
        3.3. Lorenz 3-D
    4. Discussion
    Appendix: A Three-Layer Mahrt-Pan Soil Moisture Model
    Figures
    References

ABSTRACT

This dissertation comprises a collection of five distinct research projects that apply, evaluate, and extend common methods for land surface data assimilation. The introduction of novel diagnostics and extensions of existing algorithms is motivated by an example, related to estimating agricultural productivity, of a failed application of current methods. We subsequently develop methods, based on Shannon's theory of communication, to quantify the contributions from all possible factors to the residual uncertainty in state estimates after data assimilation, and to measure the amount of information contained in observations that is lost due to erroneous assumptions in the assimilation algorithm. Additionally, we discuss an appropriate interpretation of Shannon information which allows us to measure the amount of information contained in a model, and use this interpretation to measure the amount of information introduced during data assimilation-based system identification. Finally, we propose a generalization of the ensemble Kalman filter designed to relax one of its primary assumptions: that the observation function is linear.

CHAPTER 1: INTRODUCTION

1.1.
Review of Literature and Statement of Problems

Data assimilation is defined as the application of Bayes' law (Bayes and Price 1763) to the problem of estimating the states of a hidden Markov model (HMM) conditional on observations (Wikle and Berliner 2007), and it has been used extensively in earth science (Reichle 2008), including hydrology (Liu and Gupta 2007) and agronomy (Prevot et al. 2003). For practical purposes, most data assimilation systems are based on approximations of Bayes' law; both parametric (Evensen 2003; Kalman 1960) and nonparametric (Gordon et al. 1993) approximations have been proposed.

In many applications, practical data assimilation is sufficient for reducing uncertainty in, or increasing the accuracy of, model state estimates (e.g., Reichle et al. 2002; Vrugt et al. 2006). This is not always the case, however, and even though studies that demonstrate little value from assimilating observations are rarely published, there are a few (e.g., de Wit and van Diepen 2007); statistical acumen and reading between the lines (or luck) are generally necessary to identify this type of result in the published literature. Our contribution to this body of work is direct, and clearly shows that assimilating observations of crop color (in the form of leaf area index) and soil moisture is statistically ineffective at improving estimates of crop yield, due to the nature of the relationship between the states of crop development models and remote sensing observations (Nearing et al. 2012).

Several approximations of the HMM are inherent in common data assimilation algorithms, and although methods have been proposed for estimating the sensitivity of model states to remote sensing observations (e.g., Gelaro et al. 2007) and the amount of information contained in remote sensing observations (Rodgers 2000, p. 33), previous to our efforts (Nearing et al.
2013) no method had been developed to compare the actual amount of information contained in observations with the amount extracted by approximately Bayesian filters. Using an approach inspired by Gong et al. (2013), fundamentally based on the data processing inequality (Cover and Thomas 1991, p. 34), we use entropy as a surrogate for information and quantify the amount of entropy in the data assimilation posterior distribution that is due to approximations of Bayes' law (Nearing et al. 2013). The method is then extended to formally quantify information loss (Nearing and Gupta, in review).

Any state estimates obtained using data assimilation are conditional (usually implicitly) on the HMM structure. The process of identifying an appropriate HMM structure is called system identification, and can be viewed as a Bayesian learning problem, similar to data assimilation (Liu and Gupta 2007). System identification is defined as a two-step process of building dynamical models that are consistent with both physical knowledge about the system and observational data. The first step is conceptual structure identification, and involves selection of system boundaries and boundary conditions, system states, and important system processes. The second step is mathematical structure identification, and involves specifying appropriate mathematical representations of system processes (Bulygina and Gupta 2010). The most common form of mathematical structure identification is an expectation-maximization (Dempster et al. 1977) algorithm which iteratively (E-step) infers the distribution of model states conditional on observations and then (M-step) infers the maximum-likelihood values of parameters of the state transition function (Ghahramani and Roweis 1999); this method was used by Vrugt et al. (2005) and Bulygina and Gupta (2009, 2010, 2011) to identify the mathematical structure of rainfall-runoff models.
The parameters calibrated in the M-step can be parameters of a nonparametric regression, in which case this becomes a general method for identifying the mathematical structure of a dynamic system model (e.g., Bulygina and Gupta 2009; Damianou et al. 2011; Ghahramani and Roweis 1999; Turner et al. 2009; Wang et al. 2008).

Gong et al. (2013) hypothesized that a certain amount of information about system outputs is stored in data about system inputs, and interpreted the data processing inequality (DPI) to mean that models which map system inputs to outputs can only lose some of that information (they cannot add information about the outputs). In actuality, the DPI says that no stochastic model can perform as well as a model which exactly represents the true joint distribution between inputs and outputs. We therefore propose that even imperfect models contain information about any system property of interest (including about a model of the system; e.g., Ye et al. 2008), and that this information is about the mapping from stochastic measured data to the property of interest. In this context, information is measured as the divergence (Kullback and Leibler 1951) caused by conditioning the property of interest on measurements, and any model which approximates Bayesian conditioning contains information.

The most common algorithm used in land surface data assimilation, the ensemble Kalman filter (EnKF; Evensen 2003), has two major assumptions: (1) that at any given time, the state of the HMM is Gaussian-distributed, and (2) that the relationship between the state and observation of the HMM (called the observation function) is linear. These conditions result in a Gaussian posterior which can be computed analytically when the moments of the prior and of the observation error are known.
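As a concrete illustration of this analytic update, a standard perturbed-observation EnKF analysis step can be sketched as follows. This is a minimal NumPy sketch; the three-component state, surface-only observation operator, and noise levels are invented for illustration and are not taken from any of the appended studies.

```python
import numpy as np

def enkf_update(X, y_obs, H, R, rng):
    """Perturbed-observation EnKF analysis step.

    X : (n_state, n_ens) prior ensemble; y_obs : (n_obs,) observation;
    H : (n_obs, n_state) linear observation operator;
    R : (n_obs, n_obs) observation-error covariance.
    """
    n_obs, n_ens = H.shape[0], X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                       # sample prior covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    # Perturb the observation for each member so the analysis ensemble
    # is an i.i.d. sample of the (Gaussian) posterior.
    Y = y_obs[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, n_ens).T
    return X + K @ (Y - H @ X)

rng = np.random.default_rng(0)
X = rng.normal(0.30, 0.05, size=(3, 500))   # e.g., three soil-moisture layers
H = np.array([[1.0, 0.0, 0.0]])             # observe only the surface layer
R = np.array([[0.01 ** 2]])
Xa = enkf_update(X, np.array([0.25]), H, R, rng)
```

The analysis ensemble `Xa` has its surface-layer mean pulled toward the observation and reduced spread, which is exactly the behavior whose limits the diagnostics in this dissertation are designed to probe.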
The second moment of the prior is computed from a sample set called the ensemble, and the mean of the posterior is calculated by separately assuming that each ensemble sample represents the prior mean. This results in an independent and identically distributed sample of the posterior. A popular generalization of this method linearizes the observation function around the first moments of the HMM state and observation distributions and implements an iterative solution for the mean of the posterior (Zupanski 2005). Because linearization of the observation function is achieved by computing the covariance between samples of the HMM states and observations, it is impossible to estimate the gradient of the objective function locally at each ensemble member. This limitation precludes using nonparametric (kernel or copula) prior distributions, and therefore precludes a fully nonlinear ensemble filter. We propose to emulate the observation function via Gaussian process regression (GPR; Rasmussen and Williams 2006) and solve for the gradient of the observation with respect to the states locally at each HMM sample.

1.2. Philosophy and Organization of This Dissertation

This dissertation is a collection of five distinct reports, reproduced in Appendices A-E, with individual overviews given in Chapter 2. These reports outline my efforts to understand the state of the science of data assimilation and, more generally, the art of learning from data. This was very much a hands-on process, in which the bulk of the learning was facilitated by coding and testing existing and novel algorithms. Ultimately, the reports included in this dissertation represent a fraction of that work, and were chosen to highlight novel efforts toward understanding the interaction between dynamic system models and data.
Because I did not have a single over-arching research objective, the projects reported here are diverse and do not collectively address any single underlying deficiency in our understanding of the universe. Some of these projects are engineering projects, in that they represent the development of solutions to practical problems rather than investigations of natural phenomena, and others should be interpreted as philosophical discussion supported by demonstrations. There is very little evidence-based science in this dissertation.

1.3. Statement of the Author's Contribution

The majority of the novel philosophical and methodological contributions described in this dissertation, including the central ideas of four of the five studies, are due to the Author. The major exception is that the application study (Appendix A) constitutes work described by a NASA ROSES proposal which was drafted by a group of individuals, led by Dr. Wade T. Crow and including the Author. Additionally, inspiration for the data assimilation diagnostics (Appendix B) came from Dr. Wei Gong's creative application of the data processing inequality (Gong et al. 2013). All of the computer code used in each study was written by the Author, with the exception of the DSSAT crop model (Hoogenboom et al. 2008) used in the application study (Appendix A); the DSSAT model was extensively modified by the Author. All of the published and unpublished papers included in the appendices of this dissertation were written by the Author, with editing by the various listed co-authors (especially Dr. Hoshin Gupta).

CHAPTER 2: PRESENT STUDIES

The methods, results, and conclusions of the studies which constitute this dissertation are presented in the appendices. The following subsections summarize the key elements of each study.

2.1. Assimilating Remote Sensing Observations of Leaf Area Index and Soil Moisture for Wheat Yield Estimates: An Observing System Simulation Experiment (Nearing et al. 2012)

The paper in Appendix A was published in Water Resources Research in May 2012, and is a case study on the application of the ensemble Kalman filter and a nonparametric resampling filter to the problem of improving dynamic model estimates of seasonal crop yield. Like all studies presented in this dissertation, this is a synthetic study, which means that the observations which were assimilated were generated by the hidden Markov model itself. In this case the HMM was based on the Decision Support System for Agrotechnology Transfer (DSSAT) CropSim-Ceres model (Hoogenboom et al. 2008). The objective of this experiment was to determine which types of uncertainty could be mitigated by assimilating (1) observations of surface soil moisture, such as those expected to be available from the Soil Moisture Active Passive mission (Entekhabi et al. 2010), and (2) observations of leaf area index (LAI), such as those available from the Moderate Resolution Imaging Spectroradiometer (Knyazikhin et al. 1999). We found little statistical improvement to estimates of seasonal wheat yield produced by two rain-fed crops (energy-limited winter wheat and water-limited summer wheat) due to assimilating observations of either type. An analysis of the correlations between HMM states and observations revealed that the limitations were due to (i) a lack of correlation between plant development and soil moisture observations, even in the water-limited case, (ii) error in LAI observations, and (iii) a lack of correlation between leaf growth and grain growth.

Even though we were able to make qualitative statements (albeit based on quantitative evidence) about the reasons for poor data assimilation performance, we were not able to formally identify or quantify contributions to uncertainty in posterior state estimates. More generally, the existing literature contained no comprehensive discussion of the factors which contribute to uncertainty in data assimilation posteriors.

2.2.
An Approach to Quantifying the Efficiency of a Bayesian Filter (Nearing et al. 2013)

The paper in Appendix B was submitted to Water Resources Research in September 2012. This paper identifies three factors that collectively result in a non-Dirac posterior state distribution, and formally quantifies the contribution of each of these factors to posterior uncertainty in state estimates. These factors are: (1) non-injectivity of the mapping between states and observations, (2) noise in the observations, and (3) inefficiency in the data assimilation algorithm.

Uncertainty in the data assimilation posterior is defined as Shannon (1948) entropy, and the potential for an observation to reduce uncertainty is quantified by the mutual information between the observation and the state. Both the entropy and the mutual information can be estimated from Monte Carlo samples of the HMM, and the contributions of factors (1)-(3) to the entropy of the posterior are measured by estimating the information content of observations with and without error. Entropy and mutual information are estimated by maximum-likelihood methods after discretizing the HMM state and observation spaces. The primary limitation of this method is computational: it is impractical to estimate high-dimensional empirical probability density functions, and therefore impractical to estimate the entropy of, and information in, high-dimensional observations. This limits the applicability of the proposed method to estimating the contributions to posterior entropy from assimilating only one observation at a time. In the paper, contributions to posterior uncertainty from different sources are averaged over a period of time simulated by a stationary HMM.

The proposed procedure was demonstrated on the problem of estimating profile soil moisture from observations at the surface (top 5 cm).
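The discretization-based estimators described above can be sketched as follows. This is a hedged NumPy illustration of plug-in (maximum-likelihood) entropy and mutual information computed from binned samples; the Gaussian state, observation-noise level, bin count, and sample size are invented and are not those used in the paper.

```python
import numpy as np

def plugin_entropy(samples, bins):
    """Plug-in (maximum-likelihood) entropy estimate in nats.
    `samples` has shape (n, d); the space is discretized into `bins` bins."""
    counts, _ = np.histogramdd(samples, bins=bins)
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(x, y, bins):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), all estimated by discretization."""
    hx = plugin_entropy(x[:, None], bins)
    hy = plugin_entropy(y[:, None], bins)
    hxy = plugin_entropy(np.column_stack([x, y]), bins)
    return hx + hy - hxy

rng = np.random.default_rng(0)
state = rng.normal(size=50_000)                    # stand-in for an HMM state
obs = state + rng.normal(scale=0.5, size=50_000)   # a noisy observation of it
mi = mutual_information(state, obs, bins=30)       # potential uncertainty reduction
```

Because the joint histogram grows exponentially with dimension, this estimator is only practical for one observation at a time, which is exactly the computational limitation noted above.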
When synthetic observations of 5 cm soil moisture were assimilated into a three-layer model of soil hydrology, it was found that part of the entropy of the posterior estimates of soil moisture states was due to inefficiencies in the EnKF. This implies that the EnKF did not use all of the available information in the observations.

2.3. Information Loss in Estimation of Agricultural Yield: A Comparison of Generative and Discriminative Approaches (Nearing and Gupta, in review)

Although Nearing et al. (2013) provide a way to measure contributions to uncertainty in data assimilation posterior distributions, and give an example where not all of the information was used by the data assimilation algorithm, the approach does not actually measure information loss. Measuring uncertainty can be thought of as estimating the precision of probabilistic state estimates, whereas the amount of information in a probabilistic estimate is a measure related to both accuracy and precision. Intuitively, observations contain information about the HMM state, and it is important to know how much of this information is incorporated into the data assimilation state estimate. Some of the information contained in observations might be used by the data assimilation algorithm to inform the posterior distribution, and some of it might be lost. Additionally, there is potential for the data assimilation algorithm to introduce artifacts into the posterior estimate which are not informed by the observation. The paper in Appendix C formalizes these concepts by defining used information, lost information, and bad information. Used information and lost information sum to the total mutual information between observations and the HMM state, while lost information and bad information sum to the Kullback-Leibler divergence (Kullback and Leibler 1951) from the true Bayesian posterior to the data assimilation approximation of that posterior.
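For discrete distributions, the divergence that bounds lost plus bad information is straightforward to compute. The following toy sketch is illustrative only: the three-state distributions are invented, and the variable names are not taken from the paper.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy distributions over three discrete states.
prior = np.array([1 / 3, 1 / 3, 1 / 3])    # before seeing the observation
true_post = np.array([0.7, 0.2, 0.1])      # exact Bayesian posterior
approx_post = np.array([0.5, 0.3, 0.2])    # what an approximate filter returns

# Information carried by the observation about the state.
info_in_obs = kl_divergence(true_post, prior)
# Divergence from the true posterior to the filter's approximation,
# which the paper decomposes into lost plus bad information.
lost_plus_bad = kl_divergence(true_post, approx_post)
```

Separating `lost_plus_bad` into its two components requires the additional bookkeeping developed in Appendix C; the sketch only shows the two divergences named in the text.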
These concepts were demonstrated by comparing data assimilation and regression as two methods for estimating agricultural yield from remote sensing observations of leaf area index and soil moisture. Synthetic experiments were used to measure information loss and bad information in posterior estimates of end-of-season biomass made by the EnKF and GPR. It was found that GPR was generally as efficient as the EnKF at extracting information from observations. Since regression can be implemented independently of an HMM simulator while data assimilation cannot, Nearing and Gupta (in review) argue that GPR is generally preferable to the EnKF for estimating agricultural yield from remote sensing observations.

2.4. Measuring Information about Model Structure Introduced During System Identification (Nearing and Gupta in preparation)

Fundamentally, the data assimilation prior is facilitated by an HMM; in this sense, the data assimilation posterior is conditional on that HMM (Liu and Gupta 2007). System identification is an approach to defining the HMM structure which has been used many times in hydrology (Bulygina and Gupta 2009; Liu and Gupta 2007; Vrugt et al. 2005); technically, parameter estimation is a form of system identification (Roweis and Ghahramani 2000). Bulygina and Gupta (2010) defined system identification as a two-step process of "building dynamical models that are simultaneously consistent with physical knowledge about the system and with the information contained in observational data." The first step is conceptual structure identification, and involves a selection of system boundaries and boundary conditions, system states, and important system processes. The second step is mathematical structure identification, and involves specifying appropriate mathematical representations of system processes. Presumably, both the conceptual identification and mathematical identification steps introduce information about the system.
We propose to measure the amount of information introduced during each step. The first step in this process is to define what we mean by information, including what this information is about. Generally, we say that a model contains information about a particular phenomenon which arises from a dynamic system by defining a probability distribution over a random variable associated with that phenomenon. Technically, the act of assigning a model structure for this purpose should be a Bayesian exercise, and we should therefore start with a prior distribution over the outcomes of the phenomenon; this prior should then be conditioned by some approximation of Bayes' Law. The result is a distribution over the random variable associated with the phenomenon we wish to understand which is conditional on the model structure, and it is therefore possible to estimate the divergence from the conditional to the prior distribution. We demonstrate this conceptualization of the intuitive idea of information contained in a model on the problem of estimating the structure of a rainfall-runoff simulator using observations from the Leaf River watershed in Mississippi, USA. The conceptual identification step is accomplished by assigning a joint distribution between measured boundary conditions (precipitation and potential evapotranspiration) and streamflow using a stochastic implementation of the HyMod simulator (Boyle 2000). The mathematical identification step is accomplished using an expectation-maximization (EM; Dempster et al. 1977) approach to system identification (Ghahramani and Roweis 1999). The particular EM algorithm we employ uses data assimilation, in the form of the ensemble Kalman smoother (Evensen and van Leeuwen 2000), to calculate expected distributions over model states (the E-step), and then maximizes the likelihood of the values of parameters in a nonparametric regression (the M-step). The result of iterating this procedure is an identified state transition model.
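The divergence measurement described above can be sketched for sample-based distributions; the histogram support and smoothing below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def structure_information(prior_samples, conditional_samples, bins=30):
    """Information introduced by a model structure about a phenomenon
    (e.g., streamflow), measured as the KL divergence from the
    model-conditional distribution to the prior, both estimated by
    histograms over a common support."""
    lo = min(prior_samples.min(), conditional_samples.min())
    hi = max(prior_samples.max(), conditional_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(conditional_samples, bins=edges)
    q, _ = np.histogram(prior_samples, bins=edges)
    # Laplace smoothing so the prior assigns mass everywhere the
    # conditional does (the divergence is otherwise undefined).
    p = (p + 1) / (p.sum() + bins)
    q = (q + 1) / (q.sum() + bins)
    return np.sum(p * np.log2(p / q))  # bits
```

A structure that concentrates the distribution relative to the prior yields a large divergence; a structure whose conditional distribution matches the prior introduces essentially no information.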
A similar approach was employed by Bulygina and Gupta (2009, 2010, 2011) on the same problem we use for demonstration. The paper in Appendix D demonstrates EM mathematical structure identification and calculates the information introduced during the conceptual and mathematical identification portions of system identification.

2.5. Kalman Filtering with a Gaussian Process Observation Function (Nearing in preparation)

One widely recognized limitation of the EnKF is the assumption of a linear observation function. Given a Gaussian prior over model states, this assumption allows for an analytical estimate of the conjugate (Gaussian) posterior. The EnKF strategy for estimating the posterior is to estimate a set of first moments, each conditional on taking an individual independent and identically distributed (iid) sample of the prior as the prior mean; this results in many iid samples of the posterior. Since the mean of a Gaussian is also its maximum likelihood estimator, the linearity assumption can be alleviated simply by variational estimation of the mode of the posterior; in particular, this is done by iterative gradient-based maximization of the log-likelihood function. The primary requirement for variationally sampling the posterior is that the likelihood expression and its gradient must be known. Zupanski (2005) uses the covariance between a sample of the HMM state and observation as the gradient, and when this covariance is estimated as the sample covariance, it is effectively an estimate of the likelihood gradient taken at the ensemble mean. In other words, it is only possible to estimate the gradient at one point in the joint state/observation space. This precludes using a kernel density estimate as the filter prior.
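The variational alternative can be sketched as gradient ascent on the log posterior for a Gaussian prior N(x0, P) and a differentiable observation function h; this is an illustrative sketch (step size and iteration count are arbitrary), not the Zupanski (2005) algorithm:

```python
import numpy as np

def map_estimate(x0, P, z, R, h, h_grad, steps=200, lr=0.1):
    """Gradient-based estimate of the posterior mode for a Gaussian
    prior N(x0, P) and an observation z with Gaussian error covariance
    R, given an observation function h and its Jacobian h_grad.
    Maximizes log p(x|z) = -0.5 (x-x0)' P^-1 (x-x0)
                           -0.5 (z-h(x))' R^-1 (z-h(x)) + const."""
    P_inv, R_inv = np.linalg.inv(P), np.linalg.inv(R)
    x = x0.copy()
    for _ in range(steps):
        resid = z - h(x)
        grad = -P_inv @ (x - x0) + h_grad(x).T @ R_inv @ resid
        x = x + lr * grad
    return x
```

The gradient of h can be evaluated at any point where it is available (analytically or via a differentiable emulator), which is what removes the restriction to a single linearization point. In the linear-Gaussian case the iteration converges to the standard Kalman posterior mean.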
One way to get around the limitation imposed by using the state/observation covariance to estimate the likelihood gradient is to calculate the gradient of the observation function analytically; however, this must be done individually for each HMM. We instead propose to emulate the observation function by nonparametric regression; this leads to standard likelihood and gradient expressions which can be calculated at any point in the HMM state and observation space. In particular, this means that the gradient can be local to each ensemble member, which would allow variational estimates of the mode of the posterior given a sample-based kernel estimate of the prior. The paper in Appendix E gives the likelihood and gradient expressions for a filter posterior when the observation function is emulated by GPR. Three application examples that employ Gaussian priors are used to demonstrate that the theory is sound: emulating the observation function does not degrade filter performance. This paper has not been published because the observation function emulator has not been tested in conjunction with a nonparametric prior.

CHAPTER 3: DISCUSSION AND FUTURE WORK

The work contained in this dissertation can be thought of as serving three somewhat distinct purposes. On the one hand, we have introduced general methods which rigorously evaluate the performance of common data assimilation algorithms (Nearing and Gupta in review; Nearing et al. 2013), and which extend the applicability of these methods (Nearing and Gupta in preparation). On the other hand, most of our application examples are related, either directly or indirectly, to the problem of estimating agricultural productivity from satellite observations of the land surface, and we are able to make general statements about the feasibility of this endeavor. Finally, and perhaps most importantly, we have attempted to initiate a meaningful discussion about what information means from the perspective of a hydrologic modeler.
Related to the latter, we wish to emphasize that information must be about something that is well defined, and that both models and observations contain information. The novel methods proposed in Appendixes B-E are theoretically applicable to a diverse range of problems; each of these papers presents a new method. The methods presented in Appendixes B and C were inspired by reframing an old and under-studied problem in a new context. We hope and expect that this change in perspective will offer a foundation for further philosophical and practical developments. Certainly we expect to apply these types of evaluation metrics to both new and old practical state estimation problems. The paper in Appendix D is our first attempt to understand what it means for a model to contain information. We expect that this might prove to be the most important contribution of this dissertation, and have begun formal studies into the possibility of segregating information contained in models into good and bad information, similar to what was described in Appendix C. The application examples described in Appendixes A-C provide insight into the feasibility of using observations of soil moisture and leaf area index to improve seasonal estimates of agricultural productivity. The main lessons learned during these studies were (1) that data assimilation is inefficient at improving state estimates in agricultural simulation models, and (2) that simple regression is probably preferable to data assimilation for estimating yield. While we do not suggest that data assimilation should not be applied to the problem of estimating agricultural productivity, we do encourage caution and careful assessment of the goals and impacts of such a strategy. We feel that discriminative approaches are the more promising method, and are very interested to see interpolation applied to other forecasting problems which are typically addressed using data assimilation.

REFERENCES

Bayes, M., & Price, M.
(1763). An Essay towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S. Philosophical Transactions, 53, 370-418
Boyle, D.P. (2000). Multicriteria calibration of hydrologic models. Department of Hydrology and Water Resources, University of Arizona, Tucson, AZ
Bulygina, N., & Gupta, H. (2009). Estimating the uncertain mathematical structure of a water balance model via Bayesian data assimilation. Water Resources Research, 45
Bulygina, N., & Gupta, H. (2010). How Bayesian data assimilation can be used to estimate the mathematical structure of a model. Stochastic Environmental Research and Risk Assessment, 24, 925
Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a hydrological model via Bayesian data assimilation. Water Resources Research, 47
Cover, T.M., & Thomas, J.A. (1991). Elements of Information Theory. New York, NY: Wiley-Interscience
Damianou, A.C., Titsias, M.K., & Lawrence, N.D. (2011). Variational Gaussian process dynamical systems. In NIPS (pp. 2510-2518). Granada, Spain
de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and Forest Meteorology, 146, 38-56
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38
Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N., Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R., Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi, J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716
Evensen, G. (2003).
The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128, 1852-1867
Gelaro, R., Zhu, Y., & Errico, R.M. (2007). Examination of various-order adjoint-based approximations of observation impact. Meteorologische Zeitschrift, 16, 685-692, doi:10.1127/0941-2948/2007/0248
Ghahramani, Z., & Roweis, S.T. (1999). Learning nonlinear dynamical systems using an EM algorithm. Advances in Neural Information Processing Systems, 431-437
Gong, W., Gupta, H.V., Yang, D., Sricharan, K., & Hero, A.O. (2013). Estimating epistemic & aleatory uncertainties during hydrologic modeling: An information theoretic approach. Water Resources Research
Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing, 140, 107-113, doi:10.1049/ip-f-2.1993.0015
Hoogenboom, G., Jones, J.W., Wilkens, P.W., Porter, C.H., Hunt, L.A., Boote, K.L., Singh, U., Uryasev, O., Lizaso, J., Gijsman, A.J., White, J.W., Batchelor, W.D., & Tsuji, G.Y. (2008). Decision Support System for Agrotechnology Transfer. Honolulu, HI: University of Hawaii
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering, 82, 35-45, doi:10.1115/1.3662552
Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y., Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999). MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis Document
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency.
The Annals of Mathematical Statistics, 22, 79-86, doi:10.2307/2236703
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401, doi:10.1029/2006WR005756
Nearing, G.S. (in preparation). Kalman filtering with a Gaussian process observation function
Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V. (2012). Assimilating remote sensing observations of leaf area index and soil moisture for wheat yield estimates: An observing system simulation experiment. Water Resources Research, 48
Nearing, G.S., & Gupta, H.V. (in preparation). Measuring information about model structure introduced during system identification
Nearing, G.S., & Gupta, H.V. (in review). Information loss in generative and discriminative approaches to estimating yield
Nearing, G.S., Gupta, H.V., Crow, W.T., & Gong, W. (2013). An approach to quantifying the efficiency of a Bayesian filter. Water Resources Research
Prevot, L., Chauki, H., Troufleau, D., Weiss, M., Baret, F., & Brisson, N. (2003). Assimilating optical and radar data into the STICS crop model for wheat. Agronomie, 23, 297-303
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press
Reichle, R.H. (2008). Data assimilation methods in the Earth sciences. Advances in Water Resources, 31, 1411-1418
Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2002). Hydrologic data assimilation with the ensemble Kalman filter. Monthly Weather Review, 130, 103-114, doi:10.1175/1520-0493(2002)130<0103:HDAWTE>2.0.CO;2
Rodgers, C.D. (2000). Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific
Roweis, S., & Ghahramani, Z. (2000). An EM algorithm for identification of nonlinear dynamical systems
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423
Turner, R., Deisenroth, M., & Rasmussen, C.
(2009). System identification in Gaussian process dynamical systems. In D. Görür (Ed.), Nonparametric Bayes Workshop at NIPS. Whistler, Canada
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005). Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41, 17
Vrugt, J.A., Gupta, H.V., & Nuallain, B.O. (2006). Real-time data assimilation for operational ensemble streamflow forecasting. Journal of Hydrometeorology, 7, 548-565, doi:10.1175/JHM504.1
Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 283-298
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
Ye, M., Meyer, P.D., & Neuman, S.P. (2008). On model selection criteria in multimodel analysis. Water Resources Research, 44, W03428
Zupanski, M. (2005). Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133, 1710-1726

APPENDIX A: ASSIMILATING REMOTE SENSING OBSERVATIONS OF LEAF AREA INDEX AND SOIL MOISTURE FOR WHEAT YIELD ESTIMATES: AN OBSERVING SYSTEM SIMULATION EXPERIMENT

1 Grey S. Nearing, 2 Wade T. Crow, 3 Kelly R. Thorp, 4 M. Susan Moran, 5 Rolf H. Reichle, and 1 Hoshin V. Gupta
1 University of Arizona, Department of Hydrology and Water Resources, Tucson, AZ
2 USDA-ARS Hydrology and Remote Sensing Laboratory, Beltsville, MD
3 USDA-ARS Arid-Land Agricultural Research Center, Maricopa, AZ
4 USDA-ARS Southwest Watershed Research Center, Tucson, AZ
5 NASA Goddard Space Flight Center, Greenbelt, MD

Article published in Water Resources Research (2012), 48, W05525.
Abstract

Observing system simulation experiments were used to investigate ensemble Bayesian state-updating data assimilation of observations of leaf area index (LAI) and soil moisture (SW) for the purpose of improving single-season wheat yield estimates with the DSSAT CropSim-Ceres model. Assimilation was conducted in an energy-limited environment and a water-limited environment. Modelling uncertainty was prescribed to weather inputs, soil parameters and initial conditions, and cultivar parameters, and through perturbations to model state transition equations. The ensemble Kalman filter (EnKF) and the sequential importance resampling filter (SIRF) were tested for their ability to attenuate the effects of these types of uncertainty on yield estimates. LAI and SW observations were synthesized according to the characteristics of existing remote sensing data, and the effects of observation error were tested. Results indicate that the potential for assimilation to improve end-of-season yield estimates is low. Limitations are due to a lack of root-zone soil moisture information, error in LAI observations, and a lack of correlation between leaf and grain growth.

1. Introduction

Dynamic crop models, such as the Decision Support System for Agrotechnology Transfer crop simulation model (DSSAT; Hoogenboom et al. 2004), are used to aid decision making under uncertainty (Jones et al. 2003). For instance, DSSAT is used by the insurance industry to predict regional crop yields on a seasonal basis. Crop simulation models have an advantage over empirical models of agricultural productivity in that they can react dynamically to changes in local conditions in a physically and biologically meaningful way. However, because of uncertainties in model representations of real-world systems and because of uncertainties inherent in input data regarding soils, cultivar genetics and weather, any model-based estimate of agricultural yield will be subject to error.
One approach to mitigating this type of error is to constrain model simulations using remote sensing observations through a process of data assimilation (Liu and Gupta 2007). Remote sensing measurements related to agriculture generally contain information about weather, vegetation or soil. Information about weather is used to force crop simulations directly. Remotely sensed information about vegetation often comes in the form of a leaf area index (LAI; e.g., Knyazikhin et al. 1999), which is a crop model component related to canopy cover. Similarly, soil moisture is a model state variable that acts as the primary control on plant water stress, and observations of volumetric moisture content in the top few centimetres of soil (SW) are available from remote sensing sources: AMSR-E (Njoku et al. 2003), SMOS (Kerr et al. 2010), and SMAP (Entekhabi et al. 2010). Together, LAI and SW observations provide complementary information for agricultural monitoring. There are many types of data assimilation which are common in agronomy (Moulin et al. 1998; Prevot et al. 2003). This work investigates the potential for ensemble Bayesian state-updating filters (McLaughlin 2002) to mitigate the effects of modelling uncertainty on end-of-season wheat yield estimates. Conceptually, ensemble Bayesian filters operate on the principle that a probability density function (pdf) representing uncertainty in model states can be approximated by a discrete set of model simulations, and that a pdf of model predictions can be estimated using Monte Carlo integration to marginalize uncertainty in model states. From a Bayesian perspective, the physical model provides context (a prior and likelihood) for interpreting information contained in remotely sensed data. Currently, a robust understanding of the response of physically-based model estimates of agricultural yield to state-updating assimilation remains lacking.
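The Monte Carlo marginalization described above amounts to pushing an ensemble of states through a prediction function and summarizing the resulting sample; a minimal generic sketch (function names are illustrative, not from the study):

```python
import numpy as np

def prediction_pdf_moments(state_ensemble, predict_fn):
    """Monte Carlo marginalization of state uncertainty: push each
    ensemble state through the model's prediction function and
    summarize the resulting sample of predictions by its first two
    moments (any other summary of the prediction pdf works the same way)."""
    preds = np.array([predict_fn(x) for x in state_ensemble])
    return preds.mean(), preds.var(ddof=1)
```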
The first step in this process is to perform a controlled synthetic data study, also called an Observing System Simulation Experiment (OSSE; Arnold and Dey 1986), which allows for an analysis of interactions between uncertainty, observations and the model. Although both Pauwels et al. (2007) and Pellenq and Boulet (2004) present OSSEs which investigate the assimilation of LAI and/or SW into crop simulation models, these studies assess the effects of assimilation on model states; neither investigates the impact of data assimilation on yield estimates. de Wit and van Diepen (2007) present a case study on the effects of assimilating SW observations on yield estimates; however, this does not provide sufficient statistical and methodological control to differentiate limitations imposed by the model, the assimilation algorithm, and uncertainty in model inputs and observations. We present a set of OSSEs which assess LAI and SW assimilation for improving DSSAT CropSim-Ceres wheat yield estimates in a controlled synthetic environment. This allows for an understanding of model response to state updating and a delineation of the effects of modelling uncertainty, filter error, and observation error. This investigation provides a benchmark for interpreting the results of case studies (like de Wit and van Diepen (2007)) and a foundation on which to direct the development of agricultural models and remote sensing algorithms aimed at predicting yield.

2. Methods

Several experiments are presented. First, modelling uncertainty was partitioned into isolated sources: weather inputs, soil parameters and initial conditions, cultivar parameters, and model state equations. Synthesized remote sensing observations were assimilated using the ensemble Kalman filter (EnKF; Evensen 2003) and the sequential importance resampling filter (SIRF; Gordon et al. 1993), and mean yield predictions from these filters were assessed.
In addition, observations with variable error characteristics were assimilated to test the effects of observation error on EnKF and SIRF results. Sections 2.1-2.4 describe the model, the data assimilation filters, and the sets of numerical experiments.

2.1. The DSSAT CropSim-Ceres Wheat Model

DSSAT is a collection of independent crop-growth modules supported by a land process wrapper. Integration takes place on a daily time step, and the forcing data required are daily maximum and minimum temperature, daily integrated solar radiation and daily cumulative precipitation. DSSAT soil layering is user defined; we used nine layers, with one surface layer representing the top 0-5 cm of soil typically assumed to be visible to L-band wavelength satellites and a set of lower layers reaching a total depth of 1.8 m. DSSAT soil moisture is calculated using a Ritchie-type soil water balance (Ritchie 1998), which employs a curve number approach to partitioning runoff and updates the water content of each soil layer based on a set of linear drainage equations. The soil surface parameters are: a runoff curve number, an upper limit on evaporation, a drainage rate parameter, and albedo. Soil layers are parameterized by saturated water content (porosity), drained upper limit (field capacity), lower limit, saturated hydraulic conductivity, and a root growth factor. Similar to Mo et al. (2005), we used the soil water balance routines but did not simulate the soil nitrogen balance or any management decisions. This was done because it is impossible to presume that information about these aspects of agricultural development would be available at remote sensing scales. The CropSim-Ceres module (CC) simulates wheat crops. CC models yield as a function of a Grain Weight state.
Grain growth is developmentally dependent on daily development units (which are a function of mean daily temperature and daily cumulative solar radiation), a temperature control factor, vegetation biomass (defined as the sum of the mass storage model states Stem Weight, Leaf Weight, and Reserves Weight), and model parameters. The most important crop model parameters are related to the cultivar: vernalizing duration, which specifies the number of days of optimum temperature necessary for vernalization; photoperiod response, which specifies the percent reduction in photosynthesis for every ten-hour reduction in photoperiod; grain filling phase duration in growing-degree-days [°C days]; number of kernels per unit plant weight [#/g]; the standard kernel size [mg]; the standard tiller weight [g]; and the phyllochron interval between leaf tip appearances. In contrast to Grain Weight, LAI is a function of the model state Plant Leaf Area, which is developmentally dependent on a temperature control factor and has a potential value set by the number of plant leaves, which is in turn determined at each time step by the cumulative sum of daily development units. Again in contrast to grain growth, potential daily leaf growth is attenuated by an additive factor proportional to water stress, so that a stress factor of 0 indicates potential growth and a stress factor of 1 indicates no growth. Potential grain growth is not modified in this way; the other components of biomass are affected indirectly by stress through leaf assimilation of plant carbon reserves. Water stress is one minus the ratio of total root water uptake to potential transpiration, which is a fraction of potential evapotranspiration calculated according to the Priestley-Taylor method (Priestley and Taylor 1972). Root water uptake from each layer is a function of the difference between the soil moisture state and the lower limit parameter.
Thus, when sufficient soil moisture is available to supply transpiration demand, water stress is zero. Given the way the model develops the vegetation and grain components of the wheat plant, we know that LAI and SW will inform yield through their effect on vegetation biomass.

The model state vector contains all of the internal dynamic model variables necessary to transition the simulation from one time step to the next; that is, all of the Markov information. More specifically, at a given time t, the state x_t is a function of the state at the previous time, x_{t-1}, the forcing data at the current time, u_t, and the time-invariant model parameters, θ, according to the state transition relationship f:

x_t = f(x_{t-1}, u_t, θ) [1.1]

The combined land process wrapper and CC Markov state vector has 97 components (Tables 1 and 2). The model output vector at time t, y_t (here we use the term output to refer to model predictions which correspond directly with observations), is calculated according to the relationship h as a function of the current state, current inputs and parameters:

y_t = h(x_t, u_t, θ) [1.2]

For our purposes, the output vector contained SW (soil moisture in each of the 9 soil layers) and LAI; the SW components are also state variables, so h simply preserves these values through identity relationships. LAI is not a state variable because its value is calculated independently at each time step as a function of the state Plant Leaf Area.

2.2. The Ensemble Kalman Filter

The EnKF is commonly used for state-updating in moderately nonlinear geophysical models (Reichle 2008). It estimates the model state pdf by drawing samples from a joint uncertainty pdf over model parameters, forcing data, and state perturbations, and then propagates this sample through time using the model equations. This set of model simulations is the ensemble.
At every observation time, the EnKF updates the state pdf based on the assumptions that all model states are linearly related to model output and that uncertainty in model states, model output, and observations can be quantified by second-order pdf approximations. Because the method has been widely discussed, we only present a brief overview and follow a variation on the formulation of Houtekamer and Mitchell (2001).

The ensemble of model state predictions at time t is stored in X_t, which has size [n_x × n_e], where n_x is the dimension of x_t and n_e is the ensemble size. Similarly, the ensemble of model outputs is Y_t, which has dimensions [n_y × n_e], where n_y is the dimension of the observation vector z_t. The observation error covariance R [n_y × n_y] is required a priori, and an observation sample Z_t is generated according to:

Z_t = z_t + ε, ε ~ N(0, R) [2]

The ensemble of EnKF-updated model states, X̂_t, is calculated as a least-squares estimate based on model predictions and observations, resulting in:

X̂_t = X_t + C_xy (C_yy + R)^{-1} (Z_t - Y_t) [3]

where:

C_xy = cov(X_t, Y_t) [3.1]

is the cross-covariance between ensemble deviations from the mean model state and deviations from the mean output, and:

C_yy = cov(Y_t, Y_t) [3.2]

is the covariance matrix of ensemble deviations from the mean model output. Both C_xy and C_yy are sampled directly from the ensemble.

The finite nature of ensemble representations of uncertainty can lead to spurious updates when X_t contains components that are not approximately or locally linearly related to components of Y_t. An analysis of DSSAT state and observation correlations resulted in a list of important Markov state components which have local approximately linear relationships with one or more outputs (Table 3, except stage timing states). This list includes all CC plant mass storage components, plant leaf area, and root volume (root accumulation is stored as a volumetric fraction rather than as a mass), as well as canopy height. Our EnKF employed a threshold filter which discarded any relationship between model states and modeled observation components with a Pearson product-moment correlation coefficient below a specified threshold; this reduced the possibility of picking up spurious correlations.
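The EnKF analysis step in equations [2]-[3.2] translates directly into a few lines of numpy; the following is an illustrative stochastic-EnKF sketch with perturbed observations, not the study's code:

```python
import numpy as np

def enkf_update(X, Y, z, R, rng):
    """EnKF analysis step: perturb the observation (eq. [2]), form
    sample covariances from ensemble deviations (eqs. [3.1]-[3.2]),
    and apply the least-squares update (eq. [3]) to every member.
    X: states (n_x, n_e); Y: outputs (n_y, n_e); z: observation (n_y,)."""
    n_e = X.shape[1]
    Z = z[:, None] + rng.multivariate_normal(
        np.zeros(len(z)), R, size=n_e).T                # eq. [2]
    Xd = X - X.mean(axis=1, keepdims=True)              # state deviations
    Yd = Y - Y.mean(axis=1, keepdims=True)              # output deviations
    C_xy = Xd @ Yd.T / (n_e - 1)                        # eq. [3.1]
    C_yy = Yd @ Yd.T / (n_e - 1)                        # eq. [3.2]
    K = C_xy @ np.linalg.inv(C_yy + R)                  # Kalman gain
    return X + K @ (Z - Y)                              # eq. [3]
```

In the linear-Gaussian, large-ensemble limit this reproduces the standard Kalman posterior mean and covariance.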
This reduced the possibility of picking up spurious 42 It is important to note that the CC is a set of step functions which calculate crop attributes in fundamentally different ways depending on the current stage of development. Because of these and other nonlinearities, the EnKF will not guarantee mutually consistent model states after each update since it uses a single correlation relationship to update every ensemble member regardless of the current growth stage of each particular simulation. 2.3. The Sequential Importance Resampling Filter The SIRF provides an approximate Bayesian estimate of model state uncertainty at each time step conditional on past observations without assumptions of linearity and second order statistics. At each observation time, each ensemble member's state vector is assigned an importance weight, , which is proportional to the posterior likelihood of that state vector conditional on all past observations: [4] superscripts index the ensemble member. The observations are assumed to be independent conditional on the model output, and model output is a deterministic function of model state according to equation [1.2] so that the likelihood function relates observations to model state vectors according to: [4.1] is emulated by the observation uncertainty pdf, in this case Gaussian with mean and covariance . The state prior, , is estimated iid discretely by by the ensemble in the same way as the EnKF; this is achieved for time step t+1 by 43 resampling the ensemble at time step t with replacement and with probabilities proportional to . resulting in an iid discrete representation of the posterior, , as calculated from by equation [1.1], thus contains an iid discrete representation of the prior at time step t+1, . 
Proportional probability weights are simply calculated as:

w_t^(i) = p(y_t | z_t^(i)) / Σ_j p(y_t | z_t^(j))   [4.2]

In our case, the simulation ensemble was updated by replacing all 97 components of each member's state vector (Tables 1 and 2) with a state vector from a different simulation (sampling with replacement). Each model retained its own parameters and forcing data.

2.4. Observing System Simulation Experiments

OSSEs, as diagrammed in Figure 1, were used to assess LAI and soil moisture assimilation potential. A group of N+1 simulations (ensemble size is discussed in section 2.4.3) was sampled from a modelling uncertainty pdf. One of these simulations was chosen randomly as the truth system (upper path in Figure 1), leaving an N-member prediction ensemble which was used to estimate yield with and without data assimilation. Synthetic observations were generated by sampling from an observation uncertainty distribution around the truth system output and assimilated by the EnKF and SIRF (middle paths in Figure 1). The ensemble of model simulations without data assimilation was called the open loop (bottom path in Figure 1). The open loop, EnKF, and SIRF all used the same truth system and ensemble members (parameters, initial conditions, and weather forcing data), and the EnKF and SIRF used the same synthetic observations. This type of OSSE was used to (i) choose an appropriate ensemble size, (ii) test the effects of EnKF and SIRF assimilation on segregated and combined modelling uncertainty sources, and (iii) test the effects of observation uncertainty on EnKF and SIRF assimilation. Experiments (ii) and (iii) used the ensemble size chosen by (i). In all cases except when determining ensemble size, each OSSE was repeated fifty times by drawing separate truth systems and ensembles; this Monte Carlo repetition provided a statistically independent experiment sample.
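The OSSE structure described above can be sketched as a driver loop. All of the callables here (`sample_member`, `simulate`, `make_obs`, and the entries of `assimilators`) are hypothetical placeholders standing in for the real DSSAT sampling, integration, observation, and filtering machinery; only the held-out-truth structure follows the text.

```python
import numpy as np

def run_osse(sample_member, simulate, make_obs, assimilators, N=100, rng=None):
    """One OSSE replicate: draw N+1 members from the modelling-uncertainty
    pdf, hold one out as the truth system, generate synthetic observations
    from its output, and run the open loop plus each filter on the rest."""
    rng = np.random.default_rng() if rng is None else rng
    members = [sample_member(rng) for _ in range(N + 1)]
    truth = members.pop(rng.integers(N + 1))           # random truth system
    truth_out = simulate(truth)                        # true output trajectory
    obs = make_obs(truth_out, rng)                     # synthetic observations
    results = {'open_loop': [simulate(m) for m in members]}  # no assimilation
    for name, assimilate in assimilators.items():      # e.g. 'EnKF', 'SIRF'
        results[name] = assimilate(members, obs, rng)  # same members, same obs
    return truth_out, results
```

A Monte Carlo experiment in the study's sense would call this fifty times with independent draws, which is what makes the open-loop/filter comparison statistically independent across repetitions.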
The following subsections describe the modelling uncertainty pdf, the procedure for generating synthetic observations from truth system output, and these sets of experiments.

2.4.1. Modelling Uncertainty Distributions

Assimilation was tested on two rain-fed wheat crops with different levels of water stress. Mean parameter and weather inputs came from field experiment data which are packaged with the DSSAT version 4.5 release: a 1975 study of a summer wheat crop conducted in Swift Current, Saskatchewan, Canada, reported by Campbell et al. (1977a; 1977b), and a 1974-1975 study on winter wheat conducted in Rothamsted, UK. Figure 2 plots LAI, Grain Weight, and water stress for both crops simulated using the mean parameters outlined in Table 4, columns 2 and 3, and unperturbed weather forcing data.

The Swift Current summer wheat crop represents a water-limited system and yielded 104 [kg/ha] using the mean parameters listed in Table 4, column 2; the potential, non-stressed yield was 4266 [kg/ha]. The mean parameter and forcing system received 153.6 [mm] of rainfall over a total of 95 days from planting on 25 May 1975 to maturity on 28 Aug 1975 and had a total evapotranspiration of 151.9 [mm]. The water stress factor reached its maximum during the Ear Growth stage, which occurred between day 51 and day 61 after planting. Although this crop produced very little yield using the mean parameter and input values, it was often the case that simulations sampled from the parameter and input uncertainty distribution produced a substantial increase in yield.

The Rothamsted winter wheat crop represents an energy-limited system and reached its potential yield of 6651 [kg/ha] using the mean parameters listed in Table 4, column 3. The system received 512.9 [mm] of rainfall over a total of 269 days from planting on 6 Nov 1974 to maturity on 6 Aug 1975 and had a total evapotranspiration of 381.2 [mm].
The water stress factor reached its maximum during the Grain Filling stage, which occurred between day 240 and harvest; the water stress factor was close to 0 during all other development stages. Although this crop produced potential yield using the mean parameter and input values, it was often the case that simulations sampled from the parameter and input uncertainty distribution caused a substantial decrease in yield.

Weather forcing data uncertainty was emulated by perturbing daily measured weather data with values sampled from the temporally and cross-correlated joint pdf outlined in Table 5. Perturbations on solar radiation and precipitation were multiplicative and lognormally distributed with mean 1 and standard deviations 0.3 and 0.5, respectively; perturbations on temperature were additive Gaussian with mean 0 and unit variance; the same daily perturbation was applied to daily maximum and minimum temperature. Weather perturbations were cross-correlated and AR(1) (first-order autoregressive) temporally auto-correlated with correlation coefficients 1/e, following Reichle et al. (2007; 2010); since integration was on a daily time step, these autoregression coefficients apply to a daily time series.

Parameter uncertainty distributions were Gaussian (approximately, due to bounds) with means, variances, and bounds listed in Table 4. Cultivar parameters are model-specific; parameter files included with the DSSAT version 4.5 release provided the limits and variances listed in Table 4. Variances and bounds for surface soil parameters were also estimated using a library of soil parametrizations included with DSSAT version 4.5. We used lumped parameters for the bottom 8 soil layers due to a presumed lack of knowledge about subsurface soil properties. The root growth factor was assumed to decrease exponentially with depth and was parametrized by a maximum value at the surface.
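The weather perturbation scheme described above (cross-correlated, AR(1) in time, with lognormal multiplicative factors for radiation and precipitation and additive Gaussian offsets for temperature) can be sketched as follows. The cross-correlation matrix and marginal standard deviations are taken from Table 5; the Cholesky construction and function name are ours.

```python
import numpy as np

def weather_perturbations(T, rng=None):
    """Daily perturbation series for (temperature, radiation, precipitation):
    cross-correlated, AR(1) with coefficient 1/e (Table 5). Returns additive
    temperature offsets and multiplicative radiation/precipitation factors."""
    rng = np.random.default_rng() if rng is None else rng
    C = np.array([[ 1.00, -0.80, -0.32],   # temp / rad / precip correlations
                  [-0.80,  1.00,  0.40],
                  [-0.32,  0.40,  1.00]])
    L = np.linalg.cholesky(C)
    rho = 1.0 / np.e                       # AR(1) coefficient, daily series
    z = np.empty((T, 3))
    z[0] = L @ rng.standard_normal(3)
    for t in range(1, T):
        # Variance-preserving AR(1) recursion with cross-correlated innovations.
        z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho**2) * (L @ rng.standard_normal(3))
    temp = z[:, 0]                         # additive, N(0, 1) [deg C]

    def lognormal_factor(zcol, s):
        # Multiplicative factor with mean 1 and standard deviation s.
        sig2 = np.log(1.0 + s**2)
        return np.exp(-0.5 * sig2 + np.sqrt(sig2) * zcol)

    rad = lognormal_factor(z[:, 1], 0.3)
    precip = lognormal_factor(z[:, 2], 0.5)
    return temp, rad, precip
```

The lognormal parameterization (mu = -sigma^2/2 with sigma^2 = ln(1 + s^2)) guarantees that the multiplicative factors have mean exactly 1, so the perturbations are unbiased in expectation.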
Porosity, saturated conductivity, and residual saturation were calculated from clay and silt percentages using pedotransfer functions from Cosby et al. (1984). Bulk density was calculated as a function of porosity assuming a mineral density of 2.65 [g/cm3], and the drained upper limit was taken as the average of the saturated and residual moisture contents. Global soils maps that provide clay and silt contents are not usually associated with useful error estimates, because much of the error in soil mapping is due to sparse measurements of heterogeneous areas. Here we sampled sand and clay percentage parameters independently, each with a standard deviation of 10%; sand and clay percentages were constrained to be positive, and their sum was constrained to be less than one by, when necessary, reducing both parameters by an equal amount. Given the mean sand and clay parameters from Table 4, this resulted in 95% confidence bounds which spanned approximately 18% of the soil textural triangle at Swift Current and 25% of the soil textural triangle at Rothamsted.

Model structural uncertainty was simulated by adding noise to the model state transition equations:

x_t+1 = f(x_t, u_t) + ε_t   [5]

This type of model error was added to those states listed in Table 3 except for Grain Weight and, in some cases where specified, to the development unit stage timing states Cumulative Development Units and Cumulative Germination Units. It was found that perturbing the Grain Weight state caused irreconcilable yield error by weakening statistical relationships between observations and yield. Random state perturbations were drawn from zero-mean Gaussian distributions with heteroscedastic variances proportional to the current state values:

ε_t ~ N(0, (α x_t)^2 I_n)   [5.1]

where I_n is the n-dimensional identity matrix. This ensured that no state would become finite purely because of the perturbation, and threshold filters were used to ensure that all state values remained non-negative. Perturbations were sampled independently across states and time steps.
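The state-proportional perturbation of equation [5.1], together with the non-negativity threshold filter, can be sketched in a few lines. The proportionality constant `frac` is an assumed illustrative value, not a value from the study.

```python
import numpy as np

def perturb_states(x, frac=0.05, rng=None):
    """Additive zero-mean Gaussian state noise with standard deviation
    proportional to the current state value (cf. equation [5.1]); `frac`
    is an assumed proportionality constant. A state that is exactly zero
    receives zero noise, and a threshold keeps all states non-negative."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(x.shape) * (frac * x)   # heteroscedastic noise
    return np.maximum(x + noise, 0.0)                   # non-negativity filter
```

Because the noise standard deviation scales with the state itself, a zero-valued state stays zero, which is exactly the "no state becomes finite purely because of the perturbation" property noted in the text.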
2.4.2. Generating Synthetic Observations

The truth system output vector was used to generate synthetic observations. At every observation time, a remote sensing measurement process was simulated by drawing from the Gaussian distribution:

y_t ~ N(z_t^true, R)   [6]

where R contains the observation error covariance matrices related to the LAI and soil moisture (θ) observations, respectively. Synthetic observations have no spatial scale. Frequency and error properties of synthetic observations were guided by uncertainty in existing remote sensing data.

Observations of θ are available from SMOS at most major agricultural areas every 3 days with a spatial resolution of 50 km and an approximate retrieval accuracy requirement (Kerr et al. 2010). An improvement in spatial resolution (to 9 km) is expected with the launch of SMAP in 2014 (Entekhabi et al. 2010). Measurement accuracy will degrade as vegetation water content, W [kg/m2], increases throughout the growing season; in the case of SMAP, observations at or better than the mission accuracy level are expected to be confined to areas with vegetation water content less than 5 [kg/m2] (Entekhabi et al. 2010). Jackson and Schmugge (1991) developed a relationship between vegetation water content and vegetation transmissivity, as proposed by Kirdiashev et al. (1979), which Bolten et al. (2010) adapted to model soil moisture observation uncertainty as:

σ_θ,t² = σ_bare² exp(2 b W_t / cos φ)   [6.1]

where σ_θ,t² is the variance of the soil moisture retrieval at time t made over vegetation, σ_bare² is the variance of estimates made over bare soil, b [-] is an environment parameter which accounts for vegetation type and roughness, W_t is the vegetation water content, and φ is the satellite incidence angle. The SMAP incidence angle is 40°, and we adopted a b value for agricultural crops from Crow et al. (2005), along with a bare soil observation error standard deviation chosen so that the retrieval accuracy model degrades smoothly as vegetation water content approaches 5 [kg/m2].
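One plausible reading of equation [6.1] can be coded as below, assuming the Kirdiashev et al. (1979) transmissivity γ = exp(-b W / cos φ) and a retrieval standard deviation that scales with 1/γ. The defaults for `sigma_bare` and `b` are illustrative assumptions, not values confirmed by the study; only the 40° incidence angle is taken from the text.

```python
import numpy as np

def theta_obs_std(W, sigma_bare=0.03, b=0.1, phi_deg=40.0):
    """Soil moisture retrieval error standard deviation as vegetation water
    content W [kg/m2] grows (a sketch of equation [6.1]). Assumes vegetation
    transmissivity gamma = exp(-b * W / cos(phi)) and std ~ sigma_bare / gamma;
    sigma_bare and b are assumed illustrative values."""
    gamma = np.exp(-b * np.asarray(W, dtype=float) / np.cos(np.radians(phi_deg)))
    return sigma_bare / gamma   # error grows as the canopy attenuates the signal
```

The key qualitative behavior matches the text: the error equals the bare-soil error at zero vegetation water content and increases monotonically as the canopy develops.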
Vegetation water content was assumed to be a constant fraction of plant biomass and plant population:

W_t = c · biomass_t · PLTPOP   [6.2]

where the fraction c was set according to results reported by Malhotra (1933). Plant population is a CC component that is not included in the EnKF update. Synthetic observations of θ were generated every three days, each layer perturbed independently with the same statistical error characteristics. Remote sensing platforms are only able to measure soil water content in the upper few centimetres of soil, and in later sections we compare assimilations which used only surface-layer θ observations with assimilations which used observations of the full soil moisture profile.

The MODIS MOD15A2 product group provides LAI [m2/m2] estimates as an eight-day composite. The accuracy and uncertainty in this product has been investigated over agricultural areas by comparing the composite image to reference LAI from a single day at the end of the composite period; the uncertainty standard deviation reported by Tan et al. (2005) guided our synthetic LAI error. We generated synthetic LAI observations every eight days to simulate the remote sensing measurement process.

2.4.3. Ensemble Size Experiments

Ensemble size represents a balance between pdf representation and computational expense. Effects of varying ensemble size were evaluated using ensembles of size N = 10, 25, 50, 75, 100, 250, 500, 750, and 1,000. An appropriate ensemble size was found when pdf predictions of Grain Weight, LAI, and θ became stable with increasing sample size. The quality of ensemble representations of the outputs LAI and θ and the state Grain Weight was quantified using the root mean squared error (RMSE), taken over all simulation time steps, of the difference between mean ensemble predicted values and the true values. To make direct comparisons between OSSEs with different ensemble sizes, it was necessary to use the same truth system and observation set.
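The ensemble-quality metric just described — the RMSE over all time steps between the ensemble-mean trajectory and the truth-system trajectory — is straightforward; a minimal sketch (our naming) is:

```python
import numpy as np

def ensemble_mean_rmse(ensemble, truth):
    """RMSE over all time steps between the ensemble-mean trajectory and the
    truth-system trajectory (the section 2.4.3 stability metric).

    ensemble : (N, T) array-like of member trajectories for one variable
    truth    : (T,) true trajectory from the held-out truth system
    """
    mean_traj = np.asarray(ensemble, dtype=float).mean(axis=0)
    return float(np.sqrt(np.mean((mean_traj - np.asarray(truth, dtype=float)) ** 2)))
```

Plotting this quantity against N for each variable (as in Figure 3) reveals the ensemble size beyond which the estimate stops changing appreciably.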
Four truth systems were chosen and, for each truth system, a corresponding set of observations was generated and a 1,000-member ensemble was sampled from the full uncertainty pdf outlined in Tables 4 and 5 and equation [5]. For each truth system, each ensemble of increasing size completely contained all smaller ensembles. For example, for a given truth system, the 25-member ensemble contained the 10-member ensemble plus 15 additional simulations. The RMSE averaged over these four experiments is reported, and the particular choice of ensemble size is discussed in section 3.1.

2.4.4. Modelling Uncertainty Experiments

Once an appropriate ensemble size was chosen, experiments were conducted to test the ability of data assimilation to mitigate particular types of modelling uncertainty in yield estimates. Modelling uncertainty pdfs were taken as marginal distributions of the entire joint uncertainty distribution (Tables 4 and 5 and equation [5]). We tested the full joint uncertainty pdf described in section 2.4.1 plus five marginal uncertainty pdfs related to: (i) weather forcing data (Table 5), (ii) soil parameters and initial conditions (Table 4), (iii) cultivar parameters (Table 4), (iv) model state perturbations to states listed in Table 3 except for Grain Weight, Cumulative Development Units, and Cumulative Germination Units, and (v) same as (iv) but with perturbations to Cumulative Development Units and Cumulative Germination Units. Because each uncertainty type was independent from all others, marginalizing a particular uncertainty component was done by setting all of the variances of its uncertainty components to zero.

Assimilation OSSEs were run for simulations of water-limited (Swift Current) and energy-limited (Rothamsted) crops using four types of observation sets: LAI only, surface soil moisture (θ) only, LAI and θ, and LAI and full-profile soil moisture; the first three represent what are available from satellites, and the fourth provides a way to assess the limitations of only having surface level soil moisture information.
Each OSSE was repeated fifty times with different truth systems, ensembles, and observations. The results were evaluated using a mean error score (ME [kg/ha]), which is the absolute difference between the ensemble mean predicted end-of-season yield and the true yield. This was calculated for the open loop, EnKF, and SIRF ensembles for each of the fifty OSSEs, and a single-tailed, two-sample t-test was used to test for a significant reduction in mean ME score due to SIRF or EnKF assimilation.

Yield estimates can only be expected to improve when there are strong (although possibly indirect) relationships between model outputs and Grain Weight. Since Grain Weight is related to LAI and water stress through biomass, we report the absolute time-averaged Pearson product-moment correlation coefficients between model outputs and Grain Weight. For soil moisture, the sum of profile water was used. Statistics were calculated using all open loop ensemble members from the 50 combined uncertainty OSSEs.

2.4.5. Observation Uncertainty Experiments

Eight sets of OSSEs which utilized the full modelling uncertainty pdf were used to test the effects of observation uncertainty on assimilation results. Synthetic observations of θ and LAI were generated and assimilated every three and eight days, respectively, as described in section 2.4.2; however, θ error variances were 0.001, 0.005, 0.010, 0.015, 0.020, 0.030, 0.040, and 0.050, and LAI error variances were 0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30, and 0.40 for the eight sets of OSSEs. For both water-limited and energy-limited crops, each observation type and uncertainty level was tested using fifty Monte Carlo OSSEs, and the reduction in ME scores was used to evaluate EnKF and SIRF performance.

3. Results

3.1. Ensemble Size Experiments

Figure 3 illustrates RMSE output and Grain Weight dynamics due to varying ensemble size, averaged over four sets of Swift Current OSSEs.
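The evaluation procedure of section 2.4.4 — an ME score per OSSE repetition, compared between open-loop and filtered ensembles with a one-sided two-sample t-test — can be sketched as below. The significance level `alpha` is an assumption for illustration, since the text does not state one; function names are ours.

```python
import numpy as np
from scipy import stats

def me_score(yield_ensemble, true_yield):
    """Mean error score: |ensemble-mean end-of-season yield - true yield| [kg/ha]."""
    return abs(float(np.mean(yield_ensemble)) - true_yield)

def assimilation_helps(open_loop_me, filter_me, alpha=0.05):
    """One-sided two-sample t-test for a significant *reduction* in ME scores
    across Monte Carlo OSSE repetitions. alpha is an assumed level."""
    # 'less' tests whether the filter's ME scores are significantly smaller.
    _, p = stats.ttest_ind(filter_me, open_loop_me, alternative='less')
    return bool(p < alpha)
```

In the study's workflow, `open_loop_me` and `filter_me` would each hold fifty values (one per Monte Carlo OSSE), and a True result corresponds to the bolded entries of Table 6.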
Generally, when LAI was assimilated, LAI RMSE improved but θ RMSE did not, and when θ was assimilated, θ RMSE improved but LAI RMSE did not; this is similar to the findings of Pauwels et al. (2007), who used the EnKF and observed little improvement to modeled LAI when θ observations were assimilated. In both cases where outputs improved – LAI RMSE for LAI assimilation and θ RMSE for θ assimilation – output RMSE values were relatively stable beyond a modest ensemble size for both the EnKF and SIRF. In the two cases where outputs did not improve – θ RMSE for LAI assimilation and LAI RMSE for θ assimilation – output RMSE values also stabilized with increasing ensemble size. de Wit and van Diepen (2007) used the EnKF and found stability in the mean-squared-error of modeled LAI with EnKF assimilation of θ at a comparable ensemble size, which is in agreement with our findings. The Grain Weight state RMSE generally improved when LAI observations were assimilated but not θ observations; in both cases, the Grain Weight RMSE became relatively stable with increasing ensemble size. The remainder of the OSSEs discussed in this paper used the ensemble size chosen by this procedure.

3.2. Modelling Uncertainty Experiments

Table 6 reports Monte Carlo average ME scores and time-averaged correlations between outputs and end-of-season yield for water-limited and energy-limited OSSEs. The results for each uncertainty type are described below. It is important to understand that the time-averaged correlations between model outputs and biomass inform assimilation results, but that the relationship is indirect and situation dependent. Thus the coefficients in Table 6 cannot be compared directly.

Weather Inputs: Precipitation uncertainty affects grain development indirectly via the water stress control on the Leaf Weight component of biomass, while radiation and temperature affect grain development directly as well as indirectly through biomass. In simulations of the water-limited crop, LAI and θ were both correlated with Grain Weight.
ME scores were not significantly improved by assimilating LAI or θ using either filter. The inefficiency in assimilating LAI observations can be attributed to differences in the way radiation and temperature affect leaf and grain development, and to the fact that uncertainty in radiation and temperature affected grain growth after leaves were senesced (Figure 2). The SIRF was able to reduce the average ME score by assimilating θ and was more successful than the EnKF in this case, likely due to the EnKF's linear model assumption. Swapping out state vectors for ones from ensemble members which represent crops grown in conditions similar to the truth system (similar historical water demand and availability) improved yield predictions; however, updating plant states based on linear correlations with soil states was inefficient, and improvements to estimates of profile water content did not translate into improved simulations of future plant development. The useful information stored in the soil moisture state was information about growth histories; this suggests that an improved understanding of the effects of weather on the crop growth environment might not be as important as an improved understanding of the effects of weather on crops themselves.

In energy-limited simulations, LAI was correlated with Grain Weight, and EnKF assimilation of LAI observations improved yield predictions. Water stress was a decorrelating factor between LAI and biomass because of differences in how Leaf Weight, Stem Weight, and Reserves Weight responded to stress. Again, in the energy-limited environment, SIRF assimilation of θ improved yield estimates while the EnKF did not.

Soil Parameters and Initial Conditions: In simulations of the water-limited crop, the ability of the soil to infiltrate and store water was important for productivity. Correlations between LAI and Grain Weight were high; however, assimilation of LAI did not result in improved yield estimates.
In this case, when the filters increased biomass due to high LAI observations, large plants were essentially placed into soils that could not support them, and when the filters decreased biomass due to low LAI observations, the plants grew quickly due to sufficient water availability. This is an example of the limitations of data assimilation filters when model parameters are uncertain. Since LAI observations were not available after senescence (Figure 2), the updated plant simulations were able to respond to the new environment before grain growth was completed.

In contrast, correlation between θ and Grain Weight was moderate; however, the SIRF was able to improve ME scores by assimilating θ. In this case, since observations of soil moisture were available through the end of the growing season, the SIRF was able to replace the ensemble of plant state vectors late in the grain development phase with ensemble members which had developed in conditions similar to the truth system. Surface level soil moisture observations contained insufficient information to improve Grain Weight simulations due to the fact that surface soil layer parameters were independent of root-zone parameters, and the root-zone largely controls water availability. In the low-stress simulations, soil parameters did not affect grain growth.

Cultivar Parameters: Error in yield estimates due to uncertainty in cultivar parameters was not mitigated by state-updating for either crop using any combination of filter and observations. Correlations between LAI and Grain Weight were high, but differences in the kernel number and standard kernel size parameters decoupled biomass from Grain Weight. LAI and θ observations do not inform these parameter values.

State Perturbations Without Stage Timing States: When states other than the stage timing variables Cumulative Development Units and Cumulative Germination Units were perturbed, water-limited LAI was correlated with Grain Weight; however, in the energy-limited environment, it was not.
In the water-limited environment, perturbations to the root-zone soil water state partially controlled growth variations through the stress factor, and this resulted in correlated LAI and Grain Weight. In the energy-limited case, LAI and Grain Weight were not well-correlated because the state perturbations were independent. Thus in the water-limited environment, LAI assimilation resulted in improved ME scores, and in the energy-limited environment, it did not. Surface soil moisture was not well-correlated with Grain Weight in either set of simulations; however, profile soil moisture was correlated in the water-limited environment. Again, this is due to the fact that perturbations to soil water states were independent between layers and root-zone water availability determined water stress. Profile soil moisture assimilation improved yield estimates in this case.

State Perturbations Including Stage Timing States: When perturbations to the stage timing states Cumulative Development Units and Cumulative Germination Units were included, the water-limited LAI-Grain Weight correlation was reduced. Differences in growth stages between ensemble simulations accumulated throughout the growing season and caused a gradual decrease in cross-correlation between vegetation states (not shown). In energy-limited simulations, the correlation increased slightly, and the SIRF was able to improve ME scores by assimilating LAI. In the water-limited case, stress controlled biomass and LAI through leaf development, and dissimilar development stage transitions resulted in decorrelation. In the energy-limited case, random perturbations controlled leaf development, Stem Weight, and Reserves Weight independently, and similar development stage transitions resulted in slightly more highly correlated vegetation states.

Combined Uncertainty Sources: In a full modelling uncertainty paradigm, the EnKF and SIRF were unable to significantly improve ME scores in any case.
This is a combined result of a lack of state correlations caused by differences in cultivar parameters and differences in weather controls on biomass, LAI, and Grain Weight.

3.3. Observation Uncertainty Experiments

Results from varying the error variances of synthetic LAI and θ observations (Table 7) suggest that even nearly perfect observations of surface-level soil moisture will not improve single-season yield estimates under reasonable modelling uncertainty assumptions. LAI assimilation was valuable in the water-limited simulations when the LAI observation error standard deviation was between 0.05 and 0.30 [m2/m2]. The SIRF was almost always better at assimilating LAI in the water-limited simulations, which is likely due to the highly nonlinear nature of the CC module.

4. Discussion

The purpose of this study was to identify the potential for state-updating data assimilation to mitigate error in yield estimates due to modelling uncertainty. Results show that this approach was generally unable to improve yield estimates under realistic uncertainty scenarios. Many factors contributed to this: (1) weather inputs affect grain and leaf growth differently, meaning that similar LAI outputs do not imply similar Grain Weight states; (2) certain cultivar parameters affect grain development directly in a way which is independent of all other model states; (3) state-updating often results in plant state vectors which disagree with model (soil) parameters; and (4) surface level soil moisture observations did not contain sufficient information about available water and water stress. Results suggest that in water-limited environments, LAI assimilation would be more useful if observation error were lower than what is currently available.
This is a problem because real LAI observations will suffer from uncertainties which were not considered in this study, namely those due to spatial heterogeneity in agricultural systems and discrepancies in spatial resolution between fields and image pixels.

These findings are qualitatively similar to those from the case study reported by de Wit and van Diepen (2007), who used the EnKF to assimilate θ observations over crop land in Europe. They found improvement to winter wheat yield estimates in 33 out of 88 test regions. Although their real-world experiment was likely hampered by mismatches in spatial resolution between agricultural fields and remote sensing observations, as well as other spatial factors including crop heterogeneity, we have shown that these factors alone were likely not the reason for poor assimilation results.

This study suggests that in order to combine remote sensing observations with agricultural models for the purpose of estimating yield at single-season time scales, it will be necessary to modify our interpretation of crop development. Primarily, it would be interesting to investigate the methods and ancillary data necessary for correlating leaf development with grain development directly. It is likely that the utility of soil moisture observations will be limited to monitoring extreme events over large time scales, as was implied by Bolten et al. (2010), or to estimating irrigation schedules and agricultural water use, as was done by Wang and Cai (2007).

Acknowledgement

This research was jointly supported by a grant from the NASA Terrestrial Ecology program entitled Ecological and agricultural productivity forecasting using root-zone soil moisture products derived from the NASA SMAP mission (PI - W.T. Crow), and by the NASA SMAP Science Definition Team, grant 08-SMAPSDT08-0042 (PI - M.S. Moran).
The authors would like to thank Cheryl Porter from the Department of Agricultural and Biological Engineering at the University of Florida for her help acquiring and managing DSSAT source code.

Tables

Table 1: The Decision Support System for Agrotechnology Transfer (DSSAT) Markov State Vector (33 Components)
Component Description Units
ATOT Sum of last 5 days soil temp. [°C]
CANHT Canopy height [m]
DRN Drained soil water [cm]
EO Potential evap. [mm/day]
EOP Potential plant transp. [mm/day]
EORATIO Increase in evap. per unit LAI [mm/day]
EOS Potential soil evap. [mm/day]
EP Plant transp. [mm/day]
ES Soil evap. rate [mm/day]
FRACRTS Fraction of soil contact w/ roots [#]
KSEVAP Light extinction coeff. (evap.) [#]
KTRANS Light extinction coeff. (transp.) [#]
PORMIN Min. pore space for O2 to plants [m3/m3]
RLV Root volume by soil layer [m3/m3]
RWUMX Max. uptake per unit root length [m3/m]
SNOW Snow accumulation [mm]
SRFTEMP Surface soil temperature [°C]
SSOMC Soil Carbon [kg/ha]
ST Soil temperature [°C]
STGDOY Stage transition dates [day]
SUMES1 Cumulative stage 1 soil evap. [mm]
SUMES2 Cumulative stage 2 soil evap. [mm]
SW Soil water [m3/m3]
SWDELTS Drainage rate [m3/m3]
SWDELTU Change in SW due to evap. [m3/m3]
SWDELTX Change in SW due to plant uptake [m3/m3]
TMA Last 5 days of soil temp. [°C]
TRWU Total root water uptake [mm]
TRWUP Total potential root water uptake [m3/m3]
TSOILEV Duration of stage 2 evap. [days]
TSS Number of days with saturated soil [days]
UPFLOW Upward flow due to evap. [cm]
WINF Infiltration [mm]

Table 2: The CropSim-Ceres Wheat Module Markov State Vector (64 Components)
Component Description Units
ADATEND Anthesis end date [date]
AFLFSUM Carbohydrate leaf factor [#]
CARBOC Cumulative carbohydrate assimilated [g/plant]
CHRSWT Chaff reserves [g/plant]
CHWT Chaff weight [g/plant]
CUMDU Cumulative development units [#]
CUMGEU Cumulative germination units [#]
CUMVD Cumulative vernalization days [days]
DAE Days after emergence [#]
DEADWT Dead leaf weight retained on plant [g/plant]
G2 Coeff.
grain growth (modified) [mg/(°C day)]
GEDSUM Germination + emergence duration [days]
GESTAGE Germination at emergence stage [#]
GETSUM Germination + emergence temp. sum [°C]
GPLA Green leaf area [cm2/plant]
GPLASENS Green leaf area during senescence [cm2/plant]
GRNUM Grains per plant [#/plant]
GRWT Grain weight [g/plant]
ISTAGE Current developmental stage [#]
ISTAGEP Previous developmental stage [#]
LAGSTAGE Lag phase for grain filling stage [#]
LAP Leaf area by leaf number [cm2/plant]
LAPOT Leaf area potential by leaf number [cm2/leaf]
LAPS Leaf area senesced by leaf number [cm2/plant]
LFWT Leaf weight [g/plant]
LLRSWAD Leaf lamina reserves weight [kg/ha]
LLRSWT Leaf lamina reserves [g/plant]
LLWAD Leaf lamina weight [kg/ha]
LNUMSD Leaf number per Haun stage [#]
LNUMSG Growing leaf number [#]
LSHAI Leaf sheath area index [#]
LSHRSWAD Leaf sheath reserves weight [kg/ha]

Table 2 (continued): The CropSim-Ceres Wheat Module Markov State Vector (64 Components)
Component Description Units
LSHRSWT Leaf sheath reserves [g/plant]
LSHWAD Leaf sheath weight [kg/ha]
PARI PAR interception fraction [#]
PLA Plant leaf area [cm2]
PLTPOP Plant population [#/m2]
RSTAGE Reproductive development stage [#]
RSWT Reserves weight [g/plant]
RTDEP Root depth [cm]
RTWT Root weight [g/plant]
RTWTL Root weight by layer [g/plant]
SEEDRS Seed reserves [g/plant]
SEEDRSAV Seed reserves available [g/plant]
SENCL Senesced Carbon by layer [g/plant]
SENCS Senesced Carbon added to soil [g/plant]
SENLA Cumulative senesced leaf area [cm2/plant]
SRADSUM Cumulative radiation [MJ/m2]
SSTAGE Secondary stage of development [#]
STRSWT Stem reserves [g/plant]
STWT Stem weight [g/plant]
TNUM Tiller number [#/plant]
TSDAT Terminal spikelet date [#]
TSS Duration of saturation [days]
TTD Thermal time over last 20 days [°C]
TTNUM Thermal time means in sum [#]
VF Vernalization factor [#]
WFG Water stress factor for growth [#]
WFGC Cumulative growth water factor [#]
WFLFNUM Water stress factor for each leaf [#]
WFLFSUM Cumulative water stress factor per leaf [#]
XSTAGE Stage of development [#]
ZSTAGE Zadok stage of
development [#]
ZSTAGEP Previous Zadok stage [#]

Table 3: The State Vector Updated by the EnKF and the State Vector Perturbed by Equation [5.1] to Simulate Model Structural Errorsa
State Component Units Dimension
CSM States
Soil Water [m3/m3] 9
Canopy Height [m] 1
CC States
Root Volume Fraction [cm3/cm3] 9
Chaff Weight [g/plant] 1
Stem Weight [g/plant] 1
Leaf Weight [g/plant] 1
Reserves Weight [g/plant] 1
Grain Weight [g/plant] 1
Plant Leaf Area [cm2] 1
Seed Reserves [g/plant] 1
Stage Timing States
Cumulative Development Units [°C days] 1
Cumulative Germination Units [°C days] 1
a Cumulative Development Units and Cumulative Germination Units are not updated by the ensemble Kalman filter (EnKF), and Grain Weight is not perturbed by equation [5.1].

Table 4: The (Approximately) Gaussian Probability Density Function of Uncertainty in Model Parameters and Initial Conditions
Uncertainty Source: Cultivar Parameters Vernalizing Duration Photoperiod Response Grain Filling Duration Kernel Number Standard Kernel Size Standard Tiller Weight Interval Between Leaves Soil Parameters Albedo Upper Limit Evaporation Drainage Rate Parameter Runoff Curve Number Root Growth Factor Layer 1 (0-5cm) Layer 2-9 (5-180 cm) Clay % Layer 1 (0-5cm) Layer 2-9 (5-180 cm) Silt % Layer 1 (0-5cm) Layer 2-9 (5-180 cm) Initial Soil Moisture Layer 1 (0-5cm) Layer 2-9 (5-180 cm) Mean Parameters Values Swift Rotha- Std. Low. Upp. Current msted Dev.
Bnd Bnd Units 0 60 335 25 26 1.5 86 60 67 515 14 44 4.0 100 3 0 60 [days] 10 0 200 [%] 33.5 100 1000 [C days] 2.5 10 50 [#/g] 2.6 10 80 [mg] 0.3 0.5 8 [g] 10 30 150 [C days] 0.10 9.4 0.20 84.0 0.14 6.0 0.50 60.0 0.05 0 1 [ ] 2.0 1 12 [cm/day] 0.3 0.01 0.99 [1/day] 10 1 99 [ ] 1.00 0.74 1.00 0.90 0.05 0.1 0 0 1 1 [ ] [ ] 10.7 9.2 23.4 23.4 10 10 0 0 100 100 [%] [%] 29.9 29.7 30.0 30.0 10 10 0 0 100 100 [%] [%] 0.23 0.33 0.04 0.20 0.33 Low Lim Low 0.08 Lim Satura [m /m ] tion Satura [m /m ] tion 66 Table 5: Forcing Data Perturbation Sampling Parameters Correlations Mult. Or Std. AR(1) w/ w/ w/ Weather Inputs Add. Dev. Coeff.a Temp Rad. Precip. Data Unitsb Temp(Max and Min) A 1 1/e 1 -0.80 -0.32 [C ] Solar Radiation M 0.3 1/e -0.80 1 0.40 MJ/m day Precipitation M 0.5 1/e -0.32 0.40 1 [mm] a First order autoregressive coefficients assume a daily time series. b Data units are the dimensions of the forcing data itself and not the units of the perturbations except in the case of temperature which uses additive perturbations; radiation and precipitation perturbations are multiplicative and unitless. 
67 Table 6: Monte Carlo Average Mean Error Scores and Time-Averaged Correlation Coefficients for Modeling Uncertainty OSSEsa Crop System Uncertainty Weather Soil Parameters & Initial Conditions Cultivar Parameters WaterState Perturbations w/o Limited Stage Timing (Swift Current) State Perturbations w/ Stage Timing Combined Sources Weather Soil Parameters & Initial Conditions Cultivar Parameters EnergyState Perturbations w/o Limited Stage Timing (Rothamsted) State Perturbations w/ Stage Timing Combined Sources a Open Loop 243.7 EnKF SIRF LAI LAI & LAI & LAI LAI 280.2 255.2 280.0 299.3 219.1 219.1 213.7 183.1a 0.548 0.490 0.572 322.6 287.3 311.9 297.9 616.4 427.0 377.3 620.7 239.3 0.918 0.088 0.233 375.5 366.2 364.3 373.9 386.0 414.9 375.9 396.7 394.6 0.964 0.286 0.461 604.8 470.0 484.0 555.3 296.4 477.2 479.5 544.4 471.6 0.845 0.115 0.780 587.4 543.9 556.5 589.4 500.9 547.1 533.1 637.6 653.1 0.657 0.120 0.159 710.9 330.7 660.8 642.9 738.3 756.7 721.4 652.5 776.4 722.0 0.568 275.2 274.4 330.7 330.7 307.9 315.3 322.2 244.5 0.808 0.164 0.131 0.421 0.656 0.7 0.7 0.7 0.7 0.7 0.8 0.9 0.7 0.7 0.879 0.034 0.047 1352.5 1330.8 1320.7 1352.5 1352.5 1400.3 1319.2 1405.0 1359.9 0.770 0.607 0.054 356.8 382.3 356.7 360.5 364.4 406.3 367.2 366.0 545.2 0.341 0.007 0.005 1034.9 1084.9 1081.6 999.5 1024.6 1005.2 722.9 1040.8 1383.5 0.452 0.038 0.004 1342.3 1227.8 1180.2 1330.8 1303.6 1172.2 1217.5 1405.0 1870.0 0.645 0.101 0.234 Bold indicates a significant reduction in mean ME score by assimilation . 68 Table 7: Monte Carlo Average Mean Error Scores for Combined Modelling Uncertainty Assimilations with Increasing Observation Error Variance (Observation Error)a LAI Open Obs. ME Obs. ME ME Unc. EnKF SIRF Unc. 
Crop System: Water-Limited (Swift Current)
(columns: open loop ME; LAI observation error variance, EnKF ME, SIRF ME; soil moisture observation error variance, EnKF ME, SIRF ME)
718.7   0.01 607.6 656.9    0.001 758.9 1054.6
731.9   0.02 666.8 682.6    0.005 786.9 949.1
701.5   0.05 609.8 543.8^a  0.010 750.5 760.6
705.8   0.10 597.6 586.0    0.015 671.2 664.9
665.0   0.15 565.5 494.5    0.020 681.7 771.1
681.1   0.20 551.7 524.2    0.030 707.7 745.1
752.4   0.30 631.2 609.0    0.040 732.9 760.4
666.7   0.40 559.5 572.2    0.050 670.1 658.9

Crop System: Energy-Limited (Rothamsted)
1314.1  0.01 1394.0 1528.6  0.001 1300.3 2111.5
1296.0  0.02 1321.9 1232.0  0.005 1304.0 1922.3
1360.6  0.05 1377.2 1341.2  0.010 1351.9 1865.5
1572.4  0.10 1626.2 1435.2  0.015 1594.3 1776.3
1744.9  0.15 1785.2 1278.0  0.020 1727.6 2074.9
1383.5  0.20 1397.3 1342.1  0.030 1413.2 1513.7
1454.7  0.30 1345.2 1352.5  0.040 1438.6 1353.1
1472.9  0.40 1460.0 1330.3  0.050 1419.9 1731.0

^a Bold indicates a significant reduction in mean ME score by assimilation.

Figures

Figure 1: OSSE process diagram. Transparent gray boxes represent the SIRF and EnKF assimilation algorithms; the remaining symbols denote the forcing data, model parameters, model states, filter-updated model states, modeled and observed LAI and soil moisture, yield, and the uncertainty variances listed in Tables 4 and 5 and equation [5].

Figure 2: Baseline simulations of water-limited (Swift Current) summer wheat and energy-limited (Rothamsted) winter wheat with parameters listed in Table 4, columns 2 and 3 respectively. In the bottom plot, water stress is magnified by a factor of eight.

Figure 3: Output and Grain Weight time-averaged open loop, EnKF and SIRF RMSE values as a function of increasing ensemble size.

References

Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past, present, and future. Bulletin of the American Meteorological Society, 67, 687-695, doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2

Bolten, J.D., Crow, W.T., Zhan, X.W., Jackson, T.J., & Reynolds, C.A. (2010).
Evaluating the utility of remotely sensed soil moisture retrievals for operational agricultural drought monitoring. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 3, 57-66, doi:10.1109/jstars.2009.2037163

Campbell, C.A., Cameron, D.R., Nicholaichuk, W., & Davidson, H.R. (1977a). Effects of fertilizer-N and soil-moisture on growth, N-content, and moisture use by spring wheat. Canadian Journal of Soil Science, 57, 289-310

Campbell, C.A., Davidson, H.R., & Warder, F.G. (1977b). Effects of fertilizer-N and soil-moisture on yield, yield components, protein-content and N accumulation in aboveground parts of spring wheat. Canadian Journal of Soil Science, 57, 311-327

Cosby, B.J., Hornberger, G.M., Clapp, R.B., & Ginn, T.R. (1984). A statistical exploration of the relationships of soil-moisture characteristics to the physical properties of soils. Water Resources Research, 20, 682-690

Crow, W.T., Koster, R.D., Reichle, R.H., & Sharif, H.O. (2005). Relevance of time-varying and time-invariant retrieval error sources on the utility of spaceborne soil moisture products. Geophysical Research Letters, 32

de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and Forest Meteorology, 146, 38-56

Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N., Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R., Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi, J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716

Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9

Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993).
Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing, 140, 107-113, doi:10.1049/ip-f-2.1993.0015

Hoogenboom, G., Jones, J.W., Wilkens, P.W., Batchelor, W.D., Hunt, L.A., Boote, K.L., Singh, U., Uryasev, O., Bowen, W.T., Gijsman, A.J., Toit, A.D., White, J.W., & Tsuji, G.Y. (2004). Decision Support System for Agrotechnology Transfer version 4.0. Honolulu, HI: University of Hawaii

Houtekamer, P.L., & Mitchell, H.L. (2001). A sequential ensemble Kalman filter for atmospheric data assimilation. Monthly Weather Review, 129, 123-137

Jackson, T.J., & Schmugge, T.J. (1991). Vegetation effects on the microwave emission of soils. Remote Sensing of Environment, 36, 203-212

Jones, J.W., Hoogenboom, G., Porter, C.H., Boote, K.J., Batchelor, W.D., Hunt, L.A., Wilkens, P.W., Singh, U., Gijsman, A.J., & Ritchie, J.T. (2003). The DSSAT cropping system model. European Journal of Agronomy, 18, 235-265

Kerr, Y.H., Waldteufel, P., Wigneron, J.P., Delwart, S., Cabot, F., Boutin, J., Escorihuela, M.J., Font, J., Reul, N., Gruhier, C., Juglea, S.E., Drinkwater, M.R., Hahne, A., Martin-Neira, M., & Mecklenburg, S. (2010). The SMOS Mission: New Tool for Monitoring Key Elements of the Global Water Cycle. Proceedings of the IEEE, 98, 666-687

Kirdiashev, K.P., Chukhlantsev, A.A., & Shutko, A.M. (1979). Microwave radiation of the earth's surface in the presence of vegetation cover. Radiotekhnika i Elektronika, 24, 256-264

Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y., Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999). MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis Document

Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework.
Water Resources Research, 43, W07401, doi:10.1029/2006WR005756

Malhotra, R.C. (1933). A contribution to the biochemistry of the wheat plant. Journal of Biochemistry, 18, 199-205

McLaughlin, D. (2002). An integrated approach to hydrologic data assimilation: interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286

Mo, X.G., Liu, S.X., Lin, Z.H., Xu, Y.Q., Xiang, Y., & McVicar, T.R. (2005). Prediction of crop yield, water consumption and water use efficiency with a SVAT-crop growth model using remotely sensed data on the North China Plain. Ecological Modelling, 183, 301-322

Moulin, S., Bondeau, A., & Delecolle, R. (1998). Combining agricultural crop models and satellite observations: from field to regional scales. International Journal of Remote Sensing, 19, 1021-1036

Njoku, E.G., Jackson, T.J., Lakshmi, V., Chan, T.K., & Nghiem, S.V. (2003). Soil moisture retrieval from AMSR-E. IEEE Transactions on Geoscience and Remote Sensing, 41, 215-229

Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., & Defourny, P. (2007). Optimization of a coupled hydrology-crop growth model through the assimilation of observed soil moisture and leaf area index values using an ensemble Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942

Pellenq, J., & Boulet, G. (2004). A methodology to test the pertinence of remote-sensing data assimilation into vegetation models for water and energy exchange at the land surface. Agronomie, 24, 197-204

Prevot, L., Chauki, H., Troufleau, D., Weiss, M., Baret, F., & Brisson, N. (2003). Assimilating optical and radar data into the STICS crop model for wheat. Agronomie, 23, 297-303

Priestley, C.H.B., & Taylor, R.J. (1972). Assessment of surface heat-flux and evaporation using large-scale parameters. Monthly Weather Review, 100, 81-92

Reichle, R.H. (2008). Data assimilation methods in the Earth sciences.
Advances in Water Resources, 31, 1411-1418

Reichle, R.H., Koster, R.D., Liu, P., Mahanama, S.P.P., Njoku, E.G., & Owe, M. (2007). Comparison and assimilation of global soil moisture retrievals from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) and the Scanning Multichannel Microwave Radiometer (SMMR). Journal of Geophysical Research-Atmospheres, 112, D09108

Reichle, R.H., Kumar, S.V., Mahanama, S.P.P., Koster, R.D., & Liu, Q. (2010). Assimilation of satellite-derived skin temperature observations into land surface models. Journal of Hydrometeorology, 11, 1103-1122

Ritchie, J.T. (1998). Soil water balance and plant water stress. In G.Y. Tsuji, G. Hoogenboom, & P.K. Thornton (Eds.), Understanding Options for Agricultural Production (pp. 41-54). Dordrecht, Netherlands: Kluwer Academic Publishers

Tan, B., Hu, J.N., Zhang, P., Huang, D., Shabanov, N., Weiss, M., Knyazikhin, Y., & Myneni, R.B. (2005). Validation of Moderate Resolution Imaging Spectroradiometer leaf area index product in croplands of Alpilles, France. Journal of Geophysical Research-Atmospheres, 110, D01107

Wang, D.B., & Cai, X.M. (2007). Optimal estimation of irrigation schedule - An example of quantifying human interferences to hydrologic processes. Advances in Water Resources, 30, 1844-1857

APPENDIX B: AN APPROACH TO QUANTIFYING THE EFFICIENCY OF A BAYESIAN FILTER

1 Grey S. Nearing, 1 Hoshin V. Gupta, 2 Wade T. Crow, 3 Wei Gong
1 University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
2 USDA-ARS Hydrology and Remote Sensing Laboratory; Beltsville, MD
3 Beijing Normal University College of Global Change and Earth System Science; Beijing, China

Article accepted to Water Resources Research on March 2, 2013.

Abstract

Data assimilation is the Bayesian conditioning of uncertain model simulations on observations to reduce uncertainty about model states.
In practice, it is common to make simplifying assumptions about the prior and posterior state distributions, and to employ approximations of the likelihood function, which can reduce the efficiency of the filter. We propose metrics that quantify how much of the uncertainty in a Bayesian posterior state distribution is due to (i) the observation operator, (ii) observation error, and (iii) approximations of Bayes' Law. Our approach uses discrete Shannon entropy to quantify uncertainty, and we define the utility of an observation (for reducing uncertainty about a model state) as the ratio of the mutual information between the state and observation to the entropy of the state prior. These metrics make it possible to analyze the efficiency of a proposed observation system and data assimilation strategy, and provide a way to examine the propagation of information through the dynamic system model. We demonstrate the procedure on the problem of estimating profile soil moisture from observations at the surface (top 5 cm). The results show that when synthetic observations of 5 cm soil moisture are assimilated into a three-layer model of soil hydrology, the ensemble Kalman filter does not use all of the information available in observations.

1. Introduction

Uncertainties in model estimates and forecasts of dynamic systems are often expressed probabilistically, and data assimilation is the Bayesian process of conditioning state uncertainty distributions on observations of the modeled system (Wikle and Berliner 2007). Intuitively, data assimilation relies on a model to provide prior information about the behaviour of a dynamic system, and observations provide independent, corroborating evidence.
In this sense, prior knowledge is the set of physical laws by which we expect the system to operate, and the Bayesian data assimilation prior is a probability distribution representing a) uncertainty in the representation of these physical laws themselves, b) uncertainty about the isomorphism between mathematical or numerical representations of physics and the actual physics of the system, and c) uncertainty in the boundary condition. To update these prior uncertainty distributions using observations it is necessary to have a model of the observation process; this model and any associated observation uncertainty constitute the Bayesian likelihood function.

When designing an observing system, it is often desirable to estimate the value of certain types of hypothetical observations; this is especially important if collecting observations is expensive. Observing system simulation experiments (OSSEs; which are synthetic studies) can be used to evaluate the strength of computational approaches to conditioning models on observations (Reichle et al. 2001). A typical data assimilation OSSE will compare the uncertainties associated with model forecasts before and after assimilating a set of synthetic observations generated according to the specifications of the proposed observing system. The result is usually an estimate of the change in forecast accuracy or precision.

For practical reasons, most data assimilation algorithms use approximations of Bayes' Law. Therefore, there are three factors that might combine to reduce the effectiveness of Bayesian learning from observations:
(i) The mapping from states to observations may not be injective (i.e. a one-to-one mapping), so that even perfect observations do not translate into perfect state estimates.
(ii) The signal-to-noise ratio in the observations may be small.
(iii) Approximations of Bayes' Law may result in spurious updates to the state uncertainty distributions.
Any set of observations contains a certain amount of information about model states; this amount is determined by (i) and (ii), while (iii) results in information loss. Although it is possible to estimate the effects of reducing noise in the observations simply by changing the specifications for generating synthetic observations, traditional OSSE analyses do not formally differentiate between the individual contributions of (i-iii) to the posterior uncertainty. Furthermore, they do not formally track the assimilated information through the dynamic system model. In this paper, we propose a set of metrics that quantify the fraction of uncertainty in posterior state distributions (after data assimilation) related to each of the three factors listed above. We discuss, specifically, the case of one-dimensional observations. Efficiency is computed by comparing the reduction in uncertainty about model states actually achieved to that which is potentially attainable. These metrics, based on the quantification of uncertainty as discrete Shannon entropy (Shannon 1948), provide a consistent way to measure both (a) the potential ability of an observing system to inform specific science questions, and (b) the importance of various limiting factors in the data assimilation process. In addition, they can be used to track information introduced into a dynamic system model via Bayesian updating (updating acts as a perturbation on the system) through time and state-space, and therefore provide a method for illustrating dynamic information flow.

Section 2 provides an overview of probabilistic filtering and observing system simulation experiments, and presents the proposed efficiency metrics. Section 3 demonstrates the usefulness of the metrics for a system designed to estimate profile soil moisture content from observations of surface soil moisture. Section 4 concludes and discusses some limitations of the method.

2. Theory: Background and Proposed Metrics
2.1.
Bayes Filters

Nonlinear dynamical system (NLDS) simulators used for data assimilation are usually numerical integrators of stochastic differential equations of the form (Miller et al. 1999):

dx_t = f(x_t, u_t) dt + dW_t    [1]

where x_t and u_t are respectively the simulator state and boundary condition at time t, and W_t is a Wiener process. Solutions to [1] at discrete times t = 1, ..., T are approximated by sampling a Markov process (Liu and Gupta 2007):

x_t = f(x_{t-1}, u_t) + ω_t    [2.1]

where the state transition function f is an approximation of the drift function and the ω_t are noise sampled from the Gaussian distribution N(0, Q). Periodically the state of the system is observed according to an observation function h:

ẑ_t = h(x_t)    [2.2]

The observation process is typically associated with some error, so that the actual observation is:

z_t = ẑ_t + ε_t    [2.3]

where the observation error ε_t is drawn from an arbitrary distribution p(ε). In the above, [2] constitutes a hidden Markov model (HMM) and implies the probability distributions p(x_t | x_{t-1}) and p(z_t | x_t) respectively, where t = 1, ..., T and T is the number of simulation time steps. To make predictions with this model it is essential to know the distribution of the forcing data, which, along with the conditional distributions p(x_t | x_{t-1}) and p(z_t | x_t), defines a joint probability distribution over model states and observations for the simulation period.

Data assimilation addresses the question "what can be learned about x by observing z?" and the Bayesian answer is a smoother; given observations z_{1:T} during the simulation period, the posterior uncertainty in x_{1:T} is given by:

p(x_{1:T} | z_{1:T}) = p(z_{1:T} | x_{1:T}) p(x_{1:T}) / ∫ p(z_{1:T} | x_{1:T}) p(x_{1:T}) dx_{1:T}    [3]

In the general case, no analytical solution to [3] exists, and it is impractical to sample the posterior directly because the posterior is (s·T)-dimensional (the dimension of the state multiplied by the number of integration time steps). Therefore it is often necessary to make some simplifying assumptions, the most common being that the state at time t is conditionally independent of observations at times greater than t.
This assumption results in a filter:

p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / ∫ p(z_t | x_t) p(x_t | z_{1:t-1}) dx_t    [4]

which can be applied sequentially: the posterior at time t-1 defines the prior at time t through the integral p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}. Since the filter posterior distribution is only s-dimensional at any given time, it is sometimes feasible to use Markov chain Monte Carlo (MCMC) techniques to generate approximately independent and identically distributed (iid) samples from the posterior. However, because MCMC sampling is computationally expensive and must be performed at every observation time step, additional simplifying approximations are often used to facilitate analytical estimates of the posterior. Common approximations include emulating the prior, likelihood, and posterior as Gaussian and f and h as linear (the Kalman filter; Kalman 1960), emulating the prior and posterior as discrete sets of samples from Gaussians and h as linear (the ensemble Kalman filter; Evensen 2003), or emulating the prior as a set of discrete samples and estimating the posterior by resampling the prior (Gordon et al. 1993).

2.2. Ensemble Kalman Filter

Ensemble filters estimate the various HMM distributions from sets of M iid samples, which at each simulation timestep consist of M samples of the prior (called the background), M samples of the observation derived from the background according to [2.2], and M samples of the posterior (called the analysis). The ensemble filter most commonly used in hydrology, and the one which we will use in our example (section 3), is the ensemble Kalman filter (EnKF; Evensen 2003).
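The EnKF analysis step just introduced can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the study's implementation — the function and variable names are ours — using the standard perturbed-observations form of the update for a scalar observation:

```python
import numpy as np

def enkf_update(X, z, obs_op, r_var, rng):
    """Perturbed-observations EnKF analysis step (after Evensen 2003).

    X       : (s, M) background ensemble (s state dims, M members)
    z       : scalar observation
    obs_op  : maps a state vector to a scalar predicted observation
    r_var   : observation error variance
    rng     : numpy random Generator
    Returns the (s, M) analysis ensemble.
    """
    s, M = X.shape
    Zhat = np.array([obs_op(X[:, j]) for j in range(M)])  # predicted obs, (M,)
    Xdev = X - X.mean(axis=1, keepdims=True)              # state deviations
    zdev = Zhat - Zhat.mean()                             # predicted-obs deviations
    c_xz = Xdev @ zdev / (M - 1)                          # state-obs covariance, (s,)
    c_zz = zdev @ zdev / (M - 1)                          # predicted-obs variance
    gain = c_xz / (c_zz + r_var)                          # Kalman gain, (s,)
    eps = rng.normal(0.0, np.sqrt(r_var), size=M)         # observation perturbations
    innov = z + eps - Zhat                                # member innovations, (M,)
    return X + np.outer(gain, innov)
```

Because each member sees an independently perturbed copy of the observation, the analysis ensemble spread contracts toward the posterior variance rather than collapsing onto a single value.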
Given an observation z_t, where the observation error ε is jointly Gaussian and independent in time with covariance R, maximum likelihood estimates of the posterior (where each background ensemble member is taken to be the mean of the prior) can be approximated by linearizing h around the ensemble mean and minimizing the expected squared error to obtain:

x_t^{a,j} = x_t^{b,j} + C_xz (C_zz + R)^{-1} (z_t + ε^j - ẑ_t^{b,j})    [5]

where C_xz = (M-1)^{-1} Σ_j (x_t^{b,j} - x̄_t^b)(ẑ_t^{b,j} - z̄_t^b)^T and C_zz = (M-1)^{-1} Σ_j (ẑ_t^{b,j} - z̄_t^b)(ẑ_t^{b,j} - z̄_t^b)^T are ensemble sample covariances, and the ε^j are samples from the observation uncertainty distribution. Under the stated conditions (Gaussian prior, Gaussian ε and linear h), x_t^{a,j} is an iid sample of the filter posterior [4] and can be used to condition the prior at timestep t+1: each sample of the background at t+1 is found by sampling [2.1] with a single member of the analysis ensemble taken to be the previous state condition.

2.3. Observing System Simulation Experiments

Synthetic experiments that examine whether a proposed observing system and data assimilation strategy can be expected to produce posterior state distributions having increased accuracy (or reduced uncertainty) compared to the HMM simulator distributions are called observing system simulation experiments (OSSEs; Arnold and Dey 1986). The typical data assimilation OSSE is called an identical-twin experiment (e.g., Crow and Van Loon 2006) and consists of three principal components:
1. M samples of the forcing data, state and observation distributions, called the open loop samples. We will also consider error-free samples of the observation.
2. A single sample from the forcing data and HMM distributions which is used to define the true state of the NLDS system; this is called the truth system.
3. M samples of the state distributions conditional on the truth-system observations after data assimilation; these are called the analysis samples.
At every timestep, once the analysis is determined, the next background can be sampled from [2.1]. Similarly, forecasts can be derived from analysis vectors by propagating them forward in time (via the simulator) without assimilating any observations.
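The identical-twin structure described above — an M-member open loop, a separately designated truth system, and an assimilation loop conditioned on the truth's noisy observations — can be sketched as follows. This is an illustrative skeleton under assumed interfaces (the names are ours, not the study's code), and its analysis step reuses the perturbed-observations update of section 2.2:

```python
import numpy as np

def run_osse(M, T, f, h, r_var, x0_sampler, rng):
    """Minimal identical-twin OSSE: open loop, truth system, analysis.

    f : stochastic state transition, x -> x' (noise drawn inside f)
    h : observation operator, x -> scalar
    Returns (open_loop, analysis, truth, observations) as arrays with
    shapes (T, s, M), (T, s, M), (T, s), (T,).
    """
    X = np.stack([x0_sampler() for _ in range(M)], axis=1)   # (s, M) open loop
    Xa = X.copy()                                            # assimilation loop
    xt = x0_sampler()                                        # truth system
    ol, an, tr, zz = [], [], [], []
    for _ in range(T):
        X = np.stack([f(X[:, j]) for j in range(M)], axis=1)    # open loop
        Xa = np.stack([f(Xa[:, j]) for j in range(M)], axis=1)  # background
        xt = f(xt)
        z = h(xt) + rng.normal(0.0, np.sqrt(r_var))             # synthetic obs
        # perturbed-observations EnKF analysis (as in section 2.2)
        zh = np.array([h(Xa[:, j]) for j in range(M)])
        xd = Xa - Xa.mean(axis=1, keepdims=True)
        zd = zh - zh.mean()
        gain = (xd @ zd / (M - 1)) / (zd @ zd / (M - 1) + r_var)
        Xa = Xa + np.outer(gain, z + rng.normal(0.0, np.sqrt(r_var), M) - zh)
        ol.append(X); an.append(Xa); tr.append(xt); zz.append(z)
    return np.array(ol), np.array(an), np.array(tr), np.array(zz)
```

Running many such OSSEs, each with a different open loop member promoted to the truth system, yields the Monte Carlo statistics discussed below.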
We will denote a forecast sample x_t^{f(τ),j}, which represents a sample of the forecast distribution for timestep t made after assimilating the observation at time t - τ. Notice that x_t^{f(0),j} = x_t^{a,j}, and that the open loop sample x_t^j, at time t, is not a sample from the filter prior. Since the choice of truth system is random, it is necessary to repeat each OSSE a number of times, in this case K times. The results of analyzing these OSSEs will thus be a comparison between analysis and forecast samples against a single open loop sample. In this case we used K = M, such that each open loop sample was used individually as the truth system for a single OSSE; this is necessary for the reason explained in section 2.4. The most common way to evaluate an OSSE is in terms of the accuracy of the ensemble mean; for example, Pauwels et al. (2007) compared the mean-squared errors (MSE) of the expected values of open and assimilation loop state estimates. We propose instead to quantify the contributions of (i-iii) (from section 1) to uncertainty in the posterior state distributions.

2.4. Quantifying Observation Utility and Filter Efficiency

The amount of information contained in a realization x of a random variable X, distributed over a discrete event space Ω according to p(x), is -ln p(x) (Shannon 1948). The entropy of the distribution is the expected amount of information from a sample of X:

H(X) = - Σ_{x ∈ Ω} p(x) ln p(x)    [6]

Entropy can be interpreted as a measure of uncertainty about a realization drawn from the distribution p(x).
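Equation [6] is straightforward to estimate from an ensemble by discretizing the continuous sample into histogram bins. A minimal sketch (the function name is illustrative; this is the plug-in maximum likelihood estimator, which carries the small negative bias discussed in section 3.2):

```python
import numpy as np

def sample_entropy(x, bins):
    """Plug-in (MLE) estimate of discrete Shannon entropy [6], in nats,
    after discretizing a continuous sample into histogram bins."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()   # empirical bin probabilities
    return -np.sum(p * np.log(p))
```

For a large uniform sample the estimate approaches ln(number of bins), the maximum possible entropy for that discretization, while a degenerate (constant) sample has zero entropy.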
The expected amount of information about one random variable X contained in a realization of another random variable Z is called the mutual information between X and Z:

I(X; Z) = Σ_x Σ_z p(x, z) ln [ p(x, z) / (p(x) p(z)) ]    [7]

The mutual information normalized by the entropy of the predictand is known as the Theil index (Theil 1967), which is a fractional measure of the expected uncertainty reduction due to Bayesian conditioning since:

U(X; Z) = I(X; Z) / H(X) = (H(X) - H(X | Z)) / H(X)    [8]

If we approximate the HMM state and observation spaces as discrete, the Theil utility of any single observation z_t taken at time t for reducing uncertainty in the state dimension x_{t,i} can be estimated from the background ensembles:

U_{t,i} = I(x_{t,i}; z_t) / H(x_{t,i})    [9]

Theil utility ranges between zero and one: U_{t,i} = 1 implies that the mapping from x_{t,i} to z_t is deterministic and injective, and U_{t,i} = 0 implies that x_{t,i} and z_t are statistically independent. In general, Theil utility could be used to estimate the utility of any observation for estimating any simulator component, including parameters, states, other observations or forecasts.

The fraction of entropy in the analysis distribution due to non-injectivity of the observation function is proportional to one minus the Theil utility of observations with no observation error:

Φ^s_{t,i} = (1 - U_{t,i}(δ)) H(x_{t,i}) / H(x^a_{t,i})    [10]

where the Theil utility is denoted as functionally dependent on the observation error distribution p_ε which, when there is zero observation uncertainty, is the Dirac delta function δ. The fraction of entropy in the analysis distribution due to observation error is:

Φ^o_{t,i} = (U_{t,i}(δ) - U_{t,i}(p_ε)) H(x_{t,i}) / H(x^a_{t,i})    [11]

and the fraction due to approximating Bayes' Law at each time step is the scaled difference between the entropy of the analysis distribution and the entropy of the true Bayesian conditional distribution:

Φ^f_{t,i} = (H(x^a_{t,i}) - H(x_{t,i} | z_t)) / H(x^a_{t,i})    [12]

Generally, [10-12] can be used to assess the fraction of posterior uncertainty in a state dimension at a given time due to assimilating an observation at any earlier time.
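Mutual information [7] and the Theil utility [9] can likewise be estimated from paired ensemble samples using a two-dimensional histogram. A minimal sketch under the same plug-in-histogram assumptions (names are illustrative, not from the study):

```python
import numpy as np

def theil_utility(x, z, bins):
    """Theil utility [9]: mutual information I(x; z) divided by H(x),
    both estimated from paired samples via a 2-D histogram (plug-in MLE)."""
    pxz, _, _ = np.histogram2d(x, z, bins=bins)
    pxz = pxz / pxz.sum()                       # joint bin probabilities
    px = pxz.sum(axis=1)                        # marginal of x
    pz = pxz.sum(axis=0)                        # marginal of z
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    nz = pxz > 0
    mi = np.sum(pxz[nz] * np.log(pxz[nz] / np.outer(px, pz)[nz]))
    return mi / h_x
```

When z is a near-deterministic function of x the estimated utility approaches one; for statistically independent samples it is near zero (slightly above, because the plug-in mutual information estimator is biased upward for finite samples).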
Specifically, this means that we can estimate the efficiencies for assimilating observations to make forecasts. The posterior entropy fractions related to forecast state distributions will be indexed by the forecast lead time, and the corresponding statistics will be called the simulator, observation, and filter inefficiency effects respectively. Since each OSSE was repeated K times to account for the random choice of truth system, and because the mutual information terms in [10] and [11] represent an expected value over observation space, the expected value of the entropy of the analysis distribution over truth systems was used in [10-12]. To ensure that mutual information estimates are comparable with the expected value of the entropy of the analysis sample, it is important that each open loop ensemble member be used as a truth system exactly once.

3. Demonstration: An OSSE for Estimating Root-Zone Soil Moisture

This section describes an example OSSE which compares the inefficiency metrics outlined in the previous section with a standard MSE evaluation. The example is motivated by the problem of estimating root-zone soil moisture from observations taken at the surface. Measurements of radar scattering and emission can provide noninvasive estimates of the water content of the top few centimeters of soil, with penetration depth dependent on wavelength: ~2 cm at C-band and ~5 cm at L-band. Many efforts have been made to estimate root-zone soil moisture from radiometer observations using data assimilation (e.g., Bolten et al. 2010; Crow and Wood 2003; Galantowicz et al. 1999; Li and Islam 1999; Margulis et al. 2002), including several observing system simulation experiments which test the feasibility of such approaches using synthetic observations (Crow and Reichle 2008; Dunne and Entekhabi 2005; Flores et al. 2012; Reichle et al. 2008; Reichle et al. 2001; Reichle et al. 2002b).
The OSSE described in this section assimilates synthetic observations of soil moisture in the surface soil layer (5 cm) into a three-layer soil moisture model with total depth of 30 cm.

3.1. A 3-Layer Soil Moisture Model

3.1.1. State Transition Function

The HMM simulator state transition function consisted of a dynamic soil moisture accounting model based on the two-layer model of soil hydrology developed by Mahrt and Pan (1984). The three-dimensional state vector consisted of the volumetric water content [m3/m3] in three soil layers; the surface layer was 5 cm thick and the three layers spanned the 30 cm depth of the root-zone. Model parameters included the Brooks and Corey (1964) hydraulic coefficients: porosity (φ; [m3/m3]), bubbling pressure (ψb; [cm]), saturated hydraulic conductivity (Ks; [cm/day]) and pore size distribution index (λ; [~]), as well as the residual moisture content (θr; [m3/m3]). The boundary condition was described by values of daily cumulative precipitation [cm/day] and daily cumulative potential evaporation [cm/day].

At the beginning of each time step, the average volumetric water content in each layer was used to estimate the unsaturated conductivity and soil diffusivity of that layer according to Brooks and Corey (1964):

K(θ) = Ks Θ^(3 + 2/λ)    [13.1]

D(θ) = [Ks ψb / (λ (φ - θr))] Θ^(2 + 1/λ)    [13.2]

where Θ = (θ - θr) / (φ - θr) is the effective saturation. Infiltration into the first two soil layers [cm/day] was calculated from the precipitation rate and the fluxes between layers [13.3.1, 13.3.2], direct evaporation [cm/day] from the top soil layer was calculated from the potential evaporation and the moisture state of that layer [13.4], and soil moisture in each layer was updated according to the resulting water balance [13.5.1-13.5.3]. The middle layer was added to the model by Mahrt and Pan (1984) to allow for sufficient infiltration.

3.1.2.
Simulation Period and Boundary Conditions

The three-layer model from section 3.1 was forced with daily precipitation and potential evaporation data collected at the Leaf River watershed in Mississippi, USA, beginning on Nov 5, 1952. Forcing data uncertainty was assumed to be multiplicative and log-normally distributed, so that the log-transformed (multiplicative) perturbations to measured precipitation and potential evaporation used to generate ensemble samples of the boundary condition were Gaussian distributed with mean zero. This forcing data uncertainty distribution, including the covariance between perturbations to precipitation and potential evaporation, was adapted from the distribution used by Reichle et al. (2007); we did not consider temporal autocorrelation. We also assumed that 8% of rainfall events were undetected, and added a rainfall amount sampled uniformly from the entire set of precipitation measurements to 8% of the days when zero rainfall was reported. We did not consider state transition uncertainty, so that all uncertainty was assumed to come from measurements of the boundary condition. The model parameters (φ, ψb, Ks, λ and θr) were held at fixed values.

3.1.3. Observation Function

Synthetic observations of 5 cm soil moisture were generated according to:

ẑ_t = h(x_t) = x_{t,1}    [14.1]

z_t = ẑ_t + ε_t,  ε_t ~ N(0, σ_z²)    [14.2]

This observation error distribution has standard deviation σ_z [m3/m3], so that the observation of the surface soil moisture state was expected to fall within ±2σ_z [m3/m3] of the true state 95% of the time. The initial state for each ensemble member was sampled from a distribution representing an initially uncertain soil moisture profile with a mean of 0.10 [m3/m3] in each layer.

3.2. Ensemble Size

In application, ensemble size will be dependent on the desired accuracy and precision of the various entropy estimates described in section 2.4.
The entropy and mutual information estimates in [6-7] are maximum likelihood estimators (MLEs) for discrete random variables, and the asymptotic bias of an MLE estimator of entropy is bounded by -ln(1 + (B - 1)/M) ≤ bias ≤ 0, where B is the number of histogram bins and M is the sample size; the upper bound at zero is tight when M ≫ B (Paninski 2003). We approximated the continuous HMM state and observation spaces as discrete using a histogram bin width given by Scott (2004):

w_t = 3.49 σ̂_t M^(-1/3)    [15]

where σ̂_t refers to the standard deviation of the background state sample, the analysis state sample, or the observation sample at time t, and chose an ensemble size according to this bound on the bias of the scaled filter inefficiency effect [16]. Assuming that the range of each sample from a distribution for which we wished to estimate entropy was approximately 7.5 standard deviations (7.5 σ̂_t), the number of bins for estimates of the entropy of the joint distributions between states and observations grows as the square of B, and the biases of the quantities in [10-12] were bounded above accordingly. Given that this is a loose bound, and noting that the mean of the entropies of the marginal EnKF posteriors was about 3.4 nats, we chose an ensemble size so that the upper bound on the fraction of posterior uncertainty due to filter inefficiency was approximately 0.34 nats, or about 10% of the mean of the entropies of the filter posteriors. This ensemble size is much larger than what is needed to estimate the EnKF prior (Reichle et al. 2002a).

3.3. Methods of OSSE Analysis

We report the time-averaged simulator, observation and filter inefficiency effects related to analysis and forecast posterior state distributions:

Φ̄_i = (1/T) Σ_t Φ_{t,i}    [17]

where Φ_{t,i} is any one of the inefficiency effects defined in [10-12]. Each time-averaged statistic estimates an inefficiency effect related to assimilating observations to reduce uncertainty about states at the present time and up to several timesteps into the future.
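The ensemble-size reasoning of section 3.2 — Scott's bin width [15], the bin count implied when a sample spans roughly 7.5 standard deviations, and Paninski's bound on the magnitude of the (negative) entropy-estimator bias — can be sketched numerically. The helper names and the example numbers below are illustrative, not the values used in the study:

```python
import numpy as np

def scott_width(sd, m):
    """Scott's histogram bin width [15] for a sample of size m with std sd."""
    return 3.49 * sd * m ** (-1.0 / 3.0)

def entropy_bias_bound(n_bins, m):
    """Magnitude of Paninski's (2003) asymptotic bound on the (negative)
    bias of the plug-in entropy estimator: |bias| <= ln(1 + (B - 1) / m)."""
    return np.log(1.0 + (n_bins - 1.0) / m)

def bins_for_range(sd, m, span_in_sds=7.5):
    """Number of bins implied by Scott's rule when the sample spans
    span_in_sds standard deviations (the assumption made in section 3.2)."""
    return int(np.ceil(span_in_sds * sd / scott_width(sd, m)))
```

For example, a sample of 10,000 members implies bins_for_range(1.0, 10000) = 47 bins, and the bias bound for the corresponding joint (squared-bin) histogram, entropy_bias_bound(47**2, 10000), is about 0.2 nats — illustrating how the bound shrinks only slowly as the ensemble grows.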
MSE statistics are also reported; these consist of, for each state dimension , the expected value over truth systems of the MSE of the mean of the analysis state vector, normalized by the MSE of the ensemble mean of the open loop state vector (where the former is the ensemble mean of the analysis state sample due to assimilating observations of the truth system through time ): [18.1] as well as the similarly normalized MSE of the assimilation loop forecast ensemble mean state estimates: [18.2] These statistics were calculated for EnKF assimilation of observations generated with the error distribution , and also with a reduced error distribution , to directly test the effect of reducing observation error on assimilation results. The effect of reducing observation error was quantified as the fractional reduction in MSE due to decreasing observation error variance, which is notated .

3.4. Assessing the Effects of Simulator Uncertainty

A separate experiment was conducted to assess the relative effects of boundary condition uncertainty and state transition uncertainty by scaling the and variances. A separate simulator was sampled for eighty-one linearly spaced values of , so that multiplicative perturbations to measured forcing data came from the log-normal distributions and state transition perturbations came from the distributions . The Theil utilities of perfect observations (no observation error) for estimating the state at one timestep in the future were calculated from each of these simulators using the open loop samples. Since estimating Theil utility did not require data assimilation, no truth systems were sampled. We report the time-averaged Theil utility over timesteps as a function of the scaling factors .

3.5. OSSE Results

Figure 1 plots the first sixty timesteps of the open loop ensembles and assimilation loop ensembles resulting from assimilating observations from a single truth system with .
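The normalized MSE ratio of [18] reduces to a few lines of code. In this sketch the array layout (truth systems by timesteps) is an assumption for illustration.

```python
import numpy as np

def normalized_mse(assim_mean, open_loop_mean, truth):
    """Normalized MSE statistic in the spirit of [18.1]/[18.2]: the MSE
    of the analysis (or forecast) ensemble-mean trajectory divided by
    the MSE of the open-loop ensemble-mean trajectory, both measured
    against the synthetic truth and averaged over truth systems and
    timesteps. Values below one indicate assimilation reduced error."""
    assim_mean = np.asarray(assim_mean, dtype=float)
    open_loop_mean = np.asarray(open_loop_mean, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mse_assim = np.mean((assim_mean - truth) ** 2)
    mse_open = np.mean((open_loop_mean - truth) ** 2)
    return mse_assim / mse_open
```

The same function serves for both [18.1] and [18.2] by passing either the analysis or the forecast ensemble-mean trajectories.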
The EnKF MSE statistics calculated by [18] (Figure 2) were always smaller than one, which indicated that mean-squared errors of the expected value of the analysis posteriors were smaller than mean-squared errors of the expected values of the open loop priors. Figure 2 shows that the expected value (over time and truth systems) of the effects on MSE of assimilating observations on estimates of the deeper states persist for one and two timesteps, respectively, before attenuating. This is an indication that the boundary condition (precipitation and evaporation), which largely influences the surface state directly, was the predominant contributor of uncertainty in this simulator. On average, assimilating observations reduced the ensemble mean MSE at the time of observation by about 80%, and reducing the observation error further reduced the ensemble mean MSE by almost 100% in all three states. It was possible to estimate all three true states exactly using precise observations; however, the effect of reducing observation error variance was almost completely mitigated by the first forecast timestep.

Figure 3 illustrates the time-averaged inefficiency metrics related to each model state. The predominant contribution to posterior state uncertainty was the observation mapping, except when estimating the surface state at the time of observation. In the latter case, observation uncertainty contributed approximately 80% of the posterior state entropy and EnKF assumptions contributed just over 20%. This result might appear to differ from the MSE analysis, which showed that close to 100% of posterior estimation error could be mitigated by reducing observation error; however, the efficiency analysis showed that when the observation variance is effectively greater than zero, the posterior uncertainty was partly due to observation error and partly due to EnKF simplifications of the prior and posterior probability density functions.
Gaussian pdfs have the highest entropy of any density function with a given variance, and the true density functions over the states do not have unbounded support, so approximating these as Gaussian in the prior and posterior added entropy to the posterior. Most importantly, the fact that some of the filter inefficiency effects (bottom subplot in Figure 3) were greater than zero indicates that there was unused information in the observations - that is, uncertainty was not reduced as much as it could have been. The filter did not estimate the density function of the surface layer correctly, and this artifact propagated through the soil layer between time steps. The MSE analysis showed that it took one timestep for uncertainty in the boundary condition to propagate between soil layers, and here we see evidence of assimilated information propagating between soil layers in the same way. In this particular case, the observed state was more closely related to the deeper state at a one-timestep lead than at the time of observation: for example, the average Theil utility of for estimating was , and the average Theil utility of for estimating was , indicating that filtering might be the wrong way to approach the problem of estimating . The soil moisture reanalysis by Dunne and Entekhabi (2005) highlighted a similar effect by showing that an approximate smoother resulted in improved soil moisture state estimates as compared to an approximate filter; however, our analysis indicates that a six-dimensional posterior might be sufficient (conditioning distributions of , , and on ). Since the observation inefficiency effects were the largest, we evaluated the effects of scaling the variances of and ; Figure 4 plots the time-averaged Theil utilities of perfect observations (Dirac delta function error distributions) for reducing uncertainty in the state at one timestep in the future. This figure takes the form of a response surface due to varying .
The utility of perfect observations decreased with decreasing uncertainty in the boundary condition, because observations of a highly uncertain system were more informative than observations of a system with little uncertainty. When the boundary condition uncertainty was high, observation utility decreased with increasing state transition error perturbations, because state transition perturbations added noise to the system and served to decorrelate and . When the boundary condition uncertainty was low, increasing state transition perturbations increased the utility of observations, because there was no value in observations if there was no uncertainty in the system.

4. Summary and Discussion

The information-based analysis of observing system simulation experiments described in this paper is novel and has the potential to inform the design of data assimilation systems. There are a number of methods for analyzing the value of observations in a data assimilation context (e.g. Gelaro et al. 2007); however, existing methods are only able to provide estimates of the value of observations in the context of a particular filter, which does not allow for any estimate of the efficiency of the filter itself. By framing the problem in the context of uncertainty reduction, it is possible to make direct estimates of the informativeness of observations and of the propagation of information through an NLDS simulator without additional simplifying assumptions.

The major limitation of this method is that it is only possible to measure the information content in a small subset of observations. Due to the curse of dimensionality (Bellman 2003), it is difficult to estimate the entropy of, or mutual information between, probability distributions of high-dimensional random variables. This has two implications. First, we cannot calculate the potential utility of a smoother, since a time series of observations must be treated as a -dimensional variable.
The method we have demonstrated suggests an analysis of the expected value of collecting additional observations, or of increasing observing frequency, by calculating the expected and actual reduction in entropy due to assimilating given that have been assimilated by some filter; however, such estimates are not independent of filter assumptions, because the filter influences the prior at each timestep. The second implication is that the method is restricted to estimating the efficiency of assimilating only one-dimensional observations at any given time. If more than one type of observation (perhaps from different observing systems) or spatially distributed observations were available, each observation dimension would have to be analyzed separately. In certain cases it might be practical to project sets of cotemporaneous observations into a low-dimensional space and then to estimate the mutual information between these projected observations and model states. Certainly, mutual information-based methods similar to what we have outlined in this paper could be developed to test locality assumptions in spatially distributed data assimilation applications. There are also methods for calculating joint entropies using kernel approximations of the underlying probability mass functions, which do not require reprojections (e.g., Ahmad and Lin 1976); however, these methods are ultimately subject to the same curse of dimensionality that limits all nonparametric density estimation, and are only practical up to a few dimensions. In both the Theory and Demonstration sections (sections 2 and 3), we estimated the entropy of one- and two-dimensional probability density functions and considered only one observation at a time.

The computational cost of the proposed OSSE approach is dominated by sampling the HMM simulator, which requires running an NLDS simulator.
The total number of model evaluations required is , and the ensemble size depends on the desired accuracy and precision of the entropy calculations. The central limit theorems for many discrete entropy estimators are known and can be used to determine an appropriate ensemble size based on application requirements.

It is important to remember that uncertainty reduction does not necessarily equate to improved accuracy. In many applications, such as flood or drought forecasting, Type II error can have serious consequences, and the possibility of over-constrained or inaccurate posteriors should always be evaluated. The method we propose does not evaluate bias in the assimilation, only whether uncertainty reduction meets its potential.

The strategy of evaluating data assimilation based on tracking the information contained in observations is intuitive. It is important to understand how this information 'penetrates' the model to allow for improved estimates of states that are only indirectly related to observations. Kalman-type filters assume linear relationships between model states, vis-a-vis the Gaussian prior, and estimating the nonparametric or semi-parametric mutual information between states and observations allows for an evaluation of this assumption without knowing the joint state distribution a priori. By understanding what information is contained in observations about which model states, we might be able to design directed assimilation strategies using Bayes' Law to condition appropriately chosen marginal distributions.

Acknowledgement

This work was supported by a grant from the NASA Terrestrial Ecology program entitled Ecological and agricultural productivity forecasting using root-zone soil moisture products derived from the NASA SMAP mission; principal investigator Wade T.
Crow, and by grant number 1208294 from the US National Science Foundation East Asia and Pacific Summer Institute for US Graduate Students, titled Estimating agricultural water use: an approach to evaluating the utility of data assimilation; principal investigator Grey S. Nearing.

Figures

Figure 1: Example EnKF synthetic twin OSSE results for a single truth system are plotted for sixty timesteps with an ensemble size of . The observation error distribution for this experiment was , and observations were assimilated at every timestep. The top panel compares prior and posterior distributions over the state, and the bottom panel compares the prior and posterior distributions over observations.

Figure 2: Normalized differences in mean-squared errors of posterior EnKF state forecasts (from [18.1] and [18.2]) due to assimilating observations are plotted as a function of forecast time (top and middle panels). The bottom panel shows the fractional improvement due to reducing the observation uncertainty standard deviation from to .

Figure 3: Time-averaged simulator, observation error, and EnKF inefficiency effects related to assimilating an observation at time to improve forecasts of soil moisture states at forecast times .

Figure 4: Combined effects on time-averaged Theil utility of observations of (0-5 cm soil moisture) for predicting (5-15 cm soil moisture at one timestep in the future) due to co-varying the scale of the variances of distributions over boundary conditions and state transition perturbations.

References

Ahmad, I.A., & Lin, P.E. (1976). Nonparametric estimation of entropy for absolutely continuous distributions. IEEE Transactions on Information Theory, 22, 372-375, doi:10.1109/TIT.1976.1055550

Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past, present, and future.
Bulletin of the American Meteorological Society, 67, 687-695, doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2

Bellman, R. (2003). Dynamic Programming. Mineola, NY: Dover Publications, Inc.

Bolten, J.D., Crow, W.T., Zhan, X.W., Jackson, T.J., & Reynolds, C.A. (2010). Evaluating the utility of remotely sensed soil moisture retrievals for operational agricultural drought monitoring. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 3, 57-66, doi:10.1109/JSTARS.2009.2037163

Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology Papers, Colorado State University

Crow, W.T., & Reichle, R.H. (2008). Comparison of adaptive filtering techniques for land surface data assimilation. Water Resources Research, 44, W08423, doi:10.1029/2008WR006883

Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the sequential assimilation of remotely sensed surface soil moisture. Journal of Hydrometeorology, 7, 421-432, doi:10.1175/JHM499.1

Crow, W.T., & Wood, E.F. (2003). The assimilation of remotely sensed soil brightness temperature imagery into a land surface model using Ensemble Kalman filtering: a case study based on ESTAR measurements during SGP97. Advances in Water Resources, 26, 137-149, doi:10.1016/S0309-1708(02)00088-X

Dunne, S., & Entekhabi, D. (2005). An ensemble-based reanalysis approach to land data assimilation. Water Resources Research, 41, W02013, doi:10.1029/2004WR003449

Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9

Flores, A.N., Bras, R.L., & Entekhabi, D. (2012). Hydrologic data assimilation with a hillslope-scale resolving model and L-band radar observations: Synthetic experiments with the ensemble Kalman filter. Water Resources Research, in press, doi:10.1029/2011WR011500

Galantowicz, J.F., Entekhabi, D., & Njoku, E.G.
(1999). Tests of sequential data assimilation for retrieving profile soil moisture and temperature from observed L-band radiobrightness. IEEE Transactions on Geoscience and Remote Sensing, 37, 1860-1870, doi:10.1109/36.774699

Gelaro, R., Zhu, Y., & Errico, R.M. (2007). Examination of various-order adjoint-based approximations of observation impact. Meteorologische Zeitschrift, 16, 685-692, doi:10.1127/0941-2948/2007/0248

Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing, 140, 107-113, doi:10.1049/ip-f-2.1993.0015

Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME - Journal of Basic Engineering, 82, 35-45, doi:10.1115/1.3662552

Li, J., & Islam, S. (1999). On the estimation of soil moisture profile and surface fluxes partitioning from sequential assimilation of surface layer soil moisture. Journal of Hydrology, 220, 86-103, doi:10.1016/S0022-1694(99)00066-9

Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401, doi:10.1029/2006WR005756

Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer Meteorology, 29, 1-20

Margulis, S.A., McLaughlin, D., Entekhabi, D., & Dunne, S. (2002). Land data assimilation and estimation of soil moisture using measurements from the Southern Great Plains 1997 Field Experiment. Water Resources Research, 38, 18pp, doi:10.1029/2001WR001114

Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear stochastic models. Tellus Series A: Dynamic Meteorology and Oceanography, 51, 167-194

Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15, 1191-1253

Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., & Defourny, P. (2007).
Optimization of a coupled hydrology-crop growth model through the assimilation of observed soil moisture and leaf area index values using an ensemble Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942

Reichle, R.H., Crow, W.T., & Keppenne, C.L. (2008). An adaptive ensemble Kalman filter for soil moisture data assimilation. Water Resources Research, 44, W03423, doi:10.1029/2007WR006357

Reichle, R.H., Koster, R.D., Liu, P., Mahanama, S.P.P., Njoku, E.G., & Owe, M. (2007). Comparison and assimilation of global soil moisture retrievals from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) and the Scanning Multichannel Microwave Radiometer (SMMR). Journal of Geophysical Research-Atmospheres, 112, D09108

Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2001). Variational data assimilation of microwave radiobrightness observations for land surface hydrology applications. IEEE Transactions on Geoscience and Remote Sensing, 39, 1708-1718, doi:10.1109/36.942549

Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2002a). Hydrologic data assimilation with the ensemble Kalman filter. Monthly Weather Review, 130, 103-114, doi:10.1175/1520-0493(2002)130<0103:HDAWTE>2.0.CO;2

Reichle, R.H., Walker, J.P., Koster, R.D., & Houser, P.R. (2002b). Extended versus ensemble Kalman filtering for land data assimilation. Journal of Hydrometeorology, 3, 728-740, doi:10.1175/1525-7541(2002)003<0728:EVEKFF>2.0.CO;2

Scott, D.W. (2004). Multivariate density estimation and visualization. In J.E. Gentle, W. Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and Methods (pp. 517-538). New York: Springer

Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423

Theil, H. (1967). Economics and Information Theory. Chicago: Rand McNally

Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation.
Physica D: Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017

APPENDIX C: INFORMATION LOSS IN ESTIMATION OF AGRICULTURAL YIELD: A COMPARISON OF GENERATIVE AND DISCRIMINATIVE APPROACHES

Grey S. Nearing and Hoshin V. Gupta
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
Article in preparation for Hydrology and Earth System Sciences.

Abstract

Data assimilation and regression are two commonly used methods for predicting agricultural yield from remote sensing observations. Data assimilation is a generative approach because it requires explicit approximations of the Bayesian prior and likelihood to compute the probability density function of yield conditional on observations; regression is discriminative because it models the conditional yield density function directly. Here, synthetic experiments were used to evaluate the abilities of two methods - the ensemble Kalman filter (EnKF) and Gaussian process regression (GPR) - to extract information from observations. The amount of information in an observation was formally quantified as the mutual information between that observation and end-of-season biomass. We formally define information loss, used information, and bad information as partial divergences from the true Bayesian posterior (yield conditional on the observations). Our results suggest that the simpler discriminative GPR approach can be as efficient as the more complex generative EnKF at extracting information from observations, and may therefore be better suited to dealing with the practical problems associated with remotely sensed data (inhomogeneity of the satellite image pixel and mismatches in spatial resolution). This is important because discriminative methods can be applied without the need for a physical or conceptual simulator. Our method for analyzing information use has many potential applications.
Approximations of Bayes' law are used regularly in predictive models of environmental systems of all kinds, and the efficiency of such approximations has not been formally addressed.

1. Introduction

From a probabilistic perspective, the goal of a yield model is to estimate the probability density function (pdf) of yield conditional on remote sensing observations. Two fundamentally different approaches to conditioning yield estimates on observations have emerged in the literature. The first, and most common, is discriminative (see Ng and Jordan 2001 for a definition of discriminative and generative approaches to classification), meaning that the conditional pdf is estimated directly by regression to map observations onto the probability space of yield estimates; both parametric and semi-parametric regressions have been used (e.g., Jiang et al. 2004; Koppe et al. 2012; Kouadio et al. 2012; Li et al. 2007; Uno et al. 2005; Ye et al. 2006). The second involves conditioning dynamic physical or conceptual simulations of crop phenology on information available in remote sensing observations. Usually, in this case, observations are used to estimate simulator parameters (called re-initialization; e.g., Dente et al. 2008; Doraiswamy et al. 2005; Maas 1988) or simulator states (called data assimilation; e.g., de Wit and van Diepen 2007; Nearing et al. 2012; Pauwels et al. 2007; Pellenq and Boulet 2004; Thorp et al. 2010). Techniques for re-initialization and data assimilation are usually Bayesian (although this is overlooked by many authors) in the sense that repeated evaluations of the simulator are used to either implicitly or explicitly define a pdf over simulator states (the Bayesian prior) and a pdf of observations conditional on states (the Bayesian likelihood).
Specifically, data assimilation is a generative method for Bayesian conditioning, since formal estimates of the distribution of simulator states conditional on observations are obtained by approximating Bayes' Law (Wikle and Berliner 2007). The data assimilation, or generative, approach is conditional on simulator assumptions about the nature of crop development (e.g., physics, biology, phenology) and requires weather data at the temporal resolution of the numerical integration, whereas the regression, or discriminative, approach is independent of simulator assumptions and does not require specific weather data. Since generative methods are conditional on simulator assumptions, their performance is maximized when those assumptions are correct; in particular, generative methods achieve optimal performance in a synthetic setting (e.g., Nearing et al. 2012). Discriminative methods are not conditional on simulator assumptions. For example, data assimilation applications typically assume homogeneity in land surface characteristics at the scale of a remote sensing observation (e.g., one image pixel), whereas regression models can be trained directly using historical observations and yield statistics, and thus can implicitly account for mismatches in spatial resolution between agricultural areas and observations. For this reason, discriminative methods are generally preferable to generative methods for predicting yield, as long as they can be relied upon to efficiently extract information from observations.

In this paper, we conduct synthetic experiments to compare the amount of information that is extracted from remote sensing observations by regression and by data assimilation. (Nearing et al.
2013) present a set of metrics, based on Shannon's (1948) theory of communication, that can be used to quantify the contribution of data assimilation inefficiency to residual uncertainty in posterior state estimates (via synthetic data experiments); here we extend that framework to explicitly measure information loss and information use. Specifically, we compare the ability of the ensemble Kalman filter (EnKF; Evensen 2003) with that of Gaussian process regression (GPR; Rasmussen and Williams 2006) to extract information from observations of leaf area index (LAI) and soil moisture - two types of observations related to agricultural productivity that are currently available from earth observing satellites (e.g., Kerr et al. 2010; Knyazikhin et al. 1999).

The paper is organized as follows: Section 2 provides basic information about the crop simulator used to generate and assimilate synthetic remote sensing observations (section 2.1), the EnKF (section 2.2), and GPR (section 2.3). The simulation experiments are detailed in section 2.4, and section 2.5 explains how we estimate the information content of observations and information loss. Section 3 presents the results of these experiments and section 4 concludes.

2. Methods

2.1. A Crop Development Simulator

Most crop simulators estimate yield as a fraction of biomass at the time of harvest; therefore, our objective was to estimate a probability distribution over plant biomass at harvest, conditional on observations of either leaf area index or soil moisture taken at some times during the growing season. A prior distribution over the harvest biomass was generated by a crop development simulator based on the EPIC model (Williams et al. 1989). Uncertainty in these prior biomass estimates was due to uncertainty in daily weather conditions and uncertainty in the crop development simulator itself.
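The discriminative (GPR) side of the comparison introduced above can be sketched with a minimal Gaussian process regression. This is a textbook RBF-kernel implementation (Rasmussen and Williams 2006), not the configuration used in the experiments; the length scale and noise level are illustrative assumptions.

```python
import numpy as np

def gpr_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Minimal GP regression with an RBF kernel. Returns the posterior
    mean and variance of the latent function at x_test. The kernel
    hyperparameters are placeholders, not fitted values."""
    def rbf(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-0.5 * d2 / length_scale ** 2)
    # Kernel matrix with observation-noise variance on the diagonal
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    Kss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```

In the yield application, x_train would hold historical observations (e.g., LAI) and y_train the corresponding end-of-season biomass, so the predictive distribution plays the role of the conditional yield pdf.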
In general terms, the dynamic system simulator can be thought of as a numerical integrator of the stochastic differential equation (Miller et al. 1999): [1] where and are the simulator state and boundary condition at time , respectively, and is a Wiener process. Solutions to [1] at discrete times are approximated by sampling a Markov process (Liu and Gupta 2007): [2.1] where the state transition function is an approximation of the drift function , and the are samples of noise with (Gaussian) distributions . Periodically the system is observed by , which is dependent on the state at the current time according to the observation function : [2.2] where represents random observation error drawn from the arbitrary distribution . [2] constitutes a hidden Markov model (HMM) and implies probability distributions and , respectively, where is the number of simulation time steps. The following subsections describe the state transition function (section 2.1.1) and the observation function (section 2.1.2), as well as the distributions over the boundary condition, , and state transition perturbations, , which represent uncertainty about weather and the simulator's representation of phenological development (section 2.1.3).

2.1.1. The State Transition Function

The crop development simulator we used has 5 state dimensions, consisting of a Heat Unit Index ( ; [~]), which represents the accumulated fraction of growing degree days (C day) needed for maturity, Plant Biomass ( ; [kg/m2]), Leaf Area Index ( ; [m2/m2]), volumetric soil moisture in the root zone ( ; [m3/m3]), and volumetric soil moisture in a 5 cm deep evaporation zone ( ; [m3/m3]). Simulator parameters are listed in Table 1 and the state transition functions are detailed in Appendix A; soil texture was assumed to be homogeneous with depth. The simulator was integrated at a daily time step using daily mean temperature ( ; [C]), daily cumulative solar radiation ( ; [MJ/day]), and daily cumulative precipitation ( ; [cm]).
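The hidden Markov model of [2] can be sampled generically as below. This sketch assumes caller-supplied state-transition and observation functions and scalar noise scales; it stands in for, rather than reproduces, the crop simulator of Appendix A.

```python
import numpy as np

def sample_hmm(f, h, x0, forcings, q_std, r_std, rng):
    """Draw one trajectory and its synthetic observations from the HMM
    of [2]. f(x, u) is the state transition function, h(x) the
    observation function; q_std and r_std are placeholder scales for
    the state-transition and observation noise distributions."""
    states, obs = [], []
    x = np.asarray(x0, dtype=float)
    for u in forcings:
        # [2.1]: propagate the state and add a transition perturbation
        x = f(x, u) + rng.normal(0.0, q_std, size=x.shape)
        states.append(x)
        # [2.2]: observe the current state with random observation error
        obs.append(h(x) + rng.normal(0.0, r_std))
    return np.array(states), np.array(obs)
```

Calling this repeatedly with independent forcing samples produces the Monte Carlo ensemble used throughout: each call is one sample path of the implied joint distribution over states and observations.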
2.1.2. The Observation Function

Simulator predictions of remote sensing observations of LAI and surface soil moisture were generated according to the identity relationship: [3] where is a diagonal matrix with elements , , and . Realistic observation errors are [m2/m2] for MODIS observations (Tan et al. 2005) and [m3/m3] for SMAP observations (Entekhabi et al. 2010). It is not possible to observe the root zone soil moisture directly by satellite; however, we performed tests of assimilating observations of the total root zone soil moisture to assess the relative value of complete observations of the soil water state compared to observations of surface level soil moisture only. These observations were synthesized according to [3] with the standard errors and . Realistic observation overpass frequencies are every 4 days for surface soil moisture observations (Entekhabi et al. 2010) and an 8-day composite for MODIS LAI observations (Tan et al. 2005). We will investigate the effect of observing frequency on yield estimation results.

2.1.3. Simulation Period and Uncertainty Sampling

The simulator in Appendix A was used to simulate a rain-fed wheat crop grown in Rothamsted, UK, with planting on May 10 and harvest days after planting. Simulator parameter values (listed in Table 1) were taken from the Spring Wheat column of Table 8.2.1 by Arnold et al. (1995). State transition perturbations were independent between time steps and state dimensions, and came from heteroscedastic distributions , where is the identity matrix. Daily weather data collected by the Institute of Arable Crops Research (formerly the Rothamsted Research Station), spanning 40 years from 1959-1999, are available from the Decision Support System for Agrotechnology Transfer version 4 release (Hoogenboom et al. 2008).
Monthly statistics in the form of mean maximum and minimum daily temperature, the number of wet days per month, the total monthly precipitation, and the mean and standard deviation of daily net radiation were derived from these daily data for the months of May, June, and July of each year, and were used to generate samples of daily weather data from the uncertainty distribution outlined by Schuol and Abbaspour (2007). Daily precipitation samples were generated by a two-state first-order Markov process, where the unconditional probability of a wet day was: [4] where is the number of days in the month. The conditional probabilities of a wet day following a wet day and of a wet day following a dry day were: [5.1] [5.2] The rainfall amount on any dry day was zero, and the rainfall amount on a wet day was given by a gamma distribution: [6.1] with parameters: [6.2] [6.3] Daily mean temperature samples were generated by averaging samples from Gaussian distributions around daily maximum and minimum temperatures. The distribution over daily maximum temperatures was dependent on whether it was a wet or a dry day: [7.1] whereas the distribution over daily minimum temperatures was not so dependent: [7.2] Samples of daily net radiation were drawn from a Gaussian distribution with mean and variance , and the samples were ordered so that the highest value of net radiation occurred on the day with the highest mean temperature, and so forth. Since yield estimates are often desired for locations that are not well monitored, we considered the situation where no knowledge of the weather was available other than a historical record of these monthly statistics.
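The two-state Markov rainfall generator described above can be sketched as follows. Because the contents of [5] and [6.2]-[6.3] are not reproduced here, this sketch substitutes the common WXGEN-style approximations P(wet|dry) = 0.75 P(wet) and P(wet|wet) = 0.25 + P(wet|dry), and an assumed gamma shape parameter; these are labeled assumptions, not the dissertation's values.

```python
import numpy as np

def daily_precip(n_days, n_wet, monthly_total, rng, shape=1.0):
    """Two-state first-order Markov precipitation generator.
    Transition probabilities use a WXGEN-style approximation (an
    assumption); wet-day amounts are gamma distributed with mean equal
    to the average rainfall per wet day; `shape` is assumed."""
    p_wet = n_wet / n_days           # unconditional wet-day probability [4]
    p_wd = 0.75 * p_wet              # wet following dry (assumed form)
    p_ww = 0.25 + p_wd               # wet following wet (assumed form)
    mean_amount = monthly_total / max(n_wet, 1)
    amounts = np.zeros(n_days)
    wet = rng.random() < p_wet
    for d in range(n_days):
        if wet:
            # gamma(shape, scale) with mean = shape * scale
            amounts[d] = rng.gamma(shape, mean_amount / shape)
        # first-order Markov transition to the next day's state
        wet = rng.random() < (p_ww if wet else p_wd)
    return amounts
```

Temperature and radiation sampling would be layered on top of the wet/dry sequence in the same fashion, conditioning the maximum-temperature distribution on the day's state as in [7.1].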
The forcing data uncertainty distribution was therefore sampled by choosing a random weather year and then generating a random daily time series from that year's monthly weather statistics according to the Schuol and Abbaspour (2007) uncertainty distribution.

2.2. Data Assimilation and the Ensemble Kalman Filter

From a Bayesian perspective, the HMM state-transition distribution represents prior knowledge (before observations) about the state of the system, and the observation distribution is the Bayesian likelihood. The application of Bayes' law to estimate a time series of HMM states given some observations is called a smoother:

$p(x_{1:T} \mid y_{1:T}) = \dfrac{p(y_{1:T} \mid x_{1:T})\, p(x_{1:T})}{\int p(y_{1:T} \mid x_{1:T})\, p(x_{1:T})\, dx_{1:T}}$   [8]

In the general case, no analytical solution to [8] exists, and it is impractical to sample the posterior directly because it is high-dimensional (the dimension of the state multiplied by the number of integration time steps). It is therefore necessary to make simplifying approximations. The most common approximation is that the posterior of the state at time $t$ is independent of observations at times later than $t$; this assumption results in a filter:

$p(x_t \mid y_{1:t}) = \dfrac{p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}}{\int p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}\, dx_t}$   [9]

Although the filter posterior has only the dimension of the state, it is, due to what is called the curse of dimensionality (Bellman 2003), relatively expensive to estimate nonparametrically, even for relatively small state dimensions. The most common parametric approximation is due to Kalman (1960), who assumed the state-transition and observation distributions to be Gaussian and the state-transition and observation functions to be linear; these assumptions result in a Gaussian posterior at each time step, which can be derived analytically. Evensen (2003) proposed to alleviate the assumption of a linear state-transition function by repeated sampling of the HMM simulator. This results in an approximate Monte Carlo solution to [1] according to [2], and it makes it possible to derive sample estimates of the first two moments of the prior distribution represented by the integral in the numerator of [9].
By approximating this integral as Gaussian, it is still possible to derive a Gaussian approximation of the posterior analytically, assuming a linear observation function. Evensen's (2003) approximation of [9] is the ensemble Kalman filter (EnKF). To implement the EnKF, an ensemble of $N$ independent and identically distributed (iid) samples of the boundary condition was drawn from the initial-state distribution. At time $t$, the $N$ samples of the posterior distribution at time $t-1$ (called the analysis ensemble) were used to draw $N$ samples from the HMM distribution at time $t$ by propagating each of the samples through the state-transition equations and adding a random perturbation; the resulting sample set is called the background ensemble. Given an observation $y_t$, and since the observation error was jointly Gaussian and independent in time with covariance $R$, the set of maximum likelihood estimates of the posterior, derived using each background ensemble member as the mean of the Gaussian prior, was approximated by linearizing around the ensemble mean and minimizing the expected squared error to obtain (Houtekamer and Mitchell 2001):

$x_i^a = x_i^b + C_{xy}\,(C_{yy} + R)^{-1}\,\big(y_t + \varepsilon_i - h(x_i^b)\big)$   [10]

where $C_{xy}$ and $C_{yy}$ are the sample covariances between the background states and their predicted observations $h(x_i^b)$, and the $\varepsilon_i$ are samples from the observation uncertainty distribution. Under the stated conditions (Gaussian prior; Gaussian and linear observation function), the analysis ensemble is an iid sample of the posterior in [9] and was used to condition the prior at time step $t+1$. The EnKF is fully generative because Monte Carlo samples of the HMM state and observation space are used in [10] to explicitly estimate the prior and likelihood distributions. In this case we used $N$ ensemble members, and the EnKF was applied to assimilate observations taken at three different observing frequencies: 1. a single seasonal observation (of LAI, surface soil moisture, or root-zone soil moisture), 2. a pair of observations of a given type taken at different times during the growing season, and 3. observations of a given type taken at satellite overpass frequencies (every 8 days for LAI and every 4 days for soil moisture).
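The perturbed-observation analysis step in [10] can be sketched as follows (after Houtekamer and Mitchell 2001). This is a minimal illustration: the ensemble size, observation operator, and error covariance below are hypothetical stand-ins, not the values used in this study.

```python
import numpy as np

rng = np.random.default_rng(2)

def enkf_update(Xb, y, H, R):
    """One perturbed-observation EnKF analysis step [10].
    Xb: (n_state, n_ens) background ensemble; y: observation vector;
    H: linear observation operator; R: observation error covariance."""
    n = Xb.shape[1]
    Yb = H @ Xb                                  # predicted observations
    Xp = Xb - Xb.mean(axis=1, keepdims=True)     # state anomalies
    Yp = Yb - Yb.mean(axis=1, keepdims=True)     # observation-space anomalies
    Cxy = Xp @ Yp.T / (n - 1)                    # state-observation covariance
    Cyy = Yp @ Yp.T / (n - 1)                    # observation-space covariance
    K = Cxy @ np.linalg.inv(Cyy + R)             # Kalman gain
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n).T
    return Xb + K @ (y[:, None] + eps - Yb)      # analysis ensemble

# hypothetical 2-state system in which only the first state is observed
Xb = rng.normal(1.0, 0.5, size=(2, 500))
H = np.array([[1.0, 0.0]])
Xa = enkf_update(Xb, np.array([2.0]), H, np.array([[0.05]]))
```

The update pulls the ensemble mean of the observed state toward the observation and contracts the ensemble spread, which is exactly the Gaussian conditioning that [10] approximates by sample moments.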
In cases 1 and 2, where single observations or pairs of observations were used, observation times were chosen to yield the most information about yield, according to the definitions outlined in section 2.5.

2.3. A Gaussian Process Regression Interpolator

As an alternative to filtering, it is possible to construct regressions that map observations directly onto the probability space of states. In the synthetic setting, this discriminative approach to Bayesian conditioning was implemented by sampling the HMM simulator and then training regressions on these samples. In practice, regressions might be constructed directly from historical yield data, and would therefore avoid the simulator and its inherent assumptions altogether; this is impossible for data assimilation. Training regressions on simulator samples is a Bayes-discriminative approach, since density functions over all model components are implied by samples of the HMM state and observation space, but the conditional distribution of yield is not estimated directly by Bayes' law. The Bayes-discriminative approach to data assimilation was discussed by McLaughlin (2002) and Wikle and Berliner (2007), who call Gaussian process regression (GPR; also known as kriging) 'interpolation', and who formally compared GPR with smoothers and filters. These authors, however, used GPR to interpolate in 3-dimensional physical space; here we use it to interpolate in the HMM state and observation space. Samples of end-of-season biomass drawn from the HMM distributions constitute samples of the Bayesian prior (these are referred to as open loop samples; the term indicates the process of sampling the HMM distributions and is discussed further in section 2.4).
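As an illustration of the discriminative mapping described above, the following is a minimal GPR sketch with a squared exponential covariance (Rasmussen and Williams 2006). The toy biomass-observation relation, sample size, and hyperparameter values are illustrative assumptions; in this study the hyperparameters were fit by MAP optimization rather than fixed by hand.

```python
import numpy as np

def sq_exp_kernel(A, B, ell, sf2):
    """Anisotropic squared exponential covariance: one length scale per
    observation dimension, signal variance sf2."""
    d2 = ((A[:, None, :] - B[None, :, :]) / ell) ** 2
    return sf2 * np.exp(-0.5 * d2.sum(axis=-1))

def gpr_posterior(Y, x, Ystar, ell, sf2, sn2):
    """GPR posterior mean and variance (Rasmussen & Williams 2006, ch. 2).
    Y: (n, d) open-loop observation samples; x: (n,) open-loop biomass
    samples; Ystar: (m, d) observation values to condition on."""
    K = sq_exp_kernel(Y, Y, ell, sf2) + sn2 * np.eye(len(Y))
    Ks = sq_exp_kernel(Ystar, Y, ell, sf2)
    mean = Ks @ np.linalg.solve(K, x)
    var = sf2 + sn2 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

rng = np.random.default_rng(3)
Y = rng.uniform(0.0, 5.0, size=(50, 1))        # hypothetical single-LAI regressor
x = 2.0 * Y[:, 0] + rng.normal(0.0, 0.1, 50)   # toy biomass-observation relation
mu, var = gpr_posterior(Y, x, np.array([[2.5]]), ell=np.array([1.0]), sf2=1.0, sn2=0.01)
```

Note that the regression never invokes the state-transition equations at prediction time: once trained on (observation, biomass) pairs, it conditions directly on the observed values, which is the practical appeal of the discriminative approach.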
To estimate end-of-season biomass using some subset of observations over a range of times, we hypothesized that the prior distribution represented a Gaussian process with a covariance function dependent on the values of the observations, so that the covariance between the $i$th and $j$th open loop samples of biomass was:

$\mathrm{cov}(x_i, x_j) = k(y_i, y_j)$   [11]

where $y_i$ is the observation vector of the $i$th sample of the HMM distribution. The GPR covariance matrix $K$ is defined such that $K_{ij} = k(y_i, y_j)$, and $k_*$ is the vector of covariances between the prediction point and the training samples. The two moments of the GPR estimate of the marginal posterior from [8] over end-of-season biomass were (Rasmussen and Williams 2006, p. 16):

$\hat{x}_* = k_*^{\top} K^{-1} x$   [12.1]
$\hat{\sigma}_*^2 = k(y_*, y_*) - k_*^{\top} K^{-1} k_*$   [12.2]

where the noise variance $\sigma_n^2$ is a hyperparameter of the covariance function (Rasmussen and Williams 2006, p. 20). For this study, we used an anisotropic squared exponential covariance function:

$k(y_i, y_j) = \sigma_f^2 \exp\!\left(-\tfrac{1}{2} \sum_d \dfrac{(y_{i,d} - y_{j,d})^2}{\ell_d^2}\right) + \sigma_n^2\, \delta_{ij}$   [13]

where $\delta_{ij}$ is the Kronecker delta. To apply [12], the hyperparameters $\sigma_f^2$, $\sigma_n^2$, and $\ell_d$ were estimated by maximum a posteriori (MAP) optimization, i.e., by minimizing the negative log-likelihood objective function (Rasmussen and Williams 2006, p. 19):

$-\ln p(x \mid Y) = \tfrac{1}{2}\, x^{\top} K^{-1} x + \tfrac{1}{2} \ln |K| + \tfrac{n}{2} \ln 2\pi$   [14]

Neal (1996) demonstrated that GPR is identical to a single-layer feed-forward neural network with infinitely many hidden nodes, but is more resistant to overfitting than neural networks when an MAP training procedure is used. Thus, GPR is similar to the artificial neural network regressions used by many of the studies referenced in section 1. Notice that in a general application, end-of-season biomass could be substituted by any state or forecast for which estimates or predictions are desired. GPR was applied to estimate distributions conditional on the same nine sets of observations that were used by the EnKF: each of the three observation types at each of the three observing frequencies listed in section 2.2. 2.4.
Observing System Simulation Experiments

Synthetic experiments that examine whether a proposed observing system and data assimilation strategy can be expected to produce posterior state distributions with increased accuracy (or reduced uncertainty) compared to the HMM simulator distributions are called observing system simulation experiments (OSSEs; Arnold and Dey 1986). The typical data assimilation OSSE is called an identical-twin experiment (e.g., Crow and Van Loon 2006) and involves comparing samples of the HMM state distribution with samples of some approximate Bayesian posterior distribution. An identical-twin experiment consists of three principal components:

1. Samples of the forcing data, state, and observation distributions, called the open loop samples. We will also consider error-free samples of the observations.
2. A single sample from the forcing data and HMM distributions, which is used to define the true state of the NLDS system; this is called the truth system.
3. Samples of the state distributions conditional on observations after data assimilation; these are called the analysis samples.

Since the choice of truth system is random, it is necessary to repeat each OSSE a number (say $M$) of times. The results of analyzing these OSSEs are thus a comparison between $M$ sets of samples of the analysis distribution (from the EnKF or from GPR) and the corresponding open loop samples. Here $M$ was set equal to the number of open loop samples, such that each open loop sample was used individually as the truth system for a single OSSE; this is necessary, and the reason is explained in section 2.5.

2.5. Measuring Information in Observations

The amount of information contained in a realization $x$ of a random variable $X$ distributed over an event space according to distribution $p$ is $-\ln p(x)$ (Shannon 1948). The entropy of the distribution of $X$
is the expected amount of information from a sample:

$H(p) = -\mathbb{E}_p[\ln p(x)]$   [15]

Entropy can be interpreted as a measure of uncertainty about $X$. Given that a random variable is distributed according to $p$, approximating $p$ by $q$ results in information loss, which is measured by the divergence from $q$ to $p$ (Kullback and Leibler 1951):

$D_{KL}(p \,\|\, q) = \mathbb{E}_p[\ln p(x) - \ln q(x)]$   [16]

The expected amount of information about one random variable $X$ contained in a realization of another random variable $Y$ is called the mutual information between $X$ and $Y$, and is measured by the expected divergence, over $Y$, between the conditional and marginal distributions over $X$:

$I(X; Y) = \mathbb{E}_{p(y)}\big[D_{KL}\big(p(x \mid y) \,\|\, p(x)\big)\big]$   [17]

We measure the amount of information about end-of-season biomass contained in a set of observations by first estimating the conditional distributions, where the conditioning observations are those sampled by the truth system at the observation times, directly from open loop samples. Divergences from these posteriors to the analysis distributions were estimated for each truth system, and the expected divergence was found by taking the expected value over truth systems. The integrations necessary for measuring mutual information (in [16] and [17]) were performed by discretizing the HMM state and observation space. This not only allowed tractability, but also ensured that mutual information (as well as lost and used information) was non-negative (Cover and Thomas 1991). The HMM state and observation spaces were discretized using a histogram bin width given by Scott (2004):

$h = 3.49\, \hat{\sigma}\, n^{-1/3}$   [18]

where $\hat{\sigma}$ refers either to the standard deviation of the open loop samples of end-of-season biomass, or to the 20th percentile standard deviation of the open loop samples of each observation dimension at the observation times; $\hat{\sigma}$ was estimated separately for each observation dimension. Integrations were approximated as summations over the empirical pdf bins in state and observation space.
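The histogram-based estimators used in this section can be sketched as follows: Scott's rule bin widths [18], a plug-in entropy estimate [15], and mutual information [17] computed from a discretized joint distribution. The sample sizes and distributions below are illustrative, not the study's open loop samples.

```python
import numpy as np

def scott_width(s):
    """Histogram bin width [18] (Scott 2004): 3.49 * sigma * n^(-1/3)."""
    return 3.49 * np.std(s) * len(s) ** (-1.0 / 3.0)

def entropy_nats(s):
    """Plug-in estimate of the entropy [15] of a discretized empirical pdf."""
    h = scott_width(s)
    edges = np.arange(s.min(), s.max() + h, h)
    counts, _ = np.histogram(s, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())        # nats

def mutual_information(xs, ys):
    """Histogram estimate of I(X;Y) [17]: the expected divergence [16]
    between the conditional and marginal distributions, computed as a
    summation over the empirical 2-D pdf bins."""
    bx = np.arange(xs.min(), xs.max() + scott_width(xs), scott_width(xs))
    by = np.arange(ys.min(), ys.max() + scott_width(ys), scott_width(ys))
    pxy, _, _ = np.histogram2d(xs, ys, bins=[bx, by])
    pxy /= pxy.sum()
    pxpy = np.outer(pxy.sum(axis=1), pxy.sum(axis=0))   # product of marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / pxpy[nz])).sum())

rng = np.random.default_rng(4)
z = rng.normal(size=5000)
mi_dep = mutual_information(z, z + 0.2 * rng.normal(size=5000))  # informative "observation"
mi_ind = mutual_information(z, rng.normal(size=5000))            # uninformative one
```

Because the estimate sums only over occupied bins of a common discretization, the resulting mutual information (and the used/lost partitions built on it) is guaranteed non-negative, as noted in the text.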
We estimated the amount of information contained in each individual observation and in each pair of observations of a given type; however, it was impractical to estimate the true conditional distribution for more than a 2-dimensional observation set. We therefore approximated the amount of information contained in sets of observations taken at satellite overpass frequencies by assuming that the observations were independent conditional on end-of-season biomass. The single observations and pairs of observations with the most information about end-of-season biomass were assimilated by the EnKF and used as regressors in GPR prediction models.

2.6. Measuring Information Loss

The total amount of information lost due to approximating the conditional distribution by the analysis distribution is the divergence from the conditional to the analysis distribution. This situation is illustrated in Figure 1, which shows that this divergence integrates, with respect to the state, differences between the (log-transformed) conditional and analysis distributions. This total divergence is due to two parts: (i) loss of information contained in observations, and (ii) the introduction of bad information by data assimilation (or regression) approximations. Both of these can arise due to approximations in the implementation of Bayes' law: information loss can arise due to incomplete assimilation (see Nearing et al. 2013), whereas bad information can arise due to imperfections in the likelihood (observation) function. The amount of information contained in the observations about the state which is lost due to the approximation can be found by integrating, with respect to the state, the difference in log-probabilities between the prior and conditional distributions in areas which overlap with differences in log-probabilities between the analysis and conditional distributions.
Similarly, the amount of information contained in the observations about the state which is used by the analysis distribution can be found by integrating, with respect to the state, the difference between log-probabilities of the prior and analysis distributions in areas which overlap with differences between the prior and conditional distributions. Finally, bad information introduced into the analysis distribution is measured by integrating, again with respect to the state, the difference between log-probabilities of the prior and analysis distributions in areas which overlap with differences between the analysis and conditional distributions. These concepts of lost information, used information, and bad information are illustrated by Figure 2. Lost information and used information sum to the mutual information between states and observations, while lost information and bad information sum to the divergence from the conditional to the analysis distribution. These three quantities can be thought of as partial divergences from the conditional distribution to the analysis distribution and from the conditional to the prior. Again, these quantities were estimated by discretizing the HMM state and observation space using [18], and as expected values over the truth systems.

3. Results

Figure 3 illustrates an example EnKF OSSE, including a set of samples of the HMM open loop state distribution, a single truth system, and the corresponding EnKF analysis state samples after sequentially assimilating observations of LAI at a frequency of once every 8 days. The entropy of the HMM prior distribution over end-of-season biomass was estimated as 1.92 nats by [15] from the open loop samples. The ratios of the amount of information contained in single observations over the course of the growing season to the entropy of the prior distribution are illustrated in Figure 4.
Under this particular uncertainty scenario, there was almost no informational value in observations of surface-level soil moisture, and very little value (generally less than 10% of total entropy) in observations of total root-zone water content. The information content of LAI observations was greatest at points in the growing season when differences between open loop samples of LAI were highest. Figure 5 illustrates the information content of pairs of observations, again plotted as the ratio of mutual information to the entropy of the HMM prior distribution over end-of-season biomass. A pair of LAI observations can be expected to reduce uncertainty about end-of-season biomass by as much as 65%, whereas a pair of root zone soil moisture observations can only be expected to reduce uncertainty by just over 15%. The ability of GPR and the EnKF to extract information from observations is illustrated by Figure 7. The fraction of LAI information extracted by both methods (used information) was greater than 75% in all cases. However, over 50% of the soil moisture information was lost by both algorithms, except in the case of the EnKF assimilating a pair of surface soil moisture observations (in which case there was almost no information to begin with). Figure 7 illustrates the total amount of information in observations as the sum of used and lost information (left-hand plots). The results indicate only a small benefit to increasing LAI observation frequency above one or two observations per season: the information fraction rose about 20% when increasing the frequency from one observation per season to two, and only about 13% when increasing from two per season to the MODIS observing frequency. In contrast, there was greater value in increasing the frequency of soil moisture observations, with an increase in information of about 260% when increasing from two per season to the SMAP observing frequency.
The performance of the EnKF and GPR when conditioning on LAI was similar: the EnKF performed slightly better when assimilating a pair of observations, while GPR appeared to be slightly better at assimilating a time series at the MODIS overpass frequency (although the true conditional distribution was itself an approximation in this case). Almost no bad information was introduced by either algorithm when conditioning on LAI. The fraction of soil moisture information used by both algorithms was also similar; however, the EnKF introduced substantially more bad information into the analysis posterior than GPR when conditioning on observations at the SMAP overpass frequency. The divergence from the conditional distribution to the analysis posteriors is illustrated in Figure 7 as the sum of bad and lost information (right-hand plots).

4. Conclusions and Discussion

The results of this synthetic experiment indicate that GPR, as an example of a discriminative method for approximate Bayesian conditioning of yield estimates on remote sensing observations, is generally as efficient as the most common generative method (data assimilation by the EnKF) at extracting information from observations. This is important, because there are several practical and theoretical reasons to prefer discriminative methods over generative ones. The most important is that discriminative methods can be applied without the need for an HMM simulator. Although we tested a Bayes-discriminative approach, in a real-world situation regression models could be trained directly on historical yield data (bypassing the simulator altogether); this is impossible for data assimilation. This implies that GPR (or any other type of regression) may be generally better suited to dealing with the practical problems that confound typical EnKF application in a remote sensing setting, namely assumptions about homogeneity of the satellite image pixel and mismatches in spatial resolution between the simulator, the modeled system, and the observations.
In any case, our results indicate that there is no reason to prefer the generative approach. Further, we have provided a general method for comparing the two classes of methods. This method for analyzing information use has many potential applications. Approximations of Bayes' law are used regularly in predictive models of environmental systems of all kinds (e.g., Abdu et al. 2008; Crow and Wood 2003; de Wit and van Diepen 2007; Koppe et al. 2012; Liu and Gupta 2007; McLaughlin 2002; Reichle 2008; Vrugt et al. 2012; Weerts and El Serafy 2006), and the efficiency of these approximations has not previously been addressed. This paper introduces a formal and rigorous method for analyzing the efficiency of approximate Bayesian methods for extracting information from remote sensing observations to condition HMM state estimates.

Appendix A: The Crop Development Simulation State Transition Function

Prior to simulation, the soil texture parameters sand and clay fractions ($f_{sand}$ and $f_{clay}$; [m3/m3]; Table 1) were converted to Brooks and Corey (1964) type hydraulic coefficients: porosity ($\phi$; [m3/m3]), bubbling pressure ($\psi_b$; [cm]), saturated hydraulic conductivity ($K_s$; [cm/day]), and pore size distribution index ($\lambda$; [~]) using pedotransfer functions developed by Cosby et al. (1984) (as reported by Santanello et al.
2007), equations [A.1.1-A.1.4]. At the beginning of each time step, LAI was updated as a function of biomass during growth and as a linear decay function during senescence [A.2]. Potential evapotranspiration [cm/day] was estimated by a Priestley and Taylor (1972) approximation from daily mean temperature, net radiation, and albedo [A.3.1]; albedo was a function of LAI [A.3.2]. Potential evapotranspiration was partitioned into potential evaporation and potential transpiration [cm/day] according to LAI [A.3.3, A.3.4]. The average volumetric water content in the soil below the top 5 cm was estimated from the layer soil moisture states [A.4], and unsaturated conductivity and soil diffusivity in each layer were calculated according to Brooks and Corey (1964) as functions of the effective saturation $\Theta = (\theta - \theta_r)/(\phi - \theta_r)$:

$K(\theta) = K_s\, \Theta^{\,3 + 2/\lambda}$   [A.5.1]

with the corresponding diffusivity given by [A.5.2]. The infiltration potential [cm/day] is similar to that used by Mahrt and Pan (1984) [A.6.1], where interception of precipitation by the canopy was used to calculate throughfall [cm/day] [A.6.2]. Actual infiltration [cm/day] into the top two soil layers was computed from the infiltration potential and throughfall [A.6.3, A.6.4]. Direct evaporation [cm/day] from the top soil layer followed Mahrt and Pan (1984) [A.7]. The root zone depth [cm] was estimated according to Borg and Grimes (1986) [A.8.1], and a root distribution function [A.8.2] allowed for a calculation of the fraction of roots above a given depth [A.8.3]. Transpiration [cm/day] was extracted from the 5 cm soil layer [A.9.1] and from the lower portion of the soil column [A.9.2] using these root fractions. Soil moisture accounting was similar to that used by Mahrt and Pan (1984), except that we included transpiration; volumetric water contents in the 5 cm and lower soil layers were updated by water balance [A.10.1, A.10.2], and the root-zone soil moisture state combined the two layers [A.10.3]. Water and temperature stress factors range between zero and one and acted as multiplicative controls on potential biomass production.
Plant water stress was the realized fraction of potential transpiration [A.11.1], and plant temperature stress was a function of daily mean temperature relative to the base and optimal growing temperatures [A.11.2]. Photosynthetically active radiation [MJ/day] was estimated by Beer's law as a function of net radiation and LAI [A.12], and biomass development was simulated if emergence had occurred, i.e., if accumulated heat units exceeded the emergence threshold [A.13].

Acknowledgement

This work was supported by a grant from the NASA Terrestrial Ecology program entitled "Ecological and agricultural productivity forecasting using root-zone soil moisture products derived from the NASA SMAP mission"; principal investigator Wade T. Crow.

Tables

Table 1: Crop Growth Model Parameters

Parameter description                      Value
Maximum rooting depth                      30
Fraction of biomass in roots               0.25
Maximum LAI                                5
Fraction of LAI senesced per day           0.05
Minimal (base) growing temperature         4
Optimal growing temperature                15
Heat units necessary for emergence         30
Heat units necessary for maturity          700
Heat units necessary for senescence        560
Biomass energy conversion rate             30
Soil albedo                                0.15
Soil volumetric sand fraction              0.7
Soil volumetric clay fraction              0.2
Residual moisture content                  0.05

Figures

Figure 1: An illustration of the area which contributes to the divergence from the true Bayesian posterior to the analysis distribution.

Figure 2: An illustration of areas which contribute to the used and lost portions of the mutual information between the state and observations, as well as areas which contribute to bad information in the analysis distribution.

Figure 3: Prior (open loop) state samples drawn from the HMM distribution (gray) and from the EnKF analysis distributions due to assimilating LAI observations at the MODIS observing frequency of once every 8 days (black). The truth system used to generate these synthetic observations is marked in red. The observation uncertainty distribution for this OSSE was Gaussian.
Figure 4: Illustrations of the ratio of mutual information between end-of-season biomass and single observations to the entropy of the prior distribution over end-of-season biomass, as a function of observation time. These ratios were calculated from open loop samples.

Figure 5: Maps of the ratio of mutual information between end-of-season biomass and observation pairs to the entropy of the prior distribution over end-of-season biomass, as a function of observation times. Pairs with the highest utility are marked with black circles.

Figure 7: Used, lost, and bad information in the EnKF and GPR posteriors, scaled by the entropy of the prior distribution over end-of-season biomass. Used and lost information sum to the mutual information between observations and end-of-season biomass (left-hand plots), while bad and lost information sum to the divergence from the conditional to the analysis distributions (right-hand plots). The information content (and thus the used, lost, and bad information) of sets of daily observations and sets of observations at satellite overpass frequencies was estimated by assuming that the observations were independent conditional on end-of-season biomass.

References

Abdu, H., Robinson, D.A., Seyfried, M., & Jones, S.B. (2008). Geophysical imaging of watershed subsurface patterns and prediction of soil texture and water holding capacity. Water Resources Research, 44

Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past, present, and future. Bulletin of the American Meteorological Society, 67, 687-695, doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2

Arnold, J.G., Weltz, M.A., Alberts, E.E., & Flanagan, D.C. (1995). Plant growth component. In D.C. Flanagan, & M.A. Nearing (Eds.), USDA Water Erosion Prediction Project hillslope profile and watershed model documentation (pp. 8.1-8.41). West Lafayette, IN, USA: USDA-ARS National Soil Erosion Research Laboratory

Bellman, R. (2003). Dynamic Programming.
Mineola, NY: Dover Publications, Inc.

Borg, H., & Grimes, D.W. (1986). Depth development of roots with time - an empirical description. Transactions of the ASAE, 29, 194-197

Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology Papers, Colorado State University

Cosby, B.J., Hornberger, G.M., Clapp, R.B., & Ginn, T.R. (1984). A statistical exploration of the relationships of soil-moisture characteristics to the physical properties of soils. Water Resources Research, 20, 682-690

Cover, T.M., & Thomas, J.A. (1991). Elements of Information Theory. New York, NY, USA: Wiley-Interscience

Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the sequential assimilation of remotely sensed surface soil moisture. Journal of Hydrometeorology, 7, 421-432, doi:10.1175/JHM499.1

Crow, W.T., & Wood, E.F. (2003). The assimilation of remotely sensed soil brightness temperature imagery into a land surface model using Ensemble Kalman filtering: a case study based on ESTAR measurements during SGP97. Advances in Water Resources, 26, 137-149, doi:10.1016/S0309-1708(02)00088-X

de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and Forest Meteorology, 146, 38-56

Dente, L., Satalino, G., Mattia, F., & Rinaldi, M. (2008). Assimilation of leaf area index derived from ASAR and MERIS data into CERES-Wheat model to map wheat yield. Remote Sensing of Environment, 112, 1395-1407

Doraiswamy, P.C., Sinclair, T.R., Hollinger, S., Akhmedov, B., Stern, A., & Prueger, J. (2005). Application of MODIS derived parameters for regional crop yield assessment.
Remote Sensing of Environment, 97, 192-202

Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N., Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R., Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi, J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716

Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9

Hoogenboom, G., Jones, J.W., Wilkens, P.W., Porter, C.H., Hunt, L.A., Boote, K.L., Singh, U., Uryasev, O., Lizaso, J., Gijsman, A.J., White, J.W., Batchelor, W.D., & Tsuji, G.Y. (2008). Decision Support System for Agrotechnology Transfer. Honolulu, HI: University of Hawaii

Houtekamer, P.L., & Mitchell, H.L. (2001). A sequential ensemble Kalman filter for atmospheric data assimilation. Monthly Weather Review, 129, 123-137

Jiang, D., Yang, X., Clinton, N., & Wang, N. (2004). An artificial neural network model for estimating crop yields using remotely sensed information. International Journal of Remote Sensing, 25, 1723-1732

Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME - Journal of Basic Engineering, 82, 35-45, doi:10.1115/1.3662552

Kerr, Y.H., Waldteufel, P., Wigneron, J.P., Delwart, S., Cabot, F., Boutin, J., Escorihuela, M.J., Font, J., Reul, N., Gruhier, C., Juglea, S.E., Drinkwater, M.R., Hahne, A., Martin-Neira, M., & Mecklenburg, S. (2010). The SMOS Mission: New Tool for Monitoring Key Elements of the Global Water Cycle. Proceedings of the IEEE, 98, 666-687

Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y., Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999).
MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis Document

Koppe, W., Gnyp, M.L., Hennig, S.D., Li, F., Miao, Y.X., Chen, X.P., Jia, L.L., & Bareth, G. (2012). Multi-Temporal Hyperspectral and Radar Remote Sensing for Estimating Winter Wheat Biomass in the North China Plain. Photogrammetrie Fernerkundung Geoinformation, 281-298

Kouadio, L., Duveiller, G., Djaby, B., El Jarroudi, M., Defourny, P., & Tychon, B. (2012). Estimating regional wheat yield from the shape of decreasing curves of green area index temporal profiles retrieved from MODIS data. International Journal of Applied Earth Observation and Geoinformation, 18, 111-118

Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79-86, doi:10.2307/2236703

Li, A.N., Liang, S.L., Wang, A.S., & Qin, J. (2007). Estimating crop yield from multitemporal satellite data using multivariate regression and neural network techniques. Photogrammetric Engineering and Remote Sensing, 73, 1149-1157

Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401, doi:10.1029/2006WR005756

Maas, S.J. (1988). Using satellite data to improve model estimates of crop yield. Agronomy Journal, 80, 655-662

Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer Meteorology, 29, 1-20

McLaughlin, D. (2002). An integrated approach to hydrologic data assimilation: interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286

Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear stochastic models. Tellus Series A - Dynamic Meteorology and Oceanography, 51, 167-194

Neal, R.M. (1996). Bayesian Learning for Neural Networks.
New York: Springer

Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V. (2012). Assimilating remote sensing observations of leaf area index and soil moisture for wheat yield estimates: An observing system simulation experiment. Water Resources Research, 48

Nearing, G.S., Gupta, H.V., Crow, W.T., & Gong, W. (2013). An approach to quantifying the efficiency of a Bayesian filter. Water Resources Research

Ng, A.Y., & Jordan, M.I. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems

Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., & Defourny, P. (2007). Optimization of a coupled hydrology-crop growth model through the assimilation of observed soil moisture and leaf area index values using an ensemble Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942

Pellenq, J., & Boulet, G. (2004). A methodology to test the pertinence of remote-sensing data assimilation into vegetation models for water and energy exchange at the land surface. Agronomie, 24, 197-204

Priestley, C.H.B., & Taylor, R.J. (1972). Assessment of surface heat-flux and evaporation using large-scale parameters. Monthly Weather Review, 100, 81-92

Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press

Reichle, R.H. (2008). Data assimilation methods in the Earth sciences. Advances in Water Resources, 31, 1411-1418

Santanello, J.A., Peters-Lidard, C.D., Garcia, M.E., Mocko, D.M., Tischler, M.A., Moran, M.S., & Thoma, D.P. (2007). Using remotely-sensed estimates of soil moisture to infer soil texture and hydraulic properties across a semi-arid watershed. Remote Sensing of Environment, 110, 79-97

Schuol, J., & Abbaspour, K.C. (2007). Using monthly weather statistics to generate daily data in a SWAT model application to West Africa. Ecological Modelling, 201, 301-311

Scott, D.W. (2004).
Multivariate density estimation and visualization. In J.E. Gentle, W. Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and Methods (pp. 517-538). New York: Springer
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423
Tan, B., Hu, J.N., Zhang, P., Huang, D., Shabanov, N., Weiss, M., Knyazikhin, Y., & Myneni, R.B. (2005). Validation of Moderate Resolution Imaging Spectroradiometer leaf area index product in croplands of Alpilles, France. Journal of Geophysical Research-Atmospheres, 110, D01107
Thorp, K.R., Hunsaker, D.J., & French, A.N. (2010). Assimilating leaf area index estimates from remote sensing into the simulations of a cropping systems model. Transactions of the ASABE, 53, 251-262
Uno, Y., Prasher, S.O., Lacroix, R., Goel, P.K., Karimi, Y., Viau, A., & Patel, R.M. (2005). Artificial neural networks to predict corn yield from Compact Airborne Spectrographic Imager data. Computers and Electronics in Agriculture, 47, 149-161
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and applications. Advances in Water Resources
Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff models. Water Resources Research, 42
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
Williams, J., Jones, C., Kiniry, J., & Spanel, D. (1989). The EPIC crop growth model. Transactions of the ASAE, 32, 497-511
Ye, X.J., Sakai, K., Garciano, L.O., Asada, S.I., & Sasao, A. (2006). Estimation of citrus yield from airborne hyperspectral images using a neural network model.
Ecological Modelling, 198, 426-432

APPENDIX D: MEASURING INFORMATION ABOUT MODEL STRUCTURE INTRODUCED DURING SYSTEM IDENTIFICATION

Grey S. Nearing and Hoshin V. Gupta
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ

Article in preparation. The content of this article will be presented at the European Geosciences Union General Assembly on April 8, 2013 (session HS1.2; Data & Models, Induction & Prediction, Information & Uncertainty: Towards a common framework for model building and predictions in the Geosciences)

Abstract

System identification is the process of building models of dynamic systems that both correspond, structurally, to our understanding of physical laws and are behaviorally consistent with observations of the system. Fundamentally, this is a two-part process consisting of conceptual structure identification and mathematical structure identification, where the latter includes parameter estimation. The most common method for mathematical structure identification is an expectation-maximization (EM) approach – a variational Bayesian method that seeks to estimate the mode of the posterior distribution of model structures conditional on observations. The Bayesian prior is typically defined in the context of a probabilistic conceptual or physics-based simulator. Observations can be thought of as supplying Shannon-type information about model structure, and the model can be thought of as containing Shannon-type information about system behavior or system properties. In this paper, we use the EM procedure to identify the mathematical structure of a rainfall-runoff model by using HyMod to supply the EM prior (the conceptual structure). Further, we quantify the amount of information introduced by each of the conceptual and mathematical identification phases about (1) model structure and (2) streamflow, during both calibration and evaluation.

1.
Introduction

Dynamic simulation models are typically unable to represent the behavior of hydrologic systems in a completely accurate manner. For this reason, forecast efforts often adapt models and simulations using observations of the system (Liu and Gupta 2007). Bulygina and Gupta (2010) defined system identification as a two-step process of "building dynamical models that are simultaneously consistent with physical knowledge about the system and with the information contained in observational data." The first step, conceptual structure identification, involves selection of system boundaries and boundary conditions, system states, and important system processes, resulting in a directed graph (see Figure 1). The second step, mathematical structure identification, involves specifying appropriate mathematical representations of system processes. Clark et al. (2008) focused on the first step, and Bulygina and Gupta (2009) focused on the second step after assuming an a priori conceptual structure. Gupta et al. (2012) discuss these steps in more detail. Here, we focus on the information added during each of these steps.

The most common form of mathematical structure identification is an expectation-maximization (Dempster et al. 1977) algorithm, which iteratively infers the distribution of model states conditional on observations (the E-step) and then infers the maximum-likelihood values of parameters of the state transition function (the M-step). This EM approach was introduced by Ghahramani and Roweis (1999) and subsequently applied by numerous authors (e.g., Damianou et al. 2011; Roweis and Ghahramani 2000; Turner et al. 2009; Wang et al. 2008), including Vrugt et al. (2005) and Bulygina and Gupta (2009, 2010, 2011), who used it in the context of identifying rainfall-runoff models.
The E-step is data assimilation (Wikle and Berliner 2007), and the M-step is maximum a posteriori (MAP) parameter estimation (or its common approximation, maximum likelihood parameter estimation; see Aldrich 1997). The parameters that are calibrated in the M-step can be parameters of a nonparametric regression, so that the method provides a general approach for identifying the mathematical structure of a dynamic system model (e.g., Bulygina and Gupta 2009; Ghahramani and Roweis 1999); alternatively, these may be parameters of a parametric model such as a physics-based dynamic system simulator (e.g., Vrugt et al. 2005). EM is a variational Bayesian method that seeks to estimate the mode of the posterior distribution over model parameters conditional on observations, where the Bayesian prior is often defined in the context of a conceptual or physics-based simulator (e.g., Bulygina and Gupta 2009; Vrugt et al. 2005). In this sense, observations supply information about model structure because conditioning on observations modifies the distribution over parameters. Information (Shannon 1948) is defined as the expected divergence (Kullback and Leibler 1951) caused by Bayesian conditioning (Cover and Thomas 1991 p. 6); this concept is explained in detail in section 2.3. We are interested in measuring the amount of information introduced during the two steps of system identification. That is, we want to know how much information is introduced by defining a conceptual model (the EM prior) and, subsequently, a mathematical model (the EM posterior); the latter is the amount of information extracted from observations by EM conditioning. This process of measuring information is demonstrated via the estimation of a dynamic rainfall-runoff model.
The paper is organized as follows: section 2 outlines our definition of a dynamic system model (section 2.1), the EM algorithm used in this study (section 2.2), the method we propose to measure information (section 2.3), and the system identification problem we use for demonstration (section 2.4). Section 3 describes the results of our application experiment, and section 4 concludes.

2. Methods

2.1. Dynamic System Simulators

Fundamentally, system identification is the process of reducing uncertainty about the structure of an appropriate model of a dynamic system. The most common approach to quantifying uncertainty in model structure is to represent the time evolution of the system state by an Euler-Maruyama approximation of an Ito stochastic differential equation (e.g., Archambeau et al. 2007); data assimilation can be used to attenuate this type of uncertainty (Miller et al. 1999; Restrepo 2008). Thus, we will begin by defining a nonlinear dynamical system model as a numerical integrator of the equation:

[1]  dx_t = f(x_t, u_t) dt + dW_t

where x_t and u_t are the simulator state and boundary condition at time t, respectively, and W_t is a Wiener process. Solutions to [1] at discrete times t = 1, ..., T are approximated by sampling a Markov process (Liu and Gupta 2007):

[2.1]  x_t = M(x_{t-1}, u_t) + w_t,  w_t ~ N(0, Q)

where the state transition function M is an approximation of the drift function f, and the w_t are noise sampled from the Gaussian distribution N(0, Q). Periodically the state of the system is observed according to an observation function H:

[2.2]  y_t = H(x_t) + e_t,  e_t ~ D_e

where the observation error e_t is drawn from an arbitrary distribution D_e. In the above, [2] constitutes a hidden Markov model (HMM) and implies probability distributions p(x_t | x_{t-1}, u_t) and p(y_t | x_t) respectively, where T is the number of simulation time steps. These conditional distributions define a joint probability distribution p(x_{1:T}, y_{1:T}) over model states and observations for the simulation period t = 1, ..., T.
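The discrete-time model [2] can be sketched directly. The following is a minimal illustration under assumed notation (x, u, y as above); the transition function M, observation function H, and all numeric values here are hypothetical, not the HyMod model used later in the paper:

```python
import numpy as np

def simulate_hmm(M, H, x0, u, Q, R, rng):
    """Sample one trajectory of the discrete-time HMM in equation [2].

    M : state transition function, x_t = M(x_{t-1}, u_t) + w_t
    H : observation function,      y_t = H(x_t) + e_t
    Q : covariance of the Gaussian state noise
    R : variance of the (here Gaussian) observation error
    """
    T = len(u)
    K = x0.size
    x = np.zeros((T, K))
    y = np.zeros(T)
    x_prev = x0
    for t in range(T):
        w = rng.multivariate_normal(np.zeros(K), Q)  # state noise ~ N(0, Q)
        x[t] = M(x_prev, u[t]) + w
        e = rng.normal(0.0, np.sqrt(R))              # observation error
        y[t] = H(x[t]) + e
        x_prev = x[t]
    return x, y

# Toy example: a 2-state linear reservoir cascade (purely illustrative).
rng = np.random.default_rng(0)
M = lambda x, u: np.array([0.9 * x[0] + u, 0.5 * x[0] + 0.8 * x[1]])
H = lambda x: x[1]
x, y = simulate_hmm(M, H, np.zeros(2), np.ones(100), 0.01 * np.eye(2), 0.1, rng)
```

Sampling trajectories in this way is exactly how ensemble methods (section 2.2.1) represent the distributions implied by [2].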
From the perspective of this paper, the purpose of system identification is to choose a state transition model M which describes the time-dynamics of the modeled system. We have found it impossible to reliably identify both M and H simultaneously (although some authors report success in this endeavor, e.g., Wang et al. 2008), due to the fact that, given a finite data set, there are many equifinal maps from states to observations. Presumably, in most applications H will be largely defined by the physics of a particular measurement device, which we assume to be well understood compared to the physics of the dynamic system.

2.2. An EM System Identification Algorithm

The first step in implementing an EM approach to mathematical structure identification is to define a prior distribution over model structures. The best way to do this is to identify a probabilistic conceptual model. The basic components of a viable conceptual structure are: (1) a finite set of possible values for the dimension K of the state x_t, and (2) for each possible value of K, distributions p(x_t | x_{t-1}, u_t) and p(y_t | x_t) which define a joint probability distribution between all state and observation components. In section 2.1, we discussed these distributions in the context of a single HMM, but theoretically several HMMs could contribute to populating this joint density function. For simplicity of discussion, the rest of this article will only consider the case of a single possible value of K.

EM system identification is composed of two parts: data assimilation (E-step) and regression (M-step). The E-step estimates a distribution over model states, and the M-step finds parameters of a regression model which maximize the likelihood of the E-step state distributions. The E and M steps are iterated until convergence; in this article, we will not formally define convergence, and instead use a fixed number of EM loops. Note that convergence of the EM algorithm is assured (Bulygina and Gupta 2011; Ghahramani and Roweis 1999).
Each M-step results in a regression model; these will be notated M_j, so that model M_j results from the j-th M-step iteration. The EM prior is defined by a conceptual simulator, and we begin the EM procedure by emulating this simulator using a nonparametric regression model; this initial emulator will be designated M_0. The E-step performs data assimilation using the current model M_j. Simultaneous state-updating and parameter estimation is a procedure that has been applied to hydrologic prediction problems (e.g., Moradkhani et al. 2005b); notably, Vrugt et al. (2005) and Bulygina and Gupta (2009) use EM approaches which are conceptually similar to the one proposed by Ghahramani and Roweis (1999; also see Roweis and Ghahramani 2000). The method we use is also very similar to the original; our E and M steps are described in subsections 2.2.1 and 2.2.2.

2.2.1. E-Step: The Ensemble Kalman Smoother

From a Bayesian perspective, the state transition distribution p(x_t | x_{t-1}, u_t) represents prior knowledge (before assimilating observations) about the state of the system, and the observation distribution p(y_t | x_t) is the Bayesian likelihood. The application of Bayes' Law to estimate a time series of HMM states x_{1:T} given some observations y_{1:T}, which correspond directly with modeled observations H(x_t), is called a smoother:

[3]  p(x_{1:T} | y_{1:T}) = p(y_{1:T} | x_{1:T}) p(x_{1:T}) / ∫ p(y_{1:T} | x_{1:T}) p(x_{1:T}) dx_{1:T}

In the general case, no analytical solution to [3] exists and it is impractical to sample the posterior directly, due to the fact that it is (K × T)-dimensional (dimension of the state multiplied by the number of simulation time steps). Therefore it is necessary to make some simplifying approximations. The most common approximation is due to Kalman (1960), who was able to approximate the posterior analytically by assuming that the state at time t is independent of observations at later times; this results in a filter (see McLaughlin 2002 for a concise definition of smoothers and filters):

[4]  p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t-1}, u_t) p(x_{t-1} | y_{1:t-1}) dx_{t-1}

Kalman's (1960) implementation also assumed that p(x_t | x_{t-1}, u_t) and p(y_t | x_t) were Gaussian. This assumption implies that M and H are linear, and results in an analytical solution for the (Gaussian) posterior.
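A widely used Monte Carlo approximation of the filter update [4] is the perturbed-observation ensemble Kalman filter (Evensen 2003). The following is a minimal sketch of the analysis step for a scalar observation; the ensemble size, observation operator, and numeric values are illustrative, not those of the Leaf River experiment:

```python
import numpy as np

def enkf_analysis(X_b, H, y, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X_b : (N_e, K) background ensemble
    H   : observation operator mapping a state vector to a scalar prediction
    y   : scalar observation with error variance R
    Returns the (N_e, K) analysis ensemble.
    """
    N_e, K = X_b.shape
    Y_b = np.array([H(x) for x in X_b])       # predicted observations
    x_bar = X_b.mean(axis=0)
    y_bar = Y_b.mean()
    # Sample cross- and observation-space covariances around ensemble means
    C_xy = (X_b - x_bar).T @ (Y_b - y_bar) / (N_e - 1)   # shape (K,)
    C_yy = np.sum((Y_b - y_bar) ** 2) / (N_e - 1)
    gain = C_xy / (C_yy + R)                  # Kalman gain
    # Perturbed-observation update: each member sees a noisy copy of y
    eps = rng.normal(0.0, np.sqrt(R), size=N_e)
    X_a = X_b + np.outer(y + eps - Y_b, gain)
    return X_a

rng = np.random.default_rng(1)
X_b = rng.normal(5.0, 2.0, size=(500, 3))     # hypothetical background ensemble
X_a = enkf_analysis(X_b, lambda x: x[0], 4.0, 0.5, rng)
```

The analysis ensemble mean is pulled from the background mean toward the observation in proportion to the relative sizes of the background and observation error variances.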
Evensen (2003) proposed to alleviate Kalman's (1960) assumption about a linear state transition function by using a Monte Carlo approximation of the prior at each timestep; this is called the ensemble Kalman filter (EnKF). To implement the EnKF at time t, an ensemble of N_e independent and identically distributed (iid) samples of the posterior distribution at time t-1 (notated x_{t-1,i}^a; called the analysis ensemble) is used to draw samples from the HMM distribution p(x_t | x_{t-1}, u_t) by propagating the samples through the state-transition equations and adding random perturbations drawn from N(0, Q). The resulting sample set is called the background ensemble and notated x_{t,i}^b. Given an observation y_t with error that is independent in time with covariance R, and treating the background states as jointly Gaussian with the predicted observations, the set of maximum likelihood estimates of the posterior derived using each background ensemble member as the mean of the Gaussian prior is approximated by linearizing H around the ensemble mean and minimizing the expected squared error to obtain (Evensen and van Leeuwen 2000):

[5.1]  x_{t,i}^a = x_{t,i}^b + C_t^b H^T (H C_t^b H^T + R)^{-1} (y_t + e_i - H(x_{t,i}^b))

[5.2]  C_t^b = (N_e - 1)^{-1} Σ_{i=1..N_e} (x_{t,i}^b - x̄_t^b)(x_{t,i}^b - x̄_t^b)^T

where the e_i are samples from the observation uncertainty distribution. Under the stated conditions (filter assumption, Gaussian prior, Gaussian and linear H), the analysis ensemble x_{t,i}^a is an iid sample of the posterior of [4] at timestep t, and is used as the condition of the prior at timestep t+1. The same procedure is used at time t = 1, except that a prior distribution over initial states is sampled; these samples are propagated through the state transition equations and, along with samples of N(0, Q), used to generate the first background ensemble.

Evensen and van Leeuwen (2000) showed that the smoother posterior can be estimated sequentially, like the sequence of filter posteriors. They derived a procedure for sampling the posterior of [3] under assumptions similar to those of the EnKF; this method is called the ensemble Kalman smoother (EnKS). In the EnKS, the sample of the state at time t is additionally updated by each subsequent observation, using the cross-covariances between the ensemble at time t and the predicted observations at later times:

[6]  x_{t,i}^s = x_{t,i}^a + Σ_{s>t} C_{t,s}^b H^T (H C_s^b H^T + R)^{-1} (y_s + e_{s,i} - H(x_{s,i}^b))

2.2.2.
M-Step: Sparse Gaussian Process Regression

Once observations have been assimilated to produce a posterior distribution over model states, a new state transition function can be chosen so that the distribution p(x_t | x_{t-1}, u_t) it implies is as similar as possible to the EnKS posterior. Specifically, we want a regression model mapping (x_{t-1}, u_t) to x_t. One possible model, as proposed by Ghahramani and Roweis (1999), is a summation of radial basis functions; this is a special case of Gaussian process regression (GPR; Rasmussen and Williams 2006 p. 14). The strategy is to train a GPR model using samples from the EnKS posterior. We prefer sparse Gaussian process regression (SGPR; Snelson and Ghahramani 2006) over GPR for two reasons: (1) SGPR is more computationally efficient than GPR, and (2) SGPR results in predictions with higher uncertainty than GPR. Due to (2), SGPR results in expanded support of the data assimilation prior, which has an effect similar to introducing an inflation factor into the data assimilation algorithm. This is desirable to avoid underestimating the variance of the data assimilation prior due to finite sample size and improper representation of model structure uncertainty (Anderson 2007).

An SGPR model consists of a pair of mappings from an independent variable x to the mean and variance of a distribution over a dependent variable y; that is, an SGPR model is a one-dimensional Gaussian process (GP) indexed by x. The first step in defining an SGPR model is to define a covariance function for the GP. The covariance function (or kernel) is a function on pairs of inputs (x_i, x_j) which defines the covariance between the outputs associated with those inputs. In this case we use an anisotropic squared exponential (also called the automatic relevance determination kernel):

[7]  k(x_i, x_j) = s_f^2 exp( -(1/2) Σ_{d=1..D} (x_{i,d} - x_{j,d})^2 / l_d^2 ) + s_n^2 δ_ij

where δ_ij is the Kronecker delta and s_f, l_d, and s_n are called hyperparameters. The second step is to define a set of N training data pairs consisting of inputs X = {x_1, ..., x_N} and targets y = {y_1, ..., y_N} which will be used to train the regression. The third step is to define a set of M pseudo-inputs X̃ = {x̃_1, ..., x̃_M} and pseudo-targets f̃ = {f̃_1, ..., f̃_M}.
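For intuition, the kernel [7] and GP prediction can be sketched in standard (dense) GPR form. Note this is the non-sparse special case, not the SGPR variant trained in the paper, and the hyperparameter values below are illustrative:

```python
import numpy as np

def ard_kernel(A, B, sf2, lam):
    """Anisotropic squared-exponential (ARD) kernel, as in [7] without the
    noise term: k(a, b) = sf2 * exp(-0.5 * sum_d (a_d - b_d)^2 / lam_d^2)."""
    d = (A[:, None, :] - B[None, :, :]) / lam
    return sf2 * np.exp(-0.5 * np.sum(d ** 2, axis=2))

def gpr_predict(X, y, X_star, sf2, lam, sn2):
    """Dense GPR predictive mean and variance at test inputs X_star."""
    K = ard_kernel(X, X, sf2, lam) + sn2 * np.eye(len(X))
    K_s = ard_kernel(X_star, X, sf2, lam)
    alpha = np.linalg.solve(K, y)
    mu = K_s @ alpha
    v = np.linalg.solve(K, K_s.T)
    var = sf2 - np.sum(K_s * v.T, axis=1) + sn2   # predictive variance
    return mu, var

# Fit a noisy 1-D function and predict at two new inputs (toy example).
rng = np.random.default_rng(2)
X = rng.uniform(0, 6, size=(30, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 30)
mu, var = gpr_predict(X, y, np.array([[1.5], [3.0]]), 1.0, np.array([1.0]), 0.01)
```

The sparse variant replaces the N × N training covariance with a low-rank approximation built on the M pseudo-inputs, which is what makes it cheaper and (slightly) more uncertain.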
The hyperparameters, pseudo-inputs, and pseudo-targets are parameters of the SGPR model and will be trained using MAP estimation; the training data pairs are designated a priori (for example, by sampling the EnKS posterior). A matrix of covariances between two sets of inputs (training, test, or pseudo) is notated K with appropriate subscripts: K_NM is the N × M matrix of covariances between training inputs and pseudo-inputs, and K_M is the M × M covariance matrix of the pseudo-inputs. The probability of the training data given the pseudo-inputs and pseudo-targets is:

[8]  p(y | X, X̃, f̃) = N( y | K_NM K_M^{-1} f̃, Λ + s_n^2 I )

where Λ is the diagonal matrix with entries Λ_nn = k(x_n, x_n) - k_n^T K_M^{-1} k_n, and k_n is the vector of covariances between training input x_n and the pseudo-inputs. According to [8], each y_n is independent of the others given the pseudo-targets. MAP estimation is accomplished by maximizing the probability of the training data according to [8]; further details are given by Snelson and Ghahramani (2006). Prediction at a new (test) input x*, with k_* the vector of covariances between x* and the pseudo-inputs and Q_M = K_M + K_MN (Λ + s_n^2 I)^{-1} K_NM, is given by:

[9.1]  μ(x*) = k_*^T Q_M^{-1} K_MN (Λ + s_n^2 I)^{-1} y

[9.2]  σ^2(x*) = k(x*, x*) - k_*^T (K_M^{-1} - Q_M^{-1}) k_* + s_n^2

where μ(x*) is the prediction mean and σ^2(x*) is the prediction variance.

Separate SGPRs were used to define a state transition model; each state dimension was represented by a single GP, and all K GPs were independent of the others. When the initial emulator M_0 was trained, training inputs and targets came from samples of the transition distribution defined by the conceptual simulator. During the subsequent EM iterations, training data came from the EnKS posterior samples. Notice that both inputs and targets in [7-9] represent model states: if the target y was a value of state dimension k at time t according to EnKS sample i, then the input x was a concatenation of the i-th sample of the entire state vector at time t-1 and the boundary condition u_t. All training data for the SGPR M-steps were normalized to have zero mean and unit variance. This means that the prior, before imposing the conceptual model in the form of the emulator, was a standard normal distribution which was independent over all states (Rasmussen and Williams 2006 p. 15, Figure 2.2a).

2.2.3.
Why We Cannot Use Filters

From a system identification perspective, the primary difference between the smoother [3] and the filter [4] is that [3] defines a joint distribution over the full state trajectory x_{1:T}, whereas [4] does not – it defines a series of conditionally independent distributions through the simulation time period. Training a regression model of the state transition function requires samples of the joint posterior (pairs of x_{t-1} and x_t); this means that samples from the time-independent distributions from [4] are not useful as training data. For this reason, it is necessary to use smoothers rather than filters for data assimilation-driven system identification.

2.3. Measuring Information in the Conceptual and Mathematical Models

The amount of information contained in a random variable z drawn from a distribution p(z) is quantified by its Shannon (1948) entropy. The information contained in a random variable y about z is called the mutual information between z and y, and is quantified as the expected (over the distribution of y) divergence from the marginal distribution p(z) to the distribution of z conditional on y. More generally, the divergence from a distribution p to a distribution q is the expected information loss due to approximating p by q, and is defined as the expected value of the difference in log-probability between p and q (Kullback and Leibler 1951):

[10]  D(p || q) = E_p[ log p(z) - log q(z) ]

where the expectation is taken with respect to the probability measure defined by the original distribution p. Mutual information, or information gain due to Bayesian conditioning, is illustrated in Figure 2.

There are two types of information added during system identification that are of interest to hydrologists: (i) information about the model which we choose to represent the system (e.g., Ye et al. 2008), and (ii) information contained in the model about a particular aspect of the system.
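For discrete (histogram-binned) distributions, equation [10] reduces to a sum over bins. A minimal sketch, with a hypothetical three-outcome prior and posterior:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q), equation [10]: the
    expected log-probability difference under p. Bins where p = 0 contribute
    zero; q must be nonzero wherever p is nonzero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

# Information gained by Bayesian conditioning: the divergence from the
# posterior back to the prior (hypothetical three-outcome example).
prior = np.array([1 / 3, 1 / 3, 1 / 3])
posterior = np.array([0.7, 0.2, 0.1])
info_gain = kl_divergence(posterior, prior)   # in nats
```

Note the asymmetry: D(p || q) is not D(q || p), which is why the direction of each divergence (prior to posterior) matters in sections 2.3.1 and 2.3.2.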
Information about the model is quantified as the divergence from a prior distribution over model structures to a posterior distribution, while information about a phenomenon which arises from the system is quantified by the divergence from prior to posterior distributions over the outcomes of the phenomenon. These two types of information are described in detail in subsections 2.3.1 and 2.3.2.

2.3.1. Information about the Model Extracted from Observations

EM system identification seeks to estimate the mode of the posterior distribution over model parameters (structure) conditional on observations; therefore, the idea that system identification extracts information about the model is probably the more intuitive of the two types of information. In the conceptual structure identification step (defining the EM prior), we change the distribution over possible models by an amount quantified as the divergence from the SGPR prior (a set of standard normal distributions) to the emulator M_0. In the mathematical structure identification step (defining the EM posterior), the amount of information extracted from observations about the mathematical structure of the model during the j-th EM step is the divergence from M_{j-1} to M_j. The SGPR prior and all emulators consist of sets of K independent GPs. The divergence from one GP G_i to another G_j for a given state dimension (think of these as the GPs representing the state transition function at the (j-1)-th and j-th EM iterations, respectively) can be computed analytically as:

[11]  D(G_i || G_j) = (1/2) [ tr(Σ_j^{-1} Σ_i) + (μ_j - μ_i)^T Σ_j^{-1} (μ_j - μ_i) - n + ln( |Σ_j| / |Σ_i| ) ]

where μ_i, μ_j and Σ_i, Σ_j are the means and covariance matrices of processes G_i and G_j, respectively, evaluated at a set of n test inputs. Equation [11] was applied for each pair of consecutive models M_{j-1} and M_j, as well as for the standard normal process representing the SGPR prior.
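Equation [11] can be sketched directly. The means and covariances below are illustrative, not values from the experiment; the second Gaussian is taken to be the standard normal, matching the SGPR prior:

```python
import numpy as np

def gaussian_kl(mu0, S0, mu1, S1):
    """KL divergence D(N(mu0, S0) || N(mu1, S1)) between two multivariate
    Gaussians, as in equation [11]."""
    n = mu0.size
    S1_inv = np.linalg.inv(S1)
    dm = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)   # log |S0|, numerically stable
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + dm @ S1_inv @ dm - n
                  + logdet1 - logdet0)

# Divergence from a GP evaluated at two test inputs to the standard-normal
# SGPR prior at the same inputs (hypothetical numbers).
mu = np.array([0.5, -0.2])
S = np.array([[0.4, 0.1], [0.1, 0.3]])
d = gaussian_kl(mu, S, np.zeros(2), np.eye(2))
```

The divergence is zero only when the two Gaussians are identical, and grows with both the mean shift and the mismatch in covariances.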
In our case, we used n equally spaced test inputs in the hypercube defined by the maximum and minimum boundary condition measurements and the maximum and minimum of state samples from all EM iterations. Since the K Gaussian processes that constitute each model are independent, the total divergence from M_{j-1} to M_j is the sum of the K divergences estimated by [11].

2.3.2. Information Contained in the Model about a Hydrologic Process

Gong et al. (2013) discuss information contained in inputs about a hydrologic process. For convenience (and following Gong et al. 2013), let's say the process we are interested in is the observed process y; note, however, that this can be generalized to unobserved processes also. Gong et al. (2013) further point out that the data processing inequality (Cover and Thomas 1991 p. 34) states that any model which conditions on inputs will be less efficient than Bayesian conditioning using the true underlying joint distribution. The problem is that this true underlying joint distribution is generally unknown, and a surrogate model of that distribution must be used instead. From this perspective, any model contributes information about y through the distribution over states that it implies, and this amount of information can be measured as the divergence from any prior distribution over y to the posterior distribution, which we will notate p_j(y_t) for model M_j. Considering that observations are available for data assimilation and system identification, we defined the prior over y_t as a histogram of observations (call this empirical distribution p̂) with bin widths for each observation dimension given by Scott (2004):

[12]  h_k = 3.49 σ_k N_y^{-1/3}

where σ_k is the standard deviation of the k-th dimension of the observation vector (see [5.2]) and N_y is the number of available observations (y_t might be empty for some times during the simulation period). The amount of information added at timestep t due to prescribing the conceptual model emulated by M_0 is therefore the divergence from p̂ to p_0(y_t).
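Scott's bin-width rule [12] and the resulting empirical histogram prior can be sketched as follows; the synthetic "streamflow" sample is purely illustrative:

```python
import numpy as np

def scott_bin_width(obs):
    """Scott's rule for histogram bin width, equation [12]:
    h = 3.49 * sigma * N^(-1/3)."""
    obs = np.asarray(obs, dtype=float)
    return 3.49 * obs.std(ddof=1) * obs.size ** (-1.0 / 3.0)

def empirical_prior(obs):
    """Histogram of observations used as the empirical prior over y."""
    h = scott_bin_width(obs)
    edges = np.arange(obs.min(), obs.max() + h, h)
    counts, edges = np.histogram(obs, bins=edges)
    return counts / counts.sum(), edges

# Synthetic right-skewed 'streamflow' record, one value per day for a year.
rng = np.random.default_rng(3)
flows = rng.lognormal(mean=1.0, sigma=0.8, size=365)
p_hat, edges = empirical_prior(flows)
```

The model-implied distributions p_j(y_t) are then discretized on the same bin edges, so the divergence from p̂ to p_j(y_t) can be computed as a sum over bins as in [10].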
Similarly, the amount of information added due to mathematical structure identification during the j-th EM iteration is the divergence from p_{j-1}(y_t) to p_j(y_t). All of these divergences were estimated by discretizing the observation space using bin widths given by [12]; this means that the expected value of the divergence [10] represents a summation over histogram bins. From a practical point of view, this means that [11] measures the divergence between continuous (Gaussian) distributions, whereas [12] implies that we measure divergence between discrete approximations of continuous distributions. One implication is that, because continuous information is unbounded whereas discrete information is always positive, the amounts of information we gain about model structure and about a process of interest are not directly comparable using the framework outlined by [11] and [12]. It is trivial to estimate [11] for a discrete state space; however, this is not necessary for the present discussion.

2.4. An Application Experiment

We tested the EM algorithm presented in section 2.2 and the method for estimating information introduced during system identification outlined in section 2.3 using a toy problem related to predicting streamflow in the Leaf River catchment (1944 km2) in southern Mississippi, USA.

2.4.1. Leaf River Data and Simulation Period

Forty years (1948-1988) of daily cumulative precipitation [mm/day], cumulative potential evapotranspiration [mm/day], and streamflow [m3/day] are available from the Hydrology Lab of the US National Weather Service. We used data from 1951 and 1952 for this experiment: the first 30 days of data were used for warm-up, the next 365 days were used to perform system identification (including parameter estimation of the conceptual simulator; call this the calibration period), and the final 365 days were used for performance evaluation. Observations were only available for assimilation during the calibration period.
2.4.2. The HyMod Simulator

The conceptual simulator used to define the EM prior was HyMod (Boyle 2000; see Figure 1). HyMod is commonly used for proof-of-concept demonstrations of hydrologic data assimilation (e.g., Bulygina and Gupta 2011; Moradkhani et al. 2005a; Vrugt et al. 2012; Weerts and El Serafy 2006). The model requires inputs of precipitation [mm/day] and potential evapotranspiration [mm/day] and produces estimates of streamflow [mm/day]. The state vector consists of a soil moisture storage component [mm], storage in a single slow-flow routing tank [mm], and storage in some number of quick-flow routing tanks [mm]; here we used two, so the model had a 4-dimensional state vector, 2-dimensional input, and 1-dimensional output. There are five model parameters: soil moisture storage capacity, an infiltration exponent, a partitioning coefficient, and two tank outflow coefficients. We calibrated these parameters and initial states to streamflow observations during the calibration period using shuffled complex evolution (Duan et al. 1992) with a Nash-Sutcliffe (Nash and Sutcliffe 1970) efficiency objective function; the calibrated values are listed in Table 1. The HyMod state transition functions and observation function are described in Appendix A. The state transition uncertainty distributions used to define this conceptual simulator were independent in time and Gaussian. Streamflow was the observed variable, and the observation error distribution was also Gaussian.

2.4.3. Implementing the Learning Algorithm

The state transition and observation distributions defined by HyMod and described in section 2.4.2 were sampled to provide training data for an SGPR emulator. Conceptual learning about model structure was measured as the divergence from the SGPR prior to the emulator M_0.
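For concreteness, a single time step of a HyMod-like model can be sketched as follows. This is a simplified illustration only: the parameter names (cmax, bexp, alpha, ks, kq) follow common HyMod usage, but the process equations and all numeric values here are assumptions for illustration, not the exact HyMod equations of Appendix A or the calibrated values of Table 1:

```python
import numpy as np

def hymod_like_step(state, precip, pet, cmax=350.0, bexp=1.0,
                    alpha=0.4, ks=0.05, kq=0.5):
    """One daily step of a simplified HyMod-like bucket model (a sketch).

    state = [soil storage, slow tank, quick tank 1, quick tank 2] (mm)
    """
    sm, ss, q1, q2 = state
    # Nonlinear soil moisture accounting: runoff fraction grows with storage
    frac = 1.0 - (1.0 - sm / cmax) ** bexp
    runoff = frac * precip
    sm = sm + precip - runoff
    # Evapotranspiration limited by available storage
    et = min(pet * sm / cmax, sm)
    sm -= et
    # Partition runoff between quick (alpha) and slow (1 - alpha) pathways
    ss = ss + (1.0 - alpha) * runoff
    q1 = q1 + alpha * runoff
    # Linear tanks: outflow proportional to storage
    out_s, out_1 = ks * ss, kq * q1
    ss, q1 = ss - out_s, q1 - out_1
    out_2 = kq * (q2 + out_1)
    q2 = q2 + out_1 - out_2
    streamflow = out_s + out_2
    return np.array([sm, ss, q1, q2]), streamflow

state = np.array([100.0, 20.0, 5.0, 5.0])   # hypothetical initial storages
state, q = hymod_like_step(state, precip=10.0, pet=4.0)
```

In the experiment, stochastic versions of transitions like these (with additive Gaussian noise, as in [2.1]) supply the training samples for the M_0 emulator.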
Pseudo-inputs for the SGPR were designated, the HyMod distributions were sampled repeatedly (each sample consisting of state and observation values for all timesteps), and training inputs were sampled randomly from these distributions during the calibration period. During mathematical structure learning, the same ensemble size was used for the EnKS and for sampling the EnKS posterior to generate SGPR training data at each EM iteration.

3. Results

Samples of the calibrated HyMod simulator over the simulation period are illustrated in Figure 3. The Nash-Sutcliffe efficiencies of the mean streamflow estimates made by this model after parameter calibration were computed for both the calibration and evaluation periods. Figure 4 shows the behavior of the Nash-Sutcliffe efficiencies of the mean model estimates of streamflow during the conceptual and mathematical identification steps. There are three interesting things to note here. First, the efficiency of the M_0 emulator was similar to that of the HyMod simulator during the calibration period, but the emulator reduced prediction efficiency during the evaluation period as compared to the calibrated HyMod simulator. This was likely largely due to the fact that our training samples did not fill the state- and input-space hypercube, but it could also be evidence of nonstationarity in the watershed response – we do not address nonstationarity in this paper. Second, the efficiency of the mean model predictions of streamflow during the calibration period converged by the second EM iteration, but the evaluation period efficiency remained strictly less (~0.80-0.85) than the calibration efficiency. Third, the calibration period efficiency generally increased (with a small exception during the fourth EM iteration), and although the evaluation efficiency generally increased, there were iterations where the value decreased by small amounts.
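The Nash-Sutcliffe efficiency used throughout these results is simple to compute: it is one minus the ratio of the model's squared error to the variance of the observations, so 1 is a perfect fit and 0 matches a constant prediction of the observed mean:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of
    the observations from their mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Tiny illustrative record (hypothetical values).
obs = np.array([1.0, 2.0, 3.0, 4.0])
nse_perfect = nash_sutcliffe(obs, obs)                    # → 1.0
nse_mean = nash_sutcliffe(obs, np.full(4, obs.mean()))    # → 0.0
```

Negative values are possible and indicate a model worse than simply predicting the observed mean.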
Figure 5 compares the input-output response of the HyMod state transition simulator with the mean response of the SGPR model (via equation [9.1]). There are noticeable differences in the responses of the two models. One advantage of starting with a conceptual simulator as the EM prior is that we can hypothesize about physical interpretations of some of these differences (Bulygina and Gupta 2011); however, we caution against making concrete interpretations about learned characteristics of the system, since response surface evolution is affected by assumptions in the EM algorithm itself – including, but not limited to, a known observation function H and a Gaussian relationship between model states imposed by the EnKS. Some hypotheses might be:

1. For the highest precipitation values, the soil moisture response is increased at low antecedent storage but decreased at high antecedent storage (Figure 5, subplots H-a and M-a). This indicates that the HyMod infiltration process representation might be improved to better account for precipitation intensity and antecedent conditions.

2. In the identified mathematical structure, the soil moisture state responds to the state of the quick-flow tanks (Figure 5, subplots H-b and M-b). In the HyMod conceptual simulator, the slow-flow and quick-flow tanks represent routing storage; however, the identified model suggests that there is some feedback from the routing process to the soil moisture process, which controls infiltration. One possible explanation is that the HyMod routing process might be accounting for outflow delays actually caused by transfer in the soil.

3. Routing-tank sensitivity to storage is increased at high storage values (Figure 5, subplots H-c and M-c). This indicates that the linear routing coefficient might be better represented by a dynamic process depending on storage.

4. The slow-flow routing state also responds to the states of the quick-flow tanks (Figure 5, subplots H-d and M-d).
If the routing process were influenced by the transit time of water moving through the soil matrix, then this process appears to include some feedback between processes represented at different time scales. This hypothesis is also supported by the nonlinear feedback found in the quick-flow state responses to other quick-flow states.

5. Quick-flow state response to precipitation is increased, especially for low rainfall values (Figure 5, subplots H-e and M-e). This again indicates that the model requires a dynamic runoff partitioning process which is dependent on antecedent conditions.

These types of hypotheses might be valuable in guiding physics-based or in situ investigation of the system.

Figure 6 shows the amount of information learned about the model at each stage in the system identification process. The conceptual phase introduced less information about model structure than the first mathematical identification phase, because the SGPR prior assumed highly uncertain estimates of the states (the variance was large). This illustrates why Shannon information can be interpreted as a measurement of surprise: after assigning the SGPR prior, very few SGPR state transition models would elicit large surprises about the value of the states. The vast majority of the learning during the mathematical identification phase took place during the first EM iteration, illustrating the efficiency of the EM algorithm. While conditioning on the observation data supplied quite a bit of information about the (incorrect) mathematical structure of the HyMod state-transition function, this was mostly extracted by data assimilation. It is important to note that the absolute values of the divergences reported in Figure 6 are highly dependent on the number of test inputs n, and it is therefore the relative magnitudes of these values that are of interest.
Interestingly, Figure 7 shows that more of the learning about streamflow occurred during the conceptual identification phase than in the mathematical identification phase. This is because the prior over streamflow was taken to be the histogram of calibration-period observations, and so there was considerable room for surprise after imposing a conceptual model structure, but much less room for surprise during mathematical structure estimation. While learning about streamflow continued through the second EM iteration, the small divergences in the later EM iterations were at least partially due to the fact that we used a finite sample to approximate the distributions. It is interesting to note that both the conceptually and mathematically identified models contained almost the same amount of information about calibration- and evaluation-period streamflow observations.

4. Discussion

This paper has two primary purposes. The first is to frame the discussion that appears in the hydrologic literature regarding system identification (including, to some extent, parameter estimation) in the context of well-understood methods from other fields. EM system identification is quite general, flexible and conceptually simple, and it would be beneficial to discuss future efforts for simultaneous state and parameter estimation for hydrologic models in the context of existing theory. The second purpose is to frame a discussion about the meaning of information in the context of hydrologic models. Certainly, we would like to believe that our models and data sets are informative (in some sense). But the key to understanding what it means to say that ‘a model or data set is informative’ is to understand what the information it contains is about (e.g., a conceptual rainfall-runoff model may be informative about streamflow, evapotranspiration and soil moisture, but perhaps not about relative humidity or pressure in the atmospheric boundary layer).
More clearly, it is generally useful to think about the information contained in a random variable as being about the probability distribution of that or some other random variable. This is the specific context in which Shannon’s (1948) theory applies. Furthermore, it is important to notice that models do contain information. While this may seem contrary to the data processing inequality, which states that no sequence of (linear or non-linear) transformations of the data can add information to the data, it is important to note that the data processing inequality measures information loss (or non-positive information added) against a standard that is set by the perfect model that exactly explains the relationships contained in the data – in general, this is the joint distribution between a regressor and regressand. Intuitively we expect that a model contains information, and by understanding that information manifests as the divergence caused by conditioning one random variable on another, we see that models contain information about relationships between random variables. In this context, the data processing inequality simply states that no model will perform as well as the true underlying (probabilistic) generating process. The methods we have used here for measuring information about model structure are specific to GPR. It is only possible to explicitly quantify continuous divergence between certain types of parametric distributions, and Gaussians happen to be one type where the formula is known. Furthermore, [11] actually calculates the divergence between probability distributions over model states, but in the case of models which are GPs this is an appropriate surrogate for calculating the divergence between probabilistic models.

Appendix A: The HyMod Simulator

Inputs (boundary conditions) are precipitation ([mm/day]) and potential evapotranspiration ([mm/day]).
The state vector at each timestep consists of a soil moisture storage component ([mm]), storage in a single slow-flow routing tank ([mm]), and storage in some number of quick-flow routing tanks; here we used two ([mm]). The observation (according to equation [2.2]) is an estimate of streamflow ([mm/day]). Parameters are: soil moisture storage capacity, infiltration exponent, a partitioning coefficient, and tank outflow coefficients. The HyMod simulator is illustrated in Figure 1. Infiltration is controlled by the relative available capacity of soil moisture storage at the beginning of the time step and a unitless infiltration exponent [A.1], and evapotranspiration is controlled by the relative available capacity of soil moisture storage at the end of the time step [A.2]. The soil moisture state is then updated [A.3]. Effective precipitation is the portion of precipitation that is not infiltrated, and this is routed to the slow- and quick-flow tanks according to the partitioning parameter. The quick-flow tanks are in series, so that the state in the first tank feeds the second [A.4.1, A.4.2]. Outflow from the watershed is a linear function of the current tank states according to the outflow coefficients [A.5], and the slow-flow storage tank is updated accordingly [A.6].

Tables

Table 1: Calibrated HyMod Parameter Values

Parameter or Initial State    Calibrated Range    Value     Units
                              50 – 600            425.73    [mm]
                              0.05 – 1.95         0.07      ~
                              0.5 – 0.95          0.90      ~
                              0.001 – 0.1         0.05      ~
                              0.3 – 0.95          0.58      ~
                              0 – 600             396.13    [mm]
                              0 – 300             20.64     [mm]
                              0 – 100             84.50     [mm]
                              0 – 50              22.22     [mm]

Figures

Figure 1: The HyMod simulator state transition functions and observation function are represented as a graphical network at a single timestep. Each connection in the graphical network is labeled according to which parameter controls the interaction. See Table 1 and Appendix A for an explanation of symbols and a mathematical representation of each connection.
The simulator components which represent a single timestep are grouped by the shaded box.

Figure 2: Mutual information between two random variables can be thought of as the information gain about one random variable due to Bayesian conditioning on the other. This is equal to the expected divergence from the Bayesian posterior to the Bayesian prior; divergence is an integration over the difference between log-transformed probability distributions. This figure illustrates the areas (prior to log-transforming the distributions) which are integrated (after log-transforming both distributions) to measure divergence.

Figure 3: A sample ensemble from the HyMod simulator including input and state transition uncertainty. Streamflow predictions are compared with observations in the bottom subplot.

Figure 4: Convergence of the Nash-Sutcliffe efficiency of mean model estimates of streamflow.

Figure 5: Partial state transition mean response surfaces of the HyMod simulator (H-a through H-h) and the EM SGPR model (M-a through M-h). Each surface represents an evaluation of the mean model response due to changing the two input state values on the x- and y-axes while fixing the other input values at a point in the hypercube; each of the three surfaces in each subplot was generated by fixing the inputs which do not appear on the x- or y-axis at a different (equally-spaced) location in the hypercube.

Figure 6: Information extracted from observations of streamflow about the conceptual and mathematical components of structure. EM iteration 0 represents the information added at the conceptual model identification phase. Quantities in this figure are estimates of continuous divergence. The absolute magnitudes are dependent on the number of training samples, whereas the relative magnitudes are dependent on selecting a representative set of training samples – in this case, we are interested in the relative values.
Figure 7: Mean (discrete) information about streamflow during calibration and evaluation periods added during system identification. EM iteration 0 represents the information added at the conceptual model identification phase, when starting with a prior which consisted of a histogram of calibration-period observations. Quantities in this figure are estimates of discrete divergence.

References

Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12, 162-176
Anderson, J.L. (2007). An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus A, 59, 210-224
Archambeau, C., Cornford, D., Opper, M., & Shawe-Taylor, J. (2007). Gaussian process approximations of stochastic differential equations. In N. Lawrence, A. Schwaighofer, & J. Quiñonero Candela (Eds.), Gaussian Processes in Practice (pp. 1-17). Bletchley, U.K.
Boyle, D.P. (2000). Multicriteria calibration of hydrologic models. Ph.D. dissertation, Department of Hydrology and Water Resources, University of Arizona, Tucson, AZ
Bulygina, N., & Gupta, H. (2009). Estimating the uncertain mathematical structure of a water balance model via Bayesian data assimilation. Water Resources Research, 45
Bulygina, N., & Gupta, H. (2010). How Bayesian data assimilation can be used to estimate the mathematical structure of a model. Stochastic Environmental Research and Risk Assessment, 24, 925
Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a hydrological model via Bayesian data assimilation. Water Resources Research, 47
Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener, T., & Hay, L.E. (2008). Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44
Cover, T.M., & Thomas, J.A. (1991). Elements of Information Theory.
New York, NY, USA: Wiley-Interscience
Damianou, A.C., Titsias, M.K., & Lawrence, N.D. (2011). Variational Gaussian process dynamical systems. In, NIPS (pp. 2510-2518). Granada, Spain
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38
Duan, Q.Y., Sorooshian, S., & Gupta, V. (1992). Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research, 28, 1015-1031
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128, 1852-1867
Ghahramani, Z., & Roweis, S.T. (1999). Learning nonlinear dynamical systems using an EM algorithm. Advances in Neural Information Processing Systems, 431-437
Gong, W., Gupta, H.V., Yang, D., Sricharan, K., & Hero, A.O. (2013). Estimating epistemic & aleatory uncertainties during hydrologic modeling: An information theoretic approach. Water Resources Research
Gupta, H.V., Clark, M.P., Vrugt, J.A., Abramowitz, G., & Ye, M. (2012). Towards a comprehensive assessment of model structural adequacy. Water Resources Research, 48
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82, 35-45, doi:10.1115/1.3662552
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79-86, doi:10.2307/2236703
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401, doi:10.1029/2006WR005756
McLaughlin, D. (2002).
An integrated approach to hydrologic data assimilation: interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286
Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear stochastic models. Tellus Series A: Dynamic Meteorology and Oceanography, 51, 167-194
Moradkhani, H., Hsu, K.L., Gupta, H., & Sorooshian, S. (2005a). Uncertainty assessment of hydrologic model states and parameters: Sequential data assimilation using the particle filter. Water Resources Research, 41
Moradkhani, H., Sorooshian, S., Gupta, H.V., & Houser, P.R. (2005b). Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Advances in Water Resources, 28, 135-147
Nash, J.E., & Sutcliffe, J.V. (1970). River flow forecasting through conceptual models part I – A discussion of principles. Journal of Hydrology, 10, 282-290
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press
Restrepo, J.M. (2008). A path integral method for data assimilation. Physica D: Nonlinear Phenomena, 237, 14-27
Roweis, S., & Ghahramani, Z. (2000). An EM algorithm for identification of nonlinear dynamical systems
Scott, D.W. (2004). Multivariate density estimation and visualization. In J.E. Gentle, W. Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and Methods (pp. 517-538). New York: Springer
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423
Snelson, E., & Ghahramani, Z. (2006). Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press
Turner, R., Deisenroth, M., & Rasmussen, C. (2009). System identification in Gaussian process dynamical systems. In D. Görür (Ed.), Nonparametric Bayes Workshop at NIPS. Whistler, Canada
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005).
Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41, 17
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and applications. Advances in Water Resources
Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 283-298
Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff models. Water Resources Research, 42
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica D: Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
Ye, M., Meyer, P.D., & Neuman, S.P. (2008). On model selection criteria in multimodel analysis. Water Resources Research, 44, W03428

APPENDIX E: KALMAN FILTERING WITH A GAUSSIAN PROCESS OBSERVATION FUNCTION

Grey S. Nearing
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
Manuscript not submitted for publication.

Abstract

We derive the gain expression of the iterative ensemble Kalman filter for a hidden Markov model with a Gaussian process regression observation function. Kriging the observation function allows for an explicit expression for the gradient of the filter posterior density local to each ensemble member. The gain must be approximated variationally, and the performance is similar to that of the ensemble Kalman filter and a resampling particle filter for toy problems related to streamflow forecasting, soil moisture estimation and the Lorenz attractor.

1.
Introduction

Kalman-type Bayesian state-updating filters are commonly applied to nonlinear hidden Markov models (HMMs) by approximating the prior (forecast) as Gaussian and the observation function as locally linear. In the case of a nonlinear HMM state transition function, the ensemble Kalman filter (Evensen 2003) estimates the mean and covariance of a Gaussian approximation of the prior by Monte Carlo sampling; in the case of a nonlinear HMM observation function, variational methods are used to find the mode of the filter posterior (Zupanski 2005). One limitation of these variational methods is that the tangent to the observation function is localized around the ensemble mean. By emulating the observation operator as a Gaussian process regression (GPR; Rasmussen and Williams 2006) it is possible to derive the filter posterior density function and its gradient anywhere in model state space; this allows for a variational update that is local to each ensemble member. GPR is kriging in an arbitrary function domain, so that the GPR prediction is a linear combination of training samples with kernel weights based on domain proximity (see section 2.2 and Rasmussen and Williams 2006, Chapter 1 for further explanation). GPR is also identical to certain single-layer, infinite-node, feedforward neural networks (Neal 1996). It is simple to implement, resistant to overfitting, and can reproduce any smooth response as long as the covariance structure of that response is known. When the gradients of the kernels are explicit, the gradient of the emulator is as well, and the gradient of the EnKF analysis can be computed anywhere. Section 2 of this paper presents the gain expression and variational approximation of the EnKF for an HMM with a GPR observation function (gpEnKF). Section 3 provides toy examples which compare this filter with the EnKF, ensemble transform Kalman filter (ETKF; Bishop et al. 2001) and the sequential importance resampling filter (SIRF; Gordon et al. 1993).
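The kriging view of GPR described above — predictions as kernel-weighted linear combinations of training samples, with per-dimension (ARD) weights that downweight irrelevant inputs — can be sketched as follows (the kernel hyperparameters and all numerical values are illustrative assumptions, not fitted values):

```python
import numpy as np

def gpr_mean(X_train, y_train, X_test, inv_len, noise=1e-6):
    """Simple-kriging GPR mean with an ARD squared-exponential kernel."""
    def kernel(A, B):
        # Weighted squared distances: irrelevant dimensions get tiny weights.
        d2 = (((A[:, None, :] - B[None, :, :]) ** 2) * inv_len).sum(-1)
        return np.exp(-0.5 * d2)
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    # Prediction is a linear combination of training outputs.
    return kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

# A 2-D input whose second dimension is irrelevant (near-zero ARD weight).
rng = np.random.default_rng(1)
X = np.column_stack([np.linspace(0, 3, 15), rng.normal(size=15)])
y = np.sin(X[:, 0])
pred = gpr_mean(X, y, np.array([[1.5, 0.0]]), inv_len=np.array([4.0, 1e-9]))
print(float(pred))  # close to sin(1.5)
```

Because the kernel is smooth and explicit, its gradient (and hence the gradient of the emulator) is available in closed form, which is the property exploited by the gpEnKF.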
These examples are related to streamflow forecasting, root-zone soil moisture estimation and the Lorenz attractor.

2. Methods

2.1. Overview of Data Assimilation

Data assimilation assumes a discrete-time nonlinear dynamical system (NLDS) simulator where the state-transition function:

x_t = f(x_{t-1}, u_t, η_t)  [1.1]

at time t depends, up to some uncertainty distribution η_t and the current boundary condition u_t, on the previous state x_{t-1}. The state is indirectly observed according to the observation function:

y_t = h(x_t, ε_t)  [1.2]

up to a random observation error ε_t. In most data assimilation applications, f and h are deterministic functions of the random variables, and represent the expected values of the distributions of x_t and y_t respectively. For example, standard applications of the EnKF use iid samples of η_t and ε_t to create iid samples of x_t and y_t at every time step using a deterministic simulator and prescribed error distributions.

Data assimilation addresses the question “what can be learned about x by observing y?” and the Bayesian answer is a smoother. Given observations during T simulation time steps (we drop the notation for forcing data and error distributions):

p(x_{1:T} | y_{1:T}) = p(y_{1:T} | x_{1:T}) p(x_{1:T}) / ∫ p(y_{1:T} | x_{1:T}) p(x_{1:T}) dx_{1:T}  [2]

In general, there is no analytical solution to [2], and it is impractical to sample directly from the posterior, whose dimension is the dimension of the state multiplied by the number of integration time steps; therefore it is necessary to make some simplifying assumptions, the most common being that the state at time t is conditionally independent of observations at times greater than t. This assumption results in a filter:

p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}  [3]

which can be applied sequentially, using the posterior at time t-1 to define the prior at time t in the integral. The filter posterior has the dimension of the state, and it is still rare for nonparametric density estimation to be practical.

2.2.
GPR Observation Function

A GPR observation function emulator can be trained with samples of forcing data, states and modeled observations collected from the HMM defined by [1] and the uncertainty distributions of η_t and ε_t. If these samples are stored in training data matrices X (states and forcing data) and Y (the expected values of the observations resulting from those samples), the GPR emulator mean is:

h*(x) = k(x, X) [K(X, X)]^{-1} Y  [4]

where k is the GPR covariance function (see Rasmussen and Williams 2006, Chapter 4), and [4] is the simple kriging mean. It is usually the case for natural systems that the observation will have different sensitivity to the different dimensions of the state and forcing data, and thus the covariance function should be anisotropic; one such covariance function is the squared-exponential known as the automatic relevance determination kernel (Neal 1996):

k(x_i, x_j) = σ_f² exp(-½ Σ_d λ_d (x_{i,d} - x_{j,d})²) + σ_n² δ_{ij}  [5]

where δ_{ij} is the Kronecker delta, and σ_f², λ_d, and σ_n² are scale, inverse squared correlation-length, and noise hyperparameters respectively, which must be estimated by maximizing the probability of the training data (Rasmussen and Williams 2006, Chapter 5). Training samples can be drawn directly from the simulator, and if the observation function is independent of time, the emulator can be trained prior to data assimilation; in this case the emulator is independent of the state and boundary condition at any particular time.

2.3. gpEnKF Posterior Likelihood and Gradient

The state is observed with error covariance R, and the prior is sampled by the forecast ensemble and assumed to be Gaussian with mean x̄ and covariance C. The negative log-likelihood of the posterior is given up to an additive constant by:

J(x) = ½ (y - h*(x))ᵀ R⁻¹ (y - h*(x)) + ½ (x - x̄)ᵀ C⁻¹ (x - x̄)  [6]

The gradient of the likelihood function with respect to a sample of the model state is:

∇_x J = -H(x)ᵀ R⁻¹ (y - h*(x)) + C⁻¹ (x - x̄)  [7]

where H(x) is the gradient of the GPR mean. The maximum likelihood (analysis) filter estimate of the state conditional on y is:

x^a = x̄ + K (y - h*(x^a))  [8.1]
K = C Hᵀ (H C Hᵀ + R)⁻¹  [8.2]

where K is the Kalman gain. [8] is implicit and requires a variational solution.
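Such a variational solution can be sketched as gradient descent on the negative log-likelihood [6], using its gradient [7]. In the sketch below a linear observation function stands in for the GPR emulator (so the result can be checked against the explicit Kalman analysis); all values are illustrative:

```python
import numpy as np

def variational_update(x0, xbar, C, y, h, h_grad, R, steps=200, lr=0.05):
    """Gradient descent on the filter negative log-likelihood [6]."""
    Ci, Ri = np.linalg.inv(C), np.linalg.inv(R)
    x = x0.copy()
    for _ in range(steps):
        # Gradient [7]: prior pull toward xbar plus likelihood pull toward y.
        grad = Ci @ (x - xbar) - h_grad(x).T @ Ri @ (y - h(x))
        x -= lr * grad
    return x

# For linear h(x) = Hx this recovers the standard Kalman analysis.
H = np.array([[1.0, 0.0]])
xbar, C = np.zeros(2), np.eye(2)
y, R = np.array([1.0]), np.array([[0.5]])
x_a = variational_update(xbar, xbar, C, y, lambda x: H @ x, lambda x: H, R)
K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
print(np.allclose(x_a, (K @ y).ravel(), atol=1e-3))  # True
```

With the GPR emulator in place of the linear map, `h_grad` would return the closed-form gradient of the kriging mean, evaluated locally at each ensemble member.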
Many variational approaches to either minimizing the negative log-likelihood of the posterior [6] or locating a local zero of the log-likelihood gradient [8] are sufficient for locating a local estimate of the maximum likelihood state estimator. The log-likelihood gradient [7] is taken with respect to each ensemble member independently (except for the calculation of the prior covariance) and is local to that ensemble member. Gradient-based variational solutions are common in high-dimensional latent-variable GPR models (e.g., Lawrence 2005; Titsias and Lawrence 2010; Wang et al. 2008). It is important to note that neither the gpEnKF nor any other filter with a Gaussian prior results in a maximum likelihood update which considers the full nonlinearity of the HMM, since the covariance matrix assumes a linear relationship between state dimensions. This can be intuited by noting that a state dimension to which the observation is insensitive will still be updated according to its (linear) prior covariance with the other state dimensions, even though any nonlinear relationship between those dimensions is not considered.

3. Demonstrations

Four toy demonstrations of the gpEnKF follow. Each one is an identical twin observing system simulation experiment (e.g., Crow and Van Loon 2006), where a truth state and assimilation ensemble were sampled iid from the HMM simulator used for data assimilation. Observations synthesized from the synthetic truth system were assimilated using four filters: the EnKF, SIRF, ETKF and gpEnKF. An assimilation ensemble of fixed size was used in each case, and we ran 30 Monte Carlo repetitions of each experiment to account for the random truth system. Filter performance was evaluated in terms of improvement in the expected squared-error of ensemble mean estimates, and in terms of the posterior probability of the truth system.
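One way to compute such a time-averaged normalized squared-error improvement score can be sketched as follows (a generic sketch against an open-loop baseline; the paper's exact normalization may differ, and all values are illustrative):

```python
import numpy as np

def normalized_improvement(truth, open_loop_mean, analysis_mean):
    """Time-averaged normalized improvement of ensemble-mean squared error."""
    err_ol = (open_loop_mean - truth) ** 2
    err_an = (analysis_mean - truth) ** 2
    # 0 means no improvement over the baseline; 1 means perfect recovery.
    return np.mean((err_ol - err_an) / err_ol)

truth = np.zeros(100)
open_loop = np.random.default_rng(0).normal(0.0, 1.0, 100)
analysis = 0.5 * open_loop  # assimilation halves the ensemble-mean error
print(normalized_improvement(truth, open_loop, analysis))  # 0.75
```

Halving the error at every time step reduces the squared error by a factor of four, hence the improvement score of 0.75.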
We report the time-averaged normalized improvement of the expected squared-error of the ensemble mean [9], and the time-averaged log-scaled improvement of the probability of the truth system assuming Gaussian posterior distributions [10], where the Gaussian distribution in [10] has its first two moments estimated from the analysis sample calculated by each filter type. Both [9] and [10] were calculated separately for each state dimension, and the analog was calculated for each observation dimension. Larger improvement scores represent more accurate posterior mean estimates, and larger probability values represent a lower chance of type II prediction error. In each example, the GPR observation function emulator used by the gpEnKF was trained once before data assimilation using prior ensemble state and observation function means for the entire simulation (integration) period. Each dimension of the GPR training variables was standardized to have zero mean and unit variance before hyperparameter optimization.

3.1. Streamflow Forecasting

In the first toy demonstration, observations of river stage were assimilated into a conceptual lumped rainfall-runoff model based on HyMod (Boyle 2000) under the presumption that improved state estimates translate to improved streamflow forecasts (Maurer and Lettenmaier 2003). HyMod is commonly used for proof-of-concept demonstrations of hydrologic data assimilation (Bulygina and Gupta 2011; Moradkhani et al. 2005; Vrugt et al. 2005; Vrugt et al. 2012; Weerts and El Serafy 2006) and is described extensively in the existing literature.
The model requires precipitation [cm] and potential evapotranspiration [cm] as boundary conditions, and our implementation used a 3-dimensional state vector including a soil moisture storage which acts as a control on infiltration and evaporation, a linear slow-flow routing storage tank which simulates groundwater, and a single linear quick-flow routing storage tank which simulates near-surface water. The soil moisture tank drains into the slow and quick routing tanks, which are in parallel. Streamflow ([cm/m2] over the watershed) is estimated as a linear function of the routing tank states, and river stage ([m]) – an exponential function of streamflow – was observed according to equation [11]. Parameters of [11] were chosen to match the NWS rating curve for the Leaf River Watershed in Mississippi, USA at the time of this publication. Vrugt et al. (2012) give a concise explanation of the HyMod state transition model we used here, except that ours had a single quick-flow state and we did not consider parameter uncertainty. The truth systems and prior ensembles were forced with mean boundary conditions supplied by measurements taken at the Leaf River Watershed during the period Nov 5, 1952 through Feb 3, 1953. Forcing data uncertainty was prescribed Gaussian and independent in time with a prescribed covariance, the state uncertainty distribution was likewise prescribed, and synthetic observations were generated according to [11]. Normalized posterior mean MSE and normalized posterior log-probabilities of truth are plotted in Figure 1 along with 1-standard-deviation error-bars from Monte Carlo repetitions of the OSSE. There was no significant difference between the performance of any filter except for the SIRF, which had a negative improvement value for the soil moisture state. In this case the observation function was locally close to linear, and in some cases the mean improvement scores for the EnKF were higher than for either variant with a nonlinear observation function.
Sensitivity of the observation to the state dimensions is illustrated in Figure 2, which plots an example of the GPR observation function ARD kernel inverse squared correlation-length hyperparameters; large values represent states to which the observation is sensitive. According to this ARD sensitivity analysis, stage is not dependent on soil moisture, and any update to the soil moisture state will be due to the joint prior. This effect is small on average for all four filters; however, it is larger for the EnKF than for the gpEnKF.

3.2. Root-Zone Soil Moisture Estimation

The second toy example assimilated active radar backscatter synthesized by the integral equation model (IEM; Fung et al. 1996) into a simple physical soil moisture accounting simulator based on Mahrt and Pan (1984). The soil moisture state transition equations and their parameters are in Appendix A. These OSSE used the same precipitation and potential evaporation boundary conditions as the HyMod example. Model state variables were volumetric water content in three layers with cumulative depths of 5, 15 and 30 [cm] respectively; 5 cm soil depth is typically assumed to be visible to L-band active radar satellites (Entekhabi et al. 2010). Vertically and horizontally polarized backscatter coefficients were synthesized from the truth-system 5 cm soil moisture states with a prescribed surface-roughness RMS height, a Gaussian roughness autocorrelation length, a fixed incidence angle, and a prescribed backscatter error distribution. It is interesting to note that this application is similar to the neural network approach to estimating soil moisture by Pulvirenti et al. (2009), except that the IEM was effectively inverted by Bayes law rather than by training an explicit inversion emulator like Baghdadi et al. (2002). Validation results for this soil moisture experiment are illustrated in Figure 3. Here again, the parametric filters performed similarly and better than the nonparametric SIRF.
Most notably, the gpEnKF performance was similar to the MLEF performance. No statistically significant differences were found between any filters when considering a single performance metric.

3.3. Lorenz 3-D

The third toy problem was based on the 3-dimensional Lorenz attractor (Lorenz 1963), which is a common example for data assimilation in nonlinear models (Evensen and van Leeuwen 2000; Miller et al. 1994; Pham 2001; Vrugt et al. 2012). The Lorenz equations were forced by perturbations and the state transition model:

dx₁/dt = σ (x₂ - x₁);  dx₂/dt = x₁ (ρ - x₃) - x₂;  dx₃/dt = x₁ x₂ - β x₃  [12]

was integrated by Runge-Kutta with a fixed time step. The initial state distribution and the parameters σ, ρ and β were prescribed. Observations were generated according to a nonlinear transform of the states [13], where the standard deviations of the three state dimensions were estimated from the ensemble means. Results are illustrated in Figure 4. In this case, the filters with nonlinear observation functions seemed to outperform the EnKF and SIRF in terms of both evaluation criteria; however, there was no significant difference between the performance metrics of any filters. The application of Gaussian filters to the Lorenz system is sub-optimal (Cornford et al. 2008); however, in our experiments all filters were able to improve the MSE of state estimates in almost every OSSE.

4. Discussion

The innovation in this paper comes from avoiding linearization of the HMM observation function by using a nonparametric regression, which allows for a standard form of the gain expression with a gradient local to each ensemble member. The posterior likelihood [6] and its gradient [7] are linear combinations of terms related to the Bayesian prior and likelihood. This means that any filter prior could be substituted for the Gaussian used in our examples. Most notably, the likelihood portion of the gradient could be used in conjunction with a kernel or copula prior, which is something that the MLEF does not allow since it localizes the posterior likelihood around the ensemble mean.
The standard filter gain expression comes at the cost of an implicit MLE update. The computational cost of the proposed filter is difficult to calculate exactly since the solution is variational; however, one calculation of the gpEnKF likelihood and gradient scales with the number of GPR training data points, whereas one EnKF update does not. We generally found convergence in between 20 and 60 log-likelihood function evaluations in all cases (all of our models were 3-D). When the observation function is highly nonlinear, this cost may be acceptable given the improved accuracy of the update; however, this choice will be application dependent. In our toy examples, the gpEnKF performed as well as the MLEF, and sometimes better than the EnKF when the observation function was highly nonlinear. Parametric filter performance was generally better than SIRF performance; resampling filters are rarely used in land surface data assimilation because they generally require a large simulation ensemble (Snyder et al. 2008) and often offer little accuracy benefit over the EnKF (Kivman 2003; Nearing et al. 2012; van Leeuwen 2003; Weerts and El Serafy 2006). Any data assimilation application must balance effectiveness and efficiency: computational cost is due to both the filter update and the HMM integration (size of the ensemble), and the appropriateness of filter assumptions will depend on the NLDS and the nature of the desired state estimates. OSSE are a good way to inform this tradeoff.

Appendix: A Three-Layer Mahrt-Pan Soil Moisture Model

We developed a three-layer soil moisture model based on the two-layer model presented by Mahrt and Pan (1984). The state vector consists of volumetric water content in the soil layers; for our implementation we used layers with cumulative depths of 5, 15 and 30 [cm], so that soil layer thicknesses were 5, 10 and 15 [cm]. The model requires a thin upper layer to simulate water available for direct evaporation.
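The appendix model's unsaturated hydraulics follow Brooks and Corey (1964). A minimal sketch of the unsaturated conductivity relation (the parameter values below are generic placeholders, not those used in the experiments):

```python
import numpy as np

def brooks_corey_k(theta, theta_r=0.05, theta_s=0.45, k_sat=10.0, lam=0.3):
    """Brooks-Corey unsaturated conductivity: K = K_s * Se^((2 + 3*lam)/lam)."""
    # Effective saturation, clipped to the physical range [0, 1].
    se = np.clip((theta - theta_r) / (theta_s - theta_r), 0.0, 1.0)
    return k_sat * se ** (3.0 + 2.0 / lam)

# Conductivity drops sharply as the soil dries.
for theta in (0.45, 0.30, 0.15):
    print(brooks_corey_k(theta))
```

The strongly nonlinear dependence of conductivity on water content is what makes the layered water-content update sensitive to the moisture state of each layer.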
Integration was on a daily time step, and forcing data were daily potential evaporation and daily cumulative precipitation. Model parameters were the Brooks-Corey hydraulic coefficients: porosity, residual moisture content, saturated hydraulic conductivity, bubbling pressure, and pore size distribution index (b).

At each time step, soil diffusivity (D) and unsaturated conductivity (K) were calculated according to Brooks and Corey (1964) [A.1.1, A.1.2]. Infiltration potential and actual infiltration were calculated as [A.2.1]-[A.2.3], evaporation from the top soil layer as [A.3.1]-[A.3.2], and the volumetric water content of each layer was updated according to the resulting water balance [A.4.1]-[A.4.3].

Figures

Figure 1: Logarithms of the normalized posterior MSE (left; lower values are better) and normalized posterior log-probabilities of the truth (right; higher values are better) from assimilating stage observations into HyMod. Error-bars represent one standard deviation.

Figure 2: Inverse squared ARD correlation lengths for the HyMod GPR observation function emulator.

Figure 3: Mahrt-Pan OSSE results from assimilating 5 [cm] soil moisture observations (top) and radar backscatter observations (bottom). Error-bars represent one standard deviation.

Figure 4: Results from the Lorenz 3-D OSSE with the nonlinear observation function. Error-bars represent one standard deviation.

References

Baghdadi, N., Gaultier, S., & King, C. (2002). Retrieving surface roughness and soil moisture from synthetic aperture radar (SAR) data using neural networks. Canadian Journal of Remote Sensing, 28, 701-711

Bishop, C.H., Etherton, B.J., & Majumdar, S.J. (2001). Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129, 420-436

Boyle, D.P. (2000). Multicriteria calibration of hydrologic models.
In, Department of Hydrology and Water Resources. Tucson, AZ: University of Arizona

Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology Papers, Colorado State University

Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a hydrological model via Bayesian data assimilation. Water Resources Research, 47

Cornford, D., Shen, Y., Vrettas, M., Opper, M., Archambeau, C., & Shawe-Taylor, J. (2008). Approximations in non-Gaussian data assimilation: when does it matter? In, RSS / RMS joint meeting on non-Gaussian / non-linear aspects of data assimilation, 10 April 2008. London, UK: Royal Statistical Society

Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the sequential assimilation of remotely sensed surface soil moisture. Journal of Hydrometeorology, 7, 421-432

Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N., Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R., Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi, J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716

Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367

Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128, 1852-1867

Fung, A.K., Dawson, M.S., Chen, K.S., Hsu, A.Y., Engman, E.T., O'Neill, P.O., & Wang, J. (1996). A modified IEM model for scattering from soil surfaces with application to soil moisture sensing. IGARSS '96. 1996 International Geoscience and Remote Sensing Symposium. Remote Sensing for a Sustainable Future (Cat. No.96CH35875)

Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993).
Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing, 140, 107-113

Kivman, G.A. (2003). Sequential parameter estimation for stochastic systems. Nonlin. Processes Geophys., 10, 253-259

Lawrence, N. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6, 1783-1816

Lorenz, E.N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130-141

Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer Meteorology, 29, 1-20

Maurer, E.P., & Lettenmaier, D.P. (2003). Predictability of seasonal runoff in the Mississippi River basin. Journal of Geophysical Research-Atmospheres, 108

Miller, R.N., Ghil, M., & Gauthiez, F. (1994). Advanced data assimilation in strongly nonlinear dynamical systems. Journal of the Atmospheric Sciences, 51, 1037-1056

Moradkhani, H., Hsu, K.L., Gupta, H., & Sorooshian, S. (2005). Uncertainty assessment of hydrologic model states and parameters: Sequential data assimilation using the particle filter. Water Resources Research, 41

Neal, R.M. (1996). Bayesian Learning for Neural Networks. New York: Springer

Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V. (2012). Assimilating remote sensing observations of leaf area index and soil moisture for wheat yield estimates: An observing system simulation experiment. Water Resources Research, 48

Pham, D.T. (2001). Stochastic methods for sequential data assimilation in strongly nonlinear systems. Monthly Weather Review, 129, 1194-1207

Pulvirenti, L., Ticconi, F., & Pierdicca, N. (2009). Neural network emulation of the Integral Equation Model with multiple scattering. Sensors, 9, 8109-8125

Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press

Snyder, C., Bengtsson, T., Bickel, P., & Anderson, J.
(2008). Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136, 4629-4640

Titsias, M.K., & Lawrence, N.D. (2010). Bayesian Gaussian process latent variable model. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 844-851

van Leeuwen, P.J. (2003). A variance-minimizing filter for large-scale applications. Monthly Weather Review, 131, 2071-2084

Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005). Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41, 17

Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and applications. Advances in Water Resources

Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 283-298

Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff models. Water Resources Research, 42

Zupanski, M. (2005). Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133, 1710-1726
