DIAGNOSTICS AND GENERALIZATIONS FOR PARAMETRIC STATE ESTIMATION
by
Grey Stephen Nearing
_____________________
Copyright © Grey Nearing 2013
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF HYDROLOGY AND WATER RESOURCES
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
WITH A MAJOR IN HYDROLOGY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2013
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Dissertation Committee, we certify that we have read the dissertation
prepared by Grey Nearing entitled Diagnostics and Generalizations for Parametric State
Estimation and recommend that it be accepted as fulfilling the dissertation requirement
for the Degree of Doctor of Philosophy
____________________________________________________________Date: 3/25/13
Hoshin V. Gupta
____________________________________________________________Date: 3/25/13
C. Larry Winter
____________________________________________________________Date: 3/25/13
Ty P.A. Ferré
____________________________________________________________Date: 3/25/13
Wade T. Crow
Final approval and acceptance of this dissertation is contingent upon the candidate's
submission of the final copies of the dissertation to the Graduate College.
I hereby certify that I have read this dissertation prepared under my direction and
recommend that it be accepted as fulfilling the dissertation requirement.
____________________________________________________________Date: 3/25/13
Dissertation Director: Hoshin V. Gupta
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of the requirements for
an advanced degree at the University of Arizona and is deposited in the University
Library to be made available to borrowers under rules of the Library.
Brief quotations from this dissertation are allowable without special permission,
provided that an accurate acknowledgement of the source is made. Requests for
permission for extended quotation from or reproduction of this manuscript in whole or in
part may be granted by the copyright holder.
SIGNED: Grey Nearing
ACKNOWLEDGEMENTS
Thanks to Hoshin and Susan for showing me that patience and respect are the most
important ingredients for teaching, learning, and communicating.
DEDICATION
To my father, who gave me exactly what I needed when I needed it.
TABLE OF CONTENTS
ABSTRACT ......................................................................................................................11
CHAPTER 1: INTRODUCTION ...................................................................................12
1.1. Review of Literature and Statement of Problems............................................12
1.2. Philosophy and Organization of This Dissertation ..........................................15
1.3. Statement of Author’s Contribution .................................................................16
CHAPTER 2: PRESENT STUDIES ..............................................................................18
2.1. Assimilating Remote Sensing Observations of Leaf Area Index and Soil
Moisture for Wheat Yield Estimates: An Observing System Simulation
Experiment ..........................................................................................................18
2.2. An Approach to Quantifying the Efficiency of a Bayesian Filter...................19
2.3. Information Loss in Estimation of Agricultural Yield: A Comparison of
Generative and Discriminative Approaches ....................................................20
2.4. Measuring Information about Model Structure Introduced During System
Identification .......................................................................................................22
2.5. Kalman Filtering with a Gaussian Process Observation Function ................24
CHAPTER 3: DISCUSSION AND FUTURE WORK .................................................26
REFERENCES .................................................................................................................28
APPENDIX A: ASSIMILATING REMOTE SENSING OBSERVATIONS OF
LEAF AREA INDEX AND SOIL MOISTURE FOR WHEAT YIELD
ESTIMATES: AN OBSERVING SYSTEM SIMULATION EXPERIMENT ..........32
Abstract .......................................................................................................................33
1. Introduction............................................................................................................34
2. Methods...................................................................................................................36
2.1. The DSSAT CropSim Ceres Wheat Model ................................................37
2.2. The Ensemble Kalman Filter .......................................................................40
2.3. The Sequential Importance Resampling Filter ..........................................42
2.4. Observing System Simulation Experiments ...............................................43
2.4.1. Modeling Uncertainty Distributions ..................................................44
2.4.2. Generating Synthetic Observations ...................................................48
2.4.3. Ensemble Size Experiments ................................................................50
2.4.4. Modeling Uncertainty Experiments ...................................................51
2.4.5. Observation Uncertainty Experiments ..............................................52
3. Results .....................................................................................................................53
3.1. Ensemble Size Experiments .........................................................................53
3.2. Modeling Uncertainty Experiments ............................................................54
3.3. Observation Uncertainty Experiments .......................................................58
4. Discussion ...............................................................................................................58
Acknowledgement ......................................................................................................60
Tables ..........................................................................................................................61
Figures .........................................................................................................................69
References ...................................................................................................................72
APPENDIX B: AN APPROACH TO QUANTIFYING THE EFFICIENCY OF A
BAYESIAN FILTER .......................................................................................................76
Abstract .......................................................................................................................77
1. Introduction............................................................................................................78
2. Theory: Background and Proposed Metrics .......................................................80
2.1. Bayes Filters ..................................................................................................80
2.2. Ensemble Kalman Filter ..............................................................................83
2.3. Observing System Simulation Experiments ...............................................84
2.4. Quantifying Observation Utility and Filter Efficiency ..............................85
3. Demonstration: An OSSE for Estimating Root-Zone Soil Moisture ................88
3.1. A 3-Layer Soil Moisture Model ...................................................................89
3.1.1. State Transition Function ...................................................................89
3.1.2. Simulation Period and Boundary Conditions ...................................91
3.1.3. Observation Function ..........................................................................92
3.2. Ensemble Size ................................................................................................92
3.3. Methods of OSSE Analysis ..........................................................................94
3.4. Assessing the Effects of Simulator Uncertainty .........................................95
3.5. OSSE Results .................................................................................................95
4. Summary and Discussion ......................................................................................98
Acknowledgement ....................................................................................................101
Figures .......................................................................................................................102
References .................................................................................................................106
APPENDIX C: INFORMATION LOSS IN ESTIMATION OF AGRICULTURAL
YIELD: A COMPARISON OF GENERATIVE AND DISCRIMINATIVE
APPROACHES ..............................................................................................................109
Abstract .....................................................................................................................110
1. Introduction..........................................................................................................111
2. Methods.................................................................................................................113
2.1. A Crop Development Simulator ................................................................113
2.1.1. The State Transition Function ..........................................................115
2.1.2. The Observation Function ................................................................115
2.1.3. Simulation Period and Uncertainty Sampling ................................116
2.2. Data Assimilation and the Ensemble Kalman Filter ...............................118
2.3. A Gaussian Process Regression Interpolator ...........................................121
2.4. Observing System Simulation Experiments .............................................124
2.5. Measuring Information in Observations ..................................................125
2.6. Measuring Information Loss .....................................................................127
3. Results ...................................................................................................................128
4. Conclusions and Discussion ................................................................................130
Appendix A: The Crop Development Simulator State Transition Function .....131
Acknowledgement ....................................................................................................136
Tables ........................................................................................................................137
Figures .......................................................................................................................138
References .................................................................................................................144
APPENDIX D: MEASURING INFORMATION ABOUT MODEL STRUCTURE
INTRODUCED DURING SYSTEM IDENTIFICATION ........................................149
Abstract .....................................................................................................................150
1. Introduction..........................................................................................................151
2. Methods.................................................................................................................153
2.1. Dynamic System Simulators ......................................................................153
2.2. An EM System Identification Algorithm ..................................................155
2.2.1. E-Step: The Ensemble Kalman Smoother .......................................156
2.2.2. M-Step: Sparse Gaussian Process Regression.................................158
2.2.3. Why We Cannot Use Filters .............................................................161
2.3. Measuring Information in the Conceptual and Mathematical
Models ..........................................................................................................162
2.3.1. Information about the Model Extracted from Observations.........163
2.3.2. Information Contained in the Model about a Hydrologic
Process ................................................................................................164
2.4. An Application Experiment .......................................................................166
2.4.1. Leaf River Data and Simulation Period ..........................................166
2.4.2. The HyMod Simulator.......................................................................167
2.4.3. Implementing the Learning Algorithm............................................168
3. Results ...................................................................................................................168
4. Discussion .............................................................................................................172
Appendix A: The HyMod Simulator ......................................................................174
Tables ........................................................................................................................176
Figures .......................................................................................................................177
References .................................................................................................................184
APPENDIX E: KALMAN FILTERING WITH A GAUSSIAN PROCESS
OBSERVATION FUNCTION......................................................................................187
Abstract .....................................................................................................................188
1. Introduction..........................................................................................................189
2. Methods.................................................................................................................190
2.1. Overview of Data Assimilation ..................................................................190
2.2. GPR Observation Function .......................................................................191
2.3. gpEnKF Posterior Likelihood and Gradient ...........................................193
3. Demonstrations ....................................................................................................194
3.1. Streamflow Forecasting ..............................................................................196
3.2. Root-Zone Soil Moisture Estimation.........................................................198
3.3. Lorenz 3-D ..................................................................................................199
4. Discussion .............................................................................................................200
Appendix: A Three-Layer Mahrt-Pan Soil Moisture Model ...............................201
Figures .......................................................................................................................204
References .................................................................................................................208
ABSTRACT
This dissertation comprises five distinct research projects that apply, evaluate, and
extend common methods for land surface data assimilation. The
introduction of novel diagnostics and extensions of existing algorithms is motivated by an
example, related to estimating agricultural productivity, of failed application of current
methods. We subsequently develop methods, based on Shannon’s theory of
communication, to quantify the contributions from all possible factors to the residual
uncertainty in state estimates after data assimilation, and to measure the amount of
information contained in observations which is lost due to erroneous assumptions in the
assimilation algorithm. Additionally, we discuss an appropriate interpretation of Shannon
information which allows us to measure the amount of information contained in a model,
and use this interpretation to measure the amount of information introduced during data
assimilation-based system identification. Finally, we propose a generalization of the
ensemble Kalman filter designed to relax one of its primary assumptions – that the
observation function is linear.
CHAPTER 1: INTRODUCTION
1.1. Review of Literature and Statement of Problems
Data assimilation is defined as the application of Bayes’ law (Bayes and Price 1763) to
the problem of estimating the states of a hidden Markov model (HMM) conditional on
observations (Wikle and Berliner 2007), and has been used extensively in earth science
(Reichle 2008), including hydrology (Liu and Gupta 2007) and agronomy (Prevot et al.
2003). For practical purposes, most data assimilation systems are based on
approximations of Bayes’ Law; both parametric (Evensen 2003; Kalman 1960) and
nonparametric (Gordon et al. 1993) approximations have been proposed.
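The Bayes-law recursion that these parametric and nonparametric filters approximate can be evaluated exactly for a scalar state on a discrete grid. The following sketch is purely illustrative (the random-walk transition density and Gaussian likelihood are stand-ins for a real hidden Markov model, not taken from any of the studies in this dissertation); it alternates the Chapman-Kolmogorov prediction with the Bayes update:

```python
import numpy as np

def bayes_filter_step(prior, grid, transition_pdf, likelihood):
    """One predict/update cycle of the exact Bayes filter on a grid.

    prior          : p(x_{t-1} | y_{1:t-1}) evaluated on `grid`
    transition_pdf : p(x_t | x_{t-1}) as a function of (x_t, array of x_{t-1})
    likelihood     : p(y_t | x_t) evaluated on `grid`
    """
    dx = grid[1] - grid[0]
    # Predict: Chapman-Kolmogorov integral over the previous state.
    forecast = np.array([
        np.sum(transition_pdf(x, grid) * prior) * dx for x in grid
    ])
    # Update: Bayes' law, renormalized on the grid.
    posterior = forecast * likelihood
    return posterior / (np.sum(posterior) * dx)

# Toy example: random-walk dynamics, Gaussian observation noise.
grid = np.linspace(-5.0, 5.0, 401)
gauss = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
prior = gauss(grid, 0.0, 1.0)
post = bayes_filter_step(prior, grid,
                         transition_pdf=lambda x, xp: gauss(x, xp, 0.5),
                         likelihood=gauss(grid, 1.2, 0.8))
```

The posterior mean falls between the prior mean and the observation, weighted by their respective uncertainties, which is the behavior every approximate filter below tries to reproduce.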
In many applications, practical data assimilation is sufficient for reducing uncertainty in,
or increasing the accuracy of, model state estimates (e.g. Reichle et al. 2002; Vrugt et al.
2006). This is not always the case, however, and even though studies which demonstrate
little value from assimilating observations are rarely published, there are a few (e.g., de
Wit and van Diepen 2007); statistical acumen and reading between the lines (or luck) are
generally necessary to identify this type of result in published literature. Our contribution
to this body of work is direct, and clearly shows that assimilating observations of crop
color (in the form of leaf area index) and soil moisture is statistically ineffective at
improving estimates of crop yield, due to the nature of the relationship between the states
of crop development models and remote sensing observations (Nearing et al. 2012).
Several approximations of the HMM are inherent in common data assimilation
algorithms, and although methods have been proposed for estimating the sensitivity of
model states to remote sensing observations (e.g., Gelaro et al. 2007) and the amount of
information contained in remote sensing observations (Rodgers 2000; p33), previous to
our efforts (Nearing et al. 2013) no method had been developed to compare the actual
amount of information contained in observations with the amount extracted by
approximately Bayesian filters. Using an approach inspired by Gong et al. (2013) that is
fundamentally based on the data processing inequality (Cover and Thomas 1991; p34),
we use entropy as a surrogate for information and quantify the amount of entropy in the
data assimilation posterior distribution that is due to approximations of Bayes’
law (Nearing et al. 2013). The method is then extended to formally quantify information
loss (Nearing and Gupta in review).
Any state estimates obtained using data assimilation are conditional (usually implicitly)
on the HMM structure. The process of identifying an appropriate HMM structure is
called system identification, and can be viewed as a Bayesian learning problem, similar to
data assimilation (Liu and Gupta 2007). System identification is defined as a two-step
process of building dynamical models that are consistent with physical knowledge about
the system and with observational data. The first step is conceptual structure
identification, and involves a selection of system boundaries and boundary conditions,
system states, and important system processes. The second step is mathematical structure
identification and involves specifying appropriate mathematical representations of system
processes (Bulygina and Gupta 2010). The most common form of mathematical structure
identification is an expectation-maximization (Dempster et al. 1977) algorithm which
iteratively (E-step) infers the distribution of model states conditional on observations and
then (M-step) infers the maximum-likelihood values of parameters of the state transition
function (Ghahramani and Roweis 1999); this method was used by Vrugt et al. (2005)
and Bulygina and Gupta (2009, 2010, 2011) to identify the mathematical structure of
rainfall-runoff models. The parameters which are calibrated in the M-step can be
parameters of a nonparametric regression so that the method is a general method for the
identification of the mathematical structure of a dynamic system model (e.g., Bulygina
and Gupta 2009; Damianou et al. 2011; Ghahramani and Roweis 1999; Turner et al.
2009; Wang et al. 2008).
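The expectation-maximization loop described above can be written schematically; in the sketch below, `smoother` and `fit` stand in for the E-step (e.g., an ensemble Kalman smoother) and the M-step (e.g., a nonparametric regression), and the fully observed AR(1) demonstration is purely illustrative, not drawn from any of the cited studies:

```python
import numpy as np

def identify_structure(observations, smoother, fit, model, n_iter=5):
    # Schematic EM loop: alternate the E-step (infer smoothed states
    # conditional on observations) and the M-step (refit the parameters
    # of the state transition function to the smoothed states).
    for _ in range(n_iter):
        states = smoother(model, observations)   # E-step
        model = fit(states)                      # M-step
    return model

# Toy demonstration: AR(1) dynamics x_t = a * x_{t-1} + noise, fully
# observed, so the "smoother" simply returns the observations and the
# M-step is ordinary least squares for the coefficient a.
rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

a_hat = identify_structure(
    observations=x,
    smoother=lambda model, y: y,
    fit=lambda s: np.dot(s[:-1], s[1:]) / np.dot(s[:-1], s[:-1]),
    model=0.0,
)
```

In a realistic setting the states are hidden, so the E-step must itself run a smoother conditioned on the current model, which is why the iteration is needed at all.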
Gong et al. (2013) hypothesized that a certain amount of information about system
outputs was stored in data about system inputs, and interpreted the data processing
inequality (DPI) to mean that models which mapped system inputs to outputs could only
lose some of that information (could not add information about the outputs). In actuality,
the DPI says that no stochastic model can perform better than a model which exactly
represents the true joint distribution between inputs and outputs. We therefore propose
that even imperfect models contain information about any system property of interest
(including about a model of the system; e.g., Ye et al. 2008), and that this information is
about the mapping from stochastic measured data to the property of interest. In this
context, information is measured as the divergence (Kullback and Leibler 1951) caused
by conditioning the property of interest on measurements, and any model which
approximates Bayesian conditioning contains information.
The most common algorithm used in land surface data assimilation - the ensemble
Kalman filter (EnKF; Evensen 2003) - has two major assumptions: (1) that at any given
time, the state of the HMM is Gaussian-distributed, and (2) that the relationship between
the state and observation (called the observation function) of the HMM is linear. These
conditions result in a Gaussian posterior which can be computed analytically if the
moments of the prior are known. The second moment of the prior is computed from a
sample set called the ensemble, and the mean of the posterior is calculated by separately
assuming that each ensemble sample represents the prior mean. This results in an
independent and identically distributed sample of the posterior.
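Under these two assumptions the analysis (update) step reduces to a few lines of linear algebra. The sketch below is our own illustration, not code from the studies; a linear operator H stands in for the observation function, and each ensemble member is updated against a perturbed copy of the observation, which is what makes the analysis ensemble an i.i.d. sample of the Gaussian posterior:

```python
import numpy as np

def enkf_update(X, y, H, R, rng=np.random.default_rng(0)):
    """Stochastic EnKF analysis step.

    X: (n_state, n_ens) forecast ensemble; y: (n_obs,) observation;
    H: (n_obs, n_state) linear observation operator; R: obs-error covariance.
    """
    n_ens = X.shape[1]
    # Sample covariance of the forecast ensemble (second moment of the prior).
    A = X - X.mean(axis=1, keepdims=True)
    P = A @ A.T / (n_ens - 1)
    # Kalman gain.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Perturbed observations: one noisy copy of y per ensemble member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    # Update every member with the same gain.
    return X + K @ (Y - H @ X)
```

With a nonlinear observation function, H @ X is replaced by model-predicted observations, but the gain is still computed from ensemble covariances, which is the implicit linearization discussed next.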
A popular generalization of this method linearizes the observation function around the
first moments of the HMM state and observation distributions and implements an
iterative solution for the mean of the posterior (Zupanski 2005). Linearization of the
observation function is achieved by computing the covariance between samples of the
HMM states and observations, so it is impossible to estimate the gradient of the objective
function local to each ensemble member. This limitation precludes using nonparametric
(kernel or copula) prior distributions, and therefore precludes a fully nonlinear ensemble
filter. We propose to emulate the observation function via Gaussian process regression
(GPR: Rasmussen and Williams 2006) and solve for the gradient of the observation with
respect to the states locally at each HMM sample.
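A minimal sketch of this idea, under assumptions of our own choosing (an RBF kernel, a scalar observation, and illustrative lengthscale and noise values): because the GPR predictive mean is a weighted sum of kernel evaluations, its gradient with respect to the state is available in closed form at every ensemble member.

```python
import numpy as np

def rbf(A, B, ell):
    # Squared-exponential kernel between row-vector sets A (n,d) and B (m,d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gpr_fit(X, y, ell=1.0, noise=1e-6):
    # Weights alpha such that the predictive mean is m(x*) = k(x*, X) @ alpha.
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def gpr_mean_and_grad(x_star, X, alpha, ell=1.0):
    k = rbf(x_star[None, :], X, ell).ravel()          # k(x*, X_i)
    mean = k @ alpha
    # d k_i / d x* = -k_i * (x* - X_i) / ell^2, summed against alpha.
    grad = -((x_star[None, :] - X) * (k * alpha)[:, None] / ell ** 2).sum(0)
    return mean, grad
```

For example, training on a linear observation function y = 2x recovers both the mean and a gradient near 2 at interior points, which is exactly the locally evaluated Jacobian that the covariance-based linearization cannot provide per member.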
1.2. Philosophy and Organization of this Dissertation
This dissertation is a collection of five distinct reports – reproduced in Appendices A-E
with individual overviews given in Chapter 2. These reports outline my efforts to
understand the state of the science of data assimilation, and more generally, the art of
learning from data. This was very much a hands-on process, where the bulk of the
learning was facilitated by coding and testing existing and novel algorithms. Ultimately,
the reports included in this dissertation represent a fraction of that work, and were chosen
to highlight novel efforts toward understanding the interaction between dynamic system
models and data.
Because I did not have a single over-arching research objective, the projects reported here
are diverse and do not collectively address any single underlying deficiency in our
understanding of the universe. Some of these projects are engineering projects in that
they represent the development of solutions to practical problems rather than
investigations of natural phenomena, and others should be interpreted as philosophical
discussion supported by demonstrations. There is very little evidence-based science in
this dissertation.
1.3. Statement of the Author’s Contribution
The majority of the novel philosophical and methodological contributions described in
this dissertation, including the central ideas of four of the five studies, are due to the
Author. The major exception is that the application study (Appendix A) constitutes work
described by a NASA ROSES proposal which was drafted by a group of individuals, led
by Dr. Wade T. Crow and including the Author. Additionally, inspiration for the data
assimilation diagnostics (Appendix B) came from Dr. Wei Gong’s creative application of
the data processing inequality (Gong et al. 2013). All of the computer code used in each
study was written by the Author with the exception of the DSSAT crop model
(Hoogenboom et al. 2008) used in the application study (Appendix A); the DSSAT model
was extensively modified by the Author. All of the published and un-published papers
included in the appendices of this dissertation were written by the Author, with editing by
the various listed co-authors (esp. Dr. Hoshin Gupta).
CHAPTER 2: PRESENT STUDIES
The methods, results, and conclusions of the studies which constitute this dissertation are
presented in appendices. The following subsections summarize the key elements of each
study.
2.1. Assimilating Remote Sensing Observations of Leaf Area Index and Soil
Moisture for Wheat Yield Estimates: An Observing System Simulation
Experiment (Nearing et al. 2012)
The paper in Appendix A was published in Water Resources Research in May, 2012, and
is a case study on the application of the ensemble Kalman filter and a nonparametric
resampling filter to the problem of improving dynamic model estimates of seasonal crop
yield. Like all studies presented in this dissertation, this is a synthetic study, which means
that the observations which were assimilated were generated by the hidden Markov
model itself. In this case the HMM was based on the Decision Support System for
Agrotechnology Transfer (DSSAT) CropSim-Ceres model (Hoogenboom et al. 2008).
The objective of this experiment was to determine which types of uncertainty could be
mitigated by assimilating (1) observations of surface soil moisture, such as those expected
to be available from the Soil Moisture Active-Passive mission (Entekhabi et al. 2010),
and (2) observations of leaf area index, such as those available from the Moderate Resolution
Imaging Spectroradiometer (Knyazikhin et al. 1999). We found little statistical
improvement to estimates of seasonal wheat yield produced by two rain-fed crops
(energy-limited winter wheat and water-limited summer wheat) due to assimilating
observations of either type. An analysis of the correlations between HMM states and
observations revealed that limitations were due to (i) a lack of correlation between plant
development and soil moisture observations, even in the water-limited case,
(ii) error in LAI observations, and (iii) a lack of correlation between leaf growth and
grain growth.
Even though we were able to make qualitative statements – albeit based on quantitative
evidence – about the reasons for poor data assimilation performance, we were not able to
formally identify or quantify contributions to uncertainty in posterior state estimates.
More generally, the existing literature contained no comprehensive discussion regarding
factors which contribute to uncertainty in data assimilation posteriors.
2.2. An Approach to Quantifying the Efficiency of a Bayesian Filter (Nearing et al.
2013)
The paper in Appendix B was submitted to Water Resources Research in September,
2012. This paper identifies three factors that collectively result in a non-Dirac posterior
state distribution, and formally quantifies the contribution of each of these factors to
posterior uncertainty in state estimates. These factors are: (1) non-injectivity of the
mapping between states and observations, (2) noise in the observations, and (3)
inefficiency in the data assimilation algorithm.
Uncertainty in the data assimilation posterior is defined as Shannon (1948) entropy, and
the potential for an observation to reduce uncertainty is quantified by the mutual
information between the observation and the state. Both the entropy and mutual
information can be estimated from Monte Carlo samples of the HMM, and the contributions
of factors (1-3) to the entropy of the posterior are measured by estimating the information
content of observations with and without error. Entropy and mutual information are
estimated by maximum likelihood methods by discretizing the HMM state and
observation spaces.
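These plug-in (maximum likelihood) estimators amount to binning the Monte Carlo samples and applying the Shannon formulas to the empirical bin frequencies. A sketch, with an arbitrary bin count that is our choice rather than the paper's:

```python
import numpy as np

def entropy(samples, bins=20):
    # Plug-in Shannon entropy (in bits) of a discretized sample.
    p, _ = np.histogram(samples, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=20):
    # Plug-in mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y).
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hxy = -np.sum(pxy[pxy > 0] * np.log2(pxy[pxy > 0]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log2(py[py > 0]))
    return hx + hy - hxy
```

Note that the plug-in estimator is biased upward for independent variables, and the number of bins grows exponentially with dimension, which is the computational limitation discussed next.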
The primary limitation of this method is computational: it is impractical to estimate high-dimensional empirical probability density functions, and therefore it is impractical to
estimate the entropy of, and information in, high-dimensional observations. This limits
the applicability of the proposed method to estimating the contributions to posterior
entropy by assimilating only one observation at a time. In the paper, contributions to
posterior uncertainty from different sources are averaged over a period of time simulated
by a stationary HMM.
The proposed procedure was demonstrated on the problem of estimating profile soil
moisture from observations at the surface (top 5 cm). When synthetic observations of 5
cm soil moisture were assimilated into a three-layer model of soil hydrology, it was
found that part of the entropy of the posterior estimates of soil moisture states was due to
inefficiencies in the EnKF. This implies that the EnKF did not use all of the available
information in observations.
2.3. Information Loss in Estimation of Agricultural Yield: A Comparison of
Generative and Discriminative Approaches (Nearing and Gupta in review)
Although Nearing et al. (2013) provide a way to measure contributions to uncertainty in
data assimilation posterior distributions and give an example where not all of the
information was used by the data assimilation algorithm, the approach does not actually
measure information loss. Measuring uncertainty can be thought of as estimating the
precision of probabilistic state estimates, whereas the amount of information in a
probabilistic estimate is a measure related to both accuracy and precision.
Intuitively, observations contain information about the HMM state, and it is important to
know how much of this information is incorporated into the data assimilation state
estimate. Some of this information contained in observations might be used by the data
assimilation algorithm to inform the data assimilation posterior distribution, and some of
it might be lost. Additionally, there is potential for the data assimilation algorithm to
introduce artifacts into the posterior estimate which are not informed by the observation.
The paper in Appendix C formalizes these concepts by defining used information, lost
information, and bad information. Used information and lost information sum to account
for the total mutual information between observations and HMM state, while lost
information and bad information sum to the Kullback-Leibler divergence (Kullback and
Leibler 1951) from the true Bayesian posterior to the data assimilation approximation of
that posterior.
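These bookkeeping identities can be checked numerically. The sketch below is a toy two-state example with hypothetical numbers (not from the paper): it computes the total mutual information between observation and state and the expected Kullback-Leibler divergence from the true posterior to a deliberately degraded approximation, the two quantities that the used/lost and lost/bad pairs respectively sum to.

```python
import numpy as np

# Toy discrete system: p_xy[i, j] = p(state = i, observation = j).
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1)            # prior over the state
p_y = p_xy.sum(axis=0)            # marginal over the observation
post = p_xy / p_y                 # true Bayesian posteriors (columns)

# Total information the observation carries about the state:
# I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]
mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

# A deliberately degraded "data assimilation" posterior q(x|y):
# a blend of the true posterior and the prior (purely hypothetical).
q = 0.7 * post + 0.3 * p_x[:, None]

# Expected KL divergence from the true posterior to the approximation;
# per the text, this equals lost information plus bad information.
kl = np.sum(p_y * np.sum(post * np.log2(post / q), axis=0))

print(f"I(X;Y) = {mi:.3f} bits (used + lost information)")
print(f"E_y[D_KL] = {kl:.3f} bits (lost + bad information)")
```

A perfect assimilation algorithm would make the second quantity zero, so that all of the mutual information is used.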
These concepts were demonstrated by comparing data assimilation and regression as two
methods for estimating agricultural yield from remote sensing observations of leaf area
index and soil moisture. Synthetic experiments were used to measure information loss
and bad information in posterior estimates of end-of-season biomass made by the EnKF
and by Gaussian process regression (GPR). It was found that GPR was generally as
efficient as the EnKF at extracting information from observations. Since regression can
be implemented independently of an
HMM simulator while data assimilation cannot, Nearing and Gupta (in review) argue that
GPR is generally preferable to the EnKF for estimating agricultural yield from remote
sensing observations.
2.4. Measuring Information about Model Structure Introduced During System
Identification (Nearing and Gupta in preparation)
Fundamentally, the data assimilation prior is facilitated by an HMM; in this sense, the data
assimilation posterior is conditional on that HMM (Liu and Gupta 2007). System
identification is an approach to define the HMM structure which has been used many
times in hydrology (Bulygina and Gupta 2009; Liu and Gupta 2007; Vrugt et al. 2005);
technically, parameter estimation is a form of system identification (Roweis and
Ghahramani 2000). Bulygina and Gupta (2010) defined system identification as a two-step process of “building dynamical models that are simultaneously consistent with
physical knowledge about the system and with the information contained in observational
data.” The first step is conceptual structure identification, and involves a selection of
system boundaries and boundary conditions, system states, and important system
processes. The second step is mathematical structure identification and involves
specifying appropriate mathematical representations of system processes.
Presumably, both the conceptual identification and mathematical identification steps
introduce information about the system. We propose to measure the amount of
information introduced during each step. The first step in this process is to define what
we mean by information, including what this information is about. Generally, we say that
a model contains information about a particular phenomenon which arises from a
dynamic system by defining a probability distribution over a random variable associated
with that phenomenon. Technically, the act of assigning a model structure for this
purpose should be a Bayesian exercise, and we should therefore start with a prior
distribution over the outcomes of the phenomenon; this prior should then be conditioned
by some approximation of Bayes’ Law. The result is a distribution over the random
variable associated with the phenomenon we wish to understand which is conditional on
the model structure, and it is therefore possible to estimate the divergence from the
conditional to the prior distribution.
We demonstrate this conceptualization of the intuitive idea of information contained in a
model on the problem of estimating the structure of a rainfall-runoff simulator using
observations from the Leaf River watershed in Mississippi, USA. The conceptual
identification step is accomplished by assigning a joint distribution between measured
boundary conditions (precipitation and potential evapotranspiration) and streamflow
using a stochastic implementation of the HyMod simulator (Boyle 2000). The
mathematical identification step is accomplished using an expectation-maximization
(EM; Dempster et al. 1977) approach to system identification (Ghahramani and Roweis
1999).
The particular EM algorithm we employ uses data assimilation, in the form of the
ensemble Kalman smoother (Evensen and van Leeuwen 2000), to calculate expected
distributions over model states (the E-step) and then maximizes the likelihood of the
values of parameters in a nonparametric regression (the M-step). The result of iterating
this procedure is an identified state transition model. A similar approach was employed
by Bulygina and Gupta (2009, 2010, 2011) on the same problem we use for
demonstration.
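The E-step/M-step alternation can be illustrated on a much simpler system than the HyMod/EnKS/GPR setup used in the paper. The sketch below assumes a scalar linear-Gaussian HMM with known noise variances (all numbers hypothetical): the E-step is a Kalman filter followed by an RTS smoother, and the M-step refits the single transition coefficient by maximizing the expected complete-data likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a scalar linear-Gaussian HMM: x_t = a x_{t-1} + w, y_t = x_t + v.
a_true, q, r, T = 0.8, 0.1, 0.5, 2000
x = np.zeros(T)
for t in range(1, T):
    x[t] = a_true * x[t - 1] + rng.normal(0, np.sqrt(q))
y = x + rng.normal(0, np.sqrt(r), T)

a_hat = 0.2  # poor initial guess of the transition coefficient
for _ in range(100):
    # E-step: Kalman filter under the current a_hat ...
    m = np.zeros(T); P = np.zeros(T)
    m[0], P[0] = y[0], r
    for t in range(1, T):
        mp = a_hat * m[t - 1]
        Pp = a_hat**2 * P[t - 1] + q
        k = Pp / (Pp + r)
        m[t] = mp + k * (y[t] - mp)
        P[t] = (1 - k) * Pp
    # ... then an RTS smoother, tracking lag-one covariances.
    ms, Ps = m.copy(), P.copy()
    C = np.zeros(T)  # C[t] = Cov(x_t, x_{t-1} | all observations)
    for t in range(T - 2, -1, -1):
        Pp = a_hat**2 * P[t] + q
        G = a_hat * P[t] / Pp
        ms[t] = m[t] + G * (ms[t + 1] - a_hat * m[t])
        Ps[t] = P[t] + G**2 * (Ps[t + 1] - Pp)
        C[t + 1] = G * Ps[t + 1]
    # M-step: closed-form update of a (q and r held fixed here).
    a_hat = np.sum(ms[1:] * ms[:-1] + C[1:]) / np.sum(ms[:-1]**2 + Ps[:-1])

print(f"estimated a = {a_hat:.2f} (true a = {a_true})")
```

In the paper, the smoother is replaced by an ensemble Kalman smoother and the closed-form M-step by a nonparametric regression, but the iteration structure is the same.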
The paper in Appendix D demonstrates EM mathematical structure identification and
calculates the information introduced during the conceptual and mathematical
identification portions of system identification.
2.5. Kalman Filtering with a Gaussian Process Observation Function (Nearing in
preparation)
One widely recognized limitation of the EnKF is the assumption of a linear observation
function. Given a Gaussian prior over model states, this assumption allows for an
analytical estimate of the conjugate (Gaussian) posterior. The EnKF strategy for
estimating the posterior is to estimate a set of first moments, each conditional on taking
an individual independently and identically distributed sample of the prior as the prior
mean; this results in many independent and identically distributed samples of the
posterior. Since the mean of a Gaussian is also its maximum likelihood estimator, the
assumption can be alleviated simply by variational estimation of the mode of the
posterior; in particular, this is done by iterative gradient-based maximization of the log-likelihood function.
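This variational alternative can be sketched for a scalar state. The cubic observation function and all numbers below are hypothetical stand-ins, not any particular HMM; the point is that gradient ascent on the log-posterior finds the mode without assuming a linear observation function.

```python
import numpy as np

# Gaussian prior N(m, P) over a scalar state; a nonlinear observation
# y = h(x) + noise with variance R. The cubic h is purely illustrative.
m, P, R = 1.0, 0.5, 0.1
y = 1.5
h = lambda x: x**3
dh = lambda x: 3.0 * x**2   # analytic gradient of the observation function

def grad_log_post(x):
    # d/dx [ log N(x; m, P) + log N(y; h(x), R) ]
    return -(x - m) / P + dh(x) * (y - h(x)) / R

x = m                        # start the ascent from the prior mean
for _ in range(5000):
    x += 1e-3 * grad_log_post(x)

print(f"variational estimate of the posterior mode: {x:.3f}")
```

For a truly Gaussian posterior the mode coincides with the mean, which is why this replaces the EnKF's linear-update estimate without loss in the Gaussian case.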
The primary requirement for variationally sampling the posterior is that the likelihood
expression and its gradient must be known. Zupanski (2005) uses the covariance between
a sample of the HMM state and observation as the gradient, and when this covariance is
estimated as the sample covariance, it is effectively an estimate of the likelihood gradient
taken at the ensemble mean. In other words, it is only possible to estimate the gradient at
one point in the joint state/observation space. This precludes using a kernel density
estimate as the filter prior.
One way to get around the limitation imposed by using the state/observation covariance
to estimate the likelihood gradient is to calculate the gradient of the observation function
analytically; this must be done individually for each HMM. We instead propose to
emulate the observation function by nonparametric regression; this leads to standard
likelihood and gradient expressions which can be calculated at any point in the HMM
state and observation space. In particular, this means that the gradient can be local to each
ensemble member, which would allow variational estimates of the mode of the posterior
given a sample-based kernel estimate of the prior.
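A minimal sketch of such an emulator follows, assuming a squared-exponential (RBF) kernel and a sin function standing in for the true observation function. Because the kernel is differentiable, the emulator's mean and its gradient are available in closed form at any query point, which is exactly the property the variational update requires.

```python
import numpy as np

rng = np.random.default_rng(1)

# Train a GPR emulator (RBF kernel) of a stand-in observation function;
# sin is a placeholder for an arbitrary nonlinear h.
h = np.sin
X = np.linspace(-3.0, 3.0, 25)          # training inputs (model states)
y = h(X) + rng.normal(0.0, 0.01, X.size)

ell, sig2 = 1.0, 1e-4                   # length scale, noise variance
K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
alpha = np.linalg.solve(K + sig2 * np.eye(X.size), y)

def gp_mean_and_grad(xs):
    """Emulator mean and its analytic gradient at any query point xs."""
    k = np.exp(-0.5 * (xs - X)**2 / ell**2)
    mean = k @ alpha
    grad = ((-(xs - X) / ell**2) * k) @ alpha
    return mean, grad

mean, grad = gp_mean_and_grad(0.7)
print(mean, np.sin(0.7))   # emulated vs true observation
print(grad, np.cos(0.7))   # analytic emulator gradient vs true derivative
```

Because the gradient is available at arbitrary points, it can be evaluated locally at each ensemble member rather than only at the ensemble mean.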
The paper in Appendix E gives the likelihood and gradient expressions for a filter
posterior when the observation function is emulated by GPR. Three application examples
that employ Gaussian priors are used to demonstrate that the theory is sound: emulating
the observation function does not attenuate filter performance. This paper has not been
published because the observation function emulator has not been tested in conjunction
with a nonparametric prior.
CHAPTER 3: DISCUSSION AND FUTURE WORK
The work contained in this dissertation can be thought of as serving three somewhat
distinct purposes. First, we have introduced general methods which rigorously evaluate
the performance of common data assimilation algorithms (Nearing and Gupta in review;
Nearing et al. 2013), and which extend the applicability of these methods (Nearing and
Gupta in preparation). Second, most of our application examples are related, either
directly or indirectly, to the problem of estimating agricultural productivity from satellite
observations of the land surface, and we are able to make general statements about the
feasibility of this endeavor. Finally, and perhaps most importantly, we have attempted to
initiate a meaningful discussion about what information means from the perspective of a
hydrologic modeler. Regarding the latter, we wish to emphasize that information must be
about something that is well defined, and that both models and observations contain
information.
The novel methods proposed in Appendixes B-E are theoretically applicable to a diverse
range of problems; each of these papers presents a new method. The methods which are
presented in Appendixes B and C were inspired by reframing an old and under-studied
problem in a new context. We hope and expect that this change in perspective will offer a
foundation for further philosophical and practical developments. Certainly we expect to
apply these types of evaluation metrics to both new and old practical state estimation
problems.
The paper in Appendix D is our first attempt to understand what it means for a model to
contain information. We expect that this might prove to be the most important
contribution of this dissertation, and have begun formal studies into the possibility of
segregating information contained in models into good and bad information, similar to
what was described in Appendix C.
The application examples described in Appendixes A-C provide insight into the
feasibility of using observations of soil moisture and leaf area index to improve seasonal
estimates of agricultural productivity. The main lessons learned during these studies were
(1) that data assimilation is inefficient at improving state estimates in agricultural
simulation models, and (2) that simple regression is probably preferable to data
assimilation for estimating yield. While we do not suggest that data assimilation should
not be applied to the problem of estimating agricultural productivity, we do encourage
caution and careful assessment of the goals and impacts of such a strategy. We feel that
discriminative approaches are the more promising direction, and are very interested to see
interpolation applied to other forecasting problems which are typically addressed using
data assimilation.
REFERENCES
Bayes, M., & Price, M. (1763). An Essay towards Solving a Problem in the Doctrine of
Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to
John Canton, A. M. F. R. S. Philosophical Transactions, 53, 370-418
Boyle, D.P. (2000). Multicriteria calibration of hydrologic models. In, Department of
Hydrology and Water Resources. Tucson, AZ: University of Arizona
Bulygina, N., & Gupta, H. (2009). Estimating the uncertain mathematical structure of a
water balance model via Bayesian data assimilation. Water Resources Research, 45
Bulygina, N., & Gupta, H. (2010). How Bayesian data assimilation can be used to
estimate the mathematical structure of a model. Stochastic Environmental Research and
Risk Assessment, 24, 925
Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a
hydrological model via Bayesian data assimilation. Water Resources Research, 47
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory. In. New York,
NY, USA: Wiley-Interscience
Damianou, A.C., Titsias, M.K., & Lawrence, N.D. (2011). Variational Gaussian process
dynamical systems. In, NIPS (pp. 2510–2518). Granada, Spain
de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the
Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and
Forest Meteorology, 146, 38-56
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum Likelihood from
Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B
(Methodological), 39, 1-38
Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N.,
Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R.,
Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi,
J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture
Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical
implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear
dynamics. Monthly Weather Review, 128, 1852-1867
Gelaro, R., Zhu, Y., & Errico, R.M. (2007). Examination of various-order adjoint-based
approximations of observation impact. Meteorologische Zeitschrift, 16, 685-692,
doi:10.1127/0941-2948/2007/0248
Ghahramani, Z., & Roweis, S.T. (1999). Learning nonlinear dynamical systems using an
EM algorithm. Advances in neural information processing systems, 431-437
Gong, W., Gupta, H.V., Yang, D., Sricharan, K., & Hero, A.O. (2013). Estimating
epistemic & aleatory uncertainties during hydrologic modeling: An information theoretic
approach. Water Resources Research
Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing,
140, 107-113, doi:10.1049/ip-f-2.1993.0015
Hoogenboom, G., Jones, J.W., Wilkens, P.W., Porter, C.H., Hunt, L.A., Boote, K.L.,
Singh, U., Uryasev, O., Lizaso, J., Gijsman, A.J., White, J.W., Batchelor, W.D., & Tsuji,
G.Y. (2008). Decision Support System for Agrotechnology Transfer. In. Honolulu, HI:
University of Hawaii
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems.
Transactions of the ASME–Journal of Basic Engineering, 82, 35-45,
doi:10.1115/1.3662552
Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y.,
Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999).
MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation
Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis
Document
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of
Mathematical Statistics, 22, 79-86; doi:10.2307/2236703
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an
integrated data assimilation framework. Water Resources Research, 43, W07401,
doi:10.1029/2006WR005756
Nearing, G.S. (in preparation). Kalman filtering with a Gaussian process observation
function
Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V.
(2012). Assimilating remote sensing observations of leaf area index and soil moisture for
wheat yield estimates: An observing system simulation experiment. Water Resources
Research, 48
Nearing, G.S., & Gupta, H.V. (in preparation). Measuring information about model
structure introduced during system identification
Nearing, G.S., & Gupta, H.V. (in review). Information loss in generative and
discriminative approaches to estimating yield
Nearing, G.S., Gupta, H.V., Crow, W.T., & Gong, W. (2013). An approach to
quantifying the efficiency of a Bayesian filter. Water Resources Research
Prevot, L., Chauki, H., Troufleau, D., Weiss, M., Baret, F., & Brisson, N. (2003).
Assimilating optical and radar data into the STICS crop model for wheat. Agronomie, 23,
297-303
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning.
Cambridge, MA: MIT Press
Reichle, R.H. (2008). Data assimilation methods in the Earth sciences. Advances in
Water Resources, 31, 1411-1418
Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2002). Hydrologic data assimilation
with the ensemble Kalman filter. Monthly Weather Review, 130, 103-114,
doi:10.1175/1520-0493(2002)130<0103:HDAWTE>2.0.CO;2
Rodgers, C.D. (2000). Inverse Methods for Atmospheric Sounding: Theory and Practice.
World Scientific
Roweis, S., & Ghahramani, Z. (2000). An EM algorithm for identification of nonlinear
dynamical systems
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423
Turner, R., Deisenroth, M.P., & Rasmussen, C.E. (2009). System identification in Gaussian process dynamical
systems. In D. Görür (Ed.), Nonparametric Bayes Workshop at NIPS. Whistler, Canada
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005). Improved
treatment of uncertainty in hydrologic modeling: Combining the strengths of global
optimization and data assimilation. Water Resources Research, 41, 17
Vrugt, J.A., Gupta, H.V., & Nuallain, B.O. (2006). Real-time data assimilation for
operational ensemble streamflow forecasting. Journal of Hydrometeorology, 7, 548-565,
doi:10.1175/JHM504.1
Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models
for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30,
283-298
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica
D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
Ye, M., Meyer, P.D., & Neuman, S.P. (2008). On model selection criteria in multimodel
analysis. Water Resources Research, 44, W03428
Zupanski, M. (2005). Maximum likelihood ensemble filter: Theoretical aspects. Monthly
Weather Review, 133, 1710-1726
APPENDIX A:
ASSIMILATING REMOTE SENSING OBSERVATIONS OF LEAF AREA
INDEX AND SOIL MOISTURE FOR WHEAT YIELD ESTIMATES: AN
OBSERVING SYSTEM SIMULATION EXPERIMENT
1Grey S. Nearing, 2Wade T. Crow, 3Kelly R. Thorp, 4M. Susan Moran, 5Rolf R. Reichle,
and 1Hoshin V. Gupta
1 University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
2 USDA-ARS Hydrology and Remote Sensing Laboratory; Beltsville, MD
3 USDA-ARS Arid-Land Agricultural Research Center; Maricopa, AZ
4 USDA-ARS Southwest Watershed Research Center; Tucson, AZ
5 NASA Goddard Space Flight Center; Greenbelt, MD
Article published in Water Resources Research (2012), 48, W05525.
Abstract
Observing system simulation experiments were used to investigate ensemble Bayesian
state-updating data assimilation of observations of leaf area index (LAI) and soil moisture
(SW) for the purpose of improving single-season wheat yield estimates with the DSSAT
CropSim-Ceres model. Assimilation was conducted in an energy-limited environment
and a water-limited environment. Modelling uncertainty was prescribed to weather
inputs, soil parameters and initial conditions, cultivar parameters, and through
perturbations to model state transition equations. The ensemble Kalman filter (EnKF) and
the sequential importance resampling filter (SIRF) were tested for the ability to attenuate
effects of these types of uncertainty on yield estimates. LAI and SW observations were
synthesized according to characteristics of existing remote sensing data, and effects of
observation error were tested. Results indicate that the potential for assimilation to
improve end-of-season yield estimates is low. Limitations are due to lack of root-zone
soil moisture information, error in LAI observations, and lack of correlation between leaf
and grain growth.
1. Introduction
Dynamic crop models, such as the Decision Support System for Agrotechnology Transfer
crop simulation model (DSSAT; Hoogenboom et al. 2004), are used to aid decision
making under uncertainty (Jones et al. 2003). For instance, DSSAT is used by the
insurance industry to predict regional crop yields on a seasonal basis. Crop simulation
models have an advantage over empirical models of agricultural productivity in that they
can react dynamically to changes in local conditions in a physically and biologically
meaningful way. However, because of uncertainties in model representations of real-world systems and because of uncertainties inherent in input data regarding soils, cultivar
genetics and weather, any model-based estimate of agricultural yield will be subject to
error. One approach to mitigating this type of error is to constrain model simulations
using remote sensing observations through a process of data assimilation (Liu and Gupta
2007).
Remote sensing measurements related to agriculture generally contain information about
weather, vegetation or soil. Information about weather is used to force crop simulations
directly. Remotely sensed information about vegetation often comes in the form of a leaf
area index (LAI; e.g., Knyazikhin et al. 1999), which is a crop model component related
to canopy cover. Similarly, soil moisture is a model state variable that acts as the primary
control on plant water stress, and observations of volumetric moisture content (SW
[m3/m3]) in the top few centimetres of soil are available from remote sensing sources:
AMSR-E (Njoku et al. 2003), SMOS (Kerr et al. 2010), and SMAP (Entekhabi et al.
2010). Together, LAI and SW observations provide complementary information for
agricultural monitoring.
There are many types of data assimilation which are common in agronomy (Moulin et al.
1998; Prevot et al. 2003). This work investigates the potential for ensemble Bayesian
state-updating filters (McLaughlin 2002) to mitigate modelling uncertainty on end-of-season wheat yield estimates. Conceptually, ensemble Bayesian filters operate on the
principle that a probability density function (pdf) representing uncertainty in model states
can be approximated by a discrete set of model simulations, and that a pdf of model
predictions can be estimated using Monte Carlo integration to marginalize uncertainty in
model states. From a Bayesian perspective, the physical model provides context (a prior
and likelihood) for interpreting information contained in remotely sensed data.
Currently, a robust understanding of the response of physically-based model estimates of
agricultural yield to state-updating assimilation remains lacking. The first step in this
process is to perform a controlled synthetic data study, also called an Observing System
Simulation Experiment (OSSE; Arnold and Dey 1986), which will allow for an analysis
of interactions between uncertainty, observations and the model. Although both Pauwels
et al. (2007) and Pellenq and Boulet (2004) present OSSEs which investigate the
assimilation of LAI and/or SW into crop simulation models, these studies assess the
effects of assimilation on model states; neither investigates the impact of data
assimilation on yield estimates. de Wit and van Diepen (2007) present a case study on the
effects of assimilating SW observations on yield estimates; however, this does not
provide sufficient statistical and methodological control to differentiate limitations
imposed by the model, the assimilation algorithm, and uncertainty in model inputs and
observations.
We present a set of OSSEs which assess LAI and SW assimilation for improving DSSAT
CropSim-Ceres wheat yield estimates in a controlled synthetic environment. This allows
for an understanding of model response to state updating and a delineation of the effects
of modelling uncertainty, filter error, and observation error. This investigation provides a
benchmark for interpreting the results of case studies (like de Wit and van Diepen (2007))
and a foundation on which to direct the development of agricultural models and remote
sensing algorithms aimed at predicting yield.
2. Methods
Several experiments are presented. First, modelling uncertainty was partitioned into
isolated sources: weather inputs, soil parameters and initial conditions, cultivar
parameters, and model state equations. Synthesized remote sensing observations were
assimilated using the ensemble Kalman filter (EnKF; Evensen 2003) and the sequential
importance resampling filter (SIRF; Gordon et al. 1993), and mean yield predictions from
these filters were assessed. In addition, observations with variable error characteristics
were assimilated to test the effects of observation error on EnKF and SIRF results.
Sections 2.1-2.4 describe the model, data assimilation filters, and sets of numerical
experiments.
2.1. The DSSAT-CropSim Ceres Wheat Model
DSSAT is a collection of independent crop-growth modules supported by a land process
wrapper. Integration takes place on a daily time step and the forcing data required are
daily maximum and minimum temperature, daily integrated solar radiation and daily
cumulative precipitation.
DSSAT soil layering is user defined; we used nine layers with one surface layer
representing the 0-5 cm of soil typically assumed to be visible to L-band wavelength
satellites and a set of lower layers reaching a total depth of 1.8 m. DSSAT soil moisture
is calculated using a Ritchie-type soil water balance (Ritchie 1998) which employs a curve
number approach to partitioning runoff and updates the water content of each soil layer
based on a set of linear drainage equations. The soil surface parameters are: a runoff
curve number, an upper limit on evaporation, a drainage rate parameter, and albedo. Soil
layers are parametrized by saturated water content (porosity), drained upper limit (field
capacity), lower limit, saturated hydraulic conductivity, and a root growth factor. Similar
to Mo et al. (2005), we used the soil water balance routines but did not simulate the soil
nitrogen balance or any management decisions. This was done because it is impossible to
presume that information about these aspects of agricultural development would be
available at remote sensing scales.
The CropSim-Ceres module (CC) simulates wheat crops. The CC models yield as a
function of a Grain Weight state. Grain growth is developmentally dependent on daily
development units, which are a function of mean daily temperature and daily cumulative
solar radiation, a temperature control factor, vegetation biomass (defined as the sum of
the mass storage model states Stem Weight, Leaf Weight, and Reserves Weight), and
model parameters. The most important crop model parameters are related to the cultivar:
vernalizing duration, which specifies the number of days of optimum temperature
necessary for vernalization; photoperiod response, which specifies the percent reduction
in photosynthesis for every ten-hour reduction in photoperiod; grain filling phase
duration in growing-degree-days [°C days]; number of kernels per unit plant weight
[#/g]; the standard kernel size [mg]; the standard tiller weight [g]; and the photoperiod
interval between leaf tip appearances.
In contrast to Grain Weight, LAI is a function of the model state Plant Leaf Area, which
is developmentally dependent on a temperature control factor and has a potential value
set by the number of plant leaves, which is in turn determined at each time step by the
cumulative sum of daily development units. Again in contrast to grain growth, potential
daily leaf growth is attenuated by an additive factor proportional to water stress, so that a
stress factor of 0 indicates potential growth and a stress factor of 1 indicates no growth.
Potential grain growth is not modified in this way; the other components of biomass are
affected indirectly by stress through leaf assimilation of plant carbon reserves. Water
stress is the ratio of total root water uptake to potential transpiration, which is a fraction
of potential evapotranspiration calculated according to the Priestley-Taylor method
(Priestley and Taylor 1972). Root water uptake from each layer is a function of the
difference between the soil moisture state and the lower limit parameter. Thus, when
sufficient soil moisture is available to supply transpiration demand, water stress is zero.
Given the way the model develops the vegetation and grain components of the wheat
plant, we know that LAI and SW will inform yield by informing vegetation biomass.
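The stress factor logic described above can be sketched as a small function; the clipped supply-ratio form below is our assumption for illustration and not the DSSAT source code.

```python
def water_stress(root_uptake, potential_transpiration):
    """Stress factor as described in the text: 0 when root water uptake
    can supply the full transpiration demand, rising toward 1 as uptake
    falls to zero. The clipped form is an assumption for illustration."""
    if potential_transpiration <= 0.0:
        return 0.0
    supply_ratio = min(root_uptake / potential_transpiration, 1.0)
    return 1.0 - supply_ratio

print(water_stress(3.0, 2.5))  # ample supply -> 0.0
print(water_stress(1.0, 2.5))  # deficit -> 0.6
```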
The model state vector contains all of the internal dynamic model variables necessary to
transition the simulation from one time step to the next – that is, all of the Markov
information. More specifically, at a given time t, the state (x_t) is a function of the state
at the previous time (x_{t-1}), forcing data at the current time (u_t), and time-invariant
model parameters (θ) according to the state transition relationship f:

x_t = f(x_{t-1}, u_t, θ). [1.1]

The combined land process wrapper and CC Markov state vector has 97 components
(Tables 1 and 2). The model output vector at time t (z_t; here we use the term output to
refer to model predictions which correspond directly with observations) is calculated
according to the relationship h as a function of the current state, current inputs and
parameters:

z_t = h(x_t, u_t, θ). [1.2]

For our purposes, the output vector contained the quantities SW (soil moisture in each of
9 soil layers) and LAI; the soil moisture components are also state variables, so h simply
preserves these values through identity relationships. LAI is not a state variable because
its value is calculated independently at each time step as a function of the state Plant
Leaf Area.
2.2. The Ensemble Kalman Filter
The EnKF is commonly used for state-updating in moderately nonlinear geophysical
models (Reichle 2008). It estimates the model state pdf by drawing N_e samples from a
joint uncertainty pdf over model parameters, forcing data, and state perturbations, and
then propagates this sample through time using the model equations. This set of model
simulations is the ensemble. At every observation time, the EnKF updates the state pdf
based on the assumptions that all model states are linearly related to model output and
that uncertainty in model states, model output, and observations can be quantified by
second-order pdf approximations. Because the method has been widely discussed, we
only present a brief overview and follow a variation on the formulation of Houtekamer
and Mitchell (2001).

The ensemble of model state predictions at time t is stored in X_t [n_x × N_e], where
n_x is the dimension of x_t. Similarly, the ensemble of model outputs is Z_t
[n_z × N_e], where n_z is the dimension of the observation vector y_t. The observation
error covariance, R, is required a priori, and an observation sample Y_t [n_z × N_e] is
generated according to:

Y_t^(i) = y_t + ε^(i), ε^(i) ~ N(0, R). [2]

The ensemble of EnKF updated model states, X̂_t, is calculated as a least-squares
estimate based on model predictions and observations, resulting in:

X̂_t = X_t + C_xz (C_zz + R)^{-1} (Y_t − Z_t), [3]

where:

C_xz = (N_e − 1)^{-1} (X_t − X̄_t)(Z_t − Z̄_t)^T [3.1]

is the cross-covariance between ensemble deviations from the mean model state and
deviations from the mean output, and:

C_zz = (N_e − 1)^{-1} (Z_t − Z̄_t)(Z_t − Z̄_t)^T [3.2]

is the covariance matrix of ensemble deviations from the mean model output. Both C_xz
and C_zz are sampled directly from the ensemble.

The finite nature of ensemble representations of uncertainty can lead to spurious updates
when x_t contains components that are not approximately or locally linearly related to
components of z_t. An analysis of DSSAT state and observation correlations resulted in
a list of important Markov state components which have local approximately linear
relationships with one or more outputs (Table 3, except stage timing states). This list
includes all CC plant mass storage components, plant leaf area, and root volume (root
accumulation is stored as a volumetric fraction rather than as a mass) as well as canopy
height. Our EnKF employed a threshold filter which discarded any relationship between
model states and modelled observation components with a Pearson product-moment
correlation coefficient below a specified threshold. This reduced the possibility of
picking up spurious correlations.
It is important to note that the CC is a set of step functions which calculate crop attributes
in fundamentally different ways depending on the current stage of development. Because
of these and other nonlinearities, the EnKF will not guarantee mutually consistent model
states after each update since it uses a single correlation relationship to update every
ensemble member regardless of the current growth stage of each particular simulation.
2.3. The Sequential Importance Resampling Filter
The SIRF provides an approximate Bayesian estimate of model state uncertainty at each time step, conditional on past observations, without assumptions of linearity or second-order statistics. At each observation time, each ensemble member's state vector is assigned an importance weight, w_t^i, which is proportional to the posterior likelihood of that state vector conditional on all past observations:

w_t^i ∝ p(x_t^i | y_1, ..., y_t)   [4]

where superscripts index the ensemble member. The observations are assumed to be independent conditional on the model output, and model output is a deterministic function of model state according to equation [1.2], so that the likelihood function relates observations to model state vectors according to:

p(y_t | x_t^i) = p(y_t | h(x_t^i))   [4.1]

where p(y_t | h(x_t^i)) is emulated by the observation uncertainty pdf, in this case Gaussian with mean h(x_t^i) and covariance R. The state prior, p(x_t | y_1, ..., y_{t-1}), is estimated iid discretely by the ensemble in the same way as in the EnKF; this is achieved for time step t+1 by resampling the ensemble at time step t with replacement and with probabilities proportional to w_t^i, resulting in an iid discrete representation of the posterior. The resampled ensemble, propagated to time step t+1 by equation [1.1], thus contains an iid discrete representation of the prior at time step t+1, p(x_{t+1} | y_1, ..., y_t). Proportional probability weights are simply calculated as:

w_t^i = p(y_t | x_t^i) / Σ_{j=1}^{N} p(y_t | x_t^j)   [4.2]
In our case, the simulation ensemble was updated by replacing all 97 components of each
member's state vector (Tables 1 and 2) with a state vector from a different simulation
(sampling with replacement). Each model retained its own parameters and forcing data.
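The weighting and resampling steps of equations [4]–[4.2] can be sketched as follows (illustrative Python/NumPy; the Gaussian likelihood follows equation [4.1], and all array dimensions are hypothetical):

```python
import numpy as np

def sirf_step(X, Y, y_obs, R, rng=None):
    """One SIRF analysis step: weight each member by the Gaussian observation
    likelihood, then resample entire state vectors with replacement.

    X : (n_states, N) state ensemble
    Y : (n_obs, N) modeled observations, one column per member
    y_obs : (n_obs,) synthetic observation
    R : (n_obs, n_obs) observation error covariance
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_obs, N = Y.shape
    innov = y_obs[:, None] - Y                  # observation-minus-model residuals
    q = np.linalg.solve(R, innov)               # R^{-1} (y - h(x^i))
    loglik = -0.5 * np.sum(innov * q, axis=0)   # unnormalized Gaussian log-likelihood
    w = np.exp(loglik - loglik.max())           # stabilize before exponentiating
    w /= w.sum()                                # proportional weights, eq. [4.2]
    idx = rng.choice(N, size=N, replace=True, p=w)  # resample whole state vectors
    return X[:, idx], w
```

Because whole state vectors are swapped, each resampled member remains internally consistent, which is the property the text contrasts with the component-wise EnKF update.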
2.4. Observing System Simulation Experiments
OSSEs, as diagrammed in Figure 1, were used to assess LAI and soil moisture assimilation potential. A group of simulations (ensemble size is discussed in section 2.4.3) was sampled from a modelling uncertainty pdf. One of these simulations was chosen randomly as the truth system (upper path in Figure 1), leaving an N-member prediction ensemble which was used to estimate yield with and without data assimilation. Synthetic observations were generated by sampling from an observation uncertainty distribution around the truth system output and were assimilated by the EnKF and SIRF (middle paths in Figure 1). The ensemble of model simulations without data assimilation was called the open loop (bottom path in Figure 1). The open loop, EnKF, and SIRF all used the same
truth system and ensemble members (parameters, initial conditions and weather forcing
data) and the EnKF and SIRF used the same synthetic observations.
This type of OSSE was used to (i) choose an appropriate ensemble size, (ii) test the effects of EnKF and SIRF assimilation on segregated and combined modelling uncertainty sources, and (iii) test the effects of observation uncertainty on EnKF and SIRF assimilation. Experiments (ii) and (iii) used the ensemble size chosen in (i). In all cases
except for when determining ensemble size, each OSSE was repeated fifty times by
drawing separate truth systems and ensembles; this Monte Carlo repetition provided a
statistically independent experiment sample. The following subsections describe the
modelling uncertainty pdf, the procedure for generating synthetic observations from truth
system output, and these sets of experiments.
2.4.1. Modelling Uncertainty Distributions
Assimilation was tested on two rain-fed wheat crops with different levels of water stress. Mean parameter and weather inputs came from field experiment data packaged with the DSSAT version 4.5 release: a 1975 study of a summer wheat crop conducted in Swift Current, Saskatchewan, Canada, reported by Campbell et al. (1977a, 1977b), and a 1974-1975 study of winter wheat conducted in Rothamsted, UK. Figure 2 plots LAI, Grain Weight, and water stress for both crops simulated using the mean parameters outlined in Table 4 (columns 2 and 3) and unperturbed weather forcing data.
The Swift Current summer wheat crop represents a water-limited system and yielded 104
[kg/ha] using mean parameters listed in Table 4, column 2; the potential, non-stressed
yield was 4266 [kg/ha]. The mean parameter and forcing system received 153.6 [mm] of
rainfall over a total of 95 days from planting on 25 May 1975 to maturity on 28 Aug 1975
and had a total evapotranspiration of 151.9 [mm]. The water stress factor reached its maximum during the Ear Growth stage, which occurred between day 51 and day 61 after planting. Although this crop produced very little yield using the mean parameter and input values, simulations sampled from the parameter and input uncertainty distribution often showed a substantial increase in yield.
The Rothamsted winter wheat crop represents an energy-limited system and reached
potential yield of 6651 [kg/ha] using mean parameters listed in Table 4, column 3. The
system received 512.9 [mm] of rainfall over a total of 269 days from planting on 6 Nov
1974 to maturity on 6 Aug 1975 and had a total evapotranspiration of 381.2 [mm]. The
water stress factor reached its maximum during the Grain Filling stage, which occurred between day 240 and harvest; the water stress factor was close to 0 during all other development stages. Although this crop produced potential yield using the mean parameter and input values, simulations sampled from the parameter and input uncertainty distribution often showed a substantial decrease in yield.
Weather forcing data uncertainty was emulated by perturbing daily measured weather data with values sampled from the temporally and cross-correlated joint pdf outlined in Table 5. Perturbations on solar radiation and precipitation were multiplicative and lognormally distributed with mean 1 and standard deviations 0.3 and 0.5, respectively; perturbations on temperature were additive Gaussian with mean 0 and unit variance; the same daily perturbation was applied to daily maximum and minimum temperature.
Weather perturbations were cross-correlated and AR(1) (first-order auto-regressive) temporally auto-correlated with correlation coefficients 1/e, following Reichle et al. (2007, 2010); since integration was on a daily time step, these autoregression coefficients apply to a daily time series.
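A minimal sketch of this perturbation scheme, assuming the Table 5 cross-correlations and the 1/e daily AR(1) coefficient; the moment-matched mean-1 lognormal construction is a standard device and an assumption here, not a detail taken from this study:

```python
import numpy as np

# Table 5 cross-correlations among (temperature, radiation, precipitation) perturbations
C = np.array([[ 1.00, -0.80, -0.32],
              [-0.80,  1.00,  0.40],
              [-0.32,  0.40,  1.00]])
RHO = 1.0 / np.e                      # daily AR(1) coefficient
L = np.linalg.cholesky(C)             # imposes the cross-correlations

def perturbation_series(n_days, rng=None):
    """Cross-correlated, AR(1) standard-normal series; stationary unit
    variance is preserved by scaling innovations by sqrt(1 - RHO^2)."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = np.empty((n_days, 3))
    z[0] = L @ rng.standard_normal(3)
    for t in range(1, n_days):
        z[t] = RHO * z[t - 1] + np.sqrt(1.0 - RHO**2) * (L @ rng.standard_normal(3))
    return z

def lognormal_factor(z, sd):
    """Map standard-normal draws to mean-1 lognormal multipliers with the
    given standard deviation (moment matching)."""
    s2 = np.log(1.0 + sd**2)
    return np.exp(np.sqrt(s2) * z - 0.5 * s2)
```

The first series column would be applied additively to daily maximum and minimum temperature, while the second and third would be mapped through `lognormal_factor` (sd 0.3 and 0.5) to multiply radiation and precipitation.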
Parameter uncertainty distributions were Gaussian (approximately, due to bounds) with
means, variances, and bounds listed in Table 4. Cultivar parameters are model-specific;
parameter files included with DSSAT release version 4.5 provided the limits and
variances listed in Table 4. Variances and bounds for surface soil parameters were also
estimated using a library of soil parametrizations included with DSSAT version 4.5. We
used lumped parameters for the bottom 8 soil layers due to a presumed lack of knowledge
about subsurface soil properties. The root growth factor was assumed to decrease
exponentially with depth and was parametrized by a maximum value at the surface.
Porosity, saturated conductivity, and residual saturation were calculated from clay and silt percentages using pedotransfer functions from Cosby et al. (1984). Bulk density was calculated as a function of porosity assuming a mineral density of 2.65 [g/cm³], and drained upper limit was taken as the average of saturated and residual moisture contents.
soils maps that provide clay and silt contents are not usually associated with useful error
estimates because much of the error in soil mapping is due to sparse measurements of
heterogeneous areas. Here we sampled sand and clay percentage parameters
independently, each with a standard deviation of 10%; sand and clay percentages were
constrained to be positive and the sum was constrained to be less than one by, when
necessary, reducing both parameters by an equal amount. Given the mean sand and clay
47
parameters from Table 4, this resulted in 95% confidence bounds which spanned
approximately 18% of the soil textural triangle at and Swift Current and 25% of the soil
textural triangle at Rothamsted.
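The constrained sampling described above can be sketched as follows (illustrative Python; the standard deviation is expressed as a fraction and the equal-reduction rule follows the text, while the final clipping step is an added safeguard, not part of the study's procedure):

```python
import numpy as np

def sample_texture(mean_sand, mean_clay, sd=0.10, rng=None):
    """Sample sand and clay fractions independently, clip negatives to zero,
    and enforce sand + clay <= 1 by reducing both by an equal amount; a final
    clip handles the rare case where the equal reduction overshoots zero."""
    rng = np.random.default_rng(0) if rng is None else rng
    sand = max(rng.normal(mean_sand, sd), 0.0)
    clay = max(rng.normal(mean_clay, sd), 0.0)
    excess = sand + clay - 1.0
    if excess > 0.0:
        sand = max(sand - excess / 2.0, 0.0)
        clay = max(clay - excess / 2.0, 0.0)
        if sand + clay > 1.0:          # clipping left a remainder
            sand = min(sand, 1.0 - clay)
    return sand, clay
```

The equal reduction keeps the sampled point as close as possible to the unconstrained draw while guaranteeing a non-negative silt fraction.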
Model structural uncertainty was simulated by adding noise to the model state transition equations:

x_t = f(x_{t-1}) + ε_t   [5]

This type of model error was added to those states listed in Table 3 except for Grain Weight and, in some cases where specified, to the development unit stage timing states Cumulative Development Units and Cumulative Germination Units. It was found that perturbing the Grain Weight state caused irreconcilable yield error by weakening statistical relationships between observations and yield. Random state perturbations were drawn from zero-mean Gaussian distributions with heteroscedastic variances:

ε_t ~ N(0_n, λ² diag(x_t)² I_n)   [5.1]

where I_n is the n-dimensional identity matrix and λ is a perturbation fraction. Scaling the perturbation standard deviation by the state magnitude ensured that no state would become finite purely because of the perturbation, and threshold filters were used to ensure that all state values remained non-negative. Perturbations were sampled independently across states and time steps.
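A sketch of the heteroscedastic perturbation in equation [5.1] (illustrative Python/NumPy; the proportionality constant frac = 0.05 is a hypothetical placeholder):

```python
import numpy as np

def perturb_states(x, frac=0.05, rng=None):
    """Add zero-mean Gaussian noise whose standard deviation is proportional
    to each state's magnitude (frac is a hypothetical proportionality
    constant). States at zero remain zero, and a threshold filter keeps
    all states non-negative."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(x.shape) * frac * np.abs(x)
    return np.maximum(x + noise, 0.0)
```

Because the noise scale is proportional to the state value, a state that is exactly zero receives zero noise, which is the property the text describes.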
2.4.2. Generating Synthetic Observations
The truth system output vector, (LAI_t^true, θ_t^true), was used to generate synthetic observations, y_t. At every observation time, a remote sensing measurement process was simulated by drawing from the Gaussian distribution:

y_t ~ N([LAI_t^true, θ_t^true], blockdiag(R_LAI, R_θ))   [6]

where R_LAI and R_θ represent the observation error covariance matrices related to LAI and θ observations, respectively. Synthetic observations have no spatial scale.
Frequency and error properties of synthetic observations were guided by uncertainty in existing remote sensing data. Observations of θ are available from SMOS at most major agricultural areas every 3 days with a spatial resolution of 50 km and an approximate retrieval accuracy of 0.04 [m³/m³] (Kerr et al. 2010). An improvement in spatial resolution (to 9 km) is expected with the launch of SMAP in 2014 (Entekhabi et al. 2010). Measurement accuracy will degrade as vegetation water content, W [kg/m²], increases throughout the growing season; in the case of SMAP, observations at or better than the 0.04 [m³/m³] accuracy level are expected to be confined to areas with vegetation water content less than 5 [kg/m²] (Entekhabi et al. 2010). Jackson and Schmugge (1991) developed a relationship between vegetation water content and vegetation transmissivity, as proposed by Kirdiashev et al. (1979), which Bolten et al. (2010) adapted to model soil moisture observation uncertainty as:
σ²_v,t = σ²_s · exp(2 b W_t / cos φ)   [6.1]

where σ²_v,t [m³/m³]² is the variance of the soil moisture retrieval at time t made over vegetation, σ²_s [m³/m³]² is the variance of estimates made over bare soil, b [-] is an environment parameter which accounts for vegetation type and roughness, and φ is the satellite incidence angle. The SMAP incidence angle is 40°, and we adopted a value of b appropriate for agricultural crops (Crow et al. 2005), along with a bare soil observation error standard deviation σ_s, resulting in a retrieval accuracy model which approaches 0.04 [m³/m³] as vegetation water content approaches 5 [kg/m²]. Vegetation water content was assumed to be a constant fraction, c, of plant biomass, M_t [g/plant], and plant population, P [#/m²]:

W_t = c · M_t · P   [6.2]

The fraction c was set according to results reported by Malhotra (1933). Plant population is a CC component that is not included in the EnKF update.
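The observation-error model of equations [6.1] and [6.2] can be sketched as follows (illustrative Python/NumPy; sigma_bare = 0.02 [m³/m³], b = 0.11, and c = 0.75 are hypothetical placeholder values, not the study's calibrated constants):

```python
import numpy as np

def sm_obs_std(vwc, sigma_bare=0.02, b=0.11, inc_deg=40.0):
    """Soil moisture retrieval error standard deviation as a function of
    vegetation water content [kg/m2], in the transmissivity-based spirit of
    equation [6.1]; sigma_bare, b, and the incidence angle are placeholders."""
    gamma = np.exp(-b * vwc / np.cos(np.radians(inc_deg)))  # vegetation transmissivity
    return sigma_bare / gamma          # error grows as transmissivity falls

def vwc_from_biomass(biomass_g_per_plant, plants_per_m2, c=0.75):
    """Equation [6.2] sketch: vegetation water content as a constant fraction
    c (hypothetical value) of biomass times plant population; g/m2 -> kg/m2."""
    return c * biomass_g_per_plant * plants_per_m2 / 1000.0
```

Chaining the two functions gives a synthetic θ observation error that grows with simulated canopy development, which is the behaviour the text ascribes to the retrieval model.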
Synthetic observations of θ were generated every three days, with each soil layer perturbed independently with the same statistical error characteristics. Remote sensing platforms are only able to measure soil water content in the upper few centimetres of soil, and in later sections we compare assimilations which used only surface-layer θ observations with assimilations which used observations of the full θ profile.
The MODIS MOD15A2 product group provides LAI [m²/m²] estimates as an eight-day composite. The accuracy and uncertainty of this product have been investigated over agricultural areas by comparing the composite image to reference LAI from a single day at the end of the composite period (Tan et al. 2005). We generated synthetic LAI observations with the reported uncertainty standard deviation every eight days to simulate the remote sensing measurement process.
2.4.3. Ensemble Size Experiments
Ensemble size represents a balance between pdf representation and computation expense.
Effects of varying ensemble size were evaluated using ensembles of
= 10, 25, 50,
75, 100, 250, 500, 750 and 1,000. An appropriate ensemble size was found when pdf
predictions of Grain Weight, LAI and
became stable with increasing sample size. The
quality of ensemble representations of outputs LAI &
and state Grain Weight was
quantified using the root mean squared error (RMSE) taken over all simulation time steps
of the difference between mean ensemble predicted output values and the true value.
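The stability metric can be sketched as (illustrative Python/NumPy; array shapes are hypothetical):

```python
import numpy as np

def ensemble_mean_rmse(ens, truth):
    """RMSE over all time steps of (ensemble-mean prediction - truth).

    ens : (N, T) array of one output (e.g. LAI) for N members, T time steps
    truth : (T,) truth-system trajectory
    """
    mean_pred = ens.mean(axis=0)       # ensemble-mean prediction per time step
    return float(np.sqrt(np.mean((mean_pred - truth) ** 2)))
```

Plotting this score against N, as in Figure 3, shows where adding members stops changing the ensemble-mean prediction.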
To make direct comparisons between OSSEs with different ensemble sizes, it was necessary to use the same truth system and observation set. Four truth systems were chosen and, for each truth system, a corresponding set of observations was generated and a 1,000-member ensemble was sampled from the full uncertainty pdf outlined in Tables 4 and 5 and equation [5]. For each truth system, each ensemble of increasing size completely contained all smaller ensembles; for example, for a given truth system, the 25-member ensemble contained the 10-member ensemble plus 15 additional simulations. The RMSE averaged over these four experiments is reported, and the particular choice of ensemble size is discussed in section 3.1.
2.4.4. Modelling Uncertainty Experiments
Once an appropriate ensemble size was chosen, experiments were conducted to test the
ability of data assimilation to mitigate particular types of modelling uncertainty in yield
estimates. Modleing uncertainty pdf were taken as marginal distributions of the entire
joint uncertainty distribution (Tables 4 and 5 and equation [5]). We tested the full joint
uncertainty pdf described in section 2.4.1 plus five marginal uncertainty pdf related to: (i)
weather forcing data (Table 5), (ii) soil parameters and initial conditions (Table 4), (iii)
cultivar parameters (Table 4), (iv) model state perturbations to states listed in Table 3
except for Grain Weight, Cumulative Development Units and Cumulative Germination
Units, and (v) same as (iv) but with perturbations to Cumulative Development Units and
Cumulative Germination Units. Because each uncertainty type was independent from all
others, marginalizing a particular uncertainty component was done by setting all of the
variances of its uncertainty components to zero.
Assimilation OSSEs were run for simulations of water-limited (Swift Current) and energy-limited (Rothamsted) crops using four types of observation sets: LAI and surface θ, LAI only, surface θ only, and profile θ; the first three represent what is available from satellites, and the fourth provides a way to assess the limitations of only having surface-level soil moisture information. Each OSSE was repeated fifty times with different truth systems, ensembles, and observations. The results were evaluated using a mean error score (ME [kg/ha]), which is the absolute difference between the ensemble mean predicted end-of-season yield and the true yield. This was calculated for the open loop, EnKF and SIRF ensembles for each of the fifty OSSEs, and a single-tailed, two-sample t-test was used to test for a significant reduction in mean ME score due to SIRF or EnKF assimilation.
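The significance test can be sketched as follows (illustrative Python using scipy.stats; the significance level alpha = 0.05 is an assumed convention, not necessarily the study's):

```python
import numpy as np
from scipy import stats

def assimilation_improves(me_filter, me_open_loop, alpha=0.05):
    """Single-tailed, two-sample t-test for a significant reduction in mean
    ME score due to assimilation (alpha = 0.05 is an assumed level).
    Returns True when the filter's mean ME is significantly lower."""
    _, p = stats.ttest_ind(me_filter, me_open_loop, alternative='less')
    return bool(p < alpha)
```

Here each argument would be the vector of fifty ME scores from the Monte Carlo OSSE repetitions, one per experiment.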
Yield estimates can only be expected to improve when there are strong (although possibly indirect) relationships between model outputs and Grain Weight. Since Grain Weight is related to LAI and water stress through biomass, we report the absolute time-averaged Pearson product-moment correlation coefficients between model outputs and end-of-season yield. For θ, the sum of profile water was used. Statistics were calculated using all open loop ensemble members from the 50 combined-uncertainty OSSEs.
2.4.5. Observation Uncertainty Experiments
Eight sets of OSSEs which utilized the full modelling uncertainty pdf were used to test
effects of observation uncertainty on assimilation results. Synthetic observations of LAI
and
were generated and assimilated every three and eight days as described in section
2.4.2, however
error variances were
0.030, 0.040, and 0.050
= 0.001, 0.005, 0.010, 0.015, 0.020,
and LAI error variances were
0.01, 0.02,
0.05, 0.10, 0.15, 0.20, 0.30, and 0.40 for the eight sets of OSSEs. For both water-limited
and energy-limited crops each observation type and uncertainty level was tested using
53
fifty Monte Carlo OSSEs and the reduction in ME scores was used to evaluate EnKF and
SIRF performance.
3. Results
3.1. Ensemble Size Experiments
Figure 3 illustrates output RMSE and Grain Weight RMSE dynamics due to varying ensemble size, averaged over four sets of Swift Current OSSEs. Generally, when LAI was assimilated, LAI RMSE improved but θ RMSE did not, and when θ was assimilated, θ RMSE improved but LAI RMSE did not; this is similar to the findings of Pauwels et al. (2007), who used the EnKF and observed little improvement to modeled LAI when θ observations were assimilated. In both cases where outputs improved (LAI RMSE for LAI assimilation and θ RMSE for θ assimilation), output RMSE values were relatively stable with increasing ensemble size beyond a moderate ensemble size for both the EnKF and SIRF. In the two cases where outputs did not improve (θ RMSE for LAI assimilation and LAI RMSE for θ assimilation), output RMSE values stabilized at comparable ensemble sizes. de Wit and van Diepen (2007) used the EnKF and found stability in the mean-squared error of modeled LAI with EnKF assimilation of θ at a similar ensemble size, which is in agreement with our findings. The Grain Weight state RMSE generally improved when LAI observations were assimilated but not when θ observations were assimilated; in both cases, the Grain Weight RMSE became relatively stable at a similar ensemble size. The remainder of the OSSEs discussed in this paper used this ensemble size.
3.2. Modelling Uncertainty Experiments
Table 6 reports Monte Carlo average ME scores and time-averaged correlations between outputs and end-of-season yield for water-limited and energy-limited OSSEs. The results for each uncertainty type are described below. It is important to understand that the time-averaged correlations between model outputs and biomass inform assimilation results, but that the relationship is indirect and situation dependent; thus the correlation coefficients in Table 6 cannot be compared directly.
Weather Inputs: Precipitation uncertainty affects grain development indirectly via the water stress control on the Leaf Weight component of biomass, while radiation and temperature affect grain development directly as well as indirectly through biomass. In simulations of the water-limited crop, LAI and θ were correlated with yield at approximately equal levels. ME scores were not significantly improved by assimilating LAI or θ using either filter. The inefficiency in assimilating LAI observations can be attributed to differences in the way radiation and temperature affect leaf and grain development, and to the fact that uncertainty in radiation and temperature affected grain growth after leaves had senesced (Figure 2).
The SIRF was able to reduce the average ME score by assimilating θ, and was more successful than the EnKF in this case, likely due to the EnKF's linear model assumption.
Swapping out state vectors for ones from ensemble members which represent crops grown in conditions similar to the truth system (similar historical water demand and availability) improved yield predictions; however, updating plant states based on linear correlations with soil states was inefficient, and improvements to estimates of profile water content did not translate into improved simulations of future plant development. The useful information stored in the soil moisture state was information about growth histories; this suggests that an improved understanding of the effects of weather on the crop growth environment might not be as important as an improved understanding of the effects of weather on crops themselves.
In energy-limited simulations, LAI was correlated with yield, and EnKF assimilation of LAI observations improved yield predictions. Water stress was a decorrelating factor between LAI and biomass because of differences in how Leaf Weight, Stem Weight, and Reserves Weight responded to stress. Again, in the energy-limited environment, SIRF assimilation of θ improved yield estimates while EnKF assimilation did not.
Soil Parameters and Initial Conditions: In simulations of the water-limited crop, the ability of the soil to infiltrate and store water was important for productivity. Correlations between LAI and yield were high; however, assimilation of LAI did not result in improved yield estimates. In this case, when the filters increased biomass in response to high LAI observations, large plants were essentially placed into soils that could not support them. When the filters decreased biomass in response to low LAI observations, the plants grew back quickly due to sufficient water availability. This is an example of the limitations of data assimilation filters when model parameters are uncertain. Since LAI observations were not available after senescence (Figure 2), the updated plant simulations were able to respond to the new environment before grain growth was completed.
In contrast, the correlation between θ and yield was moderate; however, the SIRF was able to improve ME scores by assimilating θ. In this case, since observations of soil moisture were available through the end of the growing season, the SIRF was able to replace the ensemble of plant state vectors late in the grain development phase with ensemble members which had developed in conditions similar to the truth system. Surface-level soil moisture observations contained insufficient information to improve Grain Weight simulations because surface soil layer parameters were independent of root-zone parameters, and the root zone largely controls water availability. In the low-stress simulations, soil parameters did not affect grain growth.
Cultivar Parameters: Error in yield estimates due to uncertainty in cultivar parameters was not mitigated by state updating for either crop using any combination of filter and observations. Correlations between LAI and biomass were high, but differences in the kernel number and standard kernel size parameters decoupled biomass from Grain Weight. LAI and θ observations do not inform these parameter values.
State Perturbations Without Stage Timing States: When states other than the stage timing variables Cumulative Development Units and Cumulative Germination Units were perturbed, water-limited LAI was correlated with yield; however, in the energy-limited environment, it was not. In the water-limited environment, perturbations to the root-zone soil water state partially controlled growth variations through the stress factor, and this resulted in correlated LAI and yield. In the energy-limited case, LAI and yield were not well correlated because the state perturbations were independent. Thus in the water-limited environment, LAI assimilation resulted in an improved ME score, and in the energy-limited environment, it did not.

Surface soil moisture was not well correlated with yield in either set of simulations; however, profile soil moisture was correlated with yield in the water-limited environment. Again, this is because perturbations to soil water states were independent between layers and root-zone water availability determined water stress. Assimilation of profile soil moisture improved yield estimates in this case.
State Perturbations Including Stage Timing States: When perturbations to the stage timing states Cumulative Development Units and Cumulative Germination Units were included, the water-limited LAI-yield correlation was reduced. Differences in growth stages between ensemble simulations accumulated throughout the growing season and caused a gradual decrease in cross-correlation between vegetation states (not shown). In energy-limited simulations, the correlation increased slightly, and the SIRF was able to improve ME scores by assimilating LAI. In the water-limited case, stress controlled biomass and LAI through leaf development, and dissimilar development stage transitions resulted in decorrelation. In the energy-limited case, random perturbations controlled leaf development, Stem Weight, and Reserves Weight independently, and similar development stage transitions resulted in slightly more correlated vegetation states.
Combined Uncertainty Sources: In a full modelling uncertainty paradigm, the EnKF and SIRF were unable to significantly improve ME scores in any case. This is a combined result of a lack of state correlations caused by differences in cultivar parameters and differences in weather controls on biomass, LAI, and Grain Weight.
3.3. Observation Uncertainty Experiments
Results from varying the error variances of synthetic LAI and θ observations (Table 7) suggest that even nearly perfect observations of surface-level soil moisture will not improve single-season yield estimates under reasonable modelling uncertainty assumptions. LAI assimilation was valuable in the water-limited simulations when the LAI observation error standard deviation was between 0.05 and 0.30 [m²/m²]. The SIRF was almost always better than the EnKF at assimilating LAI in the water-limited simulations, which is likely due to the highly nonlinear nature of the CC module.
4. Discussion
The purpose of this study was to identify the potential for state-updating data assimilation to mitigate error in yield estimates due to modelling uncertainty. Results show that this approach was generally unable to improve yield estimates under realistic uncertainty scenarios. Many factors contributed to this: (1) weather inputs affect grain and leaf growth differently, meaning that similar LAI outputs do not imply similar Grain Weight states; (2) certain cultivar parameters affect grain development directly, in a way which is independent of all other model states; (3) state updating often results in plant state vectors which disagree with model (soil) parameters; and (4) surface-level soil moisture observations did not contain sufficient information about available water and water stress. Results suggest that in water-limited environments, LAI assimilation would be more useful if observation error were lower than what is currently available. This is a problem because real LAI observations will suffer from uncertainties which were not considered in this study, namely those due to spatial heterogeneity in agricultural systems and discrepancies in spatial resolution between fields and image pixels.
These findings are qualitatively similar to those from the case study reported by de Wit and van Diepen (2007), who used the EnKF to assimilate θ observations over cropland in Europe. They found improvement to winter wheat yield estimates in 33 out of 88 test regions. Although their real-world experiment was likely hampered by mismatches in spatial resolution between agricultural fields and remote sensing observations, as well as other spatial factors including crop heterogeneity, we have shown that these factors alone were likely not the reason for poor assimilation results.
This study suggests that in order to combine remote sensing observations with
agricultural models for the purpose of estimating yield at single-season time scales, it will
be necessary to modify our interpretation of crop development. Primarily, it would be
interesting to investigate methods and ancillary data necessary for correlating leaf
development with grain development directly. It is likely that the utility of soil moisture
observations will be limited to monitoring extreme events over large time scales as was
implied by Bolten et al. (2010), or for estimating irrigation schedules and agricultural
water use as was done by Wang and Cai (2007).
Acknowledgement
This research was jointly supported by a grant from the NASA Terrestrial Ecology program entitled Ecological and agricultural productivity forecasting using root-zone soil moisture products derived from the NASA SMAP mission (PI: W.T. Crow), and by NASA SMAP Science Definition Team grant 08-SMAPSDT08-0042 (PI: M.S. Moran). The authors would like to thank Cheryl Porter from the Department of Agricultural and Biological Engineering at the University of Florida for her help acquiring and managing DSSAT source code.
Tables
Table 1: The Decision Support System for Agrotechnology Transfer (DSSAT) Markov State Vector (33 Components)

Component   Description                          Units
ATOT        Sum of last 5 days soil temp.        °C
CANHT       Canopy height                        m
DRN         Drained soil water                   cm
EO          Potential evap.                      mm/day
EOP         Potential plant transp.              mm/day
EORATIO     Increase in evap. per unit LAI       mm/day
EOS         Potential soil evap.                 mm/day
EP          Plant transp.                        mm/day
ES          Soil evap. rate                      mm/day
FRACRTS     Fraction of soil contact w/ roots    #
KSEVAP      Light extinction coeff. (evap.)      #
KTRANS      Light extinction coeff. (transp.)    #
PORMIN      Min. pore space for O2 to plants     m³/m³
RLV         Root volume by soil layer            cm³/cm³
RWUMX       Max. uptake per unit root length     m³/m
SNOW        Snow accumulation                    mm
SRFTEMP     Surface soil temperature             °C
SSOMC       Soil carbon                          kg/ha
ST          Soil temperature                     °C
STGDOY      Stage transition dates               day
SUMES1      Cumulative stage 1 soil evap.        mm
SUMES2      Cumulative stage 2 soil evap.        mm
SW          Soil water                           m³/m³
SWDELTS     Drainage rate                        m³/m³
SWDELTU     Change in SW due to evap.            m³/m³
SWDELTX     Change in SW due to plant uptake     m³/m³
TMA         Last 5 days of soil temp.            °C
TRWU        Total root water uptake              mm
TRWUP       Total potential root water uptake    m³/m³
TSOILEV     Duration of stage 2 evap.            days
TSS         Number of days with saturated soil   days
UPFLOW      Upward flow due to evap.             cm
WINF        Infiltration                         mm
Table 2: The CropSim-Ceres Wheat Module Markov State Vector (64 Components)

Component   Description                               Units
ADATEND     Anthesis end date                         date
AFLFSUM     Carbohydrate leaf factor                  #
CARBOC      Cumulative carbohydrate assimilated       g/plant
CHRSWT      Chaff reserves                            g/plant
CHWT        Chaff weight                              g/plant
CUMDU       Cumulative development units              #
CUMGEU      Cumulative germination units              #
CUMVD       Cumulative vernalization days             days
DAE         Days after emergence                      #
DEADWT      Dead leaf weight retained on plant        g/plant
G2          Coeff. grain growth (modified)            mg/(°C day)
GEDSUM      Germination + emergence duration          days
GESTAGE     Germination at emergence stage            #
GETSUM      Germination + emergence temp. sum         °C
GPLA        Green leaf area                           cm²/plant
GPLASENS    Green leaf area during senescence         cm²/plant
GRNUM       Grains per plant                          #/plant
GRWT        Grain weight                              g/plant
ISTAGE      Current developmental stage               #
ISTAGEP     Previous developmental stage              #
LAGSTAGE    Lag phase for grain filling stage         #
LAP         Leaf area by leaf number                  cm²/plant
LAPOT       Leaf area potentials by leaf number       cm²/leaf
LAPS        Leaf area senesced by leaf number         cm²/plant
LFWT        Leaf weight                               g/plant
LLRSWAD     Leaf lamina reserves weight               kg/ha
LLRSWT      Leaf lamina reserves                      g/plant
LLWAD       Leaf lamina weight                        kg/ha
LNUMSD      Leaf number per Haun stage                #
LNUMSG      Growing leaf number                       #
LSHAI       Leaf sheath area index                    #
LSHRSWAD    Leaf sheath reserves weight               kg/ha
LSHRSWT     Leaf sheath reserves                      g/plant
LSHWAD      Leaf sheath weight                        kg/ha
PARI        PAR interception fraction                 #
PLA         Plant leaf area                           cm²
PLTPOP      Plant population                          #/m²
RSTAGE      Reproductive development stage            #
RSWT        Reserves weight                           g/plant
RTDEP       Root depth                                cm
RTWT        Root weight                               g/plant
RTWTL       Root weight by layer                      g/plant
SEEDRS      Seed reserves                             g/plant
SEEDRSAV    Seed reserves available                   g/plant
SENCL       Senesced carbon by layer                  g/plant
SENCS       Senesced carbon added to soil             g/plant
SENLA       Cumulative senesced leaf area             cm²/plant
SRADSUM     Cumulative radiation                      MJ/m²
SSTAGE      Secondary stage of development            #
STRSWT      Stem reserves                             g/plant
STWT        Stem weight                               g/plant
TNUM        Tiller number                             #/plant
TSDAT       Terminal spikelet date                    #
TSS         Duration of saturation                    days
TTD         Thermal time over last 20 days            °C
TTNUM       Thermal time means in sum                 #
VF          Vernalization factor                      #
WFG         Water stress factor for growth            #
WFGC        Cumulative growth water factor            #
WFLFNUM     Water stress factor for each leaf         #
WFLFSUM     Cumulative water stress factor per leaf   #
XSTAGE      Stage of development                      #
ZSTAGE      Zadok stage of development                #
ZSTAGEP     Previous Zadok stage                      #
Table 3: The State Vector Updated by the EnKF and the State Vector Perturbed by Equation [5.1] to Simulate Model Structural Error^a

State Component                 Units         Dimension
CSM States
  Soil Water                    [m³/m³]       9
  Canopy Height                 [m]           1
CC States
  Root Volume Fraction          [cm³/cm³]     9
  Chaff Weight                  [g/plant]     1
  Stem Weight                   [g/plant]     1
  Leaf Weight                   [g/plant]     1
  Reserves Weight               [g/plant]     1
  Grain Weight                  [g/plant]     1
  Plant Leaf Area               [cm²]         1
  Seed Reserves                 [g/plant]     1
Stage Timing States
  Cumulative Development Units  [°C days]     1
  Cumulative Germination Units  [°C days]     1

^a Cumulative Development Units and Cumulative Germination Units are not updated by the ensemble Kalman filter (EnKF), and Grain Weight is not perturbed by equation [5.1].
Table 4: The (Approximately) Gaussian Probability Density Function of Uncertainty in Model Parameters and Initial Conditions

                               Mean Values
Uncertainty Source             Swift Current  Rothamsted  Std. Dev.  Low. Bnd    Upp. Bnd    Units
Cultivar Parameters
  Vernalizing Duration               0            60         3         0            60       [days]
  Photoperiod Response              60            67        10         0           200       [%]
  Grain Filling Duration           335           515        33.5     100          1000       [°C days]
  Kernel Number                     25            14         2.5      10            50       [#/g]
  Standard Kernel Size              26            44         2.6      10            80       [mg]
  Standard Tiller Weight             1.5           4.0       0.3       0.5           8       [g]
  Interval Between Leafs            86           100        10        30           150       [°C days]
Soil Parameters
  Albedo                             0.10          0.14      0.05      0             1       [ ]
  Upper Limit Evaporation           9.4           6.0       2.0       1            12       [cm/day]
  Drainage Rate Parameter           0.20          0.50      0.3       0.01          0.99    [1/day]
  Runoff Curve Number              84.0          60.0      10         1            99       [ ]
  Root Growth Factor
    Layer 1 (0-5 cm)                 1.00          1.00      0.05      0             1       [ ]
    Layers 2-9 (5-180 cm)            0.74          0.90      0.1       0             1       [ ]
  Clay %
    Layer 1 (0-5 cm)                10.7          23.4      10         0           100       [%]
    Layers 2-9 (5-180 cm)            9.2          23.4      10         0           100       [%]
  Silt %
    Layer 1 (0-5 cm)                29.9          30.0      10         0           100       [%]
    Layers 2-9 (5-180 cm)           29.7          30.0      10         0           100       [%]
  Initial Soil Moisture
    Layer 1 (0-5 cm)                 0.23          0.20      0.04    Low. Lim.   Saturation  [m³/m³]
    Layers 2-9 (5-180 cm)            0.33          0.33      0.08    Low. Lim.   Saturation  [m³/m³]
Table 5: Forcing Data Perturbation Sampling Parameters

                     Mult. or  Std.  AR(1)     Correlations                Data
Weather Inputs       Add.      Dev.  Coeff.^a  w/Temp  w/Rad   w/Precip    Units^b
Temp (Max and Min)   A         1     1/e       1       -0.80   -0.32       [°C]
Solar Radiation      M         0.3   1/e       -0.80   1       0.40        [MJ/m² day]
Precipitation        M         0.5   1/e       -0.32   0.40    1           [mm]

^a First-order autoregressive coefficients assume a daily time series.
^b Data units are the dimensions of the forcing data itself, not the units of the
perturbations, except in the case of temperature, which uses additive perturbations;
radiation and precipitation perturbations are multiplicative and unitless.
Table 6: Monte Carlo Average Mean Error Scores and Time-Averaged Correlation
Coefficients for Modeling Uncertainty OSSEs^a

Crop System      Uncertainty             Open    EnKF (ME)                    SIRF (ME)                    Correlations
                                         Loop
Water-Limited    Weather                 243.7   280.2  255.2  280.0  299.3   219.1  219.1  213.7  183.1   0.548  0.490  0.572
(Swift Current)  Soil Parameters &
                   Initial Conditions    322.6   287.3  311.9  297.9  616.4   427.0  377.3  620.7  239.3   0.918  0.088  0.233
                 Cultivar Parameters     375.5   366.2  364.3  373.9  386.0   414.9  375.9  396.7  394.6   0.964  0.286  0.461
                 State Perturbations
                   w/o Stage Timing      604.8   470.0  484.0  555.3  296.4   477.2  479.5  544.4  471.6   0.845  0.115  0.780
                 State Perturbations
                   w/ Stage Timing       587.4   543.9  556.5  589.4  500.9   547.1  533.1  637.6  653.1   0.657  0.120  0.159
                 Combined Sources        710.9   660.8  642.9  738.3  756.7   721.4  652.5  776.4  722.0   0.568  0.164  0.131
Energy-Limited   Weather                 330.7   275.2  274.4  330.7  330.7   307.9  315.3  322.2  244.5   0.808  0.421  0.656
(Rothamsted)     Soil Parameters &
                   Initial Conditions    0.7     0.7    0.7    0.7    0.7     0.8    0.9    0.7    0.7     0.879  0.034  0.047
                 Cultivar Parameters     1352.5  1330.8 1320.7 1352.5 1352.5  1400.3 1319.2 1405.0 1359.9  0.770  0.607  0.054
                 State Perturbations
                   w/o Stage Timing      356.8   382.3  356.7  360.5  364.4   406.3  367.2  366.0  545.2   0.341  0.007  0.005
                 State Perturbations
                   w/ Stage Timing       1034.9  1084.9 1081.6 999.5  1024.6  1005.2 722.9  1040.8 1383.5  0.452  0.038  0.004
                 Combined Sources        1342.3  1227.8 1180.2 1330.8 1303.6  1172.2 1217.5 1405.0 1870.0  0.645  0.101  0.234

^a Bold indicates a significant reduction in mean ME score by assimilation.
Table 7: Monte Carlo Average Mean Error Scores for Combined Modelling Uncertainty
Assimilations with Increasing Observation Error Variance (Observation Error)^a

                              LAI
Crop System       Open Loop   Obs. Unc.  EnKF    SIRF     Obs. Unc.  EnKF    SIRF
Water-Limited     718.7       0.01       607.6   656.9    0.001      758.9   1054.6
(Swift Current)   731.9       0.02       666.8   682.6    0.005      786.9   949.1
                  701.5       0.05       609.8   543.8    0.010      750.5   760.6
                  665.0       0.10       597.6   586.0    0.015      671.2   664.9
                  705.8       0.15       565.5   494.5    0.020      681.7   771.1
                  681.1       0.20       551.7   524.2    0.030      707.7   745.1
                  752.4       0.30       631.2   609.0    0.040      732.9   760.4
                  666.7       0.40       559.5   572.2    0.050      670.1   658.9
Energy-Limited    1314.1      0.01       1394.0  1528.6   0.001      1300.3  2111.5
(Rothamsted)      1296.0      0.02       1321.9  1232.0   0.005      1304.0  1922.3
                  1360.6      0.05       1377.2  1341.2   0.010      1351.9  1865.5
                  1572.4      0.10       1626.2  1435.2   0.015      1594.3  1776.3
                  1744.9      0.15       1785.2  1278.0   0.020      1727.6  2074.9
                  1383.5      0.20       1397.3  1342.1   0.030      1413.2  1513.7
                  1454.7      0.30       1345.2  1352.5   0.040      1438.6  1353.1
                  1472.9      0.40       1460.0  1330.3   0.050      1419.9  1731.0

^a Bold indicates a significant reduction in mean ME score by assimilation.
Figures
Figure 1: OSSE process diagram: Transparent gray boxes represent the SIRF and EnKF
assimilation algorithms; the remaining symbols denote the forcing data, model
parameters, model states, filter-updated model states, modeled and observed LAI and
yield, and the uncertainty variances listed in Tables 4 and 5 and equation [5].
Figure 2: Baseline simulations of water-limited (Swift Current) summer wheat and
energy-limited (Rothamsted) winter wheat with parameters listed in Table 4, columns 2
and 3 respectively: In the bottom plot, water stress is magnified by a factor of eight.
Figure 3: Output and Grain Weight time-averaged open loop, EnKF and SIRF RMSE
values as a function of increasing ensemble size
References
Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past,
present, and future. Bulletin of the American Meteorological Society, 67, 687-695,
doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2
Bolten, J.D., Crow, W.T., Zhan, X.W., Jackson, T.J., & Reynolds, C.A. (2010).
Evaluating the utility of remotely sensed soil moisture retrievals for operational
agricultural drought monitoring. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 3, 57-66, doi:10.1109/jstars.2009.2037163
Campbell, C.A., Cameron, D.R., Nicholaichuk, W., & Davidson, H.R. (1977a). Effects of
fertilizer-N and soil-moisture on growth, N-content, and moisture use by spring wheat.
Canadian Journal of Soil Science, 57, 289-310
Campbell, C.A., Davidson, H.R., & Warder, F.G. (1977b). Effects of fertilizer-N and
soil-moisture on yield, yield components, protein-content and N accumulation in
aboveground parts of spring wheat. Canadian Journal of Soil Science, 57, 311-327
Cosby, B.J., Hornberger, G.M., Clapp, R.B., & Ginn, T.R. (1984). A statistical
exploration of the relationships of soil-moisture characteristics to the physical properties
of soils. Water Resources Research, 20, 682-690
Crow, W.T., Koster, R.D., Reichle, R.H., & Sharif, H.O. (2005). Relevance of
time-varying and time-invariant retrieval error sources on the utility of spaceborne soil
moisture products. Geophysical Research Letters, 32
de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the
Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and
Forest Meteorology, 146, 38-56
Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N.,
Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R.,
Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi,
J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture
Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical
implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear/non-
Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing,
140, 107-113, doi:10.1049/ip-f-2.1993.0015
Hoogenboom, G., Jones, J.W., Wilkens, P.W., Batchelor, W.D., Hunt, L.A., Boote, K.L.,
Singh, U., Uraysev, O., Bowen, W.T., Gijsman, A.J., Toit, A.D., White, J.W., & Tsuji,
G.Y. (2004). Decision Support System for Agrotechnology Transfer version 4.0. In.
Honolulu, HI: University of Hawaii
Houtekamer, P.L., & Mitchell, H.L. (2001). A sequential ensemble Kalman filter for
atmospheric data assimilation. Monthly Weather Review, 129, 123-137
Jackson, T.J., & Schmugge, T.J. (1991). Vegetation effects on the microwave emission of
soils. Remote Sensing of Environment, 36, 203-212
Jones, J.W., Hoogenboom, G., Porter, C.H., Boote, K.J., Batchelor, W.D., Hunt, L.A.,
Wilkens, P.W., Singh, U., Gijsman, A.J., & Ritchie, J.T. (2003). The DSSAT cropping
system model. European Journal of Agronomy, 18, 235-265
Kerr, Y.H., Waldteufel, P., Wigneron, J.P., Delwart, S., Cabot, F., Boutin, J.,
Escorihuela, M.J., Font, J., Reul, N., Gruhier, C., Juglea, S.E., Drinkwater, M.R., Hahne,
A., Martin-Neira, M., & Mecklenburg, S. (2010). The SMOS Mission: New Tool for
Monitoring Key Elements of the Global Water Cycle. Proceedings of the IEEE, 98, 666-687
Kirdiashev, K.P., Chukhlantsev, A.A., & Shutko, A.M. (1979). Microwave radiation of
the earth's surface in the presence of vegetation cover. Radiotekhnika i Elektronika, 24,
256-264
Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y.,
Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999).
MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation
Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis
Document
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an
integrated data assimilation framework. Water Resources Research, 43, W07401,
doi:10.1029/2006WR005756
Malhotra, R.C. (1933). A contribution to the biochemistry of the wheat plant. Journal of
Biochemistry, 18, 199-205
McLaughlin, D. (2002). An integrated approach to hydrologic data assimilation:
interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286
Mo, X.G., Liu, S.X., Lin, Z.H., Xu, Y.Q., Xiang, Y., & McVicar, T.R. (2005). Prediction
of crop yield, water consumption and water use efficiency with a SVAT-crop growth
model using remotely sensed data on the North China Plain. Ecological Modelling, 183,
301-322
Moulin, S., Bondeau, A., & Delecolle, R. (1998). Combining agricultural crop models
and satellite observations: from field to regional scales. International Journal of Remote
Sensing, 19, 1021-1036
Njoku, E.G., Jackson, T.J., Lakshmi, V., Chan, T.K., & Nghiem, S.V. (2003). Soil
moisture retrieval from AMSR-E. IEEE Transactions on Geoscience and Remote Sensing,
41, 215-229
Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., &
Defourny, P. (2007). Optimization of a coupled hydrology-crop growth model through
the assimilation of observed soil moisture and leaf area index values using an ensemble
Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942
Pellenq, J., & Boulet, G. (2004). A methodology to test the pertinence of remote-sensing
data assimilation into vegetation models for water and energy exchange at the land
surface. Agronomie, 24, 197-204
Prevot, L., Chauki, H., Troufleau, D., Weiss, M., Baret, F., & Brisson, N. (2003).
Assimilating optical and radar data into the STICS crop model for wheat. Agronomie, 23,
297-303
Priestley, C.H.B., & Taylor, R.J. (1972). Assessment of surface heat-flux and evaporation
using large-scale parameters. Monthly Weather Review, 100, 81-92
Reichle, R.H. (2008). Data assimilation methods in the Earth sciences. Advances in
Water Resources, 31, 1411-1418
Reichle, R.H., Koster, R.D., Liu, P., Mahanama, S.P.P., Njoku, E.G., & Owe, M. (2007).
Comparison and assimilation of global soil moisture retrievals from the Advanced
Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) and the
Scanning Multichannel Microwave Radiometer (SMMR). Journal of Geophysical
Research-Atmospheres, 112, D09108
Reichle, R.H., Kumar, S.V., Mahanama, S.P.P., Koster, R.D., & Liu, Q. (2010).
Assimilation of satellite-derived skin temperature observations into land surface models.
Journal of Hydrometeorology, 11, 1103-1122
Ritchie, J.T. (1998). Soil water balance and plant water stress. In G.Y. Tsuji, G.
Hoogenboom, & P.K. Thornton (Eds.), Understanding Options for Agricultural
Production (pp. 41-54). Dordrecht, Netherlands: Kluwer Academic Publishers
Tan, B., Hu, J.N., Zhang, P., Huang, D., Shabanov, N., Weiss, M., Knyazikhin, Y., &
Myneni, R.B. (2005). Validation of Moderate Resolution Imaging Spectroradiometer leaf
area index product in croplands of Alpilles, France. Journal of Geophysical Research-
Atmospheres, 110, D01107
Wang, D.B., & Cai, X.M. (2007). Optimal estimation of irrigation schedule - An example
of quantifying human interferences to hydrologic processes. Advances in Water
Resources, 30, 1844-1857
APPENDIX B:
AN APPROACH TO QUANTIFYING THE EFFICIENCY OF A BAYESIAN
FILTER

¹Grey S. Nearing, ¹Hoshin V. Gupta, ²Wade T. Crow, ³Wei Gong
¹University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
²USDA-ARS Hydrology and Remote Sensing Laboratory; Beltsville, MD
³Beijing Normal University College of Global Change and Earth System Science;
Beijing, China

Article accepted to Water Resources Research on March 2, 2013.
Abstract
Data assimilation is the Bayesian conditioning of uncertain model simulations on
observations to reduce uncertainty about model states. In practice, it is common to make
simplifying assumptions about the prior and posterior state distributions, and to employ
approximations of the likelihood function, which can reduce the efficiency of the filter.
We propose metrics that quantify how much of the uncertainty in a Bayesian posterior
state distribution is due to (i) the observation operator, (ii) observation error, and (iii)
approximations of Bayes’ Law. Our approach uses discrete Shannon entropy to quantify
uncertainty, and we define the utility of an observation (for reducing uncertainty about a
model state) as the ratio of the mutual information between the state and observation to
the entropy of the state prior. These metrics make it possible to analyze the efficiency of
a proposed observation system and data assimilation strategy, and provide a way to
examine the propagation of information through the dynamic system model. We
demonstrate the procedure on the problem of estimating profile soil moisture from
observations at the surface (top 5 cm). The results show that when synthetic observations
of 5 cm soil moisture are assimilated into a three-layer model of soil hydrology, the
ensemble Kalman filter does not use all of the information available in observations.
1. Introduction
Uncertainties in model estimates and forecasts of dynamic systems are often expressed
probabilistically, and data assimilation is the Bayesian process of conditioning state
uncertainty distributions on observations of the modeled system (Wikle and Berliner
2007). Intuitively, data assimilation relies on a model to provide prior information about
the behaviour of a dynamic system and observations provide independent, collaborating
evidence. In this sense, prior knowledge is the set of physical laws by which we expect
the system to operate, and the Bayesian data assimilation prior is a probability
distribution representing a) uncertainty in the representation of these physical laws
themselves, b) uncertainty about the isomorphism between mathematical or numerical
representations of physics and the actual physics of the system, and c) uncertainty in the
boundary condition. To update these prior uncertainty distributions using observations it
is necessary to have a model of the observation process; this model and any associated
observation uncertainty constitutes the Bayesian likelihood function.
When designing an observing system, it is often desirable to estimate the value of certain
types of hypothetical observations; this is especially important if collecting observations
is expensive. Observing system simulation experiments (OSSEs; which are synthetic
studies) can be used to evaluate the strength of computational approaches to conditioning
models on observations (Reichle et al. 2001). A typical data assimilation OSSE will
compare the uncertainties associated with model forecasts before and after assimilating a
set of synthetic observations generated according to the specifications of the proposed
observing system. The result is usually an estimate of the change in forecast accuracy or
precision.
For practical reasons, most data assimilation algorithms use approximations of Bayes’
Law. Therefore, there are three factors that might combine to reduce the effectiveness of
Bayesian learning from observations:
(i) The mapping from states to observations may not be injective (i.e., one-to-one), so
that even perfect observations do not translate into perfect state estimates.
(ii) The signal-to-noise ratio in the observations may be small.
(iii) Approximations of Bayes' Law may result in spurious updates to the state
uncertainty distributions.
Any set of observations contains a certain amount of information about model states; this
amount is determined by (i) and (ii), while (iii) results in information loss. Although it is
possible to estimate the effects of reducing noise in the observations simply by changing
the specifications for generating synthetic observations, traditional OSSE analyses do not
formally differentiate between the individual contributions of (i-iii) to the posterior
uncertainty. Furthermore, they do not formally track the assimilated information through
the dynamic system model.
In this paper, we propose a set of metrics which quantify the fraction of uncertainty in
posterior state distributions (after data assimilation) related to each of the three factors
listed above. We discuss, specifically, the case of one-dimensional observations.
Efficiency is computed by comparing the reduction in uncertainty about model states
actually achieved to that which is potentially attainable. These metrics, based on the
quantification of uncertainty as discrete Shannon entropy (Shannon 1948), provide a
consistent way to measure both (a) the potential ability of an observing system to inform
specific science questions, and (b) the importance of various limiting factors in the data
assimilation process. In addition, they can be used to track information introduced into a
dynamic system model via Bayesian updating (updating acts as a perturbation on the
system) through time and state-space and therefore provide a method for illustrating
dynamic information flow.
Section 2 provides an overview of probabilistic filtering and observing system simulation
experiments, and presents the proposed efficiency metrics. Section 3 demonstrates the
usefulness of the metrics for a system designed to estimate profile soil moisture content
from observations of surface soil moisture. Section 4 concludes and discusses some
limitations of the method.
2. Theory: Background and Proposed Metrics
2.1. Bayes Filters
Nonlinear dynamical system (NLDS) simulators used for data assimilation are usually
numerical integrators of stochastic differential equations of the form (Miller et al. 1999):
dx_t = f(x_t, u_t) dt + dW_t    [1]

where x_t and u_t are the simulator state and boundary condition at time t,
respectively, and W_t is a Wiener process. Solutions to [1] at discrete times k are
approximated by sampling a Markov process (Liu and Gupta 2007):

x_k = F(x_{k-1}, u_k) + ω_k    [2.1]

where the state transition function F is an approximation of the drift function f and
the ω_k are noise sampled from the Gaussian distribution N(0, C_ω). Periodically the
state of the system is observed according to an observation function H:

y_k = H(x_k)    [2.2]

The observation process is typically associated with some error so that the actual
observation is:

z_k = y_k + ε_k    [2.3]

where the observation error ε_k is drawn from an arbitrary distribution p(ε).

In the above, [2] constitutes a hidden Markov model (HMM) and implies probability
distributions p(x_k | x_{k-1}, u_k) and p(z_k | x_k) respectively, where k = 1, ..., K
and K is the number of simulation time steps. To make predictions with this model it is
essential to know the distribution of the forcing data which, along with the conditional
distributions p(x_k | x_{k-1}, u_k) and p(z_k | x_k), defines a joint probability
distribution over model states and observations for the simulation period.
Data assimilation addresses the question “what can be learned about x by observing z?”
and the Bayesian answer is a smoother; given observations during the simulation period,
the posterior uncertainty in x_{1:K} is given by:

p(x_{1:K} | z_{1:K}) = p(z_{1:K} | x_{1:K}) p(x_{1:K}) / ∫ p(z_{1:K} | x_{1:K}) p(x_{1:K}) dx_{1:K}    [3]

In the general case, no analytical solution to [3] exists and it is impractical to sample the
posterior directly due to the fact that it is (D·K)-dimensional (the dimension of the state
multiplied by the number of integration time steps). Therefore it is often necessary to
make some simplifying assumptions, the most common being that the state at time k
is conditionally independent of observations at times greater than k. This assumption
results in a filter:

p(x_k | z_{1:k}) = p(z_k | x_k) ∫ p(x_k | x_{k-1}) p(x_{k-1} | z_{1:k-1}) dx_{k-1}
                   / ∫ p(z_k | x_k) ∫ p(x_k | x_{k-1}) p(x_{k-1} | z_{1:k-1}) dx_{k-1} dx_k    [4]

which can be applied sequentially, using the posterior at time k-1 in the integral, which
defines the prior at time k.
Since the filter posterior distribution is only D-dimensional at any given time, it is
sometimes feasible to use Markov Chain Monte Carlo (MCMC) techniques to generate
approximately independent and identically distributed (iid) samples from the posterior.
However, because MCMC sampling is computationally expensive and must be
performed at every observation time step, additional simplifying approximations are
often used to facilitate analytical estimates of the posterior. Common approximations
include emulating the prior, likelihood, and posterior as Gaussian and F and H as linear
(the Kalman filter; Kalman 1960), emulating the prior and posterior as discrete sets of
samples from Gaussians and H as linear (the ensemble Kalman filter; Evensen 2003), or
emulating the prior as a set of discrete samples and estimating the posterior by
resampling the prior (Gordon et al. 1993).
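The resampling approach can be sketched as a bootstrap (sequential importance
resampling) update. This is a minimal illustration, not the SIRF implementation used in
this work; the likelihood function is problem-specific, and the Gaussian form in the usage
example is hypothetical:

```python
import numpy as np

def sir_update(particles, z, likelihood, rng):
    """One bootstrap particle filter step: weight each prior sample by
    the likelihood of the observation z, then resample with replacement
    in proportion to those weights."""
    w = np.array([likelihood(z, x) for x in particles])
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Usage: a Gaussian prior conditioned on z = 2.0 with (assumed)
# observation variance 0.25 concentrates the particles near the
# Bayesian posterior mean.
rng = np.random.default_rng(1)
prior = rng.normal(0.0, 1.0, 5000)
posterior = sir_update(prior, 2.0,
                       lambda z, x: np.exp(-0.5 * (z - x) ** 2 / 0.25), rng)
```

Repeated resampling of this kind can degenerate to a few distinct particles, which is one
practical motivation for the efficiency diagnostics developed below.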
2.2. Ensemble Kalman Filter
Ensemble filters estimate the various HMM distributions from a set of N iid samples,
which at each simulation timestep consists of N samples of the prior (called the
background), N samples of the observation derived from the background according to
[2.2], and N samples of the posterior (called the analysis). The ensemble filter most
commonly used in hydrology, and the one which we will use in our example (section 3),
is the ensemble Kalman filter (EnKF; Evensen 2003). Given an observation z_k whose
error is jointly Gaussian and independent in time with covariance C_ε, maximum
likelihood estimates of the posterior (where each background ensemble member is taken
to be the mean of the prior) can be approximated by linearizing H around the ensemble
mean and minimizing the expected squared error to obtain:

x_k^{a,i} = x_k^{b,i} + Σ_xy (Σ_yy + C_ε)^{-1} (z_k + ε^i − y_k^{b,i})    [5]

Σ_xy = (1/(N−1)) Σ_i (x_k^{b,i} − x̄_k^b)(y_k^{b,i} − ȳ_k^b)ᵀ
Σ_yy = (1/(N−1)) Σ_i (y_k^{b,i} − ȳ_k^b)(y_k^{b,i} − ȳ_k^b)ᵀ

where the ε^i are samples from the observation uncertainty distribution N(0, C_ε).
Under the stated conditions (Gaussian prior, Gaussian ε, and linear H), x_k^{a,i} is an iid
sample of the posterior of [5] and can be used as the condition of the prior at the next
timestep: each sample in the background at timestep k+1 is found by sampling [2.1]
with a single member of the analysis ensemble taken to be the previous state condition.
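Under these assumptions, the update in [5] can be sketched as a perturbed-observation
EnKF analysis step. This is a minimal illustration with hypothetical variable names, not
the implementation used in this study:

```python
import numpy as np

def enkf_update(X_b, Y_b, z, obs_var, rng):
    """One perturbed-observation EnKF analysis step.

    X_b : (n_state, N) background state ensemble
    Y_b : (n_obs, N) background observation ensemble, H applied to X_b
    z   : (n_obs,) actual observation
    obs_var : scalar observation-error variance
    """
    N = X_b.shape[1]
    Xp = X_b - X_b.mean(axis=1, keepdims=True)  # state anomalies
    Yp = Y_b - Y_b.mean(axis=1, keepdims=True)  # observation anomalies
    # Sample cross- and observation-space covariances (divide by N - 1)
    C_xy = Xp @ Yp.T / (N - 1)
    C_yy = Yp @ Yp.T / (N - 1)
    K = C_xy @ np.linalg.inv(C_yy + obs_var * np.eye(Y_b.shape[0]))
    # Each member is updated with the observation plus fresh noise
    eps = rng.normal(0.0, np.sqrt(obs_var), size=(Y_b.shape[0], N))
    return X_b + K @ (z[:, None] + eps - Y_b)

# Usage: a 2-state ensemble in which only the first state is observed.
rng = np.random.default_rng(0)
X_b = rng.normal(0.0, 1.0, (2, 2000))
X_a = enkf_update(X_b, X_b[:1, :].copy(), np.array([2.0]), 1.0, rng)
```

Because the gain is built from sample covariances, each member is shifted toward the
observation in proportion to the background correlation between its state and its
predicted observation.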
2.3. Observing System Simulation Experiments
Synthetic experiments that examine whether a proposed observing system and data
assimilation strategy can be expected to produce posterior state distributions having
increased accuracy (or reduced uncertainty) compared to the HMM simulator
distributions are called observing system simulation experiments (OSSEs; Arnold and
Dey 1986). The typical data assimilation OSSE is called an identical-twin experiment
(e.g., Crow and Van Loon 2006) and consists of three principal components:
1. N samples of the forcing data, state and observation distributions, called the open
loop samples; these are denoted u_k^{o,i}, x_k^{o,i} and z_k^{o,i} respectively. We will
also consider error-free samples of the observation, y_k^{o,i}.
2. A single sample from the forcing data and HMM distributions, which is used to
define the true state of the NLDS system; this is called the truth system and
denoted u_k^t, x_k^t, y_k^t, and z_k^t.
3. N samples of the state distributions conditional on the truth-system observations
after data assimilation; these are called the analysis samples and denoted x_k^{a,i}.
At every timestep, once the analysis x_k^a is determined, the background at the next
timestep can be sampled from [2.1]. Similarly, forecasts can be derived from analysis
vectors by propagating x_k^a forward in time l timesteps (via the simulator) without
assimilating any observations. We will denote this set of forecasts x_{k+l|k}^{f,i}, where
x_{k+l|k}^f represents a sample of the forecast distribution for timestep k+l made after
assimilating the observation at time k. Notice that x_{k|k}^f = x_k^a, and that the open
loop sample x_k^o, at time k, is not a sample from the filter prior.
Since the choice of truth system is random, it is necessary to repeat each OSSE a number
of times, in this case N times. The results of analyzing these OSSEs will thus be a
comparison between N analysis and forecast samples against a single open loop sample.
In this case each open loop sample was used individually as the truth system for a single
OSSE; this is necessary for the reason explained in section 2.4.
The most common way to evaluate an OSSE is in terms of the accuracy of the ensemble
mean; for example, Pauwels et al. (2007) compared the mean-squared errors (MSE) of the
expected value of open and assimilation loop state estimates. We propose instead to
quantify the contributions of (i-iii) (from section 1) to uncertainty in the posterior state
distributions.
2.4. Quantifying Observation Utility and Filter Efficiency
The amount of information contained in the realization of a random variable X,
distributed over a discrete event space Ω_X according to distribution p(x), is
−log(p(x)) (Shannon 1948). The entropy of the distribution p(x) over Ω_X is the
expected amount of information from a sample of X:

H(X) = − Σ_{x∈Ω_X} p(x) log(p(x))    [6]

Entropy can be interpreted as a measure of uncertainty about X according to the
distribution p(x). The expected amount of information about one random variable X
contained in a realization of another random variable Z is called the mutual information
between X and Z:

I(X; Z) = Σ_{x∈Ω_X} Σ_{z∈Ω_Z} p(x, z) log( p(x, z) / (p(x) p(z)) )    [7]

The mutual information normalized by the entropy of the predictand is known as the
Theil index (Theil 1967), which is a fractional measure of the expected uncertainty
reduction due to Bayesian conditioning since:

I(X; Z) = H(X) − H(X | Z)    [8]

If we approximate the HMM state and observation spaces as discrete, the Theil utility of
any single observation z_k taken at time k for reducing uncertainty in the d-th state
dimension can be estimated from the background ensembles:

U(x_{k,d}; z_k) = I(x_{k,d}; z_k) / H(x_{k,d})    [9]

Theil utility ranges between zero and one: U = 1 implies that the mapping from x to z
is deterministic and injective and U = 0 implies that x and z are statistically
independent. In general, Theil utility could be used to estimate the utility of any
observation for estimating any simulator component including parameters, states, other
observations or forecasts.

The fraction of entropy in the analysis distribution due to non-injectivity of the
observation function H is proportional to one minus the Theil utility of observations with
no observation error:

E_k^H = (1 − U(x_k; z_k | p_ε = δ)) H(x_k^b) / H(x_k^a)    [10]

Here Theil utility is denoted as functionally dependent on the observation error
distribution which, when there is zero observation uncertainty, is the Dirac delta
function. The fraction of entropy in the analysis distribution due to observation error is:

E_k^z = (U(x_k; z_k | p_ε = δ) − U(x_k; z_k | p_ε)) H(x_k^b) / H(x_k^a)    [11]

and the fraction due to approximating Bayes' Law at each time step is the scaled
difference between the entropy of the analysis distribution and the entropy of the true
Bayesian conditional distribution:

E_k^f = (H(x_k^a) − H(x_k | z_k)) / H(x_k^a)    [12]

Generally, [10-12] can be used to assess the fraction of posterior uncertainty in the d-th
state dimension at time k due to assimilating an observation at any time less than k.
Specifically, this means that we can estimate the efficiencies for assimilating
observations to make forecasts. The posterior entropy fractions related to forecast state
distributions will be indexed E_{k+l|k}^H, E_{k+l|k}^z, and E_{k+l|k}^f. The statistics
E^H, E^z, and E^f will be called the simulator, observation, and filter inefficiency
effects respectively.

Since each OSSE was repeated N times to account for the random choice of truth system,
and because the mutual information terms in [10] and [11] represent an expected value
over observation space, the expected value of the entropy of the analysis distribution over
truth systems was used in [10-12]. To ensure that mutual information estimates are
comparable with the expected value of the entropy of the analysis sample, it is important
that each open loop ensemble member be used as a truth system exactly once.
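The plug-in estimators in [6], [7], and [9] can be sketched with histogram counts. The
bin count and sample sizes below are illustrative, and this is not the analysis code used
in the study:

```python
import numpy as np

def theil_utility(x, z, bins=10):
    """Estimate U = I(x; z) / H(x) from paired ensemble samples using
    maximum-likelihood (plug-in) histogram estimators of entropy and
    mutual information."""
    p_xz, _, _ = np.histogram2d(x, z, bins=bins)
    p_xz /= p_xz.sum()                       # joint probability table
    p_x = p_xz.sum(axis=1)                   # marginal over x bins
    p_z = p_xz.sum(axis=0)                   # marginal over z bins
    # Discrete Shannon entropy of the state prior, H(x)
    h_x = -np.sum(p_x[p_x > 0] * np.log2(p_x[p_x > 0]))
    # Mutual information I(x; z) between the binned variables
    nz = p_xz > 0
    mi = np.sum(p_xz[nz] * np.log2(p_xz[nz] / np.outer(p_x, p_z)[nz]))
    return mi / h_x

# Usage: a deterministic, injective "observation" carries all of the
# information about x; an independent one carries essentially none.
rng = np.random.default_rng(0)
x = rng.normal(size=20000)
u_perfect = theil_utility(x, x)
u_none = theil_utility(x, rng.normal(size=20000))
```

With z equal to x the utility approaches one, while an independent z yields a utility near
zero, up to the small positive bias of the plug-in estimator discussed in section 3.2.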
3. Demonstration: An OSSE for Estimating Root-Zone Soil Moisture
This section describes an example OSSE which compares the inefficiency metrics
outlined in the previous section with a standard MSE evaluation. This example is
motivated by the problem of estimating root-zone soil moisture from observations taken
at the surface.
Measurements of radar scattering and emission can provide noninvasive estimates of
water content of the top few centimeters of soil with penetration depth dependent on
wavelength: ~2 cm at C-band and ~5 cm at L-band. Many efforts have been made to
estimate root-zone soil moisture from radiometer observations using data assimilation
(e.g., Bolten et al. 2010; Crow and Wood 2003; Galantowicz et al. 1999; Li and Islam
1999; Margulis et al. 2002) including several observing system simulation experiments
which test the feasibility of such approaches using synthetic observations (Crow and
Reichle 2008; Dunne and Entekhabi 2005; Flores et al. 2012; Reichle et al. 2008; Reichle
et al. 2001; Reichle et al. 2002b). The OSSE described in this section assimilates
synthetic observations of soil moisture in the surface soil layer (5 cm) into a three-layer
soil moisture model with total depth of 30 cm.
3.1. A 3-Layer Soil Moisture Model
3.1.1. State Transition Function
The HMM simulator state transition function consisted of a dynamic soil moisture
accounting model based on the two-layer model of soil hydrology developed by Mahrt
and Pan (1984). The three-dimensional state vector consisted of volumetric water content
[m³/m³] in three soil layers, with a 5 cm surface layer and a total root-zone depth of
30 cm. Aside from the total depth of the root-zone, model parameters included the
Brooks and Corey (1964) hydraulic coefficients: porosity (φ; [m³/m³]), bubbling
pressure (ψ_b; [cm]), saturated hydraulic conductivity (K_s; [cm/day]) and pore size
distribution index (λ; [~]), as well as the residual moisture content (θ_r; [m³/m³]). The
boundary condition was described by values of daily cumulative precipitation (P;
[cm/day]) and daily cumulative potential evaporation (E_p; [cm/day]).
At the beginning of each time step, the average volumetric water content in each layer
was used to estimate unsaturated conductivity and soil diffusivity of that layer according
to Brooks and Corey (1964):

Θ_i = (θ_i − θ_r)/(φ − θ_r),  K_i = K_s Θ_i^{(2+3λ)/λ}    [13.1]

D_i = (K_s ψ_b)/(λ(φ − θ_r)) Θ_i^{(1+2λ)/λ}    [13.2]

Infiltration into the first two soil layers [cm/day] was calculated as:

q_1 = D̄_{1,2} (θ_1 − θ_2)/Δz̄_{1,2} + K̄_{1,2}    [13.3.1]

q_2 = D̄_{2,3} (θ_2 − θ_3)/Δz̄_{2,3} + K̄_{2,3}    [13.3.2]

where overbars denote averages over adjacent layers and Δz̄ is the distance between
layer midpoints. Direct evaporation [cm/day] from the top soil layer was:

E = min(E_p, D_1 (θ_1 − θ_r)/Δz_1 + K_1)    [13.4]

Soil moisture in each layer was updated according to:

θ_1 ← θ_1 + Δt (P − q_1 − E)/Δz_1    [13.5.1]

θ_2 ← θ_2 + Δt (q_1 − q_2)/Δz_2    [13.5.2]

θ_3 ← θ_3 + Δt (q_2 − K_3)/Δz_3    [13.5.3]

where K_3 represents gravitational drainage out of the bottom of the root zone. The
middle layer (θ_2) was added to the model by Mahrt and Pan (1984) to allow for
sufficient infiltration.
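One daily update of a three-layer bucket of this general kind can be sketched as follows.
The flux forms and parameter values are schematic placeholders rather than the exact
terms of [13.3]-[13.5], so this illustrates the layer bookkeeping only:

```python
import numpy as np

def step(theta, P, Ep, dz=np.array([5.0, 10.0, 15.0]),
         phi=0.45, theta_r=0.05, Ks=25.0, lam=0.3):
    """One daily update of a schematic 3-layer water balance.
    theta: volumetric water content per layer [m3/m3]; P, Ep [cm/day].
    All parameter defaults are illustrative placeholders."""
    Se = np.clip((theta - theta_r) / (phi - theta_r), 1e-6, 1.0)
    K = Ks * Se ** (3.0 + 2.0 / lam)           # Brooks-Corey conductivity
    # Downward drainage out of each layer, limited by available water [cm]
    q = np.minimum(K, (theta - theta_r) * dz)
    E = min(Ep, (theta[0] - theta_r) * dz[0])  # evaporation from top layer
    new = theta.copy()
    new[0] += (P - q[0] - E) / dz[0]
    new[1] += (q[0] - q[1]) / dz[1]
    new[2] += (q[1] - q[2]) / dz[2]
    return np.clip(new, theta_r, phi)

# Usage: a 1 cm rain day wets the surface layer.
theta0 = np.array([0.20, 0.20, 0.20])
theta1 = step(theta0, P=1.0, Ep=0.2)
```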
3.1.2. Simulation Period and Boundary Conditions
The three-layer model from section 3.1 was forced for T days with daily precipitation and potential evaporation data collected at the Leaf River watershed in Mississippi, USA, beginning on Nov 5, 1952. Forcing data uncertainty was assumed to be multiplicative and log-normally distributed, so that the log-transformed (multiplicative) perturbations to measured precipitation and potential evaporation used to generate ensemble samples of u_t were Gaussian distributed with mean zero and covariance Σ_u. This forcing data uncertainty distribution was adapted from the distribution used by Reichle et al. (2007); we specified a nonzero covariance between perturbations to precipitation and potential evaporation, and we did not consider temporal autocorrelation. We also assumed that 8% of rainfall events were undetected, and added a rainfall amount sampled uniformly from the entire set of precipitation measurements to 8% of the days when zero rainfall was reported. We did not consider state transition uncertainty, so that η_t = 0 and all uncertainty was assumed to come from measurements of the boundary condition. Model parameter values (ψ_b, K_s, λ, θ_s, and θ_r) were held fixed across all experiments.
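The forcing perturbation scheme described above can be sketched as follows; the perturbation standard deviation is illustrative, and for brevity the sketch draws precipitation and potential evaporation perturbations independently rather than from the correlated distribution used here:

```python
# Sketch of multiplicative log-normal forcing perturbations plus random
# insertion of undetected rainfall. sigma is an illustrative placeholder, and
# the cross-covariance between the two perturbation series is omitted.
import math
import random

def perturb_forcing(precip_series, pe_series, sigma=0.5, p_missed=0.08, rng=None):
    rng = rng or random.Random(0)
    nonzero = [p for p in precip_series if p > 0] or [0.0]
    precip_out, pe_out = [], []
    for p, pe in zip(precip_series, pe_series):
        # log-transformed perturbations are zero-mean Gaussian
        p_new = p * math.exp(rng.gauss(0.0, sigma))
        pe_new = pe * math.exp(rng.gauss(0.0, sigma))
        # add an undetected event to 8% of reported-dry days
        if p == 0 and rng.random() < p_missed:
            p_new = rng.choice(nonzero)
        precip_out.append(p_new)
        pe_out.append(pe_new)
    return precip_out, pe_out
```

Multiplicative perturbations leave reported-dry days dry, which is why the undetected-event mechanism is needed to represent missed rainfall.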
3.1.3. Observation Function
Synthetic observations of 5 cm soil moisture were generated according to:

y_t = θ_{1,t} + ε_t   [14.1]

ε_t ~ N(0, σ_y²)   [14.2]

This observation error distribution has standard deviation σ_y [m3/m3], so that the observation of the surface soil moisture state was expected to fall within ±2σ_y [m3/m3] of the true state 95% of the time. The initial state for each ensemble member was sampled from a Gaussian distribution for each layer, which represents an initially uncertain soil moisture profile with mean of 0.10 [m3/m3] in each layer.
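The observation model [14] can be sketched directly; the value σ_y = 0.02 below is an illustrative placeholder, chosen only to demonstrate the 95% interval interpretation:

```python
# Sketch of the observation model [14.1]-[14.2]; sigma_y = 0.02 is an
# illustrative placeholder, not the value used in the experiments.
import random

_rng = random.Random(1)

def observe(theta_surface, sigma_y=0.02):
    """Synthetic observation of 5 cm soil moisture [m3/m3]."""
    return theta_surface + _rng.gauss(0.0, sigma_y)

# about 95% of observations fall within +/- 2*sigma_y of the true state
obs = [observe(0.20) for _ in range(10000)]
frac = sum(abs(o - 0.20) <= 0.04 for o in obs) / len(obs)
```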
3.2. Ensemble Size
In application, ensemble size will be dependent on the desired accuracy and precision of
the various entropy estimates described in section 2.4. The entropy and mutual
information estimates in [6-7] are maximum likelihood estimators (MLEs) for discrete
random variables, and the asymptotic bias of an MLE estimator of entropy (Ĥ) is bounded in magnitude by (M − 1)/(2N), where M is the number of histogram bins and N is the sample size; the upper bound at zero is tight when N >> M (Paninski 2003). We approximated the
continuous HMM state and observation spaces as discrete using a histogram bin width
given by Scott (2004):
h = 3.5 σ N^(−1/3)   [15]

where σ refers to the standard deviation of the background state sample, the analysis state sample, or the observation sample at time t. We then chose an ensemble size N according to the resulting bound on the bias of the scaled filter inefficiency effect:

bias(Ĥ(x_t, y_t)) ≤ (M² − 1)/(2N)   [16]

Assuming that the range of each sample from a distribution for which we wished to estimate entropy was approximately 7.5 standard deviations, so that M ≈ 7.5 σ / h ≈ 2.14 N^(1/3), the biases of the entropies of the joint distributions between states and observations (Ĥ(x_t, y_t)) were bounded above by approximately 2.3 N^(−1/3). Given that this is a loose bound, and noting that the mean of the entropies of the marginal EnKF posteriors over θ_1 was about 3.4 nats, we chose an ensemble size N so that the upper bound on the fraction of posterior uncertainty due to filter inefficiency was approximately 0.34 nats, or about 10% of the mean of entropies of the filter posteriors.
This ensemble size is much larger than what is needed to estimate the EnKF prior
(Reichle et al. 2002a).
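The entropy machinery used to size the ensemble can be sketched as follows: a plug-in (MLE) histogram entropy with Scott's bin width [15] and the Paninski (2003) bias bound; the log(bin width) correction that converts the discrete estimate to an approximate differential entropy is a standard adjustment, included here for illustration:

```python
# Sketch: histogram (plug-in MLE) entropy with Scott's bin width [15] and the
# Paninski (2003) asymptotic bias bound used to choose the ensemble size.
import math

def scott_bin_width(sample):
    """Scott's normal reference rule: h = 3.5 * sd * N^(-1/3)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return 3.5 * sd * n ** (-1.0 / 3.0)

def mle_entropy(sample):
    """Plug-in entropy estimate [nats] of a 1-D sample; the log(bin width)
    term converts the discrete entropy to a differential approximation."""
    h = scott_bin_width(sample)
    counts = {}
    for x in sample:
        b = int(x // h)
        counts[b] = counts.get(b, 0) + 1
    n = len(sample)
    return -sum(c / n * math.log(c / n) for c in counts.values()) + math.log(h)

def bias_bound(n_bins, n_samples):
    """Magnitude of the asymptotic MLE entropy bias bound, (M - 1)/(2N)."""
    return (n_bins - 1) / (2.0 * n_samples)
```

For a uniform sample on [0, 1) the differential entropy is zero nats, and the plug-in estimate lands close to that value; the bias bound then indicates how many samples are needed before the histogram discretization stops mattering.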
3.3. Methods of OSSE Analysis
We report the time-averaged simulator, observation and filter inefficiency effects related to analysis and forecast posterior state distributions:

Ē = (1/T) Σ_{t=1}^{T} E_t   [17]

Each time-averaged statistic Ē estimates an inefficiency effect E_t related to assimilating observations of y_t to reduce uncertainty about the state at the present time (t) and up to L days into the future (t + L).
MSE statistics are also reported; these consist of, for each state dimension i, the expected value over truth systems of the MSE of the mean of the analysis state vector, normalized by the MSE of the ensemble mean of the open loop state vector (x̄^a_{i,t,k} is the ensemble mean of the analysis state sample due to assimilating observations of the kth truth system through time t):

MSE^a_i = [Σ_k Σ_t (x̄^a_{i,t,k} − x_{i,t,k})²] / [Σ_k Σ_t (x̄^ol_{i,t,k} − x_{i,t,k})²]   [18.1]

as well as the similarly normalized MSE of the assimilation loop forecast ensemble mean state estimates:

MSE^f_i = [Σ_k Σ_t (x̄^f_{i,t,k} − x_{i,t,k})²] / [Σ_k Σ_t (x̄^ol_{i,t,k} − x_{i,t,k})²]   [18.2]
These statistics were calculated for EnKF assimilation of observations generated with the error distribution described in section 3.1.3, and also with a reduced error distribution, to directly test the effect of reducing observation error on assimilation results. The effect of reducing observation error was quantified as the fractional reduction in MSE due to decreasing observation error variance.
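The normalized MSE statistics [18.1]–[18.2] reduce to a simple ratio of squared-error sums; a minimal sketch with hypothetical values:

```python
# Sketch of the normalized MSE statistics [18.1]-[18.2]: squared errors of the
# assimilation ensemble mean divided by squared errors of the open loop
# ensemble mean, summed over truth systems and timesteps. Values are made up.

def normalized_mse(assim_means, open_loop_means, truths):
    """Each argument: a flat list over (truth system, timestep) pairs."""
    num = sum((a - t) ** 2 for a, t in zip(assim_means, truths))
    den = sum((o - t) ** 2 for o, t in zip(open_loop_means, truths))
    return num / den

# a ratio below one indicates that assimilation reduced ensemble-mean error
ratio = normalized_mse([0.21, 0.19], [0.25, 0.15], [0.20, 0.20])
```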
3.4. Assessing the Effects of Simulator Uncertainty
A separate experiment was conducted to assess the relative effects of boundary condition uncertainty and state transition uncertainty by scaling the variances of u_t and η_t. A separate simulator was sampled for eighty-one combinations of linearly spaced values of two scaling factors (α_u, α_η), so that multiplicative perturbations to measured forcing data came from log-normal distributions with variances scaled by α_u, and state transition perturbations came from Gaussian distributions with variances scaled by α_η. The Theil utilities of perfect observations (no observation error) for estimating the θ_2 state at one timestep in the future were calculated from each of these simulators using the open loop samples. Since estimating Theil utility did not require data assimilation, no truth systems were sampled. We report the time-averaged Theil utility over T timesteps as a function of the scaling factors.
3.5. OSSE Results
Figure 1 plots the first sixty timesteps of the open loop ensembles and assimilation loop ensembles resulting from assimilating observations from a single truth system. The EnKF MSE statistics calculated by [18] (Figure 2) were always smaller than one, which indicated that mean-squared errors of the expected value of the analysis posteriors were smaller than mean-squared errors of the expected values of the open loop priors.
Figure 2 shows that the expected value (over time and truth systems) of the effects on MSE of assimilating observations of θ_1 on estimates of states θ_2 and θ_3 persisted for one and two timesteps respectively before attenuating. This is an indication that the boundary condition (precipitation and evaporation), which largely influences θ_1 directly, was the predominant contributor of uncertainty in this simulator. On average, assimilating observations reduced the ensemble mean MSE at time t by about 80%, and reducing the observation error further reduced the ensemble mean MSE by almost 100% in all three states. It was possible to estimate all three true states exactly using precise observations; however, the effect of reducing observation error variance was almost completely mitigated by the first forecast timestep.
Figure 3 illustrates the time-averaged inefficiency metrics related to each model state. The predominant contribution to posterior state uncertainty was the observation mapping, except when estimating θ_1 at the time of observation. In the latter case, observation uncertainty contributed approximately 80% of the posterior state entropy and EnKF assumptions contributed just over 20%. This result might appear to differ from the MSE analysis, which showed that close to 100% of posterior θ_1 estimation error could be mitigated by reducing observation error; however, the efficiency analysis showed that when the observation variance is effectively greater than zero, the posterior uncertainty in θ_1 was partly due to observation error and partly due to EnKF simplifications of the prior and posterior probability density functions. Gaussian pdfs have the highest entropy of any density function with given variance, and the true density functions over states do not have unbounded support, so approximating these as Gaussian in the prior and posterior added entropy to the posterior.
Most importantly, the fact that some of the time-averaged filter inefficiency effects were greater than zero (bottom subplot in Figure 3) indicates that there was unused information in the observations – that is, uncertainty was not reduced as much as it could have been. The filter did not estimate the density function of the surface layer correctly, and this artifact propagated through the soil layers between time steps. The MSE analysis showed that it took one timestep for uncertainty in the boundary condition to propagate between soil layers, and here we see evidence of assimilated information propagating between soil layers. In this particular case, y_t was more closely related to θ_{2,t+1} than to θ_{2,t}, and also more closely related to θ_{3,t+2} than to θ_{3,t}: for example, the average Theil utility of y_t for estimating θ_{2,t+1} was larger than the average Theil utility of y_t for estimating θ_{2,t}, indicating that filtering might be the wrong way to approach the problem of estimating θ_2 and θ_3. The soil moisture reanalysis by Dunne and Entekhabi (2005) highlighted a similar effect by showing that an approximate smoother resulted in improved soil moisture state estimates as compared to an approximate filter; however, our analysis indicates that a six-dimensional posterior might be sufficient (conditional distributions of θ_{1,t}, θ_{2,t+1}, and θ_{3,t+2} on y_t).
Since the simulator inefficiency effects were the largest, we evaluated the effects of scaling the variances of u_t and η_t; Figure 4 plots the time-averaged Theil utilities of perfect observations (Dirac delta function error distributions) for reducing uncertainty in the θ_2 state at one timestep in the future. This figure takes the form of a response surface due to varying the two scaling factors. The utility of perfect observations decreased with decreasing uncertainty in the boundary condition, because observations of a highly uncertain system were more informative than observations of a system with little uncertainty. When the boundary condition uncertainty was high, observation utility decreased with increasing state transition error perturbations, because state transition perturbations added noise to the system and served to decorrelate y_t and θ_{2,t+1}. When the boundary condition uncertainty was low, increasing state transition perturbations increased the utility of observations, because there was no value in observations if there was no uncertainty in the system.
4. Summary and Discussion
The information-based analysis of observing system simulation experiments described in
this paper is novel and has the potential to inform the design of data assimilation systems.
There are a number of methods for analyzing the value of observations in a data assimilation context (e.g., Gelaro et al. 2007); however, existing methods are only able to provide estimates of the value of observations in the context of a particular filter, which does not allow for any estimate of the efficiency of the filter itself. By framing the problem in the context of uncertainty reduction, it is possible to make direct estimates of the informativeness of observations and of the propagation of information through an NLDS simulator without additional simplifying assumptions.
The major limitation of this method is that it is only possible to measure the information
content in a small subset of observations. Due to the curse of dimensionality (Bellman
2003), it is difficult to estimate the entropy of, or mutual information between,
probability distributions of high dimensional random variables. This has two
implications. First, we cannot calculate the potential utility of a smoother, since a time series of T observations must be treated as a T-dimensional variable. The method we have demonstrated suggests an analysis of the expected value of collecting additional observations, or of increasing observing frequency, by calculating the expected and actual reduction in entropy due to assimilating y_{t+1} given that y_{1:t} have been assimilated by some filter; however, such estimates are not independent of filter assumptions, because the filter influences the prior at each timestep. The second implication is that the
method is restricted to estimating the efficiency of assimilating only one-dimensional
observations at any given time. If more than one type of observation (perhaps from
different observing systems) or spatially distributed observations were available, each
observation dimension would have to be analyzed separately. In certain cases it might be
practical to project sets of cotemporaneous observations into low-dimensional space and
then to estimate the mutual information between these projected observations and model
states. Certainly mutual information-based methods similar to what we have outlined in
this paper could be developed to test locality assumptions in spatially distributed data
assimilation applications. There are also methods for calculating joint entropies using kernel approximations of underlying probability mass functions which do not require reprojections (e.g., Ahmad and Lin 1976); however, these methods are ultimately subject
to the same curse of dimensionality which limits all nonparametric density estimation,
and are only practical up to a few dimensions. In both the Theory and Demonstration
sections (sections 2 and 3), we estimated the entropy of one- and two-dimensional
probability density functions and considered only one observation at a time.
The computational cost of the proposed OSSE approach is dominated by sampling the
HMM simulator, which requires running an NLDS simulator. The total number of model evaluations required scales with the product of the ensemble size, the number of truth systems, and the number of timesteps, and the ensemble size depends on the desired accuracy and precision of the entropy calculations. The central limit theorems for many discrete entropy estimators are known and can be used to determine an appropriate ensemble size based on application requirements.
It is important to remember that uncertainty reduction does not necessarily equate to
improved accuracy. In many applications, such as flood or drought forecasting, Type II
error can have serious consequences, and the possibility of over-constrained or inaccurate
posteriors should always be evaluated. The method we propose does not evaluate bias in
the assimilation, only whether uncertainty reduction meets its potential.
The strategy of evaluating data assimilation based on tracking the information contained
in observations is intuitive. It is important to understand how this information
‘penetrates’ the model to allow for improved estimates of states which are only indirectly
related to observations. Kalman-type filters assume linear relationships between model
states, via the Gaussian prior, and estimating the nonparametric or semi-parametric
mutual information between states and observations allows for an evaluation of this
assumption without actually knowing a priori the joint state distribution. By
understanding what information is contained in observations about what model states, we
might be able to design directed assimilation strategies using Bayes’ Law to condition
appropriately chosen marginal distributions.
Acknowledgement
This work was supported by a grant from the NASA Terrestrial Ecology program entitled
Ecological and agricultural productivity forecasting using root-zone soil moisture
products derived from the NASA SMAP mission; principal investigator Wade T. Crow,
and by grant number 1208294 from the US National Science Foundation East Asia and
Pacific Summer Institute for US Graduate Students titled Estimating agricultural water use: an approach to evaluating the utility of data assimilation; principal investigator
Grey S. Nearing.
Figures
Figure 1: Example EnKF synthetic twin OSSE results for a single truth system are plotted for sixty timesteps. Observations were assimilated at every timestep. The top panel compares prior and posterior distributions over the state θ_1 and the bottom panel compares the prior and posterior distributions over observations y.
Figure 2: Normalized differences in mean-squared errors of posterior EnKF state estimates, from [18.1] (top) and [18.2] (middle), due to assimilating observations are plotted as a function of forecast time. The bottom plot shows the fractional improvement due to reducing the observation uncertainty standard deviation.
Figure 3: Time-averaged simulator, observation error and EnKF inefficiency effects related to assimilating an observation at time t to improve forecasts of soil moisture states at forecast times t through t + L.
Figure 4: Combined effects on the time-averaged Theil utility of observations of θ_1 (0-5 cm soil moisture) for predicting θ_2 (5-15 cm soil moisture at one time-step in the future) due to co-varying the scale of the variances of the distributions over boundary conditions and state transition perturbations.
References
Ahmad, I.A., & Lin, P.E. (1976). Nonparametric estimation of entropy for absolutely
continuous distributions. IEEE Transactions on Information Theory, 22, 372-375,
doi:10.1109/tit.1976.1055550
Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past,
present, and future. Bulletin of the American Meteorological Society, 67, 687-695,
doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2
Bellman, R. (2003). Dynamic Programming. Mineola, NY: Dover Publications, Inc
Bolten, J.D., Crow, W.T., Zhan, X.W., Jackson, T.J., & Reynolds, C.A. (2010).
Evaluating the utility of remotely sensed soil moisture retrievals for operational
agricultural drought monitoring. IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, 3, 57-66, doi:10.1109/jstars.2009.2037163
Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology
Papers, Colorado State University
Crow, W.T., & Reichle, R.H. (2008). Comparison of adaptive filtering techniques for
land surface data assimilation. Water Resources Research, 44, W08423, doi:10.1029/2008WR006883
Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the
sequential assimilation of remotely sensed surface soil moisture. Journal of
Hydrometeorology, 7, 421-432, doi:10.1175/JHM499.1
Crow, W.T., & Wood, E.F. (2003). The assimilation of remotely sensed soil brightness
temperature imagery into a land surface model using Ensemble Kalman filtering: a case
study based on ESTAR measurements during SGP97. Advances in Water Resources, 26,
137-149, doi:10.1016/S0309-1708(02)00088-X
Dunne, S., & Entekhabi, D. (2005). An ensemble-based reanalysis approach to land data
assimilation. Water Resources Research, 41, W02013, doi:10.1029/2004WR003449
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical
implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Flores, A.N., Bras, R.L., & Entekhabi, D. (2012). Hydrologic data assimilation with a
hillslope-scale resolving model and L-band radar observations: Synthetic experiments
with the ensemble Kalman filter. Water Resources Research, in press,
doi:10.1029/2011WR011500
Galantowicz, J.F., Entekhabi, D., & Njoku, E.G. (1999). Tests of sequential data
assimilation for retrieving profile soil moisture and temperature from observed L-band
radiobrightness. IEEE Transactions on Geoscience and Remote Sensing, 37, 1860-1870,
doi:10.1109/36.774699
Gelaro, R., Zhu, Y., & Errico, R.M. (2007). Examination of various-order adjoint-based
approximations of observation impact. Meteorologische Zeitschrift, 16, 685-692,
doi:10.1127/0941-2948/2007/0248
Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing, 140, 107-113, doi:10.1049/ip-f-2.1993.0015
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems.
Transactions of the ASME–Journal of Basic Engineering, 82, 35-45,
doi:10.1115/1.3662552
Li, J., & Islam, S. (1999). On the estimation of soil moisture profile and surface fluxes
partitioning from sequential assimilation of surface layer soil moisture. Journal of
Hydrology, 220, 86-103, doi:10.1016/S0022-1694(99)00066-9
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an
integrated data assimilation framework. Water Resources Research, 43, W07401,
doi:10.1029/2006WR005756
Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer
Meteorology, 29, 1-20
Margulis, S.A., McLaughlin, D., Entekhabi, D., & Dunne, S. (2002). Land data
assimilation and estimation of soil moisture using measurements from the Southern Great
Plains 1997 Field Experiment. Water Resources Research, 38, 18pp,
doi:10.1029/2001WR001114
Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear
stochastic models. Tellus Series A-Dynamic Meteorology and Oceanography, 51, 167-194
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation,
15, 1191-1253
Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., &
Defourny, P. (2007). Optimization of a coupled hydrology-crop growth model through
the assimilation of observed soil moisture and leaf area index values using an ensemble
Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942
Reichle, R.H., Crow, W.T., & Keppenne, C.L. (2008). An adaptive ensemble Kalman
filter for soil moisture data assimilation. Water Resources Research, 44, W03423,
doi:10.1029/2007WR006357
Reichle, R.H., Koster, R.D., Liu, P., Mahanama, S.P.P., Njoku, E.G., & Owe, M. (2007).
Comparison and assimilation of global soil moisture retrievals from the Advanced
Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) and the
Scanning Multichannel Microwave Radiometer (SMMR). Journal of Geophysical
Research-Atmospheres, 112, D09108
Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2001). Variational data assimilation
of microwave radiobrightness observations for land surface hydrology applications. IEEE
Transactions on Geoscience and Remote Sensing, 39, 1708-1718, doi:10.1109/36.942549
Reichle, R.H., McLaughlin, D.B., & Entekhabi, D. (2002a). Hydrologic data assimilation
with the ensemble Kalman filter. Monthly Weather Review, 130, 103-114,
doi:10.1175/1520-0493(2002)130<0103:HDAWTE>2.0.CO;2
Reichle, R.H., Walker, J.P., Koster, R.D., & Houser, P.R. (2002b). Extended versus
ensemble Kalman filtering for land data assimilation. Journal of Hydrometeorology, 3,
728-740, doi:10.1175/1525-7541(2002)003<0728:EVEKFF>2.0.CO;2
Scott, D.W. (2004). Multivariate density estimation and visualization. In J.E. Gentle, W.
Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and
Methods (pp. 517-538). New York: Springer
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423
Theil, H. (1967). Economics and Information Theory. Chicago: Rand McNally
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica
D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
APPENDIX C:
INFORMATION LOSS IN ESTIMATION OF AGRICULTURAL YIELD: A
COMPARISON OF GENERATIVE AND DISCRIMINATIVE APPROACHES
Grey S. Nearing and Hoshin V. Gupta
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
Article in preparation for Hydrology and Earth System Sciences.
Abstract
Data assimilation and regression are two commonly used methods for predicting
agricultural yield from remote sensing observations. Data assimilation is a generative
approach because it requires explicit approximations of the Bayesian prior and likelihood
to compute the probability density function of yield, conditional on observations;
regression is discriminative because it models the conditional yield density function
directly. Here, synthetic experiments were used to evaluate the abilities of two methods - the ensemble Kalman filter (EnKF) and Gaussian process regression (GPR) - to extract information from observations. The amount of information in an observation was formally quantified as the mutual information between that observation and end-of-season biomass. We formally define information loss, used information, and bad
information as partial divergences from the true Bayesian posterior (yield conditional on
the observations). Our results suggest that the simpler discriminative GPR approach can
be as efficient as the more complex generative EnKF at extracting information from
observations, and may therefore be better suited to dealing with the practical problems
associated with remotely sensed data (inhomogeneity of the satellite image pixel and
mismatches in spatial resolution). This is important because discriminative methods can
be applied without the need for a physical or conceptual simulator. Our method for
analyzing information use has many potential applications. Approximations of Bayes’
law are used regularly in predictive models of environmental systems of all kinds, and the
efficiency of such approximations has not been formally addressed.
1. Introduction
From a probabilistic perspective, the goal of a yield model is to estimate the probability
density function (pdf) of yield conditional on remote sensing observations. Two
fundamentally different approaches to conditioning yield estimates on observations have
emerged in the literature. The first, and most common, is discriminative (see Ng and
Jordan 2001 for a definition of discriminative and generative approaches to
classification), meaning that the conditional pdf is estimated directly by regression to
map observations onto the probability space of yield estimates; both parametric and semi-parametric regressions have been used (e.g., Jiang et al. 2004; Koppe et al. 2012;
Kouadio et al. 2012; Li et al. 2007; Uno et al. 2005; Ye et al. 2006). The second method
involves conditioning dynamic physical or conceptual simulations of crop phenology on
information available in remote sensing observations. Usually, in this case, observations
are used to estimate simulator parameters (called re-initialization; e.g., Dente et al. 2008;
Doraiswamy et al. 2005; Maas 1988) or simulator states (called data assimilation; e.g., de
Wit and van Diepen 2007; Nearing et al. 2012; Pauwels et al. 2007; Pellenq and Boulet
2004; Thorp et al. 2010).
Techniques for re-initialization and data assimilation are usually Bayesian (although this
is overlooked by many authors) in the sense that repeated evaluations of the simulator are
used to either implicitly or explicitly define a pdf over simulator states (the Bayesian
prior) and a pdf of observations conditional on states (the Bayesian likelihood).
Specifically, data assimilation is a generative method for Bayesian conditioning, since
formal estimates of the distribution of simulator states conditional on observations are
obtained by approximating Bayes’ Law (Wikle and Berliner 2007). The data assimilation,
or generative, approach is conditional on simulator assumptions about the nature of crop
development (e.g., physics, biology, phenology) and requires weather data at the
temporal resolution of the numerical integration, whereas the regression, or
discriminative, approach is independent of simulator assumptions and does not require
specific weather data.
Since generative methods are conditional on simulator assumptions, their performance is
maximized when those assumptions are correct; in particular, generative methods achieve
optimal performance in a synthetic setting (e.g., Nearing et al. 2012). Discriminative
methods are not conditional on simulator assumptions. For example, data assimilation
applications typically assume homogeneity in land surface characteristics at the scale of a
remote sensing observation (e.g., one image pixel), whereas regression models can be
trained directly using historical observations and yield statistics, and thus can implicitly
account for mismatches in spatial resolution between agricultural areas and observations.
For this reason, discriminative methods are generally preferable to generative methods
for predicting yield, as long as they can be relied upon to efficiently extract information
from observations.
In this paper, we conduct synthetic experiments to compare the amount of information
that is extracted from remote sensing observations by regression and by data assimilation.
Nearing et al. (2013) present a set of metrics, based on Shannon's (1948) theory of
communication, that can be used to quantify the contribution of data assimilation
inefficiency to residual uncertainty in posterior state estimates (via synthetic data
experiments); here we extend that framework to explicitly measure information loss and
information use. Specifically, we compare the ability of the ensemble Kalman filter
(EnKF; Evensen 2003) with that of Gaussian process regression (GPR; Rasmussen and
Williams 2006) to extract information from observations of leaf area index (LAI) and soil
moisture – two types of observations related to agricultural productivity that are currently
available from earth observing satellites (e.g., Kerr et al. 2010; Knyazikhin et al. 1999).
The paper is organized as follows: Section 2 provides basic information about the crop
simulator used to generate and assimilate synthetic remote sensing observations (section
2.1), the EnKF (section 2.2), and GPR (section 2.3). The simulation experiments are
detailed in section 2.4 and section 2.5 explains how we estimate the information content
of observations and information loss. Section 3 presents the results of these experiments
and section 4 concludes.
2. Methods
2.1. A Crop Development Simulator
Most crop simulators estimate yield as a fraction of biomass at the time of harvest;
therefore, our objective was to estimate a probability distribution over plant biomass at
harvest conditional on observations of either leaf-area index or soil moisture taken at
some times during the growing season. A prior distribution over the harvest biomass was
generated by a crop development simulator based on the EPIC model (Williams et al.
1989). Uncertainty in these prior biomass estimates was due to uncertainty in daily
weather conditions and uncertainty in the crop development simulator itself.
In general terms, the dynamic system simulator can be thought of as a numerical
integrator of the stochastic differential equation (Miller et al. 1999):
dx_t = g(x_t, u_t) dt + dW_t   [1]

where x_t and u_t are the simulator state and boundary condition at time t, respectively, and W_t is a Wiener process. Solutions to [1] at discrete times t = 1, …, T are approximated by sampling a Markov process (Liu and Gupta 2007):

x_t = f(x_{t−1}, u_t) + η_t   [2.1]

where the state transition function f is an approximation of the drift function g, and the η_t are samples of noise with (Gaussian) distributions N(0, Q_t). Periodically the system is observed by y_t, which is dependent on the state at the current time according to the observation function h:

y_t = h(x_t) + ε_t   [2.2]

where ε_t represents random observation error drawn from the arbitrary distribution ν_t. [2] constitutes a hidden Markov model (HMM) and implies probability distributions p(x_{1:T}) and p(y_{1:T} | x_{1:T}) respectively, where T is the number of simulation time steps.
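Sampling the HMM in [2] can be sketched generically; the linear state transition, identity observation operator, and noise scales below are toy placeholders, not the crop simulator of Appendix A:

```python
# Sketch of sampling the HMM in [2]; f, h, and the noise scales are toy
# placeholders, not the crop development simulator of Appendix A.
import random

def sample_hmm(x0, forcings, f, h, q_sd, obs_sd, rng=None):
    rng = rng or random.Random(0)
    states, obs = [], []
    x = x0
    for u in forcings:
        x = f(x, u) + rng.gauss(0.0, q_sd)     # state transition, eq. [2.1]
        y = h(x) + rng.gauss(0.0, obs_sd)      # observation, eq. [2.2]
        states.append(x)
        obs.append(y)
    return states, obs

# toy linear system: x_t = 0.9 x_{t-1} + u_t, observed directly
states, obs = sample_hmm(0.0, [1.0] * 50,
                         f=lambda x, u: 0.9 * x + u,
                         h=lambda x: x, q_sd=0.1, obs_sd=0.05)
```

Repeated calls with different random seeds generate the ensemble of trajectories from which the prior p(x_{1:T}) and likelihood p(y_{1:T} | x_{1:T}) are implicitly defined.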
The following subsections describe the state transition function f (section 2.1.1) and the observation function h (section 2.1.2), as well as distributions over the boundary condition, u_t, and state transition perturbations, η_t, which represent uncertainty about weather and the simulator's representation of phenological development (section 2.1.3).
2.1.1. The State Transition Function
The crop development simulator we used has 5 state dimensions, consisting of a Heat Unit Index (H; [~]), which represents the accumulated fraction of growing degree days (C day) needed for maturity, Plant Biomass (B; [kg/m2]), Leaf Area Index (L; [m2/m2]), volumetric soil moisture in the root zone (θ_rz; [m3/m3]), and volumetric soil moisture in a 5 cm deep evaporation zone (θ_sf; [m3/m3]), so that x = [H, B, L, θ_rz, θ_sf]. Simulator parameters are listed in Table 1 and the state transition functions are detailed in Appendix A; soil texture was assumed to be homogeneous with depth. The simulator was integrated at a daily time step using daily mean temperature (T_a; [C]), daily cumulative solar radiation (R; [MJ/day]), and daily cumulative precipitation (P; [cm]), so that u = (T_a, R, P).
2.1.2. The Observation Function
Simulator predictions of remote sensing observations of LAI and surface soil moisture were generated according to the identity relationship:

    y_t = H x_t + eps_t,    eps_t ~ N(0, R)    [3]

where H selects the observed state dimensions and R is the diagonal matrix whose elements are the observation error variances. Observation error standard deviations were set to realistic values for MODIS LAI observations [m2/m2] (Tan et al. 2005) and for SMAP surface soil moisture [m3/m3] (Entekhabi et al. 2010). It is not possible to observe theta_rz directly by satellite; however, we performed tests of assimilating observations of the total root zone soil moisture to assess the relative value of complete observations of the soil water state compared to observations of surface soil moisture only. These observations were synthesized according to [3] with corresponding realistic standard errors. Realistic observation overpass frequencies are every 4 days for surface soil moisture observations (Entekhabi et al. 2010) and an 8-day composite for MODIS LAI observations (Tan et al. 2005). We will investigate the effect of observing frequency on yield estimation results.
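As a concrete illustration of [3], the synthetic observation process can be sketched as follows; the function name, state ordering, and error standard deviations here are illustrative assumptions, not the values used in the study:

```python
import numpy as np

def synthesize_obs(x, obs_idx, obs_sd, rng):
    """Generate a synthetic observation as in [3]: an identity observation
    operator selects the observed state dimensions and Gaussian noise with
    diagonal covariance is added.

    x       : (D,) true state vector
    obs_idx : indices of the observed dimensions (e.g. LAI, surface moisture)
    obs_sd  : per-dimension observation standard errors (illustrative)"""
    H = np.eye(len(x))[obs_idx]      # selection rows of the identity matrix
    eps = rng.normal(0.0, obs_sd)    # independent noise per observed dimension
    return H @ x + eps

rng = np.random.default_rng(0)
# hypothetical state ordering: [HUI, biomass, LAI, theta_rz, theta_sf]
state = np.array([0.4, 1.2, 3.0, 0.25, 0.20])
y = synthesize_obs(state, [2, 4], np.array([0.1, 0.04]), rng)
```

The same function can synthesize root-zone observations by changing `obs_idx`.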
2.1.3. Simulation Period and Uncertainty Sampling
The simulator in Appendix A was used to simulate a rain-fed wheat crop grown in Rothamsted, UK, with planting on May 10 and harvest T days after planting. Simulator parameter values (listed in Table 1) were taken from the Spring Wheat column of Table 8.2.1 of Arnold et al. (1995). State transition perturbations were independent between time steps and state dimensions, and came from heteroscedastic zero-mean Gaussian distributions with covariance proportional to I_D, where I_D is the D-dimensional identity matrix.
Daily weather data collected by the Institute of Arable Crops Research (formerly the Rothamsted Research Station) spanning 40 years, from 1959-1999, are available from the Decision Support System for Agrotechnology Transfer version 4 release (Hoogenboom et al. 2008). Monthly statistics in the form of the mean maximum and minimum daily temperature (Tmax_bar, Tmin_bar), the number of wet days per month (n_wet), the total monthly precipitation (P_bar), and the mean and standard deviation of daily net radiation (R_bar, sigma_R) were derived from these daily data for the months of May, June and July of each year, and used to generate samples of daily weather data from the uncertainty distribution outlined
by Schuol and Abbaspour (2007). Daily precipitation samples were generated by a two-state first-order Markov process, where the unconditional probability of a wet day was:

    p(W) = n_wet / n_days    [4]

where n_days was the number of days in the month. The conditional probabilities of a wet day following a wet day and of a wet day following a dry day were:

    p(W|W) = 0.25 + 0.75 p(W)    [5.1]
    p(W|D) = 0.75 p(W)    [5.2]

The rainfall amount on any given dry day was zero, and the rainfall amount on a wet day was given by a gamma distribution:

    P ~ Gamma(alpha, beta)    [6.1]

with shape parameter alpha [6.2] and scale parameter beta [6.3] estimated from the mean rainfall per wet day, P_bar / n_wet, following Schuol and Abbaspour (2007).
Daily mean temperature samples were generated by averaging samples of Gaussian distributions around the daily maximum and minimum temperatures. The distribution over daily maximum temperatures was dependent on whether it was a wet day or a dry day, with the mean of the Gaussian reduced relative to Tmax_bar on wet days and increased on dry days [7.1], whereas the distribution over daily minimum temperatures was not so dependent:

    T_min ~ N(Tmin_bar, sigma_T^2)    [7.2]

A sample of n_days values for daily net radiation was drawn from the Gaussian distribution with mean R_bar and variance sigma_R^2, and the samples were ordered so that the highest value of net radiation occurred on the day with the highest mean temperature, and so forth.
Since yield estimates are often desired for locations that are not well monitored, we
considered the situation where no knowledge of the weather was available other than a
historical record of these monthly statistics. The forcing data uncertainty distribution
was therefore sampled by choosing a random weather year and a random daily time series
generated from the weather statistics from that weather year according to the Schuol and
Abbaspour (2007) uncertainty distribution.
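The weather generator described above can be sketched as follows. This is a minimal illustration assuming the first-order Markov transition relations in [5.1]-[5.2]; the gamma parameters and the wet-day temperature shift are placeholder assumptions rather than the study's fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_month(n_days, n_wet, p_month, tmax_bar, tmin_bar, sd_t, rad_bar, rad_sd):
    """Sample one month of daily weather (precipitation, mean temperature,
    net radiation) from monthly statistics, as described in section 2.1.3."""
    p_wet = n_wet / n_days               # [4] unconditional wet-day probability
    p_wd = 0.75 * p_wet                  # [5.2] wet following dry
    p_ww = 0.25 + p_wd                   # [5.1] wet following wet
    mu = p_month / max(n_wet, 1)         # mean rainfall per wet day
    shape, scale = 1.0, mu               # illustrative gamma parameters [6.2-6.3]

    wet = rng.random() < p_wet
    precip, tmean = [], []
    for _ in range(n_days):
        wet = rng.random() < (p_ww if wet else p_wd)
        precip.append(rng.gamma(shape, scale) if wet else 0.0)        # [6.1]
        # daily mean temperature = average of Gaussian max/min samples;
        # the 1 C wet-day cooling is a placeholder assumption
        tmax = rng.normal(tmax_bar - (1.0 if wet else -1.0), sd_t)    # [7.1]
        tmin = rng.normal(tmin_bar, sd_t)                             # [7.2]
        tmean.append(0.5 * (tmax + tmin))
    # order radiation so the warmest day receives the highest radiation
    rad = np.sort(rng.normal(rad_bar, rad_sd, n_days))
    rad = rad[np.argsort(np.argsort(tmean))]
    return np.array(precip), np.array(tmean), rad
```

Sampling a full season simply chains three such monthly draws from a randomly chosen weather year.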
2.2. Data Assimilation and the Ensemble Kalman Filter
From a Bayesian perspective, the HMM state transition distribution p(x_t | x_{t-1}) represents prior knowledge (before observations) about the state of the system, and the observation distribution p(y_t | x_t) is the Bayesian likelihood. The application of Bayes' law to estimate a time series of HMM states given some observations is called a smoother:

    p(x_{1:T} | y_{1:T}) = p(y_{1:T} | x_{1:T}) p(x_{1:T}) / ∫ p(y_{1:T} | x_{1:T}) p(x_{1:T}) dx_{1:T}    [8]

In the general case, no analytical solution to [8] exists and it is impractical to sample the posterior directly, due to the fact that it is (D x T)-dimensional (the dimension of the state multiplied by the number of integration time steps). Therefore it is necessary to make some simplifying approximations. The most common approximation is that the posterior of the state at time t is independent of observations at times s > t; this assumption results in a filter:

    p(x_t | y_{1:t}) = p(y_t | x_t) ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
                       / ∫ p(y_t | x_t) ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1} dx_t    [9]

Although the filter posterior is only D-dimensional, it remains relatively expensive to estimate nonparametrically, even for relatively small D, due to what is called the curse of dimensionality (Bellman 2003). The most common parametric approximation is due to Kalman (1960), who assumed p(x_t | x_{t-1}) and p(y_t | x_t) to be Gaussian and the state transition and observation functions to be linear; these assumptions result in a Gaussian posterior at each time step, which can be derived analytically. Evensen (2003) proposed to alleviate the assumption about a linear state transition function by repeated sampling of the HMM simulator. This results in an approximate Monte Carlo solution to [1] according to [2], and it is possible to derive sample estimates of the first two moments of the prior distribution represented by the integral in the numerator of [9]. By approximating this integral as Gaussian, it is still possible to derive a Gaussian approximation of the posterior analytically, assuming a linear observation function. Evensen's (2003) approximation of [9] is the ensemble Kalman filter (EnKF).
To implement the EnKF, an ensemble of N independent and identically distributed (iid) samples of the boundary condition was drawn from the forcing uncertainty distribution. At time t, these samples, along with N samples of the posterior at time t-1 (the analysis ensemble, notated x_{t-1}^{a,i}), were used to draw N samples from the HMM distribution at time t by propagating each of the samples through the state transition equations and adding a random perturbation drawn from the perturbation distribution. The resulting sample set is called the background ensemble and notated x_t^{b,i}. Given an observation y_t, and since the observation error was Gaussian and independent in time with covariance R, the set of maximum likelihood estimates of the posterior derived using each background ensemble member as the mean of the Gaussian prior was approximated by linearizing the observation function around the ensemble mean and minimizing the expected squared error to obtain (Houtekamer and Mitchell 2001):

    x_t^{a,i} = x_t^{b,i} + K_t (y_t + eps_t^i - H x_t^{b,i})
    K_t = C_xy (C_yy + R)^{-1}    [10]

where C_xy is the sample covariance between the background states and their simulated observations, C_yy is the sample covariance of the simulated observations, and the eps_t^i are samples from the observation uncertainty distribution. Under the stated conditions (Gaussian prior, Gaussian observation error, and linear observation function H), x_t^{a,i} is an iid sample of the posterior of [9] and was used as the condition of the prior at timestep t+1. The EnKF is fully generative because the Monte Carlo samples of the HMM state and observation space were used in [10] to explicitly estimate the prior and likelihood distributions.
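A minimal sketch of the perturbed-observation EnKF analysis step [10] (Houtekamer and Mitchell 2001), assuming a linear observation operator H; the function name and the toy two-state system are illustrative:

```python
import numpy as np

def enkf_update(X_b, y, H, R, rng):
    """Perturbed-observation EnKF analysis step, equation [10].

    X_b : (N, D) background ensemble
    y   : (m,)   observation vector
    H   : (m, D) linear observation operator
    R   : (m, m) observation error covariance
    Returns the (N, D) analysis ensemble."""
    N = X_b.shape[0]
    Y_b = X_b @ H.T                          # simulated observations
    Xp = X_b - X_b.mean(axis=0)              # state anomalies
    Yp = Y_b - Y_b.mean(axis=0)              # observation anomalies
    C_xy = Xp.T @ Yp / (N - 1)               # state/observation covariance
    C_yy = Yp.T @ Yp / (N - 1)               # observation covariance
    K = C_xy @ np.linalg.inv(C_yy + R)       # Kalman gain
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return X_b + (y + eps - Y_b) @ K.T       # analysis ensemble

# one assimilation step on a toy 2-state, 1-observation system
rng = np.random.default_rng(1)
X_b = rng.normal([1.0, 0.5], 0.3, size=(100, 2))
H = np.array([[1.0, 0.0]])                   # observe the first state only
R = np.array([[0.01]])
X_a = enkf_update(X_b, np.array([1.2]), H, R, rng)
```

The analysis ensemble mean is pulled toward the observation and the analysis spread shrinks in the observed dimension, as expected from the Kalman gain.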
In this case we used N ensemble members, and the EnKF was applied to assimilate observations taken at three different observing frequencies:
1. a single seasonal observation (either LAI, surface soil moisture, or root zone soil moisture),
2. a pair of observations of a given type taken at different times during the growing season, and
3. observations of a given type taken at satellite overpass frequencies (every 8 days for LAI and every 4 days for soil moisture).
In cases 1 and 2, where single observations or pairs of observations were used, observation times were chosen to yield the most information about yield, according to the definitions outlined in section 2.5.
2.3. A Gaussian Process Regression Interpolator
As an alternative to filtering, it is possible to construct regressions that map observations directly onto the probability space of states. In the synthetic setting, this discriminative approach to Bayesian conditioning was implemented by sampling the HMM simulator and then training regressions on these samples. In practice, regressions might be constructed directly from historical yield data, and would therefore avoid the simulator and its inherent assumptions altogether, which is something that is impossible for data assimilation.
Training regressions on simulator samples is a Bayes-discriminative approach, since density functions over all model components are implied by samples of the HMM state and observation space, but the conditional distribution of yield is not estimated directly by Bayes' law. The Bayes-discriminative approach to data assimilation was discussed by McLaughlin (2002) and Wikle and Berliner (2007), who refer to Gaussian process regression (GPR; also known as kriging) as interpolation, and who formally compared GPR with smoothers and filters. These authors, however, used GPR to interpolate in 3-dimensional physical space; here we use it to interpolate in the HMM state and observation space.
For N forcing samples drawn from the forcing uncertainty distribution, the corresponding samples of end-of-season biomass, B_T^ol, constitute samples of the Bayesian prior (the superscript ol stands for open loop, which is a term used to indicate the process of sampling the HMM distributions; this will be discussed further in section 2.4). To estimate B_T using some subset of observations y_s taken on the range of times s, we hypothesized that the prior distribution represented a Gaussian process with a covariance function dependent on the value of the observations, so that the covariance between the ith and jth open loop samples of biomass was:

    cov(B_T^ol,i, B_T^ol,j) = k(y_s^ol,i, y_s^ol,j)    [11]

where y_s^ol,i is the ith (of N) sample of the HMM observation distribution at times s. The GPR covariance matrix K is defined such that for generic random variable vectors a and b, K(a, b)_ij = k(a_i, b_j). Then the two moments of the GPR estimate of the marginal posterior from [8] over B_T were (Rasmussen and Williams 2006; pp 16):

    mu_hat = K(y_s, Y^ol) [K(Y^ol, Y^ol)]^{-1} B_T^ol    [12.1]
    sigma_hat^2 = k(y_s, y_s) - K(y_s, Y^ol) [K(Y^ol, Y^ol)]^{-1} K(Y^ol, y_s)    [12.2]

where Y^ol is the set of open loop observation samples and the noise variance sigma_n^2 is a hyperparameter of the covariance function (Rasmussen and Williams 2006; pp 20). For this study, we used an anisotropic squared exponential covariance function:

    k(y^i, y^j) = sigma_f^2 exp( -(1/2) Σ_d (y_d^i - y_d^j)^2 / l_d^2 ) + sigma_n^2 δ_ij    [13]

where δ_ij is the Kronecker delta. To apply [12], the hyperparameters sigma_f, l_d, and sigma_n were estimated by maximum a posteriori (MAP) optimization: i.e., minimizing the negative log-likelihood objective function (Rasmussen and Williams 2006; pp 19):

    -log p(B_T^ol | Y^ol) = (1/2) (B_T^ol)' K^{-1} B_T^ol + (1/2) log |K| + (N/2) log 2π    [14]

Neal (1996) demonstrated that GPR is identical to a single-layer feed-forward neural network with infinitely many hidden nodes, but is more resistant to overfitting than neural networks when using an MAP training procedure. Thus, GPR is similar to the artificial neural network regressions used by many of the studies referenced in section 1.
Notice that in a general application, B_T could be substituted by any state or forecast for which estimates or predictions are desired. GPR was applied to estimate distributions of B_T conditional on the same nine sets of observations that were used by the EnKF: each of three observation types at each of the three observing frequencies listed in section 2.2.
2.4. Observing System Simulation Experiments
Synthetic experiments that examine whether a proposed observing system and data assimilation strategy can be expected to produce posterior state distributions with increased accuracy (or reduced uncertainty) compared to the HMM simulator distributions are called observing system simulation experiments (OSSEs; Arnold and Dey 1986). The typical data assimilation OSSE is called an identical-twin experiment (e.g., Crow and Van Loon 2006) and involves comparing samples of the HMM state distribution with samples of some approximate Bayesian posterior distribution. An identical-twin experiment consists of three principal components:
1. N samples of the forcing data, state, and observation distributions, called the open loop samples and denoted u^ol, x^ol, and y^ol respectively. We will also consider error-free samples of the observations.
2. A single sample from the forcing data and HMM distributions which is used to define the true state of the NLDS system; this is called the truth system and denoted u^true, x^true, and y^true.
3. N samples of the state distributions conditional on the truth system observations after data assimilation; these are called the analysis samples and denoted x^a.
Since the choice of truth system is random, it is necessary to repeat each OSSE a number (say M) of times. The results of analyzing these OSSEs will thus be a comparison between M sets of N samples of the analysis distribution over B_T (denoted B_T^EnKF in the case of the EnKF and B_T^GPR in the case of GPR) against a single open loop sample (denoted B_T^true). Here we used M = N, such that each open loop sample was used individually as the truth system for a single OSSE; this is necessary for the reason explained in section 2.5.
2.5. Measuring Information in Observations
The amount of information contained in the realization z of a random variable Z distributed over an event space according to distribution p is -ln p(z) (Shannon 1948). The entropy of the distribution p is the expected amount of information from a sample of Z:

    H(Z) = -E_p[ln p(z)]    [15]

Entropy can be interpreted as a measure of uncertainty about Z according to the distribution p. Given that a random variable is distributed according to p, then approximating p by q results in information loss, which is measured by the divergence from q to p (Kullback and Leibler 1951):

    D_KL(p || q) = E_p[ln p(z) - ln q(z)]    [16]

The expected amount of information about one random variable Z contained in a realization of another random variable Y is called the mutual information between Z and Y, and is measured by the expected divergence, over Y, between the conditional and marginal distributions over Z:

    I(Z; Y) = E_p(y)[ D_KL( p(z|y) || p(z) ) ]    [17]

We measure the amount of information about B_T contained in observation sets y_s (where s are the observation times sampled by the truth system) by first estimating the conditional distributions p(B_T | y_s) directly from open loop samples. Divergences from the p(B_T | y_s) posteriors to the analysis distributions (call them p^a(B_T)) were estimated for each truth system, and the expected divergence was found by taking the expected value of this divergence over truth systems.
The integrations necessary for measuring mutual information (in [16] and [17]) were performed by discretizing the HMM state and observation space. This not only allowed tractability, but also ensured that mutual information (as well as lost and used information) was non-negative (Cover and Thomas 1991). The HMM state and observation spaces were discretized using a histogram bin width given by Scott (2004):

    w = 3.49 sigma_hat N^{-1/3}    [18]

where sigma_hat refers either to the standard deviation of the open loop samples of B_T, or to the 20th percentile standard deviation of the open loop samples of each observation dimension at times s; sigma_hat was estimated separately for each observation dimension. Integrations were approximated as summations over the empirical pdf bins in state and observation space.
We estimated the amount of information contained in each individual observation and in each pair of observations of a given type; however, it was impractical to estimate the true conditional distribution for more than a 2-dimensional observation set. We therefore approximated the amount of information contained in sets of observations taken at satellite overpass frequencies by assuming that the observations were independent conditional on B_T. The single observations and pairs of observations with the most information about B_T were assimilated by the EnKF and used as regressors in GPR prediction models.
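The discrete information measures [15]-[18] can be sketched as follows; this is a minimal illustration of the histogram-based estimators, not the study's implementation:

```python
import numpy as np

def scott_bins(samples):
    """Histogram bin edges from Scott's rule [18]: w = 3.49 sigma N**(-1/3)."""
    w = 3.49 * samples.std() * len(samples) ** (-1.0 / 3.0)
    return np.arange(samples.min(), samples.max() + w, w)

def entropy(p):
    """Discrete entropy [15] in nats; 0*log(0) treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """Discrete KL divergence [16]; assumes q > 0 wherever p > 0."""
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

def mutual_information(joint):
    """Mutual information [17] from a 2-D joint histogram (counts or probs)."""
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    m = pxy > 0
    return np.sum(pxy[m] * np.log((pxy / (px * py))[m]))
```

Binning the open loop samples of B_T and of each observation dimension with `scott_bins` and counting joint occurrences yields the empirical joint histogram passed to `mutual_information`.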
2.6. Measuring Information Loss
The total amount of information lost due to approximating the conditional distribution p(B_T | y_s) by the analysis distribution p^a(B_T) is the divergence D_KL(p(B_T | y_s) || p^a(B_T)). This situation is illustrated in Figure 1, which shows that the divergence from the analysis to the conditional distribution integrates (with respect to B_T) differences between the (log-transformed) conditional and analysis distributions. This total divergence is due to two parts: (i) loss of information contained in observations and (ii) the introduction of bad information by data assimilation (or regression) approximations. Note that both of these can arise due to approximations in the implementation of Bayes' law; information loss can arise due to incomplete assimilation (see Nearing et al. 2013), whereas bad information can arise due to imperfections in the likelihood (observation) function.
The amount of information contained in y_s about B_T which is lost due to approximating the conditional distribution by the analysis can be found by integrating, with respect to B_T, the difference in log-probabilities between the prior and conditional distributions in areas which overlap with differences in log-probabilities between the analysis and conditional distributions. Similarly, the amount of information contained in y_s about B_T which is used by the analysis distribution can be found by integrating, with respect to B_T, the difference between log-probabilities of the prior and analysis distributions in areas which overlap with differences between the prior and conditional distributions. Finally, bad information introduced into the analysis distribution is measured by integrating, again with respect to B_T, the difference between log-probabilities of the prior and analysis distributions in areas which overlap with differences between the analysis and conditional distributions. These concepts of lost information, used information, and bad information are illustrated by Figure 2.
Lost information and used information sum to the mutual information between states and observations, while lost information and bad information sum to the divergence from the conditional distribution to the analysis distribution. These three quantities can be thought of as partial divergences from the conditional distribution to the analysis distribution and from the conditional to the prior. Again, these quantities were estimated by discretizing the HMM state and observation space using [18], and as expected values over the M truth systems.
3. Results
Figure 3 illustrates an example EnKF OSSE, including a set of N open loop samples of the HMM state distribution, a single truth system, and the corresponding EnKF analysis state samples after sequentially assimilating observations of LAI at a frequency of one every 8 days.
The entropy of the HMM prior distribution over B_T was estimated as 1.92 nats by [15] from the open loop samples. The ratios of the amount of information contained in single observations over the course of the growing season to the entropy of the prior distribution are illustrated in Figure 4. Under this particular uncertainty scenario, there was almost no informational value to observations of surface-level soil moisture, and very little value (generally less than 10% of total entropy) in observations of total root-zone water content. The information content of LAI observations was greatest at points in the growing season when differences between open loop samples of LAI were highest.
Figure 5 illustrates the information content of pairs of observations, again plotted as the ratio of mutual information to the entropy of the HMM prior distribution over B_T. A pair of LAI observations can be expected to reduce uncertainty about B_T by as much as 65%, whereas a pair of root zone soil moisture observations can only be expected to reduce uncertainty by just over 15%.
The ability of GPR and the EnKF to extract information from observations is illustrated by Figure 7. The fraction of LAI information extracted by both methods (used information) was greater than 75% in all cases. However, over 50% of the soil moisture information was lost by both algorithms, except in the case of the EnKF assimilating a pair of surface soil moisture observations (in which case there was almost no information to begin with). Figure 7 illustrates the total amount of information in observations as the sum of used and lost information (left-hand plots). The results indicate only a small benefit to increasing LAI observation frequency above one or two observations per season: the information fraction rose about 20% when increasing the frequency from one to two observations, and only about 13% when increasing from two per season to the MODIS observing frequency. In contrast, there was greater value in increasing the frequency of soil moisture observations, with an increase in information of about 260% when increasing from two per season to the SMAP observing frequency.
The performance of the EnKF and GPR when conditioning on LAI was similar: the EnKF performed slightly better when assimilating a pair of observations, while GPR appeared to be slightly better at assimilating a time series at the MODIS overpass frequency; however, the true conditional distribution was an approximation in this case. There was almost no bad information introduced by either algorithm when conditioning on LAI. The fraction of soil moisture information used by both algorithms was also similar; however, the EnKF introduced substantially more bad information into the analysis posterior than GPR when conditioning on observations at the SMAP overpass frequency. The divergence from the conditional distribution to the analysis posteriors is illustrated in Figure 7 as the sum of bad and lost information (right-hand plots).
4. Conclusions and Discussion
The results of this synthetic experiment indicate that GPR, as an example of a discriminative method for approximate Bayesian conditioning of yield estimates on remote sensing observations, is generally as efficient as the most common generative method (data assimilation by the EnKF) at extracting information from observations. This is important because there are several practical and theoretical reasons to prefer discriminative methods over generative ones. The most important is that discriminative methods can be applied without the need for an HMM simulator. Although we tested a Bayes-discriminative approach, in a real-world situation regression models could be trained directly on historical yield data (bypassing the simulator altogether); this is impossible for data assimilation. This implies that GPR (or any other type of regression) may be generally better suited to dealing with the practical problems that confound typical EnKF applications in a remote sensing setting: namely, assumptions about homogeneity of the satellite image pixel and mismatches in spatial resolution between the simulator, the modeled system, and the observations. In any case, our results indicate that there is no reason to prefer the generative approach. Further, we have provided a general method for comparing the various methods.
The method for analyzing information use has many potential applications. Approximations of Bayes' law are used regularly in predictive models of environmental systems of all kinds (e.g., Abdu et al. 2008; Crow and Wood 2003; de Wit and van Diepen 2007; Koppe et al. 2012; Liu and Gupta 2007; McLaughlin 2002; Reichle 2008; Vrugt et al. 2012; Weerts and El Serafy 2006), and the efficiency of these approximations has never been addressed. This paper introduces a formal and rigorous method for analyzing the efficiency of approximate Bayesian methods at extracting information from remote sensing observations to condition HMM state estimates.
Appendix A: The Crop Development Simulation State Transition Function
Prior to simulation, the soil texture parameters sand and clay fraction (f_sand and f_clay; [m3/m3]; Table 1) were converted to Brooks and Corey (1964) type hydraulic coefficients: porosity ([m3/m3]) [A.1.1], bubbling pressure ([cm]) [A.1.2], saturated hydraulic conductivity ([cm/day]) [A.1.3], and pore size distribution index ([~]) [A.1.4], using pedotransfer functions developed by Cosby et al. (1984), as reported by Santanello et al. (2007).
At the beginning of each time step, LAI was updated as a function of biomass during growth and as a linear decay function during senescence [A.2].
Potential evapotranspiration [cm/day] was estimated by a Priestley and Taylor (1972) approximation from daily mean temperature, net radiation, and albedo [A.3.1]; albedo was a function of LAI [A.3.2]. Potential evapotranspiration was partitioned into potential evaporation and potential transpiration [cm/day] according to LAI [A.3.3, A.3.4].
The average volumetric water content in the soil below the top 5 cm was estimated from the root zone and surface layer states [A.4], and the unsaturated conductivity and soil diffusivity in each layer were calculated according to Brooks and Corey (1964) [A.5.1, A.5.2].
The infiltration potential [cm/day] is similar to that used by Mahrt and Pan (1984) [A.6.1], where the interception of precipitation by the canopy was used to calculate throughfall [cm/day] [A.6.2]. Actual infiltration [cm/day] into the top two soil layers was computed from the infiltration potential and throughfall [A.6.3, A.6.4].
Direct evaporation [cm/day] from the top soil layer followed Mahrt and Pan (1984) [A.7].
The root zone depth [cm] was estimated according to Borg and Grimes (1986) [A.8.1]. A root distribution function [A.8.2] allowed for a calculation of the fraction of roots above depth z [A.8.3].
Transpiration [cm/day] was partitioned between the 5 cm soil layer [A.9.1] and the lower portion of the soil column [A.9.2].
Soil moisture accounting was similar to that used by Mahrt and Pan (1984), except that we included transpiration; the volumetric water contents in the 5 cm and lower soil layers were updated from the infiltration, evaporation, and transpiration fluxes [A.10.1, A.10.2], and the root zone soil moisture state was computed from the two layer states [A.10.3].
Water and temperature stress factors range between zero and one and acted as multiplicative controls on potential biomass production. Plant water stress was the realized fraction of potential transpiration [A.11.1], and plant temperature stress was a function of daily mean temperature relative to the base and optimal growing temperatures [A.11.2].
Photosynthetically active radiation [MJ/day] was estimated by Beer's law [A.12], and biomass development was simulated if emergence had occurred, i.e., if the Heat Unit Index exceeded the emergence threshold [A.13].
Acknowledgement
This work was supported by a grant from the NASA Terrestrial Ecology program entitled
Ecological and agricultural productivity forecasting using root-zone soil moisture
products derived from the NASA SMAP mission; principal investigator Wade T. Crow.
Tables
Table 1: Crop Growth Model Parameters

Description                              Value
Maximum rooting depth                    30
Fraction of biomass in roots             0.25
Maximum LAI                              5
Fraction of LAI senesced per day         0.05
Minimal (base) growing temperature       4
Optimal growing temperature              15
Heat units necessary for emergence       30
Heat units necessary for maturity        700
Heat units necessary for senescence      560
Biomass energy conversion rate           30
Soil albedo                              0.15
Soil volumetric sand fraction            0.7
Soil volumetric clay fraction            0.2
Residual moisture content                0.05
Figures
Figure 1: An illustration of the area which contributes to the divergence from the true Bayesian posterior to the analysis distribution.

Figure 2: An illustration of the areas which contribute to the used and lost portions of the mutual information between the state and observations, as well as the areas which contribute to bad information in the analysis distribution.

Figure 3: Prior (open loop) state samples drawn from the HMM distribution (gray) and from the EnKF analysis distributions due to assimilating LAI observations at the MODIS observing frequency of once every 8 days (black). The truth system (one of the N open loop samples) used to generate these synthetic observations is marked in red. The observation uncertainty distribution for this OSSE was as described in section 2.1.2.

Figure 4: Illustrations of the ratio of mutual information between end-of-season biomass and single observations to the entropy of the prior distribution over end-of-season biomass, as a function of observation time. These ratios were calculated from open loop samples.

Figure 5: Maps of the ratio of mutual information between end-of-season biomass and observation pairs to the entropy of the prior distribution over end-of-season biomass, as a function of observation times. Pairs with the highest utility are marked with black circles.

Figure 7: Used, lost, and bad information in the EnKF and GPR posteriors, scaled by the entropy of the prior distribution over end-of-season biomass. Used and lost information sum to the mutual information between observations and end-of-season biomass (left-hand plots), while bad and lost information sum to the divergence from the conditional to the analysis distributions (right-hand plots). The information content (and thus the used, lost, and bad information) of sets of daily observations and sets of observations at satellite overpass frequencies was estimated by assuming that the observations were independent conditional on end-of-season biomass.
References
Abdu, H., Robinson, D.A., Seyfried, M., & Jones, S.B. (2008). Geophysical imaging of
watershed subsurface patterns and prediction of soil texture and water holding capacity.
Water Resources Research, 44
Arnold, C.P., & Dey, C.H. (1986). Observing-systems simulation experiments - past,
present, and future. Bulletin of the American Meteorological Society, 67, 687-695,
doi:10.1175/1520-0477(1986)067<0687:OSSEPP>2.0.CO;2
Arnold, J.G., Weltz, M.A., Alberts, E.E., & Flanagan, D.C. (1995). Plant growth
component. In D.C. Flanagan, & M.A. Nearing (Eds.), USDA Water Erosion Prediction
Project hillslope profile and watershed model documentation (pp. 8.1 - 8.41). West
Lafayette, IN USA: USDA-ARS National Soil Erosion Research Laboratory
Bellman, R. (2003). Dynamic Programming. Mineola, NY: Dover Publications, Inc
Borg, H., & Grimes, D.W. (1986). Depth development of roots with time - an empirical
description. Transactions of the Asae, 29, 194-197
Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology
Papers, Colorado State University
Cosby, B.J., Hornberger, G.M., Clapp, R.B., & Ginn, T.R. (1984). A statistical
exploration of the relationships of soil-moisture characteristics to the physical properties
of soils. Water Resources Research, 20, 682-690
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory. In. New York,
NY, USA: Wiley-Interscience
Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the
sequential assimilation of remotely sensed surface soil moisture. Journal of
Hydrometeorology, 7, 421-432, doi:10.1175/JHM499.1
Crow, W.T., & Wood, E.F. (2003). The assimilation of remotely sensed soil brightness
temperature imagery into a land surface model using Ensemble Kalman filtering: a case
study based on ESTAR measurements during SGP97. Advances in Water Resources, 26,
137-149, doi:10.1016/S0309-1708(02)00088-X
de Wit, A.M., & van Diepen, C.A. (2007). Crop model data assimilation with the
Ensemble Kalman filter for improving regional crop yield forecasts. Agricultural and
Forest Meteorology, 146, 38-56
Dente, L., Satalino, G., Mattia, F., & Rinaldi, M. (2008). Assimilation of leaf area index
derived from ASAR and MERIS data into CERES-Wheat model to map wheat yield.
Remote Sensing of Environment, 112, 1395-1407
Doraiswamy, P.C., Sinclair, T.R., Hollinger, S., Akhmedov, B., Stern, A., & Prueger, J.
(2005). Application of MODIS derived parameters for regional crop yield assessment.
Remote Sensing of Environment, 97, 192-202
Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N.,
Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R.,
Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi,
J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture
Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical
implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Hoogenboom, G., Jones, J.W., Wilkens, P.W., Porter, C.H., Hunt, L.A., Boote, K.L.,
Singh, U., Uryasev, O., Lizaso, J., Gijsman, A.J., White, J.W., Batchelor, W.D., & Tsuji,
G.Y. (2008). Decision Support System for Agrotechnology Transfer. In. Honolulu, HI:
University of Hawaii
Houtekamer, P.L., & Mitchell, H.L. (2001). A sequential ensemble Kalman filter for
atmospheric data assimilation. Monthly Weather Review, 129, 123-137
Jiang, D., Yang, X., Clinton, N., & Wang, N. (2004). An artificial neural network model
for estimating crop yields using remotely sensed information. International Journal of
Remote Sensing, 25, 1723-1732
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems.
Transactions of the ASME–Journal of Basic Engineering, 82, 35-45,
doi:10.1115/1.3662552
Kerr, Y.H., Waldteufel, P., Wigneron, J.P., Delwart, S., Cabot, F., Boutin, J.,
Escorihuela, M.J., Font, J., Reul, N., Gruhier, C., Juglea, S.E., Drinkwater, M.R., Hahne,
A., Martin-Neira, M., & Mecklenburg, S. (2010). The SMOS Mission: New Tool for
Monitoring Key Elements of the Global Water Cycle. Proceedings of the IEEE, 98, 666-687
Knyazikhin, Y., Glassy, J., Privette, J.L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y.,
Morisette, J.T., Votava, P., Myneni, R.B., Nemani, R.R., & Running, S.W. (1999).
MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation
Absorbed by Vegetation (FPAR) Product (MOD15) Algorithm Theoretical Basis
Document
Koppe, W., Gnyp, M.L., Hennig, S.D., Li, F., Miao, Y.X., Chen, X.P., Jia, L.L., &
Bareth, G. (2012). Multi-Temporal Hyperspectral and Radar Remote Sensing for
Estimating Winter Wheat Biomass in the North China Plain. Photogrammetrie
Fernerkundung Geoinformation, 281-298
Kouadio, L., Duveiller, G., Djaby, B., El Jarroudi, M., Defourny, P., & Tychon, B.
(2012). Estimating regional wheat yield from the shape of decreasing curves of green
area index temporal profiles retrieved from MODIS data. International Journal of
Applied Earth Observation and Geoinformation, 18, 111-118
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of
Mathematical Statistics, 22, 79-86, doi:10.2307/2236703
Li, A.N., Liang, S.L., Wang, A.S., & Qin, J. (2007). Estimating crop yield from multi-temporal satellite data using multivariate regression and neural network techniques.
Photogrammetric Engineering and Remote Sensing, 73, 1149-1157
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an
integrated data assimilation framework. Water Resources Research, 43, W07401,
doi:10.1029/2006WR005756
Maas, S.J. (1988). Using satellite data to improve model estimates of crop yield.
Agronomy Journal, 80, 655-662
Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer
Meteorology, 29, 1-20
McLaughlin, D. (2002). An integrated approach to hydrologic data assimilation:
interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286
Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear
stochastic models. Tellus Series A - Dynamic Meteorology and Oceanography, 51, 167-194
Neal, R.M. (1996). Bayesian Learning for Neural Networks. New York: Springer
Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V.
(2012). Assimilating remote sensing observations of leaf area index and soil moisture for
wheat yield estimates: An observing system simulation experiment. Water Resources
Research, 48
Nearing, G.S., Gupta, H.V., Crow, W.T., & Gong, W. (2013). An approach to
quantifying the efficiency of a Bayesian filter. Water Resources Research
Ng, A.Y., & Jordan, M.I. (2001). On discriminative vs. generative classifiers: A
comparison of logistic regression and naive Bayes. In Advances in Neural Information
Processing Systems 14
Pauwels, V.R.N., Verhoest, N.E.C., De Lannoy, G.J.M., Guissard, V., Lucau, C., &
Defourny, P. (2007). Optimization of a coupled hydrology-crop growth model through
the assimilation of observed soil moisture and leaf area index values using an ensemble
Kalman filter. Water Resources Research, 43, W04421, doi:10.1029/2006WR004942
Pellenq, J., & Boulet, G. (2004). A methodology to test the pertinence of remote-sensing
data assimilation into vegetation models for water and energy exchange at the land
surface. Agronomie, 24, 197-204
Priestley, C.H.B., & Taylor, R.J. (1972). Assessment of surface heat-flux and evaporation
using large-scale parameters. Monthly Weather Review, 100, 81-92
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning.
Cambridge, MA: MIT Press
Reichle, R.H. (2008). Data assimilation methods in the Earth sciences. Advances in
Water Resources, 31, 1411-1418
Santanello, J.A., Peters-Lidard, C.D., Garcia, M.E., Mocko, D.M., Tischler, M.A.,
Moran, M.S., & Thoma, D.P. (2007). Using remotely-sensed estimates of soil moisture to
infer soil texture and hydraulic properties across a semi-arid watershed. Remote Sensing
of Environment, 110, 79-97
Schuol, J., & Abbaspour, K.C. (2007). Using monthly weather statistics to generate daily
data in a SWAT model application to West Africa. Ecological Modelling, 201, 301-311
Scott, D.W. (2004). Multivariate density estimation and visualization. In J.E. Gentle, W.
Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and
Methods (pp. 517-538). New York: Springer
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423
Tan, B., Hu, J.N., Zhang, P., Huang, D., Shabanov, N., Weiss, M., Knyazikhin, Y., &
Myneni, R.B. (2005). Validation of Moderate Resolution Imaging Spectroradiometer leaf
area index product in croplands of Alpilles, France. Journal of Geophysical Research-Atmospheres, 110, D01107
Thorp, K.R., Hunsaker, D.J., & French, A.N. (2010). Assimilating leaf area index
estimates from remote sensing into the simulations of a cropping systems model.
Transactions of the ASABE, 53, 251-262
Uno, Y., Prasher, S.O., Lacroix, R., Goel, P.K., Karimi, Y., Viau, A., & Patel, R.M.
(2005). Artificial neural networks to predict corn yield from Compact Airborne
Spectrographic Imager data. Computers and Electronics in Agriculture, 47, 149-161
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data
assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and
applications. Advances in Water Resources
Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman
filtering for state updating with hydrological conceptual rainfall-runoff models. Water
Resources Research, 42
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica
D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.09.017
Williams, J.R., Jones, C.A., Kiniry, J.R., & Spanel, D.A. (1989). The EPIC crop growth model.
Transactions of the ASAE, 32, 497-511
Ye, X.J., Sakai, K., Garciano, L.O., Asada, S.I., & Sasao, A. (2006). Estimation of citrus
yield from airborne hyperspectral images using a neural network model. Ecological
Modelling, 198, 426-432
APPENDIX D:
MEASURING INFORMATION ABOUT MODEL STRUCTURE INTRODUCED
DURING SYSTEM IDENTIFICATION
Grey S. Nearing and Hoshin V. Gupta
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
Article in preparation.
The content of this article will be presented at the European Geophysical Union General
Assembly on April 8, 2013 (session HS1.2; Data & Models, Induction & Prediction,
Information & Uncertainty: Towards a common framework for model building and
predictions in the Geosciences)
Abstract
System identification is the process of building models of dynamic systems that both
correspond, structurally, to our understanding of physical laws and are behaviorally
consistent with observations of the system. Fundamentally, this is a two-part process
consisting of conceptual structure identification and mathematical structure identification,
where the latter includes parameter estimation. The most common method for
mathematical structure identification is an expectation-maximization (EM) approach – a
variational Bayesian method that seeks to estimate the mode of the posterior distribution
of model structures conditional on observations. The Bayesian prior is typically defined
in the context of a probabilistic conceptual or physics-based simulator. Observations can
be thought of as supplying Shannon-type information about model structure, and the
model can be thought of as containing Shannon-type information about system behavior
or system properties. In this paper, we use the EM procedure to identify the mathematical
structure of a rainfall-runoff model by using HyMod to supply the EM prior (the
conceptual structure). Further, we quantify the amount of information introduced by each
of the conceptual and mathematical identification phases about (1) model structure and
(2) streamflow, during both calibration and evaluation.
1. Introduction
Dynamic simulation models are typically not able to represent the behavior of hydrologic
systems in a completely accurate manner. For this reason, forecast efforts often adapt
models and simulations using observations of the system (Liu and Gupta 2007). Bulygina
and Gupta (2010) defined system identification as a two-step process of “building
dynamical models that are simultaneously consistent with physical knowledge about the
system and with the information contained in observational data.” The first step is
conceptual structure identification, and involves a selection of system boundaries and
boundary conditions, system states, and important system processes, resulting in a
directed graph (see Figure 1). The second step is mathematical structure identification
and involves specifying appropriate mathematical representations of system processes.
Clark et al. (2008) focused on the first step and Bulygina and Gupta (2009) focused on
the second step after assuming an a priori conceptual structure. Gupta et al. (2012) discuss
these steps in more detail. Here, we focus on the information added during each of these
steps.
The most common form of mathematical structure identification is an expectation-maximization (Dempster et al. 1977) algorithm which iteratively infers the distribution of model states conditional on observations (the E-step) and then infers the maximum-likelihood values of parameters of the state transition function (the M-step); this EM
approach was introduced by Ghahramani and Roweis (1999) and subsequently applied by
numerous authors (e.g., Damianou et al. 2011; Roweis and Ghahramani 2000; Turner et
al. 2009; Wang et al. 2008), including Vrugt et al. (2005) and Bulygina and Gupta (2009;
2010, 2011) who used it in the context of identifying rainfall-runoff models. The E-step is
data assimilation (Wikle and Berliner 2007), and the M-step is maximum a posteriori
(MAP) parameter estimation (or its common approximation: maximum likelihood
parameter estimation; see Aldrich 1997). The parameters that are calibrated in the M-step
can be parameters of a nonparametric regression, so that the method provides a general
approach for identifying the mathematical structure of a dynamic system model (e.g.,
Bulygina and Gupta 2009; Ghahramani and Roweis 1999); alternatively, these may be
parameters of a parametric model such as a physics-based dynamic system simulator
(e.g., Vrugt et al. 2005).
EM is a variational Bayesian method that seeks to estimate the mode of the posterior
distribution over model parameters conditional on observations, where the Bayesian prior
is often defined in the context of a conceptual or physics-based simulator (e.g., Bulygina
and Gupta 2009; Vrugt et al. 2005). In this sense, observations supply information about
model structure because conditioning on observations modifies the distribution over
parameters. Information (Shannon 1948) is defined as the expected divergence (Kullback
and Leibler 1951) caused by Bayesian conditioning (Cover and Thomas 1991 p6); this
concept is explained in detail in section 2.3. We are interested in measuring the amount
of information introduced during the two steps of system identification. That is, we want
to know how much information is introduced by defining a conceptual model (the EM
prior) and, subsequently, a mathematical model (the EM posterior); the latter is the
amount of information extracted from observations by EM conditioning. This process of
measuring information is demonstrated via the estimation of a dynamic rainfall-runoff
model.
The paper is organized as follows: section 2 outlines our definition of a dynamic system
model (section 2.1), the EM algorithm used in this study (section 2.2), the method we
propose to measure information (section 2.3), and the system identification problem we
use for demonstration (section 2.4). Section 3 describes the results of our application
experiment, and section 4 concludes.
2. Methods
2.1. Dynamic System Simulators
Fundamentally, system identification is the process of reducing uncertainty about the
structure of an appropriate model of a dynamic system. The most common approach to
quantifying uncertainty in model structure is to represent the time evolution of the system
state by an Euler-Maruyama approximation of an Ito stochastic differential equation (e.g.,
Archambeau et al. 2007); data assimilation can be used to attenuate this type of
uncertainty (Miller et al. 1999; Restrepo 2008). Thus, we will begin by defining a
nonlinear dynamical system model as a numerical integrator of the equation:
    dx(t) = f(x(t), u(t)) dt + dW(t)        [1]

where x(t) and u(t) are the simulator state and boundary condition at time t, respectively, and W(t) is a Wiener process. Solutions to [1] at discrete times t = 1, ..., T are approximated by sampling a Markov process (Liu and Gupta 2007):

    x_t = M(x_{t-1}, u_t) + ω_t,    ω_t ~ N(0, Q)        [2.1]

where the state transition function M is an approximation of the drift function f, and the ω_t are noise sampled from the Gaussian distribution N(0, Q). Periodically the state of the system is observed according to an observation function H:

    y_t = H(x_t) + ε_t        [2.2]

where the observation error ε_t is drawn from an arbitrary distribution P_ε.

In the above, [2] constitutes a hidden Markov model (HMM) and implies probability distributions p(x_t | x_{t-1}) and p(y_t | x_t), where T is the number of simulation time steps. These conditional distributions define a joint probability distribution over model states and observations for the simulation period t = 1, ..., T.

From the perspective of this paper, the purpose of system identification is to choose a state transition model M which describes the time-dynamics of the modeled system. We have found it impossible to reliably identify both M and H simultaneously (although some authors report success in this endeavor, e.g., Wang et al. 2008) due to the fact that, given a finite data set, there are many equifinal maps from x to y when H is unknown. Presumably, in most applications H will be largely defined by the physics of a particular measurement device, which we assume to be well-understood compared to the physics of the dynamic system.
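The discrete-time system in [2.1]-[2.2] can be sampled directly. The following sketch uses hypothetical linear stand-ins for M and H (the function and variable names are ours, not from the text, and the example is not the HyMod simulator used later):

```python
import numpy as np

def simulate_hmm(M, H, x0, u, Q, R, rng):
    """Sample one trajectory of the hidden Markov model [2].

    M : state transition function M(x, u) (approximates the drift f)
    H : observation function
    Q, R : state and observation noise covariances
    """
    D, p = x0.size, R.shape[0]
    T = u.shape[0]
    x, y = np.empty((T, D)), np.empty((T, p))
    x_prev = x0
    for t in range(T):
        # [2.1]: propagate the state and add Gaussian noise ~ N(0, Q)
        x[t] = M(x_prev, u[t]) + rng.multivariate_normal(np.zeros(D), Q)
        # [2.2]: observe the state with additive error
        y[t] = H(x[t]) + rng.multivariate_normal(np.zeros(p), R)
        x_prev = x[t]
    return x, y

# hypothetical linear stand-ins for M and H
rng = np.random.default_rng(0)
M = lambda x, u: 0.9 * x + u
H = lambda x: x[:1]
x, y = simulate_hmm(M, H, np.zeros(2), np.ones((100, 2)),
                    0.01 * np.eye(2), np.array([[0.1]]), rng)
```

Any simulator exposing M and H in this form supplies the conditional distributions that the identification machinery below operates on.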
2.2. An EM System Identification Algorithm
The first step to implementing an EM approach to mathematical structure identification is
to define a prior distribution over model structures. The best way to do this is to identify a
probabilistic conceptual model. The basic components of a viable conceptual structure for
M are: (1) a finite set of possible values for the dimension D of the state x, and (2) for each possible value of D, distributions p(x_t | x_{t-1}) and p(y_t | x_t) which define a joint probability distribution between all state and observation components. In section 2.1, we discussed p(x_t | x_{t-1}) and p(y_t | x_t) in the context of a single HMM, but theoretically, several HMMs could contribute to populating this joint density function. For simplicity of discussion, the rest of this article will only consider the case of a single possible value of D.
EM system identification is composed of two parts: data assimilation (E-step) and
regression (M-step). The E-step estimates a distribution over model states, and the M-step
finds parameters of a regression model which maximize the likelihood of the E-step state
distributions. The E and M steps are iterated until convergence; in this article, we will not
formally define convergence, and instead use a fixed number of EM loops; note that
convergence of the EM algorithm is assured (Bulygina and Gupta 2011; Ghahramani and
Roweis 1999).
Each M-step results in a regression model; these will be notated M^(i), so that the model M^(i) results from the i-th M-step iteration. The EM prior is defined by a conceptual simulator, and we begin the EM procedure by emulating this simulator using a nonparametric regression model; this initial emulator will be designated M^(0). The i-th E-step performs data assimilation using model M^(i-1).
Simultaneous state-updating and parameter estimation is a procedure that has been
applied to hydrologic prediction problems (e.g., Moradkhani et al. 2005b); notably Vrugt
et al. (2005) and Bulygina and Gupta (2009) use EM approaches which are conceptually
similar to the one proposed by Ghahramani and Roweis (1999; also see Roweis and
Ghahramani 2000). The method we use is also very similar to the original; our E and M
steps are described in subsections 2.2.1 and 2.2.2.
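Abstractly, the iteration just described has the following shape; the e_step and m_step arguments stand in for the EnKS and SGPR training of sections 2.2.1 and 2.2.2, and the AR(1) toy at the bottom (perfect observations, least-squares M-step) is purely illustrative, not the article's experiment:

```python
import numpy as np

def em_system_identification(emulate_conceptual, e_step, m_step, y_obs, n_iter=5):
    """Skeleton of the EM identification loop: emulate the conceptual prior (M^(0)),
    then alternate data assimilation (E) and regression (M) for a fixed number of loops."""
    model = emulate_conceptual()        # M^(0): emulator of the conceptual simulator
    history = [model]
    for _ in range(n_iter):
        samples = e_step(model, y_obs)  # E-step: sample the state posterior given M^(i-1)
        model = m_step(samples)         # M-step: fit M^(i) to the posterior samples
        history.append(model)
    return history

# illustrative AR(1) toy: observations are perfect, so the E-step just pairs
# consecutive states and the M-step is a least-squares fit of the coefficient
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 0.1)

emulate = lambda: 0.0                                    # prior guess of the coefficient
e_step = lambda model, y: (y[:-1], y[1:])
m_step = lambda s: float(s[0] @ s[1] / (s[0] @ s[0]))
models = em_system_identification(emulate, e_step, m_step, x, n_iter=3)
```

In this contrived setting the M-step recovers the true coefficient in one pass; the interest of the full algorithm lies in the E-step, where states are only partially and noisily observed.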
2.2.1. E-Step: The Ensemble Kalman Smoother
From a Bayesian perspective, p(x_t | x_{t-1}) represents prior knowledge (before assimilating observations) about the state of the system, and the observation distribution p(y_t | x_t) is the Bayesian likelihood. The application of Bayes' Law to estimate a time series of HMM states x_{1:T} given some observations y_{1:T} which correspond directly with modeled observations is called a smoother:
    p(x_{1:T} | y_{1:T}) = p(x_{1:T}) ∏_{t=1}^{T} p(y_t | x_t) / ∫ p(x_{1:T}) ∏_{t=1}^{T} p(y_t | x_t) dx_{1:T}        [3]

where p(x_{1:T}) = p(x_1) ∏_{t=2}^{T} p(x_t | x_{t-1}). In the general case, no analytical solution to [3] exists and it is impractical to sample the posterior directly, due to the fact that it is (D × T)-dimensional (the dimension of the state multiplied by the number of simulation time steps). Therefore it is necessary to make some simplifying approximations. The most common approximation is due to Kalman (1960), who was able to approximate the posterior analytically by assuming that the state at time t is independent of observations at times later than t; this results in a filter (see McLaughlin 2002 for a concise definition of smoothers and filters):

    p(x_t | y_{1:t}) ∝ p(y_t | x_t) ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}        [4]
Kalman's (1960) implementation also assumed that p(x_t | x_{t-1}) and p(y_t | x_t) were Gaussian. This assumption implies that M and H are linear, and results in an analytical solution for the (Gaussian) posterior.
Evensen (2003) proposed to alleviate Kalman's (1960) assumption about a linear state transition function by using a Monte Carlo approximation of p(x_t | y_{1:t}) at each timestep; this is called the ensemble Kalman filter (EnKF). To implement the EnKF at time t, an ensemble of N independent and identically distributed (iid) samples of the posterior distribution at time t-1 (notated x^a_{t-1,i} for i = 1, ..., N; called the analysis ensemble) is used to draw N samples from the HMM distribution p(x_t | x_{t-1}) by propagating each of the N samples through the state-transition equations and adding a random perturbation drawn from N(0, Q). The resulting sample set is called the background ensemble and notated x^b_{t,i}. Given an observation y_t whose error is independent in time with covariance R, and since p(y_t | x_t) is jointly Gaussian, the set of maximum likelihood estimates of the posterior derived using each background ensemble member as the mean of the Gaussian prior is approximated by linearizing H around the ensemble mean and minimizing the expected squared error to obtain (Evensen and van Leeuwen 2000):

    x^a_{t,i} = x^b_{t,i} + K_t ( y_t + ε_{t,i} - H(x^b_{t,i}) )        [5.1]

    K_t = C^{xy}_t ( C^{yy}_t + R )^{-1}        [5.2]

where C^{xy}_t and C^{yy}_t are the sample covariances between the background states and the modeled observations H(x^b_{t,i}), computed as deviations from the ensemble means, and the ε_{t,i} are N samples from the observation uncertainty distribution N(0, R). Under the stated conditions (filter assumption, Gaussian prior, Gaussian p(y_t | x_t), and linear H), x^a_{t,i} is an iid sample of the posterior of [4] at timestep t, and is used as the condition of the prior at timestep t+1. The same procedure is used at the first timestep except that a prior distribution over initial states is sampled; these samples are propagated through the state transition equations, and along with samples of N(0, Q), used to generate samples from p(x_1).
Evensen and van Leeuwen (2000) showed that the smoother posterior could be estimated sequentially, like the sequence of filter posteriors. They derived a procedure for sampling the posterior of [3] under assumptions similar to the EnKF; this method is called the ensemble Kalman smoother (EnKS), and the state sample at time t is obtained by recursively updating the filter analysis with each subsequent observation:

    x^s_{t,i} = x^a_{t,i} + ∑_{τ=t+1}^{T} C^{x_t y_τ} ( C^{y_τ y_τ} + R )^{-1} ( y_τ + ε_{τ,i} - H(x^b_{τ,i}) )        [6]

where C^{x_t y_τ} is the sample cross-covariance between the ensemble of states at time t and the modeled observations at time τ.
2.2.2. M-Step: Sparse Gaussian Process Regression
Once observations have been assimilated to produce a posterior distribution over model states, a new state transition function M^(i) can be chosen so that p(x_t | x_{t-1}; M^(i)) is as similar as possible to the EnKS posterior p(x_{1:T} | y_{1:T}). Specifically, we want a regression model x_t ≈ M^(i)(x_{t-1}, u_t). One possible model, as proposed by Ghahramani and Roweis
(1999), is to use a summation of radial basis functions; this is a special case of Gaussian
process regression (GPR; Rasmussen and Williams 2006 p14). The strategy is to train a
GPR model M^(i) using samples from the EnKS posterior. We prefer to use sparse Gaussian process
regression (SGPR; Snelson and Ghahramani 2006) over GPR for two reasons: (1) SGPR
is more computationally efficient than GPR, and (2) SGPR results in predictions with
higher uncertainty than GPR. Due to (2), SGPR results in expanded support of the data
assimilation prior, which has an effect similar to introducing an inflation factor into the
data assimilation algorithm. This is desirable to avoid underestimating the variance of the
data assimilation prior due to finite sample size and improper representation of model
structure uncertainty (Anderson 2007).
An SGPR model consists of a pair of mappings from an independent variable x to the mean and variance of a distribution over a dependent variable y; that is, an SGPR model is a one-dimensional Gaussian process (GP) indexed by x. The first step to defining an SGPR model is to define a covariance function for the GP. The covariance function (or kernel) is a function on pairs of inputs which defines the covariance between the dependent variables associated with those inputs. In this case we use an anisotropic squared exponential (also called the automatic relevance determination kernel):

    K(x, x') = σ_f^2 exp( -(1/2) ∑_{d=1}^{D} (x_d - x'_d)^2 / ℓ_d^2 ) + σ_n^2 δ_{x x'}        [7]

where δ is the Kronecker delta and σ_f, σ_n, and the length scales ℓ_d are called hyperparameters. The second step is to define a set of n training data pairs consisting of inputs X = {x_1, ..., x_n} and targets Y = {y_1, ..., y_n} which will be used to train the regression. The third step is to define a set of m pseudo-inputs X̃ = {x̃_1, ..., x̃_m} and pseudo-targets Ỹ = {ỹ_1, ..., ỹ_m}. The hyperparameters, pseudo-inputs, and pseudo-targets are parameters of the SGPR model and will be trained using MAP estimation; the training data pairs are designated a priori (for example, by sampling the EnKS posterior). A matrix of covariances between sets of inputs (training, test, or pseudo) is notated K with the corresponding subscripts (e.g., K_{XX̃} is the n × m matrix of covariances between training inputs and pseudo-inputs). The probability of the training data is:
    p(Y | X, X̃, Ỹ) = ∏_{i=1}^{n} N( y_i | K_{x_i X̃} K_{X̃X̃}^{-1} Ỹ,  λ_i + σ_n^2 ),
        λ_i = K_{x_i x_i} - K_{x_i X̃} K_{X̃X̃}^{-1} K_{X̃ x_i}        [8]

where Λ is the n × n diagonal matrix with entries λ_i. According to [8], each y_i is independent of the others given the pseudo-data. MAP estimation is accomplished by maximizing the probability of the training data according to [8]; further details are given by Snelson and Ghahramani (2006). Prediction at a new (test) input x_* is given by:

    μ_* = K_{x_* X̃} Q_m^{-1} K_{X̃X} ( Λ + σ_n^2 I )^{-1} Y        [9.1]

    σ_*^2 = K_{x_* x_*} - K_{x_* X̃} ( K_{X̃X̃}^{-1} - Q_m^{-1} ) K_{X̃ x_*} + σ_n^2        [9.2]

where Q_m = K_{X̃X̃} + K_{X̃X} ( Λ + σ_n^2 I )^{-1} K_{XX̃}; μ_* is the prediction mean and σ_*^2 is the prediction variance.
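For orientation, the following sketch implements the dense-GPR special case of these predictive equations with the ARD kernel [7]; SGPR differs by substituting the m pseudo-inputs for the full training set to reduce the O(n^3) cost. The hyperparameter names and the 1-D demonstration are ours:

```python
import numpy as np

def ard_kernel(X1, X2, sf2, ell):
    """Anisotropic squared-exponential (ARD) kernel of [7], without the noise term."""
    d2 = (((X1[:, None, :] - X2[None, :, :]) / ell) ** 2).sum(axis=-1)
    return sf2 * np.exp(-0.5 * d2)

def gpr_predict(X, y, Xs, sf2, ell, sn2):
    """Dense-GPR predictive mean and variance; SGPR obtains the same form with
    m << n pseudo-inputs in place of X (Snelson and Ghahramani 2006)."""
    K = ard_kernel(X, X, sf2, ell) + sn2 * np.eye(len(X))   # training covariance
    Ks = ard_kernel(Xs, X, sf2, ell)                        # test/train covariance
    alpha = np.linalg.solve(K, y)
    mu = Ks @ alpha                                         # predictive mean (cf. [9.1])
    v = np.linalg.solve(K, Ks.T)
    var = sf2 - np.sum(Ks * v.T, axis=1) + sn2              # predictive variance (cf. [9.2])
    return mu, var

# hypothetical 1-D demonstration: interpolate a smooth function
X = np.linspace(0.0, 6.0, 30)[:, None]
y = np.sin(X[:, 0])
mu, var = gpr_predict(X, y, X, sf2=1.0, ell=np.array([1.0]), sn2=1e-4)
```

The predictive variance is what makes GP-family regressors attractive here: it propagates structural uncertainty in M^(i) into the data assimilation prior.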
Separate SGPR models were used to define a state transition model M^(i); each state dimension was represented by a single GP, and all D GPs were independent of the others. When the initial M^(0) emulator was trained, training inputs and targets came from samples of the distributions p(x_t | x_{t-1}) defined by the conceptual simulator. During the subsequent EM iterations, training data came from the EnKS posterior samples given by [6]. Notice that both x and y in [7-9] represent model states: if y was a value of state dimension d at time t according to EnKS sample i, then x was a concatenation of the i-th sample of the entire state vector at time t-1 and the inputs at time t (i.e., x = [x^s_{t-1,i}, u_t]).

All training data for the SGPR M-steps were normalized to have zero mean and unit variance. This means that the prior, before imposing the conceptual model in the form of the M^(0) emulator, was a standard normal distribution which was independent over all states (Rasmussen and Williams 2006 p15 Figure 2.2a).
2.2.3. Why We Cannot Use Filters
From a system identification perspective, the primary difference between the smoother [3] and the filter [4] is that [3] defines a joint distribution over x_{1:T} whereas [4] does not; rather, it defines a series of conditionally independent distributions through the simulation time period. Training a regression model requires samples of the joint posterior (pairs of x_{t-1} and x_t); this means that samples from the time-independent distributions from [4] are not useful as training data. For this reason, it is necessary to use smoothers rather than filters for data assimilation-driven system identification.
2.3. Measuring Information in the Conceptual and Mathematical Models
The amount of information contained in a random variable y drawn from distribution p(y) is the entropy -E[ln p(y)] (Shannon 1948). The information contained in a random variable x about y is called the mutual information between x and y, and is quantified as the expected (over x) divergence between the marginal distribution p(y) and the distribution of y conditional on x. More generally, divergence from a distribution P to a distribution Q is the expected information loss due to approximating P by Q, and is defined as the expected value of the difference in log-probability between P and Q (Kullback and Leibler 1951):

    D(P ‖ Q) = E_P[ ln P(y) - ln Q(y) ]        [10]

where the integration (expectation) is taken with respect to the probability measure defined by the original distribution P. Mutual information, or information gain due to Bayesian conditioning, is illustrated in Figure 2.
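For discrete distributions, [10] and the associated mutual information reduce to simple sums; a sketch (function names are ours):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete form of [10]: D(p || q) = E_p[ln p - ln q]."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                    # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def mutual_information(pxy):
    """I(X;Y): expected divergence from the marginal p(y) to p(y | x),
    computed here equivalently as D(p(x, y) || p(x) p(y))."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return kl_divergence(pxy.ravel(), np.outer(px, py).ravel())

d = kl_divergence([0.5, 0.5], [0.25, 0.75])               # positive: the two differ
mi = mutual_information(np.outer([0.5, 0.5], [0.3, 0.7]))  # zero: independent variables
```

The second call illustrates the limiting case: when x and y are independent, conditioning on x leaves the distribution of y unchanged and no information is gained.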
There are two types of information added during system identification that are of interest
to hydrologists: (i) information about the model which we choose to represent the system
(e.g., Ye et al. 2008) and (ii) information contained in the model about a particular
aspect of the system. Information about the model is quantified as the divergence from a
prior distribution over model structures to a posterior distribution, while information
about a phenomenon which arises from the system is quantified by the divergence from
prior to posterior distributions over the outcomes of the phenomenon. These two types of
information are described in detail in subsections 2.3.1 and 2.3.2.
2.3.1. Information about the Model Extracted from Observations
EM system identification seeks to estimate the mode of the posterior distribution over model parameters (structure) conditional on observations; therefore, the idea that system identification extracts information about the model is probably the more intuitive of the two types of information. In the conceptual structure identification step (defining the EM prior), we change the distribution over possible models by an amount quantified as the divergence from the SGPR prior (a set of D standard normal distributions) to the M^(0) emulator. In the mathematical structure identification step (defining the EM posterior), the amount of information extracted from observations about the mathematical structure of the model during the i-th EM step is the divergence from M^(i-1) to M^(i). The SGPR prior and all emulators consist of sets of D independent GPs. The divergence from one GP G_a to another GP G_b (think of these as the GPs representing the state transition function of a given state dimension at two successive EM iterations) can be computed analytically as:

    D(G_a ‖ G_b) = (1/2) [ tr(Σ_b^{-1} Σ_a) + (μ_b - μ_a)^T Σ_b^{-1} (μ_b - μ_a) - k + ln( |Σ_b| / |Σ_a| ) ]        [11]

where μ_a, μ_b and Σ_a, Σ_b are the means and covariance matrices of processes G_a and G_b respectively at a set of k test inputs. Equation [11] was applied for each pair M^(i-1) and M^(i), as well as for the divergence from the SGPR prior to M^(0). In our case, we used equally spaced test inputs in the hypercube defined by the maximum and minimum boundary condition measurements and the maximum and minimum of state samples from all EM iterations. Since each of the D Gaussian processes that constitute M^(i) is independent, the total divergence from M^(i-1) to M^(i) is the sum of the D divergences estimated by [11].
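The closed-form Gaussian divergence [11] is straightforward to evaluate numerically; a sketch (names are ours):

```python
import numpy as np

def gaussian_kl(mu_a, cov_a, mu_b, cov_b):
    """Closed-form D(G_a || G_b) between two k-dimensional Gaussians, as in [11]."""
    k = mu_a.size
    cov_b_inv = np.linalg.inv(cov_b)
    dmu = mu_b - mu_a
    _, logdet_a = np.linalg.slogdet(cov_a)   # log-determinants for numerical stability
    _, logdet_b = np.linalg.slogdet(cov_b)
    return float(0.5 * (np.trace(cov_b_inv @ cov_a)
                        + dmu @ cov_b_inv @ dmu
                        - k
                        + logdet_b - logdet_a))

# identical Gaussians diverge by zero; shifting the mean by one unit with
# identity covariance in two dimensions gives 0.5
kl_same = gaussian_kl(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
kl_shift = gaussian_kl(np.zeros(2), np.eye(2), np.array([1.0, 0.0]), np.eye(2))
```

Evaluating this at a common grid of test inputs for two GP emulators gives the per-dimension terms that are summed over the D state dimensions.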
2.3.2. Information Contained in the Model about a Hydrologic Process
Gong et al. (2013) discuss information contained in inputs u about a hydrologic process. For convenience (and following Gong et al. 2013), let's say the process we are interested in is the observed process y; note, however, that this can be generalized to unobserved processes also. Gong et al. (2013) further point out that the data processing inequality (Cover and Thomas 1991 p34) states that any model which conditions y on u will be less efficient than Bayesian conditioning using the true underlying joint distribution p(y, u). The problem is that this true underlying joint distribution is generally unknown, and a surrogate model of that distribution must be used instead. From this perspective, any model M^(i) over states contributes information about y through the distribution p(y_t | x_t), and this amount of information can be measured as the divergence from any prior distribution over y to the posterior distribution, which we will notate p(y_t | M^(i)).

Considering that observations are available for data assimilation and system identification, we defined the prior over y_t for all t as a histogram of observations (call this empirical distribution p̂(y)) with bin widths w_k for each observation dimension k given by Scott (2004):

    w_k = 3.49 σ_k n_y^{-1/3}        [12]

where σ_k^2 is the variance of the k-th dimension of the observation vector (see [5.2]) and n_y is the number of available observations (y_t might be empty for some times during the simulation period). The amount of information added at timestep t due to prescribing the conceptual model emulated by M^(0) is therefore the divergence from p̂(y) to p(y_t | M^(0)). Similarly, the amount of information added due to mathematical structure identification during the i-th EM iteration is the divergence from p(y_t | M^(i-1)) to p(y_t | M^(i)). All of these divergences were estimated by discretizing the observation space using the bin widths given by [12]; this means that the expected value of the divergence [10] represents a summation over histogram bins. From a practical point of view, this means that [11] measures the divergence between continuous (Gaussian) distributions whereas [12] implies that we measure divergence between discrete approximations of continuous distributions. One implication follows from the fact that continuous information is unbounded, whereas discrete information is always positive. Thus, the amounts of information we gain about model structure and about a process of interest are not directly comparable using the framework outlined by [11] and [12]. It is trivial to estimate [11] for a discrete state space; however, this is not necessary for the present discussion.
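A sketch of the histogram prior [12] and the resulting discrete divergence follows; the bin-range construction and the flooring of empty prior bins are our implementation choices, not prescribed by the text:

```python
import numpy as np

def scott_bin_width(obs):
    """Scott (2004) bin width for a 1-D sample, as in [12]: 3.49 * std * n^(-1/3)."""
    return 3.49 * np.std(obs) * len(obs) ** (-1.0 / 3.0)

def histogram_divergence(prior_obs, posterior_samples):
    """D(posterior || prior) over a common discretization of observation space.
    Flooring empty prior bins at 1e-12 keeps the divergence finite; that floor
    and the shared bin range are implementation choices."""
    w = scott_bin_width(prior_obs)
    lo = min(prior_obs.min(), posterior_samples.min())
    hi = max(prior_obs.max(), posterior_samples.max())
    edges = np.arange(lo, hi + w, w)
    p_prior, _ = np.histogram(prior_obs, bins=edges)
    p_post, _ = np.histogram(posterior_samples, bins=edges)
    p_prior = np.maximum(p_prior / p_prior.sum(), 1e-12)
    p_post = p_post / p_post.sum()
    mask = p_post > 0
    return float(np.sum(p_post[mask] * np.log(p_post[mask] / p_prior[mask])))

# hypothetical example: a posterior much sharper than the observation histogram
rng = np.random.default_rng(2)
prior_obs = rng.normal(0.0, 1.0, 1000)
d = histogram_divergence(prior_obs, rng.normal(0.0, 0.2, 1000))
```

A posterior that concentrates mass relative to the observation histogram yields a large positive divergence, which is the sense in which the model adds information about the observed process.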
2.4. An Application Experiment
We tested the EM algorithm presented in section 2.2 and the method for estimating
information introduced during system identification outlined in section 2.3 using a toy
problem related to predicting streamflow in the Leaf River catchment (1944 km²) in
southern Mississippi, USA.
2.4.1. Leaf River Data and Simulation Period
Forty years (1948-1988) of daily cumulative precipitation [mm/day], cumulative potential evapotranspiration [mm/day], and streamflow [m³/day] are available from the Hydrology Lab of the US National Weather Service. We used data from 1951 and 1952 for this experiment: the first 30 days of data were used for warm-up, the next 365 days were used to perform system identification (including parameter estimation of the conceptual simulator; call this the calibration period), and the final 365 days were used for performance evaluation. The simulation period spanned all three of these intervals, but observations were only available during the calibration period.
2.4.2. The HyMod Simulator
The conceptual simulator used to define the M^(0) model was HyMod (Boyle 2000 and Figure 1). HyMod is commonly used for proof-of-concept demonstrations for hydrologic data assimilation problems (e.g., Bulygina and Gupta 2011; Moradkhani et al. 2005a; Vrugt et al. 2012; Weerts and El Serafy 2006). The model requires inputs of precipitation [mm/day] and potential evapotranspiration [mm/day] and estimates streamflow [mm/day]. The state vector consists of a soil moisture storage component [mm], storage in a single slow-flow routing tank [mm], and storage in some number of quick-flow routing tanks; here we used two [mm each]; thus the model had a 4-dimensional state vector, 2-dimensional input, and 1-dimensional output.

There are five model parameters: a soil moisture storage capacity, an infiltration exponent, a partitioning coefficient, and two tank outflow coefficients (one each for the slow-flow and quick-flow tanks). We calibrated these parameters and initial states to streamflow observations during the calibration period using shuffled complex evolution (Duan et al. 1992) with a Nash-Sutcliffe (Nash and Sutcliffe 1970) efficiency objective function; these calibrated values are listed in Table 1. The HyMod state transition functions and observation function are described in Appendix A.

The state transition uncertainty distributions used to define p(x_t | x_{t-1}) for this conceptual simulator were independent in time and Gaussian with diagonal covariance Q. Streamflow was the observed variable, and the observation error distribution was Gaussian with covariance R.
2.4.3. Implementing the Learning Algorithm
The state-transition and observation distributions defined by HyMod and described in section 2.4.2 were sampled to provide training data for an SGPR emulator. Conceptual learning about model structure was measured as the divergence from the SGPR prior to the trained emulator. We chose a fixed number of pseudo-inputs for the SGPR, sampled the state-transition and observation distributions repeatedly (each sample consisted of state and observation values for all timesteps), and used training inputs sampled randomly from the input distribution during the calibration period. During mathematical structure learning, we used a single ensemble size both for the EnKS and for sampling each distribution for SGPR training, and SGPR training during each EM iteration used the same settings.
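The SGPR itself follows Snelson and Ghahramani (2006); the role of the pseudo-inputs can be illustrated with a related subset-of-regressors approximation, in which predictions are routed through a small set of inducing points rather than the full training set. Everything below (function names, kernel, sizes) is illustrative and is not the configuration used in these experiments:

```python
import numpy as np

def rbf(A, B, ell=1.0, sig=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sig ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def sor_predict(Xm, X, y, Xstar, noise=0.1):
    """Subset-of-regressors GP mean at Xstar through m pseudo-inputs Xm:
    the dominant cost is an m x m solve instead of the N x N solve of
    exact GPR."""
    Kmn = rbf(Xm, X)                      # m x N
    Kmm = rbf(Xm, Xm)                     # m x m
    Ksm = rbf(Xstar, Xm)                  # * x m
    A = noise ** 2 * Kmm + Kmn @ Kmn.T    # small m x m system
    return Ksm @ np.linalg.solve(A, Kmn @ y)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
Xm = np.linspace(-3.0, 3.0, 15)[:, None]  # 15 pseudo-inputs
mu = sor_predict(Xm, X, y, np.array([[0.0]]))
```

With 200 noisy samples of a sine function, 15 pseudo-inputs are enough to recover the mean response closely; the SGPR of Snelson and Ghahramani additionally optimizes the pseudo-input locations.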
3. Results
Samples of the calibrated HyMod simulator over the simulation period are illustrated in Figure 3. The efficiencies of the mean streamflow estimates made by this model after parameter calibration were computed separately for the calibration and evaluation periods.
Figure 4 shows the behavior of the Nash-Sutcliffe efficiencies of the mean model
estimates of streamflow during the conceptual and mathematical identification steps.
There are three interesting things to note here. First, the efficiency of the emulator was similar to that of the HyMod simulator during the calibration period, but the emulator reduced prediction efficiency during the evaluation period as compared to the calibrated HyMod simulator. This was likely largely due to the fact that our training samples did not fill the state- and input-space hypercube, but it could also be evidence of nonstationarity in the watershed response – we do not address nonstationarity in this paper. Second, the efficiency of the mean model predictions of streamflow during the calibration period converged by the second EM iteration, but the evaluation-period efficiency remained strictly less (~0.80-0.85) than the calibration efficiency. Third, we notice that the calibration-period efficiency generally increased (with a small exception during the fourth EM iteration), and although the evaluation efficiency generally increased, there were iterations where the value decreased by small amounts.
Figure 5 compares the input-output response of the HyMod state transition simulator with
the mean response of the
SGPR model (via equation [9.1]). There are noticeable
differences in the response of the two models. One advantage of starting with a
conceptual simulator as the EM prior is that we can hypothesize about a physical
interpretation of some of these differences (Bulygina and Gupta 2011), however we
caution against making concrete interpretations about learned characteristics of the
system since response surface evolution is affected by assumptions in the EM algorithm
itself – including but not limited to a known observation function and a Gaussian relationship between model states imposed by the EnKS. Some hypotheses might be:
1. In the case of the highest precipitation values, the soil moisture response is increased at low input values but decreased at high input values (Figure 5, subplots H-a and M-a). This indicates that the HyMod infiltration process representation might be improved to better account for precipitation intensity and antecedent conditions.
2. In the identified mathematical structure, the soil moisture state responds to the states of the quick-flow tanks (Figure 5, subplots H-b and M-b). In the HyMod conceptual simulator, the slow-flow and quick-flow tanks represent routing storage; however, the
identified model suggests that there is some feedback from the routing process to
the soil moisture process, which controls infiltration. One possible explanation is
that the HyMod routing process might be accounting for outflow delays actually
caused by transfer in the soil.
3. Sensitivity to the slow-flow state is increased at high values of that state (Figure 5, subplots H-c and M-c). This indicates that the linear routing coefficient might be better represented by a dynamic process depending on the tank state.
4. The slow-flow state responds to the states of the quick-flow tanks (Figure 5, subplots H-d and M-d). If the routing process were influenced by the transit time of water moving
through the soil matrix, then this process appears to include some feedback
between processes represented at different time scales. This hypothesis is also
supported by the nonlinear feedback found in the quick-flow state responses to
other quick-flow states.
5. The quick-flow state response to precipitation is increased, especially for low rainfall values
(Figure 5, subplots H-e and M-e). This again indicates that the model requires a
dynamic runoff partitioning process which is dependent on antecedent conditions.
These types of hypotheses might be valuable in guiding physics-based or in situ
investigation of the system.
Figure 6 shows the amount of information learned about the model at each stage in the
system identification process. The conceptual phase introduced less information about
model structure than the first mathematical identification phase due to the fact that the
SGPR prior assumed highly uncertain estimates of the states (the variance was large).
This illustrates why Shannon information can be interpreted as a measurement of
surprise: after assigning the SGPR prior, very few SGPR state transition models would
elicit large surprises about the value of the states. A vast majority of the learning during
the mathematical identification phase took place during the first EM iteration, illustrating
the efficiency of the EM algorithm. Although conditioning on the observation data
supplied quite a bit of information about incorrect mathematical structure of the HyMod
state-transition function, this was mostly extracted by data assimilation. It is important to
note that the absolute values of the divergences reported in Figure 6 are highly dependent
on the number of input test samples, and it is therefore the relative magnitude of these
values that is of interest.
Interestingly, Figure 7 shows that more of the learning about streamflow occurred during
the conceptual identification phase than in the mathematical identification phase. This is
because the prior over streamflow was taken to be the histogram of calibration period
observations, and so there was considerable room for surprise after imposing a
conceptual model structure, but much less room for surprise during mathematical
structure estimation. While learning about streamflow continued through the second EM
iteration, the small divergences in the later EM iterations were at least partially due to the
fact that we used a finite sample to approximate the
distributions. It is
interesting to note that both the conceptual and mathematical identified models contained
almost the same amount of information about calibration- and evaluation-period
streamflow observations.
4. Discussion
This paper has two primary purposes. The first is to frame the discussion that appears in
the hydrologic literature regarding system identification (including, to some extent,
parameter estimation) in the context of well understood methods from other fields. EM
system identification is quite general, flexible and conceptually simple, and it would be
beneficial to discuss future efforts for simultaneous state and parameter estimation for
hydrologic models in the context of existing theory.
The second purpose is to frame a discussion about the meaning of information in the
context of hydrologic models. Certainly, we would like to believe that our models and
data sets are informative (in some sense). But, the key to understanding what it means to
say that ‘a model or data set is informative’ is to understand what the information it
contains is about (e.g., a conceptual rainfall-runoff model may be informative about
streamflow, evapotranspiration and soil moisture, but perhaps not about relative humidity
or pressure in the atmospheric boundary layer). More precisely, it is generally useful to think of the information contained in a random variable as being about a probability distribution over that or any other random variable. This is the
specific context in which Shannon’s (1948) theory applies. Furthermore, it is important to
notice that models do contain information. While this may seem contrary to the data
processing inequality, which states that no sequence of (linear or non-linear)
transformations of the data can add information to the data, it is important to note that the
data processing inequality measures information loss (or non-positive information added)
against a standard that is set by the perfect model that exactly explains the relationships
contained in the data – in general, this is the joint distribution between a regressor and
regressand. Intuitively we expect that a model contains information, and by
understanding that information manifests as the divergence caused by conditioning one
random variable on another, we see that models contain information about relationships
between random variables. In this context, the data processing inequality simply states
that no model will perform as well as the true underlying (probabilistic) generating
process.
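The idea that information manifests as the divergence caused by conditioning one random variable on another can be made concrete with a histogram estimate of discrete mutual information; this sketch is illustrative and is not the estimator used in the experiments:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram (discrete) mutual information in nats:
    I(X;Y) = sum_xy p(x,y) * log[p(x,y) / (p(x) p(y))],
    i.e. the expected divergence from the conditional to the marginal."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # 0 * log(0) := 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)
print(mutual_information(a, a))  # large: a is fully informative about itself
print(mutual_information(a, b))  # near zero: independent samples
```

A variable carries a great deal of information about itself and essentially none about an independent variable, which is the sense in which conditioning on informative data produces large divergences.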
The methods we have used here for measuring information about model structure are
specific to GPR. It is only possible to explicitly quantify continuous divergence between
certain types of parametric distributions, and Gaussians happen to be one type where the
formula is known. Furthermore, [11] actually calculates the divergence between
probability distributions over model states, but in the case of models which are GP’s this
is an appropriate surrogate for calculating the divergence between probabilistic models.
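For reference, the closed-form Kullback-Leibler divergence between two multivariate Gaussians, the kind of explicit formula referred to above, can be sketched as follows (a generic implementation, not the dissertation's code):

```python
import numpy as np

def gauss_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) in nats, via the standard
    closed form for k-dimensional Gaussians."""
    mu0, mu1 = np.atleast_1d(mu0), np.atleast_1d(mu1)
    S0, S1 = np.atleast_2d(S0), np.atleast_2d(S1)
    k = mu0.size
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```

The divergence is zero only when the two Gaussians are identical, and shifting the mean of a unit-variance Gaussian by one standard deviation costs exactly half a nat.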
Appendix A: The HyMod Simulator
Inputs (boundary conditions) are precipitation ($p_t$ [mm/day]) and potential evapotranspiration ($e^*_t$ [mm/day]). The state vector at each timestep consists of a soil moisture storage component ($x^s_t$ [mm]), storage in a single slow-flow routing tank ($x^f_t$ [mm]) and some number of quick-flow routing tanks; here we used two ($x^{q1}_t$ and $x^{q2}_t$ [mm]). The observation (according to equation [2.2]) is an estimate of streamflow ($q_t$ [mm/day]). Parameters are: soil moisture storage capacity $c^{max}$, infiltration exponent $b$, a partitioning coefficient $\alpha$, and tank outflow coefficients $k^q$ and $k^s$. The HyMod simulator is illustrated in Figure 1.
Infiltration is controlled by the relative available capacity of soil moisture storage at the beginning of the time step and a unitless parameter $b$:

$$ f_t = \int_{x^s_{t-1}}^{\min(x^s_{t-1} + p_t,\; c^{max})} \left(1 - \frac{c}{c^{max}}\right)^{b} dc \qquad [A.1] $$

and evapotranspiration is controlled by the relative available capacity of soil moisture storage at the end of the time step:

$$ e_t = e^*_t \left(\frac{x^s_{t-1} + f_t}{c^{max}}\right) \qquad [A.2] $$

The soil moisture state update is:

$$ x^s_t = x^s_{t-1} + f_t - e_t \qquad [A.3] $$

Effective precipitation is the portion of precipitation that is not infiltrated, and this is routed to the slow- and quick-flow tanks according to parameter $\alpha$. The quick-flow tanks are in series so that the states are:

$$ x^{q1}_t = (1 - k^q)\, x^{q1}_{t-1} + \alpha\,(p_t - f_t) \qquad [A.4.1] $$

$$ x^{q2}_t = (1 - k^q)\, x^{q2}_{t-1} + k^q\, x^{q1}_{t-1} \qquad [A.4.2] $$

Outflow from the watershed is a linear function of the current states according to parameters $k^q$ and $k^s$ respectively:

$$ q_t = k^q\, x^{q2}_t + k^s\, x^f_t \qquad [A.5] $$

The slow-flow storage tank is updated according to:

$$ x^f_t = (1 - k^s)\left(x^f_{t-1} + (1 - \alpha)(p_t - f_t)\right) \qquad [A.6] $$
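A single HyMod-style timestep consistent with the verbal descriptions in this appendix can be sketched as follows. All symbol names are assumptions for illustration, and this is not the dissertation's implementation:

```python
def hymod_step(xs, xf, xq1, xq2, p, pet, cmax, b, alpha, ks, kq):
    """One timestep of a HyMod-style water balance (sketch; parameter
    and state names are assumptions). States in [mm], fluxes [mm/day]."""
    # Infiltration: closed-form integral of (1 - c/cmax)^b from current
    # storage to storage plus precipitation, capped at capacity.
    hi = min(xs + p, cmax)
    def antideriv(c):
        return -(cmax / (b + 1.0)) * (1.0 - c / cmax) ** (b + 1.0)
    f = antideriv(hi) - antideriv(xs)
    # ET scaled by end-of-step relative storage.
    e = pet * (xs + f) / cmax
    # Soil moisture update.
    xs_new = xs + f - e
    u = p - f  # effective precipitation
    # Two quick-flow tanks in series, fed the fraction alpha of u.
    xq1_new = (1.0 - kq) * xq1 + alpha * u
    xq2_new = (1.0 - kq) * xq2 + kq * xq1
    # Slow-flow tank receives the remainder.
    xf_new = (1.0 - ks) * (xf + (1.0 - alpha) * u)
    # Streamflow: linear outflow from the routing tanks.
    q = kq * xq2_new + ks * xf_new
    return xs_new, xf_new, xq1_new, xq2_new, q
```

With zero precipitation and zero potential evapotranspiration, the soil store is unchanged and the routing tanks simply drain, which is a quick sanity check on the structure.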
Tables
Table 1: Calibrated HyMod Parameter Values

Parameter or Initial State              | Range       | Calibrated Value | Units
Storage capacity $c^{max}$              | 50 – 600    | 425.73           | [mm]
Infiltration exponent $b$               | 0.05 – 1.95 | 0.07             | ~
Partitioning coefficient $\alpha$       | 0.5 – 0.95  | 0.90             | ~
Slow-tank coefficient $k^s$             | 0.001 – 0.1 | 0.05             | ~
Quick-tank coefficient $k^q$            | 0.3 – 0.95  | 0.58             | ~
Initial soil storage $x^s_0$            | 0 – 600     | 396.13           | [mm]
Initial slow-tank storage $x^f_0$       | 0 – 300     | 20.64            | [mm]
Initial quick-tank storage $x^{q1}_0$   | 0 – 100     | 84.50            | [mm]
Initial quick-tank storage $x^{q2}_0$   | 0 – 50      | 22.22            | [mm]
Figures
Figure 1: The HyMod simulator state transition functions and observation function are
represented as a graphical network at a single timestep. Each connection in the graphical
network is labeled according to which parameter controls this interaction. See Table 1
and Appendix A for an explanation of symbols and a mathematical representation of each
connection. The simulator components which represent a single timestep are grouped by
the shaded box.
Figure 2: Mutual information between two random variables can be thought of as the
information gain about one random variable due to Bayesian conditioning on the other.
This is equal to the expected divergence from the Bayesian posterior to the Bayesian
prior; divergence is an integration over the difference between log-transformed
probability distributions. This figure illustrates the areas (prior to log-transforming the
distribution) which are integrated (after log-transforming of both distributions) to
measure divergence.
Figure 3: An ensemble sample of the HyMod simulator including input and state
transition uncertainty. Streamflow predictions are compared with observations in the
bottom subplot.
Figure 4: Convergence of the Nash-Sutcliffe efficiency of mean model estimates of
streamflow.
Figure 5: Partial state transition mean response surfaces of the HyMod simulator (H-a through H-h) and the EM-identified SGPR model (M-a through M-h). Each surface represents
an evaluation of the mean model response due to changing the two input state values on
the x- and y-axes by fixing the other input values at a point in the hypercube; each of the
three surfaces in each subplot was generated by fixing the inputs which do not appear on
the x- or y-axis at a different (equally-spaced) location in the hypercube.
Figure 6: Information extracted from observations of streamflow about the conceptual and mathematical components of model structure. EM iteration 0 represents the information added at the conceptual model identification phase. Quantities in this figure
are estimates of continuous divergence. The absolute magnitudes are dependent on the
number of training samples, whereas the relative magnitudes are dependent on selecting a
representative set of training samples – in this case, we are interested in the relative
values.
Figure 7: Mean (discrete) information about streamflow during calibration and evaluation periods added during system identification. EM iteration 0 represents the information added at the conceptual model identification phase when starting with a prior which consisted of a histogram of calibration-period observations.
Quantities in this figure are estimates of discrete divergence.
References
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922.
Statistical Science, 12, 162-176
Anderson, J.L. (2007). An adaptive covariance inflation error correction algorithm for
ensemble filters. Tellus A, 59, 210-224
Archambeau, C., Cornford, D., Opper, M., & Shawe-Taylor, J. (2007). Gaussian process
approximations of stochastic differential equations. In N. Lawrence, A. Schwaighofer, &
J. Quiñonero Candela (Eds.), Gaussian Processes in Practice (pp. 1-17). Bletchley, U.K.
Boyle, D.P. (2000). Multicriteria calibration of hydrologic models. PhD dissertation, Department of Hydrology and Water Resources, University of Arizona, Tucson, AZ
Bulygina, N., & Gupta, H. (2009). Estimating the uncertain mathematical structure of a
water balance model via Bayesian data assimilation. Water Resources Research, 45
Bulygina, N., & Gupta, H. (2010). How Bayesian data assimilation can be used to
estimate the mathematical structure of a model. Stochastic Environmental Research and
Risk Assessment, 24, 925
Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a
hydrological model via Bayesian data assimilation. Water Resources Research, 47
Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener,
T., & Hay, L.E. (2008). Framework for Understanding Structural Errors (FUSE): A
modular framework to diagnose differences between hydrological models. Water
Resources Research, 44
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory. New York, NY, USA: Wiley-Interscience
Damianou, A.C., Titsias, M.K., & Lawrence, N.D. (2011). Variational Gaussian process dynamical systems. In Advances in Neural Information Processing Systems (pp. 2510-2518). Granada, Spain
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum Likelihood from
Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B
(Methodological), 39, 1-38
Duan, Q.Y., Sorooshian, S., & Gupta, V. (1992). Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research, 28, 1015-1031
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear
dynamics. Monthly Weather Review, 128, 1852-1867
Ghahramani, Z., & Roweis, S.T. (1999). Learning nonlinear dynamical systems using an
EM algorithm. Advances in neural information processing systems, 431-437
Gong, W., Gupta, H.V., Yang, D., Sricharan, K., & Hero, A.O. (2013). Estimating epistemic & aleatory uncertainties during hydrologic modeling: An information theoretic approach. Water Resources Research
Gupta, H.V., Clark, M.P., Vrugt, J.A., Abramowitz, G., & Ye, M. (2012). Towards a
comprehensive assessment of model structural adequacy. Water Resources Research, 48
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82, 35-45, doi:10.1115/1.3662552
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of
Mathematical Statistics, 22, 79-86; doi:10.2307/2236703
Liu, Y.Q., & Gupta, H.V. (2007). Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resources Research, 43, W07401, doi:10.1029/2006WR005756
McLaughlin, D. (2002). An integrated approach to hydrologic data assimilation:
interpolation, smoothing, and filtering. Advances in Water Resources, 25, 1275-1286
Miller, R.N., Carter, E.F., & Blue, S.T. (1999). Data assimilation into nonlinear stochastic models. Tellus Series A: Dynamic Meteorology and Oceanography, 51, 167-194
Moradkhani, H., Hsu, K.L., Gupta, H., & Sorooshian, S. (2005a). Uncertainty assessment
of hydrologic model states and parameters: Sequential data assimilation using the particle
filter. Water Resources Research, 41
Moradkhani, H., Sorooshian, S., Gupta, H.V., & Houser, P.R. (2005b). Dual state–
parameter estimation of hydrological models using ensemble Kalman filter. Advances in
Water Resources, 28, 135-147
Nash, J.E., & Sutcliffe, J.V. (1970). River flow forecasting through conceptual models
part I -- A discussion of principles. Journal of Hydrology, 10, 282-290
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning.
Cambridge, MA: MIT Press
Restrepo, J.M. (2008). A path integral method for data assimilation. Physica D:
Nonlinear Phenomena, 237, 14-27
Roweis, S., & Ghahramani, Z. (2000). An EM algorithm for identification of nonlinear
dynamical systems
Scott, D.W. (2004). Multivariate density estimation and visualization. In J.E. Gentle, W.
Haerdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and
Methods (pp. 517-538). New York: Springer
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423
Snelson, E., & Ghahramani, Z. (2006). Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press
Turner, R., Deisenroth, M.P., & Rasmussen, C.E. (2009). System identification in Gaussian process dynamical systems. In D. Görür (Ed.), Nonparametric Bayes Workshop at NIPS. Whistler, Canada
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005). Improved
treatment of uncertainty in hydrologic modeling: Combining the strengths of global
optimization and data assimilation. Water Resources Research, 41, 17
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data
assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and
applications. Advances in Water Resources
Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models
for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30,
283-298
Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman
filtering for state updating with hydrological conceptual rainfall-runoff models. Water
Resources Research, 42
Wikle, C.K., & Berliner, L.M. (2007). A Bayesian tutorial for data assimilation. Physica
D-Nonlinear Phenomena, 230, 1-16, doi:10.1016/j.physd.2006.1009.1017
Ye, M., Meyer, P.D., & Neuman, S.P. (2008). On model selection criteria in multimodel
analysis. Water Resources Research, 44, W03428
APPENDIX E:
KALMAN FILTERING WITH A GAUSSIAN PROCESS OBSERVATION
FUNCTION
Grey S. Nearing
University of Arizona Department of Hydrology and Water Resources; Tucson, AZ
Manuscript not submitted for publication.
Abstract
We derive the gain expression of the iterative ensemble Kalman filter for a hidden
Markov model with a Gaussian process regression observation function. Kriging the
observation function allows for an explicit expression for the gradient of the filter
posterior density local to each ensemble member. The gain must be approximated
variationally and the performance is similar to that of the ensemble Kalman filter and a
resampling particle filter for toy problems related to streamflow forecasting, soil moisture
estimation and the Lorenz attractor.
1. Introduction
Kalman-type Bayesian state-updating filters are commonly applied to nonlinear hidden
Markov models (HMM) by approximating the prior (forecast) as Gaussian and the
observation function as locally linear. In the case of a nonlinear HMM state transition
function, the ensemble Kalman filter (Evensen 2003) estimates the mean and covariance
of a Gaussian approximation of the prior by Monte Carlo sampling; in the case of a
nonlinear HMM observation function, variational methods are used to find the mode of
the filter posterior (Zupanski 2005). One limitation of these variational methods is that
the tangent to the observation function is localized around the ensemble mean.
By emulating the observation operator as a Gaussian process regression (GPR;
Rasmussen and Williams 2006) it is possible to derive the filter posterior density function
and its gradient anywhere in model state space; this allows for a variational update that is
local to each ensemble member. GPR is Kriging in an arbitrary function domain so that
the GPR prediction is a linear combination of training samples with kernel weights based
on domain proximity (see section 2.2 and Rasmussen and Williams 2006 Chapter 1 for
further explanation). GPR is also identical to certain single-layer, infinite-node, feedforward neural networks (Neal 1996). It is simple to implement, resistant to overfitting,
and can reproduce any smooth response as long as the covariance structure of that
response is known. When the gradients of the kernels are explicit, the gradient of the
emulator is as well, and the gradient of the EnKF analysis can be computed anywhere.
Section 2 of this paper presents the gain expression and variational approximation of the
EnKF for an HMM with a GPR observation function (gpEnKF). Section 3 provides toy
examples which compare this filter with the EnKF, ensemble transform Kalman filter
(ETKF; Bishop et al. 2001) and the sequential importance resampling filter (SIRF;
Gordon et al. 1993). These examples are related to streamflow forecasting, root-zone soil
moisture estimation and the Lorenz attractor.
2. Methods
2.1. Overview of Data Assimilation
Data assimilation assumes a discrete-time nonlinear dynamical system (NLDS) simulator where the state-transition function:

$$ x_t = f(x_{t-1}, u_t) + \omega_t \qquad [1.1] $$

at time $t$ depends, up to some uncertainty distribution $\omega_t \sim \Omega_t$, on the previous state $x_{t-1}$ and the current boundary condition $u_t$. The state is indirectly observed by $y_t$ according to the observation function:

$$ y_t = h(x_t) + \nu_t \qquad [1.2] $$

up to a random observation error $\nu_t \sim N_t$. In most data assimilation applications, $f$ and $h$ are deterministic functions of random variables $x_t$, $u_t$ and $y_t$, which represent the expected values of the state, boundary condition and observation distributions respectively. For example, standard applications of the EnKF use iid samples of $\omega_t$, $\nu_t$ and $u_t$ to create iid samples of $x_t$ and $y_t$ at every time step using a deterministic simulator and prescribed error distributions $\Omega_t$ and $N_t$.
Data assimilation addresses the question “what can be learned about $x$ by observing $y$?” and the Bayesian answer is a smoother. Given observations during $T$ simulation time steps (we drop the notation for forcing data and error distributions):

$$ p(x_{1:T} \mid y_{1:T}) = \frac{p(y_{1:T} \mid x_{1:T})\, p(x_{1:T})}{\int p(y_{1:T} \mid x_{1:T})\, p(x_{1:T})\, dx_{1:T}} \qquad [2] $$
In general, there is no analytical solution to [2] and it is impractical to sample the $(d \times T)$-dimensional (dimension of the state multiplied by number of integration time steps) posterior directly; therefore it is necessary to make some simplifying assumptions, the most common being that the state at time $t$ is conditionally independent of observations at times greater than $t$. This assumption results in a filter:

$$ p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \qquad [3] $$

which can be applied sequentially using the posterior at time $t-1$ in the integral which defines the prior at time $t$. The filter posterior is $d$-dimensional (dimension of the state) and it is still rare for nonparametric density estimation to be practical.
2.2. GPR Observation Function
A GPR observation function emulator can be trained with $n$ samples of forcing data, states and modeled observations collected from the HMM defined by [1] and uncertainty distributions $\Omega_t$ and $N_t$. If these samples are stored in training data matrices $X$, $U$ and $Y$, the expected values of the observations $\bar{y}^*$ resulting from state and forcing data samples $x^*$ and $u^*$, according to a GPR emulator of $h$, are:

$$ \bar{y}^* = K\!\left([x^*, u^*], [X, U]\right)\left(K\!\left([X, U], [X, U]\right)\right)^{-1} Y \qquad [4] $$

$K$ is the GPR covariance function (see Rasmussen and Williams 2006; Chapter 4) and [4] is the simple Kriging mean. It is usually the case for natural systems that $y$ will have different sensitivity to the different dimensions of $x$ and $u$ and thus the covariance function should be anisotropic; one such covariance function is the squared-exponential known as the automatic relevance determination kernel (Neal 1996):

$$ K(z_i, z_j) = \eta^2 \exp\!\left(-\tfrac{1}{2}\sum_d \lambda_d \left(z_{i,d} - z_{j,d}\right)^2\right) + \delta_{ij}\,\sigma^2 \qquad [5] $$

$\delta_{ij}$ is the Kronecker delta, and $\eta^2$, $\lambda_d$, and $\sigma^2$ are scale, inverse squared correlation-length, and noise hyperparameters respectively which must be estimated by maximizing the probability of the training data (Rasmussen and Williams 2006; Chapter 5). $X$ and $Y$ can be sampled directly from the HMM, and if $h$ is independent of time, the emulator can be trained prior to data assimilation; in this case the trained emulator is independent of the state and boundary condition at any time $t$. We will use the notation $\tilde{h}$ for the GPR emulator mean.
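The Kriging mean with an ARD kernel, as described above, can be sketched in a few lines; the hyperparameter values below are placeholders that would in practice be set by maximizing the marginal likelihood of the training data:

```python
import numpy as np

def ard_kernel(A, B, eta2, lam):
    """ARD squared-exponential kernel: eta2 * exp(-0.5 * sum_d
    lam_d * (a_d - b_d)^2). The lam_d are inverse squared correlation
    lengths, so irrelevant input dimensions receive lam_d near zero."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * lam).sum(-1)
    return eta2 * np.exp(-0.5 * d2)

def gpr_mean(Xtrain, ytrain, Xstar, eta2=1.0, lam=None, sig2=1e-4):
    """Simple-kriging mean: K(x*, X) (K(X, X) + sig2 I)^-1 y."""
    lam = np.ones(Xtrain.shape[1]) if lam is None else np.asarray(lam)
    K = ard_kernel(Xtrain, Xtrain, eta2, lam) + sig2 * np.eye(len(Xtrain))
    Ks = ard_kernel(Xstar, Xtrain, eta2, lam)
    return Ks @ np.linalg.solve(K, ytrain)
```

With a smooth target and a small noise term, the emulator reproduces the training response almost exactly and interpolates smoothly between training points.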
2.3. gpEnKF Posterior Likelihood and Gradient
$y_t$ is observed with error covariance $R$ and the prior is sampled by the forecast ensemble and assumed to be Gaussian with covariance $P$. The negative log-likelihood of the posterior is given up to an additive constant by:

$$ -\ln p(x_t \mid y_t) \propto \tfrac{1}{2}\left(y_t - \tilde{h}(x_t)\right)^{\!\top} R^{-1}\left(y_t - \tilde{h}(x_t)\right) + \tfrac{1}{2}\left(x_t - \bar{x}^f_t\right)^{\!\top} P^{-1}\left(x_t - \bar{x}^f_t\right) \qquad [6] $$

The gradients of the likelihood function with respect to the samples of the $d$-dimensional model state are:

$$ \frac{\partial}{\partial x}\left(-\ln p(x \mid y_t)\right) = -\left(\nabla\tilde{h}(x)\right)^{\!\top} R^{-1}\left(y_t - \tilde{h}(x)\right) + P^{-1}\left(x - \bar{x}^f_t\right) \qquad [7] $$

The maximum likelihood (analysis) filter estimate of $x_t$ conditional on $y_t$ is:

$$ x^a_t = x^f_t + G\left(y_t - \tilde{h}(x^a_t)\right) \qquad [8.1] $$

$$ G = P H^{\top}\left(H P H^{\top} + R\right)^{-1}, \quad H = \nabla\tilde{h}(x^a_t) \qquad [8.2] $$

where $G$ is the Kalman gain. [8] is implicit and requires a variational solution. Many variational approaches to either minimizing the negative log-likelihood of the posterior [6] or to locating a local zero of the log-likelihood gradient [7] are sufficient for locating a local estimate of the maximum likelihood state estimator. The log-likelihood gradient [7] is taken with respect to each ensemble member independently (except for the calculation of $P$) and is local to that ensemble member.
Gradient-based variational solutions are common in high-dimensional latent-variable
GPR models (e.g., Lawrence 2005; Titsias and Lawrence 2010; Wang et al. 2008). It is
important to note that neither the gpEnKF nor any other filter with a Gaussian prior
results in a maximum likelihood update which considers the full nonlinearity of the
HMM, since the prior covariance matrix assumes a linear relationship between state dimensions. This is intuited by noting that an unobserved state dimension will still be updated according to its linear (co)variance with the observed dimensions, even though any nonlinear relationship between those dimensions is not considered.
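A minimal sketch of the member-local variational update follows: the ARD-kernel emulator mean and its explicit gradient are evaluated at an ensemble member, and the negative log posterior is descended by gradient steps. The step size, iteration count, and scalar-observation simplification are illustrative assumptions, not the settings used in the experiments:

```python
import numpy as np

def gp_mean_and_grad(x, Xtr, alpha, lam, eta2=1.0):
    """GPR emulator mean and its gradient at state x, with an ARD
    squared-exponential kernel; alpha = K(X, X)^-1 y is precomputed."""
    diff = x[None, :] - Xtr                              # n x d
    k = eta2 * np.exp(-0.5 * (diff ** 2 * lam).sum(-1))  # n kernel values
    mean = k @ alpha
    grad = -(alpha * k) @ (diff * lam)                   # d-dim gradient
    return mean, grad

def gp_enkf_member_update(x_f, xbar_f, Pinv, y, Rinv, Xtr, alpha, lam,
                          step=0.1, iters=200):
    """Gradient descent on the negative log posterior for one ensemble
    member, assuming a scalar observation (sketch only)."""
    x = x_f.copy()
    for _ in range(iters):
        h, dh = gp_mean_and_grad(x, Xtr, alpha, lam)
        # Innovation term plus prior term of the gradient.
        g = -dh * (Rinv * (y - h)) + Pinv @ (x - xbar_f)
        x = x - step * g
    return x
```

For an emulator trained on an identity observation function with unit prior and observation variances, the update should settle near the midpoint between the forecast mean and the observation, as the Gaussian posterior requires.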
3. Demonstrations
Four toy demonstrations of the gpEnKF follow. Each one is an identical twin observing system simulation experiment (e.g., Crow and Van Loon 2006), where a truth state trajectory and an assimilation ensemble were sampled iid from the HMM simulator used for data assimilation. Observations synthesized from the synthetic truth system according to the observation function were assimilated using four filters: the EnKF, SIRF, ETKF and gpEnKF. The same size assimilation ensemble was used in each case and we ran 30 Monte Carlo repetitions of each experiment to account for the random truth system.
Filter performance was evaluated in terms of improvement in the expected squared-error of ensemble mean estimates, and in terms of the posterior probability of the truth system. We report the time-averaged normalized improvement of the expected squared-error of the ensemble mean:

$$ I^{mse} = 1 - \frac{\sum_t \left(\frac{1}{n}\sum_i \hat{x}^a_{i,t} - x^{true}_t\right)^2}{\sum_t \left(\frac{1}{n}\sum_i \hat{x}^f_{i,t} - x^{true}_t\right)^2} \qquad [9] $$

and the time-averaged log-scaled improvement of the probability of the truth system assuming Gaussian posterior distributions:

$$ I^{p} = \frac{1}{T}\sum_t \ln\!\left[\frac{\mathcal{N}\!\left(x^{true}_t;\, \mu^a_t, \Sigma^a_t\right)}{\mathcal{N}\!\left(x^{true}_t;\, \mu^f_t, \Sigma^f_t\right)}\right] \qquad [10] $$

$\mathcal{N}$ is the Gaussian distribution with first two moments estimated from the analysis sample calculated by each filter type. Both [9] and [10] were calculated separately for each state dimension, and the analog was calculated for each observation dimension. Larger $I^{mse}$ improvement scores represent more accurate posterior mean estimates and larger $I^{p}$ values represent lower chance of type II prediction error.
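The two scores can be sketched as follows for a single state dimension; this is one reading of the metrics, not the exact code used in the experiments:

```python
import numpy as np

def mse_improvement(truth, prior_means, post_means):
    """Normalized improvement of the squared error of the ensemble mean
    over time: 1 means the error is eliminated, 0 means no change, and
    negative values mean the analysis degraded the estimate."""
    prior_sse = np.sum((np.asarray(prior_means) - truth) ** 2)
    post_sse = np.sum((np.asarray(post_means) - truth) ** 2)
    return 1.0 - post_sse / prior_sse

def log_prob_improvement(truth, mu_f, var_f, mu_a, var_a):
    """Time-averaged log ratio of the Gaussian analysis density to the
    Gaussian forecast density evaluated at the truth: positive values
    mean the analysis assigns the truth more probability."""
    def logpdf(x, m, v):
        return -0.5 * (np.log(2.0 * np.pi * v) + (x - m) ** 2 / v)
    return float(np.mean(logpdf(truth, mu_a, var_a) - logpdf(truth, mu_f, var_f)))
```

The second score is the sense in which a filter reduces the chance of a type II error: the analysis density should place more mass on the truth than the forecast density did.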
In each example, the GPR observation function emulator used by the gpEnKF was
trained once before data assimilation using prior ensemble state and observation function
means for the entire simulation (integration) period. Each dimension of GPR training
variables was standardized to have zero mean and unit variance before hyperparameter
optimization.
3.1. Streamflow Forecasting
In the first toy demonstration, observations of river stage were assimilated into a
conceptual lumped rainfall-runoff model based on HyMod (Boyle 2000) under the
presumption that improved state estimates translate to improved streamflow forecasts
(Maurer and Lettenmaier 2003). HyMod is commonly used for proof-of-concept
demonstrations for hydrologic data assimilation (Bulygina and Gupta 2011; Moradkhani
et al. 2005; Vrugt et al. 2005; Vrugt et al. 2012; Weerts and El Serafy 2006) and is
described extensively in existing literature. The model requires precipitation [cm] and potential evapotranspiration [cm] as boundary conditions, and our implementation
used a 3-dimensional state vector including a soil moisture storage which acts as a control
on infiltration and evaporation, a linear slow-flow routing storage tank which simulates
groundwater, and a single linear quick-flow routing storage tank which simulates near-surface water. The soil moisture tank drains into the slow and quick routing tanks, which
are in parallel. Streamflow is estimated as a linear function of the routing tank states, and river stage ($h_t$ [m]) – an exponential function of streamflow ($q_t$, in [cm/m2] over the watershed) – was observed according to:

$$ h_t = \gamma_1 \exp(\gamma_2\, q_t) \qquad [11] $$

Parameters of [11] were chosen to match the NWS rating curve for the Leaf River Watershed in Mississippi, USA at the time of this publication. Vrugt et al. (2012) give a concise explanation of the HyMod state transition model we used here, except that ours had a single quick-flow state and we did not consider parameter uncertainty.
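An exponential stage-discharge relationship of the kind described here can be sketched as follows; the coefficients are placeholders, not the NWS Leaf River values:

```python
import math

# Hypothetical rating-curve coefficients (placeholders, NOT the NWS
# Leaf River values used in this experiment).
GAMMA1, GAMMA2 = 0.5, 0.002

def streamflow_to_stage(q):
    """Map areal streamflow to river stage through an exponential
    rating-curve form of the kind described in the text."""
    return GAMMA1 * math.exp(GAMMA2 * q)

def stage_to_streamflow(h):
    """Invert the rating curve with a log transform."""
    return math.log(h / GAMMA1) / GAMMA2
```

Because the forward map is strictly monotone, the inversion is exact, which is convenient when synthesizing stage observations from modeled streamflow.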
The truth systems and prior ensembles were forced with mean boundary conditions supplied by measurements taken at the Leaf River Watershed during the period Nov 5, 1952 through Feb 3, 1953. Forcing data uncertainty was prescribed Gaussian and independent in time. The state uncertainty distribution was also Gaussian, and synthetic observations were generated according to the observation function [11] plus Gaussian observation error.
Normalized posterior mean MSE and normalized posterior log-probabilities of truth are plotted in Figure 1 along with 1-standard-deviation error bars from Monte Carlo repetitions of the OSSE. There was no significant difference between the performance of any filter except for the SIRF, which had a negative improvement value for the soil moisture state. In this case the observation function was locally close to linear, and in some cases the mean improvement scores for the EnKF were higher than for either variant with a nonlinear observation function.
Sensitivity of the observation to state dimensions is illustrated in Figure 2, which plots an
example of the GPR observation function's ARD kernel inverse squared-correlation-length
hyperparameters; large values represent states to which the observation is sensitive.
According to this ARD sensitivity analysis, stage does not depend on soil moisture, so
any update to the soil moisture state will be due to the joint prior. This effect is small on
average for all four filters; however, it is larger for the EnKF than for the gpEnKF.
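The ARD diagnostic plotted in Figure 2 can be reproduced with any GPR library that supports per-dimension length-scales. A minimal scikit-learn sketch, using synthetic data rather than the HyMod emulator, where the target deliberately ignores one input dimension:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))      # three "state" dimensions
y = np.exp(2.0 * X[:, 0]) + 0.5 * X[:, 1]     # observation ignores dimension 2

# ARD kernel: one length-scale per input dimension, fit by marginal likelihood
kernel = RBF(length_scale=[1.0, 1.0, 1.0])
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-4).fit(X, y)

# inverse squared length-scales: large values flag sensitive state dimensions
sensitivity = 1.0 / gpr.kernel_.length_scale ** 2
```

The entry for dimension 2 should come out near zero, mirroring the Figure 2 result that stage is insensitive to soil moisture, so any soil moisture update must come through the joint prior.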
3.2. Root-Zone Soil Moisture Estimation
The second toy example assimilated active radar backscatter synthesized by the integral
equation model (IEM; Fung et al. 1996) into a simple physical soil moisture accounting
simulator based on Mahrt and Pan (1984). The soil moisture state transition equations and
their parameters are in Appendix A. These OSSE used the same precipitation and
potential evaporation boundary conditions as the HyMod example. Model state variables
were volumetric water content (θ [m3/m3]) in three layers with cumulative depths of 5, 15
and 30 [cm] respectively; 5 cm soil depth is typically assumed to be visible to L-band
active radar satellites (Entekhabi et al. 2010). Vertically and horizontally polarized
backscatter coefficients σ_vv and σ_hh were synthesized from the truth system 5 cm
soil moisture states with a surface roughness RMS height of s, a Gaussian roughness
autocorrelation length of l, a fixed incidence angle, and a Gaussian backscatter error
distribution.
It is interesting to note that this application is similar to the neural
network approach to estimating soil moisture of Pulvirenti et al. (2009), except that the
IEM was effectively inverted by Bayes' law rather than by training an explicit inversion
emulator as in Baghdadi et al. (2002).
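The distinction drawn here — inverting a forward model through Bayes' law rather than training a regression from observations back to states — can be illustrated with a toy 1-D inversion on a grid. The linear "backscatter-like" forward operator, the prior moments, and all numbers below are assumptions for illustration only.

```python
import numpy as np

def bayes_invert(y_obs, forward, prior_grid, prior_pmf, obs_sigma):
    """Grid-based Bayesian inversion of a forward observation operator.

    Rather than training an explicit inverse regression, evaluate the Gaussian
    likelihood of y_obs under the forward model at each grid point and
    renormalize (Bayes' law on a discrete grid).
    """
    lik = np.exp(-0.5 * ((y_obs - forward(prior_grid)) / obs_sigma) ** 2)
    post = lik * prior_pmf
    return post / post.sum()              # normalize to a discrete posterior

theta = np.linspace(0.05, 0.45, 400)                   # soil-moisture-like support
prior = np.exp(-0.5 * ((theta - 0.25) / 0.08) ** 2)    # Gaussian prior (assumed)
prior /= prior.sum()
# assumed linear backscatter-like forward operator: y = -20 + 30 * theta [dB]
post = bayes_invert(-12.0, lambda t: -20.0 + 30.0 * t, theta, prior, obs_sigma=1.0)
post_mean = float((theta * post).sum())
```

The posterior mean lands between the prior mean (0.25) and the value the noise-free inversion would give (about 0.267), weighted by the two precisions.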
Validation results for this soil moisture experiment are illustrated in Figure 3. Here again,
the parametric filters performed similarly and better than the nonparametric SIRF. Most
notably, the gpEnKF performance was similar to the MLEF performance. No statistically
significant differences were found between any filter when considering a single
performance metric.
3.3. Lorenz 3-D
The third toy problem was based on the 3-dimensional Lorenz attractor (Lorenz 1963)
and is a common example for data assimilation in nonlinear models (Evensen and van
Leeuwen 2000; Miller et al. 1994; Pham 2001; Vrugt et al. 2012). The Lorenz equations
were forced by perturbations η_t, and the state transition model:

dx/dt = σ(y − x);  dy/dt = x(ρ − z) − y;  dz/dt = xy − βz    [12]

was integrated by Runge-Kutta with a time step of Δt for T timesteps. The initial state
distribution was Gaussian, with standard chaotic-regime values of the parameters σ, ρ
and β. Observations were generated according to the nonlinear transform:

y_t = h(x_t) + ε_t    [13]

where the standard deviations of the three state dimensions were estimated from
ensemble means.
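A minimal fourth-order Runge-Kutta integration of the Lorenz-63 system [12] can be sketched as follows. The parameter values (σ = 10, ρ = 28, β = 8/3) and time step used here are the standard chaotic-regime defaults, assumed for illustration rather than taken from the experiments.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(state, dt=0.01):
    """One fourth-order Runge-Kutta step of the Lorenz system."""
    k1 = lorenz63(state)
    k2 = lorenz63(state + 0.5 * dt * k1)
    k3 = lorenz63(state + 0.5 * dt * k2)
    k4 = lorenz63(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 1.0, 1.0])
for _ in range(1000):          # 10 model time units; trajectory settles on the attractor
    state = rk4_step(state)
```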
Results are illustrated in Figure 4. In this case, the filters with nonlinear observation
functions seemed to outperform the EnKF and SIRF in terms of both evaluation criteria;
however, there was no significant difference between the performance metrics of any
filters. The application of Gaussian filters to the Lorenz system is sub-optimal (Cornford
et al. 2008); however, in our experiments all filters were able to improve the MSE of state
estimates in almost every OSSE.
4. Discussion
The innovation in this paper is the non-linearization of the HMM observation function
with a nonparametric regression, which allows a standard form of the gain expression
with a gradient local to each ensemble member. The posterior likelihood [6] and its
gradient [7] are linear combinations of terms related to the Bayesian prior and likelihood.
This means that any filter prior could be substituted for the Gaussian used in our
examples. Most notably, the likelihood portion of the gradient could be used in
conjunction with a kernel or copula prior, which is something that the MLEF does not
allow, since it localizes the posterior likelihood around the ensemble mean.
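The general mechanism — a variational state update that combines a prior term with a nonlinear-observation likelihood term — can be sketched with scipy. This is not the gpEnKF itself: the GPR emulator and its analytic gradient are replaced here by a toy cubic observation function, and all numbers are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def map_update(x_prior, P, y_obs, h, R):
    """Variational MAP state update for a Gaussian prior and nonlinear h.

    Minimizes the negative log-posterior
        J(x) = 0.5 (x - x_prior)' P^-1 (x - x_prior)
             + 0.5 (y_obs - h(x))' R^-1 (y_obs - h(x)).
    Any other differentiable log-prior could replace the first term.
    """
    Pinv, Rinv = np.linalg.inv(P), np.linalg.inv(R)

    def neg_log_post(x):
        dx, dy = x - x_prior, y_obs - h(x)
        return 0.5 * dx @ Pinv @ dx + 0.5 * dy @ Rinv @ dy

    return minimize(neg_log_post, x_prior, method="BFGS").x

# toy: cubic observation of a 2-D state (illustrative numbers)
x_a = map_update(
    x_prior=np.zeros(2), P=np.eye(2),
    y_obs=np.array([1.0]), h=lambda x: np.array([x[0] ** 3 + x[1]]),
    R=np.array([[0.1]]),
)
```

The analysis state moves the predicted observation most of the way toward the measurement, by an amount set by the prior and observation covariances.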
The standard filter gain expression comes at the cost of an implicit MLE update. The
computational cost of the proposed filter is difficult to calculate exactly since the solution
is variational; however, with m GPR training data points, one calculation of the gpEnKF
likelihood and gradient scales with m, whereas one EnKF update scales with the ensemble
size. We generally found convergence within 20 to 60 log-likelihood function evaluations
in all cases (all of our models were 3-D). When the observation function is highly
nonlinear, this cost may be acceptable given the improved accuracy of the update;
however, this choice will be application dependent.
In our toy examples, the gpEnKF performed as well as the MLEF and sometimes
better than the EnKF when the observation function was highly nonlinear. Parametric
filter performance was generally better than SIRF performance; resampling filters are
rarely used in land surface data assimilation because they generally require a large
simulation ensemble (Snyder et al. 2008) and often offer little accuracy benefit over the
EnKF (Kivman 2003; Nearing et al. 2012; van Leeuwen 2003; Weerts and El Serafy
2006).
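The large-ensemble requirement of resampling filters like the SIRF is usually monitored through the effective sample size of the particle weights; a minimal sketch (the weight values below are illustrative):

```python
import numpy as np

def effective_sample_size(log_weights):
    """Effective sample size N_eff = 1 / sum(w_i^2) for normalized weights.

    Computed from log-weights for numerical stability. N_eff near 1 signals
    the weight degeneracy that forces resampling filters to use large
    simulation ensembles.
    """
    lw = log_weights - np.max(log_weights)   # stabilize before exponentiating
    w = np.exp(lw)
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# uniform weights give N_eff = N; one dominant weight drives N_eff toward 1
n_eff_uniform = effective_sample_size(np.zeros(100))
n_eff_degenerate = effective_sample_size(np.array([0.0] + [-50.0] * 99))
```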
Any data assimilation application must balance effectiveness and efficiency:
computational cost is due to both the filter update and the HMM integration (size of the
ensemble), and the appropriateness of filter assumptions will depend on the NLDS and
the nature of the desired state estimates. OSSE are a good way to inform this tradeoff.
Appendix: A Three-Layer Mahrt-Pan Soil Moisture Model
We developed a three-layer soil moisture model based on the two-layer model presented
by Mahrt and Pan (1984). The state vector consists of volumetric water content in the soil
layers (θ_1, θ_2, θ_3); for our implementation we used layers with cumulative depths of
5, 15 and 30 [cm], so the soil layer thicknesses were Δz = 5, 10 and 15 [cm]. The model
requires a thin upper layer to simulate water available for direct evaporation. Integration
was on a daily time step, and forcing data were daily potential evaporation (E_p) and
daily cumulative precipitation (P). Model parameters were the Brooks-Corey hydraulic
coefficients: porosity (θ_s), residual moisture content (θ_r), bubbling pressure (ψ_b),
saturated hydraulic conductivity (K_s), and pore size distribution index (b); fixed values
of these parameters were used in all experiments.
At each time step, soil diffusivity (D) and unsaturated conductivity (K) were calculated
according to Brooks and Corey (1964):

D_i = (b K_s ψ_b / θ_s) (θ_i / θ_s)^(b+2)    [A.1.1]
K_i = K_s (θ_i / θ_s)^(2b+3)    [A.1.2]

Infiltration potential (I_p) was limited by the capacity of the top layer to transmit and
store water:

I_p = D_1 (θ_s − θ_1) / (Δz_1 / 2) + K_1    [A.2.1]
I_p ≤ (θ_s − θ_1) Δz_1 / Δt    [A.2.2]

and actual infiltration (I) was the smaller of the infiltration potential and the
precipitation rate:

I = min(P, I_p)    [A.2.3]

Evaporation from the top soil layer (E) was the smaller of the atmospheric demand and
the rate at which the layer can supply water:

E_max = D_1 (θ_1 − θ_r) / (Δz_1 / 2) + K_1    [A.3.1]
E = min(E_p, E_max)    [A.3.2]

Volumetric water content was updated as a layer-by-layer water balance, with interlayer
fluxes q_{i,i+1} = D_i (θ_i − θ_{i+1}) / Δz̄_{i,i+1} + K_i (where Δz̄ is the distance
between layer midpoints) and free gravitational drainage K_3 from the bottom layer:

θ_1 ← θ_1 + Δt (I − E − q_{1,2}) / Δz_1    [A.4.1]
θ_2 ← θ_2 + Δt (q_{1,2} − q_{2,3}) / Δz_2    [A.4.2]
θ_3 ← θ_3 + Δt (q_{2,3} − K_3) / Δz_3    [A.4.3]
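The three-layer water balance described above can be sketched as a runnable daily step. All parameter values here are illustrative placeholders for a generic soil, not the values used in the dissertation's OSSE, and the flux forms are Clapp-Hornberger-style approximations of the model described in this appendix.

```python
import numpy as np

# illustrative Brooks-Corey-style parameters (placeholders, not experiment values)
THETA_S, THETA_R, PSI_B, K_S, B = 0.45, 0.05, 20.0, 2.0, 4.0  # [-, -, cm, cm/day, -]
DZ = np.array([5.0, 10.0, 15.0])                              # layer thicknesses [cm]

def conductivity(theta):
    """Unsaturated conductivity K = K_s (theta/theta_s)^(2b+3)."""
    return K_S * (theta / THETA_S) ** (2 * B + 3)

def diffusivity(theta):
    """Soil water diffusivity D = (b K_s psi_b / theta_s)(theta/theta_s)^(b+2)."""
    return (B * K_S * PSI_B / THETA_S) * (theta / THETA_S) ** (B + 2)

def step(theta, precip, pot_evap, dt=1.0):
    """One daily water-balance step for the three-layer bucket."""
    theta = theta.copy()
    # infiltration limited by what the top layer can accept
    i_pot = diffusivity(theta[0]) * (THETA_S - theta[0]) / (DZ[0] / 2) + conductivity(theta[0])
    infil = min(precip, i_pot)
    # evaporation limited by what the top layer can supply
    e_max = diffusivity(theta[0]) * (theta[0] - THETA_R) / (DZ[0] / 2) + conductivity(theta[0])
    evap = min(pot_evap, e_max)
    # interlayer diffusive + gravitational fluxes; free drainage at the bottom
    q12 = diffusivity(theta[0]) * (theta[0] - theta[1]) / (0.5 * (DZ[0] + DZ[1])) + conductivity(theta[0])
    q23 = diffusivity(theta[1]) * (theta[1] - theta[2]) / (0.5 * (DZ[1] + DZ[2])) + conductivity(theta[1])
    qd = conductivity(theta[2])
    theta[0] += dt * (infil - evap - q12) / DZ[0]
    theta[1] += dt * (q12 - q23) / DZ[1]
    theta[2] += dt * (q23 - qd) / DZ[2]
    return np.clip(theta, THETA_R, THETA_S)   # keep states physically bounded

theta = step(np.array([0.25, 0.30, 0.35]), precip=1.0, pot_evap=0.3)
```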
Figures
Figure 1: Logarithms of the normalized posterior MSE (left; lower values are better) and
normalized posterior log-probabilities of the truth (right; higher values are better) from
assimilating stage observations into HyMod. Error-bars represent one standard deviation.
Figure 2: Inverse squared ARD correlation lengths for the HyMod GPR observation
function emulator.
Figure 3: Mahrt-Pan OSSE results from assimilating 5 [cm] soil moisture observations
(top) and radar backscatter observations σ_vv and σ_hh (bottom). Error-bars represent
one standard deviation.
Figure 4: Results from the Lorenz 3-D OSSE with the nonlinear observation function.
Error-bars represent one standard deviation.
References
Baghdadi, N., Gaultier, S., & King, C. (2002). Retrieving surface roughness and soil
moisture from synthetic aperture radar (SAR) data using neural networks. Canadian
Journal of Remote Sensing, 28, 701-711
Bishop, C.H., Etherton, B.J., & Majumdar, S.J. (2001). Adaptive sampling with the
ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review,
129, 420-436
Boyle, D.P. (2000). Multicriteria calibration of hydrologic models. In, Department of
Hydrology and Water Resources. Tucson, AZ: University of Arizona
Brooks, R.H., & Corey, A.T. (1964). Hydraulic properties of porous media. Hydrology
Papers, Colorado State University
Bulygina, N., & Gupta, H. (2011). Correcting the mathematical structure of a
hydrological model via Bayesian data assimilation. Water Resources Research, 47
Cornford, D., Shen, Y., Vrettas, M., Opper, M., Archambeau, C., & Shawe-Taylor, J.
(2008). Approximations in non-Gaussian data assimilation: when does it matter? In, RSS /
RMS joint meeting on non-Gaussian / non-linear aspects of data assimilation, 10 April
2008. London, UK: Royal Statistical Society
Crow, W.T., & Van Loon, E. (2006). Impact of incorrect model error assumptions on the
sequential assimilation of remotely sensed surface soil moisture. Journal of
Hydrometeorology, 7, 421-432, doi:10.1175/JHM499.1
Entekhabi, D., Njoku, E.G., O'Neill, P.E., Kellogg, K.H., Crow, W.T., Edelstein, W.N.,
Entin, J.K., Goodman, S.D., Jackson, T.J., Johnson, J., Kimball, J., Piepmeier, J.R.,
Koster, R.D., Martin, N., McDonald, K.C., Moghaddam, M., Moran, S., Reichle, R., Shi,
J.C., Spencer, M.W., Thurman, S.W., Tsang, L., & Van Zyl, J. (2010). The Soil Moisture
Active Passive (SMAP) Mission. Proceedings of the IEEE, 98, 704-716
Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical
implementation. Ocean Dynamics, 53, 343-367, doi:10.1007/s10236-003-0036-9
Evensen, G., & van Leeuwen, P.J. (2000). An ensemble Kalman smoother for nonlinear
dynamics. Monthly Weather Review, 128, 1852-1867
Fung, A.K., Dawson, M.S., Chen, K.S., Hsu, A.Y., Engman, E.T., O'Neill, P.O., &
Wang, J. (1996). A modified IEM model for scattering from soil surfaces with
application to soil moisture sensing. IGARSS '96. 1996 International Geoscience and
Remote Sensing Symposium. Remote Sensing for a Sustainable Future (Cat.
No.96CH35875)
Gordon, N.J., Salmond, D.J., & Smith, A.F.M. (1993). Novel approach to nonlinear
non-Gaussian Bayesian state estimation. IEE Proceedings-F Radar and Signal Processing,
140, 107-113, doi:10.1049/ip-f-2.1993.0015
Kivman, G.A. (2003). Sequential parameter estimation for stochastic systems. Nonlin.
Processes Geophys., 10, 253-259
Lawrence, N. (2005). Probabilistic non-linear principal component analysis with
Gaussian process latent variable models. Journal of Machine Learning Research, 6,
1783-1816
Lorenz, E.N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric
Sciences, 20, 130-141
Mahrt, L., & Pan, H. (1984). A 2-layer model of soil hydrology. Boundary-Layer
Meteorology, 29, 1-20
Maurer, E.P., & Lettenmaier, D.P. (2003). Predictability of seasonal runoff in the
Mississippi River basin. Journal of Geophysical Research-Atmospheres, 108
Miller, R.N., Ghil, M., & Gauthiez, F. (1994). Advanced data assimilation in strongly
nonlinear dynamical-systems. Journal of the Atmospheric Sciences, 51, 1037-1056
Moradkhani, H., Hsu, K.L., Gupta, H., & Sorooshian, S. (2005). Uncertainty assessment
of hydrologic model states and parameters: Sequential data assimilation using the particle
filter. Water Resources Research, 41
Neal, R.M. (1996). Bayesian Learning for Neural Networks. New York: Springer
Nearing, G.S., Crow, W.T., Thorp, K.R., Moran, M.S., Reichle, R.H., & Gupta, H.V.
(2012). Assimilating remote sensing observations of leaf area index and soil moisture for
wheat yield estimates: An observing system simulation experiment. Water Resources
Research, 48
Pham, D.T. (2001). Stochastic methods for sequential data assimilation in strongly
nonlinear systems. Monthly Weather Review, 129, 1194-1207
Pulvirenti, L., Ticconi, F., & Pierdicca, N. (2009). Neural network emulation of the
Integral Equation Model with multiple scattering. Sensors, 9, 8109-8125
Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning.
Cambridge, MA: MIT Press
Snyder, C., Bengtsson, T., Bickel, P., & Anderson, J. (2008). Obstacles to
high-dimensional particle filtering. Monthly Weather Review, 136, 4629-4640
Titsias, M.K., & Lawrence, N.D. (2010). Bayesian Gaussian process latent variable
model. Proceedings of the 13th International Conference on Artificial Intelligence and
Statistics (AISTATS), 844-851
van Leeuwen, P.J. (2003). A variance-minimizing filter for large-scale applications.
Monthly Weather Review, 131, 2071-2084
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., & Verstraten, J.M. (2005). Improved
treatment of uncertainty in hydrologic modeling: Combining the strengths of global
optimization and data assimilation. Water Resources Research, 41, 17
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., & Schoups, G. (2012). Hydrologic data
assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and
applications. Advances in Water Resources
Wang, J.M., Fleet, D.J., & Hertzmann, A. (2008). Gaussian process dynamical models
for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30,
283-298
Weerts, A.H., & El Serafy, G.Y.H. (2006). Particle filtering and ensemble Kalman
filtering for state updating with hydrological conceptual rainfall-runoff models. Water
Resources Research, 42
Zupanski, M. (2005). Maximum likelihood ensemble filter: Theoretical aspects. Monthly
Weather Review, 133, 1710-1726