Model Reduction for Dynamic Real-Time
Optimization of Chemical Processes
Cover design by Rob Bergervoet
© Copyright 2005
Model Reduction for Dynamic Real-Time
Optimization of Chemical Processes
DISSERTATION

for the degree of doctor at the Technische Universiteit Delft,
by the authority of the Rector Magnificus, prof.dr.ir. J.T. Fokkema,
chairman of the Board for Doctorates,
to be defended in public
on Thursday 15 December 2005 at 13:00

by

Jogchem VAN DEN BERG

mechanical engineer

born in Enschede
This dissertation has been approved by the promotor:
Prof.ir. O.H. Bosgra

Composition of the doctoral committee:

Rector Magnificus, chairman
Prof.ir. O.H. Bosgra, Technische Universiteit Delft, promotor
Prof.dr.ir. A.C.P.M. Backx, Technische Universiteit Eindhoven
Prof.ir. J. Grievink, Technische Universiteit Delft
Dr.ir. P.J.T. Verheijen, Technische Universiteit Delft
Prof.dr.-Ing. H.A. Preisig, Norwegian University of Science and Technology
Prof.dr.-Ing. W. Marquardt, RWTH Aachen
Prof.dr.ir. P.A. Wieringa, Technische Universiteit Delft
Published by: OPTIMA
P.O. Box 84115
3009 CC Rotterdam
The Netherlands
Telephone: +31-102201149
Telefax: +31-104566354
E-mail: [email protected]
ISBN 90-8559-152-x
Keywords: chemical processes, model reduction, optimization.
© Copyright 2005 by Jogchem van den Berg
All rights reserved. No part of the material protected by this copyright notice
may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and
retrieval system, without written permission from Jogchem van den Berg.
Printed in The Netherlands.
Preface

After a memorable period of almost seven years, the result of my research is finally down in black and white. And that feels good.

During my graduation project I became increasingly convinced that it would be well worth the effort to pursue doctoral research. After a few conversations with Ton Backx and Okko Bosgra about an international project, the subject of model reduction came up, and I immediately believed it would be a fascinating topic.

I look back with great pleasure on the absurdist conversations, alternated with intense in-depth technical discussions. Usually things started out serious, but fortunately there was always someone with a refreshing remark to put the importance of the matter into perspective.

A few people I would like to thank in particular. First of all, of course, Okko, who gave me all the freedom to follow my own course on the basis of our always interesting technical discussions. I would like to thank Adrie for his passionate explanations of everything related to chemistry and for acting as a sounding board for me. I want to thank my roommates Rob and Dennis, nestor David, Martijn, Eduard, Branko, Camile, Gideon, Leon, Maria, Martijn, Matthijs and Agnes, Carsten, Debbie, Peter, Piet, Sjoerd and Ton for all the coffee-table conversations and pub talk. I also want to thank my colleagues on the project, among whom Martin, Jitendra, Wolfgang Marquardt, Mario, Jobert, Wim, Sjoerd, Celeste, Piet-Jan, Pieter, Peter Verheijen and Johan Grievink. Without you the project would certainly not have succeeded.

Finally, I want to thank my parents for the unconditional support I have always received from them. My brother Mattijs and Kirsten for Lynn, for whom I can now finally be a proper indulgent uncle. Rob for designing the cover of my thesis, and my other friends, who all this time had to listen to stories about the ups and downs I went through during my doctoral years. On to the next challenge!

Jogchem van den Berg
Rotterdam, October 2005
Contents

Preface . . . v

1 Introduction and problem formulation . . . 1
1.1 Introduction . . . 1
1.2 Problem exploration . . . 4
1.3 Literature on nonlinear model reduction . . . 17
1.4 Solution directions . . . 25
1.5 Research questions . . . 28
1.6 Outline of this thesis . . . 29

2 Model order reduction suitable for large scale nonlinear models . . . 31
2.1 Model order reduction . . . 31
2.2 Balanced reduction . . . 37
2.3 Proper orthogonal decomposition . . . 45
2.4 Balanced reduction revisited . . . 50
2.5 Evaluation on a process model . . . 62
2.6 Discussion . . . 73

3 Dynamic optimization . . . 75
3.1 Base case . . . 75
3.2 Results . . . 85
3.3 Model quality . . . 86
3.4 Discussion . . . 88

4 Physics-based model reduction . . . 89
4.1 Rigorous model . . . 89
4.2 Physics-based reduced model . . . 95
4.3 Model quality . . . 101
4.4 Discussion . . . 105

5 Model order reduction by projection . . . 107
5.1 Introduction . . . 107
5.2 Projection of nonlinear models . . . 110
5.3 Results of model reduction by projection . . . 113
5.4 Discussion . . . 123

6 Conclusions and future research . . . 127
6.1 Conclusions . . . 127
6.2 Future research . . . 130

Bibliography . . . 140

List of symbols . . . 141

A Gramians . . . 143
A.1 Balancing transformations . . . 143
A.2 Perturbed empirical Gramians . . . 145

B Proper orthogonal decomposition . . . 147

C Nonlinear Optimization . . . 149

D Gradient information of projected models . . . 153

Summary . . . 155

Samenvatting . . . 157

Curriculum Vitae . . . 159
Chapter 1
Introduction and problem
formulation
This thesis explores the possibilities of model reduction techniques for online optimization-based control in the chemical process industry. The success of this control approach on industrial-scale problems started in the petrochemical industry and is now being adopted by the chemical process industry. This is a challenge because the operation of chemical plants differs from petrochemical plant operation, imposing different requirements on optimization-based control and consequently on process models. For online optimization-based control the available computational time is limited, so computational load is a critical issue. This thesis focuses on the models and their contribution to online optimization-based control.
1.1 Introduction
The value of models in the process industries is apparent in practice and in the literature, where numerous successful applications of steady-state plant design optimization and model-based control are reported. This development was boosted by maturing commercial modelling tools and continuously increasing computing power. A side effect of this development is that not only can larger processes with more unit operations be modelled, but each unit operation can be modelled in more detail as well. Especially spatially distributed systems, such as distillation columns and tubular reactors, are well-known model size boosters.
Large-scale models are in principle not a problem; at most they are inconvenient for the model developer because of the long simulation times involved. Application of models in an online setting, however, pushes the demands on models to their limits, since the available time for computations is limited. Numerous solutions are conceivable that contribute to solving this issue, varying from buying faster computers to solving approximate, computationally less demanding, control problems.
Industry focuses on the implementation of effective optimization-based control solutions, whereas university groups focus on understanding the (in)effectiveness of these solutions. Understanding gives direction to the development of more effective solutions suitable for industrial applications. This thesis is the result of close collaboration between industry and universities, aiming at a symbiosis of these two different focuses.
Online optimization
From digital control theory (see e.g. Åström and Wittenmark, 1997) we know that sampling introduces a delay in the control system, limiting controller bandwidth. The sampling period is therefore preferably chosen as short as possible. A similar situation holds for online optimization. When applying online optimization-based control we can make a tradeoff between a high-precision solution with a long sampling period and an approximate solution at a short sampling period. In case of a fairly good model and low-frequency disturbances, a low sample rate would most probably be sufficient. However, a tight quality constraint in combination with a fast response to a disturbance for the same system will require a much higher controller bandwidth to maintain performance. So, the tradeoff between solution accuracy and sampling period depends on plant-model mismatch, disturbance characteristics, plant dynamics and the presence of constraints.
With current modelling packages, processes can be modelled in great detail and model accuracy seems unquestionable. Unfortunately, in reality we always have to deal with uncertain parameters and stochastic disturbances. One can think of heat exchanger fouling, catalyst decay, uncertain reaction rates and uncertain flow patterns. This motivates the necessity of a feedback mechanism that deals with disturbances and uncertainties at a suitable sampling interval.
So we can develop large-scale models, based on modelling assumptions, that have some degree of accuracy. Still, we need to estimate some key uncertain parameters online (e.g. heat exchanger fouling) from data. Why not choose a shorter sampling period, enabling a higher controller bandwidth, by allowing some model approximation? At some point there must be a break-even point where the performance of a controller based on an accurate model at a low sampling rate is as good as that of a controller based on a less accurate model at a high sampling rate. Applications of linear model predictive controllers to mildly nonlinear processes illustrate this successful tradeoff between model accuracy and sample frequency. This observation is the main motivation to research and develop nonlinear model approximation techniques for online optimization-based control. The aim of such an approximate model is to improve performance by adding accuracy to the predictive part without being overtaken by the downside of a slightly longer sampling period.
Model reduction
Model reduction is not an end in itself. Without being specific about what reduction aims at, model reduction is meaningless. In this thesis we will assess the value of model reduction techniques for dynamic real-time optimization.

In system theory we associate model reduction with model-order reduction, which implies a reduction of the number of differential equations. Because linear model reduction was successful, this notion was carried over to the reduction of process models governed by nonlinear differential and algebraic equations (DAE). The first difference between these two model types is obviously that DAE process models are nonlinear in their differential part. This nonlinearity was precisely the extension we looked for, since it should improve model accuracy. The second difference is that DAE process models consist of many algebraic equations. For a set of (index-one) nonlinear differential algebraic equations we generally cannot eliminate the algebraic equations by analytical substitution to arrive back at ordinary differential equations. This implies that we deal with a truly different model structure if implicit algebraic equations are present. This difference has far-reaching consequences for the notion of reduction of nonlinear models, as will become clear later in this thesis.
As discussed in the previous section, both the solution accuracy and the computational load of the online optimization determine the online performance. For online optimization-based control we in principle do not care about the exact number of differential or algebraic variables of a model; model reduction should result in a reduction of the optimization time while accepting some small degradation of solution accuracy. Model approximation is an alternative term for the kind of model reduction used in this thesis, since it is less associated with model-order reduction alone.
A model is characterized not only by its number of differential and algebraic equations, but also by structural properties such as sparsity and controllability/observability. Time scales, nonlinearity and steady-state gains are important properties as well. The relation between model properties on the one hand and computational load and accuracy on the other is not always trivial, which can result in unexpected findings when evaluating model reduction techniques. Note further that the model is not the only degree of freedom within the optimization framework. The success of model reduction for online optimization-based control depends on the optimization strategy and on implementation details such as the choice of solvers and solver options.
Realizing that the final judgement of the success of a model reduction technique for online optimization depends on many different implementation choices, we elaborate in the next section on the different aspects that affect the optimization problem.
1.2 Problem exploration
In this section we will explore different aspects to be considered when discussing dynamic real-time optimization of large-scale nonlinear chemical processes. First of all we need a model representing the process behavior. Building one is not a trivial task, but it is nowadays supported by various commercially available tools. Such a model is then confronted with plant data to assess its validity, which in general requires changes to the model. After several iterations we end up with a validated model, ready for use within online optimization. Since we are interested in the future dynamic evolution of some key process output variables, we require simulation techniques to relate them to future process input variables. Computation of the best future inputs is done by formulating and solving an optimization problem. We will touch on the differences between the two main implementation variants for solving such a dynamic optimization problem. Finally, we will motivate the need for model reduction by showing the computational consequences of a straightforward implementation of this optimization problem for the model size that is common for industrial chemical processes.
Mathematical models
Numerous different names are available for models, each referring to a specific property of the model. One can distinguish between models based on conservation laws and data-driven models. The first class of models is referred to as first-principles, fundamental, rigorous or theoretical models; these are in general nonlinear dynamic continuous-time models, formulated as a set of ordinary differential equations (ODE model) or a set of differential algebraic equations (DAE model).

The second class of models is referred to as identified, step-response or impulse-response models; these are in general linear discrete-time input-output regressive models. Nonlinear static models can be added to these linear dynamic models, giving an overall nonlinear behavior. The subdivision is not as black and white as stated here, and combinations of both classes are possible, as the term hybrid models already implies.
Process models used for dynamic optimization are in general formulated as a set of differential algebraic equations. In the special case that all algebraic equations can be eliminated by substitution, the DAE model can be rewritten as a set of ordinary differential equations. The distinction between these two model types is important because of their different numerical integration properties, which will be treated later in this chapter. Characteristic for models in the process industry are physical property relations, which generally do not allow elimination by substitution and to a large extent contribute to the number of algebraic equations. Physical property calculations can also be hidden in an external module interfaced to the simulation software. Caution is required when interfacing the two pieces of software.¹
Partial differential equations (PDE models) describe microscopic conservation laws and emerge naturally when spatial distributions are modelled. Classical examples are the tubular heat exchanger and the tubular reactor. Although this type of model is explicitly mentioned here, it can be translated into a DAE model that approximates the true partial differential equations.
DAE and ODE models both have internal state variables, which implies that the effect of all past inputs on the future can be captured by a single value of all model variables at time zero. Most identified models, in contrast, are regressive and require historical data to capture the future effect of past inputs. A disadvantage of continuous-time nonlinear models is the computational effort for simulation, whereas simulation with a discrete-time model is trivial. On the other hand, the stability of a nonlinear discrete-time model is nontrivial. An advantage of rigorous models is that they generally have a larger validity range than identified models because of the fundamental origin of their nonlinear equations. Nonlinear identification techniques for dynamic processes are emerging but are still far from mature.
Although modelling can still be tedious, developments in commercial process simulation and modelling tools such as gPROMS and Aspen Custom Modeler allow people with different modelling skills to build and use rigorous models quite efficiently. A graphical user interface with drag-and-drop features has increased accessibility for a wider audience than only diehard command-prompt programmers. Still, thorough modelling knowledge is required to deal with fundamental problems such as index problems.
Every mathematical model can be described by its structure and its parameters. A fundamental question is how to decide on the model structure and how to interpret a mismatch between plant data and simulated data. Do we need to change the model structure, or do we need to change model parameter values? No tools are available to discriminate between those two options, let alone to help find a better structure.
¹ Error handling in the external module can conflict with error handling by the simulation software.
In case we do have a match between plant and simulated data, we could still have the situation that not all parameters can be uniquely determined from the available plant data. The danger is that the match for this specific data set can be satisfactory while a new data set gives a terrible match. Whenever possible it seems wise to arrange parameters in order of uncertainty. Dedicated experiments can decrease the uncertainty of specific parameters such as activity coefficients and pre-exponential factors in an Arrhenius equation, or of physical properties such as specific heat. Besides parameter uncertainty we have structural uncertainty, due to lumping based on uncertain flow patterns or uncertain reaction schemes. In some cases we can exchange structural uncertainty for parameter uncertainty by adding a structure that can be deactivated by a parameter. The risk is that we end up with too many parameters to be determined uniquely from the available data. Computation of the values of model parameters will be discussed in the next section.
Identification and validation
Computation of parameter values can be formulated as a parameter optimization problem; it is referred to as a parameter estimation problem in case the model structure is based on first principles. In case the model structure is motivated by mathematical arguments, identification is the term more commonly used for the procedure. Basically, both boil down to a parameter optimization problem minimizing some error function.
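As a minimal illustration of how both procedures reduce to the same computation, the sketch below fits the two parameters of a hypothetical first-order step-response model to synthetic "plant" data by least squares; the model structure, the data and the noise level are assumptions chosen purely for illustration.

    # Minimal sketch: parameter estimation/identification as an
    # error-minimizing optimization.  Model and data are hypothetical.
    import numpy as np
    from scipy.optimize import least_squares

    t = np.linspace(0.0, 10.0, 50)                    # time grid
    y_plant = 2.0 * (1.0 - np.exp(-t / 3.0))          # "measured" step response
    y_plant += 0.05 * np.random.default_rng(0).normal(size=t.size)

    def residuals(theta):
        """Error function: plant data minus model prediction."""
        gain, tau = theta
        return y_plant - gain * (1.0 - np.exp(-t / tau))

    # Whether theta has a physical meaning (estimation) or a purely
    # mathematical one (identification), the optimization is the same.
    sol = least_squares(residuals, x0=[1.0, 1.0])
    print("estimated gain and time constant:", sol.x)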
Important for model validation are the model requirements. Typically, model requirements are defined in terms of an error tolerance on key process variables over some predefined operating region. Most often these are steady-state operating points, but the requirements can also be checked for dynamic operation. Less common is a model requirement defined in terms of a maximum computational effort. The objective of a parameter identification is to find those parameters that minimize the error over this operating region.
The resulting parameter values of either estimation or identification are then ready for validation. During the validation procedure the parameter values are fixed and the model is used to generate predictions on new input-output data. This split of data is also referred to as the estimation data set and the validation data set. From a more philosophical point of view one might better refer to model validation as model unfalsification (Kosut, 1995); a model is valid until proven otherwise. This touches on the problem that for nonlinear models not all possible input signals can be validated against plant data. For linear models we can assess the model error because of the superposition principle and the duality between time and frequency domain.
Model identification is a data-driven approach to develop models. The required data can be obtained either during normal operation or, as in most cases, from dedicated experiments (e.g. step response tests). An elegant property of this approach is that the identified model is both observable and controllable, which is typically not the case for rigorous models. Since only a very limited number of modes are controllable and observable, this results in low-order models. In that sense rigorous modelling can learn from model identification techniques.
Linear model identification is a mature area whereas for nonlinear identification several techniques are available (e.g. Volterra series, neural nets and
splines) without a thorough theoretical foundation. Neural networks have a very
flexible mathematical structure with many parameters. By means of a parameter optimization, referred to as training of the neural net, an error criterion is
minimized. The result is tested (validated) against data that was not used for
training.
Many papers have been written on this appealing topic, with the main focus on the parameter optimization strategy and the internal approximative functions and structure. The danger of neural nets is over-fitting, which results in poor interpolative predictions. Over-fitting implies that the data used for training is not rich enough to uniquely determine all parameters (comparable to an under-determined or ill-conditioned least-squares solution). Extrapolative predictive capability is acknowledged to be very bad (Can et al., 1998), and one is even advised to train the neural net with data that encloses a little more than the relevant operating envelope. This reveals another weak spot of this type of data-driven model, since data is required at operating conditions that are undesired. Lots of data is required, which can be very costly if these data have to be generated by dedicated tests. A validated rigorous model can take away part of this problem when used as an alternative data generator.
Simulation
Simulation of linear and discrete-time models is a straightforward task, whereas simulation of continuous-time nonlinear models is more involved. In general, simulation is executed by a numerical integration routine, available in many different variants. Basic variants are described in textbooks such as those by Shampine (1994), Dormand (1996) and Brenan et al. (1996). The main problem is that the efficiency of these routines is strongly affected by heuristics in, e.g., error, step-size and prediction-order control, which are not easy to see through.
Easily understandable are fixed step-size explicit integration routines like Euler and Runge-Kutta schemes. The main problem here is their poor efficiency for stiff systems due to the small integration step-size: the stability region of explicit integration schemes limits the step size, whereas the stability of implicit integration routines does not depend on step-size. Implicit fixed step-size integration routines can be viewed as an optimization solved by iterative Newton steps. A well-known property of this Newton-step-based optimization is its fast convergence, given a good initial guess. Numerous approaches based on different interpolation polynomials are available for this initial guess, among which the Gear predictor (Dormand, 1996) is probably the best known.
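To make the stability contrast concrete, here is a minimal sketch on a toy stiff scalar ODE (an assumption for illustration, not one of the process models considered in this thesis): explicit Euler diverges for a step-size outside its stability region, while implicit Euler, solved per step by Newton iterations started from the previous point, remains stable for any step-size.

    # Fixed step-size explicit vs implicit Euler on dx/dt = -50 x, x(0) = 1.
    # With h = 0.1 the explicit scheme has |1 + h*lambda| = 4 > 1 and blows up;
    # the implicit scheme is stable for any h.
    def f(x):
        return -50.0 * x

    def dfdx(x):
        return -50.0

    h, x_exp, x_imp = 0.1, 1.0, 1.0
    for _ in range(20):
        # explicit Euler: one function evaluation per step
        x_exp = x_exp + h * f(x_exp)
        # implicit Euler: solve g(z) = z - x_imp - h*f(z) = 0 by Newton steps,
        # with the previous point as the (good) initial guess
        z = x_imp
        for _ in range(5):
            z = z - (z - x_imp - h * f(z)) / (1.0 - h * dfdx(z))
        x_imp = z

    print(f"explicit Euler: {x_exp:.3e} (unstable)")
    print(f"implicit Euler: {x_imp:.3e} (decays, as it should)")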
Routines with a variable step-size are more tedious to understand, due to heuristics in step-size control. This control balances step-size against the number of Newton steps needed for convergence, with the objective of minimizing computational load. Similarly, the order of the prediction mechanism may be variable and controlled by heuristics. Inspection of all options reveals that many handles are available to influence the numerical integration (e.g. absolute tolerance, relative tolerance, convergence tolerance, maximum iterations, maximum iterations without improvement, effective zero, perturbation factor, pivot search depth, etc.). Fixed step-size numerical integration routines exhibit a variable numerical integration error with a pre-computed upper bound, whereas variable step-size routines maximize the step-size subject to a maximum integration error tolerance.
Experience teaches that consistent initialization of DAE models is a delicate issue and far from trivial, since it reduces to an optimization problem with as many degrees of freedom as there are variables to be initialized (easily over tens of thousands of variables). Not only does a good initial guess speed up convergence, in practice it appears to be a necessity; with a default initial guess, initialization will most probably fail. Modelling of DAE systems in practice is done by developing a small model that is gradually extended, reusing previously converged values as an (incomplete) initial guess for both the algebraic and the differential variables.
Numerical integration routines were developed for autonomous systems. Discontinuities can be handled, but at the cost of a (computationally expensive) re-initialization. Since a digital controller would introduce a discontinuity at every sample time, and consequently require a re-initialization, it can be attractive to approximate the digital controller by its analogue (continuous) equivalent if possible. For simulation of optimal trajectories defined on a basis of discontinuous functions, it might be worthwhile to approximate the trajectory by a set of continuous and differentiable basis functions. This reduces the number of re-initializations and can therefore improve computational efficiency, unless the step-size has to be reduced drastically where the differentiable approximation introduces large gradients.
Selecting a solver and fine-tuning solver options to balance speed and robustness is a tedious exercise and makes it hard to derive general conclusions about the different available solvers. Generally, models of chemical processes exhibit different time scales (stiff systems) and a low degree of interaction between variables (sparsity). Sparse implicit solvers deal with this type of model very efficiently.
Optimization
As in the world of modelling, the field of dynamic optimization has its own jargon to address specific characteristics of the problem. Most optimization problems in the process industry can be characterized as non-convex, nonlinear, constrained optimization problems. In practice this implies that only locally optimal solutions can be found instead of globally optimal solutions.
The presence of constraints requires constraint handling, which can be done in different ways (see e.g. the textbooks by Nash and Sofer, 1996 and Edgar and Himmelblau, 1989). Often the constraints are multiplied by Lagrange multipliers and added to the objective, which transforms the original optimization problem into an unconstrained optimization problem. We can distinguish between penalty and barrier functions. The penalty function approach allows (intermediate) solutions that violate constraints (most probably they will, since solutions tend to lie at the constraints), whereas the barrier function approach requires a feasible initial guess and guarantees feasibility from there on. Finding a feasible initial guess can already be very challenging, which explains the popularity of the penalty function approach.
For steady-state plant design optimization (Floudas, 1995), typical optimization parameters are equipment sizes, recycle flows and operating conditions like temperature, pressure and concentration. Discrete decision variables that determine the type of equipment (or the number of distillation trays) yield a computationally hard optimization known as a mixed-integer nonlinear program (MINLP).
The optimization problem to be solved for the computation of optimal input trajectories is referred to as a dynamic optimization problem and generally assumes smooth nonlinear models without discontinuities. Using a parametrization of these trajectories by means of basis functions and coefficients, such a problem can be written as a nonlinear program. The choice of basis functions determines the set of possible solutions. A typical set of basis functions consists of functions that are one on a specific time interval and zero otherwise. This basis allows a progressive distribution of decision variables over time, which is very commonly used in online applications. A progressive basis reflects the desire (or expectation!) to have an optimal solution with (possibly) high-frequency control moves at the beginning and low-frequency control moves towards the end of the control horizon. Since a clever choice of basis functions could reduce the number of basis functions (and consequently the number of parameters for optimization), this is an interesting field of research.
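A minimal sketch of such a progressive piecewise-constant parametrization is given below; the horizon, the breakpoint grid and the coefficient values are hypothetical, chosen only to show how a handful of coefficients covers a long horizon.

    # Piecewise-constant input basis on a progressive grid: interval lengths
    # double towards the end of the horizon, so early control moves are dense
    # and late ones sparse.  Seven coefficients replace horizon/ts values.
    import numpy as np

    edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])  # minutes
    coeffs = np.array([0.8, 1.0, 0.7, 0.9, 0.6, 0.5, 0.4])  # one per interval

    def u(t):
        """Input at time t: the coefficient of the interval containing t."""
        k = np.searchsorted(edges, t, side="right") - 1
        return coeffs[min(max(k, 0), coeffs.size - 1)]

    print([u(t) for t in (0.5, 3.0, 12.0, 50.0)])   # [0.8, 0.7, 0.6, 0.4]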
For the solution of dynamic optimization problems we need to distinguish between the sequential and the simultaneous approach (Kraft, 1985; Vassiliadis, 1993). The sequential approach computes a function evaluation by simulation of the model, followed by a gradient-based update of the solution. This sequence is repeated until solution tolerances are satisfied (converged solution) or some other termination criterion is met (non-converged solution).
In the simultaneous approach, also referred to as the collocation method (Neuman and Sen, 1973; Biegler, 2002), not only the input trajectory is parameterized but the state trajectories as well. Each trajectory is described by a set of basis functions and coefficients from which the time derivatives can be computed. At each discrete point in time this trajectory time derivative should satisfy the time derivative defined by the model equations. This results in a nonlinear program (NLP) type of optimization problem where the objective is minimized subject to a very large set of coupled equality constraints representing the process behavior. The free parameters of this NLP are both the parameters that define the input trajectory and the parameters that describe the state trajectories. Since mathematically there is no difference between these parameters and all parameters are updated together at each iteration step, this method is called the simultaneous approach.
In general the sequential approach outperforms the simultaneous approach for large systems. This is not a rigid conclusion, since in both areas researchers are developing better algorithms exploiting structure and computationally cheap approximations. Note that in the sequential approach the model equations are satisfied at all (intermediate) solutions by means of simulation, whereas in the simultaneous approach intermediate solutions generally do not satisfy the model equations. Note furthermore that, with a fixed input trajectory, the collocation method is an alternative to numerical integration.
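The sketch below illustrates the sequential (single-shooting) idea on a toy first-order process; the model, the target and the input basis are hypothetical. Every objective evaluation runs a full simulation, so the model equations hold at every intermediate solution, and the optimizer updates only the input parameters.

    # Sequential approach: simulate, evaluate the objective, update the inputs.
    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import minimize

    t_grid = np.linspace(0.0, 4.0, 5)          # four piecewise-constant moves

    def simulate(u_moves):
        """Function evaluation by simulation of dx/dt = -x + u."""
        x, xs = 0.0, []
        for k in range(4):
            sol = solve_ivp(lambda t, s: -s + u_moves[k],
                            (t_grid[k], t_grid[k + 1]), [x])
            x = sol.y[0, -1]
            xs.append(x)
        return np.array(xs)

    def objective(u_moves):
        return np.sum((simulate(u_moves) - 1.0) ** 2) + 1e-3 * np.sum(u_moves ** 2)

    res = minimize(objective, x0=np.zeros(4))  # gradient-based update (BFGS)
    print("optimal input moves:", np.round(res.x, 3))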
Both the sequential and the simultaneous approach are implemented as an approximate Newton-step type of optimization. The Hessian is approximated by an iterative scheme that efficiently reuses derivative information; a true Newton step is simply not worthwhile because of its computational load. Optimization routines require the sensitivity of the objective function (and constraints) with respect to the optimization parameters. This sensitivity is reflected by partial derivatives that can be computed by numerical perturbation or, in special cases, by analytical derivatives.
Jacobian information generated during simulation can be used to build a linear time-variant model along the trajectory, which proves to be an efficient and suitable approximation of the partial derivatives. Furthermore, parametric sensitivities can also be derived by integration of the sensitivity equations or by solving the adjoint equations. Reuse of Jacobian information from the simulation and exploitation of structure can reduce the computational load, resulting in an attractive alternative.
Industrial process operation and control
Process operation covers a very wide area and involves different people throughout the company. The main objective of plant operation is to maximize the profitability of the plant.

The primary task of plant operation is safeguarding. Safety of people and the environment always gets the highest priority. In order to achieve this, hardware measures are implemented. Furthermore, measurements are combined to determine the status of the plant. If a dangerous situation is detected, a prescribed scenario is launched that shuts down the plant safely. For the detection as well as for the development of scenarios, models can be employed. Fault detection can be considered a subtask within the safeguarding system. It involves the determination of the status of the plant. A fault does not always induce a plant shutdown but can also trigger a maintenance action.
Basic control is the first level in the hierarchy depicted in Figure 1.1, providing control actions to keep the process at desired conditions. Unstable processes can be stabilized, allowing safe operation. Typically, basic controllers receive temperature, pressure and flow measurements and act on valve positions. Furthermore, all kinds of smart control solutions are developed to increase performance. Various linearizing transformation schemes as well as decoupling, ratio and feed-forward schemes are implemented in the distributed control system (DCS) and perform quite well. These schemes are to a high degree based on process knowledge, but are nevertheless not referred to as advanced process control.
Steady-state energy optimization (pinch) studies can reduce energy costs by rearranging energy streams (heat integration). A side effect is the introduction of (undesired) interaction between different parts of a plant: an upset downstream can act as a disturbance upstream even without a material recycle present. Material recycle streams are known to introduce large time constants of several hours (Luyben et al., 1999). Both heat integration and material recycles complicate control for operators.
Automation of the process industry took place very gradually. Nowadays most measurements are digitally available in the control room, from which practically all controls can be executed. The availability of these measurements was a necessity for the development of advanced process control techniques such as model predictive control (see the tutorial paper by Rawlings, 2000), and because of its success, it initiated real-time process optimization.
Scheduling can be considered the top level of plant operation (Tousain, 2002), as depicted in Figure 1.1. At this level it is decided which product is produced at what time, and sometimes even by which plant. Processing the information from the sales and marketing department, the purchasing department, and the storage of raw materials and end products is a very complex task. Without radical simplifications, implementation of a scheduling problem would result in a mixed-integer optimization that exceeds the complexity of a dynamic optimization. Therefore, models used for scheduling problems reflect only very basic properties, preventing the scheduling problem from exploding. Scheduling will not be discussed further in this thesis, although it is recognized as a field with large opportunities.

    scheduler
        ↕
    (dynamic) real-time optimizer
        ↕
    model predictive controller
        ↕
    plant + basic controllers

Figure 1.1: Control hierarchy with different layers and information transfer.
In practice very pragmatic solutions are implemented, such as the production of different products in a fixed order, referred to as a product wheel. This rigid way of production has the advantage that detailed information is available to predict all the costs involved. The downside is that opportunities are missed because of this inflexible operation. The availability of model-based process control enables a larger variety of transitions between different products. Information on the characteristics of different transitions can be made available and exploited by the scheduling task. This increases the potential of scheduling but requires powerful tools.
Production nowadays shifts from bulk to specialties, which creates new opportunities for those who know how to swiftly control their processes within new specifications (Backx et al., 2000). The capability of fast and cheap transitions enables companies to produce and sell on demand at usually favorable prices and brings added value to the business. In order to be more flexible, stationary plant operation is replaced by a more flexible transient (or even batch-wise) type of operation. Another driver for improving process control is environmental legislation, which becomes more and more stringent and pushes operation to its limits. Optimization-based process control contributes to this flexible and competitive high-quality plant operation.
Economic dynamic real-time optimization plays a key role in bringing money to the business, since it translates a schedule into economically optimal set point trajectories for the plant. At least as important is the feedback that the dynamic optimization can give to the scheduling optimization in terms of, e.g., minimum required transition times and estimated costs of different and possibly new transitions. This information, depicted by the arrow from the dynamic real-time optimizer to the scheduler in Figure 1.1, enables improved scheduling performance because the information is more accurate and complete and allows for more flexible operation. This enhanced information can, for example, make the difference between accepting and refusing a customer's order. Dynamic real-time optimization plays a crucial role in connecting scheduling to plant control and can contribute significantly to the profitability of a plant.
Real-time process optimization
State-of-the-art plants have a layered control structure in which the plant's steady-state optimum is computed recursively by a real-time process optimizer, providing set points that are tracked by a linear model predictive controller. Besides being implementable from a computational point of view, this approach was also acceptable from the operator's perspective, with a safety argument pleading for a layered control structure: in case of failure of the real-time optimization the process is not out of control; only the optimization is not executed.
Since a state-of-the-art optimizer assumes some steady-state condition, this condition is checked before the optimizer is started (Backx et al., 2000). This check is somewhat arbitrary because in practice a plant is never in steady state. Before the next steady-state optimization is executed, a parameter estimation is carried out using online data. The result of the steady-state optimization is a new set of set points, causing a dynamic response of the plant. Only after the process has settled again can a next cycle be started, which limits the update frequency of the optimal set points. If a process is operated quasi steady-state and optimal conditions change gradually, this approach can be very effective.
For a continuous process that produces different products we require economically optimal transitions from one operating point to another. Including process dynamics in the optimization enables exploitation of the full potential of the plant. The result of this optimization approach is a set of optimal set point trajectories and a predicted dynamic response of the process. In this approach we do not require steady-state conditions to start an optimization, and it enables shorter transition times. Shorter transition times generally result in a reduction of off-spec material and therefore increase the profitability of a plant.
Figure 1.2: Typical chemical process flow sheet with multiple different unit operations and material recycle streams, representing the behavior of a broad class of industrial processes.

The real-time steady-state optimizer and linear model predictive controller can be replaced by a single dynamic real-time optimization (DRTO) based on one
large-scale nonlinear dynamic process model. This problem should be solved at the sample rate of the model predictive controller to maintain disturbance rejection properties similar to those of the linear model predictive controller. The prediction horizon of the optimization should be a couple of times the dominant time constant of the process. The implications of this straightforward implementation are discussed next.
Implications of straightforward implementation
Let us now explore the implications of a straightforward implementation of dynamic real-time optimization as a replacement of the real-time steady-state optimizer and linear model predictive controller. In a typical chemical process, two or more components react into the product of interest, followed by one or more separation steps. This represents typical behavior of a broad class of industrial processes, and therefore findings can be carried over to many plants. In general, one or more side reactions take place, introducing extra components. The use of a catalyst can shift selectivity but never completely prevent side reactions. If we assume only one side reaction, we already have to deal with four species, or even five if we take the catalyst into account. We can separate the four species with three distillation columns, as depicted in Figure 1.2, if we assume that the catalyst can be separated by a decanter. The recycle streams introduce positive feedback and therefore long time constants in the overall plant dynamics (Luyben et al., 1999).
If we assume instantaneous phase equilibrium and uniform mixing on each tray, the number of differential equations describing the separation section of this chemical process is

$$n_x = n_c (n_s + 1)(n_t + 2),$$

where $n_x$ is the number of differential equations, $n_c$ is the number of columns, $n_s$ is the number of species involved and $n_t$ is the average number of trays per column. The one in the formula represents the energy balance, and the two represents the reboiler and condenser of a column. For a setup with three columns of twenty, forty and sixty trays, and five species, we already need over seven hundred and fifty differential equations. If the reaction takes place in a tubular reactor, we need to add a partial differential equation to the model. This can only be implemented after discretization, easily adding another three hundred equations (five species times sixty discretization points) to the model and bringing the total to over a thousand equations. So we can extend the previous formula to

$$n_x = n_c (n_s + 1)(n_t + 2) + n_s n_d,$$
where $n_d$ is the number of discretization points. In practice the number of algebraic variables is three to ten times the number of differential equations, depending on the implementation of the physical properties (as a hidden module or as explicit equations in the model). This brings the total to several thousand, up to ten thousand, equations. This estimate serves as a lower bound for a first-principles industrial process model and illustrates the number of equations that must be handled by plant-wide model-based process operation techniques.
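For concreteness, the following few lines reproduce the counts quoted above for the three-column example.

    # Order-of-magnitude estimate of the model size from the formulas above.
    nc, ns, nd = 3, 5, 60              # columns, species, discretization points
    nt = (20 + 40 + 60) // 3           # average number of trays per column
    n_x_columns = nc * (ns + 1) * (nt + 2)       # 3 * 6 * 42 = 756
    n_x_total = n_x_columns + ns * nd            # 756 + 300 = 1056
    print(n_x_columns, n_x_total)
    # With 3 to 10 algebraic variables per differential equation, the full
    # DAE model holds several thousand up to ten thousand equations.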
Fortunately, the models are very nicely structured, and this can be exploited. This model property is referred to as model sparsity and reflects the interaction, or coupling, between equations and variables. The property can be visualized by a matrix, the so-called Jacobian matrix $J$: the element $J(i, j)$ is nonzero if the $j$-th variable occurs in the $i$-th equation, and zero otherwise. Most zero elements are structural and thus do not depend on variable values. These elements do not have to be recomputed during simulation, which allows for efficient implementation of simulation algorithms. For process models, the number of nonzero elements is about five to ten percent of the total number of elements of the Jacobian matrix.
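As a small sketch of what this means in practice, the snippet below builds the structural Jacobian of a tridiagonally coupled model (a stand-in assumption for, e.g., tray-to-tray coupling in a column) in sparse form and measures its density.

    # Structural Jacobian stored sparsely: only the nonzero pattern is kept.
    import scipy.sparse as sp

    n = 1000                                        # number of equations
    J = sp.diags([1.0, 1.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
    print(f"nonzeros: {J.nnz}, density: {100 * J.nnz / n**2:.2f}%")
    # Structural zeros never need recomputation, so sparse factorizations
    # inside the solver touch only the stored nonzero entries.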
Next we will estimate the number of manipulated variables involved. Every column has five manipulated variables: the outgoing flows from the reboiler and the condenser, the reboiler duty, the condenser duty, and the reflux rate or ratio. In this simple example we can additionally manipulate, within the reactor, the fresh feed flow and composition, the feed-to-catalyst ratio, the cooling rate and the outgoing flow of the reactor. This brings the number of manipulated variables to twenty, all potential candidates to be computed by model-based optimization. The number of parameters involved can be computed by the following formula:

$$n_p = \frac{n_u H}{t_s},$$
where $n_p$ is the number of free parameters, $n_u$ is the number of manipulated variables, $H$ is the control horizon and $t_s$ is the sampling period. For a typical process as described in this section, the dominant time constant can be over a day, especially if recycle streams are present introducing positive feedback. An acceptable sampling period for most manipulated variables is one to five minutes; pressure control, however, might require a much higher sampling rate. In case of a horizon of three times the dominant time constant, twenty inputs and a sampling period of five minutes, the total number of free parameters is over seventeen thousand. This results in a very large optimization problem that is not very likely to give sensible results. A selection of manipulated variables and a clever parametrization of the input signals can reduce this number of free parameters. The input signal can even be implemented as a fully adaptive, problem-dependent parameterization generated by repetitive solution of increasingly refined optimization problems (Schlegel et al., 2005); the adaptation is based on a wavelet analysis of the solution profiles obtained in the previous step.
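The count quoted above follows directly from the formula:

    # Free-parameter count for twenty inputs, a three-day horizon and a
    # five-minute sampling period.
    n_u = 20                  # manipulated variables
    H = 3 * 24 * 60           # horizon: three times a one-day time constant [min]
    t_s = 5                   # sampling period [min]
    print(n_u * H // t_s)     # 17280 -> "over seventeen thousand"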
In practice, first some base-layer control would be implemented around each column to control levels and pressures. Set points for these controllers could then be degrees of freedom for optimization. The added value of including these set points within a dynamic optimization is not evident, but small inventories could decrease transition times. If for some reason the added value of these degrees of freedom is expected to be small, they can be removed from the optimization problem, reducing the number of optimization parameters.
Suppose we want to do one nonlinear integration of the rigorous model within one sampling period to compute the prediction for an input trajectory. In this case we need to simulate three days within five minutes. This requires a simulation speed of over eight hundred times real time. If a sampling period of one hour is acceptable, we still need a simulation speed of seventy-two times real time. In this scenario we did not account for multiple simulations (in the sequential optimization approach roughly five to twenty per solution) or for computations other than simulation. Depending on the input sequence, for models of the size considered here the simulation speed on a current standard computer is between one and twenty times real time. This reveals the tremendous gap between the desired and the current status of numerical integration. Nevertheless, numerical solvers are already very sophisticated in handling different time scales, also referred to as stiff systems, and in exploiting model structure.
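The real-time factors quoted above follow from simulating a three-day horizon within one sampling period:

    # Required simulation speed as a multiple of real time.
    horizon_min = 3 * 24 * 60         # three-day prediction horizon [min]
    print(horizon_min / 5)            # 864x for a 5-minute sampling period
    print(horizon_min / 60)           # 72x  for a 1-hour sampling period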
With current commercial modelling tools we usually end up with a set of differential and algebraic equations. To keep the model in line with the process, measurements are used to estimate the actual state of the process by means of an observer, e.g. an extended Kalman filter (e.g. Lewis, 1986). This is a model-based filtering technique, balancing model error against measurement error. The resulting state is then used as a corrected initial condition for the model. Finding a consistent solution for this new initial condition of a set of differential algebraic equations is called an initialization problem, which is hard to solve without a good initial guess. Fortunately, we can use the uncorrected state as an initial guess, which should be good enough. Still, this initialization problem has to be solved every iteration, at the cost of valuable computing time.
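A minimal sketch of the observer idea is given below: a scalar extended Kalman filter update that balances model error against measurement error and returns the corrected state used as the new initial condition. The model, the noise covariances and the measurements are hypothetical.

    # Scalar extended Kalman filter step for a toy model x_dot = -x**3 + u
    # with measurement y = x.  Q and R weigh model vs measurement error.
    def ekf_step(x, P, u, y_meas, Q=1e-3, R=1e-2, h=0.1):
        # prediction with the nonlinear model (one explicit Euler step)
        x_pred = x + h * (-x**3 + u)
        F = 1.0 + h * (-3.0 * x**2)        # local linearization df/dx
        P_pred = F * P * F + Q
        # measurement update
        K = P_pred / (P_pred + R)          # Kalman gain
        x_corr = x_pred + K * (y_meas - x_pred)
        return x_corr, (1.0 - K) * P_pred  # corrected state and covariance

    x, P = 1.0, 1.0
    for y in (0.95, 0.90, 0.88):           # fake measurements
        x, P = ekf_step(x, P, u=0.0, y_meas=y)
    print(round(x, 3))                     # corrected initial condition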
Going online with a straightforward implementation of real-time dynamic plant optimization based on first-principles models introduces an enormous computational overload. At present only very pragmatic solutions are available, which directly provoke all kinds of comments, such as on the inconsistency introduced by the use of different models in different layers within the plant operation hierarchy. These approaches are legitimated by the argument that no better alternatives are readily available. Despite all this criticism of the pragmatic solutions for model-based plant operation, the approach has proven to contribute to the profitability of the plant. This profitability can only be increased if consistent solutions are developed that replace the pragmatic ones. Model reduction can provide such a consistent solution and is explored in this thesis. First we will review the model reduction techniques available in the literature.
1.3 Literature on nonlinear model reduction
Models that are available for large-scale industrial processes can in general be characterized as a set of differential and algebraic equations (DAE). Therefore we search for model reduction techniques that are applicable to this general class of models. This class of models is capable of describing the majority of processes and is more general than a set of ordinary differential equations (ODE). Transformation of a DAE into an ODE is not possible in general and is regarded as a major model reduction step.
Since we are interested in the effect of different models on the computational load for optimization, every technique mapping one model onto another model is a candidate model reduction technique. This implies that different modelling and identification techniques can be considered, using the original model as a data-generating plant replacement.
Marquardt (2001) states that the proper way to assess model reduction techniques for online nonlinear model-based control is to compare the closed-loop performance based on the original model, at a low sampling frequency, with that based on the reduced model at a higher sampling frequency. The maximum sampling frequencies are determined by the computational load, which is related to the differences between the original and the reduced model. The reduced model should enable higher sampling frequencies, compensating for the loss in accuracy and therefore resulting in higher closed-loop performance. No assessments of this type have been found in the literature. Therefore we will need to resort to more general literature on model reduction and nonlinear modelling techniques.
Computation and performance assessment of NLMPC
Findeisen et al. (2002) assessed the computation and performance of nonlinear model predictive control. The implementation of the control problem used in this paper was so-called direct multiple shooting, a particularly efficient implementation of the simultaneous approach (Diehl et al., 2002). In their assessment, different models are compared under closed-loop control. The different models of the 20-tray distillation column were a nonlinear wave model with 2 differential and 3 algebraic equations, a concentration and holdup model with 42 ordinary differential equations, and a more detailed model (including tray temperatures) with 44 differential and 122 algebraic equations. All models were the result of remodelling; thus extra simplifications and assumptions based on physics and process knowledge resulted in reduced models.
The effect on the computational load is presented, even for different control horizons, distinguishing between the maximum and the average computation time. More simplified models resulted in a lower computational load, which is not surprising; more interesting is that the reduction in computational load is quantified. The increase of controller performance due to the higher sampling frequency enabled by the reduced computational effort was not presented. Neither is it completely clear how big the modelling error between the original and the reduced models is.
The load of state estimation for these different reduced models was assessed as well. Furthermore, the computational load of different nonlinear predictive schemes was assessed with the original model in the case of no plant-model mismatch.
Nonlinear wave approximation
Balasubramhanya and Doyle III (2000) developed a reduced-order model of a batch reactive distillation column using travelling waves. The reader is referred to e.g. Marquardt (1990) and Kienle (2000) for more details on travelling waves. This nonlinear model was successfully deployed within a nonlinear model predictive controller (NLMPC) and a linear Kalman filter, which was computationally more efficient than an NLMPC based on the original full-order nonlinear model. Although the original full-order model was only a 31st-order ODE and the reduced model a 5th-order ODE, closed-loop operation with the NLMPC based on the reduced model was over six times faster while maintaining performance. Performance was quite high despite a prediction horizon of only two samples and a control horizon of one. Furthermore, they compared the nonlinear models with the linearized model, illustrating the level of nonlinearity of the process.
Simplified physical property models
Successful use of simplified physical property models within flowsheet optimization is reported by Ganesh and Biegler (1987). A simple flash with recycle as well as a more involved ethylbenzene process with a reactor, a flash, two columns and two recycle streams are presented in that paper. In both cases the rigorous phase equilibrium model (Soave-Redlich-Kwong) was approximated by a simplified model (Raoult's law, Antoine's equation and ideal enthalpy). This type of model simplification is based on process knowledge and physical insight. It is a tailored approach, but applicable to all models where phase equilibria are to be computed. Reductions of up to an order of magnitude were reported by straightforward use of the simplified model within the optimization. The danger of this approach, already reported by Biegler (1985), is that the optimum does not coincide with the optimum of the original model. By combining the rigorous and the simplified model in their optimization strategy, Ganesh and Biegler can guarantee convergence to the true optimum of the original model while still reducing the computational load by over thirty percent. Model simplification based on physics appears to be successful for steady-state flowsheet optimization.
Chimowitz and Lee (1985) reported an increase in computational efficiency of
about a factor of three by the use of local thermodynamic models. According
to Chimowitz, up to 90% of the computational time during simulation was used
for thermodynamic computations, motivating their approach of model
approximation. The local thermodynamic models are integrated with the numerical
solver, where an updating mechanism for the parameters of the local models was
included. This approach is not easy to use since model reduction and numerical
algorithm are intertwined. Ledent and Heyen (1994) attempted to use local
models within dynamic simulations but were not successful due to discontinuities
introduced by updating the local models. Still, local models as such, without an
update mechanism, can be used to reduce computational load despite their limited
validity.
Perregaard (1993) worked on model simplification and reduction for simulation
and optimization of chemical processes. The objective of his paper is to
present a procedure that, through simplification of the algebraic equations for
phase equilibria calculations, is capable of reducing the computing time to solve
the model equations without affecting the convergence characteristics of the
numerical method. Furthermore, it exploits the inherent structure of the equations
representing the chemical process. This structured equation oriented framework
was adopted from Gani et al. (1990), who distinguish between differential,
explicit algebraic and implicit algebraic equations. The key observation is that
for Newton-like methods the Jacobian can be approximated during intermediate
iterations. They replace the true Jacobian by a cheap to compute approximate
Jacobian. This approximate Jacobian information is derived from local
thermodynamic models with analytical derivatives. They present several cases in
their paper and report reductions of overall computational times of the order of
20-60% without loss of accuracy and without side effects on the convergence of
the numerical method. Støren and Hertzberg (1997) developed a tailored dae
solver that is computationally more efficient and reliable, and report reductions
(34-63%) in computation times for dynamic optimization calculations. Their
approach also exploits local thermodynamic models.
Model order reduction by projection
Many papers are available on nonlinear model reduction by projection. More
precise would be: order reduction of nonlinear models by linear projection,
where order refers to the number of differential equations. A generic procedure
can be formulated in three steps. First, a suitable transformation is applied,
revealing the important contributions to the process dynamics. Second, the
new coordinate system is decomposed into two subspaces. Finally, the dynamics
are formulated in the new coordinate system, where the unimportant dynamics
are either truncated or added as algebraic constraints (residualization). In case of
residualization the resulting model is in dae format and will not reduce the
computational effort (Marquardt, 2001) due to increased complexity (loss of sparsity).
Therefore in most cases the transformed model is truncated. An approximate
solution with reduced computational load is known as slaving. Aling (1997)
reported increasing computational load with increasing residualization, and reduced
computational load by approximating the solution of the slaved modes.

In most papers, projection is applied to ordinary differential equations (Marquardt,
2001). Only Löffler and Marquardt (1991) applied their projection to
both differential and algebraic equations. As an error measure between original
and reduced model, plots of trajectories of key variables are used. These are
simply generated by simulating a specific input sequence applied to both models.
In some papers, results on the computational time of simulations are added
as relevant information. Important information on the applied numerical
integration algorithm is mostly not available, despite the fact that this is crucial for
interpretation of the results. This becomes clear when comparing an explicit
fixed step numerical integration scheme with a variable step-size implicit numerical
integration scheme. Extremely important is the ability of the algorithms
to exploit sparsity (Bogle and Perkins, 1990). Process models are known to be
very sparse, which can be efficiently exploited by some numerical integration
algorithms, reducing the computational load of simulation. Projection methods
in general destroy this sparsity, which is reflected in the computational load of
those algorithms that exploit it.
Projection methods differ in how the projection is computed. The two main
approaches to compute these projections are discussed next: proper orthogonal
decomposition and balanced projection, respectively.
Proper orthogonal decomposition
Popular is projection based on a proper orthogonal decomposition (pod), with
its origin in fluid dynamics (see e.g. Berkooz et al., 1993; Holmes et al., 1997;
Sirovich, 1991). This approach is also referred to as Karhunen-Loève expansion
or method of empirical eigenfunctions. Bendersky and Christofides (2000) apply
a static optimization to a catalytic rod and a packed bed reactor described by
partial differential equations, resulting in reductions of over a factor of thirty in
computational load. In order to find the empirical eigenfunctions they generated
data with the original model and gridded the design variables between upper and
lower bounds. In case of the packed bed, with three design variables at nine
equally spaced values, this implied 9³ = 729 simulations representing the complete
operating envelope. This is a brute force solution to a problem also addressed by
Marquardt (2001). However, in case of a dynamic optimization this would not be
attractive due to the much higher number of decision variables: typically over
four inputs and at least 10 points in time would imply 9⁴⁰ ≈ 10³⁸ simulations.
Baker and Christofides (2000) applied proper orthogonal projection to a
rapid thermal chemical vapor deposition (rtcvd) process model in order to
design a nonlinear output feedback controller with four inputs and four outputs.
This design can be done off-line, so no computational load aspects were
mentioned. They show that the nonlinear output feedback controller outperforms
four pi controllers in a disturbance free scenario. Still, the deposition was
unevenly distributed. Addition of a model based feedforward would add performance
to the control solution and might diminish the difference between the nonlinear
output controller and the four pi controllers.
Aling et al. (1997) applied the proper orthogonal decomposition reduction
method to a rapid thermal processing system. The reduction in the number of
differential equations was impressive: from one hundred and ten to less than
forty. The reduction of the computational load for simulation was up to a factor
of ten. First a simplified model, a set of ordinary differential equations, is derived
from a finite element model (Aling, 1996). Then this model is further reduced by
a proper orthogonal decomposition of order forty, twenty, ten and five by truncation.
These truncated models are then further reduced by residualization.
Residualization transforms the ode into a dae that is solved using a ddasac solver.
Residualization does not reduce the computational load and therefore they propose
a so-called pseudo-steady approximation (slaving), which is a computationally
cheaper solution than residualization.
Order reduction by balanced projection
Lall (1999, 2002) introduced empirical Gramians as an equivalent of the linear
Gramians that can be used for balanced linear model order reduction (Moore,
1981). Hahn and Edgar (2002) elaborate on model order reduction by balancing
empirical Gramians and show results of significant model order reduction but
limited reduction in simulation times. Some closed loop results were presented,
but few details were given on the implementation of the model predictive
controller scheme. The performance of the controller based on the reduced model
was as good as that based on the full-order model, but no reduction in computational
effort was achieved.
Lee et al. (2000) exploit subspace identification (Favoreel et al., 2000) for
control relevant model reduction by balanced truncation. The technique is control relevant because it is based on the input to output map instead of the input
to state map that is used for Proper Orthogonal Decomposition model reduction. This argument holds for all balanced model reduction techniques like the
empirical Gramians (Lall, 2002; Hahn and Edgar, 2002).
Newman and Krishnaprasad (1998) compared proper orthogonal decomposition
and the method of balancing. Their focus was on ordinary differential
equations describing the heat transfer in a rapid thermal chemical vapor deposition
(rtcvd) process for semiconductor manufacturing. The transformation used
for balancing the nonlinear system was derived from a linear model in a
nominal operating point. The transformation balancing this linear model was
then applied to the nonlinear model. The transformation matrices were very ill
conditioned and they used a method proposed by Safonov and Chiang (1989) to
overcome this problem. An idea was suggested to find a better approximation
based on the nonlinear balancing approach presented by Scherpen (1993). The order
of the models was significantly reduced by both projection methods with
acceptable error, but no results were presented on reduction of the computational
load.
Zhang and Lam (2002) developed a reduction technique for bilinear systems
that outperformed a Gramian based reduction, though it was demonstrated only
on a small example. The solution of the model reduction problem is based on a
gradient flow technique that optimizes the H2 error between the original and
reduced order model.
Singular perturbation
Reducing the number of differential equations can easily be done if the model
is in the standard form of a singularly perturbed system (Kokotovic, 1986). In this
special case we can distinguish between the first couple of differential equations,
representing the slow dynamics, and the remaining differential equations,
associated with the fast dynamics. Model reduction is then done by assuming that the
fast dynamics behave like algebraic constraints, which reduces the number of
differential equations.

For some differential equations it is fairly obvious to determine their time
scale, but in general it is nontrivial. Tatrai et al. (1994a, 1994b) and Robertson
et al. (1996a, 1996b) use state to eigenvalue association to bring models into
standard form. This involves a homotopy procedure with a continuation parameter
that varies from zero to one, weighting the system matrix at some operating
point with its trace. At different values of the continuation parameter the
eigenvalues of the composed matrix are computed, enabling the state to eigenvalue
association. The problem is that the true eigenvalues are the result of the
interaction between several differential equations and therefore in principle cannot be
assigned to one particular differential state.
Duchêne and Rouchon (1996) show that the originally chosen state space is
not the best coordinate system in which to apply singular perturbation. They
illustrate this on a simple example and later demonstrate their approach on a case
study with 13 species and 67 reactions. Their reduction approach is compared
with a quasi steady-state approach and with the original model by plotting time
responses to a non steady-state initial condition.
Reaction kinetics simplification
Petzold (1999) applied an optimization based method to determine which
reactions dominate the overall dynamics. The aim is to derive the simplest reaction
system that retains the essential features of the full system. The original
mixed integer nonlinear program (minlp) is approximated by a problem that
can be solved by a standard sequential quadratic programming (sqp) method by
using a so-called beta function. Results are presented in this paper for several
reaction mechanisms.

Androulakis (2000) formulates the selection of dominant reaction mechanisms
as a minlp as well, but uses a branch and bound algorithm to solve it.
Edwards et al. (2000) eliminate not only reactions but species as well by solving
a minlp using dicopt.

Li and Rabitz (1993) presented a paper on approximate lumping schemes
by singular perturbation, which they later developed into a combined symbolic
and numerical technique for constrained nonlinear lumping, applied to
an oxidation model (Li and Rabitz, 1996). Significant order reductions were
presented, but no effects on computational load were discussed in these papers.
Nonlinear empirical modelling
Empirical models have a very low computational complexity and therefore allow
for fast simulations. Typically, their interpolation capabilities are comparable to
those of fundamental models, but the extrapolation of fundamental models is far
superior (Can et al., 1998). Since we do not want to restrict ourselves to optimal
trajectories that are interpolations of historical data, these types of models seem
unsuitable for dynamic optimization. Nevertheless, we will mention some
literature on nonlinear empirical modelling.
Sentoni et al. (1996) successfully applied Laguerre systems combined with
neural nets to approximate nonlinear process behaviour.
Safavi et al. (2002) present a hybrid model of a binary distillation column
combining overall mass and energy balances with a neural net accounting for the
separation. This model was used for an online optimization of the distillation
column and compared to the full mechanistic model. The resulting optima were
close, indicating that the hybrid model was performing well. No results were
presented on the computational benefit of the hybrid model.
Ling and Rivera (1998) present a control relevant model reduction of Volterra
models by a Hammerstein model with a reduced number of parameters. The focus
was on closed loop performance of a simple polymerization reactor described by
4 ordinary differential equations, and the computational aspects were not discussed.
Later, Ling and Rivera (2001) presented a three step approach to derive control
relevant models. First, a nonlinear arx model is estimated from plant data
using an orthogonal least squares algorithm. Second, a Volterra series model is
generated from the nonlinear arx model. Finally, a restricted complexity model
is estimated from the Volterra series through the model reduction algorithm
described above. This involves quite a few nontrivial steps to finally arrive at
the reduced model.
Miscellaneous modelling techniques
Norquay et al. (1999) successfully deployed a Wiener model within a model
predictive controller on an industrial splitter, using linear dynamics and a
static output nonlinearity. Pearson and Pottmann (2000) compare Wiener,
Hammerstein and nonlinear feedback structures for gray-box identification, all
based on linear dynamics interconnected with a static nonlinear element. In his
review paper on nonlinear identification, Pearson (2003) elaborates on the selection
of nonlinear structures for computer control.
Stewart et al. (1985) presented a rigorous model reduction approach for nonlinear
spatially distributed systems, such as distillation columns, by means of
orthogonal collocation. A stiff solver was used to test and compare the original
model with different collocation strategies, which appear to be remarkably efficient.
Briesen and Marquardt (2000) present results on adaptive model reduction
for the simulation of thermal cracking of multicomponent hydrocarbon mixtures.
This method provides an error controlled simulation. During simulation an
adaptive grid reduces the model complexity where possible. The error estimation
governs the efficiency of the complete procedure; no results were presented
on the reduction of computational load. Online use of such a model would
require some adaptation of the Kalman filter since the order of the model
changes continuously.
Kumar and Daoutidis (1999) applied a nonlinear input-output linearizing
feedback controller to a high purity distillation column; the controller was
non-robust when based on the original model but had excellent robustness
properties when based on a singularly perturbed reduced model. No details were
presented on the effect of the model reduction on computational load. See e.g.
Nijmeijer and van der Schaft (1990), Isidori (1989) and Kurtz and Henson (1997)
for more details on feedback linearizing control.
An important observation from this literature overview is that no reduction
techniques are available that are directly linked to reducing the computational load
of simulation or optimization. All techniques have different focuses, and the effect on
computational load can only be evaluated by implementation. There does not
exist a modelling synthesis technique for dynamic optimization that provides an
optimal model for a fixed sampling interval and prediction horizon. So the
most promising model reduction techniques should be evaluated on their
merits for real time dynamic optimization by implementation.
1.4 Solution directions
The gap revealed for consistent online nonlinear model based optimization
is caused by its computational load. Since computing speed approximately
doubles every eighteen months, one could argue that it is only a matter of time
before the gap is closed. With a gap of a factor of eight hundred derived in the
previous section, we would still need to wait almost fifteen years until computers
are fast enough, assuming that the computer speed improvements can be
extrapolated. However, suppose next decade's computing power were available: we
would immediately want to control even more unit operations by this optimization
based controller, or increase the sampling frequency to improve performance.
This brings us back to square one, where the basic question is how to reduce the
computational load of the optimization problem so that it can be solved within
the limited time available in online applications. This is the concept of model
reduction addressed in this thesis.
1. The divide and conquer approach (e.g. Tousain, 2002) was already mentioned
   as the state of the art process control solution. In this approach a
   steady-state nonlinear model, based on first principles, is used within a
   steady-state optimization, producing optimal set points. These set points
   are tracked by an mpc based on an identified linear dynamic model. Typically
   these optimal set points are recomputed at most a couple of times per day,
   whereas the mpc is implemented as a receding horizon controller with a
   sample time of one to five minutes and a control horizon of up to a couple
   of hours.
(a) The inconsistency of state of the art real-time optimization can partially
    be eliminated by replacing the static optimization by a dynamic
    optimization. The result of the dynamic optimization problem is a set of
    optimal input and output trajectories exploiting the dynamics of the
    process. These input and output trajectories can be tracked by the
    linear mpc in so-called delta mode. This removes only part of the
    inconsistency, since now the models in both control layers are dynamic
    models. Inconsistency is still present due to the use of a linear
    model in the mpc, whereas the model used for dynamic optimization is
    nonlinear, like the true process.
(b) Nonlinear elements can gradually be incorporated within a linear
    model predictive controller scheme. The linear prediction can be
    replaced by a prediction from simulation of future input trajectories
    with the nonlinear dynamic model. Using linear time varying
    models derived from the nonlinear model along the simulated trajectory
    brings us close to a full nonlinear model based controller.
2. Another solution direction is improvement of optimization and numerical
   integration routines. Great improvements have already been achieved in
   numerical integration routines over the last decades, which leaves little room
   for further gains and makes this direction less effective. Improving the
   efficiency of optimization routines is not within the scope of this thesis.
3. The solution direction treated in this thesis is one where approximate models
   are derived from a rigorous model. These approximate models should
   reflect the optimization relevant dynamic behavior, but with more favorable
   computational properties than the original rigorous model. The rigorous
   model is strongly appreciated and therefore should serve as the base model
   from which models for different applications can be derived. A rigorous
   model should be the backbone of nonlinear optimization based control. This
   proposition was decisive for the course of this research.
(a) A possible way to derive approximate models is to generate data by
    dedicated simulations with a validated rigorous model. This data can
    then be used for model identification and validation. This approach
    removes some disadvantages, such as time consuming and costly
    real life experiments for dedicated data collection. Furthermore,
    data collection by simulation can be done over a larger operating range
    than would have been allowed in the real plant. Although the models
    that result from this approach do have favorable computational
    properties, large amounts of data have to be processed, which
    has to be redone if for some reason a model parameter is adapted.
    Since plant hardware and operation are improved continuously, this
    approach requires model updating by re-identification.
(b) In nonlinear control theory linear model reduction concepts have been
generalized to nonlinear models (Isidori, 1989; Nijmeijer, 1990; Scherpen, 1993) and applied to small models. No feasible solution for large
scale problems became available from this research area.
(c) From linear control theory, different model reduction techniques are
    available that can be applied to nonlinear models as well. These
    techniques reduce to projection of the dynamics, achieved by a
    transformation followed by either truncation or residualization. Projection of
    the dynamics can be based on different arguments and results in
    reduced models with different properties. Successful results on model
    reduction by projection have been presented in the literature, though not
    evaluated within an online dynamic optimization setting. This topic
    will be extensively treated in this thesis.
(d) An alternative approach to arrive at approximate models is
    remodelling. The main question is what model accuracy is required for
    models used in online dynamic optimization applications. The presence of
    uncertainties and disturbances will require continuous updating of
    the control actions. So an approximate model certainly should reflect
    the plant dynamics, but it does not have to predict process variables
    correctly up to ten decimals. A successful attempt to reduce
    computational load by remodelling for simultaneous dynamic optimization
    was reported by Findeisen et al. (2002). Remodelling will be the
    second main model reduction approach treated in this thesis.
The idea of decomposing the optimization into two layers is strong if both layers
use models derived from the same core model, providing consistency. In
either case it is beneficial to reduce the computational effort for large scale
dynamic optimization. We are capable of formulating this type of problem
for industrial processes, so this is assumed not to be the problem. This thesis
focuses on model approximation of a rigorous nonlinear dynamic model. Nonlinear
identification is not selected because it lacks a good basis for a proper
choice of model structure. Nonlinear control theory has only been applicable
to very small toy examples and was therefore not selected. This leaves model
reduction by projection and by remodelling as the solution directions to be explored
in this thesis.
1.5 Research questions
The starting point in this thesis is the availability of a validated rigorous
model. Development and validation of such a model is a challenging task, which
is beyond the scope of this thesis. A rigorous model is based on numerous
modelling assumptions. Important assumptions are the set of admissible variable
values and the timescale, which define the validity envelope of the rigorous model. In
general a much smaller region of this envelope is of interest for process
optimization. Precisely this observation provides the fundamental basis for model
approximation: the approximation needs to be accurate only in this restricted
region of interest.
With the availability of this validated rigorous model we can distinguish between generation and evaluation of different approximate models. Therefore we
need to develop and apply different model approximation techniques to generate different models and build a test environment to evaluate the effect on
optimization performance.
Approximation techniques for linear models by projection are quite mature,
and the principles can be carried over to nonlinear models. Approximation by
projection is a mathematical (and therefore quite generally applicable) approach
that reduces the number of differential equations. The first research question is:

1. How to derive a suitable projection?

Suitable in the sense that it serves dynamic real-time optimization: it reduces
the computational load of the optimization with minimal loss of controller
performance.
A different approach for model approximation is remodelling, which requires
detailed process knowledge of both the physics and the process operation. The
second research question is:

2. Is physics-based reduction a suitable technique to derive models for dynamic real-time optimization?

The last research question is:

3. How to evaluate different approximated nonlinear models for dynamic
real-time optimization?
A fundamental problem emerges if we want to classify different nonlinear models.
Linear models can be classified by means of distance measures (norms) that
capture all possible input signals. Similar distance measures can be defined
for nonlinear models, but they represent only one (set of) specific input signals since
the superposition principle does not apply to nonlinear models. Moreover, we
are not only interested in the approximation of the input-output map of the
model but especially in the online performance of the optimization
based on the reduced model. A pragmatic approach to classify nonlinear models
for a specific optimization will be presented in this thesis, addressing solution
accuracy, model adequacy and computational efficiency.
1.6 Outline of this thesis
In the previous sections we explored different research areas in which
this research is embedded. The interplay between these different fields within
online dynamic optimization is not always trivial and requires thorough knowledge and understanding of all areas. This complexity makes advanced process
control such a fascinating research area.
Furthermore, we observed a large gap between the time required to solve the true
optimization problem online and the sampling time desired for controller
performance. Nowadays very pragmatic approximate solutions are implemented,
not pushing the plant to its maximum performance. Since there are many aspects
to optimization based process control, many different solutions can contribute
to closing this gap. In this thesis we will focus on the role of the model, its
properties and the effect on optimization based control.

After exploring the literature for possible solutions to the model reduction
problem, we presented different solution directions. We selected two solution
approaches and formulated three research questions that will be worked out in the
next chapters.
In Chapter 3 a base case dynamic optimization is outlined in quite some detail
in order to make clear how the optimization was implemented. Results of the
dynamic optimization based on a detailed rigorous model are presented. The
modelling equations of the detailed plant model reflecting the process behavior
are discussed in Chapter 4, along with the physics-based model reduction that
was applied. Results of the base case dynamic optimization based on this reduced
model are presented in that chapter and compared with the optimization results
obtained with the detailed rigorous model.
The physics-based reduced model is the new starting point for the exploration
of reduction by projection in Chapter 5. Two projection techniques
are assessed by performing dynamic optimizations with all possible reduced
orders. Solution accuracy and efficiency are the key properties of interest. This
resulted in over 500 optimizations, providing a complete view on the properties of
model reduction by projection. The projection was applied to the physics-based
reduced model instead of the original rigorous model simply for practical
reasons. Finally, conclusions and future research directions are presented in
Chapter 6.
These chapters are preceded by Chapter 2 on model reduction by projection.
The focus is on currently available techniques and a lucid interpretation thereof.
This results in a generalization and a new interpretation of existing projection
techniques. The projection techniques are evaluated in Chapter 5 within a
dynamic optimization setting as defined in the base case dynamic optimization
in Chapter 3, applied to the physics-based reduced model as derived in
Chapter 4.
Chapter 2

Model order reduction suitable for large scale nonlinear models
In this chapter two order reduction methods for nonlinear models are reviewed:
balanced model order reduction via empirical Gramians and model order reduction
by Proper Orthogonal Decomposition. These two methods are treated in detail
since they are serious candidates for large scale model reduction. Both methods
are extensively explored, tested and compared.
2.1 Model order reduction
Most nonlinear model order reduction methods are extensions of linear
methods for model reduction. We can be even more precise: most reduction
techniques apply some linear projection to the nonlinear model as if it were a
linear model. It is sensible to look into linear model reduction first, since the
arguments for model reduction are the same in the linear and nonlinear case.
For linear model reduction a thorough overview can be found in textbooks such
as Obinata and Anderson (2001). Overview articles on different linear
model reduction techniques are available by e.g. Gugercin (2000) and Antoulas
(2001).
These methods can be described by a transformation as the first step, followed
by either truncation or singular perturbation as the second step. Reduction
methods differ in the first step, whereas the second step can be chosen on other
grounds; e.g. the need for an accurate steady-state gain would favor singular
perturbation (residualization) over truncation.
Since nonlinear model order reduction techniques are inspired by linear
model reduction, we will start with the basic linear model reduction methods.
Even more basic, we first show how to arrive at a linear model from a nonlinear
model.
Linearization of a nonlinear model
Suppose a process can be described by a nonlinear model defined as a set of
ordinary differential equations
$$\begin{bmatrix} \dot{x} \\ y \end{bmatrix} = \begin{bmatrix} f(x, u) \\ g(x, u) \end{bmatrix}, \qquad x(t_0) = x_0, \tag{2.1}$$
where $x$ is the state vector in $\mathbb{R}^{n_x}$, $u$ is the input vector in $\mathbb{R}^{n_u}$, $y$ is the output
vector in $\mathbb{R}^{n_y}$ and $x_0$ is the initial condition at time $t_0$ in $\mathbb{R}^{n_x}$. This nonlinear
system can be linearized around an equilibrium point $(x^*, u^*, y^*)$ defined by
$$\begin{bmatrix} 0 \\ y^* \end{bmatrix} = \begin{bmatrix} f(x^*, u^*) \\ g(x^*, u^*) \end{bmatrix}. \tag{2.2}$$
Consider $x(t) = x^* + \Delta x(t)$, $u(t) = u^* + \Delta u(t)$ and $y(t) = y^* + \Delta y(t)$ and the
Taylor series about the equilibrium point $(x^*, u^*, y^*)$
$$\begin{aligned} f(x, u) &= f(x^*, u^*) + \left.\tfrac{\partial f}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial f}{\partial u}\right|_* \Delta u + O(\Delta)^2, \\ g(x, u) &= g(x^*, u^*) + \left.\tfrac{\partial g}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial g}{\partial u}\right|_* \Delta u + O(\Delta)^2, \end{aligned} \tag{2.3}$$
with $\left.\tfrac{\partial \cdot}{\partial \cdot}\right|_*$ the partial derivative in the stationary point $(x^*, u^*)$ and $O(\Delta)^2$
representing the error of approximation. The original set of differential equations
can be approximated in the stationary point by a linearization valid for small
perturbations
$$\Delta\dot{x} = \left.\tfrac{\partial f}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial f}{\partial u}\right|_* \Delta u, \qquad \Delta y = \left.\tfrac{\partial g}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial g}{\partial u}\right|_* \Delta u, \tag{2.4}$$
or in matrix notation
$$\begin{bmatrix} \Delta\dot{x} \\ \Delta y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta u \end{bmatrix}, \qquad \Delta x(t_0) = x_0 - x^*, \tag{2.5}$$
where
$$A = \left.\tfrac{\partial f}{\partial x}\right|_* \in \mathbb{R}^{n_x \times n_x}, \quad B = \left.\tfrac{\partial f}{\partial u}\right|_* \in \mathbb{R}^{n_x \times n_u}, \quad C = \left.\tfrac{\partial g}{\partial x}\right|_* \in \mathbb{R}^{n_y \times n_x}, \quad D = \left.\tfrac{\partial g}{\partial u}\right|_* \in \mathbb{R}^{n_y \times n_u}. \tag{2.6}$$
The matrix quadruple A, B, C, D is referred to as the system matrices in linear
systems theory (e.g. Zhou et al., 1995).
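As an illustration of Equations (2.4)-(2.6), the following minimal sketch (not part
of the original text) computes the system matrices by central finite differences;
the model functions f and g and the equilibrium point are assumed to be given
as Python callables returning NumPy arrays, and analytical Jacobians would of
course be preferred when available.

    import numpy as np

    def linearize(f, g, x_star, u_star, eps=1e-6):
        """Central finite-difference linearization of dx/dt = f(x, u), y = g(x, u)
        around an equilibrium (x*, u*), yielding the quadruple (A, B, C, D) of (2.6)."""
        nx, nu = len(x_star), len(u_star)
        ny = len(g(x_star, u_star))
        A, B = np.zeros((nx, nx)), np.zeros((nx, nu))
        C, D = np.zeros((ny, nx)), np.zeros((ny, nu))
        for i in range(nx):                       # perturb each state direction
            dx = np.zeros(nx); dx[i] = eps
            A[:, i] = (f(x_star + dx, u_star) - f(x_star - dx, u_star)) / (2 * eps)
            C[:, i] = (g(x_star + dx, u_star) - g(x_star - dx, u_star)) / (2 * eps)
        for j in range(nu):                       # perturb each input direction
            du = np.zeros(nu); du[j] = eps
            B[:, j] = (f(x_star, u_star + du) - f(x_star, u_star - du)) / (2 * eps)
            D[:, j] = (g(x_star, u_star + du) - g(x_star, u_star - du)) / (2 * eps)
        return A, B, C, D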
Linear model order reduction
Consider the linearization (2.5) of the set of nonlinear ordinary differential
equations defined in Equation (2.1) in an equilibrium point,
$$\begin{bmatrix} \dot{x} \\ y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}, \tag{2.7}$$
where for the sake of notation $\Delta x$ and $\Delta u$ are replaced by $x$ and $u$, respectively.
With the matrices $A, B, C, D$ fixed in time we have defined a so-called linear
time-invariant (lti) system with sorted eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ such that
$$0 \leq |\lambda_1| \leq |\lambda_2| \leq \ldots \leq |\lambda_n|. \tag{2.8}$$
The system has a two-time-scale property if there exist one or more eigenvalue
gaps defined by
$$\frac{|\lambda_p|}{|\lambda_{p+1}|} \ll 1, \qquad p \in \{1, \ldots, n\}. \tag{2.9}$$
Suppose we can distinguish between slow and fast dynamics by splitting $x$ into
$x_1$ and $x_2$, respectively. The system is then in the so-called standard form of a
singularly perturbed system (Kokotovic et al., 1986),
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ u \end{bmatrix}. \tag{2.10}$$
Kokotovic et al. (1986) derived a sufficient condition for the two time scale
property,
$$\|A_{22}^{-1}\| < \frac{1}{\|A_0\| + \|A_{12}A_{22}^{-1}A_{21}\| + 2\left(\|A_0\|\,\|A_{12}A_{22}^{-1}A_{21}\|\right)^{1/2}}, \tag{2.11}$$
where
$$A_0 = A_{11} - A_{12}A_{22}^{-1}A_{21}. \tag{2.12}$$
The proof is given in Kokotovic et al. (1986). Assuming $A_{22}$ is nonsingular,
$x_2 = -A_{22}^{-1}A_{21}x_1 - A_{22}^{-1}B_2 u$ can be substituted, which yields the singularly
perturbed slow approximation where we assume the fast dynamics associated
with $\dot{x}_2$ to be infinitely fast
$$\begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & B_1 - A_{12}A_{22}^{-1}B_2 \\ C_1 - C_2A_{22}^{-1}A_{21} & D - C_2A_{22}^{-1}B_2 \end{bmatrix} \begin{bmatrix} x_1 \\ u \end{bmatrix}. \tag{2.13}$$
The order of the approximation is reduced by the number of differential equations
associated with $x_2$. Note that a possibly sparse structure present in
the system matrices $A_{11}$ and $A_{22}$ is destroyed by the substitution of $x_2$, due to
the inversion of matrix $A_{22}$, which in general is not sparse. The motivation for this
reduction approach is that in general we are more interested in the slow process
dynamics, since slow dynamics dominate controller design and in practice limit
performance.
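A minimal sketch of the slow approximation (2.13), assuming the partitioned
system matrices are already in the standard form (2.10) with the first n1 states
slow (an illustration, not the thesis' own code):

    import numpy as np

    def residualize(A, B, C, D, n1):
        """Slow approximation (2.13): keep the first n1 (slow) states and treat
        the remaining (fast) states as instantaneous algebraic relations."""
        A11, A12 = A[:n1, :n1], A[:n1, n1:]
        A21, A22 = A[n1:, :n1], A[n1:, n1:]
        B1, B2 = B[:n1, :], B[n1:, :]
        C1, C2 = C[:, :n1], C[:, n1:]
        X = np.linalg.solve(A22, A21)             # A22^{-1} A21
        Y = np.linalg.solve(A22, B2)              # A22^{-1} B2
        return A11 - A12 @ X, B1 - A12 @ Y, C1 - C2 @ X, D - C2 @ Y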
A different approximation approach is truncation. Suppose we can distinguish
between varying and constant states by splitting $x$ into $x_1$ and $x_2$, respectively.
We approximate the slow dynamics associated with $\dot{x}_2$ to be infinitely
slow. So by substitution of $x_2 = c$ and elimination of $\dot{x}_2$ we can derive a
truncated model
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ u \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ c \\ u \end{bmatrix}. \tag{2.14}$$
Note that a possibly present sparse structure in $A_{11}$ is preserved in the truncated
model. However, it is rare that the original system is in this format. In general a
sparse structure will already be destroyed by the transformation into the format
that is suitable for truncation.
Both reduction techniques require a partition of the differential equations into slow
and fast dynamics or into varying and constant states, respectively. In practice the
differential equations are not in this standard form and need to be sorted first.
Sorting differential equations can be done by a permutation matrix, which can
be considered as a special transformation: $x_p = Px$, where $x_p$ is the permuted
state vector and $P$ a permutation matrix. Permutation does not affect the input
to output behavior of the system. The original (permuted) coordinate system is
seldom a coordinate system suitable for model reduction. In general we need a
so-called similarity transformation to arrive at a suitable coordinate system.
From linear system theory we know that a similarity transformation without
reduction does not affect the input to output behavior.
The proof is short and simple. The input to output behavior can be described
by a so-called transfer function in the Laplace domain. The Laplace transfer
function $G(s)$ that can be computed from the system matrices is
$$G(s) = \frac{y(s)}{u(s)} = C\left(sI - A\right)^{-1}B + D. \tag{2.15}$$
Transformation to another coordinate system $z$ by means of an invertible matrix
$T \in \mathbb{C}^{n_x \times n_x}$ does not affect the transfer function $G(s)$. This can be
demonstrated by considering the transformation $z = Tx$ with $TT^{-1} = I$. We can
write Equation (2.7) in the new coordinate system
$$\begin{bmatrix} \dot{z} \\ y \end{bmatrix} = \begin{bmatrix} TAT^{-1} & TB \\ CT^{-1} & D \end{bmatrix} \begin{bmatrix} z \\ u \end{bmatrix}, \tag{2.16}$$
which gives us the new system matrices in the transformed domain. The transfer
function of the transformed system is
$$G(s) = CT^{-1}\left(sI - TAT^{-1}\right)^{-1}TB + D. \tag{2.17}$$
Using a matrix equality for square and invertible matrices,
$$X^{-1}Y^{-1}Z^{-1} = (ZYX)^{-1}, \tag{2.18}$$
where $X = T$, $Y = (sI - TAT^{-1})$ and $Z = T^{-1}$, we can show by substitution
that
$$T^{-1}\left(sI - TAT^{-1}\right)^{-1}T = \left(sI - A\right)^{-1}, \tag{2.19}$$
which finishes the proof. Transformation to a new coordinate system is crucial
for the properties of the resulting reduced model, along with the choice for either
truncation or singular perturbation.
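The invariance (2.19) is easily checked numerically; the sketch below (an
illustration, not taken from the thesis) evaluates the transfer function of a
randomly generated system and of its transformed copy at a few frequencies:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, p = 4, 2, 2
    A = rng.standard_normal((n, n)) - 2 * np.eye(n)    # shifted to be stable-ish
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((p, n))
    D = np.zeros((p, m))
    T = rng.standard_normal((n, n))                    # invertible with probability one
    Tinv = np.linalg.inv(T)

    def G(s, A, B, C, D):
        """Transfer function (2.15) evaluated at the complex frequency s."""
        return C @ np.linalg.solve(s * np.eye(len(A)) - A, B) + D

    for s in (0.1j, 1j, 10j):
        G1 = G(s, A, B, C, D)
        G2 = G(s, T @ A @ Tinv, T @ B, C @ Tinv, D)    # transformed system (2.16)
        assert np.allclose(G1, G2)                     # invariance (2.19)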
Singular perturbation, truncation and transformation can be applied to a
set of nonlinear differential equations as we will see next.
Nonlinear model order reduction
Let us consider the set of nonlinear ordinary differential equations defined in
(2.1) and assume that the system is in the standard form of a singularly perturbed
system, with again $x_1$ and $x_2$ associated with the slow and fast dynamics,
respectively. We can reduce the number of differential equations by approximating the
fast dynamics by infinitely fast dynamics by letting $\varepsilon \to 0$, which results in an
algebraic constraint
$$\begin{bmatrix} \dot{x}_1 \\ \varepsilon\dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ 0 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix}. \tag{2.20}$$
Note that the set of ordinary differential equations is turned into a set of
differential and algebraic equations, but unlike for the linear model these equations
cannot in general be eliminated by substitution.

Suppose that we partitioned our system such that $x_1$ and $x_2$ represent the
varying and very slowly varying states, respectively. We can then approximate
the model by truncation where we assume the slow dynamics to be infinitely
slow, i.e. constant, $x_2 = x_2^*$,
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2^*, u) \\ g(x_1, x_2^*, u) \end{bmatrix}. \tag{2.21}$$
A proper choice for $x_2^*$ is a steady-state value of the original system. This can
be computed from $0 = f_1(x_1^*, x_2^*, u^*)$ and $0 = f_2(x_1^*, x_2^*, u^*)$, with
$y^* = g(x_1^*, x_2^*, u^*)$. The truncated model is again a set of ordinary differential
equations like the model we reduced.
Just like for linear systems we can apply a linear coordinate transformation
$z = T(x - x^*)$,
$$\begin{bmatrix} \dot{z} \\ y \end{bmatrix} = \begin{bmatrix} Tf(T^{-1}z + x^*, u) \\ g(T^{-1}z + x^*, u) \end{bmatrix}, \qquad z(t_0) = T(x_0 - x^*). \tag{2.22}$$
This coordinate transformation is required to do model reduction in a suitable
coordinate system. The transformation can be a permutation of the differential
equations but can also involve a full state transformation. In case of a full state
transformation the new states cannot be associated with physical states any
more, but the relation between new and old states is fixed by the transformation
matrix. In an attempt to prevent this, Tatrai et al. (1994a, 1994b) and Robertson
et al. (1996a, 1996b) tried to associate eigenvalues with states in the original
coordinate system using a linearization and a homotopy parameter.

There are different arguments to arrive at a suitable coordinate system; these
will be discussed in detail in the next section. The method using empirical
Gramians for a balanced reduction is based on the observation that some
combinations of states contribute more to the input-output behavior than other
combinations of states. A specific transformation arranges the new transformed states
in order of descending contribution. The method referred to as reduction by
proper orthogonal decomposition is based on the observation that some
combinations of states are approximately constant during simulation of predefined
input signals. A specific transformation arranges the new transformed states in
order of descending excitation.
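A minimal sketch of reduction by linear transformation and truncation,
Equations (2.21)-(2.22), assuming the model function f and a transformation
matrix T (obtained by any of the methods discussed below) are given as inputs;
only the first k transformed states are integrated, the rest are held at zero:

    import numpy as np
    from scipy.integrate import solve_ivp

    def simulate_truncated(f, x_star, u, T, k, x0, t_span):
        """Simulate the truncated model in transformed coordinates (2.21)-(2.22):
        only the first k states of z = T (x - x*) are integrated, the rest stay zero."""
        Tinv = np.linalg.inv(T)
        n = len(x_star)

        def rhs(t, z1):
            z = np.zeros(n)
            z[:k] = z1                         # truncated coordinates
            x = Tinv @ z + x_star              # back to physical states
            return (T @ f(x, u(t)))[:k]        # keep the dominant directions

        z1_0 = (T @ (x0 - x_star))[:k]
        return solve_ivp(rhs, t_span, z1_0)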
2.2 Balanced reduction
Balanced reduction can be explained by the Kalman decomposition in
block diagonal form, which partitions the system into four sets of state variables,
of which the first and second are controllable and the first and third are observable.
The fourth set of states is neither controllable nor observable:
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & B_1 \\ 0 & A_{22} & 0 & 0 & B_2 \\ 0 & 0 & A_{33} & 0 & 0 \\ 0 & 0 & 0 & A_{44} & 0 \\ C_1 & 0 & C_3 & 0 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ u \end{bmatrix}. \tag{2.23}$$
The only relevant part of this model is the part that is both controllable and
observable, since this is the part that can be affected by a controller in feedback.
This decomposition is invariant under similarity transformation. Small
perturbations of the system matrices $B$ and $C$ will prevent this exact decomposition,
and in case the system matrices are not analytically determined but the result
of some numerical routine, all zeros are represented by small values:
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & B_1 \\ 0 & A_{22} & 0 & 0 & B_2 \\ 0 & 0 & A_{33} & 0 & \varepsilon B_3 \\ 0 & 0 & 0 & A_{44} & \varepsilon B_4 \\ C_1 & \varepsilon C_2 & C_3 & \varepsilon C_4 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ u \end{bmatrix}. \tag{2.24}$$
Observe that compared to the unperturbed case $x_2$ is still controllable but now
weakly observable, $x_3$ is still observable but now weakly controllable, and $x_4$ is
now weakly controllable and weakly observable. We can now demonstrate the
effect of scaling. Assume we multiply $x_2$ by $\varepsilon$ and divide $x_3$ by $\varepsilon$; the scaled
system matrices are defined by
$$\begin{bmatrix} \dot{x}_1 \\ \varepsilon\dot{x}_2 \\ \tfrac{1}{\varepsilon}\dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & B_1 \\ 0 & A_{22} & 0 & 0 & \varepsilon B_2 \\ 0 & 0 & A_{33} & 0 & B_3 \\ 0 & 0 & 0 & A_{44} & \varepsilon B_4 \\ C_1 & C_2 & \varepsilon C_3 & \varepsilon C_4 & D \end{bmatrix} \begin{bmatrix} x_1 \\ \varepsilon x_2 \\ \tfrac{1}{\varepsilon} x_3 \\ x_4 \\ u \end{bmatrix}. \tag{2.25}$$
Compared to the unscaled case $x_2$ is now strongly observable but weakly
controllable, and the reverse holds for $x_3$. By scaling, strongly controllable states
that are weakly observable can be interchanged with weakly controllable states
that are strongly observable. For demonstration purposes we chose not to
perturb the $A$ matrix, but the effect of small nonzero values instead of zeros is
similar for the observability and controllability properties.
The Kalman decomposition is a theoretical decomposition that helps to
understand the concepts of observability and controllability, but it cannot be applied in
practice. This is caused by the presence of many small values in the system
matrices. Furthermore, we demonstrated the effect of scaling on observability and
controllability. Observability and controllability are not invariant under scaling;
however, we know that the input-output behavior is. This naturally leads us to
the model approximation by balanced truncation as posed by Moore (1981). After
a specific transformation a single state of the transformed system is equally
controllable and observable, or so-called balanced. Furthermore, the states are
ordered in descending degree of controllability and observability.
Linear balanced reduction
Let us consider the continuous time stable linear time-invariant system (2.7).
The controllability Gramian and the observability Gramian, respectively $P$ and
$Q$, are defined as
$$P = \int_0^\infty e^{At} BB^T e^{A^T t}\, dt, \qquad Q = \int_0^\infty e^{A^T t} C^T C e^{At}\, dt, \tag{2.26}$$
and satisfy the Lyapunov equations
$$AP + PA^T + BB^T = 0, \qquad A^T Q + QA + C^T C = 0. \tag{2.27}$$
The quadruple $A, B, C, D$ is a balanced realization if and only if
$$P = Q = \Sigma, \tag{2.28}$$
where $\Sigma$ is diagonal with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$. If the system is not a balanced
realization there exists a transformation after which the transformed system is
a balanced realization. The diagonal elements of $\Sigma$ are also referred to as the
Hankel singular values of the system, which are invariant under transformation.

The similarity transformation $T$ that balances the system, if observable,
controllable and stable, can be derived from the controllability Gramian and
observability Gramian (see e.g. Zhou, 1995):
$$P = R^T R, \qquad RQR^T = U\Sigma^2 U^T, \qquad T = \Sigma^{1/2} U^T R^{-T}, \tag{2.29}$$
which finalizes the approach in the continuous time linear case. A more detailed
description can be found in Appendix A.1.
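A sketch of continuous time balanced truncation along the lines of (2.26)-(2.29),
using SciPy's Lyapunov solver; this illustration assumes a stable, minimal
realization so that the Gramians are positive definite:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

    def balanced_truncation(A, B, C, D, k):
        """Balanced truncation of a stable, minimal LTI system, keeping k states."""
        P = solve_continuous_lyapunov(A, -B @ B.T)     # controllability Gramian (2.27)
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # observability Gramian (2.27)
        R = cholesky(P, lower=False)                   # P = R^T R
        U, s2, _ = svd(R @ Q @ R.T)                    # R Q R^T = U Sigma^2 U^T
        sigma = np.sqrt(s2)                            # Hankel singular values
        T = np.diag(sigma**0.5) @ U.T @ np.linalg.inv(R.T)   # balancing T (2.29)
        Tinv = np.linalg.inv(T)
        Ab, Bb, Cb = T @ A @ Tinv, T @ B, C @ Tinv
        return Ab[:k, :k], Bb[:k, :], Cb[:, :k], D, sigma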
Since we will work in this chapter with discrete time models, we also present
the discrete time observability and controllability Gramian definitions. Let us
therefore define a discrete time model for the linear time-invariant system defined
in Equation (2.7),
$$\begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} F & G \\ C & D \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix}, \tag{2.30}$$
where $F$ and $G$ are discrete transition matrices defined as
$$F = e^{At_s} \qquad \text{and} \qquad G = \int_0^{t_s} e^{A(t_s - \tau)} B\, d\tau, \tag{2.31}$$
for piece-wise constant (zero order hold) input signals, and where $t_s$ is the sample
interval.

For a discrete time linear system (2.30) the controllability and observability
Gramians are defined as
$$W_c = \sum_{k=0}^{\infty} F^k G G^T (F^T)^k, \qquad W_o = \sum_{k=0}^{\infty} (F^T)^k C^T C F^k, \tag{2.32}$$
which satisfy the Lyapunov equations
$$F W_c F^T + GG^T - W_c = 0, \qquad F^T W_o F + C^T C - W_o = 0. \tag{2.33}$$
The discrete time system defined by the system matrix quadruple $F, G, C, D$ is
balanced if and only if
$$W_c = W_o = \Sigma_d, \tag{2.34}$$
where $\Sigma_d$ contains the Hankel singular values of the discrete time system. In case
a discrete system is not a balanced realization there exists a transformation
after which the transformed system is a balanced realization. This transformation
can be computed as described in Equation (2.29), but with the discrete time
controllability and observability Gramians.

The finite time discrete time controllability and observability Gramians are
defined as
$$W_c = \sum_{k=0}^{N} F^k G G^T (F^T)^k, \qquad W_o = \sum_{k=0}^{N} (F^T)^k C^T C F^k, \tag{2.35}$$
where $N$ is a large number instead of $\infty$. These Gramians are approximate
solutions of the Lyapunov Equations (2.33) and will be used later on in this
chapter.
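A small sketch of the zero order hold discretization (2.31) followed by the finite
time Gramian sums (2.35); scipy.signal.cont2discrete is one way to obtain F
and G:

    import numpy as np
    from scipy.signal import cont2discrete

    def finite_time_gramians(A, B, C, D, ts, N):
        """Zero order hold discretization (2.31) followed by the finite time
        Gramian sums (2.35)."""
        F, G, _, _, _ = cont2discrete((A, B, C, D), ts)
        n = A.shape[0]
        Wc, Wo = np.zeros((n, n)), np.zeros((n, n))
        Fk = np.eye(n)                                 # F^k, starting at k = 0
        for _ in range(N + 1):
            Wc += Fk @ G @ G.T @ Fk.T
            Wo += Fk.T @ C.T @ C @ Fk
            Fk = F @ Fk
        return Wc, Wo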
Numerical issues
The transformation corresponding to a balanced realization of the original
system can be numerically ill conditioned. This is caused by the small Hankel
singular values that appear in the transformation matrix. Inversion of these
small Hankel singular values exposes the addressed numerical problem. This
problem can be circumvented in case we are only interested in the approximate
reduced model. Decompose the Gramians as
$$P = U_c \Sigma_c^2 U_c^T, \tag{2.36}$$
$$Q = U_o \Sigma_o^2 U_o^T. \tag{2.37}$$
Define the Hankel matrix
$$H := \Sigma_o U_o^T U_c \Sigma_c, \tag{2.38}$$
and define the singular value decomposition of $H$,
$$H = U_H \Sigma_H V_H^T. \tag{2.39}$$
Then the transformation matrix $T$ is defined as
$$T := \Sigma_H^{1/2} V_H^T \Sigma_c^{-1} U_c^T, \tag{2.40}$$
with inverse
$$T^{-1} = U_c \Sigma_c V_H \Sigma_H^{-1/2}, \tag{2.41}$$
or dually we can define $T$ as
$$T := \Sigma_H^{-1/2} U_H^T \Sigma_o U_o^T, \tag{2.42}$$
with inverse
$$T^{-1} = U_o \Sigma_o^{-1} U_H \Sigma_H^{1/2}. \tag{2.43}$$
Transformation $T$ brings the system (2.7) into a balanced realization. The proof is
by substitution of the transformation matrices and the Gramian decompositions of $P$
and $Q$ into the Lyapunov equations (2.27).

In case we are only interested in the reduced-order model we can derive a
well conditioned projection,
$$\begin{bmatrix} \dot{z}_1 \\ y \end{bmatrix} = \begin{bmatrix} T_L A T_R^{-1} & T_L B \\ C T_R^{-1} & D \end{bmatrix} \begin{bmatrix} z_1 \\ u \end{bmatrix}, \tag{2.44}$$
where
$$T_L = \Sigma_{H_1}^{-1/2} U_{H_1}^T \Sigma_o U_o^T, \tag{2.45}$$
and
$$T_R^{-1} = U_c \Sigma_c V_{H_1} \Sigma_{H_1}^{-1/2}, \tag{2.46}$$
with $H$ partitioned as
$$H = \begin{bmatrix} U_{H_1} & U_{H_2} \end{bmatrix} \begin{bmatrix} \Sigma_{H_1} & 0 \\ 0 & \Sigma_{H_2} \end{bmatrix} \begin{bmatrix} V_{H_1}^T \\ V_{H_2}^T \end{bmatrix}. \tag{2.47}$$
Observe that the small Hankel singular values are truncated, which yields a well
conditioned computation of the projection matrices.
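A sketch of the well conditioned projection (2.44)-(2.47), assuming the Gramians
P and Q are already available (from the Lyapunov equations or as empirical
Gramians); the small Hankel singular values are truncated rather than inverted:

    import numpy as np

    def balancing_projection(P, Q, k):
        """Projection matrices T_L and T_R^{-1} of (2.45)-(2.46), keeping the k
        largest Hankel singular values; the small ones are truncated, not inverted."""
        sc2, Uc = np.linalg.eigh(P)                    # P = Uc Sc^2 Uc^T (2.36)
        so2, Uo = np.linalg.eigh(Q)                    # Q = Uo So^2 Uo^T (2.37)
        Sc = np.diag(np.sqrt(np.clip(sc2, 0.0, None)))
        So = np.diag(np.sqrt(np.clip(so2, 0.0, None)))
        H = So @ Uo.T @ Uc @ Sc                        # Hankel matrix (2.38)
        UH, sH, VHt = np.linalg.svd(H)                 # (2.39)
        d = np.diag(sH[:k] ** -0.5)
        TL = d @ UH[:, :k].T @ So @ Uo.T               # (2.45)
        TRinv = Uc @ Sc @ VHt[:k, :].T @ d             # (2.46)
        return TL, TRinv, sH

    # Reduced system (2.44): (TL @ A @ TRinv, TL @ B, C @ TRinv, D)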
Let us look at a special case to gain more insight into Gramian based balancing.
We define a stable linear system with one input and one output, where $\Lambda$ is
diagonal with on the diagonal the sorted eigenvalues $|\lambda_1| < |\lambda_2| < \cdots < |\lambda_n|$,
$B = [\varepsilon^0, \varepsilon^1, \ldots, \varepsilon^{n-1}]^T$ with $\varepsilon < 1$, $B = C^T$ and $D = 0$. For a third
order system the example expands to
$$\begin{bmatrix} \dot{x} \\ y \end{bmatrix} = \begin{bmatrix} \Lambda & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix} \;\rightarrow\; \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ y \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & 0 & 1 \\ 0 & \lambda_2 & 0 & \varepsilon \\ 0 & 0 & \lambda_3 & \varepsilon^2 \\ 1 & \varepsilon & \varepsilon^2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ u \end{bmatrix}. \tag{2.48}$$
This form is not a balanced realization, although the system is perfectly
symmetrical and the input to output contribution of each state decreases. This
can be verified by the corresponding Lyapunov equations as defined in (2.27).
Substitution of the system matrices yields
$$\begin{aligned} \Lambda P + P\Lambda^T + BB^T &= 0 \\ \Lambda^T Q + Q\Lambda + C^T C &= 0 \end{aligned} \;\rightarrow\; \begin{aligned} \Lambda P + P\Lambda + BB^T &= 0 \\ \Lambda Q + Q\Lambda + BB^T &= 0 \end{aligned} \;\rightarrow\; P = Q. \tag{2.49}$$
The system would be a balanced realization if $P = Q = \Sigma$ with $\Sigma$ diagonal and
with on the diagonal the Hankel singular values. In this example $P$ cannot be
diagonal because $\Lambda$ is diagonal and $BB^T$ is not.

The transformation that brings the system into a balanced realization is $U^T$,
which is derived from the singular value decomposition of $P$, where $P = U\Sigma U^T$.
Substitution in the Lyapunov equations yields
$$\begin{aligned} \Lambda U\Sigma U^T + U\Sigma U^T \Lambda + BB^T &= 0, \\ \Lambda U\Sigma U^T + U\Sigma U^T \Lambda + C^T C &= 0. \end{aligned} \tag{2.50}$$
Pre-multiplication with $U^T$ and post-multiplication with $U$ gives
$$\begin{aligned} U^T\Lambda U\Sigma + \Sigma U^T\Lambda U + U^T BB^T U &= 0 \\ U^T\Lambda U\Sigma + \Sigma U^T\Lambda U + U^T C^T C U &= 0 \end{aligned} \;\rightarrow\; \begin{aligned} \hat{A}\Sigma + \Sigma\hat{A}^T + \hat{B}\hat{B}^T &= 0 \\ \hat{A}^T\Sigma + \Sigma\hat{A} + \hat{C}^T\hat{C} &= 0 \end{aligned}. \tag{2.51}$$
The balanced realization of the example is
$$\begin{bmatrix} \dot{z} \\ y \end{bmatrix} = \begin{bmatrix} U^T\Lambda U & U^T B \\ CU & D \end{bmatrix} \begin{bmatrix} z \\ u \end{bmatrix} = \begin{bmatrix} \hat{A} & \hat{B} \\ \hat{C} & D \end{bmatrix} \begin{bmatrix} z \\ u \end{bmatrix}. \tag{2.52}$$
If $\varepsilon \ll 1$ in the example, $U$ approaches the unity matrix and in that case the
example already is in a balanced realization.
Empirical Gramians
For linear systems we used Gramians to compute a balanced realization suitable
for model reduction by truncation. These Gramians are the solution of two
Lyapunov equations that can be solved. For nonlinear systems we cannot derive
Gramians in this way. The idea behind empirical Gramians is to derive Gramians
from data generated by simulation. If it is possible to construct Gramians
from simulated data for linear systems, the same technique can be applied to
nonlinear systems, since simulation of a nonlinear system is possible. In this way
it is possible to construct a Gramian that is associated with a nonlinear
system.

Lall (1999) introduced the idea of empirical Gramians. Empirical Gramians
are closely related to the covariance matrix introduced by Pallaske (1987),
$$M = \int_G \int_0^\infty (x(t) - x^*)(x(t) - x^*)^T \, dt \, dG, \tag{2.53}$$
although Pallaske did not relate the approach to balancing. The symbol $G$
denotes a set of trajectories resulting from a variation of initial conditions and
input signals. Löffler and Marquardt (1991) further elaborated the approach
and suggested to use a set of step responses to generate data that represent the
system's dynamics.

Lall (1999) reconstructs the Gramians from data generated by either an
impulse response, for the controllability Gramian, or a response to an initial
condition, to derive the observability Gramian. Because of the orthogonality of the
impulses and initial conditions, the data adds up to the controllability and
observability Gramian, respectively.

Lall defined the following sets for the empirical input Gramian:
$$\begin{aligned} \mathcal{T}^n &= \{T_1, \ldots, T_r \mid T_l \in \mathbb{R}^{n\times n},\; T_l T_l^T = I,\; l = 1, \ldots, r\}, \\ \mathcal{M} &= \{c_1, \ldots, c_s \mid c_m \in \mathbb{R}^+,\; m = 1, \ldots, s\}, \\ \mathcal{E}^n &= \{e_1, \ldots, e_n \mid \text{standard unit vectors in } \mathbb{R}^n\}, \end{aligned} \tag{2.54}$$
where $r$ is the number of different perturbation orientations, $s$ is the number of
different perturbation magnitudes, and $n$ is the number of inputs of the system
for the controllability Gramian and the number of states of the full-order system
for the observability Gramian. Lall does not motivate the choice of these sets,
nor does he give an interpretation of his definition of empirical Gramians. He
simply presents the definitions and proves that for a linear system the empirical
Gramian coincides with the classical definition of a Gramian.
Empirical controllability Gramian. Let $\mathcal{T}^p$, $\mathcal{M}$ and $\mathcal{E}^p$ be given as
described above, where $p$ is the number of inputs of the system. The empirical
controllability Gramian for system (2.7) is defined by
$$P = \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{i=1}^{p} \frac{1}{rsc_m^2} \int_0^\infty \Phi^{ilm}(t)\, dt, \tag{2.55}$$
where $\Phi^{ilm}(t) \in \mathbb{R}^{n\times n}$ is given by
$$\Phi^{ilm}(t) := (x^{ilm}(t) - x_{ss})(x^{ilm}(t) - x_{ss})^T, \tag{2.56}$$
and $x^{ilm}(t)$ is the state of the system corresponding to the impulse response
$u(t) = c_m T_l e_i \delta(t) + u_{ss}$ with initial condition $x_0 = x_{ss}$. The proof can be found
in Lall (1999).

The definition of the empirical controllability Gramian is explained as follows.
The first summation in the empirical controllability Gramian can be given
the interpretation of a rotation of the standard unit vectors in $\mathbb{R}^p$. Infinitely many
different rotations would form a unity sphere in $\mathbb{R}^p$ with its origin located in
$u_{ss}$. Each of these rotated unit vectors is used as a direction for a Dirac pulse to
excite the system. By simulation of the free response of the system to this Dirac
pulse we can see how the energy is absorbed by the system and manifests itself
in state trajectories. This gives a measure for controllability of that specific
input direction. If the Dirac pulse results in a large excursion in a specific direction
of the state space, we give it the notion of a well controllable subspace. In
order to search the state space for nonlinear behavior, the input space is gridded.
This gridding is done by defining different rotations and different amplitudes.
The different amplitudes explain the second summation in the definition of the
controllability Gramian. Since the different responses are the result of Dirac
pulses with different energy levels, this energy level is compensated for to make
the responses comparable. This explains the division by $c_m^2$ of each response.
The rotation of unit vectors makes sure that all input directions are equally
represented, which is not only intuitively the right thing to do but appears to be
a necessity for the construction of the empirical controllability Gramian in this
way.
Empirical observability Gramian. Let $\mathcal{T}^n$, $\mathcal{M}$ and $\mathcal{E}^n$ be given as described
above, where $n$ is the number of states of the original system. The empirical
observability Gramian for system (2.7) is defined by
$$Q = \sum_{l=1}^{r} \sum_{m=1}^{s} \frac{1}{rsc_m^2} \int_0^\infty T_l \Psi^{lm}(t) T_l^T\, dt, \tag{2.57}$$
where $\Psi^{lm}(t) \in \mathbb{R}^{n\times n}$ is given by
$$\Psi_{ij}^{lm}(t) := (y^{ilm}(t) - y_{ss})^T (y^{jlm}(t) - y_{ss}), \tag{2.58}$$
and $y^{ilm}(t)$ is the output of the system corresponding to the initial condition
$x_0 = c_m T_l e_i + x_{ss}$, and $y_{ss}$ is the steady-state value corresponding to the input
$u(t) = u_{ss}$. The proof can be found in Lall (1999).

The first summation in the empirical observability Gramian can be given
the interpretation of a rotation of the standard unit vectors in $\mathbb{R}^n$. Infinitely
many different rotations would form a unity sphere in $\mathbb{R}^n$ with its origin located
in the steady state of the system. Each of these rotated unit vectors is used as a
perturbation on the steady state $x_{ss}$ of the system. By simulation of the free
response of the system from this initial condition we can see how the energy
manifests itself in the output. This gives a measure for observability of that
specific direction in the state space. If it takes little energy to drive the system
in that direction of the state space and the energy strongly manifests itself in the
output, we give it the notion of a well controllable and observable subspace. In
order to search the state space for nonlinear behavior, the state space is gridded.
This gridding is done by defining different rotations and different amplitudes.
The different amplitudes are the second summation in the definition of the
observability Gramian. Since the different responses start with different energy
levels, this energy level is compensated for to make the responses comparable.
This explains the division by $c_m^2$ of each response. The rotation of unit vectors
makes sure that all directions are equally represented, which is not only
intuitively the right thing to do but appears to be a necessity for the construction of the
empirical observability Gramian in this way.
Hahn et al. (2000) translated this to a discrete time version based on the
same definitions as Lall. Roughly speaking Hahn replaced the integral in the
empirical Gramians by a finite sum approximation of the sampled continuous
time system, where xk is short hand notation for x|t=k∆t .
Discrete time empirical controllability Gramian. Let T p , M and E p be
given as described above, where p is the number of inputs of the system. The
44
discrete time empirical controllability Gramian for system (2.30) is defined by
Wc =
p
r s l=1 m=1 i=1
q
1 ilm
Φk ,
rsc2m
(2.59)
k=0
where Φilm
∈ Rn×n is given by
k
ilm
− xss )(xilm
− xss )T ,
Φilm
k (t) := (xk
k
(2.60)
is the state of the system corresponding to the impulse response
and xilm
k
uk = cm Tl ei δk=0 + uss with initial condition x0 = xss .
Discrete time empirical observability Gramian. Let T^n, M and E^n be given as described above, where n is the number of states of the original system. The discrete time empirical observability Gramian for system (2.30) is defined by

    W_o = \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{k=0}^{q} \frac{1}{r s c_m^2} T_l \Psi_k^{lm} T_l^T ,   (2.61)

where \Psi_k^{lm} \in R^{n \times n} is given by

    (\Psi_k^{lm})_{ij} := (y_k^{ilm} - y_{ss})^T (y_k^{jlm} - y_{ss}) ,   (2.62)

and y_k^{ilm} is the output of the system corresponding to the initial condition x_0 = c_m T_l e_i + x_{ss} and input u_k = u_{ss}.
This approach makes it possible to reconstruct discrete time Gramians from sampled data. If this data is generated by a linear system, the solution converges to the same solution as obtained by solving the Lyapunov equations (2.33). For data generated by a nonlinear system we end up with a so-called empirical Gramian. The empirical Gramian, and the way Hahn derives it, will be assessed later in this chapter.
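To make the mechanics of these definitions concrete, the following minimal sketch (Python with NumPy/SciPy; not part of the original text) evaluates the discrete time empirical controllability Gramian of Equations (2.59)-(2.60) for a small linear test system, so the result can be checked against the Lyapunov solution. The system matrices, dimensions and perturbation sets are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, p, q = 4, 2, 200                                     # states, inputs, horizon
F = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # stable F (spectral radius 0.9)
G = rng.standard_normal((n, p))

# Perturbation sets: rotations T_l, amplitudes c_m, unit directions e_i
Ts = [np.eye(p), np.linalg.qr(rng.standard_normal((p, p)))[0]]   # r = 2 rotations
cs = [0.5, 1.0]                                                  # s = 2 amplitudes
r, s = len(Ts), len(cs)

Wc = np.zeros((n, n))
for Tl in Ts:
    for cm in cs:
        for i in range(p):
            u0 = cm * Tl @ np.eye(p)[:, i]   # impulse u_k = c_m T_l e_i delta_{k=0}
            x = G @ u0                       # state after the impulse (x_ss = 0)
            for _ in range(q):               # accumulate Phi_k / (r s c_m^2)
                Wc += np.outer(x, x) / (r * s * cm**2)
                x = F @ x
print(np.allclose(Wc, solve_discrete_lyapunov(F, G @ G.T), atol=1e-8))  # True

For a linear system the rotations and amplitudes cancel out exactly, which is precisely the averaging mechanism discussed later in this chapter.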
2.3  Proper orthogonal decomposition

Different names for proper orthogonal decomposition exist, such as Karhunen-Loève expansion or method of empirical eigenfunctions. It provides a very effective way to compute low order approximate models. The idea of a proper orthogonal decomposition is to find an orthogonal transformation maximizing the energy content in the first basis vectors. A common way to determine these optimal basis vectors (e.g. Aling et al., 1996; Shvartsman and Kevrekidis, 1998) will be described next. For more references on this topic the reader is referred to the literature section in the previous chapter.
Let us execute a simulation with a model defined by Equations (2.1) and input sequence u(t). Sampling the state trajectories of this simulation provides a so-called snapshot of the system

    X_N = [\Delta x(t_0)\ \ \Delta x(t_1)\ \ \cdots\ \ \Delta x(t_N)] ,   (2.63)

where \Delta x(t_k) = x(t_k) - x^* and x^* is a steady state. The snapshot matrix X_N \in R^{n_x \times N} with N \gg n_x can be an ensemble of different simulations. A singular value decomposition of X_N is defined as

    X_N = U \Sigma V^T ,   (2.64)

where \Sigma is diagonal with \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n and with U U^T = I and V V^T = I.
X_N can be partitioned as

    X_N = [U_1\ \ U_2] \begin{bmatrix} \Sigma_1 & 0 & 0 \\ 0 & \Sigma_2 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \\ V_3^T \end{bmatrix} .   (2.65)

Suppose that \sigma_{min}(\Sigma_1) \gg \sigma_{max}(\Sigma_2); then we can assume \Sigma_2 = 0, which implies that

    X_N = U_1 \Sigma_1 V_1^T , \qquad 0 = U_2 \Sigma_2 V_2^T .   (2.66)
Apparently the transformation U distinguishes between two subspaces, where most energy is captured by U_1 and the rest of the energy is gathered in U_2. Let us define a new coordinate system

    \begin{bmatrix} z_1(t) \\ z_2(t) \end{bmatrix} = \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} (x(t) - x^*) .   (2.67)
The set of ordinary differential equations defined by Equations (2.1) can be reduced by transformation and truncation,

    \begin{bmatrix} \dot z_1 \\ y \end{bmatrix} = \begin{bmatrix} U_1^T f(U_1 z_1 + x^*, u) \\ g(U_1 z_1 + x^*, u) \end{bmatrix} , \qquad z_1(t_0) = U_1^T (x_0 - x^*) , \quad z_2 = 0 .   (2.68)

Or, without elimination of x in the right hand side, which can be practical from an implementation point of view,

    \begin{bmatrix} x \\ \dot z_1 \\ y \end{bmatrix} = \begin{bmatrix} U_1 z_1 + x^* \\ U_1^T f(x, u) \\ g(x, u) \end{bmatrix} , \qquad z_1(t_0) = U_1^T (x_0 - x^*) , \quad z_2 = 0 .   (2.69)
The same transformation can be used to reduce the model by residualization,

    \begin{bmatrix} \dot z_1 \\ 0 \\ y \end{bmatrix} = \begin{bmatrix} U_1^T f(U_1 z_1 + U_2 z_2 + x^*, u) \\ U_2^T f(U_1 z_1 + U_2 z_2 + x^*, u) \\ g(U_1 z_1 + U_2 z_2 + x^*, u) \end{bmatrix} , \qquad z_1(t_0) = U_1^T (x_0 - x^*) ,   (2.70)

which is equivalent to

    \begin{bmatrix} \dot z_1 \\ 0 \\ y \\ x \end{bmatrix} = \begin{bmatrix} U_1^T f(x, u) \\ U_2^T f(x, u) \\ g(x, u) \\ U z + x^* \end{bmatrix} , \qquad z_1(t_0) = U_1^T (x_0 - x^*) .   (2.71)

Recall that in case of residualization we reduce the number of differential equations, transforming the set of ordinary differential equations (ODE) into a set of differential and algebraic equations (DAE).
Proper orthogonal decomposition revised

State transformation and scaling before applying the singular value decomposition have a decisive effect on the transformation, and thus on the reduction of the model,

    \tilde X_N = W X_N = W U \Sigma V^T .   (2.72)

A state coordinate change has a major effect on the model reduction. This is an important observation:

    Reduction by means of proper orthogonal decomposition
    strongly depends on the specific choice of coordinate system.

This can be proven by the special choice W = \Sigma^{-1} U^T, which yields a coordinate system where all states are equally important,

    \tilde X_N = W U \Sigma V^T = \Sigma^{-1} U^T U \Sigma V^T = \tilde U \tilde \Sigma V^T ,   (2.73)

with \tilde U = I \in R^{n_x \times n_x} and \tilde \Sigma = I \in R^{n_x \times N}.
A pragmatic solution is scaling with a diagonal matrix that has on its diagonal the reciprocal of the difference between the maximal and minimal allowable value of the corresponding variable. In this way all variables are normalized and differences in units are compensated for. Most successes reported in papers on model reduction with proper orthogonal projection consider discretized partial differential equations with only one type of variable (e.g. only temperatures). When a model consists of variables with different units, their relative importance depends on the specific choice of units. Proper orthogonal decomposition applied to a model involving temperatures and mass, where mass is expressed in kilograms and temperatures in kelvin, will give different results than the same model with mass expressed in tons.
Computation of the singular value decomposition of a snapshot can be quite demanding. This is caused mainly by the computation of V^T, which is an N × N matrix with N very large. In case we are only interested in the orthogonal projection matrix U, we can reduce the computational load by computing the singular value decomposition of X_N X_N^T,

    X_N X_N^T = U \Sigma V^T V \Sigma U^T = U \Sigma^2 U^T .   (2.74)

In this way we do not compute V^T and still have access to the projection matrix U and the corresponding singular values \Sigma.
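As an illustration of this shortcut, the following sketch (Python/NumPy; not from the original text, all names and dimensions are assumptions) computes the POD basis U and the singular values from the small n_x × n_x matrix X_N X_N^T via an eigendecomposition, avoiding the N × N factor V entirely.

import numpy as np

rng = np.random.default_rng(1)
nx, N = 10, 5000
X = rng.standard_normal((nx, 3)) @ rng.standard_normal((3, N))  # rank-3 snapshot matrix

S = X @ X.T                               # nx x nx, cheap compared to the N x N of V
lam, U = np.linalg.eigh(S)                # S = U diag(lam) U^T, lam in ascending order
order = np.argsort(lam)[::-1]             # sort by decreasing energy
U, lam = U[:, order], lam[order]
sigma = np.sqrt(np.clip(lam, 0.0, None))  # singular values of X_N

k = 3                                     # truncation order, e.g. from the sigma decay
U1 = U[:, :k]                             # POD basis: z1 = U1.T @ (x - x_star)
print(sigma[:5])                          # only the first three are significant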
The data for the snapshot matrix is generated by simulation of the model with a specific choice of inputs. A transformation based on this snapshot strongly depends on the choice of input signals. White noise input signals on all input channels do not discriminate between high and low frequency content and will excite the model without preference. In case we have information on what frequency range is relevant for the model to approximate, we can put more energy in this part of the input signal, which results in a better approximation in this frequency range. In case we have a clear idea what the relevant input signals look like, we can fine tune the model reduction to this set of input signals. This most probably allows for a larger degree of model reduction. However, for signals not represented in the set of input signals used for generation of the snapshots we cannot expect good performance of the reduced model. So, depending on the knowledge of future input signals, this property can either be an advantage or a disadvantage.
In case a model based optimization is used to explore new optimal trajectories, we do not know what the input trajectories will look like. In that case we have to resort to some white random type of input signal, most probably still allowing for a certain degree of reduction. One can think of the weakly controllable states: regardless of the choice of inputs, these states will only receive little energy from the inputs and will be reduced in this approach anyway. The discrete time controllability Gramian and the snapshot matrix are strongly related in case white noise inputs are used,

    X_N X_N^T = N \cdot W_c = N \cdot U_c \Sigma_c^2 U_c^T ,   (2.75)

with N the number of samples in the snapshot matrix, W_c the discrete time controllability Gramian and U_c and \Sigma_c^2 the orthogonal matrix and singular value matrix of the controllability Gramian, respectively. Note that the transformation matrix used for proper orthogonal projection coincides with the singular value decomposition of the controllability Gramian. See Appendix B for details.
In case a signal other than white noise is used to generate the snapshot matrix, we can consider this as the result of a filtered white noise signal. This brings us to the topic of so-called frequency weighted balanced reduction; see for details e.g. Wortelboer (1994). Frequency analysis of the input signals used to generate the snapshot matrix will provide insight into the frequency ranges in which the model is excited. All input data without a white spectrum indicate some kind of frequency weighting. So the observation is that:

    For a linear system a snapshot matrix times its transpose approximates
    the (frequency weighted) discrete time controllability Gramian.

Dynamics related to input directions and frequency ranges that are not excited will not be present in the data and therefore will be removed from the model. This is not a model property but the result of input signal selection. Note that a model itself acts as a filter, so in case two units are in a series connection the first acts as a filter on signals that are applied to the first unit before they affect the second unit. It is even more interesting when the output of the second unit affects the first unit by means of a material recycle stream or feedback control. Although the above reasoning is based on linear model properties, for smooth nonlinear systems we can argue that a similar reasoning holds.
Singular values can provide information on the possible model order reduction. However, model stability is a more important property of the reduced model. Unfortunately no guarantees exist for stability of projected nonlinear models. Even stronger, a lower order reduced model can be stable for a specific trajectory whereas a higher order reduced model becomes unstable for the same trajectory. This issue will be elaborated on in Chapter 5.
Löffler and Marquardt (1991) motivated and applied a weighted Euclidean norm \|\cdot\|_Q defined as

    \|x\|_Q = \sqrt{x^T Q^T Q x} ,   (2.76)

with a positive definite and square weighting matrix Q. The covariance matrix defined in Equation (2.53) can then be written as

    M = \int_G \int_0^\infty Q (x(t) - x^*)(x(t) - x^*)^T Q^T \, dt \, dG .   (2.77)

This weighted covariance matrix can be approximated by Q X_N X_N^T Q^T, which shows that model reduction by a weighted proper orthogonal decomposition is strongly related to model reduction based on the covariance matrix M. If G consists of the same set of simulations that generated X_N, reduction based on proper orthogonal decomposition and reduction based on the covariance matrix will be almost identical.
In this chapter two important model reduction techniques from the literature were discussed and given an interpretation. We will now proceed with a reformulation of the empirical Gramian in a new format that is more flexible and more insightful than the empirical Gramians previously defined in this chapter.
2.4  Balanced reduction revisited
In this section we will focus on the properties and interpretation of empirical Gramians and how these are derived. First we will formulate a simpler way to derive the discrete time empirical Gramians as introduced by Lall (1999). This will be referred to as perturbed data based empirical Gramians. This formulation allows for a more flexible way of deriving empirical Gramians. Then this formulation is extended to a generalized formulation that allows for a derivation of empirical Gramians with almost arbitrary input trajectories. This will be referred to as the generalized empirical Gramians. Finally we will explain how to interpret the empirical Gramian by means of a simple example revealing the true mechanism behind empirical Gramians.
From a theoretical point of view, balanced reduction using Gramians is preferred over reduction by orthogonal decomposition, since it really takes into account the model contribution between input and output and does not depend on the internal coordinate system (see previous section). The effect of scaling of selected inputs and outputs on the reduced model can qualitatively be predicted with common engineering sense. Scaling of the states is still preferable, but only from a numerical conditioning point of view. Theoretically a state transformation does not affect the result of balanced model reduction.
Discrete time perturbed data based Gramians

Gramians can be reconstructed from data generated by perturbation of the steady-state of a linear discrete time system. For the empirical observability Gramian we perturb the initial condition in different directions and simulate the free response to its equilibrium. For the empirical controllability Gramian we apply Dirac pulses in different directions via the inputs and simulate the free responses. This is the basis on which the empirical Gramians by Lall (1999) were defined. The constraint that is enforced by Lall is that each set of perturbations is orthogonal. This constraint is crucial for the computation of the empirical Gramians as formulated in his framework. This restriction is somewhat artificial, can be very impractical, and above all is unnecessary, as we will demonstrate next.
For this reformulation new data matrices are required, and these are defined next. Suppose a stable linear discrete time system as defined in (2.30) is in equilibrium with u^* = 0 and therefore x^* = 0. Let us define Y_N as an output response data matrix

    Y_N = \begin{bmatrix} y_0^1 & y_0^2 & \cdots & y_0^p \\ y_1^1 & y_1^2 & \cdots & y_1^p \\ \vdots & \vdots & & \vdots \\ y_q^1 & y_q^2 & \cdots & y_q^p \end{bmatrix} ,   (2.78)

where y_k^r is the value of the output at time t = kh, k = {0, ..., q}, of the r-th free response observed in the output. This free response is the result of a perturbed initial condition x_0^r. All perturbed initial conditions are stacked in a second matrix X_0 defined as

    X_0 = [x_0^1\ \ x_0^2\ \ \cdots\ \ x_0^p] .   (2.79)

With the definitions of Y_N and X_0 we have the two matrices required to compute the discrete time perturbed data based observability Gramian.
Discrete time perturbed data based observability Gramian
Let Y_N and X_0 be given as described above. The data based discrete time observability Gramian, as defined in Equation (2.32) for the discrete time linear system defined in Equation (2.30), is

    W_o = (X_0^\dagger)^T Y_N^T Y_N X_0^\dagger ,   (2.80)

where X_0^\dagger is a right inverse of X_0,

    X_0^\dagger = X_0^T (X_0 X_0^T)^{-1} .   (2.81)
Proof:
The output data matrix Y_N can be written as

    Y_N = \Gamma_o X_0 ,   (2.82)

with \Gamma_o the discrete time observability matrix

    \Gamma_o = \begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix} .   (2.83)

Substitution yields

    W_o = (X_0^\dagger)^T X_0^T \Gamma_o^T \Gamma_o X_0 X_0^\dagger = \Gamma_o^T \Gamma_o   (2.84)
        = \begin{bmatrix} C^T & F^T C^T & \cdots & F^{qT} C^T \end{bmatrix} \begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix}   (2.85)
        = \sum_{k=0}^{q} F^{kT} C^T C F^k .   (2.86)

The proof ends by letting q \to \infty.
Now we can link this formulation to the formulation used by Hahn. Suppose that we have as many initial conditions as states and that these initial conditions are orthonormal. The advantage of this orthonormality is that we do not need to compute the inverse of X_0 X_0^T, since this is by definition equal to the identity matrix. This knowledge is exploited in the formulation used by Hahn. By substitution of X_0 X_0^T = I into Equations (2.81) and (2.80) we recognize the formulation by Hahn presented in Equation (2.61),

    W_o = X_0 Y_N^T Y_N X_0^T .   (2.87)

We only need to convert the matrix notation into a summation representation. The full proof that the formulation by Hahn fits within the new formulation can be found in Appendix A.2. The price we have to pay in this new definition is that we need to compute the pseudo-inverse X_0^\dagger, but in return we may plug in any initial condition we like, which obviously is less restrictive. The new constraint is that the conditioning of X_0 X_0^T should allow for a numerically stable inversion, which is less restrictive than enforcing orthogonality. Furthermore, we can add extra initial conditions one by one instead of a whole orthogonal set at once. The different amplitudes and the number of initial conditions are directly accounted for.
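A minimal sketch of this construction (Python/NumPy/SciPy; the random stable test system and all names are illustrative assumptions) is given below: free responses from six arbitrary, non-orthogonal initial conditions are stacked as in Equation (2.78), and the Gramian follows from Equations (2.80)-(2.81). For a linear system it matches the Lyapunov solution.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(2)
n, ny, q = 4, 2, 300
F = 0.85 * np.linalg.qr(rng.standard_normal((n, n)))[0]   # stable test system
C = rng.standard_normal((ny, n))

X0 = rng.standard_normal((n, 6))        # six arbitrary initial perturbations, Eq. (2.79)
blocks, x = [], X0.copy()
for _ in range(q + 1):                  # outputs y_0 .. y_q of all free responses
    blocks.append(C @ x)
    x = F @ x
YN = np.vstack(blocks)                  # stacked output data matrix, Eq. (2.78)

X0p = X0.T @ np.linalg.inv(X0 @ X0.T)   # right inverse of X0, Eq. (2.81)
Wo = X0p.T @ (YN.T @ YN) @ X0p          # Eq. (2.80)
print(np.allclose(Wo, solve_discrete_lyapunov(F.T, C.T @ C), atol=1e-8))  # True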
In a similar way we can derive the discrete time perturbed data based controllability Gramian. Let us define X_N as the state response data matrix

    X_N = [X_N^1\ \ X_N^2\ \ \cdots\ \ X_N^p] ,   (2.88)

where

    X_N^r = [x_1^r\ \ x_2^r\ \ \cdots\ \ x_q^r] ,   (2.89)

and x_k^r is the value of the state at t = kh of the impulse response to u_0^r. Define U_0 as the matrix with all impulse values

    U_0 = [U_0^1\ \ U_0^2\ \ \cdots\ \ U_0^p] ,   (2.90)

where

    U_0^r = \begin{bmatrix} u_0^r & 0 & \cdots & 0 \\ 0 & u_0^r & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & u_0^r \end{bmatrix} ,   (2.91)

such that

    X_N^r = \Gamma_c U_0^r ,   (2.92)

with the discrete time controllability matrix

    \Gamma_c = [G\ \ FG\ \ \cdots\ \ F^q G] .   (2.93)
Discrete time perturbed data based controllability Gramian
Let X_N and U_0 be given as described above. The data based discrete time controllability Gramian, as defined in Equation (2.32) for the discrete time linear system defined in Equation (2.30), is

    W_c = X_N U_0^\dagger (U_0^\dagger)^T X_N^T ,   (2.94)

where U_0^\dagger is the right inverse of U_0,

    U_0^\dagger = U_0^T (U_0 U_0^T)^{-1} .   (2.95)

Proof:
The state response matrix X_N can be written as

    X_N = \Gamma_c U_0 .   (2.96)

Substitution yields

    W_c = \Gamma_c U_0 U_0^\dagger (U_0^\dagger)^T U_0^T \Gamma_c^T = \Gamma_c \Gamma_c^T   (2.97)
        = \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix} \begin{bmatrix} G^T \\ G^T F^T \\ \vdots \\ G^T F^{qT} \end{bmatrix}   (2.98)
        = \sum_{k=0}^{q} F^k G G^T F^{kT} .   (2.99)

The proof ends by letting q \to \infty.
Figure 2.1: Left: favorable steady-state with maximum perturbation radius, where ζ represents either states in R^{n_x} or inputs in R^{n_u}. Right: steady-state near a constraint results in a smaller admissible radius of feasible perturbations.
Again we can link this formulation to the formulation adopted by Hahn. Suppose we have as many impulse responses as inputs and these impulses happen to be orthonormal, which is required within Hahn's framework. In that case we know that by definition U_0 U_0^T = I. By substitution of U_0 U_0^T = I in Equations (2.95) and (2.94) we recognize the formulation by Hahn presented in Equation (2.59),

    W_c = X_N U_0 U_0^T X_N^T = X_N X_N^T .   (2.100)

We only need to convert the matrix notation to a summation representation. Most of the pros and cons that apply to the empirical observability Gramian apply to the empirical controllability Gramian as well. So, in the new formulation we can add one input perturbation at a time, in an arbitrary direction and with arbitrary magnitude, as long as the conditioning of U_0 U_0^T allows for a numerically stable computation of the right inverse. This is a much more flexible formulation and enables easy computation.
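The controllability counterpart can be sketched in the same way (Python/NumPy/SciPy; the test system and the five impulse directions are illustrative assumptions): impulse responses are collected as in Equations (2.88)-(2.91), and the Gramian follows from Equations (2.94)-(2.95).

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(3)
n, p, q = 4, 2, 200
F = 0.85 * np.linalg.qr(rng.standard_normal((n, n)))[0]
G = rng.standard_normal((n, p))

dirs = rng.standard_normal((p, 5))                 # five arbitrary impulse directions
XN, U0 = [], []
for r in range(dirs.shape[1]):
    u0, Xr = dirs[:, r], []
    x = G @ u0                                     # state after the impulse in direction r
    for _ in range(q):
        Xr.append(x)
        x = F @ x
    XN.append(np.column_stack(Xr))                 # X_N^r, Eq. (2.89)
    U0.append(np.kron(np.eye(q), u0[:, None]))     # block diagonal U_0^r, Eq. (2.91)
XN, U0 = np.column_stack(XN), np.column_stack(U0)  # Eqs. (2.88) and (2.90)

U0p = U0.T @ np.linalg.inv(U0 @ U0.T)              # right inverse, Eq. (2.95)
Wc = XN @ U0p @ U0p.T @ XN.T                       # Eq. (2.94)
print(np.allclose(Wc, solve_discrete_lyapunov(F, G @ G.T), atol=1e-8))  # True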
Note the resemblance between Equation (2.100) and the snapshot matrix used for proper orthogonal decomposition in Equation (2.75). For special choices of signals the snapshot matrix times its transpose is identical to the controllability Gramian.
So the empirical Gramians as defined by Lall (1999) and adopted by Hahn et al. (2000) are a special case of the Gramians presented in this section. If the perturbations are chosen as orthogonal sets with possibly different amplitudes, the two methods coincide. However, since there is no fundamental motivation for this choice, other than numerical reasons, the method presented in this section leaves more freedom for the user to choose perturbations. The only condition on the perturbations is that they should span the whole column space.
A practical disadvantage of the orthogonal perturbations as proposed by Lall (1999) is related to constraints on perturbations. These constraints come naturally with the model, such as positiveness of variables and (molar) fractions that should remain between zero and one. This is illustrated in Figure 2.1, where in the left picture the steady-state is chosen such that a large area of the admissible perturbations can be covered by orthogonal sets of perturbations. The right of the same figure illustrates that a constraint near the steady-state restricts the radius of admissible perturbations. In case of the method presented in this section the perturbations can be chosen freely as long as they satisfy the constraints and span the whole space. Note that the illustration holds for input perturbations as well as for state perturbations.

It is questionable whether the relevant nonlinear dynamics are revealed with this perturbation approach. This observation inspired the development of a method that enables computation of empirical Gramians from a single simulation of a relevant trajectory. It is closely related to the discrete time perturbed data based Gramians introduced in this section.
Generalized data based Gramians

The data based Gramians in the previous section are constructed from impulse response data and data from perturbation of steady-state conditions. In this section we will generalize this to a much wider class of signals. We will present how to use data from one simulation of a trajectory for the construction of both the controllability and the observability Gramian. We will demonstrate how the approach is applied to an asymptotically stable linear model which is assumed to be in steady-state at the beginning of the trajectory.
Let us define the data matrix X_N as the snapshot matrix

    X_N = [x_1\ \ x_2\ \ \cdots\ \ x_N] ,   (2.101)

where x_k is the value of the state at t = kh of the response to the input sequence u(t). The covariance matrix M = \int_G \int_0^{t_N} (x(t) - x^*)(x(t) - x^*)^T \, dt \, dG as used by Löffler and Marquardt (1991) can be approximated by X_N X_N^T. The covariance matrix does not equal the observability or controllability Gramian. Only if G is chosen in the special way (orthogonal impulse responses) as defined in Lall (1999) will the covariance matrix approximate the controllability Gramian. X_N can be written as

    X_N = \Gamma_c^N U_N^N ,   (2.102)

with \Gamma_c^N the discrete time controllability matrix

    \Gamma_c^N = [G\ \ FG\ \ \cdots\ \ F^N G] ,   (2.103)
and the input matrix U_N^N defined as

    U_N^N = \begin{bmatrix} u_0 & u_1 & \cdots & u_N \\ 0 & u_0 & \cdots & u_{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_0 \end{bmatrix} .   (2.104)

If the system is stable, \lim_{q \to \infty} F^q = 0. Therefore we can truncate the controllability matrix and the input matrix and approximate X_N,

    X_N \approx \Gamma_c U_N ,   (2.105)

with \Gamma_c the truncated controllability matrix as in Equation (2.93) and the truncated input matrix

    U_N = \begin{bmatrix} u_0 & u_1 & \cdots & u_q & \cdots & u_{N-1} \\ 0 & u_0 & \cdots & u_{q-1} & \cdots & u_{N-2} \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & u_0 & \cdots & u_{N-1-q} \end{bmatrix} .   (2.106)
Generalized discrete time data based controllability Gramian
Let X_N and U_N be given as described above. The data based discrete time controllability Gramian for the discrete time linear system (2.30) is defined by

    W_c = X_N U_N^\dagger (U_N^\dagger)^T X_N^T ,   (2.107)

where U_N^\dagger is the right inverse of U_N,

    U_N^\dagger = U_N^T (U_N U_N^T)^{-1} .   (2.108)

Proof:
The state response matrix X_N can be written as

    X_N = \Gamma_c U_N .   (2.109)

Substitution yields

    W_c = \Gamma_c U_N U_N^\dagger (U_N^\dagger)^T U_N^T \Gamma_c^T = \Gamma_c \Gamma_c^T   (2.110)
        = \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix} \begin{bmatrix} G^T \\ G^T F^T \\ \vdots \\ G^T F^{qT} \end{bmatrix}   (2.111)
        = \sum_{k=0}^{q} F^k G G^T F^{kT} ,   (2.112)

which completes the proof.
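A minimal numeric sketch of this result (Python/NumPy/SciPy; the single-input test system, signal lengths and all names are illustrative assumptions) builds the truncated Toeplitz input matrix of Equation (2.106) from one white noise input record and recovers the controllability Gramian via Equations (2.107)-(2.108).

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(4)
n, N, q = 4, 4000, 120
F = 0.8 * np.linalg.qr(rng.standard_normal((n, n)))[0]
G = rng.standard_normal((n, 1))                  # single input for brevity

u = rng.standard_normal(N)                       # one exciting input trajectory
x, cols = np.zeros(n), []
for k in range(N):
    x = F @ x + G[:, 0] * u[k]
    cols.append(x)
X = np.column_stack(cols)                        # states x_1 .. x_N, Eq. (2.101)

# Truncated Toeplitz input matrix U_N, Eq. (2.106): row j is u delayed by j samples
UN = np.array([np.concatenate([np.zeros(j), u[:N - j]]) for j in range(q + 1)])
UNp = UN.T @ np.linalg.inv(UN @ UN.T)            # right inverse, Eq. (2.108)
Wc = X @ UNp @ UNp.T @ X.T                       # Eq. (2.107)
print(np.allclose(Wc, solve_discrete_lyapunov(F, G @ G.T), atol=1e-6))  # True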
For the observability Gramian we use the Hankel matrix. Let us define the output data matrix Y_N as

    Y_N = [y_1\ \ y_2\ \ \cdots\ \ y_N] .   (2.113)

With the argument that \lim_{q \to \infty} F^q = 0, Y_N can be approximated by

    Y_N \approx \Gamma_{co} U_N ,   (2.114)

with the truncated Markov parameters \Gamma_{co} defined as

    \Gamma_{co} = [CG\ \ CFG\ \ \cdots\ \ CF^{2q} G] ,   (2.115)

and U_N the truncated input matrix. The Markov parameters can be determined by

    \Gamma_{co} = Y_N U_N^\dagger ,   (2.116)

with U_N^\dagger the right inverse of U_N as defined in Equation (2.108). The Hankel matrix is defined as

    H = \Gamma_o \Gamma_c = \begin{bmatrix} CG & CFG & \cdots & CF^q G \\ CFG & CF^2 G & \cdots & CF^{q+1} G \\ \vdots & \vdots & & \vdots \\ CF^q G & CF^{q+1} G & \cdots & CF^{2q} G \end{bmatrix} ,   (2.117)

which can be filled with the Markov parameters.

Generalized discrete time data based observability Gramian
Let H and \Gamma_c be defined as described above. The data based discrete time observability Gramian for the discrete time linear system (2.30) is defined by

    W_o = (\Gamma_c^\dagger)^T H^T H \Gamma_c^\dagger ,   (2.118)

with \Gamma_c^\dagger the right inverse of \Gamma_c,

    \Gamma_c^\dagger = \Gamma_c^T (\Gamma_c \Gamma_c^T)^{-1} = \Gamma_c^T W_c^{-1} .   (2.119)
Proof:
Substitution of H yields

    W_o = (\Gamma_c^\dagger)^T \Gamma_c^T \Gamma_o^T \Gamma_o \Gamma_c \Gamma_c^\dagger = \Gamma_o^T \Gamma_o   (2.120)
        = \begin{bmatrix} C^T & F^T C^T & \cdots & F^{qT} C^T \end{bmatrix} \begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix}   (2.121)
        = \sum_{k=0}^{q} F^{kT} C^T C F^k ,   (2.122)

which completes the proof.
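The sketch below (Python/NumPy/SciPy; single-input single-output test system, all names illustrative) identifies the Markov parameters from one input/output record via Equation (2.116), fills the Hankel matrix of Equation (2.117), and recovers the observability Gramian via Equations (2.118)-(2.119). For brevity the true \Gamma_c is used for the right inverse; in a purely data based setting it would itself be reconstructed as in the controllability proof above.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(5)
n, N, q = 3, 6000, 60
F = 0.7 * np.linalg.qr(rng.standard_normal((n, n)))[0]
G = rng.standard_normal((n, 1))
c = rng.standard_normal(n)                        # single output row C

u = rng.standard_normal(N)
x, y = np.zeros(n), np.zeros(N)
for k in range(N):
    x = F @ x + G[:, 0] * u[k]
    y[k] = c @ x                                  # output record, Eq. (2.113)

# Markov parameters CG, CFG, ..., CF^{2q}G from Y_N = Gamma_co U_N, Eq. (2.116)
UN = np.array([np.concatenate([np.zeros(j), u[:N - j]]) for j in range(2 * q + 1)])
markov = y @ UN.T @ np.linalg.inv(UN @ UN.T)

H = np.array([[markov[i + j] for j in range(q + 1)] for i in range(q + 1)])  # Eq. (2.117)
Gc = np.column_stack([np.linalg.matrix_power(F, k) @ G for k in range(q + 1)])
Gcp = Gc.T @ np.linalg.inv(Gc @ Gc.T)             # right inverse, Eq. (2.119)
Wo = Gcp.T @ H.T @ H @ Gcp                        # Eq. (2.118)
print(np.allclose(Wo, solve_discrete_lyapunov(F.T, np.outer(c, c)), atol=1e-6))  # True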
The right inverses used in the computation of the Gramians must exist and should be well conditioned. In case of the right inverse of the input matrix U_N this can be achieved by a proper choice of inputs, whereas the existence of a well conditioned right inverse of \Gamma_c cannot be guaranteed. This issue will be treated in an example in the next section. First we will reveal the underlying mechanism of empirical Gramians. The basic question is what an empirical Gramian looks like if data is gathered in two different operating conditions of a nonlinear system using small perturbations. The use of small perturbations implies that only the local linear behavior of the nonlinear model is excited.
Example of two linear models

We will investigate the mechanisms that occur in the computation of empirical Gramians. Therefore we assume two linear discrete time systems that represent the local dynamics of a nonlinear system in two different operating points,

    \left[ \begin{array}{c|c} F_1 & G_1 \\ \hline C_1 & 0 \end{array} \right] , \qquad \left[ \begin{array}{c|c} F_2 & G_2 \\ \hline C_2 & 0 \end{array} \right] .
The corresponding controllability Gramians are

    W_{c1} = \Gamma_{c1} \Gamma_{c1}^T = \begin{bmatrix} G_1 & F_1 G_1 & \cdots & F_1^q G_1 \end{bmatrix} \begin{bmatrix} G_1^T \\ G_1^T F_1^T \\ \vdots \\ G_1^T F_1^{qT} \end{bmatrix} ,

and

    W_{c2} = \Gamma_{c2} \Gamma_{c2}^T = \begin{bmatrix} G_2 & F_2 G_2 & \cdots & F_2^q G_2 \end{bmatrix} \begin{bmatrix} G_2^T \\ G_2^T F_2^T \\ \vdots \\ G_2^T F_2^{qT} \end{bmatrix} .

We can generate data in these two points, which we collect in data matrices X_{N1} and X_{N2}, respectively,

    X_{N1} = \Gamma_{c1} U_{N1} , \qquad X_{N2} = \Gamma_{c2} U_{N2} .
We already proved that the local controllability matrix can be reconstructed from this data, and therefore the controllability Gramian as well. The two local controllability matrices are

    \Gamma_{c1} = X_{N1} U_{N1}^T (U_{N1} U_{N1}^T)^{-1} ,
    \Gamma_{c2} = X_{N2} U_{N2}^T (U_{N2} U_{N2}^T)^{-1} .

Substitution of X_{N1} and X_{N2} yields

    \Gamma_{c1} = \Gamma_{c1} U_{N1} U_{N1}^T (U_{N1} U_{N1}^T)^{-1} ,
    \Gamma_{c2} = \Gamma_{c2} U_{N2} U_{N2}^T (U_{N2} U_{N2}^T)^{-1} .

Suppose we compute the average controllability matrix by combining the two data sets,

    U_N = [U_{N1}\ \ U_{N2}] , \qquad X_N = [X_{N1}\ \ X_{N2}] = [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} U_{N1} & 0 \\ 0 & U_{N2} \end{bmatrix} = \Gamma_c U_N .

The solution for this average controllability matrix \Gamma_c is

    \Gamma_c = X_N U_N^T (U_N U_N^T)^{-1} .
Substitution of X_N and U_N yields

    \Gamma_c = [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} U_{N1} & 0 \\ 0 & U_{N2} \end{bmatrix} \begin{bmatrix} U_{N1}^T \\ U_{N2}^T \end{bmatrix} \left( U_{N1} U_{N1}^T + U_{N2} U_{N2}^T \right)^{-1}
             = [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} U_{N1} U_{N1}^T \\ U_{N2} U_{N2}^T \end{bmatrix} \left( U_{N1} U_{N1}^T + U_{N2} U_{N2}^T \right)^{-1} .
Suppose U_{N2} = \gamma U_{N1}. This implies that we use the same input signals for data generation in the second operating point as were used in the first operating point, but scaled with the constant \gamma. Substitution yields

    \Gamma_c = [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} U_{N1} U_{N1}^T \\ \gamma^2 U_{N1} U_{N1}^T \end{bmatrix} \left( (1 + \gamma^2) U_{N1} U_{N1}^T \right)^{-1}
             = \frac{1}{1 + \gamma^2} [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} I \\ \gamma^2 I \end{bmatrix}
             = \frac{1}{1 + \gamma^2} \Gamma_{c1} + \frac{\gamma^2}{1 + \gamma^2} \Gamma_{c2} .

If \gamma = 1, so if we use exactly the same input signals, we see that the interpolated controllability matrix coincides with the equally weighted interpolation of the two local controllability matrices,

    \Gamma_c = \frac{1}{2} (\Gamma_{c1} + \Gamma_{c2}) .
Another special case is when U_{N1} U_{N1}^T = \alpha^2 I and U_{N2} U_{N2}^T = \beta^2 I. This is the case if we apply an energy level of \alpha^2 to the first operating point and \beta^2 to the second operating point. The perturbation orientation and amplitude in both operating points are completely free as long as they can be described by orthogonal sets of perturbations. Substitution yields

    \Gamma_c = [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} \alpha^2 I \\ \beta^2 I \end{bmatrix} \left( \alpha^2 I + \beta^2 I \right)^{-1}
             = \frac{1}{\alpha^2 + \beta^2} [\Gamma_{c1}\ \ \Gamma_{c2}] \begin{bmatrix} \alpha^2 I \\ \beta^2 I \end{bmatrix}
             = \frac{\alpha^2}{\alpha^2 + \beta^2} \Gamma_{c1} + \frac{\beta^2}{\alpha^2 + \beta^2} \Gamma_{c2} .
So by the allocation of energy in the test signals over the operating points we emphasize different operating points. If we compute the empirical Gramian based on data generated in two operating points using small perturbations, this is similar to averaging the two controllability matrices. This is not equal to averaging the two controllability Gramians. The proof is simple:

    W_c = \frac{1}{2} W_{c1} + \frac{1}{2} W_{c2} = \frac{1}{2} \Gamma_{c1} \Gamma_{c1}^T + \frac{1}{2} \Gamma_{c2} \Gamma_{c2}^T ,   (2.123)

    \Gamma_c \Gamma_c^T = \frac{1}{4} \Gamma_{c1} \Gamma_{c1}^T + \frac{1}{4} \Gamma_{c1} \Gamma_{c2}^T + \frac{1}{4} \Gamma_{c2} \Gamma_{c1}^T + \frac{1}{4} \Gamma_{c2} \Gamma_{c2}^T .   (2.124)

One could argue that this is a flaw of the empirical Gramian. This reveals the true underlying mechanism of the empirical Gramians.
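This difference is easy to check numerically. The sketch below (Python/NumPy; two random matrices stand in for \Gamma_{c1} and \Gamma_{c2}, an illustrative assumption) confirms that the Gramian built from the averaged controllability matrix contains the cross terms of Equation (2.124) and therefore differs from the average of the Gramians in Equation (2.123).

import numpy as np

rng = np.random.default_rng(6)
Gc1 = rng.standard_normal((3, 8))                   # stand-in for Gamma_c1
Gc2 = rng.standard_normal((3, 8))                   # stand-in for Gamma_c2

W_avg = 0.5 * Gc1 @ Gc1.T + 0.5 * Gc2 @ Gc2.T       # average of the Gramians, Eq. (2.123)
Gc = 0.5 * (Gc1 + Gc2)                              # averaged controllability matrix
W_emp = Gc @ Gc.T                                   # what the empirical Gramian yields
cross = 0.25 * (Gc1 @ Gc2.T + Gc2 @ Gc1.T)          # the cross terms of Eq. (2.124)
print(np.allclose(W_emp, 0.25 * Gc1 @ Gc1.T + 0.25 * Gc2 @ Gc2.T + cross))  # True
print(np.allclose(W_emp, W_avg))                    # False in general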
Interpolation of local linear Gramians

Motivated by the conclusion that in the end empirical Gramians attempt to reconstruct Gramians of local linear dynamics, a simple alternative to the data based Gramians is interpolation of local Gramians. Löffler and Marquardt (1991) approximated the covariance matrix as defined in Equation (2.53) by using multiple linear models. The covariance matrix was computed directly by means of a Monte Carlo method and using the linear approximation.
Interpolation of local Gramians is done by deriving a number of linear models in a relevant operating range, from which the local Gramians can be computed by solving the discrete time Lyapunov equations (2.33). This yields

    W_c = \frac{1}{N} \sum_{i=1}^{N} W_{c,i} ,   (2.125)

and

    W_o = \frac{1}{N} \sum_{i=1}^{N} W_{o,i} ,   (2.126)

where W_{c,i} and W_{o,i} are the local controllability and observability Gramian, respectively. The local Gramians can be weighted with \gamma to emphasize specific operating points. This yields

    W_c = \frac{1}{\sum_{i=1}^{N} \gamma_{c,i}} \sum_{i=1}^{N} \gamma_{c,i} W_{c,i} ,   (2.127)

and

    W_o = \frac{1}{\sum_{i=1}^{N} \gamma_{o,i}} \sum_{i=1}^{N} \gamma_{o,i} W_{o,i} .   (2.128)

The result of this averaging approximates the perturbed data based Gramians in case the effect of finite data is negligible and sufficiently small perturbations are used.
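A minimal sketch of this interpolation (Python/NumPy/SciPy; the five local models are random stable stand-ins for linearizations along a trajectory, an illustrative assumption) solves the discrete time Lyapunov equations per operating point and averages them, following Equations (2.125)-(2.126).

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(7)
n, p, ny = 4, 2, 2
models = []
for _ in range(5):                                 # five operating points
    F = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]
    models.append((F, rng.standard_normal((n, p)), rng.standard_normal((ny, n))))

# Average the local Gramians, Eqs. (2.125) and (2.126)
Wc = sum(solve_discrete_lyapunov(F, G @ G.T) for F, G, C in models) / len(models)
Wo = sum(solve_discrete_lyapunov(F.T, C.T @ C) for F, G, C in models) / len(models)
print(Wc.shape, Wo.shape)                          # the averaged Gramian pair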
2.5  Evaluation on a process model

In this chapter different ways to compute Gramians were presented. In this section we will compare the different approaches by applying them to a simple model. In the first test we want to exclude nonlinear effects, to enable a clear assessment of the different data based empirical reduction techniques, by using a linearization of a nonlinear model. In the next tests we return to the nonlinear model; we will then compare different Gramian based reduction approaches, where we focus on differences between data based Gramians and Gramians derived from linearizations of the nonlinear model.
Figure 2.2: Schematic of the process model: reactor in series with a distillation column with recycle to the reactor.
In this chapter we propose to use a simple model of a plant. A schematic of that plant is presented in Figure 2.2. The process represents a general class of chemical processes consisting of a reactor with a separation unit and a recycle stream. All levels are controlled, and the top and bottom quality are measured and controlled by reflux ratio and boilup rate, respectively. The temperature in the reactor is assumed to be constant by tight temperature control. For detailed model information the reader is referred to Chapter 3 for a general description and Chapter 4 for details on modelling assumptions.

The model consists of 24 differential equations¹ and we are interested in the top purity and bottom impurity, which are controlled by reflux ratio and boilup rate. So by model reduction we want to find a low order model that still properly represents the input to output dynamics. Note that the effect of the reactor is taken into account since the reactor is connected to the column by the fresh feed and the recycle stream.
¹ gproms code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.
Figure 2.3: Schematic overview of projections based on different Gramians, with the corresponding equation numbers between brackets. The discrete time system x_{k+1} = F x_k + G u_k, y_k = C x_k + D u_k (2.32) yields the Gramian pairs (W_c^d, W_o^d) (2.32), (W_c^q, W_o^q) (2.35), (W_c^e, W_o^e) (2.59, 2.61) and (W_c^g, W_o^g) (2.107, 2.118), each resulting in left and right projectors L^d, R^d, L^q, R^q, L^e, R^e and L^g, R^g.
Results on the linearized model

For the linear model the bottom impurity and top purity are used as outputs, y_1 and y_2, respectively, and reflux ratio and boilup rate are used as inputs, u_1 and u_2, respectively. In this thesis we restrict ourselves to piecewise constant input signals, and therefore it is sensible to use discrete time Gramians instead of continuous time Gramians. In this way the choice for the limited class of possible input signals enters the Gramians and consequently the projections. Moore (1981) elaborates in his paper on the effect of sample time on discrete time Gramians and the relation to the continuous time Gramian.

In Figure 2.3 all different projections that will be considered in the first test are presented in a schematic way. W_c and W_o are controllability and observability Gramians with superscripts d for discrete time (Equation 2.32), q for finite discrete time (Equation 2.35), e for empirical (Equations 2.59 and 2.61) and g for generalized (Equations 2.107 and 2.118), respectively. L and R are the left and right projectors derived from the Gramians and result in the reduced-order model by truncation.
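For reference, the sketch below shows one standard way to derive the left and right projectors L and R from a Gramian pair, namely square-root balancing (Python/NumPy/SciPy, with an illustrative test system). The thesis does not specify its exact implementation, so this construction is an assumption, not the author's code.

import numpy as np
from scipy.linalg import cholesky, svd, solve_discrete_lyapunov

rng = np.random.default_rng(8)
n, k = 6, 2                                       # full and reduced order
F = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]
G = rng.standard_normal((n, 2))
C = rng.standard_normal((2, n))

Wc = solve_discrete_lyapunov(F, G @ G.T)
Wo = solve_discrete_lyapunov(F.T, C.T @ C)

Lc = cholesky(Wc, lower=True)                     # Wc = Lc Lc^T
Lo = cholesky(Wo, lower=True)                     # Wo = Lo Lo^T
U, s, Vt = svd(Lo.T @ Lc)                         # s holds the Hankel singular values
R = Lc @ Vt[:k].T / np.sqrt(s[:k])                # right projector (n x k)
L = (Lo @ U[:, :k] / np.sqrt(s[:k])).T            # left projector (k x n)
Fr, Gr, Cr = L @ F @ R, L @ G, C @ R              # reduced model by truncation
print(np.allclose(L @ R, np.eye(k)))              # oblique projection: L R = I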
In Figure 2.4 the norm of the error, defined as

    \| G(s) - \hat G(s) \|_2 ,   (2.129)

of the reduced-order models based on the four different Gramians (exact, finite time, empirical and generalized, respectively) is plotted against the reduced model order. It is hard to discriminate between the reduced models for most orders since the approximation errors are of the same order of magnitude. Theoretically the results should coincide, except for the exact discrete time solution.
Figure 2.4: H2 norm of the error of the four different Gramian based reduced models (exact, finite, empirical, generalized) against model order. Top: direct reduction on the original system. Bottom: two step reduction method.
The differences can be explained by numerical errors in the different computations. In case of the generalized Gramian, the inverse of the controllability Gramian is used (Equation (2.119)) to compute the observability Gramian. This computation introduces numerical errors when the controllability Gramian is close to singular. Therefore a two-step approach is proposed, which is explained next.
In the first step the model is projected based on a singular value decomposition of the controllability Gramian,

    W_c = U_c \Sigma_c U_c^T = [U_{c1}\ \ U_{c2}] \begin{bmatrix} \Sigma_{c1} & 0 \\ 0 & \Sigma_{c2} \end{bmatrix} \begin{bmatrix} U_{c1}^T \\ U_{c2}^T \end{bmatrix} .   (2.130)

We can partition U_c such that the conditioning of \Sigma_{c1} is acceptable from a numerical point of view to compute its inverse. The result of projection and truncation based on U_{c1} is a reduced-order model with only controllable states. This reduced model can be reduced in a second step by a numerically proper
balanced reduction. In this example the first reduction resulted in a reduced
model of order eight. Therefore in the bottom of Figure 2.4 only the errors of
the balanced reduced models up to the order of eight are present. This two-step
approach does not work in general and it is possible to come up with an example
to prove this. In practice it provides a suitable reduction as we will see.
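A compact sketch of the two-step idea is given below (Python/NumPy/SciPy; the illustrative system has a deliberately unreachable part, so that W_c is singular). The first step truncates to the well conditioned block of Equation (2.130), after which the projected model can be balanced safely.

import numpy as np
from scipy.linalg import svd, solve_discrete_lyapunov, block_diag

rng = np.random.default_rng(9)
Fa = 0.9 * np.linalg.qr(rng.standard_normal((4, 4)))[0]
Fb = 0.8 * np.linalg.qr(rng.standard_normal((4, 4)))[0]
F = block_diag(Fa, Fb)                              # second block is unreachable
G = np.vstack([rng.standard_normal((4, 2)), np.zeros((4, 2))])
C = rng.standard_normal((2, 8))

Wc = solve_discrete_lyapunov(F, G @ G.T)            # singular: rank 4
Uc, sc, _ = svd(Wc)
k1 = int(np.sum(sc > 1e-10 * sc[0]))                # well conditioned block, Eq. (2.130)
Uc1 = Uc[:, :k1]

F1, G1, C1 = Uc1.T @ F @ Uc1, Uc1.T @ G, C @ Uc1    # step 1: keep controllable part
Wo1 = solve_discrete_lyapunov(F1.T, C1.T @ C1)      # step 2: balance (F1, G1, C1)
print(k1, np.linalg.cond(Uc1.T @ Wc @ Uc1))         # Gramian is now invertible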
The dashed line in Figure 2.4 represents a user defined error level of 10^{-3}. We can see in the top of this figure that we need order three to meet that bound, except for the reduced-order model based on the generalized empirical Gramian, which needs order five. In case of the two step approach we see at the bottom of Figure 2.4 the same result, except that for the generalized empirical Gramian we need order four instead of five. This is the effect of the better numerical properties of the computations.
A different way to assess the quality of the reduced models is to compare step responses. Step responses of these models are depicted in Figures 2.5 and 2.6. Note that the steady state values of the nonlinear model were 0.01 for the bottom impurity and 0.90 for the top purity, and that the reduced linear models are only valid close to this operating point. The two successive projections do not affect the reduction up to the fifth order for all Gramians except the generalized Gramian. The reduced models derived from the generalized empirical Gramians improve by applying the two successive projections: the fourth order model in Figure 2.6 is better than the fifth order model in Figure 2.5. This conclusion can also be drawn from Figure 2.4, where the error of the fifth order model in the top of the figure is larger than the error of the fourth order model computed by the two-step approach.
The results illustrate that both data based methods to derive Gramians
enable a balanced reduction that approximates the balanced reduction based on
the Gramian that is the solution of the Lyapunov equation. The computations
that are involved in case of the generalized empirical observability Gramian
are sensitive to the conditioning of the controllability Gramian and therefore a
two-step reduction approach was introduced.
Results on the nonlinear model

For nonlinear models we know that the controllability and observability depend on the point of operation. Therefore we need to make sure that only data is produced that represents the operating envelope of interest. This is not trivial, but we can think of different possible scenarios. In Figure 2.7 four different approaches to cover the relevant operating envelope are presented.

In the situation indicated with the capital roman number I it is illustrated that we compute some average operating point from which the system is perturbed such that both steady-states A and B are enclosed by these unit ball perturbations.
Figure 2.5: Step responses of the full order linear model (org-24) and reduced models based on different Gramians (redq-3, rede-3, redg-5). Boilup rate and reflux rate are inputs u(1) and u(2), respectively, and bottom impurity and top purity are outputs y(1) and y(2), respectively.
Figure 2.6: Step responses of the full order linear model (org-24) and reduced models based on two successive projections (redq-3, rede-3, redg-4). Boilup rate and reflux rate are inputs u(1) and u(2), respectively, and bottom impurity and top purity are outputs y(1) and y(2), respectively.
Figure 2.7: Four different approaches to cover the relevant state space operating envelope between the two stationary operating points A and B (0 = f(x, u)) in R^{n_x}. The solid line represents a trajectory that satisfies ẋ = f(x, u).
Besides a number of practical problems that can occur, such as the presence of constraints that restrict the radius of admissible perturbations, it is very questionable whether this approach represents the relevant operating envelope. By increasing the operating envelope the validity requirements for the model increase, which reduces the potential for model reduction. Therefore the operating envelope should be large enough to capture all relevant dynamics, but also as small as possible to exclude all non-relevant dynamics.
In the situation indicated by the capital roman number II it is illustrated that we restrict ourselves to small perturbations in the two steady-states A and B. This is implementable since we can decrease the perturbations until all constraint violations have disappeared. Note, however, that for small perturbations all data based Gramians converge to the two stationary Gramians that can also be computed by deriving a linear model and solving two Lyapunov equations. The empirical Gramian is derived from the united data sets obtained in the two steady-states A and B. Note that in case we use sufficiently small perturbations and an equal number of perturbations in both steady-states, computation of the empirical Gramians approximates the average Gramian of the two local Gramians. A drawback of this approach can be that the area covered by the perturbations does not cover the whole transition between steady-states A and B. This naturally brings us to the situation that is depicted in Figure 2.7 with the capital roman number III.
Figure 2.8: Input perturbations of reflux ratio and boilup rate to generate impulse response data for construction of the empirical controllability Gramian as described in situation II in Figure 2.7. Each marker represents one simulation. The + markers are centered around the first operating point A and the × markers are centered around the second operating point B.
Figure 2.9: Input trajectories of boilup rate (top) and reflux ratio (bottom) used for evaluation of the projected models.
The situation indicated with the capital roman number III exactly covers the whole relevant area defined by the transition, by performing small perturbations in different points along the transition. This situation coincides with the situation indicated with capital roman number IV if the perturbations are sufficiently small and the local dynamics can be represented by a linear model. The result can then be interpreted as the average of the different local Gramians along the trajectory.

We will now compare reductions applied to the nonlinear example. First we present the result for situation II, where the operating envelope is covered by perturbations in the two different steady-state operating points. The input perturbations are depicted in Figure 2.8; the circle in the upper right of that figure is centered around operating point A, whereas the circle in the lower left corner is centered around B.
In an attempt to cover the relevant nonlinear dynamics we perturbed the inputs with a magnitude such that an overlap in the state space was created. Still, this is a somewhat arbitrary choice of input perturbations. By means of simulation, data was generated that was used to derive the empirical controllability Gramian. In a similar manner perturbations were chosen to compute the different observability Gramians, but since these perturbations are of too high a dimension they cannot be depicted in a figure like the input perturbations. The state perturbations were chosen relatively small to guarantee feasibility². This inherently resulted in responses that represented the local, and thus linear, dynamics.
This data was used to compute empirical Gramians, as defined in Equations (2.59) and (2.61), in two ways. In the first, all data was collected and the empirical controllability (W_c^e) and observability (W_o^e) Gramians were computed in one go. The second way was to use the data in each operating point to compute the local empirical Gramians, which were then averaged. The corresponding Gramians are denoted as W_c^{em} and W_o^{em}, respectively.
In situation IV of Figure 2.7 it was suggested to use local linearized dynamics along a trajectory to represent the relevant dynamics. This is the third approach added to the test. So, along the simulation of the input trajectory, at equally spaced times a linear model was derived, from which discrete time controllability and observability Gramians were computed by solving two Lyapunov equations. These were averaged and resulted in an average controllability and observability Gramian, as defined in Equations (2.125) and (2.126), here denoted as W_c^m and W_o^m, respectively.
² Since the bottom impurity was 0.01, perturbations are restricted to 0.01 to guarantee positive perturbed impurity.
Figure 2.10: Top: time responses of bottom impurity of reduced models based on different linear Gramians: averaged linearized (W^m), empirical (W^e) and averaged empirical (W^{em}) Gramian, respectively. Bottom: error between reduced-order models and original response for different Gramians plotted against reduced model order.
Figure 2.11: Top: time responses of top purity of reduced models based on different linear Gramians: averaged linearized (W^m), empirical (W^e) and averaged empirical (W^{em}) Gramian, respectively. Bottom: error between reduced-order models and original response for different Gramians plotted against reduced model order.
The quality of the projections derived from the different Gramians is evaluated by simulation of the model going from one operating point to the other. This input trajectory is depicted in Figure 2.9, where the inputs are ramped in eight hours from the values of one operating point to the other, followed by 48 hours of constant inputs. In a similar way the model was ramped back to the first operating point.

The results of the balanced reductions by truncation, based on the three different methods to compute nonlinear Gramians, are presented in Figures 2.10 and 2.11. Simulations with the reduced models that resulted in an error smaller than the dotted line in the bottom of Figures 2.10 and 2.11 are plotted in the top of the same figures. The error is defined as the square root of the sum of the quadratic error over all samples of the output. For every order we can compute this error for both outputs, providing insight into the quality of the different reductions.
From the results it can be concluded that the approach of representing the relevant dynamics by local linear dynamics along the trajectory is more successful than the two alternative approaches discussed here. Although the results of the two data based approaches are not exactly the same, it is hard to discriminate between the two. The cross terms explained in the previous section, see Equations (2.123) and (2.124), do affect the reduction, but whether they result in better reductions is questionable. For the higher order models the averaging approach seems better, suggesting even a negative effect of computing the empirical Gramians from all data at once.

Löffler and Marquardt (1991) sampled the state space to construct a covariance matrix using linear models that represented the relevant dynamics in different operating points. Instead of the Monte Carlo approach they used, it is very well possible to sample a trajectory in their approach, as was done in this section.
Finally we compare the effect of averaging local Gramians, i.e. Gramians derived from linearized models by solving Lyapunov equations, in more detail. Therefore we compare the responses of the reduced models from the Gramians derived in operating points A and B, the averaged Gramian of A and B, and the Gramian averaged along the trajectory, respectively W_a, W_b, W_{ab} and W_m.

The results are presented in Figures 2.12 and 2.13 in exactly the same way as in Figures 2.10 and 2.11. Note that the absence of a result for a specific order and reduction approach indicates a failure of the simulation. Some simulation failures could be related to a variable that hit its bound. Therefore the bounds on variables that were defined in the original model were relaxed, which resulted in more successful simulations of reduced models. These bounds can be helpful as a debugging tool during the modelling process but are of no use in evaluating the reduced models.
Figure 2.12: Top: time responses of bottom impurity of reduced models based on different linear Gramians: averaged Gramian of A and B (W_{ab}), Gramian in A (W_a), Gramian in B (W_b) and Gramian averaged along the trajectory (W_m). Bottom: error between reduced-order models and original response for different Gramians plotted against reduced model order.
Figure 2.13: Top: time responses of top purity of reduced models based on different linear Gramians: averaged Gramian of A and B (W_{ab}), Gramian in A (W_a), Gramian in B (W_b) and Gramian averaged along the trajectory (W_m). Bottom: error between reduced-order models and original response for different Gramians plotted against reduced model order.
For the reduced models it was allowed to compute negative fractions, which obviously have no physical meaning, but since the only criterion for the reduced models is the error in the predicted output, these values are accepted during the simulation.
From these results it can be concluded that the mechanism of averaging is successful, since the reduced models based on the average of the two local Gramians in the two operating points perform better than the reduced models using only local dynamics in one operating point. Averaging over Gramians derived along the trajectory provides the best results and will therefore be used as the approach in the next part of this thesis. The assessment of this model reduction approach for dynamic optimization will be done in Chapter 5.
2.6  Discussion
In this chapter different balanced model order reduction methods based on empirical Gramians were assessed. It appeared that empirical Gramians are comparable to averages of linear Gramians. To show this, we reformulated the computation of the empirical Gramians into a format that is more generic and insightful than the formulation introduced by Lall (1999). In the literature this interpretation was not available, and therefore the interpretation can be considered a contribution of this chapter.
Inspired by the interpretation of empirical Gramians, a pragmatic approach
was proposed to challenge the empirical Gramian based reduction. This pragmatic approach is based on linearization of the model in different points in the
state space. Via averaging of the Gramians that can be related to the linear
models, an average controllability and observability Gramian can be computed.
This approach provides good results and is less involved than the computation
of empirical Gramians.
Furthermore, we were able to establish a clear relation between proper orthogonal decomposition (e.g. Berkooz et al., 1993; Baker and Christofides, 2000), empirical Gramians (Lall, 1999; Hahn and Edgar, 2002) and the covariance matrix (Pallaske, 1987; Löffler and Marquardt, 1991). The snapshot matrix multiplied with its transpose approximates the discrete time controllability Gramian in case white noise signals are used. This same matrix approximates the integral that defines the covariance matrix. In case other types of signals are used, we can give it a frequency weighted controllability Gramian interpretation. An important difference between proper orthogonal decomposition and Gramian based reduction is that proper orthogonal decomposition strongly depends on the specific choice of coordinates. This implies a strong dependency on state scaling, with only rough guidelines on how to choose this scaling.
Chapter 3
Dynamic optimization
In this chapter the optimization base case will be presented. This case involves a plant with a reactor and a separation section with recycle stream. The plant model has a high level of detail, comparable to current models developed with state of the art commercial modelling tools. We will pose the dynamic optimization problem and motivate some specific implementation details to provide a clear picture of the execution of the optimization problem.
3.1  Base case
From an academic point of view the use of a case study may seem not very elegant, since it does not hold as a mathematical proof but only demonstrates the success or failure of an approach on a specific example. This specific example was carefully selected and represents the complexity of a class of dynamic optimization problems in the process industry. The nonlinear model is described by a set of differential-algebraic equations with over two thousand equations¹, of which seventy five are differential equations. The model size and complexity require serious modelling and simulation tools, but the model is just manageable for doing research with. Therefore results on this example can be transferred to dynamic optimization problems of a whole class of processes.

¹ This includes the identity equations introduced by connecting sub-models.

A case study is only valuable if well documented, which explains a full chapter on the base case optimization problem formulation and implementation. The selected case problem represents characteristics common in the process industry. Plant designs consisting of a reactor and a separation section recycling the main reactant over the reactor are widespread. This can be motivated from an investment point of view, looking at the required capital costs for the production
of an end product with specific quality. The driving forces for the reaction are the concentrations of the reactants. In batch operation with only a reactor this concentration rapidly decreases, slowing down the reaction rate. Batch operation would therefore require long residence times for the desired end-product quality, thus large reactors to meet capacity specifications and consequently high capital costs. Combining a reactor with separation and reactant recycle enables higher concentrations, and thus higher reaction rates in the reactor, because the end-product is obtained from the separation unit. Selectivity can often be improved by the use of a catalyst that minimizes the byproducts, increasing productivity for both setups.
In the introduction we estimated that a simple chemical production plant would easily involve a thousand differential equations and a couple of thousand algebraic equations. In the example process that will be used in the base case we confine ourselves to a first order reaction. This moderates the scale of the model without compromising the plant-wide aspects. Still, the scale of the model is large enough to test model reduction techniques for large scale models.
First we will describe the plant and motivate the formulation of the dynamic
optimization problem. Then the optimization strategy is outlined including the
most characteristic implementation details. Specific implementation choices can
have a decisive effect on the performance of the dynamic optimization algorithm.
Finally the results of the dynamic optimization problem are presented and discussed.
Plant description

The example is realistic in the sense that the unit operations reflect a typical industrial plant, since it involves a reaction step combined with a separation step and recycle with basic control. Therefore the essentials of real plant behavior are captured, including the interaction of the separate unit operations. It is an academic example because the reaction and reaction kinetics are not based on real data and describe the academic A → B reaction. The parameters in the Soave-Redlich-Kwong physical property model used in the distillation column are from the literature (Reid, 1987) and describe the properties of propane and n-butane. The model was programmed with gproms software².
From an economic point of view it is optimal to operate at maximum reactor volume and fixed temperature. The reactor temperature is either determined by selectivity or bounded by a safety argument. Therefore the reactor holdup and temperature are PI-controlled. The holdups in the reboiler and condenser are not crucial for operation, but their capacity can be used for disturbance rejection. Both holdups are therefore P-controlled. The quality in the bottom of the column is controlled by manipulation of the boilup rate and the quality in the top of the column is controlled by manipulation of the reflux ratio.
² gproms code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.
Figure 3.1: Example process with reactor, distillation, recycle, basic control and product storage. Nominal values indicated in the figure: F0 = 0.50 mol/s, T0 = 293 K, z0 = 1.00; T = 323 K, N = 400 mol; F = 1.1125 mol/s, z = 0.50; Q = 5250 J/s; D = 0.6125 mol/s, R = 1.5739 mol/s; Td = 323 K, xd = 0.90; VB = 2.1864 mol/s; B = 0.50 mol/s, xb = 0.01.
Pressure in the column is controlled by the condenser duty with a PI-controller with a fixed set-point. No heat integration was applied, since the temperature in the reactor is too low for heat integration with the reboiler. The product is stored in a tank park from which it can be transferred to a ship, train or tank wagon. In this specific case we assume that the product can be blended, and therefore the tanks are considered part of the production site to be optimized.
Dynamic optimization problem
The dynamic optimization is part of the control hierarchy as shown in Figure 1.1 and receives information from the scheduler on when to produce which product. The typical look-ahead of this schedule is a couple of days or weeks and the typical update rate is daily at most. The objective of the dynamic optimizer under consideration is the realization of this production schedule, provided by the scheduling and planning department, at minimum cost. The typical look-ahead for a dynamic optimization is a couple of hours up to one day. The production schedule fixes quality and quantity in time and those values enter the dynamic optimization problem as constraints on the content of a tank. This formulation leaves maximum freedom
for the optimizer. In this specific case we want to enforce a transition followed by a quasi steady-state, which has advantages from an operational point of view. The objective of the optimization problem for this case is:

"Realize the production schedule at minimal utility costs while satisfying all input and path constraints."
In the dynamic optimization as described here we chose the horizon such that two complete tanks of the schedule are within the horizon. From the schedule the qualities and delivery times are obtained and translated into path constraints for the dynamic optimization. In this case, after eight hours the first tank has to contain a product with an impurity between 0.04 and 0.05, and the next tank has to be ready after another eight hours with an impurity lower than 0.01. The initial condition of the plant in this example is a steady-state but could be an arbitrary non steady-state. In case of an online application we need some state estimation that provides the best estimate of the current state, which is most likely not a steady-state. After solving the dynamic optimization only the first set points are sent to the plant, after which the state estimation and optimization are repeated. This is known as the receding horizon principle.
The utility costs are assumed to be dominated by the energy used in the reboiler. Mixing energy is low and, although cooling energy is significant, its cost can be assumed negligible because cooling water is used. Material costs do not enter the economic objective since we consider a simple reaction without secondary reactions, i.e. all base material A reacts only and completely to B. The inventory is controlled by the base controllers. The boilup duty and reflux ratio are the degrees of freedom in this optimization problem. This formulation prevents the snowball effect that is discussed in many papers, e.g. Wu et al. (2002). Furthermore we assume the plant to be operated in push mode, which implies that the fresh feed is not a degree of freedom but fixed, in this case due to constant upstream production.
The mathematical formulation of the economic optimization problem is as follows

\[
\begin{aligned}
\min_{u \in U} \quad & V = \int_{t_0}^{t_f} L z \, dt \\
\text{s.t.} \quad & \dot{x} = f(x, y, u) , \quad x(t_0) = x_0 \\
& 0 = g(x, y, u) \\
& z = C_x x + C_y y + C_u u \\
& 0 \le h(z, t) ,
\end{aligned}
\tag{3.1}
\]
where x, y and u are the state, algebraic and input variables, respectively. The relevant input variables for the economic objective are selected by the matrix L, which enables scaling of the variables as well. The performance variables are defined by the matrices Cx, Cy and Cu, respectively, which are used to select and scale the relevant performance variables. The path constraints h(z, t) are defined by the inequality equations. Furthermore we chose the so-called Haar basis as control parametrization
\[
u(t) = P^T \phi(t) ,
\tag{3.2}
\]

with

\[
\phi_j(t) = \begin{cases} 1, & j\Delta t \le t < (j+1)\Delta t \\ 0, & \text{otherwise.} \end{cases}
\tag{3.3}
\]
We focused on the sequential approach, which uses simulation to satisfy the model equations and thereby eases implementation. According to Vassiliadis (1993) the sequential approach is more successful for large-scale dynamic optimization problems than the simultaneous approach (see e.g. Biegler, 1984). Yet researchers are continuously making progress in developing more efficient dedicated solvers for the simultaneous approach (Biegler, 2002), making that approach more and more competitive for large problems. Recall from the introduction that a plant model easily consists of a hundred thousand equations; in this example case we consider a model with two thousand equations.
We used piecewise constant basis functions as the input parametrization with a sample time of ten minutes, covering a period of sixteen hours for two decision variables. Finally, we approximated the gradient information by computing linear time-varying models along the trajectory, which is quite efficient since it reuses Jacobian information from the simulation. Alternatives for this approach are integration of the so-called sensitivity equations, solving adjoint equations or parameter perturbation (see e.g. Vassiliadis, 1993; Støren, 1999).
In the sequential approach first the initial trajectory is simulated. Based on this solution the objective function and constraints can be evaluated. Then gradient information is used to determine a trajectory that improves the objective and constraints, which is simulated again. This sequence is repeated by a standard sqp solver until some termination criterion is met.
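The mechanics of this sequence can be illustrated on a toy problem. The Python sketch below is not the thesis implementation (which couples gproms simulations to a Matlab SQP solver); it only mimics the structure, with a made-up scalar discrete-time "plant", a utility-like objective and band constraints on the tail of the output, solved with scipy's SLSQP.

import numpy as np
from scipy.optimize import minimize

dt, N = 1.0, 20  # illustrative sample time and horizon length

def simulate(u):
    """Single shooting: the model equations are satisfied by simulation."""
    x, traj = 0.08, []
    for uk in u:
        x = 0.9 * x + 0.1 * uk  # toy scalar plant, stand-in for the DAE model
        traj.append(x)
    return np.array(traj)

objective = lambda u: np.sum(u) * dt / N  # utility-like cost of the input
constraints = [  # keep the output in a band on the tail of the horizon
    {"type": "ineq", "fun": lambda u: 0.05 - simulate(u)[8:]},
    {"type": "ineq", "fun": lambda u: simulate(u)[8:] - 0.04},
]
res = minimize(objective, np.zeros(N), method="SLSQP",
               bounds=[(0.0, 10.0)] * N, constraints=constraints,
               options={"maxiter": 50})
print(res.fun, res.success)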
Approximation of objective function and constraints
The integral in the objective function V(z) in Equation (3.1) is replaced by a trapezoidal rule to approximate the integral and we sampled the inequality constraints h(z, t). Furthermore we added a quadratic term to the objective, which regularizes the optimization problem. The matrix Q quadratically penalizes subsequent input moves and therefore smooths the optimal solution. The advantage of a smooth optimal signal is that it is more likely to be accepted and understood by operators.
Output variables (y)      Input variables (u)
Bottom impurity           Reboiler duty
Impurity tank 1           Reflux ratio
Impurity tank 2

Table 3.1: Definition of the output vector (y) and the input vector (u).
Furthermore it seems that the quadratic term reduces the number of local minima, which is favorable when comparing solutions of this optimization based on different reduced models.
This results in the following approximate formulation of the dynamic optimization problem3

\[
\begin{aligned}
\min_{p} \quad & V = \sum_{i=1}^{N} \left( L z_i + \tfrac{1}{2} \Delta u_i^T Q \Delta u_i \right) \frac{\Delta t}{t_f - t_0} \\
\text{s.t.} \quad & \dot{x} = f(x, y, u) , \quad x(t_0) = x_0 \\
& 0 = g(x, y, u) \\
& z = C_x x + C_y y + C_u u \\
& 0 \le h_i(z_i) , \quad i = 1 \ldots N \\
& 0 = -\Delta u_i + u_i - u_{i-1} , \quad i = 1 \ldots N ,
\end{aligned}
\tag{3.4}
\]

where

\[
p = [u_0^T, u_1^T, \ldots, u_{N-1}^T]^T .
\tag{3.5}
\]
In this specific case we defined three output variables: the bottom impurity and the impurities in the two tanks, respectively. The two inputs are defined as reboiler duty and reflux ratio (see Table 3.1). The impurities in the tanks are required since we want to meet the schedule, which is defined as a specification on the quality in a tank. The operating strategy is to blend off-spec product during the transition into an on-spec product in the tank. This enforces a swift transition followed by a more stationary operation. Furthermore it is a robust operating strategy since the quality in the tank is very soon within specification. With more product in the tank it is less sensitive to control actions.
The reboiler duty appears linearly in the objective function, reflecting the dominating utility costs of the plant. Note that the variables appear in an absolute sense. Relative changes of the reboiler duty and reflux ratio enter the objective in the quadratic term with z^T = [y^T, u^T],

\[
L = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \end{bmatrix} , \qquad
Q = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix} .
\tag{3.6}
\]
3 We scaled the objective by dividing the original objective by the horizon length.
Matrices Cx, Cy and Cu are used for scaling, which is required for proper constraint handling. Without scaling, a constraint violation of e.g. 0.1 °C would be penalized equally to a constraint violation of 0.1 in impurity. If we want changes of the impurity in a range of 0–0.01 to be equally important as changes of the temperature in a range of 45–55 °C, we need to scale to compensate for units. Scaling was done by dividing the variable by this range. This resulted in the following scaling matrices4

\[
C_y = \begin{bmatrix} 100 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 10 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} , \qquad
C_u = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0.1 & 0 \\ 0 & 0.1 \end{bmatrix} .
\tag{3.7}
\]
The gradients of the objective and the inequality constraints can be approximated by linearization of the plant dynamics. This can be derived as presented next:

\[
\frac{\partial V}{\partial p} = \frac{\partial V}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p} ,
\tag{3.8}
\]
\[
\frac{\partial h}{\partial p} = \frac{\partial h}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p} ,
\tag{3.9}
\]

where \(\partial V/\partial z\), \(\partial h/\partial z\) and \(\partial u/\partial p\) can be derived analytically and \(\partial z/\partial u\) can be approximated by a linear time-variant model along the trajectory

\[
\begin{bmatrix} \Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \vdots \\ \Delta z_N \end{bmatrix}
=
\begin{bmatrix}
D_0 & 0 & 0 & \ldots & 0 \\
C_1 \Gamma_0 & D_1 & 0 & \ldots & 0 \\
C_2 \Phi_1 \Gamma_0 & C_2 \Gamma_1 & D_2 & \ldots & 0 \\
\vdots & & & \ddots & \vdots \\
C_N \prod_{i=1}^{N-1} \Phi_i \, \Gamma_0 & \ldots & \ldots & \ldots & D_N
\end{bmatrix}
\begin{bmatrix} \Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \vdots \\ \Delta u_N \end{bmatrix} ,
\tag{3.10}
\]
where for piecewise constant control signals the transition matrices are defined as

\[
\Phi_i = e^{A_i \Delta t} , \qquad
\Gamma_i = \int_{i\Delta t}^{(i+1)\Delta t} e^{A_i s} \, ds \, B_i ,
\tag{3.11}
\]
with the Jacobians evaluated at \(t_i = i\Delta t\) in \(x_i, y_i, u_i\):

\[
A_i = \frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial x} , \qquad
B_i = \frac{\partial f}{\partial u} - \frac{\partial f}{\partial y} \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial u} ,
\tag{3.12}
\]
\[
C_i = -C_y \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial x} + C_x , \qquad
D_i = -C_y \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial u} + C_u .
\tag{3.13}
\]
4 In this case we did not use Cx.
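A compact numerical recipe for Equations (3.11)–(3.13) is sketched below in Python/NumPy. It illustrates the algebra only and is not the gproms/Matlab implementation used in the thesis; the augmented-matrix exponential is a standard trick for computing Φ and Γ in one call.

import numpy as np
from scipy.linalg import expm, solve

def dae_jacobians_to_lti(fx, fy, fu, gx, gy, gu, Cx, Cy, Cu):
    """Eliminate the algebraic variables of the linearized DAE (Eqs. 3.12-3.13)."""
    gyx = solve(gy, gx)  # (dg/dy)^{-1} dg/dx
    gyu = solve(gy, gu)  # (dg/dy)^{-1} dg/du
    A = fx - fy @ gyx
    B = fu - fy @ gyu
    C = Cx - Cy @ gyx
    D = Cu - Cy @ gyu
    return A, B, C, D

def discretize(A, B, dt):
    """Phi = exp(A dt) and Gamma = int_0^dt exp(A s) ds B via one augmented expm."""
    n, m = A.shape[0], B.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm(M * dt)
    return E[:n, :n], E[:n, n:]  # Phi, Gamma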
Figure 3.2: Initial guess (dots) and optimal solution (solid) for reboiler
duty (top) and reflux rate (bottom).
Figure 3.3: Trajectories of the bottom impurity (top), first vessel (bottom left) and second vessel (bottom right) for the initial guess (dots) and the resulting optimal solution (solid). The constraints are the block-shaped functions between 2 and 8 and between 10 and 16 hours. The optimal solution satisfies the path constraints whereas the initial guess does not.
This can be written as

\[
\begin{bmatrix} \Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \Delta z_3 \\ \vdots \\ \Delta z_N \end{bmatrix}
=
\begin{bmatrix}
G_{00} & 0 & 0 & 0 & \ldots & 0 \\
G_{10} & G_{11} & 0 & 0 & \ldots & 0 \\
G_{20} & G_{21} & G_{22} & 0 & \ldots & 0 \\
G_{30} & G_{31} & G_{32} & G_{33} & \ldots & 0 \\
\vdots & & & & \ddots & \vdots \\
G_{N0} & \ldots & \ldots & \ldots & \ldots & G_{NN}
\end{bmatrix}
\begin{bmatrix} \Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \\ \vdots \\ \Delta u_N \end{bmatrix} ,
\tag{3.14}
\]

where \(G_{ij}\) is the approximate \(\partial z_i / \partial u_j\).
Note that both zi and uj are vectors.
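Assembling the block lower-triangular matrix of Equation (3.14) from the matrices of Equations (3.10)–(3.13) can be done as in the following Python sketch; dimensions and inputs are generic, and a recursive implementation would avoid recomputing the Φ-products.

import numpy as np

def assemble_sensitivity(Phis, Gammas, Cs, Ds):
    """Build G with G_ii = D_i and G_ij = C_i Phi_{i-1} ... Phi_{j+1} Gamma_j
    for i > j, cf. Equations (3.10) and (3.14)."""
    N = len(Ds)
    nz, nu = Ds[0].shape
    G = np.zeros((N * nz, N * nu))
    for i in range(N):
        G[i*nz:(i+1)*nz, i*nu:(i+1)*nu] = Ds[i]
        for j in range(i):
            M = Gammas[j]
            for k in range(j + 1, i):  # propagate through Phi_{j+1} ... Phi_{i-1}
                M = Phis[k] @ M
            G[i*nz:(i+1)*nz, j*nu:(j+1)*nu] = Cs[i] @ M
    return G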
We need to select an initial guess for the trajectory and an initial condition to start the dynamic optimization. In this case we assume the plant to be in steady-state, so the initial condition is determined by the inputs at time zero. The initial guess for the optimization was determined by finding two steady-state operating conditions for the column that satisfy the end-product quality, not compensating for off-spec product during the transition. The reflux rate was chosen constant whereas the reboiler duty was step shaped, as depicted by the line with dots in Figure 3.2. The number of decision variables for the optimization is 192 (16 hours, 2 inputs and a sample time of 10 minutes).
Path constraints:

t [hr]                  (0...2)   [2...8)   [8...10)  [10...16]
i [-]                   1...11    12...47   48...59   60...96
Bottom impurity xb  <=  0.10      0.05      0.10      0.01
                    >=  0.00      0.04      0.00      0.00
Impurity tank 1 z1  <=  0.10      0.05      -         -
                    >=  0.00      0.04      -         -
Impurity tank 2 z2  <=  -         -         0.10      0.01
                    >=  -         -         0.00      0.00
Reboiler duty Qreb  <=  10        10        10        10
                    >=  0         0         0         0
Reflux ratio RR     <=  10        10        10        10
                    >=  0         0         0         0

Table 3.2: Definition of the path constraints; the tank impurity bounds apply while the respective tank is being filled.
The path constraints, depicted in Figure 3.3 and shown in Table 3.2, allow for a transition of two hours, after which the production is quasi-stationary for a period of six hours. Stationary production is attractive from an operational point of view but restrains flexibility and may exclude opportunities. These should be carefully balanced for a reliable and economically attractive operation.
Iter  F-count   f(x)      max constraint   Step-size   Directional derivative
1     1         2.72085   190.8            1           -0.0273
2     3         3.02699   221.6            1           -0.0148
3     5         3.54475   43.31            1           0.119
4     7         3.93788   10.16            1           0.044
5     9         3.99698   1.959            1           -0.141
6     11        3.87452   0.3502           1           -0.265
7     13        3.68284   4.906            1           -0.0881
8     15        3.61549   1.849            1           -0.0743
9     17        3.56108   2.17             1           -0.0637
10    19        3.51111   4.023            1           -0.0844
11    21        3.45050   6.801            1           -0.0605
12    23        3.40538   4.49             1           -0.0483
13    25        3.36935   4.153            1           -0.0402
14    27        3.33937   3.712            1           -0.0326
15    29        3.31468   3.054            1           -0.0262
16    31        3.29462   2.497            1           -0.0225
17    33        3.27738   2.531            1           -0.0211
18    35        3.26133   2.824            1           -0.02
19    37        3.24626   2.704            1           -0.0178
20    39        3.23284   1.919            1           -0.0162
21    41        3.22063   1.304            1           -0.0164
22    43        3.20878   1.314            1           -0.0136
23    45        3.19912   1.302            1           -0.0108
24    47        3.19121   1.181            1           -0.0106
25    49        3.18387   1.392            1           -0.00656
solution:       3.17896   0.8278           1           -0.00234

optimization took: 190 min
solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 3.3: Output of optimization with original model.
In the top of Figure 3.3 the constraints on the bottom impurity are shown, whereas the bottom left and right show the constraints on the impurity in the two tanks in the tank park. Both tanks are empty and are filled successively by the bottom flow of the column, each during eight hours.

The material produced during the transition does not satisfy the end-product quality specifications and would have been off-spec product. However, during the transition it is blended such that it does meet the end-product quality specifications. In this way we do not need intermediate storage of off-spec product and we prevent downstream blending. This is attractive from an economic point of view but requires good planning and scheduling.
We used the commercially available modelling tool gproms, which can solve differential and algebraic equations, and selected the integration routine dasolv. Alternative solvers will probably yield different simulation times and consequently different optimization times. We tried to stick to the default values of the solver. For a more general introduction to numerical integration the reader is referred to textbooks on this topic, e.g. by Shampine (1994), Brenan et al. (1996) and Dormand (1996).
A sequential quadratic programming algorithm (fmincon), available in Matlab, was adopted from Tousain (2002) to solve the nonlinear constrained optimization. In this approach a so-called Lagrangian is defined, which is the original objective function extended with the constraints weighted by so-called Lagrange multipliers (see Appendix C). This formulation yields first-order optimality conditions that coincide with those of the original constrained optimization problem. See for more details e.g. Nash and Sofer (1996), Gill et al. (1981) and Han (1977).
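In the notation of Equation (3.4), with all inequality constraints collected in a vector c(p) ≥ 0, this construction takes the standard form below; this is a generic statement of the first-order conditions, not a transcription from the references cited above.

\[
\mathcal{L}(p, \lambda) = V(p) - \lambda^T c(p) , \qquad
\nabla_p \mathcal{L}(p^\ast, \lambda^\ast) = 0 , \quad
\lambda^\ast \ge 0 , \quad c(p^\ast) \ge 0 , \quad
(\lambda^\ast)^T c(p^\ast) = 0 .
\]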
In joint work with Schlegel et al. (2002) a similar sequential optimization problem was solved using adopt, with optimization times of the same order of magnitude. It would have been interesting to compare these and other implementations in more detail, which is left for future research.
3.2 Results
The results of the optimization are shown in Figure 3.3. The responses of the key variables to the initial guess are plotted with the lines with dots. Severe constraint violations are visible; especially the final impurity in the second tank is far too high. The solid line in Figure 3.3 shows the trajectories after optimization. The constraints are satisfied and we can observe the undershoot that nicely makes up for the off-spec production during the second transition. Furthermore we note that the maximum allowable impurity is produced, which is explainable since this is economically most attractive. Note that before the first vessel is finished the impurity in the bottom of the column already starts to decrease: the optimizer anticipates the transition to a higher purity by increasing the reboiler duty half an hour before the first vessel is finished.
The intermediate results of the optimization are shown in Table 3.3. The table shows the objective value, maximum constraint violation, step-size and directional derivative for the different iterations. Furthermore the number of function evaluations is shown in the second column of the table. The final solution is presented in the last row of the table, where it is clear that the optimization converged to a feasible solution: the maximum constraint violation is less than the constraint tolerance. In Table 3.4 approximate times per task are shown, from which the most important observation is that the time required
to solve the sequential quadratic programme (sqp) is very small compared to simulation and computation of gradient information. Overhead involves time spent on e.g. communication between gproms and Matlab. The gradient information is computed from the Jacobian that is available in the simulator at each sample time. From this information a discrete-time linear model can be derived that is used in Equation (3.10).
task                   %
simulation             80
gradient information   10
SQP                     5
overhead                5

Table 3.4: Approximate contribution to the computational load per task.
The time required to solve this problem was over three hours, which shows that we can solve the problem but that it requires a significant amount of time. Using this optimization for online control would most probably be too slow to adequately deal with disturbances. This only gets worse if we include more components and extra distillation columns to split those components. Note that the implementation as presented here is already quite sophisticated, since it supplies approximate gradient information in a computationally efficient way, which reduces the required optimization time.
3.3 Model quality
Development of a method to derive approximate models is useless without assessing the value of the approximate model. The appreciation of a model for dynamic optimization mainly depends on two properties. First, the computational load of the overall optimization to a large extent determines its potential for online applications. Second is the quality of the solution of the optimization based on the approximate model. The first property was already addressed in a previous section. The second property was addressed by Forbes (1994) in his paper on model adequacy requirements. He mentions three ways to differentiate between alternative models. The most commonly used method is to compare their capability in predicting key process variables. The second method is to check the optimality conditions, which can be done if a solution of the original problem is available. Finally, it is tested whether the optimization based on the approximate model is capable of predicting the manipulated variables that coincide with the true optimal trajectories. This is illustrated in Figure 3.4, where the solution of the optimization parameter space is depicted.
Figure 3.4: Model adequacy and solution adequacy illustrated by optimizations (O) based on the original model (M) and the approximate model (M̂), mapping different initial guesses to optimal solutions in R^np. A solution derived with an approximate model is adequate if it satisfies the optimality conditions of the optimization based on the original model. An approximate model is adequate if the true optimal solution satisfies the optimality conditions of the optimization based on the approximate model. In the graph p0 represents the initial guess that is mapped to p1 and p̂1 by optimizations based on the original and reduced model, respectively. Subsequently p1 and p̂1 are used as initial guesses for optimizations based on the reduced and original model, respectively. The results of these last two optimizations are p̂2 and p2.
The initial guess p0 can be processed by an optimization problem based on the original model or based on an approximate model, O(M) or O(M̂), respectively. These two optimizations result in two solutions, p1 and p̂1, that can be used as initial guesses for two alternate optimization problems where we switch models. These optimizations result in two solutions, p2 and p̂2, for the optimization based on the original and approximate model, respectively. In case of global optimality the equations

\[
p_1 = p_2 ,
\tag{3.15}
\]
\[
\hat{p}_1 = \hat{p}_2 ,
\tag{3.16}
\]

are valid, and for a good approximate model the equations

\[
p_1 \approx \hat{p}_2 ,
\tag{3.17}
\]
\[
\hat{p}_1 \approx p_2 ,
\tag{3.18}
\]

hold true.
An approximate model is adequate in case the distance ‖p1 − p̂2‖ is acceptably small. This implies that the predicted minimizer of the original problem is a minimizer of the optimization problem based on the approximate model.

The solution of an optimization based on an approximate model is adequate in case the distance ‖p̂1 − p2‖ is acceptably small. This implies that the minimizer of the optimization problem with the approximate model provides a solution that is close to a minimizer of the original model. In case ‖p1 − p̂1‖ is small we expect ‖p1 − p̂2‖ and ‖p̂1 − p2‖ to be small, i.e. the same local minimizers. Note that due to local minima ‖p1 − p̂1‖ can be large and still result in the conclusion that solution and approximate model are accurate.
In this thesis we will assess model performance by simulation and evaluation of the key process variables and by both adequacy tests explained above. The optimization statistics will also be presented, such as the number of iterations, the number of function evaluations and the CPU time of the optimization/simulation, to compare the computational load of the different optimizations.
3.4 Discussion
In this chapter we presented a sensible dynamic optimization problem for a typical chemical process. Although great effort was put into this formulation, one could think of arguments that would result in a different formulation of the optimization problem.
An interesting extension of this problem would have been, for example, to add the fresh feed to the set of optimization variables and to add constraints on the levels in the tank park. Furthermore, the time interval allowed for the transition could be enlarged, providing more freedom and possibly more opportunities for economic improvement. These scenarios were regarded as very interesting but did not add extra value to the purpose of this example, namely the study of different model reduction techniques for dynamic optimization. Nevertheless this case represents a large class of optimization problems in the process industry.
The optimization was successfully implemented and provided optimal trajectories that would be acceptable for real implementation. For off-line trajectory optimization we could stop here, but for online optimization a sample time larger than three hours is not acceptable. Computational load is the main hurdle to overcome in going from an off-line to an online implementation that is fast enough to adequately respond to disturbances.
We motivated and provided detailed information on the implementation of the optimization problem. This is necessary to interpret the results, since seemingly unimportant details can have a decisive impact. Although we would expect similar optimal trajectories for a simultaneous optimization approach, we did not put effort into comparing these two approaches.
Chapter 4

Physics-based model reduction
In this chapter model reduction based on physics (remodelling) is discussed. We start by presenting the rigorous model for a reaction separation process, followed by the physics-based reduced model. Modelling assumptions are the key to this reduction. The approach is mainly inspired by the question of what the added value of detailed modelling is for dynamic optimization. Simplified models became outdated because for simulation purposes it was not a problem to develop and use more detailed models. Dynamic optimization is computationally more demanding than simulation, and therefore the idea arose to fall back on simple models. The quality of the reduced model is assessed in the last section by execution of a dynamic optimization based on the reduced model.
4.1 Rigorous model
The process consists of a reactor and a distillation column as depicted in the flow sheet in Figure 3.1. The reactor is fed by a fresh feed and the recycle stream from the top of the distillation column. The outlet of the reactor is connected to the feed of the column. First we describe the very simple reactor model, followed by the column model, which is far more complex and therefore a better candidate for remodelling. The detailed column model is a standard model with component balances and an energy balance. The physical property model used to derive gas and liquid fugacity, enthalpy and density is based on a cubic equation of state as described in Reid (1987). The simplified model is based on a constant molar overflow model and constant relative volatility as described in Skogestad (1997). This last model was slightly modified by making the relative volatility dependent on composition.
Reactor model

The reactor is modelled as a continuously stirred tank reactor (cstr). We assume the content to be ideally mixed, which implies that no temperature or concentration gradients occur within the reactor. In the reactor a first-order exothermic irreversible reaction A → B takes place without side reactions, so only two species are modelled. The reaction rate depends on the concentration CA and the temperature, described by an Arrhenius equation.
Total molar balance

\[
\frac{dN}{dt} = F_0 + D - F ,
\tag{4.1}
\]
where N is the total molar holdup in the reactor, F0 is the fresh feed, D is the recycle flow from the top of the distillation column and F is the product flow out of the reactor. All flows are total molar flows of both reactant A and product B. The outgoing flow F is controlled by a level controller. The fresh feed F0 is assumed to be determined by the upstream process. The distillate flow D is controlled by the level controller on the condenser.
Component mass balance

\[
\rho_m \frac{dNz}{dt} = \rho_m F_0 z_0 + \rho_m D x_d - \rho_m F z - k_0 C_A V e^{-E_a/RT} ,
\tag{4.2}
\]

where ρm is the molar density of both reactant A and product B, z is the molar fraction of A in the reactor, xd and z0 are the molar fractions of A in the recycle and fresh feed flows, k0 and Ea are the reaction rate and activation constants, respectively, and R is the universal gas constant. CA and T are the molar concentration and temperature in the reactor and V is the volume of reactant A and product B together in the reactor. The component mass balance can be rearranged by substitution of the equalities \(\frac{dNz}{dt} = N \frac{dz}{dt} + z \frac{dN}{dt}\) and \(C_A V = N z\) and the total molar balance:

Component balance rearranged

\[
\frac{dz}{dt} = \frac{F_0}{N} (z_0 - z) + \frac{D}{N} (x_d - z) - \frac{k_0}{\rho_m} e^{-E_a/RT} z .
\tag{4.3}
\]
Energy balance

\[
\rho_m c_p \frac{dNT}{dt} = \rho_m c_p \left( F_0 T_0 + D T_d - F T \right) - Q + k_0 C_A V e^{-E_a/RT} \Delta H ,
\tag{4.4}
\]
where cp is the specific heat, assumed to be independent of temperature and composition, and T0, Td and T are the temperatures of the fresh feed, the distillate flow and the reactor, respectively. Q is the heat removed from the reactor by cooling and ΔH is the reaction heat constant.
Heat removal

\[
\frac{d \rho_c c_{pc} V_c T_c}{dt} = \rho_c c_{pc} \Phi_c (T_{ci} - T_c) + Q ,
\tag{4.5}
\]
\[
Q = \alpha_c A_c (T - T_c) ,
\tag{4.6}
\]
where ρc, cpc, Vc and Tc are the density, specific heat, volume and temperature of the coolant, respectively. Φc is the coolant flow rate and Tci is the temperature of the coolant at the inlet of the heat exchanger. αc and Ac are the total heat transfer coefficient and the total area of heat transfer between the coolant and the reactor content. We assume the dynamics to be very fast and the coolant in the heat exchanger to be ideally mixed. It is clear that by manipulating the coolant flow rate, Φc, we can control the coolant temperature Tc. Together with T, Tc determines the driving force for the heat removal. A controller that measures the temperature in the reactor and actuates the coolant flow rate can in this way control the heat removal and therefore the reactor temperature. If the temperature controller is well tuned and the heat exchanger is well designed, we can assume that we directly manipulate the heat removal. In the plant model used in this thesis we assume that the heat exchanger is well designed. Therefore the equations describing the heat exchanger are not included and the temperature controller directly controls the heat removal.
The energy balance can be rearranged by substitution of the equalities \(\frac{dNT}{dt} = N \frac{dT}{dt} + T \frac{dN}{dt}\) and \(C_A V = N z\) and the total molar balance.

Energy balance rearranged

\[
\frac{dT}{dt} = \frac{F_0}{N} (T_0 - T) + \frac{D}{N} (T_d - T) - \frac{Q}{N \rho_m c_p} + \frac{1}{\rho_m c_p} k_0 e^{-E_a/RT} z \Delta H .
\tag{4.7}
\]
The reactor model is very simple; however, without tight temperature control even this very simple model can exhibit limit-cycling behavior.

We could add two modelling assumptions that simplify the reactor model even further: we could assume the temperature and level to be constant. This can be motivated by the assumption that we are able to control these variables very fast. The controlled variables are then considered to be directly manipulated variables. These assumptions are applied in the model that was depicted in Figure 2.2 but not in the models described in this chapter.
Rigorous column model

The distillation column consists of trays, a reboiler and a condenser. The equations1 in this section are based on the gproms model library (1997). These equations hold for all three sub-models, where for the reboiler there is no vapor flow entering (F_in^vap = 0) and for the condenser there is no liquid flow entering (F_in^liq = 0) and no vapor flow out (F_out^vap = 0, total condenser). For all sub-models except the feed tray there is no feed flow (F_feed = 0). The equations can describe a mixture of two or more components; in this case it is a binary mixture, so NC = 2.
Component molar balance

\[
\frac{dM_i}{dt} = F_{in}^{liq} z_{i,in}^{liq} + F_{in}^{vap} z_{i,in}^{vap} + F_{feed} z_{i,feed} - F_{out}^{liq} x_i - F_{out}^{vap} y_i , \quad i = 1 \ldots NC .
\tag{4.8}
\]
Energy balance

\[
\frac{dU}{dt} = F_{in}^{liq} h_{in}^{liq} + F_{in}^{vap} h_{in}^{vap} + F_{feed} h_{feed} - F_{out}^{liq} h^{liq} - F_{out}^{vap} h^{vap} + Q .
\tag{4.9}
\]
Molar holdup

\[
M_i = M_L x_i + M_V y_i , \quad i = 1 \ldots NC .
\tag{4.10}
\]

Total energy

\[
U = M_L h^{liq} + M_V h^{vap} - P V_{tray} .
\tag{4.11}
\]
Mole fraction normalization

\[
\sum_{i=1}^{NC} x_i = \sum_{i=1}^{NC} y_i = 1 .
\tag{4.12}
\]
Phase equilibrium

\[
\Phi_i^{liq}(P, T, x_i) \, x_i = \Phi_i^{vap}(P, T, y_i) \, y_i , \quad i = 1 \ldots NC .
\tag{4.13}
\]
Geometry constraint

\[
M_L \bar{v}_{liq} + M_V \bar{v}_{vap} = V_{tray} .
\tag{4.14}
\]

1 The gproms code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.
Level of clear liquid

\[
L_{liq} = \frac{M_L \bar{v}_{liq}}{A_p} .
\tag{4.15}
\]
Furthermore we used a physical property model based on a cubic equation of
state from which the properties of the liquid phase as well as the properties of
the vapor phase can be computed.
The Soave equation of state

\[
Z^3 - Z^2 + (A - B - B^2) Z - AB = 0 ,
\tag{4.16}
\]

where

\[
A = \frac{aP}{R^2 T^2} ,
\tag{4.17}
\]
\[
B = \frac{bP}{RT} .
\tag{4.18}
\]
The values of a and b can be deduced from the critical properties and acentric factors of the pure components and mixing rules:

\[
a_i = 0.42748 \frac{R^2 T_{c,i}^2}{P_{c,i}} \left[ 1 + f_{w,i} \left( 1 - \sqrt{\frac{T}{T_{c,i}}} \right) \right]^2 ,
\tag{4.19}
\]
\[
f_{w,i} = 0.48 + 1.574 w_i - 0.176 w_i^2 ,
\tag{4.20}
\]
\[
b_i = 0.08664 \frac{R T_{c,i}}{P_{c,i}} .
\tag{4.21}
\]
The values of a and b for mixtures are given by the following mixing rules

\[
a = \left( \sum_{i=1}^{NC} z_i \sqrt{a_i} \right)^2 ,
\tag{4.22}
\]
\[
b = \sum_{i=1}^{NC} z_i b_i ,
\tag{4.23}
\]

where zi is the molar fraction in the mixture.
The cubic equation is solved, which yields three real roots

\[
Z_l \le Z_j \le Z_v ,
\tag{4.24}
\]

which are implemented by the next three equations

\[
1 = Z_1 + Z_2 + Z_3 ,
\tag{4.25}
\]
\[
A - B - B^2 = Z_1 Z_2 + Z_2 Z_3 + Z_3 Z_1 ,
\tag{4.26}
\]
\[
AB = Z_1 Z_2 Z_3 .
\tag{4.27}
\]
The liquid and vapor compressibilities are determined as follows

\[
Z_{liq} = Z_l ,
\tag{4.28}
\]
\[
Z_{vap} = Z_v .
\tag{4.29}
\]
With the solution of the equation of state the physical properties of interest
are calculated using the following expressions. Substitution of Zliq results in
the property for the liquid phase whereas substitution of Zvap results in the
property for the vapor phase.
Specific volume

\[
\bar{v} = \frac{ZRT}{P} .
\tag{4.30}
\]

Specific density

\[
\rho = \frac{1}{\bar{v}} \sum_{i=1}^{NC} x_i MW_i .
\tag{4.31}
\]
Specific enthalpy

\[
h = \frac{1}{b} \left( a - T \frac{\partial a}{\partial T} \right) \ln \left( \frac{Z}{Z + B} \right) + RT (Z - 1) + \sum_{i=1}^{NC} x_i \int_{T_{ref}}^{T} C_{p,i}(T) \, dT .
\tag{4.32}
\]

Fugacity

\[
\ln(\Phi_i) = \frac{b_i}{b} (Z - 1) - \ln(Z - B) + \frac{A}{B} \left( \frac{b_i}{b} - 2 \sqrt{\frac{a_i}{a}} \right) \ln \left( \frac{Z + B}{Z} \right) ,
\tag{4.33}
\]

where Cp,i is given by

\[
C_{p,i}(T) = \sum_{j=1}^{4} C_{p,i}^{(j)} T^{j-1} .
\tag{4.34}
\]

For the tray model we need equations that specify the internal flows.
Overflow over weir

\[
F_{out}^{liq} = \frac{1.84 L_w}{\bar{v}_{liq}} \left( \frac{L_{liq} - \beta h_w}{\beta} \right)^{3/2} .
\tag{4.35}
\]

Pressure drop over the plate

\[
F_{in}^{vap} = \frac{A_h}{\bar{v}_{vap}} \sqrt{ \frac{P_{in}^{vap} - P - \rho_{liq} \, g \, L_{liq}}{\alpha \rho_{vap}} } .
\tag{4.36}
\]
Adiabatic operation of a tray

\[
Q = 0 .
\tag{4.37}
\]

For the reboiler and condenser Q is not equal to zero because of the heat input and heat removal, respectively. Not all streams are present in all sub-models: only at the feed tray does the feed enter the column, so for all other trays the feed becomes zero. Furthermore, no vapor stream enters the reboiler, whereas no vapor stream exits the condenser (total condenser). These equations were put together in a rigorous model for the reaction separation process.
4.2 Physics-based reduced model
From literature we know that a simple way of modelling a distillation column is to assume constant molar holdup, constant molar overflow and constant relative volatility (see e.g. Skogestad, 1997). Such a model is probably too simple for our case but provides the starting point for our physically reduced model2 .
Component molar balance

\[
\frac{dM_i}{dt} = F_{in}^{liq} z_{i,in}^{liq} + F_{in}^{vap} z_{i,in}^{vap} + F_{feed} z_{i,feed} - F_{out}^{liq} x_i - F_{out}^{vap} y_i , \quad i = 1 \ldots NC .
\tag{4.38}
\]
Molar holdup is in the liquid phase

\[
M_i = M_L x_i + M_V y_i \approx M_L x_i , \quad i = 1 \ldots NC .
\tag{4.39}
\]
Mole fraction normalization

\[
\sum_{i=1}^{NC} x_i = \sum_{i=1}^{NC} y_i = 1 .
\tag{4.40}
\]

2 The gproms code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.
Phase equilibrium by constant relative volatility

\[
y_i = \frac{\alpha_i x_i}{\sum_{j=1}^{NC} \alpha_j x_j} , \quad i = 1 \ldots NC .
\tag{4.41}
\]
Linearized liquid dynamics

\[
F_{out}^{liq} = F_{out}^{liq*} + \frac{M_L - M_L^{*}}{\tau} .
\tag{4.42}
\]
Constant vapor stream

\[
F_{out}^{vap} = F_{in}^{vap} .
\tag{4.43}
\]
We will now go through the details of the different assumptions that yield the reduced model. An important assumption needed for the physics-based model reduction is that the pressure in the column is controlled at a constant set point. This seems somewhat restrictive, but in practice the pressure set point is not the main handle for control of a distillation column. If the pressure drop over the column is negligible compared to the absolute pressure, we can assume the pressure to be constant as far as the physical properties are concerned. The assumption of constant pressure in the column is not valid with respect to the internal vapor flux, since only a pressure difference can enforce a flow. Still we need to determine the internal vapor flow.
Constant molar vapor stream

Start with a steady-state assumption for the molar holdup and energy balance

\[
0 = F_{in}^{liq} - F_{out}^{liq} + F_{in}^{vap} - F_{out}^{vap} ,
\tag{4.44}
\]
\[
0 = F_{in}^{liq} h_{in}^{liq} - F_{out}^{liq} h^{liq} + F_{in}^{vap} h_{in}^{vap} - F_{out}^{vap} h^{vap} .
\tag{4.45}
\]
Elimination of F_in^liq yields

\[
0 = F_{out}^{liq} \left( h_{in}^{liq} - h^{liq} \right) - F_{in}^{vap} \left( h_{in}^{liq} - h_{in}^{vap} \right) + F_{out}^{vap} \left( h_{in}^{liq} - h^{vap} \right) .
\tag{4.46}
\]

The difference in specific enthalpy between the liquid and vapor phases is much larger than the differences in specific enthalpy between neighboring trays, which justifies the assumption that

\[
h^{evap} = h_{in}^{liq} - h_{in}^{vap} \approx h_{in}^{liq} - h^{vap} .
\tag{4.47}
\]

Substitution yields

\[
0 = F_{out}^{liq} \left( h_{in}^{liq} - h^{liq} \right) - \left( F_{in}^{vap} - F_{out}^{vap} \right) h^{evap} .
\tag{4.48}
\]

Rearranging this equation yields

\[
\frac{h_{in}^{liq} - h^{liq}}{h^{evap}} = \frac{F_{in}^{vap} - F_{out}^{vap}}{F_{out}^{liq}} .
\tag{4.49}
\]

With the same argument that the differences in specific enthalpy of two neighboring trays are small relative to the heat of evaporation, combined with vapor streams that are of the same order of magnitude as the liquid streams, the difference in vapor streams must be approximately zero. Therefore we can assume

\[
F_{out}^{vap} = F_{in}^{vap} .
\tag{4.50}
\]
Linearized hydrodynamics

In the rigorous model the liquid flow over the weir is described by

\[
F_{out}^{liq} = \frac{1.84}{\bar{v}_{liq}} \left( \frac{L_{liq} - \beta h_w}{\beta} \right)^{3/2} .
\tag{4.51}
\]

This equation can be linearized in \(L_{liq}^{*}\):

\[
F_{out}^{liq} = F_{liq}^{*} + \frac{3}{2} \frac{1.84}{\bar{v}_{liq}^{*}} \left( \frac{L_{liq}^{*} - \beta h_w}{\beta} \right)^{1/2} \frac{\Delta L_{liq}}{\beta}
= F_{liq}^{*} + \frac{\Delta L_{liq}}{\tau_h^{*}} ,
\tag{4.52}
\]

where \(\tau_h^{*}\) is the hydraulic time constant in that operating point and \(\Delta L_{liq} = L_{liq} - L_{liq}^{*}\). This approach was adopted from Skogestad (1997). The value of \(\tau_h^{*}\) can be computed directly from the parameters and steady-state values of the rigorous model. In the simplified model we need to rewrite the equation as a function of the molar liquid holdup. This is done with \(M_{liq} \bar{v}_{liq} = L_{liq} A_p\):

\[
F_{out}^{liq} = F_{liq}^{*} + \frac{\Delta M_{liq} \bar{v}_{liq}^{*}}{\tau_h^{*} A_p} ,
\tag{4.53}
\]

with \(\Delta M_{liq} = M_{liq} - M_{liq}^{*}\).
We can reduce the liquid hydrodynamics even further by assuming that the liquid holdup on a tray is constant. This is equivalent to the assumption that F_in^liq = F_out^liq. This assumption was only applied in the model used in Chapter 2.

Boilup rate and reboiler duty
Boilup rate and reboiler duty
Furthermore we need to add a new equation that relates the heat input to a molar flow, since the energy balance was eliminated. Again we start with the quasi steady-state assumption for the mass and heat balance of the reboiler:

\[
0 = F_{in}^{liq} - F_{out}^{liq} - F_{out}^{vap} ,
\tag{4.54}
\]
\[
0 = F_{in}^{liq} h_{in}^{liq} - F_{out}^{liq} h^{liq} - F_{out}^{vap} h^{vap} + Q_{reb} .
\tag{4.55}
\]

Elimination of F_in^liq yields

\[
0 = F_{out}^{liq} \left( h_{in}^{liq} - h^{liq} \right) + F_{out}^{vap} \left( h_{in}^{liq} - h^{vap} \right) + Q_{reb} .
\tag{4.56}
\]

And with the argument that the difference in specific enthalpy between liquid and vapor is much larger than the difference in specific liquid enthalpy of two neighboring trays, \(0 \approx h_{in}^{liq} - h^{liq}\) and \(h_{in}^{liq} - h^{vap} \approx h^{evap}\), the equation can be approximated by

\[
0 \approx F_{out}^{vap} \left( h_{in}^{liq} - h^{vap} \right) + Q_{reb}
\tag{4.57}
\]
\[
\phantom{0} \approx F_{out}^{vap} h^{evap} + Q_{reb} .
\tag{4.58}
\]
Despite the fact that hevap (P, T, x) is a function of pressure, temperature and
composition, we will assume it to be constant since the variations are in general
quite small.
Constant relative volatility

The vapor-liquid equilibrium in the simplified model is computed by the constant relative volatility equation

\[
y_i = \frac{\alpha_i x_i}{\sum_{j=1}^{NC} \alpha_j x_j} .
\tag{4.59}
\]

We can observe that this equation is very simple compared to the cubic equation of state approach. In particular, the equations can be solved explicitly if we choose x as a state variable.
The underlying assumption for this simplification is that pressure does not play a role in the vapor-liquid equilibrium, which in general is certainly not true. In practice, however, a column is operated at a controlled pressure and therefore we can neglect pressure effects on the vapor-liquid equilibrium.

The constant relative volatility was introduced in order to remove the temperature effects on the vapor-liquid equilibrium; temperature is therefore implicitly accounted for in the constant relative volatility approach. Still we need to check whether the relative volatility is indeed constant. This can be checked by computing the relative volatility in the original model.
It appears that in this case the relative volatility is not constant. The next step is to find an explicit relation that describes the observed relative volatility. This is where the engineering comes in and can hardly be automated. After some trial and error we came up with the following extension of the relative volatility

\[
\alpha_i(x_i) = \theta_{1,i} + \theta_{2,i} x_i .
\tag{4.60}
\]

A nice property of this choice is that the relative volatility remains an explicit expression in x, which is computationally attractive. In practice the relative volatility of one component is set to one to normalize the other relative volatilities. This leaves two parameters for a binary mixture.
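Estimating θ1,i and θ2,i from equilibrium data of the rigorous model is a small linear regression. The Python sketch below demonstrates the idea on synthetic data; both the data and the hidden volatility profile are made up for illustration.

import numpy as np

# Synthetic "rigorous model" equilibrium data with a hidden linear volatility
x = np.linspace(0.05, 0.95, 19)
alpha_true = 2.0 + 0.5 * x
y = alpha_true * x / (1.0 + (alpha_true - 1.0) * x)

# Observed relative volatility from each (x, y) pair: alpha = (y/x) / ((1-y)/(1-x))
alpha_obs = (y / x) * (1.0 - x) / (1.0 - y)
theta2, theta1 = np.polyfit(x, alpha_obs, 1)  # fit alpha(x) = theta1 + theta2 * x
print(theta1, theta2)  # recovers approximately 2.0 and 0.5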
Reconstruction of temperature

Temperature is not present in the simplified model because the energy balance was removed by a quasi steady-state assumption combined with the constant molar overflow assumption. This is not desirable, since temperature measurements are often used as inferential measurements for composition.

If pressure and composition are fixed, temperature is no longer a degree of freedom. From the following equations the temperature can be computed from pressure, composition and the Antoine constants of the pure components: Raoult's law, Dalton's law and the Antoine equation,

\[
p_i = x_i p_i^0 , \qquad p_i = y_i P , \qquad \ln p_i^0 = A_i + \frac{B_i}{C_i + T} ,
\tag{4.61}
\]

respectively, where pi is the partial pressure, p_i^0 is the saturation pressure of the pure component, xi is the molar fraction in the liquid phase, yi is the molar fraction in the gas phase, P is the pressure, T is the temperature and Ai, Bi and Ci are the Antoine constants of component i. These can be rearranged to

\[
\frac{y_i}{x_i} = \frac{p_i^0}{P} = \frac{e^{A_i + \frac{B_i}{C_i + T}}}{P} .
\tag{4.62}
\]

This equation can be written as an explicit equation for the temperature

\[
T = \frac{B_i}{\ln\!\left( \frac{y_i}{x_i} P \right) - A_i} - C_i .
\tag{4.63}
\]

In case the Antoine constants are not available, a parameter estimation is necessary. Since the pressure variations were already limited by assuming the pressure to be controlled in a tight interval, we can assume the pressure to be constant.
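Equation (4.63) translates directly into a small reconstruction function; the Antoine constants are inputs here, not values taken from the thesis.

import numpy as np

def reconstruct_temperature(x, y, P, A, B, C):
    """Eq. (4.63): T = B / (ln(y P / x) - A) - C, using the thesis convention
    ln p0 = A + B / (C + T) for the Antoine equation."""
    return B / (np.log(y * P / x) - A) - C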
Results of optimization based on reduced model

The statistics for this optimization are presented in Table 4.1. This optimization is referred to in the graph in Figure 3.4 as the optimization O(M̂) with initial guess p0 and solution p̂1. The model statistics are listed in Table 4.2. The largest reduction is achieved in the number of algebraic variables and the number of nonzero elements.
[Plots omitted: bottom impurity (top) and bottom flow (bottom) as functions of time for the original and reduced model.]
Figure 4.1: Responses of the key process variables of the reduced model to the
original optimal trajectories.
Iter  F-count   f(x)      max constraint   Step-size   Directional derivative
1     1         2.72085   180.7            1           -0.159
2     3         2.82561   194.5            1           0.0597
3     5         3.15963   27.83            1           0.118
4     7         3.31951   2.569            1           -0.0284
5     9         3.29696   0.1779           1           -0.116
6     11        3.20454   0.08808          1           -0.0764
7     13        3.14829   0.2168           1           -0.038
8     15        3.11884   0.2434           1           -0.0296
9     17        3.09374   0.2807           1           -0.0467
10    19        3.06053   0.6904           1           -0.0156
solution:       3.04862   0.2509           1           -0.0179

optimization took: 11 min
solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.1: Output of the optimization with the reduced model.
Compared to the optimization with the original model (see Table 3.3), the number of iterations is reduced by a factor of more than two. This does not explain the factor of seventeen in reduction of the overall optimization time. Simulations with the reduced model are computationally less demanding (by approximately a factor of seven), and this is the second explanation for the tremendous reduction. Note that the only difference between the two optimizations is the function evaluation and the gradient information, since these are based on different models.
model:      nx     ny     na    nnz
original    75     1912   24    7237
reduced     54     211    24    859
ratio       1.4    9.1    1.0   8.4

Table 4.2: Properties of the original and reduced model, with nx, ny and na the number of differential, algebraic and assigned (fixed) variables, respectively. nnz is the number of nonzero elements in the Jacobian.
Furthermore, note that the reduction in simulation times by a factor of seven is best associated with the reduction of the number of nonzero elements, which is approximately a factor of eight. This association is closer than the reduction ratio of differential equations or of the total number of equations.

The quality of the model and of the solution of this dynamic optimization are the topic of the next section.
4.3 Model quality
As presented in Chapter 3 we can assess model quality in three different manners; all three are applied in this section. First we present the evaluation by simulation. Secondly, we assess model adequacy by verifying the optimality conditions of the optimization based on the reduced model at the solution of the original optimization problem. This optimization is referred to in the graph in Figure 3.4 as the optimization O(M̂) with initial guess p1 and solution p̂2. Finally, we present the reversed check of optimality conditions, referred to as the optimization O(M) with initial guess p̂1 and solution p2. In that case the optimality conditions of the optimization problem with the original model are checked at the approximate solution.
Figure 4.2: Optimization with the reduced model O(M̂) with the solution from the optimization with the original model p1 as initial guess (dots) and the approximate solution p̂2 (solid).
Iter  F-count   f(x)      max constraint   Step-size   Directional derivative
1     1         3.17896   58.03            1           -0.03
2     3         3.15842   0.09697          1           -0.0273
3     5         3.13354   4.441e-016       1           -0.13
4     7         3.03198   1.19             1           -0.0287
5     9         3.0054    0.2554           1           -0.0433
6     11        2.9684    0.3072           1           -0.00832
solution:       2.9614    0.6529           1           -0.00449

optimization took: 6.8 min.
solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.3: Output of the optimization with the reduced model O(M̂) with the solution from the optimization with the original model p1 as initial guess, resulting in solution p̂2 as depicted in Figure 4.2.
Evaluation by simulation

A simple check of model quality is to do two simulations: first with the original model, and then repeating the simulation with the reduced model. Comparison of the key process variables gives a first impression of model quality. For the test of the reduced model as discussed in this chapter we used the optimal trajectories computed by the optimization based on the original model. These trajectories were applied to the reduced model and Figure 4.1 shows the responses of the bottom impurity and the bottom flow. These process variables are considered to be the key process variables.

We see that the constraints for the bottom impurity are not satisfied at all times, but the constraint violation is minimal. These minor differences can be ascribed to small differences in internal flows related to the constant molar vapor flow assumption. A separate check on the relative volatility showed that the approximation was nearly perfect for the range of operation considered here.

Differences in bottom flow rate are very small. In steady-state these cannot differ, because that would imply a violation of the conservation laws, since the bottom flow is the only way out of the process.
Model adequacy

The second approach to assess model quality is to check the optimality conditions of the optimization based on the reduced model at the true optimum. Stated differently: check whether or not the true optimum is also an optimum for the optimization based on the reduced model.

The results of these tests are presented in Figure 4.2. Some minor changes can be noted, mainly due to the constraint violation that occurs with the original guess. This can be verified in Table 4.3, which confirms the constraint violation at the first iteration. Note that the responses of the key process variables to the initial guess coincide with the responses used for the comparison based on simulation and are therefore equal to the responses depicted in Figure 4.1. Remarkable is the time still needed to converge (Table 4.3) compared to the result presented in Table 4.1 with the step-shaped initial guess. Although the initial guess is quite good, the warm-started optimization is less than two times faster than the optimization with a cold start.

Although this is a sensible approach to assess the model quality for dynamic optimization, in practice we would like to do optimizations based on the reduced model and assess the quality of those solutions. This is precisely what is discussed next.
Figure 4.3: Optimization with the original model O(M) with the approximate solution from the optimization with the reduced model p̂1 as initial guess (dots) and the resulting solution p2 (solid).
Iter  F-count   f(x)      max constraint   Step-size   Directional derivative
1     1         3.04862   73.07            1           0.184
2     3         3.29795   24.13            1           0.0767
3     5         3.39243   10.71            1           -0.0215
4     7         3.37446   0.4812           1           -0.0655
5     9         3.32522   0.07207          1           -0.0221
6     11        3.30531   0.3974           1           -0.0149
solution:       3.29149   0.02154          1           -0.00599

optimization took: 46.8 min.
solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.4: Output of the optimization with the original model O(M) with the approximate solution from the optimization with the reduced model p̂1 as initial guess, resulting in solution p2 as depicted in Figure 4.3.
Solution adequacy

The capability of a reduced or approximate model to predict optimal solutions that are acceptably close to the true process optimum is crucial. This capability is tested by redoing the optimization based on the original model with the approximate solution derived with the reduced model as initial guess.

The result of this test is depicted in Figure 4.3, where the input trajectories of the reboiler duty and reflux rate are shown. Again we only observe minor differences between the initial guess and the converged solution. The intermediate statistics are gathered in Table 4.4, which shows an initial constraint violation that is resolved at the cost of an increased objective value.

Note that the optimal solutions represent local minima. This is clearly visible if we compare the reflux rate in Figures 4.2 and 4.3. This may be caused by a lack of sensitivity for this input channel. The shapes of the optimal input trajectories of the boilup rate are strikingly similar, which pleads for the followed approach.

From a computational point of view we already concluded that it was very attractive to use the reduced model, reducing computation time by a factor of seventeen. If we use the reduced model only for computation of an initial guess as a first step, we still reduce the computational effort by more than a factor of three. The local minimum of the original optimization problem is slightly better than that of the two-step approach. This emphasizes that we indeed found a local minimizer.

A last comment is that, coincidentally, the same number of iterations is needed for convergence as in the previous test.
4.4 Discussion
In this chapter the added value of a physically reduced model became evident for this dynamic optimization problem. A significant reduction in computational time was achieved using the reduced model. Even when this solution was used as an initial guess for the optimization with the detailed model, the overall time for optimization was shorter3.

In practice, one could question what the added value of using the detailed model is, since the quality of the reduced model is already quite high, especially considering that there will also be mismatch between the detailed model and the real plant. Furthermore the presence of disturbances will require a feedback mechanism. Since the performance of a feedback controller is limited by the sample rate, a controller based on a detailed model with a low sample rate can be outperformed by a controller based on a simpler, less accurate model with a high sample rate.

3 In a closed-loop implementation the converged solution of the previous cycle could be used to construct a good initial guess.
The approach used to derive the simplified model can be generalized to models that describe physical properties based on detailed physical property models, e.g. based on a Soave equation of state. These detailed physical property models have a large validity range that is not exploited within the dynamic optimization. This allows for simplification by using simplified models available from literature. In this thesis the simplified distillation column from Skogestad (1997) was used. This model was extended with equations that reconstruct variables that are present in the detailed model but not available in the simplified model.

Key to the success of this approach is that the simplified model has similar properties to the detailed model: a similar sparsity structure and nonlinearity, but with fewer differential and algebraic equations. An optimization based on such a model will inevitably be computationally less demanding. It requires detailed process knowledge to distinguish between relevant phenomena and less important details in order to arrive at the right tradeoff.

Should this dynamic optimization run online, we saw that a warm-started optimization with the reduced model is seven times faster than the optimization with the original model. A closed-loop evaluation with disturbances is not performed in this thesis, but a controller based on the reduced model with a sampling rate of ten minutes would most likely outperform a controller based on the rigorous model with a sampling rate of almost one hour. How much better the controller would be mainly depends on the character of the disturbances.
Chapter 5

Model order reduction by projection
In this chapter the possibilities of model order reduction by projection for dynamic optimization are explored. The focus in this chapter is on model quality, whereas the techniques to derive the projection have been discussed in Chapter 2 of this thesis. Proper orthogonal decomposition and a Gramian-based reduction are applied to the physics-based reduced model as presented in Chapter 4 and assessed within a dynamic optimization problem as outlined in Chapter 3.
5.1 Introduction
In Chapter 2 model order reduction by projection was discussed. Because of its success in reducing model order significantly while keeping the approximation error small, it was selected as a promising model reduction technique for dynamic optimization. Two different approaches to derive a suitable projection are selected. The first approach is based on a proper orthogonal decomposition of data generated by simulation of the optimal trajectory resulting from the optimization as defined in Chapter 3. The second approach is a balanced reduction based on averaged controllability Gramians and averaged observability Gramians derived along the same optimal trajectory (that was used for proper orthogonal decomposition) via linearization and Lyapunov equations.

In this chapter we assess those two projection techniques, for both truncation and residualization and for all reduced model orders. As a reference model we use the physics-based reduced model as presented in Chapter 4, which has forty-seven differential equations. With two different projection techniques and both truncation and residualization, we have almost two hundred reduced-order models that will be assessed in this chapter.
reduced-order models that will be assessed in this chapter.
The motivation for assessing all model orders is that in literature no leads
are available how to find the optimal order for the projected model. One expects
a gradual increase of the simulation approximation error when increasing the
level of reduction. At the same time one expects that increasing the level of
reduction results in a gradual decrease of computational load of simulation. For
the sequential approach implementation of dynamic optimization, a reduction
of computational load of the simulation directly results in a reduction in time
required for one iteration. Model reduction by projection is therefore expected
to reduce the overall computational effort assuming that the number of iterations
remains approximately the same for the optimization.
The difference between model reduction by truncation and by residualization is that in the case of residualization the steady-state approximation error is zero, whereas for truncation this is in general not the case. Note that model reduction by singular perturbation is also a reduction by residualization. However, model reduction by singular perturbation is based on a modal analysis, whereas proper orthogonal decomposition and Gramian-based reduction are based on an energy measure from input to state or output, respectively. For dynamic optimization a zero steady-state approximation error is a favorable property, because the tail of the solution is dominated by low-frequency dynamics and approaches a steady-state.
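For a state-space model already transformed to POD or balanced coordinates and partitioned after the first r states, the two options differ only in how the discarded states are treated. The following sketch is generic and not tied to the models of this thesis:

import numpy as np

def truncate(A, B, C, D, r):
    """Drop the last n-r coordinates entirely; the steady-state gain changes."""
    return A[:r, :r], B[:r], C[:, :r], D

def residualize(A, B, C, D, r):
    """Set dx2/dt = 0 and solve for x2; the steady-state gain is preserved."""
    X = np.linalg.solve(A[r:, r:], A[r:, :r])  # A22^{-1} A21
    Y = np.linalg.solve(A[r:, r:], B[r:])      # A22^{-1} B2
    Ar = A[:r, :r] - A[:r, r:] @ X
    Br = B[:r] - A[:r, r:] @ Y
    Cr = C[:, :r] - C[:, r:] @ X
    Dr = D - C[:, r:] @ Y
    return Ar, Br, Cr, Dr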
When comparing proper orthogonal decomposition with Gramian-based reduction, both methods have strong and weak points. A strong point of proper orthogonal decomposition is that the reduction is based on a trajectory, which can be an optimal trajectory of the full-order model. The reduction is tailored to this specific trajectory and requires a conscious choice of the input trajectories used for model reduction. Recall, however, that the result also depends on the scaling of the different state variables. Gramian-based reduction does not depend on the scaling of states and truly takes the input-to-output behavior into account. On the other hand, the projection matrix required for a Gramian-based projection can be close to singular, whereas the projection matrix for proper orthogonal decomposition is orthogonal. It is hard to predict which method should be preferred.
It is not possible to give an analytical answer to the question which model projection approach provides the best results in terms of quality and reduction of computational load. Therefore, in this chapter these projection methods are assessed experimentally within an optimization environment. Based on the results we can answer the question whether it is possible to successfully apply different projected models within dynamic optimization. This will be done based on performance indicators that are outlined next.
Performance indicators
The first assessment of a reduced-order model is an evaluation based on a simulation, generating trajectories of key process variables. This is similar to what we did in Figure 4.1 for the assessment of the physics-based reduced model. The optimal inputs of the dynamic optimization based on the full-order model are used for this simulation. The two optimal inputs are the boilup rate and the reflux rate, respectively; both are depicted in Figure 4.3. A first quick evaluation is done by visual inspection of the key process variables and of the norm of the error signal on the key process variables. The assumption that motivates this assessment is that a reduced model that correctly predicts the key process variables is suitable for optimization. However, gradient information, which is essential for optimization, is not evaluated in this assessment. Gradient information is used within the optimization to find an optimal solution; if it is not correct, the optimization is unlikely to converge to the correct optimum. Therefore we assess the reduced-order models within an optimization setting, like we assessed the physics-based reduced model in Chapter 4.
initial guess:    O(M)    O(M̂)
p0                p1      p̂1
p1                        p2
p̂1                p̂2      -

Table 5.1: Model adequacy and solution adequacy tested by optimizations based on the original model O(M) and the reduced-order model O(M̂), mapping different initial guesses to optimal solutions. The result of the optimization based on the original model with initial guess p0 is mapped to p1. Then p1 is used as initial guess for the optimization based on the reduced-order model and mapped to p2. In a similar way the initial guess p0 is mapped to p̂1 by an optimization based on the reduced-order model. Finally, p̂1 is used as initial guess for the optimization based on the original model, resulting in p̂2.
We will explain the assessment of the reduced-order models using Table 5.1, which is a different way of presenting Figure 3.4. The dynamic optimization O, as described in Chapter 3, with the physics-based reduced model M maps the initial guess p0 into an optimal solution p1. If the solution p1 coincides with the optimal solution of the same optimization based on the reduced-order model, the reduced-order model is adequate. To check this we execute the same dynamic optimization, but now based on the reduced-order model M̂ and with the optimal solution p1 as initial guess. Since we have almost two hundred reduced-order models, this results in as many optimizations to check model adequacy.
Conversely, we can execute a dynamic optimization based on the reduced-order model with p0 as initial guess. This results in an optimal solution p̂1 for each reduced-order model. The quality of this solution can be checked in a similar way as we checked the quality of the reduced-order models. We repeat the dynamic optimization based on the full-order model M and use p̂1 as the initial guess. If the solution based on the reduced-order model coincides with the optimal solution for the full-order model, the solution is adequate. Again this results in almost two hundred optimizations. Before executing these optimizations, a first quick evaluation of the optimal solutions p̂1 is done by visual inspection, comparing them to the optimal solution p1 based on the full-order model.
Besides the model quality we are interested in the computational load of the dynamic optimization based on the reduced-order models. The computational load of the dynamic optimization and the number of iterations needed to reach convergence are defined as performance indicators. Note that for the evaluation of the reduced models we use the same initial guess p0 for the optimization. First we will discuss the properties of the nonlinear models and the implementation of the projection of a set of differential and algebraic equations. Then we will present the results of all optimizations discussed in this section to assess reduced-order model quality.
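Schematically, the four optimizations of Table 5.1 amount to the following test harness, where optimize(model, guess) is a hypothetical stand-in for the sequential dynamic optimization of Chapter 3, returning an optimal parametrization and an iteration count.

```python
# Schematic only: the four optimization mappings used as performance indicators.
# `optimize(model, guess)` is a hypothetical stand-in for the dynamic
# optimization of Chapter 3; it returns (solution, iteration count).
def assess(original, reduced, p0, optimize):
    p1, _ = optimize(original, p0)               # O(M, p0)   -> p1   (reference)

    p2, it_model = optimize(reduced, p1)         # O(M^, p1)  -> p2   (model adequacy)
    model_adequate = it_model == 1               # one iteration: optima coincide

    p1_hat, _ = optimize(reduced, p0)            # O(M^, p0)  -> p1^  (approximate solution)

    p2_hat, it_sol = optimize(original, p1_hat)  # O(M, p1^)  -> p2^  (solution adequacy)
    solution_adequate = it_sol == 1

    return model_adequate, solution_adequate
```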
5.2 Projection of nonlinear models
Model properties
The class of models considered in this thesis can be described by a set of differential and algebraic equations (dae) of at most index one (Brenan et al., 1996),

0 = F(ẋ, x, y, u),   (5.1)

where x ∈ R^nx are the state variables, y ∈ R^ny the algebraic variables, u ∈ R^nu the input variables, and t ∈ R the time. Here nx is referred to as the order of the model.
We will confine ourselves to models defined by a set of explicit differential algebraic equations

ẋ = f(x, y, u)
0 = g(x, y, u),   (5.2)

where f ∈ R^nx and g ∈ R^ny. This can be written in the general format as

0 = F(ẋ, x, y, u) = [−ẋ + f(x, y, u); g(x, y, u)].   (5.3)
This shows precisely the structure that is imposed on the models that we consider in this chapter.
A special case occurs when all algebraic equations can be made explicit in terms of an analytical expression of the state and input variables,

0 = F(ẋ, x, y, u) = [−ẋ + f(x, y, u); −y + g̃(x, u)].   (5.4)

The algebraic equations can then be eliminated from the differential equations by substitution,

ẋ = f(x, g̃(x, u), u) = f̃(x, u)
y = g̃(x, u).   (5.5)

This is defined as a set of ordinary differential equations (ode) because the differential equations only depend on state and input variables. The physics-based reduced model used in this chapter can be described as in Equation (5.4).
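As a toy illustration of Equations (5.4) and (5.5), with assumed functions rather than the column model:

```python
# Toy example of eliminating explicit algebraic equations, y = g~(x, u).
import numpy as np

def g_tilde(x, u):                # explicit algebraic part:  y = g~(x, u)
    return np.tanh(x) + u

def f(x, y, u):                   # differential part:  dx/dt = f(x, y, u)
    return -x + y

def f_tilde(x, u):                # eliminated ODE form:  dx/dt = f(x, g~(x, u), u)
    return f(x, g_tilde(x, u), u)
```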
Projection of the dynamics of nonlinear models
For the type of differential and algebraic equations described in Equation (5.2) we will now show how to project the dynamics; how to compute a suitable projection was discussed in Chapter 2.
1. Transform the original state under similarity into a new coordinate system more suitable for model reduction,

   z = T(x − x∗),   (5.6)

   with the transformation back to the original coordinate system

   x = T⁻¹z + x∗,   (5.7)

   where T is square and nonsingular and x∗ is a steady-state vector in R^nx defined by f(x∗, y∗, u∗) = 0 and g(x∗, y∗, u∗) = 0. The dynamics in the new coordinate system can be written as

   ż = T f(T⁻¹z + x∗, y, u)
   0 = g(T⁻¹z + x∗, y, u).   (5.8)
2. Decompose the transformed space into two subspaces with state vectors z1 ∈ R^nr and z2 ∈ R^(nx−nr), respectively. Hence,

   [z1; z2] = [T1; T2] (x − x∗),   (5.9)

   with the transformation back to the original coordinate system

   x = [T̄1 T̄2] [z1; z2] + x∗,   (5.10)

   where

   [T̄1 T̄2] [T1; T2] = I,   (5.11)

   with consistent partitioning, i.e. [T̄1 T̄2] is the corresponding partitioning of T⁻¹. The dynamics in the new coordinate system can be written as

   ż1 = T1 f(T̄1 z1 + T̄2 z2 + x∗, y, u)
   ż2 = T2 f(T̄1 z1 + T̄2 z2 + x∗, y, u)
   0  = g(T̄1 z1 + T̄2 z2 + x∗, y, u).   (5.12)
3. Finally, as in Equation (2.68), we can reduce the number of differential equations by residualization, i.e. ż2 = 0,

   ż1 = T1 f(T̄1 z1 + T̄2 z2 + x∗, y, u)
   0  = T2 f(T̄1 z1 + T̄2 z2 + x∗, y, u)
   0  = g(T̄1 z1 + T̄2 z2 + x∗, y, u),   (5.13)

   or, as in Equation (2.70), by truncation, i.e. z2 = 0,

   ż1 = T1 f(T̄1 z1 + x∗, y, u)
   0  = g(T̄1 z1 + x∗, y, u).   (5.14)
Note that model-order reduction by residualization, like model-order reduction
by singular perturbation, does not introduce a steady-state error.
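A minimal sketch of the two reduced forms, with T1, T2 the row blocks of T and T1b, T2b standing in for the column blocks T̄1, T̄2 of T⁻¹ (all names are illustrative, not taken from the thesis implementation):

```python
import numpy as np

def truncated_rhs(z1, y, u, f, g, T1, T1b, xs):
    # Equation (5.14): reconstruct x with z2 = 0 and project f onto z1.
    x = T1b @ z1 + xs
    return T1 @ f(x, y, u), g(x, y, u)       # dz1/dt and algebraic residuals

def residualized_rhs(z1, z2, y, u, f, g, T1, T2, T1b, T2b, xs):
    # Equation (5.13): z2 is kept, but as an algebraic variable (dz2/dt = 0).
    x = T1b @ z1 + T2b @ z2 + xs
    fx = f(x, y, u)
    return T1 @ fx, T2 @ fx, g(x, y, u)      # dz1/dt, 0 = T2 f, 0 = g
```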
From an implementation point of view, in a simulation environment the transformation is simply added to the set of equations. This increases the total number of equations, as we will see in the next section, but avoids manual elimination of x:

ż = T f(x, y, u)
0 = g(x, y, u)
z = T(x − x∗),   (5.15)

where x is no longer a state variable but an explicit algebraic variable.
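In residual form (res = 0 defining the DAE), the augmented system of Equation (5.15) could be sketched as follows, with all argument names illustrative:

```python
import numpy as np

def augmented_residual(dz, z, x, y, u, f, g, T, xs):
    # x is now an algebraic variable; the last block enforces z = T (x - x*).
    return np.concatenate([
        -dz + T @ f(x, y, u),     # transformed differential equations
        g(x, y, u),               # original algebraic equations
        -z + T @ (x - xs),        # appended transformation equations
    ])
```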
5.3 Results of model reduction by projection
The dynamic optimization used to assess the model quality was outlined in Chapter 3. Before we present the results, some model statistics are given in Table 5.2. This table shows the statistics of the physics-based reduced full-order model and of the residualized and truncated models as a function of the order of reduction. Note that the addition of the transformation has a significant effect on the number of nonzero elements, due to the (non-sparse) state transformation, and that the number of algebraic equations is much larger than the number of differential equations.
model:         nx        ny        na    nnz
full-order     54        211       24    859
residualized   54 − nr   428 + nr  24    5574 − nr
truncated      54 − nr   428 + nr  24    5574 − nr · 47

Table 5.2: Properties of the original and reduced models, with nx, ny and na the number of differential, algebraic and assigned variables, respectively. nnz is the number of nonzero elements in the Jacobian and nr is the number of projected state variables.
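The effect of the dense transformation on the nonzero count can be illustrated with a synthetic sparse matrix of the same order (a toy demonstration, not the column model):

```python
# Toy demonstration: a dense similarity transformation destroys Jacobian sparsity.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(47, 47, density=0.05, random_state=0).toarray()  # sparse dynamics
T = rng.standard_normal((47, 47))                              # dense transformation
At = T @ A @ np.linalg.inv(T)                                  # transformed dynamics

print(np.count_nonzero(A), np.count_nonzero(At))  # roughly 110 vs. all 2209 entries
```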
Although one may expect monotonicity in both error and computational load for an increasing degree of reduction, experience proves otherwise. Therefore it was decided to do a full-scale exploration of all reduced-order models, with residualization as well as truncation. With two different projections, two reduction methods and an original model with 47 differential equations¹, this adds up to almost two hundred candidate reduced models.
A first evaluation is done by simulation: solution p1 (see Table 5.1) is applied as the input trajectory for reboiler duty and reflux rate for all projected models. The output trajectories of the key process variables are compared to the trajectory produced by simulation with the original model. In total we will assess 184 different reduced models: two different transformations (proper orthogonal decomposition and Gramian-based) times two different projection methods (residualization and truncation) times forty-six different orders.
Secondly, we check whether the original optimal solution p1 satisfies the optimality conditions for all reduced models. We therefore redo the optimization with the reduced models with the initial guess equal to the optimal solution p1, O(M̂, p1) → p2, check the maximum constraint violation and count the number of iterations before reaching convergence. Recall that p1 was already available from O(M, p0) → p1 in Chapter 4.
¹ The model contains 54 differential equations because the three tanks and one redundant pressure controller, which are not part of the projection, are counted as well.
[Figure: top panel, bottom impurity x_b vs. time [hr]; bottom panel, error norm ||ε|| (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.1: Top: response of bottom impurity to the optimal trajectory p1 for all different projections. Bottom: norm of the error between the original and approximated response.
[Figure: top panel, bottom flow B vs. time [hr]; bottom panel, error norm ||ε|| (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.2: Top: response of bottom flow to the optimal trajectory p1 for all different projections. Bottom: norm of the error between the original and approximated response.
Thirdly, we compute p̂1 by solving the optimization problem again for all projected models, O(M̂, p0) → p̂1. We then do a visual inspection of the solutions and compute the error norm ||p1 − p̂1||.
Finally, we will present the maximum constraint violation of the approximate solution applied to the original model, and the number of iterations required to reach convergence of the original optimization problem with the approximate solution as initial guess. This means that for every converged solution p̂1 we execute an optimization O(M, p̂1) → p̂2, check the maximum constraint violation and count the number of iterations before reaching convergence.
Evaluation by simulation
The most common method to differentiate between models is to compare their ability to predict key process variables. The key process variables in this case are the product quality and throughput, i.e. the bottom impurity and bottom flow of the distillation column. These variables are predicted by simulation with all different projections. The input trajectories used for the simulation are the optimal input trajectories resulting from the optimization with the full-order model as presented in Chapter 4.
The results of these simulations are presented in Figures 5.1 and 5.2, where the order of the projected models is set out on the horizontal axis of the bottom panel of both figures, and the logarithm of the squared integral error on the vertical axis. The top panels show the time responses of all simulations with an error smaller than the level represented by the dashed line in the bottom panels. So each cross, diamond, square and plus in the bottom panel under the dashed line represents a simulation whose resulting trajectory is plotted in the top panel. The truncated models are marked with a square and a diamond, of which the square corresponds to the Gramian-based projection and the diamond to the proper orthogonal decomposition based projection. The plus and cross represent the residualized models, of which the plus corresponds to the Gramian-based projection and the cross to the proper orthogonal decomposition based projection.
During the execution of the simulations with the reduced-order models it appeared that some simulations were not successful; they terminated due to convergence problems of the simulation algorithm. This can be explained by the approximate character of the reduced models. Due to the approximation, some variables hit their upper or lower bound where those limits were not hit in the full-order model. These upper and lower limits are helpful during the process of building the model but are not strictly necessary during simulation. Therefore all limits were removed from the reduced-order models, which resulted in a larger number of successful simulations. Still, at some points the solver was not able to solve all reduced models.
[Figure: CPU time [sec] vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.3: Computational load for simulation of the optimal input trajectory with the different projected models, measured in CPU seconds.
[Figure: top panel, CPU time [min] (log scale) vs. model order; bottom panel, iterations (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.4: Optimizations with projected models and initial guess p0, O(M̂, p0) → p̂1. Top: CPU time required for the optimization. Bottom: number of iterations required by the optimization with the different projections.
The bottom impurity trajectories of all successful simulations with an error below the dashed line are depicted in the top panel of Figure 5.1. Simulation results that are not plotted because the integral error was too large are considered bad. In a similar way the results for the bottom flow are shown in Figure 5.2.
A first observation from Figures 5.1 and 5.2 is that the error does not decrease monotonically with the model order. This can be explained by the fact that the model reduction was based on energy in signals and did not explicitly include feasibility of the simulation. From this observation we can conclude that knowledge of the error of two neighboring reduced-order models (one with a higher and the other with a lower order) does not give a guaranteed prediction of the error of the intermediate reduced-order model.
Furthermore, we see that in general the residualized models have a higher accuracy than the truncated models, which is according to expectation. For the lower-order models we see that the Gramian-based residualized models outperform the other reduced models in predicting the bottom impurity. For the prediction of the bottom flow the distinction is less pronounced, which may be explained by the choice of output variables for the Gramian-based reduction, of which the bottom flow was not part.
The plots with the simulation results are intended to reflect model quality in terms of the capability to represent the input-output behavior of the original model. As discussed in Chapter 1, the motivation for the model reduction was to reduce computational effort. Therefore we simply plotted the time needed for the simulation of each reduced model, with the same convention for the square, diamond, plus and cross as in Figures 5.1 and 5.2. This resulted in Figure 5.3, where the dashed line represents the simulation time of the physics-based reduced model without transformation equations. From the models that are only transformed but not reduced we can see that the transformation alone increases the CPU time by approximately a factor of three compared to the original model.
The CPU time decreases almost monotonically with the model order for the proper orthogonal decomposition based truncated models. The same holds for the proper orthogonal decomposition based residualized models, but with more exceptions. The behavior of the Gramian-based reduced models is unexpected: for some reason the CPU time peaks between model orders of twenty and forty. Overall, it can be concluded that projection did not result in a reduction of simulation CPU times compared to the original model.
In joint work with Schlegel (Schlegel et al., 2002) the same effect of model reduction by proper orthogonal projection on CPU time was observed. In that work a similar model (177 equations, of which 47 differential) but a different dynamic optimization was defined. Furthermore, in the sequential dynamic optimization, sensitivity equations were used to obtain gradient information instead of the linear time-varying gradient approximation. For this optimization problem the computational time of the projected model was almost five times longer than for the nominal model.
[Figure: top panel, reboiler duty Q_reb vs. time [hr]; bottom panel, error norm ||ε|| (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.5: Top: optimal trajectories O(M̂, p0) → p̂1 for the reboiler duty. Bottom: error ||p1 − p̂1|| for the different projections.
[Figure: top panel, reflux rate RR vs. time [hr]; bottom panel, error norm ||ε|| (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.6: Top: optimal trajectories O(M̂, p0) → p̂1 for the reflux rate. Bottom: error ||p1 − p̂1|| for the different projections.
As in Figure 5.3, the computational time decreased slightly with decreasing model order. Furthermore, the quality of the residualized models was higher than that of the truncated reduced-order models. In the paper not all reduced orders were tested, but only a selection of nine.
We need to stress that the choice of numerical solver can have a significant impact on the CPU time. We found that the dasolv routine performed well on the original model. The same routine was used for simulation of the reduced models, with an absolute accuracy of 10⁻⁶ and a relative accuracy of 10⁻⁴.
In case the model has a sparse structure it is important to know whether this structure is exploited by the solver. If a solver does not exploit the sparsity structure of the model, the impact of a non-sparse projection is less significant than we saw in our example. An important observation is that if we assess a model reduction technique by simulation, we actually assess the combination of the specific reduced model and the solver.
Evaluation of model quality by optimization: O(M̂, p0) → p̂1
Next we evaluate the reduced models and the optimal solutions derived from them, as discussed in the previous section. We start by generating optimal solutions with all reduced models, with the initial guess equal to the one used in the optimization with the original model. The result is presented in Figures 5.5 and 5.6, which allow a visual inspection of the optimal trajectories computed for reboiler duty and reflux rate. In the top panels of both figures the trajectories are shown for the cases in which the error between the approximate and original solution is smaller than some error bound. The shapes of the approximate optimal solutions are quite similar. The error ||p1 − p̂1|| is plotted in the bottom panels of both figures against model order for all projection methods.
In the high-order range the proper orthogonal decomposition based truncated models perform worse than the other reduced models. In the mid range the absence of the Gramian-based reduced models attracts attention. This can be related to the long CPU times presented in Figure 5.3, indicating a higher chance of termination of the simulation and subsequently of the optimization. For the low-order reduced models it is clear that the solutions based on the residualized models are more accurate, where the Gramian-based reduced models produce slightly better solutions than the proper orthogonal decomposition based reduced models.
Evaluation of solution adequacy by optimality check: O(M, p̂1) → p̂2
In the top panel of Figure 5.4 the computational load of each optimization based on the reduced-order models is presented. The bottom panel of Figure 5.4 shows the number of iterations required for convergence, also on a logarithmic scale, for all reduced-order models.
[Figure: top panel, maximum constraint violation g_max at first iteration (log scale) vs. model order; bottom panel, iterations for convergence (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.7: Solution adequacy test O(M, p̂1) → p̂2. Top: maximum constraint violation at the first iteration. Bottom: number of iterations for convergence.
[Figure: top panel, maximum constraint violation g_max at first iteration (log scale) vs. model order; bottom panel, iterations for convergence (log scale) vs. model order; legend: gram resid, gram trunc, pod resid, pod trunc.]
Figure 5.8: Model adequacy test O(M̂, p1) → p2. Top: maximum constraint violation at the first iteration. Bottom: number of iterations for convergence.
Missing data indicates a failure of the optimization due to an unsuccessful simulation in one of the iterations; reasons for an unsuccessful simulation were already discussed in this section. Recall that the original optimization converged in ten iterations. For the ten highest-order reduced models the projection has no effect on the number of iterations, except for the proper orthogonal decomposition based truncated models, for which the number of iterations increased by a factor of ten. The other results are scattered, from which we conclude that the effect of projection on the computational load of this dynamic optimization cannot be predicted.
The optimal solutions computed by optimization with the reduced models were already assessed by visual inspection. An alternative assessment is to check whether the approximate solution satisfies the optimality conditions of the optimization with the original model. This can be checked by an optimization with the original model and the approximate solution as initial guess. Two performance indicators are presented in Figure 5.7. In the top of this figure the maximum constraint violation at the first iteration is shown on a logarithmic scale for all approximate solutions, against the model order that was used to derive the approximate solution.
The second indicator is presented in the bottom of Figure 5.7 and shows the number of iterations for convergence. A number of solutions need only one iteration, which implies that the approximate solution satisfies the optimality conditions. All solutions but one were better than the original initial guess p0, since the number of required iterations is less than ten. Solutions generated by the proper orthogonal decomposition based reduced models appear to perform worse in terms of maximum constraint violation and number of iterations required for convergence. The rest of the results are again scattered, from which we conclude that the effect of projection on the quality of the optimal solution cannot be predicted.
Note that the missing data is explained by non-convergence of the optimization based on the projected models; only the converged solutions could be assessed.
Evaluation of model adequacy by optimality check: O(M̂, p1) → p2
The last check presented here is a model adequacy test. In this test all reduced models are used within an optimization with the original solution p1 as initial guess. Again we assess the models by two performance indicators, presented in Figure 5.8. The first indicator, shown in the top of the figure, is the maximum constraint violation at the first iteration. The second indicator, presented in the bottom of the same figure, is the number of iterations required to reach a converged solution.
Although the results are again quite scattered, some conclusions can be drawn. First, we observe that the proper orthogonal decomposition based residualized models outperform almost all other reduced models. This can be explained by the fact that the proper orthogonal decomposition based models are, so to speak, tailored to this trajectory. This is in line with the assessment based on the simulation performance of the key process variables.
Remarkable are the sudden jumps of, for example, the Gramian-based residualized models. Many of these reduced models require only one iteration to converge, whereas, seemingly at random, some orders require between 2 and 50 iterations.
Connection between different figures
Let us trace a specific reduced-order model through all figures and pick the proper orthogonal truncated model of order twenty. In the bottom panels of Figures 5.1 and 5.2 we see that it is the only reduced-order model of order twenty that successfully performed the simulation with the optimal input trajectory represented by p1. This optimal solution was the result of the optimization with initial guess p0 and the full-order model, or equivalently O(M, p0) → p1. For the other three reduced-order models of order twenty, the simulation of this optimal trajectory failed. In Figure 5.3 we can find the corresponding time required for simulation of this optimal input trajectory; because the others failed, no simulation times are present in that figure for the other three reduced models of order twenty.
The simulation needs to be successful in each iteration for the optimization to proceed. The simulation results in Figures 5.1 and 5.2 correspond to the first simulation executed within the optimization O(M̂, p1) → p2. Therefore, in Figure 5.8 only the truncated proper orthogonal reduced-order model is present for model order twenty. The other reduced models of order twenty were not successful in simulating the optimal trajectory p1 and therefore could not provide a maximum constraint violation, and the optimization could not continue to find a converged solution p2.
When we start the optimization with initial guess p0 and use the reduced-order models, O(M̂, p0) → p̂1, not all optimizations result in a converged solution. We can see in Figure 5.4 that for the reduced-order models of order twenty only the optimization with the truncated proper orthogonal reduced-order model converged. This can also be observed in Figures 5.5 and 5.6, where the resulting optimal trajectories p̂1 are presented. Only for this solution can we perform the solution adequacy test, O(M, p̂1) → p̂2, as presented in Figure 5.7.
All four types of optimization as presented in Table 5.1 are used to assess the reduced-order models. Next we will discuss and interpret the results.
5.4 Discussion
In this chapter an extensive assessment has been presented of almost two hundred candidate reduced models obtained by different projections, involving more than five hundred optimizations.
After studying the results of all optimizations and simulations executed in this chapter, we come to the following observations:
1. It has been shown that quite a number of the reduced models are adequate in the sense that the optimal solution of the original model is also (close to) an optimal solution for the reduced model. This is illustrated in Figure 5.8. Many optimizations based on a reduced model required only one iteration when starting with the optimal solution of the original model as initial guess. This implies that the gradient information of the reduced-order model (approximately) coincides with the gradient information of the full-order model.
2. We showed that it is possible to use projected models for dynamic optimization. This can be explained by the fact that the simulation algorithm uses gradient information, which is the same gradient information used within the optimization algorithm. Therefore a small simulation error implies a high-quality gradient approximation, which explains the good optimization results. See Appendix D for details on the effect of projection on gradient information.
A first robustness test was to execute the optimization with a different initial trajectory than was used for deriving the projections. In this way we can investigate the sensitivity to the trajectory used for model reduction by projection. No extensive testing was done on the performance of the reduced-order models for different optimization objectives; one can think of other quality specifications on the end product. It is nontrivial to define a set of trajectories that provides a suitable projection for a class of different optimization objectives.
3. None of the simulations with projected models in Figure 5.3 had a lower computational load than the full-order model. This result depends on the combination of model properties and the specific numerical integration routine. Projection transforms a sparsely structured model into a dense model. When the model is sparse and the solver exploits this sparsity, simulation of the reduced-order model is less efficient. Starting from the full-order dense model, we see that for proper orthogonal decomposition the efficiency of the simulation increases a little when the model order is reduced. For the Gramian-based reduced-order models we see that the reduced-order models in the mid range even become less efficient; for the lower-order models the efficiency is slightly better than for the full-order dense model.
4. Some optimizations based on reduced-order models required fewer iterations than with the full-order model. Despite the longer simulation time per iteration, this resulted in a small reduction of the overall optimization time, as illustrated in Figure 5.4. The reduction in overall optimization time is, however, very small and cannot counterbalance the drawbacks of the model approximation. With the full-order model we know that the simulation will be successful for a large set of input trajectories, even without testing them beforehand; this is purely based on the underlying model assumptions. After model order reduction we cannot say much about model quality for input trajectories other than the one used to derive the projection. Even worse, we cannot say anything beforehand about model quality for the trajectory that was used to derive the projection.
5. The issue related to the scaling of state variables, which affects reduction by proper orthogonal decomposition, appeared not to be a problem for this model; apparently the model was properly scaled. In general it is worthwhile to scale a model properly, simply because this is beneficial for all numerical operations applied to the model, such as simulation and linearization. From a theoretical point of view the Gramian-based reduction is more elegant, since it does not depend on the coordinate system one starts with; from a practical point of view proper orthogonal decomposition is easier to apply.
6. All performance indicators used in this chapter behave discontinuously. The result for a specific reduced order cannot be estimated by interpolation between its neighboring reduced-order models. Model reduction by projection is based on energy in signals and does not consider stability or efficiency of simulation. The model reduction thus has a different objective than what we want to use it for, because no better alternative is yet available.
7. Comparing proper orthogonal decomposition with Gramian-based model order reduction, we can make several observations:
(a) Models reduced by proper orthogonal decomposition have more favorable simulation properties in terms of computational load, as illustrated in Figure 5.3. This holds for both the truncated and the residualized models.
(b) In this application the approximation error is in general higher for truncated models than for residualized models, as illustrated in Figures 5.1 and 5.2.
Through the experimental setup for testing two well-known projection methods, we now better understand their value for the sequential implementation of dynamic optimization based on a dae model. Projection of the dynamics of a dae with a sparse structure is not a suitable model reduction technique for reducing the computational load of dynamic optimization. An open issue is the choice of input trajectories that represent the relevant operating envelope; over this operating envelope the reduced-order model should provide a high-quality approximation. For nonlinear models it is not possible to check this analytically. Only by many different simulations can one build confidence that the quality of a model is good enough.
For the simultaneous implementation of dynamic optimization we can expect results similar to those found for the sequential approach implemented in this thesis. For model structures other than dae the results can be quite different. One can think of an ode model combined with a fixed step-size solver: in case the projected model no longer has fast dynamics, the fixed step size can be increased, which increases the simulation efficiency.
Chapter 6
Conclusions and future research
In this thesis we posed three main research questions. The first research question is how to derive projections for nonlinear model order reduction suitable for dynamic optimization. The second is how to derive an approximate model by physics-based model reduction suitable for dynamic optimization, and the third is how to assess different reduced models for their use within dynamic optimization. Conclusions on these research questions are presented in this chapter.
6.1 Conclusions
Model order reduction of nonlinear models by projection
• Projection is an effective method to reduce the number of differential equations of process models. Proper orthogonal decomposition and Gramian-based projection were selected as the most promising transformation techniques, from which reduced models can be derived by truncation as well as by residualization. Proper orthogonal decomposition is less involved than Gramian-based reduction but has a weaker theoretical basis. The freedom to scale the differential variables before applying the proper orthogonal decomposition has a decisive impact on the resulting transformation, but only a pragmatic approach for scaling is available.
• Empirical Gramians have been unravelled and reduced to averaging of linear controllability and observability Gramians of local dynamics. Computation of these averaged Gramians is less involved than computation of empirical Gramians. Compared to proper orthogonal decomposition, the theoretical basis for Gramian-based reduction is much stronger. Gramian-based reduction truly accounts for the relevant input-to-output behavior, whereas proper orthogonal decomposition only accounts for input-to-state behavior. Internal scaling of the differential equations (states) does not affect the Gramian-based reduced models, since the reduction is based on input-to-output behavior; the effect of scaling the different input and output variables, on the other hand, is clear. For Gramian-based reduction the transformation matrix can become nearly singular, whereas for proper orthogonal decomposition the projection matrix is always orthonormal.
• The relation between proper orthogonal decomposition and balanced reduction has been presented: the snapshot matrix multiplied by its transpose approximates the discrete-time controllability Gramian when white-noise signals are used to generate the snapshot data.
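A small numerical check of this relation (a toy discrete-time system with assumed matrices; the symbols G and F follow the list of symbols, and the snapshot outer product is scaled by the number of snapshots):

```python
# Toy check: for unit white-noise input, the scaled outer product of the state
# snapshots approaches the discrete-time controllability Gramian.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

G = np.array([[0.9, 0.1], [0.0, 0.7]])      # discrete-time system matrix
F = np.array([[1.0], [0.5]])                # discrete-time input-to-state matrix
P = solve_discrete_lyapunov(G, F @ F.T)     # G P G' - P + F F' = 0

rng = np.random.default_rng(1)
N, x = 100_000, np.zeros((2, 1))
X = np.zeros((2, N))                        # snapshot matrix
for k in range(N):
    x = G @ x + F @ rng.standard_normal((1, 1))
    X[:, k] = x[:, 0]

print(np.round(P, 2))
print(np.round(X @ X.T / N, 2))             # nearly equal to P for large N
```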
• Projection of nonlinear models does not come with a predefined error bound like there is for linear models, and it is not possible to interpolate results: the results of two neighboring reduced-order models do not provide an estimate of the model properties of the intermediate reduced-order model. This is a serious problem that is rarely reported in the literature.
Physics-based model reduction
• As opposed to mathematical projection, which is generally applicable to different process models, we studied the possibility of physics-based model reduction. This is a process-specific approach, which in the case of this thesis involved a distillation process. This process is well studied, and a relative volatility model is used that simplifies the vapor-liquid equilibrium described by a cubic equation of state. The relative volatility constant is made dependent on the component mole fraction to better match the original equilibrium model based on a cubic equation of state.
The result of this approach is a tremendous reduction of the number of differential and algebraic equations and of the number of nonzero elements in the Jacobian, without significantly affecting the input-output behavior. This reduction approach (reusing simplified models available in the literature) was systematic, and therefore it should be possible to carry it over to processes other than distillation. Its success depends on the operating envelope and the degree of exotic behavior to be captured.
• Model reduction of the physical property model is more generally applicable wherever many equations are involved to compute these properties very precisely over a very wide operating range. It is questionable whether this accuracy is required for online applications, and it is very likely that more efficient, simplified physical property models will reduce the computational load without losing too much accuracy.
Assessment of reduced models for dynamic optimization
Assessment of models, and thus of model reduction techniques, requires the formulation of an objective. The objective for a model used within an online dynamic optimization is not straightforward; therefore different performance indicators were introduced. The first indicator was based on how models tend to be assessed in general, which is by visual inspection or by an error norm of some key process variables for a relevant input trajectory. The second indicator is visual inspection of the optimal solution generated by an optimization based on the approximate model, compared to the optimal solution based on the original model with the same initial guess for the optimization. This optimization generates two further figures, namely the number of iterations and the CPU time required to solve the optimization, of which the latter is important for the assessment of models for online applications.
The third and fourth performance indicators are optimality tests of the approximate model and the approximate solution, respectively. For the former we start the optimization with the approximate model and the original solution; the number of iterations required for convergence indicates how well the original optimum coincides with an optimum of the approximate model. For the latter we start the optimization with the original model and the approximate solution; again the number of iterations required for convergence reflects how well the approximate solution coincides with the original solution. Furthermore, we can simply compute the maximum constraint violation of the approximate solution, which also assesses its quality. These performance indicators enable model assessment for dynamic optimization.
• The performance indicators as defined in this thesis assess the combination of model reduction and a specific optimization technique. This notion cannot be stressed enough and is underexposed in the literature.
• With the performance indicators as defined in this thesis, the physics-based reduced model has been assessed and appeared to be very successful. The optimization is over a factor of seventeen faster without losing too much accuracy in the simulation and optimization results. If the approximate solution is used as initial guess for the optimization with the original model, this still reduces the overall optimization time by a factor of more than three.
• Model order reduction by projection was assessed by applying it to the physics-based reduced model. This is partially successful according to the performance indicators. Even with a strongly reduced number of transformed differential equations it is possible to produce acceptable approximate solutions, and for many reduced models the original optimal solution was close to the optimum of the optimization based on the reduced model. However, model reduction by projection does not reduce the computational time of the optimization as implemented in this thesis, i.e. a sequential dynamic optimization approach with a linear time-varying gradient approximation.
• Model order reduction by projection of nonlinear models described by a dae is not suited for simulation. Projection does not provide predictable results in terms of simulation error and stability; it is too unreliable for online applications and does not reduce the computational load of simulation. Consequently, it will not reduce the computational load of dynamic optimization that utilizes a simulation, at least for the sequential implementation of the dynamic optimization problem as presented in this thesis.
6.2 Future research
During the course of this work some interesting directions were identified but not explored. These could give direction to future research in the area of model reduction for dynamic real-time optimization.
• The performance indicators defined in this thesis should be extended with a closed-loop performance indicator. In such a setup the receding-horizon sampling rate is determined by the computational load. The tradeoff between model accuracy and overall computational speed can then be taken into account, resulting in a real-time closed-loop performance indicator.
• In this thesis we restricted ourselves to the sequential dynamic optimization approach. Since the concept of the simultaneous dynamic optimization is different, it would be interesting to study the effect of projection
in that framework.
• The sequential approach may be improved by reuse of solutions. In the current implementation the numerical solver has to redo all step-size and predictor-corrector type calculations every iteration. The solution of the previous simulation may contain useful information that could speed up the simulation, especially close to convergence where input perturbations become small. In a receding-horizon implementation one could even think of transferring this information from one complete simulation to the next.
• One of the observations is that the correlation between the computational load of simulations and the number of nonzero elements in the Jacobian is much stronger than with, for example, the number of differential equations. It would be interesting to eliminate algebraic variables by automatic substitution or symbolic manipulation. The crucial step is an algorithm that identifies the implicit algebraic variables. This could lead to a notion of a nonlinear minimal realization.
• A fundamental problem when working with nonlinear models is that no guaranteed error bounds can be derived; such bounds would enable a more rigorous way of nonlinear model assessment. The best one can do is to pick a set of input trajectories representing the process envelope and test for these conditions. This will always be a selection and therefore not a guarantee for all input trajectories. The problem gets even more interesting when disturbance scenarios are considered.
Bibliography
[1] H. Aling, S. Banerjee, A.K. Bangia, V. Cole, J. Ebert, A. Emami-Naeini,
K.F. Jensen, I.G. Kevrekidis, and S. Shvartsman. Nonlinear model reduction for simulation and control of rapid thermal processing. In Proceedings
of the American Control Conference, pages 2233–2238, 1997.
[2] H. Aling, J.L. Ebert, A. Emami-Naeini, and R.L. Kosut. Application of a nonlinear model reduction method to rapid thermal processing (rtp) reactors. Proceedings of the 13th IFAC World Congress, B:205–210, 30 June – 5 July 1996.
[3] H. Aling, R.L. Kosut, A. Emami-Naeini, and J.L. Ebert. Nonlinear model
reduction with application to rapid thermal processing. In Conference on
Decision and Control, pages 4305–4309, 1996.
[4] I.P. Androulakis. Kinetic mechanism reduction based on an integer programming approach. AIChE Journal, 46(2):361–371, 2000.
[5] A.C. Antoulas and D.C. Sorensen. Approximation of large-scale dynamical
systems: An overview. International Journal of Applied Mathematics and
Computer Science, 11(5):1093–1121, 2001.
[6] K.J. Åström and B. Wittenmark. Computer-Controlled Systems. PrenticeHall, Upper Saddle River, 1997.
[7] T. Backx, O.H. Bosgra, and W. Marquardt. Integration of model predictive
control and optimization of processes. Proceedings ADCHEM 2000, 1:249–
260, 2000.
[8] J. Baker and P.D. Christofides. Finite-dimensional approximation and control of nonlinear parabolic PDE systems. International Journal of Control,
73(5):439–456, 2000.
[9] L.S. Balasubramhanya and F.J. Doyle III. Nonlinear model-based control
of a batch reactive distillation column. Journal of Process Control, 10:209–
218, 2000.
[10] E. Bendersky and P.D. Christofides. Optimization of transport-reaction
processes using nonlinear model reduction. Chemical Engineering Science,
55:4349–4366, 2000.
[11] G. Berkooz, P. Holmes, and J. L. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Ann. Rev. Fluid Mech., 25:539–575,
1993.
[12] L.T. Biegler. Solution of dynamic optimization problems by successive
quadratic programming and orthogonal collocation. Computers and Chemical Engineering, 8(3):243–248, 1984.
[13] L.T. Biegler. Advances in simultaneous strategies for dynamic process optimization. Chemical Engineering Science, 57(4):575–593, 2002.
[14] L.T. Biegler, I.E. Grossmann, and A.W. Westerberg. A note on approximation techniques used for process optimization. Computers and Chemical
Engineering, 6(2):201–206, 1985.
[15] I.D.L. Bogle and J.D. Perkins. A new sparsity preserving quasi-newton update for solving nonlinear equations. SIAM Journal on Scientific Statistical
Computing, 11(4):621–630, 1990.
[16] K.E. Brenan, S.L. Campbell, and L.R. Petzold. Numerical Solution of
Initial-Value Problems in Differential-Algebraic Equations. SIAM, Philadelphia, 1996.
[17] H. Briesen and W. Marquardt. Adaptive model reduction and simulation
of thermal cracking of multicomponent hydrocarbon mixtures. Computers
and Chemical Engineering, 24:1287–1292, 2000.
[18] H.J.L. Van Can, H.A.B. Te Braake, S. Dubbelman, C. Hellinga, K.Ch.A.M
Luyben, and J.J. Heijnen. Understanding and applying the extrapolation
properties of serial gray-box models. AIChE Journal, 44(5):1071–1089,
1998.
[19] E.H. Chimowitz and C.S. Lee. Local thermodynamic models for high pressure process calculations. Computers and Chemical Engineering, 9(2):195–
200, 1985.
[20] M. Diehl, H.G. Bock, J.P. Schlöder, R. Findeisen, Z. Nagy, and F. Allgöwer.
Real-time optimization and nonlinear model predictive control of processes
governed by differential-algebraic equations. Journal of Process Control,
12:577–585, 2002.
[21] J.R. Dormand. Numerical methods for differential equations. CRC Press,
Boca Raton, USA, 1996.
[22] P. Duchêne and P. Rouchon. Kinetic scheme reduction via geometric singular perturbation techniques. Chemical Engineering Science, 51(20):4661–
4672, 1996.
[23] T.F. Edgar and D.M. Himmelblau. Optimization of chemical processes.
McGraw-Hill Book Co., New York, 1989.
[24] K. Edwards, T.F. Edgar, and V.I. Manousiouthakis. Reaction mechanism
simplification using mixed-integer nonlinear programming. Computers and
Chemical Engineering, 24:67–79, 2000.
[25] W. Favoreel, B. De Moor, and P. Van Overschee. Subspace state space
system identification for industrial processes. Journal of Process Control,
10:149–155, 2000.
[26] R. Findeisen, M. Diehl, I. Disli-Uslu, S. Schwarzkopf, F. Allgöwer, H.G. Bock, J.P. Schlöder, and E.D. Gilles. Computation and performance assessment of nonlinear model predictive control. IEEE Conference on Decision and Control, December 2002.
[27] C.A. Floudas. Nonlinear and Mixed-Integer Optimization. Fundamentals
and Applications. Oxford University Press, Inc., New York, 1995.
[28] J.F. Forbes, T.E. Marlin, and J.F. MacGregor. Model adequacy requirements for optimizing plant operations. Computers and Chemical Engineering, 18(6):497–510, 1994.
[29] N. Ganesh and L.T. Biegler. A robust technique for process flowsheet
optimization using simplified model approximations. Computers Chemical
Engineering, 11(6):553–565, 1987.
[30] R. Gani, J. Perregaard, and H. Johansen. Simulation strategies for design and analysis of complex chemical processes. Trans. IChemE, 68(Part
A):407–417, 1990.
[31] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic
Press, London, 1981.
[32] gPROMS Technical Document. The gPROMS Model Library. Process
Systems Enterprise Ltd., London, UK, 1997.
[33] S. Gugercin and A.C. Antoulas. A comparative study of 7 algorithms for
model reduction. In Proceedings IEEE Conference on Decision and Control,
December 2000, pages 2367–2372, 2000.
[34] J. Hahn and T. E. Edgar. Reduction of nonlinear models using balancing
of empirical Gramians and Galerkin projections. In Proceedings of the
American Control Conference, pages 2864–2868, 2000.
[35] J. Hahn and T.F. Edgar. An improved method for nonlinear model reduction using balancing of empirical Gramians. Computers and Chemical Engineering, 26:1379–1397, 2002.
[36] S.P. Han. A globally convergent method for nonlinear programming. Journal of Optimization Theory and Applications, 22:197, 1977.
[37] P.J. Holmes, J.L. Lumley, G. Berkooz, J.C. Mattingly, and R.W. Wittenberg. Low-dimensional models of coherent structures in turbulence. Physics
Reports, 287:337–384, 1997.
[38] A. Isidori. Nonlinear control systems: An introduction. Springer Verlag,
Berlin, 1989.
[39] A. Kienle. Low-order dynamic models for ideal multicomponent distillation
processes using nonlinear wave propagation. Chemical Engineering Science,
55:1817–1828, 2000.
[40] P.V. Kokotovic, H.K. Khalil, and J. O’Reilly. Singular Perturbation Methods in Control: Analysis and Design. Academic Press, London, 1986.
[41] R.L. Kosut. Uncertainty model unfalsification: A system identification
paradigm compatible with robust control design. In Conference on Decision
and Control, pages 3492–3497, 1995.
[42] D. Kraft. On converting optimal control problems into nonlinear programming problems. Comput. and Math. Prog., 15:261–280, 1985.
[43] A. Kumar and P. Daoutidis. Nonlinear model reduction and control of highpurity distillation columns. Americal Control Conference, pages 2057–2061,
June 1999.
[44] M.J. Kurtz and M.A. Henson. Input-output linearizing control of constrained nonlinear processes. Journal of Process Control, 7(1):3–17, 1997.
[45] S. Lall, J.E. Marsden, and S. Glavaski. A subspace approach to balanced
truncation for model reduction of nonlinear control systems. Int. J. Robust
Nonlinear Control, 12:519–535, 2002.
[46] S. Lall, J.E. Marsden, and S. Glavaski. Empirical model reduction of controlled nonlinear systems. Proceedings of the IFAC World Congress, F:473–
478, July 1999.
[47] T. Ledent and G. Heyen. Dynamic approximation of thermodynamic
properties by means of local models. Computers and Chemical Engineering,
18(Suppl.):S87–S91, 1994.
[48] K.S. Lee, Y. Eom, J.W. Chung, J. Choi, and D. Yang. A control-relevant
model reduction technique for nonlinear systems. Computers and Chemical
Engineering, 24:309–315, 2000.
[49] F.L. Lewis. Optimal Estimation. John Wiley and Sons, Inc., New York,
1986.
[50] G. Li and H. Rabitz. Combined symbolic and numerical approach to constrained nonlinear lumping - with application to an H2 /O2 oxidation model.
Chemical Engineering Science, 51(21):4801–4816, 1996.
[51] G.Y. Li, A.S. Tomlin, and H. Rabitz. Determination of approximate lumping schemes by singular perturbation techniques. Journal of Chemical
Physics, 99(5):3562–3574, 1993.
[52] W.M. Ling and D.E. Rivera. Control relevant model reduction of Volterra
series models. Journal of Process Control, 8(2):78–88, 1998.
[53] W.M. Ling and D.E. Rivera. A methodology for control-relevant nonlinear
system identification using restricted complexity models. Journal of Process
Control, 11:209–222, 2001.
[54] H.P. Löffler and W. Marquardt. Order reduction of non-linear differentialalgebraic process models. Journal of Process Control, 1(1):32–40, 1991.
[55] W.L. Luyben, B.D. Tyréus, and M.L. Luyben. Plant wide process control.
McGraw-Hill Book Co., New York, 1999.
[56] W. Marquardt. Traveling waves in chemical processes. Int. Chem. Engng.,
30:585–606, 1990.
[57] W. Marquardt. Nonlinear model reduction for optimization based control of
transient chemical processes. In Proceedings of Chemical Process Control-6,
pages 30–60, 2001.
[58] B.C. Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Transactions on Automatic
Control, 26(1):17–32, 1981.
[59] S.G. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw-Hill
Book Co., New York, 1996.
[60] C.P. Neuman and A. Sen. Suboptimal control algorithm for constrained
problems using cubic splines. Automatica, 9:601–613, 1973.
[61] A. Newman and P.S. Krishnaprasad. Nonlinear model reduction for
RTCVD. IEEE Proceedings 32nd Conference Information Sciences and
Systems, 1998.
[62] H. Nijmeijer and A.J. van der Schaft. Nonlinear Dynamical Control Systems. Springer Verlag, New York, 1990.
[63] S.J. Norquay, A. Palazoglu, and J.A. Romagnoli. Application of Wiener
model predictive control (WMPC) to an industrial C2-splitter. Journal of
Process Control, 9:461–473, 1999.
[64] G. Obinata and B.D.O. Anderson. Model Reduction for Control System
Design. Springer, London, 2001.
[65] U. Pallaske. Ein verfahren zur ordnungsreduktion mathematischer prozessmodelle. Chem.-Ing.-Tech., 59(7):604–605, 1987.
[66] R.K. Pearson. Selecting nonlinear model structures for computer control.
Journal of Process Control, 13:1–26, 2003.
[67] R.K. Pearson and M. Pottmann. Gray-box identification of block-oriented
nonlinear models. Journal of Process Control, 10:301–315, 2000.
[68] J. Perregaard. Model simplification and reduction for simulation and optimization of chemical processes. Computers and Chemical Engineering,
17(5/6):465–483, 1993.
[69] L. Petzold and W. Zhu. Model reduction for chemical kinetics: An optimization approach. AIChE Journal, 45(4):869–886, 1999.
[70] J.B. Rawlings. Tutorial overview of model predictive control. IEEE Control
Systems Magazine, 20(3):38–52, 2000.
[71] R.C. Reid, J.M. Prausnitz, and B.E. Poling. The Properties of Gases and
Liquids. McGraw-Hill, New York, 1987.
[72] G.A. Robertson and I.T. Cameron. Analysis of dynamic models for structural insight and model reduction. Part 2: A multi-stage compressor shutdown case-study. Computers and Chemical Engineering, 21(5):475–488,
1996.
[73] G.A. Robertson and I.T. Cameron. Analysis of dynamic process models
for structural insight and model reduction. Part 1: Structural identification
measures. Computers and Chemical Engineering, 21(5):455–473, 1996.
[74] A.A. Safavi, A. Nooraii, and J.A. Romagnoli. A hybrid model formulation
for a distillation column and the on-line optimization study. Journal of
Process Control, 9:125–134, 1999.
[75] M.G. Safonov and R.Y. Chiang. A Schur method for balanced-truncation
model reduction. IEEE Transactions on Automatic Control, 34(7):729–733,
1989.
[76] J. Scherpen. Balancing for nonlinear systems. Systems and Control Letters,
21:143–153, 1993.
[77] M. Schlegel, J. van den Berg, W. Marquardt, and O.H. Bosgra. Projection based model reduction for dynamic optimization. In AIChE Annual Meeting, Indianapolis, 2002.
[78] M. Schlegel, K. Stockmann, T. Binder, and W. Marquardt. Dynamic optimization using adaptive control vector parameterization. Computers and
Chemical Engineering, 29(8):1731–1751, 2005.
[79] G. Sentoni, O. Agamennoni, A. Desages, and J. Romagnoli. Approximate
models for nonlinear control. AIChE Journal, 42(8):2240–2250, 1996.
[80] L.F. Shampine. Numerical solution of ordinary differential equations.
Chapman and Hall, New York, 1994.
[81] S.Y. Shvartsman and I.G. Kevrekidis. Nonlinear model reduction for control of distributed systems: A computer-assisted study. AIChE Journal,
44(7):1579–1595, 1998.
[82] L. Sirovich. Analysis of turbulent flows by means of the empirical eigenfunctions. Fluid Dynamics Research, 8:85–100, 1991.
[83] S. Skogestad. Dynamics and control of distillation columns - a tutorial
introduction. In Institution of Chemical Engineers Symposium Series, pages
23–57, 1997.
[84] W.E. Stewart, K.L. Levien, and M. Morari. Simulation of fractionation
by orthogonal collocation. Chemical Engineering Science, 40(3):409–421,
1985.
[85] S. Støren and T. Hertzberg. Local thermodynamic models used in sensitivity estimation of dynamic systems. Computers and Chemical Engineering,
21(Suppl.):S709–S714, 1997.
[86] S. Støren and T. Hertzberg. Obtaining sensitivity information in dynamic
optimization problems solved by the sequential approach. Computers and
Chemical Engineering, 23:807–819, 1999.
[87] F.Z. Tatrai, P.A. Lant, P.L. Lee, I.T. Cameron, and R.B. Newell. Control relevant model reduction: A reduced order model for 'Model IV' fluid catalytic cracking units. Journal of Process Control, 4(1):3–14, 1994.
[88] F.Z. Tatrai, P.A. Lant, P.L. Lee, I.T. Cameron, and R.B. Newell. Model reduction for regulatory control: An FCCU case study. Chemical Engineering Research and Design / Transactions of the Institution of Chemical Engineers (Trans IChemE), 72(5):402–407, 1994.
[89] R.L. Tousain. Dynamic optimization in business-wide process control. PhD
thesis, Delft University of Technology, Systems and Control Group, 2002.
[90] V. Vassiliadis. Computational Solution of Dynamic Optimization Problems
with General Differential-Algebraic Constraints. PhD thesis, Imperial College of Science, London, 1993.
[91] P.M.R. Wortelboer. Frequency-weighted balanced reduction of closed-loop
mechanical servo-systems: theory and tools. PhD thesis, Delft University
of Technology, Systems and Control Group, 1994.
[92] K.L. Wu, C.C. Yu, W.L. Luyben, and S. Skogestad. Reactor/separator processes with recycle, 2: Design for composition control. Computers and Chemical Engineering, 27:401–421, 2002.
[93] L. Zhang and J. Lam. On H2 model reduction of bilinear systems. Automatica, 38:205–216, 2002.
[94] K. Zhou, J.C. Doyle, and K. Glover. Robust and Optimal Control. Prentice-Hall, Upper Saddle River, 1995.
List of symbols

Symbol      Description
A           system matrix ∈ R^{nx×nx}
B           system matrix, input to state ∈ R^{nx×nu}
C           system matrix, state to output ∈ R^{ny×nx}
D           system matrix, input to output ∈ R^{ny×nu}
F           discrete time system matrix ∈ R^{nx×nx}
F(.)        differential algebraic equations
f(.)        differential equations
G           discrete time system matrix, input to state ∈ R^{nx×nu}
g(.)        algebraic equations
H           discrete time Hankel matrix ∈ R^{∞×∞}
h(.)        inequality constraints
L           linear performance weight ∈ R^{nz×nz}
M           covariance matrix
M           plant model
M̂           approximate model of M
nu          number of input variables ∈ N
nx          number of state variables ∈ N
ny          number of output variables ∈ N
nz          number of performance variables ∈ N
np          number of parametrization coefficients ∈ N
nr          number of reduced state variables ∈ N
O(M)        optimization operator based on model M
O(M, p)     optimization operator based on model M and initial condition p
P           controllability Gramian ∈ R^{nx×nx}
p           parametrization coefficients of the optimization problem ∈ R^{np}
Q           observability Gramian ∈ R^{nx×nx} or quadratic weight ∈ R^{nz×nz}
T           transformation matrix ∈ R^{nx×nx}
t           time variable ∈ R
tf          final time ∈ R
ts          sample time variable ∈ R
U           orthogonal transformation matrix ∈ R^{nx×nx}
u           input variable ∈ R^{nu}
V           optimization objective
Wo          discrete time observability Gramian ∈ R^{nx×nx}
Wc          discrete time controllability Gramian ∈ R^{nx×nx}
X           state data matrix ∈ R^{nx×N}
x           state variable ∈ R^{nx}
Y           output data matrix ∈ R^{ny·nq×N}
y           output variable ∈ R^{ny}
z           performance variable ∈ R^{nz}
Γo          observability matrix ∈ R^{∞×nx}
Γc          controllability matrix ∈ R^{nx×∞}

Sub/superscript     Description
c                   controllability
i                   time instant i
k                   time instant k
N                   number of samples
o                   observability

Abbreviation    Description
APC             Advanced Process Control
DAE             Differential Algebraic Equation
DCS             Distributed Control System
DRTO            Dynamic Real-Time Optimization
INCOOP          INtegration of COntrol and OPtimization
LTI             Linear Time Invariant
LTV             Linear Time Varying
MPC             Model Predictive Control
ODE             Ordinary Differential Equation
PDE             Partial Differential Equation
POD             Proper Orthogonal Decomposition
RTO             Real-Time Optimization
Appendix A
Gramians
A.1 Balancing transformations
Suppose we define the following continuous time linear time-invariant system

\[
\begin{bmatrix} \dot{x} \\ y \end{bmatrix}
= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\tag{A.1}
\]

and the Lyapunov equations from which the controllability and observability Gramians can be solved,

\[
A P + P A^T + B B^T = 0, \qquad
A^T Q + Q A + C^T C = 0.
\tag{A.2}
\]

Let us define the eigenvalue decompositions of the controllability and observability Gramians P and Q, respectively, where U_c and U_o are chosen orthogonal,

\[
P = U_c \Sigma_c^2 U_c^T,
\tag{A.3}
\]
\[
Q = U_o \Sigma_o^2 U_o^T.
\tag{A.4}
\]

The product of the controllability and observability Gramians can now be written as

\[
P Q = U_c \Sigma_c^2 U_c^T U_o \Sigma_o^2 U_o^T.
\tag{A.5}
\]

Consider the similarity transformation T,

\[
T = \Sigma_c^{-1} U_c^T,
\tag{A.6}
\]
and the transformed linear system

\[
\begin{bmatrix} \dot{z} \\ y \end{bmatrix}
= \begin{bmatrix} \tilde{A} & \tilde{B} \\ \tilde{C} & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix}
= \begin{bmatrix} T A T^{-1} & T B \\ C T^{-1} & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix}.
\tag{A.7}
\]
The transformed Lyapunov equations are

\[
(T A T^{-1})(T P T^T) + (T P T^T)(T^{-T} A^T T^T) + (T B)(B^T T^T) = 0,
\]
\[
(T^{-T} A^T T^T)(T^{-T} Q T^{-1}) + (T^{-T} Q T^{-1})(T A T^{-1}) + (T^{-T} C^T)(C T^{-1}) = 0,
\tag{A.8}
\]

or

\[
\tilde{A} \tilde{P} + \tilde{P} \tilde{A}^T + \tilde{B} \tilde{B}^T = 0, \qquad
\tilde{A}^T \tilde{Q} + \tilde{Q} \tilde{A} + \tilde{C}^T \tilde{C} = 0.
\tag{A.9}
\]
With this specific transformation the controllability Gramian becomes the identity matrix,

\[
\tilde{P} = T P T^T
= \Sigma_c^{-1} U_c^T U_c \Sigma_c^2 U_c^T U_c \Sigma_c^{-1} = I.
\tag{A.10}
\]
The product of the controllability and observability Gramians in the transformed domain can now be written as

\[
\tilde{P} \tilde{Q}
= T P T^T \, T^{-T} Q T^{-1}
= T P Q T^{-1}
= (\Sigma_c^{-1} U_c^T)(U_c \Sigma_c^2 U_c^T)(U_o \Sigma_o^2 U_o^T)(U_c \Sigma_c)
= (\Sigma_c U_c^T U_o \Sigma_o)(\Sigma_o U_o^T U_c \Sigma_c)
= H^T H,
\qquad H = \Sigma_o U_o^T U_c \Sigma_c.
\tag{A.11}
\]
The singular value decomposition of H gives

\[
H = U_H \Sigma_H V_H^T,
\tag{A.12}
\]

which enables us to write the observability Gramian in the transformed domain as

\[
\tilde{P} \tilde{Q} = I \tilde{Q} = \tilde{Q}
= H^T H
= V_H \Sigma_H U_H^T U_H \Sigma_H V_H^T
= V_H \Sigma_H^2 V_H^T.
\tag{A.13}
\]
We can balance the transformed linear system by applying a second transformation R such that

\[
R \tilde{P} R^T = R^{-T} \tilde{Q} R^{-1} = \Sigma_H.
\tag{A.14}
\]

The transformation that achieves this is R = \Sigma_H^{1/2} V_H^T,

\[
R = \Sigma_H^{1/2} V_H^T \;\rightarrow\; R^T = V_H \Sigma_H^{1/2},
\qquad
R^{-1} = V_H \Sigma_H^{-1/2} \;\rightarrow\; R^{-T} = \Sigma_H^{-1/2} V_H^T.
\tag{A.15}
\]

The composed transformation that brings the original system into a balanced realization is

\[
R T = \Sigma_H^{1/2} V_H^T \Sigma_c^{-1} U_c^T.
\tag{A.16}
\]
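As an illustration, the derivation above can be followed step by step numerically. The sketch below is added for illustration only and is not part of the original derivation; it assumes a stable, minimal LTI realization (A, B, C) with arbitrarily chosen example matrices, and uses SciPy to solve the Lyapunov equations (A.2).

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def balancing_transformation(A, B, C):
        # Gramians from the Lyapunov equations (A.2):
        # A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0.
        P = solve_continuous_lyapunov(A, -B @ B.T)
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)
        # Orthogonal eigenvalue decompositions (A.3)-(A.4).
        lc, Uc = np.linalg.eigh(P)
        lo, Uo = np.linalg.eigh(Q)
        Sc = np.diag(np.sqrt(lc))            # Sigma_c, so P = Uc Sc^2 Uc^T
        So = np.diag(np.sqrt(lo))            # Sigma_o, so Q = Uo So^2 Uo^T
        T = np.linalg.solve(Sc, Uc.T)        # first transformation (A.6)
        H = So @ Uo.T @ Uc @ Sc              # so that Ptilde Qtilde = H^T H (A.11)
        UH, sH, VHt = np.linalg.svd(H)       # SVD of H (A.12)
        R = np.diag(np.sqrt(sH)) @ VHt       # second transformation (A.15)
        return R @ T, sH                     # composed transformation (A.16)

    # Example usage on an arbitrary stable system: in balanced coordinates
    # both Gramians equal the diagonal matrix of the Hankel singular values.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4)) - 5 * np.eye(4)   # shifted to be stable
    B = rng.standard_normal((4, 1))
    C = rng.standard_normal((1, 4))
    Tbal, hsv = balancing_transformation(A, B, C)
    Pbal = Tbal @ solve_continuous_lyapunov(A, -B @ B.T) @ Tbal.T
    print(np.allclose(Pbal, np.diag(hsv), atol=1e-6))  # True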
A.2 Perturbed empirical Gramians
Suppose we define initial conditions in sets of orthogonal groups T_l with different amplitudes c_m. The matrix of perturbed initial conditions used for the discrete time perturbed data based observability Gramian becomes

\[
X_0 = [\, c_1 T_1, c_1 T_2, \ldots, c_1 T_r \,|\, c_2 T_1, c_2 T_2, \ldots, c_2 T_r \,|\, \cdots \,|\, c_s T_1, c_s T_2, \ldots, c_s T_r \,],
\tag{A.17}
\]

with T_l T_l^T = I, l = 1, \ldots, r. In this special case X_0 X_0^T can be written as

\[
X_0 X_0^T
= \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 T_l T_l^T
= \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 I
= \gamma I,
\tag{A.18}
\]

which yields for the right-inverse X_0^\dagger

\[
X_0^\dagger = X_0^T (X_0 X_0^T)^{-1} = \gamma^{-1} X_0^T.
\tag{A.19}
\]
Substitution of X_0^\dagger = \gamma^{-1} X_0^T and Y_N = \Gamma_o X_0 in the definition of W_o yields

\[
W_o = X_0^{\dagger T} Y_N^T Y_N X_0^\dagger
= \gamma^{-2} X_0 X_0^T \Gamma_o^T \Gamma_o X_0 X_0^T
= \gamma^{-1} X_0 X_0^T \Gamma_o^T \Gamma_o.
\tag{A.20}
\]
This matrix multiplication can be written as a sequence of summations,

\[
\gamma^{-1} X_0 X_0^T \Gamma_o^T \Gamma_o
= \gamma^{-1} \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 T_l T_l^T \sum_{k=0}^{q} (F^k)^T C^T C F^k
\tag{A.21}
\]
\[
= \sum_{l=1}^{r} \sum_{m=1}^{s} \frac{1}{rs} \cdot \frac{1}{c_m^2} \sum_{k=0}^{q} c_m^2 \, T_l T_l^T (F^k)^T C^T C F^k T_l T_l^T
\tag{A.22}
\]
\[
= \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{k=0}^{q} \frac{1}{r s c_m^2} \, T_l \Psi_k^{lm} T_l^T,
\tag{A.23}
\]

where

\[
\Psi_{ij,k}^{lm} = (y_k^{ilm} - y_{ss})^T (y_k^{jlm} - y_{ss}),
\tag{A.24}
\]

since

\[
y_k^{ilm} = c_m C F^k T_l e_i
\tag{A.25}
\]

is the output at time t = k\Delta t of the free response of the system to the perturbed initial condition x_0 = c_m T_l e_i + x_{ss}.
This proves that for the special orthogonal sets of initial conditions, the
empirical observability Gramian defined by Lall is a special case of the discrete
time perturbed data based observability Gramian as defined in this thesis.
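The following numerical sketch, added here purely as an illustration with arbitrarily chosen matrices and dimensions, confirms this special case: with orthogonal perturbation groups, the data-based Gramian built from free responses reproduces \Gamma_o^T \Gamma_o exactly.

    import numpy as np

    rng = np.random.default_rng(1)
    nx, ny, q = 4, 2, 40
    F = 0.5 * rng.standard_normal((nx, nx))           # example system matrix
    C = rng.standard_normal((ny, nx))

    # Orthogonal perturbation groups T_l and amplitudes c_m as in (A.17).
    T1 = np.linalg.qr(rng.standard_normal((nx, nx)))[0]
    T2 = np.linalg.qr(rng.standard_normal((nx, nx)))[0]
    X0 = np.hstack([cm * Tl for cm in (0.5, 1.0) for Tl in (T1, T2)])

    # Free responses stacked over k = 0..q give Y_N = Gamma_o X0.
    Gamma_o = np.vstack([C @ np.linalg.matrix_power(F, k) for k in range(q + 1)])
    YN = Gamma_o @ X0

    # Data-based observability Gramian (A.20), using the right-inverse (A.19);
    # for this X0 the pseudo-inverse equals gamma^{-1} X0^T.
    X0_pinv = np.linalg.pinv(X0)
    Wo = X0_pinv.T @ YN.T @ YN @ X0_pinv
    print(np.allclose(Wo, Gamma_o.T @ Gamma_o))        # True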
Appendix B
Proper orthogonal decomposition
Suppose we define the following discrete time linear time-invariant system

\[
\begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix}
= \begin{bmatrix} F & G \\ C & D \end{bmatrix}
\begin{bmatrix} x_k \\ u_k \end{bmatrix},
\tag{B.1}
\]

and a white noise input sequence u_0, u_1, \ldots, u_N. We can generate a snapshot matrix X_N by simulation of the system with the white noise input sequence,

\[
X_N = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix},
\tag{B.2}
\]
where x_k is the value of the state at t = kh of the response to the input sequence u(t). X_N can be written as

\[
X_N = \Gamma_c^N U_N^N,
\tag{B.3}
\]

with \Gamma_c^N the discrete time controllability matrix and U_N^N the input matrix,

\[
\Gamma_c^N = \begin{bmatrix} G & FG & \cdots & F^N G \end{bmatrix},
\tag{B.4}
\]
\[
U_N^N = \begin{bmatrix}
u_0 & u_1 & \cdots & u_N \\
0 & u_0 & \cdots & u_{N-1} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & u_0
\end{bmatrix}.
\tag{B.5}
\]
If the system is stable, \lim_{q \to \infty} F^q = 0. Therefore we can truncate the controllability matrix and the input matrix and approximate X_N as

\[
X_N \approx \Gamma_c U_N,
\tag{B.6}
\]
with \Gamma_c the truncated controllability matrix as in Equation (2.93) and U_N the truncated input matrix,

\[
\Gamma_c = \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix},
\tag{B.7}
\]
\[
U_N = \begin{bmatrix}
u_0 & u_1 & \cdots & u_q & \cdots & u_{N-1} \\
0 & u_0 & \cdots & u_{q-1} & \cdots & u_{N-2} \\
\vdots & & \ddots & & & \vdots \\
0 & \cdots & 0 & u_0 & \cdots & u_{N-1-q}
\end{bmatrix}.
\tag{B.8}
\]
The expected value of U_N U_N^T for N \gg q is

\[
E\{U_N U_N^T\}
= \begin{bmatrix}
\sum_{k=0}^{N-1} u_k u_k^T & & 0 \\
& \ddots & \\
0 & & \sum_{k=0}^{N-1-q} u_k u_k^T
\end{bmatrix}
= \begin{bmatrix}
N I & & 0 \\
& \ddots & \\
0 & & (N-q) I
\end{bmatrix}
\approx N \begin{bmatrix}
I & & 0 \\
& \ddots & \\
0 & & I
\end{bmatrix},
\tag{B.9}
\]

since E\{u_i u_j^T\} = 0 for all i \neq j and E\{u_i u_i^T\} = I for all i. The expected value of X_N X_N^T is

\[
E\{X_N X_N^T\} = E\{\Gamma_c U_N U_N^T \Gamma_c^T\} = N \Gamma_c \Gamma_c^T = N W_c.
\tag{B.10}
\]
If we compare both singular value decompositions we see that they are identical except for the factor N,

\[
\Gamma_c \Gamma_c^T = W_c = U_c \Sigma_c U_c^T, \qquad
X_N X_N^T = N W_c = N U_c \Sigma_c U_c^T.
\tag{B.11}
\]

This is the relation between the snapshot matrix excited with white noise and the discrete time controllability Gramian.
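This relation is easy to check numerically. The sketch below is an illustration only, with an arbitrary stable example system; it compares the sample covariance of white-noise snapshots with the truncated controllability Gramian, and the agreement improves as O(1/\sqrt{N}).

    import numpy as np

    rng = np.random.default_rng(2)
    nx, nu, N, q = 3, 1, 100000, 60
    F = np.diag([0.5, 0.2, -0.4])                 # stable example system
    G = rng.standard_normal((nx, nu))

    # Truncated controllability matrix (B.7) and Gramian W_c.
    Gamma_c = np.hstack([np.linalg.matrix_power(F, k) @ G for k in range(q + 1)])
    Wc = Gamma_c @ Gamma_c.T

    # Snapshot matrix (B.2) from a white-noise simulation.
    X = np.empty((nx, N))
    x = np.zeros(nx)
    for k in range(N):
        x = F @ x + G @ rng.standard_normal(nu)
        X[:, k] = x

    # (B.10): E{X_N X_N^T} = N W_c, so X X^T / N should approach W_c.
    print(np.linalg.norm(X @ X.T / N - Wc) / np.linalg.norm(Wc))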
Appendix C
Nonlinear Optimization
We consider only smooth, i.e. continuously differentiable, constrained nonlinear programming problems

\[
\begin{aligned}
\min_x \; & f(x), \quad x \in \mathbb{R}^n \\
\text{s.t.} \; & g_i(x) = 0, \quad i \in \{1, \ldots, p\}, \\
& h_j(x) \geq 0, \quad j \in \{1, \ldots, q\}.
\end{aligned}
\tag{C.1}
\]

Here, x is an n-dimensional vector of so-called decision variables, and f(x) is the objective or cost function to be minimized, subject to p equality constraints and q inequality constraints. These functions are assumed to be continuously differentiable on \mathbb{R}^n.
Equality constraints
Suppose we only have equality constraints; the Lagrangian for the problem is then

\[
L(x, \lambda) = f(x) - \lambda^T g(x),
\tag{C.2}
\]

and the first order optimality condition is

\[
\nabla L(x, \lambda) =
\begin{bmatrix}
\nabla_x f(x) - \lambda^T \nabla_x g(x) \\
-g(x)
\end{bmatrix} = 0.
\tag{C.3}
\]
We can write down the sequence that is used in a sequential quadratic program (SQP) to compute the optimal value of this problem, starting at an initial guess x_k and \lambda_k. The update scheme is

\[
\begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \end{bmatrix}
= \begin{bmatrix} x_k \\ \lambda_k \end{bmatrix}
+ \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \end{bmatrix},
\tag{C.4}
\]

where \Delta x_k and \Delta \lambda_k are the solution of

\[
\nabla^2 L(x_k, \lambda_k)
\begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \end{bmatrix}
= -\nabla L(x_k, \lambda_k),
\tag{C.5}
\]

and where

\[
\nabla^2 L(x_k, \lambda_k) =
\begin{bmatrix}
\nabla_{xx}^2 f(x_k) - \lambda^T \nabla_{xx}^2 g(x_k) & -\nabla_x g(x_k) \\
-\nabla_x g(x_k)^T & 0
\end{bmatrix}.
\tag{C.6}
\]
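As an illustration of this update scheme, consider the toy problem min x_1^2 + x_2^2 subject to x_1 + x_2 - 1 = 0. The sketch below is not part of the original text; it applies (C.4)-(C.6) directly. Because the objective is quadratic and the constraint linear, a single Newton step already reaches the optimum (0.5, 0.5).

    import numpy as np

    def f_grad(x):  return 2 * x                      # gradient of x1^2 + x2^2
    def f_hess(x):  return 2 * np.eye(2)
    def g(x):       return np.array([x[0] + x[1] - 1.0])
    def g_jac(x):   return np.array([[1.0, 1.0]])     # Jacobian of g

    x, lam = np.array([2.0, -1.0]), np.zeros(1)
    for _ in range(5):
        J = g_jac(x)
        # KKT matrix (C.6); the constraint is linear, so its Hessian vanishes.
        KKT = np.block([[f_hess(x), -J.T],
                        [-J, np.zeros((1, 1))]])
        # Right-hand side -grad L, cf. (C.3) and (C.5).
        rhs = -np.concatenate([f_grad(x) - J.T @ lam, -g(x)])
        step = np.linalg.solve(KKT, rhs)
        x, lam = x + step[:2], lam + step[2:]         # update (C.4)
    print(x, lam)   # [0.5 0.5] [1.]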
Inequality constraints
Inequality constraints are more difficult to deal with because it is unknown which inequalities are active. The interior point method is a possible approach to dealing with inequality constraints and is discussed here. For simplicity we assume that there are only inequality constraints. By introducing slack variables the optimization problem is transformed into

\[
\begin{aligned}
\min_x \; & f(x), \quad x \in \mathbb{R}^n \\
\text{s.t.} \; & h_j(x) - y_j = 0, \quad j \in \{1, \ldots, q\}, \\
& y \geq 0.
\end{aligned}
\tag{C.7}
\]

A Lagrangian for this problem is

\[
L(x, \lambda, y) = f(x) - \lambda^T (h(x) - y) - \mu \sum_{j=1}^{q} \log y_j,
\tag{C.8}
\]
with the first order optimality condition

\[
\nabla L(x, \lambda, y) =
\begin{bmatrix}
\nabla_x f(x) - \lambda^T \nabla_x h(x) \\
-(h(x) - y) \\
\lambda_j - \mu / y_j, \; j \in \{1, \ldots, q\}
\end{bmatrix} = 0.
\tag{C.9}
\]

When \mu approaches zero, the first order optimality conditions coincide with the optimality conditions for the problem with the slack variables,

\[
\nabla_x f(x) - \lambda^T \nabla_x h(x) = 0,
\tag{C.10}
\]
\[
y - h(x) = 0,
\tag{C.11}
\]
\[
\lambda_j y_j - \mu = 0, \quad j \in \{1, \ldots, q\}.
\tag{C.12}
\]

This implies that the j-th constraint is active when y_j = 0, with corresponding Lagrange multiplier \lambda_j \geq 0. Conversely, the j-th constraint is not active when y_j > 0, which results in the corresponding Lagrange multiplier \lambda_j = 0.
This is the interior point method, a modern variant of the barrier method. In this case the barrier function added to the Lagrangian ensures that y ≥ 0.
Suppose \mu_k is a sequence of positive numbers tending to zero. We can write down the sequence that is used in a sequential quadratic program (SQP) to compute the optimal value of this problem, starting at an initial guess x_k, \lambda_k and y_k. The update scheme is

\[
\begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ y_{k+1} \end{bmatrix}
= \begin{bmatrix} x_k \\ \lambda_k \\ y_k \end{bmatrix}
+ \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \\ \Delta y_k \end{bmatrix},
\tag{C.13}
\]

where \Delta x_k, \Delta \lambda_k and \Delta y_k are the solution of

\[
\nabla^2 L(x_k, \lambda_k, y_k)
\begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \\ \Delta y_k \end{bmatrix}
= -\nabla L(x_k, \lambda_k, y_k),
\tag{C.14}
\]

and where

\[
\nabla^2 L(x_k, \lambda_k, y_k) =
\begin{bmatrix}
\nabla_{xx}^2 f(x_k) - \lambda^T \nabla_{xx}^2 h(x_k) & -\nabla_x h(x_k) & 0 \\
-\nabla_x h(x_k)^T & 0 & I \\
0 & \Lambda_k & Y_k
\end{bmatrix},
\tag{C.15}
\]

with Y_k = \mathrm{diag}\{y_1^k, \ldots, y_q^k\} and \Lambda_k = \mathrm{diag}\{\lambda_1^k, \ldots, \lambda_q^k\}. With these matrices the products \lambda_j^k y_j^k, j \in \{1, \ldots, q\}, can be written as \Lambda_k y_k or Y_k \lambda_k. See, e.g., Nash and Sofer (1996) for more details.
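The central path traced by (C.10)-(C.12) can be illustrated on a one-dimensional toy problem, min x subject to x \geq 1, i.e. f(x) = x and h(x) = x - 1 with one slack variable y. The sketch below is an illustration only; it applies Newton's method to the perturbed optimality conditions for a decreasing sequence \mu_k, and the iterates follow x(\mu) = 1 + \mu toward the constrained minimum.

    import numpy as np

    def residual(v, mu):
        x, lam, y = v
        return np.array([1.0 - lam,       # (C.10): f'(x) - lam h'(x), h'(x) = 1
                         y - (x - 1.0),   # (C.11): y - h(x)
                         lam * y - mu])   # (C.12): perturbed complementarity

    v = np.array([2.0, 0.5, 1.0])         # strictly feasible starting point
    for mu in [1.0, 0.1, 0.01, 1e-4]:
        for _ in range(20):               # Newton iterations on the residual
            x, lam, y = v
            jac = np.array([[0.0, -1.0, 0.0],
                            [-1.0, 0.0, 1.0],
                            [0.0, y, lam]])
            v = v - np.linalg.solve(jac, residual(v, mu))
        print(mu, v)                      # x -> 1, lam -> 1, y -> 0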
Appendix D
Gradient information of projected models
For a dynamic optimization problem defined as

\[
\begin{aligned}
\min_{u(p) \in U} \; V &= \int_{t_0}^{t_f} L z \, dt \\
\text{s.t.} \quad \dot{x} &= f(x, y, u), \quad x(t_0) = x_0, \\
0 &= g(x, y, u), \\
z &= C_x x + C_y y + C_u u, \\
0 &\leq h(z, t),
\end{aligned}
\tag{D.1}
\]

the gradient information required for sequential dynamic optimization can be derived from the partial derivatives

\[
\frac{\partial V}{\partial p} = \frac{\partial V}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p},
\tag{D.2}
\]
\[
\frac{\partial h}{\partial p} = \frac{\partial h}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p},
\tag{D.3}
\]

where \partial V / \partial z, \partial h / \partial z and \partial u / \partial p can be derived analytically, and \partial z / \partial u can be approximated by a linear time-varying model along the trajectory. The effect of model order reduction by projection on the approximation of \partial z / \partial u is:







\[
\begin{bmatrix}
\Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \Delta z_3 \\ \vdots
\end{bmatrix}
=
\begin{bmatrix}
\tilde{D}_0 & 0 & 0 & 0 & \cdots \\
\tilde{C}_1 \tilde{\Gamma}_0 & \tilde{D}_1 & 0 & 0 & \cdots \\
\tilde{C}_2 \tilde{\Phi}_1 \tilde{\Gamma}_0 & \tilde{C}_2 \tilde{\Gamma}_1 & \tilde{D}_2 & 0 & \cdots \\
\tilde{C}_3 \tilde{\Phi}_2 \tilde{\Phi}_1 \tilde{\Gamma}_0 & \tilde{C}_3 \tilde{\Phi}_2 \tilde{\Gamma}_1 & \tilde{C}_3 \tilde{\Gamma}_2 & \tilde{D}_3 & \cdots \\
\vdots & & & & \ddots
\end{bmatrix}
\begin{bmatrix}
\Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \\ \vdots
\end{bmatrix},
\tag{D.4}
\]

where

\[
\begin{bmatrix} T_1 \\ T_2 \end{bmatrix}
\begin{bmatrix} T_1^\dagger & T_2^\dagger \end{bmatrix} = I
\quad \Leftrightarrow \quad
\begin{bmatrix}
T_1 T_1^\dagger & T_1 T_2^\dagger \\
T_2 T_1^\dagger & T_2 T_2^\dagger
\end{bmatrix}
=
\begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix},
\tag{D.5}
\]

and the effect of truncation is

\[
\tilde{\Phi}_i = T_1 \Phi_i T_1^\dagger,
\tag{D.6}
\]
\[
\tilde{\Gamma}_i = T_1 \Gamma_i,
\tag{D.7}
\]
\[
\tilde{C}_i = C_i T_1^\dagger.
\tag{D.8}
\]

Therefore

\[
\tilde{C}_3 \tilde{\Phi}_2 \tilde{\Phi}_1 \tilde{\Gamma}_0
= C_3 T_1^\dagger T_1 \Phi_2 T_1^\dagger T_1 \Phi_1 T_1^\dagger T_1 \Gamma_0
\approx C_3 \Phi_2 \Phi_1 \Gamma_0.
\tag{D.9}
\]

Note that the size does not change. For residualization a similar derivation is possible; the direct throughput term D is then affected as well. The gradient is thus affected by the projection, which explains why the number of iterations can differ for different projections.
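Once the linear time-varying matrices \Phi_i, \Gamma_i, C_i and D_i along the nominal trajectory are available, the sensitivity matrix in (D.4) can be assembled block by block. The sketch below is an illustration only; the constant example data are arbitrary.

    import numpy as np

    def sensitivity_matrix(Phi, Gam, C, D):
        """Assemble dz/du of (D.4) for Delta x_{k+1} = Phi_k Delta x_k
        + Gam_k Delta u_k and Delta z_k = C_k Delta x_k + D_k Delta u_k."""
        n = len(D)
        nz, nu = D[0].shape
        S = np.zeros((n * nz, n * nu))
        for i in range(n):                   # block row: time instant of Delta z
            S[i*nz:(i+1)*nz, i*nu:(i+1)*nu] = D[i]
            for j in range(i):               # block column: time instant of Delta u
                M = Gam[j]
                for k in range(j + 1, i):    # state transition Phi_{i-1} ... Phi_{j+1}
                    M = Phi[k] @ M
                S[i*nz:(i+1)*nz, j*nu:(j+1)*nu] = C[i] @ M
        return S

    # Example with constant scalar data over four steps:
    Phi = [np.array([[0.9]])] * 4
    Gam = [np.array([[1.0]])] * 4
    C = [np.array([[1.0]])] * 4
    D = [np.array([[0.0]])] * 4
    print(sensitivity_matrix(Phi, Gam, C, D))
    # [[0.    0.   0.   0.]
    #  [1.    0.   0.   0.]
    #  [0.9   1.   0.   0.]
    #  [0.81  0.9  1.   0.]]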
Summary
Model Reduction for Dynamic Real-Time Optimization of Chemical
Processes
Jogchem van den Berg
The value of models in the process industries becomes apparent in practice and in the literature, where numerous successful applications are reported. Process models are being used for optimal plant design, for simulation studies, and for off-line and online process optimization.
For online optimization applications the computational load is a limiting factor. The focus of this thesis is on nonlinear model approximation techniques aiming at reducing the computational load of a dynamic real-time optimization problem. Two types of model approximation methods were selected from the literature and assessed within a dynamic optimization case study: model reduction by projection and physics-based model reduction.
The model in the case study is described by a set of nonlinear coupled differential and algebraic equations, and the sequential approach was chosen for the implementation of the dynamic optimization problem. Assessment of different algorithms and implementations of the dynamic optimization is not part of this thesis.
Model order reduction by projection is partially successful. Even with a strongly reduced number of transformed differential equations it is possible to compute acceptable approximate solutions. For many reduced order models the original optimal solution was close to the optimum of the optimization based on the reduced order model. In reducing the computational time of the optimization, however, it is not successful: model reduction by projection does not reduce the computational time of the optimization as implemented in this thesis.
Reduced order models obtained by projection of nonlinear models, described by a set of differential and algebraic equations, are not suited for simulation. Projection does not provide predictable results in terms of simulation error and stability, and does not reduce the computational load of simulation. The latter can be ascribed to the sparse structure that is destroyed by the projection.
Two projection methods from the literature are compared. Empirical Gramians
were unravelled and reduced to averaging of linear Gramians. Proper orthogonal
decomposition was related to balanced reduction. In special cases the empirical
controllability Gramian can be computed directly from the data and used for a
proper orthogonal decomposition.
Physics-based model reduction is very successful in reducing the computational load of the sequential dynamic optimization problem. It reduces the computational load in the case study by a factor of seventeen, with an acceptable approximation error on the optimal solution. This reduction technique requires detailed process and modelling knowledge.
Samenvatting
Model Reduction for Dynamic Real-Time Optimization of Chemical
Processes
Jogchem van den Berg
Procesmodellen worden onder meer gebruikt voor het optimaliseren van fabrieksontwerpen, het uitvoeren van simulatiestudies en voor zowel off-line als online
optimalisatie van de bedrijfsvoering. Ook in de literatuur worden talrijke succesvolle toepassingen vermeld waaruit de toegevoegde waarde van het gebruik
van modellen in de procesindustrie duidelijk wordt.
Voor online optimalisatietoepassingen is de tijd die nodig is voor de berekeningen een limiterende factor. In dit proefschrift ligt de nadruk op approximatietechnieken van niet-lineaire modellen met als doel het reduceren van de
benodigde rekentijd voor het oplossen van dynamische optimalisatieproblemen.
Hiervoor zijn twee approximatietechnieken uit de literatuur geselecteerd en beoordeeld binnen een dynamische optimalisatie voorbeeldstudie: modelreductie
door middel van projectie en fysisch gebaseerde reductie.
Het model in de voorbeeldstudie wordt beschreven door een stelsel van niet-lineaire gekoppelde differentiaal- en algebraïsche vergelijkingen en als implementatie van het dynamische optimalisatieprobleem is gekozen voor de sequentiële
methode. Het beoordelen van verschillende algoritmen en implementaties van
dynamische optimalisatie zijn geen onderdeel van dit proefschrift.
Modelreductie door middel van projectie is deels succesvol gebleken. Zelfs
met een zeer sterk gereduceerd aantal getransformeerde differentiaalvergelijkingen is het mogelijk gebleken om acceptabele benaderende oplossingen te
berekenen. Ook lag het oorspronkelijke optimum voor veel gereduceerde modellen dicht bij het optimum gebaseerd op het gereduceerde model. Het reduceren
van de benodigde rekentijd was daarentegen minder succesvol. Modelreductie
door middel van projectie levert geen reductie op van de benodigde rekentijd
van het optimalisatieprobleem zoals dat geïmplementeerd is in dit proefschrift.
Modelreductie door middel van projectie van niet-lineaire modellen, beschreven
door een set van differentiaal- en algebraïsche vergelijkingen, is niet geschikt voor simulatiedoeleinden. Projectie levert geen voorspelbare resultaten in termen
van simulatiefout en simulatiestabiliteit en reduceert de benodigde rekentijd van
de simulatie niet. Het laatste kan toegeschreven worden aan de ijle structuur
die teniet wordt gedaan door de projectie.
Er zijn twee projectiemethoden uit de literatuur vergeleken. Empirische
Gramianen zijn ontrafeld en teruggebracht tot het middelen van lineaire Gramianen. Proper orthogonal decomposition is gerelateerd aan gebalanceerde reductie. In bijzondere gevallen kan de empirische Gramian direct uit de data
berekend worden die ook gebruikt wordt voor een proper orthogonal decomposition.
De op vereenvoudiging van de fysica gebaseerde modelreductie is zeer effectief in het reduceren van de rekentijd die nodig is voor het oplossen van een sequentieel dynamisch optimalisatieprobleem. Het reduceert de benodigde
rekentijd in het geval van de voorbeeldstudie met een factor zeventien en met een
acceptabele benaderingsfout op de optimale oplossing. Deze reductietechniek
vereist wel gedegen proces- en modelleringkennis.
Curriculum Vitae
Jogchem van den Berg was born on February 8, 1974 in Enschede, The Netherlands.

1986-1992   Atheneum-B at Jacobus College, Enschede.
1992-1998   MSc. Mechanical Engineering, Systems and Control, at Delft University of Technology. Thesis on Modelling of a Crystallization Process, conducted at DSM, Geleen.
1998-2003   PhD. Mechanical Engineering, Systems and Control, at Delft University of Technology. Thesis on Model Reduction for Dynamic Real-Time Optimization of Chemical Processes.
2003-       Advanced Process Control Engineer at Cargill, Bergen op Zoom.