Model Reduction for Dynamic Real-Time Optimization of Chemical Processes

Cover design by Rob Bergervoet
© Copyright 2005 Edoch

Model Reduction for Dynamic Real-Time Optimization of Chemical Processes

DISSERTATION

for the degree of doctor at Delft University of Technology, under the authority of the Rector Magnificus Prof.dr.ir. J.T. Fokkema, chairman of the Board for Doctorates, to be defended in public on Thursday 15 December at 13:00 by Jogchem VAN DEN BERG, mechanical engineer, born in Enschede.

This dissertation has been approved by the promotor: Prof.ir. O.H. Bosgra

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof.ir. O.H. Bosgra, Technische Universiteit Delft, promotor
Prof.dr.ir. A.C.P.M. Backx, Technische Universiteit Eindhoven
Prof.ir. J. Grievink, Technische Universiteit Delft
Dr.ir. P.J.T. Verheijen, Technische Universiteit Delft
Prof.dr.-Ing. H.A. Preisig, Norwegian University of Science and Technology
Prof.dr.-Ing. W. Marquardt, RWTH Aachen
Prof.dr.ir. P.A. Wieringa, Technische Universiteit Delft

Published by: OPTIMA, P.O. Box 84115, 3009 CC Rotterdam, The Netherlands
Telephone: +31-102201149
Telefax: +31-104566354
E-mail: [email protected]

ISBN 90-8559-152-x
Keywords: chemical processes, model reduction, optimization.

© Copyright 2005 by Jogchem van den Berg
All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from Jogchem van den Berg.
Printed in The Netherlands.

Voorwoord

After a memorable period of almost seven years, the result of my research is finally down in black and white. And that feels good. During my graduation project I became increasingly convinced that it would be well worth doing a PhD.
After a few conversations with Ton Backx and Okko Bosgra about an international project, the subject of model reduction came up, and I immediately believed it would be a fascinating topic. I look back with great pleasure on the absurdist conversations, alternated with intense nit-picking discussions on content. It usually started out serious, but fortunately there was always someone with a refreshing remark to put the importance of the matter into perspective. I would like to thank a few people in particular. First of all, of course, Okko, who gave me complete freedom to follow my own course on the basis of our always interesting substantive discussions. I would like to thank Adrie for his passionate explanations of everything related to chemistry and for acting as a sounding board for me. I want to thank my roommates Rob and Dennis, nestor David, Martijn, Eduard, Branko, Camile, Gideon, Leon, Maria, Martijn, Matthijs and Agnes, Carsten, Debbie, Peter, Piet, Sjoerd and Ton for all the coffee-table conversations and pub talk. I also want to thank my colleagues on the project, among them Martin, Jitendra, Wolfgang Marquardt, Mario, Jobert, Wim, Sjoerd, Celeste, Piet-Jan, Pieter, Peter Verheijen and Johan Grievink. Without you the project would certainly not have succeeded. Finally, I want to thank my parents for the unconditional support I have always received from them. My brother Mattijs, and Kirsten, for Lynn, for whom I can now finally be a proper rich uncle. Rob for the design of the cover of my little book, and my other friends, who all this time have had to listen to stories about the ups and downs I experienced during my PhD years. On to the next challenge!

Jogchem van den Berg
Rotterdam, October 2005

Contents

Voorwoord
1 Introduction and problem formulation
  1.1 Introduction
  1.2 Problem exploration
  1.3 Literature on nonlinear model reduction
  1.4 Solution directions
  1.5 Research questions
  1.6 Outline of this thesis
2 Model order reduction suitable for large scale nonlinear models
  2.1 Model order reduction
  2.2 Balanced reduction
  2.3 Proper orthogonal decomposition
  2.4 Balanced reduction revisited
  2.5 Evaluation on a process model
  2.6 Discussion
3 Dynamic optimization
  3.1 Base case
  3.2 Results
  3.3 Model quality
  3.4 Discussion
4 Physics-based model reduction
  4.1 Rigorous model
  4.2 Physics-based reduced model
  4.3 Model quality
  4.4 Discussion
5 Model order reduction by projection
  5.1 Introduction
  5.2 Projection of nonlinear models
  5.3 Results of model reduction by projection
  5.4 Discussion
6 Conclusions and future research
  6.1 Conclusions
  6.2 Future research
Bibliography
List of symbols
A Gramians
  A.1 Balancing transformations
  A.2 Perturbed empirical Gramians
B Proper orthogonal decomposition
C Nonlinear Optimization
D Gradient information of projected models
Summary
Samenvatting
Curriculum Vitae

Chapter 1
Introduction and problem formulation

This thesis explores the possibilities of model reduction techniques for online optimization based control in the chemical process industry. The success of this control approach on industrial-scale problems started in the petrochemical industry and is now being adopted by the chemical process industry. This is a challenge because the operation of chemical plants differs from petrochemical plant operation, imposing different requirements on optimization based control and consequently on process models. For online optimization based control the available computational time is limited, so computational load is a critical issue. This thesis focuses on the models and their contribution to online optimization based control.

1.1 Introduction

The value of models in the process industries becomes apparent in practice and in the literature, where numerous successful applications of steady-state plant design optimization and model based control are reported. This development was boosted by maturing commercial modelling tools and continuously increasing computing power. A side effect of this development is that not only can larger processes with more unit operations be modelled, but each unit operation can be modelled in more detail as well. Especially spatially distributed systems, such as distillation columns and tubular reactors, are well-known model size boosters. Large-scale models are in principle not a problem, at most an inconvenience for the model developer because of the long simulation times involved.
Application of models in an online setting seriously pushes the demands on models to their limits, since the available time for computations is limited. Numerous solutions are thinkable that contribute to solving this issue, varying from buying faster computers to solving approximate, less computationally demanding, control problems. Industry focuses on the implementation of effective optimization based control solutions, whereas university groups focus on understanding the (in)effectiveness of these solutions. Understanding gives direction to the development of more effective solutions suitable for industrial applications. This thesis is the result of close collaboration between industry and universities, aiming at a symbiosis of these two different focusses.

Online optimization

From digital control theory (see e.g. Åström and Wittenmark, 1997) we know that sampling introduces a delay in the control system, limiting the controller bandwidth. The sampling period is therefore preferably chosen as short as possible. A similar situation holds for online optimization. When applying online optimization-based control we can make a tradeoff between a high-precision solution with a long sampling period and an approximate solution at a short sampling period. In the case of a fairly good model and low-frequency disturbances, a low sample rate would most probably be sufficient. However, a tight quality constraint in combination with a fast response to a disturbance for the same system will require a much higher controller bandwidth to maintain performance. So the tradeoff between solution accuracy and sampling period depends on plant-model mismatch, disturbance characteristics, plant dynamics and the presence of constraints. With current modelling packages, processes can be modelled in great detail and model accuracy seems unquestionable. Unfortunately, in reality we always have to deal with uncertain parameters and stochastic disturbances.
One can think of heat exchanger fouling, catalyst decay, uncertain reaction rates and uncertain flow patterns. This motivates the necessity of a feedback mechanism that deals with disturbances and uncertainties at a suitable sampling interval. So we can develop large-scale models, based on modelling assumptions, that have some degree of accuracy. Still we need to estimate some key uncertain parameters online (e.g. heat exchanger fouling) from data. Why not choose a shorter sampling period, enabling a higher controller bandwidth, by allowing some model approximation? At some point there must be a break-even point where the performance of a controller based on an accurate model at a low sampling rate is as good as that of one based on a less accurate model at a high sampling rate. Applications of linear model predictive controllers to mildly nonlinear processes illustrate this successful tradeoff between model accuracy and sample frequency. This observation is the main motivation to research and develop nonlinear model approximation techniques for online optimization based control. The aim of such an approximative model is to improve performance by adding more accuracy to the predictive part, without being overtaken by the downside of a slightly longer sampling period.

Model reduction

Model reduction is not a purpose in itself. Without being specific about what is aimed at by reduction, model reduction is meaningless. In this thesis we will assess the value of model reduction techniques for dynamic real-time optimization. In system theory we associate model reduction with model-order reduction, which implies a reduction of the number of differential equations. Because linear model reduction was successful, this notion was carried over to the reduction of process models governed by nonlinear differential and algebraic equations (dae). The first difference between these two model types is obviously that dae process models are nonlinear in their differential part.
This nonlinearity was precisely the extension we looked for, since it should improve model accuracy. The second difference is that dae process models consist of many algebraic equations. In the case of a set of (index one) nonlinear differential algebraic equations, in general we cannot eliminate these algebraic equations by analytical substitution, which would bring us back to ordinary differential equations. This implies that we deal with a truly different model structure if implicit algebraic equations are present. This difference has far-reaching consequences for the notion of reduction of nonlinear models, as will become clear later in this thesis. As discussed in the previous section, both the solution accuracy and the computational load of the online optimization determine the online performance. In the case of online optimization based control, in principle we do not care about the exact number of differential or algebraic variables of a model; model reduction should result in a reduction of the optimization time, accepting some small degradation of solution accuracy. Model approximation is an alternative description for the model reduction that will be used in this thesis, since it is less associated with model order reduction only. A model is not only characterized by its number of differential and algebraic equations, but also by structural properties such as sparsity and controllability/observability. Time scales, nonlinearity and steady-state gains are important properties as well. The relation between model properties and computational load and accuracy is not always trivial, which can result in unexpected findings when evaluating model reduction techniques. Note further that the model is not the only degree of freedom within the optimization framework. The success of model reduction for online optimization-based control depends on the optimization strategy and implementation details such as the choice of solvers and solver options.
Realizing that the final judgement of the success of a model reduction technique for online optimization depends on many different implementation choices, we elaborate in the next section on the different aspects that affect the optimization problem.

1.2 Problem exploration

In this section we will explore different aspects to be considered when discussing dynamic real-time optimization of large-scale nonlinear chemical processes. First of all we need a model representing the process behavior. This is not a trivial task, but it is nowadays supported by various commercially available tools. Such a model is then confronted with plant data to assess its validity, which in general requires changes to the model. After several iterations we end up with a validated model, which is ready for use within online optimization. Since we are interested in the future dynamic evolution of some key output process variables, we require simulation techniques to relate them to future process input variables. Computation of the best future inputs is done by formulating and solving an optimization problem. We will touch on the differences between the two main implementation variants for solving such a dynamic optimization problem. Finally we will motivate the need for model reduction by showing the computational consequences of a straightforward implementation of this optimization problem for a model size that is common for industrial chemical processes.

Mathematical models

Numerous different names are available for models, each referring to a specific property of the model. One can distinguish between models based on conservation laws and data driven models. The first class of models is referred to as first principles, fundamental, rigorous or theoretical models, which in general are nonlinear dynamic continuous time models. These models are formulated as a set of ordinary differential equations (ode model) or a set of differential algebraic equations (dae model).
The second class of models is referred to as identified, step response or impulse response models, which in general are defined as linear discrete time input-output regressive models. Nonlinear static models can be added to these linear dynamic models, giving an overall nonlinear behavior. The subdivision is not as black and white as stated here, and combinations of both classes are possible, as the term hybrid models already implies. Process models that are used for dynamic optimization are in general formulated as a set of differential algebraic equations. In the special case that all algebraic equations can be eliminated by substitution, the dae model can be rewritten as a set of ordinary differential equations. The distinction between these two models is important because of their different numerical integration properties, which will be treated later in this chapter. Characteristic for models in the process industry are physical property relations, which generally do not allow for elimination by substitution and to a large extent contribute to the number of algebraic equations. Physical property calculations can also be hidden in an external module interfaced to the simulation software. Caution is required when interfacing both pieces of software¹. Partial differential equations (pde model) describe microscopic conservation laws and emerge naturally when spatial distributions are modelled. Classical examples are the tubular heat exchanger and the tubular reactor. Although this type of model is explicitly mentioned here, it can be translated into a dae model that approximates the true partial differential equations. Dae and ode models both have internal state variables, which implies that the effect of all past inputs on the future can be captured by a single value for all variables of the model at time zero, as opposed to most identified models, which in general are regressive and require historical data to capture the future effect of past inputs.
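The structural point above, implicit algebraic equations that cannot be eliminated by substitution, can be made concrete with a small sketch. The toy model below (a single mass balance whose outflow obeys an implicit algebraic relation with no closed-form solution) and all names in it are illustrative assumptions, not taken from this thesis; a fixed step-size explicit Euler scheme is combined with an inner Newton iteration for the algebraic variable.

```python
# Semi-explicit index-one DAE:
#     x'(t) = f(x, z, u)      (differential part)
#     0     = g(x, z)         (implicit algebraic part)
# Toy model, for illustration only.

def f(x, z, u):
    return u - z                    # mass balance: inflow u, outflow z

def g(x, z):
    return z + z ** 3 - x           # implicit outflow relation, no closed form

def solve_algebraic(x, z_guess, tol=1e-12):
    """Inner Newton iteration for g(x, z) = 0 at fixed x (dg/dz = 1 + 3 z^2)."""
    z = z_guess
    for _ in range(50):
        r = g(x, z)
        if abs(r) < tol:
            break
        z -= r / (1.0 + 3.0 * z * z)
    return z

def euler_dae(x0, u, h, n):
    """Explicit Euler on x; z is re-solved each step, warm-started from the
    previous converged value."""
    z = solve_algebraic(x0, 0.5)    # consistent initialization of z at t = 0
    x = x0
    for _ in range(n):
        x = x + h * f(x, z, u)
        z = solve_algebraic(x, z)
    return x, z

# At steady state f = 0 gives z = u, and g = 0 then gives x = z + z^3.
print(euler_dae(x0=1.0, u=0.3, h=0.01, n=5000))
```

Had the algebraic relation been explicit (say z = x/2), it could have been substituted into f and the model would collapse to an ordinary differential equation; it is the implicit relation that forces the mixed differential/algebraic treatment.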
A disadvantage of continuous time nonlinear models is the computational effort needed for simulation, whereas simulation with a discrete time model is trivial. On the other hand, stability of a nonlinear discrete time model is nontrivial. An advantage of rigorous models is that they in general have a larger validity range than identified models because of the fundamental origin of their nonlinear equations. Nonlinear identification techniques for dynamic processes are emerging but are still far from mature. Although modelling can still be tedious, developments in commercial process simulation and modelling tools such as gPROMS and Aspen Custom Modeler allow people with different modelling skills to use and build rigorous models quite efficiently. A graphical user interface with drag and drop features has increased the accessibility for more people than only diehard command prompt programmers. Still, thorough modelling knowledge is required to deal with fundamental problems such as index problems. All mathematical models can be described by their structure and parameters. A fundamental question is how to decide on the model structure and how to interpret a mismatch between plant and simulated data. Do we need to change the model structure or do we need to change model parameter values? No tools are available to discriminate between those two options, let alone to help find a better structure. In the case we do have a match between plant and simulated data, we could have the situation where not all parameters can be uniquely determined from the available plant data. The danger is that the match for this specific data set can be satisfactory, but using a new data set could give a terrible match. Whenever possible it seems wise to arrange parameters in order of uncertainty.

¹ Error handling in the external module can conflict with error handling by the simulation software.
Dedicated experiments can decrease the uncertainty of specific parameters, such as activity coefficients and pre-exponential factors in an Arrhenius equation, or physical properties such as specific heat. Besides parameter uncertainty we have structural uncertainty, due to lumping based on uncertain flow patterns or uncertain reaction schemes. In some cases we can exchange structural uncertainty for parameter uncertainty by adding a structure that can be inactivated by a parameter. The risk is that we end up with too many parameters to be determined uniquely from the available data. Computation of the values of model parameters will be discussed in the next section.

Identification and validation

Computation of parameter values can be formulated as a parameter optimization problem; it is referred to as a parameter estimation problem when the model structure is based on first principles. When the model structure is motivated by mathematical arguments, identification is more commonly used to address the procedure. Basically, both boil down to a parameter optimization problem minimizing some error function. Important for model validation are the model requirements. Typically, model requirements are defined in terms of an error tolerance on key process variables over some predefined operating region. Most often these are steady-state operating points, but the requirements can also be checked for dynamic operation. Less common is a model requirement defined in terms of a maximum computational effort. The objective of a parameter identification is to find those parameters that minimize the error over this operating region. The resulting parameter values of either estimation or identification are then ready for validation. During the validation procedure, parameter values are fixed and the model is used to generate predictions based on new input-output data. This split of the data is also referred to as the estimation data set and the validation data set.
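As a small illustration of the estimation and validation steps described above, the sketch below fits the pre-exponential factor and activation energy of an Arrhenius rate law by linear least squares on synthetic "plant" data, then validates the fixed parameters on held-out measurements. All numerical values are invented for the example.

```python
import math
import random

R = 8.314  # gas constant, J/(mol K)

# Synthetic "plant" data for k(T) = A * exp(-E / (R T)); A_true and E_true
# are made-up values, with 1% relative measurement noise.
A_true, E_true = 1.0e7, 5.0e4
random.seed(0)
temps = [300.0 + 10.0 * i for i in range(12)]
rates = [A_true * math.exp(-E_true / (R * T)) * (1.0 + 0.01 * random.gauss(0, 1))
         for T in temps]

# Split into an estimation set and a validation set.
T_est, k_est = temps[0::2], rates[0::2]
T_val, k_val = temps[1::2], rates[1::2]

# Linearize: ln k = ln A - (E/R) * (1/T), then ordinary least squares.
xs = [1.0 / T for T in T_est]
ys = [math.log(k) for k in k_est]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
Sxx = sum((x - mx) ** 2 for x in xs)
Sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
slope = Sxy / Sxx
E_hat = -slope * R
A_hat = math.exp(my - slope * mx)

# Validation: parameters fixed, prediction error on unseen data only.
def k_model(T):
    return A_hat * math.exp(-E_hat / (R * T))

rel_err = max(abs(k_model(T) - k) / k for T, k in zip(T_val, k_val))
print(A_hat, E_hat, rel_err)
```

Note that the validation error says nothing about data outside the measured temperature range; as argued above, a good match on one data set does not guarantee a good match on the next.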
From a more philosophical point of view, one could better refer to model validation as model unfalsification (Kosut, 1995); a model is valid until proven otherwise. This touches on the problem that for nonlinear models not all possible input signals can be validated against plant data. For linear models we can assess the model error because of the superposition principle and the duality between the time and frequency domains. Model identification is a data driven approach to developing models. The data required for this can either be obtained during normal operation or, as in most cases, from dedicated experiments (e.g. step response tests). An elegant property of this approach is that the identified model is both observable and controllable, which is typically not the case for rigorous models. Since only a very limited number of modes are controllable and observable, this results in low order models. In that sense rigorous modelling can learn from model identification techniques. Linear model identification is a mature area, whereas for nonlinear identification several techniques are available (e.g. Volterra series, neural nets and splines) without a thorough theoretical foundation. Neural networks have a very flexible mathematical structure with many parameters. By means of a parameter optimization, referred to as training of the neural net, an error criterion is minimized. The result is tested (validated) against data that was not used for training. Many papers have been written on this appealing topic, with the main focus on the parameter optimization strategy and the internal approximative functions and structure. A danger of neural nets is over-fitting, which results in poor interpolative predictions. Over-fitting implies that the data used for training is not rich enough to uniquely determine all parameters (comparable to an under-determined or ill-conditioned least squares solution).
The extrapolative predictive capability is acknowledged to be very bad (Can et al., 1998), and one is even advised to train the neural net with data that encloses a little more than the relevant operating envelope. This reveals another weak spot of this type of data driven model, since data is required at operating conditions that are undesired. Lots of data is required, which can be very costly if this data has to be generated by dedicated tests. A validated rigorous model can take away part of this problem when used as an alternative data generator.

Simulation

Simulation of linear and discrete time models is a straightforward task, whereas simulation of continuous time nonlinear models is more involved. In general, simulation is executed by a numerical integration routine, available in many different variants. Basic variants are described in textbooks such as those by Shampine (1994), Dormand (1996) and Brenan et al. (1996). The main problem with this method is that the efficiency of these routines is strongly affected by heuristics in e.g. error, step-size and prediction order control, which is not easy to see through. Easily understandable are fixed step-size explicit integration routines like the Euler and Runge-Kutta schemes. The main problem here is the poor efficiency for stiff systems due to the small integration step-size. The stability region of explicit integration schemes limits the step size, whereas the stability of implicit integration routines does not depend on the step-size. An implicit fixed step-size integration step can be viewed as an optimization problem solved by iterative Newton steps. A well-known property of this Newton-step-based optimization is its fast convergence, given a good initial guess. Numerous approaches based on different interpolation polynomials are available for this initial guess, of which the Gear predictor (Dormand, 1996) is probably the best known.
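The stability contrast between explicit and implicit schemes already shows up on the scalar linear test equation x'(t) = λx(t) with a deliberately stiff λ. This is an illustrative sketch; because the problem is linear, the implicit Euler step has a closed-form solution and the Newton iteration disappears.

```python
# x'(t) = lam * x(t) with lam = -1000 and step size h = 0.01, so
# h*lam = -10 lies far outside the explicit-Euler stability region
# |1 + h*lam| < 1, while implicit Euler is stable for any step size.

lam, h, n = -1000.0, 0.01, 20

x_explicit = 1.0   # x_{k+1} = (1 + h*lam) * x_k   -> factor -9 per step
x_implicit = 1.0   # x_{k+1} = x_k / (1 - h*lam)   -> factor 1/11 per step
for _ in range(n):
    x_explicit = (1.0 + h * lam) * x_explicit
    x_implicit = x_implicit / (1.0 - h * lam)

print(abs(x_explicit), x_implicit)   # explicit blows up, implicit decays
```

With |1 + hλ| = 9 the explicit iterate grows by a factor of nine per step, while the implicit iterate decays regardless of how stiff λ is; this is why explicit schemes are forced to take tiny steps on stiff process models.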
Routines with a variable step-size are more tedious to understand, due to the heuristics in the step-size control. This control balances the step-size against the number of Newton steps needed for convergence, with the objective of minimizing the computational load. Similarly, the order of the prediction mechanism may be variable and controlled by heuristics. Inspection of all options reveals that many handles are available to influence the numerical integration (e.g. absolute tolerance, relative tolerance, convergence tolerance, maximum iterations, maximum iterations without improvement, effective zero, perturbation factor, pivot search depth, etc.). Fixed step-size numerical integration routines exhibit a variable numerical integration error with a pre-computed upper bound, whereas variable step-size routines maximize the step-size subject to a maximum integration error tolerance. Experience shows that consistent initialization of dae models is a delicate issue and far from trivial, since it reduces to an optimization problem with as many degrees of freedom as there are variables to be initialized (easily over tens of thousands of variables). Not only does a good initial guess speed up convergence, in practice it appears to be a necessity; with a default initial guess, initialization will most probably fail. Modelling of dae systems in practice is done by developing a small model that is gradually extended, reusing previously converged values as an (incomplete) initial best guess for both the algebraic and the differential variables. Numerical integration routines were developed for autonomous systems. Discontinuities can be handled, but at the cost of a (computationally expensive) re-initialization. Since a digital controller would introduce a discontinuity at every sample time, and consequently require a re-initialization, it can be attractive to approximate this digital controller by its analogue (continuous) equivalent if possible.
For simulation of optimal trajectories defined on a basis of discontinuous functions, it might be worthwhile to approximate the trajectory by a set of continuous and differentiable basis functions. This reduces the number of re-initializations and can therefore improve computational efficiency, unless the step-size has to be reduced drastically where the differentiable approximation introduces large gradients. Selecting a solver and fine-tuning the solver options, balancing speed against robustness, is a tedious exercise and makes it hard to draw general conclusions about the different available solvers. Generally, models of chemical processes exhibit different time-scales (stiff systems) and a low degree of interaction between variables (sparsity). Sparse implicit solvers deal with this type of model very efficiently.

Optimization

Like the world of modelling, the field of dynamic optimization has its own jargon to address specific characteristics of the problem. Most optimization problems in the process industry can be characterized as non-convex, nonlinear, constrained optimization problems. In practice this implies that only locally optimal solutions can be found instead of globally optimal solutions. The presence of constraints requires constraint handling, which can be done in different ways (see e.g. the textbooks by Nash and Sofer, 1996 and Edgar and Himmelblau, 1989). Often these constraints are multiplied by Lagrange multipliers and added to the objective, which transforms the original optimization problem into an unconstrained optimization problem. We can distinguish between penalty and barrier functions. The penalty function approach allows for (intermediate) solutions that violate the constraints (most probably they will, since solutions tend to lie at a constraint), whereas the barrier function approach requires a feasible initial guess and from this solution guarantees feasibility.
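The contrast between the two approaches can be sketched on a one-dimensional toy problem: minimize (x - 2)^2 subject to x <= 1, whose solution lies exactly at the constraint. The code below is an illustrative sketch, not taken from this thesis; a crude grid search stands in for a real NLP solver.

```python
import math

def grid_argmin(f, lo, hi, n=40001):
    """Crude one-dimensional global search; stands in for an NLP solver."""
    return min((lo + (hi - lo) * i / (n - 1) for i in range(n)), key=f)

def penalty_obj(x, mu):
    # quadratic penalty: constraint violation is allowed but charged
    return (x - 2.0) ** 2 + mu * max(0.0, x - 1.0) ** 2

def barrier_obj(x, t):
    # log barrier: only defined strictly inside the feasible region x < 1
    return (x - 2.0) ** 2 - (1.0 / t) * math.log(1.0 - x)

for mu in (1.0, 10.0, 100.0):
    # closed-form minimizer is (2 + mu) / (1 + mu): feasible only in the limit
    print("penalty", mu, grid_argmin(lambda x: penalty_obj(x, mu), -1.0, 3.0))

for t in (1.0, 10.0, 100.0):
    # barrier iterates stay strictly feasible and approach x = 1 from inside
    print("barrier", t, grid_argmin(lambda x: barrier_obj(x, t), -1.0, 1.0 - 1e-9))
```

Note how the penalty minimizers sit slightly outside the feasible set for every finite penalty weight, matching the remark above that intermediate penalty solutions most probably violate the constraints, while every barrier iterate is feasible by construction.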
Finding a feasible initial guess can already be very challenging, which explains the popularity of the penalty function approach. For steady-state plant design optimization (Floudas, 1995), typical optimization parameters are equipment sizes, recycle flows and operating conditions like temperature, pressure and concentration. Discrete decision variables that determine the type of equipment (or the number of distillation trays) yield a computationally hard optimization problem known as a mixed integer nonlinear program (minlp). The optimization problem to be solved for the computation of optimal input trajectories is referred to as a dynamic optimization problem and generally assumes smooth nonlinear models without discontinuities. Using a parametrization of these trajectories by means of basis functions and coefficients, such a problem can be written as a nonlinear program. The choice of basis functions determines the set of possible solutions. A typical set of basis functions consists of functions that are one on a specific time interval and zero otherwise. This basis allows a progressive distribution of decision variables over time, which is very commonly used in online applications. A progressive basis reflects the desire (or expectation!) to have an optimal solution with (possibly) high frequency control moves in the beginning and low frequency control moves towards the end of the control horizon. Since a clever choice of basis functions could reduce the number of basis functions (and consequently the number of parameters for optimization), this is an interesting field of research. For the solution of dynamic optimization problems we need to distinguish between the sequential and the simultaneous approach (Kraft, 1985; Vassilidis, 1993). The sequential approach computes a function evaluation by simulation of the model, followed by a gradient based update of the solution.
This sequence is repeated until the solution tolerances are satisfied (converged solution) or some other termination criterion is satisfied (non-converged solution). In the simultaneous approach, also referred to as the collocation method (Neuman and Sen, 1973; Biegler, 2002), not only the input trajectory is parameterized but the state trajectories as well. Each state trajectory is described by a set of basis functions and coefficients from which the time derivatives can be computed. At each discrete point in time, this trajectory time derivative should satisfy the time derivative defined by the model equations. This results in a nonlinear program (nlp) type of optimization problem where the objective is minimized subject to a very large set of coupled equality constraints representing the process behavior. The free parameters of this nlp are both the parameters that define the input trajectory and the parameters that describe the state trajectories. Since mathematically there is no difference between these parameters and all parameters are updated together at each iteration step, this method is called the simultaneous approach. In general the sequential approach outperforms the simultaneous approach for large systems. This is not a rigid conclusion, since in both areas researchers are developing better algorithms exploiting structure and computationally cheap approximations. Note that in the case of the sequential approach, all (intermediate) solutions satisfy the model equations by means of simulation. In the case of the simultaneous approach, intermediate solutions generally do not satisfy the model equations. Note furthermore that with a fixed input trajectory the collocation method is an alternative to numerical integration. Both the sequential and the simultaneous approach are implemented as an approximate Newton step type of optimization. The Hessian is approximated by an iterative scheme that efficiently reuses derivative information.
A true Newton step is simply not worthwhile because of its computational load. Optimization routines require the sensitivity of the objective function (and constraints) with respect to the optimization parameters. This sensitivity is reflected by partial derivatives that can be computed by numerical perturbation or, in special cases, by analytical derivatives. Jacobian information generated during simulation can be used to build a linear time-variant model along the trajectory, which proves to be an efficient and suitable approximation of the partial derivatives. Furthermore, parametric sensitivities can also be derived by integration of sensitivity equations or by solving adjoint equations. Reuse of Jacobian information from the simulation and exploitation of structure can reduce the computational load, resulting in an attractive alternative.

Industrial process operation and control

Process operation covers a very wide area and involves different people throughout the company. The main objective of plant operation is to maximize the profitability of the plant. The primary task of plant operation is safeguarding. Safety of people and environment always gets the highest priority. In order to achieve this, hardware measures are implemented. Furthermore, measurements are combined to determine the status of the plant. If a dangerous situation is detected, a prescribed scenario is launched that shuts down the plant safely. Both for the detection and for the development of scenarios, models can be employed. Fault detection can be considered a subtask within the safeguarding system. It involves the determination of the status of the plant. A fault does not always induce a plant shutdown but can also trigger a maintenance action. Basic control is the first level in the hierarchy as depicted in Figure 1.1, providing control actions to keep the process at desired conditions. Unstable processes can be stabilized, allowing safe operation.
Typically, basic controllers receive temperature, pressure and flow measurements and act on valve positions. Furthermore, all kinds of smart control solutions are developed to increase performance. Various linearizing transformation schemes and decoupling, ratio and feedforward schemes are implemented in the distributed control system (dcs) and perform quite well. These schemes are to a high degree based on process knowledge, but are nevertheless not referred to as advanced process control. Steady-state energy optimization (pinch) studies can reduce energy costs by rearranging energy streams (heat integration). A side effect is the introduction of (undesired) interaction between different parts of a plant. An upset downstream can act as a disturbance upstream even without a material recycle present. Material recycle streams are known to introduce large time constants of several hours (Luyben et al., 1999). Both heat integration and material recycles complicate control for operators. Automation of the process industry took place very gradually. Nowadays most measurements are digitally available in the control room, from which practically all controls can be executed. The availability of these measurements was a necessity for the development of advanced process control techniques, such as model predictive control (see the tutorial paper by Rawlings, 2000), and because of its success, it initiated real-time process optimization. Scheduling can be considered the top level of plant operations (Tousain, 2002) as depicted in Figure 1.1. At this level it is decided what product is produced at what time and sometimes even by which plant. Processing of information from the sales and marketing department, the purchase department, and the storage of raw materials and end products is a very complex task.
Without radical simplifications, implementation of a scheduling problem would result in a mixed integer optimization that exceeds the complexity of a dynamic optimization.

[Figure 1.1: Control hierarchy with different layers and information transfer: scheduler ↔ (dynamic) real-time optimizer ↔ model predictive controller ↔ plant + basic controllers.]

Therefore models used for scheduling problems only reflect very basic properties, preventing the scheduling problem from exploding. Scheduling will not be discussed further in this thesis, although it is recognized as a field with large opportunities. In practice very pragmatic solutions are implemented, such as the production of different products in a fixed order, referred to as a product wheel. This rigid way of production has the advantage that detailed information is available to predict all costs that are involved. The downside is that opportunities are missed because of this inflexible operation. The availability of model-based process control enables a larger variety of transitions between different products. Information on the characteristics of different transitions can be made available and exploited by the scheduling task. This increases the potential of scheduling but requires powerful tools. Production nowadays shifts from bulk to specialties, which creates new opportunities for those who can swiftly control their processes within new specifications (Backx et al., 2000). The capability of fast and cheap transitions enables companies to produce and sell on demand at usually favorable prices and brings added value to the business. In order to be more flexible, stationary plant operation is replaced by a more flexible transient (or even batchwise) type of operation. Another driver to improve process control is environmental legislation, which becomes more and more stringent and pushes operation to its limits.
Optimization-based process control contributes to this flexible and competitive high-quality plant operation. Economic dynamic real-time optimization plays a key role in bringing money to the business, since it translates a schedule into economically optimal set point trajectories for the plant. At least as important is the feedback that the dynamic optimization can give to the scheduling optimization in terms of e.g. minimal required transition times and estimated costs of different and possibly new transitions. This information, depicted by the arrow from the dynamic real-time optimizer to the scheduler in Figure 1.1, enables improved scheduling performance because the information is more accurate and complete and allows for more flexible operation. This enhanced information can, for example, make the difference between accepting and refusing a customer's order. Dynamic real-time optimization plays a crucial role in connecting scheduling to plant control and can give a significant contribution to the profitability of a plant.

Real-time process optimization

State-of-the-art operated plants have a layered control structure where the plant's steady-state optimum is computed recursively by the real-time process optimizer, providing set points that are tracked by a linear model predictive controller. Besides being implementable from a computational point of view, this approach was acceptable from the operators' perspective as well, with a safety argument that pleads for a layered control structure: in case of failure of the real-time optimization the process is not out of control, only the optimization is not executed. Since a state-of-the-art optimizer assumes some steady-state condition, this condition is checked before the optimizer is started (Backx et al., 2000). This check is somewhat arbitrary because in practice a plant is never in steady state.
Before the next steady-state optimization is executed, a parameter estimation is carried out using online data. The result of the steady-state optimization is a new set of set points causing a dynamic response of the plant. Only after the process has settled again can a next cycle be started, which limits the update frequency of optimal set points. If a process is operated quasi steady-state and optimal conditions change gradually, this approach can be very effective. For a continuous process that produces different products we require economically optimal transitions from one operating point to the other. Including process dynamics in the optimization enables exploitation of the full potential of the plant. The result of this optimization approach will be a set of optimal set point trajectories and a predicted dynamic response of the process. In this approach we do not require steady-state conditions to start an optimization, and it enables shorter transition times. Shorter transition times generally result in a reduction of off-spec material and therefore increase the profitability of a plant. The real-time steady-state optimizer and linear model predictive controller can be replaced by a single dynamic real-time optimization (drto) based on one
large-scale nonlinear dynamic process model.

[Figure 1.2: Typical chemical process flow sheet with multiple different unit operations and material recycle streams, representing the behavior of a broad class of industrial processes.]

This problem should be solved at the sample rate of the model predictive controller to maintain disturbance rejection properties similar to those of the linear model predictive controller. The prediction horizon of the optimization should be a couple of times the process dominant time constant. The implications of this straightforward implementation are discussed next.

Implications of straightforward implementation

Let us now explore the implications of a straightforward implementation of dynamic real-time optimization as a replacement of the real-time steady-state optimizer and linear model predictive controller. In a typical chemical process, two or more components react to form the product of interest, followed by one or more separation steps. This represents typical behavior of a broad class of industrial processes and therefore findings can be carried over to many plants. In general, one or more side reactions take place, introducing extra components. The use of a catalyst can shift selectivity but never prevent side reactions completely. If we assume only one side reaction, we already have to deal with four species, or even five if we take the catalyst into account. We can separate the four species with three distillation columns as depicted in Figure 1.2 if we assume that the catalyst can be separated by a decanter. The recycle streams introduce positive feedback and therefore long time constants in the overall plant dynamics (Luyben et al., 1999).
Assuming instantaneous phase equilibrium and uniform mixing on each tray, the number of differential equations that describes the separation section of this chemical process is

nx = nc (ns + 1)(nt + 2),

where nx is the number of differential equations, nc is the number of columns, ns is the number of species that are involved and nt is the average number of trays per column. The one in the formula represents the energy balance and the two represents the reboiler and condenser of a column. For a setup with three columns with twenty, forty and sixty trays and five species we already need over seven hundred and fifty differential equations. If the reaction takes place in a tubular reactor we need to add a partial differential equation to the model. This can only be implemented after discretization, easily adding another three hundred equations (five species times sixty discretization points) to the model, bringing the total over a thousand equations. So we can extend the previous formula to

nx = nc (ns + 1)(nt + 2) + ns nd,

where nd is the number of discretization points. In practice the number of algebraic variables is three to ten times the number of differential equations, depending on the implementation of physical properties (as a hidden module or as explicit equations in the model). This brings the total to several thousands up to ten thousand equations. This estimate serves as a lower bound for a first principles industrial process model, and illustrates the number of equations that should be handled by plant-wide model-based process operation techniques. Fortunately the models are very nicely structured, which can be exploited. This model property is referred to as model sparsity and reflects the interaction or coupling between equations and variables. This property can be visualized by a matrix, the so-called Jacobian matrix J: the element J(i, j) is nonzero if the j-th variable occurs in the i-th equation and zero otherwise.
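The model-size estimates above can be checked with a few lines of code. The routine below simply evaluates nx = nc (ns + 1)(nt + 2) + ns nd for the illustrative numbers used in the text (three columns with twenty, forty and sixty trays, five species, sixty discretization points):

```python
def model_size(trays, n_species, n_disc_points=0):
    """Estimated number of differential equations for a train of
    distillation columns plus an optional discretized tubular reactor,
    following nx = nc*(ns + 1)*(nt + 2) + ns*nd."""
    nc = len(trays)                  # number of columns
    nt = sum(trays) / nc             # average number of trays per column
    ns = n_species
    # (ns + 1): ns component balances plus one energy balance per stage;
    # (nt + 2): the trays plus reboiler and condenser.
    nx_columns = nc * (ns + 1) * (nt + 2)
    nx_reactor = ns * n_disc_points  # discretized PDE contribution
    return int(nx_columns), int(nx_columns + nx_reactor)

cols_only, with_reactor = model_size([20, 40, 60], n_species=5, n_disc_points=60)
print(cols_only)     # 756 differential equations for the separation section
print(with_reactor)  # 1056 including the discretized tubular reactor
```

This reproduces the counts quoted in the text: over seven hundred and fifty equations for the columns alone, and over a thousand once the reactor is discretized.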
Most zero elements are structural and do not depend on variable values. These elements do not have to be recomputed during simulation, which allows for efficient implementation of simulation algorithms. The number of nonzero elements for process models is about five to ten percent of the total number of elements of the Jacobian matrix. Next we estimate the number of manipulated variables that are involved. Every column has five manipulated variables: the outgoing flows from reboiler and condenser, the reboiler duty, the condenser duty and the reflux rate or ratio. In this simple example we can additionally manipulate, at the reactor, the fresh feed flow and composition, the feed to catalyst ratio, the cooling rate and the outgoing flow. This brings the number of manipulated variables to twenty, all potential candidates to be computed by model-based optimization. The number of parameters that is involved can be computed with the following formula:

np = nu H / ts,

where np is the number of free parameters, nu is the number of manipulated variables, H is the control horizon and ts is the sampling rate. For a typical process as described in this section, the dominant time constant can be over a day, especially if recycle streams are present introducing positive feedback. An acceptable sampling rate for most manipulated variables is one to five minutes; pressure control, however, might require a much higher sampling rate. In case of a horizon of three times the dominant time constant, twenty inputs and a sampling rate of five minutes, the total number of free parameters is over seventeen thousand. This results in a very large optimization problem that is not very likely to give sensible results. A selection of manipulated variables and a clever parametrization of the input signals can reduce this number of free parameters.
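As a rough sketch, np = nu H / ts can be evaluated for the numbers above and compared with a progressive parametrization; the dyadic interval doubling used here is just one illustrative choice of progressive basis, not a scheme taken from the cited literature:

```python
# Number of free parameters np = nu * H / ts for a piecewise-constant
# input parametrization on a uniform grid, with the numbers from the
# text: 20 inputs, a horizon of three times a one-day dominant time
# constant, and a five-minute sampling rate.
nu = 20                      # manipulated variables
H = 3 * 24 * 60              # control horizon [min]
ts = 5                       # sampling rate [min]
np_uniform = nu * H // ts
print(np_uniform)            # 17280 free parameters

# A progressive basis (short intervals early, long intervals late)
# drastically reduces this count; doubling the interval length after
# each control move is one simple illustrative choice.
intervals, t = [], 0
dt = ts
while t < H:
    intervals.append(min(dt, H - t))  # clip the last interval to the horizon
    t += intervals[-1]
    dt *= 2
np_progressive = nu * len(intervals)
print(np_progressive)        # 200 parameters for the same horizon
```

The same horizon is thus covered with ten control moves per input instead of 864, illustrating why the choice of basis functions is an interesting field of research.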
The input signal can even be implemented as a fully adaptive, problem-dependent parameterization generated by repetitive solution of increasingly refined optimization problems (Schlegel et al., 2005). The adaptation is based on a wavelet analysis of the solution profiles obtained in the previous step. In practice, first some base layer control would be implemented around each column to control levels and pressures. Set points for these controllers could then be degrees of freedom for optimization. The added value of including these set points within a dynamic optimization is not evident, but small inventories could decrease transition times. If for some reason the added value of these degrees of freedom is expected to be small, they can be removed from the optimization problem, reducing the number of optimization parameters. Suppose we want to perform one nonlinear integration of the rigorous model within one sampling period to obtain a prediction for an input trajectory. In this case we need to simulate three days within five minutes, which requires a simulation speed of over eight hundred times real time. If a sampling period of one hour is acceptable we still need a simulation speed of seventy-two times real time. In this scenario we did not account for multiple simulations (in the sequential optimization approach approximately between five and twenty) and computations other than simulation. Depending on the input sequence, for models of the size considered here, the simulation speed on a current standard computer is between one and twenty times real time. This reveals the tremendous gap between the desired and the current status of numerical integration. Nevertheless, numerical solvers are already very sophisticated, handling different timescales (also referred to as stiff systems) and exploiting model structure. With current commercial modelling tools we usually end up with a set of differential and algebraic equations.
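As a minimal illustration of such a differential-algebraic system (a toy example, not a process model), a semi-explicit index-1 dae can be simulated by resolving the algebraic equation inside every right-hand-side evaluation:

```python
# Semi-explicit index-1 DAE:  x' = -x + z,  0 = z - x**2.
# The algebraic variable z is solved for numerically at each
# right-hand-side call, mimicking how a DAE couples differential
# and algebraic equations (the equations are purely illustrative).
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

def rhs(t, x):
    # algebraic part: solve 0 = g(x, z) = z - x**2 for z
    z = fsolve(lambda z: z - x[0] ** 2, x0=0.0)[0]
    # differential part: x' = -x + z
    return [-x[0] + z]

sol = solve_ivp(rhs, (0.0, 5.0), [0.5], rtol=1e-8)
print(round(sol.y[0, -1], 4))   # ≈ 0.0067, i.e. 1/(1 + e^5) analytically
```

For this toy system the algebraic equation could of course be eliminated by hand (z = x²); in a real process model the implicit algebraic block is large and its repeated solution is exactly where much of the computational load sits.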
To keep the model in line with the process, measurements are used to estimate the actual state of the process by means of an observer, e.g. an extended Kalman filter (e.g. Lewis, 1986). This is a model-based filtering technique, balancing model error against measurement error. The resulting state is then used as a corrected initial condition for the model. Finding a consistent solution for this new initial condition of a set of differential algebraic equations is called an initialization problem, which is hard to solve without a good initial guess. Fortunately, we can use the uncorrected state as an initial guess, which should be good enough. Still, this initialization problem has to be solved every iteration at the cost of valuable computing time. Going online with a straightforward implementation of real-time dynamic plant optimization based on first principles models introduces an enormous computational overload. At present only very pragmatic solutions are available, which directly provoke all kinds of comments, such as the inconsistency that is introduced by the use of different models in different layers within the plant operation hierarchy. These approaches are legitimated by the argument that there are no better alternatives readily available. Despite all this criticism on the pragmatic solutions for model-based plant operation, the approach has proven to contribute to the profitability of the plant. This profitability can only be increased if consistent solutions are developed that replace the pragmatic ones. Model reduction can provide a consistent solution and is explored in this thesis. First we will continue with the model reduction techniques available in the literature.

1.3 Literature on nonlinear model reduction

Models that are available for large-scale industrial processes can in general be characterized as a set of differential and algebraic equations (dae). Therefore we search for model reduction techniques that are applicable to this general class of models.
This class of models is capable of describing the majority of processes and is more general than a set of ordinary differential equations (ode). Transformation of a dae into an ode is not possible in general and is regarded as a major model reduction step. Since we are interested in the effect of different models on the computational load for optimization, every technique mapping one model to another model is a candidate model reduction technique. This implies that different modelling and identification techniques can be considered, using the original model as a data-generating plant replacement. Marquardt (2001) states that the proper way to assess model reduction techniques for online nonlinear model-based control is to compare the closed loop performance based on the original model, at a low sampling frequency, with that based on the reduced model at a higher sampling frequency. The maximum sampling frequency is determined by the computational load, which is related to the differences between the original and the reduced model. The reduced model should enable higher sampling frequencies, compensating for the loss in accuracy and therefore resulting in higher closed loop performance. No assessments of this type have been found in the literature. Therefore we will need to resort to more general literature on model reduction and nonlinear modelling techniques.

Computation and performance assessment of NLMPC

Findeisen et al. (2002) assessed computation and performance of nonlinear model predictive control. The implementation of the control problem used in this paper was so-called direct multiple shooting, a particularly efficient implementation of the simultaneous approach (Diehl et al., 2002). In their assessment, different models are compared under closed loop control.
The different models of the 20 tray distillation column were a nonlinear wave model with 2 differential and 3 algebraic equations, a concentration and holdup model with 42 ordinary differential equations, and a more detailed model (including tray temperatures) with 44 differential and 122 algebraic equations. All these models were the result of remodelling: extra simplifications and assumptions based on physics and process knowledge resulted in reduced models. The effect on the computational load is presented, even for different control horizons, distinguishing between the maximum and the average computation time. More simplified models resulted in a lower computational load, which is not surprising. More interesting is that the reduction in computational load is quantified. The increase of controller performance due to the higher sampling frequency enabled by the reduced computational effort was not presented. Nor is it completely clear how big the modelling error between the original and the reduced models is. The load of state estimation for these different reduced models was assessed as well. Furthermore, the computational load of different nonlinear predictive schemes was assessed with the original model in the case of no plant-model mismatch.

Nonlinear wave approximation

Balasubramhanya and Doyle III (2000) developed a reduced-order model of a batch reactive distillation column using travelling waves. The reader is referred to e.g. Marquardt (1990) and Kienle (2000) for more details on travelling waves. This nonlinear model was successfully deployed within a nonlinear model predictive controller (nlmpc) and linear Kalman filter that were computationally more efficient than an nlmpc based on the original full-order nonlinear model. Although the original full-order model was only a 31st order ode and the reduced model a 5th order ode, the closed loop computations were over six times faster with the nlmpc based on the reduced model while maintaining performance.
Performance was quite high despite a prediction horizon of only two samples and a control horizon of one. Furthermore, they compared the nonlinear models with the linearized model, illustrating the level of nonlinearity of the process.

Simplified physical property models

Successful use of simplified physical property models within flow sheet optimization is reported by Ganesh and Biegler (1987). A simple flash with recycle as well as a more involved ethylbenzene process with reactor, flash, two columns and two recycle streams are presented in that paper. In both cases the rigorous phase equilibrium model (Soave-Redlich-Kwong) was approximated by a simplified model (Raoult's law, Antoine's equation and ideal enthalpy). This type of model simplification is based on process knowledge and physical insight. It is a tailored approach but applicable to all models where phase equilibria are to be computed. Reductions up to an order of magnitude were reported by straightforward use of the simplified model within the optimization. A danger of this approach, already reported by Biegler (1985), is that the optimum does not coincide with the optimum of the original model. By combining the rigorous and the simplified model in their optimization strategy, Ganesh and Biegler can guarantee convergence to the true optimum of the original model while still reducing the computational load by over thirty percent. Model simplification based on physics appears to be successful for steady-state flowsheet optimization. Chimowitz and Lee (1985) reported an increase of computational efficiency by a factor of about three through the use of local thermodynamic models. According to Chimowitz, up to 90% of the computational time during simulation was used for thermodynamic computations, motivating their approach of model approximation. The local thermodynamic models are integrated with the numerical solver, where an updating mechanism for the parameters of the local models was included.
This approach is not easy to use since model reduction and numerical algorithm are integrated. Ledent and Heyen (1994) attempted to use local models within dynamic simulations but were not successful due to discontinuities introduced by updating the local models. Still, local models as such, without an update mechanism, can be used to reduce the computational load despite their limited validity. Perregaard (1993) worked on model simplification and reduction for simulation and optimization of chemical processes. The objective of his paper is to present a procedure that, through simplification of the algebraic equations for phase equilibria calculations, is capable of reducing the computing time to solve the model equations without affecting the convergence characteristics of the numerical method. Furthermore, it exploits the inherent structure of the equations representing the chemical process. This structured equation-oriented framework was adopted from Gani et al. (1990), who distinguish between differential, explicit algebraic and implicit algebraic equations. The key observation is that for Newton-like methods the Jacobian can be approximated during intermediate iterations. They replace the true Jacobian by a cheap-to-compute approximate Jacobian, derived from local thermodynamic models with analytical derivatives. They present several cases in their paper and report reductions of overall computational times of the order of 20-60% without loss of accuracy and without side effects on the convergence of the numerical method. Støren and Hertzberg (1997) developed a tailored dae solver that is computationally more efficient and reliable and report a limited reduction (34-63%) in computation times for dynamic optimization calculations. In their approach local thermodynamic models are exploited as well.

Model order reduction by projection

Many papers are available on nonlinear model reduction by projection.
More precisely, this is order reduction of a nonlinear model by linear projection, where the order refers to the number of differential equations. A generic procedure can be formulated in three steps. First, a suitable transformation is applied, revealing the important contributions to the process dynamics. Second, the new coordinate system is decomposed into two subspaces. Finally, the dynamics are formulated in the new coordinate system, where the unimportant dynamics are either truncated or added as algebraic constraints (residualization). In the case of residualization the resulting model is in dae format and will not reduce the computational effort (Marquardt, 2001) due to increased complexity (loss of sparsity). Therefore in most cases the transformed model is truncated. An approximate solution with reduced computational load is known as slaving. Aling (1997) reported an increasing computational load with increasing residualization and a reduced computational load by approximating the solution of slaved modes. In most papers, projection is applied to ordinary differential equations (Marquardt, 2001). Only Löffler and Marquardt (1991) applied their projection to both differential and algebraic equations. As an error measure between the original and the reduced model, plots of trajectories of key variables are used. These are simply generated by applying a specific input sequence to both models in simulation. In some papers results on the computational time of simulations are added as relevant information. Important information on the applied numerical integration algorithm is mostly not available, despite the fact that this is crucial for the interpretation of the results. This becomes clear when comparing an explicit fixed-step numerical integration scheme with a variable step-size implicit numerical integration scheme. Extremely important is the ability of the algorithms to exploit sparsity (Bogle and Perkins, 1990).
Process models are known to be very sparse, which can be efficiently exploited by some numerical integration algorithms, reducing the computational load of simulation. Projection methods in general destroy this sparsity, which is reflected in the computational load of those algorithms that exploit it. Projection methods differ in how the projection is computed. Two main approaches to compute these projections are discussed next: proper orthogonal decomposition and balanced projection.

Proper orthogonal decomposition

Popular is projection based on a proper orthogonal decomposition (pod), with its origin in fluid dynamics (see e.g. Berkooz et al., 1993; Holmes et al., 1997; Sirovich, 1991). This approach is also referred to as the Karhunen-Loève expansion or the method of empirical eigenfunctions. Bendersky and Christofides (2000) apply a static optimization of a catalytic rod and a packed bed reactor described by partial differential equations, resulting in reductions of over a factor of thirty in computational load. In order to find the empirical eigenfunctions they generated data with the original model and gridded the design variables between upper and lower bounds. In the case of the packed bed this implied, with three design variables at nine equally spaced values, 9³ = 729 simulations representing the complete operating envelope. This is a brute force solution to a problem also addressed by Marquardt (2001). However, in the case of a dynamic optimization this would not be attractive due to the much higher number of decision variables: typically over four inputs and at least 10 points in time would imply 9⁴⁰ ≈ 10³⁸ simulations. Baker and Christofides (2000) applied proper orthogonal projection to a rapid thermal chemical vapor deposition (rtcvd) process model to be able to design a nonlinear output feedback controller with four inputs and four outputs. This design can be done off-line, so no computational load aspects were mentioned.
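The generic projection procedure can be illustrated with a small pod example (the linear diffusion chain below is an illustrative stand-in, not one of the cited case studies): snapshots of the full model are collected, the dominant left singular vectors form the projection basis, and the Galerkin-projected model is integrated in the reduced coordinates.

```python
# POD-based order reduction: snapshots -> SVD -> Galerkin projection.
import numpy as np
from scipy.integrate import solve_ivp

n, r = 50, 5                                             # full and reduced order
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # diffusion chain
f = lambda t, x: A @ x
x0 = np.sin(np.linspace(0.0, np.pi, n))

# 1) generate snapshots with the full (original) model
snap = solve_ivp(f, (0.0, 5.0), x0, t_eval=np.linspace(0.0, 5.0, 100), rtol=1e-8)
# 2) POD basis: dominant left singular vectors of the snapshot matrix
V = np.linalg.svd(snap.y, full_matrices=False)[0][:, :r]
# 3) Galerkin projection and truncation: a' = V.T f(V a)
fr = lambda t, a: V.T @ f(t, V @ a)
red = solve_ivp(fr, (0.0, 5.0), V.T @ x0, t_eval=snap.t, rtol=1e-8)

err = np.max(np.abs(snap.y - V @ red.y))
print(err < 1e-3)   # the 5-state model reproduces the 50-state response
```

Note that even for this linear example the projected system matrix V.T A V is dense, illustrating the loss of sparsity mentioned above: the tridiagonal structure of A does not survive the projection.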
They show that the nonlinear output feedback controller outperforms four pi controllers in a disturbance-free scenario. Still, the deposition was unevenly distributed. Addition of a model-based feedforward would add performance to the control solution and might diminish the difference between the nonlinear output controller and the four pi controllers. Aling et al. (1997) applied the proper orthogonal decomposition reduction method to a rapid thermal processing system. The reduction of differential equations was impressive: from one hundred and ten to less than forty. The reduction of the computational load for simulation was up to a factor of ten. First a simplified model, a set of ordinary differential equations, is derived from a finite element model (Aling, 1996). Then this model is further reduced by proper orthogonal decomposition to order forty, twenty, ten and five by truncation. These truncated models are then further reduced by residualization. Residualization transforms the ode into a dae that is solved using a ddasac solver. Since residualization does not reduce the computational load, they propose a so-called pseudo-steady approximation (slaving), which is computationally cheaper than residualization.

Order reduction by balanced projection

Lall (1999, 2002) introduced empirical Gramians as an equivalent of the linear Gramians that are used for balanced linear model order reduction (Moore, 1981). Hahn and Edgar (2002) elaborate on model order reduction by balancing empirical Gramians and show results of significant model order reduction but limited reduction in simulation times. Some closed loop results were presented, but few details were given on the implementation of the model predictive control scheme. The performance of the controller based on the reduced model was as good as that based on the full-order model, but no reduction in computational effort was achieved. Lee et al.
(2000) exploit subspace identification (Favoreel et al., 2000) for control relevant model reduction by balanced truncation. The technique is control relevant because it is based on the input to output map instead of the input to state map that is used for Proper Orthogonal Decomposition model reduction. This argument holds for all balanced model reduction techniques, like the empirical Gramians (Lall, 2002; Hahn and Edgar, 2002). Newman and Krishnaprasad (1998) compared proper orthogonal decomposition and the method of balancing. Their focus was on ordinary differential equations describing the heat transfer in a rapid thermal chemical vapor deposition (rtcvd) process for semiconductor manufacturing. The transformation that was used for balancing the nonlinear system was derived from a linear model in a nominal operating point. The transformation balancing this linear model was then applied to the nonlinear model. The transformation matrices were very ill-conditioned, and they used a method proposed by Safonov and Chiang (1989) to overcome this problem. An idea was suggested to find a better approximation of the nonlinear balancing approach presented by Scherpen (1993). The order of the models was significantly reduced by both projection methods with acceptable error, but no results were presented on the reduction of the computational load. Zhang and Lam (2002) developed a reduction technique for bilinear systems that outperformed a Gramian based reduction, though demonstrated on a small example. The solution of the model reduction problem is based on the gradient flow technique to optimize the H2 error between the original and the reduced order model.

Singular perturbation

Reducing the number of differential equations can easily be done if the model is in the standard form of a singularly perturbed system (Kokotovic, 1986).
In this special case we can distinguish between the first couple of differential equations, representing the slow dynamics, and the remaining differential equations, associated with the fast dynamics. Model reduction is then done by assuming that the fast dynamics behave like algebraic constraints, which reduces the number of differential equations. For some differential equations it is fairly obvious to determine the time scale, but in general this is nontrivial. Tatrai et al. (1994a, 1994b) and Robertson et al. (1996a, 1996b) use state to eigenvalue association to bring models into standard form. This involves a homotopy procedure with a continuation parameter that varies from zero to one, weighting the system matrix at some operating point with its trace. At different values of the continuation parameter the eigenvalues of the composed matrix are computed, enabling the state to eigenvalue association. The problem is that the true eigenvalues are the result of the interaction between several differential equations and therefore in principle cannot be assigned to one particular differential state. Duchêne and Rouchon (1996) show that the originally chosen state space is not the best coordinate system in which to apply singular perturbation. They illustrate this on a simple example and later demonstrate their approach on a case study with 13 species and 67 reactions. Their reduction approach is compared with a quasi steady-state approach and with the original model by plotting time responses to a non steady-state initial condition.

Reaction kinetics simplification

Petzold (1999) applied an optimization based method to determine which reactions dominate the overall dynamics. The aim is to derive the simplest reaction system that retains the essential features of the full system. The original mixed integer nonlinear program (minlp) is approximated by a problem that can be solved by a standard sequential quadratic programming (sqp) method by using a so-called beta function.
Results are presented in this paper for several reaction mechanisms. Androulakis (2000) formulates the selection of dominant reaction mechanisms as a minlp as well, but uses a branch and bound algorithm to solve it. Edwards et al. (2000) eliminate not only reactions but species as well by solving a minlp using dicopt. Li and Rabitz (1993) presented a paper on approximate lumping schemes by singular perturbation, which they later developed into a combined symbolic and numerical technique to apply constrained nonlinear lumping to an oxidation model (Li and Rabitz, 1996). Significant order reductions were presented, but the effect on computational load was not discussed in these papers.

Nonlinear empirical modelling

Empirical models have a very low computational complexity and therefore allow for fast simulations. Typically, their interpolation capabilities are comparable to those of fundamental models, but the extrapolation capabilities of fundamental models are far superior (Can et al., 1998). Since we do not want to restrict ourselves to optimal trajectories that are interpolations of historical data, these types of models seem unsuitable for dynamic optimization. Nevertheless we will mention some literature on nonlinear empirical modelling. Sentoni et al. (1996) successfully applied Laguerre systems combined with neural nets to approximate nonlinear process behaviour. Safavi et al. (2002) present a hybrid model of a binary distillation column, combining overall mass and energy balances with a neural net accounting for the separation. This model was used for an online optimization of the distillation column and compared to the full mechanistic model. The resulting optima were close, indicating that the hybrid model was performing well. No results were presented on the computational benefit of the hybrid model. Ling and Rivera (1998) present a control relevant model reduction of Volterra models by a Hammerstein model with a reduced number of parameters.
Focus was on the closed loop performance of a simple polymerization reactor described by four ordinary differential equations, and the computational aspects were not discussed. Later, Ling and Rivera (2001) presented a three step approach to derive control relevant models. First, a nonlinear arx model is estimated from plant data using an orthogonal least squares algorithm. Second, a Volterra series model is generated from the nonlinear arx model. Finally, a restricted complexity model is estimated from the Volterra series through the model reduction algorithm described above. This seems to involve quite a few nontrivial steps to finally arrive at the reduced model.

Miscellaneous modelling techniques

Norquay et al. (1999) successfully deployed a Wiener model within a model predictive controller on an industrial splitter, using linear dynamics and a static output nonlinearity. Pearson and Pottmann (2000) compare a Wiener, a Hammerstein and a nonlinear feedback structure for gray-box identification, all based on linear dynamics interconnected with a static nonlinear element. Pearson (2003) elaborates in his review paper on nonlinear identification on the selection of nonlinear structures for computer control. Stewart et al. (1985) presented a rigorous model reduction approach for nonlinear spatially distributed systems, such as distillation columns, by means of orthogonal collocation. A stiff solver was used to test and compare the original model with different collocation strategies, which appear to be remarkably efficient. Briesen and Marquardt (2000) present results on adaptive model reduction for the simulation of thermal cracking of multicomponent hydrocarbon mixtures. This method provides an error controlled simulation. During simulation, an adaptive grid reduces model complexity where possible. The error estimation governs the efficiency of the complete procedure, and no results were presented on the reduction of computational load.
For online use of such a model some adaptation of a Kalman filter would be required, since the order of the model changes continuously. Kumar and Daoutidis (1999) applied a nonlinear input-output linearizing feedback controller to a high purity distillation column; the controller was non-robust using the original model but had excellent robustness properties using a singularly perturbed reduced model. No details were presented on the effect of the model reduction on computational load. See e.g. Nijmeijer and van der Schaft (1990), Isidori (1989) and Kurtz and Henson (1997) for more details on feedback linearizing control.

An important observation from this literature overview is that no reduction techniques are available that are directly linked to the reduction of the computational load of simulation or optimization. All techniques have different focuses, and the effect on computational load can only be evaluated by implementation. There does not exist a modelling synthesis technique for dynamic optimization that provides an optimal model for a fixed sampling interval and prediction horizon. So the most promising model reduction techniques should be evaluated on their merits for real-time dynamic optimization by implementation.

1.4 Solution directions

The gap revealed for consistent online nonlinear model based optimization is caused by its computational load. Since computing speed approximately doubles every eighteen months, one could argue that it is only a matter of time before the gap is closed. With a gap of a factor of eight hundred, as derived in the previous section, we would still need to wait almost fifteen years until computers are fast enough, assuming that the computer speed improvements can be extrapolated. However, if next decade's computing power were available today, we would immediately want to control even more unit operations by this optimization based controller, or increase the sampling frequency to improve performance.
This brings us back to square one, where the basic question is how to reduce the computational load of the optimization problem so that it can be solved within the limited time available in online applications. This is the concept of model reduction addressed in this thesis.

1. The divide and conquer approach (e.g. Tousain, 2002) was already mentioned as the state of the art process control solution. In this approach a steady-state nonlinear model, based on first principles, is used within a steady-state optimization, producing optimal set points. These set points are tracked by an mpc based on an identified linear dynamic model. Typically these optimal set points are recomputed at most a couple of times per day, whereas the mpc is implemented as a receding horizon controller with a sample time of one to five minutes and a control horizon of up to a couple of hours.

(a) The inconsistency of state of the art real-time optimization can partially be eliminated by replacing the static optimization by a dynamic optimization. The results of the dynamic optimization problem are optimal input and output trajectories that exploit the dynamics of the process. These input and output trajectories can be tracked by the linear mpc in so-called delta mode. This removes only part of the inconsistency, since now both control layers are based on dynamic models. Inconsistency is still present due to the use of a linear model in the mpc, whereas the model used for dynamic optimization is nonlinear, like the true process.

(b) Nonlinear elements can gradually be incorporated within a linear model predictive controller scheme. The linear prediction can be replaced by prediction from simulation of future input trajectories with the nonlinear dynamic model. And using linear time varying models derived from the nonlinear model along the simulated trajectory brings us close to a full nonlinear model based controller.

2.
Another solution direction is the improvement of optimization and numerical integration routines. Great improvements have already been achieved in numerical integration routines over the last decades, which makes this direction less promising. Improving the efficiency of optimization routines is outside the scope of this thesis.

3. The solution direction treated in this thesis is one where approximate models are derived from a rigorous model. These approximate models should reflect the optimization relevant dynamic behavior, but with more favorable computational properties than the original rigorous model. The rigorous model is strongly appreciated and therefore should serve as the base model from which models for different applications can be derived. A rigorous model should be the backbone of nonlinear optimization based control. This proposition was decisive for the course of this research.

(a) A possible way to derive approximate models is to generate data by dedicated simulations with a validated rigorous model. This data can then be used for model identification and validation. This approach removes some disadvantages, such as time-consuming and costly real-life experiments for dedicated data collection. Furthermore, the data collection on a simulation basis can be done over a larger operating range than would have been allowed in the real plant. Although the models that result from this approach do have favorable computational properties, large amounts of data have to be processed, which has to be redone whenever a model parameter is adapted. Since plant hardware and operation are improved continuously, this approach requires model updating by re-identification.

(b) In nonlinear control theory, linear model reduction concepts have been generalized to nonlinear models (Isidori, 1989; Nijmeijer, 1990; Scherpen, 1993) and applied to small models. No feasible solution for large scale problems became available from this research area.
(c) From linear control theory, different model reduction techniques are available that can be applied to nonlinear models as well. These techniques reduce to a projection of the dynamics achieved by a transformation, followed by either truncation or residualization. Projection of the dynamics can be done based on different arguments and results in reduced models with different properties. Successful results on model reduction by projection have been presented in the literature, though not evaluated within an online dynamic optimization setting. This topic will be extensively treated in this thesis.

(d) An alternative approach to arrive at approximate models is remodelling. The main question is what model accuracy is required for models used in online dynamic optimization applications. The presence of uncertainties and disturbances will require continuous updating of the control actions. An approximate model should certainly reflect the plant dynamics, but it does not have to predict process variables correctly to ten decimal places. A successful attempt to reduce the computational load by remodelling for simultaneous dynamic optimization was reported by Findeisen et al. (2002). Remodelling will be the second main model reduction approach treated in this thesis.

The idea of decomposing the optimization in two layers is strong if both layers are using models derived from the same core model, providing consistency. In either case it is beneficial to reduce the computational effort for large scale dynamic optimization. We are capable of formulating this type of problem for industrial processes, so this is assumed not to be the problem. This thesis focuses on model approximation of a rigorous nonlinear dynamic model. Nonlinear identification is not selected because it lacks a good basis for a proper choice of model structure. Nonlinear control theory has only been applicable to very small toy examples and was therefore not selected.
This leaves model reduction by projection and by remodelling as the solution directions to be explored in this thesis.

1.5 Research questions

The starting point in this thesis is the availability of a validated rigorous model. Development and validation of such a model is a challenging task, which is beyond the scope of this thesis. A rigorous model is based on numerous modelling assumptions. Important assumptions are the set of admissible variable values and the timescale, which define the validity envelope of the rigorous model. In general a much smaller region of this envelope is of interest for process optimization. Precisely this observation provides the fundamental basis for model approximation. The approximation needs to be accurate only in this restricted region of interest. With the availability of this validated rigorous model we can distinguish between the generation and the evaluation of different approximate models. Therefore we need to develop and apply different model approximation techniques to generate different models, and build a test environment to evaluate the effect on optimization performance. Approximation techniques for linear models by projection are quite mature, and the principles can be carried over to nonlinear models. Approximation by projection is a mathematical (and therefore quite generally applicable) approach that reduces the number of differential equations. The first research question is:

1. How to derive a suitable projection? Suitable in the sense that it serves dynamic real-time optimization, i.e. it reduces the computational load of the optimization with minimal loss of controller performance.

A different approach for model approximation is remodelling, which requires detailed process knowledge of both the physics and the process operation. The second research question is:

2. Is physics-based reduction a suitable technique to derive models for dynamic real-time optimization?

The last research question is:

3.
How to evaluate different approximate nonlinear models for dynamic real-time optimization?

A fundamental problem emerges if we want to classify different nonlinear models. Linear models can be classified by means of distance measures (norms) that capture all possible input signals. Similar distance measures can be defined for nonlinear models, but they represent only one (set of) specific input signal(s), since the superposition principle is not applicable to nonlinear models. Moreover, we are not only interested in the approximation of the input-output map of the model, but especially in the online performance of the optimization based on the reduced model. A pragmatic approach to classify nonlinear models for a specific optimization will be presented in this thesis, addressing solution accuracy, model adequacy and computational efficiency.

1.6 Outline of this thesis

In the previous sections we explored the different research areas in which this research is embedded. The interplay between these different fields within online dynamic optimization is not always trivial and requires thorough knowledge and understanding of all areas. This complexity makes advanced process control such a fascinating research area. Furthermore, we observed a large gap between the time required to solve the true optimization problem online and the sampling time desired for controller performance. Nowadays very pragmatic approximate solutions are implemented, not pushing the plant to its maximum performance. Since there are many aspects to optimization based process control, many different solutions can contribute to closing this gap. In this thesis we will focus on the role of the model, its properties and the effect on optimization based control. After exploring the literature for possible solutions to the model reduction problem, we presented different solution directions. We selected two solution approaches and formulated three research questions that will be worked out in the next chapters.
In Chapter 3 a base case dynamic optimization is outlined in quite some detail in order to make clear how the optimization was implemented. Results of the dynamic optimization based on a detailed rigorous model are presented. The modelling equations of the detailed plant model reflecting the process behavior are discussed in Chapter 4, along with the physics-based model reduction that was applied. Results of the base case dynamic optimization based on this reduced model are presented in this chapter and compared with the optimization results obtained with the detailed rigorous model. The physics-based reduced model is the new starting point for the exploration of reduction by projection in Chapter 5. Two projection techniques are assessed by performing dynamic optimizations with all possible reduced orders. Solution accuracy and efficiency are the key properties of interest. This resulted in over 500 optimizations, providing a complete view on the properties of model reduction by projection. The projection was applied to the physics-based reduced model instead of the original rigorous model simply for practical reasons. Finally, conclusions and future research directions are presented in Chapter 6. These chapters are preceded by Chapter 2 on model reduction by projection. The focus is on currently available techniques and a lucid interpretation thereof. This results in a generalization and new interpretation of existing projection techniques. The projection techniques are evaluated in Chapter 5 within a dynamic optimization setting, as defined in the base case dynamic optimization in Chapter 3, and applied to the physics-based reduced model as derived in Chapter 4.

Chapter 2

Model order reduction suitable for large scale nonlinear models

In this chapter two order reduction methods for nonlinear models are reviewed.
These two methods, balanced model order reduction via empirical Gramians and model order reduction by Proper Orthogonal Decomposition, are treated in more detail since they are serious candidates for large scale model reduction. Both methods are extensively explored, tested and compared.

2.1 Model order reduction

Most nonlinear model order reduction methods are extensions of linear methods for model reduction. We can be even more precise: most reduction techniques apply some linear projection to the nonlinear model as if it were a linear model. It is sensible to look into linear model reduction first, since the arguments for model reduction are the same in the linear and the nonlinear case. For linear model reduction a thorough overview can be found in textbooks, e.g. by Obinata and Anderson (2001). Overview articles on different linear model reduction techniques are available by e.g. Gugercin (2000) and Antoulas (2001). These methods can be described by a transformation as the first step, followed by either truncation or singular perturbation as a second step. Reduction methods differ in the first step, whereas the second step can be chosen on other grounds; e.g. the need for an accurate steady-state gain would favor singular perturbation (residualization) over truncation. Since nonlinear model order reduction techniques are inspired by linear model reduction, we will first treat the basic linear model reduction methods. And even more basic, we will first show how to arrive at a linear model from a nonlinear model.

Linearization of a nonlinear model

Suppose a process can be described by a nonlinear model defined as a set of ordinary differential equations

$$\begin{bmatrix} \dot{x} \\ y \end{bmatrix} = \begin{bmatrix} f(x,u) \\ g(x,u) \end{bmatrix}, \qquad x(t_0) = x_0, \tag{2.1}$$

where $x$ is the state vector in $\mathbb{R}^{n_x}$, $u$ is the input vector in $\mathbb{R}^{n_u}$, $y$ is the output vector in $\mathbb{R}^{n_y}$ and $x_0$ is the initial condition at time $t_0$ in $\mathbb{R}^{n_x}$. This nonlinear system can be linearized around an equilibrium point $(x^*, u^*, y^*)$ defined by

$$\begin{bmatrix} 0 \\ y^* \end{bmatrix} = \begin{bmatrix} f(x^*, u^*) \\ g(x^*, u^*) \end{bmatrix}. \tag{2.2}$$

Consider $x(t) = x^* + \Delta x(t)$, $u(t) = u^* + \Delta u(t)$ and $y(t) = y^* + \Delta y(t)$ and the Taylor series about the equilibrium point $(x^*, u^*, y^*)$

$$f(x,u) = f(x^*,u^*) + \left.\tfrac{\partial f}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial f}{\partial u}\right|_* \Delta u + O(\Delta)^2, \qquad g(x,u) = g(x^*,u^*) + \left.\tfrac{\partial g}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial g}{\partial u}\right|_* \Delta u + O(\Delta)^2, \tag{2.3}$$

with $\left.\tfrac{\partial\,\cdot}{\partial\,\cdot}\right|_*$ the partial derivative in the stationary point $(x^*, u^*)$ and $O(\Delta)^2$ representing the error of the approximation. The original set of differential equations can be approximated in the stationary point by a linearization valid for small perturbations

$$\Delta\dot{x} = \left.\tfrac{\partial f}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial f}{\partial u}\right|_* \Delta u, \qquad \Delta y = \left.\tfrac{\partial g}{\partial x}\right|_* \Delta x + \left.\tfrac{\partial g}{\partial u}\right|_* \Delta u, \qquad \Delta x(t_0) = x_0 - x^*, \tag{2.4}$$

or in matrix notation

$$\begin{bmatrix} \Delta\dot{x} \\ \Delta y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta u \end{bmatrix}, \tag{2.5}$$

where

$$A = \left.\tfrac{\partial f}{\partial x}\right|_* \in \mathbb{R}^{n_x \times n_x}, \quad B = \left.\tfrac{\partial f}{\partial u}\right|_* \in \mathbb{R}^{n_x \times n_u}, \quad C = \left.\tfrac{\partial g}{\partial x}\right|_* \in \mathbb{R}^{n_y \times n_x}, \quad D = \left.\tfrac{\partial g}{\partial u}\right|_* \in \mathbb{R}^{n_y \times n_u}. \tag{2.6}$$

The matrix quadruple $A, B, C, D$ is referred to as the system matrices in linear systems theory (e.g. Zhou et al., 1995).

Linear model order reduction

Consider the linearization (2.5) of the set of nonlinear ordinary differential equations defined in Equation (2.1) in an equilibrium point

$$\begin{bmatrix} \dot{x} \\ y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix}, \tag{2.7}$$

where for the sake of notation $\Delta x$ and $\Delta u$ are replaced by $x$ and $u$, respectively. With the matrices $A, B, C, D$ fixed in time we have defined a so-called linear time-invariant (lti) system with sorted eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ such that

$$0 \leq |\lambda_1| \leq |\lambda_2| \leq \ldots \leq |\lambda_n|. \tag{2.8}$$

The system has a two-time-scale property if there exist one or more eigenvalue gaps defined by

$$\frac{|\lambda_p|}{|\lambda_{p+1}|} \ll 1, \qquad p \in \{1, \ldots, n\}. \tag{2.9}$$

Suppose we can distinguish between slow and fast dynamics by splitting $x$ into $x_1$ and $x_2$, respectively. The system is then in the so-called standard form of a singularly perturbed system (Kokotovic et al., 1986),

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ u \end{bmatrix}. \tag{2.10}$$

Kokotovic et al. (1986) derived a sufficient condition for the two-time-scale property

$$\|A_{22}^{-1}\| < \frac{1}{\|A_0\| + \|A_{12}A_{22}^{-1}A_{21}\| + 2\left(\|A_0\|\,\|A_{12}A_{22}^{-1}A_{21}\|\right)^{1/2}}, \tag{2.11}$$

where

$$A_0 = A_{11} - A_{12}A_{22}^{-1}A_{21}. \tag{2.12}$$
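Stepping back to the linearization (2.4)–(2.6): when analytic derivatives are unavailable, the system matrices can be obtained numerically by finite differences. The following minimal sketch is not from the thesis; the two-state toy model `f`, `g` and the helper `jacobian` are invented for illustration:

```python
import numpy as np

def jacobian(fun, x0, eps=1e-7):
    """Forward-difference Jacobian of fun at x0."""
    f0 = np.asarray(fun(x0))
    J = np.zeros((f0.size, x0.size))
    for i in range(x0.size):
        dx = np.zeros_like(x0)
        dx[i] = eps
        J[:, i] = (np.asarray(fun(x0 + dx)) - f0) / eps
    return J

# Toy nonlinear model dx/dt = f(x, u), y = g(x, u)  (illustrative only)
def f(x, u):
    return np.array([-x[0] + x[1]**2 + u[0], -2.0 * x[1] + u[0]])

def g(x, u):
    return np.array([x[0] + x[1]])

# Equilibrium point: f(x*, u*) = 0 holds at the origin for this toy model
x_star = np.array([0.0, 0.0])
u_star = np.array([0.0])

A = jacobian(lambda x: f(x, u_star), x_star)   # df/dx at (x*, u*)
B = jacobian(lambda u: f(x_star, u), u_star)   # df/du at (x*, u*)
C = jacobian(lambda x: g(x, u_star), x_star)   # dg/dx at (x*, u*)
D = jacobian(lambda u: g(x_star, u), u_star)   # dg/du at (x*, u*)
```

Forward differences are the crudest choice; in practice central differences or automatic differentiation give more accurate system matrices.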
The proof is given in Kokotovic et al. (1986). Assuming $A_{22}$ is nonsingular, $x_2 = -A_{22}^{-1}A_{21}x_1 - A_{22}^{-1}B_2 u$ can be substituted, which yields the singularly perturbed slow approximation, where we assume the fast dynamics associated with $\dot{x}_2$ to be infinitely fast,

$$\begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & B_1 - A_{12}A_{22}^{-1}B_2 \\ C_1 - C_2 A_{22}^{-1}A_{21} & D - C_2 A_{22}^{-1}B_2 \end{bmatrix} \begin{bmatrix} x_1 \\ u \end{bmatrix}. \tag{2.13}$$

The order of the approximation is reduced by the number of differential equations associated with $x_2$. Note that a possible sparse structure present in the system matrices $A_{11}$ and $A_{22}$ is destroyed by the substitution of $x_2$, due to the inversion of the matrix $A_{22}$, whose inverse in general is not sparse. The motivation for this reduction approach is that in general we are more interested in the slow process dynamics, since the slow dynamics dominate controller design and in practice limit performance.

A different approximation approach is truncation. Suppose we can distinguish between varying and constant states by splitting $x$ into $x_1$ and $x_2$, respectively. We approximate the slow dynamics associated with $\dot{x}_2$ to be infinitely slow. So by substitution of $x_2 = c$ and elimination of $\dot{x}_2$ we can derive a truncated model

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ u \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & B_1 \\ C_1 & C_2 & D \end{bmatrix} \begin{bmatrix} x_1 \\ c \\ u \end{bmatrix}. \tag{2.14}$$

Note that a possibly present sparse structure in $A_{11}$ is preserved in the truncated model. However, it is rare that the original system is in this format. In general a sparse structure will already be destroyed by the transformation into the format that is suitable for truncation. Both reduction techniques require a partition of the differential equations into slow and fast dynamics, or into varying and constant states, respectively. In practice the differential equations are not in this standard form and need to be sorted first. Sorting differential equations can be done by a permutation matrix, which can be considered a special transformation: $x_p = Px$, where $x_p$ is the permuted state vector and $P$ a permutation matrix.
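Returning briefly to the two linear reduction steps: the practical difference between the residualization (2.13) and the truncation (2.14) shows up in the steady-state gain. A minimal numerical sketch (not from the thesis) with invented scalar blocks, where $x_2$ carries the fast dynamics and the constant $c$ of (2.14) is taken as zero:

```python
import numpy as np

# Partitioned LTI system (illustrative numbers; x2 is the fast part)
A11 = np.array([[-1.0]]); A12 = np.array([[0.5]])
A21 = np.array([[0.2]]);  A22 = np.array([[-100.0]])   # fast eigenvalue
B1  = np.array([[1.0]]);  B2  = np.array([[1.0]])
C1  = np.array([[1.0]]);  C2  = np.array([[1.0]])
D   = np.array([[0.0]])

# Residualization (singular perturbation), Eq. (2.13)
A22inv = np.linalg.inv(A22)
Ar = A11 - A12 @ A22inv @ A21
Br = B1  - A12 @ A22inv @ B2
Cr = C1  - C2  @ A22inv @ A21
Dr = D   - C2  @ A22inv @ B2

# Truncation with c = 0 keeps only (A11, B1, C1, D)

# Residualization matches the steady-state (DC) gain of the full model
A = np.block([[A11, A12], [A21, A22]])
B = np.vstack([B1, B2])
C = np.hstack([C1, C2])
dc_full = D  - C  @ np.linalg.inv(A)  @ B
dc_res  = Dr - Cr @ np.linalg.inv(Ar) @ Br
```

Residualization reproduces the DC gain of the full model exactly, which is why it is preferred when steady-state accuracy matters; the truncated model does not have this property.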
Permutation does not affect the input to output behavior of the system. The original (permuted) coordinate system is seldom a coordinate system suitable for model reduction. In general we need a so-called similarity transformation to arrive at a suitable coordinate system. From linear system theory we know that a similarity transformation without reduction does not affect the input to output behavior. The proof is short and simple. The input to output behavior can be described by a so-called transfer function in the Laplace domain. The Laplace transfer function $G(s)$, which can be computed from the system matrices, is

$$G(s) = \frac{y(s)}{u(s)} = C\,(sI - A)^{-1}B + D. \tag{2.15}$$

Transformation to another coordinate system $z$ by means of an invertible matrix $T \in \mathbb{C}^{n_x \times n_x}$ does not affect the transfer function $G(s)$. This can be demonstrated by considering the transformation $z = Tx$ with $TT^{-1} = I$. We can write Equation (2.7) in the new coordinate system

$$\begin{bmatrix} \dot{z} \\ y \end{bmatrix} = \begin{bmatrix} TAT^{-1} & TB \\ CT^{-1} & D \end{bmatrix} \begin{bmatrix} z \\ u \end{bmatrix}, \tag{2.16}$$

which gives us the new system matrices in the transformed domain. The transfer function for the transformed system is

$$G(s) = CT^{-1}\left(sI - TAT^{-1}\right)^{-1}TB + D. \tag{2.17}$$

Using a matrix equality for square and invertible matrices

$$X^{-1}Y^{-1}Z^{-1} = (ZYX)^{-1}, \tag{2.18}$$

where $X = T$, $Y = (sI - TAT^{-1})$ and $Z = T^{-1}$, we can show by substitution that

$$T^{-1}\left(sI - TAT^{-1}\right)^{-1}T = (sI - A)^{-1}, \tag{2.19}$$

which finishes the proof. Transformation to a new coordinate system is crucial for the properties of the resulting reduced model, along with the choice for either truncation or singular perturbation. Singular perturbation, truncation and transformation can be applied to a set of nonlinear differential equations, as we will see next.

Nonlinear model order reduction

Let us consider the set of nonlinear ordinary differential equations defined in (2.1) and assume that the system is in the standard form of a singularly perturbed system, with again $x_1$ and $x_2$ associated with the slow and fast dynamics, respectively.
We can reduce the number of differential equations by approximating the fast dynamics by infinitely fast dynamics, letting $\varepsilon \to 0$, which results in an algebraic constraint

$$\begin{bmatrix} \dot{x}_1 \\ \varepsilon\dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ 0 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix}. \tag{2.20}$$

Note that the set of ordinary differential equations is turned into a set of differential and algebraic equations, but unlike for the linear model these equations cannot in general be eliminated by substitution. Suppose instead that we partitioned our system such that $x_1$ and $x_2$ represent the varying and the very slowly varying states, respectively. We can then approximate the model by truncation, where we assume the slow dynamics to be infinitely slow, i.e. constant, $x_2 = x_2^*$,

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, u) \\ f_2(x_1, x_2, u) \\ g(x_1, x_2, u) \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} \dot{x}_1 \\ y \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2^*, u) \\ g(x_1, x_2^*, u) \end{bmatrix}. \tag{2.21}$$

A proper choice for $x_2^*$ is a steady-state value for the original system. This can be computed from $0 = f_1(x_1^*, x_2^*, u^*)$, $0 = f_2(x_1^*, x_2^*, u^*)$ and $y^* = g(x_1^*, x_2^*, u^*)$. The truncated model is again a set of ordinary differential equations, like the model we reduced. Just like for linear systems we can apply a linear coordinate transformation $z = T(x - x^*)$,

$$\begin{bmatrix} \dot{z} \\ y \end{bmatrix} = \begin{bmatrix} T f(T^{-1}z + x^*, u) \\ g(T^{-1}z + x^*, u) \end{bmatrix}, \qquad z(t_0) = T(x_0 - x^*). \tag{2.22}$$

This coordinate transformation is required to do model reduction in a suitable coordinate system. The transformation can be a permutation of the differential equations but can also involve a full state transformation. In the case of a full state transformation the new states cannot be associated with physical states any more, but the relation between the new and the old states is fixed by the transformation matrix. In an attempt to prevent this, Tatrai et al. (1994a, 1994b) and Robertson et al. (1996a, 1996b) tried to associate eigenvalues with states in the original coordinate system using a linearization and a homotopy parameter.
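The nonlinear truncation (2.21) can be illustrated by simulation. A minimal sketch (the toy model is invented, not from the thesis) in which $x_2$ is very slowly varying; the truncated one-state model with $x_2$ frozen at its steady-state value is compared against the full two-state model:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy model: x1 varies, x2 is very slow (illustrative rates)
def full(t, x, u):
    x1, x2 = x
    return [-1.0 * x1 + x2 + u,       # f1: fast-ish dynamics
            -0.001 * (x2 - 1.0)]      # f2: very slow drift toward 1.0

x2_star = 1.0  # steady-state value used for truncation, Eq. (2.21)

def truncated(t, x1, u):
    return [-1.0 * x1[0] + x2_star + u]

u = 0.5
# Full model starts with x2 slightly off its steady state
sol_full = solve_ivp(full, (0.0, 5.0), [0.0, 1.05], args=(u,), rtol=1e-8)
sol_red  = solve_ivp(truncated, (0.0, 5.0), [0.0], args=(u,), rtol=1e-8)
```

Over this horizon the truncated model tracks the full model to within a few percent; the remaining error is exactly the drift of $x_2$ away from the frozen value `x2_star`.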
There are different arguments to arrive at a suitable coordinate system; these will be discussed in detail in the next section. The method using empirical Gramians for a balanced reduction is based on the observation that some combinations of states contribute more to the input-output behavior than other combinations of states. A specific transformation arranges the new transformed states in order of descending contribution. The method referred to as reduction by proper orthogonal decomposition is based on the observation that some combinations of states are approximately constant during simulation of predefined input signals. A specific transformation arranges the new transformed states in order of descending excitation.

2.2 Balanced reduction

Balanced reduction can be explained by the Kalman decomposition in block diagonal form, which partitions the system into four sets of state variables, of which the first and second are controllable and the first and third are observable. The fourth set of states is neither controllable nor observable:

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & B_1 \\ 0 & A_{22} & 0 & 0 & B_2 \\ 0 & 0 & A_{33} & 0 & 0 \\ 0 & 0 & 0 & A_{44} & 0 \\ C_1 & 0 & C_3 & 0 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ u \end{bmatrix}. \tag{2.23}$$

The only relevant part of this model is the part that is both controllable and observable, since this is the part that can be affected by a controller in feedback. This decomposition is invariant under similarity transformation. Small perturbations of the system matrices $B$ and $C$ will prevent this exact decomposition, and in case the system matrices are not analytically determined but are the result of some numerical routine, all zeros are represented by small values:

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 & 0 & B_1 \\ 0 & A_{22} & 0 & 0 & B_2 \\ 0 & 0 & A_{33} & 0 & \varepsilon B_3 \\ 0 & 0 & 0 & A_{44} & \varepsilon B_4 \\ C_1 & \varepsilon C_2 & C_3 & \varepsilon C_4 & D \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ u \end{bmatrix}. \tag{2.24}$$

Observe that compared to the unperturbed case $x_2$ is still controllable but now weakly observable, $x_3$ is still observable but now weakly controllable, and $x_4$ is now weakly controllable and weakly observable. We can now demonstrate the effect of scaling.
Assume we multiply x_2 by ε and divide x_3 by ε; the scaled system matrices are then defined by

\[
\begin{bmatrix} \dot{x}_1 \\ \varepsilon\dot{x}_2 \\ \tfrac{1}{\varepsilon}\dot{x}_3 \\ \dot{x}_4 \\ y \end{bmatrix} =
\begin{bmatrix}
A_{11} & 0 & 0 & 0 & B_1 \\
0 & A_{22} & 0 & 0 & \varepsilon B_2 \\
0 & 0 & A_{33} & 0 & B_3 \\
0 & 0 & 0 & A_{44} & \varepsilon B_4 \\
C_1 & C_2 & \varepsilon C_3 & \varepsilon C_4 & D
\end{bmatrix}
\begin{bmatrix} x_1 \\ \varepsilon x_2 \\ \tfrac{1}{\varepsilon} x_3 \\ x_4 \\ u \end{bmatrix}.
\tag{2.25}
\]

Compared to the unscaled case, x_2 is now strongly observable but weakly controllable, and the reverse holds for x_3. By scaling, strongly controllable states that are weakly observable can be interchanged with weakly controllable states that are strongly observable. For demonstration purposes we chose not to perturb the A matrix, but the effect of small nonzero values instead of zeros on the observability and controllability properties is similar.

The Kalman decomposition is a theoretical decomposition that helps to understand the concepts of observability and controllability, but it cannot be applied in practice. This is caused by the presence of many small values in the system matrices. Furthermore, we demonstrated the effect of scaling on observability and controllability: observability and controllability are not invariant under scaling, whereas we know that the input-output behavior is. This naturally leads us to model approximation by balanced truncation as posed by Moore (1981). After a specific transformation, each state of the transformed system is equally controllable and observable, or so-called balanced. Furthermore, the states are ordered in descending degree of controllability and observability.

Linear balanced reduction

Let us consider the continuous time stable linear time-invariant system (2.7). The controllability Gramian and the observability Gramian, respectively P and Q, are defined as

\[
P = \int_0^\infty e^{At} B B^T e^{A^T t}\,dt, \qquad
Q = \int_0^\infty e^{A^T t} C^T C e^{At}\,dt,
\tag{2.26}
\]

and satisfy the Lyapunov equations

\[
A P + P A^T + B B^T = 0, \qquad
A^T Q + Q A + C^T C = 0.
\tag{2.27}
\]

The quadruple A, B, C, D is a balanced realization if and only if

\[
P = Q = \Sigma,
\tag{2.28}
\]

where Σ is diagonal with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n.
If the system is not a balanced realization, there exists a transformation after which the transformed system is a balanced realization. The diagonal elements of Σ are also referred to as the Hankel singular values of the system, which are invariant under transformation. The similarity transformation T that balances the system, provided the system is observable, controllable and stable, can be derived from the controllability Gramian and observability Gramian (see e.g. Zhou, 1995):

\[
P = R^T R, \qquad
R Q R^T = U \Sigma^2 U^T, \qquad
T = \Sigma^{1/2} U^T R^{-T},
\tag{2.29}
\]

which finalizes the approach in the continuous time linear case. A more detailed description can be found in Appendix A.1.

Since we will work in this chapter with discrete time models, we also present the discrete time observability and controllability Gramian definitions. Let us therefore define a discrete time model for the linear time-invariant system defined in Equation (2.7),

\[
\begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} =
\begin{bmatrix} F & G \\ C & D \end{bmatrix}
\begin{bmatrix} x_k \\ u_k \end{bmatrix},
\tag{2.30}
\]

where F and G are discrete transition matrices defined as

\[
F = e^{A t_s} \qquad \text{and} \qquad G = \int_0^{t_s} e^{A(t_s - \tau)} B \, d\tau,
\tag{2.31}
\]

for piecewise constant (zero-order hold) input signals, and where t_s is the sample interval. For a discrete time linear system (2.30) the controllability and observability Gramians are defined as

\[
W_c = \sum_{k=0}^{\infty} F^k G G^T (F^T)^k, \qquad
W_o = \sum_{k=0}^{\infty} (F^T)^k C^T C F^k,
\tag{2.32}
\]

which satisfy the Lyapunov equations

\[
F W_c F^T + G G^T - W_c = 0, \qquad
F^T W_o F + C^T C - W_o = 0.
\tag{2.33}
\]

The discrete time system defined by the system matrix quadruple F, G, C, D is balanced if and only if

\[
W_c = W_o = \Sigma_d,
\tag{2.34}
\]

where the diagonal of Σ_d contains the Hankel singular values of the discrete time system. In case a discrete system is not a balanced realization, there exists a transformation after which the transformed system is balanced. This transformation can be computed as described in Equation (2.29), but with the discrete time controllability and observability Gramians.
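As a numerical illustration, the sketch below applies the balancing transformation of (2.29) to a small discrete-time system, with the Gramians of (2.32) approximated by finite sums. The system matrices are an arbitrary stable example, not taken from the thesis.

```python
import numpy as np

# Discrete-time Gramians by direct summation (Eq. 2.32), then the balancing
# transformation of Eq. (2.29): Wc = R'R, R Wo R' = U S^2 U', T = S^(1/2) U' R^(-T).
F = np.array([[0.9, 0.1], [0.0, 0.5]])      # arbitrary stable example
G = np.array([[1.0], [0.2]])
C = np.array([[1.0, 0.5]])

Wc = np.zeros((2, 2)); Wo = np.zeros((2, 2)); Fk = np.eye(2)
for _ in range(300):                         # truncated infinite sums
    Wc += Fk @ G @ G.T @ Fk.T
    Wo += Fk.T @ C.T @ C @ Fk
    Fk = Fk @ F

R = np.linalg.cholesky(Wc).T                 # factor with Wc = R' R
s2, U = np.linalg.eigh(R @ Wo @ R.T)         # R Wo R' = U diag(s2) U'
S = np.sqrt(s2)                              # Hankel singular values (ascending)
T = np.diag(S ** 0.5) @ U.T @ np.linalg.inv(R.T)

Wc_bal = T @ Wc @ T.T                        # both transformed Gramians
Wo_bal = np.linalg.inv(T).T @ Wo @ np.linalg.inv(T)
print(np.round(Wc_bal, 8), np.round(Wo_bal, 8))   # both equal diag(S)
```

Note that `numpy.linalg.eigh` returns the eigenvalues in ascending order, so the balanced states come out in ascending rather than descending order of their Hankel singular values; a permutation fixes this if needed.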
The finite time discrete time controllability and observability Gramians are defined as

\[
W_c = \sum_{k=0}^{N} F^k G G^T (F^T)^k, \qquad
W_o = \sum_{k=0}^{N} (F^T)^k C^T C F^k,
\tag{2.35}
\]

where N is a sufficiently large finite number instead of ∞. These Gramians are approximate solutions of the Lyapunov Equations (2.33) and will be used later on in this chapter.

Numerical issues

The transformation to a balanced realization of the original system can be numerically ill conditioned. This is caused by the small Hankel singular values that appear in the transformation matrix: inversion of these small Hankel singular values exposes the numerical problem. The problem can be circumvented in case we are only interested in the approximate reduced model. Decompose the Gramians as

\[
P = U_c \Sigma_c^2 U_c^T,
\tag{2.36}
\]
\[
Q = U_o \Sigma_o^2 U_o^T.
\tag{2.37}
\]

Define the Hankel matrix

\[
H := \Sigma_o U_o^T U_c \Sigma_c,
\tag{2.38}
\]

and define the singular value decomposition of H,

\[
H = U_H \Sigma_H V_H^T.
\tag{2.39}
\]

Then the transformation matrix T is defined as

\[
T := \Sigma_H^{1/2} V_H^T \Sigma_c^{-1} U_c^T,
\tag{2.40}
\]

with inverse

\[
T^{-1} = U_c \Sigma_c V_H \Sigma_H^{-1/2},
\tag{2.41}
\]

or, dually, we can define T as

\[
T := \Sigma_H^{-1/2} U_H^T \Sigma_o U_o^T,
\tag{2.42}
\]

with inverse

\[
T^{-1} = U_o \Sigma_o^{-1} U_H \Sigma_H^{1/2}.
\tag{2.43}
\]

Transformation T brings the system (2.7) into a balanced realization. The proof follows by substitution of the transformation matrices and the Gramian decompositions of P and Q into the Lyapunov equations (2.27). In case we are only interested in the reduced-order model, we can derive a well-conditioned projection

\[
\begin{bmatrix} \dot{z}_1 \\ y \end{bmatrix} =
\begin{bmatrix} T_L A T_R^{-1} & T_L B \\ C T_R^{-1} & D \end{bmatrix}
\begin{bmatrix} z_1 \\ u \end{bmatrix},
\tag{2.44}
\]

where

\[
T_L = \Sigma_{H_1}^{-1/2} U_{H_1}^T \Sigma_o U_o^T,
\tag{2.45}
\]

and

\[
T_R^{-1} = U_c \Sigma_c V_{H_1} \Sigma_{H_1}^{-1/2},
\tag{2.46}
\]

with H partitioned as

\[
H = \begin{bmatrix} U_{H_1} & U_{H_2} \end{bmatrix}
\begin{bmatrix} \Sigma_{H_1} & 0 \\ 0 & \Sigma_{H_2} \end{bmatrix}
\begin{bmatrix} V_{H_1}^T \\ V_{H_2}^T \end{bmatrix}.
\tag{2.47}
\]

Observe that the small Hankel singular values are truncated, which yields a well-conditioned computation of the projection matrices. Let us look at a special case to gain more insight into Gramian based balancing.
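The well-conditioned projection (2.44)-(2.47) can be sketched as follows, here in the discrete-time setting with the finite-sum Gramians of (2.35). The third-order example system is an arbitrary construction (not from the thesis), reduced to first order; note that only the retained Hankel singular values are inverted.

```python
import numpy as np

# Square-root style projection (Eqs. 2.36-2.47), discrete-time analogue:
# only Sigma_H1 (retained Hankel singular values) appears inverted.
F = np.diag([0.9, 0.6, 0.3])                # arbitrary stable example
G = np.array([[1.0], [0.4], [0.1]])
C = np.array([[1.0, 0.4, 0.1]])

Wc = np.zeros((3, 3)); Wo = np.zeros((3, 3)); Fk = np.eye(3)
for _ in range(400):                        # finite-time Gramians (Eq. 2.35)
    Wc += Fk @ G @ G.T @ Fk.T
    Wo += Fk.T @ C.T @ C @ Fk
    Fk = Fk @ F

lc, Uc = np.linalg.eigh(Wc); Sc = np.sqrt(np.maximum(lc, 0))
lo, Uo = np.linalg.eigh(Wo); So = np.sqrt(np.maximum(lo, 0))
H = np.diag(So) @ Uo.T @ Uc @ np.diag(Sc)   # Hankel matrix (Eq. 2.38)
UH, sH, VHt = np.linalg.svd(H)              # Eq. (2.39), descending sH

k = 1                                       # retained order
TL = np.diag(sH[:k] ** -0.5) @ UH[:, :k].T @ np.diag(So) @ Uo.T   # Eq. (2.45)
TR = Uc @ np.diag(Sc) @ VHt[:k, :].T @ np.diag(sH[:k] ** -0.5)    # Eq. (2.46)

Fr, Gr, Cr = TL @ F @ TR, TL @ G, C @ TR    # reduced realization (Eq. 2.44)
dc_full = (C @ np.linalg.inv(np.eye(3) - F) @ G)[0, 0]
dc_red = (Cr @ np.linalg.inv(np.eye(k) - Fr) @ Gr)[0, 0]
print(dc_full, dc_red)   # close: the dominant state carries most of the gain
```

By construction TL·TR = I on the retained subspace, so (2.44) is an oblique projection; the small values in Σ_{H_2} never enter any inversion.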
For this special case we define a stable linear system with one input and one output, where Λ is diagonal with the sorted eigenvalues |λ_1| < |λ_2| < ⋯ < |λ_n| on its diagonal, B = [ε^0, ε^1, …, ε^{n−1}]^T with ε < 1, C = B^T and D = 0. For a third order system the example expands to

\[
\begin{bmatrix} \dot{x} \\ y \end{bmatrix} =
\begin{bmatrix} \Lambda & B \\ C & D \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ y \end{bmatrix} =
\begin{bmatrix}
\lambda_1 & 0 & 0 & 1 \\
0 & \lambda_2 & 0 & \varepsilon \\
0 & 0 & \lambda_3 & \varepsilon^2 \\
1 & \varepsilon & \varepsilon^2 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ u \end{bmatrix}.
\tag{2.48}
\]

This form is not a balanced realization, although the system is perfectly symmetric and the input-to-output contribution of each state decreases. This can be verified with the corresponding Lyapunov equations as defined in (2.27). Substitution of the system matrices yields

\[
\left.
\begin{aligned}
\Lambda P + P \Lambda^T + B B^T &= 0 \\
\Lambda^T Q + Q \Lambda + C^T C &= 0
\end{aligned}
\right\}
\;\longrightarrow\;
\left.
\begin{aligned}
\Lambda P + P \Lambda + B B^T &= 0 \\
\Lambda Q + Q \Lambda + B B^T &= 0
\end{aligned}
\right\}
\;\longrightarrow\; P = Q.
\tag{2.49}
\]

The system would be a balanced realization if P = Q = Σ with Σ diagonal, with the Hankel singular values on its diagonal. In this example P cannot be diagonal, because Λ is diagonal while BB^T is not. The transformation that brings the system into a balanced realization is U^T, derived from the singular value decomposition of P, where P = UΣU^T. Substitution into the Lyapunov equations yields

\[
\Lambda U \Sigma U^T + U \Sigma U^T \Lambda + B B^T = 0, \qquad
\Lambda U \Sigma U^T + U \Sigma U^T \Lambda + C^T C = 0.
\tag{2.50}
\]

Pre-multiplication by U^T and post-multiplication by U gives, with Ã := U^T Λ U, B̃ := U^T B and C̃ := C U,

\[
\left.
\begin{aligned}
U^T \Lambda U \Sigma + \Sigma U^T \Lambda U + U^T B B^T U &= 0 \\
U^T \Lambda U \Sigma + \Sigma U^T \Lambda U + U^T C^T C U &= 0
\end{aligned}
\right\}
\;\longrightarrow\;
\left.
\begin{aligned}
\tilde{A} \Sigma + \Sigma \tilde{A}^T + \tilde{B} \tilde{B}^T &= 0 \\
\tilde{A}^T \Sigma + \Sigma \tilde{A} + \tilde{C}^T \tilde{C} &= 0
\end{aligned}
\right\}.
\tag{2.51}
\]

The balanced realization of the example is

\[
\begin{bmatrix} \dot{z} \\ y \end{bmatrix} =
\begin{bmatrix} U^T \Lambda U & U^T B \\ C U & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix} =
\begin{bmatrix} \tilde{A} & \tilde{B} \\ \tilde{C} & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix}.
\tag{2.52}
\]

If ε ≪ 1 in the example, U approaches the identity matrix, and in that case the example already is in a balanced realization.

Empirical Gramians

For linear systems we used Gramians to compute a balanced realization suitable for model reduction by truncation. These Gramians are the solutions of two Lyapunov equations that can be solved directly. For nonlinear systems we cannot derive Gramians in this way. The idea behind empirical Gramians is to derive Gramians from data generated by simulation.
If it is possible to construct Gramians from simulated data for linear systems, the same technique can be applied to nonlinear systems, since simulation of a nonlinear system is possible. In this way it would be possible to construct a Gramian that is associated with a nonlinear system. Lall (1999) introduced the idea of empirical Gramians. Empirical Gramians are closely related to the covariance matrix introduced by Pallaske (1987),

\[
M = \int_G \int_0^\infty (x(t) - x^*)(x(t) - x^*)^T \, dt \, dG,
\tag{2.53}
\]

although Pallaske did not relate the approach to balancing. The symbol G denotes a set of trajectories resulting from a variation of initial conditions and input signals. Löffler and Marquardt (1991) further elaborated the approach and suggested using a set of step responses to generate data that represent the system's dynamics. Lall (1999) reconstructs the Gramians from data generated by either an impulse response, for the controllability Gramian, or the response to an initial condition, for the observability Gramian. Because of the orthogonality of the impulses and initial conditions, the data add up to the controllability and observability Gramian, respectively. Lall defined the following sets for the empirical Gramians:

\[
\begin{aligned}
T^n &= \{T_1, \ldots, T_r \mid T_l \in \mathbb{R}^{n \times n},\; T_l T_l^T = I,\; l = 1, \ldots, r\}, \\
M &= \{c_1, \ldots, c_s \mid c_m \in \mathbb{R}^+,\; m = 1, \ldots, s\}, \\
E^n &= \{e_1, \ldots, e_n \mid \text{standard unit vectors in } \mathbb{R}^n\},
\end{aligned}
\tag{2.54}
\]

where r is the number of different perturbation orientations, s is the number of different perturbation magnitudes, and n is the number of inputs of the system for the controllability Gramian and the number of states of the full-order system for the observability Gramian. Lall does not motivate the choice of these sets, nor does he give an interpretation of his definition of empirical Gramians. He simply presents the definitions and proves that for a linear system the empirical Gramian coincides with the classical definition of a Gramian.

Empirical controllability Gramian.
Let T^p, M and E^p be given as described above, where p is the number of inputs of the system. The empirical controllability Gramian for system (2.7) is defined by

\[
P = \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{i=1}^{p} \frac{1}{r s c_m^2} \int_0^\infty \Phi^{ilm}(t) \, dt,
\tag{2.55}
\]

where Φ^{ilm}(t) ∈ R^{n×n} is given by

\[
\Phi^{ilm}(t) := (x^{ilm}(t) - x_{ss})(x^{ilm}(t) - x_{ss})^T,
\tag{2.56}
\]

and x^{ilm}(t) is the state of the system corresponding to the impulse response u(t) = c_m T_l e_i δ(t) + u_ss with initial condition x_0 = x_ss. The proof can be found in Lall (1999).

The definition of the empirical controllability Gramian can be explained as follows. The first summation in the empirical controllability Gramian can be given the interpretation of a rotation of the standard unit vectors in R^p. Infinitely many different rotations would form a unit sphere in R^p with its origin located at u_ss. Each of these rotated unit vectors is used as a direction for a Dirac pulse to excite the system. By simulating the free response of the system to this Dirac pulse we can see how the energy is absorbed by the system and manifests itself in the state trajectories. This gives a measure for the controllability of that specific input direction. If the Dirac pulse results in a large excursion in a specific direction of the state space, we call that direction a well controllable subspace. In order to search the state space for nonlinear behavior, the input space is gridded. This gridding is done by defining different rotations and different amplitudes. The different amplitudes explain the second summation in the definition of the controllability Gramian. Since the different responses are the result of Dirac pulses with different energy levels, this energy level is compensated for to make the responses comparable, which explains the division of each response by c_m^2.
The rotation of unit vectors makes sure that all input directions are equally represented, which is not only intuitively the right thing to do but also appears to be a necessity for the construction of the empirical controllability Gramian in this way.

Empirical observability Gramian. Let T^n, M and E^n be given as described above, where n is the number of states of the original system. The empirical observability Gramian for system (2.7) is defined by

\[
Q = \sum_{l=1}^{r} \sum_{m=1}^{s} \frac{1}{r s c_m^2} \int_0^\infty T_l \Psi^{lm}(t) T_l^T \, dt,
\tag{2.57}
\]

where Ψ^{lm}(t) ∈ R^{n×n} is given by

\[
\Psi^{lm}_{ij}(t) := (y^{ilm}(t) - y_{ss})^T (y^{jlm}(t) - y_{ss}),
\tag{2.58}
\]

and y^{ilm}(t) is the output of the system corresponding to the initial condition x_0 = c_m T_l e_i + x_ss, and y_ss is the steady-state output corresponding to the input u(t) = u_ss. The proof can be found in Lall (1999).

The first summation in the empirical observability Gramian can be given the interpretation of a rotation of the standard unit vectors in R^n. Infinitely many different rotations would form a unit sphere in R^n with its origin located at the steady state of the system. Each of these rotated unit vectors is used as a perturbation of the steady state x_ss of the system. By simulating the free response of the system from this initial condition we can see how the energy manifests itself in the output. This gives a measure for the observability of that specific direction in the state space. If it takes little energy to drive the system in that direction of the state space and the energy strongly manifests itself in the output, we call that direction a well controllable and observable subspace. In order to search the state space for nonlinear behavior, the state space is gridded. This gridding is done by defining different rotations and different amplitudes. The different amplitudes form the second summation in the definition of the observability Gramian.
Since the different responses start with different energy levels, this energy level is compensated for to make the responses comparable, which explains the division of each response by c_m^2. The rotation of unit vectors makes sure that all directions are equally represented, which is not only intuitively the right thing to do but also appears to be a necessity for the construction of the empirical observability Gramian in this way.

Hahn et al. (2000) translated this into a discrete time version based on the same definitions as Lall. Roughly speaking, Hahn replaced the integral in the empirical Gramians by a finite sum approximation of the sampled continuous time system, where x_k is shorthand notation for x|_{t=kΔt}.

Discrete time empirical controllability Gramian. Let T^p, M and E^p be given as described above, where p is the number of inputs of the system. The discrete time empirical controllability Gramian for system (2.30) is defined by

\[
W_c = \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{i=1}^{p} \frac{1}{r s c_m^2} \sum_{k=0}^{q} \Phi_k^{ilm},
\tag{2.59}
\]

where Φ_k^{ilm} ∈ R^{n×n} is given by

\[
\Phi_k^{ilm} := (x_k^{ilm} - x_{ss})(x_k^{ilm} - x_{ss})^T,
\tag{2.60}
\]

and x_k^{ilm} is the state of the system corresponding to the impulse response u_k = c_m T_l e_i δ_{k=0} + u_ss with initial condition x_0 = x_ss.

Discrete time empirical observability Gramian. Let T^n, M and E^n be given as described above, where n is the number of states of the original system. The discrete time empirical observability Gramian for system (2.30) is defined by

\[
W_o = \sum_{l=1}^{r} \sum_{m=1}^{s} \frac{1}{r s c_m^2} \sum_{k=0}^{q} T_l \Psi_k^{lm} T_l^T,
\tag{2.61}
\]

where Ψ_k^{lm} ∈ R^{n×n} is given by

\[
[\Psi_k^{lm}]_{ij} := (y_k^{ilm} - y_{ss})^T (y_k^{jlm} - y_{ss}),
\tag{2.62}
\]

and y_k^{ilm} is the output of the system corresponding to the initial condition x_0 = c_m T_l e_i + x_ss and input u_k = u_ss.

This approach makes it possible to reconstruct discrete time Gramians from sampled data. If the data are generated by a linear system, the solution will converge to the solution obtained by solving the Lyapunov equations (2.33).
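That convergence is easy to check numerically for the observability case. The sketch below builds the empirical Gramian of (2.61)-(2.62) for an arbitrary linear example (not from the thesis), with one rotation (T_1 = I), one amplitude (c = 1) and x_ss = 0, and compares it with the direct sum of (2.32).

```python
import numpy as np

# Discrete-time empirical observability Gramian (Eqs. 2.61-2.62) for a linear
# system, with r = 1 (identity rotation), s = 1, c = 1, x_ss = 0, y_ss = 0.
F = np.array([[0.8, 0.2], [0.0, 0.6]])     # arbitrary stable example
C = np.array([[1.0, -1.0]])
n, q = 2, 300

# Reference: Wo = sum_k (F')^k C'C F^k
Wo_ref = np.zeros((n, n)); Fk = np.eye(n)
for _ in range(q):
    Wo_ref += Fk.T @ C.T @ C @ Fk
    Fk = Fk @ F

# Empirical construction: perturb x_0 along each unit vector e_i, record the
# free output responses y_k^i, accumulate [Psi_k]_ij = (y_k^i)' y_k^j.
Y = np.zeros((q, 1, n))                    # output at step k for perturbation i
for i in range(n):
    x = np.eye(n)[:, i]
    for k in range(q):
        Y[k, :, i] = C @ x
        x = F @ x
Wo_emp = sum(Y[k].T @ Y[k] for k in range(q))
print(np.allclose(Wo_emp, Wo_ref))         # True for a linear system
```

For a nonlinear simulation the same bookkeeping applies, but the result then depends on the chosen perturbation directions and amplitudes, which is exactly the point of the empirical construction.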
For data generated by a nonlinear system we end up with a so-called empirical Gramian. The empirical Gramian, and the way Hahn derives it, will be assessed later in this chapter.

2.3 Proper orthogonal decomposition

Different names for proper orthogonal decomposition exist, such as Karhunen-Loève expansion or the method of empirical eigenfunctions. It provides a very effective way to compute low order approximate models. The idea of proper orthogonal decomposition is to find an orthogonal transformation maximizing the energy content in the first basis vectors. A common way to determine these optimal basis vectors (e.g. Aling et al., 1996; Shvartsman and Kevrekidis, 1998) is described next. For more references on this topic the reader is referred to the literature section in the previous chapter.

Let us execute a simulation with a model defined by Equations (2.1) and input sequence u(t). Sampling the state trajectories of this simulation provides a so-called snapshot of the system,

\[
X_N = \begin{bmatrix} \Delta x(t_0) & \Delta x(t_1) & \ldots & \Delta x(t_N) \end{bmatrix},
\tag{2.63}
\]

where Δx(t_k) = x(t_k) − x^* and x^* is a steady state. The snapshot matrix X_N ∈ R^{n_x×N} with N ≫ n_x can be an ensemble of different simulations. A singular value decomposition of X_N is defined as

\[
X_N = U \Sigma V^T,
\tag{2.64}
\]

where Σ is diagonal with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n, and with U U^T = I and V V^T = I. X_N can be partitioned as

\[
X_N = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 & 0 \\ 0 & \Sigma_2 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \\ V_3^T \end{bmatrix}.
\tag{2.65}
\]

Supposing that σ_min(Σ_1) ≫ σ_max(Σ_2), we can assume Σ_2 = 0, which implies that

\[
X_N \approx U_1 \Sigma_1 V_1^T, \qquad U_2 \Sigma_2 V_2^T \approx 0.
\tag{2.66}
\]

Apparently the transformation U distinguishes between two subspaces, where most energy is captured by U_1 and the rest of the energy is gathered in U_2. Let us define a new coordinate system

\[
\begin{bmatrix} z_1(t) \\ z_2(t) \end{bmatrix} =
\begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} (x(t) - x^*).
\tag{2.67}
\]

The set of ordinary differential equations defined by Equations (2.1) can be reduced by transformation and truncation,

\[
\begin{bmatrix} \dot{z}_1 \\ z_2 \\ y \end{bmatrix} =
\begin{bmatrix} U_1^T f(U_1 z_1 + x^*, u) \\ 0 \\ g(U_1 z_1 + x^*, u) \end{bmatrix},
\qquad z_1(t_0) = U_1^T (x_0 - x^*).
\tag{2.68}
\]

Or, without elimination of x in the right-hand side, which can be practical from an implementation point of view,

\[
\begin{bmatrix} \dot{z}_1 \\ z_2 \\ y \end{bmatrix} =
\begin{bmatrix} U_1^T f(x, u) \\ 0 \\ g(x, u) \end{bmatrix},
\qquad x = U_1 z_1 + x^*, \qquad z_1(t_0) = U_1^T (x_0 - x^*).
\tag{2.69}
\]

The same transformation can be used to reduce the model by residualization,

\[
\begin{bmatrix} \dot{z}_1 \\ 0 \\ y \end{bmatrix} =
\begin{bmatrix} U_1^T f(U_1 z_1 + U_2 z_2 + x^*, u) \\ U_2^T f(U_1 z_1 + U_2 z_2 + x^*, u) \\ g(U_1 z_1 + U_2 z_2 + x^*, u) \end{bmatrix},
\qquad z_1(t_0) = U_1^T (x_0 - x^*),
\tag{2.70}
\]

which is equivalent to

\[
\begin{bmatrix} \dot{z}_1 \\ 0 \\ y \end{bmatrix} =
\begin{bmatrix} U_1^T f(x, u) \\ U_2^T f(x, u) \\ g(x, u) \end{bmatrix},
\qquad x = U z + x^*, \qquad z_1(t_0) = U_1^T (x_0 - x^*).
\tag{2.71}
\]

Recall that in case of residualization we reduce the number of differential equations by transforming the set of ordinary differential equations (ODE) into a set of differential and algebraic equations (DAE).

Proper orthogonal decomposition revised

State transformation and scaling before applying the singular value decomposition have a decisive effect on the transformation and thus on the reduction of the model,

\[
\tilde{X}_N = W X_N = W U \Sigma V^T.
\tag{2.72}
\]

A state coordinate change has a major effect on the model reduction. This is an important observation: reduction by means of proper orthogonal decomposition strongly depends on the specific choice of coordinate system. This can be shown with the special choice W = Σ^{-1}U^T, which yields a coordinate system in which all states are equally important,

\[
\tilde{X}_N = W U \Sigma V^T = \Sigma^{-1} U^T U \Sigma V^T = \tilde{U} \tilde{\Sigma} \tilde{V}^T,
\tag{2.73}
\]

with Ũ = I ∈ R^{n_x×n_x} and Σ̃ = I ∈ R^{n_x×N}.

A pragmatic solution is scaling with a diagonal matrix containing the reciprocals of the differences between the maximal and minimal allowable values of the corresponding variables. In this way all variables are normalized and differences in units are compensated for. Most successes reported in papers on model reduction with proper orthogonal projection consider discretized partial differential equations with only one type of variable (e.g. only temperatures).
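A minimal sketch of the snapshot SVD and truncation in (2.63)-(2.68), using a constructed linear example (not from the thesis) whose state evolves in a low-dimensional subspace; the snapshot SVD recovers that subspace and two basis vectors suffice.

```python
import numpy as np

# POD sketch: snapshots from a simulation, SVD of the snapshot matrix
# (Eq. 2.64), truncation to the dominant left singular vectors (Eq. 2.68).
rng = np.random.default_rng(0)
n, r = 6, 2
# Dynamics constructed so that the state stays in an r-dimensional subspace
U_true = np.linalg.qr(rng.standard_normal((n, r)))[0]
A = U_true @ np.diag([0.9, 0.7]) @ U_true.T
x = U_true @ rng.standard_normal(r)        # start inside that subspace

snaps = []
for _ in range(100):                       # X_N = [dx(t_0) ... dx(t_N)], x* = 0
    snaps.append(x)
    x = A @ x
X = np.column_stack(snaps)

U, s, _ = np.linalg.svd(X, full_matrices=False)
U1 = U[:, :r]                              # retained basis (sigma_1..sigma_r)
err = np.linalg.norm(X - U1 @ (U1.T @ X)) / np.linalg.norm(X)
print(err)                                 # ~0 up to rounding
```

A reduced model in the sense of (2.68) is then `z' = U1.T @ A @ U1 @ z` with `x ≈ U1 @ z`; for a nonlinear right-hand side f, the same projection `U1.T @ f(U1 @ z + x_star, u)` applies.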
When a model consists of variables with different units, their relative importance depends on the specific choice of units. Proper orthogonal decomposition applied to a model involving temperatures and mass, where mass is expressed in kilograms and temperature in degrees Kelvin, will give different results than the same model with mass expressed in tons.

Computation of the singular value decomposition of a snapshot matrix can be quite demanding. This is caused mainly by the computation of V^T, which is an N × N matrix with N very large. In case we are only interested in the orthogonal projection matrix U, we can reduce the computational load by computing the singular value decomposition of X_N X_N^T,

\[
X_N X_N^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T.
\tag{2.74}
\]

In this way we do not compute V^T and still have access to the projection matrix U and the corresponding singular values Σ.

The data for the snapshot matrix are generated by simulation of the model with a specific choice of inputs. A transformation based on this snapshot therefore strongly depends on the choice of input signals. White noise input signals on all input channels do not discriminate between high and low frequency content and will excite the model without preference. In case one has information on which frequency range is relevant for the model to approximate, we can put more energy in this part of the input signal, which results in a better approximation in this frequency range. In case we have a clear idea what the relevant input signals look like, we can fine-tune the model reduction to this set of input signals, which most probably allows for a larger degree of model reduction. However, for signals not represented in the set of input signals used for generation of the snapshots we cannot expect good performance of the reduced model. So, depending on the knowledge of future input signals, this property can either be an advantage or a disadvantage.
In case a model based optimization is used to explore new optimal trajectories, we do not know in advance what the input trajectories will look like. In that case we have to resort to some white, random type of input signal, most probably still allowing for a certain degree of reduction. One can think of the weakly controllable states: regardless of the choice of inputs, these states will receive only little energy from the inputs and will be reduced by this approach anyway. The discrete time controllability Gramian and the snapshot matrix are strongly related in case white noise inputs are used,

\[
X_N X_N^T = N \cdot W_c = N \cdot U_c \Sigma_c^2 U_c^T,
\tag{2.75}
\]

with N the number of samples in the snapshot matrix, W_c the discrete time controllability Gramian, and U_c and Σ_c^2 the orthogonal matrix and singular value matrix of the controllability Gramian, respectively. Note that the transformation matrix used for proper orthogonal projection coincides with the singular value decomposition of the controllability Gramian. See Appendix B for details.

In case a signal other than white noise is used to generate the snapshot matrix, we can consider this signal as the result of a filtered white noise signal. This brings us to the topic of so-called frequency weighted balanced reduction; for details see e.g. Wortelboer (1994). Frequency analysis of the input signals used to generate the snapshot matrix will provide insight into the frequency ranges in which the model is excited. All input data without a white spectrum indicate some kind of frequency weighting. So the observation is that: for a linear system a snapshot matrix times its transpose approximates the (frequency weighted) discrete time controllability Gramian. Dynamics related to input directions and frequency ranges that are not excited will not be present in the data and will therefore be removed from the model. This is not a model property but the result of input signal selection.
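The relation (2.75) holds in expectation for unit-variance white noise and can be checked numerically; the sketch below uses an arbitrary stable example (not from the thesis), with the finite sample size limiting the agreement.

```python
import numpy as np

# Numerical check of Eq. (2.75): for unit-variance white-noise input the
# snapshot matrix of a linear system satisfies X_N X_N' / N ~ Wc.
rng = np.random.default_rng(1)
F = np.array([[0.7, 0.3], [0.0, 0.4]])     # arbitrary stable example
G = np.array([[1.0], [0.5]])

Wc = np.zeros((2, 2)); Fk = np.eye(2)      # Gramian by direct summation
for _ in range(200):
    Wc += Fk @ G @ G.T @ Fk.T
    Fk = Fk @ F

N = 200_000                                # long white-noise simulation
x = np.zeros(2); X = np.empty((2, N))
for k in range(N):
    x = F @ x + G[:, 0] * rng.standard_normal()
    X[:, k] = x
gap = np.abs(X @ X.T / N - Wc).max()
print(gap)                                 # small, shrinking as N grows
```

With a coloured (filtered) input instead of white noise, the same quantity converges to a frequency weighted Gramian, in line with the observation above.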
Note that a model itself acts as a filter, so in case two units are in a series connection, the first acts as a filter on signals that are applied to it before they affect the second unit. Even more interesting is the case where the output of the second unit affects the first unit by means of a material recycle stream or feedback control. Although the above reasoning is based on linear model properties, we can argue that a similar reasoning holds for smooth nonlinear systems.

Singular values can provide information on the possible reduction of the model order. However, stability is a more important property of the reduced model. Unfortunately, no guarantees exist for the stability of projected nonlinear models. Even stronger, a lower order reduced model can be stable for a specific trajectory whereas a higher order reduced model becomes unstable for the same trajectory. Chapter 5 will elaborate on this issue.

Löffler and Marquardt (1991) motivated and applied a weighted Euclidean norm ‖·‖_Q defined as

\[
\|x\|_Q = \sqrt{x^T Q^T Q x},
\tag{2.76}
\]

with a positive definite, square weighting matrix Q. The covariance matrix defined in Equation (2.53) can then be written as

\[
M = \int_G \int_0^\infty Q (x(t) - x^*)(x(t) - x^*)^T Q^T \, dt \, dG.
\tag{2.77}
\]

This weighted covariance matrix can be approximated by Q X_N X_N^T Q^T, which shows that model reduction by a weighted proper orthogonal decomposition is strongly related to model reduction based on the covariance matrix M. If G consists of the same set of simulations that generated X_N, reduction based on proper orthogonal decomposition and reduction based on the covariance matrix will be almost identical.

In this chapter two important model reduction techniques from the literature were discussed and given an interpretation. We will now proceed with a reformulation of the empirical Gramian in a new format that is more flexible and more insightful than the empirical Gramians previously defined in this chapter.
2.4 Balanced reduction revisited

In this section we will focus on the properties and interpretation of empirical Gramians and how they are derived. First we formulate a simpler way to derive the discrete time empirical Gramians introduced by Lall (1999); this will be referred to as perturbed data based empirical Gramians. This formulation allows for a more flexible way of deriving empirical Gramians. Then this formulation is extended to a generalized formulation that allows for a derivation of empirical Gramians with almost arbitrary input trajectories; this will be referred to as generalized empirical Gramians. Finally, we explain how to interpret the empirical Gramian by means of a simple example revealing the true mechanism behind empirical Gramians.

From a theoretical point of view, balanced reduction using Gramians is preferred over reduction by proper orthogonal decomposition, since it really takes the model's contribution between input and output into account and does not depend on the internal coordinate system (see the previous section). The effect of scaling of selected inputs and outputs on the reduced model can qualitatively be predicted with common engineering sense. Scaling of the states is still preferable, but only from a numerical conditioning point of view: theoretically, a state transformation does not affect the result of balanced model reduction.

Discrete time perturbed data based Gramians

Gramians can be reconstructed from data generated by perturbation of the steady state of a linear discrete time system. For the empirical observability Gramian we perturb the initial condition in different directions and simulate the free response to its equilibrium. For the empirical controllability Gramian we apply Dirac pulses in different directions via the inputs and simulate the free responses. This is the basis on which the empirical Gramians by Lall (1999) were defined.
The constraint enforced by Lall is that each set of perturbations is orthogonal. This constraint is crucial for the computation of the empirical Gramians as formulated in his way. The restriction is somewhat artificial, can be very impractical, and above all is unnecessary, as we will demonstrate next.

For this reformulation new data matrices are required, which are defined next. Suppose a stable linear discrete time system as defined in (2.30) is in equilibrium with u^* = 0 and therefore x^* = 0. Let us define Y_N as an output response data matrix

\[
Y_N = \begin{bmatrix}
y_0^1 & y_0^2 & \cdots & y_0^p \\
y_1^1 & y_1^2 & \cdots & y_1^p \\
\vdots & \vdots & & \vdots \\
y_q^1 & y_q^2 & \cdots & y_q^p
\end{bmatrix},
\tag{2.78}
\]

where y_k^r is the value of the output at time t = kh, with k = {0, …, q}, of the r-th free response observed in the output. This free response is the result of a perturbed initial condition x_0^r. All perturbed initial conditions are stacked in a second matrix X_0, defined as

\[
X_0 = \begin{bmatrix} x_0^1 & x_0^2 & \cdots & x_0^p \end{bmatrix}.
\tag{2.79}
\]

With the definitions of Y_N and X_0 we have the two matrices required to compute the discrete time perturbed data based observability Gramian.

Discrete time perturbed data based observability Gramian. Let Y_N and X_0 be given as described above. The data based discrete time observability Gramian, as defined in Equation (2.32), for the discrete time linear system defined in Equation (2.30) is

\[
W_o = (X_0^\dagger)^T Y_N^T Y_N X_0^\dagger,
\tag{2.80}
\]

where X_0^\dagger is a right inverse of X_0,

\[
X_0^\dagger = X_0^T (X_0 X_0^T)^{-1}.
\tag{2.81}
\]

Proof: The output data matrix Y_N can be written as

\[
Y_N = \Gamma_o X_0,
\tag{2.82}
\]

with Γ_o the discrete time observability matrix

\[
\Gamma_o = \begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix}.
\tag{2.83}
\]

Substitution yields

\[
W_o = (X_0^\dagger)^T X_0^T \Gamma_o^T \Gamma_o X_0 X_0^\dagger = \Gamma_o^T \Gamma_o
\tag{2.84}
\]
\[
= \begin{bmatrix} C^T & F^T C^T & \cdots & (F^T)^q C^T \end{bmatrix}
\begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix}
\tag{2.85}
\]
\[
= \sum_{k=0}^{q} (F^T)^k C^T C F^k.
\tag{2.86}
\]

The proof ends by letting q → ∞.

Now we can link this formulation to the formulation used by Hahn. Suppose that we have as many initial conditions as states and that these initial conditions are orthonormal.
The advantage of this orthonormality is that we do not need to compute the inverse of X_0 X_0^T, since it is by definition equal to the identity matrix. This knowledge is exploited in the formulation used by Hahn. By substitution of X_0 X_0^T = I into Equations (2.81) and (2.80) we recognize the formulation by Hahn presented in Equation (2.61),

\[
W_o = X_0 Y_N^T Y_N X_0^T.
\tag{2.87}
\]

We only need to convert the matrix notation into a summation representation. The full proof that the formulation by Hahn fits within the new formulation can be found in Appendix A.2. The price we pay in this new definition is that we need to compute the pseudo inverse X_0^†, but in return we may plug in any initial conditions we like, which obviously is less restrictive. The new constraint is that the conditioning of X_0 X_0^T should allow for a numerically stable inversion, which is less restrictive than enforcing orthogonality. Furthermore, we can add extra initial conditions one by one instead of a whole orthogonal set at once. The different amplitudes and the number of initial conditions are directly accounted for.

In a similar way we can derive the discrete time perturbed data based controllability Gramian. Let us define X_N as the state response data matrix

\[
X_N = \begin{bmatrix} X_N^1 & X_N^2 & \cdots & X_N^p \end{bmatrix},
\tag{2.88}
\]

where

\[
X_N^r = \begin{bmatrix} x_1^r & x_2^r & \cdots & x_q^r \end{bmatrix},
\tag{2.89}
\]

and x_k^r is the value of the state at t = kh of the impulse response to u_0^r. Define U_0 as the matrix with all impulse values,

\[
U_0 = \begin{bmatrix} U_0^1 & U_0^2 & \cdots & U_0^p \end{bmatrix},
\tag{2.90}
\]

where

\[
U_0^r = \begin{bmatrix}
u_0^r & 0 & \cdots & 0 \\
0 & u_0^r & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & u_0^r
\end{bmatrix},
\tag{2.91}
\]

such that

\[
X_N^r = \Gamma_c U_0^r,
\tag{2.92}
\]

with the discrete time controllability matrix

\[
\Gamma_c = \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix}.
\tag{2.93}
\]

Discrete time perturbed data based controllability Gramian. Let X_N and U_0 be given as described above. The data based discrete time controllability Gramian, as defined in Equation (2.32), for the discrete time linear system defined in Equation (2.30) is

\[
W_c = X_N U_0^\dagger (U_0^\dagger)^T X_N^T,
\tag{2.94}
\]

where U_0^\dagger is the right inverse of U_0,

\[
U_0^\dagger = U_0^T (U_0 U_0^T)^{-1}.
\tag{2.95}
\]

Proof: The state response matrix X_N can be written as

\[
X_N = \Gamma_c U_0.
\tag{2.96}
\]

Substitution yields

\[
W_c = \Gamma_c U_0 U_0^\dagger (U_0^\dagger)^T U_0^T \Gamma_c^T = \Gamma_c \Gamma_c^T
\tag{2.97}
\]
\[
= \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix}
\begin{bmatrix} G^T \\ G^T F^T \\ \vdots \\ G^T (F^T)^q \end{bmatrix}
\tag{2.98}
\]
\[
= \sum_{k=0}^{q} F^k G G^T (F^T)^k.
\tag{2.99}
\]

The proof ends by letting q → ∞.

Figure 2.1: Left: a favorable steady state with maximum perturbation radius, where ζ represents either states in R^{n_x} or inputs in R^{n_u}. Right: a steady state near a constraint results in a smaller admissible radius of feasible perturbations.

Again we can link this formulation to the formulation adopted by Hahn. Suppose we have as many impulse responses as inputs and these impulses happen to be orthonormal, as required within Hahn's framework. In that case we know that by definition U_0 U_0^T = I. By substitution of U_0 U_0^T = I into Equations (2.95) and (2.94) we recognize the formulation by Hahn presented in Equation (2.59),

\[
W_c = X_N U_0^T U_0 X_N^T = X_N X_N^T.
\tag{2.100}
\]

We only need to convert the matrix notation to a summation representation. Most of the pros and cons that apply to the empirical observability Gramian apply to the empirical controllability Gramian as well. So, in the new formulation we can add one input perturbation at a time, in an arbitrary direction and with arbitrary magnitude, as long as the conditioning of U_0 U_0^T allows for a numerically stable computation of the right inverse. This is a much more flexible formulation and enables easy computation. Note the resemblance between Equation (2.100) and the snapshot matrix used for proper orthogonal decomposition in Equation (2.75): for special choices of signals the snapshot matrix times its transpose is identical to the controllability Gramian. So the empirical Gramians as defined by Lall (1999) and adopted by Hahn et al. (2000) are a special case of the Gramians presented in this section. If the perturbations are chosen as orthogonal sets with possibly different amplitudes, the two methods coincide.
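The formulation above can be checked numerically; the sketch below recovers the observability Gramian via (2.80)-(2.81) from three deliberately non-orthogonal initial conditions, for an arbitrary linear example (not from the thesis).

```python
import numpy as np

# Perturbed-data observability Gramian (Eqs. 2.80-2.81): non-orthogonal
# initial conditions are admissible as long as X0 has full row rank.
F = np.array([[0.85, 0.1], [0.05, 0.6]])   # arbitrary stable example
C = np.array([[1.0, 2.0]])
n, q = 2, 300

Wo_ref = np.zeros((n, n)); Fk = np.eye(n)  # Wo = sum_k (F')^k C'C F^k
for k in range(q + 1):
    Wo_ref += Fk.T @ C.T @ C @ Fk
    Fk = Fk @ F

X0 = np.array([[1.0, 0.5, 2.0],            # three non-orthogonal perturbations
               [0.3, 1.0, -1.0]])
# Stack the free output responses y_k^r = C F^k x0^r row-block-wise (Eq. 2.78)
YN = np.vstack([C @ np.linalg.matrix_power(F, k) @ X0 for k in range(q + 1)])

X0_ri = X0.T @ np.linalg.inv(X0 @ X0.T)    # right inverse (Eq. 2.81)
Wo_data = X0_ri.T @ YN.T @ YN @ X0_ri      # Eq. (2.80)
print(np.allclose(Wo_data, Wo_ref))        # True
```

The directions and amplitudes in X0 are free; only the conditioning of X0·X0ᵀ matters, which is exactly the relaxation over the orthonormal sets required by Lall and Hahn.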
However, since there is no fundamental motivation for this choice other than numerical reasons, the method as presented in this section leaves more freedom for the user to choose perturbations. The only condition on the perturbations is that they should span the whole column space.

A practical disadvantage of the orthogonal perturbations proposed by Lall (1999) is related to constraints on the perturbations. These constraints come naturally with the model, such as positivity of variables and (molar) fractions that should remain between zero and one. This is illustrated in Figure 2.1, where in the left picture the steady-state is chosen such that a large area of the admissible perturbations can be covered by orthogonal sets of perturbations. The right-hand side of the same figure illustrates that a constraint near the steady-state restricts the radius of admissible perturbations. In the method presented in this section the perturbations can be chosen freely, as long as they satisfy the constraints and span the whole space. Note that this illustration holds for input perturbations as well as for state perturbations. It is questionable whether the relevant nonlinear dynamics are revealed with this perturbation approach. This observation inspired the development of a method that enables computation of empirical Gramians from a single simulation of a relevant trajectory. It is closely related to the discrete time perturbed data based Gramians introduced in this section.

Generalized data based Gramians

The data based Gramians in the previous section were constructed from impulse response data and data from perturbations of steady-state conditions. In this section we generalize this to a much wider class of signals. We will present how to use data from one simulation of a trajectory for the construction of both the controllability and the observability Gramian.
We will demonstrate how the approach is applied to an asymptotically stable linear model, which is assumed to be in steady-state at the beginning of the trajectory. Let us define the data matrix $X_N$ as the snapshot matrix
$$ X_N = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix}, \qquad (2.101) $$
where $x_k$ is the value of the state at $t = kh$ of the response to the input sequence $u(t)$. The covariance matrix $M = \int_G \int_0^{t_N} (x(t) - x^*)(x(t) - x^*)^T \, dt \, dG$, as used by Löffler and Marquardt (1991), can be approximated by $X_N X_N^T$. The covariance matrix does not equal the observability or controllability Gramian. Only if $G$ is chosen in the special way (orthogonal impulse responses) defined in Lall (1999) will the covariance matrix approximate the controllability Gramian. $X_N$ can be written as
$$ X_N = \Gamma_c^N U_N^N, \qquad (2.102) $$
with $\Gamma_c^N$ the discrete time controllability matrix
$$ \Gamma_c^N = \begin{bmatrix} G & FG & \cdots & F^N G \end{bmatrix}, \qquad (2.103) $$
and the input matrix $U_N^N$ defined as
$$ U_N^N = \begin{bmatrix} u_0 & u_1 & \cdots & u_{N-1} \\ 0 & u_0 & \cdots & u_{N-2} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & u_0 \end{bmatrix}. \qquad (2.104) $$
If the system is stable, $\lim_{q\to\infty} F^q = 0$. Therefore we can truncate the controllability matrix and the input matrix and approximate
$$ X_N \approx \Gamma_c U_N, \qquad (2.105) $$
with $\Gamma_c$ the truncated controllability matrix as in Equation (2.93) and the truncated input matrix
$$ U_N = \begin{bmatrix} u_0 & u_1 & \cdots & u_q & \cdots & u_{N-1} \\ 0 & u_0 & \cdots & u_{q-1} & \cdots & u_{N-2} \\ \vdots & & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & u_0 & \cdots & u_{N-1-q} \end{bmatrix}. \qquad (2.106) $$

Generalized discrete time data based controllability Gramian. Let $X_N$ and $U_N$ be given as described above. The data based discrete time controllability Gramian for the discrete time linear system (2.30) is defined by
$$ W_c = X_N U_N^\dagger (U_N^\dagger)^T X_N^T, \qquad (2.107) $$
where $U_N^\dagger$ is the right inverse of $U_N$,
$$ U_N^\dagger = U_N^T (U_N U_N^T)^{-1}. \qquad (2.108) $$
Proof: The state response matrix $X_N$ can be written as
$$ X_N = \Gamma_c U_N. \qquad (2.109) $$
Substitution yields
$$ W_c = \Gamma_c U_N U_N^\dagger (U_N^\dagger)^T U_N^T \Gamma_c^T = \Gamma_c \Gamma_c^T \qquad (2.110) $$
$$ = \begin{bmatrix} G & FG & \cdots & F^qG \end{bmatrix} \begin{bmatrix} G^T \\ G^T F^T \\ \vdots \\ G^T (F^q)^T \end{bmatrix} \qquad (2.111) $$
$$ = \sum_{k=0}^{q} F^k G G^T (F^k)^T, \qquad (2.112) $$
which completes the proof. For the observability Gramian we use the Hankel matrix.
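The construction of Equation (2.107) can be sketched numerically: one simulation of a small random stable system driven by a white-noise input, a truncated block-Toeplitz input matrix per Equation (2.106), and a comparison against the truncated Gramian series. All dimensions and system matrices below are made-up example values; the white-noise input is one convenient choice of a persistently exciting signal, not a requirement of the method.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q, N = 4, 2, 40, 2000    # truncation depth q, with N >> (q+1)*m samples

F = rng.standard_normal((n, n))
F *= 0.7 / max(abs(np.linalg.eigvals(F)))    # spectral radius 0.7, so F^q ≈ 0
G = rng.standard_normal((n, m))

# one simulation with a persistently exciting (white-noise) input sequence
U = rng.standard_normal((m, N))
X, x = np.zeros((n, N)), np.zeros(n)
for k in range(N):
    x = F @ x + G @ U[:, k]
    X[:, k] = x                               # snapshot matrix, Eq. (2.101)

# truncated block-Toeplitz input matrix, Eq. (2.106)
UN = np.zeros(((q + 1) * m, N))
for i in range(q + 1):
    UN[i * m:(i + 1) * m, i:] = U[:, :N - i]

UN_ri = UN.T @ np.linalg.inv(UN @ UN.T)       # right inverse, Eq. (2.108)
Wc_data = X @ UN_ri @ UN_ri.T @ X.T           # Eq. (2.107)

# reference: truncated Gramian sum_{k=0}^{q} F^k G G^T (F^T)^k
Wc_ref, Fk = np.zeros((n, n)), np.eye(n)
for _ in range(q + 1):
    Wc_ref += Fk @ G @ G.T @ Fk.T
    Fk = F @ Fk
print(np.allclose(Wc_data, Wc_ref, rtol=1e-4))
```

The small residual mismatch stems only from truncating the Toeplitz structure at depth $q$; it shrinks as the spectral radius of $F$ decreases or $q$ grows.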
Let us define the output data matrix $Y_N$ as
$$ Y_N = \begin{bmatrix} y_1 & y_2 & \cdots & y_N \end{bmatrix}. \qquad (2.113) $$
With the argument that $\lim_{q\to\infty} F^q = 0$, $Y_N$ can be approximated by
$$ Y_N \approx \Gamma_{co} U_N, \qquad (2.114) $$
with the truncated Markov parameters $\Gamma_{co}$ defined as
$$ \Gamma_{co} = \begin{bmatrix} CG & CFG & \cdots & CF^{2q}G \end{bmatrix}, \qquad (2.115) $$
and $U_N$ the truncated input matrix. The Markov parameters can be determined by
$$ \Gamma_{co} = Y_N U_N^\dagger, \qquad (2.116) $$
with $U_N^\dagger$ the right inverse of $U_N$ as defined in Equation (2.108). The Hankel matrix is defined as
$$ H = \Gamma_o \Gamma_c = \begin{bmatrix} CG & CFG & \cdots & CF^qG \\ CFG & & & \vdots \\ \vdots & & \ddots & \vdots \\ CF^qG & \cdots & \cdots & CF^{2q}G \end{bmatrix}, \qquad (2.117) $$
which can be filled with the Markov parameters.

Generalized discrete time data based observability Gramian. Let $H$ and $\Gamma_c$ be defined as described above. The data based discrete time observability Gramian for the discrete time linear system (2.30) is defined by
$$ W_o = (\Gamma_c^\dagger)^T H^T H \Gamma_c^\dagger, \qquad (2.118) $$
with $\Gamma_c^\dagger$ the right inverse of $\Gamma_c$,
$$ \Gamma_c^\dagger = \Gamma_c^T (\Gamma_c \Gamma_c^T)^{-1} = \Gamma_c^T W_c^{-1}. \qquad (2.119) $$
Proof: Substitution of $H$ yields
$$ W_o = (\Gamma_c^\dagger)^T \Gamma_c^T \Gamma_o^T \Gamma_o \Gamma_c \Gamma_c^\dagger = \Gamma_o^T \Gamma_o \qquad (2.120) $$
$$ = \begin{bmatrix} C^T & F^T C^T & \cdots & (F^q)^T C^T \end{bmatrix} \begin{bmatrix} C \\ CF \\ \vdots \\ CF^q \end{bmatrix} \qquad (2.121) $$
$$ = \sum_{k=0}^{q} (F^k)^T C^T C F^k, \qquad (2.122) $$
which completes the proof.

The right inverses used in the computation of the Gramians must exist and should be well conditioned. In case of the right inverse of the input matrix $U_N$ this can be achieved by a proper choice of inputs, whereas the existence of a well conditioned right inverse of $\Gamma_c$ cannot be guaranteed. This issue will be treated in an example in the next section. First we will reveal the underlying mechanism of empirical Gramians. The basic question is what an empirical Gramian looks like if data is gathered in two different operating conditions of a nonlinear system using small perturbations. The use of small perturbations implies that only the local linear behavior of the nonlinear model is excited.

Example of two linear models

We will investigate the mechanisms that occur in the computation of empirical Gramians.
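The observability construction of Equations (2.116) through (2.118) can also be sketched end to end: estimate the Markov parameters from the same single simulation, fill the Hankel matrix, and recover $W_o$ through the right inverse of $\Gamma_c$. Again, all system matrices and sizes are made-up example values; this assumes a well conditioned $\Gamma_c\Gamma_c^T$, which, as noted above, cannot be guaranteed in general.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, q, N = 4, 2, 2, 30, 3000

F = rng.standard_normal((n, n))
F *= 0.7 / max(abs(np.linalg.eigvals(F)))
G = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# one simulation with white-noise inputs, recording states and outputs
U = rng.standard_normal((m, N))
X, Y, x = np.zeros((n, N)), np.zeros((p, N)), np.zeros(n)
for k in range(N):
    x = F @ x + G @ U[:, k]
    X[:, k], Y[:, k] = x, C @ x

depth = 2 * q + 1          # Markov parameters up to C F^{2q} G are needed
UN = np.zeros((depth * m, N))
for i in range(depth):
    UN[i * m:(i + 1) * m, i:] = U[:, :N - i]
UN_ri = UN.T @ np.linalg.inv(UN @ UN.T)

markov = Y @ UN_ri                     # [CG, CFG, ..., CF^{2q}G], Eq. (2.116)
Gam_c = (X @ UN_ri)[:, :(q + 1) * m]   # [G, FG, ..., F^q G]

# Hankel matrix, Eq. (2.117): block row i holds CF^i G ... CF^{i+q} G
H = np.vstack([markov[:, i * m:(i + q + 1) * m] for i in range(q + 1)])

Gc_ri = Gam_c.T @ np.linalg.inv(Gam_c @ Gam_c.T)   # right inverse, Eq. (2.119)
Gam_o = H @ Gc_ri                                   # observability matrix
Wo_data = Gam_o.T @ Gam_o                           # Eq. (2.118)

# reference: truncated sum_{k=0}^{q} (F^T)^k C^T C F^k
Wo_ref, Fk = np.zeros((n, n)), np.eye(n)
for _ in range(q + 1):
    Wo_ref += Fk.T @ C.T @ C @ Fk
    Fk = F @ Fk
print(np.allclose(Wo_data, Wo_ref, rtol=1e-3))
```

Since $H\Gamma_c^\dagger$ reproduces the observability matrix exactly when $\Gamma_c$ has full row rank, the accuracy here is again limited only by the truncation depth and the conditioning of the two inverted matrices.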
To this end we assume two linear discrete time systems $(F_1, G_1, C_1)$ and $(F_2, G_2, C_2)$ that represent the local dynamics of a nonlinear system in two different operating points. The corresponding controllability Gramians are
$$ W_{c1} = \Gamma_{c1}\Gamma_{c1}^T, \qquad W_{c2} = \Gamma_{c2}\Gamma_{c2}^T, $$
with $\Gamma_{ci} = [\,G_i \;\; F_iG_i \;\; \cdots \;\; F_i^qG_i\,]$. We can generate data in these two points that we collect in data matrices $X_{N1}$ and $X_{N2}$, respectively,
$$ X_{N1} = \Gamma_{c1} U_{N1}, \qquad X_{N2} = \Gamma_{c2} U_{N2}. $$
We already proved that the local controllability matrix can be reconstructed from this data, and therefore the local controllability Gramian as well. The two local controllability matrices are
$$ \Gamma_{c1} = X_{N1} U_{N1}^T (U_{N1} U_{N1}^T)^{-1}, \qquad \Gamma_{c2} = X_{N2} U_{N2}^T (U_{N2} U_{N2}^T)^{-1}. $$
Suppose we compute an average controllability matrix $\Gamma_c$ by combining the two data sets,
$$ X_N = \begin{bmatrix} X_{N1} & X_{N2} \end{bmatrix} = \begin{bmatrix} \Gamma_{c1} & \Gamma_{c2} \end{bmatrix} \begin{bmatrix} U_{N1} & 0 \\ 0 & U_{N2} \end{bmatrix} = \Gamma_c U_N, \qquad U_N = \begin{bmatrix} U_{N1} & U_{N2} \end{bmatrix}. $$
The solution for this average controllability matrix $\Gamma_c$ is
$$ \Gamma_c = X_N U_N^T (U_N U_N^T)^{-1}. $$
Substitution of $X_N$ and $U_N$ yields
$$ \Gamma_c = \begin{bmatrix} \Gamma_{c1} & \Gamma_{c2} \end{bmatrix} \begin{bmatrix} U_{N1}U_{N1}^T \\ U_{N2}U_{N2}^T \end{bmatrix} \left( U_{N1}U_{N1}^T + U_{N2}U_{N2}^T \right)^{-1}. $$
Suppose $U_{N2} = \gamma U_{N1}$. This implies that we use the same input signals for data generation in the second operating point as were used in the first operating point, but scaled with the constant $\gamma$. Substitution yields
$$ \Gamma_c = \begin{bmatrix} \Gamma_{c1} & \Gamma_{c2} \end{bmatrix} \begin{bmatrix} \frac{1}{1+\gamma^2} I \\ \frac{\gamma^2}{1+\gamma^2} I \end{bmatrix} = \frac{1}{1+\gamma^2}\,\Gamma_{c1} + \frac{\gamma^2}{1+\gamma^2}\,\Gamma_{c2}. $$
If $\gamma = 1$, so if we use exactly the same input signals, we see that the interpolated controllability matrix coincides with the equally weighted average of the two local controllability matrices,
$$ \Gamma_c = \tfrac{1}{2}\left( \Gamma_{c1} + \Gamma_{c2} \right). $$
Another special case is when $U_{N1}U_{N1}^T = \alpha^2 I$ and $U_{N2}U_{N2}^T = \beta^2 I$.
This is the case if we apply an energy level of $\alpha^2$ to the first operating point and $\beta^2$ to the second operating point. The perturbation orientations and amplitudes in both operating points are completely free, as long as they can be described by orthogonal sets of perturbations. Substitution yields
$$ \Gamma_c = \begin{bmatrix} \Gamma_{c1} & \Gamma_{c2} \end{bmatrix} \begin{bmatrix} \alpha^2 I \\ \beta^2 I \end{bmatrix} \left( \alpha^2 I + \beta^2 I \right)^{-1} = \frac{\alpha^2}{\alpha^2+\beta^2}\,\Gamma_{c1} + \frac{\beta^2}{\alpha^2+\beta^2}\,\Gamma_{c2}. $$
So by allocating the energy of the test signals over the operating points, we emphasize different operating points. Computing the empirical Gramian from data generated in two operating points using small perturbations is thus similar to averaging the two controllability matrices. This is not equal to averaging the two controllability Gramians. The proof is simple:
$$ \tfrac{1}{2}W_{c1} + \tfrac{1}{2}W_{c2} = \tfrac{1}{2}\Gamma_{c1}\Gamma_{c1}^T + \tfrac{1}{2}\Gamma_{c2}\Gamma_{c2}^T \neq \qquad (2.123) $$
$$ \Gamma_c\Gamma_c^T = \tfrac{1}{4}\Gamma_{c1}\Gamma_{c1}^T + \tfrac{1}{4}\Gamma_{c1}\Gamma_{c2}^T + \tfrac{1}{4}\Gamma_{c2}\Gamma_{c1}^T + \tfrac{1}{4}\Gamma_{c2}\Gamma_{c2}^T. \qquad (2.124) $$
One could argue that this is a flaw of the empirical Gramian. This concludes the analysis of the true underlying mechanism of the empirical Gramians.

Interpolation of local linear Gramians

Motivated by the conclusion that empirical Gramians ultimately attempt to reconstruct Gramians of local linear dynamics, a simple alternative to the data based Gramians is interpolation of local Gramians. Löffler and Marquardt (1991) approximated the covariance matrix as defined in Equation (2.53) by using multiple linear models; the covariance matrix was computed directly by means of a Monte Carlo method and the linear approximation. Interpolation of local Gramians is done by deriving a number of linear models in a relevant operating range, from which the local Gramians can be computed by solving the discrete time Lyapunov equations (2.33). This yields
$$ W_c = \frac{1}{N}\sum_{i=1}^{N} W_{c,i} \qquad (2.125) $$
and
$$ W_o = \frac{1}{N}\sum_{i=1}^{N} W_{o,i}, \qquad (2.126) $$
where $W_{c,i}$ and $W_{o,i}$ are the local controllability and observability Gramians, respectively.
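The distinction between averaging controllability matrices (what the empirical Gramian effectively does) and averaging Gramians per Equation (2.125) can be made concrete with two small made-up local models. The sketch below computes each local Gramian as the series solution of its Lyapunov equation (2.33), forms the average, and shows that squaring the averaged controllability matrix introduces the cross terms of Equation (2.124).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, q = 4, 2, 200

def stable(radius):
    F = rng.standard_normal((n, n))
    return F * (radius / max(abs(np.linalg.eigvals(F))))

def ctrb(F, G, q):
    # finite controllability matrix [G, FG, ..., F^q G]
    blocks, X = [], G.copy()
    for _ in range(q + 1):
        blocks.append(X)
        X = F @ X
    return np.hstack(blocks)

def gramian(F, G, q):
    # series solution of the discrete Lyapunov equation W = F W F^T + G G^T
    Gc = ctrb(F, G, q)
    return Gc @ Gc.T

# two hypothetical local linear models in two operating points
F1, G1 = stable(0.6), rng.standard_normal((n, m))
F2, G2 = stable(0.8), rng.standard_normal((n, m))

W1, W2 = gramian(F1, G1, q), gramian(F2, G2, q)
W_avg = 0.5 * (W1 + W2)                     # Eq. (2.125) with N = 2

# each local Gramian (numerically) solves its Lyapunov equation (2.33)
print(np.allclose(F1 @ W1 @ F1.T + G1 @ G1.T, W1))

# the empirical Gramian instead squares the averaged controllability
# matrix, which introduces the cross terms of Eq. (2.124)
Gam_bar = 0.5 * (ctrb(F1, G1, q) + ctrb(F2, G2, q))
W_emp = Gam_bar @ Gam_bar.T
print(np.allclose(W_emp, W_avg))            # the two notions differ
```

The final comparison prints False for generic models: only when the cross terms $\Gamma_{c1}\Gamma_{c2}^T$ vanish do the two constructions coincide.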
The local Gramians can be weighted with $\gamma$ to emphasize specific operating points. This yields
$$ W_c = \frac{1}{\sum_{i=1}^{N}\gamma_{c,i}} \sum_{i=1}^{N} \gamma_{c,i} W_{c,i} \qquad (2.127) $$
and
$$ W_o = \frac{1}{\sum_{i=1}^{N}\gamma_{o,i}} \sum_{i=1}^{N} \gamma_{o,i} W_{o,i}. \qquad (2.128) $$
The result of this averaging approximates the perturbed data based Gramians when the effect of finite data is negligible and sufficiently small perturbations are used.

2.5 Evaluation on a process model

In this chapter different ways to compute Gramians were presented. In this section we compare the different approaches by applying them to a simple model. In the first test we want to exclude nonlinear effects, to enable a transparent assessment of the different data based empirical reduction techniques, by using a linearization of a nonlinear model. In the subsequent tests we return to the nonlinear model and compare different Gramian based reduction approaches, focusing on the differences between data based Gramians and Gramians derived from linearizations of the nonlinear model.

Figure 2.2: Schematic of the process model: reactor in series with a distillation column with recycle to the reactor.

In this chapter we propose to use a simple model of a plant. A schematic of that plant is presented in Figure 2.2. The process represents a general class of chemical processes consisting of a reactor with a separation unit and a recycle stream.
All levels are controlled, and the top and bottom quality are measured and controlled by reflux ratio and boilup rate, respectively. The temperature in the reactor is assumed to be constant through tight temperature control. For detailed model information the reader is referred to Chapter 3 for a general description and Chapter 4 for details on the modelling assumptions. The model consists of 24 differential equations¹, and we are interested in the top purity and bottom impurity, which are controlled by reflux ratio and boilup rate. So by model reduction we want to find a low order model that still properly represents the input to output dynamics. Note that the effect of the reactor is taken into account since the reactor is connected to the column by the fresh feed and recycle streams.

¹gPROMS code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.

Figure 2.3: Schematic overview of projections based on the different Gramians, with the corresponding equation numbers between brackets. The discrete time system $x_{k+1} = Fx_k + Gu_k$, $y_k = Cx_k + Du_k$ is reduced to $R^d$, $R^q$, $R^e$ and $R^g$ by left and right projectors $L$ derived from the Gramian pairs $(W_c^d, W_o^d)$ (2.32), $(W_c^q, W_o^q)$ (2.35), $(W_c^e, W_o^e)$ (2.59, 2.61) and $(W_c^g, W_o^g)$ (2.107, 2.118).

Results on the linearized model

For the linear model the bottom impurity and top purity are used as outputs, $y_1$ and $y_2$ respectively, and reflux ratio and boilup rate are used as inputs, $u_1$ and $u_2$ respectively. In this thesis we restrict ourselves to piecewise constant input signals, and therefore it is sensible to use discrete time Gramians instead of continuous time Gramians. In this way the choice for the limited class of possible input signals enters the Gramians and consequently the projections. Moore (1981) elaborates in his paper on the effect of sample time on discrete time Gramians and the relation to the continuous time Gramian. In Figure 2.3 all different projections that will be considered in the first test are presented in a schematic way.
$W_c$ and $W_o$ are the controllability and observability Gramians, with superscripts $d$ for discrete time (Equation 2.32), $q$ for finite-time discrete time (Equation 2.35), $e$ for empirical (Equations 2.59 and 2.61) and $g$ for generalized (Equations 2.107 and 2.118), respectively. $L$ and $R$ are the left and right projectors derived from the Gramians; they yield the reduced-order model by truncation. In Figure 2.4 the norm of the error, defined as
$$ \| G(s) - \hat{G}(s) \|, \qquad (2.129) $$
of the reduced-order models based on the four different Gramians (exact, finite time, empirical and generalized, respectively) is plotted against the reduced model order. It is hard to discriminate between the reduced models for most orders, since the approximation errors are of the same order of magnitude. Theoretically the results should coincide, except for the exact discrete-time solution.

Figure 2.4: Norm of the $H_2$ approximation error $\|\epsilon\|_2$ of the four different Gramian based reduced models (exact, finite, empirical, generalized) against model order. Top: direct reduction on the original system. Bottom: two step reduction method.

The differences can be explained by numerical errors in the different computations. In case of the generalized Gramian the inverse of the controllability Gramian is used (Equation (2.119)) to compute the observability Gramian. This computation introduces numerical errors when the controllability Gramian is close to singular. Therefore a two-step approach is proposed, which is explained next. In the first step the model is projected based on a singular value decomposition of the controllability Gramian,
$$ W_c = U_c \Sigma_c U_c^T = \begin{bmatrix} U_{c1} & U_{c2} \end{bmatrix} \begin{bmatrix} \Sigma_{c1} & 0 \\ 0 & \Sigma_{c2} \end{bmatrix} \begin{bmatrix} U_{c1}^T \\ U_{c2}^T \end{bmatrix}. \qquad (2.130) $$
We can partition $U_c$ such that the conditioning of $\Sigma_{c1}$ is acceptable from a numerical point of view to compute its inverse.
The result of projection and truncation based on $U_{c1}$ is a reduced-order model with only controllable states. This reduced model can be reduced in a second step by a numerically well-behaved balanced reduction. In this example the first step resulted in a reduced model of order eight; therefore, in the bottom of Figure 2.4, only the errors of the balanced reduced models up to order eight are present. This two-step approach does not work in general, and it is possible to construct a counterexample to prove this. In practice, however, it provides a suitable reduction, as we will see. The dashed line in Figure 2.4 represents a user defined error level of $10^{-3}$. We can see in the top of this figure that a third-order model meets that bound, except for the generalized empirical Gramian reduced-order model, which needs order five. In case of the two-step approach we see at the bottom of Figure 2.4 the same result, except that for the generalized empirical Gramian order four suffices instead of five. This is the effect of the better numerical properties of the computations. A different way to assess the quality of the reduced models is to compare step responses. Step responses of these models are depicted in Figures 2.5 and 2.6. Note that the steady-state values of the nonlinear model were 0.01 for the bottom impurity and 0.90 for the top purity, and that the reduced linear models are only valid close to this operating point. The two successive projections do not affect the reduction up to the fifth order for all Gramians except the generalized Gramian. The reduced models derived from the generalized empirical Gramians improve by applying the two successive projections: the fourth order model in Figure 2.6 is better than the fifth order model in Figure 2.5. This conclusion can also be drawn from Figure 2.4, where the error of the fifth order model in the top of the figure is larger than the error of the fourth order model computed by the two-step approach.
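The two-step procedure can be sketched on a small made-up system: an SVD of $W_c$ projects onto the numerically controllable subspace per Equation (2.130), after which a square-root balanced truncation reduces the projected model. The 8-state SISO system, the conditioning threshold and the target order below are all illustrative choices, not the 24-state column model of this section; the square-root formulation is one standard, numerically well-behaved way to carry out the second step.

```python
import numpy as np

rng = np.random.default_rng(4)

# SISO example with 5 controllable and 3 uncontrollable states (made up):
# F is block diagonal and the input does not reach the second block
Fa = np.diag([0.2, 0.35, 0.5, 0.6, 0.7])
Fb = np.diag([0.4, 0.5, 0.6])
F = np.block([[Fa, np.zeros((5, 3))], [np.zeros((3, 5)), Fb]])
G = np.vstack([np.ones((5, 1)), np.zeros((3, 1))])
C = rng.standard_normal((1, 8))

def gram(F, M, terms=400):
    # series solution of the discrete Lyapunov equation W = F W F^T + M M^T
    W, X = np.zeros_like(F), M @ M.T
    for _ in range(terms):
        W += X
        X = F @ X @ F.T
    return W

# step 1: SVD of Wc (Eq. 2.130); keep the numerically well-conditioned part
Wc = gram(F, G)
Uc, sc, _ = np.linalg.svd(Wc)
r1 = int(np.sum(sc > 1e-11 * sc[0]))     # conditioning threshold (arbitrary)
T1 = Uc[:, :r1]
F1, G1, C1 = T1.T @ F @ T1, T1.T @ G, C @ T1

# step 2: square-root balanced truncation of the projected model
Wc1, Wo1 = gram(F1, G1), gram(F1.T, C1.T)
Lc, Lo = np.linalg.cholesky(Wc1), np.linalg.cholesky(Wo1)
Uh, s, Vt = np.linalg.svd(Lo.T @ Lc)     # s holds the Hankel singular values
r2 = 3
T = Lc @ Vt.T[:, :r2] / np.sqrt(s[:r2])
Ti = (Uh[:, :r2] / np.sqrt(s[:r2])).T @ Lo.T
Fr, Gr, Cr = Ti @ F1 @ T, Ti @ G1, C1 @ T

# Markov parameters of full and reduced model agree to within the classical
# bound of twice the sum of the discarded Hankel singular values
h_full = [(C @ np.linalg.matrix_power(F, k) @ G).item() for k in range(40)]
h_red = [(Cr @ np.linalg.matrix_power(Fr, k) @ Gr).item() for k in range(40)]
max_err = max(abs(a - b) for a, b in zip(h_full, h_red))
print(r1, max_err <= 2 * s[r2:].sum())
```

Step 1 here recovers exactly the five controllable states, so no Lyapunov equation is ever solved on the ill-conditioned full-order Gramian; the second step then operates on a positive definite pair $(W_{c1}, W_{o1})$.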
The results illustrate that both data based methods to derive Gramians enable a balanced reduction that approximates the balanced reduction based on the Gramian that solves the Lyapunov equation. The computations involved in the generalized empirical observability Gramian are sensitive to the conditioning of the controllability Gramian, and therefore a two-step reduction approach was introduced.

Results on the nonlinear model

For nonlinear models we know that controllability and observability depend on the point of operation. Therefore we need to make sure that only data is produced that represents the operating envelope of interest. This is not trivial, but we can think of different possible scenarios. In Figure 2.7 four different approaches are presented to cover the relevant operating envelope. In the situation indicated with the capital roman numeral I, we compute some average operating point from which the system is perturbed such that both steady-states A and B are enclosed by these unit ball perturbations. Besides a number of practical problems that can occur, such as

Figure 2.5: Step responses of the full order linear model (org-24) and reduced models based on the different Gramians (redq-3, rede-3, redg-5). Boilup rate and reflux ratio are inputs u(1) and u(2), respectively, and bottom impurity and top purity are outputs y(1) and y(2), respectively.
Figure 2.6: Step responses of the full order linear model (org-24) and reduced models based on two successive projections (redq-3, rede-3, redg-4). Boilup rate and reflux ratio are inputs u(1) and u(2), respectively, and bottom impurity and top purity are outputs y(1) and y(2), respectively.

Figure 2.7: Four different approaches (I through IV) to cover the relevant state space operating envelope between the two stationary operating points A and B in $\mathbb{R}^{n_x}$, i.e. points satisfying $0 = f(x, u)$. The solid line represents a trajectory that satisfies $\dot{x} = f(x, u)$.

e.g. the presence of constraints that restrict the radius of admissible perturbations, it is very questionable whether this approach represents the relevant operating envelope. By increasing the operating envelope, the validity requirements for the model increase, which reduces the potential for model reduction. The operating envelope should therefore be large enough to capture all relevant dynamics, but also as small as possible to exclude all non-relevant dynamics. In the situation indicated by the capital roman numeral II, we restrict ourselves to small perturbations around the two steady-states A and B. This is implementable since we can decrease the perturbations until all constraint violations have disappeared.
Note however that for small perturbations all data based Gramians converge to the two stationary Gramians, which can also be computed by deriving a linear model and solving two Lyapunov equations. The empirical Gramian is derived from the union of the data sets generated in the two steady-states A and B. Note that if we use sufficiently small perturbations and an equal number of perturbations in both steady-states, computation of the empirical Gramians approximates the average of the two local Gramians. A drawback of this approach can be that the area covered by the perturbations does not cover the whole transition between steady-states A and B. This naturally brings us to the situation depicted in Figure 2.7 with the capital roman numeral III.

Figure 2.8: Input perturbations of reflux ratio and boilup rate used to generate impulse response data for construction of the empirical controllability Gramian, as described in situation II in Figure 2.7. Each marker represents one simulation. The + markers are centered around the first operating point A and the × markers are centered around the second operating point B.

Figure 2.9: Input trajectories of boilup rate (top) and reflux ratio (bottom) used for evaluation of the projected models.

The situation indicated with the capital roman numeral III exactly covers the whole relevant area defined by the transition, by performing small perturbations in different points along the transition. This situation coincides with the situation indicated with capital roman numeral IV if the perturbations are sufficiently small and the local dynamics can be represented by a linear model.
The result can then be interpreted as the average of the different local Gramians along the trajectory. We will now compare reductions applied to the nonlinear example. First we present the result described in situation II, where the operating envelope is covered by perturbations around the two different steady-state operating points. The input perturbations are depicted in Figure 2.8; the circle in the upper right of that figure is centered around operating point A, whereas the circle in the lower left corner is centered around B. In an attempt to cover the relevant nonlinear dynamics we perturbed the inputs with a magnitude such that an overlap in the state space was created. Still, this is a somewhat arbitrary choice of input perturbations. By means of simulation, data was generated that was used to derive the empirical controllability Gramian. In a similar manner perturbations were chosen to compute the different observability Gramians, but since these perturbations are of too high a dimension they cannot be depicted in a figure like the input perturbations. The state perturbations were chosen relatively small to guarantee feasibility². This inherently resulted in responses that represent the local, and thus linear, dynamics. This data was used to compute empirical Gramians, as defined in Equations (2.59) and (2.61), in two ways. In the first, all data was collected and the empirical controllability ($W_c^e$) and observability ($W_o^e$) Gramians were computed in one go. In the second, the data in each operating point was used to compute the local empirical Gramians, which were then averaged; the corresponding Gramians are denoted $W_c^{em}$ and $W_o^{em}$, respectively. In situation IV of Figure 2.7 it was suggested to use local linearized dynamics along a trajectory to represent the relevant dynamics. This is the third approach added to the test.
So along the simulation of the input trajectory, at equally spaced times, a linear model was derived from which discrete time controllability and observability Gramians were computed by solving two Lyapunov equations. These were averaged, resulting in an average controllability and observability Gramian as defined in Equation (2.126), here denoted $W_c^m$ and $W_o^m$, respectively.

²Since the bottom impurity was 0.01, perturbations were restricted to 0.01 to guarantee positive perturbed impurity.

Figure 2.10: Top: time responses of the bottom impurity $x_b$ of reduced models based on different linear Gramians: averaged linearized ($W^m$), empirical ($W^e$) and averaged empirical ($W^{em}$) Gramian, respectively. Bottom: error between reduced-order models and original response for the different Gramians, plotted against reduced model order.

Figure 2.11: Top: time responses of the top purity $x_d$ of reduced models based on different linear Gramians: averaged linearized ($W^m$), empirical ($W^e$) and averaged empirical ($W^{em}$) Gramian, respectively. Bottom: error between reduced-order models and original response for the different Gramians, plotted against reduced model order.

The quality of the projections derived from the different Gramians is evaluated by simulation of the model going from one operating point to the other. This input trajectory is depicted in Figure 2.9, where the inputs are ramped in eight hours from the values of one operating point to the other, followed by 48 hours of constant inputs. In a similar way the model was ramped back to the first operating point.
The results of the balanced reductions by truncation based on the three different methods to compute nonlinear Gramians are presented in Figures 2.10 and 2.11. Simulations with the reduced models that resulted in an error smaller than the dotted line in the bottom of Figures 2.10 and 2.11 are plotted in the top of the same figures. The error is defined as the square root of the sum of the squared errors over all samples of the output. For every order we can compute this error for both outputs, providing insight into the quality of the different reductions. From the results it can be concluded that representing the relevant dynamics by local linear dynamics along the trajectory is more successful than the two alternative approaches discussed here. Although the results of the two data based approaches are not exactly the same, it is hard to discriminate between them. The cross terms explained in the previous section, see Equations (2.123) and (2.124), do affect the reduction, but whether they result in better reductions is questionable. For the higher order models the averaging approach seems better, suggesting even a negative effect of computing the empirical Gramians from all data at once. Löffler and Marquardt (1991) sampled the state space to construct a covariance matrix using linear models that represented the relevant dynamics in different operating points. Instead of the Monte Carlo approach they used, it is very well possible to sample a trajectory in their approach, as was done in this section. Finally, we compare the effect of averaging local Gramians, derived from linearized models by solving Lyapunov equations, in more detail. We therefore compare the responses of the reduced models from the Gramian derived in operating point A, the Gramian derived in operating point B, the averaged Gramian of A and B, and the Gramian averaged along the trajectory, denoted $W_a$, $W_b$, $W_{ab}$ and $W_m$, respectively.
The results are presented in Figures 2.12 and 2.13 in exactly the same way as in Figures 2.10 and 2.11. Note that the absence of a result for a specific order and reduction approach indicates a failure of the simulation. Some simulation failures could be related to a variable hitting its bound. Therefore the bounds on variables that were defined in the original model were relaxed, which resulted in more successful simulations of reduced models. These bounds can be helpful as a debugging tool during the modelling process, but they are of no use in evaluating the reduced models.

Figure 2.12: Top: time responses of the bottom impurity $x_b$ of reduced models based on different linear Gramians: averaged Gramian of A and B ($W_{ab}$), Gramian in A ($W_a$), Gramian in B ($W_b$) and Gramian averaged along the trajectory ($W_m$). Bottom: error between reduced-order models and original response for the different Gramians, plotted against reduced model order.

Figure 2.13: Top: time responses of the top purity $x_d$ of reduced models based on different linear Gramians: averaged Gramian of A and B ($W_{ab}$), Gramian in A ($W_a$), Gramian in B ($W_b$) and Gramian averaged along the trajectory ($W_m$). Bottom: error between reduced-order models and original response for the different Gramians, plotted against reduced model order.

The reduced models were allowed to compute negative fractions, which obviously have no physical meaning; but since the only criterion for the reduced models is the error in the predicted output, these values are accepted during the simulation.
From these results it can be concluded that the averaging mechanism is successful, since the reduced models based on the average of the two local Gramians in the two operating points perform better than the reduced models using only the local dynamics of one operating point. Averaging over Gramians derived along the trajectory provides the best results and will therefore be used as the approach in the next part of this thesis. The assessment of this model reduction approach for dynamic optimization will be done in Chapter 5.

2.6 Discussion

In this chapter different balanced model order reduction methods based on empirical Gramians were assessed. It appeared that computing empirical Gramians is comparable to averaging linear Gramians. To show this, we reformulated the computation of the empirical Gramians into a format that is more generic and insightful than the formulation introduced by Lall (1999). In the literature this interpretation was not available, and it can therefore be considered a contribution of this chapter. Inspired by this interpretation of empirical Gramians, a pragmatic approach was proposed to challenge the empirical Gramian based reduction. This pragmatic approach is based on linearization of the model in different points in the state space. Via averaging of the Gramians related to the linear models, an average controllability and observability Gramian can be computed. This approach provides good results and is less involved than the computation of empirical Gramians. Furthermore, we were able to establish a clear relation between proper orthogonal decomposition (e.g. Berkooz et al., 1993; Baker and Christofides, 2000), empirical Gramians (Lall, 1999; Hahn and Edgar, 2002) and the covariance matrix (Pallaske, 1987; Löffler and Marquardt, 1991). The snapshot matrix multiplied with its transpose approximates the discrete time controllability Gramian in case white noise signals are used.
This same matrix approximates the integral that defines the covariance matrix. In case other types of signals are used, it can be given a frequency-weighted controllability Gramian interpretation. An important difference between proper orthogonal decomposition and Gramian-based reduction is that proper orthogonal decomposition strongly depends on the specific choice of coordinates. This implies a strong dependency on state scaling, with only rough guidelines on how to choose this scaling.

Chapter 3 Dynamic optimization

In this chapter the optimization base case will be presented. This case involves a plant with a reactor and a separation section with a recycle stream. The plant model has a high level of detail, like current models developed with state-of-the-art commercial modelling tools. We will pose the dynamic optimization problem and motivate some specific implementation details to provide a clear picture of the execution of the optimization problem.

3.1 Base case

From an academic point of view the use of a case seems not very elegant, since it does not hold as a mathematical proof but only demonstrates the success or failure of an approach on a specific example. This specific example was carefully selected and represents the complexity of a class of dynamic optimization problems in the process industry. The nonlinear model is described by a set of differential-algebraic equations with over two thousand equations1 of which seventy-five are differential equations. The model size and complexity require serious modelling and simulation tools, but the model is just manageable for doing research with. Therefore results on this example can be transferred to dynamic optimization problems of a whole class of processes. A case study is only valuable if well documented, which explains a full chapter on the base case optimization problem formulation and implementation. The selected case problem represents characteristics common in the process industry.
Plant designs consisting of a reactor and a separation section that recycles the main reactant over the reactor are widespread. This can be motivated from an investment point of view, looking at the capital costs required for the production of an end product with a specific quality. The driving forces for the reaction are the concentrations of the reactants. In batch operation with only a reactor this concentration rapidly decreases, slowing down the reaction rate. Batch operation would therefore require long residence times for the desired end-product quality, thus large reactors to meet capacity specifications and consequently high capital costs. Combining a reactor with separation and reactant recycle enables higher concentrations and thus higher reaction rates in the reactor, because the end product is obtained from the separation unit. Selectivity can often be improved by the use of a catalyst, minimizing the byproducts and increasing productivity for both setups. In the introduction we estimated that a simple chemical production plant would easily involve a thousand differential equations and a couple of thousand algebraic equations. In the example process that will be used in the base case we confine ourselves to a first-order reaction. This moderates the scale of the model without compromising the plant-wide aspects. Still the scale of the model is large enough to test model reduction techniques for large-scale models. First we will describe the plant and motivate the formulation of the dynamic optimization problem. Then the optimization strategy is outlined, including the most characteristic implementation details. Specific implementation choices can have a decisive effect on the performance of the dynamic optimization algorithm. Finally the results of the dynamic optimization problem are presented and discussed.

1 This includes the identity equations introduced by connecting sub-models.
Plant description

The example is realistic in the sense that the unit operations reflect a typical industrial plant, since it involves a reaction step combined with a separation step and a recycle, with basic control. Therefore the essentials of real plant behavior are captured, reflecting the interaction of the separate unit operations. It is an academic example because the reaction and reaction kinetics are not based on real data and describe the academic A → B reaction. The parameters in the Soave-Redlich-Kwong physical property model used in the distillation column are from the literature (Reid, 1987) and describe the properties of propane and n-butane. The model was programmed with the gproms software2. From an economic point of view it is optimal to operate at maximum reactor volume and fixed temperature. The reactor temperature is either determined by selectivity or bounded by a safety argument. Therefore the reactor holdup and temperature are PI-controlled. The holdups in reboiler and condenser are not crucial for operation, but their capacity can be used for disturbance rejection. Both holdups are therefore P-controlled. Quality in the bottom of the column is controlled by manipulation of the boilup rate, and quality in the top of the column is controlled by manipulation of the reflux ratio. Pressure in the column is controlled by the condenser duty with a PI-controller with a fixed set-point.

2 gproms code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.

Figure 3.1: Example process with reactor, distillation, recycle, basic control and product storage. The flow sheet indicates the nominal operating point, among others F0 = 0.50 mol/s, z0 = 1.00, T0 = 293 K, T = 323 K, N = 400 mol, F = 1.1125 mol/s, z = 0.50, D = 0.6125 mol/s, R = 1.5739 mol/s, Q = 5250 J/s, VB = 2.1864 mol/s, B = 0.50 mol/s, xd = 0.90, xb = 0.01, Td = 323 K.
No heat integration was possible, since the temperature in the reactor is too low for heat integration with the reboiler. The product is stored in a tank park from which it can be transferred to a ship, train or tank wagon. In this specific case we will assume that the product can be blended, and therefore the tanks are considered to be part of the production site to be optimized.

Dynamic optimization problem

The dynamic optimization is part of the control hierarchy as shown in Figure 1.1 and gets information from the scheduler on when to produce what product. The typical look-ahead of this schedule is a couple of days or weeks, and the typical update rate is daily at most. The objective of the dynamic optimizer under consideration is the realization of this production schedule, provided by the scheduling and planning department, at minimum cost. The typical look-ahead for a dynamic optimization is a couple of hours up to one day. The production schedule fixes quality and quantity in time, and those values enter the dynamic optimization problem as constraints on the content of a tank. This formulation leaves maximum freedom for the optimizer. In this specific case we want to enforce a transition followed by a quasi steady state, which has advantages from an operational point of view. The objective of the optimization problem for this case is: "Realize the production schedule at minimal utility costs while satisfying all input and path constraints". In the dynamic optimization as described here we chose a horizon such that two complete tanks of the schedule are within the horizon. From the schedule the qualities and delivery times are obtained and translated into path constraints for the dynamic optimization. In this case, after eight hours the first tank has to contain a product with an impurity between 0.04 and 0.05, and the next tank has to be ready after another eight hours with an impurity lower than 0.01.
The initial conditions of the plant in this example form a steady state but could be an arbitrary non-steady state. In case of an online application we need some state estimation that provides the best estimate of the current state, which is most likely not a steady state. After solving the dynamic optimization only the first set points are sent to the plant, after which the state estimation and optimization are repeated. This is known as the receding horizon principle. The utility costs are assumed to be dominated by the energy used in the reboiler. Mixing energy is low, and although cooling energy is significant, by using cooling water its cost can be assumed to be negligible. Material costs do not enter the economic objective since we consider a simple reaction without secondary reactions, i.e. all base material A reacts only and completely to B. The inventory is controlled by the base controllers. The boilup duty and reflux ratio are the degrees of freedom in this optimization problem. This formulation prevents the snowball effect that is discussed in many papers, e.g. Wu et al. (2002). Furthermore we assume the plant to be operated in push mode, which implies that the fresh feed is not a degree of freedom but fixed, in this case due to constant upstream production. The mathematical formulation of the economic optimization problem is as follows:

\min_{u \in U} V = \int_{t_0}^{t_f} L z \, dt
\text{s.t.} \quad \dot{x} = f(x, y, u), \quad x(t_0) = x_0
0 = g(x, y, u)
z = C_x x + C_y y + C_u u
0 \le h(z, t),   (3.1)

where x, y and u are state, algebraic and input variables, respectively. The relevant input variables for the economic objective are selected by the matrix L, which enables scaling of the variables as well. The performance variables are defined by the matrices Cx, Cy and Cu, respectively, that are used to select and scale the relevant performance variables. The path constraints h(z, t) are defined by the inequality equations.
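The receding horizon principle mentioned above can be sketched on a toy scalar system. Nothing below is the thesis plant: the dynamics, set-point and weights are invented for illustration; at each closed-loop step an input sequence is optimized over the horizon, only the first move is applied, and the optimization is repeated from the newly observed state.

```python
# Toy receding-horizon loop (hypothetical scalar plant x+ = a x + b u).
import numpy as np
from scipy.optimize import minimize

a, b = 0.9, 0.5          # plant parameters (made up)
x_ref, H = 1.0, 10       # set-point and horizon length

def cost(u_seq, x0):
    """Tracking cost plus a small input penalty, accumulated over the horizon."""
    x, J = x0, 0.0
    for u in u_seq:
        x = a * x + b * u
        J += (x - x_ref) ** 2 + 0.01 * u ** 2
    return J

x = 0.0
history = []
for _ in range(20):                                   # closed-loop steps
    res = minimize(cost, np.zeros(H), args=(x,), method="SLSQP")
    u0 = res.x[0]                                     # apply only the first move
    x = a * x + b * u0                                # plant step (same model here)
    history.append(x)
```

In an online application the plant step would be replaced by the real process plus a state estimator, as described in the text.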
Furthermore we chose the so-called Haar basis as control parametrization

u(t) = P^T \phi(t),   (3.2)

with

\phi_j(t) = 1, \quad j\Delta t \le t < (j+1)\Delta t,
\phi_j(t) = 0, \quad \text{otherwise}.   (3.3)

We focused on the sequential approach, which uses simulation to satisfy the model equations, enabling ease of implementation. According to Vassiliadis (1993) the sequential approach is more successful for large-scale dynamic optimization problems than the simultaneous approach (see e.g. Biegler, 1984). Yet researchers are continuously making progress in developing more efficient dedicated solvers for the simultaneous approach (Biegler, 2002), making that approach more and more competitive for large problems. Recall from the introduction that a plant model easily consists of a hundred thousand equations, and that we consider in this example case a model with two thousand equations. We used piecewise-constant basis functions as the input parametrization, with a sample time of ten minutes, overlooking a period of sixteen hours for two decision variables. Finally, we approximated the gradient information by computing linear time-varying models along the trajectory, which is quite efficient since it reuses Jacobian information from the simulation. Alternatives for this approach are integration of the so-called sensitivity equations, solving adjoint equations, or parameter perturbation (see e.g. Vassiliadis, 1993; Støren, 1999). In the sequential approach first the initial trajectory is simulated. Based on this solution, objective function and constraints can be evaluated. Then gradient information is used to determine a trajectory improving the objective and constraints, which is simulated again. This sequence is repeated until some termination criterion is met, using a standard sqp.

Approximation of objective function and constraints

The integral in the objective function V(z) in Equation (3.1) is replaced by a trapezium rule to approximate the integral, and we sampled the inequality constraints h(z, t).
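The piecewise-constant parametrization of Equations (3.2)-(3.3) can be written out in a few lines; the parameter matrix P below is a hypothetical example for two inputs on four intervals, not the thesis trajectory.

```python
# Sketch of the zero-order-hold (Haar-type) input parametrization
# u(t) = P^T phi(t) with phi_j(t) = 1 on [j*dt, (j+1)*dt), else 0.
import numpy as np

dt = 600.0                        # sample time of ten minutes [s]
P = np.array([[5.0, 2.0],         # row j holds both inputs on interval j
              [5.5, 2.1],
              [6.0, 2.3],
              [5.8, 2.2]])

def phi(t):
    """Indicator basis: unit vector selecting the active interval."""
    j = int(t // dt)
    e = np.zeros(P.shape[0])
    if 0 <= j < P.shape[0]:
        e[j] = 1.0
    return e

def u(t):
    return P.T @ phi(t)
```

Stacking the rows of P gives exactly the decision vector p of the optimization.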
Furthermore we added a quadratic term to the objective, which regularizes the optimization problem. The matrix Q quadratically penalizes subsequent input moves and therefore smooths the optimal solution. The advantage of a smooth optimal signal is that it is more likely to be accepted and understood by operators. Furthermore it seems that the quadratic term reduces the number of local minima. This is favorable when comparing solutions of this optimization based on different reduced models.

Output variables (y): bottom impurity, impurity tank 1, impurity tank 2
Input variables (u): reboiler duty, reflux ratio

Table 3.1: Definition of the output vector (y) and input vector (u).

This results in the following approximate formulation of the dynamic optimization problem3

\min_p V = \frac{\Delta t}{t_f - t_0} \sum_{i=1}^{N} \left( L z_i + \tfrac{1}{2} \Delta u_i^T Q \Delta u_i \right)
\text{s.t.} \quad \dot{x} = f(x, y, u), \quad x(t_0) = x_0
0 = g(x, y, u)
z = C_x x + C_y y + C_u u
0 \le h_i(z_i), \quad i = 1 \ldots N
0 = -\Delta u_i + u_i - u_{i-1}, \quad i = 1 \ldots N,   (3.4)

where

p = [u_0^T, u_1^T, \ldots, u_{N-1}^T]^T.   (3.5)

3 We scaled the objective by dividing the original objective by the horizon length.

In this specific case we defined three output variables: the bottom impurity and the impurities in the two tanks, respectively. The two inputs are defined as reboiler duty and reflux ratio (see Table 3.1). The impurities in the tanks are required since we want to meet the schedule, which is defined as a specification on the quality in a tank. The operation strategy is to blend off-spec product during the transition into an on-spec product in the tank. This enforces a swift transition followed by a more stationary operation. Furthermore it is a robust operation strategy, since the quality in the tank is very soon within specification. With more product in the tank it is less sensitive to control actions. The reboiler duty appears in the objective function linearly, reflecting the dominating utility costs of the plant. Note that the variables appear in an absolute sense. Relative changes of the reboiler duty and reflux ratio enter the objective in the quadratic term, with z^T = [y^T, u^T] and

L = \begin{pmatrix} 0 & 0 & 0 & 10 & 0 \end{pmatrix}, \quad Q = \begin{pmatrix} 1 & 0 \\ 0 & 10 \end{pmatrix}.   (3.6)

The matrices Cx, Cy and Cu are used for scaling, which is required for proper constraint handling. Without scaling a constraint violation of e.g. 0.1 °C would be equally penalized as a constraint violation of 0.1 in impurity. If we would like changes of the impurity in a range of 0-0.01 to be equally important as changes of the temperature in a range of 45-55 °C, we need to scale to compensate for units. Scaling was done by dividing the variable by this range. This resulted in the following scaling matrices4

C_y = \begin{pmatrix} 100 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 10 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad C_u = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0.1 & 0 \\ 0 & 0.1 \end{pmatrix}.   (3.7)

4 In this case we did not use Cx.

The gradient of the objective and the inequality constraints can be approximated by linearization of the plant dynamics. This can be derived as presented next:

\frac{\partial V}{\partial p} = \frac{\partial V}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p},   (3.8)
\frac{\partial h}{\partial p} = \frac{\partial h}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p},   (3.9)

where \frac{\partial V}{\partial z}, \frac{\partial h}{\partial z} and \frac{\partial u}{\partial p} can be derived analytically and \frac{\partial z}{\partial u} can be approximated by a linear time-variant model along the trajectory

\begin{pmatrix} \Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \vdots \\ \Delta z_N \end{pmatrix} =
\begin{pmatrix}
D_0 & 0 & 0 & \ldots & 0 \\
C_1 \Gamma_0 & D_1 & 0 & \ldots & 0 \\
C_2 \Phi_1 \Gamma_0 & C_2 \Gamma_1 & D_2 & \ldots & 0 \\
\vdots & & & \ddots & \vdots \\
C_N \prod_{i=1}^{N-1} \Phi_i \, \Gamma_0 & \ldots & \ldots & \ldots & D_N
\end{pmatrix}
\begin{pmatrix} \Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \vdots \\ \Delta u_N \end{pmatrix},   (3.10)

where for piecewise-constant control signals the transition matrices are defined as

\Phi_i = e^{A_i \Delta t}, \quad \Gamma_i = \int_{i\Delta t}^{(i+1)\Delta t} e^{A_i s} \, ds \, B_i,   (3.11)

with the Jacobians evaluated at t_i = i\Delta t in (x_i, y_i, u_i):

A_i = \frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial x}, \quad
B_i = \frac{\partial f}{\partial u} - \frac{\partial f}{\partial y} \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial u},   (3.12)

C_i = C_x - C_y \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial x}, \quad
D_i = C_u - C_y \left[ \frac{\partial g}{\partial y} \right]^{-1} \frac{\partial g}{\partial u}.   (3.13)

Figure 3.2: Initial guess (dots) and optimal solution (solid) for reboiler duty (top) and reflux rate (bottom).
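The elimination of the algebraic variables in Equations (3.12)-(3.13) is a Schur-complement operation on the DAE Jacobian blocks. A minimal sketch with small hypothetical Jacobian blocks (not the plant model) is:

```python
# Sketch of Equations (3.12)-(3.13): eliminate the algebraic variables from
# the linearized DAE via the Jacobian of g with respect to y.
import numpy as np

def dae_to_ss(fx, fy, fu, gx, gy, gu, Cx, Cy, Cu):
    """Return (A, B, C, D), e.g. A = fx - fy gy^{-1} gx."""
    gyinv_gx = np.linalg.solve(gy, gx)
    gyinv_gu = np.linalg.solve(gy, gu)
    A = fx - fy @ gyinv_gx
    B = fu - fy @ gyinv_gu
    C = Cx - Cy @ gyinv_gx
    D = Cu - Cy @ gyinv_gu
    return A, B, C, D

rng = np.random.default_rng(1)
n, m, p = 3, 2, 2                      # states, algebraic variables, inputs
fx = rng.standard_normal((n, n))
fy = rng.standard_normal((n, m))
fu = rng.standard_normal((n, p))
gx = rng.standard_normal((m, n))
gu = rng.standard_normal((m, p))
gy = rng.standard_normal((m, m)) + 3.0 * np.eye(m)   # well-conditioned dg/dy
Cx, Cy, Cu = np.zeros((1, n)), np.ones((1, m)), np.zeros((1, p))

A, B, C, D = dae_to_ss(fx, fy, fu, gx, gy, gu, Cx, Cy, Cu)
```

For the linearized DAE, solving 0 = gx x + gy y + gu u for y and substituting into the state and output equations reproduces exactly (A, B, C, D), which is what the test below checks.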
Figure 3.3: Trajectories of bottom impurity (top), first vessel (bottom left) and second vessel (bottom right) for the initial guess (dots) and the resulting optimal solution (solid). The constraints are the block-shaped functions between 2 and 8 and between 10 and 16 hours. The optimal solution satisfies the path constraints whereas the initial guess does not.

This can be written as

\begin{pmatrix} \Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \Delta z_3 \\ \vdots \\ \Delta z_N \end{pmatrix} =
\begin{pmatrix}
G_{00} & 0 & 0 & 0 & \ldots & 0 \\
G_{10} & G_{11} & 0 & 0 & \ldots & 0 \\
G_{20} & G_{21} & G_{22} & 0 & \ldots & 0 \\
G_{30} & G_{31} & G_{32} & G_{33} & \ldots & 0 \\
\vdots & & & & \ddots & \vdots \\
G_{N0} & \ldots & \ldots & \ldots & \ldots & G_{NN}
\end{pmatrix}
\begin{pmatrix} \Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \\ \vdots \\ \Delta u_N \end{pmatrix},   (3.14)

where G_{ij} is the approximate \partial z_i / \partial u_j. Note that both z_i and u_j are vectors.

We need to select an initial guess for the trajectory and an initial condition to start the dynamic optimization. In this case we assume the plant to be in steady state, so the initial condition is determined by the inputs at time zero. The initial guess for the optimization was determined by finding two steady-state operating conditions for the column that satisfy end-product quality, not compensating for off-spec product during the transition. The reflux rate was chosen to be constant whereas the reboiler duty was step-shaped, as depicted by the line with dots in Figure 3.2. The number of decision variables for the optimization is 192 (16 hours, 2 inputs and a sample time of 10 minutes).

Path constraints          t [hr]:  (0...2)   [2...8)   [8...10)  [10...16]
                          i [-]:   1...11    12...47   48...59   60...96
Bottom impurity xb   <=            0.10      0.05      0.10      0.01
                     >=            0.00      0.04      0.00      0.00
Impurity tank 1 z1   <=            0.10      0.05      -         -
                     >=            0.00      0.04      -         -
Impurity tank 2 z2   <=            -         -         0.10      0.01
                     >=            -         -         0.00      0.00
Reboiler duty Qreb   <= / >=       10 / 0    10 / 0    10 / 0    10 / 0
Reflux ratio RR      <= / >=       10 / 0    10 / 0    10 / 0    10 / 0

Table 3.2: Definition of the path constraints.
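After discretization with the 10-minute sample time, the interval-wise bounds of Table 3.2 become the sampled constraints h_i(z_i) of Equation (3.4). A sketch for the bottom-impurity bounds only, with the interval edges taken from Table 3.2:

```python
# Translate the interval-wise bounds of Table 3.2 into per-sample bound
# arrays (bottom impurity x_b only; 96 samples over 16 hours).
import numpy as np

dt_hr = 1.0 / 6.0                        # ten minutes
t = dt_hr * np.arange(1, 97)             # sample instants i = 1..96 [hr]

xb_ub = np.where(t < 2.0, 0.10,          # first transition window
        np.where(t < 8.0, 0.05,          # tank 1 quality window
        np.where(t < 10.0, 0.10, 0.01))) # second transition, then tank 2 spec
xb_lb = np.where((t >= 2.0) & (t < 8.0), 0.04, 0.00)
```

The sampled inequality constraints are then simply xb_lb[i] <= x_b(t_i) <= xb_ub[i] for each sample i.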
The path constraints, depicted in Figure 3.3 and shown in Table 3.2, allow for a transition of two hours, after which in a period of six hours the production is quasi-stationary. Stationary production is attractive from an operational point of view but restrains flexibility and may exclude opportunities. These should be carefully balanced for a reliable and economically attractive operation.

Iter  F-count    f(x)     max constraint  Step-size  Directional derivative
  1      1      2.72085      190.8            1           -0.0273
  2      3      3.02699      221.6            1           -0.0148
  3      5      3.54475       43.31           1            0.119
  4      7      3.93788       10.16           1            0.044
  5      9      3.99698        1.959          1           -0.141
  6     11      3.87452        0.3502         1           -0.265
  7     13      3.68284        4.906          1           -0.0881
  8     15      3.61549        1.849          1           -0.0743
  9     17      3.56108        2.17           1           -0.0637
 10     19      3.51111        4.023          1           -0.0844
 11     21      3.45050        6.801          1           -0.0605
 12     23      3.40538        4.49           1           -0.0483
 13     25      3.36935        4.153          1           -0.0402
 14     27      3.33937        3.712          1           -0.0326
 15     29      3.31468        3.054          1           -0.0262
 16     31      3.29462        2.497          1           -0.0225
 17     33      3.27738        2.531          1           -0.0211
 18     35      3.26133        2.824          1           -0.02
 19     37      3.24626        2.704          1           -0.0178
 20     39      3.23284        1.919          1           -0.0162
 21     41      3.22063        1.304          1           -0.0164
 22     43      3.20878        1.314          1           -0.0136
 23     45      3.19912        1.302          1           -0.0108
 24     47      3.19121        1.181          1           -0.0106
 25     49      3.18387        1.392          1           -0.00656
solution:       3.17896        0.8278         1           -0.00234

optimization took: 190 min
solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 3.3: Output of the optimization with the original model.

In the top of Figure 3.3 the constraints on the bottom impurity are shown, whereas the bottom left and right show the constraints on the impurity in the two tanks in the tank park. Both tanks are empty and are filled by the bottom flow of the column successively, each during eight hours. The material produced during the transition does not satisfy end-product quality specifications and would have been off-spec product. However, during the transition it is blended such that it does meet end-product quality specifications. In this way we do not need intermediate storage of off-spec product and we prevent downstream blending.
This is attractive from an economic point of view but requires good planning and scheduling.

We used the commercially available modelling tool gproms, which can solve differential and algebraic equations, and selected the integration routine dasolv. Alternative solvers will probably yield different simulation times and consequently different optimization times. We tried to stick to the default values of the solver. For a more general introduction to numerical integration the reader is referred to textbooks on this topic, e.g. by Shampine (1994), Brenan et al. (1996) and Dormand (1996). A sequential quadratic programming algorithm (fmincon), available in Matlab, was adopted from Tousain (2002) to solve the nonlinear constrained optimization. In this approach a so-called Lagrangian is defined, which is the original objective function extended with a multiplication of so-called Lagrange multipliers and constraints (see Appendix C). This formulation yields first-order optimality conditions that coincide with those of the original constrained optimization problem. See for more details e.g. Nash and Sofer (1996), Gill et al. (1981) and Han (1977). In joint work with Schlegel et al. (2002) a similar sequential optimization problem was solved using adopt, with optimization times in the same order of magnitude. It would have been interesting to compare these and other implementations in more detail, which is left for future research.

3.2 Results

The results of the optimization are shown in Figure 3.3. The responses of the key variables to the initial guess are plotted with the lines with dots. Severe constraint violations are visible. Especially the final impurity in the second tank is far too high. The solid line in Figure 3.3 shows the trajectories after optimization. The constraints are satisfied, and we can observe the undershoot that nicely makes up for the off-spec production during the second transition.
Furthermore we note that the maximum allowable impurity is produced, which is explainable since this is economically most attractive. Note that before the first vessel is finished, the impurity in the bottom of the column already starts to decrease. The solution anticipates the transition to a higher purity by increasing the reboiler duty half an hour before the first vessel is finished. The intermediate results of the optimization are shown in Table 3.3. The table shows the objective value, maximum constraint violation, step size and directional derivative for the different iterations. Furthermore the number of function evaluations is shown in the second column of the table. The final solution is presented in the last row of the table, where it is clear that the optimization converged to a feasible solution in which the maximum constraint violation is less than the constraint tolerance. In Table 3.4 approximate times per task are shown, from which the most important observation is that the time required to solve the sequential quadratic programme (sqp) is very small compared to simulation and computation of gradient information. Overhead involves time spent on e.g. communication between gproms and Matlab. The gradient information is computed from the Jacobian that is available in the simulator at each sample time. From this information a discrete-time linear model can be derived that is used in Equation (3.10).

                          %
simulation               80
gradient information     10
SQP                       5
overhead                  5

Table 3.4: Approximate contribution to the computational load per task.

The time required to solve this problem was over three hours, which shows that we can solve the problem but that it requires a significant amount of time. Using this optimization for online control would most probably already be too slow to adequately deal with disturbances. This only gets worse if we were to include more components and extra distillation columns to split those components.
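The SQP mechanics described above can be tried in miniature with SciPy's SLSQP as a stand-in for fmincon. The objective and constraint below are invented, echoing only the structure of the case (a linear utility cost plus a quadratic move penalty, subject to a quality-type inequality):

```python
# Toy constrained problem solved with an SQP-type method (SLSQP).
import numpy as np
from scipy.optimize import minimize

f = lambda u: u[0] + 0.5 * (u[1] - 1.0) ** 2        # linear + quadratic term
h = lambda u: np.array([u[0] * u[1] - 1.0])         # "quality >= spec" style

res = minimize(f, x0=[2.0, 2.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": h}],
               bounds=[(0.1, 10.0), (0.1, 10.0)])
```

At the solution the inequality is active, and eliminating the Lagrange multiplier from the first-order conditions reduces the KKT system to u1^3 - u1^2 = 1, which the test below verifies.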
Note that the implementation as presented here is already quite sophisticated, supplying approximate gradient information in a computationally efficient way, which reduces the required optimization time.

3.3 Model quality

Development of a method to derive approximate models is useless without assessing the value of the approximate model. The appreciation of a model for dynamic optimization mainly depends on two properties. First, the computational load of the overall optimization to a large extent determines its potential for online applications. Second is the quality of the solution of the optimization based on the approximate model. The first property was already addressed in a previous section. The second property was addressed by Forbes (1994) in his paper on model adequacy requirements. He mentions three ways to differentiate between alternative models. The most commonly used method is to compare their capability in predicting key process variables. The second method is to check the optimality conditions, which can be done if a solution of the original problem is available. Finally, it is tested whether the optimization based on the approximate model is capable of predicting the manipulated variables that coincide with the true optimal trajectories. This is illustrated in Figure 3.4, where the solution of the optimization in parameter space is depicted.

Figure 3.4: Model adequacy and solution adequacy illustrated by optimizations (O) based on the original model (M) and approximate model (M̂) mapping different initial guesses to optimal solutions in R^np. A solution derived with an approximate model is adequate if it satisfies the optimality conditions of the optimization based on the original model. An approximate model is adequate if the true optimal solution satisfies the optimality conditions of the optimization based on the approximate model. In the graph p0 represents the initial guess that is mapped to p1 and p̂1 by optimizations based on the original and reduced model, respectively. Subsequently p1 and p̂1 are used as initial guesses for optimizations based on the reduced and original model, respectively. The results of these last two optimizations are p̂2 and p2.

The initial guess p0 can be processed by an optimization problem based on the original model or based on an approximate model, O(M) or O(M̂), respectively. These two optimizations result in two solutions, p1 and p̂1, that can be used as initial guesses for two alternate optimization problems in which we switch models. These optimizations result in two solutions, p2 and p̂2, for the optimization based on the original and approximate model, respectively. In case of global optimality the equations

p1 = p2,   (3.15)
p̂1 = p̂2,   (3.16)

are valid, and for a good approximate model the equations

p1 ≈ p̂2,   (3.17)
p̂1 ≈ p2,   (3.18)

hold true. An approximate model is adequate in case the distance ‖p1 − p̂2‖ is acceptably small. This implies that the predicted minimizer of the original problem is a minimizer of the optimization problem based on the approximate model. The solution of an optimization based on an approximate model is adequate in case the distance ‖p̂1 − p2‖ is acceptably small.
This implies that the minimizer of the optimization problem with the approximate model provides a solution that is close to a minimizer of the original model. In case ‖p1 − p̂1‖ is small we expect ‖p1 − p2‖ and ‖p̂1 − p̂2‖ to be small, i.e. the same local minimizers. Note that due to local minima ‖p1 − p̂1‖ can be large and still result in the conclusion that the solution and the approximate model are adequate. In this thesis we will assess model performance by simulation and evaluation of the key process variables and by both adequacy tests explained above. The optimization statistics will also be presented, such as the number of iterations, the number of function evaluations and the CPU time of the optimization/simulation, to compare the computational load of the different optimizations.

3.4 Discussion

In this chapter we presented a sensible dynamic optimization problem for a typical chemical process. Although great effort was put into this formulation, one could think of arguments that would result in a different formulation of the optimization problem. An interesting extension of this problem would for example be to add the fresh feed to the set of optimization variables and to add constraints on the levels in the tank park. Furthermore, the time interval allowed for the transition could be enlarged, providing more freedom and possibly more opportunities for economic improvement. These scenarios were regarded as very interesting but did not add extra value to the purpose of this example, namely the study of different model reduction techniques for dynamic optimization. Nevertheless this case represents a large class of optimization problems in the process industry. The optimization was successfully implemented and provided optimal trajectories that would be acceptable for real implementation. For off-line trajectory optimization we could stop here, but for online optimization a sample time larger than three hours is not acceptable.
Computational load is the main hurdle to overcome in going from an off-line to an online implementation that is fast enough to adequately respond to disturbances. We motivated and provided detailed information on the implementation of the optimization problem. This is necessary to interpret the results, since seemingly unimportant details can have a decisive impact. Although we would expect similar optimal trajectories for a simultaneous optimization approach, we did not put effort into comparing the two approaches.

Chapter 4 Physics-based model reduction

In this chapter model reduction based on physics (remodelling) is discussed. We start by presenting the rigorous model for a reaction-separation process, followed by the physics-based reduced model. Modelling assumptions are the key to this reduction. This approach is mainly inspired by the question of what the added value of detailed modelling is for dynamic optimization. Simplified models became outdated because for simulation purposes it was not a problem to develop and use more detailed models. Dynamic optimization is computationally more demanding than simulation, and therefore the idea arose to fall back on simple models. The quality of the reduced model is assessed in the last section by execution of a dynamic optimization based on the reduced model.

4.1 Rigorous model

The process consists of a reactor and a distillation column, as depicted in the flow sheet in Figure 3.1. The reactor is fed by a fresh feed and the recycle stream from the top of the distillation column. The outlet of the reactor is connected to the feed of the column. First we will describe the very simple reactor model, followed by the column model, which is far more complex and therefore a better candidate for remodelling. The detailed column model is a standard model with component balances and an energy balance.
The physical properties model used to derive gas and liquid fugacity, enthalpy and density is based on a cubic equation of state as described in Reid (1987). The simplified model is based on a constant molar overflow model and constant relative volatility as described in Skogestad (1997). This last model was slightly modified by making the relative volatility dependent on composition.

Reactor model

The reactor is modelled as a continuously stirred tank reactor (cstr). We assume the content to be ideally mixed, which implies that no temperature and no concentration gradients occur within the reactor. In the reactor a first-order exothermic irreversible reaction A → B takes place without side reactions, so only two species are modelled. The reaction rate depends on the concentration CA and on temperature, as described by an Arrhenius equation.

Total molar balance

\frac{dN}{dt} = F_0 + D - F,   (4.1)

where N is the total molar holdup in the reactor, F0 is the fresh feed, D is the recycle flow from the top of the distillation column and F is the product flow out of the reactor. All flows are total molar flows of both reactant A and product B. The outgoing flow F is controlled by a level controller. The fresh feed F0 is assumed to be determined by the upstream process. The distillate flow D is controlled by the level controller on the condenser.

Component mass balance

\rho_m \frac{dNz}{dt} = \rho_m F_0 z_0 + \rho_m D x_d - \rho_m F z - k_0 C_A V e^{\frac{-E_a}{RT}},   (4.2)

where ρm is the molar density of both the reactant A and the product B, z is the molar fraction of A in the reactor, xd and z0 are the molar fractions of A in the recycle and fresh feed flows, k0 and Ea are the reaction rate and activation constants, respectively, and R is the gas constant. CA and T are the molar concentration and temperature in the reactor, and V is the volume in the reactor of reactant A and product B together.
The component mass balance can be rearranged by substitution of the equalities d(Nz)/dt = N dz/dt + z dN/dt and C_A V = N z, and of the total molar balance:

Component balance rearranged

    dz/dt = (F_0/N)(z_0 - z) + (D/N)(x_d - z) - (k_0/\rho_m) e^{-E_a/(RT)} z .    (4.3)

Energy balance

    \rho_m c_p d(NT)/dt = \rho_m c_p (F_0 T_0 + D T_d - F T) - Q + k_0 C_A V e^{-E_a/(RT)} \Delta H ,    (4.4)

where c_p is the specific heat, assumed to be independent of temperature and of composition, and T_0, T_d and T are the temperatures of the fresh feed, the distillate flow and the reactor, respectively. Q is the heat removed from the reactor by cooling and \Delta H is the reaction heat constant.

Heat removal

    \rho_c c_{pc} V_c dT_c/dt = \rho_c c_{pc} \Phi_c (T_{ci} - T_c) + Q ,    (4.5)

    Q = \alpha_c A_c (T - T_c) ,    (4.6)

where \rho_c, c_{pc}, V_c and T_c are the density, specific heat, volume and temperature of the coolant, respectively. \Phi_c is the coolant flow rate and T_{ci} is the temperature of the coolant at the inlet of the heat exchanger. \alpha_c and A_c are the total heat transfer coefficient and the total area of heat transfer between coolant and reactor content. We assume the dynamics to be very fast and the coolant in the heat exchanger to be ideally mixed. It is clear that by manipulating the coolant flow rate \Phi_c we can control the coolant temperature T_c. Together with T, T_c determines the driving force for the heat removal. A controller that measures the temperature in the reactor and actuates the coolant flow rate can in this way control the heat removal and therefore the reactor temperature. If the temperature controller is well tuned and the heat exchanger is well designed, we can assume that we directly manipulate the heat removal. In the plant model used in this thesis we assume that the heat exchanger is well designed. Therefore the equations describing the heat exchanger are not included and the temperature controller directly controls the heat removal.
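The reactor balances can be collected into a small simulation sketch. This is a minimal illustration, not the thesis implementation: the function name and all parameter values are hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of the reactor balances (4.1), (4.3) and (4.7).
# `cstr_rhs` and all parameter values are illustrative, not the thesis code.
def cstr_rhs(t, state, F0, D, z0, xd, Q, p):
    """Right-hand sides of the total molar, component and energy balances."""
    N, z, T = state
    # reaction term (k0/rho_m) exp(-Ea/(R T)) z, shared by (4.3) and (4.7)
    r = p["k0"] / p["rho_m"] * np.exp(-p["Ea"] / (p["R"] * T)) * z
    dN = F0 + D - p["F"]                                   # (4.1)
    dz = F0 / N * (z0 - z) + D / N * (xd - z) - r          # (4.3)
    dT = (F0 / N * (p["T0"] - T) + D / N * (p["Td"] - T)
          - Q / (N * p["rho_m"] * p["cp"])
          + r * p["dH"] / p["cp"])                         # (4.7)
    return np.array([dN, dz, dT])
```

With the level controller active, F tracks F0 + D and the holdup derivative vanishes; a simple forward-Euler loop over `cstr_rhs` suffices to reproduce qualitative responses.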
The energy balance can be rearranged by substitution of the equality d(NT)/dt = N dT/dt + T dN/dt, of C_A V = N z, and of the total molar balance.

Energy balance rearranged

    dT/dt = (F_0/N)(T_0 - T) + (D/N)(T_d - T) - Q/(N \rho_m c_p) + (k_0/(\rho_m c_p)) e^{-E_a/(RT)} z \Delta H .    (4.7)

The reactor model is very simple; however, without tight temperature control even this very simple model can exhibit limit cycling behavior. We could add two modelling assumptions that simplify the reactor model even further: we could assume the temperature and the level to be constant. This can be motivated by the assumption that we are able to control these variables very fast. The controlled variables are then considered to be directly manipulated variables. These assumptions are applied in the model depicted in Figure 2.2 but not in the models described in this chapter.

Rigorous column model

The distillation column consists of trays, a reboiler and a condenser. The next equations¹ in this section are based on the gPROMS model library (1997). These equations hold for all three sub-models, where for the reboiler there is no vapor flow entering (F_in^vap = 0) and for the condenser there is no liquid flow entering (F_in^liq = 0) and no vapor flow out (F_out^vap = 0, total condenser). For all sub-models except the feed tray there is no feed flow (F_feed = 0). The equations can describe a mixture of two or more components. In this case it is a binary mixture, so NC = 2.

Component molar balance

    dM_i/dt = F_in^liq z_{i,in}^liq + F_in^vap z_{i,in}^vap + F_feed z_{i,feed} - F_out^liq x_i - F_out^vap y_i ,  i = 1 ... NC.    (4.8)

Energy balance

    dU/dt = F_in^liq h_in^liq + F_in^vap h_in^vap + F_feed h_feed - F_out^liq h^liq - F_out^vap h^vap + Q .    (4.9)

Molar holdup

    M_i = M_L x_i + M_V y_i ,  i = 1 ... NC.    (4.10)

Total energy

    U = M_L h^liq + M_V h^vap - P V_tray .    (4.11)

Mole fraction normalization

    \sum_{i=1}^{NC} x_i = \sum_{i=1}^{NC} y_i = 1 .    (4.12)

Phase equilibrium

    \Phi_i^liq(P, T, x_i) x_i = \Phi_i^vap(P, T, y_i) y_i ,  i = 1 ... NC.    (4.13)
Geometry constraint

    M_L \bar{v}_liq + M_V \bar{v}_vap = V_tray .    (4.14)

Level of clear liquid

    L_liq = M_L \bar{v}_liq / A_p .    (4.15)

¹ gPROMS code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.

Furthermore we used a physical property model based on a cubic equation of state, from which the properties of the liquid phase as well as the properties of the vapor phase can be computed.

The Soave equation of state

    Z^3 - Z^2 + (A - B - B^2)Z - AB = 0 ,    (4.16)

where

    A = aP/(R^2 T^2) ,    (4.17)
    B = bP/(RT) .    (4.18)

The values of a and b can be deduced from the critical properties and acentric factor of the pure components and from mixing rules:

    a_i = 0.42748 (R^2 T_{c,i}^2 / P_{c,i}) [1 + f_{w,i}(1 - \sqrt{T/T_{c,i}})]^2 ,    (4.19)
    f_{w,i} = 0.48 + 1.574 \omega_i - 0.176 \omega_i^2 ,    (4.20)
    b_i = 0.08664 R T_{c,i} / P_{c,i} .    (4.21)

The values of a and b for mixtures are given by the following mixing rules

    a = (\sum_{i=1}^{NC} z_i \sqrt{a_i})^2 ,    (4.22)
    b = \sum_{i=1}^{NC} z_i b_i ,    (4.23)

where z_i is the molar fraction in the mixture. The cubic equation is solved, which yields three real roots

    Z_l <= Z_j <= Z_v ,    (4.24)

which are implemented by the next three equations

    1 = Z_1 + Z_2 + Z_3 ,    (4.25)
    A - B - B^2 = Z_1 Z_2 + Z_2 Z_3 + Z_3 Z_1 ,    (4.26)
    AB = Z_1 Z_2 Z_3 .    (4.27)

The liquid and vapor compressibility are determined as follows

    Z_liq = Z_l ,    (4.28)
    Z_vap = Z_v .    (4.29)

With the solution of the equation of state the physical properties of interest are calculated using the following expressions. Substitution of Z_liq yields the property for the liquid phase, whereas substitution of Z_vap yields the property for the vapor phase.

Specific volume

    \bar{v} = ZRT/P .    (4.30)

Specific density

    \rho = (1/\bar{v}) \sum_{i=1}^{NC} x_i MW_i .    (4.31)

Specific enthalpy

    h = (1/b)(a - T \partial a/\partial T) \ln(Z/(Z + B)) + RT(Z - 1) + \sum_{i=1}^{NC} x_i \int_{T_ref}^{T} C_{p,i}(T) dT .    (4.32)

Fugacity

    \ln \Phi_i = (b_i/b)(Z - 1) - \ln(Z - B) + (A/B)(b_i/b - 2\sqrt{a_i/a}) \ln((Z + B)/Z) ,    (4.33)

where C_{p,i} is given by

    C_{p,i}(T) = \sum_{j=1}^{4} C_{p,i}^{(j)} T^{j-1} .    (4.34)

For the tray model we need equations that specify the internal flows.
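Before turning to the tray hydraulics, the cubic-equation step above, selecting the smallest and largest real roots as the liquid and vapor compressibility, equations (4.24)-(4.29), can be sketched as follows. The function name is hypothetical, and numpy's polynomial root finder stands in for whatever solver the modelling environment uses.

```python
import numpy as np

# Sketch of solving the Soave cubic (4.16) for Z_liq and Z_vap, given the
# dimensionless groups A and B of (4.17)-(4.18). Illustrative only.
def compressibility_factors(A, B):
    """Return (Z_liq, Z_vap): smallest and largest real root of (4.16)."""
    roots = np.roots([1.0, -1.0, A - B - B**2, -A * B])
    real = np.sort(roots[np.abs(roots.imag) < 1e-10].real)
    return real[0], real[-1]
```

Note that in the one-phase region the cubic has a single real root, in which case both returned values coincide.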
Overflow over weir

    F_out^liq \bar{v}_liq = 1.84 L_w ((L_liq - \beta h_w)/\beta)^{3/2} .    (4.35)

Pressure drop over the plate

    F_in^vap = (A_h/\bar{v}_vap) \sqrt{(P_in^vap - P - \rho_liq g L_liq)/(\alpha \rho_vap)} .    (4.36)

Adiabatic operation of a tray

    Q = 0 .    (4.37)

For the reboiler and condenser Q is not equal to zero because of the heat input and heat removal, respectively. Not all streams are present in all sub-models. Only at the feed tray does the feed enter the column, so for all other trays the feed becomes zero. Furthermore, no vapor stream enters the reboiler, whereas no vapor stream leaves the condenser (total condenser). These equations were put together in a rigorous model for the reaction separation process.

4.2 Physics-based reduced model

From literature we know that a simple way of modelling a distillation column is to assume constant molar holdup, constant molar overflow and constant relative volatility (see e.g. Skogestad, 1997). Such a model is probably too simple for our case but provides the starting point for our physically reduced model².

Component molar balance

    dM_i/dt = F_in^liq z_{i,in}^liq + F_in^vap z_{i,in}^vap + F_feed z_{i,feed} - F_out^liq x_i - F_out^vap y_i ,  i = 1 ... NC.    (4.38)

Molar holdup is in the liquid phase

    M_i = M_L x_i + M_V y_i ≈ M_L x_i ,  i = 1 ... NC.    (4.39)

Mole fraction normalization

    \sum_{i=1}^{NC} x_i = \sum_{i=1}^{NC} y_i = 1 .    (4.40)

² gPROMS code can be found on the webpage http://www.dcsc.tudelft.nl/Research/Software.

Phase equilibrium by constant relative volatility

    y_i = \alpha_i x_i / \sum_{j=1}^{NC} \alpha_j x_j ,  i = 1 ... NC.    (4.41)

Linearized liquid dynamics

    F_out^liq = F_out^liq* + (M_L - M_L^*)/\tau .    (4.42)

Constant vapor stream

    F_out^vap = F_in^vap .    (4.43)

We will now go into the details of the different assumptions that yield the reduced model. An important assumption needed for the physics-based model reduction is that the pressure in the column is controlled at a constant set point.
This seems somewhat restrictive, but in practice the pressure set point is not the main handle for control of a distillation column. If the pressure drop over the column is negligible compared to the absolute pressure, we can assume the pressure to be constant as far as physical properties are concerned. With respect to the internal vapor flux, however, the assumption of constant pressure in the column is not valid, since only a pressure difference can enforce a flow. Still we need to determine the internal vapor flow.

Constant molar vapor stream

Start with a steady-state assumption for the molar holdup and energy balance

    0 = F_in^liq - F_out^liq + F_in^vap - F_out^vap ,    (4.44)
    0 = F_in^liq h_in^liq - F_out^liq h^liq + F_in^vap h_in^vap - F_out^vap h^vap .    (4.45)

Elimination of F_in^liq yields

    0 = F_out^liq (h_in^liq - h^liq) - F_in^vap (h_in^liq - h_in^vap) + F_out^vap (h_in^liq - h^vap) .    (4.46)

The specific enthalpy of the liquid phase is much larger than the specific enthalpy of the vapor phase. This, combined with the small differences in specific enthalpy of neighboring trays, justifies the assumption that

    h_evap = h_in^liq - h_in^vap ≈ h_in^liq - h^vap .    (4.47)

Substitution yields

    0 = F_out^liq (h_in^liq - h^liq) - (F_in^vap - F_out^vap) h_evap .    (4.48)

Rearranging this equation yields

    (h_in^liq - h^liq)/h_evap = (F_in^vap - F_out^vap)/F_out^liq .    (4.49)

With the same argument, that the difference in specific enthalpy of two neighboring trays is small relative to the heat of evaporation, combined with vapor streams that are of the same order of magnitude as the liquid streams, the difference in vapor streams must be approximately zero. Therefore we can assume

    F_out^vap = F_in^vap .    (4.50)

Linearized hydrodynamics

In the rigorous model the liquid flow over the weir is described by

    F_out^liq = (1.84 L_w / \bar{v}_liq) ((L_liq - \beta h_w)/\beta)^{3/2} .    (4.51)
This equation can be linearized around L_liq^*:

    F_out^liq = F_liq^* + (3/2)(1.84 L_w / \bar{v}_liq^*) ((L_liq^* - \beta h_w)/\beta)^{1/2} (\Delta L_liq/\beta) = F_liq^* + \Delta L_liq/\tau_h^* ,    (4.52)

where \tau_h^* is the hydraulic time constant in that operating point and \Delta L_liq = L_liq - L_liq^*. This approach was adopted from Skogestad (1997). The value of \tau_h^* can be computed directly from the parameters and steady-state values of the rigorous model. In the simplified model we need to rewrite the equation as a function of the molar liquid holdup. This is done with M_liq \bar{v}_liq = L_liq A_p:

    F_out^liq = F_liq^* + (\bar{v}_liq/(\tau_h^* A_p)) \Delta M_liq ,    (4.53)

with \Delta M_liq = M_liq - M_liq^*. We can reduce the liquid hydrodynamics even further by assuming that the liquid holdup on a tray is constant. This is equal to the assumption that F_in^liq = F_out^liq. This assumption was only applied in the model used in Chapter 2.

Boilup rate and reboiler duty

Furthermore we need to add a new equation that relates the heat input to a molar flow, since the energy balance was eliminated. Again this starts with the quasi steady-state assumption of the mass and heat balance

    0 = F_in^liq - F_out^liq - F_out^vap ,    (4.54)
    0 = F_in^liq h_in^liq - F_out^liq h^liq - F_out^vap h^vap + Q_reb .    (4.55)

Elimination of F_in^liq yields

    0 = F_out^liq (h_in^liq - h^liq) + F_out^vap (h_in^liq - h^vap) + Q_reb .    (4.56)

And with the argument that the difference in specific enthalpy between liquid and vapor is much larger than the difference in specific liquid enthalpy of two neighboring trays, h_in^liq - h^liq ≈ 0 and h_in^liq - h^vap ≈ h_evap, the equation can be approximated by

    0 ≈ F_out^vap (h_in^liq - h^vap) + Q_reb    (4.57)
      ≈ F_out^vap h_evap + Q_reb .    (4.58)

Despite the fact that h_evap(P, T, x) is a function of pressure, temperature and composition, we will assume it to be constant, since the variations are in general quite small.

Constant relative volatility

The vapor-liquid equilibrium in the simplified model is computed by the constant relative volatility equation
    y_i = \alpha_i x_i / \sum_{j=1}^{NC} \alpha_j x_j .    (4.59)

We can observe that this equation is very simple compared to the cubic equation of state approach. In particular, the equations can be solved explicitly if we choose x as a state variable. The underlying assumption of this simplification is that the pressure does not play a role in the vapor-liquid equilibrium, which in general is certainly not true. In practice, however, a column is operated at a controlled pressure, and therefore we can neglect pressure effects on the vapor-liquid equilibrium. The constant relative volatility was introduced in order to remove the temperature effects on the vapor-liquid equilibrium, so temperature is implicitly accounted for in the constant relative volatility approach. Still we need to check whether the relative volatility is indeed constant. This can be checked by computing the relative volatility in the original model. It appears that in this case the relative volatility is not constant. The next step is to find an explicit relation that describes the observed relative volatility. This is where the engineering comes in, and this step can hardly be automated. After some trial and error we came up with the following extension of the relative volatility

    \alpha_i(x_i) = \theta_{1,i} + \theta_{2,i} x_i .    (4.60)

A nice property of this choice is that the relative volatility remains an explicit expression in x, which is computationally attractive. In practice the relative volatility of one component is set to one to normalize the other relative volatilities. This leaves two parameters for a binary mixture.

Reconstruction of temperature

Temperature is not present in the simplified model, because the energy balance was removed by a quasi steady-state assumption combined with the constant molar overflow assumption. This is not desired, since temperature measurements are often used as inferential measurements for composition. If pressure and composition are fixed, temperature is no longer a degree of freedom.
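The composition-dependent volatility of (4.59)-(4.60) stays explicit in x, which is the point of the exercise. A minimal sketch for a binary mixture, with hypothetical theta values:

```python
# Sketch of the VLE relations (4.59)-(4.60); theta1/theta2 are hypothetical
# fit parameters, with the second component normalized to alpha = 1.
def vle_constant_alpha(x, theta1, theta2):
    """Vapor composition from liquid composition via relative volatility."""
    alpha = [t1 + t2 * xi for t1, t2, xi in zip(theta1, theta2, x)]
    denom = sum(a * xi for a, xi in zip(alpha, x))
    return [a * xi / denom for a, xi in zip(alpha, x)]
```

The output mole fractions sum to one by construction, and the component with the larger volatility is enriched in the vapor.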
From the following equations the temperature can be computed from pressure, composition and the Antoine constants of the pure components: Raoult's law, Dalton's law and the Antoine equation,

    p_i = x_i p_i^0 ,    p_i = y_i P ,    \ln p_i^0 = A_i + B_i/(C_i + T) ,    (4.61)

respectively, where p_i is the partial pressure, p_i^0 is the saturation pressure of the pure component, x_i is the molar fraction in the liquid phase, y_i is the molar fraction in the gas phase, P is the pressure, T is the temperature and A_i, B_i and C_i are the Antoine constants of component i. These can be rearranged to

    y_i/x_i = p_i^0/P = e^{A_i + B_i/(C_i + T)}/P .    (4.62)

This equation can be written as an explicit equation for the temperature

    T = B_i / (\ln(y_i P / x_i) - A_i) - C_i .    (4.63)

In case the Antoine constants are not available, a parameter estimation is necessary. Since the pressure variations were already limited by assuming the pressure to be controlled in a tight interval, we can assume the pressure to be constant.

Results of optimization based on reduced model

The statistics for this optimization are presented in Table 4.1. This optimization is referred to in the graph in Figure 3.4 as the optimization O(M̂) with initial guess p0 and solution p̂1. The model statistics are listed in Table 4.2. The largest reduction is achieved in the number of algebraic variables and the number of nonzero elements.

[Figure: two panels, bottom impurity x_b and bottom flow B versus time [hr], comparing the original and reduced model.]

Figure 4.1: Responses of the key process variables of the reduced model to the original optimal trajectories.
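Returning to the temperature reconstruction, equation (4.63) can be verified by a round trip through the Antoine equation. The constants below are placeholders for illustration, not fitted values for this mixture:

```python
import math

# Sketch of (4.61)-(4.63); A, B, C are hypothetical Antoine constants
# (note B < 0 with the sign convention ln p0 = A + B/(C + T)).
def antoine_vapor_pressure(T, A, B, C):
    """Saturation pressure from ln p0 = A + B/(C + T)."""
    return math.exp(A + B / (C + T))

def reconstruct_temperature(x, y, P, A, B, C):
    """Invert Raoult/Dalton plus Antoine for T, equation (4.63)."""
    return B / (math.log(y * P / x) - A) - C
```

Starting from a known temperature, computing y from Raoult's and Dalton's laws and inverting with (4.63) recovers the temperature exactly.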
    Iter  F-count  f(x)     max constraint  Step-size  Directional derivative
    1     1        2.72085  180.7           1          -0.159
    2     3        2.82561  194.5           1           0.0597
    3     5        3.15963  27.83           1           0.118
    4     7        3.31951  2.569           1          -0.0284
    5     9        3.29696  0.1779          1          -0.116
    6     11       3.20454  0.08808         1          -0.0764
    7     13       3.14829  0.2168          1          -0.038
    8     15       3.11884  0.2434          1          -0.0296
    9     17       3.09374  0.2807          1          -0.0467
    10    19       3.06053  0.6904          1          -0.0156
    solution:      3.04862  0.2509          1          -0.0179

    optimization took: 11 min
    solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.1: Output of the optimization with the reduced model.

Compared to the optimization with the original model (see Table 3.3), the number of iterations is reduced by more than a factor of two. This does not explain the factor of seventeen in reduction of the overall optimization time. Simulations with the reduced model are less computationally demanding (by approximately a factor of seven), and this is the second explanation for the tremendous reduction. Note that the only difference between the two optimizations is the function evaluation and the gradient information, since these are based on different models.

    model     nx   ny    na   nnz
    original  75   1912  24   7237
    reduced   54   211   24   859
    ratio     1.4  9.1   1.0  8.4

Table 4.2: Properties of the original and reduced model with nx, ny and na the number of differential, algebraic and assigned (fixed) variables, respectively; nnz is the number of nonzero elements in the Jacobian.

Furthermore, note that the reduction in simulation times by a factor of seven is best associated with the reduction of the number of nonzero elements, which is approximately a factor of eight. This association is closer than the reduction ratio of differential equations or of the total number of equations. The quality of the model and the solution of this dynamic optimization are the topic of the next section.

4.3 Model quality

As presented in Chapter 3 we can assess model quality in three different manners. These three will be applied in this section. First we will present the evaluation by simulation.
Secondly, we assess model adequacy by verifying the optimality conditions of the optimization based on the reduced model at the solution of the original optimization problem. This optimization is referred to in the graph in Figure 3.4 as the optimization O(M̂) with initial guess p1 and solution p̂2. Finally, we present the reversed check of the optimality conditions, referred to as the optimization O(M) with initial guess p̂1 and solution p̃2. In that case the optimality conditions of the optimization problem with the original model are checked at the approximate solution.

[Figure: two panels, reboiler duty Q_reb and reflux rate RR versus time [hr], showing initial guess (dots) and optimum (solid).]

Figure 4.2: Optimization with the reduced model O(M̂) with as initial guess the solution from the optimization with the original model p1 (dots) and approximate solution p̂2 (solid).

    Iter  F-count  f(x)     max constraint  Step-size  Directional derivative
    1     1        3.17896  58.03           1          -0.03
    2     3        3.15842  0.09697         1          -0.0273
    3     5        3.13354  4.441e-016      1          -0.13
    4     7        3.03198  1.19            1          -0.0287
    5     9        3.0054   0.2554          1          -0.0433
    6     11       2.9684   0.3072          1          -0.00832
    solution:      2.9614   0.6529          1          -0.00449

    optimization took: 6.8 min
    solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.3: Output of the optimization with the reduced model O(M̂) with as initial guess the solution from the optimization with the original model p1, resulting in solution p̂2 as depicted in Figure 4.2.

Evaluation by simulation

A simple check of model quality is to do two simulations: first with the original model, and then a repeat of the simulation with the reduced model. Comparison of the key process variables gives a first impression of model quality. For the test of the reduced model as discussed in this chapter we used the optimal trajectories computed by the optimization based on the original model.
These trajectories were applied to the reduced model, and Figure 4.1 shows the responses of bottom impurity and bottom flow. These process variables are considered to be the key process variables. We see that the constraints on the bottom impurity are not satisfied at all times, but the constraint violation is minimal. These minor differences can be ascribed to small differences in internal flows related to the constant molar vapor flow assumption. A separate check on the relative volatility showed that the approximation was nearly perfect for the range of operation considered here. Differences in bottom flow rate are very small. In steady state these cannot differ, because that would imply a violation of the conservation laws, since the bottom flow is the only way out of the process.

Model adequacy

The second approach to assess model quality is to check the optimality conditions of the optimization based on the reduced model at the true optimum. Or, stated differently, to check whether or not the true optimum is also an optimum for the optimization based on the reduced model. Results of these tests are presented in Figure 4.2. Some minor changes can be noted, mainly due to the constraint violation that occurs with the original guess. This can be verified in Table 4.3, which confirms the constraint violation at the first iteration. Note that the responses of the key process variables to the initial guess coincide with the responses used for the comparison based on simulation, and are therefore equal to the responses depicted in Figure 4.1. Remarkable is the time still needed to converge (Table 4.3) compared to the result presented in Table 4.1 with the step-shaped initial guess. Although the initial guess is quite good, the warm-started optimization is less than two times faster than the optimization with a cold start.
Although this is a sensible approach to assess the model quality for dynamic optimization, in practice we would like to do optimizations based on the reduced model and assess the quality of those solutions. This is precisely what is discussed next.

[Figure: two panels, reboiler duty Q_reb and reflux rate RR versus time [hr], showing initial guess (dots) and optimum (solid).]

Figure 4.3: Optimization with the original model O(M) with as initial guess the approximate solution from the optimization with the reduced model p̂1 (dots) and resulting solution p̃2 (solid).

    Iter  F-count  f(x)     max constraint  Step-size  Directional derivative
    1     1        3.04862  73.07           1           0.184
    2     3        3.29795  24.13           1           0.0767
    3     5        3.39243  10.71           1          -0.0215
    4     7        3.37446  0.4812          1          -0.0655
    5     9        3.32522  0.07207         1          -0.0221
    6     11       3.30531  0.3974          1          -0.0149
    solution:      3.29149  0.02154         1          -0.00599

    optimization took: 46.8 min
    solver options: MaxIter=50 MaxFunEval=100 TolFun=1e-2 TolCon=1e-0 TolX=1e-6

Table 4.4: Output of the optimization with the original model O(M) with as initial guess the approximate solution from the optimization with the reduced model p̂1, resulting in solution p̃2 as depicted in Figure 4.3.

Solution adequacy

The capability of a reduced or approximate model to predict optimal solutions that are acceptably close to the true process optimum is crucial. This capability is tested by redoing the optimization based on the original model with the approximate solution, derived with the reduced model, as initial guess. The result of this test is depicted in Figure 4.3, where the input trajectories of the reboiler duty and reflux rate are shown. Again we observe only minor differences between initial guess and converged solution. The intermediate statistics are gathered in Table 4.4, which shows an initial constraint violation that is resolved at the cost of an increased objective value. Note that the optimal solutions represent local minima.
This is clearly visible if we compare the reflux rate in Figures 4.2 and 4.3. It may be caused by a lack of sensitivity in this input channel. The shapes of the optimal input trajectories of the boilup rate are strikingly similar, which pleads for the approach followed. From a computational point of view we already concluded that it was very attractive to use the reduced model, reducing computation time by a factor of seventeen. If we use the reduced model only for computation of an initial guess as a first step, we still reduce the computational effort by more than a factor of three. The local minimum of the original optimization problem is slightly better than that of the two-step approach. This emphasizes that we indeed found a local minimizer. A last comment is that, coincidentally, the same number of iterations is needed for convergence as in the previous test.

4.4 Discussion

In this chapter the added value of a physically reduced model became evident for this dynamic optimization problem. A significant reduction in computational time was achieved using the reduced model. Even when this solution was used as an initial guess for the optimization with the detailed model, the overall time for optimization was shorter³. In practice, one could question the added value of using the detailed model, since the quality of the reduced model is already quite high, especially considering that there will also be mismatch between the detailed model and the real plant. Furthermore, the presence of disturbances will require a feedback mechanism. Since the performance of a feedback controller is limited by the sample rate, a controller based on a detailed model with a low sample rate can be outperformed by a controller based on a simpler, less accurate model with a high sample rate.

³ In a closed loop implementation the converged solution of the previous cycle could be used to construct a good initial guess.
The approach used to derive the simplified model can be generalized to models whose physical properties are described by detailed physical property models, e.g. based on a Soave equation of state. These detailed physical property models have a large validity range that is not exploited within the dynamic optimization. This allows for simplification by using simplified models available from literature. In this thesis the simplified distillation column model from Skogestad (1997) was used. This model was extended with equations that reconstruct variables that are present in the detailed model but not available in the simplified model. Key to the success of this approach is that the simplified model has properties similar to those of the detailed model: a similar sparsity structure and nonlinearity, but fewer differential and algebraic equations. An optimization based on such a model will inevitably be computationally less demanding. It requires detailed process knowledge to distinguish between relevant phenomena and less important details and so arrive at the right tradeoff. Should this dynamic optimization run online, we saw that a warm-started optimization with the reduced model is seven times faster than the optimization with the original model. A closed loop evaluation with disturbances is not performed in this thesis, but a controller based on the reduced model and a sampling rate of ten minutes would most likely outperform a controller based on the rigorous model and a sampling rate of almost one hour. How much better the controller would be depends mainly on the character of the disturbances.

Chapter 5

Model order reduction by projection

In this chapter the possibilities of model order reduction by projection for dynamic optimization are explored. The focus in this chapter is on model quality, whereas the techniques to derive the projection have been discussed in Chapter 2 of this thesis.
Proper orthogonal decomposition and a Gramian-based reduction are applied to the physics-based reduced model as presented in Chapter 4 and assessed within a dynamic optimization problem as outlined in Chapter 3.

5.1 Introduction

In Chapter 2 model order reduction by projection has been discussed. Because of its success in reducing model order significantly while keeping the approximation error small, it was selected as a promising model reduction technique for dynamic optimization. Two different approaches to derive a suitable projection are selected. The first approach is based on a proper orthogonal decomposition of data generated by simulation of the optimal trajectory resulting from the optimization as defined in Chapter 3. The second approach is a balanced reduction based on averaged controllability Gramians and averaged observability Gramians derived along the same optimal trajectory (the one used for proper orthogonal decomposition) via linearizations and Lyapunov equations. In this chapter we assess these two projection techniques for both truncation and residualization and for all reduced model orders. As reference model we use the physics-based reduced model presented in Chapter 4, which has forty-seven differential equations. With two different projection techniques and both truncation and residualization, we have almost two hundred reduced-order models to assess in this chapter. The motivation for assessing all model orders is that no leads are available in the literature on how to find the optimal order for the projected model. One expects a gradual increase of the simulation approximation error when increasing the level of reduction. At the same time, one expects that increasing the level of reduction results in a gradual decrease of the computational load of simulation.
For the sequential implementation of dynamic optimization, a reduction of the computational load of the simulation directly results in a reduction of the time required for one iteration. Model reduction by projection is therefore expected to reduce the overall computational effort, assuming that the number of iterations of the optimization remains approximately the same. The difference between model reduction by truncation and by residualization is that in the case of residualization the steady-state approximation error is zero, whereas for truncation this is in general not the case. Note that model reduction by singular perturbation is also a reduction by residualization. However, model reduction by singular perturbation is based on a modal analysis, whereas proper orthogonal decomposition and Gramian-based reduction are based on an energy measure from input to state or output, respectively. For dynamic optimization a zero steady-state approximation error is a favorable property, because the tail of the solution is dominated by low-frequency dynamics and approaches a steady state. When comparing proper orthogonal decomposition with Gramian-based reduction, both methods have strong and weak points. A strong point of proper orthogonal decomposition is that the reduction is based on a trajectory, which can be an optimal trajectory based on the full-order model. The reduction is tailored to this specific trajectory and requires a conscious choice of the input trajectories used for model reduction. Recall, however, that the result also depends on the scaling of the different state variables. Gramian-based reduction does not depend on the scaling of states and truly takes the input-to-output behavior into account. On the other hand, the projection matrix required for a Gramian-based projection can be close to singular, whereas the projection matrix for proper orthogonal decomposition is orthogonal. It is hard to predict which method should be preferred.
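The proper orthogonal decomposition step described above can be sketched in a few lines: the projection basis consists of the leading left singular vectors of a snapshot matrix collected along a trajectory. This is a generic illustration with hypothetical function names and dimensions, not the thesis implementation; Galerkin truncation of a linearized state matrix is shown alongside.

```python
import numpy as np

# POD sketch: columns of `snapshots` are state vectors along a trajectory.
def pod_basis(snapshots, r):
    """Rank-r orthogonal POD basis (n_x by r) from the leading left
    singular vectors of the snapshot matrix."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r]

def galerkin_truncate(A, V):
    """Truncated state matrix A_r = V^T A V for a linear(ized) model."""
    return V.T @ A @ V
```

Residualization would instead retain the discarded (fast) directions as algebraic constraints rather than dropping them, which restores a zero steady-state error.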
It is not possible to give an analytical answer to the question which model projection approach provides the best results in terms of quality and reduction of computational load. Therefore, in this chapter these different projection methods are assessed experimentally within an optimization environment. Based on the results we can answer the question whether it is possible to successfully apply different projected models within dynamic optimization. This will be done based on the performance indicators that are outlined next.

Performance indicators

The first assessment of a reduced-order model is an evaluation based on a simulation, generating trajectories of key process variables. This is similar to what we did in Figure 4.1 for the assessment of the physics-based reduced model. The optimal inputs of the dynamic optimization based on the full-order model are used for this simulation. The two optimal inputs are the boilup rate and the reflux rate, respectively; both are depicted in Figure 4.3. A first quick evaluation is done by visual inspection of the key process variables and the norm of the error signal on these variables. The assumption that motivates this assessment is that a reduced model that correctly predicts the key process variables is suitable for optimization. However, gradient information is not evaluated in this assessment, while it is essential for optimization. Gradient information is used within the optimization to find an optimal solution. In case the gradient information is not correct, it is unlikely that the optimization converges to the correct optimum. Therefore we assess the reduced-order models, like we assessed the physics-based reduced model in Chapter 4, within an optimization setting.

    initial guess:  O(M)  O(M̂)
    p0              p1    p̂1
    p1              -     p̂2
    p̂1             p̃2   -

Table 5.1: Model adequacy and solution adequacy tested by optimizations based on the original model O(M) and the reduced-order model O(M̂), mapping different initial guesses to optimal solutions.
The result of the optimization based on the original model and initial guess p0 is mapped to p1. Then p1 is used as an initial guess for the optimization based on the reduced-order model and mapped to p̂2. In a similar way the initial guess p0 is mapped to p̂1 by an optimization based on the reduced-order model. Finally, p̂1 is used as an initial guess for the optimization based on the original model, resulting in p2. We will explain the assessment of the reduced-order models using Table 5.1, which is a different way of presenting Figure 3.4. The dynamic optimization O, as described in Chapter 3, with the physics-based reduced model M maps the initial guess p0 into an optimal solution p1. If the solution p1 coincides with the optimal solution of the same optimization based on the reduced-order model, the reduced-order model is adequate. To check this we execute the same dynamic optimization, but now based on the reduced-order model M̂ and with the optimal solution p1 as initial guess. Since we have almost two hundred reduced-order models, this results in as many optimizations to check model adequacy. Conversely, we can execute a dynamic optimization based on the reduced-order model with p0 as initial guess. This results in an optimal solution p̂1 for each reduced-order model. The quality of this solution can be checked in a similar way as we checked the quality of the reduced-order models. We repeat the dynamic optimization based on the full-order model M and use p̂1 as the initial guess. If the solution based on the reduced-order model coincides with the optimal solution for the full-order model, the solution is adequate. Again this results in almost two hundred optimizations. Before executing these optimizations, a first quick evaluation of the optimal solutions p̂1 is done by visual inspection, comparing these solutions to the optimal solution p1 based on the full-order model.
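The four optimization runs of Table 5.1 can be sketched as a small workflow. The quadratic objectives below are hypothetical stand-ins for the dynamic optimizations O(M, ·) and O(M̂, ·); the deliberate bias in the reduced objective mimics a small model approximation error.

```python
import numpy as np
from scipy.optimize import minimize

def J_full(p):   # stand-in for the objective under the full-order model M
    return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2

def J_red(p):    # stand-in for the objective under a reduced-order model
    return (p[0] - 1.001) ** 2 + (p[1] - 1.999) ** 2

def O(J, p_init):
    """One optimization run: map an initial guess to an optimal solution."""
    res = minimize(J, p_init, method="BFGS")
    return res.x, res.nit

p0 = np.array([0.0, 0.0])
p1, _ = O(J_full, p0)           # O(M, p0)    -> p1
p2_hat, n_model = O(J_red, p1)  # O(M-hat, p1) -> p2-hat : model adequacy test
p1_hat, _ = O(J_red, p0)        # O(M-hat, p0) -> p1-hat
p2, n_sol = O(J_full, p1_hat)   # O(M, p1-hat) -> p2     : solution adequacy test

# Adequate if the restarted optimizations converge almost immediately and
# the two optima nearly coincide.
adequate = n_model <= 5 and np.linalg.norm(p1 - p1_hat) < 0.01
```

Restarting an optimizer from the other model's optimum and counting iterations is exactly the adequacy check described above, here on a toy problem.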
Besides model quality we are interested in the computational load of the dynamic optimization based on the reduced-order models. The computational load of the dynamic optimization and the number of iterations needed to reach convergence are defined as performance indicators. Note that for the evaluation of the reduced models we use the same initial guess p0 for the optimization. First we will discuss the properties of the nonlinear models and the implementation of the projection of a set of differential and algebraic equations. Then we will present the results of all optimizations discussed in this section to assess reduced-order model quality.

5.2 Projection of nonlinear models

Model properties

The class of models considered in this thesis can be described by a set of differential and algebraic equations (dae) of at most index one (Brenan et al., 1996),

    0 = F(ẋ, x, y, u) ,    (5.1)

where x ∈ R^{nx} are defined as state variables, y ∈ R^{ny} as algebraic variables, u ∈ R^{nu} as input variables and t ∈ R as the time variable. Here nx is referred to as the order of the model. We confine ourselves to models defined by a set of explicit differential algebraic equations

    ẋ = f(x, y, u)
    0 = g(x, y, u) ,    (5.2)

where f ∈ R^{nx} and g ∈ R^{ny}. This can be written in the general format as

    0 = F(ẋ, x, y, u) = [ −ẋ + f(x, y, u) ; g(x, y, u) ] .    (5.3)

This shows precisely the structure that is imposed on the models considered in this chapter. A special case occurs when all algebraic equations can be made explicit in terms of an analytical expression of state and input variables,

    0 = F(ẋ, x, y, u) = [ −ẋ + f(x, y, u) ; −y + g̃(x, u) ] .    (5.4)

The algebraic equations can then be eliminated from the differential equations by substitution,

    ẋ = f(x, g̃(x, u), u) = f̃(x, u)
    y = g̃(x, u) .    (5.5)

This is defined as a set of ordinary differential equations (ode) because the differential equations only depend on state variables and input variables.
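The substitution step of Equation (5.5) can be sketched with an assumed scalar toy model (y = g̃(x, u) = 2x is illustrative, not the thesis model): once the algebraic variable is explicit, a standard ODE integrator suffices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def g_tilde(x, u):
    return 2.0 * x                 # assumed explicit algebraic part y = g~(x, u)

def f(x, y, u):
    return -y + u                  # differential part x' = f(x, y, u)

def f_tilde(t, x, u=1.0):
    # substitution (5.5): x' = f(x, g~(x, u), u) = f~(x, u)
    return f(x, g_tilde(x, u), u)

sol = solve_ivp(f_tilde, (0.0, 10.0), [0.0], rtol=1e-8, atol=1e-10)
x_end = sol.y[0, -1]               # steady state of x' = -2x + u is x = u/2
y_end = g_tilde(x_end, 1.0)        # algebraic variable recovered afterwards
```

The algebraic variable no longer appears in the integrated equations; it is recovered afterwards from the state, which is exactly what makes (5.5) an ode.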
The physics-based reduced model used in this chapter can be described as in Equation (5.4).

Projection of the dynamics of nonlinear models

For the type of differential and algebraic equations described in Equation (5.2) we will present how to project the dynamics. How to compute a suitable projection was discussed in Chapter 2.

1. Transform the original state under similarity into a new coordinate system more suitable for model reduction,

    z = T (x − x∗) ,    (5.6)

with the transformation back to the original coordinate system

    x = T⁻¹ z + x∗ ,    (5.7)

where T is square and nonsingular and x∗ is a steady-state vector in R^{nx} defined by f(x∗, y∗, u∗) = 0 and g(x∗, y∗, u∗) = 0. The dynamics in the new coordinate system can be written as

    ż = T f(T⁻¹ z + x∗, y, u)
    0 = g(T⁻¹ z + x∗, y, u) .    (5.8)

2. Decompose the transformed space into two subspaces with state vectors z1 ∈ R^{nr} and z2 ∈ R^{nx−nr}, respectively. Hence,

    [z1 ; z2] = [T1 ; T2] (x − x∗) ,    (5.9)

with the transformation back to the original coordinate system

    x = [S1 S2] [z1 ; z2] + x∗ ,    (5.10)

where [S1 S2] denotes the consistently partitioned inverse transformation, so that

    [T1 ; T2] [S1 S2] = I .    (5.11)

The dynamics in the new coordinate system can be written as

    ż1 = T1 f(S1 z1 + S2 z2 + x∗, y, u)
    ż2 = T2 f(S1 z1 + S2 z2 + x∗, y, u)
    0 = g(S1 z1 + S2 z2 + x∗, y, u) .    (5.12)

3. Finally, as in Equation (2.68), we can reduce the number of differential equations by residualization, i.e. ż2 = 0,

    ż1 = T1 f(S1 z1 + S2 z2 + x∗, y, u)
    0 = T2 f(S1 z1 + S2 z2 + x∗, y, u)
    0 = g(S1 z1 + S2 z2 + x∗, y, u) ,    (5.13)

or, as in Equation (2.70), by truncation, i.e. z2 = 0,

    ż1 = T1 f(S1 z1 + x∗, y, u)
    0 = g(S1 z1 + x∗, y, u) .    (5.14)

Note that model-order reduction by residualization, like model-order reduction by singular perturbation, does not introduce a steady-state error. From an implementation point of view, in a simulation environment the transformation is simply added to the set of equations.
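The three steps above can be illustrated on a linear model ẋ = Ax + Bu, y = Cx, where everything is explicit. The matrices and the orthogonal transformation below are randomly generated stand-ins; the point is that residualization reproduces the steady-state gain exactly, while truncation in general does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nr = 6, 2
A = -np.eye(n) + 0.2 * rng.standard_normal((n, n))  # stable toy dynamics
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
T = np.linalg.qr(rng.standard_normal((n, n)))[0]    # stand-in transformation
S = np.linalg.inv(T)                                # S = [S1 S2] = inverse of T

At, Bt, Ct = T @ A @ S, T @ B, C @ S                # dynamics in new coordinates
A11, A12 = At[:nr, :nr], At[:nr, nr:]
A21, A22 = At[nr:, :nr], At[nr:, nr:]
B1, B2 = Bt[:nr], Bt[nr:]
C1, C2 = Ct[:, :nr], Ct[:, nr:]

# Truncation: set z2 = 0 and keep (A11, B1, C1).
gain_trunc = C1 @ np.linalg.solve(-A11, B1)

# Residualization: set z2' = 0, solve z2 = -A22^{-1} (A21 z1 + B2 u).
Ar = A11 - A12 @ np.linalg.solve(A22, A21)
Br = B1 - A12 @ np.linalg.solve(A22, B2)
Cr = C1 - C2 @ np.linalg.solve(A22, A21)
Dr = -C2 @ np.linalg.solve(A22, B2)
gain_resid = Cr @ np.linalg.solve(-Ar, Br) + Dr

gain_full = C @ np.linalg.solve(-A, B)              # exact steady-state gain
```

The residualized gain matches the full-order gain to machine precision, whereas the truncated gain generically differs, matching the zero steady-state error property noted above.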
Adding the transformation increases the total number of equations, as we will see in the next section, but prevents manual elimination of x,

    ż = T f(x, y, u)
    0 = g(x, y, u)
    z = T (x − x∗) ,    (5.15)

where x is no longer a state variable but an explicit algebraic variable.

5.3 Results of model reduction by projection

The dynamic optimization used to assess the model quality was outlined in Chapter 3. Before we present the results, some model statistics are given in Table 5.2. This table shows the model statistics of the physics-based reduced full-order model and of the residualized and truncated models, depending on the order of reduction. Note that the addition of the transformation has a significant effect on the number of nonzero elements due to the (non-sparse) state transformation. Also note that the number of algebraic equations is much larger than the number of differential equations.

    model:    full-order    residualized    truncated
    nx        54            54 − nr         54 − nr
    ny        211           428 + nr        428 + nr
    na        24            24              24
    nnz       859           5574 − nr       5574 − nr · 47

Table 5.2: Properties of the original and reduced models, with nx, ny and na the number of differential, algebraic and assigned variables, respectively; nnz is the number of nonzero elements in the Jacobian and nr the number of projected state variables.

Although one may expect monotonicity in both error and computational load for an increasing degree of reduction, experience proves otherwise. Therefore it was decided to do a full-scale exploration of all reduced-order models, both with residualization and with truncation. With two different projections, two reduction methods and an original model with 47 differential equations¹ this adds up to almost two hundred candidate reduced models. A first evaluation is done by simulation: solution p1 (see Table 5.1) is applied as the input trajectory for reboiler duty and reflux rate for all projected models. The output trajectories of the key process variables are compared to the trajectories produced by simulation with the original model.
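The nnz growth reported in Table 5.2 can be reproduced in miniature: applying a dense transformation to a sparse Jacobian fills it in. The tridiagonal matrix and the transformation below are toy stand-ins for the column model.

```python
import numpy as np
import scipy.sparse as sp

n = 50
# Banded toy "Jacobian", as typical for a staged column model.
A = sp.diags([np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1)],
             offsets=[-1, 0, 1]).tocsr()

rng = np.random.default_rng(2)
T = np.linalg.qr(rng.standard_normal((n, n)))[0]       # dense (non-sparse) T

At = T @ A.toarray() @ T.T            # similarity with T^{-1} = T^T (orthogonal T)
nnz_before = A.nnz                    # 3n - 2 nonzeros in the banded Jacobian
nnz_after = int(np.count_nonzero(np.abs(At) > 1e-12))  # essentially dense
```

A solver that exploits sparsity loses its advantage on the transformed system, which is the effect discussed for Table 5.2.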
In total we will assess 184 different reduced models: two different transformations (proper orthogonal decomposition and Gramian-based) times two different projection methods (residualization and truncation) times forty-six different orders. Secondly, we check whether the original optimal solution p1 satisfies the optimality conditions for all reduced models. We therefore redo the optimization with the reduced models with the initial guess equal to the optimal solution p1, O(M̂, p1) → p̂2, and check the maximum constraint violation and count the number of iterations before reaching convergence. Recall that p1 was already available from O(M, p0) → p1 in Chapter 4.

¹ In the model 54 differential equations are present, because the three tanks and one redundant pressure controller were counted that are not part of the projection.

Figure 5.1: Top: response of the bottom impurity to the optimal trajectory p1 for all different projections. Bottom: norm of the error between original and approximated response.

Figure 5.2: Top: response of the bottom flow to the optimal trajectory p1 for all different projections. Bottom: norm of the error between original and approximated response.

Thirdly, we compute p̂1 by solving the optimization problem again for all projected models: O(M̂, p0) → p̂1. Then we will do a visual inspection of the solutions and compute the error norm ‖p̂1 − p1‖. Finally, we will present the maximum constraint violation of the approximate solution applied to the original model and the number of iterations required to reach convergence of the original optimization problem with the approximate solution as initial guess.
This implies that for all converged solutions p̂1 we execute an optimization O(M, p̂1) → p2, check the maximum constraint violation and count the number of iterations before reaching convergence.

Evaluation by simulation

The most common method to differentiate between models is to compare their ability to predict key process variables. The key process variables in this case are the product quality and throughput, which are the bottom impurity and bottom flow of the distillation column. These variables are predicted by simulation with all different projections. The input trajectories used for the simulation are the optimal input trajectories resulting from the optimization with the full-order model as presented in Chapter 4. The results of these simulations are presented in Figures 5.1 and 5.2, where the order of the projected models is set on the horizontal axis in the bottom of both figures. On the vertical axis in the bottom of both figures the logarithm of the squared integral error is shown. In the top of both figures the time responses are shown of all simulations with an error smaller than the level represented by the dashed line in the bottom of both figures. So each cross, diamond, square and plus in the bottom figure under the dashed line represents a simulation of which the resulting trajectory is plotted in the top of the figure. The truncated models are indicated with squares and diamonds, of which the square corresponds to the Gramian-based projection and the diamond to the proper orthogonal decomposition based projection. The plus and cross represent the residualized models, of which the plus corresponds to the Gramian-based projection and the cross to the proper orthogonal decomposition based projection. During the execution of the simulations with the reduced-order models it appeared that some simulations were not successful. These unsuccessful simulations terminated due to convergence problems of the simulation algorithm.
This can be explained by the approximative character of the reduced model. Due to this approximation some variables hit their upper or lower bound where those limits were not hit in the full-order model. These upper and lower limits are helpful during the process of building the model but are not strictly necessary during simulation. Therefore all limits were removed from the reduced-order models, which resulted in a larger number of successful simulations. Still, at some point the solver was not able to solve all reduced models.

Figure 5.3: Computational load for simulation of the optimal input trajectory of the different projected models, measured in cpu seconds.

Figure 5.4: Optimizations with projected models and initial condition p0, O(M̂, p0) → p̂1. Top: cpu time required for the optimization. Bottom: number of iterations required by the optimization with different projections.

The bottom impurity trajectories of all successful simulations that had an error below the dashed line are depicted in the top of Figure 5.1. Simulation results that are not plotted because the integral error was too large are considered to be bad. In a similar way the results of the bottom flow are shown in Figure 5.2. A first observation from Figures 5.1 and 5.2 is that the error does not monotonically decrease with the model order. This can be explained by the fact that the model reduction was based on energy in signals and did not explicitly include feasibility of the simulation.
From this observation we can conclude that knowledge of the error of two neighboring reduced-order models (one with a higher and the other with a lower order) does not give a guaranteed prediction of the error of the intermediate reduced-order model. Furthermore, we see that in general the residualized models have a higher accuracy than the truncated models, which is in line with expectation. For the lower-order models we see that the Gramian-based residualized models outperform the other reduced models in predicting the bottom impurity. For the prediction of the bottom flow the distinction is less pronounced, which may be explained by the choice of output variables for the Gramian-based reduction, of which the bottom flow was not a part. The plots with the simulation results are intended to reflect the model's capability to represent the input-output behavior of the original model. As discussed in Chapter 1, the motivation for the model reduction was to reduce computational effort. Therefore we simply plotted the time needed for the simulation of each reduced model, with the same convention for the square, diamond, plus and cross as in Figures 5.1 and 5.2. This resulted in Figure 5.3, where the dashed line represents the simulation time of the physics-based reduced model without transformation equations. From the models that are only transformed but not reduced we can see an impact on cpu time of approximately a factor three compared to the original model. The cpu time decreases almost monotonically with the model order in the case of the proper orthogonal decomposition based truncated models. The same holds for the proper orthogonal decomposition based residualized models, but with more exceptions. Unexpected is the behavior of the Gramian-based reduced models: for some reason the cpu time peaks between model orders of twenty and forty. Overall, it can be concluded that projection did not result in a reduction of simulation cpu times compared to the original model.
In joint work with Schlegel (Schlegel et al., 2002) the same effect of model reduction by proper orthogonal projection on cpu time was observed. In that work a similar model (177 equations, of which 47 differential) but a different dynamic optimization was defined. Furthermore, in the sequential dynamic optimization sensitivity equations were used to obtain gradient information instead of the linear time-varying gradient approximation. For this optimization problem the computational time of the projected model was almost five times longer than for the nominal model.

Figure 5.5: Top: optimal trajectories O(M̂, p0) → p̂1 for the boilup duty. Bottom: error ‖p̂1 − p1‖ for different projections.

Figure 5.6: Top: optimal trajectories O(M̂, p0) → p̂1 for the reflux ratio. Bottom: error ‖p̂1 − p1‖ for different projections.

As in Figure 5.3, the computational time slightly decreased with decreasing model order. Furthermore, the quality of the residualized models was higher than that of the truncated reduced-order models. In the paper not all reduced orders were tested, but only a selection of nine. We need to stress that the choice of numerical solver can have a significant impact on the cpu time. We found that the dasolv routine had a good performance on the original model. This same routine was used for simulation of the reduced models, with an absolute accuracy of 10⁻⁶ and a relative accuracy of 10⁻⁴. In case the model has a sparse structure, it is important to know whether this structure is exploited by the solver.
If a solver does not exploit the sparsity structure of the model, the impact of a non-sparse projection is less significant than we saw in our example. An important observation is that if we assess a model reduction technique by simulation, we actually assess the combination of a specific reduced model and a solver.

Evaluation of model quality by optimization: O(M̂, p0) → p̂1

Next we will evaluate the reduced models and the optimal solutions derived from them, as already discussed in the previous section. We start by generating optimal solutions with all reduced models, with the initial guess equal to the one used in the optimization with the original model. The result is presented in Figures 5.5 and 5.6. These figures allow a visual inspection of the optimal trajectories computed for reboiler duty and reflux rate. In the top of both figures, the trajectories are shown for the cases in which the error between approximate and original solution is smaller than some error bound. The shapes of the approximate optimal solutions are quite similar. The error ‖p̂1 − p1‖ is plotted in the bottom of both figures against model order for all projection methods. In the high-order range the proper orthogonal decomposition based truncated models perform worse than the other reduced models. In the mid range of reduced models the absence of Gramian-based reduced models attracts attention. This can be related to the long cpu times presented in Figure 5.3, indicating a higher chance of termination of the simulation and subsequently termination of the optimization. For the low-order reduced models it is clear that the solutions based on the residualized models are more accurate, with the Gramian-based reduced models producing slightly better solutions than the proper orthogonal decomposition based reduced models.

Evaluation of solution adequacy by optimality check: O(M, p̂1) → p2

In the top of Figure 5.4 the computational load of each optimization based on the reduced-order models is presented.
The bottom of Figure 5.4 shows the number of iterations required for convergence, also on a logarithmic scale and for all reduced-order models. Missing data indicates a failure of the optimization due to an unsuccessful simulation in one of the iterations. Reasons for an unsuccessful simulation were already discussed in this section.

Figure 5.7: Solution adequacy test O(M, p̂1) → p2. Top: maximum constraint violation at the first iteration. Bottom: number of iterations for convergence.

Figure 5.8: Model adequacy test O(M̂, p1) → p̂2. Top: maximum constraint violation at the first iteration. Bottom: number of iterations for convergence.

Recall that the original optimization converged in ten iterations. For the ten highest-order reduced models the projection has no effect on the number of iterations, except for the proper orthogonal decomposition based truncated models, for which the number of iterations increased by a factor of ten. The other results are scattered, from which we conclude that the effect of projection on the computational load of this dynamic optimization cannot be predicted. The optimal solutions that were computed by optimization with reduced models were already assessed by visual inspection. An alternative assessment is to check whether the approximate solution satisfies the optimality conditions of the optimization with the original model. This can be checked by an optimization with the original model and the approximate solution as initial guess.
Two performance indicators are presented in Figure 5.7. In the top of this figure the maximum constraint violation at the first iteration is shown on a logarithmic scale for all approximate solutions, against the model order that was used to derive the approximate solution. The second indicator is presented in the bottom of Figure 5.7 and shows the number of iterations for convergence. A number of solutions need only one iteration, which implies that the approximate solution satisfies the optimality conditions. All solutions but one were better than the original initial guess p0, since the number of required iterations is less than ten. Solutions generated by the proper orthogonal decomposition based reduced models appear to perform worse in terms of maximum constraint violation and number of iterations required for convergence. The rest of the results are again scattered, from which we conclude that the effect of projection on the quality of the optimal solution cannot be predicted. Note that the missing data is explained by non-convergence of the optimization based on the projected models; only the converged solutions could be assessed.

Evaluation of model adequacy by optimality check: O(M̂, p1) → p̂2

The last check presented here is a model adequacy test. In this test all reduced models are used within an optimization with the original solution p1 as initial guess. Again we assess the models by two performance indicators, presented in Figure 5.8. The first indicator is shown in the top of the figure and shows the maximum constraint violation at the first iteration. The second indicator is presented in the bottom of the same figure and shows the number of iterations required to reach a converged solution. Although the results are again quite scattered, some conclusions can be drawn. First we observe that the proper orthogonal decomposition based residualized models outperform almost all other reduced models.
This can be explained by the fact that proper orthogonal decomposition based models are tailored to this trajectory, so to speak. This is in line with the assessment based on the simulation performance of the key process variables. Remarkable are the sudden jumps of, e.g., the Gramian-based residualized models: many of these reduced models require only one iteration to converge whereas, seemingly at random, some orders require between 2 and 50 iterations.

Connection between the different figures

Let us trace a specific reduced-order model throughout all figures and pick the proper orthogonal truncated model of order twenty. In the bottom of Figures 5.1 and 5.2 we see that it is the only reduced-order model of order twenty that successfully performed the simulation with the optimal input trajectory represented by p1. This optimal solution was the result of the optimization with initial guess p0 and the full-order model, or equivalently O(M, p0) → p1. For the other three reduced-order models of order twenty, the simulation of this optimal trajectory failed. In Figure 5.3 we can find the corresponding time required for simulation of this optimal input trajectory. And because the others failed, no simulation times are present in that figure for the other three reduced models of order twenty. The simulation needs to be successful in each iteration for the optimization to proceed. The simulation results in Figures 5.1 and 5.2 correspond to the first simulation executed within the optimization O(M̂, p1) → p̂2. Therefore in Figure 5.8, for model order twenty, only the truncated proper orthogonal reduced-order model is present. The other reduced models of order twenty were not successful in simulating the optimal trajectory p1 and therefore could not provide a maximum constraint violation, so the optimization could not continue to find a converged solution p̂2.
When we start the optimization with initial guess p0 and use the reduced-order models, O(M̂, p0) → p̂1, not all optimizations result in a converged solution. We can see in Figure 5.4 that for the reduced-order models of order twenty only the optimization with the truncated proper orthogonal reduced-order model converged. This can also be observed in Figures 5.5 and 5.6, where the resulting optimal trajectories p̂1 are presented. Only for this solution can we perform the solution adequacy test, O(M, p̂1) → p2, as presented in Figure 5.7. All four types of optimization as presented in Table 5.1 are used to assess the reduced-order models. Next we will discuss and interpret the results.

5.4 Discussion

In this chapter an extensive assessment has been presented of almost two hundred candidate reduced models obtained by different projections, involving more than five hundred optimizations. After studying the results of all optimizations and simulations executed in this chapter, we come to the following observations:

1. It has been shown that quite a few of the reduced models are adequate, in the sense that the optimal solution of the original model is also (close to) an optimal solution for the reduced model. This is illustrated in Figure 5.8. Many optimizations based on the reduced models required only one iteration when starting with the optimal solution of the original model as initial guess. This implies that the gradient information of the reduced-order model (approximately) coincides with the gradient information of the full-order model.

2. We showed that it is possible to use projected models for dynamic optimization. This can be explained by the fact that the simulation algorithm uses gradient information, which is the same gradient information used within the optimization algorithm. Therefore a small simulation error implies a high-quality gradient approximation, which explains the good optimization results.
See Appendix D for details on the effect of projection on gradient information. A first robustness test was to execute the optimization with a different initial trajectory than the one used for deriving the projections. In this way we can investigate the sensitivity to the trajectory used for model reduction by projection. No extensive testing was done on the performance of the reduced-order models for different optimization objectives; one can think of other quality specifications on the end product. It is nontrivial how to define a set of trajectories that provides a suitable projection for a class of different optimization objectives.

3. None of the simulations in Figure 5.3 with projected models had a lower computational load than the full-order model. This result depends on the combination of model properties and the specific numerical integration routine. Projection transforms a sparse, structured model into a dense model. In case the model is sparse, and this sparsity is exploited by the solver, simulation of the reduced-order model is less efficient. Starting with the full-order dense model we see that for proper orthogonal decomposition the efficiency of the simulation increases slightly with decreasing model order. For the Gramian-based reduced-order models we see that the reduced-order models in the mid range even become less efficient. For the lower-order models the efficiency is slightly better than for the full-order dense model.

4. Some optimizations based on reduced-order models required fewer iterations than with the full-order model. Despite the longer simulation time per iteration, this resulted in a small reduction of the overall optimization time, as illustrated in Figure 5.4. This reduction in overall optimization time is very small and cannot counterbalance the model approximation error. With the full-order model we know that the simulation will be successful for a large set of input trajectories, even without testing them beforehand.
This is purely based on the underlying model assumptions. After model-order reduction we cannot say much about model quality for input trajectories other than the one used to derive the projection. Even worse, we cannot say anything beforehand about model quality even for the trajectory that was used to derive the projection.

5. The issue related to scaling of state variables, which affects reduction by proper orthogonal decomposition, appeared not to be a problem in this model. Apparently the model was properly scaled. In general it is worthwhile to scale your model properly, simply because this is beneficial for all numerical operations applied to the model, such as simulation and linearization. From a theoretical point of view the Gramian-based reduction is more elegant, since it does not depend on the coordinate system you start with. From a practical point of view proper orthogonal decomposition is easier to apply.

6. All performance indicators used in this chapter behave discontinuously. The result for a specific reduced order cannot be estimated by interpolation between its neighboring reduced-order models. Model reduction by projection is based on energy in signals and does not consider stability or efficiency of simulation. The model reduction has a different objective than what we want to use it for, but no better alternative is yet available.

7. Comparing proper orthogonal decomposition with Gramian-based model-order reduction we can make several observations:

(a) Models reduced by proper orthogonal decomposition have more favorable simulation properties in terms of computational load, which is illustrated in Figure 5.3. This holds for both the truncated and the residualized models.

(b) In this application the approximation error is in general higher for truncated models than for residualized models, which is illustrated in Figures 5.1 and 5.2.
Through the experimental setup for testing two well-known projection methods we now better understand their value for the sequential implementation of dynamic optimization based on a dae model. Projection of the dynamics of a dae with a sparse structure is not a suitable model reduction technique to reduce the computational load of dynamic optimization. Open issues are the choice of input trajectories that represent the relevant operating envelope. For this operating envelope, the reduced-order model should provide a high-quality approximation. For nonlinear models it is not possible to check this analytically; only by many different simulations can one build confidence that the quality of a model is good enough. For the simultaneous implementation of dynamic optimization we can expect similar results as we found for the sequential approach that was implemented in this thesis. For other model structures than dae the results can be quite different. One can think of an ode model combined with a fixed step-size solver: in case the projected model no longer has fast dynamics, the fixed step size can be increased, which increases the simulation efficiency.

Chapter 6

Conclusions and future research

In this thesis we posed three main research questions. The first research question is how to derive projections for nonlinear model-order reduction suitable for dynamic optimization. The second research question is how to derive an approximate model by physics-based model reduction suitable for dynamic optimization, and the third research question is how to assess different reduced models for their use within dynamic optimization. Conclusions on these research questions are presented in this chapter.

6.1 Conclusions

Model order reduction of nonlinear models by projection

• Projection is an effective method to reduce the number of differential equations of process models.
Proper orthogonal decomposition and Gramian-based projection are selected as the most promising transformation techniques, from which reduced models can be derived by truncation as well as residualization. Proper orthogonal decomposition is less involved than Gramian-based reduction but has a weaker theoretical basis. The freedom to scale the differential variables before applying the proper orthogonal decomposition has a decisive impact on the resulting transformation, but only a pragmatic approach for scaling is available.
• Empirical Gramians have been unravelled and reduced to averaging of linear controllability and observability matrices of local dynamics. Computation of these averaged Gramians is less involved than computation of empirical Gramians. Compared to proper orthogonal decomposition, the theoretical basis for Gramian-based reduction is much stronger. Gramian-based reduction really accounts for the relevant input-to-output behavior, whereas proper orthogonal decomposition only accounts for input-to-state behavior. Internal scaling of differential equations (states) does not affect the Gramian-based reduced models since reduction is based on input-to-output behavior. The effect of scaling of different input and output variables in this case is clear. For Gramian-based reduction the transformation matrix can become nearly singular, whereas for proper orthogonal decomposition the projection matrix is always orthonormal.
• The relation between proper orthogonal decomposition and balanced reduction is presented. The snapshot matrix multiplied with its transpose approximates the discrete-time controllability Gramian in case white noise signals are used to generate the snapshot data.
• Projection of nonlinear models does not have a predefined error bound like there is for linear models. It is not possible to interpolate results.
The results of two neighboring reduced-order models do not provide an estimate of the model properties of the intermediate reduced-order model. This is a serious problem that is rarely reported in literature.

Physics-based model reduction
• As opposed to mathematical projection, which is generally applicable to different process models, we studied the possibility of physics-based model reduction. This is a process-specific approach and in the case of this thesis involved a distillation process. This process is well studied, and a relative volatility model is used that simplifies the vapor-liquid equilibrium described by a cubic equation of state. The relative volatility constant is made dependent on the component mole fraction to better match the original equilibrium model that is based on a cubic equation of state. The result of this approach is a tremendous reduction of both differential and algebraic equations and of the number of nonzero elements in the Jacobian, without significantly affecting the input-output behavior. This reduction approach (reusing simplified models available in literature) was systematic and therefore it should be possible to carry it over to processes other than distillation processes. Its success depends on the operating envelope and on the degree of exotic behavior that must be captured.
• Model reduction of the physical property model is more generally applicable wherever many equations are involved to very precisely compute these properties over a very wide operating range. It is questionable whether this accuracy is required for online applications, and very likely that more efficient simplified physical property models will reduce computational load without losing too much accuracy.

Assessment of reduced models for dynamic optimization
Assessment of models, and thus of model reduction techniques, requires formulation of an objective. The objective for a model used within an online dynamic optimization is not straightforward. Therefore different performance indicators were introduced.
The first indicator is based on how models tend to be assessed in general, which is by visual inspection or an error norm of some key process variables for a relevant input trajectory. The second indicator is visual inspection of the optimal solution generated by an optimization based on the approximate model, compared to the optimal solution based on the original model with the same initial guess for the optimization. This optimization generates two other figures, namely the number of iterations and the CPU time required to solve the optimization, of which the latter is important for the assessment of models for online applications. The third and fourth performance indicators are optimality tests of the approximate model and the approximate solution, respectively. To this end we start the optimization with the approximate model and the original solution. The number of iterations required for convergence indicates how well the original optimum coincides with the approximate model. Finally, we can start the optimization with the original model and the approximate solution. Again the number of iterations required for convergence is an indicator that reflects how well the approximate solution coincides with the original solution. Furthermore, we can simply compute the maximum constraint violation of the approximate solution, which also assesses the quality of the approximate solution. These performance indicators enable model assessment for dynamic optimization.
• The performance indicators as defined in this thesis assess the combination of model reduction and a specific optimization technique. This notion cannot be stressed enough and is underexposed in literature.
• With the performance indicators as defined in this thesis the physics-based reduced model has been assessed and appeared to be very successful according to the different performance indicators. The optimization is over a factor of seventeen faster without losing too much accuracy in simulation and optimization results.
If this approximate solution is used as the initial guess for the original model, this still reduces the overall optimization time by a factor of more than three.
• Model order reduction by projection is assessed by applying it to the physics-based reduced model. This is partially successful according to the performance indicators. Even with a strongly reduced number of transformed differential equations it is possible to produce acceptable approximate solutions. For many reduced models the original optimal solution was close to the optimum of the optimization based on the reduced model. However, model reduction by projection does not reduce the computational time of the optimization as implemented in this thesis, i.e. a sequential dynamic optimization approach with linear time-varying gradient approximation.
• Model order reduction by projection of nonlinear models described by a DAE is not suited for simulation. Projection does not provide predictable results in terms of simulation error and stability. It is too unreliable for online applications and does not reduce the computational load of simulation. Consequently it will not reduce the computational load of dynamic optimization that utilizes a simulation. This is at least the case for the sequential implementation of a dynamic optimization problem as presented in this thesis.

6.2 Future research

During the course of this thesis some interesting directions were identified but not explored. These could give direction to future research in the area of model reduction for dynamic real-time optimization.
• The performance indicators defined in this thesis should be extended with a closed-loop performance indicator. In this setup the receding-horizon sampling rate is determined by the computational load. The trade-off between model accuracy and overall computational speed can then be taken into account, resulting in a real-time closed-loop performance indicator.
• In this thesis we restricted ourselves to the sequential dynamic optimization approach. Since the concept of simultaneous dynamic optimization is different, it would be interesting to study the effect of projection in that framework.
• The sequential approach may be improved by reuse of solutions. In the current implementation the numerical solver has to redo all step-size and predictor-corrector type calculations every iteration. The solution of the previous simulation may contain useful information that could speed up simulation, especially close to convergence where input perturbations become small. In a receding-horizon implementation one could even think of transferring this information from one complete simulation to the next.
• One of the observations is that the correlation between the computational load of simulations and the number of nonzero elements in the Jacobian is much stronger than with, e.g., the number of differential equations. It would be interesting to eliminate algebraic variables by automatic substitution or symbolic manipulation. The crucial step is an algorithm that identifies the implicit algebraic variables. This could lead to a notion of a nonlinear minimal realization.
• A fundamental problem when working with nonlinear models is that no guaranteed error bounds can be derived. Such bounds would enable a more rigorous way of nonlinear model assessment. The best one can do is to pick a set of input trajectories representing the process envelope and test for these conditions. This will always be a selection and therefore not a guarantee for all input trajectories. The problem gets even more interesting when disturbance scenarios are considered.

Bibliography

[1] H. Aling, S. Banerjee, A.K. Bangia, V. Cole, J. Ebert, A. Emami-Naeini, K.F. Jensen, I.G. Kevrekidis, and S. Shvartsman. Nonlinear model reduction for simulation and control of rapid thermal processing. In Proceedings of the American Control Conference, pages 2233–2238, 1997.
[2] H.
Aling, J.L. Ebert, A. Emami-Naeini, and R.L. Kosut. Application of a nonlinear model reduction method to rapid thermal processing (RTP) reactors. Proceedings of the 13th IFAC World Congress, B:205–210, 30 June – 5 July 1996.
[3] H. Aling, R.L. Kosut, A. Emami-Naeini, and J.L. Ebert. Nonlinear model reduction with application to rapid thermal processing. In Conference on Decision and Control, pages 4305–4309, 1996.
[4] I.P. Androulakis. Kinetic mechanism reduction based on an integer programming approach. AIChE Journal, 46(2):361–371, 2000.
[5] A.C. Antoulas and D.C. Sorensen. Approximation of large-scale dynamical systems: An overview. International Journal of Applied Mathematics and Computer Science, 11(5):1093–1121, 2001.
[6] K.J. Åström and B. Wittenmark. Computer-Controlled Systems. Prentice-Hall, Upper Saddle River, 1997.
[7] T. Backx, O.H. Bosgra, and W. Marquardt. Integration of model predictive control and optimization of processes. Proceedings ADCHEM 2000, 1:249–260, 2000.
[8] J. Baker and P.D. Christofides. Finite-dimensional approximation and control of nonlinear parabolic PDE systems. International Journal of Control, 73(5):439–456, 2000.
[9] L.S. Balasubramhanya and F.J. Doyle III. Nonlinear model-based control of a batch reactive distillation column. Journal of Process Control, 10:209–218, 2000.
[10] E. Bendersky and P.D. Christofides. Optimization of transport-reaction processes using nonlinear model reduction. Chemical Engineering Science, 55:4349–4366, 2000.
[11] G. Berkooz, P. Holmes, and J.L. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25:539–575, 1993.
[12] L.T. Biegler. Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation. Computers and Chemical Engineering, 8(3):243–248, 1984.
[13] L.T. Biegler. Advances in simultaneous strategies for dynamic process optimization. Chemical Engineering Science, 57(4):575–593, 2002.
[14] L.T.
Biegler, I.E. Grossmann, and A.W. Westerberg. A note on approximation techniques used for process optimization. Computers and Chemical Engineering, 6(2):201–206, 1985.
[15] I.D.L. Bogle and J.D. Perkins. A new sparsity preserving quasi-Newton update for solving nonlinear equations. SIAM Journal on Scientific and Statistical Computing, 11(4):621–630, 1990.
[16] K.E. Brenan, S.L. Campbell, and L.R. Petzold. Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. SIAM, Philadelphia, 1996.
[17] H. Briesen and W. Marquardt. Adaptive model reduction and simulation of thermal cracking of multicomponent hydrocarbon mixtures. Computers and Chemical Engineering, 24:1287–1292, 2000.
[18] H.J.L. Van Can, H.A.B. Te Braake, S. Dubbelman, C. Hellinga, K.Ch.A.M. Luyben, and J.J. Heijnen. Understanding and applying the extrapolation properties of serial gray-box models. AIChE Journal, 44(5):1071–1089, 1998.
[19] E.H. Chimowitz and C.S. Lee. Local thermodynamic models for high pressure process calculations. Computers and Chemical Engineering, 9(2):195–200, 1985.
[20] M. Diehl, H.G. Bock, J.P. Schlöder, R. Findeisen, Z. Nagy, and F. Allgöwer. Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations. Journal of Process Control, 12:577–585, 2002.
[21] J.R. Dormand. Numerical Methods for Differential Equations. CRC Press, Boca Raton, USA, 1996.
[22] P. Duchêne and P. Rouchon. Kinetic scheme reduction via geometric singular perturbation techniques. Chemical Engineering Science, 51(20):4661–4672, 1996.
[23] T.F. Edgar and D.M. Himmelblau. Optimization of Chemical Processes. McGraw-Hill Book Co., New York, 1989.
[24] K. Edwards, T.F. Edgar, and V.I. Manousiouthakis. Reaction mechanism simplification using mixed-integer nonlinear programming. Computers and Chemical Engineering, 24:67–79, 2000.
[25] W. Favoreel, B. De Moor, and P. Van Overschee.
Subspace state space system identification for industrial processes. Journal of Process Control, 10:149–155, 2000.
[26] R. Findeisen, M. Diehl, I. Disli-Uslu, S. Schwarzkopf, F. Allgöwer, H.G. Bock, J.P. Schlöder, and E.D. Gilles. Computation and performance assessment of nonlinear model predictive control. IEEE Conference on Decision and Control, December 2002.
[27] C.A. Floudas. Nonlinear and Mixed-Integer Optimization: Fundamentals and Applications. Oxford University Press, Inc., New York, 1995.
[28] J.F. Forbes, T.E. Marlin, and J.F. MacGregor. Model adequacy requirements for optimizing plant operations. Computers and Chemical Engineering, 18(6):497–510, 1994.
[29] N. Ganesh and L.T. Biegler. A robust technique for process flowsheet optimization using simplified model approximations. Computers and Chemical Engineering, 11(6):553–565, 1987.
[30] R. Gani, J. Perregaard, and H. Johansen. Simulation strategies for design and analysis of complex chemical processes. Trans. IChemE, 68(Part A):407–417, 1990.
[31] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic Press, London, 1981.
[32] gPROMS Technical Document. The gPROMS Model Library. Process Systems Enterprise Ltd., London, UK, 1997.
[33] S. Gugercin and A.C. Antoulas. A comparative study of 7 algorithms for model reduction. In Proceedings IEEE Conference on Decision and Control, December 2000, pages 2367–2372, 2000.
[34] J. Hahn and T.E. Edgar. Reduction of nonlinear models using balancing of empirical Gramians and Galerkin projections. In Proceedings of the American Control Conference, pages 2864–2868, 2000.
[35] J. Hahn and T.F. Edgar. An improved method for nonlinear model reduction using balancing of empirical Gramians. Computers and Chemical Engineering, 26:1379–1397, 2002.
[36] S.P. Han. A globally convergent method for nonlinear programming. Journal of Optimization Theory and Applications, 22:197, 1977.
[37] P.J. Holmes, J.L. Lumley, G. Berkooz, J.C. Mattingly, and R.W. Wittenberg.
Low-dimensional models of coherent structures in turbulence. Physics Reports, 287:337–384, 1997.
[38] A. Isidori. Nonlinear Control Systems: An Introduction. Springer Verlag, Berlin, 1989.
[39] A. Kienle. Low-order dynamic models for ideal multicomponent distillation processes using nonlinear wave propagation. Chemical Engineering Science, 55:1817–1828, 2000.
[40] P.V. Kokotovic, H.K. Khalil, and J. O'Reilly. Singular Perturbation Methods in Control: Analysis and Design. Academic Press, London, 1986.
[41] R.L. Kosut. Uncertainty model unfalsification: A system identification paradigm compatible with robust control design. In Conference on Decision and Control, pages 3492–3497, 1995.
[42] D. Kraft. On converting optimal control problems into nonlinear programming problems. Comput. and Math. Prog., 15:261–280, 1985.
[43] A. Kumar and P. Daoutidis. Nonlinear model reduction and control of high-purity distillation columns. American Control Conference, pages 2057–2061, June 1999.
[44] M.J. Kurtz and M.A. Henson. Input-output linearizing control of constrained nonlinear processes. Journal of Process Control, 7(1):3–17, 1997.
[45] S. Lall, J.E. Marsden, and S. Glavaski. A subspace approach to balanced truncation for model reduction of nonlinear control systems. International Journal of Robust and Nonlinear Control, 12:519–535, 2002.
[46] S. Lall, J.E. Marsden, and S. Glavaski. Empirical model reduction of controlled nonlinear systems. Proceedings of the IFAC World Congress, F:473–478, July 1999.
[47] T. Ledent and G. Heyen. Dynamic approximation of thermodynamic properties by means of local models. Computers and Chemical Engineering, 18(Suppl.):S87–S91, 1994.
[48] K.S. Lee, Y. Eom, J.W. Chung, J. Choi, and D. Yang. A control-relevant model reduction technique for nonlinear systems. Computers and Chemical Engineering, 24:309–315, 2000.
[49] F.L. Lewis. Optimal Estimation. John Wiley and Sons, Inc., New York, 1986.
[50] G. Li and H. Rabitz.
Combined symbolic and numerical approach to constrained nonlinear lumping - with application to an H2/O2 oxidation model. Chemical Engineering Science, 51(21):4801–4816, 1996.
[51] G.Y. Li, A.S. Tomlin, and H. Rabitz. Determination of approximate lumping schemes by singular perturbation techniques. Journal of Chemical Physics, 99(5):3562–3574, 1993.
[52] W.M. Ling and D.E. Rivera. Control relevant model reduction of Volterra series models. Journal of Process Control, 8(2):78–88, 1998.
[53] W.M. Ling and D.E. Rivera. A methodology for control-relevant nonlinear system identification using restricted complexity models. Journal of Process Control, 11:209–222, 2001.
[54] H.P. Löffler and W. Marquardt. Order reduction of non-linear differential-algebraic process models. Journal of Process Control, 1(1):32–40, 1991.
[55] W.L. Luyben, B.D. Tyréus, and M.L. Luyben. Plantwide Process Control. McGraw-Hill Book Co., New York, 1999.
[56] W. Marquardt. Traveling waves in chemical processes. Int. Chem. Engng., 30:585–606, 1990.
[57] W. Marquardt. Nonlinear model reduction for optimization based control of transient chemical processes. In Proceedings of Chemical Process Control-6, pages 30–60, 2001.
[58] B.C. Moore. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Transactions on Automatic Control, 26(1):17–32, 1981.
[59] S.G. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw-Hill Book Co., New York, 1996.
[60] C.P. Neuman and A. Sen. Suboptimal control algorithm for constrained problems using cubic splines. Automatica, 9:601–613, 1973.
[61] A. Newman and P.S. Krishnaprasad. Nonlinear model reduction for RTCVD. IEEE Proceedings 32nd Conference on Information Sciences and Systems, 1998.
[62] H. Nijmeijer and A.J. van der Schaft. Nonlinear Dynamical Control Systems. Springer Verlag, New York, 1990.
[63] S.J. Norquay, A. Palazoglu, and J.A. Romagnoli.
Application of Wiener model predictive control (WMPC) to an industrial C2-splitter. Journal of Process Control, 9:461–473, 1999.
[64] G. Obinata and B.D.O. Anderson. Model Reduction for Control System Design. Springer, London, 2001.
[65] U. Pallaske. Ein Verfahren zur Ordnungsreduktion mathematischer Prozessmodelle. Chem.-Ing.-Tech., 59(7):604–605, 1987.
[66] R.K. Pearson. Selecting nonlinear model structures for computer control. Journal of Process Control, 13:1–26, 2003.
[67] R.K. Pearson and M. Pottmann. Gray-box identification of block-oriented nonlinear models. Journal of Process Control, 10:301–315, 2000.
[68] J. Perregaard. Model simplification and reduction for simulation and optimization of chemical processes. Computers and Chemical Engineering, 17(5/6):465–483, 1993.
[69] L. Petzold and W. Zhu. Model reduction for chemical kinetics: An optimization approach. AIChE Journal, 45(4):869–886, 1999.
[70] J.B. Rawlings. Tutorial overview of model predictive control. IEEE Control Systems Magazine, 20(3):38–52, 2000.
[71] R.C. Reid, J.M. Prausnitz, and B.E. Poling. The Properties of Gases and Liquids. McGraw-Hill, New York, 1987.
[72] G.A. Robertson and I.T. Cameron. Analysis of dynamic models for structural insight and model reduction. Part 2: A multi-stage compressor shutdown case-study. Computers and Chemical Engineering, 21(5):475–488, 1996.
[73] G.A. Robertson and I.T. Cameron. Analysis of dynamic process models for structural insight and model reduction. Part 1: Structural identification measures. Computers and Chemical Engineering, 21(5):455–473, 1996.
[74] A.A. Safavi, A. Nooraii, and J.A. Romagnoli. A hybrid model formulation for a distillation column and the on-line optimization study. Journal of Process Control, 9:125–134, 1999.
[75] M.G. Safonov and R.Y. Chiang. A Schur method for balanced-truncation model reduction. IEEE Transactions on Automatic Control, 34(7):729–733, 1989.
[76] J. Scherpen. Balancing for nonlinear systems.
Systems and Control Letters, 21:143–153, 1993.
[77] M. Schlegel, J. van den Berg, W. Marquardt, and O.H. Bosgra. Projection based model reduction for dynamic optimization. AIChE Annual Meeting, Indianapolis, 2002.
[78] M. Schlegel, K. Stockmann, T. Binder, and W. Marquardt. Dynamic optimization using adaptive control vector parameterization. Computers and Chemical Engineering, 29(8):1731–1751, 2005.
[79] G. Sentoni, O. Agamennoni, A. Desages, and J. Romagnoli. Approximate models for nonlinear control. AIChE Journal, 42(8):2240–2250, 1996.
[80] L.F. Shampine. Numerical Solution of Ordinary Differential Equations. Chapman and Hall, New York, 1994.
[81] S.Y. Shvartsman and I.G. Kevrekidis. Nonlinear model reduction for control of distributed systems: A computer-assisted study. AIChE Journal, 44(7):1579–1595, 1998.
[82] L. Sirovich. Analysis of turbulent flows by means of the empirical eigenfunctions. Fluid Dynamics Research, 8:85–100, 1991.
[83] S. Skogestad. Dynamics and control of distillation columns: a tutorial introduction. In Institution of Chemical Engineers Symposium Series, pages 23–57, 1997.
[84] W.E. Stewart, K.L. Levien, and M. Morari. Simulation of fractionation by orthogonal collocation. Chemical Engineering Science, 40(3):409–421, 1985.
[85] S. Støren and T. Hertzberg. Local thermodynamic models used in sensitivity estimation of dynamic systems. Computers and Chemical Engineering, 21(Suppl.):S709–S714, 1997.
[86] S. Støren and T. Hertzberg. Obtaining sensitivity information in dynamic optimization problems solved by the sequential approach. Computers and Chemical Engineering, 23:807–819, 1999.
[87] F.Z. Tatrai, P.A. Lant, P.L. Lee, I.T. Cameron, and R.B. Newell. Control relevant model reduction: A reduced order model for 'Model IV' fluid catalytic cracking units. Journal of Process Control, 4(1):3–14, 1994.
[88] F.Z. Tatrai, P.A. Lant, P.L. Lee, I.T. Cameron, and R.B. Newell. Model reduction for regulatory control: An FCCU case study.
Chemical Engineering Research and Design / Transactions of the Institution of Chemical Engineers (Trans IChemE), 72(5):402–407, 1994.
[89] R.L. Tousain. Dynamic optimization in business-wide process control. PhD thesis, Delft University of Technology, Systems and Control Group, 2002.
[90] V. Vassiliadis. Computational Solution of Dynamic Optimization Problems with General Differential-Algebraic Constraints. PhD thesis, Imperial College of Science, London, 1993.
[91] P.M.R. Wortelboer. Frequency-weighted balanced reduction of closed-loop mechanical servo-systems: theory and tools. PhD thesis, Delft University of Technology, Systems and Control Group, 1994.
[92] K.L. Wu, C.C. Yu, W.L. Luyben, and S. Skogestad. Reactor/separator processes with recycles 2. Design for composition control. Computers & Chemical Engineering, 27:401–421, 2002.
[93] L. Zhang and J. Lam. On H2 model reduction of bilinear systems. Automatica, 38:205–216, 2002.
[94] K. Zhou, J.C. Doyle, and K. Glover. Robust and Optimal Control. Prentice-Hall, Upper Saddle River, 1995.

List of symbols

Symbol: Description
A: system matrix ∈ R^{nx×nx}
B: system matrix, input to state ∈ R^{nx×nu}
C: system matrix, state to output ∈ R^{ny×nx}
D: system matrix, input to output ∈ R^{ny×nu}
F: discrete-time system matrix ∈ R^{nx×nx}
F(.): differential algebraic equations
f(.): differential equations
G: discrete-time system matrix, input to state ∈ R^{nx×nu}
g(.): algebraic equations
H: discrete-time Hankel matrix ∈ R^{∞×∞}
h(.): inequality constraints
L: linear performance weight ∈ R^{nz×nz}
M: covariance matrix
M: plant model
M̂: approximate model of M
nu: number of input variables ∈ N
nx: number of state variables ∈ N
ny: number of output variables ∈ N
nz: number of performance variables ∈ N
np: number of parametrization coefficients ∈ N
nr: number of reduced state variables ∈ N
O(M): optimization operator based on model M
O(M, p): optimization operator based on model M and initial condition p
P: controllability Gramian ∈ R^{nx×nx}
p: parametrization coefficients of the optimization problem ∈ R^{np}
Q: observability Gramian ∈ R^{nx×nx} or quadratic weight ∈ R^{nz×nz}
T: transformation matrix ∈ R^{nx×nx}
t: time variable ∈ R
tf: final time ∈ R
ts: sample time ∈ R
U: orthogonal transformation matrix ∈ R^{nx×nx}
u: input variable ∈ R^{nu}
V: objective of the optimization
Wo: discrete-time observability Gramian ∈ R^{nx×nx}
Wc: discrete-time controllability Gramian ∈ R^{nx×nx}
X: state data matrix ∈ R^{nx×N}
x: state variable ∈ R^{nx}
Y: output data matrix ∈ R^{ny·nq×N}
y: output variable ∈ R^{ny}
z: performance variable ∈ R^{nz}
Γo: observability matrix ∈ R^{∞×nx}
Γc: controllability matrix ∈ R^{nx×∞}

Sub/superscript: Description
c: controllability
i: time instant i
k: time instant k
N: number of samples
o: observability

Abbreviation: Description
APC: Advanced Process Control
DAE: Differential Algebraic Equation
DCS: Distributed Control System
DRTO: Dynamic Real-Time Optimization
INCOOP: INtegration of COntrol and OPtimization
LTI: Linear Time Invariant
LTV: Linear Time Varying
MPC: Model Predictive Control
ODE: Ordinary Differential Equation
PDE: Partial Differential Equation
POD: Proper Orthogonal Decomposition
RTO: Real Time Optimization
Appendix A Gramians

A.1 Balancing transformations

Suppose we define the following continuous-time linear time-invariant system
\[
\begin{bmatrix} \dot{x} \\ y \end{bmatrix}
= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}, \tag{A.1}
\]
and the Lyapunov equations from which the controllability and observability Gramians can be solved,
\[
A P + P A^T + B B^T = 0, \qquad A^T Q + Q A + C^T C = 0. \tag{A.2}
\]
Let us define the eigenvalue decompositions of the controllability and observability Gramians P and Q, respectively, where U_c and U_o are chosen orthogonal,
\[
P = U_c \Sigma_c^2 U_c^T, \tag{A.3}
\]
\[
Q = U_o \Sigma_o^2 U_o^T. \tag{A.4}
\]
The product of controllability and observability Gramians can now be written as
\[
P Q = U_c \Sigma_c^2 U_c^T U_o \Sigma_o^2 U_o^T. \tag{A.5}
\]
Consider the similarity transformation
\[
T = \Sigma_c^{-1} U_c^T, \tag{A.6}
\]
and the transformed linear system
\[
\begin{bmatrix} \dot{z} \\ y \end{bmatrix}
= \begin{bmatrix} T A T^{-1} & T B \\ C T^{-1} & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix}
= \begin{bmatrix} \tilde{A} & \tilde{B} \\ \tilde{C} & D \end{bmatrix}
\begin{bmatrix} z \\ u \end{bmatrix}. \tag{A.7}
\]
The transformed Lyapunov equations are
\[
(T A T^{-1})(T P T^T) + (T P T^T)(T^{-T} A^T T^T) + (T B)(B^T T^T) = 0, \tag{A.8}
\]
\[
(T^{-T} A^T T^T)(T^{-T} Q T^{-1}) + (T^{-T} Q T^{-1})(T A T^{-1}) + (T^{-T} C^T)(C T^{-1}) = 0,
\]
or
\[
\tilde{A} \tilde{P} + \tilde{P} \tilde{A}^T + \tilde{B} \tilde{B}^T = 0, \qquad
\tilde{A}^T \tilde{Q} + \tilde{Q} \tilde{A} + \tilde{C}^T \tilde{C} = 0. \tag{A.9}
\]
With this specific transformation the controllability Gramian becomes the unity matrix,
\[
\tilde{P} = T P T^T = \Sigma_c^{-1} U_c^T \, U_c \Sigma_c^2 U_c^T \, U_c \Sigma_c^{-1} = I. \tag{A.10}
\]
The product of controllability and observability Gramians in the transformed domain can now be written as
\[
\tilde{P} \tilde{Q} = T P T^T T^{-T} Q T^{-1} = T P Q T^{-1}
= (\Sigma_c^{-1} U_c^T)(U_c \Sigma_c^2 U_c^T)(U_o \Sigma_o^2 U_o^T)(U_c \Sigma_c)
= (\Sigma_c U_c^T U_o \Sigma_o)(\Sigma_o U_o^T U_c \Sigma_c) = H^T H. \tag{A.11}
\]
The singular value decomposition of H gives
\[
H = U_H \Sigma_H V_H^T, \tag{A.12}
\]
which enables us to write the observability Gramian in the transformed domain as
\[
\tilde{Q} = I \tilde{Q} = \tilde{P} \tilde{Q} = H^T H
= V_H \Sigma_H U_H^T U_H \Sigma_H V_H^T = V_H \Sigma_H^2 V_H^T. \tag{A.13}
\]
We can balance the transformed linear system by applying a second transformation R such that
\[
R \tilde{P} R^T = R^{-T} \tilde{Q} R^{-1} = \Sigma_H. \tag{A.14}
\]
The transformation that does this is R = \Sigma_H^{1/2} V_H^T:
\[
R = \Sigma_H^{1/2} V_H^T \;\Rightarrow\;
R^{-1} = V_H \Sigma_H^{-1/2}, \quad
R^{-T} = \Sigma_H^{-1/2} V_H^T, \quad
R^T = V_H \Sigma_H^{1/2}. \tag{A.15}
\]
The composed transformation that brings the original system into a balanced realization is
\[
R T = \Sigma_H^{1/2} V_H^T \Sigma_c^{-1} U_c^T. \tag{A.16}
\]

A.2 Perturbed empirical Gramians

Suppose we define initial conditions in sets of orthogonal groups T_l with different amplitudes c_m. The matrix of perturbed initial conditions used for the discrete-time perturbed-data-based observability Gramian becomes
\[
X_0 = [\,c_1 T_1, c_1 T_2, \ldots, c_1 T_r \,|\, c_2 T_1, c_2 T_2, \ldots, c_2 T_r \,|\, \ldots \,|\, c_s T_1, c_s T_2, \ldots, c_s T_r\,], \tag{A.17}
\]
with T_l T_l^T = I, l = 1, ..., r. In this special case X_0 X_0^T can be written as
\[
X_0 X_0^T = \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 T_l T_l^T
= \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 I = \gamma I, \tag{A.18}
\]
which yields for the right-inverse X_0^{\dagger}
\[
X_0^{\dagger} = X_0^T (X_0 X_0^T)^{-1} = \gamma^{-1} X_0^T. \tag{A.19}
\]
Substitution of X_0^{\dagger} = \gamma^{-1} X_0^T and Y_N = \Gamma_o X_0 in the definition of W_o yields
\[
W_o = X_0^{\dagger T} Y_N^T Y_N X_0^{\dagger}
= \gamma^{-2} X_0 X_0^T \Gamma_o^T \Gamma_o X_0 X_0^T
= \gamma^{-1} X_0 X_0^T \Gamma_o^T \Gamma_o. \tag{A.20}
\]
This matrix multiplication can be written as a sequence of summations
\[
\gamma^{-1} X_0 X_0^T \Gamma_o^T \Gamma_o
= \gamma^{-1} \Bigl( \sum_{l=1}^{r} \sum_{m=1}^{s} c_m^2 T_l T_l^T \Bigr)
\sum_{k=0}^{q} (F^k)^T C^T C F^k \tag{A.21}
\]
\[
= \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{k=0}^{q}
\frac{1}{r s\, c_m^2} \, c_m^2 \, T_l T_l^T (F^k)^T C^T C F^k T_l T_l^T \tag{A.22}
\]
\[
= \sum_{l=1}^{r} \sum_{m=1}^{s} \sum_{k=0}^{q}
\frac{1}{r s\, c_m^2} \, T_l \Psi_k^{lm} T_l^T, \tag{A.23}
\]
where
\[
(\Psi_k^{lm})_{ij} = (y_k^{ilm} - y_{ss})^T (y_k^{jlm} - y_{ss}), \tag{A.24}
\]
since
\[
y_k^{ilm} = c_m C F^k T_l e_i \tag{A.25}
\]
is the output at time t = k∆t of the free response of the system to the perturbed initial condition x_0 = c_m T_l e_i + x_{ss}. This proves that for the special orthogonal sets of initial conditions, the empirical observability Gramian defined by Lall is a special case of the discrete-time perturbed-data-based observability Gramian as defined in this thesis.

Appendix B Proper orthogonal decomposition

Suppose we define the following discrete-time linear time-invariant system
\[
\begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix}
= \begin{bmatrix} F & G \\ C & D \end{bmatrix}
\begin{bmatrix} x_k \\ u_k \end{bmatrix}, \tag{B.1}
\]
and a white noise input sequence u_0, u_1, ..., u_N.
We can generate a snapshot matrix X_N by simulation of the system with the white noise input sequence,
\[
X_N = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix}, \tag{B.2}
\]
where x_k is the value of the state at t = kh of the response to the input sequence u(t). X_N can be written as
\[
X_N = \Gamma_c^N U_N^N, \tag{B.3}
\]
with \Gamma_c^N the discrete-time controllability matrix and U_N^N the input matrix,
\[
\Gamma_c^N = \begin{bmatrix} G & FG & \cdots & F^N G \end{bmatrix}, \tag{B.4}
\]
\[
U_N^N = \begin{bmatrix}
u_0 & u_1 & \cdots & u_{N-1} \\
0 & u_0 & \cdots & u_{N-2} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & u_0
\end{bmatrix}. \tag{B.5}
\]
If the system is stable, \lim_{q \to \infty} F^q = 0. Therefore we can truncate the controllability matrix and the input matrix and approximate X_N by
\[
X_N \approx \Gamma_c U_N, \tag{B.6}
\]
with \Gamma_c the truncated controllability matrix as in Equation (2.93) and U_N the truncated input matrix,
\[
\Gamma_c = \begin{bmatrix} G & FG & \cdots & F^q G \end{bmatrix}, \tag{B.7}
\]
\[
U_N = \begin{bmatrix}
u_0 & u_1 & \cdots & u_q & \cdots & u_{N-1} \\
0 & u_0 & \cdots & u_{q-1} & \cdots & u_{N-2} \\
\vdots & & \ddots & & & \vdots \\
0 & \cdots & 0 & u_0 & \cdots & u_{N-1-q}
\end{bmatrix}. \tag{B.8}
\]
For N \gg q the expected value of U_N U_N^T is
\[
E\{U_N U_N^T\}
= \begin{bmatrix}
\sum_{k=0}^{N-1} u_k u_k^T & 0 & \cdots & 0 \\
0 & \sum_{k=0}^{N-2} u_k u_k^T & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & \sum_{k=0}^{N-1-q} u_k u_k^T
\end{bmatrix}
= \begin{bmatrix}
(N-1) I & 0 & \cdots & 0 \\
0 & (N-2) I & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & (N-1-q) I
\end{bmatrix}
\approx N
\begin{bmatrix}
I & & \\
& \ddots & \\
& & I
\end{bmatrix}, \tag{B.9}
\]
since E\{u_i u_j^T\} = 0 for i \neq j and E\{u_i u_i^T\} = I.
The expected value of X_N X_N^T is
\[
E\{X_N X_N^T\} = E\{\Gamma_c U_N U_N^T \Gamma_c^T\} = N \Gamma_c \Gamma_c^T = N W_c. \tag{B.10}
\]
In case we compare both singular value decompositions we see that they are identical except for the factor N,
\[
\Gamma_c \Gamma_c^T = W_c = U_c \Sigma_c U_c^T, \qquad
X_N X_N^T = N W_c = N U_c \Sigma_c U_c^T. \tag{B.11}
\]
That is the relation between the snapshot matrix excited with white noise and the discrete-time controllability Gramian.

Appendix C Nonlinear Optimization

We consider only smooth, i.e. differentiable, constrained nonlinear programming problems
\[
\begin{aligned}
\min_{x} \; & f(x), \quad x \in \mathbb{R}^n \\
\text{s.t.} \; & g_i(x) = 0, \quad i \in \{1, \ldots, p\} \\
& h_j(x) \geq 0, \quad j \in \{1, \ldots, q\}.
\end{aligned} \tag{C.1}
\]
Here, x is an n-dimensional vector with the so-called decision variables, and f(x) is the objective or cost function to be minimized, subject to p equality constraints and q inequality constraints. These functions are assumed to be continuously differentiable in \mathbb{R}^n.
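To make problem (C.1) concrete before the update schemes are derived, here is a minimal numerical instance solved with SciPy's SLSQP routine, itself a sequential quadratic programming method of the kind discussed in the following sections. The example problem and the use of SciPy are illustrative additions, not material from the thesis.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative instance of (C.1): one equality and one inequality
# constraint, solved with an SQP method (SLSQP). Problem data invented
# for illustration only.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2        # objective f(x)
constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},  # g(x) = 0
    {"type": "ineq", "fun": lambda x: x[0]},               # h(x) >= 0
]
res = minimize(f, x0=np.array([2.0, 0.0]), method="SLSQP",
               constraints=constraints)
print(res.x, res.fun)
```

The reported minimizer is the projection of the unconstrained optimum (1, 2.5) onto the line x1 + x2 = 2, i.e. (0.25, 1.75) with objective value 1.125; the inequality constraint is inactive there, so its Lagrange multiplier is zero, matching the complementarity conditions derived below for the interior point method.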
Equality constraints

Suppose we only have equality constraints. The Lagrangian for the problem is

    L(x, \lambda) = f(x) - \lambda^T g(x) ,   (C.2)

and the first-order optimality condition is

    \nabla L(x, \lambda) = \begin{bmatrix} \nabla_x f(x) - \lambda^T \nabla_x g(x) \\ -g(x) \end{bmatrix} = 0 .   (C.3)

We can write down the sequence that is used in a sequential quadratic program (SQP) to compute the optimal value of this problem, starting from an initial guess x_k and \lambda_k. The update scheme is

    \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \end{bmatrix} = \begin{bmatrix} x_k \\ \lambda_k \end{bmatrix} + \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \end{bmatrix} ,   (C.4)

where \Delta x_k and \Delta \lambda_k are the solution of

    \nabla^2 L(x_k, \lambda_k) \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \end{bmatrix} = -\nabla L(x_k, \lambda_k) ,   (C.5)

and where

    \nabla^2 L(x_k, \lambda_k) = \begin{bmatrix} \nabla^2_{xx} f(x_k) - \lambda^T \nabla^2_{xx} g(x_k) & -\nabla_x g(x_k) \\ -\nabla_x g(x_k)^T & 0 \end{bmatrix} .   (C.6)

Inequality constraints

Inequality constraints are more difficult to deal with because it is unknown which inequalities are active. The interior point method is a possible approach to dealing with inequality constraints and is discussed here. For simplicity we assume we have only inequality constraints. By introducing slack variables the optimization problem is transformed into

    \min_x f(x), \quad x \in \mathbb{R}^n
    \text{s.t.} \; h_j(x) - y_j = 0, \quad j \in \{1, \ldots, q\}   (C.7)
    \phantom{\text{s.t.}} \; y \geq 0 .

A Lagrangian for this problem is

    L(x, \lambda, y) = f(x) - \lambda^T (h(x) - y) - \mu \sum_{j=1}^{q} \log y_j ,   (C.8)

with the first-order optimality condition

    \nabla L(x, \lambda, y) = \begin{bmatrix} \nabla_x f(x) - \lambda^T \nabla_x h(x) \\ -(h(x) - y) \\ \lambda_j - \mu / y_j, \quad j \in \{1, \ldots, q\} \end{bmatrix} = 0 .   (C.9)

When \mu approaches zero, the first-order optimality conditions coincide with the optimality conditions for the problem with the slack variables:

    \nabla_x f(x) - \lambda^T \nabla_x h(x) = 0   (C.10)
    y - h(x) = 0   (C.11)
    \lambda_j y_j - \mu = 0, \quad j \in \{1, \ldots, q\} .   (C.12)

This implies that the j-th constraint is active when y_j = 0 and the corresponding Lagrange multiplier satisfies \lambda_j \geq 0. Conversely, the j-th constraint is not active when y_j > 0, which results in the corresponding Lagrange multiplier \lambda_j = 0. This is the interior point method, a modern variant of a barrier method. In this case the barrier function added to the Lagrangian ensures that y \geq 0.
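The barrier idea behind (C.8) can be sketched in one dimension. The example below is hypothetical (a hand-picked objective and constraint, not from the thesis): for a decreasing sequence of barrier weights mu, the barrier function is minimized by damped Newton steps, and the strictly feasible ("interior") iterates approach the constrained optimum where the inequality is active.

```python
import math

# Classic log-barrier sketch on an invented 1-D problem:
#   min x^2   s.t.   x - 1 >= 0        (optimum x* = 1, constraint active)
# For each mu we minimize  B(x) = x^2 - mu*log(x - 1)  by damped Newton.
x = 3.0                               # strictly feasible start
for mu in [1.0, 1e-1, 1e-2, 1e-4, 1e-6]:
    for _ in range(50):
        d1 = 2*x - mu / (x - 1)       # B'(x)
        d2 = 2 + mu / (x - 1)**2      # B''(x) > 0 (convex barrier)
        step = -d1 / d2
        while x + step <= 1.0:        # damp the step to remain interior
            step *= 0.5
        x += step
        if abs(d1) < 1e-12:
            break

print(x)   # approaches 1.0, the constrained minimizer
```

The damping loop plays the role of a fraction-to-the-boundary safeguard: Newton steps that would leave the feasible interior are halved until the iterate stays strictly feasible.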
Suppose \mu_k is a sequence of positive numbers tending to zero. We can write down the sequence that is used in a sequential quadratic program (SQP) to compute the optimal value of this problem, starting from an initial guess x_k, \lambda_k and y_k. The update scheme is

    \begin{bmatrix} x_{k+1} \\ \lambda_{k+1} \\ y_{k+1} \end{bmatrix} = \begin{bmatrix} x_k \\ \lambda_k \\ y_k \end{bmatrix} + \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \\ \Delta y_k \end{bmatrix} ,   (C.13)

where \Delta x_k, \Delta \lambda_k and \Delta y_k are the solution of

    \nabla^2 L(x_k, \lambda_k, y_k) \begin{bmatrix} \Delta x_k \\ \Delta \lambda_k \\ \Delta y_k \end{bmatrix} = -\nabla L(x_k, \lambda_k, y_k) ,   (C.14)

and where

    \nabla^2 L(x_k, \lambda_k, y_k) = \begin{bmatrix} \nabla^2_{xx} f(x_k) - \lambda^T \nabla^2_{xx} h(x_k) & -\nabla_x h(x_k) & 0 \\ -\nabla_x h(x_k)^T & 0 & I \\ 0 & \Lambda_k & Y_k \end{bmatrix} ,   (C.15)

with Y_k = \mathrm{diag}\{y_1^k, \ldots, y_q^k\} and \Lambda_k = \mathrm{diag}\{\lambda_1^k, \ldots, \lambda_q^k\}. With these matrices \lambda_j^k y_j^k, j \in \{1, \ldots, q\}, can be written as \Lambda_k y_k or Y_k \lambda_k. See e.g. Nash and Sofer (1996) for more details.

Appendix D

Gradient information of projected models

For a dynamic optimization defined as

    \min_{u(p) \in U} V = \int_{t_0}^{t_f} L z \, dt
    \text{s.t.} \; \dot{x} = f(x, y, u), \quad x(t_0) = x_0
    \phantom{\text{s.t.}} \; 0 = g(x, y, u)
    \phantom{\text{s.t.}} \; z = C_x x + C_y y + C_u u
    \phantom{\text{s.t.}} \; 0 \leq h(z, t) ,   (D.1)

the gradient information required for sequential dynamic optimization can be derived from the partial derivatives

    \frac{\partial V}{\partial p} = \frac{\partial V}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p} ,   (D.2)

    \frac{\partial h}{\partial p} = \frac{\partial h}{\partial z} \frac{\partial z}{\partial u} \frac{\partial u}{\partial p} ,   (D.3)

where \partial V / \partial z, \partial h / \partial z and \partial u / \partial p can be derived analytically and \partial z / \partial u can be approximated by a linear time-variant model along the trajectory.

The effect of model order reduction by projection on the approximation of \partial z / \partial u:

    \begin{bmatrix} \Delta z_0 \\ \Delta z_1 \\ \Delta z_2 \\ \Delta z_3 \\ \vdots \end{bmatrix} = \begin{bmatrix} D_0 & 0 & 0 & 0 & \\ C_1 \Gamma_0 & D_1 & 0 & 0 & \\ C_2 \Phi_1 \Gamma_0 & C_2 \Gamma_1 & D_2 & 0 & \\ C_3 \Phi_2 \Phi_1 \Gamma_0 & C_3 \Phi_2 \Gamma_1 & C_3 \Gamma_2 & D_3 & \\ & & & & \ddots \end{bmatrix} \begin{bmatrix} \Delta u_0 \\ \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \\ \vdots \end{bmatrix} ,   (D.4)

where the projection matrices satisfy

    \begin{bmatrix} T_1 \\ T_2 \end{bmatrix} \begin{bmatrix} T_1^\dagger & T_2^\dagger \end{bmatrix} = I \quad \Leftrightarrow \quad \begin{bmatrix} T_1 T_1^\dagger & T_1 T_2^\dagger \\ T_2 T_1^\dagger & T_2 T_2^\dagger \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} ,   (D.5)

and the effect of truncation is

    \hat{\Phi}_i = T_1 \Phi_i T_1^\dagger ,   (D.6)
    \hat{\Gamma}_i = T_1 \Gamma_i ,   (D.7)
    \hat{C}_i = C_i T_1^\dagger .   (D.8)

Therefore

    \hat{C}_3 \hat{\Phi}_2 \hat{\Phi}_1 \hat{\Gamma}_0 = C_3 T_1^\dagger T_1 \Phi_2 T_1^\dagger T_1 \Phi_1 T_1^\dagger T_1 \Gamma_0 = C_3 \Phi_2 \Phi_1 \Gamma_0 .   (D.9)

Note that the size does not change. For residualization a similar derivation is possible; the direct feedthrough term, D, is affected then as well.
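The invariance stated in (D.9) can be checked numerically for the square, invertible case, where T_1^\dagger T_1 = I holds exactly; under genuine truncation that product only approximates the identity, and (D.9) becomes approximate. The matrices below are random placeholders, not the case-study model:

```python
import numpy as np

# Numerical check of (D.6)-(D.9) with a square, invertible "projection",
# so that T1_dagger @ T1 = I exactly. All matrices are random placeholders.
rng = np.random.default_rng(0)
n, m, p = 4, 2, 3
Phi1 = rng.standard_normal((n, n))
Phi2 = rng.standard_normal((n, n))
Gam0 = rng.standard_normal((n, m))
C3   = rng.standard_normal((p, n))

T1  = rng.standard_normal((n, n))   # invertible with probability one
T1d = np.linalg.inv(T1)             # plays the role of T1^dagger

# Projected matrices as in (D.6)-(D.8)
Phi1_h, Phi2_h = T1 @ Phi1 @ T1d, T1 @ Phi2 @ T1d
Gam0_h, C3_h   = T1 @ Gam0, C3 @ T1d

# Markov parameter of the projected model vs. the original, as in (D.9)
lhs = C3_h @ Phi2_h @ Phi1_h @ Gam0_h
rhs = C3 @ Phi2 @ Phi1 @ Gam0
print(np.allclose(lhs, rhs))   # True
```

Note that lhs has the same p-by-m shape as rhs: projection changes the state dimension, not the input-output dimensions of the Markov parameters.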
The gradient is affected by projection, which explains why the number of iterations can differ between projections.

Summary

Model Reduction for Dynamic Real-Time Optimization of Chemical Processes

Jogchem van den Berg

The value of models in the process industries is apparent in practice and in the literature, where numerous successful applications are reported. Process models are being used for optimal plant design, for simulation studies, and for off-line and online process optimization. For online optimization applications the computational load is a limiting factor. The focus of this thesis is on nonlinear model approximation techniques aiming at reducing the computational load of a dynamic real-time optimization problem. Two types of model approximation methods were selected from the literature and assessed within a dynamic optimization case study: model reduction by projection and physics-based model reduction. The model in the case study is described by a set of nonlinear coupled differential and algebraic equations, and for the implementation of the dynamic optimization problem the sequential approach was chosen. Assessment of different algorithms and implementations of the dynamic optimization is not part of this thesis.

Model order reduction by projection is partially successful. Even with a strongly reduced number of transformed differential equations it is possible to compute acceptable approximate solutions, and for many reduced-order models the original optimal solution was close to the optimum of the optimization based on the reduced-order model. In reducing the computational time of the optimization, however, it is not successful: model reduction by projection does not reduce the computational time of the optimization as implemented in this thesis. Reduced-order models obtained by projection of nonlinear models described by a set of differential and algebraic equations are not suited for simulation.
Projection does not provide predictable results in terms of simulation error and stability, and it does not reduce the computational load of simulation. This can be ascribed to the sparse structure that is destroyed by the projection. Two projection methods from the literature are compared. Empirical Gramians were unravelled and reduced to averaging of linear Gramians, and proper orthogonal decomposition was related to balanced reduction. In special cases the empirical controllability Gramian can be computed directly from the data and used for a proper orthogonal decomposition.

Physics-based model reduction is very successful in reducing the computational load of the sequential dynamic optimization problem. It reduces the computational load in the case study by a factor of seventeen, with an acceptable approximation error in the optimal solution. This reduction technique does, however, require detailed process and modelling knowledge.

Samenvatting

Model Reduction for Dynamic Real-Time Optimization of Chemical Processes

Jogchem van den Berg

Process models are used, among other things, for optimizing plant designs, for performing simulation studies, and for both off-line and online optimization of plant operation. The literature, too, reports numerous successful applications that make clear the added value of using models in the process industry. For online optimization applications, the time required for the computations is a limiting factor. This thesis focuses on approximation techniques for nonlinear models, with the aim of reducing the computation time needed to solve dynamic optimization problems. To this end, two approximation techniques were selected from the literature and assessed within a dynamic optimization case study: model reduction by means of projection and physics-based reduction.
The model in the case study is described by a system of nonlinear coupled differential and algebraic equations, and the sequential method was chosen as the implementation of the dynamic optimization problem. Assessing different algorithms and implementations of dynamic optimization is not part of this thesis.

Model reduction by means of projection has proven partially successful. Even with a very strongly reduced number of transformed differential equations it proved possible to compute acceptable approximate solutions. Moreover, for many reduced models the original optimum lay close to the optimum based on the reduced model. Reducing the required computation time, on the other hand, was less successful: model reduction by means of projection yields no reduction in the computation time of the optimization problem as implemented in this thesis.

Model reduction by projection of nonlinear models, described by a set of differential and algebraic equations, is not suited for simulation purposes. Projection does not give predictable results in terms of simulation error and simulation stability, and it does not reduce the computation time of the simulation. The latter can be attributed to the sparse structure that is destroyed by the projection. Two projection methods from the literature were compared. Empirical Gramians were unravelled and reduced to the averaging of linear Gramians, and proper orthogonal decomposition was related to balanced reduction. In special cases the empirical Gramian can be computed directly from the data that is also used for a proper orthogonal decomposition.
Model reduction based on simplifying the physics is very effective in reducing the computation time needed to solve a sequential dynamic optimization problem. In the case study it reduces the required computation time by a factor of seventeen, with an acceptable approximation error in the optimal solution. This reduction technique does, however, require thorough process and modelling knowledge.

Curriculum Vitae

Jogchem van den Berg was born on February 8, 1974 in Enschede, The Netherlands.

1986-1992  Atheneum-B at Jacobus College, Enschede.
1992-1998  MSc Mechanical Engineering, Systems and Control, at Delft University of Technology. Thesis on the modelling of a crystallization process, conducted at DSM, Geleen.
1998-2003  PhD Mechanical Engineering, Systems and Control, at Delft University of Technology. Thesis on model reduction for dynamic real-time optimization of chemical processes.
2003-      Advanced Process Control Engineer at Cargill, Bergen op Zoom.
