# Evaluation of two Methods for Identifiability Testing

*Utvärdering av två metoder för identifierbarhetstestning*

Master's thesis (examensarbete) in Automatic Control, carried out at Linköpings tekniska högskola
Author: Peter Nyberg
Report number: LiTH-ISY-EX--09/4278--SE
Supervisors: Gunnar Cedersund (IKE, Linköpings universitet), Christian Lyzell (ISY, Linköpings universitet), and Jan Brugård (MathCore)
Examiner: Martin Enqvist (ISY, Linköpings universitet)
Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping, 7 October 2009
URL: http://www.control.isy.liu.se

## Abstract

This thesis concerns the identifiability issue: which, if any, parameters can be deduced from the input and output behavior of a model? The two types of identifiability concepts, a priori and practical, will be addressed and explained. Two methods for identifiability testing are evaluated, and the results show that the two methods work well if they are combined.
The first method performs a priori identifiability analysis and can determine the a priori identifiability of a system in polynomial time. The result of the method is probabilistic, with a high probability of a correct answer. The other method takes a simulation approach to determine whether the model is practically identifiable. Non-identifiable parameters manifest themselves as a functional relationship between the parameters, and the method uses transformations of the parameter estimates to conclude whether the parameters are linked. The two methods are verified on models with known identifiability properties and then tested on some examples from systems biology. Although the output from one of the methods is cumbersome to interpret, the results show that the number of parameters that can be determined in practice (practical identifiability) is far smaller than the number that can be determined in theory (a priori identifiability). The reason for this is the limited quality, i.e., noise and lack of excitation, of the measurements.

**Keywords:** Identifiability, Mean optimal transformation approach, Multistart simulated annealing, Sedoglavic observability test
## Acknowledgments

First I would like to thank my examiner Martin Enqvist and my co-supervisor Christian Lyzell for their help with the report. I would also like to thank my co-supervisor Gunnar Cedersund for all the good answers to my questions and for the different models from systems biology. Thanks to Jan Brugård, who gave me the opportunity to do my thesis at MathCore. Last but not least I would like to thank my wonderful girlfriend Eva for sticking with me.

## Contents

1 Introduction
    1.1 Thesis Objectives
    1.2 Computer Software
        1.2.1 Mathematica
        1.2.2 MathModelica
        1.2.3 Matlab
        1.2.4 Maple
    1.3 Organization of the Report
    1.4 Limitations
    1.5 Acronyms
2 Theoretical Background
    2.1 Identifiability
    2.2 Multistart Simulated Annealing
        2.2.1 Cost Function
        2.2.2 Settings
        2.2.3 Input
        2.2.4 Output
    2.3 Mean Optimal Transformation Approach
        2.3.1 Alternating Conditional Expectation
        2.3.2 Input
        2.3.3 The Idea Behind MOTA
        2.3.4 Test-function
        2.3.5 Output
    2.4 Observability
    2.5 Sedoglavic's Method
        2.5.1 A Quick Introduction
        2.5.2 Usage
        2.5.3 Drawbacks of the Algorithm
3 Results
    3.1 Verifying MOTA
        3.1.1 Example 1
        3.1.2 Example 2
    3.2 Verifying Sedoglavic Observability Test
        3.2.1 A Linear Model
        3.2.2 Goodwin's Napkin Example
        3.2.3 A Nonlinear Model
        3.2.4 Compartmental Model
    3.3 Implementation Aspects and Computational Complexity
        3.3.1 Simulation
        3.3.2 ACE
        3.3.3 Selection Algorithm
    3.4 Evaluation on Real Biological Data
        3.4.1 Model 1: SimplifiedModel
        3.4.2 Model 2: Addition of insulin to the media
        3.4.3 Model 3: A model for the insulin receptor signaling, including internalization
4 Conclusions and Future Work
    4.1 Conclusions
    4.2 Proposal for Future Work
A Programming Examples
    A.1 Mathematica
    A.2 MathModelica
    A.3 Maple

# 1 Introduction

A common question in control theory and systems biology is whether or not a model is identifiable. The reason is that the parameters of the model cannot be uniquely determined if the model is non-identifiable. Why is this important? The answer is that the parameters can have some physical meaning, or the search procedures for the parameter estimates may suffer if these are not unique (Ljung and Glad, 1994). In systems biology this is a big problem due to the few measurements compared to the number of parameters in the model. The models in systems biology are often described by systems of differential equations (Hengl et al., 2007), also known as the state-space representation

$$\dot{x} = f(x, p, u)$$
$$y = g(x, p, u)$$

where x are the states, p the parameters, y the outputs, and u the inputs of the model. The dynamics are described by the formula $\dot{x} = \frac{dx(t)}{dt} = f(x, p, u)$. The parameters are, e.g., reaction rates, which have to be determined with the help of the measured data. As we will see, there are two types of identifiability concepts: the first one regards only the model structure, and the other one originates from the lack of quality of the measurements. In this thesis two methods will be evaluated regarding the identifiability issue.
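A state-space model of this form can be simulated numerically. As a minimal sketch (not the thesis code; the one-state decay model below is made up for illustration), a forward-Euler integration of $\dot{x} = f(x, p, u)$:

```python
def f(x, p, u):
    # Hypothetical one-state dynamics: exponential decay driven by an input.
    return -p * x + u

def simulate(x0, p, u, dt, n_steps):
    """Forward-Euler integration of x' = f(x, p, u) with constant input u."""
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        x = x + dt * f(x, p, u)   # one Euler step
        trajectory.append(x)
    return trajectory

# With u = 0 and p = 1 the exact solution is x(t) = x0 * exp(-t),
# so after t = 1 the state should be close to exp(-1) ~ 0.368.
traj = simulate(x0=1.0, p=1.0, u=0.0, dt=0.001, n_steps=1000)
```

In the thesis the simulations are done in MathModelica or Matlab/SBTB; the sketch only shows what "simulating the model for a parameter value" means.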
The first method focuses on the model equations, and the second method also handles the quality of the measurements.

## 1.1 Thesis Objectives

In this thesis we will describe the difference between the two types of identifiability concepts, which will be explained in Chapter 2. The main purpose of this thesis is to investigate methods for determining the identifiability properties of a given model. This includes both implementation and evaluation of existing methods in the MathModelica software environment. The algorithms which have been translated from Matlab to MathModelica/Mathematica are described in Section 2.2 and Section 2.3.

## 1.2 Computer Software

During the work on this thesis several programming languages have been used. Two algorithms have been translated from Matlab to Mathematica. For the simulation, the Modelica-based language MathModelica and the Systems Biology Toolbox (SBTB) have been used, and finally Maple has been required because another algorithm is written in that language. These four languages are briefly described in this section. Some syntax examples are shown in Appendix A.

### 1.2.1 Mathematica

Mathematica is a program originally developed by Stephen Wolfram. The largest part of the Mathematica user community consists of technical professionals, but the language is also used in education around the globe. One useful feature of Mathematica is that you can choose the programming style that suits you: ordinary (procedural) programming goes hand in hand with functional and rule-based programming. The downside is that you can write one line of code that does almost anything, but that the reader understands nothing. This can be remedied with proper comments and basic knowledge of the language. An example in Appendix A illustrates the different ways to write a piece of code that sums up all the even numbers from 1 to 1000. For more information about Mathematica, see Wolfram (1999).
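The appendix example is written in Mathematica; the same contrast between a procedural and a more functional style can be sketched in Python (illustrative only, not taken from the thesis):

```python
# Procedural style: accumulate in an explicit loop.
total = 0
for n in range(1, 1001):
    if n % 2 == 0:
        total += n

# Functional style: a single expression over a generator.
total_functional = sum(n for n in range(1, 1001) if n % 2 == 0)

assert total == total_functional  # both give 250500
```

The one-liner is shorter but, as the text notes, terseness can cost readability unless it is commented.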
### 1.2.2 MathModelica

Due to the demand of experiment fitting, that is, fitting the model to data several times (the number of fits follows from the choice of parameter estimator in this thesis, see Section 2.2) to obtain estimates of the parameters, there is a need for simulation environments, and in this thesis MathModelica has partly been used for the simulation. MathModelica System Designer Professional from MathCore has been used; it provides a graphical environment for modeling and an environment for simulation tasks. With the Mathematica link, which connects MathModelica with Mathematica, one can make use of the Mathematica notebook environment and its facilities (Modelica, 2009). For more information about MathModelica see Mathcore (2009), and for the Modelica language see Fritzson (2003).

One of the main advantages of a Modelica-based simulation environment is that it is acausal. There is no predetermined input or output for the components (unless you force one), meaning that, e.g., for a resistor component (see Appendix A) any of R, i, or v can serve as input or output depending on the structure of the whole system.

### 1.2.3 Matlab

Matlab is a language based on numerical computing. It is developed by The MathWorks and is used heavily around the world for technical and educational purposes. Matlab can be extended with several toolboxes, and in this thesis the SBTB has been used together with the package SBADDON for the purpose of simulating in the Matlab environment.

### 1.2.4 Maple

Maple is a technical computing and documentation environment based on a computer algebra system, originating from the Symbolic Computation Group at the University of Waterloo, Canada. Maple has been used because one algorithm for testing identifiability is written in Maple. The algorithm will be explained in Section 2.5.

## 1.3 Organization of the Report

The organization of this thesis is as follows.
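The acausal idea can be mimicked in ordinary code by solving the resistor relation v = R · i for whichever quantity is unknown. This toy illustration is not how Modelica works internally; it only shows what "no predetermined input or output" means for the user:

```python
def resistor(R=None, i=None, v=None):
    """Solve the acausal relation v = R * i: given any two of the
    three quantities, return the full (R, i, v) triple."""
    if v is None:
        v = R * i
    elif i is None:
        i = v / R
    elif R is None:
        R = v / i
    return R, i, v

# The same relation used with three different causalities; all three
# calls recover R = 100, i = 0.05, v = 5 (up to floating-point rounding).
a = resistor(R=100.0, i=0.05)
b = resistor(R=100.0, v=5.0)
c = resistor(i=0.05, v=5.0)
```

In Modelica the compiler performs this kind of equation sorting automatically for the whole system of component equations.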
The background theory and the methods for testing identifiability are explained in Chapter 2. In Chapter 3 the results are presented: the two methods are first verified with the help of some examples, then the implementation aspects of the translation from Matlab to MathModelica/Mathematica are discussed, and thereafter the methods are tested on examples from systems biology. In Chapter 4 the conclusions and the proposals for future work are to be found.

## 1.4 Limitations

One limitation of this thesis is that we focus on systems biology, and the examples are all taken from that field of science. Another limitation is that the second method for determining the identifiability of a model, see Section 2.5, has not been translated to MathModelica/Mathematica and therefore has not been studied as thoroughly.

## 1.5 Acronyms

- MOTA: Mean Optimal Transformation Approach
- MSA: Multistart Simulated Annealing
- ACE: Alternating Conditional Expectation
- SOT: Sedoglavic's Observability Test
- SBTB: Systems Biology Toolbox

# 2 Theoretical Background

In systems biology, but also within many other areas, over-parametrization and non-identifiability are big problems. These problems are present when there are parts of a model that cannot be identified uniquely (or observed) from the given measurements. There are two types of identifiability: a priori and practical. Practical identifiability implies a priori identifiability, but not vice versa. For a model with unknown parameters there is a possibility to determine these parameters from the input and output signals.
If the model is a priori non-identifiable, the parameters cannot be determined even if the measurements are free from noise. Analyzing the model structure itself with respect to non-identifiability is called a priori (structural) identifiability analysis, since the analysis is done before any experiment fitting and simulation. In this thesis, a priori or structural identifiability regards the local identifiability aspects of a model and not the global property. Global identifiability implies local identifiability, but not the other way around; see Section 2.1 for the definitions of global and local identifiability.

For nonlinear models numerous approaches have been proposed, e.g., power series expansion (Pohjanpalo, 1978), differential algebra (Ljung and Glad, 1994), and similarity transformation (Vajda et al., 1989). However, with increasing model complexity these methods become mathematically intractable (Hengl et al., 2007; Audoly et al., 2001). Due to the importance of time-efficient algorithms, a method proposed by Alexandre Sedoglavic has been used in this thesis. The algorithm, Sedoglavic's Observability Test (SOT), is polynomial in time, with the drawback that the result is probabilistic (Sedoglavic, 2002). The probabilistic nature of the algorithm originates from simplifications made to speed up the calculations (Sedoglavic, 2002). In Section 2.5 the SOT is further explained.

There is another way to detect non-identifiable parameters, besides a priori identifiability analysis, namely with the help of simulation and parameter fitting. Non-identifiable parameters manifest themselves as functionally related parameters; in other words, the parameters depend on each other. If the parameters are functionally related, it may be that only, for example, their sum or quotient can be determined. Consequently the parameters cannot be determined uniquely and are non-identifiable.
Non-identifiable parameters can be detected by fitting the model to data consecutively, several times (the number again follows from the choice of parameter estimator), to obtain estimates of the parameters, which are then examined. One such method is presented in Hengl et al. (2007). Their method, the Mean Optimal Transformation Approach (MOTA), is based on the simulation approach and is developed from the Alternating Conditional Expectation (ACE) algorithm (Breiman and Friedman, 1985).

The simulation approach needs a parameter estimator that produces estimates of the parameters. One such algorithm is Potterswheel (Maiwald and Timmer, 2008), which Hengl et al. used. Another one is the well-known Gauss-Newton algorithm. However, in this thesis the Multistart Simulated Annealing (MSA) algorithm (Pettersson, 2008) is used instead. One of the advantages of the latter is that it only needs function evaluations, not derivatives of the function. An advantage over the Gauss-Newton method is that the MSA algorithm can find parameter estimates that describe the model fairly well even if the initial guess is far from the minimum. In Figure 2.1 the a priori identifiability analysis and the simulation approach are illustrated.

Figure 2.1. Block diagram of the different paths used by the two algorithms. The simulation approach is the upper path: from the model equations, the MSA algorithm produces estimates of the parameters, which are used in MOTA to determine which of the parameters are identifiable. The a priori identifiability analysis is more straightforward: from the model equations, the a priori identifiable parameters are determined with the help of the SOT.

This chapter contains an introduction to some basic results about identifiability and observability. Furthermore, the two algorithms that have been used in this thesis are explained.
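The idea that equally good fits trace out a functional relation can be seen in a small numerical sketch. Below, a hypothetical model y = p1 · p2 · u (not from the thesis) is "re-fit" many times; every estimate with the correct product reproduces the data exactly, and the log-estimates of p1 and p2 therefore come out perfectly negatively correlated:

```python
import math
import random

random.seed(0)

TRUE_PRODUCT = 6.0   # data generated with p1 * p2 = 6, e.g. p1 = 2, p2 = 3

# Stand-in for repeated fits from random starts: any pair on the
# hyperbola p1 * p2 = TRUE_PRODUCT fits the data equally well.
p1_estimates = [random.uniform(0.5, 10.0) for _ in range(100)]
p2_estimates = [TRUE_PRODUCT / p1 for p1 in p1_estimates]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# In log-space the relation log p2 = log 6 - log p1 is exactly linear.
r = pearson([math.log(p) for p in p1_estimates],
            [math.log(p) for p in p2_estimates])
print(r)   # -> -1.0 (up to rounding)
```

MOTA generalizes this idea: instead of a fixed log transformation, it estimates the transformations that expose such relations, linear or not.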
## 2.1 Identifiability

To be more specific and formal about identifiability, study the following case. Given a constrained model structure, a structure with constraints given by the model,

$$\frac{dx(t,p)}{dt} = f(x(t,p), u(t), t, p) \qquad (2.1a)$$
$$y(t,p) = g(x(t,p), p) \qquad (2.1b)$$
$$x_0 = x(t_0, p) \qquad (2.1c)$$
$$h(x(t,p), u(t); p) \ge 0 \qquad (2.1d)$$
$$t_0 \le t \le t_f \qquad (2.1e)$$

where x denotes the state variables, u the externally given input signals, p the system parameters, and y the observations. The initial values are $x_0 = x(t_0, p)$, and h denotes all additional constraints formulated as explicit or implicit algebraic equations.

A single parameter $p_i$ in (2.1) is globally identifiable if there exists a unique solution for $p_i$ from the constrained model structure. A parameter with a countable or uncountable number of solutions is locally identifiable or non-identifiable, respectively.

In theory one can assume that the measurements are ideal, e.g., noise-free and continuous, but in many situations this is not the case; biological measurements, for instance, include observational noise. To take this into consideration we can generalize (2.1b) to

$$y(t,p) = g(x(t,p), p) + \epsilon(t),$$

where $\epsilon(t)$ represents the noise. As mentioned before, practical identifiability implies a priori identifiability, but not the other way around. Practical identifiability of a given model structure holds when we can determine the values of the parameters, with a small enough variance, from measurements of the input and output signals. Due to the noise, a priori identifiable parameters can become practically non-identifiable (Hengl et al., 2007).

## 2.2 Multistart Simulated Annealing

To determine the functional relationships between parameters, there is a need for an algorithm that produces estimates of these parameters that can be further analyzed. In this thesis the MSA algorithm has been used. Another algorithm that can be used is the multi-experiment fitting presented by Maiwald and Timmer (2008).
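How noise can make an a priori identifiable parameter practically shaky can be seen in a small numerical experiment (a made-up scalar model y = p · u + noise, not from the thesis): the estimator stays unbiased, but its variance grows with the noise level.

```python
import random
import statistics

random.seed(1)

def estimate_p(noise_std, n_samples=50, p_true=2.0):
    """Least-squares estimate of p in y = p*u + noise from one data set."""
    u = [0.5 + 0.1 * k for k in range(n_samples)]
    y = [p_true * uk + random.gauss(0.0, noise_std) for uk in u]
    return sum(uk * yk for uk, yk in zip(u, y)) / sum(uk * uk for uk in u)

# Repeat the experiment many times at a low and a high noise level.
low = [estimate_p(0.01) for _ in range(200)]
high = [estimate_p(0.5) for _ in range(200)]

# Both sets of estimates center on p_true = 2, but the spread of the
# high-noise estimates is roughly 50 times larger: the parameter is
# a priori identifiable in both cases, yet practically uncertain
# when the data quality is poor.
print(statistics.pstdev(low), statistics.pstdev(high))
```

The "small enough variance" in the definition above is exactly what fails here as the noise grows.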
From now on we concentrate on the first one and use it to obtain the estimates of the parameters. These estimates are then used by the algorithm presented in Section 2.3; see also Figure 2.1.

The MSA algorithm is a random search method that tries to mimic the behavior of atoms in equilibrium at a given temperature. The algorithm starts at one temperature and then cools down until it reaches the stop temperature. The temperature is proportional to the randomness in the search: the higher the temperature, the more random the search, and at lower temperatures the algorithm behaves more like a local search algorithm. One of the advantages of this algorithm is that it only needs function evaluations, not derivatives of the function. For more information about the technical aspects of the MSA algorithm, see Chapter 3 in Pettersson (2008) and the references therein.

**Algorithm 1: Multistart Simulated Annealing**

Requirements: initial starting guess $X_0$, start temperature $T_1$, stop temperature $T_s$, temperature factor $\nu_{temp}$, cost function, low bound $l_b$, high bound $h_b$, a positive number $\sigma$, and the number of iterations $N$ used for each temperature.

1. Initiate parameters and put k = 1.
2. **while** $T_k > T_s$ **do**
3. Perform N iterations of Annealed Downhill-Simplex, a random search (Pettersson, 2008), at temperature $T_k$. Each iteration gives a point $\bar{p}$, and N iterations give the points $P = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_N]$.
4. Set $T_{k+1}$ lower than $T_k$, $T_{k+1} = \nu_{temp} T_k$, and put k = k + 1.
5. The suitable restart points R are calculated with the help of clustering analysis. From the N points a critical distance,

$$r_k = \pi^{-\frac{1}{2}} \left( \Gamma\Bigl(1 + \frac{k}{2}\Bigr) F(\sigma, l_b, h_b) \frac{\log(kN)}{kN} \right)^{\frac{1}{k}},$$

is calculated. The restart points R are the points $\bar{p}$ that are sufficiently far from the other points according to

$$R = \{ \bar{p} \mid \| \mathrm{col}(P) - \bar{p} \|_2 > r_k \},$$

where F is a rational function, $\Gamma$ is the Gamma function, and $\mathrm{col}(P) = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_N]$ denotes all columns in P except $\bar{p}$.

6. **end while**
7. **return** the best points from each valley found during the last temperature iteration.

### 2.2.1 Cost Function

To determine which parameter vector $\bar{p}$ gives the best fit, the principle of minimizing the prediction error has been used. Let $y_i$ be measured output data points from the system whose parameters we want to estimate. For a parameter vector $\bar{p}$, the constrained model structure given by (2.1a)-(2.1e) is simulated and the outcome is a prediction $\hat{y}(\bar{p})_i$. The input u is known and is used when we simulate the model for different parameter vectors. The prediction error $\epsilon(\bar{p})_i = y_i - \hat{y}(\bar{p})_i$ measures how good the fit is between the measured data points and the simulated ones. From the measured output data points $y_i$ and the simulated ones $\hat{y}(\bar{p})_i$ the cost function calculates the cost, e.g., by

$$V = \sum_{i}^{N} \frac{(y_i - \hat{y}(\bar{p})_i)^2}{\mathrm{std}(\bar{y})^2}, \qquad (2.2)$$

where std denotes the standard deviation. When the algorithm minimizes V in (2.2) we obtain estimates of parameter vectors that hopefully can explain the output from the system fairly well. The length of the simulation is determined by the measured data points $y_i$, which have been collected before the experiment fitting. The time samples determine which predictions $\hat{y}_i$ will be used in, e.g., the cost function (2.2).

There are also possibilities to take other things into consideration than just the prediction error normalized by the standard deviation. We can choose how the weight is distributed between the prediction error, $\tilde{V}$, and an ad-hoc extra cost, $\bar{V}$, by

$$V = \alpha \tilde{V} + \bar{V}, \qquad (2.3)$$

where $\alpha$ determines the weight between the two terms. For instance, if the measured signal has an overshoot, all $\bar{p}$ that do not produce this overshoot are given a larger cost in (2.3). For models with more than one output, one choice of cost function is the trivial expansion of (2.2),

$$V = \sum_{i_1}^{N_1} \frac{(y_{i_1} - \hat{y}_{i_1})^2}{\mathrm{std}(y_1)^2} + \sum_{i_2}^{N_2} \frac{(y_{i_2} - \hat{y}_{i_2})^2}{\mathrm{std}(y_2)^2} + \ldots \qquad (2.4)$$

The model, or the constrained model structure, is simulated numerous times in the MSA algorithm to calculate the cost function, which the algorithm minimizes with respect to the parameter vector $\bar{p}$. In this thesis all models have been simulated either with Mathematica and MathModelica or with Matlab and SBTB. In the first case the model has been written in MathModelica and then transferred to interact with Mathematica via the Mathematica Link.

**Acceptable Parameters**

The MSA algorithm is an optimization algorithm that searches for an optimal parameter vector $\bar{p}$ that minimizes the cost. How can this be of any use regarding the issue of identifiability? The reason will be thoroughly explained in Section 2.3, where the MOTA algorithm is presented. The MOTA algorithm takes an $n \times q$ matrix $K = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_q]$ as input,

$$K = \begin{pmatrix} p_{1,1} & p_{1,2} & \ldots & p_{1,q} \\ p_{2,1} & p_{2,2} & \ldots & p_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n,1} & p_{n,2} & \ldots & p_{n,q} \end{pmatrix}, \qquad (2.5)$$

where each row represents one estimate of the parameters. The parameters are represented by the columns; e.g., the entry in the third row and fifth column is the third estimate of the fifth parameter in the model. Here lies the answer to the question above: the MSA algorithm is used to produce this matrix K, and this is done by searching for acceptable parameters. The acceptable parameters are all parameter combinations that have a cost near the cost of the best parameter vector so far, where by the best parameter vector we mean the one with the lowest cost so far. In other words, all parameter vectors that produce an acceptable cost are regarded as acceptable parameters. In this thesis we have used the threshold 110 percent of the best cost so far to determine whether a cost is near or not. For each cost function evaluation, the current parameter vector $\bar{p}$ with its given cost is taken as acceptable if the cost is near the best cost so far.
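The 110-percent rule can be sketched as a small wrapper around any cost function: during the search, every evaluated parameter vector whose cost is within 10 percent of the best cost seen so far is appended as a row of K. This is an illustrative reconstruction (with a made-up quadratic cost), not the thesis implementation:

```python
import random

random.seed(2)

def cost(p):
    # Stand-in cost: a quadratic bowl with minimum at (1, 3).
    # The +0.1 floor keeps the 110 % acceptance band nonempty.
    return (p[0] - 1.0) ** 2 + (p[1] - 3.0) ** 2 + 0.1

K = []                  # rows: accepted estimates, columns: parameters
best = float("inf")

for _ in range(5000):   # stand-in for the MSA's cost evaluations
    p = [random.uniform(-5, 5), random.uniform(-5, 5)]
    c = cost(p)
    best = min(best, c)
    if c <= 1.1 * best:            # within 110 % of the best cost so far
        K.append(p)

# K now holds many near-optimal parameter vectors for later MOTA analysis.
print(len(K), "acceptable parameter vectors collected")
```

Note that "best so far" moves during the search, so early rows of K can be looser than late ones; the thesis prunes the collected estimates before MOTA (see Section 3.3).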
These acceptable parameters are then taken as estimates of the parameters and are used to determine whether there exist any functional relationships between them.

### 2.2.2 Settings

The MSA algorithm can be controlled by a number of settings that affect the random search. This section explains what these settings are and what they do. The settings of the algorithm are presented in Table 2.1.

Table 2.1: Settings in the MSA algorithm

| Setting | Description |
| --- | --- |
| Temp-start | the temperature where the algorithm starts |
| Temp-end | the temperature where the search does its last iteration |
| Temp-factor | the factor of the cool-down process; a higher factor implies faster cool-down after each temperature loop |
| Maxitertemp | the number of iterations for each temperature |
| Maxitertemp0 | the number of iterations when temp-end has been reached |
| Max-time | the maximum time the search can proceed until termination |
| Tolx | a critical distance between the evaluated points in the optimization |
| Tolfun | a tolerance between the maximum and minimum function value |
| Maxrestartpoints | the number of possible restart points after each iteration in the temperature loop |
| Low bound | a low bound for the parameters which are optimized |
| High bound | a high bound for the parameters which are optimized |

The algorithm consists mainly of two nested loops: an outer loop called the temperature loop and an inner loop that contains the main algorithm. The temperature loop runs as long as the temperature is above the critical stop-temperature setting. For each temperature this loop constructs a simplex, a geometrical figure, for each restart point. From the restart point, which is a q-dimensional vector, the simplex is created by modifying the restart point element by element. This modification is either a relative change or an absolute one.
In the current implementation the relative change is 1.25 times the element (a 25 percent increase of the value in the element), and if the relative change results in an element that is larger than the high bound (or lower than the low bound), an absolute change of ±0.5 is done instead of the relative one (the sign depends on the settings of the low and high bounds in the MSA). The result is a geometrical figure which consists of q + 1 corners; in the two-dimensional case the simplex is a triangle. Let $R = \bar{p} = [p_1, p_2]$ denote a restart point. The simplex

$$\mathrm{simplex} = \begin{pmatrix} p_1 & 1.25 p_1 & p_1 \\ p_2 & p_2 & 1.25 p_2 \end{pmatrix} \qquad (2.6)$$

is constructed by copying the restart point and modifying its elements. For example, if the modification is only relative, the simplex is the one shown in (2.6). The idea is to update the worst point (the column with the highest function value) in the simplex with a better one. In the inner loop the simplex is contracted, reflected, and/or expanded, and the new point that has been retrieved is compared with the rest of the points in the simplex. When the outer loop has iterated through all the restart points, the next restart points are calculated with clustering techniques that take all the evaluated points with their function values. Depending on a critical distance, the output is new restart points that will be used at the next temperature. More information can be found in Pettersson (2008).

### 2.2.3 Input

The algorithm needs two sorts of inputs. The first input is a cost function, e.g. (2.2), that determines how the function evaluations will be conducted for each parameter vector $\bar{p}$. The second is a start guess of the parameters $\bar{p}_0$ from which the algorithm starts the search. This start guess is similar to the internal restart points, with the exception that it, in the current version, has to be a vector (a single restart point).

### 2.2.4 Output

The original outputs from the algorithm are the best points in each valley with their costs. However, in this thesis we are not solely interested in the optimal point.
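The simplex construction can be sketched directly from the description above (illustrative Python with the 1.25 relative factor and the ±0.5 absolute fallback; the exact bound handling in the thesis implementation may differ):

```python
def build_simplex(restart, low, high):
    """Build a (q+1)-corner simplex from a q-dimensional restart point:
    copy the point, then perturb one element per extra corner."""
    corners = [list(restart)]            # first corner: the restart point itself
    for j in range(len(restart)):
        corner = list(restart)
        candidate = 1.25 * corner[j]     # relative change: +25 percent
        if candidate > high or candidate < low:
            # fall back to an absolute change of +/-0.5 toward the interior
            candidate = corner[j] - 0.5 if candidate > high else corner[j] + 0.5
        corner[j] = candidate
        corners.append(corner)
    return corners

# Purely relative case, matching (2.6): corners (p1,p2), (1.25p1,p2), (p1,1.25p2).
print(build_simplex([2.0, 3.0], low=-10.0, high=10.0))
# -> [[2.0, 3.0], [2.5, 3.0], [2.0, 3.75]]
```

For q = 2 this gives the triangle of (2.6); near a bound, the corresponding corner is nudged by 0.5 instead.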
We want to get hold of several parameter estimates that later on can be analyzed with respect to functional relationships between the parameters. These estimates, or acceptable parameters, form a matrix J that contains m estimates of q parameters. This matrix is then used, after some modification, in the MOTA algorithm, which is explained in the next section. Usually the matrix J contains a large number of estimates. Due to computational complexity, further explained in Section 3.3, some problems would occur if we used J directly as an input to MOTA.

2.3 Mean Optimal Transformation Approach

The Mean Optimal Transformation Approach (MOTA) was proposed by Hengl et al. (2007) and is a non-parametric bootstrap-based identifiability testing algorithm. It uses optimal transformations that are estimated with the use of the Alternating Conditional Expectation (ACE) algorithm (Breiman and Friedman, 1985). The MOTA algorithm finds linear and/or nonlinear relations between the parameters regardless of the model complexity or size. This functional relationship between the parameters is then mapped to the identifiability problem; a parameter that can be expressed by other parameters is not identifiable and vice versa.

2.3.1 Alternating Conditional Expectation

The Alternating Conditional Expectation (ACE) algorithm was developed by Breiman and Friedman (1985). It was first intended to be used in regression analysis but has also been applied in several other fields; e.g., Wang and Murphy (2005) used it to identify nonlinear relationships. The algorithm estimates, non-parametrically, optimal transformations.
In the bivariate case the algorithm estimates the optimal transformations Θ̂(p1) and Φ̂1(p2) which maximize the linear correlation R between Θ̂(p1) and Φ̂1(p2),

   {Θ̂, Φ̂}p1,p2 = sup(Θ̃,Φ̃) | R(Θ̃(p1), Φ̃(p2)) |.          (2.7)

At the core of the algorithm there is a simple iterative scheme that uses bivariate conditional expectations. When the conditional expectations are estimated from a finite data set they are replaced by smoothing techniques (in Breiman and Friedman (1985) the so-called super-smoother is used). The two-dimensional case can easily be extended to any size. Let K denote an n × m matrix where n is the number of estimates and m is the number of parameters. Suppose that the m parameters have an unknown functional relationship and let Θ and Φj denote the true transformations between the parameters,

   Θ(pi) = Σ_{j≠i}^m Φj(pj) + ǫ,

where ǫ is normally distributed noise. The algorithm estimates optimal transformations Θ̂(pi) and Φ̂j(pj), where j ≠ i, such that

   Θ̂(pi) = Σ_{j≠i}^m Φ̂j(pj),          (2.8)

where optimal means in the sense of (2.7). The ACE algorithm distinguishes between the left and right-hand side terms. The left-hand term is denoted the response and the right-hand terms the predictors. The calculation of (2.8) is done iteratively by the algorithm; new estimates of the transformation of the response serve as input to new estimates of the transformations of the predictors and vice versa. The ACE algorithm is summarized in Algorithm 2. For further information about the algorithm see Breiman and Friedman (1985) and Hengl et al. (2007) and the references therein.

Algorithm 2 Alternating Conditional Expectation

ACE minimizes the unexplained variance between the response and the predictors. For e²(Θ, Φ1, ..., Φp) = E[Θ(Y) − Σ_{j=1}^p Φj(Xj)]² the algorithm is the following (Breiman and Friedman, 1985):
 1: Initiate Θ(Y) = Y/‖Y‖ and Φ1(X1), ..., Φp(Xp) = 0.
 2: while e²(Θ, Φ1, ..., Φp) decreases do
 3:   while e²(Θ, Φ1, ..., Φp) decreases do
 4:     for k = 1 to p do: Φk,1(Xk) = E[Θ(Y) − Σ_{i≠k} Φi(Xi) | Xk], replace Φk(Xk) with Φk,1(Xk); the conditional expectation is replaced by smoothing techniques (Breiman and Friedman, 1985).
 5:   end for
 6:   end inner while
 7:   Θ1(Y) = E[Σi Φi(Xi) | Y] / ‖E[Σi Φi(Xi) | Y]‖, replace Θ(Y) with Θ1(Y).
 8: end outer while
 9: Θ, Φ1, ..., Φp are the solutions Θ*, Φ1*, ..., Φp*.
10: return

2.3.2 Input

The input to the MOTA algorithm is an n × q matrix K containing n estimates of the q parameters. This matrix is then analyzed with respect to functional relationships between the parameters. How the algorithm finds these relations is the topic of the next section.

2.3.3 The Idea Behind MOTA

Non-identifiability manifests itself as functionally related parameters. These relationships can be estimated by ACE and the idea is to use these estimates to investigate the identifiability of the parameters. If there exist relationships between the parameters, the optimal transformations are quite stable from one sample to another, from a matrix K1 to a new draw of the matrix K2. If the first matrix K1 renders one optimal transformation then K2 will render a similar one if there exists a relation between the parameters. If there is no functional relationship then these transformations will differ from sample to sample. This depends on the data smoother/filter applied by the ACE algorithm (Breiman and Friedman, 1985; Hengl et al., 2007). This is what distinguishes the parameters that are linked with each other from the independent ones. The process of drawing new matrices K is replaced by bootstrapping. Bootstrapping is a re-sampling method that creates re-samples from a dataset.
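Algorithm 2 can be illustrated with a minimal bivariate sketch. The bin-average smoother below is a crude stand-in for the super-smoother used by Breiman and Friedman (1985), and all names are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def ace_bivariate(x, y, n_iter=20, n_bins=10):
    """Minimal bivariate ACE: alternate the conditional-expectation updates
    of Algorithm 2, with E[.|.] estimated by averaging within bins."""
    def smooth(target, cond):
        # crude estimate of E[target | cond]: mean within quantile bins of cond
        out = np.empty_like(target)
        for idx in np.array_split(np.argsort(cond), n_bins):
            out[idx] = target[idx].mean()
        return out

    theta = (y - y.mean()) / y.std()       # initiate the response transformation
    phi = np.zeros_like(theta)
    for _ in range(n_iter):
        phi = smooth(theta, x)             # predictor update: E[theta(y) | x]
        theta = smooth(phi, y)             # response update:  E[phi(x) | y]
        theta = (theta - theta.mean()) / theta.std()   # renormalize the response
    return theta, phi

# For y = (x - 2.5)^2 the plain linear correlation between x and y is near
# zero, but the transformed variables theta(y) and phi(x) become strongly
# correlated: the iteration recovers the nonlinear relation.
```

This illustrates why ACE-based MOTA can find nonlinear parameter relations that an ordinary correlation analysis of the estimates would miss.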
In this case the dataset is the input matrix K. The outcome of the bootstrapping is a new matrix that has been created by random sampling with replacement from the matrix K. The bootstrapping speeds up the algorithm significantly. Each matrix K is denoted a single fitting sequence.

2.3.4 Test-function

A well-behaved test-function is of greatest importance, and for robustness all estimated optimal transformations are ranked. The following definition is presented in Hengl et al. (2007).

Definition 1 Let Φ̂ik(pkr) denote the value of the optimal transformation of parameter pk at its r-th estimate (r-th row) in the i-th fitting sequence, and let card denote the cardinal number of a given set, i.e., the number of elements contained in the set. Then we define αik(pkr) as the function which maps each parameter estimate of a certain fitting sequence onto its cardinality divided by the total number N of fits conducted within one fitting sequence,

   αik(pkr) = (1/N) card{ Φik(pkr′) | r′ ∈ {1, ..., N}, Φik(pkr′) ≤ Φik(pkr) }.

This function is then used to calculate the average optimal transformation. Note that the values of αik(pkr) lie in the range [0, 1]. Let M denote the number of fitting sequences. The test-function used is

   Hk := v̂ar_r [ (1/M) Σ_{i=1}^M αik(pkr) ],          (2.9)

where v̂ar is the empirical variance, v̂ar = 1/(q − p) Σ(...)². A motivation follows. For parameters that have a strong functional relationship the average transformation

   ᾱk(pkr) = (1/M) Σ_{i=1}^M αik(pkr)

is independent of M, meaning the variance is constant. For parameters without any functional relationship the function αik(pkr) is not stable from one fitting sequence to another. This means that αik(pkr) takes values anywhere from zero to one from one fitting sequence to the next, which implies that ᾱk(pkr) → 0.5 when M → ∞, in other words zero variance. In the supplementary material of Hengl et al. (2007) the test-function is thoroughly explained. There it is shown that E[Hk] = (1/12)(1 − Ordo(N^(−1/2))) · (1/M) holds for parameters that are independent, whereas for parameters which have a functional relationship it holds that E[Hk] = (1/12)(1 − 1/N²).

When the test-function Hk (2.9) is low this indicates that there is no functional relationship between the current response and the predictors. The other case, when Hk is not low, is more difficult. There are three threshold values T1, T2 and T3 that are used to determine whether there exists any functional relationship between the parameters or not. If the test-function falls below T1 this is interpreted as the response parameter having no functional relationship with the predictors. If the test-function is between T1 and T2 there is not enough information to establish whether there exists a functional relationship between the parameters. When the test-function is above threshold T2 there is enough information to confirm that there exists a strong relation between the response and the predictors. The third threshold T3 concerns the performance of the algorithm and is not important for the functionality of MOTA. In Hengl et al. (2007) and Hengl (2007) these things are thoroughly explained.

2.3.5 Output

MOTA determines which parameters are linked with each other. These relationships are one of the outputs from MOTA and can be seen in Table 2.2. The table has a simple structure: the different rows represent which parameter is taken as the response parameter and the columns represent the predictors. For example, the first row is when the parameter p1 is taken as the response. Due to the simple structure of the table one can easily show these relations in matrix form instead, (2.10). In this thesis the matrix (2.10) is denoted the output q × q matrix S, where q denotes the number of parameters. The matrix contains only zeros and ones.
In each row, ones indicate which parameters have a functional relationship. The matrix (2.10) indicates that the first parameter p1 is independent of the other parameters. This is also true for p5. The second row shows that when p2 is taken as the response the MOTA algorithm finds that p2 is related to p2 (trivial) but also to p3 and p4. Rows three and four also display this.

Table 2.2: The parameter relations from MOTA

   *    p1  p2  p3  p4  p5
   p1    1   0   0   0   0
   p2    0   1   1   1   0
   p3    0   1   1   1   0
   p4    0   1   1   1   0
   p5    0   0   0   0   1

       [ 1 0 0 0 0 ]
       [ 0 1 1 1 0 ]
   S = [ 0 1 1 1 0 ]          (2.10)
       [ 0 1 1 1 0 ]
       [ 0 0 0 0 1 ]

This is an ideal output, a symmetric matrix. However, this is not the case for all matrices S. The reason for this is that all parameters do not have equal contribution strength (some predictors contribute more, others less, to the response variable) when taken as a predictor for a certain response. The less a parameter, a predictor, contributes to the response, the noisier the transformation Φj(pj) becomes. Eventually the noise is so high that the algorithm cannot distinguish it from an independent parameter (Hengl et al., 2007). A simple example of this is a low gradient, p1 = 0.001 p2 + ǫ. The transformation Φ2(p2) will be noisy and the algorithm will have difficulties concluding that the two parameters are functionally related. Uneven contribution strength can result in output matrices that have a non-symmetric shape. A problem arises from this: if the matrix is non-symmetric, is this due to uneven contribution strength of the parameters or is the result incorrect for some parameters? MOTA also gives the r²-values

   r² = 1 − Σ(Θ − Σ Φ)² / Σ(Θ − mean(Θ))²,          (2.11)

i.e., the fractional amount of variance of the response explained by the predictors, as output. This will be used in the next section when we analyze the output from MOTA. The coefficient of variation

   cv = std(p̄)/mean(p̄)          (2.12)

will also be used, according to the recommendation given by Hengl (2007).
Another tool in the investigation of the identifiability of a model can be seen in Table 2.3. The table contains information about the parameter estimates and also information from MOTA from a single run. The first column in the table is the index, ix, of the response parameter. The table also contains the familiar output matrix. Next are the r²-values, which are the fractional amount of variance of the response explained by the predictors; the larger the value (the maximum is one), the more variance is explained by the predictors. The cv-value is the coefficient of variation. Note that the cv-value is not given as a percentage, as is otherwise common. The #-column is the number of special parameter combinations that have been found by the MOTA algorithm. In this case the output matrix contains two identical rows, the first and the third, which show that p1 and p3 are related. This information is stored in that column. The last column shows the parameter combination which the response parameter ix has been found related to. The difference between the r²-value of the second row and the fourth is that the test-function (2.9) was below threshold T1 (Section 2.3.4) when the parameter p4 was taken as response, but between thresholds T1 and T2 when parameter p2 was taken as response. When the test-function drops below threshold T1 the r²-value is never calculated, which the zero value in row 4 shows. When the test-function has a value between T1 and T2 there is not enough information to establish if there exists a functional relationship between the parameters. The r²-value is then calculated and the algorithm does another loop with more predictors. If the test-function does not reach T2 the result is the one shown in row 2 of Table 2.3. The output matrix shows that parameter p2 is not linked with any other parameters, but the r²-value is nonzero.
In Hengl (2007) there are recommendations on how to interpret the output from MOTA. Functional relations with an r²-value greater than 0.9 and a cv-value greater than 0.1 are recommended. If the functional relation has been found more than once this is a strong indication that the parameters are linked.

Table 2.3: Output from a MOTA run and properties of the estimates

   ix   p1  p2  p3  p4      r²      cv     #   pars
    1    1   0   1   0   0.9936  0.5836   2   p1, p3
    2    0   1   0   0   0.5203  0.6592   1   p2
    3    1   0   1   0   0.9936  0.8413   2   p1, p3
    4    0   0   0   1   0.0000  0.6263   1   p4

Accumulated Output Matrix

Due to the bootstrap-based technique used to speed up MOTA, the output can vary from one run to another. The number of estimates from MSA also affects the output matrix. In this thesis we have taken this into account and we use several MOTA runs in the hope that the result will be more robust. An accumulated output matrix is just an ordinary output matrix whose elements have been summed up, accumulated, over numerous MOTA runs. An example of an accumulated output matrix is

   S_100^200 = [ 100    0  100    0 ]          (2.13)
               [   0  100    0    0 ]
               [ 100    0  100    0 ]
               [   0    0    0  100 ]

where the lower index stands for how many times we have run the MOTA algorithm and the upper index stands for how many estimates serve as input to the algorithm. In this case MOTA has been run 100 times (lower index) and each run has been conducted with 200 estimates (upper index) of each parameter. The elements of the matrix can be as large as the number of MOTA runs, since the ordinary output matrix only contains ones and zeros. From the matrix (2.13) one can conclude that parameters p1 and p3 seem to have some relation with each other.

2.4 Observability

The observability of a system S is determined by the state variables and their impact on the output. If a state variable has no effect on the output, directly or indirectly through other state variables, the system is unobservable. A definition is given in Ljung and Glad (2006).
Definition 2 A state vector x* ≠ 0 is said to be unobservable if the output is identically zero when the initial value is x* and the input is identically zero. The system S is said to be observable if it lacks unobservable state vectors.

Observability is related to identifiability. Parameters can be considered as state variables with time derivative zero. By doing so, the identifiability of the parameters is mapped to the observability rank test. This observability rank test is performed by rank calculation of the Jacobian (Anguelova, 2007). For a linear system

   ẋ = Ax + Bu,          (2.14a)
   y = Cx + Du,          (2.14b)

the observability can be determined by the well-known observability matrix

   O(A, C) = [ C         ]          (2.15)
             [ CA        ]
             [ ...       ]
             [ CA^(n−1)  ]

where n denotes the number of state variables. The unobservable states form the linear null space of O (Ljung and Glad, 2006). There is an equivalent test named the PBH test: the system (2.14) is observable if and only if

   [ A − λI ]
   [   C    ]

has full rank for all λ (Kailath, 1980). When the parameters are considered as state variables the model often becomes nonlinear and therefore a tool that can determine the observability of such a system is needed. One such tool, SOT, is presented in the next section. For more information about the observability rank test see Anguelova (2007).

2.5 Sedoglavic's Method

Besides the Mean Optimal Transformation Approach (MOTA) that was presented in Section 2.3, the other main algorithm used in this thesis is Sedoglavic's Observability Test (SOT) (Sedoglavic, 2002). SOT is an algorithm that calculates the identifiability of a system in polynomial time. A Maple implementation of the algorithm can be retrieved from Sedoglavic's homepage (Sedoglavic, 2009). SOT is an a priori (local) observability algorithm that only focuses on the model structure. Due to the polynomial-time properties, this algorithm is of interest when investigating the identifiability (parameters being regarded as states with ṗ = 0) of fairly large models.
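For the linear case, the observability matrix (2.15) and its rank test can be sketched numerically; this is a small illustrative Python sketch, not part of the thesis toolchain.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) as in (2.15)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, C):
    # observable iff O(A, C) has full column rank n,
    # i.e., the null space of O is trivial
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]
```

For example, with A = diag(−1, −2, −3) and C = [2 1 0] (the linear model verified in Section 3.2.1), the rank of O(A, C) is 2 < 3, so one state is unobservable.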
In the default settings SOT is quicker than MOTA. With a 14 × 233 input matrix K a single run of MOTA takes around an hour, while the a priori identifiability analysis with SOT takes a couple of seconds (Intel Celeron M processor 410, 1.46 GHz). In this thesis the algorithm has only been used in its current form in Maple.

2.5.1 A Quick Introduction

The algorithm is based on differential algebra and it is related to the power series approach of Pohjanpalo (1978). In this section we present the main result from Sedoglavic (2002). Let Σ denote an algebraic system

   Σ:  ṗ̄ = 0,
       ẋ̄ = f(x̄, p̄, ū),          (2.16)
       ȳ = g(x̄, p̄, ū),

and assume that there are l parameters p = [p1, p2, ..., pl], n state variables x = [x1, x2, ..., xn], m output variables y = [y1, y2, ..., ym] and r input variables u = [u1, u2, ..., ur]. Let the system Σ also be represented by a straight-line program which requires L arithmetic operations; e.g., the expression e = (x + 1)³ is represented as the instructions t1 := x + 1, t2 := t1², t3 := t2 t1, and L = 3. The following theorem is the main result (Sedoglavic, 2002).

Theorem 2.1 Let Σ be a differential system described by (2.16). There exists a probabilistic algorithm which determines the set of observable variables of Σ and gives the number of unobservable variables which should be assumed to be known in order to obtain an observable system. The arithmetic complexity of this algorithm is bounded by

   O( M(ν) (N(n + l) + (n + m)L) + m ν N(n + l) ),  with ν ≤ n + l,

and with M(ν) (resp. N(ν)) the cost of power series multiplication at order ν + 1 (resp. of ν × ν matrix multiplication). Let µ be a positive integer, let D be 4(n + l)²(n + m)d and let

   D′ = D (2 ln(n + l + r + 1) + ln µD) + 4(n + l)²((n + m)h + ln 2nD).

If the computations are done modulo a prime number p > 2D′µ then the probability of a correct answer is at least (1 − 1/µ)². The detected observable variables are for sure observable.
It is the unobservable variables that are unobservable with high probability. If we choose µ = 3000 the probability of a correct answer is 0.9993 and the modulus is 10859887151 (Sedoglavic, 2002). As we pointed out earlier the algorithm is fairly complicated, and more information can be found in Sedoglavic (2002) and Sedoglavic (2009).

2.5.2 Usage

This section describes how to use SOT, which is written in Maple. For more information about the usage and the syntax of the algorithm see Sedoglavic (2009). Given an algebraic system Σ (2.16) we write down the equations as in the following example.

Example 2.1: Calling the Observability Test in Maple

   f := [x*(a-b*x)-c*x];
   x := [x];
   g := [x];
   p := [a,b,c];
   u := [];
   observabilityTest(f,x,g,p,u);

Here f is a list of algebraic expressions representing a vector field, x is a list of names such that diff(x[i],t) = f[i], g is a list of algebraic expressions representing the outputs, p is a list of the names of the parameters, and u is a list of the names of the inputs.

All parameters are regarded as states with ṗ = 0 and the algorithm tests if the states are a priori observable. If a parameter, regarded as a state variable, is a priori observable then the parameter is a priori identifiable, as discussed in Section 2.4. The output of the algorithm is a vector that contains information about which parameters/states are observable, which parameters/states are unobservable, and also the transcendence degree: how many parameters need to be known for the system to be observable/identifiable.

2.5.3 Drawbacks of the Algorithm

The SOT implementation is a pilot implementation and therefore contains some defects: the variable t must be unassigned; the list x of the names of the state variables has to be ordered such that diff(x[i],t) = f[i] represents the vector field associated to the model; and a division by zero can occur if the chosen initial conditions cancel the separant.
Some functions cannot be handled. For example, the use of a square root implies that we work on an algebraic extension of a finite field; some variables and some equations have to be added in order to handle this case. The implementation is efficient for one output. If there are many outputs and the computation time is too long, some tests can be done in the main loop and some useless computations avoided (Sedoglavic, 2009).

Chapter 3

Results

In this chapter the results are presented. First the algorithms that have been used are verified. This is done with the help of examples for which the identifiability/observability properties are already known. The two algorithms have been applied to these examples and the outputs compared with the correct ones. Furthermore, a comparison between the MOTA algorithm written in Matlab and in Mathematica is presented. Finally the algorithms, MOTA and SOT, have been applied to three models of different complexity and size.

3.1 Verifying MOTA

In the article by Hengl et al. (2007) there are two examples that are used to illustrate and demonstrate MOTA. Those examples have been reused and the results are presented below. In the first example the amount of Gaussian noise is not stated in Hengl et al. (2007), and in this thesis we have used ǫ ∈ N(0, 0.1). The second example is exactly the one that Hengl et al. (2007) used. All MOTA runs have the default settings, which are: T1 = 0.01, T2 = 0.07, T3 = 0.08; the number of bootstrap samples drawn from the input matrix K is set to half the number of estimates of the matrix. Finally the number of bootstraps, or fitting sequences, is set to 35 in the algorithm.

3.1.1 Example 1

The first example contains four parameters. The parameters p2, p3 and p4 have a uniform distribution on the interval I = [0, 5]. The parameter p1 depends on two other parameters according to p1 = p2² + sin(p3) + ǫ, where ǫ ∈ N(0, 0.1).
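The estimates of Example 1 can be generated as below. This is a sketch only: it produces the n × 4 input matrix K for MOTA rather than running MOTA itself, the function name is ours, and the noise ǫ ∈ N(0, 0.1) is interpreted here as having standard deviation 0.1 (an assumption).

```python
import numpy as np

def draw_example1(n, seed=0):
    """Draw n estimates: p2, p3, p4 ~ U(0, 5) and p1 = p2^2 + sin(p3) + eps."""
    rng = np.random.default_rng(seed)
    p2, p3, p4 = rng.uniform(0, 5, size=(3, n))
    eps = rng.normal(0, 0.1, size=n)           # assumed: 0.1 is the std deviation
    p1 = p2 ** 2 + np.sin(p3) + eps
    return np.column_stack([p1, p2, p3, p4])   # the input matrix K
```

Calling `draw_example1(100)` and `draw_example1(200)` gives the two input matrices used in the experiments that follow.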
To verify the MOTA algorithm we draw 100 and 200 estimates independently of each other and compare the results. The correct solution is that the three first parameters p1, p2 and p3 are functionally related and that p4 lacks any relationship with the other parameters. In other words the parameter relations are as shown in Table 3.1, or in the output matrix form (3.1),

Table 3.1: Relationships between the parameters

   *    p1  p2  p3  p4
   p1    1   1   1   0
   p2    1   1   1   0
   p3    1   1   1   0
   p4    0   0   0   1

       [ 1 1 1 0 ]
   S = [ 1 1 1 0 ].          (3.1)
       [ 1 1 1 0 ]
       [ 0 0 0 1 ]

From now on we prefer to use the latter representation when we show the relationships between the parameters. As already explained, a symmetric matrix is not always obtained, due to different contribution strengths from the predictors to the response parameter (Hengl et al., 2007; Hengl, 2007). We will also see that the number of estimates taken as input to MOTA affects the outcome considerably.

100 estimates

When 100 estimates are drawn, the accumulated output matrix, described in Section 2.3.5, of 100 MOTA runs has the following composition (100 runs and 100 estimates),

   S_100^100 = [ 100    0  100    0 ]          (3.2)
               [   0  100    0    0 ]
               [ 100    0  100    0 ]
               [   0    0    0  100 ]

If one only looks at the matrix one would assume that p2 and p4 are independent and that a functional relationship exists only between p1 and p3. The reason for this faulty behavior of the MOTA algorithm is that only 100 estimates were drawn in this case. More estimates have to be drawn due to the low contribution strength. The number of fits required by the algorithm depends on the functional relations of the parameters (Hengl et al., 2007). In Table 3.2, which is the outcome of a single MOTA run, we can see that the parameters p1 and p3 are functionally related and also meet the recommendations from Hengl (2007), r² ≥ 0.9 and cv ≥ 0.1. However, since the number of fits is too low the algorithm does not reveal that the parameter p2 is also related to p1 and p3.
Table 3.2: Output from a MOTA run and properties of the estimates from Example 1, 100 estimates

   ix   p1  p2  p3  p4      r²      cv     #   pars
    1    1   0   1   0   0.9936  0.5836   2   p1, p3
    2    0   1   0   0   0.5203  0.6592   1   p2
    3    1   0   1   0   0.9936  0.8413   2   p1, p3
    4    0   0   0   1   0.0000  0.6263   1   p4

200 estimates

In this experiment 200 estimates are drawn instead of the 100 estimates in the previous experiment. The accumulated output matrix from 100 MOTA runs is shown below (100 runs and 200 estimates),

   S_100^200 = [ 100   99  100    0 ]          (3.3)
               [   0  100    0    0 ]
               [ 100   99  100    0 ]
               [   0    0    0  100 ]

The second row of S_100^200 is the same as in S_100^100. If one only looks at this row one could assume that parameter p2 lacks any relationship with the other parameters. However, rows one and three indicate that parameters p1, p2 and p3 have a strong functional relationship (the correct conclusion). This situation, where some of the rows contradict each other, is a common result from MOTA. One reason is the contribution strength; another is the bootstrap technique, which results in some random behavior of the MOTA algorithm. A third reason, as mentioned previously, is the number of estimates taken as input. It can be a bit tricky to choose how many estimates are needed for a specific run since the underlying relationship between the parameters is in general not known. The problem is that the more estimates that are used as input, the longer the algorithm takes to calculate the relations between the parameters. However, if the number of estimates is too low, as seen above when 100 estimates are drawn, the output matrix can differ considerably from the correct one. In Table 3.3, which is from a single MOTA run, one can see that the parameters p1, p2 and p3 are related. Rows 1 and 3 show the same result, that all of them are functionally related, and because of the nonzero r²-value in the second row it is likely that p2 is related to the other parameters.
The test-function, when parameter p2 is taken as response, is between the first and second thresholds, which indicates that the algorithm cannot conclude whether there exists any relationship between the parameters in that case. However, since the test-function does not drop below threshold T1 and there are other rows showing that p2 is functionally related, one can suspect that the parameter p2 is related.

Table 3.3: Output from a MOTA run and properties of the estimates from Example 1, 200 estimates

   ix   p1  p2  p3  p4      r²      cv     #   pars
    1    1   1   1   0   0.9992  0.5590   2   p1, p2, p3
    2    0   1   0   0   0.4728  0.6062   1   p2
    3    1   1   1   0   0.9992  0.8621   2   p1, p2, p3
    4    0   0   0   1   0.0000  0.5788   1   p4

3.1.2 Example 2

The second example, also taken from Hengl et al. (2007), contains seven parameters which are related as

   p1 = −p2 + 10,
   p3 = p4^5 p5,          (3.4)
   p6 = η,
   p7 = 0.1,

where p2, p4, p5 and η are all uniformly distributed, drawn independently from the interval I = [0, 5]. The input to the MOTA algorithm is a matrix K = [p̄1, ..., p̄7] and the output matrix should look something like

       [ 1 1 0 0 0 0 0 ]
       [ 1 1 0 0 0 0 0 ]
       [ 0 0 1 1 1 0 0 ]
   S = [ 0 0 1 1 1 0 0 ].          (3.5)
       [ 0 0 1 1 1 0 0 ]
       [ 0 0 0 0 0 1 0 ]
       [ 0 0 0 0 0 0 1 ]

Two draws, one consisting of 100 estimates and the other of 200 estimates, are presented below.

100 estimates

100 estimates of p2, p4, p5 and η are drawn from the interval I = [0, 5] and the other parameters are created from (3.4). The MOTA algorithm is run 100 times, resulting in the accumulated output matrix

   S_100^100 = [ 100 100   0   0   0   0   0 ]          (3.6)
               [ 100 100   0   0   0   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0   0   0   0 100   0 ]
               [   0   0   0   0   0   0 100 ]

This is clearly the correct output for each run. Due to the strong contribution strength the 100 × 7 input matrix K is sufficient to reveal these relationships between the parameters. The parameters p1 and p2 have a connection, p3, p4 and p5 show a functional relationship, and p6 and p7 are independent of all other parameters.
From Table 3.4 one can see that MOTA finds the relationships between the parameters and there is no ambiguity about which parameters are linked and which are independent. The high r²-values for ix ∈ [1, 5] indicate that the predictors can explain the variance very well. The independent parameters p6 and p7 are identified fairly fast, which can be seen in the zero r²-values indicating that the test-function dropped below threshold T1 in its first iteration. During the first iteration, if the predictor that gives the largest value of the test-function is not good enough, the test-function is below T1 and the algorithm concludes that the response parameter is independent.

Table 3.4: Output from a MOTA run and properties of the estimates from Example 2, 100 estimates

   ix   p1  p2  p3  p4  p5  p6  p7      r²      cv     #   pars
    1    1   1   0   0   0   0   0   1.0000  0.1924   2   p1, p2
    2    1   1   0   0   0   0   0   1.0000  0.6134   2   p1, p2
    3    0   0   1   1   1   0   0   0.9832  2.4188   3   p3, p4, p5
    4    0   0   1   1   1   0   0   0.9778  0.6452   3   p3, p4, p5
    5    0   0   1   1   1   0   0   0.9692  0.5753   3   p3, p4, p5
    6    0   0   0   0   0   1   0   0.0000  0.5999   1   p6
    7    0   0   0   0   0   0   1   0.0000  0.0000   1   p7

200 estimates

When 200 estimates are drawn for the 200 × 7 matrix K the result is the same as in the 100-estimate case, as it should be. The accumulated output matrix is shown below,

   S_100^200 = [ 100 100   0   0   0   0   0 ]          (3.7)
               [ 100 100   0   0   0   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0 100 100 100   0   0 ]
               [   0   0   0   0   0 100   0 ]
               [   0   0   0   0   0   0 100 ]

The number of estimates, or in the general case acceptable parameters, that the MOTA algorithm needs to behave well obviously depends on the structure of the underlying system. In this example a change from 100 to 200 estimates does not change the behavior of the algorithm as it does in the first example. If one suspects that the number of estimates is too few, then a re-run of the algorithm with more estimates is a good idea. Table 3.5 shows basically the same as the 100-estimate case, with one interesting change: the r²-value for ix = 6 is no longer zero.
The algorithm has problem to determine that parameter p6 is independent of all other parameters in this case. During the first iteration the test-function is between T1 and T2 and the r2 -value is calculated. However, there are no other rows that indicate that p6 would be linked to any other parameter. Even if the algorithm gives a nonzero r2 -value this does not automatically mean that the parameter has an undiscovered functional relationship. Table 3.5: Output from a MOTA run and properties of the estimates from Example 2, 200 estimates ix 1 2 3 4 5 6 7 3.2 p1 1 1 0 0 0 0 0 p2 1 1 0 0 0 0 0 p3 0 0 1 1 1 0 0 p4 0 0 1 1 1 0 0 p5 0 0 1 1 1 0 0 p6 0 0 0 0 0 1 0 p7 0 0 0 0 0 0 1 r2 1.0000 1.0000 0.9934 0.9885 0.9894 0.6531 0.0000 cv 0.2045 0.5690 2.9393 0.4898 0.5615 0.6175 0.0000 # 2 2 3 3 3 1 1 pars p1 , p2 p1 , p2 p3 , p4 , p5 p3 , p4 , p5 p3 , p4 , p5 p6 p7 Verifying Sedoglavic Observability Test To verify SOT (Sedoglavic, 2002, 2009) the algorithm has been tested on linear and nonlinear models. The properties of these models are known and the output from SOT can therefore be verified. Most of the examples have been taken from Ljung and Glad (1994). 3.2.1 A Linear Model The linear model ẋ1 = −x1 ẋ2 = −2x2 ẋ3 = −3x3 y = 2x1 + x2 has been taken from Ljung and Glad (2006) page 174. The observability of the equation above can be analyzed with the observability matrix, (2.15). With −1 0 0 A = 0 −2 0 0 0 −3 and C= 2 1 0 3.2 Verifying Sedoglavic Observability Test 27 the observability matrix is 2 1 O(A, C) = −2 −2 2 4 0 0 . 0 Since the third column only consists of zeros one can conclude that the state x3 is unobservable. SOT applied on the model equations above gives that the third state is unobservable which is correct. 3.2.2 Goodwin’s Napkin Example Goodwin’s Napkin example, from Ljung and Glad (1994) page 9, is described by ÿ + 2θẏ + θ2 y = 0. 
(3.8)

In this form SOT cannot be used directly; however, with the help of the transformations

x1 = y ⇒ ẋ1 = ẏ = x2
x2 = ẏ ⇒ ẋ2 = ÿ = −2θẏ − θ²y = −2θx2 − θ²x1,

SOT becomes applicable. When the output y of the system is observed, SOT gives that all parameters and states are observable. This is also the conclusion made by Ljung and Glad (1994), page 9.

3.2.3 A Nonlinear Model

In Ljung and Glad (1994), page 9, a simple nonlinear model is presented. The model is described by the equations

ẋ1 = θx2²
ẋ2 = u
y = x1.

With the algorithm presented in Ljung and Glad (1994) the output is that the system is a priori identifiable. SOT gives the same result.

3.2.4 Compartmental Model

A compartmental model is given as Example 4 in Ljung and Glad (1994), page 10. The model equations are

ẋ(t) = −Vm x(t)/(km + x(t)) − k01 x(t)
x(0) = D
y = cx.

Ritt's algorithm in Ljung and Glad (1994) gives that the only parameter that is a priori identifiable is k01, if the initial value D is regarded as unknown. SOT does not take initial values of the states into account, and the outcome is the same as for Ritt's algorithm: the parameter k01 is a priori identifiable.

3.3 Implementation Aspects and Computational Complexity

The main part of this master's thesis work has been to translate the Multistart Simulated Annealing (MSA) algorithm and the Mean Optimal Transformation Approach (MOTA) algorithm from Matlab to MathModelica/Mathematica. In the beginning it was planned that all testing and evaluation would be conducted in the MathModelica and Mathematica environment. However, due to timing issues, testing has mainly been performed in Matlab. The reason for this is twofold. Firstly, the MSA algorithm needs a large number of simulations of the system with different sets of parameters, so a fast simulation is vital for the performance and timing of the algorithm.
Secondly, in MOTA the ACE algorithm is run many times, and without efficient code the whole MOTA will be very slow and testing fairly complicated models will take a long time. Another problem is the computational complexity of MOTA. The number of estimates that MSA produces is in most cases far more than MOTA can handle. A selection algorithm is therefore needed, and the one used in this thesis is presented in a section below.

3.3.1 Simulation

In the MSA algorithm at least one simulation is performed each time the cost function is calculated. This process can be troublesome if the simulation is not fast enough. In the MathModelica/Mathematica software environment the simulation is performed in MathModelica with the help of the Mathematica Link. The main program is written in Mathematica and uses MathModelica when the simulation is to be carried out. For each parameter vector p̄ the simulation is run and the output is then used in Mathematica. The simulation is initiated for every p̄ without any pre-compilation that would make it more efficient. This is why Matlab is much faster than the current implementation of the MSA algorithm in MathModelica/Mathematica. In Matlab the simulation is performed in the SBTB with the SBADDON package. When the MSA starts, the model equations are converted to a compiled C-file and the parameters are passed as arguments each time the simulation is run. The result is that the simulation is about 100 times faster than the usual implementation. This is the main reason why the MSA algorithm is much faster in Matlab than in Mathematica. Similar things can be done in MathModelica for a faster simulation, and this has also been pointed out to MathCore. It is essential for the performance of the algorithm that the simulation is time efficient.

3.3.2 ACE

As mentioned earlier, MOTA uses optimal transformations to identify functionally related parameters.
In practice this is done by calculating the average of the optimal transformations and then applying a test-function to determine which parameters are linked with each other. These optimal transformations are calculated in ACE, and its core function is vital for the speed of the whole algorithm. In the current Mathematica version of MOTA this is done either by call-by-value¹ or by using a global structure to calculate the optimal transformations. However, this is not fast enough. In the Matlab version of MOTA the core function in ACE is written in a C-file, and the result is much faster than if the function were written as an ordinary .m-file. In MathModelica one can use MathCode to do similar things, i.e., use a pre-compiled C-file instead of evaluations of Mathematica notebooks (.nb-files). MathCore has been informed about this timing problem with ACE.

3.3.3 Selection Algorithm

Although the MOTA algorithm has no limit on the number of estimates in the input matrix K, the computational complexity increases with an increasing number of estimates. Therefore a selection algorithm is needed that reduces the number of estimates from MSA. The idea is to get fewer and sparser estimates that MOTA can handle. This is in line with the recommendation cv ≥ 0.1 in Hengl (2007); the sparser the estimates are, the higher the cv-value in general. The selection criterion used is

sqrt( Σ_{k=1}^{n} (p̄_{i,k} − p̄_{j,k})² ) ≥ d,   i ∈ I, i ≠ j, n = #parameters,    (3.9)

where I denotes the already chosen acceptable parameters. The selection algorithm calculates the Euclidean distances between the parameter vectors p̄, and if the distance to any already chosen vector i ∈ I is lower than a critical distance d, the current parameter vector is not used. The algorithm is also described in Algorithm 3.

3.4 Evaluation on Real Biological Data

In this section the algorithms SOT and MOTA (with MSA) will be applied to a number of models from systems biology.
The data used in the cost functions is real measurement data.

¹The values of the variables are copied, which takes computational time.

Algorithm 3: Selection algorithm for the acceptable parameters. The critical distance d in (3.9) can be changed; the selection algorithm is the following:
1: Sort all parameter vectors p̄ in ascending order of their corresponding value of the cost function; denote this matrix B.
2: Take the first parameter vector in matrix B, the one with the lowest value/cost.
3: Remove all points in matrix B that have a Euclidean distance smaller than d to the taken parameter vector.
4: The taken parameter vector is chosen as an acceptable parameter; if B is not empty, go to 2.

3.4.1 Model 1: SimplifiedModel

The first model that will be investigated is called SimplifiedModel and consists of the following equations

ẋ1 = −Vm x1/(km + x1)    (3.10a)
ẋ3 = Vm x1/(km + x1)     (3.10b)
x1(0) = 4                 (3.10c)
x3(0) = 0                 (3.10d)
y = x3.                   (3.10e)

The name SimplifiedModel comes from the model being a simplification, in the sense of fewer states, of a larger model. This is the reason why the first model only contains the states x1 and x3. The decay rate of x3 in the larger model is assumed to be negligible, and the remaining reactions are in the classical Michaelis-Menten form. This is why the model (3.10) is used. The larger model is

ẋ1 = −k1 x1 + km1 x2
ẋ2 = k1 x1 − km1 x2 − k2 x2
ẋ3 = k2 x2 − k3 x3
x1(0) = 4
x2(0) = 0
x3(0) = 0
y = x3.

The output data ymeas = x3(t) has been measured eight consecutive times at the time samples t ∈ [0, 0.2, 0.4, ..., 2.0]. The results are presented in Table 3.6. The measurements have been made using so-called Western blots, which allow for time-resolved measurements of the state of phosphorylation of various proteins; here the insulin receptor and the insulin receptor substrate-1.
The measurements were made by the group of Peter Strålfors at IKE, Linköping University.

Table 3.6: The values from the eight measurements. The time samples are shown in the first column.

time |   ȳ1    |   ȳ2    |   ȳ3    |   ȳ4    |   ȳ5    |   ȳ6    |   ȳ7    |   ȳ8
 0.0 | -0.087 | -0.333 |  0.025 |  0.058 | -0.230 |  0.238 |  0.238 | -0.008
 0.2 |  3.367 |  3.549 |  3.286 |  3.841 |  3.377 |  3.427 |  3.617 |  3.416
 0.4 |  4.014 |  3.688 |  4.098 |  4.280 |  3.817 |  4.127 |  4.206 |  3.636
 0.6 |  3.913 |  4.131 |  4.156 |  4.136 |  4.251 |  4.127 |  4.231 |  3.753
 0.8 |  3.674 |  4.046 |  3.783 |  4.277 |  3.833 |  4.100 |  4.038 |  3.810
 1.0 |  3.791 |  4.116 |  4.094 |  4.331 |  4.111 |  3.864 |  4.069 |  3.791
 1.2 |  3.991 |  3.928 |  4.210 |  3.617 |  4.077 |  4.170 |  4.138 |  4.107
 1.4 |  4.104 |  3.939 |  3.914 |  3.931 |  3.695 |  3.943 |  4.013 |  4.053
 1.6 |  4.113 |  4.148 |  4.176 |  3.790 |  4.031 |  4.036 |  3.787 |  3.840
 1.8 |  4.065 |  4.004 |  3.859 |  3.875 |  4.075 |  3.797 |  4.143 |  4.100
 2.0 |  3.747 |  3.545 |  4.182 |  3.881 |  4.050 |  4.032 |  3.989 |  3.784

In Figure 3.1 all eight measurements are plotted against time. Note that these samples are measurements of the same signal; due to noise they are not all equal. Because of the variations between the measurements, a mean value is computed. The mean can also be seen in Figure 3.1. The model has been tested with the help of MOTA and SOT to identify which parameters, if any, are non-identifiable.

Results from SOT

The equations of SimplifiedModel (3.10) are similar to the compartmental model in Section 3.2.4. A significant difference is that here the measured signal y is x3, without any unknown parameter involved in the expression. When SOT is applied to SimplifiedModel, the output is that all states and parameters are a priori observable.

Results from MOTA

The model (3.10) has been simulated in the MSA algorithm and the acceptable parameters, i.e., the parameters that give a sufficiently low value (within 110 percent of the best one) of the cost function, were collected. These acceptable parameters (a 542 × 2 matrix) were taken as input to the MOTA algorithm.
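The acceptable parameters are produced by repeated simulated-annealing searches. The sketch below is a schematic single-start version with geometric cooling in the style of the MSA settings used in this chapter (the real MSA adds multiple restarts and collects every parameter vector within 110 percent of the best cost); the one-dimensional cost function and all numbers are purely illustrative:

```python
import math
import random

def anneal(cost, x0, lo, hi, T0=10000.0, T_end=1.0, T_factor=0.1,
           iters=400, seed=0):
    """Schematic simulated annealing with geometric cooling.

    Accepts worse candidates with probability exp(-delta/T), so the
    search can escape local minima while T is high."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    T = T0
    while T > T_end:
        for _ in range(iters):
            step = rng.gauss(0, 0.02 * (hi - lo))
            cand = min(hi, max(lo, x + step))
            fc = cost(cand)
            if fc < fx or rng.random() < math.exp(-(fc - fx) / T):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = x, fx
        T *= T_factor
    return best, fbest

best, fbest = anneal(lambda p: (p - 3.0) ** 2, x0=400.0, lo=0.01, hi=500.0)
print(abs(best - 3.0) < 10.0)  # True: the search homes in on the minimum
```

Every candidate evaluation calls the cost function, which in MSA means one simulation of the model; this is why simulation speed dominates the running time.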
Ten runs of MOTA have been conducted and the accumulated output matrix is

S_{10}^{542} = | 10 10 |        (3.11)
               | 10 10 |.

Figure 3.1. This figure shows (a) the eight measurements of ymeas = x3(t), and (b) the mean of the measurements, which is used in the MSA when calculating the cost for a given parameter vector p̄.

MOTA identifies a strong functional relationship between the parameters Vm and km. In Figure 3.2 the optimal solution for the parameters from the MSA is plotted against the mean values of the measurements. In Table 3.7 the result from a single MOTA run is presented. The r2-value is high and the functional relationship between Vm and km is strong. This follows the recommendations given in Hengl (2007), r2 ≥ 0.9 and cv ≥ 0.1.

Table 3.7: Output from a MOTA run and properties of the estimates from Model 1, 542 estimates

ix | Vm km | r2     | cv     | # | pars
 1 |  1  1 | 0.9991 | 0.3623 | 2 | p1, p2
 2 |  1  1 | 0.9991 | 0.3637 | 2 | p1, p2

Figure 3.2. A comparison between the mean values and the output from the model when the optimal parameter vector p̄ is used. The stars represent the mean values.

Figure 3.3. This figure is for the model (3.10) and shows the estimates from MSA of the two parameters, Vm and km, plotted against each other. The linear relationship is prominent.

Analysis

The results from SOT and MOTA differ. This is not a contradiction, as one might assume, but rather an indication that the model is practically non-identifiable.
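This practical non-identifiability has a simple numerical illustration: two (Vm, km) pairs with the same quotient produce nearly indistinguishable outputs from (3.10) when x1 is small compared with km, while a pair with a different quotient is clearly distinguishable. A forward-Euler sketch with illustrative values (a small initial x1 is used so the approximation holds):

```python
def simulate(Vm, km, x1_0=0.01, h=1e-3, T=2.0):
    """Forward-Euler integration of the SimplifiedModel equations (3.10)."""
    x1, x3, ys = x1_0, 0.0, []
    for _ in range(int(T / h)):
        v = Vm * x1 / (km + x1)      # Michaelis-Menten rate
        x1, x3 = x1 - h * v, x3 + h * v
        ys.append(x3)                # y = x3
    return ys

y_a = simulate(Vm=2.0, km=1.0)        # quotient Vm/km = 2
y_b = simulate(Vm=2000.0, km=1000.0)  # same quotient, very different values
y_c = simulate(Vm=4.0, km=1.0)        # different quotient

same = max(abs(a - b) for a, b in zip(y_a, y_b))
diff = max(abs(a - c) for a, c in zip(y_a, y_c))
print(same < 1e-4 < diff)  # True: only the quotient Vm/km is visible in y
```

Since the output depends (almost) only on Vm/km, any estimation procedure will trace out a line of equally good (Vm, km) pairs, which is exactly the linear cloud seen in Figure 3.3.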
Put differently, in the ideal case the parameters can be estimated from the output y (there is no input u in this case), but because the data is not of sufficient quality the parameters become non-identifiable in practice.

An indication of why the model is practically non-identifiable is given by the approximation (3.12). When x1 is small, the model equation can be approximated as

ẋ3 = Vm x1/(km + x1) ≈ (Vm/km) x1 = k x1,    (3.12)

i.e., the original model equation can be approximated by the equation k x1. This implies that only the quotient Vm/km = k can be determined. The result is that the parameters Vm and km are functionally related and non-identifiable. Figure 3.3 shows this behavior; one can clearly see a linear relationship between the parameters. The correlation between the two vectors V̄m and k̄m of acceptable parameters is as high as 0.99. This is a schoolbook example of the impact of practical non-identifiability. Even if the a priori identifiability analysis indicates that the parameters can be estimated from the input and output, this does not imply that the parameters can be estimated in practice.

3.4.2 Model 2: Addition of insulin to the media

The model described by (3.13) is a minimal model of the first few steps in the insulin receptor activation. It contains insulin binding, receptor internalization into the cytosol, and recycling back from the cytosol to the membrane. A common observation in systems biology is an overshoot, i.e., a measured signal that shoots over its final value. The model in this section has been created to give one such overshoot before the final value is reached. For more information about the model see Cedersund et al. (2008) and Brännmark et al. (2009).
The model equations are

İR = −k1 u IR + kR IRi       (3.13a)
İRp = k1 u IR − kID IRp      (3.13b)
İRi = kID IRp − kR IRi       (3.13c)
IR(0) = 10.0                 (3.13d)
IRp(0) = 0.0                 (3.13e)
IRi(0) = 0.0                 (3.13f)
y = kY IRp.                  (3.13g)

The input u = ins is the insulin level. In the following measurements this signal has been equal to one, u = u0 ≡ 1. The measured signal in the model and in the experiment is y = kY IRp(t), and it has been measured three consecutive times. The result can be viewed in Figure 3.4. As one can see, the experimental data differs from measurement to measurement, indicating significant noise. The time samples at which the measurements are taken, t ∈ [0, 0.9, 1.15, 1.30, 2.30, 3.30, 5.30, 7, 15], are not equidistant. The third sample, at t = 2.3, has been corrupted by bad measurements and is not used. The parameter vector in this example is p̄ = [k1, kID, kR, kY]. The mean of the measurements is illustrated in Figure 3.4 and is used in the MSA algorithm when calculating the cost for a given parameter vector p̄.

Results from SOT

When applying SOT to the model (3.13) we get that the only parameter that is a priori non-identifiable is kY; all other parameters are a priori identifiable. This result will be taken into account when MOTA is used.

Results from MOTA

The model described by (3.13) has been run and simulated in the MSA algorithm. The acceptable parameters were taken as the parameter vectors p̄ with a cost below 110 percent of the best cost so far. Six different starting points for the algorithm were chosen randomly between the low and high bounds in the MSA algorithm. The settings for the MSA can be seen in Table 3.8. The low and high bounds and the six starting points can be viewed in Table 3.9

Figure 3.4.
This figure shows (a) the three measurements of y = kY IRp(t), and (b) the mean of the measurements.

The low and high bounds and the starting points are listed in Table 3.9 and Table 3.10. The result was six different sets of acceptable parameters. Simply merging these subsets would give 32946 estimates for every parameter, which is far more than MOTA can handle in the input matrix K. Therefore the selection algorithm (3.9) has been applied to this larger set.

Table 3.8: Settings in the MSA for the experiments on Model 2, p̄ = [k1, kID, kR, kY]

Temp start         | 10000
Temp end           | 1
Temp factor        | 0.1
Maxitertemp        | 4000
Maxitertemp0       | 12000
Max restart points | 10

Table 3.9: The low and high bounds and the starting points for the first and second runs

Parameter | Low bound | High bound | Start guess run 1 | Start guess run 2
k1        | 0.01      | 500        | 387.6278          | 495.2873
kID       | 0.01      | 500        | 103.3198          | 55.929
kR        | 0.01      | 500        | 211.8210          | 257.3806
kY        | 0.01      | 500        | 489.6217          | 106.6258

Table 3.10: The starting points of the parameters for runs 3 to 6

Parameter | Run 3    | Run 4    | Run 5    | Run 6
k1        | 224.8788 | 161.3668 | 22.9590  | 460.4434
kID       | 113.3238 | 281.1164 | 260.1913 | 156.9335
kR        | 20.3859  | 414.9658 | 418.6167 | 243.7368
kY        | 244.8283 | 23.4521  | 483.8139 | 205.4742

After the larger set has been thinned with (3.9) and d = 1, the acceptable parameters are reduced to a 558 × 4 matrix K, and it is this matrix that is taken as input to MOTA. The MOTA algorithm has been run 100 times and the result is the accumulated output matrix

S_{100}^{558} = | 100   0   2   0 |
                |   0 100  22 100 |
                |  10   0 100   0 |        (3.14)
                |   0 100   8 100 |.

This accumulated output matrix is non-uniform and the rows contradict each other. However, the indication is that the parameters kID and kY have a strong functional relationship and are therefore non-identifiable.
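The overshoot dynamics of (3.13) can be reproduced with a minimal forward-Euler simulation. The parameter values below are illustrative guesses, not the fitted estimates; the point is only that a fast binding step feeding a slower internalization/recycling loop makes y rise above its final value:

```python
def simulate(k1, kID, kR, kY, u=1.0, h=1e-3, T=15.0):
    """Forward-Euler integration of model (3.13) with u ≡ 1."""
    IR, IRp, IRi, ys = 10.0, 0.0, 0.0, []
    for _ in range(int(T / h)):
        dIR = -k1 * u * IR + kR * IRi
        dIRp = k1 * u * IR - kID * IRp
        dIRi = kID * IRp - kR * IRi
        IR, IRp, IRi = IR + h * dIR, IRp + h * dIRp, IRi + h * dIRi
        ys.append(kY * IRp)          # measured signal y = kY * IRp
    return ys

# Illustrative parameters: fast binding (k1), slower drain (kID), slow recycling (kR)
y = simulate(k1=5.0, kID=1.0, kR=0.1, kY=1.0)
print(max(y) > 3 * y[-1])  # True: y shoots well above its final value
```

The fast k1 step dumps receptor into IRp before the kID/kR loop reaches its balance, which is the mechanism behind the measured overshoot.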
There are also some weaker indications, in rows 1 and 3, that k1 and kR would be connected, and, in rows 2 and 4, that kR would be related to both kID and kY. To determine whether these weaker indications describe true linkage between the parameters or are just a random element in MOTA, another run of MSA is conducted. The new run uses new low and high bounds, taken from the lowest and highest values of each parameter among the previous acceptable parameters. The settings are the same as before and can be viewed in Table 3.8. In Table 3.11 the bounds and start guess for the seventh run are presented.

Table 3.11: Settings in the MSA for the experiment M7

Parameter | Low bound | High bound | Start guess run 7
k1        | 0.69      | 625        | 270.7719
kID       | 0.68      | 26         | 6.6090
kR        | 0.28      | 10         | 2.514
kY        | 15        | 500        | 488.6613

The selection algorithm (3.9) is once again applied, with d = 1, to the new set, and the input matrix K now consists of 233 estimates for each parameter. MOTA has been run 1000 times and the accumulated output matrix is

S_{1000}^{233} = | 1000    0    0    0 |
                 |    0 1000  151 1000 |
                 |    0    0 1000    0 |        (3.15)
                 |    0 1000  109 1000 |.

The indication that k1 and kR would have a connection is gone. On the other hand, the indication that kR is related to kID and kY is still there. The result is similar to the result in Section 3.1.1, with the difference that this example has lower percentage values. In Table 3.12 the result from a single MOTA run is shown. With the recommendations from Hengl (2007), r2 ≥ 0.9 and cv ≥ 0.1, we conclude that the parameters kID and kY have a strong functional relationship.

Figure 3.5. This figure shows the output from the optimal parameter vector p̄ when applied to the model (3.13). The mean value from the measurements is marked with stars.
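The accumulated output matrices above are simply sums of single-run 0/1 output matrices. A schematic version of this accumulation, with squared correlation on random subsamples as a crude stand-in for MOTA's transformation-based test and purely synthetic estimates:

```python
import random

def linkage_matrix(K, threshold=0.9):
    """One 0/1 output matrix: entry [i][j] = 1 when column j 'explains'
    column i well (squared Pearson correlation as a crude stand-in for
    MOTA's optimal-transformation test)."""
    cols = list(zip(*K))
    n = len(cols)
    def r2(a, b):
        m = len(a)
        ma, mb = sum(a) / m, sum(b) / m
        sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        saa = sum((x - ma) ** 2 for x in a)
        sbb = sum((y - mb) ** 2 for y in b)
        return sab * sab / (saa * sbb)
    return [[1 if i == j or r2(cols[i], cols[j]) >= threshold else 0
             for j in range(n)] for i in range(n)]

def accumulate(K, runs, frac=0.8, seed=0):
    """Sum the output matrices of `runs` tests on random subsamples of K."""
    rng = random.Random(seed)
    n = len(K[0])
    S = [[0] * n for _ in range(n)]
    for _ in range(runs):
        sub = rng.sample(K, int(frac * len(K)))
        M = linkage_matrix(sub)
        for i in range(n):
            for j in range(n):
                S[i][j] += M[i][j]
    return S

# Synthetic estimates: columns (k1, kID, kY) where kID and kY are linked
rng = random.Random(1)
K = []
for _ in range(200):
    kid = rng.uniform(0.5, 1.5)
    K.append((rng.uniform(0, 5), kid, 20 * kid + rng.gauss(0, 0.5)))

S = accumulate(K, runs=10)
print(S[1][2] == 10 and S[0][1] == 0)  # kID-kY linked in every run; k1 free
```

A genuine relationship shows up in (nearly) every run, while spurious links appear only sporadically, which is why the accumulated counts are easier to trust than any single run.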
Table 3.12: Output from a MOTA run and properties of the estimates from Model 2, 233 estimates

ix | k1 kID kR kY | r2     | cv     | # | pars
 1 |  1   0  0  0 | 0.7914 | 0.6365 | 1 | p1
 2 |  0   1  0  1 | 0.9421 | 1.4250 | 2 | p2, p4
 3 |  0   0  1  0 | 0.7928 | 0.0227 | 1 | p3
 4 |  0   1  0  1 | 0.9421 | 1.4066 | 2 | p2, p4

In Figure 3.5 the optimal solution for the parameters from the MSA is plotted against the mean values of the measurements. One can clearly see the overshoot, and the model follows the measurements fairly well.

Analysis

In Figure 3.6 the parameters kID and kY are plotted against each other. Here one can see why the MOTA algorithm finds a strong functional relationship between the parameters. There are two clusters; the first cluster, the one in the lower left corner, is enlarged in Figure 3.7. The linear properties are clear, and only the quotient between the parameters kID and kY seems to be identifiable, which renders both these parameters non-identifiable. The parameter kY is a priori (and practically) non-identifiable and the parameter kID is practically non-identifiable. The other parameters, k1 and kR, are practically identifiable.

Figure 3.6. A plot showing the estimates of kID and kY. The cluster in the lower left corner is zoomed in and presented in Figure 3.7.

Figure 3.7. This figure shows one of the clusters in Figure 3.6. The linear property is striking.

3.4.3 Model 3: A model for the insulin receptor signaling, including internalization

The third and last model examined in this thesis is also a biological one. This model describes the same system as Model 2, but somewhat more extensively. More states are included in the insulin receptor activation, and the first substrate, insulin receptor substrate-1, is also included.
For more information about the model and the measurements see Cedersund et al. (2008) and Brännmark et al. (2009). The model is described by

İR = −k1 u IR − k1basal IR + kR IRptp + km1 IRins    (3.16a)
İRins = k1 u IR + k1basal IR − k2 IRins − km1 IRins  (3.16b)
İRp = k2 IRins − k3 IRp + km3 IRpPTP                 (3.16c)
İRpPTP = k3 IRp − km3 IRpPTP − kD IRpPTP             (3.16d)
İRptp = kD IRpPTP − kR IRptp                         (3.16e)
İRs = −k4 (IRp + IRpPTP) IRs + km4 IRSP              (3.16f)
İRSP = k4 (IRp + IRpPTP) IRs − km4 IRSP              (3.16g)
IR(0) = 10.0                  (3.16h)
IRins(0) = 0.0                (3.16i)
IRp(0) = 0.0                  (3.16j)
IRpPTP(0) = 0.0               (3.16k)
IRptp(0) = 0.0                (3.16l)
IRs(0) = 10.0                 (3.16m)
IRSP(0) = 0.0                 (3.16n)
yIRp = kY1 (IRp + IRpPTP)     (3.16o)
yDoubleStep = kY2 IRSP        (3.16p)
yAnna = kYAnna IRSP           (3.16q)
yDosR = kYDosR IRSP.          (3.16r)

As one can see, the model contains seven states and fourteen parameters. Three consecutive measurements have been used when calculating the cost for the different parameter vectors. The cost function in this case is similar to the one presented in (2.4). The model (3.16) is tested in SOT and MOTA and the results are presented in the following sections.

Results from SOT

SOT applied to the model (3.16) gives that the following parameters are a priori non-identifiable: [k3, km3, kD, k4, kY1, kY2, kYAnna, kYDosR]. The remaining parameters, [k1, k1basal, km1, k2, kR, km4], are a priori identifiable.

Results from MOTA

Before testing the model in MOTA, a run of the MSA algorithm is performed. The settings can be viewed in Table 3.13. The start guess, shown in Table 3.14, is the best known parameter vector minimizing the current cost function.
Table 3.13: Settings in the MSA for the experiment on Model 3

Temp start         | 10000
Temp end           | 0.5
Temp factor        | 0.1
Maxitertemp        | 14000
Maxitertemp0       | 42000
Max restart points | 30

Table 3.14: Low and high bounds and the starting point in MSA

Parameter | Low bound | High bound | Start guess
k1        | 0.01      | 500        | 149.1
k1basal   | 0.0001    | 100        | 0.0001
km1       | 0.01      | 500        | 413.1
k2        | 0.01      | 500        | 0.7564
k3        | 0.01      | 500        | 6.0889
km3       | 0.01      | 500        | 381.0
kD        | 0.01      | 500        | 481.0
kR        | 0.01      | 500        | 0.3369
k4        | 0.01      | 1500       | 292.5
km4       | 0.01      | 1500       | 1496.5
kY1       | 10        | 25         | 15.2186
kY2       | 1         | 100        | 53.1776
kYAnna    | 1         | 100        | 100.0
kYDosR    | 1         | 100        | 98.2

From the MSA algorithm we get over 200000 estimates (acceptable parameters) when the algorithm is run with the settings from Table 3.13. These are too many, and we use the selection algorithm described in Algorithm 3 with d = 100. The number of estimates then drops to 290, which is used as input to the MOTA algorithm. Thirty runs of MOTA give the 14 × 14 accumulated output matrix (3.17), with the parameters ordered as in Table 3.14,

S_{30}^{290} = 30 0 0 0 0 30 0 0 8 8 30 8 0 0 0 30 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 30 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 7 5 0 0 0 30 0 0 0 0 0 0 30 21 21 12 21 21 0 18 2 1 23 30 23 0 23 22 0 4 8 0 18 18 30 0 18 0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 0 0 0 30 30 30 0 0 0 0 1 1 0 28 29 30 0 2 0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 30 30 0 0 0 0 0 0 0 0 30 30 0 0 0 0 0 0 0 0 30 0 30.    (3.17)

It is hard to deduce which parameters are identifiable and which are not. However, it seems that k1, k1basal, km1 and kY1 are practically identifiable. The rest of the parameters seem to be functionally related to each other in different ways. The parameters k2 and kR appear to have a connection, and kR, k4, km4 is another parameter combination that is linked. Rows 12 and 13 show that kY2 and kYAnna are solely connected; however, row 14 indicates that the linkage between kY2 and kYDosR is strong. In Table 3.15 the result from a single MOTA run is presented.
Due to the large output matrix, the parameter relations of Table 3.15 are shown separately in Table 3.16. In this MOTA run the conclusions from (3.17) are confirmed. The parameters that seem to be identifiable are k1, k1basal, km1 and kY1. The second observation is that k2 and kR are related. On the other hand, the r2-values of rows 4 and 8 are below 0.90, which does not follow the recommendations of Hengl (2007). A third observation concerns the parameters kY2 and kYAnna. Due to the high r2-value (higher than 0.90) and a cv-value higher than 0.1, the recommendations in Hengl (2007) lead to the conclusion that there is a real linkage between them. The other rows contradict each other, and it is difficult to see which parameters are related and in what way.

Table 3.15: Output from a MOTA run and properties of the estimates from Model 3, 290 estimates. The found relations can be viewed in Table 3.16.

ix | r2     | cv     | # | pars
 1 | 0.9978 | 0.3518 | 1 | k1
 2 | 0.9961 | 0.4518 | 1 | k1basal
 3 | 0.9977 | 0.2640 | 1 | km1
 4 | 0.8797 | 0.5076 | 2 | k2, kR
 5 | 0.9982 | 2.6240 | 8 | k2, k3, km3, kD, kR, k4, km4, kY2
 6 | 0.9670 | 0.6022 | 5 | k3, km3, kD, k4, km4
 7 | 0.9916 | 0.5630 | 4 | k3, km3, kD, k4
 8 | 0.8797 | 0.2710 | 2 | k2, kR
 9 | 0.9304 | 0.6395 | 3 | kR, k4, km4
10 | 0.8555 | 0.3539 | 3 | kR, k4, km4
11 | 0.9936 | 0.3188 | 1 | kY1
12 | 0.9440 | 0.1862 | 2 | kY2, kYAnna
13 | 0.9440 | 0.1726 | 2 | kY2, kYAnna
14 | 0.8951 | 0.1898 | 2 | kY2, kYDosR

Table 3.16: Relationships between the parameters from a single MOTA run for Model 3.
Here k1b = k1basal, kYA = kYAnna and kYD = kYDosR.

ix | k1 k1b km1 k2 k3 km3 kD kR k4 km4 kY1 kY2 kYA kYD
 1 |  1   0   0  0  0   0  0  0  0   0   0   0   0   0
 2 |  0   1   0  0  0   0  0  0  0   0   0   0   0   0
 3 |  0   0   1  0  0   0  0  0  0   0   0   0   0   0
 4 |  0   0   0  1  0   0  0  1  0   0   0   0   0   0
 5 |  0   0   0  1  1   1  1  1  1   1   0   1   0   0
 6 |  0   0   0  0  1   1  1  0  1   1   0   0   0   0
 7 |  0   0   0  0  1   1  1  0  1   0   0   0   0   0
 8 |  0   0   0  1  0   0  0  1  0   0   0   0   0   0
 9 |  0   0   0  0  0   0  0  1  1   1   0   0   0   0
10 |  0   0   0  0  0   0  0  1  1   1   0   0   0   0
11 |  0   0   0  0  0   0  0  0  0   0   1   0   0   0
12 |  0   0   0  0  0   0  0  0  0   0   0   1   1   0
13 |  0   0   0  0  0   0  0  0  0   0   0   1   1   0
14 |  0   0   0  0  0   0  0  0  0   0   0   1   0   1

Analysis

Let us recollect the outcomes of the two algorithms. SOT gives the result that the parameters [k1, k1basal, km1, k2, kR, km4] are a priori identifiable. The rest, [k3, km3, kD, k4, kY1, kY2, kYAnna, kYDosR], are concluded to be non-identifiable. MOTA, on the other hand, gives that the parameters that seem to be practically identifiable are k1, k1basal, km1 and kY1. As said before, practical identifiability implies a priori identifiability. Conversely, if a parameter is a priori non-identifiable, the parameter cannot be practically identifiable; otherwise the parameter would be a priori identifiable, which is a contradiction. An attentive reader has already noticed that the two results contradict each other. The contradiction is due to the parameter kY1: SOT gives it as a non-identifiable parameter, while MOTA gives it as a practically identifiable parameter. How can this happen, and which algorithm is correct? As a matter of fact, SOT is the one with the correct result. The reason for this faulty behavior of MOTA is that the parameter kY1 is never used in the cost function used by MSA. This is a user mistake rather than a fault in MOTA: the measurements of yIRp were not used when constructing the cost function, and if a parameter is not used when calculating the cost, the MSA algorithm cannot handle it correctly.
A change in the parameter kY1 does not affect the cost, and the parameter becomes disconnected from the rest of the parameters. In MOTA this disconnection is treated as an independent parameter, and the outcome is the same as if the parameter were identifiable. SOT, on the other hand, considers only the model equations, and from that perspective the parameter kY1 is a priori non-identifiable.

There are three other parameters that seem to be practically identifiable according to MOTA: k1, k1basal and km1. They are, according to SOT, a priori identifiable, so there is no contradiction in this case. It is therefore likely that these are the only parameters in the model (3.16) that can be determined from the input and output signals. The other parameters that are a priori identifiable according to SOT are k2, kR and km4. Both k2 and kR show an indication of being related, according to the accumulated output matrix (3.17) and also according to the single run of MOTA in Table 3.16. However, the r2-value is below the recommended level. This is a sign that there may be more parameters related to k2 and kR, rather than a sign that the parameters are independent. The parameter km4 appears to be related at least to the parameter k4. In the accumulated output matrix (3.17) there are some contradicting rows with respect to km4. This non-symmetric output matrix is a real problem when determining which parameters are functionally related.

To summarize the conclusions for this third model: three parameters, k1, k1basal and km1, appear to be practically identifiable. Three parameters, k2, kR and km4, are a priori identifiable but practically non-identifiable. The parameter kY1 is a special case: its value is never used in the MSA algorithm, and it is therefore mistaken for an independent parameter by MOTA. The rest of the parameters are a priori non-identifiable.
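One mechanical way to decipher an accumulated output matrix of the kind that caused trouble above is to merge rows whose strong entries overlap. The sketch below uses union-find with an illustrative threshold of half the number of runs; applied to the Model 2 matrix (3.14), with parameter order [k1, kID, kR, kY], it reproduces the conclusion that kID and kY (indices 1 and 3) form one group:

```python
def parameter_groups(S, runs, strong=0.5):
    """Merge rows of an accumulated output matrix into parameter groups,
    treating entries of at least strong*runs as real links (union-find)."""
    n = len(S)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)
    for i, row in enumerate(S):
        for j, count in enumerate(row):
            if count >= strong * runs:
                union(i, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Accumulated output matrix (3.14) for Model 2, order [k1, kID, kR, kY]
S = [[100,   0,   2,   0],
     [  0, 100,  22, 100],
     [ 10,   0, 100,   0],
     [  0, 100,   8, 100]]
print(parameter_groups(S, runs=100))  # [[0], [1, 3], [2]]
```

The threshold choice is a judgment call, which is precisely the difficulty: the weak, contradicting entries (the 8s, 10s and 22s) sit in the gap between noise and real linkage.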
Chapter 4

Conclusions and Future Work

In this chapter the conclusions of the thesis are presented together with some ideas for future work.

4.1 Conclusions

First of all, the algorithms that have been evaluated, MOTA and SOT, can be used together; the use of one of them does not exclude the use of the other. When trying to settle the identifiability question for a model, both the a priori identifiability analysis and the simulation approach can be used successively, which is also pointed out by Hengl et al. (2007). As a matter of fact, the use of a priori identifiability analysis, in this case SOT, helps a great deal when deciphering the output matrix from MOTA. The parameters found a priori identifiable by the a priori identifiability analysis are the only ones that can be practically identifiable. This holds under the assumption that the model equations are used correctly and that all parameters affect the cost when the cost function is calculated during the search for acceptable parameters. This was not the case for model 3 in Section 3.4.3.

The SOT algorithm, although it has some known drawbacks (Sedoglavic, 2009), is efficient for the examples and models that have been tested in this thesis. Execution time is an important aspect, and since the algorithm is polynomial in time, SOT is highly interesting when a fast algorithm is needed to determine the a priori identifiability properties of a model. Even though the algorithm is polynomial in time, large models with many states and parameters can still be time-consuming, but it is nevertheless far better than most exponential-time alternatives.

A problem that has been prominent during this thesis is the non-symmetric shape of the output matrix from MOTA. The matrices often consist of contradicting rows, which makes it difficult to decode the functional relationships between the parameters.
Fortunately, SOT was a big help many times when the relations between the parameters were examined. The non-symmetric matrices from MOTA depend to a great extent on the estimates of the parameters, the acceptable parameters. These estimates have been produced by MSA, so the settings of MSA are important for the result of the MOTA algorithm. The number of fits/estimates needed depends on the underlying functional relations, and these relations differ between models. Because of that, it is hard to know how many fits are required for the MOTA algorithm to reveal which parameters are connected to each other. The contribution strength also affects the output matrix: a parameter taken as a predictor on the right-hand side in MOTA can contribute more or less to the response. A parameter with low contribution strength to a certain response can therefore be mistaken for a parameter unrelated to that response, leading to a non-symmetric output matrix.

The quality of the acceptable parameters also affects the behavior of MOTA. How does one get sufficiently good estimates from MSA? In this thesis a selection algorithm is used to get sparser estimates of the parameters, which are then used by MOTA. This is recommended by Hengl (2007), due to the cv ≥ 0.1 recommendation for deciphering the functional relationships from the output matrix.

Even if the MOTA algorithm is difficult to manage, the problem of practically non-identifiable parameters is of great interest and a big problem. If the model in question is a priori identifiable, this does not directly imply that the parameters can be estimated in practice. The quality of the data may not have been considered, which could result in practically non-identifiable parameters. Because of this, more focus is required on the quality of the measurements of the input and output signals.
4.2 Proposal for Future Work

One proposal for future work is to thoroughly go through the work of Sedoglavic. In this thesis we have only scratched the surface of his algorithm, and a deeper understanding of SOT would shed more light on the subject of identifiability. Another interesting subject is the relations between the parameters: how the parameters are connected, and which parameters need to be known for the model to be identifiable. An algorithm that can be of help here is Sedoglavic's Observability Symmetries (Sedoglavic, 2009), which determines how the parameters are linked.

The space of acceptable parameters is also a field that can be examined more thoroughly. Which methods are more suitable than others for identifying these acceptable parameters? How good is the MSA algorithm for obtaining them? The last proposal for future work is to investigate how to obtain symmetric output matrices from MOTA. What can be done to reduce the possibility of getting non-symmetric output matrices? If this is solved, MOTA would be far more useful and reliable to work with.

Bibliography

M. Anguelova. Observability and identifiability of nonlinear systems with applications in biology. PhD thesis, Chalmers University of Technology, 2007.

S. Audoly, G. Bellu, L. D'Angiò, M.P. Saccomani, and C. Cobelli. Global identifiability of nonlinear models of biological systems. IEEE Transactions on Biomedical Engineering, 48(1):55–65, 2001.

L. Breiman and J. Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598, 1985.

C. Brännmark, R. Palmer, T. Glad, G. Cedersund, and P. Strålfors. Receptor internalization is necessary but not sufficient for control of insulin signalling in adipocytes. Submitted, 2009.

G. Cedersund, J. Roll, E. Ulfheilm, A. Danielsson, H. Tidefelt, and P. Strålfors. Model-based hypothesis testing of key mechanisms in initial phase of insulin signaling.
PLoS Computational Biology, 4(6), 2008.

P. Fritzson. Principles of Object-Oriented Modeling and Simulation with Modelica 2.1. IEEE Press, 2003. ISBN 0-471-47163-1.

S. Hengl. Quickstart to the MOTA-Software, 2007.

S. Hengl, C. Kreutz, J. Timmer, and T. Maiwald. Data-based identifiability analysis of non-linear dynamical models. Bioinformatics, 23(19):2612–2618, July 2007.

T. Kailath. Linear Systems. Prentice Hall, 1980.

L. Ljung and T. Glad. On global identifiability for arbitrary model parametrizations. Automatica, 30(2):265–276, 1994.

L. Ljung and T. Glad. Reglerteknik – Grundläggande teori. Studentlitteratur, 2006. ISBN 978-91-44-02275-8.

T. Maiwald and J. Timmer. Dynamical modeling and multi-experiment fitting with PottersWheel. Bioinformatics, 24(18):2037–2043, 2008.

MathCore. http://www.mathcore.com, 2009.

Modelica. http://www.modelica.org/tools, 2009.

T. Pettersson. Global optimization methods for estimation of descriptive models. Master's thesis, Linköping University, 2008.

H. Pohjanpalo. System identifiability based on the power series expansion of the solution. Mathematical Biosciences, 41:21–33, 1978.

Sedoglavic. http://www2.lifl.fr/~sedoglav/, 2009.

A. Sedoglavic. A probabilistic algorithm to test local algebraic observability in polynomial time. Journal of Symbolic Computation, 33(5):735–755, 2002.

S. Vajda, K. Godfrey, and H. Rabitz. Similarity transformation approach to identifiability analysis of nonlinear compartmental models. Mathematical Biosciences, 93:217–248, 1989.

D. Wang and M. Murphy. Identifying nonlinear relationships in regression using the ACE algorithm. Journal of Applied Statistics, 32:243–258, 2005.

S. Wolfram. The Mathematica Book. Cambridge University Press, 1999. ISBN 0-521-64314-7.
Appendix A

Programming Examples

A.1 Mathematica

Example A.1: An example of Mathematica coding

Procedural programming:

  sum = 0;
  For[i = 1, i <= 1000, i++, If[Mod[i, 2] == 0, sum += i]];
  sum

Functional programming:

  Apply[Plus, Select[Range[1000], EvenQ]]

Rule-based programming:

  Range[2, 1000, 2] //. {y_, x_, z___} -> {x + y, z}

A.2 MathModelica

Example A.2: An example of modeling in MathModelica

  type Voltage = Real(unit="V");
  type Current = Real(unit="A");

  connector Pin
    Voltage v;
    flow Current i;
  end Pin;

  model TwoPin "Superclass of elements with two electrical pins"
    Pin p, n;
    Voltage v;
    Current i;
  equation
    v = p.v - n.v;
    0 = p.i + n.i;
    i = p.i;
  end TwoPin;

  model Resistor "Ideal electrical resistor"
    extends TwoPin;
    parameter Real R(unit="ohm") "Resistance";
  equation
    R*i = v;
  end Resistor;

A.3 Maple

Example A.3: Computing the factorial in Maple

Imperative programming:

  myfac := proc(n::nonnegint)
    local out, i;
    out := 1;
    for i from 2 to n do
      out := out * i
    end do;
    out
  end proc;

Another way, using the 'maps to' arrow notation:

  myfac := n -> product(i, i = 1..n);
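For readers unfamiliar with Mathematica, the three paradigms in Example A.1 all compute the same thing: the sum of the even numbers from 1 to 1000. A small Python rendering of the procedural and functional variants (not part of the original appendix) may make the comparison clearer:

```python
# Sum of the even numbers from 1 to 1000, mirroring Example A.1.

# Procedural style (cf. the Mathematica For-loop):
total = 0
for i in range(1, 1001):
    if i % 2 == 0:
        total += i

# Functional style (cf. Apply[Plus, Select[...]]):
total_functional = sum(filter(lambda i: i % 2 == 0, range(1, 1001)))

print(total, total_functional)  # both 250500
```

Both variants yield 250500; the rule-based Mathematica version has no direct Python counterpart, since it relies on repeated pattern replacement.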
