# Department of Physics and Measurement Technology

Master's Thesis

Evaluation and Development of Methods for Identification of Biochemical Networks

Alexandra Jauhiainen

LITH-IFM-EX-05/1378-SE

Division of Computational Biology
Department of Physics and Measurement Technology
Linköpings universitet, SE-581 83 Linköping, Sweden

Supervisor: Mats Jirstrand, Fraunhofer-Chalmers Research Centre for Industrial Mathematics
Examiner: Jesper Tegnér, IFM, Linköpings universitet

Linköping, 22 February 2005

Swedish title: Evaluering och utveckling av metoder för identifiering av biokemiska nätverk
Electronic version: http://www.ep.liu.se/exjobb/ifm/bi/2005/1378/

Abstract

Systems biology is an area concerned with understanding biology on a systems level, where the structure and dynamics of the system are in focus. Knowledge about the structure and dynamics of biological systems is fundamental information about cells and the interactions within them, and it also plays an increasingly important role in medical applications. System identification deals with the problem of constructing a model of a system from data, and an extensive theory exists, in particular for the identification of linear systems. This is a master's thesis in systems biology treating the identification of biochemical systems.
Methods based on both local parameter perturbation data and time series data have been tested and evaluated in silico. The advantage of the local parameter perturbation data methods proved to be that they demand less complex data, but the drawbacks are the reduced information content of this data and its sensitivity to noise. Methods employing time series data are generally more robust to noise, but the lack of available data limits the use of these methods.

The work has been conducted at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics in Göteborg, and at the Division of Computational Biology at the Department of Physics and Measurement Technology, Biology, and Chemistry at Linköping University during the autumn of 2004.

Keywords: Systems Biology, System Identification, Biochemical Networks.
Acknowledgement

I would like to thank my supervisor at the Fraunhofer-Chalmers Research Centre (FCC), Mats Jirstrand, for his help and enthusiasm in this project. Additional thanks to my friends and everyone who has provided comments and support on my thesis work. Final thanks to the staff at FCC.

Notation

Symbols and abbreviations used in the thesis are gathered here for clarification. All abbreviations are also explained in the main part of the thesis, at their first occurrence.

Symbols

x, X : Boldface letters are used for vectors, matrices, and sets.
θ : Parameter vector.
DM : Set of values over which θ ranges in a model structure.

Abbreviations

SITB : System Identification ToolBox
MAPK : Mitogen Activated Protein Kinase
PRBS : Pseudo Random Binary Signal

Contents

1 Introduction
  1.1 Problem Formulation
  1.2 Systems Biology
    1.2.1 What is Systems Biology?
    1.2.2 Why Perform Research on Systems Biology?
  1.3 The Thesis
    1.3.1 Aim and Audience
    1.3.2 Structure
2 Theory
  2.1 Systems, Modelling, and Simulation
    2.1.1 Systems
    2.1.2 Models
    2.1.3 Simulation
  2.2 System Identification
    2.2.1 The Identification Process
    2.2.2 Validation
  2.3 Biological Network Structures
    2.3.1 Metabolic Networks
    2.3.2 Gene Regulatory Networks
    2.3.3 Signalling Networks
  2.4 Identification of Biological Networks
    2.4.1 Identification Approaches
  2.5 Chemical Kinetics
    2.5.1 Rate Laws and Reaction Mechanisms
    2.5.2 Equilibrium and Steady State
    2.5.3 Enzyme Kinetics
3 Methods
  3.1 Methods Based on Local Parameter Perturbation Data
    3.1.1 Interaction Graph Determination
    3.1.2 Determination of Control Loops in Mass Flow Networks
  3.2 Methods Employing Time Series Data
    3.2.1 Linearising Around an Operating Point
    3.2.2 A Discrete Linearisation Method
4 Results
  4.1 Results from Interaction Graph Determination
  4.2 Results from Control Loop Determination
  4.3 Results from Local Linearisation and SITB Estimation
    4.3.1 The Evaluation Network
    4.3.2 Simulation and Identifiability
    4.3.3 The Identification Step
    4.3.4 Varying Sampling Intervals
    4.3.5 Noise Addition
    4.3.6 Additional Investigations
5 Discussion
6 Conclusions
A Kinetic Parameters for the Evaluation Networks
B Some Functions in the SITB
C Noise Effects on Estimation

List of Figures

1.1 Robust perfect adaption
1.2 Integral feedback
1.3 Enzymatic transformation
2.1 A MAPK cascade
2.2 Dinitrogen pentoxide
2.3 Decomposition data for dinitrogen pentoxide
2.4 Logarithmic plot of decomposition data
2.5 Michaelis-Menten dynamics
3.1 Interaction graph
3.2 Two network architectures
3.3 Sampling
3.4 Folding
3.5 Block diagram
4.1 Massflow network
4.2 Simplified cascade
4.3 Step responses
4.4 PRBS
4.5 Bode plots
4.6 Gradual sampling
4.7 Bode plot
4.8 Sampling dependence
4.9 Noise addition
4.10 Signal and noise spectrums
4.11 Noisy outputs
4.12 Noise-free outputs
4.13 MKKK dynamics
C.1 Complete noise addition

List of Tables

A.1 Michaelis-Menten parameters (I)
A.2 Michaelis-Menten parameters (II)

Chapter 1: Introduction

The issues examined in this master's thesis work are introduced in this chapter, together with the structure and aim of the thesis. Some background information on systems biology is also given.

1.1 Problem Formulation

The main task of this master's thesis is to examine and evaluate different methods for the reconstruction and identification of biochemical networks. The methods are also investigated with the aim of possibly improving their applicability.

1.2 Systems Biology

The task of the master's thesis lies within the area of systems biology. This section aims to explain different interpretations of the systems biology concept, as well as to give some motivation for why we perform research on this subject.

1.2.1 What is Systems Biology?

Several different opinions on how to define systems biology exist. One description of the area is as a field of research concerned with understanding biology on a systems level. A systems biologist is interested in understanding the structure and dynamics of a system (Kitano, 2002). In more detail, this understanding of a system can be divided into four parts: system structure, system dynamics, control methods, and design methods.
An approach to how biological systems can be thoroughly apprehended is proposed by (Kitano, 2001):

System structure: Networks of biochemical character are to be identified. This means that signal transduction as well as controlling mechanisms and mass flow between entities of the system need to be recognised.

System dynamics: When fundamental knowledge about the structure of a system has been found, it is of interest to learn more about system behaviour over time and under different conditions.

Control methods: Methods to control the system can be the next step after identification and analysis of dynamic activities. The interest might be in how to control or sustain a state of the system.

Design methods: Design principles and simulations with information on strategies to modify the system. The aim is to achieve the ability to modify, or even create, a system that fulfils certain purposes, for example to provide cures for diseases.

Systems biology research hence strives to understand the complex interactions between DNA, RNA, proteins, and metabolic and informational pathways. Both inter- and intra-cellular networks are of interest.

The application of systems and control theory to biological systems is one of many definitions of systems biology (Wolkenhauer et al., 2003). This view can be illustrated with the following example from (Yi et al., 2000).

Example 1: Robust perfect adaption

Adaption, or desensitisation, is a process in biological systems that allows the system to return to a steady state despite continuously being subject to stimulating signals. An example of this is the adaption in bacterial chemotaxis to changes in the levels of constituent proteins. The process of adaption is robust and a result of integral feedback control. The process is exemplified with the model of Figure 1.1. The amount of the species Y is held at a constant level by the integral feedback loop.
The amount is only dependent on the rate constants for the rates leading to and from the A-component as follows:

\frac{dA}{dt} = \frac{V_3 Y}{K_{3m} + Y} - V_4

which at steady state, dA/dt = 0, gives

Y = \frac{V_4 K_{3m}}{V_3 - V_4}

Figure 1.1. The process of robust perfect adaption. E1 and E2 represent perturbations to the system; the rate v4 operates at saturation.

The rate v4 must be constant, i.e., operating at saturation as the figure indicates. The interpretation of the rate constants and a more thorough treatment of kinetics can be found in Section 2.5. The block representation of this system is shown in Figure 1.2, where the reference signal r is the level of the component that the system strives to maintain constant.

Figure 1.2. Robust perfect adaption as a block representation of integral feedback control.

In addition to systems and control theory, mathematical and computational tools are utilised in research, such as data preprocessing and statistical and informatics mining tools (Morel et al., 2004). The area of systems biology is not new, but has evolved concurrently with the development of suitable tools and increasing experimental skill and technology in producing useful data. Bioinformaticians have large pools of information owing to the several genome projects producing sequence data. In contrast, a systems biologist is in need of a different kind of data, and retrieving this type of data is not an easy task (Wolkenhauer et al., 2003).

With the definition of systems biology as the application of systems theory to biology, the lack of applicable data can be understood. In systems and control theory, perturbations, performed in a systematic manner, and the following dynamic response of the system are used to deduce facts and information about the system. If this approach is copied to biology, time series data of the biological system is required.
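The essence of Example 1 can be sketched numerically. The following illustrative Python snippet (not from the thesis; the plant is reduced to a static gain and all parameter values are invented) shows the defining property of integral feedback: the integrator accumulates the error until it is zero, so the output returns to the reference level for any constant disturbance.

```python
# Minimal sketch of integral feedback producing perfect adaptation.
# The plant of Figure 1.2 is simplified to a static unit gain with an
# additive constant disturbance d; parameter values are illustrative.

def simulate(r, d, steps=2000, dt=0.01, k=1.0):
    u = 0.0  # integrator state
    y = 0.0
    for _ in range(steps):
        y = u + d          # plant output, shifted by the disturbance
        e = r - y          # error against the reference level r
        u += k * e * dt    # integral action accumulates the error
    return y

# The output settles at the reference r regardless of the disturbance d:
print(round(simulate(r=1.0, d=0.0), 3))  # -> 1.0
print(round(simulate(r=1.0, d=0.7), 3))  # -> 1.0
```

The integrator state plays the role of the A-component: its derivative is the error, so any steady state necessarily has zero error.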
These kinds of measurements are preferably performed in vivo, to ensure that the system is in its natural environment, and that is not an easy task. In addition, data needs to be collected systematically and, as in all applications, preferably with a small influence of noise, which is hard to achieve in biological systems. It is also difficult to perform measurements without altering the environment or the state of the system under observation. The available data is more often steady state data, which is far less informative since it does not contain any information about the dynamic behaviour of the system.

The systems biology approach to a problem can be illustrated with the following example of different levels of modelling of an enzyme catalysed transformation.

Example 2: Modelling of an enzymatic transformation

The three figures gathered in Figure 1.3 illustrate the modelling of a transformation catalysed by an enzyme. The concept is further explained in Section 2.5.3. The first of the figures is a simple view of the transformation and gives the information that an enzyme catalyses the transformation of the substrate, S, into the product, P. The second figure is a more detailed description of how the transformation occurs: the enzyme and substrate form a complex which dissociates into product and free enzyme.

Figure 1.3. Three different levels of modelling of an enzyme catalysed reaction: (a) textbook drawing, (b) reaction view, and (c) circuit diagram, modified after (Wolkenhauer et al., 2003). The substrate is denoted S, the product P, and the enzyme-substrate complex ES.

The last of the figures is an elaborate view of this kinetic reaction:

S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \overset{k_2}{\rightarrow} E + P

A mathematical model can be built from the kinetic interactions in the circuit diagram.
The diagram is a mapping of the four differential equations which describe the dynamics of the reaction, and the equations can be used in simulation of the system (Wolkenhauer et al., 2003). A circuit diagram is useful in a systems biology approach. It can be used as a module to build models of larger networks with this kind of kinetics. On the other hand, this could also be the kind of information a systems biologist would like to obtain for a network with less known kinetics.

1.2.2 Why Perform Research on Systems Biology?

Why do we need information about the structure and dynamics of a system? Several reasons exist. Information provided by systems biology research is part of the fundamental knowledge about cells and interactions within cells. The information can also, for example, be used in medical applications. A vast number of diseases are not dependent on a single mutated gene or, for example, a misfolded protein. Instead, complex multimolecular interactions explain the disease state (Morel et al., 2004). Therefore, the system of interactions needs to be understood in order to explain the disease mechanism. In addition, knowledge about the controlling mechanisms in a system can help in the identification of important nodes in the network, which are possible targets for drug action or other treatments (Kitano, 2002).

A recent example of the utility of systems biology in medicine is a research project, called BioSim, on the possibility of replacing animal testing with computer modelling (BioSim Network, 2005). The aim of the project is to decrease the number of, or at least improve, animal tests and to speed up the drug development process. The computational modelling will hopefully give more detailed information on how drugs affect the patient and how the drugs are processed in the body.
1.3 The Thesis

1.3.1 Aim and Audience

The aim of this thesis is to describe system identification methods applied to biochemical systems and the different problems arising when doing so. The report is meant to give the necessary background information for the reader to understand how the work has been carried out and what results it has led to. The thesis is written with a reader in mind who has basic knowledge of biology as well as familiarity with topics like mathematics, control theory, and signal processing.

1.3.2 Structure

The thesis commences with an abstract summing up the work in a few lines, followed by acknowledgements and notation clarifications. As a guide to the thesis, a comprehensive table of contents as well as a list of figures and a list of tables are provided. The introductory chapter is followed by a more extensive theory part meant to explain fundamental concepts needed for further reading. Following the theory block is a methods chapter where the different methods used in the thesis are demonstrated and analysed. The results are presented and discussed in the consecutive chapters, and the main part of the thesis is completed with a chapter containing the conclusions from the work. Finally, a reference list and appendices are given.

Chapter 2: Theory

This chapter introduces the theory necessary for the reader to understand the different methods used in the work. The methods are further described in Chapter 3.

2.1 Systems, Modelling, and Simulation

Basic concepts in modelling, simulation, and validation are explained in this section. The view of system identification here is the one used in traditional engineering applications. The information can be valuable to compare to the system identification methods that have so far been utilised and developed in systems biology, see for example (Wahde and Hertz, 2000, Ideker et al., 2001, Friedman et al., 2000).
The information in this section is collected from (Ljung and Glad, 2004), if not otherwise stated.

2.1.1 Systems

A broad definition of a system is a group of objects, or possibly a single object, that we wish to study. Another one, not including information about our interest in the system, is that a system is a group of elements interacting and bringing about cause-and-effect relationships within the system (Close et al., 2002).

A system can be studied by performing experiments. The observable signals are usually referred to as outputs. Signals that in turn affect the system are called inputs. Sometimes it is not possible to perform all the experiments on the system one would like, due to safety, financial, time-consumption, or technical reasons. A model of the system is then needed if it is to be examined.

2.1.2 Models

A model is a tool, describing the system of interest, that enables us to answer questions about the system without performing experiments. The model is a description of how the elements in the system relate to each other (Ljung, 1987). How detailed a model is depends on at what level we wish to examine the system, and which level we choose depends on the purpose of the model.

The interest in modelling is not always focused on the details of the dynamics within the system. The relation between the input and output signals can instead be the main focus, with less attention put on the interpretation of the detailed internal dynamics. A model with this purpose is called a black box model. The parameters in the model are estimated with the only purpose of connecting the output and input signals, and are not associated with physical properties (Ljung, 1987). Sets of standard models whose parameters can be related to physical properties also exist. These mixed models are referred to as grey box models.

It is important to remember that a model is simply a model.
An exhaustive description of the behaviour of a system is not possible. In addition, observing the system always results in different kinds of noise, stochastic in nature, bringing about unpredictable variations in the observations.

In engineering applications, the most extensively used models are of a mathematical nature. A mathematical model defines, through distinct equations, how the elements of a system relate to each other. A mathematical model can be time discrete or continuous, deterministic or stochastic, linear or nonlinear, and lumped or distributed. Which attributes to assign to a model depends on the system and what information we can retrieve from it. A simple, time continuous, lumped model of a biological system is given in the example below.

Example 1: The Lotka-Volterra equations

\frac{d}{dt} N_1(t) = a N_1(t) - b N_1(t) N_2(t)

\frac{d}{dt} N_2(t) = -c N_2(t) + d N_1(t) N_2(t)

The equations describe the dynamics between the populations of prey, N1(t), and predator, N2(t), respectively. The variables can represent population density or biomass. Even though this model is simple, it is highly nonlinear and cannot be solved analytically. To form the equations, several assumptions have been made regarding growth, death, and predation. The assumptions, analysis of the system, and extensions of the model can be found in (Edelstein-Keshet, 1988). The equations originally come from a classical article by Volterra.

2.1.3 Simulation

A model can be used to deduce the behaviour of a system under certain conditions, corresponding to experimental tests. The deduction of results can be performed by analytical computation or by performing numerical calculations using the model. The latter is what we call simulation. If we think of a model as a set of instructions, for example equations in the case of mathematical models, a simulator obeys these instructions and generates the behaviour (Zeigler et al., 2000).
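As a concrete illustration of simulation in the sense above, the Lotka-Volterra model from Example 1 can be integrated numerically. The sketch below uses a fixed-step fourth-order Runge-Kutta scheme; the parameter values and initial populations are illustrative choices, not taken from any reference.

```python
# Numerical simulation of the Lotka-Volterra equations with classical RK4.
# Parameters a, b, c, d and the initial state are illustrative only.

def lotka_volterra(n1, n2, a=1.0, b=0.5, c=1.0, d=0.2):
    """Right-hand side: prey growth/predation, predator death/growth."""
    return a * n1 - b * n1 * n2, -c * n2 + d * n1 * n2

def rk4_step(f, state, dt):
    n1, n2 = state
    k1 = f(n1, n2)
    k2 = f(n1 + dt/2 * k1[0], n2 + dt/2 * k1[1])
    k3 = f(n1 + dt/2 * k2[0], n2 + dt/2 * k2[1])
    k4 = f(n1 + dt * k3[0], n2 + dt * k3[1])
    return (n1 + dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            n2 + dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

state = (10.0, 5.0)  # initial prey and predator levels
for _ in range(5000):
    state = rk4_step(lotka_volterra, state, 0.01)
print(state)  # both populations remain positive and keep oscillating
```

The trajectory is a closed orbit around the equilibrium (c/d, a/b), which is exactly the kind of behaviour that is deduced here numerically rather than analytically.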
2.2 System Identification

System identification is about constructing a model of a system from observed data. The information in this section is a summary of the introductory chapters of (Ljung, 1987).

2.2.1 The Identification Process

The system identification process can be divided into three phases: recording data, obtaining a set of candidate models, and choosing the best of the candidate models.

Recording data: Input and output signals are monitored and recorded during experiments. The objective is to generate data carrying as much information as possible about the system. Special identification measurements can be made, or data is gathered during normal operation of the system.

Obtaining a set of candidate models: This step is the most crucial and difficult one in the identification procedure. A set of models has to be chosen on the basis of the current knowledge of the system. One can choose from sets of basic models, or a model can be developed using physical characteristics of the system. A model from a set of standard models is a black box or a grey box model.

Choosing the best of the candidate models according to data: In this phase, the actual identification occurs. Parameters in the chosen model are estimated from the data.

2.2.2 Validation

Assume we have obtained a model of a system from our recorded data. Can we trust the model? We have to examine whether the model is useful for the purpose we intended it for. This is what is known as validation. A model is generally considered to be useful if it can reproduce experimental data; we compare the behaviour of the model with the system behaviour (Ljung and Glad, 2004). It is important to remember that a model is only valid within its validated area. It is never advisable to extrapolate information from a model.

What if we are not satisfied with the result that our model produced? Then we need to go back to our identification process and change one or more of the criteria.
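The estimation phase can be illustrated with a toy sketch (in Python, not the MATLAB/SITB workflow used later in the thesis): data is generated from a known first-order ARX model y(t) = a·y(t-1) + b·u(t-1), and the parameters are then re-estimated by linear least squares. All model orders and parameter values here are invented for illustration.

```python
# Toy illustration of the estimation phase: fit a first-order ARX model
# y(t) = a*y(t-1) + b*u(t-1) by least squares. Noise-free, so the true
# parameters are recovered exactly (up to rounding).
import random

random.seed(0)
a_true, b_true = 0.8, 0.5
u = [random.choice([-1.0, 1.0]) for _ in range(200)]  # PRBS-like input
y = [0.0]
for t in range(1, 200):
    y.append(a_true * y[t-1] + b_true * u[t-1])

# Normal equations for theta = (a, b) with regressors (y(t-1), u(t-1)):
s_yy = sum(y[t-1] * y[t-1] for t in range(1, 200))
s_yu = sum(y[t-1] * u[t-1] for t in range(1, 200))
s_uu = sum(u[t-1] * u[t-1] for t in range(1, 200))
r_y = sum(y[t] * y[t-1] for t in range(1, 200))
r_u = sum(y[t] * u[t-1] for t in range(1, 200))
det = s_yy * s_uu - s_yu * s_yu
a_hat = (r_y * s_uu - r_u * s_yu) / det
b_hat = (r_u * s_yy - r_y * s_yu) / det
print(round(a_hat, 3), round(b_hat, 3))  # -> 0.8 0.5
```

With noisy measurements the estimates would scatter around the true values, which is exactly why validation against fresh data, as discussed above, is needed.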
Going through the process again will produce a different, and perhaps better, result.

2.3 Biological Network Structures

Different types of biological networks exist. In this thesis, the emphasis is on the molecular level: intra-cellular networks and components. An account of different types of intra-cellular networks is given in this section, in order for the reader to understand the kinds of networks that the identification methods in the thesis may be applied to.

An arbitrary network within a cell is a biochemical network, since it consists of entities in a biological unit, the cell, and they interact through some kind of chemical reaction. The grouping into separate kinds of networks differs between various articles and textbooks. Biochemical networks are sometimes not considered to include the genetic networks of the cell. However, the networks of the cell are divided into metabolic, gene regulatory, and signalling networks in this thesis.

2.3.1 Metabolic Networks

The majority of biochemical reactions are degradative or synthetic (Zubay et al., 1995). The synthetic reactions, also called anabolic reactions, are the basis for the formation of large biomolecules from simpler units. An example is the assembly of proteins from amino acids. The catabolic reactions are degradative and involve the breakdown of larger, complex organic molecules into smaller components (Zubay et al., 1995). The β-oxidation of fatty acids is a catabolic process. The catabolic and anabolic processes together comprise the metabolism of the cell.

A metabolic pathway is a complex series of reactions (Zubay et al., 1995), and the word series emphasises the directional property of the pathways; there is a net flow of mass in a specific direction or a specific end purpose of the pathway, for instance one, or several, products. A pathway can be considered as a network with a relatively small number of interconnections.
When different pathways share components, for example second messengers, networks of higher order and complexity occur. In describing a metabolic network, one usually distinguishes biochemical connections from controlling connections. Controlling links represent that two nodes are connected without any mass transfer: instead, one species influences the production or consumption rate of the other species. The controlling links are not always represented in traditional pathway descriptions, although they are important; they are part of the feedback and feedforward loops that provide the stability and robustness of the pathway.

2.3.2 Gene Regulatory Networks

The central dogma of molecular biology states that DNA is replicated, transcribed into RNA, and translated into proteins. DNA interacts with a vast set of molecules in the cell, from complex proteins to simple transcription factors. The RNA molecules usually make up the nodes of a gene regulatory network. Mass flow is not as frequent in these networks as in a metabolic network; instead, controlling mechanisms dominate.

2.3.3 Signalling Networks

Cells receive, process, and respond to information in a process called signal transduction (Lodish et al., 2000). The signals are mediated by signalling pathways, and since components often interact between the pathways, they form networks (Bhalla and Iyengar, 1999). Mechanisms of information transfer might be protein-protein interactions, enzyme activity regulations, or phosphorylation. The last form of transfer is exemplified below using the highly conserved kinase cascade in which Mitogen Activated Protein Kinase (MAPK) is activated (Lodish et al., 2000).

Example 2: MAPK Cascade

The cascade is built up of several levels where a kinase in the upstream level phosphorylates a kinase in the level downstream. A MAPK cascade with all controlling elements included is given in Figure 2.1.
Figure 2.1. A MAPK cascade. The cascade is activated by Ras/MKKKK and comprises the levels MKKK, MKK, and MAPK with their singly (-P) and doubly (-PP) phosphorylated forms and reaction rates v1 through v12; activating, inhibiting, and intramodular interactions are marked.

2.4 Identification of Biological Networks

System structure identification corresponds to the first of the four parts of understanding a biological system described in Section 1.2.1. In this thesis, the focus is on network structure identification. Network structure identification is not as broad a concept as system structure identification and excludes, for example, the identification of structural connections among cells as well as cell-cell associations (Kitano, 2001).

2.4.1 Identification Approaches

The identification of a network includes finding all the components, their function, and how they interact. This is a difficult task, since this kind of information cannot be inferred from experimental data based on some general rules or principles. Biological systems are stochastic in nature and not necessarily optimal (Kitano, 2001). In addition, several network realisations might produce similar experimental data, and the identification involves singling out the correct one. This corresponds to the process of finding the best model out of a set of candidate models as described in Section 2.2.1.

There are two general approaches to network structure identification: bottom-up and top-down identification.

Bottom-up: Different sources of data, for example literature and experiments, are integrated in order to understand the network of interest. This data-driven approach is mostly suitable when almost all pieces of the network are already known and the quest is to find the missing parts.

Top-down: A more hypothesis-driven approach where high-throughput data is utilised in trying to determine the network structure. Some information about the network is usually needed beforehand, but not as extensive as in the bottom-up approach.
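The top-down idea of perturbing parameters and reading off which steady states shift, which resembles the perturbation-based methods studied later in this thesis, can be sketched as follows. The network here is a hypothetical three-step cascade x1 → x2 → x3 with linear kinetics, not one of the thesis's evaluation networks, and all rate constants are invented.

```python
# Hedged sketch: perturb one production parameter at a time and record the
# sign of the resulting steady-state shift in each species. The zero pattern
# reveals the downstream direction of a hypothetical cascade x1 -> x2 -> x3.

def steady_state(p1=0.0, p2=0.0, p3=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration until the cascade has settled."""
    x1 = x2 = x3 = 0.0
    for _ in range(steps):
        x1 += dt * (1.0 + p1 - x1)   # basal production of x1, linear decay
        x2 += dt * (x1 + p2 - x2)    # x2 is produced from x1
        x3 += dt * (x2 + p3 - x3)    # x3 is produced from x2
    return (x1, x2, x3)

base = steady_state()
for name, kw in [("p1", {"p1": 0.1}), ("p2", {"p2": 0.1}), ("p3", {"p3": 0.1})]:
    shifted = steady_state(**kw)
    signs = ["+" if s - b > 1e-3 else "0" for s, b in zip(shifted, base)]
    print(name, signs)
```

Perturbing p1 shifts all three species, p2 shifts only x2 and x3, and p3 only x3; the pattern of zeros is what carries the structural information. Note that this uses only steady-state responses, illustrating why such data is less informative about the dynamics than time series data.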
2.5 Chemical Kinetics

Chemical kinetics is the study of the rates of chemical reactions under consideration of all the intermediates in the process (Atkins and Jones, 1999). The area also examines the details of how reactions advance and what determines the rate of a chemical reaction.

2.5.1 Rate Laws and Reaction Mechanisms

Dinitrogen pentoxide, N2O5, is an inorganic compound present in solid form at room temperature. The chemical structure of the substance is depicted in Figure 2.2, with the covalent bonds linking the different atoms in the molecule. More information on the substance can, for example, be found in (Linstrom and Mallard, 2003). At a temperature of 67 °C, the compound is in gaseous form and decomposes into nitrogen oxide and oxygen according to

2 N2O5(g) → 4 NO(g) + 3 O2(g)    (2.1)
   x1          x2         x3

[Figure omitted: structural formula of the N2O5 molecule.]
Figure 2.2. Chemical structure of dinitrogen pentoxide.

The variables under each substance denote the amount of the substance in question. A plot of data for the reaction is shown in Figure 2.3.

[Figure omitted: amount of dinitrogen pentoxide (M) plotted against time (min).]
Figure 2.3. Decomposition data for dinitrogen pentoxide.

From this graph, one can observe that the amount of N2O5 is consumed quickly at first, but that the consumption rate decreases gradually. If we instead plot the logarithm of the data against time, we get the straight line shown in Figure 2.4. This plot confirms that the reaction is of first order, which means that the amount of dinitrogen pentoxide is consumed at a rate directly proportional to its amount.

[Figure omitted: logarithm of the amount of dinitrogen pentoxide (M) plotted against time (min), a straight line.]
Figure 2.4. Logarithmic plot of decomposition data.

A mathematical model of the reaction is made with differential equations. The
amount of each compound is, as before, denoted xi and the rate is denoted r.

dx1/dt = −2r = −2k x1    (2.2)
dx2/dt =  4r =  4k x1    (2.3)
dx3/dt =  3r =  3k x1    (2.4)

The rate constant for the decomposition step of equation (2.2) is defined as the proportionality constant (excluding the sign), and is hence 2k. It can be deduced from the experiments as the slope of the straight line in Figure 2.4. Observe that the formation parts of the reaction, equations (2.3) and (2.4), have other rate constants, since the stoichiometric ratios between the species differ.

The order of a reaction cannot in general be determined from the reaction formula; it is a property determined by experiments. The order varies from one chemical reaction to another, and fractional orders also exist (Atkins and Jones, 1999). If two species react to form a third species, the overall order is defined as the sum of the orders for each reactant. For the example below, the order is a + b.

A + B → C   gives   dxA/dt = −k (xA)^a (xB)^b

The difficulty of extracting the rate law for a reaction directly from its equation owes to the fact that all but the simplest reactions occur in several steps called elementary reactions (Atkins and Jones, 1999). All steps might not be given in a reaction formula. To understand how the reaction proceeds, a mechanism needs to be proposed. Assume that we have a total reaction A + 2B → C + D. A possible reaction mechanism is given below, where X and Y represent intermediates.

A + B → X        r1 = k1 xA xB
X → C + Y        r2 = k2 xX
Y + B → D        r3 = k3 xB xY
------------------------------
A + 2B → C + D

The deduction of a rate law (and the reaction order) from a mechanism can be done by employing different methods, all of which include some kind of approximation of, or assumption concerning, the dynamics of the mechanism. The different methods can sometimes give the same rate law for a given mechanism. One method, called the steady state approximation, is employed in Section 2.5.3.
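The claim that the decomposition rate constant 2k can be read off as the slope of the logarithmic plot can be checked in a few lines. The following sketch uses the closed-form solution of equation (2.2); the rate constant and initial amount are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical rate constant and initial amount (illustrative values only).
k = 0.35          # per minute
x1_0 = 1.0        # initial amount of N2O5 (M)

# Closed-form solution of dx1/dt = -2*k*x1, sampled at a few time points.
times = [0.5 * i for i in range(11)]              # 0 .. 5 min
x1 = [x1_0 * math.exp(-2 * k * t) for t in times]

# The logarithm of the data against time is a straight line whose slope
# is the decomposition rate constant 2k (with a minus sign).
logs = [math.log(x) for x in x1]
slope = (logs[-1] - logs[0]) / (times[-1] - times[0])

print(round(-slope, 6))   # recovers 2k = 0.7
```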
2.5.2 Equilibrium and Steady State

The simple reaction given by formula (2.1) is an irreversible reaction, meaning that it only proceeds in one direction. This is a simplification in the modelling of the dynamics; in fact, all reactions also have a reverse course of events compared to the forward reaction. For the dinitrogen pentoxide reaction this means that some amount of the product always decomposes back into reactants. Reactions modelled as irreversible have a reverse reaction rate small enough to be neglected in the modelling of the process. Reactions not possessing this property have forward and reverse reaction rates of the same magnitude; the reaction is reversible. This is depicted as

A + B ⇌ C + D

with forward rate constant k1 and reverse rate constant k−1. If the reverse and forward reaction rates are equal, the reaction has reached chemical equilibrium. A chemical equilibrium is characterised by a minimum of the free energy for the reaction (Atkins and Jones, 1999); there is no inclination for change in either direction of the reaction. The equilibrium constant is defined as

K = (xC xD) / (xA xB)

If the reverse and forward reactions both are of simple second order, the equilibrium will correspond to k1 xA xB = k−1 xC xD, and K = k1/k−1. If K is large, a lot of product is produced before equilibrium is reached, since the rate constant for the forward reaction then is larger than the reverse rate constant (Atkins and Jones, 1999).

In cells, compounds can exist in concentrations far from their chemical equilibrium states (Zubay et al., 1995). These states are connected to a larger free energy, and the concentrations are not governed solely by the external environment, as they are in chemical equilibria. The rates in reaction sequences vary according to the cell's requirements. The concentrations of key metabolites are held at constant levels by balancing the rates of production and consumption of reaction intermediates (Zubay et al., 1995).
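The relation K = k1/k−1 can be illustrated by integrating a reversible second-order reaction until the net rate vanishes. This is a crude forward-Euler sketch; the rate constants and initial amounts are hypothetical illustrative values.

```python
# Forward Euler integration of the reversible second-order reaction
# A + B <-> C + D; rate constants are hypothetical illustrative values.
k1, km1 = 2.0, 0.5          # forward and reverse rate constants
xA = xB = 1.0               # initial amounts
xC = xD = 0.0
dt = 1e-4                   # integration step

for _ in range(200000):     # integrate to t = 20, well past equilibration
    r = k1 * xA * xB - km1 * xC * xD   # net forward rate
    xA -= r * dt
    xB -= r * dt
    xC += r * dt
    xD += r * dt

# At equilibrium the net rate vanishes, so xC*xD/(xA*xB) = k1/k-1 = 4.
K = (xC * xD) / (xA * xB)
print(round(K, 3))   # 4.0
```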
A reaction with an intermediate species, X, is depicted below. The concentration of the intermediate is held at a constant level. Mathematically, this corresponds to a derivative of zero for the amount of the component: dxX/dt = 0. This is not a chemical equilibrium situation; the intermediate resides in what is called a steady state.

A + B ⇌ X ⇌ C + D

2.5.3 Enzyme Kinetics

Enzymes are proteins acting as catalysts to chemical reactions in cells. Enzymes have an active site where the reaction takes place. The molecule which the enzyme acts upon is called the substrate. Enzymes work by decreasing the activation free energy, i.e., the energy needed for the substrate(s) to enter into a state of transformation, called the transition state (Zubay et al., 1995).

Enzyme-catalysed reactions have a feature that distinguishes them from simpler chemical reactions; they show saturation (Cornish-Bowden and Wharton, 1988). Almost all reactions of this type are of first order for small substrate concentrations, but the apparent order decreases with increased substrate concentration. The rate eventually becomes constant, independent of the concentration of substrate. The behaviour can be observed if the rate of a reaction can be deduced from a set of experiments with different substrate concentrations. For each substrate concentration, a small amount of the catalysing enzyme is added, and the amount of formed product is monitored during a time span. The monitoring can be done with light spectrometry, provided that the product absorbs light. The amount of product is plotted against time, and the slope of the curve at the start of the experiment corresponds to the rate. An artificial curve of the typical behaviour is given in Figure 2.5. The constants Km and V in the figure are parameters in the rate law for the reaction, which is deduced from the following reaction mechanism, where E represents the

Figure 2.5.
Typical dependence of reaction rate on substrate concentration for a reaction following Michaelis-Menten dynamics (see below). Km is the substrate concentration giving a rate of 0.5 V.

enzyme, S the substrate, and P the product.

E + S ⇌ ES → E + P

Here the first step has forward rate constant k1 and reverse rate constant k−1, and the second step has rate constant k2. The amounts of the species are Etot − xES (free enzyme), xS (substrate), xES (enzyme-substrate complex), and xP (product). The amount of free substrate is considered to be much larger than the amount bound in the enzyme-substrate complex, ES, and it is hence assumed that the free amount of substrate and the total amount are equal (Cornish-Bowden and Wharton, 1988). The conversion of ES to free product and free enzyme, E + P, is considered to be irreversible if one only measures the initial rate of the reaction in the steady state (the regeneration of ES is negligible since the amount of product is small) (Zubay et al., 1995). The steady state assumption for this mechanism is that the amount of the intermediate species does not change:

dxES/dt = k1 (Etot − xES) xS − k−1 xES − k2 xES = 0

Solving the algebraic expression for xES gives

xES = k1 Etot xS / (k−1 + k2 + k1 xS)

and since the dissociation of the enzyme-substrate complex is of first order with respect to the intermediate, we have a rate of the form

v = k1 k2 Etot xS / (k−1 + k2 + k1 xS) = V xS / (Km + xS)    (2.5)

where Km = (k−1 + k2)/k1 and V = k2 Etot. This equation is known as the Michaelis-Menten equation and Km consequently as the Michaelis constant, although it was Briggs and Haldane who in 1925 proposed the mechanism (Cornish-Bowden and Wharton, 1988; Zubay et al., 1995). Michaelis and Menten in fact assumed that the first step was an equilibrium, which is a less general assumption than the one made above. The equation can be extended to describe a mechanism with several substrate molecules.

Substances called activators and inhibitors affect enzyme activity and cause the catalysed reaction to proceed faster or slower, respectively.
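Equation (2.5) can be written as a small function to check its two characteristic properties: half-maximal rate at xS = Km and saturation towards V for large xS. The parameter values below are hypothetical, chosen only for illustration.

```python
# Michaelis-Menten rate law v = V*xS / (Km + xS), equation (2.5).
# Parameter values are hypothetical illustrative choices.
V, Km = 10.0, 2.0

def mm_rate(xS):
    """Initial rate of the enzyme-catalysed reaction at substrate amount xS."""
    return V * xS / (Km + xS)

# Two characteristic features of the rate law:
#  - at xS = Km the rate is half the limiting rate V,
#  - for large xS the rate saturates towards V (zero-order behaviour).
print(mm_rate(Km))              # 5.0
print(round(mm_rate(1e6), 3))   # 10.0
```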
The most important kinds of inhibition, although several others exist, are results of the inhibitor binding to the enzyme (Cornish-Bowden and Wharton, 1988). The simplest type of inhibition is linear, meaning that terms proportional to the inhibitor concentration appear in the denominator of the rate law (Cornish-Bowden and Wharton, 1988). If the enzyme-catalysed reaction followed Michaelis-Menten dynamics in the absence of inhibitor, it will still do so if inhibitor is present, with modifications of the effective parameter values.

Competitive inhibition is the most common kind (Cornish-Bowden and Wharton, 1988) and occurs when the substrate, S, and the inhibitor, I, compete for the free enzyme. If the inhibitor binds, the result is an inactive complex, EI, that does not lead to products. The Michaelis constant will have an altered effective value, while the limiting rate, V, will have the same effective value as before. The interpretation of this is that the enzyme-substrate complex is as reactive as before, but the effective affinity of the substrate for the enzyme is decreased (Cornish-Bowden and Wharton, 1988).

Uncompetitive inhibition, on the other hand, is a result of the inhibitor binding to the enzyme-substrate complex and producing an inactive complex, ESI. A pure form of this kind of inhibition is uncommon, and it is most important as product inhibition (Cornish-Bowden and Wharton, 1988). The effective limiting rate and Michaelis constant are both affected, but their ratio is not.

If an inhibitor binds both to the free enzyme and to the enzyme-substrate complex, the inhibition is called mixed. In some books, the term non-competitive inhibition occurs and is regarded as an additional form of inhibition. In this case, the inhibitor affects the enzyme without affecting binding of the substrate, and the inhibition is independent of substrate concentration (Cornish-Bowden and Wharton, 1988; Zubay et al., 1995).
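The effective-parameter changes for competitive and uncompetitive inhibition described above follow the standard textbook forms, in which Km is scaled by 1 + xI/Ki for competitive inhibition while both V and Km are divided by that factor for uncompetitive inhibition. A minimal sketch, where Ki is an assumed inhibitor dissociation constant and all numerical values are hypothetical:

```python
# Effective Michaelis-Menten parameters under linear inhibition, with
# xI the inhibitor amount and Ki a hypothetical dissociation constant.
V, Km, Ki = 10.0, 2.0, 1.0

def competitive(xS, xI):
    # Competitive inhibition: the effective Km is scaled by (1 + xI/Ki),
    # while the limiting rate V is unchanged.
    return V * xS / (Km * (1 + xI / Ki) + xS)

def uncompetitive(xS, xI):
    # Uncompetitive inhibition: V and Km are both divided by the same
    # factor, so the ratio V/Km is unaffected.
    f = 1 + xI / Ki
    return (V / f) * xS / (Km / f + xS)

# Without inhibitor both reduce to the plain Michaelis-Menten rate.
print(competitive(Km, 0.0), uncompetitive(Km, 0.0))   # 5.0 5.0
```

At saturating substrate, the competitive rate still approaches V, matching the statement that the limiting rate keeps its effective value under competitive inhibition.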
According to (Cornish-Bowden and Wharton, 1988), this is not a plausible mechanism in nature, and there are no examples recorded, if effects of the pH are excluded.

Activators are effectors that bind to an enzyme and increase enzyme activity without being changed in the reaction (Cornish-Bowden, 1995). Specific activation occurs when the enzyme, in the absence of the activator, does not have any activity. In analogy with competitive inhibition, the effective value of the Michaelis constant is altered. An activation counterpart to mixed inhibition also exists. More complex expressions for the rate law of enzyme activation occur when the enzyme is less active, but not inactive, in the absence of activator (Cornish-Bowden, 1995). A simple model for activation is to insert the concentration of the activator into the numerator of the rate law. The model might be used when the underlying mechanism of activation is unknown.

Most enzymes follow Michaelis-Menten kinetics, but some do not. These enzymes show a sigmoidal dependence of the rate on the substrate concentration, instead of the hyperbolic dependence in Figure 2.5. Often, these enzymes have controlling, or regulatory, tasks in a biological network. The sigmoidal response is linked to cooperativity of the enzyme. An enzyme exhibiting cooperativity will be ultra-sensitive to changes in the substrate concentration, which is not the case for Michaelis-Menten kinetics (Cornish-Bowden, 1995). Positive cooperativity can, for example, occur if an enzyme binds several substrate molecules and the binding of subsequent substrates is facilitated by previous binding of substrate.

Chapter 3 Methods

Several identification methods were used in the thesis work, and they are described and analysed in this chapter. Emphasis is given to the mathematical basis for each method and to the data required to apply them to biochemical systems.
3.1 Methods Based on Local Parameter Perturbation Data

Local parameter perturbation data is based on steady state measurements. The amount of each interesting species does not change, which corresponds to a derivative equal to zero for the variables representing these amounts.

Consider a network consisting of n components (or n groups of components), each described by a state variable, xi(t), that (in some units) represents the amount of the component. The variables are gathered in a vector, x(t) = (x1(t), ..., xn(t)). The system is modelled by a set of differential equations where the rates of change of the variables depend, in addition to the variables xi, on a set of parameters, p = (p1, ..., pm).

ẋ1(t) = f1(x1(t), ..., xn(t), p1, ..., pm)
ẋ2(t) = f2(x1(t), ..., xn(t), p1, ..., pm)
   ...
ẋn(t) = fn(x1(t), ..., xn(t), p1, ..., pm)    (3.1)

A component fi is not explicitly dependent on all the parameters in p, but the exact dependence is not known. The steady state is given by ẋ(t) = f(ξ(p̄)) = 0, where ξ(p̄) is a vector of the steady state values of the state variables, associated with a specific set of parameter values p̄.

We assume that it is possible to perturb the parameters in p individually. A realisation of this could be the addition of specific inhibitors or activators that affect the catalysing enzymes in the network. Another possibility is to introduce a double-stranded RNA that interferes with a gene product, which can subsequently lead to a reduction of the amount of a protein; this mechanism, RNA interference, is often denoted RNAi. Each affected enzyme is responsible for catalysing a reaction dependent upon a parameter pj. Individual additions of activators, RNAi, or inhibitors are repeated for all parameters. The result will be m + 1 different steady state measurements of all state variables, including the reference steady state, ξ^ref.
All of the different measurements are associated with a specific set of parameters. A perturbation experiment involving an alteration of the parameter pj results in the following quotient, which approximates the sensitivities of the steady state values to changes in the parameter:

σij = ∆ξi/∆pj = (ξi − ξi^ref) / (pj − pj^ref),   i = 1, ..., n    (3.2)

The calculations are repeated for all parameters and the result is gathered in a matrix Σ = (σij). This matrix has the dimensions (n × m) and it is the approximation of ∂ξi/∂pj for i = 1, ..., n and j = 1, ..., m. If the exact changes of the parameter values are unknown, the sensitivities can be approximated in a different manner, explained in (Kholodenko and Sontag, 2002).

3.1.1 Interaction Graph Determination

An account of the top-down identification method from (Kholodenko and Sontag, 2002) is given in this section. Assuming that the components in the network are known, the method aims at determining the interaction graph of the system. The interaction graph, also named connection graph, is a description of how each node in the network qualitatively affects the other nodes. The interaction graph is a basis for further investigation of the network and its dynamic behaviour. A simple interaction graph for a network of three components is shown in Figure 3.1.

Figure 3.1. A simple interaction graph.

How the different nodes in the network affect each other is displayed with the arrows: the net effect is negative or positive with respect to the amount of the component(s) represented by each node. Exactly how the network is built is not always trivial to extract from the interaction graph. The graph only describes the functional dependencies of the total rate of change of each variable with respect to other variables in the system.
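The construction of the Σ-matrix in equation (3.2) can be sketched on a toy system whose steady state is known in closed form. The two-node model and parameter values below are hypothetical, chosen only to illustrate the finite-difference approximation.

```python
# Finite-difference approximation of the sensitivity matrix Sigma,
# equation (3.2), for a hypothetical two-node network with a closed-form
# steady state: x1' = p1 - p2*x1, x2' = p2*x1 - p3*x2.

def steady_state(p):
    p1, p2, p3 = p
    x1 = p1 / p2
    x2 = p2 * x1 / p3
    return [x1, x2]

p_ref = [1.0, 2.0, 4.0]          # reference parameter values (illustrative)
xi_ref = steady_state(p_ref)     # reference steady state

delta = 1e-6
sigma = []                       # stored per parameter: sigma[j][i]
for j in range(len(p_ref)):
    p = list(p_ref)
    p[j] += delta                # perturb one parameter at a time
    xi = steady_state(p)
    sigma.append([(xi[i] - xi_ref[i]) / delta for i in range(len(xi))])

# sigma[j][i] approximates the derivative of steady state i with respect
# to parameter j; e.g. d(x1)/d(p1) = 1/p2 = 0.5 for these values.
print(round(sigma[0][0], 3))   # 0.5
```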
The deduction of the network wiring is almost always impossible if the nodes of the network are connected by mass flow, since several network structures can correspond to the same graph. The impossibility of distinguishing between network structures is shown in the example below. The method also allows a modular approach, where several components can be represented by each node. The same method, with minor variations, is described in (Kholodenko et al., 2002), where it is applied to a gene regulatory network.

The interaction graph can be deduced from the Jacobian, f_x = (fik), of the general system (3.1) of differential equations, where

fik = ∂fi(x̄, p̄)/∂xk

A connection from node i to node k in the interaction graph corresponds to a non-zero element in position (k, i) of the Jacobian.

Example 1 Network uniqueness

Two network architectures are given in Figure 3.2. The two networks have some mass flow between their nodes. Both systems have the following qualitative Jacobian, with rows and columns ordered A, B, C:

     A  B  C
A    −  0  +
B    +  −  −
C    0  0  −

The deduction of the network architecture from this Jacobian will not lead to a unique network.

Figure 3.2. Two network architectures.

The intention of finding the Jacobian of the system from steady state data cannot be entirely realised, since a multiplication of each row in the Jacobian by a constant also is a solution to ẋ = 0. Hence, the rows can only be determined up to a scalar multiple. This is still enough to determine the interaction graph, since we are only interested in whether an element of the Jacobian is non-zero or not.

A few assumptions are needed to make the method legitimate. There must, for each node, represented by a variable xi, exist other nodes that are not directly connected to the current node i. This means that there is a set of parameters for which ∂fi/∂pj = 0 for each i ∈ {1, ..., n}. The required set of parameters has to, for each xi, correspond to n − 1 independent columns from the Σ-matrix.
For each node i, perturbations must be made to n − 1 other nodes, and the parameters that are perturbed cannot be part of any of the rates that lead to or from the node i. It is not necessary to perform n · (n − 1) perturbations, since each perturbation can be used for more than one node. To use the method, some information about how the nodes are connected must be known beforehand. The information needed is, for example, that a subset of the nodes is known to be unconnected to the remaining set of nodes. If the network connections are only of a controlling nature, the assumptions are easy to fulfill.

For one state, i.e., one row in the Jacobian, the following is valid:

∂/∂pj fi(x̄, p̄) = (∂fi/∂x)(x̄, p̄) (∂x/∂pj)(x̄) + (∂fi/∂pj)(x̄, p̄)

The assumptions above and the fact that the system is in steady state produce the following orthogonality criterion:

(∂fi/∂x)(x̄, p̄) (∂x/∂pj)(x̄) = 0    (3.3)

for each pj that fulfills the first assumption for the particular fi. This information is all we need to reproduce the rows of the Jacobian up to scalar multiples. The assumptions guarantee that the orthogonality criteria correspond to a system of linear equations from which we can produce the values of each row. Also, by fixing the diagonal elements in the Jacobian to −1, the linear equation system is reduced and can be solved using least-squares methods.

3.1.2 Determination of Control Loops in Mass Flow Networks

A problem with the interaction graph identification method in Section 3.1.1 is that the interaction graph can correspond to several non-separable network architectures. It is not possible to separate mass flow connections from control loops, and the interaction graph tends to become complex when mass flow is present. The method of this section aims to identify the control loops in a network where the mass flow is known beforehand.
The applicability of the method is of course limited by the assumption that the mass flow is known, but the situation is not at all unlikely. Situations may exist where the "back-bone" of the network is known from experiments, but nothing is known of the internal control mechanisms.

The set of differential equations from the system (3.1) can be represented in the following manner, where the fi components are replaced by the rates. The time dependence has been left out, as well as the dependence of the rates on the state variables, to simplify the notation:

ẋ1 = m1,1 r1(p1) + m1,2 r2(p2) + ... + m1,q rq(pq)
ẋ2 = m2,1 r1(p1) + m2,2 r2(p2) + ... + m2,q rq(pq)
   ...
ẋn = mn,1 r1(p1) + mn,2 r2(p2) + ... + mn,q rq(pq)

The kinetics of the rates are not known, but the mass flow in the network is reflected by the knowledge of the mi,l elements. If the stoichiometric ratios of the network are one-to-one for all species, the matrix M = (mi,l) only contains 0, 1, or −1 in appropriate places. Row i of the M-matrix has zeros for the rates not leading to or from the node represented by variable i. The differential equations are summarised as

ẋ = M r(p)    (3.4)

Examining a specific row i, denoted M(i) r(p), in the relation above and differentiating it with respect to the parameter pj results in:

∂/∂pj (M(i) r(p)) = ∂/∂pj (mi,1 r1 + ... + mi,q rq)
  = mi,1 (∇r1 ∂x/∂pj + ∂r1/∂pj) + ... + mi,q (∇rq ∂x/∂pj + ∂rq/∂pj)    (3.5)

The differentiation of the arbitrary row in expression (3.5) can be represented in a more compact form, where r_x = (∇r1 ∇r2 ... ∇rq)^T quantifies how the rates depend on the state variables:

M(i) r_x ∂x/∂pj + M(i) (∂r1/∂pj, ..., ∂rq/∂pj)^T    (3.6)

The parameter pj in equation (3.6) can be chosen to produce a cancellation of the last term of the equation. If the rates that lead to or from the state represented by the row i are independent of the parameter, we have cancellation.
Selecting a parameter with these properties will produce the following criterion, similar to the orthogonality criterion of equation (3.3):

M(i) r_x ∂x/∂pj = 0    (3.7)

The vector ∂x/∂pj is approximated by the corresponding column of the Σ-matrix defined earlier in this chapter. Since the matrix M is known, the criterion results in an equation in some of the entries of the r_x-matrix. Combining all different rows and ∂x/∂pj vectors as in equation (3.7) will produce an underdetermined equation system, assuming that each rate cannot be perturbed more than once. The assumption is reasonable, since we only perform small perturbations and the response should be similar whatever kind of perturbation we use on parameters in the same rate, i.e., additional perturbations will not add any independent equations. Further analysis of the method can be found in Chapter 4.

3.2 Methods Employing Time Series Data

The methods in the previous section are based on steady state data. As noted, steady state data is not as informative as time series data, and it is not possible to exactly determine the Jacobian of the system. Time series data is retrieved by recording the amounts of the involved species during a time span. The data is informative but hard to collect. Measurements in vitro of dynamic species are not an easy task, but can be achieved using elaborate methods including different kinds of spectrometry.

Sampling of the data is needed, for both continuous and discrete data records. The intention is for the data to pick up the dynamics of the system, and in order to do so, a lot of thought must underlie the choice of sampling interval. The sampling interval, T, is the time between samples. The angular frequency of the sampling is subsequently defined as ωs = 2π/T = 2πfs and is naturally called the (angular) sampling frequency.

A continuous signal is presumed to have a Fourier transform describing its frequency content.
Sufficient, but not necessary, conditions for the transform to exist are the Dirichlet conditions (Svärdström, 1999). Periodic functions as well as the unit step function do not fulfill the requirements in these conditions. For these signals, a transform is defined in the limit, allowing generalised functions like the Dirac pulse. The transform is defined as

F[x(t)] = X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt

where ω is the angular frequency.

Sampling of a continuous signal is modelled by a multiplication of the signal with a sequence of impulses, δT(t) = Σ_{n=−∞}^{∞} δ(t − nT), where δ represents the Dirac pulse. The sampling interval is the time between the pulses. The procedure of sampling is illustrated in Figure 3.3. The transform of the signal is repeated with a distance of ωs rad/s in the frequency domain. The signal is limited in the time domain, since we only can sample for a finite time span. A time-limited signal cannot be band-limited; the transform is infinite in the frequency domain (Svärdström, 1999).

Figure 3.3. Sampling of a continuous signal.

The unlimited transform causes problems during sampling. The frequencies above the Nyquist frequency, ωN = πfs = ωs/2 rad/s, are folded into the transform and are instead misinterpreted as lower frequencies. The effect is illustrated in Figure 3.4. Consequently, there are two problems in sampling: folding of frequencies above the Nyquist frequency and the necessity of limiting the signal in the process. Both effects distort the transform of the continuous signal.

Figure 3.4. Frequencies above ωN are folded into the angular frequency range [−ωN, ωN].

The folding effect is also known as aliasing. A way to avoid the aliasing effect is to use an anti-alias filter on the data before the sampling is made. An anti-alias filter is a regular low-pass filter with the cut-off frequency chosen a bit below the Nyquist frequency (Ljung and Glad, 2004).
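The folding effect can be demonstrated numerically: a sinusoid above the Nyquist frequency produces exactly the same samples as its folded-down alias. The sampling interval and frequencies below are arbitrary illustrative values.

```python
import math

# Sampling a sinusoid above the Nyquist frequency: with sampling
# interval T the angular sampling frequency is ws = 2*pi/T, and a
# frequency w > ws/2 is indistinguishable from its alias ws - w.
T = 0.1                         # sampling interval (s), illustrative
ws = 2 * math.pi / T            # angular sampling frequency, rad/s
w_true = 0.8 * ws               # above the Nyquist frequency ws/2
w_alias = ws - w_true           # folded-down apparent frequency, 0.2*ws

samples_true = [math.cos(w_true * n * T) for n in range(20)]
samples_alias = [math.cos(w_alias * n * T) for n in range(20)]

# The two sample sequences coincide (up to rounding error), so the high
# frequency is misinterpreted as the lower alias frequency.
print(max(abs(a - b) for a, b in zip(samples_true, samples_alias)))
```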
In this way, the frequency content above ωN is lost and does not disturb the remaining information through folding effects. In addition, if there is high-frequency noise in the data, it is also removed by the anti-alias filter (Ljung, 1987).

A rule of thumb is to choose the sampling frequency tenfold greater than the bandwidth we wish our model of the system to cover. The choice corresponds to 4-8 sample points on the flank of the step response of the system (Ljung and Glad, 2004). It might therefore be of interest to measure a step response from the system beforehand.

3.2.1 Linearising Around an Operating Point

Consider a non-linear network that resides in a steady state, which is the operating point. How the network is wired in this steady state is, as before, given by the Jacobian of the system at the operating point, and this is what we wish to determine. The system is described using the following setup, similar to the one given by equation (3.1):

ẋ1(t) = f1(x1(t), ..., xn(t), u1(t), ..., uk(t))
ẋ2(t) = f2(x1(t), ..., xn(t), u1(t), ..., uk(t))
   ...
ẋn(t) = fn(x1(t), ..., xn(t), u1(t), ..., uk(t))    (3.8)

Compared to equation (3.1), the dependence on the parameters is not explicitly stated in the representation, and the variables ui(t) are added, corresponding to input signals to the system. The variables are gathered in an input signal vector, u(t). The representation in equation (3.8) implies that there is a way to affect the system, in a controlled manner, by changing the input signals.

The steady state of the system corresponds to the specific values (x0, u0) of the state variables and the input signals. The system is observed in a vicinity of the steady state, where small deviations, (x(t) − x0, u(t) − u0), are represented
by (x̃(t), ũ(t)). The deviations are often called perturbations. Substituting the expressions of the deviations into the differential equation system of (3.8) and omitting the time dependence gives

d/dt (x0 + x̃) = f(x0 + x̃, u0 + ũ)    (3.9)

By expanding the left-hand side of the expression, noting that dx0/dt = 0 and du0/dt = 0, and expanding the right-hand side in a Taylor series around the steady state (x0, u0), we get

d/dt (x̃) = f(x0, u0) + (∂f/∂x)(x0, u0) x̃ + (∂f/∂u)(x0, u0) ũ + h.o.t.

The perturbations around the steady state are small, and the higher order terms can be neglected. We have a linear system of differential equations for the perturbations x̃, since f(x0, u0) = 0 by definition of a steady state:

x̃˙ = (∂f/∂x)(x0, u0) x̃ + (∂f/∂u)(x0, u0) ũ

The matrix ∂f/∂x is the Jacobian of the system in the steady state and describes the network wiring, as stated before. In equation (3.8), the variables ui(t) represent input signals. The approximate input-output modelling of the system is completed by defining a set of output signals, y(t), as linear combinations of the state variables as well as the input signals. The result is a linear state space model:

x̃˙(t) = A(θ) x̃(t) + B(θ) ũ(t)
y(t) = C(θ) x̃(t) + D(θ) ũ(t)    (3.10)

The time dependence is explicitly stated to point out that θ is time independent. θ is a vector of the parameters, determining the behaviour of the system, that were omitted in equation (3.8). The true set of parameters is what we wish to find.

Noise is inevitably added to the measurements during data collection. The state space model given by equation (3.10) does not contain any noise model. Adding a noise model is advisable, and a fairly simple model is to include additive noise sources.
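As an illustration of the state space setup, the following sketch simulates a small linear model of the form (3.10), discretised with forward Euler and driven by an input switching between two levels, with additive measurement noise. The matrices, input, and noise level are hypothetical illustrative choices, not taken from any system in the thesis.

```python
import random

# Euler-discretised simulation of a hypothetical two-state linear model
# x' = A x + B u, y = C x + v, around an operating point.
A = [[-1.0, 0.5],
     [0.0, -2.0]]
B = [1.0, 0.5]
C = [1.0, 0.0]
dt, sigma_v = 0.01, 0.01        # step size and measurement noise level

random.seed(0)
x = [0.0, 0.0]
ys = []
for step in range(1000):
    # Input alternating between two levels (a crude excitation signal).
    u = 1.0 if (step // 100) % 2 == 0 else -1.0
    dx = [sum(A[i][k] * x[k] for k in range(2)) + B[i] * u for i in range(2)]
    x = [x[i] + dt * dx[i] for i in range(2)]
    # Output with additive measurement noise.
    y = sum(C[k] * x[k] for k in range(2)) + random.gauss(0.0, sigma_v)
    ys.append(y)

print(len(ys))   # 1000 noisy output samples
```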
An extended model for the system, including noise sources w(t) and v(t) representing process noise and measurement noise respectively, is (Ljung, 1987):

x̃˙(t) = A(θ) x̃(t) + B(θ) ũ(t) + w(t)
y(t) = C(θ) x̃(t) + D(θ) ũ(t) + v(t)    (3.11)

The noise sources are modelled as sequences of independent random variables with zero mean. A so-called observer estimating the state variables is given by equation (3.12). To avoid cumbersome notation, the ˜-accent is dropped, meaning that x̂ in fact denotes the estimate of x̃. The parameter dependence is also omitted for the same reason.

x̂˙(t) = A x̂(t) + B u(t) + K (y(t) − C x̂(t) − D u(t))    (3.12)

The quantity y(t) − C x̂(t) − D u(t) is a measurement of how well the predictor x̂(t) estimates x(t). The observer that minimises the estimation error is called the Kalman filter, represented by K. The filter is calculated from the matrices of the state space description as well as the variance and covariance matrices of the noise sources. For a more thorough treatment, see for example (Ljung and Glad, 1997) or (Ljung, 1987).

The prediction error ν(t) = y(t) − C x̂(t) − D u(t), used as feedback quantity in the observer, is the new information in the measurement y(t) that is not available in the previous measurements. The signal ν(t) is called the innovation (Ljung and Glad, 1997; Ljung, 1987). The innovations form of the state space description of equation (3.11) is given by

x̂˙(t) = A(θ) x̂(t) + B(θ) u(t) + K(θ) ν(t)
y(t) = C(θ) x̂(t) + D(θ) u(t) + ν(t)    (3.13)

A set of transfer functions can be deduced for the linear differential equation system on the innovations form. The mathematical expressions for the transfer functions as well as a block diagram are given in equation (3.14) and Figure 3.5, respectively. The dependence on the parameter vector θ is omitted.

Y(s) = G(s) U(s) + H(s) V(s),  where  G(s) = C(sI − A)^(−1) B + D  and  H(s) = C(sI − A)^(−1) K + I    (3.14)

Figure 3.5. A block representation of the system.
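The transfer functions G(s) and H(s) of equation (3.14) can be evaluated numerically. For a one-state system, (sI − A)^(−1) reduces to 1/(s − a), which keeps the sketch short; all numerical values below are hypothetical.

```python
# Transfer functions of equation (3.14) for a scalar (one-state) system,
# where (sI - A)^(-1) reduces to 1/(s - a). Values are hypothetical.
a, b, c, k, d = -2.0, 1.0, 1.0, 0.5, 0.0

def G(s):
    # Transfer function from the input u to the output y.
    return c * b / (s - a) + d

def H(s):
    # Transfer function from the innovation v to the output y.
    return c * k / (s - a) + 1.0

# Static gains (s = 0): G(0) = -c*b/a = 0.5, H(0) = -c*k/a + 1 = 1.25.
print(G(0.0), H(0.0))   # 0.5 1.25
```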
If an identification process on the system given by equation (3.13) is to be successful, some conditions need to be fulfilled. Identifiability is central to the identification process and concerns whether the process can yield the true values of the parameters of the model and/or give a model equal to the true system (Ljung, 1987). Assuming that our sampled data from the system is informative enough to represent the system behaviour, the question is whether two parameterisations of the system can produce equal models, i.e., give non-separable output data. Another way to put the question is whether the model structure is invertible or not (Ljung, 1987). Invertibility is a necessary condition for identifiability, and a model M is identifiable at a true parametrisation θ* if

  M(θ) = M(θ*)  ⇒  θ = θ*,   θ ∈ D_M

where θ and θ* represent different parameterisations in the set of possible values, D_M, for the model (Ljung and Glad, 2004; Ljung, 1987). The condition above guarantees global identifiability. Local identifiability is defined in a similar way, but the invertibility is only required in a vicinity of θ*. The input signals, u(t), must be chosen to excite the system (Ljung and Glad, 2004), giving data that is informative enough for the identification, as stated above. The input signals must have variations as fast as the smallest time constants of the system. For the sampling process to pick up the frequencies of the oscillations, the input signals must contain these frequencies. Equally important is to sample fast enough not to lose these frequencies through folding or anti-alias filtering. An input signal alternating between two levels is a good choice for a linear system, like the one given by equation (3.10). If the variations are random, such a signal contains all frequencies (Ljung and Glad, 2004).
Since we have a linearised system, it is important to verify that the variations in the input signal are small enough for the system to respond in a linear fashion. Crucial to the validity of the linearisation is the closeness of the system to the steady state. A common signal in identification of linear systems is a Pseudo Random Binary Signal (PRBS) (Ljung and Glad, 2004). A PRBS is a signal that alternates between two levels in an apparently stochastic manner, although it is deterministic. Our question is now whether the state space model in equation (3.13) is identifiable, provided that the inputs excite the system in a satisfactory way. In our application, the matrix D(θ) is zero, meaning that the inputs do not directly affect the outputs. Theorem 4A.2 in (Ljung, 1987) states that the system, parameterised according to the observer canonical form, is globally and locally identifiable at θ* if and only if {A(θ*), [B(θ*) K(θ*)]} is controllable. The observer canonical form for a multiple input - multiple output (MIMO) system is characterised by a full parametrisation of the B(θ)-matrix, while the C(θ)-matrix is built up of zeros and ones. The A(θ)-matrix has the same number of fully parameterised rows as the number of output channels, while the remaining rows are filled with zeros and ones in a certain pattern (Ljung, 2001). The exact definition of the observer canonical form can be found in (Ljung, 1987); the following is an example with three state variables, two output variables and one input variable (a SIMO system), where × denotes a freely parameterised element:

  A(θ) = [ 0 1 0 ]     B(θ) = [ × ]
         [ × × × ]            [ × ]
         [ × × × ]            [ × ]

  C(θ) = [ 1 0 0 ]     K(θ) = [ × × ]
         [ 0 0 1 ]            [ × × ]
                              [ × × ]

To achieve a unique connection between the input and output signals, we have to limit the flexibility of the A(θ)-matrix by fixing certain elements, since we do not measure all the state variables.
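The PRBS mentioned above is typically generated with a linear feedback shift register. The sketch below implements the standard maximal-length PRBS-7 generator (feedback polynomial x⁷ + x⁶ + 1, period 127); mapping the bits to the levels ±1 is an illustrative choice, and in practice the signal would be scaled and offset around the operating point:

```python
def prbs7(length, seed=0x02):
    """Maximal-length PRBS-7 sequence (period 127) mapped to the levels -1/+1."""
    state = seed & 0x7F
    seq = []
    for _ in range(length):
        # feedback from taps 7 and 6 (polynomial x^7 + x^6 + 1)
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        seq.append(1.0 if newbit else -1.0)
    return seq

signal = prbs7(254)   # two full periods of the sequence
```

Over one full period the two levels are almost perfectly balanced (64 high, 63 low), which is what gives the signal its broad frequency content.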
The observer canonical form for the state space model is a sufficient parametrisation for some of the systems we intend to apply the method to, as stated in Chapter 4. We have now stated that the method of this section can be used to identify linearised systems around an operating point. The identification itself can, for example, be made with the System Identification Toolbox for Matlab (SITB), which is designed for identification of linear systems.

3.2.2 A Discrete Linearisation Method

A less general, discrete method based on the same kind of linearisation as the method in the previous section is briefly explained here. The method illustrates a simple version of the estimation procedure of the SITB and is included for a basic understanding of parameter estimation. Time series data is utilised, and measurements of all state variables must be available, i.e., Cd(θ) is the identity matrix. The matrix Dd(θ) is assumed to be zero. The subscript d emphasises that we have the discrete equivalents of the matrices from the continuous system of equation (3.10). A difference equation is the basis for a discrete modelling of the system. The system depends on a set of parameters, but the dependence is omitted in the description:

  x_{n+1} = Ad x_n + Bd u_n
  y_n = Cd x_n    (3.15)

An estimate of the matrix Ad can be used to deduce the Jacobian of the continuous system. The estimation is made with least squares methods, where the following matrices are needed:

  R = [ x_{n+1}  x_n  ...  x_1 ],    M = [ x_n  x_{n-1}  ...  x_0 ]
                                         [ u_n  u_{n-1}  ...  u_0 ]

By inserting the matrices into the system description (3.15) we get

  R = [ Ad  Bd ] M

To estimate Ad we multiply the expression from the right with the pseudoinverse of M. The resulting matrix is of dimensions (n × (n + k)), where n is the number of state variables and k the number of input signals. The first n columns correspond to Ad.
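A minimal sketch of this estimation step on a hypothetical two-state system (noise-free data, zero-order-hold discretisation assumed; all numeric values are illustrative), including the matrix-logarithm step that recovers the continuous Jacobian:

```python
import numpy as np
from scipy.linalg import expm, logm

T = 0.2                                        # sampling interval [s]
J = np.array([[-0.5, 0.2],                     # hypothetical true Jacobian
              [ 0.1, -0.8]])
Bc = np.array([[1.0], [0.0]])                  # continuous-time input matrix

Ad = expm(J * T)                               # exact discretisation of the dynamics
Bd = np.linalg.solve(J, Ad - np.eye(2)) @ Bc   # zero-order-hold input matrix

# simulate noise-free data with a random two-level input
rng = np.random.default_rng(1)
n_steps = 50
u = rng.choice([-1.0, 1.0], size=(1, n_steps))
x = np.zeros((2, n_steps + 1))
for n in range(n_steps):
    x[:, n + 1] = Ad @ x[:, n] + Bd[:, 0] * u[0, n]

# least squares estimate of [Ad Bd] via the pseudoinverse of M
R = x[:, 1:]                                   # [x_{n+1} ... x_1]
M = np.vstack([x[:, :-1], u])                  # [x_n ... x_0; u_n ... u_0]
AdBd = R @ np.linalg.pinv(M)
Ad_hat = AdBd[:, :2]

# recover the continuous Jacobian: Ad = exp(J*T)  =>  J = logm(Ad) / T
J_hat = np.real(logm(Ad_hat)) / T
```

With noise-free, informative data the estimate of Ad is exact up to numerical precision, and the matrix logarithm returns the true Jacobian.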
Taking the matrix logarithm of the relation Ad = e^{JΔT} and dividing by the sampling interval ΔT produces an estimate of the Jacobian J. The estimation in the SITB is more complex and includes noise models, which is not the case here. The absence of a noise model makes the method very sensitive to noise.

Chapter 4

Results

Results from testing and evaluation of the methods in the previous chapter are presented here. Further discussion on the applicability of the methods can be found in Chapter 5.

4.1 Results from Interaction Graph Determination

The identification method from Section 3.1.1 was evaluated on a simple, artificial network, depicted in Figure 4.1. It consists of six nodes that are connected by eight irreversible rates obeying Michaelis-Menten kinetics. The kinetic parameters governing the dynamics are given in Appendix A. The inhibition is modelled as purely competitive, while the activation is multiplicative. The software PathwayLab, developed by InNetics (InNetics AB, 2005), was used to implement the evaluation network. The dynamics of the network are described with a set of differential equations. The steady state, which is the operating point, is sustained through a constant flow of mass into the system through the rate r1. Systematic perturbations to each node in the system were simulated with alterations of the maximum rate, V, from the Michaelis-Menten expression in equation (2.5), for all rates. The maximum rate was perturbed for one rate at a time, and the steady state computed afterwards. The perturbation matrix Σ, made up of the approximated sensitivities in (3.2), was calculated from the set of steady state measurements.

Figure 4.1. The artificial mass flow network used for method evaluation.

The simulation of the system as well as the calculations were made with Mathematica (Wolfram Research, Inc., 2003).
Rows from the symbolic expression for the Jacobian, with the diagonal elements fixed to −1, were multiplied with selected columns from the perturbation matrix Σ in order to form the equation system for identification of the Jacobian. The symbolic expression for the Jacobian is

  [ −1    f1,2  f1,3  f1,4  f1,5  f1,6 ]
  [ f2,1  −1    f2,3  f2,4  f2,5  f2,6 ]
  [ f3,1  f3,2  −1    f3,4  f3,5  f3,6 ]
  [ f4,1  f4,2  f4,3  −1    f4,5  f4,6 ]
  [ f5,1  f5,2  f5,3  f5,4  −1    f5,6 ]
  [ f6,1  f6,2  f6,3  f6,4  f6,5  −1   ]    (4.1)

The columns of Σ that are multiplied with row i correspond to perturbations of V in the rates that are not connected to node i. For example, the first row of the Jacobian is multiplied with columns 3, ..., 8 of Σ, which produces six equations. The equation system contains 30 unknown variables, and the method supplies 35 equations. A solution for all the variables was calculated with least squares methods. By fixing the diagonal elements of the Jacobian, we use the fact that the Jacobian can only be estimated up to a scalar multiple, and we instead gain a more reliable solution to the equation system, i.e., a more reliable estimation. An increase of five per cent in the value of the perturbed parameters resulted in an estimated Jacobian where the entries corresponding to true interactions were close to their true values, while the remaining entries were small, with magnitudes ranging from about 10⁻¹⁴ up to 10⁻². A need to choose a cut-off limit, below which elements are considered to be zero, is obvious. A cut-off of 0.05 gave

  [ −1      0       0       0.8333  0      0  ]
  [ 0.1829  −1      −0.2679 −0.1524 0      0  ]
  [ 0       3.9956  −1      0       0      0  ]
  [ 0       0       0.1890  −1      0      0  ]
  [ 0       1.5539  0.7033  0       −1     0  ]
  [ 0       0       0       0       1.5496 −1 ]

We can compare this estimation with the true Jacobian for the network, which is calculated by differentiating the differential equations and inserting the steady state values.
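The row-wise least squares scheme can be illustrated on a small synthetic example: a hypothetical four-node network with six perturbations, where the sensitivity matrix is constructed so that each perturbation directly enters exactly one node's equation, mirroring the orthogonality criterion used above. All values are assumptions for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 4-node network: true Jacobian with the diagonal fixed to -1
J = np.array([[-1.0, 0.0, 0.5, 0.0],
              [ 0.8, -1.0, 0.0, 0.0],
              [ 0.0, 0.6, -1.0, 0.0],
              [ 0.0, 0.0, 0.7, -1.0]])

# perturbation k directly enters the equation of node touch[k] only
touch = [0, 1, 2, 3, 0, 1]
P = np.zeros((4, len(touch)))
for k, i in enumerate(touch):
    P[i, k] = rng.uniform(0.5, 1.5)

# synthetic steady-state sensitivities satisfying J @ Sigma = P
Sigma = np.linalg.solve(J, P)

# estimate each row of J from the perturbations that do not touch that node
J_est = -np.eye(4)
for i in range(4):
    cols = [k for k in range(len(touch)) if touch[k] != i]
    others = [j for j in range(4) if j != i]
    # with J[i,i] fixed to -1:  sum_{j != i} J[i,j] * Sigma[j,k] = Sigma[i,k]
    A = Sigma[np.ix_(others, cols)].T
    b = Sigma[i, cols]
    J_est[i, others] = np.linalg.lstsq(A, b, rcond=None)[0]
```

Because the synthetic data is noise-free and each row has more valid columns than unknowns, the least squares solution recovers the off-diagonal entries of J exactly.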
The Jacobian has been normalised to make it easier to compare with the estimated version.

  [ −1      0       0       0.8333  0      0  ]
  [ 0.1989  −1      −0.2521 −0.1658 0      0  ]
  [ 0       3.9663  −1      0       0      0  ]
  [ 0       0       0.1912  −1      0      0  ]
  [ 0       1.4014  0.6418  0       −1     0  ]
  [ 0       0       0       0       1.5260 −1 ]

The validation shows that the estimated elements are close to the true values and, perhaps more importantly, that the elements that are zero in the true Jacobian are estimated as very small compared to the non-zero elements. Normalisation of the diagonal elements has the effect of adding certainty to the estimated values of the Jacobian entries. On the other hand, we risk amplifying elements that are zero in the true Jacobian beyond the cut-off limit, due to numerical errors. The choice of cut-off limit is difficult, and every choice is associated with this risk. If we apply the method to experimental data, this risk increases, since noise inevitably affects the measurements. It is not possible to determine a reasonable cut-off limit beforehand, since we do not know the magnitude of the elements that are to be estimated.

4.2 Results from Control Loop Determination

The artificial network in Figure 4.1 was also used to evaluate the method in Section 3.1.2. The method can only be applied if the mass flow connections between the nodes, named ri, are known, as well as the stoichiometric ratios between substrate and product in each step. The evaluation network has simple stoichiometric relationships, and consequently the M-matrix describing the mass flow, in the differential equation system (3.4), will have 0, 1 and −1 entries.
The total mathematical model of the network is:

  ẋ1 = r1(p1) − r2(p2)
  ẋ2 = r2(p2) − r3(p3) − r6(p6)
  ẋ3 = r3(p3) − r4(p4)
  ẋ4 = r4(p4) − r5(p5)
  ẋ5 = r6(p6) − r7(p7)
  ẋ6 = r7(p7) − r8(p8)

From this description, we can identify the M-matrix as

  M = [ 1 −1  0  0  0  0  0  0 ]
      [ 0  1 −1  0  0 −1  0  0 ]
      [ 0  0  1 −1  0  0  0  0 ]
      [ 0  0  0  1 −1  0  0  0 ]
      [ 0  0  0  0  0  1 −1  0 ]
      [ 0  0  0  0  0  0  1 −1 ]

Perturbations of the rates were simulated by changing the maximum rate parameter of the Michaelis-Menten rate law, in the same way as described in the previous section. The sensitivity matrix formed was consequently of dimensions (6 × 8). We cannot normalise the diagonal elements of rx, as we did earlier with fx, since the matrix M rx will not have the normalised elements on the diagonal. One alternative is to normalise a single element of rx, to set the level for the estimation. Alternatively, M rx could be normalised with the diagonal elements, but this would give an estimate of rx where the elements are not scaled equally on each row, compared to the true values. An equation system is formed by multiplying individual rows of M with rx and selected columns from the sensitivity matrix Σ according to the criterion (3.7). The columns that are valid for row i in this criterion are the ones corresponding to parameters that are not part of any rate leading to or from node i. For row i, these are the indices of the columns of M that contain a zero in row i. Whether or not we perform a normalisation of the elements, the equation system is underdetermined. The method still supplies 35 equations, but we now have additional unknown variables, since rx is of dimensions (8 × 6). In its current state, the method cannot be applied to an arbitrary network.
However, by observing the sensitivity matrix and combining it with the available information on network structure, some additional information can be obtained. A perturbation of the parameter p8, which is part of r8, will yield the following vector, corresponding to the last column of the sensitivity matrix:

  ∂x/∂p8 = ( 0, 0, 0, 0, 0, c )ᵀ

where c denotes the non-zero entry. This means that a perturbation of a parameter in r8 only results in a steady state change in the amount of the species represented by state six. What does this mean in terms of network wiring? The criterion (3.7) for the perturbation, using the rows of M corresponding to nodes not connected to r8 and the (8 × 6) matrix rx with entries ri,j, is

      [ 1 −1  0  0  0  0  0  0 ]   [ r1,1 ... r1,6 ]   [ 0 ]
      [ 0  1 −1  0  0 −1  0  0 ]   [ r2,1 ... r2,6 ]   [ 0 ]
  0 = [ 0  0  1 −1  0  0  0  0 ] · [  ...           ] · [ 0 ]
      [ 0  0  0  1 −1  0  0  0 ]   [ r7,1 ... r7,6 ]   [ 0 ]
      [ 0  0  0  0  0  1 −1  0 ]   [ r8,1 ... r8,6 ]   [ 0 ]
                                                       [ c ]

Multiplying the rx-matrix with the perturbation vector results in a vector of dimensions (8 × 1); this operation extracts the last column of rx, multiplied with the non-zero element of the perturbation vector:

  c · ( r1,6, r2,6, r3,6, r4,6, r5,6, r6,6, r7,6, r8,6 )ᵀ

Completing the matrix multiplication, the criterion takes the following form:

  0 = c · [ r1,6 − r2,6        ]
          [ r2,6 − r3,6 − r6,6 ]
          [ r3,6 − r4,6        ]
          [ r4,6 − r5,6        ]
          [ r6,6 − r7,6        ]

This is an underdetermined equation system which cannot be divided into independent subsystems. Since the system does not have a unique solution, we can make three interpretations:

- All variables are non-zero. This means that species six affects all rates, and hence all other species in the network. This is an improbable scenario in a biological system. Why should there be controlling effects that exactly balance each other out for all species?
- A subset of the variables is non-zero. It is possible that some of the variables in the equation system are non-zero. For example, if r1,6 and r2,6 are zero and the rest are non-zero, the other variables are uniquely determined. The non-zero variables must balance each other out exactly. Is this scenario probable? Perhaps more probable than the first scenario, but still not very believable from a biological point of view: species six would have to affect rates in the network in a very balanced fashion.

- All variables are zero. This is hence the most probable interpretation: species six does not affect any rate other than rate eight.

From this reasoning, we can draw the conclusion that the entries r1,6, r2,6, r3,6, r4,6, r5,6, r6,6, r7,6 of rx are zero. Although we can eliminate seven variables from our equation system, we still have too many variables in relation to equations. This consideration of the network architecture combined with the sensitivity vectors is not all in vain, even if the method still is not applicable to the evaluation network. It at least shows that information can be deduced from the sensitivity matrix alone, although it is not an easy task. It should also be noted that the algebraic connections between the different elements deduced from the criterion (3.7) show a pattern that cannot at all be picked up by the method of Section 3.1.1. See further discussion in Chapter 5.

4.3 Results from Local Linearisation and SITB Estimation

The local linearisation method described in Section 3.2.1 has been evaluated on different models of signal cascades with several phosphorylation levels. The idea was to develop an identification method that could be applied to data from mass spectrometry, where the ratio of the phosphorylated species to the total amount, or possibly the individual phosphorylated species, could be measured on each level.
4.3.1 The Evaluation Network

As a starting point, a simple signal cascade of three levels, with one unphosphorylated and one phosphorylated species on each level, was implemented in PathwayLab. The phosphorylation steps were modelled with Michaelis-Menten kinetics, with kinetic parameters listed in Appendix A. Each level is an isolated module with internal mass flow. The different levels communicate via control loops. The simple model was made with a real signal cascade in mind. It is important to note that Michaelis-Menten kinetics is not necessarily the most plausible model for the phosphorylation steps in a real signal cascade, although it is frequently used. The time constants in the artificial cascade are not necessarily the same as the time constants in a real signal cascade. However, the reasoning on sampling and frequency interpretation below can be transferred to an arbitrary signal cascade. The cascade is depicted in Figure 4.2.

Figure 4.2. The simplified signal cascade with two species on each level.

The signal u in the figure represents the input signal to the system. If we relate this signal to a real cascade, it could correspond to an extracellular activator that binds to receptors on the cell surface. The control loops were mathematically modelled as competitive inhibition and simple multiplicative activation, respectively. The cascade is described with a state space model with three state variables. Since we have mass conservation on each level in the cascade, we only have one independent species per level. For example, if the total amount of phosphorylated and unphosphorylated species on the first level is m1tot, then we can determine the unphosphorylated species at all time instants as m1tot − M1P(t), with the same notation as in Figure 4.2.
If we can only measure the ratio of phosphorylated species to the total amount on each level, we in fact measure each independent species, but scaled with a constant. Selecting the scaled phosphorylated species as state variables gives the following system description:

  d/dt [M1P(t)/m1tot] = f1(M1P(t), M3P(t), u(t), θ)
  d/dt [M2P(t)/m2tot] = f2(M1P(t), M2P(t), M3P(t), θ)
  d/dt [M3P(t)/m3tot] = f3(M2P(t), M3P(t), θ)    (4.2)

A less cumbersome notation for the set of differential equations is

  ẋ1(t) = f1(x1(t), x3(t), u(t), θ)
  ẋ2(t) = f2(x1(t), x2(t), x3(t), θ)
  ẋ3(t) = f3(x2(t), x3(t), θ)    (4.3)

The variables each fi depends on are deduced from the illustration of the cascade. The vector θ gathers the parameters of the system. The cascade eventually reaches the chosen operating point, i.e., the steady state. The true Jacobian of the system, which is used for the validation process, is retrieved by differentiating the right hand side of the differential equations with respect to each state variable, i.e., the phosphorylated species, and inserting the steady state values:

              x1        x2        x3
  Aref = x1 [ −0.0358   0        −0.0493 ]
         x2 [  1.8032  −0.2642   −3.1192 ]
         x3 [  0        0.3426   −0.1367 ]

The matrix is, as the name implies, the A-matrix of the true linearised state space model given by equation (3.13). The true B-matrix, quantifying the effect of the input signal on the derivatives of the state variables, was also calculated in the same manner. Step responses for each of the three subsystems, {(y1, u), (y2, u), (y3, u)}, were calculated using these matrices and are given in Figure 4.3. The input signal, u, to the system was a PRBS, chosen with respect to its ability to excite the system. The signal is given in Figure 4.4.

Figure 4.3.
Step responses for the three subsystems.

The offset of the signal is equal to the value of u in the steady state, and the level of the pulses is small enough for the linear approximation to be valid.

Figure 4.4. The input signal with offset equal to the steady state value.

4.3.2 Simulation and Identifiability

The system was simulated to retrieve data for use in the identification process. The simulation step involved solving the differential equations modelling the dynamics between the entities in the system, and was made with Mathematica. The data from the simulation was preprocessed before the identification step; preprocessing is an important part of the identification. First, the steady state values were subtracted from the values of each of the state variables, xi, as well as the input variable u, to create the signals for the deviations around the steady state:

  (x(t) − x0, u(t) − u0) = (x̃(t), ũ(t))

Further, the mean values were removed from the signals. This does not affect the A-matrix, but compensates for an offset in the linear differential equation system originating from the constant amount of mass on each level. We have

  dx̃/dt = A x̃ + B ũ

Time series for all state variables are available, giving a C-matrix equal to the identity matrix. The input signals do not directly affect the outputs, giving a D-matrix equal to the zero matrix. The linear state space model on the innovations form, with the time dependence suppressed, for small deviations from the steady state becomes:

  [ ẋ1 ]   [ × × × ] [ x1 ]   [ × ]       [ × × × ] [ ν1 ]
  [ ẋ2 ] = [ × × × ] [ x2 ] + [ 0 ] u  +  [ × × × ] [ ν2 ]
  [ ẋ3 ]   [ × × × ] [ x3 ]   [ 0 ]       [ × × × ] [ ν3 ]
               A                 B             K

  [ y1 ]   [ 1 0 0 ] [ x1 ]   [ ν1 ]
  [ y2 ] = [ 0 1 0 ] [ x2 ] + [ ν2 ]    (4.4)
  [ y3 ]   [ 0 0 1 ] [ x3 ]   [ ν3 ]
               C

The parametrisation of the matrices given by the state space model in (4.4) is a canonical parametrisation, if we allow all elements of the B-matrix to be adjustable in the identification process.
The full parametrisation of B gives a more general method, since it might not be known which modules are explicitly dependent on the input signal(s). The system is hence globally and locally identifiable, given that {A, [B K]} is controllable, as stated in the previous section. If we choose the K-matrix to be equal to zero, we have an output-error model; the error only enters the output equation (Ljung, 2001). The system, {Aref, Bref}, with

         [ −0.0358   0        −0.0493 ]          [ 0.0672 ]
  Aref = [  1.8032  −0.2642   −3.1192 ] , Bref = [ 0      ]
         [  0        0.3426   −0.1367 ]          [ 0      ]

is controllable because the determinant of the matrix S = [Bref  Aref Bref  Aref² Bref] is non-zero:

      [ 0.0672  −0.0024   0.0001 ]
  S = [ 0        0.1212  −0.0364 ] ,   det S = 0.000338256
      [ 0        0        0.0415 ]

4.3.3 The Identification Step

With the information in the previous section, we can conclude that if we use our simulated data without any noise sources added, sample fast enough, and our chosen input signal is sufficiently exciting, we will be able to identify the system properly. The identification was made using the SITB for Matlab, more specifically the pem function, the basic estimation command, together with the iddata data object (Ljung, 2001). More information about the pem function and the choice of parameters is given in Appendix B. The linearised system corresponds to three transfer functions: G1(s), G2(s), G3(s). If we subject the system to an input signal with frequency ω, the complex number Gi(iω) describes the system response in output signal yi. The function Gi(iω) as a function of ω is called the frequency response. The frequency responses for all the output signals are given in Figure 4.5 as Bode diagrams.
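The controllability check is easy to reproduce numerically. The sketch below rebuilds S from the matrices above; any small discrepancy in the last digits of det S comes from the rounded matrix entries used here:

```python
import numpy as np

A_ref = np.array([[-0.0358,  0.0,    -0.0493],
                  [ 1.8032, -0.2642, -3.1192],
                  [ 0.0,     0.3426, -0.1367]])
B_ref = np.array([[0.0672], [0.0], [0.0]])

# controllability matrix S = [B, AB, A^2 B]; a non-zero determinant
# means the pair {A_ref, B_ref} is controllable
S = np.hstack([B_ref, A_ref @ B_ref, A_ref @ A_ref @ B_ref])
det_S = np.linalg.det(S)
```

Since S is upper triangular here, det S is simply the product of its diagonal elements, which makes the non-zero result easy to see by inspection.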
Figure 4.5. Bode plots for the output signals y1, y2, and y3.

The last two amplitude curves have a resonance peak, which means that some frequencies in the spectrum have a larger gain through the system. A resonance peak corresponds to a higher order system, and the presence of complex conjugate pole pairs induces an oscillating behaviour in the step response of the system. In the Bode plots we see a peak frequency of approximately 1 rad/s, implying that the Nyquist frequency must be chosen above 1 rad/s, and well beyond that. A choice of bandwidth of 3 rad/s will include all the frequencies of the resonance peaks and, according to the rule of thumb given in Section 3.2, ωS should then be chosen as 30 rad/s. A sampling interval, T, of 0.2 s should therefore be sufficient for identification.

4.3.4 Varying Sampling Intervals

That a sampling interval of 0.2 s is sufficient for identification is confirmed by simulation and subsequent identification for a set of sampling intervals, where T ranges from 0.1 to 3 s in steps of 0.1 s. The estimated elements of the Jacobian are close to the true elements when the sampling interval is close to 0.2 s. The estimations, compared to the true values for each of the elements of the Jacobian, are depicted in Figure 4.6.
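The arithmetic behind the chosen sampling interval, assuming the factor-of-ten rule of thumb referred to in Section 3.2:

```python
import math

omega_B = 3.0                # chosen bandwidth covering the resonance peaks [rad/s]
omega_S = 10.0 * omega_B     # rule-of-thumb sampling frequency [rad/s]
T = 2.0 * math.pi / omega_S  # corresponding sampling interval [s]
# T is approximately 0.21 s, so a sampling interval of 0.2 s is on the safe side
```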
Figure 4.6. Gradual sampling results for each element of the Jacobian, one subplot per element from (1,1) to (3,3), with the sampling interval on the horizontal axis. The dotted line represents the true value of the element.

From the subfigures of Figure 4.6 it is possible to see a trend towards estimations that lie further and further from the true value with increasing sampling interval. The exception is for sampling intervals of 0.5, 1, 2, and 2.5 s.
The estimations produced with these intervals are very close to the true values and, compared to the neighbouring sampling intervals, much closer to the true values. A reason for this might be that the sampling points are located on favourable parts of the step responses from the PRBS that are important for the identification process. A way to confirm this suspicion is to alter the dynamics of the signal cascade and perform the same investigations. If the frequency responses of the altered cascade system are still tolerably similar to those of Figure 4.5, but with the resonance peaks displaced, schemes for the gradual sampling intervals, like the ones in Figure 4.6, should exhibit the same pattern, but also dislocated. Dislocating the resonance peaks, and the frequency responses as a whole, involves slowing down or speeding up the dynamics of the step responses and the oscillations therein. The Bode plot for y2 of an altered system is depicted in Figure 4.7; comparing it to Figure 4.5(b) reveals a shifted amplitude curve with the resonance peak dislocated towards lower frequencies. The same is true for the resonance peak in the Bode plot for y3. A plot of the dependence of the estimated Jacobian element (3,3) upon the sampling interval is shown in Figure 4.8.

Figure 4.7. The Bode plot for the output channel corresponding to y2 for the altered cascade.

Figure 4.8. Dependence of the estimated Jacobian element in position (3,3) upon sampling interval for the altered cascade.

The estimation with a sampling interval of 2.5 s is still close to the true value, and closer to the true value than the estimations with sampling intervals near 2.5 s.
The plausible explanation for the suddenly improved estimations could not be proven with this procedure.

4.3.5 Noise Addition

Noise always affects data collection in real systems. The sensitivity of the identification of the simplified cascade in Figure 4.2 can be evaluated by adding noise to the simulated signals. Measurement noise was simulated by adding random elements, drawn from a normal distribution with zero mean and a small variance, σ², to the sampled values of the output signals. The noise is stochastic and cannot be predicted, even if complete information about its history is available. Nor is it possible to describe the noise exactly as a signal, although we can assess its mean, amplitude distribution, and autocorrelation (Svärdström, 1999). Noise lacks periodicity and does not correlate with itself. The autocorrelation for a white noise signal is consequently simple, consisting only of a strong peak at k = 0:

  r_xx(k) = σ² δ(k)

The phase spectrum of the noise is impossible to retrieve because of the stochastic nature of the noise, but the power spectrum, P(ω), is attainable. The power spectrum and the autocorrelation of a signal constitute a discrete-time Fourier transform (DTFT) pair. The DTFT of a signal x(n) is defined as

  F{x(n)} = X(Ω) = Σ_{n=−∞}^{∞} x(n) e^{−jΩn}

The transform is continuous and periodic with period 2π (Poularikas, 1999). The argument Ω is a normalised frequency. The transform can also be expressed with the angular frequency, ω, as the argument, and is then an approximation of the continuous-time Fourier transform defined earlier:

  X(ω) = T Σ_{n=−∞}^{∞} x(n) e^{−jωnT}

where T is the sampling interval of the sequence x(n) (Poularikas, 1999). With these definitions, the power spectrum of a white noise signal equals σ²T. Non-band-limited white noise has a constant power spectrum for all frequencies.
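The white-noise properties used above are easy to verify empirically. The sketch below (sample size and seed are arbitrary choices for the illustration) estimates the biased sample autocorrelation of a white Gaussian noise sequence: a strong peak at lag zero close to σ², and values near zero at all other lags:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 1.0
N = 200_000
x = rng.normal(0.0, np.sqrt(sigma2), N)   # white Gaussian noise, variance sigma^2

def autocorr(x, k):
    """Biased sample autocorrelation r_xx(k) = E[x(n) x(n+k)]."""
    n = len(x)
    return np.dot(x[:n - k], x[k:]) / n

r0 = autocorr(x, 0)   # should be close to sigma^2
r5 = autocorr(x, 5)   # should be close to 0 for white noise
```

Multiplying r0 by the sampling interval T gives the (flat) power spectrum level σ²T used in the analysis above.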
The addition of noise sources creates a need to also estimate the Kalman filter K(θ), which minimises the prediction error, as we saw earlier. How the estimation of the Jacobian elements succeeds as a function of the variance of the added noise can be seen in Figure 4.9. The complete set of plots can be found in Appendix C.

[Figure 4.9. The effect of adding noise sources with increasing variance on the estimation of the Jacobian elements: (a) the element at (1,1); (b) the element at (1,2).]

The estimates of the elements are markedly further from the true values when the variance of the noise reaches 10⁻⁹, and particularly 10⁻⁸. The increasingly poor estimates can be explained by examining the power spectrum of the noise source together with the estimated power spectra of the output signals. With σ² = 10⁻⁹ and T = 0.2 s, the noise spectrum is constant and equal to 0.2 · 10⁻⁹ = 2 · 10⁻¹⁰. The joint plots in Figure 4.10 of the noise spectrum and the spectrum of each output signal help us evaluate which frequencies are masked by the noise. The plots reveal that the noise dominates, at least partially, over the frequencies corresponding to the resonance peaks in the Bode diagrams, that is, the oscillations in the step responses. These frequencies are vital to the identification process, which makes the poor estimates understandable. Performing several analyses of this type for different variances might reveal exactly which frequencies are crucial to the identification process.
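The masking effect can be sketched as follows. A small oscillation stands in for a resonance peak; its amplitude and frequency are illustrative choices, not the actual cascade signals. The oscillation is visible above the noise floor σ²T only while σ² is small.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 0.2                                # sampling interval (s)
N = 5000
t = np.arange(N) * T
signal = 2e-6 * np.sin(1.0 * t)        # hypothetical oscillation at 1 rad/s

def peak_to_floor(sigma2):
    y = signal + rng.normal(0, np.sqrt(sigma2), N)
    P = (np.abs(np.fft.rfft(y))**2) * T / N      # periodogram
    w = np.fft.rfftfreq(N, T) * 2 * np.pi        # angular frequencies (rad/s)
    peak = P[np.argmin(np.abs(w - 1.0))]         # power near the oscillation
    floor = np.median(P)                          # robust noise-floor estimate
    return peak / floor

print(peak_to_floor(1e-12))   # the oscillation stands far above the floor
print(peak_to_floor(1e-8))    # the oscillation is drowned by the noise
```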
That a noise variance of 10⁻⁹ causes identification problems is quite clear if we examine the signals in Figure 4.11 and compare them to the noise-free versions in Figure 4.12. The oscillations of the step responses are drowned by the added noise, and it is not possible to identify the Jacobian elements correctly.

[Figure 4.10. Plot of the spectrum for each output signal together with the noise spectrum: (a) spectrum for y1; (b) spectrum for y2; (c) spectrum for y3.]

[Figure 4.11. Output signals with noise of variance 10⁻⁹ added and a sampling interval of 0.2 s: (a) y1; (b) y2; (c) y3.]

[Figure 4.12. Output signals without added noise: (a) y1; (b) y2; (c) y3.]

4.3.6 Additional Investigations

The method of this section was also applied to the signal cascade of Figure 2.1, a more complex cascade with three species on each level.
The role of the input signal in this system is played by the membrane-bound kinase Ras-GTP (Kholodenko et al., 2002). The network has six independent species, owing to mass conservation on each phosphorylation level. The network dynamics is modelled by six differential equations, given in the appendix of Kholodenko et al. (2002). The expressions for the rates are complex and depend on a large set of parameters.

The cascade can be linearised around a steady state, as we saw in the beginning of the section, resulting in a set of six linear differential equations in x = (x1, ..., x6):

    ẋ = Ax + Bu    (4.5)

The linearisation is valid in a vicinity of the steady state, and the variable x represents the deviations from it. The notation x̃ is dropped to avoid cumbersome expressions.

If we can measure all the phosphorylated species independently, the system is globally identifiable and it is possible to find the true Jacobian, although it is computationally demanding. The objective of finding the Jacobian when time series of only the fractions of phosphorylated species on each level are available is more complex. Since we have three species on each level, of which two are independent, we do not measure all the state variables. If we use the canonical parametrisation for the state-space description, we cannot fully fill the A-matrix. This is not enough for our identification purposes, since we seek a fully parameterised Jacobian. The interactions within the different levels are very complex, and it would be difficult to interpret the wiring.

A slightly different approach is to estimate only a Jacobian of dimensions (3 × 3), with the fraction of phosphorylated species on each level forming the three state variables. An estimate of this Jacobian would reveal how the sets of phosphorylated species on the different levels are connected through control loops.
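The linearisation step in (4.5) can be sketched numerically. The model f below is a hypothetical two-species chain with Michaelis-Menten-type rates, a stand-in for, not a reproduction of, the six-state MAPK model; the totals 10 and 8 and all rate constants are invented for illustration. The Jacobian is obtained by central differences around a steady state found by integrating to equilibrium.

```python
import numpy as np

# Hypothetical two-level phosphorylation chain (illustrative values only).
def f(x, u):
    v1 = 2.0 * u * (10 - x[0]) / (1 + (10 - x[0]))   # phosphorylation, level 1
    v2 = 1.0 * x[0] / (1 + x[0])                     # dephosphorylation
    v3 = 1.5 * x[0] * (8 - x[1]) / (1 + (8 - x[1]))  # level 1 activates level 2
    v4 = 1.0 * x[1] / (1 + x[1])
    return np.array([v1 - v2, v3 - v4])

def jacobian(f, x0, u0, eps=1e-6):
    n = len(x0)
    A = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (f(x0 + dx, u0) - f(x0 - dx, u0)) / (2 * eps)  # central difference
    return A

# Locate a steady state f(x*, u0) = 0 by integrating to equilibrium
x = np.array([1.0, 1.0])
u0 = 0.3
for _ in range(50000):
    x = x + 0.05 * f(x, u0)          # forward Euler; step small enough here

A = jacobian(f, x, u0)
print("steady state:", x)
print("Jacobian:\n", A)
```

The signs of the entries reflect the wiring: the diagonal is negative (self-degradation), the (2,1) entry is positive (level 1 drives level 2), and the (1,2) entry is zero (no feedback in this sketch).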
The difficulty with this approach most likely stems from the fact that the summation of two signals can cancel out oscillations that are crucial to the identification process. It is always harder to identify part of a system when states that in reality affect the dynamics of the included states are left out. Figure 4.13 shows such a cancellation, probably because the oscillations are small compared to the amplitude of the signals. The small oscillatory behaviour induced by the input signal is lost when the two phosphorylated species are added, and the identification attempts were unsuccessful when this data was used.

What we attempt when reducing the system from six states to three is really to estimate the six states of the system from measurements of only three quantities. The addition of the phosphorylated species on each level is represented by the transformation matrix

    T = [ 1/m1,tot  1/m1,tot  0         0         0         0
          0         0         1/m2,tot  1/m2,tot  0         0
          0         0         0         0         1/m3,tot  1/m3,tot ]    (4.6)

Multiplying the differential equation system (4.5) for the six species by T, with z = Tx, leads to

    ż = d/dt (Tx) = Tẋ = TAx + TBu    (4.7)

If we want to remove x from the three-species system, we need to approximate it by T⁻¹z, where T⁻¹ is the pseudoinverse of T, which gives ż ≈ TAT⁻¹z + TBu.

Using time series data for only the bi-phosphorylated species did not lead to any useful results either. This confirms that measurements of all independent variables are necessary for the method to be applicable.

[Figure 4.13. The dynamics of a response to an impulse for a part of the MAPK cascade: (a) MKKK-PP; (b) MKKK-P; (c) fraction of phosphorylated species, (MKKK-P + MKKK-PP)/m1,tot.]
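The lumping in (4.6) and (4.7) can be sketched as follows. The totals follow Appendix A (30, 18, 25), while A and B are random stand-ins rather than the MAPK Jacobian; the point is the structure of the reduction, not the values.

```python
import numpy as np

# Collapse six species to three level fractions z = T x, with x ~ pinv(T) z.
m_tot = np.array([30.0, 18.0, 25.0])
T = np.zeros((3, 6))
for i in range(3):
    T[i, 2 * i] = T[i, 2 * i + 1] = 1.0 / m_tot[i]   # add the two species per level

rng = np.random.default_rng(2)
A = -np.eye(6) + 0.1 * rng.standard_normal((6, 6))   # hypothetical Jacobian
B = rng.standard_normal((6, 1))

T_pinv = np.linalg.pinv(T)
A_red = T @ A @ T_pinv          # reduced dynamics: z' ~ A_red z + B_red u
B_red = T @ B

# pinv(T) @ T is a rank-3 projection, not the 6x6 identity, so the reduced
# model is exact only when the two species on each level move together --
# which is exactly what fails when the summation cancels the oscillations.
print("A_red:\n", A_red)
print("rank of pinv(T) @ T:", np.linalg.matrix_rank(T_pinv @ T))
```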
Chapter 5

Discussion

In this chapter, different aspects of the identification of biochemical networks are discussed. The applicability of the different methods is compared, and the choice of which method to employ in different situations is discussed.

The choice of identification method is first and foremost dependent on the available experimental data; the division of the sections in the previous chapters reflects this fact. Time series data and steady-state data have different information content, and this of course affects the amount of information that can be retrieved from the data.

The interaction graph determining method of Section 3.1.1 is not a complicated method in practice, but its requirements can be hard to fulfil. In contrast, the theoretical basis for the method is more elaborate; the mathematics behind the method is easier to understand when the method is put to use. The interaction graph method depends on the possibility of perturbing the different nodes of the network under investigation. The perturbations can be additions of inhibitors or activators, or RNAi constructs that affect the production of a catalysing enzyme. If the network consists of n nodes, at least n perturbations of different nodes are needed, and some information about the network wiring is required beforehand, as described in the previous chapter. The question is whether it is realistic to perform this number of perturbation experiments: effectors of different kinds must be known for a large set of catalysing enzymes, and each effector may only affect a single enzyme. The change in the parameter, Δpj, of a perturbation is seldom known. The sensitivities must then be approximated, and, considering measurement noise, the method can be unreliable. The normalisation of the diagonal elements of the Jacobian can also inflate elements that should otherwise be zero.
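The perturbation idea, and why normalisation saves the day when Δpj is unknown, can be made concrete in a minimal sketch. For a linearised network ẋ = Ax + p, perturbing parameter j shifts the steady state by Δx = −A⁻¹ej Δpj; stacking n such shifts recovers A. The matrix A and the perturbation sizes below are arbitrary illustrative values, not a network from the thesis.

```python
import numpy as np

A_true = np.array([[-1.0, 0.5, 0.0],
                   [0.3, -1.2, 0.4],
                   [0.0, 0.6, -0.9]])      # arbitrary stable example
dp = np.array([1e-3, 5e-4, 2e-3])          # perturbation sizes

# Stack the n steady-state shifts column by column: dX = -inv(A) diag(dp)
dX = -np.linalg.inv(A_true) @ np.diag(dp)

# With KNOWN perturbation sizes the Jacobian is recovered exactly:
A_rec = -np.diag(dp) @ np.linalg.inv(dX)
print(np.max(np.abs(A_rec - A_true)))      # recovery up to round-off

# With UNKNOWN dp_j only a normalised Jacobian is attainable: each row of
# -inv(dX) is a row of A scaled by the unknown 1/dp_j, and dividing each
# row by its diagonal element removes that scaling.
M = -np.linalg.inv(dX)
A_norm = M / np.diag(M)[:, None]           # equals A_true row-divided by its diagonal
```

In the noise-free case the normalised recovery is exact; with measurement noise, the row-wise division is precisely where small errors in near-zero entries get inflated, as discussed above.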
The advantage of the method is that for simple networks, particularly signalling networks, it is easier to use. Nodes of signalling networks can be chosen so that they have limited information exchange through mass flow, which simplifies the method considerably, although the problems with noise and normalisation remain.

The second method based on local parameter perturbation data, from Section 3.1.2, exhibits the same issues with perturbations and normalisation, but since the mass flow is known, the requirements are easy to fulfil. The method was developed to replace the need for interpreting the Jacobian, which can be difficult for networks with mass flow; as described earlier, the mass flow of the network and the stoichiometric relationships are assumed to be known. The problem with this method was that, despite knowledge of the mass flow, the introduction of additional unknown variables rendered the method unusable. The core of the problem is that a set of elements of the matrix rx is known to be non-zero, but since the exact values of these entries are not known, they cannot be normalised and the information could not be used. The method still provided some useful interpretations of certain entries of the sensitivity matrix. The occurrence of a sparse perturbation vector induced several alternatives concerning the entries of rx; analysis of the same network with the interaction graph determining method did not reveal any of this information. This extra complexity is probably the reason that the control loop determination method became too elaborate for application. It is still important to note that information is retrievable from the perturbation matrix in combination with the method, although some work is required to do so.

The main method based on time series data, from Section 3.2.1, is fundamentally different from the methods based on local parameter perturbation data.
The application of control system methods aids the evaluation of this method. The method relies on all species that are represented by a state variable being measurable over a time span. In the previous chapters, the difficulty of time series measurements on biological systems has been discussed. With the development of experimental methods, the availability of time series data will probably increase, but at the moment quantitative data of this kind is rare.

The method requires an input signal with certain properties. The signal must excite the system sufficiently, and such a signal can be difficult to realise in practice. The PRBS used in the simulations can correspond to changes in the amount of an extracellular signalling molecule or any other substance affecting the nodes of the network. An extracellularly regulated input signal is much simpler to manipulate than an intracellular one, since alterations within the cell risk changing its complete state and are simply more difficult to realise.

The advantage of the method is that the Jacobian of the system under investigation can be determined exactly, given the “right” circumstances, as described earlier. Individual perturbations of the nodes of the network are not needed, and no information about the network wiring is required beforehand, as it is in the methods based on steady-state measurements. The local linearisation method simply demands knowledge and measurements of the species that are represented by the state variables. Nor is it necessary to know beforehand which state variables depend explicitly on the input signal(s).

If the system under investigation is large, i.e., has many nodes, the identification algorithm can be slow, and convergence problems might appear. The problem of large, complex networks is in fact an issue for all the identification methods.
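A PRBS with the required excitation properties is simple to generate with a linear feedback shift register. The sketch below uses the standard 7-bit maximum-length register with taps 7 and 6 (period 2⁷ − 1 = 127); this particular register is an illustrative choice, not necessarily the PRBS used in the thesis simulations.

```python
import numpy as np

# Maximum-length PRBS from a 7-bit Fibonacci LFSR, feedback x^7 + x^6 + 1.
def prbs7(n_bits, seed=0b1111111):
    state, bits = seed, []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of taps 7 and 6
        state = ((state << 1) | bit) & 0x7F       # shift, keep 7 bits
        bits.append(bit)
    return np.array(bits)

u = 2.0 * prbs7(127) - 1.0    # map {0,1} -> {-1,+1} input levels
print("period 127:", np.array_equal(prbs7(254)[:127], prbs7(254)[127:]))
print("mean:", u.mean())       # nearly zero over a full period
```

Each bit is then held over one clock interval; the clock interval sets where the flat part of the PRBS spectrum ends, which is what must cover the resonance frequencies discussed earlier.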
The local linearisation method has a certain tolerance for noise, owing to the noise models included in the system description; since biological data is often noisy, this is an advantage compared to the other methods in this thesis. The methods based on steady-state data have no model of the noise and are hence more susceptible to different noise sources. As a comparison, the simple discrete method of Section 3.2.2 was evaluated on the same network as the more complex local linearisation model of Section 3.2.1, and exhibited extreme sensitivity even to noise of very small variance, showing the importance of including noise models.

Chapter 6

Conclusions

Identification of biochemical systems is not an easy task, for several reasons: measurement issues, the complexity of the networks, and the possibilities for input signal excitation and perturbation. Are any of the identification methods in this thesis really applicable to experimental systems? The methods probably can be applied to biological systems, but thorough consideration of the requirements of each method is a must, as is an understanding of the basis for the methods. If the requirements of a method are not fulfilled, the identification will inevitably fail. In applying the methods to biological systems, it is advisable, if possible, not to use certain knowledge of the system within the methods, and instead to employ this information in some kind of validation, although this does not amount to a complete validation procedure. The most important thing to bear in mind is that an identification process will give the results that are deducible from the data, which is the “true” system based on that particular data. A piece of advice to the user: consider the output of an identification method a possible estimate of the system structure.
The most reliable identification of a system would probably result from a combination of different methods, based on both local parameter perturbation data and time series data. The method most reliable on its own is probably the local linearisation method with added noise models. A natural continuation of this master's thesis would be, if the possibility exists, to test the methods on experimental data from biochemical systems.

Bibliography

Atkins, P. and Jones, L. (1999). Chemical Principles: The Quest for Insight. W. H. Freeman and Company, NY, USA.

Bhalla, U. S. and Iyengar, R. (1999). Emergent properties of networks of biological signaling pathways. Science, 283(5400):381–387.

BioSim Network (2005). BioSim project. http://chaos.fys.dtu.dk/biosim.

Close, C. M., Frederick, D. K., and Newell, J. C. (2002). Modeling and Analysis of Dynamic Systems. John Wiley & Sons, New York, NY, USA, 3rd edition.

Cornish-Bowden, A. (1995). Fundamentals of Enzyme Kinetics, Revised edition. Portland Press, London, UK.

Cornish-Bowden, A. and Wharton, C. W. (1988). Enzyme Kinetics. In Focus. IRL Press, Eynsham, Oxford, UK.

Edelstein-Keshet, L. (1988). Mathematical Models in Biology. McGraw-Hill.

Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3/4):601–620.

Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R., and Hood, L. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 292(5518):929–934.

InNetics AB (2005). PathwayLab. http://innetics.com/.

Kholodenko, B. N., Kiyatkin, A., Bruggeman, F. J., Sontag, E., Westerhoff, H. V., and Hoek, J. B. (2002). Untangling the wires: A strategy to trace functional interactions in signaling and gene networks. PNAS, 99(20):12841–12846.

Kholodenko, B. N. and Sontag, E. D. (2002). Determination of functional network structure from local parameter dependence data. ArXiv Physics e-prints.

Kitano, H. (2001). Foundations of Systems Biology, chapter Systems Biology: Toward System-level Understanding of Biological Systems, pages 1–29. MIT Press, Cambridge, MA, USA.

Kitano, H. (2002). Systems biology: A brief overview. Science, 295(5560):1662–1664.

Linstrom, P. J. and Mallard, W. G., editors (2003). NIST Chemistry WebBook, NIST Standard Reference Database Number 69. National Institute of Standards and Technology, Gaithersburg, MD 20899, USA. (http://webbook.nist.gov).

Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, USA.

Ljung, L. (2001). System Identification Toolbox for use with MATLAB, User's Guide. The MathWorks, Inc., Natick, MA, USA, version 5 edition.

Ljung, L. and Glad, T. (1997). Reglerteori. Studentlitteratur, Lund, Sweden. In Swedish.

Ljung, L. and Glad, T. (2004). Modellbygge och simulering. Studentlitteratur, Lund, Sweden, 2nd edition. In Swedish.

Lodish, H., Berk, A., Zipursky, S. L., Matsudaira, P., Baltimore, D., and Darnell, J. (2000). Molecular Cell Biology. W. H. Freeman and Company, 4th edition.

Morel, N. M., Holland, J. M., van der Greef, J., Marple, E. W., Clish, C., Loscalzo, J., and Naylor, S. (2004). Primer on medical genomics part XIV: introduction to systems biology - a new approach to understanding disease and treatment. Mayo Clinic Proceedings, 79(5):651–658.

Poularikas, A. D. (1999). The Handbook of Formulas and Tables for Signal Processing, chapter Discrete-Time Fourier Transform, One- and Two-Dimensional. The Electrical Engineering Handbook Series. CRC Press, Boca Raton, FL, USA.

Svärdström, A. (1999). Signaler och system. Studentlitteratur, Lund, Sweden. In Swedish.

Wahde, M. and Hertz, J. (2000). Coarse-grained reverse engineering of genetic regulatory networks. Biosystems, 55(1-3):129–136.

Wolfram Research, Inc. (2003). Mathematica.
Wolfram Research, Inc., Champaign, Illinois, USA.

Wolkenhauer, O., Kitano, H., and Cho, K.-H. (2003). Systems biology. IEEE Control Systems Magazine, 23(4):38–48.

Yi, T.-M., Huang, Y., Simon, M. I., and Doyle, J. (2000). Robust perfect adaptation in bacterial chemotaxis through integral feedback control. PNAS, 97(9):4649–4653.

Zeigler, B. P., Praehofer, H., and Kim, T. G. (2000). Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems. Academic Press, San Diego, CA, USA.

Zubay, G. L., Parson, W. W., and Vance, D. E. (1995). Principles of Biochemistry. Wm. C. Brown Communications, Dubuque, IA, USA.

Appendix A

Kinetic Parameters for the Evaluation Networks

The evaluation network from Section 4.1 consists of six states interconnected through rates with Michaelis-Menten kinetics. The kinetic parameters for each rate are given in Table A.1 below, where V is the maximum rate, Km the Michaelis constant, and Ki the inhibition constant in the purely competitive inhibition mechanism.

Table A.1. Michaelis-Menten parameters for the rates in the evaluation network with mass flow.

          r1    r2    r3    r4    r5    r6    r7    r8
    V     0.5   1.1   1     0.5   1.2   1.5   1     0.8
    Km    1     1     1     1     1     1     1     1
    Ki    1     -     -     -     -     -     -     -

The initial values of the amounts of each species were set to 1.0.

The kinetic parameters of the simplified artificial signal cascade, depicted in Figure 4.2 and used for evaluation in Section 4.3, are listed in Table A.2.

Table A.2. Michaelis-Menten parameters for the rates in the simplified signal cascade.

          v1    v2    v3    v4    v5    v6
    V     2.5   2.2   1.2   2.5   0.5   5
    Km    0.9   1     1     1     1     1
    Ki    1.3   -     -     -     -     -

The initial values for the phosphorylated species were set to zero, while the initial values for M1, M2, and M3 were chosen to be 30, 18, and 25, respectively.

Appendix B

Some Functions in the SITB

The System Identification Toolbox (SITB) for Matlab contains a plethora of functions and data objects.
The information in this appendix is summarised from the SITB manual (Ljung, 2001).

The pem function is the basic estimation command and estimates the parameters of general linear models. It is a maximum likelihood method that iteratively minimises a quadratic prediction error criterion. The search for the optimum is governed by a set of options.

    m = pem(data,orders,'Property1',Value1,...,'PropertyN',ValueN)

Here data is any form of data object, while orders indicates how many states the estimated model should have.

The iddata object is the basic object for handling signals in the toolbox, and most functions can process data in this form. The data object handles both frequency and time domain data.

    data = iddata(y,u,Ts,'Property1',Value1,...,'PropertyN',ValueN)

y contains the output signals in column form, u contains the input signals in the same form, and Ts is the sampling interval of the data.

The following code exemplifies the estimations made in the thesis:

    % make data
    data = iddata(output,input,sampinterval);
    % detrend data
    datad = detrend(data,'constant');
    % make model
    model = pem(datad,3,'ss','can','Ts',0,'DisturbanceModel','None');

Included is also the detrend command, which removes trends from data; here the mean is removed from each signal, since the option 'constant' is given to the function. The property 'ss' governs the parametrisation of the model matrices, which is chosen to be canonical ('can') in the example. The property 'Ts' is set to 0 to indicate that a continuous-time model is to be estimated, and no disturbance model is estimated, as indicated by the option 'None'.

Appendix C

Noise Effects on Estimation

White noise effects can be simulated by adding random elements from a normal distribution with zero mean and a given variance to the output signals of a system, as explained in the main part of the thesis.
Estimates of the Jacobian elements for the simplified signalling cascade of Figure 4.2 were computed with added noise of increasingly larger variance. The estimates of all nine Jacobi elements as a function of the noise variance are illustrated in the following plots.

[Figure C.1. The effect of adding noise with increasing variance on the estimation of the complete set of Jacobian elements: (a)-(i) the elements at positions (1,1) through (3,3).]
