Linköpings universitet
Department of Physics and Measurement Technology
SE-581 83 Linköping, Sweden

Master's Thesis
LITH-IFM-EX-05/1378-SE

Evaluation and Development of Methods for
Identification of Biochemical Networks

Alexandra Jauhiainen
Supervisor: Mats Jirstrand, Fraunhofer-Chalmers Research Centre for Industrial Mathematics
Examiner: Jesper Tegnér, IFM, Linköpings universitet
Linköping, 22 February, 2005
URL for electronic version: http://www.ep.liu.se/exjobb/ifm/bi/2005/1378/
Abstract

Systems biology is an area concerned with understanding biology on a systems level, where the structure and dynamics of the system are in focus. Knowledge about the structure and dynamics of biological systems is fundamental information about cells and interactions within cells, and it also plays an increasingly important role in medical applications.

System identification deals with the problem of constructing a model of a system from data, and an extensive theory exists, particularly for the identification of linear systems.

This is a master's thesis in systems biology treating identification of biochemical systems. Methods based on both local parameter perturbation data and time series data have been tested and evaluated in silico.

The advantage of local parameter perturbation data methods proved to be that they demand less complex data, but the drawbacks are the reduced information content of this data and sensitivity to noise. Methods employing time series data are generally more robust to noise, but the lack of available data limits the use of these methods.

The work has been conducted at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics in Göteborg, and at the Division of Computational Biology at the Department of Physics and Measurement Technology, Biology, and Chemistry at Linköping University during the autumn of 2004.

Keywords: Systems Biology, System Identification, Biochemical Networks
Acknowledgement
I would like to thank my supervisor at the Fraunhofer-Chalmers Research Centre (FCC), Mats Jirstrand, for his help and enthusiasm in this project. Additional thanks to my friends and everyone who has provided comments and support on my thesis work. Finally, thanks to the staff at FCC.
Notation
Symbols and abbreviations used in the thesis are gathered here for clarification.
All the abbreviations are also explained in the main part of the thesis, at their first
occurrence.
Symbols

x, X    Boldface letters are used for vectors, matrices, and sets.
θ       Parameter vector.
DM      Set of values over which θ ranges in a model structure.

Abbreviations

SITB    System Identification ToolBox
MAPK    Mitogen Activated Protein Kinase
PRBS    Pseudo Random Binary Signal
Contents

1 Introduction
  1.1 Problem Formulation
  1.2 Systems Biology
    1.2.1 What is Systems Biology?
    1.2.2 Why Perform Research on Systems Biology?
  1.3 The Thesis
    1.3.1 Aim and Audience
    1.3.2 Structure

2 Theory
  2.1 Systems, Modelling, and Simulation
    2.1.1 Systems
    2.1.2 Models
    2.1.3 Simulation
  2.2 System Identification
    2.2.1 The Identification Process
    2.2.2 Validation
  2.3 Biological Network Structures
    2.3.1 Metabolic Networks
    2.3.2 Gene Regulatory Networks
    2.3.3 Signalling Networks
  2.4 Identification of Biological Networks
    2.4.1 Identification Approaches
  2.5 Chemical Kinetics
    2.5.1 Rate Laws and Reaction Mechanisms
    2.5.2 Equilibrium and Steady State
    2.5.3 Enzyme Kinetics

3 Methods
  3.1 Methods Based on Local Parameter Perturbation Data
    3.1.1 Interaction Graph Determination
    3.1.2 Determination of Control Loops in Mass Flow Networks
  3.2 Methods Employing Time Series Data
    3.2.1 Linearising Around an Operating Point
    3.2.2 A Discrete Linearisation Method

4 Results
  4.1 Results from Interaction Graph Determination
  4.2 Results from Control Loop Determination
  4.3 Results from Local Linearisation and SITB Estimation
    4.3.1 The Evaluation Network
    4.3.2 Simulation and Identifiability
    4.3.3 The Identification Step
    4.3.4 Varying Sampling Intervals
    4.3.5 Noise addition
    4.3.6 Additional Investigations

5 Discussion

6 Conclusions

A Kinetic Parameters for the Evaluation Networks

B Some Functions in the SITB

C Noise Effects on Estimation
List of Figures

1.1 Robust perfect adaption
1.2 Integral feedback
1.3 Enzymatic transformation
2.1 A MAPK cascade
2.2 Dinitrogen pentoxide
2.3 Decomposition data for dinitrogen pentoxide
2.4 Logarithmic plot of decomposition data
2.5 Michaelis-Menten dynamics
3.1 Interaction graph
3.2 Two network architectures
3.3 Sampling
3.4 Folding
3.5 Block diagram
4.1 Massflow network
4.2 Simplified cascade
4.3 Step responses
4.4 PRBS
4.5 Bode plots
4.6 Gradual sampling
4.7 Bode plot
4.8 Sampling dependence
4.9 Noise addition
4.10 Signal and noise spectrums
4.11 Noisy outputs
4.12 Noise-free outputs
4.13 MKKK dynamics
C.1 Complete noise addition

List of Tables

A.1 Michaelis-Menten parameters (I)
A.2 Michaelis-Menten parameters (II)
Chapter 1
Introduction
The issues examined in this master's thesis work are introduced in this chapter,
together with the structure and aim of the thesis. Some background information
on systems biology is also given.
1.1 Problem Formulation
The main task of this master's thesis is to examine and evaluate different methods for reconstruction and identification of biochemical networks. The methods are also investigated with the aim of improving their applicability where possible.
1.2 Systems Biology
The task of the master's thesis is within the area of systems biology. This section aims to explain different interpretations of the systems biology concept as well as give some motivation for why research is performed on this subject.
1.2.1 What is Systems Biology?
Several different opinions on how to define systems biology exist. One description
of the area is as a field of research concerned with understanding biology on a
systems level. A systems biologist is interested in understanding the structure
and dynamics of a system (Kitano, 2002). In more detail, this understanding of a
system can be divided into four parts: system structure, system dynamics, control methods, and design methods. An approach to thoroughly apprehending biological systems is proposed by Kitano (2001):
System structure Networks of biochemical character are to be identified. This means that signal transduction as well as controlling mechanisms and mass flow between entities of the system need to be recognised.
System dynamics When fundamental knowledge about the structure of a system has been found, it is of interest to learn more about system behaviour over time and under different conditions.
Control methods Methods for controlling the system can be the next step after identification and analysis of dynamic activities. The interest might be in how to control or sustain a state of the system.
Design methods Design principles and simulations provide information on strategies to modify the system. The aim is to achieve the ability to modify, or even create, a system to fulfill certain purposes; for example, to provide cures for diseases.
Systems biology research hence strives to understand the complex interactions between DNA, RNA, proteins, and metabolic and informational pathways. Both inter- and intra-cellular networks are of interest.
The application of systems and control theory to biological systems is one of many
definitions of systems biology (Wolkenhauer et al., 2003). This view can be illustrated with the following example from (Yi et al., 2000).
Example 1: Robust perfect adaption
Adaption, or desensitisation, is a process in biological systems that allows the system to return to a steady state despite continuously being subject to stimulating signals. An example of this is the adaption in bacterial chemotaxis to changes in the levels of constituent proteins. The process of adaption is robust and a result of integral feedback control.
The process is exemplified with the model of Figure 1.1. The amount of the species
Y is held at a constant level by the integral feedback loop. The amount is only
dependent on the rate constants for the rates leading to and from the A-component
as follows:
dA/dt = V3 Y / (K3m + Y) − V4

which at steady state (dA/dt = 0) gives

Y = V4 K3m / (V3 − V4)
Figure 1.1. The process of robust perfect adaption. E1 and E2 represent perturbations to the system.
The rate v4 must be constant, i.e., operating at saturation as the figure indicates. The interpretation of the rate constants and a more thorough treatment of kinetics can be found in Section 2.5. The block representation of this system is shown in Figure 1.2, where the reference signal r is the level of the component that the system strives to maintain constant:
Figure 1.2. Robust perfect adaption as a block representation of integral feedback control (reference signal r, error e, integrator 1/s, system G(s), and output y).
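The adaption property is easy to verify numerically. The sketch below simulates the model above with hypothetical rate constants and an assumed removal of Y proportional to A·Y (the actual model in (Yi et al., 2000) is more detailed); the amount of Y returns to the same steady state regardless of the size of the perturbation E1.

from scipy.integrate import solve_ivp

V3, V4, K3m = 2.0, 1.0, 1.0                # hypothetical rate constants, V3 > V4

def rhs(t, state, E1):
    A, Y = state
    dA = V3 * Y / (K3m + Y) - V4           # integral action: dA/dt = 0 only at Y*
    dY = E1 - 0.5 * A * Y                  # assumed production (E1) and removal of Y
    return [dA, dY]

Y_star = V4 * K3m / (V3 - V4)              # predicted steady state, here Y* = 1

for E1 in (1.0, 2.0):                      # two different perturbation levels
    sol = solve_ivp(rhs, (0, 60), [1.0, 1.0], args=(E1,))
    print(f"E1 = {E1}: Y(end) = {sol.y[1, -1]:.3f}, expected {Y_star:.3f}")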
In addition to systems and control theory, mathematical and computational tools
are utilised in research, such as data preprocessing, statistical and informatics
mining tools (Morel et al., 2004).
The area of systems biology is not new but has evolved concurrently with the
development of suitable tools and increasing experimental skills and technology in
producing useful data. Bioinformaticians have large pools of information owing to
several genome projects producing sequence data. In contrast, a systems biologist
is in need of a different kind of data, and to retrieve this type of data is not an
easy task (Wolkenhauer et al., 2003).
4
Introduction
With the definition of systems biology as the application of system theory to biology, the lack of applicable data can be understood. In systems and control theory,
perturbations, performed in a systematic manner, and the following dynamic
response of the system are used to deduce facts and information about the system.
If this approach is copied to biology, time series data of the biological system is
required. These kinds of measurements are preferably performed in vivo, to ensure
that the system is in its natural environment, and that is not an easy task. In addition, data need to be collected systematically and, as in all applications, preferably
with small influence of noise, which is hard to achieve in biological systems. It
is also difficult to perform measurements without altering the environment or the
state of the system under observation. The available data is more often steady
state data, which is a lot less informative since it does not contain any information
about the dynamic behaviour of the system.
The systems biology approach to a problem can be illustrated with the following
example of different levels of modelling of an enzyme catalysed transformation.
Example 2: Modelling of an enzymatic transformation
The three figures gathered below in Figure 1.3 illustrate the modelling of a transformation catalysed by an enzyme. The concept is further explained in Section 2.5.3.
The first of the figures is a simple view of the transformation and gives the information that an enzyme catalyses the transformation of the substrate, S, into the
product, P. The second figure is a more detailed description of how the transformation occurs. The enzyme and substrate form a complex which dissociates into product and free enzyme.

Figure 1.3. Three different levels of modelling of an enzyme catalysed reaction: (a) textbook drawing, (b) reaction view, and (c) circuit diagram, modified after (Wolkenhauer et al., 2003). The substrate is denoted S, the product P, and the enzyme-substrate complex ES.
The last of the figures is an elaborate view of this kinetic reaction:

S + E ⇌ ES → E + P

where k1 and k−1 are the forward and reverse rate constants of the complex formation, and k2 is the rate constant of the product formation step.
A mathematical model can be built from the kinetic interactions in the circuit
diagram. The diagram is a mapping of the four differential equations which describe
the dynamics of the reaction and the equations can be used in simulation of the
system (Wolkenhauer et al., 2003).
A circuit diagram is useful in a systems biology approach. It can be used as a module to build models of larger networks with these kinds of kinetics. On the other hand, this could also be the kind of information a systems biologist would like to obtain for a network with less known kinetics.
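As an illustration, the four differential equations implied by the circuit diagram can be written down and simulated directly. The sketch below is not taken from the thesis; the rate constants and initial amounts are hypothetical.

from scipy.integrate import solve_ivp

k1, km1, k2 = 1.0, 0.5, 0.3                # assumed rate constants

def mass_action(t, x):
    S, E, ES, P = x
    v_form = k1 * S * E                    # complex formation, S + E -> ES
    v_diss = km1 * ES                      # dissociation, ES -> S + E
    v_cat = k2 * ES                        # catalytic step, ES -> E + P
    return [-v_form + v_diss,              # dS/dt
            -v_form + v_diss + v_cat,      # dE/dt
            v_form - v_diss - v_cat,       # dES/dt
            v_cat]                         # dP/dt

sol = solve_ivp(mass_action, (0, 100), [1.0, 0.1, 0.0, 0.0])
print("final amounts (S, E, ES, P):", sol.y[:, -1])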
1.2.2 Why Perform Research on Systems Biology?
Why do we need information about the structure and dynamics of a system? Several reasons exist. Information provided by systems biology research is a part of
fundamental knowledge about cells and interactions within cells. The information
can also, for example, be used in medical applications.
A vast number of diseases are not dependent on a single mutated gene or, for
example, a misfolded protein. Instead, complex multimolecular interactions explain
the disease state (Morel et al., 2004). Therefore, the system of interactions needs
to be understood in order to explain the disease mechanism.
In addition, knowledge about the controlling mechanisms in a system can help in
identification of important nodes in the network, which are possible targets for
drug action or other treatments (Kitano, 2002).
A recent example of the utility of systems biology in medicine is a research project, called BioSim, on the possibility of replacing animal testing with computer modelling (BioSim Network, 2005). The aim of the project is to decrease the number of, or at
least improve, animal tests and speed up the drug development process. The computational modelling will hopefully give more detailed information on how drugs
affect the patient and how the drugs are processed in the body.
1.3 The Thesis

1.3.1 Aim and Audience
The aim of this thesis is to describe system identification methods applied to biochemical systems and the different problems arising when doing so. The report is
meant to give the necessary background information for the reader to understand
how the work has been carried out and what results this has led to.
The thesis is written with a reader in mind who has basic knowledge of biology as well as familiarity with topics like mathematics, control theory, and signal processing.
1.3.2 Structure
The thesis commences with an abstract summarising the work in a few lines, followed by acknowledgements and notation clarifications. As a guide to the thesis, a comprehensive table of contents as well as a list of figures and a list of tables are provided.
The introductory chapter is followed by a more extensive theory part meant to explain fundamental concepts needed for further reading. Following the theory block is a methods chapter where the different methods used in the thesis are demonstrated and analysed.
The results are presented and discussed in the subsequent chapters, and the main part of the thesis is completed with a chapter containing the conclusions from the work. Finally, a reference list and appendices are given.
Chapter 2
Theory
This chapter introduces necessary theory in order for the reader to understand
the different methods used in the work. The methods are further described in
Chapter 3.
2.1 Systems, Modelling, and Simulation
Basic concepts in modelling, simulation, and validation are explained in this section.
The view of system identification here is the one used in traditional engineering
applications. The information can be valuable to compare to the system identification methods that have so far been utilised and developed in systems biology, see
for example (Wahde and Hertz, 2000, Ideker et al., 2001, Friedman et al., 2000).
The information in this section is collected from (Ljung and Glad, 2004), if not
otherwise stated.
2.1.1 Systems
A broad definition of a system is a group of objects, or possibly a single object,
that we wish to study. Another definition, which does not include information about our interest in the system, is that a system is a group of elements interacting and bringing
about cause-and-effect relationships within the system (Close et al., 2002).
A system can be studied by performing experiments. The observable signals are
usually referred to as outputs. Signals that in turn affect the system are called
inputs.
Sometimes, it is not possible to perform all the experiments on the system one
would like, due to safety, financial, time, or technical reasons. In such cases, a model of the system is needed if it is to be examined.
2.1.2 Models
A model is a tool, describing the system of interest, that enables us to answer questions about the system without performing experiments. The model is a description
of how the elements in the system relate to each other (Ljung, 1987).
How detailed a model is depends on at what level we wish to examine the system. The level we choose depends on the purpose of the model.
The interest in modelling is not always focused on the details of the dynamics
within the system. The relation between the input and output signals can instead
be the main focus, and less attention is put on the interpretation of the detailed
internal dynamics. A model with this purpose is called a black box model. The
parameters in the model are estimated with the only purpose of connecting the
output and input signals, and are not associated with physical properties (Ljung,
1987). Sets of standard models also exist whose parameters can be related to physical properties. These mixed models are referred to as grey box models.
It is important to remember that a model is simply a model. An exhaustive description of the behaviour of a system is not possible. In addition,
observing the system always results in different kinds of noise, stochastic in nature,
bringing about unpredictable variations in the observations.
In engineering applications, the most extensively used models are of mathematical nature. A mathematical model defines, through distinct equations, how the
elements of a system relate to each other.
A mathematical model can be time discrete or continuous, deterministic or stochastic, linear or nonlinear, and lumped or distributed. What attributes to assign to a
model is dependent on the system and what information we can retrieve from it.
A simple, time continuous, lumped model of a biological system is given in the
example below.
Example 1: The Lotka-Volterra equations
dN1(t)/dt = a N1(t) − b N1(t) N2(t)
dN2(t)/dt = −c N2(t) + d N1(t) N2(t)
The equations describe the dynamics between the populations of prey, N1 (t), and
predator, N2 (t), respectively. The variables can represent population density or
biomass. Even though this model is simple, it is highly nonlinear and cannot be
examined analytically.
To form the equations, several assumptions have been made regarding growth,
death, and predation. The assumptions, analysis of the system, and extensions of
the model can be found in (Edelstein-Keshet, 1988). The equations are originally
from a classical article by Volterra.
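Simulation is therefore the natural way to study the model. A minimal sketch, with hypothetical values of the parameters a, b, c, and d:

from scipy.integrate import solve_ivp

a, b, c, d = 1.0, 0.2, 0.8, 0.1            # assumed growth, predation, and death rates

def lotka_volterra(t, N):
    N1, N2 = N                             # prey and predator populations
    return [a * N1 - b * N1 * N2,
            -c * N2 + d * N1 * N2]

sol = solve_ivp(lotka_volterra, (0, 50), [10.0, 5.0], max_step=0.05)
print("prey oscillates between", sol.y[0].min(), "and", sol.y[0].max())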
2.1.3 Simulation
A model can be used to deduce the behaviour of a system under certain conditions,
which correspond to experimental tests. The deduction of results can be performed
by analytical computation or by performing numerical calculations using the model.
The latter is what we call simulation. If we think of a model as a set of instructions,
for example equations in the case of mathematical models, a simulator obeys these
instructions and generates the behaviour (Zeigler et al., 2000).
2.2 System Identification
System identification is about constructing a model of a system from observed
data. The information in this section is a summary of the introductory chapters of
(Ljung, 1987).
2.2.1 The Identification Process
The system identification process can be divided into three phases: recording data,
obtaining a set of candidate models, and choosing the best of the candidate models.
Recording data Input and output signals are monitored and recorded during
experiments. The objective is to generate data carrying as much information
as possible about the system. Special identification measurements can be
made, or data can be gathered during normal operation of the system.
Obtaining a set of candidate models This step is the most crucial and difficult one in the identification procedure. A set of models has to be chosen on the basis of the current knowledge of the system. One can choose from sets of basic models, or a model can be developed using physical characteristics of the system. A model from a set of standard models is a black box or a grey box model.
Choosing the best of the candidate models according to data In this phase,
the actual identification occurs. Parameters in the chosen model are estimated from the data.
2.2.2 Validation
Assume we have obtained a model of a system from our recorded data. Can we
trust the model? We have to examine if the model is useful for the purpose we
intended it to. This is what is more known as validation.
A model is generally considered to be useful, if it can reproduce experimental
data. We compare the behaviour of the model with the system behaviour (Ljung
and Glad, 2004). Important to remember is that a model is only valid within its
validated area. It is never advisory to extrapolate information from a model.
What if we are not satisfied with the result that our model produced? Then, we
need to go back to our identification process and change one or more of the criteria.
Going through the process again will produce a different, and perhaps better result.
2.3 Biological Network Structures
Different types of biological networks exist. In this thesis, the emphasis is on the
molecular level: intra-cellular networks and components. An account of different
types of intra-cellular networks is given in this section, in order for the reader to
understand the kind of networks that the identification methods in the thesis may
be applied to.
An arbitrary network within a cell is a biochemical network, since it consists of
entities in a biological unit, the cell, that interact through some kind of chemical reaction. The grouping into separate kinds of networks differs in various articles
and textbooks. Biochemical networks are sometimes not considered to include the
genetic networks of the cell. However, the networks of the cell are divided into
metabolic, gene regulatory, and signalling networks in this thesis.
2.3.1 Metabolic Networks
The majority of biochemical reactions are degradative or synthetic (Zubay et al.,
1995). The synthetic reactions, also called anabolic reactions, are the basis for
the formation of large biomolecules from simpler units. An example is the assembly of proteins from amino acids. The catabolic reactions are degradative and involve the breakdown of larger, complex organic molecules into smaller components (Zubay et al., 1995). The β-oxidation of fatty acids is a catabolic process.
The catabolic and anabolic processes comprise the metabolism of the cell.
A metabolic pathway is a complex series of reactions (Zubay et al., 1995) and the
word series emphasises the directional property of the pathways; there is a net
flow of mass in a specific direction or a specific end purpose of the pathway, for
instance one, or several, products. A pathway can be considered as a network with
a relatively small number of interconnections. When different pathways share components, for example second messengers, networks of higher order and complexity
occur.
In describing a metabolic network, one usually distinguishes biochemical connections from controlling connections. Controlling links indicate that two nodes are connected without any mass transfer; instead, one species influences the production or consumption rate of the other species.
The controlling links are not always represented in traditional pathway descriptions,
although they are important. Controlling links are part of feedback and feedforward
loops that are important for the stability and robustness of the pathway.
2.3.2 Gene Regulatory Networks
The central dogma of molecular biology states that DNA is replicated, transcribed
into RNA, and translated into proteins. DNA interacts with a vast set of molecules
in the cell, from complex proteins to simple transcription factors.
The RNA molecules usually make up the nodes of a gene regulatory network. Mass flow is not as frequent in these networks as in a metabolic network; instead, controlling mechanisms dominate.
2.3.3 Signalling Networks
Cells receive, process, and respond to information in a process called signal transduction (Lodish et al., 2000). The signals are mediated by signalling pathways, and
since components often interact between the pathways, they form networks (Bhalla
and Iyengar, 1999). Mechanisms of information transfer might be protein-protein
interactions, enzyme activity regulations, or phosphorylation. The last form of
transfer is exemplified below using the highly conserved kinase cascade where Mitogen Activated Protein Kinase (MAPK) is activated (Lodish et al., 2000).
Example 2: MAPK Cascade

The cascade is built up by several levels where a kinase in the upstream level phosphorylates a kinase in the level downstream. A MAPK cascade with all controlling elements included is given in Figure 2.1.
Figure 2.1. A MAPK cascade. (The figure shows Ras/MKKKK, the levels MKKK, MKK, and MAPK with their phosphorylated forms MKKK-P, MKKK-PP, MKK-P, MKK-PP, MAPK-P, and MAPK-PP, and the rates v1 to v12, with activating, inhibiting, and intramodular interactions indicated.)
2.4 Identification of Biological Networks
System structure identification corresponds to the first of the four parts of understanding a biological system described in Section 1.2.1. In this thesis, the focus
is on network structure identification. Network structure identification is not as broad a concept as system structure identification and excludes, for example, the
identification of structural connections among cells as well as cell-cell association
(Kitano, 2001).
2.4.1 Identification Approaches
The identification of a network includes finding all the components, their function,
and how they interact. This is a difficult task since this kind of information cannot be inferred from experimental data based on some general rules or principles.
Biological systems are stochastic in nature and not necessarily optimal (Kitano,
2001).
In addition, several network realisations might produce similar experimental data
and the identification involves singling out the correct one. This corresponds to
the process of finding the best model out of a set of candidate models as described
in Section 2.2.1.
There are two general approaches in network structure identification: bottom-up
and top-down identification.
Bottom-up Different sources of data, for example literature and experiments, are
integrated in order to understand the network of interest. This data-driven
approach is mostly suitable when almost all pieces of the network already are
known and the quest is to find the missing parts.
Top-down A more hypothesis-driven approach where high-throughput data is
utilised in trying to determine the network structure. Some information
about the network is usually needed beforehand, but not as extensive as
in the bottom-up approach.
2.5 Chemical Kinetics
Chemical kinetics is the study of rates of chemical reactions under consideration
of all the intermediates in the process (Atkins and Jones, 1999). The area also
examines the details about how reactions advance and what determines the rate of
a chemical reaction.
2.5.1 Rate Laws and Reaction Mechanisms
Dinitrogen pentoxide, N2O5, is an inorganic compound present in solid form at room
temperature. The chemical structure of the substance is depicted in Figure 2.2 with
the covalent bonds linking the different atoms in the molecule. More information
on the substance can, for example, be found in (Linstrom and Mallard, 2003).
Figure 2.2. Chemical structure of dinitrogen pentoxide.

At a temperature of 67°C, the compound is in gaseous form and decomposes into nitrogen oxide and oxygen according to

2 N2O5(g) → 4 NO(g) + 3 O2(g)    (2.1)

where the variables x1, x2, and x3 denote the amounts of N2O5, NO, and O2 respectively.
A plot of data for the reaction is shown in Figure 2.3.
Figure 2.3. Decomposition data for dinitrogen pentoxide (amount of N2O5 in M versus time in min).
From this graph, one can observe that N2O5 is consumed fast initially, but that the consumption rate decreases gradually. If we instead plot the logarithm of the data against time, we get the straight line shown in Figure 2.4. This confirms that the reaction is of first order, which means that dinitrogen pentoxide is consumed at a rate directly proportional to its amount.
Figure 2.4. Logarithmic plot of decomposition data.

A mathematical model of the reaction is made with differential equations. The amount of each compound is, as before, denoted xi and the rate is denoted r:

dx1/dt = −2r = −2k x1    (2.2)
dx2/dt = 4r = 4k x1    (2.3)
dx3/dt = 3r = 3k x1    (2.4)
The rate constant for the decomposition step of equation (2.2) is defined as the
proportionality constant (excluding the sign), and is hence 2k. It can be deduced
from the experiments as the slope of the straight line in Figure 2.4. Observe that
the formation parts of the reaction, equations (2.3) and (2.4), have other rate
constants, since the stoichiometric ratios between the species differ.
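As an illustration of the procedure, the sketch below generates synthetic first-order decay data and recovers the rate constant 2k as minus the slope of the log-linear fit; the value 0.35 used to generate the data is an assumption for the example.

import numpy as np

t = np.linspace(0, 5, 20)                  # time (min)
x1 = 1.0 * np.exp(-0.35 * t)               # simulated amount of N2O5, 2k = 0.35

slope, intercept = np.polyfit(t, np.log(x1), 1)   # straight line in the log plot
print(f"estimated 2k = {-slope:.3f} per min")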
The order of a reaction cannot in general be determined from the reaction formula;
it is a property determined by experiments. The order varies from one chemical
reaction to another and fractional orders also exist (Atkins and Jones, 1999).
If two species react to form a third species, the overall order is defined as the sum of the orders for each reactant. For the example below, the order is a + b:

A + B → C    gives    dxA/dt = −k (xA)^a (xB)^b
The difficulty of extracting the rate law for a reaction directly from its equation
owes to the fact that all, but the simplest, reactions occur in several steps called
elementary reactions (Atkins and Jones, 1999). All steps might not be given in a
reaction formula. To understand how the reaction proceeds, a mechanism needs to
be proposed.
Assume that we have a total reaction A + 2B → C + D. A possible reaction mechanism is given below, where X and Y represent intermediates.

A + B → X        r1 = k1 xA xB
X → C + Y        r2 = k2 xX
Y + B → D        r3 = k3 xB xY

Total: A + 2B → C + D
The deduction of a rate law (and the reaction order) from a mechanism can be done
by employing different methods, and all include some kind of approximation of, or
assumption concerning, the dynamics of the mechanism. The different methods
can sometimes give the same rate law for a given mechanism. One method, called
the steady state approximation, is employed in Section 2.5.3.
2.5.2 Equilibrium and Steady State
The simple reaction given by formula (2.1) is an irreversible reaction, meaning that
it only proceeds in one direction. This is a simplification in the modelling of the
dynamics; in fact, all reactions also have a reverse course of events compared to
the forward reaction. For the dinitrogen pentoxide reaction this means that some
amount of the product always decomposes back into reactants.
Reactions modelled as irreversible have a reverse reaction rate small enough to be
neglected in modelling of the process. Reactions not possessing this property have
forward and reverse reaction rates of the same magnitude; the reaction is reversible.
This is depicted as
A + B ⇌ C + D    (forward rate constant k1, reverse rate constant k−1)
If the reverse and forward reaction rates are equal, the reaction has reached chemical equilibrium. A chemical equilibrium is characterised by a minimum of the free
energy for the reaction (Atkins and Jones, 1999). There is no inclination for change
in any of the directions for the reaction.
The equilibrium constant is defined as

K = xC xD / (xA xB)

If the reverse and forward reactions both are of simple second order, the equilibrium will correspond to k1 xA xB = k−1 xC xD, and K = k1/k−1. If K is large, a lot of product is produced before equilibrium is reached, since the rate constant for the forward reaction then is larger than the reverse rate constant (Atkins and Jones, 1999).
In cells, compounds can exist in concentrations far from their chemical equilibrium
states (Zubay et al., 1995). These states are connected to a larger free energy, and
the concentrations are not only governed by the external environment, as it is in
chemical equilibria.
The rates in reaction sequences vary according to the cell's requirements. The
concentrations of key metabolites are held at constant levels by balancing the rates
of production and consumption of reaction intermediates (Zubay et al., 1995).
A reaction with an intermediate species, X, is depicted below. The concentration
of the intermediate is held at a constant level. Mathematically, this corresponds to a derivative of zero for the amount of the component: dxX/dt = 0. This is a situation that is not a chemical equilibrium; the intermediate resides in what is called a steady state.

A + B ⇌ X ⇌ C + D
2.5.3 Enzyme Kinetics
Enzymes are proteins acting as catalysts to chemical reactions in cells. Enzymes
have an active site where the reaction takes place. The molecule which the enzyme acts upon is called substrate. Enzymes work by decreasing the activation
free energy, i.e., the energy needed for the substrate(s) to enter into a state of
transformation, called the transition state (Zubay et al., 1995).
Enzyme catalysed reactions have a feature that distinguishes them from simpler chemical reactions; they show saturation (Cornish-Bowden and Wharton, 1988). Almost all reactions of this type are of first order for small substrate concentrations, but the order decreases with increased substrate concentration. The rate eventually becomes constant, independent of the concentration of substrate.
The behaviour can be observed if the rate of a reaction can be deduced from
a set of experiments with different substrate concentrations. For each substrate
concentration, a small amount of the catalysing enzyme is added, and the amount
of formed product is monitored during a time span. The monitoring can be done
with light spectrometry, provided that the product absorbs light. The amount
of product is plotted against time, and the slope of the curve at the start of the
experiment corresponds to the rate. An artificial curve of the typical behaviour is
given in Figure 2.5.
Figure 2.5. Typical dependence of reaction rate on substrate concentration for a reaction following Michaelis-Menten dynamics (see below). Km is the substrate concentration giving a rate of 0.5V.

The constants Km and V in the figure are parameters in the rate law for the reaction, which is deduced from the following reaction mechanism, where E represents the enzyme, S the substrate, and P the product:

E + S ⇌ ES → E + P

Here k1 and k−1 are the forward and reverse rate constants for the complex formation and k2 is the rate constant for the product formation step. The amounts of free enzyme, substrate, complex, and product are Etot − xES, xS, xES, and xP respectively.
The amount of free substrate is considered to be much larger than the amount
bound to the enzyme-substrate complex, ES, and it is hence assumed that the
free amount of substrate and the total amount are equal (Cornish-Bowden and
Wharton, 1988). The conversion of ES to free product and free enzyme, E + P , is
considered to be irreversible, if one only measures the initial rate of the reaction in
the steady state (the regeneration of ES is negligible since the amount of product
is small) (Zubay et al., 1995).
The steady state assumption for this mechanism is that the amount of the intermediate species does not change:

dxES/dt = k1 (Etot − xES) xS − k−1 xES − k2 xES = 0

Solving the algebraic expression for xES gives

xES = k1 Etot xS / (k−1 + k2 + k1 xS)

and since the dissociation of the enzyme-substrate complex is of first order with respect to the intermediate, we have a rate of the form below, where Km = (k−1 + k2)/k1 and V = k2 Etot:

v = k1 k2 Etot xS / (k−1 + k2 + k1 xS) = V xS / (Km + xS)    (2.5)
This equation is known as the Michaelis-Menten equation and Km consequently as the Michaelis constant, although it was Briggs and Haldane who in 1925 proposed the mechanism (Cornish-Bowden and Wharton, 1988, Zubay et al., 1995). Michaelis and Menten in fact assumed that the first step was an equilibrium, which is a less general assumption than the one made above. The equation can be extended to describe a mechanism with several substrate molecules.
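A simple numerical check of the steady state approximation is to simulate the full mass-action mechanism and the reduced Michaelis-Menten model side by side. The sketch below does so for hypothetical rate constants, with the total enzyme amount much smaller than the substrate amount:

from scipy.integrate import solve_ivp

k1, km1, k2, Etot = 10.0, 5.0, 1.0, 0.01   # assumed constants, Etot << xS
Km, V = (km1 + k2) / k1, k2 * Etot         # Michaelis constant and limiting rate

def full(t, x):                            # full mechanism
    xS, xES, xP = x
    dES = k1 * (Etot - xES) * xS - (km1 + k2) * xES
    return [-k1 * (Etot - xES) * xS + km1 * xES, dES, k2 * xES]

def mm(t, x):                              # reduced model, v = V xS / (Km + xS)
    xS, xP = x
    v = V * xS / (Km + xS)
    return [-v, v]

sol_full = solve_ivp(full, (0, 200), [1.0, 0.0, 0.0], rtol=1e-8)
sol_mm = solve_ivp(mm, (0, 200), [1.0, 0.0], rtol=1e-8)
print("product, full mechanism:   ", sol_full.y[2, -1])
print("product, Michaelis-Menten: ", sol_mm.y[1, -1])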
Substances called activators and inhibitors affect enzyme activity and cause the
catalysed reaction to proceed faster or slower respectively.
The most important kinds of inhibition, although several others exist, are results
of the inhibitor binding to the enzyme (Cornish-Bowden and Wharton, 1988).
The simplest type of inhibition is linear, meaning that terms proportional to the
inhibitor concentration appear in the denominator of the rate law (Cornish-Bowden
and Wharton, 1988). If the enzyme-catalysed reaction follows Michaelis-Menten dynamics in the absence of inhibitor, it will still do so when inhibitor is present, with modifications of the effective parameter values.
Competitive inhibition is the most common kind (Cornish-Bowden and Wharton,
1988) and occurs when the substrate, S, and inhibitor, I, compete for the free
enzyme. If the inhibitor binds, it will result in an inactive complex, EI, that
does not lead to products. The Michaelis constant will have an altered effective
value, while the limiting rate, V , will have the same effective value as before. The
interpretation of this is that the enzyme-substrate complex is as reactive as before,
but the effective affinity of the substrate to the enzyme is decreased (Cornish-Bowden and Wharton, 1988).
Uncompetitive inhibition, on the other hand, is a result of the inhibitor binding to
the enzyme-substrate complex and producing an inactive complex, ESI. A pure
form of this kind of inhibition is uncommon, and is most important as product
inhibition (Cornish-Bowden and Wharton, 1988). The effective limiting rate and
Michaelis constant are both affected, but their ratio is not.
If an inhibitor binds to both the free enzyme and to the enzyme-substrate complex, the inhibition is called mixed. In some books, the term non-competitive
inhibition occurs, and is regarded as an additional form of inhibition. In this case,
the inhibitor affects the enzyme without affecting binding of the substrate, and the
inhibition is independent of substrate concentration (Cornish-Bowden and Wharton, 1988, Zubay et al., 1995). According to (Cornish-Bowden and Wharton, 1988),
this is not a plausible mechanism in nature, and there are no examples recorded,
if effects of the pH are excluded.
Activators are effectors that bind to an enzyme and increase enzyme activity without being changed in the reaction (Cornish-Bowden, 1995). Specific activation
occurs when the enzyme, in the absence of the activator, does not have any activity. In analogy with the competitive inhibition, the effective value of the Michaelis
constant is altered. An activation counterpart to the mixed inhibition also exists.
More complex expressions for the rate law of enzyme activation occur when the
enzyme is less active, but not inactive, in the absence of activator (Cornish-Bowden,
1995).
A simple model for activation is to insert the concentration of the activator in
the numerator of the rate law. The model might be used when the underlying
mechanism of activation is unknown.
Most enzymes follow Michaelis-Menten kinetics, but some do not. These enzymes
show a sigmoidal dependence of the rate on substrate concentration, instead of a hyperbolic dependence as in Figure 2.5. Often, these enzymes have controlling, or regulatory,
tasks in a biological network.
The sigmoidal response is linked to cooperativity of the enzyme. An enzyme exhibiting cooperativity will be ultra-sensitive to changes in the substrate concentration, which is not the case for Michaelis-Menten kinetics (Cornish-Bowden, 1995). Positive cooperativity can, for example, occur if an enzyme binds several substrate molecules and the binding of subsequent substrates is facilitated by previous binding of substrate.
Chapter 3
Methods
Several identification methods were used in the thesis work and they are described
and analysed in this chapter. Emphasis is given to the mathematical basis for each
method and to the data required to apply them to biochemical systems.
3.1 Methods Based on Local Parameter Perturbation Data
Local parameter perturbation data is based on steady state measurements. The
amount of each interesting species does not change, which corresponds to a derivative equal to zero for the variables representing these amounts.
Consider a network consisting of n components (or n groups of components), each
described by a state variable, xi (t), that (in some units) represents the amount of
the component. The variables are gathered in a vector, x(t) = (x1 (t), . . . , xn (t)).
The system is modelled by a set of differential equations where the rates of change of
the variables are dependent, in addition to the variables xi , on a set of parameters,
p = (p1 , . . . , pm ).
ẋ1(t) = f1(x1(t), . . . , xn(t), p1, . . . , pm)
ẋ2(t) = f2(x1(t), . . . , xn(t), p1, . . . , pm)
   ⋮
ẋn(t) = fn(x1(t), . . . , xn(t), p1, . . . , pm)    (3.1)
A component fi is not explicitly dependent on all the parameters in p, but the
exact dependence is not known. The steady state is given by ẋ(t) = f (ξ(p̄)) = 0,
where ξ(p̄) is a vector of the steady state values of the state variables, associated with a specific set of parameter values p̄.
We assume that it is possible to perturb the parameters in p individually. A
realisation of this could be the addition of specific inhibitors or activators that
affect the catalysing enzymes in the network. Another possibility is to introduce a
double stranded RNA that interferes with a gene product, which can subsequently
lead to a reduction of the amount of a protein; this RNA interference technique is often denoted RNAi. Each affected enzyme is responsible for catalysing a reaction dependent
upon a parameter pj . Individual additions of activators, RNAi, or inhibitors are
repeated for all parameters.
The result will be m + 1 different steady state measurements of all state variables, including the reference steady state, ξref. Each of the different measurements is associated with a specific set of parameter values.
A perturbation experiment involving an alteration of the parameter pj results in
the following quotient, which approximates the sensitivities of the steady state
values to changes in the parameter:
σij = Δξi/Δpj = (ξi − ξi,ref)/(pj − pj,ref),    i = 1, . . . , n    (3.2)
The calculations are repeated for all parameters and the result is gathered in a matrix Σ = (σij ). This matrix has the dimensions (n×m) and it is the approximation
of ∂ξi /∂pj for i = 1, . . . , n and j = 1, . . . , m.
If the exact changes of the parameter values are unknown, the sensitivities can be
approximated in a different manner, explained in (Kholodenko and Sontag, 2002).
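In code, the construction of the Σ-matrix is a plain finite-difference loop over the parameters. A minimal sketch, assuming a hypothetical routine steady_state(p) that returns the steady state of system (3.1) as an array, for example by simulating the differential equations to convergence:

import numpy as np

def sensitivity_matrix(steady_state, p_ref, rel_step=0.05):
    """Approximate Sigma = (sigma_ij) from equation (3.2)."""
    xi_ref = steady_state(p_ref)           # reference steady state
    n, m = len(xi_ref), len(p_ref)
    Sigma = np.zeros((n, m))
    for j in range(m):                     # perturb one parameter at a time
        p = p_ref.copy()
        p[j] = p_ref[j] * (1.0 + rel_step)
        Sigma[:, j] = (steady_state(p) - xi_ref) / (p[j] - p_ref[j])
    return Sigma                           # (n x m) approximation of the sensitivities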
3.1.1 Interaction Graph Determination
An account of the top-down identification method from (Kholodenko and Sontag,
2002) is given in this section. Assuming that the components in the network are
known, the method aims at determining the interaction graph of the system.
The interaction graph, also named connection graph, is a description of how each
node in the network qualitatively affects the other nodes. The interaction graph
is a basis for further investigation of the network and its dynamic behaviour. A
simple interaction graph for a network of three components is shown in Figure 3.1.
How the different nodes in the network affect each other is displayed with the
arrows: the net effect is negative or positive with respect to the amount of the
component(s) represented by each node. Exactly how the network is built is not
always trivial to extract from the interaction graph. The graph only describes the
functional dependencies of the total rate of change of each variable with respect to
other variables in the system.

Figure 3.1. A simple interaction graph.
The deduction of the network wiring is almost always impossible if the nodes of
the network are connected by mass flow, since several network structures can correspond to the same graph. The impossibility of distinguishing between network
structures is shown in the example below.
The method also allows a modular approach, where several components can be
represented by each node. The same method, with minor variations, is described
in (Kholodenko et al., 2002) and is applied to a gene regulatory network.
The interaction graph can be deduced from the Jacobian, fx = (fik), of the general system (3.1) of differential equations, where

fik = ∂fi/∂xk (x̄, p̄)

A connection from node i to node k in the interaction graph corresponds to a non-zero element in position (k, i) of the Jacobian.
Example 1: Network uniqueness
Two network architectures are given in Figure 3.2. The two networks have some
mass flow between their nodes.
Both systems have the following qualitative Jacobian. The deduction of the network architecture from this Jacobian will not lead to a unique network.

        A   B   C
  A     −   0   +
  B     +   −   −
  C     0   0   −
Figure 3.2. Two network architectures.
The intention of finding the Jacobian of the system from steady state data cannot
be entirely realised, since a multiplication of each row in the Jacobian with a
constant also is a solution to ẋ = 0. Hence, the rows can only be determined up
to a scalar multiple. This is still enough to determine the interaction graph since
we are only interested in whether an element of the Jacobian is non-zero or not.
A few assumptions are needed to make the method legitimate. There must, for
each node, represented by a variable xi , exist other nodes that are not directly
connected to the current node i. This means that there is a set of parameters for
which ∂fi/∂pj = 0 for each i ∈ {1, . . . , n}. The required set of parameters has to, for each xi, correspond to n − 1 independent columns from the Σ-matrix.
For each node i, perturbations must be made to n − 1 other nodes, and the parameters that are perturbed cannot be a part of any of the rates that lead to or
from the node i. It is not necessary to perform n · (n − 1) perturbations, since
each perturbation can be used for more than one node. To use the method, some
information about how the nodes are connected must be known beforehand. The
information needed is, for example, that a subset of the nodes are known to be
unconnected to the remaining set of nodes. If the network connections only are of
controlling nature, the assumptions are easy to fulfill.
For one state, i.e., one row in the Jacobian, the following is valid:
∂/∂pj [fi(x̄, p̄)] = ∂fi/∂x (x̄, p̄) ∂x̄/∂pj + ∂fi/∂pj (x̄, p̄)

The assumptions above and the fact that the system is in steady state produce the following orthogonality criterion:

∂fi/∂x (x̄, p̄) ∂x̄/∂pj = 0    (3.3)

for each pj that fulfills the first assumption for the particular fi.
This information is all we need to reproduce the rows of the Jacobian up to scalar multiples. The assumptions guarantee that the orthogonality criteria correspond to a system of linear equations from which we can produce the values of each row. Also, by fixing the diagonal elements in the Jacobian to −1, the linear equation system is reduced and can be solved using least squares methods.
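A sketch of this last step is given below. One row of the Jacobian is recovered from the orthogonality criterion (3.3), with the diagonal element fixed to −1 and the remaining entries obtained by least squares. The selection of Σ-columns valid for the row (perturbations that do not enter fi directly) is assumed to have been made beforehand:

import numpy as np

def jacobian_row(Sigma_cols, i):
    """Sigma_cols: (n x k) matrix of k >= n-1 perturbation columns valid for row i."""
    n = Sigma_cols.shape[0]
    # The row f fulfills f Sigma_cols = 0. With f[i] = -1 this becomes
    # sum_{l != i} f[l] * Sigma_cols[l, :] = Sigma_cols[i, :].
    others = [l for l in range(n) if l != i]
    A = Sigma_cols[others, :].T            # one equation per perturbation
    b = Sigma_cols[i, :]
    f_others = np.linalg.lstsq(A, b, rcond=None)[0]
    row = np.full(n, -1.0)
    row[others] = f_others
    return row                             # row i of the Jacobian, up to scale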
3.1.2 Determination of Control Loops in Mass Flow Networks
A problem with the interaction graph identification method in Section 3.1.1 is that
the interaction graph can correspond to several non-separable network architectures. It is not possible to separate mass flow connections from control loops, and
the interaction graph tends to become complex when mass flow is present.
The method of this section aims to identify the control loops in a network where
the mass flow is known beforehand. The applicability of the method is of course limited by the assumption that the mass flow is known, but this is not at all an unlikely situation. Situations may exist where the "backbone" of the network is known from experiments, but nothing is known of the internal control mechanisms.
The set of differential equations from the system (3.1) can be represented in the
following manner where the fi components are replaced by the rates. The time
dependence has been left out, as well as the dependence of the rates on the state
variables, to simplify the notation:
ẋ1 = m1,1 r1(p1) + m1,2 r2(p2) + . . . + m1,q rq(pq)
ẋ2 = m2,1 r1(p1) + m2,2 r2(p2) + . . . + m2,q rq(pq)
   ⋮
ẋn = mn,1 r1(p1) + mn,2 r2(p2) + . . . + mn,q rq(pq)
The kinetics of the rates are not known, but the mass flow in the network is reflected by the knowledge of the mi,l elements. If the stoichiometric ratios of the network are one-to-one for all species, the matrix M = (mi,l) only contains 0, 1, or −1 in appropriate places. Row i of the M-matrix has zeros for the rates not leading to or from the node represented by variable i. The differential equations are summarised as

ẋ = M r(p)    (3.4)
Examining a specific row i, denoted M (i)r(p), in the relation above and differentiating it with respect to the parameter pj will result in:
∂/∂pj (M(i) r(p)) = ∂/∂pj (mi,1 r1 + . . . + mi,q rq) =
mi,1 ∇r1 ∂x/∂pj + mi,1 ∂r1/∂pj + . . . + mi,q ∇rq ∂x/∂pj + mi,q ∂rq/∂pj    (3.5)
The differentiation of the arbitrary row from expression (3.5) can be represented in a more compact form, where rx = (∇r1 ∇r2 . . . ∇rq)T quantifies how the rates depend on the state variables:

M(i) rx ∂x/∂pj + M(i) (∂r1/∂pj, . . . , ∂rq/∂pj)T    (3.6)
The parameter pj in equation (3.6) can be chosen to produce a cancellation of the last term of the equation. If the rates that lead to or from the state represented by row i are independent of the parameter, we have cancellation. Selecting a parameter with these properties will produce the following criterion, similar to the orthogonality criterion of equation (3.3):

M(i) rx ∂x/∂pj = 0    (3.7)
The vector ∂x/∂pj is approximated by the corresponding column of the Σ-matrix
defined earlier in this chapter. Since the matrix M is known, the criterion results in an equation in some of the entries of the rx-matrix. Combining all different rows and ∂x/∂pj vectors as in equation (3.7) will produce an underdetermined equation system, assuming that each rate cannot be perturbed more than once.
The assumption is reasonable, since we only perform small perturbations and the
response should be similar whatever kind of perturbation we use on parameters in
the same rate, i.e., additional perturbations will not add any independent equations.
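For illustration, the sketch below assembles the equations given by criterion (3.7) into one linear system in the entries of rx; as noted, the system is in general underdetermined. The selection of valid (row, parameter) pairs is assumed to have been made beforehand:

import numpy as np

def control_loop_equations(M, Sigma, valid_pairs):
    """One equation M(i) r_x s_j = 0 per pair (i, j); s_j is column j of Sigma.

    Returns A such that A vec(r_x) = 0, with vec(r_x) the row-major
    flattening of the unknown (q x n) matrix r_x."""
    rows = []
    for i, j in valid_pairs:
        # Coefficient of r_x[l, k] is M[i, l] * Sigma[k, j]
        rows.append(np.outer(M[i, :], Sigma[:, j]).ravel())
    return np.array(rows)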
Further analysis of the method can be found in Chapter 4.
3.2 Methods Employing Time Series Data
The methods in the previous section are based on steady state data. As noted,
steady state data is not as informative as time series data and it is not possible to
exactly determine the Jacobian of the system.
Time series data is retrieved by recording the amounts of the involved species during a time span. The data is informative but hard to collect. Measurements in vitro of dynamic species are not an easy task, but can be achieved using elaborate methods, including different kinds of spectrometry.
Sampling of the data is needed, for both continuous and discrete data records. The intention is for the data to pick up the dynamics of the system, and in order to do so, the choice of sampling interval must be made with care.
The sampling interval, T , is the time between samples. The angular frequency of
the sampling is subsequently defined as ωs = 2π/T = 2πfs and naturally called
(angular) sampling frequency.
A continuous signal is presumed to have a Fourier transform describing its frequency
content. Sufficient, but not necessary, conditions for the transform to exist are the
Dirichlet conditions (Svärdström, 1999). Periodic functions as well as the unit step
function do not fulfill the requirements in these conditions. For these signals, a
transform is defined in the limit allowing generalised functions like the Dirac pulse.
The transform is defined as

F[x(t)] = X(ω) = ∫_{−∞}^{∞} x(t) e^(−jωt) dt
where ω is the angular frequency.
Sampling of a continuous signal is modelled by a multiplication of the signal with a sequence of impulses, δT(t) = Σ_{n=−∞}^{∞} δ(t − nT), where δ represents the Dirac pulse.
The sampling interval is the time between the pulses. The procedure of sampling is
illustrated in Figure 3.3. The transform of the signal is repeated with a distance of
ωs rad/s in the frequency domain. The signal is limited in the time domain, since
we only can sample for a finite time span. A time limited signal cannot be band
limited; the transform is infinite in the frequency domain (Svärdström, 1999).
Figure 3.3. Sampling of a continuous signal.
The unlimited transform causes problems during sampling. The frequencies above
the Nyquist frequency, ωN = πfs = ωs /2 rad/s, are folded into the transform,
and are instead misinterpreted as lower frequencies. The effect is illustrated in
Figure 3.4. Consequently, there are two problems in sampling: the folding of frequencies above the Nyquist frequency, and the necessity of limiting the signal in time. Both effects distort the transform of the continuous signal.
Figure 3.4. Frequencies above ωN are folded into the angular frequency range [−ωN, ωN].
The folding effect is also known as aliasing. A way to avoid the aliasing effect is to apply an anti-alias filter to the data before the sampling is made. An anti-alias filter is a regular low-pass filter with the cut-off frequency chosen a bit below the Nyquist frequency (Ljung and Glad, 2004). In this way, the frequency content above ωN is lost, and does not disturb the remaining information by folding effects. In addition, if there is high-frequency noise in the data, it is also removed by the anti-alias filter (Ljung, 1987).
A rule of thumb is to choose the sampling frequency tenfold greater than the bandwidth we wish our model of the system to cover. The choice corresponds to 4-8 sample points on the flank of the step response from the system (Ljung and Glad, 2004). It might therefore be of interest to measure a step response from the system beforehand.
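A minimal sketch of this preprocessing in Python with SciPy; the toy signal, the sampling rates, and the cut-off factor are hypothetical choices for illustration only:

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 100.0                       # original sampling frequency (Hz)
    fs_new = 10.0                    # roughly ten times the model bandwidth
    t = np.arange(0.0, 10.0, 1.0 / fs)
    x = np.sin(2 * np.pi * 0.5 * t) + 0.05 * np.random.randn(t.size)

    # Anti-alias filter: a low-pass filter with cut-off a bit below the new
    # Nyquist frequency (here 80 % of fs_new/2, normalised to fs/2).
    b, a = butter(4, 0.8 * (fs_new / 2) / (fs / 2))
    x_filt = filtfilt(b, a, x)

    x_dec = x_filt[::int(fs / fs_new)]   # decimate to the new sampling rate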
3.2.1 Linearising Around an Operating Point
Consider a non-linear network that resides in a steady state, which is the operating
point. How the network is wired in this steady state is, as before, given by the
Jacobian of the system at the operating point and is what we wish to determine.
The system is described using the following setup, similar to the one given by
equation (3.1):
ẋ1(t) = f1(x1(t), . . . , xn(t), u1(t), . . . , uk(t))
ẋ2(t) = f2(x1(t), . . . , xn(t), u1(t), . . . , uk(t))
...
ẋn(t) = fn(x1(t), . . . , xn(t), u1(t), . . . , uk(t))    (3.8)
Compared to equation (3.1), the dependence on the parameters is not explicitly
stated in the representation and the variables ui (t) are added that correspond to
input signals to the system. The variables are gathered in an input signal vector, u(t). The representation in equation (3.8) implies that there is a way to affect the system, in a controlled manner, by changing the input signals.
The steady state of the system corresponds to the specific values (x0, u0) of the state variables and the input signals. The system is observed in a vicinity of the steady state, where the small deviations, (x(t) − x0, u(t) − u0), are represented by (x̃(t), ũ(t)). The deviations are often called perturbations. Substituting the expressions of the deviations into the differential equation system of (3.8) and omitting the time dependence will give
d/dt (x0 + x̃) = f(x0 + x̃, u0 + ũ)    (3.9)
By expanding the left hand side of the expression and noting that dx0 /dt = 0 and
du0 /dt = 0 as well as expanding the right hand side in a Taylor series around the
steady state (x0 , u0 ) we get
d/dt (x̃) = f(x0, u0) + (∂f/∂x)(x0, u0) x̃ + (∂f/∂u)(x0, u0) ũ + h.o.t.
The perturbations around the steady state are small and the higher order terms can be neglected. We have a linear system of differential equations for the perturbations x̃, since f(x0, u0) = 0 by definition of a steady state:
dx̃/dt = (∂f/∂x)(x0, u0) x̃ + (∂f/∂u)(x0, u0) ũ
The matrix ∂f /∂x is the Jacobian for the system in the steady state and describes
the network wiring, as stated before.
In equation (3.8), the variables ui (t) represent input signals. The approximate
input-output modelling of the system is completed by defining a set of output
signals, y(t), as linear combinations of the state variables as well as the input
signals. The result is a linear state space model:
dx̃(t)/dt = A(θ) x̃(t) + B(θ) ũ(t)
y(t) = C(θ) x̃(t) + D(θ) ũ(t)    (3.10)
The time dependence is explicitly stated to point out that θ is time independent.
θ is a vector of the parameters, determining the behaviour of the system, that were
omitted in equation (3.8). The true set of the parameters is what we wish to find.
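The matrices ∂f/∂x and ∂f/∂u at the operating point can also be computed numerically. The following is a sketch in Python with NumPy, under the assumption that the right hand side f(x, u) is available as a function; f, x0, and u0 are hypothetical placeholders:

    import numpy as np

    def linearise(f, x0, u0, eps=1e-6):
        # Central finite differences of dx/dt = f(x, u) at the steady state
        # (x0, u0): A approximates df/dx and B approximates df/du.
        n, k = len(x0), len(u0)
        A = np.zeros((n, n))
        B = np.zeros((n, k))
        for j in range(n):
            d = np.zeros(n); d[j] = eps
            A[:, j] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
        for j in range(k):
            d = np.zeros(k); d[j] = eps
            B[:, j] = (f(x0, u0 + d) - f(x0, u0 - d)) / (2 * eps)
        return A, B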
Noise is inevitably added to the measurements during data collection. The state space model given by equation (3.10) does not contain any noise model. Adding a noise model is advisable, and a fairly simple choice is to include additive noise sources. An extended model for the system, including noise sources w(t) and v(t) representing process noise and measurement noise respectively, is (Ljung, 1987):
dx̃(t)/dt = A(θ) x̃(t) + B(θ) ũ(t) + w(t)
y(t) = C(θ) x̃(t) + D(θ) ũ(t) + v(t)    (3.11)
The noise sources are modelled as sequences of independent random variables with zero mean. A so-called observer estimating the state variables is given by equation (3.12). To avoid cumbersome notation, the ˜-accent is dropped, so that x̂ denotes the estimate of x̃. The parameter dependence is also omitted for the same reason.
dx̂(t)/dt = A x̂(t) + B u(t) + K(y(t) − C x̂(t) − D u(t))    (3.12)
The quantity y(t) − C x̂(t) − Du(t) is a measure of how well the predictor x̂(t) estimates x(t). The observer that minimises the estimation error is called the
Kalman filter, represented by K. The filter is calculated from the matrices for the
state space description as well as the variance and covariance matrices for the noise
sources. For a more thorough treatment, see for example (Ljung and Glad, 1997)
or (Ljung, 1987).
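For the stationary case, the gain can be computed from the system matrices and the noise covariances via the filter Riccati equation. A sketch in Python with SciPy; Q and R denote the covariance matrices of w(t) and v(t), assumed known:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    def kalman_gain(A, C, Q, R):
        # The filter Riccati equation is solved through its dual control
        # problem; P is the stationary error covariance of the estimate.
        P = solve_continuous_are(A.T, C.T, Q, R)
        return P @ C.T @ np.linalg.inv(R)       # K = P C^T R^{-1}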
The prediction error ν(t) = y(t) − C x̂(t) − Du(t), used as feedback quantity in
the observer, is the new information in the measurement y(t) that is not available
in the previous measurements. The signal ν(t) is called the innovation (Ljung and
Glad, 1997; Ljung, 1987). The innovations form of the state space description in equation (3.11) is given by
dx̂(t)/dt = A(θ) x̂(t) + B(θ) u(t) + K(θ) ν(t)
y(t) = C(θ) x̂(t) + D(θ) u(t) + ν(t)    (3.13)
A set of transfer functions can be deduced for the linear differential equation system
on the innovations form. The mathematical expressions for the transfer functions
as well as a block diagram are given in equation (3.14) and Figure 3.5, respectively.
The dependence on the parameter vector θ is omitted.
Y(s) = (C(sI − A)^(−1) B + D) U(s) + (C(sI − A)^(−1) K + I) V(s)    (3.14)

where G(s) = C(sI − A)^(−1) B + D is the transfer function from the input and H(s) = C(sI − A)^(−1) K + I the transfer function from the innovations.
Figure 3.5. A block representation of the system.
If an identification process on the system given by equation (3.13) is to be successful,
some conditions need to be fulfilled.
Identifiability is central to the identification process and concerns whether the process is able to yield the true values of the parameters of the model and/or give a model equal to the true system (Ljung, 1987).
Assuming that our sampled data from the system is informative enough to represent
the system behaviour, the question is if two parameterisations of the system can
produce equal models, i.e., give non-separable output data. Another way to put the
question is if the model structure is invertible or not (Ljung, 1987). Invertibility is
a necessary condition for identifiability and a model, M, is identifiable with a true
parametrisation θ ∗ if
M(θ) = M(θ∗) ⇒ θ = θ∗,    θ ∈ DM
where θ and θ ∗ represent different parameterisations in the set of possible values,
DM , for the model (Ljung and Glad, 2004, Ljung, 1987). The condition above
guarantees global identifiability. Local identifiability is defined in a similar way,
but the invertibility is only valid in a vicinity of θ ∗ .
The input signals, u(t), must be chosen to excite the system (Ljung and Glad,
2004), giving data that is informative enough for the identification, as stated above.
The input signals must have variations as fast as the smallest time constants of
the system. For a sampling process to pick up the frequencies of the oscillations, it
is necessary for the input signals to contain these frequencies. Equally important
is then to sample fast enough not to lose the frequencies by folding or anti-alias
filtering.
An input signal alternating between two levels is a good choice for a linear system,
like the one given by equation (3.10). If the variations are random, this signal
contains all frequencies (Ljung and Glad, 2004). Since we have a linearised system,
it is important to verify that the variations in the input signal are small enough for
the system to respond in a linear fashion. Crucial to the validity of the linearisation is that the system stays close to the steady state.
A common signal in identification of linear systems is a Pseudo Random Binary Signal (PRBS) (Ljung and Glad, 2004). A PRBS is a signal that alternates between two levels in an apparently stochastic manner, although it is deterministic.
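A PRBS can be generated from a maximum-length shift register sequence. A minimal sketch in Python with SciPy; the amplitude and offset are hypothetical values and must be kept small for the linearisation to remain valid:

    import numpy as np
    from scipy.signal import max_len_seq

    bits, _ = max_len_seq(10)                    # 2**10 - 1 pseudo-random bits
    amplitude, offset = 0.01, 1.0                # small deviations around u0
    u = offset + amplitude * (2.0 * bits - 1.0)  # two levels: offset +/- amplitude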
Our question is now if the state space model in equation (3.13) is identifiable,
provided that the inputs excite the system in a satisfactory way. In our application,
the matrix D(θ) is zero, meaning that the inputs do not directly affect the outputs.
Theorem 4A.2 in (Ljung, 1987) states that the system, parameterised according to the observer canonical form, is globally and locally identifiable at θ∗ if and only if {A(θ∗), [B(θ∗) K(θ∗)]} is controllable.
The observer canonical form for a multiple input - multiple output (MIMO) system
is characterised by a full parametrisation of the B(θ)-matrix while the C(θ)-matrix
is built up by zeros and ones. The A(θ)-matrix has the same number of fully
parameterised rows as the number of output channels, while the remaining rows
are filled with zeros and ones in a certain pattern (Ljung, 2001).
The exact definition of the observer canonical form can be found in (Ljung, 1987); the following is an example with three state variables, two output variables, and one input variable (a SIMO system):
A(θ) =
  [ 0  1  0 ]
  [ ×  ×  × ]
  [ ×  ×  × ]

B(θ) =
  [ × ]
  [ × ]
  [ × ]

K(θ) =
  [ ×  × ]
  [ ×  × ]
  [ ×  × ]

C(θ) =
  [ 1  0  0 ]
  [ 0  0  1 ]
To achieve a unique connection between the input and output signals, we have to limit the flexibility in the A(θ)-matrix by fixing certain elements, since we do not measure all the state variables.
The observer canonical form for the state space model is a sufficient parametrisation
for some of the systems we intend to apply the method to, as stated in Chapter 4.
We have now stated that the method of this section can be used to identify linearised systems around an operating point. The identification itself can, for example, be made with the System Identification Toolbox for Matlab (SITB), which is designed for identification of linear systems.
3.2.2 A Discrete Linearisation Method
A less general and discrete method based on the same kind of linearisation as the
method in the previous section is briefly explained here. The method illustrates a
simple version of the estimation procedure of the SITB, and is included for basic
understanding of parameter estimation.
Time series data is utilised and the availability of measurements of all state variables is a requirement, i.e., Cd(θ) is the identity matrix. The matrix Dd(θ) is assumed to be zero. The subscript d emphasises that we have the discrete equivalents of the matrices from the continuous system of equation (3.10).
A difference equation is the basis for a discrete modelling of the system. The
system is dependent on a set of parameters, but the dependence is omitted in the
description:
xn+1 = Ad xn + Bd un
yn = Cd xn    (3.15)
An estimation of the matrix Ad can be used to deduce the Jacobian for the continuous system. This estimation is made with least squares methods, where the
following matrices are needed:
R = [ xn+1  xn  . . .  x1 ],    M = [ xn  xn−1  . . .  x0 ]
                                    [ un  un−1  . . .  u0 ]
By inserting the matrices into the system description (3.15) we get

R = [ Ad  Bd ] M
To estimate Ad we multiply the expression from the right with the pseudoinverse of M. The resulting matrix from this operation is of dimensions (n × (n + k)), where n is the number of state variables and k the number of input signals. The first n columns correspond to Ad. Taking the matrix logarithm in the relation Ad = e^(J∆T) and dividing by the sampling interval produces an estimated Jacobian.
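The whole estimation chain amounts to a few lines of linear algebra. A sketch in Python with NumPy and SciPy; X and U are hypothetical noise-free time series:

    import numpy as np
    from scipy.linalg import logm, pinv

    def discrete_jacobian(X, U, T):
        # X: states, shape (n, N+1); U: inputs, shape (k, N); T: sampling interval.
        # Solve R = [Ad Bd] M with the pseudoinverse of M, then recover the
        # continuous Jacobian from Ad = expm(J*T).
        n = X.shape[0]
        R = X[:, 1:]
        M = np.vstack([X[:, :-1], U])
        AB = R @ pinv(M)
        Ad = AB[:, :n]                  # the first n columns correspond to Ad
        return logm(Ad).real / T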
The estimation of the SITB is more complex, and includes noise models, which is
not the case here. The absence of noise models renders the method very sensitive
to noise.
Chapter 4
Results
Results from testing and evaluation of the methods in the previous chapter are
presented here. Further discussion on the applicability of the methods can be
found in Chapter 5.
4.1 Results from Interaction Graph Determination
The identification method from Section 3.1.1 was evaluated on a simple, artificial
network. The evaluation network is depicted in Figure 4.1. It consists of six nodes
that are connected by eight irreversible rates obeying Michaelis-Menten kinetics.
The kinetic parameters governing the dynamics are given in Appendix A. The
inhibition is modelled as purely competitive while the activation is multiplicative.
The software PathwayLab, developed by InNetics (InNetics AB, 2005), was used
to implement the evaluation network.
The dynamics of the network is described with a set of differential equations. The
steady state, which is the operating point, is sustained through a constant flow of
mass into the system through the rate r1 .
Systematic perturbations to each node in the system were simulated with alterations of the maximum rate, V , from the Michaelis-Menten expression in equation (2.5), for all rates. The maximum rate was perturbed for one rate at a time,
and the steady state computed afterwards. The perturbation matrix Σ, made up
by the approximated sensitivities in (3.2), was calculated from the set of steady
state measurements. The simulation of the system as well as the calculations were
made with Mathematica (Wolfram Research, Inc., 2003).

Figure 4.1. The artificial mass flow network used for method evaluation.
Rows from the symbolic expression for the Jacobian, with the diagonal elements fixed to −1, were multiplied with selected columns from the perturbation matrix Σ in order to form the equation system for identification of the Jacobian. The symbolic expression for the Jacobian is
  [ −1     f1,2   f1,3   f1,4   f1,5   f1,6 ]
  [ f2,1   −1     f2,3   f2,4   f2,5   f2,6 ]
  [ f3,1   f3,2   −1     f3,4   f3,5   f3,6 ]
  [ f4,1   f4,2   f4,3   −1     f4,5   f4,6 ]    (4.1)
  [ f5,1   f5,2   f5,3   f5,4   −1     f5,6 ]
  [ f6,1   f6,2   f6,3   f6,4   f6,5   −1   ]
The columns of Σ that are multiplied with row i correspond to perturbations of V in the rates that are not connected to node i. For example, the first row of the Jacobian is multiplied with columns 3, . . . , 8 of Σ. This multiplication produces six equations.
The equation system is made up of 30 unknown variables and the method supplies 35 equations. A solution for all the variables was calculated with least squares methods.
By fixing the diagonal elements of the Jacobian, we use the fact that the Jacobian can only be estimated up to a scalar multiple, and in return we gain a more reliable solution to the equation system, i.e., a more reliable estimation.
An increase of five percent of the value of the perturbed parameters resulted in the
following Jacobian:
  [ −1          −1.36·10^−12   −3.11·10^−12    0.8333     1.51·10^−11   4.15·10^−13 ]
  [  0.1829     −1             −0.2679        −0.1524     2.96·10^−13   2.94·10^−14 ]
  [  0.0016      3.9956        −1             −0.0013     0.0074        0.0033      ]
  [ −0.0003      0.0024         0.1890        −1         −0.0016       −0.0007      ]
  [ −0.0025      1.5539         0.7033        −0.0385    −1             0.0002      ]
  [  0.0016      0.0099        −0.0006         0.0225     1.5496       −1           ]
The need to choose a cut-off limit, where elements with a value below the limit are considered to be zero, is obvious. A cut-off of 0.05 gave

  [ −1        0         0         0.8333    0         0  ]
  [  0.1829  −1        −0.2679   −0.1524    0         0  ]
  [  0        3.9956   −1         0         0         0  ]
  [  0        0         0.1890   −1         0         0  ]
  [  0        1.5539    0.7033    0        −1         0  ]
  [  0        0         0         0         1.5496   −1  ]
We can compare this estimation with the true Jacobian for the network, which is calculated by differentiating the differential equations and inserting the steady state values. The Jacobian has been normalised to make it easier to compare with the estimated version.
  [ −1        0         0         0.8333    0         0  ]
  [  0.1989  −1        −0.2521   −0.1658    0         0  ]
  [  0        3.9663   −1         0         0         0  ]
  [  0        0         0.1912   −1         0         0  ]
  [  0        1.4014    0.6418    0        −1         0  ]
  [  0        0         0         0         1.5260   −1  ]
The validation shows that the estimates of the elements are close to the true values and, perhaps more importantly, the elements that are zero in the true Jacobian are estimated as very small compared to the non-zero elements.
Normalisation of the diagonal elements has the effect of adding certainty to the estimated values of the Jacobian entries. On the other hand, we risk amplifying elements that are zero in the true Jacobian beyond the cut-off limit, due to numerical errors. The choice of cut-off limit is difficult, and every choice of the cut-off is associated with this risk.
If we apply the method to experimental data, the risk of normalisation issues increases, since noise inevitably will affect the measurements. It is not possible to determine a reasonable cut-off limit beforehand, since we do not know the magnitude of the elements that are to be estimated.
4.2 Results from Control Loop Determination
The artificial network in Figure 4.1 was also used to evaluate the method in Section 3.1.2. The method can only be applied if the mass flow connections between
the nodes, named ri , are known, as well as the stoichiometric ratios between substrate and product in each step.
The evaluation network has simple stoichiometric relationships, and consequently,
the M -matrix describing the mass flow, in the differential equation system (3.4),
will have 0, 1 and −1 entries. The total mathematical model of the network is:
ẋ1 = r1(p1) − r2(p2)
ẋ2 = r2(p2) − r3(p3) − r6(p6)
ẋ3 = r3(p3) − r4(p4)
ẋ4 = r4(p4) − r5(p5)
ẋ5 = r6(p6) − r7(p7)
ẋ6 = r7(p7) − r8(p8)
From this description, we can identify the M-matrix as

M =
  [ 1  −1   0   0   0   0   0   0 ]
  [ 0   1  −1   0   0  −1   0   0 ]
  [ 0   0   1  −1   0   0   0   0 ]
  [ 0   0   0   1  −1   0   0   0 ]
  [ 0   0   0   0   0   1  −1   0 ]
  [ 0   0   0   0   0   0   1  −1 ]
Perturbations of the rates were simulated with a change of the maximum rate parameter of the Michaelis-Menten rate law, in the same way as described in the previous section. The sensitivity matrix formed was consequently of dimensions (6 × 8).
We cannot normalise the diagonal elements of rx, as we did earlier with fx, since the matrix M rx that is formed will not have the normalised elements on the diagonal. One alternative is to normalise a single element of rx, to set the level for the estimation. Alternatively, M rx could be normalised with the diagonal elements, but this would give an estimate of rx where the elements are not scaled equally on each row, compared to the true values.
An equation system is formed by multiplying individual rows of M with rx and
selected columns from the sensitivity matrix Σ according to the criterion (3.7).
The columns that are valid for row i in this criterion are the ones corresponding to parameters that are not a part of any rate leading to or from node i. For row i, these are the indices of the columns of M that contain a zero in row i.
Whether or not we perform a normalisation of the elements, the equation system is underdetermined. The method still supplies 35 equations, but we now have additional unknown variables, due to the fact that rx is of dimensions (8 × 6).
In its current state, the method cannot be applied to an arbitrary network. However, by observing the sensitivity matrix and combining it with the available information on network structure, some additional information can be obtained.
A perturbation of the parameter p8, which is a part of r8, will yield the following vector, corresponding to the last column of the sensitivity matrix:

∂x/∂p8 = ( 0, 0, 0, 0, 0, c )^T
The c corresponds to the non-zero entry. With the example above, this means that
a perturbation of a parameter in r8 only results in a steady state change in the
amount of the species represented by state six. What does this mean in terms of
network wiring?
The criterion (3.7) for the perturbation is

0 = M rx (∂x/∂p8)

where M is the mass flow matrix given above, rx = (ri,j) is of dimensions (8 × 6), and ∂x/∂p8 = (0, 0, 0, 0, 0, c)^T.
Matrix multiplication of the rx-matrix and the perturbation vector results in a vector of dimensions (8 × 1), and this operation extracts the last column of the rx-matrix multiplied with the non-zero element from the perturbation vector:
c · ( r1,6, r2,6, r3,6, r4,6, r5,6, r6,6, r7,6, r8,6 )^T
Completing the matrix multiplication, the resulting criterion has the following appearance:

0 = c ·
  [ r1,6 − r2,6        ]
  [ r2,6 − r3,6 − r6,6 ]
  [ r3,6 − r4,6        ]
  [ r4,6 − r5,6        ]
  [ r6,6 − r7,6        ]
This is an underdetermined equation system which cannot be divided into independent subsystems. Since this system does not have a unique solution, we can make three interpretations:
- All variables are non-zero
This means that species six affects all rates, and hence all other species in
the network. This is an improbable scenario in a biological system. Why
should there be controlling effects that exactly balance each other out for all
species?
- A subset of the variables are non-zero
It is possible that some of the variables in the equation system are non-zero.
For example, if r1,6 and r2,6 are zero, and the rest are non-zero, the other
variables are uniquely determined. The non-zero variables must balance each
other out exactly. Is this scenario probable? Perhaps more probable than
the first scenario, but still not very believable from a biological point of view.
Species six must affect rates in the network in a very balanced fashion.
- All variables are zero
This is hence the most probable interpretation: species six does not affect
any other rate than rate eight.
From this long argument, we can draw the conclusion that the entries r1,6, r2,6, r3,6, r4,6, r5,6, r6,6, r7,6 of rx are zero. Although we can eliminate seven variables from our equation system, we still have too many variables in relation to equations.
This consideration of the network architecture combined with the sensitivity vectors
is not all in vain, even if the method still is not applicable to the evaluation network.
This at least shows that information can be deduced from the sensitivity matrix
alone, although it is not an easy task. It should also be noted that the algebraic connections between the different elements deduced from the criterion (3.7) show a pattern that cannot at all be picked up by the method of Section 3.1.1. See further discussion in Chapter 5.
4.3 Results from Local Linearisation and SITB Estimation
The local linearisation method described in Section 3.2.1 has been evaluated on
different models of signal cascades with several phosphorylation levels. The idea
was to develop an identification method that could be applied to data from mass
spectrometry, where the ratio of the phosphorylated species to the total amount,
or possibly the individual phosphorylated species, could be measured on each level.
4.3.1 The Evaluation Network
As a starting point, a simple signal cascade of three levels, with one unphosphorylated and one phosphorylated species on each level, was implemented in PathwayLab. The phosphorylation steps were modelled with Michaelis-Menten kinetics
with kinetic parameters that are listed in Appendix A. Each level is an isolated
module with internal mass flow. The different levels communicate via control loops.
The simple model was made with a real signal cascade in mind. It is important to note that Michaelis-Menten kinetics is not necessarily the most plausible kinetic model for the phosphorylation steps in a real signal cascade, although it is frequently used. The time constants in the artificial cascade are not necessarily the same as the time constants in a real signal cascade. However, the reasoning on sampling and frequency interpretation below can be transferred to an arbitrary signal cascade.
Figure 4.2. The simplified signal cascade with two species on each level.

The cascade is depicted in Figure 4.2. The signal u in the figure represents the
input signal to the system. If we relate this signal to a real cascade, it could
correspond to an extracellular activator that binds to receptors on the cell surface.
The control loops were mathematically modelled as competitive inhibition and
simple multiplicative activation respectively.
The cascade is described with a state space model with three state variables. Since we have mass conservation on each level in the cascade, we only have one independent species per level. For example, if the total amount of phosphorylated and unphosphorylated species on the first level is m1tot, then we can determine the unphosphorylated species at all time instants as m1tot − M1P(t), with the same notation as in Figure 4.2. If we can only measure the ratio of phosphorylated species to the total amount on each level, we in fact measure each independent species, but scaled with a constant. Selecting the scaled phosphorylated species as state variables will give the following system description:
d/dt (M1P(t)/m1tot) = f1(M1P(t), M3P(t), u(t), θ)
d/dt (M2P(t)/m2tot) = f2(M1P(t), M2P(t), M3P(t), θ)
d/dt (M3P(t)/m3tot) = f3(M2P(t), M3P(t), θ)    (4.2)
A less cumbersome notation for the set of differential equations is
ẋ1(t) = f1(x1(t), x3(t), u(t), θ)
ẋ2(t) = f2(x1(t), x2(t), x3(t), θ)
ẋ3(t) = f3(x2(t), x3(t), θ)    (4.3)
The variables each fi is dependent on are deduced from the illustration of the cascade. The vector θ gathers the parameters of the system. The cascade eventually
reaches the chosen operating point, i.e., the steady state.
The true Jacobian of the system, which is used for the validation process, is retrieved by differentiating the right hand side of the differential equations for the
system with respect to each state variable, i.e., the phosphorylated species, and
inserting the steady state values.
Aref =
  [ −0.0358    0        −0.0493 ]
  [  1.8032   −0.2642   −3.1192 ]
  [  0         0.3426   −0.1367 ]

where rows and columns are ordered x1, x2, x3.
The matrix is, as the name implies, the A-matrix of the true linearised state space
model given by equation (3.13). The true B-matrix, quantifying the effect of the
input signal on the derivatives of the state variables, was also calculated in the same
manner. Step responses for each of the three subsystems, {(y1 , u), (y2 , u), (y3 , u)},
were calculated using these matrices and are given in Figure 4.3.
Figure 4.3. Step responses for the three subsystems.

The input signal, u, to the system was a PRBS and was chosen with respect to its ability to excite the system. The signal is given in Figure 4.4. The offset of the signal is equal to the value of u in the steady state, and the level of the pulses is small enough for the linear approximation to be valid.

Figure 4.4. The input signal with offset equal to the steady state value.
4.3.2 Simulation and Identifiability
The system was simulated to retrieve data for use in the identification process. The
simulation step involved solving the differential equations modelling the dynamics
between the entities in the system, and was made with Mathematica.
The data from the simulation was preprocessed before the identification step. Preprocessing is an important part of the identification. First, the steady state values were subtracted from the values of each of the state variables, xi, as well as the input variable u, to create the signals for the deviations around the steady state: (x(t) − x0, u(t) − u0) = (x̃(t), ũ(t)).
Further, the mean values were removed from the signals. This does not affect the
A-matrix but compensates for an offset in the linear differential equation system
originating from the constant amount of mass on each level. We have
dx̃/dt = A x̃ + B ũ
Time series for all state variables are available, giving a C-matrix equal to the identity matrix. The input signals do not directly affect the outputs, giving a D-matrix equal to the zero matrix. The linear state space model on the innovations form, with the time dependence suppressed, for small deviations from the steady state becomes:
[ ẋ1 ]   [ × × × ] [ x1 ]   [ × ]     [ × × × ] [ ν1 ]
[ ẋ2 ] = [ × × × ] [ x2 ] + [ 0 ] u + [ × × × ] [ ν2 ]
[ ẋ3 ]   [ × × × ] [ x3 ]   [ 0 ]     [ × × × ] [ ν3 ]
              A               B            K

[ y1 ]   [ 1 0 0 ] [ x1 ]   [ ν1 ]
[ y2 ] = [ 0 1 0 ] [ x2 ] + [ ν2 ]    (4.4)
[ y3 ]   [ 0 0 1 ] [ x3 ]   [ ν3 ]
              C
The parametrisation of the matrices given by the state space model in (4.4) is a canonical parametrisation, if we allow all elements of the B-matrix to be adjustable in the identification process. The full parametrisation of B gives a more general method, since it might not be known which modules are explicitly dependent on the input signal(s).
The system is hence globally and locally identifiable, given that {A, [B K]} is
controllable, as stated in the previous section.
If we choose the K-matrix to be equal to zero, we have an output-error model; the
error only enters the output equation (Ljung, 2001). The system, {Aref, Bref}, is controllable because the determinant of the matrix S = [Bref  Aref Bref  Aref^2 Bref] is non-zero:
Aref =
  [ −0.0358    0        −0.0493 ]
  [  1.8032   −0.2642   −3.1192 ]
  [  0         0.3426   −0.1367 ]

Bref =
  [ 0.0672 ]
  [ 0      ]
  [ 0      ]

S = [ Bref  Aref Bref  Aref^2 Bref ] =
  [ 0.0672   −0.0024    0.0001 ]
  [ 0         0.1212   −0.0364 ]
  [ 0         0         0.0415 ]

det S = 0.000338256
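The controllability test is straightforward to reproduce numerically. A sketch in Python with NumPy, using the matrices given above:

    import numpy as np

    A = np.array([[-0.0358,  0.0,     -0.0493],
                  [ 1.8032, -0.2642,  -3.1192],
                  [ 0.0,     0.3426,  -0.1367]])
    B = np.array([[0.0672], [0.0], [0.0]])

    S = np.hstack([B, A @ B, A @ A @ B])   # S = [B, AB, A^2 B]
    print(np.linalg.det(S))                # approx 3.4e-4, i.e. non-zero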
4.3.3 The Identification Step
With the information in the previous section, we can conclude that if we use our simulated data, without any noise sources added, sample fast enough, and our chosen input signal is sufficiently exciting, we will be able to identify the system properly. The identification was made using the SITB for Matlab, more specifically the pem function, the basic estimation command, with the iddata data object (Ljung, 2001). More information about the pem function and the choice of parameters is given in Appendix B.
The linearised system corresponds to three transfer functions: G1(s), G2(s), G3(s). If we subject the system to an input signal with frequency ω, the complex number Gi(iω) gives the system response for output signal yi. The function Gi(iω) as a function of ω is called the frequency response. The frequency responses for all the output signals are given in Figure 4.5 as Bode diagrams.
Figure 4.5. Bode plots for the output signals: (a) y1, (b) y2, (c) y3.
The last two amplitude curves have a resonance peak, which means that some frequencies in the spectrum have a larger gain in the system. A resonance peak corresponds to a higher order system. The presence of complex conjugate pole pairs induces an oscillating behaviour in the step response from the system. In the Bode plots we see a peak frequency of approximately 1 rad/s, implying that it is necessary to pick the Nyquist frequency above 1 rad/s, and well beyond that.
A choice of the bandwidth of 3 rad/s will include all the frequencies of the resonance peaks and, according to the rule of thumb given in Section 3.2, ωs should be chosen to be 30 rad/s. A sampling interval, T, of 0.2 s should therefore be sufficient for identification.
4.3.4 Varying Sampling Intervals
That a sampling interval of 0.2 s is sufficient for identification is confirmed by simulation and subsequent identification for a set of different sampling intervals, where T ranges from 0.1 to 3 s in steps of 0.1 s. The estimated elements of the Jacobian are close to the true elements when the sampling interval is close to 0.2 s.
The estimations compared to the true values for each of the elements of the Jacobian
are depicted in Figure 4.6.
Figure 4.6. Gradual sampling results for each element of the Jacobian, (a)-(i) for the entries (1,1) through (3,3). The dotted line represents the true value of the element.
From the subfigures of Figure 4.6 it is possible to see a trend towards estimations that lie further and further from the true value with increasing sampling interval. The exceptions are the time intervals of 0.5, 1, 2, and 2.5 s; the estimations produced with these intervals are very close to the true values, and much closer than those of the neighbouring time intervals.
A reason for this might be that the sampling points are located on favourable parts
of the step responses from the PRBS that are important for the identification
process.
A method to confirm this suspicion is to alter the dynamics of the signal cascade,
and perform the same investigations. If the frequency responses from the altered
cascade system still are tolerably similar to the ones of Figure 4.5 but with resonance peaks displaced, schemes for the gradual sampling intervals, like the ones in
Figure 4.6, should exhibit the same pattern, but also dislocated.
Dislocating the resonance peaks and the frequency responses as a whole involves
slowing down or speeding up the dynamics of the step responses and the oscillations
therein.
The Bode plot for y2 of an altered system is depicted in Figure 4.7, and comparing it to Figure 4.5(b) reveals a shifted amplitude curve with the resonance peak dislocated to lower frequencies. The same is true for the resonance peak in the Bode plot for y3.
A plot of the dependence of the estimated Jacobian element (3,3) upon sampling interval is shown in Figure 4.8 below. The estimation with a sampling interval of 2.5 s is still close to the true value, and also closer to the estimations with sampling intervals near 2.5 s.
Figure 4.7. The Bode plot for the output channel corresponding to y2 for the altered cascade.
Figure 4.8. Dependence of the estimated Jacobian element in position (3,3) upon sampling interval for the altered cascade.
The plausible explanation of the suddenly occurring improved estimations could thus not be proven with this procedure.
4.3.5 Noise Addition
Noise always affects data collection in real systems. The sensitivity of the identification of the simplified cascade in Figure 4.2 can be evaluated through addition of
noise to the simulated signals.
Measurement noise was simulated by an addition of random elements, drawn from a normal distribution with zero mean and a small variance, σ², to the sampled values of the output signals. The noise is stochastic and cannot be predicted, even if complete information of its history is available. Nor is it possible to exactly describe the noise as a signal, although we can assess its mean, amplitude distribution, and autocorrelation (Svärdström, 1999).
Noise lacks periodicity and does not correlate with itself. The autocorrelation for
a white noise signal is consequently simple. The autocorrelation only consists of a
strong peak at k = 0:
rxx (k) = σ 2 δ(k)
The phase spectrum of the noise is impossible to retrieve because of the stochastic
nature of the noise, but the power spectrum, P (ω), is attainable. The power
spectrum and the autocorrelation for a signal constitute a discrete-time Fourier
transform (DTFT) pair. The DTFT for a signal x(n) is defined as
F{x(n)} = X(Ω) =
∞
X
x(n)e−jΩn
n=−∞
The transform is continuous and periodic with period 2π (Poularikas, 1999). The argument Ω is a normalised frequency. The transform can be expressed with the angular frequency, ω, as the argument, and is an approximation to the continuous-time Fourier transform, defined earlier:
X(ω) = Σ_{n=−∞}^{∞} T x(n) e^(−jωnT)
where T is the sampling interval for the sequence x(n) (Poularikas, 1999).
The power spectrum for a white noise signal is, with these definitions, equal to σ 2 T .
Non-band limited white noise has a constant power spectrum for all frequencies.
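The relation can be verified numerically. A sketch in Python with SciPy; a two-sided spectrum estimate is used to match this convention, and the values are chosen to correspond to the experiments below:

    import numpy as np
    from scipy.signal import welch

    T = 0.2                                      # sampling interval (s)
    sigma2 = 1e-9                                # noise variance
    noise = np.sqrt(sigma2) * np.random.randn(50000)

    f, P = welch(noise, fs=1.0 / T, return_onesided=False)
    print(P.mean(), sigma2 * T)                  # both approximately 2e-10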
The addition of noise sources produces a need to also estimate the Kalman filter K(θ), which will minimise the prediction error, as we saw earlier. How the estimation of the Jacobian elements succeeds as a function of the variance of the added noise can be seen in Figure 4.9. The complete set of plots can be found in Appendix C.
Figure 4.9. The effect of noise sources with increasing variance on the estimation of the Jacobian elements (a) (1,1) and (b) (1,2).
The estimations of the elements are remarkably further from the true values when the variance of the noise reaches 10^−9, and particularly 10^−8. We can motivate the increasingly poorer estimations by an examination of the power spectrum for the noise source and the estimated power spectra for the output signals. Let the noise spectrum be constant and equal to 0.2 · 10^−9 = 2 · 10^−10. The joint plots of Figure 4.10 for the noise spectrum and for each output signal help us to evaluate which frequencies are shadowed by the noise.
The plots of the spectrum for all output signals reveal that the noise will affect and
dominate, at least partially, over the frequencies that correspond to the resonance
peaks (in the Bode diagrams), that is, the oscillations in the step responses. These
frequencies are vital to the identification process, and the poor resulting estimations
are understandable.
Performing several analyses of this type for different variances might give information about exactly which frequencies are crucial to the identification process.
That a variance of 10−9 for the noise sources causes identification problems is quite
clear if we examine the signals in Figure 4.11 and compare them to the noise-free
versions of the signals in Figure 4.12.
The oscillations of the step responses are drowned by the addition of noise and it
is not possible to identify the Jacobian elements correctly.
Figure 4.10. Plot of the spectrum for each output signal and the noise spectrum: (a) y1, (b) y2, (c) y3.
Figure 4.11. Output signals y1, y2, and y3 with noise of variance 10^−9 added and a sampling interval of 0.2 s.
Figure 4.12. Output signals y1, y2, and y3 without added noise.
4.3.6 Additional Investigations
The method of this section was also applied to the signal cascade of Figure 2.1,
which is a more complex cascade with three species on each level. The role of the
input signal of this system is carried by the membrane bound kinase Ras-GTP
(Kholodenko et al., 2002).
The network has six independent species owing to mass conservation on each phosphorylation level. The network dynamics is modelled with six differential equations
that are given in the appendix of (Kholodenko et al., 2002). The expressions for
the rates are complex and are dependent on a large set of parameters.
The cascade can be linearised around a steady state, like we saw in the beginning of the section, and will result in a set of six linear differential equations, x = (x1, . . . , x6):

ẋ = Ax + Bu    (4.5)

The linearisation is valid in a vicinity of a steady state, and the variable x represents the deviations from this steady state. The notation x̃ is dropped to avoid cumbersome expressions.
If we can measure all the phosphorylated species independently, the system is globally identifiable and it is possible to find the true Jacobian, although it is computationally demanding. The objective of finding the Jacobian when time series of only the fractions of phosphorylated species on each level are available is more complex.
Since we have three species on each level, and two are independent, we do not
measure all the state variables. If we use the canonical parametrisation for the
state space description, we cannot fully fill the A-matrix. This is not enough for
our identification purposes, since we seek a fully parameterised Jacobian. The
interactions within the different levels are very complex, and it would be difficult
to interpret the wiring.
A slightly different approach is to only estimate a Jacobian of dimensions (3 × 3). The fractions of phosphorylated species on each level form the three state variables. An estimation of this Jacobian would reveal how the sets of phosphorylated species on different levels are connected through control loops.
The difficulty with this approach is most likely due to the fact that the summation of two signals might cancel out important oscillations that are crucial to the identification process. It is always more difficult to identify parts of a system where states that in reality do affect the dynamics of the included states are left out. Figure 4.13 shows a cancellation, probably due to the fact that the oscillations are small compared to the amplitude of the signals.
The small oscillatory behaviour induced by the input signal is lost if we add the
two phosphorylated species. The identification attempts were unsuccessful when
this data was used.
What we try to do when we reduce the system to only three states instead of six is really an attempt to estimate the six states of the system from measurements of only the three species. Multiplying the differential equation system (4.5) for six species with a transformation matrix representing the addition of the phosphorylated species on each level leads to:
T =
  [ 1/m1,tot   1/m1,tot   0          0          0          0        ]
  [ 0          0          1/m2,tot   1/m2,tot   0          0        ]    (4.6)
  [ 0          0          0          0          1/m3,tot   1/m3,tot ]

ż = d/dt (T x) = T ẋ = T A x + T B u    (4.7)

If we want to remove x from the three species system, we need to approximate it by replacing it with T^(−1) z, where T^(−1) is the pseudoinverse of T.
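The transformation and its pseudoinverse are easily formed numerically. A sketch in Python with NumPy; the total amounts are hypothetical values:

    import numpy as np

    m_tot = [19.0, 20.0, 21.0]               # hypothetical totals per level
    T = np.zeros((3, 6))
    for i, m in enumerate(m_tot):
        T[i, 2 * i:2 * i + 2] = 1.0 / m      # sum the phosphorylated forms, scaled

    T_pinv = np.linalg.pinv(T)               # Moore-Penrose pseudoinverse
    # Approximate reduced system: dz/dt = (T A T_pinv) z + (T B) u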
Using time series data for only the bi-phosphorylated species did not lead to any
useful results. This confirms the fact that measurements of all independent variables are necessary for the method to be applicable.
Figure 4.13. The dynamics of a response to an impulse for a part of the MAPK cascade: (a) MKKK-PP, (b) MKKK-P, (c) the fraction of phosphorylated species, (MKKK-P + MKKK-PP)/m1,tot.
Chapter 5
Discussion
In this chapter, different aspects of identification of biochemical networks are discussed. The applicability of the different methods is compared, and the choice of which method to employ in different situations is discussed.
The choice of identification method is first and foremost dependent on the available experimental data. The division of the sections in the previous chapters reflects this fact. Time series data and steady state data have different information content, and this of course affects the amount of information possible to retrieve from the data.
The interaction graph determining method of Section 3.1.1 is not a complicated method in practice, but its requirements can be hard to fulfill. In contrast, the theoretical basis for the method is more elaborate; the mathematics behind the method is easier to understand when the method is put to use.
The interaction graph method is dependent on the possibility of perturbing the different nodes of the network under investigation. The perturbations can be additions of inhibitors or activators, as well as RNAi molecules that affect the production of a catalysing enzyme. If the network consists of n nodes, at least n perturbations of different nodes are needed, and some information about the network wiring is required beforehand, as described in the previous chapter. The question is if it really is realistic to perform this amount of perturbation experiments. Effectors of different kinds must be known for a large set of catalysing enzymes, and each effector can only affect a single enzyme.
The change in the parameter, ∆pj , of a perturbation is seldom known. The sensitivities must then be approximated, and with the consideration of measurement
noise, the method can be unreliable. The normalisation of the diagonal elements
of the Jacobian can cause enlargement of elements that otherwise should be zero.
The advantage of the method is that it is easy to use for simple networks, particularly signalling networks. Nodes of signalling networks can be chosen to have limited information exchange through mass flow, and this simplifies the method considerably, although the problems with noise and normalisation still remain.
The second method based on local parameter perturbation data, from Section 3.1.2,
exhibits the same issues with perturbations and normalisation, but since the mass
flow is known, the requirements are easy to fulfill. The method was developed in
order to replace the need for interpretation of the Jacobian, which can be difficult
for networks with mass flow. As described earlier, the mass flow of the network and
the stoichiometric relationships are assumed to be known. The problem with this
method was that, despite knowledge of the mass flow, the introduction of additional unknown variables rendered the method unusable.
The core of this problem is that a set of elements of the matrix rx is known to be non-zero, but the elements cannot be normalised. Since the values of these entries were not exactly known, the information could not be used.
The method still provided some useful interpretations of certain entries of the sensitivity matrix. The occurrence of a sparse perturbation vector induced several alternatives concerning the entries of rx. Analysis of the same network with the interaction graph determining method did not reveal any of this information. This extra complexity is probably the reason that the control loop determination method became too elaborate for application. It is still important to note that information is retrievable from the perturbation matrix in combination with the method, although some work must be done to do so.
The main method based on time series data from Section 3.2.1 is fundamentally
different from the methods based on local parameter perturbation data. The application of control system methods is an aid in the evaluation of the method.
The method is based on the fact that all species that are represented by a state variable are measurable during a time span. In the previous chapters, the difficulty of time series measurements on biological systems has been discussed. With the development of experimental methods, the availability of time series data will probably increase, but at the moment quantitative data of this kind is rare.
The method requires an input signal with certain properties. The signal must sufficiently excite the system, and such a signal can be difficult to realise in practice. The PRBS used in the simulations can correspond to changes in the amount of an extracellular signalling molecule or any other substance affecting the nodes of the network. An extracellularly regulated input signal is much simpler to manipulate than an intracellular signal; alterations inside the cell risk changing the complete state of the cell and are simply more difficult to realise.
The advantage of the method is that it is possible to determine the Jacobian of the system under investigation exactly, given the “right” circumstances, as described earlier. Individual perturbations of the nodes of the network are not needed, and no information about the network wiring is required beforehand, as in the methods based on steady state measurements. The local linearisation method simply demands knowledge and measurements of the species that are represented by the state variables. Neither is it necessary to know beforehand which state variables are explicitly dependent on the input signal(s). If the system under investigation is large, i.e., has many nodes, the identification algorithm can be slow, and convergence problems might appear. The problem of large, complex networks is in fact an issue in all identification methods.
The local linearisation method has a certain tolerance for noise, due to the noise
models that are included in the system description, and since biological data often
is noisy, this is an advantage compared to the other methods of this thesis. The
methods based on steady state data do not have any modelling of the noise, and are hence more susceptible to different noise sources.
As a comparison, the simple discrete method of Section 3.2.2 was evaluated on the
same network as the more complex local linearisation model of Section 3.2.1, and
exhibited extreme sensitivity to noise of even a very small variance, showing the
importance of adding noise models.
Chapter 6
Conclusions
Identification of biochemical systems is not an easy task, for several reasons such as measurement issues, the complexity of the networks, and the possibilities for input signal excitation and perturbations. Are any of the identification methods in this thesis really applicable to experimental systems?
The methods probably can be applied to biological systems, but thorough consideration of the requirements of each method is a must, as well as understanding of the basis for the methods. If the requirements of a method are not fulfilled, the identification will inevitably fail. In applying the methods to biological systems, it is advisable, if possible, not to use certain knowledge of the system in the methods, and instead employ this information in some kind of validation, although this is not a complete validation procedure. The most important thing to bear in mind is that an identification process will give the results that are deducible from the data, which is the “true” system based on this particular data. A piece of advice to the user: consider the output from an identification method a possible estimation of the system structure.
The most reliable identification of a system would probably be the result of a combination of different methods, based on both local parameter perturbation data and time series data. The method most reliable on its own is probably the local linearisation method with added noise models.
A natural continuation of this master's thesis would be to test the methods on experimental data from biochemical systems, if the possibility exists.
Bibliography
Atkins, P. and Jones, L. (1999). Chemical principles: the quest for insight. W. H.
Freeman and Company, NY, USA.
Bhalla, U. S. and Iyengar, R. (1999). Emergent properties of networks of biological
signaling pathways. Science, 283(5400):381–387.
BioSim Network (2005). Biosim project. http://chaos.fys.dtu.dk/biosim.
Close, C. M., Frederick, D. K., and Newell, J. C. (2002). Modeling and Analysis
of Dynamic Systems. John Wiley & Sons, 605 Third Avenue, New York, NY
USA, 3rd edition.
Cornish-Bowden, A. (1995). Fundamentals of Enzyme Kinetics, Revised edition.
Portland Press, London, UK.
Cornish-Bowden, A. and Wharton, C. W. (1988). Enzyme kinetics. In focus. IRL
Press, Eynsham, Oxford, UK.
Edelstein-Keshet, L. (1988). Mathematical Models in Biology. McGraw-Hill.
Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3/4):601–620.
Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R., and Hood, L. (2001). Integrated
genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 292(5518):929–934.
InNetics AB (2005). Pathwaylab. http://innetics.com/.
Kholodenko, B. N., Kiyatkin, A., Bruggeman, F. J., Sontag, E., Westerhoff, H. V.,
and Hoek, J. B. (2002). Untangling the wires: A strategy to trace functional
interactions in signaling and gene networks. PNAS, 99(20):12841–12846.
Kholodenko, B. N. and Sontag, E. D. (2002). Determination of functional network
structure from local parameter dependence data. ArXiv Physics e-prints.
Kitano, H. (2001). Foundations of Systems Biology, chapter Systems Biology:
Toward System-level Understanding of Biological Systems, pages 1–29. MIT
Press, Cambridge, MA USA.
Kitano, H. (2002). Systems biology: A brief overview. Science, 295(5560):1662–
1664.
Linstrom, P. J. and Mallard, W. G., editors (2003). NIST Chemistry WebBook, NIST Standard Reference Database Number 69. National Institute of Standards and Technology, Gaithersburg, MD 20899, USA. (http://webbook.nist.gov).
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, N.J. USA.
Ljung, L. (2001). System Identification Toolbox for use with MATLAB, User’s
Guide. The MathWorks, Inc., Natick, MA USA, version 5 edition.
Ljung, L. and Glad, T. (1997). Reglerteori. Studentlitteratur, Lund, Sweden. In
Swedish.
Ljung, L. and Glad, T. (2004). Modellbygge och simulering. Studentlitteratur,
Lund, Sweden, 2nd edition. In Swedish.
Lodish, H., Berk, A., Zipursky, S. L., Matsudaira, P., Baltimore, D., and Darnell,
J. (2000). Molecular Cell Biology. W. H. Freeman and Company, 4th edition.
Morel, N. M., Holland, J. M., van der Greef, J., Marple, E. W., Clish, C., Loscalzo, J., and Naylor, S. (2004). Primer on medical genomics Part XIV: introduction to systems biology - a new approach to understanding disease and treatment. Mayo Clinic Proceedings, 79(5):651–658.
Poularikas, A. D. (1999). The Handbook of Formulas and Tables for Signal Processing, chapter Discrete-Time Fourier Transform, One- and Two-Dimensional.
The Electrical Engineering Handbook Series. CRC Press, Boca Raton, FL
USA.
Svärdström, A. (1999). Signaler och system. Studentlitteratur, Lund, Sweden. In
Swedish.
Wahde, M. and Hertz, J. (2000). Coarse-grained reverse engineering of genetic
regulatory networks. Biosystems, 55(1-3):129–136.
Wolfram Research, Inc. (2003). Mathematica. Wolfram Research, Inc., Champaign,
Illinois, USA.
Wolkenhauer, O., Kitano, H., and Cho, K.-H. (2003). Systems biology. IEEE
Control Systems Magazine, 23(4):38–48.
Yi, T.-M., Huang, Y., Simon, M. I., and Doyle, J. (2000). Robust perfect adaptation in bacterial chemotaxis through integral feedback control. PNAS, 97(9):4649–4653.
Zeigler, B. P., Praehofer, H., and Kim, T. G. (2000). Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems. Academic Press, San Diego, CA USA.
Zubay, G. L., Parson, W. W., and Vance, D. E. (1995). Principles of Biochemistry.
Wm. C. Brown Communications, Dubuque, IA USA.
Appendix A
Kinetic Parameters for the Evaluation Networks
The evaluation network from Section 4.1 consists of six states that are interconnected through rates with Michaelis-Menten kinetics. The kinetic parameters for
each rate are given in Table A.1 below, where V is the maximum rate, Km the
Michaelis constant, and Ki the inhibition constant in the purely competitive inhibition mechanism.
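For reference, a rate with Michaelis-Menten kinetics that is subject to purely competitive inhibition by an inhibitor I has the standard form

$$ v = \frac{V\,[S]}{K_m\left(1 + [I]/K_i\right) + [S]}, $$

which reduces to the ordinary Michaelis-Menten expression $v = V[S]/(K_m + [S])$ when no inhibitor is present.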
Table A.1. Michaelis-Menten parameters for the rates in the evaluation network with mass flow.

Rate    V      Km     Ki
r1      0.5    1      1
r2      1.1    1      -
r3      1      1      -
r4      0.5    1      -
r5      1.2    1      -
r6      1.5    1      -
r7      1      1      -
r8      0.8    -      -
The initial values of the amounts of each species were set to 1.0.
The kinetic parameters of the simplified artificial signal cascade, depicted in Figure 4.2 and used for evaluation in Section 4.3, are listed in Table A.2.
Table A.2. Michaelis-Menten parameters for the rates in the simplified signal cascade.

Rate    V      Km     Ki
v1      2.5    0.9    1.3
v2      2.2    1      -
v3      1.2    1      -
v4      2.5    1      -
v5      0.5    1      -
v6      5      1      -
The initial values for the phosphorylated species were set to zero, while the initial values for M1, M2, and M3 were chosen to be 30, 18, and 25, respectively.
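As a minimal sketch of how such a simulation can be set up in Matlab (assuming a hypothetical function cascadeODE implementing the Michaelis-Menten rate equations of the cascade, with the states ordered as M1, M2, M3 followed by the phosphorylated species), the system can be integrated numerically as follows:

%initial state: M1, M2, M3 followed by the three phosphorylated species
x0 = [30; 18; 25; 0; 0; 0];
%integrate the hypothetical right-hand side cascadeODE over 100 time units
[t,x] = ode45(@cascadeODE,[0 100],x0);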
Appendix B
Some Functions in the SITB
The System Identification ToolBox (SITB) for Matlab contains a plethora of functions and data objects. The information in this appendix is summarised from the
SITB manual (Ljung, 2001).
The pem function is the basic estimation command and estimates parameters of general linear models. It is a maximum likelihood method that iteratively minimises a quadratic prediction error criterion. The search for the optimum is governed by a set of options.
m = pem(data,orders,’Property1’,Value1,...,’PropertyN’,ValueN)
Here, data is any form of data object, while orders indicates how many states the model to be estimated has.
The iddata object is the basic object for handling signals in the toolbox, and most functions can process data in this form. The data object handles both frequency and time domain data.
data = iddata(y,u,Ts,’Property1’,Value1,...,’PropertyN’,ValueN)
Here, y represents the output signals in column form, while u contains the input signals in the same form. Ts is the sampling interval of the data.
The following code exemplifies the estimations made in the thesis:
%create the data object from simulated input and output signals
data = iddata(output,input,sampinterval);
%remove the mean value from each signal
datad = detrend(data,'constant');
%estimate a continuous-time state-space model with three states
model = pem(datad,3,'ss','can','Ts',0,'DisturbanceModel','None');
Included is also the detrend command, which removes trends from data. Here the mean is removed from each signal, since the option 'constant' is given to the function. The option 'ss' represents the parametrisation of the matrices in the model, which is chosen to be canonical in the example. The option 'Ts' is set to 0 to indicate that a continuous-time model is to be estimated. A disturbance model is not estimated in the example, as indicated by the option 'None'.
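Once a model has been estimated, it can be inspected and validated with standard SITB commands. The following is a minimal sketch, assuming the model and datad objects from the example above:

%inspect the estimated continuous-time system matrix
Ahat = model.a;
%validate the model by comparing its simulated output with the data
compare(datad,model);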
Appendix C
Noise Effects on Estimation
White noise effects can be simulated by adding random elements from a normal distribution with zero mean and a given variance to the output signals from a system, as explained in the main part of the thesis (a small code sketch is given below). Estimations of the Jacobian elements for the simplified signalling cascade, given by Figure 4.2, were made with added noise of increasingly larger variance. The estimates of all nine Jacobian elements as functions of the noise variance are illustrated in the plots of Figure C.1.
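As a minimal sketch (with hypothetical variable names, assuming a matrix output containing the noise-free output signals), such noise can be added in Matlab as follows:

%add zero-mean white Gaussian noise with variance sigma2 to the outputs
sigma2 = 1e-10;
noisyOutput = output + sqrt(sigma2)*randn(size(output));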
[Nine panels, (a)–(i): the estimated Jacobian entries (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), and (3,3), each plotted against the noise variance, ranging from 10^-12 to 10^-8.]
Figure C.1. The effects of adding noise with increasing variance on the estimation of the complete set of Jacobian elements.