Linköping University Post-Print

N.B.: When citing this work, cite the original article.

Original Publication:
Christian Lyzell, Torkel Glad, Martin Enqvist and Lennart Ljung, Difference algebra and system identification, 2011, Automatica, (47), 9, 1896-1904. http://dx.doi.org/10.1016/j.automatica.2011.06.013

Copyright: Elsevier http://www.elsevier.com/
Postprint available at: Linköping University Electronic Press http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-71084
Difference Algebra and System Identification

Christian Lyzell, Torkel Glad, Martin Enqvist, Lennart Ljung

Division of Automatic Control, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden
Abstract
The framework of differential algebra, especially Ritt's algorithm, has turned out to be a useful tool when analyzing the identifiability of certain nonlinear continuous-time model structures. This framework provides conceptually interesting means to analyze complex nonlinear model structures via the much simpler linear regression models. One difficulty when working with continuous-time signals is dealing with white noise in nonlinear systems. In this paper, difference algebraic techniques, which mimic the differential algebraic techniques, are presented. Besides making it possible to analyze discrete-time model structures, this opens up the possibility of dealing with noise. Unfortunately, the corresponding discrete-time identifiability results are not as conclusive as in continuous time. In addition, an alternative elimination scheme to Ritt's algorithm will be formalized and the resulting algorithm is analyzed when applied to a special form of the NFIR model structure.
Key words:
System identification, Identifiability, Ritt’s algorithm.
1 Introduction
A trend in the optimization community is to try to find convex reformulations of initially nonconvex optimization problems, either by relaxations or by algebraic manipulations, for which efficient and robust algorithms already exist. The prediction-error approach (see, for instance, Ljung, 1999) to parameter estimation for time-invariant model structures often involves solving a nonconvex optimization problem. Since there are usually many local optima, it may be difficult to guarantee that the global optimum will be found when applying a local search method.

A necessary condition for the uniqueness of the global optimum in a parameter estimation problem is that the chosen model structure is globally identifiable. In continuous time, Ljung and Glad (1994) were able to show that a nonlinear model structure, containing polynomial nonlinearities, is globally identifiable if and only if it can be rewritten as a linear regression model. The implications of this result for system identification are twofold. First, instead of analyzing the properties of a globally identifiable nonlinear model structure, one can work with an equivalent linear regression model, which significantly simplifies the analysis. Second, since an algorithm that performs the transformations of the nonlinear model to an equivalent linear regression model is provided, the parameter estimation problem becomes trivial once the transformations have been made. The current paper concerns the discrete-time version of the above result.
The proof of the identifiability result in Ljung and Glad (1994) is based on differential algebraic techniques and, in particular, Ritt's algorithm. This theory was founded by Ritt (1950) and then further developed by Seidenberg (1956) and Kolchin (1973). An introduction to differential algebra from a control perspective is given in Fliess and Glad (1994) and Fliess (1990), and further control related applications are presented in Fliess et al. (1995). Algorithmic aspects of Ritt's algorithm are discussed in Diop and Fliess (1991a), Diop and Fliess (1991b), Diop (1989), Diop (1991), Glad (1992) and Glad (1997). For nondifferential problems, an alternative to Ritt's theory of characteristic sets is given by Gröbner bases, see, for instance, Buchberger (1976) and Becker and Weispfenning (1993). A comparison of the two approaches can be found in Forsman (1991, Chapter 2).

The goal of this paper is to generalize Ritt's algorithm to systems of difference equations. This problem has been addressed earlier by Kotsios (2001) by introducing algebraic operations on the δ-operator first appearing in Kotsios and Kalouptsidis (1993). However, this makes it difficult to compare the resulting algorithm with the continuous-time version presented in, for instance, Glad (1997). In this presentation, a more direct approach to the generalization of Ritt's algorithm to discrete time with as few alterations from the continuous-time version as possible will be given. Besides making it possible to analyze discrete-time model structures, this opens up the possibility of dealing with noise in a system identification setting.

Preprint submitted to Automatica, 3 March 2011
Example 1. Consider the model structure defined by

y(t) = θu(t) + θ²u(t − 1) + e(t),    (1)

where y(t) is the output, u(t) is the input, and e(t) is the noise. Now, consider the problem of estimating the unknown parameter θ in (1) by minimizing the least-squares prediction-error cost function

V_N(θ, Z^N) ≜ (1/N) Σ_{t=1}^{N} ( y(t) − ŷ(t|θ) )²,

where the predictor ŷ(t|θ) is given by

ŷ(t|θ) ≜ θu(t) + θ²u(t − 1).    (2)
A dataset Z^N = (u(t), y(t))_{t=0}^{99} has been generated from (1) using the input u(t) = sin(t) + sin(t/3), the parameter value θ = 0.9 and the noise e(t) = 0. The solid line in Figure 1 illustrates the cost function as a quartic polynomial in the parameter θ, which is clearly nonconvex. Thus, to be able to find the global
Fig. 1. The cost function for the problem of estimating the parameter θ in (1). The solid line represents using (1) directly and the dashed line using the reformulated description (4).
optimum via a local search, one must have some a priori knowledge regarding the value of the parameter θ.

A common method to find an initial estimate of θ is to use overparametrization (see, for instance, Bai, 1998). Introduce the new parameters θ̃₁ ≜ θ and θ̃₂ ≜ θ², so that the model defined by (1) is linear in the new parameters, which may now be estimated using standard least-squares techniques. Since one is only interested in θ̃₁, one can just ignore the estimate of θ̃₂ or reconcile it with the estimate of θ̃₁. Here, we propose a different technique.
Instead of introducing new parameters, let us, inspired by the differential algebra (see, for instance, Ljung and Glad, 1994), time shift (1):

y(t − 1) = θu(t − 1) + θ²u(t − 2) + e(t − 1).    (3)
By examining the equations, we see that multiplying (1) by u(t − 2) and (3) by u(t − 1), and then subtracting the results, we obtain

u(t − 2)y(t) − u(t − 1)y(t − 1)
    = (u(t − 2)u(t) − u²(t − 1))θ + u(t − 2)e(t) − u(t − 1)e(t − 1),    (4)

which is a linear regression model. Now, the least-squares cost function for this formulation, given the same noise-free dataset as above, is illustrated by the dashed line in Figure 1. It is a quadratic polynomial in the parameter θ, with its minimum coinciding with the global minimum of the least-squares prediction-error cost function.
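The effect of the reformulation can be checked numerically. The following small sketch (illustrative code, not the authors' original experiment; it only reuses the settings stated above) regenerates the noise-free dataset and recovers θ by ordinary least squares on the linear regression (4):

```python
import math

theta = 0.9
# Noise-free data per (1): y(t) = θu(t) + θ²u(t-1) with u(t) = sin(t) + sin(t/3).
u = {t: math.sin(t) + math.sin(t / 3) for t in range(-2, 100)}
y = {t: theta * u[t] + theta**2 * u[t - 1] for t in range(-1, 100)}

# Ordinary least squares on the linear regression (4):
#   u(t-2)y(t) - u(t-1)y(t-1) = (u(t-2)u(t) - u(t-1)²)θ + noise terms
num = den = 0.0
for t in range(1, 100):
    phi = u[t - 2] * u[t] - u[t - 1] ** 2
    tgt = u[t - 2] * y[t] - u[t - 1] * y[t - 1]
    num += phi * tgt
    den += phi * phi

theta_hat = num / den
print(theta_hat)  # 0.9 up to rounding, since the data are noise free
```

Because the data are noise free, the regression target equals φ(t)·θ exactly, so the one-dimensional least-squares problem is convex and returns the true parameter without any initial guess.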
The discrete-time model structures that will be studied in this paper are those that can be written as a system of difference equations in the form

f_i( z(t), …, z(t − N), θ ) = 0,  i = 1, …, n,    (5)

where the f_i are polynomials. The variable z contains several components,

z(t)^T = ( y(t)^T, u(t)^T, e(t)^T, x(t)^T ),    (6)

where y is the output, u is the input, e is the noise and x consists of internal variables.
This paper is a development of the material presented in Lyzell et al. (2009) and is outlined as follows: In Section 2, the basic algebraic framework of this paper is presented, with the main result being the generalization of Ritt's algorithm to systems of difference equations. The ensuing Section 3 discusses the generalization of the identifiability results from Ljung and Glad (1994) to discrete time. Furthermore, the implications of noise in the parameter estimation problem for certain nonlinear model structures are investigated in Section 4. This section also contains a formulation of an alternative approach to Ritt's algorithm for a special form of the nonlinear finite impulse response (NFIR) model structure, and a numerical example is given. Finally, some conclusions are drawn in Section 5.
2 Algebraic concepts

In this section, we are interested in formalizing the algebra concerning solutions to systems of polynomial equations, where the polynomials depend only on a finite number of variables (which are themselves elements of one or several time-dependent signals), that is, the solutions to systems of difference equations. For example, the polynomial x(t)³ + 1 in the variable x(t) is a function f of the sequence X_t = (x(t − τ))_{τ=0}^{∞}, which maps X_t to the polynomial X_{t,1}³ + 1, where X_{t,1} is the first element in the sequence X_t. The solution to the difference equation f = 0, that is x(t)³ + 1 = 0, is thus a sequence (x(τ))_{τ=−∞}^{∞} satisfying f(X_t) = 0 for all t. In general, the solution to a system of difference equations will be found by algebraic manipulations involving the backward shift operator q⁻¹, which applied to our example polynomial results in q⁻¹f(X_t) = x(t − 1)³ + 1. Thus, the time-shifted polynomial q⁻¹f is a function of the second element X_{t,2} of the sequence X_t, in the same way as f is a function of X_{t,1}. Hence, the two polynomials f(X_t) and q⁻¹f(X_t) are considered to be different. For the sake of notational convenience, from here on the argument of the polynomials will be left out. Starting with the basics of difference algebra for time-dependent signals, we will move on to polynomials, and finally an algorithm is presented for systems of difference polynomials.
2.1 Signal shifts

As discussed above, we are interested in systems described by polynomials in time-dependent variables and their time shifts. From here on, N is defined as the set of all nonnegative integers and Z₊ as the set of all positive integers, respectively. The shifts will be denoted by

u^(k) ≜ q^(−k) u(t) = u(t − k),  k ∈ N,    (7)

where q is the forward time shift operator and the order of the displacement is given in the parenthesis¹.
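In code, the backward shift in (7) is just an index offset. A minimal sketch (the helper name is ours, chosen for illustration) applies q^(−k) to a signal represented as a function of t:

```python
def backward_shift(signal, k):
    """Return the signal u^(k) = q^(-k) u, i.e. t -> u(t - k), per (7)."""
    return lambda t: signal(t - k)

u = lambda t: t**2          # an arbitrary test signal
u1 = backward_shift(u, 1)   # u^(1), written with one dot below
u2 = backward_shift(u1, 1)  # shifting twice gives u^(2)

print(u2(5))  # u(5 - 2) = 9
```

Composing two unit shifts gives the shift of order two, mirroring q^(−1)q^(−1) = q^(−2).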
To simplify the notation in the examples to come, we also introduce u̇ ≜ u^(1) and ü ≜ u^(2). A fundamental concept for the algorithmic aspects of differential algebra is ranking. This is a total ordering (see, for instance, Lang, 2002) of all variables and their derivatives. In the discrete-time case it corresponds to a total ordering of all time-shifted variables.

¹ Alternatively, we may use shifts forwards in time, u^(k) ≜ q^k u(t) for all k ∈ N, and the same theory applies.
Definition 2. A binary operator ≺, which is a total ordering satisfying

(i) u^(μ) ≺ u^(μ+σ),
(ii) u^(μ) ≺ y^(ν) ⇒ u^(μ+σ) ≺ y^(ν+σ),

for all μ ∈ N and ν, σ ∈ Z₊, is called an ordering of the signals u and y, and we say that u is ranked lower than y if u ≺ y.
There are many possible choices of orderings of signals. For example, let u and y be two signals. Then two possible orderings are

u ≺ y ≺ u̇ ≺ ẏ ≺ ü ≺ ÿ ≺ ···    (8a)
u ≺ u̇ ≺ ü ≺ ··· ≺ y ≺ ẏ ≺ ÿ ≺ ···    (8b)

The latter ordering will often be written in short as u^(·) ≺ y^(·). Let us turn our attention to polynomials with variables that are time-shifted signals. These polynomials will be used to represent difference equations as discussed in the introduction to this section.
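The two orderings in (8) can be realized as sort keys on pairs (signal, shift): (8a) compares by shift first, (8b) by signal first. A small sketch (representation and names are ours, chosen for illustration):

```python
# A time-shifted variable is a pair (signal, shift), e.g. ('u', 2) for u^(2).
SIGNALS = ['u', 'y']  # u ranked below y

def key_8a(v):
    sig, shift = v
    return (shift, SIGNALS.index(sig))   # u ≺ y ≺ u̇ ≺ ẏ ≺ ü ≺ ÿ ≺ ...

def key_8b(v):
    sig, shift = v
    return (SIGNALS.index(sig), shift)   # u ≺ u̇ ≺ ü ≺ ... ≺ y ≺ ẏ ≺ ...

variables = [('y', 1), ('u', 2), ('u', 0), ('y', 0), ('u', 1), ('y', 2)]
print(sorted(variables, key=key_8a))
print(sorted(variables, key=key_8b))
```

Both keys define total orderings satisfying (i) and (ii) of Definition 2, since shifting both arguments by σ preserves each lexicographic comparison.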
2.2 Polynomials

As with signals, polynomials can also be time-shifted. In fact, the polynomial f^(σ) is the result when all variables in the polynomial f have been shifted σ ∈ N time steps. Even though most of the results and definitions in this section apply to polynomials in static variables, the formulations will focus on polynomials in time-dependent variables. To illustrate the algebraic concepts below in a simple manner, using the notation introduced in (7), let

f ≜ u̇ + ü³ẏ²,    (9a)
g ≜ ẏ² + uÿ,    (9b)
h ≜ u + u̇²,    (9c)

be polynomials in the sequences U_t = (u(t − τ))_{τ=0}^{∞} and Y_t = (y(t − τ))_{τ=0}^{∞}. To be able to order polynomials we need to find the highest ranked variable in the polynomial.
Definition 3. The highest ranked time-shifted variable in a, possibly time-shifted, polynomial f is called the leader and is denoted by ℓ_f. The degree of a variable x in f is the highest exponent of x that occurs in f and is denoted by deg_x f.

The polynomial (9a), with the ordering (8a), has the leader ℓ_f = ü with deg_ü f = 3, and if the ordering is changed to (8b), the leader is given by ℓ_f = ẏ with deg_ẏ f = 2. Thus, the leader depends on both the polynomial at hand and the ordering of the time-shifted signals that is used. The ranking of polynomials can now be defined.
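Before moving on, Definition 3 can be made concrete by representing a difference polynomial as a list of monomials, each a map from time-shifted variables to exponents. The sketch below (an assumed representation, not taken from the paper) recovers the two leaders of (9a) quoted above:

```python
# f = u̇ + ü³ẏ², written as monomials {variable: exponent}, variables being (signal, shift)
f = [{('u', 1): 1}, {('u', 2): 3, ('y', 1): 2}]

def leader(poly, key):
    """Highest ranked variable occurring in poly, under the ordering key (Definition 3)."""
    return max((v for mono in poly for v in mono), key=key)

def deg(poly, var):
    """Highest exponent of var in poly (Definition 3)."""
    return max(mono.get(var, 0) for mono in poly)

key_8a = lambda v: (v[1], ['u', 'y'].index(v[0]))  # ordering (8a): shift first
key_8b = lambda v: (['u', 'y'].index(v[0]), v[1])  # ordering (8b): signal first

print(leader(f, key_8a), deg(f, leader(f, key_8a)))  # ('u', 2) 3
print(leader(f, key_8b), deg(f, leader(f, key_8b)))  # ('y', 1) 2
```

Changing the sort key changes the leader, exactly as in the worked example above.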
Definition 4. Let f and g be two polynomials with leaders ℓ_f and ℓ_g, respectively. We say that f is ranked lower than g, denoted f ≺ g, if either ℓ_f ≺ ℓ_g, or if ℓ_f = ℓ_g and deg_{ℓ_f} f < deg_{ℓ_f} g. If ℓ_f = ℓ_g and deg_{ℓ_f} f = deg_{ℓ_f} g, we say that f and g have equal ranking and write f ∼ g.

The polynomials f and g in (9), with the ordering (8a), have the leaders ℓ_f = ü and ℓ_g = ÿ, respectively. Since ü ≺ ÿ, according to (8a), it follows that f ≺ g.
In this section, we will be dealing with the elimination of variables in time-shifted polynomials. In this context the following concept is important.

Definition 5. Let f be a polynomial with leader ℓ_f. A polynomial g is said to be reduced with respect to f if there is no positive time shift of ℓ_f in g and if deg_{ℓ_f} g < deg_{ℓ_f} f.

Using the ordering (8b) with the polynomials f and g in (9), the leaders are given by ℓ_f = ẏ and ℓ_g = ÿ, respectively. Thus, in this case, f is reduced with respect to g but not vice versa. The above concepts (ranking and reduced) are related as follows.
Lemma 6. Let f and g be two polynomials. If f ≺ g under some ordering, then f is also reduced with respect to g under that ordering.

Proof. If f ≺ g, then either ℓ_f ≺ ℓ_g, or ℓ_f = ℓ_g and deg_{ℓ_f} f < deg_{ℓ_f} g. In the former case, it follows from Definition 2 that ℓ_f ≺ ℓ_g ≺ ℓ_g^(σ) for all σ ∈ Z₊. Thus, f does not depend on ℓ_g^(σ) for any σ ∈ N, and in particular 0 = deg_{ℓ_g} f < deg_{ℓ_g} g. Hence, f must be reduced with respect to g. The latter case follows in a similar fashion.
That the two concepts are not equivalent is easily seen if one chooses the ordering (8a) with the simple polynomials f = y and g = u. Since f does not depend on the variable u, it holds that f is reduced with respect to g, but f ⊀ g since ℓ_g ≺ ℓ_f.
Before we continue providing the tools needed to reduce polynomials with respect to each other, some additional concepts for difference polynomials are needed. These will not play a major role in what follows, but are used to guarantee the existence of solutions to the resulting reduced sets of polynomials.

Definition 7. The separant S_f of a polynomial f is the partial derivative of f with respect to the leader, while the initial I_f is the coefficient of the highest power of the leader in f.

The polynomial f in (9a) has, under the ordering (8a), the leader ℓ_f = ü. This implies that the separant is given by S_f = 3ü²ẏ² and the initial by I_f = ẏ². The tool needed to reduce polynomials with respect to each other is a variant of the standard polynomial division.
Lemma 8. Let f and g be polynomials in the variable x of degree m and n, respectively, written in the form f = a_m x^m + ··· + a_0 and g = b_n x^n + ··· + b_0, where m ≥ n. Then there exist polynomials Q ≠ 0, Q̄ and R such that

Qf = Q̄g + R,

where deg_x R < n. Furthermore, with Q given by b_n^(m−n+1), then Q̄ and R are unique.

Proof. See, for instance, Mishra (1993, page 168).

The suggested choice of Q in the lemma above is the initial I_g to the power of m − n + 1. It is also worth noting that the polynomials f and g in Lemma 8 may be multivariable, which implies that the coefficients of the variable x are polynomials in the remaining variables.
Now, let us illustrate the concept of pseudo-division on the polynomials f and g in (9), where the variable in question is ẏ. Since I_g = 1, it holds that

f = ü³g + (u̇ − ü³uÿ),

so that Q = 1, Q̄ = ü³ and R = u̇ − ü³uÿ, which satisfies Lemma 8 if we notice that deg_ẏ R = 0. During the algebraic simplifications, it is important that the solutions to the original system of polynomial equations are preserved. To this end, the following results show how pseudo-division can be used to eliminate variables while preserving the solution.
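For univariate polynomials with exact (say, integer) coefficients, the pseudo-division of Lemma 8 can be sketched as follows. Coefficient lists are stored lowest degree first; this is an illustrative implementation, not code from the paper:

```python
def degree(p):
    """Degree of a coefficient list (lowest degree first); -1 for the zero polynomial."""
    d = len(p) - 1
    while d >= 0 and p[d] == 0:
        d -= 1
    return d

def pseudo_div(f, g):
    """Return (Q, qbar, r) with Q*f == qbar*g + r, deg r < deg g and Q = lc(g)**(m-n+1).

    Requires deg f >= deg g, as in Lemma 8."""
    m, n = degree(f), degree(g)
    b = g[n]                          # the initial I_g of Lemma 8
    e = m - n + 1
    r = list(f)
    qbar = [0] * (m - n + 1)
    while degree(r) >= n:
        d, c = degree(r), r[degree(r)]
        qbar = [b * x for x in qbar]  # scale so the leading term cancels exactly
        r = [b * x for x in r]
        qbar[d - n] += c
        for i in range(n + 1):
            r[i + d - n] -= c * g[i]
        e -= 1
    # normalize so that exactly Q = b**(m-n+1) multiplies f
    return b ** (m - n + 1), [b**e * x for x in qbar], [b**e * x for x in r]

# Example: f = 2x³ + x + 1, g = 3x² + 2  =>  9f = 6x·g + (9 − 3x)
Q, qbar, r = pseudo_div([1, 1, 0, 2], [2, 0, 3])
print(Q, qbar, r)  # 9 [0, 6] [9, -3, 0, 0]
```

Each loop pass multiplies the running remainder by the initial of g before cancelling its leading term, which is why no divisions (and hence no fractions) are ever performed.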
Lemma 9. Let g be a polynomial with leader ℓ_g and let f be a polynomial containing ℓ_g^(σ) for some σ ∈ N. Then there exist polynomials R and Q such that

(i) R does not contain ℓ_g^(σ).
(ii) Every solution of f = 0, g = 0 is also a solution of R = 0, g = 0.
(iii) Every solution of R = 0, g = 0 with Q ≠ 0 is also a solution of f = 0, g = 0.
Proof. Let m and n be the degrees of f and g^(σ) as polynomials in ℓ_g^(σ), respectively. On one hand, if m ≥ n, then pseudo-division according to Lemma 8 yields polynomials Q₁ ≠ 0, Q̄₁ and R₁ such that

Q₁f = Q̄₁g^(σ) + R₁,    (10)

with deg_{ℓ_g^(σ)} R₁ < n ≤ m. If R₁ still depends on the variable ℓ_g^(σ), then further pseudo-division yields polynomials Q₂ ≠ 0, Q̄₂ and R₂ such that

Q₂R₁ = Q̄₂g^(σ) + R₂,    (11)

with deg_{ℓ_g^(σ)} R₂ < deg_{ℓ_g^(σ)} R₁. Combining (11) and (10) yields

Q₁Q₂f = (Q₂Q̄₁ + Q̄₂)g^(σ) + R₂.

Continuing in this manner, by recursive application of Lemma 8, one may always construct polynomials Q, Q̄ and R such that

Qf = Q̄g^(σ) + R,    (12)

with deg_{ℓ_g^(σ)} R = 0, that is, R does not depend on the variable ℓ_g^(σ). Now, assume that such polynomials have been constructed. Then the first statement in the lemma holds true. For the second statement, rewrite (12) as

R = Qf − Q̄g^(σ).

Since g^(σ) is just g with all variables time shifted σ ≥ 0 steps, it must hold that g^(σ) = 0 whenever g = 0. Thus, R = 0 whenever f = 0 and g = 0. For the final statement, rewrite (12), assuming Q ≠ 0, as

f = (1/Q)( Q̄g^(σ) + R ).

Since g = 0 implies g^(σ) = 0, it follows that f = 0 whenever R = 0 and g = 0, which concludes the proof for the case when m ≥ n. On the other hand, if m < n, then Lemma 8 yields polynomials Q ≠ 0, Q̄ and R such that Qg^(σ) = Q̄f + R with deg_{ℓ_g^(σ)} R < m, and similar arguments as in the former case can be applied.

In the proof of Lemma 9 above, we see that it is possible that several pseudo-divisions are needed in the case σ > 0 to eliminate ℓ_g^(σ) from the polynomial f. This is the main difference between the continuous-time case and the discrete-time case. In the continuous-time case, g^(σ) is affine in ℓ_g^(σ) (due to the product rule of differentiation) and thus only one pseudo-division is needed to eliminate ℓ_g^(σ) from f. In the discrete-time case, time shifting a polynomial does not change its degree with respect to a variable, that is, deg_{ℓ_g} g = deg_{ℓ_g^(σ)} g^(σ), and several pseudo-divisions might be needed.
Now, the following extension of Lemma 9 provides (as we will see later) the main step in our algorithm.

Lemma 10. Let f be a polynomial which is not reduced with respect to the polynomial g. Then there exist polynomials R and Q such that

(i) R is reduced with respect to g.
(ii) Every solution of f = 0, g = 0 is also a solution of R = 0, g = 0.
(iii) Every solution of R = 0, g = 0 with Q ≠ 0 is also a solution of f = 0, g = 0.
Proof. If f is not reduced with respect to g, then either f contains some positive time shift of the leader ℓ_g of g, or else f contains ℓ_g to a higher power than g.

In the latter case, Lemma 9 yields polynomials Q, Q̄ and R such that

Qf = Q̄g + R,    (13)

where R is reduced with respect to g. Thus, the proof of the statements follows in the same way as in the proof of Lemma 9.

In the former case, the polynomial f contains ℓ_g^(σ₁) for some σ₁ ∈ Z₊. Lemma 9 then yields polynomials Q₁ ≠ 0, Q̄₁ and R₁ satisfying

Q₁f = Q̄₁g^(σ₁) + R₁,    (14)

where R₁ does not contain ℓ_g^(σ₁). If R₁ still contains ℓ_g^(σ₂) for some σ₂ ∈ N with σ₂ < σ₁, further use of Lemma 9 yields polynomials Q₂ ≠ 0, Q̄₂ and R₂ satisfying

Q₂R₁ = Q̄₂g^(σ₂) + R₂,    (15)

where R₂ does not contain ℓ_g^(σ₂). Combining (14) and (15) yields

Q₂Q₁f = Q₂Q̄₁g^(σ₁) + Q̄₂g^(σ₂) + R₂.

Continuing in this manner, by recursive application of Lemma 9, one can construct polynomials Q and R such that

Qf = Σ_{i=1}^{n} Q̄ᵢ g^(σᵢ) + R,    (16)

for some polynomials Q̄ᵢ, i = 1, …, n, where R does not contain ℓ_g^(σ) for any σ ∈ N. Thus, we have constructed a polynomial R that is reduced with respect to g. Since g = 0 implies that g^(σ) = 0 for all σ ∈ N, the remaining statements follow in the same way as in the proof of Lemma 9.
Lemma 10 shows how to reduce one difference equation with respect to another while preserving the solutions of both, and that the main tool for achieving this is the pseudo-division concept presented in Lemma 8. Now, let us turn our attention to systems of difference equations.

2.3 Sets of polynomials
Definition 11.
A set
A
= {
A
1
, . . . , A p
} of polynomials is called
autoreduced
if all elements
A i
reduced with respect to each other. If, in addition, the polynomials
A
1
, . . . , A p
are pairwise in the autoreduced set
A are in increasing rank, then
A is said to be
ordered
.
Let us once again consider the polynomials
f
and
g
in (9), but now with the ordering (8a). Then it is easy to see that
f
and
g
are reduced with respect to each other and the set {
f , g
} is autoreduced. On the other hand, using the ordering (8b), we have already seen that
g
is not reduced with respect to
f
and the set {
f , g
} is not autoreduced. The following definition generalizes the concept of ranking to autoreduced sets.
Definition 12. Let A = {A₁, …, A_m} and B = {B₁, …, B_n} be two ordered autoreduced sets. Then A is said to have a lower ranking than B if either there exists an integer k with 1 ≤ k ≤ min(m, n) satisfying

A_j ∼ B_j,  j = 1, …, k − 1,  and  A_k ≺ B_k,

or else if m > n and A_j ∼ B_j for all j = 1, …, n.

The ordering (8a) applied to the polynomials f, g and h in (9) yields two ordered autoreduced sets of polynomials with two elements, namely A ≜ {f, g} and B ≜ {h, g}. Since h ≺ f, it holds that B is ranked lower than A. The autoreduced subsets of lowest rank are given a special name.
Definition 13. A characteristic set for a given set of time-shifted polynomials is an autoreduced subset such that no other autoreduced subset is ranked lower.

Using the same ordered autoreduced sets A ≜ {f, g} and B ≜ {h, g} chosen from the set {f, g, h} of polynomials in (9), it is clear that B is the characteristic set under the ordering (8a). The basic idea of our algorithm is to reduce a set of polynomials by the use of characteristic sets. In each iteration, the characteristic set is used to reduce the highest ranked polynomial not in the characteristic set to a lower ranked one. Thus, it is important to guarantee that a sequence of sets with decreasing rank is always finite for the algorithm to terminate in a finite number of steps.
Lemma 14. A sequence of autoreduced sets, each one ranked lower than the preceding one, can only have finite length.

In particular, the above statement includes sequences of characteristic sets, and it is a direct consequence of the following result.

Lemma 15. A sequence of time-shifted variables from a finite number of signals, each one ranked lower than the preceding one, can only have finite length.

Proof. Let y₁, …, y_p denote all the signals whose time shifts appear anywhere in the sequence. For each y_j, let σ_j denote the order of its first appearing time shift. Then there can be only σ_j lower time shifts of y_j in the sequence, and the total number of elements is thus bounded by σ₁ + ··· + σ_p + p.
Finally, we are ready to state the elimination algorithm for systems of difference equations in the form (5).

Algorithm 1. Reduces a set of polynomials to a characteristic set.

Input: Two sets of polynomials F = {f₁, …, f_n} and G = ∅, together with an ordering.

Output: The updated set F as a characteristic set containing the reduced polynomials, and the set G containing information about the separants, initials and the quotients resulting from the performed pseudo-divisions in the different steps.

1) Compute a characteristic set A = {A₁, …, A_p} of F: Order the polynomials in F according to their leaders, so that the first polynomial has the lowest ordered leader, and initialize A ← {f₁}. For all f_i ∈ F, i = 2, …, n, test if A ∪ {f_i} is autoreduced and in that case update A ← A ∪ {f_i}.

2) If F \ A ≠ ∅, where \ denotes the set difference, then go to Step 4.

3) Add S_A, I_A for all A ∈ A to G and stop.

4) Let f be the highest ranked polynomial in F that is not reduced with respect to A and apply Lemma 10 to get polynomials Q and R such that

Qf = Σ_{i=1}^{n} Q̄ᵢ A^(σᵢ) + R,

where A is the highest ordered polynomial in A such that f is not reduced with respect to A. Update G ← G ∪ {Q}.

5) If R = 0, update F ← F \ {f}; otherwise update F ← (F \ {f}) ∪ {R}. Continue from Step 1.
Algorithm 1 is quite similar to the Ritt-Seidenberg algorithm, see Ritt (1950) and Seidenberg (1956), and will be referred to as Ritt's algorithm from here on.

Remark 16. The elements added to the set G in Algorithm 1 are important to ensure the solvability of the resulting system of difference equations and should all be nonzero. The nonvanishing of the separants is used later to ensure that the requirements of the implicit function theorem (see, for instance, Rudin, 1976) are fulfilled.

Remark 17. For those familiar with the continuous-time version of Ritt's algorithm, it is worth noting that Algorithm 1 can be readily generalized to only consider partial reducedness in each iteration, as described in Fliess and Glad (1994), which may simplify the result when applying Ritt's algorithm.
Some important properties of the proposed algorithm are given below. The proofs of the results are similar to the corresponding proofs for the continuous-time case, but with the small distinction remarked upon in connection with Lemma 9.

Theorem 18. Algorithm 1 will reach the stop in Step 3 after a finite number of steps.

Proof. The only possible loop is via Step 5 to Step 1. This involves either the removal of a polynomial, or its replacement with one that is reduced with respect to A or has its highest unreduced time shift removed. If R is reduced, then it is possible to construct a lower ranked autoreduced set. An infinite loop would thus contradict either Lemma 14 or Lemma 15.

Theorem 19. Every solution to the initial set F in Algorithm 1 is also a solution to the final set. Every solution to the final set for which the polynomials of G are nonzero is also a solution of the initial set.

Proof. The result follows from repeated application of Lemma 10.
We conclude this section by repeating Example 1, but here the parameter nonlinearity is eliminated using Ritt's algorithm.

Example 20. Consider, once again, the model structure defined by

y(t) = θu(t) + θ²u(t − 1) + e(t),    (17)

where y(t) is the output, u(t) is the input, and e(t) is the noise. Let us try to apply Algorithm 1 to reduce (17) to a linear regression model. To this end, it is necessary to express that the parameter θ in (17) is constant, since the difference algebraic framework developed above assumes, by construction, that all signals are time dependent. Therefore, define

f₁ ≜ y − uθ − u̇θ² − e,    (18)
f₂ ≜ θ − θ̇,    (19)

where the dot operator denotes the backward shift. Let F ≜ {f₁, f₂} and G = ∅, respectively. The ordering is chosen as

u^(·) ≺ y^(·) ≺ e^(·) ≺ θ^(·).
Now, Ritt's algorithm will need two iterations to terminate, but here we will only illustrate the first iteration:

Step 1: The leaders ℓ_{f₁} and ℓ_{f₂} of the polynomials in F are θ and θ̇, respectively. Since ℓ_{f₁} ≺ ℓ_{f₂}, f₁ is reduced with respect to f₂. The other way around, since the polynomial f₂ contains a positive time shift of the leader ℓ_{f₁} = θ, namely θ̇, it is not reduced with respect to f₁. That is, the characteristic set A is given by {f₁}.

Step 2: Since F \ A = {f₂} ≠ ∅, we move to Step 4.
Step 4: Now, we will make use of the scheme depicted in the proof of Lemma 10 to reduce f₂ with respect to f₁. Since f₂ contains θ̇ = ℓ_{f₁}^(1), the division involves the time-shifted polynomial ḟ₁ = ẏ − u̇θ̇ − üθ̇² − ė, which is quadratic in θ̇ while f₂ is linear in θ̇. The division algorithm yields

Q₁ḟ₁ = Q̄₁f₂ + R₁,

with Q₁ = 1, Q̄₁ = üθ̇, and R₁ = ẏ − ė − (u̇ + üθ)θ̇. Since R₁ still depends on θ̇, another division is needed, that is,

Q₂R₁ = Q̄₂f₂ + R₂,

with Q₂ = −1, Q̄₂ = −(u̇ + üθ), and R₂ = −ẏ + ė + u̇θ + üθ². The polynomial R₂ no longer contains θ̇, but it is not reduced with respect to f₁, since deg_θ R₂ = deg_θ f₁ = 2, and further polynomial division yields

Q₃R₂ = Q̄₃f₁ + R₃,

with Q₃ = −u̇, Q̄₃ = ü, and R₃ = −üy + u̇ẏ + üe − u̇ė + (üu − u̇²)θ. The polynomial R₃ is reduced with respect to f₁. Putting it all together yields

Qf₂ = Q̃₁f₁ + Q̃₂ḟ₁ + R,

where Q = Q₃(Q₂Q̄₁ + Q̄₂) = u̇(üθ̇ + üθ + u̇), Q̃₁ = −Q̄₃ = −ü, Q̃₂ = Q₃Q₂Q₁ = u̇, and R = −R₃ = üy − u̇ẏ − üe + u̇ė − (üu − u̇²)θ. Finally, update G ← G ∪ {Q}.

Step 5: Update F ← {f₁, R}.
Thus, after the first iteration of Algorithm 1 we have found a new first order polynomial R such that the solution to the system of difference equations corresponding to the polynomial set {f₁, R} is the same as the one for the original system given by {f₁, f₂}. Furthermore, the polynomial R coincides with the polynomial defined by (4) in Example 1.

It is worth noting that in Step 4, two divisions had to be made to eliminate θ̇ from the polynomial f₂. This is one of the differences between the discrete-time and the continuous-time versions of Ritt's algorithm. In continuous time, differentiation always leaves a polynomial which is affine in the leader, and only one division is needed for each elimination.

In Ljung and Glad (1994), the differential algebraic tools were used to derive some interesting results on the identifiability of polynomial model structures. Since the discrete-time algorithm is so similar, one could expect to get the same results.
3 Identifiability

In this section, the focus will lie entirely on the system identification aspects of the framework presented above. Therefore, in the following, it is assumed that all parameters are constant and that the initial polynomial set F contains, in addition to the polynomials describing the model structure, the polynomials describing this fact, see Example 20.

In Ljung and Glad (1994), necessary and sufficient conditions for global identifiability of continuous-time model structures were given. Here, we will discuss the generalization of these results to discrete-time model structures. First, let us recall the definition of identifiability.

Definition 21. Let (5) have the solution (θ, z) with z^T = (y^T, u^T, 0, x^T), and also the solution (θ̄, z̄) with z̄^T = (ȳ^T, ū^T, 0, x̄^T). If u = ū and y = ȳ imply that θ = θ̄ for all solutions in a given solution set, then the model structure is said to be identifiable for that solution set.
Now, since the discrete-time version of Ritt's algorithm is so similar to the continuous-time case, should we not be able to derive equivalent results concerning the identifiability of discrete-time model structures? That is, is a discrete-time model structure, only containing polynomial nonlinearities, globally identifiable if and only if it can be written as a linear regression model?
The first part of this statement is, at this point, quite easy to prove. Namely, if Ritt's algorithm results in a linear regression model, then the corresponding model structure is globally identifiable.
Theorem 22.
If the output of Algorithm 1 contains an expression of the form Qθ − P, where the diagonal matrix Q and the vector P do not depend on θ and x, then the model structure is globally identifiable, provided that det Q ≠ 0 for the measured data².
Proof.
Since, according to Theorem 19, every solution of the original equations is also a solution of the output equations, it follows that there can only be one value of θ that is consistent with the measured values, provided that det Q ≠ 0.
Now, let us consider the converse of the above statement. In the continuous-time case, the proof of this fact is based on the following property: if f(t) and g(t) are analytic functions satisfying f(t)g(t) = 0 for all t ∈ R, then either f(t) = 0 or g(t) = 0 for all t ∈ R. Unfortunately, this property does not remain valid when the domain changes from R to Z. A simple example of a discrete-time signal that does not satisfy the above property is the Kronecker delta function

δ(t) ≜ 1 if t = 0, and δ(t) ≜ 0 if t ∈ Z \ {0}.

For instance, it holds that δ(t)δ(t − 1) = 0 for all t ∈ Z even though δ(t) and δ(t − 1) are nonzero for t = 0 and t = 1, respectively. The absence of this property for discrete-time signals hinders a straightforward generalization of the desired identifiability result. Reasonable assumptions under which the theorem could be proved are yet to be found.
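The counterexample can be checked mechanically; the following sketch verifies on a finite window of Z that the product δ(t)δ(t − 1) vanishes identically although neither factor is identically zero:

```python
def delta(t: int) -> int:
    """Kronecker delta: 1 at t = 0, zero elsewhere on Z."""
    return 1 if t == 0 else 0

# The product vanishes for every t ...
assert all(delta(t) * delta(t - 1) == 0 for t in range(-100, 101))
# ... yet neither factor is identically zero:
assert delta(0) == 1          # delta(t) is nonzero at t = 0
assert delta(1 - 1) == 1      # delta(t - 1) is nonzero at t = 1
```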
Even though we were not able to provide a general identifiability result, equivalent to that achieved in the continuous-time case, we are still able to draw some conclusions when the use of Ritt's algorithm does not result in a linear regression model.

² The requirement det Q ≠ 0 can be interpreted as a condition of persistence of excitation of the input signal (see, for instance, Ljung, 1999).
Theorem 23.
Let the ranking be given by

u(·) ≺ y(·) ≺ θ1(·) ≺ ··· ≺ θm(·) ≺ x(·),

where x contains any unmeasured variables. Assume that the output of Algorithm 1 is an autoreduced set of the following form:

p0(u, u̇, y, ẏ, ...),
θ̇1 − θ1,
p2(u, u̇, y, ẏ, ..., θ1, θ2),
...,
pm(u, u̇, y, ẏ, ..., θ1, ..., θm),
...

Furthermore, assume that there exists a solution to this set such that all polynomials in G from Algorithm 1 are nonzero. Then there are infinitely many values of θ compatible with the u and y of this solution, that is, the system is unidentifiable.
Proof.
Fixing the values of u and y in the first polynomial p0 to those of the given solution means that the corresponding equation is always satisfied. The parameter θ1 can now be changed to a new arbitrary constant in the second polynomial p2. If this change is small enough, the remaining equations can be solved successively for the leader due to the nonvanishing of the separants (the implicit function theorem, see, for instance, Rudin, 1976). Now, Theorem 19 gives that the solutions corresponding to the polynomial set {p0, ..., pm} are also solutions to the initial set F. Since this initial set contains the polynomials θ̇i − θi for all i = 1, ..., m, this implies that the θi will be constant for all solutions. Hence, the result follows.
The above theorem implies that, in this particular case, if Ritt's algorithm is not able to eliminate the time shift for at least one of the parameters, not necessarily the first parameter, then the model structure is unidentifiable. Thus, even though we were not able to prove that every globally identifiable model structure may be rewritten as a linear regression model, this indicates that Ritt's algorithm can still be useful when analyzing the identifiability of discrete-time model structures.
Ritt's algorithm has some interesting implications for the parameter estimation problem in system identification, since it, in some cases, results in a linear regression.
4 A pragmatic approach
When working with Ritt's algorithm and the framework developed in the sections above, one notices almost immediately that the complexity of the resulting linear regression model grows quite rapidly with the number of parameters involved in the original nonlinear model structure. This is especially true in discrete time.
Another drawback of using Ritt's algorithm is illustrated in the example below.
Example 24.
Consider the model structure defined by

y = θ1 u + θ1 θ2 u̇ + e,   (20)

where the dot, as usual, denotes the backward shift. This is a special case of a nonlinearly parametrized finite impulse response (fir) model structure. Now, assume that one is interested in finding estimates of the parameters θ = (θ1 θ2)^T given some input and output data. One possibility would be to use Ritt's algorithm to find a linear regression model from which an estimate can be found via the least-squares method. For the resulting estimate to be unbiased, it is necessary that the regressor and the noise in the linear regression model are uncorrelated. Unfortunately, this is not generally the case for fir models, like the one defined by (20). Running Algorithm 1 with the ordering
e(·) ≺ u(·) ≺ y(·) ≺ θ1 ≺ θ2 ≺ θ̇1 ≺ θ̇2,

yields a linear regression model which is too lengthy to present in its entirety; only the linear regression for the first parameter, z1 = ϕ1 θ1 + v1, is given:
[The expressions for z1, ϕ1, and v1, which are lengthy polynomials in e, u, y, and their time shifts up to the fourth shift u(4), are omitted here.]
Here we note that the regressor ϕ1 is correlated with the noise signal v1, since ϕ1 contains shifted versions of y, which depend directly on shifted versions of e. Thus, the least-squares estimate of the parameters θ may be biased. To find an unbiased estimate, one needs to make use of techniques like instrumental variables (see, for instance, Söderström and Stoica, 1983). It should be mentioned that the linear regression model given by Ritt's algorithm for the second parameter is even lengthier than the one for the first parameter presented above.
In the example above, we noticed that, even for the fir model structure, one can get correlation between the regressors and the noise in the resulting linear regression model when using Ritt's algorithm. With this in mind, and considering the complexity of Ritt's algorithm, we will here formalize a more pragmatic approach to eliminate the parameter nonlinearities that might appear. The main idea is to use successive time shifts and pseudo-divisions as in Example 1.
Theorem 25.
Consider the model structure defined by

y = η^T f(u, q) + Σ_{i=1}^{m} h_i(η, ξ) g_i(u, q) + e,   (21)

where y is the output, u is the input, and e is the noise, respectively. The functions f, (h_i)_{i=1}^{m}, and (g_i)_{i=1}^{m} may have a nonlinear dependence on the indicated variables, where q denotes the forward shift operator. The unknown parameters θ^T = (η^T ξ^T) are constant, where all parameters that do not appear in any linear term in (21) have been collected in ξ. Under the assumptions above, the model structure (21) can be written as a linear regression model

H_m(u, q) y = η^T ϕ_m(u, q) + H_m(u, q) e,   (22)

in the parameter vector η. Here, ϕ_m is a nonlinear vector-valued function in a finite number of components of u, and H_m is a linear filter whose coefficients have a nonlinear dependence on the components of u, respectively. The components of (22) are defined by

H_{j+1}(u, q) = g̃_{m−j,j}(u̇, q) H_j(u, q) − g̃_{m−j,j}(u, q) H_j(u̇, q) q^{−1},
ϕ_{j+1}(u, q) = g̃_{m−j,j}(u̇, q) ϕ_j(u, q) − g̃_{m−j,j}(u, q) ϕ_j(u̇, q),
g̃_{i,j+1}(u, q) = g̃_{m−j,j}(u̇, q) g̃_{i,j}(u, q) − g̃_{m−j,j}(u, q) g̃_{i,j}(u̇, q),

for j = 0, ..., m − 1, with H_0 ≜ 1, ϕ_0 ≜ f, and g̃_{i,0} ≜ g_i, respectively.
Proof.
The proof proceeds by eliminating the nonlinear functions (h_k)_{k=1}^{m} from (21) one at a time using time shifts and pseudo-divisions. Starting off by time shifting (21), bearing in mind that the parameters in η and ξ are constant, yields

ẏ = η^T f(u̇, q) + Σ_{i=1}^{m} h_i(η, ξ) g_i(u̇, q) + ė.   (23)

Multiplying (21) by g_m(u̇, q) and (23) by g_m(u, q), and then subtracting the results, yields a difference equation of the form

H_1(u, q) y = η^T ϕ_1(u, q) + Σ_{i=1}^{m−1} h_i(η, ξ) g̃_{i,1}(u, q) + H_1(u, q) e,

where the filter H_1 and the nonlinear functions ϕ_1 and g̃_{i,1} depend on the indicated signals, for instance, H_1(u, q) ≜ g_m(u̇, q) − g_m(u, q) q^{−1}. Now, time shifting the above expression yields

H_1(u̇, q) ẏ = η^T ϕ_1(u̇, q) + Σ_{i=1}^{m−1} h_i(η, ξ) g̃_{i,1}(u̇, q) + H_1(u̇, q) ė.

Thus, cross-multiplying the expressions above with g̃_{m−1,1}(u̇, q) and g̃_{m−1,1}(u, q), respectively, and subtracting the results, yields a difference equation of the form

H_2(u, q) y = η^T ϕ_2(u, q) + Σ_{i=1}^{m−2} h_i(η, ξ) g̃_{i,2}(u, q) + H_2(u, q) e,

where the filter H_2 and the nonlinear functions ϕ_2 and g̃_{i,2} depend on the indicated signals, for instance,

H_2(u, q) ≜ g̃_{m−1,1}(u̇, q) H_1(u, q) − g̃_{m−1,1}(u, q) H_1(u̇, q) q^{−1}.

Note that the resulting difference equation after the second step is essentially of the same form as that after the first step. Hence, continuing in this manner, by successive shifts and pseudo-divisions, all the nonlinearities (h_i)_{i=1}^{m} will be eliminated in a total of m steps and the resulting difference equation will be a linear regression model of the form (22).
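One elimination step is easy to verify symbolically (a sketch, not part of the original paper). Take the model (20), y(t) = θ1 u(t) + θ1 θ2 u(t−1) + e(t), which is an instance of (21) with m = 1, η = θ1, f = u, h_1(η, ξ) = θ1 θ2, and g_1 = u̇. One shift and one cross-multiplication should cancel the nonlinear term:

```python
import sympy as sp

t = sp.symbols('t', integer=True)
u, y, e = sp.Function('u'), sp.Function('y'), sp.Function('e')
th1, th2 = sp.symbols('theta1 theta2')

# Model (20): y(t) = th1*u(t) + th1*th2*u(t-1) + e(t)
def model(s):
    return th1*u(s) + th1*th2*u(s - 1) + e(s)

# One step of the scheme: cross-multiply the equation at time t with
# g1 on the shifted signals (= u(t-2)) and the shifted equation with
# g1 (= u(t-1)), then subtract; this realizes H1 = u(t-2) - u(t-1)*q^{-1}.
phi1 = u(t)*u(t - 2) - u(t - 1)**2            # regressor for theta1
noise = u(t - 2)*e(t) - u(t - 1)*e(t - 1)     # H1 applied to e
eq = u(t - 2)*y(t) - u(t - 1)*y(t - 1) - th1*phi1 - noise

# Along model trajectories, the theta1*theta2 term cancels exactly:
residual = eq.subs({y(t): model(t), y(t - 1): model(t - 1)})
assert sp.expand(residual) == 0
```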
There are some conclusions that can be drawn from the result given in Theorem 25. Firstly, the pragmatic elimination scheme applied to an nfir model structure results in a linear regression model in the parameters appearing linearly in the original nfir model structure. In particular, if e(t) is white noise with zero mean and independent of the input u(t), then the least-squares estimate of η will be consistent. Secondly, since the resulting noise filter H_m is known, one may compensate for it using the weighted least-squares method (see, for example, Ljung, 1999, page 547). Finally, it is worth noting that neither the regressors ϕ_m nor the filter H_m in the resulting linear regression model (22) depend on the nonlinear functions h_i. Thus, these functions do not need to be known when performing the elimination scheme.
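The consistency claim can be illustrated numerically (an illustration with assumed parameter values, not an experiment from the paper): simulate model (20) with θ1 = 0.75, θ2 = 0.5, form the eliminated regression u(t−2)y(t) − u(t−1)y(t−1) = θ1 [u(t)u(t−2) − u(t−1)²] + v(t), and estimate θ1 by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
N, th1, th2 = 100_000, 0.75, 0.5   # assumed true parameter values

u = rng.uniform(-1.0, 1.0, N)
e = 0.1 * rng.standard_normal(N)   # zero-mean white noise, independent of u

# Simulate model (20): y(t) = th1*u(t) + th1*th2*u(t-1) + e(t)
y = th1 * u + e
y[1:] += th1 * th2 * u[:-1]

# Eliminated regression: one shift and one pseudo-division removes th1*th2
z = u[:-2] * y[2:] - u[1:-1] * y[1:-1]
phi = u[2:] * u[:-2] - u[1:-1] ** 2

# v(t) = u(t-2)e(t) - u(t-1)e(t-1) is uncorrelated with phi(t),
# so ordinary least squares is consistent:
th1_hat = np.dot(phi, z) / np.dot(phi, phi)
print(th1_hat)  # close to 0.75
```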
Now, let us try out the elimination scheme constructed in the proof of Theorem 25 on a simple example.
Example 26.
Consider the Hammerstein system (see, for instance, Ljung, 1999, page 143) generated by

y = (1 + ξ q^{−1})(η1 u + η2 u³) + e = η1 u + η2 u³ + η1 ξ u̇ + η2 ξ u̇³ + e,   (24)

where y denotes the output, u denotes the input, and e is assumed to be zero-mean white noise, respectively. The parameters appearing linearly in (24) are η = (η1 η2)^T.
Even for this rather simple example, the application of Ritt's algorithm is not practically tractable and the expressions involved become quite complicated. Furthermore, the transformed noise and the regressors will not be independent, and a least-squares estimate will therefore be biased.
Now, the pragmatic elimination scheme outlined in the proof of Theorem 25 applied to (24) is quite simple, see also Example 1. First, the nonlinearity containing η2 ξ is eliminated by a time shift of (24) followed by a pseudo-division. Second, the remaining nonlinearity containing η1 ξ is eliminated in a similar manner. The resulting components of the linear regression model (22) are given by
H(u, q), ϕ_{η1}(u, q), and ϕ_{η2}(u, q), lengthy polynomials in u and its time shifts up to the third shift u(3), with the shift operators q^{−1} and q^{−2} appearing in H, which are omitted here; the subscript denotes the corresponding part of the vector-valued function ϕ.
To evaluate the performance of the weighted least-squares parameter estimate when using the above linear regression model, a dataset {y(t), u(t)}_{t=1}^{N} with N = 2,000 was generated by simulating (24) with the input signal u(t) = (1 + 0.5 q^{−1}) v(t), where v(t) was chosen uniformly random in the interval [−1, 1], and the noise signal e was chosen as Gaussian white noise with zero mean and variance 1. The parameter values were chosen as

ξ = −0.75, η1 = 0.75, η2 = 0.5,

and the signal-to-noise ratio was determined to be approximately 35.6 dB. The estimation experiment was repeated 500 times with different input and noise realizations and resulted in the weighted least-squares mean and standard deviation estimates

η̂1 = 0.7513 ± 0.0347, η̂2 = 0.4995 ± 0.0245.

This indicates that the pragmatic approach may in certain cases be useful when searching for an initial estimate for the prediction-error method.
As pointed out in the example above, the pragmatic elimination scheme may work well enough to be used to find an initial parameter estimate. The scheme is easily implemented using some symbolic software.
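As an illustration of such an implementation (a sketch following the notation of Theorem 25, not the authors' code), the recursions for H, ϕ, and g̃ can be coded in a few lines of sympy and applied to the Hammerstein model (24); the final assertion verifies that the resulting linear regression (22) holds along trajectories of the model:

```python
import sympy as sp

t = sp.symbols('t', integer=True)
u, e = sp.Function('u'), sp.Function('e')
eta1, eta2, xi = sp.symbols('eta1 eta2 xi')

def shift(expr):
    """Backward shift: replace t by t - 1 in a signal expression."""
    return expr.subs(t, t - 1)

# Model (24) as an instance of (21): f = (u, u^3)^T, and the nonlinear
# terms h_1*g_1 = eta1*xi*u(t-1) and h_2*g_2 = eta2*xi*u(t-1)^3.
f = [u(t), u(t)**3]
g = [u(t - 1), u(t - 1)**3]

# Recursions of Theorem 25; H_j is stored as its list of coefficients
# of q^{-k}, k = 0, 1, ..., acting on y (and on e).
H = [sp.Integer(1)]     # H_0 = 1
phi = list(f)           # phi_0 = f
gt = list(g)            # g~_{i,0} = g_i

for _ in range(len(g)):
    s = gt.pop()                     # g~_{m-j,j}(u, q)
    ss = shift(s)                    # the same function on the shifted signals
    Hs = [shift(c) for c in H]
    newH = [ss * c for c in H] + [sp.Integer(0)]
    for k, c in enumerate(Hs):       # the trailing q^{-1} in the H-recursion
        newH[k + 1] -= s * c
    H = newH
    phi = [ss * p - s * shift(p) for p in phi]
    gt = [ss * gi - s * shift(gi) for gi in gt]

# Check (22): H_m(u,q) y = eta^T phi_m + H_m(u,q) e along trajectories of (24).
def y_model(s_):
    return (eta1*u(s_) + eta2*u(s_)**3
            + eta1*xi*u(s_ - 1) + eta2*xi*u(s_ - 1)**3 + e(s_))

residual = (sum(c * y_model(t - k) for k, c in enumerate(H))
            - (eta1*phi[0] + eta2*phi[1])
            - sum(c * e(t - k) for k, c in enumerate(H)))
assert sp.expand(residual) == 0
```

Note that, as Theorem 25 states, the recursion never uses the nonlinear functions h_i, only the g_i.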
It is worth noting that the pragmatic elimination scheme constructed in the proof of Theorem 25 also works for narx models, that is, when the functions f or (g_k)_{k=1}^{m} in (21) also depend on the output y. The difference is that one cannot guarantee that the regressor and the noise in the resulting linear regression will be uncorrelated. Thus, the least-squares parameter estimate given a dataset may not be asymptotically unbiased, and one might need to make use of some instrumental variables approach (see, for example, Söderström and Stoica, 1983).
5 Conclusions
In this paper, a generalization of Ritt's algorithm to systems of difference equations has been presented.
Furthermore, the generalization of the identifiability results given in Ljung and Glad (1994) has been discussed. It turned out that a straightforward generalization was difficult and the results could only be partially generalized. As a final result, a pragmatic alternative to Ritt's algorithm was formalized, and the effects of the nonlinear transformations of the noise from a parameter estimation perspective were analyzed for a particular class of nfir models and illustrated via a numerical example. It turned out that the transformed noise signal was the original noise filtered through a known linear time-dependent filter. Thus, if the original noise signal is independent of the input, then the least-squares estimate, using the transformed model structure, will be consistent.
6 Acknowledgment
The authors would like to thank the Swedish Research Council and the Vinnova Industry Excellence Center LINK-SIC for support.
References
E. Bai. An optimal two-stage identification algorithm for Hammerstein-Wiener nonlinear systems. Automatica, 34(3):333–338, 1998.
T. Becker and V. Weispfenning. Gröbner Bases: A Computational Approach to Commutative Algebra. Springer-Verlag, New York, 1993.
B. Buchberger. Theoretical basis for the reduction of polynomials to canonical forms.
ACM SIGSAM Bulletin
, 39:19–24, 1976.
S. Diop. A state elimination procedure for nonlinear systems. In
New Trends in Nonlinear Control Theory
, volume 122 of
Lecture Notes in Control and Information Sciences
, pages 190–198. Springer, 1989.
S. Diop. Elimination in control theory.
Mathematics of
Control, Signals and Systems
, 4(1):17–32, 1991.
S. Diop and M. Fliess. On nonlinear observability. In
Proceedings of the 1st European Control Conference
, pages 152–157, Grenoble, France, 1991a.
S. Diop and M. Fliess. Nonlinear observability, identifiability and persistent trajectories. In Proceedings of the 30th IEEE Conference on Decision and Control, pages 714–719, Brighton, England, 1991b.
M. Fliess. Automatique en temps discret et algèbre aux différences. Forum Mathematicum, 2:213–232, 1990.
M. Fliess and T. Glad. An algebraic approach to linear and nonlinear control. In Essays on Control: Perspectives in the Theory and Its Applications, volume 14 of Progress in Systems and Control Theory, pages 190–198. Birkhäuser, 1994.
M. Fliess, J. Lévine, P. Martin, and P. Rouchon. Flatness and defect of nonlinear systems: Introductory theory and examples. International Journal of Control, 61(6):1327–1361, 1995.
K. Forsman. Constructive Commutative Algebra in Nonlinear Control Theory. PhD thesis, Dissertation No. 261, Linköping University, 1991.
T. Glad. Implementing Ritt's algorithm of differential algebra. In Proceedings of the 2nd IFAC Symposium on Nonlinear Control Systems Design, pages 610–614, Bordeaux, France, 1992.
T. Glad. Solvability of differential algebraic equations and inequalities: An algorithm. In European Control Conference, Brussels, Belgium, 1997.
E. Kolchin. Differential Algebra and Algebraic Groups. Academic Press, 1973.
S. Kotsios. An application of Ritt’s remainder algorithm to discrete polynomial control systems.
IMA
Journal of Mathematical Control and Information
, 18
(18):19–29, 2001.
S. Kotsios and N. Kalouptsidis. The model matching problem for a certain class of nonlinear systems.
International Journal of Control
, 57(3):707–730, 1993.
S. Lang. Algebra. Springer-Verlag, third revised edition, 2002.
L. Ljung.
System Identification: Theory for the User
.
Prentice Hall, second edition, 1999.
L. Ljung and T. Glad. On global identifiability for arbitrary model parametrizations.
Automatica
, 30
(2):265–276, 1994.
C. Lyzell, T. Glad, M. Enqvist, and L. Ljung. Identification aspects of Ritt's algorithm for discrete-time systems. In Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 2009.
B. Mishra. Algorithmic Algebra. Texts and Monographs in Computer Science. Springer-Verlag, 1993.
J. F. Ritt. Differential Algebra. Dover Publications, 1950.
W. Rudin.
Principles of Mathematical Analysis
.
McGrawHill, third edition, 1976.
A. Seidenberg. An elimination theory for differential algebra. In University of California Publications in Mathematics: New Series, pages 31–66. University of California Press, 1956.
T. Söderström and P. Stoica. Instrumental Variable Methods for System Identification. Springer, 1983.