Time series, Part 3
Nonlinear analysis of time series
Linear analysis / linear models
- autocorrelation
- AR model
- ARMA(p,q) model:
$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + z_t + \theta_1 z_{t-1} + \dots + \theta_q z_{t-q}$$
Advantages:
1. Simple
2. Gaussian process, established theory for stochastic processes and statistical inference
3. Useful in applications

Shortcomings:
1. Cannot explain irregular patterns in the time series:
   - data (distribution) asymmetry
   - time irreversibility
   - "bursts"
2. The deterministic part can only be:
   - a stable fixed point system
   - an unstable system
   - a periodic system
Nonlinear analysis of time series
- description of irregular patterns
- explanation / detection of complex deterministic patterns
A general nonlinear model
$$X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}, \epsilon_t)$$
or, with additive noise,
$$X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$$
where $\mathbf{X}_{t-1} = [X_{t-1}, X_{t-2}, \dots, X_{t-p}]' \in \mathbb{R}^p$ and $f: \mathbb{R}^p \to \mathbb{R}$. What is $f$?

Linear AR model:
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \epsilon_t$$
Generalizations / extensions of the AR model

Coefficients:
- constant (linear AR): $\phi_1, \phi_2, \dots, \phi_p$
- piecewise constant, one set per regime: $\phi_1^{(1)}, \dots, \phi_p^{(1)}$; $\phi_1^{(2)}, \dots, \phi_p^{(2)}$; $\dots$; $\phi_1^{(l)}, \dots, \phi_p^{(l)}$
  piecewise models: SETAR, Markovian
- random coefficients: RCA, BL

Noise variance:
- constant (linear AR, ARMA)
- function of $X_t$: ARCH, GARCH
Self-excited threshold autoregressive models (SETAR)

Selection of a lag $d$ and a partition of $\mathbb{R}$ for $X_{t-d}$:
$$-\infty = r_0 < r_1 < \dots < r_{l-1} < r_l = \infty, \qquad R_i = (r_{i-1}, r_i], \ i = 1, \dots, l, \qquad \mathbb{R} = R_1 \cup R_2 \cup \dots \cup R_l$$

SETAR:
$$X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \dots + \phi_p^{(j)} X_{t-p} + \sigma^{(j)} \epsilon_t \qquad \text{when } X_{t-d} \in R_j$$
Example for SETAR
$$X_t = \begin{cases} 2.0 + 0.6\, X_{t-1} + \epsilon_t & \text{if } X_{t-1} \le 0 \\ -1.0 - 0.4\, X_{t-1} + \epsilon_t & \text{if } X_{t-1} > 0 \end{cases} \qquad \epsilon_t \sim \mathcal{N}(0,1)$$

[Figure: scatter plot of $(x_{t-1}, x_t)$ for this SETAR model.]
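As an illustration, a minimal Python sketch of simulating this SETAR example (the function name and burn-in handling are our own choices, not from the slides):

```python
import numpy as np

def simulate_setar(n, burn=100, seed=0):
    """Simulate the two-regime SETAR example above (d = 1, p = 1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        e = rng.normal(0.0, 1.0)            # epsilon_t ~ N(0, 1)
        if x[t - 1] <= 0:                   # regime 1: X_{t-1} <= 0
            x[t] = 2.0 + 0.6 * x[t - 1] + e
        else:                               # regime 2: X_{t-1} > 0
            x[t] = -1.0 - 0.4 * x[t - 1] + e
    return x[burn:]                         # drop the transient

x = simulate_setar(300)                     # series for a scatter plot as above
```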
AR models with probabilistic selection of the threshold
Exponential autoregressive models (EAR)

$$X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \epsilon_t, \qquad j = \begin{cases} 1 & \text{with probability } \pi \\ 2 & \text{with probability } 1 - \pi \end{cases}$$

Example: $\phi_1^{(1)} = 1$, $\phi_2^{(1)} = 0$ and $\phi_1^{(2)} = 0$, $\phi_2^{(2)} = 2$.
AR models with periodic coefficients
$$X_t = \phi_1^{(j)} X_{t-1} + \phi_2^{(j)} X_{t-2} + \epsilon_t, \qquad j = \begin{cases} 1 & \text{when } t = 2k \\ 2 & \text{when } t = 2k + 1 \end{cases}$$
Markov chain driven AR models
The selection of the regime (threshold) is determined by a Markov chain $J_t = j \in \{1, 2, \dots, l\}$.

Example:
$$X_t = \phi^{(J_t)} X_{t-1} + \epsilon_t, \qquad \phi^{(1)} = 0.9, \quad \phi^{(2)} = -0.9$$
Transition matrix:
$$P(J_t = j \mid J_{t-1} = i) = \begin{pmatrix} 0.1 & 0.9 \\ 0.2 & 0.8 \end{pmatrix}$$
Piecewise polynomial models
$$X_t = f(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$$
$$X_t = p_m(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \epsilon_t$$
where $p_m$ is a polynomial of order $p$ and degree $m$.

Example (logistic map):
$$X_t = aX_{t-1}(1 - X_{t-1}) = aX_{t-1} - aX_{t-1}^2, \qquad a > 1$$
Two fixed points: $0$ and $(a-1)/a$.
Fractional autoregressive models
Fraction of two polynomials:
$$X_t = \frac{a_0 + \sum_{j=1}^{p} a_j X_{t-j}}{b_0 + \sum_{j=1}^{q} b_j X_{t-j}} + \epsilon_t, \qquad a_p \ne 0, \ b_q \ne 0$$
With a constant denominator this reduces to a polynomial model, e.g. the logistic map.
Random coefficients autoregressive models (RCA)
$$X_t = \epsilon_t X_{t-1}$$
AR(1) with multiplicative errors.

RCA: coefficients $\phi_i = b_i + B_i(t)$
$$X_t = \sum_{i=1}^{p} \left( b_i + B_i(t) \right) X_{t-i} + \epsilon_t$$
$b_i$ constant; $B_1(t), B_2(t), \dots, B_p(t)$ random with mean 0, independent of $\epsilon_t$ and $X_t$.

Example:
$$X_t = \left( 0.1 + B(t) \right) X_{t-1} + \epsilon_t, \qquad B(t) \sim \mathcal{N}(0, 0.9^2)$$
Bilinear models (BL)
$$X_t = \sum_{i=1}^{p} \left( a_i + A_i(t) \right) X_{t-i} + \epsilon_t, \qquad A_i(t) = \sum_{k=1}^{s} b_{ik}\, \epsilon_{t-k}$$
coefficients: $\phi_i = a_i + A_i(t)$

"Bilinear" because:
- If $X_s$, $s < t$, are held constant, then $X_t$ is linear w.r.t. $\epsilon_s$, $s \le t$.
- If $\epsilon_s$, $s < t$, are held constant, then $X_t$ is linear w.r.t. $X_s$, $s < t$.

BL of order 1:
$$X_t = aX_{t-1} + b\,\epsilon_{t-1} X_{t-1} + \epsilon_t$$
AR models with conditional heteroscedasticity
Model of multiplicative noise: $X_t = \epsilon_t V_t$, with $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$.

ARCH:
$$X_t = \epsilon_t V_t, \qquad V_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2, \qquad \alpha_0 > 0, \ \alpha_i \ge 0$$
$\{X_t\} \sim$ ARCH $\Rightarrow \{X_t^2\} \sim$ BL

GARCH:
$$X_t = \epsilon_t V_t, \qquad V_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{i=1}^{q} \beta_i V_{t-i}^2, \qquad \alpha_0 > 0, \ \alpha_i \ge 0, \ \beta_i \ge 0$$
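A minimal sketch of simulating an ARCH(1) process under the definition above (the parameter values a0, a1 are arbitrary illustrative choices):

```python
import numpy as np

def simulate_arch1(n, a0=0.5, a1=0.4, burn=100, seed=0):
    """Simulate X_t = e_t * V_t with V_t^2 = a0 + a1 * X_{t-1}^2 (ARCH(1))."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        v2 = a0 + a1 * x[t - 1] ** 2        # conditional variance
        x[t] = rng.normal(0.0, 1.0) * np.sqrt(v2)
    return x[burn:]

x = simulate_arch1(1000)                    # exhibits bursts of volatility
```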
Analysis with nonlinear models
1. Model selection
   M candidate models, m = 1, ..., M:
   $$\mathrm{AIC}(m) = -2 \ln g\left( \mathbf{x} \mid \hat{\boldsymbol{\theta}}_m(\mathbf{x}) \right) + 2r$$
   where $r$ is the number of free parameters of model $m$ (a sketch of the computation is given after this list).
2. Parameter estimation
   - maximum likelihood method
   - method of ordinary least squares
3. Diagnostic checking
   The errors (residuals) should be uncorrelated and follow a normal distribution.
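For a model with Gaussian errors the AIC reduces, up to an additive constant, to a function of the residual variance; a minimal sketch (the helper is our own, not from the slides):

```python
import numpy as np

def aic_gaussian(residuals, r):
    """AIC up to an additive constant for Gaussian errors:
    N * ln(residual variance) + 2 * r, with r free parameters."""
    e = np.asarray(residuals)
    return len(e) * np.log(np.mean(e ** 2)) + 2 * r
```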
Real-world time series
- physiology
- mechanics
- geophysics
- economics
Nonlinear time series analysis and dynamical systems
Time series: $x_1, x_2, \dots, x_n$

Assumption:
- observation: $x_t = h(s_t)$
- $s_t \in \mathbb{R}^d$: trajectory of the dynamical system
- $h: \mathbb{R}^d \to \mathbb{R}$: observation function
- $s_t = f^t(s_0)$: nonlinear dynamical system
- $s_0$: state vector at time 0
- $f^t: \mathbb{R}^d \to \mathbb{R}^d$: system function
- $t$: continuous or discrete time
For time series we assume the underlying systems to be dissipative: the trajectory in $\mathbb{R}^d$ converges to an attractor.

Attractor:
● stable fixed (equilibrium) point (can be derived by a linear system)
● finite set of equilibrium points (can be derived by a linear system)
● limit cycle (cannot be derived by a linear system)
● torus (cannot be derived by a linear system)
● strange attractor (cannot be derived by a linear system): self-similarity (fractals), sensitivity to initial conditions → chaos
Nonlinear dynamical systems, maps (discrete time)

Logistic map: $s_i = a\, s_{i-1}(1 - s_{i-1})$
- chaotic for a = 4, periodic for a = 3.52

Hénon map (chaotic): $s_i = 1 - 1.4\, s_{i-1}^2 + 0.3\, s_{i-2}$

Ikeda map (chaotic, with complex state $s_k$):
$$s_k = 1 + 0.9\, s_{k-1} \exp\left( 0.4\,\mathrm{i} - \frac{6\,\mathrm{i}}{1 + |s_{k-1}|^2} \right)$$
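A minimal sketch of iterating these maps (the initial conditions are arbitrary illustrative choices):

```python
import numpy as np

def logistic(n, a=4.0, s0=0.3):
    """s_i = a * s_{i-1} * (1 - s_{i-1})"""
    s = np.empty(n); s[0] = s0
    for i in range(1, n):
        s[i] = a * s[i - 1] * (1.0 - s[i - 1])
    return s

def henon(n, s0=0.1, s1=0.1):
    """s_i = 1 - 1.4 * s_{i-1}^2 + 0.3 * s_{i-2}"""
    s = np.empty(n); s[0], s[1] = s0, s1
    for i in range(2, n):
        s[i] = 1.0 - 1.4 * s[i - 1] ** 2 + 0.3 * s[i - 2]
    return s

def ikeda(n, s0=0.1 + 0.1j):
    """s_k = 1 + 0.9 * s_{k-1} * exp(0.4i - 6i / (1 + |s_{k-1}|^2))"""
    s = np.empty(n, dtype=complex); s[0] = s0
    for k in range(1, n):
        s[k] = 1 + 0.9 * s[k - 1] * np.exp(0.4j - 6j / (1 + abs(s[k - 1]) ** 2))
    return s
```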
Nonlinear dynamical systems, flows (continuous time)

Lorenz system (states $s_1, s_2, s_3$):
$$\dot{s}_1 = a(s_2 - s_1), \qquad \dot{s}_2 = b s_1 - s_2 - s_1 s_3, \qquad \dot{s}_3 = -c s_3 + s_1 s_2$$
$$a = 10, \quad b = 28, \quad c = 8/3$$
The trajectory is sampled with sampling time $\tau_s$.
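A minimal sketch of integrating the Lorenz system and sampling it at $\tau_s$, using scipy's solve_ivp (the time span and initial condition are arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, a=10.0, b=28.0, c=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    s1, s2, s3 = s
    return [a * (s2 - s1), b * s1 - s2 - s1 * s3, -c * s3 + s1 * s2]

tau_s = 0.05                                  # sampling time
t = np.arange(0.0, 100.0, tau_s)
sol = solve_ivp(lorenz, (t[0], t[-1]), [1.0, 1.0, 1.0], t_eval=t)
s1, s2, s3 = sol.y                            # trajectory sampled at tau_s
```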
Noise in the time series

Observation: $x_t = h(s_t)$
With noise: $x_t = h(s_t) + w_t$ (observational noise)
$w_t$: white noise, uncorrelated to $x_t$ and $s_t$

Dynamical system: $s_t = f^t(s_0)$
With noise: $s_t = f^t(s_0) + \epsilon_t$ (dynamic / system noise)
$\epsilon_t$: white noise, uncorrelated to $s_u$, $u < t$

Noise: dynamic (system) $\epsilon$, or observational (measurement) $w$.

Logistic map with dynamic noise:
$$s_i = a\, s_{i-1}(1 - s_{i-1}) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad x_i = s_i$$
Logistic map with observational noise:
$$s_i = a\, s_{i-1}(1 - s_{i-1}), \qquad x_i = s_i + w_i, \quad w_i \sim \mathcal{N}(0, \sigma^2)$$
[Figures: periodic and chaotic cases for both noise types.]
Scatter diagrams in 2 and 3 dimensions

[Figures: for each of three time series (annual sunspots 1700-1996, the square of an AR(9) process, and the square of the z-variable of the Lorenz system): the time plot x(i) vs time index i (d=1), the scatter diagram (x(i-1), x(i)) (d=2), and the scatter diagram (x(i-2), x(i-1), x(i)) (d=3).]
Topics in the analysis of time series and dynamical systems
- State space reconstruction, in order to observe the complexity / stochasticity / structure of the system
- Estimation of characteristics of the system / attractor, measuring the complexity / dimension of the system
- Modeling / prediction: use of nonlinear models to improve predictions
- Other topics:
  - hypothesis testing for linearity / nonlinearity
  - control of system evolution
  - synchronization
  - ...
State space reconstruction
Condition: $m \ge 2D + 1$

We assume that the studied system is deterministic. The embedding $\Phi$ maps the initial state space $M$ to the reconstructed state space $\mathbb{R}^m$:
- original dynamics: $s_{i+1} = f(s_i)$, with observed quantity $x_i = h(s_i) \in \mathbb{R}$
- reconstructed points: $\mathbf{x}_i = \Phi(s_i)$
- reconstructed dynamics: $\mathbf{x}_{i+1} = F(\mathbf{x}_i)$

Method of delays:
$$\mathbf{x}_i = [x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}]$$

Parameters (a sketch of the method follows the list):
- embedding dimension $m$
- delay time $\tau$
- time window length $t_w = (m-1)\tau$
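A minimal sketch of the method of delays (the helper name delay_embed is our own; it is reused in later sketches):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Reconstructed points x_i = [x_i, x_{i-tau}, ..., x_{i-(m-1)tau}]."""
    x = np.asarray(x)
    n = len(x) - (m - 1) * tau                # number of reconstructed points
    return np.column_stack([x[(m - 1 - k) * tau : (m - 1 - k) * tau + n]
                            for k in range(m)])
```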
Example: Hénon map
$$s_i = 1 - 1.4\, s_{i-1}^2 + 0.3\, s_{i-2}$$
or, in system form,
$$s_1(i) = 1 - 1.4\, s_1(i-1)^2 + s_2(i-1), \qquad s_2(i) = 0.3\, s_1(i-1)$$

Method of delays with projection (observed quantity) $x_i = s_1(i)$:
[Figures: reconstructions for m=2, τ=1; m=2, τ=2; m=3, τ=1; m=3, τ=2. For m=2 the reconstructed attractor shows self-intersections.]
Example: Lorenz system
$$\dot{s}_1 = -a(s_1 - s_2), \qquad \dot{s}_2 = -s_1 s_3 + b s_1 - s_2, \qquad \dot{s}_3 = s_1 s_2 - c s_3$$
$$a = 10, \quad b = 28, \quad c = 8/3$$
Projection (observed quantity): $x_i = s_1(i)$. Optimal τ?

Method of delays, m=3:
[Figures: reconstructions for τ = 1, 5, 10, 20.]
Estimation of τ
• From the autocorrelation r(τ) (measures linear correlation):
  τ such that r(τ) = 1/e, or τ such that r(τ) = 0
• From the mutual information I(τ) (measures linear and nonlinear correlation):
  τ at the first local minimum of I(τ)
$$I(X, Y) = \sum_{x,y} p_{XY}(x, y) \log \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)}, \qquad X = x_i, \ Y = x_{i-\tau}, \qquad I(X, Y) = I(\tau)$$
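A minimal sketch of both criteria, with I(τ) estimated from a 2D histogram (the bin count is an arbitrary choice):

```python
import numpy as np

def autocorr(x, tau):
    """Sample autocorrelation r(tau)."""
    x = np.asarray(x) - np.mean(x)
    return np.sum(x[:-tau] * x[tau:]) / np.sum(x * x)

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau) between x_i and x_{i-tau}."""
    pxy, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                              # avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def delay_first_min(x, max_tau=50):
    """tau at the first local minimum of I(tau)."""
    I = [mutual_information(x, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(I) - 1):
        if I[t] < I[t - 1] and I[t] < I[t + 1]:
            return t + 1                      # taus start at 1
    return int(np.argmin(I)) + 1
```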
Estimation of m
Optimal m?
• Too small m → self-intersections in the attractor
• Too large m → "curse of dimensionality"
• Takens theorem: m ≥ 2D + 1 ... but D is unknown

Method of false nearest neighbors (FNN)
• Close points on the attractor are:
  - either true neighboring points, due to the system dynamics,
  - or false neighboring points, due to self-intersections when m is too small.
• At a larger m for which there are no self-intersections, all false neighboring points will be resolved, as they will no longer be close.
• The optimal m' is the one for which there are no longer any false nearest neighbors as the dimension increases by one, from m' to m'+1. A sketch of the FNN computation is given below.
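A minimal sketch of the FNN fraction when going from m to m+1, reusing the delay_embed helper from the reconstruction sketch (the simple distance-ratio test and the threshold rtol are conventional choices, not from the slides):

```python
import numpy as np
from scipy.spatial import cKDTree

def fnn_fraction(x, m, tau, rtol=10.0):
    """Fraction of false nearest neighbors going from m to m + 1."""
    Xm, Xm1 = delay_embed(x, m, tau), delay_embed(x, m + 1, tau)
    Xm = Xm[-len(Xm1):]                       # align the two embeddings in time
    d, j = cKDTree(Xm).query(Xm, k=2)         # nearest neighbor in dimension m
    d1, j1 = d[:, 1], j[:, 1]
    d2 = np.linalg.norm(Xm1 - Xm1[j1], axis=1)  # same pairs in dimension m + 1
    ok = d1 > 0
    return np.mean(d2[ok] / d1[ok] > rtol)    # neighbors that fly apart
```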
An example of estimating m by the FNN method
[Figures: %FNN vs m (m = 2, ..., 10) for τ = 2, 5, 10, 20, for x-Lorenz without noise and for x-Lorenz + 10% noise.]

The estimation of m with the FNN method depends on:
- the delay τ
- noise
Estimation of nonlinear characteristics
Nonlinear characteristics (invariant measures):
• Dimension
  1. Euclidean
  2. Topological
  3. Fractal (correlation, information, box counting, ...)
• Lyapunov exponents (largest, or the whole spectrum)
• Entropy
Correlation dimension ν
The correlation dimension ν characterizes the fractal structure of the attractor (self-similarity at different scales) using the density of the points of the attractor in the reconstructed state space.

The basic idea is that the probability of two points being closer than a distance r changes with r as a power of r (scaling law):
$$P\left( \| \mathbf{x}_i - \mathbf{x}_j \| \le r \right) \sim r^{\nu} \qquad \text{for } r \to 0, \ N \to \infty$$
Equivalently, the number of points lying in a sphere with radius r and center $\mathbf{x}_i$ scales as $r^\nu$.

- ν integer: the attractor is a regular geometric object
- ν non-integer: the attractor is a fractal
Estimation of the correlation dimension ν
Time series: $x_i$, $i = 1, \dots, n$; reconstruction: $\mathbf{x}_i$, $i = 1, \dots, N$, with $N = n - (m-1)\tau$.

Correlation sum:
$$C(r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \Theta\left( r - \| \mathbf{x}_i - \mathbf{x}_j \| \right)$$
where Θ is the Heaviside function: $\Theta(x) = 0$ for $x < 0$, $\Theta(x) = 1$ for $x \ge 0$.

Scaling law: $C(r) \sim r^{\nu}$ for small r.

Estimation:
$$\nu = \frac{d \log C(r)}{d \log r} \qquad \text{for a range of } r$$
Convergence of ν(m) for sufficiently large m.

If ν is small and non-integer and the system is deterministic: small dimension and fractal (chaotic) structure. A sketch of the computation of C(r) and ν is given below.
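A minimal sketch of C(r) and the slope estimate (the exclusion of temporally close pairs, |i-j| ≤ w, is omitted for brevity; the scaling range [r_lo, r_hi] must be chosen by inspection of the local slope):

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_sum(X, r):
    """C(r) for reconstructed points X and radii r."""
    d = pdist(X)                              # all pairwise distances
    return np.array([np.mean(d <= ri) for ri in np.atleast_1d(r)])

def correlation_dimension(X, r_lo, r_hi, n_r=20):
    """Slope of log C(r) vs log r over the scaling range [r_lo, r_hi]."""
    r = np.logspace(np.log10(r_lo), np.log10(r_hi), n_r)
    C = correlation_sum(X, r)
    keep = C > 0                              # drop radii with no pairs
    nu, _ = np.polyfit(np.log(r[keep]), np.log(C[keep]), 1)
    return nu
```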
[Figures: log C(r) vs log r, local slope vs log r, and ν vs m, for x-Lorenz without noise (τ=2), x-Lorenz + 10% observational noise (τ=2), and x-Lorenz + 10% observational noise (τ=10).]
The estimation of ν is affected by the following factors:
- the correlation time (temporally close pairs, $|i - j| \le w$, should be excluded from C(r))
- the selection of τ and m
- noise
- the time series length
()
n=924
0
()
()
5
5
4
4
m=1
-2
m=10
-3
3
3
m=10

local slope
Hénon
logC(r)
-1
2
2
1
-4
1
m=1
-5
-2
-1.5
-1
-0.5
0
0
-2
0.5
-1.5
-1
logr
-0.5
0
0
0
0.5
2
4
log r
()
6
8
10
6
8
10
6
8
10
6
8
10
m
(t)
()
5
5
0
m=1
4
4
-1
-2
m=10
-3
3

local slope
3
logC(r)
Hénon
+ 10% white noise
m=10
2
2
1
1
-4
-5
-2
m=1
-1.5
-1
-0.5
0
0
-2
0.5
logr
()
0
0
-1.5
-1
-0.5
0
2
4
0.5
m
log r
()
()
10
10
0
8
m=1
m=10
8
-2
-3
-4
-5
-4
6
6

logC(r)
Returns of ASE index
1/1/2005 – 20/9/2005
local slope
-1
4
4
2
2
m=10
-3.5
-3
-2.5
logr
()
-2
-1.5
0
-4
-1
-3.5
-3
-2.5
log r
()
m=1
-2
-1.5
0
0
-1
2
4
m
()
10
10
0
8
m=1
m=10
8
-2
-3
-4
-5
-4
6
6

local slope
white noise
logC(r)
-1
4
4
2
2
m=10
-3.5
-3
-2.5
logr
-2
-1.5
-1
0
-4
-3.5
-3
-2.5
log r
m=1
-2
-1.5
-1
0
0
2
4
m
Lyapunov exponents
The Lyapunov exponents measure the average rate of divergence and convergence of the trajectories on the attractor along the directions of the local state space.

Lyapunov spectrum: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$
- $\lambda_i > 0$ → divergence
- $\lambda_i < 0$ → convergence
- $\lambda_i = 0$ → direction of the flow

Dissipative system: $\sum_{i=1}^{m} \lambda_i < 0$

If $\lambda_1 > 0$ and the system is deterministic: chaos.
Largest Lyapunov exponent λ1
The initial distance $d_0 = \| \mathbf{x}_i - \mathbf{x}_{i'} \|$ of two nearby trajectories is expected to increase exponentially with time. After time t: $d_t = \| \mathbf{x}_{i+t} - \mathbf{x}_{i'+t} \|$.

If $d_t = d_0 e^{\lambda_1 t}$, then λ1 is the largest Lyapunov exponent.

Computation (a sketch follows):
$$\lambda_1 = \frac{1}{N t} \sum_{j=1}^{N} \ln \frac{d_{t,j}}{d_{0,j}}$$
[Diagram: two nearby trajectory segments starting at $\mathbf{x}_i$ and $\mathbf{x}_{i'}$, with distances $d_0$ and $d_t$.]
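A minimal sketch of this computation on reconstructed points (the exclusion window w for temporally close neighbors and the other parameter choices are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def largest_lyapunov(X, t, w=10):
    """lambda_1 = (1 / (N t)) * sum_j ln(d_{t,j} / d_{0,j})."""
    n = len(X) - t
    tree = cKDTree(X[:n])
    logs = []
    for i in range(n):
        d, j = tree.query(X[i], k=w + 2)      # candidate neighbors, sorted
        cand = [jj for jj, dd in zip(j, d) if abs(jj - i) > w and dd > 0]
        if not cand:
            continue
        j0 = cand[0]                          # nearest valid neighbor
        d0 = np.linalg.norm(X[i] - X[j0])
        dt = np.linalg.norm(X[i + t] - X[j0 + t])
        if dt > 0:
            logs.append(np.log(dt / d0))
    return np.mean(logs) / t
```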
Example: x-Lorenz
[Figures: estimates of λ1 without noise and with 10% noise.]
The estimation of λ1 depends on: τ, m, noise.
Prediction models
The true system generating the time series: $s_{i+1} = f(s_i)$

Hénon map:
$$s_{1,i+1} = 1 - 1.4\, s_{1,i}^2 + s_{2,i}, \qquad s_{2,i+1} = 0.3\, s_{1,i}$$
$$f: \ \mathbf{s}_i = \begin{bmatrix} s_{1,i} \\ s_{2,i} \end{bmatrix} \ \mapsto \ \mathbf{s}_{i+1} = \begin{bmatrix} f_1(s_{1,i}, s_{2,i}) \\ f_2(s_{1,i}, s_{2,i}) \end{bmatrix}$$
Prediction models
The true system generating the time series, $s_{i+1} = f(s_i)$, is unknown.
The reconstructed system from the time series: $\mathbf{x}_{i+1} = F(\mathbf{x}_i)$. How is F estimated?

The problem of modeling and prediction of time series: given $x_1, x_2, \dots, x_i$, estimate / predict $x_{i+1}$.

State space reconstruction with the method of delays: $\mathbf{x}_i = [x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}]$.

The function that is relevant to time series prediction is not the full map $F: \mathbb{R}^m \to \mathbb{R}^m$ with $\mathbf{x}_{i+1} = F(\mathbf{x}_i)$, but its first component $F: \mathbb{R}^m \to \mathbb{R}$ with $x_{i+1} = F(\mathbf{x}_i)$.

For m = 2, τ = 1: $x_{i+1} = F(x_i, x_{i-1})$.
Nonlinear prediction models
• Global models, e.g. polynomials: the function F bears the same analytic expression over the whole domain.
• Local models, e.g. the local linear model: the function F is defined differently at each point of the reconstructed state space.
• Semi-local models, e.g. neural networks: the form of the function F is derived as a weighted sum of local basis functions.
Prediction using similar segments of the time series
Prediction at time i+T from the mappings T steps ahead of "similar" segments from the past of the time series.

Local prediction models
Implementation of the idea of "similar" segments: time series segments → reconstructed points.
The nearest neighboring points to $\mathbf{x}_i$: $\{\mathbf{x}_{i(1)}, \mathbf{x}_{i(2)}, \dots, \mathbf{x}_{i(K)}\}$
Prediction of $x_{i+T}$ from the mappings of the neighbors: $\{x_{i(1)+T}, x_{i(2)+T}, \dots, x_{i(K)+T}\}$

Zeroth order prediction: $\hat{x}_{i+T} = x_{i(1)+T}$
Average prediction:
$$\hat{x}_{i+T} = \frac{1}{K} \sum_{j=1}^{K} x_{i(j)+T}$$
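A minimal sketch of the average prediction for the last point of the series, reusing the delay_embed helper (K = 1 gives the zeroth order prediction):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_average_predict(x, m, tau, K, T=1):
    """Predict x_{n+T} by averaging the T-step images of the K nearest
    neighbors of the current (last) reconstructed point."""
    X = delay_embed(x, m, tau)
    query = X[-1]                             # current point
    base, targets = X[:-T], X[T:, 0]          # neighbors with known image x_{i+T}
    _, idx = cKDTree(base).query(query, k=K)
    return float(np.mean(targets[np.atleast_1d(idx)]))
```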
Local linear prediction
We assume that in the neighborhood of $\mathbf{x}_i$ the local linear model is valid:
$$x_{i+1} = F(\mathbf{x}_i) = F(x_i, x_{i-\tau}, \dots, x_{i-(m-1)\tau}) = a_0 + a_1 x_i + a_2 x_{i-\tau} + \dots + a_m x_{i-(m-1)\tau} = a_0 + \mathbf{a}' \mathbf{x}_i$$

The model holds for the neighbors $\mathbf{x}_{i(1)}, \mathbf{x}_{i(2)}, \dots, \mathbf{x}_{i(K)}$:
$$x_{i(j)+T} = a_0 + \mathbf{a}' \mathbf{x}_{i(j)}, \qquad j = 1, \dots, K$$

Estimation of the parameters $a_0, a_1, \dots, a_m$ (method of ordinary least squares):
$$\min_{a_0, a_1, \dots, a_m} \sum_{j=1}^{K} \left( x_{i(j)+1} - a_0 - a_1 x_{i(j)} - \dots - a_m x_{i(j)-(m-1)\tau} \right)^2$$
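The same scheme with an OLS fit on the neighbors gives a sketch of the local linear prediction (np.linalg.lstsq solves the least squares problem above; K should exceed m + 1 for a well-posed fit):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_linear_predict(x, m, tau, K, T=1):
    """Local linear prediction of x_{n+T} from the K nearest neighbors."""
    X = delay_embed(x, m, tau)
    query, base, targets = X[-1], X[:-T], X[T:, 0]
    _, idx = cKDTree(base).query(query, k=K)
    A = np.column_stack([np.ones(K), base[idx]])   # regressors [1, x_{i(j)}]
    coef, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
    return float(coef[0] + coef[1:] @ query)
```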
Estimation of prediction error
We split the time series into two parts: a learning set $x_1, x_2, \dots, x_{N_1}$ and a test set $x_{N_1+1}, \dots, x_N$, with predictions $\hat{x}_{N_1+1}, \dots, \hat{x}_N$.

Prediction error: $e_{i+T} = x_{i+T} - \hat{x}_{i+T}$

Statistic for the prediction error:
$$\mathrm{NRMSE}(T) = \sqrt{ \frac{ \frac{1}{N - T - N_1} \sum_{t=N_1+1}^{N-T} \left( x_{t+T} - \hat{x}_{t+T} \right)^2 }{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 } }$$
Example: x-Lorenz
Prediction with:
• local linear prediction model (LLP)
• local average prediction model (LAP)
[Figures: predictions without noise and with 10% noise; τ = 1, m = 5, K = 11.]
Example: quarterly growth rate of the GNP of the USA in the period 1947 – 1991
Prediction with:
- linear model, AR
- local average model, LAM
- local linear model, LLM
Predictions starting at the first quarter of 1989, with the prediction horizon covering the last 6 years. Prediction error (NRMSE) for the last 30 quarters.

[Figures: the real series and predictions with AR(3), LAM(m=5, K=15), LLM(m=5, K=15); NRMSE(m) for m = 1, ..., 10 for AR, LAM(K=15), LLM(K=15).]
Example: ASE index in the period 1/1/2002 – 20/9/2005
Prediction of the index with:
- linear model, AR
- local average model, LAM
Prediction starting at 20/9/2005, with prediction horizon up to 16 days ahead.
Returns: $y_t = \dfrac{x_t - x_{t-1}}{x_{t-1}}$, where $x_t$ is the index.

[Figures: the close index and predictions $x_n(T)$ with AR(7) and LAM(m=7, K=20); the index returns and predictions $y_n(T)$ with AR(7) and LAM(m=7, K=20).]
Example: ASE index in the period 1/1/2002 – 20/9/2005
Prediction of the index with:
- linear model, AR
- local average model, LAM
One step ahead predictions in the period 21/9/2005 – 12/10/2005.

[Figures: the close index and one-step predictions $x_n(1)$ with AR(7) and LAM(m=7, K=20); the index returns and predictions $y_n(1)$ with AR(7) and LAM(m=7, K=20).]