Agosto Arianna tesi

Agosto Arianna tesi
Alma Mater Studiorum – Università di Bologna
DOTTORATO DI RICERCA IN
METODOLOGIA STATISTICA PER LA RICERCA
SCIENTIFICA
Ciclo XXVI
Settore Concorsuale di afferenza: 13/A5
Settore Scientifico disciplinare:
SECS-P/05
ECONOMETRICS OF DEFAULT RISK
Presentata da:
Arianna Agosto
Coordinatore Dottorato
Tutor
Prof. Angela Montanari
Prof. Giuseppe Cavaliere
Co-tutor
Prof. Anders Rahbek
Esame finale anno 2012/2013
Contents
1 Introduction to Default Risk
6
1.1 Default risk: de…nition and measurement . . . . . . . . . . . . . . . .
6
1.2 The Default Clustering . . . . . . . . . . . . . . . . . . . . . . . . . .
9
1.3 Motivation and overview . . . . . . . . . . . . . . . . . . . . . . . . .
11
2 Econometric modelling of Default Risk
2.1 Default prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.1
14
14
The role of rating . . . . . . . . . . . . . . . . . . . . . . . . .
17
2.2 Default correlation and Contagion . . . . . . . . . . . . . . . . . . . .
20
2.3 The study of default correlation through count models . . . . . . . .
22
2.3.1
Testing conditional independence of defaults . . . . . . . . . .
2.3.2
An Autoregressive Conditional Duration model of credit risk
23
contagion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
2.4 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
3 Econometric modelling of Count Time Series
29
3.1 Generalized Linear Models for time series . . . . . . . . . . . . . . . .
29
3.2 The Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
3.2.1
Model speci…cation . . . . . . . . . . . . . . . . . . . . . . . .
32
3.2.2
Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
34
3.2.3
Asymptotic theory . . . . . . . . . . . . . . . . . . . . . . . .
37
3.2.4
Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . .
38
3.2.5
Goodness of …t . . . . . . . . . . . . . . . . . . . . . . . . . .
39
3.2.6
Model selection . . . . . . . . . . . . . . . . . . . . . . . . . .
41
1
CONTENTS
2
3.3 The doubly-truncated Poisson model . . . . . . . . . . . . . . . . . .
41
3.4 The Zeger-Qaqish model . . . . . . . . . . . . . . . . . . . . . . . . .
42
3.5 Overdispersion and negative binomial regression . . . . . . . . . . . .
45
3.6 Poisson Autoregression . . . . . . . . . . . . . . . . . . . . . . . . . .
46
3.6.1
Model speci…cation . . . . . . . . . . . . . . . . . . . . . . . .
47
3.6.2
Ergodicity results . . . . . . . . . . . . . . . . . . . . . . . . .
48
3.6.3
Estimation of parameters . . . . . . . . . . . . . . . . . . . . .
50
3.6.4
Asymptotic theory . . . . . . . . . . . . . . . . . . . . . . . .
51
3.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53
4 A new Poisson Autoregressive model with covariates
54
4.1 Related literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
4.2 Speci…cation of PARX models . . . . . . . . . . . . . . . . . . . . . .
56
4.3 Time series properties . . . . . . . . . . . . . . . . . . . . . . . . . .
57
4.4 Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . .
61
4.5 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
4.6 Finite-sample simulations . . . . . . . . . . . . . . . . . . . . . . . .
64
4.6.1
Simulation design . . . . . . . . . . . . . . . . . . . . . . . . .
65
4.6.2
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67
4.7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
5 Empirical study of Corporate Default Counts
76
5.1 Overview of the approach . . . . . . . . . . . . . . . . . . . . . . . .
77
5.2 Corporate default counts data . . . . . . . . . . . . . . . . . . . . . .
77
5.3 Choice of the covariates . . . . . . . . . . . . . . . . . . . . . . . . .
82
5.3.1
Financial market variables . . . . . . . . . . . . . . . . . . . .
83
5.3.2
Production and macroeconomic indicators . . . . . . . . . . .
88
5.4 Poisson Autoregressive models for corporate default counts . . . . . .
90
5.4.1
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
92
5.4.2
Goodness of …t analysis
97
. . . . . . . . . . . . . . . . . . . . .
5.5 Out-of-sample prediction . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
CONTENTS
3
6 Conclusions
107
A Appendix
111
Bibliography
119
Abstract
This thesis is the result of a project aimed at the study of a crucial topic in …nance:
default risk, whose measurement and modelling have achieved increasing relevance
in recent years. We investigate the main issues related to the default phenomenon,
under both a methodological and empirical perspective. The topics of default predictability and correlation are treated with a constant attention to the modelling
solutions and reviewing critically the literature. From the methodological point of
view, our analysis results in the proposal of a new class of models, called Poisson
Autoregression with Exogenous Covariates (PARX). The PARX models, including
both autoregressive end exogenous components, are able to capture the dynamics of
default count time series, characterized by persistence of shocks and slowly decaying
autocorrelation.
Application of di¤erent PARX models to the monthly default counts of US industrial …rms in the period 1982-2011 allows an empirical insight of the defaults
dynamics and supports the identi…cation of the main default predictors at an aggregate level.
4
Acknowledgements
I am grateful to my supervisor Prof. Giuseppe Cavaliere for precious advice and for
all I learned from him.
I express my sincere gratitude to my co-tutor Prof. Anders Rahbek for supporting
my ideas and for the great experience in Copenhagen.
Thanks to all my research group for useful suggestions and comments. I am particularly grateful to Dr. Luca De Angelis for all his support.
I would like to thank Pablo Barbagallo from Moody’s Corporation.
A special thank to Lucia for all the moments we shared in our PhD experience.
I am grateful to Dr. Enrico Moretto, who believes in me more than I do.
Many thanks to my family for teaching me to never give up and, last but not least,
to Rocco for all his love and support.
5
Chapter 1
Introduction to Default Risk
This chapter explains how default risk can be de…ned and measured, motivating
the importance of deriving models for its analysis and prediction. After giving a
technical de…nition of the default event, we illustrate the main empirical evidences
in the corporate default phenomenon as well as two crucial topics related to their
interpretation - default predictability and correlation between corporate defaults.
The structure and the motivation of the thesis work is then presented and connected
to the economic and …nancial issues introduced.
1.1
Default risk: de…nition and measurement
Default risk is de…ned as the risk of loss from a counterparty failure to repay the owed
amount in terms of either principal or interests of a loan. Default is considered as the
most serious event related to credit risk, the last referring to the more comprehensive
case of a change in the current value of a credit exposure due to an expected variation
of the borrower solvency.
Banks and …nancial groups are highly involved in both corporate and retail default
risk and are required to adopt methodologies for quantifying such risk and thereby
determining the amount of capital necessary to support their business and to protect
themselves against volatility in the level of losses. The default risk management
is included in the Basel II regulation for the stability of the international banking
6
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
7
system and comprises both general economic capital requirements and internal rating
procedures. A key aspect in default risk management is the measurement of the
Probability of Default, i.e. the probability that, following the de…nition given by the
Bank of International Settlements, with regard to a particular obligor either or both
of the two events have taken place:
the bank considers that the obligor is unlikely to pay its credit obligations to
the banking group in full, without recourse by the bank to actions such as
realising securities (if held)
the obligor is past due more than 90 days on any material credit obligation to
the banking group.
There are two main approaches to default risk modelling: the structural and the
reduced form approach. The …rst considers default as an endogenously determined
event which can be predicted by the economic and …nancial conditions of the company, re‡ected in its balance sheet data and market value. Therefore, structural
models study the evolution of structural …rm variables such as the assets and debt
values in order to determine the probability and the timing of bankruptcy, thus explicitly relating default to the …rst time the assets fall below a certain level - the
default barrier - as an endogenously determined event. This approach was introduced by the seminal work of Merton (1974), which …rst relies on the option pricing
theory for deriving the probability that the assets fall below the outstanding value of
debt. Merton model is based on treating the assets of a …rm as a call option held by
the stockholders, whose price - the (known) market value - implies the probability of
default. This approach is then extended by abandoning some irrealistic assumptions,
such as the existence of a …xed default barrier given by the nominal total value of
debt. Black and Cox (1976) introduce a time-varying threshold de…ned as a fraction
of the nominal value of liabilities, as it is done by Leland (1994), which also considers
the …scal aspects of bakruptcy decision. Leland and Toft (1996) …rst evaluate the
e¤ects of the presence of coupons and of short-term debt roll-over. A recent development by Agosto and Moretto (2012) determines the curvature parameter of the
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
8
nonconstant default barrier by using …rm-speci…c balance sheet and market data.
Moody’s KMV, the proprietary model used by the rating agency Moody’s for determining the probability of default, is the most famous application of a structural
model and is based on the extension of the Merton model developed by Kealhofer,
McQuown and Vasicek in 1989.
In contrast to the structural approach, reduced-form models consider default
as an exogenously determined process and use immediately available market and
credit data - mainly forward rates, rating and price of the issued bonds - rather than
modelling the asset value dynamics. Jarrow and Turnbull (1995) and its development
Jarrow, Lando and Turnbull (1997), for example, de…ne a model which explicitly
incorporates credit rating information into debt instruments pricing and can also be
used for risk management purposes as it allows to derive the probabilities of solvency
implied by credit spreads. An important class of reduced-form models is that of the
so called intensity models. They consider the default time as the stochastic …rst
jump time of a count process - Poisson in many cases - whose intensity is a function
of latent or observable variables. Their link to probability of default modelling is
clear if one thinks that the limit of the intensity of a count process, for a time period
approximating zero, is the probability of observing one event. The popularity of
intensity models has increased in recent years, as they allow for many econometric
applications based on the estimation of default intensity through risk factors and
business failure predictors. This approach is followed, for example, by Du¢ e and
Singleton (1999) and Lando (1998) and, as we shall explain, can be e¤ectively used
for considering relevant aspects such as dependence between corporate defaults.
Looking at the empirical measures of default risk, the data typically used in risk
management and published in rating agencies and …nancial institutions reports are:
default rate: it is the most widely used measure of the default phenomenon
incidence, being de…ned as the number of defaulting companies in a certain
time period divided by the total number of debt issuers in the same period.
An alternative de…nition, that we do not consider here, is the value-weighted
default rate, which considers the incidence of defaults in terms of money loss;
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
9
default count: it is the number of failures in a certain time period (typically
a month). As we shall see, there are several reasons motivating the counting
approach to default risk modelling;
…rm-speci…c measures, such as distance-to-default: this is a volatility-adjusted
measure calculated and periodically published by Moody’s, resulting from the
application of the above mentioned KMV model. Following Crosbie and Bohn
(2002), it can be de…ned as “the number of asset value’s standard deviations
between the market asset value and the default point”.
Most of the works presented in the following are focused on default rates or
counts modelling and often use “ready-available”measures of …rm-speci…c risk such
as distance-to-default.
1.2
The Default Clustering
Looking at the corporate defaults phenomenon under an aggregate perspective, the
most relevant aspect is the strong empirical evidence that corporate defaults cluster
in time: both default rates and counts show very high peaks, followed by periods of
low incidence. This is clear from Figure 1.1, showing the time series of US default
rates and counts among Moody’s rated industrial …rms from 1982 to 2011.
The potentially strong impact of the default clusters on the investors and …nancial
institutions risk has increased the interest of the …nancial and econometric literature
in the two main issues related to the presence of default peaks: default predictability
and default correlation.
First, a central objective in risk management is …nding macroeconomic variables
and …nancial indicators that are able to predict the peaks in the number of defaults, in
support of …nancial vigilance and central banks decisions. There are indeed many empirical studies analyzing the strong time variation of default frequencies and linking
it to macroeconomic variables and business cycle indicators. This is done, amongst
others, by Shumway (2001) and Du¢ e et al. (2007).
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
10
Figure 1.1: (a) Monthly default count of Moody’s rated industrial …rms from January
1982 to December 2011. (b) Monthly default rate of Moody’s rated industrial …rms
from January 1982 to December 2011.
The interpretation of default clustering is also connected to the issue of correlation, as a high number of defaults in a short period could also be caused by
commercial and …nancial links between the companies. The study of correlation
between corporate defaults is an essential tool of credit risk management at portfolio level and its importance has increased in recent years for several reasons. First,
banks minimum capital requirements in the Basel II approach are function, among
the other things, of the borrowers joint default probability, measured by asset correlation. Second, there has been a large growth of …nancial instruments like Collateralized Debt Obligations, whose cash ‡ows depend explicitly on default frequency at
portfolio level. Furthermore, the evaluation of default probability at the level of an
individual security is not able to give an adequate explanation of credit risk spreads,
whose dynamics are in‡uenced by commonality in corporate solvency.
The default clustering phenomenon has given rise to a debate about its possible explanation. An important question is whether cross-…rm default correlation
associated with observable macroeconomic and …nancial factors a¤ecting corporate
solvency is su¢ cient to explain the observed degree of default clustering or it is
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
11
possible to document contagion e¤ects by which one …rm’s default increases the likelihood of other …rms defaulting. The “cascade” e¤ect which seems to be generated
by defaults could spread by means of contractual relationships (customer-supplier or
borrower-creditor, for example) or through an “informational” channel, that means
a change in the agents expectations of corporate solvency. An increased uncertainty
on the credit market leading to a worsening in funding conditions, like credit crunch
or higher interest rates, can indeed in‡uence the risk perception.
Furthermore,
the default clusters could be linked to the systematic (aggregate) risk generated by
common macroeconomic and …nancial risk factors a¤ecting …rm solvency: this case
is usually excluded from the most strict de…nition of contagion, that refers instead
to between-…rms e¤ects on default timing. The works we are going to present in
the following chapter are related to default prediction and correlation, investigated
through models for aggregate or …rm-speci…c data on default events.
1.3
Motivation and overview
The aim of this work is to study how default risk can be measured and modelled.
We contribute to the existing literature by de…ning, studying and applying a count
time series model for the number of corporate defaults, providing a good in- and
out-of-sample forecasting of default counts in an extended group of debt issuers.
Our model speci…cation results from the analysis of the stylized facts of corporate
default count time series presented in this chapter. First of all, as it often happens for
rare events, the default phenomenon is characterized by overdispersion: the variance
of the number of events is much higher than its mean, leading to series showing both
peaks (“clusters”) and periods of low incidence. Moreover, the default count time
series are characterized by a slowly decreasing autocorrelation function, which is a
typical feature of long-memory processes.
We start, in Chapter 2, with a review of the main econometric and …nancial
models for default risk, with a …nal focus on intensity models applied to count time
series of corporate defaults.
We then present, in Chapter 3, the main models for count data used in econo-
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
12
metrics, which rely on the theory of Generalized Linear Models. For several reasons
related to the empirical evidences in corporate default count time series, we focus
on conditional Poisson models, taking Poisson Autoregression by Fokianos, Rahbek
and Tjøstheim (2009) as our main reference. This model (reviewed in Section 3.6)
is based on the de…nition of the count process as a sequence of Poisson drawings
which are independent conditional on the past count history. The time-varying intensity (i.e. the expected number of events at time t) is speci…ed as a linear function
of lagged counts and intensities. This approach shares some similarities with the
Generalized Autoregressive Conditional Heteroskedasticity (GARCH) approach for
volatility (Bollerslev, 1986). The idea - which can be considered as the …rst part of our
contribution - is that of modelling default clustering in a similar way to the models
for volatility clustering, through an autoregressive model which also gives a measure
of “persistence” of the series. The dependence of the process (i.e. the number of
defaults, in our case) on its past history can indeed explain its long memory and allows to study it under the perspective of shocks persistence. Poisson Autoregression
- di¤erently from the traditional Poisson model - also allows for overdispersion.
The consideration that the expected number of defaults is probably in‡uenced by
the macroeconomic and …nancial context in which corporate …rms operate has led us
to the idea of extending Poisson Autoregression by including exogenous covariates.
Thus, in Chapter 4, we present our methodological contribution, developing a class
of Poisson intensity AutoRegressions with eXogeneous covariates (PARX) models
that can be used for modelling and forecasting time series of counts. We analyze the
time series properties and the conditions for stationarity and develop the asymptotic
theory for this new model. This way we provide a ‡exible framework for analyzing
dependence of default intensity on both the past number of default events and other
relevant …nancial variables. It is also interesting to consider the impact of including
a lagged covariate process on the estimated persistence.
In Chapter 5, we present an extended empirical study of US corporate defaults,
based on the application of alternative PARX models. We consider the monthly
default counts of US Moody’s rated corporate …rms: the rating agency Moody’s
provides monthly and annual reports showing default rates and counts and also
CHAPTER 1. INTRODUCTION TO DEFAULT RISK
13
o¤ers some instruments for looking more analitically through the data. One of these
services is the Credit Risk Calculator, which allows to create customized reports and
get data on defaults and rating transitions for speci…c sectors in a given geographical
area. We use a dataset which covers the period from January 1982 to December
2011 and consists in the monthly default counts of US Moody’s rated corporate
…rms classi…ed as “broad industrial”, that means it excludes banking, …nancial and
insurance companies as well as public utility and transportation activities. As we
will see in the review part, the use of data on industrial …rms is common in the
corporate defaults analyses. We consider the impact on default intensity of several
covariate processes, such as business cycle indicators, production indexes and rating
downgrades. For analyzing the link between the …nancial and the credit market
we also include a measure of realized volatility of returns. Realized volatility is
expected to summarize the level of uncertainty during periods of …nancial turmoil
when corporate defaults are more likely to cluster and we show that it is signi…cantly
and positively associated with the number of defaults.
Chapter 2
Econometric modelling of Default
Risk
The two main issues related to the corporate default phenomenon - default predictability and correlation - are now analyzed through an overview of the existing
…nancial and econometric literature of credit risk modelling, with a special focus on
the models for default intensity, de…ned as the expected number of bankruptcies in
a given period. These models often include macroeconomic and …nancial explanatory variables, in the aim of …nding both common and …rm-speci…c risk factors for
solvency and default predictors. Furthermore, the count modelling framework allows
extensions easing the analysis of dependence between default events.
2.1
Default prediction
The most obvious default predictor for a single …rm is represented by its business
and …nancial conditions, which can be summarized by balance sheet data such as
leverage and net pro…t measures. This approach is natural in the above mentioned
structural models, which are based on the study of the …rm’s asset evolution, but
also characterizes a variety of statistical methods for credit risk measurement, such as
credit scoring. It is due, for example, to Altman (1968) the development of a multiple
discriminant statistical methodology applied to bankruptcy prediction through a
14
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
15
set of …nancial and economic ratios which are shown to successfully discriminate
between failing and nonfailing …rms. The discriminant function includes variables
such as working capital on total assets ratio, market on book value ratio and the
sales amount. It is clear that this represents a microeconomic approach which seems
not to be suitable when analyzing the default likelihood of large dimension or listed
companies, which are expected to be more involved with the overall …nancial and
macroeconomic scenarios.
Recently, there is a growing interest in the speci…cation of models explaining the
number or the frequency of corporate defaults with a set of exogenous covariates. An
example can be found in Giesecke et al. (2011). They focus on modelling the default
rate, which is one of the most used measure of the default phenomenon incidence,
being de…ned as the number of defaulting companies in a certain time period divided
by the total number of debt issuers in the same period, and periodically published in
rating agencies reports. Their empirical analysis considers a large dataset of monthly
default rates of US industrial …rms, spanning the 1866-2008 period, and is based on
the application of a regime-switching model, in the aim of examining the extent to
which default rates can be predicted by …nancial and macroeconomic variables. The
econometric speci…cation is the following:
Dt =
where X t
1
t
+
k Xk;t 1
+ "t ,
"t
i:i:d:N (0;
2
(2.1)
)
is a k-vector of exogenous explanatory variables and the
k
terms are
the corresponding slope coe¢ cients. The intercept term follows a three-state Markov
chain taking values
1,
2
and
3
- corresponding to “low”, “medium” and “high”
default regime respectively - and the
ij
probability of transition from state i to state
j is the (i; j)-th entry of a transition matrix. Following Hamilton (2005), the model
is estimated by a maximum likelihood algorithm based on the recursive updating of
the probability
i;t
of being in state i at time t, the recursion expression being:
i;t
P3
= P3
i=1
i=1
P3
ij i;t 1 jt
j=1
ij i;t 1 jt
(2.2)
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
with conditional likelihood function
0
jt
=
1
2
2
B
exp @
jt
16
given by
Dt
jt
PN
k=1
2
2
2
k Xk;t 1
1
C
A
(2.3)
Among the regressors the authors include both business cycle variables, such as
GDP and Industrial Production (IP) growth, and …nancial covariates (stock returns,
change in returns volatility and change in credit spread), as well as the lagged default
rate itself. Several covariates, like the change in returns volatility and returns themselves, turn out to be signi…cant in explaining default rates dynamics, while others,
such as the growth in Industrial Production and the change in credit spreads, have a
low explanatory power. An interesting point - which seems to be not deeply investigated in the paper - is the high value of the lagged default rate coe¢ cient, highlighting
the relevance of the autoregressive components in default rate evolution. The maximum likelihood estimate of the time-varying intercept
goes from a minimum of
0:007 in the “low” regime to a value of 0:111 under the worst scenario, so it is in
general quite low. The “Dot-Com bubble”of 2001-2002, for instance, corresponds to
a high default regime, although its severity is not comparable to other crisis periods
such as the Great Depression. Other empirical studies which try to …nd a connection between the business cycle and the default rates are, amongst others, Kavvathas
(2001) and Koopman and Lucas (2005).
A missing element in this kind of approach is the absence of …rm-speci…c variables, which are instead present in other, even previous, works, like Du¢ e et al.
(2007). This article provides maximum likelihood estimators of multi-period conditional probabilities of corporate default incorporating the dynamics of both …rmspeci…c and macroeconomic variables. The empirical analysis is again based on a
dataset of defaults among Moody’s rated US industrial …rms. With regard to the
modelling framework, a Cox regression model for counting processes is used: this
approach is shared by some of the works related to the analysis of default correlation
presented in Section 2.2, so it will be described in detail later. The individual …rm
covariates considered in Du¢ e et al. (2007) are the previously de…ned distance to
default and the …rm trailing stock return, while the overall regressors are the trailing
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
17
S&P 500 returns and the three-month Treasury bill rate. It is quite surprising - and
also recalls the results of Giesecke et al. (2011) - the lack of signi…cance of other
variables, such as credit spreads and GDP growth, that are instead expected to be
relevant in default prediction.
2.1.1
The role of rating
When talking about default predictability, an analysis of the role of credit rating
information cannot be avoided. Rating is, indeed, the main result of the evaluation
of a company solvency, made by specialized agencies. The rating information is
synthetical and categorical, two features that summarize the potential advantage
of this kind of evaluation and explain the wide use of rating in support of pricing
and investment decisions. Furthermore, rating agencies methodologies should rely
on statistical and econometric models, thus giving a quantitative judgement which is
reasonably thought to be objective. However, in the last years some well-known cases
like that of Lehman Brothers, whose collapse was not preceeded by any “in time”
rating downgrade - Standard & Poor’s maintained the investment-grade rating of
“A”and Moody’s downgraded Lehman only one business day before the bankruptcy
announcement - has given rise to a burning debate about the possible mistakes in
rating evaluation and whether other aspects than a rational and documented quantitative analysis in‡uence the action of rating agencies. Beyond the often unproductive
and simplistic discussions trying to mark rating as “good” or “bad”, the question
arising in a proper econometric analysis is whether the current rating of a …rm is
a good predictor of its default probability. There is a double link between rating
and the probability of default (henceforth PD). First of all, “default” is one of the
classes characterizing the rating scale: class “D”is present in the classi…cation used
by all the main rating agencies, such as Fitch, Moody’s and Standard & Poor’s.
In the long-term rating assignment, the companies in the “default class” are those
that have already failed to repay all or some of their obligations, even in case bankruptcy has not been o¢ cially declared yet; in the short-term rating scale, class “D”
corresponds to an e¤ective state of insolvency. Secondly, rating agencies periodical
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
18
material establishes a correspondance between rating classes and PD, based on historical default rates of …rms with di¤erent rating scores. As an example, we brie‡y
describe the Moody’s approach to rating attribution: the output of its proprietary
(KMV) model - based on the application of Merton’s option pricing formulas in order
to derive the market value of assets and its volatility from the market value of equity
(…rm stocks) - is the so-called Expected Default Frequency (EDF). Figure 2.1 gives
a graphical representation of EDF, as the probability that the …rm assets fall below
a certain threshold over a given time horizon, typically one year or more, based on
the hypothesis of log-normal dynamics of the asset value which is typical of Black
and Scholes modelling framework.
Figure 2.1 Illustration of EDF determined by Moody’s KMV. Source: Moody’s.
To each interval of EDF, Moody’s associates a class of what the agency itself de…nes
as implied rating and declares to be a relevant component of the overall rating,
the latter also including qualitative and discretionary considerations. Thus, implied
rating represents the link between rating and PD.
The econometric analysis of rating is mainly based on the modelling of rating
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
19
history, that is the changes in a …rm rating. This is also motivated by the fact that
a kind of information widely used in the risk management of …nancial institutions is
given by the rating transition matrices, both historical and forecasted. The general
framework of the models for rating, chacterizing, among the others, Jarrow, Lando
and Turnbull (1997), is the following. A Markov chain is de…ned on a …nite space of
states:
S = f1; 2; :::; kg
(2.4)
Each state corresponds to a di¤erent rating class, so that the k-th state is the default
category, hence we may write, following, as an example, Moody’s classi…cation,
S = fAAA; AA; :::; Dg
It is assumed that the Markovian process describing rating evolution is homogenous,
i.e. its transition matrix does not change in time. The transition matrix Q for (2.4)
is de…ned as follows:
0
q1;1
q1;2
:::
B
B q
q2;2
B 2;1
B .
..
.
Q=B
.
B .
B
B qk 1;1 qk 1;2 ::: qk 1;k
@
0
0
:::
q1;k
q2;k
..
.
1
qk
1;k
1
1
C
C
C
C
C
C
C
C
A
where the generic entry qi;j is the probability that a company belonging to rating
class i in t will have rating j in t + 1. It is trivial that the following must be veri…ed
for i = 1; :::; k :
qi;j
0
qi;i = 1
Xk
j=1
qi;j
for i = 1; :::; k
j6=i
Note that the last row corresponds to the obvious assumption that default is an
absorbing state, i.e. it is not possible to move from state k to another. The assumed
homogeneity implies that matrix, say, Q(t; T ), containing the probabilities q(t; T ) of
being in state i at time t and in j at T , is obtained by simply multiplying it for itself:
Q(t; T ) = QT
t
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
20
The transition probabilities are, in general, given by historical data on average rating
changes rates. Another possibility is that of deriving “risk-neutral”transition probabilities by multiplying Q by a matrix containing credit risk premiums estimated
from empirical credit spreads.
In this framework, the PD by time T calculated in t is de…ned as
P D(t; T ) = 1
qi;k (t; T )
(2.5)
This approch is simple but operationally appealing. Lando and Skødeberg (2002)
revisit it by introducing a corrected transition matrix that takes into account the
rating changes occured between t and T , as ignoring them can lead to underestimate
the probability of downgrade. A more complex intensity-based model for rating
transitions has been, instead, proposed by Koopman et al. (2008).
With regard to the investigation of the predictive power of rating information
through empirical analyses, a common strategy used in econometric works is that
of analyzing how much the current rating of a …rm really incorporates the stage of
the business cycle and the risk pro…le of its sector, by studying dependence of the
published rating transition probabilities on a set of indicators. Nickell et al. (2000)
…nd that business cycle e¤ects have a strong impact on rating especially for lowgrade issuers, while Behar and Nagpal (2001) argue that the current rating of a …rm
seems not to incorporate much of the in‡uence of the macroeconomic context on the
default rates.
2.2
Default correlation and Contagion
When modelling the rate or the number of defaults, one of the main objective is …nding macroeconomic variables and …nancial indicators able to predict the peaks in the
number of defaults, in support of …nancial vigilance and central banks decisions. Another crucial topic a great part of the literature focuses on is default correlation: are
corporate defaults independent rare events or are there connections between them?
First, there are several works supporting the hypothesis of default correlation with
empirical analyses. For example, Das et al. (2006) document default correlation -
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
21
derived as correlation between individual default probabilities in an intensity-based
setting - in various economic sectors and emphasize that correlation e¤ects are timevarying. They further claim that it is possible to distinguish between two “default
regimes”: a high regime characterized by a higher correlation and a low regime in
which correlation is modest. Another important aspect is the already mentioned
possibility of contagion e¤ects by which one …rm’s default directly increases the
likelihood of other …rms defaulting, generating the “default cascade” e¤ect which
seems to characterize the crisis periods. Some examples of contagion models include
Davis and Lo (2001), Jarrow and Yu (2001) and Azizpour and Giesecke (2008a).
These models share the assumption that the default event of one …rm directly triggers the default of other …rms or causes their default probabilities to increase. A
missing element in this kind of modeling is testing the hypothesis of conditional independence between default events, which are probably subject to a common source
of randomness due to the mutual exposure to common risk factors. The test of
the doubly stochastic assumption, i.e. the assumption that defaults are independent
after conditioning on common factors, has been introduced in two recent works about
contagion, Das et al. (2007) and Lando and Nielsen (2010), the latter reviewed in
the following. Both examine whether default events in an intensity-based setting can
be considered conditionally independent testing whether bankruptcy count behaves
as a standard Poisson process. This means to verify in an intensity-based setting the
doubly stochastic assumption, under which the default events are dependent only on
exogenous variables.
A distinct class of models for contagion is that of the so-called frailty models.
They aim at individuating latent (unobservable) factors acting as an additional channel for the spread of defaults. As stated in Azizpour and Giesecke (2008b), in frailty
models default clustering is indeed explained by three kinds of factors:
observable common factors: changes and shocks in the macroeconomic and
…nancial context;
frailty factors: unobservable common factors a¤ecting corporate solvency;
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
22
contagion: the direct negative impact that the default event has on other
companies. This can be due to contractual relationships linking …rms to each
other, but also to the “informational” aspect, as bankruptcy announcements
increase the market uncertainty and cause a decrease in the value of stocks
portfolio of both industrial and banking …rms, with important consequences
on credit supply and companies …nancial conditions. The e¤ects of default
announcements are also treated in Lang and Stulz (1992).
In this class of models, including, among others, Du¢ e et al. (2009), Azizpour
et al. (2010) and Koopman et al. (2011), both frailty and contagion e¤ects are analyzed with self-exciting point processes. These are characterized by the speci…cation
of the conditional instantaneous default intensity of a counting process, that is of
the in…tesimal rate at which events are expected to occur around a certain time,
allowing for dependence on the timing of previous events. The major reference for
this approach is the self-exciting process de…ned by Hawkes (1971).
A di¤erent speci…cation of conditional default intensity can be found in Focardi
and Fabozzi (2005) and Chou (2012): both use the Autoregressive Conditional Duration (ACD) model introduced by Engle and Russell (1998). In the ACD model, the
expectation of the duration, i.e. of the interval between two arrival times, conditional on the past is …rst speci…ed and the conditional intensity is expressed as the
product of a baseline hazard rate - as in the tradition of proportional hazard models
for survival data - and a function of the expected duration.
2.3
The study of default correlation through count
models
The economic and …nancial relevance of the default phenomenon, showing peaks of
incidence like the sharp one in the crisis period of 2008-2010, has led to an increasing
interest in modelling and forecasting time series of corporate default counts. Modeling time series of counts rather than the default rate is quite common and justi…ed
by the fact that the default rate denominator - the total number of borrowers in a
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
23
certain economic sector or rating class - is usually known by the risk managers in
a certain advance. It is also possible to note (see Figure 1.1, for instance) that the
time series of default counts and default rates share a very similar trend.
2.3.1
Testing conditional independence of defaults
According to the doubly stochastic assumption, default events depend uniquely on
exogenous variables, that means they are independent conditionally on common macroeconomic and …nancial factors. A method for testing this assumption is developed
by Lando and Nielsen (2010), revisiting the method of time change test already used
by Das et al. (2007), though reaching di¤erent results.
In Lando and Nielsen (2010), the default time of a …rm is modelled through
its stochastic default intensity. If the …rm is alive at time t, then the conditional
intensity at time t, i.e. the conditional mean default arrival rate for …rm i satis…es
it
where
i
= lim
P (t <
i
t+
t
t!0
tj
i
t; Ft )
(2.6)
is the default time for …rm i. That means the probability of default within
a small time period
t after t is close to
it
t, where
it
depends on information
available at time t as represented by Ft .
The individual …rm default intensity is then speci…ed through a Cox regression:
it
= Rit exp(
0
W Wt
+
0
X Xit )
(2.7)
where Wt is the vector of covariates that are common to all companies whereas Xit
contains …rm-speci…c variables and Rit is a dummy variable which assumes value
1 if …rm i is alive and observable at time t, zero otherwise. The crucial point is
to determine the …rm-speci…c and macroeconomic variables which are signi…cant
explanatory variables in the regression of default intensity.
The Cox regression model was introduced by Cox (1972) in a survival data setting
and then extended to the general counting process framework by Andersen and Gill
(1982). This approach arises from the Cox proportional hazard model, which is a
semi-parametric proportional hazard model making no assumptions about the shape
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
24
of the baseline hazard function h(t) in the de…nition of the conditional intensity. The
latter is in general expressed as:
h (tj X) = h(t) exp(
1 X1
+ ::: +
p Xp )
The theory of Cox regression provides the partial log-likelihood to be maximized by
standard techniques in order to draw inference about the parameters vector
(
W;
=
X ):
l( ) =
n Z
X
i=1
T
0
W Wt
(
+
0
X Xit ) dNi (t)
0
n Z T
X
i=1
Rit exp(
0
W Wt
+
0
0
X Xit )1( i >t) dt
(2.8)
where Ni (t) is the one-jump process which jumps to 1 if …rm i defaults at time t, n
is the total numbers of …rms and T is the terminal time point of the estimation.
The cumulative number of defaults among n …rms is then de…ned as:
N (t) =
n
X
1(
i
t)
i=1
The objective is to verify the assumption of orthogonality, i.e. that there are
never exact simultaneous defaults. Under this assumption, the aggregate default
intensity is the sum of the individual ones:
(t) =
n
X
i (t)1(
i
t)
i=1
In order to execute the test, the cumulative default process has to be “timescaled”, that means the scale of time is replaced by the scale of intensity. This is
done by de…ning the compensator
(t) =
Z
t
(s)ds
0
that allows to write the time-changed process as
J(t) = N (
1
(t))
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
25
It is possible to show that J(t) is a unit-rate Poisson process with jump times
Vi =
(
(i) )
where 0
(1)
(2)
::: are the ordered default times. As a
consequence, the interarrival times V1 , V2
V1 ,... are independent exponentially
distributed variables and, for any c > 0, the jump times
Zj =
n
X
1]c(j
1);cj] Vi
i=1
are independent Poisson variables of c intensity.
Testing orthogonality of defaults means splitting up the entire time period into
intervals in which the cumulative integrated default intensity
increases by an in-
teger c and verifying, by using several test statistics, if the default counts in each
interval are independent and Poisson distributed with mean c. Note that the tested
property is the independence of defaults conditional to observable common factors,
with the aim of detecting an excess default clustering that is conceivable with the
existence of contagion e¤ects.
The data used by the authors are the monthly number of Moody’s rated US
corporate …rms’defaults occured between 1982 and 2005.
With regard to covariates, Wt vector contains the following selection of macroeconomic variables:
1-year return on the S&P index
3-month US Treasury bill rate
1-year percentage change in the US industrial production, calculated from
monthly data
spread between the 10-year and the 1-year Treasury rate
while the …rm-speci…c covariates entering vector Xit are:
1-year equity return
1-year Moody’s distance-to-default
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
26
quick ratio, calculated as the sum of cash, short-term investments and total
receivables divided by the current liabilities
log book asset value.
The results obtained in the paper by applying the time-change method and then
using several test statistics - like the Fisher dispersion and the upper tail statistics in order to test the Poisson assumption, lead to accept the hypothesis that default
times are conditionally independent, that was rejected in Das et al. (2007). The
authors claim that this is due to the use of a di¤erent set of explanatory variables
and so that the contagion e¤ects apparently revealed by the previous analysis are
instead explained by missing covariates. They also argue that the time-change test is
actually a misspeci…cation test, as the hypothesis of correct intensity speci…cation is
satis…ed by construction and that, furthermore, the doubly stochastic assumption is
not needed for having orthogonality of default times. They …nd indeed no evidence
of contagion by considering a di¤erent speci…cation, that is the Hawkes self-exciting
process
it
= Rit exp(
0
W Wt
+
0
X Xit )
+
Z
t
(
0
+
1 Ys ) exp(
2 (t
s)dNs + )
0
where Ys is the log book asset value of the …rm defaulting at time s. Model (2.13)
explicitly includes a contagion e¤ect through an a¢ ne function of Y so that larger
…rms’bankrupt has a higher impact on the individual default intensities. The exponential function makes the default impact decay exponentially with time, with
2
measuring the time horizon of in‡uence of a default on the overall intensity. Estimation can be carried out by partial maximum likelihood standard instruments (see,
for example, Andersen et al., 1992).
In a recent extension of Lando and Nielsen (2010), Lando et al. (2013) replace the
Cox multiplicative model with an additive default intensity, based on Aalen (1989)
regression model, where the covariate e¤ects act in an additive way on a baseline
intensity. The authors claim that the advantage of this model is allowing for the
introduction of time-varying e¤ects without the need for estimation procedures more
complex than the least squares methods. The focus moves from the test of the
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
27
conditional independence hypothesis characterizing the previous paper to the search
for predictive variables acting on default intensity with nonconstant magnitude. The
results are partly di¤erent from those reached by the previous analysis: the timevarying e¤ects of …rm-speci…c variables like distance-to-default and short-to-long
term debt are found signi…cant, but none of the macroeconomic covariates - many of
which already successfully employed in Lando and Nielsen (2010) - are. A problem
in the interpretation of results is that some of the coe¢ cients are negative, thus
leading to negative default intensities, which is a nonsense from a technical point of
view. With regard to this aspect, the authors claim that default intensity should be
interpreted as a risk measure rather than an expected rate and that negative values
could indicate that a …rm is weakly involved in the risk of failure.
2.3.2
An Autoregressive Conditional Duration model of credit
risk contagion
The use of self-exciting processes for representing the cascading phenomenon of bankruptcies was already present in another previous work, through a di¤erent speci…cation. Focardi and Fabozzi (2005) propose indeed a self-exciting point process. The
model belongs to the autoregressive conditional duration (ACD) family introduced
by Engle and Russell (1998) and is based on the idea of modelling default clustering
with econometric techniques that are the point process analogue of ARCH-GARCH
models. Applying the ACD speci…cation to the number of defaults, the default process in a time interval (0; t) is de…ned as a sequence of default times ti ; i = 1; 2; :::;
with the related durations between defaults
ti = (ti +1 ti ). The model is speci…ed
in terms of the conditional densities of the durations, de…ning
E [ ti j
ti 1 ; :::; t1 ] =
[ ti 1 ; :::; t1 ; ] =
i
(2.9)
and
ti =
where "t are i.i.d. variables and
i "t
(2.10)
is a parameter.
It is then assumed that the expectation of the present duration is linearly determined by the last m durations between defaults and the last q expectations of
CHAPTER 2. ECONOMETRIC MODELLING OF DEFAULT RISK
durations:
i
=!+
m
X
j
ti
j=0
j
+
q
X
j
ti
j
28
(2.11)
j=0
This model is called an ACD(m; q) model.
The authors apply ACD models to simulated data of default durations in order to
evaluate the impact of di¤erent expected durations on the value of a credit portfolio.
2.4
Concluding remarks
We have investigated how the econometric and …nancial literature has faced the modelling of default risk and the interpretation of the relative empirical results under the
perspective of default predictability and correlation, also clarifying the origin and the
issues of the current debate about contagion. The search for explanatory variables in
the default rates and counts evolution has led to not always obvious results, because,
for example, the link with business cycle indicators and macroeconomic variables
does not appear so strong. We have also considered the discussion on the predictive
power of rating and described some common approaches to the modelling of rating
transitions. We have progressively focused on models which consider count processes for investigating the corporate defaults dynamics. Many of these models aim
at analyzing default correlation. With regard to this topic, we claim that the idea of
distinguishing between common factors and contagion, thus separating the systematic risk from other risk components, is worth being further investigated. An aspect
which seems somewhat missing in the literature yet is that of the autoregressive
components in the defaults dynamics, which could lead to interesting considerations
about the persistence in the default phenomenon. It is, indeed, present in Focardi
and Fabozzi (2005), but without considering the role of covariate processes, so giving
a limited de…nition of contagion which does not take into account crucial aspects of
credit and …nancial risk, and without presenting any application to real data. Our
approach to default risk modelling, which we will present in Chapter 4, considers
indeed both exogenous variables and autoregressive components and is applied to an
empirical corporate default count time series in Chapter 5.
Chapter 3
Econometric modelling of Count
Time Series
This chapter presents the main models for count time series. They are based on
the theory of Generalized Linear Models for time series, that is reviewed in the
…rst section. The aim of the next sections is to make a critical review, focused on
the suitability of the presented models to explain some features commonly found in
empirical count time series, such as overdispersion in the data. This is instrumental
to the following of our work, which proposes a modelling framework for default count
data, based on the extension of the Poisson autoregressive model introduced in the
last section.
3.1
Generalized Linear Models for time series
It is well known that generalized linear models (GLM), introduced by Nelder and
Wedderburn (1972), allow to extend ordinary linear regression to nonnormal data.
Applying the theory of GLM to time series makes thus possible to handle very
common processes like binary and count data, which are not normally distributed.
Before presenting the most important applications of GLM to the modelling of
count data, it is important to present the concept of partial likelihood, introduced by
Cox (1975). Partial likelihood is an useful tool when the observations are depend-
29
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
30
ent and the covariates representing auxiliary information are also random and time
dependent. In these situations the likelihood function is not readily available as the
nonindependence prevents from deriving a simple factorization.
Consider a generic response time series fyt g, t = 1; :::; T . If no other assumption
is added, the joint density f (y1 ; :::; yT ), parametrized by vector , is de…ned as
f (y1 ; :::yT ) = f (y1 )
T
Y
t=2
f (yt j y1 ; y2 ; :::; yt 1 )
(3.1)
where the main di¢ culty is that, if no other assumption is made, the size of
increases as the series size T does. A more tractable likelihood function can be
obtained by introducing limitations in conditional dependence such as Markovianity,
according to which we could use, for example, the following factorization:
f (y1 ; :::yT ) = f (y1 )
T
Y
t=2
where inference regarding
f (yt j yt 1 )
(3.2)
can be based only on the product term, as the …rst factor
is not dependent on T .
Then, consider the case where the response variable is observed jointly with
some time-dependent random covariate Xt . Then the joint density of the X and Y
observations can be written, using conditional probabilities, as:
"T
#" T
#
Y
Y
f (x1 ; y1 ; :::; xT ; yT ) = f (y1 )
f (xt j dt )
f (yt jct )
t=2
(3.3)
t=2
where dt = (y1 ; x1 ; :::; yt 1 ; xt 1 ) and ct = (y1 ; x1 ; :::; yt 1 ; xt 1 ; xt ). The idea of Cox is
to take into account only the second product of the right hand side of (3.3), which is a
“partial”likelihood in the sense that it does not consider the conditional distribution
of the covariate process Xt . Moreover, it does not specify the full joint distribution
of the response and the covariate. Cox (1975) shows that the second product term in
(3.3) can be used for inference, although it ignores a part of the information about
.
The general de…nition of the partial likelihood (PL) relative to , Ft
1
and the
observations Y1 ; :::; YT applies this idea joint with that of limited conditional dependence mentioned above. Considering only what is known to the observer up to the
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
31
present time allows for sequential conditional inference:
PL( ;y1 ; :::; yT ) =
T
Y
t=1
where Ft
1
f (yt ; t jFt 1 ) =
T
Y
f (yt ; )
(3.4)
t=1
is the …ltration generated by all is known to the observer by t and pos-
sibly including the information given by a random covariate process. Note that this
de…nition simpli…es to ordinary likelihood when there is no auxiliary information and
the data are independent, while it becomes a conditional likelihood when a deterministic - i.e. known throughout the period of observation - covariate process is included.
This formulation enables conditional inference for nonMarkovian processes where the
response depends on autoregressive components and past values of covariates, as it
does not require the full knowledge of the joint distribution of the response and the
covariates.
The vector
maximizing equation (3.4) is called the maximum partial likelihood
estimator (MPLE) and its theoretical properties have been studied by Wong (1986).
We now show how the theory of GLM and partial likelihood can be applied to
time series (see Kedem and Fokianos, 2002 for a complete review).
Consider again the response series fyt g, t = 1; :::; T and include a p-dimensional
vector of explanatory variables xt = (xt;1 ; :::; xt;p )0 . Then denote the -…eld generated
by yt 1 ; yt 2 ; :::; xt 1 ; xt 2 ; ::: as
Ft
1
=
fyt 1 ; yt 2 ; :::; xt 1 ; xt 2 ; :::g
where is often convenient to de…ne Zt = (yt ; xt )0 which contains both the past values
of the response and a set of covariates:
Ft
1
=
fZt 1 ; Zt 2 ; :::g
The main feature of GLM for time series is the de…nition of the conditional expectation of yt given the past of the process Zt :
t
= E [yt j Ft 1 ]
(3.5)
It is worth to note that de…ning the expected value of yt as a linear function of the
covariates can lead to senseless results when the data are not normal. For instance,
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
linear regression of
t
32
on the covariates may lead to negative estimates of intensity
when the response is Poisson distributed.
The GLM approach to time series can be stated in two steps:
1. Random component: the conditional distribution of the response given the
past belongs to the exponential family of distributions, that is
f (yt ;
where
t
j Ft 1 ) = exp fyt
t
(3.6)
+ b( t ) + c(yt )g
is the natural (or canonical) parameter of the distribution.
Q
Q
By setting Tt=1 f (yt ; t j Ft 1 ) = Tt=1 f (yt ; t ), the latter product de…nes
t
a partial likelihood in the sense of Cox (1975), as it is a nested sequence of
conditioning history, not requiring the knowledge of the full likelihood.
2. Systematic component: there exists a monotone function g( ) such that
g( t ) =
t
=
p
X
j Z(t 1)j
= Z0t
(3.7)
1
j=1
where we call g( ) the link function, while we refer to
of the model, and
t
as the linear predictor
is a vector of coe¢ cients. It is quite common to include
also Xt , i.e. the present value of x, in the covariate vector, if it is already
known in t
1. It can happen, for instance, when x is a deterministic process
or when yt is a delayed output. We refer, then, to g 1 ( ) as the inverse link
function.
3.2
3.2.1
The Poisson Model
Model speci…cation
When handling count data, a natural candidate is the Poisson distribution. If we
assume that the conditional density of the response given the past, i.e. the available
information up to time t, is that of a Poisson variable with mean
f (yt ;
t
j Ft 1 ) =
yt
t) t
exp(
yt !
; t = 1; :::; T
t,
we get
(3.8)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
33
In the Poisson model, the conditional expectation of the response is equal to its
conditional variance:
E [yt j Ft 1 ] = V ar [yt j Ft 1 ] =
(3.9)
t
Then we denote by fZt 1 g ; t = 1; :::; T a p-dimensional vector of covariates which
may include past values of the response and other auxiliary information. A typical
choice for Zt
1
is
Zt
1
= (1; yt 1 ; xt )0
but it is also possible to consider interactions between the processes by de…ning, for
instance, Zt
1
= (1; yt 1 ; xt ; yt 1 xt )0 .
Following the theory of of GLM and recalling (3.7), a suitable model is obtained
by setting
t
=
t
and
g( t ) =
where
t
= Z0t
(3.10)
t = 1; :::; T
1
is a p-dimensional vector of unknown parameters.
The most common model is that using the canonical link function, which is
derived from the canonical form of the Poisson conditional density:
f (yt ;
t
j Ft 1 ) = exp f(yt log
t
t)
where the natural parameter turns out to be log
log
t !g ;
t = 1; :::; T
t.
Hence,
g( t ) = log
t;
is de…ned as the canonical link, while the inverse link function g
t
(3.11)
t = 1; :::; T
1
guarantees that
> 0 for every t, as:
g 1 ( t ) = exp( t ); t = 1; :::; T
(3.12)
The resulting de…nition of intensity
t
= exp(Z0t
1
); t = 1; :::; T
(3.13)
characterizes the so-called log-linear model, which has been widely applied in econometrics since Hausman et al. (1984).
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
3.2.2
34
Inference
Consider …rst the estimation of the parameter vector
case of the Poisson model with g( t ) =
Z0t 1
=(
1 ; :::;
p)
for the general
. Recalling (3.4), the partial likelihood
function is
PL( ) =
=
T
Y
t=1
T
Y
f (yt ;
j Ft 1 )
exp(
t(
t=1
)) t ( )yt
yt !
(3.14)
Hence, the partial log-likelihood is the following:
l( )
log PL( )
T
X
yt log t ( )
=
t=1
T
X
t(
)
T
X
yt log yt !
(3.15)
t=1
t=1
The partial score function is then obtained by di¤erentiating the log-likelihood:
ST ( ) = rl( ) =
=
T
X
t=1
Zt
@l( )
@l( )
; :::;
@ 1
@ p
@g 1 ( t )
1
@ t
1
(yt
t( )
0
t(
))
(3.16)
Then, the MPLE ^ (see Wong, 1986) is obtained by solving the system
ST ( ) = rl( ) = 0
(3.17)
which has to be solved numerically, because is nonlinear. Besides the use of standard
Newton-Raphson type algorithms, a possible method for solving (3.17) is the Fisher
scoring, which is a modi…cation of the Newton-Raphson algorithm where the observed information matrix is replaced by its conditional expectation, yielding some
computational advantages. The application of the Fisher scoring method to the partial likelihood estimation of the Poisson model is presented in Kedem and Fokianos
(2002).
De…ne …rst the observed information matrix as
HT ( ) =
rr0 l( )
(3.18)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
35
It admits the following decomposition:
H T ( ) = GT ( )
(3.19)
RT ( )
where GT ( ) is the cumulative conditional information matrix, which is de…ned as
T
X
GT ( ) =
t=1
T
X
=
Cov Zt
Zt
1
@g 1 ( t )
@ t
2
@g 1 ( t )
@ t
1
t=1
0
1
(yt
t( )
1
Z0t
(
)
t
t(
)) j Ft
1
1
(3.20)
= Z W( )Z
where Z = (Z00 ; Z01 ; :::; Z0T
1)
entries
wt =
is a T
p matrix and W( ) = diag(w1 ; :::; wT ) with
@g 1 ( t )
@ t
and
RT ( ) =
T
X
2
1
; t = 1; :::; T
t( )
Zt 1 dt ( )Z0t 1 (yt
t(
(3.21)
))
t=1
2
1
with dt ( ) = [@ log g ( t )[email protected]
2
t ].
By substituting HT with GT , if GT 1 exists, the iterations take the form
^ (k+1) = ^ (k) + G 1 ( ^ (k) )ST ( ^ (k) )
T
(3.22)
An interesting feature of the Fisher scoring is that it can be viewed as an iterative
reweighted least squares (IRLS) method.
It should indeed be noted that equation (3.22) can be rewritten as
(k
GT ( ^ ) ^
(k+1)
= GT ( ^
(k)
)^
(k)
+ ST ( ^
(k)
)
(3.23)
where the right-hand side is a p-dimensional vector whose i-th element is
" T
#
p
T
1
X
X Z(t 1)j Z(t 1)i @g 1 ( ) 2 (k) X
(yt
t )Z(t 1)i @g ( t )
t
^ +
j
2
2
@ t
@ t
t
t
t=1
j=1 t=1
=
T
X
t=1
Z(t
1)i wt
t + (yt
t)
@g 1 ( t )
@ t
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
36
Thus, de…ning
(k)
qt
=
T
X
^ (k) + (yt
1)i
Z(t
t)
j
t=1
^
t(
=
(k)
) + (yt
t)
@g 1 ( t )
@ t
@g 1 ( t )
@ t
(k)
and, denoted by q(k) the T -dimensional vector whose elements are the qt , the righthand side of (3.23) is equal to Z0 W(
(k)
)q(k) . By applying (3.20) to the left side,
(3.23) becomes
Z0 W( ^
(k)
)Z ^
(k+1)
= Z0 W( ^
(k)
)q(k)
and the iteration simpli…es to
^ (k+1) = (Z0 W( ^ (k) )Z) 1 Z0 W( ^ (k) )q(k)
(3.24)
The limit for k ! 1 of recursion (3.24) is the maximum partial likelihood estimator ^ . In each iteration we can recognize the form of the weighted least squares with
adjusted weight W(
(k)
) and the adjusted dependent variable q(k) . For initializing
the recursions the conditional means can be replaced by the corresponding responses
in order to get a …rst estimate of the weight matrix W and hence a starting point
for
.
When the canonical link is used, we have
t(
) = exp(Z0t
)
1
and several sempli…cations are possible. Indeed, for the log-linear model, equations
(3.17) and (3.20) become
ST ( ) =
T
X
Zt 1 (yt
t(
))
(3.25)
)
(3.26)
t=1
and
GT ( ) =
T
X
0
Zt 1 Zt
1 t(
t=1
Moreover, as dt = 0 in (3.21), RT ( ) vanishes and we get
H T ( ) = GT ( )
(3.27)
thus for the log-linear model the Fisher scoring and Newton-Raphson methods coincide.
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
3.2.3
37
Asymptotic theory
In the general theory of GLM the following assumptions (see Fahrmeir and Kaufmann,
1985 for more details) allow to show consistency and asymptotic normality of the
MPLE ^ :
Assumption 1 The true parameter
Rp :
belongs to an open set
The covariate vector Zt 1 almost surely lies in a nonrandom
i
hP
0
T
0
>
0
= 1. In addition, Zt 1
of Rp , such that P
Z
Z
t
1
t 1
t=1
Assumption 2
compact subset
lies almost surely in the domain H of the inverse link function g
and
1
for all Zt
2 B:
Assumption 3 The inverse link function g
1
2
is twice continuously di¤erentiable
and [email protected]( )[email protected] j =
6 0.
Assumption 4
1
There is a probability measure
on Rp such that
R
Rp
zz 0 (dz)
is positive de…nite, and such that, if the conditional distribution of Yt belongs to the
exponential family of distributions in canonical form and under (3.10), for Borel sets
Rp ,
A
T
1X
I[Z
T t=1 t
as T ! 1, at the true value of
p
1 2A]
! (A)
.
Assumption 4 assures the existence of a p
matrix
G( ) =
Z
Rp
with
= Z0
such that
Z
@g 1 ( )
@
p nonrandom limiting information
2
1
g
1(
GT ( ) p
! G( )
T
)
Z0 (dz)
(3.28)
(3.29)
Once stated the above assumption, the following theorem, providing the asymptotic properties of the MPLE, can be presented.
Theorem 3.1 For the Poisson model, as well as for the general case of GLM, it
can be shown that, under assumptions 1-4, the maximum partial likelihood estimator
is almost surely unique for all su¢ ciently large T and
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
38
1. The MPLE is consistent and asymptotically normal:
p
^!
and
p
T (^
d
) ! N (0; G 1 ( ))
as T ! 1.
2. The following holds:
p
T (^
1
p
p G 1 ( )ST ( ) ! 0
T
)
as T ! 1.
3.2.4
Hypothesis testing
Consider the test of the hypothesis
H0 : C0
=r
where C is a known p q matrix with full rank and r is a known q-dimensional column
vector. Then denote by
0
the restricted maximum partial likelihood estimator under
the null hypothesis.
The most commonly used test statistics for testing H0 in the context of the
Poisson model are:
- the partial likelihood ratio statistic
n
LRT = 2 log PL( ^ )
log PL(
0)
o
(3.30)
- the Wald statistic
WT = (C0 ^
0
0
1
0 ) (C G (
0 )C)
1
(C0 ^
0)
(3.31)
- the partial score statistic
LMT =
1 0 ~
S ( )G 1 ( ~ )ST ( ~ )
T T
(3.32)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
39
Kedem and Fokianos (2002) prove the following theorem concerning the asymptotic distribution of the test statistics de…ned above.
Theorem 3.2
Under the set of assumptions 1-4, the test statistics LRT ; WT
and LMT are asymptotically equivalent. Furthermore, under H0 , their asymptotic
distribution is chi-square with r degrees of freedom.
3.2.5
Goodness of …t
In the context of Poisson regression for count time series, several de…nitions of residuals can be employed (see Cox and Snell, 1968).
- The raw residual is the di¤erence between the response and its conditional expectation:
r^t = yt
t(
^ ); t = 1; :::; T
(3.33)
- The Pearson residual is the standardized version of the raw residual, taking into
account that the variance of Yt is not constant:
yt
e^t = q
t(
^)
;
(3.34)
t = 1; :::; T
^
t( )
- The deviance residual
d^t = sign(yt
^
t ( ))
q
lt (yt )
lt ( t ( ^ ))
(3.35)
can be viewed as the t-th contribute to the model deviance.
The notion of deviance is based on a likelihood comparison between the full
(or saturated) model and the estimated model. The full model is that where
is estimated directly from the data y1 ; :::; yT disregarding
t
, thus it has as many
parameters as observations, as in this case the maximum partial likelihood of t is
yt . The estimated model includes p < T parameters instead. Since l(y; y) l( ^ t ; y),
the deviance statistic
n
D = 2 l(y; y)
l( ^ t ; y)
o
(3.36)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
where l(y; y) =
T
P
40
yt has been suggested as a measure of the model overall goodness
t=1
of …t. Lower positive values correspond to a better …tted model. The deviance
2
T p
statistic has been shown to be have an approximate
distributions under certain
conditions (see Mc Cullagh, 1986).
In many generalized linear model, including the Poisson, Pearson residuals are
known to be skewed and fat tails. It can be indeed convenient to use a normalizing
transformation so that they are more likely to achieve approximate normality under
the correct model, like the Anscombe residuals. In McCullagh and Nelder (1983)
these are de…ned as:
a
^t =
3 2=3
y
2
^ 2=3
t
(3.37)
^ 1=6
t
Autocorrelation of Pearson residuals
The large sample properties of the MPLE stated by Theorem 3.1 imply that e^t is a
t
consistent estimator of et = yp
e (k)
t(
t(
)
)
, so that the autocorrelation of the et ’s at lag k
can be consistently estimated by
T
1 X
e^t e^t
^e (k) =
T t=k+1
k
(3.38)
Li (1991) has proved the following theorem relative to the asymptotic distribution
of the autocorrelation vector.
Theorem 3.3 Under the correct model, the vector
1
p ^e =
T
^e (1) ^e (2)
^e (m)
p ; p ; :::; p
T
T
T
for some m > 0 is asymptotically normally distributed with mean 0 and some diagonal limiting covariance matrix (see Li, 1991 for details).
Testing the “whiteness” of Pearson residuals is used in many applications for
goodness of …t analysis, as they should be a white noise, i.e. a sequence of uncorrelated random variables with mean 0 and …nite variance, under the correct model (see
Kedem and Fokianos, 2002). Plots of the sample autocorrelation function of Pearson
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
residuals with con…dence bands at
41
p
1:96= T are commonly used for goodness of …t
evaluation.
3.2.6
Model selection
In GLM for count time series, selection among competing models can be based on the
traditional information criteria. The Akaike Information Criterion (AIC) introduced
by Akaike (1974), in the partial likelihood estimation context is a function of the
partial log-likelihood and the number of parameters:
2 log PL( ^ ) + 2p
AIC(p) =
(3.39)
The model with the number of parameters p which minimizes (3.39) is preferred.
The so-called Bayesian information criterion (BIC), following Schwarz (1978) is
de…ned as
BIC(p) =
3.3
2 log PL( ^ ) + p log T
(3.40)
The doubly-truncated Poisson model
The traditional Poisson model can be generalized, as in Fokianos (2001), by assuming
that the conditional distribution of the response is doubly truncated Poisson. Let
fYt g, t = 1; :::; T be a time series of counts and suppose to obmit the values below
a known …xed constant c1 and exceeding another known …xed constant c2, with
c1 < c2. Then the doubly truncated Poisson conditional density is
f (yt ;
t ; c1; c2 j Ft 1 ) =
exp( t ) yt t
; t = 1; :::; T
yt ! (c1; c2; t )
where the function
and clearly
is de…ned as
8
c2
y
X
>
>
t
<
y!
(c1; c2; t ) =
y=c1
>
>
:
(0; c2;
(3.41)
(0; 1;
t)
if 0 6 c1 < c2
t)
otherwise
= exp( t ) leads to the common Poisson model. This gener-
alization turns out to be useful for modelling truncated count data.
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
42
An often used speci…cation is that obtained by setting c1 = 1 and c2 = 1. In
this case (3.41) becomes:
f (yt ;
t ; 1; 1
yt
t
j Ft 1 ) =
yt !(exp(
t)
1)
; t = 1; :::; T
It should be noted that, di¤erently from the traditional Poisson model, for the truncated Poisson model the conditional mean is not equal to the conditional variance,
as
E tr [yt ; c1; c2 j Ft 1 ] =
(c1
t
1; c2 1;
(c1; c2; t )
t)
while
1
f
(c1; c2; t )
(c1; c2; t ) +
V artr [yt ; c1; c2 j Ft 1 ] =
2
[ (c1; c2;
t)
2
t
(c1
2; c2
2;
(c1
1; c2
1;
t
t
(c1
1; c2
t)
1;
t)
t )]g
As can be noticed from (3.41), the doubly truncated Poisson distribution belongs to
the exponential family of distributions, hence its canonical link is the logarithm and
the inverse link is the exponential. Therefore, we obtain again the log-linear model
t
exp(Z0t
)
1
and inference is based on maximization of the log-likelihood function derived by
(3.41).
3.4
The Zeger-Qaqish model
Zeger and Qaqish (1988) de…ne the following multiplicative model:
t(
) = exp(
0
+
1 xt
+
= exp(
0
+
yt 1
1 xt )~
2
2
log(~
yt 1 ))
,
(3.42)
t = 1; :::; T
and no distributional assumption for the response yt is speci…ed. It is clear that,
when
2
< 0, there is an inverse relationship between y~t
1
and
t(
), while the
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
conditional mean grows with y~t
1
when
2
> 0. Observe that, when
2
43
< 0, (3.42)
reduces to a log-linear model.
In this formulation Zt
= (1; Xt ; log(~
yt 1 ))0 ,
1
=(
0;
1;
0
2) ,
while y~t
1
is
de…ned either as
y~t
1
= max(c; yt 1 ),
0<c<1
y~t
c>0
or
so that yt
1
1
= yt
1
+ c,
= 0 is not an absorbing state.
Equation (3.42) de…nes the …rst conditional moment. With respect to the conditional variance it is assumed:
V ar[yt j Ft 1 ] = V ( t )
(3.43)
where V ( ) is a known variance function de…ning the relationship between the conditional mean and the conditional variance, and ' is an unknown dispersion parameter.
The so-called working variance V ( t ) allows to accomodate some features found in
the data. For example, the variance model
t,
with ' > 1, may hold for count data
where the conditional variance exceeds the conditional mean. As can be seen, in
this model the assumptions on the response distribution concern only the …rst and
second conditional moments.
A possible extension of (3.42) is the following multiplicative error model:
t(
) = exp(
0
+
1 xt )
y~t 1
exp( 0 + 1 xt 1 )
2
t = 1; :::; T
which can be generalized by considering, as in Kedem and Fokianos (2002), the
following model:
t(
where
"
) = exp x0t +
q
X
i=1
= ( 0 ; 1 ; :::;
0
q)
~t 1
i (log y
x0t
1
#
)
is an s + q-dimensional parameter vector and fxt g is an s-
dimensional covariate vector of covariates. Note that when s = 2, q = 1,
xt = (1; xt )0 and
1
=
2,
(3.44)
t = 1; :::; T
(3.44) reduces to (3.42).
=(
0;
1 ),
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
44
Turning to the theory of inference for the Zeger-Qaqish model (3.49), we consider
the case where c is known. In this case, the estimation of the parameter vector
can be carried out by using the quasi-score function:
T
X
ST ( ) =
Zt
(yt
t ( ))
V ( t ( ))
t
t
1
t=1
(3.45)
which resembles the score function (3.16), except that the true conditional variance
is replaced by the working variance.
According to the theory of quasi-partial maximum likelihood estimation for GLM
(see Wedderburn, 1974), the estimator ^ q is consistent and asymptotically normal:
p
T (^q
d
) ! N (0; G 1 ( )G1 ( )G 1 ( ))
where G( ) and G1 ( ) are the following matrices:
GT ( ) =
and
where
2
t(
T
1X
Zt
T t=1
T
1X
G1 ( ) =
Zt
T t=1
2
1
t
1
t
2
1
t
2
t
V
2
t(
2(
p
Z0t
1
! G( )
)
Z0t
(
))
t
1
! G1 ( )
V ( t ( ))
p
) denotes the true conditional variance. In practice, the covariance mat-
rix of ^ q is estimated by replacing the parameters '; ;
estimates. The true conditional variance
dispersion parameter
2
t(
1
T
s
) by their respective
2
) is replaced by yt
can be estimated by
^=
2
t(
T
X
e^t
t=1
where e^t is the Pearson residual at time t:
^
yt
t( q )
e^t = q
V ( t( ^ q )
t(
^ q ) . The
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
3.5
45
Overdispersion and negative binomial regression
The equality of mean and variance characterizing the Poisson model makes it nonsuitable when the data show overdispersion, i.e. the response variance is higher than
the mean. We will show in the following that the introduction of lagged values of
the response among the regressors for
t
allows the unconditional variance to be
higher than the unconditional mean, di¤erently from the traditional Poisson model
with only exogenous regressors. However, in general, when modelling count data the
problem of overdispersion should be addressed. Several post-hoc tests - i.e. performed
after modelling the data - have been proposed in order to detect overdispersion. One
of them is the Pearson statistic, de…ned as the sum of squared Pearson residuals:
2
=
T
yt
X
t=1
t(
t(
^)
2
(3.46)
^)
Its distribution was studied, among the others, by McCullagh (1986) and McCullagh
and Nelder (1989). Under suitable regularity conditions, its distribution converges
to a chi-square with T
p degrees of freedom.
A distribution which is known to …t overdispersed count data is the negative
binomial. If the conditional density of a time series given the past is that of a
negative binomial variable with parameters pt and r , its distributional law is
yt + r 1 r
f (yt ; pt ; r j Ft 1 ) =
pt (1 pt )yt ; t = 1; :::; T
(3.47)
r 1
where pt is the probability that an event occurs in t while r is the scale parameter
and its inverse 1=r is known as the overdispersion parameter. The conditional mean
r(1 pt )
E [Yt j Ft 1 ] = =
is lower than the conditional variance V ar [Yt j Ft 1 ] =
pt
r(1 pt )
.
p2t
The systematic component of the GLM in the negative binomial case, linking pt ,
and thus the expected conditional value, to a set of covariates Z, can be de…ned, as
in Davis and Wu (2009), through the following logit model:
log
pt
1
pt
= exp(Z0t
1
)
(3.48)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
46
yielding
= r exp(Z0t
1
(3.49)
)
The maximum likelihood estimator ^ maximizes the partial log-likelihood function
l( )
log PL( ) = r
T
X
log(1+exp(Z0t
T
X
))
1
Yt log(1+exp(Z0t
1
t=1
t=1
))+log
T
Y
yt + r 1
r 1
t=1
(3.50)
Several optimization algorithms have been proposed by Hilbe (2007).
As we said, negative binomial is often used as an alternative to the Poisson
model. For testing the Poisson model against the negative binomial distribution,
a commonly used test statistic is that characterizing the Z test, which Lee (1986)
de…nes as follows:
Z=
T
P
yt
t(
^)
2
t(
^)
t=1
T
p P
2
(3.51)
t(
^)
t=1
and is shown to have asymptotic standard normal distribution. As the probability
limit of the numerator is shown to be positive under the alternative hypothesis that
the negative binomial distribution is preferable, a one-sided test is convenient. In
particular, the Poisson speci…cation is rejected in favour of the negative binomial
with a level of signi…cance
T
X
if
yt
^
t( )
2
^
t( ) > c
t=1
T
p X
2
t(
^)
t=1
where c is the critical value.
3.6
Poisson Autoregression
Fokianos, Rahbek and Tjøstheim (2009), henceforth FRT (2009), study a particular
Poisson time series model, characterized by a linear autoregressive intensity and
allowing to …t data showing a very slowly decreasing dependence. This model was
already existing in literature and shown to …t some …nancial count data satisfactorily,
but FRT (2009) is the …rst work to study ergodicity and develop the asymptotic
theory, which is crucial for likelihood inference.
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
3.6.1
47
Model speci…cation
FRT (2009) study the properties of the following Poisson model:
yt jFtY; 1
t
where the parameters !,
P ois( t )
= ! + yt
and
1
+
t 1
t>1
are assumed to be positive. In addition,
(3.52)
0
and y0
are assumed to be …xed.
By introducing for each time point t a “scaled” Poisson process Nt ( ) of unit
intensity, it is possible to rephrase (3.52) so that the response is de…ned explicitly as
a function of the conditional mean:
yt = Nt ( t )
t
= ! + yt
1
+
t 1
t>1
where yt is then equal to the number of events of Nt ( ) in the time interval [0;
(3.53)
t ].
The
rephrased model (3.53) is found to be more convenient when proving the asymptotic
normality of the parameter estimates. Furthermore, expressing yt as a function of
conditional mean - which in the Poisson model is equal the conditional variance recalls the …rst de…ning equation in the GARCH model. It is interesting to note that
the sum ( + ) can be considered as a measure of persistence in intensity, just as
the sum of the ARCH and GARCH parameters in the GARCH model can be read
as a measure of persistence in volatility.
Both (3.52) and (3.53) refer to the theory of generalized linear model (GLM)
for count time series. Here the random component is the Poisson distribution, as
the unobserved process
t
can be expressed as a function of the past values of the
observed process yt after recursive substitution.
The peculiarities of this approach are mainly two. First, it is characterized by
a noncanonical link function - the identity - while, as we have seen, the traditional
Poisson model uses the log-linear speci…cation. The other contribution is the introduction of an autoregressive feedback mechanism in f t g, while in the tradition
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
48
of GLM the intensity is function of a vector of covariates, possibly including the
lagged value of the response. This aspect makes the model able to capture a strong
persistence with a small number of parameters.
As said before, although FRT (2009) is the …rst work studying ergodicity of
(3.53), that is critical in developing the asymptotic theory, this model was already
been considered in the econometric literature. It belongs indeed to the class of
observation-driven models for time series of counts studied, among the others, by
Zeger and Qaqish (1988) and, more recently, by Davis et al. (2003) and Heinen
(2003). The latter de…nes, in particular, an Autoregressive Conditional Poisson
model (ACP), which is a more general form of 3.53 including several lags of counts
and intensity. A strong motivation for the analysis of this class of models is that
is shown to well approximate some common …nancial count time series, such as the
number of trades in a short time interval (Rydberg and Shephard, 2000 and Streett,
2000).
In particular, Ferland et al. (2006) de…ne model (3.53) explicitly as an an integervalued GARCH(1,1), i.e. an INGARCH(1,1), and show that Yt is stationary provided
that 0 6
+
< 1. In particular,
E[yt ] = E[ t ] =
= !=(1
)
They further show that all the moments are …nite if and only if 0 6
+
< 1.
Turning to the second moments, as
2
V ar[yt ] =
1+
1
( + )2
it is immediate to conclude that V ar[Yt ] > E[Yt ], with equality when
= 0. Thus,
including the past values of the response in the evolution of intensity leads to overdispersion, a feature often found in real count data.
3.6.2
Ergodicity results
A crucial point in the analysis of this model is to prove the geometric ergodicity of the
joint process (yt ;
t ),
where yt is the observed component, while the intensity process
is latent. The notion of geometric ergodicity for a Markov chain process can be
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
49
summarized as follows. First, the concept of '-irreducibility has to be introduced.
Consider the homogenous Markov chain Zt de…ned on a
-…eld M on A, where
P t (z; B) = P (Zt 2 B j Z0 = z) is the probability of moving from z 2 A to the set
B 2 M in t steps. The Markov chain (Zt ) is said -irreducible if, for some nontrivial
-…nite measure
on (A; M),
8B 2 M
(B) > 0 ) 8x 2 A; 9t > 0;
P t (z; B) > 0
If a -irreducible Markov chain is positive recurrent (see Meyn and Tweedie, 1996),
then there exists a (unique) invariant distribution, that is a probability measure
such that
8B 2 B
(B) =
Z
P (z; B) (dz)
Finally, (Zt ) is said to be geometrically ergodic if there exists a
t
8x 2 A
P t (z; )
2 (0; 1) such that
! 0 as t ! +1
Thus, geometric ergodicity states convergence to the invariant distribution.
FRT (2009) succeed in proving geometric ergodicity of (yt ;
t)
by using an ap-
proximated (perturbed) model and proving that is geometrically ergodic under some
restrictions on the parameter space. Then, they show that the perturbed model can
be made arbitrarily close to the unperturbed one, allowing to extend the results to
the latter.
The perturbed model is de…ned as:
ytm = Nt (
m
t
where
m
0
m
t )
= ! + ytm 1 +
m
t 1
+ "t;m
(3.54)
and y0m are …xed and
"t;m
=
cm I ytm 1 = 1 ut ;
cm
>
0
cm
! 0 as m ! 1
where I f g is the indicator function and fUt g is a sequence of i.i.d. uniform random
variables on (0; 1) such that fUt g and fNt g are independent. The introduction of
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
fUt g enables to establish
irreducibility, where
support [k; 1) for some k
proof that the point
, with
50
is the Lebesgue measure with
) solution of
= !=(1
=!+
. The
is reachable, and so that f t g is open set irreducible on
[ ; 1), provided that
< 1, is instead given (see FRT, 2009 for details) without
using any perturbation.
The following lemma allows to complete the proof of ergodicity of (3.53), establishing that the perturbed model can be made arbitrarily close to the unperturbed
one.
Lemma 3.1 With (yt ;
if 0
2. E(
m
t
m
t
t )j
2
t)
= jE(ytm
yt )2
and
! 0 as m
i;m
m
t
3.6.3
de…ned by (3.53) and (3.54) respectively,
yt )j
1;m
2;m
3. E(ytm
large, j
m
t )
and (ytm ;
1, then the following statements hold:
+
1. jE(
t)
3;m
! 1 for i = 1; 2; 3. Furthermore, with m su¢ ciently
and jytm
tj
yt j
for any
> 0 almost surely.
Estimation of parameters
Denoting by
the three-dimensional vector of unknown parameters, i.e.
(!; ; )0 , the conditional likelihood function for
observations y1 ; :::; yT given the starting values
L( ) =
T exp(
Q
t=1
where
by
0
t(
= (! 0 ;
) = ! + yt 1 ( ) +
0;
0)
0
, we can write
based on (3.52) in terms of the
0 ; y0
t(
=
))
is the following:
yt
t (
)
(3.55)
yt !
t 1,
while, denoting the true parameter vector
=
t ( 0 ).
t
Thus the conditional log-likelihood function is given, up to a constant, by
l( ) =
T
X
t=1
while the score function is
T
X
lt ( ) =
(yt log
ST ( ) =
t(
)
t(
))
(3.56)
t=1
T
X
t=1
yt
t( )
1
@ t( )
@
(3.57)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
51
where @ t ( )[email protected] is a three-dimensional vector with components
@ t 1
@ t
= 1+
;
@!
@!
@ t
@ t
= yt 1 +
@
@
@
@
t
=
t 1
@
+
t 1
@
;
1
The solution of ST ( ) = 0 yields the conditional maximum likelihood estimator of
, denoted by ^ .
The Hessian matrix is then obtained by further di¤erentiation of the score equations (3.57):
HT ( ) =
=
T
X
@ 2 lt ( )
@ @ 0
t=1
T
X
t=1
yt
2
t( )
T
X
t=1
@ t( )
@
yt
t( )
1
@ t( )
@
0
@ 2 t( )
@ @ 0
(3.58)
In order to study the asymptotic properties of the maximum likelihood estimator for
the unperturbed model which are presented in the following, it is again helpful to
use the ergodic properties of the perturbed model, whose likelihood function, based
on the Poisson assumption and the independence of Ut from (ytm ;
m
L ( )=
T exp(
Q
t=1
m
t (
))(
m
yt !
m
t (
m
t );
is de…ned as
m
T
))yt Q
fu (Ut )
t=1
where fu denotes the uniform density. Note that, as Lm ( ) and L( ) has the same
m
form, then Sm
T ( ) and HT ( ) are the counterpart of ST ( ) and HT ( ), where (yt ;
are replaced by (ytm ;
3.6.4
t)
m
t ).
Asymptotic theory
FRT (2009) prove that the maximum likelihood estimator ^ is consistent and asympm
totically normal by …rst showing these properties for ^ . For proving consistency and
m
asymptotic normality of ^ they take advantage of the fact that the log-likelihood
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
52
function is three times di¤erentiable, which allows to apply Lemma 1 of Jensen
and Rahbek (2004). The latter states consistency and asymptotic normality of the
maximum likelihood estimator for the traditional GARCH(1,1) model when some
assumptions on parameters are relaxed. It is then shown that the score function, the
information matrix and the third derivatives of the perturbed likelihood tend to the
corresponding quantities of the unperturbed likelihood function. This allows to use
proposition 6.3.9 of Brockwell and Davis (1991), stating convergence in distribution
of a random vector when some conditions are satis…ed.
Before formulating the theorem stating the main result, it is necessary to de…ne
the lower and upper values of each component of , ! L < ! 0 < ! U ;
1; and
L
<
0
<
L
<
0
<
U
<
U:
O( 0 ) = f j0 < ! L
j0 < ! L
!
!
!U ;
!U ;
0 <
L
U
< 1 and
0 <
L
U
g
The following theorem states the properties of consistency and asymptotically normality of the maximum likelihood estimator, under a stationarity condition.
Theorem 3.3
0
+
0
Under model (3.53), assuming that at the true value
< 1, there exists a …xed open neighborhood O = O( 0 ) of
0
0; 0
<
such that
with probability tending to 1, as T ! 1, the log-likelihood function has a unique
maximum point ^ and, furthermore, ^ is consistent and asymptotically normal:
p
T (^
0)
d
! N (0; G 1 ( ))
where the conditional information matrix G( ) is de…ned as
1
t( )
G( ) = E
@
@
@
@
t
t
0
(3.59)
and can be consistently estimated by
GT ( ) =
=
T
X
t=1
T
X
t=1
V ar
1
t( )
@lt
j Ft
@
@
@
t
1
@
@
t
0
(3.60)
CHAPTER 3. ECONOMETRIC MODELLING OF COUNT TIME SERIES
53
The standard errors of parameter estimates can be obtained from matrix GT ( ).
3.7
Concluding remarks
We have reviewed the main models for time series of counts used in econometrics.
They belong to the class of GLM and their estimation relies on partial likelihood
theory. We have deeply analyzed one of the most used count model, which is the
Poisson with log-linear intensity. Then we have introduced a recently developed
Poisson model: Poisson Autoregression by Fokianos, Rahbek and Tjøstheim (FRT,
2009). This model de…nes intensity as a linear function of its own past values and
the past number of events and is able to capture the overdispersion and the strong
persistence characterizing many count data. As these features are also found in the
corporate default count time series, we can think to Poisson Autoregression as an
useful tool for the count time series analysis of the default phenomenon.
Chapter 4
A new Poisson Autoregressive
model with Exogenous Covariates
We have concluded the previous chapter by presenting Poisson Autoregression by
Fokianos, Rahbek and Tjøstheim [FRT] (2009) and explaining its potential advantages in modelling overdispersed and long-memory count data, which are features
found in the corporate default counts that will be the object of our empirical study
in Chapter 5. Though, this formulation does not consider the role of covariate processes in the intensity dynamics, i.e. in the distribution of the number of events. We
claim that including exogenous predictors in the conditional mean speci…cation can
enrich the analysis of count time series and also improve the in- and out-of-sample
forecasting performance, especially when applying the model to empirical time series
strongly linked to the …nancial and economic context. In this chapter we then propose
and develop a class of Poisson intensity AutoRegressions with eXogenous covariates
(PARX) models. Extending the theory developed by FRT (2009) allowing for covariate processes requires a strong theoretical e¤ort which is a relevant part of our
methodological contribution. First, we provide results on the time series properties
of PARX models, including conditions for stationarity and existence of moments.
We then provide an asymptotic theory for the maximum-likelihood estimators of the
parameters entering the model, allowing inference and forecasting.
54
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES55
4.1
Related literature
The PARX model is related to a recent literature on GARCH models augmented by
additional covariates with the aim of improving the volatility forecasting performance. In many cases the lagged squared returns o¤er just a weak signal about the
level of volatility and, as a consequence, the approximation provided by standard
GARCH models is poor when volatility changes rapidly to a new level. Realized
volatility measures calculated from high-frequency …nancial data and introduced in
the literature by seminal works such as Andersen, Bollerslev, Diebold and Labys
(2001) and Barndor¤-Nielsen and Shephard (2002) can be useful to improve the approximation of these models. These measures are found indeed to approximate the
level of volatility very well. The …rst models including realized volatility measures
in the GARCH equation are the so-called GARCH-X models estimated by Engle
(2002), but are quite incomplete as they do not explain the variation in the realized
measures. More complete models are those introduced by Engle and Gallo (2006)
and the HEAVY model of Shephard and Sheppard (2010), both specifying multiple
latent volatility processes, and the Realized GARCH model of Hansen et al. (2012),
which combines a GARCH structure for the daily returns with an integrated model
for realized measures of volatility. More generally, there are several works presenting
empirical analyses where the time-varying volatility is explained by past returns and
volatilities together with additional covariates, typically the volume of transactions
as a proxy of the ‡ow of information reaching the market (see, for example, Lamoureux and Lastrapes, 1990 and Gallo and Pacini, 2000). An econometric analysis
of ARCH and GARCH models including exogenous covariates can be found in Han
and Park (2008) and Han and Kristensen (2013). The PARX shares the same motivation and modelling approach of the presented literature, except that the variable
of interest in our case is the time-varying Poisson intensity.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES56
4.2
Speci…cation of PARX models
Consider the Poisson model for the counts yt , conditional on past intensity and
counts, denoted by
t m
and yt
m,
for m
1, respectively, as well as past values of
an explanatory variable xt :
yt j F t
where Ft
1
= (yt
m;
t m;
xt
m; m
(4.1)
P ois( t )
1
1) and
t
is the, potentially time-varying,
Poisson intensity. Following FRT (2009), equation (4.1) can be rewritten in terms
of an i.i.d. sequence Nt (: ) of Poisson processes with unit-intensity
(4.2)
yt = Nt ( t )
The time-varying intensity is speci…ed in terms of the linear link function considered in FRT (2009), here augmented by an exogenous covariate xt 2 R entering
the intensity through a known function f : R ! R+ :
t
=!+
p
X
i yt i
i=1
+
q
X
i t i
The parameters of interest are given by ! > 0, and
is easy to observe that, when
(4.3)
+ f (xt 1 )
j=1
1 ; :::;
p,
1 ; :::;
q
and
0. It
= 0, the model reduces to the Poisson Autoregression
in FRT (2009). Also note that we de…ne a more general speci…cation, allowing for
p lags of the response and q lags of the intensity. We can then use the notation
PARX(p; q) in an analogous way as GARCH(p; q) identi…es a GARCH models where
p lags of the returns and q lags of the volatility are included. The presence of the
lagged covariate value rather than the value at time t allows the de…nition of a
conditional intensity that is known at time t given the information available up to
time t
1.
In order to carry out multi-step ahead forecasting, we close the model by imposing
a Markov-structure on the covariate,
xt = g(xt 1 ; "t ; )
for some function g(x; "; ) which is known up to parameter
(4.4)
and where "t is an
i.i.d. error term. We will assume that f"t g and fNt (: )g are mutually independent so
that there is no feedback e¤ect from yt to xt .
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES57
4.3
Time series properties
We here provide su¢ cient conditions for a PARX process to be stationary and ergodic
with polynomial moments of a given order1 . The analysis is carried out by applying
recent results on so-called weak dependence developed in Doukhan and Wintenberger
(2008). The notion of weak dependence allows to prove the existence of a strictly
stationary solution for a large variety of time series models called chains with in…nite
memory, de…ned by the equation
Xt = F (Xt 1 ; Xt 2 ; :::; ; t ) a.s. for t 2 T
where F takes values in a Banach space and
t
constitutes an i.i.d. sequence (see
Doukhan and Wintenberger, 2008 for details). These models can be seen as a natural
extension either of linear models or Markov models. While weak dependence is a
slightly weaker concept than the geometric ergodicity used in FRT (2009), it does
imply that a strong law of large numbers as well as a central limit theory, both used
for the results on econometric inference shown in the following, apply.
Speci…cally, we make the following assumptions:
Assumption 1 jf (x)
of points x; x~ 2 R.
Assumption 2
f (~
x)j
E [kg(x; "t )
L kx
x~k, for some L > 0 and for every pair
g(~
x; "t )ks ]
kx
x~ks for some
< 1, s
1
and for every pair of points x; x~ 2 R, and E [kg(0; "t )ks ] < 1.
Pmax(p;q)
( i + i ) < 1.
Assumption 3
i=1
Assumption 4
("0t ; Nt (: )) are i.i.d.
A few remarks on these assumptions are needed.
First, Assumption 1 states that f satis…es the Lipschitz condition. This assumption will be weakened in the following in order to gain ‡exibility in the choice of the
function f .
Assumption 2 concerns, instead, a function g de…ning the structure of the covariate process and requires it to be Ls -Lipschitz for all values of x. This is a key
1
All theorems and lemmas are proved in Appendix A.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES58
assumption when proving stationarity of many popular time series models, including
the linear autoregressive ones.
Assumption 3 implies that the function L(y; ) = ! +
Pp
i=1
i yi
+
Pq
i=1
i i
is
Lipschitz. This assumption is imposed in Doukhan and Wintenberger (2008) for
applying the weak dependence theory and it is identical to the condition imposed in
FRT (2009) for the Poisson autoregressive model.
Finally, Assumption 4 rules out dependence in the two error terms driving the
model. It could be weakened, still satisfying the conditions of Doukhan and Wintenberger (2008), by allowing the two joint innovation terms to be Markov processes.
This would accomodate “leverage intensity-e¤ects”if f"t g and fNt (: )g are negatively
correlated. Though, for our purpose here we maintain Assumption 4.
In the following we provide a theorem stating the existence of a stationary solution
for process yt under the assumptions de…ned above. Before stating it, we brie‡y
present the theory of weak dependence developed by Doukhan and Wintenberger
(2008). They use the notion of weak dependence introduced by Dedecker and Prieur
(2004) and de…ned as follows.
Let ( ; C; P) be a probability space, M a
-subalgebra of C and Z a generic
random variable with values in A. Assume that kZk1 < 1, where k km denotes the
m
Lm norm, i.e. kZkm
1, and de…ne the coe¢ cient as
m = EkZk for m
Z
Z
(M; Z) = sup
f (z)PXjM (dz)
f (z)PX (dz) with f 2 1 (A)
1
An easy way to bound this coe¢ cient is based on a coupling argument:
(M; Z)
kZ
W k1
for any W with the same distribution as Z and independent of M. Under certain
conditions on the probability space ( ; C; P) (see Dedecker and Prieur, 2004), then
there exists a Z such that kZ
Z k1 and, using the de…nition of , the dependence
between the past of the sequence (Zt )t2T and its future k-tuples may be assessed.
Consider the norm kz
p) and de…ne
wk = kz1
w1 k + :::+ kzk
wk k on Ak , set Mp = (Zt ; t
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES59
1
= max sup f (Mp ; (Zj1 ; :::; Zjl )) with p + r
1 l kl
1 (r) = sup k (r)
j1 ; :::; jl g ;
k (r)
k>0
The time series (Zt )t2T is said -weakly dependent when its coe¢ cients
1 (r)
tend to 0 as r tends to in…nity. The notion of geometric ergodicity (see 3.6.2)
is stronger and refers to the rate of convergence of the Markov chain transition
probabilities to the invariant distribution. It requires the
-irreducibility of the
Markov chain and in FRT (2009) is shown for an approximated (perturbated) Poisson
Autoregressive model.
Under Assumptions 1-4 there exists a -weakly dependent sta-
Theorem 4.1
tionary and ergodic solution Xt = (yt ;
Pmax(p;q)
( i + i ); .
i=1
t ; xt )
with E [kXt ks ] < 1 and (r) = max
The above theorem complements the results of FRT (2009). Note that here
we provide su¢ cient conditions for weak dependence of the actual model, not an
approximated version. On the other hand, we do not show the stronger property of
geometric ergodicity.
Given the existence of a stationary distribution, it can easily be shown that
E[yt ] = E[ t ] =
and furthermore V ar [yt ]
=
! + E [f (xt 1 )]
Pmax(p;q)
( i+
1
i=1
i)
E[yt ]. Thus, by including past values of the response and
covariates in the evolution of the intensity, the PARX model generates overdispersion,
which is a prominent feature in many count time series.
An important consequence of Theorem 4.1 is that, using again the results of
Doukhan and Wintenberger (2008), if Assumptions 1-4 are satis…ed then the (strong)
law of large numbers (LLN) applies to any function h(: ) of Xt = (yt ;
t ; xt )
provided
E [kh(Xt )k] < 1. As a lemma we note that the same applies independently of the
choice of initial values (y0 ;
0 ; x0 ),
that is:
Lemma 4.1 If Xt = F (Xt 1 ; t ) with t i.i.d. and Xt -weakly dependent,
1 XT
a:s:
then
h(Xt ) ! E [h(Xt )] provided that E [kh(Xt )k] < 1.
t=1
T
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES60
Note that no role is played by the initial values in what stated above.
Also observe that when "t is an i.i.d.(0;
2
) sequence and E [h2 (Xt )] < 1, it
follows by Lemma 4.1 and a CLT for martingales (see Brown, 1971) that
T
1 X
d
p
h(Xt ) "t ! N (0;
T t=1
2
E h2 (Xt ) )
(4.5)
It is worth remarking that the Lipschitz condition in Assumption 1 rules out some
unbounded transformations f (x) of xt , such as f (x) = exp(x).
In order to handle such situations we introduce a truncated model:
c
t
=!+
p
X
i yt i
+
i=1
q
X
c
i t i
i=1
+ f (xt 1 )I fkxt 1 k
cg
(4.6)
for some cut-o¤ point c > 0.
We can then relax Assumption 1 allowing f (x) to be locally Lipschitz in the
following sense:
Assumption 1’
For all c > 0, there exists some Lc < 1 such that
jf (x)
f (~
x)j
L kx
x~k ;
kxk ; k~
xk
c
By replacing Assumption 1 with Assumption 1’, we now obtain, by identical arguments as in the proof of Theorem 4.1, that the truncated process has a weakly
dependent stationary and ergodic solution. Though this approach recalls the approximated GARCH-type Poisson process introduced in FRT (2009), the reasoning
is di¤erent. In FRT (2009) an approximated process was needed to establish geometric ergodicity of the Poisson process, while here we introduce the truncated process
in order to handle the practice - often used in literature - of introducing non-log realized volatility measures as exogenous covariate. Note that, as c ! 1, the truncated
process approximates the untruncated one (c = +1) in the following sense:
Under Assumptions 1’- 4 together with E [f (xt )] < 1,
Lemma 4.2
jE [
c
t
t ]j
E[
c
t
t]
2
= jE [ytc
2 (c);
yt ]j
1 (c);
E [ytc
y t ]2
3 (c)
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES61
where
k (c)
! 0 as c ! 1; k = 1; 2; 3.
The above result is akin to Lemma 2.1 in FRT (2009). The additional assumption
of E [f (xt )] being …nite needs to be veri…ed on a case-by-case basis. For example,
with f (x) = exp(x), then this holds if xt has a Gaussian distribution, or some
other distribution for which the moment generating function, or Laplace transform,
is well-de…ned.
4.4
Maximum likelihood estimation
= (!; ; ; ) 2 Rp+q+2 , where
Denote by
=(
1 ; :::;
p)
0
and
=(
1 ; :::;
0
q)
the
set of unknown parameters entering the PARX model in (4.2)-(4.3). The conditional
log-likelihood function in terms of observations y1 ; :::; yT , given the initial values
( 0;
1 ; :::;
q+1 ; y0 ; y 1 ; :::; y p+1 ),
LT ( ) =
T
X
takes the form
where lt ( ) = yt log
lt ( );
t(
)
t(
(4.7)
)
t=1
where we have left out a constant term and
t(
)=!+
p
X
i yt i
q
X
+
i=1
i t i(
) + f (xt 1 )
i=1
The maximum likelihood estimator is then computed as
^ = arg max LT ( )
(4.8)
2
where
Rp+q+2 is the parameter space.
We now impose the following conditions on the parameters:
Assumption 5
Moreover, for all
!
! L > 0.
Assume that
Rp+q+2 , with
2
= (!; ; ; ) 2
,
i
U
compact and
0
2 int .
< 1=q for i = 1; 2; :::; q and
Under this assumption together with the ones used to establish stationarity of
the model, we obtain the following asymptotic result for the maximum likelihood
estimator:
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES62
Under Assumptions 1-5, ^ is consistent and
"
#
2
p
l
(
)
d
t
1
T (^
G= E
0 ) ! N (0; G )
0
Theorem 4.2
=
(4.9)
0
An important remark is the following. If the distribution of yt is misspeci…ed, thus
there is an error term in the de…nition of intensity, but it still holds that E[yt ] =
t,
we expect the asymptotic properties of the maximum likelihood estimator to remain
correct except that the asymptotic variance now takes the sandwich form G
where
=E
"
lt ( ) lt ( )
0
=
0
1
G
1
#
See Gourieroux et al. (2004) for an analysis of Quasi-Maximum Likelihood Estimation (QMLE) of Poisson models.
Theorem 4.2 generalizes the result of FRT (2009) to allow for estimation of parameters associated with additional regressors in the speci…cation of
t.
By combining
the arguments in FRT (2009) with Lemma 4.2, the asymptotic result can be extended
to allow f to be locally Lipschitz (see Assumption 1’).
More precisely, we de…ne the likelihood quantities for the approximated, or truncated, model as
LcT ( ) =
T
X
ltc ( );
c
t(
where ltc ( ) = ytc log
c
t(
)
)
(4.10)
t=1
c
It immediately follows that the results of Theorem 4.2 holds for the QMLE ^ of
LcT ( ). However, as the approximated likelihood function can be made arbitrarily
close to the true likelihood as c ! 1, one can show that we can replace Assumption
1 in Theorem 4.2 by Assumption 1’:
Theorem 4.3
consistent and
p
T (^
Under Assumptions 1’, 2-5 and E [f (xt )] < 1, then ^ is
0)
d
! N (0; G 1 )
G=
E
"
2
lt ( )
0
=
0
#
(4.11)
With the above theorem we have generalized the asymptotic results by allowing the
assumptions on function f to be relaxed.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES63
4.5
Forecasting
The PARX model can be used to generate forecasts of both the intensity,
t,
and the
number of events, yt . It is important to remark that, for multi-step forecasting, we
also need to estimate the model for xt as given in (4.4). Given that xt is exogenous, we
can estimate the parameters entering equation (4.4) independently of . If no model
is available for xt , only one-step ahead forecasts are possible. In the following, we
treat the parameters entering the model as known for notational ease. In practice,
the unknown parameters are simply replaced by their estimates. Forecasting of
Poisson autoregressive processes is similar to forecasting of GARCH processes (see,
e.g., Hansen et al, 2012, Section 6.2) since it proceeds in two steps. First, a forecast
of the time-varying parameter - the variance in the case of GARCH, the intensity
in the case of PARX - is obtained; then, this is substituted into the conditional
distribution of the observed process yt .
Consider the forecasting of
=!+
T +1 j T
t.
A natural one-step ahead forecast is
p
X
i yT +1 i
i=1
+
q
X
i T +1 i
+ f (xT )
(4.12)
i=1
More generally, a multi-step ahead forecast of the distribution of yT +h , for some
h > 1, takes the form
FT +h j T (y) = F y j
where
T +h j T
T +h j T
is the …nal output of the following recursion:
max(p;q)
T +h j T
=!+
X
(
i
+
i=1
where the initial value
T +1 j T
i ) T +k i j T
+ f (xT +k
i j T );
k = 1; :::; h (4.13)
derives from (4.12) and xT +k j T , k = 1; :::; h
1, is
obtained from some forecast procedure based on (4.4). For example, if the model for
xt is an AR, the natural forecast is
yT +h j T := E [yT +h j Ft ] =
together with the 1
for some
con…dence interval (as implied by the forecast distribution)
2 (0; 1). The symmetric 1
CI1
T +h j T
= Q
=2j
T +h j T
con…dence interval takes the form
;Q 1
=2j
T +h j T
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES64
where p 7 ! Q(pj ) denotes the quantile function of a Poisson distribution with in-
tensity . The quantile function is available in standard statistical software packages,
such as Matlab. The forecasting results can be used to evaluate competing PARX
models, e.g. based on di¤erent choices of covariates. A number of di¤erent tests have
been proposed in the literature for comparing forecasting models. One can either use
forecast evaluation methods based on point forecast, yT +h j T , as proposed in, among
others, Christo¤ersen and Diebold (1997). Alternatively, the evaluation of the forecast distribution can be made by using the so-called scoring rules (Diebold et al.,
1998). These take as starting point some loss function S(P; y) whose arguments are
the probability forecast, P , and the future realization, y. For instance, the log-score,
S(P; y) = log P (y) can be used for ranking probability forecast methods by comparing their average scores. A test based on the scoring rules is the likelihood ratio test
studied by Amisano and Giacomini (2007). Suppose we have two competing PARX
models with corresponding intensity forecasts
(1)
T +h j T
(2)
T +h j T .
and
We then de…ne
the corresponding log-likelihood functions given the actual outcome in period T + h,
(k)
T +h j T
= yT +h log
(k)
T +h j T
(k)
T +h j T
,
k = 1; 2
and compare the two forecasting models in terms of the Kullback-Leibler distance
across k
1 realizations and corresponding forecasts
m+k
1 Xn
LR =
k + 1 T =m
where m
(1)
T +h j T
(2)
T +h j T
o
1 is the “training sample size” with fyt ; xt : t = 1; :::; mg being used
to obtain the parameter estimates. If LR > 0 (< 0) we prefer the …rst (second)
model. Amisano and Giacomini (2007) show that LR follows a normal distribution
as k ! 1.
4.6
Finite-sample simulations
In this section we present a simulation study with the aim of evaluating the performance of MLE for PARX models. We consider the results of simulations from
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES65
PARX models with di¤erent covariate processes, mainly distinguishing between longmemory and short-memory processes. The objective is indeed to show not only the
satisfactory performance of the estimation algorithm, but also the ‡exibility of PARX
in terms of choice of the covariates.
4.6.1
Simulation design
This experiment2 is focused on the …nite-sample behaviour of MLE for PARX models.
We evaluate the parameter estimates for di¤erent sample sizes, in order to verify not
only the accuracy but also the convergence to the asymptotic Gaussian distribution.
In particular, our study is organized as follows. We simulate and …t the PARX(1,1)
model
t
=!+
yt j F t
1
1 yt 1
+
P ois( t )
1 t 1
+ exp(xt 1 )
Though here our Monte Carlo experiment is shown for a PARX(1,1) model only,
the results are very similar if more lags of the response and intensity are included.
We choose the exponential function as the positive function f for including the generated exogenous covariate in the model (see Equation 4.3). This allows to evaluate
the parameter estimates when the Lipschitz condition on f is relaxed, allowing for
unbounded transformation to be employed (see assumption A’). The exponential
transformation will also be used in our empirical study.
We examine di¤erent cases, based on alternative choices of the function g(x; "; )
in
Xt = g(xt 1 ; "t ; )
The cases included in our simulation design are the following:
Case 1: stationary AR(1) covariate
xt = 'xt
1
+ "t
= 0:50
2
We use Matlab for writing the data generation and estimation code.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES66
Case 2: MA(1) covariate
xt = x t
1
+ "t
= 0:50
Case 3: ARFIMA (0,0.25,0) covariate
d
j xt
= xt 1 + "t
d = 0:25 where, using the backward shift operator L, dj = (1 L)d =
Xj
(k d)Lk
, with ( ) denoting the gamma function and j denotk=0 ( d) (k + 1)
ing the truncation order of the theoretical in…nite sum d = (1 L)d =
X1
(k d)Lk
.
k=0 ( d) (k + 1)
2
In each case the innovation process f"t g is chosen to be i.i.d. normal with variance
such that the variance of the covariate model is 1 and thus facilitating compar-
isons. In all cases the initial values are set to x0 = 0. Note that the choice of a
fractional di¤erencing order d = 0:25 for the fractional white noise satis…es the stationarity condition for autoregressive fractionally integrated processes jdj < 0:50, so
that Assumption 2 on the Lipschitz condition is not violated.
For each case we consider four alternative scenarios for the data-generating parameter values, changing the value of the sum of the persistence parameters
1
+
1:
Scenario 1: null coe¢ cient of intensity:
! = 0:10;
1
= 0:30;
1
= 0:00;
= 0:50
Scenario 2 - “low”persistence:
! = 0:10;
1
= 0:30;
1
= 0:20;
= 0:50
Scenario 3 - “high”persistence with the coe¢ cient of the response larger than
the coe¢ cient of intensity:
! = 0:10;
1
= 0:70;
1
= 0:25;
= 0:50
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES67
Scenario 4 - “high”persistence with the coe¢ cient of intensity larger than the
coe¢ cient of the response:
! = 0:1;
1
= 0:25;
1
= 0:70;
= 0:50
The …rst scenario is comparable to an ARCH model as only the lagged response
is included. Note that none of the presented scenarios violates the condition of
Pmax(p;q)
stationarity
( i + i ) < 1 (Assumption 3) that we have imposed when
i=1
developing the asymptotic theory.
For all scenarios we simulate for sample sizes T 2 f100; 250; 500; 1000g with 1000
replications. We also include small sample sizes for providing insights into the quality
of the estimates for short length count time series which are commonly modeled in
many empirical applications.
4.6.2
Results
As discussed above, our study of the MLE performance in …nite samples concerns
both the accuracy and the speed of convergence to normality. In Tables 4.1 to 4.6,
the mean of the parameter estimates (obtained averaging out the results from all
the replications) is reported in the fourth column, while the …fth shows the root
mean square error (RMSE) of the estimates. The sixth and the seventh column report the skewness and the kurtosis of the estimates distribution. We also perform a
Kolmogorov-Smirnov test on the estimates for testing against the standard normal
distribution and report the corresponding p-value in the last column. In what follows, we comment the results obtained for the cases with AR/MA (short-memory)
covariates and long-memory covariates separately.
Results for the short-memory covariates
In Tables 4.1 to 4.4, we show the results for the case where short-memory processes
are included in the intensity speci…cation. We consider a stationary AR(1) and a
stationary MA(1), thus two short-memory processes characterized by a di¤erent rate
of decrease of the autocorrelation function. The results are very similar. In both
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES68
cases the estimate precision is fully satisfactory for a sample size of 500. We can also
note a relevant improvement moving from T = 100 to T = 250. The best results are
obtained in the …rst and second (low persistence) scenarios (see Tables 4.1 to 4.4).
The “worst” scenario appears to be the third, i.e. when the value of persistence
is close to one and the coe¢ cient of the response
of intensity
1.
1
is higher than the coe¢ cient
Moreover, even in this case, the approximation improves quicky
as the sample size increases. The less accurate estimate is that of the constant (!)
parameter. Convergence to normality is evident in both cases and for all the scenarios
considered, as normality is never rejected at a 5% signi…cance level when the sample
size is at least 500.
Results for the long-memory covariates
Case 3 considers the inclusion of a fractionally integrated process (Tables 4.5 to
4.6). ARFIMA processes are weakly stationary if the condition jdj < 0:50 (as in our
experiment) is satis…ed, but have slowly-decaying autocorrelations compared to the
exponential rate of decay typical of ARMA models. Considering this case separately
is then convenient. The results do not show substancial di¤erences with respect to
the previously examined case of AR/MA covariates. Again, the approximation is
satisfactory, except for the constant parameter in Scenario 3, which substantially
improves for a sample size of 1000, though. Convergence to normality is con…rmed,
as the only rejection for sample sizes larger than 250 concerns the constant parameter
in Scenarios 3 and 4 (see Tables 4.5 to 4.6).
0.02
0.51
0.00
0.50
T=1000
T=500
T=250
0.28
0.30
1
0.30
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
1
0.10
0.10
!
1
0.10
0.10
!
1
0.09
0.10
!
1
0.09
0.10
!
T=100
Mean
True
Parameter
Sample size
Scenario 2: "low" persistence.
0.02
0.03
0.03
0.03
0.02
0.05
0.04
0.05
0.04
0.08
0.07
0.07
0.07
0.15
0.13
0.16
RMSE
-0.01
0.11
-0.09
0.21
0.06
-0.13
0.15
0.15
0.18
-0.02
0.05
-0.11
0.06
0.15
0.10
0.23
Skewness
Scenario 1
3.23
3.04
3.05
3.18
3.02
3.26
3.64
2.94
3.01
3.13
3.27
3.28
3.31
3.85
3.85
3.65
Kurtosis
0.74
0.98
0.52
0.38
0.34
0.17
0.33
0.66
0.49
0.93
0.87
0.85
0.85
0.31
0.32
0.36
KS p-value
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
True
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.21
0.29
0.10
0.51
0.22
0.27
0.10
Mean
0.02
0.03
0.03
0.04
0.02
0.05
0.04
0.05
0.04
0.08
0.07
0.08
0.07
0.14
0.11
0.18
RMSE
0.13
-0.03
0.02
0.25
0.00
0.13
-0.17
0.27
0.01
-0.08
0.04
0.33
0.35
0.07
-0.05
0.35
Skewness
Scenario 2
2.81
2.95
2.92
3.08
3.06
3.16
3.17
3.07
2.97
2.93
2.87
3.64
3.26
4.14
3.38
3.82
Kurtosis
0.32
0.71
0.61
0.42
0.75
0.16
0.87
0.35
0.92
0.63
0.99
0.19
0.32
0.34
0.97
0.01
KS p-value
Table 4.1: Results of simulations for PARX(1,1) with stationary AR(1) covariate. Scenario 1: null coe¢ cient of intensity.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES69
0.27
0.52
0.25
0.50
T=1000
T=500
T=250
0.67
0.70
1
0.69
0.26
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
1
0.13
0.10
!
1
0.15
0.10
!
1
0.22
0.10
!
1
0.20
0.10
!
T=100
Mean
True
Parameter
Sample size
0.03
0.03
0.03
0.17
0.03
0.05
0.05
0.23
0.07
0.06
0.06
0.36
0.13
0.12
0.12
0.48
RMSE
-0.02
0.05
-0.01
0.46
0.08
-0.07
0.12
0.57
0.19
-0.15
0.16
0.61
0.09
0.48
-0.46
0.41
Skewness
Scenario 3
2.78
3.02
3.01
3.59
2.92
3.16
3.12
3.53
3.97
3.07
3.14
4.01
3.22
5.41
5.21
4.20
Kurtosis
0.99
0.73
0.71
0.05
0.94
0.99
0.99
0.00
0.75
0.45
0.26
0.00
0.86
0.18
0.12
0.01
KS p-value
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
True
0.50
0.71
0.24
0.10
0.50
0.71
0.24
0.11
0.50
0.72
0.23
0.13
0.51
0.77
0.18
0.15
Mean
0.02
0.02
0.02
0.10
0.02
0.04
0.04
0.13
0.05
0.06
0.06
0.21
0.11
0.15
0.15
0.30
RMSE
0.07
-0.02
0.03
0.30
0.05
0.01
-0.04
0.30
0.05
0.20
-0.20
0.36
0.04
0.89
-0.92
0.56
Skewness
Scenario 4
high coe¢ cient of the response. Scenario 4: "high" persistence due to high coe¢ cient of intensity.
2.91
2.89
2.88
3.20
2.87
2.85
2.79
2.88
3.20
3.92
3.91
3.30
2.94
3.46
3.50
3.94
Kurtosis
0.99
0.81
0.79
0.24
0.95
0.96
0.86
0.21
0.81
0.64
0.72
0.13
0.84
0.00
0.00
0.07
KS p-value
Table 4.2: Results of simulations for PARX(1,1) with stationary AR(1) covariate. Scenario 3: "high" persistence due to
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES70
0.00
0.52
0.00
0.50
T=1000
T=500
T=250
0.28
0.30
1
0.29
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
1
0.10
0.10
!
1
0.11
0.10
!
1
0.10
0.10
!
1
0.10
0.10
!
T=100
Mean
True
Parameter
Sample size
2: "low" persistence.
0.03
0.05
0.03
0.05
0.05
0.07
0.05
0.08
0.06
0.10
0.07
0.10
0.11
0.20
0.12
0.19
RMSE
0.17
0.15
-0.04
0.22
0.01
0.07
0.06
0.15
0.11
-0.04
-0.07
0.28
0.26
0.07
0.15
0.37
Skewness
Scenario 1
3.27
2.94
2.94
3.33
2.92
3.01
3.63
2.95
2.85
3.19
3.03
3.21
3.24
3.80
3.24
5.74
Kurtosis
0.93
0.73
0.99
0.29
0.72
0.71
0.34
0.83
0.86
0.55
0.81
0.03
0.70
0.16
0.51
0.01
KS p-value
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
True
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.11
0.51
0.21
0.29
0.10
0.51
0.20
0.27
0.13
Mean
0.04
0.05
0.03
0.07
0.05
0.08
0.05
0.09
0.07
0.10
0.07
0.12
0.16
0.24
0.14
0.19
RMSE
-0.05
0.08
-0.12
0.35
0.04
-0.05
0.05
0.50
0.12
-0.14
-0.06
0.49
0.25
0.21
-0.29
0.93
Skewness
Scenario 2
3.00
3.32
3.43
3.31
3.11
3.30
2.81
3.94
2.77
3.41
3.17
3.48
2.89
4.21
3.94
4.78
Kurtosis
0.88
0.57
0.40
0.40
0.40
0.54
0.96
0.05
0.69
0.72
1.00
0.16
0.29
0.03
0.37
0.00
KS p-value
Table 4.3: Results of simulations for PARX(1,1) with MA(1) covariate. Scenario 1: null coe¢ cient of intensity. Scenario
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES71
0.26
0.51
0.25
0.50
T=1000
T=500
T=250
0.67
0.70
1
0.69
0.25
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
1
0.15
0.10
!
1
0.18
0.10
!
1
0.24
0.10
!
1
0.30
0.10
!
T=100
Mean
True
Parameter
Sample size
0.09
0.03
0.03
0.17
0.10
0.05
0.04
0.25
0.15
0.07
0.07
0.35
0.27
0.12
0.11
0.60
RMSE
0.13
-0.07
0.01
0.33
-0.01
-0.02
0.05
0.65
0.11
-0.09
0.08
0.82
0.25
-0.08
-0.03
0.35
Skewness
Scenario 3
3.19
2.98
2.99
3.40
3.06
3.32
3.26
4.08
3.33
3.26
3.19
5.00
3.19
3.38
3.07
4.93
Kurtosis
0.91
0.56
0.84
0.36
1.00
0.97
0.97
0.02
0.98
0.96
1.00
0.01
0.68
0.92
0.25
0.00
KS p-value
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
True
0.50
0.70
0.25
0.13
0.51
0.70
0.24
0.17
0.51
0.71
0.23
0.17
0.52
0.77
0.17
0.16
Mean
0.05
0.03
0.02
0.14
0.08
0.04
0.04
0.26
0.13
0.06
0.06
0.33
0.19
0.16
0.16
0.34
RMSE
-0.04
-0.12
0.08
0.39
0.01
-0.22
0.15
0.73
0.25
0.15
-0.22
0.48
0.04
0.74
-0.80
0.55
Skewness
Scenario 4
coe¢ cient of the response. Scenario 4: "high" persistence due to high coe¢ cient of intensity.
3.08
2.85
2.79
2.99
3.24
3.31
3.09
4.35
2.86
2.97
2.82
3.54
3.29
3.48
3.62
3.96
Kurtosis
0.98
0.90
0.95
0.03
0.99
0.49
0.79
0.10
0.16
0.35
0.38
0.02
0.71
0.00
0.00
0.02
KS p-value
Table 4.4: Results of simulations for PARX(1,1) with MA(1) covariate. Scenario 3: "high" persistence due to high
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES72
-0.01
0.51
0.00
0.50
T=1000
T=500
T=250
0.29
0.30
1
0.30
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
0.30
0.00
0.50
0.30
0.00
0.50
1
1
0.10
0.10
!
1
0.10
0.10
!
1
0.10
0.10
!
1
0.12
0.10
!
T=100
Mean
True
Parameter
Sample size
Scenario 2: "low" persistence.
0.03
0.05
0.03
0.05
0.04
0.07
0.05
0.07
0.06
0.10
0.07
0.09
0.13
0.23
0.13
0.20
RMSE
0.09
0.03
-0.05
0.09
-0.01
0.03
-0.05
0.30
0.26
-0.24
0.17
0.39
0.17
0.00
-0.05
0.74
Skewness
Scenario 1
3.11
3.13
3.04
2.93
2.98
3.13
2.94
3.44
3.06
3.66
3.48
3.71
3.21
4.42
3.80
5.30
Kurtosis
0.74
0.82
0.81
0.73
0.59
1.00
0.95
0.22
0.39
0.33
0.70
0.14
0.50
0.16
0.47
0.00
KS p-value
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
True
0.50
0.20
0.30
0.10
0.50
0.20
0.30
0.10
0.50
0.20
0.29
0.12
0.51
0.21
0.27
0.11
Mean
0.03
0.05
0.03
0.05
0.05
0.07
0.05
0.07
0.07
0.10
0.07
0.12
0.12
0.19
0.13
0.18
RMSE
0.08
0.07
-0.15
0.40
0.02
0.12
-0.10
0.30
0.25
-0.15
0.01
0.47
0.17
0.02
-0.18
0.81
Skewness
Scenario 2
2.79
3.03
2.83
3.68
2.79
3.02
2.96
3.13
2.95
3.15
3.02
3.49
3.11
4.05
3.55
4.21
Kurtosis
0.43
0.80
0.56
0.14
0.97
0.90
0.96
0.54
0.85
0.81
0.57
0.08
0.32
0.31
0.43
0.00
KS p-value
Table 4.5: Results of simulations for PARX(1,1) with ARFIMA(0,0.25,0) covariate. Scenario 1: null coe¢ cient of intensity.
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES73
0.27
0.52
0.25
0.50
T=1000
T=500
T=250
0.66
0.70
1
0.69
0.25
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
0.70
0.25
0.50
0.70
0.25
0.50
1
1
0.15
0.10
!
1
0.19
0.10
!
1
0.29
0.10
!
1
0.29
0.10
!
T=100
Mean
True
Parameter
Sample size
0.07
0.03
0.03
0.19
0.09
0.05
0.04
0.27
0.14
0.07
0.07
0.44
0.15
0.12
0.12
0.57
RMSE
0.17
-0.04
0.10
0.40
0.06
0.03
-0.09
0.67
0.11
0.19
-0.22
0.76
0.15
0.14
-0.20
0.73
Skewness
Scenario 3
2.92
3.18
3.28
3.23
2.76
2.83
2.88
3.64
3.20
3.45
3.37
3.95
2.79
3.33
3.44
5.33
Kurtosis
0.36
0.31
0.19
0.11
0.85
0.76
0.34
0.00
0.45
0.29
0.59
0.00
0.69
0.61
0.74
0.01
KS p-value
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
0.50
0.70
0.25
0.10
True
0.51
0.70
0.24
0.12
0.51
0.71
0.24
0.13
0.51
0.71
0.23
0.18
0.51
0.78
0.17
0.16
Mean
0.05
0.03
0.02
0.11
0.07
0.04
0.04
0.14
0.14
0.06
0.05
0.25
0.14
0.16
0.16
0.30
RMSE
0.05
-0.06
0.03
0.37
0.15
-0.09
0.06
0.71
0.33
0.09
-0.25
1.15
-0.02
0.80
-0.88
0.63
Skewness
Scenario 4
to high coe¢ cient of the response. Scenario 4:"high" persistence due to high coe¢ cient of intensity.
2.94
3.12
2.97
3.28
3.02
2.81
2.83
3.81
3.47
3.98
4.35
5.93
3.14
3.30
3.34
3.96
Kurtosis
0.77
0.97
0.95
0.02
0.46
0.29
0.47
0.00
0.30
0.84
0.58
0.00
0.81
0.00
0.00
0.02
KS p-value
Table 4.6: Results of simulations for PARX(1,1) with ARFIMA(0,0.25,0) covariate. Scenario 3: "high" persistence due
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES74
CHAPTER 4. A NEW POISSON AUTOREGRESSIVE MODEL WITH COVARIATES75
4.7
Concluding remarks
In this chapter we have de…ned and studied the properties of Poisson Autoregressions
with Exogenous Covariates (PARX). Speci…cally, we have developed both the asymptotic and estimation theory, in addition to establishing the conditions for stationarity
and ergodicity of the de…ned process. We have also considered how forecasting can
be carried out and evaluated in our framework. In the last section we have conducted a simulation study of di¤erent PARX models, i.e. including di¤erent covariates.
The results show a good performance of MLE and very little di¤erences among the
alternative PARX models considered. In the empirical analysis discussed in the next
chapter, we will show that the PARX model is extremely useful for investigating the
corporate defaults phenomenon.
Chapter 5
Empirical study of Corporate
Default Counts
So far we have presented default risk and the main measures and models for analyzing
it (see Chapter 1 and 2). We have presented and discussed the literature of default
correlation as well as several studies investigating the phenomenon - which is central
in risk management - of the default peaks predictability. We have reviewed regression
models including variables which may explain the incidence of corporate defaults
phenomenon, in terms of either default rates or counts. We have progressively focused
on models for default counts, encouraged by the fact that the same clusters shown
in the default rates time series are also evident in the time series of bankruptcy
counts. Furthermore, as previously said, the main point in default rate prediction is
forecasting the number of defaulting issuers by a certain time horizon. The predicted
default intensity - the expected number of defaults - can be an easy and immediate
instrument in bank risk management communications. The count models typically
used for rare events like the Poisson, presented in Chapter 3 together with other
count time series models, seem to be suitable. Our idea of using Poisson models with
both autoregressive components and exogenous regressors for capturing the default
clustering has led to the de…nition of a new model called Poisson Autoregression with
Exogenous Covariates (PARX). How Poisson Autoregressions and PARX models
perform when handling actual corporate default data and how the results of their
76
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 77
application should be intepreted are the research questions we address in this chapter.
5.1
Overview of the approach
We investigate the corporate default dynamics through a count time series approach
including autoregressive components and exogenous variables, sharing some similarities with the generalized autoregressive models for conditional volatility. Our analysis
of corporate defaults dynamics is made under an aggregate perspective, which does
not take into account …rm-speci…c conditions determining the individual probability
of default of a company. This study tries indeed to measure an overall default risk
concerning debt issuers of considerable relevance in terms of dimension, because we
consider defaults among rated, thus in most cases listed, …rms. The default intensity
of high dimension …rms is expected to be linked to common risk factors arising from
the …nancial and macroeconomic context, as well as possible contagion e¤ects. We
claim that this approach can give a useful measure of the general tendency in the
corporate default dynamics, providing a measure of “systematic”default risk which
can support the traditional analysis of individual …rm solvency conditions.
5.2
Corporate default counts data
The time series of corporate default counts we analyze here refers to the monthly
number of bankruptcies among Moody’s rated United States …rms in the period going from January 1982 to December 2011. The default count dataset is one of the risk
monitoring instruments provided by Moody’s Credit Risk Calculator (CRC), which
allows to download historical default rates and counts in the form of customized
reports, with many options in terms of time interval length and economic sectors.
We choose to focus our study on the industrial sector: this means to include all the
…rms covering non…nancial activities and exclude banking, …nancial and insurance
companies. This choice is quite common in the study of corporate default counts
(see, for instance, Das et al., 2007, Lando and Nielsen, 2010 and Lando et al. 2013)
and motivated by the convenience of considering the real and …nancial economy
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 78
default events separately, at least in the …rst place. Other categories typically excluded are the public utilities and transportation activities, because of their peculiar
management structure, often linked to the public sector.
More generally, the choice of using US data is motivated by the good quality and
organization of the default data material, at least from the 1980s. The Bankruptcy
Reform Act of 1978, amending the Bankruptcy Act of 1898, is the …rst complete
expression of the US default law, trying to give protection to the creditors as well as
the chance to the borrowers to reorganize their activity. With this act, the default
legislation becomes uniform in all the federal states. The Bankruptcy Reform Act of
1978 continues to serve as the federal law that governs the bankruptcy cases today,
and again a strong emphasis is given to business reorganization (see Skeel, 2001 for
a history of the US bankruptcy law). However, in the US as in many European
countries, during the period from World War II through the 1970s, bankruptcy was
a nearly exceptional event. With the exception of Northeastern railroads, there were
not many notable business failures in the U.S. in that time. During the 1970s, there
were only two corporate bankruptcies of prominence: Penn Central Transportation
Corporation in 1970 and W.T. Grant Company in 1975. It is interesting that the
failure of Penn Central and Northeastern railroads is often cited as the …rst documented case of contagion, as the major case of the railroads default was the missed
payment of obligations by Penn Central. Both Das et al. (2007) and Lando and
Nielsen (2010) cite the Penn Central case in their empirical analyses. The small
number of defaults before the 1980s explains our choice of using January 1982 as the
starting period of our empirical analysis.
Some …rst considerations about the time series of corporate default counts in US
in the last thirty years can be made by inspecting a simple plot of our data, shown
in Figure 5.1.
The …rst evidence from Figure 5.1 is that the data show the peaks typically found
in corporate default counts time series and also referred to as “default clusters”. The
long memory of the series is evident from the slowly decaying autocorrelation function
(see Figure 5.2).
Looking more in detail at the peak periods and trying to connect them with
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 79
Figure 5.1: Monthly default counts of US Moody’s rated industrial …rms from January 1982 to December 2011.
Figure 5.2: Autocorrelation function of the monthly default counts.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 80
the …nancial crises, during the 1980s and early 1990s many bankruptcies took place.
Many well-known companies …led for bankruptcy, mainly encouraged by reorganization opportunities. These Include LTV, Eastern Airlines, Texaco, Continental
Airlines, Allied Stores, Federated Department Stores, Greyhound, Maxwell Communication and Olympia & York. Indeed, also the …nancial sector lived years of trouble
between the 1980s and the 1990s, like the well-known “savings and loan”crisis. The
…nancial crisis did not involve the banking sector only, as the 1987 market crash
showed. The second peak in our series appears in the 1999-2002 period and, again,
this is not surprising: in the years 2000-2001 a strong …nancial crisis took place,
starting from the so-called “Dot-com” (or “Tech”) bubble, causing the recession of
2001 and 2002. After a period of stability from 2003 to 2007, a new peak characterizes the …nal part of our sample, from 2008 to 2010, starting from the …nancial
sector with the subprime crisis of 2007 and spreading to the real, as a global and
systemic crisis, in the following years.
It is interesting to compare the default count time series to macroeconomic indicators such as the monthly Leading Index published by the Federal Reserve. The
Leading Index includes the Coincident Index and a set of variables that “lead” the
economy: the state-level housing permits, the state initial unemployment insurance
claims, the delivery times from the Institute for Supply Management (ISM) manufacturing survey, the interest rate spread between the 10-year Treasury bond and
the 3-month Treasury bill.
Looking at Figure 5.3, the low level in the late 1980s and earlier 1990s as well as
in 2000-2002 con…rms the previous analysis, and again the last crisis turns out to be
the most dramatic period. Another relevant index, explicitly signalling the phases
of the business cycle, is the recession indicator released by the National Bureau of
Economic Research (NBER): the NBER recession indicator is a time series which
consists in dummy variables that distinguish the periods of expansion and recession,
where a value of 1 indicates a recessionary period, while a value of 0 signals an
expansionary one. The shaded areas created by the recession dates in Figure 5.4
con…rm the previous identi…cation of three turbolence periods (1982-1991, 20002002, 2008-2010). In our analysis we shall also consider the connection between the
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 81
Figure 5.3: Monthly Leading Index from January 1982 to December 2011.
business cycle and the number of corporate defaults.
Based on the previous considerations, in Table 5.1 we show some descriptive
statistics of the data in di¤erent subsamples of our dataset, which includes a total
of 360 observations. In particular, we distinguish the three clusters of the late 1980s
and early 1990s, the …rst 2000s and 2007-2010 respectively. In addition to the mean,
the standard deviation and the median we also report the variance, underlying that
all the considered subsamples present data overdispersion.
It is interesting to note that the e¤ects on defaults of the crisis spread in 2000
Table 5.1: Descriptive statistics of the default count data.
Sample
Mean Std. Dev. Variance Median
…rst cluster: 1986-1991
3.54
3.54
7.50
3
second cluster: 2000-2003
7.69
3.79
14.83
7
third cluster: 2007-2010
5.96
6.65
44.17
4
whole dataset
3.51
3.95
15.57
2
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 82
Figure 5.4: Monthly NBER recession indicator from January 1982 to December 2011.
are the most severe in terms of average number of defaults. In the last …nancial
and economic crisis period the most relevant aspect is instead the variance, as the
number of defaults explodes and decreases quickly, while the previous clusters are
more lasting in time.
5.3
Choice of the covariates
Our empirical study concerns the time series analysis and modelling of the number
of corporate defaults and also aims at measuring the impact of the macroeconomic
and …nancial context on the defaults phenomenon. This needs some re‡ections about
the variables to be considered, that are expected to be common factors for corporate
solvency conditions and thus to be predictive of the default clusters. This section
complements the previous - describing the default counts dataset which will be our
response time series - by presenting the other data included in our study and motivating our choices. The covariates presented in the following can be divided into two
groups:
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 83
…nancial and credit market variables
production and macroeconomic indicators
All the variables are included using monthly frequency data.
5.3.1
Financial market variables
The performance of the …nancial market in‡uences both …rms returns on …nancial
investments, thus their pro…tability, and their funding capability, two aspects which
strongly a¤ect the liquidity and solvency conditions. Not only the stock market, but
also the monetary market, which includes short-term …nancial instruments such as
Treasury Bills, deposits and short-lived mortgages, is part of the …nancial market and
a relevant part of the credit market, where the companies raise funds. With respect
to funding, important variables are those expressing its cost, thus the interest rates
and the relations between di¤erent interest rates, i.e. the credit spreads, which are
widely used for deriving the implied di¤erences in risk. The market is not the only
evaluator of the corporate debt issuers, which are subject to the risk to become
insolvent, but also to that of being downgraded by the rating agencies. Based on the
above considerations, the …nancial and credit market variables we consider here are a
measure of realized volatility of returns, the spread between the Moody’s Baa rated
corporate bonds yield and the 10-year Treasury rate and the number of Moody’s
downgrades.
Realized Volatility of returns
Our choice of using a measure of volatility of the stock returns rather than the returns themselves is motivated by the features of the corporate defaults time series,
whose dynamics are mostly driven by variance. Indeed, as expected for rare events,
the mean number of defaults is low and the level often comes back to zero. It is interesting to investigate the link between the …nancial market and the corporate defaults
dynamics, which is expected to be strong in the crisis periods. Realized volatility
deserves a special insight for several reasons. First, as for each of the covariates we
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 84
include in PARX models, it is important to analyze its time series properties and
verify whether the assumptions on its dynamics (see in particular Assumption 2 in
Chapter 4) are satis…ed. Furthermore, estimating a model for the covariate processes
allows multi-step ahead forecasting (see Section 4.5). Recalling Section 4.1, the traditional realized volatility measures rely on the theory of a series of seminal papers by
Andersen, Bollerslev, Diebold and Labys (2001), Andersen, Bollerslev, Diebold and
Ebens (2001), and Barndor¤-Nielsen and Shephard (2002), showing that the daily
integrated variance, i.e. the integral of the instantaneous variance over the one-day
interval, can be approximated to an arbitrary precision using the sum of intraday
squared returns. Furthermore, other works such as Andersen, Bollerslev, Diebold,
and Labys (2003) show that direct time series modelling of realized volatility strongly
outperforms both the GARCH and stochastic volatility models.
Our approach refers to this theory, even though is not really high-frequency: we
construct a proxy of monthly realized volatility by using the daily returns. Monthly
volatility proxies of this kind can be found, for example, in French, Schwert and
Stambaugh (1987) and Schwert (1989). According to this approach we de…ne the
following measure for the S&P 500 monthy realized volatility:
RVt =
nt
X
2
ri;t
(5.1)
i=1
where ri;t is the i-th daily return on the S&P 500 index in month t and nt is the
number of trading days in month t.
The high values of skewness (9:02) and kurtosis (100:26) of our proxy of realized
variance indicate that it is far from being normally distributed. Nonnormality is
pointed out in empirical works based on realized volatility measures from high frequency data, such as Martens et al. (2009). Realized volatility time series usually
show high variance and peaks, recalling the sharp spikes of in…nite variance processes
that have often been used for modelling the stock market prices (see, for example,
Fama, 1965). The logarithmic transformation of our monthly realized volatility (see
Figure 5.5 (a)) is more suitable for standard time series modelling, because the variance is lower and there are no outlier observations. The high and slowly decaying
autocorrelation (see Figure 5.5 (b)) suggests the use of long memory processes such
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 85
Figure 5.5: (a) Logarithm of S&P 500 monthly realized volatility. (b) Autocorrelation function of logarithmic realized volatility.
as ARFIMA. The long memory of realized volatility is a crucial point in some recent
works on this topic - such as Andersen, Bollerslev and Diebold (2007) and Corsi
(2009) - and put in doubt that the needed stationarity condition is satis…ed. However, the same works claim that the long memory is “apparent”in the sense that the
persistence in realized volatility series can be e¤ectively captured by a special class of
autoregressive models, which include di¤erent autoregressive parts corresponding to
volatility components realized over di¤erent time horizons. These models are called
Heterogeneous Autoregressive model of Realized Volatility (HAR-RV).
Corsi (2009) de…nes a HAR model for daily realized volatility calculated from
intraday data by considering three volatility components corresponding to time horizons of one day (1d), one week (1w) and one month (1m). These “heterogeneous”
lags can be interpreted as taking into account …nancial returns variability with respect to di¤erent investment time horizons. The speci…cation proposed by the author
for the daily realized volatility is the following:
(d)
(d)
where RVt
(d)
(w)
(m)
RVt = c + (d) RVt 1d + (w) RVt 1d + (m) RVt 1d + "t
(5.2)
qP
nt
2
=
i=0 ri;t and nt number of available intraday squared returns while
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 86
(w)
1
RVt
(m)
and
(m)
1
RVt
denote the weekly and monthly realized volatility respectively,
computed as:
(w)
= 51 (RVt
(m)
=
RVt
RVt
(d)
(d)
1d
+ RVt
(m)
1
(RVt
22
(d)
4d )
+ ::: + RVt
(m)
1d
+ RVt
(m)
21d )
+ ::: + RVt
where the multiperiod volatilities are calculated as the simple averages of the
daily ones during the period.
This model is shown to be able to reproduce the long memory of the empirical
volatility. The model performance in terms of both in-sample and out-of-sample forecasting is comparable to that of fractionally integrated models and can be estimated
more easily, since OLS can be employed.
Adapting this approach to our monthly realized volatility could be useful for
carrying out multi-step ahead forecasting in a PARX model including this variable.
A possible choice of the “heterogeneous”lags suitable for our monthly measure would
be including the …rst lag of logarithmic realized volatility and the last half-year
logarithmic realized volatility. The latter is computed as the simple average of the
last six monthly logarithmic realized volatility. This yields the following model:
log RVt = c +
(1m)
log RVt
1
+
(6m)
(6m)
1
log RVt
+ "t
(5.3)
where RVt is de…ned in (5.1), while for the longer period component we have:
1
(6m)
log RVt
= (log RVt + log RVt 1 + ::: + log RVt 5 )
6
Following the notation of Corsi (2009), this speci…cation corresponds to a HAR(2)
model, because two volatility components are entered.
As an example, estimation of (5.3) for the logarithm of monthly realized volatility
in the period from 1982 to 2011 yields the following model:
log RVt =
1:1030 + 0:5543 log RVt
(0:2711)
(0:0580)
1
(6m)
1
+ 0:2733 log RVt
(0:0527)
which is a stationary autoregressive process.
Baa/10-year Treasury spread
The default risk premium, i.e. the risk premium the investors require for accepting
the risk of corporate default, is often calculated as the di¤erence between the yields
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 87
Figure 5.6: Monthly spread between Baa Moody’s seasoned corporate bonds and
10-year Treasury yield.
on corporate bonds and the yields on government securities - mainly the Treasury
bills - which are expected to be risk free. The spreads on Treasury rates can be
considered as an implied default risk, which we expect to be positively correlated to
default intensity. One of the most used is the Baa/10-year Treasury spread, i.e. the
di¤erence between the Moody’s seasoned Baa corporate bond yield and the constant
maturity 10-year Treasury rate. Our source for both rates is the FRED website1 ,
provided by the Federal Reserve Bank of St. Louis. Being a measure of the market
perception of credit risk, the Baa/10-year spread is usually higher during recession
periods, when the investors are worried of default risk even for upper-medium quality
…rms like the Baa rated. This is evident from Figure 5.6: look, for example, at the
high peak in the last crisis period.
1
http://research.stlouisfed.org/.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 88
Number of downgrades
The monthly counts of defaults are not the only data we get from Moody’s CRC,
which also provides the monthly rating transition matrices, where each entry is the
number of …rms moving from a rating class to another (see 2.1.1 for a comprehensive
analysis of rating and its modelling). As discussed before, the main role of rating is
to give an objective evaluation of corporate solvency. Therefore, the number of …rms
which are downgraded, i.e. moved to a lower rating class, it is naturally expected to
be predictive of an increased default probability. However, the capability of rating to
be a default predictor is not so fair, and, as seen, also put under discussion by several
econometric analyses, like, among the others, Blume et al. (1998) and Nickell et al.
(2000). Thus we think that is important to measure whether and how much the
number of downgrades can support the prediction of the number of defaults. At a
…rst sight (see Figure 5.7), most of the downgrade peaks correspond to the recession
periods and the default clusters, except for the …rst peak taking place in 1982, which
is due to a credit rating re…nement carried out and announced by Moody’s, which
modi…es the classes number and assignment (see Tang, 2009).
5.3.2
Production and macroeconomic indicators
Change in Industrial Production Index
The Industrial Production Index is an economic indicator that measures the real
output for all facilities located in the United States manufacturing, mining and utilities. It is compiled by the Federal Reserve System on a monthly basis in order to
bring attention to short–term changes in the industrial production. As it measures
the movements in the industrial output, it is expected to highlight the structural
developments in the economy. Its change can be considered as an indicator of the
growth in the industrial sector and is already used as a default intensity regressor in
Lando and Nielsen (2010). The monthly percentage change in Industrial Production
index (Figure 5.8) is computed as the logarithmic di¤erence of the monthly Industrial
production Index downloaded from the FRED website.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 89
Figure 5.7: Monthly number of downgrades among industrial Moody’s rated …rms.
Figure 5.8: Monthly percentage change in Industrial Production Index.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 90
Leading Index and NBER recession indicator
As our analysis of the default phenomenon is made under an aggregate perspective,
we claim that the e¤ect of business cycle on the default intensity has to be measured
through overall indicators, representing the state of the economy - such as the Leading
Index published by the Federal Reserve - or signalling the expansion and recession
periods, as captured by the NBER recession index. They have been presented in
Section 5.2. The data for both variables are downloaded by the FRED website.
For each …nancial and macroeconomic covariate described above, we perform an
Augmented Dickey-Fuller (ADF) test, rejecting the null hypothesis of presence of
unit roots in all the cases. All the variables introduced above can thus be employed
in the following analysis, since they satisfy the Lipschitz condition (see Assumption
2 in Chapter 4). For realized volatility, the ADF test has been performed on the
series in logarithms, whose properties we have previously investigated.
5.4
Poisson Autoregressive models for corporate
default counts
The …rst objective of our analysis of corporate default counts dynamics is to evaluate
whether the inclusion of exogenous variables can improve the prediction of the number of defaults. In particular, we consider alternative PARX models by including
di¤erent covariates and compare the results. Furthermore we compare the PARX
models with the Poisson Autoregression without exogenous regressors as proposed
by FRT (2009) (PAR). We mainly focus on two aspects: …rst, we evaluate which
of the chosen variables allow to explain the default intensity; second, we compute
the value of the estimated persistence. As seen before, the latter allows to measure
the persistence of shocks in the default counts process. We also aim at evaluating
whether the inclusion of di¤erent covariates has a di¤erent impact on the estimated
persistence: the magnitude of the autoregressive coe¢ cients is expected to decline
in the case one or more covariates explain most of the series long memory. This
objective is thus similar to that of several empirical studies which consider the im-
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 91
pact of covariates, such as the trading volume, in the GARCH speci…cation (see,
for instance, Lamoureux and Lastrapes, 1990 and Gallo and Pacini, 2000) and evaluate their e¤ect on the ARCH and GARCH parameter estimates. In our context,
the …nancial and macroeconomic variables explaining the default intensity can be
considered as common factors in‡uencing the solvency conditions of all companies.
As seen before, in PARX models negative covariates are handled by transforming
them through a positive function f , which can be chosen case by case, as long as the
Lipschitz condition stated in Assumption 1’of Chapter 4 is satis…ed. The speci…cation which generalizes (4.3) by including an n-dimensional vector of covariates is the
following:
t
=!+
p
X
i yt
i+
i=1
where ! > 0,
1;
2;
1;
i
q
X
i=1
+
0, f : R ! R .
i t
i+
n
X
i fi (xi;t 1 )
(5.4)
i=1
According to the choice motivated in the previous section, the covariates included
are the following:
S&P 500 realized volatility (RV ) (see Section 5.3.1 for details on its computation)
Baa Moody’s rated to 10-year Treasury bill spread (BAA_T B)
Moody’s downgrade count (DG)
NBER recession indicator (N BER)
percentage change in Industrial Production Index (IP )
Leading Index (LI)
Function f is simply the identity for covariates assuming only positive values,
while we use the absolute value for transforming the two variables which assume also
negative values, that are the percentage change in the Industrial Production Index
(IP) and the value of Leading Index (LI). Both are also expected to be negatively
correlated to default intensity. Then, for capturing the asymmetric e¤ect of positive
and negative values of these covariates, we introduce a dummy variable which is 1
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 92
when the value is lower than zero. This solution is analogous to that adopted in
the GJR-GARCH model by Glosten et al. (1993), where a dummy variable is introduced for capturing the asymmetric e¤ect of positive and negative lagged returns.
According to Engle and Ng (1993), in the volatility modelling this approach outperforms other speci…cations that overcome the problem of nonnegativity, such as the
EGARCH by Nelson (1991). As to realized volatility covariate, in the previous section we have analyzed its logarithmic transform, which is stationary according to the
ADF test performed. Furthermore, as we have seen, our logarithmic realized volatility has similar properties to the realized volatility measures analyzed in literature,
whose long memory can be e¤ectively captured by stationary HAR processes (Corsi,
2009). Variable RV can then be considered as the exponential transformation of the
logarithmic realized volatility, satisfying the model assumptions.
Preliminary model selection based on information criteria and likelihood ratio
tests leads to choose p = 2 and q = 1, i.e. two lags of the response and one lag of
intensity. Thus, the model including all the six covariates - nesting all the estimated
models presented in the next section - is speci…ed as
= !+
t
1 yt 1
+
+ 4 N BERt
5.4.1
2 yt 2
1
+
5
+
1 t 1
jIPt 1 j +
+
1 RVt 1
6 IfIPt
+
1 <0g
2 BAA_T Bt 1
jIPt 1 j +
7
+
(5.5)
3 DGt 1
jLIt 1 j +
8 IfLIt
1 <0g
jLIt 1 j
Results
Table 5.2 shows the results obtained by estimating2 nine di¤erent PARX models.
The upper portion of Table 5.2 reports the parameter estimates (standard errors
in brackets). The lower portion reports, for each model, two information criteria,
i.e. the AIC (Akaike, 1974) and the BIC (Schwarz, 1978), and the p-value of the
likelihood ratio (LR) test. The latter compares each estimated model with respect
to the one which includes all the six covariates (All in Table 5.2), thus following a
speci…c-to-general model selection approach. The second column reports the results
for the PAR model, i.e. the model with no covariates. The columns from third to
eighth in Table 5.2 report the results of estimation of models including one covariate
2
We write in Matlab the optimization code for maximum likelihood estimation.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 93
at the time. As explained above, for covariates IP and LI we also consider the
e¤ect of negative values separately, by introducing a dummy variable as in (5.5).
The …rst evidence from our results is that the autoregressive components play the
main role in the defaults dynamics. The estimated persistence is indeed not far
from one in all the models. The number of defaults in the US economy shows a
high persistence of shocks, supporting our proposal of a model able to capture long
memory. But can exogenous covariates explain the strong autocorrelation, and then
the clusters, of defaults? The …rst evidence is that several of the covariates we have
considered are found signi…cant in explaining default intensity when included one at
the time. They are the S&P 500 index realized volatility, the Baa Moody’s rated
to 10-year Treasury spread, the number of Moody’s downgrades and the NBER
recession indicator3 . First of all, we think that it is of particular interest that a
…nancial variable as realized volatility accounts for a real economic issue as defaults
of industrial …rms. The inclusion of realized volatility is indeed new in default risk
analysis. While the use of credit spreads like the Baa to 10-year Treasury Bill is quite
common in default risk prediction - especially in the reduced-form models mentioned
in Chapter 1, using a pricing approach to default risk measurement - the inclusion
of the number of downgrades among the regressors of default counts is new as well.
In fact, there are in literature several works focusing on the link between the rating
transitions and the business cycle - like, among the others, Nickell et al. (2000) and
Behar and Nagpal (2001) - but not estimating a direct relation between downgrades
and defaults at an aggregate level. The signi…cance of the NBER recession indicator
highlights a connection between the business cycle and the defaults dynamics and
con…rms the idea of a relation between economic recession and default clusters. The
e¤ect of the macroeconomic context on default intensity is also captured by including
the Industrial Production Index and the Leading Index. The asymmetric e¤ect of the
positive and negative values of variables IP and LI on default intensity is con…rmed,
3
All the mentioned covariates are found signi…cant at a level of 5% or less, except for the number
of downgrades, which is found signi…cant at the 10% level.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 94
as they are found signi…cant only when assuming negative values4 : both a decrease in
Industrial Production and a decrease in the value of Leading Index result in a higher
predicted level of default risk. According to the LR test, as well as information
criteria, all the models including one covariate at the time are preferable to the
PAR model, thus highlighting that covariates are needed to account for the default
phenomenon. Among these PARX models, according to both the information criteria
and the LR test, the best are RV and LI. Realized volatility of returns and negative
values of Leading Index are indeed the only two signi…cant covariates in the All
model (5.5), including all the covariates. The result that the number of defaults
is positively associated to the level of uncertainty shown by the …nancial market
only one month before is of particular interest and could be e¤ectively used for risk
management operational purposes. Furthermore, the signi…cance of Leading Index
shows that the macroeconomic context is relevant in default prediction. This is not
an obvious result, as the existence of a link between macroeconomic variables and
corporate default phenomenon is not always supported by similar analyses in the
econometric literature. While, for example, Keenan, Sobehart, and Hamilton (1999)
and Helwege and Kleiman (1997) forecast aggregate US corporate default rates using
various macroeconomic variables, including industrial production, interest rates and
indicators for recession, in some recent works the estimated relation between the
default rates and the business cycle is not so strong. In particular, the empirical
results of both Du¢ e et al. (2009) and Giesecke et al. (2011) show a not signi…cant
role of production growth and Lando et al. (2013) …nd that, conditional on individual
…rm risk factors, no macroeconomic covariate is signi…cant in explaining default
intensity.
Looking now at the estimated persistence (^ 1 + ^ 2 + ^ 1 ) and comparing it between
PAR and All models, we observe that the inclusion of covariates leads to a small
decrease in the level of persistence (from 0:9155 to 0:8758), which is not signi…cant.
The large value of the estimated persistence and its substantial invariance when exogenous covariates are included indicates that the autoregressive parts of the model
4
For models “IP” and “LI ”, as well as “All ”, we perform a restricted maximization of the
log-likelihood function by constraining the coe¢ cients to be positive.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 95
explain most of the slowly decaying behaviour of the autocorrelation function characterizing the default dynamics (see Figure 5.2). However, …nding signi…cant variables
in default count time series is of relevant interest in default risk evaluation and
forecasting. An increase in the level of the identi…ed risk factors can indeed be a
“warning”for risk managers and, in general, default risk evaluators.
The …nal model we obtain on the basis of our model selection procedure is labelled
LMRV & LI (-) in Table 5.2. Here we include both the S&P 500 realized volatility
and the Leading Index - when taking negative values - in the model speci…cation.
1
(0.0617)
(0.0667)
-1352.04
-1336.47
0.0000
LR test (p-value)
0.4424
-1349.36
-1368.82
0.9026
(0.0170)
0.9155
(0.0267)
BIC
(0.0675)
0.5520
(0.0643)
0.1453
(0.0451)
0.2127
(0.0944)
0.2023
IP
(0.0724)
0.4979
(0.0635)
0.1979
(0.0452)
0.1927
(0.1465)
0.2949
LI
(0.0686)
0.5177
(0.0618)
0.1878
(0.0450)
0.1850
(0.0717)
0.2324
RV & LI(
0.8788
0.0455
-1340.40
-1359.86
(0.0169)
0.8731
0.0047
-1333.42
-1352.88
(0.0213)
0.9039
0.0095
-1335.48
-1354.94
(0.0168)
0.9100
0.0982
-1337.17
-1360.52
(0.0202)
0.9931
-1351.71
-1375.06
(0.0223)
0.8885
0.9964
-1354.17
-1377.52
(0.0261)
0.8905
0.7540
0.7297
(0.1954)
0.9413
(0.2245)
-1319.14
-1365.84
(0.0241)
0.8758
(0.3189)
0.0000
(0.0821)
0.0000
0.0000
(0.1843)
0.6945
(0.2113)
(0.0644)
0.0000
(0.1647)
(0.1423)
(0.4656)
(0.1883)
0.0000
0.0000
0.4196
0.0059
(0.0092)
(0.0090)
(0.0951)
0.0171
(0.0867)
(14.368)
24.313
(0.0723)
0.5123
(0.0633)
0.1834
(0.0457)
0.1801
(0.2081)
0.2083
All
0.0000
)
0.2407
(13.659)
(0.0746)
0.4696
(0.0657)
0.2063
(0.0445)
0.2280
(0.0816)
0.2897
NBER
(15.565)
(0.0750)
0.4547
(0.0653)
0.1976
(0.0448)
0.2208
(0.0930)
0.2065
DG
28.092
(0.0797)
0.4298
(0.0660)
0.2217
(0.0448)
0.2273
(0.1621)
0.1166
BAA_TB
63.991
(0.0663)
0.1796
0.2148
(0.0755)
(0.0447)
(0.0443)
0.5263
0.1966
0.2409
0.4598
0.1690
(0.0685)
0.3015
(0.0832)
AIC
^ 1 + ^2 + ^1
^8
^7
^6
^5
^4
^3
^2
^1
^
^2
^1
!
^
RV
Table 5.2: Estimation results of di¤erent PARX models.
PAR
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 96
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 97
Figure 5.9: Observed and …tted monthly number of defaults from January 1982 to
December 2011 for PARX model including logarithmic realized volatility and Leading
Index.
5.4.2
Goodness of …t analysis
Overall, as can be seen from Figure 5.9, the model including realized volatility and
Leading Index using the prediction y^t = ^ t captures the default counts dynamics
satisfactorily.
A commonly used diagnostic check for Poisson-type count models is to test the
absence of autocorrelation in the Pearson residuals (see Section 3.2.5), which are
^
the standardized version of the raw residual yt
t ( ), taking into account that the
conditional variance of yt is not constant. In fact, the sequence of Pearson residuals
estimates the sequence
yt
et = p
t
;
t = 1; :::; T
t
which, as previously seen, is an uncorrelated process with mean zero and constant
variance under the correct model. In addition, no signi…cant serial correlation should
be found in the sequence e2t as well. As can be seen from Figure 5.10, the Pearson
residuals of our …nal estimated model do not show signi…cant autocorrelation at any
lag, thus approximate a white noise satisfactorily. In order to check the adequacy of
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 98
Figure 5.10: Autocorrelation function of Pearson residuals for PARX model including
logarithmic realized volatility and Leading Index.
our model, following Jung et al. (2006) we perform a Ljung-Box test for the Pearson
residuals and the squared Pearson residuals including 30 lags. The resulting p-values
(0:661 and 0:373 respectively) indicate that the model successfully accounts for the
dynamics of the …rst and second order moments of our default counts.
An important point concerning the PARX model goodness of …t analysis in the
speci…c case of our empirical study should be considered: when applying the PARX
model to the counts of defaults, the aim is to capture the default clusters and signal
the periods where the default intensity, and thus the default risk, is higher. Then, the
model performance is crucial when the number of observed events is relatively high.
In this respect, Table 5.3 compares the empirical (second column) and estimated
frequencies (third column) for di¤erent values of yt . Each of the estimated frequencies
is computed as the probability of observing a count falling in the range de…ned in
the …rst column, under the estimated model.
In order to test the equality between theoretical and observed frequencies, we
employ the test derived in the following and similar to the common test for equality
of Bernoulli proportions. Suppose that we want to test the equality of the empirical
and theoretical frequency of yt values belonging to a subset A of N[f0g = f0; 1; 2; :::g.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 99
Count
Empirical frequency Estimated frequency p-value
yt = 0
0 < yt
5
yt > 5
5 < yt
10
0:18
0:12
0:001
0:62
0:68
0:002
0:21
0:19
0:384
0:14
0:14
0:741
Table 5.3: Empirical and estimated frequencies of default counts.
First de…ne
Zt = I (yt 2 A)
and
t
It can be noted that E (Zt
= Pr (Zt = 1jFt 1 )
t jFt 1 )
= 0, i.e. Zt
is a martingale di¤erence
t
sequence with respect to Ft 1 . The conditional variance of each Zt
t
variable can
be derived as follows:
V (Zt
t jIt 1 )
2
t)
= E (Zt
= E Zt2 jIt
1
= E Zt2 jIt
1
jIt
1
2
t
+
2
t
De…ne now
ST =
= E Zt2 jIt
T
X
+
2
t
2 t E (Zt jIt 1 )
2
t
2
=
1
t
(Zt
2
t
=
t
(1
t)
t)
t=1
As the sequence
t
(1
t)
is a stationary and ergodic process, we have that the
mean of the conditional variances is asymptotically constant:
V
S
pT
T
=
T
1X
T t=1
t
(1
t)
p
!
2
This allows to apply the Martingale Central Limit Theorem (Brown, 1971) to ST
and state that
sT = qP
T
t=1
ST
t (1
t)
!d N (0; 1)
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 100
A one-sided or two sided test can be constructed based on N (0; 1) critical values,
replacing the unknown
t ’s
with their estimates
^ t = Pr Zt = 1j t (^)
given by the model.
The last column of Table 5.3 shows the p-value of the two-sided test constructed
as above for di¤erent A subsets.
As can be seen from Table 5.3, for values larger than 5 and for the subset (5; 10],
we accept the null hypothesis of equality between the empirical and theoretical proportion at the 5% signi…cance level. It is a good result that the model correctly
estimates the frequency of defaults when the relevance of the phenomenon becomes
considerable. Prediction is indeed not crucial in periods of stability, when defaults
are rare and isolated events. Equality is rejected when the number of defaults is null
or very low.
Some considerations have to be made about the incidence of zero counts. Default
of rated …rms is a rare event, nearly exceptional in periods of economic expansion
and …nancial stability. Thus, default count time series are characterized by a high
number of zero observations. In our default counts dataset, there are 63 zeros on
a total of 360 observations, corresponding to a proportion of 17:5%. In the PARX
models, the distribution of the number of events conditional on its past and on the
past value of a set of covariates is Poisson. The Poisson distribution does allow for
zero observations. At each time t, the probability of having a zero count is given by
exp(
t ),
intensity
i.e. the probability corresponding to value 0 in a Poisson distribution of
t.
An aspect often investigated in Poisson regression models speci…cation
analysis is whether the incidence of zero counts is greater than expected for the
Poisson distribution. In our application, the analysis of the incidence of zero counts
should take into account two main points. First, the empirical frequency of zero
counts has to be compared to that implied by the PARX model. Then, the relevance
of a possible underestimate of the number of zeros has to be evaluated with respect
to our speci…c case.
Figure 5.11 can give an idea of the relation between the observed zeros and the
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 101
Figure 5.11: Empirical zero counts (asterisks) and probability of having a zero count
under the estimated model (crosses).
probability of having a sampling zero under the model assumptions. There is a clear
correspondance between the periods characterized by a higher number of zeros and
the probability of having a sampling zero. The latter reaches values of more than
40% in the two most “zero-in‡ated” periods of 1982-1987 and 1994-1997. There is
only one part of the series, around year 1987, showing an estimated frequency of less
than 10% zeros when the empirical one is high. However, this period anticipates that
of the last eighties …nancial crisis, characterized by a rapidly increasing number of
defaults and corresponding to a decrease in the estimated zero counts probability.
A possible way of accounting for excess zeros in the Poisson models is to de…ne
mixture models such as those proposed and applied in the works of Mullahy (1986),
Johnson, Kotz, and Kemp (1992) and Lambert (1992) and known as Zero-In‡ated
Poisson models (ZIP). In ZIP models, an extra-proportion of zeros is added to that
implied by the Poisson distribution.The zeros from the Poisson distribution can be
considered as sampling zeros, occurring by chance, while the others are structural
zeros, not depending on the regressors dynamics. It is worth to note that in our application, considering an aggregate data of default incidence, the distinction between
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 102
structural and sampling zeros is not so relevant. First of all, the occurrence of the
single default is linked to the individual …rm history and to occasional - and di¢ cult
to predict - individual events. Furthermore, the zero-in‡ated periods are those where
the importance of default prediction is low.
5.5
Out-of-sample prediction
We perform a forecasting experiment for evaluating the PARX model out-of-sample
performance. We focus, in particular, on the out-of-sample prediction in the period
going from January 2008 to December 2011, corresponding to the last …nancial crisis
and showing a sharp peak in the number of defaults. In particular, we perform a
series of static one-step-ahead forecasts, updating the parameter estimates at each
observation. The PARX model we consider includes the S&P 500 realized volatility
and the negative values of the Leading Index, which is the preferable model according
to the selection presented in the previous section. We also compare the results with
those obtained with the PAR model, for evaluating whether the covariates included
improve the prediction. Table 5.4 shows the results of both point (third and sixth
column) and interval (columns fourth to …fth and sixth to seventh) estimate at each
step, from h = 1 to h = 48, corresponding to the last observation in our dataset.
Following Section 4.5, the point estimate of yT +h is de…ned as
y^T +hjT +h
1
= ^ T +hjT +h
1
while the 95% con…dence interval for the estimate of y^T +hjT +h 1 is given by
h
i
^
^
CI1 = Q =2j T +hjT +h 1 ; Q 1
=2j T +hjT +h 1
where
= 0:05. In Table 5.4, Q
=2j ^ T +h j T
and Q 1
=2j ^ T +h j T
are indic-
ated as “min” and “max” respectively. We also report, as performance measures,
the mean absolute error (MAE) and the root mean square error (RMSE). According
to both indicators, the PARX model slightly outperforms the model without covariates. A comparison between the two models is also possible from Figure 5.12,
plotting the actual number of defaults joint to the minimum (“min”) and maximum
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 103
(“max”) value of the forecast con…dence interval for the PARX (…rst panel) and the
PAR (second panel) model. Not surprisingly, in both cases the peak of March 2009,
corresponding to an outlier in the default count time series, is out of the forecasting
interval. There is indeed for both models a delay of three months in predicting the
sharpest peak of the series. However, the PARX model predicts four defaults more
than the PAR in the peak, thus considering the realized volatility - as a proxy of the
…nancial market uncertainty- and the Leading Index - summarizing the macroeconomic context - allows to reduce the underestimate of the number of defaults in this
cluster. Furthermore, the rapid increase of the default counts starting from November 2008 is captured better from the PARX model, whose predicted values increase
more quickly than the number of defaults forecasted by the PAR. The high value of
persistence, not far from one in all the estimates, and the consequent slow decrease
of the autocorrelation lead the predicted series to decrease more slowly than the
empirical series of default counts. Overall, the PARX model performs better than
the PAR in capturing the default clustering.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 104
PARX
PAR
min
max
min
max
1.094
0
4
1.081
0
3
3
1.786
0
5
1.779
0
5
3
4
2.340
0
6
2.337
0
6
4
3
2.703
0
6
2.705
0
6
5
7
2.958
0
7
2.919
0
7
6
3
3.635
0
8
3.602
0
8
7
6
3.953
1
8
3.893
1
8
8
4
4.230
1
9
4.108
1
8
9
5
4.501
1
9
4.287
1
9
10
4
4.590
1
9
4.334
1
9
11
3
4.589
1
9
4.321
1
9
12
16
6.077
2
11
4.059
1
8
13
11
8.912
4
15
6.046
2
11
14
16
11.066
5
18
8.144
3
14
15
29
13.230
7
21
10.178
4
17
16
19
17.216
10
26
15.170
8
23
17
23
19.602
11
29
18.103
10
27
18
21
20.121
12
29
18.963
11
28
19
14
20.290
12
30
19.600
11
29
20
5
18.369
10
27
17.767
10
26
21
16
14.690
8
23
13.650
7
21
22
6
12.512
6
20
11.867
6
19
23
5
11.062
5
18
10.786
5
18
24
6
8.255
3
14
7.840
3
14
25
6
6.705
2
12
6.470
2
12
26
1
5.970
2
11
6.012
2
11
27
5
4.731
1
9
4.617
1
9
28
4
3.799
1
8
3.788
1
8
29
0
3.926
1
8
4.116
1
9
30
3
3.161
0
7
3.063
0
7
31
3
2.546
0
6
2.387
0
6
32
4
2.801
0
6
2.790
0
6
33
2
3.076
0
7
3.205
0
7
34
2
3.055
0
7
3.138
0
7
35
4
2.599
0
6
2.652
0
6
36
4
2.723
0
6
2.917
0
7
37
0
3.212
0
7
3.479
0
8
38
1
2.660
0
6
2.762
0
6
39
3
1.832
0
5
1.822
0
5
40
0
1.964
0
5
2.070
0
5
41
2
1.876
0
5
1.912
0
5
42
2
1.595
0
4
1.656
0
5
43
0
1.849
0
5
1.971
0
5
44
0
1.613
0
4
1.634
0
5
45
1
1.185
0
4
1.049
0
3
46
1
1.298
0
4
1.018
0
3
47
3
1.465
0
4
1.224
0
4
48
2
1.933
0
5
1.802
0
5
h
yT +h
1
5
2
y^T +hjT +h
1
y^T +hjT +h
1
MAE
2.543
2.840
RMSE
4.119
4.613
Table 5.4: Out-of-sample estimation results of PARX and PAR model.
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 105
Figure 5.12: Actual and forecasted number of defaults with PARX (…rst panel) and
PAR (second panel) model.
5.6
Concluding remarks
In this chapter we have presented an empirical analysis of the corporate default
dynamics. Our study is based on the estimation of Poisson Autoregressive models
for the monthly count of defaults among Moody’s rated industrial …rms in the period
from January 1982 to December 2011. The objectives of our analysis is two-fold:
…rst, we want to evaluate whether there are macroeconomic and …nancial variables
which can be useful in default prediction; secondly, an important point is to consider
the relevance of the autoregressive components, whose presence is an essential part
CHAPTER 5. EMPIRICAL STUDY OF CORPORATE DEFAULT COUNTS 106
of our modelling approach. We estimate both the Poisson Autoregression without
covariates (PAR) and di¤erent PARX models, including macroeconomic and …nancial
covariates. Our results show that all the PARX models estimated are preferable to
the PAR. The more relevant covariates in explaining default intensity according to
our results are a macroeconomic variable - the Leading Index released by the Federal
Reserve - and a …nancial variable - the realized volatility of the S&P 500 returns.
At our knowledge, this is the …rst work showing a positive association between the
…nancial market uncertainty captured by the realized volatility and the number of
corporate defaults. The link between the returns realized volatility and the defaults
dynamics worths to be further investigated. Another aspect which should be further
analyzed is the high persistence in the default intensity estimated by PARX models.
The persistence of the shocks in the number of defaults could be caused by both
persistence in the common default risk factors and contagion e¤ects among …rms.
Overall, our results show that the PARX model including realized volatility and
Leading Index …ts the data satisfactorily and captures the default clustering. We have
also performed a forecasting experiment in order to evaluate the PARX model out-ofsample performance during the 2008-2011 crisis period and reached quite satisfactory
results, showing that including covariates improves the out-of-sample prediction of
the default counts.
Chapter 6
Conclusions
We have developed this thesis work in the aim of studying the modelling of default risk, proposing a new modelling framework and highlighting the main factors
in‡uencing the corporate defaults dynamics.
We have started from the analysis of the stylized facts in corporate default counts
and rates time series. The default phenomenon, as most rare events, is characterized by overdispersion - the variance of the number of events is much higher than
its mean - leading to series showing both peaks (“clusters”) and periods of low incidence. Moreover, the defaults time series are characterized by a slowly decreasing
autocorrelation function, which is a typical feature of long-memory processes. In recent years, encouraged by the increasing relevance of the default phenomenon during
the …nancial crisis started in 2008, the econometric and …nancial literature has shown
a growing interest in default risk modelling. In particular, as seen in Chapter 2, in
most works the topic of default predictability has been investigated by analyzing the
link between the default clusters and the macroeconomic context. Another relevant
aspect in default prediction is the role of rating, which we have analyzed both in
the theoretical part of the thesis and in our empirical study. Several recent works
- we have reviewed in details the approach of Das et al. (2006), Lando and Nielsen
(2010), Lando et al. (2013) - have developed and applied models based on counting processes, where the modelled variable is the default intensity, i.e. the expected
number of defaults in the time unit, typically a month. The use of counts eases
107
CHAPTER 6. CONCLUSIONS
108
the test of independence of default events conditional on common macroeconomic
and …nancial factors. Comparing the distribution of the default counts to a Poisson
distribution with constant intensity is the crucial feature of the cited works and has
inspired our idea: modelling defaults with a conditional Poisson models with timevarying intensity, allowing for overdispersion and slowly decaying autocorrelation of
the counts through the inclusion of autoregressive dynamics. We have, then, reviewed
the recent literature of Autoregressive Conditional Poisson models (ACP), focusing
on Poisson Autoregression by Fokianos, Rahbek and Tjøstheim (2009), which is the
…rst work studying ergodicity of these models and providing the asymptotic theory,
allowing for inference. De…ning an autoregressive Poisson model for default counts,
linking the expected number of default events on its past history, is the …rst part of
our contribution. The inclusion of autoregressive components is also relevant in the
analysis of correlation between corporate defaults, linked to the recent debate about
the possible existence of default contagion e¤ects.
The consideration that the expected number of defaults is probably in‡uenced by
the macroeconomic and …nancial context in which corporate …rms operate has led us
to the idea of extending Poisson Autoregression by Fokianos, Rahbek and Tjøstheim
(2009) (PAR) by including exogenous covariates. This is our methodological contribution, developed in Chapter 4, where we have presented a class of Poisson intensity
AutoRegressions with eXogeneous covariates (PARX) models that can be used for
modelling and forecasting time series of counts. We have analyzed the time series
properties and the conditions for stationarity for this new models, also developing
the asymptotic theory. The PARX models provide a ‡exible framework for analyzing dependence of default intensity on both the past number of default events and
other relevant variables. In Chapter 5 we have applied di¤erent Poisson Autoregressive models, presenting an extended empirical study of US corporate defaults based
on Moody’s monthly default count data. The time interval considered, going from
January 1982 to December 2011, includes three clusters of defaults corresponding to
three crisis periods: the last eighties …nancial markets crisis, the 2000-2001 information technology bubble and the …nancial and economic crisis started in 2008. We
have proposed and motivated a selection of covariates which can potentially explain
CHAPTER 6. CONCLUSIONS
109
the default clusters and the strong autocorrelation in the number of defaults. An
original feature is, in particular, the inclusion of a measure of intra-monthly realized
volatility, computed from daily S&P 500 returns. Realized volatility is indeed expected to summarize the uncertainty on …nancial markets, characterizing the periods of
…nancial turmoil when defaults are more likely to cluster. According to the results
of our empirical analysis, the one-month lagged realized volatility of returns is the
most relevant covariate in explaining default intensity, together with the one-month
lagged Leading Index. The latter is a macroeconomic indicator provided by the Federal Reserve and including a set of variables expected to anticipate the US economic
tendency. At our knowledge, ours is the …rst work showing a positive association
between the …nancial market uncertainty captured by the realized volatility and the
number of corporate defaults. Also the inclusion of the Leading Index is new and
its signi…cance highlights the predictive role of the business cycle, which previous
works try to include using GDP and industrial production growth, not always found
signi…cant in explaining default frequencies. Overall, our results have shown that the
PARX model including realized volatility and Leading Index …ts the default count
data satisfactorily and captures the default clustering. We have also performed a
forecasting experiment in order to evaluate the PARX model out-of-sample performance during the 2008-2011 crisis period and reached quite satisfactory results, showing that including covariates improves the out-of-sample prediction of the default
counts. However, the default counts dynamics are mainly led by the autoregressive
components and show a high persistence of shocks, even when signi…cant exogenous
covariates are included. In this respect, the main consideration arising is that the
modelling of the aggregate default intensity should be supported by the analysis of
…rm-speci…c, or, at least, sector-speci…c variables. Sector pro…t indexes, for example,
could improve the default prediction, as solvency is strongly linked to the …rms balance sheet data. Including less aggregate data in default risk analysis could also
allow to identify the risk factors linked to correlation among the solvency conditions
of di¤erent companies. The fact that the autoregressive components have a stronger
role than the overall default risk factors in explaining the defaults dynamics is an
interesting result. However, it is not su¢ cient to state that contagion e¤ects explain
CHAPTER 6. CONCLUSIONS
110
the autocorrelation in the number of defaults, as long as the commercial and …nancial links among companies are not taken into account. Another important aspect to
point out relative to the prominent role of the autoregressive part is that it should
not discourage the search and the analysis of exogenous risk factors. Finding variables signi…cantly associated to the number of defaults can indeed provide warning
signals in default risk evaluation.
At the aggregate level, the default phenomenon is in‡uenced by the …nancial
and macroeconomic context, but, at the same time, has an e¤ect on it. The most
immediate example is that of the credit spreads - included in our empirical study
- which re‡ect the level of default risk connected to …nancial positions. A higher
default risk also a¤ects the agents expectations, having an impact on the uncertainty
captured by the …nancial returns volatility. When the number of defaults is high,
also the companies investment decisions and the commercial links among …rms are
a¤ected, with consequences on industrial production. These considerations suggest
the relaxing of the covariate exogeneity assumption and, as a future development of
our work, the de…nition of a multivariate model. Another aspect which should be
further analyzed is the usefulness of the PARX models for defaults at the operational
level: the relevance of a new model for default risk should be evaluated with respect
to the actual needs in risk management practices. As an example, one of the main
applications of the models for default risk concerns the pricing of corporate bonds.
Measuring how much our estimated default intensity re‡ects in the market price of
the …nancial instruments issued by rated companies could support the evaluation of
the PARX models performance.
Appendix A
Proofs
Proof of Theorem 4.1
Pmax(p;q)
(
De…ne := max
i=1
i
+
< 1. Moreover, consider the norm given
i) ;
by k(x; )kw := wx kxk + w k k, where wx ; wy > 0 are chosen below. Next, with
=(
1 ; :::;
p)
and
=
1 ; :::;
, and, correspondingly, N of dimension p and
q
of dimension q,
+ f (x))0 ;
F (x; ; "; N ) = (g (x; ") ; ! + N ( ) +
consider, with Nt = (Nt ; :::; Nt p )0 ,
h
i
E F x; ; "t ; Nt ( )
F x~; ~ ; "t ; Nt ( )
h n w
= wx E [kg (x; ") g (~
x; ")k] + w E
Nt ( )
o
~
Nt
max(p;q)
1=s
wx
= wx
If
=
E
If
=
kx
1=s
Pmax(p;q)
h
(
i
+
i
+
~ +w
i)
L kx
i ),
X
(
i
+
x~k
(A.2)
i=1
F x~; ~ ; "t ; Nt ( )
, then choose,
1=s
~
i)
1=s
then choose w = wx
wx
o
~ + ff (x)
max(p;q)
x~k + w
F x; ; "t ; Nt ( )
1=s
(
i=1
L kx
+w
i=1
X
x~k + w
+
n
(A.1)
+w
w
i
L = (1 + )
111
= ( L) such that,
(x; )
x~; ~
.
w
1=s
wx ,
(A.3)
f (~
x)g
i
APPENDIX A. APPENDIX
112
or w = 1=s wx = ( L), for some small > 0, such that (1 + ) < 1, and hence
h
i
E F x; ; "t ; Nt ( )
F x~; ~ ; "t ; Nt ( )
(1 + ) (x; )
x~; ~
w
w
.
Finally, E
F 0; 0; "t ; Nt
= wx E [kg (0; ")k] + w ( f (0) + !) < 1 by As-
w
sumption 4. Then the result holds by Corollary 3.1 in Doukhan and Wintenberger
(2008).
That yt is stationary is clear. Next, with zt := (x0t ;
P ((yt ; zt ) 2 A
t)
0
consider
B j My;t p ; Mz;t p ) = P (yt 2 A j zt 2 B, My;t p ; Mz;t p )
P (zt 2 B j My;t p ; Mz;t p ) ;
where Mx;t
k
=
(xt k ; xt
k 1 ; :::).
Now by de…nition of the process,
P (yt 2 A j zt 2 B, My;t p ; Mz;t p ) = P (yt 2 A j zt 2 B) :
Next, using the Markov chain property of zt ,
P (zt 2 B j My;t p ; Mz;t p ) = P (zt 2 B j Mz;t p ) ;
where the right hand side by
weak dependence of zt converges to the marginal
P (zt 2 B) as p ! 1. Hence so does P ((yt ; zt ) 2 A
A; B and p, p ! 1.
Now consider E [jyt js ] =
Ps
j=0
s
j
B j My;t p ; Mz;t p ) for any
E[( t )j ], where
max(p;q)
X
E[ t ] =
(
i
+
yt
1
i) E
( t ) + Ef xt
+!
1
i=1
( t )s =
with yt = (yt ; :::; yt
s
X
s
j
j=0
0
p+1 )
and
t
j
+
! + f xt
t 1
= ( t ; :::;
s j
1
;
0
t q+1 ) .
Hence,
s
h
X
s
E[( t ) ] =
E
j
j=0
s
=E
yt
1
+
yt
1
+
s
t 1
j
t 1
! + f xt
+ E ! + f xt
s j
1
s
1
i
+ E rs
1
yt 1 ;
t 1; f
xt
1
;
with rs
1
(y; ; z) an (s
1)-order polynomial in y; ; z and so E [rs
induction assumption.
Moreover, E
! + f xt
s
1
( )] < 1 by
< 1 by applying Doukhan and Wintenberger
1
(2008) (Theorem 3.2) on xt and applying Assumption 2, such that we are left with
considering terms of the form,
E
i yt 1 i
s
X
s
=
j
j=0
s
X
s
=
j
j=0
=
s
X
s
j
j=0
s
+
i t 1 i
j
i
s j
E
i
j
i
s j
i
j
i
s j
E
i
h
yt
s j
j
1 i
t 1 i
j
h
X
j
E ( t )s+(k
k
k=0
j)
i
i
[( t )s ] + C
E [( t )s ] + C;
i
h
k
as by induction assumption all E ( t ) < 1, for k < s: Collecting terms,
=(
i
+
i)
2
s
max(p;q)
which for
Pmax(p;q)
i=1
E [( t )s ] = 4
(
i
+
i)
X
(
i
+
i=1
3s
~
5 E [( t )s ] + C;
i)
< 1 has a well-de…ned solution.
Proof of Lemma 4.1
In terms of initial values, consider, next, a process Xt = F (Xt 1 ; "t ), where
kF (x; )
F (~
x; )k
kx
x~k, j j < 1 and kg (0; ")k < 1, which is -weakly
dependent. With Xt denoting the stationary solution and X0 = x …xed, we wish to
P
a:s:
show, for some h
, T1 t=1 h (Xt ) ! E[h (Xt )]. Now,
1X
1X
h (Xt ) =
[h (Xt )
T t=1
T t=1
and
1X
[h (Xt )
T t=1
h (Xt )]
h (Xt )] +
1X
h (Xt ) ;
T t=1
1X
jh (Xt )
T t=1
h (Xt )j :
APPENDIX A. APPENDIX
114
Assume furthermore that jh (x)
h (~
x)j
x~jj;
Ljjx
…nd by repeated use of iterated expectations,
h (Xt )j] = E E jh (Xt )
E [jh (Xt )
LE E
h (Xt )jj Xt 1 ; Xt
g (Xt 1 ; "t )
L E E
=L E
Xt
Xt
1
Xt
1
Xt
1
z, z > 0, then we
(z)
1
g Xt 1 ; "t
Xt 1 ; Xt
Xt 1 ; Xt
1
1
1
L t E [kX0
X0 k]
Proof of Lemma 4.2
The proof mimics the proof of Lemma 2.1 in Fokianos, Rahbek and Tjøstheim
(2009), where the case p = q = 1 is treated. Without loss of generality, set here
p = q , such that, by de…nition,
c
t
t
=
p
X
i
ytc
yt
i
+
i
i
c
t i
+ ect ;
t i
(A.4)
i=1
with ect := f (xt 1 ) I (kxt 1 k c). Hence E [ ct
t] =
Pp
and, as j=1 j + j < 1, E ect i
1 (c) with
Pp
result holds with 1 (c) := 1 (c) = 1
j + j
j=1
E(
c
t
2
t)
=
p
X
2
iE
ytc
i
yt
2
i
1
i=0
1
i
j
j=1
+
j
. Next,
2
t i
+
2
E (ect )2
i=1
+2
p
X
i jE
c
t j
t j
ytc
i;j=1;i<j
p
+2
X
c
t i
iE
ect
t i
+2
i=1
+2
p
X
i
p
X
yt
i
i
E ect ytc
i=1
i
jE
ytc
j
yt
j
ytc
i
yt
i
i;j=1;i<j
p
+2
X
i;j=1;i<j
i jE
c
t j
t j
c
t i
E ect
(c) ! 0 as c ! 1, the …rst
c
t i
2
iE
+
Pp
Pt
t i
i
yt
i
i
,
With
c
t
t;
and t
s;
c
t
c
t ) (ys
E [(
c
t
= E [E ((
c
t
= E [(
where Fs
1
=
ys )]
c
t ) (ys
t) E
(xk ; Nk ; k
ys )j Fs 1 )]
c
s ])]
(Ns [ s ;
=E(
c
t]
1) and Nt [ t ;
s
c
t
yt ) (ysc
= E [(ytc
c
t
ys )] = E [E ((ytc
yt ) E ((ysc
=
p
X
i
i=1
p
=
X
i=1
p
+
X
i
"
i
ytc
yt
i
p
X
+
i
ytc
j
yt ) (ysc
ys )j Fs 1 )] = E (ytc
s; note that the recursion for (
t
c
t.
t
c
t]
Also observe that,
s;
E [(ytc
For t
(A.5)
s) ;
is the number of events in [ t ;
for the unit-intensity Poisson process Nt : Likewise for
still for t
c
t) ( s
t)
c
t i
i
yt
i j
c
t
+
yt ) (
c
s
(A.6)
s) ;
above gives,
+ ect
t i
i j
ys )j Fs 1 )]
c
t i j
j
+ ect
t i j
i
j=1
ytc
yt
i
#
+ ect
i
i=1
= :::
=
t s
X
aj
ytc j
yt
j
+ gj et
j
+
j=1
p
X
cj
c
s j
s j
+ dj ecs + hj ysc
j
ys
j=1
(A.7)
Observe that aj ; gj ; cj ; dj and hj are all summable. Using this, we …nd,
!
t s
X
c
E [( ct
ys )] = E
aj ytc j yt j + gj et j (ysc ys )
t ) (ys
j=1
+E
p
X
cj
c
s j
s j
+ dj ecs + hj ysc
j
ys
j
(ysc
ys )
j=1
(A.8)
Pt
2
2
c
Collecting terms, one …nds E ( ct
for some
t ) is bounded by C
j=1 j E et j
P1
constant C, some i with i=1 i < 1 and which therefore tends to zero.
!
j
:
APPENDIX A. APPENDIX
116
Finally, using again the properties of the Poisson process Nt , we …nd
E (ytc
yt )2
E (
2
t)
c
t
+ jE (
c
t
t )j
E(
2
t)
c
t
+
1
(c) :
(A.9)
This completes the proof of Lemma 4.2.
Proof of Theorem 4.2
We provide the proofs for the case of p = q = 1 as the general case is complex in
terms of notation. With p = q = 1;
t
( ) = ! + yt
1
+
t 1
( ) + f (xt 1 ) :
The result is shown by verifying the conditions in Kristensen and Rahbek (2005,
Lemma X).
Score
The score ST ( ) = @LT ( ) = (@ ) is given by
ST ( ) =
T
X
yt
t( )
st ( ) , where st ( ) =
t=1
Here, with
1
@
t
@
( )
.
(A.10)
= (!; ; )0 and vt = (1; yt 1 ; f (xt 1 ))0
@
t
( )
@
@
t
( )
@
In particular, with
t
=
t
t
=
t 1
@
t 1
( )
(A.11)
@
( )+
@
t 1
( )
(A.12)
@
( 0 ),
st ( 0 ) =
and where _ t = @
= vt +
@
( ) = (@ )
respect to Ft = F (yt k ; xt k ;
t
( )
@
=
0
t;
t
:=
Nt ( t )
1 :
(A.13)
t
. This is a martingale di¤erence sequence with
t k,
k = 0; 1; 2; :::) as E ( t jFt 1 ) = 0. It therefore
p
follows by the CLT for martingales, see, e.g., Brown (1971), that T ST ( 0 ) !d
N (0; ), where
= E st ( 0 ) st ( 0 )0 ;
APPENDIX A. APPENDIX
117
if we can show that the quadratic variation converges, hST ( 0 )i !P
end, observe that E
2
t jFt 1
= 1=
t
< 1=! 0 . Thus,
T
1X
hST ( 0 )i =
E st ( 0 ) st ( 0 )0 jFt
T t=1
where _ t = @
t
( ) = (@ )
=
_ t = (v 0 ;
t
0
. To this
1
T
1 X _ _0
=
t t= t;
T t=1
(A.14)
. As _ 0 = 0,
t
0
_t
1) +
1
=
t 1
X
i
vt0 i ;
t 1 i
0
(A.15)
;
i=1
By the same arguments as in the proof of Theorem 4.1, it is easily checked that the
~ t := Xt ; _ t , with Xt de…ned in Theorem 4.1, is weakly deaugmented process X
!, it therefore follows that E _ t _ t
P
0
1. Thus, we can employ Lemma 4.1 to obtain that T1 Tt=1 _ t _ t = t ! .
pendent with second moment. Since
t
0
=
<
t
Information
It is easily veri…ed that
@ 2 lt ( )
=
@ @ 0
yt @ t ( ) @ t ( )
2
@
@ 0
t ( )
yt
t( )
1
@2 t ( )
;
@ @ 0
(A.16)
where
@ t 1( )
@2 t 1 ( ) X
@2 t ( )
=
+
=
@ @
@
@ @
i=1
t 1
@2 t
@2 t ( )
@ t 1( )
+
=
2
@
@ 2
@
1(
2
)
=2
[email protected] t i
t 1
X
[email protected] t i
( )
(A.18)
@
(A.19)
~ t ( ) := X 0 ( ) ; _ t ( ) ; • t ( )
In particular, the augmented process X
t
to be weakly dependent with second moments for
2
0
can be shown
. In particular, for all
T
T
h
i
1X
1 X @ 2 lt ( )
~ t ( ) !P E h X
~t ( ) ;
=
h
X
T t=1 @ @ 0
T t=1
Moreover,
(A.17)
@
i=1
@2 t ( )
@2 t ( )
=
= ::: = 0
@ 2
@ 2
( )
2
~ t ( ) = @ lt ( ) :
h X
@ @ 0
7! @ 2 lt ( ) = (@ @ 0 ) is continuous and satis…es
@ 2 lt ( )
@ @ 0
~t
D X
:=
yt @ t
! 2L @
@
t
@
0
2
yt
!L
1
@2 t
@ @
0
;
,
APPENDIX A. APPENDIX
where
contains the maximum values of the individual parai
~
, with E D Xt ( ) < 1. For example,
= (! U ;
meters in
@
118
t
@
( )
=
U;
U;
h
t 1
U)
( )+
@
t 1
( )
@
t 1
X
i
U
t 1 i
=
@
t
(A.20)
@
i=0
and
@2 t ( )
@2 t
@ t 1( )
+
=
2
@
@ 2
@
1(
2
)
2
t 1
X
i
U
_t
1 i
i=0
@2 t
=
@ 2
:
(A.21)
It now follows by Lemma X in Kristensen and Rahbek (2005) that
sup
2
T
1 X @ 2 lt ( )
T t=1 @ @ 0
h
i
p
~t ( )
! 0:
E h X
(A.22)
Proof of Theorem 4.3
The proof follows by noting that Lemmas 3.1-3.4 in FRT (2009) hold for our
setting. The only di¤erence is that the parameter vector include
loading f (xt 1 ) :
However, as E[f (xt 1 )] < 1, all the arguments remain identical as is easily seen
upon inspection of the proofs of the lemmas in FRT (2009).
Bibliography
Aalen, O. O. (1989), “A model for non-parametric regression analysis of life times”,
in J. Rosinski, W. Klonecki, and A. Kozek (eds.), Mathematical Statistics and Probability Theory, vol. 2 of Lecture Notes in Statistics, pp. 1–25, Springer, New York.
Agosto, A., and Moretto, E. (2012), “Exploiting default probabilities in a structural
model with nonconstant barrier”, Applied Financial Economics, 22:8, 667-679.
Akaike, H. (1974), “A new look at the statistical model identi…cation”, lEEE Transactions on Automatic Control, AC-19, 716-723.
Amisano, G., and Giacomini, R. (2007), “Comparing Density Forecasts via Weighted
Likelihood Ratio Tests”, Journal of Business and Economic Statistics, 25, 177-190.
Andersen, P.K., Borgan, Ø., Gill, R.D., and Keiding, N. (1992), Statistical Models
Based on Counting Processes, Springer-Verlag.
Andersen, P. K., and Gill, R. D. (1982), “Cox’s Regression Model for Counting
Processes: A Large Sample Study”, Annals of Statistics, 10, 1100–1120.
Andersen, T. G., Bollerslev, T., and Diebold, F. X. (2007), “Roughing it up: Including jump components in the measurement, modeling, and forecasting of return
volatility”, The Review of Economic and Statistics, 89, 701–720.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), “The distribution of realized exchange rate volatility”, Journal of the American Statistical
Association, 96, 42–55.
119
BIBLIOGRAPHY
120
Azizpour, S., Giesecke, K., (2008a), “Premia for Correlated Default Risk. Department of Management Science and Engineering”, Stanford University. Unpublished
manuscript.
Azizpour, S., Giesecke, K., (2008b), “Self-exciting Corporate Defaults: Contagion vs.
Frailty”, Department of Management Science and Engineering, Stanford University.
Unpublished manuscript.
Azizpour, S., Giesecke, K., (2010), “Azizpour, S., Giesecke, K., (2010), “Exploring
the sources of default clustering”, Department of Management Science and Engineering, Stanford University. Unpublished manuscript.
Barndor¤-Nielsen, O., and Shephard, N., 2002, “Estimating quadratic variation using
realized variance”, Journal of Applied Econometrics 17, 457–477.
Behar, R., and Nagpal, K. (2001), “Dynamics of rating transition”, Algo Research
Quarterly, 4 (March/June), 71–92.
Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity”,
Journal of Econometrics, 31, 307–327.
Blume, M. E., Lim, F., and Craig, A. (1998), “The Declining Credit Quality of U.S.
Corporate Debt: Myth or Reality?”, The Journal of Finance, 53, 1389-1413.
Brockwell, P.J. and Davis, R. A. (1991), Time Series: Data Analysis and Theory,
Springer, New York, 2nd edition.
Brown, B. M. (1971), “Martingale Central Limit Theorems”, The Annals of Mathematical Statistics, 42, 59-66.
Christo¤ersen, P.F. and Diebold, F.X. (1997), “Optimal Prediction Under Asymmetric Loss,”Econometric Theory, 13, 808-817.
Chou, H. (2012), “Using the autoregressive conditional duration model to analyse
the process of default contagion”, Applied Financial Economics, 22:13, 1111-1120.
BIBLIOGRAPHY
121
Czado, C., Gneiting, T. and Held, L. (2009), “Predictive Model Assessment for Count
Data,”Biometrics 65, 1254–1261.
Cox, D. R. (1972), “Regression models and life-tables (with discussion)”, Journal of
the Royal Statistical Society, Series B, 34, 187-220.
Cox, D. R. (1975), “Partial likelihood”, Biometrika, 62, 69-76.
Cox, D. R., and Snell, E. J. (1968), “A general de…nition of residuals”, Journal of
the Royal Statistical Society, Series B, 30, 248-275.
Corsi, F. (2009), “A Simple Approximate Long-Memory Model of Realized Volatility”, Journal of Financial Econometrics, 7, 174–196.
Crosbie, P. J., and Bohn, J. (2002), “Modeling default risk”, Technical report, KMV,
LLC.
Das, S.R., Du¢ e, D., Kapadia, N., and Saita, L. (2007), “Common failings: How
corporate defaults are correlated,”Journal of Finance 62, 93–117.
Davis, M., and Lo, V. (2001), “Modeling default correlation in bond portfolios”, in C.
Alexander, ed., Mastering Risk Volume 2: Applications, Prentice Hall, pp. 141-151.
Davis, A. R., and Wu (2009), R., “A negative binomial model for time series of
counts”, Biometrika, 96, 735-749.
Dedecker, J. and Prieur, C. (2004), “Coupling for -dependent sequences and applications”, Journal of Theoretical Probability, 17, 861–855.
Diebold, F. X., Gunther, T. A. and Tay, A. S. (1998), “Evaluating density forecasts
with applications to …nancial risk management,”International Economic Review, 39,
863-883.
Doukhan, P., and Wintenberger, O. (2008), “Weakly dependent chains with in…nite
memory”, Stochastic Processes and their Applications, 118, 1997-2013.
Du¢ e, D., and Singleton, K. (1999), “Modeling Term Structure of Defaultable
Bonds”, The Review of Financial Studies, 12:4, 687-720.
BIBLIOGRAPHY
122
Du¢ e, D., Saita, L., and Wang, K. (2007), “Multi-period corporate default prediction
with stochastic covariates”, Journal of Financial Economics, 83, 635-665.
Du¢ e, D., Eckner, A., Horel, G., and Saita, L. (2009), “Frailty Correlated Default”,
Journal of Finance, 64, 2089-2123.
Engle, R. F. (2002), “New frontiers for ARCH models”, Journal of Applied Econometrics, 17, 425–446.
Engle, R. F., and Gallo, G. M. (2006), “A multiple indicators model for volatility
using intra-daily data”, Journal of Econometrics, 131, 3-27.
Engle, R. F., and Ng, V. (1993), “Measuring and testing of the impact of news on
volatility”, Journal of Finance, 48, 1749-1778.
Engle, R. F., and Russell, J.R. (1998), “Autoregressive conditional duration: a new
model for irregularly spaced transaction data”, Econometrica, 66:5, 1127-62.
Fahrmeir, L., and Kaufmann, H. (1985), “Consistency and asymptotic normality of
the maximum likelihood estimates in generalized linear models”, Annals of Statistics,
13, 342-368.
Fama, E. F. (1965), “The Behavior of Stock-Market Prices”, The Journal of Business,
38, 34-105.
Ferland, R., Latour, A., and Oraichi, D. (2006), “Integer-Valued GARCH Processes”,
Journal of Time Series Analysis, 27, 923–942.
Focardi, S.M., and Fabozzi, F.J. (2005), “An autoregressive conditional duration
model of credit-risk contagion”, The Journal of Risk Finance, 6, 208 - 225.
Fokianos, K. (2001), “Truncated Poisson regression for time series of counts”, Scandinavian Journal of Statistics, 28, 645-659.
Fokianos, K., and Kedem, B. (2004), “Partial Likelihood Inference for Time Series
Following Generalized Linear Models”, Journal of Time Series Analysis, 25, 173–197.
BIBLIOGRAPHY
123
Fokianos, K., Rahbek, A., and Tjøstheim, D. (2009), “Poisson autoregression”,
Journal of the American Statistical Association, 104, 1430–1439.
French, K. R., Schwert, G. W., and Staumbaugh, R. F. (1987), Journal of Financial
Economics, 19, 3-29.
Gallo, G. M., and Pacini, B. (2000), “The e¤ects of trading activity on market
volatility”, The European Journal of Finance 6, 163–175.
Giesecke, K., Longsta¤, F., Schaefer, S., and Strebulaev, I. (2011), “Corporate bond
default risk: A 150-year perspective”, Journal of Financial Economics, 102, 233-250.
Glosten. L. R.. Jagannathan. R.. and Runkle. D. (1993), “Relationship between the
Expected Value and the Volatility of the Nominal Excess Return on Stocks”, Journal
of Finance, 48, 1779-1802.
Gourieroux, C., Monfort, A. and Trognon, A. (1984), “Pseudo Maximum Likelihood
Methods Theory”, Econometrica, 52, 681-700.
Hamilton, J. (2005), Regime-Switching Models”, The New Palgrave Dictionary of
Economics.
Han, H., and Park, J.Y. (2008), “Time series properties of ARCH processes with
persistent covariates”, Journal of Econometrics, 146, 275–292.
Han, H., and Kristensen, D. (2013), “Asymptotic theory for the QMLE in GARCHX models with stationary and non-stationary covariates,”CeMMAP working papers
CWP18/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Hansen, P.R., Huang, Z. and Shek, H.W. (2012) “Realized GARCH: A joint model
for returns and realized measures of volatility,”Journal of Applied Econometrics, 27,
877–906.
Hausman, A., Hall, B. H., and Griliches, Z. (1984) Econometric Models for Count
Data with an Application to the Patents-R&D Relationship”, Econometrica, 52,
909-938.
BIBLIOGRAPHY
124
Hawkes, A.G., (1971), “Spectra of some self-exciting and mutually exciting point
processes”, Biometrika, 58, 83–90.
Heinen, A. (2003), “Modeling time series count data: An autoregressive conditional
Poisson model”, CORE Discussion Paper 2003/62, Center of Operations research
and Econometrics, Université Catholique de Louvain.
Hilbe, J. M. (2007), Negative binomial regression, Cambridge University Press.
Jarrow, R.,and Turnbull, S. (1995), “Pricing options on Financial Securities Subject
to Default Risk”, Journal of Finance, 50, 53–86.
Jarrow, R., Lando, D., Turnbull, S. (1997), “A Markov model for the term structure
of credit risk spreads”, Review of Financial Studies, 481–523.
Jarrow, R. and Fan, Y. (2001), “Counterparty risk and the pricing of defaultable
securities”, Journal of Finance, 56, 555-576.
Jensen, S. T., and Rahbek, A. (2004), “Asymptotic Inference for Nonstationary
GARCH”, Econometric Theory, 20, 1203–1226.
Johnson, N. L., Kotz, S., and Kemp, A. W. (1992), Univariate Discrete Distributions,
second edition, John Wiley & Sons, Inc., New York.
Jung, R.C., Kukuk, M. and Liesenfeld, R. (2006), “Time series of count data: modeling, estimation and diagnostics”, Computational Statistics and Data Analysis, 51,
2350-2364.
Kavvathas, D., “Estimating credit rating transition probabilities for corporate
bonds”, Working paper, University of Chicago.
Koopman, S.J., and Lucas, A. (2005), “Business and Default Cycle for Credit Risk”,
Journal of Applied Econometrics, 20: 311–323.
Koopman, S.J., Lucas, A., and Monteiro, A. (2008), “The multi-state latent factor
intensity model for credit rating transitions”, Journal of Econometrics, 142, 399-424.
BIBLIOGRAPHY
125
Koopman, S.J., Lucas, A., and Schwaab, B., “Modeling frailty-correlated defaults
using many macroeconomic covariates”, Journal of Econometrics, 162, 312-325.
Kedem, B., and Fokianos, K. (2002), Regression Models for Time Series Analysis,
Hoboken, NJ: Wiley.
Kristensen, D. and Rahbek, A. (2005), "Asymptotics of the QMLE for a Class of
ARCH(q) Models", Econometric Theory, 21, 946–961.
Lambert, D. (1992), “Zero-in‡ated Poisson regression, with an application to defects
in manufacturing”, Technometrics, 34, 1-14.
Lamoureux, C. G., and Lastrapes, W. D. (1990), “Heteroskedasticity in stock return
data: Volume versus GARCH e¤ects”, Journal of Finance, 45, 221–229.
Lando, D. (1998), “On Cox processes and credit risky securities, Review of Derivatives Research, 2, 99–120.
Lando, D., and Nielsen, M. (2010), “Correlation in corporate defaults: Contagion or
conditional independence?”, Journal of Financial Intermediation, 19, 355-372.
Lando, D., Medhat, M., Nielsen, M., and Nielsen, S. (2013), “Additive Intensity Regression Models in Corporate Default Analysis”, Journal of Financial Econometrics,
11, 443–485.
Lando, D., and Skødeberg, T. M. (2002), “Analyzing rating transitions and rating
drift with continuous observations”, Journal of Banking and Finance, 26, 423-444.
Lang, L.H.P., Stulz, R.M., (1992), “Contagion and competitive intra-industry effects of bankruptcy announcements. An empirical analysis”, Journal of Financial
Economics, 32, 45–60.
Leland, H. E. (1994), “Corporate debt value, bond covenants, and the optimal capital
structure”, Journal of Finance, 49, 1213–52.
Leland, H. E. and Toft, K. B. (1996), “Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads”, Journal of Finance, 60, 987–1019.
BIBLIOGRAPHY
126
Li, W. K. (1991), “Testing model adequacy for some Markov regression models for
time series”, Biometrika, 78, 83-89.
Martens, M., van Dijk, D., de Pooter. M. (2004),“Forecasting S&P 500 volatility:
Long memory, level shifts, leverage e¤ects, day-of-the-week seasonality, and macroeconomic announcements”, International Journal of Forecasting, 25, 282-303.
McCullagh, P. (1986), “The Conditional Distribution of Goodness-of-Fit Statistics
for Discrete Data”, Journal of the American Statistical Association, 81:393, 104-107.
McCullagh, P., and Nelder, J. A. (1983), Generalized Linear Models, Chapman &
Hall, New York.
McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, Chapman &
Hall, London, 2nd edition.
Meitz, M., and Saikonnen, P. (2008), “Ergodicity,Mixing and Existence of Moments
of a Class of Markov Models With Applications to GARCH and ACD Models”,
Econometric Theory, 24, 1291–1320.
Meyn, S. P., and Tweedie, R. L. (1993), Markov Chains and Stochastic Stability,
London: Springer.
Merton, R. C. (1974), “On the pricing of corporate debt: the risk structure of interest
rates”, Journal of Finance, 29, 49–70.
Mullahy, J. (1986), “Speci…cation and testing of some modi…ed count data models”,
Journal of Econometrics, 33, 341{365.
Nelder, J. A., and Wedderburn, R. W. M. (1972), “Generalized linear models”,
Journal of the Royal Statistical Society, Series A, 135:370-384.
Nelson. D. B. (1991). “Conditional Heteroskedasticity in Asset Pricing: A New Approach”, Econometrica, 59, 347-370.
Nickell, P., Perraudin, W., and Varotto, S. (2000), “Stability of rating transitions”,
Journal of Banking and Finance, 24, 203-227.
BIBLIOGRAPHY
127
Rydberg, T. H., and Shephard, N. (2000), “A Modeling Framework for the Prices and
Times of Trades on the New York Stock Exchange,”in Nonlinear and Nonstationary
Signal Processing, eds. W. J. Fitzgerlad, R. L. Smith, A. T. Walden, and P. C. Young,
Cambridge: Isaac Newton Institute and Cambridge University Press, pp. 217–246.
Shephard, N. and Sheppard, K. (2010), Realising the future: Forecasting with highfrequency-based volatility (HEAVY) models, Journal of Applied Econometrics 25,
197-231.
Shumway, T. (2001), Forecasting bankruptcy more e¢ ciently: A simple hazard
model, Journal of Business, 74, 101–124.
Schwarz, G. (1978), “Estimating the dimension of a model”, Annals of Statistics, 6,
461-464.
Schwert, G. W. (1989), “Why Does Stock Market Volatility Change Over Time?”,
The Journal of Finance, 44, 1115-1153.
Skeel, D. A. (2001), “Debt’s Dominion: A History of Bankruptcy Law in America”,
Princeton University Press.
Streett, S. (2000), “Some Observation Driven Models for Time Series of Counts,”
Ph.D. thesis, Colorado State University, Dept. of Statistics.
Tay, A.S and Wallis, K.F. (2000) “Density Forecasting: A Survey”, Journal of Forecasting, 19, 235-254.
Tang, T. T. (2009), “Information asymmetry and …rms’credit market access: Evidence from Moody’s credit rating format re…nement”, Journal of Financial Economics, 93, 325-351.
Wedderburn, R. W. M. (1974), “Quasi-likelihood functions, generalized linear models
and the Gaussian method”, Biometrika, 61, 439-447.
Wong, W. H. (1986), “Theory of partial likelihood”, Annals of Statistics, 14, 88-123.
BIBLIOGRAPHY
128
Zeger, S. L., and Qaqish, B. (1988), “Markov Regression Models for Time Series: A
Quasi-Likelihood Approach,”Biometrics, 44, 1019–1031.
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement