Alma Mater Studiorum – Università di Bologna

DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA
Ciclo XXVI
Settore Concorsuale di afferenza: 13/A5
Settore Scientifico disciplinare: SECS-P/05

ECONOMETRICS OF DEFAULT RISK

Presentata da: Arianna Agosto
Coordinatore Dottorato: Prof. Angela Montanari
Tutor: Prof. Giuseppe Cavaliere
Co-tutor: Prof. Anders Rahbek

Esame finale anno 2012/2013

Contents

1 Introduction to Default Risk
1.1 Default risk: definition and measurement
1.2 The Default Clustering
1.3 Motivation and overview

2 Econometric modelling of Default Risk
2.1 Default prediction
2.1.1 The role of rating
2.2 Default correlation and Contagion
2.3 The study of default correlation through count models
2.3.1 Testing conditional independence of defaults
2.3.2 An Autoregressive Conditional Duration model of credit risk contagion
2.4 Concluding remarks

3 Econometric modelling of Count Time Series
3.1 Generalized Linear Models for time series
3.2 The Poisson Model
3.2.1 Model specification
3.2.2 Inference
3.2.3 Asymptotic theory
3.2.4 Hypothesis testing
3.2.5 Goodness of fit
3.2.6 Model selection
3.3 The doubly-truncated Poisson model
3.4 The Zeger-Qaqish model
3.5 Overdispersion and negative binomial regression
3.6 Poisson Autoregression
3.6.1 Model specification
3.6.2 Ergodicity results
3.6.3 Estimation of parameters
3.6.4 Asymptotic theory
3.7 Concluding remarks

4 A new Poisson Autoregressive model with covariates
4.1 Related literature
4.2 Specification of PARX models
4.3 Time series properties
4.4 Maximum likelihood estimation
4.5 Forecasting
4.6 Finite-sample simulations
4.6.1 Simulation design
4.6.2 Results
4.7 Concluding remarks

5 Empirical study of Corporate Default Counts
5.1 Overview of the approach
5.2 Corporate default counts data
5.3 Choice of the covariates
5.3.1 Financial market variables
5.3.2 Production and macroeconomic indicators
5.4 Poisson Autoregressive models for corporate default counts
5.4.1 Results
5.4.2 Goodness of fit analysis
5.5 Out-of-sample prediction
5.6 Concluding remarks

6 Conclusions

A Appendix

Bibliography

Abstract

This thesis is the result of a project aimed at the study of a crucial topic in finance: default risk, whose measurement and modelling have achieved increasing relevance in recent years. We investigate the main issues related to the default phenomenon, from both a methodological and an empirical perspective. The topics of default predictability and correlation are treated with constant attention to modelling solutions and a critical review of the literature. From the methodological point of view, our analysis results in the proposal of a new class of models, called Poisson Autoregression with Exogenous Covariates (PARX). The PARX models, including both autoregressive and exogenous components, are able to capture the dynamics of default count time series, which are characterized by persistence of shocks and slowly decaying autocorrelation. Application of different PARX models to the monthly default counts of US industrial firms in the period 1982-2011 provides empirical insight into default dynamics and supports the identification of the main default predictors at an aggregate level.

Acknowledgements

I am grateful to my supervisor Prof. Giuseppe Cavaliere for precious advice and for all I learned from him. I express my sincere gratitude to my co-tutor Prof. Anders Rahbek for supporting my ideas and for the great experience in Copenhagen. Thanks to all my research group for useful suggestions and comments. I am particularly grateful to Dr. Luca De Angelis for all his support.
I would like to thank Pablo Barbagallo from Moody's Corporation. A special thanks to Lucia for all the moments we shared in our PhD experience. I am grateful to Dr. Enrico Moretto, who believes in me more than I do. Many thanks to my family for teaching me to never give up and, last but not least, to Rocco for all his love and support.

Chapter 1
Introduction to Default Risk

This chapter explains how default risk can be defined and measured, motivating the importance of deriving models for its analysis and prediction. After giving a technical definition of the default event, we illustrate the main empirical evidence on the corporate default phenomenon as well as two crucial topics related to its interpretation - default predictability and correlation between corporate defaults. The structure and motivation of the thesis are then presented and connected to the economic and financial issues introduced.

1.1 Default risk: definition and measurement

Default risk is defined as the risk of loss from a counterparty's failure to repay the amount owed, in terms of either principal or interest, on a loan. Default is considered the most serious event related to credit risk, the latter referring to the more comprehensive case of a change in the current value of a credit exposure due to an expected variation in the borrower's solvency.

Banks and financial groups are highly exposed to both corporate and retail default risk and are required to adopt methodologies for quantifying such risk and thereby determining the amount of capital necessary to support their business and to protect themselves against volatility in the level of losses. Default risk management is covered by the Basel II regulation for the stability of the international banking system and comprises both general economic capital requirements and internal rating procedures.

A key aspect of default risk management is the measurement of the Probability of Default, i.e.
the probability that, following the definition given by the Bank for International Settlements, with regard to a particular obligor either or both of the following two events have taken place:

- the bank considers that the obligor is unlikely to pay its credit obligations to the banking group in full, without recourse by the bank to actions such as realising securities (if held);
- the obligor is past due more than 90 days on any material credit obligation to the banking group.

There are two main approaches to default risk modelling: the structural and the reduced-form approach. The first considers default as an endogenously determined event which can be predicted by the economic and financial conditions of the company, reflected in its balance sheet data and market value. Structural models therefore study the evolution of structural firm variables, such as the asset and debt values, in order to determine the probability and the timing of bankruptcy, explicitly relating default to the first time the assets fall below a certain level - the default barrier. This approach was introduced by the seminal work of Merton (1974), which first relied on option pricing theory to derive the probability that the assets fall below the outstanding value of debt. The Merton model treats the equity of a firm as a call option on its assets held by the stockholders, whose price - the (known) market value - implies the probability of default. This approach has since been extended by abandoning some unrealistic assumptions, such as the existence of a fixed default barrier given by the nominal total value of debt. Black and Cox (1976) introduce a time-varying threshold defined as a fraction of the nominal value of liabilities, as does Leland (1994), who also considers the fiscal aspects of the bankruptcy decision. Leland and Toft (1996) first evaluate the effects of the presence of coupons and of short-term debt roll-over.
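For reference, the default probability implied by the Merton framework has a well-known closed form (a standard result, stated here as a reminder rather than derived in the text): with log-normal asset value dynamics, the probability that the assets $V_T$ fall below the face value of debt $D$ at maturity $T$ is

```latex
PD \;=\; \Pr(V_T < D)
\;=\; \Phi\!\left( - \,\frac{\ln(V_0/D) + \left(\mu - \tfrac{1}{2}\sigma_V^2\right)T}{\sigma_V \sqrt{T}} \right),
```

where $V_0$ is the current asset value, $\mu$ and $\sigma_V$ are the drift and volatility of the assets, and $\Phi$ is the standard normal distribution function. The argument of $\Phi$ (with opposite sign) is the quantity behind the distance-to-default measure discussed below.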
A recent development by Agosto and Moretto (2012) determines the curvature parameter of the nonconstant default barrier by using firm-specific balance sheet and market data. Moody's KMV, the proprietary model used by the rating agency Moody's for determining the probability of default, is the most famous application of a structural model and is based on the extension of the Merton model developed by Kealhofer, McQuown and Vasicek in 1989.

In contrast to the structural approach, reduced-form models consider default as an exogenously determined process and use immediately available market and credit data - mainly forward rates, ratings and prices of the issued bonds - rather than modelling the asset value dynamics. Jarrow and Turnbull (1995) and its development Jarrow, Lando and Turnbull (1997), for example, define a model which explicitly incorporates credit rating information into debt instrument pricing and can also be used for risk management purposes, as it allows one to derive the probabilities of solvency implied by credit spreads.

An important class of reduced-form models is that of the so-called intensity models. They consider the default time as the stochastic first jump time of a count process - Poisson in many cases - whose intensity is a function of latent or observable variables. Their link to probability-of-default modelling is clear if one notes that, as the length of the time period shrinks to zero, the intensity of the count process multiplied by the period length approximates the probability of observing one event. The popularity of intensity models has increased in recent years, as they allow for many econometric applications based on the estimation of default intensity through risk factors and business failure predictors. This approach is followed, for example, by Duffie and Singleton (1999) and Lando (1998) and, as we shall explain, can be effectively used for considering relevant aspects such as dependence between corporate defaults.
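The intensity interpretation above can be checked with a minimal simulation sketch. Assuming a constant (purely hypothetical) intensity, the first jump time of the Poisson process - the default time - is exponentially distributed, and the probability of default within a short period dt is approximately the intensity times dt:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constant default intensity (events per year); illustrative only.
lam = 0.05

# With constant intensity, the first jump time of a Poisson process
# (the default time in an intensity model) is exponentially distributed.
default_times = rng.exponential(scale=1.0 / lam, size=1_000_000)

# For a short period dt, P(default within dt) is approximately lam * dt.
dt = 0.01
frac = np.mean(default_times <= dt)
print(frac, lam * dt)  # the two numbers are close
```

With time-varying intensity the same logic applies period by period, which is what makes the estimation of default intensity through covariates natural.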
Turning to the empirical measures of default risk, the data typically used in risk management and published in the reports of rating agencies and financial institutions are:

- default rate: the most widely used measure of the incidence of the default phenomenon, defined as the number of defaulting companies in a certain time period divided by the total number of debt issuers in the same period. An alternative definition, which we do not consider here, is the value-weighted default rate, which measures the incidence of defaults in terms of money loss;
- default count: the number of failures in a certain time period (typically a month). As we shall see, there are several reasons motivating the counting approach to default risk modelling;
- firm-specific measures, such as distance-to-default: a volatility-adjusted measure calculated and periodically published by Moody's, resulting from the application of the above-mentioned KMV model. Following Crosbie and Bohn (2002), it can be defined as "the number of asset value's standard deviations between the market asset value and the default point".

Most of the works presented in the following chapters focus on modelling default rates or counts and often use "ready-available" measures of firm-specific risk such as distance-to-default.

1.2 The Default Clustering

Looking at the corporate default phenomenon from an aggregate perspective, the most relevant aspect is the strong empirical evidence that corporate defaults cluster in time: both default rates and counts show very high peaks, followed by periods of low incidence. This is clear from Figure 1.1, showing the time series of US default rates and counts among Moody's rated industrial firms from 1982 to 2011.
The potentially strong impact of default clusters on the risk borne by investors and financial institutions has increased the interest of the financial and econometric literature in the two main issues related to the presence of default peaks: default predictability and default correlation.

First, a central objective in risk management is finding macroeconomic variables and financial indicators that are able to predict the peaks in the number of defaults, in support of financial vigilance and central bank decisions. There are indeed many empirical studies analyzing the strong time variation of default frequencies and linking it to macroeconomic variables and business cycle indicators. This is done, amongst others, by Shumway (2001) and Duffie et al. (2007).

Figure 1.1: (a) Monthly default count of Moody's rated industrial firms from January 1982 to December 2011. (b) Monthly default rate of Moody's rated industrial firms from January 1982 to December 2011.

The interpretation of default clustering is also connected to the issue of correlation, as a high number of defaults in a short period could also be caused by commercial and financial links between companies. The study of correlation between corporate defaults is an essential tool of credit risk management at the portfolio level, and its importance has increased in recent years for several reasons. First, banks' minimum capital requirements in the Basel II approach are a function, among other things, of the borrowers' joint default probability, measured by asset correlation. Second, there has been a large growth of financial instruments such as Collateralized Debt Obligations, whose cash flows depend explicitly on default frequency at the portfolio level. Furthermore, the evaluation of default probability at the level of an individual security cannot adequately explain credit risk spreads, whose dynamics are influenced by commonality in corporate solvency.
The default clustering phenomenon has given rise to a debate about its possible explanation. An important question is whether cross-firm default correlation associated with observable macroeconomic and financial factors affecting corporate solvency is sufficient to explain the observed degree of default clustering, or whether it is possible to document contagion effects by which one firm's default increases the likelihood of other firms defaulting. The "cascade" effect which defaults seem to generate could spread by means of contractual relationships (customer-supplier or borrower-creditor, for example) or through an "informational" channel, that is, a change in agents' expectations of corporate solvency. Increased uncertainty in the credit market leading to a worsening of funding conditions, such as a credit crunch or higher interest rates, can indeed influence risk perception. Furthermore, default clusters could be linked to the systematic (aggregate) risk generated by common macroeconomic and financial risk factors affecting firm solvency: this case is usually excluded from the strictest definition of contagion, which refers instead to between-firm effects on default timing. The works we present in the following chapter are related to default prediction and correlation, investigated through models for aggregate or firm-specific data on default events.

1.3 Motivation and overview

The aim of this work is to study how default risk can be measured and modelled. We contribute to the existing literature by defining, studying and applying a count time series model for the number of corporate defaults, providing good in-sample and out-of-sample forecasts of default counts in an extended group of debt issuers. Our model specification results from the analysis of the stylized facts of corporate default count time series presented in this chapter.
First of all, as often happens with rare events, the default phenomenon is characterized by overdispersion: the variance of the number of events is much higher than its mean, leading to series showing both peaks ("clusters") and periods of low incidence. Moreover, default count time series are characterized by a slowly decreasing autocorrelation function, which is a typical feature of long-memory processes.

We start, in Chapter 2, with a review of the main econometric and financial models for default risk, with a final focus on intensity models applied to count time series of corporate defaults.

We then present, in Chapter 3, the main models for count data used in econometrics, which rely on the theory of Generalized Linear Models. For several reasons related to the empirical evidence on corporate default count time series, we focus on conditional Poisson models, taking the Poisson Autoregression of Fokianos, Rahbek and Tjøstheim (2009) as our main reference. This model (reviewed in Section 3.6) defines the count process as a sequence of Poisson draws which are independent conditional on the past count history. The time-varying intensity (i.e. the expected number of events at time t) is specified as a linear function of lagged counts and intensities. This approach shares some similarities with the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) approach for volatility (Bollerslev, 1986). The idea - which can be considered the first part of our contribution - is that of modelling default clustering in a way similar to the models for volatility clustering, through an autoregressive model which also provides a measure of "persistence" of the series. The dependence of the process (the number of defaults, in our case) on its past history can indeed explain its long memory and allows us to study it from the perspective of shock persistence.
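A minimal simulation sketch of this mechanism (with illustrative parameter values, not estimates from the thesis): the intensity follows the linear recursion λ_t = ω + α y_{t-1} + β λ_{t-1}, counts are conditionally Poisson, and the feedback already generates unconditional overdispersion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values; alpha + beta < 1 for stationarity.
omega, alpha, beta = 0.5, 0.3, 0.5

T = 20_000
lam = np.empty(T)
y = np.empty(T, dtype=int)
lam[0] = omega / (1 - alpha - beta)   # start at the stationary mean
y[0] = rng.poisson(lam[0])
for t in range(1, T):
    # Linear intensity recursion: past counts and past intensities feed back.
    lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
    y[t] = rng.poisson(lam[t])

mean_y, var_y = y.mean(), y.var()
print(mean_y, var_y)  # the sample variance exceeds the sample mean
```

The sum α + β plays the same role as the persistence parameter of a GARCH(1,1), which is the analogy exploited in the text.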
Poisson Autoregression - unlike the traditional Poisson model - also allows for overdispersion. The consideration that the expected number of defaults is probably influenced by the macroeconomic and financial context in which corporate firms operate has led us to the idea of extending Poisson Autoregression by including exogenous covariates. Thus, in Chapter 4, we present our methodological contribution, developing a class of Poisson intensity AutoRegression with eXogenous covariates (PARX) models that can be used for modelling and forecasting time series of counts. We analyze the time series properties and the conditions for stationarity and develop the asymptotic theory for this new model. In this way we provide a flexible framework for analyzing the dependence of default intensity on both the past number of default events and other relevant financial variables. It is also interesting to consider the impact of including a lagged covariate process on the estimated persistence.

In Chapter 5, we present an extended empirical study of US corporate defaults, based on the application of alternative PARX models. We consider the monthly default counts of US Moody's rated corporate firms: the rating agency Moody's provides monthly and annual reports showing default rates and counts and also offers some instruments for looking through the data more analytically. One of these services is the Credit Risk Calculator, which allows users to create customized reports and obtain data on defaults and rating transitions for specific sectors in a given geographical area. We use a dataset which covers the period from January 1982 to December 2011 and consists of the monthly default counts of US Moody's rated corporate firms classified as "broad industrial", which means it excludes banking, financial and insurance companies as well as public utility and transportation activities.
As we will see in the review part, the use of data on industrial firms is common in corporate default analyses. We consider the impact on default intensity of several covariate processes, such as business cycle indicators, production indexes and rating downgrades. To analyze the link between the financial and the credit market, we also include a measure of realized volatility of returns. Realized volatility is expected to summarize the level of uncertainty during periods of financial turmoil, when corporate defaults are more likely to cluster, and we show that it is significantly and positively associated with the number of defaults.

Chapter 2
Econometric modelling of Default Risk

The two main issues related to the corporate default phenomenon - default predictability and correlation - are now analyzed through an overview of the existing financial and econometric literature on credit risk modelling, with a special focus on models for default intensity, defined as the expected number of bankruptcies in a given period. These models often include macroeconomic and financial explanatory variables, with the aim of finding both common and firm-specific risk factors for solvency as well as default predictors. Furthermore, the count modelling framework allows extensions that ease the analysis of dependence between default events.

2.1 Default prediction

The most obvious default predictor for a single firm is represented by its business and financial conditions, which can be summarized by balance sheet data such as leverage and net profit measures. This approach is natural in the above-mentioned structural models, which are based on the study of the firm's asset evolution, but also characterizes a variety of statistical methods for credit risk measurement, such as credit scoring. Altman (1968), for example, developed a multiple discriminant statistical methodology applied to bankruptcy prediction through a
set of financial and economic ratios which are shown to successfully discriminate between failing and nonfailing firms. The discriminant function includes variables such as the working capital to total assets ratio, the market to book value ratio and the sales amount. Clearly, this represents a microeconomic approach which seems unsuitable when analyzing the default likelihood of large or listed companies, which are expected to be more involved with the overall financial and macroeconomic scenario.

Recently there has been growing interest in the specification of models explaining the number or the frequency of corporate defaults with a set of exogenous covariates. An example can be found in Giesecke et al. (2011). They focus on modelling the default rate - one of the most widely used measures of the incidence of the default phenomenon, defined as the number of defaulting companies in a certain time period divided by the total number of debt issuers in the same period, and periodically published in rating agency reports. Their empirical analysis considers a large dataset of monthly default rates of US industrial firms, spanning the 1866-2008 period, and is based on the application of a regime-switching model, with the aim of examining the extent to which default rates can be predicted by financial and macroeconomic variables. The econometric specification is the following:

\[
D_t = \alpha_{s_t} + \sum_{k} \beta_k X_{k,t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2) \tag{2.1}
\]

where $X_{t-1}$ is a $k$-vector of exogenous explanatory variables and the $\beta_k$ terms are the corresponding slope coefficients. The intercept $\alpha_{s_t}$ follows a three-state Markov chain taking values $\alpha_1$, $\alpha_2$ and $\alpha_3$ - corresponding to a "low", "medium" and "high" default regime respectively - and $\pi_{ij}$, the probability of transition from state $i$ to state $j$, is the $(i,j)$-th entry of a transition matrix.
Following Hamilton (2005), the model is estimated by a maximum likelihood algorithm based on the recursive updating of the probability $\xi_{j,t}$ of being in state $j$ at time $t$, the recursion expression being

\[
\xi_{j,t} = \frac{\sum_{i=1}^{3} \pi_{ij} \, \xi_{i,t-1} \, \eta_{j,t}}{\sum_{j=1}^{3} \sum_{i=1}^{3} \pi_{ij} \, \xi_{i,t-1} \, \eta_{j,t}} \tag{2.2}
\]

with the conditional likelihood $\eta_{j,t}$ given by

\[
\eta_{j,t} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( - \frac{\left( D_t - \alpha_j - \sum_{k=1}^{N} \beta_k X_{k,t-1} \right)^2}{2\sigma^2} \right) \tag{2.3}
\]

Among the regressors the authors include both business cycle variables, such as GDP and Industrial Production (IP) growth, and financial covariates (stock returns, change in return volatility and change in credit spread), as well as the lagged default rate itself. Several covariates, like the change in return volatility and returns themselves, turn out to be significant in explaining default rate dynamics, while others, such as growth in Industrial Production and the change in credit spreads, have low explanatory power. An interesting point - which does not seem to be deeply investigated in the paper - is the high value of the lagged default rate coefficient, highlighting the relevance of the autoregressive component in default rate evolution. The maximum likelihood estimate of the time-varying intercept goes from a minimum of 0.007 in the "low" regime to a value of 0.111 under the worst scenario, so it is in general quite low. The "Dot-Com bubble" of 2001-2002, for instance, corresponds to a high default regime, although its severity is not comparable to other crisis periods such as the Great Depression.

Other empirical studies which try to find a connection between the business cycle and default rates are, amongst others, Kavvathas (2001) and Koopman and Lucas (2005). A missing element in this kind of approach is the absence of firm-specific variables, which are instead present in other, even earlier, works, such as Duffie et al. (2007).
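The filtering recursion (2.2)-(2.3) can be sketched in a few lines (a minimal illustration with made-up data and parameters, not the authors' implementation):

```python
import numpy as np

def hamilton_filter(D, X, P, alphas, betas, sigma2, xi0):
    """Filtered regime probabilities for the switching regression
    D_t = alpha_{s_t} + sum_k beta_k X_{k,t-1} + eps_t."""
    T, k = len(D), len(alphas)
    xi = np.empty((T, k))
    prev = xi0
    for t in range(T):
        means = alphas + X[t] @ betas                   # state-dependent means
        eta = (np.exp(-(D[t] - means) ** 2 / (2 * sigma2))
               / np.sqrt(2 * np.pi * sigma2))           # conditional likelihoods
        num = (P.T @ prev) * eta                        # predict, then update
        xi[t] = num / num.sum()                         # normalise: eq. (2.2)
        prev = xi[t]
    return xi

# Made-up two-state example (the paper uses three regimes).
rng = np.random.default_rng(2)
P = np.array([[0.95, 0.05], [0.10, 0.90]])              # transition matrix
D = rng.normal(size=50)                                 # fake default rates
X = rng.normal(size=(50, 1))                            # one fake lagged covariate
xi = hamilton_filter(D, X, P, alphas=np.array([0.0, 1.0]),
                     betas=np.array([0.2]), sigma2=1.0,
                     xi0=np.array([0.5, 0.5]))
```

Summing the per-period normalising constants (in logs) yields the likelihood maximised in the estimation step.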
Duffie et al. (2007) provide maximum likelihood estimators of multi-period conditional probabilities of corporate default incorporating the dynamics of both firm-specific and macroeconomic variables. The empirical analysis is again based on a dataset of defaults among Moody's rated US industrial firms. With regard to the modelling framework, a Cox regression model for counting processes is used: this approach is shared by some of the works on default correlation presented in Section 2.2, so it will be described in detail later. The individual firm covariates considered in Duffie et al. (2007) are the previously defined distance-to-default and the firm's trailing stock return, while the aggregate regressors are the trailing S&P 500 return and the three-month Treasury bill rate. The lack of significance of other variables, such as credit spreads and GDP growth, which are instead expected to be relevant in default prediction, is quite surprising and recalls the results of Giesecke et al. (2011).

2.1.1 The role of rating

When talking about default predictability, an analysis of the role of credit rating information cannot be avoided. Rating is, indeed, the main result of the evaluation of a company's solvency made by specialized agencies. The rating information is synthetic and categorical, two features that summarize the potential advantage of this kind of evaluation and explain the wide use of ratings in support of pricing and investment decisions. Furthermore, rating agencies' methodologies should rely on statistical and econometric models, thus giving a quantitative judgement which is reasonably thought to be objective.
However, in recent years some well-known cases, like that of Lehman Brothers - whose collapse was not preceded by any timely rating downgrade: Standard & Poor's maintained the investment-grade rating of "A" and Moody's downgraded Lehman only one business day before the bankruptcy announcement - have given rise to a heated debate about possible mistakes in rating evaluation and about whether aspects other than a rational and documented quantitative analysis influence the actions of rating agencies. Beyond the often unproductive and simplistic discussions trying to mark rating as "good" or "bad", the question arising in a proper econometric analysis is whether the current rating of a firm is a good predictor of its default probability.

There is a double link between rating and the probability of default (henceforth PD). First of all, "default" is one of the classes characterizing the rating scale: class "D" is present in the classification used by all the main rating agencies, such as Fitch, Moody's and Standard & Poor's. In the long-term rating assignment, the companies in the "default class" are those that have already failed to repay all or some of their obligations, even if bankruptcy has not yet been officially declared; in the short-term rating scale, class "D" corresponds to an effective state of insolvency. Secondly, the periodical material of rating agencies establishes a correspondence between rating classes and PD, based on historical default rates of firms with different rating scores.

As an example, we briefly describe the Moody's approach to rating attribution: the output of its proprietary (KMV) model - based on the application of Merton's option pricing formulas in order to derive the market value of assets and its volatility from the market value of equity (firm stocks) - is the so-called Expected Default Frequency (EDF).
Figure 2.1 gives a graphical representation of the EDF as the probability that the firm's assets fall below a certain threshold over a given time horizon, typically one year or more, based on the hypothesis of log-normal dynamics of the asset value which is typical of the Black-Scholes modelling framework.

Figure 2.1: Illustration of EDF determined by Moody's KMV. Source: Moody's.

To each interval of EDF, Moody's associates a class of what the agency itself defines as implied rating and declares to be a relevant component of the overall rating, the latter also including qualitative and discretionary considerations. Thus, implied rating represents the link between rating and PD.

The econometric analysis of rating is mainly based on the modelling of rating history, that is, the changes in a firm's rating. This is also motivated by the fact that a kind of information widely used in the risk management of financial institutions is given by rating transition matrices, both historical and forecasted. The general framework of the models for rating, characterizing, among others, Jarrow, Lando and Turnbull (1997), is the following. A Markov chain is defined on a finite space of states:

\[
S = \{1, 2, \ldots, k\} \tag{2.4}
\]

Each state corresponds to a different rating class, and the $k$-th state is the default category; hence, following, as an example, Moody's classification, we may write

\[
S = \{AAA, AA, \ldots, D\}
\]

It is assumed that the Markovian process describing rating evolution is homogeneous, i.e. its transition matrix does not change over time. The transition matrix $Q$ for (2.4) is defined as follows:

\[
Q = \begin{pmatrix}
q_{1,1} & q_{1,2} & \cdots & q_{1,k} \\
q_{2,1} & q_{2,2} & \cdots & q_{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
q_{k-1,1} & q_{k-1,2} & \cdots & q_{k-1,k} \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]

where the generic entry $q_{i,j}$ is the probability that a company belonging to rating class $i$ in $t$ will have rating $j$ in $t+1$.
Trivially, the following conditions must be verified for $i = 1, \ldots, k$:
$$q_{i,j} \geq 0, \qquad q_{i,i} = 1 - \sum_{j=1, \; j \neq i}^{k} q_{i,j}$$
Note that the last row of $Q$ corresponds to the obvious assumption that default is an absorbing state, i.e. it is not possible to move from state $k$ to another state. The assumed homogeneity implies that the matrix $Q(t,T)$, containing the probabilities $q_{i,j}(t,T)$ of being in state $i$ at time $t$ and in state $j$ at $T$, is obtained by simply multiplying $Q$ by itself:
$$Q(t,T) = Q^{T-t}$$
The transition probabilities are, in general, given by historical data on average rating change rates. Another possibility is that of deriving "risk-neutral" transition probabilities by multiplying $Q$ by a matrix containing credit risk premiums estimated from empirical credit spreads. In this framework, the PD by time $T$, calculated at $t$ for a firm currently in class $i$, is defined as
$$PD(t,T) = q_{i,k}(t,T) \qquad (2.5)$$
This approach is simple and operationally appealing. Lando and Skødeberg (2002) revisit it by introducing a corrected transition matrix that takes into account the rating changes that occurred between $t$ and $T$, as ignoring them can lead to underestimating the probability of downgrade. A more complex intensity-based model for rating transitions has instead been proposed by Koopman et al. (2008). With regard to the investigation of the predictive power of rating information through empirical analyses, a common strategy in econometric works is that of analyzing how much the current rating of a firm really incorporates the stage of the business cycle and the risk profile of its sector, by studying the dependence of the published rating transition probabilities on a set of indicators. Nickell et al. (2000) find that business cycle effects have a strong impact on rating especially for low-grade issuers, while Behar and Nagpal (2001) argue that the current rating of a firm seems not to incorporate much of the influence of the macroeconomic context on default rates.
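The mechanics of the homogeneous Markov framework can be sketched numerically with a hypothetical three-state scale $\{A, B, D\}$: multi-period transition matrices are matrix powers of $Q$, and the PD at any horizon is read off the last column.

```python
import numpy as np

# Hypothetical one-period transition matrix on S = {A, B, D},
# with D (default) absorbing: rows sum to one, last row = (0, 0, 1)
Q = np.array([
    [0.95, 0.04, 0.01],
    [0.10, 0.85, 0.05],
    [0.00, 0.00, 1.00],
])

def transition_matrix(Q, horizon):
    """Under homogeneity, Q(t, T) = Q**(T - t)."""
    return np.linalg.matrix_power(Q, horizon)

def pd_by_horizon(Q, state, horizon):
    """PD of a firm now in `state`: probability of being in the
    absorbing default state k after `horizon` periods."""
    return transition_matrix(Q, horizon)[state, -1]

pd_A_5 = pd_by_horizon(Q, 0, 5)   # 5-period PD for an A-rated firm
pd_B_5 = pd_by_horizon(Q, 1, 5)   # 5-period PD for a B-rated firm
```

As expected, the PD increases with the horizon, is larger for lower rating classes, and equals one for a firm already in the absorbing default state.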
2.2 Default correlation and Contagion

When modelling the rate or the number of defaults, one of the main objectives is finding macroeconomic variables and financial indicators able to predict the peaks in the number of defaults, in support of financial supervision and central bank decisions. Another crucial topic a great part of the literature focuses on is default correlation: are corporate defaults independent rare events, or are there connections between them? First, there are several works supporting the hypothesis of default correlation with empirical analyses. For example, Das et al. (2006) document default correlation (derived as correlation between individual default probabilities in an intensity-based setting) in various economic sectors and emphasize that correlation effects are time-varying. They further claim that it is possible to distinguish between two "default regimes": a high regime characterized by higher correlation and a low regime in which correlation is modest. Another important aspect is the already mentioned possibility of contagion effects, by which one firm's default directly increases the likelihood of other firms defaulting, generating the "default cascade" effect which seems to characterize crisis periods. Some examples of contagion models include Davis and Lo (2001), Jarrow and Yu (2001) and Azizpour and Giesecke (2008a). These models share the assumption that the default event of one firm directly triggers the default of other firms or causes their default probabilities to increase. A missing element in this kind of modelling is testing the hypothesis of conditional independence between default events, which are probably subject to a common source of randomness due to the mutual exposure to common risk factors. The test of the doubly stochastic assumption, i.e.
the assumption that defaults are independent after conditioning on common factors, has been introduced in two recent works about contagion, Das et al. (2007) and Lando and Nielsen (2010), the latter reviewed in the following. Both examine whether default events in an intensity-based setting can be considered conditionally independent by testing whether the bankruptcy count behaves as a standard Poisson process. This means verifying, in an intensity-based setting, the doubly stochastic assumption, under which default events depend only on exogenous variables. A distinct class of models for contagion is that of the so-called frailty models. They aim at identifying latent (unobservable) factors acting as an additional channel for the spread of defaults. As stated in Azizpour and Giesecke (2008b), in frailty models default clustering is indeed explained by three kinds of factors:
- observable common factors: changes and shocks in the macroeconomic and financial context;
- frailty factors: unobservable common factors affecting corporate solvency;
- contagion: the direct negative impact that a default event has on other companies. This can be due to contractual relationships linking firms to each other, but also to the "informational" aspect, as bankruptcy announcements increase market uncertainty and cause a decrease in the value of the stock portfolios of both industrial and banking firms, with important consequences on credit supply and on companies' financial conditions. The effects of default announcements are also treated in Lang and Stulz (1992).
In this class of models, including, among others, Duffie et al. (2009), Azizpour et al. (2010) and Koopman et al. (2011), both frailty and contagion effects are analyzed with self-exciting point processes.
These are characterized by the specification of the conditional instantaneous default intensity of a counting process, that is of the infinitesimal rate at which events are expected to occur around a certain time, allowing for dependence on the timing of previous events. The major reference for this approach is the self-exciting process defined by Hawkes (1971). A different specification of the conditional default intensity can be found in Focardi and Fabozzi (2005) and Chou (2012): both use the Autoregressive Conditional Duration (ACD) model introduced by Engle and Russell (1998). In the ACD model, the expectation of the duration, i.e. of the interval between two arrival times, conditional on the past is first specified, and the conditional intensity is then expressed as the product of a baseline hazard rate (as in the tradition of proportional hazard models for survival data) and a function of the expected duration.

2.3 The study of default correlation through count models

The economic and financial relevance of the default phenomenon, showing peaks of incidence like the sharp one in the crisis period of 2008-2010, has led to an increasing interest in modelling and forecasting time series of corporate default counts. Modelling time series of counts rather than the default rate is quite common and is justified by the fact that the default rate denominator, i.e. the total number of borrowers in a certain economic sector or rating class, is usually known to risk managers somewhat in advance. It is also possible to note (see Figure 1.1, for instance) that the time series of default counts and default rates share a very similar trend.

2.3.1 Testing conditional independence of defaults

According to the doubly stochastic assumption, default events depend uniquely on exogenous variables, which means they are independent conditionally on common macroeconomic and financial factors.
A method for testing this assumption is developed by Lando and Nielsen (2010), revisiting the time-change test already used by Das et al. (2007), though reaching different results. In Lando and Nielsen (2010), the default time of a firm is modelled through its stochastic default intensity. If firm $i$ is alive at time $t$, its conditional intensity at $t$, i.e. its conditional mean default arrival rate, satisfies
$$\lambda_{it} = \lim_{\Delta t \to 0} \frac{P(t < \tau_i \leq t + \Delta t \mid \tau_i > t, \mathcal{F}_t)}{\Delta t} \qquad (2.6)$$
where $\tau_i$ is the default time of firm $i$. This means that the probability of default within a small time period $\Delta t$ after $t$ is close to $\lambda_{it} \Delta t$, where $\lambda_{it}$ depends on the information available at time $t$ as represented by $\mathcal{F}_t$. The individual firm default intensity is then specified through a Cox regression:
$$\lambda_{it} = R_{it} \exp(\beta_W' W_t + \beta_X' X_{it}) \qquad (2.7)$$
where $W_t$ is the vector of covariates that are common to all companies, $X_{it}$ contains firm-specific variables, and $R_{it}$ is a dummy variable which takes value 1 if firm $i$ is alive and observable at time $t$, and zero otherwise. The crucial point is to determine which firm-specific and macroeconomic variables are significant explanatory variables in the regression of the default intensity. The Cox regression model was introduced by Cox (1972) in a survival data setting and then extended to the general counting process framework by Andersen and Gill (1982). This approach arises from the Cox proportional hazard model, a semi-parametric proportional hazard model making no assumptions about the shape of the baseline hazard function $h(t)$ in the definition of the conditional intensity.
The latter is in general expressed as:
$$h(t \mid X) = h(t) \exp(\beta_1 X_1 + \ldots + \beta_p X_p)$$
The theory of Cox regression provides the partial log-likelihood to be maximized by standard techniques in order to draw inference about the parameter vector $\beta = (\beta_W, \beta_X)$:
$$l(\beta) = \sum_{i=1}^{n} \int_0^T (\beta_W' W_t + \beta_X' X_{it}) \, dN_i(t) - \sum_{i=1}^{n} \int_0^T R_{it} \exp(\beta_W' W_t + \beta_X' X_{it}) \, 1_{(\tau_i > t)} \, dt \qquad (2.8)$$
where $N_i(t)$ is the one-jump process which jumps to 1 if firm $i$ defaults by time $t$, $n$ is the total number of firms and $T$ is the terminal time point of the estimation. The cumulative number of defaults among the $n$ firms is then defined as:
$$N(t) = \sum_{i=1}^{n} 1_{(\tau_i \leq t)}$$
The objective is to verify the assumption of orthogonality, i.e. that there are never exactly simultaneous defaults. Under this assumption, the aggregate default intensity is the sum of the individual ones:
$$\lambda(t) = \sum_{i=1}^{n} \lambda_i(t) \, 1_{(\tau_i > t)}$$
In order to perform the test, the cumulative default process has to be "time-scaled", meaning that the time scale is replaced by the intensity scale. This is done by defining the compensator
$$\Lambda(t) = \int_0^t \lambda(s) \, ds$$
which allows one to write the time-changed process as
$$J(t) = N(\Lambda^{-1}(t))$$
It is possible to show that $J(t)$ is a unit-rate Poisson process with jump times $V_i = \Lambda(\tau_{(i)})$, where $\tau_{(1)} \leq \tau_{(2)} \leq \ldots$ are the ordered default times. As a consequence, the interarrival times $V_1, V_2 - V_1, \ldots$ are independent exponentially distributed variables and, for any $c > 0$, the numbers of jump times falling in the intervals $]c(j-1), cj]$,
$$Z_j = \sum_{i=1}^{n} 1_{]c(j-1), \, cj]}(V_i)$$
are independent Poisson variables with intensity $c$. Testing orthogonality of defaults then amounts to splitting the entire time period into intervals over which the cumulative integrated default intensity increases by an integer $c$ and verifying, by means of several test statistics, whether the default counts in each interval are independent and Poisson distributed with mean $c$.
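The logic of the time-change test can be sketched in a simulation where, unlike in the empirical setting, the aggregate intensity is known by construction: the jump times of a simulated counting process are mapped through the compensator, binned so that each bin has integrated intensity $c$, and the resulting counts are checked for Poisson behaviour with the Fisher dispersion statistic. The intensity below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = lambda t: 2.0 + np.sin(t)            # hypothetical aggregate intensity
LAM = lambda t: 2.0 * t - np.cos(t) + 1.0  # its compensator (integral of lam)

# Simulate the counting process on [0, T] by thinning with bound lam_max
T, lam_max = 500.0, 3.0
t, events = 0.0, []
while True:
    t += rng.exponential(1.0 / lam_max)
    if t > T:
        break
    if rng.uniform() < lam(t) / lam_max:
        events.append(t)

# Time change: V_i = Lambda(tau_i) turns the jump times into those of a
# unit-rate Poisson process, so counts over intervals of length c are Poisson(c)
V = LAM(np.array(events))
c = 10.0
counts = np.bincount((V // c).astype(int))[: int(LAM(T) // c)]

# Fisher dispersion statistic: (m - 1) * var / mean, approx chi-square(m - 1)
# under the Poisson hypothesis; gross over- or underdispersion rejects it
m = len(counts)
dispersion = (m - 1) * counts.var(ddof=1) / counts.mean()
```

Under a correctly specified intensity the binned counts behave like i.i.d. Poisson($c$) draws; excess clustering of defaults would inflate the dispersion statistic well beyond its chi-square range.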
Note that the tested property is the independence of defaults conditional on observable common factors, with the aim of detecting an excess of default clustering consistent with the existence of contagion effects. The data used by the authors are the monthly numbers of defaults of Moody's rated US corporate firms that occurred between 1982 and 2005. With regard to covariates, the vector $W_t$ contains the following selection of macroeconomic variables:
- the 1-year return on the S&P index;
- the 3-month US Treasury bill rate;
- the 1-year percentage change in US industrial production, calculated from monthly data;
- the spread between the 10-year and the 1-year Treasury rate;
while the firm-specific covariates entering the vector $X_{it}$ are:
- the 1-year equity return;
- the 1-year Moody's distance-to-default;
- the quick ratio, calculated as the sum of cash, short-term investments and total receivables divided by current liabilities;
- the log book asset value.
The results obtained in the paper by applying the time-change method and then using several test statistics to verify the Poisson assumption (such as the Fisher dispersion and upper tail statistics) lead to acceptance of the hypothesis that default times are conditionally independent, which was rejected in Das et al. (2007). The authors claim that this is due to the use of a different set of explanatory variables, so that the contagion effects apparently revealed by the previous analysis are instead explained by missing covariates. They also argue that the time-change test is actually a misspecification test, as the hypothesis of correct intensity specification is satisfied by construction, and that, furthermore, the doubly stochastic assumption is not needed for orthogonality of default times.
They indeed find no evidence of contagion when considering a different specification, namely the Hawkes self-exciting process
$$\lambda_{it} = R_{it} \exp(\beta_W' W_t + \beta_X' X_{it}) + \int_0^t (\alpha_0 + \alpha_1 Y_s) \exp(-\alpha_2 (t - s)) \, dN_s \qquad (2.13)$$
where $Y_s$ is the log book asset value of the firm defaulting at time $s$. Model (2.13) explicitly includes a contagion effect through an affine function of $Y$, so that the bankruptcy of larger firms has a higher impact on the individual default intensities. The exponential function makes the default impact decay exponentially with time, with $\alpha_2$ measuring the time horizon of influence of a default on the overall intensity. Estimation can be carried out by standard partial maximum likelihood tools (see, for example, Andersen et al., 1992). In a recent extension of Lando and Nielsen (2010), Lando et al. (2013) replace the Cox multiplicative model with an additive default intensity, based on Aalen's (1989) regression model, where the covariate effects act additively on a baseline intensity. The authors claim that the advantage of this model is that it allows for the introduction of time-varying effects without the need for estimation procedures more complex than least squares methods. The focus moves from the test of the conditional independence hypothesis characterizing the previous paper to the search for predictive variables acting on the default intensity with nonconstant magnitude. The results are partly different from those reached by the previous analysis: the time-varying effects of firm-specific variables like distance-to-default and short-to-long term debt are found significant, but none of the macroeconomic covariates (many of which were already successfully employed in Lando and Nielsen, 2010) are. A problem in the interpretation of the results is that some of the coefficients are negative, thus leading to negative default intensities, which is meaningless from a technical point of view.
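A self-exciting intensity of this kind can be simulated by thinning. The sketch below uses a simplified Hawkes specification in which the covariate part of (2.13) is replaced by a constant baseline, and the parameters mu0 (baseline), alpha (jump size per default) and beta (decay rate) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters; alpha < beta keeps the process stationary
mu0, alpha, beta = 0.5, 0.4, 1.0

def intensity(t, events):
    """lambda(t) = mu0 + sum over past defaults s < t of alpha*exp(-beta(t-s)):
    each default raises the intensity, and its impact decays exponentially."""
    return mu0 + sum(alpha * np.exp(-beta * (t - s)) for s in events if s < t)

def simulate_hawkes(T):
    """Ogata-style thinning: between events the intensity only decays, so the
    value just after the most recent point is a valid local upper bound
    (the extra +alpha covers an event occurring exactly at the current time)."""
    t, events = 0.0, []
    while t < T:
        lam_bar = intensity(t, events) + alpha
        t += rng.exponential(1.0 / lam_bar)
        if t < T and rng.uniform() < intensity(t, events) / lam_bar:
            events.append(t)
    return events

events = simulate_hawkes(200.0)
# Stationary mean rate is mu0 / (1 - alpha/beta); the clustering produced by
# self-excitation is what a pure Poisson(mu0) benchmark cannot reproduce.
```

Right after a simulated default the intensity jumps by alpha and then relaxes back towards the baseline, which is exactly the cascade mechanism the contagion term in (2.13) is meant to capture.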
With regard to this aspect, the authors claim that the default intensity should be interpreted as a risk measure rather than as an expected rate, and that negative values could indicate that a firm is only weakly exposed to the risk of failure.

2.3.2 An Autoregressive Conditional Duration model of credit risk contagion

The use of self-exciting processes for representing the cascading phenomenon of bankruptcies was already present, with a different specification, in a previous work. Focardi and Fabozzi (2005) indeed propose a self-exciting point process. The model belongs to the autoregressive conditional duration (ACD) family introduced by Engle and Russell (1998) and is based on the idea of modelling default clustering with econometric techniques that are the point process analogue of ARCH-GARCH models. Applying the ACD specification to the number of defaults, the default process on a time interval $(0, t)$ is defined as a sequence of default times $t_i$, $i = 1, 2, \ldots$, with the related durations between defaults $\Delta t_i = t_{i+1} - t_i$. The model is specified in terms of the conditional densities of the durations, defining
$$E[\Delta t_i \mid \Delta t_{i-1}, \ldots, \Delta t_1] = \psi(\Delta t_{i-1}, \ldots, \Delta t_1; \theta) = \psi_i \qquad (2.9)$$
and
$$\Delta t_i = \psi_i \varepsilon_i \qquad (2.10)$$
where the $\varepsilon_i$ are i.i.d. variables and $\theta$ is a parameter vector. It is then assumed that the expectation of the present duration is linearly determined by the last $m$ durations between defaults and the last $q$ expected durations:
$$\psi_i = \omega + \sum_{j=1}^{m} \alpha_j \Delta t_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j} \qquad (2.11)$$
This model is called an ACD($m, q$) model. The authors apply ACD models to simulated data of default durations in order to evaluate the impact of different expected durations on the value of a credit portfolio.
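The recursion (2.9)-(2.11) can be sketched by simulating an ACD(1,1) process with unit-mean exponential innovations; the parameter values below are hypothetical and satisfy the stationarity condition $\alpha + \beta < 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ACD(1,1) parameters; omega > 0 and a + b < 1 for stationarity
omega, a, b = 0.2, 0.3, 0.5

def simulate_acd(n):
    """psi_i = omega + a * dt_{i-1} + b * psi_{i-1};  dt_i = psi_i * eps_i
    with i.i.d. unit-mean exponential innovations eps_i."""
    psi = omega / (1 - a - b)          # start at the unconditional mean
    dt = psi
    durations, psis = [], []
    for _ in range(n):
        psi = omega + a * dt + b * psi
        dt = psi * rng.exponential(1.0)
        durations.append(dt)
        psis.append(psi)
    return np.array(durations), np.array(psis)

dur, psi = simulate_acd(20000)
# Unconditional mean duration is omega / (1 - a - b) = 1.0 here; as in
# GARCH, the model produces clustering: short durations (i.e. bursts of
# defaults) tend to be followed by short durations.
```

The positive first-order autocorrelation of the simulated durations is the point process analogue of volatility clustering, which is precisely why the ACD family is used to represent default cascades.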
2.4 Concluding remarks

We have investigated how the econometric and financial literature has approached the modelling of default risk and the interpretation of the related empirical results from the perspective of default predictability and correlation, also clarifying the origin and the issues of the current debate about contagion. The search for explanatory variables of the evolution of default rates and counts has led to not always obvious results because, for example, the link with business cycle indicators and macroeconomic variables does not appear to be so strong. We have also considered the discussion on the predictive power of rating and described some common approaches to the modelling of rating transitions. We have progressively focused on models which consider count processes for investigating corporate default dynamics. Many of these models aim at analyzing default correlation. With regard to this topic, we claim that the idea of distinguishing between common factors and contagion, thus separating systematic risk from other risk components, is worth investigating further. An aspect which still seems somewhat missing in the literature is that of autoregressive components in the default dynamics, which could lead to interesting considerations about the persistence of the default phenomenon. Such components are indeed present in Focardi and Fabozzi (2005), but without considering the role of covariate processes, thus giving a limited definition of contagion which does not take into account crucial aspects of credit and financial risk, and without presenting any application to real data. Our approach to default risk modelling, which we present in Chapter 4, considers both exogenous variables and autoregressive components and is applied to an empirical corporate default count time series in Chapter 5.

Chapter 3

Econometric modelling of Count Time Series

This chapter presents the main models for count time series.
They are based on the theory of Generalized Linear Models for time series, which is reviewed in the first section. The aim of the following sections is to provide a critical review, focused on the suitability of the presented models for explaining some features commonly found in empirical count time series, such as overdispersion in the data. This is instrumental to the rest of our work, which proposes a modelling framework for default count data based on the extension of the Poisson autoregressive model introduced in the last section.

3.1 Generalized Linear Models for time series

It is well known that generalized linear models (GLM), introduced by Nelder and Wedderburn (1972), allow one to extend ordinary linear regression to nonnormal data. Applying the theory of GLM to time series thus makes it possible to handle very common processes like binary and count data, which are not normally distributed. Before presenting the most important applications of GLM to the modelling of count data, it is important to introduce the concept of partial likelihood, due to Cox (1975). Partial likelihood is a useful tool when the observations are dependent and the covariates representing auxiliary information are also random and time dependent. In these situations the likelihood function is not readily available, as the lack of independence prevents a simple factorization. Consider a generic response time series $\{y_t\}$, $t = 1, \ldots, T$. If no other assumption is added, the joint density $f(y_1, \ldots, y_T)$, parametrized by a vector $\theta$, is factorized as
$$f(y_1, \ldots, y_T) = f(y_1) \prod_{t=2}^{T} f(y_t \mid y_1, y_2, \ldots, y_{t-1}) \qquad (3.1)$$
where the main difficulty is that, if no further assumption is made, the size of $\theta$ increases with the series length $T$.
A more tractable likelihood function can be obtained by introducing limitations on the conditional dependence, such as Markovianity, according to which we could use, for example, the following factorization:
$$f(y_1, \ldots, y_T) = f(y_1) \prod_{t=2}^{T} f(y_t \mid y_{t-1}) \qquad (3.2)$$
where inference regarding $\theta$ can be based on the product term only, as the first factor does not depend on $T$. Then, consider the case where the response variable is observed jointly with some time-dependent random covariate $X_t$. The joint density of the $X$ and $Y$ observations can be written, using conditional probabilities, as:
$$f(x_1, y_1, \ldots, x_T, y_T) = f(y_1) \left[ \prod_{t=2}^{T} f(x_t \mid d_t) \right] \left[ \prod_{t=2}^{T} f(y_t \mid c_t) \right] \qquad (3.3)$$
where $d_t = (y_1, x_1, \ldots, y_{t-1}, x_{t-1})$ and $c_t = (y_1, x_1, \ldots, y_{t-1}, x_{t-1}, x_t)$. The idea of Cox is to take into account only the second product on the right-hand side of (3.3), which is a "partial" likelihood in the sense that it does not consider the conditional distribution of the covariate process $X_t$. Moreover, it does not specify the full joint distribution of the response and the covariates. Cox (1975) shows that the second product term in (3.3) can be used for inference, although it ignores a part of the information about $\theta$. The general definition of the partial likelihood (PL) relative to $\theta$, $\mathcal{F}_{t-1}$ and the observations $y_1, \ldots, y_T$ applies this idea jointly with that of the limited conditional dependence mentioned above. Considering only what is known to the observer up to the present time allows for sequential conditional inference:
$$PL(\theta; y_1, \ldots, y_T) = \prod_{t=1}^{T} f(y_t; \theta \mid \mathcal{F}_{t-1}) \qquad (3.4)$$
where $\mathcal{F}_{t-1}$ is the filtration generated by all that is known to the observer at the time $y_t$ is generated, possibly including the information given by a random covariate process.
Note that this definition simplifies to the ordinary likelihood when there is no auxiliary information and the data are independent, while it becomes a conditional likelihood when a deterministic (i.e. known throughout the period of observation) covariate process is included. This formulation enables conditional inference for non-Markovian processes where the response depends on autoregressive components and past values of covariates, as it does not require full knowledge of the joint distribution of the response and the covariates. The vector maximizing equation (3.4) is called the maximum partial likelihood estimator (MPLE) and its theoretical properties have been studied by Wong (1986). We now show how the theory of GLM and partial likelihood can be applied to time series (see Kedem and Fokianos, 2002 for a complete review). Consider again the response series $\{y_t\}$, $t = 1, \ldots, T$, and include a $p$-dimensional vector of explanatory variables $x_t = (x_{t,1}, \ldots, x_{t,p})'$. Then denote the $\sigma$-field generated by the past responses and covariates as
$$\mathcal{F}_{t-1} = \sigma\{y_{t-1}, y_{t-2}, \ldots, x_{t-1}, x_{t-2}, \ldots\}$$
It is often convenient to define $Z_t = (y_t, x_t)'$, which contains both the response and the covariates, so that
$$\mathcal{F}_{t-1} = \sigma\{Z_{t-1}, Z_{t-2}, \ldots\}$$
The main feature of GLM for time series is the definition of the conditional expectation of $y_t$ given the past of the process $Z_t$:
$$\mu_t = E[y_t \mid \mathcal{F}_{t-1}] \qquad (3.5)$$
It is worth noting that defining the expected value of $y_t$ as a linear function of the covariates can lead to senseless results when the data are not normal. For instance, linear regression of $\mu_t$ on the covariates may lead to negative estimates of the intensity when the response is Poisson distributed. The GLM approach to time series can be stated in two steps: 1.
Random component: the conditional distribution of the response given the past belongs to the exponential family of distributions, that is
$$f(y_t; \theta_t \mid \mathcal{F}_{t-1}) = \exp\{y_t \theta_t + b(\theta_t) + c(y_t)\} \qquad (3.6)$$
where $\theta_t$ is the natural (or canonical) parameter of the distribution. By setting $\prod_{t=1}^{T} f(y_t; \theta_t \mid \mathcal{F}_{t-1}) = \prod_{t=1}^{T} f(y_t; \theta_t)$, the latter product defines a partial likelihood in the sense of Cox (1975), as it conditions on a nested sequence of histories and does not require knowledge of the full likelihood.

2. Systematic component: there exists a monotone function $g(\cdot)$ such that
$$g(\mu_t) = \eta_t = \sum_{j=1}^{p} \beta_j Z_{(t-1)j} = Z_{t-1}' \beta \qquad (3.7)$$
where we call $g(\cdot)$ the link function and $\eta_t$ the linear predictor of the model, while $\beta$ is a vector of coefficients. It is quite common to also include $x_t$, i.e. the present value of $x$, in the covariate vector if it is already known at $t-1$. This can happen, for instance, when $x$ is a deterministic process or when $y_t$ is a delayed output. We then refer to $g^{-1}(\cdot)$ as the inverse link function.

3.2 The Poisson Model

3.2.1 Model specification

When handling count data, a natural candidate is the Poisson distribution. If we assume that the conditional density of the response given the past, i.e. given the available information up to time $t$, is that of a Poisson variable with mean $\mu_t$, we get
$$f(y_t; \mu_t \mid \mathcal{F}_{t-1}) = \frac{\mu_t^{y_t} \exp(-\mu_t)}{y_t!}, \qquad t = 1, \ldots, T \qquad (3.8)$$
In the Poisson model, the conditional expectation of the response is equal to its conditional variance:
$$E[y_t \mid \mathcal{F}_{t-1}] = \mathrm{Var}[y_t \mid \mathcal{F}_{t-1}] = \mu_t \qquad (3.9)$$
We then denote by $\{Z_{t-1}\}$, $t = 1, \ldots, T$, a $p$-dimensional vector of covariates which may include past values of the response and other auxiliary information. A typical choice is $Z_{t-1} = (1, y_{t-1}, x_t)'$, but it is also possible to consider interactions between the processes by defining, for instance, $Z_{t-1} = (1, y_{t-1}, x_t, y_{t-1} x_t)'$.
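As a small illustration, covariate vectors of the kind just described can be assembled from an observed count series and an exogenous covariate as follows; the series are toy data.

```python
import numpy as np

def build_design(y, x, interaction=False):
    """Stack Z_{t-1} = (1, y_{t-1}, x_t)' row by row for t = 2, ..., T,
    optionally adding the interaction y_{t-1} * x_t; the first observation
    is lost to the lag."""
    y_lag = np.asarray(y[:-1], dtype=float)   # y_{t-1}
    x_t = np.asarray(x[1:], dtype=float)      # x_t, assumed known at t-1
    cols = [np.ones_like(y_lag), y_lag, x_t]
    if interaction:
        cols.append(y_lag * x_t)
    return np.column_stack(cols)

y = [3, 1, 4, 1, 5]             # toy count series
x = [0.1, 0.2, 0.3, 0.4, 0.5]   # toy exogenous covariate
Z = build_design(y, x, interaction=True)
```

Each row of the resulting matrix is one realization of $Z_{t-1}$, ready to enter the linear predictor $\eta_t = Z_{t-1}'\beta$.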
Following the theory of GLM and recalling (3.7), a suitable model is obtained by setting
$$g(\mu_t) = \eta_t = Z_{t-1}' \beta, \qquad t = 1, \ldots, T \qquad (3.10)$$
where $\beta$ is a $p$-dimensional vector of unknown parameters. The most common model is that using the canonical link function, which is derived from the canonical form of the Poisson conditional density:
$$f(y_t; \mu_t \mid \mathcal{F}_{t-1}) = \exp\{y_t \log \mu_t - \mu_t - \log y_t!\}, \qquad t = 1, \ldots, T \qquad (3.11)$$
where the natural parameter turns out to be $\log \mu_t$. Hence, $g(\mu_t) = \log \mu_t$, $t = 1, \ldots, T$, is defined as the canonical link, while the inverse link function $g^{-1}$ guarantees that $\mu_t > 0$ for every $t$, as:
$$g^{-1}(\eta_t) = \exp(\eta_t), \qquad t = 1, \ldots, T \qquad (3.12)$$
The resulting definition of the intensity
$$\mu_t = \exp(Z_{t-1}' \beta), \qquad t = 1, \ldots, T \qquad (3.13)$$
characterizes the so-called log-linear model, which has been widely applied in econometrics since Hausman et al. (1984).

3.2.2 Inference

Consider first the estimation of the parameter vector $\beta = (\beta_1, \ldots, \beta_p)$ for the general case of the Poisson model with $g(\mu_t) = Z_{t-1}' \beta$. Recalling (3.4), the partial likelihood function is
$$PL(\beta) = \prod_{t=1}^{T} f(y_t; \beta \mid \mathcal{F}_{t-1}) = \prod_{t=1}^{T} \frac{\exp(-\mu_t(\beta)) \, \mu_t(\beta)^{y_t}}{y_t!} \qquad (3.14)$$
Hence, the partial log-likelihood is the following:
$$l(\beta) \equiv \log PL(\beta) = \sum_{t=1}^{T} y_t \log \mu_t(\beta) - \sum_{t=1}^{T} \mu_t(\beta) - \sum_{t=1}^{T} \log y_t! \qquad (3.15)$$
The partial score function is then obtained by differentiating the log-likelihood:
$$S_T(\beta) = \nabla l(\beta) = \left( \frac{\partial l(\beta)}{\partial \beta_1}, \ldots, \frac{\partial l(\beta)}{\partial \beta_p} \right)' = \sum_{t=1}^{T} Z_{t-1} \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \frac{y_t - \mu_t(\beta)}{\mu_t(\beta)} \qquad (3.16)$$
The MPLE $\hat\beta$ (see Wong, 1986) is then obtained by solving the system
$$S_T(\beta) = \nabla l(\beta) = 0 \qquad (3.17)$$
which has to be solved numerically, because it is nonlinear. Besides the use of standard Newton-Raphson type algorithms, a possible method for solving (3.17) is Fisher scoring, a modification of the Newton-Raphson algorithm in which the observed information matrix is replaced by its conditional expectation, yielding some computational advantages.
The application of the Fisher scoring method to the partial likelihood estimation of the Poisson model is presented in Kedem and Fokianos (2002). Define first the observed information matrix as
$$H_T(\beta) = -\nabla \nabla' l(\beta) \qquad (3.18)$$
It admits the following decomposition:
$$H_T(\beta) = G_T(\beta) - R_T(\beta) \qquad (3.19)$$
where $G_T(\beta)$ is the cumulative conditional information matrix, which is defined as
$$G_T(\beta) = \sum_{t=1}^{T} \mathrm{Cov}\left[ Z_{t-1} \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \frac{y_t - \mu_t(\beta)}{\mu_t(\beta)} \,\Big|\, \mathcal{F}_{t-1} \right] = \sum_{t=1}^{T} Z_{t-1} \left( \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \right)^2 \frac{1}{\mu_t(\beta)} Z_{t-1}' = Z' W(\beta) Z \qquad (3.20)$$
where $Z = (Z_0, Z_1, \ldots, Z_{T-1})'$ is a $T \times p$ matrix and $W(\beta) = \mathrm{diag}(w_1, \ldots, w_T)$ with entries
$$w_t = \left( \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \right)^2 \frac{1}{\mu_t(\beta)}, \qquad t = 1, \ldots, T \qquad (3.21)$$
and
$$R_T(\beta) = \sum_{t=1}^{T} Z_{t-1} \, d_t(\beta) \, Z_{t-1}' \, (y_t - \mu_t(\beta))$$
with $d_t(\beta) = \partial^2 \log g^{-1}(\eta_t) / \partial \eta_t^2$. By substituting $H_T$ with $G_T$, if $G_T^{-1}$ exists, the iterations take the form
$$\hat\beta^{(k+1)} = \hat\beta^{(k)} + G_T^{-1}(\hat\beta^{(k)}) S_T(\hat\beta^{(k)}) \qquad (3.22)$$
An interesting feature of Fisher scoring is that it can be viewed as an iterative reweighted least squares (IRLS) method. It should indeed be noted that equation (3.22) can be rewritten as
$$G_T(\hat\beta^{(k)}) \hat\beta^{(k+1)} = G_T(\hat\beta^{(k)}) \hat\beta^{(k)} + S_T(\hat\beta^{(k)}) \qquad (3.23)$$
where the right-hand side is a $p$-dimensional vector whose $i$-th element is
$$\sum_{t=1}^{T} \sum_{j=1}^{p} \frac{Z_{(t-1)j} Z_{(t-1)i}}{\mu_t} \left( \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \right)^2 \hat\beta_j^{(k)} + \sum_{t=1}^{T} \frac{(y_t - \mu_t) Z_{(t-1)i}}{\mu_t} \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} = \sum_{t=1}^{T} Z_{(t-1)i} \, w_t \left[ \eta_t + (y_t - \mu_t) \left( \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \right)^{-1} \right]$$
Thus, defining
$$q_t^{(k)} = \eta_t^{(k)} + (y_t - \mu_t^{(k)}) \left( \frac{\partial g^{-1}(\eta_t)}{\partial \eta_t} \right)^{-1}$$
and denoting by $q^{(k)}$ the $T$-dimensional vector whose elements are the $q_t^{(k)}$, the right-hand side of (3.23) equals $Z' W(\hat\beta^{(k)}) q^{(k)}$. By applying (3.20) to the left-hand side, (3.23) becomes
$$Z' W(\hat\beta^{(k)}) Z \, \hat\beta^{(k+1)} = Z' W(\hat\beta^{(k)}) q^{(k)}$$
and the iteration simplifies to
$$\hat\beta^{(k+1)} = (Z' W(\hat\beta^{(k)}) Z)^{-1} Z' W(\hat\beta^{(k)}) q^{(k)} \qquad (3.24)$$
The limit for $k \to \infty$ of recursion (3.24) is the maximum partial likelihood estimator $\hat\beta$.
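For the canonical (log) link, recursion (3.24) takes a particularly simple form, with weights $w_t = \mu_t$ and adjusted dependent variable $q_t = \eta_t + (y_t - \mu_t)/\mu_t$. The sketch below applies it to data simulated from a log-linear Poisson model with a single exogenous regressor and hypothetical true parameters.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a log-linear Poisson series: log mu_t = b0 + b1 * x_t
T = 5000
x = rng.normal(size=T)
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(beta_true[0] + beta_true[1] * x))

Z = np.column_stack([np.ones(T), x])   # design matrix, rows Z_{t-1}'

def irls_poisson(Z, y, n_iter=25):
    """Fisher scoring as iterative reweighted least squares for the
    canonical link: W = diag(mu_t) and q_t = eta_t + (y_t - mu_t)/mu_t."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        mu = np.exp(eta)
        q = eta + (y - mu) / mu                 # adjusted dependent variable
        W = mu                                  # diagonal weights
        beta = np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (W * q))
    return beta

beta_hat = irls_poisson(Z, y)
```

With a few thousand observations the recursion converges in a handful of iterations and recovers the hypothetical true coefficients up to sampling error; since the link is canonical, this is simultaneously Fisher scoring and Newton-Raphson.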
In each iteration we can recognize the form of weighted least squares with adjusted weights $W(\beta^{(k)})$ and adjusted dependent variable $q^{(k)}$. To initialize the recursion, the conditional means can be replaced by the corresponding responses in order to get a first estimate of the weight matrix $W$ and hence a starting point for $\beta$. When the canonical link is used, we have $\mu_t(\beta) = \exp(Z_{t-1}' \beta)$ and several simplifications are possible. Indeed, for the log-linear model, equations (3.17) and (3.20) become
$$S_T(\beta) = \sum_{t=1}^{T} Z_{t-1} (y_t - \mu_t(\beta)) \qquad (3.25)$$
and
$$G_T(\beta) = \sum_{t=1}^{T} Z_{t-1} Z_{t-1}' \, \mu_t(\beta) \qquad (3.26)$$
Moreover, as $d_t = 0$ in this case, $R_T(\beta)$ vanishes and we get
$$H_T(\beta) = G_T(\beta) \qquad (3.27)$$
thus for the log-linear model the Fisher scoring and Newton-Raphson methods coincide.

3.2.3 Asymptotic theory

In the general theory of GLM, the following assumptions (see Fahrmeir and Kaufmann, 1985 for more details) allow one to show consistency and asymptotic normality of the MPLE $\hat\beta$.

Assumption 1. The true parameter $\beta$ belongs to an open set $B \subseteq \mathbb{R}^p$.

Assumption 2. The covariate vector $Z_{t-1}$ almost surely lies in a nonrandom compact subset $\Gamma$ of $\mathbb{R}^p$, such that $P\left[ \sum_{t=1}^{T} Z_{t-1} Z_{t-1}' > 0 \right] = 1$. In addition, $Z_{t-1}' \beta$ lies almost surely in the domain $H$ of the inverse link function $g^{-1}$ for all $Z_{t-1} \in \Gamma$ and $\beta \in B$.

Assumption 3. The inverse link function $g^{-1}$ is twice continuously differentiable and $\left| \partial g^{-1}(\eta) / \partial \eta \right| \neq 0$.

Assumption 4. There is a probability measure $\nu$ on $\mathbb{R}^p$ such that $\int_{\mathbb{R}^p} z z' \, \nu(dz)$ is positive definite and such that, if the conditional distribution of $y_t$ belongs to the exponential family of distributions in canonical form and under (3.10), for Borel sets $A \subseteq \mathbb{R}^p$,
$$\frac{1}{T} \sum_{t=1}^{T} 1_{[Z_{t-1} \in A]} \stackrel{p}{\to} \nu(A)$$
as $T \to \infty$, at the true value of $\beta$.

Assumption 4 assures the existence of a nonrandom limiting $p \times p$ information matrix
$$G(\beta) = \int_{\mathbb{R}^p} z z' \left( \frac{\partial g^{-1}(\eta)}{\partial \eta} \right)^2 \frac{1}{g^{-1}(\eta)} \, \nu(dz), \qquad \eta = z' \beta \qquad (3.28)$$
such that
$$\frac{G_T(\beta)}{T} \stackrel{p}{\to} G(\beta) \qquad (3.29)$$
Once the above assumptions are stated, the following theorem, providing the asymptotic properties of the MPLE, can be presented.

Theorem 3.1 For the Poisson model, as well as for the general case of GLM, it can be shown that, under Assumptions 1-4, the maximum partial likelihood estimator is almost surely unique for all sufficiently large $T$ and:
1. The MPLE is consistent and asymptotically normal:
$$\hat\beta \stackrel{p}{\to} \beta \quad \text{and} \quad \sqrt{T}(\hat\beta - \beta) \stackrel{d}{\to} N(0, G^{-1}(\beta))$$
as $T \to \infty$.
2. The following holds:
$$\sqrt{T}(\hat\beta - \beta) - \frac{1}{\sqrt{T}} G^{-1}(\beta) S_T(\beta) \stackrel{p}{\to} 0$$
as $T \to \infty$.

3.2.4 Hypothesis testing

Consider the test of the hypothesis
$$H_0 : C' \beta = r$$
where $C$ is a known $p \times q$ matrix with full rank and $r$ is a known $q$-dimensional column vector. Then denote by $\tilde\beta$ the restricted maximum partial likelihood estimator under the null hypothesis. The most commonly used test statistics for testing $H_0$ in the context of the Poisson model are:
- the partial likelihood ratio statistic
$$LR_T = 2 \left\{ \log PL(\hat\beta) - \log PL(\tilde\beta) \right\} \qquad (3.30)$$
- the Wald statistic
$$W_T = T (C' \hat\beta - r)' \left[ C' G^{-1}(\hat\beta) C \right]^{-1} (C' \hat\beta - r) \qquad (3.31)$$
- the partial score statistic
$$LM_T = \frac{1}{T} S_T'(\tilde\beta) G^{-1}(\tilde\beta) S_T(\tilde\beta) \qquad (3.32)$$
Kedem and Fokianos (2002) prove the following theorem concerning the asymptotic distribution of the test statistics defined above.

Theorem 3.2 Under Assumptions 1-4, the test statistics $LR_T$, $W_T$ and $LM_T$ are asymptotically equivalent. Furthermore, under $H_0$, their asymptotic distribution is chi-square with $q$ degrees of freedom.

3.2.5 Goodness of fit

In the context of Poisson regression for count time series, several definitions of residuals can be employed (see Cox and Snell, 1968):
- The raw residual is the difference between the response and its conditional expectation:
$$\hat r_t = y_t - \mu_t(\hat\beta), \qquad t = 1,\dots,T \qquad (3.33)$$

- The Pearson residual is the standardized version of the raw residual, taking into account that the variance of $Y_t$ is not constant:
$$\hat e_t = \frac{y_t - \mu_t(\hat\beta)}{\sqrt{\mu_t(\hat\beta)}}, \qquad t = 1,\dots,T \qquad (3.34)$$

- The deviance residual
$$\hat d_t = \mathrm{sign}\left(y_t - \mu_t(\hat\beta)\right)\sqrt{2\left\{l_t(y_t) - l_t(\mu_t(\hat\beta))\right\}} \qquad (3.35)$$
can be viewed as the $t$-th contribution to the model deviance.

The notion of deviance is based on a likelihood comparison between the full (or saturated) model and the estimated model. The full model is the one where $\mu_t$ is estimated directly from the data $y_1,\dots,y_T$, thus it has as many parameters as observations; in this case the maximum partial likelihood estimate of $\mu_t$ is $y_t$. The estimated model includes $p < T$ parameters instead. Since $l(y;y) \geq l(\hat\mu;y)$, the deviance statistic

$$D = 2\left\{l(y;y) - l(\hat\mu;y)\right\} \qquad (3.36)$$

where $l(y;y)$ denotes the log partial likelihood of the saturated model, has been suggested as a measure of the overall goodness of fit of the model: lower positive values correspond to a better fitted model. Under certain conditions the deviance statistic has been shown to have an approximate $\chi^2_{T-p}$ distribution (see McCullagh, 1986).

In many generalized linear models, including the Poisson, Pearson residuals are known to be skewed and fat-tailed. It can therefore be convenient to use a normalizing transformation, such as the Anscombe residuals, so that the residuals are more likely to achieve approximate normality under the correct model.
In McCullagh and Nelder (1983) these are defined as:

$$\hat a_t = \frac{3\left(y_t^{2/3} - \hat\mu_t^{2/3}\right)}{2\,\hat\mu_t^{1/6}} \qquad (3.37)$$

Autocorrelation of Pearson residuals

The large sample properties of the MPLE stated by Theorem 3.1 imply that $\hat e_t$ is a consistent estimator of $e_t = (y_t - \mu_t(\beta))/\sqrt{\mu_t(\beta)}$, so that the autocorrelation of the $e_t$'s at lag $k$ can be consistently estimated by

$$\hat\rho_e(k) = \frac{1}{T}\sum_{t=k+1}^{T} \hat e_t\, \hat e_{t-k} \qquad (3.38)$$

Li (1991) has proved the following theorem on the asymptotic distribution of the autocorrelation vector.

Theorem 3.3 Under the correct model, the vector

$$\sqrt{T}\,\hat\rho_e = \sqrt{T}\left(\hat\rho_e(1), \hat\rho_e(2), \dots, \hat\rho_e(m)\right)'$$

for some $m > 0$ is asymptotically normally distributed with mean $0$ and a diagonal limiting covariance matrix (see Li, 1991 for details).

Testing the "whiteness" of Pearson residuals is used in many applications for goodness of fit analysis, as under the correct model they should be a white noise, i.e. a sequence of uncorrelated random variables with mean $0$ and finite variance (see Kedem and Fokianos, 2002). Plots of the sample autocorrelation function of Pearson residuals with confidence bands at $\pm 1.96/\sqrt{T}$ are commonly used for goodness of fit evaluation.

3.2.6 Model selection

In GLM for count time series, selection among competing models can be based on the traditional information criteria. The Akaike Information Criterion (AIC), introduced by Akaike (1974), is in the partial likelihood estimation context a function of the partial log-likelihood and the number of parameters:

$$AIC(p) = -2\log PL(\hat\beta) + 2p \qquad (3.39)$$

The model with the number of parameters $p$ which minimizes (3.39) is preferred. The so-called Bayesian information criterion (BIC), following Schwarz (1978), is defined as

$$BIC(p) = -2\log PL(\hat\beta) + p\log T \qquad (3.40)$$

3.3 The doubly-truncated Poisson model

The traditional Poisson model can be generalized, as in Fokianos (2001), by assuming that the conditional distribution of the response is doubly truncated Poisson.
Let $\{Y_t\}$, $t = 1,\dots,T$, be a time series of counts and suppose we omit the values below a known fixed constant $c_1$ and those exceeding another known fixed constant $c_2$, with $c_1 < c_2$. Then the doubly truncated Poisson conditional density is

$$f(y_t;\lambda_t,c_1,c_2 \mid \mathcal{F}_{t-1}) = \frac{\lambda_t^{y_t}}{y_t!\;\eta(c_1,c_2,\lambda_t)}, \qquad t = 1,\dots,T \qquad (3.41)$$

where the function $\eta$ is defined as

$$\eta(c_1,c_2,\lambda_t) = \begin{cases} \sum_{y=c_1}^{c_2} \dfrac{\lambda_t^{y}}{y!} & \text{if } 0 \leq c_1 < c_2 \leq \infty \\[1ex] \eta(0,\infty,\lambda_t) & \text{otherwise} \end{cases}$$

and clearly $\eta(0,\infty,\lambda_t) = \exp(\lambda_t)$ leads to the common Poisson model. This generalization turns out to be useful for modelling truncated count data.

An often used specification is that obtained by setting $c_1 = 1$ and $c_2 = \infty$. In this case (3.41) becomes:

$$f(y_t;\lambda_t,1,\infty \mid \mathcal{F}_{t-1}) = \frac{\lambda_t^{y_t}}{y_t!\,\left(\exp(\lambda_t) - 1\right)}, \qquad t = 1,\dots,T$$

It should be noted that, differently from the traditional Poisson model, for the truncated Poisson model the conditional mean is not equal to the conditional variance, as

$$E_{tr}\left[y_t; c_1, c_2 \mid \mathcal{F}_{t-1}\right] = \lambda_t\,\frac{\eta(c_1-1,c_2-1,\lambda_t)}{\eta(c_1,c_2,\lambda_t)}$$

while

$$Var_{tr}\left[y_t; c_1, c_2 \mid \mathcal{F}_{t-1}\right] = E_{tr}\left[y_t; c_1, c_2 \mid \mathcal{F}_{t-1}\right] + \lambda_t^2\,\frac{\eta(c_1-2,c_2-2,\lambda_t)\,\eta(c_1,c_2,\lambda_t) - \eta(c_1-1,c_2-1,\lambda_t)^2}{\eta(c_1,c_2,\lambda_t)^2}$$

As can be noticed from (3.41), the doubly truncated Poisson distribution belongs to the exponential family of distributions, hence its canonical link is the logarithm and the inverse link is the exponential. Therefore we obtain again the log-linear model $\lambda_t = \exp(Z_{t-1}'\beta)$, and inference is based on maximization of the log-likelihood function derived from (3.41).

3.4 The Zeger-Qaqish model

Zeger and Qaqish (1988) define the following multiplicative model:

$$\mu_t(\beta) = \exp\left(\beta_0 + \beta_1 x_t + \beta_2 \log(\tilde y_{t-1})\right) = \exp(\beta_0 + \beta_1 x_t)\,\tilde y_{t-1}^{\,\beta_2}, \qquad t = 1,\dots,T \qquad (3.42)$$

and no distributional assumption for the response $y_t$ is specified. It is clear that, when $\beta_2 < 0$, there is an inverse relationship between $\tilde y_{t-1}$ and $\mu_t(\beta)$, while the conditional mean grows with $\tilde y_{t-1}$ when $\beta_2 > 0$.
Observe that (3.42) is a log-linear model in which $Z_{t-1} = (1, x_t, \log(\tilde y_{t-1}))'$ and $\beta = (\beta_0, \beta_1, \beta_2)'$, while $\tilde y_{t-1}$ is defined either as

$$\tilde y_{t-1} = \max(c, y_{t-1}), \qquad 0 < c < 1$$

or

$$\tilde y_{t-1} = y_{t-1} + c, \qquad c > 0$$

so that $y_{t-1} = 0$ is not an absorbing state.

Equation (3.42) defines the first conditional moment. With respect to the conditional variance it is assumed that

$$Var\left[y_t \mid \mathcal{F}_{t-1}\right] = \varphi\, V(\mu_t) \qquad (3.43)$$

where $V(\cdot)$ is a known variance function defining the relationship between the conditional mean and the conditional variance, and $\varphi$ is an unknown dispersion parameter. The so-called working variance $\varphi V(\mu_t)$ allows one to accommodate some features found in the data. For example, the variance model $\varphi\mu_t$, with $\varphi > 1$, may hold for count data where the conditional variance exceeds the conditional mean. As can be seen, in this model the assumptions on the response distribution concern only the first and second conditional moments.

A possible extension of (3.42) is the following multiplicative error model:

$$\mu_t(\beta) = \exp(\beta_0 + \beta_1 x_t)\left(\frac{\tilde y_{t-1}}{\exp(\beta_0 + \beta_1 x_{t-1})}\right)^{\beta_2}, \qquad t = 1,\dots,T$$

which can be generalized by considering, as in Kedem and Fokianos (2002), the following model:

$$\mu_t(\beta) = \exp\left[x_t'\gamma + \sum_{i=1}^{q} \theta_i\left(\log \tilde y_{t-i} - x_{t-i}'\gamma\right)\right], \qquad t = 1,\dots,T \qquad (3.44)$$

where $\beta = (\gamma', \theta_1, \dots, \theta_q)'$ is an $(s+q)$-dimensional parameter vector and $\{x_t\}$ is an $s$-dimensional vector of covariates. Note that when $s = 2$, $q = 1$, $x_t = (1, x_t)'$, $\gamma = (\beta_0, \beta_1)'$ and $\theta_1 = \beta_2$, (3.44) reduces to (3.42).

Turning to the theory of inference for the Zeger-Qaqish model (3.42), we consider the case where $c$ is known. In this case the estimation of the parameter vector can be carried out by using the quasi-score function:

$$S_T(\beta) = \sum_{t=1}^{T} \frac{\partial\mu_t(\beta)}{\partial\beta}\,\frac{y_t - \mu_t(\beta)}{V(\mu_t(\beta))} \qquad (3.45)$$

which resembles the score function (3.16), except that the true conditional variance is replaced by the working variance.
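The quasi-score above translates directly into code. The following is a minimal numerical sketch (not the thesis code) for the log-linear specification (3.42), under the assumed illustrative choices $\tilde y_{t-1} = y_{t-1} + 0.5$ and working variance $V(\mu) = \mu$; for the log link $\partial\mu_t/\partial\beta = Z_{t-1}\mu_t$.

```python
import math

def quasi_score(y, x, beta, var_fn):
    """Quasi-score S(beta) = sum_t (d mu_t/d beta) * (y_t - mu_t) / V(mu_t)
    for the log-linear model mu_t = exp(b0 + b1*x_t + b2*log(ytilde_{t-1})),
    with ytilde_{t-1} = y_{t-1} + 0.5 so that zero counts are not absorbing
    (an illustrative choice of the constant c)."""
    b0, b1, b2 = beta
    s = [0.0, 0.0, 0.0]
    for t in range(1, len(y)):
        ytil = y[t - 1] + 0.5
        z = (1.0, x[t], math.log(ytil))      # covariate vector Z_{t-1}
        mu = math.exp(b0 + b1 * x[t] + b2 * math.log(ytil))
        w = mu * (y[t] - mu) / var_fn(mu)    # d mu/d eta = mu for the log link
        for i in range(3):
            s[i] += w * z[i]
    return s

# Working variance V(mu) = phi*mu: phi only rescales S, so it is dropped here.
sc = quasi_score([2, 3, 1, 4, 2, 5],
                 [0.1, -0.2, 0.0, 0.3, 0.1, -0.1],
                 (0.5, 0.2, 0.1), lambda m: m)
```

Setting this quasi-score to zero numerically yields the estimator $\hat\beta_q$ discussed next.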
According to the theory of quasi maximum partial likelihood estimation for GLM (see Wedderburn, 1974), the estimator $\hat\beta_q$ is consistent and asymptotically normal:

$$\sqrt{T}(\hat\beta_q - \beta) \xrightarrow{d} N\left(0,\; G^{-1}(\beta)\,G_1(\beta)\,G^{-1}(\beta)\right)$$

where $G(\beta)$ and $G_1(\beta)$ are the limits of the following matrices:

$$G_T(\beta) = \frac{1}{T}\sum_{t=1}^{T} Z_{t-1} Z_{t-1}'\,\frac{\mu_t^2(\beta)}{V(\mu_t(\beta))} \xrightarrow{p} G(\beta)$$

$$G_{1T}(\beta) = \frac{1}{T}\sum_{t=1}^{T} Z_{t-1} Z_{t-1}'\,\frac{\mu_t^2(\beta)\,\sigma_t^2(\beta)}{V^2(\mu_t(\beta))} \xrightarrow{p} G_1(\beta)$$

and $\sigma_t^2(\beta)$ denotes the true conditional variance. In practice, the covariance matrix of $\hat\beta_q$ is estimated by replacing $\varphi$, $\beta$ and $\sigma_t^2(\beta)$ by their respective estimates. The true conditional variance $\sigma_t^2(\beta)$ is replaced by $(y_t - \mu_t(\hat\beta_q))^2$, while the dispersion parameter $\varphi$ can be estimated by

$$\hat\varphi = \frac{1}{T-s}\sum_{t=1}^{T} \hat e_t^2$$

where $\hat e_t$ is the Pearson residual at time $t$:

$$\hat e_t = \frac{y_t - \mu_t(\hat\beta_q)}{\sqrt{V(\mu_t(\hat\beta_q))}}$$

3.5 Overdispersion and negative binomial regression

The equality of mean and variance characterizing the Poisson model makes it unsuitable when the data show overdispersion, i.e. when the response variance is higher than the mean. We will show in the following that the introduction of lagged values of the response among the regressors for $\lambda_t$ allows the unconditional variance to be higher than the unconditional mean, differently from the traditional Poisson model with only exogenous regressors. In general, however, when modelling count data the problem of overdispersion should be addressed.

Several post-hoc tests, i.e. tests performed after modelling the data, have been proposed in order to detect overdispersion. One of them is the Pearson statistic, defined as the sum of squared Pearson residuals:

$$\chi^2 = \sum_{t=1}^{T} \frac{\left(y_t - \mu_t(\hat\beta)\right)^2}{\mu_t(\hat\beta)} \qquad (3.46)$$

Its distribution was studied, among others, by McCullagh (1986) and McCullagh and Nelder (1989). Under suitable regularity conditions, its distribution converges to a chi-square with $T - p$ degrees of freedom.

A distribution which is known to fit overdispersed count data is the negative binomial.
If the conditional density of a time series given its past is that of a negative binomial variable with parameters $p_t$ and $r$, its distributional law is

$$f(y_t; p_t, r \mid \mathcal{F}_{t-1}) = \binom{y_t + r - 1}{r - 1}\, p_t^{\,r}\,(1 - p_t)^{y_t}, \qquad t = 1,\dots,T \qquad (3.47)$$

where $p_t$ is the probability that an event occurs at time $t$, while $r$ is the scale parameter and its inverse $1/r$ is known as the overdispersion parameter. The conditional mean

$$E\left[Y_t \mid \mathcal{F}_{t-1}\right] = \frac{r(1-p_t)}{p_t} = \lambda_t$$

is lower than the conditional variance

$$Var\left[Y_t \mid \mathcal{F}_{t-1}\right] = \frac{r(1-p_t)}{p_t^2}$$

The systematic component of the GLM in the negative binomial case, linking $p_t$, and thus the conditional expected value, to a set of covariates $Z$, can be defined, as in Davis and Wu (2009), through the following logit model:

$$\log\frac{1-p_t}{p_t} = Z_{t-1}'\gamma \qquad (3.48)$$

yielding

$$\lambda_t = r\exp(Z_{t-1}'\gamma) \qquad (3.49)$$

The maximum likelihood estimator $\hat\gamma$ maximizes the partial log-likelihood function

$$l(\gamma) \equiv \log PL(\gamma) = \sum_{t=1}^{T}\left\{y_t Z_{t-1}'\gamma - (y_t + r)\log\left(1 + \exp(Z_{t-1}'\gamma)\right)\right\} + \log\prod_{t=1}^{T}\binom{y_t + r - 1}{r - 1} \qquad (3.50)$$

Several optimization algorithms are discussed by Hilbe (2007).

As we said, the negative binomial is often used as an alternative to the Poisson model. For testing the Poisson model against the negative binomial distribution, a commonly used test statistic is that of the Z test, which Lee (1986) defines as follows:

$$Z = \frac{\sum_{t=1}^{T}\left\{\left(y_t - \mu_t(\hat\beta)\right)^2 - \mu_t(\hat\beta)\right\}}{\sqrt{2\sum_{t=1}^{T}\mu_t^2(\hat\beta)}} \qquad (3.51)$$

and is shown to have an asymptotic standard normal distribution. As the probability limit of the numerator is shown to be positive under the alternative hypothesis that the negative binomial distribution is preferable, a one-sided test is convenient. In particular, the Poisson specification is rejected in favour of the negative binomial with level of significance $\alpha$ if

$$\sum_{t=1}^{T}\left\{\left(y_t - \mu_t(\hat\beta)\right)^2 - \mu_t(\hat\beta)\right\} > c_\alpha\,\sqrt{2\sum_{t=1}^{T}\mu_t^2(\hat\beta)}$$

where $c_\alpha$ is the critical value.
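As an illustration of (3.51), the Z statistic can be computed from a fitted intensity path as in the following sketch (illustrative code with assumed helper names, not the thesis code); the data below are simulated from a true Poisson model, so the test should not reject.

```python
import math
import random

def z_test(y, mu):
    """Lee's (1986) Z statistic for testing the Poisson model against
    the negative binomial: large positive values indicate overdispersion
    relative to the fitted intensities mu_t."""
    num = sum((yt - mt) ** 2 - mt for yt, mt in zip(y, mu))
    den = math.sqrt(2.0 * sum(mt ** 2 for mt in mu))
    return num / den

def rpois(lam, rng):
    # Knuth-style Poisson sampler, adequate for moderate lam
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
mu = [2.0 + math.sin(t / 10.0) for t in range(2000)]  # true intensities
y = [rpois(m, rng) for m in mu]
z = z_test(y, mu)
# Under the Poisson null, Z is approximately standard normal,
# so |z| should typically fall below the usual critical values.
```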
3.6 Poisson Autoregression

Fokianos, Rahbek and Tjøstheim (2009), henceforth FRT (2009), study a particular Poisson time series model, characterized by a linear autoregressive intensity and able to fit data showing a very slowly decreasing dependence. This model already existed in the literature and had been shown to fit some financial count data satisfactorily, but FRT (2009) is the first work to study its ergodicity and develop the asymptotic theory, which is crucial for likelihood inference.

3.6.1 Model specification

FRT (2009) study the properties of the following Poisson model:

$$y_t \mid \mathcal{F}^{y,\lambda}_{t-1} \sim Pois(\lambda_t), \qquad \lambda_t = \omega + \alpha y_{t-1} + \beta\lambda_{t-1}, \qquad t \geq 1 \qquad (3.52)$$

where the parameters $\omega$, $\alpha$ and $\beta$ are assumed to be positive. In addition, $\lambda_0$ and $y_0$ are assumed to be fixed. By introducing for each time point $t$ a "scaled" Poisson process $N_t(\cdot)$ of unit intensity, it is possible to rephrase (3.52) so that the response is defined explicitly as a function of the conditional mean:

$$y_t = N_t(\lambda_t), \qquad \lambda_t = \omega + \alpha y_{t-1} + \beta\lambda_{t-1}, \qquad t \geq 1 \qquad (3.53)$$

where $y_t$ is then equal to the number of events of $N_t(\cdot)$ in the time interval $[0, \lambda_t]$. The rephrased model (3.53) is more convenient when proving the asymptotic normality of the parameter estimates. Furthermore, expressing $y_t$ as a function of the conditional mean, which in the Poisson model equals the conditional variance, recalls the first defining equation of the GARCH model. It is interesting to note that the sum $(\alpha + \beta)$ can be considered a measure of persistence in intensity, just as the sum of the ARCH and GARCH parameters in the GARCH model can be read as a measure of persistence in volatility.

Both (3.52) and (3.53) refer to the theory of generalized linear models (GLM) for count time series. Here the random component is the Poisson distribution, as the unobserved process $\lambda_t$ can be expressed as a function of the past values of the observed process $y_t$ after recursive substitution. The peculiarities of this approach are mainly two.
First, it is characterized by a noncanonical link function, the identity, while, as we have seen, the traditional Poisson model uses the log-linear specification. The other contribution is the introduction of an autoregressive feedback mechanism in $\{\lambda_t\}$, while in the tradition of GLM the intensity is a function of a vector of covariates, possibly including lagged values of the response. This aspect makes the model able to capture strong persistence with a small number of parameters.

As said before, although FRT (2009) is the first work studying the ergodicity of (3.53), which is critical in developing the asymptotic theory, this model had already been considered in the econometric literature. It belongs indeed to the class of observation-driven models for time series of counts studied, among others, by Zeger and Qaqish (1988) and, more recently, by Davis et al. (2003) and Heinen (2003). The latter defines, in particular, an Autoregressive Conditional Poisson model (ACP), which is a more general form of (3.53) including several lags of counts and intensity. A strong motivation for the analysis of this class of models is that it is shown to approximate well some common financial count time series, such as the number of trades in a short time interval (Rydberg and Shephard, 2000 and Streett, 2000).

In particular, Ferland et al. (2006) define model (3.53) explicitly as an integer-valued GARCH(1,1), i.e. an INGARCH(1,1), and show that $y_t$ is stationary provided that $0 \leq \alpha + \beta < 1$. In particular,

$$E[y_t] = E[\lambda_t] = \mu = \frac{\omega}{1 - \alpha - \beta}$$

They further show that all the moments are finite if and only if $0 \leq \alpha + \beta < 1$. Turning to the second moments, as

$$Var[y_t] = \mu\left(1 + \frac{\alpha^2}{1 - (\alpha+\beta)^2}\right)$$

it is immediate to conclude that $Var[y_t] \geq E[y_t]$, with equality when $\alpha = 0$. Thus, including the past values of the response in the evolution of the intensity leads to overdispersion, a feature often found in real count data.
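These moment results can be checked by simulation. The following sketch (with arbitrarily chosen parameters $\omega = 1$, $\alpha = 0.3$, $\beta = 0.4$, not taken from the thesis) simulates the recursion (3.53) and compares the sample moments with $\mu = \omega/(1-\alpha-\beta)$ and $Var[y_t] = \mu(1 + \alpha^2/(1-(\alpha+\beta)^2))$.

```python
import math
import random

def rpois(lam, rng):
    """Knuth-style Poisson sampler, adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_ingarch(omega, alpha, beta, T, rng, burn=500):
    """Simulate y_t ~ Pois(lam_t), lam_t = omega + alpha*y_{t-1} + beta*lam_{t-1}."""
    lam = omega / (1.0 - alpha - beta)  # start at the stationary mean
    y = rpois(lam, rng)
    out = []
    for t in range(T + burn):
        lam = omega + alpha * y + beta * lam
        y = rpois(lam, rng)
        if t >= burn:
            out.append(y)
    return out

rng = random.Random(1)
omega, alpha, beta = 1.0, 0.3, 0.4
ys = simulate_ingarch(omega, alpha, beta, 200000, rng)
mean = sum(ys) / len(ys)
var = sum((v - mean) ** 2 for v in ys) / len(ys)
mu = omega / (1 - alpha - beta)
theory_var = mu * (1 + alpha ** 2 / (1 - (alpha + beta) ** 2))
# The sample mean should be close to mu, and the sample variance
# should exceed the mean (overdispersion) and be close to theory_var.
```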
3.6.2 Ergodicity results

A crucial point in the analysis of this model is to prove the geometric ergodicity of the joint process $(y_t, \lambda_t)$, where $y_t$ is the observed component, while the intensity process is latent. The notion of geometric ergodicity for a Markov chain can be summarized as follows. First, the concept of $\phi$-irreducibility has to be introduced. Consider the homogeneous Markov chain $(Z_t)$ defined on a $\sigma$-field $\mathcal{M}$ on $A$, where

$$P^t(z, B) = P(Z_t \in B \mid Z_0 = z)$$

is the probability of moving from $z \in A$ to the set $B \in \mathcal{M}$ in $t$ steps. The Markov chain $(Z_t)$ is said to be $\phi$-irreducible if, for some nontrivial $\sigma$-finite measure $\phi$ on $(A, \mathcal{M})$,

$$\forall B \in \mathcal{M}: \quad \phi(B) > 0 \;\Rightarrow\; \forall z \in A,\; \exists t > 0: \; P^t(z, B) > 0$$

If a $\phi$-irreducible Markov chain is positive recurrent (see Meyn and Tweedie, 1996), then there exists a (unique) invariant distribution, that is a probability measure $\pi$ such that

$$\forall B \in \mathcal{M}: \quad \pi(B) = \int P(z, B)\,\pi(dz)$$

Finally, $(Z_t)$ is said to be geometrically ergodic if there exists a $\rho \in (0,1)$ such that

$$\forall z \in A: \quad \rho^{-t}\,\left\|P^t(z, \cdot) - \pi\right\| \to 0 \quad \text{as } t \to +\infty$$

Thus, geometric ergodicity states convergence to the invariant distribution at a geometric rate.

FRT (2009) succeed in proving the geometric ergodicity of $(y_t, \lambda_t)$ by using an approximated (perturbed) model and proving that it is geometrically ergodic under some restrictions on the parameter space. Then they show that the perturbed model can be made arbitrarily close to the unperturbed one, allowing the results to be extended to the latter. The perturbed model is defined as:

$$y_t^m = N_t(\lambda_t^m), \qquad \lambda_t^m = \omega + \alpha y_{t-1}^m + \beta\lambda_{t-1}^m + \varepsilon_{t,m} \qquad (3.54)$$

where $\lambda_0^m$ and $y_0^m$ are fixed and

$$\varepsilon_{t,m} = c_m\, I\left\{y_{t-1}^m = 1\right\} U_t, \qquad c_m > 0, \quad c_m \to 0 \text{ as } m \to \infty$$

where $I\{\cdot\}$ is the indicator function and $\{U_t\}$ is a sequence of i.i.d. uniform random variables on $(0,1)$ such that $\{U_t\}$ and $\{N_t\}$ are independent.
The introduction of $\{U_t\}$ makes it possible to establish irreducibility, where $\phi$ is the Lebesgue measure with support $[k, \infty)$ for some $k$. The proof that the point $\lambda^* = \omega/(1-\beta)$, solution of $\lambda = \omega + \beta\lambda$, is reachable, so that $\{\lambda_t\}$ is open-set irreducible on $[\lambda^*, \infty)$ provided that $\alpha + \beta < 1$, is instead given without using any perturbation (see FRT, 2009 for details).

The following lemma completes the proof of ergodicity of (3.53), establishing that the perturbed model can be made arbitrarily close to the unperturbed one.

Lemma 3.1 With $(y_t, \lambda_t)$ and $(y_t^m, \lambda_t^m)$ defined by (3.53) and (3.54) respectively, if $0 \leq \alpha + \beta < 1$, then the following statements hold:

1. $|E(\lambda_t^m - \lambda_t)| \leq \delta_{1,m}$

2. $E(\lambda_t^m - \lambda_t)^2 \leq \delta_{2,m}$

3. $E(y_t^m - y_t)^2 \leq \delta_{3,m}$

with $\delta_{i,m} \to 0$ as $m \to \infty$ for $i = 1, 2, 3$. Furthermore, with $m$ sufficiently large, $|\lambda_t^m - \lambda_t| \leq \delta$ and $|y_t^m - y_t| \leq \delta$ for any $\delta > 0$ almost surely.

3.6.3 Estimation of parameters

Denoting by $\theta$ the three-dimensional vector of unknown parameters, i.e. $\theta = (\omega, \alpha, \beta)'$, the conditional likelihood function for observations $y_1,\dots,y_T$ given the starting values $\lambda_0$ and $y_0$ can be written, based on (3.52), in terms of $\lambda_t(\theta) = \omega + \alpha y_{t-1} + \beta\lambda_{t-1}(\theta)$ as:

$$L(\theta) = \prod_{t=1}^{T} \frac{\exp(-\lambda_t(\theta))\,\lambda_t(\theta)^{y_t}}{y_t!} \qquad (3.55)$$

while, denoting the true parameter vector by $\theta_0 = (\omega_0, \alpha_0, \beta_0)'$, we write $\lambda_t = \lambda_t(\theta_0)$. Thus the conditional log-likelihood function is given, up to a constant, by

$$l(\theta) = \sum_{t=1}^{T} l_t(\theta) = \sum_{t=1}^{T}\left(y_t\log\lambda_t(\theta) - \lambda_t(\theta)\right) \qquad (3.56)$$

while the score function is

$$S_T(\theta) = \sum_{t=1}^{T}\left(\frac{y_t}{\lambda_t(\theta)} - 1\right)\frac{\partial\lambda_t(\theta)}{\partial\theta} \qquad (3.57)$$

where $\partial\lambda_t(\theta)/\partial\theta$ is a three-dimensional vector with components

$$\frac{\partial\lambda_t}{\partial\omega} = 1 + \beta\frac{\partial\lambda_{t-1}}{\partial\omega}, \qquad \frac{\partial\lambda_t}{\partial\alpha} = y_{t-1} + \beta\frac{\partial\lambda_{t-1}}{\partial\alpha}, \qquad \frac{\partial\lambda_t}{\partial\beta} = \lambda_{t-1} + \beta\frac{\partial\lambda_{t-1}}{\partial\beta}$$

The solution of $S_T(\theta) = 0$ yields the conditional maximum likelihood estimator of $\theta$, denoted by $\hat\theta$.
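The derivative recursions above translate directly into code. The following is an illustrative sketch (not the thesis code) that accumulates the score for given data and parameters, under the assumption that $\lambda_0$ is treated as fixed so the derivative recursions start at zero.

```python
def score(y, omega, alpha, beta, lam0=None):
    """Accumulate S_T(theta) = sum_t (y_t/lam_t - 1) * d(lam_t)/d(theta)
    for lam_t = omega + alpha*y_{t-1} + beta*lam_{t-1}, using the recursions
      d lam_t/d omega = 1         + beta * d lam_{t-1}/d omega
      d lam_t/d alpha = y_{t-1}   + beta * d lam_{t-1}/d alpha
      d lam_t/d beta  = lam_{t-1} + beta * d lam_{t-1}/d beta
    Starting derivatives are set to zero (lam_0 treated as fixed)."""
    lam = omega / (1.0 - alpha - beta) if lam0 is None else lam0
    d_om = d_al = d_be = 0.0
    s = [0.0, 0.0, 0.0]
    for t in range(1, len(y)):
        # derivatives use lam = lam_{t-1} before the intensity is updated
        d_om, d_al, d_be = (1.0 + beta * d_om,
                            y[t - 1] + beta * d_al,
                            lam + beta * d_be)
        lam = omega + alpha * y[t - 1] + beta * lam
        w = y[t] / lam - 1.0
        s[0] += w * d_om
        s[1] += w * d_al
        s[2] += w * d_be
    return s
```

Solving $S_T(\theta) = 0$ numerically, for instance with a Newton step based on the Hessian in (3.58), yields $\hat\theta$.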
The Hessian matrix is then obtained by further differentiation of the score equations (3.57):

$$H_T(\theta) = -\sum_{t=1}^{T}\frac{\partial^2 l_t(\theta)}{\partial\theta\,\partial\theta'} = \sum_{t=1}^{T}\frac{y_t}{\lambda_t^2(\theta)}\,\frac{\partial\lambda_t(\theta)}{\partial\theta}\frac{\partial\lambda_t(\theta)}{\partial\theta'} - \sum_{t=1}^{T}\left(\frac{y_t}{\lambda_t(\theta)} - 1\right)\frac{\partial^2\lambda_t(\theta)}{\partial\theta\,\partial\theta'} \qquad (3.58)$$

In order to study the asymptotic properties of the maximum likelihood estimator for the unperturbed model, which are presented in the following, it is again helpful to use the ergodic properties of the perturbed model, whose likelihood function, based on the Poisson assumption and the independence of $U_t$ from $(y_t^m, \lambda_t^m)$, is defined as

$$L^m(\theta) = \prod_{t=1}^{T}\frac{\exp(-\lambda_t^m(\theta))\,\left(\lambda_t^m(\theta)\right)^{y_t^m}}{y_t^m!}\;\prod_{t=1}^{T} f_u(U_t)$$

where $f_u$ denotes the uniform density. Note that, as $L^m(\theta)$ and $L(\theta)$ have the same form, $S_T^m(\theta)$ and $H_T^m(\theta)$ are the counterparts of $S_T(\theta)$ and $H_T(\theta)$, with $(y_t, \lambda_t)$ replaced by $(y_t^m, \lambda_t^m)$.

3.6.4 Asymptotic theory

FRT (2009) prove that the maximum likelihood estimator $\hat\theta$ is consistent and asymptotically normal by first showing these properties for $\hat\theta^m$. For proving consistency and asymptotic normality of $\hat\theta^m$ they take advantage of the fact that the log-likelihood function is three times differentiable, which allows them to apply Lemma 1 of Jensen and Rahbek (2004). The latter states consistency and asymptotic normality of the maximum likelihood estimator for the traditional GARCH(1,1) model when some assumptions on the parameters are relaxed. It is then shown that the score function, the information matrix and the third derivatives of the perturbed likelihood tend to the corresponding quantities of the unperturbed likelihood function. This allows the use of Proposition 6.3.9 of Brockwell and Davis (1991), stating convergence in distribution of a random vector when some conditions are satisfied.

Before formulating the theorem stating the main result, it is necessary to define the lower and upper values of each component of $\theta$,

$$\omega_L < \omega_0 < \omega_U, \qquad \alpha_L < \alpha_0 < \alpha_U, \qquad \beta_L < \beta_0 < \beta_U$$

and the neighborhood

$$O(\theta_0) = \left\{\theta \mid 0 < \omega_L \leq \omega \leq \omega_U,\; 0 < \alpha_L \leq \alpha \leq \alpha_U,\; 0 < \beta_L \leq \beta \leq \beta_U\right\}$$
The following theorem states the consistency and asymptotic normality of the maximum likelihood estimator under a stationarity condition.

Theorem 3.3 Under model (3.53), assuming that at the true value $\theta_0$: $0 < \alpha_0 + \beta_0 < 1$, there exists a fixed open neighborhood $O = O(\theta_0)$ of $\theta_0$ such that, with probability tending to 1 as $T \to \infty$, the log-likelihood function has a unique maximum point $\hat\theta$; furthermore, $\hat\theta$ is consistent and asymptotically normal:

$$\sqrt{T}(\hat\theta - \theta_0) \xrightarrow{d} N\left(0,\, G^{-1}(\theta_0)\right)$$

where the conditional information matrix $G(\theta)$ is defined as

$$G(\theta) = E\left[\frac{1}{\lambda_t(\theta)}\frac{\partial\lambda_t}{\partial\theta}\frac{\partial\lambda_t}{\partial\theta'}\right] \qquad (3.59)$$

and can be consistently estimated by

$$G_T(\theta) = \sum_{t=1}^{T} Var\left(\frac{\partial l_t}{\partial\theta}\,\Big|\,\mathcal{F}_{t-1}\right) = \sum_{t=1}^{T}\frac{1}{\lambda_t(\theta)}\frac{\partial\lambda_t}{\partial\theta}\frac{\partial\lambda_t}{\partial\theta'} \qquad (3.60)$$

The standard errors of the parameter estimates can be obtained from the matrix $G_T(\hat\theta)$.

3.7 Concluding remarks

We have reviewed the main models for time series of counts used in econometrics. They belong to the class of GLM and their estimation relies on partial likelihood theory. We have analyzed in depth one of the most widely used count models, the Poisson model with log-linear intensity. We have then introduced a recently developed Poisson model: the Poisson Autoregression of Fokianos, Rahbek and Tjøstheim (FRT, 2009). This model defines the intensity as a linear function of its own past values and the past number of events, and is able to capture the overdispersion and the strong persistence characterizing many count data. As these features are also found in corporate default count time series, we can regard Poisson Autoregression as a useful tool for the count time series analysis of the default phenomenon.
Chapter 4

A new Poisson Autoregressive model with Exogenous Covariates

We concluded the previous chapter by presenting the Poisson Autoregression of Fokianos, Rahbek and Tjøstheim [FRT] (2009) and explaining its potential advantages in modelling overdispersed and long-memory count data, features found in the corporate default counts that will be the object of our empirical study in Chapter 5. However, this formulation does not consider the role of covariate processes in the intensity dynamics, i.e. in the distribution of the number of events. We claim that including exogenous predictors in the conditional mean specification can enrich the analysis of count time series and also improve the in-sample and out-of-sample forecasting performance, especially when applying the model to empirical time series strongly linked to the financial and economic context. In this chapter we therefore propose and develop a class of Poisson intensity AutoRegressions with eXogenous covariates (PARX) models.

Extending the theory developed by FRT (2009) to allow for covariate processes requires a substantial theoretical effort, which is a relevant part of our methodological contribution. First, we provide results on the time series properties of PARX models, including conditions for stationarity and existence of moments. We then provide an asymptotic theory for the maximum likelihood estimators of the parameters entering the model, allowing inference and forecasting.

4.1 Related literature

The PARX model is related to a recent literature on GARCH models augmented by additional covariates with the aim of improving volatility forecasting performance. In many cases the lagged squared returns offer only a weak signal about the level of volatility and, as a consequence, the approximation provided by standard GARCH models is poor when volatility changes rapidly to a new level.
Realized volatility measures calculated from high-frequency financial data, introduced in the literature by seminal works such as Andersen, Bollerslev, Diebold and Labys (2001) and Barndorff-Nielsen and Shephard (2002), can be useful to improve the approximation provided by these models, as such measures are found to approximate the level of volatility very well. The first models including realized volatility measures in the GARCH equation are the so-called GARCH-X models estimated by Engle (2002); these are, however, incomplete in that they do not explain the variation in the realized measures. More complete models are those introduced by Engle and Gallo (2006) and the HEAVY model of Shephard and Sheppard (2010), both specifying multiple latent volatility processes, and the Realized GARCH model of Hansen et al. (2012), which combines a GARCH structure for the daily returns with an integrated model for realized measures of volatility. More generally, several works present empirical analyses where the time-varying volatility is explained by past returns and volatilities together with additional covariates, typically the volume of transactions as a proxy for the flow of information reaching the market (see, for example, Lamoureux and Lastrapes, 1990 and Gallo and Pacini, 2000). An econometric analysis of ARCH and GARCH models including exogenous covariates can be found in Han and Park (2008) and Han and Kristensen (2013).

The PARX model shares the same motivation and modelling approach as this literature, except that the variable of interest in our case is the time-varying Poisson intensity.
4.2 Specification of PARX models

Consider the Poisson model for the counts $y_t$, conditional on past intensities and counts, denoted by $\lambda_{t-m}$ and $y_{t-m}$ for $m \geq 1$, respectively, as well as past values of an explanatory variable $x_t$:

$$y_t \mid \mathcal{F}_{t-1} \sim Pois(\lambda_t) \qquad (4.1)$$

where $\mathcal{F}_{t-1} = \sigma(y_{t-m}, \lambda_{t-m}, x_{t-m};\; m \geq 1)$ and $\lambda_t$ is the, potentially time-varying, Poisson intensity. Following FRT (2009), equation (4.1) can be rewritten in terms of an i.i.d. sequence $N_t(\cdot)$ of Poisson processes with unit intensity:

$$y_t = N_t(\lambda_t) \qquad (4.2)$$

The time-varying intensity is specified in terms of the linear link function considered in FRT (2009), here augmented by an exogenous covariate $x_t \in \mathbb{R}$ entering the intensity through a known function $f : \mathbb{R} \to \mathbb{R}_+$:

$$\lambda_t = \omega + \sum_{i=1}^{p}\alpha_i y_{t-i} + \sum_{j=1}^{q}\beta_j\lambda_{t-j} + \gamma f(x_{t-1}) \qquad (4.3)$$

The parameters of interest are given by $\omega > 0$, $\alpha_1,\dots,\alpha_p$, $\beta_1,\dots,\beta_q$ and $\gamma \geq 0$. It is easy to observe that, when $\gamma = 0$, the model reduces to the Poisson Autoregression in FRT (2009). Also note that we define a more general specification, allowing for $p$ lags of the response and $q$ lags of the intensity. We can then use the notation PARX($p, q$) in a way analogous to how GARCH($p, q$) identifies a GARCH model where $p$ lags of the returns and $q$ lags of the volatility are included. The presence of the lagged covariate value, rather than the value at time $t$, yields a conditional intensity that is known at time $t$ given the information available up to time $t-1$.

In order to carry out multi-step ahead forecasting, we close the model by imposing a Markov structure on the covariate,

$$x_t = g(x_{t-1}, \varepsilon_t) \qquad (4.4)$$

for some function $g(x, \varepsilon)$ which is known up to a finite-dimensional parameter, and where $\varepsilon_t$ is an i.i.d. error term. We will assume that $\{\varepsilon_t\}$ and $\{N_t(\cdot)\}$ are mutually independent, so that there is no feedback effect from $y_t$ to $x_t$.
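To fix ideas, a PARX(1,1) can be simulated as in the sketch below, under illustrative choices not prescribed by the model: $f(x) = x^2$, a Gaussian AR(1) covariate playing the role of $g$ in (4.4), and arbitrary parameter values.

```python
import math
import random

def rpois(lam, rng):
    """Knuth-style Poisson sampler, adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_parx(omega, alpha, beta, gamma, rho, T, rng, burn=500):
    """PARX(1,1): lam_t = omega + alpha*y_{t-1} + beta*lam_{t-1} + gamma*f(x_{t-1}),
    with f(x) = x**2 and covariate x_t = rho*x_{t-1} + eps_t, eps_t ~ N(0,1).
    The covariate evolves independently of y (no feedback)."""
    f = lambda v: v * v
    x, lam = 0.0, omega / (1.0 - alpha - beta)
    y = rpois(lam, rng)
    ys, lams = [], []
    for t in range(T + burn):
        lam = omega + alpha * y + beta * lam + gamma * f(x)  # uses x_{t-1}
        x = rho * x + rng.gauss(0.0, 1.0)                    # covariate update
        y = rpois(lam, rng)
        if t >= burn:
            ys.append(y)
            lams.append(lam)
    return ys, lams

rng = random.Random(7)
ys, lams = simulate_parx(0.5, 0.3, 0.3, 0.2, 0.5, 50000, rng)
```

With these values the sample mean should be close to $(\omega + \gamma E[x^2])/(1 - \alpha - \beta)$, where $E[x^2] = 1/(1-\rho^2)$ for the Gaussian AR(1), and the sample variance should exceed the mean (overdispersion).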
4.3 Time series properties

We here provide sufficient conditions for a PARX process to be stationary and ergodic with polynomial moments of a given order.¹ The analysis is carried out by applying recent results on so-called weak dependence developed in Doukhan and Wintenberger (2008). The notion of weak dependence allows one to prove the existence of a strictly stationary solution for a large variety of time series models called chains with infinite memory, defined by the equation

$$X_t = F(X_{t-1}, X_{t-2}, \dots;\; \xi_t) \quad \text{a.s. for } t \in \mathbb{Z}$$

where $F$ takes values in a Banach space and $(\xi_t)$ constitutes an i.i.d. sequence (see Doukhan and Wintenberger, 2008 for details).

¹ All theorems and lemmas are proved in Appendix A.

These models can be seen as a natural extension of either linear models or Markov models. While weak dependence is a slightly weaker concept than the geometric ergodicity used in FRT (2009), it does imply that a strong law of large numbers as well as a central limit theory, both used for the results on econometric inference shown in the following, apply. Specifically, we make the following assumptions:

Assumption 1 $|f(x) - f(\tilde x)| \leq L\,\|x - \tilde x\|$, for some $L > 0$ and for every pair of points $x, \tilde x \in \mathbb{R}$.

Assumption 2 $E\left[\|g(x, \varepsilon_t) - g(\tilde x, \varepsilon_t)\|^s\right] \leq \varrho\,\|x - \tilde x\|^s$ for some $\varrho < 1$, $s \geq 1$ and for every pair of points $x, \tilde x \in \mathbb{R}$, and $E\left[\|g(0, \varepsilon_t)\|^s\right] < \infty$.

Assumption 3 $\sum_{i=1}^{\max(p,q)}(\alpha_i + \beta_i) < 1$.

Assumption 4 $(\varepsilon_t, N_t(\cdot))$ are i.i.d.

A few remarks on these assumptions are needed. First, Assumption 1 states that $f$ satisfies the Lipschitz condition. This assumption will be weakened in the following in order to gain flexibility in the choice of the function $f$. Assumption 2 concerns, instead, the function $g$ defining the structure of the covariate process and requires it to be $L_s$-Lipschitz for all values of $x$. This is a key
assumption when proving the stationarity of many popular time series models, including the linear autoregressive ones. Assumption 3 implies that the function $L(y, \lambda) = \omega + \sum_{i=1}^{p}\alpha_i y_i + \sum_{i=1}^{q}\beta_i\lambda_i$ is Lipschitz. This assumption is imposed in Doukhan and Wintenberger (2008) for applying the weak dependence theory, and it is identical to the condition imposed in FRT (2009) for the Poisson autoregressive model. Finally, Assumption 4 rules out dependence between the two error terms driving the model. It could be weakened, still satisfying the conditions of Doukhan and Wintenberger (2008), by allowing the joint innovation terms to be Markov processes. This would accommodate "leverage intensity effects" if $\{\varepsilon_t\}$ and $\{N_t(\cdot)\}$ are negatively correlated. For our purpose here, however, we maintain Assumption 4.

In the following we provide a theorem stating the existence of a stationary solution for the process $y_t$ under the assumptions defined above. Before stating it, we briefly present the theory of weak dependence developed by Doukhan and Wintenberger (2008). They use the notion of weak dependence introduced by Dedecker and Prieur (2004) and defined as follows. Let $(\Omega, \mathcal{C}, P)$ be a probability space, $\mathcal{M}$ a $\sigma$-subalgebra of $\mathcal{C}$ and $Z$ a generic random variable with values in $A$. Assume that $\|Z\|_1 < \infty$, where $\|\cdot\|_m$ denotes the $L_m$ norm, i.e. $\|Z\|_m = \left(E\|Z\|^m\right)^{1/m}$ for $m \geq 1$, and define the coefficient

$$\tau(\mathcal{M}, Z) = \left\|\,\sup_{f \in \Lambda_1(A)}\left|\int f(z)\,P_{Z\mid\mathcal{M}}(dz) - \int f(z)\,P_Z(dz)\right|\,\right\|_1$$

where $\Lambda_1(A)$ denotes the set of 1-Lipschitz functions on $A$. An easy way to bound this coefficient is based on a coupling argument:

$$\tau(\mathcal{M}, Z) \leq \|Z - W\|_1$$

for any $W$ with the same distribution as $Z$ and independent of $\mathcal{M}$. Under certain conditions on the probability space $(\Omega, \mathcal{C}, P)$ (see Dedecker and Prieur, 2004), there exists a $Z^*$ such that $\tau(\mathcal{M}, Z) = \|Z - Z^*\|_1$. Using the definition of $\tau$, the dependence between the past of the sequence $(Z_t)_{t\in\mathbb{Z}}$ and its future $k$-tuples may be assessed. Consider the norm $\|z - w\| = \|z_1 - w_1\| + \dots + \|z_k - w_k\|$ on $A^k$ and set $\mathcal{M}_p = \sigma(Z_t,\, t \leq p)$.
Define

$$\tau_k(r) = \max_{1 \leq l \leq k}\;\frac{1}{l}\,\sup\left\{\tau\left(\mathcal{M}_p, (Z_{j_1}, \dots, Z_{j_l})\right) \;\text{with}\; p + r \leq j_1 < \dots < j_l\right\}, \qquad \tau_\infty(r) = \sup_{k>0}\,\tau_k(r)$$

The time series $(Z_t)_{t\in\mathbb{Z}}$ is said to be $\tau$-weakly dependent when its coefficients $\tau_\infty(r)$ tend to 0 as $r$ tends to infinity. The notion of geometric ergodicity (see Section 3.6.2) is stronger and refers to the rate of convergence of the Markov chain transition probabilities to the invariant distribution. It requires the $\phi$-irreducibility of the Markov chain and in FRT (2009) is shown for an approximated (perturbed) Poisson Autoregressive model.

Theorem 4.1 Under Assumptions 1-4 there exists a $\tau$-weakly dependent, stationary and ergodic solution $X_t = (y_t, \lambda_t, x_t)'$ with $E\left[\|X_t\|^s\right] < \infty$ and coefficients $\tau(r)$ decaying at a geometric rate governed by $\max\left\{\sum_{i=1}^{\max(p,q)}(\alpha_i + \beta_i),\, \varrho\right\}$.

The above theorem complements the results of FRT (2009). Note that here we provide sufficient conditions for weak dependence of the actual model, not of an approximated version. On the other hand, we do not show the stronger property of geometric ergodicity. Given the existence of a stationary distribution, it can easily be shown that

$$E[y_t] = E[\lambda_t] = \frac{\omega + \gamma E\left[f(x_{t-1})\right]}{1 - \sum_{i=1}^{\max(p,q)}(\alpha_i + \beta_i)}$$

and furthermore $Var[y_t] \geq E[y_t]$. Thus, by including past values of the response and covariates in the evolution of the intensity, the PARX model generates overdispersion, which is a prominent feature of many count time series.

An important consequence of Theorem 4.1 is that, using again the results of Doukhan and Wintenberger (2008), if Assumptions 1-4 are satisfied then the (strong) law of large numbers (LLN) applies to any function $h(\cdot)$ of $X_t = (y_t, \lambda_t, x_t)$ provided $E\left[\|h(X_t)\|\right] < \infty$. As a lemma we note that the same applies independently of the choice of the initial values $(y_0, \lambda_0, x_0)$, that is:

Lemma 4.1 If $X_t = F(X_{t-1}; \xi_t)$ with $\xi_t$ i.i.d. and $X_t$ $\tau$-weakly dependent, then

$$\frac{1}{T}\sum_{t=1}^{T} h(X_t) \xrightarrow{a.s.} E\left[h(X_t)\right]$$

provided that $E\left[\|h(X_t)\|\right] < \infty$.
Note that no role is played by the initial values in the statement above. Also observe that when $\varepsilon_t$ is an i.i.d.$(0, \sigma^2)$ sequence and $E[h^2(X_t)] < \infty$, it follows from Lemma 4.1 and a CLT for martingales (see Brown, 1971) that

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} h(X_t)\, \varepsilon_t \xrightarrow{d} N\!\left(0,\, \sigma^2 E[h^2(X_t)]\right) \qquad (4.5)$$

It is worth remarking that the Lipschitz condition in Assumption 1 rules out some unbounded transformations $f(x)$ of $x_t$, such as $f(x) = \exp(x)$. In order to handle such situations we introduce a truncated model:

$$\lambda_t^c = \omega + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{i=1}^{q} \beta_i \lambda_{t-i}^c + f(x_{t-1})\, I\{\|x_{t-1}\| \leq c\} \qquad (4.6)$$

for some cut-off point $c > 0$. We can then relax Assumption 1, allowing $f(x)$ to be locally Lipschitz in the following sense:

Assumption 1'  For all $c > 0$, there exists some $L_c < \infty$ such that $|f(x) - f(\tilde{x})| \leq L_c \|x - \tilde{x}\|$ for $\|x\|, \|\tilde{x}\| \leq c$.

By replacing Assumption 1 with Assumption 1', we obtain, by arguments identical to those in the proof of Theorem 4.1, that the truncated process has a weakly dependent stationary and ergodic solution. Though this approach recalls the approximated GARCH-type Poisson process introduced in FRT (2009), the reasoning is different. In FRT (2009) an approximated process was needed to establish geometric ergodicity of the Poisson process, while here we introduce the truncated process in order to handle the practice - often used in the literature - of introducing non-log realized volatility measures as exogenous covariates. Note that, as $c \to \infty$, the truncated process approximates the untruncated one ($c = +\infty$) in the following sense:

Lemma 4.2  Under Assumptions 1' and 2-4, together with $E[f(x_t)] < \infty$,

$$|E[\lambda_t^c - \lambda_t]| \leq \delta_1(c), \qquad |E[y_t^c - y_t]| \leq \delta_2(c), \qquad E[(y_t^c - y_t)^2] \leq \delta_3(c)$$

where $\delta_k(c) \to 0$ as $c \to \infty$, $k = 1, 2, 3$.

The above result is akin to Lemma 2.1 in FRT (2009). The additional assumption that $E[f(x_t)]$ is finite needs to be verified on a case-by-case basis.
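The truncation in (4.6) is mechanically simple, as the following hypothetical Python sketch shows (it takes $f(x) = \exp(x)$ with a coefficient $\gamma$, as in the simulation study later in this chapter): the covariate term is switched off whenever $|x_{t-1}| > c$, so the truncated intensity path coincides exactly with the untruncated one as soon as $c$ exceeds every $|x_t|$ in the sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the truncated intensity recursion (4.6), with f(x) = exp(x) and
# coefficient gamma (illustrative values only): the indicator simply removes
# the covariate term when |x_{t-1}| > c.
def intensity_path(y, x, omega, alpha, beta, gamma, c=np.inf):
    lam = np.empty(len(y))
    lam[0] = omega
    for t in range(1, len(y)):
        cov = gamma * np.exp(x[t - 1]) if abs(x[t - 1]) <= c else 0.0
        lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1] + cov
    return lam

x = rng.normal(size=500)
y = rng.poisson(1.0, size=500).astype(float)

lam_full = intensity_path(y, x, 0.1, 0.3, 0.2, 0.5)            # c = +inf
lam_trunc = intensity_path(y, x, 0.1, 0.3, 0.2, 0.5, c=10.0)   # generous cut-off
# With c above max|x_t| the two recursions coincide exactly on this sample.
print(np.max(np.abs(lam_full - lam_trunc)))
```

A small cut-off (e.g. $c = 1$) would instead discard part of the covariate information, which is the gap Lemma 4.2 controls as $c \to \infty$.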
For example, with $f(x) = \exp(x)$, the condition $E[f(x_t)] < \infty$ holds if $x_t$ has a Gaussian distribution, or some other distribution for which the moment generating function, or Laplace transform, is well-defined.

4.4 Maximum likelihood estimation

Denote by $\theta = (\omega, \alpha, \beta, \gamma) \in \mathbb{R}^{p+q+2}$, where $\alpha = (\alpha_1, \ldots, \alpha_p)'$ and $\beta = (\beta_1, \ldots, \beta_q)'$, the set of unknown parameters entering the PARX model in (4.2)-(4.3). The conditional log-likelihood function in terms of the observations $y_1, \ldots, y_T$, given the initial values $(\lambda_0, \lambda_{-1}, \ldots, \lambda_{-q+1}, y_0, y_{-1}, \ldots, y_{-p+1})$, takes the form

$$L_T(\theta) = \sum_{t=1}^{T} l_t(\theta), \qquad l_t(\theta) = y_t \log \lambda_t(\theta) - \lambda_t(\theta) \qquad (4.7)$$

where we have left out a constant term and

$$\lambda_t(\theta) = \omega + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{i=1}^{q} \beta_i \lambda_{t-i}(\theta) + \gamma f(x_{t-1})$$

The maximum likelihood estimator is then computed as

$$\hat{\theta} = \arg\max_{\theta \in \Theta} L_T(\theta) \qquad (4.8)$$

where $\Theta \subseteq \mathbb{R}^{p+q+2}$ is the parameter space. We now impose the following conditions on the parameters:

Assumption 5  Assume that $\Theta \subseteq \mathbb{R}^{p+q+2}$ is compact and $\theta_0 \in \operatorname{int} \Theta$. Moreover, for all $\theta = (\omega, \alpha, \beta, \gamma) \in \Theta$, $\beta_i \leq \beta_U < 1/q$ for $i = 1, 2, \ldots, q$ and $\omega \geq \omega_L > 0$.

Under this assumption, together with the ones used to establish stationarity of the model, we obtain the following asymptotic result for the maximum likelihood estimator:

Theorem 4.2  Under Assumptions 1-5, $\hat{\theta}$ is consistent and

$$\sqrt{T}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, G^{-1}), \qquad G = -E\left[\left.\frac{\partial^2 l_t(\theta)}{\partial \theta\, \partial \theta'}\right|_{\theta = \theta_0}\right] \qquad (4.9)$$

An important remark is the following. If the distribution of $y_t$ is misspecified, so that there is an error term in the definition of the intensity, but it still holds that $E[y_t] = \lambda_t$, we expect the asymptotic properties of the maximum likelihood estimator to remain correct, except that the asymptotic variance now takes the sandwich form $G^{-1} \Omega G^{-1}$, where

$$\Omega = E\left[\left.\frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'}\right|_{\theta = \theta_0}\right]$$

See Gourieroux et al. (2004) for an analysis of Quasi-Maximum Likelihood Estimation (QMLE) of Poisson models. Theorem 4.2 generalizes the result of FRT (2009) to allow for estimation of the parameters associated with additional regressors in the specification of $\lambda_t$.
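The maximization in (4.8) can be sketched numerically. The following is a hedged Python illustration, not the thesis's implementation (which was written in Matlab): it simulates a PARX(1,1) with covariate term $\gamma \exp(x_{t-1})$, evaluates the conditional log-likelihood (4.7) by filtering the intensity recursion, and maximizes it with a generic bounded optimizer; the starting values and bounds are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hedged sketch of the conditional MLE (4.7)-(4.8) for a PARX(1,1) with
# covariate term gamma*exp(x_{t-1}); optimizer, bounds and starting values
# are illustrative only.
def neg_loglik(theta, y, x):
    omega, alpha, beta, gamma = theta
    lam_prev, ll = omega, 0.0
    for t in range(1, len(y)):
        lam = omega + alpha * y[t - 1] + beta * lam_prev + gamma * np.exp(x[t - 1])
        ll += y[t] * np.log(lam) - lam          # Poisson log-lik, constant dropped
        lam_prev = lam
    return -ll

# Simulate from the model at theta0, then maximize the likelihood.
theta0 = np.array([0.10, 0.30, 0.20, 0.50])
T = 4000
x = rng.normal(size=T)
y = np.zeros(T)
lam = theta0[0]
for t in range(1, T):
    lam = theta0[0] + theta0[1] * y[t - 1] + theta0[2] * lam + theta0[3] * np.exp(x[t - 1])
    y[t] = rng.poisson(lam)

res = minimize(neg_loglik, x0=np.array([0.2, 0.2, 0.1, 0.3]), args=(y, x),
               method="L-BFGS-B",
               bounds=[(1e-6, 5.0), (0.0, 0.95), (0.0, 0.95), (0.0, 5.0)])
print(np.round(res.x, 2))   # should be close to theta0
```

The lower bound $\omega \geq 10^{-6}$ and the non-negativity constraints keep the filtered intensity strictly positive, mirroring the role of Assumption 5.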
By combining the arguments in FRT (2009) with Lemma 4.2, the asymptotic result can be extended to allow $f$ to be locally Lipschitz (see Assumption 1'). More precisely, we define the likelihood quantities for the approximated, or truncated, model as

$$L_T^c(\theta) = \sum_{t=1}^{T} l_t^c(\theta), \qquad l_t^c(\theta) = y_t^c \log \lambda_t^c(\theta) - \lambda_t^c(\theta) \qquad (4.10)$$

It immediately follows that the results of Theorem 4.2 hold for the QMLE $\hat{\theta}^c$ of $L_T^c(\theta)$. However, as the approximated likelihood function can be made arbitrarily close to the true likelihood as $c \to \infty$, one can show that Assumption 1 in Theorem 4.2 can be replaced by Assumption 1':

Theorem 4.3  Under Assumptions 1', 2-5 and $E[f(x_t)] < \infty$, $\hat{\theta}$ is consistent and

$$\sqrt{T}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, G^{-1}), \qquad G = -E\left[\left.\frac{\partial^2 l_t(\theta)}{\partial \theta\, \partial \theta'}\right|_{\theta = \theta_0}\right] \qquad (4.11)$$

With the above theorem we have generalized the asymptotic results by relaxing the assumptions on the function $f$.

4.5 Forecasting

The PARX model can be used to generate forecasts of both the intensity, $\lambda_t$, and the number of events, $y_t$. It is important to remark that, for multi-step forecasting, we also need to estimate the model for $x_t$ as given in (4.4). Given that $x_t$ is exogenous, we can estimate the parameters entering equation (4.4) independently of $\theta$. If no model is available for $x_t$, only one-step ahead forecasts are possible. In the following, we treat the parameters entering the model as known for notational ease. In practice, the unknown parameters are simply replaced by their estimates.

Forecasting of Poisson autoregressive processes is similar to forecasting of GARCH processes (see, e.g., Hansen et al., 2012, Section 6.2), since it proceeds in two steps. First, a forecast of the time-varying parameter - the variance in the case of GARCH, the intensity in the case of PARX - is obtained; then, this is substituted into the conditional distribution of the observed process $y_t$. Consider the forecasting of $\lambda_t$.
A natural one-step ahead forecast is

$$\lambda_{T+1 \mid T} = \omega + \sum_{i=1}^{p} \alpha_i y_{T+1-i} + \sum_{i=1}^{q} \beta_i \lambda_{T+1-i} + \gamma f(x_T) \qquad (4.12)$$

More generally, a multi-step ahead forecast of the distribution of $y_{T+h}$, for some $h > 1$, takes the form $F_{T+h \mid T}(y) = F(y \mid \lambda_{T+h \mid T})$, where $\lambda_{T+h \mid T}$ is the final output of the following recursion:

$$\lambda_{T+k \mid T} = \omega + \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i)\, \lambda_{T+k-i \mid T} + \gamma f(x_{T+k-1 \mid T}), \qquad k = 1, \ldots, h \qquad (4.13)$$

where the initial value $\lambda_{T+1 \mid T}$ derives from (4.12) and $x_{T+k \mid T}$, $k = 1, \ldots, h-1$, is obtained from some forecast procedure based on (4.4). For example, if the model for $x_t$ is an AR, the natural forecast is the corresponding conditional expectation. A point forecast of the count is then

$$y_{T+h \mid T} := E[y_{T+h} \mid \mathcal{F}_T] = \lambda_{T+h \mid T}$$

together with a $1-\alpha$ confidence interval (as implied by the forecast distribution) for some $\alpha \in (0,1)$. The symmetric $1-\alpha$ confidence interval takes the form

$$CI_{1-\alpha}^{T+h \mid T} = \left[\, Q(\alpha/2 \mid \lambda_{T+h \mid T}),\; Q(1-\alpha/2 \mid \lambda_{T+h \mid T}) \,\right]$$

where $p \mapsto Q(p \mid \lambda)$ denotes the quantile function of a Poisson distribution with intensity $\lambda$. The quantile function is available in standard statistical software packages, such as Matlab.

The forecasting results can be used to evaluate competing PARX models, e.g. based on different choices of covariates. A number of tests have been proposed in the literature for comparing forecasting models. One can use forecast evaluation methods based on the point forecast, $y_{T+h \mid T}$, as proposed in, among others, Christoffersen and Diebold (1997). Alternatively, the evaluation of the forecast distribution can be made by using the so-called scoring rules (Diebold et al., 1998). These take as starting point some loss function $S(P, y)$ whose arguments are the probability forecast, $P$, and the future realization, $y$. For instance, the log-score, $S(P, y) = \log P(y)$, can be used for ranking probability forecast methods by comparing their average scores. A test based on scoring rules is the likelihood ratio test studied by Amisano and Giacomini (2007).
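The two-step forecasting scheme and the log-score comparison just described can be sketched compactly in Python (hypothetical numbers throughout; the recursion below specializes (4.13) to a PARX(1,1) with the covariate term set to zero for brevity):

```python
import math
from scipy.stats import poisson

# Multi-step intensity forecasts from recursion (4.13) for a PARX(1,1),
# with the covariate term left out for brevity.
def intensity_forecasts(lam_one_step, omega, alpha, beta, h):
    lams = [lam_one_step]                       # lambda_{T+1|T} from (4.12)
    for _ in range(2, h + 1):
        lams.append(omega + (alpha + beta) * lams[-1])
    return lams

# Symmetric 1-alpha interval from the Poisson quantile function Q(p | lam).
def poisson_interval(lam, level=0.95):
    a = 1.0 - level
    return (poisson.ppf(a / 2, lam), poisson.ppf(1 - a / 2, lam))

# Log-score of an intensity forecast, constant term dropped as in the text.
def log_score(y, lam):
    return y * math.log(lam) - lam

lams = intensity_forecasts(lam_one_step=2.0, omega=0.1, alpha=0.3, beta=0.2, h=3)
print(lams)                    # approximately [2.0, 1.1, 0.65]
print(poisson_interval(5.0))   # (1.0, 10.0)

# Average log-score difference (LR) between two competing forecast sequences.
outcomes   = [3, 1, 4, 2]
forecasts1 = [2.8, 1.2, 3.9, 2.1]      # close to the outcomes
forecasts2 = [5.0, 5.0, 5.0, 5.0]      # a poor constant forecast
LR = sum(log_score(y, l1) - log_score(y, l2)
         for y, l1, l2 in zip(outcomes, forecasts1, forecasts2)) / len(outcomes)
print(LR > 0)                  # the better-calibrated model is preferred
```

Note how the count forecast intervals come directly from the Poisson quantile function evaluated at the forecast intensity, exactly the substitution step described above.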
Suppose we have two competing PARX models with corresponding intensity forecasts $\lambda_{T+h \mid T}^{(1)}$ and $\lambda_{T+h \mid T}^{(2)}$. We then define the corresponding log-likelihood functions given the actual outcome in period $T+h$,

$$l_{T+h \mid T}^{(k)} = y_{T+h} \log \lambda_{T+h \mid T}^{(k)} - \lambda_{T+h \mid T}^{(k)}, \qquad k = 1, 2$$

and compare the two forecasting models in terms of the Kullback-Leibler distance across $k \geq 1$ realizations and corresponding forecasts:

$$LR = \frac{1}{k} \sum_{T=m}^{m+k-1} \left\{ l_{T+h \mid T}^{(1)} - l_{T+h \mid T}^{(2)} \right\}$$

where $m \geq 1$ is the "training sample size", with $\{y_t, x_t : t = 1, \ldots, m\}$ being used to obtain the parameter estimates. If $LR > 0$ ($< 0$) we prefer the first (second) model. Amisano and Giacomini (2007) show that $LR$ is asymptotically normally distributed as $k \to \infty$.

4.6 Finite-sample simulations

In this section we present a simulation study with the aim of evaluating the performance of MLE for PARX models. We consider the results of simulations from PARX models with different covariate processes, mainly distinguishing between long-memory and short-memory processes. The objective is indeed to show not only the satisfactory performance of the estimation algorithm, but also the flexibility of PARX in terms of the choice of the covariates.

4.6.1 Simulation design

This experiment is focused on the finite-sample behaviour of MLE for PARX models. (We use Matlab for writing the data generation and estimation code.) We evaluate the parameter estimates for different sample sizes, in order to verify not only the accuracy but also the convergence to the asymptotic Gaussian distribution. In particular, our study is organized as follows. We simulate and fit the PARX(1,1) model

$$\lambda_t = \omega + \alpha_1 y_{t-1} + \beta_1 \lambda_{t-1} + \gamma \exp(x_{t-1}), \qquad y_t \mid \mathcal{F}_{t-1} \sim \operatorname{Pois}(\lambda_t)$$

Though our Monte Carlo experiment is shown here for a PARX(1,1) model only, the results are very similar if more lags of the response and intensity are included. We choose the exponential function as the positive function $f$ for including the generated exogenous covariate in the model (see Equation 4.3).
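Among the covariate designs considered below, the long-memory one requires a fractional filter. As a hedged Python sketch (the thesis used Matlab), the coefficients of the truncated fractional difference $(1-L)^d$ can be built with the standard recursion, and a fractional-noise covariate path then follows by inverting the filter:

```python
import math

# Coefficients pi_k of (1 - L)^d = sum_k pi_k L^k, via the recursion
# pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k, equivalent to
# pi_k = Gamma(k - d) / (Gamma(-d) * Gamma(k + 1)).
def frac_diff_coeffs(d, j):
    pi = [1.0]
    for k in range(1, j + 1):
        pi.append(pi[-1] * (k - 1 - d) / k)
    return pi

# Cross-check against the Gamma-function formula (valid for small k here).
def pi_gamma(d, k):
    return math.gamma(k - d) / (math.gamma(-d) * math.gamma(k + 1))

# Generate truncated fractional noise: (1-L)^d x_t = eps_t solved for x_t.
def fractional_noise(eps, d, j):
    pi = frac_diff_coeffs(d, j)
    x = []
    for t, e in enumerate(eps):
        x.append(e - sum(pi[k] * x[t - k] for k in range(1, min(t, j) + 1)))
    return x

d = 0.25
pi = frac_diff_coeffs(d, j=5)
print(pi[1], pi[2])   # -0.25, -0.09375 for d = 0.25
```

The slowly decaying coefficients are what give the covariate its long memory, in contrast to the geometric decay of the AR(1) and MA(1) cases.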
The exponential choice allows us to evaluate the parameter estimates when the Lipschitz condition on $f$ is relaxed, allowing unbounded transformations to be employed (see Assumption 1'). The exponential transformation will also be used in our empirical study. We examine different cases, based on alternative choices of the function $g(x, \varepsilon; \theta)$ in

$$x_t = g(x_{t-1}, \varepsilon_t; \theta)$$

The cases included in our simulation design are the following:

Case 1: stationary AR(1) covariate

$$x_t = \varphi x_{t-1} + \varepsilon_t, \qquad \varphi = 0.50$$

Case 2: MA(1) covariate

$$x_t = \varepsilon_t + \vartheta \varepsilon_{t-1}, \qquad \vartheta = 0.50$$

Case 3: ARFIMA(0,0.25,0) covariate

$$\Delta_j^d x_t = \varepsilon_t, \qquad d = 0.25$$

where, using the backward shift operator $L$,

$$\Delta_j^d = (1-L)_j^d = \sum_{k=0}^{j} \frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)}\, L^k$$

with $\Gamma(\cdot)$ denoting the gamma function and $j$ denoting the truncation order of the theoretical infinite sum

$$\Delta^d = (1-L)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)}\, L^k$$

In each case the innovation process $\{\varepsilon_t\}$ is chosen to be i.i.d. normal with variance $\sigma^2$ such that the variance of the covariate model is 1, thus facilitating comparisons. In all cases the initial values are set to $x_0 = 0$. Note that the choice of a fractional differencing order $d = 0.25$ for the fractional white noise satisfies the stationarity condition for autoregressive fractionally integrated processes, $|d| < 0.50$, so that Assumption 2 on the Lipschitz condition is not violated.

For each case we consider four alternative scenarios for the data-generating parameter values, changing the value of the sum of the persistence parameters $\alpha_1 + \beta_1$:

Scenario 1 - null coefficient of intensity: $\omega = 0.10$, $\alpha_1 = 0.30$, $\beta_1 = 0.00$, $\gamma = 0.50$

Scenario 2 - "low" persistence: $\omega = 0.10$, $\alpha_1 = 0.30$, $\beta_1 = 0.20$, $\gamma = 0.50$

Scenario 3 - "high" persistence with the coefficient of the response larger than the coefficient of intensity: $\omega = 0.10$, $\alpha_1 = 0.70$, $\beta_1 = 0.25$, $\gamma = 0.50$
Scenario 4 - "high" persistence with the coefficient of intensity larger than the coefficient of the response: $\omega = 0.10$, $\alpha_1 = 0.25$, $\beta_1 = 0.70$, $\gamma = 0.50$

The first scenario is comparable to an ARCH model, as only the lagged response is included. Note that none of the presented scenarios violates the stationarity condition $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$ (Assumption 3) that we have imposed when developing the asymptotic theory. For all scenarios we simulate sample sizes $T \in \{100, 250, 500, 1000\}$ with 1000 replications. We also include small sample sizes in order to provide insights into the quality of the estimates for short count time series, which are commonly modeled in many empirical applications.

4.6.2 Results

As discussed above, our study of the MLE performance in finite samples concerns both the accuracy and the speed of convergence to normality. In Tables 4.1 to 4.6, the mean of the parameter estimates (obtained by averaging the results over all replications) is reported in the fourth column, while the fifth shows the root mean square error (RMSE) of the estimates. The sixth and seventh columns report the skewness and the kurtosis of the distribution of the estimates. We also perform a Kolmogorov-Smirnov test of the (suitably standardized) estimates against the standard normal distribution and report the corresponding p-value in the last column. In what follows, we comment on the results obtained for the cases with AR/MA (short-memory) covariates and long-memory covariates separately.

Results for the short-memory covariates

In Tables 4.1 to 4.4, we show the results for the cases where short-memory processes are included in the intensity specification. We consider a stationary AR(1) and a stationary MA(1), thus two short-memory processes characterized by different rates of decrease of the autocorrelation function. The results are very similar. In both cases the estimate precision is fully satisfactory for a sample size of 500. We can also note a relevant improvement moving from T = 100 to T = 250. The best results are obtained in the first and second (low persistence) scenarios (see Tables 4.1 to 4.4). The "worst" scenario appears to be the third, i.e. when the persistence is close to one and the coefficient of the response $\alpha_1$ is higher than the coefficient of intensity $\beta_1$. Moreover, even in this case, the approximation improves quickly as the sample size increases. The least accurate estimate is that of the constant parameter ($\omega$). Convergence to normality is evident in both cases and for all the scenarios considered, as normality is never rejected at the 5% significance level when the sample size is at least 500.

Results for the long-memory covariates

Case 3 considers the inclusion of a fractionally integrated process (Tables 4.5 to 4.6). ARFIMA processes are weakly stationary if the condition $|d| < 0.50$ is satisfied (as in our experiment), but have slowly decaying autocorrelations compared to the exponential rate of decay typical of ARMA models. It is therefore convenient to consider this case separately. The results do not show substantial differences with respect to the previously examined case of AR/MA covariates. Again, the approximation is satisfactory, except for the constant parameter in Scenario 3, which however substantially improves for a sample size of 1000. Convergence to normality is confirmed, as the only rejections for sample sizes larger than 250 concern the constant parameter in Scenarios 3 and 4 (see Tables 4.5 to 4.6).
Table 4.1: Results of simulations for PARX(1,1) with stationary AR(1) covariate. Scenario 1: null coefficient of intensity. Scenario 2: "low" persistence. [Table body not reproduced: for each sample size T = 100, 250, 500, 1000, the columns report, for each parameter, the true value, mean estimate, RMSE, skewness, kurtosis and Kolmogorov-Smirnov p-value.]
Table 4.2: Results of simulations for PARX(1,1) with stationary AR(1) covariate. Scenario 3: "high" persistence due to high coefficient of the response. Scenario 4: "high" persistence due to high coefficient of intensity. [Table body not reproduced: same layout as Table 4.1.]
Table 4.3: Results of simulations for PARX(1,1) with MA(1) covariate. Scenario 1: null coefficient of intensity. Scenario 2: "low" persistence. [Table body not reproduced: same layout as Table 4.1.]
Table 4.4: Results of simulations for PARX(1,1) with MA(1) covariate. Scenario 3: "high" persistence due to high coefficient of the response. Scenario 4: "high" persistence due to high coefficient of intensity. [Table body not reproduced: same layout as Table 4.1.]
Table 4.5: Results of simulations for PARX(1,1) with ARFIMA(0,0.25,0) covariate. Scenario 1: null coefficient of intensity. Scenario 2: "low" persistence. [Table body not reproduced: same layout as Table 4.1.]
Table 4.6: Results of simulations for PARX(1,1) with ARFIMA(0,0.25,0) covariate. Scenario 3: "high" persistence due to high coefficient of the response. Scenario 4: "high" persistence due to high coefficient of intensity. [Table body not reproduced: same layout as Table 4.1.]

4.7 Concluding remarks

In this chapter we have defined and studied the properties of Poisson Autoregressions with Exogenous Covariates (PARX). Specifically, we have developed both the asymptotic and estimation theory, in addition to establishing conditions for stationarity and ergodicity of the defined process. We have also considered how forecasting can be carried out and evaluated in our framework. In the last section we have conducted a simulation study of different PARX models, i.e. models including different covariates. The results show a good performance of MLE and very small differences among the alternative PARX models considered.
In the empirical analysis discussed in the next chapter, we will show that the PARX model is extremely useful for investigating the corporate default phenomenon.

Chapter 5

Empirical study of Corporate Default Counts

So far we have presented default risk and the main measures and models for analyzing it (see Chapters 1 and 2). We have presented and discussed the literature on default correlation, as well as several studies investigating the predictability of default peaks, a phenomenon which is central in risk management. We have reviewed regression models including variables which may explain the incidence of corporate defaults, in terms of either default rates or counts. We have progressively focused on models for default counts, encouraged by the fact that the same clusters shown in the default rate time series are also evident in the time series of bankruptcy counts. Furthermore, as previously said, the main point in default rate prediction is forecasting the number of defaulting issuers by a certain time horizon. The predicted default intensity - the expected number of defaults - can be an easy and immediate instrument in bank risk management communications. The count models typically used for rare events, like the Poisson models presented in Chapter 3 together with other count time series models, seem to be suitable. Our idea of using Poisson models with both autoregressive components and exogenous regressors for capturing default clustering has led to the definition of a new model called Poisson Autoregression with Exogenous Covariates (PARX). How Poisson autoregressions and PARX models perform when handling actual corporate default data, and how the results of their application should be interpreted, are the research questions we address in this chapter.
5.1 Overview of the approach

We investigate corporate default dynamics through a count time series approach including autoregressive components and exogenous variables, sharing some similarities with the generalized autoregressive models for conditional volatility. Our analysis of corporate default dynamics takes an aggregate perspective, which does not take into account the firm-specific conditions determining the individual probability of default of a company. Rather, this study tries to measure an overall default risk concerning debt issuers of considerable relevance in terms of size, because we consider defaults among rated, and thus in most cases listed, firms. The default intensity of large firms is expected to be linked to common risk factors arising from the financial and macroeconomic context, as well as to possible contagion effects. We claim that this approach can give a useful measure of the general tendency in corporate default dynamics, providing a measure of "systematic" default risk which can support the traditional analysis of individual firm solvency conditions.

5.2 Corporate default counts data

The time series of corporate default counts we analyze here refers to the monthly number of bankruptcies among Moody's rated United States firms in the period from January 1982 to December 2011. The default count dataset is one of the risk monitoring instruments provided by Moody's Credit Risk Calculator (CRC), which allows the user to download historical default rates and counts in the form of customized reports, with many options in terms of time interval length and economic sectors. We choose to focus our study on the industrial sector: this means including all the firms covering non-financial activities and excluding banking, financial and insurance companies. This choice is quite common in the study of corporate default counts (see, for instance, Das et al., 2007, Lando and Nielsen, 2010 and Lando et al.
2013) and is motivated by the convenience of considering real-economy and financial-economy default events separately, at least in the first place. Other categories typically excluded are public utilities and transportation activities, because of their peculiar management structure, often linked to the public sector. More generally, the choice of using US data is motivated by the good quality and organization of the default data, at least from the 1980s onwards.

The Bankruptcy Reform Act of 1978, amending the Bankruptcy Act of 1898, is the first complete expression of US default law, seeking to give protection to creditors as well as the chance for borrowers to reorganize their activity. With this act, default legislation became uniform across all the federal states. The Bankruptcy Reform Act of 1978 continues to serve as the federal law that governs bankruptcy cases today, and again a strong emphasis is given to business reorganization (see Skeel, 2001 for a history of US bankruptcy law). However, in the US as in many European countries, during the period from World War II through the 1970s, bankruptcy was a nearly exceptional event. With the exception of the Northeastern railroads, there were not many notable business failures in the US in that time. During the 1970s, there were only two corporate bankruptcies of prominence: Penn Central Transportation Corporation in 1970 and W.T. Grant Company in 1975. It is interesting that the failure of Penn Central and the Northeastern railroads is often cited as the first documented case of contagion, as the major cause of the railroads' default was the missed payment of obligations by Penn Central. Both Das et al. (2007) and Lando and Nielsen (2010) cite the Penn Central case in their empirical analyses. The small number of defaults before the 1980s explains our choice of January 1982 as the starting period of our empirical analysis.
Some first considerations about the time series of corporate default counts in the US over the last thirty years can be made by inspecting a simple plot of our data, shown in Figure 5.1. The first evidence from Figure 5.1 is that the data show the peaks typically found in corporate default count time series, also referred to as "default clusters". The long memory of the series is evident from the slowly decaying autocorrelation function (see Figure 5.2).

Figure 5.1: Monthly default counts of US Moody's rated industrial firms from January 1982 to December 2011.

Figure 5.2: Autocorrelation function of the monthly default counts.

Looking in more detail at the peak periods and trying to connect them with the financial crises, many bankruptcies took place during the 1980s and early 1990s. Many well-known companies filed for bankruptcy, mainly encouraged by reorganization opportunities. These include LTV, Eastern Airlines, Texaco, Continental Airlines, Allied Stores, Federated Department Stores, Greyhound, Maxwell Communication and Olympia & York. Indeed, the financial sector also lived through years of trouble between the 1980s and the 1990s, such as the well-known "savings and loan" crisis. The financial crisis did not involve the banking sector only, as the 1987 market crash showed. The second peak in our series appears in the 1999-2002 period and, again, this is not surprising: in the years 2000-2001 a strong financial crisis took place, starting from the so-called "Dot-com" (or "Tech") bubble and causing the recession of 2001 and 2002. After a period of stability from 2003 to 2007, a new peak characterizes the final part of our sample, from 2008 to 2010, starting from the financial sector with the subprime crisis of 2007 and spreading to the real economy, as a global and systemic crisis, in the following years.
It is interesting to compare the default count time series to macroeconomic indicators such as the monthly Leading Index published by the Federal Reserve. The Leading Index includes the Coincident Index and a set of variables that "lead" the economy: state-level housing permits, state initial unemployment insurance claims, delivery times from the Institute for Supply Management (ISM) manufacturing survey, and the interest rate spread between the 10-year Treasury bond and the 3-month Treasury bill. Looking at Figure 5.3, the low level in the late 1980s and early 1990s as well as in 2000-2002 confirms the previous analysis, and again the last crisis turns out to be the most dramatic period. Another relevant index, explicitly signalling the phases of the business cycle, is the recession indicator released by the National Bureau of Economic Research (NBER): the NBER recession indicator is a time series consisting of a dummy variable that distinguishes periods of expansion and recession, where a value of 1 indicates a recessionary period and a value of 0 an expansionary one. The shaded areas created by the recession dates in Figure 5.4 confirm the previous identification of three turbulence periods (1982-1991, 2000-2002, 2008-2010). In our analysis we shall also consider the connection between the business cycle and the number of corporate defaults.

Figure 5.3: Monthly Leading Index from January 1982 to December 2011.

Based on the previous considerations, in Table 5.1 we show some descriptive statistics of the data in different subsamples of our dataset, which includes a total of 360 observations. In particular, we distinguish the three clusters of the late 1980s and early 1990s, the early 2000s and 2007-2010 respectively. In addition to the mean, the standard deviation and the median we also report the variance, underlining that all the considered subsamples present data overdispersion.
It is interesting to note that the effects on defaults of the crisis which spread in 2000 are the most severe in terms of average number of defaults. In the last financial and economic crisis period the most relevant aspect is instead the variance, as the number of defaults explodes and decreases quickly, while the previous clusters are more lasting in time.

Table 5.1: Descriptive statistics of the default count data.

Sample                       Mean   Std. Dev.   Variance   Median
first cluster: 1986-1991     3.54   3.54        7.50       3
second cluster: 2000-2003    7.69   3.79        14.83      7
third cluster: 2007-2010     5.96   6.65        44.17      4
whole dataset                3.51   3.95        15.57      2

Figure 5.4: Monthly NBER recession indicator from January 1982 to December 2011.

5.3 Choice of the covariates

Our empirical study concerns the time series analysis and modelling of the number of corporate defaults and also aims at measuring the impact of the macroeconomic and financial context on the default phenomenon. This calls for some reflection on the variables to be considered, which are expected to be common factors for corporate solvency conditions and thus to be predictive of the default clusters. This section complements the previous one - describing the default counts dataset which will be our response time series - by presenting the other data included in our study and motivating our choices. The covariates presented in the following can be divided into two groups:

- financial and credit market variables
- production and macroeconomic indicators

All the variables are included at monthly frequency.

5.3.1 Financial market variables

The performance of the financial market influences both firms' returns on financial investments, thus their profitability, and their funding capability, two aspects which strongly affect liquidity and solvency conditions.
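As a quick illustration, the statistics reported in Table 5.1 - and the overdispersion they reveal - can be reproduced mechanically for any count subsample. The following is a minimal numpy sketch (the function name is illustrative; the thesis code itself was written in Matlab):

```python
import numpy as np

def count_summary(counts):
    """Mean, standard deviation, variance and median of a count subsample,
    plus the variance/mean ratio: a ratio above 1 signals overdispersion
    relative to the Poisson distribution, whose variance equals its mean."""
    counts = np.asarray(counts, dtype=float)
    mean, var = counts.mean(), counts.var(ddof=1)  # sample variance
    return {
        "mean": mean,
        "std": np.sqrt(var),
        "variance": var,
        "median": np.median(counts),
        "dispersion_ratio": var / mean,
    }
```

Applied to the whole dataset, a dispersion ratio well above one (Table 5.1 implies roughly 15.57/3.51) confirms that an unconditional Poisson distribution would be misspecified.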
Not only the stock market, but also the money market - which includes short-term financial instruments such as Treasury bills, deposits and short-term mortgages - is part of the financial market and a relevant part of the credit market, where companies raise funds. With respect to funding, important variables are those expressing its cost, thus the interest rates and the relations between different interest rates, i.e. the credit spreads, which are widely used for deriving implied differences in risk. The market is not the only evaluator of corporate debt issuers, which are subject not only to the risk of becoming insolvent, but also to that of being downgraded by the rating agencies. Based on the above considerations, the financial and credit market variables we consider here are a measure of realized volatility of returns, the spread between the Moody's Baa rated corporate bond yield and the 10-year Treasury rate, and the number of Moody's downgrades.

Realized volatility of returns

Our choice of using a measure of volatility of stock returns rather than the returns themselves is motivated by the features of the corporate default time series, whose dynamics are mostly driven by variance. Indeed, as expected for rare events, the mean number of defaults is low and the level often comes back to zero. It is interesting to investigate the link between the financial market and the corporate default dynamics, which is expected to be strong in crisis periods. Realized volatility deserves special attention for several reasons. First, as for each of the covariates we include in PARX models, it is important to analyze its time series properties and verify whether the assumptions on its dynamics (see in particular Assumption 2 in Chapter 4) are satisfied. Furthermore, estimating a model for the covariate processes allows multi-step-ahead forecasting (see Section 4.5).
Recalling Section 4.1, the traditional realized volatility measures rely on the theory of a series of seminal papers by Andersen, Bollerslev, Diebold and Labys (2001), Andersen, Bollerslev, Diebold and Ebens (2001), and Barndorff-Nielsen and Shephard (2002), showing that the daily integrated variance, i.e. the integral of the instantaneous variance over the one-day interval, can be approximated to an arbitrary precision using the sum of intraday squared returns. Furthermore, other works such as Andersen, Bollerslev, Diebold, and Labys (2003) show that direct time series modelling of realized volatility strongly outperforms both GARCH and stochastic volatility models. Our approach refers to this theory, even though it is not really high-frequency: we construct a proxy of monthly realized volatility by using daily returns. Monthly volatility proxies of this kind can be found, for example, in French, Schwert and Stambaugh (1987) and Schwert (1989). According to this approach we define the following measure of S&P 500 monthly realized volatility:

RV_t = \sum_{i=1}^{n_t} r_{i,t}^2    (5.1)

where r_{i,t} is the i-th daily return on the S&P 500 index in month t and n_t is the number of trading days in month t. The high values of skewness (9.02) and kurtosis (100.26) of our proxy of realized variance indicate that it is far from normally distributed. Non-normality is also pointed out in empirical works based on realized volatility measures from high-frequency data, such as Martens et al. (2009). Realized volatility time series usually show high variance and peaks, recalling the sharp spikes of infinite variance processes that have often been used for modelling stock market prices (see, for example, Fama, 1965). The logarithmic transformation of our monthly realized volatility (see Figure 5.5 (a)) is more suitable for standard time series modelling, because the variance is lower and there are no outlier observations.
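Under definition (5.1), the monthly proxy is just a within-month sum of squared daily returns. A minimal sketch of its computation (function and variable names are illustrative, not part of the thesis code):

```python
import numpy as np

def monthly_realized_volatility(daily_returns, month_ids):
    """Monthly realized volatility proxy as in eq. (5.1):
    RV_t is the sum of squared daily returns within month t.
    `month_ids` labels the month each daily return belongs to."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    month_ids = np.asarray(month_ids)
    months = np.unique(month_ids)
    rv = np.array([np.sum(daily_returns[month_ids == m] ** 2) for m in months])
    return months, rv
```

The logarithmic transform discussed above is then simply `np.log(rv)`.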
Figure 5.5: (a) Logarithm of S&P 500 monthly realized volatility. (b) Autocorrelation function of logarithmic realized volatility.

The high and slowly decaying autocorrelation (see Figure 5.5 (b)) suggests the use of long memory processes such as ARFIMA. The long memory of realized volatility is a crucial point in some recent works on this topic - such as Andersen, Bollerslev and Diebold (2007) and Corsi (2009) - and casts doubt on whether the needed stationarity condition is satisfied. However, the same works claim that the long memory is "apparent", in the sense that the persistence in realized volatility series can be effectively captured by a special class of autoregressive models, which include different autoregressive parts corresponding to volatility components realized over different time horizons. These models are called Heterogeneous Autoregressive models of Realized Volatility (HAR-RV). Corsi (2009) defines a HAR model for daily realized volatility calculated from intraday data by considering three volatility components corresponding to time horizons of one day (1d), one week (1w) and one month (1m). These "heterogeneous" lags can be interpreted as taking into account financial return variability with respect to different investment time horizons. The specification proposed by the author for the daily realized volatility is the following:

RV_t^{(d)} = c + \beta^{(d)} RV_{t-1d}^{(d)} + \beta^{(w)} RV_{t-1d}^{(w)} + \beta^{(m)} RV_{t-1d}^{(m)} + \varepsilon_t    (5.2)

where RV_t^{(d)} = \sqrt{\sum_{i=0}^{n_t} r_{i,t}^2}, with n_t the number of available intraday squared returns, while RV_t^{(w)} and RV_t^{(m)} denote the weekly and monthly realized volatility respectively, computed as:

RV_t^{(w)} = \frac{1}{5} (RV_t^{(d)} + RV_{t-1d}^{(d)} + ... + RV_{t-4d}^{(d)})

RV_t^{(m)} = \frac{1}{22} (RV_t^{(d)} + RV_{t-1d}^{(d)} + ... + RV_{t-21d}^{(d)})

where the multiperiod volatilities are calculated as simple averages of the daily ones during the period.
This model is shown to be able to reproduce the long memory of empirical volatility. The model performance in terms of both in-sample and out-of-sample forecasting is comparable to that of fractionally integrated models, and the model can be estimated more easily, since OLS can be employed. Adapting this approach to our monthly realized volatility can be useful for carrying out multi-step-ahead forecasting in a PARX model including this variable. A possible choice of the "heterogeneous" lags suitable for our monthly measure is to include the first lag of logarithmic realized volatility and the last half-year logarithmic realized volatility. The latter is computed as the simple average of the last six monthly logarithmic realized volatilities. This yields the following model:

\log RV_t = c + \beta^{(1m)} \log RV_{t-1} + \beta^{(6m)} \log RV_{t-1}^{(6m)} + \varepsilon_t    (5.3)

where RV_t is defined in (5.1), while for the longer period component we have:

\log RV_t^{(6m)} = \frac{1}{6} (\log RV_t + \log RV_{t-1} + ... + \log RV_{t-5})

Following the notation of Corsi (2009), this specification corresponds to a HAR(2) model, because two volatility components enter. As an example, estimation of (5.3) for the logarithm of monthly realized volatility in the period from 1982 to 2011 yields the following model:

\log RV_t = -1.1030 + 0.5543 \log RV_{t-1} + 0.2733 \log RV_{t-1}^{(6m)}
            (0.2711)  (0.0580)               (0.0527)

which is a stationary autoregressive process.

Baa/10-year Treasury spread

The default risk premium, i.e. the risk premium investors require for accepting the risk of corporate default, is often calculated as the difference between the yields on corporate bonds and the yields on government securities - mainly Treasury bills - which are expected to be risk free.

Figure 5.6: Monthly spread between Baa Moody's seasoned corporate bonds and 10-year Treasury yield.
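Since (5.3) is linear in its parameters, it can be estimated by OLS, as noted above. A minimal sketch of this estimation, assuming the monthly log realized volatility series is already available (function name and indexing conventions are ours, not the thesis code):

```python
import numpy as np

def fit_har2(log_rv):
    """OLS estimation of the HAR(2) model (5.3):
    log RV_t = c + b1m * log RV_{t-1} + b6m * logRV6m_{t-1} + e_t,
    where logRV6m_t is the average of log RV_t, ..., log RV_{t-5}."""
    log_rv = np.asarray(log_rv, dtype=float)
    # 6-month moving average; entry k corresponds to time t = k + 5
    rv6 = np.convolve(log_rv, np.ones(6) / 6, mode="valid")
    y = log_rv[6:]                       # log RV_t for t = 6, ..., T-1
    X = np.column_stack([
        np.ones_like(y),                 # constant c
        log_rv[5:-1],                    # log RV_{t-1}
        rv6[:-1],                        # logRV6m_{t-1}
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c_hat, b1m_hat, b6m_hat]
```

On data generated by a HAR(2) process the OLS estimates recover the true coefficients, consistent with the direct least-squares estimation advocated by Corsi (2009).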
The spreads over Treasury rates can be considered as a measure of implied default risk, which we expect to be positively correlated with default intensity. One of the most commonly used is the Baa/10-year Treasury spread, i.e. the difference between the Moody's seasoned Baa corporate bond yield and the constant maturity 10-year Treasury rate. Our source for both rates is the FRED website (http://research.stlouisfed.org/), provided by the Federal Reserve Bank of St. Louis. Being a measure of the market perception of credit risk, the Baa/10-year spread is usually higher during recession periods, when investors are worried about default risk even for upper-medium quality firms like the Baa rated ones. This is evident from Figure 5.6: look, for example, at the high peak in the last crisis period.

Number of downgrades

The monthly counts of defaults are not the only data we get from Moody's CRC, which also provides the monthly rating transition matrices, where each entry is the number of firms moving from one rating class to another (see 2.1.1 for a comprehensive analysis of rating and its modelling). As discussed before, the main role of rating is to give an objective evaluation of corporate solvency. Therefore, the number of firms which are downgraded, i.e. moved to a lower rating class, is naturally expected to be predictive of an increased default probability. However, the capability of rating to act as a default predictor is not so clear-cut and, as seen, has been questioned by several econometric analyses, like, among others, Blume et al. (1998) and Nickell et al. (2000). Thus we think it is important to measure whether and how much the number of downgrades can support the prediction of the number of defaults.
At first sight (see Figure 5.7), most of the downgrade peaks correspond to the recession periods and the default clusters, except for the first peak, taking place in 1982, which is due to a credit rating refinement carried out and announced by Moody's, modifying the number and assignment of the classes (see Tang, 2009).

Figure 5.7: Monthly number of downgrades among industrial Moody's rated firms.

5.3.2 Production and macroeconomic indicators

Change in Industrial Production Index

The Industrial Production Index is an economic indicator that measures the real output of all manufacturing, mining and utilities facilities located in the United States. It is compiled by the Federal Reserve System on a monthly basis in order to bring attention to short-term changes in industrial production. As it measures the movements in industrial output, it is expected to highlight structural developments in the economy. Its change can be considered as an indicator of growth in the industrial sector and is already used as a default intensity regressor in Lando and Nielsen (2010). The monthly percentage change in the Industrial Production Index (Figure 5.8) is computed as the logarithmic difference of the monthly Industrial Production Index downloaded from the FRED website.

Figure 5.8: Monthly percentage change in Industrial Production Index.

Leading Index and NBER recession indicator

As our analysis of the default phenomenon is made from an aggregate perspective, we claim that the effect of the business cycle on default intensity has to be measured through overall indicators, representing the state of the economy - such as the Leading Index published by the Federal Reserve - or signalling the expansion and recession periods, as captured by the NBER recession index. They have been presented in Section 5.2.
The data for both variables are downloaded from the FRED website. For each financial and macroeconomic covariate described above, we perform an Augmented Dickey-Fuller (ADF) test, rejecting the null hypothesis of the presence of a unit root in all cases. All the variables introduced above can thus be employed in the following analysis, since they satisfy the Lipschitz condition (see Assumption 2 in Chapter 4). For realized volatility, the ADF test has been performed on the series in logarithms, whose properties we have previously investigated.

5.4 Poisson Autoregressive models for corporate default counts

The first objective of our analysis of corporate default count dynamics is to evaluate whether the inclusion of exogenous variables can improve the prediction of the number of defaults. In particular, we consider alternative PARX models by including different covariates and compare the results. Furthermore, we compare the PARX models with the Poisson Autoregression without exogenous regressors (PAR) as proposed by FRT (2009). We mainly focus on two aspects: first, we evaluate which of the chosen variables help explain the default intensity; second, we compute the value of the estimated persistence. As seen before, the latter measures the persistence of shocks in the default count process. We also aim at evaluating whether the inclusion of different covariates has a different impact on the estimated persistence: the magnitude of the autoregressive coefficients is expected to decline in case one or more covariates explain most of the series' long memory. This objective is thus similar to that of several empirical studies which consider the impact of covariates, such as trading volume, in the GARCH specification (see, for instance, Lamoureux and Lastrapes, 1990, and Gallo and Pacini, 2000) and evaluate their effect on the ARCH and GARCH parameter estimates.
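The ADF regression behind this unit root test can be sketched as follows. This is an illustrative implementation with a constant and no trend; the lag choice and the use of the approximate -2.86 asymptotic 5% critical value for the constant-only case are assumptions of the sketch, not choices documented here:

```python
import numpy as np

def adf_tstat(y, lags=1):
    """t-statistic of the Augmented Dickey-Fuller regression (constant, no trend):
    dy_t = a + rho * y_{t-1} + sum_j c_j * dy_{t-j} + e_t.
    Reject a unit root when the statistic is below the critical value
    (approximately -2.86 at the asymptotic 5% level for this case)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Y = dy[lags:]                                   # dy_t
    cols = [np.ones_like(Y), y[lags:-1]]            # constant, y_{t-1}
    cols += [dy[lags - j:-j] for j in range(1, lags + 1)]  # dy_{t-j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - X.shape[1])      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])             # t-stat on rho
```

For a clearly stationary series the statistic is far below -2.86, matching the rejections reported above for all covariates.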
In our context, the financial and macroeconomic variables explaining the default intensity can be considered as common factors influencing the solvency conditions of all companies. As seen before, in PARX models negative covariates are handled by transforming them through a positive function f, which can be chosen case by case, as long as the Lipschitz condition stated in Assumption 1' of Chapter 4 is satisfied. The specification which generalizes (4.3) by including an n-dimensional vector of covariates is the following:

\lambda_t = \omega + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{i=1}^{q} \beta_i \lambda_{t-i} + \sum_{i=1}^{n} \gamma_i f_i(x_{i,t-1})    (5.4)

where \omega > 0, \alpha_i, \beta_i, \gamma_i \geq 0 and f : R \to R^+.

According to the choice motivated in the previous section, the covariates included are the following:

- S&P 500 realized volatility (RV) (see Section 5.3.1 for details on its computation)
- Baa Moody's rated to 10-year Treasury bill spread (BAA_TB)
- Moody's downgrade count (DG)
- NBER recession indicator (NBER)
- percentage change in Industrial Production Index (IP)
- Leading Index (LI)

Function f is simply the identity for covariates assuming only positive values, while we use the absolute value for transforming the two variables which also assume negative values, that is the percentage change in the Industrial Production Index (IP) and the value of the Leading Index (LI). Both are also expected to be negatively correlated with default intensity. Then, for capturing the asymmetric effect of positive and negative values of these covariates, we introduce a dummy variable which is 1 when the value is lower than zero. This solution is analogous to that adopted in the GJR-GARCH model by Glosten et al. (1993), where a dummy variable is introduced for capturing the asymmetric effect of positive and negative lagged returns. According to Engle and Ng (1993), in volatility modelling this approach outperforms other specifications that overcome the problem of nonnegativity, such as the EGARCH by Nelson (1991).
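The intensity recursion (5.4) and the Poisson (quasi) log-likelihood it enters can be sketched as follows, assuming the covariates have already been transformed by their f_i; function names and the pre-sample initialization at the sample mean are our illustrative choices, not the thesis's Matlab implementation:

```python
import numpy as np

def parx_filter(y, X, omega, alpha, beta, gamma):
    """Intensity recursion of the PARX model (5.4):
    lambda_t = omega + sum_i alpha_i y_{t-i} + sum_i beta_i lambda_{t-i}
             + sum_i gamma_i f_i(x_{i,t-1}).
    X has one row per period, already holding the transformed covariates."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    p, q = len(alpha), len(beta)
    lam = np.full(T, y.mean())   # pre-sample intensities set to the sample mean
    for t in range(max(p, q, 1), T):
        lam[t] = (omega
                  + sum(alpha[i] * y[t - 1 - i] for i in range(p))
                  + sum(beta[i] * lam[t - 1 - i] for i in range(q))
                  + X[t - 1] @ np.asarray(gamma))
    return lam

def poisson_loglik(y, lam):
    """Poisson log-likelihood up to the additive -log(y_t!) terms,
    which do not depend on the parameters."""
    return np.sum(y * np.log(lam) - lam)
```

Maximum likelihood estimation then amounts to maximizing `poisson_loglik` over the parameters, subject to the nonnegativity constraints stated below (5.4).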
As to the realized volatility covariate, in the previous section we analyzed its logarithmic transform, which is stationary according to the ADF test performed. Furthermore, as we have seen, our logarithmic realized volatility has properties similar to the realized volatility measures analyzed in the literature, whose long memory can be effectively captured by stationary HAR processes (Corsi, 2009). Variable RV can then be considered as the exponential transformation of the logarithmic realized volatility, satisfying the model assumptions. Preliminary model selection based on information criteria and likelihood ratio tests leads to choosing p = 2 and q = 1, i.e. two lags of the response and one lag of the intensity. Thus, the model including all six covariates - nesting all the estimated models presented in the next section - is specified as

\lambda_t = \omega + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \beta_1 \lambda_{t-1} + \gamma_1 RV_{t-1} + \gamma_2 BAA\_TB_{t-1} + \gamma_3 DG_{t-1} + \gamma_4 NBER_{t-1} + \gamma_5 |IP_{t-1}| + \gamma_6 I_{\{IP_{t-1}<0\}} |IP_{t-1}| + \gamma_7 |LI_{t-1}| + \gamma_8 I_{\{LI_{t-1}<0\}} |LI_{t-1}|    (5.5)

5.4.1 Results

Table 5.2 shows the results obtained by estimating nine different PARX models (the optimization code for maximum likelihood estimation was written in Matlab). The upper portion of Table 5.2 reports the parameter estimates (standard errors in brackets). The lower portion reports, for each model, two information criteria, i.e. the AIC (Akaike, 1974) and the BIC (Schwarz, 1978), and the p-value of the likelihood ratio (LR) test. The latter compares each estimated model with the one which includes all six covariates ("All" in Table 5.2), thus following a specific-to-general model selection approach. The second column reports the results for the PAR model, i.e. the model with no covariates. The third to eighth columns in Table 5.2 report the results of estimation of models including one covariate at a time.
As explained above, for covariates IP and LI we also consider the effect of negative values separately, by introducing a dummy variable as in (5.5). The first evidence from our results is that the autoregressive components play the main role in the default dynamics. The estimated persistence is indeed not far from one in all the models. The number of defaults in the US economy shows a high persistence of shocks, supporting our proposal of a model able to capture long memory. But can exogenous covariates explain the strong autocorrelation, and hence the clusters, of defaults? The first evidence is that several of the covariates we have considered are found significant in explaining default intensity when included one at a time: the S&P 500 index realized volatility, the Baa Moody's rated to 10-year Treasury spread, the number of Moody's downgrades and the NBER recession indicator (all significant at the 5% level or less, except for the number of downgrades, which is significant at the 10% level). First of all, we think it is of particular interest that a financial variable such as realized volatility accounts for a real economic phenomenon such as defaults of industrial firms. The inclusion of realized volatility is indeed new in default risk analysis. While the use of credit spreads like the Baa to 10-year Treasury bill spread is quite common in default risk prediction - especially in the reduced-form models mentioned in Chapter 1, which use a pricing approach to default risk measurement - the inclusion of the number of downgrades among the regressors of default counts is new as well. In fact, there are several works in the literature focusing on the link between rating transitions and the business cycle - like, among others, Nickell et al. (2000) and Behar and Nagpal (2001) - but not estimating a direct relation between downgrades and defaults at an aggregate level. The significance of the NBER recession indicator highlights a connection between the business cycle and default dynamics and confirms the idea of a relation between economic recession and default clusters.
The effect of the macroeconomic context on default intensity is also captured by including the Industrial Production Index and the Leading Index. The asymmetric effect of the positive and negative values of variables IP and LI on default intensity is confirmed, as they are found significant only when assuming negative values (for models "IP" and "LI", as well as "All", we perform a restricted maximization of the log-likelihood function by constraining the coefficients to be positive): both a decrease in Industrial Production and a decrease in the value of the Leading Index result in a higher predicted level of default risk. According to the LR test, as well as the information criteria, all the models including one covariate at a time are preferable to the PAR model, thus highlighting that covariates are needed to account for the default phenomenon. Among these PARX models, according to both the information criteria and the LR test, the best are RV and LI. Realized volatility of returns and negative values of the Leading Index are indeed the only two significant covariates in the All model (5.5), which includes all the covariates. The result that the number of defaults is positively associated with the level of uncertainty shown by the financial market only one month before is of particular interest and could be effectively used for risk management operational purposes. Furthermore, the significance of the Leading Index shows that the macroeconomic context is relevant in default prediction. This is not an obvious result, as the existence of a link between macroeconomic variables and the corporate default phenomenon is not always supported by similar analyses in the econometric literature.
While, for example, Keenan, Sobehart, and Hamilton (1999) and Helwege and Kleiman (1997) forecast aggregate US corporate default rates using various macroeconomic variables, including industrial production, interest rates and indicators for recession, in some recent works the estimated relation between default rates and the business cycle is not so strong. In particular, the empirical results of both Duffie et al. (2009) and Giesecke et al. (2011) show a non-significant role of production growth, and Lando et al. (2013) find that, conditional on individual firm risk factors, no macroeconomic covariate is significant in explaining default intensity. Looking now at the estimated persistence (\hat{\alpha}_1 + \hat{\alpha}_2 + \hat{\beta}_1) and comparing it between the PAR and All models, we observe that the inclusion of covariates leads to a small decrease in the level of persistence (from 0.9155 to 0.8758), which is not significant. The large value of the estimated persistence and its substantial invariance when exogenous covariates are included indicate that the autoregressive parts of the model explain most of the slowly decaying behaviour of the autocorrelation function characterizing the default dynamics (see Figure 5.2). However, finding significant variables in default count time series is of relevant interest in default risk evaluation and forecasting. An increase in the level of the identified risk factors can indeed be a "warning" for risk managers and, in general, default risk evaluators. The final model we obtain on the basis of our model selection procedure is labelled RV & LI(-) in Table 5.2. Here we include both the S&P 500 realized volatility and the Leading Index - when taking negative values - in the model specification.
Table 5.2: Estimation results of different PARX models (parameter estimates with standard errors in brackets, the estimated persistence \hat{\alpha}_1 + \hat{\alpha}_2 + \hat{\beta}_1, AIC, BIC and LR test p-value for the PAR, RV, BAA_TB, DG, NBER, IP, LI, RV & LI(-) and All specifications).

Figure 5.9: Observed and fitted monthly number of defaults from January 1982 to December 2011 for the PARX model including logarithmic realized volatility and Leading Index.

5.4.2 Goodness of fit analysis

Overall, as can be seen from Figure 5.9, the model including realized volatility and Leading Index, using the prediction \hat{y}_t = \hat{\lambda}_t, captures the default count dynamics satisfactorily.
A commonly used diagnostic check for Poisson-type count models is to test the absence of autocorrelation in the Pearson residuals (see Section 3.2.5), which are the standardized version of the raw residuals y_t - \lambda_t(\hat{\theta}), taking into account that the conditional variance of y_t is not constant. In fact, the sequence of Pearson residuals estimates the sequence

e_t = \frac{y_t - \lambda_t}{\sqrt{\lambda_t}}, \quad t = 1, ..., T

which, as previously seen, is an uncorrelated process with mean zero and constant variance under the correct model. In addition, no significant serial correlation should be found in the sequence e_t^2 either. As can be seen from Figure 5.10, the Pearson residuals of our final estimated model do not show significant autocorrelation at any lag, and thus approximate a white noise satisfactorily.

Figure 5.10: Autocorrelation function of Pearson residuals for the PARX model including logarithmic realized volatility and Leading Index.

In order to check the adequacy of our model, following Jung et al. (2006) we perform a Ljung-Box test on the Pearson residuals and the squared Pearson residuals including 30 lags. The resulting p-values (0.661 and 0.373 respectively) indicate that the model successfully accounts for the dynamics of the first and second order moments of our default counts. An important point concerning the PARX model goodness of fit analysis in the specific case of our empirical study should be considered: when applying the PARX model to default counts, the aim is to capture the default clusters and signal the periods where the default intensity, and thus the default risk, is higher. Hence, the model performance is crucial when the number of observed events is relatively high. In this respect, Table 5.3 compares the empirical (second column) and estimated frequencies (third column) for different values of y_t.
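The Pearson residuals and the Ljung-Box statistic used for this check can be sketched as follows; a minimal implementation, where converting the statistic into a p-value additionally requires a chi-square tail function, omitted here:

```python
import numpy as np

def pearson_residuals(y, lam):
    """Pearson residuals e_t = (y_t - lambda_t) / sqrt(lambda_t)."""
    y, lam = np.asarray(y, float), np.asarray(lam, float)
    return (y - lam) / np.sqrt(lam)

def ljung_box_stat(e, n_lags=30):
    """Ljung-Box statistic Q = T(T+2) * sum_{k=1}^{m} rho_k^2 / (T-k),
    with rho_k the lag-k sample autocorrelation of the (demeaned) residuals.
    Under the white-noise null, Q is approximately chi-square with m
    degrees of freedom."""
    e = np.asarray(e, float) - np.mean(e)
    T = len(e)
    denom = e @ e
    rho = np.array([e[k:] @ e[:-k] / denom for k in range(1, n_lags + 1)])
    return T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, n_lags + 1)))
```

Applying the same statistic to the squared residuals gives the second-moment check reported above.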
Each of the estimated frequencies is computed as the probability, under the estimated model, of observing a count falling in the range defined in the first column.

Table 5.3: Empirical and estimated frequencies of default counts.

Count          Empirical frequency   Estimated frequency   p-value
y_t = 0        0.18                  0.12                  0.001
0 < y_t <= 5   0.62                  0.68                  0.002
y_t > 5        0.21                  0.19                  0.384
5 < y_t <= 10  0.14                  0.14                  0.741

In order to test the equality between theoretical and observed frequencies, we employ the test derived in the following, which is similar to the common test for equality of Bernoulli proportions. Suppose that we want to test the equality of the empirical and theoretical frequency of y_t values belonging to a subset A of N \cup \{0\} = \{0, 1, 2, ...\}. First define

Z_t = I(y_t \in A) \quad \text{and} \quad \pi_t = \Pr(Z_t = 1 \mid F_{t-1})

It can be noted that E(Z_t - \pi_t \mid F_{t-1}) = 0, i.e. Z_t - \pi_t is a martingale difference sequence with respect to F_{t-1}. The conditional variance of each Z_t - \pi_t variable can be derived as follows:

V(Z_t - \pi_t \mid F_{t-1}) = E[(Z_t - \pi_t)^2 \mid F_{t-1}]
= E(Z_t^2 \mid F_{t-1}) - 2 \pi_t E(Z_t \mid F_{t-1}) + \pi_t^2
= E(Z_t^2 \mid F_{t-1}) - \pi_t^2
= \pi_t - \pi_t^2 = \pi_t (1 - \pi_t)

Define now

S_T = \sum_{t=1}^{T} (Z_t - \pi_t)

As the sequence \pi_t (1 - \pi_t) is a stationary and ergodic process, the mean of the conditional variances is asymptotically constant:

V\left(\frac{S_T}{\sqrt{T}}\right) = \frac{1}{T} \sum_{t=1}^{T} \pi_t (1 - \pi_t) \xrightarrow{p} \sigma^2

This allows us to apply the Martingale Central Limit Theorem (Brown, 1971) to S_T and state that

s_T = \frac{S_T}{\sqrt{\sum_{t=1}^{T} \pi_t (1 - \pi_t)}} \xrightarrow{d} N(0, 1)

A one-sided or two-sided test can then be constructed based on N(0, 1) critical values, replacing the unknown \pi_t's with their estimates \hat{\pi}_t = \Pr(Z_t = 1 \mid \lambda_t(\hat{\theta})) given by the model. The last column of Table 5.3 shows the p-values of the two-sided test constructed as above for different subsets A.
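The statistic s_T derived above can be sketched as follows for the Poisson case, where \hat{\pi}_t is obtained by summing the Poisson pmf with intensity \hat{\lambda}_t over the set A; the function name and the truncation of the Poisson support are our illustrative choices:

```python
import numpy as np
from math import erf  # standard normal CDF via the error function

def frequency_test(y, lam, in_A):
    """Two-sided martingale test comparing the empirical frequency of counts
    in a set A with the model-implied one. `in_A(k)` says whether count k
    belongs to A; pi_t = P(y_t in A | lambda_t) under the Poisson model."""
    y = np.asarray(y)
    lam = np.asarray(lam, dtype=float)
    Z = np.array([in_A(k) for k in y], dtype=float)
    # Poisson pmf over a truncated support, via logs for numerical stability
    ks = np.arange(int(y.max()) + 51)
    mask = np.array([in_A(k) for k in ks], dtype=float)
    log_fact = np.cumsum(np.log(np.maximum(ks, 1)))          # log k!
    log_pmf = ks * np.log(lam[:, None]) - lam[:, None] - log_fact
    pi = (np.exp(log_pmf) * mask).sum(axis=1)                # pi_t
    s = (Z - pi).sum() / np.sqrt((pi * (1 - pi)).sum())      # s_T statistic
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(s) / np.sqrt(2))))  # two-sided N(0,1)
    return s, p_value
```

With A = {0}, pi_t reduces to exp(-lambda_t), the zero-count probability discussed below.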
As can be seen from Table 5.3, for values larger than 5 and for the subset (5, 10], we accept the null hypothesis of equality between the empirical and theoretical proportions at the 5% significance level. It is a good result that the model correctly estimates the frequency of defaults when the relevance of the phenomenon becomes considerable. Prediction is indeed not crucial in periods of stability, when defaults are rare and isolated events. Equality is rejected when the number of defaults is null or very low.

Some considerations have to be made about the incidence of zero counts. Default of rated firms is a rare event, nearly exceptional in periods of economic expansion and financial stability. Thus, default count time series are characterized by a high number of zero observations. In our default counts dataset, there are 63 zeros out of a total of 360 observations, corresponding to a proportion of 17.5%. In the PARX models, the distribution of the number of events conditional on its past and on the past values of a set of covariates is Poisson. The Poisson distribution does allow for zero observations: at each time $t$, the probability of a zero count is given by $\exp(-\lambda_t)$, i.e. the probability of the value 0 under a Poisson distribution with intensity $\lambda_t$. An aspect often investigated in the specification analysis of Poisson regression models is whether the incidence of zero counts is greater than that expected under the Poisson distribution. In our application, the analysis of the incidence of zero counts should take two main points into account. First, the empirical frequency of zero counts has to be compared with that implied by the PARX model. Then, the relevance of a possible underestimate of the number of zeros has to be evaluated with respect to our specific case. Figure 5.11 can give an idea of the relation between the observed zeros and the
probability of having a sampling zero under the model assumptions.

Figure 5.11: Empirical zero counts (asterisks) and probability of having a zero count under the estimated model (crosses).

There is a clear correspondence between the periods characterized by a higher number of zeros and the probability of having a sampling zero. The latter reaches values of more than 40% in the two most "zero-inflated" periods, 1982-1987 and 1994-1997. There is only one part of the series, around the year 1987, showing an estimated zero frequency of less than 10% when the empirical one is high. However, this period anticipates the late-eighties financial crisis, which was characterized by a rapidly increasing number of defaults and corresponds to a decrease in the estimated probability of a zero count. A possible way of accounting for excess zeros in Poisson models is to define mixture models such as those proposed and applied by Mullahy (1986), Johnson, Kotz, and Kemp (1992) and Lambert (1992), known as Zero-Inflated Poisson (ZIP) models. In ZIP models, an extra proportion of zeros is added to that implied by the Poisson distribution. The zeros from the Poisson distribution can be considered sampling zeros, occurring by chance, while the others are structural zeros, not depending on the regressor dynamics. It is worth noting that in our application, which considers aggregate data on default incidence, the distinction between structural and sampling zeros is not so relevant. First of all, the occurrence of a single default is linked to the individual firm history and to occasional - and difficult to predict - individual events. Furthermore, the zero-inflated periods are those where the importance of default prediction is low.

5.5 Out-of-sample prediction

We perform a forecasting experiment to evaluate the PARX model out-of-sample performance.
We focus, in particular, on out-of-sample prediction over the period from January 2008 to December 2011, which corresponds to the last financial crisis and shows a sharp peak in the number of defaults. Specifically, we perform a series of static one-step-ahead forecasts, updating the parameter estimates at each observation. The PARX model we consider includes the S&P 500 realized volatility and the negative values of the Leading Index, which is the preferred model according to the selection presented in the previous section. We also compare the results with those obtained with the PAR model, in order to evaluate whether the included covariates improve the prediction. Table 5.4 shows the results of both the point (third and sixth column) and interval (columns fourth to fifth and seventh to eighth) estimates at each step, from $h = 1$ to $h = 48$, the latter corresponding to the last observation in our dataset. Following Section 4.5, the point estimate of $y_{T+h}$ is defined as
$$\hat y_{T+h|T+h-1} = \hat\lambda_{T+h|T+h-1},$$
while the 95% confidence interval for $y_{T+h}$ is given by
$$CI = \left[ Q_{\alpha/2}\big(\hat\lambda_{T+h|T+h-1}\big),\; Q_{1-\alpha/2}\big(\hat\lambda_{T+h|T+h-1}\big) \right],$$
where $\alpha = 0.05$ and $Q_p(\lambda)$ denotes the $p$-quantile of the Poisson distribution with intensity $\lambda$. In Table 5.4, $Q_{\alpha/2}(\hat\lambda_{T+h|T+h-1})$ and $Q_{1-\alpha/2}(\hat\lambda_{T+h|T+h-1})$ are indicated as "min" and "max" respectively. We also report, as performance measures, the mean absolute error (MAE) and the root mean square error (RMSE). According to both indicators, the PARX model slightly outperforms the model without covariates. A comparison between the two models is also possible from Figure 5.12, which plots the actual number of defaults together with the minimum ("min") and maximum ("max") values of the forecast confidence interval for the PARX (first panel) and the PAR (second panel) model. Not surprisingly, in both cases the peak of March 2009, corresponding to an outlier in the default count time series, falls outside the forecasting interval.
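The "min" and "max" bounds described above are Poisson quantiles evaluated at the one-step-ahead intensity. The following minimal sketch (the function name is ours) reproduces, for example, the $h = 15$ PARX row of Table 5.4, where the estimated intensity 13.230 yields the interval [7, 21]:

```python
from scipy import stats

def forecast_interval(lam_hat, alpha=0.05):
    """One-step-ahead point forecast and (1 - alpha) interval for a Poisson count:
    the point forecast is the predicted intensity itself and the bounds are the
    alpha/2 and 1 - alpha/2 quantiles of the Poisson distribution with that intensity."""
    lo = int(stats.poisson.ppf(alpha / 2.0, lam_hat))
    hi = int(stats.poisson.ppf(1.0 - alpha / 2.0, lam_hat))
    return lam_hat, lo, hi

point, lo, hi = forecast_interval(13.230)  # h = 15 PARX intensity from Table 5.4
```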
There is indeed, for both models, a delay of three months in predicting the sharpest peak of the series. However, the PARX model predicts four more defaults than the PAR at the peak; thus, considering the realized volatility - as a proxy of financial market uncertainty - and the Leading Index - summarizing the macroeconomic context - reduces the underestimation of the number of defaults in this cluster. Furthermore, the rapid increase in the default counts starting from November 2008 is captured better by the PARX model, whose predicted values increase more quickly than the number of defaults forecasted by the PAR. The high persistence, not far from one in all the estimates, and the consequent slow decay of the autocorrelation lead the predicted series to decrease more slowly than the empirical series of default counts. Overall, the PARX model performs better than the PAR in capturing the default clustering.

  h   y_{T+h}   ŷ(PARX)   min   max   ŷ(PAR)   min   max
  1      5       1.094     0     4     1.081     0     3
  2      3       1.786     0     5     1.779     0     5
  3      4       2.340     0     6     2.337     0     6
  4      3       2.703     0     6     2.705     0     6
  5      7       2.958     0     7     2.919     0     7
  6      3       3.635     0     8     3.602     0     8
  7      6       3.953     1     8     3.893     1     8
  8      4       4.230     1     9     4.108     1     8
  9      5       4.501     1     9     4.287     1     9
 10      4       4.590     1     9     4.334     1     9
 11      3       4.589     1     9     4.321     1     9
 12     16       6.077     2    11     4.059     1     8
 13     11       8.912     4    15     6.046     2    11
 14     16      11.066     5    18     8.144     3    14
 15     29      13.230     7    21    10.178     4    17
 16     19      17.216    10    26    15.170     8    23
 17     23      19.602    11    29    18.103    10    27
 18     21      20.121    12    29    18.963    11    28
 19     14      20.290    12    30    19.600    11    29
 20      5      18.369    10    27    17.767    10    26
 21     16      14.690     8    23    13.650     7    21
 22      6      12.512     6    20    11.867     6    19
 23      5      11.062     5    18    10.786     5    18
 24      6       8.255     3    14     7.840     3    14
 25      6       6.705     2    12     6.470     2    12
 26      1       5.970     2    11     6.012     2    11
 27      5       4.731     1     9     4.617     1     9
 28      4       3.799     1     8     3.788     1     8
 29      0       3.926     1     8     4.116     1     9
 30      3       3.161     0     7     3.063     0     7
 31      3       2.546     0     6     2.387     0     6
 32      4       2.801     0     6     2.790     0     6
 33      2       3.076     0     7     3.205     0     7
 34      2       3.055     0     7     3.138     0     7
 35      4       2.599     0     6     2.652     0     6
 36      4       2.723     0     6     2.917     0     7
 37      0       3.212     0     7     3.479     0     8
 38      1       2.660     0     6     2.762     0     6
 39      3       1.832     0     5     1.822     0     5
 40      0       1.964     0     5     2.070     0     5
 41      2       1.876     0     5     1.912     0     5
 42      2       1.595     0     4     1.656     0     5
 43      0       1.849     0     5     1.971     0     5
 44      0       1.613     0     4     1.634     0     5
 45      1       1.185     0     4     1.049     0     3
 46      1       1.298     0     4     1.018     0     3
 47      3       1.465     0     4     1.224     0     4
 48      2       1.933     0     5     1.802     0     5

 MAE             2.543                 2.840
 RMSE            4.119                 4.613

Table 5.4: Out-of-sample estimation results of the PARX and PAR models.

Figure 5.12: Actual and forecasted number of defaults with the PARX (first panel) and PAR (second panel) model.

5.6 Concluding remarks

In this chapter we have presented an empirical analysis of corporate default dynamics. Our study is based on the estimation of Poisson Autoregressive models for the monthly count of defaults among Moody's rated industrial firms in the period from January 1982 to December 2011. The objective of our analysis is twofold: first, we want to evaluate whether there are macroeconomic and financial variables which can be useful in default prediction; secondly, an important point is to assess the relevance of the autoregressive components, whose presence is an essential part of our modelling approach. We estimate both the Poisson Autoregression without covariates (PAR) and different PARX models including macroeconomic and financial covariates. Our results show that all the estimated PARX models are preferable to the PAR. The most relevant covariates in explaining default intensity, according to our results, are a macroeconomic variable - the Leading Index released by the Federal Reserve - and a financial variable - the realized volatility of S&P 500 returns. To our knowledge, this is the first work showing a positive association between the financial market uncertainty captured by realized volatility and the number of corporate defaults.
The link between realized return volatility and default dynamics is worth further investigation. Another aspect which should be analyzed further is the high persistence in the default intensity estimated by the PARX models. The persistence of shocks in the number of defaults could be caused both by persistence in the common default risk factors and by contagion effects among firms. Overall, our results show that the PARX model including realized volatility and the Leading Index fits the data satisfactorily and captures the default clustering. We have also performed a forecasting experiment in order to evaluate the PARX model out-of-sample performance during the 2008-2011 crisis period and reached quite satisfactory results, showing that including covariates improves the out-of-sample prediction of the default counts.

Chapter 6

Conclusions

We have developed this thesis with the aim of studying the modelling of default risk, proposing a new modelling framework and highlighting the main factors influencing corporate default dynamics. We have started from the analysis of the stylized facts in corporate default count and rate time series. The default phenomenon, like most rare events, is characterized by overdispersion - the variance of the number of events is much higher than its mean - leading to series showing both peaks ("clusters") and periods of low incidence. Moreover, default time series are characterized by a slowly decreasing autocorrelation function, which is a typical feature of long-memory processes. In recent years, encouraged by the increasing relevance of the default phenomenon during the financial crisis that started in 2008, the econometric and financial literature has shown a growing interest in default risk modelling. In particular, as seen in Chapter 2, most works have investigated the topic of default predictability by analyzing the link between the default clusters and the macroeconomic context.
Another relevant aspect in default prediction is the role of rating, which we have analyzed both in the theoretical part of the thesis and in our empirical study. Several recent works - we have reviewed in detail the approaches of Das et al. (2006), Lando and Nielsen (2010) and Lando et al. (2013) - have developed and applied models based on counting processes, where the modelled variable is the default intensity, i.e. the expected number of defaults per time unit, typically a month. The use of counts eases the testing of independence of default events conditional on common macroeconomic and financial factors. Comparing the distribution of the default counts to a Poisson distribution with constant intensity is the crucial feature of the cited works and has inspired our idea: modelling defaults with a conditional Poisson model with time-varying intensity, allowing for overdispersion and slowly decaying autocorrelation of the counts through the inclusion of autoregressive dynamics. We have then reviewed the recent literature on Autoregressive Conditional Poisson (ACP) models, focusing on the Poisson Autoregression of Fokianos, Rahbek and Tjøstheim (2009), which is the first work studying the ergodicity of these models and providing the asymptotic theory allowing for inference. Defining an autoregressive Poisson model for default counts, linking the expected number of default events to its past history, is the first part of our contribution. The inclusion of autoregressive components is also relevant in the analysis of correlation between corporate defaults, linked to the recent debate about the possible existence of default contagion effects. The consideration that the expected number of defaults is probably influenced by the macroeconomic and financial context in which corporate firms operate has led us to the idea of extending the Poisson Autoregression (PAR) of Fokianos, Rahbek and Tjøstheim (2009) by including exogenous covariates.
This is our methodological contribution, developed in Chapter 4, where we have presented a class of Poisson intensity AutoRegressions with eXogenous covariates (PARX models) that can be used for modelling and forecasting time series of counts. We have analyzed the time series properties of these new models and the conditions for stationarity, also developing the asymptotic theory. The PARX models provide a flexible framework for analyzing the dependence of the default intensity on both the past number of default events and other relevant variables. In Chapter 5 we have applied different Poisson Autoregressive models, presenting an extended empirical study of US corporate defaults based on Moody's monthly default count data. The time interval considered, from January 1982 to December 2011, includes three clusters of defaults corresponding to three crisis periods: the late-eighties financial markets crisis, the 2000-2001 information technology bubble and the financial and economic crisis that started in 2008. We have proposed and motivated a selection of covariates which can potentially explain the default clusters and the strong autocorrelation in the number of defaults. An original feature is, in particular, the inclusion of a measure of intra-monthly realized volatility, computed from daily S&P 500 returns. Realized volatility is indeed expected to summarize the uncertainty on financial markets, characterizing the periods of financial turmoil when defaults are more likely to cluster. According to the results of our empirical analysis, the one-month-lagged realized volatility of returns is the most relevant covariate in explaining default intensity, together with the one-month-lagged Leading Index. The latter is a macroeconomic indicator provided by the Federal Reserve, which includes a set of variables expected to anticipate the tendency of the US economy.
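In the notation used throughout the thesis (see also the proofs in Appendix A), the PARX specification described here can be summarized as:

```latex
y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Poisson}(\lambda_t),
\qquad
\lambda_t = \omega + \sum_{i=1}^{p} \alpha_i\, y_{t-i}
          + \sum_{i=1}^{q} \beta_i\, \lambda_{t-i}
          + f(x_{t-1}),
```

with $\omega > 0$, $\alpha_i, \beta_i \ge 0$ and $f$ a nonnegative function of the exogenous covariates $x_t$; stationarity requires $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$, the condition used in the proof of Theorem 4.1. Setting $f \equiv 0$ recovers the PAR model without covariates.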
To our knowledge, ours is the first work showing a positive association between the financial market uncertainty captured by realized volatility and the number of corporate defaults. The inclusion of the Leading Index is also new, and its significance highlights the predictive role of the business cycle, which previous works have tried to capture using GDP and industrial production growth, not always found significant in explaining default frequencies. Overall, our results have shown that the PARX model including realized volatility and the Leading Index fits the default count data satisfactorily and captures the default clustering. We have also performed a forecasting experiment in order to evaluate the PARX model out-of-sample performance during the 2008-2011 crisis period and reached quite satisfactory results, showing that including covariates improves the out-of-sample prediction of the default counts. However, the default count dynamics are mainly led by the autoregressive components and show a high persistence of shocks, even when significant exogenous covariates are included. In this respect, the main consideration arising is that the modelling of the aggregate default intensity should be supported by the analysis of firm-specific or, at least, sector-specific variables. Sector profit indexes, for example, could improve default prediction, as solvency is strongly linked to firms' balance sheet data. Including less aggregate data in default risk analysis could also make it possible to identify the risk factors linked to correlation among the solvency conditions of different companies. The fact that the autoregressive components have a stronger role than the overall default risk factors in explaining default dynamics is an interesting result. However, it is not sufficient to state that contagion effects explain the autocorrelation in the number of defaults, as long as the commercial and financial links among companies are not taken into account.
Another important point relative to the prominent role of the autoregressive part is that it should not discourage the search for, and analysis of, exogenous risk factors. Finding variables significantly associated with the number of defaults can indeed provide warning signals in default risk evaluation. At the aggregate level, the default phenomenon is influenced by the financial and macroeconomic context but, at the same time, has an effect on it. The most immediate example is that of the credit spreads - included in our empirical study - which reflect the level of default risk connected to financial positions. A higher default risk also affects agents' expectations, having an impact on the uncertainty captured by the volatility of financial returns. When the number of defaults is high, the companies' investment decisions and the commercial links among firms are also affected, with consequences for industrial production. These considerations suggest relaxing the covariate exogeneity assumption and, as a future development of our work, defining a multivariate model. Another aspect which should be analyzed further is the usefulness of the PARX models for defaults at the operational level: the relevance of a new model for default risk should be evaluated with respect to the actual needs of risk management practice. As an example, one of the main applications of models for default risk concerns the pricing of corporate bonds. Measuring how much our estimated default intensity is reflected in the market price of the financial instruments issued by rated companies could support the evaluation of the PARX models' performance.

Appendix A

Proofs

Proof of Theorem 4.1

Define $\rho := \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i)$, with $\rho < 1$ by assumption. Moreover, consider the norm given by $\|(x, \lambda)\|_w := w_x \|x\| + w_\lambda \|\lambda\|$, where $w_x, w_\lambda > 0$ are chosen below. Next, with $\alpha = (\alpha_1, \ldots, \alpha_p)$ and $\beta = (\beta_1, \ldots, \beta_q)$ and, correspondingly, $N(\lambda)$ of dimension $p$ and $\lambda$ of dimension $q$, consider, with $N_t(\lambda) = (N_t(\lambda_1), \ldots, N_t(\lambda_p))'$, the mapping
$$F(x, \lambda, \varepsilon, N) = \big( g(x, \varepsilon)', \; \omega + \alpha' N(\lambda) + \beta' \lambda + f(x) \big)'.$$
Then
$$E\big\| F(x, \lambda, \varepsilon_t, N_t(\lambda)) - F(\tilde{x}, \tilde{\lambda}, \varepsilon_t, N_t(\tilde{\lambda})) \big\|_w
= w_x E\|g(x, \varepsilon_t) - g(\tilde{x}, \varepsilon_t)\| + w_\lambda E\big| \alpha' \{ N_t(\lambda) - N_t(\tilde{\lambda}) \} + \beta'(\lambda - \tilde{\lambda}) + f(x) - f(\tilde{x}) \big| \tag{A.1}$$
$$\le w_x L \|x - \tilde{x}\| + w_\lambda \Big( \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) \|\lambda - \tilde{\lambda}\| + L \|x - \tilde{x}\| \Big)
= (w_x + w_\lambda) L \|x - \tilde{x}\| + w_\lambda \rho \|\lambda - \tilde{\lambda}\|, \tag{A.2}$$
using the Lipschitz properties of $g$ and $f$ and the fact that, for the unit-intensity Poisson process, $E|N_t(\lambda_i) - N_t(\tilde{\lambda}_i)| = |\lambda_i - \tilde{\lambda}_i|$. Since $L < 1$, choose $w_x = 1$ and $w_\lambda = \delta$ for some small $\delta > 0$ such that $(1 + \delta) L < 1$; then
$$E\big\| F(x, \lambda, \varepsilon_t, N_t(\lambda)) - F(\tilde{x}, \tilde{\lambda}, \varepsilon_t, N_t(\tilde{\lambda})) \big\|_w \le c\, \|(x, \lambda) - (\tilde{x}, \tilde{\lambda})\|_w, \qquad c := \max\{(1 + \delta) L, \rho\} < 1. \tag{A.3}$$
Finally,
$$E\| F(0, 0, \varepsilon_t, N_t(0)) \|_w = w_x E\|g(0, \varepsilon_t)\| + w_\lambda (f(0) + \omega) < \infty$$
by Assumption 4. The result then holds by Corollary 3.1 in Doukhan and Wintenberger (2008). That $y_t$ is stationary is clear.

Next, with $z_t := (x_t', \lambda_t')'$, consider
$$P((y_t, z_t) \in A \times B \mid M_{y,t-p}, M_{z,t-p}) = P(y_t \in A \mid z_t \in B, M_{y,t-p}, M_{z,t-p})\, P(z_t \in B \mid M_{y,t-p}, M_{z,t-p}),$$
where $M_{x,t-k} = (x_{t-k}, x_{t-k-1}, \ldots)$. Now, by definition of the process,
$$P(y_t \in A \mid z_t \in B, M_{y,t-p}, M_{z,t-p}) = P(y_t \in A \mid z_t \in B).$$
Next, using the Markov chain property of $z_t$,
$$P(z_t \in B \mid M_{y,t-p}, M_{z,t-p}) = P(z_t \in B \mid M_{z,t-p}),$$
where the right-hand side, by weak dependence of $z_t$, converges to the marginal $P(z_t \in B)$ as $p \to \infty$. Hence so does $P((y_t, z_t) \in A \times B \mid M_{y,t-p}, M_{z,t-p})$, for any $A$ and $B$, as $p \to \infty$.

Now consider $E[|y_t|^s] = \sum_{j=0}^{s} c_{s,j} E[\lambda_t^j]$ for suitable coefficients $c_{s,j}$, where
$$E[\lambda_t] = \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) E[\lambda_t] + E[f(x_{t-1})] + \omega.$$
With $y_{t-1} = (y_{t-1}, \ldots, y_{t-p})'$ and $\lambda_{t-1} = (\lambda_{t-1}, \ldots, \lambda_{t-q})'$,
$$\lambda_t^s = \big( \alpha' y_{t-1} + \beta' \lambda_{t-1} + \omega + f(x_{t-1}) \big)^s = \sum_{j=0}^{s} \binom{s}{j} (\alpha' y_{t-1} + \beta' \lambda_{t-1})^j \big( \omega + f(x_{t-1}) \big)^{s-j}.$$
Hence,
$$E[\lambda_t^s] = E\big[ (\alpha' y_{t-1} + \beta' \lambda_{t-1})^s \big] + E\big[ (\omega + f(x_{t-1}))^s \big] + E\big[ r_{s-1}(y_{t-1}, \lambda_{t-1}, f(x_{t-1})) \big],$$
with $r_{s-1}(y, \lambda, z)$ an $(s-1)$-order polynomial in $y, \lambda, z$, so that $E[r_{s-1}(\cdot)] < \infty$ by the induction assumption. Moreover, $E[(\omega + f(x_{t-1}))^s] < \infty$ by applying Theorem 3.2 of Doukhan and Wintenberger (2008) to $x_t$ together with Assumption 2, such that we are left with considering terms of the form
$$E\big[ (\alpha_i y_{t-1-i} + \beta_i \lambda_{t-1-i})^s \big] = \sum_{j=0}^{s} \binom{s}{j} \alpha_i^j \beta_i^{s-j} E\big[ y_{t-1-i}^j \lambda_{t-1-i}^{s-j} \big] = \sum_{j=0}^{s} \binom{s}{j} \alpha_i^j \beta_i^{s-j} \big( E[\lambda_t^s] + C \big) = (\alpha_i + \beta_i)^s E[\lambda_t^s] + C,$$
where we have used
$$E\big[ y_{t-1-i}^j \lambda_{t-1-i}^{s-j} \big] = E\Big[ \lambda_{t-1-i}^{s-j} \sum_{k=0}^{j} c_{j,k} \lambda_{t-1-i}^{k} \Big] = E[\lambda_t^s] + C,$$
as, by the induction assumption, all $E[\lambda_t^{s+(k-j)}]$, $k < j$, are finite. Collecting terms,
$$E[\lambda_t^s] = \Big[ \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) \Big]^s E[\lambda_t^s] + \tilde{C},$$
which for $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$ has a well-defined solution.

Proof of Lemma 4.1

In terms of initial values, consider next a process $X_t = F(X_{t-1}, \varepsilon_t)$, where $\|F(x, \varepsilon) - F(\tilde{x}, \varepsilon)\| \le \rho \|x - \tilde{x}\|$, $|\rho| < 1$ and $E\|F(0, \varepsilon)\| < \infty$, which is weakly dependent. With $X_t$ denoting the stationary solution and $\hat{X}_0 = x$ fixed, we wish to show, for suitable functions $h$, that $\frac{1}{T} \sum_{t=1}^{T} h(\hat{X}_t) \overset{a.s.}{\to} E[h(X_t)]$. Now,
$$\frac{1}{T} \sum_{t=1}^{T} h(\hat{X}_t) = \frac{1}{T} \sum_{t=1}^{T} \big[ h(\hat{X}_t) - h(X_t) \big] + \frac{1}{T} \sum_{t=1}^{T} h(X_t),$$
and
$$\Big| \frac{1}{T} \sum_{t=1}^{T} \big[ h(\hat{X}_t) - h(X_t) \big] \Big| \le \frac{1}{T} \sum_{t=1}^{T} \big| h(\hat{X}_t) - h(X_t) \big|.$$
Assume furthermore that $|h(x) - h(\tilde{x})| \le L \|x - \tilde{x}\|$. Then, by repeated use of iterated expectations,
$$E\big[ |h(\hat{X}_t) - h(X_t)| \big] \le L\, E\Big[ E\big( \| F(\hat{X}_{t-1}, \varepsilon_t) - F(X_{t-1}, \varepsilon_t) \| \mid \hat{X}_{t-1}, X_{t-1} \big) \Big] \le L \rho\, E\|\hat{X}_{t-1} - X_{t-1}\| \le \cdots \le L \rho^{t}\, E\|\hat{X}_0 - X_0\|,$$
which tends to zero as $t \to \infty$.

Proof of Lemma 4.2

The proof mimics the proof of Lemma 2.1 in Fokianos, Rahbek and Tjøstheim (2009), where the case $p = q = 1$ is treated. Without loss of generality, set here $p = q$, such that, by definition,
$$\lambda_t^c - \lambda_t = \sum_{i=1}^{p} \big[ \alpha_i (y_{t-i}^c - y_{t-i}) + \beta_i (\lambda_{t-i}^c - \lambda_{t-i}) \big] + e_t^c, \tag{A.4}$$
with $e_t^c := -f(x_{t-1})\, I(\|x_{t-1}\| > c)$. Hence
$$E[\lambda_t^c - \lambda_t] = \sum_{j=1}^{p} (\alpha_j + \beta_j)\, E[\lambda_t^c - \lambda_t] + E[e_t^c],$$
and, as $\sum_{j=1}^{p} (\alpha_j + \beta_j) < 1$, the result holds with
$$|E[\lambda_t^c - \lambda_t]| \le \delta_1(c), \qquad \delta_1(c) := \frac{E|e_t^c|}{1 - \sum_{j=1}^{p} (\alpha_j + \beta_j)} \to 0 \text{ as } c \to \infty.$$
Next,
$$E\big[ (\lambda_t^c - \lambda_t)^2 \big] = \sum_{i=1}^{p} \alpha_i^2 E\big[ (y_{t-i}^c - y_{t-i})^2 \big] + \sum_{i=1}^{p} \beta_i^2 E\big[ (\lambda_{t-i}^c - \lambda_{t-i})^2 \big] + E\big[ (e_t^c)^2 \big]$$
$$\quad + 2 \sum_{i<j} \alpha_i \alpha_j E\big[ (y_{t-i}^c - y_{t-i})(y_{t-j}^c - y_{t-j}) \big] + 2 \sum_{i<j} \beta_i \beta_j E\big[ (\lambda_{t-i}^c - \lambda_{t-i})(\lambda_{t-j}^c - \lambda_{t-j}) \big]$$
$$\quad + 2 \sum_{i,j} \alpha_i \beta_j E\big[ (y_{t-i}^c - y_{t-i})(\lambda_{t-j}^c - \lambda_{t-j}) \big] + 2 \sum_{i=1}^{p} \alpha_i E\big[ e_t^c (y_{t-i}^c - y_{t-i}) \big] + 2 \sum_{i=1}^{p} \beta_i E\big[ e_t^c (\lambda_{t-i}^c - \lambda_{t-i}) \big]. \tag{A.5}$$
For $t \le s$, note that, as $\lambda_t^c - \lambda_t$ is $\mathcal{F}_{s-1}$-measurable,
$$E\big[ (\lambda_t^c - \lambda_t)(y_s^c - y_s) \big] = E\big[ E\big( (\lambda_t^c - \lambda_t)(y_s^c - y_s) \mid \mathcal{F}_{s-1} \big) \big] = E\big[ (\lambda_t^c - \lambda_t)(\lambda_s^c - \lambda_s) \big], \tag{A.6}$$
where we have used that, with $N_t$ the unit-intensity Poisson process generating the counts, $E(y_s^c - y_s \mid \mathcal{F}_{s-1}) = \lambda_s^c - \lambda_s$. Likewise, still for $t < s$,
$$E\big[ (y_t^c - y_t)(y_s^c - y_s) \big] = E\big[ (y_t^c - y_t)(\lambda_s^c - \lambda_s) \big].$$
For $t \ge s$, iterating the recursion (A.4) gives
$$\lambda_t^c - \lambda_t = \sum_{j=1}^{t-s} \big[ a_j (y_{t-j}^c - y_{t-j}) + g_j e_{t-j}^c \big] + \sum_{j=1}^{p} \big[ c_j (\lambda_{s-j}^c - \lambda_{s-j}) + d_j e_s^c + h_j (y_{s-j}^c - y_{s-j}) \big], \tag{A.7}$$
where the coefficients $a_j, g_j, c_j, d_j$ and $h_j$ are all summable. Using this, we find
$$E\big[ (\lambda_t^c - \lambda_t)(y_s^c - y_s) \big] = E\Big[ \Big( \sum_{j=1}^{t-s} a_j (y_{t-j}^c - y_{t-j}) + g_j e_{t-j}^c \Big) (y_s^c - y_s) \Big] + E\Big[ \Big( \sum_{j=1}^{p} c_j (\lambda_{s-j}^c - \lambda_{s-j}) + d_j e_s^c + h_j (y_{s-j}^c - y_{s-j}) \Big) (y_s^c - y_s) \Big]. \tag{A.8}$$
Collecting terms, one finds that $E[(\lambda_t^c - \lambda_t)^2]$ is bounded by $C \sum_{j=1}^{t} \psi_j E[(e_{t-j}^c)^2]$ for some constant $C$ and some $\psi_j$ with $\sum_{j=1}^{\infty} \psi_j < \infty$, and therefore tends to zero. Finally, using again the properties of the Poisson process $N_t$, we find
$$E\big[ (y_t^c - y_t)^2 \big] \le E\big[ (\lambda_t^c - \lambda_t)^2 \big] + |E(\lambda_t^c - \lambda_t)| \le E\big[ (\lambda_t^c - \lambda_t)^2 \big] + \delta_1(c). \tag{A.9}$$
This completes the proof of Lemma 4.2.

Proof of Theorem 4.2

We provide the proof for the case $p = q = 1$, as the general case is complex only in terms of notation. With $p = q = 1$,
$$\lambda_t(\theta) = \omega + \alpha y_{t-1} + \beta \lambda_{t-1}(\theta) + \gamma f(x_{t-1}).$$
The result is shown by verifying the conditions in Kristensen and Rahbek (2005, Lemma X).

Score. The score $S_T(\theta) = \partial L_T(\theta) / \partial \theta$ is given by
$$S_T(\theta) = \sum_{t=1}^{T} \Big( \frac{y_t}{\lambda_t(\theta)} - 1 \Big)\, s_t(\theta), \qquad \text{where } s_t(\theta) = \frac{\partial \lambda_t(\theta)}{\partial \theta}. \tag{A.10}$$
Here, with $\vartheta = (\omega, \alpha, \gamma)'$ and $v_t = (1, y_{t-1}, f(x_{t-1}))'$,
$$\frac{\partial \lambda_t(\theta)}{\partial \vartheta} = v_t + \beta \frac{\partial \lambda_{t-1}(\theta)}{\partial \vartheta}, \tag{A.11}$$
$$\frac{\partial \lambda_t(\theta)}{\partial \beta} = \lambda_{t-1}(\theta) + \beta \frac{\partial \lambda_{t-1}(\theta)}{\partial \beta}. \tag{A.12}$$
In particular, with $\lambda_t = \lambda_t(\theta_0)$,
$$\Big( \frac{y_t}{\lambda_t} - 1 \Big) s_t(\theta_0) = \dot{\lambda}_t\, \epsilon_t, \qquad \epsilon_t := \frac{N_t(\lambda_t)}{\lambda_t} - 1 = \frac{y_t}{\lambda_t} - 1, \tag{A.13}$$
where $\dot{\lambda}_t = \partial \lambda_t(\theta) / \partial \theta$ evaluated at $\theta_0$. This is a martingale difference sequence with respect to $\mathcal{F}_t = \mathcal{F}(y_{t-k}, x_{t-k}, \lambda_{t-k},\, k = 0, 1, 2, \ldots)$, as $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$. It therefore follows from the CLT for martingales (see, e.g., Brown, 1971) that $\sqrt{T}\, S_T(\theta_0) \overset{d}{\to} N(0, \Omega)$, where $\Omega = E[\dot{\lambda}_t \dot{\lambda}_t' \epsilon_t^2]$, if we can show that the quadratic variation converges, $\langle S_T(\theta_0) \rangle \overset{P}{\to} \Omega$. To this end, observe that $E(\epsilon_t^2 \mid \mathcal{F}_{t-1}) = 1/\lambda_t \le 1/\omega_0$. Thus,
$$\langle S_T(\theta_0) \rangle = \frac{1}{T} \sum_{t=1}^{T} E\big[ \dot{\lambda}_t \dot{\lambda}_t' \epsilon_t^2 \mid \mathcal{F}_{t-1} \big] = \frac{1}{T} \sum_{t=1}^{T} \dot{\lambda}_t \dot{\lambda}_t' / \lambda_t. \tag{A.14}$$
As $\dot{\lambda}_0 = 0$,
$$\dot{\lambda}_t = \tilde{v}_t + \beta_0 \dot{\lambda}_{t-1} = \sum_{i=0}^{t-1} \beta_0^i\, \tilde{v}_{t-i}, \qquad \tilde{v}_t := (1, y_{t-1}, \lambda_{t-1}, f(x_{t-1}))'. \tag{A.15}$$
By the same arguments as in the proof of Theorem 4.1, it is easily checked that the augmented process $\tilde{X}_t := (X_t, \dot{\lambda}_t)$, with $X_t$ defined in Theorem 4.1, is weakly dependent with second moments. Since $\lambda_t \ge \omega_0 > 0$, it follows that $E\|\dot{\lambda}_t \dot{\lambda}_t' / \lambda_t\| < \infty$. Thus, we can employ Lemma 4.1 to obtain that $\frac{1}{T} \sum_{t=1}^{T} \dot{\lambda}_t \dot{\lambda}_t' / \lambda_t \overset{P}{\to} \Omega$.

Information. It is easily verified that
$$\frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} = -\frac{y_t}{\lambda_t^2(\theta)} \frac{\partial \lambda_t(\theta)}{\partial \theta} \frac{\partial \lambda_t(\theta)}{\partial \theta'} + \Big( \frac{y_t}{\lambda_t(\theta)} - 1 \Big) \frac{\partial^2 \lambda_t(\theta)}{\partial \theta \partial \theta'}, \tag{A.16}$$
where
$$\frac{\partial^2 \lambda_t(\theta)}{\partial \vartheta\, \partial \beta} = \frac{\partial \lambda_{t-1}(\theta)}{\partial \vartheta} + \beta \frac{\partial^2 \lambda_{t-1}(\theta)}{\partial \vartheta\, \partial \beta} = \sum_{i=1}^{t-1} \beta^{i-1} \frac{\partial \lambda_{t-i}(\theta)}{\partial \vartheta}, \tag{A.17}$$
$$\frac{\partial^2 \lambda_t(\theta)}{\partial \beta^2} = 2 \frac{\partial \lambda_{t-1}(\theta)}{\partial \beta} + \beta \frac{\partial^2 \lambda_{t-1}(\theta)}{\partial \beta^2} = 2 \sum_{i=1}^{t-1} \beta^{i-1} \frac{\partial \lambda_{t-i}(\theta)}{\partial \beta}, \tag{A.18}$$
$$\frac{\partial^2 \lambda_t(\theta)}{\partial \vartheta\, \partial \vartheta'} = \beta \frac{\partial^2 \lambda_{t-1}(\theta)}{\partial \vartheta\, \partial \vartheta'} = \cdots = 0. \tag{A.19}$$
In particular, the augmented process $\tilde{X}_t(\theta) := (X_t'(\theta), \dot{\lambda}_t(\theta), \ddot{\lambda}_t(\theta))$ can be shown to be weakly dependent with second moments for $\theta \in \Theta$. In particular, for all $\theta \in \Theta$,
$$\frac{1}{T} \sum_{t=1}^{T} \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} \overset{P}{\to} E\big[ h( \tilde{X}_t(\theta) ) \big], \qquad h(\tilde{X}_t(\theta)) := \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'}.$$
Moreover, $\theta \mapsto \partial^2 l_t(\theta) / \partial \theta \partial \theta'$ is continuous and satisfies
$$\Big\| \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} \Big\| \le D(\tilde{X}_t) := \frac{y_t}{\omega_L^2} \Big\| \frac{\partial \lambda_t(\theta_U)}{\partial \theta} \Big\|^2 + \Big( \frac{y_t}{\omega_L} + 1 \Big) \Big\| \frac{\partial^2 \lambda_t(\theta_U)}{\partial \theta \partial \theta'} \Big\|,$$
where $\theta_U = (\omega_U, \alpha_U, \beta_U, \gamma_U)$ contains the maximum values of the individual parameters in $\Theta$ and $\omega_L$ the minimum value of $\omega$, with $E[D(\tilde{X}_t)] < \infty$. For example,
$$\frac{\partial \lambda_t(\theta_U)}{\partial \beta} = \lambda_{t-1}(\theta_U) + \beta_U \frac{\partial \lambda_{t-1}(\theta_U)}{\partial \beta} = \sum_{i=1}^{t-1} \beta_U^{i-1} \lambda_{t-i}(\theta_U) \tag{A.20}$$
and
$$\frac{\partial^2 \lambda_t(\theta_U)}{\partial \beta^2} = 2 \frac{\partial \lambda_{t-1}(\theta_U)}{\partial \beta} + \beta_U \frac{\partial^2 \lambda_{t-1}(\theta_U)}{\partial \beta^2} = 2 \sum_{i=1}^{t-1} \beta_U^{i-1} \frac{\partial \lambda_{t-i}(\theta_U)}{\partial \beta}. \tag{A.21}$$
It now follows by Lemma X in Kristensen and Rahbek (2005) that
$$\sup_{\theta \in \Theta} \Big\| \frac{1}{T} \sum_{t=1}^{T} \frac{\partial^2 l_t(\theta)}{\partial \theta \partial \theta'} - E\big[ h( \tilde{X}_t(\theta) ) \big] \Big\| \overset{p}{\to} 0. \tag{A.22}$$

Proof of Theorem 4.3

The proof follows by noting that Lemmas 3.1-3.4 in FRT (2009) hold in our setting. The only difference is that the intensity includes the covariate loading $f(x_{t-1})$. However, as $E[f(x_{t-1})] < \infty$, all the arguments remain identical, as is easily seen upon inspection of the proofs of the lemmas in FRT (2009).

Bibliography

Aalen, O. O. (1989), “A model for non-parametric regression analysis of life times”, in J. Rosinski, W. Klonecki, and A. Kozek (eds.), Mathematical Statistics and Probability Theory, vol. 2 of Lecture Notes in Statistics, pp. 1-25, Springer, New York.

Agosto, A., and Moretto, E. (2012), “Exploiting default probabilities in a structural model with nonconstant barrier”, Applied Financial Economics, 22:8, 667-679.

Akaike, H. (1974), “A new look at the statistical model identification”, IEEE Transactions on Automatic Control, AC-19, 716-723.

Amisano, G., and Giacomini, R. (2007), “Comparing Density Forecasts via Weighted Likelihood Ratio Tests”, Journal of Business and Economic Statistics, 25, 177-190.

Andersen, P. K., Borgan, Ø., Gill, R. D., and Keiding, N. (1992), Statistical Models Based on Counting Processes, Springer-Verlag.

Andersen, P. K., and Gill, R. D. (1982), “Cox’s Regression Model for Counting Processes: A Large Sample Study”, Annals of Statistics, 10, 1100-1120.

Andersen, T. G., Bollerslev, T., and Diebold, F. X.
(2007), “Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility”, The Review of Economics and Statistics, 89, 701-720.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), “The distribution of realized exchange rate volatility”, Journal of the American Statistical Association, 96, 42-55.

Azizpour, S., and Giesecke, K. (2008a), “Premia for Correlated Default Risk”, Department of Management Science and Engineering, Stanford University. Unpublished manuscript.

Azizpour, S., and Giesecke, K. (2008b), “Self-exciting Corporate Defaults: Contagion vs. Frailty”, Department of Management Science and Engineering, Stanford University. Unpublished manuscript.

Azizpour, S., and Giesecke, K. (2010), “Exploring the sources of default clustering”, Department of Management Science and Engineering, Stanford University. Unpublished manuscript.

Barndorff-Nielsen, O., and Shephard, N. (2002), “Estimating quadratic variation using realized variance”, Journal of Applied Econometrics, 17, 457-477.

Behar, R., and Nagpal, K. (2001), “Dynamics of rating transition”, Algo Research Quarterly, 4 (March/June), 71-92.

Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity”, Journal of Econometrics, 31, 307-327.

Blume, M. E., Lim, F., and Craig, A. (1998), “The Declining Credit Quality of U.S. Corporate Debt: Myth or Reality?”, The Journal of Finance, 53, 1389-1413.

Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods, Springer, New York, 2nd edition.

Brown, B. M. (1971), “Martingale Central Limit Theorems”, The Annals of Mathematical Statistics, 42, 59-66.

Christoffersen, P. F., and Diebold, F. X. (1997), “Optimal Prediction Under Asymmetric Loss”, Econometric Theory, 13, 808-817.

Chou, H.
(2012), “Using the autoregressive conditional duration model to analyse the process of default contagion”, Applied Financial Economics, 22:13, 1111-1120.

Czado, C., Gneiting, T., and Held, L. (2009), “Predictive Model Assessment for Count Data”, Biometrics, 65, 1254-1261.

Cox, D. R. (1972), “Regression models and life-tables (with discussion)”, Journal of the Royal Statistical Society, Series B, 34, 187-220.

Cox, D. R. (1975), “Partial likelihood”, Biometrika, 62, 69-76.

Cox, D. R., and Snell, E. J. (1968), “A general definition of residuals”, Journal of the Royal Statistical Society, Series B, 30, 248-275.

Corsi, F. (2009), “A Simple Approximate Long-Memory Model of Realized Volatility”, Journal of Financial Econometrics, 7, 174-196.

Crosbie, P. J., and Bohn, J. (2002), “Modeling default risk”, Technical report, KMV, LLC.

Das, S. R., Duffie, D., Kapadia, N., and Saita, L. (2007), “Common failings: How corporate defaults are correlated”, Journal of Finance, 62, 93-117.

Davis, M., and Lo, V. (2001), “Modeling default correlation in bond portfolios”, in C. Alexander (ed.), Mastering Risk Volume 2: Applications, Prentice Hall, pp. 141-151.

Davis, R. A., and Wu, R. (2009), “A negative binomial model for time series of counts”, Biometrika, 96, 735-749.

Dedecker, J., and Prieur, C. (2004), “Coupling for τ-dependent sequences and applications”, Journal of Theoretical Probability, 17, 861-885.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating density forecasts with applications to financial risk management”, International Economic Review, 39, 863-883.

Doukhan, P., and Wintenberger, O. (2008), “Weakly dependent chains with infinite memory”, Stochastic Processes and their Applications, 118, 1997-2013.

Duffie, D., and Singleton, K. (1999), “Modeling Term Structures of Defaultable Bonds”, The Review of Financial Studies, 12:4, 687-720.

Duffie, D., Saita, L., and Wang, K.
(2007), “Multi-period corporate default prediction with stochastic covariates”, Journal of Financial Economics, 83, 635-665. Du¢ e, D., Eckner, A., Horel, G., and Saita, L. (2009), “Frailty Correlated Default”, Journal of Finance, 64, 2089-2123. Engle, R. F. (2002), “New frontiers for ARCH models”, Journal of Applied Econometrics, 17, 425–446. Engle, R. F., and Gallo, G. M. (2006), “A multiple indicators model for volatility using intra-daily data”, Journal of Econometrics, 131, 3-27. Engle, R. F., and Ng, V. (1993), “Measuring and testing of the impact of news on volatility”, Journal of Finance, 48, 1749-1778. Engle, R. F., and Russell, J.R. (1998), “Autoregressive conditional duration: a new model for irregularly spaced transaction data”, Econometrica, 66:5, 1127-62. Fahrmeir, L., and Kaufmann, H. (1985), “Consistency and asymptotic normality of the maximum likelihood estimates in generalized linear models”, Annals of Statistics, 13, 342-368. Fama, E. F. (1965), “The Behavior of Stock-Market Prices”, The Journal of Business, 38, 34-105. Ferland, R., Latour, A., and Oraichi, D. (2006), “Integer-Valued GARCH Processes”, Journal of Time Series Analysis, 27, 923–942. Focardi, S.M., and Fabozzi, F.J. (2005), “An autoregressive conditional duration model of credit-risk contagion”, The Journal of Risk Finance, 6, 208 - 225. Fokianos, K. (2001), “Truncated Poisson regression for time series of counts”, Scandinavian Journal of Statistics, 28, 645-659. Fokianos, K., and Kedem, B. (2004), “Partial Likelihood Inference for Time Series Following Generalized Linear Models”, Journal of Time Series Analysis, 25, 173–197. BIBLIOGRAPHY 123 Fokianos, K., Rahbek, A., and Tjøstheim, D. (2009), “Poisson autoregression”, Journal of the American Statistical Association, 104, 1430–1439. French, K. R., Schwert, G. W., and Staumbaugh, R. F. (1987), Journal of Financial Economics, 19, 3-29. Gallo, G. M., and Pacini, B. 
(2000), “The e¤ects of trading activity on market volatility”, The European Journal of Finance 6, 163–175. Giesecke, K., Longsta¤, F., Schaefer, S., and Strebulaev, I. (2011), “Corporate bond default risk: A 150-year perspective”, Journal of Financial Economics, 102, 233-250. Glosten. L. R.. Jagannathan. R.. and Runkle. D. (1993), “Relationship between the Expected Value and the Volatility of the Nominal Excess Return on Stocks”, Journal of Finance, 48, 1779-1802. Gourieroux, C., Monfort, A. and Trognon, A. (1984), “Pseudo Maximum Likelihood Methods Theory”, Econometrica, 52, 681-700. Hamilton, J. (2005), Regime-Switching Models”, The New Palgrave Dictionary of Economics. Han, H., and Park, J.Y. (2008), “Time series properties of ARCH processes with persistent covariates”, Journal of Econometrics, 146, 275–292. Han, H., and Kristensen, D. (2013), “Asymptotic theory for the QMLE in GARCHX models with stationary and non-stationary covariates,”CeMMAP working papers CWP18/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies. Hansen, P.R., Huang, Z. and Shek, H.W. (2012) “Realized GARCH: A joint model for returns and realized measures of volatility,”Journal of Applied Econometrics, 27, 877–906. Hausman, A., Hall, B. H., and Griliches, Z. (1984) Econometric Models for Count Data with an Application to the Patents-R&D Relationship”, Econometrica, 52, 909-938. BIBLIOGRAPHY 124 Hawkes, A.G., (1971), “Spectra of some self-exciting and mutually exciting point processes”, Biometrika, 58, 83–90. Heinen, A. (2003), “Modeling time series count data: An autoregressive conditional Poisson model”, CORE Discussion Paper 2003/62, Center of Operations research and Econometrics, Université Catholique de Louvain. Hilbe, J. M. (2007), Negative binomial regression, Cambridge University Press. Jarrow, R.,and Turnbull, S. (1995), “Pricing options on Financial Securities Subject to Default Risk”, Journal of Finance, 50, 53–86. Jarrow, R., Lando, D., Turnbull, S. 
(1997), “A Markov model for the term structure of credit risk spreads”, Review of Financial Studies, 481–523. Jarrow, R. and Fan, Y. (2001), “Counterparty risk and the pricing of defaultable securities”, Journal of Finance, 56, 555-576. Jensen, S. T., and Rahbek, A. (2004), “Asymptotic Inference for Nonstationary GARCH”, Econometric Theory, 20, 1203–1226. Johnson, N. L., Kotz, S., and Kemp, A. W. (1992), Univariate Discrete Distributions, second edition, John Wiley & Sons, Inc., New York. Jung, R.C., Kukuk, M. and Liesenfeld, R. (2006), “Time series of count data: modeling, estimation and diagnostics”, Computational Statistics and Data Analysis, 51, 2350-2364. Kavvathas, D., “Estimating credit rating transition probabilities for corporate bonds”, Working paper, University of Chicago. Koopman, S.J., and Lucas, A. (2005), “Business and Default Cycle for Credit Risk”, Journal of Applied Econometrics, 20: 311–323. Koopman, S.J., Lucas, A., and Monteiro, A. (2008), “The multi-state latent factor intensity model for credit rating transitions”, Journal of Econometrics, 142, 399-424. BIBLIOGRAPHY 125 Koopman, S.J., Lucas, A., and Schwaab, B., “Modeling frailty-correlated defaults using many macroeconomic covariates”, Journal of Econometrics, 162, 312-325. Kedem, B., and Fokianos, K. (2002), Regression Models for Time Series Analysis, Hoboken, NJ: Wiley. Kristensen, D. and Rahbek, A. (2005), "Asymptotics of the QMLE for a Class of ARCH(q) Models", Econometric Theory, 21, 946–961. Lambert, D. (1992), “Zero-in‡ated Poisson regression, with an application to defects in manufacturing”, Technometrics, 34, 1-14. Lamoureux, C. G., and Lastrapes, W. D. (1990), “Heteroskedasticity in stock return data: Volume versus GARCH e¤ects”, Journal of Finance, 45, 221–229. Lando, D. (1998), “On Cox processes and credit risky securities, Review of Derivatives Research, 2, 99–120. Lando, D., and Nielsen, M. 
(2010), “Correlation in corporate defaults: Contagion or conditional independence?”, Journal of Financial Intermediation, 19, 355-372. Lando, D., Medhat, M., Nielsen, M., and Nielsen, S. (2013), “Additive Intensity Regression Models in Corporate Default Analysis”, Journal of Financial Econometrics, 11, 443–485. Lando, D., and Skødeberg, T. M. (2002), “Analyzing rating transitions and rating drift with continuous observations”, Journal of Banking and Finance, 26, 423-444. Lang, L.H.P., Stulz, R.M., (1992), “Contagion and competitive intra-industry effects of bankruptcy announcements. An empirical analysis”, Journal of Financial Economics, 32, 45–60. Leland, H. E. (1994), “Corporate debt value, bond covenants, and the optimal capital structure”, Journal of Finance, 49, 1213–52. Leland, H. E. and Toft, K. B. (1996), “Optimal capital structure, endogenous bankruptcy, and the term structure of credit spreads”, Journal of Finance, 60, 987–1019. BIBLIOGRAPHY 126 Li, W. K. (1991), “Testing model adequacy for some Markov regression models for time series”, Biometrika, 78, 83-89. Martens, M., van Dijk, D., de Pooter. M. (2004),“Forecasting S&P 500 volatility: Long memory, level shifts, leverage e¤ects, day-of-the-week seasonality, and macroeconomic announcements”, International Journal of Forecasting, 25, 282-303. McCullagh, P. (1986), “The Conditional Distribution of Goodness-of-Fit Statistics for Discrete Data”, Journal of the American Statistical Association, 81:393, 104-107. McCullagh, P., and Nelder, J. A. (1983), Generalized Linear Models, Chapman & Hall, New York. McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, Chapman & Hall, London, 2nd edition. Meitz, M., and Saikonnen, P. (2008), “Ergodicity,Mixing and Existence of Moments of a Class of Markov Models With Applications to GARCH and ACD Models”, Econometric Theory, 24, 1291–1320. Meyn, S. P., and Tweedie, R. L. (1993), Markov Chains and Stochastic Stability, London: Springer. Merton, R. C. 
(1974), “On the pricing of corporate debt: the risk structure of interest rates”, Journal of Finance, 29, 49–70. Mullahy, J. (1986), “Speci…cation and testing of some modi…ed count data models”, Journal of Econometrics, 33, 341{365. Nelder, J. A., and Wedderburn, R. W. M. (1972), “Generalized linear models”, Journal of the Royal Statistical Society, Series A, 135:370-384. Nelson. D. B. (1991). “Conditional Heteroskedasticity in Asset Pricing: A New Approach”, Econometrica, 59, 347-370. Nickell, P., Perraudin, W., and Varotto, S. (2000), “Stability of rating transitions”, Journal of Banking and Finance, 24, 203-227. BIBLIOGRAPHY 127 Rydberg, T. H., and Shephard, N. (2000), “A Modeling Framework for the Prices and Times of Trades on the New York Stock Exchange,”in Nonlinear and Nonstationary Signal Processing, eds. W. J. Fitzgerlad, R. L. Smith, A. T. Walden, and P. C. Young, Cambridge: Isaac Newton Institute and Cambridge University Press, pp. 217–246. Shephard, N. and Sheppard, K. (2010), Realising the future: Forecasting with highfrequency-based volatility (HEAVY) models, Journal of Applied Econometrics 25, 197-231. Shumway, T. (2001), Forecasting bankruptcy more e¢ ciently: A simple hazard model, Journal of Business, 74, 101–124. Schwarz, G. (1978), “Estimating the dimension of a model”, Annals of Statistics, 6, 461-464. Schwert, G. W. (1989), “Why Does Stock Market Volatility Change Over Time?”, The Journal of Finance, 44, 1115-1153. Skeel, D. A. (2001), “Debt’s Dominion: A History of Bankruptcy Law in America”, Princeton University Press. Streett, S. (2000), “Some Observation Driven Models for Time Series of Counts,” Ph.D. thesis, Colorado State University, Dept. of Statistics. Tay, A.S and Wallis, K.F. (2000) “Density Forecasting: A Survey”, Journal of Forecasting, 19, 235-254. Tang, T. T. (2009), “Information asymmetry and …rms’credit market access: Evidence from Moody’s credit rating format re…nement”, Journal of Financial Economics, 93, 325-351. 
Wedderburn, R. W. M. (1974), “Quasi-likelihood functions, generalized linear models and the Gaussian method”, Biometrika, 61, 439-447. Wong, W. H. (1986), “Theory of partial likelihood”, Annals of Statistics, 14, 88-123. BIBLIOGRAPHY 128 Zeger, S. L., and Qaqish, B. (1988), “Markov Regression Models for Time Series: A Quasi-Likelihood Approach,”Biometrics, 44, 1019–1031.