Alma Mater Studiorum - Università degli Studi di Bologna
Facoltà di Economia
Dipartimento di Scienze Economiche
PhD Programme in Economics, XIX cycle
Scientific Disciplinary Sector: SECS-P/05 Econometrics

Topics in Econometrics of Financial Markets

Supervisor: Prof. Sergio Pastorello
Coordinator: Prof. Andrea Ichino
Doctoral thesis by: Laura Coroneo

Contents

Introduction
1 A Quantile Regression Approach to Intraday Seasonality
  1.1 Introduction and Motivation
  1.2 Market and Data
  1.3 Quantile Regression as Density Regression
  1.4 Estimation Results
  1.5 Intraday Value at Risk
  1.6 Conclusions
  1.7 Appendix
2 Forecasting the yield curve using large macroeconomic information
  2.1 Introduction
  2.2 Model
    2.2.1 Alternative models
  2.3 Data
  2.4 Estimation procedure
    2.4.1 Model selection
  2.5 Estimation Results
  2.6 Out-of-sample forecast
    2.6.1 Forecast performances
  2.7 Conclusions
  2.8 Appendix
3 How Arbitrage-Free is the Nelson-Siegel Model?
  3.1 Introduction
  3.2 Modeling framework
    3.2.1 The Nelson-Siegel model
    3.2.2 Gaussian arbitrage-free models
    3.2.3 Motivation
  3.3 Data
  3.4 Estimation Procedure
    3.4.1 Resampling procedure
  3.5 Results
    3.5.1 Testing results
    3.5.2 In-sample comparison
    3.5.3 Out-of-sample comparison
  3.6 Conclusion
References

List of Figures

1.1 Kernel estimates at different hours of the day
1.2 15 minutes returns
1.3 Location and scale shifts in the pdf through the quantile function
1.4 Estimated parameters
1.5 Seasonal component
1.6 Seasonality in the quantiles
1.7 Seasonality and the tails
1.8 VaR
2.1 Yield data
2.2 Macro-yields model in sample fit: yields
2.3 Macro-yields model in sample fit: key macro variables/1
2.4 Macro-yields model in sample fit: key macro variables/2
2.5 Estimated macro-yields factors
2.6 Smoothed square forecast errors
2.7 Smoothed forecast errors
3.1 Nelson-Siegel factor loadings
3.2 No-Arbitrage Latent factors and Nelson and Siegel factors
3.3 Zero-coupon yields data
3.4 No-Arbitrage loadings of the Nelson and Siegel factors
3.5 Distribution of the estimated loadings for $a^{NA}$
3.6 Distribution of the estimated loadings for $b^{NA}(1)$
3.7 Distribution of the estimated loadings for $b^{NA}(2)$
3.8 Distribution of the estimated loadings for $b^{NA}(3)$

List of Tables

1.1 Descriptive statistics at different hours of the day
1.2 GARCH(1,1) estimates with Student-t distribution
1.3 Kupiec test on the VaR forecasts
1.4 Christoffersen's likelihood ratio test on the VaR forecasts
1.5 GARCH(1,1) estimates of the standardized returns
1.6 Kupiec test on the VaR forecasts
2.1 Summary statistics of the US zero-coupon data
2.2 Macroeconomic series
2.3 Model selection
2.4 Summary statistics of the estimated factors
2.5 Goodness of fit
2.6 Out-of-sample performance
3.1 Summary statistics of the US zero-coupon data
3.2 Autocorrelations
3.3 Parameter estimates
3.4 Estimation results for $a^{NA}$
3.5 Estimation results for $b^{NA}(1)$
3.6 Estimation results for $b^{NA}(2)$
3.7 Estimation results for $b^{NA}(3)$
3.8 Summary statistics for the resampled parameters
3.9 Measures of fit
3.10 Out-of-sample performance
3.11 Diebold-Mariano test statistics

Introduction

In recent years, improvements in computing speed and the availability of large datasets have further fostered research on forecasting financial time series. This thesis contributes to the empirical literature on financial forecasting by addressing how model performance can be improved by exploiting larger information sets.

The first paper investigates the distribution of high frequency financial returns, with special emphasis on seasonality. With the availability of detailed information on trades and quotes, due to the implementation of electronic trading systems, intraday data has become a major focus of interest for researchers and for financial agents who practice intraday trading. Within the day there are significant variations in asset prices, which imply different evaluations of the return distribution through the day. These variations are partly deterministic and due to intraday seasonality. Intraday value at risk evaluations therefore depend on the time of the day.
If an intraday trader does not take this seasonality into account in her risk estimations, she will underestimate the expected loss at the opening and closing and overestimate it at noon. I propose a quantile regression approach (Koenker and Bassett, 1978) to model the distribution of high frequency financial returns and to forecast intraday value at risk. This choice is motivated by several reasons. First of all, not only does the volatility of high frequency financial returns present seasonal movements, but so do the skewness and kurtosis. Moreover, quantile regression does not assume the existence of any moment, is distribution-free and is robust to the presence of outliers or jumps. Using 15-minute quote midpoints of three stocks traded on the Spanish stock exchange from January 2001 to December 2003, I show that the conditional probability distribution indeed depends on the time of the day. The results, expressed in terms of quantiles, permit straightforward intraday risk evaluations, such as value at risk. I show how the intraday value at risk at the 2.5%, 1% and 0.5% confidence levels depends on the time of the day, and I perform out-of-sample value at risk forecasts. The tests performed on these forecasts confirm that the model provides good risk assessments and outperforms standard approaches (Gaussian and Student-t GARCH).

In the second paper, I focus on the problem of forecasting yields using large datasets of macroeconomic information. The interaction between financial markets and macroeconomic conditions has raised the need for new financial models that can efficiently summarize macroeconomic information. I propose an innovative way to exploit the linkages between macro variables and yields.
Rather than including macroeconomic variables as factors in the yield curve model, I directly extract the latent factors from a data set composed of both yields (seventeen series) and macro variables (one hundred and eighteen series), which include real variables (sectoral industrial production, employment and hours worked), nominal variables (consumer and producer price indices, wages, and money aggregates) and asset prices (stock prices and exchange rates). To identify the yield curve factors, I follow the approach based on the Nelson and Siegel (1987) curve, imposing restrictions only on the loadings of the yields and leaving the loadings of the macro variables free. This makes it possible to use latent yield curve factors that are enriched with information from macro variables while remaining parsimonious. I estimate the model by maximum likelihood, combining the EM algorithm and the Kalman filter, using monthly observations from January 1970 to December 2000. Results show that out-of-sample forecast performance improves at mid and long horizons (i.e. 6 and 12 months ahead) compared with the forecasts generated by a model estimated using only the yields, a model augmented with three key macro variables, a model augmented with the first three principal components extracted from the same dataset of macroeconomic variables, and the random walk (which is a standard benchmark for yield curve forecasting).

In the third paper, I test whether the Nelson and Siegel (1987) yield curve model is arbitrage-free in a statistical sense. Fixed-income wealth managers in public organizations, investment banks and central banks rely heavily on Nelson and Siegel (1987) type models to fit and forecast yield curves. Despite its empirical merits and widespread use in the finance community, two theoretical concerns can be raised against the Nelson-Siegel model: it is not theoretically arbitrage-free, and it falls outside the class of affine yield curve models.
I estimate the Nelson and Siegel factors and use them as exogenous factors in an essentially affine term structure model to estimate the implied arbitrage-free factor loadings. For the no-arbitrage model with time-varying term premia, I use the two-step approach of Ang, Piazzesi and Wei (2006). Using a non-parametric resampling technique and zero-coupon yield curve data from the US market covering the period from January 1970 to December 2000 and spanning 18 maturities from 1 month to 10 years, I find that the estimated parameters from the no-arbitrage model are not statistically different from those obtained from the Nelson-Siegel model at a 95 percent confidence level. To corroborate this result, I show that the Nelson-Siegel model performs as well as its no-arbitrage counterpart in an out-of-sample forecasting experiment. I therefore conclude that the Nelson and Siegel yield curve model is compatible with arbitrage-freeness.

Chapter 1

A Quantile Regression Approach to Intraday Seasonality

ABSTRACT: This paper investigates the distribution of high frequency financial returns, with special emphasis on seasonality. Using quantile regression, we show how the probability law expands and shrinks through the day for three years of stock returns sampled at 15-minute intervals. Returns are more dispersed and less concentrated around the median at the hours near the opening and closing. We provide intraday value at risk assessments and show how they adapt to changes in dispersion over the day. The tests performed on the out-of-sample forecasts of the VaR show that the model gives good risk assessments and that it outperforms Gaussian and Student's t GARCH models.

Keywords: High frequency returns, Quantile Regression, Seasonality, Intraday VaR.

JEL classification: C14, C22, C53, G10.

This chapter is adapted from the paper "Intraday seasonality of returns distribution.
A quantile regression approach and intraday VaR estimation" written with David Veredas (Université Libre de Bruxelles), CORE DP 2006/77.

1.1 Introduction and Motivation

The interest in intraday seasonal patterns of the probability law of high frequency financial returns rests on two facts. First, intraday data has become a major focus of interest for researchers and financial agents who practice and analyze high frequency trading. This practice and this analysis are used in an array of applications such as derivative pricing, efficient estimation of a security's beta, liquidity analysis, responses to news arrivals, and any operation that involves risk measures. For instance, high frequency hedge fund managers often open and close positions within the day. For these managers, intraday risk evaluation is an important tool to follow the market and to build optimal intraday trading strategies.

Second, the analysis of risk is intimately related to the analysis of probabilities and, therefore, to the analysis of the conditional probability distribution. Asset returns are realizations of a random variable and their behavior is fully described by their conditional probability law. Any function describing this law, such as the density function, conveys information about the likelihood that the next realization will take a certain value. But within the day these odds depend partly on a deterministic seasonal component that makes the probability density function expand or shrink as a function of the time of the day.

This effect is illustrated by the kernel densities for returns at different hours of the day shown in Figure 1.1. The data are returns sampled at 15-minute intervals for three stocks (large, medium and small caps) traded on the Spanish stock exchange.[1] The kernel density estimates for returns at different times of the day vary significantly. Around lunch the density is more peaked and the tails are thinner, while it is more dispersed at the hours near the opening and closing.
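As a rough illustration of how hour-by-hour density estimates of the kind shown in Figure 1.1 could be produced, the sketch below implements a Gaussian kernel density estimator and applies it to returns grouped by time of day. The Gaussian kernel, the bandwidth value and the names `gaussian_kde` and `returns_by_hour` are assumptions made for this example only; the kernel and bandwidth used for Figure 1.1 are not stated in the text, and the numbers are made up.

```python
import math

def gaussian_kde(sample, x, bandwidth):
    """Gaussian kernel density estimate at point x.

    The bandwidth is a tuning choice assumed for illustration.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

# Hypothetical standardized returns grouped by time of day (illustrative numbers only).
returns_by_hour = {
    "09:15": [-2.1, -0.9, 0.4, 1.6, 2.3],
    "13:00": [-0.4, -0.1, 0.0, 0.2, 0.3],
}

# Density at zero for each group: a more concentrated sample gives a higher peak.
density_at_zero = {
    hour: gaussian_kde(r, 0.0, bandwidth=0.5) for hour, r in returns_by_hour.items()
}
```

With these toy numbers the mid-day group has the higher density at zero, which is the qualitative pattern described above (more peaked around lunch, more dispersed near the opening).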
One of the most common risk-related intraday measures that make use of the probability law of returns is value at risk.[2] Value at risk evaluations depend very much on the time of the day. If an intraday trader does not take this seasonality into account in her risk estimations, she will underestimate the expected loss at the opening and closing and overestimate it at noon.

[1] Throughout the analysis we use standardized returns for comparison purposes. Otherwise the scale of the plots would depend on the price, distorting the interpretation.

[2] There are many other risk measures that may be constructed from the intraday return distribution, such as volatility or left extreme tail analysis. Alternatively, we may exploit the intraday distribution to construct daily measures that can be used to compute daily volatilities and daily value at risk. Yet, the way dynamic models for the density aggregate (e.g. aggregation of quantile regression models such as the one we present here) is still an open research question.

[FIGURE 1.1 AROUND HERE]

Moreover, not only volatility presents seasonal movements, but also skewness and kurtosis. Table 1.1 shows descriptive statistics for the data grouped according to the hour of the day. We report the sample mean, the sample standard deviation, and the skewness and kurtosis indices for four different hours.[3] These estimates are proxies of the intraday behavior of the probability law. There is no evidence of an intraday seasonal pattern in the sample mean of returns. However, there is a very clear U-shaped pattern in the sample standard deviation, as found in many earlier studies. In addition, the last two columns suggest the presence of intraday seasonality in the skewness and kurtosis indices. While the movements in the skewness are small in magnitude, the kurtosis index varies widely during the day. For all the stocks, there is a significant increase in the thickness of the tails just after 15:00, right before the opening of the NYSE.

[3] The table displays the bias adjusted skewness index computed as
$$\frac{\sqrt{n(n-1)}}{n-2}\,\frac{m_3}{s^3},$$
where $n$ is the number of days (there is one observation for each day), $m_3$ is the third sample central moment and $s$ is the sample standard deviation, and the bias adjusted kurtosis index computed as
$$\frac{n-1}{(n-2)(n-3)}\left[(n+1)\frac{m_4}{s^4} - 3(n-1)\right],$$
where $m_4$ is the fourth sample central moment.

[TABLE 1.1 AROUND HERE]

The standard approach to analyzing the conditional distribution function of intraday asset returns is to fit a model for the second moment as a function of two components: one for the dynamics and another for the seasonality. If returns are Gaussian, the second moment provides enough information to describe the conditional probability law, as all the odd moments are zero and the even moments are functions of the second moment. This property of the Gaussian distribution is very appealing but, at the same time, this distribution is not able to reproduce the tail behavior present in the data. This is one of the reasons why it is now commonly accepted that asset returns are not normally distributed. More flexible distributions, such as the Student's t distribution, are needed. However, the drawback of these laws is that moments beyond the second are either zero (e.g. the third moment) or functions of an invariant tail index (e.g. the fourth moment). For instance, a GARCH model with a Student's t distribution has constant kurtosis given by a function of the estimated degrees of freedom, which is not consistent with the features of the data. A possible solution would be to fit models for different moments, similarly to Hansen (1994) or Harvey and Siddique (1999) among others, but it is not clear which functional forms these models should take or which regressors to use.
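For readers who wish to reproduce descriptive statistics of the kind reported in Table 1.1, the following sketch computes bias-adjusted skewness and kurtosis indices from one hour's sample of returns. It is an illustration under stated assumptions, not the code used for the table: the function names are hypothetical, the central moments use divisor n, the standard deviation s is taken as the square root of the second central moment, and the kurtosis prefactor uses the standard (n-2)(n-3) denominator of the bias-adjusted formula. Under this formula the kurtosis index is an excess measure, so values near zero are expected for Gaussian data.

```python
import math

def central_moment(x, k):
    """k-th sample central moment with divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** k for xi in x) / n

def bias_adjusted_skewness(x):
    """sqrt(n(n-1))/(n-2) * m3 / s^3, with s^2 = m2 (assumed convention)."""
    n = len(x)
    s = math.sqrt(central_moment(x, 2))
    return math.sqrt(n * (n - 1)) / (n - 2) * central_moment(x, 3) / s ** 3

def bias_adjusted_kurtosis(x):
    """(n-1)/((n-2)(n-3)) * [(n+1) m4/s^4 - 3(n-1)], an excess kurtosis index."""
    n = len(x)
    s2 = central_moment(x, 2)
    ratio = central_moment(x, 4) / s2 ** 2
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * ratio - 3 * (n - 1))
```

In practice one would call these functions once per time-of-day group, passing the returns observed at that time on each day of the sample.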
Since our interest is the analysis of the seasonality of the conditional distribution, a natural alternative is to model the conditional probability directly. Among all the functions that characterize the conditional probability (density, cumulative, characteristic, Laplace, hazard, etc.), the conditional quantiles are the best suited owing to the existence of quantile regression, introduced in the seminal work of Koenker and Bassett (1978). Indeed, quantile regression has a number of useful features.

First, quantile regression is one of the possible ways to characterize the conditional probability law and, since there is a one-to-one relation with all the other possible characterizations, it allows us to analyze, indirectly, the effect of the time of the day on the density function of asset returns.

Second, quantile regression does not assume the existence of any moment. In fact, it does not assume anything about the moments. It often happens that the tails of returns are so thick that some important moments do not exist. For instance, Table 1.2 shows the estimated parameters of a GARCH(1,1) with Student's t distribution. The estimated degrees of freedom for the three stocks are very low, so low that, according to the model, kurtosis does not exist for any of them and variance does not even exist for one of them.[4]

[TABLE 1.2 AROUND HERE]

Third, quantile regression is robust in the sense that the estimated coefficients are not sensitive to outliers in the dependent variable. This is particularly useful in the analysis of high frequency financial returns, since we often find outliers or, at least, observations that are remarkably different from the rest of the process. For instance, Figure 1.2 shows the actual returns for the three stocks we analyze. For all of them there is at least one observation that is unusually high.

Fourth, quantile regression is a distribution-free model. This is a very compelling feature.
It does not rely on any distributional specification but, ironically, it yields an estimate of the conditional probability distribution. As noted earlier, and shown in Table 1.2, assuming a parametric distribution for intraday asset returns entails a series of problems that are sometimes difficult to overcome, e.g. infinite variance.

[4] We are here stretching the estimation results somewhat. According to the table, the variance does exist for the stock ANA, as the estimated degrees of freedom is higher than two, though very close to it. However, in the software we use (the GARCH toolbox of Matlab) the parameter is constrained to be greater than two (as often happens due to the way in which the Student's t distribution is written). Therefore, in some sense, the unconstrained estimator of the degrees of freedom can be expected to be below two.

[FIGURE 1.2 AROUND HERE]

The use of quantile regression for asset returns is not new. Among the first to use it are Engle and Manganelli (2004), who introduce CAViaR (Conditional Autoregressive Value at Risk). CAViaR extends traditional linear quantile regression to a nonlinear framework, and the authors develop a new test of model adequacy, the Dynamic Quantile (DQ) test, using the criterion that in each period the probability of exceeding the VaR must be independent of all past information. Gourieroux and Jasiak (2005) introduce a new dynamic quantile model for univariate series and panel data, as well as the Quantile Factor Model. Less related, Bouye and Salmon (2003) use quantile regression in a copula context, that is, they deduce the form of the nonlinear conditional quantile regression implied by the copula. As for intraday VaR, Giot (2005) quantifies intraday VaR (15 and 30 minute returns) using normal GARCH, Student GARCH, RiskMetrics and Log-ACD models. He shows that the Student GARCH model performs best. Last, Dionne et al.
(2005) investigate the use of tick-by-tick data for market risk measurement and propose an intraday value at risk at different horizons, based on irregularly time-spaced high-frequency data, by using an intraday Monte Carlo simulation.

Using quote midpoints of three stocks traded on the Spanish stock exchange from January 2001 to December 2003, we show that the conditional probability distribution indeed depends on the time of the day. At the opening and closing the density flattens and the tails become thicker, while in the middle of the day returns concentrate around the median and the tails are thinner. The results are intuitive, in the sense that they confirm the general perception that large price fluctuations are more likely at the opening and closing than at lunch. The results, expressed in terms of quantiles, permit straightforward intraday risk evaluations, such as value at risk. We show the intraday variation of the maximum expected loss at the 2.5%, 1% and 0.5% confidence levels. The maximum expected loss is largest at the opening and closing and smallest at lunch time. Failure rate tests based on Kupiec (1995) and Christoffersen (1998) confirm that the model provides good forecasts of the maximum expected loss. A comparison with standard approaches (Gaussian and Student-t GARCH) shows that the latter miss the correct probabilities and that quantile regression outperforms them.

The structure of the paper is as follows. Section 1.2 introduces the data and the market structure. Section 1.3 briefly reviews quantile regression, the model used for estimation, and how to interpret the results in terms of density functions. Section 1.4 shows the estimation results, while Section 1.5 contains intraday value at risk forecasts and their evaluation for the quantile regression model as well as for its GARCH competitors. Section 1.6 concludes.
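To make the failure rate test mentioned above concrete, here is a minimal sketch of the Kupiec (1995) unconditional coverage likelihood ratio statistic, which compares the observed share of VaR violations with the nominal level and is asymptotically chi-square with one degree of freedom. The function name `kupiec_lr` and the example counts are hypothetical; this is not the code used for the results in this chapter.

```python
import math

def kupiec_lr(n_obs, n_fail, p):
    """Kupiec (1995) LR statistic for n_fail VaR violations in n_obs forecasts
    when the nominal violation probability is p (0 < p < 1)."""
    def loglik(q):
        # Binomial log-likelihood kernel, with 0 * log(0) treated as 0.
        ll = 0.0
        if n_fail > 0:
            ll += n_fail * math.log(q)
        if n_obs - n_fail > 0:
            ll += (n_obs - n_fail) * math.log(1.0 - q)
        return ll

    p_hat = n_fail / n_obs  # observed failure rate
    return -2.0 * (loglik(p) - loglik(p_hat))

CHI2_1_CRIT_5PCT = 3.841  # 95% quantile of the chi-square(1) distribution

# Hypothetical example: 25 violations in 1000 forecasts of the 1% VaR.
reject = kupiec_lr(1000, 25, 0.01) > CHI2_1_CRIT_5PCT
```

A statistic above the critical value suggests that the observed failure rate differs from the nominal level; a model with correct unconditional coverage should not be rejected.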
1.2 Market and Data

Data come from the Spanish Stock Exchange (SSE), the world's 9th largest stock exchange in terms of capitalization (the 3rd among continental European markets), and the 7th in terms of total value of share trading (the 3rd in continental Europe), according to the World Federation of Exchanges. The Spanish stock exchange interconnection system is the electronic platform that has connected, since 1995, the four exchanges that compose the SSE (Barcelona, Bilbao, Madrid, and Valencia). This system holds all the Spanish stocks that achieve pre-determined minimum levels of trading frequency and liquidity. Every order submitted to the system is electronically routed to a centralized limit order book (LOB) for immediate execution or storage. The matching of orders is, therefore, computerized. The LOB on the brokers' screens is updated each time there is a cancelation, execution, modification or new submission.

The SSE is organized as an order-driven market with a daily continuous trading session from 9:00 a.m. to 5:30 p.m. and two call auctions that determine the opening and closing prices. During the continuous trading session, a trade takes place if and only if an order hits the quotes. Pre-arranged trades are not allowed during the continuous session, and price improvements are impossible. There are no market makers and there is no floor trading. The market is governed by a strict price-time priority rule, but an order may lose priority if modified. Stocks are quoted in euros. The minimum price variation (tick) equals 0.01 euros for prices below 50 euros and 0.05 euros for prices above 50 euros. The minimum trade size is one share.

There are three basic types of orders: market, limit, and market-to-limit. Market orders are executed against the best prices on the opposite side of the book. Any excess that cannot be executed at the best bid or ask quote is executed at less favorable prices by walking down (up) the book until the order is filled.
Market-to-limit orders do not specify a limit price but are limited to the best opposite-side price on the book at the time of entry. Any excess that cannot be executed is converted into a limit order at that price. Finally, limit orders are to be executed at the limit price or better. Any unexecuted part of the order is stored in the book at the limit price. By default, orders expire at the end of the session.

The official market index of the SSE is the IBEX-35, which includes the 35 most liquid and active stocks of the exchange, weighted by market capitalization. Its composition is revised every six months. Our initial sample is formed by the 35 index constituents from January 2001 to December 2003.

The data used in this study consist of quote midpoints sampled every 15 minutes over 3 years, from January 2001 to December 2003, for the 35 companies listed in the IBEX-35. For each stock there are 34 intraday observations per day, for a total of 25,430 observations. For simplicity, we report the analysis for only 3 of the 35 companies, but the results are valid for all of them and are available upon request. Among the 35 companies of the IBEX-35, we report the results for Telefonica (TEF), Endesa (ELE) and Acciona (ANA), which are, respectively, a big, a medium and a small company, with weights of approximately 20%, 6% and 0.8% in the index.

1.3 Quantile Regression as Density Regression

The probability law of a random variable $r_t$ can be characterized by means of different functions. Some, like the density or the cumulative distribution function, are common. Others, like the quantile function, the hazard function or the characteristic function, are less used. Yet any of them can be written as a function of the others, and hence knowledge of one implies knowledge of the others. The quantile function is particularly compelling in the context of conditional distributions. This is due to the existence of a solid theory of quantile regression (see Koenker, 2005, for a survey).
Let $Q_{r_t}(\tau)$ be the $\tau$-th quantile of $r_t$. It is well known that
$$f(r_t) = \frac{\partial F(r_t)}{\partial r_t} \quad \text{and} \quad Q_{r_t}(\tau) = F^{-1}(\tau) = \inf\{r_t : F(r_t) \geq \tau\}, \qquad \tau \in (0,1),$$
where $f(r_t)$ is the probability density function, pdf hereafter, and $F(r_t)$ is the cumulative distribution function, cdf hereafter. The top row of Figure 1.3 illustrates this idea. The density is symmetric around the mean, which implies that the quantiles are also symmetric around the median (which equals the mean). The density is centered at zero, and hence the quantile function at the median, $Q_{r_t}(0.5)$, is zero.

One may ask what happens to the pdf and the quantile function if there is a location-scale shift. The second to fourth rows of Figure 1.3 illustrate these cases. The second row shows a positive location shift in the density, which produces a parallel upward shift of the quantile function. Conversely, if the quantile function shifts, the location of the density shifts. It is worth noting that after the shift the quantile function at the median, $Q_{r_t}(0.5)$, is no longer zero, as the mean of the pdf is no longer zero either.

[FIGURE 1.3 AROUND HERE]

The third row shows the effect of a positive scale shift in the density. This shift produces an expansion of the quantile function or, conversely, an expansion of the quantile function implies a positive scale shift in the density.[5] The expansion of the quantile function implies an increase in the dispersion of the quantiles. This happens when we compare the probability law at, for instance, lunch and the closing, as already noted in Figure 1.1. By contrast, the dispersion of the observations decreases if we compare the probability law at the opening and at lunch, which means a contraction of the quantiles. Finally, the last row illustrates a positive location shift together with a scale shift in the density, implying an asymmetric shift (a mix of shift and expansion effects) in the quantile function. More complex shifts are possible.
For instance, a one-sided expansion in the quantile function implies an increase of the dispersion in only one side of the density, creating skewness. Fat tails can also be created in the density if the quantiles are stretched only at the highest and lowest values, say the 1% and/or 99% quantiles. In sum, either a location shift or a scale shift, or both, in the pdf has a clear representation in terms of quantiles, as both functions contain the same information about the random variable of interest.

[5] These types of shifts are of particular relevance for this article.

Understanding the effect of these shifts, and how the quantile and density functions are affected by them, is important in a conditional context. In fact, the movements in the densities of Figure 1.1 are produced by the intraday seasonality. It is therefore meaningful to model how the probability distribution evolves conditional on the time of the day. Quantile regression (QR henceforth), introduced by Koenker and Bassett (1978), is the appropriate tool. The problem of finding the $\tau$-th unconditional quantile can be expressed as the solution of a simple linear optimization problem. Generalizing this result to the case in which the quantiles are linear functions of some explanatory variables leads to the QR method. The fundamental difference between QR and mean regression is that the latter considers the effect of the regressor on the mean of the regressand, while QR considers the effect of the regressor on the specific $\tau$-th quantile of the regressand. Hence, for a sufficiently fine grid of $\tau$, the QR method can fully describe the quantile function. The basic QR model is
$$Q_{r_t}(\tau \mid x_t) = \omega(\tau) + \sum_{j=1}^{J} \beta_j(\tau)\, x_{jt}, \qquad \tau \in (0,1), \tag{1.3.1}$$
where the intercept $\omega(\tau)$ and the slope parameters $\beta_j(\tau)$ are functions of $\tau$.
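The statement that the unconditional quantile solves a simple optimization problem can be illustrated with the check function of Koenker and Bassett (1978), $\rho_\tau(u) = u\,(\tau - 1\{u < 0\})$: the $\tau$-th sample quantile minimizes the sum of $\rho_\tau(r_t - q)$ over $q$. The sketch below is only a toy demonstration under simplifying assumptions. It searches over the data points by brute force instead of calling a linear programming solver, and the function names and the toy sample are hypothetical. In QR proper, the constant $q$ is replaced by the linear index in (1.3.1).

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def unconditional_quantile(sample, tau):
    """tau-th sample quantile as the minimizer of the summed check loss.

    The objective is piecewise linear in q, so a minimizer lies at a data
    point and it is enough to search over the observations.
    """
    def objective(q):
        return sum(check_loss(x - q, tau) for x in sample)

    return min(sorted(sample), key=objective)

sample = [float(v) for v in range(101)]  # toy data 0, 1, ..., 100
q25 = unconditional_quantile(sample, 0.25)
median = unconditional_quantile([3.0, 1.0, 2.0], 0.5)
```

Here `q25` equals 25.0 and `median` equals 2.0, the values one would expect for these toy samples.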
While in the mean regression model there is a unique parameter $\beta_j$ that describes the effect of $x_{jt}$ on the conditional mean of $r_t$, in QR there is, for each $\tau \in (0, 1)$, a parameter $\beta_j(\tau)$ that describes the effect of $x_{jt}$ on the $\tau$-th conditional quantile of $r_t$. In other words, QR measures the effect of the regressors on each quantile of the conditional distribution of the dependent variable. In this way it allows us to analyze how a shock in the regressors affects the different quantiles and hence the pdf of returns.

The set of regressors $x_{jt}$ is divided in two parts. The first accounts for the intraday seasonality, the main object of interest, and the second controls for the dynamics. As for the seasonality, we model it using a Fourier series of order 3:

$$seas_d(\tau) = \sum_{j=1}^{3} \left[ \alpha_j(\tau) \cos\left(2\pi j \frac{d}{34}\right) + \gamma_j(\tau) \sin\left(2\pi j \frac{d}{34}\right) \right], \tag{1.3.2}$$

where 34 is the number of intraday time intervals (for the 15 minutes sampled returns) and $d$ denotes the time of the day in an ordinal sense (i.e. the sequence 1, 2, ..., 34).6 Fourier series are convenient expressions for seasonality, as the combination of cosines and sines is flexible enough to capture virtually any seasonal pattern. The cosine component of the first Fourier term reaches its maximum at the opening and at the closing, the hours of the day in which the dispersion is highest, and its minimum at lunch time, the time of the day in which the dispersion is minimal. We therefore expect this cosine term to capture most of the seasonal pattern.

6 We tried higher orders of the Fourier series but results do not change substantially.

To control for the dynamics, we follow Koenker (2005) and choose one lag of the absolute value of returns: $\beta(\tau)|r_{t-1}|$. More lags, or other functions of $r_t$ that capture the dynamics, such as squared returns, are possible. However, in a robust setting, the choice of absolute values is more sensible.7 Putting all the elements together, the model we estimate is

$$Q_{r_t}(\tau | d, |r_{t-1}|) = \omega(\tau) + \beta(\tau)|r_{t-1}| + \sum_{j=1}^{3} \left[ \alpha_j(\tau) \cos\left(2\pi j \frac{d}{D}\right) + \gamma_j(\tau) \sin\left(2\pi j \frac{d}{D}\right) \right], \tag{1.3.3}$$

where $D = 34$ is the number of intraday intervals.

7 We have also tried more lags of absolute returns and the results, available upon request, do not change qualitatively.

Estimation has been implemented in GAUSS using a modified version of the library Qreg.8 Parameters are estimated using the interior point method, as described by Portnoy and Koenker (1997). The chosen grid of quantiles is (0.05, 0.10, ..., 0.95) and the limiting covariance matrix has been computed in GAUSS using the procedure described in the Appendix.

8 Qreg, a GAUSS library for computing quantile regression, D. Jacomy, J. Messines and T. Roncalli (2000), Groupe de Recherche Operationelle, Credit Lyonnais, France: http://gro.creditlyonnais.fr/content/rd/home_gauss.htm. All the codes have also been translated into Matlab, using the function lp_fnm of Daniel Morillo and Roger Koenker, translated from Ox to Matlab by Paul Eilers 1999, modified by Roger Koenker 2000 and by Paul Eilers 2004.

1.4 Estimation Results

Parameters in equation (1.3.3) depend on the quantile considered, $\tau$. There are as many parameters as quantiles times the number of explanatory variables, plus those in the intercept $\omega(\tau)$. Because this number may become large (in our case it is 168 per stock), we follow the literature, see for instance Koenker (2005), and present all the results graphically. This presentation dovetails with Figure 1.3, as the interpretation of density movements in terms of quantiles applies.

Figure 1.4 shows the estimated parameters of model (1.3.3) for TEF, ELE and ANA respectively. Every point is an estimated parameter for a different quantile. We also plot the 95% point-wise confidence intervals. The top left plots of each panel show the intercept parameters, $\hat\omega(\tau)$, while the coefficients for past absolute returns, $\hat\beta(\tau)$, are in the top right plots. For all the stocks, the magnitude of the lagged value of the return is an important source of variation.
But it affects the different quantiles of the distribution differently. The median is unaffected by a shock in $|r_{t-1}|$. Following the logic of Figure 1.3, there is no location shift and hence the median remains unchanged for any value of $|r_{t-1}|$. The shock does, however, change every quantile above and below 50%. For a given past absolute return, the effect on the extreme quantiles is larger than on the quantiles near the median. As an example, if the return at $t-1$ was zero, the density, conditional on the time of the day, remains unchanged. If, by contrast, there is a large movement in returns at $t-1$, the density becomes more dispersed around the median, which remains unchanged, increasing the probability of finding a large price variation in the next period. If the return at $t-1$ is small, the density shrinks, decreasing the probability of finding large price variations.

The remaining six plots show the estimated values of the parameters of the Fourier series. The second row refers to the estimated parameters of the first Fourier term, the third to those of the second Fourier term and the last to those of the third Fourier term. The coefficients of the cosine terms, the alphas, are larger, for all stocks, than those of the sine terms, the gammas. This is due to the fact that the cosine series peak at the opening and the closing, the times of the day at which trading activity is more intense and return dispersion is bigger. None of the coefficients is different from zero for $\tau = 0.5$, meaning that the median is affected neither by past observations nor by the time of the day. In other words, no profitable strategies based on the time of the day are found. Consequently, since the estimated coefficient of $|r_{t-1}|$ for $\tau = 0.5$ is also zero, the conditional median is equal to the unconditional one, which is zero.

Figure 1.5 shows the estimated seasonal component, $\widehat{seas}_d(\tau)$, computed as in (1.3.2). The plots read as follows.
Each line is the seasonal component for a specific hour of the day across the different quantiles. The estimated seasonal components display different shapes within the day and some conclusions can be drawn. First, the shape and the magnitude of the seasonal component are fairly similar for all the stocks. In particular, there is no seasonal behavior at the median, but there is away from it, and it becomes more pronounced as we approach the extreme quantiles. Second, the seasonal component is clearly different at the opening and the closing, with values that are negative for taus smaller than 0.5 and positive for taus bigger than 0.5. Third, the seasonal component at 13:00 and 14:00 displays exactly the opposite behavior with respect to the one at the opening and closing, but with a smaller magnitude.

To better see how the conditional distribution of returns moves through the day, Figure 1.6 plots the conditional quantiles of the 15 minutes returns for different hours of the day. Rewriting equation (1.3.3) conditional on a particular value of the past absolute return and on different hours of the day, we have

$$Q_{r_t}(\tau | d, |r_{t-1}|) = \omega(\tau) + \beta(\tau)|r_{t-1}| + seas_d(\tau).$$

The choice of the conditioning value of $|r_{t-1}|$ has a quantitative but not a qualitative effect. For a given $\tau$, $\beta(\tau)|r_{t-1}|$ is constant, while the term $seas_d(\tau)$ changes according to the hour of the day (as shown in Figure 1.5). The only effect of the chosen value of $|r_{t-1}|$ is to shift all the conditional quantiles at the same $\tau$ by the same amount. The figure reads as follows: the closer a line is to the horizontal zero line, the more concentrated the density is around the median, and the further away it is, the more dispersed the density. The time of the day with the largest seasonal effect is 17:00, the closure of the market, followed by 9:30, the opening. At these hours the conditional density becomes more dispersed.
In the opposite direction, for all the companies, are the seasonal effects at 13:00 and 14:00. They decrease (in absolute value) the conditional quantiles, decreasing the dispersion. This effect can be associated with reduced trading activity during the lunch break.

1.5 Intraday Value at Risk

As shown in Section 1.3, there is a one-to-one relation between the quantile and the density functions. This is particularly appealing in the construction of risk measures, which are intimately related to the analysis of the tails of the density function. Using the results of Figure 1.6 and equation (1.7.2) in the Appendix, we can compute the conditional density at different quantiles. Figure 1.7 shows the tails of these densities.9 Each point of the conditional density is derived from its corresponding conditional quantile. As expected, the density mass at the extremes is much larger around the opening and closing than around lunch. This seasonal tail behavior has to be taken into account in the computation of intraday risk measures, such as VaR.

9 A full picture of the density is possible but not relevant, as the financial interest lies in the tails and not around the median. Moreover, it has been shown earlier that nothing interesting happens around the median.

Value at Risk was developed to provide a single number that summarizes the information about the risk in a portfolio. Over the last ten years, this technique has been increasingly used by banks and regulators all over the world as a way to estimate possible losses related to the trading of financial assets, i.e. as a tool designed to quantify and forecast market risk. In particular, the goal of VaR is to assess the possible loss that can be incurred by a trader or bank, for a given portfolio of assets, over a given time period and for a certain confidence level. The time period and the confidence level are the two major parameters, and they should be chosen in a way appropriate to the overall goal of risk measurement.
When the primary goal is to satisfy external regulatory requirements, such as the bank capital requirements of the Basel II Capital Accord, the confidence level is typically small, 1%, and the time horizon is long (usually a 10 day period). However, for an internal risk management model, used by a company to control its risk exposure, the typical confidence level is even smaller and the time horizon shorter. In particular, for active market participants such as high frequency traders, floor traders or market makers, the time horizon of their returns is shorter and the corresponding trading risk must be assessed on such short time intervals. Therefore a VaR model that characterizes the market risk on an intraday basis is useful for market participants involved in frequent intraday trades.

The VaR at a confidence level of $\tau$ for a given portfolio is the loss at the $\tau$ percent probability level, which can simply be defined as the $\tau$-th quantile of the conditional distribution of returns:

$$\Pr[r_t < VaR_t(\tau | \Omega_{t-1})] = \tau \quad \Leftrightarrow \quad VaR_t(\tau | \Omega_{t-1}) = Q_{r_t}(\tau | \Omega_{t-1}).$$

From an empirical point of view, the computation of $VaR_t(\tau | \Omega_{t-1})$ for a portfolio of assets requires the computation of the empirical quantile at level $\tau$ of the distribution of the future returns of the portfolio given the information set available at time $t-1$, $\Omega_{t-1}$. Engle and Manganelli (1999) introduced nonlinear QR as a method for computing VaR. The originality of our model relies on two points: the use of high frequency data to forecast VaR at an intraday time horizon and the use of Fourier series to model the intraday seasonality of returns in a quantile regression framework. Our model defines the information set available up to time $t-1$, $\Omega_{t-1}$, as including the lagged absolute value of returns, $|r_{t-1}|$, and the three deterministic Fourier terms, which are indexed by the time of the day $d$:

$$VaR_t(\tau | d, |r_{t-1}|) = Q_{r_t}(\tau | d, |r_{t-1}|).$$
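To illustrate how a VaR forecast follows from the seasonal quantile model, the sketch below builds the six Fourier regressors for a given interval $d$ and evaluates the linear quantile function of equation (1.3.3). The coefficient values are hypothetical placeholders chosen only to mimic the sign pattern described for low quantiles; they are not estimates from the thesis, and the function names are illustrative.

```python
import math

D = 34  # number of 15-minute intervals per trading day, as stated in the text

def seasonal_regressors(d, order=3, periods=D):
    """Return [cos(2*pi*j*d/D), sin(2*pi*j*d/D)] for j = 1..order."""
    regs = []
    for j in range(1, order + 1):
        angle = 2.0 * math.pi * j * d / periods
        regs.extend([math.cos(angle), math.sin(angle)])
    return regs

def var_forecast(d, abs_lag_return, omega, beta, fourier_coefs):
    """Conditional tau-quantile: omega + beta*|r_{t-1}| + seasonal component."""
    seas = sum(c * x for c, x in zip(fourier_coefs, seasonal_regressors(d)))
    return omega + beta * abs_lag_return + seas

# Hypothetical coefficients for one low quantile (not thesis estimates).
omega, beta = -0.8, -0.6
# Order: alpha_1, gamma_1, alpha_2, gamma_2, alpha_3, gamma_3.
coefs = [-0.3, 0.0, -0.05, 0.0, 0.0, 0.0]

var_close = var_forecast(34, 0.1, omega, beta, coefs)   # last interval of the day
var_midday = var_forecast(17, 0.1, omega, beta, coefs)  # middle of the day
```

With a negative first cosine coefficient, the forecast quantile is further from zero at the close than at midday, which is the pattern the text attributes to the seasonal component; a larger lagged absolute return pushes the whole quantile down through the term in beta.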
The one-step-ahead out-of-sample VaR forecast is conducted using a rolling window scheme, a method popular among practitioners since Fama and MacBeth (1973) and Gonedes (1973). The use of rolling windows is justified by parameter instability, which can distort the out-of-sample forecast. The window size is adapted to the liquidity of the stock. For the most liquid stock, TEF, we use a rolling window of 2000 observations, for ELE a window of 2500 observations and for ANA, the least liquid stock, a bigger window of 3000 observations.10 This choice is motivated by the fact that the same time span contains a different number of transactions for each stock. While for the most liquid stocks (like TEF) 2000 observations of 15 minutes returns contain enough information due to the high number of transactions, for the less liquid stocks (like ANA) this time interval is too short because it includes fewer transactions.11 This leads to 23,430, 22,930 and 22,430 one-step-ahead forecasts for TEF, ELE and ANA respectively.

10 2000 observations of 15 minutes returns correspond to 58 days (three months), while 2500 observations cover a time span of 73 days (four months) and, finally, 3000 observations are equivalent to 88 days (five months).
11 The choice of the optimal window, although relevant in this literature, is beyond the scope of this paper.

Figure 1.8 displays the last 500 observations of the 15 minutes sampled returns for TEF, ELE and ANA with the corresponding VaR forecasts at the confidence levels of 2.5%, 1% and 0.5%.12 The estimated VaRs clearly show the effects of the two components that we used to model the conditional quantiles. The seasonal component is responsible for the deterministic daily oscillations, while the dynamic one amplifies or reduces the oscillations to take into account the dispersion clustering. Moreover, as the confidence level of the VaR decreases, the dynamic component becomes more relevant.

12 These are reasonable confidence levels for intraday market risk evaluations, as the Basel threshold is 1%.

At first sight, the estimated VaRs for the three confidence levels and for all the stocks appear to be close enough to the data, i.e. we are not overestimating the risk, and the number of times that the realized returns fall below the estimated VaR is not too large. As a check, we computed the failure rates, that is, the percentage of times that the observations are below the VaR. If the VaR is well specified, then the empirical failure rate, denoted by $\hat f$, should be close to the confidence level. Table 1.3 reports the empirical failure rates for the three stocks and for the confidence levels of 2.5%, 1% and 0.5%. The values in parentheses are the confidence intervals computed according to the Kupiec (1995) test. The null hypothesis of the test is that the empirical failure rate, $\hat f$, is equal to the confidence level of the VaR, $\tau$. The 95% confidence interval for $\tau$ is given by $\hat f \pm 1.96\sqrt{\hat f(1 - \hat f)/N}$, where $N$ is the total number of observations that we are evaluating, that is 23,430, 22,930 and 22,430 for TEF, ELE and ANA respectively. For all the stocks and all the confidence levels, the confidence interval contains the theoretical confidence levels of 2.5%, 1% and 0.5% respectively. Therefore we do not reject the null hypothesis that the empirical failure rates are equal to the theoretical ones, for all the confidence levels of the VaR and for all the stocks.

A test that is equivalent to Kupiec's is the likelihood ratio test of unconditional coverage developed by Christoffersen (1998). This test is based on a binary hit variable that records, period by period, whether the realized return stays above the forecast VaR (a success) or falls below it (a failure); the number of failures therefore follows a binomial distribution.
The test statistic is

$$LR_{uc} = -2 \log \frac{(1 - \tau)^{n_0} \tau^{n_1}}{(1 - \hat f)^{n_0} \hat f^{n_1}} \sim \chi^2_1,$$

where $n_1$ is the number of failures (VaR violations) and $n_0$ the number of successes, so that $\hat f = n_1/(n_0 + n_1)$ is the empirical failure rate. The first panel of Table 1.4 reports the values of the test with the corresponding p-values. The conclusions are similar to those of Kupiec's test. The model is able to predict the VaR well for all the stocks and for all the confidence levels considered.

However, a drawback of the Kupiec test and of the likelihood ratio test of Christoffersen is that they just count the number of successes and of failures, testing only the equality between the frequency of VaR violations and the confidence level. In a risk management framework, it is also important that the VaR violations are not correlated in time. The likelihood ratio test of independence, Christoffersen (1998), examines the serial independence of the VaR violations. Like the previous likelihood ratio test, this test is built on a hit variable, coded here so that it flags violations:

$$I_t = \begin{cases} 1, & \text{if } r_t < VaR_t(\tau | d, |r_{t-1}|); \\ 0, & \text{otherwise.} \end{cases}$$

The likelihood ratio test of independence tests the null of independence against the alternative of a first order Markov process for the violations. Denoting by $n_{ij}$ the number of observations of $I$ with value $i$ followed by $j$, the likelihood ratio test of independence can be expressed as

$$LR_{ind} = -2 \log \frac{(1 - \hat f)^{n_{00} + n_{10}} \hat f^{\,n_{01} + n_{11}}}{(1 - \hat f_{01})^{n_{00}} \hat f_{01}^{\,n_{01}} (1 - \hat f_{11})^{n_{10}} \hat f_{11}^{\,n_{11}}} \sim \chi^2_1,$$

where $\hat f_{01}$ is the proportion of failures that follow a success and $\hat f_{11}$ is the proportion of failures that follow a failure. The null of the test is that the two conditional failure probabilities coincide, $f_{01} = f_{11} = f$. The second panel of Table 1.4 reports the value of the test with the corresponding p-values. The null of independence of the violations is not rejected for any of the stocks or confidence levels of the VaR.

Finally, as a more powerful tool, we performed the joint likelihood ratio test of independence and coverage.
Christoffersen's likelihood ratio test of conditional coverage is

$$LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2_2,$$

in which the null of the unconditional coverage test is tested against the alternative of the independence test. The bottom panel of Table 1.4 reports the results. For all the confidence levels, we do not reject the null of conditional coverage, confirming that the model is well specified.

The test results show the ability of the model to provide good out-of-sample forecasts of the intraday VaR, confirming the importance of correctly specifying the intraday seasonality. This component, as shown in Figure 1.8, seems to have a crucial role in the determination of the intraday VaR.

We compare the performance of the VaR based on quantile regression with the benchmark in risk modelling: GARCH-type models. To account for the intraday seasonality in the variance, we follow a commonly used approach, see for example Giot (2005), which is to seasonally adjust the return series:

$$\tilde r_t = \frac{r_t}{\sqrt{\phi_d}},$$

where $\phi_d$ is the deterministic intraday seasonal component. The latter is defined as the expected volatility conditional on the time of the day, where the expectation is computed by averaging the squared raw returns for each time of the day. If $\tilde r_t$ has no mean effects, a GARCH(1,1) can be written as

$$\tilde r_t = \varepsilon_t h_t, \qquad h_t^2 = \omega + \alpha \tilde r_{t-1}^2 + \beta h_{t-1}^2,$$

where $\omega > 0$, $\alpha \geq 0$ and $\beta \geq 0$, and $\varepsilon_t$ is an i.i.d. sequence of random variables following either a Gaussian $N(0, 1)$ or a Student-t $St(0, 1, \nu)$ distribution. Once we have estimated the parameters, we compute the forecast of the variance of the deseasonalized returns and the intraday VaR for $r_t$ at a confidence level $\tau$ as

$$VaR_t(\tau | d, \tilde r_{t-1})^{G} = z_\tau^{G} \hat h_t \sqrt{\phi_d}, \qquad VaR_t(\tau | d, \tilde r_{t-1})^{St} = z_\tau^{St} \hat h_t \sqrt{\phi_d}, \tag{1.5.1}$$

where $z_\tau^{G}$ and $z_\tau^{St}$ denote the $\tau$-th quantiles of the standard Gaussian and Student-t distributions respectively, $\tilde r_{t-1}$ is the set of past adjusted returns and $\hat h_t$ is the square root of the one-step-ahead forecast of the variance $h_t^2$. The factor $\sqrt{\phi_d}$ undoes the seasonal adjustment, since $r_t = \tilde r_t \sqrt{\phi_d}$.

Table 1.5 reports the estimation results for the two distributions.
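The Gaussian GARCH(1,1) benchmark can be sketched as follows. This is a minimal illustration, assuming that the variance recursion is started at the unconditional variance (which requires alpha + beta < 1) and that raw returns are recovered from adjusted ones as $r_t = \tilde r_t \sqrt{\phi_d}$. The parameter values are the TEF Gaussian estimates reported in Table 1.5; the adjusted returns and the seasonal factor are hypothetical. The Student-t case would only change the quantile $z_\tau$ and is omitted because the Python standard library has no Student-t quantile function.

```python
import math
from statistics import NormalDist

def garch_variance_forecast(adj_returns, omega, alpha, beta):
    """One-step-ahead conditional variance from the GARCH(1,1) recursion.

    Assumption of this sketch: the recursion starts at the unconditional
    variance omega / (1 - alpha - beta), so alpha + beta must be below one.
    """
    h2 = omega / (1.0 - alpha - beta)
    for r in adj_returns:
        h2 = omega + alpha * r * r + beta * h2
    return h2

def gaussian_garch_var(tau, adj_returns, phi_d, omega, alpha, beta):
    """VaR for the raw return: z_tau * h_hat * sqrt(phi_d)."""
    h_hat = math.sqrt(garch_variance_forecast(adj_returns, omega, alpha, beta))
    z_tau = NormalDist().inv_cdf(tau)
    return z_tau * h_hat * math.sqrt(phi_d)

# TEF Gaussian estimates from Table 1.5; the data below are hypothetical.
omega, alpha, beta = 0.001, 0.032, 0.967
adj_returns = [0.4, -1.2, 0.3, 2.1, -0.7]
var_1pct = gaussian_garch_var(0.01, adj_returns, phi_d=1.5,
                              omega=omega, alpha=alpha, beta=beta)
```

Because the Gaussian quantile is fixed by the distributional assumption, any mismatch between the assumed and the actual tail shape feeds directly into the failure rates, which is the weakness the comparison with quantile regression is meant to expose.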
Notice that these results refer to the standardized returns series, while in Table 1.2 we reported the estimation results on raw data. These results confirm that there is a strong seasonal component in the intraday returns and that ignoring it can be misleading. Indeed, for all the deseasonalized returns (Table 1.5), the estimated degrees of freedom of the Student-t model are larger than the ones obtained on the raw returns, even if the increase for ANA seems to be marginal. Yet both the Gaussian and the Student-t models are close to being integrated.

Table 1.6 reports the Kupiec test and the empirical failure rates of the VaR forecasts for the three stocks, computed as in (1.5.1) and using the same rolling windows as for quantile regression. Both models fail to forecast correctly, for all the stocks and all the confidence levels.13 With a Gaussian distribution the failure rates are systematically bigger than the theoretical ones, while the GARCH(1,1) with a Student-t distribution does the opposite. This means that the Gaussian model underestimates the risk (it assigns too little mass to the tails of the distribution) and the Student-t model overestimates it (it assigns too much mass to the tails). This makes evident the advantage of using a semiparametric method such as quantile regression, which does not require any assumption on the underlying distribution.

1.6 Conclusions

We investigate intraday seasonal patterns in the probability law of high frequency financial returns. Within the day there are significant variations in asset prices, which imply different evaluations of the tails of the return distribution through the day. These variations are partly deterministic and due to the intraday seasonality. Returns are realizations of a random variable and, as such, their behavior is fully described by their conditional probability law.
To analyze the intraday behavior of the probability law, we use quantile regression, where the regressors are Fourier series that capture the time of the day and past absolute returns. Using quote midpoints of three stocks traded at the Spanish stock exchange from January 2001 to December 2003, we show that the conditional probability distribution indeed depends on the time of the day. At the opening and closing the density flattens and the tails become thicker, while in the middle of the day returns concentrate around the median and the tails are thinner. Results are intuitive, in the sense that they confirm the general perception that at the opening and closing the probabilities of finding large price fluctuations are higher than at lunch.

13 We do not show results for the LR tests as the model already failed to pass the simple Kupiec test. They are available upon request.

Results, in terms of quantiles, permit straightforward intraday risk evaluations, such as value at risk. We show the intraday variation of the maximum expected loss at the 2.5%, 1% and 0.5% confidence levels. The maximum expected losses are, as expected, largest at the opening and closing and smallest at lunch time. Moreover, the tests performed on the out-of-sample forecasts of the value at risk show that the model is able to provide good risk assessments, contrary to the standard GARCH(1,1) models.

1.7 Appendix

In this appendix, we describe the procedure that we followed for the estimation of the asymptotic covariance matrix of the QR estimates. We follow Koenker (2005). Consider the basic model presented in equation (1.3.1). The asymptotic distribution of the QR estimator in a non-iid setting is

$$\sqrt{T}\,(\hat\beta(\tau) - \beta(\tau)) \to N\left(0, \tau(1 - \tau) H_T^{-1} J_T H_T^{-1}\right),$$

where

$$J_T(\tau) = T^{-1} \sum_{t=1}^{T} x_t x_t' \quad \text{and} \quad H_T(\tau) = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} x_t x_t' f_t(\xi_t(\tau)), \tag{1.7.1}$$

and $f_t(\xi_t(\tau))$ denotes the conditional density of $r_t$ evaluated at the $\tau$-th conditional quantile.
The asymptotic covariance among estimates at different quantiles has blocks

$$Cov\left(\sqrt{T}\,(\hat\beta(\tau_t) - \beta(\tau_t)), \sqrt{T}\,(\hat\beta(\tau_s) - \beta(\tau_s))\right) = [\tau_t \wedge \tau_s - \tau_t \tau_s]\, H_T(\tau_t)^{-1} J_T H_T(\tau_s)^{-1}.$$

The conditional density $f_t(\xi_t(\tau))$ in (1.7.1) is estimated using the Hendricks and Koenker (1992) sandwich form. This estimation procedure first requires computing the optimal bandwidth $h_T$ for each $\tau$. To do so, we used the optimal bandwidth suggested by Bofinger (1975),

$$h_T = T^{-1/5} \left[ \frac{4.5\, \phi^4(\Phi^{-1}(\tau))}{(2\Phi^{-1}(\tau)^2 + 1)^2} \right]^{1/5},$$

where $T$ is the sample size, $\phi$ is the normal pdf and $\Phi^{-1}$ is the normal quantile function, i.e. the inverse of the normal cdf. Last, we re-perform the QR estimation for the grids $\tau - h_T$ and $\tau + h_T$. As shown in Figure 1.3, the cdf can be obtained by inverting the quantile function and, once we have the cdf, we can recover the density function by differentiating. Following this intuition, Hendricks and Koenker suggest estimating the conditional density function as

$$\hat f_t = \max\left\{0, \frac{2 h_T}{x_t' \hat\beta(\tau + h_T) - x_t' \hat\beta(\tau - h_T) - \varepsilon}\right\}, \tag{1.7.2}$$

where $\hat\beta(\tau + h_T)$ and $\hat\beta(\tau - h_T)$ are the estimated parameters at $\tau + h_T$ and $\tau - h_T$, and $\varepsilon$ is a small tolerance parameter that we fixed to 0.01 to avoid dividing by zero.

Table 1.1: Descriptive statistics at different hours of the day

TEF
Time     Mean     S. Dev   Skew     Kurt
09:30    0.000    0.065    -0.283   7.053
12:00    0.002    0.035    -0.196   5.945
15:15    0.000    0.035    -0.274   7.296
17:15    -0.002   0.048    -0.559   6.062

ELE
Time     Mean     S. Dev   Skew     Kurt
09:30    -0.004   0.063    -0.076   6.512
12:00    0.001    0.039    0.353    9.017
15:15    -0.001   0.033    -1.075   12.684
17:15    0.002    0.051    0.183    6.066

ANA
Time     Mean     S. Dev   Skew     Kurt
09:30    -0.008   0.136    -0.604   8.077
12:00    0.005    0.080    -0.249   8.368
15:15    -0.003   0.084    -2.338   34.049
17:15    -0.002   0.111    0.253    6.290

The first column reports the time of the day to which the statistics refer. The second displays the sample mean of all the observations at the selected time of the day, the third the sample standard deviation, the fourth the bias corrected skewness and the last one the bias adjusted kurtosis.

Table 1.2: GARCH(1,1) estimates with Student-t distribution

         ω       α       β       ν
TEF      0.048   0.270   0.730   3.48
ELE      0.097   0.354   0.646   3.35
ANA      0.164   0.424   0.576   2.49

Student's t GARCH(1,1) $h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}$ estimates. ν stands for degrees of freedom.

Table 1.3: Kupiec test on the VaR forecasts

         VaR(2.5%)           VaR(1%)             VaR(0.5%)
TEF      2.37 (2.17 2.56)    1.00 (0.88 1.13)    0.54 (0.44 0.63)
ELE      2.33 (2.13 2.52)    1.04 (0.91 1.17)    0.58 (0.48 0.67)
ANA      2.54 (2.34 2.75)    1.06 (0.93 1.20)    0.58 (0.48 0.68)

Empirical failure rates of the VaR forecasts at the confidence levels of 2.5%, 1% and 0.5%. Confidence intervals are in parentheses and values are in percentages.

Table 1.4: Christoffersen's likelihood ratio test on the VaR forecasts

LRuc     VaR(2.5%)      VaR(1%)        VaR(0.5%)
TEF      1.68 (0.19)    0.00 (0.96)    0.66 (0.42)
ELE      2.81 (0.10)    0.41 (0.52)    2.52 (0.11)
ANA      0.16 (0.69)    0.83 (0.36)    2.72 (0.10)

LRind    VaR(2.5%)      VaR(1%)        VaR(0.5%)
TEF      1.06 (0.30)    0.15 (0.69)    1.38 (0.24)
ELE      0.96 (0.33)    1.96 (0.16)    1.38 (0.24)
ANA      0.50 (0.48)    1.26 (0.26)    0.06 (0.80)

LRcc     VaR(2.5%)      VaR(1%)        VaR(0.5%)
TEF      2.74 (0.25)    0.16 (0.92)    2.04 (0.36)
ELE      3.77 (0.15)    2.37 (0.31)    3.89 (0.14)
ANA      0.66 (0.72)    2.09 (0.35)    2.78 (0.25)

Christoffersen's likelihood ratio tests for the VaR forecasts at the confidence levels of 2.5%, 1% and 0.5%. The first panel presents results for Christoffersen's likelihood ratio test of unconditional coverage, LRuc, with p-values in parentheses. The second panel presents results for Christoffersen's likelihood ratio test of independence, LRind, with p-values in parentheses. The last panel presents results for Christoffersen's joint likelihood ratio test of coverage and independence, LRcc, with p-values in parentheses.
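The quantities reported in Tables 1.3 and 1.4 could be computed along the following lines. This is a sketch under stated assumptions: the hit sequence codes a VaR violation (realized return below the forecast VaR) as 1 and everything else as 0, the interval is the normal approximation with the 1.96 critical value given in the text, and the function names are illustrative. The hit sequence in the example is artificial.

```python
import math

def failure_rate_interval(hits, z=1.96):
    """Empirical failure rate with its normal-approximation interval."""
    n = len(hits)
    f = sum(hits) / n
    half = z * math.sqrt(f * (1.0 - f) / n)
    return f, f - half, f + half

def _loglik(p, n0, n1):
    """Bernoulli log-likelihood n0*log(1-p) + n1*log(p), with 0*log(0) = 0."""
    ll = 0.0
    if n0:
        ll += n0 * math.log(1.0 - p)
    if n1:
        ll += n1 * math.log(p)
    return ll

def lr_unconditional(hits, tau):
    """LR test of unconditional coverage; chi-square(1) under the null."""
    n1 = sum(hits)
    n0 = len(hits) - n1
    return -2.0 * (_loglik(tau, n0, n1) - _loglik(n1 / len(hits), n0, n1))

def lr_independence(hits):
    """LR test of independence against a first-order Markov alternative."""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits, hits[1:]):  # the N-1 observed transitions
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    f = (n01 + n11) / (n00 + n01 + n10 + n11)
    f01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    f11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    ll_null = _loglik(f, n00 + n10, n01 + n11)
    ll_alt = _loglik(f01, n00, n01) + _loglik(f11, n10, n11)
    return -2.0 * (ll_null - ll_alt)

def lr_conditional(hits, tau):
    """Joint test of coverage and independence; chi-square(2) under the null."""
    return lr_unconditional(hits, tau) + lr_independence(hits)

# Artificial hit sequence: exactly one violation every 40 periods (2.5%).
hits = ([0] * 39 + [1]) * 25
rate, lower, upper = failure_rate_interval(hits)
lr_uc = lr_unconditional(hits, 0.025)
lr_cc = lr_conditional(hits, 0.025)
```

The statistics are then compared with chi-square critical values (3.84 for one degree of freedom and 5.99 for two at the 5% level). Note that the perfectly regular artificial sequence has the right failure rate by construction, so only the independence part of the joint statistic is non-zero.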
Table 1.5: GARCH(1,1) estimates of the standardized returns

Gaussian       ω       α       β
TEF            0.001   0.032   0.967
ELE            0.002   0.029   0.970
ANA            0.022   0.060   0.920

Student's t    ω       α       β       ν
TEF            0.001   0.035   0.965   6.018
ELE            0.008   0.062   0.934   4.681
ANA            0.125   0.266   0.734   2.501

GARCH(1,1) $h_t = \omega + \alpha \tilde r_{t-1}^2 + \beta h_{t-1}$ estimates. ν stands for degrees of freedom.

Table 1.6: Kupiec test on the VaR forecasts

Gaussian       VaR(2.5%)           VaR(1%)             VaR(0.5%)
TEF            2.78 (2.57 2.99)    1.56 (1.40 1.72)    1.07 (0.94 1.20)
ELE            2.86 (2.64 3.07)    1.62 (1.46 1.79)    1.08 (0.95 1.22)
ANA            3.14 (2.91 3.37)    2.07 (1.88 2.25)    1.56 (1.39 1.72)

Student's t    VaR(2.5%)           VaR(1%)             VaR(0.5%)
TEF            1.62 (1.46 1.78)    0.63 (0.53 0.73)    0.37 (0.29 0.44)
ELE            1.20 (1.06 1.34)    0.42 (0.34 0.50)    0.18 (0.13 0.24)
ANA            0.38 (0.30 0.46)    0.10 (0.06 0.14)    0.03 (0.01 0.05)

Empirical failure rates of the VaR forecasts using a GARCH(1,1) at the confidence levels of 2.5%, 1% and 0.5%. Confidence intervals are in parentheses and values are in percentages.

Figure 1.1: Kernel estimates at different hours of the day.
[Three panels: TEF, ELE and ANA; one density curve for each of the legend entries 9, 13 and 16:30.]
Nonparametric density estimate of the 15 minutes returns at different hours of the day. For each day, we included the observation at the selected hour, therefore each sample contains a number of observations equal to the number of days. The estimate is based on a Gaussian kernel with optimal bandwidth.

Figure 1.2: 15 minutes returns
[Three panels: TEF, ELE and ANA; returns plotted against the observation index.]
Standardized 15 minutes returns. The sample period runs from January 2001 to December 2003. For each stock there are 34 intraday observations for a total of 25 400.
Figure 1.3: Location and scale shifts in the pdf through the quantile function
[Four rows of panels (baseline, location effect, scale effect, loc-scale effect), each showing the quantile function against τ and the cdf and pdf against y.]
Top row shows the pdf, cdf and quantile function of a standardized normal. For the other three rows, the continuous line indicates the pdf, cdf and quantile function of the standardized normal. The dashed line in the second row refers to the pdf, cdf and quantile function of a normal with mean 1 and variance 1. In the third row the dashed line indicates a normal with mean 0 and variance 1.5 and in the last row a normal with mean 1 and variance 1.5.

Figure 1.4: Estimated parameters
[Three blocks of panels: TEF, ELE and ANA; each block plots ω(τ), β(τ), α1(τ), γ1(τ), α2(τ), γ2(τ), α3(τ) and γ3(τ) against τ.]
The figure displays the estimated parameters of equation (1.3.3).
The continuous line indicates the estimated parameters for each τ quantile. The dashed one refers to the 95% point-wise confidence intervals.

Figure 1.5: Seasonal component
[Three panels: TEF, ELE and ANA; seas(τ) plotted against τ, one line for each of the hours 9.30, 10, 11, 12, 13, 14, 15, 16 and 17.]
Estimated seasonal component $\widehat{seas}_d(\tau)$, as presented in equation (1.3.2), for different times of the day.

Figure 1.6: Seasonality in the quantiles
[Three panels: TEF, ELE and ANA; conditional quantiles plotted against τ, one line for each of the hours 9.30, 10, 11, 12, 13, 14, 15, 16 and 17.]
Conditional quantiles of $r_t$ given $|r_{t-1}|$ equal to its 50 percent empirical quantile, $Q_{r_t}(\tau | d, |r_{t-1}| = Q_{|r_{t-1}|}(0.50))$, and for different times of the day.

Figure 1.7: Seasonality and the tails
[Three pairs of panels: TEF, ELE and ANA; left and right tail of the pdf, one line for each of the hours 9.30, 10, 11, 12, 13, 14, 15, 16 and 17.]
Left and right tail of the conditional densities of $r_t$ given $|r_{t-1}|$ equal to its 50 percent empirical quantile for different times of the day. The conditional density is computed using equation (1.7.2) in the Appendix.
Figure 1.8: VaR
[Figure omitted: panels for TEF, ELE and ANA plot the returns $r_t$ together with VaR(2.5), VaR(1) and VaR(0.5) over the last 500 observations.]
Last 500 standardized 15-minute returns and the corresponding out-of-sample Value at Risk forecasts at confidence levels 2.5%, 1% and 0.5%.

Chapter 2

Forecasting the yield curve using large macroeconomic information

ABSTRACT: This paper investigates whether macroeconomic indicators are helpful in forecasting the yield curve. We incorporate a large number of macroeconomic predictors within the Nelson and Siegel (1987) model for the yield curve, which can be cast in a common factor model representation. Estimation is performed by the EM algorithm and the Kalman filter using a data set composed of 17 yields and 118 macro variables. Results show that incorporating large macroeconomic information improves the accuracy of out-of-sample yield forecasts at medium and long horizons.

Keywords: Yield Curve, Factor Models, Forecasting, Large Cross-Sections, Quasi Maximum Likelihood.

JEL classification: C33, C53, E43, E44.

This chapter is adapted from the working paper "Forecasting the term structure of interest rates using a large panel of macroeconomic data" with Domenico Giannone (ECB and Université Libre de Bruxelles) and Michele Modugno (ECB and Université Libre de Bruxelles).

2.1 Introduction

The interaction between the yield curve and macro variables is a clear phenomenon manifested in the behavior of market agents and policy makers. On the one hand, market participants closely monitor macro data releases and try to assess their impact on the yields; see for example Fleming and Remolona (1999) and Furfine (2001). On the other hand, central banks, in the standard view of the monetary transmission mechanism, react to the current macroeconomic situation, stimulating aggregate demand and controlling inflation by fixing the short term interest rates.
According to the expectations theory, long term interest rates depend on present and future expected short term interest rates. This suggests that macro variables can incorporate important information for forecasting the behavior of market and central bank practitioners, and thereby the evolution of the yield curve.

The term structure of interest rates is characterized by a high degree of correlation among yields with different maturities. This collinearity can be explained by a few sources of co-movement. As a consequence, a parsimonious representation of the yield curve can be obtained by modeling fewer factors than observed maturities. Accordingly, the two main approaches to yield curve modeling can be cast in a factor model representation, and they differ from each other in the restrictions imposed on the model parameters.

The first approach, the Nelson and Siegel (1987) model, is a parsimonious model based on the relation between the yields and their corresponding maturities. This model is able to reproduce the historical stylized facts concerning the average shape of the yield curve, the variety of shapes assumed at different times and the strong persistence of yields. Moreover, Diebold and Li (2006) reinterpret the Nelson and Siegel model as a dynamic model with three latent factors and restricted loadings, and show that it is able to provide good forecasts of the yield curve.

The second approach, the no-arbitrage term structure models, is characterized by restrictions on the factor loadings that rule out arbitrage opportunities. These models impose a structure on the factor loadings such that the resulting yield curves, in the maturity dimension, are compatible with the time-series dynamics of the yield curve factors.
This consistency between the cross-sectional shape of the yield curve and the dynamic evolution of the yield curve factors, and hence of the yields at different maturities, is what ensures the absence of arbitrage opportunities and makes these models particularly useful for derivative pricing.^1 However, imposing the no-arbitrage restrictions on term structure models implies that the resulting model is not parsimonious and therefore not well suited for forecasting purposes.^2 This is confirmed by Duffee (2002), who finds that this type of model forecasts the yield curve poorly. Accordingly, given that the main focus of this paper is on forecasting the term structure of interest rates, we adopt the more parsimonious Nelson and Siegel approach.

In the literature there has been a lot of interest in the relationship between macroeconomic variables and the yield curve. In their seminal paper, Ang and Piazzesi (2003) study the interactions between yields and macroeconomic variables by augmenting the standard no-arbitrage affine term structure model with two observable macroeconomic factors, measuring inflation and real activity. The idea behind this model is to use macroeconomic variables to capture the variability of the yields not explained by the latent factors, improving the forecasting performance of the model. Their main conclusion is that macroeconomic factors help in forecasting the yield curve. Following this finding, several papers investigate the links between the yield curve and macroeconomic variables, incorporating macro determinants as factors into multi-factor no-arbitrage affine models; see, among others, Dai and Philippon (2005), Dewachter and Lyrio (2006), Kozicki and Tinsley (2001) and Wu (2006).
Mönch (2005) uses, as additional factors, principal components extracted from a large macroeconomic data set, instead of single macro variables.^3 Rudebusch and Wu (2004) and Hördahl, Tristani and Vestin (2006) develop a theoretical framework which allows one to identify the sources of co-movement through structural macroeconomic relations.

Footnote 1: For more details about the no-arbitrage term structure models see Duffie and Kan (1996), Litterman and Scheinkman (1991) and Dai and Singleton (2000).
Footnote 2: For a more detailed comparison between the Nelson and Siegel model and the no-arbitrage term structure models see Chapter 3.
Footnote 3: A similar approach is used to forecast the excess bond returns in Ludvigson and Ng (2005).

On the other hand, following the Nelson and Siegel approach, Diebold, Rudebusch and Aruoba (2006) introduce a yield curve model where, in addition to the Nelson and Siegel latent factors, they include some observable macroeconomic factors. They show that observable macroeconomic factors have strong effects on the future yield curve and that there is evidence of reverse influence. Mönch (2006) proposes to use principal components extracted from a large data set of macroeconomic variables to augment the Nelson and Siegel latent factors. Favero, Niu and Sala (2007) and De Pooter, Ravazzolo and Van Dijk (2007) investigate the impact of macro variables on the forecast of yields. They provide an exhaustive comparison of the existing yield curve models, with and without macro factors, and they find that additional factors extracted from a large macro dataset are important for yield curve forecasting.

To summarize, the general idea behind the previous literature is to use macro variables as extra factors to capture the co-movement among yields not explained by the yield curve latent factors. One can raise three criticisms against this approach. First, the idea behind factor models is parsimony.
Augmenting the number of factors goes against this notion, especially if the three latent factors already explain most of the variation of the yields. Moreover, these latent factors are frequently identified as proxies of macro variables, so adding macro variables can be redundant. Second, adding macro variables as factors has not been proven successful at improving the out-of-sample performance of these models. This can be due to the fact that the gain from exploiting a larger information set does not counterbalance the loss in parsimony. Third, this approach allows one to exploit only a small set of macro variables. One could use principal components to summarize the information content of a larger set of macro variables, but in this way it would not be possible to understand which macroeconomic variables are important in fitting and forecasting the yield curve. Moreover, in the presence of a high correlation among the principal components and the latent yield curve factors, there would be a problem of parsimony.

In this paper, we propose a model for forecasting the yield curve that parsimoniously uses a large amount of macroeconomic information. We suggest an innovative way to exploit the linkages between macroeconomic variables and yields. Rather than including macroeconomic variables as factors in the yield curve model, we directly extract the latent factors from a data set composed of both yields (seventeen series) and macro variables (one hundred and eighteen series). The macroeconomic variables considered in the analysis include real variables (sectoral industrial production, employment and hours worked), nominal variables (consumer and producer price indices, wages, and money aggregates) and asset prices (stock prices and exchange rates). To identify the yield curve factors, we impose the Nelson and Siegel restrictions on the loadings relative to the yields, leaving the loadings relative to macro variables free.
This enriches the yield curve latent factors with the information contained in the macroeconomic variables. The approach preserves parsimony while including a large amount of information, since it is not necessary to augment the standard three latent factor model in order to include the additional information coming from the macroeconomic variables. Indeed, as shown by Diebold and Li (2006), the Nelson and Siegel factors are highly correlated with some macro variables, in particular with measures of inflation and industrial production. This means that the sources of co-movement for the yields co-move with the rest of the economy. Accordingly, in the spirit of the factor model literature, extracting the latent factors from a panel of yields enriched with a large number of macroeconomic variables allows us to better identify them. Moreover, by looking at the loadings it is possible to discriminate which macro variables have significant information content for each factor.

Related to this work is Law (2006), who extracts the latent factors, using a no-arbitrage model, from twenty-four macro series plus the yields. We differ from him in several aspects. First, we exploit a broader quantity of information. Second, we perform an out-of-sample forecast exercise. Third, we use the Nelson and Siegel approach.

We estimate the model by maximum likelihood, combining the EM algorithm and the Kalman filter. Doz, Giannone and Reichlin (2006) show that this procedure makes maximum likelihood estimation of factor models feasible for large cross sections, in the sense that it delivers consistent estimates for large cross sections and large sample sizes (for any relative size of the time span and cross sectional dimension). Consistency is guaranteed even if the hypotheses of orthogonality and of absence of serial correlation are violated for the idiosyncratic part.
Moreover, this methodology allows us to impose the crucial restrictions on the loadings to identify the factors.

Results show that the out-of-sample forecasting performance improves at medium and long horizons (i.e. 6 and 12 months ahead) compared with the forecasts generated by four alternatives: a model estimated using only the yields; a model à la Diebold, Rudebusch and Aruoba (2006), where the Nelson and Siegel factors are augmented with three macroeconomic variables (the manufacturing capacity utilization, the federal funds rate and the annual price inflation); a model à la Mönch (2006), where the Nelson and Siegel factors are augmented with the first three principal components extracted from the same macro dataset; and a random walk.

The paper is organized as follows. Section 2.2 introduces the macro-yields model and the four alternative models considered in the analysis. Section 2.3 presents the data, describing how the yields are constructed and providing a description of the macroeconomic dataset. Section 2.4 describes the estimation technique used, with a more detailed description in the Appendix, and derives the modified Bai and Ng (2002) information criterion that we use for model selection. Section 2.5 shows the estimation results of the macro-yields model and the in-sample performances of the proposed models. The importance of using macro variables parsimoniously becomes clear in Section 2.6, where we compare the forecasting performances of the proposed models. Section 2.7 concludes.

2.2 Model

The Nelson and Siegel (1987) model is a parsimonious model to fit yields of different maturities at a specific point in time. Diebold and Li (2006) reinterpret the Nelson and Siegel model as a latent factor model, where the evolution in time of the yields depends on three latent factors identified as level, slope and curvature through the restrictions on their respective factor loadings.
Denoting with $y_t(\tau_i)$ the yield of maturity $\tau_i$ at time $t$, the Nelson and Siegel model, as reinterpreted by Diebold and Li (2006), can be expressed as

$$y_t(\tau_i) = L_t + S_t\,\frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i} + C_t\left(\frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i} - e^{-\lambda\tau_i}\right) + v_t(\tau_i) \tag{2.2.1}$$

where the level, slope and curvature are denoted by $L_t$, $S_t$ and $C_t$, and $v_t(\tau_i)$ is the residual, or pricing error. The predetermined loadings $\left(1,\ \frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i},\ \frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i} - e^{-\lambda\tau_i}\right)$ allow one to identify the three factors as level, slope and curvature of the yield curve because of the effects that they have on its shape. The loadings relative to the first factor, equal to one for all maturities, imply that an increase in $L_t$ increases all yields equally, shifting the level of the yield curve. The loadings of the second factor are high for short maturities and decay to zero for the long ones. Accordingly, an increase in $S_t$ increases the slope of the yield curve. The loadings relative to $C_t$ are zero for the shortest and the longest maturities, reaching the maximum for medium maturities. Therefore, an increase in $C_t$ augments the curvature of the yield curve. The parameter $\lambda$ governs the exponential decay rate: a small value of $\lambda$ can better fit the yield curve at long maturities, while large values can better fit it at short maturities. Diebold and Li (2006) keep this parameter constant over time.

Rewriting equation (2.2.1) in vector notation,

$$y_t = \Gamma^{*}_{yf} f_t + v_{y,t} \tag{2.2.2}$$

where $y_t$ collects the yields of different maturities available at time $t$, $\Gamma^{*}_{yf}$ is the matrix of restricted factor loadings with row $i$ equal to $\left(1,\ \frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i},\ \frac{1 - e^{-\lambda\tau_i}}{\lambda\tau_i} - e^{-\lambda\tau_i}\right)$ and $f_t$ is the vector of factors $(L_t, S_t, C_t)'$.

The aim of this paper is to introduce a parsimonious model that exploits all the information about the state of the economy in order to fit and forecast the yield curve.
To summarize all the information in the macroeconomic variables, we do not add any specific macroeconomic variable as a factor, nor do we add principal components extracted from a macroeconomic data set. We rather extract the level, slope and curvature from a large panel composed of yields and macroeconomic variables. Generalizing the Nelson and Siegel factor model of equation (2.2.2), we have

$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} \Gamma^{*}_{yf} \\ \Gamma_{xf} \end{pmatrix} f_t + \begin{pmatrix} v_{y,t} \\ v_{x,t} \end{pmatrix} \tag{2.2.3}$$

where $y_t$ is the vector of yields, $x_t$ is a large set of macroeconomic variables and $f_t$ collects the yield curve latent factors. To identify the three unobservable factors as level, slope and curvature, we restrict the matrix $\Gamma^{*}_{yf}$ of factor loadings relative to the yields à la Nelson and Siegel as in equation (2.2.2), while the matrix $\Gamma_{xf}$, which collects the loadings relative to the macro variables, is left unrestricted.

Rather than including macro variables as additional factors, we use them to extract the Nelson and Siegel factors. Yields co-move with the whole economy, therefore the few sources that generate the evolution of the yields have to be related to the whole economy. This implies that extracting the Nelson and Siegel factors not only from yields but also from macro variables allows us to use the extra information in the macro variables to better identify the factors. This feature is in line with the previous macro-finance literature, which links the level with different measures of inflation and the slope with capacity utilization or industrial production; see among others Diebold and Li (2006), Diebold, Rudebusch and Aruoba (2006), Rudebusch and Wu (2004) and Hördahl et al. (2002). Moreover, using a large set of macro variables makes it possible to discriminate, through the loadings, which variables are related to the factors and useful for forecasting the yields out of sample.
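The restricted loading matrix $\Gamma^{*}_{yf}$ of equations (2.2.1)-(2.2.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nelson_siegel_loadings` is hypothetical, maturities are measured in months (the grid is the one listed in Section 2.3), and the decay parameter value 0.0609 is an assumed illustrative choice, not a value reported in this chapter.

```python
import numpy as np


def nelson_siegel_loadings(maturities, lam):
    """Return the (n_maturities x 3) matrix of Nelson and Siegel loadings.

    Columns are the level, slope and curvature loadings. `maturities` and
    `lam` must use consistent units (here: months and 1/months).
    """
    tau = np.asarray(maturities, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x      # high at short maturities, decays to 0
    curvature = slope - np.exp(-x)      # near 0 at both ends, hump in between
    level = np.ones_like(tau)           # equal to one at every maturity
    return np.column_stack([level, slope, curvature])


# The 17 maturities (in months) used in this chapter.
MATURITIES = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]

# Assumed decay parameter, for illustration only.
LAMBDA = 0.0609

gamma_yf = nelson_siegel_loadings(MATURITIES, LAMBDA)
peak_maturity = MATURITIES[int(np.argmax(gamma_yf[:, 2]))]
```

Under these assumed values the slope loadings decrease monotonically with maturity, and the curvature loading is largest at the 30-month maturity on this grid, which matches the verbal description of the three factors.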
The model presented in equation (2.2.3) can be easily extended to allow for the presence of additional unobservable and unidentified factors in the following way:

$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} \Gamma^{*}_{yf} & 0 \\ \Gamma_{xf} & \Gamma_{xg} \end{pmatrix} \begin{pmatrix} f_t \\ g_t \end{pmatrix} + \begin{pmatrix} v_{y,t} \\ v_{x,t} \end{pmatrix} \tag{2.2.4}$$

In this model, both yields and macro variables participate in determining the yield curve factors (level, slope and curvature) collected in the vector $f_t$, while the factors collected in $g_t$ are determined only by the macro variables and are unidentified, since we do not impose any restriction on the matrix of factor loadings $\Gamma_{xg}$. Even though we add unidentified factors to explain the variation in the large data set of macro variables, the yields still load only on the three yield curve factors. This is consistent with the standard view that three factors are able to capture all the information in the yield curve.^4

The model presented in equation (2.2.4) can be easily put in a state-space representation. The macro-yields model that we propose is

$$\begin{pmatrix} y_t \\ x_t \end{pmatrix} = \begin{pmatrix} \Gamma^{*}_{yf} & 0 \\ \Gamma_{xf} & \Gamma_{xg} \end{pmatrix} \begin{pmatrix} f_t \\ g_t \end{pmatrix} + \begin{pmatrix} v_{y,t} \\ v_{x,t} \end{pmatrix} \tag{2.2.5}$$

$$\begin{pmatrix} f_t \\ g_t \end{pmatrix} = A \begin{pmatrix} f_{t-1} \\ g_{t-1} \end{pmatrix} + \begin{pmatrix} u_{f,t} \\ u_{g,t} \end{pmatrix} \tag{2.2.6}$$

where $v_t = (v_{y,t}, v_{x,t})' \sim \text{iid } N(0, R)$ and $u_t = (u_{f,t}, u_{g,t})' \sim \text{iid } N(0, Q)$, with a diagonal variance matrix $R$ of the idiosyncratic disturbances and a non-diagonal variance matrix $Q$ of the shocks driving the common factors.

Footnote 4: We also estimated the model allowing $\Gamma_{yg}$ to be different from zero and, as expected, the estimated loadings were small in magnitude and not significant. None of the results presented in the paper change when $\Gamma_{yg}$ is allowed to differ from zero; these results are available upon request.

2.2.1 Alternative models

We compare the macro-yields model, presented in equations (2.2.5)-(2.2.6), with three alternative models: the only yields model, the basic macro-yields model and the large macro-yields model.
The only yields model uses only the information contained in the yield series to extract the yield curve factors. This model is a generalization of the Diebold and Li (2006) one, which has been shown to outperform several models in forecasting the U.S. yields. It can be represented as

$$y_t = \Gamma^{*}_{yf} f_t + v_{y,t}, \qquad v_{y,t} \sim \text{iid } N(0, R) \tag{2.2.7}$$

$$f_t = A f_{t-1} + u_{f,t}, \qquad u_{f,t} \sim \text{iid } N(0, Q) \tag{2.2.8}$$

where the matrix of factor loadings of the yields $\Gamma^{*}_{yf}$ is restricted à la Nelson and Siegel. This model can be obtained from the macro-yields model, presented in equations (2.2.5)-(2.2.6), by imposing the following restrictions: the macro variables do not participate in the determination of the yield curve factors, i.e. $\Gamma_{xf} = 0$, and there are only the three yield curve factors, i.e. $\Gamma_{xg} = 0$.

The basic macro-yields model augments the yield curve factors with a minimal set of fundamental macro variables as extra factors to capture basic macroeconomic dynamics. In particular, following Diebold, Rudebusch and Aruoba (2006), we consider as additional factors the manufacturing capacity utilization (CU), the federal funds rate (FFR) and the annual price inflation (INFL). Therefore, imposing $g_t = (CU_t, FFR_t, INFL_t)'$, the basic macro-yields model can be written as

$$y_t = \Gamma^{*}_{yf} f_t + \Gamma_{yg} g_t + v_{y,t}, \qquad v_{y,t} \sim \text{iid } N(0, R) \tag{2.2.9}$$

$$\begin{pmatrix} f_t \\ g_t \end{pmatrix} = A \begin{pmatrix} f_{t-1} \\ g_{t-1} \end{pmatrix} + u_t, \qquad u_t \sim \text{iid } N(0, Q) \tag{2.2.10}$$

with $\Gamma^{*}_{yf}$ restricted à la Nelson and Siegel to identify the three yield curve factors. Relative to the macro-yields model, presented in equations (2.2.5)-(2.2.6), this model imposes that the yield curve factors are extracted only from the yields, i.e. $\Gamma_{xf} = 0$, that the yields load both on the yield curve factors and on the macro factors, i.e. $\Gamma_{yg} \neq 0$, and that the additional factors are equal to the manufacturing capacity utilization (CU), the federal funds rate (FFR) and the annual price inflation (INFL), i.e. $g_t = (CU_t, FFR_t, INFL_t)'$, and coincide with the additional macro variables, i.e. $\Gamma_{xg} = I$ and $R_x = 0$.

The large macro-yields model exploits a larger information set than the basic macro-yields model. This model augments the three yield curve factors, extracted only from the yields, with the first three principal components extracted from the large data set of macroeconomic variables. Denoting with $PC_t$ the vector of three principal components at time $t$, the large macro-yields model can be represented as

$$y_t = \Gamma^{*}_{yf} f_t + \Gamma_{yg} PC_t + v_{y,t}, \qquad v_{y,t} \sim \text{iid } N(0, R) \tag{2.2.11}$$

$$\begin{pmatrix} f_t \\ PC_t \end{pmatrix} = A \begin{pmatrix} f_{t-1} \\ PC_{t-1} \end{pmatrix} + u_t, \qquad u_t \sim \text{iid } N(0, Q) \tag{2.2.12}$$

with $\Gamma^{*}_{yf}$ restricted à la Nelson and Siegel to identify the three yield curve factors. This model can also be related to the macro-yields model, presented in equations (2.2.5)-(2.2.6), through the following restrictions: the yield curve factors are extracted only from the yields, i.e. $\Gamma_{xf} = 0$; the yields load both on the yield curve factors and on the macro factors, i.e. $\Gamma_{yg} \neq 0$; and the additional factors are equal to the principal components extracted from a large dataset of macro variables, $g_t = PC_t$, and coincide with the additional macro variables, i.e. $\Gamma_{xg} = I$ and $R_x = 0$. The large macro-yields model is closely related to the model proposed in Mönch (2006) and, through the comparison with the macro-yields model, we want to emphasize the importance of extracting the factors from both the yields and the macro series.

The information set used for the analysis expands when passing from the only yields to the basic macro-yields and to the large macro-yields models, while the large macro-yields and the macro-yields models use the same information set. However, the macro-yields model, presented in equations (2.2.5)-(2.2.6), is the only model that includes a large amount of macroeconomic information and has only three factors in the observation equation of the yields.
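To make the block structure of the state-space form in equations (2.2.5)-(2.2.6) concrete, the following Python sketch simulates data from a model of that shape. It is a minimal illustration, not the estimation code used in this chapter: the function names are hypothetical, and every numerical value (the decay parameter, the transition matrix $A$, the variance matrices $Q$ and $R$, and the randomly drawn macro loadings) is an assumption chosen only so that the example runs.

```python
import numpy as np


def ns_loadings(maturities, lam):
    """Nelson and Siegel loadings (level, slope, curvature) per maturity."""
    x = lam * np.asarray(maturities, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def simulate_macro_yields(n_periods, maturities, n_macro, n_extra=1,
                          lam=0.0609, seed=0):
    """Simulate (y_t, x_t) from a model shaped like (2.2.5)-(2.2.6).

    All parameter values below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_y = len(maturities)
    n_factors = 3 + n_extra

    gamma_yf = ns_loadings(maturities, lam)            # restricted block
    gamma_xf = rng.normal(size=(n_macro, 3))           # unrestricted block
    gamma_xg = rng.normal(size=(n_macro, n_extra))     # unrestricted block
    # Yields do not load on the unidentified factors g_t: zero upper-right block.
    loadings = np.block([[gamma_yf, np.zeros((n_y, n_extra))],
                         [gamma_xf, gamma_xg]])

    A = 0.9 * np.eye(n_factors)                        # stable VAR(1) dynamics
    Q = 0.1 * (np.eye(n_factors)
               + 0.5 * np.ones((n_factors, n_factors)))  # non-diagonal
    r_diag = np.full(n_y + n_macro, 0.05)              # diagonal R

    factors = np.zeros((n_periods, n_factors))
    data = np.zeros((n_periods, n_y + n_macro))
    state = np.zeros(n_factors)
    for t in range(n_periods):
        state = A @ state + rng.multivariate_normal(np.zeros(n_factors), Q)
        noise = np.sqrt(r_diag) * rng.normal(size=n_y + n_macro)
        factors[t] = state
        data[t] = loadings @ state + noise
    return data, factors, loadings


maturities = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
data, factors, loadings = simulate_macro_yields(50, maturities, n_macro=118)
```

In this layout the restrictions that distinguish the models of this section are statements about blocks of `loadings`: the only yields model corresponds to dropping the macro rows, whereas the macro-yields model keeps the upper-right block at zero and leaves both macro blocks free.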
2.3 Data

The data set used for the empirical analysis contains monthly observations of zero-coupon yields and a large set of macro variables from January 1970 to December 2000.

The zero-coupon yields have maturities of 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108 and 120 months. These data, available on Diebold's home page, are constructed from end-of-month price quotes (bid/ask average) for U.S. Treasuries taken from the CRSP government bonds files. CRSP filters the data, eliminating bonds with option features (callable and flower bonds) and bonds with special liquidity problems (notes and bonds with less than one year to maturity, and bills with less than one month to maturity), and then converts the filtered bond prices to unsmoothed Fama-Bliss (1987) forward rates. These unsmoothed forward rates are then converted into unsmoothed Fama-Bliss zero-coupon yields, using fixed maturities. To pool the data into the fixed maturities listed above, a month is defined as 30.4375 days. Given that the same maturities are not available in every month, the data are obtained by linearly interpolating nearby maturities. For example, in each month there are many bonds with either 30, 31, 32, 33 or 34 days to maturity, and by interpolating them it is possible to get the yield with a maturity of one month (30.4375 days).

Summary statistics for the zero-coupon yields used in this paper are presented in Table 2.1. The stylized facts common to yield curve data are clearly present: the sample average curve is upward sloping and concave, volatility is decreasing with maturity and autocorrelations are very high and increasing with maturity.

[TABLE 2.1 AROUND HERE]

Figure 2.1 shows a plot of the zero-coupon yields for the period considered and highlights how the yields at different maturities tend to move together through time. Correlations between yields of different maturities are high, especially for yields with maturities that are close to each other.
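The pooling of observed maturities into fixed maturities described in this section can be sketched as follows. This is a simplified illustration of linear interpolation only, not the actual CRSP or Fama-Bliss procedure: the function name is hypothetical and the observed days to maturity and yields are made-up numbers.

```python
import numpy as np

DAYS_PER_MONTH = 30.4375  # 365.25 / 12, the month length used in the text


def yields_at_fixed_maturities(days_to_maturity, yields, months):
    """Linearly interpolate observed yields at fixed maturities in months."""
    days = np.asarray(days_to_maturity, dtype=float)
    vals = np.asarray(yields, dtype=float)
    order = np.argsort(days)  # np.interp needs increasing x-coordinates
    targets = np.asarray(months, dtype=float) * DAYS_PER_MONTH
    # Note: np.interp holds the end values constant outside the observed
    # range, so targets should lie between the shortest and longest bond.
    return np.interp(targets, days[order], vals[order])


# Hypothetical observations: days to maturity and yields in percent.
observed_days = [85, 95, 175, 190, 270, 280]
observed_yields = [5.00, 5.10, 5.40, 5.46, 5.70, 5.72]

fixed = yields_at_fixed_maturities(observed_days, observed_yields, [3, 6, 9])
```

Each target maturity (for example 3 months, i.e. 91.3125 days) is bracketed by two nearby observed maturities, and the fixed-maturity yield is the straight-line interpolation between them.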
[FIGURE 2.1 AROUND HERE]

The macro dataset is the same as that used in Giannone, Reichlin and Sala (2004) and consists of 118 monthly US series. We exclude all interest and spread series, except for the federal funds rate, from the original panel dataset of 132 series. The federal funds rate closely follows the federal funds target rate, which is the key monetary policy instrument for the US Federal Reserve, and should therefore be important for capturing the movements of the short end of the term structure. The variables contained in the macro dataset include real variables (sectoral industrial production, employment and hours worked), nominal variables (consumer and producer price indices, wages, and money aggregates) and asset prices (stock prices and exchange rates). Table 2.2 lists the series included in the macro dataset.

[TABLE 2.2 AROUND HERE]

We transform the monthly recorded macro series, whenever appropriate, to ensure stationarity by using levels, log levels, monthly differences, monthly log differences or annual log differences. The last column in Table 2.2 lists the applied transformation. In general, for real variables such as employment and industrial production we use the monthly growth rates. We use first differences for series already expressed in rates: the unemployment rate and capacity utilization. We do not transform the federal funds rate into first differences, in order to be able to extract level, slope and curvature from the data.

2.4 Estimation procedure

The macro-yields model, presented in equations (2.2.5)-(2.2.6), allows one to identify the Nelson and Siegel yield curve factors through the restrictions imposed on the respective factor loadings, $\Gamma^{*}_{yf}$. Thus the macro-yields model is a restricted dynamic factor model and cannot be estimated by standard principal components, since this estimator does not allow one to impose the necessary restrictions on the factor loadings.
For this reason, using the results in Doz, Giannone and Reichlin (2006), we estimate the macro-yields model using quasi maximum likelihood. The procedure proposed by Doz, Giannone and Reichlin (2006) combines the EM algorithm and the Kalman filter. This method makes maximum likelihood estimation of factor models feasible for large cross sections, providing consistent estimates for any relative size of the time span and of the cross sectional dimension. Moreover, this procedure guarantees consistency even when the hypotheses of orthogonality and absence of serial correlation of the idiosyncratic component are violated.

The estimation procedure alternates Kalman filter extraction of the factors with the maximization of the likelihood. In particular, for given parameters of the model, we use the Kalman filter to extract the factors. Then, given the extracted factors, we maximize the Gaussian likelihood function implied by the Kalman filter using the EM algorithm. The estimation procedure is described in detail in the Appendix.

As shown in Section 2.2.1, the alternative models considered in this paper can be regarded as restricted versions of the macro-yields model. For this reason, we use the same estimation procedure for all the models included in the analysis.

2.4.1 Model selection

The macro-yields model, presented in equations (2.2.5)-(2.2.6), is a general framework that allows for the presence of both the three Nelson and Siegel yield curve factors $f_t$ and any number of unidentified and unobservable factors $g_t$. Therefore, in order to estimate the model, we need a statistical criterion to select the number of factors to include.

The most widely used procedure to determine the number of factors in approximate factor models is the information criterion proposed by Bai and Ng (2002). The idea is to choose the number of factors that maximizes the general fit of the model, using a penalty function to account for the loss in parsimony.
The general form of the information criterion IC3 introduced by Bai and Ng (2002) is

$$IC(r) = \log\left(V(r, \hat{F}^{r})\right) + r\, g(N, T), \qquad g(N, T) = \frac{\log C_{NT}^{2}}{C_{NT}^{2}} \tag{2.4.1}$$

where $r$ denotes the number of factors, $\hat{F}^{r}$ are the estimated factors and $V(r, \hat{F}^{r})$ is the sum of squared residuals (divided by $NT$) when $r$ factors are estimated. The penalty function $g(N, T)$ is a function of both $N$ and $T$ and depends on $C_{NT}^{2}$, the convergence rate of the principal component estimator, $C_{NT}^{2} = \min\{T, N\}$.

The macro-yields model is estimated by quasi maximum likelihood and not by principal components; for this reason the IC information criterion, as presented in equation (2.4.1), cannot be used directly. However, Corollary 2 of Bai and Ng (2002) shows that the IC information criterion can be applied to any consistent estimator of the factors, provided that the penalty function is derived from the correct convergence rate. Thus, in order to apply this criterion to the macro-yields model, it is necessary to substitute the convergence rate of the quasi maximum likelihood estimator in equation (2.4.1).

Doz, Giannone and Reichlin (2006) show in Proposition 1 that the quasi maximum likelihood estimator of the common factors converges to the true value at a rate equal to $C_{NT}^{*2} = \min\left\{\sqrt{T}, \frac{N}{\log N}\right\}$. Therefore, substituting $C_{NT}^{*2}$ in equation (2.4.1), we obtain the modified Bai and Ng information criterion $IC^{*}$, which can be used when the common factors are estimated by maximum likelihood:

$$IC^{*}(r) = \log\left(V(r, \hat{F}^{r})\right) + r\, \frac{\log\left(\min\left\{\sqrt{T}, \frac{N}{\log N}\right\}\right)}{\min\left\{\sqrt{T}, \frac{N}{\log N}\right\}}. \tag{2.4.2}$$

The estimated modified Bai and Ng information criterion $IC^{*}(r)$ for different specifications of the macro-yields model is reported in Table 2.3. The model has been estimated on the full sample, from January 1970 to December 2000, varying the number of factors included. In particular, the model with three factors, i.e. $r = 3$, includes only the three Nelson and Siegel yield curve factors extracted from both yields and macro variables, while the models with $r = 4$, 5 and 6 also include one, two or three unidentified additional factors. The table also reports the values of the sum of the variances of the idiosyncratic components, denoted by $RR(r, \hat{F}^{r})$, for each specification of the model.

[TABLE 2.3 AROUND HERE]

The modified Bai and Ng information criterion indicates that the best model is the one with the three Nelson and Siegel yield curve factors plus one unidentified factor, i.e. $r = 4$. This is also confirmed by the fact that the strongest reduction in the sum of the variances of the idiosyncratic components is obtained when passing from the three to the four factor specification. Intuitively, this result can be explained by the large dimension of the data set (17 yields plus 118 macroeconomic variables), which cannot be explained by the three Nelson and Siegel yield curve factors alone. Figures 2.2-2.4 show the in-sample fit of the macro-yields model with only the three Nelson and Siegel yield curve factors, i.e. $r = 3$, and with the additional unidentified factor, i.e. $r = 4$.

[FIGURES 2.2-2.4 AROUND HERE]

The in-sample fit of the yields does not improve when passing from the three to the four factor specification, as can be seen in Figure 2.2. However, the picture is completely different for the macroeconomic variables. Figures 2.3-2.4 highlight how the fourth factor is important to capture the dynamics of most of the macroeconomic variables. Indeed, the three Nelson and Siegel yield curve factors do a poor job in fitting most of the macroeconomic variables, except price indexes. Figure 2.4 shows that the yield curve factors are by themselves able to fit really well the producer price index (PPI), the consumer price index (CPI) and the personal consumption expenditure implicit price deflator (PCE).
This good fit is due to the fact that the first Nelson and Siegel factor is highly correlated with inflation, as also confirmed by Diebold, Rudebusch and Aruoba (2005). Following these findings, from now on we will refer to the macro-yields model as the model with the three Nelson and Siegel yield curve factors plus one unidentified factor.

2.5 Estimation Results

Estimation of the macro-yields model requires a joint procedure to extract the latent factors, to identify the first three as the Nelson and Siegel yield curve factors and to estimate the loadings of all the 118 macroeconomic variables on the extracted factors. As explained in Section 2.4, we address this issue by estimating the model by quasi maximum likelihood. Figure 2.5 shows the estimated factors of the macro-yields model and, for comparison purposes, the corresponding Nelson and Siegel factors.

[FIGURE 2.5 AROUND HERE]

The first three factors of the macro-yields model, which we identified as the Nelson and Siegel factors, are indeed very close to the original Nelson and Siegel factors. The difference between the original Nelson and Siegel factors and the macro-yields Nelson and Siegel factors comes from the fact that the macro-yields factors include not only the information contained in the yields but also the information contained in the macroeconomic series. The factor most affected by the macroeconomic information is the curvature: the macro-yields curvature factor is much more persistent than the original Nelson and Siegel curvature. This is also confirmed in Table 2.4, where we report summary statistics of the estimated macro-yields factors and of the Nelson and Siegel factors.

[TABLE 2.4 AROUND HERE]

Table 2.4 also highlights a certain difference in persistence between the macro-yields slope factor and the Nelson and Siegel one. In general, the macro-yields factors tend to be more persistent than the Nelson and Siegel ones. The last panel of Figure 2.5 reports the fourth factor, the unidentified one.
The plot highlights how this factor accounts for the macroeconomic situation. Indeed, during all the recessions in the sample the unidentified factor decreases drastically. Summary statistics for the unidentified factor in Table 2.4 show that it has zero mean and almost unit variance, but a high degree of persistence.

Table 2.5 displays the goodness of fit of the macro-yields model compared with the alternative models presented in Section 2.2.1. In particular, the table reports the mean square error of the only yields (OY), the basic macro-yields (BMY), the large macro-yields (LMY) and the macro-yields (MY) models for selected maturities.

[TABLE 2.5 AROUND HERE]

The four models display almost the same performance in fitting the term structure of interest rates, except for the shortest yield, where adding a large set of macro variables clearly worsens the fit. Moreover, the basic macro-yields and the large macro-yields models, although they are the only models with six factors, do not display a significant improvement with respect to the other two models, namely the only yields model and the macro-yields model.

Macro variables do not improve the fit of the yield curve, but this does not imply that they carry no leading information for the yields. The aim of the following section is to show that a large set of macro variables helps in forecasting the yields, provided they are used parsimoniously.

2.6 Out-of-sample forecast

The out-of-sample forecasts of the only yields, basic macro-yields, large macro-yields and macro-yields models are obtained iteratively.
As mentioned in Section 2.2.1, the only yields, the basic macro-yields and the large macro-yields models are nested in the macro-yields model (footnote 5). Therefore, rewriting the macro-yields model (2.2.5)-(2.2.6) in compact notation, we obtain a general representation of all the models presented:

\[
z_t = \Gamma F_t + v_t, \qquad v_t \sim iid\, N(0, R) \tag{2.6.1}
\]
\[
F_t = A F_{t-1} + u_t, \qquad u_t \sim iid\, N(0, Q) \tag{2.6.2}
\]

where $F_t = (f_t, g_t)$, $v_t = (v_{y,t}, v_{x,t})$ and $u_t = (u_{f,t}, u_{g,t})$. We generate iterative forecasts for all the models by first projecting the factors forward,

\[
\hat{F}_{t+h|t} = \hat{A}^h \hat{F}_t,
\]

and then computing the out-of-sample forecast given the projected factors,

\[
\hat{z}_{t+h|t} = \hat{\Gamma} \hat{F}_{t+h|t}.
\]

To evaluate the prediction accuracy at a given forecast horizon $h$, we use the mean square forecast error (MSFE), that is, the average squared error between time $t_0$ and $t_1$ of the $h$-months ahead forecast of the yield with maturity $\tau$ obtained with a particular model $m$:

\[
MSFE_{t_0}^{t_1}(\tau, h, m) = \frac{1}{t_1 - t_0 + 1} \sum_{t=t_0}^{t_1} \left( \hat{y}(\tau)^m_{t+h|t} - y(\tau)_{t+h} \right)^2 \tag{2.6.3}
\]

where $y(\tau)_{t+h}$ is the realized yield with maturity $\tau$ at time $t + h$ and $\hat{y}(\tau)^m_{t+h|t}$ is the $h$-steps ahead forecast of the yield with maturity $\tau$ made at time $t$ with a particular model $m$.

Footnote 5: The only yields model is obtained by setting $\Gamma_{xf} = \Gamma_{xg} = \Gamma_{yg} = 0$ in equations (2.2.5)-(2.2.6). The basic macro-yields model is obtained by setting $x_t = (CU_t, FFR_t, INFL_t)$, $\Gamma_{xf} = 0$, $\Gamma_{xg} = I$, $\Gamma_{yg} = 0$ and $R_x = 0$ in the same equations. The large macro-yields model is obtained by setting $g_t = PC_t$, $\Gamma_{xf} = 0$, $\Gamma_{xg} = I$, $\Gamma_{yg} = 0$ and $R_x = 0$ in equations (2.2.5)-(2.2.6).

Forecast results for yields are usually expressed as ratios of the MSFE of the considered model to the MSFE of a random walk, a naive model that is very difficult to outperform given the high persistence of the yields. The random walk $h$-steps ahead prediction at time $t$ of the yield with maturity $\tau$ is

\[
\hat{y}_{t+h|t}(\tau) = y_t(\tau),
\]

that is, the predictor is the last observed yield, regardless of the maturity of the yield and of the forecast horizon.
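The forecasting recipe and the evaluation metric described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`iterate_forecast`, `msfe`, `msfe_ratio_vs_random_walk`) are hypothetical and do not come from the thesis, and the estimated matrices are taken as given inputs rather than estimated.

```python
import numpy as np

def iterate_forecast(A_hat, Gamma_hat, F_t, h):
    """h-step forecast: project the factors with A_hat**h, then map to observables."""
    F_proj = np.linalg.matrix_power(A_hat, h) @ F_t
    return Gamma_hat @ F_proj

def msfe(forecasts, realized):
    """Mean square forecast error over the evaluation window."""
    forecasts = np.asarray(forecasts, dtype=float)
    realized = np.asarray(realized, dtype=float)
    return float(np.mean((forecasts - realized) ** 2))

def msfe_ratio_vs_random_walk(y, model_forecasts, h):
    """Ratio of a model's MSFE to the random-walk MSFE for one maturity.

    y is the realized yield series; model_forecasts[t] is the forecast of
    y[t + h] made at time t, for t = 0, ..., len(y) - h - 1.
    """
    y = np.asarray(y, dtype=float)
    realized = y[h:]
    rw_forecasts = y[:-h]  # random walk: forecast of y[t + h] is y[t]
    return msfe(model_forecasts, realized) / msfe(rw_forecasts, realized)

# Toy inputs (made up for illustration only)
A_toy = np.array([[0.5, 0.0], [0.0, 0.5]])
Gamma_toy = np.array([[1.0, 1.0]])
z_forecast = iterate_forecast(A_toy, Gamma_toy, np.array([2.0, 4.0]), h=2)

y_toy = np.array([1.0, 2.0, 4.0, 7.0])
ratio_same = msfe_ratio_vs_random_walk(y_toy, y_toy[:-1], h=1)
```

A ratio below one means the model beats the random walk at that maturity and horizon, which is how the entries of Table 2.6 are read.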
Ang and Piazzesi (2003), Mönch (2006), Favero et al. (2007) and De Pooter et al. (2007) found that macroeconomic variables help in forecasting the yield curve. For this reason, it can be expected that the macro-yields model will outperform the only yields one. Moreover, comparing the macro-yields model with the basic macro-yields and the large macro-yields models makes it possible to show that it is not only important to use large information to forecast the yields, but also crucial to extract the yield curve factors from both the yields and the macro variables, in order to capture the co-movement between the yields and the whole economy.

2.6.1 Forecast performances

We forecast the yields by estimating each model recursively, using data from January 1970 until the time the forecast is made, with forecasts made from January 1985 to December 2000. We use the random walk as benchmark; therefore we construct ratios of each model's MSFEs over the random walk MSFEs. Table 2.6 reports these ratios for the only yields (OY), basic macro-yields (BMY), large macro-yields (LMY) and macro-yields (MY) models, for selected maturities.

[TABLE 2.6 AROUND HERE]

The out-of-sample performances of the only yields, the basic macro-yields and the large macro-yields models are similar. Rather than being interpreted as evidence that the macroeconomic variables are not useful in forecasting the yields, this should be considered a consequence of the lack of parsimony of the basic macro-yields and the large macro-yields models, which both include six factors. The only yields model can be seen as a restricted basic macro-yields, or large macro-yields, model in which the loadings of the yields on the observable macroeconomic factors are zero. Therefore, if the macro variables were not useful in forecasting the yields, the only yields model should be expected to outperform the basic macro-yields and large macro-yields models.
This is not the case, meaning that the macro variables are helpful in forecasting the yields, but in the basic macro-yields and the large macro-yields models they are used in a non-parsimonious way. The macro-yields model is suited to solve this problem and to exploit a large amount of macro information in a parsimonious way. Indeed, the relevance of the macroeconomic variables in forecasting the yields, especially at medium and long horizons, becomes evident when looking at the out-of-sample forecasting performance of the macro-yields model. For 6 and 12 months ahead, the macro-yields model outperforms not only the only yields, the basic macro-yields and the large macro-yields models but also the random walk. However, at the shortest horizon, i.e. one month ahead, the random walk in most cases provides the best forecast.

To investigate the out-of-sample performance of all the models over time, Figures 2.6-2.7 plot the smoothed square forecast errors and the smoothed forecast errors of all the models considered, for some selected maturities. The smoothed square forecast errors are computed as a 30-month moving average of the squared forecast errors, while the smoothed forecast errors are computed as a 30-month moving average of the forecast errors.

[FIGURES 2.6-2.7 AROUND HERE]

The square forecast errors of all the models have a similar pattern for both the 6-months and the 12-months ahead forecasts. In general, they start to increase just before the recession of July 1990 - March 1991, they peak after the recession and then they decline at the end of the period. However, it is possible to distinguish different behaviors across the models. The macro-yields model at both 6-months and 12-months ahead outperforms all the other models, and often also the random walk, for almost all the sample except the last two years.
The opposite happens for the basic macro-yields model, which at both 6-months and 12-months ahead is the worst model for almost all the sample, although in the last year it slightly outperforms the other models. This may indicate that large macroeconomic information is particularly useful just before and after recessions, provided it is used parsimoniously, while in other periods a few macro indicators are enough to convey information about the state of the economy. Figure 2.6 also highlights a bad performance of the benchmark, the random walk, at the beginning and at the end of the sample.

The forecast errors plotted in Figure 2.7 are also particularly small at the beginning and at the end of the sample, while they increase just before the recession, with all the models, sometimes except the macro-yields one, overestimating the yields. However, the conclusions change slightly when looking at the forecast errors. In this case, the random walk is always one of the best models, providing small forecast errors. The macro-yields model outperforms all the competing models in almost the whole sample, and often also the random walk. The forecast errors also indicate that the only yields model, which is the only model that does not use any macro information, is the only model that systematically overestimates the yields.

In conclusion, the macro-yields model at 6 and 12 months ahead outperforms on average all the competing models and also the random walk. This result is driven by the fact that the model is particularly able to outperform the others during and just after recessions. The macro-yields model is the only model able to provide forecast errors that are almost constant over time, while all the other models exhibit high variability of the forecast errors, with huge peaks just after the recession.

2.7 Conclusions

We propose a new framework to fit and forecast the yield curve using parsimoniously a large amount of macroeconomic information.
Our approach is based on a factor model, where the factors are extracted directly from a panel of 17 yields and 118 macro variables. The loadings of the yields on the first three factors are characterized by restrictions à la Nelson and Siegel that allow us to identify these first three factors as the level, the slope and the curvature of the yield curve. This is an innovative way to use macro variables to forecast the yields, given that the most recent literature used a small set of macro variables as extra regressors. We show that our approach outperforms the existing methods for all the maturities at medium and long horizons (i.e. 6 and 12 months ahead).

2.8 Appendix

The more general version of our model can be written as

\[
z_t = \Gamma F_t + v_t, \qquad v_t \sim N(0, R)
\]
\[
F_t = A F_{t-1} + u_t, \qquad u_t \sim N(0, Q)
\]

where $R$ is diagonal and $F_t = [f_t' \;\; g_t']'$, and we can partition the matrix $\Gamma$ as

\[
\Gamma = \begin{bmatrix} \Gamma^*_{yf} & 0 \\ \Gamma_{xf} & \Gamma_{xg} \end{bmatrix}
\]

where the identification of the yield curve factors is achieved through the restrictions on their loadings coming from the Nelson and Siegel representation, with

\[
\Gamma^*_{yf} = \begin{bmatrix}
1 & \dfrac{1 - e^{-\lambda \tau_1}}{\lambda \tau_1} & \dfrac{1 - e^{-\lambda \tau_1}}{\lambda \tau_1} - e^{-\lambda \tau_1} \\
1 & \dfrac{1 - e^{-\lambda \tau_2}}{\lambda \tau_2} & \dfrac{1 - e^{-\lambda \tau_2}}{\lambda \tau_2} - e^{-\lambda \tau_2} \\
\vdots & \vdots & \vdots \\
1 & \dfrac{1 - e^{-\lambda \tau_N}}{\lambda \tau_N} & \dfrac{1 - e^{-\lambda \tau_N}}{\lambda \tau_N} - e^{-\lambda \tau_N}
\end{bmatrix}
\]

Following Diebold and Li (2006), we fix $\lambda = 0.0609$, the value that maximizes the loading on the curvature factor at a maturity of 30 months (footnote 6).

The parameters are estimated by maximum likelihood, combining the EM algorithm and the Kalman filter. As shown in Doz, Giannone and Reichlin (2006), maximum likelihood estimation of a dynamic approximate factor model is feasible when the panel of time series is large, in the sense that it guarantees consistency, and it represents a valid alternative to principal components. Moreover, this methodology is particularly suitable in our case since it allows us to impose restrictions on the model.
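As a quick numerical illustration of the Nelson and Siegel loading restrictions, the sketch below builds the loading matrix for the 17 maturities of the data set and locates the maturity at which the curvature loading peaks. The function name `nelson_siegel_loadings` is a hypothetical helper, and the grid search is just one simple way to find the maximum.

```python
import numpy as np

def nelson_siegel_loadings(maturities, lam=0.0609):
    """Columns: level, slope and curvature loadings for maturities in months."""
    tau = np.asarray(maturities, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    level = np.ones_like(tau)
    return np.column_stack([level, slope, curvature])

# The 17 maturities (in months) used in the yield panel
maturities = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
Gamma_yf = nelson_siegel_loadings(maturities)

# Maturity in the sample with the largest curvature loading
peak_in_sample = maturities[int(np.argmax(Gamma_yf[:, 2]))]

# Continuous maximizer of the curvature loading, located on a fine grid
grid = np.linspace(1.0, 120.0, 11901)
tau_star = float(grid[np.argmax(nelson_siegel_loadings(grid)[:, 2])])
```

With $\lambda = 0.0609$ the curvature loading peaks between 29 and 30 months, and among the sample maturities the largest curvature loading is at 30 months, in line with the choice of $\lambda$ described above.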
Assuming $F_1 \sim N(\pi_1, V_1)$ and labeling the time series $(z_1, z_2, \ldots, z_T) = \{z\}$ and $(F_1, F_2, \ldots, F_T) = \{F\}$, and the parameters $\{\Gamma, R, A, Q, \pi_1, V_1\} = \theta$, the log-likelihood is:

\[
\begin{aligned}
L(\{z\}, \{F\}; \theta) = {} & -\frac{1}{2} \sum_{t=1}^{T} \left( z_t - \Gamma F_t \right)' R^{-1} \left( z_t - \Gamma F_t \right) - \frac{T}{2} \log |R| \\
& -\frac{1}{2} \sum_{t=2}^{T} \left[ F_t - A F_{t-1} \right]' Q^{-1} \left[ F_t - A F_{t-1} \right] - \frac{T-1}{2} \log |Q| \\
& -\frac{1}{2} \left[ F_1 - \pi_1 \right]' V_1^{-1} \left[ F_1 - \pi_1 \right] - \frac{1}{2} \log |V_1| - \frac{T(p+k)}{2} \log 2\pi
\end{aligned}
\]

with

\[
\Gamma = \begin{bmatrix} \Gamma_{yf} & 0 \\ \Gamma_{xf} & \Gamma_{xg} \end{bmatrix}.
\]

Footnote 6: Using the ECM algorithm it is also possible to estimate $\lambda$ but, despite the increase in the computational burden, the results remain substantially unchanged.

The EM algorithm alternates between Kalman filter extraction of the factors and maximization of the likelihood. In particular, for given parameters of the model we use the Kalman filter to extract the factors (E step). Then, given the extracted factors, we maximize the Gaussian likelihood function implied by the Kalman filter (M step).

Therefore, in the E step we compute the expected log-likelihood

\[
\mathcal{Q} = E\left[ L(\{z\}, \{F\}; \theta) \mid \{z\} \right]
\]

which depends on three expectations:

\[
\hat{F}_t \equiv E[F_t \mid \{z\}], \qquad P_t \equiv E[F_t F_t' \mid \{z\}], \qquad P_{t,t-1} \equiv E[F_t F_{t-1}' \mid \{z\}].
\]

In the M step we re-estimate the parameters $\theta = \{\Gamma_{yg}, \Gamma_{xf}, \Gamma_{xg}, R, A, Q, \pi_1, V_1\}$ by taking the corresponding partial derivative of the expected log-likelihood, setting it to zero, and solving.

• Output matrix: since we have the restriction on the upper blocks, we derive the first order conditions by blocks. We denote by $f_t$ the yield curve factors and by $g_t$ the macro factors, such that $F_t = \begin{bmatrix} f_t \\ g_t \end{bmatrix}$, and by $y_t$ the yields and $x_t$ the macro variables, such that $z_t = \begin{bmatrix} y_t \\ x_t \end{bmatrix}$.
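- submatrix $[\Gamma_{xf} \;\; \Gamma_{xg}]$:

\[
\frac{\partial \mathcal{Q}}{\partial [\Gamma_{xf} \;\; \Gamma_{xg}]} = E\left[ -\sum_{t=1}^{T} R_{xx}^{-1} \left( x_t - [\Gamma_{xf} \;\; \Gamma_{xg}] F_t \right) F_t' \,\Big|\, \{z\} \right] = 0
\]
\[
[\Gamma_{xf} \;\; \Gamma_{xg}] = \left( \sum_{t=1}^{T} E[x_t F_t' \mid \{z\}] \right) \left( \sum_{t=1}^{T} E[F_t F_t' \mid \{z\}] \right)^{-1}
\]
\[
\Rightarrow \; [\Gamma_{xf}^{new} \;\; \Gamma_{xg}^{new}] = \left( \sum_{t=1}^{T} x_t \hat{F}_t' \right) \left( \sum_{t=1}^{T} P_t \right)^{-1}
\]

• Output noise covariance:

\[
\frac{\partial \mathcal{Q}}{\partial R^{-1}} = E\left[ -\frac{1}{2} \sum_{t=1}^{T} (z_t - \Gamma F_t)(z_t - \Gamma F_t)' + \frac{T}{2} R \,\Big|\, \{z\} \right] = 0
\]
\[
R = \frac{1}{T} \sum_{t=1}^{T} E\left[ z_t z_t' - z_t F_t' \Gamma' - \Gamma F_t z_t' + \Gamma F_t F_t' \Gamma' \mid \{z\} \right]
\]
\[
\Rightarrow \; R^{new} = \frac{1}{T} \sum_{t=1}^{T} \left( z_t z_t' - z_t \hat{F}_t' \Gamma^{new\prime} - \Gamma^{new} \hat{F}_t z_t' + \Gamma^{new} P_t \Gamma^{new\prime} \right)
\]

• State dynamics matrix:

\[
\frac{\partial \mathcal{Q}}{\partial A} = E\left[ -\frac{1}{2} \sum_{t=2}^{T} Q^{-1} (F_t - A F_{t-1}) F_{t-1}' \,\Big|\, \{z\} \right] = 0
\]
\[
A = \left( \sum_{t=2}^{T} E[F_t F_{t-1}' \mid \{z\}] \right) \left( \sum_{t=2}^{T} E[F_{t-1} F_{t-1}' \mid \{z\}] \right)^{-1}
\]
\[
\Rightarrow \; A^{new} = \left( \sum_{t=2}^{T} P_{t,t-1} \right) \left( \sum_{t=2}^{T} P_{t-1} \right)^{-1}
\]

• State noise covariance:

\[
\frac{\partial \mathcal{Q}}{\partial Q^{-1}} = E\left[ \frac{T-1}{2} Q - \frac{1}{2} \sum_{t=2}^{T} (F_t - A F_{t-1})(F_t - A F_{t-1})' \,\Big|\, \{z\} \right] = 0
\]
\[
Q = \frac{1}{T-1} \sum_{t=2}^{T} E\left[ F_t F_t' - F_t F_{t-1}' A' - A F_{t-1} F_t' + A F_{t-1} F_{t-1}' A' \mid \{z\} \right]
\]
\[
Q = \frac{1}{T-1} \sum_{t=2}^{T} E\left[ F_t F_t' - A F_{t-1} F_t' \mid \{z\} \right]
\]
\[
\Rightarrow \; Q^{new} = \frac{1}{T-1} \left( \sum_{t=2}^{T} P_t - A^{new} \sum_{t=2}^{T} P_{t-1,t} \right)
\]

• Initial state mean:

\[
\frac{\partial \mathcal{Q}}{\partial \pi_1} = E\left[ (F_1 - \pi_1)' V_1^{-1} \mid \{z\} \right] = 0
\]
\[
(\hat{F}_1 - \pi_1)' V_1^{-1} = 0 \; \Rightarrow \; \pi_1^{new} = \hat{F}_1
\]

• Initial state covariance:

\[
\frac{\partial \mathcal{Q}}{\partial V_1^{-1}} = E\left[ \frac{1}{2} V_1 - \frac{1}{2} (F_1 - \pi_1)(F_1 - \pi_1)' \,\Big|\, \{z\} \right] = 0
\]
\[
V_1 = E\left[ F_1 F_1' - \pi_1 F_1' - F_1 \pi_1' + \pi_1 \pi_1' \mid \{z\} \right] \; \Rightarrow \; V_1^{new} = P_1 - \hat{F}_1 \hat{F}_1'
\]

Now we go back to the E step and update the expectations. Using $F_{t|\tau}$ to denote $E(F_t \mid \{z\}_{t=1}^{\tau})$ and $V_{t|\tau}$ to denote $Var(F_t \mid \{z\}_{t=1}^{\tau})$, we obtain the following Kalman filter forward recursions:

\[
\begin{aligned}
F_{t+1|t} &= A F_{t|t} \\
V_{t+1|t} &= A V_{t|t} A' + Q \\
K_t &= V_{t|t-1} \Gamma' \left( \Gamma V_{t|t-1} \Gamma' + R \right)^{-1} \\
F_{t|t} &= F_{t|t-1} + K_t \left( z_t - \Gamma F_{t|t-1} \right) \\
V_{t|t} &= V_{t|t-1} - K_t \Gamma V_{t|t-1}
\end{aligned}
\]

where $F_{1|0} = \pi_1$ and $V_{1|0} = V_1$. To compute $\hat{F}_t \equiv F_{t|T}$ and $P_t \equiv V_{t|T} + F_{t|T} F_{t|T}'$ one performs a set of backward recursions using

\[
\begin{aligned}
J_t &= V_{t|t} A' V_{t+1|t}^{-1} \\
F_{t|T} &= F_{t|t} + J_t \left( F_{t+1|T} - A F_{t|t} \right) \\
V_{t|T} &= V_{t|t} + J_t \left( V_{t+1|T} - V_{t+1|t} \right) J_t'
\end{aligned}
\]

Moreover, $P_{t,t-1} \equiv V_{t,t-1|T} + F_{t|T} F_{t-1|T}'$ can be obtained through the backward recursion

\[
V_{t,t-1|T} = V_{t|t} J_{t-1}' + J_t \left( V_{t+1,t|T} - A V_{t|t} \right) J_{t-1}'
\]

which is initialized with $V_{T,T-1|T} = (I - K_T \Gamma) A V_{T-1|T-1}$.

The estimation procedure is initialized using the factors extracted by the two-step OLS procedure introduced by Diebold and Li (2006).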
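The forward and backward recursions above translate almost line by line into code. The following is a minimal sketch of the E step on simulated data, not the estimation code used for the thesis: the function names and the simulated parameter values are assumptions made for illustration, and the lag-one covariance recursion and the M-step updates are left out to keep the example short.

```python
import numpy as np

def kalman_filter(z, Gamma, R, A, Q, pi1, V1):
    """Forward recursions; returns predicted and filtered moments of the factors."""
    T, k = z.shape[0], A.shape[0]
    F_pred = np.zeros((T, k)); V_pred = np.zeros((T, k, k))
    F_filt = np.zeros((T, k)); V_filt = np.zeros((T, k, k))
    F_p, V_p = pi1, V1                          # F_{1|0} and V_{1|0}
    for t in range(T):
        F_pred[t], V_pred[t] = F_p, V_p
        S = Gamma @ V_p @ Gamma.T + R           # innovation covariance
        K = V_p @ Gamma.T @ np.linalg.inv(S)    # Kalman gain K_t
        F_filt[t] = F_p + K @ (z[t] - Gamma @ F_p)
        V_filt[t] = V_p - K @ Gamma @ V_p
        F_p = A @ F_filt[t]                     # F_{t+1|t}
        V_p = A @ V_filt[t] @ A.T + Q           # V_{t+1|t}
    return F_pred, V_pred, F_filt, V_filt

def kalman_smoother(A, F_pred, V_pred, F_filt, V_filt):
    """Backward recursions; returns F_{t|T} and V_{t|T}."""
    T = F_filt.shape[0]
    F_sm, V_sm = F_filt.copy(), V_filt.copy()
    for t in range(T - 2, -1, -1):
        J = V_filt[t] @ A.T @ np.linalg.inv(V_pred[t + 1])
        F_sm[t] = F_filt[t] + J @ (F_sm[t + 1] - A @ F_filt[t])
        V_sm[t] = V_filt[t] + J @ (V_sm[t + 1] - V_pred[t + 1]) @ J.T
    return F_sm, V_sm

# Simulated state-space data (all parameter values are made up)
rng = np.random.default_rng(0)
T, n, k = 200, 6, 2
A = np.array([[0.9, 0.0], [0.1, 0.7]])
Q = 0.1 * np.eye(k)
Gamma = rng.normal(size=(n, k))
R = 0.5 * np.eye(n)
F = np.zeros((T, k)); z = np.zeros((T, n))
for t in range(T):
    prev = F[t - 1] if t > 0 else np.zeros(k)
    F[t] = A @ prev + rng.multivariate_normal(np.zeros(k), Q)
    z[t] = Gamma @ F[t] + rng.multivariate_normal(np.zeros(n), R)

F_pred, V_pred, F_filt, V_filt = kalman_filter(z, Gamma, R, A, Q, np.zeros(k), np.eye(k))
F_sm, V_sm = kalman_smoother(A, F_pred, V_pred, F_filt, V_filt)
P = V_sm + np.einsum("ti,tj->tij", F_sm, F_sm)  # P_t = V_{t|T} + F_{t|T} F_{t|T}'
```

The arrays `F_sm` and `P` correspond to $\hat{F}_t$ and $P_t$, the first two expectations that the M-step updates take as inputs.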
The initial factors are centered around their means and are standardized with the average of the yield standard deviations. The means of these factors multiplied by $\Gamma_{yf}$ are used to center the yields, which is equivalent to centering the yields with the mean of the means of the yields; the yields are then standardized by the mean of their standard deviations. The macroeconomic data are centered around their own means and standardized by their standard deviations.

Table 2.1: Summary statistics of the US zero-coupon data

τ     mean   std dev  min    max     ρ(1)    ρ(2)    ρ(3)    ρ(12)
3     6.75   2.66     2.73   16.02   0.97*   0.94*   0.91*   0.71*
6     6.98   2.66     2.89   16.48   0.97*   0.94*   0.91*   0.73*
9     7.10   2.64     2.98   16.39   0.97*   0.94*   0.91*   0.73*
12    7.20   2.57     3.11   15.82   0.97*   0.94*   0.91*   0.74*
15    7.31   2.52     3.29   16.04   0.97*   0.94*   0.91*   0.75*
18    7.38   2.50     3.48   16.23   0.98*   0.94*   0.92*   0.75*
21    7.44   2.49     3.64   16.18   0.98*   0.95*   0.92*   0.76*
24    7.46   2.44     3.78   15.65   0.98*   0.94*   0.92*   0.75*
30    7.55   2.36     4.04   15.40   0.98*   0.95*   0.92*   0.76*
36    7.63   2.34     4.20   15.77   0.98*   0.95*   0.93*   0.77*
48    7.77   2.28     4.31   15.82   0.98*   0.95*   0.93*   0.78*
60    7.84   2.25     4.35   15.01   0.98*   0.96*   0.94*   0.79*
72    7.96   2.22     4.38   14.98   0.98*   0.96*   0.94*   0.80*
84    7.99   2.18     4.35   14.98   0.98*   0.96*   0.94*   0.78*
96    8.05   2.17     4.43   14.94   0.98*   0.96*   0.95*   0.81*
108   8.08   2.18     4.43   15.02   0.98*   0.96*   0.95*   0.81*
120   8.05   2.14     4.44   14.93   0.98*   0.96*   0.94*   0.78*

Descriptive statistics of monthly yields at different maturities τ (in months) for the sample from January 1970 to December 2000. ρ(p) refers to the sample autocorrelation of the series at lag p and * denotes significance at the 95 percent confidence level. Confidence intervals are computed according to Box and Jenkins (1976).
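For readers who want to reproduce statistics of the kind reported in Table 2.1, the sample autocorrelation at lag p can be computed as sketched below. The significance check uses the simple white-noise band of ±1.96/√T, which is an assumption on our part: the table notes only state that the confidence intervals follow Box and Jenkins (1976), without giving the exact formula. The function names are hypothetical.

```python
import numpy as np

def sample_autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d ** 2))

def is_significant(rho, n_obs, z_crit=1.96):
    """Approximate 95% check against the white-noise band +/- z_crit / sqrt(n_obs)."""
    return bool(abs(rho) > z_crit / np.sqrt(n_obs))

# Small deterministic check: deviations are [-2, -1, 0, 1, 2]
rho_toy = sample_autocorrelation([1, 2, 3, 4, 5], lag=1)

# With 372 monthly observations the band is roughly +/- 0.10
band = 1.96 / np.sqrt(372)
```

With 372 observations, autocorrelations as large as those in the table (0.71 and above) lie far outside this approximate band.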
Table 2.2: Macroeconomic series

No. | Code | Description | Transf.
1 | a0m052 | Personal income (AR, bil. chain 2000 $) | 4
2 | A0M051 | Personal income less transfer payments (AR, bil. chain 2000 $) | 4
3 | A0M224_R | Real Consumption (AC) A0m224/gmdc | 4
4 | A0M057 | Manufacturing and trade sales (mil. Chain 1996 $) | 4
5 | A0M059 | Sales of retail stores (mil. Chain 2000 $) | 4
6 | IPS10 | INDUSTRIAL PRODUCTION INDEX - TOTAL INDEX | 4
7 | IPS11 | INDUSTRIAL PRODUCTION INDEX - PRODUCTS, TOTAL | 4
8 | IPS299 | INDUSTRIAL PRODUCTION INDEX - FINAL PRODUCTS | 4
9 | IPS12 | INDUSTRIAL PRODUCTION INDEX - CONSUMER GOODS | 4
10 | IPS13 | INDUSTRIAL PRODUCTION INDEX - DURABLE CONSUMER GOODS | 4
11 | IPS18 | INDUSTRIAL PRODUCTION INDEX - NONDURABLE CONSUMER GOODS | 4
12 | IPS25 | INDUSTRIAL PRODUCTION INDEX - BUSINESS EQUIPMENT | 4
13 | IPS32 | INDUSTRIAL PRODUCTION INDEX - MATERIALS | 4
14 | IPS34 | INDUSTRIAL PRODUCTION INDEX - DURABLE GOODS MATERIALS | 4
15 | IPS38 | INDUSTRIAL PRODUCTION INDEX - NONDURABLE GOODS MATERIALS | 4
16 | IPS43 | INDUSTRIAL PRODUCTION INDEX - MANUFACTURING (SIC) | 4
17 | IPS307 | INDUSTRIAL PRODUCTION INDEX - RESIDENTIAL UTILITIES | 4
18 | IPS306 | INDUSTRIAL PRODUCTION INDEX - FUELS | 4
19 | PMP | NAPM PRODUCTION INDEX (PERCENT) | 1
20 | A0m082 | Capacity Utilization (Mfg) | 2
21 | LHEL | INDEX OF HELP-WANTED ADVERTISING IN NEWSPAPERS (1967=100;SA) | 2
22 | LHELX | EMPLOYMENT: RATIO; HELP-WANTED ADS:NO. UNEMPLOYED CLF | 2
23 | LHEM | CIVILIAN LABOR FORCE: EMPLOYED, TOTAL (THOUS.,SA) | 4
24 | LHNAG | CIVILIAN LABOR FORCE: EMPLOYED, NONAGRIC.INDUSTRIES (THOUS.,SA) | 4
25 | LHUR | UNEMPLOYMENT RATE: ALL WORKERS, 16 YEARS & OVER (%,SA) | 2
26 | LHU680 | UNEMPLOY.BY DURATION: AVERAGE(MEAN)DURATION IN WEEKS (SA) | 2
27 | LHU5 | UNEMPLOY.BY DURATION: PERSONS UNEMPL.LESS THAN 5 WKS (THOUS.,SA) | 4
28 | LHU14 | UNEMPLOY.BY DURATION: PERSONS UNEMPL.5 TO 14 WKS (THOUS.,SA) | 4
29 | LHU15 | UNEMPLOY.BY DURATION: PERSONS UNEMPL.15 WKS + (THOUS.,SA) | 4
30 | LHU26 | UNEMPLOY.BY DURATION: PERSONS UNEMPL.15 TO 26 WKS (THOUS.,SA) | 4
31 | LHU27 | UNEMPLOY.BY DURATION: PERSONS UNEMPL.27 WKS + (THOUS,SA) | 4
32 | A0M005 | Average weekly initial claims, unemploy. insurance (thous.) | 4
33 | CES002 | EMPLOYEES ON NONFARM PAYROLLS - TOTAL PRIVATE | 4
34 | CES003 | EMPLOYEES ON NONFARM PAYROLLS - GOODS-PRODUCING | 4
35 | CES006 | EMPLOYEES ON NONFARM PAYROLLS - MINING | 4
36 | CES011 | EMPLOYEES ON NONFARM PAYROLLS - CONSTRUCTION | 4
37 | CES015 | EMPLOYEES ON NONFARM PAYROLLS - MANUFACTURING | 4
38 | CES017 | EMPLOYEES ON NONFARM PAYROLLS - DURABLE GOODS | 4
39 | CES033 | EMPLOYEES ON NONFARM PAYROLLS - NONDURABLE GOODS | 4
40 | CES046 | EMPLOYEES ON NONFARM PAYROLLS - SERVICE-PROVIDING | 4
41 | CES048 | EMPLOYEES ON NONFARM PAYROLLS - TRADE, TRANSPORTATION, AND UTILITIES | 4
42 | CES049 | EMPLOYEES ON NONFARM PAYROLLS - WHOLESALE TRADE | 4
43 | CES053 | EMPLOYEES ON NONFARM PAYROLLS - RETAIL TRADE | 4
44 | CES088 | EMPLOYEES ON NONFARM PAYROLLS - FINANCIAL ACTIVITIES | 4
45 | CES140 | EMPLOYEES ON NONFARM PAYROLLS - GOVERNMENT | 4
46 | A0M048 | Employee hours in nonag. establishments (AR, bil. hours) | 4
47 | CES151 | AV. WEEKLY HRS OF PROD OR NONSUP WORKERS ON PRIV NONFAR - GOODS PROD | 1
48 | CES155 | AV. WEEKLY HRS OF PROD OR NONSUP WORKERS ON PRIV NONFAR - MFG OVERTIME | 2
49 | aom001 | Average weekly hours, mfg. (hours) | 1
50 | PMEMP | NAPM EMPLOYMENT INDEX (PERCENT) | 1
51 | HSFR | HOUSING STARTS:NONFARM(1947-58);TOTAL FARM&NONFARM(1959-)(THOUS.,SA | 3
52 | HSNE | HOUSING STARTS:NORTHEAST (THOUS.U.)S.A. | 3
53 | HSMW | HOUSING STARTS:MIDWEST(THOUS.U.)S.A. | 3
54 | HSSOU | HOUSING STARTS:SOUTH (THOUS.U.)S.A. | 3
55 | HSWST | HOUSING STARTS:WEST (THOUS.U.)S.A. | 3
56 | HSBR | HOUSING AUTHORIZED: TOTAL NEW PRIV HOUSING UNITS (THOUS.,SAAR) | 3
57 | HSBNE | HOUSES AUTHORIZED BY BUILD. PERMITS:NORTHEAST(THOU.U.)S.A | 3
58 | HSBMW | HOUSES AUTHORIZED BY BUILD. PERMITS:MIDWEST(THOU.U.)S.A. | 3
59 | HSBSOU | HOUSES AUTHORIZED BY BUILD. PERMITS:SOUTH(THOU.U.)S.A. | 3
60 | HSBWST | HOUSES AUTHORIZED BY BUILD. PERMITS:WEST(THOU.U.)S.A. | 3
61 | PMI | PURCHASING MANAGERS' INDEX (SA) | 1
62 | PMNO | NAPM NEW ORDERS INDEX (PERCENT) | 1
63 | PMDEL | NAPM VENDOR DELIVERIES INDEX (PERCENT) | 1
64 | PMNV | NAPM INVENTORIES INDEX (PERCENT) | 1
65 | A0M008 | Mfrs' new orders, consumer goods and materials (bil. chain 1982 $) | 4
66 | A0M007 | Mfrs' new orders, durable goods industries (bil. chain 2000 $) | 4
67 | A0M027 | Mfrs' new orders, nondefense capital goods (mil. chain 1982 $) | 4
68 | A1M092 | Mfrs' unfilled orders, durable goods indus. (bil. chain 2000 $) | 4
69 | A0M070 | Manufacturing and trade inventories (bil. chain 2000 $) | 4
70 | A0M077 | Ratio, mfg. and trade inventories to sales (based on chain 2000 $) | 2
71 | FM1 | MONEY STOCK: M1(CURR,TRAV.CKS,DEM DEP,OTHER CK'ABLE DEP)(BIL$,SA) | 5
72 | FM2 | MONEY STOCK:M2(M1+O'NITE RPS,EURO$,G/P&B/D MMMFS&SAV&SM TIME DEP(BIL$, | 5
73 | FM3 | MONEY STOCK: M3(M2+LG TIME DEP,TERM RP'S&INST ONLY MMMFS)(BIL$,SA) | 5
74 | FM2DQ | MONEY SUPPLY - M2 IN 1996 DOLLARS (BCI) | 4
75 | FMFBA | MONETARY BASE, ADJ FOR RESERVE REQUIREMENT CHANGES(MIL$,SA) | 5
76 | FMRRA | DEPOSITORY INST RESERVES:TOTAL,ADJ FOR RESERVE REQ CHGS(MIL$,SA) | 5
77 | FMRNBA | DEPOSITORY INST RESERVES:NONBORROWED,ADJ RES REQ CHGS(MIL$,SA) | 5
78 | FCLNQ | COMMERCIAL & INDUSTRIAL LOANS OUTSTANDING IN 1996 DOLLARS (BCI) | 5
79 | FCLBMC | WKLY RP LG COM'L BANKS:NET CHANGE COM'L & INDUS LOANS(BIL$,SAAR) | 1
80 | CCINRV | CONSUMER CREDIT OUTSTANDING - NONREVOLVING(G19) | 5
81 | A0M095 | Ratio, consumer installment credit to personal income (pct.) | 2
82 | FSPCOM | S&P'S COMMON STOCK PRICE INDEX: COMPOSITE (1941-43=10) | 4
83 | FSPIN | S&P'S COMMON STOCK PRICE INDEX: INDUSTRIALS (1941-43=10) | 4
84 | FSDXP | S&P'S COMPOSITE COMMON STOCK: DIVIDEND YIELD (% PER ANNUM) | 2
85 | FSPXE | S&P'S COMPOSITE COMMON STOCK: PRICE-EARNINGS RATIO (%,NSA) | 4
86 | FYFF | INTEREST RATE: FEDERAL FUNDS (EFFECTIVE) (% PER ANNUM,NSA) | 1
87 | CP90 | Commercial Paper Rate (AC) | 2
88 | FYAAAC | BOND YIELD: MOODY'S AAA CORPORATE (% PER ANNUM) | 2
89 | FYBAAC | BOND YIELD: MOODY'S BAA CORPORATE (% PER ANNUM) | 2
90 | EXRUS | UNITED STATES;EFFECTIVE EXCHANGE RATE(MERM)(INDEX NO.) | 4
91 | EXRSW | FOREIGN EXCHANGE RATE: SWITZERLAND (SWISS FRANC PER U.S.$) | 4
92 | EXRJAN | FOREIGN EXCHANGE RATE: JAPAN (YEN PER U.S.$) | 4
93 | EXRUK | FOREIGN EXCHANGE RATE: UNITED KINGDOM (CENTS PER POUND) | 4
94 | EXRCAN | FOREIGN EXCHANGE RATE: CANADA (CANADIAN $ PER U.S.$) | 4
95 | PWFSA | PRODUCER PRICE INDEX: FINISHED GOODS (82=100,SA) | 5
96 | PWFCSA | PRODUCER PRICE INDEX:FINISHED CONSUMER GOODS (82=100,SA) | 5
97 | PWIMSA | PRODUCER PRICE INDEX:INTERMED MAT.SUPPLIES & COMPONENTS(82=100,SA) | 5
98 | PWCMSA | PRODUCER PRICE INDEX:CRUDE MATERIALS (82=100,SA) | 5
99 | PSM99Q | INDEX OF SENSITIVE MATERIALS PRICES (1990=100)(BCI-99A) | 5
100 | PMCP | NAPM COMMODITY PRICES INDEX (PERCENT) | 1
101 | PUNEW | CPI-U: ALL ITEMS (82-84=100,SA) | 5
102 | PU83 | CPI-U: APPAREL & UPKEEP (82-84=100,SA) | 5
103 | PU84 | CPI-U: TRANSPORTATION (82-84=100,SA) | 5
104 | PU85 | CPI-U: MEDICAL CARE (82-84=100,SA) | 5
105 | PUC | CPI-U: COMMODITIES (82-84=100,SA) | 5
106 | PUCD | CPI-U: DURABLES (82-84=100,SA) | 5
107 | PUS | CPI-U: SERVICES (82-84=100,SA) | 5
108 | PUXF | CPI-U: ALL ITEMS LESS FOOD (82-84=100,SA) | 5
109 | PUXHS | CPI-U: ALL ITEMS LESS SHELTER (82-84=100,SA) | 5
110 | PUXM | CPI-U: ALL ITEMS LESS MEDICAL CARE (82-84=100,SA) | 5
111 | GMDC | PCE,IMPL PR DEFL:PCE (1987=100) | 5
112 | GMDCD | PCE,IMPL PR DEFL:PCE; DURABLES (1987=100) | 5
113 | GMDCN | PCE,IMPL PR DEFL:PCE; NONDURABLES (1996=100) | 5
114 | GMDCS | PCE,IMPL PR DEFL:PCE; SERVICES (1987=100) | 5
115 | CES275 | AV. HOURLY EARNINGS OF PROD OR NONSUP WORKERS ON PRIV NO - GOODS PROD | 5
116 | CES277 | AV. HOURLY EARNINGS OF PROD OR NONSUP WORKERS ON PRIV NO - CONSTRUCTION | 5
117 | CES278 | AV. HOURLY EARNINGS OF PROD OR NONSUP WORKERS ON PRIV NO - MANUFACTURING | 5
118 | HHSNTN | U. OF MICH. INDEX OF CONSUMER EXPECTATIONS(BCD-83) | 2

The table lists the macro series included in the macroeconomic dataset.
The first column counts the position in the dataset, the second reports the code of the series, the third shows the name of the variable. The last column reports the transformation applied to the original series. These transformations are coded as: 1 = no transformation (levels are used), 2 = monthly differences, 3 = logarithm of the level, 4 = monthly first differences of the log levels (in percentage), 5 = annual first differences of the log levels (in percentage). The sample period is January 1970 - December 2000 (372 observations).

Table 2.3: Model selection

r                  3       4       5       6
RR(r, F̂^r)        90.12   69.40   64.69   57.14
IC*(r)             0.06    -0.05   0.03    0.06

Notes: RR(r, F̂^r) is the sum of the variances of the idiosyncratic components and IC*(r) is the modified Bai and Ng information criterion presented in equation (2.4.2). Both measures are computed for different specifications of the macro-yields model (i.e. with the 3 Nelson and Siegel factors alone and plus 1, 2 or 3 unidentified factors).

Table 2.4: Summary statistics of the estimated factors

        L^MY    L^NS    S^MY    S^NS    C^MY    C^NS    U^MY
Mean    8.26    8.26    -1.58   -1.58   0.19    0.19    0.00
Var     4.29    4.32    3.75    3.67    1.84    3.27    1.06
ρ(1)    0.98    0.98    0.96    0.94    0.95    0.79    0.92
ρ(2)    0.96    0.97    0.90    0.87    0.90    0.64    0.83
Min     4.20    4.43    -5.85   -5.62   -3.21   -5.25   -4.44
Max     14.28   14.15   5.36    5.32    3.71    7.62    2.37

Notes: summary statistics of the estimated factors of the macro-yields model. L^MY denotes the level factor of the macro-yields model, S^MY is the slope of the macro-yields model, C^MY the curvature and U^MY is the unidentified factor. The table also reports the corresponding summary statistics for the Nelson and Siegel factors L^NS, S^NS and C^NS.

Table 2.5: Goodness of fit

maturities   OY      BMY     LMY     MY
3            0.063   0.051   0.046   0.266
12           0.010   0.010   0.010   0.017
36           0.006   0.006   0.006   0.011
60           0.008   0.008   0.008   0.007
120          0.024   0.024   0.024   0.039

Notes: MSE of the only yields model (OY), basic macro-yields model (BMY), the large macro-yields (LMY) and macro-yields model (MY) on the sample 1970:1 to 2000:12.
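The model-selection rule behind Table 2.3 can be illustrated with a short sketch of the modified criterion in equation (2.4.2). The function names are hypothetical, and the residual variances `V_by_r` below are invented numbers chosen only to show the mechanics; they are not the values underlying Table 2.3, whose normalization is not reported in this chapter.

```python
import numpy as np

def penalty_rate_pc(N, T):
    """Rate used by Bai and Ng (2002) for principal components: min{N, T}."""
    return float(min(N, T))

def penalty_rate_qml(N, T):
    """Rate for the quasi maximum likelihood estimator: min{sqrt(T), N / log N}."""
    return float(min(np.sqrt(T), N / np.log(N)))

def information_criterion(V, r, rate):
    """log V(r) plus r times the penalty log(rate) / rate."""
    return float(np.log(V) + r * np.log(rate) / rate)

def select_number_of_factors(V_by_r, N, T):
    """Return the r minimizing the modified criterion, and all criterion values."""
    rate = penalty_rate_qml(N, T)
    ic = {r: information_criterion(V, r, rate) for r, V in V_by_r.items()}
    return min(ic, key=ic.get), ic

# Panel dimensions from the text: 17 yields + 118 macro series, 372 months
N, T = 17 + 118, 372

# Hypothetical residual variances V(r) for r = 3, ..., 6 (illustration only)
V_by_r = {3: 0.60, 4: 0.45, 5: 0.42, 6: 0.40}
best_r, ic_values = select_number_of_factors(V_by_r, N, T)
```

For these panel dimensions the binding rate is √T (about 19.3), since N / log N is about 27.5, so the penalty per additional factor is log(19.3)/19.3, roughly 0.15.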
Table 2.6: Out-of-sample performance

1-month ahead
maturities   OY     BMY    LMY    MY
3            1.00   1.28   1.15   4.33
12           1.22   1.30   1.17   1.24
36           1.26   1.20   1.11   0.98
60           1.17   1.01   1.05   0.97
120          1.10   1.07   1.03   1.68

6-months ahead
maturities   OY     BMY    LMY    MY
3            1.05   1.31   1.06   0.96
12           1.22   1.40   1.25   0.99
36           1.10   1.13   1.12   0.87
60           1.01   1.01   1.02   0.82
120          0.93   0.90   0.94   0.86

12-months ahead
maturities   OY     BMY    LMY    MY
3            1.02   1.01   1.05   0.69
12           1.04   1.04   1.13   0.80
36           0.95   0.91   1.02   0.74
60           0.87   0.82   0.92   0.69
120          0.77   0.73   0.82   0.65

Notes: ratios of the MSFEs of the only yields model (OY), basic macro-yields model (BMY), large macro-yields model (LMY) and macro-yields model (MY) to the MSFE of the random walk, evaluated on the sample 1985:1 to 2000:12.

Figure 2.1: Yield data

U.S. zero-coupon yield curve data at monthly frequency from 1970:1 to 2000:12 at maturities 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108 and 120 months. The grey-shaded areas indicate the recessions as defined by the NBER.

Figure 2.2: Macro-yields model in sample fit: yields

[Four panels: 3m, 12m, 60m and 120m, each showing the series true, fit3 and fit4.]

The figure displays the observed yields, for selected maturities, in blue and the corresponding in-sample fit of the macro-yields model. The green line refers to the macro-yields model with only three factors, identified as the level, slope and curvature. The red line refers to the macro-yields model with four factors: the level, slope and curvature plus one unidentified factor.
The first plot refers to the yields with maturity 3 months, the second to the yields with maturity 12 months, the third to the yields with maturity 60 months and the last one to the yields with maturity 120 months.

Figure 2.3: Macro-yields model in sample fit: key macro variables/1

[Four panels: PI, IP, CU and UR, each showing the series true, fit3 and fit4.]

The figure displays some observed key macroeconomic variables in blue and the corresponding in-sample fit of the macro-yields model. The green line refers to the macro-yields model with only three factors, identified as the level, slope and curvature. The red line refers to the macro-yields model with four factors: the level, slope and curvature plus one unidentified factor. The first plot refers to personal income PI (the first variable in Table 2.2), the second to industrial production IP (variable number 6 in Table 2.2), the third to capacity utilization CU (variable number 20 in Table 2.2) and the last one to the unemployment rate UR (variable number 25 in Table 2.2).

Figure 2.4: Macro-yields model in sample fit: key macro variables/2

[Four panels: EMP, PPI, CPI and PCE, each showing the series true, fit3 and fit4.]

The figure displays some observed key macroeconomic variables in blue and the corresponding in-sample fit of the macro-yields model. The green line refers to the macro-yields model with only three factors, identified as the level, slope and curvature. The red line refers to the macro-yields model with four factors: the level, slope and curvature plus one unidentified factor.
The first plot refers to employment EMP (variable number 33 in Table 2.2), the second to the producer price index PPI (variable number 95 in Table 2.2), the third to the consumer price index CPI (variable number 101 in Table 2.2) and the last one to the personal consumption expenditure implicit price deflator PCE (variable number 111 in Table 2.2).

Figure 2.5: Estimated macro-yields factors

[Four panels: level, slope, curvature and unidentified factor; the first three panels show the series MY and NS.]

Estimated factors of the macro-yields model. The first plot displays the estimated level of the macro-yields model (MY) in blue and the corresponding Nelson and Siegel factor (NS) in green. The second plot refers to the slope, the third to the curvature and the last one to the unidentified factor. The grey-shaded areas indicate the recessions.

Figure 2.6: Smoothed square forecast errors

[Two blocks of four panels, 6 months ahead and 12 months ahead, for maturities 3, 12, 60 and 120 months.]

Notes: 30-month moving average of the square forecast errors for the OY (only yields), BMY (basic macro-yields), LMY (large macro-yields) and MY (macro-yields) models for yields with maturity 3, 12, 60 and 120 months. The MSFE is shown for the out-of-sample period 1985:1-2000:12 for a 6-month horizon in the top panel and a 12-month horizon in the bottom panel. The shaded area indicates the recession between July 1990 and March 1991.
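The smoothing used for Figures 2.6-2.7 amounts to a moving average of the forecast errors and of their squares. The sketch below is a minimal version under two assumptions that the text does not spell out: the window covers the 30 most recent observations (rather than being centered), and the forecast error is defined as forecast minus realization, consistent with equation (2.6.3). The function names are hypothetical.

```python
import numpy as np

def moving_average(x, window=30):
    """Moving average over `window` consecutive observations (full windows only)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def smoothed_forecast_errors(forecasts, realized, window=30):
    """Return (smoothed squared errors, smoothed errors) for one model and maturity."""
    errors = np.asarray(forecasts, dtype=float) - np.asarray(realized, dtype=float)
    return moving_average(errors ** 2, window), moving_average(errors, window)

# Tiny check of the averaging itself
ma_toy = moving_average([1.0, 2.0, 3.0, 4.0], window=2)

# A model that always overestimates by 0.5 has a constant smoothed error of 0.5
realized_toy = np.linspace(5.0, 8.0, 40)
sq_toy, err_toy = smoothed_forecast_errors(realized_toy + 0.5, realized_toy)
```

A positive smoothed error indicates that the model overestimates the yields over the window, which is how the overestimation discussed for Figure 2.7 would show up.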
Figure 2.7: Smoothed forecast errors
[Two blocks of four panels each, 6 months ahead (top) and 12 months ahead (bottom); panels: 3 months, 12 months, 60 months, 120 months; legend: OY, BMY, LMY, ML, RW; time axis labelled 90 to 97.]

Notes: 30-month moving average of forecast errors for the OY (only yields), BMY (basic macro-yields), LMY (large macro-yields) and MY (macro-yields) models for yields with maturity 3, 12, 60 and 120 months. The MSFE is shown for the out-of-sample period 1985:1-2000:12 for a 6-month horizon in the top panel and a 12-month horizon in the bottom panel. The shaded area indicates the recession between July 1990 and March 1991.

Chapter 3

How Arbitrage-Free is the Nelson-Siegel Model?

ABSTRACT: This paper tests whether the Nelson and Siegel (1987) yield curve model is arbitrage-free in a statistical sense. Theoretically, the Nelson-Siegel model does not ensure the absence of arbitrage opportunities, as shown by Bjork and Christensen (1999). Still, central banks and public wealth managers rely heavily on it. Using a non-parametric resampling technique and zero-coupon yield curve data from the US market, we find that the no-arbitrage parameters are not statistically different from those obtained from the NS model, at a 95 percent confidence level. We therefore conclude that the Nelson and Siegel yield curve model is compatible with arbitrage-freeness.

Keywords: Nelson-Siegel model; no-arbitrage restrictions; affine term structure models; non-parametric test.

JEL classification: C14, C15, G12.
This chapter is adapted from the paper "How arbitrage free is the Nelson and Siegel model?" written with Ken Nyholm (ECB) and Rositsa Vidova-Koleva (ECB and Universitat Autonoma de Barcelona), ECB Working Paper 2008, No 874.

3.1 Introduction

Fixed-income wealth managers in public organizations, investment banks and central banks rely heavily on Nelson and Siegel (1987) type models to fit and forecast yield curves. According to BIS (2005), the central banks of Belgium, Finland, France, Germany, Italy, Norway, Spain, and Switzerland use these models for estimating zero-coupon yield curves. The European Central Bank (ECB) publishes daily Eurosystem-wide yield curves on the basis of the Soderlind and Svensson (1997) model, which is an extension of the Nelson-Siegel model.[1] In its foreign reserve management framework the ECB uses a regime-switching extension of the Nelson-Siegel model, see Bernadell, Coche and Nyholm (2005).

There are at least four reasons for the popularity of the Nelson-Siegel model. First, it is easy to estimate. In fact, if the so-called time-decay parameter is fixed, then Nelson-Siegel curves are obtained by linear regression techniques. If this parameter is not fixed, one has to resort to non-linear regression techniques. In addition, the Nelson-Siegel model can be adapted to a time-series context, as shown by Diebold and Li (2006). In this case the Nelson-Siegel yield-curve model can be seen as the observation equation in a state-space model, and the dynamic evolution of yield curve factors constitutes the transition equation. Because the model is then in state-space form, estimation can be carried out via the Kalman filter. Second, by construction, the model provides yields for all maturities, i.e. also maturities that are not covered by the data sample.
As such it lends itself as an interpolation and extrapolation tool for the analyst who often is interested in yields at maturities that are not directly observable.[2] Third, estimated yield curve factors obtained from the Nelson and Siegel model have intuitive interpretations, as level, slope (the difference between the long and the short end of the yield curve), and curvature of the yield curve. This interpretation is akin to that obtained by a principal component analysis (see, e.g., Litterman and Scheinkman (1991) and Diebold and Li (2006)). Due to the intuitive appeal of the Nelson-Siegel parameters, estimates and conclusions drawn on the basis of the model are easy to communicate. Fourth, empirically the Nelson-Siegel model fits data well and performs well in out-of-sample forecasting exercises, as shown by e.g. Diebold and Li (2006) and De Pooter, Ravazzolo and van Dijk (2007).

Despite its empirical merits and widespread use in the finance community, two theoretical concerns can be raised against the Nelson-Siegel model. First, it is not theoretically arbitrage-free, as shown by Bjork and Christensen (1999). Second, as demonstrated by Diebold, Ji and Li (2004), it falls outside the class of affine yield curve models defined by Duffie and Kan (1996) and Dai and Singleton (2000).

The Nelson-Siegel yield curve model operates at the level of yields, as they are observed, i.e. under the so-called empirical measure. In contrast, affine arbitrage-free yield curve models specify the dynamic evolution of yields under a risk-neutral measure and then map this dynamic evolution back to the physical measure via a functional form for the market price of risk.

[1] For Eurosystem-wide yield curves see http://www.ecb.int/stats/money/yc/html/index.en.html.
[2] This is relevant e.g. in a situation where fixed-income returns are calculated to take into account the roll-down/maturity shortening effect.
The advantage of the no-arbitrage approach is that it automatically ensures a certain consistency between the parameters that describe the dynamic evolution of the yield curve factors under the risk-neutral measure, and the translation of yield curve factors into yields under the physical measure. An arbitrage-free setup will, by construction, ensure internal consistency as it cross-sectionally restricts, in an appropriate manner, the estimated parameters of the model. It is this consistency that guarantees arbitrage-freeness. Since a similar consistency is not hard-coded into the Nelson-Siegel model, this model is not necessarily arbitrage-free.[3]

The main contribution of the current paper is to conduct a statistical test for the equality between the factor loadings of the Nelson-Siegel model and the implied arbitrage-free loadings. In the context of a Monte Carlo study, the Nelson and Siegel factors are estimated and used as exogenous factors in an essentially-affine term structure model to estimate the implied arbitrage-free factor loadings. The no-arbitrage model with time-varying term premia is estimated using the two-step approach of Ang, Piazzesi and Wei (2006), while we use the re-parametrization suggested by Diebold and Li (2006) as our specification of the Nelson-Siegel model.

In a recent study Christensen, Diebold and Rudebusch (2007) reconcile the Nelson and Siegel modelling setup with the absence of arbitrage by deriving a class of dynamic Nelson-Siegel models that fulfill the no-arbitrage constraints. They maintain the original Nelson-Siegel factor-loading structure and derive mathematically a correction term that, when added to the dynamic Nelson-Siegel model, ensures the fulfillment of the no-arbitrage constraints.

[3] An illustrative example of this issue for a two-factor Nelson-Siegel model is presented by Diebold, Piazzesi and Rudebusch (2005).
The correction term is shown to impact mainly very long maturities, in particular maturities above the ten-year segment.

While being different in setup and analysis method, our paper confirms the findings of Christensen et al. (2007). In particular, we find that the Nelson-Siegel model is not significantly different from a three-factor no-arbitrage model when it is applied to US zero-coupon yield-curve data. In addition, we outline a general method for empirically testing for the fulfillment of the no-arbitrage constraints in yield curve models that are not necessarily arbitrage-free. Our results furthermore indicate that non-compliance with the no-arbitrage constraints is most likely to stem from "mis-specification" in the Nelson-Siegel factor loading structure pertaining to the third factor, i.e. the one often referred to as the curvature factor.

Our test is conducted on U.S. Treasury zero-coupon yield data covering the period from January 1970 to December 2000 and spanning 18 maturities from 1 month to 10 years. We rely on a non-parametric resampling procedure to generate multiple realizations of the original data. Our approach to regenerating yield curve samples can be seen as a simplified version of the yield-curve bootstrapping approach suggested by Rebonato, Mahal, Joshi, Bucholz and Nyholm (2005).

In summary, we (1) generate a realization from the original yield curve data using a block-bootstrapping technique; (2) estimate the Nelson-Siegel model on the regenerated yield curve sample; (3) use the obtained Nelson-Siegel yield curve factors as input for the essentially affine no-arbitrage model; (4) estimate the implied no-arbitrage yield curve factor loadings on the regenerated data sample. Steps (1) to (4) are repeated 1000 times in order to obtain bootstrapped distributions for the no-arbitrage parameters. These distributions are then used to test whether the implied no-arbitrage factor loadings are significantly different from the Nelson-Siegel loadings.
Our results show that the Nelson-Siegel factor loadings are not statistically different from the implied no-arbitrage factor loadings at a 95 percent level of confidence. In an out-of-sample forecasting experiment, we show that the performance of the Nelson-Siegel model is as good as that of its no-arbitrage counterpart. We therefore conclude that the Nelson and Siegel model is compatible with arbitrage-freeness at this level of confidence.

3.2 Modeling framework

Term-structure factor models describe the relationship between observed yields, yield curve factors and loadings as given by

$$ y_t = a + b X_t + \epsilon_t, \tag{3.2.1} $$

where $y_t$ denotes a vector of yields observed at time $t$ for $N$ different maturities; $y_t$ is then of dimension $(N \times 1)$. $X_t$ denotes a $(K \times 1)$ vector of yield curve factors, where $K$ counts the number of factors included in the model. The variable $a$ is an $(N \times 1)$ vector of constants, $b$ is of dimension $(N \times K)$ and contains the yield curve factor loadings. $\epsilon_t$ is a zero-mean $(N \times 1)$ vector of measurement errors.

The reason for the popularity of factor models in the area of yield curve modeling is the empirical observation that yields at different maturities generally are highly correlated. So, when the yield for one maturity changes, it is very likely that yields at other maturities also change. As a consequence, a parsimonious representation of the yield curve can be obtained by modeling fewer factors than observed maturities.

This empirical feature of yields was first exploited in the continuous-time one-factor models, where, in terms of equation (3.2.1), $X_t = r_t$, $r_t$ being the short rate, see e.g. Merton (1973), Vasicek (1977), Cox, Ingersoll and Ross (1985), Black, Derman and Toy (1990), and Black and Karasinski (1993).[4] A richer structure for the dynamic evolution of yield curves can be obtained by adding more yield curve factors to the model.
Accordingly, $X_t$ becomes a column vector with a dimension equal to the number of included factors.[5] The multifactor representation of the yield curve is also supported empirically by principal component analysis, see e.g. Litterman and Scheinkman (1991).

Multifactor yield curve models can be specified in different ways: the yield curve factors can be observable or unobserved, in which case their values have to be estimated alongside the other parameters of the model; the structure of the factor loadings can be specified in a way such that a particular interpretation is given to the unobserved yield curve factors, as e.g. Nelson and Siegel (1987) and Soderlind and Svensson (1997); or the factor loadings can be derived from no-arbitrage constraints, as in, among many others, Duffee (2002), Ang and Piazzesi (2003) and Ang, Bekaert and Wei (2007).

Yield curve models that are linear functions of the underlying factors can be written as special cases of equation (3.2.1).[6] In this context, the two models used in the current paper are presented below.

[4] The merit of these models mainly lies in the area of derivatives pricing.
[5] Yield curve factor models are categorized by Duffie and Kan (1996) and Dai and Singleton (2000).
[6] Excluded from this list are naturally the quadratic term structure models as proposed by Ahn, Dittmar and Gallant (2002).

3.2.1 The Nelson-Siegel model

The Nelson and Siegel (1987) model, as re-parameterized by Diebold and Li (2006), can be seen as a restricted version of equation (3.2.1) by imposing the following constraints:

$$ a^{NS} = 0, \tag{3.2.2} $$

$$ b^{NS} = \left[ \; 1 \quad \frac{1 - \exp(-\lambda\tau)}{\lambda\tau} \quad \frac{1 - \exp(-\lambda\tau)}{\lambda\tau} - \exp(-\lambda\tau) \; \right], \tag{3.2.3} $$

where $\lambda$ is the exponential decay rate of the loadings for different maturities, and $\tau$ is time to maturity.
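The loading structure in (3.2.3), and the remark in the introduction that a fixed decay parameter reduces estimation to linear regression, can be illustrated with a minimal sketch. It uses plain Python, the decay value and the maturity grid that the chapter reports in Sections 3.3 and 3.4, and a made-up factor vector; the function names (`ns_loadings`, `solve_linear`, `fit_factors`) are hypothetical and not taken from the thesis.

```python
import math

def ns_loadings(tau, lam):
    """Nelson-Siegel loadings (level, slope, curvature) at maturity tau > 0."""
    x = lam * tau
    slope = (1.0 - math.exp(-x)) / x
    curvature = slope - math.exp(-x)
    return [1.0, slope, curvature]

def solve_linear(a, b):
    """Solve a small square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def fit_factors(yields, maturities, lam):
    """Cross-sectional OLS of one yield curve on the fixed loadings (normal equations)."""
    b = [ns_loadings(tau, lam) for tau in maturities]
    btb = [[sum(row[i] * row[j] for row in b) for j in range(3)] for i in range(3)]
    bty = [sum(row[i] * y for row, y in zip(b, yields)) for i in range(3)]
    return solve_linear(btb, bty)

# Maturity grid (months) and decay parameter as reported in the chapter.
MATURITIES = [1, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
LAMBDA = 0.0609

# Made-up level, slope and curvature values, used only to build a noiseless curve.
true_factors = [7.0, -2.0, 1.5]
curve = [sum(l * f for l, f in zip(ns_loadings(tau, LAMBDA), true_factors))
         for tau in MATURITIES]
estimated = fit_factors(curve, MATURITIES, LAMBDA)

# Integer maturity (in months) at which the curvature loading peaks.
peak = max(range(1, 121), key=lambda tau: ns_loadings(tau, LAMBDA)[2])
```

On a noiseless curve the regression returns the factors that generated it, and the curvature loading peaks at roughly 30 months, which is the property the chapter cites as the reason for this value of the decay parameter.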
This particular loading structure implies that the first factor is responsible for parallel yield curve shifts, since the effect of this factor is identical for all maturities; the second factor represents minus the yield curve slope, because it has a maximal impact on short maturities and minimal effect on the longer maturity yields; and the third factor can be interpreted as the curvature of the yield curve, because its loading has a hump in the middle part of the maturity spectrum, and little effect on both short and long maturities. In summary, the three factors have the interpretation of a yield curve level, slope and curvature.

[FIGURE 3.1 AROUND HERE]

A visual representation of the Nelson and Siegel factor loading structure is given in Figure 3.1. By imposing the restrictions (3.2.2) to (3.2.3) on equation (3.2.1) we obtain

$$ y_t = b^{NS} X_t^{NS} + \epsilon_t^{NS}, \tag{3.2.4} $$

where $X_t^{NS} = [L_t \;\; S_t \;\; C_t]'$ represents the Nelson-Siegel yield curve factors: Level, Slope and Curvature, at time $t$.

Empirically the Nelson-Siegel model fits data well, as shown by Nelson and Siegel (1987), and performs relatively well in out-of-sample forecasting exercises (see, among others, Diebold and Li (2006) and De Pooter et al. (2007)). However, as mentioned in the introduction, from a theoretical viewpoint the Nelson-Siegel yield curve model is not necessarily arbitrage-free (see e.g. Bjork and Christensen (1999)) and does not belong to the class of affine yield curve models (see e.g. Diebold et al. (2004)).

3.2.2 Gaussian arbitrage-free models

The Gaussian discrete-time arbitrage-free affine term structure model can also be seen as a particular case of equation (3.2.1), where the factor loadings are cross-sectionally restricted to ensure the absence of arbitrage opportunities.
This class of no-arbitrage (NA) models can be represented by

$$ y_t = a^{NA} + b^{NA} X_t^{NA} + \epsilon_t^{NA}, \tag{3.2.5} $$

where the underlying factors are assumed to follow a Gaussian VAR(1) process

$$ X_t^{NA} = \mu + \Phi X_{t-1}^{NA} + u_t, $$

with $u_t \sim N(0, \Sigma\Sigma')$ being a $(K \times 1)$ vector of errors, $\mu$ is a $(K \times 1)$ vector of means, and $\Phi$ is a $(K \times K)$ matrix collecting the autocorrelation coefficients. The elements of $a^{NA}$ and $b^{NA}$ in equation (3.2.5) are defined by

$$ a_\tau^{NA} = -\frac{A_\tau}{\tau}, \qquad b_\tau^{NA} = -\frac{B_\tau}{\tau}, \tag{3.2.6} $$

where, as shown by e.g. Ang and Piazzesi (2003), $A_\tau$ and $B_\tau$ satisfy the following recursive formulas to preclude arbitrage opportunities

$$ A_{\tau+1} = A_\tau + B_\tau'(\mu - \Sigma\lambda_0) + \frac{1}{2} B_\tau' \Sigma\Sigma' B_\tau - A_1, \tag{3.2.7} $$

$$ B_{\tau+1}' = B_\tau'(\Phi - \Sigma\lambda_1) - B_1', \tag{3.2.8} $$

with boundary conditions $A_0 = 0$ and $B_0 = 0$. The parameters $\lambda_0$ and $\lambda_1$ govern the time-varying market price of risk, specified as an affine function of the yield curve factors

$$ \Lambda_t = \lambda_0 + \lambda_1 X_t^{NA}. $$

The coefficients $A_1 = -a_1^{NA}$ and $B_1 = -b_1^{NA}$ in equations (3.2.7) to (3.2.8) refer to the short rate equation

$$ r_t = a_1^{NA} + b_1^{NA\prime} X_t^{NA} + v_t, $$

where usually $r_t$ is approximated by the one-month yield.

If the factors $X_t^{NA}$ driving the dynamics of the yield curve are assumed to be unobservable, the estimation of affine term structure models requires a joint procedure to extract the factors and to estimate the parameters of the model. This is a difficult task, given the non-linearity of the model and that the number of parameters grows with the number of included factors. As the factors are latent, identifying restrictions have to be imposed. Moreover, as mentioned by Ang and Piazzesi (2003), the likelihood function is flat in the market-price-of-risk parameters and this further complicates the numerical estimation process.

The most common procedure to estimate affine term structure models is described by Chen and Scott (1993). It relies on the assumption that as many yields as factors are observed without measurement error.
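To make the role of the recursions behind (3.2.6) to (3.2.8) concrete, the following is a minimal one-factor (scalar) sketch. It is written directly in terms of the short-rate intercept and loading, here called `delta0` and `delta1` (hypothetical names standing for $a_1^{NA}$ and $b_1^{NA}$), and it assumes the sign convention of Ang and Piazzesi (2003), under which each step subtracts the short-rate coefficients so that the one-period yield reproduces the short-rate equation. All parameter values are made up for illustration.

```python
def affine_loadings(mu, phi, sigma, lam0, lam1, delta0, delta1, max_tau):
    """Yield intercepts a[n-1] and loadings b[n-1] for maturities n = 1..max_tau.

    Scalar (one-factor) version of the no-arbitrage recursions, assuming the
    short rate is r_t = delta0 + delta1 * X_t and the factor follows an AR(1).
    """
    A, B = 0.0, 0.0  # boundary conditions A_0 = 0 and B_0 = 0
    a, b = [], []
    for n in range(1, max_tau + 1):
        A_next = A + B * (mu - sigma * lam0) + 0.5 * B * B * sigma * sigma - delta0
        B_next = B * (phi - sigma * lam1) - delta1
        A, B = A_next, B_next
        a.append(-A / n)  # yield intercept at maturity n, as in (3.2.6)
        b.append(-B / n)  # yield loading at maturity n, as in (3.2.6)
    return a, b

# Made-up parameter values, for illustration only.
a_na, b_na = affine_loadings(mu=0.1, phi=0.95, sigma=0.5, lam0=-0.2, lam1=0.05,
                             delta0=0.4, delta1=1.0, max_tau=120)

# Degenerate case: no shocks, no risk premia, unit-root factor with zero drift.
a_flat, b_flat = affine_loadings(mu=0.0, phi=1.0, sigma=0.0, lam0=0.0, lam1=0.0,
                                 delta0=0.4, delta1=1.0, max_tau=10)
```

In the degenerate case the implied curve is flat in the maturity dimension: every maturity has intercept `delta0` and loading `delta1`. This illustrates how the cross-sectional shape of the loadings is pinned down entirely by the factor dynamics and the market price of risk.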
Hence, it allows for recovering the latent factors from the observed yields by inverting the yield curve equation. Unfortunately, the estimation results will depend on which yields are assumed to be measured without error and will vary according to the choice made. Alternatively, to reduce the degree of arbitrariness, observable factors can be used. For example, Ang et al. (2006) use the short rate, the spread and the quarterly GDP growth rate as yield curve factors. It is also possible to rely on purely statistical techniques in the determination of yield curve factors, as e.g. De Pooter et al. (2007), who use extracted principal components as yield curve factors.

3.2.3 Motivation

The affine no-arbitrage term structure models impose a structure on the loadings $a^{NA}$ and $b^{NA}$, presented in equations (3.2.6) to (3.2.8), such that the resulting yield curves, in the maturity dimension, are compatible with the estimated time-series dynamics for the yield curve factors. This hard-coded internal consistency between the dynamic evolution of the yield curve factors, and hence the yields at different maturity segments of the curve, is what ensures the absence of arbitrage opportunities. A similar constraint is not integrated in the setup of the Nelson-Siegel model (see Bjork and Christensen (1999)).

However, in practice, when the Nelson-Siegel model is estimated, it is possible that the no-arbitrage constraints are approximately fulfilled, i.e. fulfilled in a statistical sense, while not being explicitly imposed on the model. It cannot be excluded that the functional form of the yield curve, as it is imposed by the Nelson and Siegel factor loading structure in equations (3.2.2) and (3.2.3), fulfils the no-arbitrage constraints most of the time.

As a preliminary check for the comparability of the Nelson-Siegel model and the no-arbitrage model, Figure 3.2 compares extracted yield curve factors, i.e. $\hat{X}_t^{NA}$ and $\hat{X}_t^{NS}$, for US data from 1970 to 2000 (the data is presented in Section 3.3). We estimate the Nelson-Siegel factors as in Diebold and Li (2006), and the no-arbitrage model as in Ang and Piazzesi (2003) using the Chen and Scott (1993) method, and assuming that yields at maturities 3, 24 and 120 months are observed without error.

[FIGURE 3.2 AROUND HERE]

Although the two models have different theoretical backgrounds and use different estimation procedures, the extracted factors are highly correlated. Indeed, the estimated correlation between the Nelson-Siegel level factor and the first latent factor from the no-arbitrage model is 0.95. The correlation between the slope and the second latent factor is 0.96 and between the curvature and the third latent factor is 0.65.[7]

[7] Correlations are reported in absolute value.

On the basis of these results and in order to properly investigate whether the Nelson-Siegel model is compatible with arbitrage-freeness, we conduct a test for the equality of the Nelson-Siegel factor loadings to the implied no-arbitrage ones obtained from an arbitrage-free model. To ensure correspondence between the Nelson-Siegel model and its arbitrage-free counterpart, we use extracted Nelson-Siegel factors as exogenous factors in the no-arbitrage setup. The model that we estimate is the following

$$ y_t = a^{NA} + b^{NA} \hat{X}_t^{NS} + \epsilon_t^{NA}, \qquad \epsilon_t^{NA} \sim (0, \Omega), \tag{3.2.9} $$

where $\hat{X}_t^{NS}$ are the estimated Nelson-Siegel factors from equations (3.2.2) to (3.2.4), the observation errors $\epsilon_t^{NA}$ are not assumed to be normally distributed and $a^{NA}$ and $b^{NA}$ satisfy the no-arbitrage restrictions presented in equations (3.2.6) to (3.2.8). In order to impose these no-arbitrage restrictions we have to fit a VAR(1) on the estimated Nelson-Siegel factors

$$ \hat{X}_t^{NS} = \mu + \Phi \hat{X}_{t-1}^{NS} + u_t, \tag{3.2.10} $$

with $u_t \sim N(0, \Sigma\Sigma')$, to specify the market price of risk as an affine function of the estimated Nelson-Siegel factors

$$ \Lambda_t = \lambda_0 + \lambda_1 \hat{X}_t^{NS}, \tag{3.2.11} $$

and the short rate equation as

$$ r_t = a_1^{NA} + b_1^{NA\prime} \hat{X}_t^{NS} + v_t. \tag{3.2.12} $$

In this way, we estimate the no-arbitrage factor loading structure that emerges when the underlying yield curve factors are identical to the Nelson-Siegel yield curve factors. The test is then formulated in terms of the equality between the intercepts of the two models, $a^{NS}$ and $a^{NA}$, and the corresponding loadings, $b^{NA}$ and $b^{NS}$.

3.3 Data

We use U.S. Treasury zero-coupon yield curve data covering the period from January 1970 to December 2000 constructed by Diebold and Li (2006), based on end-of-month CRSP government bond files.[8] The data is sampled at a monthly frequency, providing a total of 372 observations for each of the maturities observed at the (1, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120) month segments.

[FIGURE 3.3 AROUND HERE]

The data is presented in Figure 3.3. The surface plot illustrates how the yield curve evolves over time. Table 3.1 reports the mean, standard deviation and autocorrelations to further illustrate the properties of the data.

[TABLE 3.1 AROUND HERE]

The estimated autocorrelation coefficients are significantly different from zero at a 95 percent level of confidence for lags one through twelve, across all maturities.[9] Such high autocorrelations could suggest that the underlying yield series are integrated of order one. If this is the case, we would need to take first differences to make the variables stationary before valid statistical inference could be drawn, or we would have to resort to co-integration analysis. However, economic theory tells us that nominal yield series cannot be integrated, since their support has a lower bound at zero and a finite upper bound. Consequently, and in accordance with the yield-curve literature, we model yields in levels and thus disregard that their in-sample properties could indicate otherwise.[10]

[8] The data can be downloaded from http://www.ssc.upenn.edu/~fdiebold/papers/paper49/FBFITTED.txt and Diebold and Li (2006, pp.
344-345) give a detailed description of the data treatment methodology applied.
[9] A similar degree of persistence in yield curve data is also noted by Diebold and Li (2006).
[10] It is often the case in yield-curve modeling that yields are in levels. See, among others, Nelson and Siegel (1987), Diebold and Li (2006), Diebold, Rudebusch and Aruoba (2006), Ang and Piazzesi (2003) and Dai and Singleton (2000).

3.4 Estimation Procedure

To estimate the Nelson-Siegel factors $\hat{X}_t^{NS}$ in equation (3.2.4), we follow Diebold and Li (2006) by fixing the decay parameter $\lambda = 0.0609$ in equation (3.2.3) and by using OLS.[11] We treat the obtained Nelson-Siegel factors as observable in the estimation of the no-arbitrage model presented in equations (3.2.6) to (3.2.12).

To estimate the parameters of the arbitrage-free model we standardize the Nelson and Siegel factors and use the two-step procedure proposed by Ang et al. (2006). In the first step, we fit a VAR(1) for the standardized Nelson-Siegel factors to estimate $\hat{\mu}$, $\hat{\Phi}$ and $\hat{\Sigma}$ from equation (3.2.10). We also project the short rate (one-month yield) on the standardized Nelson-Siegel yield curve factors, to estimate the parameters in the short rate equation (3.2.12). In the second step, we minimize the sum of squared residuals between observed yields and fitted yields to estimate the market-price-of-risk parameters $\hat{\lambda}_0$ and $\hat{\lambda}_1$ of equation (3.2.11). Finally, we un-standardize the Nelson-Siegel factors and compute $\hat{a}^{NA}$ and $\hat{b}^{NA}$.

Our goal is to test whether the Nelson-Siegel model in equations (3.2.2) to (3.2.4) is statistically different from the no-arbitrage model in equations (3.2.6) to (3.2.12). Since the estimated factors, $\hat{X}_t^{NS}$, are the same for both models we can formulate our hypotheses in the following way:

$$
\begin{aligned}
H_0^1 &: a_\tau^{NA} = a_\tau^{NS} \equiv 0, \\
H_0^2 &: b_\tau^{NA}(1) = b_\tau^{NS}(1), \\
H_0^3 &: b_\tau^{NA}(2) = b_\tau^{NS}(2), \\
H_0^4 &: b_\tau^{NA}(3) = b_\tau^{NS}(3),
\end{aligned}
$$

where $b_\tau^{NA}(k)$ denotes the loading on the $k$-th factor in the no-arbitrage model at maturity $\tau$, and $b_\tau^{NS}(k)$ denotes the corresponding variable from the Nelson-Siegel model.

[11] This value of $\lambda$ maximizes the loading on the curvature at 30 months maturity as shown by Diebold and Li (2006).

We claim that the Nelson-Siegel model is statistically compatible with arbitrage-freeness if $H_0^1$ to $H_0^4$ are not rejected at traditional levels of confidence. Notice that to test $H_0^1$ to $H_0^4$ we only need to estimate $a^{NA}$ and $b^{NA}$, since the Nelson-Siegel loading structure is fixed by the model. To account for the two-step estimation procedure of the no-arbitrage model and for the generated regressor problem, we construct confidence intervals around $\hat{a}^{NA}$ and $\hat{b}^{NA}$ using the resampling procedure described in the next section.

3.4.1 Resampling procedure

To recover the empirical distributions of the estimated parameters we conduct block resampling and reconstruct multiple yield curve data samples from the original yield curve data in the following way. We denote by $G$ the matrix of observed yield ratios with elements $y_{t,\tau}/y_{t-1,\tau}$, where $t = (2, \ldots, T)$ and $\tau = (1, \ldots, N)$. We first randomly select a starting yield curve $y_k$, where the index $k$ is an integer drawn randomly from a discrete uniform distribution on $[1, \ldots, T]$. The resulting $k$ marks the random index value at which the starting yield curve is taken. In a second step, blocks of length $w$ are sampled from the matrix of yield ratios $G$. The generic $i$-th block can be denoted by $\tilde{g}_{z,i}$, where $z$ is a random number from $[2, \ldots, T - w + 1]$ denoting the first observation of the block and $I$ is the maximum number of blocks drawn, $i = 1, \ldots, I$.[12]
A full data sample of regenerated yield curve ratios $\tilde{G}$ can then be constructed by vertical concatenation of the drawn data blocks $\tilde{g}_{z,i}$ for $i = 1, \ldots, I$. Finally, a new data set of resampled yields can be constructed via:

$$
\begin{cases}
\tilde{y}_1 = y_k, \\
\tilde{y}_s = \tilde{y}_{s-1} \odot \{\tilde{G}\}_s, \qquad s = 2, \ldots, S,
\end{cases}
\tag{3.4.1}
$$

where $\{\tilde{G}\}_s$ denotes the $s$-th row of the matrix of resampled ratios $\tilde{G}$, and $\odot$ denotes element-by-element multiplication.

[12] We use ∼ to indicate the re-sampled variables.

We choose to resample from yield ratios for two reasons. First, it ensures positiveness of the resampled yields. Second, as reported in Table 3.1, yields are highly autocorrelated and close to I(1). Therefore, one could resample from first differences, but as reported in Table 3.2, first differences of yields are highly autocorrelated and not variance-stationary. Yield ratios display better statistical properties regarding variance-stationarity, as can be seen by comparing the correlation coefficients for squared differences and ratios in Table 3.2. Block-bootstrapping is used to account for serial correlation in the yield curve ratios.

[TABLE 3.2 AROUND HERE]

A similar resampling technique has been proposed by Rebonato et al. (2005). They provide a detailed account of the desirable statistical features of this approach. In the present context we recall that the method ensures: (i) the exact asymptotic recovery of all the eigenvalues and eigenvectors of yields; (ii) the correct reproduction of the distribution of curvatures of the yield curve across maturities; (iii) the correct qualitative recovery of the transition from super- to sub-linearity as the yield maturity is increased in the variance of n-day changes; and (iv) satisfactory accounting of the empirically observed positive serial correlations in the yields.

To test hypotheses $H_0^1$ to $H_0^4$ we employ the following scheme:

1. Construct a yield curve sample $\tilde{y}$ following equation (3.4.1);
2. Estimate the Nelson-Siegel yield curve factors $\tilde{X}_t^{NS}$ on $\tilde{y}$;
3. Use $\tilde{X}_t^{NS}$ to estimate the parameters $\tilde{a}^{NA}$ and $\tilde{b}^{NA}$ from the arbitrage-free model given in equations (3.2.6) - (3.2.12);
4. Repeat steps 1 to 3, 1000 times to build a distribution for the parameter estimates $\hat{a}^{NA}$ and $\hat{b}^{NA}$;
5. Construct confidence intervals for $\hat{a}^{NA}$ and $\hat{b}^{NA}$ using the sample quantiles of the empirical distribution of the estimated parameters.

Note that by fixing $\lambda$ in step 2, the Nelson-Siegel factor loading structure remains unchanged from repetition to repetition. We set the block length equal to 50 observations, i.e. $w = 50$, and generate a total of 370 yield curve observations for each replication, i.e. $S = 370$.[13]

3.5 Results

This section presents three sets of results to help assess whether the Nelson-Siegel model is compatible with arbitrage-freeness when applied to US zero-coupon data. Our main result is a test of equality of the factor loadings on the basis of the resampling technique outlined in Section 3.4. In addition we compare the in-sample and out-of-sample performance of the Nelson-Siegel model, equations (3.2.2) - (3.2.4), to the no-arbitrage model based on exogenous Nelson-Siegel yield curve factors, equations (3.2.6) - (3.2.12).

3.5.1 Testing results

Using the resampling methodology outlined in Section 3.4, we generate empirical distributions for each factor loading of the no-arbitrage yield curve model in equation (3.2.9). Results are presented for each maturity covered by the original data sample. The Nelson-Siegel factor loading structure, in equations (3.2.2) and (3.2.3), is constant across all bootstrapped data samples because $\lambda$ is treated as a known parameter.[14] Hence, only the extracted Nelson-Siegel factors vary across the bootstrap samples.

[13] The last block is drawn to contain 20 observations so as to obtain a total number of observations for each regenerated sample close to the number of observations of the original sample, 372.
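The yield-ratio block resampling of Section 3.4.1 (equation (3.4.1) and step 1 of the scheme) can be sketched as follows. This is a simplified illustration in plain Python, not the authors' code: the function names are hypothetical, the toy yield panel is synthetic, and the last block is simply truncated so that the sample has the requested length, which is one possible reading of the treatment of the final block.

```python
import math
import random

def yield_ratios(yields):
    """Matrix G: row t holds y[t][j] / y[t-1][j] for every maturity j."""
    return [[cur / prev for cur, prev in zip(yields[t], yields[t - 1])]
            for t in range(1, len(yields))]

def resample_yields(yields, block_len, n_obs, rng):
    """One regenerated sample of n_obs yield curves built from blocks of ratios."""
    ratios = yield_ratios(yields)
    start = yields[rng.randrange(len(yields))]            # random starting curve y_k
    drawn = []
    while len(drawn) < n_obs - 1:
        z = rng.randrange(len(ratios) - block_len + 1)    # first row of the block
        drawn.extend(ratios[z:z + block_len])             # keep rows in original order
    drawn = drawn[:n_obs - 1]                             # truncate the last block
    sample = [list(start)]
    for row in drawn:
        # element-by-element multiplication of the previous curve by the ratio row
        sample.append([y * g for y, g in zip(sample[-1], row)])
    return sample

# Synthetic, strictly positive toy panel: 60 dates, 3 maturities (illustration only).
toy = [[5.0 + j + math.sin(t / 5.0 + j) for j in range(3)] for t in range(60)]
rng = random.Random(0)
sample = resample_yields(toy, block_len=10, n_obs=40, rng=rng)
```

Because every resampled curve is the previous curve multiplied by observed ratios of positive yields, the regenerated yields stay positive by construction, which is the first of the two reasons the text gives for working with ratios rather than differences.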
[14] The results presented in the paper are robust to changes in $\lambda$. We have performed the calculations for other values of $\lambda$, namely $\lambda = 0.08$, $\lambda = 0.045$, and $\lambda = 0.0996$, and the results for these values of $\lambda$ are qualitatively the same as the ones presented in the paper.

Parameter estimates and corresponding empirical confidence intervals for the no-arbitrage model, equations (3.2.6) - (3.2.12), are shown in Table 3.3. The diagonal elements of the matrices holding the estimated autoregressive coefficients $\hat{\Phi}$ and the covariance matrix of the VAR residuals $\hat{\Sigma}$, in equation (3.2.10), are significantly different from zero at a 95 percent level of confidence. In addition, the estimates of $a_1^{NA}$, and the two first elements of the $(3 \times 1)$ vector $b_1^{NA}$ in equation (3.2.12), are also different from zero, judged at the same level of confidence.

[TABLE 3.3 AROUND HERE]

The estimated intercepts of the no-arbitrage model $\hat{a}^{NA}$, computed as in equations (3.2.6) - (3.2.7), are presented in Table 3.4, for each maturity covered by the original data. This table also reports the 95 percent confidence intervals, obtained from the resampling, and the Nelson-Siegel intercepts, $a^{NS}$. Therefore, results in Table 3.4 allow for testing $H_0^1$ for the equality between the intercepts in the yield curve equations for the no-arbitrage and the Nelson-Siegel models. Tables 3.5 to 3.7 present the corresponding results that allow us to test $H_0^2$, $H_0^3$, and $H_0^4$, i.e. whether the corresponding yield curve factor loadings are identical, in a statistical sense.

[TABLE 3.4 to 3.7 AROUND HERE]

Figure 3.4 gives a visual representation of the results contained in Tables 3.4 to 3.7. The figure shows the estimated no-arbitrage loadings, $\hat{a}^{NA}$ and $\hat{b}^{NA}$, with the corresponding 50 percent and 95 percent empirical confidence intervals obtained from resampling, as well as the parameter values for the Nelson-Siegel model, $b^{NS}$, for comparison.
It is clear from Figure 3.4 that the empirical distributions are highly skewed for most of the maturities. Consider, for example, the plot for the intercept estimates (the top left plot in Figure 3.4) at maturity 120. It is evident that the distribution of the no-arbitrage coefficient is highly right skewed.

[FIGURE 3.4 AROUND HERE]

This non-normality of the distributions of the estimated no-arbitrage parameters is further analyzed in Table 3.8. The table shows that all distributions display skewness, excess kurtosis, or both. Only selected maturities are shown in Table 3.8, but the result holds for all maturities included in the sample. We also perform the Jarque-Bera test for normality, and reject normality at a 95 percent confidence level for all maturities.

[TABLE 3.8 AROUND HERE]

Visual confirmation of the documented non-normality is provided by Figures 3.5 to 3.8. For a representative selection of maturities, these figures show the empirical distribution of the estimated no-arbitrage loadings and a normal distribution approximation. In addition, the figures show the 95 percent confidence intervals derived from the empirical distribution and from the normal approximation.

[FIGURE 3.5 to 3.8 AROUND HERE]

The non-normality of the empirical distributions of the bootstrapped intercepts $\hat{a}^{NA}$ and factor loadings $\hat{b}^{NA}$ indicates that the confidence intervals should be constructed using the sample quantiles of the empirical distribution. The empirical 95 percent confidence intervals are included in Tables 3.4, 3.5, 3.6 and 3.7. The lower bound of the confidence intervals is denoted by a subscript L, and the upper bound by a subscript U.
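To make the argument for quantile-based intervals concrete, the sketch below summarizes a set of bootstrap draws by their skewness, kurtosis and Jarque-Bera statistic, and contrasts the empirical 2.5 and 97.5 percent quantiles with a normal approximation. It is a minimal illustration under stated assumptions: the draws are simulated from a right-skewed lognormal distribution (hypothetical data, not the thesis estimates), the quantile rule is plain linear interpolation between order statistics, and kurtosis is reported in raw form (equal to 3 for a normal distribution).

```python
import math
import random


def quantile(sorted_x, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_x) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def summarize(draws):
    """Moments, Jarque-Bera statistic and two 95 percent intervals."""
    n = len(draws)
    mean = sum(draws) / n
    m2 = sum((d - mean) ** 2 for d in draws) / n
    m3 = sum((d - mean) ** 3 for d in draws) / n
    m4 = sum((d - mean) ** 4 for d in draws) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # raw kurtosis, 3 under normality
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    ordered = sorted(draws)
    sd = math.sqrt(m2)
    return {
        "skewness": skew,
        "kurtosis": kurt,
        "jarque_bera": jb,  # compare with 5.99, the 5 percent chi-square(2) value
        "percentile_ci": (quantile(ordered, 0.025), quantile(ordered, 0.975)),
        "normal_ci": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


# Hypothetical right-skewed bootstrap draws (1000 replications).
rng = random.Random(1)
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]
stats = summarize(draws)
```

For a right-skewed distribution such as this one, the symmetric normal interval extends too far to the left, which is the kind of distortion that motivates the use of sample quantiles.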
By inspecting the tables, we reach the following conclusions for the tested hypotheses:

$H_0^1$: $a_\tau^{NA} = a_\tau^{NS} \equiv 0$, not rejected at a 95% level of confidence,
$H_0^2$: $b_\tau^{NA}(1) = b_\tau^{NS}(1)$, not rejected at a 95% level of confidence,
$H_0^3$: $b_\tau^{NA}(2) = b_\tau^{NS}(2)$, not rejected at a 95% level of confidence,
$H_0^4$: $b_\tau^{NA}(3) = b_\tau^{NS}(3)$, not rejected at a 95% level of confidence.

For the test of the curvature parameter in $H_0^4$ an additional comment is warranted. As can be seen from Figure 3.4, the curvature parameter, at middle maturities, is the closest to violating the 95 percent confidence band, and this parameter thus constitutes the “weak point” of the Nelson-Siegel model in relation to the no-arbitrage constraints. This finding is in line with Bjork and Christensen (1999), who prove that a Nelson-Siegel type model with two additional curvature factors, each with its own λ, theoretically would be arbitrage-free. However, given that Litterman and Scheinkman (1991) find that the curvature factor accounts for only approximately 2 percent of the variation of yields, and in the light of our results, one can question the significance of imposing constraints on parameters whose explanatory power is in the range of 2 percent. Our empirical finding is also supported by the theoretical results in Christensen et al. (2007), who show that adding an additional term at very long maturities reconciles the dynamic Nelson-Siegel model with the affine arbitrage-free term structure models. When yield curve models are used for purposes other than relative pricing, as central bankers and fixed-income strategists do, one might therefore be tempted to use the Nelson-Siegel model on the basis of its compatibility with arbitrage-freeness.

The hypotheses $H_0^1$ through $H_0^4$ test the equality between each no-arbitrage factor loading and the corresponding Nelson-Siegel factor loading separately. The results reported above are confirmed by a joint F test.
To perform the test we use the empirical variance-covariance matrix of the estimates. The test statistic is 0.22 and the 95 percent critical F-value with 72 and 300 degrees of freedom is 1.34. Therefore, we also cannot reject the hypothesis that the loading structures of the two models are equal in a statistical sense.

3.5.2 In-sample comparison

To conduct an in-sample comparison of the two models, we estimate the Nelson-Siegel model in equations (3.2.2) - (3.2.4) and the no-arbitrage model in equations (3.2.6) - (3.2.12), where the latter model uses the yield curve factors extracted from the former. Measures of fit are displayed in Table 3.9. A general observation is that both models fit the data well: the residual means are close to zero for all maturities and the residual standard deviations are low. The root mean squared error, RMSE, and the mean absolute deviation, MAD, are also low and similar for both models.

More specifically, Table 3.9 shows that the averages of the residuals from the fitted Nelson-Siegel model, $\hat{\epsilon}^{NS}$, for the included maturities, are all lower than 16 basis points in absolute value. In fact, the mean across maturities of the absolute average residuals is 5 basis points, while the corresponding number for $\hat{\epsilon}^{NA}$ is 3 basis points. The 3-month maturity is the worst fitted maturity for the no-arbitrage model, with a mean of the residuals of 8 basis points. For the Nelson-Siegel model the worst fitted maturity is the 1-month segment, with a mean of the residuals close to -16 basis points. Furthermore, the two models have the same amount of autocorrelation in the residuals. A similar observation is made for the Nelson-Siegel model alone by Diebold and Li (2006).

[TABLE 3.9 AROUND HERE]

A comparison on the basis of the RMSE and MAD figures leads to the conclusion that both models fit the data equally well.
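The fit measures used in this comparison can be sketched as follows. This is an illustrative Python fragment, not the code behind Table 3.9: the function name and the sample numbers are hypothetical, the residual is taken as observed minus fitted yield, and MAD is assumed to be the mean of the absolute residuals.

```python
import math


def fit_measures(observed, fitted):
    """Residual mean, standard deviation, RMSE and MAD for one maturity.

    Assumptions: residual = observed - fitted, population (1/n) standard
    deviation, and MAD defined as the mean of the absolute residuals.
    """
    residuals = [y - f for y, f in zip(observed, fitted)]
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in residuals) / n)
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    mad = sum(abs(e) for e in residuals) / n
    return {"mean": mean, "std": std, "rmse": rmse, "mad": mad}


# Hypothetical observed and fitted yields (in percent) for a single maturity.
observed = [5.00, 5.10, 4.90, 5.20]
fitted = [4.95, 5.15, 4.95, 5.10]
measures = fit_measures(observed, fitted)
```

With these definitions the squared RMSE equals the squared residual mean plus the residual variance, so a model whose residual means are close to zero has an RMSE close to its residual standard deviation.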
3.5.3 Out-of-sample comparison

As a last check of the equivalence of the Nelson-Siegel model and its no-arbitrage counterpart, we perform an out-of-sample forecast experiment. In particular, we generate h-step ahead iterative forecasts in the following way. First, the yield curve factors are projected forward using the estimated VAR parameters from equation (3.2.10),

\[ \hat{X}^{NS}_{t+h|t} = \sum_{s=0}^{h-1} \hat{\Phi}^{s} \hat{\mu} + \hat{\Phi}^{h} \hat{X}^{NS}_{t}, \]

where h ∈ {1, 6, 12} is the forecasting horizon in months. Second, out-of-sample forecasts are calculated for the two models, given the projected factors,

\[ \hat{y}^{NS}_{t+h|t} = b^{NS} \hat{X}^{NS}_{t+h|t}, \]
\[ \hat{y}^{NA}_{t+h|t} = \hat{a}^{NA}_{t} + \hat{b}^{NA}_{t} \hat{X}^{NS}_{t+h|t}, \]

where the subscripts t on $\hat{a}^{NA}_{t}$ and $\hat{b}^{NA}_{t}$ indicate that the parameters are estimated using data until time t.

To evaluate the prediction accuracy at a given forecasting horizon, we use the mean squared forecast error, MSFE, i.e. the average squared error over the evaluation period, between $t_0$ and $t_1$, for the h-month ahead forecast of the yield with maturity τ,

\[ MSFE(\tau, h, m) = \frac{1}{t_1 - t_0 + 1} \sum_{t=t_0}^{t_1} \left( \hat{y}^{m}_{t+h,\tau|t} - y_{t+h,\tau} \right)^{2}, \tag{3.5.1} \]

where m ∈ {NA, NS} denotes the model. The results presented are expressed as ratios of the MSFEs of the two models against the MSFE of a random walk. The random walk represents a naïve forecasting model that historically has proven very difficult to outperform. The success of the random walk model in the area of yield curve forecasting is due to the high degree of persistence exhibited by observed yields. The random walk h-step ahead prediction, at time t, of the yield with maturity τ is

\[ \hat{y}_{t+h,\tau|t} = y_{t,\tau}. \]

To produce the first set of forecasts, the model parameters are estimated on a sample defined from 1970:01 to 1993:01, and yields are forecasted for the chosen horizons, h. The data sample is then increased by one month and the parameters are re-estimated on the new data covering 1970:01 to 1993:02. Again, forecasts are produced for the forecasting horizons.
This procedure is repeated for the full sample, generating forecasts on successively increasing data samples. The forecasting performances are then evaluated over the period 1994:01 to 2000:12 using the mean squared forecast error, as shown in equation (3.5.1).

Table 3.10 reports the out-of-sample forecast performance of the Nelson-Siegel model and the implied no-arbitrage model, evaluated against the random walk forecasts.

[TABLE 3.10 AROUND HERE]

The well-known phenomenon of the good forecasting performance of the random walk model is observed for the 1-month forecasting horizon. For the 6- and 12-month forecasting horizons, the Nelson-Siegel model and the no-arbitrage counterpart generally perform better than the random walk model, as shown by ratios being less than one.

Turning now to the relative comparison of the no-arbitrage model against the Nelson-Siegel model, it can be concluded that they exhibit very similar forecasting performances. If we consider every maturity for each forecasting horizon as an individual observation, then there are in total 54 observations. In 18 of these cases the Nelson-Siegel model is better, in 24 cases the no-arbitrage model is better, and in the remaining 12 cases the models perform equally well. Even when one model is judged to be better than its competitor, the differences in the performance ratios are very small. Typically, a difference is only seen at the second decimal, with a magnitude of 0.01 to 0.03. In summary, there is no systematic pattern across maturities and forecasting horizons showing when one model is better than its competitor.

To formally compare the forecasting performance of the two models, we calculate the Diebold-Mariano statistic for each maturity and forecasting horizon. At a 5 percent level we do not reject the hypothesis that the no-arbitrage model and the Nelson-Siegel model forecast equally well, see Table 3.11.
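The building blocks of the forecasting experiment can be sketched in a few lines of Python. The fragment below is a simplified illustration, not the code used for Tables 3.10 and 3.11: all function names are hypothetical, the VAR(1) projection is written as h repeated applications of the one-step map (which is algebraically the same as the sum formula above), and the Diebold-Mariano statistic uses a squared-error loss with a rectangular window of h - 1 autocovariances, which is an assumption about the variance estimator.

```python
import math


def iterate_var_forecast(mu, phi, x_t, h):
    """h-step VAR(1) forecast: apply x -> mu + phi x a total of h times."""
    x = list(x_t)
    for _ in range(h):
        x = [mu[i] + sum(phi[i][j] * x[j] for j in range(len(x)))
             for i in range(len(x))]
    return x


def msfe(forecasts, realized):
    """Mean squared forecast error over the evaluation period."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, realized)) / len(realized)


def diebold_mariano(errors_a, errors_b, h=1):
    """Diebold-Mariano statistic for equal squared-error accuracy.

    Negative values favour model a, positive values favour model b.
    Assumption: long-run variance from autocovariances up to lag h - 1.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance estimate")
    return d_bar / math.sqrt(long_run_var / n)


# Hypothetical two-factor example: two-step ahead factor projection.
mu = [1.0, 0.0]
phi = [[0.5, 0.0], [0.0, 0.9]]
x_forecast = iterate_var_forecast(mu, phi, [1.0, 1.0], h=2)

# Hypothetical MSFE ratio against the random walk (values below 1 favour the model).
realized = [5.2, 5.1, 5.1]
ratio = msfe([5.1, 5.0, 5.2], realized) / msfe([5.0, 5.3, 5.0], realized)
```

Passing the no-arbitrage forecast errors as `errors_a` and the Nelson-Siegel errors as `errors_b` matches the sign convention described in the caption of Table 3.11, where negative values indicate that the no-arbitrage model forecasts better.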
[TABLE 3.11 AROUND HERE]

3.6 Conclusion

In this paper we show that the model proposed by Nelson and Siegel (1987) is compatible with arbitrage-freeness, in the sense that the factor loadings from the model are not statistically different, at a 95 percent level of confidence, from those derived from an arbitrage-free model which uses the Nelson-Siegel factors as exogenous factors.

In theory, the Nelson-Siegel model is not arbitrage-free, as shown by Bjork and Christensen (1999). However, using US zero-coupon data from 1970 to 2000, a yield curve bootstrapping approach and the implied arbitrage-free factor loadings, we cannot reject the hypothesis that the Nelson-Siegel factor loadings fulfill the no-arbitrage constraints at a 95 percent confidence level. Furthermore, we show that the Nelson-Siegel model performs as well as the no-arbitrage counterpart in an out-of-sample forecasting experiment. Based on these empirical observations, we conclude that the Nelson-Siegel model is compatible with arbitrage-freeness.

This conclusion is of relevance to fixed-income money managers and central banks in particular, since such organizations traditionally rely heavily on the Nelson-Siegel model for policy and strategic investment decisions.
Table 3.1: Summary statistics of the US zero-coupon data

  τ     mean   std dev   min     max     ρ(1)    ρ(2)    ρ(3)    ρ(12)
  1     6.44   2.58      2.69    16.16   0.97*   0.93*   0.89*   0.69*
  3     6.75   2.66      2.73    16.02   0.97*   0.94*   0.91*   0.71*
  6     6.98   2.66      2.89    16.48   0.97*   0.94*   0.91*   0.73*
  9     7.10   2.64      2.98    16.39   0.97*   0.94*   0.91*   0.73*
 12     7.20   2.57      3.11    15.82   0.97*   0.94*   0.91*   0.74*
 15     7.31   2.52      3.29    16.04   0.97*   0.94*   0.91*   0.75*
 18     7.38   2.50      3.48    16.23   0.98*   0.94*   0.92*   0.75*
 21     7.44   2.49      3.64    16.18   0.98*   0.95*   0.92*   0.76*
 24     7.46   2.44      3.78    15.65   0.98*   0.94*   0.92*   0.75*
 30     7.55   2.36      4.04    15.40   0.98*   0.95*   0.92*   0.76*
 36     7.63   2.34      4.20    15.77   0.98*   0.95*   0.93*   0.77*
 48     7.77   2.28      4.31    15.82   0.98*   0.95*   0.93*   0.78*
 60     7.84   2.25      4.35    15.01   0.98*   0.96*   0.94*   0.79*
 72     7.96   2.22      4.38    14.98   0.98*   0.96*   0.94*   0.80*
 84     7.99   2.18      4.35    14.98   0.98*   0.96*   0.94*   0.78*
 96     8.05   2.17      4.43    14.94   0.98*   0.96*   0.95*   0.81*
108     8.08   2.18      4.43    15.02   0.98*   0.96*   0.95*   0.81*
120     8.05   2.14      4.44    14.93   0.98*   0.96*   0.94*   0.78*

Descriptive statistics of monthly yields at different maturities, τ, for the sample from January 1970 to December 2000. ρ(p) refers to the sample autocorrelation of the series at lag p and * denotes significance at the 95 percent confidence level. Confidence intervals are computed according to Box and Jenkins (1976).
Table 3.2: Autocorrelations

Yield differences

  τ     ρ(1)    ρ(3)     ρ(12)    ρ²(1)   ρ²(3)   ρ²(12)
  1     0.06    -0.07    -0.06    0.23*   0.08    0.08
  3     0.12*   -0.05    -0.13*   0.34*   0.07    0.22*
  6     0.16*   -0.09    -0.08    0.32*   0.09    0.20*
 12     0.15*   -0.10    -0.05    0.16*   0.11*   0.13*
 24     0.18*   -0.11*    0.00    0.21*   0.13*   0.13*
 36     0.14*   -0.11*    0.03    0.12*   0.14*   0.14*
 60     0.13*   -0.07     0.03    0.09    0.13*   0.13*
 84     0.10    -0.09    -0.03    0.17*   0.22*   0.18*
120     0.10    -0.05    -0.03    0.15*   0.19*   0.23*

Yield ratios

  τ     ρ(1)    ρ(3)     ρ(12)    ρ²(1)   ρ²(3)   ρ²(12)
  1     0.07    -0.05     0.10    0.23*   0.12*   0.02
  3     0.11*    0.00     0.01    0.34*   0.10    0.16*
  6     0.16*    0.00     0.04    0.25*   0.13*   0.13*
 12     0.16*   -0.04     0.04    0.10    0.13*   0.07
 24     0.16*   -0.07     0.03    0.06    0.12*   0.03
 36     0.13*   -0.09     0.06    0.01    0.06    0.05
 60     0.12*   -0.04     0.05    0.01    0.01    0.01
 84     0.11*   -0.04     0.00    0.04    0.07    0.03
120     0.08    -0.03     0.00    0.03    0.06    0.06

Sample autocorrelations of first yield differences $\Delta y$, squared first yield differences $\Delta y^2$, yield ratios $y_t / y_{t-1}$ and squared demeaned yield ratios $(y_t / y_{t-1} - \bar{\mu})^2$, for selected maturities τ, at lags 1, 3 and 12. * denotes significance at the 95 percent confidence level. Confidence intervals are computed according to Box and Jenkins (1976). ρ(p) and ρ²(p) denote, respectively, the correlation of the variables and of their squares, at lag p.
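The transformations and autocorrelations reported in Table 3.2 can be sketched as follows. The fragment is illustrative only: the function names are hypothetical, and the significance rule is the simple white-noise band of ±1.96/√T, which is an assumption and not necessarily the exact Box and Jenkins (1976) interval used for the table.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def transforms(yields):
    """First differences, ratios and their squared counterparts."""
    diffs = [b - a for a, b in zip(yields, yields[1:])]
    ratios = [b / a for a, b in zip(yields, yields[1:])]
    mean_ratio = sum(ratios) / len(ratios)
    return {
        "diff": diffs,
        "diff_sq": [d * d for d in diffs],
        "ratio": ratios,
        "ratio_sq": [(r - mean_ratio) ** 2 for r in ratios],
    }


def is_significant(rho, n_obs, z=1.96):
    """Assumed white-noise band: |rho| > z / sqrt(n_obs)."""
    return abs(rho) > z / math.sqrt(n_obs)
```

With 372 monthly yields there are 371 differences or ratios, so this band is roughly ±0.102. That appears consistent with the pattern of stars in Table 3.2, where entries of 0.10 are unstarred and entries of 0.11 are starred.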
Table 3.3: Parameter estimates

Parameter                  Estimated value   Q2.5      Q97.5
$\hat{\mu}_1$              -0.247            -1.170    0.911
$\hat{\mu}_2$              -0.006            -0.992    1.158
$\hat{\mu}_3$              -0.408            -1.164    0.895

$\hat{\Phi}_{11}$           0.991*            0.926    1.021
$\hat{\Phi}_{21}$          -0.031            -0.094    0.032
$\hat{\Phi}_{31}$           0.070            -0.102    0.154
$\hat{\Phi}_{12}$           0.024            -0.037    0.068
$\hat{\Phi}_{22}$           0.933*            0.888    1.013
$\hat{\Phi}_{32}$           0.036            -0.140    0.185
$\hat{\Phi}_{13}$           0.000            -0.035    0.062
$\hat{\Phi}_{23}$           0.038            -0.015    0.082
$\hat{\Phi}_{33}$           0.771*            0.755    0.975

$\hat{\Sigma}_{11}$         0.162*            0.086    0.306
$\hat{\Sigma}_{21}$        -0.051            -0.192    0.042
$\hat{\Sigma}_{31}$        -0.110            -0.302    0.014
$\hat{\Sigma}_{22}$         0.324*            0.067    0.305
$\hat{\Sigma}_{32}$         0.009            -0.170    0.071
$\hat{\Sigma}_{33}$         0.596*            0.150    0.532

$\hat{\lambda}_{0,1}$      -0.215            -3.672    1.967
$\hat{\lambda}_{0,2}$      -0.354            -3.043    1.995
$\hat{\lambda}_{0,3}$       0.297            -2.390    3.053

$\hat{\lambda}_{1,11}$     -0.062            -0.470    1.262
$\hat{\lambda}_{1,21}$     -0.123            -0.799    0.523
$\hat{\lambda}_{1,31}$      0.124            -1.098    0.728
$\hat{\lambda}_{1,12}$      0.117            -2.734    1.051
$\hat{\lambda}_{1,22}$     -0.049            -0.633    1.343
$\hat{\lambda}_{1,32}$      0.150            -1.080    1.378
$\hat{\lambda}_{1,13}$     -0.187            -4.208    0.209
$\hat{\lambda}_{1,23}$     -0.169            -2.238   -0.019
$\hat{\lambda}_{1,33}$     -0.024            -0.399    3.209

$\hat{a}_1^{NA}$            0.537*            0.115    1.202

$\hat{b}_1^{NA}(1)$         0.168*            0.064    0.390
$\hat{b}_1^{NA}(2)$         0.146*            0.061    0.623
$\hat{b}_1^{NA}(3)$         0.000            -0.039    0.023

Estimated parameters from the no-arbitrage model in equations (3.2.6) to (3.2.12) with the 95 percent confidence intervals obtained by resampling. The confidence intervals [Q2.5 Q97.5] refer to the empirical 2.5 percent and 97.5 percent quantiles of the distributions of the parameters. A star * is used to indicate when a parameter estimate is significantly different from zero at a 95 percent level of confidence.
Table 3.4: Estimation results for $a^{NA}$

  τ     $a^{NS}$   $\hat{a}^{NA}$   $\hat{a}^{NA}_L$   $\hat{a}^{NA}_U$
  1     0.00        0.00            -0.10              0.05
  3     0.00        0.00            -0.04              0.05
  6     0.00        0.00            -0.02              0.06
  9     0.00        0.01            -0.02              0.05
 12     0.00        0.01            -0.02              0.05
 15     0.00        0.00            -0.02              0.04
 18     0.00        0.00            -0.02              0.03
 21     0.00        0.00            -0.03              0.02
 24     0.00        0.00            -0.04              0.01
 30     0.00        0.00            -0.05              0.01
 36     0.00       -0.01            -0.06              0.02
 48     0.00       -0.01            -0.07              0.03
 60     0.00       -0.01            -0.06              0.03
 72     0.00        0.00            -0.04              0.03
 84     0.00        0.00            -0.02              0.02
 96     0.00        0.00            -0.01              0.04
108     0.00        0.01            -0.02              0.07
120     0.00        0.01            -0.04              0.10

Estimated intercepts from the no-arbitrage model $\hat{a}^{NA}$ with the 95 percent confidence intervals obtained from the resampling, [$\hat{a}^{NA}_L$ $\hat{a}^{NA}_U$]. The confidence intervals refer to the empirical 2.5 percent and 97.5 percent quantiles of the distribution of the parameters. The second column of the table reports the Nelson-Siegel intercepts.

Table 3.5: Estimation results for $b^{NA}(1)$

  τ     $b^{NS}(1)$   $\hat{b}^{NA}(1)$   $\hat{b}^{NA}_L(1)$   $\hat{b}^{NA}_U(1)$
  1     1.00          0.98                0.87                  1.16
  3     1.00          0.99                0.90                  1.06
  6     1.00          0.99                0.89                  1.04
  9     1.00          1.00                0.92                  1.04
 12     1.00          1.00                0.93                  1.04
 15     1.00          1.00                0.94                  1.04
 18     1.00          1.00                0.96                  1.05
 21     1.00          1.00                0.97                  1.06
 24     1.00          1.00                0.98                  1.06
 30     1.00          1.01                0.98                  1.08
 36     1.00          1.01                0.96                  1.10
 48     1.00          1.00                0.95                  1.10
 60     1.00          1.00                0.95                  1.09
 72     1.00          1.00                0.95                  1.06
 84     1.00          1.00                0.96                  1.03
 96     1.00          1.00                0.92                  1.01
108     1.00          0.99                0.88                  1.04
120     1.00          0.99                0.82                  1.08

Estimated loadings of the level factor from the no-arbitrage model $\hat{b}^{NA}(1)$ with the 95 percent confidence intervals obtained from the resampling, [$\hat{b}^{NA}_L(1)$ $\hat{b}^{NA}_U(1)$]. The confidence intervals refer to the empirical 2.5 percent and 97.5 percent quantiles of the distribution of the parameters. The second column of the table reports the Nelson-Siegel loadings on the level.
Table 3.6: Estimation results for $b^{NA}(2)$

  τ     $b^{NS}(2)$   $\hat{b}^{NA}(2)$   $\hat{b}^{NA}_L(2)$   $\hat{b}^{NA}_U(2)$
  1     0.97          0.93                0.83                  1.08
  3     0.91          0.89                0.83                  0.98
  6     0.84          0.83                0.77                  0.92
  9     0.77          0.77                0.71                  0.84
 12     0.71          0.72                0.66                  0.76
 15     0.66          0.66                0.62                  0.70
 18     0.61          0.62                0.57                  0.64
 21     0.56          0.57                0.52                  0.59
 24     0.53          0.53                0.48                  0.56
 30     0.46          0.46                0.40                  0.50
 36     0.41          0.41                0.35                  0.45
 48     0.32          0.32                0.27                  0.38
 60     0.27          0.26                0.23                  0.32
 72     0.23          0.22                0.20                  0.26
 84     0.19          0.19                0.18                  0.22
 96     0.17          0.17                0.15                  0.21
108     0.15          0.15                0.11                  0.20
120     0.14          0.13                0.07                  0.19

Estimated loadings of the slope factor from the no-arbitrage model $\hat{b}^{NA}(2)$ with the 95 percent confidence intervals obtained from the resampling, [$\hat{b}^{NA}_L(2)$ $\hat{b}^{NA}_U(2)$]. The confidence intervals refer to the empirical 2.5 percent and 97.5 percent quantiles of the distribution of the parameters. The second column of the table reports the Nelson-Siegel loadings on the slope.

Table 3.7: Estimation results for $b^{NA}(3)$

  τ     $b^{NS}(3)$   $\hat{b}^{NA}(3)$   $\hat{b}^{NA}_L(3)$   $\hat{b}^{NA}_U(3)$
  1     0.03          0.00                -0.10                 0.06
  3     0.08          0.10                 0.05                 0.18
  6     0.14          0.19                 0.13                 0.26
  9     0.19          0.24                 0.17                 0.27
 12     0.23          0.26                 0.21                 0.28
 15     0.25          0.27                 0.23                 0.29
 18     0.27          0.28                 0.24                 0.30
 21     0.29          0.28                 0.23                 0.30
 24     0.29          0.27                 0.24                 0.30
 30     0.30          0.26                 0.23                 0.31
 36     0.29          0.25                 0.23                 0.31
 48     0.27          0.23                 0.22                 0.29
 60     0.24          0.21                 0.20                 0.27
 72     0.21          0.20                 0.19                 0.23
 84     0.19          0.19                 0.18                 0.22
 96     0.17          0.19                 0.16                 0.21
108     0.15          0.18                 0.13                 0.21
120     0.14          0.18                 0.11                 0.21

Estimated loadings of the curvature factor from the no-arbitrage model $\hat{b}^{NA}(3)$ with the 95 percent confidence intervals obtained from the resampling, [$\hat{b}^{NA}_L(3)$ $\hat{b}^{NA}_U(3)$]. The confidence intervals refer to the empirical 2.5 percent and 97.5 percent quantiles of the distribution of the parameters. The second column of the table reports the Nelson-Siegel loadings on the curvature.

Table 3.8: Summary statistics for the resampled parameters

Intercept $\hat{a}^{NA}$

  τ     mean    st.dev.   skewness   kurtosis
  3      0.00   0.02       0.11       9.66
 12      0.01   0.02      -0.24       8.91
 24      0.00   0.01      -3.11      18.77
 60     -0.01   0.02       0.34       9.25
 84      0.00   0.01       5.49      57.71
120      0.02   0.04       1.06       7.71

Loading of the level $\hat{b}^{NA}(1)$

  τ     mean    st.dev.   skewness   kurtosis
  3     0.99    0.04       0.28       9.39
 12     0.99    0.03       0.76       9.02
 24     1.01    0.02       2.85      17.25
 60     1.01    0.04      -0.88      10.97
 84     1.00    0.02      -5.66      60.42
120     0.97    0.06      -1.03       8.17

Loading of the slope $\hat{b}^{NA}(2)$

  τ     mean    st.dev.   skewness   kurtosis
  3     0.91    0.03       0.47       5.56
 12     0.71    0.02      -0.08       3.45
 24     0.53    0.02      -0.99       6.67
 60     0.27    0.02       0.52       5.01
 84     0.20    0.01       3.00      34.43
120     0.14    0.03      -0.10       3.97

Loading of the curvature $\hat{b}^{NA}(3)$

  τ     mean    st.dev.   skewness   kurtosis
  3     0.10    0.03       0.93       3.39
 12     0.25    0.02      -0.52       4.59
 24     0.28    0.02      -0.73       2.71
 60     0.22    0.02       1.72       8.99
 84     0.19    0.01       1.05       5.42
120     0.16    0.02      -0.85       6.80

Summary statistics of the empirical distributions of the estimated parameters obtained using resampled data.

Table 3.9: Measures of fit

Residuals from the Nelson-Siegel model

  τ     mean     st dev   min      max     RMSE    MAD     ρ(1)    ρ(6)    ρ(12)
  1     -0.159   0.200    -1.046   0.387   0.200   0.040   0.513   0.332    0.443
  3      0.027   0.114    -0.496   0.584   0.114   0.013   0.274   0.159    0.326
  6      0.091   0.135    -0.412   0.680   0.135   0.018   0.543   0.346    0.471
 12      0.046   0.122    -0.279   0.483   0.122   0.015   0.586   0.127    0.289
 24     -0.040   0.073    -0.398   0.261   0.073   0.005   0.493   0.044    0.153
 36     -0.066   0.090    -0.432   0.339   0.089   0.008   0.417   0.256    0.183
 60     -0.053   0.096    -0.520   0.292   0.096   0.009   0.655   0.312   -0.037
 84      0.006   0.097    -0.446   0.337   0.096   0.009   0.518   0.159   -0.083
120      0.002   0.140    -0.763   0.436   0.140   0.020   0.699   0.345    0.091

Residuals from the no-arbitrage model

  τ     mean     st dev   min      max     RMSE    MAD     ρ(1)    ρ(6)    ρ(12)
  1      0.000   0.168    -0.730   0.752   0.168   0.028   0.361   0.197    0.363
  3      0.080   0.132    -0.508   0.817   0.132   0.018   0.448   0.219    0.312
  6      0.060   0.135    -0.295   0.795   0.134   0.018   0.579   0.361    0.432
 12     -0.019   0.109    -0.355   0.439   0.109   0.012   0.514   0.147    0.306
 24     -0.041   0.071    -0.323   0.217   0.071   0.005   0.491   0.134    0.096
 36     -0.018   0.088    -0.286   0.405   0.088   0.008   0.474   0.320    0.263
 60      0.004   0.100    -0.332   0.379   0.100   0.010   0.688   0.350    0.101
 84      0.019   0.097    -0.479   0.343   0.097   0.009   0.527   0.157   -0.070
120     -0.060   0.144    -0.801   0.375   0.144   0.021   0.705   0.464    0.249

Summary statistics of residuals of the Nelson-Siegel and the no-arbitrage models. The Nelson-Siegel model is estimated according to equations (3.2.2) - (3.2.4). The no-arbitrage yield curve model is estimated according to equations (3.2.6) - (3.2.12). Statistics are shown for selected maturities, τ. RMSE is the root mean squared error and MAD is the mean absolute deviation. Autocorrelations are denoted by ρ(p), where p is the lag.

Table 3.10: Out-of-sample performance

        1-m ahead       6-m ahead       12-m ahead
  τ     NS      NA      NS      NA      NS      NA
  1     0.82    0.67    0.67    0.56    0.66    0.59
  3     0.91    0.89    0.72    0.70    0.64    0.63
  6     1.08    1.03    0.81    0.82    0.65    0.67
  9     1.06    1.21    0.80    0.83    0.64    0.66
 12     1.01    1.00    0.80    0.81    0.64    0.65
 15     1.06    0.98    0.79    0.79    0.64    0.65
 18     1.04    1.03    0.80    0.80    0.65    0.65
 21     1.06    1.07    0.80    0.80    0.66    0.66
 24     1.09    1.11    0.80    0.80    0.67    0.67
 30     1.04    1.04    0.80    0.78    0.68    0.67
 36     0.99    0.98    0.80    0.78    0.70    0.69
 48     0.98    0.98    0.84    0.81    0.76    0.73
 60     1.10    1.04    0.88    0.85    0.81    0.79
 72     1.02    1.01    0.90    0.88    0.85    0.84
 84     1.08    1.08    0.91    0.91    0.87    0.86
 96     1.03    1.03    0.93    0.94    0.91    0.92
108     1.04    1.08    0.95    0.98    0.93    0.96
120     1.08    1.32    1.02    1.08    1.00    1.05

Ratios of the Mean Squared Forecast Error (MSFE) of the no-arbitrage model (NA) and the Nelson-Siegel model (NS), both measured against the performance of the random walk model. A ratio lower than 1 means that the MSFE for the respective model is lower than the forecast error generated by the random walk, and hence that the model performs better than the random walk model. The models are estimated on successively increasing data samples starting 1970:1 until the time the forecast is made, and expanded by one month each time a new set of forecasts is generated. Forecasts for horizons of 1, 6 and 12 months ahead are evaluated on the sample from 1994:1 to 2000:12. Bold entries in the table indicate superior performance of one model (NA or NS) against the other model.
Table 3.11: Diebold-Mariano test statistics

  τ     1-m ahead   6-m ahead   12-m ahead
  1     -0.080      -0.214      -0.250
  3     -0.037      -0.129      -0.146
  6     -0.051       0.132       0.262
  9      0.147       0.159       0.222
 12     -0.015       0.085       0.154
 15     -0.117       0.021       0.098
 18     -0.040       0.017       0.086
 21      0.048      -0.025       0.046
 24      0.070      -0.318      -0.165
 30     -0.003      -0.174      -0.290
 36     -0.022      -0.149      -0.239
 48      0.002      -0.128      -0.215
 60     -0.082      -0.153      -0.233
 72     -0.025      -0.121      -0.215
 84     -0.007      -0.047      -0.166
 96     -0.016       0.315       0.447
108      0.069       0.231       0.322
120      0.266       0.290       0.366

Diebold-Mariano test statistic to compare the forecast accuracy of two models. We compare the no-arbitrage model against the Nelson-Siegel model. Negative numbers reflect superiority of the no-arbitrage model, and positive numbers indicate that the Nelson-Siegel model performs better. The null hypothesis is that the mean squared forecast error of the two models is identical. A number larger than 1.96 in absolute terms indicates that the forecasts produced by the models are significantly different at a 5 percent level.

Figure 3.1: Nelson-Siegel factor loadings

[Plot: level, slope and curvature loadings against maturity in months.]

Nelson and Siegel (1987) factor loadings using the re-parameterized version of the model as presented by Diebold and Li (2006). The factor loadings $b^{NS}$ are computed using λ = 0.0609 and equation (3.2.3).

Figure 3.2: No-Arbitrage Latent factors and Nelson and Siegel factors

[Plot: three time-series panels covering 1970 to 2000, showing NS level with NA factor 1, NS slope with NA factor 2, and NS curvature with NA factor 3.]

Extracted yield curve factors using US zero-coupon data observed at a monthly frequency and covering the period from 1970:1 to 2000:12. Factors are extracted from the Nelson-Siegel model and from the no-arbitrage model. “NS level” and “NA factor 1” refer to the first extracted factor from each model. The second and third extracted factors are correspondingly labeled “NS slope”, “NA factor 2” and “NS curvature”, “NA factor 3”.

Figure 3.3: Zero-coupon yields data

[Plot: surface of yields (percent) against maturity (months) and time.]

U.S. zero-coupon yield curve data observed at monthly frequency from 1970:1 to 2000:12 at maturities 1, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108 and 120 months.

Figure 3.4: No-Arbitrage loadings of the Nelson and Siegel factors

[Plot: four panels against maturity in months, titled Intercept, Loadings of level, Loadings of slope and Loadings of curvature.]

Estimated factor loadings and empirical 50 and 95 percent confidence intervals. Stars (*) indicate the factor loadings from the Nelson-Siegel model, i.e. $a^{NS}$ and $b^{NS}$ in equations (3.2.2) and (3.2.3), while the continuous lines indicate the corresponding factor loadings estimated from the no-arbitrage model, i.e. $\hat{a}^{NA}$ and $\hat{b}^{NA}$ in equations (3.2.6) to (3.2.8). The distributions of the latter are obtained through resampling. The dark-shaded areas are the 50 percent confidence intervals, while the light-shaded areas show the 95 percent confidence intervals. These are computed as empirical quantiles.

Figure 3.5: Distribution of the estimated loadings for $a^{NA}$

[Plot: four density panels, titled Intercept, maturity 3; Intercept, maturity 12; Intercept, maturity 60; Intercept, maturity 120.]

Empirical distributions, for selected maturities, of the no-arbitrage intercepts obtained from the resampling (continuous line), with the corresponding 95 percent confidence interval (asterisks). The dashed line is the Gaussian approximation with the corresponding 95 percent confidence intervals (circles). The diamonds are the estimated no-arbitrage intercepts and the dashed vertical line indicates the Nelson and Siegel intercepts.

Figure 3.6: Distribution of the estimated loadings for $b^{NA}(1)$

[Plot: four density panels, titled Loading on level, maturity 3; maturity 12; maturity 60; maturity 120.]

Empirical distributions, for selected maturities, of the no-arbitrage loadings of the level obtained from the resampling (continuous line), with the corresponding 95 percent confidence interval (asterisks). The dashed line is the Gaussian approximation with the corresponding 95 percent confidence intervals (circles). The diamonds are the estimated no-arbitrage loadings of the level and the dashed vertical line indicates the corresponding Nelson and Siegel loadings.

Figure 3.7: Distribution of the estimated loadings for $b^{NA}(2)$

[Plot: four density panels, titled Loading on slope, maturity 3; maturity 12; maturity 60; maturity 120.]

Empirical distributions, for selected maturities, of the no-arbitrage loadings of the slope obtained from the resampling (continuous line), with the corresponding 95 percent confidence interval (asterisks). The dashed line is the Gaussian approximation with the corresponding 95 percent confidence intervals (circles). The diamonds are the estimated no-arbitrage loadings of the slope and the dashed vertical line indicates the corresponding Nelson and Siegel loadings.
Figure 3.8: Distribution of the estimated loadings for $b^{NA}(3)$

[Plot: four density panels, titled Loading on curvature, maturity 3; maturity 12; maturity 60; maturity 120.]

Empirical distributions, for selected maturities, of the no-arbitrage loadings of the curvature obtained from the resampling (continuous line), with the corresponding 95 percent confidence interval (stars). The dashed line is the Gaussian approximation with the corresponding 95 percent confidence intervals (circles). The diamonds are the estimated no-arbitrage loadings of the curvature and the dashed vertical line indicates the corresponding Nelson and Siegel loadings.

Bibliography

Ahn, Dong-Hyun, Robert F. Dittmar, and Ronald A. Gallant (2002) ‘Quadratic Term Structure Models: Theory and Evidence.’ Review of Financial Studies 15(1), 243–288
Ang, A., and M. Piazzesi (2003) ‘A No-Arbitrage Vector Autoregression of Term Structure Dynamics with Macroeconomic and Latent Variables.’ Journal of Monetary Economics 50(4), 745–787
Ang, A., G. Bekaert, and M. Wei (2007) ‘The term structure of real rates and expected inflation.’ Journal of Finance, forthcoming
Ang, A., M. Piazzesi, and M. Wei (2006) ‘What Does the Yield Curve Tell us about GDP Growth.’ Journal of Econometrics 131, 359–403
Bai, J., and S. Ng (2002) ‘Determining the Number of Factors in Approximate Factor Models.’ Econometrica 70(1), 191–221
Bernadell, C., J. Coche, and K. Nyholm (2005) ‘Yield Curve Prediction for the Strategic Investor.’ ECB Working Paper 472
Bernanke, B.S., and J. Boivin (2003) ‘Monetary policy in a data-rich environment.’ Journal of Monetary Economics 50(3), 525–546
BIS (2005) Zero-Coupon Yield Curves: Technical Documentation (Bank for International Settlements, Basle)
Bjork, T., and B.J. Christensen (1999) ‘Interest Rate Dynamics and Consistent Forward Rate Curves.’ Mathematical Finance 9, 323–348
Black, F., and P. Karasinski (1993) ‘Bond and option pricing when the short rates are log-normal.’ Financial Analyst Journal 47, 52–59
Black, F., E. Derman, and E. Toy (1990) ‘A one-factor model of interest rates and its application to treasury bond options.’ Financial Analyst Journal 46, 33–39
Bofinger, E. (1975) ‘Estimation of a density function using order statistics.’ Australian Journal of Statistics 17(1), 1–7
Bouye, E., and M. Salmon (2002) ‘Dynamic Copula Quantile Regression and Tail Area Dynamic Dependence in Forex Markets.’ Manuscript, Financial Econometrics Research Center
Box, G.E.P., and G.M. Jenkins (1976) Time Series Analysis: Forecasting and Control (San Francisco: Holden Day)
Chen, R.R., and L. Scott (1993) ‘Maximum likelihood estimation for a multi-factor equilibrium model of the term structure of interest rates.’ Journal of Fixed Income 3, 14–31
Christensen, Jens, Francis Diebold, and Glenn Rudebusch (2007) ‘The Affine Arbitrage-Free Class of Nelson-Siegel Term Structure Models.’ FRB of San Francisco Working Paper No. 2007-20
Christoffersen, P.F. (1998) ‘Evaluating Interval Forecasts.’ International Economic Review 39(4), 841–862
Cox, J.C., J.E. Ingersoll, and S.A. Ross (1985) ‘A theory of the term structure of interest rates.’ Econometrica 53, 385–407
Dai, Q., and K. Singleton (2000) ‘Specification analysis of affine term structure models.’ Journal of Finance 55, 1943–1978
Dai, Q., and T. Philippon (2005) ‘Fiscal Policy and the Term Structure of Interest Rates.’ NBER working paper
De Pooter, M., F. Ravazzolo, and D. van Dijk (2007) ‘Predicting the Term Structure of Interest Rates: Incorporating Parameter Uncertainty and Macroeconomic Information.’ Tinbergen Institute Discussion Papers
Dewachter, H., and M. Lyrio (2006) ‘Macro Factors and the Term Structure of Interest Rates.’ Journal of Money, Credit, and Banking 38(1), 119–140
Diebold, F.X., and C. Li (2006) ‘Forecasting the term structure of government bond yields.’ Journal of Econometrics 130, 337–364
Diebold, F.X., G.D. Rudebusch, and S.B. Aruoba (2006) ‘The macroeconomy and the yield curve: A dynamic latent factor approach.’ Journal of Econometrics 131, 309–338
Diebold, F.X., L. Ji, and C. Li (2004) ‘A Three-Factor Yield Curve Model: Non-Affine Structure, Systematic Risk Sources, and Generalized Duration.’ Working paper, University of Pennsylvania
Diebold, F.X., M. Piazzesi, and G.D. Rudebusch (2005) ‘Modeling Bond Yields in Finance and Macroeconomics.’ American Economic Review 95, 415–420
Dionne, G., P. Duchesne, and M. Pacurar (2005) Intraday Value at Risk (IVaR) Using Tick-by-tick Data with Application to the Toronto Stock Exchange (HEC Montréal, Centre de recherche en e-finance)
Doz, C., D. Giannone, and L. Reichlin (2006) A Quasi Maximum Likelihood Approach for Large Approximate Dynamic Factor Models (Centre for Economic Policy Research)
Duffee, G.R. (2002) ‘Term premia and interest rate forecasts in affine models.’ Journal of Finance 57, 405–443
Duffie, D., and R. Kan (1996) ‘A yield-factor model of interest rates.’ Mathematical Finance 6, 379–406
Engle, R.F., and S. Manganelli (2004) ‘CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles.’ Journal of Business & Economic Statistics 22(4), 367–382
Fama, E.F., and J.D. MacBeth (1973) ‘Risk, Return, and Equilibrium: Empirical Tests.’ The Journal of Political Economy 81(3), 607–636
Fama, E.F., and R.R. Bliss (1987) ‘The Information in Long-Maturity Forward Rates.’ The American Economic Review 77(4), 680–692
Favero, C.A., L. Niu, and L. Sala (2007) ‘Term Structure Forecasting: No-Arbitrage Restrictions vs. Large Information Set.’ CEPR Discussion Paper
Fleming, M.J., and E.M. Remolona (1999) The Term Structure of Announcement Effects (Bank for International Settlements, Monetary and Economic Dept.)
Furfine, C. (2001) ‘Do macroeconomic announcements still drive the Treasury market.’ BIS Quarterly Review pp. 49–57
Giannone, D., L. Reichlin, and L. Sala (2005) ‘Monetary Policy in Real Time.’ NBER Macroeconomics Annual 2004
Giot, P. (2005) ‘Market risk models for intraday data.’ The European Journal of Finance 11(4), 309–324
Gonedes, N.J. (1973) ‘Evidence on the Information Content of Accounting Numbers: Accounting-Based and Market-Based Estimates of Systematic Risk.’ The Journal of Financial and Quantitative Analysis 8(3), 407–443
Hansen, B.E. (1994) ‘Autoregressive Conditional Density Estimation.’ International Economic Review 35(3), 705–730
Harvey, C.R., and A. Siddique (1999) ‘Autoregressive Conditional Skewness.’ The Journal of Financial and Quantitative Analysis 34(4), 465–487
Hendricks, W., and R. Koenker (1992) ‘Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity.’ Journal of the American Statistical Association
Hördahl, P., O. Tristani, and D. Vestin (2006) ‘A joint econometric model of macroeconomic and term-structure dynamics.’ Journal of Econometrics 131(1-2), 405–444
Jasiak, J., and C. Gourieroux (2006) ‘Dynamic quantile models.’ Working Papers 2006 (4), York University
Koenker, R. (2005) Quantile Regression (Cambridge University Press)
Koenker, R., and G. Bassett Jr (1978) ‘Regression Quantiles.’ Econometrica 46(1), 33–50
Kozicki, S., and P.A. Tinsley (2001) ‘Shifting endpoints in the term structure of interest rates.’ Journal of Monetary Economics 47(3), 613–652
Kupiec, P. (1995) ‘Techniques for verifying the accuracy of risk management models.’ Journal of Derivatives 3(2), 73–84
Law, P. (2006) ‘Macro factors and the yield curve.’ PhD dissertation, Stanford University
Litterman, R., and J. Scheinkman (1991) ‘Common factors affecting bond returns.’ Journal of Fixed Income 47, 129–1282
Ludvigson, S.C., and S. Ng (2005) ‘Macro Factors in Bond Risk Premia.’ NBER Working Paper
Merton, R.C. (1973) ‘Theory of rational option pricing.’ Bell Journal of Economics and Management Science 4, 141–183
Mönch, E. (2005) ‘Forecasting the yield curve in a data-rich environment: A no-arbitrage factor-augmented VAR approach.’ ECB Working Paper series No. 544
Mönch, E. (2006) ‘Term structure surprises: The predictive content of curvature, level, and slope.’ Working Paper
Nelson, C.R., and A.F. Siegel (1987) ‘Parsimonious modeling of yield curves.’ Journal of Business 60, 473–89
Portnoy, S., and R. Koenker (1997) ‘The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators.’ Statistical Science 12(4), 279–296
Rebonato, R., S. Mahal, M.S. Joshi, L. Bucholz, and K. Nyholm (2005) ‘Evolving Yield Curves in the Real-World Measures: A Semi-Parametric Approach.’ Journal of Risk 7, 29–62
Rudebusch, G.D., and T. Wu (2004) ‘A Macro-Finance Model of the Term Structure, Monetary Policy, and the Economy.’ Federal Reserve Bank of San Francisco Working Paper
Soderlind, P., and L.O.E. Svensson (1997) ‘New Techniques to Extract Market Expectations from Financial Instruments.’ Journal of Monetary Economics 40, 383–429
Vasicek, O. (1977) ‘An equilibrium characterization of the term structure.’ Journal of Financial Economics 5, 177–188
Wu, T. (2006) ‘Macro Factors and the Affine Term Structure of Interest Rates.’ Journal of Money, Credit, and Banking 38(7), 1847–1875
