Alma Mater Studiorum - Università di Bologna
Scuola di Dottorato in Scienze Economiche e Statistiche
Dottorato di ricerca in Metodologia Statistica per la Ricerca Scientifica
XXIV ciclo

A Multilevel Model with Time Series Components
for the Analysis of Tribal Art Prices

Lucia Modugno

Coordinator: Prof.ssa Daniela Cocchi
Tutor: Prof. Rodolfo Rosa
Co-Tutors: Dott. Simone Giannerini, Dott.ssa Silvia Cagnone

Settore Disciplinare: SECS-S/01
Settore Concorsuale: 13/D1

Dipartimento di Scienze Statistiche "Paolo Fortunati"
January 2012
Acknowledgments
I am really grateful to Simone Giannerini and Silvia Cagnone, who have made this work possible with crucial insights and discussions but, most of all, have supported me and believed in me. Special thanks also to my mum and to Matteo for their great patience and for always encouraging me in my moments of discouragement.
Abstract
In the present work we perform an econometric analysis of the Tribal art market. To this aim, we use a unique and original database that includes information on Tribal art auctions worldwide from 1998 to 2011. In the literature, art prices are modelled through the hedonic regression model, a classic fixed-effects model. The main drawback of the hedonic approach is the large number of parameters, since, in general, art data include many categorical variables. In this work, we propose a multilevel model for the analysis of Tribal art prices that takes into account the influence of time on artwork prices. In fact, it is natural to assume that time exerts an influence over the price dynamics in various ways. Nevertheless, since the set of objects changes at every auction date, we do not have repeated measurements of the same items over time. Hence, the dataset does not constitute a proper panel; rather, it has a two-level structure in which items, the level-1 units, are grouped in time points, the level-2 units. The main theoretical contribution is the extension of classical multilevel models to cope with the case described above. In particular, we introduce a model with time-dependent random effects at the second level. We propose a novel specification of the model, derive the maximum likelihood estimators and implement them through the EM algorithm. We test the finite sample properties of the estimators and the validity of our own R code by means of a simulation study. Finally, we show that the new model considerably improves the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.
Contents

List of Tables
List of Figures
Abstract

1 Introduction
1.1 Review on price indexes in the art market
1.2 The first database of Tribal artworks
1.2.1 Descriptive analysis

2 The multilevel model
2.1 Introduction to multilevel analysis
2.2 Conventional approaches for multilevel data
2.2.1 "Rough" strategies
2.2.2 Fixed-effects models
2.2.3 Interactive models
2.2.4 Random-effects models
2.3 Model specification
2.4 Model estimation
2.4.1 Maximum likelihood estimation
2.4.2 Iterative GLS
2.5 Predicting the random effects
2.5.1 BLUP
2.5.2 Prediction of expected responses
2.5.3 Empirical Bayes prediction
2.5.4 Shrunken estimates
2.5.5 Standard errors of the predictors
2.6 Developments and applications of the multilevel model

3 Hedonic regression model and multilevel model for Tribal art prices
3.1 Models with no covariates
3.2 Models with covariates
3.2.1 Robust standard errors
3.2.2 Considerations

4 A multilevel model with time series components
4.1 Model specification
4.2 Full maximum likelihood estimation through the EM algorithm
4.3 Simulation study
4.4 The new model and Tribal art prices

5 Conclusions

References
List of Tables

1.1 Tribal art data: physical variables
1.2 Tribal art data: historical variables
1.3 Tribal art data: market variables
1.4 Tribal artworks by continent
1.5 Tribal artworks by type of object
1.6 Tribal artworks by material
1.7 Tribal artworks by patina
1.8 Tribal artworks by auction house and venue
1.9 Tribal artworks by type of last owner
3.1 Results of the null model, FE-intercept and RE-intercept models
3.2 Results of the FE-hedonic and RE-hedonic models
3.3 Shapiro-Wilk normality test for the residuals of the FE-hedonic and the RE-hedonic models
4.1 Scheme of the Monte Carlo study
4.2 Scenarios 1 and 2 of the MC study
4.3 Scenarios 3 and 4 of the MC study
4.4 Scenarios 5 and 6 of the MC study
4.5 Simulated forecast
4.6 Results of the AR-RE-hedonic model
4.7 Shapiro-Wilk normality tests for the residuals of the AR-RE-hedonic model
4.8 Forecasting
List of Figures

1.1 Density histogram of hammer prices
1.2 Time series of prices
1.3 Yearly turnover and percentage of sold items
1.4 Prices for type of illustration
1.5 Prices for type of description on the catalogue
1.6 Prices for quotations
1.7 Prices for exhibition
3.1 Plot of the time-specific intercepts for the FE-intercept and RE-intercept models
3.2 Boxplots of residuals by semester of the null model, FE-intercept and RE-intercept models
3.3 Plot of the time-specific intercepts for the FE-hedonic and RE-hedonic models
3.4 Residuals versus fitted values of the FE-hedonic and RE-hedonic models
3.5 Boxplots of residuals by semester for the FE-hedonic and the RE-hedonic models
3.6 Standard deviations of the level-1 residuals by semester of the RE-hedonic model
3.7 Normal probability plot of residuals of the FE-hedonic and the RE-hedonic models
3.8 Plots of autocorrelation functions of the residuals of the RE-hedonic model
4.1 Residuals versus fitted values of the AR-RE-hedonic model
4.2 Normal probability plots of residuals of the AR-RE-hedonic model
4.3 Residuals by semester for the AR-RE-hedonic and RE-hedonic model (3.5)
4.4 Plots of autocorrelation functions of the residuals of the AR-RE-hedonic model
4.5 Standard deviations of the level-1 residuals by semester of the AR-RE-hedonic model
4.6 Plot of the forecasting capabilities
Chapter 1
Introduction
Investing in artworks allows one to obtain high returns and greater fiscal advantages than investing in financial or housing markets. Moreover, the art market can be considered less volatile than other asset markets and, for this reason, artworks can be regarded as alternative investment items. Hence, evaluating the convenience of investing in the art market requires several pieces of information: the general market trend and the trend of the specific segment, the investment return in comparison with other assets, but also the investment risk and its correlation with other financial instruments.
Since artwork items are considered investment goods in the same way as stocks, bonds and real estate, in the past the analysis of this new market was performed by resorting to tools for the analysis of financial markets. However, these tools miss some essential aspects, mainly because, unlike stocks, which are exchanged many times at any instant, artworks are one-off pieces, hardly comparable with each other, and they pass through the market only a handful of times (usually only once). Therefore, the art market trend is more difficult to evaluate than the stock market trend (Figini, 2007).
Anderson (1974) and Stein (1977) were the first to study investment in the art market. Later, Baumol (1986)'s results on the very low art gain in the long term (only 0.55% between 1600 and 1950) paved the way for numerous studies on different art markets, most notably Impressionist art.
The same interest was not shown in studying the Tribal art market until recent years (Figini, 2007).
In the present thesis we perform an econometric analysis of the Tribal art market. The relevant data come from the first worldwide database of Tribal art prices, which contains more than 20000 records of items sold from 1998 to 2011 by the most important auction houses. At each auction date, the ensemble of objects to be sold is put together in a catalogue. The selling price and the most important characteristics of each object in the catalogue are recorded, as presented in section 1.2. The database has been built by a team of researchers of the University of Bologna, Faculty of Economics – Rimini, and it is a unique source of information.
Among the existing methods to build price indexes for artworks, which will be briefly reviewed in the next section, the most suitable for fitting our data is the hedonic regression model. It is a fixed-effects model that regresses the price on object features and includes fixed time effects.
This thesis makes two innovative contributions. First, we propose a multilevel model for the analysis of Tribal art prices. To our knowledge, this modelling framework has not yet been applied to this kind of problem. In chapter 2, a literature review of the multilevel model is presented. As we will show in chapter 3, this approach gives a substantial advantage over fixed-effects models, especially in terms of degrees of freedom, parsimony and interpretability.
Now, it is natural to assume that time exerts an influence over the price dynamics in various ways. In our case, this assumption has been verified through the diagnostic analysis performed on the classical multilevel model, which postulates independent effects over time. Nevertheless, since the set of objects changes at every auction date, we do not have repeated measurements of the same items over time. Hence, the dataset does not constitute a proper panel.
The main theoretical contribution of the thesis is the extension of classical multilevel models to cope with the case under investigation. In particular, we introduce a model with time-dependent random effects at the second level. In chapter 4, we propose a novel specification of the model, derive the maximum likelihood estimators for it and implement the whole framework and the EM algorithm in R without resorting to third-party software.
We have tested the finite sample properties of the estimators and the validity of the software implementation through a simulation study.
Finally, the new model has been fit successfully to the Tribal art database and the main conclusions are drawn.
1.1 Review on price indexes in the art market
The literature on the economics of art has proposed several methods to build price indexes for artworks, especially for paintings. In the following, we mention the most important proposals.
Sotheby's Art Index (and similar others) is constructed by taking the average price of a group of artworks considered representative of the market at that moment. Obviously, it is not objective, since it requires some experts to select the representative sample.
The average painting methodology (Stein, 1977) constructs the index on paintings with certain "average" characteristics. This index is also not objective, since it requires choosing the characteristics considered average, although its degree of subjectivity is smaller than that of the previous index.
The representative painting method (Candela and Scorcu, 1997) constructs the index on a sample of paintings selected according to their price pattern, rather than to certain characteristics. This method is less subjective, since it is based on empirical and statistical arguments.
The repeated sales regression (Goetzmann, 1993) considers the prices of the same object exchanged twice. However, artworks are rarely resold; therefore, the resold objects are not exactly representative of the whole market. One of the most famous indexes constructed in this way is that of Mei and Moses (2002).
The hedonic regression, also called the grey painting method, was originally developed for pricing houses (Rosen, 1974). Some applications to the art market can be found, for example, in Chanel (1995) and Locatelli-Biey and Zanola (2005).
This approach assumes that the price of an artwork depends both on the market trend and on the effect of certain object characteristics. Using past information, the price is regressed on the relevant features; the resulting feature prices can then be used to forecast the price of another object by summing the prices of its characteristics.
More formally, the hedonic regression is a multiple linear regression:

$$y_{it} = \beta_{0t} + \beta_1 X_{1it} + \dots + \beta_k X_{kit} + \epsilon_{it}$$

where $y_{it}$ is the price of object $i$ sold in period $t$, the $X_{jit}$'s are the $k$ object features (the artist's name, the type of object, the material, the technique, etc.), and $\epsilon_{it}$ is an error process with zero mean and constant variance. The estimated coefficients $\hat\beta_j$, for $j = 1, \dots, k$, are interpreted as the prices of the single characteristics, the so-called shadow prices, assumed to be constant in time. The estimated coefficient $\hat\beta_{0t}$ expresses, instead, the value of the grey painting in period $t$, that is, the value of an artwork created by a standard artist, with standard techniques, standard dimensions, etc. (Candela and Scorcu, 2004). The market price index is built from the prices of this grey painting in the different periods, $\hat\beta_{01}, \dots, \hat\beta_{0t}, \dots, \hat\beta_{0T}$, as sketched below.
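To make the construction concrete, the following minimal R sketch fits a hedonic regression with time dummies on simulated data and extracts the period coefficients as a price index. The data frame and its columns (log_price, period, material, height) are hypothetical stand-ins for the actual hedonic variables.

```r
# Minimal sketch of a hedonic index, assuming a hypothetical data frame
# `sales` with columns log_price, period (auction semester as a factor)
# and two illustrative object features.
set.seed(1)
sales <- data.frame(
  log_price = rnorm(600, mean = 8),
  period    = factor(rep(1:12, each = 50)),
  material  = factor(sample(c("wood", "ivory", "stone"), 600, replace = TRUE)),
  height    = runif(600, 10, 120)
)

# Hedonic regression: removing the common intercept makes every period
# dummy's coefficient an estimate of beta_0t, the "grey painting" value.
fit <- lm(log_price ~ 0 + period + material + height, data = sales)

# The market price index is the sequence of estimated period effects.
index <- coef(fit)[grep("^period", names(coef(fit)))]
plot(index, type = "b", xlab = "Period", ylab = "Hedonic index (log price)")
```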
This method has been widely used for the painting market since, contrary to other proposals, it solves the problem of artwork heterogeneity by explaining prices through object features (Figini, 2007). Moreover, it allows one to construct a market price index that neutralizes the effect of quality.
Nevertheless, the hedonic regression has some drawbacks. First of all, it is difficult to account for all the characteristics relevant for determining the price of an object, so this method can explain only part of the price. Moreover, most of the object features are categorical, such as, for example, the artist's name, which in the Western market strongly affects the price of artworks. Therefore, the regression equation will contain many dummy variables: if, for example, the database includes observations from 100 artists, then there will be 99 indicator variables for this characteristic alone, and the same happens for each categorical variable inserted in the model. This means that there will be a high number of parameters to estimate, so the resulting models are not parsimonious.
1.2 The first database of Tribal artworks
Constructing price indexes requires data on sales of artworks. In the art market, the only available information is that coming from auction exchanges. Nowadays, there are companies[1] that publish information about auctions and price indexes through the web, and provide art valuations and other services. However, most of these companies deal with Western art. In this scenario, for a long time there was no database on Ethnic art. In recent years, however, the Tribal art market has experienced increasing turnover, attracting the interest of investors and economists.
In order to fill the lack of information in this neglected segment of art, and given the confidence in its commercial potential, the first database on Ethnic artworks has been created through the agreement of four institutions: the Department of Economics of the University of Italian Switzerland, the Museum of Extra-European Cultures in Lugano, the Museo degli Sguardi in Rimini, and the Faculty of Economics of the University of Bologna, campus of Rimini[2]. The database includes more than 20000 observations auctioned by the major auction houses during 1998-2011. The information has been collected from the paper catalogues released by the auction houses before the auctions.
For each object, 37 features have been gathered; these include physical, historical and market characteristics (see Figini (2007)), listed respectively in Tables 1.1, 1.2 and 1.3.

[1] Artnet.com, Artinfo.com, Arsvalue.com and Artprice.com are some of them.
[2] The project managers are the economists Guido Candela and Antonello E. Scorcu, and the statistician Simone Giannerini.
Table 1.1: Artworks physical variables (labels in parentheses) with corresponding levels.

| Variable | Levels |
| Type of object (OGG) | Furniture, Sticks, Masks, Religious objects, Ornaments, Sculptures, Musical instruments, Tools, Clothing, Textiles, Weapons, Jewels |
| Condition (CDCA) | Passable, Good, Very good |
| Gaps and repairs (CLIO) | Yes, No |
| Material (MATP) | Ivory, Vegetable fibre, Wood, Metal, Gold, Stone, Precious stone, Terracotta/ceramic, Silver, Textile and hides, Seashell, Bone/horn, Not indicated |
| Patina (CPAT) | Not indicated, Pejorative, Present, Appreciative |
| Height, width, diameter, length (MISA, MISL, MISD, MISN) | (quantitative) |
Among the existing proposals for modelling art prices, the hedonic regression method is the one that best fits Tribal art data. Indeed, this approach seems even more suitable for Ethnic art than for Western art data. An ethnic object, in fact, is characterized by its ethnic provenance rather than by the artist's name, which is unknown (Tribal art is considered an anonymous art). Since the number of ethnic groups is generally smaller than the number of artists' names, the hedonic model for Tribal art has fewer dummy variables than the same model applied to another segment of art. Moreover, the range of iconographic subjects and materials is also more limited. Therefore, the main drawback of the hedonic regression method is less severe in the Tribal segment (Figini, 2007).
Table 1.2: Artworks historical variables (labels in parentheses) with corresponding levels.

| Variable | Levels |
| Dating (DATA) | |
| Continent, Region, Stylistic area, Ethnic group (CONT, REG, ACSTI, ETNIA) | 4 Continents, 8 Regions, 89 Stylistic areas, 158 Ethnic groups |
| Illustration on the catalogue (CAIL) | Absent, Black/white ill., Col. ill. |
| Illustration width (CAAI) | Absent, Miscellaneous ill., Quarter page, Half page, Full page, More than one ill., Cover |
| Description (CATD) | Absent, Short visual descr., Visual descr., Broad visual descr., Critical descr., Broad critical descr. |
| Specialized bibliography (CABS) | Yes, No |
| Comparative bibliography (CABC) | Yes, No |
| Exhibition (CAES) | Yes, No |
| Historicization (CAST) | Absent, Museum certification, Relevant museum certification, Simple certification |
| Last owner (CAUA) | Unknown, Museum, Private individual, Other, Art gallery, Company |

1.2.1 Descriptive analysis
In this subsection, some descriptive analyses of the Tribal art dataset are presented. Hammer prices have been deflated through the HICP (Harmonized Index of Consumer Prices) and converted into euros.
Figure 1.1 shows the comparison between the density curve of the logarithm (base 10) of hammer prices and a Gaussian curve with the same mean and variance as the distribution of prices: the price distribution looks slightly leptokurtic with respect to the normal distribution and fairly symmetric.
Figure 1.2 displays the series of prices aggregated by year.
Table 1.3: Artworks market variables (labels in parentheses) with corresponding levels.

| Variable | Levels |
| Auction date (ASDA) | |
| Venue (ASLU) | New York, Paris, Zurich, Amsterdam |
| Auction house (ASNC) | Sotheby's, Christie's, Encheres Rive Gauche, Piasa, Koller, Bonhams |
| Importance of the auction house (ASRC) | National, International |
| State of business (ASSA) | Ceased, Existent |
| Auction type (ASTP) | Heterogeneous, Single collection, Homogeneous |
| Auction title (ASTT) | |
| # of items on the catalogue (CANC) | (quantitative) |
| Number of lot (CANO) | (quantitative) |
| Currency (CAVS) | Euro, Dollar, Franc |
| Minimum and Maximum estimation (VESM, VESX) | (quantitative) |
| Hammer price (PRICE) | (quantitative) |
| Buy-in and Return of the object (VENB, VEOR) | Yes, No |
It reveals that the most unsatisfying year for the Ethnic art market was 2003, which was, however, also the year with the highest number of sold artworks, both in absolute and in percentage terms, as shown by the plot in Figure 1.3. After this period, the market recorded a gradual increase in prices and overall turnover (plot in Figure 1.3). This positive trend gives an idea of the great potential of the Tribal art market.
The Ethnic artworks made by African ethnic groups are the most sold at auction and, as can be read in Table 1.4, are the items with the highest average hammer price compared with American, Eurasian and Oceanic artworks. By looking at the coefficient of variation, it seems that African artworks are also the riskiest investment items. However, aggregating the objects by continent can only give an idea of the importance of each macro-group, since Tribal artworks are characterized by their ethnic group, just as Modern
artworks are characterized by the artist's name and the artistic style.

Figure 1.1: Density histogram of hammer prices on logarithmic scale (base 10), overlaid by a normal curve with the same mean and variance and a kernel estimate of the density.
Table 1.4: Tribal artworks by continent of provenance. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Continent | # obj. auctioned | % of sales | Average price | Median | CV |
| Eurasia | 426 | 75 | 7028 | 1532 | 2.58 |
| Africa | 10604 | 67 | 24474 | 4000 | 5.07 |
| Oceania | 3091 | 77 | 21471 | 4461 | 3.31 |
| America | 5706 | 75 | 12918 | 4699 | 3.24 |

About the type of auctioned object, Table 1.5 reveals that sculptures, tools
and masks are the most sold pieces. Sculptures and masks, together with religious objects, are also the most highly priced compared with other types of object.
As is evident from Table 1.6, most Ethnic artworks are made principally of wood.
Figure 1.2: Time series of prices in logarithmic scale (base 10). The amounts of sold items are shown within the boxes.

Table 1.5: Tribal artworks by type of object. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Type of object | # obj. auctioned | % of sales | Average price | Median | CV |
| Jewels | 442 | 71 | 9406 | 2569 | 3.29 |
| Weapons | 750 | 72 | 8608 | 2840 | 2.78 |
| Sticks | 763 | 74 | 13047 | 3318 | 3.63 |
| Musical instruments | 399 | 68 | 17222 | 3824 | 4.00 |
| Ornaments | 1561 | 74 | 15737 | 3886 | 2.93 |
| Tools | 4134 | 77 | 11055 | 4224 | 3.86 |
| Clothing | 387 | 72 | 17073 | 4224 | 3.04 |
| Furniture | 663 | 72 | 35427 | 5207 | 7.61 |
| Textiles | 338 | 80 | 14262 | 5977 | 2.70 |
| Masks | 3083 | 64 | 23772 | 6211 | 2.94 |
| Sculptures | 5927 | 69 | 34687 | 6769 | 4.06 |
| Religious objects | 1380 | 72 | 33367 | 6910 | 3.29 |
However, stone objects seem precious and also less risky than others. Artworks made of precious stone, wood and ceramic are also sold at high prices. Moreover, the most common and most highly priced materials are the main materials of the most auctioned and most highly paid types of object, namely wooden masks and stone or ceramic tools and sculptures.
Figure 1.3: Yearly turnover (euro) in logarithmic scale (base 10) and yearly percentage of sold items with respect to the total amount of objects auctioned. The year 2011 has not been included in the first plot since the database contains information for only one semester of it.
Table 1.6: Tribal artworks by material. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Material | # obj. auctioned | % of sales | Average price | Median | CV |
| Metal | 829 | 72 | 25372 | 1758 | 6.88 |
| Bone, horn | 263 | 80 | 7513 | 2028 | 2.94 |
| Seashell | 155 | 74 | 11204 | 2421 | 3.05 |
| Silver | 137 | 67 | 5686 | 2591 | 1.88 |
| Ivory | 649 | 74 | 14753 | 3549 | 4.83 |
| Not indicated | 136 | 82 | 9061 | 3861 | 2.15 |
| Vegetable fibre, paper, plumage | 565 | 77 | 11469 | 4023 | 3.09 |
| Gold | 612 | 73 | 10571 | 4180 | 1.89 |
| Textile and hides | 849 | 75 | 14319 | 4830 | 3.02 |
| Wood | 11733 | 69 | 28520 | 5387 | 4.29 |
| Terracotta, ceramic | 2625 | 76 | 13004 | 5617 | 3.06 |
| Precious stone | 521 | 69 | 30355 | 7336 | 3.41 |
| Stone | 753 | 72 | 13836 | 7505 | 1.51 |
Many Ethnic artworks have a patina, which can assume different interpretations depending on the type of object and its original function. When it is interpreted as a sign of use or genuineness, the patina adds value to the object, as indicated in Table 1.7 for the "appreciative" patina.
Table 1.7: Tribal artworks by patina. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Patina | # obj. auctioned | % of sales | Average price | Median | CV |
| Absent | 11111 | 72 | 18359 | 4451 | 5.21 |
| Present | 4390 | 69 | 19127 | 5135 | 3.26 |
| Pejorative | 141 | 81 | 16034 | 5239 | 2.23 |
| Appreciative | 4185 | 72 | 38151 | 7210 | 3.89 |
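As an aside, the summaries reported in Tables 1.4-1.9 are straightforward to compute; the following R sketch shows one way to do it, assuming a hypothetical data frame tribal with a deflated hammer price and a grouping column.

```r
# Sketch of the summaries in Tables 1.4-1.9, assuming a hypothetical data
# frame `tribal` with columns `price` (deflated hammer price) and a
# grouping variable such as `patina`.
cv <- function(x) sd(x) / abs(mean(x))   # coefficient of variation, sd/|mean|

stats_by_group <- function(data, group) {
  do.call(rbind, lapply(split(data$price, data[[group]]), function(p) {
    data.frame(n = length(p), average = mean(p),
               median = median(p), CV = cv(p))
  }))
}

set.seed(123)
tribal <- data.frame(
  price  = rlnorm(1000, meanlog = 8, sdlog = 1.5),
  patina = sample(c("Absent", "Present", "Appreciative"), 1000, replace = TRUE)
)
stats_by_group(tribal, "patina")   # one row per patina level
```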
The market features of the objects concern the organization and general functioning of the Tribal art market. As shown in Table 1.8, the most important venues for this segment of art are Paris and New York. Christie's and Sotheby's are the dominant auction houses, as in Modern and Contemporary art: they entered the market in 1970 and 1967 respectively, long before the other auction houses working in this sector nowadays. A noteworthy observation is that the organization of auctions is oriented to exploiting economies of scale, that is, to concentrating auctions in time and space in order to reduce unit costs.
Table 1.8: Tribal artworks by auction house and venue. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Auction house-Venue | # obj. auctioned | % of sales | Average price | Median | CV |
| Koller-Zurich | 1396 | 47 | 3569 | 1116 | 4.89 |
| Christie's-Amsterdam | 654 | 100 | 6536 | 2048 | 3.80 |
| Encheres Rive Gauche-Paris | 64 | 52 | 4549 | 2218 | 1.55 |
| Christie's-Paris | 4751 | 75 | 10354 | 2363 | 4.27 |
| Bonhams-New York | 282 | 38 | 6239 | 3208 | 1.49 |
| Christie's-New York | 539 | 81 | 19579 | 4350 | 3.77 |
| Piasa-Paris | 69 | 61 | 16732 | 5498 | 3.13 |
| Sotheby's-Paris | 3159 | 68 | 43000 | 7874 | 3.92 |
| Sotheby's-New York | 8913 | 73 | 26997 | 7902 | 4.08 |
The marketing actions taken by auction houses through their catalogues seem important in fetching good prices. In fact, the boxplots in Figure 1.4 highlight that prices tend to increase as the importance given to the object in the catalogue through illustrations increases. In particular, artworks to which a wide colour illustration has been dedicated are priced on average higher than those with no illustration or with black-and-white figures in the catalogue.
Figure 1.4: Boxplots of prices (in logarithmic scale) for type of illustration on the catalogue. The amounts of sold items are shown within the boxes.

A similar effect on prices is due to the type of description dedicated in the catalogue to each object (boxplots in Figure 1.5). Moreover, a critical description is valued more highly than a visual description.
The pedigree of an artwork can also be constituted by bibliographic citations, which can be object-specific or just comparative. The boxplots in Figure 1.6 reveal that investors tend to pay more for objects boasting citations and, as expected, more for a specific than for a comparative bibliography.
Few artworks have been previously exhibited, and this fact is valued positively by art collectors, who tend to offer more for those objects. In fact, it seems that the exhibition of an object by a museum, for example, certifies its value.
Figure 1.5: Boxplots of prices (in logarithmic scale) for type of description on the catalogue. The amounts of sold items are shown within the boxes.
Table 1.9: Tribal artworks by type of last owner. CV is the coefficient of variation, defined as the ratio between standard deviation and arithmetic mean, sd/|x̄|.

| Last owner | # obj. auctioned | % of sales | Average price | Median | CV |
| Unknown | 5641 | 72 | 8635 | 3054 | 4.19 |
| Art gallery | 307 | 63 | 44342 | 4052 | 6.64 |
| Other | 173 | 71 | 19167 | 5546 | 2.71 |
| Private individual | 13505 | 71 | 27881 | 6306 | 4.12 |
| Museum | 77 | 79 | 62683 | 6336 | 2.39 |
| Company | 124 | 83 | 34939 | 17304 | 1.32 |
Finally, Table 1.9 shows some statistics on prices by type of last owner of the object. Companies are the highest-paid sellers. However, in general, the table discloses that knowing who the last owner has been is important for the buyer.
Figure 1.6: Boxplots of prices (in logarithmic scale) for quotations in the comparative and the specialized bibliography. The amounts of sold items are shown within the boxes.
Figure 1.7: Boxplots of prices (in logarithmic scale) for exhibition. The amounts of sold items are shown within the boxes.
Chapter 2
The multilevel model
2.1 Introduction to multilevel analysis
Multilevel data consist of units of analysis of different types, hierarchically clustered one within the other. In a strictly nested data structure, the term levels denotes the different types of units of analysis, i.e. the various types of groupings; in particular, the most detailed level is called the first (or lowest) level. The sense of the hierarchy is as follows: there are individuals described by some variables (level-1 observations), and they are grouped into larger units (higher-level observations), which in turn may be described by other variables.
The leading example of multilevel data comes from studies on educational achievement, in which pupils, teachers, classrooms, schools, districts, and so on are clustered one within the other, and they might all be units of analysis, each described by its own variables. Another well-known example comes from organizational studies, where data are generally represented by employees grouped into departments and firms. Moreover, hierarchical data often occur in the social sciences: economists and political scientists frequently work with data measured at multiple levels, in which individuals are nested in geographic divisions, institutions or groups, and so forth (Jones et al., 1992). Further, other particular data structures can be thought of as multilevel: the repeated measurements over time on an individual, the respondents to the same interviewer, and also the subjects within a particular study among those of a meta-analysis can be considered groups of observations and, consequently, be treated as multilevel data.
The idea behind modelling multilevel data, coming from sociological studies (DiPrete and Forristal, 1994), is that living environments (the "macro level" in the sociological field) affect (and can be affected by) individual behaviours (the "micro level"), and contextual effects are due to social interactions within an environment. In general, individuals can both influence and be influenced by the various types of contexts mentioned above: spatial, temporal, organizational and socio-economic-cultural.
As Kreft and Leeuw (1998) put it, "the more individuals share common experiences due to closeness in space and/or in time, the more they are similar, or, to a certain extent, duplications of each other"; in other words, the performances of pupils in the same classroom, or those of employees in the same department, tend to be more similar than those from different groups because they share a context.
The specificity of multilevel data cannot be ignored, first of all because of an important statistical motivation: the observations within one group are not independent of each other, as traditional models require. This means that each additional individual from the same group may provide less information than a new individual in a new group. If a standard statistical analysis, which generally assumes independent observations, is performed on multilevel data (the so-called naive pooling strategy), the results may be misleading. In fact, a positive correlation among observations within a group, referred to as intra-class correlation (ICC), usually causes the underestimation of standard errors, because the analysis assumes that there is more information in the data than there really is. The case of negative intra-class correlation is less frequent: it can occur only when the individuals within a context are in competition, which may make them less similar to each other. Therefore, in general, a non-null intra-class correlation biases traditional statistical inference.
Indeed, the multilevel structure should not be treated merely as a statistical nuisance that needs to be accounted for in order to obtain correct statistical estimates, but as a key concept that yields important information by itself. In fact, in addition to the statistical motivations, there are also important substantive reasons for considering information from all levels of analysis.
First of all, the multilevel model allows one to combine multiple levels of analysis in a single model by specifying predictors at different levels. This can be useful, for example, to determine whether variables measured at one level affect relations occurring at another level.[1]
Second, as we will see below, multilevel analysis allows one to decompose the overall variance into within-group and between-group variances and, therefore, to know how much the groups are responsible for the variability of the outcome, the so-called Variance Partition Coefficient (VPC).
2.2 Conventional approaches for multilevel data

2.2.1 “Rough” strategies
When looking for statistical techniques capable of taking into account the correlation structure of multilevel data, one could think, at first, of two simple procedures: either to disaggregate the higher-level variables (e.g. school-level variables) to the individual level or, conversely, to carry the analysis to the higher level by aggregating the individual observations through a single summary statistic. Both strategies are obviously unsatisfactory: while the first does not take into account the dependence of the observations within a group, the second approach, which may be referred to as data resolution, is inefficient because, even though it avoids over-inflating the apparent size of the dataset, it not only wastes a huge amount of information but may also produce misleading results (Burton et al., 1998; Raudenbush and Bryk, 2002). The latter problem is best known as the ecological fallacy or aggregation fallacy (Goldstein, 2009).
[1] Multilevel analysis has been applied, for example, by some demographers to examine how differences in national economic development, information gathered at the national level, interact with adult educational achievement to influence fertility rates, which, conversely, are household-level information (Raudenbush and Bryk, 2002).
2.2.2 Fixed-effects models

A simple way to represent the dependence and the variation in the outcome, which may be due to differences between groups and/or to individual differences within a group, is to include group-specific terms in the model (dummy variables). This approach has been borrowed from longitudinal data analysis: since they show dependence among the elements, longitudinal data can be thought of as two-level data with occasions i at level 1 and units j at level 2 (Skrondal and Rabe-Hesketh, 2008). While in the latter context the fixed-effects model is often referred to as the least squares dummy variable (LSDV) model, in the field of experimental design it is instead called the analysis of covariance (ANCOVA) model.
On the one hand, these models perfectly capture the clustered structure of multilevel data, since the dummy variables account for differences among groups. On the other hand, however, they do not allow the inclusion of level-2 (or higher) covariates to explain the differences among the groups. Moreover, since the model without a constant term includes as many dummy variables as there are groups (in the model with a constant term, instead, for reasons of collinearity, one group is designated as the baseline category and thus has no corresponding dummy variable), a further argument against the use of the dummy variable model, from a statistical point of view, is the high number of parameters to be estimated, as the sketch below illustrates.
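The following minimal R sketch of the LSDV model on simulated two-level data makes the parameter count explicit; all names and values are illustrative.

```r
# Sketch of the dummy-variable (LSDV) fixed-effects model on simulated
# two-level data: factor(g) adds one dummy per group.
set.seed(2)
J <- 20; nj <- 30
g <- rep(1:J, each = nj)                        # group labels (level 2)
x <- rnorm(J * nj)                              # level-1 covariate
y <- 1 + rep(rnorm(J, sd = 2), each = nj) + 0.5 * x + rnorm(J * nj)

# With a constant term, one group becomes the baseline category, so the
# model estimates an intercept, J - 1 group dummies and the slope.
lsdv <- lm(y ~ factor(g) + x)
length(coef(lsdv))   # 1 + (J - 1) + 1 = 21 parameters: not parsimonious
```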
2.2.3 Interactive models

In political science, there has been another noteworthy attempt at modelling multilevel data structures, through models sometimes referred to as interactive models (Steenbergen and Jones, 2002). These models include in the regression both higher-level predictors and interaction terms. The former capture the group differences in the intercepts; the latter, consisting of group-specific terms interacting with lower-level predictors, represent the differences in the partial slopes of these predictors. This is the strength of interactive models compared to dummy variable models. However, they rest on the strong assumption that differences among groups are completely captured by the group predictors, without error terms. Since this assumption often proves false, interactive models also fail to fully address the requirements of multilevel data modelling.
2.2.4 Random-effects models

Whereas the older statistical models were fixed-effects regression models, the specification of regression coefficients as random effects has become common practice since the 1980s.
In order to understand the benefits of random-effects models compared to fixed-effects models, particularly in the context of multilevel analysis, consider a simple one-level model with one regressor:

$$y_i = \beta_0 + \beta_1 x_{1i} + \epsilon_i$$

with $i = 1, \dots, n$, under the usual assumption of independent Gaussian errors with zero mean and variance $\sigma^2$. To deal with the groupings, we let the intercept and slope coefficients vary among groups:
$$y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \epsilon_{ij} \qquad (2.1)$$

where $y_{ij}$ is the response of the level-1 unit $i$ ($= 1, \dots, n_j$) nested in the level-2 unit $j$ ($= 1, \dots, J$), and $x_{1ij}$ is the level-1 covariate.
In a fixed-effects regression model, the group-varying coefficients $\beta_{0j}$ and $\beta_{1j}$ are treated as fixed but unknown parameters to be estimated. Indeed, in a strict dummy variable model, the intercept is group-varying while the slope is the same for all groups, $\beta_{1j} = \beta_1$ for all $j$.
Random-effects models, in contrast, contain error terms at the higher levels, and this implies a more complex error framework capable of modelling the heteroscedastic structure. The coefficients in Eq. (2.1) can be re-expressed in the following way:

$$\beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j} \qquad (2.2)$$

where $\beta_0$ and $\beta_1$ are fixed, i.e. they do not vary among groups, and the $u$'s are random variables with

$$E(u_{0j}) = E(u_{1j}) = 0, \quad \mathrm{Var}(u_{0j}) = \sigma_{u0}^2, \quad \mathrm{Var}(u_{1j}) = \sigma_{u1}^2, \quad \mathrm{Cov}(u_{0j}, u_{1j}) = \sigma_{u01}.$$

Owing to this specification, these models are also called mixed-effects models.
The use of random coefficients for modelling grouped data presents several advantages. First of all, we usually deal only with a sample from a larger population, with the groups at any level being sub-samples from the whole population of groups of that type. Thus, it makes more sense to treat the group parameters as drawn randomly from a larger "population" of parameters.
Second, specifying a model with random coefficients for grouped data is also important for prediction. Consider, for example, a model of test scores for students within schools. Since a fixed-coefficients model contains a parameter for each school, we cannot make a prediction for a new student in a new school, because there is no indicator for this school in the model (Goldstein, 2010).
In light of these first two points, in general, effects should be fixed if they are interesting in themselves, and random if there is interest in the underlying population (Searle et al., 1992).
Another common argument against using only fixed effects is the high number of parameters to be estimated, which, in addition, increases with the number of groups. This results in the loss of a substantial number of degrees of freedom. Moreover, it is rarely possible to give a meaningful interpretation to these parameters. In random-coefficients models, instead, only the variance components need to be estimated and interpreted.
Moreover, the complex error structure of a model specification with random coefficients allows one to decompose the variance of the response into different components, and this provides important insights. In a two-level model, for example, the total variance has three components: the first variance term allows groups to differ in their mean values (intercepts) on the dependent variable; the second variance term allows the slopes between independent and dependent variables to differ across groups (single-level regression models, instead, generally assume that the relationship between the independent and the dependent variable is constant across groups); a third variance term reflects the within-group variation, that is, the degree to which an individual value differs from its predicted value within a specific group.
Further, the random-coefficients model allows one to specify a different linear regression model for each level, so that the model consists of several nested linear models; in particular, Eq. (2.1) is the level-1 regression model, and Eq. (2.2) is the level-2 regression model.
Finally, a further justification for using random group effects is the fact that they represent our ignorance at a certain level, just as the residuals represent our overall ignorance.
In summary, important assumptions made in the usual regression analysis are, among others, the independence and homoscedasticity of the individual responses. Since these assumptions are violated by data with a multilevel structure, the results of a classical regression analysis on such data would be biased. Multilevel regression analysis deals with the dependence of the outcome variable among individuals within groups through random effects.
2.3 Model specification

Multilevel models are referred to in numerous ways: contextual-effects models (Blalock, 1984) or multilevel linear models (Goldstein, 2010) mainly in sociological research, random-coefficients models in the econometric literature, hierarchical mixed linear models or random-effects models in biometric applications (Laird and Ware, 1982), and hierarchical linear models in Bayesian contexts.
The multilevel model is an extension of the random-coefficients model and the interactive model presented in sections 2.2.4 and 2.2.3, since it takes into account not only the dependence among elements and the hierarchical structure, but also allows the incorporation of variables from all levels.
A multilevel model can be specified in two stages (Skrondal and Rabe-Hesketh, 2004; Raudenbush and Bryk, 2002) or in reduced form (Goldstein, 2010). We chose the former for a better specification and interpretation of each single model.
The level-1 model of a two-level linear model with just one covariate expresses the response $y_{ij}$ of the level-1 unit $i$ in the group (level-2 unit) $j$, for $i = 1, \dots, n_j$ and $j = 1, \dots, J$, as

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \mid x_{ij} \sim \mathrm{NID}(0, \sigma^2) \qquad (2.3)$$

where $\beta_{0j}$ is the group-specific intercept, $\beta_{1j}$ is the group-specific slope, and the $\epsilon_{ij}$ are level-1 error terms. In the second-level model, $\beta_{0j}$ and $\beta_{1j}$ are modelled as

$$\beta_{0j} = \gamma_{00} + \gamma_{01} z_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} z_j + u_{1j} \qquad (2.4)$$

$$\mathbf{u}_j \mid x_j, z_j = \begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \Bigg|\, x_j, z_j \sim \mathrm{NID}(\mathbf{0}, \Sigma), \qquad \Sigma = \begin{bmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{bmatrix},$$

where the $\gamma$'s are fixed coefficients, $z_j$ is the level-2 covariate, and $u_{0j}$ and $u_{1j}$ are the group-level error terms. Moreover, the random effects for group $j$, $\mathbf{u}_j$, are assumed independent of the within-group errors $\epsilon_{ij}$. From now on, the conditioning on the covariates is omitted but implicit.
By substituting the level-2 model (2.4) into the level-1 model (2.3), the reduced form of the model is obtained:

$$y_{ij} = \gamma_{00} + \gamma_{01} z_j + u_{0j} + (\gamma_{10} + \gamma_{11} z_j + u_{1j}) x_{ij} + \epsilon_{ij}$$
$$= \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij} + u_{0j} + u_{1j} x_{ij} + \epsilon_{ij}. \qquad (2.5)$$

When $\beta_{1j} = \gamma_{10}$, the model becomes a two-level random-intercept model and the variance of the responses is the sum of two variance components: the within-group variance, $\sigma^2$, and the between-group variance, $\sigma_{u0}^2$. The responses of two units in the same group are correlated, since they share the same random intercept. The correlation within a group takes the following form:
$$\mathrm{ICC} = \mathrm{Corr}(y_{ij}, y_{i'j}) = \frac{\mathrm{Cov}(y_{ij}, y_{i'j})}{\mathrm{Var}(y_{ij})} = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma^2}.$$

Therefore, the ICC can also be interpreted as the proportion of the total variability in the response due to the between-group variance, as the sketch below illustrates.
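As an illustration, the following R sketch fits a random-intercept model to simulated data and computes the ICC from the estimated variance components. It uses the third-party lme4 package, not the own-written code of chapter 4.

```r
# Sketch: random-intercept model and ICC from its variance components,
# fitted with the lme4 package on simulated data.
library(lme4)
set.seed(3)
J <- 50; nj <- 10
g <- factor(rep(1:J, each = nj))
y <- 2 + rep(rnorm(J, sd = 1), each = nj) +   # sigma_u0 = 1
     rnorm(J * nj, sd = 2)                    # sigma    = 2

m  <- lmer(y ~ 1 + (1 | g))
vc <- as.data.frame(VarCorr(m))               # variance components
icc <- vc$vcov[1] / sum(vc$vcov)              # sigma_u0^2 / (sigma_u0^2 + sigma^2)
icc                                           # close to 1 / (1 + 4) = 0.2
```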
The presence of a random slope makes the variance of the responses dependent on the covariates having random coefficients, that is,

$$\mathrm{Var}(y_{ij}) = \sigma_{u0}^2 + 2\sigma_{u01} x_{ij} + \sigma_{u1}^2 x_{ij}^2 + \sigma^2.$$

Then, in this more general case, the intra-class correlation is no longer equal to the proportion of variability explained by the second-level variance, which, in order to avoid confusion, some authors (for example Goldstein (2010)) call the Variance Partition Coefficient (VPC).
By adopting Skrondal and Rabe-Hesketh (2004)'s notation, the two-level model in reduced form (2.5) is simplified in the following way:

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + u_{0j} + u_{1j} z_{1ij} + \epsilon_{ij}$$

where $\beta_0 = \gamma_{00}$, $\beta_1 = \gamma_{01}$, $\beta_2 = \gamma_{10}$, $\beta_3 = \gamma_{11}$, $x_{1ij} = z_j$, $x_{2ij} = x_{ij}$, $x_{3ij} = z_j x_{ij}$ and $z_{1ij} = x_{ij}$.
Writing the transformed reduced model in matrix notation, the whole $(n \times 1)$ response vector takes the more general form

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{u} + \boldsymbol\epsilon, \qquad (2.6)$$

where $\mathbf{u} = (\mathbf{u}_1, \dots, \mathbf{u}_J)^T$ and $\mathbf{Z}$ is an $(n \times p \cdot J)$ block-diagonal matrix with each block equal to

$$\begin{bmatrix}
1 & z_{11j} & \dots & z_{p1j} \\
1 & z_{12j} & \dots & z_{p2j} \\
\vdots & \vdots & \ddots & \vdots \\
1 & z_{1 n_j j} & \dots & z_{p n_j j}
\end{bmatrix},$$

where $p$ is the number of random effects.
Therefore, the response vector is distributed as

$$\mathbf{y} \sim N(\mathbf{X}\boldsymbol\beta, \boldsymbol\Omega) \qquad (2.7)$$

where $\boldsymbol\Omega = \mathbf{Z}\boldsymbol\Gamma\mathbf{Z}^T + \sigma^2 \mathbf{I}_n$ and

$$\boldsymbol\Gamma = \mathrm{Var}(\mathbf{u}) = \begin{bmatrix}
\Sigma & 0 & \dots & 0 \\
0 & \Sigma & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \Sigma
\end{bmatrix}.$$
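The following R sketch shows how the block-diagonal Z and the marginal covariance Omega of (2.6)-(2.7) could be assembled for a toy example with p = 2 random effects per group; all sizes and values are arbitrary.

```r
# Sketch of (2.6)-(2.7): block-diagonal Z and Omega = Z Gamma Z' + sigma^2 I
# for a toy model with a random intercept and a random slope (p = 2).
library(Matrix)
nj <- c(3, 2)                        # group sizes, J = 2
z1 <- c(0.5, 1.2, -0.3, 0.8, 1.1)    # covariate with a random coefficient

# Each block of Z holds an intercept column and the covariate column.
blocks <- split(z1, rep(seq_along(nj), nj))
Z <- bdiag(lapply(blocks, function(z) cbind(1, z)))

Sigma <- matrix(c(1.0, 0.3,
                  0.3, 0.5), 2, 2)   # Var(u_j), the 2 x 2 matrix Sigma
Gamma <- bdiag(Sigma, Sigma)         # block-diagonal Var(u)
sigma2 <- 0.8
Omega <- Z %*% Gamma %*% t(Z) + sigma2 * Diagonal(sum(nj))
Omega   # responses are correlated only within the same group
```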
2.4 Model estimation

Inference for the linear multilevel model concerns both the effects, fixed and random, and the variance components. It can be based on least squares approaches (Goldstein, 2010), on maximum likelihood methods (Searle et al., 1992; Laird and Ware, 1982; Pinheiro and Bates, 2000; Raudenbush and Bryk, 1986), or on Bayesian methodology (Seltzer et al., 1986).
2.4.1 Maximum likelihood estimation

Maximum likelihood is the most used estimation method in multilevel modelling. It produces, in fact, estimators that are asymptotically efficient, consistent and, for large sample sizes, robust against violations of the normality assumption for the errors.
Two different likelihood functions can be optimized, each corresponding to a specific approach: Full Maximum Likelihood (FML) and Restricted Maximum Likelihood (REML). The substantial difference between them is that, in the latter method, the likelihood function does not include the regression coefficients.
Full maximum likelihood

Let $\boldsymbol\theta$ denote the vector of variance components. Under the assumptions of normality and independence for $\mathbf{u}_j$ and $\boldsymbol\epsilon_j$, the group-response vectors $\mathbf{y}_j$ are independent and normally distributed with mean $\mathbf{X}_j\boldsymbol\beta$ and covariance matrix $\boldsymbol\Omega_j = \mathbf{Z}_j \Sigma \mathbf{Z}_j^T + \sigma^2 \mathbf{I}_{n_j}$. Therefore, the full likelihood function associated with the response vector $\mathbf{y}$ of model (2.6) is:

$$L(\boldsymbol\beta, \boldsymbol\theta; \mathbf{y}) = \prod_{j=1}^{J} f(\mathbf{y}_j; \boldsymbol\beta, \boldsymbol\theta) = \prod_{j=1}^{J} (2\pi)^{-n_j/2}\, |\boldsymbol\Omega_j|^{-1/2} \exp\!\left\{ -\tfrac{1}{2} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta)^T \boldsymbol\Omega_j^{-1} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta) \right\}. \qquad (2.8)$$
Maximum likelihood estimates of the fixed effects and the variance components are the values maximizing the likelihood function (2.8) or, equivalently, the logarithm of the likelihood:

$$\ell(\boldsymbol\beta, \boldsymbol\theta; \mathbf{y}) = \ln L(\boldsymbol\beta, \boldsymbol\theta; \mathbf{y}) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{j=1}^{J} \ln|\boldsymbol\Omega_j| - \frac{1}{2}\sum_{j=1}^{J} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta)^T \boldsymbol\Omega_j^{-1} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta). \qquad (2.9)$$
The derivatives of the log-likelihood with respect to the parameters are as follows:

$$\frac{\partial \ell(\boldsymbol\beta, \boldsymbol\theta; \mathbf{y})}{\partial \boldsymbol\beta} = \sum_{j=1}^{J} \mathbf{X}_j^T \boldsymbol\Omega_j^{-1} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta) \qquad (2.10)$$

$$\frac{\partial \ell(\boldsymbol\beta, \boldsymbol\theta; \mathbf{y})}{\partial \theta} = -\frac{1}{2}\sum_{j=1}^{J} \mathrm{tr}\!\left( \boldsymbol\Omega_j^{-1} \frac{\partial \boldsymbol\Omega_j}{\partial \theta} \right) + \frac{1}{2}\sum_{j=1}^{J} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta)^T \boldsymbol\Omega_j^{-1} \frac{\partial \boldsymbol\Omega_j}{\partial \theta} \boldsymbol\Omega_j^{-1} (\mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta), \qquad (2.11)$$

obtained by using the following properties of matrix derivatives (Magnus and Neudecker, 2007), stated here for a square invertible matrix $\mathbf{A}$ depending on a scalar parameter $\theta$:

(i) $\partial \ln|\mathbf{A}| / \partial \theta = \mathrm{tr}(\mathbf{A}^{-1}\, \partial\mathbf{A}/\partial\theta)$;

(ii) $\partial (\mathbf{x}^T \mathbf{A}^{-1} \mathbf{x}) / \partial \theta = -\mathbf{x}^T \mathbf{A}^{-1} (\partial\mathbf{A}/\partial\theta)\, \mathbf{A}^{-1} \mathbf{x}$.
Maximizing the log-likelihood function (2.9) consists in equating the partial derivatives (2.10) and (2.11) to zero and solving the resulting equations for $\boldsymbol\beta$ and $\boldsymbol\theta$. However, the equation for $\boldsymbol\theta$ is nonlinear in the variance vector, so that closed-form expressions for $\boldsymbol\theta$ cannot be achieved; solutions have to be obtained through iterative algorithms.
One approach to obtaining maximum likelihood estimates iteratively consists in computing a GLS estimate of the fixed-effects vector $\boldsymbol\beta$, that is

$$\hat{\boldsymbol\beta} = (\mathbf{X}^T \boldsymbol\Omega^{-1} \mathbf{X})^{-1} \mathbf{X}^T \boldsymbol\Omega^{-1} \mathbf{y}. \qquad (2.12)$$

This is the maximum likelihood estimator of the regression coefficient vector, i.e. the one obtained by equating the score function (2.10) to zero. Then, the variance components are obtained by optimizing the profiled likelihood, that is, the likelihood function obtained by substituting the estimate of $\boldsymbol\beta$ from the previous step. This optimization is usually accomplished through the Expectation-Maximization (EM) algorithm (Dempster et al., 1981; Laird et al., 1987), an iterative procedure to compute maximum likelihood estimates in incomplete-data problems. Instead of maximizing the marginal likelihood associated with the data vector $\mathbf{y}$, Eq. (2.8), the EM algorithm maximizes the joint likelihood of the complete dataset $(\mathbf{y}, \mathbf{u})$, that is, the one composed of the observed and the unobserved (or missing) data. Starting with some initial values for the parameters, the EM algorithm iterates two steps:

1. In the E-step, the current vector of estimated parameters, $\hat{\boldsymbol\theta}^{(h)}$, is used to evaluate the distribution of the unobserved data conditional on the observed data, $\mathbf{u} \mid \mathbf{y}$. Hence, the expected value of the joint log-likelihood, conditional on the data vector $\mathbf{y}$, is obtained.

2. The M-step consists in computing a new value $\hat{\boldsymbol\theta}^{(h+1)}$ by maximizing the expected value of the likelihood function evaluated in the E-step.

The E-step and M-step are iterated until convergence; a minimal sketch is given below. A more formal explanation of the EM algorithm is provided in section 4.2.
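As a minimal illustration of the profiling-plus-EM strategy described above, the following R sketch estimates the simplest two-level model, y_ij = mu + u_j + e_ij, by full maximum likelihood: the E-step computes the posterior moments of the u_j, and the M-step updates mu (the GLS step, which here reduces to a weighted mean) and the two variance components. It is a didactic sketch on simulated data, not the thesis's implementation.

```r
# Didactic EM sketch for full ML in the null two-level model
# y_ij = mu + u_j + e_ij, with u_j ~ N(0, s2u) and e_ij ~ N(0, s2).
set.seed(4)
J <- 100; nj <- 8
g <- rep(1:J, each = nj)
y <- 5 + rep(rnorm(J, sd = 1.5), each = nj) + rnorm(J * nj)

ybar <- tapply(y, g, mean); n_j <- tabulate(g); n <- length(y)
mu <- mean(y); s2u <- 1; s2 <- 1          # starting values

for (h in 1:200) {
  # E-step: posterior variance and mean of each u_j given y and theta^(h)
  v <- 1 / (n_j / s2 + 1 / s2u)
  m <- v * n_j * (ybar - mu) / s2
  # M-step: maximize the expected complete-data log-likelihood
  mu  <- sum(n_j * (ybar - m)) / n        # GLS step for the fixed part
  s2u <- mean(m^2 + v)                    # E[u_j^2 | y] averaged over groups
  res <- y - mu - m[g]
  s2  <- (sum(res^2) + sum(n_j * v)) / n  # E[e_ij^2 | y] averaged over units
}
c(mu = mu, sigma2_u = s2u, sigma2 = s2)   # true values: 5, 2.25, 1
```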
Restricted maximum likelihood

The problem with the maximum likelihood estimates of the variance components is that the procedure does not take into account the degrees of freedom lost in estimating $\boldsymbol\beta$: in the variance estimators, the vector of true parameters $\boldsymbol\beta$ is replaced by its estimator $\hat{\boldsymbol\beta}$, so that the uncertainty about the regression coefficients is not accounted for. As a result, the variance components are estimated with a downward bias (Harville, 1977; Patterson and Thompson, 1971).[2]
In order to produce unbiased variance estimates, Harville (1977) and Patterson and Thompson (1971) proposed a variant of the FML method, Restricted Maximum Likelihood.
The REML method estimates the variance components by optimizing the likelihood of $\boldsymbol\theta$ based not on $\mathbf{y}$ but on a complete set of error contrasts for the responses, that is, on a set of $n - k$ linearly independent error contrasts $\mathbf{a}^T\mathbf{y}$, where $k$ is the number of fixed effects, such that the function does not contain any fixed effects.[3] More specifically, the vector $\mathbf{a}$ is chosen such that $\mathbf{a}^T\mathbf{X}\boldsymbol\beta = 0$; therefore, $\mathbf{a}^T\mathbf{y} = \mathbf{a}^T\mathbf{X}\boldsymbol\beta + \mathbf{a}^T\mathbf{Z}\mathbf{u} + \mathbf{a}^T\boldsymbol\epsilon = \mathbf{a}^T\mathbf{Z}\mathbf{u} + \mathbf{a}^T\boldsymbol\epsilon$ does not contain any terms in $\boldsymbol\beta$.
[2] Raudenbush and Bryk (2002) found that the maximum likelihood estimator of the between-group variance in a random-intercept model has approximately a bias factor of $(J - k)/J$, where $J$ is the number of groups and $k$ is the number of fixed effects.
[3] It is easy to show that the log-likelihoods corresponding to different sets of error contrasts contain the same information and, as a consequence, the same maxima, in that they differ only by a constant (Longford, 1993).
In practice, the difference between FML and REML estimates becomes negligible as the number of groups increases; the sketch below illustrates the contrast on simulated data. Moreover, the restricted likelihood function cannot be used to construct a chi-square test comparing two nested models, and it is computationally less convenient than FML. REML theory can also rely on Bayesian arguments.
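The FML/REML contrast can be seen directly with lme4 (again, a third-party package used only for illustration): with few groups, the REML estimate of the between-group variance is typically somewhat larger, i.e. less downward-biased, than the FML one.

```r
# Sketch of the FML/REML difference with lme4: the same model fitted with
# REML = FALSE (full ML) and REML = TRUE on data with few groups.
library(lme4)
set.seed(5)
J <- 15; nj <- 5
g <- factor(rep(1:J, each = nj))
x <- rnorm(J * nj)
y <- 1 + 0.5 * x + rep(rnorm(J, sd = 1), each = nj) + rnorm(J * nj)

fml  <- lmer(y ~ x + (1 | g), REML = FALSE)
reml <- lmer(y ~ x + (1 | g), REML = TRUE)
c(FML  = as.data.frame(VarCorr(fml))$vcov[1],    # between-group variance
  REML = as.data.frame(VarCorr(reml))$vcov[1])   # typically a bit larger
```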
2.4.2
Iterative GLS
The most common method among those based on least squares is the Iterative
Generalized Least Squares (IGLS) (Goldstein, 2010, 1986). The GLS estimate for β
is given by

β̂ = (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y,

which is the best linear unbiased estimator (BLUE). However, since the
variance-covariance matrix Ω is typically unknown, this formula is not computable.
For this reason, the GLS procedure is conducted iteratively. Starting from
some initial values for the fixed parameters (generally the OLS estimates),
the "raw" residuals are constructed as

e = y − Xβ̂.
By noting that

E(eeᵀ) = Var(y) = Ω,

the procedure applies the GLS method to the following model:

vec(eeᵀ) = f(θ) + r,

where vec(eeᵀ) constitutes the dependent variable, r is the residual vector,
and f(θ) is a vector of functions containing the coefficients to be estimated,
that is, the variance components, with vectors of 0s and 1s as covariates,
depending on the structure of Ω. The variance components in θ, estimated
through GLS, are then used to obtain new estimates of the fixed effects. The
procedure goes on until convergence.
Under the assumption of normality, the IGLS procedure produces maximum
likelihood estimates. Moreover, since the IGLS estimates of the variance
components are biased, like the ML estimates, a restricted version exists, the
Restricted Iterative Generalized Least Squares (RIGLS).
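For the random-intercept case, the GLS step for β can be sketched in R by exploiting the block structure Ωₜ = σ²I + σu²11ᵀ, so that Ω never needs to be formed explicitly. The sketch below is illustrative only (our own names); it implements one fixed-effects update, not the full IGLS cycle.

    ## One GLS update of beta for a random-intercept model (illustrative).
    ## X: design matrix; y: response; g: grouping factor;
    ## s2u, s2e: current variance component estimates.
    gls_beta <- function(X, y, g, s2u, s2e) {
      idx <- split(seq_along(y), g)
      p <- ncol(X); A <- matrix(0, p, p); b <- rep(0, p)
      for (j in idx) {
        Xj <- X[j, , drop = FALSE]; yj <- y[j]; nj <- length(j)
        w  <- s2u / (s2e + nj * s2u)
        ## Omega_j^{-1} = (I - w * 1 1') / s2e  (Woodbury identity)
        cs <- colSums(Xj)
        A  <- A + (crossprod(Xj) - w * tcrossprod(cs)) / s2e
        b  <- b + (crossprod(Xj, yj) - w * cs * sum(yj)) / s2e
      }
      drop(solve(A, b))   # (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    }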
2.5
Predicting the random effects
The random effects uj are not parameters of the statistical model.
Nevertheless, numerical values are assigned to them based on the available
information, namely the data vector y. Since they are random variables rather
than parameters, the assigned values are not, properly speaking, estimates but
predictions of their unobservable values (Henderson, 1953; Searle et al., 1992).
These values can be useful, for example, for conducting inference on
particular groups, or for model diagnostics, such as finding outlying groups or
checking the assumptions about the random effects.
Skrondal and Rabe-Hesketh (2009) provide an overview of the large
existing literature on the prediction of random effects and responses in the
multilevel linear model. When the model parameters are known (or treated as
known), they list four different philosophical approaches, two Bayesian and two
frequentist.
- Bayesian approach (Lindley and Smith, 1972; Fearn, 1975; Smith, 1973):
inference regarding the random effects is based on their posterior
distribution given the observed data and the prior distribution representing
uncertainty about u.

- Empirical Bayesian approach (Strenio et al., 1983; Morris, 1983): inference
regarding the random effects is conducted by jointly sampling u and y.
In this case, the distribution of the random effects represents their
variation in the population.

- Frequentist prediction (Searle et al., 1992): inference regarding the
random effects is viewed as prediction of the unobserved realizations
of random variables. This approach also admits an empirical Bayesian
justification.

- Frequentist estimation (Henderson, 1953; Searle et al., 1992): the random
effects are treated as fixed parameters, so inference on them
typically consists of maximum likelihood estimation.
In the following, the most commonly used predictions of the random
effects are presented.
2.5.1
BLUP
The problem is to assign a value, a prediction, to the unobserved realization
of the random vector u. In order to derive the best predictor, the
minimum-variance criterion used for estimating parameters, which are fixed values,
is replaced by the minimum mean square error criterion for the realized value of
a random variable. In other words, the best predictor for the random vector
u is the value that minimizes

E[(û − u)ᵀ(û − u)].
The best predictor corresponds to the conditional mean of the random-effects
vector given the data, û = E(u|y). The mean square error can be rewritten as

E[(û − u)ᵀ(û − u)] = E[(û − E(u|y) + E(u|y) − u)ᵀ(û − E(u|y) + E(u|y) − u)]
                   = E[(û − E(u|y))ᵀ(û − E(u|y))]
                   + E[(E(u|y) − u)ᵀ(E(u|y) − u)]
                   + 2E[(û − E(u|y))ᵀ(E(u|y) − u)].          (2.13)

By expressing the latter term as

E_y{ E[(û − E(u|y))ᵀ(E(u|y) − u) | y] },
given the result E(u) = E_y[E(u|y)], it is trivial to note that it is equal to
zero. By noting also that the second term of (2.13) does not depend on û,
minimizing the mean square error with respect to û is equivalent to minimizing

E[(û − E(u|y))ᵀ(û − E(u|y))].

Therefore, the best predictor for u is

û = E(u|y).          (2.14)
Note that the best predictor is unbiased, not in the classical sense, but in the
sense that its expected value is equal to that of the random variable it is
predicting:

E_y(û) = E_y[E(u|y)] = E(u).
Since u ∼ N(0, Γ) and y ∼ N(Xβ, Ω),⁴ their joint distribution is

[u]      ( [0 ]   [Γ    ΓZᵀ] )
[y] ∼ N  ( [Xβ] , [ZΓ   Ω  ] ),

where

ΓZᵀ = Cov(u, yᵀ) = Cov(u, uᵀZᵀ) = Var(u)Zᵀ.

Then, by the properties of the joint normal distribution,⁵ the best linear
predictor of u, BLP(u), is

û = E(u|y) = ΓZᵀΩ⁻¹(y − Xβ).          (2.15)
⁴ It can be shown that the same expression for the best linear predictor is valid without
the assumption of normality (Searle et al., 1992).
⁵ If

X = [X1]      ( [μ1]   [V11  V12] )
    [X2] ∼ N  ( [μ2] , [V21  V22] ),

then the conditional distribution of X1 given X2 is

X1|X2 ∼ N( μ1 + V12V22⁻¹(X2 − μ2), V11 − V12V22⁻¹V21 ).          (2.16)
2.5.2
Prediction of expected responses
In a mixed-effects model, the main interest is estimating or, better,
predicting the responses yij for some values of the covariates, that is,
predicting the linear combination of the fixed effects and the realized
unobservable value of the random effects, X₀ᵀβ + Z₀ᵀu, for some known covariate
matrices X₀ and Z₀ (Harville, 1977). We wish to find the Best Linear Unbiased
Predictor (BLUP) of this linear combination, that is, a predictor minimizing
the mean squared prediction error, having a linear form in y, a + Fy, and
being unbiased in the sense that

E(\widehat{X₀ᵀβ + Z₀ᵀu}) = E(X₀ᵀβ + Z₀ᵀu) = E(X₀ᵀβ).
From unbiasedness, it follows that

E(a + Fy) = a + FXβ = X₀ᵀβ,

so a = 0 and X₀ᵀ = FX. Consequently, the predictor takes the form

\widehat{X₀ᵀβ + Z₀ᵀu} = Fy.
Therefore, the problem is to choose F so as to minimize the mean squared
prediction error subject to FX = X₀ᵀ (Searle et al., 1992). The solution to
this problem is

BLUP(X₀ᵀβ + Z₀ᵀu) = BLUE(X₀ᵀβ) + BLUP(Z₀ᵀu),          (2.17)

where BLUE(X₀ᵀβ) = X₀ᵀβ̂ = X₀ᵀ(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y is the Best Linear
Unbiased Estimate of X₀ᵀβ. As a generalized version of the Gauss-Markov
theorem states, X₀ᵀβ̂ is the BLUE of X₀ᵀβ in the sense that if c + Gy is any
other linear unbiased estimator of X₀ᵀβ, then Var(X₀ᵀβ̂) ≤ Var(c + Gy) (Harville,
1977). Moreover,

BLUP(u) = ΓZᵀΩ⁻¹(y − Xβ̂)

is the Best Linear Unbiased Predictor of u, with Xβ replaced by its BLUE.
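For small problems, where Ω can be formed explicitly, formulas (2.15) and (2.17) translate directly into matrix code. A minimal R sketch (illustrative only, with our own function names):

    ## BLUE of beta and BLUP of u, explicit matrices (small problems only).
    blup <- function(y, X, Z, Gamma, Omega, X0 = X, Z0 = Z) {
      Oinv <- solve(Omega)
      ## BLUE of beta: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
      beta <- solve(t(X) %*% Oinv %*% X, t(X) %*% Oinv %*% y)
      ## BLUP of u: Gamma Z' Omega^{-1} (y - X beta)
      u <- Gamma %*% t(Z) %*% Oinv %*% (y - X %*% beta)
      ## BLUP of the expected responses, as in (2.17)
      list(beta = beta, u = u, pred = X0 %*% beta + Z0 %*% u)
    }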
2.5.3
Empirical Bayes prediction
Given the maximum likelihood estimates of the parameters, β̂ and θ̂, the
Empirical Bayes predictors of the random effects u are the means of their
posterior distributions. The posterior distribution is called empirical since
the parameter estimates β̂ and θ̂ are plugged in.
By using the Bayes theorem, the empirical posterior distribution is obtained
as

π(u|y; β̂, θ̂) = f(y|u; β̂, θ̂) f(u; Σ̂) / f(y; β̂, θ̂),

where the prior distribution of the random effects, f(u; Σ̂), is combined with
the data y. The difference between the Bayesian approach and the Empirical
Bayesian approach is that, in the former, prior distributions for the
parameters are specified instead of plugging in their estimates; the posterior
distribution of the random effects is then obtained marginally with respect to
these parameters. Thus, the Empirical Bayes predictor for the random effects is

û^EB = E(u|y; β̂, θ̂) = ∫ u π(u|y; β̂, θ̂) du,

which is equivalent to the expression of the BLUP (2.15).
2.5.4
Shrunken estimates
Consider the simplest two-level random-intercept model

yij = β0 + uj + ϵij,   uj ∼ N(0, σu²),   ϵij ∼ N(0, σ²),          (2.18)

which in matrix form is

yj = 1_{nj} β0 + 1_{nj} uj + ϵj,

u = (u1, u2, . . . , uJ)ᵀ ∼ N(0, σu² I_J),   ϵj ∼ N(0, σ² I_{nj}).

This is also called the variance components model (Goldstein, 2009) because the
variance of the response is the sum of the within-group variance and the
between-group variance:

Var(yij) = Var(uj + ϵij) = σu² + σ².
The random effects uj in this model represent the deviation of the j-th group
average from the overall mean β0. Therefore, their prediction is driven by
the mean of the data for the j-th group, ȳj = Σ_{i=1}^{nj} yij / nj. In fact,
intuitively, if ȳj is higher than the overall average ȳ, uj should be positive
(Searle et al., 1992). So, a reasonable predictor of uj is given by

ûj = E(uj|ȳj).
Under the normality assumptions in (2.18), the joint distribution is

[uj]      ( [0 ]   [σu²   σu²         ] )
[ȳj] ∼ N  ( [β0] , [σu²   σu² + σ²/nj ] ),

where Cov(uj, ȳj) = E(uj Σ_{i=1}^{nj} yij)/nj = E(uj yij) = E(uj²) = σu².
Applying the property (2.16) results in

E(uj|ȳj) = [nj σu² / (σ² + nj σu²)] (ȳj − β0).
The term ȳj − β̂0 represents the mean “raw” or total residual for the j-th
group.
As shown in the previous section, in order to obtain the best linear unbiased
predictor (BLUP) of the random effects, the parameter β0 has to be replaced
by its GLS or maximum likelihood estimate, so that the best linear unbiased
predictor is

BLUP(uj) = [nj σu² / (σ² + nj σu²)] (ȳj − β̂0).

Moreover, it may be interesting to predict also the linear combination
μj = β0 + uj, representing the overall average for the j-th group. The best
linear unbiased prediction for it is given by

BLUP(μj) = μ̂j = BLUE(β0) + BLUP(uj) = β̂0 + [nj σu² / (σ² + nj σu²)] (ȳj − β̂0).
The factor nj σu²/(σ² + nj σu²) is often called the "shrinkage factor"
(Goldstein, 2009), and BLUP(uj) the shrinkage estimate. The shrinkage factor can
be interpreted as the estimated reliability of the mean raw residual as a
predictor of uj. The name shrinkage is due to the fact that, since the factor
takes values between 0 and 1 in absolute value, it shrinks the group mean, μ̂j,
toward the overall mean, β̂0, by an amount depending on nj and the variance
components: as nj increases and σ² decreases relative to σu², the factor tends
to one and the group mean dominates the population mean in magnitude; as the
group size decreases and σ² increases relative to σu², the reliability
decreases, the shrinkage factor approaches zero, and the group mean tends to the
population mean. In Bayesian terms, the shrinkage factor pulls the Empirical
Bayes predictor towards the mean of the prior distribution of the random
effects, that is, 0. Hence, when nj decreases, the conditional distribution of
the responses f(yj; β̂, θ̂) (that is, the likelihood associated with the j-th
group) becomes flat and uninformative compared with the prior distribution
f(uj; Σ̂).
The shrinkage estimates are also often referred to as James-Stein-type
estimates (James and Stein, 1961) or empirical Bayes estimates.
Both the BLUP and the empirical Bayes predictor are biased conditionally
on the random effect. In particular, for the variance components model (2.18),
the conditional expectation of the predictor given the random intercept is

E(ûj|uj) = [nj σu² / (σ² + nj σu²)] uj.

In general, as shown in section 2.5.1, the best linear predictor and the
empirical Bayes predictor are, instead, unconditionally unbiased.
Finally, the best linear predictor of the random effects requires point
estimates of the fixed effects and the variance components to be plugged
in. Although for special cases some corrections have been derived to adjust
for the bias arising from substituting estimates for parameters (see Skrondal
and Rabe-Hesketh (2009) for references), when the estimators are consistent,
this bias is expected to be small as the sample size becomes large.
2.5.5
Standard errors of the predictors
From the expression (2.15) for the BLUP, by using the identities

y − Xβ̂ = [I − X(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹]y = My

and ΩMᵀΩ⁻¹ = M, where M is an idempotent matrix, the variance of the BLUP
of u is derived as

Var(û) = ΓZᵀΩ⁻¹ Var(y − Xβ̂) Ω⁻¹ZΓ
       = ΓZᵀΩ⁻¹MΩMᵀΩ⁻¹ZΓ = ΓZᵀΩ⁻¹MZΓ          (2.19)
       = ΓZᵀ[Ω⁻¹ − Ω⁻¹X(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹]ZΓ.

This is the unconditional variance-covariance matrix of the random-effects
prediction vector. Note that, through the second term, the expression in (2.19)
takes into account the sampling variation of the estimates of the fixed
coefficients, which is negligible in large samples (Goldstein, 2009).
However, Laird and Ware (1982) pointed out that this expression ignores the
variation of the random effects and, for this reason, understates the
variation in ûj − uj. To assess the error of prediction of the random effects,
they suggested using instead

Var(û − u) = Var(û) + Γ − 2Cov(û, uᵀ) = Γ − Var(û)
           = Γ − ΓZᵀΩ⁻¹ZΓ + ΓZᵀΩ⁻¹X(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹ZΓ,          (2.20)

since, from (2.19),

Cov(û, uᵀ) = E(ûuᵀ) = ΓZᵀΩ⁻¹M E(yuᵀ) = ΓZᵀΩ⁻¹MZΓ = Var(û).
The form (2.20) represents the conditional or "comparative" variance-covariance
matrix of û, as Goldstein (2009) called it because of its main use in making
comparisons among groups. In fact, the comparative standard error is
preferred to the unconditional one (2.19) when making inferences about the
realized values of u. On the other hand, for diagnostic purposes, such as
standardizing the estimated residuals and detecting outlying clusters, Var(û)
should be used.
Both standard errors take into account the sampling variability of the
regression coefficient estimates, but they ignore the sampling variability
associated with the variance parameter estimates that are plugged in. Only for
simple models are there approximations that account for this additional
uncertainty (see Skrondal and Rabe-Hesketh (2009) for references). However, if
consistent estimators of the variance components are used, the correction terms
become small as the number of groups becomes large. Moreover, resampling
procedures may be used to estimate the standard errors of the predictions.
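In the same explicit-matrix style as the sketch at the end of subsection 2.5.2 (again for small, explicitly formed matrices), the two variance-covariance matrices read as follows; this is an illustrative sketch with our own names.

    ## Unconditional (2.19) and comparative (2.20) variances of the BLUP.
    blup_variances <- function(X, Z, Gamma, Omega) {
      Oinv <- solve(Omega)
      ## P = Omega^{-1} - Omega^{-1} X (X' Omega^{-1} X)^{-1} X' Omega^{-1}
      P <- Oinv - Oinv %*% X %*% solve(t(X) %*% Oinv %*% X, t(X) %*% Oinv)
      V_unconditional <- Gamma %*% t(Z) %*% P %*% Z %*% Gamma  # (2.19), diagnostics
      V_comparative   <- Gamma - V_unconditional               # (2.20), comparisons
      list(diagnostic = V_unconditional, comparative = V_comparative)
    }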
2.6
Developments and applications of the multilevel model
The general specification of multilevel models allows a large variety of
applications. In the following, we briefly discuss some extensions of the model
that may be particularly relevant to the case under scrutiny.
First of all, multilevel models have been extended to binary, count, ordered
categorical and multi-categorical responses (Stiratelli et al., 1984; Wong and
Mason, 1985; Skrondal and Rabe-Hesketh, 2004). The multilevel model has also
been generalized to cross-classified data and multiple membership structures.
These are cases where observations belong to multiple contexts simultaneously,
which can be either nested or crossed. Multiple membership occurs
when a level-1 unit belongs to more than one level-2 unit. The data
show a crossed structure if, for example, students are grouped in schools and
also in the neighbourhoods they come from; in this case neither "school"
nor "neighbourhood" is above the other in a hierarchical sense. Cross-classified
random-effect models address this situation and allow the inclusion
of predictors for more than one "classification" variable (Raudenbush, 1993;
Goldstein, 1995).
Another interesting application of the multilevel model regards the
multivariate case, in which several response variables belong to the same
model. One class of multivariate outcome models deals with missing
data problems, in particular when predictors are missing at random. These
are latent variable models, where the latent variables represent the unobserved
data "completing" the observed data. Thus, the incomplete data are used
to draw valid inferences about the parameters that generate the complete data.
It is possible to handle this missing data problem through hierarchical linear
models by considering the variables for each individual as "occasions of
measurement", the level-1 units, and the individuals as level-2 units
(Raudenbush and Bryk, 2002; Skrondal and Rabe-Hesketh, 2004). Moreover, when
data are missing by design rather than at random, the multivariate multilevel
model automatically takes this feature into account and avoids special
procedures for handling missing data. Repeated measures data can also
be viewed as a specific case of multilevel multivariate data. These data,
also called longitudinal or panel data, can in fact be thought of as two-level
data with occasions i at level 1 and units j at level 2. The main difference
between panel data and simple clustered data is that the level-1 units are
ordered chronologically. In this case, the usual assumption of independence
among level-1 residuals often does not hold, especially when measurements
on the same unit are taken close together in time. This case can be handled
by including correlation structures at level 1 (Goldstein, 2010). Moreover, it
is also possible to allow heteroscedastic within-group errors through variance
functions (Davidian and Giltinan, 1995).
Thus, in the literature, the applications of multilevel models to longitudinal
data consider occasions, that is, the points in time, as the lowest-level units
and individuals as higher-level units. Therefore, any time dependence structure
is assessed at the first level. How can we manage the case where different
individuals are observed on different occasions? Our proposal is to treat
individuals as level-1 units and time points as level-2 units, that is, to
reverse the structure of (pseudo) panel data. However, this is not costless, in
that the assumption of independence among random effects is likely not to hold
because they represent time effects. For this reason, the classic multilevel
framework needs to be modified to cope with this case. Chapter 4 extends
the multilevel modelling framework to deal with time dependence at
the second level, and this represents the main theoretical contribution of the
present thesis.
Browne and Goldstein (2010) remove the independence assumption among
level-2 disturbances and model the correlation between pairs of
clusters through an explicit function of distance. Although this approach
is especially suitable in the presence of spatial correlation, in theory it
could be used to model our data. However, it has been developed in a Bayesian
framework that requires the specification of prior distributions for the
parameters and estimation through MCMC methods; these are computer intensive
and can be unfeasible when the number of level-2 units is large. We chose,
instead, to adopt a frequentist approach by deriving and implementing maximum
likelihood estimators with known desirable properties.
Chapter 3
Hedonic regression model and
multilevel model for Tribal art
prices
In the literature, art prices are modelled through the hedonic regression
model, a classic fixed-effect model, and, as remarked in section 1.2, Tribal art
prices can also be modelled in this way.
Our idea, instead, is to account for the influence of time effects on prices
through a different approach. Since we observe different artworks sold at every
auction, Tribal art data constitute neither a panel nor a time series. Rather,
they have a two-level structure in that items, the level-1 units, are grouped
in time points, the level-2 units. Hence, we propose to exploit the multilevel
model to explain the heterogeneity of prices among time points. We have chosen
to take semesters rather than auction dates as time points, mainly for three
reasons:
1. the auctions are organized in two sessions, one during the winter and
one during the summer, each consisting of two to four auctions held
quite close together in time;

2. in general, the stakeholders look at the performance of the previous
semester;

3. the auction dates are not equally spaced in time and, as we will see in
chapter 4, this feature will prove important for handling time dependence.
In this chapter, we apply and compare extensively the traditional
hedonic regression model and the multilevel model on the Tribal art dataset.
We will see that the two-level model and the hedonic regression model produce
similar results in terms of estimates and residuals.
3.1
Models with no covariates
Let us initially ignore the grouping structure of the data and assume the
simple model

log10(yit) = β0 + ϵit,   i = 1, . . . , nt,   t = 1, . . . , T,   ϵit ∼ N(0, σ²),   (3.1)

where the dependent variable is the logarithm of the observed hammer price
for observation i in semester t, namely yit = PRICEit, and β0 is the
overall mean price. The number of semesters included in the database is
T = 27; nt is the number of items sold in semester t and varies between
119 and 915; n = Σ_{t=1}^{T} nt = 14124 is the total sample size.¹
As said before, Tribal art data do not constitute a proper panel where yit and
yi(t+1) represent the price of item i observed at successive time points.
Rather, since different objects are sold at every auction, yit and yi(t+1)
indicate the prices of two different objects at time points t and t + 1. In
particular, several objects are observed at the same time t, so that, for
example, y11 indicates the price of the first object observed at time 1, y21
the price of the second object observed at time 1, y_{n₁1} the price of the
last object observed at time 1, y12 the price of the first object observed at
time 2, and so on.
The model (3.1), which we call the "null model", is fitted through the maximum
likelihood method in order to compare the results with those obtained from
fitting the multilevel models. The results are shown in the first column of
Table 3.1. The other two columns of Table 3.1 contain the results of the
¹ The database includes more than 20000 observations, but only the sold items
are used in the analysis.
models (3.2) and (3.3). The standard errors of the estimates of all models in
this work, reported in parentheses, are computed through a wild bootstrap
procedure, explained in subsection 3.2.1.
The problem with ignoring the grouped structure of the data is clear from the
boxplots of the residuals grouped by semester displayed in Figure 3.2a: the
"group effects" are incorporated into the residuals, so that the within-group
variability, σ², is overestimated.
The semester effects may be accounted for by allowing the mean price of each
group to be represented by a separate parameter. This leads to the following
fixed-effect model:
log10(yit) = βt + ϵit,          (3.2)

where βt is the mean price of semester t and yit = PRICEit. We will call
it the "FE-intercept" model.
As shown in the second column of Table 3.1, the estimated residual variability
for the FE-intercept model (3.2) is smaller than that of the null model
(0.443 versus 0.499). Moreover, the boxplots in Figure 3.2b show that
the residuals are centered around zero and smaller than those in Figure 3.2a.
Hence, the fixed-effect model has somehow accounted for the semester effects.
Nevertheless, as also explained in section 2.2.2, the main drawback of the
fixed-effect model is that it includes many parameters. On the contrary,
in a random-effect model, which treats the group effects as random variations
around a population mean, the number of parameters does not increase with
the number of groups. Also, an estimate of the between-group variability is
provided.
A simple random-effect model for our data is:

log10(yit) = β0t + ϵit,          (3.3)
β0t = β0 + ut,

where yit = PRICEit and ut is a random variable representing the deviation
of the mean price of the t-th semester from the population mean. This is a
multilevel model, where the items, labeled by i, are the first-level
observations, and the semesters, labeled by t, are the second-level observations.
The usual assumptions for this model are:

ut ∼ NID(0, σu²),   ϵit ∼ NID(0, σ²),   ut ⊥ ϵit,

for all i = 1, . . . , nt and all t = 1, . . . , T, where the sign "⊥" stands
for independence and "NID" for normally and independently distributed.
By fitting the two-level random-intercept model (3.3), which we call
"RE-intercept", through the maximum likelihood method, we obtain the parameter
estimates in the third column of Table 3.1.
Let us compare the FE-intercept and the RE-intercept models. First of all,
the within-group variability, σ², has the same estimate in both models, but
the multilevel model also picks out the between-group variance, σu². We
carried out a likelihood ratio test between the null model and the RE-intercept
model to assess the significance of the between-group variance. The likelihood
ratio statistic is asymptotically χ²-distributed with one degree of freedom.
However, since the null hypothesis of zero variance is on the boundary of
the feasible parameter space, the p-value to be used is half the one obtained
from the tables of the chi-squared distribution (Self and Liang, 1987). In this
case, the test gives a value of 1572.582, which is highly significant,
confirming that the between-group variance is different from zero.
The additional variance component in the RE-intercept model allows us to
calculate the proportion of the total variability of prices explained by the
variability among semesters, 100 · σu²/(σu² + σ²) = 13.5%, which, in a
two-level random-intercept model, corresponds to the intra-class correlation
(ICC), the correlation between two observations in the same semester. As
already mentioned, the existence of a non-zero intra-class correlation reveals
the inadequacy of the traditional modelling framework (Goldstein, 2010).
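As an illustration, the same comparison can be sketched with off-the-shelf R tools; here we assume the lme4 package and a data frame art with hypothetical columns price and semester (the results in Table 3.1 were obtained with our own ML code).

    library(lme4)
    m0 <- lm(log10(price) ~ 1, data = art)                    # null model (3.1)
    m1 <- lmer(log10(price) ~ 1 + (1 | semester), data = art,
               REML = FALSE)                                  # RE-intercept (3.3)
    lrt  <- as.numeric(2 * (logLik(m1) - logLik(m0)))         # LR statistic
    pval <- 0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)     # halved: boundary null
    vc   <- as.data.frame(VarCorr(m1))
    icc  <- vc$vcov[1] / sum(vc$vcov)   # sigma_u^2 / (sigma_u^2 + sigma^2)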
The plot in Figure 3.1 highlights the closeness of the semester effects of
the two models. These values are the estimated coefficients for the fixed-effect
model, β̂t, and the BLUP (subsection 2.5.1) of the linear combination of the
random effects and the overall intercept for the random-intercept model,
β̂0 + ût. The fact that they overlap can be justified analytically. In the
FE-intercept model, the estimates of the fixed effects βt represent the means
of each cluster. In fact, by considering the cluster means of the model (3.2),

ȳt = βt + ϵ̄t,

the estimates of the time-specific intercepts correspond to

β̂t = ȳt.

On the other hand, the group mean for the RE-intercept model is obtained
as

β̂0t = β̂0 + ût = β̂0 + λ̂t(ȳt − β̂0),

where

λ̂t = nt σ̂u² / (σ̂² + nt σ̂u²)
is the shrinkage factor derived in subsection 2.5.4, which pulls the group
mean towards the overall mean by an amount depending on nt and the variance
components. In principle, the plot of the intercepts of the random-effect
model should appear smoother because of the shrinkage. However, since, in
our case, the sample sizes of the groups are large compared to the variance
components, the shrinkage factor tends to one for each t. In particular,
λ̂t = 0.95 for the smallest group, which includes 119 units, and λ̂t = 0.99 for
the biggest group, with 915 units. Therefore, each group-specific mean
dominates the population mean in magnitude, so that β̂t ≈ β̂0t and the two plots
are almost superimposed.
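The shrinkage factors quoted above can be checked directly from the RE-intercept estimates of Table 3.1 (σ̂² = 0.443, σ̂u² = 0.069):

    ## Shrinkage factor lambda_t = n_t * s2u / (s2 + n_t * s2u)
    lambda <- function(nt, s2 = 0.443, s2u = 0.069) nt * s2u / (s2 + nt * s2u)
    lambda(119)   # ~0.95, smallest semester
    lambda(915)   # ~0.99, largest semester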
Finally, the three models have been compared through information criteria:
the Akaike Information Criterion, AIC = −2 log L(θ) + 2p, and the
Bayesian Information Criterion, BIC = −2 log L(θ) + p log n, where L(θ) is
the likelihood corresponding to the parameter set θ of dimension p, and n
is the total sample size. In general, the model with the smallest AIC/BIC
is chosen as the one which fits best. Both the FE-intercept model and the
Table 3.1: Parameter estimates for the null model (3.1), the FE-intercept
model (3.2) and the RE-intercept model (3.3). Bootstrap standard
errors are indicated in parentheses.

            Null model        FE-intercept      RE-intercept
            Estimate (s.e.)   Estimate (s.e.)   Estimate (s.e.)
σ̂²         0.499 (0.000)     0.443 (0.000)     0.443 (0.002)
σ̂u²        -                 -                 0.069 (0.006)
ICC         -                 -                 0.135
AIC         30289             28625             28718
BIC         30304             28837             28741
# param.    2                 28                3
β̂0         3.713 (0.005)     -                 3.79 (0.052)
1998-1      -                 3.411 (0.027)     -0.373 (0.024)
1998-2      -                 3.720 (0.019)     -0.070 (0.019)
1999-1      -                 3.686 (0.021)     -0.103 (0.020)
1999-2      -                 3.741 (0.020)     -0.049 (0.019)
2000-1      -                 3.926 (0.021)      0.134 (0.021)
2000-2      -                 3.893 (0.019)      0.102 (0.020)
2001-1      -                 4.202 (0.028)      0.403 (0.026)
2001-2      -                 4.098 (0.039)      0.297 (0.034)
2002-1      -                 3.987 (0.031)      0.194 (0.029)
2002-2      -                 3.982 (0.052)      0.182 (0.042)
2003-1      -                 3.478 (0.028)     -0.310 (0.029)
2003-2      -                 3.194 (0.024)     -0.592 (0.024)
2004-1      -                 3.375 (0.028)     -0.412 (0.032)
2004-2      -                 3.557 (0.026)     -0.231 (0.027)
2005-1      -                 3.743 (0.024)     -0.047 (0.025)
2005-2      -                 3.654 (0.031)     -0.134 (0.029)
2006-1      -                 3.902 (0.033)      0.110 (0.030)
2006-2      -                 3.735 (0.023)     -0.055 (0.023)
2007-1      -                 3.786 (0.025)     -0.004 (0.025)
2007-2      -                 3.559 (0.036)     -0.228 (0.035)
2008-1      -                 3.686 (0.035)     -0.103 (0.032)
2008-2      -                 3.659 (0.028)     -0.130 (0.026)
2009-1      -                 3.851 (0.038)      0.059 (0.034)
2009-2      -                 3.898 (0.058)      0.105 (0.052)
2010-1      -                 4.170 (0.039)      0.371 (0.040)
2010-2      -                 4.262 (0.071)      0.450 (0.059)
2011-1      -                 4.237 (0.045)      0.433 (0.043)
Figure 3.1: Plot of the time-specific intercepts: β̂t for the FE-intercept
model and β̂0t for the RE-intercept model, by semester. The red horizontal
line represents β̂0.
RE-intercept model have smaller criterion values than the null model
(Table 3.1). However, the AIC and the BIC give conflicting indications for the
FE-intercept and the RE-intercept models; the BIC favours the RE-intercept
model, which is more parsimonious than the FE-intercept one.
3.2
Models with covariates
As a further modelling step, we include into the models the covariates that
are assumed to be important. We started with an extended set of variables,
which has been reduced through a backward elimination procedure by means
of significance tests (t-test and F-test) and information criteria.
The final hedonic regression model for the price of artworks, which
Figure 3.2: Boxplots of residuals by semester of the null model (a), the
FE-intercept model (b) and the RE-intercept model (c).
we call "FE-hedonic", is:

log10(yit) = βt + β1 OGGit + β2 REGit + β3 MATPit
           + β4 CPATit + β5 CATDit + β6 CABSit + β7 CABCit
           + β8 CAESit + β9 CASTit + β10 CAIL:CAAIit
           + β11 ASNC:ASLUit + ϵit,          (3.4)
where the meaning of the labels is reported in Tables 1.1, 1.2 and 1.3, and
the sign ":" indicates the interaction between two variables. Note that all
the variables in the model are categorical; therefore, each of them results in
as many dummy variables as the number of levels of the covariate minus
one. For example, the variable Type of object includes 12 levels (Table 1.1);
hence OGG specifies a set of 11 dummy variables, with the first level, in this
case "Furniture", selected as the baseline.
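In R, this coding is produced automatically by model.matrix; a toy illustration with three of the object types (hypothetical data, for exposition only):

    ogg <- factor(c("Furniture", "Masks", "Sculptures", "Masks"))
    model.matrix(~ ogg)
    ## the baseline "Furniture" is absorbed into the intercept; the other
    ## levels enter as 0/1 indicator columns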
The two-level model with the same set of covariates as (3.4), which we call
"RE-hedonic", has the form:

log10(yit) = β0t + β1 OGGit + β2 REGit + β3 MATPit
           + β4 CPATit + β5 CATDit + β6 CABSit + β7 CABCit
           + β8 CAESit + β9 CASTit + β10 CAIL:CAAIit
           + β11 ASNC:ASLUit + ϵit,          (3.5)
β0t = β0 + ut,

where ut is the random intercept for semester t. The model assumes that:

ut|Xt ∼ NID(0, σu²),   ϵit|Xt ∼ NID(0, σ²),   ut ⊥ ϵit,

where Xt = [OGGt REGt . . .].
The FE-hedonic model and the RE-hedonic model (Table 3.2) are very
similar in terms of estimates. First of all, Figure 3.3 shows that the
time effects are very close for the two models, as we expected given the
very high shrinkage factors observed in the previous section.
The likelihood ratio test between the multilevel model and its restricted
version without random effects (the hedonic model with a single intercept β0
in place of β0t) produces a value of 1509.526, confirming that the
between-semester variance is significantly different from zero.
Table 3.2: Parameter estimates for the FE-hedonic model (3.4) and the
RE-hedonic model (3.5). Bootstrap standard errors are indicated in
parentheses.

                                   FE-hedonic         RE-hedonic
                                   Estimate (s.e.)    Estimate (s.e.)
σ̂²                                0.171 (0.000)      0.171 (0.001)
σ̂u²                               -                  0.026 (0.003)
ICC                                -                  0.133
AIC                                15347              15439
BIC                                16087              15991
# param.                           98                 73
β̂0                                -                  2.256 (0.121)

Semester
1998-1                             1.993 (0.074)      -0.257 (0.021)
1998-2                             2.117 (0.069)      -0.137 (0.016)
1999-1                             2.177 (0.072)      -0.077 (0.019)
1999-2                             2.380 (0.070)       0.123 (0.017)
2000-1                             2.488 (0.071)       0.230 (0.016)
2000-2                             2.449 (0.070)       0.192 (0.016)
2001-1                             2.425 (0.063)       0.167 (0.019)
2001-2                             2.279 (0.075)       0.023 (0.024)
2002-1                             2.370 (0.068)       0.114 (0.016)
2002-2                             2.141 (0.080)      -0.108 (0.030)
2003-1                             2.072 (0.068)      -0.180 (0.017)
2003-2                             1.985 (0.067)      -0.267 (0.016)
2004-1                             1.960 (0.065)      -0.291 (0.018)
2004-2                             2.079 (0.069)      -0.174 (0.017)
2005-1                             2.255 (0.068)       0.001 (0.018)
2005-2                             2.227 (0.068)      -0.027 (0.017)
2006-1                             2.244 (0.070)      -0.010 (0.018)
2006-2                             2.141 (0.066)      -0.113 (0.016)
2007-1                             2.182 (0.067)      -0.072 (0.016)
2007-2                             2.305 (0.068)       0.049 (0.021)
2008-1                             2.250 (0.064)      -0.004 (0.022)
2008-2                             2.174 (0.069)      -0.080 (0.017)
2009-1                             2.269 (0.067)       0.013 (0.019)
2009-2                             2.491 (0.070)       0.229 (0.031)
2010-1                             2.535 (0.070)       0.273 (0.033)
2010-2                             2.499 (0.082)       0.233 (0.043)
2011-1                             2.412 (0.075)       0.151 (0.035)

Type of object: baseline Furniture
Sticks                             -0.088 (0.026)     -0.088 (0.035)
Masks                               0.109 (0.020)      0.109 (0.023)
Religious objects                  -0.002 (0.025)     -0.003 (0.028)
Ornaments                          -0.099 (0.027)     -0.099 (0.038)
Sculptures                          0.050 (0.020)      0.050 (0.022)
Musical instruments                -0.114 (0.034)     -0.114 (0.042)
Tools                              -0.082 (0.021)     -0.082 (0.023)
Clothing                           -0.069 (0.038)     -0.069 (0.052)
Textiles                           -0.038 (0.038)     -0.038 (0.058)
Weapons                            -0.089 (0.026)     -0.089 (0.034)
Jewels                             -0.050 (0.034)     -0.050 (0.049)

Region: baseline Central America
Southern Africa                    -0.158 (0.033)     -0.159 (0.039)
Western Africa                     -0.105 (0.011)     -0.105 (0.018)
Eastern Africa                     -0.153 (0.024)     -0.153 (0.031)
Australia                           0.061 (0.053)      0.061 (0.073)
Indonesia                          -0.109 (0.024)     -0.110 (0.043)
Melanesia                           0.006 (0.013)      0.006 (0.029)
Polynesia                           0.177 (0.015)      0.177 (0.032)
Northern America                    0.229 (0.018)      0.229 (0.053)
Northern Africa                    -0.371 (0.120)     -0.371 (0.140)
Southern America                    0.016 (0.024)      0.015 (0.052)
Mesoamerica                         0.117 (0.021)      0.116 (0.052)
Far Eastern                        -0.088 (0.145)     -0.088 (0.270)
Micronesia                          0.096 (0.072)      0.096 (0.070)
Indian Region                       0.301 (0.105)      0.297 (0.105)
Asian Southeast                    -0.070 (0.112)     -0.072 (0.142)
Middle East                        -0.559 (0.088)     -0.558 (0.137)

Type of material: baseline Ivory
Vegetable fibre, paper, plumage    -0.045 (0.028)     -0.045 (0.031)
Wood                                0.073 (0.019)      0.073 (0.026)
Metal                              -0.032 (0.029)     -0.032 (0.049)
Gold                                0.131 (0.040)      0.130 (0.060)
Stone                               0.039 (0.028)      0.039 (0.035)
Precious stone                      0.053 (0.034)      0.053 (0.046)
Terracotta, ceramic                 0.003 (0.026)      0.003 (0.048)
Silver                             -0.083 (0.041)     -0.084 (0.073)
Textile and hides                  -0.020 (0.031)     -0.020 (0.054)
Seashell                            0.061 (0.049)      0.061 (0.107)
Bone, horn                         -0.133 (0.033)     -0.133 (0.074)
Not indicated                       0.047 (0.041)      0.046 (0.051)

Patina: baseline Not indicated
Pejorative                          0.234 (0.037)      0.233 (0.043)
Present                             0.033 (0.012)      0.033 (0.025)
Appreciative                        0.114 (0.013)      0.113 (0.028)

Description on the catalogue: baseline Absent
Short visual descr.                -0.172 (0.033)     -0.173 (0.079)
Visual descr.                       0.005 (0.035)      0.003 (0.081)
Broad visual descr.                 0.238 (0.040)      0.237 (0.088)
Critical descr.                     0.226 (0.037)      0.225 (0.090)
Broad critical descr.               0.598 (0.047)      0.597 (0.105)

Yes vs No
Specialized bibliography (dummy)    0.138 (0.011)      0.138 (0.022)
Comparative bibliography (dummy)    0.120 (0.009)      0.120 (0.019)
Exhibition (dummy)                  0.066 (0.013)      0.066 (0.028)

Historicization: baseline Absent
Museum certification                0.019 (0.015)      0.019 (0.040)
Relevant museum certification       0.032 (0.014)      0.032 (0.042)
Simple certification                0.033 (0.009)      0.034 (0.026)

Illustration: baseline Absent
Miscellaneous col. ill.             0.410 (0.021)      0.411 (0.045)
Col. cover                          1.413 (0.099)      1.412 (0.181)
Col. half page                      0.852 (0.024)      0.854 (0.068)
Col. full page                      1.005 (0.025)      1.005 (0.070)
More than one col. ill.             1.221 (0.027)      1.221 (0.076)
Col. quarter page                   0.668 (0.021)      0.669 (0.059)
Miscellaneous b/w ill.              0.404 (0.031)      0.403 (0.054)
b/w half page                       0.545 (0.044)      0.546 (0.071)
b/w quarter page                    0.301 (0.027)      0.303 (0.063)

Auction house and venue: baseline Bonhams-New York
Christie's-Amsterdam                0.782 (0.052)      0.783 (0.064)
Christie's-New York                 0.709 (0.051)      0.709 (0.062)
Christie's-Paris                    0.600 (0.048)      0.600 (0.059)
Encheres Rive Gauche-Paris          0.531 (0.083)      0.529 (0.052)
Koller-Zurich                      -0.005 (0.051)     -0.005 (0.089)
Piasa-Paris                         0.738 (0.071)      0.736 (0.064)
Sotheby's-New York                  0.881 (0.048)      0.881 (0.054)
Sotheby's-Paris                     0.744 (0.047)      0.744 (0.056)
Figure 3.3: Plot of the time-specific intercepts: β̂t for the FE-hedonic
model (3.4) and β̂0t for the RE-hedonic model (3.5), by semester. The red
horizontal line represents β̂0.
The estimates of the regression coefficients are also similar for the two
models. This means that the within-group effects are almost coincident with
the total effects, and the between-group effects are negligible. This can be
explained analytically. Since the model for the cluster means of the
FE-hedonic model is

ȳt = βt + x̄tᵀβ + ϵ̄t,

the vector of coefficients β is interpretable as the vector of the within-group
effects. In fact, following Skrondal and Rabe-Hesketh (2004), it can be
estimated equivalently either from the fixed-effect model (3.4) or from the
within-group regression model:

yit − ȳt = (xit − x̄t)ᵀβ_W + ϵit − ϵ̄t.

The within-group estimator of the fixed regression coefficients (obtained
through OLS estimation, but equivalent to the maximum likelihood estimator
under normality) is, therefore,

β̂_W = W_xx⁻¹ W_xy,

where W_xx = Σ_t Σ_i (xit − x̄t)(xit − x̄t)ᵀ and W_xy = Σ_t Σ_i (xit − x̄t)(yit − ȳt).
The between-group regression model, instead, has the following form:

ȳt − ȳ = (x̄t − x̄)ᵀβ_B + ϵ̄t − ϵ̄,

and the between-group estimator is

β̂_B = B_xx⁻¹ B_xy,

where B_xx = Σ_t (x̄t − x̄)(x̄t − x̄)ᵀ and B_xy = Σ_t (x̄t − x̄)(ȳt − ȳ). On the one
hand, the between-group estimator considers only the variation among groups; on
the other hand, the fixed-effect model, through the within-group estimator,
ignores this source of information and uses only the information within groups.
The advantage of the random-effect model, instead, is that it combines the
two pieces of information in one model. In fact, the GLS estimator of the
regression coefficients of the random-effect model (asymptotically equivalent
to the maximum likelihood estimator but easier to handle) is an average of the
within-estimator and the between-estimator weighted by their respective
precisions:

β̂_GLS = (V_W⁻¹ + V_B⁻¹)⁻¹ (V_W⁻¹ β̂_W + V_B⁻¹ β̂_B),

where the matrices V are the respective variance-covariance matrices
(Maddala, 1971). Another way to write the GLS estimator is

β̂_GLS = [W_xx + (1 − λ)B_xx]⁻¹ [W_xy + (1 − λ)B_xy],
where λ = nσu²/(σ² + nσu²) is the shrinkage factor. In our case, we saw that
λ ≈ 1, so the GLS estimator is almost identical to the within-estimator and
the between-group variation is ignored. Moreover, from an interpretative
perspective, the fact that the between-group effects are null entails that the
contextual effects coincide with the within-group effects. The contextual
effects are the additional effects of the group means on the responses that are
not accounted for by the individual-level variables. To identify them, we write
the fixed-effect model as (Raudenbush and Bryk, 2002)

yit = β0 + (xit − x̄t + x̄t)ᵀβ + ϵit = β0 + (xit − x̄t)ᵀβ_W + x̄tᵀβ_B + ϵit.

The contextual effects are represented by β̂_B − β̂_W. This means, for
example, that the proportion of sculptures sold in semester t (x̄t) does not
affect the price of object i in semester t beyond the fact that the
object is a sculpture, and the proportion of objects sold in semester t by
Christie's in New York does not affect the price of object i in semester
t beyond the fact that the object is sold by Christie's in New York.
In the following, we have a look at the residuals of both models. The
boxplots of the residuals for each semester in Figures 3.5a and 3.5b look
almost identical. As expected, they are better behaved than those of the models
without covariates (Figure 3.2); hence, the covariates have explained a part of
the variability of artwork prices. In fact, the residual variance is 0.171 for
both models, against 0.443 for the models without covariates.
The plots of residuals versus fitted values in Figure 3.4 show that the
(first-level) errors are centered at zero, the variability seems to be
constant, and the points do not reveal any particular pattern. However, by
plotting the standard deviation of the (level-1) residuals for each group, as
shown in Figure 3.6, a time-dependent heteroscedasticity becomes evident.
The assumption of normality for the errors, instead, is assessed by looking
at the normal probability plots of the residuals in Figure 3.7. The plot of
the level-1 residuals
Figure 3.4: Residuals versus fitted values of the FE-hedonic model (3.4) (a)
and the RE-hedonic model (3.5) (b).
Figure 3.5: Boxplots of residuals by semester for the FE-hedonic model (3.4)
(a) and the RE-hedonic model (3.5) (b).
shows deviations from normality in the tails. The Shapiro-Wilk
test (see Table 3.3) does not reject the hypothesis of normality only for the
Figure 3.6: Standard deviations of the level-1 residuals by semester of the
RE-hedonic model (3.5) (the FE-hedonic model shows an identical pattern).
second-level residuals of the RE-hedonic model.
Because the assumption of homoscedasticity, and also the assumption of
normality for the first-level errors, does not hold, we calculated the standard
errors of the estimates through the wild bootstrap, since it is robust to
heteroscedastic errors and, being nonparametric, also to non-Gaussian errors.
The wild bootstrap procedure adapted to the multilevel case is briefly
presented in subsection 3.2.1.
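A minimal sketch of the idea, for a plain regression fit (the multilevel adaptation of subsection 3.2.1 differs in how the residuals are resampled; function names are ours):

    ## Wild bootstrap SEs with Rademacher weights (illustrative sketch).
    wild_boot_se <- function(X, y, B = 999) {
      fit  <- lm.fit(X, y)
      res  <- fit$residuals; yhat <- fit$fitted.values
      coefs <- replicate(B, {
        w <- sample(c(-1, 1), length(y), replace = TRUE)  # Rademacher weights
        lm.fit(X, yhat + w * res)$coefficients            # refit on perturbed data
      })
      apply(coefs, 1, sd)   # bootstrap standard errors
    }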
Table 3.3: Shapiro-Wilk normality test for the residuals of the FE-hedonic model (3.4) and the RE-hedonic model (3.5).

            FE-hedonic   RE-hedonic (lev-1)   RE-hedonic (lev-2)
W           0.994        0.994                0.962
p-value     0.000        0.000                0.416

Figure 3.7: Normal probability plots of residuals of the FE-hedonic model (3.4) and the RE-hedonic model (3.5). (a) FE-hedonic residuals; (b) RE-hedonic lev-1 residuals; (c) RE-hedonic lev-2 residuals. [Normal Q-Q plots: sample quantiles versus theoretical quantiles; points not recoverable.]
In order to test the assumptions on the error processes for the RE-hedonic model, we have computed the autocorrelation functions (global and partial) of the level-1 residual means by semester (Figures 3.8a and 3.8b) and of the level-2 residuals (Figures 3.8c and 3.8d). The correlograms point at an autoregressive-like structure, similar to that of an AR(1) process. This time dependence is also present in the level-1 residual means by semester.
Figure 3.8: Autocorrelation functions of the residuals of the RE-hedonic model. (a) ACF of the level-1 residual means aggregated by semester; (b) partial ACF of the level-1 residual means aggregated by semester; (c) ACF of the level-2 residuals; (d) partial ACF of the level-2 residuals. [Correlograms: ACF / partial ACF versus lag, 0-14; bars not recoverable.]

3.2.1 Robust standard errors
In general, asymptotic standard errors of the ML estimates are given by the square roots of the diagonal elements of the inverse of the information matrix, evaluated at the ML solution. Because of the complexity of the information matrix for this model, we have resorted to the bootstrap to estimate the bias and variance of the estimators. In particular, we chose the wild bootstrap, first developed by Liu (1988), since the errors are not homoscedastic as the model assumes. We have implemented the usual wild bootstrap procedure, as explained by Davidson and Flachaire (2008), to obtain the standard errors for the estimates of the fixed-effects models. Instead, since for a two-level model the resampling scheme has to reflect the hierarchical data structure, we have adapted the procedure to this case.
Consider the classic multilevel model for the (nt × 1) response of the generic group t:

yt = β0 + Xt β + rt,   where   rt = 1nt ut + εt,

for all t = 1, . . . , T. The disturbances are assumed to be mutually independent and to have zero expectation, but they are allowed to be heteroscedastic. Moreover, the covariates are assumed to be strictly exogenous.

In the homoscedastic case, the variance of the residual vector r̂t is proportional to Int − Ht, where Ht = Xt (X^T X)^{−1} Xt^T is the orthogonal projection matrix corresponding to the design matrix Xt. This suggests replacing r̂t by the vector

r̃t = diag(Int − Ht)^{−1/2} · r̂t,

where the sign "·" indicates the element-by-element product of the two vectors. Then, the bootstrap procedure is as follows:
1. draw independently T values wt, t = 1, . . . , T, from the following two-point auxiliary distribution (Liu, 1988; Belsley and Kontoghiorghes, 2009): wt takes the value 1 with probability 0.5 and −1 with probability 0.5,    (3.6)
which has zero mean and unit variance;

2. generate the bootstrap samples as

y*t = β̂0 + Xt β̂ + r*t,

where the bootstrap disturbances are obtained as r*t = wt r̃t;

3. compute the estimates on the bootstrap sample y*;

4. repeat steps 1-3 B times and compute the bootstrap standard errors as

sqrt( (1/(B − 1)) Σ_{b=1}^{B} (θ̂*b − θ̂)² ),

where θ̂ is the vector of the ML estimates.
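As an illustration, the scheme above could be coded in R along the following lines (a minimal sketch with hypothetical names; fit_fun stands for any routine returning the ML fit, and this is not the own-written code used in the thesis):

# Sketch of the wild bootstrap adapted to grouped data (hypothetical names).
wild_boot_se <- function(y, X, Z, groups, fit_fun, B = 500) {
  fit     <- fit_fun(y, X, Z, groups)          # ML fit on the original sample
  r       <- y - fit$fitted                    # raw residuals r_hat
  hdiag   <- stats::hat(X)                     # diagonal of the projection matrix H
  r_tilde <- r / sqrt(1 - hdiag)               # leverage-adjusted residuals r_tilde
  g       <- match(groups, unique(groups))     # group index t of each observation
  Tn      <- max(g)
  est <- replicate(B, {
    w     <- sample(c(-1, 1), Tn, replace = TRUE)  # step 1: two-point draws w_t
    ystar <- fit$fitted + w[g] * r_tilde           # step 2: y* with w_t constant within group t
    fit_fun(ystar, X, Z, groups)$theta             # step 3: re-estimate on y*
  })
  apply(est, 1, sd)                            # step 4: bootstrap standard errors
}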
3.2.2 Considerations

In summary, the results of the FE-hedonic (3.4) and RE-hedonic (3.5) models are very similar in terms of estimates and residuals. The assumptions of homoscedasticity and normality for the (first-level) errors of both models are not valid, but this is not a serious issue at this stage, since we used robust standard errors. On the other hand, the predicted random effects are normally distributed with zero mean, but they are not independent across groups; this also causes the violation of the assumption of independence between first- and second-level errors.
Improving the classical multilevel model, for this case, requires relaxing the assumption of independence among random effects. Since the random effects, in the application to Tribal art data, represent time effects, the inclusion of such correlation implies treating them as a time series.

The main theoretical contribution of this thesis is the derivation of a multilevel model with time series components at the second level. As mentioned above, the correlograms of the residuals suggest the specification of an AR(1) model. The next chapter is devoted to the theoretical derivation of such a model. In particular, we will find a meaningful econometric specification for our problem. Then, we will derive and implement the maximum likelihood estimators and will perform a Monte Carlo study to assess their finite sample behaviour. Finally, we will fit the model to the Tribal art dataset and compare the results with those presented in this chapter.
Chapter 4
A multilevel model with time
series components
The previous chapter has shown that applying a classic multilevel model to the Tribal art data results in the violation of the assumption of independence among random effects and between first- and second-level errors. In this chapter, we propose a new extension of the classic multilevel model that consists in relaxing the assumption of independence among random effects and treating them as a time series at the second level. In particular, we first specify a multilevel model with an AR(1) process at the second level to capture the time dependence among groups. Section 4.2 contains the derivation of the maximum likelihood estimators through the E-M algorithm. The estimation algorithm has been implemented in R with own-written code. Through a Monte Carlo study, we will find that the ML estimators have a good finite sample behaviour and that our R code is valid. In the last section, we fit the new model to the Tribal art dataset and compare the results with those obtained by the classic multilevel model with independent random effects, presented in the previous chapter. We will see that the AR(1) process captures well the time dependence structure among group effects. Hence, the multilevel model with time series components fits the data better.
4.1 Model specification
Consider a random intercept model with k level-1 covariates:
yit = β1 x1it + β2 x2it + . . . + βk xkit + β0t + εit,    (4.1)

where t = 1, . . . , T indexes the level-2 units, and i = 1, . . . , nt indexes the level-1 units in the t-th level-2 unit. The intercepts β0t are group-specific and random; the slopes β1, . . . , βk, instead, are fixed. The εit are the level-1 residuals, assumed independent across groups given the covariates and independent of each other within the same group; also, they are normally distributed with zero mean and constant variance:

εit | xit ∼ NID(0, σ²),   ∀i and ∀t,

where xit = [x1it x2it . . . xkit]. All the random variables in the model are conditioned on the design matrices, but, from now on, the conditioning will be omitted to simplify the notation.
The random intercept β0t is modeled as
β0t = β0 + ut ,
where β0 represents the mean across the population, and ut is the deviation
of the group-specific intercept β0t from the overall mean.
Here, the usual assumption of independence for the random effects (ut ⊥ us for t ≠ s) is relaxed by assuming an autoregressive process of order 1 for the level-2 errors:

ut = ρ ut−1 + ηt,    ηt ∼ NID(0, σ²η),

with |ρ| < 1, which guarantees stationarity. Moreover, it is assumed that ηt ⊥ us for all s < t and ηt ⊥ εis for all t, s = 1, . . . , T and for all i = 1, . . . , ns. Under these assumptions the dependent variable has the following distribution:

yit ∼ N(β0 + xit β, σ² + φ0),

where β = [β1 β2 . . . βk]^T is the vector of fixed slopes, and

φ0 = Var(ut) = σ²η / (1 − ρ²).
In matrix form, the composite model for the whole response vector is

y = Xβ + Zb + ε,

where

Z = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_T} \end{pmatrix},

and b = [β01 β02 . . . β0T]^T is the vector of random intercepts. Since the random intercepts are not independent, the variance-covariance matrix of b is not diagonal, as in the classical multilevel framework. Instead, it is that of an AR(1) process:

Γ = φ0 \begin{pmatrix} 1 & ρ & \cdots & ρ^{T−1} \\ ρ & 1 & \cdots & ρ^{T−2} \\ \vdots & \vdots & \ddots & \vdots \\ ρ^{T−1} & ρ^{T−2} & \cdots & 1 \end{pmatrix}.

Therefore, the vector b is normally distributed with expected value 1T β0 and variance-covariance matrix Γ. Consequently, the probability distribution of the response vector is

y ∼ N(1n β0 + Xβ, ZΓZ^T + σ² In).    (4.2)
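For concreteness, the covariance structure in (4.2) can be assembled in a few lines of R (an illustrative sketch with arbitrary parameter values, not the thesis code):

# Sketch of Omega = Z Gamma Z' + sigma^2 I_n in the balanced case.
build_omega <- function(n_t, Tn, sigma2, sigma2_eta, rho) {
  Z     <- kronecker(diag(Tn), rep(1, n_t))        # block-diagonal Z: one column of 1s per group
  phi0  <- sigma2_eta / (1 - rho^2)                # phi_0 = Var(u_t), stationary AR(1)
  Gamma <- phi0 * rho^abs(outer(1:Tn, 1:Tn, "-"))  # AR(1) Toeplitz covariance of b
  Z %*% Gamma %*% t(Z) + sigma2 * diag(Tn * n_t)   # variance of y in (4.2)
}
Omega <- build_omega(n_t = 30, Tn = 10, sigma2 = 1, sigma2_eta = 0.6, rho = 0.7)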
We can now proceed to the estimation of the parameters. We have chosen the full maximum likelihood estimation method implemented through the E-M algorithm.

4.2 Full maximum likelihood estimation through the EM algorithm

The set of parameters of the multilevel model with AR(1) random effects to be estimated is θ = {β0, β, σ², ρ, σ²η}. In the following, we show the derivation of the full maximum likelihood estimators of θ and their implementation by means of the E-M algorithm.
The full likelihood function associated with the response vector y, with
probability distribution (4.2), is
L(θ; y) = f(y; θ) = |Ω|^{−1/2} (2π)^{−n/2} exp{ −(y − Ẋβ̇)^T Ω^{−1} (y − Ẋβ̇) / 2 },    (4.3)
where Ω = ZΓZ^T + σ² In is the variance-covariance matrix of y, and

Ẋ = [1n X]   and   β̇ = [β0 β]^T
are, respectively, the design matrix and the coefficient vector including the intercept. Maximizing L(θ; y) can be achieved by optimizing its logarithm, the log-likelihood
ℓ(θ; y) = ln L(θ; y) = −(n/2) ln(2π) − (1/2) ln|Ω| − (1/2)(y − Ẋβ̇)^T Ω^{−1} (y − Ẋβ̇).    (4.4)
As seen for the classic multilevel model in section 2.4.1, the direct derivation
of the log-likelihood (4.4) yields
∂ℓ/∂β̇ = Ẋ^T Ω^{−1} y − Ẋ^T Ω^{−1} Ẋ β̇    (4.5)
and
∂ℓ/∂ωj = −(1/2) tr(Ω^{−1} ∂Ω/∂ωj) + (1/2)(y − Ẋβ̇)^T Ω^{−1} (∂Ω/∂ωj) Ω^{−1} (y − Ẋβ̇)    (4.6)

for j = 1, 2, 3, where ω1 = σ², ω2 = ρ and ω3 = σ²η.
Equating the partial derivatives (4.5) and (4.6) to zero and solving the resulting equations for β̇ and the ωj's gives

Ẋ^T Ω^{−1} Ẋ β̇ = Ẋ^T Ω^{−1} y

and

tr(Ω^{−1} ∂Ω/∂ωj) = (y − Ẋβ̇)^T Ω^{−1} (∂Ω/∂ωj) Ω^{−1} (y − Ẋβ̇)    (4.7)

for j = 1, 2, 3. However, the equations (4.7) are nonlinear in the variance components ωj, so it is not possible to obtain closed-form expressions for their solutions. Therefore, we solve these equations through the EM algorithm.
The Expectation-Maximization algorithm (Dempster et al., 1977; Laird et al., 1987) is an iterative procedure to compute maximum likelihood estimates in incomplete-data problems. In the mixed-effects model, if the random effects were known, it would be possible to write the maximum likelihood estimates of the variance parameters ωj in closed form. This suggests treating b as missing data in an EM-algorithm context, so that (y, b) forms the complete dataset, that is, the dataset composed of the vector of the observed incomplete data, y, and the vector of the unobserved data, b. To simplify the notation, we separate the set of parameters of the multilevel model with AR(1) random effects into two subsets: θ = {θ1, θ2}, where the subset θ1 = {β, σ²} includes the level-1 parameters, and θ2 = {β0, ρ, σ²η} the level-2 parameters.
The joint or complete likelihood associated with the complete dataset is

L(θ; y, b) = f(y, b; θ) = f(y | b; θ1) f(b; θ2)
 = (2πσ²)^{−n/2} exp{ −(y − Xβ − Zb)^T (y − Xβ − Zb) / (2σ²) }
 × (2π)^{−T/2} |Γ|^{−1/2} exp{ −(b − β0 1T)^T Γ^{−1} (b − β0 1T) / 2 },

which can be rewritten as

L(θ; y, b) = (2πσ²)^{−n/2} exp{ −(y − Xβ − Zb)^T (y − Xβ − Zb) / (2σ²) }
 × (2πσ²η)^{−T/2} |V|^{−1/2} exp{ −(b − β0 1T)^T V^{−1} (b − β0 1T) / (2σ²η) },    (4.8)

since the covariance matrix Γ is equal to σ²η V, where

V = (1 / (1 − ρ²)) \begin{pmatrix} 1 & ρ & \cdots & ρ^{T−1} \\ ρ & 1 & \cdots & ρ^{T−2} \\ \vdots & \vdots & \ddots & \vdots \\ ρ^{T−1} & ρ^{T−2} & \cdots & 1 \end{pmatrix}.
Hence, the joint log-likelihood is the sum of two separate components:

ℓ(θ; y, b) = ln L(θ; y, b) = ℓ1(θ1) + ℓ2(θ2),    (4.9)

where

ℓ1(θ1) = ln f(y | b) = −(n/2) ln(2πσ²) − (y − Xβ − Zb)^T (y − Xβ − Zb) / (2σ²)

and

ℓ2(θ2) = ln f(b) = −(T/2) ln(2πσ²η) + (1/2) ln(1 − ρ²) − (b − β0 1T)^T V^{−1} (b − β0 1T) / (2σ²η).    (4.10)
Note that the term (1 − ρ²) in (4.10) comes from |V| = |∆^T ∆|^{−1} = (1 − ρ²)^{−1}, where

∆ = \begin{pmatrix} \sqrt{1 − ρ²} & 0 & 0 & \cdots & 0 \\ −ρ & 1 & 0 & \cdots & 0 \\ 0 & −ρ & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & −ρ & 1 \end{pmatrix},

and it is straightforward to show that (see for example Hamilton (1994))

V^{−1} = ∆^T ∆ = \begin{pmatrix} 1 & −ρ & 0 & \cdots & 0 & 0 \\ −ρ & 1 + ρ² & −ρ & \cdots & 0 & 0 \\ 0 & −ρ & 1 + ρ² & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 + ρ² & −ρ \\ 0 & 0 & 0 & \cdots & −ρ & 1 \end{pmatrix}.
The estimation of θ is obtained through an iterative procedure involving the complete log-likelihood. Starting from some initial values of the parameters, each iteration consists of two steps.
E-STEP

In the Expectation step, since the complete log-likelihood (4.9) is unobservable, the current value of θ, denoted θ^(h), is used to evaluate its expected value conditional on the observed data:

E[ ℓ(θ; y, b) | y ].    (4.11)

The probability distribution of the unobserved data conditional on the observed data is

b | y ∼ N( 1T β0 + ΓZ^T Ω^{−1} (y − 1n β0 − Xβ),  Γ − ΓZ^T Ω^{−1} ZΓ^T ),    (4.12)

since

\begin{pmatrix} b \\ y \end{pmatrix} ∼ N\left( \begin{pmatrix} 1T β0 \\ 1n β0 + Xβ \end{pmatrix}, \begin{pmatrix} Γ & ΓZ^T \\ ZΓ^T & Ω \end{pmatrix} \right).

By exploiting some results on Schur complements (see for example Searle et al. (1992)), the variance-covariance matrix of the conditional distribution can be simplified as follows:

Γ − ΓZ^T Ω^{−1} ZΓ^T = ( Γ^{−1} + Z^T Z / σ² )^{−1}.
Since, in the next step, the modified log-likelihood (4.11) will be maximized, it is more convenient to compute the conditional expectation of the score functions from the joint log-likelihood (4.9), that is

∫ [ ∂ℓ(θ; y, b) / ∂θ ] f(b | y; θ^(h)) db.
The complete log-likelihood gives the following score functions
• w.r.t. β:

∂ℓ1(θ1)/∂β = ( −X^T X β + X^T y − X^T Z b ) / σ²;    (4.13)

• w.r.t. σ²:

∂ℓ1(θ1)/∂σ² = −n / (2σ²) + (y − Xβ − Zb)^T (y − Xβ − Zb) / (2σ⁴);

• w.r.t. β0:

∂ℓ2(θ2)/∂β0 = ( 1T^T V^{−1} b − 1T^T V^{−1} 1T β0 ) / σ²η
 = [ (1 − ρ)(b1 + bT) + (1 − ρ)² Σ_{t=2}^{T−1} bt − (1 − ρ)(T − (T − 2)ρ) β0 ] / σ²η;    (4.14)

• w.r.t. σ²η:

∂ℓ2(θ2)/∂σ²η = −T / (2σ²η) + (b − β0 1T)^T V^{−1} (b − β0 1T) / (2σ⁴η);    (4.15)

• w.r.t. ρ:

∂ℓ2(θ2)/∂ρ = −ρ / (1 − ρ²) − (1 / (2σ²η)) u^T (∂V^{−1}/∂ρ) u,    (4.16)
where u = b − β0 1T. Note that

∂V^{−1}/∂ρ = \begin{pmatrix} 0 & −1 & 0 & \cdots & 0 & 0 \\ −1 & 2ρ & −1 & \cdots & 0 & 0 \\ 0 & −1 & 2ρ & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 2ρ & −1 \\ 0 & 0 & 0 & \cdots & −1 & 0 \end{pmatrix},

so that its product with u^T and u yields a simple expression:

u^T (∂V^{−1}/∂ρ) u = −2 Σ_{t=1}^{T−1} ut ut+1 + 2ρ Σ_{t=2}^{T−1} u²t.
Hence, the score function (4.16) takes the following form:

∂ℓ2(θ2)/∂ρ = [ 1 / (σ²η (1 − ρ²)) ] [ ρ³ Σ_{t=2}^{T−1} u²t − ρ² Σ_{t=1}^{T−1} ut ut+1 − ρ ( σ²η + Σ_{t=2}^{T−1} u²t ) + Σ_{t=1}^{T−1} ut ut+1 ]
 = (1/σ²η) [ Σ_{t=1}^{T−1} ut ut+1 − ρ Σ_{t=2}^{T−1} u²t ] − ρ / (1 − ρ²).
Now, instead of calculating the conditional expectations of the score functions directly, it is enough to take the conditional expectations of the sufficient statistics given the observed data (McLachlan and Krishnan, 1997), evaluate them at the current estimates of the parameters θ^(h), and substitute them into the score functions. By doing so, the following conditional expectations of the score functions are obtained:
E[ ∂ℓ1(θ1)/∂β | y; θ^(h) ] = ( −X^T X β + X^T y − X^T Z b̂ ) / σ²

E[ ∂ℓ1(θ1)/∂σ² | y; θ^(h) ] = −n / (2σ²) + [ (y − Xβ − Z b̂)^T (y − Xβ − Z b̂) + tr(Z^T Z B) ] / (2σ⁴)

E[ ∂ℓ2(θ2)/∂β0 | y; θ^(h) ] = [ (1 − ρ)(b̂1 + b̂T) + (1 − ρ)² Σ_{t=2}^{T−1} b̂t − (1 − ρ)(T − (T − 2)ρ) β0 ] / σ²η

E[ ∂ℓ2(θ2)/∂σ²η | y; θ^(h) ] = −T / (2σ²η) + [ tr(V^{−1} B) + û^T V^{−1} û ] / (2σ⁴η)

E[ ∂ℓ2(θ2)/∂ρ | y; θ^(h) ] = (1/σ²η) [ Σ_{t=1}^{T−1} (B_{t,t+1} + ût ût+1) − ρ Σ_{t=2}^{T−1} (B_{t,t} + û²t) ] − ρ / (1 − ρ²)    (4.17)

where b̂ = β̂0 + û and B = Var(b | y; θ^(h)) are, respectively, the conditional expected value and the variance-covariance matrix of the random vector b, whose expressions are given in (4.12).
M-STEP

The Maximization step consists in maximizing the conditional expected value of the log-likelihood (4.11) computed in the E-step, to get new values for the vector of parameters, θ^(h+1). In practice, since the expected values of the score functions have been computed, it is enough to set them equal to zero and solve for the parameters. Therefore, the current values of the parameters are updated as follows:

β̂^(h+1) = (X^T X)^{−1} X^T (y − Z b̂)

(σ̂²)^(h+1) = [ (y − Xβ − Z b̂)^T (y − Xβ − Z b̂) + tr(Z^T Z B) ] / n

β̂0^(h+1) = [ b̂1 + b̂T + (1 − ρ) Σ_{t=2}^{T−1} b̂t ] / [ T − (T − 2)ρ ]

(σ̂²η)^(h+1) = [ tr(V^{−1} B) + û^T V^{−1} û ] / T

Since the expected score function for ρ in (4.17) is a cubic function, the estimate of ρ cannot be expressed in explicit form. Therefore, only in this case, it is necessary to resort to numerical optimization. Because the first derivative of the expected score for ρ has a simple form, it is easy to implement the Newton-Raphson algorithm. It updates the current value of the estimate, ρ̂^(h), through the formula

ρ̂^(h+1) = ρ̂^(h) − E[ ∂ℓ2(θ2)/∂ρ | y; θ^(h) ] / ( ∂E[ ∂ℓ2(θ2)/∂ρ | y; θ^(h) ] / ∂ρ ),

where

∂E[ ∂ℓ2(θ2)/∂ρ | y; θ^(h) ] / ∂ρ = −(1/σ²η) Σ_{t=2}^{T−1} (B_{t,t} + û²t) − (1 + ρ²) / (1 − ρ²)²

is the second derivative of the joint log-likelihood with respect to ρ, which, within the E-M algorithm, is the first derivative of the expected score function.
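A compact R sketch of this update, assuming hypothetical objects u_hat (the predicted deviations û) and B (the conditional covariance of b), might read:

# One Newton-Raphson step for rho within the M-step (hypothetical names).
update_rho <- function(rho, u_hat, B, sigma2_eta) {
  Tn <- length(u_hat)
  s1 <- sum(B[cbind(1:(Tn - 1), 2:Tn)] + u_hat[-Tn] * u_hat[-1])  # sum_t (B_{t,t+1} + u_t u_{t+1})
  s2 <- sum(diag(B)[2:(Tn - 1)] + u_hat[2:(Tn - 1)]^2)            # sum over t = 2, ..., T-1
  g  <- (s1 - rho * s2) / sigma2_eta - rho / (1 - rho^2)          # expected score in (4.17)
  dg <- -s2 / sigma2_eta - (1 + rho^2) / (1 - rho^2)^2            # its derivative in rho
  rho - g / dg                                                    # one Newton-Raphson step
}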
The E-step and the M-step are iterated until convergence, that is, until

ℓ(θ^(h+1); y, b) − ℓ(θ^(h); y, b)

is arbitrarily small. In summary, the E-M algorithm consists of the following steps:

1. choosing some initial values for the parameters;
2. E-step as above;
3. M-step as above;
4. repeating steps 2 and 3 until convergence.

Although the achievement of the global maximum is not guaranteed, the monotonicity of the EM procedure, i.e.

ℓ(θ^(h+1); y, b) ≥ ℓ(θ^(h); y, b),

can be demonstrated (Dempster et al., 1977). The EM algorithm directly produces the Empirical Bayes predictions of the random effects b (see section 2.5), that is, the mean of their distribution conditional on the observed data y as in (4.12), namely

b̂ = E(b | y) = β̂0 + û,   where   û = Γ̂ Z^T Ω̂^{−1} (y − 1n β̂0 − X β̂).
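To make the iteration concrete, the following R skeleton sketches one possible implementation of the E-M loop; e_step, m_step and expected_loglik are hypothetical helpers standing for the formulas above, not the actual own-written code:

# Skeleton of the E-M iteration for the AR(1) random-effects model.
em_fit <- function(y, X, Z, theta0, tol = 1e-8, max_iter = 1000) {
  theta  <- theta0
  ll_old <- -Inf
  for (h in seq_len(max_iter)) {
    mom   <- e_step(y, X, Z, theta)        # E-step: b_hat, u_hat and B = Var(b | y)
    theta <- m_step(y, X, Z, mom, theta)   # M-step: closed forms plus NR step for rho
    ll    <- expected_loglik(y, X, Z, theta, mom)
    if (abs(ll - ll_old) < tol) break      # stop when the increase is arbitrarily small
    ll_old <- ll
  }
  list(theta = theta, loglik = ll)
}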
4.3 Simulation study

In order to assess the finite sample properties of the estimators and to validate the practical implementation of the algorithm, we have performed a Monte Carlo study. In the scheme, we have varied the group size nt and the time span T, as reported in Table 4.1.
Table 4.1: Scheme of the Monte Carlo study.

Scenario   nt    T     n      Results in
1          30    10    300    Table 4.2
2          100   10    1000   Table 4.2
3          100   30    3000   Table 4.3
4          30    100   3000   Table 4.3
5          30    50    1500   Table 4.4
6          100   50    5000   Table 4.4
Table 4.2: Results for scenarios 1 and 2 of the MC study. The table is separated into two row-blocks: one for the level-1 parameters and one for the level-2 parameters.

        Scenario 1: n = 300, nt = 30, T = 10      Scenario 2: n = 1000, nt = 100, T = 10
     True    Mean    Sd      Bias     MSE         Mean    Sd      Bias     MSE
β̂1   2.000   2.007   0.197   -0.007   0.039       2.000   0.101   0.000    0.010
β̂2   4.000   3.992   0.128   0.008    0.016       4.001   0.069   -0.001   0.005
σ̂²   1.000   0.995   0.082   0.005    0.007       1.001   0.044   -0.001   0.002
β̂0   3.000   3.023   0.693   -0.024   0.479       3.011   0.683   -0.011   0.465
σ̂²η  0.600   0.475   0.253   0.125    0.080       0.475   0.235   0.125    0.071
ρ̂    0.700   0.374   0.345   0.326    0.225       0.390   0.315   0.310    0.195
Table 4.3: Results for scenarios 3 and 4 of the MC study. The table is separated into two row-blocks: one for the level-1 parameters and one for the level-2 parameters.

        Scenario 3: n = 3000, nt = 100, T = 30    Scenario 4: n = 3000, nt = 30, T = 100
     True    Mean    Sd      Bias     MSE         Mean    Sd      Bias     MSE
β̂1   2.000   1.996   0.062   0.004    0.004       2.005   0.064   -0.005   0.004
β̂2   4.000   4.000   0.040   0.000    0.002       4.003   0.039   -0.002   0.002
σ̂²   1.000   0.998   0.027   0.002    0.001       0.998   0.027   0.002    0.001
β̂0   3.000   2.975   0.421   0.025    0.178       2.998   0.257   0.002    0.066
σ̂²η  0.600   0.561   0.152   0.039    0.024       0.586   0.090   0.014    0.008
ρ̂    0.700   0.586   0.147   0.114    0.035       0.659   0.082   0.041    0.008
Table 4.4: Results for scenarios 5 and 6 of the MC study. The table is separated into two row-blocks: one for the level-1 parameters and one for the level-2 parameters.

        Scenario 5: n = 1500, nt = 30, T = 50     Scenario 6: n = 5000, nt = 100, T = 50
     True    Mean    Sd      Bias     MSE         Mean    Sd      Bias     MSE
β̂1   2.000   2.005   0.090   -0.005   0.008       2.001   0.047   -0.001   0.002
β̂2   4.000   3.999   0.054   0.001    0.003       4.001   0.029   -0.001   0.001
σ̂²   1.000   0.999   0.037   0.002    0.001       1.000   0.018   0.000    0.000
β̂0   3.000   3.012   0.341   -0.012   0.116       2.989   0.352   0.011    0.124
σ̂²η  0.600   0.581   0.129   0.019    0.017       0.568   0.109   0.032    0.013
ρ̂    0.700   0.630   0.120   0.070    0.019       0.632   0.108   0.068    0.016
We have generated n (= nt · T) level-1 errors εit and T level-2 errors ηt from normal distributions with zero mean and variances equal to the true values of σ² and σ²η, respectively. The vector u has been generated as an autoregressive process of order 1 with errors ηt. In the balanced case, the matrix Z takes the following form:

Z = IT ⊗ 1nt,

where the sign ⊗ represents the Kronecker product. As for the design matrix X, two exogenous dichotomous covariates have been considered: the first one with 90% of 0 and 10% of 1, and the second one with 68% of 0 and 32% of 1. Then, the (n × 1) response vector is computed as

y = β0 + Xβ + Zu + ε.

The estimation algorithm, presented in section 4.2, is applied to these data to give maximum likelihood estimates of the parameters. We have repeated this procedure 500 times and, for each parameter, we have calculated the following statistics over the 500 replicates:
• mean,
• standard deviation,
• bias, as the difference between the true value of the parameter and the Monte Carlo mean,
• Mean Square Error (MSE), that is (1/500) Σ_{h=1}^{500} (θ̂h − θ)².
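As an illustration of the generation scheme described above, one replicate of scenario 1 could be produced in R as follows (a sketch using the true parameter values of Tables 4.2-4.4; not the thesis code):

set.seed(1)                                  # for reproducibility of the sketch
nt <- 30; Tn <- 10; n <- nt * Tn             # scenario 1 sample sizes
beta0 <- 3; beta <- c(2, 4)                  # true fixed effects
sigma2 <- 1; sigma2_eta <- 0.6; rho <- 0.7   # true variance parameters
X   <- cbind(rbinom(n, 1, 0.10),             # dichotomous covariate, 10% of 1
             rbinom(n, 1, 0.32))             # dichotomous covariate, 32% of 1
Z   <- kronecker(diag(Tn), rep(1, nt))       # Z = I_T (x) 1_{nt}
u   <- as.numeric(arima.sim(list(ar = rho),  # AR(1) level-2 effects with
                            n = Tn, sd = sqrt(sigma2_eta)))  # innovations eta_t
eps <- rnorm(n, 0, sqrt(sigma2))             # level-1 errors
y   <- beta0 + X %*% beta + Z %*% u + eps    # response vector, as in the text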
The whole estimation and simulation procedure has been implemented in R with completely own-written code. Tables 4.2, 4.3 and 4.4 show the results for the six scenarios. The statistics for the level-1 estimates, β1, β2 and σ², depend on the global sample size, n. In fact, in scenarios 3 and 4, where n is kept constant, means, standard deviations, bias and MSE are similar. As n increases, bias, standard deviations and MSE decrease, as expected. On the other hand, the level-2 estimates, β0, σ²η and ρ, depend on the number of groups, i.e. the number of time points T. In fact, in general, the values of the statistics decrease as T increases.

The results show that the estimators have a good finite sample performance. Therefore, we expect an even better behaviour of the estimators when applied to the Tribal art dataset, which contains many more observations than the simulated data. Also, the validity of the R implementation is confirmed.
Finally, we have extended the simulation study to assess the forecasting capability of the model. To this aim, we have simulated n + nt level-1 units and T + 1 level-2 units through the same procedure as before. Then, we have estimated the model by using only the first n units and the first T groups. The (nt × 1) response vector yT+1 can be predicted as

ŷT+1 = β̂0 + XT+1 β̂ + 1nt ûT+1,   where   ûT+1 = ρ̂ ûT.

This procedure has been repeated 500 times, and for each replication s we have calculated the Mean Absolute Percentage Error

MAPEs = (1/nt) Σ_{i=1}^{nt} | (yi,T+1 − ŷi,T+1,s) / yi,T+1 |,

where yi,T+1 is the simulated response for the i-th unit in group T + 1.
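A minimal R sketch of this forecasting exercise (hypothetical names; fit is assumed to hold the EM estimates and the predicted random effects) is:

# 1-step-ahead forecast and MAPE for the AR(1) random-effects model (sketch).
forecast_T1 <- function(fit, X_new) {
  u_next <- fit$rho * tail(fit$u_hat, 1)               # u_hat_{T+1} = rho_hat * u_hat_T
  as.numeric(fit$beta0 + X_new %*% fit$beta + u_next)  # y_hat_{T+1}
}
mape <- function(y, yhat) mean(abs((y - yhat) / y))    # MAPE as defined above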
In order to have an element of comparison, the response vector yT+1 for each replication has also been predicted through a multilevel model with independent random effects, in the following way:

ŷT+1 = β̂0 + XT+1 β̂.

Table 4.5 reports the mean and standard deviation of the MAPE over the 500 replications, both for the model with autoregressive random effects ("AR model" in the table) and for the model with independent random effects ("IND model"), in three scenarios differing in the level-1 and level-2 sample sizes. First of all, it shows that, unsurprisingly, the forecasting capability of both models improves in terms of mean MAPE as the number of level-1 units, n, and the number of level-2 units, T, increase. Moreover, the distribution of the MAPE reveals that, in all the scenarios, the model with autoregressive random effects provides a better forecast of the responses at an out-of-sample, 1-lagged time point than the classic multilevel model.
Table 4.5: Simulated forecast for the multilevel model with autoregressive random effects (AR model) and for the model with independent random effects (IND model). For both models, the mean and the standard deviation of the MAPE are reported.

                             Mean of MAPE             Sd of MAPE
                             AR model   IND model     AR model   IND model
n = 300,  nt = 30,  T = 10   0.412      0.439         0.701      0.720
n = 3000, nt = 100, T = 30   0.365      0.393         0.605      0.714
n = 5000, nt = 100, T = 50   0.352      0.382         0.982      1.250
4.4 The new model and Tribal art prices

In this section, we show the results of the fit of the new model on the Tribal art dataset. The set of covariates is the same as that of the classical multilevel model with independent random effects (3.5). The model, which we will call "AR-RE-hedonic", has the following specification:

log10(yit) = β0t + β1 OGGit + β2 REGit + β3 MATPit + β4 CPATit + β5 CATDit + β6 CABSit + β7 CABCit + β8 CAESit + β9 CASTit + β10 CAIL:CAAIit + β11 ASNC:ASLUit + εit,    (4.18)

εit ∼ NID(0, σ²) ∀i, ∀t,

for t = 1, . . . , T, where T is the number of semesters, and for i = 1, . . . , nt, where nt is the total number of items sold in semester t. At the second level we have:

β0t = β0 + ut,
ut = ρ ut−1 + ηt,    ηt ∼ NID(0, σ²η),    |ρ| < 1.    (4.19)

Moreover, ηt is assumed independent of us for all s < t and of εis for all t, s = 1, . . . , T and for all i = 1, . . . , ns.
The results of the fit, obtained through the EM algorithm, are reported in Table 4.6. The estimates and the predicted random effects are quite close to those from the RE-hedonic model reported in the second column of Table 3.2. In addition to the within-group variability and the between-group variability, a further variance component appears in the autocorrelated-random-effects model, namely the level-2 residual variability, σ²η. In fact, while in the classic two-level model the level-2 residual variance coincides with the between-group variance, in the model with AR(1) random effects the between-group variance takes the following form:

Var(ut) = σ²η / (1 − ρ²),

which in this case has an estimate equal to 0.031. It is slightly bigger than that of the RE-hedonic model (σ̂²u = 0.026). Therefore, the proportion of variability explained by the between-semesters variance (ICC) is bigger: 15.3% against 13.3%. This confirms that, at least in part, the structure at the second level has been taken into account. Further, the estimate of the 1-lag autocorrelation is quite high, ρ̂ = 0.705. It is interesting to note that its magnitude agrees with the expectation formed by observing the plots of the autocorrelation functions of the residuals of the RE-hedonic model (Figure 3.8).
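For instance, plugging the rounded estimates of Table 4.6 into these formulas reproduces the reported quantities up to rounding: φ̂0 = σ̂²η / (1 − ρ̂²) = 0.016 / (1 − 0.705²) ≈ 0.031, and ICC = φ̂0 / (φ̂0 + σ̂²) = 0.031 / (0.031 + 0.171) ≈ 0.153.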
The likelihood ratio test between the RE-hedonic model and the AR-RE-hedonic model tests the null hypothesis that ρ = 0 against the alternative that ρ ≠ 0. In fact, the two models are nested: when the 1-lag parameter is zero, the random effects are independent, and the multilevel model reduces to its usual form. In the present case, the test confirms that ρ is significantly different from zero and, as a result, the unrestricted model is preferable. Also according to the information criteria, the AR-RE-hedonic model fits our data better than the RE-hedonic model.
Table 4.6: Parameter estimates of the AR-RE-hedonic model (4.18). Bootstrap standard errors are indicated in parentheses.

Variance components and fit:
  σ̂² = 0.171 (0.000);  σ̂²η = 0.016 (0.009);  ρ̂ = 0.705 (0.191);  ICC = 0.153;
  AIC = 15381;  BIC = 15940;  number of parameters = 74.

Intercept:
  β0 = 2.238 (0.367)

Semester:
  1998-1: -0.252 (0.021)   1998-2: -0.134 (0.020)   1999-1: -0.074 (0.023)
  1999-2:  0.124 (0.022)   2000-1:  0.231 (0.020)   2000-2:  0.196 (0.021)
  2001-1:  0.170 (0.024)   2001-2:  0.036 (0.028)   2002-1:  0.116 (0.021)
  2002-2: -0.106 (0.034)   2003-1: -0.180 (0.022)   2003-2: -0.263 (0.022)
  2004-1: -0.287 (0.023)   2004-2: -0.171 (0.022)   2005-1:  0.001 (0.023)
  2005-2: -0.021 (0.023)   2006-1: -0.006 (0.023)   2006-2: -0.106 (0.022)
  2007-1: -0.066 (0.021)   2007-2:  0.054 (0.026)   2008-1:  0.002 (0.026)
  2008-2: -0.076 (0.024)   2009-1:  0.017 (0.024)   2009-2:  0.228 (0.036)
  2010-1:  0.172 (0.033)   2010-2:  0.447 (0.048)   2011-1:  0.163 (0.032)

Type of object (baseline: Furniture):
  Sticks: -0.087 (0.029)              Masks: 0.109 (0.023)
  Religious objects: -0.004 (0.025)   Ornaments: -0.099 (0.028)
  Sculptures: 0.050 (0.023)           Musical instruments: -0.116 (0.031)
  Tools: -0.084 (0.023)               Clothing: -0.069 (0.035)
  Textiles: -0.037 (0.037)            Weapons: -0.091 (0.027)
  Jewels: -0.051 (0.039)

Region (baseline: Central America):
  Southern Africa: -0.161 (0.032)     Western Africa: -0.105 (0.011)
  Eastern Africa: -0.151 (0.023)      Australia: 0.064 (0.056)
  Indonesia: -0.107 (0.025)           Melanesia: 0.006 (0.017)
  Polynesia: 0.176 (0.019)            Northern America: 0.230 (0.017)
  Northern Africa: -0.361 (0.119)     Southern America: 0.016 (0.024)
  Mesoamerica: 0.116 (0.019)          Far Eastern: -0.083 (0.119)
  Micronesia: 0.095 (0.070)           Indian Region: 0.293 (0.102)
  Asian Southeast: -0.069 (0.118)     Middle East: -0.549 (0.091)

Type of material (baseline: Ivory):
  Vegetable fibre, paper, plumage: -0.046 (0.027)   Wood: 0.072 (0.019)
  Metal: -0.030 (0.026)               Gold: 0.129 (0.032)
  Stone: 0.040 (0.029)                Precious stone: 0.050 (0.034)
  Terracotta, ceramic: 0.001 (0.024)  Silver: -0.086 (0.042)
  Textile and hides: -0.024 (0.033)   Seashell: 0.066 (0.045)
  Bone, horn: -0.130 (0.034)          Not indicated: 0.042 (0.041)

Patina (baseline: Not indicated):
  Pejorative: 0.231 (0.037)   Present: 0.032 (0.010)   Appreciative: 0.115 (0.011)

Description on the catalogue (baseline: Absent):
  Short visual descr.: -0.168 (0.028)   Visual descr.: 0.004 (0.029)
  Broad visual descr.: 0.236 (0.034)    Critical descr.: 0.217 (0.034)
  Broad critical descr.: 0.584 (0.039)

Yes vs No:
  Specialized bibliography (dummy): 0.135 (0.013)
  Comparative bibliography (dummy): 0.118 (0.009)
  Exhibition (dummy): 0.067 (0.014)

Historicization (baseline: Absent):
  Museum certification: 0.022 (0.017)
  Relevant museum certification: 0.037 (0.014)
  Simple certification: 0.037 (0.009)

Illustration (baseline: Absent):
  Miscellaneous col. ill.: 0.411 (0.023)   Col. cover: 1.429 (0.095)
  Col. half page: 0.867 (0.024)            Col. full page: 1.013 (0.025)
  More than one col. ill.: 1.233 (0.029)   Col. quarter page: 0.666 (0.022)
  Miscellaneous b/w ill.: 0.400 (0.034)    b/w half page: 0.545 (0.048)
  b/w quarter page: 0.301 (0.025)

Auction house and venue (baseline: Bonhams-New York):
  Christie's-Amsterdam: 0.795 (0.049)   Christie's-New York: 0.721 (0.047)
  Christie's-Paris: 0.611 (0.046)       Encheres Rive Gauche-Paris: 0.536 (0.075)
  Koller-Zurich: 0.006 (0.051)          Piasa-Paris: 0.740 (0.055)
  Sotheby's-New York: 0.894 (0.047)     Sotheby's-Paris: 0.742 (0.047)
Now, we focus on the residuals. The level-1 residuals versus fitted values in Figure 4.1 are centered around zero and show a fairly constant variability at the first level. However, unsurprisingly, the plot of the standard deviations of the level-1 residuals for each group, shown in Figure 4.5, highlights the same heteroscedastic time-pattern as for the RE-hedonic model (Figure 3.6). As said before, to cope with this problem we have computed robust standard errors through the wild bootstrap procedure (subsection 3.2.1). Nevertheless, the presence of AR(1) random effects requires an extension of the wild bootstrap for time dependent data (Gonçalves and Kilian, 2004; Shao, 2010), adapted to the multilevel structure. In particular, we have substituted the bootstrap disturbances in step 2 of subsection 3.2.1 with the following expression:

r*t = u*t + ε*t,   with   ε*t = ε̂t / (1 − ht)^{1/2} · wt,

where u*t is an autoregressive process with disturbances equal to wt η̂t, wt is a (nt × 1) vector of values independently drawn from the auxiliary distribution (3.6), and ht is the t-th diagonal element of the orthogonal projection matrix of X. In this way, we have taken into account both the time dependence structure at the second level and the heteroscedasticity at the first level.
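A sketch of these time-dependent bootstrap disturbances in R, assuming hypothetical inputs eps_hat, eta_hat, h and rho_hat taken from the fitted model, could be:

# Dependent wild bootstrap disturbances for the AR-RE model (sketch).
dep_wild_r_star <- function(eps_hat, eta_hat, h, rho_hat, groups) {
  w_obs  <- sample(c(-1, 1), length(eps_hat), replace = TRUE)  # signs for level 1
  w_grp  <- sample(c(-1, 1), length(eta_hat), replace = TRUE)  # signs for level 2
  u_star <- as.numeric(stats::filter(w_grp * eta_hat, rho_hat,
                                     method = "recursive"))    # u*_t = rho u*_{t-1} + w_t eta_hat_t
  eps_star <- eps_hat / sqrt(1 - h) * w_obs                    # leverage-adjusted wild errors
  u_star[match(groups, unique(groups))] + eps_star             # r*_t = u*_t + eps*_t
}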
The Q-Q plot in Figure 4.2a reveals that the distribution of the residuals, though it looks symmetric, still has rather heavy tails. In fact, also in this case, the Shapiro-Wilk test rejects the null hypothesis of normality for the level-1 residuals. On the contrary, the assumption of normality for the random effects is supported by the Q-Q plot in Figure 4.2b and by the Shapiro-Wilk test in Table 4.7. The boxplots of the residuals by semester for the AR-RE-hedonic model are in Figure 4.3. Beside them, the boxplots for the case of independent random effects have been repeated in order to make comparisons. The figures look very similar and show that both models capture quite well the grouped structure of the data. As done in the previous chapter, in order to test the assumptions on the error processes for the AR-RE model, we have computed the autocorrelation functions (global and partial) of the level-1 residual means by semester. The correlograms in Figure 4.4 do not reveal any residual dependence structure. Hence, the autoregressive behaviour of the level-2 residuals and of the level-1 residual means by semester observed for the RE-hedonic model (Figure 3.8) has been completely absorbed by an AR(1) process for the level-2 errors.
Figure 4.1: Residuals versus fitted values of the AR-RE-hedonic model. [Scatter: Residuals versus Fitted values; points not recoverable.]

Figure 4.2: Normal probability plots of residuals of the AR-RE-hedonic model. (a) Level-1 residuals; (b) level-2 residuals. [Normal Q-Q plots: sample quantiles versus theoretical quantiles; points not recoverable.]
Figure 4.3: Residuals by semester for the AR-RE-hedonic and the RE-hedonic model (3.5). (a) RE-hedonic; (b) AR(1) random effects. [Boxplots: Residuals by Semester, 1998-1 through 2010-1; points not recoverable.]
Table 4.7: Shapiro-Wilk normality tests for the residuals of the AR-RE-hedonic model.

                  W       p-value
lev-1 residuals   0.992   0.000
lev-2 residuals   0.974   0.721
In theory, the multilevel model allows one to predict both level-1 and level-2 unit responses; in other words, both the responses of units in existing groups and the responses of units in groups not yet observed. Table 4.8 summarizes and compares the prediction capability of the AR-RE-hedonic and RE-hedonic models for the first- and second-level responses through the following aggregate measures of error:

RMSE = sqrt( (1/nT+1) Σ_{i=1}^{nT+1} (yi,T+1 − ŷi,T+1)² ),

MAPE = (1/nT+1) Σ_{i=1}^{nT+1} | (yi,T+1 − ŷi,T+1) / yi,T+1 |.
Figure 4.4: Autocorrelation functions of the residuals of the AR-RE-hedonic model. (a) ACF of the level-1 residual means by semester; (b) partial ACF of the level-1 residual means by semester; (c) ACF of the level-2 residuals; (d) partial ACF of the level-2 residuals. [Correlograms: ACF / partial ACF versus lag, 0-14; bars not recoverable.]
On the one hand, the two models predict the responses of out-of-sample level-1 units belonging to existing groups with similar performance. On the other hand, the AR-RE-hedonic model allows a better forecast of the effect of a 1-lagged out-of-sample semester through the AR(1) process and, therefore, of the prices of objects sold in that semester.

Figure 4.5: Standard deviations of the level-1 residuals by semester of the AR-RE-hedonic model. [Axis: residual sd by semester, 1998-1 through 2010-2; points not recoverable.]

More formally, leaving out all the observations that belong to the last group, what we want to forecast is the price of these objects, the vector yT+1. The RE-hedonic model obtains a forecast for this vector simply as:

ŷT+1 = β̂0 + β̂1 OGGit + β̂2 REGit + . . . + β̂11 ASNC:ASLUit,
since it does not provide a prediction for the random effect uT+1. For this reason, and also because we saw that it produces the same estimates as the FE-hedonic model, the classic multilevel model and the fixed-effect model have equal forecasting capability. Instead, the AR-RE-hedonic model adds to the value obtained by the RE-hedonic model a further piece of information due to the historical memory; that is, it forecasts the response vector yT+1 as:

ŷT+1 = β̂0 + β̂1 OGGit + β̂2 REGit + . . . + β̂11 ASNC:ASLUit + 1nT+1 ρ̂ ûT.
The plot in Figure 4.6a compares the elements of the vector ŷT+1 obtained by the AR-RE-hedonic model with those obtained by the RE-hedonic model, and Figure 4.6b reports the regression lines passing through those points. The two lines are parallel because the AR-RE-hedonic model shifts every point obtained by the RE-hedonic model by the same quantity, ρ̂ûT. It is evident that the red dashed line is closer to the bisector than the blue line, and this confirms that the AR-RE-hedonic model allows better forecasting than the competing models.
Table 4.8: Prediction of the responses of 100 out-of-sample level-1 units and forecast of the responses of units in one out-of-sample semester, yT+1.

                                       RE-hedonic   AR-RE-hedonic
100 level-1 units in existing groups
  MAPE                                 0.072        0.072
  RMSE                                 0.342        0.341
Units in the semester T + 1
  MAPE                                 0.102        0.092
  RMSE                                 0.393        0.367
Figure 4.6: Forecast of nT+1 responses corresponding to units in the out-of-sample semester T + 1. The black dashed line represents the bisector. (a) Forecasted versus observed responses obtained by the AR-RE-hedonic and RE-hedonic models; (b) regression lines passing through the points of panel (a). [Axes: Forecasted responses versus Observed responses, range 3.0-6.0; points not recoverable.]
Chapter 5
Conclusions
Nowadays, artwork items are considered investment goods in the same way as stocks, bonds and real estate. For this reason, the convenience of investing in the art market needs to be evaluated by coupling the aesthetic and the economic value. Unlike stocks, which are exchanged many times at every instant, artworks are one-off pieces, hardly comparable with each other, and they rarely pass through the market. Therefore, art items need specific tools of analysis. Some methods to build price indexes for artworks have been proposed, especially for paintings. Numerous studies have been conducted on different segments of Western art, above all Impressionist art.

In the present work, we have performed an economic and econometric analysis of the Tribal art market, a segment of art that had not received the same interest as other segments from analysts and researchers until recent years. To this aim, we used a unique and original hand-collected database that includes information on worldwide auctions of Tribal art objects in the time span 1998-2011.

Among the existing approaches for building a price index for the art market, the hedonic regression is the one that best fits Tribal art data. It is a multiple linear regression with fixed effects that takes into account the heterogeneity of artworks by explaining prices through object features, and it allows one to construct a market price index by neutralizing the effect of quality.
However, since art data generally include many qualitative variables, the main drawback of the hedonic approach is the large number of parameters to be estimated, so that the resulting models are not parsimonious.

The first contribution of this thesis is to consider the influence of time effects on artwork prices through a different approach. Since we observe different artworks sold at every auction, Tribal art data constitute neither a panel nor a time series. Rather, they have a two-level structure in that items, the level-1 units, are grouped in time points, the level-2 units. Hence, we have proposed to exploit the multilevel model to explain the heterogeneity of prices among time points. We have applied and extensively compared the classic hedonic regression model and the multilevel model on the Tribal art dataset. The two models provide similar results in terms of estimates and residuals. Since the assumptions of homoscedasticity and normality of the first-level errors do not hold, we have computed the standard errors of the estimates through the wild bootstrap procedure; it is, in fact, robust to heteroscedastic errors and, being nonparametric, also to non-Gaussian errors. Moreover, the time effects turn out not to be independent across groups, as the classic multilevel model assumes. In fact, to our knowledge, the applications of multilevel models to longitudinal data generally consider occasions, that is, the points in time, as the lowest-level units and individuals as higher-level units; therefore, any time dependence structure is assessed at the first level.
The main theoretical contribution of this thesis is a new extension of the classic multilevel model that consists in relaxing the assumption of independence among random effects and treating them as a time series at the second level. In particular, we have first specified a multilevel model with an AR(1) process at the second level to capture the time dependence among groups. Then, we have derived the maximum likelihood estimators through the E-M algorithm and implemented them in R with own-written code. We have conducted a Monte Carlo study that has confirmed the good finite sample behaviour of the ML estimators and the validity of our R code. Finally, we have fitted the new model to the Tribal art dataset. We found that the AR(1) process completely captures the time dependence structure among group effects. Moreover, with respect to the competing hedonic regression model, the proposed model presents similar estimates and, consequently, a similar interpretation of the results: the estimated regression coefficients can still be interpreted as shadow prices for each feature, and a price index for the art market is easily provided through the predictions of the time effects. On the other hand, the new model has fewer parameters to be estimated and provides a decomposition of the total variability of the response (as does the classic multilevel model). Moreover, the multilevel model with autoregressive random effects allows better forecasting of the responses of units in a 1-lag-ahead group, that is, the prices of objects that will be sold one semester later. Therefore, the new model improves considerably the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.

Many applications and further extensions of this model are possible. In fact, by treating the random effects as a time series at the second level, it allows one to exploit tools from time series analysis.
References
D. A. Belsley and E. J. Kontoghiorghes. Handbook of Computational Econometrics. J. Wiley & Sons, Chichester, 2009.
H. Blalock. Contextual-effects models: theoretical and methodological issues.
Annual Review of Sociology, 10:353–72, 1984.
W. Browne and H. Goldstein. MCMC sampling for a multilevel model with nonindependent residuals within and between cluster units. Journal of Educational and Behavioral Statistics, 35:453–73, 2010.
Paul Burton, Lyle Gurrin, and Peter Sly. Extending the simple linear regression model to account for correlated responses: an introduction to generalized estimating equations and multi-level mixed modelling. Statistics in
Medicine, 17:1261–1291, 1998.
G. Candela and A. E. Scorcu. A price index for art market auctions: an application to the Italian market for modern and contemporary oil paintings. Journal of Cultural Economics, 21:175–196, 1997.
G. Candela and A.E. Scorcu. Economia delle arti. Zanichelli, Bologna, 2004.
O. Chanel. Is the art market predictable? European Economic Review, 39:
519–527, 1995.
M. Davidian and D. M. Giltinan. Nonlinear models for repeated measurement
data. Chapman and Hall, London, 1995.
R. Davidson and E. Flachaire. The wild bootstrap, tamed at last. Journal
of Econometrics, 146:162–169, 2008.
A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
A.P. Dempster, D.B. Rubin, and R.K. Tsutakawa. Estimation in covariance
components models. Journal of the American Statistical Association, 76:
341–53, 1981.
Thomas A. DiPrete and Jerry D. Forristal. Multilevel models: Methods and
substance. Annual Review of Sociology, 20:331–57, 1994.
T. Fearn. A bayesian approach to growth curves. Biometrika, 62:89–100,
1975.
Paolo Figini. La valutazione dell’investimento in arte: il caso dell’arte etnica.
In G. Candela and M. Biordi, editors, Arte etnica tra cultura e mercato,
pages 241–242. Skira, Milano, 2007.
W. N. Goetzmann. Accounting for taste: Art and financial markets over
three centuries. The American Economic Review, 83:1370–1376, 1993.
Harvey Goldstein. Multilevel mixed linear model analysis using iterative
generalized least squares. Biometrika, 73:43–56, 1986.
Harvey Goldstein. Multilevel Statistical Models. John Wiley, New York, 2nd
edition, 1995.
Harvey Goldstein. Multilevel Statistical Models. J. Wiley & Sons, Chichester,
3rd edition, 2009.
Harvey Goldstein. Multilevel Statistical Models. J. Wiley & Sons, Chichester,
4th edition, 2010.
S. Gonçalves and L. Kilian. Bootstrapping autoregressions with conditional
heteroscedasticity of unknown form. Journal of Econometrics, 123:89–120,
2004.
J. D. Hamilton. Time Series Analysis. Princeton University Press, Princeton,
1994.
David A. Harville. Maximum likelihood approaches to variance component
estimation and to related problems. Journal of the American Statistical
Association, 72:320–340, 1977.
C. R. Henderson. Estimation of variance and covariance components. International Biometric Society, 9:226–252, 1953.
W. James and C. Stein. Estimation with quadratic loss. Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:
361–379, 1961.
Kelvin Jones, Ronald John Johnston, and Charles J. Pattie. People, places
and regions: exploring the use of multi-level modelling in the analysis of
electoral data. British Journal of Political Science, 22:343–380, 1992.
Ita G.G. Kreft and Jan De Leeuw. Introducing Multilevel Modeling. Sage,
London, 1998.
Nan M. Laird and James H. Ware. Random-effects models for longitudinal
data. Biometrics, 38:963–974, 1982.
Nan M. Laird, Nicholas Lange, and Daniel Stram. Maximum likelihood computations with repeated measures: Application of the EM algorithm. Journal of the American Statistical Association, 82:97–105, 1987.
D. V. Lindley and A. F. M. Smith. Bayes estimates for the linear model (with
discussion). Journal of the Royal Statistical Society B, 34:1–41, 1972.
R. Y. Liu. Bootstrap procedures under some non-i.i.d. models. Annals of
Statistics, 16:1696–1708, 1988.
M. Locatelli-Biey and R. Zanola. The market for Picasso prints: A hybrid approach. Journal of Cultural Economics, 29:127–136, 2005.

N. Longford. Random coefficient models. Clarendon, Oxford, 1993.
G.S. Maddala. The use of variance components models in pooling cross
section and time series data. Econometrica, 39:341–58, 1971.
J.R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics - Third Edition. Wiley, Chichester,
2007.
G.J. McLachlan and J. Krishnan. The EM Algorithm and Extensions. Wiley,
New York, 1997.
J. Mei and M. Moses. Art as an investment and the underperformance of
masterpieces. American Economic Review, 92:1656–1668, 2002.
C. Morris. Parametric empirical bayes inference: theory and applications.
Journal of the American Statistical Association, 78:47–65, 1983.
H. D. Patterson and R. Thompson. Recovery of inter-block information when
block sizes are unequal. Biometrika, 58:545–54, 1971.
Jose C. Pinheiro and Douglas M. Bates. Mixed-Effects Models in S and
S-Plus. Springer, 2000.
Stephen Raudenbush. A crossed random effects model for unbalanced data
with applications in cross-sectional and longitudinal research. Journal of
Educational Statistics, 18:321–349, 1993.
Stephen W. Raudenbush and A. S. Bryk. A hierarchical model for studying
school effects. Sociology of Education, 59:1–17, 1986.
Stephen W. Raudenbush and Anthony S. Bryk. Hierarchical linear models: applications and data analysis methods. Sage Publications, Thousand Oaks, 2nd edition, 2002.
S. Rosen. Hedonic prices and implicit markets: product differentiation in
pure competition. Journal of Political Economy, 82:34–55, 1974.
S. R. Searle, G. Casella, and C. E. McCulloch. Variance components. Wiley,
New York, 1992.
S. G. Self and K. Liang. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under non-standard conditions. Journal
of the American Statistical Association, 82:605–10, 1987.
M. Seltzer, W. Wong, and A. Bryk. Bayesian analysis in applications of
hierarchical models: Issues and methods. Journal of Educational and Behavioral Statistics, 21:131–167, 1986.
X. Shao. The dependent wild bootstrap. Journal of the American Statistical
Association, 105:218–235, 2010.
Anders Skrondal and Sophia Rabe-Hesketh. Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman
& Hall/CRC, New York, 2004.
Anders Skrondal and Sophia Rabe-Hesketh. Multilevel and related models
for longitudinal data. In J. de Leeuw and E. Meijer, editors, Handbook of
Multilevel Analysis, chapter 7. Springer, New York, 2008.
Anders Skrondal and Sophia Rabe-Hesketh. Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society A, 172:659–687,
2009.
A. F. M. Smith. A general bayesian linear model. Journal of the Royal
Statistical Society B, 35:67–75, 1973.
Marco R. Steenbergen and Bradford S. Jones. Modeling multilevel data
structures. American Journal of Political Science, 46:218–237, 2002.
J. P. Stein. The monetary appreciation of paintings. Journal of Political
Economy, 85:1021–1035, 1977.
R. Stiratelli, N. Laird, and J. Ware. Random effects models for serial observations with binary response. Biometrics, 40:961–971, 1984.
J. L. F. Strenio, H. I. Weisberg, and A. S. Bryk. Empirical bayes estimation
of individual growth curve parameters and their relations to covariates.
Biometrics, 39:71–86, 1983.
G. Wong and W. Mason. The hierarchical logistic regression model for multilevel analysis. Journal of the American Statistical Association, 80:513–524, 1985.