Alma Mater Studiorum - Università di Bologna
DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA
PER LA RICERCA SCIENTIFICA
(PhD in Statistical Methodology for Scientific Research)
Cycle XXVI
Competition sector: 13/D1
Scientific-disciplinary sector: SECS-S/01
MULTIDIMENSIONAL ITEM RESPONSE THEORY
MODELS WITH GENERAL AND SPECIFIC
LATENT TRAITS FOR ORDINAL DATA
Presented by:
Irene Martelli
PhD Coordinator:
Prof. Angela Montanari
Supervisor:
Prof. Stefania Mignani
Final examination, year 2014
To my family and Lorenzo,
for their love and support.
Abstract
The aim of the thesis is to propose Bayesian estimation, through Markov chain Monte Carlo, of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures, considering that the first is widely used and represents a classical approach in multidimensional item response analysis, while the second is able to reflect the complexity of real interactions between items and respondents.
A simulation study is conducted to evaluate parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size, the parameters of the multiunidimensional and additive graded response models are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories.
An application of the proposed models to response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.
Acknowledgements
First and foremost I want to thank my supervisor Stefania Mignani for her constant attention, care and confidence in me. Then, I would like to express my gratitude to Mariagiulia Matteucci for her precious and fundamental suggestions and supervision during the preparation of the thesis. My work would not have been successful without her. Appreciation is extended to Cristina Bernini, who provided the data. My special thanks go to my friends and colleagues Lucia and Violeta for their support during the whole period of the PhD.
Preface
Item response theory (IRT) falls within the wide context of the measurement of theoretical latent constructs, which are not observable by definition and can only be determined indirectly, through the use of other manifest variables.
IRT is extensively used in the educational and psychological fields, where a test consisting of a set of items is usually administered to a sample of examinees to infer the individuals' unobservable characteristics (abilities). To this aim, IRT (Hambleton and Swaminathan, 1985; van der Linden and Hambleton, 1997) represents the main methodological approach that allows the estimation of both the item psychometric properties and the subjects' scores. Moreover, IRT shows great potential in applications within the behavioral sciences.
In the past, unidimensionality, i.e. the presence of a single construct underlying the response process, was one of the most common assumptions. Nevertheless, real data often suggest a multidimensional structure and, in order to infer such distinct latent traits, tests should include different subtests.
For this reason, models that allow the presence of more than one latent trait have been developed recently. The so-called multidimensional IRT (MIRT) models (see e.g., Reckase, 2009) are able to describe the complexity of the data, taking into account correlated abilities and also a possible hierarchical structure of latent traits. This is why MIRT models fit the subtests better than separate unidimensional models.
Several approaches are possible within the multidimensional perspective: exploratory models, where all latent traits are allowed to affect all the item responses, or confirmatory models, where all the relations between observed and latent variables need to be specified in advance. By using a confirmatory approach, it is also possible to assume the simultaneous presence of general and specific latent traits underlying the response process (Sheng and Wikle, 2008). A further distinction can be made between noncompensatory and compensatory models; in the latter, a low level on one trait can be compensated for by a high level on another (Reckase, 2009).
In several applications, data are characterized by hierarchical structures, and the introduction of different levels for latent dimensions makes it possible to specify more general models. Specifically, a proper hierarchy can be assumed to underlie the response process, where the highest level is associated with the overall trait, while dimensions representing more specific traits are located on lower levels of the hierarchy.
Higher-order and additive models are two approaches that allow the inclusion of a general trait in addition to multiple specific traits. In particular, in additive models we can directly analyze the strength of the relationships between the specific latent traits and the associated test items, as well as the strength of the relationships between the general latent trait and all the test items. This feature is particularly appealing for complex applications.
A final distinction can be made according to the nature of the observable variables. Usually, in an educational testing framework we deal with binary items (i.e. correct/incorrect), while in psychological and behavioral research items are typically ordinal, representing judgments or agreements. In a unidimensional context, different models for ordinal data have been developed according to the number of item parameters (e.g. partial credit models, graded response models). In contrast, within a multidimensional context, models for binary data are usually applied and the available ordinal data are often dichotomized, with a consequent loss of information. Models for ordinal data remain uncommon and have been developed only for uncorrelated latent traits.
For these reasons, in this work we propose an extension of the unidimensional graded response model (Samejima, 1969) for ordinal data to multidimensional structures with correlated traits, namely the multiunidimensional and the additive structures. A further innovative and important aspect of our proposal concerns the estimation procedure: we propose a Markov chain Monte Carlo (MCMC) procedure for parameter estimation, which we implement using the open-source software OpenBUGS.
Structure of the thesis
In the first chapter some fundamental notions about IRT are introduced. A first section illustrates the basic concepts and definitions characterizing the IRT approach, with a brief description of unidimensional models for binary data. A second section focuses on unidimensional models for ordinal data and, in particular, on Samejima's model for graded responses. A final section explains the reasons that have driven several developments of IRT towards its multidimensional generalization.
The second chapter introduces the MIRT approach. In the first section the main features of these models are described, while in the second section a brief review of MIRT models for both binary and ordinal responses is reported, together with a brief description of their most common estimation methods.
In the third chapter the main principles characterizing Bayesian estimation in the MIRT context are introduced. The first section describes the general Bayesian framework, while the second section presents the available Bayesian estimation methods based on MCMC techniques. The third section briefly introduces the functioning of OpenBUGS, which makes it easy to run the most common MCMC algorithm, i.e. the Gibbs sampler.
In the fourth chapter two MIRT models for ordinal data with a complex
structure are introduced in terms of specication, interpretation and estimation.
The focus is on two MIRT models for graded responses and correlated latent
traits: the multiunidimensional model, where items in each subtest characterize
a single ability, and the additive model, where each item measures a general and
a specic ability directly.
The fifth chapter describes a simulation study conducted in order to evaluate the parameter recovery of the estimation method for the proposed models. The simulation study design is illustrated in the first section, while the second and third sections report the results of the simulations performed for the multiunidimensional and the additive models for ordinal data, respectively.
In the sixth chapter an application of the proposed models to real data is
presented. The application focuses on the investigation of residents' perceptions
and attitudes towards the tourism industry.
In the seventh chapter conclusions and further research on applicative and
methodological aspects are discussed.
Contents
1 An introduction to item response theory (IRT)
1.1 Basic concepts and definitions
1.1.1 The concept of model in IRT
1.1.2 IRT unidimensional models for binary data
1.2 IRT unidimensional models for ordinal data
1.2.1 Samejima's unidimensional graded response model
1.2.2 Other unidimensional IRT models for graded responses
1.3 Towards multidimensional models
2 Multidimensional IRT (MIRT) models: a review
2.1 Main features of MIRT models
2.1.1 Compensatory and noncompensatory approaches
2.1.2 Confirmatory and exploratory approaches
2.1.3 Underlying latent structures
2.2 MIRT models for binary data
2.3 MIRT models for ordinal data
2.4 Estimation methods
3 Bayesian estimation of MIRT models
3.1 Elements of Bayesian statistics in MIRT context
3.1.1 Prior distribution choice
3.1.2 Bayes' Theorem
3.1.3 Marginal posterior distributions for model parameters
3.2 Markov chain Monte Carlo methods
3.2.1 Metropolis-Hastings algorithm
3.2.2 Gibbs sampler
3.3 Bayesian computation using OpenBUGS
4 MIRT graded response models with complex structures
4.1 MIRT graded response models (GRMs)
4.1.1 Specification of the multiunidimensional GRM
4.1.2 Specification of the additive GRM
4.2 Person and item parameters: interpretation
4.2.1 Ability parameters
4.2.2 Multidimensional item discrimination
4.3 Multiunidimensional GRM implementation
4.3.1 Model specification
4.3.2 Prior distributions
4.3.3 Likelihood function for responses
4.4 Additive GRM implementation
4.4.1 Model specification
4.4.2 Prior distributions
4.4.3 Likelihood function for responses
5 Simulation Study
5.1 Simulation study design
5.1.1 Parameter recovery
5.1.2 Estimated ability correlations
5.1.3 Convergence detection
5.1.4 Bayesian fit
5.1.5 General simulation conditions
5.2 Multiunidimensional GRM: simulations and results
5.2.1 Simulation conditions
5.2.2 Results
5.3 Additive GRM: simulations and results
5.3.1 Simulation conditions
5.3.2 Results
6 Application to real data: residents' attitudes towards tourism
6.1 Interpretation of model parameters
6.2 Research design
6.3 Results for the multiunidimensional GRM
6.4 Results for the additive GRM
6.5 Heterogeneity in resident perceptions
7 Conclusions
Bibliography
Appendices
A OpenBUGS code for implemented models
A.1 OpenBUGS code: multiunidimensional and additive models for graded responses
B R procedures for the simulation study
B.1 Multiunidimensional GRM: R code
B.2 Additive GRM: R code
C Survey questionnaire
List of Tables
4.1 Main features of the proposed multiunidimensional and additive models for graded responses.
5.1 Simulation conditions for the multiunidimensional model for graded responses.
5.2 Multiunidimensional model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.3 Multiunidimensional model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.4 Multiunidimensional model: real (r) and estimated (r̂) ability correlations.
5.5 Simulation conditions for the additive model for graded responses.
5.6 Additive model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.7 Additive model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.8 Additive model: block 2 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.9 Additive model: block 2 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.10 Additive model: real (r) and estimated (r̂) ability correlations.
6.1 Profile of respondents.
6.2 Response frequencies for items about tourism benefits (B1-B5) and items about tourism costs (C1-C5).
6.3 Item parameter estimates for the multiunidimensional GRM.
6.4 Item parameter estimates for the additive GRM.
6.5 Normalized mean perception and attitude scores by age, gender, education, province and typological area.
List of Figures
1.1 Item characteristic curve for a binary item
1.2 Item response functions for an item with five categories
1.3 Dichotomization of polytomous item responses; the dashed line indicates the observed category response.
2.1 Consecutive unidimensional latent structure.
2.2 Multiunidimensional latent structure.
2.3 Bi-factor latent structure.
2.4 Hierarchical latent structures.
2.5 Additive latent structure.
4.1 Dichotomization used for the MIRT graded response model specification. The dashed line indicates the observed category response.
5.1 Bidimensional case for multiunidimensional and additive structures.
5.2 Examples of stationary chains.
6.1 Representation of the thresholds' parameter estimates for the multiunidimensional model.
6.2 Representation of the thresholds' parameter estimates for the additive model.
Chapter 1
An introduction to item response theory (IRT)
In this chapter we introduce the fundamental notions concerning item response theory (IRT). A brief description of IRT models for binary and ordinal data is given. Particular attention is paid to Samejima's unidimensional model for graded responses, which represents the starting point for a generalization to a multidimensional context.
1.1 Basic concepts and definitions
IRT falls within the wide context of the measurement of theoretical latent constructs. A latent construct is not observable by definition and can only be determined indirectly, through the use of other manifest variables. Examples of latent constructs are the mathematics achievement of students, the satisfaction of a customer with a product or service, psychological status, and all the situations that may refer to the concept of perception, e.g. depression and happiness. Another relevant field of application of IRT methods is the behavioral sciences, where the manifest variables, which are often ordinal, express a judgment of, or an agreement with, the phenomenon of interest.
If we consider the educational and psychological fields, where IRT is extensively used, we can say that the final aim of IRT is to measure abilities and attitudes of individuals through their responses to a number of test items. In other words, by using IRT models, we wish to determine the position of the individual along some latent dimensions, representing the unobservable characteristics of the individuals.
In the IRT literature the latent traits are commonly called abilities, owing to the intensive use of IRT methods in the educational field, where the constructs are the students' latent abilities. The analysis of the relation between latent continuous variables and observed categorical variables is known in the statistical literature as latent trait analysis; this is why, in this thesis, the words abilities, latent abilities and latent traits all refer to the same concept.
The use of IRT as a measurement theory is fairly recent: the pioneering work of Lord and Novick (1968) gives a first formalization of the theory, on the basis of ideas and principles that arose in the 1930s and 1940s. Developments of IRT were driven by the need to overcome the shortcomings of classical test theory (CTT), for instance its sensitivity to sample conditions and the fact that in CTT individual abilities and test characteristics can be interpreted only in the same context (Hambleton et al., 1991). Moreover, IRT focuses on the item rather than on the individual's overall score, whereas CTT does not include the evaluation of test properties and item characteristics. IRT, on the other hand, makes it possible to evaluate individual ability and to describe the performance of the items on the test simultaneously. For these reasons, IRT appeared to be a promising alternative to CTT in theoretical and applied fields, providing a wide and effective framework.
1.1.1 The concept of model in IRT
In IRT a model is defined by a mathematical function used to describe the conditional probability of a response given the latent ability, for an item with categorical responses (Thissen and Steinberg, 1986). The mathematical function expresses how an examinee with a high position on a latent trait is likely to provide a different response from an examinee with a low position on the trait (Ostini and Nering, 2006). The parametric model describes the relationship between the "observable", i.e. the examinee's performance in the test, and the "unobservable", i.e. the latent ability.
In general, different models can be specified depending on:
• The structure of the data: binary or polytomous (nominal or ordinal) responses;
• The number of latent dimensions: unidimensional or multidimensional models;
• The distribution functions used to link responses and ability(ies);
• The number of item parameters introduced in the model.
Concerning the first point, IRT makes it possible to specify different models depending on the kind of items we are dealing with, i.e. items with two response categories or items with more than two response categories (which, in turn, may or may not be ordered). The second point is a crucial choice in the model specification procedure: when only one ability affects the responses we are assuming unidimensionality, while when we need two or more latent traits to describe the correlation among the responses we are assuming multidimensionality. Moreover, the model depends on the probability distribution used to describe the relationship between the response and the examinee's ability(ies). The most common choices are the normal distribution function (normal ogive models) and the logistic distribution function (logit models). Finally, a distinction can be made with reference to the number of item parameters (one, two or three) introduced in the model.
1.1.2 IRT unidimensional models for binary data
In order to illustrate the basic concepts and assumptions of IRT and to introduce
the notation, we start from the simplest models: the unidimensional models for
dichotomous responses (i.e. correct and incorrect). In this context there are three
fundamental assumptions.
The first assumption states that only one latent ability affects the item responses (unidimensionality assumption).
The second assumption states that a change in the probability of a correct response, due to a change in the examinee's latent ability, is completely described by the item characteristic curve (ICC). Thus, the ICC describes how the probability of a response to an item changes relative to a change in the latent trait. As illustrated before, different distribution functions used to link responses and ability, i.e. different mathematical forms of the ICC, lead to different IRT models. In any case, the probability of a correct response is expressed as a function of person and item parameters.
The third assumption is the so-called local independence assumption: responses to a pair of items are statistically independent given the underlying latent ability. Local independence holds when the assumption of unidimensionality is true, i.e. when the single latent ability fully accounts for the associations among the item responses.
Let us consider a random vector of $p$ item responses for the $i$-th subject ($i = 1, \ldots, n$), denoted by $\mathbf{Y}_i$, and the corresponding observed responses $\mathbf{y}_i = (y_{i1}, \ldots, y_{ip})$; $\theta_i$ is the ability of examinee $i$. The assumption of local independence can be stated as:
$$P(\mathbf{y}_i \mid \theta_i) = P(y_{i1} \mid \theta_i)\,P(y_{i2} \mid \theta_i) \cdots P(y_{ip} \mid \theta_i) = \prod_{j=1}^{p} P(y_{ij} \mid \theta_i).$$
When local independence holds, there is one latent variable underlying the responses and, conditionally on this latent variable, responses are assumed to be independent.
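To make the local independence factorization concrete, the following sketch computes $P(\mathbf{y}_i \mid \theta_i)$ as a product of item-level probabilities. For illustration only, it assumes the two-parameter logistic form with predictor $\alpha_j\theta_i - \beta_j$ described later in this section; the function names, item parameters and response pattern are hypothetical.

```python
import math

def logistic_2pl(theta, alpha, beta):
    """P(correct | theta) under a two-parameter logistic ICC (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(alpha * theta - beta)))

def response_likelihood(y, theta, alphas, betas):
    """P(y_i | theta_i) as a product over items (local independence).

    y holds 0/1 responses; alphas and betas hold hypothetical item parameters.
    """
    lik = 1.0
    for y_ij, a, b in zip(y, alphas, betas):
        p = logistic_2pl(theta, a, b)
        # Correct response contributes p, incorrect response contributes 1 - p
        lik *= p if y_ij == 1 else (1.0 - p)
    return lik

# Hypothetical three-item test and one observed response pattern
alphas = [1.0, 1.5, 0.8]
betas = [0.0, 0.5, -0.5]
y = [1, 0, 1]
print(response_likelihood(y, theta=0.0, alphas=alphas, betas=betas))
```

For long tests one would usually sum log-probabilities rather than multiply probabilities, to avoid numerical underflow; the product is kept here only to mirror the equation.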
The unidimensional IRT model for binary data expresses the probability $\pi_{ij}$ of a correct response by subject $i$ to item $j$, for $j = 1, \ldots, p$, as a function of the predictor $\eta_{ij}$, which depends on $\theta_i$ and on $\xi_j$, the vector of parameters characterizing item $j$:
$$\eta_{ij} = f(\theta_i, \xi_j). \qquad (1.1)$$
The so-called probit or normal ogive model is obtained when a normal distribution is used (1.2), whereas when we use the logistic distribution we get the logit model (1.3)¹:
$$\pi_{ij} = \Phi(\eta_{ij}) \;\Rightarrow\; \Phi^{-1}(\pi_{ij}) = \eta_{ij}, \qquad (1.2)$$
$$\pi_{ij} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})} \;\Rightarrow\; \operatorname{logit}(\pi_{ij}) = \eta_{ij}, \qquad (1.3)$$
where $\Phi$ is the standard normal cumulative distribution function.

¹ Normal ogive models and logistic models have different ICCs for equivalent sets of item parameter values. It can be shown (Haley, 1952; Birnbaum, 1968) that the two formulations are nearly equivalent in terms of predicted probability when a scaling constant 1.702 is introduced into the logistic model, in order to compensate for the differences between the ICCs. When this constant is introduced, the predicted probabilities differ by less than 0.01 at every level of ability (Haley, 1952): $|\Phi(\eta_{ij}) - \exp(1.702\,\eta_{ij})/[1 + \exp(1.702\,\eta_{ij})]| < 0.01$.
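The approximation stated in footnote 1 is easy to check numerically. The sketch below (standard library only; the function names and the grid are illustrative choices) evaluates the absolute difference between $\Phi(\eta)$ and the logistic CDF with scaling constant 1.702 on a fine grid of $\eta$ values.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def scaled_logistic(x, d=1.702):
    """Logistic CDF with scaling constant d: exp(d x) / (1 + exp(d x))."""
    return 1.0 / (1.0 + math.exp(-d * x))

# |Phi(eta) - logistic(1.702 eta)| on eta in [-6, 6], step 0.001
grid = [i / 1000.0 for i in range(-6000, 6001)]
max_diff = max(abs(normal_cdf(x) - scaled_logistic(x)) for x in grid)
print(f"max |difference| on [-6, 6]: {max_diff:.4f}")
```

Setting `d=1.0` in the same loop shows how much larger the discrepancy is without the scaling constant.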
Different unidimensional models can then be obtained by introducing a different number of item parameters $\xi_j$ describing the item characteristics. The simplest case has only one item parameter, $\xi_j = \{\beta_j\}$, where $\beta_j$ is called the difficulty parameter. An example of a one-parameter logistic model is the Rasch model (Rasch, 1960): if we consider a logarithmic transformation of the scale of person and item parameters (Fischer, 1995), the predictor becomes $\eta_{ij} = \theta_i - \beta_j$.
If $\xi_j = \{\alpha_j, \beta_j\}$, a discrimination parameter $\alpha_j$ is added to the model and we are in the case of two-parameter models. The predictor (1.1) becomes $\eta_{ij} = \alpha_j\theta_i - \beta_j$: model (1.2) becomes the two-parameter normal ogive model (Lord, 1952), while model (1.3) becomes the two-parameter logistic model (Birnbaum, 1968).
A further extension can finally be obtained by introducing a guessing parameter $\gamma_j$ for each item, leading to three-parameter models where $\xi_j = \{\alpha_j, \beta_j, \gamma_j\}$ (Lord, 1980). See Reckase (2009) for an exhaustive description of such models.
With respect to the ICC, the parameters $\alpha_j$, $\beta_j$ and $\gamma_j$ represent the slope, the location and the lower asymptote, respectively; note that, with the predictor $\alpha_j\theta_i - \beta_j$, the inflection point of the ICC lies at $\theta_i = \beta_j/\alpha_j$.
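As a minimal sketch of the logistic ICCs just described, the functions below (hypothetical names) use the predictors given in the text. The three-parameter form $\gamma_j + (1-\gamma_j)\,F(\eta_{ij})$ is the standard one but is an assumption here, since the text does not write it out.

```python
import math

def logistic(x):
    """Logistic CDF F(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def icc_1pl(theta, beta):
    """Rasch / one-parameter logistic ICC: eta = theta - beta."""
    return logistic(theta - beta)

def icc_2pl(theta, alpha, beta):
    """Two-parameter logistic ICC: eta = alpha * theta - beta."""
    return logistic(alpha * theta - beta)

def icc_3pl(theta, alpha, beta, gamma):
    """Three-parameter logistic ICC, assumed form gamma + (1 - gamma) * F(eta)."""
    return gamma + (1.0 - gamma) * icc_2pl(theta, alpha, beta)

# Hypothetical item parameters, evaluated at a few ability levels
for theta in (-2.0, 0.0, 2.0):
    print(theta, icc_1pl(theta, 0.0), icc_2pl(theta, 1.5, 0.0),
          icc_3pl(theta, 1.5, 0.0, 0.2))
```

At $\theta = \beta_j/\alpha_j$ the two-parameter curve equals 0.5, and as $\theta$ decreases the three-parameter curve approaches $\gamma_j$ rather than 0.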
1.2 IRT unidimensional models for ordinal data
The models briefly presented above all refer to dichotomous responses; nevertheless, items with multiple response options exist and their use is quite common in the behavioral sciences. IRT models for polytomous items operate differently from binary models. In the binary case, knowing the characteristics of one response also determines the characteristics of the complementary response, while for polytomous items this no longer holds and each category function must be modeled separately (Samejima, 1996). Figure 1.1 reports the ICC for a binary item, while Figure 1.2 shows the response functions for an item with five categories.
[Figure 1.1: Item characteristic curve for a binary item]
[Figure 1.2: Item response functions for an item with five categories]
From Figure 1.2 we can see that, for ordered items, the category response functions are not all monotonic: only the curves related to the first and the last categories are, respectively, monotonically decreasing and increasing. The presence of non-monotonic functions raises some complications: these functions cannot be described only in terms of discrimination and difficulty parameters, as in the binary case. The choice of the proper mathematical form and the estimation of parameters for such unimodal functions is a relevant issue. For ordered polytomous items this problem has been solved by treating polytomous items basically as 'concatenated dichotomous' items (Samejima, 1969, 1996): dichotomizations of item response data are combined in order to obtain suitable response functions for each item category.
As we will illustrate in more detail later, several models for ordinal data exist as extensions of the models for binary data. The simplest model for ordinal items is the partial credit model (Masters, 1982), which is an extension of the Rasch model for binary items, i.e. with one item parameter. Despite its wide use, it focuses on the scoring of individuals, and its restrictive assumptions make it inadequate for modeling purposes, especially in complex contexts. In this work we focus on Samejima's graded response model, which is the generalization of the two-parameter IRT model for binary data. This choice was motivated by the consideration that models that also include a guessing parameter, even if they are appropriate in the educational field, are not well suited to the behavioral sciences, where individuals typically express opinions.
1.2.1 Samejima's unidimensional graded response model
The graded response model for ordinal data was developed by Samejima in 1969. Examples of graded responses are Likert-type scales (strongly disagree, disagree, neutral, agree, strongly agree) and responses ordered on the basis of a range of scores.
Let us consider a set of $p$ ordinal items, $Y_1, \ldots, Y_j, \ldots, Y_p$, where each item $j$ has $K_j$ categories, indexed by $k$. In the parametrization of the model we consider that the lowest score on item $j$ is 1 and the highest score is $K_j$, and that each item is characterized by $K_j - 1$ thresholds or boundaries $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$. The probability of achieving category $k$ or higher is assumed to increase monotonically with the latent ability (Samejima, 1996; Reckase, 2009); therefore the thresholds must satisfy the so-called order constraint:
$$\kappa_{j1} < \cdots < \kappa_{j,K_j-1}.$$
Concerning the dichotomization procedure mentioned above, Samejima's (1969) graded model is based on the probability that an item response will be observed in category $k$ or higher: the probability $\pi_{ijk}$ that the $i$-th subject will select the $k$-th category on item $j$ is equal to the probability of answering above the lower boundary of the category ($\kappa_{k-1}$) minus the probability of answering above the category's upper boundary ($\kappa_k$). Figure 1.3² describes the dichotomization method used in Samejima's models; a dashed line represents a hypothetical response in category $k = 4$: the probability of a response in this category can be computed as $P^*_{i4} - P^*_{i5}$, where, in general, $P^*_{ik} = P(Y_{ij} \geq k \mid \theta_i)$ denotes the probability of accomplishing step $k$ at a given level of $\theta$.
[Figure 1.3: Dichotomization of polytomous item responses; the dashed line indicates the observed category response.]
The probability that the $i$-th examinee's response will fall in the $k$-th category on item $j$ can thus be written as:
$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = P^*_{ik} - P^*_{i,k+1}, \qquad (1.4)$$
where $P^*_{i1}$ and $P^*_{i,K_j+1}$ are assumed to be 1 and 0, respectively, in order to ensure that the probability of each category can be determined from (1.4).
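Equation (1.4) is simply a difference of adjacent cumulative "step" probabilities. The sketch below (hypothetical helper name and made-up values) turns the $K-1$ non-trivial step probabilities $P^*_2, \ldots, P^*_K$ of one item, at a fixed $\theta$, into the $K$ category probabilities, adding the conventions $P^*_1 = 1$ and $P^*_{K+1} = 0$.

```python
def category_probs_from_cumulative(p_star_inner):
    """Eq. (1.4): pi_k = P*_k - P*_{k+1}.

    p_star_inner lists P*_2, ..., P*_K, i.e. P(Y >= k | theta) for the
    K - 1 non-trivial steps (non-increasing in k). P*_1 = 1 and
    P*_{K+1} = 0 are added here, so K category probabilities are returned.
    """
    p_star = [1.0] + list(p_star_inner) + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(p_star) - 1)]

# Hypothetical five-category item at a fixed theta: P*_2, P*_3, P*_4, P*_5
probs = category_probs_from_cumulative([0.9, 0.7, 0.4, 0.1])
print(probs)     # pi_1, ..., pi_5
print(probs[3])  # category k = 4, i.e. P*_4 - P*_5 as in Figure 1.3
```

Because the differences telescope, the category probabilities always sum to one; they are non-negative only if the step probabilities are non-increasing in $k$.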
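The two-parameter normal ogive and logistic formulations of the model can be obtained from (1.4). The normal ogive form of Samejima's model for graded responses is given by:
$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i, \kappa_{j,k-1}, \kappa_{jk}) = \frac{1}{\sqrt{2\pi}} \int_{\alpha_j\theta_i - \kappa_{jk}}^{\alpha_j\theta_i - \kappa_{j,k-1}} e^{-t^2/2}\, dt = \Phi(\alpha_j\theta_i - \kappa_{j,k-1}) - \Phi(\alpha_j\theta_i - \kappa_{jk}), \qquad (1.5)$$
with $\kappa_{j0} = -\infty$ and $\kappa_{jK_j} = +\infty$, so that $P^*_{i1} = 1$ and $P^*_{i,K_j+1} = 0$.

² Figure adapted from Ostini and Nering (2006).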
From expression (1.5), we can observe that the discrimination parameter $\alpha_j$, i.e. the slope of the response functions, is constant across all the category responses of a given item. This constraint ensures that the boundary curves do not cross and hence that no category probability is negative (Steinberg and Thissen, 1995).
The boundary parameters $\kappa_{jk}$ vary within an item, according to the order constraint $\kappa_{j,k-1} < \kappa_{jk} < \kappa_{j,k+1}$, and at each level of $\theta = \kappa_{jk}$ the examinee has a probability of 0.5 of endorsing the category (Reeve, 2002). $P^*_{ik}$ is the trace line reflecting the probability that an examinee's response will fall in that scoring category or a higher one, at any specific level of the latent ability $\theta$.
The graded model response function $P(Y_{ij} = k \mid \theta_i)$ reflects the rate of examinees responding in the $k$-th category across the different levels of $\theta$; it is a non-monotonic curve, with the exception of the curves associated with the extreme categories, as previously pointed out in Figure 1.2 (Thissen et al., 2001).
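The computation in (1.4) and (1.5) can be sketched in a few lines. The example below is a minimal illustration, assuming one item with hypothetical parameters, the normal ogive form $\Phi(\alpha_j\theta_i - \kappa)$ for each cumulative trace line, and only the $K_j - 1$ finite thresholds stored in increasing order, so that the fixed values $P^*_{i1} = 1$ and $P^*_{i,K_j+1} = 0$ close the sequence. The function names are illustrative, not part of any library.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x), computed with math.erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def grm_category_probs(theta, alpha, thresholds):
    """Normal ogive graded response probabilities for one item.

    thresholds: the K - 1 finite boundaries between adjacent categories,
    in increasing order (the order constraint). Returns [P(Y=1), ..., P(Y=K)].
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative trace lines P*_k = P(Y >= k): fixed to 1 for the lowest
    # category and to 0 above the highest one, as required by (1.4).
    cumulative = [1.0] + [normal_cdf(alpha * theta - b) for b in thresholds] + [0.0]
    # (1.4): each category probability is the difference of adjacent P*_k.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical four-category item evaluated at a few ability levels.
for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, alpha=1.2, thresholds=[-1.0, 0.0, 1.5])
    print(theta, [round(p, 3) for p in probs])
```

Evaluating the function over a fine grid of $\theta$ values shows the pattern described above: the probability of the lowest category decreases, that of the highest category increases, and the intermediate categories rise and then fall.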
1.2.2 Other unidimensional IRT models for graded responses
Several models for items with two or more ordered responses have been developed. An overview of these models, together with their features, is given by van der Linden and Hambleton (1997) and van der Ark (2001). In addition to Samejima's graded response model (1969), other widely applied IRT models for ordinal data are the partial credit model (Masters, 1982) and its extension, the generalized partial credit model (Muraki, 1992). The partial credit model extends the Rasch model for binary items, i.e. the model with one item parameter, to the case of ordinal items. Samejima's graded response model, on the other hand, is the generalization of the two-parameter IRT model for binary data.
In the partial credit model and in its generalization, the category responses on the item represent levels of performance (Reckase, 2009). As in the graded response model, there are thresholds between adjacent scores: an examinee's performance is on the left or the right side of a threshold with a specific probability. Here the dichotomization procedure involves only two category boundaries for a given item; see Ostini and Nering (2006) for a detailed discussion of the differences between the Samejima and Rasch dichotomization approaches.
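Mathematical expressions for the partial credit model and the generalized partial credit model are presented in (1.6) and (1.7), where $D = 1.702$ is the scaling constant:
$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = \frac{\exp\left\{\sum_{u=1}^{k} (\theta_i - \kappa_{ju})\right\}}{\sum_{v=1}^{K_j} \exp\left\{\sum_{u=1}^{v} (\theta_i - \kappa_{ju})\right\}}, \tag{1.6}$$
$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = \frac{\exp\left\{\sum_{u=1}^{k} D\alpha_j(\theta_i - \beta_j + \kappa_{ju})\right\}}{\sum_{v=1}^{K_j} \exp\left\{\sum_{u=1}^{v} D\alpha_j(\theta_i - \beta_j + \kappa_{ju})\right\}}. \tag{1.7}$$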
In the generalized partial credit model the assumption of a constant discrimination parameter across test items is relaxed: the $\alpha_j$ parameters may vary across items. Reckase (2009) provides an exhaustive illustration of such models. Other IRT models for polytomous items have been proposed by Bock (1972), Andrich (1978, 1982), Thissen and Steinberg (1984), and Rost (1988). All these models refer to a unidimensional underlying ability structure.
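A minimal sketch of (1.6) and (1.7) for a single item, with hypothetical parameter values and illustrative function names. Since the $u = 1$ term of the inner sums appears in every numerator and in every term of the denominator, it cancels, so $\kappa_{j1}$ can be fixed to 0 without loss of generality; the shared helper also subtracts the largest exponent before calling `exp`, a standard guard against overflow that does not change the probabilities.

```python
import math

D = 1.702  # scaling constant in (1.7)

def _adjacent_category_probs(steps):
    """exp(sum_{u<=k} s_u) / sum_v exp(sum_{u<=v} s_u) for k = 1..K."""
    cumsums, running = [], 0.0
    for s in steps:
        running += s
        cumsums.append(running)
    top = max(cumsums)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(c - top) for c in cumsums]
    total = sum(weights)
    return [w / total for w in weights]

def pcm_probs(theta, kappa):
    """Partial credit model (1.6); kappa = [kappa_j1, ..., kappa_jK]."""
    return _adjacent_category_probs([theta - k for k in kappa])

def gpcm_probs(theta, alpha, beta, kappa):
    """Generalized partial credit model (1.7)."""
    return _adjacent_category_probs([D * alpha * (theta - beta + k) for k in kappa])

# Hypothetical four-category item; kappa_j1 = 0 by convention.
kappa = [0.0, 0.9, 0.1, -1.0]
print([round(p, 3) for p in pcm_probs(0.3, kappa)])
print([round(p, 3) for p in gpcm_probs(0.3, alpha=1.4, beta=0.2, kappa=kappa)])
```

Setting $D\alpha_j = 1$ and $\beta_j = 0$ in (1.7) and reversing the sign of the step parameters gives back (1.6), which reflects the different sign conventions of the two expressions.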
1.3 Towards multidimensional models
Unidimensional models are suitable when tests are designed to measure only one latent ability (Sheng and Wikle, 2009). There are some advantages in the use of such unidimensional models: i) they have quite simple mathematical forms; ii) they perform well in fitting the data in several empirical applications; and iii) they are rather robust to violations of assumptions (Reckase, 2009).
Nevertheless, real interactions between examinees and test items are not as simple as described by unidimensional models. On the one hand, a person is likely to use more than a single ability in the response process; on the other hand, the problems posed in a test can require several abilities in order to reach the right solution.
Multidimensional IRT (MIRT) models were developed to provide a more accurate description of the interactions between persons and test items. In particular, in MIRT models a vector of latent abilities is introduced, instead of a single person parameter.
In other words, MIRT models deal with the quite common circumstance where an examinee needs multiple abilities in order to respond to an item. In this case, more than one latent construct is measured by that item. One of the best-known examples in the educational field is a mathematical test item presented as a story, which requires both mathematical and reading comprehension skills to arrive at a correct answer (Fox, 2010).
Chapter 2
Multidimensional IRT (MIRT) models: a review
As previously pointed out, the latent space to be measured may be more complex than the one underlying unidimensional IRT models. The so-called MIRT models are used when separate latent abilities are involved in the observed responses to an item.
In this chapter we introduce the MIRT approach. In particular, we show how different models can be specified depending on the latent ability structure hypothesized to underlie the response process. A literature review of MIRT models for both binary and ordinal data is reported. A final section describes the most common estimation methods in the IRT and MIRT frameworks.
2.1 Main features of MIRT models
The assessment of dimensionality is a key topic in IRT and, more generally, in the latent variable framework. A review of methods for the empirical detection of the structure of tests with binary items was given by Tate (2003). In that work, particular attention is given to the assessment of the statistical structure of a test as implied by the relations between examinees and items, an aspect that should be an important part of the development, evaluation, and maintenance of large-scale tests.
Several IRT models are based on a common postulate: the assumption of unidimensionality. However, the local independence assumption holds only if the latent space is entirely specified. For this reason, many efforts have been made to characterize the concept of dimensionality and to detect it. An accurate and unequivocal definition of dimensionality does not exist yet, because the phenomenon is latent by nature and hence a direct comparison with observed results is not possible.
Hambleton and Swaminathan (1985) justified the unidimensionality assumption with the presence of a dominant trait able to explain the examinees' responses. In this sense, we can imagine that a single dominant trait always exists, but the crucial points are whether the dominant trait is sufficiently strong and in which way it dominates the others. Conversely, Traub (1983) argued that unidimensionality is probably more the exception than the rule, with respect to the skills necessary to answer the items on most cognitive tests.
Some weak features of the unidimensionality assumption have been reviewed by Adams et al. (1997), with the aim of proposing a MIRT model. The use of unidimensional models might be improper for tests intentionally built from subcomponents that are assumed to measure different abilities. IRT models seem to be robust to these violations of unidimensionality, especially with highly correlated latent constructs. In fact, if we assume the existence of a single latent ability, it can be seen as the dominant factor reflecting the different composition of the items. On the other hand, when a test is made of mutually exclusive subtests of items or when the underlying dimensions are not highly correlated, the use of a unidimensional model can bias parameter estimation, adaptive item selection and trait estimation. The problem is especially evident in adaptive testing, where examinees are administered different combinations of items and the traits underlying the performance may reflect the different composition of the items (Matteucci, 2007).
Finally, as briefly described at the end of Chapter 1, the assessment of knowledge, competencies and achievement is moving more and more towards a multidimensional evaluation. MIRT models are widely used in recent studies because the actual interactions between examinees and test items are complex and need to be framed in a multidimensional setting. A clarifying example reported in Matteucci (2007) concerns the assessment of proficiency in the University context, where the student's evaluation is typically multidimensional at each level: within a single course and during the whole University career, students are evaluated on the basis of multiple competencies.
2.1.1 Compensatory and noncompensatory approaches
MIRT models can be classified into two main groups, compensatory and noncompensatory models, depending on the way the vector of latent abilities, $\boldsymbol\theta$, is combined with item parameters to obtain the probability of responses to the item.
In compensatory models a linear combination of the elements of $\boldsymbol\theta$ enters the specification of the response probabilities, through a logistic or a normal ogive form. This approach implies that different combinations of elements in $\boldsymbol\theta$ can yield the same sum, and the direct consequence is a compensation effect: if one $\theta$-value is low but another one is sufficiently high, the sum can be the same.
In noncompensatory models, the different latent abilities used to solve an item are separated and each part is treated as a unidimensional model. The global probability is then obtained as the product of the probabilities of each unidimensional part. Nonlinearity arises from the use of the product of such probabilities, and the compensation property does not hold (Reckase, 2009).
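To make the compensation effect concrete, the sketch below compares a compensatory normal ogive term $\Phi(\sum_\nu \alpha_\nu\theta_\nu - \beta)$ with a noncompensatory product of unidimensional normal ogive terms. It is a minimal illustration with assumed parameter values; the per-dimension location parameters `b` of the product form are a hypothetical choice made only for this example.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def compensatory_prob(theta, alpha, beta):
    """Phi of a linear combination: a high ability can offset a low one."""
    return normal_cdf(sum(a * t for a, t in zip(alpha, theta)) - beta)

def noncompensatory_prob(theta, alpha, b):
    """Product of unidimensional normal ogive terms, one per ability."""
    p = 1.0
    for a, t, b_v in zip(alpha, theta, b):
        p *= normal_cdf(a * t - b_v)
    return p

alpha = [1.0, 1.0]
balanced, unbalanced = [0.0, 0.0], [2.0, -2.0]
# Same linear combination (0.0), hence identical compensatory probabilities.
print(compensatory_prob(balanced, alpha, 0.0), compensatory_prob(unbalanced, alpha, 0.0))
# In the product form the low second ability cannot be offset.
print(noncompensatory_prob(balanced, alpha, [0.0, 0.0]),
      noncompensatory_prob(unbalanced, alpha, [0.0, 0.0]))
```

With equal discriminations, the ability vectors (0, 0) and (2, -2) give the same linear combination and therefore the same compensatory probability (0.5), whereas the product form drops from 0.25 to about 0.022.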
2.1.2 Confirmatory and exploratory approaches
Another classification of MIRT models can be made with reference to the information available at the model specification step. Mainly, the investigation of multidimensionality can be conducted by using two different approaches: the exploratory and the confirmatory approach.
In the exploratory approach no prior knowledge about the relationship between items and latent traits is included in the model.
When the number of latent abilities is specified in advance, the method is not merely exploratory and we are in a confirmatory context. In line with the confirmatory approach, not only the number of latent variables is pre-specified but also their relationships with the items. In fact, the researcher can use prior knowledge to define which items load on which factors.
2.1.3 Underlying latent structures
In this section a brief review of different multidimensional latent structures is reported. For simplicity, figures refer to the simplest case of a test consisting of two subtests. Circles represent latent traits and squares represent observed item responses. Subtests are indicated with dashed lines.
Consecutive unidimensional model
Figure 2.1 illustrates the so-called consecutive unidimensional approach, where simple unidimensional IRT models are fitted to each subtest in a sequential way. By fitting this model we obtain person measures for every specific ability, but a direct estimation of the relations between them is not feasible (Huang et al., 2013).
Figure 2.1. Consecutive unidimensional latent structure.
Multiunidimensional model
Figure 2.2 reports the underlying structure for the between-item MIRT model (Wang et al., 2004), also called the multiunidimensional approach (Sheng and Wikle, 2007), where abilities are allowed to correlate and the intensity of such associations can be obtained directly.
Figure 2.2. Multiunidimensional latent structure.
Bi-factor model
The well-known bi-factor model, first introduced by Holzinger and Swineford (1937), where a general (or common) ability, $\theta_0$, and a specific ability are assumed to affect the response to each item, is illustrated in Figure 2.3. This is a case of within-item multidimensionality, i.e. single items measure more than one latent trait. This approach ignores the association between latent abilities.
Figure 2.3. Bi-factor latent structure.
Hierarchical models
Figure 2.4 shows the latent structure assumed for hierarchical MIRT models, where the hierarchical structure of general and specific latent constructs is modeled explicitly: items in the same subtest measure a specific ability and, in turn, each specific ability is influenced by a general ability. Different hierarchical models can be specified depending on the relation between specific and overall abilities: if each specific ability is a linear function of the overall ability we are in the case illustrated in (a), while if the specific abilities linearly combine to form the overall ability we are in the case shown in (b) (Schmid and Leiman, 1957; Sheng and Wikle, 2008).
Figure 2.4. Hierarchical latent structures.
Additive model
In the additive model presented in Figure 2.5 the latent structure is such that the response to a test item is affected both by the general and by the specific latent traits, so that the latent abilities form an additive structure (Sheng and Wikle, 2009). This model has a latent structure similar to the bi-factor model, but here all the latent constructs are allowed to correlate.
Figure 2.5. Additive latent structure.
2.2 MIRT models for binary data
MIRT is a methodology that has been developed with the principal aim of dealing with complex situations in psychological measurement, where several latent abilities influence the individual's performance on a given item (Reckase, 1997). By introducing a person trait and an item discrimination parameter for each ability measured by a test item, MIRT models permit separate inferences with reference to each distinct latent dimension of an examinee (Ackerman, 1993).
Two-parameter normal ogive model for binary data
Let us consider a test consisting of $p$ multiple choice items, each measuring $m$ latent abilities, $\theta_{1i}, \ldots, \theta_{mi}$. Let $\mathbf{Y} = [Y_{ij}]_{n \times p}$ represent the data matrix, i.e. a matrix containing $n$ examinees' responses to $p$ binary items, so that, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$, $Y_{ij}$ is defined as:
$$Y_{ij} = \begin{cases} 1, & \text{if examinee } i \text{ answers item } j \text{ correctly},\\ 0, & \text{if examinee } i \text{ answers item } j \text{ incorrectly}. \end{cases}$$
Reckase (1985) derived a multidimensional extension of the compensatory unidimensional two-parameter model which, in its normal ogive formulation, becomes:
$$P(Y_{ij} = 1 \mid \boldsymbol\theta_i, \boldsymbol\alpha_j, \beta_j) = \Phi\left(\sum_{\nu=1}^{m} \alpha_{\nu j}\theta_{\nu i} - \beta_j\right) = \int_{-\infty}^{\sum_{\nu=1}^{m} \alpha_{\nu j}\theta_{\nu i} - \beta_j} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt. \tag{2.1}$$
Each individual is characterized by a vector $\boldsymbol\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ of latent abilities, where $m$ is the number of latent dimensions measured by a generic item, in contrast to the unidimensional case, where individuals are characterized by only one latent ability $\theta_i$.
Item discrimination parameters are also represented by a vector reflecting multiple dimensions, $\boldsymbol\alpha_j = (\alpha_{1j}, \ldots, \alpha_{mj})$, where $j$ represents the item number and $\nu = 1, \ldots, m$ indicates the dimension to which the discrimination value is related. If the discrimination parameter related to dimension $\nu$, $\alpha_{\nu j}$, is high, that dimension has a great influence in determining an examinee's success on item $j$. Finally, $\beta_j$ is a scalar parameter determining the location in the latent space where the item provides maximum information.
Multiunidimensional model for binary data
As illustrated in the work of Sheng and Wikle (2007), the elements in the vector of discrimination parameters $\boldsymbol\alpha_j = (\alpha_{1j}, \ldots, \alpha_{mj})$ can be considered as factor loadings in factor analysis. If a rotation is performed so that each item loads on one factor only, the vector of discrimination parameters can be simplified to $\boldsymbol\alpha_j = (0, \ldots, 0, \alpha_{\nu j}, 0, \ldots, 0)$, and from (2.1) we obtain the expression for the multiunidimensional model for binary data, where each latent trait is related to a single set of items. The underlying latent structure of such a model is illustrated in Figure 2.2.
Let us consider a test consisting of $p$ items. The test is structured into $m$ subtests, each one composed of $p_\nu$ items that measure one latent trait. The probability that individual $i$ will obtain a correct response to item $j$ belonging to the $\nu$-th subtest is given by:
$$P(Y_{\nu ij} = 1 \mid \theta_{\nu i}, \alpha_{\nu j}, \beta_j) = \Phi(\alpha_{\nu j}\theta_{\nu i} - \beta_j) = \int_{-\infty}^{\alpha_{\nu j}\theta_{\nu i} - \beta_j} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt,$$
where $\alpha_{\nu j}$ is a scalar parameter reflecting the item discrimination, $\theta_{\nu i}$ is a scalar parameter reflecting the individual's $\nu$-th ability, and $\beta_j$ is a scalar parameter representing the location in the latent space where the item provides maximum information.
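The reduction of (2.1) to the multiunidimensional model can be checked numerically. The following sketch, assuming hypothetical abilities on $m = 3$ dimensions and an item that loads only on the second dimension (index 1 in zero-based Python), evaluates the item once through the general compensatory form with the zero-padded loading vector $(0, \ldots, 0, \alpha_{\nu j}, 0, \ldots, 0)$ and once through the single-dimension expression. Function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mirt_2pno(theta, alpha, beta):
    """Compensatory model (2.1): Phi(sum_v alpha_vj * theta_vi - beta_j)."""
    return normal_cdf(sum(a * t for a, t in zip(alpha, theta)) - beta)

def padded_loadings(m, nu, alpha_nu):
    """Loading vector (0, ..., 0, alpha_nu, 0, ..., 0) with alpha_nu at position nu."""
    return [alpha_nu if v == nu else 0.0 for v in range(m)]

def multiunidim_prob(theta, nu, alpha_nu, beta):
    """Multiunidimensional model: only the ability of subtest nu enters."""
    return normal_cdf(alpha_nu * theta[nu] - beta)

theta_i = [0.4, -1.1, 0.9]           # hypothetical abilities, m = 3
nu, alpha_nu, beta_j = 1, 1.3, -0.2  # hypothetical item in the second subtest
p_general = mirt_2pno(theta_i, padded_loadings(3, nu, alpha_nu), beta_j)
p_subtest = multiunidim_prob(theta_i, nu, alpha_nu, beta_j)
print(p_general, p_subtest)  # the two expressions coincide
```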
Additive model for binary data
The additive MIRT model for dichotomous data proposed by Sheng and Wikle (2009) assumes an underlying latent structure such that both specific abilities and an overall ability directly affect the individual response to a test item, resulting in an additive structure (see Figure 2.5).
If we consider again a test containing $p$ items structured into $m$ subtests (each one composed of $p_\nu$ items), according to the additive MIRT model for binary data, the probability that individual $i$ will obtain a correct response to item $j$ belonging to the $\nu$-th subtest is given by:
$$P(Y_{\nu ij} = 1 \mid \theta_{0i}, \theta_{\nu i}, \alpha_{0\nu j}, \alpha_{\nu j}, \beta_j) = \Phi(\alpha_{0\nu j}\theta_{0i} + \alpha_{\nu j}\theta_{\nu i} - \beta_j) = \int_{-\infty}^{\alpha_{0\nu j}\theta_{0i} + \alpha_{\nu j}\theta_{\nu i} - \beta_j} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt, \tag{2.2}$$
where $\theta_{\nu i}$ is a scalar parameter representing the examinee's ability in the $\nu$-th dimension, $\theta_{0i}$ is the $i$-th individual parameter related to the overall ability, $\alpha_{0\nu j}$ is the $j$-th item discrimination parameter with reference to the overall ability $\theta_{0i}$, $\alpha_{\nu j}$ is the item discrimination parameter with reference to the specific ability $\theta_{\nu i}$, and $\beta_j$ is a scalar parameter representing the location in the latent space where the item provides maximum information.
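As a minimal sketch of how data could be generated under (2.2), the function below simulates binary responses for a single subtest, assuming standard normal general and specific traits with a hypothetical correlation `rho` between them (consistent with the additive structure, where the latent constructs are allowed to correlate). Item parameter values and function names are illustrative only, and this is not necessarily the simulation design used later in the thesis.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def additive_prob(theta0, theta_nu, alpha0, alpha_nu, beta):
    """Equation (2.2): Phi(alpha_0vj*theta_0i + alpha_vj*theta_vi - beta_j)."""
    return normal_cdf(alpha0 * theta0 + alpha_nu * theta_nu - beta)

def simulate_additive(n, items, rho, seed=0):
    """Draw binary responses for one subtest under (2.2).

    items: list of (alpha0, alpha_nu, beta) tuples (hypothetical values).
    rho: assumed correlation between the general and the specific trait.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta0 = z1
        theta_nu = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(theta0, theta_nu) = rho
        row = [int(rng.random() < additive_prob(theta0, theta_nu, a0, a, b))
               for a0, a, b in items]
        data.append(row)
    return data

items = [(0.8, 0.6, -0.5), (1.0, 0.4, 0.0), (0.5, 1.1, 0.7)]
Y = simulate_additive(n=2000, items=items, rho=0.5)
print([sum(col) / len(Y) for col in zip(*Y)])  # observed proportion correct per item
```

Under these assumptions the linear predictor is normal with mean $-\beta_j$ and variance $\alpha_{0\nu j}^2 + \alpha_{\nu j}^2 + 2\rho\,\alpha_{0\nu j}\alpha_{\nu j}$, so the marginal proportion correct is $\Phi(-\beta_j / \sqrt{1 + \text{variance}})$, about 0.62, 0.50 and 0.34 for the three hypothetical items; the simulated proportions should approximate these values.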
The expression in (2.2) implies that the probability that an individual endorses an item is directly influenced by two latent traits: a general ability and a specific one (Sheng and Wikle, 2009).
A more detailed description of the models for binary data presented above goes beyond the purpose of this study. Our decision to focus the analysis on the additive structure has been driven by the fact that this latent structure, according to which both the specific and the general latent traits directly underlie all the test items, represents a plausible and fairly detailed approximation of the real interactions between individuals and item responses. The multiunidimensional model, on the other hand, is simpler than the additive one, but it is regularly used in MIRT applications.
These two models have been presented in order to provide a more complete background on the latent structures that we will discuss in detail for the case of ordered responses.
2.3 MIRT models for ordinal data
A multidimensional formalization of IRT models for graded responses has been developed by several authors as an extension of the unidimensional version. In this section we present some works that focus on multidimensional models for ordered items. These works have not necessarily been developed in an IRT context, but also in the framework of confirmatory factor analysis. Basically, the interest in adopting such models arose from the widespread use of Likert items (Likert, 1932), and in general of other ordered scales, in questionnaires for sociological and psychological measurement. The extensive availability of such data has led, in the last two decades, to the need for new developments towards a multidimensional version of the IRT model for graded responses.
We begin by introducing some notation. As in the case of dichotomous items, let us consider a test made of $p$ multiple choice items, each measuring $m$ latent traits, $\theta_1, \ldots, \theta_m$. Now the data are collected in a matrix, $\mathbf{Y} = [Y_{ij}]_{n \times p}$, containing $n$ examinees' responses to $p$ ordered items; thus, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$, $Y_{ij}$ is defined as:
$$Y_{ij} = \begin{cases} 1, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } 1,\\ 2, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } 2,\\ \ \vdots & \\ K_j, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } K_j, \end{cases}$$
where $1$ and $K_j$ are the lowest and the highest score for item $j$, respectively.
Muraki and Carlson (1995) developed a MIRT model for polytomously scored items on the basis of Samejima's graded response model in the full information factor analysis context. In their work, they show how the factor analytic model for categorical variables is based on the assumption that the response process, say $Z_{ij}$, is an underlying unobservable variable which, for each subject $i$, is realized into the vector of observed ordered item responses $\mathbf{Y}_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})$. They also model the response process variable $Z_{ij}$ as a linear combination of the $m$ latent traits, $\theta_{1i}, \theta_{2i}, \ldots, \theta_{mi}$, and the factor loadings $\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jm}$. Thus:
$$Z_{ij} = \alpha_{j1}\theta_{1i} + \alpha_{j2}\theta_{2i} + \cdots + \alpha_{jm}\theta_{mi} + \varepsilon_{ij} = \boldsymbol\alpha_j'\boldsymbol\theta_i + \varepsilon_{ij},$$
where $\varepsilon_{ij}$ is an unobserved random variable that is assumed to be distributed as $N(0, \sigma_j^2)$. Muraki and Carlson (1995) introduced the threshold parameter $\gamma_{jk}$ associated with the $k$-th category of item $j$, and modeled the unobservable response process according to the psychological mechanism, that is $Y_{ij} = k$ if $\gamma_{j,k-1} \leq Z_{ij} < \gamma_{jk}$, for $k = 1, \ldots, K_j$, with $\gamma_{j0} = -\infty$ and $\gamma_{jK_j} = +\infty$. The probability that examinee $i$ responds in category $k$ of item $j$, given the examinee's $m$-dimensional latent trait and assuming a normal ogive model, is formalized as:
$$P(Y_{ij} = k \mid \boldsymbol\theta_i) = \int_{\gamma_{j,k-1}}^{\gamma_{jk}} \frac{1}{(2\pi)^{1/2}\sigma_j} \exp\left\{-\frac{1}{2}\left(\frac{Z_{ij} - \boldsymbol\alpha_j'\boldsymbol\theta_i}{\sigma_j}\right)^2\right\} dZ. \tag{2.3}$$
Model (2.3) can be rewritten in a form more familiar in item response modeling by applying some transformations of the variables (see Muraki and Carlson (1995) for the detailed procedure). The authors focus on uncorrelated latent dimensions (bi-factor latent structure) and provide a detailed description of the Expectation Maximization (EM) algorithm in a marginal maximum likelihood estimation context (estimation methods will be covered in the next section). The proposed algorithm has been implemented in the POLYFACT computer program (Muraki, 1993), which calculates the factor loadings via the principal factor method applied to the product-moment correlation matrix. The program treats the observed responses as continuous variables (Muraki and Carlson, 1995).
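Integrating the normal density in (2.3) between consecutive thresholds gives a difference of standard normal distribution functions, $\Phi\{(\gamma_{jk} - \boldsymbol\alpha_j'\boldsymbol\theta_i)/\sigma_j\} - \Phi\{(\gamma_{j,k-1} - \boldsymbol\alpha_j'\boldsymbol\theta_i)/\sigma_j\}$. The sketch below, with hypothetical parameter values and illustrative function names, computes these probabilities and checks them against categories generated directly from the latent response mechanism $Y_{ij} = k$ if $\gamma_{j,k-1} \leq Z_{ij} < \gamma_{jk}$.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF Phi(x); also valid for x = +/-inf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def latent_response_probs(theta, alpha, sigma, gamma):
    """Category probabilities of (2.3) as differences of normal CDFs.

    gamma: increasing finite thresholds gamma_j1 < ... < gamma_j,K-1;
    gamma_j0 = -inf and gamma_jK = +inf are appended here.
    """
    mean = sum(a * t for a, t in zip(alpha, theta))  # alpha_j' theta_i
    bounds = [-math.inf] + list(gamma) + [math.inf]
    cdf = [normal_cdf((g - mean) / sigma) for g in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]

def simulate_category(theta, alpha, sigma, gamma, rng):
    """Draw Z_ij = alpha_j' theta_i + eps_ij and return k with gamma_{k-1} <= Z < gamma_k."""
    z = sum(a * t for a, t in zip(alpha, theta)) + rng.gauss(0.0, sigma)
    return 1 + sum(z >= g for g in gamma)

# Hypothetical two-dimensional item with four categories.
theta_i, alpha_j, sigma_j, gamma_j = [0.5, -0.3], [1.0, 0.7], 1.0, [-1.0, 0.2, 1.1]
exact = latent_response_probs(theta_i, alpha_j, sigma_j, gamma_j)
rng = random.Random(3)
draws = [simulate_category(theta_i, alpha_j, sigma_j, gamma_j, rng) for _ in range(20000)]
empirical = [draws.count(k) / len(draws) for k in range(1, len(exact) + 1)]
print([round(p, 3) for p in exact])
print([round(p, 3) for p in empirical])
```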
In the study by Ferrando (1999) three different item response models for graded responses are compared: a continuous response model based on linear factor analysis; a censored response model, where the graded responses are considered to be censored continuous variables; and a multidimensional graded response model in the formulation given by Muraki and Carlson (1995). Ferrando observed that, even though there have been several applications of the unidimensional graded response model to attitude and personality data, applications of the multidimensional version of the model are not common. Ferrando (1999) concludes by showing that the solutions were similar for the three models considered, but that the estimation method could affect the results.
A more recent work by Edwards (2010a) falls within the context of confirmatory item factor analysis models. He developed a relatively user-friendly package, MultiNorm (Edwards, 2010b), with which the user can fit multidimensional graded (or dichotomous) response models characterized by a multiunidimensional or a bi-factor underlying latent structure. The estimation technique used in this work belongs to the family of Markov chain Monte Carlo (MCMC) techniques. Again, for a further discussion on estimation methods see the next section.
Other applications of MIRT models for graded responses, with empirical examples mainly from the fields of educational assessment and psychological reactance, can be found in Yao and Schwarz (2006), Fu et al. (2010), Brown et al. (2011) and van der Ark et al. (2011). It is worth remarking that the latent structures assumed in these studies were mainly the multiunidimensional structure (Figure 2.2) and the bi-factor structure (Figure 2.3).
Considering this scarcity of existing research about MIRT models for ordinal outcomes, especially for complex cases, in this work we consider the case of an additive underlying latent structure (Figure 2.5), after having introduced the multiunidimensional case (Figure 2.2).
2.4 Estimation methods
In IRT models, as well as in MIRT models, the characteristics of interest are the person's abilities and the item parameters: different values of these parameters lead to different response probabilities. Nevertheless, these two important characteristics are both unknown, and the available data consist only of a collection of responses given by a sample of examinees.
Concerning the estimation procedure, we need to consider two relevant features: the first one is that the response model is not linear and the second one is that it is not possible to observe the latent trait $\theta$. This implies that the estimation is similar to performing a nonlinear regression with unknown predictor values.
Starting from the available data, the focal objective is to determine the $\theta$ values for every individual and the item parameters from the item responses. We can perform a simultaneous estimation of ability and item parameters in the context of maximum likelihood (ML) or in a Bayesian framework.
The estimation procedure is in general affected by the way the probabilities of the responses are conceived. There are two main interpretations of probability. One of them is the stochastic subject interpretation, where the observed examinees are considered as fixed and probabilities reflect the unpredictability of specific events; here the latent variables are constructed as unknown fixed parameters. The other interpretation of probability is random sampling, where the examinees are considered as a representative random sample from a population, so that a distribution for the latent trait must be specified and the latent variables are constructed as random.
In the framework of ML estimation, three main methods can be identified:
• The joint maximum likelihood (JML);
• The conditional maximum likelihood (CML);
• The marginal maximum likelihood (MML).
In the JML and CML methods we are in the context of the stochastic subject interpretation of probability, i.e. fixed latent variables, whereas in the MML method we are in the random sampling interpretation framework and the latent variables are treated as random.
The applicability of JML and CML is rather limited. The JML method works by simultaneously estimating item and person parameters through an iterative procedure. This method is quite simple, but the complexity of the algorithm increases with the number of observations. Moreover, since the number of person parameters grows with the number of examinees, the standard limit theorems do not apply and the resulting parameter estimators are not consistent (Andersen, 1970).
CML is a method suggested by Andersen (1970); it relies on the availability of a sufficient statistic for the ability and simplifies the maximum likelihood by conditioning on it. A relevant problem limits the applicability of this method: most models, including the quite simple unidimensional two-parameter model, do not have simple sufficient statistics (Johnson, 2007).
The MML estimation method is the most widely applied: by considering the joint probability of a certain response pattern given the latent trait and integrating the individual likelihoods over the latent trait distribution, it defines the marginal probability of observing the item response pattern. To obtain the parameter estimates, the EM algorithm is used (Ayala, 2009).
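A minimal sketch of the quantity maximized in MML, assuming unidimensional two-parameter normal ogive items, local independence and a standard normal ability distribution. For simplicity the integral is approximated with equally spaced nodes and normalized normal weights rather than the Gauss-Hermite quadrature commonly used in practice, and the EM updates of the item parameters are not shown; item parameter values and function names are hypothetical.

```python
import itertools
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def quadrature_grid(n_points=121, bound=6.0):
    """Equally spaced nodes on [-bound, bound] with normalized N(0, 1) weights."""
    step = 2.0 * bound / (n_points - 1)
    nodes = [-bound + q * step for q in range(n_points)]
    raw = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(raw)
    return nodes, [w / total for w in raw]

def pattern_likelihood(y, theta, items):
    """P(y | theta) under local independence for 2PNO items (alpha_j, beta_j)."""
    like = 1.0
    for y_j, (alpha, beta) in zip(y, items):
        p = normal_cdf(alpha * theta - beta)
        like *= p if y_j == 1 else 1.0 - p
    return like

def marginal_pattern_prob(y, items, grid):
    """Integrate the pattern likelihood over the ability distribution."""
    nodes, weights = grid
    return sum(w * pattern_likelihood(y, t, items) for t, w in zip(nodes, weights))

def marginal_loglik(data, items, grid):
    """Marginal log-likelihood of a response matrix: the MML objective."""
    return sum(math.log(marginal_pattern_prob(y, items, grid)) for y in data)

items = [(1.0, -0.5), (1.4, 0.0), (0.7, 0.8)]  # hypothetical (alpha_j, beta_j)
grid = quadrature_grid()
patterns = list(itertools.product([0, 1], repeat=len(items)))
print(sum(marginal_pattern_prob(y, items, grid) for y in patterns))
print(marginal_loglik([(1, 1, 0), (0, 1, 1), (1, 0, 0)], items, grid))
```

Summing the marginal probabilities of all $2^p$ response patterns returns 1 up to rounding, a quick check that the integration over $\theta$ is set up correctly.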
A single estimated latent trait value can be associated with each individual through maximum a posteriori or expected a posteriori techniques. In general, all the ML estimation methods treat item parameters as fixed. Conversely, in the Bayesian context, both the latent abilities and the item parameters are regarded as random variables.
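Given the item parameters, a grid over $\theta$ can be used to associate a point estimate with each examinee. In the sketch below (hypothetical items, standard normal prior assumed, illustrative function names), the expected a posteriori (EAP) estimate is the posterior mean over the grid and the maximum a posteriori (MAP) estimate is approximated by the grid node with the highest posterior weight, so its precision is limited by the grid spacing.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def posterior_on_grid(y, items, n_points=241, bound=6.0):
    """Normalized posterior of theta on a grid: N(0, 1) prior times 2PNO likelihood."""
    step = 2.0 * bound / (n_points - 1)
    nodes = [-bound + q * step for q in range(n_points)]
    post = []
    for t in nodes:
        w = math.exp(-0.5 * t * t)  # prior density up to a constant
        for y_j, (alpha, beta) in zip(y, items):
            p = normal_cdf(alpha * t - beta)
            w *= p if y_j == 1 else 1.0 - p
        post.append(w)
    total = sum(post)
    return nodes, [w / total for w in post]

def eap_map(y, items):
    """EAP = posterior mean; MAP = grid node with the highest posterior weight."""
    nodes, post = posterior_on_grid(y, items)
    eap = sum(t * w for t, w in zip(nodes, post))
    map_estimate = nodes[max(range(len(post)), key=lambda q: post[q])]
    return eap, map_estimate

items = [(1.2, -0.8), (1.0, 0.0), (0.8, 0.5), (1.5, 1.0)]  # hypothetical (alpha_j, beta_j)
for y in [(0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)]:
    print(y, [round(v, 3) for v in eap_map(y, items)])
```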
As we will see in more detail in the next chapter, the adoption of a fully Bayesian approach implies several advantages. It allows a joint estimation of item parameters and individual abilities, and it permits the inclusion of uncertainties about item parameters and abilities, and in general of prior beliefs, in the prior distributions. MCMC estimation of IRT and MIRT models can then be viewed as an alternative to MML estimation, where the approximation of the multiple integrals involved in the likelihood function may represent a serious problem, especially for increasingly complex models.
Chapter 3
Bayesian estimation of MIRT models
This chapter introduces the main ideas and the functioning of the Bayesian approach for estimation purposes, with a particular focus on simulation-based methods for parameter estimation. Available Bayesian estimation methods based on MCMC techniques for MIRT models are also presented.
3.1 Elements of Bayesian statistics in MIRT context
According to the Bayesian approach, all the model parameters, i.e. person and item parameters in our case, are random variables, each one with its prior distribution reflecting the available prior information and the uncertainty about their real values before the data are observed.
All the MIRT models illustrated so far (for both binary and ordinal items) are specified with the final aim of expressing the data-generating process as a function of the unknown person and item parameters. These are likelihood models and represent the density of the data conditional on the model parameters. In order to formulate a Bayesian model, we need to specify:
• A prior distribution for each unknown model parameter;
• A likelihood model reflecting the data-generating process.
Once the data are observed, the prior information is updated with the information contained in the observed data and a posterior distribution is obtained, which permits direct inference about the parameters.
3.1.1 Prior distribution choice
A key point in the Bayesian framework is the possibility of specifying prior distributions for the unknown model parameters, in order to exploit background information and beliefs available before the collection of the sample. All this contextual information is expressed through probability distributions and, as a result, is reflected in a prior distribution. On the other hand, the conditional probability distribution (the likelihood) is specified to reflect the observed data.
One of the main objections to the Bayesian framework concerns the specification of these prior distributions, which can be considered overly subjective and arbitrary (Gelman, 2008). It should be noted that the choice of the prior distributions, made at the moment of model specification, is subjective by definition.
Therefore, only prior distributions expressing prior ideas can be considered correct in this setting and, even if the choice is subjective, it cannot be considered arbitrary, since it reflects the researcher's thought (Fox, 2010). In addition, it is possible to specify the so-called vague priors, i.e. objective noninformative prior distributions expressing ignorance about the unknown parameter values.
The branch of objective Bayesian statistics relies on the specification of objective prior distributions. Even though it does not need any subjective contribution, a specific strength of the Bayesian methodology is the possibility of including beliefs and prior information in the model specification, and objective Bayesian methods do not allow this.
The inclusion of prior beliefs can increase the reliability of the statistical inference. In the IRT and MIRT frameworks, item responses represent the observed data, and other sources of information can be included in the model through the prior distributions. This is especially valuable when the information in the data is scarce, since prior information can then significantly improve the statistical inference (Fox, 2010).
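To make the role of the prior concrete, the following minimal Python sketch uses a toy conjugate setting that is not part of the MIRT models of this thesis: a single normal mean with a known data standard deviation and a normal prior, for which the posterior is available in closed form. The function name and the two observations are hypothetical. With so little data, the informative N(0, 1) prior narrows the posterior and pulls it towards the prior mean, whereas an almost flat prior leaves the data to speak for themselves.

```python
import math

# Illustrative sketch; the function name and data are hypothetical.
def normal_posterior(prior_mean, prior_sd, data, data_sd):
    """Closed-form posterior (mean, sd) of a normal mean with known data sd
    and a normal prior (conjugate normal-normal model)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = len(data) / data_sd**2
    post_prec = prior_prec + data_prec          # precisions add up
    data_mean = sum(data) / len(data) if data else 0.0
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

data = [1.2, 0.4]                                            # only two observations
informative = normal_posterior(0.0, 1.0, data, data_sd=1.0)  # N(0, 1) prior
vague = normal_posterior(0.0, 100.0, data, data_sd=1.0)      # nearly flat prior
print(informative, vague)
```

The posterior standard deviation under the informative prior is smaller than under the vague one, which is the sense in which prior information sharpens inference when data are scarce.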
3.1.2 Bayes' Theorem
Let us consider a set of N observations, denoted by y = (y1, . . . , yN), which are the numerical realization of the random vector Y = (Y1, . . . , YN) following some probability distribution. Let p(y) denote the probability density (mass) function of the continuous (discrete) variable Y.
Now suppose that, starting from the observed responses, we are interested in measuring the unknown person (θ) and item (ξ) parameters, denoted by λ = (θ, ξ). We denote by p(λ) the prior distribution reflecting the beliefs about the unknown parameters. The term p(y|λ) reflects the information about λ contained in the vector of observed values y: regarded as a function of the data it is the sampling distribution, while regarded as a function of the parameters it is the likelihood function. Usually, the distribution of the parameters given the data is of main interest. According to Bayes' Theorem, the conditional distribution of λ given the response data is
$$p(\lambda \mid y) = \frac{p(y \mid \lambda)\, p(\lambda)}{p(y)} \propto p(y \mid \lambda)\, p(\lambda), \qquad (3.1)$$
where ∝ denotes proportionality. The term p(λ|y) is the posterior density of the parameter λ given both prior and sample information and, for continuous quantities,
$$p(y) = \int_{\lambda \in \Lambda} p(y \mid \lambda)\, p(\lambda)\, d\lambda,$$
where Λ denotes the set of all the possible values of λ.
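Expression (3.1) can be illustrated numerically with a grid approximation. The sketch below assumes a single hypothetical person parameter θ, four binary responses with known item difficulties b_j under a one-dimensional normal ogive model P(Y_j = 1 | θ) = Φ(θ − b_j), and a standard normal prior; all of these choices are illustrative assumptions, not the models of this thesis. The marginal density p(y) is approximated by a Riemann sum over the grid and used to normalize the posterior.

```python
import math

# Illustrative sketch; model, data and grid are hypothetical.
def phi(x):
    """Standard normal CDF, computed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def grid_posterior(y, b, grid):
    """Grid approximation of p(theta | y) = p(y | theta) p(theta) / p(y), eq. (3.1)."""
    step = grid[1] - grid[0]
    unnorm = []
    for theta in grid:
        prior = math.exp(-0.5 * theta**2) / math.sqrt(2.0 * math.pi)  # N(0, 1) prior
        lik = 1.0
        for y_j, b_j in zip(y, b):
            p = phi(theta - b_j)              # P(Y_j = 1 | theta), normal ogive item
            lik *= p if y_j == 1 else 1.0 - p
        unnorm.append(lik * prior)            # p(y | theta) p(theta)
    p_y = sum(unnorm) * step                  # p(y) by a Riemann sum over the grid
    return [u / p_y for u in unnorm], p_y

grid = [-5.0 + 0.01 * k for k in range(1001)]
post, p_y = grid_posterior(y=[1, 1, 0, 1], b=[-1.0, 0.0, 0.5, 1.0], grid=grid)
step = grid[1] - grid[0]
post_mean = sum(t * p for t, p in zip(grid, post)) * step
print(round(p_y, 4), round(post_mean, 3))
```

Dropping the division by `p_y` leaves the unnormalized density p(y|θ)p(θ), which already has the shape of the posterior.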
Since we are interested in person and item parameters, we replace expression (3.1) with
$$p(\theta, \xi \mid y) = \frac{p(y \mid \theta, \xi)\, p(\theta)\, p(\xi)}{p(y)} \propto p(y \mid \theta, \xi)\, p(\theta)\, p(\xi), \qquad (3.2)$$
where p(θ) is the prior for the person parameters θ, p(ξ) is the prior for the item parameters ξ, and these prior densities are assumed to be independent of each other, so that p(θ, ξ) = p(θ) p(ξ).
The denominator of expression (3.2) is called the marginal density of the data, the marginal likelihood, or the integrated likelihood. Its evaluation can be computationally expensive, so when knowledge of the shape of the posterior p(θ, ξ|y) is sufficient for the purposes of the study, we can work with the unnormalized density p(y|θ, ξ) p(θ) p(ξ) (Fox, 2010).
Expression (3.2) is the statement of the well-known Bayes' Theorem (Bayes and Price, 1763). In particular, the expression p(θ, ξ|y) ∝ p(y|θ, ξ) p(θ) p(ξ) states that the posterior is proportional to the product of the likelihood L(y; θ, ξ) and the prior density, where L(y; θ, ξ) = p(y|θ, ξ). All the sample information about the person and item parameters is contained in this likelihood function.
A relevant distribution for the inference process is the joint density of the data and the parameters, p(y, θ, ξ). This density can be factorized as follows:
$$\begin{aligned} p(y, \theta, \xi) &= p(\theta, \xi \mid y)\, p(y) \qquad (3.3)\\ &= p(y \mid \theta, \xi)\, p(\theta)\, p(\xi). \qquad (3.4) \end{aligned}$$
From the expressions above we can observe that this joint density can be factorized in two different ways: (i) as the product of the marginal density of the data and the posterior of the unknown parameters (3.3), and (ii) as the product of the prior distributions of the parameters and the likelihood of (θ, ξ) given y (3.4).
3.1.3 Marginal posterior distributions for model parameters
In order to make inference, the joint posterior distribution reported in (3.2) is used. Since this high-dimensional distribution has a complex form and usually an analytically intractable expression, we need to focus on one set of unknown parameters and treat the other as a nuisance parameter. More precisely, if we are interested in the distribution of θ, we treat ξ as a nuisance parameter and, integrating out all the possible values of ξ from (3.2), we obtain the marginal posterior density of the person parameters:
$$p(\theta \mid y) = \int_{\xi \in \Xi} p(\theta, \xi \mid y)\, d\xi = \int_{\xi \in \Xi} \frac{p(y \mid \theta, \xi)\, p(\theta)\, p(\xi)}{p(y)}\, d\xi \propto \int_{\xi \in \Xi} p(y \mid \theta, \xi)\, p(\theta)\, p(\xi)\, d\xi. \qquad (3.5)$$
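When we are interested in the distribution of ξ, we consider θ as a nuisance parameter and thus we integrate out all the values of θ, getting the marginal posterior density for item parameters:
$$p(\xi \mid y) = \int_{\theta \in \Theta} p(\theta, \xi \mid y)\, d\theta = \int_{\theta \in \Theta} \frac{p(y \mid \theta, \xi)\, p(\theta)\, p(\xi)}{p(y)}\, d\theta \propto \int_{\theta \in \Theta} p(y \mid \theta, \xi)\, p(\theta)\, p(\xi)\, d\theta. \qquad (3.6)$$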
In general, the information contained in the joint and/or marginal posterior distributions is summarized by the posterior mean (or median) and standard deviation.
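The marginalization in (3.5) and the posterior summaries can be sketched on a two-dimensional grid. In this toy example, which is only an analogy and not an IRT model, hypothetical data are assumed normal with mean θ (the parameter of interest) and standard deviation ξ (the nuisance parameter), with a vague N(0, 10²) prior on θ and a flat prior on ξ over the grid range. Summing the unnormalized joint density over ξ gives the marginal posterior of θ, from which the posterior mean and standard deviation follow.

```python
import math

# Illustrative sketch; data, priors and grids are hypothetical.
def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

y = [2.1, 1.4, 2.9, 2.4, 1.7]                      # hypothetical data
d_theta, d_xi = 0.02, 0.02
thetas = [-2.0 + d_theta * i for i in range(301)]  # grid for theta (of interest)
xis = [0.1 + d_xi * j for j in range(246)]         # grid for xi > 0 (nuisance)

# Unnormalized joint posterior p(y | theta, xi) p(theta) p(xi) on the grid:
# N(0, 10^2) prior on theta, flat prior on xi over the grid range.
joint = [[math.exp(sum(log_normal_pdf(v, t, s) for v in y) + log_normal_pdf(t, 0.0, 10.0))
          for s in xis]
         for t in thetas]

# Integrate out xi (eq. 3.5), then normalize the marginal posterior of theta.
marg = [sum(row) * d_xi for row in joint]
z = sum(marg) * d_theta
marg = [m / z for m in marg]

post_mean = sum(t * m for t, m in zip(thetas, marg)) * d_theta
post_sd = math.sqrt(sum((t - post_mean) ** 2 * m for t, m in zip(thetas, marg)) * d_theta)
print(round(post_mean, 3), round(post_sd, 3))
```

Grids of this kind become infeasible as the number of parameters grows, which is why simulation is needed for MIRT models.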
As previously pointed out, several difficulties arise with the joint posterior distribution of person and item parameters as a result of its high dimensionality and analytical intractability. The same difficulties remain for the marginal posterior densities of person (3.5) and item (3.6) parameters, since their mathematical expressions are not always known. These computational problems can be solved by simulation-based techniques. In particular, the MCMC method is a very useful technique, which we briefly describe in the next section.
3.2 Markov chain Monte Carlo methods
The Bayesian approach based on MCMC techniques has become increasingly popular in the estimation of unidimensional and multidimensional item response models. Two motivations can drive the use of this method. First, it can be an effective alternative to the classical EM algorithm used in MML estimation: it is based on simulation, it introduces an informative prior distribution into the estimation process and, unlike the MML method, it treats both person and item parameters as random variables. Second, it can also be seen as a complementary tool to the EM algorithm: the posterior distribution generated through MCMC techniques can be used to evaluate the suitability of the normal approximations in MML, so that the two approaches can be compared in terms of the accuracy of parameter recovery.
As we will see in this section, MCMC is a very useful and relatively straightforward method for making inference when we face a very complex model, from whose posterior distribution it is difficult to sample or simulate directly. This is a common situation in a MIRT context. In particular, the Gibbs sampler is a widely used MCMC algorithm consisting of a precise scheme for generating suitable samples from the posterior density. Moreover, compared with other methods, it imposes few constraints and is fairly simple to implement. For these reasons, several researchers have implemented MCMC strategies in the IRT setting, and many studies have investigated the properties of these methods. Of particular interest is the evaluation of model parameter recovery in comparison with the classical methods.
If we compare the MCMC technique with classical MML estimation, the main advantages of the MCMC approach can be summarized as follows:
• flexibility in modeling all the connections between latent and observed variables;
• suitability for more complex models;
• insensitivity to the choice of starting values (unlike the EM algorithm).
Two relevant works that perform Bayesian estimation using the Gibbs sampler in an IRT context are Albert (1992) and Béguin and Glas (2001). In the first, the Gibbs sampler for the unidimensional two-parameter model for binary data is implemented, and the MCMC algorithm is compared with the EM algorithm through an application in educational assessment. The second work extends Albert's approach to the multidimensional case. Other item response applications of MCMC can be found in Fox and Glas (2001) and Patz and Junker (1999a,b).
As previously highlighted, from a Bayesian point of view the main purpose of the researcher is to analyze the properties of the posterior distribution p(λ|y) which, as we can see from (3.1), is proportional to the product of the likelihood function and the prior distribution (recall that λ is the vector of the unknown parameters of interest and y represents the observed data).
For exposition purposes and without loss of generality, in this section we consider the simplest case, where the vector of unknown parameters reduces to a scalar λ. When the posterior distribution does not have a familiar functional form and/or direct simulation is not possible because of the complexity of the model, simulation methods based on Markov chains provide an easy way to obtain samples from the posterior density p(λ|y).
MCMC is a class of techniques developed with the final aim of reproducing a target distribution by simulating one or more sequences of correlated random variables. In our context the target distribution is the posterior density p(λ|y). The MCMC algorithm simulates a random walk in the space of the parameter λ where, at each iteration t, for t = 1, . . . , T, the value λ^[t] is drawn from a probability distribution that depends on the value of λ at the previous step, λ^[t−1]. The underlying idea is that the random walk visits the regions of the state space in proportion to their posterior probabilities so that, for a sufficiently large number of iterations, the simulated values approximate the target distribution.
MCMC methods differ from ordinary Monte Carlo methods in that the simulated values are correlated rather than statistically independent. Under suitable conditions, the generated Markov chain converges to a unique stationary distribution that corresponds to the target distribution (Gelman et al., 2003). Therefore, when reproducing the marginal posterior densities of IRT model parameters with a complex structure, this method can provide reliable results and overcomes the problem of analytically intractable distributions.
One of the key issues for all MCMC techniques is the creation of a chain long enough to approximate the target distribution. Since these are iterative methods, the time to convergence is also a relevant topic. Usually, a so-called burn-in period, consisting of a fixed number of initial iterations, is defined and excluded from the analysis.
The chain length is affected by the complexity of the posterior distribution, the initial values and the speed of convergence. Gelman et al. (2003) recommend discarding the first half of the simulated sample as burn-in. Other authors prefer to fix the number of burn-in iterations directly; for example, in one of the analyses illustrated in Béguin and Glas (2001), the burn-in period is 1000 iterations out of a run length of 30000 iterations.
From a practical point of view, we suggest monitoring the behavior of the sampled parameters through a plot over the sequence of iterations, and then deciding accordingly.
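A minimal Python sketch of the burn-in step follows. Since no real MCMC output is available here, the chain is a hypothetical autocorrelated AR(1) sequence whose stationary distribution is N(0, 1), deliberately started far from it; the illustrative helper `discard_burn_in` drops either a fixed number of iterations (as in the Béguin and Glas example) or, by default, the first half of the run (as recommended by Gelman et al.).

```python
import random
import statistics

# Illustrative sketch; helper names and the toy chain are hypothetical.
def discard_burn_in(chain, burn_in=None):
    """Drop the initial iterations of a chain; by default the first half."""
    if burn_in is None:
        burn_in = len(chain) // 2
    return chain[burn_in:]

def ar1_chain(n_iter, start, rho=0.9, seed=1):
    """Toy autocorrelated chain with stationary distribution N(0, 1)."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n_iter):
        x = rho * x + rng.gauss(0.0, (1.0 - rho**2) ** 0.5)
        out.append(x)
    return out

chain = ar1_chain(30000, start=20.0)            # deliberately poor starting value
fixed = discard_burn_in(chain, burn_in=1000)    # fixed burn-in of 1000 iterations
half = discard_burn_in(chain)                   # first half discarded
for kept in (fixed, half):
    print(len(kept), round(statistics.mean(kept), 3), round(statistics.stdev(kept), 3))
```

The early draws sit near the starting value of 20 and would bias the summaries if kept; plotting `chain` against the iteration index is the visual check suggested in the text.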
Another significant (but still not fully settled, as illustrated in Gilks et al. (1996)) topic concerns the number of distinct chains needed to implement the MCMC algorithm. There are mainly three different approaches. According to the first, only one long chain is created, since the longer the chain, the higher the chance of finding new modes.
The second approach is based on the creation of several fairly long chains. Its main advantage is that multiple chains allow the results to be compared, which can reveal significant differences and symptoms of non-stationarity.
The third approach, which uses many short chains, is driven by the aim of creating independent samples. However, this approach is not advisable, because chains can take a long time to reach convergence and independent samples are not required.
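When several chains are run, one common way to compare them numerically is the potential scale reduction factor R̂ of Gelman and Rubin, which contrasts between-chain and within-chain variances. It is not discussed in the text above and is shown here only as a sketch of its basic (non-split) version. The chains are hypothetical AR(1) sequences: in the first set all chains share the same target, while in the second each chain is stuck around a different value, mimicking non-stationarity.

```python
import random
import statistics

# Illustrative sketch; the toy chains are hypothetical.
def gelman_rubin(chains):
    """Basic potential scale reduction factor R-hat for m equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [statistics.mean(c) for c in chains]
    grand = statistics.mean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)      # between-chain variance
    w = statistics.mean([statistics.variance(c) for c in chains])  # within-chain variance
    var_plus = (n - 1) / n * w + b / n                             # pooled variance estimate
    return (var_plus / w) ** 0.5

def ar1(n, start, centre=0.0, rho=0.5, seed=0):
    """Toy AR(1) chain with stationary distribution N(centre, 1)."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        x = centre + rho * (x - centre) + rng.gauss(0.0, (1.0 - rho**2) ** 0.5)
        out.append(x)
    return out

starts = [-10.0, -3.0, 3.0, 10.0]                  # overdispersed starting values
# Keep the second half of each chain (first half discarded as burn-in).
mixed = [ar1(2000, s, seed=k)[1000:] for k, s in enumerate(starts)]
stuck = [ar1(2000, s, centre=s / 2, seed=k)[1000:] for k, s in enumerate(starts)]
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Values close to 1 suggest that the chains agree; clearly larger values signal that they have not yet converged to the same distribution.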
Several MCMC algorithms exist, depending on the features of the problem and on the specific attributes of the Markov chains. Each MCMC algorithm defines a transition distribution p(λ^[t]|λ^[t−1]), representing the probability that a parameter, say λ, moves from one state to the next, starting from suitable initial values λ^[0].
Detailed treatments of MCMC can be found in Gelman et al. (2003), Gamerman (1997) and Gilks et al. (1996).
3.2.1 Metropolis-Hastings algorithm
The Metropolis-Hastings (M-H) algorithm (Hastings, 1970) is one of the most popular MCMC methods, and it can be directly implemented in a Bayesian framework. Our aim is to generate a sample of size T from the target distribution, represented in our context by the posterior distribution p(λ|y). The M-H algorithm can be summarized as follows (Ntzoufras, 2011):
1. Set initial values λ^[0];
2. Then repeat the following steps for t = 1, . . . , T:
(i) Set λ = λ^[t−1];
(ii) Generate a new candidate parameter value λ′ from a proposal (jumping) distribution q(λ′|λ);
(iii) Calculate
$$\alpha = \min\left(1, \frac{p(\lambda' \mid y)\, q(\lambda \mid \lambda')}{p(\lambda \mid y)\, q(\lambda' \mid \lambda)}\right);$$
(iv) Update λ^[t] = λ′ with probability α; otherwise set λ^[t] = λ.
Let us focus on the case where λ is a vector of parameters that can take only continuous values.
According to step 1, suitable starting values have been provided. Suppose the chain is in state λ^[s−1]. In step (ii) of the algorithm, a new candidate λ′ is sampled from a proposal distribution q(λ′|λ^[s−1]). The proposal distribution is also called the jumping distribution, to emphasize the movement from the current value of the chain to the next one. It is also possible to define the probability of jumping in the opposite direction, i.e. from λ′ to λ^[s−1], that is q(λ^[s−1]|λ′). Although only symmetric proposals were considered in the original Metropolis algorithm (Metropolis et al., 1953), this property is not required in more recent versions of the algorithm (Ntzoufras, 2011).
Furthermore, the proposal q(·) should be defined properly. In fact, the resulting chain needs to satisfy some specific properties, namely irreducibility, aperiodicity and recurrence (non-transience). A chain is irreducible if it is possible to move from any state to any other state in a finite number of steps with positive probability, aperiodic if it does not cycle among states with a fixed period (i.e. returns to a state are not restricted to multiples of some period greater than one), and not transient if all the states are recurrent (i.e. the probability of eventually returning to a state, starting from that same state, is equal to one). Moreover, the proposal ratio
$$\frac{q(\lambda' \mid \lambda^{[s-1]})}{q(\lambda^{[s-1]} \mid \lambda')}$$
must be strictly positive for every value of λ such that both the numerator and the denominator are nonzero.
In step (iii) the acceptance probability α = min(1, r) is computed, where
$$r = \frac{p(\lambda' \mid y)\, q(\lambda^{[s-1]} \mid \lambda')}{p(\lambda^{[s-1]} \mid y)\, q(\lambda' \mid \lambda^{[s-1]})}.$$
The higher α is, the more likely the candidate value λ′ is to be accepted. The quantity r consists of two components: the ratio of the posterior densities, which drives the algorithm towards the λ-value with higher posterior density, and the ratio of the proposal densities, which also influences the direction of the move towards one or the other λ-value.
Step (iv) of the M-H algorithm concerns the acceptance or rejection of the candidate value λ′. To make this choice, we draw a random number u from the uniform distribution on the interval [0, 1]. Then we set:
$$\lambda^{[s]} = \begin{cases} \lambda', & \text{if } u < \alpha,\\ \lambda^{[s-1]}, & \text{if } u \ge \alpha. \end{cases} \qquad (3.7)$$
Thus, the candidate value λ′ is accepted with probability α and rejected if u ≥ α. In both cases (acceptance or rejection) the iterations progress and the algorithm proceeds to generate the next value.
The M-H algorithm can also be applied to discrete-valued parameters, in which case the proposal distribution q(·) becomes the probability mass function used to generate candidate points.
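The steps of the M-H algorithm can be sketched in Python for a scalar parameter. The sketch assumes a random-walk normal proposal, which is symmetric, so the proposal ratio in α cancels; the acceptance test of (3.7) is carried out on the log scale to avoid numerical underflow. The target in the usage example, a N(1, 0.5²) density known only up to a constant, is a hypothetical stand-in for p(λ|y), and the function names are illustrative.

```python
import math
import random
import statistics

# Illustrative sketch; function names and the target are hypothetical.
def metropolis_hastings(log_post, start, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    log_post -- log of the (possibly unnormalized) posterior density p(lambda | y)
    start    -- initial value lambda^[0]
    step     -- standard deviation of the normal random-walk proposal
    Returns the list of draws lambda^[1..T] and the acceptance rate.
    """
    rng = random.Random(seed)
    lam = start                                    # step 1: initial value
    draws, accepted = [], 0
    for _ in range(n_iter):                        # step 2: t = 1, ..., T
        cand = rng.gauss(lam, step)                # (ii) candidate from q(. | lam)
        # (iii) the proposal is symmetric, so the q-terms cancel and
        # log r = log p(cand | y) - log p(lam | y)
        log_r = log_post(cand) - log_post(lam)
        u = 1.0 - rng.random()                     # uniform on (0, 1]
        if math.log(u) < min(0.0, log_r):          # (iv) accept if u < alpha
            lam = cand
            accepted += 1
        draws.append(lam)                          # lambda^[t] (old value if rejected)
    return draws, accepted / n_iter

def log_target(x):
    """Hypothetical target: log of an unnormalized N(1, 0.5^2) density."""
    return -0.5 * ((x - 1.0) / 0.5) ** 2

draws, acc_rate = metropolis_hastings(log_target, start=5.0, n_iter=20000, step=1.0, seed=42)
kept = draws[2000:]                                # discard burn-in
print(round(acc_rate, 2), round(statistics.mean(kept), 2), round(statistics.stdev(kept), 2))
```

The step size governs the trade-off between acceptance rate and the size of the moves; tuning it is precisely the proposal-choice problem that the Gibbs sampler avoids.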
3.2.2 Gibbs sampler
The Gibbs sampler was first introduced by Geman and Geman (1984) and then formalized by Gelfand and Smith (1990). It can be obtained as a special case of the M-H algorithm by using as proposal distribution the so-called full conditional posterior distribution:
$$p(\lambda_j \mid \lambda_1, \dots, \lambda_{j-1}, \lambda_{j+1}, \dots, \lambda_p, y) = p(\lambda_j \mid \lambda_{*j}, y), \qquad (3.8)$$
where λ∗j denotes the vector λ without its j-th component. Such a proposal distribution implies an acceptance probability α equal to one, because the ratio r is equal to 1 (see Gelman et al., 2003). With an acceptance probability of one, at each iteration the algorithm always performs the jump provided by step (iv) of the M-H algorithm. The Gibbs sampler is based on iterative sampling from the conditional distributions resulting from the decomposition of the full posterior density.
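The claim that r = 1 can be checked with a short derivation (a standard argument, see e.g. Gelman et al., 2003). When component j is updated, the proposal leaves λ∗j unchanged and draws λ′j from the full conditional, so q(λ′|λ) = p(λ′j | λ∗j, y) and q(λ|λ′) = p(λj | λ∗j, y). Writing the posterior as p(λ|y) = p(λj | λ∗j, y) p(λ∗j | y), the M-H ratio becomes

$$r = \frac{p(\lambda' \mid y)\, q(\lambda \mid \lambda')}{p(\lambda \mid y)\, q(\lambda' \mid \lambda)} = \frac{p(\lambda'_j \mid \lambda_{*j}, y)\, p(\lambda_{*j} \mid y)\; p(\lambda_j \mid \lambda_{*j}, y)}{p(\lambda_j \mid \lambda_{*j}, y)\, p(\lambda_{*j} \mid y)\; p(\lambda'_j \mid \lambda_{*j}, y)} = 1,$$

so that α = min(1, r) = 1 and every candidate is accepted.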
A first advantage of the Gibbs sampler is that, at every iteration, values are randomly generated from unidimensional distributions, for which a wide variety of computational tools exists (Gilks et al., 1996). Another important advantage is that it does not require the specification of a proposal distribution. This is a key point, because a poor choice of the proposal q(·) in the M-H algorithm may lead to a very slow algorithm.
Thus, if it is difficult to sample from a complex and/or highly parameterized posterior distribution and the vector of parameters can be decomposed, we can generate the parameter values sequentially from the single conditional distributions.
Suppose that we are interested in producing a sample of size T from the target distribution, represented here by the posterior distribution p(λ|y), where λ = (λ1, . . . , λp). The Gibbs sampler can be described by the following steps (Ntzoufras, 2011):
1. Set initial values λ^[0];
2. Then repeat the following steps for t = 1, . . . , T:
(i) Set λ = λ^[t−1];
(ii) For j = 1, . . . , p, update λj by drawing λj ∼ p(λj | λ∗j, y);
(iii) Set λ^[t] = λ and save it as the set of values generated at iteration t of the algorithm.
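Hence, given the current state of the chain λ^[t−1], the new parameter values are generated as follows:
$$\begin{aligned}
\lambda_1^{[t]} &\ \text{from}\ p(\lambda_1 \mid \lambda_2^{[t-1]}, \lambda_3^{[t-1]}, \lambda_4^{[t-1]}, \dots, \lambda_p^{[t-1]}, y),\\
\lambda_2^{[t]} &\ \text{from}\ p(\lambda_2 \mid \lambda_1^{[t]}, \lambda_3^{[t-1]}, \lambda_4^{[t-1]}, \dots, \lambda_p^{[t-1]}, y),\\
\lambda_3^{[t]} &\ \text{from}\ p(\lambda_3 \mid \lambda_1^{[t]}, \lambda_2^{[t]}, \lambda_4^{[t-1]}, \dots, \lambda_p^{[t-1]}, y),\\
&\ \ \vdots\\
\lambda_p^{[t]} &\ \text{from}\ p(\lambda_p \mid \lambda_1^{[t]}, \lambda_2^{[t]}, \lambda_3^{[t]}, \dots, \lambda_{p-1}^{[t]}, y).
\end{aligned}$$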
Generating values from the single conditional distributions is relatively easy, since they are univariate distributions. Moreover, under appropriate regularity conditions, the distribution of λ^[t] converges to the target distribution. Usually this convergence is fast, and the sequence λ^[t] obtained after convergence can be considered as a simulated sample from the distribution of interest (Matteucci, 2007).
For a more detailed exposition of the Gibbs sampler, see Gamerman (1997)
and Gelman et al. (2003), or Gelfand and Smith (1990) for early presentations of
this widely used MCMC algorithm.
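The following Python sketch implements the scheme above for a hypothetical two-parameter target: a standard bivariate normal with correlation ρ, chosen because both full conditionals are known univariate normals, λ1 | λ2 ∼ N(ρλ2, 1 − ρ²) and λ2 | λ1 ∼ N(ρλ1, 1 − ρ²). It is a toy example with illustrative names, not the sampler that OpenBUGS builds for the MIRT models.

```python
import random
import statistics

# Illustrative sketch; the target and function name are hypothetical.
def gibbs_bivariate_normal(rho, n_iter, start=(0.0, 0.0), seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each iteration draws lambda_1 from p(lambda_1 | lambda_2) and then
    lambda_2 from p(lambda_2 | lambda_1), using the value just updated.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho**2) ** 0.5
    l1, l2 = start                                # lambda^[0]
    draws = []
    for _ in range(n_iter):
        l1 = rng.gauss(rho * l2, cond_sd)         # lambda_1^[t] given lambda_2^[t-1]
        l2 = rng.gauss(rho * l1, cond_sd)         # lambda_2^[t] given lambda_1^[t]
        draws.append((l1, l2))                    # lambda^[t]
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000, start=(5.0, -5.0), seed=3)[1000:]
lam1 = [d[0] for d in draws]
lam2 = [d[1] for d in draws]
m1, m2 = statistics.mean(lam1), statistics.mean(lam2)
cov = sum((a - m1) * (b - m2) for a, b in zip(lam1, lam2)) / (len(lam1) - 1)
corr = cov / (statistics.stdev(lam1) * statistics.stdev(lam2))
print(round(m1, 2), round(m2, 2), round(corr, 2))
```

No proposal has to be tuned: every draw is accepted, and the sample means and correlation recover those of the target.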
3.3 Bayesian computation using OpenBUGS
In the following, the simulation study and the application to real data are performed using OpenBUGS (http://www.openbugs.net), an open-source version of the well-known software package BUGS (Bayesian inference Using Gibbs Sampling), which allows a user-friendly implementation of the Gibbs sampler.
The software package BUGS was developed within the BUGS project, which started in 1989 at the MRC Biostatistics Unit in Cambridge; the version of the resulting software developed by Spiegelhalter et al. (1996) became very popular in the 1990s. WinBUGS, the Windows-based version of BUGS, ceased to be upgraded in 2012; hence OpenBUGS, which essentially contains all the features of its predecessor WinBUGS, now represents the future of the BUGS project.
A detailed description of the software goes beyond the scope of this work; nevertheless, useful resources for understanding the theoretical foundations of BUGS and its functioning are the books by Ntzoufras (2011) and Lunn et al. (2013).
As noted by Lunn et al. (2009), there are several reasons behind the success of the BUGS software. Its most appealing features can be summarized as follows:
• Flexibility. Flexibility is quite probably the principal reason for BUGS's popularity. BUGS applies Gibbs sampling to any directed acyclic graph specified in its language; moreover, it allows the user to add new distributions and functions.
• Easy implementation. Implementing a model in BUGS is fairly simple because the package itself runs the MCMC algorithm: the user does not need to write down all the full conditional distributions. Moreover, measures, plots and statistics to check the convergence and the fit of the model are computed automatically.
Notwithstanding these advantages, the user must always be careful, because BUGS does not perform any check on model identification, so several mistakes can be made without any warning from the program. As the manual clearly remarks:
Gibbs sampling can be dangerous!
Chapter 4
MIRT graded response models with complex structures
In this chapter we specify two MIRT models for graded responses with a complex structure. After establishing a dichotomization method, we focus on models with a multiunidimensional structure, where the items in each subtest measure a single ability, and on models with an additive structure, where each item directly measures a general and a specific ability. In the MIRT models presented, all the latent traits are allowed to correlate. The main scientific contribution of this work is the multidimensional additive model for graded responses with correlated traits, estimated with MCMC techniques. Because Bayesian estimation methods are adopted, particular attention is paid to the model building phases.
4.1 MIRT graded response models (GRMs)
A multidimensional generalization of the IRT graded response model (GRM) can be obtained from its unidimensional counterpart. Let us consider: (i) n individuals; (ii) a set of p ordinal items, where the response Yij of the i-th subject to the j-th item can take values in the set {1, . . . , Kj}, so that each item has Kj − 1 thresholds κj1, . . . , κj,Kj−1 that have to satisfy the order constraint κj1 < · · · < κj,Kj−1; and (iii) the existence of multiple, say m, latent abilities θi = (θ1i, . . . , θmi)′ underlying the responses to the items.
For simplicity, in this paragraph we do not specify the number of latent dimensions, but we must always keep in mind that θi is a vector, so we are dealing with several concomitant latent dimensions. The key issue of the choice of the underlying latent structure will be examined more closely later.
The assumptions are quite similar to those of the unidimensional version of the model: it is assumed that an individual can reach a specific category level of an ordinal test item only if he/she is also able to reach all the lower categories on the same item. In other words, the item requires a number of steps, and accomplishing a step requires having achieved the previous one. This type of model is therefore appropriate for rating scales where a rating category includes all previous categories (Reckase, 2009).
The notation introduced above implies that the lowest score on item j is 1 and the highest score is Kj. The probability that the i-th examinee will select the k-th category or higher on item j is assumed to increase monotonically with an increase in any component of the θi vector, i.e. with an increase in any of the latent abilities underlying the test.
We adopt a dichotomization procedure that adapts Samejima's approach (see section 1.2.1): to make the implementation of the models clearer and easier, our models are specified in terms of the probability that an item response will fall in category k or lower, denoted by P (whereas in section 1.2.1 we used the probability that an item response will fall in category k or higher, denoted by P∗). The probability πijk that the i-th subject will select the k-th category on item j is equal to the probability of answering below the upper boundary of the category (κk) minus the probability of answering below the category's lower boundary (κk−1). Figure 4.1 illustrates the dichotomization method used. The dashed line, which represents the hypothetical response, falls in category k = 4: the probability of observing a response in that category can be easily calculated as Pi4 − Pi3.
Figure 4.1. Dichotomization used for the MIRT graded response model specification. The dashed line indicates the observed category response.
Generalizing the example presented in Figure 4.1, the probability that the i-th examinee's response will fall in the k-th category on item j can be constructed from the cumulative probabilities Pijk = P(Yij ≤ k|θi). For k = 2, . . . , Kj − 1 we obtain
$$\pi_{ijk} = P_{ijk} - P_{i,j,k-1} = P(Y_{ij} \le k \mid \theta_i) - P(Y_{ij} \le k-1 \mid \theta_i), \qquad (4.1)$$
and, in order to guarantee that the probability of every category can be determined from (4.1), it is assumed that
$$\pi_{ij1} = P_{ij1} = P(Y_{ij} \le 1 \mid \theta_i) \quad \text{and} \quad \pi_{ijK_j} = 1 - P_{i,j,K_j-1} = 1 - P(Y_{ij} \le K_j - 1 \mid \theta_i).$$
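A minimal Python sketch of (4.1) and its two boundary conditions: padding the cumulative probabilities with P(Yij ≤ 0) = 0 and P(Yij ≤ Kj) = 1 turns all three cases into a single difference. The function name and the numbers are hypothetical; the fourth entry of the result reproduces the Pi4 − Pi3 computation of Figure 4.1.

```python
# Illustrative sketch; the function name and values are hypothetical.
def category_probs(cum):
    """Category probabilities pi_1, ..., pi_K from the cumulative probabilities
    P(Y <= 1), ..., P(Y <= K - 1), following eq. (4.1) and its boundary cases."""
    bounded = [0.0] + list(cum) + [1.0]   # P(Y <= 0) = 0 and P(Y <= K) = 1
    probs = [bounded[k] - bounded[k - 1] for k in range(1, len(bounded))]
    if any(p < 0 for p in probs):
        raise ValueError("cumulative probabilities must be non-decreasing in k")
    return probs

# Hypothetical cumulative probabilities for an item with K = 5 categories.
cum = [0.10, 0.25, 0.55, 0.80]
probs = category_probs(cum)
print([round(p, 2) for p in probs])
```

The check on negative values mirrors the order constraint on the thresholds: if the cumulative probabilities were not non-decreasing, some category would receive a negative probability.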
A normal ogive or a logistic formulation of the model can be obtained from expressions (1.2) and (1.3), but a preliminary step is needed to obtain an expression for the predictor ηij. In the multidimensional case the predictor becomes a function of the vector of person parameters θi and of the vector of item parameters ξj, ηij = f(θi, ξj). In particular, to obtain an explicit formulation of the predictor we need to make some assumptions reflecting the hypothesized underlying latent structure. Among the different underlying latent structures that can be assumed (see Paragraph 2.1.3), in this thesis we focus on:
• models with a multiunidimensional structure, where items in each subtest measure a single ability;
• models with an additive structure, where each item directly measures a general and a specific ability.
As previously mentioned, the choice of these two latent structures has been driven by the fact that the first one is widely used and represents a classical approach in MIRT analysis, while the second one is able to reflect the complexity of real interactions between items and individuals.
4.1.1 Specification of the multiunidimensional GRM
As previously mentioned, according to the multiunidimensional structure, each individual i is assumed to be characterized by a vector of latent traits θi = (θ1i, . . . , θmi), where each latent dimension is measured by a specific set of test items. Thus, considering a test consisting of p items, the test is structured into m subtests indexed by ν, each one composed of pν items that measure one latent trait. The cumulative probability that the individual i will select the k-th category or lower on item j belonging to the ν-th subtest is given by:
$$P_{\nu ijk} = P(Y_{\nu ij} \le k \mid \theta_{\nu i}, \alpha_{\nu j}, \kappa_{jk}) = \Phi(\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}) = \int_{-\infty}^{\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \qquad (4.2)$$
where ανj and κjk are item parameters representing the item discrimination and the threshold between categories k and k + 1, respectively. The parameter θνi represents the ability of the i-th examinee in the ν-th ability dimension. We can observe that the predictor ηνij = f(θi, ξj) takes the form ηνij = κjk − ανj θνi.
The multiunidimensional model for graded responses can be specified in a normal ogive formulation (i) by considering the cumulative probabilities obtained from (4.2) and (ii) by applying the dichotomization procedure represented in Figure 4.1, according to which the probability πνijk that the i-th examinee will select the k-th category on item j in subtest ν is:
$$\pi_{\nu ijk} = \begin{cases} P_{\nu ij1} & \text{for } k = 1,\\ P_{\nu ijk} - P_{\nu,i,j,k-1} & \text{for } k = 2, \dots, K_j - 1,\\ 1 - P_{\nu,i,j,K_j-1} & \text{for } k = K_j. \end{cases} \qquad (4.3)$$
Note that in (4.2) only one specific ability affects the response to a given item. This structure resembles the unidimensional version of the GRM: we can imagine fitting a sequence of unidimensional models, one for each subtest. Nevertheless, a relevant difference is that distinct latent traits are now allowed to correlate.
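Equations (4.2) and (4.3) can be combined in a short Python sketch that computes the category probabilities of one item for one examinee. The function names and parameter values are hypothetical, subtests are indexed from 0 for convenience, and Φ is computed from `math.erf`. The usage example also shows that, under the multiunidimensional structure, the abilities of other subtests do not affect the item.

```python
import math

# Illustrative sketch; function names and parameter values are hypothetical.
def phi(x):
    """Standard normal CDF, computed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def grm_multiuni_probs(theta, subtest, alpha, kappa):
    """Category probabilities pi_{nu i j k}, k = 1..K_j, under eqs. (4.2)-(4.3).

    theta   -- abilities (theta_1i, ..., theta_mi) of examinee i
    subtest -- 0-based index of the subtest nu containing item j
    alpha   -- discrimination alpha_{nu j}
    kappa   -- ordered thresholds kappa_{j1} < ... < kappa_{j,K_j - 1}
    """
    cum = [phi(k - alpha * theta[subtest]) for k in kappa]  # P(Y <= k), eq. (4.2)
    bounded = [0.0] + cum + [1.0]                            # add P(Y <= 0), P(Y <= K_j)
    return [bounded[k] - bounded[k - 1] for k in range(1, len(bounded))]  # eq. (4.3)

kappa = [-1.0, 0.0, 1.0]                                   # K_j = 4 categories
p_low = grm_multiuni_probs([0.5, -1.0], 0, 1.2, kappa)
p_high = grm_multiuni_probs([2.0, -1.0], 0, 1.2, kappa)    # higher theta_1
p_other = grm_multiuni_probs([0.5, 3.0], 0, 1.2, kappa)    # only theta_2 changes
print([round(p, 3) for p in p_low])
```

Raising θ1 shifts probability towards the highest category, while changing θ2 leaves the item's probabilities untouched.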
4.1.2 Specification of the additive GRM
A relevant aim of this work is to propose a new additive model for ordinal data, estimated by Bayesian MCMC techniques, in which the general and specific latent traits are allowed to correlate. In this section, we provide the simple but very effective specification of the additive GRM.
Let us consider again a test consisting of p items and structured into m subtests, each one composed of pν items (ν = 1, . . . , m). The responses to items belonging to a specific subtest are assumed to be influenced by a specific ability and a general ability, according to the underlying latent structure illustrated in Figure 2.5. The cumulative probability that the individual i will select the k-th category or lower on item j belonging to the ν-th subtest is given by:
$$P_{\nu ijk} = P(Y_{\nu ij} \le k \mid \theta_{0i}, \theta_{\nu i}, \alpha_{0\nu j}, \alpha_{\nu j}, \kappa_{jk}) = \Phi(\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i}) = \int_{-\infty}^{\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. \qquad (4.4)$$
Here, θ0i represents the overall ability of the i-th examinee and θνi represents the specific abilities (with ν = 1, . . . , m). For each item j of subtest ν, α0νj reflects the item discrimination with respect to the overall ability, ανj reflects the item discrimination with respect to the specific ability, and κjk is an item parameter that reflects the threshold between categories k and k + 1. The predictor ηνij now depends on both the specific and the general latent traits: ηνij = κjk − α0νj θ0i − ανj θνi.
The probability πνijk that the i-th examinee will select the k-th category on item j in subtest ν is then obtained from (4.3), as in the multiunidimensional GRM.
Note that both the general and the specific abilities are involved in determining the response probability, following a compensatory approach. Finally, all the latent traits underlying the item responses are allowed to correlate.
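The additive specification (4.4) differs from (4.2) only in the predictor. In the Python sketch below (hypothetical function names and parameter values, Φ via `math.erf`), the compensatory nature of the model is visible: when α0νj = ανj, swapping the general and the specific ability leaves the predictor, and therefore every category probability, unchanged. With unequal discriminations the trade-off is weighted by the two α parameters.

```python
import math

# Illustrative sketch; function names and parameter values are hypothetical.
def phi(x):
    """Standard normal CDF, computed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def grm_additive_probs(theta0, theta_nu, alpha0, alpha_nu, kappa):
    """Category probabilities for one item under the additive GRM, eq. (4.4) with (4.3)."""
    # Cumulative probabilities P(Y <= k) = Phi(kappa_jk - alpha0*theta0 - alpha_nu*theta_nu)
    cum = [phi(k - alpha0 * theta0 - alpha_nu * theta_nu) for k in kappa]
    bounded = [0.0] + cum + [1.0]
    return [bounded[k] - bounded[k - 1] for k in range(1, len(bounded))]

kappa = [-1.0, 0.0, 1.0]
# Equal discriminations: high general / low specific ability ...
a = grm_additive_probs(theta0=1.5, theta_nu=-0.5, alpha0=0.8, alpha_nu=0.8, kappa=kappa)
# ... gives the same probabilities as low general / high specific ability.
b = grm_additive_probs(theta0=-0.5, theta_nu=1.5, alpha0=0.8, alpha_nu=0.8, kappa=kappa)
print([round(p, 3) for p in a])
print([round(p, 3) for p in b])
```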
4.2 Person and item parameters: interpretation
The aim of this section is to briefly illustrate the meaning of the parameters introduced in the MIRT models for graded responses described in paragraphs 4.1.1 and 4.1.2. The contents of this section are particularly helpful for practical applications.
4.2.1 Ability parameters
The presence of more than one latent trait affecting the response process to a test is the basis for the use of multidimensional item response theory models. The vector θi of latent space parameters for person i contains all the information about the measurement of these latent abilities. Higher levels of ability lead to higher values of the elements of θi.
Of course, the composition, and consequently the dimension, of the vector θi depend on the underlying structure we assume. As mentioned several times, under a multiunidimensional structure the vector of person parameters has the form θi = (θ1i, . . . , θmi), while in an additive context a parameter reflecting the general ability is added, so that θi = (θ0i, θ1i, . . . , θmi), where m still denotes the number of specific abilities. A deficit in one specific dimension is compensated by the general dimension, and vice versa.
4.2.2 Multidimensional item discrimination
Turning to the meaning of the item discrimination parameters, when considered individually, ανj reflects the capability of a generic item j to discriminate between individuals with different levels of ability θν, for both the multiunidimensional and the additive models. Analogously, α0νj reflects the capability of item j to differentiate individuals with different levels of general ability θ0.
Muraki and Carlson (1995) and Yao and Schwarz (2006) define the multidimensional item discrimination (MDISC) as the maximum discrimination of a test item in a particular direction of the latent space.
Hence, considering the multiunidimensional and additive latent structures assumed for the MIRT models presented in this work, we can define two MDISC measures. The first one (MDISC) is defined with reference to each one of the latent dimensions $\nu = 1, \ldots, m$. For $j = 1, \ldots, p$ it is expressed as:

$$\mathrm{MDISC}_j = \left( \sum_{\nu=1}^{m} \alpha_{\nu j}^2 \right)^{1/2}. \qquad (4.5)$$

The second one (MDISC$^*$) includes a further dimension, represented by the general ability. For $j = 1, \ldots, p$, MDISC$^*$ is expressed by:

$$\mathrm{MDISC}^*_j = \left( \sum_{\nu=1}^{m} \left( \alpha_{\nu j}^2 + \alpha_{0\nu j}^2 \right) \right)^{1/2}. \qquad (4.6)$$

With reference to a given item, the higher the value of MDISC (MDISC$^*$), the greater the discrimination power of that item, independently of the assumed underlying latent structure.
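As a numerical illustration of (4.5) and (4.6), the sketch below computes both measures for one item. It assumes, as in the multiunidimensional and additive structures, that item $j$ loads on a single specific ability, so that $\alpha_{\nu j}$ and $\alpha_{0\nu j}$ are zero for the other subtests. The function names and the discrimination values are hypothetical.

```python
from math import sqrt


def mdisc(alpha_specific):
    """MDISC_j, eq. (4.5): Euclidean norm of alpha_{1j}, ..., alpha_{mj}."""
    return sqrt(sum(a * a for a in alpha_specific))


def mdisc_star(alpha_specific, alpha_general):
    """MDISC*_j, eq. (4.6): also includes the general-ability
    discriminations alpha_{0 nu j}, nu = 1, ..., m."""
    return sqrt(sum(a * a + a0 * a0
                    for a, a0 in zip(alpha_specific, alpha_general)))


# Hypothetical item of subtest 1 in a test with m = 2 subtests:
# it loads on theta_1 (and, in the additive model, on theta_0) only.
alpha_spec = [0.75, 0.0]  # alpha_{1j}, alpha_{2j}
alpha_gen = [1.0, 0.0]    # alpha_{01j}, alpha_{02j}

print(mdisc(alpha_spec))                  # 0.75
print(mdisc_star(alpha_spec, alpha_gen))  # 1.25
```

With a single non-zero loading, MDISC reduces to $\alpha_{\nu j}$ itself and MDISC$^*$ to $\sqrt{\alpha_{\nu j}^2 + \alpha_{0\nu j}^2}$.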
4.3 Multiunidimensional GRM implementation
In order to implement the multiunidimensional model for graded responses in OpenBUGS, the first step that we have to face is the so-called model building phase (Ntzoufras, 2011). We can summarize this phase through several sub-steps, namely:
1. identify the main variable of interest and the corresponding (observed) data;
2. build a structure for the parameters of the distribution;
3. specify the prior distributions;
4. find a distribution that adequately describes the observed data and formulate the likelihood of the model.
Considering that our observed variables of interest (point 1) are the responses given by a group of examinees to a test consisting of graded response items, in this section we will define all the elements listed above for the model characterized by a multiunidimensional latent ability structure, i.e. according to the probability function defined in (4.2).
4.3.1 Model specification
The probability model is specified according to the multiunidimensional structure (point 2). Recalling the expression in (4.2), a generic cumulative probability $P_{\nu ijk}$ is a function of the item discrimination parameter ($\alpha_{\nu j}$), the threshold parameter between categories $k$ and $k+1$ ($\kappa_{jk}$), and the specific ability measured by the $j$-th item ($\theta_{\nu i}$). Thus, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$, it holds that

$$P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}).$$

As previously described, we set $P_{\nu ijK_j} = 1$, and we obtain by difference the probability that the response of individual $i$ to item $j$ will fall in category $k$: $\pi_{\nu ij1} = P_{\nu ij1}$ and $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 2, \ldots, K_j$.
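The two steps above (cumulative probabilities through $\Phi$, then differences) can be sketched in a few lines of Python. This is a minimal illustration, not the OpenBUGS code of Appendix A: $\Phi$ is computed from `math.erf`, and the function name and parameter values are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def grm_category_probs(kappa, alpha, theta):
    """Category probabilities pi_1, ..., pi_K of the multiunidimensional GRM.

    kappa: ordered thresholds kappa_{j1} < ... < kappa_{j,K-1} of item j
    alpha: discrimination alpha_{nu j} of item j
    theta: ability theta_{nu i} measured by the subtest containing item j
    """
    # P_{nu i j k} = Phi(kappa_jk - alpha * theta) for k < K, and P_{nu i j K} = 1.
    cum = [std_normal_cdf(k - alpha * theta) for k in kappa] + [1.0]
    # pi_1 = P_1 and pi_k = P_k - P_{k-1} for k = 2, ..., K.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical 3-category item (two thresholds) and one examinee.
probs = grm_category_probs(kappa=[-0.5, 0.7], alpha=1.2, theta=0.3)
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, every difference is non-negative and the probabilities sum to one; since $P_{\nu ijk}$ decreases in $\theta_{\nu i}$, higher abilities shift the probability mass towards higher categories.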
The model parameters, which in a Bayesian context are treated as proper random variables and therefore need prior distributions, are the person parameters $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ and the item parameters $\alpha_{\nu j}$ and $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$.
4.3.2 Prior distributions
Getting on to point 3 of the model building phase, in the multiunidimensional GRM we assume that the latent traits $\theta_1, \ldots, \theta_n$ are independent and multivariate normally distributed:

$$\theta_i \sim N_m(\mu, \Sigma),$$

where $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ is the vector of latent traits for examinee $i$, $\mu$ is the $m$-dimensional mean vector and $\Sigma$ is the $m \times m$ constrained variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to the ability correlations. Thus, for $i = 1, \ldots, n$, the prior distribution for $\theta_i$ is defined as:

$$p(\theta_i) = \frac{1}{\sqrt{(2\pi)^m |\Sigma|}} \exp\left\{ -\frac{1}{2} (\theta_i - \mu)' \Sigma^{-1} (\theta_i - \mu) \right\}, \qquad (4.7)$$

where $m$ is the number of specific latent traits (subtests).
Moreover, normal distributions are assumed for the item discrimination parameters, that is $\alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2)$, for $\nu = 1, \ldots, m$ and $j = 1, \ldots, p$. In addition, since the parameter that reflects the power of the item to discriminate between examinees is assumed to be positive, the truncated version of the normal distribution is used:

$$\alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2)\, I(\alpha_{\nu j} > 0),$$

where $I$ denotes the indicator function.
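A draw from this truncated prior can be illustrated with simple rejection sampling: sample from the untruncated normal and keep only positive values. This is a generic sketch, not necessarily how OpenBUGS samples internally; the function name is hypothetical, and with $\mu_\alpha = 0$ roughly half of the proposals are accepted.

```python
import random


def positive_truncated_normal(mu, sigma, rng):
    """One draw from N(mu, sigma^2) I(x > 0) by rejection sampling."""
    while True:
        x = rng.gauss(mu, sigma)
        if x > 0:
            return x


rng = random.Random(2014)  # fixed seed for reproducibility
draws = [positive_truncated_normal(0.0, 1.0, rng) for _ in range(10_000)]
# With mu = 0 and sigma = 1 this is a half-normal, whose mean is sqrt(2/pi), about 0.798.
print(min(draws) > 0, sum(draws) / len(draws))
```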
The priors for the threshold parameters must account for the order constraint $\kappa_{j1} < \cdots < \kappa_{j,K_j-1}$; hence we first introduce unconstrained auxiliary parameters $\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}$ such that $\kappa^*_{jk} \sim N(\mu_\kappa, \sigma_\kappa^2)$ for $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$ (Curtis, 2010). Then, prior distributions on the thresholds for the $j$-th item can be obtained by considering the order statistics of the auxiliary variables:

$$\kappa_{j1} = \kappa^*_{j,[1]}, \quad \kappa_{j2} = \kappa^*_{j,[2]}, \quad \ldots, \quad \kappa_{j,K_j-1} = \kappa^*_{j,[K_j-1]},$$

where $\kappa^*_{j,[s]}$ denotes the $s$-th order statistic of $\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}$. As reported in Curtis (2010), this approach is also recommended by Plummer (2010).
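In code, the order-statistic construction amounts to sorting independent normal draws. The sketch below (hypothetical function name, standard normal hyperparameters as in the identification constraints described next) produces one prior draw of the $K_j - 1$ thresholds of an item; the OpenBUGS code actually used in this work is the one reported in Appendix A.

```python
import random


def ordered_thresholds(n_thresholds, mu, sigma, rng):
    """One prior draw of kappa_{j1} < ... < kappa_{j,K-1}.

    Draw unconstrained kappa*_{jk} ~ N(mu, sigma^2), k = 1, ..., K-1, and take
    their order statistics: kappa_{jk} = kappa*_{j,[k]}.
    """
    auxiliary = [rng.gauss(mu, sigma) for _ in range(n_thresholds)]
    return sorted(auxiliary)


rng = random.Random(1)
kappa = ordered_thresholds(3, 0.0, 1.0, rng)  # K_j = 4 categories -> 3 thresholds
print(kappa)
```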
Identification issues
Particular attention should be paid to the restrictions that have to be imposed on the hyperparameters in order to ensure model identification. In general, Bayesian item response models can be identified (Fox, 2010) by imposing restrictions on the hyperparameters or via a (standard) scale transformation in the estimation procedure. Following the first approach, for identification purposes we set $\mu = 0$, $\mu_\alpha = 0$, $\mu_\kappa = 0$, $\sigma_\alpha^2 = 1$ and $\sigma_\kappa^2 = 1$. Moreover, a multivariate normal prior distribution with a fixed correlation structure is assumed for the abilities: $\theta_i \sim N_m(0, \Sigma)$, for $i = 1, \ldots, n$, where $\Sigma$ is the variance-covariance matrix defined before.
Even if this choice can be viewed as very restrictive, it reflects common beliefs and the usual assumptions found in the literature. Moreover, a strength of the Bayesian approach is the possibility of formulating prior distributions that depend on the information available a priori.
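To illustrate the ability prior $\theta_i \sim N_m(0, \Sigma)$ with a fixed correlation matrix, the sketch below draws abilities through a Cholesky factor $L$ of $\Sigma$, i.e. $\theta_i = L z_i$ with $z_i$ standard normal. It is a plain-Python sketch with hypothetical function names; the $2 \times 2$ matrix with correlation 0.4 corresponds to the $\Sigma_B$ used in the simulation study of Chapter 5.

```python
import random
from math import sqrt


def cholesky(sigma):
    """Lower-triangular L such that L L' = sigma (sigma must be positive definite)."""
    m = len(sigma)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0:
                    raise ValueError("sigma is not positive definite")
                L[i][j] = sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_abilities(n, sigma, rng):
    """Draw theta_1, ..., theta_n independently from N_m(0, sigma)."""
    L = cholesky(sigma)
    m = len(sigma)
    thetas = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # independent N(0, 1)
        # theta = L z has covariance L L' = sigma.
        thetas.append([sum(L[r][c] * z[c] for c in range(r + 1)) for r in range(m)])
    return thetas


# Unit variances and correlation 0.4 between the two specific abilities.
sigma_b = [[1.0, 0.4], [0.4, 1.0]]
theta = sample_abilities(5000, sigma_b, random.Random(26))
```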
4.3.3 Likelihood function for responses
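A categorical (or generalized Bernoulli) distribution with parameters $\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}$ is assumed for the responses (point 4 of the model building phase); thus, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $i = 1, \ldots, n$, it holds that:

$$Y_{ij} \mid \bullet \sim \mathrm{Cat}(\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}), \qquad (4.8)$$

therefore:

$$P(Y_{ij} = k \mid \bullet) = \pi_{\nu ij1}^{[k=1]} \cdot \pi_{\nu ij2}^{[k=2]} \cdot \ldots \cdot \pi_{\nu ijK_j}^{[k=K_j]}, \qquad (4.9)$$

where $[k = h]$ equals 1 if $k = h$ and 0 otherwise.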
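Because each bracket exponent in (4.9) equals 1 only for the observed category, the log-likelihood contribution of a response is simply $\log \pi_{\nu ijk}$ at the observed $k$. The short sketch below sums these contributions over a list of responses; the function name and the probabilities are hypothetical, and categories are coded $1, \ldots, K_j$ as in the text.

```python
from math import log


def categorical_loglik(responses, probs):
    """Sum over responses of log P(Y = y), eq. (4.9).

    responses: observed categories, coded 1, ..., K
    probs: for each response, its category probabilities [pi_1, ..., pi_K]
    """
    # The product in (4.9) keeps only the factor whose exponent [k = y] is 1.
    return sum(log(p[y - 1]) for y, p in zip(responses, probs))


# Two hypothetical responses to 3-category items.
ll = categorical_loglik([1, 3], [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]])
print(ll)  # log(0.2) + log(0.6), i.e. log(0.12) up to rounding
```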
Once the likelihood function for the observed data is defined, the model is fully specified and we can perform the Bayesian estimation of the parameters of interest through a straightforward implementation in OpenBUGS, which runs the Gibbs sampler algorithm. The main advantage of this approach is that, although the joint posterior distribution has an intractable form, the full conditional distributions are well defined and can be sampled from. In fact:

$$P(\theta, \alpha, \kappa, \Sigma \mid Y) \propto L(Y \mid \theta, \alpha, \kappa, \Sigma)\, P(\theta \mid \Sigma)\, P(\alpha)\, P(\kappa). \qquad (4.10)$$
Expression (4.10) represents the joint posterior distribution of interest, where $L$ is the likelihood function and $\theta$, $\alpha$ and $\kappa$ are assumed to be independent a priori.
Details about the code used to implement the model in OpenBUGS are reported in Appendix A.
4.4 Additive GRM implementation
As for the multiunidimensional model, the implementation of the additive GRM in OpenBUGS requires the specification of the model according to the probability function defined in (4.4), the definition of the prior distributions, and the formulation of the likelihood function for the observed responses.
4.4.1 Model specification
The existence of a general ability in addition to the specific abilities implies the introduction of the further component $\theta_{0i}$ in the vector of person parameters for individual $i$: $\theta_i = (\theta_{0i}, \theta_{1i}, \ldots, \theta_{mi})$; therefore the dimension of this vector is now $m + 1$.
According to the additive structure presented in Figure 2.5, where each item directly measures an overall and a specific ability, as translated in expression (4.4), a generic cumulative probability $P_{\nu ijk}$ is a function of the item discrimination parameter related to the general ability ($\alpha_{0\nu j}$), the item discrimination parameter related to the specific ability ($\alpha_{\nu j}$), the threshold parameter between categories $k$ and $k+1$ ($\kappa_{jk}$), the general ability of the individual ($\theta_{0i}$), and the specific ability ($\theta_{\nu i}$) measured by the $j$-th item. We recall that each item belonging to a given subtest measures the general ability and only one specific ability. Hence, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$, it holds that

$$P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i}).$$

Again, we set $P_{\nu ijK_j} = 1$, and the probability that the response of individual $i$ to item $j$ will fall in category $k$ can be obtained by difference, as in the multiunidimensional case: $\pi_{\nu ij1} = P_{\nu ij1}$ and $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 2, \ldots, K_j$.
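The additive version changes only the linear predictor. The sketch below (hypothetical names and values) computes the category probabilities and also illustrates the compensatory mechanism described in Section 4.2.1: with equal discriminations, an examinee who is weak on $\theta_0$ but strong on $\theta_\nu$ obtains exactly the same probabilities as one with the opposite profile.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def additive_grm_probs(kappa, a0, a_nu, theta0, theta_nu):
    """Category probabilities pi_1, ..., pi_K of the additive GRM.

    P_k = Phi(kappa_k - a0 * theta0 - a_nu * theta_nu) for k < K, P_K = 1,
    pi_1 = P_1 and pi_k = P_k - P_{k-1} for k = 2, ..., K.
    """
    shift = a0 * theta0 + a_nu * theta_nu
    cum = [std_normal_cdf(k - shift) for k in kappa] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


kappa = [-1.0, 0.0, 1.2]  # hypothetical thresholds of a 4-category item

# Compensation: swapping the general and specific ability levels leaves
# a0 * theta0 + a_nu * theta_nu, and hence the probabilities, unchanged.
weak_general = additive_grm_probs(kappa, 0.5, 0.5, theta0=-1.0, theta_nu=2.0)
weak_specific = additive_grm_probs(kappa, 0.5, 0.5, theta0=2.0, theta_nu=-1.0)
print(weak_general == weak_specific)  # True
```

Setting `a0 = 0` removes $\theta_0$ from the predictor and gives back the multiunidimensional probabilities of (4.2).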
4.4.2 Prior distributions
Also in the additive GRM we assume that the latent traits $\theta_1, \ldots, \theta_n$ are independent and multivariate normally distributed:

$$\theta_i \sim N_{m+1}(\mu, \Sigma),$$

where $\mu$ is the $(m+1)$-dimensional mean vector and $\Sigma$ is the $(m+1) \times (m+1)$ constrained variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to the ability correlations.
As explained above, $\theta_{0i}$ represents the unobservable general ability of examinee $i$, which affects all the responses given by this examinee to the test items, while the specific abilities of individual $i$, $\theta_{1i}, \ldots, \theta_{mi}$, affect every item contained in the corresponding subtest $\nu$, for $\nu = 1, \ldots, m$. The prior distribution for $\theta_i$ is then defined by expression (4.7) with $m$ replaced by $m + 1$, for $i = 1, \ldots, n$, where $m$ is the number of both subtests and specific latent traits.
For identification purposes, we set $\mu = 0$ and $\Sigma$ equal to a fixed variance-covariance matrix.
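Normal distributions are assumed for the item discrimination parameters $\alpha_{0\nu j}$ and $\alpha_{\nu j}$, for $\nu = 1, \ldots, m$ and $j = 1, \ldots, p$:

$$\alpha_{0\nu j} \sim N(\mu_{\alpha_0}, \sigma_{\alpha_0}^2), \qquad \alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2),$$

and, after restricting these parameters to be positive and imposing the constraints due to identification issues (i.e. $\mu_{\alpha_0} = \mu_\alpha = 0$ and $\sigma_{\alpha_0}^2 = \sigma_\alpha^2 = 1$), we obtain truncated normal prior distributions for $\alpha_{0\nu j}$ and $\alpha_{\nu j}$:

$$\alpha_{0\nu j} \sim N(0, 1)\, I(\alpha_{0\nu j} > 0), \qquad \alpha_{\nu j} \sim N(0, 1)\, I(\alpha_{\nu j} > 0).$$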
Finally, concerning the threshold parameters, we again obtain an ordered series $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$ starting from the unconstrained variables $\kappa^*_{jk}$, distributed as $\kappa^*_{jk} \sim N(0, 1)$ (with identification constraints on the hyperparameters $\mu_\kappa = 0$ and $\sigma_\kappa^2 = 1$), and applying the transformation:

$$\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}.$$
4.4.3 Likelihood function for responses
As for the multiunidimensional GRM, a categorical distribution with parameters $\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}$ is assumed for the responses; therefore, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $i = 1, \ldots, n$, expressions (4.8) and (4.9) also hold for the additive GRM.
See Appendix A for details about the code used to implement the additive
GRM in OpenBUGS.
Summarizing, Table 4.1 reports the main characteristics of the multiunidimensional and additive models considered in this work.
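| | Multiunidimensional Model for Graded Responses | Additive Model for Graded Responses |
|---|---|---|
| Underlying latent structure | each item measures one specific ability $\theta_\nu$ | each item measures the general ability $\theta_0$ and one specific ability $\theta_\nu$ |
| Model specification | | |
| Cumulative probabilities | $P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i})$ | $P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i})$ |
| | $P_{\nu ijK_j} = 1$ | $P_{\nu ijK_j} = 1$ |
| Category probabilities | $\pi_{\nu ij1} = P_{\nu ij1}$, $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$ | $\pi_{\nu ij1} = P_{\nu ij1}$, $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$ |
| Prior distributions on person parameters | | |
| $\theta_i$-vector | $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ | $\theta_i = (\theta_{0i}, \theta_{1i}, \ldots, \theta_{mi})$ |
| Prior on $\theta_i$ | $\theta_i \sim N_m(0, \Sigma)$ | $\theta_i \sim N_{m+1}(0, \Sigma)$ |
| Prior distributions on item parameters | | |
| Item discrimination (for a specific ability) | $\alpha_{\nu j} \sim N(0, 1)\, I(\alpha_{\nu j} > 0)$ | $\alpha_{\nu j} \sim N(0, 1)\, I(\alpha_{\nu j} > 0)$ |
| Item discrimination (for the general ability) | none | $\alpha_{0\nu j} \sim N(0, 1)\, I(\alpha_{0\nu j} > 0)$ |
| Threshold parameters | $\kappa^*_{jk} \sim N(0, 1)$; $\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}$ | $\kappa^*_{jk} \sim N(0, 1)$; $\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}$ |
| Response likelihood | $Y_{ij} \mid \bullet \sim \mathrm{Cat}(\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j})$; $P(Y_{ij} = k \mid \bullet) = \pi_{\nu ij1}^{[k=1]} \cdots \pi_{\nu ijK_j}^{[k=K_j]}$ | $Y_{ij} \mid \bullet \sim \mathrm{Cat}(\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j})$; $P(Y_{ij} = k \mid \bullet) = \pi_{\nu ij1}^{[k=1]} \cdots \pi_{\nu ijK_j}^{[k=K_j]}$ |

Table 4.1. Main features of the proposed multiunidimensional and additive models for graded responses.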
Chapter 5
Simulation Study
In this chapter we present the simulation study performed to assess the item parameter recovery for both the multiunidimensional and additive GRMs. The simulation study is conducted on a bidimensional case by varying the number of response categories, the sample size, the test and subtest lengths and the ability correlation structure. Two distinct simulation analyses have been designed in order to evaluate the parameter recovery of the multiunidimensional and the additive GRMs, respectively. A first series of simulations was carried out with the same simulation conditions for both models (Block 1). Then further conditions were analyzed in order to better understand the behavior of the additive model (Block 2). Several works on MIRT models focus on the accuracy of parameter estimation, and, through the manipulation of simulation conditions, it is possible to assess parameter recovery (Sheng, 2008; Sheng, 2010; Edwards, 2010a).
The first section of the chapter describes the simulation study design, while the second and third sections illustrate the simulation conditions and results for the multiunidimensional and additive models, respectively.
5.1 Simulation study design
The aim is to evaluate the item parameter recovery of the multiunidimensional and the additive GRMs under several conditions. We consider the bidimensional case, $m = 2$, which implies the presence of two specific abilities $\theta_1$ and $\theta_2$ for the multiunidimensional model, and of two specific abilities $\theta_1$ and $\theta_2$ plus an overall ability $\theta_0$ for the additive model (using the graphical notation introduced before, the latent structures are summarized in Figure 5.1).
Figure 5.1.
Bidimensional case for multiunidimensional and additive structures.
The model parameters and the ability correlations are estimated through OpenBUGS version 3.2.2. The fundamental scheme for each simulation is the following (for more details about the procedure and the code used for the implementation, see Appendix B):
• Simulate the vectors of `real' parameters, taking into account the conditions we are testing. We perform this step using an R procedure.
• Perform $Q = 10$ replications of the computation procedure for each simulation. In each replication we sample the data matrix using the parameters obtained at the previous step, and we run OpenBUGS through the R package BRugs (Thomas et al., 2006), which allows OpenBUGS to be called automatically from R.
• Evaluate the parameter recovery and compute the reproduced correlations between the latent traits using the $Q$ estimates obtained at the previous step.
5.1.1 Parameter recovery
In order to evaluate the recovery of the generated item parameters (which in our simulation context correspond to the real population values), we compute the absolute bias and the root mean square error (RMSE) for each estimated parameter, taking into account the $Q$ replications for each simulation. If we denote with $\hat\omega_q$ a generic parameter estimate, i.e. the mean of the posterior distribution obtained in the $q$-th replication, and with $\omega^*$ the real generated value, bias and RMSE are computed as follows:

$$\mathrm{Bias}(\omega) = \frac{1}{Q} \sum_{q=1}^{Q} (\hat\omega_q - \omega^*), \qquad (5.1)$$

$$\mathrm{RMSE}(\omega) = \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} (\hat\omega_q - \omega^*)^2}, \qquad (5.2)$$

where lower levels of bias and RMSE indicate better precision in parameter recovery.
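Formulas (5.1) and (5.2), together with the medians over items reported in the result tables of this chapter, can be sketched as follows. The function names and the replicated estimates are hypothetical (only $Q = 4$ replications are shown, whereas the study uses $Q = 10$), and the absolute bias is taken as $|\mathrm{Bias}(\omega)|$.

```python
from math import sqrt
from statistics import median


def bias(estimates, true_value):
    """Eq. (5.1): mean of (omega_hat_q - omega*) over the Q replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Eq. (5.2): square root of the mean squared error over the Q replications."""
    return sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def median_abs_bias_and_rmse(estimates_by_item, true_values):
    """Median over items of |Bias| and of RMSE, as summarized in the tables."""
    abs_biases = [abs(bias(e, t)) for e, t in zip(estimates_by_item, true_values)]
    rmses = [rmse(e, t) for e, t in zip(estimates_by_item, true_values)]
    return median(abs_biases), median(rmses)


# Hypothetical posterior means of one discrimination parameter (real value 1.0).
est = [1.1, 0.9, 1.2, 1.0]
print(bias(est, 1.0), rmse(est, 1.0))  # about 0.05 and 0.122
```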
5.1.2 Estimated ability correlations
Considering that the two models have been specified allowing the latent traits to correlate, and that the correlation structure is reflected in the variance-covariance matrix of the latent abilities $\Sigma$, we are interested not only in item parameter recovery, but also in the way the models are able to reproduce such ability correlations. For this reason, for each simulation we also report the estimated Pearson correlations between abilities: $\hat r_{12}$ for the multiunidimensional model, and $\hat r_{01}$, $\hat r_{02}$ and $\hat r_{12}$ for the additive model (recall that the index 0 refers to the overall ability).
5.1.3 Convergence detection
Lunn et al. (2013) clearly describe how important it is to detect the convergence of the chains. An easy, but effective, strategy is to detect convergence informally, by eye. However, the model could include many parameters and, consequently, it can be quite hard to check all of them by eye. Figure 5.2 shows two examples of chains that have reached convergence.¹ The initial part of the chain, i.e. the non-stationary part, is called burn-in, and the iterations belonging to it must be discarded to be sure that the subsequent realisations can be considered as a sample from the stationary distribution. The burn-in period is easily recognizable in the first chain reported in Figure 5.2.
Figure 5.2.
Examples of stationary chains.
The so-called R statistic of Gelman and Rubin, proposed by Gelman and Rubin (1992) and further developed by Brooks and Gelman (1998), is a useful instrument for checking the convergence of the Markov chains, and hence the reliability of the estimates. This convergence diagnostic can be constructed only when more than one chain is run simultaneously. This aspect led to our decision to run two distinct chains for each simulation (see Section 5.1.5).
Basically, convergence is reached when the chains follow indistinguishable trajectories that no longer depend on their initial values. The method is based on the between- and within-sample variabilities (Ntzoufras, 2011), and the diagnostic statistic is given by:

$$\hat R = \frac{\hat V}{W} = \frac{T' - 1}{T'} + \frac{B/T'}{W} \cdot \frac{M + 1}{M},$$

where $T'$ represents the number of iterations in each chain, $M$ is the number of chains, $B/T'$ is the between-sample variance, that is the variance of the posterior mean values across all the chains, $W$ is the within-sample variance, that is the mean of the variances within each chain, and the pooled posterior variance is given by (Ntzoufras, 2011):

$$\hat V = \frac{T' - 1}{T'} W + \frac{B}{T'} \cdot \frac{M + 1}{M}.$$

¹ Source: Lunn et al., 2013.
Once the chains are stationary and convergence is reached, $\hat R$ approaches 1. A corrected version of the R statistic also exists; see Brooks and Gelman (1998).
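The computation of $\hat R$ from the formulas above can be sketched for a single parameter as follows. This is an illustrative plain-Python version with a hypothetical function name, not the OpenBUGS implementation; it follows the ratio $\hat V / W$ exactly as written above, whereas some references report its square root (the potential scale reduction factor).

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """R-hat = V_hat / W for one parameter.

    chains: M lists, each with the T' post-burn-in draws of the parameter.
    """
    M = len(chains)
    T = len(chains[0])
    # B / T': variance of the M chain means (divisor M - 1).
    b_over_t = variance([mean(c) for c in chains])
    # W: mean of the within-chain variances (divisor T' - 1).
    W = mean([variance(c) for c in chains])
    v_hat = (T - 1) / T * W + b_over_t * (M + 1) / M
    return v_hat / W


# Two toy settings: identical chains versus chains stuck in different regions.
mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(gelman_rubin(mixed), gelman_rubin(stuck))  # about 0.75 (tiny T') and about 45.75
```

With realistic chain lengths, such as the 15,000 post-burn-in iterations used here, the first term is close to 1, so $\hat R$ stays near 1 unless the chain means disagree relative to the within-chain variability.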
5.1.4 Bayesian fit
In addition to the calculation and examination of $\hat R$, other well-known indicators for fit evaluation are the Bayesian deviance and the deviance information criterion (DIC) (Lunn et al., 2013). They provide measures of the fit and of the complexity of the model considered. The Bayesian deviance is defined as:

$$D(\theta) = -2 \log p(y \mid \theta),$$

where $\theta$ denotes the model parameters and $p(y \mid \theta)$ is the full sampling distribution. OpenBUGS treats the deviance as a node (created automatically), so that it has its own posterior distribution and can be monitored like the other model parameters. Combining the posterior mean deviance, $\overline{D(\theta)}$, and the effective number of model parameters, $p_D = \overline{D(\theta)} - D(\bar\theta)$, where $\bar\theta$ is the posterior mean of the parameters, we can compute the DIC through the expression

$$\mathrm{DIC} = \overline{D(\theta)} + p_D.$$

The DIC can be viewed as a Bayesian analogue of Akaike's information criterion, $\mathrm{AIC} = D(\hat\theta) + 2 n_{\mathrm{par}}$, where $\hat\theta$ is the maximum likelihood estimate and $n_{\mathrm{par}}$ the number of parameters: indeed, $\mathrm{DIC} = D(\bar\theta) + 2 p_D$, with $p_D$ playing the role of $n_{\mathrm{par}}$. Also in this case, OpenBUGS makes it easy to compute the DIC for each implemented model.
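As a worked illustration of these definitions, the sketch below computes $\overline{D(\theta)}$, $p_D$ and DIC for a deliberately simple toy model, $y_i \sim N(\mu, 1)$, using hypothetical data and hypothetical posterior draws of $\mu$; it is not part of the simulation study.

```python
from math import log, pi
from statistics import mean


def normal_deviance(y, mu):
    """D(mu) = -2 log p(y | mu) for independent y_i ~ N(mu, 1)."""
    return sum((yi - mu) ** 2 + log(2 * pi) for yi in y)


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (D_bar, p_D, DIC) with p_D = D_bar - D(theta_bar)."""
    d_bar = mean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, p_d, d_bar + p_d


y = [0.2, -0.4, 1.1, 0.5]              # hypothetical data (n = 4)
mu_draws = [-0.15, 0.85, -0.15, 0.85]  # hypothetical posterior draws of mu
mu_bar = mean(mu_draws)

d_bar, p_d, dic_value = dic([normal_deviance(y, m) for m in mu_draws],
                            normal_deviance(y, mu_bar))
print(round(p_d, 6), round(dic_value, 3))
```

Here $p_D$ equals 1 up to rounding, matching the single parameter $\mu$: the spread of the toy draws was chosen to match the flat-prior posterior variance $1/n$.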
5.1.5 General simulation conditions
For all the simulations conducted in this work, $Q = 10$ replications have been performed. For each one, we have considered a chain length of 30,000 iterations, with a burn-in phase of 15,000 iterations. Moreover, two chains have been generated, so that the R and DIC statistics could be computed in OpenBUGS.
These choices may be penalizing in terms of the computational time needed to run the Gibbs sampling algorithm for each simulation (a single replication needs about 13 hours to be completed); nevertheless, an examination of the R diagnostic illustrated above indicates that they ensure convergence is reached.
For each distinct case, we perform different simulations with a sample size of $n = 500$ and a larger sample size of $n = 1000$.
5.2 Multiunidimensional GRM: simulations and results
In this section we report the conditions and the results of the simulations performed to assess the parameter recovery of MCMC estimation for the multiunidimensional model for graded responses. All the simulations are characterized by the general conditions reported in Section 5.1.5 and by other specific conditions, with the aim of evaluating the sensitivity of the model.
5.2.1 Simulation conditions
We consider $n$ individuals and a set of $p$ ordinal items, divided into 2 subtests consisting of $p_1$ and $p_2$ items, respectively. The response $Y_{ij}$ of the $i$-th individual to the $j$-th item can take values in the set $\{1, \ldots, K_j\}$; hence each item is characterized by $K_j - 1$ thresholds satisfying the order constraint $\kappa_{j1} < \cdots < \kappa_{j,K_j-1}$. Moreover, we assume that all the test items have the same number of categories, i.e. $K_1 = \cdots = K_p = K$. Additionally, we assume the existence of $m = 2$ latent abilities, $\theta_1$ and $\theta_2$, underlying the responses to the items, which follow a multiunidimensional latent structure (see Figure 5.1, left part). Thus, the test consists of two subtests and the items in each subtest characterize a single specific ability. Moreover, the specific abilities are allowed to correlate and the model follows a compensatory approach.

| Simulation # | $p$ | $p_1$ | $p_2$ | $K_j$ | $n$ | $\Sigma$ |
|---|---|---|---|---|---|---|
| 1 | 15 | 5 | 10 | 3 | 500 | $\Sigma_A$ |
| 2 | 15 | 5 | 10 | 3 | 500 | $\Sigma_B$ |
| 3 | 15 | 5 | 10 | 4 | 500 | $\Sigma_A$ |
| 4 | 15 | 5 | 10 | 4 | 500 | $\Sigma_B$ |
| 5 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_A$ |
| 6 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_B$ |
| 7 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_A$ |
| 8 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_B$ |

Table 5.1. Simulation conditions for the multiunidimensional model for graded responses.
We perform a block of simulations (Block 1) referring to the case where a test of length $p = 15$ is divided into a first subtest made of $p_1 = 5$ items and a second subtest made of $p_2 = 10$ items. A further distinction has been made on the number of item categories, which takes the values $K = 3$ and $K = 4$. Furthermore, each case was analyzed by using two different correlation matrices among the abilities: $\Sigma_A$ and $\Sigma_B$. $\Sigma_A$ is a $2 \times 2$ identity matrix, where the correlation between the specific abilities is set to zero ($r_{12} = 0$). The second correlation matrix, $\Sigma_B$, introduces a moderate correlation between the latent abilities ($r_{12} = 0.4$).
By combining all the conditions, we obtain 8 different scenarios, listed in Table 5.1, to investigate the parameter recovery for the multiunidimensional GRM.
5.2.2 Results
In this section we report the results we obtained for each of the 8 simulations
conducted for the multiunidimensional model for graded responses. In the following, for each item parameter type within a subtest, median absolute bias and
median root mean square error are reported for each scenario, as well as the
Simulations Block 1 - Subtest 1 (5 items)
α1
n
(p1 ,
p2 )
K
RMSE
κ1
Bias
RMSE
ΣA
ΣB
(5,10) 4
ΣA
ΣB
(5,10) 3
ΣA
ΣB
0.06
0.02
0.05
0.06
0.04
(5,10) 4
ΣA
ΣB
0.05
0.05
1000
Table 5.2.
Bias
0.11 0.09 0.08 0.05
0.11 0.07 0.10 0.02
0.11 0.09 0.12 0.06
0.09
0.05
0.10
0.10
(5,10) 3
500
κ2
κ3
RMSE Bias RMSE Bias
0.08
0.04
0.07
0.03
0.07
0.03
0.09
0.04
0.08
0.02
0.09
0.05
0.03
0.08
0.05
0.05
0.04
0.05
0.02
0.02
0.06
0.04
0.05
0.01
0.04
0.01
0.01
0.04
0.01
0.03
0.01
0.05
0.02
Multiunidimensional model: block 1 simulation results for subtest 1
(median RMSEs and median absolute biases).
Simulations Block 1 - Subtest 2 (10 items)
α1
n
(p1 ,
p2 )
K
κ1
RMSE Bias RMSE
κ2
Bias
κ3
RMSE Bias RMSE
Bias
(5,10) 3
ΣA
ΣB
0.09
0.02
0.07
0.02
0.07
0.02
0.08
0.01
0.07
0.03
0.08
0.02
(5,10) 4
ΣA
ΣB
0.07
0.03
0.08
0.09
0.05
0.09
0.02
0.08
0.06
0.08
0.08
0.01
(5,10) 3
ΣA
ΣB
0.07
0.02
0.06
0.02
0.05
0.03
0.08
0.03
0.06
0.03
0.06
0.03
(5,10) 4
ΣA
ΣB
0.07
0.02
0.05
0.02
0.05
0.02
0.06
0.01
0.06
0.02
0.05
0.01
0.05
0.00
0.08
0.03
500
1000
Table 5.3.
0.12 0.07
0.11 0.02
Multiunidimensional model: block 1 simulation results for subtest 2
(median RMSEs and median absolute biases).
ability correlation estimates.
In Tables 5.2 and 5.3 we present RMSE and absolute bias for the item parameters (discrimination and threshold parameters) characterizing, respectively, the items belonging to the first subtest and the items belonging to the second subtest.
Values of the RMSE greater than 0.10 and values of the absolute bias greater
than 0.05 are highlighted in bold, identifying cases where the parameter recovery
could be improved.
With reference to the first subtest, Table 5.2 shows that the worst performances are related to the smaller sample size ($n = 500$). In fact, when we increased the sample size to $n = 1000$, the RMSEs and biases noticeably decreased, other things being equal. The presence of an underlying correlation between the two latent traits does not seem to affect the item parameter recovery.
Results are similar for items belonging to the second subtest (Table 5.3). Higher biases are noticed when sample sizes are smaller, even though the overall parameter reproduction is better than for the first subtest. This may be due to the greater number of items included in the second subtest ($p_2 = 10$ versus $p_1 = 5$). For $n = 1000$ item parameters are recovered very precisely.
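Multiunidimensional model: Real and estimated ability correlations

| $n$ | $(p_1, p_2)$ | $K$ | $\Sigma$ | $r_{12}$ | $\hat r_{12}$ |
|---|---|---|---|---|---|
| 500 | (5,10) | 3 | $\Sigma_A$ | 0.00 | 0.02 |
| 500 | (5,10) | 3 | $\Sigma_B$ | 0.40 | 0.46 |
| 500 | (5,10) | 4 | $\Sigma_A$ | 0.00 | 0.03 |
| 500 | (5,10) | 4 | $\Sigma_B$ | 0.40 | 0.45 |
| 1000 | (5,10) | 3 | $\Sigma_A$ | 0.00 | -0.01 |
| 1000 | (5,10) | 3 | $\Sigma_B$ | 0.40 | 0.49 |
| 1000 | (5,10) | 4 | $\Sigma_A$ | 0.00 | -0.01 |
| 1000 | (5,10) | 4 | $\Sigma_B$ | 0.40 | 0.48 |

Table 5.4. Multiunidimensional model: real ($r$) and estimated ($\hat r$) ability correlations.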
Table 5.4 illustrates the estimated ability correlations for each scenario. It can be noticed that the differences between the generated real values $r_{12}$ and the estimated values $\hat r_{12}$ are quite small, indicating good performance of the model.
Here, unlike what we observe for item parameters, the underlying ability correlation structure seems to influence the correlation reproduction. In fact, we observe a worse reproduction for the model characterized by the more complex latent correlation structure $\Sigma_B$. The sample size also seems to influence the reproduction of the latent trait correlations: the reproduction accuracy increases with the sample size for the simple case ($\Sigma_A$), while it decreases as the sample size increases for the complex case ($\Sigma_B$).
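The statements above can be checked directly on the values of Table 5.4. The short sketch below (a hypothetical helper, with the table values transcribed by hand) computes the absolute differences $|\hat r_{12} - r_{12}|$ for each scenario.

```python
# (n, K, correlation matrix) -> (real r12, estimated r12), transcribed from Table 5.4.
table_5_4 = {
    (500, 3, "A"): (0.00, 0.02), (500, 3, "B"): (0.40, 0.46),
    (500, 4, "A"): (0.00, 0.03), (500, 4, "B"): (0.40, 0.45),
    (1000, 3, "A"): (0.00, -0.01), (1000, 3, "B"): (0.40, 0.49),
    (1000, 4, "A"): (0.00, -0.01), (1000, 4, "B"): (0.40, 0.48),
}

# Absolute reproduction error, rounded to the two decimals of the table.
abs_error = {key: round(abs(est - real), 2) for key, (real, est) in table_5_4.items()}

for key, err in sorted(abs_error.items()):
    print(key, err)
```

For $\Sigma_A$ the errors are 0.02 and 0.03 at $n = 500$ and 0.01 at $n = 1000$; for $\Sigma_B$ they are 0.06 and 0.05 at $n = 500$ and 0.09 and 0.08 at $n = 1000$.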
As a concluding remark, the simulation study conducted to assess the multiunidimensional model for graded responses with correlated latent traits shows that item parameters and ability correlations are well reproduced.
5.3 Additive GRM: simulations and results
In this section we report the conditions and the results of the simulation study conducted to evaluate the multidimensional additive GRM with correlated abilities, estimated within a Bayesian context. In addition to the first block of simulations, also designed for the multiunidimensional model, a further block of simulations has been performed in order to better understand the behavior of our proposed model.
5.3.1 Simulation conditions
The general simulation conditions for the additive model for graded responses are the same as for the multiunidimensional model. We still assume the existence of $m = 2$ specific latent abilities, $\theta_1$ and $\theta_2$, but now we also consider an overall latent ability $\theta_0$. Accordingly, we are focusing on the bidimensional case, for which the latent structure is represented in Figure 5.1, right part.
We indicate with $p$ the total number of ordinal test items, and with $p_1$ and $p_2$ the number of items belonging to the first and the second subtest, respectively. $K_j$ indicates the highest category for the $j$-th item, and we consider that all the items have the same number of categories, $K_j = K$, $\forall j$.
We start from a first block of simulations (Block 1) referring to the case where a test of length $p = 15$ is divided into a first subtest made of $p_1 = 5$ items and a second subtest made of $p_2 = 10$ items. A further distinction has been made on the number of item categories, which in the first block takes the values $K = 3$ and $K = 4$. Furthermore, each case was analyzed by using two different correlation matrices among the abilities: $\Sigma_A$ and $\Sigma_B$. $\Sigma_A$ is a $3 \times 3$ identity matrix, where all the correlations among the abilities are set to zero ($r_{01} = r_{02} = r_{12} = 0$). In this case, the additive model with orthogonal traits has the same latent structure as the well-known bi-factor model, and the three latent traits (the general and specific abilities) are separate and well distinguished from each other. The second correlation matrix, $\Sigma_B$, introduces moderate correlations between all the latent abilities ($r_{01} = 0.4$, $r_{02} = 0.3$, $r_{12} = 0.2$). The choice of not particularly high levels of correlation was driven by the consideration that high correlations among the latent abilities may lead to the existence of a dominant latent trait, pointing back to a unidimensional model.
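Before simulating abilities from $N_3(0, \Sigma_B)$, it is worth checking that the chosen correlations define a valid, i.e. positive definite, correlation matrix. A minimal check via Sylvester's criterion (all leading principal minors positive) is sketched below; the helper name is hypothetical and the ordering of the traits is assumed to be $(\theta_0, \theta_1, \theta_2)$.

```python
def leading_minors_3x3(s):
    """Leading principal minors of a 3 x 3 matrix s."""
    m1 = s[0][0]
    m2 = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    m3 = (s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
          - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
          + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]))
    return m1, m2, m3


# Sigma_B of the additive simulations: r01 = 0.4, r02 = 0.3, r12 = 0.2.
sigma_b = [[1.0, 0.4, 0.3],
           [0.4, 1.0, 0.2],
           [0.3, 0.2, 1.0]]
minors = leading_minors_3x3(sigma_b)
is_valid = all(m > 0 for m in minors)  # Sylvester's criterion
print(minors, is_valid)
```

The determinant (the last minor, about 0.758) is comfortably positive; correlations closer to 1 would push it towards zero, as the matrix approaches one dominated by a single common trait.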
In order to investigate further conditions, we designed a second block of simulations (Block 2), where we increase both the length of the test and the number of item categories. We consider a case characterized by a test length of $p = 50$ (divided into $p_1 = 20$ and $p_2 = 30$ items for subtests 1 and 2, respectively) and $K = 4$ categories for each test item, and a last case where the test length is $p = 30$ ($p_1 = 10$ and $p_2 = 20$) and items have $K = 5$ categories. Again, with respect to the correlation matrix, the two cases $\Sigma_A$ and $\Sigma_B$ are distinguished as above.
By combining all the simulation conditions, we obtain 16 different scenarios, illustrated in Table 5.5, to investigate the parameter recovery for the proposed model.
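The 16 scenarios are the full crossing, within each block, of two designs $(p, p_1, p_2, K)$, two sample sizes and two correlation matrices. The sketch below enumerates them in the order of Table 5.5 (taking Block 1 alone gives the 8 scenarios of Table 5.1); the variable names are illustrative.

```python
from itertools import product

# Designs (p, p1, p2, K) of each block, as described in the text.
block1 = [(15, 5, 10, 3), (15, 5, 10, 4)]
block2 = [(50, 20, 30, 4), (30, 10, 20, 5)]

scenarios = []
for designs in (block1, block2):
    # Sample size varies slowest, then design, then correlation matrix.
    for n, (p, p1, p2, k), sigma in product((500, 1000), designs, ("Sigma_A", "Sigma_B")):
        scenarios.append({"p": p, "p1": p1, "p2": p2, "K": k, "n": n, "Sigma": sigma})

for number, s in enumerate(scenarios, start=1):
    print(number, s)
```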
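| Simulation # | $p$ | $p_1$ | $p_2$ | $K_j$ | $n$ | $\Sigma$ |
|---|---|---|---|---|---|---|
| 1 | 15 | 5 | 10 | 3 | 500 | $\Sigma_A$ |
| 2 | 15 | 5 | 10 | 3 | 500 | $\Sigma_B$ |
| 3 | 15 | 5 | 10 | 4 | 500 | $\Sigma_A$ |
| 4 | 15 | 5 | 10 | 4 | 500 | $\Sigma_B$ |
| 5 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_A$ |
| 6 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_B$ |
| 7 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_A$ |
| 8 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_B$ |
| 9 | 50 | 20 | 30 | 4 | 500 | $\Sigma_A$ |
| 10 | 50 | 20 | 30 | 4 | 500 | $\Sigma_B$ |
| 11 | 30 | 10 | 20 | 5 | 500 | $\Sigma_A$ |
| 12 | 30 | 10 | 20 | 5 | 500 | $\Sigma_B$ |
| 13 | 50 | 20 | 30 | 4 | 1000 | $\Sigma_A$ |
| 14 | 50 | 20 | 30 | 4 | 1000 | $\Sigma_B$ |
| 15 | 30 | 10 | 20 | 5 | 1000 | $\Sigma_A$ |
| 16 | 30 | 10 | 20 | 5 | 1000 | $\Sigma_B$ |

Table 5.5. Simulation conditions for the additive model for graded responses.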
5.3.2 Results
Tables 5.6 and 5.7 show the item parameter recovery for the first block of simulations, where $p = 15$ ($p_1 = 5$ and $p_2 = 10$), for subtest 1 and subtest 2 respectively. It emerges that all parameters are quite well recovered when the number of categories for each item is $K = 3$, and a sample size of $n = 500$ is enough to get accurate estimates. Results are slightly better for the $\Sigma_A$ correlation matrix than for $\Sigma_B$.
On the other hand, when the number of item categories is $K = 4$ we obtain less accurate estimates, for both the $\Sigma_A$ and $\Sigma_B$ ability correlation structures. Estimates improve when the sample size is increased, but median RMSEs and biases remain rather high, especially for the $\alpha_0$ and $\alpha_\nu$ discrimination parameters. Considering that this result is more evident for the first subtest, where $p_1 = 5$, than for the second one, where $p_2 = 10$, this may be due to the small number of items compared to the increased number of categories.
Results of the second block of simulations are reported in Tables 5.8 and 5.9.
Focusing on the case where $p = 50$ ($p_1 = 20$ and $p_2 = 30$) and $K = 4$, we observe that with $n = 500$ the item parameters are not well recovered in either subtest, particularly the discrimination parameters.
Nevertheless, these shortcomings are overtaken by increasing the sample size.
In fact, when
n = 1000 all the parameters are recovered rather precisely.
Dierent
correlation structures seem not to aect parameter recovery, with an exception
of the discrimination parameters for the second subtest, where we register higher
median RMSEs in association to the more complex correlation structure.
p = 30 (p1 = 10 and p2 = 20) and K = 5 benet
from the enlarged sample size. For n = 1000, item parameters are recovered with
care, with slightly better accuracy with respect to ΣA correlation matrix.
Analogously, the cases where
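Tables 5.6 to 5.9 summarise recovery as median RMSEs and median absolute biases. The short Python sketch below shows one way to compute these summaries. It assumes that, for each item parameter, the RMSE and the bias are computed across the simulation replications and then summarised by their median across the items of a subtest. The exact aggregation used in the thesis may differ. The function names and the toy numbers are hypothetical.

```python
import statistics
from math import sqrt


def rmse_and_bias(true_value, estimates):
    """RMSE and bias of one parameter's estimates across simulation replications."""
    errors = [est - true_value for est in estimates]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    return rmse, bias


def median_recovery(true_values, estimates_by_item):
    """Median RMSE and median absolute bias over the items of a subtest.

    true_values[j] is the generating value of the parameter for item j and
    estimates_by_item[j] holds its estimates, one per replication.
    """
    rmses, abs_biases = [], []
    for true_value, estimates in zip(true_values, estimates_by_item):
        rmse, bias = rmse_and_bias(true_value, estimates)
        rmses.append(rmse)
        abs_biases.append(abs(bias))
    return statistics.median(rmses), statistics.median(abs_biases)


# Hypothetical example: one parameter, three items, three replications each.
median_rmse, median_abs_bias = median_recovery(
    [1.0, 1.5, 0.8],
    [[1.1, 0.9, 1.0], [1.6, 1.7, 1.5], [0.7, 0.9, 0.8]],
)
```

The median, rather than the mean, keeps a single badly recovered item from dominating the summary for a subtest.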
Simulations Block 1 - Subtest 1 (5 items)
α0
n
(p1 ,
p2 )
K
RMSE
α1
Bias
(5,10) 3
ΣA
ΣB
0.08
0.05
0.09
0.02
(5,10) 4
ΣA
ΣB
0.13
0.17
0.07
(5,10) 3
ΣA
ΣB
(5,10) 4
ΣA
ΣB
500
1000
Bias
RMSE
Bias
RMSE
0.03
0.08
κ3
Bias
0.13
0.09
0.10
0.01
0.07
0.04
0.09
0.03
0.09
0.02
0.15
0.16
0.12
0.16
0.15
0.10
0.05
0.15
0.12
0.23
0.09
0.02
0.07
0.03
0.09
0.03
0.08
0.06
0.07
0.03
0.09
0.03
0.08
0.03
0.06
0.03
0.15
0.05
0.06
0.03
0.08
0.05
0.08
0.04
0.09
0.14
0.02
0.08
0.08
κ2
0.14
0.16
0.16
0.07
0.12
0.12
RMSE
0.13
0.15
0.16
0.15
κ4
Bias
RMSE
Bias
0.04
0.03
0.10
0.10
Additive model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).
Table 5.6.
RMSE
κ1
n
(p1 ,
p2 )
K
(5,10) 3
ΣA
ΣB
(5,10) 4
ΣA
ΣB
(5,10) 3
ΣA
ΣB
(5,10) 4
ΣA
ΣB
500
1000
Table 5.7.
α2
RMSE
Bias
0.09
0.11
0.12
0.14
0.09
0.15
0.16
0.23
κ1
κ2
κ3
RMSE
Bias
RMSE
Bias
RMSE
Bias
0.05
0.10
0.02
0.08
0.02
0.08
0.03
0.04
0.00
0.05
0.09
0.04
0.10
0.00
0.05
0.10
0.03
0.04
0.04
0.10
0.10
0.10
0.09
0.02
0.05
0.06
0.02
0.06
0.02
0.03
0.05
0.02
0.05
0.01
0.07
0.03
0.07
0.07
0.04
0.06
0.04
0.14
0.14
0.04
0.09
0.02
0.05
0.12
0.13
0.18
0.16
0.19
0.09
0.11
κ4
RMSE
Bias
0.13
0.11
0.06
0.03
0.08
0.03
0.03
0.09
0.04
RMSE
Bias
0.03
Simulations Block 1 - Subtest 2 (10 items)
α0
Additive model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
Simulations Block 2 - Subtest 1 (20 and 10 items)
α0
n
(p1 ,
p2 )
K
(20,30) 4
ΣA
ΣB
(10,20) 5
ΣA
ΣB
(20,30) 4
ΣA
ΣB
(10,20) 5
ΣA
ΣB
500
1000
κ1
RMSE
Bias
RMSE
Bias
0.14
0.15
0.20
0.17
0.07
0.07
0.07
0.14
0.18
0.21
0.22
0.08
0.10
0.06
0.07
0.07
0.05
0.08
0.06
0.02
0.08
0.07
0.19
0.05
0.04
0.04
0.07
0.27
κ2
RMSE
Bias
RMSE
0.10
0.06
0.09
0.03
0.09
κ3
κ4
Bias
RMSE
Bias
RMSE
Bias
0.10
0.05
0.10
0.04
0.08
0.03
0.09
0.03
0.01
0.09
0.03
0.08
0.04
0.09
0.04
0.10
0.02
0.09
0.02
0.08
0.02
0.08
0.02
0.04
0.06
0.01
0.06
0.01
0.05
0.01
0.04
0.06
0.04
0.06
0.04
0.06
0.03
0.04
0.08
0.02
0.06
0.02
0.05
0.02
0.05
0.01
0.05
0.07
0.03
0.06
0.03
0.05
0.03
0.07
0.02
Additive model: block 2 simulation results for subtest 1 (median RMSEs and median absolute biases).
Table 5.8.
α1
α0
n
(p1 ,
p2 )
K
α2
κ1
RMSE
Bias
RMSE
Bias
RMSE
κ2
Bias
RMSE
κ3
Bias
RMSE
κ4
Bias
RMSE
Bias
(20,30) 4
ΣA
ΣB
0.16
0.07
0.20
0.08
0.10
0.10
0.05
0.10
0.10
0.06
0.03
0.02
0.06
0.10
0.07
0.10
0.05
(10,20) 5
ΣA
ΣB
0.23
0.20
0.08
0.09
0.18
0.19
0.05
0.08
0.09
0.03
0.07
0.03
0.07
0.02
0.09
0.03
0.08
0.03
0.07
0.03
0.08
0.02
0.08
0.02
(20,30) 4
ΣA
ΣB
0.06
0.05
0.06
0.02
0.06
0.02
0.06
0.02
0.06
0.01
0.03
0.07
0.02
0.05
0.02
0.07
0.02
(10,20) 5
ΣA
ΣB
500
1000
Table 5.9.
0.14
0.03
0.09
0.03
0.06
0.17
0.07
0.05
0.07
0.03
0.05
0.01
0.05
0.01
0.05
0.01
0.06
0.02
0.06
0.01
0.08
0.03
0.06
0.02
0.05
0.02
0.05
0.01
0.06
0.02
Simulations Block 2 - Subtest 2 (30 and 20 items)
Additive model: block 2 simulation results for subtest 2 (median RMSEs and median absolute biases).
n      (p1, p2)   K   Σ    r01    r̂01    r02    r̂02    r12    r̂12
500    (5,10)     3   ΣA   0.00   0.07   0.00   0.16   0.00   -0.07
                      ΣB   0.40   0.62   0.30   0.49   0.20    0.24
500    (5,10)     4   ΣA   0.00   0.09   0.00   0.29   0.00   -0.07
                      ΣB   0.40   0.60   0.30   0.56   0.20    0.27
1000   (5,10)     3   ΣA   0.00   0.11   0.00   0.16   0.00   -0.05
                      ΣB   0.40   0.60   0.30   0.54   0.20    0.27
1000   (5,10)     4   ΣA   0.00   0.13   0.00   0.36   0.00   -0.05
                      ΣB   0.40   0.58   0.30   0.65   0.20    0.30
500    (20,30)    4   ΣA   0.00   0.00   0.00   0.29   0.00   -0.05
                      ΣB   0.40   0.50   0.30   0.36   0.20    0.21
500    (10,20)    5   ΣA   0.00   0.02   0.00   0.15   0.00   -0.03
                      ΣB   0.40   0.51   0.30   0.48   0.20    0.24
1000   (20,30)    4   ΣA   0.00   0.03   0.00   0.07   0.00   -0.02
                      ΣB   0.40   0.45   0.30   0.37   0.20    0.20
1000   (10,20)    5   ΣA   0.00   0.06   0.00   0.11   0.00   -0.05
                      ΣB   0.40   0.52   0.30   0.41   0.20    0.24
Table 5.10. Additive model: real (r) and estimated (r̂) ability correlations.
Table 5.10 reports the estimated ability correlations for each scenario, together with their true values, which shows how well the correlations are reproduced. The results are coherent with those observed for the item parameters. The best performance is associated with the largest sample size (n = 1000), a reasonable number of items (50 in total) and K = 4 categories, even when the traits are correlated (ΣB).
To conclude, the main results show that the algorithm is particularly sensitive to the sample size, because of the model complexity and the high number of parameters to be estimated. When the sample size is sufficiently large (n = 1000), all the parameters are well reproduced. The results are also affected by the trade-off between the test length and the number of categories: the worst results are associated with a high number of categories and a short test. The same pattern holds for the correlation estimates.
Chapter 6
Application to real data: residents' attitudes towards tourism
In this chapter we illustrate an application of the proposed models to data collected to investigate the perceptions and attitudes of Romagna and San Marino residents towards the tourism industry. We first introduce the interpretation of the model parameters in this new context and then describe the research design. Results of the multiunidimensional and additive GRM estimations are reported in Sections 6.3 and 6.4, and Section 6.5 examines the heterogeneity of resident perceptions.
6.1 Interpretation of model parameters
In the present application, the observed variables are the opinions of a sample of respondents on a set of aspects of the tourism industry. The latent traits can therefore be defined as 'perceptions'. The investigation involves two distinct aspects of the phenomenon, namely the perceived benefits and the perceived costs of tourism. Two specific perceptions and the overall attitude of respondents can therefore be identified as latent variables.
Within this framework, the discrimination parameters represent the capability of the items to differentiate between respondents with different levels of agreement. The threshold parameters can be interpreted as 'criticity levels' of the corresponding item. For a given item, high values of the criticity parameters correspond to lower probabilities of observing responses in the positive categories.
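The link between criticity parameters and category probabilities can be made concrete with a short sketch. It assumes a normal-ogive graded response model in which, for an item with discrimination α and ordered thresholds κ1 < ... < κK−1, P(Y ≥ k+1 | θ) = Φ(αθ − κk). This parameterization is an illustrative assumption, and the model definitions given earlier in the thesis take precedence. The two items in the usage are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def grm_category_probs(theta, alpha, kappas):
    """Category probabilities P(Y = 1), ..., P(Y = K) for one item.

    Assumes a normal-ogive graded response model with
    P(Y >= k + 1 | theta) = Phi(alpha * theta - kappa_k)
    and strictly increasing thresholds kappa_1 < ... < kappa_{K-1}.
    """
    if any(a >= b for a, b in zip(kappas, kappas[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities, padded with P(Y >= 1) = 1 and P(Y >= K + 1) = 0.
    cumulative = [1.0] + [std_normal_cdf(alpha * theta - k) for k in kappas] + [0.0]
    # Each category probability is the difference of adjacent cumulative terms.
    return [cumulative[c] - cumulative[c + 1] for c in range(len(kappas) + 1)]


# Two hypothetical 4-category items with equal discrimination:
# item B has higher thresholds (higher "criticity") than item A.
probs_a = grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0])
probs_b = grm_category_probs(0.0, 1.0, [-0.5, 0.5, 1.5])
```

Item B places less probability on its top category than item A, which is what the statement above predicts for higher criticity parameters.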
6.2 Research design
The data analysed come from a study on subjective well-being conducted by the University of Bologna (Bernini et al., 2013). Data were collected at the end of 2010 from residents of the Romagna area (Italy) and of the Republic of San Marino. The Romagna area consists of the provinces of Forlì-Cesena, Rimini and Ravenna, in the south-east of the Emilia-Romagna region. The independent Republic of San Marino borders the province of Rimini.
Tourism carries considerable weight in this area, which contains 7% of Italian accommodation facilities and 5% of Italian entertainment activities. It is also one of the main Italian tourist destinations: in 2010 it hosted almost 27.5 million overnight stays (7.3% of the national total) and 5.3 million arrivals (5.3% of the national total).
The sampling design was stratified by province and by the demographic characteristics (age and gender) of the resident population. The final sample is representative of the population at the provincial level, with a margin of error of ±5% at a 95% confidence level. A total of 794 questionnaires were obtained through a telephone survey.
The questionnaire was designed to collect residents' evaluations of the following aspects:
- the costs and benefits of the tourism industry;
- the personal benefit they derive from tourism;
- the quality of life in the area;
- their degree of involvement in the tourism industry;
- their satisfaction with leisure and non-leisure life domains;
- their quality of life;
- their degree of support for future development of the tourism industry.
Personal information (age, gender, nationality, residence and occupation) was also collected (see Appendix C for the questionnaire). Some characteristics of the sample are summarised in Table 6.1.
Among all the aspects investigated through the survey, our analysis focuses on the perceived benefits and costs of the tourism industry.
                          Number      %
Provinces
  Forlì-Cesena               246   31.0
  Ravenna                    245   30.9
  Rimini                     243   30.6
  San Marino                  60    7.6
Gender
  Female                     413   52.0
  Male                       381   48.0
Age
  < 25                        65    8.2
  25 - 35                    115   14.5
  35 - 45                    171   21.5
  45 - 55                    127   16.0
  55 - 65                     95   12.0
  ≥ 65                       221   27.8
Education
  Primary                    105   13.2
  Lower secondary            196   24.7
  Upper secondary            192   24.2
  University                 301   37.9
Table 6.1. Profile of respondents.

The perceived benefits of tourism were assessed by five items: support to local economic development [B1], quality of life [B2], improvement of public services [B3], employment prospects [B4], and opportunities for cultural activities [B5]. Respondents were asked whether each of these aspects would improve for their community as a result of increasing tourism activity. They answered on a 7-point anchor scale, from strongly disagree to strongly agree.
The perceived costs of tourism were assessed by five other items: cost of living [C1], crime [C2], environmental damage [C3], traffic congestion [C4], and pollution [C5]. In this case, residents were asked whether those aspects would worsen for their community as a result of increasing tourism activity, on the same 7-point scale. The scales of the cost items were inverted to eliminate reverse scoring. As a result, low scores are associated with high perceptions of costs and high scores with low perceptions of costs.
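Inverting a 7-point scale amounts to mapping each response y to 8 − y. The sketch below implements this inversion and assumes that responses are stored as integers from 1 to 7. The function name is hypothetical.

```python
def reverse_score(responses, n_categories=7):
    """Reverse-code ordinal responses coded 1..n_categories (y -> n_categories + 1 - y)."""
    for y in responses:
        if not 1 <= y <= n_categories:
            raise ValueError(f"response {y} is outside 1..{n_categories}")
    return [n_categories + 1 - y for y in responses]
```

Applying the function twice returns the original coding, so the inversion loses no information.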
Table 6.2 reports the response frequencies for each item. Items B1-B5 refer to the benefits of the tourism industry perceived by residents, and items C1-C5 refer to the costs.
Item  Item description      Low benefits ← Responses → High benefits
                               1     2     3     4     5     6     7
B1    Econ. support           12    51    58   157   149   235   132
B2    Quality of life         24    49    78   184   227   155    77
B3    Public services         16    45    97   186   190   171    89
B4    Job opportunities       16    36    69   157   187   198   131
B5    Cultural act.           30    54    76   186   188   157   103

Item  Item description      High costs ← Responses → Low costs
                               1     2     3     4     5     6     7
C1    Cost of life            64   151   182   139   119   100    39
C2    Crime rate             145   169   157   155    69    71    28
C3    Env. damage            117   151   166   187    96    59    18
C4    Traffic                193   152   158   130    89    45    27
C5    Pollution              158   173   164   136    63    81    19
Table 6.2. Response frequencies for items about tourism benefits (B1-B5) and items about tourism costs (C1-C5).
6.3 Results for the multiunidimensional GRM
The parameters of the bidimensional version of the multiunidimensional GRM were estimated from the residents' responses to the 5 items on benefits (B1-B5) and the 5 items on costs (C1-C5). Following a confirmatory approach, we assume that the item responses on benefits are related to the first latent variable θ1, and the item responses on costs to the second latent variable θ2. The two traits are allowed to correlate.
Concerning the definition of the latent traits, θ1 can be interpreted as the perception of tourism benefits, while θ2 can be defined as the perception of tourism costs. These interpretations derive directly from the meaning of the items included in the questionnaire. A positive perception of the effect of the tourism industry is reflected by high resident scores on θ1 and θ2. In particular, the more positive the perceived effect of tourism on the local environment, the higher the score on θ1. Conversely, the higher the score on θ2, the lower the perceived negative impact of tourism on the environment.
The model parameters were estimated using the proposed OpenBUGS procedure for the multiunidimensional GRM, with two chains of 30,000 iterations each (the first 15,000 discarded as burn-in). Table 6.3 reports the item parameter estimates for the test items.
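Table 6.3 reports each posterior mean together with its posterior standard deviation (SD) and its Monte Carlo standard error (MCSE). The MCSE is the simulation error of the posterior mean due to the finite number of MCMC draws. OpenBUGS computes its own MC error. The sketch below uses the batch-means estimator instead, one common way to obtain an MCSE from autocorrelated draws, and does not claim to reproduce OpenBUGS's exact computation. The two chains in the usage are simulated independent draws and are purely hypothetical.

```python
import random
import statistics
from math import sqrt


def batch_means_mcse(draws, n_batches=50):
    """Batch-means estimate of the Monte Carlo standard error of a posterior mean.

    The (post burn-in) draws are split into n_batches contiguous batches of equal
    size, discarding any leftover draws at the end. The MCSE is the standard
    deviation of the batch means divided by sqrt(n_batches).
    """
    batch_size = len(draws) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least two non-empty batches")
    means = [
        statistics.fmean(draws[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    return statistics.stdev(means) / sqrt(n_batches)


# Hypothetical usage: 2 chains x 15,000 retained draws, pooled after burn-in.
rng = random.Random(1)
chain_1 = [rng.gauss(1.1, 0.07) for _ in range(15_000)]
chain_2 = [rng.gauss(1.1, 0.07) for _ in range(15_000)]
mcse = batch_means_mcse(chain_1 + chain_2)
```

With 50 batches of 600 draws, no batch straddles the two chains. For independent draws, the result is close to SD/√30000.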
The strength of the relationship between the observed responses and the related latent trait is expressed by the discrimination parameters α. Table 6.3 shows that these parameters are all positive, which supports the chosen latent structure.
The capability of an item to differentiate individuals with different perceptions of the impact of tourism increases with its discrimination parameter. Accordingly, public services, job opportunities and cultural activities (items B3, B4 and B5, respectively) are the most informative on the perception of the advantages of tourism. Traffic and pollution (items C4 and C5) discriminate best between residents with different perceptions of the environmental impact of tourism. Among all the items, the cost of life (C1) has the lowest discrimination capability.
Item  Item description    α̂ν      SD     MCSE    κ̂1      SD     MCSE    κ̂2      SD     MCSE    κ̂3      SD     MCSE
B1    Econ. support       1.103   0.074  0.001   -2.904  0.155  0.001   -1.920  0.095  0.001   -1.400  0.080  0.001
B2    Quality of life     1.204   0.078  0.001   -2.632  0.134  0.001   -1.856  0.096  0.001   -1.225  0.078  0.001
B3    Public services     1.485   0.096  0.001   -3.240  0.184  0.002   -2.178  0.116  0.001   -1.301  0.088  0.002
B4    Job opp.            1.423   0.094  0.002   -3.231  0.183  0.002   -2.315  0.123  0.001   -1.549  0.095  0.002
B5    Cultural act.       1.339   0.087  0.001   -2.629  0.134  0.001   -1.844  0.099  0.001   -1.221  0.082  0.001
C1    Cost of life        0.286   0.105  0.004   -1.468  0.115  0.001   -0.633  0.065  0.001    0.003  0.046  0.001
C2    Crime rate          1.563   0.109  0.002   -1.603  0.106  0.001   -0.432  0.080  0.001    0.450  0.080  0.001
C3    Env. damage         1.440   0.100  0.003   -1.744  0.105  0.001   -0.658  0.078  0.001    0.222  0.074  0.001
C4    Traffic             1.638   0.117  0.003   -1.268  0.097  0.001   -0.322  0.082  0.001    0.618  0.085  0.001
C5    Pollution           1.793   0.131  0.003   -1.580  0.114  0.001   -0.413  0.087  0.001    0.566  0.088  0.001

Item  Item description    κ̂4      SD     MCSE    κ̂5      SD     MCSE    κ̂6      SD     MCSE
B1    Econ. support       -0.496  0.065  0.001   0.171   0.063  0.001   1.348   0.077  0.001
B2    Quality of life     -0.274  0.066  0.001   0.807   0.072  0.001   1.906   0.099  0.001
B3    Public services     -0.284  0.075  0.002   0.723   0.080  0.002   2.010   0.113  0.001
B4    Job opp.            -0.560  0.076  0.002   0.366   0.074  0.001   1.543   0.094  0.002
B5    Cultural act.       -0.235  0.070  0.001   0.679   0.074  0.001   1.749   0.098  0.001
C1    Cost of life         0.471  0.052  0.001   0.962   0.071  0.001   1.670   0.107  0.001
C2    Crime rate           1.345  0.096  0.001   1.867   0.109  0.001   2.775   0.146  0.001
C3    Env. damage          1.235  0.088  0.001   1.982   0.108  0.001   2.893   0.151  0.001
C4    Traffic              1.449  0.102  0.001   2.234   0.126  0.001   2.946   0.158  0.001
C5    Pollution            1.512  0.111  0.001   2.082   0.130  0.001   3.367   0.191  0.001
NOTE: ν = 1 for the items on benefits and ν = 2 for the items on costs; SD = standard deviation, MCSE = Monte Carlo standard error.
Table 6.3. Item parameter estimates for the multiunidimensional GRM.
The threshold parameters κ of each item reflect the criticity level of the specific aspect considered. High values of the criticity parameters correspond to lower probabilities of observing responses in the higher categories. Items with higher criticity parameters are therefore answered more frequently in the lower categories.
For this model, the criticity parameters alone do not allow the items to be ordered unambiguously by response probability. This is because the items have different discrimination parameters, so their response curves can cross. However, by fixing θ̂1 and θ̂2 at their mean value 0, we can use these parameters for two comparisons. The first compares the probabilities of the category responses for each item, and is obtained by taking differences between the cumulative probabilities associated with adjacent thresholds. The second compares the probabilities of observing a response in a particular category or higher (lower) for each item. It is mainly meaningful for interpretation and can be read directly from the threshold parameters.
Figure 6.1¹ shows the estimated probability of observing each category of each test item for a resident with an average perception of tourism benefits and costs, i.e. θ̂1 = 0 and θ̂2 = 0.
For example, an individual with an average perception of tourism benefits will have a higher probability of answering in the higher categories to item B1 than to item B3. The threshold parameters associated with the higher categories, κ3, κ4, κ5 and κ6, are consistently lower for item B1 (κ̂B1,3 = −1.40, κ̂B1,4 = −0.50, κ̂B1,5 = 0.17, κ̂B1,6 = 1.35) than for item B3 (κ̂B3,3 = −1.30, κ̂B3,4 = −0.28, κ̂B3,5 = 0.72, κ̂B3,6 = 2.01). This means that, between economic support and public services, an individual with an average perception of benefits considers the first aspect more relevant. Table 6.3 also shows that the threshold parameters of item B1 for the highest categories, κ5 and κ6, are the lowest among the items on benefits. Residents thus identify economic support as the main and most immediate advantage of tourism.
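The comparison between B1 and B3 can be checked numerically from the threshold estimates in Table 6.3. At θ̂1 = 0 the discrimination parameter drops out, so P(Y ≥ k+1) depends only on −κk through the link function. The sketch assumes a normal-ogive link Φ. Any increasing link preserves the ordering, however, so the conclusion that B1 exceeds B3 for "category 4 or higher" through "category 7" does not depend on that choice.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Posterior means of kappa_1, ..., kappa_6 from Table 6.3 (multiunidimensional GRM).
KAPPA = {
    "B1": [-2.904, -1.920, -1.400, -0.496, 0.171, 1.348],
    "B3": [-3.240, -2.178, -1.301, -0.284, 0.723, 2.010],
}


def prob_at_least(kappas, category):
    """P(Y >= category) for a respondent with theta = 0.

    Assumes P(Y >= k + 1 | theta) = Phi(alpha * theta - kappa_k); at theta = 0
    the discrimination alpha drops out and only the thresholds matter.
    """
    if category == 1:
        return 1.0
    return std_normal_cdf(-kappas[category - 2])


for item, kappas in KAPPA.items():
    # Probabilities of answering in category 4 or higher, ..., category 7.
    print(item, [round(prob_at_least(kappas, c), 3) for c in range(4, 8)])
```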
Similarly, a resident with an average perception of the environmental impact of tourism will have a higher probability of answering in the higher categories to item C1 than to the other cost items (κ̂C1,3 = 0.003, κ̂C1,4 = 0.471, κ̂C1,5 = 0.962, κ̂C1,6 = 1.670). Hence, the cost of life can be regarded as a marginal negative aspect of tourism compared with the other issues.

¹ NOTE: in order to represent the probabilities associated with categories 1 and 7 for each item, a lower bound of −4 and an upper bound of 4 have been fixed.

Figure 6.1. Representation of the thresholds' parameter estimates for the multiunidimensional model (estimated probabilities of categories 1 to 7 for items B1-B5 and C1-C5).
The estimated correlation between the two latent traits is r̂12 = −0.37. The correlation is negative and of moderate size, indicating that the perception of a high economic advantage of tourism is associated with the perception of a strongly negative environmental impact. In conclusion, respondents evaluate benefits and costs differently, which reveals a critical view of the tourism industry, and the multiunidimensional GRM is able to capture this feature.
6.4 Results for the additive GRM
To extend the structure of the multiunidimensional model with a general trait that directly affects all the item responses, we estimated the parameters of the additive GRM. In this model, a general latent trait θ0 is added to the specific traits θ1 and θ2. The two specific traits have the same interpretation as in the multiunidimensional model, namely perceptions of benefits and costs. The general trait can be defined as the overall attitude towards tourism.
The rationale is that the general trait is estimated from the perception of both benefits and costs, but conditionally on the specific effects of the two traits. This allows other residual factors (age, gender, place of residence, occupation, ...) to influence the measure of the overall attitude. Higher attitude scores correspond to residents who perceive greater advantages and a lower negative impact of tourism.
The model parameters were estimated using the proposed OpenBUGS procedure for the additive GRM, with two chains of 30,000 iterations each (15,000 burn-in). The item parameter estimates for the additive model are reported in Table 6.4. For each item, the additive model requires the estimation of the general discrimination parameter (α0ν), the specific discrimination parameter (αν) and the criticity parameters (κ). Again, the items cannot be unambiguously ordered on the basis of the response probabilities.
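In the additive model, both the general trait θ0 and the item's specific trait θν enter the response function, weighted by α0ν and αν. The sketch below evaluates item B3 with the Table 6.4 estimates. It again assumes a normal-ogive form, P(Y ≥ k+1 | θ0, θν) = Φ(α0ν θ0 + αν θν − κk). This form is an illustrative assumption, and the model definition given earlier in the thesis is authoritative. At θ0 = θν = 0 only the thresholds matter, which is why Figure 6.2 can be read through the κ's. Raising θ0 shifts probability towards the higher categories by an amount governed by α0ν.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def additive_grm_probs(theta0, theta_v, alpha0, alpha_v, kappas):
    """Category probabilities for one item under an assumed normal-ogive additive GRM.

    The general trait theta0 and the item's specific trait theta_v enter additively:
    P(Y >= k + 1 | theta0, theta_v) = Phi(alpha0 * theta0 + alpha_v * theta_v - kappa_k).
    """
    eta = alpha0 * theta0 + alpha_v * theta_v
    cumulative = [1.0] + [std_normal_cdf(eta - k) for k in kappas] + [0.0]
    return [cumulative[c] - cumulative[c + 1] for c in range(len(kappas) + 1)]


# Item B3 (public services): posterior means from Table 6.4.
B3 = dict(alpha0=0.446, alpha_v=1.247,
          kappas=[-3.278, -2.200, -1.306, -0.264, 0.760, 2.066])

average_resident = additive_grm_probs(0.0, 0.0, **B3)   # theta0 = theta1 = 0
positive_attitude = additive_grm_probs(1.0, 0.0, **B3)  # general trait one unit higher
```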
Figure 6.2 shows the estimated probability of observing each category of each test item for a resident with an average perception of tourism benefits and costs. For the items on benefits this is a resident characterized by θ̂0 = 0 and θ̂1 = 0, and for the items on costs a resident characterized by θ̂0 = 0 and θ̂2 = 0.
Within the group of items on benefits, economic support (B1) and job opportunities (B4) are associated with higher probabilities of responses in the higher categories. This is because their threshold estimates are generally lower than those of the remaining items (κ̂B1,3 = −1.47, κ̂B1,4 = −0.51, κ̂B1,5 = 0.20, κ̂B1,6 = 1.46 and κ̂B4,3 = −1.60, κ̂B4,4 = −0.56, κ̂B4,5 = 0.41, κ̂B4,6 = 1.64). Residents with an average general perception and an average specific perception of advantages therefore consider economic development and job opportunities the main advantages of tourism.
Moreover, among the items on costs, the cost of life (C1) again has generally lower threshold estimates than the other items, especially for the higher categories (κ̂C1,3 = 0.00, κ̂C1,4 = 0.47, κ̂C1,5 = 0.96 and κ̂C1,6 = 1.68). The cost of life therefore seems to be the least important impact of the tourism industry for a typical respondent.
Item  Item description    α̂0ν     SD     MCSE    α̂ν      SD     MCSE    κ̂1      SD     MCSE    κ̂2      SD     MCSE
B1    Econ. support       0.013   0.012  0.000   1.047   0.074  0.001   -3.049  0.169  0.002   -2.014  0.105  0.001
B2    Quality of life     0.250   0.073  0.001   0.946   0.063  0.001   -2.539  0.125  0.001   -1.789  0.090  0.001
B3    Public services     0.446   0.095  0.002   1.247   0.082  0.001   -3.278  0.188  0.002   -2.200  0.119  0.001
B4    Job opp.            0.144   0.083  0.002   1.290   0.082  0.001   -3.342  0.187  0.002   -2.390  0.125  0.001
B5    Cultural act.       0.343   0.094  0.002   1.194   0.077  0.001   -2.713  0.141  0.001   -1.901  0.103  0.001
C1    Cost of life        0.017   0.016  0.000   0.284   0.042  0.000   -1.491  0.069  0.000   -0.644  0.049  0.000
C2    Crime rate          0.074   0.060  0.001   1.534   0.109  0.002   -1.824  0.123  0.002   -0.534  0.085  0.002
C3    Env. damage         0.051   0.049  0.001   1.343   0.090  0.001   -1.901  0.114  0.002   -0.745  0.080  0.001
C4    Traffic             0.906   0.144  0.005   1.487   0.126  0.004   -1.509  0.134  0.005   -0.397  0.092  0.003
C5    Pollution           0.668   0.113  0.003   1.425   0.103  0.002   -1.646  0.114  0.003   -0.452  0.083  0.002

Item  Item description    κ̂3      SD     MCSE    κ̂4      SD     MCSE    κ̂5      SD     MCSE    κ̂6      SD     MCSE
B1    Econ. support       -1.469  0.087  0.001   -0.507  0.066  0.001   0.204   0.064  0.001   1.458   0.086  0.001
B2    Quality of life     -1.176  0.073  0.001   -0.250  0.061  0.001   0.802   0.067  0.001   1.871   0.094  0.001
B3    Public services     -1.306  0.088  0.001   -0.264  0.071  0.001   0.760   0.078  0.001   2.066   0.116  0.002
B4    Job opp.            -1.595  0.094  0.001   -0.560  0.073  0.001   0.405   0.071  0.001   1.644   0.097  0.002
B5    Cultural act.       -1.256  0.084  0.001   -0.228  0.069  0.001   0.732   0.074  0.001   1.856   0.103  0.001
C1    Cost of life         0.002  0.046  0.000    0.470  0.048  0.000   0.964   0.054  0.000   1.676   0.075  0.000
C2    Crime rate           0.461  0.083  0.001    1.470  0.106  0.002   2.068   0.128  0.002   3.083   0.179  0.003
C3    Env. damage          0.205  0.073  0.001    1.299  0.091  0.001   2.093   0.116  0.002   3.089   0.167  0.002
C4    Traffic              0.700  0.098  0.002    1.660  0.137  0.004   2.566   0.186  0.006   3.381   0.240  0.007
C5    Pollution            0.548  0.085  0.002    1.520  0.110  0.002   2.102   0.133  0.003   3.413   0.200  0.004
NOTE: ν = 1 for the items on benefits and ν = 2 for the items on costs; SD = standard deviation, MCSE = Monte Carlo standard error.
Table 6.4. Item parameter estimates for the additive GRM.

Figure 6.2. Representation of the thresholds' parameter estimates for the additive model (estimated probabilities of categories 1 to 7 for items B1-B5 and C1-C5).
The estimated specific discrimination parameters give results similar to the multiunidimensional case. The most informative items on the specific perception of tourism benefits are public services, job opportunities and cultural activities (items B3, B4 and B5, respectively). Crime rate, traffic and pollution (items C2, C4 and C5, respectively) discriminate best between respondents with different levels of specific perception of tourism costs.
The highest estimated general discrimination parameters are associated with public services (item B3) and cultural activities (item B5) among the benefits, and with traffic (item C4) and pollution (item C5) among the costs of the tourism industry. These aspects therefore have the strongest influence on residents' general attitude towards tourism.
The additive model often fits better than the multiunidimensional model, since the data frequently support the presence of an overall latent trait. This is also the case for our data: the additive model has a lower DIC (8945) than the multiunidimensional model (10950).
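In the form reported by OpenBUGS, DIC = D̄ + pD, where D̄ is the posterior mean of the deviance and pD = D̄ − D̂ is the effective number of parameters. Here D̂ is the deviance evaluated at the posterior means. The sketch below computes DIC from deviance draws and applies the lower-is-better rule to the two values reported in the text. The function name and the toy deviance values in the test are hypothetical.

```python
import statistics


def dic(deviance_draws, deviance_at_posterior_mean):
    """Deviance information criterion from MCMC output.

    Dbar = posterior mean deviance, Dhat = deviance at the posterior means,
    pD = Dbar - Dhat (effective number of parameters), DIC = Dbar + pD.
    Returns (DIC, pD).
    """
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# DIC values reported in the text; the model with the lower DIC is preferred.
DIC_ADDITIVE, DIC_MULTIUNIDIMENSIONAL = 8945, 10950
preferred = "additive" if DIC_ADDITIVE < DIC_MULTIUNIDIMENSIONAL else "multiunidimensional"
```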
As for the previous model, we estimated the correlations between the latent variables (θ0, θ1 and θ2) of the additive model: r̂01 = 0.03, r̂02 = 0.18 and r̂12 = −0.62. The correlation between the benefit and cost latent traits is negative, as in the multiunidimensional model. The correlation between the benefit latent trait and the general attitude is very low. The correlation between the cost latent trait and the general attitude is slightly higher.
6.5 Heterogeneity in resident perceptions
The multiunidimensional and additive models presented in this work are specified without covariates. However, once the measurement process has been carried out, the latent constructs may yield different scores depending on the characteristics of the respondents. To address this issue, we analyse the scores for the general and specific latent traits obtained from the additive model.
To investigate the importance of individual heterogeneity in the evaluation of tourism, the score distributions² of the general and specific latent traits are calculated and compared across some socio-demographic characteristics (Table 6.5).
On average, residents show a positive attitude towards tourism (0.62) and a higher perception of benefits (0.57) than of costs (0.48).
Table 6.5 shows that the youngest respondents have both a significant personal attitude towards tourism and a critical perception of the tourism industry: a high score on the perception of benefits is associated with a low score on the perception of costs. The youngest are thus aware of the advantages of tourism, but they also weigh its negative effects on the community heavily. By contrast, respondents with a low level of education and elderly people show a high attitude towards tourism and a small gap between the benefit and cost scores.
The area of residence also affects the evaluation of the tourism industry. Residents of the tourism municipalities and of the areas where seaside tourism is relevant (Rimini and San Marino) show a large gap between the benefit and cost scores.

² As the scores have different ranges, they have been normalized to the range 0 to 1.

                          θ̂0     θ̂1     θ̂2
Age
  < 25                    0.57   0.61   0.45
  25 - 35                 0.61   0.56   0.53
  35 - 45                 0.61   0.55   0.46
  45 - 55                 0.63   0.57   0.50
  55 - 65                 0.64   0.58   0.50
  ≥ 65                    0.62   0.57   0.48
Gender
  Female                  0.62   0.58   0.47
  Male                    0.61   0.55   0.50
Education
  Primary                 0.62   0.57   0.48
  Lower secondary         0.65   0.57   0.52
  Upper secondary         0.60   0.56   0.45
  University              0.60   0.57   0.48
Provinces
  Forlì-Cesena            0.65   0.54   0.52
  Ravenna                 0.65   0.53   0.53
  Rimini                  0.58   0.62   0.41
  San Marino              0.50   0.62   0.40
Typological locality
  Main town               0.60   0.57   0.49
  Tourism municipality    0.63   0.59   0.47
  Other urban city        0.64   0.55   0.49
Total                     0.62   0.57   0.48
Table 6.5. Normalized mean perception and attitude scores by age, gender, education, province and typological area.
This first survey, which was repeated in 2013, offers useful suggestions for the development of tourism incentive policies, including their relation to residents' well-being.
Chapter 7
Conclusions
This work falls within the context of item response theory (IRT), and in particular it focuses on models for ordinal data. Developing such models matters not only from a theoretical perspective. Several fields of application are characterized by ordinal manifest variables, and proper models for ordinal data avoid the loss of information caused by dichotomization. IRT is widely used in psychology and education, but it also has great potential in the behavioral sciences, where data are often ordinal.
In the past, a common assumption was the presence of a single latent construct underlying the response process. However, real data typically suggest a multidimensional structure. For this reason, multidimensional IRT (MIRT) models have recently been developed. They take into account the complexity of real data and allow for more than one latent trait.
In this work we focus on MIRT models for ordinal data with complex latent structures. Numerous MIRT models can be specified according to several conditions, one of which is the hypothesized underlying latent structure. The models proposed in this work extend the unidimensional graded response model (GRM) (Samejima, 1969) and are characterized by multidimensional latent structures with correlated traits. In particular, we consider two structures. In the multiunidimensional structure, the item responses are affected by specific traits. In the additive structure, the item responses are simultaneously affected by a general trait and by specific traits.
We therefore considered two models: the multiunidimensional and the additive GRMs with correlated traits. The first was chosen because it is widely used and represents a classical approach in MIRT analysis. The second was chosen because it can reflect the complexity of real interactions between items and respondents.
Given the complexity of the proposed models, the estimation procedure is another important aspect of this work. Within a Bayesian approach, we propose a Markov chain Monte Carlo (MCMC) procedure for parameter estimation, which overcomes the problem of analytically intractable expressions. The models are implemented in the open-source software OpenBUGS. Because OpenBUGS allows a flexible and rather easy implementation, it is a good solution to these estimation issues.
To assess item parameter recovery for both the multiunidimensional and the additive GRMs, we performed a simulation study. The study is conducted on a bidimensional case by varying the simulation conditions, namely the number of response categories, the sample size, the test and subtest lengths, and the latent trait correlation structure. In brief, the main simulation results showed that parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size, the parameters of the multiunidimensional and additive GRMs are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories: the worst results are associated with a high number of categories and a short test. Analogous evidence applies to the latent trait correlation estimates.
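The recovery indices are not restated here. Bias and root mean squared error (RMSE) across replications are common choices, and the Python sketch below assumes them. For a parameter with true value psi and estimates psi_1, ..., psi_R from R replications, bias = (1/R) * sum(psi_r - psi) and RMSE = sqrt((1/R) * sum((psi_r - psi)^2)). The averaging over a set of parameters is likewise an illustrative assumption.

```python
import math
from typing import Dict, Sequence


def recovery_summary(true_value: float, estimates: Sequence[float]) -> Dict[str, float]:
    """Bias and RMSE of one parameter's estimates across R replications."""
    r = len(estimates)
    if r == 0:
        raise ValueError("at least one replication is required")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / r
    rmse = math.sqrt(sum(e * e for e in errors) / r)
    return {"bias": bias, "rmse": rmse}


def average_recovery(true_values: Sequence[float],
                     estimates: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Average bias and RMSE over a set of parameters (e.g. all discriminations).

    estimates[p] holds the R replication estimates of parameter p.
    """
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("need one estimate sequence per true value")
    summaries = [recovery_summary(t, est) for t, est in zip(true_values, estimates)]
    n = len(summaries)
    return {
        "bias": sum(s["bias"] for s in summaries) / n,
        "rmse": sum(s["rmse"] for s in summaries) / n,
    }
```

RMSE squared equals bias squared plus the variance of the estimates (computed with divisor R). A large RMSE combined with a small bias therefore indicates variable rather than systematically shifted estimates.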
To verify the applicability of the proposed models in real situations, we estimated them on empirical data. The data were collected to investigate the perceptions of and attitudes towards the tourism industry of residents of Romagna and San Marino. A relevant advantage of the proposed models is that the collected data can be used without any preliminary transformation, hence without any loss of information.
The application study has some limitations. In particular, the choice of the prior distributions, the sample size, the number of item categories, and the test and subtest lengths are important issues that must always be considered and checked.
Lastly, concerning future work on MIRT models for ordinal data and correlated traits, it would first be interesting to perform further simulations with an increased number of latent dimensions. Secondly, this work focuses on two specific underlying latent structures, so an extension to different structures (e.g., hierarchical or higher-order ones) represents a stimulating issue. A final extension could introduce covariates into the model specification, independently of the underlying structure considered.
Bibliography
T.A. Ackerman. Insuring the validity of the reported score scale by reporting multiple scores. Paper presented at the North American Meeting of the Psychometric Society, 1993.
R.J. Adams, M. Wilson, and W-C Wang. The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1):1–23, 1997.
J.H. Albert. Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17(3):251–269, 1992.
E.B. Andersen. Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society. Series B (Methodological), 32:283–301, 1970.
D. Andrich. A rating scale formulation for ordered response categories. Psychometrika, 43(4):561–573, 1978.
D. Andrich. An extension of the Rasch model for ratings providing both location and dispersion parameters. Psychometrika, 47(1):105–113, 1982.
R.J. De Ayala. The theory and practice of item response theory. Guilford Press, 2009.
Mr. Bayes and Mr. Price. An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, FRS communicated by Mr. Price, in a letter to John Canton, AMFRS. Philosophical Transactions (1683-1775), pages 370–418, 1763.
A.A. Béguin and C.A.W. Glas. MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4):541–562, 2001.
C. Bernini, A. Guizzardi, and G. Angelini. DEA-like model and common weights approach for the construction of a subjective community well-being indicator. Social Indicators Research, 114(2):405–424, 2013.
A. Birnbaum. Some latent trait models and their use in inferring an examinee's ability. In F.M. Lord and M.R. Novick, editors, Statistical theories of mental test scores. Addison-Wesley, Reading MA, 1968.
R.D. Bock. Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1):29–51, 1972.
S.P. Brooks and A. Gelman. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4):434–455, 1998.
A.R. Brown, S.J. Finney, and M.K. France. Using the bifactor model to assess the dimensionality of the Hong Psychological Reactance Scale. Educational and Psychological Measurement, 71(1):170–185, 2011.
F.F. Chen, S.G. West, and K.H. Sousa. A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41(2):189–225, 2006.
S.M. Curtis. BUGS code for item response theory. Journal of Statistical Software, 36(1):1–34, 2010.
M.C. Edwards. A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75(3):474–497, 2010a.
M.C. Edwards. MultiNorm User's Guide: multidimensional normal ogive item response theory models. mimeo, 2010b.
P.J. Ferrando. Likert scaling using continuous, censored, and graded response models: effects on criterion-related validity. Applied Psychological Measurement, 23(2):161–175, 1999.
G.H. Fischer. Derivations of the Rasch model. In G.H. Fischer and I.W. Molenaar, editors, Rasch models: Foundations, recent developments and applications. Springer-Verlag, New York, 1995.
J-P Fox. Bayesian item response modeling. Springer, 2010.
J-P Fox and C.A.W. Glas. Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66(2):271–288, 2001.
Z-H Fu, J. Tao, and N-Z Shi. Bayesian estimation of the multidimensional graded response model with nonignorable missing data. Journal of Statistical Computation and Simulation, 80(11):1237–1252, 2010.
D. Gamerman. Markov Chain Monte Carlo. Chapman and Hall, London, 1997.
A.E. Gelfand and A.F.M. Smith. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85:398–409, 1990.
A. Gelman. Objections to Bayesian statistics. Bayesian Analysis, 3(3):445–449, 2008.
A. Gelman and D.B. Rubin. Inference from iterative simulation using multiple sequences. Statistical Science, pages 457–472, 1992.
A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin. Bayesian Data Analysis. CRC Press, London, 2003.
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
R.D. Gibbons, A.J. Rush, and J.C. Immekus. On the psychometric validity of the domains of the PDSQ: An illustration of the bi-factor item response theory model. Journal of Psychiatric Research, 43(4):401–410, 2009.
W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov chain Monte Carlo in practice. Chapman and Hall, London, 1996.
H. Gu and C. Ryan. Place attachment, identity and community impacts of tourism: the case of a Beijing hutong. Tourism Management, 29(4):637–647, 2008.
D.C. Haley. Estimation of the dosage mortality relationship when the dose is subject to error. Technical Report 15 (Office of Naval Research Contract No. 25140, NR-342-022), Stanford University: Applied Mathematics and Statistics Laboratory, 1952.
R.K. Hambleton and H. Swaminathan. Item Response Theory: Principles and applications. Kluwer Nijhoff Publishing, Boston, 1985.
R.K. Hambleton, H. Swaminathan, and H.J. Rogers. Fundamentals of Item Response Theory. Sage Publications, Newbury Park, CA, 1991.
W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
K.J. Holzinger and F. Swineford. The bi-factor method. Psychometrika, 2(1):41–54, 1937.
H-Y Huang, W-C Wang, P-H Chen, and C-M Su. Higher-order item response models for hierarchical latent traits. Applied Psychological Measurement, 37(8):619–637, 2013.
M.S. Johnson. Marginal maximum likelihood estimation of item response models in R. Journal of Statistical Software, 20(10):1–24, 2007.
R. Likert. A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1932.
F.M. Lord. A theory of test scores. Psychometric Monograph No. 7, 1952.
F.M. Lord. Applications of item response theory to practical testing problems. Lawrence Erlbaum, Hillsdale, NJ, 1980.
F.M. Lord and M.R. Novick. Statistical theories of mental test scores. Addison-Wesley, Reading, MA, 1968.
D. Lunn, D. Spiegelhalter, A. Thomas, and N. Best. The BUGS project: Evolution, critique and future directions. Statistics in Medicine, 28(25):3049–3067, 2009.
D. Lunn, C. Jackson, N. Best, A. Thomas, and D. Spiegelhalter. The BUGS Book. CRC Press, Taylor & Francis Group, 2013.
D.J. Lunn, A. Thomas, N. Best, and D. Spiegelhalter. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10(4):325–337, 2000.
G.N. Masters. A Rasch model for partial credit scoring. Psychometrika, 47(2):149–174, 1982.
M. Matteucci. Item Response Theory models for the competence evaluation: towards a multidimensional approach in the University guidance. PhD thesis, University of Bologna, 2007.
N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1953.
E. Muraki. A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2):159–176, 1992.
E. Muraki. POLYFACT [computer software]. Princeton, NJ: Educational Testing Service, 1993.
E. Muraki and J.E. Carlson. Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19(1):73–90, 1995.
I. Ntzoufras. Bayesian modeling using WinBUGS, volume 698. Wiley, 2011.
R. Ostini and M.L. Nering. Polytomous Item Response Theory Models (Quantitative Applications in the Social Sciences, Vol. 144). Sage Publications, Thousand Oaks, CA, 2006.
R.J. Patz and B.W. Junker. Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4):342–366, 1999a.
R.J. Patz and B.W. Junker. A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146–178, 1999b.
M. Plummer. JAGS Version 2.2.0 user manual, 2010.
G. Rasch. Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research, Copenhagen, 1960.
M.D. Reckase. The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9(4):401–412, 1985.
M.D. Reckase. The past and future of multidimensional item response theory. Applied Psychological Measurement, 21(1):25–36, 1997.
M.D. Reckase. Multidimensional Item Response Theory. Springer, London, 2009.
B.B. Reeve. An introduction to modern measurement theory. National Cancer Institute, 2002.
J. Rost. Measuring attitudes with a threshold model drawing on a traditional scaling concept. Applied Psychological Measurement, 12(4):397–409, 1988.
F. Samejima. Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph Supplement No. 17, 1969.
F. Samejima. Evaluation of mathematical models for ordered polychotomous responses. Behaviormetrika, 23(1):17–35, 1996.
J. Schmid and J.M. Leiman. The development of hierarchical factor solutions. Psychometrika, 22(1):53–61, 1957.
Y. Sheng. A MATLAB package for Markov chain Monte Carlo with a multi-unidimensional IRT model. Journal of Statistical Software, 28(10):1–20, 2008.
Y. Sheng. Bayesian estimation of MIRT models with general and specific latent traits in MATLAB. Journal of Statistical Software, 34(3):1–26, 2010.
Y. Sheng and C.K. Wikle. Comparing multiunidimensional and unidimensional item response theory models. Educational and Psychological Measurement, 67(6):899–919, 2007.
Y. Sheng and C.K. Wikle. Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68(3):413–430, 2008.
Y. Sheng and C.K. Wikle. Bayesian IRT models incorporating general and specific abilities. Behaviormetrika, 36(1):27–48, 2009.
D. Spiegelhalter, A. Thomas, N. Best, and W. Gilks. BUGS 0.5: Bayesian inference using Gibbs sampling - Manual (version ii). MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK, 1996.
L. Steinberg and D. Thissen. Item response theory in personality research. Lawrence Erlbaum Associates, Hillsdale, NJ, 1995.
R. Tate. A comparison of selected empirical methods for assessing the structure of responses to test items. Applied Psychological Measurement, 27(3):159–203, 2003.
D. Thissen and L. Steinberg. A response model for multiple choice items. Psychometrika, 49(4):501–519, 1984.
D. Thissen and L. Steinberg. A taxonomy of item response models. Psychometrika, 51(4):567–577, 1986.
D. Thissen, L. Nelson, K. Rosa, and L.D. McLeod. Item response theory for items scored in more than two categories. In D. Thissen and H. Wainer, editors, Test scoring, pages 141–186. Psychology Press, 2001.
A. Thomas, B. O'Hara, U. Ligges, and S. Sturtz. Making BUGS Open. R News, 6(1):12–17, 2006. URL http://cran.r-project.org/doc/Rnews/.
R.E. Traub. A priori considerations in choosing an item response model. In R.K. Hambleton, editor, Applications of item response theory, pages 57–70. Educational Research Institute of British Columbia, Vancouver, BC, 1983.
L.A. van der Ark. Relationships and properties of polytomous item response theory models. Applied Psychological Measurement, 25(3):273–282, 2001.
L.A. van der Ark, D.W. van der Palm, and K. Sijtsma. A latent class approach to estimating test-score reliability. Applied Psychological Measurement, 35(5):380–392, 2011.
W.J. van der Linden and R.K. Hambleton. Handbook of Modern Item Response Theory. Springer-Verlag, New York, 1997.
W-C Wang, P-H Chen, and Y-Y Cheng. Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1):116–136, 2004.
L. Yao and R.D. Schwarz. A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30(6):469–492, 2006.
Appendix A
OpenBUGS code for implemented models
A.1 OpenBUGS code: multiunidimensional and additive models for graded responses
In this section we report the code used to implement the multiunidimensional model and the additive model for graded responses.
Initial values for the following quantities must be set and loaded by the user before running the models: m.theta, Sigma.theta, m.alpha, s.alpha, m.alpha0, s.alpha0, m.kappa and s.kappa (m.alpha0 and s.alpha0 refer only to the additive model).
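These quantities can also be prepared programmatically. The hypothetical Python helper below writes them in the `list(...)` syntax that OpenBUGS reads for data and initial-value files, giving a matrix as `structure(.Data = c(...), .Dim = c(rows, cols))`. It is not part of the thesis code, and several assumptions are made for illustration:

- all numeric values are placeholders;
- m.theta is a mean vector and Sigma.theta a matrix over `n_traits` traits;
- the remaining quantities are scalars.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, Sequence[float], Sequence[Sequence[float]]]


def _fmt(x: float) -> str:
    """Format a number so that it is written as a real value."""
    return repr(float(x))


def to_bugs_list(data: Dict[str, Value]) -> str:
    """Write scalars, vectors and matrices in the BUGS list(...) syntax."""
    parts: List[str] = []
    for name, value in data.items():
        if isinstance(value, (int, float)):
            text = _fmt(value)
        elif value and isinstance(value[0], (list, tuple)):
            rows, cols = len(value), len(value[0])
            # Flattened row by row; Sigma.theta below is symmetric, so the
            # fill order of .Data does not matter for it.
            flat = ", ".join(_fmt(v) for row in value for v in row)
            text = "structure(.Data = c({}), .Dim = c({}, {}))".format(flat, rows, cols)
        else:
            text = "c({})".format(", ".join(_fmt(v) for v in value))
        parts.append("{} = {}".format(name, text))
    return "list(" + ", ".join(parts) + ")"


def hyperparameters(model: str, n_traits: int = 2, rho: float = 0.5) -> Dict[str, Value]:
    """Hypothetical placeholder values for the quantities listed above."""
    if model not in ("multiunidimensional", "additive"):
        raise ValueError("unknown model: " + model)
    data: Dict[str, Value] = {
        "m.theta": [0.0] * n_traits,
        "Sigma.theta": [[1.0 if i == j else rho for j in range(n_traits)]
                        for i in range(n_traits)],
        "m.alpha": 0.0, "s.alpha": 1.0,
        "m.kappa": 0.0, "s.kappa": 1.0,
    }
    if model == "additive":
        # m.alpha0 and s.alpha0 are needed only by the additive model.
        data["m.alpha0"] = 0.0
        data["s.alpha0"] = 1.0
    return data


if __name__ == "__main__":
    print(to_bugs_list(hyperparameters("additive")))
```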
Appendix B
R procedures for the simulation study
The following sections report the code used to perform the simulation study for both the multiunidimensional and the additive GRMs. For each model, we describe the procedure for a single scenario, i.e., a particular set of simulation conditions that can be set at the beginning of the procedure.
The simulation study was conducted with an R procedure that generates the objects of interest and calls OpenBUGS through the R package BRugs. The main advantage of combining R and OpenBUGS is the possibility of creating an automatic routine that completes all replications within a given scenario.
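The structure of such an automatic routine can be sketched outside R. The Python code below is a hypothetical analogue of one scenario of the multiunidimensional GRM: it generates `n_rep` datasets with two correlated specific traits and passes each to an optional `fit` callable. `fit` stands in for the OpenBUGS call made through BRugs and is not implemented here. The logistic link, the uniform(0.5, 2) discriminations and the equally spaced thresholds are illustrative assumptions, not the simulation settings of the thesis.

```python
import math
import random
from typing import Any, Callable, Dict, List, Optional


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def draw_category(rng: random.Random, eta: float, kappa: List[float]) -> int:
    """Sample a graded response in 1..K from a logistic GRM with thresholds kappa."""
    cum = [1.0] + [logistic(eta - k) for k in kappa]  # cum[k] = P(y >= k + 1)
    u = rng.random()
    # cum is decreasing, so y - 1 is the last index k with u < cum[k].
    return max(k for k in range(len(cum)) if u < cum[k]) + 1


def simulate_dataset(rng: random.Random, n: int, items_per_subtest: int,
                     n_cat: int, rho: float) -> Dict[str, Any]:
    """One dataset from a bidimensional multiunidimensional GRM (hypothetical settings)."""
    kappa = [-2.0 + 4.0 * (k + 1) / n_cat for k in range(n_cat - 1)]
    alpha = [rng.uniform(0.5, 2.0) for _ in range(2 * items_per_subtest)]
    traits, responses = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta = (z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr = rho
        traits.append(theta)
        # Items 0..J-1 form subtest 1 (theta[0]); items J..2J-1 form subtest 2 (theta[1]).
        responses.append([draw_category(rng, alpha[j] * theta[j // items_per_subtest], kappa)
                          for j in range(2 * items_per_subtest)])
    return {"alpha": alpha, "kappa": kappa, "traits": traits, "responses": responses}


def run_scenario(n_rep: int, n: int, items_per_subtest: int, n_cat: int, rho: float,
                 seed: int = 2014,
                 fit: Optional[Callable[[Dict[str, Any]], Any]] = None) -> List[Any]:
    """Generate n_rep replications of one scenario; pass each to fit() if given."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rep):
        data = simulate_dataset(rng, n, items_per_subtest, n_cat, rho)
        results.append(fit(data) if fit is not None else data)
    return results
```

Keeping the true `alpha`, `kappa` and `traits` in each dataset allows the estimates returned by `fit` to be compared with the generating values for every replication.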
For further details about all the available functions and features of the package
BRugs, see Thomas et al. (2006).
B.1 Multiunidimensional GRM: R code
B.2 Additive GRM: R code
Appendix C
Survey questionnaire
In this section we report the questionnaire administered to residents of the Romagna area (Italy) and of the State of San Marino. The questionnaire was designed to investigate the following:

- residents' evaluations of the costs and benefits of the tourism industry;
- the personal benefit from tourism;
- the quality of life in the area;
- the degree of involvement in the tourism industry;
- residents' satisfaction with both their leisure and non-leisure domains;
- their quality of life;
- the degree of support for future development of the tourism industry.
RIMINI RESIDENTS QUESTIONNAIRE
Interviewer no. ___ Respondent no. ____
Good morning, this is a survey coordinated by lecturers of the University of Rimini to learn citizens' opinions on the quality of life. May we have your opinion too? It will take only a few minutes and your answers will remain completely anonymous ... (move straight on to the next question).
1. Rate Romagna with respect to: / On a scale from 1 to 7, how do you rate these aspects of Romagna (1 = minimum satisfaction, 7 = maximum satisfaction):
   1. standard of living ____
   2. provision of public services ____
   3. traffic ____
   4. cleanliness of the city and green spaces ____
   5. hospitality and welcome ____
   6. job/career opportunities ____
   7. recreational and cultural activities ____
   8. safety ____
2. On a scale from 1 to 7 (1 = minimum agreement, 7 = maximum agreement), how much do you agree with the following statements: the tourism industry:
   1. has improved my quality of life ____
   2. has made Rimini the best place to spend my free time ____
   3. has made Rimini a city that allows me to fulfil myself ____
3. Give a score from 1 to 7 to the following benefits that tourism brings to Romagna (1 = minimum benefit, 7 = maximum benefit):
   1. develops the city's economy ____
   2. improves the standard/quality of living ____
   3. develops public services ____
   4. increases job opportunities ____
   5. improves cultural activities ____
4. Give a score from 1 to 7 to the following problems that tourism brings to Romagna (1 = minimum problem, 7 = maximum problem):
   1. Increases the cost of living and of housing ____
   2. Increases disorder and crime ____
   3. Damages the environment and the landscape ____
   4. Increases traffic ____
   5. Increases pollution ____
5. On a scale from 1 to 7, how satisfied are you with the following aspects of your life (past year):
   1. Economic situation ____
   2. Health ____
   3. Family relationships ____
   4. Relationships with friends ____
   5. Work ____
   6. Spirituality/religion ____
6. On a scale from 1 to 7, how satisfied are you with the activities you do in your free time (past year):
   1. social relationships ____
   2. sports/fitness activities ____
   3. personal hobbies ____
   4. cultural activities (cinema, theatre, etc.) ____
   5. recreational activities (restaurants, discos, etc.) ____
   6. shopping ____
   7. going to the beach/seaside ____
7. On a scale from 1 to 7 (1 = minimum satisfaction, 7 = maximum satisfaction), how much (past year):
   1. are you satisfied with how things have gone in your life ____
   2. are you satisfied with most aspects of your life ____
   3. do you find satisfaction in thinking about what you have managed to achieve in life ____
   4. are you satisfied with who you are when you compare yourself with friends and family ____
8. On a scale from 1 to 7 (1 = minimum agreement, 7 = maximum agreement), how much do you agree with the following statements:
   1. I am in favour of developing beach tourism ____
   2. I am in favour of developing the cultural and recreational activities of my city ____
   3. I am in favour of developing trade fairs and sporting events ____
9. Overall, are you in favour of developing the tourism industry in Romagna: yes no
10. To improve the quality of life in Romagna, what would you suggest doing (only 1 proposal) ..............................
11. Is your job in any way related to the tourism market:
   Yes, I work in an activity related to the tourism sector ...... yes no
   Occasionally or in the past I have worked in activities related to the tourism sector ...... yes no
   My family members' work is related to the tourism sector ...... yes no
12. Do you...:
   1. regularly read newspapers ...... yes no
   2. do sport regularly ...... yes no
   3. often go to art exhibitions, museums or the theatre ...... yes no
   4. often travel on holiday ...... yes no
   5. often browse the Internet from home ...... yes no
   6. shop online ...... yes no
   7. do volunteer work and/or politics ...... yes no
   8. go to church or another place of religious worship ...... yes no
Thank you for your kind cooperation. To conclude, may I also ask you:
13. Your age: _______
14. Gender:
   Male ...... 1
   Female ...... 2
15. In which city do you live? _____________________
16. How many years have you lived in Romagna? _______
17. Your marital status:
   single ...... 1
   married ...... 2
   separated/divorced ...... 3
   widowed ...... 4
18. What is your educational qualification:
   Primary school certificate ...... 1
   Lower secondary school certificate ...... 2
   Upper secondary diploma ...... 3
   University degree ...... 4
19. And your profession? (only 1 answer):
   Manager / Executive / Registered professional ...... 1
   Entrepreneur / Self-employed / Craftsman ...... 2
   Office worker or middle manager ...... 3
   Teacher (secondary or primary school, etc.) ...... 4
   Manual worker ...... 5
   Homemaker ...... 6
   Student ...... 7
   Retired ...... 8
   Looking for work ...... 9
   Other ...... 10
   Specify______________________________________