Alma Mater Studiorum - Università di Bologna

PhD PROGRAMME IN STATISTICAL METHODOLOGY FOR SCIENTIFIC RESEARCH

Cycle XXVI
Competition sector: 13/D1
Scientific disciplinary sector: SECS-S/01

MULTIDIMENSIONAL ITEM RESPONSE THEORY MODELS WITH GENERAL AND SPECIFIC LATENT TRAITS FOR ORDINAL DATA

Presented by: Irene Martelli
PhD Coordinator: Prof. Angela Montanari
Supervisor: Prof. Stefania Mignani

Final examination year: 2014

To my family and Lorenzo, for their love and support.

Abstract

The aim of the thesis is to propose a Bayesian estimation, through Markov chain Monte Carlo, of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures: the first is widely used and represents a classical approach in multidimensional item response analysis, while the second is able to reflect the complexity of real interactions between items and respondents. A simulation study is conducted to evaluate the parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive graded response models are well reproduced.
The results are also affected by the trade-off between the number of items constituting the test and the number of item categories. An application of the proposed models to response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.

Acknowledgements

First and foremost I want to thank my supervisor Stefania Mignani for her constant attention, care and belief. Then, I would like to express my gratitude to Mariagiulia Matteucci for her precious and fundamental suggestions and supervision during the preparation of the thesis. My work would not have been successful without her. Appreciation is extended to Cristina Bernini, who provided the data. My special thanks to my friends and colleagues Lucia and Violeta for their support during the whole period of the PhD.

Preface

Item response theory (IRT) falls within the wide context of the measurement of theoretical latent constructs, which are not observable by definition and can only be determined indirectly, through the use of other manifest variables. IRT is extensively used in educational and psychological fields, where usually a test consisting of a set of items is submitted to a sample of examinees to infer the individuals' unobservable characteristics (abilities). To this aim, IRT (Hambleton and Swaminathan, 1985; van der Linden and Hambleton, 1997) represents the main methodological approach for estimating both the psychometric properties of the items and the subjects' scores. Moreover, IRT shows great potential in applications within the behavioral sciences. In the past, unidimensionality, i.e. the presence of a unique construct underlying the response process, was one of the most common assumptions. Nevertheless, real data often suggest a multidimensional structure and, in order to infer such distinct latent traits, tests should include different subtests.
For this reason, models that allow the presence of more than one latent trait have been recently developed. The so-called multidimensional IRT (MIRT) models (see e.g., Reckase, 2009) are able to describe the complexity of the data, taking into account correlated abilities and also a possible hierarchical structure of latent traits. This is the reason why MIRT models fit the subtests better than separate unidimensional models do. Several approaches are possible within the multidimensional perspective: exploratory models, where all latent traits are allowed to affect all the item responses, or confirmatory models, where all the relations between observed and latent variables need to be specified in advance. By using a confirmatory approach, it is also possible to assume the simultaneous presence of general and specific latent traits underlying the response process (Sheng and Wikle, 2008). A further distinction can be made between noncompensatory and compensatory models; in the latter, a deficit in one trait can be offset by a higher level of another (Reckase, 2009).

In several applications, data are characterized by hierarchical structures, and the introduction of different levels for the latent dimensions makes it possible to specify more general models. Specifically, a proper hierarchy can be assumed to underlie the response process, where the highest level is associated with the overall trait, while dimensions representing more specific traits are located on lower levels of the hierarchy. High-order and additive models are two approaches that make it possible to include a general trait in addition to multiple specific traits. In particular, in additive models we can directly analyze the strength of the relationships between the specific latent traits and the associated test items, as well as the strength of the relationships between the general latent trait and all the test items. This feature is particularly appealing for complex applications.
A final distinction can be made according to the nature of the observable variables. Usually, in an educational testing framework we deal with binary items (i.e. correct/incorrect), while in psychological and behavioral research items are typically ordinal, representing judgments or agreements. In a unidimensional context, different models for ordinal data have been developed according to the number of item parameters (e.g. partial credit models, graded response models). On the contrary, within a multidimensional context, models for binary data are usually applied and, often, the available ordinal data are dichotomized, with a consequent loss of information. Models for ordinal data remain uncommon and were developed only for uncorrelated latent traits.

For these reasons, in this work we propose an extension of the unidimensional graded response model (Samejima, 1969) for ordinal data to multidimensional structures with correlated traits, namely the multiunidimensional and the additive structures. A further innovative and important aspect of our proposal concerns the estimation procedure: we propose a Markov chain Monte Carlo (MCMC) procedure for parameter estimation, which we implement using the open-source software OpenBUGS.

Structure of the thesis

In the first chapter some fundamental notions about IRT are introduced. A first section illustrates the basic concepts and definitions characterizing the IRT approach, with a brief description of unidimensional models for binary data. A second section focuses on unidimensional models for ordinal data and, in particular, on Samejima's model for graded responses. A final section explains the reasons that have driven several developments of IRT towards its multidimensional generalization.

The second chapter introduces the MIRT approach.
In the first section the main features of these models are described, while in the second section a brief review of MIRT models for both binary and ordinal responses is reported, together with a brief description of their most common estimation methods.

In the third chapter the main principles characterizing Bayesian estimation in the MIRT context are introduced. The first section describes the general Bayesian framework, while the second section presents the available Bayesian estimation methods based on MCMC techniques. The third section briefly introduces the functioning of OpenBUGS, which makes it easy to run the most common MCMC algorithm, i.e. the Gibbs sampler.

In the fourth chapter two MIRT models for ordinal data with a complex structure are introduced in terms of specification, interpretation and estimation. The focus is on two MIRT models for graded responses and correlated latent traits: the multiunidimensional model, where the items in each subtest characterize a single ability, and the additive model, where each item directly measures a general and a specific ability.

The fifth chapter describes a simulation study conducted in order to evaluate the parameter recovery of the estimation method for the proposed models. The simulation study design is illustrated in the first section, while the second and the third sections report the results of the simulations performed for the multiunidimensional and the additive models for ordinal data, respectively.

In the sixth chapter an application of the proposed models to real data is presented. The application focuses on the investigation of residents' perceptions and attitudes towards the tourism industry.

In the seventh chapter conclusions and further research on applicative and methodological aspects are discussed.

Contents

1 An introduction to item response theory (IRT)
  1.1 Basic concepts and definitions
    1.1.1 The concept of model in IRT
    1.1.2 IRT unidimensional models for binary data
  1.2 IRT unidimensional models for ordinal data
    1.2.1 Samejima's unidimensional graded response model
    1.2.2 Other unidimensional IRT models for graded responses
  1.3 Towards multidimensional models
2 Multidimensional IRT (MIRT) models: a review
  2.1 Main features of MIRT models
    2.1.1 Compensatory and noncompensatory approaches
    2.1.2 Confirmatory and exploratory approaches
    2.1.3 Underlying latent structures
  2.2 MIRT models for binary data
  2.3 MIRT models for ordinal data
  2.4 Estimation methods
3 Bayesian estimation of MIRT models
  3.1 Elements of Bayesian statistics in MIRT context
    3.1.1 Prior distribution choice
    3.1.2 Bayes' Theorem
    3.1.3 Marginal posterior distributions for model parameters
  3.2 Markov chain Monte Carlo methods
    3.2.1 Metropolis-Hastings algorithm
    3.2.2 Gibbs sampler
  3.3 Bayesian computation using OpenBUGS
4 MIRT graded response models with complex structures
  4.1 MIRT graded response models (GRMs)
    4.1.1 Specification of the multiunidimensional GRM
    4.1.2 Specification of the additive GRM
  4.2 Person and item parameters: interpretation
    4.2.1 Ability parameters
    4.2.2 Multidimensional item discrimination
  4.3 Multiunidimensional GRM implementation
    4.3.1 Model specification
    4.3.2 Prior distributions
    4.3.3 Likelihood function for responses
  4.4 Additive GRM implementation
    4.4.1 Model specification
    4.4.2 Prior distributions
    4.4.3 Likelihood function for responses
5 Simulation Study
  5.1 Simulation study design
    5.1.1 Parameter recovery
    5.1.2 Estimated ability correlations
    5.1.3 Convergence detection
    5.1.4 Bayesian fit
    5.1.5 General simulation conditions
  5.2 Multiunidimensional GRM: simulations and results
    5.2.1 Simulation conditions
    5.2.2 Results
  5.3 Additive GRM: simulations and results
    5.3.1 Simulation conditions
    5.3.2 Results
6 Application to real data: residents' attitudes towards tourism
  6.1 Interpretation of model parameters
  6.2 Research design
  6.3 Results for the multiunidimensional GRM
  6.4 Results for the additive GRM
  6.5 Heterogeneity in resident perceptions
7 Conclusions
Bibliography
Appendices
A OpenBUGS code for implemented models
  A.1 OpenBUGS code: multiunidimensional and additive models for graded responses
B R procedures for the simulation study
  B.1 Multiunidimensional GRM: R code
  B.2 Additive GRM: R code
C Survey questionnaire

List of Tables

4.1 Main features of the proposed multiunidimensional and additive models for graded responses.
5.1 Simulation conditions for the multiunidimensional model for graded responses.
5.2 Multiunidimensional model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.3 Multiunidimensional model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.4 Multiunidimensional model: real (r) and estimated (r̂) ability correlations.
5.5 Simulation conditions for the additive model for graded responses.
5.6 Additive model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.7 Additive model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.8 Additive model: block 2 simulation results for subtest 1 (median RMSEs and median absolute biases).
5.9 Additive model: block 2 simulation results for subtest 2 (median RMSEs and median absolute biases).
5.10 Additive model: real (r) and estimated (r̂) ability correlations.
6.1 Profile of respondents.
6.2 Response frequencies for items about tourism benefits (B1-B5) and items about tourism costs (C1-C5).
6.3 Item parameter estimates for the multiunidimensional GRM.
6.4 Item parameter estimates for the additive GRM.
6.5 Normalized mean perception and attitude scores by age, gender, education, province and typological area.

List of Figures

1.1 Item characteristic curve for a binary item.
1.2 Item response functions for an item with five categories.
1.3 Dichotomization of polytomous item responses; the dashed line indicates the observed category response.
2.1 Consecutive unidimensional latent structure.
2.2 Multiunidimensional latent structure.
2.3 Bi-factor latent structure.
2.4 Hierarchical latent structures.
2.5 Additive latent structure.
4.1 Dichotomization used for the MIRT graded response model specification. The dashed line indicates the observed category response.
5.1 Bidimensional case for multiunidimensional and additive structures.
5.2 Examples of stationary chains.
6.1 Representation of the thresholds' parameter estimates for the multiunidimensional model.
6.2 Representation of the thresholds' parameter estimates for the additive model.

Chapter 1

An introduction to item response theory (IRT)

In this chapter we introduce the fundamental notions concerning item response theory (IRT).
A brief description of IRT models for binary and ordinal data is given. Particular attention is paid to Samejima's unidimensional model for graded responses, which represents the starting point for a generalization to a multidimensional context.

1.1 Basic concepts and definitions

IRT falls within the wide context of the measurement of theoretical latent constructs. A latent construct is not observable by definition and can only be determined indirectly, through the use of other manifest variables. Examples of latent constructs are the mathematics achievement of students, the satisfaction of a customer with a product or service, psychological status, and all the situations that may refer to the concept of perception, e.g. depression and happiness. Another relevant field of application of IRT methods is represented by the behavioral sciences, where the manifest variables, which are often ordinal, express a judgment about, or an agreement with, the phenomenon of interest.

If we consider the educational and psychological fields, where IRT is extensively used, we can say that the final aim of IRT is to measure the abilities and attitudes of individuals through their responses to a number of test items. In other words, by using IRT models, we wish to determine the position of the individual along some latent dimensions, representing the unobservable characteristics of the individuals. In the IRT literature the latent traits are commonly called abilities, because of the intensive use of IRT methods in the educational field, where the constructs are represented by the students' latent abilities. The analysis of the relation between latent continuous variables and observed categorical variables is known in the statistical literature as latent trait analysis; for this reason, in this thesis the terms abilities, latent abilities and latent traits all refer to the same concept.
The use of IRT as a measurement theory is fairly recent: the pioneering work of Lord and Novick (1968) gives a first formalization of the theory, building on ideas and principles that arose in the 1930s and 1940s. Improvements of IRT were driven by the need to overcome the shortcomings of classical test theory (CTT), for instance its sensitivity to sample conditions and the fact that in CTT individual abilities and test characteristics can be interpreted only in the same context (Hambleton et al., 1991). Moreover, IRT focuses on the item rather than on the individual score, whereas CTT does not include the evaluation of test properties and item characteristics. IRT, by contrast, makes it possible to evaluate individual ability and to describe the performance of the test items simultaneously. For these reasons, IRT appeared to be a promising alternative to CTT in both theoretical and applied fields, offering a wide and effective framework.

1.1.1 The concept of model in IRT

In IRT a model is defined by a mathematical function used to describe the conditional probability of a response given the latent ability, for an item with categorical responses (Thissen and Steinberg, 1986). The mathematical function expresses how an examinee with a high position on a latent trait is likely to provide a different response from an examinee with a low position on the trait (Ostini and Nering, 2006). The parametric model describes the relationship between the "observable", i.e. the examinee's performance in the test, and the "unobservable", the latent ability.

In general, different models can be specified depending on:

• The structure of the data: binary or polytomous (nominal or ordinal) responses;
• The number of latent dimensions: unidimensional or multidimensional models;
• The distribution functions used to link responses and ability(ies);
• The number of item parameters introduced in the model.
Concerning the first point, IRT makes it possible to specify different models depending on the kind of items we are dealing with, i.e. items with two response categories or items with more than two response categories (which, in turn, can be ordered or not). The second point is a crucial choice in the model specification procedure: when only one ability affects the responses we are assuming unidimensionality, while when we need two or more latent traits to describe the correlation among the responses we are assuming multidimensionality. Moreover, the model depends on the probability distribution used to describe the relationship between the response and the examinee's ability(ies). The most common choices are the normal distribution function (normal ogive models) and the logistic distribution function (logit models). Finally, a distinction can be made with reference to the number of item parameters (one, two or three) introduced in the model to describe the item characteristics.

1.1.2 IRT unidimensional models for binary data

In order to illustrate the basic concepts and assumptions of IRT and to introduce the notation, we start from the simplest models: the unidimensional models for dichotomous responses (i.e. correct and incorrect). In this context there are three fundamental assumptions.

The first assumption states that only one latent ability affects the item responses (unidimensionality assumption).

The second assumption states that a change in the probability of a correct response, due to a change in the examinee's latent ability, is completely described by the item characteristic curve (ICC). Thus, the ICC describes how the probability of a response to an item changes relative to a change in the latent trait. As illustrated before, different distribution functions used to link responses and ability, i.e. different mathematical forms of the ICC, lead to different IRT models.
In any case the probability of a correct response is expressed as a function of person and item parameters.

The third assumption is the so-called local independence assumption: responses to a pair of items are statistically independent given the underlying latent ability. Local independence holds when the assumption of unidimensionality is true. Let us consider a random vector of $p$ item responses for the $i$-th subject ($i = 1, \ldots, n$), denoted by $Y_i$, and the corresponding observed responses, $y_i = (y_{i1}, \ldots, y_{ip})$. $\theta_i$ is the ability of examinee $i$. The assumption of local independence can be stated as:

$$P(y_i \mid \theta_i) = P(y_{i1} \mid \theta_i)\, P(y_{i2} \mid \theta_i) \cdots P(y_{ip} \mid \theta_i) = \prod_{j=1}^{p} P(y_{ij} \mid \theta_i).$$

When local independence holds, there is one latent variable underlying the responses and, conditionally on this latent variable, responses are assumed to be independent.

The unidimensional IRT model for binary data expresses the probability $\pi_{ij}$ of a correct response by subject $i$ to item $j$ as a function of the predictor $\eta_{ij}$, which depends on $\theta_i$ and on $\xi_j$, the vector of parameters characterizing item $j$, for $j = 1, \ldots, p$:

$$\eta_{ij} = f(\theta_i, \xi_j). \tag{1.1}$$

The so-called probit or normal ogive model (1.2) is obtained when a normal distribution is used, whereas the logistic distribution gives the logit model (1.3).¹

¹ Normal ogive models and logistic models have different ICCs for equivalent sets of item parameter values. It can be shown (Haley, 1952; Birnbaum, 1968) that the two formulations are practically equivalent in terms of predicted probability once a scaling constant 1.702 is introduced into the logistic model, in order to balance the differences in the ICCs. When this constant is introduced in the model, the predicted probabilities differ by less than 0.01 for each level of ability (Haley, 1952): $|\Phi(\eta_{ij}) - \exp(1.702\,\eta_{ij})/[1 + \exp(1.702\,\eta_{ij})]| < 0.01$.
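The bound quoted in footnote 1 can be checked numerically. The following is only a minimal sketch in Python: the function names and the grid over the predictor are illustrative assumptions and are not part of the thesis, whose own code is written in R and OpenBUGS.

```python
import math

D = 1.702  # scaling constant quoted in footnote 1


def normal_cdf(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_gap(scale, lo=-8.0, hi=8.0, steps=16001):
    """Largest |Phi(eta) - logistic(scale * eta)| on an evenly spaced grid (grid is an assumption)."""
    width = (hi - lo) / (steps - 1)
    grid = (lo + i * width for i in range(steps))
    return max(abs(normal_cdf(eta) - logistic(scale * eta)) for eta in grid)


gap_scaled = max_abs_gap(D)      # logistic model with the scaling constant
gap_unscaled = max_abs_gap(1.0)  # logistic model without it

if __name__ == "__main__":
    print("max gap with D = 1.702:", gap_scaled)
    print("max gap without scaling:", gap_unscaled)
```

On this grid the scaled gap stays below 0.01, in line with the footnote, while the unscaled gap is more than ten times larger; this is why the constant is introduced whenever the two formulations need to be compared on the same parameter scale.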
$$\pi_{ij} = \Phi(\eta_{ij}) \;\Rightarrow\; \Phi^{-1}(\pi_{ij}) = \eta_{ij} \tag{1.2}$$

$$\pi_{ij} = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})} \;\Rightarrow\; \mathrm{logit}(\pi_{ij}) = \eta_{ij}, \tag{1.3}$$

where $\Phi$ is the standard normal cumulative distribution function. Different unidimensional models can then be obtained by introducing a different number of item parameters $\xi_j$ describing the item characteristics.

The simplest case has only one item parameter, $\xi_j = \{\beta_j\}$, and $\beta_j$ is called the difficulty parameter. An example of a one-parameter logistic model is the Rasch model (Rasch, 1960) and, if we consider a logarithmic transformation of the scale of person and item parameters (Fischer, 1995), the predictor becomes $\eta_{ij} = \theta_i - \beta_j$.

If a discrimination parameter $\alpha_j$ is added to the model, then $\xi_j = \{\alpha_j, \beta_j\}$ and we are in the case of two-parameter models. The predictor (1.1) becomes $\eta_{ij} = \alpha_j \theta_i - \beta_j$: model (1.2) becomes the two-parameter normal ogive model (Lord, 1952), while model (1.3) becomes the two-parameter logistic model (Birnbaum, 1968).

A further extension can finally be made by introducing a guessing parameter $\gamma_j$ for each item, leading to three-parameter models where $\xi_j = \{\alpha_j, \beta_j, \gamma_j\}$ (Lord, 1980). See Reckase (2009) for an exhaustive description of such models. With respect to the ICC, the parameters $\alpha_j$, $\beta_j$ and $\gamma_j$ represent the slope, the location and the lower asymptote, respectively.

1.2 IRT unidimensional models for ordinal data

The models briefly presented above all refer to dichotomous responses; nevertheless, items with multiple response options exist and their use is quite common in the behavioral sciences. IRT models for polytomous items operate in a different way from binary models. In the latter case the knowledge of the characteristics of a response also determines the characteristics of the complementary response, while for polytomous items this feature no longer holds and each category function must be modeled separately (Samejima, 1996). In Figure 1.1 the ICC for a binary item is reported, while Figure 1.2 shows different response
functions for an item with five categories.

[Figure 1.1. Item characteristic curve for a binary item]

[Figure 1.2. Item response functions for an item with five categories]

From Figure 1.2 we can see how, for ordered items, the category response functions are not all monotonic: only the curves related to the first and the last categories are, respectively, monotonically decreasing and increasing. The presence of non-monotonic functions raises some complications: these functions cannot be described only in terms of discrimination and difficulty parameters, as in the binary case. The choice of the proper mathematical form and the estimation of parameters for such unimodal functions is a relevant issue. For ordered polytomous items this problem has been solved by treating polytomous items basically as `concatenated dichotomous' items (Samejima, 1969, 1996): dichotomizations of item response data are combined in order to get suitable response functions for each item category.

As we will illustrate in more detail later, several models for ordinal data exist as a result of extensions of the models for binary data. The simplest model for ordinal items is the partial credit model (Masters, 1982), which is an extension of the Rasch model for binary items, i.e. with one item parameter. Despite its wide use, it focuses on the scoring of the individuals and its restrictive assumptions make it inadequate for modeling purposes, especially in complex contexts. In this work we focus on Samejima's graded response model, which is the generalization of the two-parameter IRT model for binary data.
The choice of the graded response model is motivated by the consideration that models which also include the guessing parameter, even if they are appropriate in the educational field, are not well suited to the context of the behavioral sciences, where individuals typically express opinions.

1.2.1 Samejima's unidimensional graded response model

The graded response model for ordinal data was developed by Samejima in 1969. Examples of graded responses are Likert-type scales (strongly disagree, disagree, neutral, agree, and strongly agree) and responses ordered on the basis of a range of scores.

Let us consider a set of $p$ ordinal items, $Y_1, \ldots, Y_j, \ldots, Y_p$, where each item has $K_j$ categories, indexed by $k$. In the parametrization of the model we consider that the lowest score on item $j$ is 1, while the highest score is $K_j$, and each item is characterized by $K_j - 1$ thresholds or boundaries $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$. The probability of achieving category $k$ or higher is assumed to increase monotonically with a growth in the latent ability (Samejima, 1996; Reckase, 2009); therefore the thresholds must satisfy the so-called order constraint: $\kappa_{j1} < \cdots < \kappa_{j,K_j-1}$.

Concerning the dichotomization procedure mentioned above, Samejima's (1969) graded model is based on the probability that an item response will be observed in category $k$ or higher: the probability $\pi_{ijk}$ that the $i$-th subject will select the $k$-th category on item $j$ is equal to the probability of answering above the lower boundary for the category ($\kappa_{k-1}$) minus the probability of answering above the category's upper boundary ($\kappa_k$). Figure 1.3² describes the dichotomization method used in Samejima's models; a dashed line is used to represent a hypothetical response in category $k = 4$: the probability of a response in that category can be computed as $P^*_{i4} - P^*_{i5}$, where in general $P^*_{ik} = P(Y_{ij} \geq k \mid \theta_i)$ denotes the probability of accomplishing step $k$ at a given level of $\theta$.

Figure 1.3.
Dichotomization of polytomous item responses; the dashed line indicates the observed category response.

² Figure adapted from Ostini and Nering (2006).

The probability that the $i$-th examinee's response will fall in the $k$-th category on item $j$ can thus be written as:

$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = P^*_{ik} - P^*_{i,k+1}, \tag{1.4}$$

where $P^*_{i1}$ and $P^*_{i,K_j+1}$ are assumed to be 1 and 0 respectively, in order to ensure that the probability of each category can be determined from (1.4). The two-parameter normal ogive and logistic formulations of the model can be obtained from (1.4). The normal ogive form of Samejima's model for graded responses is given by:

$$\pi_{ijk} = P(Y_{ij} = k \mid \theta_i, \kappa_{jk}, \kappa_{j,k+1}) = \frac{1}{\sqrt{2\pi}} \int_{\alpha_j(\theta_i - \kappa_{j,k+1})}^{\alpha_j(\theta_i - \kappa_{jk})} e^{-t^2/2}\, dt. \tag{1.5}$$

From expression (1.5), we can observe that the discrimination parameter $\alpha_j$, i.e. the slope of the response functions, is constant across all the category responses of a given item. This constraint ensures that negative probabilities are avoided (Steinberg and Thissen, 1995). The boundary parameters $\kappa_{jk}$ vary within an item, according to the order constraint $\kappa_{j,k-1} < \kappa_{jk} < \kappa_{j,k+1}$, and at each level $\theta = \kappa_{jk}$ the examinee has a probability of 0.5 of endorsing the category (Reeve, 2002). $P^*_{ik}$ is the trace line reflecting the probability that an examinee's response will fall in that scoring category or a higher one, at any specific level of latent ability $\theta$. The graded model response function $P(Y_{ij} = k \mid \theta_i)$ reflects the rate of examinees responding to the $k$-th category across the different levels of $\theta$; it is a non-monotonic curve, with the exception of the curves associated with the extreme categories, as previously pointed out in Figure 1.2 (Thissen et al., 2001).

1.2.2 Other unidimensional IRT models for graded responses

Several models for items with two or more ordered responses have been developed.
An assortment of these models, together with their features, has been introduced by van der Linden and Hambleton (1997) and van der Ark (2001). In addition to Samejima's graded response model (1969), other widely applied IRT models for ordinal data are the partial credit model (Masters, 1982) and its extension, the generalized partial credit model (Muraki, 1992). The partial credit model extends to ordinal items the Rasch model for binary items, i.e. the model with one item parameter. Samejima's graded response model, on the other hand, generalizes the two-parameter IRT model for binary data. In the partial credit model and in its generalization, the category responses on the item represent levels of performance (Reckase, 2009). As in the graded response model, we have thresholds between adjacent scores: an examinee's performance is on the left or the right side of a threshold with a specific probability. Here the dichotomization procedure involves only two category boundaries for a given item; see Ostini and Nering (2006) for a detailed discussion of the differences between the Samejima and Rasch dichotomization approaches. Mathematical expressions for the partial credit model and the generalized partial credit model are presented in (1.6) and (1.7), where $D = 1.702$ is the scaling constant:

\[ \pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = \frac{\exp\left\{ \sum_{u=1}^{k} (\theta_i - \kappa_{ju}) \right\}}{\sum_{v=1}^{K_j} \exp\left\{ \sum_{u=1}^{v} (\theta_i - \kappa_{ju}) \right\}}, \tag{1.6} \]

\[ \pi_{ijk} = P(Y_{ij} = k \mid \theta_i) = \frac{\exp\left\{ \sum_{u=1}^{k} D\alpha_j (\theta_i - \beta_j + \kappa_{ju}) \right\}}{\sum_{v=1}^{K_j} \exp\left\{ \sum_{u=1}^{v} D\alpha_j (\theta_i - \beta_j + \kappa_{ju}) \right\}}. \tag{1.7} \]

In the generalized partial credit model the assumption of a constant discrimination parameter across test items is relaxed: the $\alpha_j$ parameters may vary across items. Reckase (2009) provides an exhaustive illustration of such models. Other IRT models for polytomous items have been proposed by Bock (1972), Andrich (1978, 1982), Thissen and Steinberg (1984), and Rost (1988).
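The two dichotomization approaches discussed in this section can be compared numerically. The following is a minimal sketch, not the thesis's own implementation: the function names, the example parameter values, and the convention that category $k$ of the graded model lies between the $(k-1)$-th and $k$-th threshold are assumptions made for illustration. It computes category probabilities for the normal ogive graded response model, as differences of cumulative curves in the spirit of (1.4), and for the generalized partial credit model (1.7).

```python
import numpy as np
from scipy.stats import norm


def grm_probs(theta, alpha, kappa):
    """Category probabilities of the normal ogive graded response model.

    kappa holds the K-1 ordered thresholds of one item; category k is taken
    to lie between kappa[k-2] and kappa[k-1] (an indexing assumption).
    """
    kappa = np.asarray(kappa, dtype=float)
    if np.any(np.diff(kappa) <= 0):
        raise ValueError("thresholds must satisfy the order constraint")
    # Cumulative curves P*_k = P(Y >= k | theta), padded with
    # P*_1 = 1 and P*_{K+1} = 0 as stated after equation (1.4).
    p_star = np.concatenate(([1.0], norm.cdf(alpha * (theta - kappa)), [0.0]))
    return p_star[:-1] - p_star[1:]


def gpcm_probs(theta, alpha, beta, kappa, D=1.702):
    """Category probabilities of the generalized partial credit model (1.7).

    kappa holds one step parameter per category (length K).
    """
    steps = D * alpha * (theta - beta + np.asarray(kappa, dtype=float))
    cum = np.cumsum(steps)              # inner sums for k = 1, ..., K
    weights = np.exp(cum - cum.max())   # subtracting the max avoids overflow
    return weights / weights.sum()


# Hypothetical four-category item evaluated at theta = 0.
grm = grm_probs(theta=0.0, alpha=1.2, kappa=[-1.0, 0.0, 1.5])
gpcm = gpcm_probs(theta=0.0, alpha=1.2, beta=0.0, kappa=[0.0, 0.8, 0.1, -0.9])
print(grm.round(3), gpcm.round(3))
```

Both functions return a proper probability vector over the $K_j$ categories. In the graded model this depends on the order constraint, which is why the sketch checks it explicitly, whereas the divide-by-total form of (1.7) is normalized by construction.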
All these models refer to a unidimensional underlying ability structure.

1.3 Towards multidimensional models

Unidimensional models are suitable when tests are designed to measure only one latent ability (Sheng and Wikle, 2009). There are some advantages in the use of such unidimensional models: i) they have quite simple mathematical forms; ii) they perform well in fitting the data in several empirical applications; and iii) they are rather robust to violations of assumptions (Reckase, 2009). Nevertheless, real interactions between examinees and test items are not as simple as unidimensional models describe. On the one hand, a person is likely to use more than a single ability in the response process; on the other hand, the problems posed in a test can require several abilities in order to reach the right solution. Multidimensional IRT (MIRT) models were developed to provide a more accurate description of the interactions between persons and test items. In particular, in MIRT models a vector of latent abilities is introduced, instead of a single person parameter. In other words, MIRT models deal with the quite common circumstances where an examinee requires multiple abilities in order to respond to an item. In this case, more than one latent construct is measured by that item. One of the best-known examples in the educational field is a mathematical test item presented as a story, where both mathematical and reading comprehension skills are involved in the answering process (Fox, 2010).

Chapter 2

Multidimensional IRT (MIRT) models: a review

As previously pointed out, the latent space that has to be measured may be more complex than the one underlying unidimensional IRT models. The so-called MIRT models are used when separate latent abilities are involved in the observed responses to an item.
In this chapter we introduce the MIRT approach. In particular, we show how different models can be specified depending on the latent ability structure hypothesized to underlie the response process. A literature review on MIRT models for both binary and ordinal data is reported. A final section describes the most common estimation methods in the IRT and MIRT frameworks.

2.1 Main features of MIRT models

The assessment of dimensionality is a key topic in IRT and in the latent variable framework. A review of methods for the empirical detection of the structure of tests with binary items was made by Tate (2003). In his work, particular attention is given to the assessment of the statistical structure of the test as implied by the relations between examinees and items. This aspect should be an important part of the development, evaluation, and maintenance of large-scale tests.

Several IRT models are based on a common postulate: the assumption of unidimensionality. However, the local independence assumption holds only if the latent space is entirely specified. For this reason, many efforts have been made to characterize the concept of dimensionality and to detect it. An accurate and unequivocal definition of dimensionality does not exist yet. This is due to the fact that the phenomenon is latent by nature, hence a direct comparison with observed results is not possible. Hambleton and Swaminathan (1985) justified the unidimensionality assumption by the presence of a dominant trait able to explain the examinees' responses. In this sense, we can imagine that a single trait always exists; the crucial points are whether the dominant trait is sufficiently strong and in which way it dominates the others. Conversely, Traub (1983) argued that unidimensionality is probably more the exception than the rule, with respect to the skills necessary to answer the items on most cognitive tests.
Some weaknesses of the unidimensionality assumption have been reviewed by Adams et al. (1997), with the aim of proposing a MIRT model. The use of unidimensional models might be improper for tests intentionally built from subcomponents that are assumed to measure different abilities. IRT models seem to be robust to these violations of unidimensionality, especially with highly correlated latent constructs. In fact, if we assume the existence of a single latent ability, it can be seen as the dominant factor reflecting the different composition of the items. On the other hand, when a test is made up of mutually exclusive subtests of items or when the underlying dimensions are not highly correlated, the use of a unidimensional model can bias parameter estimation, adaptive item selection and trait estimation. The problem is especially evident in adaptive testing, where examinees are administered different combinations of items and the traits underlying the performance may reflect the different composition of the items (Matteucci, 2007).

Finally, as briefly described at the end of Chapter 1, the assessment of knowledge, competencies and achievement is moving more and more towards a multidimensional evaluation. The reason for the wide use of MIRT models in recent studies is that the actual interactions between examinees and test items are complex and need to be framed in a multidimensional setting. A clarifying example reported in Matteucci (2007) concerns the assessment of proficiency in the University context, where the student's evaluation is typically multidimensional at each level: within a single course and throughout the University career, students are evaluated on the basis of multiple competencies.
2.1.1 Compensatory and noncompensatory approaches

MIRT models can be classified in two main groups, compensatory and noncompensatory models, depending on the way the vector of latent abilities, $\theta$, is combined with the item parameters to obtain the probability of a response to the item. In compensatory models a linear combination of the values of $\theta$ is used in the specification of the response probabilities, with a logistic or a normal ogive form. This approach implies that different combinations of elements in $\theta$ can yield the same sum, and the direct consequence is a compensation effect: if one $\theta$-value is low, but another one is appropriately high, the sum can be the same. In noncompensatory models, the different latent abilities used to solve an item are separated and each part is treated as a unidimensional model. The global probability is then obtained as the product of the probabilities of each unidimensional part. Nonlinearity arises from the use of the product of such probabilities, and the compensation property does not hold (Reckase, 2009).

2.1.2 Confirmatory and exploratory approaches

Another classification of MIRT models can be made with reference to the information available at the model specification step. The investigation of multidimensionality can be conducted by using two different approaches: the exploratory and the confirmatory approaches. In the exploratory approach no prior knowledge is included in the model, in terms of the relationship between items and latent traits. When the number of latent abilities is specified in advance, the method is not merely explorative and we are in a confirmatory context. In line with the confirmatory approach, not only the number of latent variables is pre-specified but also their relationships with the items. In fact, the researcher can use prior knowledge to define which items load on which factors.
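Returning to the distinction drawn in Section 2.1.1, the compensation effect can be made concrete with two ability profiles that share the same sum. The sketch below is illustrative only: the function names, the normal ogive link, the use of one location parameter per dimension in the noncompensatory case, and all numerical values are assumptions, not specifications taken from the cited works.

```python
import numpy as np
from scipy.stats import norm


def compensatory_prob(theta, alpha, beta):
    """P(correct) from a linear combination of the abilities (normal ogive)."""
    return norm.cdf(np.dot(alpha, theta) - beta)


def noncompensatory_prob(theta, alpha, beta):
    """P(correct) as a product of unidimensional normal ogive terms.

    beta holds one location per dimension (an assumption of this sketch).
    """
    theta, alpha, beta = map(np.asarray, (theta, alpha, beta))
    return np.prod(norm.cdf(alpha * theta - beta))


alpha = [1.0, 1.0]
balanced = [0.0, 0.0]      # average on both abilities
unbalanced = [-2.0, 2.0]   # low on the first ability, high on the second

comp = [compensatory_prob(t, alpha, 0.0) for t in (balanced, unbalanced)]
noncomp = [noncompensatory_prob(t, alpha, [0.0, 0.0])
           for t in (balanced, unbalanced)]
print(comp, noncomp)
```

Under the compensatory form both profiles give the same probability, because only the weighted sum of the abilities matters. Under the product form the low first ability pulls the probability down and the high second ability cannot offset it.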
2.1.3 Underlying latent structures

In this paragraph a brief review of different multidimensional latent structures is reported. For simplicity, the figures refer to the simplest case of a test consisting of two subtests. Circles represent latent traits and squares represent observed item responses. Subtests are indicated with dashed lines.

Consecutive unidimensional model. Figure 2.1 illustrates the so-called consecutive unidimensional approach, where simple unidimensional IRT models are fitted to each subtest in a sequential way. By fitting this model, we obtain person measures for every specific ability, but a direct estimate of the relation between them is not feasible (Huang et al., 2013).

Figure 2.1. Consecutive unidimensional latent structure.

Multiunidimensional model. Figure 2.2 reports the underlying structure for the between-item MIRT model (Wang et al., 2004), also called the multiunidimensional approach (Sheng and Wikle, 2007), where abilities are allowed to correlate and the intensity of such associations can be obtained directly.

Figure 2.2. Multiunidimensional latent structure.

Bi-factor model. The well-known bi-factor model, first introduced by Holzinger and Swineford (1937), where a general (or common) ability, $\theta_0$, and a specific ability are assumed to affect the response to each item, is illustrated in Figure 2.3. This is a case of within-item multidimensionality, i.e. single items measure more than one latent trait. This approach ignores the association between latent abilities.

Figure 2.3. Bi-factor latent structure.

Hierarchical models. Figure 2.4 shows the latent structure assumed for hierarchical MIRT models, where the hierarchical structure of general and specific latent constructs is modeled explicitly: items in the same subtest measure a specific ability and, in turn, each specific ability is influenced by a general ability.
Different hierarchical models can be specified depending on the relation between specific and overall abilities: if each specific ability is a linear function of the overall ability we are in the case illustrated in (a), while if the specific abilities combine linearly to form the overall ability we are in the case shown in (b) (Schmid and Leiman, 1957; Sheng and Wikle, 2008).

Figure 2.4. Hierarchical latent structures.

Additive model. In the additive model presented in Figure 2.5 the latent structure is such that the response to a test item is affected both by the general and by the specific latent traits, so that the latent abilities form an additive structure (Sheng and Wikle, 2009). This model has a latent structure similar to the bi-factor model, but here all the latent constructs are allowed to correlate.

Figure 2.5. Additive latent structure.

2.2 MIRT models for binary data

MIRT is a methodology that has been developed with the principal aim of dealing with the complexity of psychological measurement when several latent abilities influence the individual's performance on a given item (Reckase, 1997). By introducing a person trait and an item discrimination parameter for each ability measured by a test item, MIRT models permit separate inferences with reference to each distinct latent dimension of an examinee (Ackerman, 1993).

Two-parameter normal ogive model for binary data. Let us consider a test consisting of $p$ multiple choice items, each measuring $m$ latent abilities, $\theta_{1i}, \dots, \theta_{mi}$. Let $Y = [Y_{ij}]_{n \times p}$ represent the data matrix, i.e. a matrix containing $n$ examinees' responses to $p$ binary items, so that, for $i = 1, \dots, n$ and $j = 1, \dots, p$, $Y_{ij}$ is defined as:

\[ Y_{ij} = \begin{cases} 1, & \text{if examinee } i \text{ answers item } j \text{ correctly} \\ 0, & \text{if examinee } i \text{ answers item } j \text{ incorrectly.} \end{cases} \]
Reckase (1985) derived a multidimensional extension of the compensatory unidimensional two-parameter model, which in its normal ogive formulation becomes:

\[ P(Y_{ij} = 1 \mid \theta_i, \alpha_j, \beta_j) = \Phi\left( \sum_{\nu=1}^{m} \alpha_{\nu j} \theta_{\nu i} - \beta_j \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sum_{\nu=1}^{m} \alpha_{\nu j} \theta_{\nu i} - \beta_j} e^{-t^2/2}\, dt. \tag{2.1} \]

Each individual is characterized by a vector $\theta_i = (\theta_{1i}, \dots, \theta_{mi})$ of latent abilities, where $m$ is the number of latent dimensions measured by a generic item, in contrast to the unidimensional case, where individuals are characterized by only one latent ability $\theta_i$. Item discrimination parameters are also represented by a vector, reflecting multiple dimensions: $\alpha_j = (\alpha_{1j}, \dots, \alpha_{mj})$, where $j$ represents the item number and $\nu = 1, \dots, m$ indexes the dimension to which the discrimination value is related. If the discrimination parameter related to dimension $\nu$, $\alpha_{\nu j}$, is high, that dimension has a great influence in determining an examinee's success on item $j$. Finally, $\beta_j$ is a scalar parameter determining the location in the latent space where the item provides maximum information.

Multiunidimensional model for binary data. As illustrated in the work of Sheng and Wikle (2007), the elements in the vector of discrimination parameters $\alpha_j = (\alpha_{1j}, \dots, \alpha_{mj})$ can be considered as factor loadings in factor analysis. If a rotation is performed so that each item loads on one factor only, the vector of discrimination parameters simplifies to $\alpha_j = (0, \dots, 0, \alpha_{\nu j}, 0, \dots, 0)$, and from (2.1) we obtain the expression for the multiunidimensional model for binary data, where each latent trait is related to a single set of items. The underlying latent structure of such a model is illustrated in Figure 2.2. Let us consider a test consisting of $p$ items. The test is structured into $m$ subtests, each one composed of $p_\nu$ items that measure one latent trait.
The probability that individual $i$ will obtain a correct response to item $j$ belonging to the $\nu$-th subtest is given by:

\[ P(Y_{\nu ij} = 1 \mid \theta_{\nu i}, \alpha_{\nu j}, \beta_j) = \Phi(\alpha_{\nu j} \theta_{\nu i} - \beta_j) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\alpha_{\nu j} \theta_{\nu i} - \beta_j} e^{-t^2/2}\, dt, \]

where $\alpha_{\nu j}$ is a scalar parameter reflecting the item discrimination, $\theta_{\nu i}$ is a scalar parameter reflecting the individual's $\nu$-th ability, and $\beta_j$ is a scalar parameter representing the location in the latent space where the item provides maximum information.

Additive model for binary data. The additive MIRT model for dichotomous data proposed by Sheng and Wikle (2009) assumes an underlying latent structure such that both the specific abilities and an overall ability directly affect the individual response to a test item, resulting in an additive structure (see Figure 2.5). If we consider again a test containing $p$ items structured into $m$ subtests (each one composed of $p_\nu$ items), according to the additive MIRT model for binary data, the probability that individual $i$ will obtain a correct response to item $j$ belonging to the $\nu$-th subtest is given by:

\[ P(Y_{\nu ij} = 1 \mid \theta_{0i}, \theta_{\nu i}, \alpha_{0\nu j}, \alpha_{\nu j}, \beta_j) = \Phi(\alpha_{0\nu j} \theta_{0i} + \alpha_{\nu j} \theta_{\nu i} - \beta_j) = \int_{-\infty}^{\alpha_{0\nu j} \theta_{0i} + \alpha_{\nu j} \theta_{\nu i} - \beta_j} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \tag{2.2} \]

where $\theta_{\nu i}$ is a scalar parameter representing the examinee's ability in the $\nu$-th dimension, $\theta_{0i}$ is the $i$-th individual parameter related to the overall ability, $\alpha_{0\nu j}$ is the $j$-th item discrimination parameter with reference to the overall ability $\theta_{0i}$, $\alpha_{\nu j}$ is the item discrimination parameter with reference to the specific ability $\theta_{\nu i}$, and $\beta_j$ is a scalar parameter representing the location in the latent space where the item provides maximum information. The expression in (2.2) implies that the probability that an individual endorses an item is directly influenced by two latent traits: a general ability and a specific one (Sheng and Wikle, 2009).

A more detailed description of the models for binary data presented above goes beyond the purpose of this study.
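To make the relation between the two structures tangible, the sketch below evaluates the additive probability (2.2) and generates binary responses for a small hypothetical test. Everything numerical here is an assumption chosen for illustration (sample size, two subtests of three items, the correlation matrix among the traits, the ranges of the item parameters), as are the function names; this is not the simulation design used later in the thesis.

```python
import numpy as np
from scipy.stats import norm


def additive_prob(theta0, theta_nu, alpha0, alpha_nu, beta):
    """Additive model (2.2): general plus specific ability, normal ogive."""
    return norm.cdf(alpha0 * theta0 + alpha_nu * theta_nu - beta)


def multiunidimensional_prob(theta_nu, alpha_nu, beta):
    """Multiunidimensional model: only the specific ability enters."""
    return norm.cdf(alpha_nu * theta_nu - beta)


rng = np.random.default_rng(0)
n, m, p_nu = 500, 2, 3          # examinees, subtests, items per subtest
p = m * p_nu

# Hypothetical correlation matrix for (theta_0, theta_1, theta_2):
# all latent traits are allowed to correlate, as in the additive structure.
R = np.array([[1.0, 0.3, 0.3],
              [0.3, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
theta = rng.multivariate_normal(np.zeros(m + 1), R, size=n)   # shape (n, m+1)

subtest = np.repeat(np.arange(1, m + 1), p_nu)   # subtest index of each item
alpha0 = rng.uniform(0.5, 1.5, size=p)           # general discriminations
alpha = rng.uniform(0.5, 1.5, size=p)            # specific discriminations
beta = rng.normal(size=p)                        # item locations

# Column 0 of theta is the general ability; theta[:, subtest] picks, for
# every item, the specific ability of the subtest the item belongs to.
prob = additive_prob(theta[:, [0]], theta[:, subtest], alpha0, alpha, beta)
Y = rng.binomial(1, prob)                        # n x p binary data matrix
```

Setting the general discrimination to zero removes the general trait from the linear predictor, so the additive probability collapses to the multiunidimensional one. This is one way to see the latter as the simpler of the two structures.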
Our decision to focus the analysis on the additive structure has been driven by the fact that this latent structure, according to which both the specific and the general latent traits directly underlie all the test items, represents a plausible and fairly detailed approximation of the real interactions between individuals and item responses. The multiunidimensional model, on the other hand, is simpler than the additive one, but it is regularly used in MIRT applications. These two models have been presented in order to provide a more complete background on the latent structures that we will discuss in detail for the case of ordered responses.

2.3 MIRT models for ordinal data

A multidimensional formalization of IRT models for graded responses has been developed as an extension of the unidimensional version by several authors. In this section we present some works that focus on multidimensional models for ordered items. These works have not necessarily been developed in an IRT context, but also in the framework of confirmatory factor analysis. The interest in adopting such models arose from the widespread use of Likert items (Likert, 1932), and of other ordered scales in general, in questionnaires for sociological and psychological measurement. The extensive availability of such data has led, in the last two decades, to the need for new developments towards a multidimensional version of the IRT model for graded responses.

We begin by introducing some notation. As in the case of dichotomous items, let us consider a test made up of $p$ multiple choice items, each measuring $m$ latent traits, $\theta_1, \dots, \theta_m$. Now the data are collected in a matrix, $Y = [Y_{ij}]_{n \times p}$, containing $n$ examinees' responses to $p$ ordered items; thus, for $i = 1, \dots, n$ and $j = 1, \dots, p$, $Y_{ij}$ is defined as:

\[ Y_{ij} = \begin{cases} 1, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } 1 \\ 2, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } 2 \\ \vdots & \\ K_j, & \text{if the answer of examinee } i \text{ to item } j \text{ falls in category } K_j, \end{cases} \]

where 1 and $K_j$ are the lowest and the highest score for item $j$, respectively.

Muraki and Carlson (1995) developed a MIRT model for polytomously scored items on the basis of Samejima's graded response model in the full information factor analysis context. In their work, they show how the factor analytic model for categorical variables is based on the assumption that the response process, say $Z_{ij}$, is an underlying unobservable variable which, for each subject $i$, is realized in the vector of observed ordered item responses $Y_i = (Y_{i1}, Y_{i2}, \dots, Y_{ip})$. They also model the response process variable $Z_{ij}$ as a linear combination of the $m$ latent traits, $\theta_{1i}, \theta_{2i}, \dots, \theta_{mi}$, and the factor loadings $\alpha_{j1}, \alpha_{j2}, \dots, \alpha_{jm}$. Thus:

\[ Z_{ij} = \alpha_{j1} \theta_{1i} + \alpha_{j2} \theta_{2i} + \cdots + \alpha_{jm} \theta_{mi} + \varepsilon_{ij} = \alpha_j^{\prime} \theta_i + \varepsilon_{ij}, \]

where $\varepsilon_{ij}$ is an unobserved random variable that is assumed to be distributed as $N(0, \sigma_j^2)$. Muraki and Carlson (1995) introduced the threshold parameter $\gamma_{jk}$ associated with the $k$-th category of item $j$, and modeled the unobservable response process according to the psychological mechanism, that is $Y_{ij} = k$ if $\gamma_{j,k-1} \le Z_{ij} < \gamma_{jk}$, for $k = 1, \dots, K_j$, with $\gamma_{j0} = -\infty$ and $\gamma_{jK_j} = +\infty$. The probability that examinee $i$ gives response category $k$ of item $j$, given the examinee's $m$-dimensional latent trait and assuming a normal ogive model, is formalized as:

\[ P(Y_{ij} = k \mid \theta_i) = \int_{\gamma_{j,k-1}}^{\gamma_{jk}} \frac{1}{(2\pi)^{1/2} \sigma_j} \exp\left\{ -\frac{1}{2} \left( \frac{Z_{ij} - \alpha_j^{\prime} \theta_i}{\sigma_j} \right)^2 \right\} dZ_{ij}. \tag{2.3} \]

Model (2.3) can be rewritten in a form more familiar from item response models by applying some transformations of the variables (see Muraki and Carlson (1995) for the detailed procedure).
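The integral in (2.3) is the probability that a normal variable with mean $\alpha_j^{\prime} \theta_i$ and standard deviation $\sigma_j$ falls between two consecutive thresholds, so it can be evaluated as a difference of normal distribution functions. The sketch below, a hedged illustration rather than the authors' procedure, computes these probabilities for one item and checks them against a direct simulation of the underlying response process $Z_{ij}$. The function name and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def mgrm_probs(theta, alpha, gamma, sigma=1.0):
    """P(Y = k | theta), k = 1..K, from the underlying-variable model (2.3)."""
    # Pad the K-1 finite thresholds with gamma_0 = -inf and gamma_K = +inf.
    cuts = np.concatenate(([-np.inf], np.asarray(gamma, dtype=float), [np.inf]))
    mean = np.dot(alpha, theta)
    return np.diff(norm.cdf((cuts - mean) / sigma))


theta = np.array([0.5, -1.0])        # two latent traits (hypothetical values)
alpha = np.array([0.8, 0.6])         # factor loadings of one item
gamma = np.array([-1.0, 0.0, 1.2])   # thresholds of a four-category item

probs = mgrm_probs(theta, alpha, gamma)

# Monte Carlo check: draw Z = alpha'theta + eps and categorize it with the
# rule Y = k if gamma_{k-1} <= Z < gamma_k.
rng = np.random.default_rng(1)
z = alpha @ theta + rng.normal(size=200_000)
y = np.searchsorted(gamma, z, side="right") + 1
freq = np.bincount(y, minlength=gamma.size + 2)[1:] / z.size
print(probs.round(3), freq.round(3))
```

The simulated relative frequencies should agree with the analytical probabilities up to Monte Carlo error. This categorization of a latent normal variable is the step that data augmentation schemes typically exploit.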
The authors focus on uncorrelated latent dimensions (bi-factor latent structure) and provide a detailed description of the Expectation Maximization (EM) algorithm in a marginal maximum likelihood estimation context (estimation methods are covered in the next section). The proposed algorithm has been implemented in the POLYFACT computer program (Muraki, 1993), which calculates the factor loadings via the principal factor method applied to the product-moment correlation matrix. The program treats the observed responses as continuous variables (Muraki and Carlson, 1995).

In the study by Ferrando (1999) three different item response models for graded responses are compared: a continuous response model based on linear factor analysis, a censored response model, where the graded responses are considered to be censored continuous variables, and a multidimensional graded response model in the formulation given by Muraki and Carlson (1995). The author observed that, even though there have been several applications of the unidimensional graded response model to attitude and personality data, applications of the multidimensional version of the model are not common. Ferrando (1999) concludes by showing that the solutions were similar for the three models considered, but that the estimation method could affect the results.

A more recent work by Edwards (2010a) falls within the context of confirmatory item factor analysis models. He developed a relatively user-friendly package, MultiNorm (Edwards, 2010b), with which the user can fit multidimensional graded (or dichotomous) response models characterized by a multiunidimensional or a bi-factor underlying latent structure. The estimation technique used in this work belongs to the family of Markov chain Monte Carlo (MCMC) techniques. Again, for a further discussion on estimation methods see the next section.
Other applications of MIRT models for graded responses, with empirical examples regarding mainly the field of educational assessment and psychological reactance, can be found in Yao and Schwarz (2006), Fu et al. (2010), Brown et al. (2011) and van der Ark et al. (2011). It is worth remarking that the latent structures assumed in these studies were predominantly the multiunidimensional structure (Figure 2.2) and the bi-factor structure (Figure 2.3). Considering this scarcity of existing research on MIRT models for ordinal outcomes, especially for complex cases, in this work we take into consideration the additive underlying latent structure (Figure 2.5), after having introduced the multiunidimensional case (Figure 2.2).

2.4 Estimation methods

In IRT models, as well as in MIRT models, the characteristics of interest are the person's abilities and the item parameters: different values of these parameters lead to different response probabilities. Nevertheless, these two important characteristics are both unknown and the available data consist only of a collection of responses given by a sample of examinees. Concerning the estimation procedure, we need to consider two relevant features: the first is that the response model is not linear and the second is that it is not possible to observe the latent trait $\theta$. This implies that the estimation is similar to performing a nonlinear regression with unknown predictor values. Starting from the available data, the main objective is to determine the $\theta$ values for every individual and the item parameters from the item responses. We can perform a simultaneous estimation of ability and item parameters in the context of maximum likelihood (ML) or in a Bayesian framework. The estimation procedure is in general affected by the way the probabilities of the responses are theorized.
There are two main interpretations of probability: one of them is the stochastic subject interpretation, where the observed examinees are considered as fixed and probabilities reflect the unpredictability of specific events. Here the latent variables are treated as unknown fixed parameters. The other interpretation of probability is random sampling, where the examinees are considered as a representative random sample from a population, so that the need arises to specify a distribution for the latent trait, and the latent variables are treated as random. In the framework of ML estimation, three main methods can be identified:

• The joint maximum likelihood (JML);
• The conditional maximum likelihood (CML);
• The marginal maximum likelihood (MML).

In the JML and CML methods we are in the context of the stochastic subject interpretation of probability, i.e. fixed latent variables, whereas in the MML method we are in the random sampling interpretation framework and the latent variables are treated as random. The applicability of JML and CML is rather limited. The JML method works by simultaneously estimating item and person parameters through an iterative procedure. This method is quite simple but the complexity of the algorithm increases with the number of observations. The standard limit theorems do not apply and the resulting parameter estimators are not consistent (Andersen, 1970). The CML method was suggested by Andersen (1970) and relies on the availability of a sufficient statistic for the ability, on which the likelihood is conditioned in order to simplify its maximization. A relevant problem limits the applicability of this method: most models, including the quite simple unidimensional two-parameter model, do not have simple sufficient statistics (Johnson, 2007).
The MML estimation method is the most widely applied: by considering the joint probability of a certain response pattern given the latent trait and integrating the latent trait out of the individual likelihoods, it defines the marginal probability of observing the item response pattern. To obtain the parameter estimates, the EM algorithm is used (Ayala, 2009). A single estimated latent trait value can be associated with each individual through maximum a posteriori or expected a posteriori techniques.

In general, all the ML estimation methods consider the item parameters as fixed. Conversely, in the Bayesian context, both the latent abilities and the item parameters are regarded as random variables. As we will see in more detail in the next chapter, the adoption of a fully Bayesian approach has several advantages. It allows a joint estimation of item parameters and individual abilities and it makes it possible to include uncertainty about item parameters and abilities, and prior beliefs in general, in the prior distributions. MCMC estimation of IRT and MIRT models can then be viewed as an alternative to MML estimation, where the approximation of the multiple integrals involved in the likelihood function, especially for increasingly complex models, may represent a serious problem.

Chapter 3

Bayesian estimation of MIRT models

This chapter introduces the main ideas characterizing the Bayesian approach to estimation, with a particular focus on simulation-based methods for parameter estimation. Available Bayesian estimation methods based on MCMC techniques for MIRT models are also presented.

3.1 Elements of Bayesian statistics in MIRT context

According to the Bayesian approach, all the model parameters, i.e.
person and item parameters in our case, are random variables, each one with its prior distribution reflecting the prior information available and the uncertainty about their real values before the observation of the data. All the MIRT models illustrated so far (for both binary and ordinal items) are specified with the final aim of expressing the data-generating process as a function of the unknown person and item parameters. These are likelihood models and represent the density of the data conditional on the model parameters. In order to formulate a Bayesian model, we need to specify:

• A prior distribution for each unknown model parameter;
• A likelihood model reflecting the data-generating process.

Once the data are observed, the prior information is updated with the information contained in the observed data and a posterior distribution is obtained, which permits direct inference about the parameters.

3.1.1 Prior distribution choice

A key point in the Bayesian framework is the possibility to specify prior distributions for the unknown model parameters with the aim of exploiting background information and beliefs available before the collection of the sample. All this contextual information is expressed in terms of probability and, as a result, is reflected in a prior distribution. The conditional probability distribution, on the other hand, is specified to reflect the observed data. One of the main objections to the Bayesian framework regards the specification of these prior distributions, which can be considered highly subjective and arbitrary (Gelman, 2008). It should be noted that the choice of the prior distributions, made at the moment of model specification, is subjective by definition. Therefore, only prior distributions expressing prior ideas can be considered correct in this setting and, even if the choice is subjective, it cannot be considered arbitrary since it reflects the researcher's thinking (Fox, 2010).
In addition, it is possible to specify so-called vague priors, i.e. objective noninformative prior distributions indicating ignorance about the unknown parameter values. The branch of objective Bayesian statistics relies on the specification of objective prior distributions. Even though it does not need any subjective contribution, we have to consider that a specific strength of the Bayesian methodology is the possibility of including beliefs and prior information in the model specification, and objective Bayesian methods do not allow this. The inclusion of prior beliefs can increase the reliability of the statistical inference. In the IRT and MIRT frameworks, item responses represent the observed data and we can include other sources of information in the model through the a priori model. These are circumstances where data-based information is slight, and where prior information can significantly improve the statistical inference (Fox, 2010).

3.1.2 Bayes' Theorem

Let us consider a set of $N$ observations, denoted by $y = (y_1, \dots, y_N)$, that are the numerical realization of the random vector $Y = (Y_1, \dots, Y_N)$, which follows some probability distribution. Let us denote with $p(y)$ the probability density (mass) function of the continuous (discrete) variable $Y$. Now let us assume that, starting from the observed responses, we are interested in measuring the unknown person ($\theta$) and item ($\xi$) parameters, denoted by $\lambda = (\theta, \xi)$. We denote with $p(\lambda)$ the prior distribution reflecting the beliefs on the unknown parameters. The term $p(y \mid \lambda)$ reflects the information about $\lambda$ from the vector of observed values $y$. It is referred to as the sampling distribution when regarded as a function of the data, and as the likelihood function when regarded as a function of the parameters. Usually, the distribution of the parameters given the data is of main interest.
According to Bayes' Theorem, the conditional distribution of $\lambda$ given the response data is
\[ p(\lambda|y) = \frac{p(y|\lambda)\,p(\lambda)}{p(y)} \propto p(y|\lambda)\,p(\lambda), \tag{3.1} \]
where $\propto$ denotes proportionality. The term $p(\lambda|y)$ is the posterior density of the parameter $\lambda$ given both prior and sample information and, for continuous quantities, $p(y) = \int_{\lambda \in \Lambda} p(y|\lambda)\,p(\lambda)\,d\lambda$ (where $\Lambda$ denotes the set of all possible values of $\lambda$). Since we are interested in person and item parameters, we replace expression (3.1) with
\[ p(\theta, \xi|y) = \frac{p(y|\theta, \xi)\,p(\theta)\,p(\xi)}{p(y)} \propto p(y|\theta, \xi)\,p(\theta)\,p(\xi), \tag{3.2} \]
where $p(\theta)$ is the prior for the person parameters $\theta$, $p(\xi)$ is the prior for the item parameters $\xi$, and these prior densities are assumed to be independent of each other, thus $p(\theta, \xi) = p(\theta)\,p(\xi)$. The denominator of expression (3.2) is called the marginal density of the data, marginal likelihood, or integrated likelihood. Its evaluation can be time-consuming, so that, when knowledge of the shape of the posterior $p(\theta, \xi|y)$ is enough for the purposes of the study, we can focus on the unnormalized density $p(y|\theta, \xi)\,p(\theta)\,p(\xi)$ (Fox, 2010).

Expression (3.2) is the statement of the well-known Bayes' Theorem (Bayes and Price, 1763). In particular, the expression $p(\theta, \xi|y) \propto p(y|\theta, \xi)\,p(\theta)\,p(\xi)$ is the product of the likelihood $L(y; \theta, \xi)$ and the prior density, since typically $L(y; \theta, \xi) = p(y|\theta, \xi)$. All the sample information regarding person and item parameters is contained in this likelihood function. A relevant distribution for the inference process is the joint density of the data and the parameters, $p(y, \theta, \xi)$. This density can be factorized as follows:
\begin{align}
p(y, \theta, \xi) &= p(\theta, \xi|y)\,p(y) \tag{3.3} \\
&= p(y|\theta, \xi)\,p(\theta)\,p(\xi). \tag{3.4}
\end{align}
From the expressions above we can observe that this joint distribution can be factorized in two different ways: (i) as the product of the marginal density of the data and the posterior of the unknown parameters (3.3), and (ii) as the product of the prior distributions of the parameters and the likelihood of $(\theta, \xi)$ given $y$ (3.4).

3.1.3 Marginal posterior distributions for model parameters

In order to make inference, the joint posterior distribution reported in (3.2) is used. Since this high-dimensional distribution has a complex form, and consequently usually has an analytically intractable expression, we need to focus on one set of unknown parameters and treat the other as nuisance parameters. More precisely, if we are interested in the distribution of $\theta$, we treat $\xi$ as a nuisance parameter and, integrating (3.2) over all the possible values of $\xi$, we obtain the marginal posterior density for the person parameters:
\[ p(\theta|y) = \int_{\xi \in \Xi} p(\theta, \xi|y)\,d\xi = \int_{\xi \in \Xi} \frac{p(y|\theta, \xi)\,p(\theta)\,p(\xi)}{p(y)}\,d\xi \propto \int_{\xi \in \Xi} p(y|\theta, \xi)\,p(\theta)\,p(\xi)\,d\xi. \tag{3.5} \]
When we are interested in the distribution of $\xi$, we treat $\theta$ as a nuisance parameter and integrate over all the values of $\theta$, obtaining the marginal posterior density for the item parameters:
\[ p(\xi|y) = \int_{\theta \in \Theta} p(\theta, \xi|y)\,d\theta = \int_{\theta \in \Theta} \frac{p(y|\theta, \xi)\,p(\theta)\,p(\xi)}{p(y)}\,d\theta \propto \int_{\theta \in \Theta} p(y|\theta, \xi)\,p(\theta)\,p(\xi)\,d\theta. \tag{3.6} \]
In general, the information contained in the joint and/or marginal posterior distributions is summarized by the posterior mean (or median) and standard deviation. As previously pointed out, the joint posterior distribution of person and item parameters poses several difficulties because of its high dimensionality and analytical intractability. The same difficulties remain for the marginal posterior densities of person (3.5) and item (3.6) parameters, whose mathematical expressions are not always known. These computational problems can be solved by the use of simulation-based techniques.
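To make expressions (3.2), (3.5) and (3.6) concrete, the following is a minimal sketch on a deliberately tiny toy problem, not one of the models of this thesis: a single scalar person parameter, a single scalar item parameter, standard normal priors, and four hypothetical binary responses with success probability $\Phi(\theta - \xi)$. All names and values are illustrative assumptions. With only two parameters, the normalizing constant $p(y)$ and the marginals can be approximated by summing over a grid, which is exactly what becomes infeasible in high dimensions.

```python
import math

def phi_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density, assumed here as prior for both parameters."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def likelihood(y, theta, xi):
    """p(y | theta, xi) for binary responses with P(Y = 1) = Phi(theta - xi)."""
    p = phi_cdf(theta - xi)
    value = 1.0
    for y_r in y:
        value *= p if y_r == 1 else 1.0 - p
    return value

y = [1, 1, 0, 1]                           # hypothetical responses
h = 0.05                                   # grid spacing
grid = [-4.0 + i * h for i in range(161)]  # same grid for theta and xi

# Unnormalized posterior p(y|theta,xi) p(theta) p(xi); rows: theta, columns: xi.
unnorm = [[likelihood(y, t, x) * normal_pdf(t) * normal_pdf(x) for x in grid]
          for t in grid]

# Grid approximation of the marginal likelihood p(y), the denominator of (3.2).
p_y = sum(sum(row) for row in unnorm) * h * h
posterior = [[v / p_y for v in row] for row in unnorm]

# Marginal posteriors (3.5) and (3.6): integrate out the nuisance parameter.
marginal_theta = [sum(row) * h for row in posterior]
marginal_xi = [sum(posterior[i][j] for i in range(len(grid))) * h
               for j in range(len(grid))]

mean_theta = sum(t * d for t, d in zip(grid, marginal_theta)) * h
mean_xi = sum(x * d for x, d in zip(grid, marginal_xi)) * h
print(f"p(y) ~ {p_y:.4f}, E[theta|y] ~ {mean_theta:.3f}, E[xi|y] ~ {mean_xi:.3f}")
```

A grid with 161 points per parameter already needs about 26000 likelihood evaluations for two parameters; the cost grows exponentially with the number of parameters, which is why grid integration is not an option for MIRT models.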
In particular, MCMC is a very useful technique, which is briefly described in the next section.

3.2 Markov chain Monte Carlo methods

The Bayesian approach based on MCMC techniques has become increasingly popular for the estimation of unidimensional and multidimensional item response models. Two motivations drive the use of such methods. First, MCMC can be an effective substitute for the classical EM algorithm used in MML estimation: it works by simulation, it allows informative prior distributions to enter the estimation process and, unlike the MML method, it treats both person and item parameters as random variables. Second, it can be seen as a complement to the EM algorithm: the posterior distribution generated through MCMC can be used to evaluate the suitability of the normal approximations in MML, so that the two approaches can be compared in terms of the accuracy of parameter recovery.

As we will see in this section, MCMC is a very useful and relatively straightforward method for making inference under a very complex model, for which it is difficult to sample directly from the posterior distribution; this is a common situation in a MIRT context. In particular, the Gibbs sampler is a widely used MCMC algorithm consisting of a well-defined scheme for drawing suitable samples from the posterior density. Moreover, this method imposes few constraints and is fairly simple to implement compared with other methods. For these reasons, MCMC strategies have been implemented in the IRT setting by several researchers, and many studies have investigated the properties of these methods. Of particular interest is also the evaluation of model parameter recovery in comparison with the classical methods.
Comparing the MCMC technique with classical MML estimation, the main advantages of the MCMC approach can be summarized as:
• the flexibility in modeling all the connections between latent and observed variables;
• the appropriateness for more complex models;
• the insensitivity to the choice of starting values (unlike the EM algorithm).
Two relevant works that perform Bayesian estimation using the Gibbs sampler in an IRT context are Albert (1992) and Béguin and Glas (2001). In the first, the Gibbs sampler for the unidimensional two-parameter model for binary data is implemented, and the MCMC algorithm is compared with the EM algorithm through an application in educational assessment. The second extends the work of Albert to the multidimensional case. Other item response applications of MCMC can be found in Fox and Glas (2001) and Patz and Junker (1999a,b).

As previously highlighted, from a Bayesian point of view the leading purpose of the researcher is to analyze the properties of the posterior distribution $p(\lambda|y)$ which, as we can see from (3.1), is proportional to the product of the likelihood function and the prior distribution (recall that $\lambda$ is the vector of unknown parameters of interest and $y$ represents the observed data). For ease of exposition and without loss of generality, in this section we consider the simplest case, where the vector of unknown parameters reduces to a scalar $\lambda$. When the posterior distribution does not have a familiar functional form and/or direct simulation is not possible because of the complexity of the model, simulation methods based on Markov chains offer a convenient way to obtain samples from the posterior density $p(\lambda|y)$.
MCMC is a class of techniques developed with the final aim of reproducing a target distribution by simulating one or more sequences of correlated random variables. In our context the target distribution is the posterior density $p(\lambda|y)$. A random walk in the space of the parameter $\lambda$ is simulated by the MCMC algorithm: at each iteration $t$, for $t = 1, \dots, T$, the value $\lambda^{[t]}$ is drawn from a probability distribution that depends on the value of $\lambda$ at the previous step, $\lambda^{[t-1]}$. The underlying idea is that the random walk visits the regions of the state space in proportion to their posterior probabilities, so that, for a sufficiently large number of iterations, the simulated values approximate the target distribution. MCMC methods differ from plain Monte Carlo methods in that the simulated values are correlated rather than statistically independent. Under suitable conditions, the generated Markov chain converges to a unique stationary distribution that corresponds to the target distribution (Gelman et al., 2003). Therefore, when the marginal posterior densities of the parameters of a complex IRT model have to be reproduced, this method is able to provide reliable results and overcomes the problem of analytically intractable distributions.

One of the key points concerning all MCMC techniques is the creation of a chain long enough to approximate the target distribution. Since these are iterative methods, the time to convergence is also a relevant topic. Usually, a so-called burn-in period, consisting of a fixed number of initial iterations, is defined and excluded from the analysis. The required chain length is affected by the complexity of the posterior distribution, the initial values and the speed of convergence. Gelman et al. (2003) recommend using the first half of the sample as burn-in period.
By contrast, other authors prefer to fix the number of burn-in iterations directly: for example, in one of the analyses illustrated in Béguin and Glas (2001), the burn-in period is 1000 iterations out of a run length of 30000 iterations. From a practical point of view, we suggest monitoring the behavior of the sampled parameters by plotting them against the iteration number, and choosing the burn-in length accordingly.

Another significant topic (still not fully settled, as illustrated in Gilks et al. (1996)) concerns the number of distinct chains needed to implement the MCMC algorithm. There are essentially three approaches. According to the first, a single long chain is run, on the grounds that the longer the chain, the higher the chance of finding new modes. The second approach is based on running several fairly long chains. Its main advantage is that multiple chains allow the results to be compared, which can reveal significant differences and symptoms of non-stationarity. The third approach, which uses many short chains, is motivated by the aim of obtaining independent samples. This approach is not advisable, because chains can take a long time to converge and independent samples are not required.

Several MCMC algorithms exist, depending on the features of the problem and the specific attributes of the Markov chains. Each MCMC algorithm defines a transition distribution $p(\lambda^{[t]}|\lambda^{[t-1]})$, representing the probability that a parameter, say $\lambda$, moves from one state to the next, starting from proper initial values $\lambda^{[0]}$. Detailed treatments of MCMC can be found in Gelman et al. (2003), Gamerman (1997) and Gilks et al. (1996).
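The role of the burn-in period and of multiple chains can be illustrated with a small sketch. It does not use a real MCMC sampler: as a stand-in, it simulates a toy Markov chain, $\lambda^{[t]} = \rho\,\lambda^{[t-1]} + \sqrt{1-\rho^2}\,\varepsilon_t$ with $\varepsilon_t \sim N(0,1)$, whose stationary distribution is known to be $N(0, 1)$. Two chains are started far from the target, the first half of each is discarded as burn-in (following the recommendation of Gelman et al. quoted above), and the retained draws are compared. The potential scale reduction factor computed at the end is one common multi-chain diagnostic; it is not discussed in the text and is included only as an assumption about how such a comparison might be carried out. All function names are hypothetical.

```python
import math
import random

def simulate_chain(start, rho, n_iter, seed):
    """Toy Markov chain with correlated draws and N(0, 1) stationary law."""
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    chain, lam = [], start
    for _ in range(n_iter):
        # The new value depends only on the previous one (Markov property).
        lam = rho * lam + innovation_sd * rng.gauss(0.0, 1.0)
        chain.append(lam)
    return chain

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def potential_scale_reduction(chains):
    """Gelman-Rubin style ratio comparing between- and within-chain variance."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])
    between = n * variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

n_iter = 20000
burn_in = n_iter // 2          # discard the first half of each chain
chains = [simulate_chain(start, 0.9, n_iter, seed)
          for start, seed in [(-50.0, 1), (50.0, 2)]]
kept = [c[burn_in:] for c in chains]

chain_means = [mean(c) for c in kept]
r_hat = potential_scale_reduction(kept)
print("first draws:", [round(c[0], 1) for c in chains])
print("post burn-in means:", [round(m, 3) for m in chain_means])
print("potential scale reduction:", round(r_hat, 3))
```

The first draws are still close to the extreme starting values, while the retained halves of the two chains should give similar means near zero and a ratio close to one, which is the behavior expected when both chains have reached the same stationary distribution.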
3.2.1 Metropolis-Hastings algorithm

The Metropolis-Hastings (M-H) algorithm (Hastings, 1970) is one of the most popular MCMC methods and can be directly implemented in a Bayesian framework. Our aim is to generate a sample of size $T$ from the target distribution, represented in our context by the posterior distribution $p(\lambda|y)$. The M-H algorithm can be summarized as follows (Ntzoufras, 2011):
1. Set initial values $\lambda^{[0]}$;
2. For $t = 1, \dots, T$, repeat the following steps:
   (i) Set $\lambda = \lambda^{[t-1]}$;
   (ii) Generate a new candidate parameter value $\lambda'$ from a proposal (jumping) distribution $q(\lambda'|\lambda)$;
   (iii) Calculate $\alpha = \min\left(1, \dfrac{p(\lambda'|y)\,q(\lambda|\lambda')}{p(\lambda|y)\,q(\lambda'|\lambda)}\right)$;
   (iv) Update $\lambda^{[t]} = \lambda'$ with probability $\alpha$; otherwise set $\lambda^{[t]} = \lambda$.

Let us focus on the case where $\lambda$ is a vector of parameters that can assume only continuous values. In step 1, suitable starting values are provided. Suppose the chain is in the state $\lambda^{[s-1]}$. In step (ii) of the algorithm, a new candidate $\lambda'$ is sampled from a proposal distribution $q(\lambda'|\lambda^{[s-1]})$. The proposal distribution is also called the jumping distribution, to emphasize the idea of movement from the current value of the chain to the next one. It is also possible to define the probability of jumping in the opposite direction, i.e. from $\lambda'$ to $\lambda^{[s-1]}$, that is $q(\lambda^{[s-1]}|\lambda')$. Although the original algorithm (Metropolis et al., 1953) considered only symmetric proposals, this property is not required in the more recent versions of the algorithm (Ntzoufras, 2011). Furthermore, the proposal $q(\cdot)$ should be defined in a proper way. In fact, the resulting chain needs to satisfy some specific properties, namely irreducibility, aperiodicity and non-transience.
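Steps 1 and 2(i)-(iv) above can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the estimation code used in this thesis: the target is the posterior of a single scalar ability $\lambda$ with a standard normal prior, given five hypothetical binary responses under a normal ogive model with item parameters treated as known (the values in `A`, `B` and `Y` are invented). The proposal is a symmetric normal random walk, so the two $q(\cdot)$ terms in step (iii) cancel; the computation is done on the log scale for numerical stability.

```python
import math
import random

A = [1.0, 1.2, 0.8, 1.5, 1.0]    # hypothetical item discriminations
B = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
Y = [1, 1, 1, 0, 0]              # hypothetical binary responses

def log_phi(z):
    """log of the standard normal CDF, via erfc for accuracy in the tails."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def log_posterior(lam):
    """Unnormalized log posterior: log prior N(0, 1) plus log likelihood."""
    log_prior = -0.5 * lam * lam
    log_lik = 0.0
    for a, b, y in zip(A, B, Y):
        z = a * lam - b
        log_lik += log_phi(z) if y == 1 else log_phi(-z)
    return log_prior + log_lik

def metropolis_hastings(log_target, start, n_iter, proposal_sd, seed):
    rng = random.Random(seed)
    lam = start                        # step 1: initial value
    log_p = log_target(lam)
    draws, accepted = [], 0
    for _ in range(n_iter):            # step 2, with lam playing the role of step (i)
        cand = lam + rng.gauss(0.0, proposal_sd)   # step (ii): candidate
        log_p_cand = log_target(cand)
        # Step (iii): the proposal is symmetric, so q cancels in the ratio.
        log_ratio = log_p_cand - log_p
        alpha = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
        if rng.random() < alpha:       # step (iv): accept with probability alpha
            lam, log_p = cand, log_p_cand
            accepted += 1
        draws.append(lam)              # on rejection the current value is repeated
    return draws, accepted / n_iter

draws, acceptance_rate = metropolis_hastings(
    log_posterior, start=0.0, n_iter=20000, proposal_sd=1.0, seed=42)
kept = draws[2000:]                    # discard a burn-in period
posterior_mean = sum(kept) / len(kept)
print(f"acceptance rate {acceptance_rate:.2f}, posterior mean {posterior_mean:.3f}")
```

Only the unnormalized posterior is needed, because $p(y)$ cancels in the ratio of step (iii). The proposal standard deviation is a tuning choice: too small a value gives high acceptance but slow exploration, too large a value gives many rejections.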
A chain is irreducible if it is possible to move from any state to any other state in a finite number of steps with positive probability, aperiodic if all the states are acyclic, and non-transient if all the states are recurrent (i.e. the probability of returning to a state from that same state is equal to one). Moreover, the ratio
\[ r = \frac{q(\lambda'|\lambda^{[s-1]})}{q(\lambda^{[s-1]}|\lambda')} \]
must be strictly positive for every value of $\lambda$ such that both the numerator and the denominator are nonzero.

In step (iii) the acceptance probability $\alpha$ is computed. The higher $\alpha$ is, the more probable the acceptance of the candidate value $\lambda'$ will be. The ratio appearing in $\alpha$ consists of two components: the ratio of the posterior densities, which drives the algorithm towards the $\lambda$-value with higher posterior density, and the ratio of the proposal densities, which also influences the direction of the move towards one or the other $\lambda$-value.

Step (iv) of the M-H algorithm concerns the acceptance or rejection of the candidate value $\lambda'$. To make this choice, we draw a random number $u$ from the uniform distribution on the interval $[0, 1]$. Then we set:
\[ \lambda^{[s]} = \begin{cases} \lambda' & \text{if } u < \alpha, \\ \lambda^{[s-1]} & \text{if } u \ge \alpha. \end{cases} \tag{3.7} \]
Thus, the candidate value $\lambda'$ is accepted with probability $\alpha$ and rejected when $u \ge \alpha$. In both cases (acceptance or rejection) the iteration counter advances and the algorithm proceeds to generate the next value. The M-H algorithm can also be applied to discrete-valued parameters, in which case the proposal distribution $q(\cdot)$ becomes the probability mass function used to generate candidate points.

3.2.2 Gibbs sampler

The Gibbs sampler was first introduced by Geman and Geman (1984) and then formalized by Gelfand and Smith (1990). It can be obtained as a special case of the M-H algorithm by using as proposal distribution the so-called full conditional posterior distribution:
\[ p(\lambda_j \mid \lambda_1, \dots, \lambda_{j-1}, \lambda_{j+1}, \dots, \lambda_d, y) = p(\lambda_j \mid \lambda_{\setminus j}, y), \tag{3.8} \]
where $\lambda_{\setminus j}$ denotes all the components of $\lambda$ except $\lambda_j$. Such a proposal distribution implies an acceptance probability $\alpha$ equal to one, because the ratio in step (iii) of the M-H algorithm equals 1 (see Gelman et al., 2003). With an acceptance probability equal to one, at each iteration the algorithm performs the jump provided by step (iv) of the M-H algorithm.

The Gibbs sampler is based on iterative sampling from the conditional distributions resulting from the decomposition of the full posterior density. A first advantage of the Gibbs sampler is that, at every iteration, the values are randomly generated from unidimensional distributions, for which a wide variety of computational tools exists (Gilks et al., 1996). Another important advantage is that it does not require the specification of a proposal distribution. This is a key point, because an inaccurate choice of the proposal $q(\cdot)$ in the M-H algorithm may lead to a very slow algorithm. Thus, if it is difficult to sample from a complex and/or highly parameterized posterior distribution and the vector of parameters can be decomposed, we can generate the parameter values from the single conditional distributions in a sequential way.

Suppose we are interested in producing a sample of size $T$ from the target distribution, represented here by the posterior distribution $p(\lambda|y)$, where $\lambda = (\lambda_1, \dots, \lambda_d)$. The Gibbs sampler can be described by the following steps (Ntzoufras, 2011):
1. Set initial values $\lambda^{[0]}$;
2. For $t = 1, \dots, T$, repeat the following steps:
   (i) Set $\lambda = \lambda^{[t-1]}$;
   (ii) For $j = 1, \dots, d$, update $\lambda_j$ from $\lambda_j \sim p(\lambda_j \mid \lambda_{\setminus j}, y)$;
   (iii) Set $\lambda^{[t]} = \lambda$ and save it as the generated set of values at iteration $t + 1$ of the algorithm.

Hence, given the state $\lambda^{[t-1]}$ of the chain, we generate the new parameter values as follows:
\begin{align*}
\lambda_1^{[t]} &\ \text{from}\ p(\lambda_1 \mid \lambda_2^{[t-1]}, \lambda_3^{[t-1]}, \dots, \lambda_d^{[t-1]}, y), \\
\lambda_2^{[t]} &\ \text{from}\ p(\lambda_2 \mid \lambda_1^{[t]}, \lambda_3^{[t-1]}, \dots, \lambda_d^{[t-1]}, y), \\
\lambda_3^{[t]} &\ \text{from}\ p(\lambda_3 \mid \lambda_1^{[t]}, \lambda_2^{[t]}, \lambda_4^{[t-1]}, \dots, \lambda_d^{[t-1]}, y), \\
&\ \ \vdots \\
\lambda_d^{[t]} &\ \text{from}\ p(\lambda_d \mid \lambda_1^{[t]}, \lambda_2^{[t]}, \dots, \lambda_{d-1}^{[t]}, y).
\end{align*}
Generating values from the single conditional distributions is relatively easy, since they are univariate distributions. Moreover, under appropriate regularity conditions, the distribution of $\lambda^{[t]}$ converges to the target distribution. Usually, this convergence is fast, and the complete sequence $\lambda^{[t]}$ can be considered as a simulated sample from the distribution of interest (Matteucci, 2007). For a more detailed exposition of the Gibbs sampler, see Gamerman (1997) and Gelman et al. (2003), or Gelfand and Smith (1990) for an early presentation of this widely used MCMC algorithm.

3.3 Bayesian computation using OpenBUGS

In the following, the simulation study and the application to real data are performed using OpenBUGS (http://www.openbugs.net), an open-source version of the well-known software package BUGS (Bayesian inference Using Gibbs Sampling), which permits a user-friendly implementation of the Gibbs sampler. The software package BUGS was developed in the context of the BUGS project. The BUGS project started in 1989 at the MRC Biostatistics Unit in Cambridge, and the last version of the resulting software, developed by Spiegelhalter et al. (1996), became very popular in the 1990s. WinBUGS, a Windows-based version of BUGS, stopped being upgraded in 2012; hence OpenBUGS, which essentially contains all the features of its ancestor WinBUGS, now represents the future of the BUGS project.

A detailed description of the software goes beyond the scope of this work; nevertheless, useful references for understanding the theoretical ideas at the foundation of BUGS and its functioning are the books by Ntzoufras (2011) and Lunn et al. (2013). As reported in Lunn et al.
(2009), there are several reasons behind the success of the BUGS software. Its appealing features can be briefly summarized as:
• Flexibility. Flexibility is quite probably the principal reason for the popularity of BUGS. BUGS applies the Gibbs sampling method to any directed acyclic graph specified in its language; moreover, it allows the user to add new distributions and functions.
• Easy implementation. Implementing a model in BUGS is fairly simple, because the package itself runs the MCMC algorithm. The user does not need to write down all the full conditional distributions. Moreover, measures, plots and statistics to check the convergence and the fit of the model are automatically computed.
These aspects notwithstanding, the user must always be careful, because BUGS does not perform any check on model identification, so several mistakes can be made without any alert from the program. As the manual clearly remarks: "Gibbs sampling can be dangerous!"

Chapter 4

MIRT graded response models with complex structures

In this chapter we specify two MIRT models for graded responses with a complex structure. After establishing a dichotomization method, we focus on models with a multiunidimensional structure, where the items in each subtest characterize a single ability, and on models with an additive structure, where each item directly measures a general and a specific ability. In the MIRT models presented, all the latent traits are allowed to correlate. The main scientific contribution of this work is the multidimensional additive model for graded responses with correlated traits, estimated with MCMC techniques. Because Bayesian estimation methods are adopted, particular attention is paid to the model building phases.

4.1 MIRT graded response models (GRMs)

A multidimensional generalization of the IRT graded response model (GRM) can be obtained from its unidimensional counterpart.
Let us consider: (i) $n$ individuals; (ii) a set of $p$ ordinal items, where the response $Y_{ij}$ of the $i$-th subject to the $j$-th item can take values in the set $\{1, \dots, K_j\}$, so that each item has $K_j - 1$ thresholds $\kappa_{j1}, \dots, \kappa_{j,K_j-1}$ that have to satisfy the order constraint $\kappa_{j1} < \dots < \kappa_{j,K_j-1}$; and (iii) the existence of multiple, say $m$, latent abilities $\theta_i = (\theta_{1i}, \dots, \theta_{mi})'$ underlying the responses to the items. For simplicity, in this paragraph we do not dwell on the number of latent dimensions, although we must always bear in mind that $\theta_i$ is a vector, so that several latent dimensions are present at the same time. The key issue of the choice of the underlying latent structure is examined more closely later.

The assumptions are quite similar to those of the unidimensional version of the model: it is assumed that an individual can reach a specific category level of an ordinal test item only if he/she is also able to reach all the lower categories of the same item. In other words, the item requires a number of steps, and the accomplishment of a step requires the achievement of the previous one. This type of model is therefore appropriate for rating scales where a rating category includes all the previous categories (Reckase, 2009). The notation introduced above implies that the lowest score on item $j$ is 1 and the highest score is $K_j$. The probability that the $i$-th examinee will select the $k$-th category or higher on item $j$ is assumed to increase monotonically with an increase in any component of the $\theta_i$ vector, i.e. an increase in any of the latent abilities underlying the test.
We have used a dichotomization procedure by adapting Samejima's approach (see Section 1.2.1): in order to make the implementation of the models clearer and easier, our models are specified on the basis of the probability that an item response will fall in category $k$ or lower, denoted by $P$ (while in Section 1.2.1 we used the probability that an item response will fall in category $k$ or higher, denoted by $P^*$). The probability $\pi_{ijk}$ that the $i$-th subject will select the $k$-th category on item $j$ is equal to the probability of answering below the upper boundary of the category ($\kappa_k$) minus the probability of answering below the category's lower boundary ($\kappa_{k-1}$). Figure 4.1 illustrates the dichotomization method used. The dashed line, which represents the hypothetical response, falls in category $k = 4$: the probability of observing a response in that category can be easily calculated as $P_{i4} - P_{i3}$.

Figure 4.1. Dichotomization used for the MIRT graded response model specification. The dashed line indicates the observed category response.

Generalizing the example presented in Figure 4.1, the probability that the $i$-th examinee's response will fall in the $k$-th category on item $j$ can be constructed from the cumulative probabilities $P_{ijk} = P(Y_{ij} \le k \mid \theta_i)$, for $k = 2, \dots, K_j$. We obtain:
\[ \pi_{ijk} = P_{ijk} - P_{i,j,k-1} = P(Y_{ij} \le k \mid \theta_i) - P(Y_{ij} \le k-1 \mid \theta_i), \tag{4.1} \]
and, to guarantee that the probability of each category can be determined from (4.1), it is assumed that $\pi_{ij1} = P_{ij1} = P(Y_{ij} \le 1 \mid \theta_i)$ and $\pi_{ijK_j} = 1 - P_{i,j,K_j-1} = 1 - P(Y_{ij} \le K_j - 1 \mid \theta_i)$.

A normal ogive or a logistic formulation of the model can be obtained from expressions (1.2) and (1.3), but a preliminary step is needed to obtain an expression for the predictor $\eta_{ij}$. In the multidimensional case the predictor becomes a function of the vector $\theta_i$ of person parameters and the vector $\xi_j$ of item parameters, $\eta_{ij} = f(\theta_i, \xi_j)$. In particular, to have an explicit formulation for the predictor we need to make some assumptions reflecting the hypothesized underlying latent structure. Among the different underlying latent structures that can be assumed (see Paragraph 2.1.3), in this thesis we focus on:
• models with a multiunidimensional structure, where the items in each subtest characterize a single ability;
• models with an additive structure, where each item directly measures a general and a specific ability.
As previously mentioned, the choice of these two latent structures is motivated by the fact that the first is widely used and represents a classical approach in MIRT analysis, while the second is able to reflect the complexity of real interactions between items and individuals.

4.1.1 Specification of the multiunidimensional GRM

According to the multiunidimensional structure, each individual $i$ is assumed to be characterized by a vector of latent traits $\theta_i = (\theta_{1i}, \dots, \theta_{mi})$, where each latent dimension is measured by a specific set of test items. Thus, a test consisting of $p$ items is structured into $m$ subtests indexed by $\nu$, each composed of $p_\nu$ items that measure one latent trait. The cumulative probability that individual $i$ will select the $k$-th category or lower on item $j$ belonging to the $\nu$-th subtest is given by:
\[ P_{\nu ijk} = P(Y_{\nu ij} \le k \mid \theta_{\nu i}, \alpha_{\nu j}, \kappa_{jk}) = \Phi(\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}) = \int_{-\infty}^{\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \tag{4.2} \]
where $\alpha_{\nu j}$ and $\kappa_{jk}$ are item parameters representing the item discrimination and the threshold between categories $k$ and $k + 1$, respectively. The parameter $\theta_{\nu i}$ represents the $i$-th examinee's ability in the $\nu$-th ability dimension. We can observe that the predictor $\eta_{\nu ij} = f(\theta_i, \xi_j)$ assumes the form $\eta_{\nu ij} = \kappa_{jk} - \alpha_{\nu j}\theta_{\nu i}$.

The multiunidimensional model for graded responses can be specified in a normal ogive formulation (i) by considering the cumulative probabilities obtained from (4.2) and (ii) by applying the dichotomization procedure represented in Figure 4.1, according to which the probability $\pi_{\nu ijk}$ that the $i$-th examinee will select the $k$-th category on item $j$ in subtest $\nu$ is:
\[ \pi_{\nu ijk} = \begin{cases} P_{\nu ij1} & \text{for } k = 1, \\ P_{\nu ijk} - P_{\nu,i,j,k-1} & \text{for } k = 2, \dots, K_j - 1, \\ 1 - P_{\nu,i,j,K_j-1} & \text{for } k = K_j. \end{cases} \tag{4.3} \]
It should be noticed that in (4.2) only one specific ability affects the response to a specific item. This structure recalls the unidimensional version of the GRM: we can imagine fitting a sequence of unidimensional models, one for each subtest. Nevertheless, a relevant difference is that the distinct latent traits are now allowed to correlate.

4.1.2 Specification of the additive GRM

A relevant aim of this work is to propose a new additive model for ordinal data, estimated by Bayesian MCMC techniques, where the general and specific latent traits are allowed to correlate. In this section, we provide the simple, but very effective, specification of the additive GRM.

Let us consider again a test consisting of $p$ items and structured into $m$ subtests, each composed of $p_\nu$ items ($\nu = 1, \dots, m$). The responses to the items belonging to a specific subtest are assumed to be influenced by a specific ability and a general ability, according to the underlying latent structure illustrated in Figure 2.5. The cumulative probability that individual $i$ will select the $k$-th category or lower on item $j$ belonging to the $\nu$-th subtest is given by:
\[ P_{\nu ijk} = P(Y_{\nu ij} \le k \mid \theta_{0i}, \theta_{\nu i}, \alpha_{0\nu j}, \alpha_{\nu j}, \kappa_{jk}) = \Phi(\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i}) = \int_{-\infty}^{\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. \tag{4.4} \]
Here, $\theta_{0i}$ represents the $i$-th examinee's overall ability and the $\theta_{\nu i}$ represent the specific abilities (with $\nu = 1, \dots, m$).
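As a numerical illustration of (4.2), (4.3) and (4.4), the sketch below computes the category probabilities of a single item from its ordered thresholds and from the linear term subtracted inside $\Phi$: $\alpha_{\nu j}\theta_{\nu i}$ in the multiunidimensional case and $\alpha_{0\nu j}\theta_{0i} + \alpha_{\nu j}\theta_{\nu i}$ in the additive case. The function names and all parameter values are hypothetical and chosen only for illustration; this is not the OpenBUGS code used for estimation.

```python
import math

def phi_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probabilities(kappa, linear_term):
    """Category probabilities pi_1, ..., pi_K as in (4.3).

    kappa: ordered thresholds kappa_1 < ... < kappa_{K-1} of one item.
    linear_term: alpha * theta (multiunidimensional) or
                 alpha0 * theta0 + alpha * theta (additive).
    """
    if any(lo >= hi for lo, hi in zip(kappa, kappa[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P_k = Phi(kappa_k - linear_term), with P_K = 1.
    cumulative = [phi_cdf(k - linear_term) for k in kappa] + [1.0]
    probs = [cumulative[0]]                     # pi_1 = P_1
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

def multiunidimensional_probs(kappa, alpha, theta):
    """Category probabilities under the predictor of (4.2)."""
    return category_probabilities(kappa, alpha * theta)

def additive_probs(kappa, alpha0, alpha, theta0, theta):
    """Category probabilities under the predictor of (4.4)."""
    return category_probabilities(kappa, alpha0 * theta0 + alpha * theta)

kappa = [-1.0, 0.0, 1.2]   # hypothetical thresholds of a four-category item
low = additive_probs(kappa, alpha0=0.8, alpha=1.1, theta0=-1.0, theta=-0.5)
high = additive_probs(kappa, alpha0=0.8, alpha=1.1, theta0=1.0, theta=0.5)
print("low abilities :", [round(p, 3) for p in low])
print("high abilities:", [round(p, 3) for p in high])
```

With positive discriminations, raising either the general or the specific ability moves probability mass towards the higher categories, which is the compensatory behavior of the additive structure. Setting `alpha0` to zero reduces the additive probabilities to the multiunidimensional ones.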
For each item j of the subtest ν : α0νj reects the item discrimination with reference to the overall ability, ανj reects the item discrimination with reference to the specic ability and κjk is an item parameter that reect the threshold between categories k and k + 1. The predictor ηνij now depends on both specic and general latent traits: ηνij = κjk − α0νj θ0i − ανj θνi . The probability πνijk that the i-th examinee will select the k -th category on item j in subtest ν is obtained recursively from (4.3), as in the multiunidimenHere, sional GRM. It has to be noticed that both general and specic abilities are involved in 46 4. MIRT graded response models with complex structures determining the response probability by following a compensatory approach. Finally, all the latent traits underlying the item responses are allowed to correlate. 4.2 Person and item parameters: interpretation The aim of this section is to briey illustrate the meaning of the parameters introduced in the MIRT models for graded responses described in paragraphs 4.1.1 and 4.1.2. Contents of this section are particularly helpful for practical applications. 4.2.1 Ability parameters The presence of more than one latent trait aecting the response process to a test is on the basis of the use of multidimensional item response theory models. The θi -vector of the latent space parameters for person i contains all the information about the measurement of these latent abilities. Higher levels of abilities lead to higher values in the elements of θi . Of course, the composition, and consequently the dimension, of the θi -vector depends on the underlying structure we are assuming. As often mentioned before, when we are dealing with a multiunidimensional structure the vector of person parameters has the form θi = (θ1i , . . . , θmi ). While in an additive context, a pa- rameter reecting the general ability is added, and we get: where m θi = (θ0i , θ1i , . . . 
, θmi ), still denotes the number of specic abilities. One lack in one specic dimension is compensated by the general dimension and viceversa. 4.2.2 Multidimensional item discrimination Moving towards the signicance of the discrimination item parameters, when considered individually, ανj reects the capability of a generic item between individuals with dierent levels of ability sional and additive models. item j Analogously, α0νj θν , j to discriminate both for multiunidimen- reproduces the aptitude of the to dierentiate individuals with dierent levels of general ability θ0 . 4.3 Multiunidimensional GRM implementation 47 Muraki and Carlson (1995) and Yao and Schwarz (2006) dene the multidimensional item discrimination (MDISC) as the maximum discrimination of a test item in a particular direction of the latent space. Hence, considering the multiunidimensional and additive latent structures assumed for the MIRT models presented in this work, we can dene two MDISC measures. The rst one (MDISC) is dened with reference to each one of the latent dimensions ν = 1, . . . , m. For MDISCj j = 1, . . . , p = m X it is expressed as: !1/2 2 ανj . (4.5) ν=1 ∗ The second one (MDISC ) include a further dimension, represented by the general ability. For j = 1, . . . , p, MDISC ∗ MDISCj ∗ is expressed by: = m X !1/2 2 2 ανj + α0νj . (4.6) ν=1 ∗ With reference to a given item, the higher a value of MDISC (MDISC ) is, the grater is the discrimination power of that item, independently from the assumed underlying latent structure. 4.3 Multiunidimensional GRM implementation In order to implement the multiunidimensional model for graded responses by using OpenBUGS, the rst step that we have to face is the so called building phase (Ntzoufras, 2011). model We can summarize the functioning of this phase through several sub-steps, namely: 1. identify the main variable of interest and the corresponding (observed) data; 2. build a structure for the parameters of the distribution; 3. 
specify the prior distributions;
4. find a distribution that adequately describes the observed data and formulate the likelihood of the model.

Considering that our observed variables of interest (point 1) are the responses given by a group of examinees to a test consisting of graded response items, in this section we define all the elements listed above for the model characterized by a multiunidimensional latent ability structure, i.e. according to the probability function defined in (4.2).

4.3.1 Model specification

The probability model is specified according to the multiunidimensional structure (point 2). Recalling the expression in (4.2), a generic cumulative probability $P_{\nu ijk}$ is a function of the item discrimination parameter ($\alpha_{\nu j}$), the threshold parameter between categories $k$ and $k+1$ ($\kappa_{jk}$), and the specific ability measured by the $j$-th item ($\theta_{\nu i}$). Thus, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$, it holds that

$$P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{\nu j}\,\theta_{\nu i}).$$

As previously described, we set $P_{\nu ijK_j} = 1$, and we obtain by difference the probability that the response of individual $i$ to item $j$ will fall in category $k$: $\pi_{\nu ij1} = P_{\nu ij1}$ and $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 2, \ldots, K_j$.

The model parameters, which in a Bayesian context are treated as proper random variables and therefore require prior distributions, are the person parameters $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ and the item parameters $\alpha_{\nu j}$ and $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$.

4.3.2 Prior distributions

Moving on to point 3 of the model building phase, in the multiunidimensional GRM we assume that the latent trait vectors $\theta_1, \ldots, \theta_n$ of the $n$ examinees are independent and multivariate normally distributed:

$$\theta_i \sim N_m(\mu, \Sigma),$$

where $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ is the vector of latent traits for examinee $i$, $\mu$ is the $m$-dimensional mean vector and $\Sigma$ is the $m \times m$ constrained variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to the ability correlations. Thus, for $i = 1, \ldots, n$, the prior distribution for $\theta_i$ is defined as:

$$p(\theta_i) = \frac{1}{\sqrt{(2\pi)^m |\Sigma|}} \exp\left\{ -\frac{1}{2} (\theta_i - \mu)' \Sigma^{-1} (\theta_i - \mu) \right\}, \tag{4.7}$$

where $m$ is the number of specific latent traits (subtests).

Moreover, normal distributions are assumed for the item discrimination parameters, that is $\alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2)$, for $\nu = 1, \ldots, m$ and $j = 1, \ldots, p$. In addition, since the parameter reflecting the power of the item to discriminate between examinees must be positive, the truncated version of the normal distribution is used:

$$\alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2)\, I(\alpha_{\nu j} > 0),$$

where $I$ denotes the indicator function.

The priors for the threshold parameters must account for the order constraint $\kappa_{j1} < \cdots < \kappa_{j,K_j-1}$. Hence we first introduce unconstrained auxiliary parameters $\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}$ such that $\kappa^*_{jk} \sim N(\mu_\kappa, \sigma_\kappa^2)$ for $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$ (Curtis, 2010). Then, prior distributions on the thresholds for the $j$-th item are obtained from the order statistics of the auxiliary variables:

$$\kappa_{j1} = \kappa^*_{j,[1]}, \quad \kappa_{j2} = \kappa^*_{j,[2]}, \quad \ldots, \quad \kappa_{j,K_j-1} = \kappa^*_{j,[K_j-1]},$$

where $\kappa^*_{j,[s]}$ denotes the $s$-th order statistic of $\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}$. As reported in Curtis (2010), this approach is also recommended by Plummer (2010).

Identification issues

Particular attention should be paid to the restrictions that have to be imposed on the hyperparameters in order to ensure model identification. In general, Bayesian item response models can be identified (Fox, 2010) by imposing restrictions on the hyperparameters or via a (standard) scale transformation in the estimation procedure.
According to the first approach, for identification purposes we set $\mu = 0$, $\mu_\alpha = 0$, $\mu_\kappa = 0$, $\sigma_\alpha^2 = 1$ and $\sigma_\kappa^2 = 1$. Moreover, a multivariate normal prior distribution with a fixed correlation structure is assumed for the abilities: $\theta_i \sim N_m(0, \Sigma)$, for $i = 1, \ldots, n$, where $\Sigma$ is the variance-covariance matrix defined before. Even if this choice can be viewed as very restrictive, it reflects the common beliefs and usual assumptions found in the literature. In fact, a strength of the Bayesian approach is the possibility of formulating particular prior distributions depending on the information available a priori.

4.3.3 Likelihood function for responses

A categorical (generalized Bernoulli) distribution with parameters $\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}$ is assumed for the responses (point 4 of the model building phase). Thus, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $i = 1, \ldots, n$, it holds that:

$$Y_{ij} \mid \bullet \sim \mathrm{Cat}(\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}), \tag{4.8}$$

therefore:

$$P(Y_{ij} = k \mid \bullet) = \pi_{\nu ij1}^{[k=1]} \cdot \pi_{\nu ij2}^{[k=2]} \cdot \ldots \cdot \pi_{\nu ijK_j}^{[k=K_j]}. \tag{4.9}$$

Once the likelihood function for the observed data is defined, the model is fully specified and we can perform the Bayesian estimation of the parameters of interest through an easy implementation in OpenBUGS, which runs the Gibbs sampler algorithm. The main advantage is that, while the joint posterior distribution has an intractable form, the full conditional distributions are well defined. In fact:

$$P(\theta, \alpha, \kappa, \Sigma \mid Y) \propto L(Y \mid \theta, \alpha, \kappa, \Sigma)\, P(\theta \mid \Sigma)\, P(\alpha)\, P(\kappa). \tag{4.10}$$

Expression (4.10) represents the joint posterior distribution of interest, where $L$ is the likelihood function and $\theta$, $\alpha$ and $\kappa$ are assumed to be independent.

Details about the code used to implement the model in OpenBUGS are reported in Appendix A.
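As an illustration of Sections 4.3.1 to 4.3.3, the following is a minimal Python sketch of the building blocks of the multiunidimensional GRM for a single item and a single examinee. It is not the OpenBUGS code of Appendix A: the function names are hypothetical, the hyperparameters are fixed at the identification values (mean 0, variance 1), and the truncated normal prior is sampled by simple rejection, which is an assumption made here for brevity.

```python
import math
import random


def phi(x):
    """Standard normal CDF, i.e. the probit link Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def grm_category_probs(alpha, theta, kappa):
    """Category probabilities pi_1..pi_K for one item and one examinee.

    alpha: item discrimination; theta: specific ability;
    kappa: ordered thresholds kappa_1 < ... < kappa_{K-1}.
    """
    # Cumulative probabilities P_k = Phi(kappa_k - alpha * theta), with P_K = 1.
    cum = [phi(k - alpha * theta) for k in kappa] + [1.0]
    # pi_1 = P_1 and pi_k = P_k - P_{k-1} for k = 2..K.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def draw_ordered_thresholds(n_thresholds, rng):
    """Order-statistics prior: sort independent N(0, 1) auxiliary draws."""
    return sorted(rng.gauss(0.0, 1.0) for _ in range(n_thresholds))


def draw_positive_normal(rng):
    """Draw from N(0, 1) truncated to (0, inf) by rejection (illustrative)."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if x > 0:
            return x


def draw_response(probs, rng):
    """Categorical draw returning a category in 1..K."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


rng = random.Random(1)
kappa = draw_ordered_thresholds(3, rng)  # K = 4 categories
alpha = draw_positive_normal(rng)
probs = grm_category_probs(alpha, theta=0.5, kappa=kappa)
y = draw_response(probs, rng)
```

Sorting the auxiliary draws guarantees the order constraint on the thresholds, so the differences of consecutive cumulative probabilities are non-negative and sum to one.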
4.4 Additive GRM implementation

As for the multiunidimensional model, the implementation of the additive GRM in OpenBUGS requires the specification of the model according to the probability function defined in (4.4), the definition of the prior distributions, and the formulation of the likelihood function for the observed responses.

4.4.1 Model specification

The existence of a general ability in addition to the specific abilities implies the introduction of the further component $\theta_{0i}$ in the vector of person parameters for individual $i$: $\theta_i = (\theta_{0i}, \theta_{1i}, \ldots, \theta_{mi})$; the dimension of this vector is therefore now $m + 1$.

According to the additive structure presented in Figure 2.5, where each item directly measures an overall and a specific ability, and translated into expression (4.4), a generic cumulative probability $P_{\nu ijk}$ is a function of the item discrimination parameter related to the general ability ($\alpha_{0\nu j}$), the item discrimination parameter related to the specific ability ($\alpha_{\nu j}$), the threshold parameter between categories $k$ and $k+1$ ($\kappa_{jk}$), the general ability of the individual ($\theta_{0i}$), and the specific ability ($\theta_{\nu i}$) measured by the $j$-th item. Recall that each item belonging to a given subtest measures the general ability and only one specific ability. Hence, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 1, \ldots, K_j - 1$, it holds that

$$P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{0\nu j}\,\theta_{0i} - \alpha_{\nu j}\,\theta_{\nu i}).$$

Again, we set $P_{\nu ijK_j} = 1$, and the probability that the response of individual $i$ to item $j$ will fall in category $k$ can be obtained by difference, as in the multiunidimensional case: $\pi_{\nu ij1} = P_{\nu ij1}$ and $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $k = 2, \ldots, K_j$.

4.4.2 Prior distributions

Also in the additive GRM we assume that the latent trait vectors $\theta_1, \ldots, \theta_n$ of the $n$ examinees are independent and multivariate normally distributed:

$$\theta_i \sim N_{m+1}(\mu, \Sigma),$$

where $\mu$ is the $(m+1)$-dimensional mean vector and $\Sigma$ is the $(m+1) \times (m+1)$ constrained variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to the ability correlations.

As explained above, $\theta_{0i}$ represents the unobservable general ability of examinee $i$, which affects all the responses given by this examinee to the test items, while the specific abilities of individual $i$, $\theta_{1i}, \ldots, \theta_{mi}$, affect every item contained in the corresponding subtest $\nu$, for $\nu = 1, \ldots, m$. The prior distribution for $\theta_i$ is then defined by expression (4.7), for $i = 1, \ldots, n$, where $m$ is the number of both subtests and specific latent traits and the dimension of the density is $m + 1$ instead of $m$. For identification purposes, we set $\mu = 0$ and take $\Sigma$ to be a fixed variance-covariance matrix.

Normal distributions are assumed for the item discrimination parameters $\alpha_{0\nu j}$ and $\alpha_{\nu j}$, for $\nu = 1, \ldots, m$ and $j = 1, \ldots, p$:

$$\alpha_{0\nu j} \sim N(\mu_{\alpha_0}, \sigma_{\alpha_0}^2), \qquad \alpha_{\nu j} \sim N(\mu_\alpha, \sigma_\alpha^2),$$

and, after restricting these parameters to be positive and imposing the identification constraints (i.e. $\mu_{\alpha_0} = \mu_\alpha = 0$ and $\sigma_{\alpha_0}^2 = \sigma_\alpha^2 = 1$), we obtain truncated normal prior distributions for $\alpha_{0\nu j}$ and $\alpha_{\nu j}$:

$$\alpha_{0\nu j} \sim N(0, 1)\, I(\alpha_{0\nu j} > 0), \qquad \alpha_{\nu j} \sim N(0, 1)\, I(\alpha_{\nu j} > 0).$$

Finally, concerning the threshold parameters, we again obtain an ordered series $\kappa_{j1}, \ldots, \kappa_{j,K_j-1}$ starting from the unconstrained variables $\kappa^*_{jk}$, distributed as $\kappa^*_{jk} \sim N(0, 1)$ (with identification constraints on the hyperparameters $\mu_\kappa = 0$ and $\sigma_\kappa^2 = 1$), and applying the transformation:

$$\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}.$$

4.4.3 Likelihood function for responses

As in the multiunidimensional GRM, a categorical distribution with parameters $\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j}$ is assumed for the responses; therefore, for $\nu = 1, \ldots, m$, $j = 1, \ldots, p$ and $i = 1, \ldots, n$, expressions (4.8) and (4.9) also hold for the additive GRM.
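The additive specification of Section 4.4.1 and its compensatory nature can be illustrated with a short Python sketch. It is only a hedged illustration, not the thesis implementation: the function name `additive_grm_probs` and the numerical values of the thresholds, discriminations and abilities are hypothetical choices made for the example.

```python
import math


def phi(x):
    """Standard normal CDF, i.e. the probit link Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def additive_grm_probs(alpha0, alpha, theta0, theta, kappa):
    """Category probabilities under the additive GRM for one item.

    alpha0, alpha: discriminations for the general and the specific ability;
    theta0, theta: general and specific ability of the examinee;
    kappa: ordered thresholds kappa_1 < ... < kappa_{K-1}.
    """
    # Both traits enter the predictor additively:
    # eta_k = kappa_k - alpha0 * theta0 - alpha * theta.
    shift = alpha0 * theta0 + alpha * theta
    cum = [phi(k - shift) for k in kappa] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


kappa = [-1.0, 0.0, 1.0]  # hypothetical thresholds, K = 4 categories

# Compensation: with equal discriminations, a high general ability offsets a
# low specific ability (and vice versa), giving identical probabilities.
p_high_general = additive_grm_probs(1.0, 1.0, theta0=1.0, theta=-1.0, kappa=kappa)
p_high_specific = additive_grm_probs(1.0, 1.0, theta0=-1.0, theta=1.0, kappa=kappa)

# A higher general ability moves probability mass towards the top category.
p_low = additive_grm_probs(1.0, 1.0, theta0=0.0, theta=0.0, kappa=kappa)
p_high = additive_grm_probs(1.0, 1.0, theta0=1.0, theta=0.0, kappa=kappa)
```

Setting `alpha0` to zero removes the general trait from the predictor, so the same function then returns the multiunidimensional probabilities of Section 4.3.1.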
See Appendix A for details about the code used to implement the additive GRM in OpenBUGS.

Summarizing, Table 4.1 reports the main characteristics of the multiunidimensional and the additive models considered in this work.

| | Multiunidimensional model for graded responses | Additive model for graded responses |
|---|---|---|
| Underlying latent structure | | |
| Model specification | $P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{\nu j}\theta_{\nu i})$; $P_{\nu ijK_j} = 1$; $\pi_{\nu ij1} = P_{\nu ij1}$; $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$ | $P_{\nu ijk} = \Phi(\kappa_{jk} - \alpha_{0\nu j}\theta_{0i} - \alpha_{\nu j}\theta_{\nu i})$; $P_{\nu ijK_j} = 1$; $\pi_{\nu ij1} = P_{\nu ij1}$; $\pi_{\nu ijk} = P_{\nu ijk} - P_{\nu,i,j,k-1}$ |
| Person parameters: $\theta_i$-vector | $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})$ | $\theta_i = (\theta_{0i}, \theta_{1i}, \ldots, \theta_{mi})$ |
| Person parameters: prior on $\theta_i$ | $\theta_i \sim N_m(0, \Sigma)$ | $\theta_i \sim N_{m+1}(0, \Sigma)$ |
| Item parameters: discrimination (for a specific ability) | $\alpha_{\nu j} \sim N(0,1)\, I(\alpha_{\nu j} > 0)$ | $\alpha_{\nu j} \sim N(0,1)\, I(\alpha_{\nu j} > 0)$ |
| Item parameters: discrimination (for the general ability) | none | $\alpha_{0\nu j} \sim N(0,1)\, I(\alpha_{0\nu j} > 0)$ |
| Item parameters: thresholds | $\kappa^*_{jk} \sim N(0,1)$; $\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}$ | $\kappa^*_{jk} \sim N(0,1)$; $\{\kappa_{j1}, \ldots, \kappa_{j,K_j-1}\} = \mathrm{ranked}\{\kappa^*_{j1}, \ldots, \kappa^*_{j,K_j-1}\}$ |
| Response likelihood | $Y_{ij} \mid \bullet \sim \mathrm{Cat}(\pi_{\nu ij1}, \ldots, \pi_{\nu ijK_j})$; $P(Y_{ij} = k \mid \bullet) = \pi_{\nu ij1}^{[k=1]} \cdot \pi_{\nu ij2}^{[k=2]} \cdot \ldots \cdot \pi_{\nu ijK_j}^{[k=K_j]}$ | same as the multiunidimensional model |

Table 4.1. Main features of the proposed multiunidimensional and additive models for graded responses.

Chapter 5

Simulation Study

In this chapter we present the simulation study performed to assess the item parameter recovery for both the multiunidimensional and the additive GRMs. The simulation study is conducted on a bidimensional case by varying the number of response categories, the sample size, the test and subtest lengths and the ability correlation structure.

Two distinct simulation analyses have been designed in order to evaluate the parameter recovery of the multiunidimensional and the additive GRMs, respectively. A first series of simulations was carried out with the same simulation conditions for both models (Block 1).
Then further conditions were analyzed in order to better understand the behavior of the additive model (Block 2).

Several works on MIRT models focus on the accuracy of parameter estimation, and, through the manipulation of simulation conditions, it is possible to assess parameter recovery (Sheng, 2008; Sheng, 2010; Edwards, 2010a).

The first section of the chapter describes the simulation study design, while the second and third sections illustrate the simulation conditions and results for the multiunidimensional and the additive models, respectively.

5.1 Simulation study design

The aim is to evaluate the item parameter recovery of the multiunidimensional and the additive GRMs under several conditions. We consider the bidimensional case, $m = 2$, which implies the presence of two specific abilities $\theta_1$ and $\theta_2$ for the multiunidimensional model, and the presence of two specific abilities $\theta_1$ and $\theta_2$ and an overall ability $\theta_0$ for the additive model (recalling the graphical notation introduced before, the latent structures are summarized in Figure 5.1).

Figure 5.1. Bidimensional case for multiunidimensional and additive structures.

The model parameters and the ability correlations are estimated through OpenBUGS version 3.2.2. The fundamental scheme for each simulation is the following (for more details about the procedure and the code used for the implementation, see Appendix B):

• Simulate the vectors of `real' parameters, taking into account the conditions we are testing. We perform this step using an R GUI procedure.

• Perform $Q = 10$ replications of the computation procedure for each simulation. In each replication we sample the data matrix using the parameters obtained at the previous step, and we run OpenBUGS through the R package BRugs (Thomas et al., 2006), which allows OpenBUGS to be called automatically from R.
• Proceed to the evaluation of parameter recovery and the computation of the reproduced correlations between the latent traits by using the $Q$ estimates obtained at the previous step.

5.1.1 Parameter recovery

In order to evaluate the recovery of the generated item parameters (which in our simulation context correspond to the real population values), we compute the absolute bias and the root mean square error (RMSE) for each estimated parameter, taking into account the $Q$ replications for each simulation. If we denote with $\hat{\omega}$ a generic parameter estimate, i.e. the mean of the posterior distribution obtained in each replication, and with $\omega^*$ the real generated value, bias and RMSE are computed as follows:

$$\mathrm{Bias}(\omega) = \frac{1}{Q} \sum_{q=1}^{Q} (\hat{\omega}_q - \omega^*) \tag{5.1}$$

$$\mathrm{RMSE}(\omega) = \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} (\hat{\omega}_q - \omega^*)^2}, \tag{5.2}$$

where lower levels of bias and RMSE indicate better precision in parameter recovery.

5.1.2 Estimated ability correlations

Considering that the two models have been specified allowing the latent traits to correlate, and that the correlation structure is reflected in the variance-covariance matrix $\Sigma$ of the latent abilities, we are interested not only in item parameter recovery, but also in how well the models reproduce such ability correlations. For this reason, for each simulation, we also report the estimated ability Pearson correlations: $\hat{r}_{12}$ for the multiunidimensional model, and $\hat{r}_{01}$, $\hat{r}_{02}$ and $\hat{r}_{12}$ for the additive model (recall that the subscript 0 refers to the overall ability).

5.1.3 Convergence detection

Lunn et al. (2013) clearly describe how important it is to detect the convergence of the chain. An easy, but effective, strategy is the detection of convergence informally by eye. However, the model could include many parameters and, consequently, it can be quite hard to check all of them by eye. Figure 5.2 shows two examples of chains that have reached convergence. The initial part of the chain, i.e. the non-stationary part, is called burn-in, and the iterations belonging to it must be discarded to be sure that the successive realisations can be considered as a sample from the stationary distribution. The burn-in period is easily recognizable in the first chain reported in Figure 5.2.

Figure 5.2. Examples of stationary chains. Source: Lunn et al., 2013.

The so-called R statistic, proposed by Gelman and Rubin (1992) and further developed by Brooks and Gelman (1998), is a useful instrument to check the convergence of the Markov chains, and hence the reliability of the estimates. This convergence diagnostic can be constructed only when more than one chain is run simultaneously. This aspect led to our decision to run two distinct chains for each simulation (see Section 5.1.5). Basically, convergence is reached when the trajectories of the chains, started from different initial values, become indistinguishable. The method is based on the between- and within-sample variabilities (Ntzoufras, 2011) and the diagnostic statistic is given by:

$$\hat{R} = \frac{\hat{V}}{W} = \frac{T' - 1}{T'} + \frac{B/T'}{W} \cdot \frac{M + 1}{M},$$

where $T'$ represents the number of iterations in each chain, $M$ is the number of chains, $B/T'$ is the between-sample variance, that is the variance of the posterior mean values taking into account all the chains, $W$ is the within-sample variance, that is the mean of the variances within each chain, and the pooled posterior variance is given by (Ntzoufras, 2011):

$$\hat{V} = \frac{T' - 1}{T'}\, W + \frac{B}{T'} \cdot \frac{M + 1}{M}.$$

Once the chains are stationary and convergence is reached, $\hat{R} \to 1$. A corrected version of the R statistic also exists, see Brooks and Gelman (1998).

5.1.4 Bayesian fit

In addition to the calculation and examination of $\hat{R}$, other well-known indicators for the evaluation of fit are the Bayesian deviance and the deviance information criterion (DIC) (Lunn et al., 2013).
Their use is appropriate to obtain measures of fit and complexity of the model considered. The Bayesian deviance is defined as:

$$D(\theta) = -2 \log p(y \mid \theta),$$

where $\theta$ denotes the model parameters and $p(y \mid \theta)$ denotes the full sampling distribution. OpenBUGS treats the deviance as a node (created automatically), so that it has its own posterior distribution and can be handled like the other model parameters. Combining the posterior mean deviance, $\overline{D(\theta)}$, and the effective number of model parameters, $p_D$, we can compute the DIC through the expression

$$\mathrm{DIC} = \overline{D(\theta)} + p_D.$$

It can be proved that the DIC is an approximation of Akaike's information criterion, $\mathrm{AIC} = D(\theta) + 2p_D$. Also in this case, OpenBUGS makes it easy to compute the DIC for each model implemented.

5.1.5 General simulation conditions

For all the simulations conducted in this work, $Q = 10$ replications have been performed. For each one, we have considered a chain length of 30,000 iterations, with a burn-in phase of 15,000 iterations. Moreover, two chains have been generated, so that the $\hat{R}$ and DIC statistics can be computed in OpenBUGS. These choices are costly in terms of the computational time needed to run the Gibbs sampling algorithm for each simulation (a single replication takes about 13 hours to complete); nevertheless, after an examination of the $\hat{R}$ diagnostic illustrated above, they ensure that convergence is reached.

For each distinct case, we perform different simulations with a sample size of $n = 500$ and a larger sample size of $n = 1000$.

5.2 Multiunidimensional GRM: simulations and results

In this section we report the conditions and the results of the simulations carried out to assess the parameter recovery of MCMC estimation for the multiunidimensional model for graded responses.
All the simulations conducted are characterized by the general conditions reported in Section 5.1.5 and by other specific conditions, with the aim of evaluating the sensitivity of the model.

5.2.1 Simulation conditions

We consider $n$ individuals and a set of $p$ ordinal items, divided into 2 subtests consisting of $p_1$ and $p_2$ items, respectively. The response $Y_{ij}$ of the $i$-th individual to the $j$-th item can take values in the set $1, \ldots, K_j$, hence each item is characterized by $K_j - 1$ thresholds satisfying the order constraint $\kappa_{j1} < \cdots < \kappa_{j,K_j-1}$. Moreover, we assume that all the test items have the same number of categories, i.e. $K_1 = \ldots = K_p = K$. Additionally, we assume the existence of $m = 2$ latent abilities, $\theta_1$ and $\theta_2$, underlying the responses to the items, which follow a multiunidimensional latent structure (see Figure 5.1, left part). Thus, the test consists of two subtests and the items in each subtest characterize a single specific ability. Moreover, the specific abilities are allowed to correlate and the model follows a compensatory approach.

| Simulation | $p$ | $p_1$ | $p_2$ | $K_j$ | $n$ | $\Sigma$ |
|---|---|---|---|---|---|---|
| #1 | 15 | 5 | 10 | 3 | 500 | $\Sigma_A$ |
| #2 | 15 | 5 | 10 | 3 | 500 | $\Sigma_B$ |
| #3 | 15 | 5 | 10 | 4 | 500 | $\Sigma_A$ |
| #4 | 15 | 5 | 10 | 4 | 500 | $\Sigma_B$ |
| #5 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_A$ |
| #6 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_B$ |
| #7 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_A$ |
| #8 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_B$ |

Table 5.1. Simulation conditions for the multiunidimensional model for graded responses.

We perform a block of simulations (Block 1) referred to the case where a test of length $p = 15$ is divided into a first subtest made of $p_1 = 5$ items and a second subtest made of $p_2 = 10$ items. A further distinction has been made about the number of item categories, varying from $K = 3$ to $K = 4$. Furthermore, each case was analyzed by using two different correlation matrices among the abilities: $\Sigma_A$ and $\Sigma_B$. $\Sigma_A$ is a $2 \times 2$ identity matrix, where the correlation among the specific abilities is set to zero ($r_{12} = 0$).
The second correlation matrix $\Sigma_B$ introduces a moderate correlation between the latent abilities ($r_{12} = 0.4$).

By combining all the conditions, we obtain 8 different scenarios, listed in Table 5.1, to investigate the parameter recovery for the multiunidimensional GRM.

5.2.2 Results

In this section we report the results obtained for each of the 8 simulations conducted for the multiunidimensional model for graded responses. In the following, for each item parameter type within a subtest, the median absolute bias and the median root mean square error are reported for each scenario, as well as the ability correlation estimates.

Simulations Block 1 - Subtest 1 (5 items)
α1 n (p1 , p2 ) K RMSE κ1 Bias RMSE ΣA ΣB (5,10) 4 ΣA ΣB (5,10) 3 ΣA ΣB 0.06 0.02 0.05 0.06 0.04 (5,10) 4 ΣA ΣB 0.05 0.05 1000 Bias 0.11 0.09 0.08 0.05 0.11 0.07 0.10 0.02 0.11 0.09 0.12 0.06 0.09 0.05 0.10 0.10 (5,10) 3 500 κ2 κ3 RMSE Bias RMSE Bias 0.08 0.04 0.07 0.03 0.07 0.03 0.09 0.04 0.08 0.02 0.09 0.05 0.03 0.08 0.05 0.05 0.04 0.05 0.02 0.02 0.06 0.04 0.05 0.01 0.04 0.01 0.01 0.04 0.01 0.03 0.01 0.05 0.02
Table 5.2. Multiunidimensional model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).

Simulations Block 1 - Subtest 2 (10 items)
α1 n (p1 , p2 ) K κ1 RMSE Bias RMSE κ2 Bias κ3 RMSE Bias RMSE Bias (5,10) 3 ΣA ΣB 0.09 0.02 0.07 0.02 0.07 0.02 0.08 0.01 0.07 0.03 0.08 0.02 (5,10) 4 ΣA ΣB 0.07 0.03 0.08 0.09 0.05 0.09 0.02 0.08 0.06 0.08 0.08 0.01 (5,10) 3 ΣA ΣB 0.07 0.02 0.06 0.02 0.05 0.03 0.08 0.03 0.06 0.03 0.06 0.03 (5,10) 4 ΣA ΣB 0.07 0.02 0.05 0.02 0.05 0.02 0.06 0.01 0.06 0.02 0.05 0.01 0.05 0.00 0.08 0.03 500 1000 0.12 0.07 0.11 0.02
Table 5.3. Multiunidimensional model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
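The summary measures reported for each scenario can be sketched in a few lines of Python. This is a hedged illustration rather than the code of Appendix B: it assumes the bias and RMSE of Section 5.1.1 (with the RMSE taken as the square root of the mean squared error over the Q replications), the function names are hypothetical, and the estimates below are made-up numbers, not results from the study.

```python
import math
import statistics


def bias(estimates, true_value):
    """Mean deviation of the Q replicated estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Square root of the mean squared deviation over the Q replications."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)


def subtest_summary(estimates_by_item, true_by_item):
    """Median RMSE and median absolute bias across the items of a subtest."""
    pairs = list(zip(estimates_by_item, true_by_item))
    rmses = [rmse(est, true) for est, true in pairs]
    abs_biases = [abs(bias(est, true)) for est, true in pairs]
    return statistics.median(rmses), statistics.median(abs_biases)


# Made-up example: 3 items of one parameter type, Q = 2 replications each.
estimates_by_item = [[0.9, 1.1], [1.4, 1.6], [0.5, 0.7]]
true_by_item = [1.0, 1.5, 0.5]
median_rmse, median_abs_bias = subtest_summary(estimates_by_item, true_by_item)
```

Taking the median over items, rather than the mean, keeps a single poorly recovered item from dominating the summary for a subtest.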
In Tables 5.2 and 5.3 we present the RMSE and the absolute bias for the item parameters (discrimination and threshold parameters) characterizing, respectively, the items belonging to the first subtest and the items belonging to the second subtest. Values of the RMSE greater than 0.10 and values of the absolute bias greater than 0.05 are highlighted in bold, identifying cases where the parameter recovery could be improved.

With reference to the first subtest, Table 5.2 shows that the worst performances are related to the smaller sample size ($n = 500$). In fact, when we increased the sample size to $n = 1000$, the RMSEs and biases noticeably decreased, other things being equal. The presence of an underlying correlation between the two latent traits does not seem to affect item parameter recovery.

Results are similar for items belonging to the second subtest (Table 5.3). Higher biases are noticed when the sample size is smaller, even though the overall parameter reproduction is better than in the first subtest. This is likely due to the greater number of items included in the second subtest ($p_2 = 10$ versus $p_1 = 5$). For $n = 1000$ item parameters are recovered very precisely.

Multiunidimensional model: Real and estimated ability correlations

| $n$ | $(p_1, p_2)$ | $K$ | $\Sigma$ | $r_{12}$ | $\hat{r}_{12}$ |
|---|---|---|---|---|---|
| 500 | (5,10) | 3 | $\Sigma_A$ | 0.00 | 0.02 |
| 500 | (5,10) | 3 | $\Sigma_B$ | 0.40 | 0.46 |
| 500 | (5,10) | 4 | $\Sigma_A$ | 0.00 | 0.03 |
| 500 | (5,10) | 4 | $\Sigma_B$ | 0.40 | 0.45 |
| 1000 | (5,10) | 3 | $\Sigma_A$ | 0.00 | -0.01 |
| 1000 | (5,10) | 3 | $\Sigma_B$ | 0.40 | 0.49 |
| 1000 | (5,10) | 4 | $\Sigma_A$ | 0.00 | -0.01 |
| 1000 | (5,10) | 4 | $\Sigma_B$ | 0.40 | 0.48 |

Table 5.4. Multiunidimensional model: real ($r$) and estimated ($\hat{r}$) ability correlations.

Table 5.4 illustrates the estimated ability correlations for each scenario. It can be noticed that the differences between the generated real values $r_{12}$ and the estimated values $\hat{r}_{12}$ are quite low, indicating good performance of the model. Here, unlike what we observe for item parameters, the underlying ability correlation structure seems to influence the correlation reproduction.
In fact we observe a worse reproduction for the model characterized by the more complex latent correlation structure $\Sigma_B$. The sample size also seems to influence the reproduction of the latent trait correlations: the reproduction accuracy increases with the sample size for the simple case ($\Sigma_A$), while it decreases with the sample size for the complex case ($\Sigma_B$).

As a concluding remark, what emerges from the simulation study conducted to assess the multiunidimensional model for graded responses with correlated latent traits is that item parameters and ability correlations are well reproduced.

5.3 Additive GRM: simulations and results

In this section we report the conditions and the results of the simulation study conducted to evaluate the multidimensional additive GRM with correlated abilities, estimated within a Bayesian context. In addition to the first block of simulations, designed also for the multiunidimensional model, a further block of simulations has been performed in order to better understand the behavior of our proposed model.

5.3.1 Simulation conditions

The general simulation conditions for the additive model for graded responses are the same as for the multiunidimensional model. We still assume the existence of $m = 2$ specific latent abilities, $\theta_1$ and $\theta_2$, but now we also consider an overall latent ability $\theta_0$. Accordingly, we are focusing on the bidimensional case, for which the latent structure is represented in Figure 5.1, right part.

We indicate with $p$ the total number of ordinal test items, and with $p_1$ and $p_2$ the number of items belonging to the first and the second subtest, respectively. $K_j$ indicates the highest category for the $j$-th item, and we assume that all the items have the same number of categories, $K_j = K$, $\forall j$.

We start from a first block of simulations (Block 1) referred to the case where a test of length $p = 15$ is divided into a first subtest made of $p_1 = 5$ items and a second subtest made of $p_2 = 10$ items. A further distinction has been made about the number of item categories, varying in the first block from $K = 3$ to $K = 4$. Furthermore, each case was analyzed by using two different correlation matrices among the abilities: $\Sigma_A$ and $\Sigma_B$. $\Sigma_A$ is a $3 \times 3$ identity matrix, where all the correlations among the abilities are set to zero ($r_{01} = r_{02} = r_{12} = 0$). In this case, the additive model with orthogonal traits has the same latent structure as the well-known bi-factor model, and the three latent traits (the general and the specific abilities) are separate and well distinguished from each other. The second correlation matrix $\Sigma_B$ introduces moderate correlations between all the latent abilities ($r_{01} = 0.4$, $r_{02} = 0.3$, $r_{12} = 0.2$). The choice of not particularly high levels of correlation has been driven by the consideration that high correlations among the latent abilities may lead to the existence of a dominant latent trait, leading back to a unidimensional model.

In order to investigate further conditions, we designed a second block of simulations (Block 2), where we increase both the length of the test and the number of item categories. We consider a case characterized by a test length of $p = 50$ (divided into $p_1 = 20$ and $p_2 = 30$ items for subtests 1 and 2, respectively) and $K = 4$ categories for each test item, and a last case where the test length is $p = 30$ ($p_1 = 10$ and $p_2 = 20$) and items have $K = 5$ categories. Again, with respect to the correlation matrix, the two cases of $\Sigma_A$ and $\Sigma_B$ are distinguished as above.

By combining all the simulation conditions, we obtain 16 different scenarios, illustrated in Table 5.5, to investigate the parameter recovery for the proposed model.
| Simulation | $p$ | $p_1$ | $p_2$ | $K_j$ | $n$ | $\Sigma$ |
|---|---|---|---|---|---|---|
| #1 | 15 | 5 | 10 | 3 | 500 | $\Sigma_A$ |
| #2 | 15 | 5 | 10 | 3 | 500 | $\Sigma_B$ |
| #3 | 15 | 5 | 10 | 4 | 500 | $\Sigma_A$ |
| #4 | 15 | 5 | 10 | 4 | 500 | $\Sigma_B$ |
| #5 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_A$ |
| #6 | 15 | 5 | 10 | 3 | 1000 | $\Sigma_B$ |
| #7 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_A$ |
| #8 | 15 | 5 | 10 | 4 | 1000 | $\Sigma_B$ |
| #9 | 50 | 20 | 30 | 4 | 500 | $\Sigma_A$ |
| #10 | 50 | 20 | 30 | 4 | 500 | $\Sigma_B$ |
| #11 | 30 | 10 | 20 | 5 | 500 | $\Sigma_A$ |
| #12 | 30 | 10 | 20 | 5 | 500 | $\Sigma_B$ |
| #13 | 50 | 20 | 30 | 4 | 1000 | $\Sigma_A$ |
| #14 | 50 | 20 | 30 | 4 | 1000 | $\Sigma_B$ |
| #15 | 30 | 10 | 20 | 5 | 1000 | $\Sigma_A$ |
| #16 | 30 | 10 | 20 | 5 | 1000 | $\Sigma_B$ |

Table 5.5. Simulation conditions for the additive model for graded responses.

5.3.2 Results

Tables 5.6 and 5.7 show the item parameter recovery for the first block of simulations, where $p = 15$ ($p_1 = 5$ and $p_2 = 10$), for subtest 1 and subtest 2 respectively. It emerges that all parameters are quite well recovered when the number of categories for each item is $K = 3$, and a sample size of $n = 500$ is enough to get accurate estimates. Results are slightly better for the $\Sigma_A$ correlation matrix than for $\Sigma_B$.

On the other hand, when the number of item categories is $K = 4$ we obtain less accurate estimates, for both the $\Sigma_A$ and $\Sigma_B$ ability correlation structures. Estimates improve after increasing the sample size, but median RMSEs and biases remain rather high, especially for the $\alpha_0$ and $\alpha_\nu$ discrimination parameters. Considering that this result is more evident for the first subtest, where $p_1 = 5$, than for the second one, where $p_2 = 10$, this may be due to the small number of items compared to the increased number of categories.

Results for the second block of simulations are reported in Tables 5.8 and 5.9. Focusing on the case where $p = 50$ ($p_1 = 20$ and $p_2 = 30$) and $K = 4$, we observe that in both subtests the item parameters are not well recovered, particularly the discrimination parameters. Nevertheless, these shortcomings are overcome by increasing the sample size. In fact, when $n = 1000$ all the parameters are recovered rather precisely.
Different correlation structures do not seem to affect parameter recovery, with the exception of the discrimination parameters for the second subtest, where we register higher median RMSEs in association with the more complex correlation structure.

Analogously, the cases where $p = 30$ ($p_1 = 10$ and $p_2 = 20$) and $K = 5$ benefit from the enlarged sample size. For $n = 1000$, item parameters are recovered accurately, with slightly better accuracy for the $\Sigma_A$ correlation matrix.

Simulations Block 1 - Subtest 1 (5 items)
α0 n (p1 , p2 ) K RMSE α1 Bias (5,10) 3 ΣA ΣB 0.08 0.05 0.09 0.02 (5,10) 4 ΣA ΣB 0.13 0.17 0.07 (5,10) 3 ΣA ΣB (5,10) 4 ΣA ΣB 500 1000 Bias RMSE Bias RMSE 0.03 0.08 κ3 Bias 0.13 0.09 0.10 0.01 0.07 0.04 0.09 0.03 0.09 0.02 0.15 0.16 0.12 0.16 0.15 0.10 0.05 0.15 0.12 0.23 0.09 0.02 0.07 0.03 0.09 0.03 0.08 0.06 0.07 0.03 0.09 0.03 0.08 0.03 0.06 0.03 0.15 0.05 0.06 0.03 0.08 0.05 0.08 0.04 0.09 0.14 0.02 0.08 0.08 κ2 0.14 0.16 0.16 0.07 0.12 0.12 RMSE 0.13 0.15 0.16 0.15 κ4 Bias RMSE Bias 0.04 0.03 0.10 0.10
Table 5.6. Additive model: block 1 simulation results for subtest 1 (median RMSEs and median absolute biases).

Simulations Block 1 - Subtest 2 (10 items)
α0 RMSE κ1 n (p1 , p2 ) K (5,10) 3 ΣA ΣB (5,10) 4 ΣA ΣB (5,10) 3 ΣA ΣB (5,10) 4 ΣA ΣB 500 1000 α2 RMSE Bias 0.09 0.11 0.12 0.14 0.09 0.15 0.16 0.23 κ1 κ2 κ3 RMSE Bias RMSE Bias RMSE Bias 0.05 0.10 0.02 0.08 0.02 0.08 0.03 0.04 0.00 0.05 0.09 0.04 0.10 0.00 0.05 0.10 0.03 0.04 0.04 0.10 0.10 0.10 0.09 0.02 0.05 0.06 0.02 0.06 0.02 0.03 0.05 0.02 0.05 0.01 0.07 0.03 0.07 0.07 0.04 0.06 0.04 0.14 0.14 0.04 0.09 0.02 0.05 0.12 0.13 0.18 0.16 0.19 0.09 0.11 κ4 RMSE Bias 0.13 0.11 0.06 0.03 0.08 0.03 0.03 0.09 0.04 RMSE Bias 0.03
Table 5.7. Additive model: block 1 simulation results for subtest 2 (median RMSEs and median absolute biases).
Simulations Block 2 - Subtest 1 (20 and 10 items)
α0 n (p1 , p2 ) K (20,30) 4 ΣA ΣB (10,20) 5 ΣA ΣB (20,30) 4 ΣA ΣB (10,20) 5 ΣA ΣB 500 1000 κ1 RMSE Bias RMSE Bias 0.14 0.15 0.20 0.17 0.07 0.07 0.07 0.14 0.18 0.21 0.22 0.08 0.10 0.06 0.07 0.07 0.05 0.08 0.06 0.02 0.08 0.07 0.19 0.05 0.04 0.04 0.07 0.27 κ2 RMSE Bias RMSE 0.10 0.06 0.09 0.03 0.09 κ3 κ4 Bias RMSE Bias RMSE Bias 0.10 0.05 0.10 0.04 0.08 0.03 0.09 0.03 0.01 0.09 0.03 0.08 0.04 0.09 0.04 0.10 0.02 0.09 0.02 0.08 0.02 0.08 0.02 0.04 0.06 0.01 0.06 0.01 0.05 0.01 0.04 0.06 0.04 0.06 0.04 0.06 0.03 0.04 0.08 0.02 0.06 0.02 0.05 0.02 0.05 0.01 0.05 0.07 0.03 0.06 0.03 0.05 0.03 0.07 0.02 α1
Table 5.8. Additive model: block 2 simulation results for subtest 1 (median RMSEs and median absolute biases).

Simulations Block 2 - Subtest 2 (30 and 20 items)
α0 n (p1 , p2 ) K α2 κ1 RMSE Bias RMSE Bias RMSE κ2 Bias RMSE κ3 Bias RMSE κ4 Bias RMSE Bias (20,30) 4 ΣA ΣB 0.16 0.07 0.20 0.08 0.10 0.10 0.05 0.10 0.10 0.06 0.03 0.02 0.06 0.10 0.07 0.10 0.05 (10,20) 5 ΣA ΣB 0.23 0.20 0.08 0.09 0.18 0.19 0.05 0.08 0.09 0.03 0.07 0.03 0.07 0.02 0.09 0.03 0.08 0.03 0.07 0.03 0.08 0.02 0.08 0.02 (20,30) 4 ΣA ΣB 0.06 0.05 0.06 0.02 0.06 0.02 0.06 0.02 0.06 0.01 0.03 0.07 0.02 0.05 0.02 0.07 0.02 (10,20) 5 ΣA ΣB 500 1000 0.14 0.03 0.09 0.03 0.06 0.17 0.07 0.05 0.07 0.03 0.05 0.01 0.05 0.01 0.05 0.01 0.06 0.02 0.06 0.01 0.08 0.03 0.06 0.02 0.05 0.02 0.05 0.01 0.06 0.02
Table 5.9. Additive model: block 2 simulation results for subtest 2 (median RMSEs and median absolute biases).
Additive model: real and estimated ability correlations

n      (p1, p2)   K   Σ    r01    r̂01    r02    r̂02    r12    r̂12
500    (5,10)     3   ΣA   0.00   0.07   0.00   0.16   0.00   -0.07
500    (5,10)     3   ΣB   0.40   0.62   0.30   0.49   0.20   0.24
500    (5,10)     4   ΣA   0.00   0.09   0.00   0.29   0.00   -0.07
500    (5,10)     4   ΣB   0.40   0.60   0.30   0.56   0.20   0.27
1000   (5,10)     3   ΣA   0.00   0.11   0.00   0.16   0.00   -0.05
1000   (5,10)     3   ΣB   0.40   0.60   0.30   0.54   0.20   0.27
1000   (5,10)     4   ΣA   0.00   0.13   0.00   0.36   0.00   -0.05
1000   (5,10)     4   ΣB   0.40   0.58   0.30   0.65   0.20   0.30
500    (20,30)    4   ΣA   0.00   0.00   0.00   0.29   0.00   -0.05
500    (20,30)    4   ΣB   0.40   0.50   0.30   0.36   0.20   0.21
500    (10,20)    5   ΣA   0.00   0.02   0.00   0.15   0.00   -0.03
500    (10,20)    5   ΣB   0.40   0.51   0.30   0.48   0.20   0.24
1000   (20,30)    4   ΣA   0.00   0.03   0.00   0.07   0.00   -0.02
1000   (20,30)    4   ΣB   0.40   0.45   0.30   0.37   0.20   0.20
1000   (10,20)    5   ΣA   0.00   0.06   0.00   0.11   0.00   -0.05
1000   (10,20)    5   ΣB   0.40   0.52   0.30   0.41   0.20   0.24

Table 5.10. Additive model: real (r) and estimated (r̂) ability correlations.

Table 5.10 reports the estimated ability correlations for each scenario, together with the corresponding true values, so that we can observe how well the correlations are reproduced. The results are consistent with those observed for the item parameters: the best performance is associated with the largest sample size, a reasonable number of items (50 in total) and 4 response categories, even in the case of moderately high correlations.

To conclude, the main results show that the algorithm is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. In fact, when the sample size is sufficiently large (n = 1000), all the parameters are well reproduced. The results are also affected by the trade-off between the test length and the number of categories: the worst results are associated with a high number of categories and a low test length. Analogous evidence applies to the correlation estimates.
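The recovery measures used throughout Section 5.3 (median RMSE and median absolute bias) can be illustrated with a minimal Python sketch. It assumes that RMSE and bias are first computed per item across replications and that the median is then taken across the items of a subtest; this aggregation order, the function name `recovery_summary` and the toy numbers are illustrative assumptions, not the procedure documented in the thesis.

```python
import math
from statistics import median


def recovery_summary(true_values, estimates):
    """Summarise parameter recovery for one block of item parameters.

    true_values: list of J true parameter values (one per item).
    estimates:   list of R replications, each a list of J estimates.
    Returns (median RMSE, median absolute bias), medians taken over items.
    """
    n_items = len(true_values)
    n_reps = len(estimates)
    rmses, abs_biases = [], []
    for j in range(n_items):
        # Estimation errors of item j over the R replications
        errors = [estimates[r][j] - true_values[j] for r in range(n_reps)]
        rmses.append(math.sqrt(sum(e * e for e in errors) / n_reps))
        abs_biases.append(abs(sum(errors) / n_reps))
    return median(rmses), median(abs_biases)


# Hypothetical example: 3 items, 2 replications (toy numbers)
true_alpha = [1.0, 1.5, 0.8]
est_alpha = [[1.1, 1.4, 0.8], [0.9, 1.6, 1.0]]
med_rmse, med_bias = recovery_summary(true_alpha, est_alpha)
```

Taking medians rather than means keeps a single poorly recovered item from dominating the summary of a subtest.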
Chapter 6

Application to real data: residents' attitudes towards tourism

In this chapter we illustrate an application of the proposed models to data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry. After introducing the interpretation of the model parameters in this new context, we illustrate the research design. Results for the multiunidimensional and additive GRM estimations are reported in Sections 6.3 and 6.4.

6.1 Interpretation of model parameters

In the present application, the observed variables are the opinions of a sample of respondents on a set of aspects of the tourism industry. Therefore, latent traits can be defined as `perceptions'. The investigation involves two distinct aspects of the phenomenon, namely the perceived benefits and costs of tourism. It is therefore possible to identify two specific perceptions and the overall attitude of respondents as latent variables.

Within this framework, discrimination parameters represent the capability of the items to differentiate between respondents with different levels of agreement, whereas the threshold parameters can be interpreted as `criticity levels' of the corresponding item. For a given item, high values of the criticity parameters correspond to lower probabilities of observing responses in positive categories.

6.2 Research design

The data analyzed are the result of a research project conducted by the University of Bologna with the aim of studying subjective well-being (Bernini et al., 2013). Data were collected at the end of 2010 from residents in the Romagna area and in the State of San Marino (Italy). The Romagna area consists of the provinces of Forlì-Cesena, Rimini, and Ravenna, and is located in the southeast of the Emilia-Romagna region. The independent republic of San Marino borders the Rimini Province.
The tourism industry has a relevant weight in this area: it contains 7% of Italian accommodation structures and 5% of Italian entertainment activities. Moreover, it is one of the main Italian tourism destinations, hosting in 2010 almost 27.5 million overnight stays (7.3% of the total national overnights) and 5.3 million arrivals (5.3% of the total national arrivals).

The sampling design took into account a stratification of the provinces and the demographic characteristics of the tourists (age and gender). The final sample is representative of the population at the provincial level, with a margin of error of ±5% at a 95% level of confidence. A total of 794 questionnaires were obtained through a telephone survey.

The questionnaire was designed to collect residents' evaluations of the costs and benefits of the tourism industry, their personal benefit from tourism, the quality of life in the area, the degree of involvement in the tourism industry, the residents' satisfaction with their leisure and non-leisure domains, their quality of life, and the degree of support for future development of the tourism industry. Furthermore, personal information (age, gender, nationality, residence and occupation) was also collected (see Appendix C for the submitted questionnaire). Some characteristics of the sample are summarised in Table 6.1.

                      Number   %
Provinces
  Forlì-Cesena        246      31.0
  Ravenna             245      30.9
  Rimini              243      30.6
  San Marino          60       7.6
Gender
  Female              413      52.0
  Male                381      48.0
Age
  < 25                65       8.2
  25 - 35             115      14.5
  35 - 45             171      21.5
  45 - 55             127      16.0
  55 - 65             95       12.0
  ≥ 65                221      27.8
Education
  Primary             105      13.2
  Lower secondary     196      24.7
  Upper secondary     192      24.2
  University          301      37.9

Table 6.1. Profile of respondents.

Among all the aspects investigated through the survey, the object of our analysis is the perception of benefits and costs associated with the tourism industry. The perceived benefits of tourism were assessed by five items: the support in local economic development [B1], quality of life [B2], public services improvement [B3], employment prospects [B4], and opportunities for cultural activities [B5]. Respondents were asked to indicate whether those items would improve for their community as a result of increasing tourism activity on a 7-point anchor scale, from strongly disagree to strongly agree. On the other hand, the perceived costs of tourism were assessed by five other items: the cost of living [C1], crime [C2], environmental damage [C3], traffic congestion [C4], and pollution [C5]. In this case residents were asked to express whether those aspects would worsen for their community as a result of increasing tourism activity, on the 7-point scale mentioned above. The scales of the cost items were inverted in order to eliminate reverse scoring, so that low and high scores are associated with high and low perceptions of costs, respectively.

Table 6.2 reports the response frequencies for each item. Items B1-B5 refer to the benefits, while items C1-C5 refer to the costs of the tourism industry as perceived by residents.

Responses from 1 (low benefits) to 7 (high benefits)
Item   Item description     1     2     3     4     5     6     7
B1     Econ. support        12    51    58    157   149   235   132
B2     Quality of life      24    49    78    184   227   155   77
B3     Public services      16    45    97    186   190   171   89
B4     Job opportunities    16    36    69    157   187   198   131
B5     Cultural act.        30    54    76    186   188   157   103

Responses from 1 (high costs) to 7 (low costs)
Item   Item description     1     2     3     4     5     6     7
C1     Cost of life         64    151   182   139   119   100   39
C2     Crime rate           145   169   157   155   69    71    28
C3     Env. damage          117   151   166   187   96    59    18
C4     Traffic              193   152   158   130   89    45    27
C5     Pollution            158   173   164   136   63    81    19

Table 6.2. Response frequencies for items about tourism benefits (B1-B5) and items about tourism costs (C1-C5).
6.3 Results for the multiunidimensional GRM

The parameters of the bidimensional version of the multiunidimensional GRM have been estimated on the basis of the residents' responses to the 5 items on benefits (B1-B5) and the 5 items on costs (C1-C5). Following a confirmatory approach, we assume that the item responses on benefits are related to the first latent variable θ1, while the item responses on costs are related to the second latent variable θ2. The two traits are allowed to correlate.

Concerning the definition of the latent traits, θ1 can be expressed as the perception of tourism benefits, while θ2 can be defined as the perception of tourism costs. These interpretations derive strictly from the meaning of the items included in the questionnaire. A positive perception of the effect of the tourism industry is reflected by high resident scores on θ1 and θ2. In particular, the more positive the perception of the effect of tourism on the local environment, the higher the score on θ1. Conversely, the higher the score on θ2, the lower the perception of a negative impact of tourism on the environment.

The model parameters were estimated using the proposed OpenBUGS procedure for the multiunidimensional GRM, with two chains of 30,000 total iterations each (15,000 of which as burn-in).

Table 6.3 reports the item parameter estimates for the test items. The strength of the relationship between the observed responses and the related latent trait is expressed by the discrimination parameters α. From Table 6.3 we can see that these parameters are all clearly positive, suggesting that the chosen latent structure is coherent. In particular, the capability of an item to differentiate individuals with different perceptions of the impact of tourism increases as the discrimination parameter increases.
This means that public services, job opportunities and cultural activities (items B3, B4 and B5, respectively) are the most informative about the perception of tourism advantages, whereas traffic and pollution (items C4 and C5) discriminate best between residents who have different perceptions of the environmental impact of tourism. Among all the items, the cost of life (C1) has the lowest discrimination capability.

Item  Item description   α̂ν     SD     MCSE    κ̂1      SD     MCSE    κ̂2      SD     MCSE    κ̂3      SD     MCSE
B1    Econ. support      1.103  0.074  0.001   -2.904  0.155  0.001   -1.920  0.095  0.001   -1.400  0.080  0.001
B2    Quality of life    1.204  0.078  0.001   -2.632  0.134  0.001   -1.856  0.096  0.001   -1.225  0.078  0.001
B3    Public services    1.485  0.096  0.001   -3.240  0.184  0.002   -2.178  0.116  0.001   -1.301  0.088  0.002
B4    Job opp.           1.423  0.094  0.002   -3.231  0.183  0.002   -2.315  0.123  0.001   -1.549  0.095  0.002
B5    Cultural act.      1.339  0.087  0.001   -2.629  0.134  0.001   -1.844  0.099  0.001   -1.221  0.082  0.001
C1    Cost of life       0.286  0.105  0.004   -1.468  0.115  0.001   -0.633  0.065  0.001   0.003   0.046  0.001
C2    Crime rate         1.563  0.109  0.002   -1.603  0.106  0.001   -0.432  0.080  0.001   0.450   0.080  0.001
C3    Env. damage        1.440  0.100  0.003   -1.744  0.105  0.001   -0.658  0.078  0.001   0.222   0.074  0.001
C4    Traffic            1.638  0.117  0.003   -1.268  0.097  0.001   -0.322  0.082  0.001   0.618   0.085  0.001
C5    Pollution          1.793  0.131  0.003   -1.580  0.114  0.001   -0.413  0.087  0.001   0.566   0.088  0.001

Item  Item description   κ̂4      SD     MCSE    κ̂5     SD     MCSE    κ̂6     SD     MCSE
B1    Econ. support      -0.496  0.065  0.001   0.171  0.063  0.001   1.348  0.077  0.001
B2    Quality of life    -0.274  0.066  0.001   0.807  0.072  0.001   1.906  0.099  0.001
B3    Public services    -0.284  0.075  0.002   0.723  0.080  0.002   2.010  0.113  0.001
B4    Job opp.           -0.560  0.076  0.002   0.366  0.074  0.001   1.543  0.094  0.002
B5    Cultural act.      -0.235  0.070  0.001   0.679  0.074  0.001   1.749  0.098  0.001
C1    Cost of life       0.471   0.052  0.001   0.962  0.071  0.001   1.670  0.107  0.001
C2    Crime rate         1.345   0.096  0.001   1.867  0.109  0.001   2.775  0.146  0.001
C3    Env. damage        1.235   0.088  0.001   1.982  0.108  0.001   2.893  0.151  0.001
C4    Traffic            1.449   0.102  0.001   2.234  0.126  0.001   2.946  0.158  0.001
C5    Pollution          1.512   0.111  0.001   2.082  0.130  0.001   3.367  0.191  0.001

NOTE: ν = 1 for the items on benefits and ν = 2 for the items on costs, SD = standard deviation, MCSE = Monte Carlo standard error.

Table 6.3. Item parameter estimates for the multiunidimensional GRM.

The threshold parameters κ for each item reflect the criticity level of the specific aspect considered. In fact, high values of the criticity parameters correspond to lower probabilities of observing responses in higher categories, which means that items characterized by higher criticity parameters are answered in lower categories more frequently. For this model, it is not possible to unambiguously order the items by response probability on the basis of the criticity parameters. However, by fixing θ̂1 and θ̂2 at the mean value 0, we can use these parameters to compare the probabilities of the category responses for each item, and to compare the probabilities of observing a response in a particular category or higher (lower) for each item. The first comparison can be carried out by calculating differences between parameters associated with adjacent thresholds, while the second comparison, which is the more meaningful for interpretation, can be carried out directly through the threshold parameters.
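The comparison of category probabilities at θ = 0 can be sketched in a few lines of Python. The sketch assumes a normal ogive link with cumulative probabilities of the form P(Y ≤ k | θ) = Φ(κ_k − αθ); the link function and sign convention are not restated in this section, so both should be read as assumptions. The function names are illustrative, while the numerical inputs are the estimates for items B1 and B3 from Table 6.3.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(alpha, kappas, theta):
    """Category probabilities of a graded item for trait level theta.

    Assumed form: P(Y <= k | theta) = Phi(kappa_k - alpha * theta),
    with kappas sorted in increasing order.
    """
    cumulative = [normal_cdf(kappa - alpha * theta) for kappa in kappas]
    bounds = [0.0] + cumulative + [1.0]
    # Probability of category k is the difference of adjacent cumulatives
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Estimates from Table 6.3 (items B1 and B3)
kappa_b1 = [-2.904, -1.920, -1.400, -0.496, 0.171, 1.348]
kappa_b3 = [-3.240, -2.178, -1.301, -0.284, 0.723, 2.010]

p_b1 = category_probabilities(1.103, kappa_b1, theta=0.0)
p_b3 = category_probabilities(1.485, kappa_b3, theta=0.0)

# Probability of answering in categories 5 to 7 for an average respondent
top_b1 = sum(p_b1[4:])
top_b3 = sum(p_b3[4:])
```

Under these assumptions the discrimination parameter drops out at θ = 0, so the comparison between items depends on the thresholds alone, which is why the threshold estimates can be compared directly for an average respondent.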
Figure 6.1¹ graphically illustrates the estimated probabilities of observing each category for each test item for a resident with an average perception of tourism benefits and costs, i.e. θ̂1 = 0 and θ̂2 = 0. As an example, an individual with an average perception of tourism benefits will have a higher probability of responding in higher categories to item B1 than to item B3; in fact, the threshold parameters associated with higher categories, κ3, κ4, κ5 and κ6, are consistently lower for item B1 (κ̂B1,3 = −1.40, κ̂B1,4 = −0.50, κ̂B1,5 = 0.17, κ̂B1,6 = 1.35) than for item B3 (κ̂B3,3 = −1.30, κ̂B3,4 = −0.28, κ̂B3,5 = 0.72, κ̂B3,6 = 2.01). This means that, between the advantages of economic support and public services, the first aspect is considered the more relevant by an individual with an average perception of benefits. Table 6.3 shows that the threshold parameters for item B1 related to the highest categories, κ5 and κ6, are the lowest in the group of items on benefits. This means that residents identify economic support as the main and most immediate advantage of tourism.

¹ NOTE: in order to represent the probabilities associated with categories 1 and 7 for each item, a lower bound of -4 and an upper bound of 4 have been fixed.

Analogously, a resident with an average perception of the environmental impact of tourism will have a higher probability of answering in higher categories to item C1 than to the other items (κ̂C1,3 = 0.003, κ̂C1,4 = 0.471, κ̂C1,5 = 0.962, κ̂C1,6 = 1.670). Hence, the cost of life can be regarded as a marginal negative aspect of tourism in comparison with the other issues.

[Figure 6.1: one bar per item (B1-B5, C1-C5), split into response categories Cat.1 to Cat.7]

Figure 6.1. Representation of the threshold parameter estimates for the multiunidimensional model.

The estimated correlation between the two latent traits is r̂12 = −0.37.
The correlation is negative and relatively high in absolute value, indicating that the perception of a high economic advantage of tourism is associated with the perception of a strongly negative environmental impact. As a concluding remark, we observe that individuals evaluate benefits and costs differently, revealing a critical view of the tourism industry, and the multiunidimensional GRM is able to capture this peculiarity.

6.4 Results for the additive GRM

In order to extend the structure of the multiunidimensional model with a general trait that directly affects all the item responses, we estimated the parameters of the additive GRM. A general latent trait θ0 is added to the specific traits θ1 and θ2. The two specific traits have the same interpretation as in the multiunidimensional model, namely perceptions of benefits and costs, while the general trait can be defined as the overall attitude towards tourism. The rationale is that the general trait is estimated on the basis of the perception of both benefits and costs, but conditionally on the specific effects of the two traits, allowing other residual factors (age, gender, place of residence, occupation, ...) to influence the measure of the overall attitude. Concerning the score interpretation, higher attitude scores correspond to residents who perceive higher advantages and a lower negative impact of tourism.

The model parameters were estimated using the proposed OpenBUGS procedure for the additive GRM, with two chains of 30,000 iterations each (15,000 burn-in). The item parameter estimates for the additive model are reported in Table 6.4. The additive model requires, for each item, the estimation of the general discrimination parameter (α0ν), the specific discrimination parameter (αν) and the criticity parameters (κ). Again, items cannot be unambiguously ordered on the basis of the response probabilities.
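How the general and the specific trait jointly enter the response probabilities of the additive model can be sketched as follows. The sketch assumes a normal ogive link and a linear predictor α0ν·θ0 + αν·θν, so that P(Y ≤ k | θ0, θν) = Φ(κ_k − α0ν·θ0 − αν·θν); this parameterisation, the function names and all numerical values are hypothetical illustrations rather than the thesis's OpenBUGS specification.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def additive_category_probabilities(alpha_general, alpha_specific, kappas,
                                    theta_general, theta_specific):
    """Category probabilities for one item under an additive structure.

    Assumed form: P(Y <= k) = Phi(kappa_k - a0 * theta0 - a_v * theta_v).
    """
    predictor = alpha_general * theta_general + alpha_specific * theta_specific
    cumulative = [normal_cdf(kappa - predictor) for kappa in kappas]
    bounds = [0.0] + cumulative + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical 7-category item: 6 increasing thresholds
kappas = [-3.0, -2.0, -1.5, -0.5, 0.2, 1.5]

# Average respondent on both traits
average = additive_category_probabilities(0.4, 1.0, kappas, 0.0, 0.0)
# Same specific trait, general attitude one unit above the mean
favourable = additive_category_probabilities(0.4, 1.0, kappas, 1.0, 0.0)

top_average = sum(average[4:])        # categories 5 to 7
top_favourable = sum(favourable[4:])  # categories 5 to 7
```

In this sketch a higher general attitude shifts probability mass towards the upper categories even when the specific trait is held at its mean, and the size of the shift is governed by the general discrimination parameter.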
Figure 6.2 graphically illustrates the estimated probabilities of observing each category for each test item for a resident with an average perception of tourism benefits and costs, i.e. a resident characterized by θ̂0 = 0 and θ̂1 = 0 for the items on benefits, and a resident characterized by θ̂0 = 0 and θ̂2 = 0 for the items on costs.

Concerning the group of items on benefits, economic support (B1) and job opportunities (B4) are associated with higher probabilities of responses in higher categories, because the corresponding threshold estimates are generally lower than for the remaining items (κ̂B1,3 = −1.47, κ̂B1,4 = −0.51, κ̂B1,5 = 0.20, κ̂B1,6 = 1.46 and κ̂B4,3 = −1.60, κ̂B4,4 = −0.56, κ̂B4,5 = 0.41, κ̂B4,6 = 1.64). This means that residents who have an average general perception of advantages and an average specific perception of advantages consider economic development and job opportunities as the main advantages of tourism. Moreover, among the items on costs, the cost of life (C1) is again characterised by generally lower threshold parameters than the estimated criticity levels of the other items, especially for the higher categories (κ̂C1,3 = 0.00, κ̂C1,4 = 0.47, κ̂C1,5 = 0.96 and κ̂C1,6 = 1.68). Thus, the cost of life seems to be the least important impact of the tourism industry for a typical respondent.

Item  Item description   α̂ν     SD     MCSE    κ̂1      SD     MCSE    κ̂2      SD     MCSE    κ̂3      SD     MCSE
B1    Econ. support      1.047  0.074  0.001   -3.049  0.169  0.002   -2.014  0.105  0.001   -1.469  0.087  0.001
B2    Quality of life    0.946  0.063  0.001   -2.539  0.125  0.001   -1.789  0.090  0.001   -1.176  0.073  0.001
B3    Public services    1.247  0.082  0.001   -3.278  0.188  0.002   -2.200  0.119  0.001   -1.306  0.088  0.001
B4    Job opp.           1.290  0.082  0.001   -3.342  0.187  0.002   -2.390  0.125  0.001   -1.595  0.094  0.001
B5    Cultural act.      1.194  0.077  0.001   -2.713  0.141  0.001   -1.901  0.103  0.001   -1.256  0.084  0.001
C1    Cost of life       0.284  0.042  0.000   -1.491  0.069  0.000   -0.644  0.049  0.000   0.002   0.046  0.000
C2    Crime rate         1.534  0.109  0.002   -1.824  0.123  0.002   -0.534  0.085  0.002   0.461   0.083  0.001
C3    Env. damage        1.343  0.090  0.001   -1.901  0.114  0.002   -0.745  0.080  0.001   0.205   0.073  0.001
C4    Traffic            1.487  0.126  0.004   -1.509  0.134  0.005   -0.397  0.092  0.003   0.700   0.098  0.002
C5    Pollution          1.425  0.103  0.002   -1.646  0.114  0.003   -0.452  0.083  0.002   0.548   0.085  0.002

Item  Item description   α̂0ν    SD     MCSE    κ̂4      SD     MCSE    κ̂5     SD     MCSE    κ̂6     SD     MCSE
B1    Econ. support      0.013  0.012  0.000   -0.507  0.066  0.001   0.204  0.064  0.001   1.458  0.086  0.001
B2    Quality of life    0.250  0.073  0.001   -0.250  0.061  0.001   0.802  0.067  0.001   1.871  0.094  0.001
B3    Public services    0.446  0.095  0.002   -0.264  0.071  0.001   0.760  0.078  0.001   2.066  0.116  0.002
B4    Job opp.           0.144  0.083  0.002   -0.560  0.073  0.001   0.405  0.071  0.001   1.644  0.097  0.002
B5    Cultural act.      0.343  0.094  0.002   -0.228  0.069  0.001   0.732  0.074  0.001   1.856  0.103  0.001
C1    Cost of life       0.017  0.016  0.000   0.470   0.048  0.000   0.964  0.054  0.000   1.676  0.075  0.000
C2    Crime rate         0.074  0.060  0.001   1.470   0.106  0.002   2.068  0.128  0.002   3.083  0.179  0.003
C3    Env. damage        0.051  0.049  0.001   1.299   0.091  0.001   2.093  0.116  0.002   3.089  0.167  0.002
C4    Traffic            0.906  0.144  0.005   1.660   0.137  0.004   2.566  0.186  0.006   3.381  0.240  0.007
C5    Pollution          0.668  0.113  0.003   1.520   0.110  0.002   2.102  0.133  0.003   3.413  0.200  0.004

NOTE: ν = 1 for the items on benefits and ν = 2 for the items on costs, SD = standard deviation, MCSE = Monte Carlo standard error.

Table 6.4. Item parameter estimates for the additive GRM.

[Figure 6.2: one bar per item (B1-B5, C1-C5), split into response categories Cat.1 to Cat.7]

Figure 6.2. Representation of the threshold parameter estimates for the additive model.

Focusing on the discrimination parameters, the results for the estimated specific discrimination parameters are similar to the multiunidimensional case: the most informative items on the specific perception of tourism benefits are public services, job opportunities and cultural activities (items B3, B4 and B5, respectively), while crime rate, traffic and pollution (items C2, C4 and C5, respectively) are the items that best discriminate between respondents with different levels of specific perception of tourism costs. Higher values of the estimated general discrimination parameters are associated with public services (item B3) and cultural activities (item B5) among the benefits, and with traffic (item C4) and pollution (item C5) among the costs of the tourism industry. Consequently, these aspects are the main influences on the residents' general attitude towards tourism.

Usually, the additive model fits the data better than the multiunidimensional model because the presence of an overall latent trait is generally supported by data. Indeed, for our data too a lower DIC is associated with the additive model (DIC = 8945) than with the multiunidimensional model (DIC = 10950).

As for the previous model, we estimated the correlations between the latent variables (θ0, θ1 and θ2) of the additive model. The results are r̂01 = 0.03, r̂02 = 0.18 and r̂12 = −0.62. The correlation between the benefit and cost latent traits is negative, as in the multiunidimensional model. The correlation between the benefit latent trait and the attitude is very low, and the estimated correlation between the cost latent trait and the general attitude is slightly higher.
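The DIC comparison reported in this section rests on the usual definition DIC = D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ − D(θ̄) is the effective number of parameters. The following Python sketch shows the arithmetic only; the function name and the deviance values are hypothetical, and in practice the quantities would be taken from the MCMC output rather than typed in by hand.

```python
from statistics import mean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Deviance information criterion from MCMC deviance draws.

    deviance_samples:           deviance evaluated at each retained draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior means.
    Returns (DIC, pD).
    """
    d_bar = mean(deviance_samples)            # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar + p_d, p_d


# Hypothetical deviance draws for a toy model
dic_value, p_d = dic([100.0, 104.0, 102.0], 98.0)
```

A lower DIC indicates a better compromise between fit and complexity, which is the sense in which the additive model is preferred above.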
6.5 Heterogeneity in resident perceptions

The multiunidimensional and additive models presented in this work are specified without covariates. Of course, once the measurement process is carried out, latent constructs may yield different scores according to the characteristics of the examinees. To address this issue, we analyse the scores for the general and specific latent traits obtained from the additive model. To investigate the importance of individual heterogeneity in the evaluation of tourism attitudes, the score distributions² of the general and specific latent traits are calculated and compared on the basis of some sociodemographic characteristics (Table 6.5).

² As the scores have different ranges, they have been normalized to the range 0 to 1.

Residents show, on average, a positive attitude towards tourism (0.62) and a higher perception of benefits (0.57) compared with costs (0.48). From Table 6.5 we can observe that the youngest people have both a significant personal attitude toward tourism and a critical perception of the tourism industry: a high score on the perception of benefits is associated with a low score on the perception of costs. This means that the youngest are conscious of the advantages related to tourism but, at the same time, they weigh heavily the negative effects of the industry on the community. On the contrary, respondents with a low level of education and elderly people show a high attitude towards tourism and a small gap between the benefit and cost scores.

The area of residence also affects the evaluation of the tourism industry. In fact, residents in the tourism municipalities and provinces (Rimini and San Marino), where seaside tourism is relevant, present a large gap between the benefit and cost scores.

                            θ̂0     θ̂1     θ̂2
Age
  < 25                      0.57   0.61   0.45
  25 - 35                   0.61   0.56   0.53
  35 - 45                   0.61   0.55   0.46
  45 - 55                   0.63   0.57   0.50
  55 - 65                   0.64   0.58   0.50
  ≥ 65                      0.62   0.57   0.48
Gender
  Female                    0.62   0.58   0.47
  Male                      0.61   0.55   0.50
Education
  Primary                   0.62   0.57   0.48
  Lower secondary           0.65   0.57   0.52
  Upper secondary           0.60   0.56   0.45
  University                0.60   0.57   0.48
Provinces
  Forlì-Cesena              0.65   0.54   0.52
  Ravenna                   0.65   0.53   0.53
  Rimini                    0.58   0.62   0.41
  San Marino                0.50   0.62   0.40
Typological locality
  Main town                 0.60   0.57   0.49
  Tourism municipality      0.63   0.59   0.47
  Other urban city          0.64   0.55   0.49
Total                       0.62   0.57   0.48

Table 6.5. Normalized mean perception and attitude scores by age, gender, education, province and typological area.

This first study, which was repeated in 2013, provides interesting suggestions for the development of tourism incentive policies, which are also related to well-being.

Chapter 7

Conclusions

This work falls within the context of item response theory (IRT). In particular, it focuses on models for ordinal data. The importance of developing models for ordinal data is not only theoretical. Several fields of application are characterized by ordinal manifest variables, and the use of proper models for ordinal data avoids the loss of information due to dichotomization. IRT is widely used in psychological and educational fields, but it also shows great potential in applications within the behavioral sciences, where data are often ordinal. In the past, a common assumption was the presence of a single latent construct underlying the response process. However, real data typically suggest a multidimensional structure.
Therefore, multidimensional IRT (MIRT) models have recently been developed, taking into account the complexity of real data and allowing for the presence of more than one latent trait.

In this work we focus on MIRT models for ordinal data with complex latent structures. Indeed, numerous MIRT models can be specified according to several conditions, one of which is the hypothesized underlying latent structure. The models proposed in this work are extensions of the unidimensional graded response model (GRM) (Samejima, 1969) and are characterized by multidimensional latent structures with correlated traits. In particular, we consider the multiunidimensional structure, where the item responses are affected by specific traits, and the additive structure, where the item responses are simultaneously affected by a general trait and specific traits. We therefore considered two models: the multiunidimensional and the additive GRMs with correlated traits. This choice was driven by the fact that the first is widely used and represents a classical approach in MIRT analysis, while the second is able to reflect the complexity of real interactions between items and respondents.

Due to the complexity of the proposed models, another important aspect of this work concerns the estimation procedure. Within a Bayesian approach, we propose a Markov chain Monte Carlo (MCMC) procedure for parameter estimation, which makes it possible to overcome the problem of analytically intractable expressions. The models are implemented using the open-source software OpenBUGS. This software, which allows a flexible and rather easy implementation, is a good solution for estimation issues.

In order to assess the item parameter recovery for both the multiunidimensional and additive GRMs, we perform a simulation study.
The simulation study is conducted on a bidimensional case by varying the simulation conditions, namely the number of response categories, the sample size, the test and subtest lengths and the latent trait correlation structure. In short, the main simulation results show that parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive GRMs are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories: the worst results are associated with a high number of categories and a low test length. Analogous evidence applies to the latent trait correlation estimates.

In order to verify the applicability of the proposed models in real situations, we estimated them on empirical data. Data were collected with the aim of investigating Romagna and San Marino residents' perceptions and attitudes towards the tourism industry. A relevant advantage of the proposed models is the possibility of using the collected data without any preliminary transformation, hence without any loss of information.

Some limitations of the research regarding the application study exist: in particular, the choice of the prior distributions, the sample size, the number of item categories, and the test and subtest lengths are important issues that always have to be considered and checked.

Lastly, concerning future work on MIRT models for ordinal data and correlated traits, it would first of all be interesting to perform further simulations with an increased number of latent dimensions. Secondly, this work focuses on two specific underlying latent structures, hence an extension to different (e.g. hierarchical or higher-order) structures represents a stimulating issue.
A final extension could consider the introduction of covariates in the model specification, independently of the underlying structure considered.

Bibliography

T.A. Ackerman. Insuring the validity of the reported score scale by reporting multiple scores. Paper presented at the North American Meeting of the Psychometric Society, 1993.

A.J. Adams, M. Wilson, and W-C Wang. The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1):1-23, 1997.

J.H. Albert. Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17(3):251-269, 1992.

E.B. Andersen. Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society. Series B (Methodological), 32:283-301, 1970.

D. Andrich. A rating scale formulation for ordered response categories. Psychometrika, 43(4):561-573, 1978.

D. Andrich. An extension of the Rasch model for ratings providing both location and dispersion parameters. Psychometrika, 47(1):105-113, 1982.

R.J. De Ayala. The theory and practice of item response theory. Guilford Press, 2009.

Mr. Bayes and Mr. Price. An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, FRS communicated by Mr. Price, in a letter to John Canton, AMFRS. Philosophical Transactions (1683-1775), pages 370-418, 1763.

A.A. Béguin and C.A.W. Glas. MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4):541-562, 2001.

C. Bernini, A. Guizzardi, and G. Angelini. DEA-like model and common weights approach for the construction of a subjective community well-being indicator. Social Indicators Research, 114(2):405-424, 2013.

A. Birnbaum. Some latent trait models and their use in inferring an examinee's ability. In F.M. Lord and M.R. Novick, editors, Statistical theories of mental test scores. Addison-Wesley, Reading MA, 1968.

R.D. Bock.
Estimating item parameters and latent ability when responses are scored in two or more nominal categories. S.P. Brooks and A. Gelman. iterative simulations. Psychometrika , 37(1):2951, 1972. General methods for monitoring convergence of Journal of computational and graphical statistics , 7(4): 434455, 1998. A.R. Brown, S.J. Finney, and M.K. France. Using the bifactor model to assess the dimensionality of the Hong Psychological Reactance Scale. and Psychological Measurement Educational , 71(1):170185, 2011. F. F. Chen, S. G. West, and K. H. Sousa. A comparison of bifactor and secondorder models of quality of life. Multivariate Behavioral Research , 41(2):189225, 2006. S.M. Curtis. BUGS code for item response theory. Journal of Statistical Software , 36(1):134, 2010. M.C. Edwards. A Markov chain Monte Carlo approach to conrmatory item factor analysis. Psychometrika , 75(3):474497, 2010a. M.C. Edwards. MultiNorm User's Guide: multidimensional normal ogive item response theory models. P.J. Ferrando. mimeo , 2010b. Likert scaling using continuous, censored, and graded response models: eects on criterion-related validity. ment , 23(2):161175, 1999. Applied Psychological Measure- BIBLIOGRAPHY 95 G.H. Fischer. Derivations of the rasch model. In G.H. Fischer and I.W. Molenaar, editors, tions Rasch models: Foundations, recent developments and applica- . Springer-Verlag, New York, 1995. J-P Fox. Bayesian item response modeling . Springer, 2010. J-P Fox and C.A.W. Glas. Bayesian estimation of a multilevel of an IRT model using Gibbs sampling. Psychometrika , 66(2):271288, 2001. Z-H Fu, J. Tao, and N-Z Shi. Bayesian estimation of the multidimensional graded response model with nonignorable missing data. tation and Simulation Markov Chain Monte Carlo Journal of Statistical Compu- , 80(11):12371252, 2010. D. Gamerman. . Chapman and Hall, London, 1997. A.E. Gelfand and A.F.M. Smith. marginal densities. 
Sampling-based approaches to calculating Journal of American Statistical Association , 85:398409, 1990. A. Gelman. Objections to Bayesian statistics. Bayesian Analysis , 3(3):445449, 2008. A. Gelman and D.B. Rubin. Inference from iterative simulation using multiple Statistical science sequences. , pages 457472, 1992. A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin. Bayesian Data Analysis . CRC press, London, 2003. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Machine Intelligence IEEE Transactons on Pattern Analysis and , 6:721741, 1984. R.D. Gibbons, A.J. Rush, and J.C. Immekus. On the psychometric validity of the domains of the PDSQ: An illustration of the bi-factor item response theory model. Journal of psychiatric research , 43(4):401410, 2009. W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. in practice . Chapman and Hall, London, 1996. Markov chain Monte Carlo 96 BIBLIOGRAPHY H. Gu and C. Ryan. Place attachment, identity and community impacts of Tourism management tourism-the case of a Beijing hutong. , 29(4):637647, 2008. D.C. Haley. Estimation of the dosage mortality relationship when the dose is subject to error. Technical Report 15 (Oce of Naval Research Contract No. 25140, NR-342-022), Stanford University: Applied Mathematics and Statistics Laboratory, 1952. R.K. Hambleton and H. Swaminathan. applications Item Response Theory: Principles and . Kluwer Nijho Publishing, Boston, 1985. R.K. Hambleton, H. Swaminathan, and H.J. Rogers. sponse Theory Fundamentals of Item Re- . Sage Publications, Newbury Park, CA, 1991. W.K. Hastings. Monte Carlo simulation methods using Markov chains and their applications. Biometrika , 57:97109, 1970. K.J. Holzinger and F. Swineford. The bi-factor method. Psychometrika , 2(1): 4154, 1937. H-Y Huang, W-C Wang, P-H Chen, and C-M Su. Higher-order item response models for hierarchical latent traits. Applied Psychological Measurement , 37 (8):619637, 2013. M.S. 
Johnson. Marginal maximum likelihood estimation of item response models in R. Journal of Statistical Software , 20(10):124, 2007. R. Likert. A technique for the measurement of attitudes. Archives of psychology , 22(140), 1932. Psychometric Monograph No. 7 Applications of item response theory to practical testing problems F.M. Lord. A theory of test scores. F.M. Lord. , 1952. . Lawrence Erlbaum, Hillsdale, NJ, 1980. F.M. Lord and M.R. Novick. Statistical theories of mental test scores Wesley, Reading, MA, 1968. . Addison- BIBLIOGRAPHY 97 D. Lunn, D. Spiegelhalter, A. Thomas, and N. Best. The BUGS project: Evolution, critique and future directions. Statistics in Medicine , 28(25):30493067, 2009. D. Lunn, C. Jackson, N. Best, A. Thomas, and D. Spiegelhalter. Book The BUGS . CRC Press, Taylor & Francis Group, 2013. D.J. Lunn, A. Thomas, N. Best, and D. Spiegelhalter. WinBUGS - a Bayesian modelling framework: computing concepts, structure, and extensibility. Statistics and , 10(4):325337, 2000. G.N. Masters. A rasch model for partial credit scoring. Psychometrika , 47(2): 149174, 1982. Item Response Theory models for the competence evaluation: towards a multidimensional approach in the University guidance M. Matteucci. . PhD thesis, University of Bologna, 2007. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E.Teller. Equation of state calculations by fast computing machines. Chemical Physics The Journal of , 21, 1953. E. Muraki. A generalized partial credit model: Application of an EM algorithm. Applied psychological measurement , 16(2):159176, 1992. E. Muraki. POLYFACT [computer software]. Service Princeton, NJ: Educational Testing , 1993. E. Muraki and J.E. Carlson. Full-information factor analysis for polytomous item Applied Psychological Measurement Bayesian modeling using WinBUGS Polytomous Item Response Theory Models (Quantitative Applications in the Social Sciences, Vol. 144) responses. , 19(1):7390, 1995. I. Ntzoufras. , volume 698. Wiley. 
com, 2011. R. Ostini and M.L. Nering. . Sage Publications, Thou- sand Oaks, CA, 2006. R.J. Patz and B.W. Junker. Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. and Behavioral Statistics , 24(4):342366, 1999a. Journal of Educational 98 BIBLIOGRAPHY R.J. Patz and B.W. Junker. A straightforward approach to Markov chain Monte Carlo methods for item response models. ioral Statistics Journal of Educational and Behav- , 24(2):146178, 1999b. M. Plummer. JAGS Version 2.2.0 user manual, 2010. Probabilistic models for some intelligence and attainment tests G. Rasch. . Danish Institute for Educational Research, Copenhagen, 1960. M.D. Reckase. The diculty of test items that measure more than one ability. Applied Psychological Measurement , 9(4):401412, 1985. M.D. Reckase. The past and future of multidimensional item response theory. Applied Psychological Measurement Multidimensional Item Response Theory , 21(1):2536, 1997. M.D. Reckase. . Springer, London, 2009. B.B. Reeve. An introduction to modern measurement theory. Institute J. Rost. National Cancer , 2002. Measuring attitudes with a threshold model drawing on a traditional scaling concept. Applied Psychological Measurement , 12(4):397409, 1988. F. Samejima. Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph Supplement No. 17 , 1969. F. Samejima. responses. Evaluation of mathematical models for ordered polychotomous Behaviormetrika , 23(1):1735, 1996. J. Schmid and J.M. Leiman. The development of hierarchical factor solutions. Psychometrika Y. Sheng. , 22(1):5361, 1957. A MATLAB package for Markov chain Monte Carlo with a multi- unidimensional IRT model. Journal of Statistical Software , 28(10):120, 2008. Y. Sheng. Bayesian estimation of MIRT models with general and specic latent traits in MATLAB. Journal of Statistical Software , 34(3):126, 2010. Y. Sheng and C.K. Wikle. 
Comparing multiunidimensional and unidimensional item response theory models. (6):899919, 2007. Educational and Psychological Measurement , 67 BIBLIOGRAPHY 99 Y. Sheng and C.K. Wikle. Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement , 68(3):413430, 2008. Y. Sheng and C.K. Wikle. Bayesian IRT models incorporating general and specic abilities. Behaviormetrika , 36(1):2748, 2009. D. Spiegelhalter, A. Thomas, N. Best, and W. Gilks. BUGS 0.5: Bayesian inference using Gibbs sampling - Manual (version ii). MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK Item response theory in personality research , 1996. L. Steinberg and D. Thissen. . Lawrence Erlbaum Associates Hillsdale, NJ, 1995. R. Tate. A comparison of selected empirical methods for assessing the structure Applied Psychological Measurement of responses to test items. , 27(3):159203, 2003. D. Thissen and L. Steinberg. A response model for multiple choice items. chometrika Psy- , 49(4):501519, 1984. D. Thissen and L. Steinberg. A taxonomy of item response models. trika Psychome- , 51(4):567577, 1986. D. Thissen, L. Nelson, K. Rosa, and L.D. McLeod. Item response theory for items scored in more than two categories. In D. Thissen and H. Wainer, editors, scoring Test , pages 141186. Psychology Press, 2001. A. Thomas, B. O'Hara, U. Ligges, and S. Sturtz. Making BUGS Open. 6(1):1217, 2006. URL R.E. Traub. http://cran.r-project.org/doc/Rnews/. R News A priori considerations in choosing an item response model. R.K. Hambleton, editor, Applications of item response theory , In , pages 5770. Educational Research Institute of British Columbia, Vancouver, BC, 1983. L.A. van der Ark. theory models. Relationships and properties of polytomous item response Applied Psychological Measurement , 25(3):273282, 2001. 100 BIBLIOGRAPHY L.A. van der Ark, D.W. van der Palm, and K. Sijtsma. A latent class approach to estimating test-score reliability. 
Applied Psychological Measurement , 35(5): 380392, 2011. W.J. van der Linden and R.K. Hambleton. Theory Handbook of Modern Item Response . Springer-Verlag, New York, 1997. W-C Wang, P-H Chen, and Y-Y Cheng. Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods , 9(1):116136, 2004. L. Yao and R.D. Schwarz. A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. psychological measurement , 30(6):469492, 2006. Applied Appendix A OpenBUGS code for implemented models A.1 OpenBUGS code: multiunidimensional and additive models for graded responses In this section we report the codes used to implement the multiunidimensional model and the additive model for graded responses. Initial values for the following quantities have to be set and loaded from the user before to run the models: m.theta, Sigma.theta, m.alpha, s.alpha, m.alpha0, s.alpha0, m.kappa and s.kappa (of course, m.alpha0 and s.alpha0 are referred only to the additive model). 101 Appendix B R procedures for the simulation study The following sections report the codes used to perform the simulation study for both the multiunidimensional and the additive GRMs. For each model, the procedure about a single scenario (i.e. with particular simulation conditions that can be set at the beginning of the procedure) is described. The simulation study has been conducted by using an R procedure to generate the objects of interest, and by recalling OpenBUGS trough the R package BRugs. The main advantage of the combined use of R and OpenBUGS consists in the possibility to create an automatic routine to complete all replications within a distinct scenario. For further details about all the available functions and features of the package BRugs, see Thomas et al. (2006). 
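The data-generation step of a single scenario, as described above, can be illustrated with a minimal base R sketch. It is not the procedure used in the thesis: the function name simulate_mu_grm, the probit (underlying-variable) formulation, the uniform range for the discriminations and the standard normal thresholds are all assumptions made here for illustration, and the model-fitting step through BRugs is deliberately not shown.

```r
# Illustrative sketch only (not the thesis code): generate one data set from a
# bidimensional multiunidimensional graded response model.
# Assumptions: probit link, alpha ~ U(0.5, 2), thresholds ~ N(0, 1) sorted.
simulate_mu_grm <- function(n, k, K, rho) {
  stopifnot(length(k) == 2, K >= 2, abs(rho) < 1)

  # Correlated latent traits: rows of Z %*% chol(Sigma) have covariance Sigma
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  theta <- matrix(rnorm(n * 2), nrow = n) %*% chol(Sigma)

  J     <- sum(k)                      # test length
  trait <- rep(1:2, times = k)         # trait measured by each item
  alpha <- runif(J, min = 0.5, max = 2)
  kappa <- matrix(rnorm(J * (K - 1)), nrow = J)

  Y <- matrix(NA_integer_, nrow = n, ncol = J)
  for (j in seq_len(J)) {
    kappa[j, ] <- sort(kappa[j, ])     # ordered thresholds within item j
    # Underlying continuous response; the observed category is one plus the
    # number of thresholds that the underlying response exceeds
    z <- alpha[j] * theta[, trait[j]] + rnorm(n)
    Y[, j] <- 1L + findInterval(z, kappa[j, ])
  }
  list(Y = Y, theta = theta, alpha = alpha, kappa = kappa, trait = trait)
}

set.seed(2014)
sim <- simulate_mu_grm(n = 500, k = c(5, 5), K = 3, rho = 0.4)
dim(sim$Y)
```

Under these assumptions, the replications of a scenario could be obtained by calling the function repeatedly with the same arguments and different seeds, storing the true parameters returned alongside each data set so that recovery can be assessed after estimation.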
B.1 Multiunidimensional GRM: R code

B.2 Additive GRM: R code

Appendix C
Survey questionnaire

In this section we report the questionnaire submitted to residents in the Romagna area and in the State of San Marino (Italy). The questionnaire was designed to investigate residents' evaluations of the costs and benefits of the tourism industry, personal benefit from tourism, the quality of life in the area, the degree of involvement in the tourism industry, the residents' satisfaction with their leisure and non-leisure life domains, their quality of life, and the degree of support for future development of the tourism industry.

QUESTIONARIO RIMINESI (Rimini residents' questionnaire)

Interviewer no. ___   Respondent no. ____

Good morning, this is a survey coordinated by faculty of the University of Rimini to learn citizens' opinions on quality of life. May we have your opinion as well? We will take only a few minutes of your time, and your answers will remain completely anonymous ... (move straight on to the next question).

1. Rate Romagna with respect to: / With a score from 1 to 7, how do you rate these aspects of Romagna (1 = minimum satisfaction, 7 = maximum satisfaction):
   1. standard of living ____
   2. provision of public services ____
   3. traffic ____
   4. cleanliness of the city and green areas ____
   5. hospitality and welcome ____
   6. job/career opportunities ____
   7. recreational and cultural activities ____
   8. safety ____

2. Give a score from 1 to 7 to the following advantages that tourism brings to Romagna (1 = minimum advantage, 7 = maximum advantage):
   1. it develops the city's economy ____
   2. it improves the standard/quality of life ____
   3. it develops public services ____
   4. it increases job opportunities ____
   5. it improves cultural activities ____

3. Give a score from 1 to 7 to the following problems that tourism brings to Romagna (1 = minimum problem, 7 = maximum problem):
   1. it increases the cost of living and of housing ____
   2. it increases disorder and crime ____
   3. it damages the environment and the landscape ____
   4. it increases traffic ____
   5. it increases pollution ____

4. With a score from 1 to 7, how satisfied are you with the following aspects of your life (last year):
   1. economic situation ____
   2. health ____
   3. family relationships ____
   4. relationships with friends ____
   5. work ____
   6. spirituality/religion ____

5. With a score from 1 to 7, how satisfied are you with the activities you carry out in your free time (last year):
   1. social relationships ____
   2. sports/fitness activities ____
   3. personal hobbies ____
   4. cultural activities (cinema, theatre, etc.) ____
   5. recreational activities (restaurants, discos, etc.) ____
   6. shopping ____
   7. going to the beach/sea ____

6. With a score from 1 to 7 (1 = minimum agreement, 7 = maximum agreement), how much do you agree with the following statements: the tourism industry
   1. has improved the quality of my life ____
   2. has made Rimini the best place to spend my free time ____
   3. has made Rimini a city that allows me to fulfil myself ____

7. With a score from 1 to 7 (1 = minimum satisfaction, 7 = maximum satisfaction), to what extent (last year):
   1. are you satisfied with how things have gone in your life ____
   2. are you satisfied with most aspects of your life ____
   3. do you find satisfaction in thinking about what you have managed to achieve in life ____
   4. are you satisfied with who you are when you compare yourself with friends and family ____

8. With a score from 1 to 7 (1 = minimum agreement, 7 = maximum agreement), how much do you agree with the following statements:
   1. I am in favour of the development of seaside tourism ____
   2. I am in favour of the development of the cultural and recreational activities of my city ____
   3. I am in favour of the development of trade fairs and sporting events ____

9. Overall, are you in favour of the development of the tourism industry in Romagna: yes / no

10. What do you suggest should be done to improve the quality of life in Romagna (one proposal only) ....................

11. Is your occupation in any way linked to the tourism market:
   Yes, I carry out an activity linked to the tourism sector: yes / no
   Occasionally or in the past I have carried out activities linked to the tourism sector: yes / no
   The activity of my family members is linked to the tourism sector: yes / no

12. Do you:
   1. regularly read newspapers: yes / no
   2. play sport regularly: yes / no
   3. often go to art exhibitions, museums or the theatre: yes / no
   4. often travel on holiday: yes / no
   5. often browse the Internet from home: yes / no
   6. shop on the Internet: yes / no
   7. do voluntary work and/or engage in politics: yes / no
   8. go to church or another place of religious worship: yes / no

Thank you for your kind cooperation. To conclude, may I also ask you:

13. Your age: _______

14. Gender
   Male: 1
   Female: 2

15. In which city do you live? _____________________

16. For how many years have you lived in Romagna? _______

17. Your marital status
   single: 1
   married: 2
   separated/divorced: 3
   widowed: 4

18. What is your educational qualification:
   Primary school certificate: 1
   Lower secondary school certificate: 2
   Upper secondary school diploma: 3
   University degree: 4

19. And your occupation? (one answer only):
   Manager / Official / Registered professional: 1
   Entrepreneur / Self-employed / Artisan: 2
   Office worker or middle manager: 3
   Teacher (professor, schoolteacher, etc.): 4
   Manual worker: 5
   Homemaker: 6
   Student: 7
   Retired: 8
   Seeking employment: 9
   Other: 10 (specify ______________________________________)
