SAS Global Forum 2011
Statistics and Data Analysis
Paper 341-2011

Logit Models in Practice: B, C, E, G, M, N, O…

Joseph C. Gardiner, Zhehui Luo
Division of Biostatistics, Department of Epidemiology, Michigan State University, East Lansing, MI

ABSTRACT

Logit models appear in a variety of forms in applications in biostatistics, epidemiology, economics, marketing research and sociology. They are used to model the relationship between covariates and various types of discrete outcomes, from the ubiquitous binary logit model for a two-level response to the conditional logit and multinomial (generalized) logit models for polytomous responses. Covariates may vary by characteristics of both the individual and the response. For example, when assessing a consumer's choice of health insurance plan or health care provider, or selection of a treatment regime (surgery, medical management, or no treatment), the probability of choice depends on the consumer's own circumstances, utilities and preferences. Nested logit models allow for modeling the sequence of the decision process faced by the consumer by grouping alternatives at each stage into nests. Ordered logit models exploit the underlying ordinal structure of the response, whereas the exploded logit can be applied to rank-ordered responses. We survey some enhancements in SAS/STAT and SAS/ETS software that can be used to fit various logit models.

INTRODUCTION

In many applications one encounters qualitative response data. The simplest binary outcome has two levels: for example, a patient's response to treatment is success or failure; a voter supports, or does not support, a piece of legislation. Polytomous outcomes with several levels may be ordinal, such as the severity of pain recorded as none, mild, moderate or severe, or nominal (unordered), such as the choice of travel mode (car, bus, train or plane) for traveling between two cities.
Rank-ordered response data arise when a consumer is provided a menu of alternatives, such as several breakfast cereals, and asked to order them from best (most preferred) to worst (least preferred). There may be several nuances in the respondent data. The set of alternatives could vary across individuals; some choices may receive the same rank; only a subset of the offered alternatives may be ranked, leaving the remaining choices unranked. Discrete choice models (DCMs), in which individuals make choices based on their own tastes for attributes of the alternatives, have applications in marketing research, health services research and the behavioral and social sciences. See references.

Statistical models for analysis of qualitative observations should exploit their discrete nature while focusing on the inferential questions being addressed. Methods typically used to analyze quantitative, continuous responses are likely to be inadequate and inappropriate. For the models to be discussed in this paper the observations $\{(Y_i, x_i) : 1 \le i \le n\}$ constitute a random sample from the target population, with $Y_i$ denoting the response (or the chosen alternative in DCMs) and $x_i$ a $p \times 1$ vector of explanatory variables (covariates) for the i-th individual or unit in the sample. Especially with DCMs the covariates will vary by characteristics of the alternatives. In this case $x_i = \{x_{ij} : j \in C_i\}$, where $x_{ij}$ are the covariates for the j-th alternative in the choice set $C_i$ for the i-th individual.

Typically researchers wish to quantify the influence of the covariates on some feature of the distribution of $Y_i$, for example the probabilities of choosing alternative j. This quantification is through a regression model for an underlying unobserved continuous latent variable whose range of values is manifest in the observation $Y_i$.
Although reference to a latent variable regression is not strictly necessary, it nevertheless provides a convenient primitive to frame the derivation of various models by changing the distribution assumption on the latent variable. If the latent variable has a meaning in a particular field of application, it has the advantage of providing a context that could help with interpretation of the model.

Binary Logit

The binary logit model is the mainstay for modeling a dichotomous response, with applications in perhaps every research endeavor. The response $Y_i$ is realized as a binary indicator $Y_i = [Y_i^* > 0]$ from the latent linear regression model $Y_i^* = x_i'\beta + \varepsilon_i$, where the error $\varepsilon_i$ has a logistic distribution $F(u) = (1 + e^{-u})^{-1}$, $u \in (-\infty, \infty)$, independent of $x_i$. The response probability $\pi(x_i) = P[Y_i = 1 \mid x_i] = F(x_i'\beta)$, when transformed by $\log\big(\pi(x_i)/(1 - \pi(x_i))\big) = x_i'\beta$, provides an interpretation of the regression coefficients $\beta$ as log odds ratios.

The maximum likelihood estimator (MLE) of $\beta$ is obtained by maximizing the log-likelihood

$$\ell(\beta) = \sum_{i=1}^{n} \big( Y_i \log F(x_i'\beta) + (1 - Y_i)\log(1 - F(x_i'\beta)) \big) = \sum_{i=1}^{n} \log F(q_i x_i'\beta), \quad q_i = 2Y_i - 1.$$

Writing $\ell_i(\beta) = \log F(q_i x_i'\beta)$ for the contribution of the i-th observation, the gradient (score) vector $g = \partial \ell(\beta)/\partial\beta$, the outer product (OP) matrix $B = \sum_{i=1}^{n} \big(\partial \ell_i(\beta)/\partial\beta\big)\big(\partial \ell_i(\beta)/\partial\beta'\big)$ and the Hessian matrix $H = -\partial^2 \ell(\beta)/\partial\beta\,\partial\beta'$ simplify to

$$g = \sum_{i=1}^{n} (Y_i - F(x_i'\beta))\, x_i, \quad B = \sum_{i=1}^{n} (Y_i - F(x_i'\beta))^2\, x_i x_i', \quad H = \sum_{i=1}^{n} F(x_i'\beta)(1 - F(x_i'\beta))\, x_i x_i'.$$

The MLE $\hat\beta$ of $\beta$ is the solution to the normal equation $g(\beta) = 0$. It is consistent and asymptotically normal with (estimated) asymptotic variance matrix $\hat H^{-1}$, where $\hat H = H(\hat\beta)$. Two other estimates of the asymptotic variance matrix are the OP estimate $\hat B^{-1}$ and the robust-sandwich estimate $\hat H^{-1}\hat B \hat H^{-1}$, also referred to as the quasi (Q)-MLE variance matrix. All three variances are computed by proc QLIM; only the Hessian variance is computed in proc LOGISTIC.
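The score, OP and Hessian expressions above can be checked numerically. The following is a minimal Python sketch, not the algorithm used by QLIM or LOGISTIC: it assumes a model with an intercept and a single covariate, uses made-up toy data, and runs a plain Newton-Raphson iteration with hand-coded 2x2 matrix algebra. All function names are hypothetical.

```python
import math

def logistic(u):
    """Logistic distribution function F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def score_op_hessian(beta, xs, ys):
    """Return g, B and H evaluated at beta, with x_i = (1, x)."""
    g = [0.0, 0.0]
    B = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        xi = (1.0, float(x))
        f = logistic(beta[0] + beta[1] * x)
        for r in range(2):
            g[r] += (y - f) * xi[r]
            for c in range(2):
                B[r][c] += (y - f) ** 2 * xi[r] * xi[c]
                H[r][c] += f * (1.0 - f) * xi[r] * xi[c]
    return g, B, H

def fit_binary_logit(xs, ys, iterations=25):
    """Newton-Raphson update beta <- beta + H^{-1} g, started at zero."""
    beta = [0.0, 0.0]
    for step in range(iterations):
        g, B, H = score_op_hessian(beta, xs, ys)
        h_inv = inv2(H)
        beta = [beta[r] + h_inv[r][0] * g[0] + h_inv[r][1] * g[1]
                for r in range(2)]
    return beta

# Made-up toy data (not from the paper).
xs = [-3, -2, -2, -1, -1, 0, 0, 1, 1, 2, 2, 3]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1]

beta_hat = fit_binary_logit(xs, ys)
g, B, H = score_op_hessian(beta_hat, xs, ys)
var_hessian = inv2(H)                                          # H^{-1}
var_op = inv2(B)                                               # B^{-1}
var_sandwich = matmul2(matmul2(var_hessian, B), var_hessian)   # H^{-1} B H^{-1}

for name, v in (("Hessian", var_hessian), ("OP", var_op), ("sandwich", var_sandwich)):
    print(name, [round(math.sqrt(v[k][k]), 4) for k in range(2)])
```

The three printed pairs are the standard errors of the intercept and slope under each variance estimate; with a correctly specified model and a large sample they would be expected to be close to one another.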
Robust-sandwich (empirical) and Hessian variances are computed in proc GLIMMIX and proc GENMOD under the assumed set-up of the generalized linear model (GLM). The solution to the estimating equation (EE) for $\beta$,

$$\sum_{i=1}^{n} \frac{\partial \pi_i}{\partial \beta}\, \upsilon_i^{-1} (Y_i - \pi_i) = 0, \quad \text{where } \mathrm{Var}(Y_i \mid x_i) = \upsilon_i = \pi_i(1 - \pi_i),$$

is the same as the solution to the MLE normal equation. The GLM model-based and robust-sandwich estimators of the variance coincide with $\hat H^{-1}$ and $\hat H^{-1}\hat B \hat H^{-1}$, respectively. In contrast, the probit model with $F = \Phi$ (standard normal distribution) will yield slightly different variance estimators under the MLE and GLM theory, although the MLE and EE estimators for $\beta$ are the same.

Consistent estimation of $\beta$ requires correct specification of $\pi(x_i)$. Any of the following will make the MLE $\hat\beta$ inconsistent: (i) heteroscedasticity, i.e., $\mathrm{Var}(\varepsilon_i \mid x_i)$ being non-constant; (ii) endogeneity of covariates $x_i$, i.e., one or more covariates are correlated with the error $\varepsilon_i$; (iii) incorrect distribution assumption on the error $\varepsilon_i$; and (iv) omitted covariates in $x_i$ (even if they are orthogonal to those included). An example of (i) is $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 \exp(z_i'\gamma)$, where $\sigma^2$ is 1 for the probit model or $\pi^2/3$ for the logit model, which leads to specifying $\pi(x_i) = F\big(x_i'\beta \exp(-\tfrac{1}{2} z_i'\gamma)\big)$. The covariates $z_i$, typically a subset of $x_i$, should be selected with guidance from subject matter rather than statistical convenience. For (ii) we need additional models for the endogenous covariates. Both (i) and (ii) can be fitted in QLIM, although for (ii) the model errors are assumed jointly normal. The logistic and normal distributional assumptions on $\varepsilon_i$ generally yield similar results for $\pi(x_i)$. Moon (1988) and McDonald (2000) discuss other flexible forms for F concerning (iii).
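To see what the heteroscedastic specification in (i) does to the usual log odds interpretation, here is a small Python sketch. It only evaluates $\pi(x_i) = F(x_i'\beta \exp(-\tfrac{1}{2} z_i'\gamma))$; the coefficient values are hypothetical and chosen purely for illustration.

```python
import math

def logistic(u):
    """Logistic distribution function F(u)."""
    return 1.0 / (1.0 + math.exp(-u))

def hetero_logit_prob(xb, zg):
    """pi(x) = F(x'beta * exp(-0.5 * z'gamma)) under Var(eps | x) = sigma^2 exp(z'gamma)."""
    return logistic(xb * math.exp(-0.5 * zg))

def log_odds(p):
    return math.log(p / (1.0 - p))

# Hypothetical values, not estimates from the paper.
beta_x = 0.5    # coefficient of a covariate x that is raised by one unit
base_xb = 0.2   # remaining part of x'beta, held fixed
gamma_z = 0.8   # coefficient of z in the variance model

changes = []
for z in (0.0, 1.0):
    before = log_odds(hetero_logit_prob(base_xb, gamma_z * z))
    after = log_odds(hetero_logit_prob(base_xb + beta_x, gamma_z * z))
    changes.append(after - before)
    print("z =", z, "change in log odds =", round(after - before, 4))
```

At z = 0 the change in log odds equals `beta_x`, as in the homoscedastic model; at z = 1 it is scaled by exp(-0.5 * gamma_z), so a single coefficient no longer serves as a log odds ratio.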
Wooldridge (2002) gives some insightful comments on the issue of neglected heterogeneity (iv) in the context of the probit model. Since all moments of the response $Y_i$ are functions of $\pi(x_i)$, in the single response context one might question the need for robust standard errors to guard against heteroscedasticity or misspecification.

Illustrative Example 1

The data set comprises 4483 respondents in year 1988 to the German Socioeconomic Panel Survey 1984-1995 on healthcare utilization (Riphahn et al, 2003). Self-reported assessment of health (HSAT) is recorded on a 0 to 10 scale, with higher values indicative of better health. The covariates we will use in this analysis are the respondent's age (AGE), a measure of household income (HHNINC) and education (EDUC), all continuous, and the binary indicators for gender (FEMALE, 48%), having children in the household (HHKIDS, 38%) and marital status (MARRIED, 75%). For purposes of illustration of various binary logit models we use the dichotomization Y=[HSAT≥7]. Approximately 60% have the event Y=1, which we will call "good health". The following formats might prove useful:

   proc format;
      value hsat low-<7='<7' 7-high='>=7';
      value female 0='male' 1='female';
      value affirm 0='no' 1='yes';
   run;

LOGISTIC and QLIM will produce identical results:

   proc logistic data=c.healthcare(where=(year=1988));
      class married(ref='no') hhkids(ref='no') /param=ref;
      model hsat(event='>=7')=age educ hhninc married hhkids female/link=logit;
      format female female. married hhkids affirm. hsat hsat.;
   run;

   proc qlim data=c.healthcare(where=(year=1988)); *covest=qml;
      class hhkids married female;
      endogenous hsat~discrete(dist=logistic order=formatted);
      model hsat=age educ hhninc married hhkids female;
      format female female. married hhkids affirm. hsat hsat.;
   run;

Table 1 summarizes the estimation results. Although its need is questionable, the robust estimates of standard errors (column 4) are produced by the option covest=qml in the QLIM statement.
The p-values (column 5) computed using either set of standard errors are practically the same. The heteroscedastic logit model (columns 6-8) is fitted by adding the HETERO statement to the QLIM syntax:

   hetero hsat~female HHNINC /link=exp noconst;

Model fit statistics at the bottom of Table 1 show that the heteroscedastic model is not significantly different from the homoscedastic model. The formal likelihood ratio (LR) test of $H_0: \gamma = 0$ has $\chi^2 = 0.35$, 2 DF.

Table 1: Binary Logit Models

                    Homoscedastic case                               Heteroscedastic case
Parameter           Estimate   Std Error   Std Error   P-value      Estimate   Std Error   P-value
                               (Hessian)   (QMLE)      (Hessian)               (Hessian)   (Hessian)
Intercept             0.8091   0.24155     0.24287     0.0008         0.8146   0.23332     0.0005
AGE                  -0.0328   0.00321     0.00321     <.0001        -0.0320   0.00589     <.0001
EDUC                  0.0837   0.01503     0.01536     <.0001         0.0805   0.02032     <.0001
HHNINC                0.3487   0.20833     0.21234     0.0942         0.2224   0.37888     0.5572
MARRIED              -0.0518   0.08288     0.08339     0.5318        -0.0401   0.08622     0.6422
HHKIDS                0.1289   0.07557     0.07523     0.0881         0.1285   0.07690     0.0947
FEMALE               -0.0568   0.06388     0.06387     0.3738        -0.0304   0.08008     0.7040
_H.FEMALE                                                             0.1212   0.27374     0.6579
_H.HHNINC                                                            -0.3642   0.95830     0.7039
-2 Log L              5780.0                                          5779.6
-2 Log L (null)       6020.8                                          6020.8
-2 Log LR              240.8                                           241.2

The results show that older age is associated with poor health, and more education with good health. The sign on MARRIED suggests that the health status of married respondents was worse than that of their single counterparts. Fortunately the effect is not significant. Using the OUTPUT statement we can obtain predicted probabilities of response. This is useful in the heteroscedastic model because the standard interpretation of the β-coefficients as log odds ratios is not valid.

Cumulative Logit and Ordered Logit Models

There are many applications in which the categories of the outcome have a natural ordering.
For example, the severity of pain may be recorded as none, mild, moderate or severe. Any categorical variable assessed on a Likert scale would also fit this type of response. Suppose there are J levels of the outcome $Y_i$ with labels 1, 2, …, J. The response variable can be modeled in various ways. The cumulative probabilities of $Y_i$, $\gamma_j(x_i) = P[Y_i \le j \mid x_i]$, reflect the ordering, with $\gamma_1(x_i) \le \gamma_2(x_i) \le \cdots \le \gamma_J(x_i) = 1$. Procedures LOGISTIC, GENMOD and GLIMMIX with the option link=cumlogit in the model statement will fit the model

$$\log\big(\gamma_j(x_i)/(1 - \gamma_j(x_i))\big) = \alpha_j + x_i'\delta,$$

which is called the cumulative logit model (Agresti, 2002). Changing the link to cumprobit will fit the cumulative probit model. The $\alpha_j$, j = 1, 2, …, J−1, are intercepts; a constant is not included in $x_i$. The parameters $\delta$ describe the effect of a covariate on the log odds of response in category j or below. When the corresponding $\delta > 0$, as the value of the associated covariate increases, the response is more likely to fall at the low end of the ordinal scale.

As in the aforementioned pain scale, the response variable sometimes reflects an underlying measure that is not observed in its entirety. Let $\mu_j$, $j = 0, \ldots, J$, be threshold points that provide a partition of the entire real line, that is, $-\infty = \mu_0 < \mu_1 < \cdots < \mu_J = \infty$. The observed outcome is a categorization of a latent variable $Y_i^* = x_i'\beta + \varepsilon_i$ such that $Y_i = j$ if and only if $\mu_{j-1} < Y_i^* \le \mu_j$. The probability of response is

$$\pi_j(x_i) = P[Y_i = j \mid x_i] = F(\mu_j - x_i'\beta) - F(\mu_{j-1} - x_i'\beta), \quad j = 1, \ldots, J,$$

where F is the distribution of $\varepsilon_i$. The cumulative response probability is $\gamma_j(x_i) = P[Y_i \le j \mid x_i] = F(\mu_j - x_i'\beta)$.
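As a quick numerical illustration of the threshold formulation, the Python sketch below turns thresholds and a linear predictor into category probabilities with F taken to be the logistic distribution. The thresholds and the value of $x_i'\beta$ are made up for illustration; they are not estimates from this paper.

```python
import math

def logistic(u):
    """Logistic distribution function F(u)."""
    return 1.0 / (1.0 + math.exp(-u))

def ordered_logit_probs(xb, thresholds):
    """Category probabilities pi_j = F(mu_j - xb) - F(mu_{j-1} - xb).

    `thresholds` holds the finite cut points mu_1 < ... < mu_{J-1};
    mu_0 = -infinity and mu_J = +infinity are handled implicitly.
    """
    cumulative = [logistic(mu - xb) for mu in thresholds] + [1.0]
    probs, previous = [], 0.0
    for gamma_j in cumulative:
        probs.append(gamma_j - previous)
        previous = gamma_j
    return probs

# Hypothetical thresholds for J = 5 ordered categories and one linear predictor.
thresholds = [-3.0, -1.0, 1.0, 2.0]
xb = 0.4

probs = ordered_logit_probs(xb, thresholds)
for j, p in enumerate(probs, start=1):
    print("P[Y = %d] = %.4f" % (j, p))

# A larger linear predictor shifts mass toward the high end of the scale.
probs_higher = ordered_logit_probs(xb + 1.0, thresholds)
print("P[Y = 5] rises from %.4f to %.4f" % (probs[-1], probs_higher[-1]))
```

Because the thresholds are strictly increasing and the same $x_i'\beta$ is subtracted from each, every difference is positive and the probabilities sum to one.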
By specifying F we get the two commonly used models: when F is the logistic distribution function we get the ordered logit model; when F is the standard normal distribution function $\Phi$ we get the ordered probit model.

In the ordered logit model

$$\log\big(\gamma_j(x_i)/(1 - \gamma_j(x_i))\big) = \mu_j - x_i'\beta = -\log\big((1 - \gamma_j(x_i))/\gamma_j(x_i)\big), \quad j = 1, \ldots, J-1.$$

The parameters $\beta$ describe the effect of a covariate on the log odds of response in the category above j, or equivalently the marginal effect of the covariate on $E[Y_i^* \mid x_i]$. When $\beta > 0$, as the value of the covariate increases, the response is more likely to fall at the high end of the ordinal scale, because $\partial E[Y_i^* \mid x_i]/\partial x_i = \beta$.

Both the cumulative logit model and the ordered logit model have the proportional odds property because the odds ratio does not depend on the category to which the response variable belongs. Both models assume the effect of a covariate is identical for all J−1 cumulative logits. When this property holds, the model requires a single parameter rather than J−1 parameters to describe the effect of a covariate.

Proc QLIM fits the ordered logit and ordered probit models. It uses the latent variable formulation. By default an intercept is included in $\beta$ and the first threshold parameter $\mu_1$ is set to zero. The model option limit1=varying overrides the default.

Estimation of the parameters $(\mu_j, \beta)$ or $(\alpha_j, \delta)$ in the ordered and cumulative models is via maximum likelihood. The log-likelihood is the same for the two models except for the difference in the parameterization. For the ordered model the log-likelihood is

$$\ell(\mu, \beta) = \sum_{i=1}^{n} \sum_{j=1}^{J} [Y_i = j] \log\big(F(\mu_j - x_i'\beta) - F(\mu_{j-1} - x_i'\beta)\big).$$

Standard errors can be obtained from the Hessian, the OP, or their combination as QMLE. The default Hessian is preferred. A heteroscedastic model can also be fitted using, for example, the variance model $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 \exp(z_i'\gamma)$ as we did in the binary logit case.
Note that $\sigma^2$ is a constant.

Illustrative Example 2

In example 1 the self-reported health status (HSAT) has a range 0 to 10. Suppose we create an ordinal response using the categories reflected in the format:

   value ohsat 0-<3='0' 3-<6='1' 6-<9='2' 9='3' 10='4';

   proc logistic data=c.healthcare(where=(year=1988));
      class married(ref='no') hhkids(ref='no') female(ref='male')/param=ref;
      model HSAT=age educ hhninc married hhkids female /link=cumlogit;
      format female female. married hhkids affirm. hsat ohsat.;
   run;

The responses are cumulated over the lower formatted values. Table 2 shows the estimation results for the homoscedastic cumulative logit model (columns 3-5) fitted in proc LOGISTIC. The LR test (5 DF) is for the model's δ-parameters. The estimate for AGE, for example, is 0.0322, which indicates that as people grow older, they are more likely to be in the lower end of the observed ordinal scale, i.e., to have worse health. The proportional odds assumption maintains the same slope parameter across the 4 response levels. Response-specific slope parameters increase the number of parameters by 18. Unfortunately, overall, the proportional odds assumption is violated (score test $\chi^2 = 66.6$, 18 DF, p<.0001).

Proc QLIM may be used to fit the equivalent homoscedastic ordered logit model. The results (not shown) are the same for the threshold parameters, but the signs for the covariates are reversed because the β-parameters here are −δ. A heteroscedastic ordered logit model with $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 \exp(z_i'\gamma)$ is fitted in QLIM (columns 6-8). The LR test for no heteroscedasticity ($\chi^2 = 20.42$, 3 DF) is significant, p<.0001. Comparison of coefficients between the two models is meaningless. Instead, predicted probabilities and marginal effects could be compared.
   proc qlim data=c.healthcare(where=(year=1988));
      endogenous HSAT~discrete(dist=logistic order=formatted);
      model HSAT=age educ hhninc married hhkids female/limit1=varying;
      format female female. married hhkids affirm. hsat ohsat.;
      hetero HSAT~HHNINC female age/link=exp noconst;
      test 'NOHETERO' _H.HHNINC, _H.female, _H.age/all;
   run;

Table 2: Ordinal Logit Models

                             Homoscedastic Cumulative Logit      Heteroscedastic Ordered Logit
Parameter                    Estimate   Std Error   p-value      Estimate   Std Error   p-value
Intercept 1     α1, μ1        -3.5070    0.2197     <.0001        -3.8870    0.3535     <.0001
Intercept 2     α2, μ2        -1.3858    0.2105     <.0001        -1.5145    0.2465     <.0001
Intercept 3     α3, μ3         0.9275    0.2099     <.0001         0.9868    0.2294     <.0001
Intercept 4     α4, μ4         1.8707    0.2129     <.0001         1.9965    0.2525     <.0001
AGE                            0.0322    0.0029     <.0001        -0.0352    0.0040     <.0001
EDUC                          -0.0650    0.0127     <.0001         0.0711    0.0142     <.0001
HHNINC                        -0.4254    0.1820     0.0194         0.4166    0.1945     0.0322
MARRIED         yes            0.0636    0.0738     0.3884        -0.0661    0.0814     0.4173
HHKIDS          yes           -0.1144    0.0671     0.0884         0.1295    0.0724     0.0735
FEMALE          F             -0.0130    0.0570     0.8199         0.0152    0.0620     0.8063
_H.HHNINC                                                         -0.5438    0.1611     0.0007
_H.FEMALE       F                                                  0.0391    0.0571     0.4931
_H.AGE                                                             0.0078    0.0026     0.0027
-2 Log L                     11489.26                            11477.84
-2 Log L (null)              11750.19                            11750.19
-2 Log LR                      251.93                              272.35

Since the proportional odds assumption is violated in this example, one might consider fitting a model with level-specific coefficients for the covariates. But $\pi_j(x_i) = F(\mu_j - x_i'\beta_j) - F(\mu_{j-1} - x_i'\beta_{j-1})$ must be between 0 and 1, and the only way to assure this for all covariate values is to have $\mu_j > \mu_{j-1}$ and $\beta_j = \beta_{j-1}$. This is tantamount to assuming the proportional odds model. SAS Usage Note 22954 uses NLMIXED to fit a fully non-proportional odds model wherein each of the covariates is crossed with the response levels. The likelihood for optimization is constructed from the cumulative response probabilities $\gamma_j(x_i)$.
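The constraint on level-specific slopes can be seen in a toy calculation. The Python sketch below assumes one scalar covariate, J = 3 categories and hypothetical thresholds and slopes; it is only meant to show how unequal slopes let the cumulative probabilities cross, which produces a negative category "probability".

```python
import math

def logistic(u):
    """Logistic distribution function F(u)."""
    return 1.0 / (1.0 + math.exp(-u))

def category_probs(x, mus, betas):
    """pi_j(x) = F(mu_j - x*beta_j) - F(mu_{j-1} - x*beta_{j-1}) for scalar x.

    mus and betas hold the J-1 finite thresholds and the level-specific slopes.
    """
    cumulative = [logistic(mu - x * b) for mu, b in zip(mus, betas)] + [1.0]
    probs, previous = [], 0.0
    for gamma_j in cumulative:
        probs.append(gamma_j - previous)
        previous = gamma_j
    return probs

mus = [-1.0, 1.0]             # hypothetical thresholds, mu_1 < mu_2
unequal_slopes = [0.2, 0.8]   # hypothetical level-specific slopes
equal_slopes = [0.5, 0.5]     # proportional odds: one common slope

at_zero = category_probs(0.0, mus, unequal_slopes)
at_five = category_probs(5.0, mus, unequal_slopes)
print("unequal slopes, x = 0:", [round(p, 3) for p in at_zero])
print("unequal slopes, x = 5:", [round(p, 3) for p in at_five])

# With a common slope the ordering of the cumulative probabilities is preserved.
all_valid = all(
    min(category_probs(float(x), mus, equal_slopes)) > 0
    for x in range(-20, 21)
)
print("equal slopes give valid probabilities on the grid:", all_valid)
```

With these values the two linear predictors cross at x = 2/0.6, about 3.33, so beyond that point the second category receives a negative value unless the slopes are equal.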
Whenever '$\pi_j(x_i) \le 0$' for an observation its contribution to the likelihood is set to near zero, whilst if '$\pi_j(x_i) > 1$' the contribution is set to 1. In this way we can assure that estimates of the response probabilities are properly constrained. Stokes et al (2000) provide another approach based on generalized estimating equations (GEE) for the vector of binary responses $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{i,J-1})$ where $Y_{ij} = [Y_i \le j]$, $j = 1, \ldots, J-1$. This makes the marginal responses highly correlated. The GEE model for the marginal response is $\mathrm{logit}\big(P[Y_{ij} = 1 \mid x_i]\big) = x_{ij}'\beta$. Although this method does not guarantee appropriately constrained response probability estimates, it is easy to implement and generally, with data sets of moderate size, will yield proper probability estimates $\pi_j(x)$ unless x lies in the fringes of the covariate space (McCullagh and Nelder, 1989). Another alternative with non-proportionality of odds is to abandon the ordinal model altogether and regard the response as nominal. The multinomial model is described next.

Multinomial logit (generalized logit)

The multinomial logit model (MLM) makes the parameters specific to the nominal outcome. With subject-specific covariates only, the probability of response $j \in \{0, 1, \ldots, J-1\}$ is

$$\pi_{ij} = \pi_j(x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{k=0}^{J-1} \exp(x_i'\beta_k)},$$

with $\beta_0 = 0$ for identification. An intercept is included in each $\beta_j$. The MLM has the property of independence from irrelevant alternatives (IIA) because $\pi_{ij}/\pi_{ik} = \exp(x_i'(\beta_j - \beta_k))$ depends only on the two outcomes (j, k).

Having too many parameters is a serious drawback of the MLM. Since one outcome (j=0) is used as a reference we will have J−1 intercepts and (J−1)p regression coefficients, a total of (J−1)(p+1) parameters. In the previous example on health status, with 5 nominal levels and 6 covariates, we have an MLM with 28 parameters. The proportional odds model, on the other hand, has 10 parameters.
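The MLM probabilities, the IIA ratio and the parameter counts quoted above are easy to verify numerically. The Python sketch below uses hypothetical coefficient vectors (with $\beta_0 = 0$) and a hypothetical covariate vector whose first element is the intercept term.

```python
import math

def mlm_probs(x, betas):
    """pi_j = exp(x'beta_j) / sum_k exp(x'beta_k); betas[0] is the zero reference vector."""
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    top = max(scores)                          # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def mlm_param_count(J, p):
    """J-1 intercepts plus (J-1)*p slopes."""
    return (J - 1) * (p + 1)

def prop_odds_param_count(J, p):
    """J-1 intercepts plus p common slopes."""
    return (J - 1) + p

# Hypothetical example: J = 3 outcomes, intercept plus two covariates.
x = [1.0, 0.5, -1.2]
betas = [
    [0.0, 0.0, 0.0],     # reference outcome j = 0
    [0.3, 1.0, -0.4],
    [-0.2, 0.6, 0.9],
]

probs = mlm_probs(x, betas)
print("probabilities:", [round(p, 4) for p in probs])

# IIA: the ratio pi_1 / pi_2 is unchanged when outcome 0 is dropped from the set.
ratio_full = probs[1] / probs[2]
reduced = mlm_probs(x, betas[1:])
ratio_reduced = reduced[0] / reduced[1]
print("ratio with and without outcome 0:", round(ratio_full, 6), round(ratio_reduced, 6))

print("MLM parameters (J=5, p=6):", mlm_param_count(5, 6))
print("proportional odds parameters (J=5, p=6):", prop_odds_param_count(5, 6))
```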
The MLM can be estimated in LOGISTIC or GLIMMIX using the link=glogit option. A single-record file per subject is used, with only the observed nominal response $Y_i$ and covariates $x_i$. Proc MDC in SAS/ETS could also be used but requires a multiple-record input file, with one record for each of the J alternatives. The dependent variable is numeric with value 1 for the observed response and zero for all other alternatives. All subject covariates need to be made response level-specific (crossed effects). Essentially MDC is fitting a conditional logit model (see next). See MDC documentation example 'Binary Data Modeling' for a description of the binary logit model as a choice model.

Conditional logit

A series of logit models was popularized by McFadden (1984) in the context of discrete choice. A person (indexed by i) is presented with a set of discrete choices $C_i$: for example, choice of health insurance plan or health care provider, or different treatment regimes (surgery, medical management, no treatment). The observed option $Y_i = j$ that the individual chooses can be thought of as the person's attempt to optimize his or her utility function $\{U_{ij} : j \in C_i\}$. The selected choice $Y_i = j$ is made because the person believes $U_{ij} \ge \max\{U_{il} : l \in C_i, l \ne j\}$. Different classes of choice models are obtained from an underlying latent random utility model (RUM) $U_{ij} = x_{ij}'\beta + \varepsilon_{ij}$, where $\{\varepsilon_{ij} : j \in C_i\}$ are random variables with a specified distribution.

The conditional logit model (CLM) assumes $\{\varepsilon_{ij} : j \in C_i\}$ are independent identically distributed (iid) extreme-value random variables, with distribution function $F(u) = \exp(-\exp(-u))$, $-\infty < u < \infty$. The computation of $\pi_{ij} = P[Y_i = j \mid x_i] = P[\max\{U_{il} : l \in C_i, l \ne j\} < U_{ij} \mid x_i]$ leads to the expression

$$\pi_{ij} = \frac{\exp(x_{ij}'\beta)}{\sum_{l \in C_i} \exp(x_{il}'\beta)}.$$
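The closed-form choice probability can be checked against a direct simulation of the random utility model. The Python sketch below draws iid extreme-value errors by inversion of $F(u) = \exp(-\exp(-u))$, records which alternative has the largest utility, and compares the simulated shares with the formula. The systematic utilities $x_{ij}'\beta$, the number of draws and the seed are hypothetical choices.

```python
import math
import random

def clm_probs(utilities):
    """pi_j = exp(v_j) / sum_l exp(v_l), where v_j stands for x_j' beta."""
    top = max(utilities)
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def gumbel(rng):
    """One draw from F(u) = exp(-exp(-u)) by inversion."""
    u = max(rng.random(), 1e-300)   # guard against log(0)
    return -math.log(-math.log(u))

def simulate_choice_shares(utilities, draws, seed):
    """Share of draws in which each alternative has the largest random utility."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(draws):
        realized = [v + gumbel(rng) for v in utilities]
        counts[realized.index(max(realized))] += 1
    return [c / draws for c in counts]

# Hypothetical systematic utilities for a choice set with three alternatives.
v = [0.5, 0.0, -0.7]

exact = clm_probs(v)
simulated = simulate_choice_shares(v, draws=20000, seed=2011)
for j, (p, s) in enumerate(zip(exact, simulated), start=1):
    print("alternative %d: formula %.4f, simulated %.4f" % (j, p, s))
```

The simulated shares should agree with the formula up to Monte Carlo error, which is on the order of 0.004 with 20000 draws.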
The CLM has the IIA property, that is, for any two alternatives (j, k) we have $\pi_{ij}/\pi_{ik} = \exp((x_{ij} - x_{ik})'\beta)$, which depends only on the characteristics of the two alternatives (j, k). A covariate that does not vary across alternatives does not enter the model because it is a constant multiplier to both the numerator and denominator of $\pi_{ij}$.

Estimation in the CLM is via maximization of the log-likelihood

$$\ell(\beta) = \sum_{i=1}^{n} \sum_{j \in C_i} [Y_i = j] \log \pi_{ij}.$$

This is exactly the same objective function that one obtains in conditional logistic regression for matched case-control studies. The analogy is that the revealed choice from the set $C_i$ is a 'case' whilst all remaining alternatives in the choice set are 'controls'. Therefore the CLM can be analyzed in proc LOGISTIC using the strata statement to identify the matched sets, whereas in proc MDC the id statement serves the same function. Both procedures require a multiple-record input file, with one record for each alternative in $C_i$.

Illustrative Example 3

Allison (1999) describes a study of 147 murder cases. Each of 50 trial judges was asked to read 14 or 15 murder cases and rank them from the most serious (rank=1) to the least (up to 15). All cases were ranked, and ties were allowed, with ties given the average rank. For example, ties in the three most serious cases received average rank=2; ties in the 5th and 6th cases got average rank=5.5. Each case was ranked by 4 to 6 judges. The data set JUDGERNK is arrayed as one record per case with the following characteristics of each case: BLACKD = indicator for defendant being black; WHITVIC = indicator for victim being white; DEATH = indicator for death penalty; JUDGID identifies judges. Allison adds CULP, an ordinal variable for culpability on a scale of 1 to 5 derived from prediction of the death penalty. CULP is used here as another covariate although it is a generated regressor (Wooldridge, 2002).
When ranking the cases for seriousness the judges did not receive information on race or penalty.

We first consider the 35 judges who gave a unique top rank (=1). Other cases ranked may have ties. The objective is to assess the relative importance of characteristics of the case that was ranked as most serious. In the data set JUDGRNK2 the variables CHOSEN and CHOSEN2 are created for convenience:

   chosen=(rank=1);
   chosen2=(rank=1)+2*(rank>1);

In the parlance of the choice model, $\pi_{ij}$ is the probability that judge i ranks case j as the most serious amongst his or her portfolio of cases $C_i$. The case characteristics $x_{ij}$ = (BLACKD, WHITVIC, DEATH, CULP) vary across cases in $C_i$ and across judges. Proc MDC is dedicated to fitting discrete choice models. The CLM is invoked via the type=clogit option in the model statement. An equivalent model statement is also shown, but it uses only the first ranked choice; all other ranks are ignored. Future enhancements will provide flexibility of analyzing rank-ordered responses.
   proc mdc data=judgernk2 covest=hess;
      id judgid;
      model chosen=blackd whitvic death CULP/type=clogit choice=(rank);
      *model rank=blackd whitvic death CULP/type=clogit choice=(rank) rank;
      output out=stats_q pred=phat_q xbeta=xbeta_q;
   run;

Exactly the same model is fitted by LOGISTIC via

   proc logistic data=judgernk2;
      strata judgid;
      model chosen(event='1')=blackd whitvic death CULP;
   run;

Table 3: Choice models

              First ranked choice                  Ranked choices
Parameter     Estimate   Std Error   p-value      Estimate   Std Error   p-value
BLACKD          0.2043    0.4124     0.6204         0.1195    0.0971     0.2185
WHITVIC         0.3631    0.4266     0.3947         0.2370    0.1046     0.0235
DEATH          -0.4339    0.4821     0.3681        -0.1818    0.1377     0.1866
CULP            0.5311    0.1415     0.0002         0.2586    0.0423     <.0001

In Table 3 (columns 2-4) the only significant coefficient is CULP, indicating that an increase in this variable is associated with an increase in the probability of a case being ranked as most serious. In fact the partial effects for continuous covariates are $\partial \pi_{ij}/\partial x_{ik} = \beta\, \pi_{ij}([k = j] - \pi_{ik})$. For a discrete covariate the partial effect should be derived as differences in probabilities.

The OUTPUT statement will compute the probability that each case in the input file is ranked first. For each judge these probabilities for the portfolio $C_i$ must sum to 1. Note that only case characteristics are used in $\pi_{ij}$. This does not mean that cases with the same values for BLACKD, WHITVIC, DEATH and CULP rated by different judges will have the same probability of receiving the most serious rank. The reason is that the choice set could be different for different judges.

Neither MDC nor LOGISTIC will compute a confidence interval for the choice probabilities. However, using a survival model that has the same likelihood as the choice model, PHREG would allow computation of confidence intervals. The variable CHOSEN2 is regarded as an event time, with the first ranked case having value 1 and all other cases having value 2, which is treated as censored.
Contribution to the partial log-likelihood by the potential times for cases $j \in C_i$ for judge i is the corresponding term of the aforementioned $\ell(\beta)$. The absence of ties amongst the event times makes the likelihoods (Breslow, Efron, discrete) all the same. In the parlance of survival analysis, the estimated cumulative hazard at time t is $\hat H_i(t \mid z_0) = \hat H_{i0}(t, \hat\beta)\exp(z_0'\hat\beta)$, where $z_0$ is a profile of a case,

$$\hat H_{i0}(t, \hat\beta) = \int_0^t \{S_i^{(0)}(u, \hat\beta)\}^{-1}\, dN_i(u), \quad S_i^{(0)}(t, \hat\beta) = \sum_{j \in C_i} Y_{ij}(t)\exp(x_{ij}'\hat\beta),$$

$Y_{ij}(t)$ is the indicator for cases to be ranked at time t, and $N_i(t)$ is the counting process for ranked cases up to time t. We have just one event time (=1), which yields the desired choice probability for case profile $z_0$. Note that the profile $z_0$ need not be one of the cases in the portfolio. This fact has important implications in application of discrete choice models in marketing research, where the available constellation of choice characteristics could be extremely large.

PHREG computes a confidence interval for $S_i(1 \mid z_0) = \exp(-H_i(1 \mid z_0))$ from a confidence interval for $\log(-\log S_i(1 \mid z_0)) = \log H_{i0}(1, \beta) + z_0'\beta$. This can be salvaged to get the desired confidence interval via the approximation $1 - S_i(1 \mid z_0) = 1 - \exp(-H_i(1 \mid z_0)) \approx H_i(1 \mid z_0)$. As an alternative, one could use directly the variance of $\hat H_i(1 \mid z_0)$ to do the calculations, $\mathrm{Var}(\hat H_i(1 \mid z_0)) \approx \mathrm{Var}(\hat S_i(1 \mid z_0))/\hat S_i^2(1 \mid z_0)$.

Exploded logit (rank-ordered logit) model

The exploded logit model uses the rankings of the utilities in the RUM $U_{ij} = x_{ij}'\beta + \varepsilon_{ij}$, where we maintain the assumption that the errors $\{\varepsilon_{ij} : j \in C_i\}$ are distributed iid extreme-value. The observed responses are the rank order of the utilities of the choices, instead of the single choice that corresponds to the maximum utility.
For example, without loss of generality suppose there are J alternatives and individual i ranks them as $Y_{i1} = \max\{U_{ij} : j \in C_i\} = U_{i1}$, $Y_{i2} = \max\{U_{ij} : j \in C_i, j > 1\} = U_{i2}$, …, $Y_{iJ} = U_{iJ}$. The observed response is only the rank order $U_{i1} > U_{i2} > \cdots > U_{iJ}$. The response probability is computed as $P[U_{i1} > U_{i2} > \cdots > U_{iJ}]$. We may allow for incomplete rankings, with the first $J_1$ alternatives being ranked and the remaining $J - J_1$ left unranked. The probability of response is then $P[U_{i1} > \cdots > U_{iJ_1} > \tilde U_{iJ_1}]$, where $\tilde U_{iJ_1} = \max\{U_{ij} : j > J_1\}$. Ties among ranks are theoretically not possible under the continuous utility specification. However, see below.

Use the fact that $X_{ij} = \exp(-U_{ij})$ has the exponential distribution with mean $(\lambda_{ij})^{-1}$, where $\lambda_{ij} = \exp(x_{ij}'\beta)$. For a subset $A \subseteq C_i$, $\min\{X_{ij} : j \in A\}$ is exponentially distributed with inverse scale $\sum_{j \in A} \lambda_{ij}$. A pedestrian calculation yields

$$P[U_{i1} > \cdots > U_{iJ_1} > \tilde U_{iJ_1}] = \frac{\lambda_{i1}}{\sum_{j \ge 1} \lambda_{ij}} \cdot \frac{\lambda_{i2}}{\sum_{j \ge 2} \lambda_{ij}} \cdots \frac{\lambda_{iJ_1}}{\sum_{j \ge J_1} \lambda_{ij}}.$$

This structure makes the term exploded logit quite appropriate for the model. The overall likelihood is the product of such terms across the sample.

The form of this likelihood is exactly the same as the Breslow likelihood for observed survival times $T_{i1} < T_{i2} < \cdots < T_{iJ_1}$ in a sample of J potential event times of which the last $J - J_1$ are censored. Therefore to analyze these data on the preference ranks we can use PHREG with the survival times $1 < 2 < \cdots < J_1$ for the first $J_1$ ranked alternatives and a censored value ($= J_1 + 1$) for the last $J - J_1$ unranked alternatives. Of course the actual "times" are immaterial as long as the order is preserved.

If there are ties amongst the preference ranks an acceptable approach is to modify the above likelihood terms as follows.
Suppose alternatives $j_1, \ldots, j_p$ have the same rank $r$ and $R$ denotes all subsets of $p$ alternatives amongst those that might receive a rank $r$ or worse. Let $q = (q_1, \ldots, q_p)$ denote subscripts for the $p$ alternatives in a subset $q \in R$. The corresponding term(s) in the likelihood is replaced by

$$\frac{\exp\left(\left(\sum_{k=1}^{p} x_{ij_k}\right)'\beta\right)}{\sum_{q \in R} \exp\left(\left(\sum_{l=1}^{p} x_{iq_l}\right)'\beta\right)}.$$

Allison suggests that this discrete logistic likelihood should be used with tied ranked data. Estimation is readily carried out in PHREG with the TIES=DISCRETE option to invoke use of this likelihood. The response times are the observed ranks 1, 2,…, allowing for ties.

Illustrative Example 4

Use the data set JUDGERNK with the syntax

    proc phreg data=judgernk;
       strata judgid;
       model rank=blackd whitvic death CULP/ties=discrete;
       output out=stats xbeta=xbeta logsurv=logsurv survival=survival/method=ch;
    run;

The parameter estimates are shown in Table 3 (columns 5–7). Strictly speaking the results are not comparable with the analysis of the first ranked choice because inter alia the data sets used and the underlying models are different. Output statistics generated for the rank-ordered model must be interpreted with some caution because PHREG is operating in the context of a survival model.

Let us carry out a few calculations (see Table 4). JUDGID=11 provided unique ranks to his portfolio of 15 cases. The probability of case=4163 (obs=1) being ranked first is

$$P[U_{i1} > \max\{U_{ij} : j > 1\}] = \frac{\exp(x_{i1}'\beta)}{\sum_{j \ge 1} \exp(x_{ij}'\beta)}.$$

For obs=1 we get 3.1026/31.8052 = 0.0976 which is the cumulative hazard H(1) = −logsurv. Survival is computed as S(1) = P[RANK > 1] = exp(−H(1)) = 0.9071. Dividing each exp_xb=exp(xbeta) by the sum across all cases gives the probability of first rank for each case. Although the observed ranks differ in obs=11, 14 and 15, they have the same case characteristics and will therefore have the same probability of having the first rank. Obs=2 is a different case.
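These hand calculations can be reproduced from the exp_xb column of Table 4. The DATA step below is a minimal sketch under stated assumptions, not the authors' code: it types in the 15 exp_xb values for JUDGID=11 in rank order, assumes untied ranks, and recomputes the first-rank probability, the cumulative hazard and the survival for every record. The data set names T4, TOT and CHECK and the variables TOTAL, RISKSUM, CUMBASE, P_FIRST, H and S are illustrative names only.

```sas
/* exp_xb values for JUDGID=11 from Table 4, listed in rank order */
data t4;
   input rank exp_xb @@;
   datalines;
1 3.1026  2 2.2954  3 1.8498  4 3.4234  5 1.2951
6 1.8498  7 1.5760  8 3.4234  9 1.5760 10 3.8501
11 1.4595 12 1.8902 13 1.2951 14 1.4595 15 1.4595
;
run;

/* Sum of exp_xb over the whole portfolio (about 31.805) */
proc means data=t4 noprint;
   var exp_xb;
   output out=tot sum=total;
run;

data check;
   if _n_ = 1 then set tot(keep=total);  /* TOTAL is carried onto every row */
   set t4;
   retain risksum cumbase 0;
   if _n_ = 1 then risksum = total;      /* risk-set sum before rank 1 */
   p_first = exp_xb / total;             /* P[case is ranked first] */
   cumbase = cumbase + 1 / risksum;      /* baseline cumulative hazard up to this rank */
   H = exp_xb * cumbase;                 /* cumulative hazard, compare with -logsurv */
   S = exp(-H);                          /* compare with SURVIVAL in Table 4 */
   output;
   risksum = risksum - exp_xb;           /* remove the ranked case from the risk set */
run;

proc print data=check;
   var rank exp_xb p_first H S;
run;
```

Small discrepancies in the last printed digit relative to Table 4 are to be expected because the inputs are rounded to four decimals.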
The probability of the observed first two ranks is

$$P[U_{i1} > U_{i2} > \max\{U_{ij} : j > 2\}] = \frac{\exp(x_{i1}'\beta)}{\sum_{j \ge 1} \exp(x_{ij}'\beta)} \cdot \frac{\exp(x_{i2}'\beta)}{\sum_{j \ge 2} \exp(x_{ij}'\beta)} = 0.0976 \times 2.2954/28.7026 = 0.0078.$$

For obs=2 we have $H(2) = ((31.8052)^{-1} + (28.7026)^{-1}) \times 2.2954 = 0.15215$. Survival for this record is S(2) = P[RANK > 2] = exp(−H(2)) = 0.8587. We notice that we cannot easily use these results to compute the choice probabilities, other than for the first ranked choice. Moreover, if there were tied ranks at the first choice then $\Delta N_i(1) > 1$ and we cannot use H(1) directly to obtain choice probabilities. Future enhancements to proc MDC for ranked choice response data are likely to address these issues. Currently, in SAS/ETS 9.2, MDC utilizes only the first ranked value (=1) in estimation, ignoring the rest, basically fitting a conditional logit model.

Table 4: Output statistics for ranked choice (exploded logit) model

    Obs  judgid  rank  blackd  whitvic  death  culp  case    xbeta  survival   logsurv   exp_xb
      1      11     1       1        1      0     3  4163  1.13224   0.90706  -0.09755   3.1026
      2      11     2       0        1      1     3  2172  0.83093   0.85886  -0.15215   2.2954
      3      11     3       1        1      0     1  1880  0.61508   0.82477  -0.19266   1.8498
      4      11     4       1        0      1     5  2015  1.23062   0.60900  -0.49594   3.4234
      5      11     5       0        0      0     1  1060  0.25858   0.77966  -0.24890   1.2951
      6      11     6       1        1      0     1  2375  0.61508   0.63843  -0.44875   1.8498
      7      11     7       1        0      1     2  1720  0.45487   0.62505  -0.46993   1.5760
      8      11     8       1        0      1     5  1598  1.23062   0.29248  -1.22936   3.4234
      9      11     9       1        0      1     2   197  0.45487   0.50295  -0.68727   1.5760
     10      11    10       0        1      1     5   119  1.34809   0.13315  -2.01630   3.8501
     11      11    11       1        0      0     1  3035  0.37809   0.38393  -0.95731   1.4595
     12      11    12       1        0      0     2  4142  0.63668   0.21236  -1.54945   1.8902
     13      11    13       0        0      0     1  4128  0.25858   0.25437  -1.36896   1.2951
     14      11    14       1        0      0     1  1791  0.37809   0.12967  -2.04274   1.4595
     15      11    15       1        0      0     1   463  0.37809   0.04770  -3.04274   1.4595
    Sum                                                                                  31.8052

Nested logit

As an extension of the conditional logit model suppose the choice alternatives are partitioned into $K$ nonoverlapping nests, $B_1, \ldots, B_K$
(McFadden, 1984). The observed response for the i-th subject is the revealed choice (j) within the nest (k), that is, $Y_i = j$, $j \in B_k$. Suppressing the subject index, the underlying latent utility model is $U_{kj} = V_{kj} + \varepsilon_{kj}$ where $V_{kj}$ will be specified later. Within the nest $B_k$ the errors $\varepsilon_k = \{\varepsilon_{kj} : j \in B_k\}$ have a joint cumulative distribution

$$F(u_k) = \exp\left(-\left(\sum_{j \in B_k} \exp(-u_{kj}/\theta_k)\right)^{\theta_k}\right),$$

called the generalized extreme-value distribution (GEV, Train, 2003). To ensure that $F(u_k)$ is a proper distribution we require $\theta_k \in (0,1]$ for k=1, 2,…, K. Across nests the errors are independent.

The reasoning behind this specification originates from the marginal distribution of each $\varepsilon_{kj}$, $j \in B_k$, which is assumed to be the standard extreme-value distribution $\Lambda(u) = \exp(-\exp(-u))$, $-\infty < u < \infty$. We then generate the GEV distribution $F(u_k) = C(\Lambda(u_{k1}), \Lambda(u_{k2}), \ldots)$ for $\varepsilon_k$ via the Gumbel-Hougaard (GH) copula

$$C(v) = \exp\left(-\left(\sum_{j \in B_k} (-\log v_j)^{1/\theta_k}\right)^{\theta_k}\right), \quad v_j \in [0,1]$$

(Nelson, 1999). The GH copula belongs to the Archimedean Family of Copulas which is generated by a continuous convex strictly decreasing function $\varphi : [0,1] \to [0,\infty]$. Then $\varphi(C(v)) = \sum_{j \in B_k} \varphi(v_j)$. For the GH copula $\varphi(v) = (-\log v)^{1/\theta_k}$.

Kendall's tau ($\tau$) assesses the association between two marginals $(\varepsilon_{k1}, \varepsilon_{k2})$. It is the difference of the probability of concordance $P[(\varepsilon_{k1} - \varepsilon_{k1}')(\varepsilon_{k2} - \varepsilon_{k2}') > 0]$ and the probability of discordance $P[(\varepsilon_{k1} - \varepsilon_{k1}')(\varepsilon_{k2} - \varepsilon_{k2}') < 0]$ where $(\varepsilon_{k1}', \varepsilon_{k2}')$ is an independent copy of $(\varepsilon_{k1}, \varepsilon_{k2})$. For the GH copula

$$\tau = 1 + 4\int_0^1 \{\varphi(v)/\varphi'(v)\}\,dv = 1 - \theta_k.$$

Also $\mathrm{Corr}(\varepsilon_{kj}, \varepsilon_{kl}) = 1 - \theta_k^2$ (Kotz and Nadarajah, 2000). Generally, we require $\theta_k \in (0,1]$ which makes the RUM consistent with utility maximization. If $\theta_k = 1$ for all k, then $\{\varepsilon_{kj} : j \in B_k\}$ are iid extreme-value random variables.
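For readers who want the intermediate step behind the value of Kendall's tau, the short derivation below uses only the GH generator $\varphi(v) = (-\log v)^{1/\theta_k}$ given in the text, together with one integration by parts. It is offered as a check on the stated result rather than as part of the original exposition.

```latex
\begin{align*}
\varphi'(v) &= -\frac{1}{\theta_k\, v}\,(-\log v)^{1/\theta_k - 1}, \\
\frac{\varphi(v)}{\varphi'(v)} &= -\theta_k\, v\,(-\log v) = \theta_k\, v \log v, \\
\int_0^1 v \log v \, dv &= \Big[\tfrac{1}{2} v^2 \log v\Big]_0^1 - \int_0^1 \tfrac{1}{2} v \, dv = -\tfrac{1}{4}, \\
\tau &= 1 + 4\,\theta_k \left(-\tfrac{1}{4}\right) = 1 - \theta_k .
\end{align*}
```

At $\theta_k = 1$ this gives $\tau = 0$, in line with the independence of the errors noted in the text.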
With the specification of the GEV for $\varepsilon_k$ the choice probability $\pi_{kj}$ for a subject with choice (j) within the nest (k) is computed as $\pi_{kj} = P[Y_i = j \mid B_k]\,P[Y_i \in B_k]$, with the maximum utility across nests being in $B_k$. This is the nested logit model with level 1 for alternatives (j) and a level 2 layer for nests (k). For multiple layers the formulation becomes more complex in its notation (Hensher et al, 2005). Analogous to a tree structure a 4-layer nested model has alternatives (level 1) nested in branches (level 2) that are in limbs (level 3) of trunks (level 4) of the root.

The expression for a 2-level choice probability is

$$\pi_{kj} = \frac{\exp(V_{kj}/\theta_k)}{\sum_{l \in B_k} \exp(V_{kl}/\theta_k)} \cdot \frac{\left(\sum_{l \in B_k} \exp(V_{kl}/\theta_k)\right)^{\theta_k}}{\sum_{r=1}^{K} \left(\sum_{l \in B_r} \exp(V_{rl}/\theta_r)\right)^{\theta_r}}.$$

For alternatives $j, m$ in $B_k$ we have $\pi_{kj}/\pi_{km} = \exp\left((V_{kj} - V_{km})/\theta_k\right)$ which maintains the IIA property within a nest. However, if $j \in B_k$, $m \in B_l$, $k \ne l$ then

$$\frac{\pi_{kj}}{\pi_{lm}} = \exp\left((V_{kj}/\theta_k) - (V_{lm}/\theta_l)\right) \frac{\left(\sum_{h \in B_k} \exp(V_{kh}/\theta_k)\right)^{\theta_k - 1}}{\left(\sum_{h \in B_l} \exp(V_{lh}/\theta_l)\right)^{\theta_l - 1}}$$

depends on alternatives in the nests $B_k$, $B_l$. When $\theta_k = 1$ for all k, the 2-level nested logit model reduces to the conditional logit model.

To incorporate covariates let $V_{kj} = z_k'\alpha + x_{kj}'\beta$ with variables that depend only on the nest and variables that depend on alternatives (within the nest). Note that the additional subject index (i) is suppressed. Then the utility-maximized nested logit model (UMNL) is

$$\pi_{kj}^{UMNL} = \frac{\exp(x_{kj}'\beta/\theta_k)}{\sum_{l \in B_k} \exp(x_{kl}'\beta/\theta_k)} \cdot \frac{\exp(z_k'\alpha)\left(\sum_{l \in B_k} \exp(x_{kl}'\beta/\theta_k)\right)^{\theta_k}}{\sum_{r=1}^{K} \exp(z_r'\alpha)\left(\sum_{l \in B_r} \exp(x_{rl}'\beta/\theta_r)\right)^{\theta_r}} = \frac{\exp(x_{kj}'\beta/\theta_k)}{\sum_{l \in B_k} \exp(x_{kl}'\beta/\theta_k)} \cdot \frac{\exp(z_k'\alpha + \theta_k I_k)}{\sum_{r=1}^{K} \exp(z_r'\alpha + \theta_r I_r)}$$

where $I_k = \log\left(\sum_{l \in B_k} \exp(x_{kl}'\beta/\theta_k)\right)$, k=1,…,K, are called the inclusive values. The scale parameters $\theta_k$ in the GEV could be referred to as the inclusive value parameters.
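The two-factor structure of the choice probability can be checked numerically. The DATA step below is a minimal sketch with made-up inputs, not an analysis from the paper: it assumes five alternatives in three nests, hypothetical utilities $V_{kj}$ and hypothetical scale parameters $\theta_k$, and evaluates $P[Y = j \mid B_k]$, $P[Y \in B_k]$ and their product through the inclusive values. All array and variable names are illustrative.

```sas
/* Hypothetical utilities and scale parameters, for illustration only */
data nlogit_demo;
   array V{5}     _temporary_ (0.5 0.2 0.0 -0.3 -0.8);  /* V_kj for alternatives 1-5 */
   array nest{5}  _temporary_ (1 1 2 3 3);               /* nest k of each alternative */
   array theta{3} _temporary_ (0.6 1.0 0.8);             /* theta_k, each in (0,1] */
   array S{3}     _temporary_ (0 0 0);                   /* sum over B_k of exp(V_kl/theta_k) */

   /* within-nest sums */
   do j = 1 to 5;
      k = nest{j};
      S{k} = S{k} + exp(V{j} / theta{k});
   end;

   /* denominator across nests: sum of S_k**theta_k = sum of exp(theta_k * I_k) */
   denom = 0;
   do k = 1 to 3;
      denom = denom + S{k}**theta{k};
   end;

   /* pi_kj = P[Y = j | B_k] * P[Y in B_k] */
   total = 0;
   do j = 1 to 5;
      k = nest{j};
      I_k    = log(S{k});                       /* inclusive value of nest k */
      p_cond = exp(V{j} / theta{k}) / S{k};     /* P[Y = j | B_k] */
      p_nest = exp(theta{k} * I_k) / denom;     /* P[Y in B_k] */
      pi     = p_cond * p_nest;
      total  = total + pi;
      output;
   end;
   put 'Sum of choice probabilities: ' total;   /* equals 1 up to rounding */
   keep j k I_k p_cond p_nest pi;
run;

proc print data=nlogit_demo;
run;
```

Setting every entry of THETA to 1 makes PI equal to exp(V_j) divided by the sum of exp(V) over all five alternatives, which is the conditional logit special case.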
From the GEV distribution we can compute the expected maximum utility from alternatives in $B_k$: $E[\max(U_{kj} : j \in B_k)] = \theta_k I_k + z_k'\alpha + \gamma$ where $\gamma$ is Euler's constant. The first term in $\pi_{kj}$ is the conditional probability of selected choice being j in the nest $B_k$, given that the maximum utility across nests is in the nest $B_k$. The second term is the probability of $M_k = \max(U_{kj} : j \in B_k)$ being the maximum utility across nests, i.e., $P[M_k > \max(M_l : l \ne k, 1 \le l \le K)]$.

Proc MDC does not fit the UMNL (Silberhorn et al, 2008). Instead it fits the non-normalized nested logit model (NNNL) where $\beta/\theta_k$ is replaced by $\beta$ yielding the formula:

$$\pi_{kj}^{NNNL} = \frac{\exp(x_{kj}'\beta)}{\sum_{l \in B_k} \exp(x_{kl}'\beta)} \cdot \frac{\exp(z_k'\alpha + \theta_k I_k)}{\sum_{r=1}^{K} \exp(z_r'\alpha + \theta_r I_r)},$$

where the inclusive values are now $I_k = \log\left(\sum_{l \in B_k} \exp(x_{kl}'\beta)\right)$. The NNNL places no restrictions on the parameters $\theta_k$. If all $\theta_k$ are constrained to be equal, the UMNL model reduces to the NNNL model. In MDC we could estimate nest-specific coefficients by replacing $\beta/\theta_k$ with $\beta_k$. The corresponding UMNL beta coefficients are $\beta_k \times \theta_k$. This means that we are estimating alternative-specific effects that differ by nest. Unless the intended application can support a complex structure, fitting such a model would be unwieldy and its interpretation a challenge.

For each individual i in a random sample, the observation is the revealed choice and the nest to which it belongs, i.e., $Y_i = j$ and the nest $B_k$ with $j \in B_k$. With individual covariate values $(x_{kji}, z_{ki})$ the log-likelihood is

$$\ell(\alpha, \beta) = \sum_{i=1}^{n} \sum_{k,j} [Y_i = j, j \in B_k] \log \pi_{kj}(x_{kji}, z_{ki}).$$

Illustrative Example 4

Brownstone and Small (1989) describe a study of 527 automobile commuters from home to their work place. The choice of arrival time at work consists of 12 alternatives based on their preference for arriving at work early, on-time or late relative to the official work-start time.
Early arrivals (ALT 1-8) have a schedule delay (SD) from −40 min to −5 min in 5 min increments; on-time arrivals (ALT 9) of course have SD=0; and late arrivals (ALT 10-12) have SD 5, 10 or 15 min. The binary DECISION (0 or 1) revealed the arrival time choice. About 35% (n=187) of commuters chose on-time arrival, and only 5% (n=22) favored late arrival. Data on travel time (TTIME) in minutes were obtained from actual work-arrival time and official start-time at work, supplemented by calculations for each commuter for each alternative. For the chosen alternative (DECISION=1), as expected TTIME was on average slightly longer for carpoolers (CP=1, n=156) than noncarpoolers (CP=0, n=371).

Some individuals had the flexibility of late arrival at work without any consequence. The binary variable D2L=[SD≥FLEX] indicates schedule delay in excess of flex time (FLEX in mins). So D2L=0 for ALT 1-8. The variable SDLX=(SD−FLEX)/10, if SD>FLEX, and 0 otherwise, measures schedule delay in excess of allowed FLEX. We define an indicator FL=[FLEX>0] for commuters who had flexibility of late arrival (FL=1, n=193), and those who did not (FL=0, n=334).

Four variables describe characteristics of alternatives only: SDE=(−SD/10)×[SD<0] for schedule delay for early arrival; SDL=(SD/10)×[SD>0] for schedule delay for late arrival. Note that (SDE, SDL)=(0,0) only for ALT=9, on-time arrival. Binary variables R15=[SD∈{−30, −15, 0, 15}] and R10=[SD∈{−40, −30, −20, −10, 0, 10}] capture the tendency of respondents to round off answers to their schedule delay time to 15 minutes and 10 minutes, respectively.

The figure depicts the tree structure for schedule delay. This is a 2-level model. Level 1 at the bottom shows the alternatives which are nested at level 2 in three nests. The nests are joined at the top of the tree.
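The covariate definitions given above for this example translate directly into DATA step expressions, because a SAS comparison evaluates to 1 when true and 0 when false. The sketch below is illustrative only: the SMALL data set already contains these variables, and the input data set ALTS, with a single hypothetical commuter whose FLEX is 5 minutes, is made up so that the step is self-contained.

```sas
/* Hypothetical commuter with FLEX = 5 min and the 12 arrival-time alternatives */
data alts;
   flex = 5;
   do alt = 1 to 12;
      sd = -40 + 5*(alt - 1);     /* ALT 1-12 map to SD = -40, -35, ..., 15 min */
      output;
   end;
run;

/* Alternative-level covariates as defined in the text */
data covars;
   set alts;
   sde  = (-sd / 10) * (sd < 0);                /* early schedule delay, 10-min units */
   sdl  = ( sd / 10) * (sd > 0);                /* late schedule delay, 10-min units */
   r15  = (sd in (-30, -15, 0, 15));            /* rounding to 15 minutes */
   r10  = (sd in (-40, -30, -20, -10, 0, 10));  /* rounding to 10 minutes */
   d2l  = (sd >= flex);                         /* D2L=[SD>=FLEX] as in the text */
   sdlx = ((sd - flex) / 10) * (sd > flex);     /* late delay in excess of FLEX */
   fl   = (flex > 0);                           /* flexibility of late arrival */
run;

proc print data=covars;
run;
```

For this hypothetical commuter only ALT 9 has SDE and SDL both equal to zero, matching the remark in the text.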
The NEST statement in proc MDC for the nesting of level 1 alternatives in level 2 nests is

    nest level(1) = (1 2 3 4 5 6 7 8 @ 1, 9 @ 2, 10 11 12 @ 3),
         level(2) = (1 2 3 @ 1);

Covariates for the model are specified through the UTILITY statement. The general specification is

    utility u(level,[email protected])=;

Figure: Tree Structure for Schedule Delay. The top of the tree (TOP) branches into three nests: Early, B1 = {1, 2, 3, 4, 5, 6, 7, 8}; On Time, B2 = {9}; Late, B3 = {10, 11, 12}.

Although proc MDC permits some flexibility in covariate specifications, having too many alternative-specific covariates builds an unwieldy model that is likely to be uninterpretable at best, and may not fit properly at all (convergence problems). In most applications one would use a set of covariates that are common to all alternatives. All covariates in the model must appear in the MODEL statement.

The data set SMALL may be accessed from the SAS Sample Library for the MDC procedure. Individual commuters are identified by ID; there are 12 records per individual corresponding to ALT=1-12, for a total of 527×12=6324 records. For additional description and analysis of this data set see Brownstone and Small (1989), Small (1982) and the documentation example 'Nested Logit Analysis' in MDC.

For illustrative purposes and demonstration of different nested logit models we will use the following covariates:

    Level 1: R10, R15, TTIME, SDE, SDL, SDLX, D2L
    Level 2: CP_2, FL_2, CP_3, FL_3.

The level 2 variables are indicators for CP and FL specific to nest=2 (ALT 9), and nest=3 (ALT 10-12). For nest=1 (ALT 1-8) all four variables are zero. Note that these level 2 variables are subject-specific. They are constant across the alternatives within each nest. Table 5 summarizes the output from fitting different NNNL models.

Model A: Covariates at level 1 only.
    proc mdc data=small maxit=200 covest=hess;
       model decision = r15 r10 ttime sde sdl sdlx d2l/ type=nlogit choice=(alt);
       id id;
       utility u(1, )= r15 r10 ttime sde sdl sdlx d2l;
       nest level(1) = (1 2 3 4 5 6 7 8 @ 1, 9 @ 2, 10 11 12 @ 3),
            level(2) = (1 2 3 @ 1);
    run;

The labeling of the parameters $\beta$ in the output is self-explanatory. The inclusive value parameters $\theta_1, \theta_2, \theta_3$ are named INC_L2G1C1, INC_L2G1C2, INC_L2G1C3. The three nests at level 2 (L2) form a single group (G1) at the top of the tree (see Figure).

Model B: Covariates at level 1 only with the restriction $\theta_1 = \theta_2 = \theta_3$.

As previously noted, model B will be consistent with utility maximization. The restriction is accomplished by adding the option SAMESCALE to the model statement. The LR test for model B versus model A has −2 log LR=8.03. The 2 DF chi-square test is significant (p=.018). The test can be carried out within the invocation for fitting model A by adding the TEST statement:

    test "SAME SCALE" INC_L2G1C1=INC_L2G1C2=INC_L2G1C3/LR;

Model C: Covariates at level 1 only with the restriction $\theta_2 = 1$.

Nest 2 is degenerate because it has a single alternative associated with it. The inclusive value parameter $\theta_2$ is not defined for the UMNL but identifiable in the NNNL. To impose the restriction, add the RESTRICT statement to model A:

    restrict "THETA2=1" INC_L2G1C2=1;

The LR test for model C versus model A is not significant.

Model D: Covariates at level 1 and level 2.

The syntax modifies the UTILITY statement, imposes bounds on $\theta_1$ and $\theta_3$ via a BOUNDS statement, and restricts $\theta_2 = 1$ as before.
    proc mdc data=small maxit=250 covest=hess;
       bounds 0<INC_L2G1C1<=1, 0<INC_L2G1C3<=1;
       model decision = r15 r10 ttime sde sdl sdlx d2l cp_2 FL_2 cp_3 FL_3/ type=nlogit choice=(alt);
       id id;
       utility u(1, ) = r15 r10 ttime sde sdl sdlx d2l,
               u(2, 1 2 [email protected])=cp_2 fl_2 cp_3 fl_3;
       nest level(1) = (1 2 3 4 5 6 7 8 @ 1, 9 @ 2, 10 11 12 @ 3),
            level(2) = (1 2 3 @ 1);
       restrict "THETA2=1" INC_L2G1C2=1;
    run;

The utility specification for Model D is $U_{kj} = z_k'\alpha + x_{kj}'\beta + \varepsilon_{kj}$ where

$$x_{kj}'\beta = \beta_1 R10 + \beta_2 R15 + \beta_3 TTIME + \beta_4 SDE + \beta_5 SDL + \beta_6 SDLX + \beta_7 D2L,$$

$$z_k'\alpha = \alpha_{11} CP\_2 + \alpha_{12} FL\_2 + \alpha_{21} CP\_3 + \alpha_{22} FL\_3.$$

An increase of 1 min in travel time is associated with an expected disutility $\beta_3 = -0.0933$, whereas arriving at work a minute earlier has a disutility of $\beta_4/10 = -0.6490/10$ (recall that SDE is scaled by 10). The marginal rate of substitution is

$$\frac{\Delta TTIME_{kj}}{-\Delta SLE_{kj}} \approx \left(\frac{\partial EU_{kj}}{\partial SLE_{kj}}\right) \Big/ \left(\frac{\partial EU_{kj}}{\partial TTIME_{kj}}\right) = 10^{-1}\beta_4/\beta_3 = 0.70,$$

where SLE is the early schedule delay in minutes, so that SDE = SLE/10. So a commuter will incur 0.70 minutes of extra travel time to avoid arriving an extra minute early. In all models the negative sign on the $\beta$-coefficients for variables associated with time signifies their disutility.

The $\alpha$-coefficients in model D are subject-specific. For example, with all other variables held constant, $\alpha_{21}$ is the difference in utility between a commuter who carpools and arrives late, and a commuter who does not carpool and arrives late. The negative sign on the estimate seems plausible, reflecting perhaps the perceived inconvenience of having to travel with others. A Wald test is not significant (p=.4052). By default MDC produces Wald tests for all parameters in the model. The TEST statement carries out hypothesis tests for linear combinations of model parameters through the LR, Wald, or Lagrange multiplier (score) chi-square tests.

Estimates of choice probabilities are computed for each record in the data set from an OUTPUT statement:

    output out=stats_mdc predicted=phat;

The distribution of values of phat by alternative is shown in the boxplot. Each box represents 527 estimates.

Table 5: Non-normalized Nested Logit Models

                      Model A              Model B              Model C              Model D
    Parameter     Estimate  Std Error  Estimate  Std Error  Estimate  Std Error  Estimate  Std Error
    r15_L1          1.1455    0.1234     1.1404    0.1104     1.1300    0.1118     1.0996    0.1260
    r10_L1          0.4344    0.1202     0.4260    0.1096     0.4203    0.1105     0.3862    0.1250
    ttime_L1       -0.0803    0.0361    -0.1072    0.0441    -0.0752    0.0290    -0.0933    0.0367
    sde_L1         -0.6711    0.0760    -0.6765    0.0572    -0.6623    0.0693    -0.6490    0.0710
    sdl_L1         -2.1683    0.5036    -2.1960    0.4994    -2.1146    0.4649    -2.3154    0.6865
    sdlx_L1        -3.4391    1.5077    -3.1042    1.3509    -3.3737    1.4740    -2.5152    1.7652
    d2L_L1         -1.2057    0.3665    -1.3962    0.3640    -1.1183    0.1897    -0.7994    0.2630
    INC_L2G1C1      0.5992    0.2547     0.7471    0.1521     0.6574    0.1735     0.7641    0.1304
    INC_L2G1C2      0.9133    0.2782     0.7471    0.1521     1.0000               1.0000
    INC_L2G1C3      0.7436    0.1543     0.7471    0.1521     0.7694    0.1387     0.8730    0.1803
    CP_2_L2G1                                                                     -0.7075    0.2268
    FL_2_L2G1                                                                      0.4282    0.2862
    CP_3_L2G1                                                                     -0.4096    0.4921
    FL_3_L2G1                                                                      0.5630    0.7200
    -Log L          993.53               997.54               993.58               988.19

Boxplot: Distribution of estimates of choice probabilities (Model D)

PROC NLP is harnessed to carry out the maximum likelihood estimation for two UMNL models E and F (Table 6) considered as counterparts to NNNL models C and D. In model F having covariates at level 2 appears to be detrimental as most alternative-specific variables are not significant. All alternative-specific coefficients are scaled by the corresponding inclusive value parameters $\theta_1$ or $\theta_3$, but $\theta_2$ is not defined and thus fixed at value 1.
Initial parameters for the NLP procedure (inest= option) were from the NNNL models, and initial results from NLP were used in subsequent iterations of NLP with the hope of improving convergence and precision (i.e., small gradients). Although the results in Table 6 are satisfactory, we are unsure if additional improvements are possible using the myriad of options available in NLP.

SUMMARY

SAS Usage Note 22871 summarizes the types of logit models that can be fitted with SAS software. In this paper we described some of the capabilities of SAS procedures LOGISTIC, GENMOD, PHREG, QLIM and MDC in fitting a variety of logit models. We covered the binary logit for a dichotomous response, the ordinal and cumulative logit for ordered responses, the multinomial (or generalized) logit for nominal responses, and the exploded logit model for ranked responses. The latter used PHREG for analysis by exploiting the analogy between the ranked outcomes and a discrete time survival model. For discrete choice models, the conditional logit and nested logit models were discussed. The conditional logit model (CLM) is structurally similar to conditional logistic regression (CLR) for matched case-control data. However, important differences exist in interpretation of results from CLR and CLM because of differences in study design. For all models discussed in this paper estimation of model parameters is via maximization of an appropriate objective function, which is generally a log-likelihood function.

Although we focused on a single categorical response, there are natural extensions to longitudinal and clustered data. In specific contexts GLIMMIX and GENMOD could be used to account for correlation in repeated measures. CATMOD performs categorical data analyses for data structures that are presented as multidimensional contingency tables, using weighted least-squares for estimation.
Some logit models not discussed in this paper are the continuation-ratio and adjacent-category models for ordinal responses, the stereotype models for ordered and multinomial responses, and the mixed-logit model in the context of discrete choice. Finally, we note that using the term logit broadly to describe structurally very different models might seem overly simplistic.

Table 6: Utility Maximized Nested Logit Models

                            Model E                         Model F
    Parameter     Estimate  Std Error  p-value    Estimate  Std Error  p-value
    r15_L1          0.7868    0.2951    0.0079      0.6852    0.5418    0.2066
    r10_L1          0.2879    0.1369    0.0359      0.2328    0.2117    0.2720
    ttime_L1       -0.0765    0.0365    0.0369     -0.0696    0.0512    0.1745
    sde_L1         -0.4698    0.1804    0.0095     -0.4069    0.3216    0.2063
    sdl_L1         -1.8759    0.9548    0.0500     -1.8602    1.1545    0.1077
    sdlx_L1        -2.3989    0.8865    0.0070     -2.7819    1.7467    0.1118
    d2L_L1         -1.0429    0.1592    <.0001     -0.7970    0.1812    <.0001
    INC_L2G1C1      0.6866    0.2693    0.0111      0.6177    0.4988    0.2161
    INC_L2G1C2      1.0000                          1.0000
    INC_L2G1C3      0.8872    0.5545    0.1102      0.9357    0.6083    0.1246
    CP_2_L2G1                                      -0.7849    0.2241    0.0005
    FL_2_L2G1                                       0.4140    0.2169    0.0568
    CP_3_L2G1                                      -0.4661    0.4896    0.3416
    FL_3_L2G1                                       0.0860    1.0276    0.9333
    -Log L          997.50                          990.89

DATA SOURCES

The German Socioeconomic Panel Survey 1984-1995 on healthcare utilization used in examples 1 and 2 is discussed extensively in Greene and Hensher (2010). The judge rank data set used in example 3 is from Allison (1999). The travel time data set of commuters used in example 4 can be obtained from the SAS Sample Program Library for the MDC procedure.

REFERENCES

Agresti A. Categorical Data Analysis, Second edition. New York: John Wiley & Sons; 2002.
Allison PD. Logistic Regression Using the SAS System. Cary, NC: SAS Institute Inc; 1999.
Greene WH, Hensher DA. Modeling Ordered Choices: A Primer. New York, NY: Cambridge University Press; 2010.
Hensher DA, Rose JM, Greene WH. Applied Choice Analysis: A Primer. New York, NY: Cambridge University Press; 2005.
Kotz S, Nadarajah S. Extreme Value Distributions: Theory and Applications. London, UK: Imperial College Press; 2000.
Kuhfeld WF. Marketing Research Methods in SAS: Experimental Design, Choice, Conjoint, and Graphical Techniques. Cary, NC: SAS Institute Inc; 2009.
McFadden D. Econometric analysis of qualitative response models. In: Griliches Z, Intriligator MD, eds. Handbook of Econometrics, Volume 2. Amsterdam: North-Holland; 1984:1395-1457.
Moon CG. Simultaneous specification test in a binary logit model - Skewness and Heteroscedasticity. Communications in Statistics-Theory and Methods. 1988;17(10):3361-3387.
McDonald JB, Hansen JV. An application and comparison of some flexible parametric and semiparametric qualitative-response models with heteroskedasticity. International Journal of Systems Science. 2000;31(1):27-33.
Nelson R. An Introduction to Copulas. New York, NY: Springer-Verlag; 1999.
Riphahn RT, Wambach A, Million A. Incentive effects in the demand for health care: A bivariate panel count data estimation. Journal of Applied Econometrics. 2003;18(4):387-405.
SAS Institute Inc. What kinds of logistic (or logit) models can be fit using SAS? Usage Note 22871. Available at: http://support.sas.com/kb/22/871.html. Accessed 01/18/2011.
SAS Institute Inc. The PROC LOGISTIC proportional odds test and how to fit a partial proportional odds model. Usage Note 22954. Available at: http://support.sas.com/kb/22/954.html. Accessed 01/18/2011.
Silberhorn N, Boztug Y, Hildebrandt L. Estimation with the nested logit model: specifications and software particularities. OR Spectrum. 2008;30(4):635-653.
Stokes ME, Davis CS, Koch GG. Categorical Data Analysis Using the SAS System. Second edition. Cary, NC: SAS Institute Inc; 2000.
Train K. Discrete Choice Methods with Simulation. New York, NY: Cambridge University Press; 2003.
Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press; 2002.
ACKNOWLEDGMENTS

This study was supported by the Agency for Healthcare Research & Quality under grant 1R01 HS14206.

CONTACT INFORMATION

We welcome your comments and questions. Please contact

    Joseph C. Gardiner
    Division of Biostatistics
    Department of Epidemiology
    B629 West Fee Hall
    Michigan State University
    East Lansing, MI 48824
    [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
