The MCMC Procedure

The MCMC Procedure
®
SAS/STAT 14.2 User’s Guide
The MCMC Procedure
This document is an individual chapter from SAS/STAT® 14.2 User’s Guide.
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS/STAT® 14.2 User’s Guide. Cary, NC:
SAS Institute Inc.
SAS/STAT® 14.2 User’s Guide
Copyright © 2016, SAS Institute Inc., Cary, NC, USA
All Rights Reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by
any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute
Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time
you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is
illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic
piracy of copyrighted materials. Your support of others’ rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software
developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or
disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as
applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S.
federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision
serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The
Government’s rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
November 2016
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the
USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open-source software, which is
licensed under its applicable third-party software license agreement. For license information about third-party software distributed
with SAS software, refer to http://support.sas.com/thirdpartylicenses.
Chapter 74
The MCMC Procedure
Contents
Overview: MCMC Procedure . . . . . . . . . . . . . . . . . .
PROC MCMC Compared with Other SAS Procedures . .
Getting Started: MCMC Procedure . . . . . . . . . . . . . . . .
Simple Linear Regression . . . . . . . . . . . . . . . . .
The Behrens-Fisher Problem . . . . . . . . . . . . . . . .
Random-Effects Model . . . . . . . . . . . . . . . . . .
Syntax: MCMC Procedure . . . . . . . . . . . . . . . . . . . .
PROC MCMC Statement . . . . . . . . . . . . . . . . .
ARRAY Statement . . . . . . . . . . . . . . . . . . . . .
BEGINCNST/ENDCNST Statement . . . . . . . . . . .
BEGINNODATA/ENDNODATA Statements . . . . . . .
BY Statement . . . . . . . . . . . . . . . . . . . . . . .
MODEL Statement . . . . . . . . . . . . . . . . . . . . .
PARMS Statement . . . . . . . . . . . . . . . . . . . . .
PREDDIST Statement . . . . . . . . . . . . . . . . . . .
PRIOR/HYPERPRIOR Statement . . . . . . . . . . . . .
Programming Statements . . . . . . . . . . . . . . . . .
RANDOM Statement . . . . . . . . . . . . . . . . . . .
UDS Statement . . . . . . . . . . . . . . . . . . . . . . .
Details: MCMC Procedure . . . . . . . . . . . . . . . . . . . .
How PROC MCMC Works . . . . . . . . . . . . . . . .
Blocking of Parameters . . . . . . . . . . . . . . . . . .
Sampling Methods . . . . . . . . . . . . . . . . . . . . .
Tuning the Proposal Distribution . . . . . . . . . . . . .
Direct Sampling . . . . . . . . . . . . . . . . . . . . . .
Conjugate Sampling . . . . . . . . . . . . . . . . . . . .
Initial Values of the Markov Chains . . . . . . . . . . . .
Assignments of Parameters . . . . . . . . . . . . . . . .
Standard Distributions . . . . . . . . . . . . . . . . . . .
Usage of Multivariate Distributions . . . . . . . . . . . .
Specifying a New Distribution . . . . . . . . . . . . . . .
Using Density Functions in the Programming Statements .
Truncation and Censoring . . . . . . . . . . . . . . . . .
Some Useful SAS Functions . . . . . . . . . . . . . . . .
Matrix Functions in PROC MCMC . . . . . . . . . . . .
Create Design Matrix . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5629
5629
5630
5630
5636
5640
5645
5645
5662
5663
5664
5665
5665
5674
5675
5676
5677
5679
5687
5688
5688
5691
5692
5694
5696
5697
5698
5699
5700
5713
5715
5716
5719
5721
5723
5728
5628 F Chapter 74: The MCMC Procedure
Modeling Joint Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5729
Access Lag and Lead Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5731
CALL ODE and CALL QUAD Subroutines . . . . . . . . . . . . . . . . . . . . . . .
5734
Regenerating Diagnostics Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5740
Caterpillar Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5742
Autocall Macros for Postprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . .
5744
Gamma and Inverse-Gamma Distributions . . . . . . . . . . . . . . . . . . . . . . .
5746
Posterior Predictive Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5749
Handling of Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5753
Functions of Random-Effects Parameters . . . . . . . . . . . . . . . . . . . . . . . .
5755
Spatial Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5761
Floating Point Errors and Overflows . . . . . . . . . . . . . . . . . . . . . . . . . . .
5763
Handling Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5766
Computational Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Displayed Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5768
5769
ODS Table Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5773
ODS Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5775
Examples: MCMC Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5776
Example 74.1: Simulating Samples From a Known Density . . . . . . . . . . . . . .
5776
Example 74.2: Box-Cox Transformation . . . . . . . . . . . . . . . . . . . . . . . .
5785
Example 74.3: Logistic Regression Model with a Diffuse Prior . . . . . . . . . . . .
5792
Example 74.4: Logistic Regression Model with Jeffreys’ Prior . . . . . . . . . . . . .
5799
Example 74.5: Poisson Regression . . . . . . . . . . . . . . . . . . . . . . . . . . .
5802
Example 74.6: Nonlinear Poisson Regression Models . . . . . . . . . . . . . . . . .
5804
Example 74.7: Logistic Regression Random-Effects Model . . . . . . . . . . . . . .
5812
Example 74.8: Nonlinear Poisson Regression Multilevel Random-Effects Model . . .
5814
Example 74.9: Multivariate Normal Random-Effects Model . . . . . . . . . . . . . .
5820
Example 74.10: Missing at Random Analysis . . . . . . . . . . . . . . . . . . . . . .
5823
Example 74.11: Nonignorably Missing Data (MNAR) Analysis . . . . . . . . . . . .
5827
Example 74.12: Change Point Models . . . . . . . . . . . . . . . . . . . . . . . . . .
5830
Example 74.13: Exponential and Weibull Survival Analysis . . . . . . . . . . . . . .
5834
Example 74.14: Time Independent Cox Model . . . . . . . . . . . . . . . . . . . . .
5847
Example 74.15: Time Dependent Cox Model . . . . . . . . . . . . . . . . . . . . . .
5854
Example 74.16: Piecewise Exponential Frailty Model . . . . . . . . . . . . . . . . .
5859
Example 74.17: Normal Regression with Interval Censoring . . . . . . . . . . . . . .
5866
Example 74.18: Constrained Analysis . . . . . . . . . . . . . . . . . . . . . . . . . .
5867
Example 74.19: Implement a New Sampling Algorithm . . . . . . . . . . . . . . . .
5873
Example 74.20: Using a Transformation to Improve Mixing . . . . . . . . . . . . . .
5882
Example 74.21: Gelman-Rubin Diagnostics . . . . . . . . . . . . . . . . . . . . . . .
5891
Example 74.22: One-Compartment Model with Pharmacokinetic Data . . . . . . . . .
5898
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5902
Overview: MCMC Procedure F 5629
Overview: MCMC Procedure
The MCMC procedure is a general purpose Markov chain Monte Carlo (MCMC) simulation procedure that
is designed to fit Bayesian models. Bayesian statistics is different from traditional statistical methods such as
frequentist or classical methods. For a short introduction to Bayesian analysis and related basic concepts, see
Chapter 7, “Introduction to Bayesian Analysis Procedures.” Also see the section “A Bayesian Reading List”
on page 153 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for a guide to Bayesian textbooks
of varying degrees of difficulty.
In essence, Bayesian statistics treats parameters as unknown random variables, and it makes inferences based
on the posterior distributions of the parameters. There are several advantages associated with this approach to
statistical inference. Some of the advantages include its ability to use prior information and to directly answer
specific scientific questions that can be easily understood. For further discussions of the relative advantages
and disadvantages of Bayesian analysis, see the section “Bayesian Analysis: Advantages and Disadvantages”
on page 128 in Chapter 7, “Introduction to Bayesian Analysis Procedures.”
It follows from Bayes’ theorem that a posterior distribution is the product of the likelihood function and the
prior distribution of the parameter. In all but the simplest cases, it is very difficult to obtain the posterior
distribution directly and analytically. Often, Bayesian methods rely on simulations to generate sample from
the desired posterior distribution and use the simulated draws to approximate the distribution and to make all
of the inferences.
PROC MCMC is a flexible, simulation-based procedure that is suitable for fitting a wide range of Bayesian
models. To use PROC MCMC, you need to specify a likelihood function for the data and a prior distribution
for the parameters. If you are fitting hierarchical models, you can specify a hyperprior distribution or
distributions for the random-effects parameters. PROC MCMC then obtains samples from the corresponding
posterior distributions, produces summary and diagnostic statistics, and saves the posterior samples in an
output data set that can be used for further analysis. Although PROC MCMC supports a suite of standard
distributions, you can analyze data that have any likelihood, prior, or hyperprior, as long as these functions
are programmable using the SAS DATA step functions. There are no constraints on how the parameters can
enter the model, in either linear or any nonlinear functional form.
The MODEL statement in PROC MCMC can automatically model missing data, response variables, or
covariates. In releases before SAS/STAT 12.1, observations with missing values were discarded prior to
the analysis. Now, PROC MCMC treats the missing values as unknown parameters and incorporates the
sampling of the missing values as part of the simulation.
PROC MCMC selects a sampling method for each parameter or a block of parameters. For example, when
conjugacy is available, samples are drawn directly from the full conditional distribution by using standard
random number generators. In other cases, PROC MCMC uses an adaptive blocked random walk Metropolis
algorithm that uses a normal proposal distribution. You can also choose alternative sampling algorithms, such
as the slice sampler.
PROC MCMC Compared with Other SAS Procedures
PROC MCMC is unlike most other SAS/STAT procedures in that the nature of the statistical inference is
Bayesian. You specify prior distributions for the parameters with PRIOR statements and the likelihood
function for the data with MODEL statements. PROC MCMC derives inferences from simulation rather than
5630 F Chapter 74: The MCMC Procedure
through analytic or numerical methods. You should expect slightly different answers from each run for the
same problem, unless the same random number seed is used. The model specification is similar to PROC
NLIN, and PROC MCMC shares some of the syntax of PROC NLMIXED.
You can also carry out a Bayesian analysis with the BCHOICE, GENMOD, PHREG, LIFEREG, and FMM
procedures for discrete choice models, generalized linear models, accelerated life failure models, Cox
regression models, piecewise constant baseline hazard models (also known as piecewise exponential models),
and finite mixture models. See Chapter 27, “The BCHOICE Procedure,” Chapter 45, “The GENMOD
Procedure,” Chapter 86, “The PHREG Procedure,” Chapter 70, “The LIFEREG Procedure,” and Chapter 40,
“The FMM Procedure.”
Getting Started: MCMC Procedure
There are three examples in this “Getting Started” section: a simple linear regression, the Behrens-Fisher
estimation problem, and a random-effects model. The regression model is chosen for its simplicity; the
Behrens-Fisher problem illustrates some advantages of the Bayesian approach; and the random-effects model
is one of the most prevalently used models.
Keep in mind that PARMS statements declare the parameters in the model, PRIOR statements declare the
prior distributions, MODEL statements declare the likelihood for the data, and RANDOM statements declare
the random effects. In most cases, you do not need to supply initial values. PROC MCMC advises you if it is
unable to generate starting values for the Markov chain.
Simple Linear Regression
This section illustrates some basic features of PROC MCMC by using a linear regression model. The model
is as follows:
Yi D ˇ0 C ˇ1 Xi C i
for the observations i D 1; 2; : : : ; n.
The following statements create a SAS data set with measurements of Height and Weight for a group of
children:
title 'Simple Linear Regression';
data Class;
input Name $ Height Weight @@;
datalines;
Alfred 69.0 112.5
Alice 56.5 84.0
Carol
62.8 102.5
Henry 63.5 102.5
Jane
59.8 84.5
Janet 62.5 112.5
John
59.0 99.5
Joyce 51.3 50.5
Louise 56.3 77.0
Mary
66.5 112.0
Robert 64.8 128.0
Ronald 67.0 133.0
William 66.5 112.0
;
Barbara
James
Jeffrey
Judy
Philip
Thomas
65.3 98.0
57.3 83.0
62.5 84.0
64.3 90.0
72.0 150.0
57.5 85.0
Simple Linear Regression F 5631
The equation of interest is as follows:
Weighti D ˇ0 C ˇ1 Heighti C i
The observation errors, i , are assumed to be independent and identically distributed with a normal distribution
with mean zero and variance 2 .
Weighti normal.ˇ0 C ˇ1 Heighti ; 2 /
The likelihood function for each of the Weight, which is specified in the MODEL statement, is as follows:
p.Weightjˇ0 ; ˇ1 ; 2 ; Heighti / D .ˇ0 C ˇ1 Heighti ; 2 /
where p.j/ denotes a conditional probability density and is the normal density. There are three parameters
in the likelihood: ˇ0 , ˇ1 , and 2 . You use the PARMS statement to indicate that these are the parameters in
the model.
Suppose you want to use the following three prior distributions on each of the parameters:
.ˇ0 / D .0; var D 1e6/
.ˇ1 / D .0; var D 1e6/
. 2 / D fi € .shape D 3=10; scale D 10=3/
where ./ indicates a prior distribution and fi € is the density function for the inverse-gamma distribution.
The normal priors on ˇ0 and ˇ1 have large variances, expressing your lack of knowledge about the regression
coefficients. The priors correspond to an equal-tail 95% credible intervals of approximately (-2000, 2000) for
ˇ0 and ˇ1 . Priors of this type are often called vague or diffuse priors. See the section “Prior Distributions”
on page 123 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more information. Typically
diffuse prior distributions have little influence on the posterior distribution and are appropriate when stronger
prior information about the parameters is not available.
A frequently used prior for the variance parameter 2 is the inverse-gamma distribution. See Table 74.22 in
the section “Standard Distributions” on page 5700 for the density definition. The inverse-gamma distribution
is a conjugate prior (see the section “Conjugate Sampling” on page 5697) for the variance parameter in a
normal distribution. Also see the section “Gamma and Inverse-Gamma Distributions” on page 5746 for
typical usages of the gamma and inverse-gamma prior distributions. With a shape parameter of 3/10 and a
scale parameter of 10/3, this prior corresponds to an equal-tail 95% credible interval of (1.7, 1E6), with the
mode at 2.5641 for 2 . Alternatively, you can use any other prior distribution with positive support on this
variance component. For example, you can use the gamma prior.
According to Bayes’ theorem, the likelihood function and prior distributions determine the posterior (joint)
distribution of ˇ0 , ˇ1 , and 2 as follows:
.ˇ0 ; ˇ1 ; 2 jWeight; Height/ / .ˇ0 /.ˇ1 /. 2 /p.Weightjˇ0 ; ˇ1 ; 2 ; Height/
You do not need to know the form of the posterior distribution when you use PROC MCMC. PROC MCMC
automatically obtains samples from the desired posterior distribution, which is determined by the prior and
likelihood you supply.
5632 F Chapter 74: The MCMC Procedure
The following statements fit this linear regression model with diffuse prior information:
ods graphics on;
proc mcmc data=class outpost=classout nmc=10000 thin=2 seed=246810;
parms beta0 0 beta1 0;
parms sigma2 1;
prior beta0 beta1 ~ normal(mean = 0, var = 1e6);
prior sigma2 ~ igamma(shape = 3/10, scale = 10/3);
mu = beta0 + beta1*height;
model weight ~ n(mu, var = sigma2);
run;
ods graphics off;
When ODS Graphics is enabled, diagnostic plots, such as the trace and autocorrelation function plots of the
posterior samples, are displayed. For more information about ODS Graphics, see Chapter 21, “Statistical
Graphics Using ODS.”
The PROC MCMC statement invokes the procedure and specifies the input data set Class. The output data
set Classout contains the posterior samples for all of the model parameters. The NMC= option specifies
the number of posterior simulation iterations. The THIN= option controls the thinning of the Markov chain
and specifies that one of every 2 samples is kept. Thinning is often used to reduce the correlations among
posterior sample draws. In this example, 5,000 simulated values are saved in the Classout data set. The
SEED= option specifies a seed for the random number generator, which guarantees the reproducibility of
the random stream. For more information about Markov chain sample size, burn-in, and thinning, see the
section “Burn-In, Thinning, and Markov Chain Samples” on page 135 in Chapter 7, “Introduction to Bayesian
Analysis Procedures.”
The PARMS statements identify the three parameters in the model: beta0, beta1, and sigma2. Each statement
also forms a block of parameters, where the parameters are updated simultaneously in each iteration. In this
example, beta0 and beta1 are sampled jointly, conditional on sigma2; and sigma2 is sampled conditional on
fixed values of beta0 and beta1. In simple regression models such as this, you expect the parameters beta0
and beta1 to have high posterior correlations, and placing them both in the same block improves the mixing
of the chain—that is, the efficiency that the posterior parameter space is explored by the Markov chain. For
more information, see the section “Blocking of Parameters” on page 5691. The PARMS statements also
assign initial values to the parameters (see the section “Initial Values of the Markov Chains” on page 5698).
The regression parameters are given 0 as their initial values, and the scale parameter sigma2 starts at value 1.
If you do not provide initial values, PROC MCMC chooses starting values for every parameter.
The PRIOR statements specify prior distributions for the parameters. The parameters beta0 and beta1
both share the same prior—a normal prior with mean 0 and variance 1e6. The parameter sigma2 has an
inverse-gamma distribution with a shape parameter of 3/10 and a scale parameter of 10/3. For a list of
standard distributions that PROC MCMC supports, see the section “Standard Distributions” on page 5700.
The MU assignment statement calculates the expected value of Weight as a linear function of Height. The
MODEL statement uses the shorthand notation, n, for the normal distribution to indicate that the response
variable, Weight, is normally distributed with parameters mu and sigma2. The functional argument MEAN=
in the normal distribution is optional, but you have to indicate whether sigma2 is a variance (VAR=), a
standard deviation (SD=), or a precision (PRECISION=) parameter. See Table 74.2 in the section “MODEL
Statement” on page 5665 for distribution specifications.
Simple Linear Regression F 5633
The distribution parameters can contain expressions. For example, you can write the MODEL statement as
follows:
model weight ~ n(beta0 + beta1*height, var = sigma2);
Before you do any posterior inference, it is essential that you examine the convergence of the Markov chain
(see the section “Assessing Markov Chain Convergence” on page 136 in Chapter 7, “Introduction to Bayesian
Analysis Procedures”). You cannot make valid inferences if the Markov chain has not converged. A very
effective convergence diagnostic tool is the trace plot. Although PROC MCMC produces graphs at the end of
the procedure output (see Figure 74.5), you should visually examine the convergence graph first.
The first table that PROC MCMC produces is the “Number of Observations” table, as shown in Figure 74.1.
This table lists the number of observations read from the DATA= data set and the number of observations
used in the analysis.
Figure 74.1 Observation Information
Simple Linear Regression
The MCMC Procedure
Number of Observations Read 19
Number of Observations Used 19
The “Parameters” table, shown in Figure 74.2, lists the names of the parameters, the blocking information, the
sampling method used, the starting values, and the prior distributions. For more information about blocking
information, see the section “Blocking of Parameters” on page 5691; for more information about starting
values, see the section “Initial Values of the Markov Chains” on page 5698. The first block, which consists of
the parameters beta0 and beta1, uses a random walk Metropolis algorithm. The second block, which consists
of the parameter sigma2, is updated via its full conditional distribution in conjugacy. You should check this
table to ensure that you have specified the parameters correctly, especially for complicated models.
Figure 74.2 Parameter Information
Parameters
Sampling
Block Parameter Method
1 beta0
N-Metropolis
beta1
2 sigma2
Initial
Value Prior Distribution
0 normal(mean = 0, var = 1e6)
0 normal(mean = 0, var = 1e6)
Conjugate
1.0000 igamma(shape = 3/10, scale = 10/3)
For each posterior distribution, PROC MCMC also reports summary and interval statistics (posterior means,
standard deviations, and 95% highest posterior density credible intervals), as shown in Figure 74.3. For
more information about posterior statistics, see the section “Summary Statistics” on page 150 in Chapter 7,
“Introduction to Bayesian Analysis Procedures.”
5634 F Chapter 74: The MCMC Procedure
Figure 74.3 MCMC Summary and Interval Statistics
Simple Linear Regression
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
beta0
5000 -142.8
beta1
5000 3.8924
sigma2
5000
137.3
33.4326
0.5333
95%
HPD Interval
-210.8 -81.6714
2.9056
4.9545
51.1030 59.2362
236.3
By default, PROC MCMC computes the effective sample sizes (ESSs) as a convergence diagnostic test to
help you determine whether the chain has converged. The ESSs are shown in Figure 74.4. For details and
interpretations of ESS and additional convergence diagnostics, see the section “Assessing Markov Chain
Convergence” on page 136 in Chapter 7, “Introduction to Bayesian Analysis Procedures.”
Figure 74.4 MCMC Convergence Diagnostics
Simple Linear Regression
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
beta0
1102.2
4.5366
0.2204
beta1
1119.0
4.4684
0.2238
sigma2
2910.1
1.7182
0.5820
PROC MCMC produces a number of graphs, shown in Figure 74.5, which also aid convergence diagnostic
checks. With the trace plots, there are two important aspects to examine. First, you want to check whether
the mean of the Markov chain has stabilized and appears constant over the graph. Second, you want to check
whether the chain has good mixing and is “dense,” in the sense that it quickly traverses the support of the
distribution to explore both the tails and the mode areas efficiently. The plots show that the chains appear to
have reached their stationary distributions.
Next, you should examine the autocorrelation plots, which indicate the degree of autocorrelation for each of
the posterior samples. High correlations usually imply slow mixing. Finally, the kernel density plots estimate
the posterior marginal distributions for each parameter.
Simple Linear Regression F 5635
Figure 74.5 Diagnostic Plots for ˇ0 , ˇ1 and 2
5636 F Chapter 74: The MCMC Procedure
Figure 74.5 continued
In regression models such as this, you expect the posterior estimates to be very similar to the maximum
likelihood estimators with noninformative priors on the parameters, The REG procedure produces the
following fitted model (code not shown):
Weight D
143:0 C 3:9 Height
These are very similar to the means show in Figure 74.3. With PROC MCMC, you can carry out informative
analysis that uses specifications to indicate prior knowledge on the parameters. Informative analysis is likely
to produce different posterior estimates, which are the result of information from both the likelihood and the
prior distributions. Incorporating additional information in the analysis is one major difference between the
classical and Bayesian approaches to statistical inference.
The Behrens-Fisher Problem
One of the famous examples in the history of statistics is the Behrens-Fisher problem (Fisher 1935). Consider
the situation where there are two independent samples from two different normal distributions:
y11 ; y12 ; : : : ; y1n1 normal.1 ; 12 /
y21 ; y22 ; : : : ; y2n2 normal.2 ; 22 /
Note that n1 ¤ n2 . When you do not want to assume that the variances are equal, testing the hypothesis
H0 W 1 D 2 is a difficult problem in the classical statistics framework, because the distribution under H0
The Behrens-Fisher Problem F 5637
is not known. Within the Bayesian framework, this problem is straightforward because you can estimate the
posterior distribution of 1 2 while taking into account the uncertainties in all of parameters by treating
them as random variables.
Suppose you have the following set of data:
title 'The Behrens-Fisher Problem';
data behrens;
input y ind @@;
datalines;
121 1 94 1 119 1 122
172 1 155 1 107 1 180
145 1 148 1 120 1 147
130 2 130 2 122 2 118
126 2 127 2 111 2 112
;
1
1
1
2
2
142
119
125
118
121
1
1
1
2
2
168
157
126
111
1
1
2
2
116
101
125
123
1
1
2
2
The response variable is y, and the ind variable is the group indicator, which takes two values: 1 and 2. There
are 19 observations that belong to group 1 and 14 that belong to group 2.
The likelihood functions for the two samples are as follows:
p.y1i j1 ; 12 / D .y1i I 1 ; 12 / for i D 1; : : : ; 19
p.y2j j2 ; 22 / D .y2j I 2 ; 22 / for j D 1; : : : ; 14
Berger (1985) showed that a uniform prior on the support of the location parameter is a noninformative prior.
The distribution is invariant under location transformations—that is, D C c. You can use this prior for
the mean parameters in the model:
.1 / / 1
.2 / / 1
In addition, Berger (1985) showed that a prior of the form 1= 2 is noninformative for the scale parameter,
and it is invariant under scale transformations (that is D c 2 ). You can use this prior for the variance
parameters in the model:
.12 / / 1=12
.22 / / 1=22
The log densities of the prior distributions on 12 and 22 are:
log..12 // D
log.12 /
log..22 // D
log.22 /
5638 F Chapter 74: The MCMC Procedure
The following statements generate posterior samples of 1 ; 2 ; 12 ; 22 , and the difference in the means:
1 2 :
proc mcmc data=behrens outpost=postout seed=123
nmc=40000 monitor=(_parms_ mudif)
statistics(alpha=0.01);
ods select PostSumInt;
parm mu1 0 mu2 0;
parm sig21 1;
parm sig22 1;
prior mu: ~ general(0);
prior sig21 ~ general(-log(sig21), lower=0);
prior sig22 ~ general(-log(sig22), lower=0);
mudif = mu1 - mu2;
if ind = 1 then do;
mu = mu1;
s2 = sig21;
end;
else do;
mu = mu2;
s2 = sig22;
end;
model y ~ normal(mu, var=s2);
run;
The PROC MCMC statement specifies an input data set (Behrens), an output data set containing the posterior
samples (Postout), a random number seed, and the simulation size. The MONITOR= option specifies a list
of symbols, which can be either parameters or functions of the parameters in the model, for which inference
is to be done. The symbol _parms_ is a shorthand for all model parameters—in this case, mu1, mu2, sig21,
and sig22. The symbol mudif is defined in the program as the difference between 1 and 2 .
The global suboption ALPHA=0.01 in the STATISTICS= option specifies 99% highest posterior density
(HPD) credible intervals for all parameters.
The ODS SELECT statement displays the summary statistics and interval statistics tables while excluding all
other output. For a complete list of ODS tables that PROC MCMC can produce, see the sections “Displayed
Output” on page 5769 and “ODS Table Names” on page 5773.
The PARMS statements assign the parameters mu1 and mu2 to the same block, and sig21 and sig22 each to
their own separate blocks. There are a total of three blocks. The PARMS statements also assign an initial
value to each parameter.
The PRIOR statements specify prior distributions for the parameters. Because the priors are all nonstandard
(uniform on the real axis for 1 and 2 and 1= 2 for 12 and 22 ), you must use the GENERAL function here.
The argument in the GENERAL function is an expression for the log of the distribution, up to an additive
constant. This distribution can have any functional form, as long as it is programmable using SAS functions
and expressions. The function specifies a distribution on the log scale, not on the original scale. The log of
the prior on mu1 and mu2 is 0, and the log of the priors on sig21 and sig22 are –log(sig21) and –log(sig22)
respectively. See the section “Specifying a New Distribution” on page 5715 for more information about how
to specify an arbitrary distribution. The LOWER= option indicates that both variance terms must be strictly
positive.
The Behrens-Fisher Problem F 5639
The MUDIF assignment statement calculates the difference between mu1 and mu2. The IF-ELSE statements
enable different y’s to have different mean and variance, depending on their group indicator ind. The MODEL
statement specifies the normal likelihood function for each observation in the model.
Figure 74.6 displays the posterior summary and interval statistics.
Figure 74.6 Posterior Summary and Interval Statistics
The Behrens-Fisher Problem
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
99%
HPD Interval
mu1
40000
134.8
6.0092
119.1
152.3
mu2
40000
121.4
1.9119
116.1
126.6
sig21
40000
685.0
255.3
260.0
1580.5
sig22
40000 51.1811
23.8675 14.2322
136.0
mudif
40000 13.3730
6.3095 -3.3609 30.7938
The mean difference has a posterior mean value of 13.37, and the lower endpoints of the 99% credible
intervals are negative. This suggests that the mean difference is positive with a high probability. However, if
you want to estimate the probability that 1 2 > 0, you can do so as follows.
The following statements produce Figure 74.7:
proc format;
value diffmt low-0 = 'mu1 - mu2 <= 0' 0<-high = 'mu1 - mu2 > 0';
run;
proc freq data = postout;
tables mudif /nocum;
format mudif diffmt.;
run;
The sample estimate of the posterior probability that 1 2 > 0 is 0.98. This example illustrates an
advantage of Bayesian analysis. You are not limited to making inferences based on model parameters only.
You can accurately quantify uncertainties with respect to any function of the parameters, and this allows for
flexibility and easy interpretations in answering many scientific questions.
Figure 74.7 Estimated Probability of 1
The Behrens-Fisher Problem
The FREQ Procedure
mudif Frequency Percent
mu1 - mu2 <= 0
753
1.88
mu1 - mu2 > 0
39247
98.12
2 > 0.
5640 F Chapter 74: The MCMC Procedure
Random-Effects Model
This example illustrates how you can fit a normal likelihood random-effects model in PROC MCMC.
PROC MCMC offers you the ability to model beyond the normal likelihood (see “Example 74.7: Logistic
Regression Random-Effects Model” on page 5812, “Example 74.8: Nonlinear Poisson Regression Multilevel
Random-Effects Model” on page 5814, and “Example 74.16: Piecewise Exponential Frailty Model” on
page 5859).
Consider a scenario in which data are collected in groups and you want to model group-specific effects. You
can use a random-effects model (sometimes also known as a variance-components model):
yij D ˇ0 C ˇ1 xij C i C eij ;
eij normal.0; 2 /
where i D 1; 2; : : : ; I is the group index and j D 1; 2; : : : ; ni indexes the observations in the ith group.
In the regression model, the fixed effects ˇ0 and ˇ1 are the intercept and the coefficient for variable xij ,
respectively. The random effect i is the mean for the ith group, and eij are the error term.
Consider the following SAS data set:
title 'Random-Effects Model';
data heights;
input Family G$ Height @@;
datalines;
1 F 67
1 F 66
1 F 64
1 M 71
2 F 63
2 F 67
2 M 69
2 M 68
3 M 64
4 F 67
4 F 66
4 M 67
;
1 M 72
2 M 70
4 M 67
2 F 63
3 F 63
4 M 69
The response variable Height measures the heights (in inches) of 18 individuals. The covariate x is the
gender (variable G), and the individuals are grouped according to Family (group index). Since the variable
G is a character variable and PROC MCMC does not support a CLASS statement, you need to create the
corresponding design matrix. In this example, the design matrix for a factor variable of level 2 (M and F) can
be constructed using the following statement:
data input;
set heights;
if g eq 'F' then gf = 1;
else gf = 0;
drop g;
run;
The data set variable gf is a numeric variable and can be used in the regression model in PROC MCMC.
In data sets with factor variables that have more levels, you can consider using PROC TRANSREG to
construct the design matrix. See the section “Create Design Matrix” on page 5728 for more information.
To model the data, you can assume that Height is normally distributed:
yij normal.ij ; 2 /;
ij D ˇ0 C ˇ1 gfij C i
The priors on the parameters ˇ0 , ˇ1 , i are also assumed to be normal:
ˇ0 ; ˇ1 normal.0; var D 1e5/
i
normal.0; var D 2 /
Random-Effects Model F 5641
Priors on the variance terms, 2 and 2 , are inverse-gamma:
2 ; 2 igamma.shape D 0:01; scale D 0:01/
The inverse-gamma distribution is a conjugate prior for the variance in the normal likelihood and the variance
in the prior distribution of the random effect.
The following statements fit a linear random-effects model to the data and produce the output shown in
Figure 74.9 and Figure 74.10:
ods graphics on;
proc mcmc data=input outpost=postout nmc=50000 seed=7893 plots=trace;
ods select Parameters REparameters PostSumInt tracepanel;
parms b0 0 b1 0 s2 1 s2g 1;
prior b: ~ normal(0, var = 10000);
prior s: ~ igamma(0.01, scale = 0.01);
random gamma ~ normal(0, var = s2g) subject=family monitor=(gamma);
mu = b0 + b1 * gf + gamma;
model height ~ normal(mu, var = s2);
run;
ods graphics off;
Some of the statements are very similar to those shown in the previous two examples. The ODS GRAPHICS
ON statement enables ODS Graphics. The PROC MCMC statement specifies the input and output data sets,
the simulation size, and a random number seed. The ODS SELECT statement displays the model parameter
and random-effects parameter information tables, summary statistics table, the interval statistics table, and
the trace plots.
The PARMS statement lumps all four model parameters in a single block. They are b0 (overall intercept), b1
(main effect for gf), s2 (variance of the likelihood function), and s2g (variance of the random effect). If a
random walk Metropolis sampler is the only applicable sampler for all parameters, then these four parameters
are updated in a single block. However, because PROC MCMC updates the parameters s2 and s2g via
conjugacy, these parameters are separated into individual blocks. (See the Block column in “Parameters”
table in Figure 74.8.)
The PRIOR statements specify priors for all the parameters. The notation b: is a shorthand for all symbols
that start with the letter ‘b’. In this example, b: includes b0 and b1. Similarly, s: stands for both s2 and s2g.
This shorthand notation can save you some typing, and it keeps your statements tidy.
The RANDOM statement specifies a single random effect to be gamma, and specifies that it has a normal
prior centered at 0 with variance s2g. The SUBJECT= argument in the RANDOM statement defines a
group index (family) in the model, where all observations from the same family should have the same group
indicator value. The MONITOR= option outputs analysis for all the random-effects parameters.
Finally, the MU assignment statement calculates the expected value of the height of the model. The calculation
includes the random-effects term gamma. The MODEL statement specifies the likelihood function for height.
The “Parameters” and “Random-Effects Parameters” tables, shown in Figure 74.8, contain information about
the model parameters and the four random-effects parameters.
5642 F Chapter 74: The MCMC Procedure
Figure 74.8 Model and Random-Effects Parameter Information
Random-Effects Model
The MCMC Procedure
Parameters
Sampling
Block Parameter Method
Initial
Value Prior Distribution
1 s2
Conjugate
1.0000 igamma(0.01, scale = 0.01)
2 s2g
Conjugate
1.0000 igamma(0.01, scale = 0.01)
3 b0
N-Metropolis
b1
0 normal(0, var = 10000)
0 normal(0, var = 10000)
Random Effect Parameters
Sampling
Parameter Method
gamma
Subject
N-Metropolis
Number of Subject Prior
Subjects Values Distribution
Family
4 1234
normal(0, var = s2g)
The posterior summary and interval statistics for the model parameters and the random-effects parameters
are shown in Figure 74.9.
Figure 74.9 Posterior Summary and Interval Statistics
Random-Effects Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
b0
50000 68.4687
1.2757 65.9159 71.1771
b1
50000 -3.5502
0.9762 -5.4269 -1.5257
s2
50000
4.1446
s2g
50000
4.9378
gamma_1 50000
0.9383
1.3255 -1.0078
4.1195
gamma_2 50000
1.9506
1.3768
7.9151
19.2469 0.00105 18.7219
0.0139
1.1956 -2.6746
2.4767
gamma_3 50000 -1.3470
1.6495 -4.7168
1.0744
gamma_4 50000
1.1971 -2.4108
2.7432
0.0966
Trace plots for all the parameters are shown in Figure 74.10. The mixing looks very reasonable, suggesting
convergence.
Random-Effects Model F 5643
Figure 74.10 Plots for b1 and Log of the Posterior Density
5644 F Chapter 74: The MCMC Procedure
Figure 74.10 continued
From the interval statistics table, you see that both the equal-tail and HPD intervals for ˇ0 are positive,
strongly indicating the positive effect of the parameter. On the other hand, both intervals for ˇ1 cover the
value zero, indicating that gf does not have a strong impact on predicting height in this model.
Syntax: MCMC Procedure F 5645
Syntax: MCMC Procedure
The following statements are available in the MCMC procedure. Items within < > are optional.
PROC MCMC < options > ;
ARRAY arrayname [ dimensions ] < $ > < variables-and-constants > ;
BEGINCNST/ENDCNST ;
BEGINNODATA/ENDNODATA ;
BY variables ;
MODEL variable Ï distribution < options > ;
PARMS parameter < = > number < / options > ;
PREDDIST < 'label' > OUTPRED=SAS-data-set < options > ;
PRIOR/HYPERPRIOR parameter Ï distribution ;
Programming statements ;
RANDOM random-effects-specification < / options > ;
UDS subroutine-name (subroutine-argument-list) ;
The PARMS statements declare parameters in the model and assign optional starting values for the Markov
chain. The PRIOR/HYPERPRIOR statements specify the prior distributions of the parameters. The MODEL
statements specify the log-likelihood functions for the response variables. These statements form the basis of
most Bayesian models.
In addition, you can use the ARRAY statement to define constant or parameter arrays, the BEGINCNST/ENDCNST and BEGINNODATA/ENDNODATA statements to omit unnecessary evaluation and reduce
simulation time, the PREDDIST statement to generate samples from the posterior predictive distribution,
the program statements to specify more complicated models that you want to fit, the RANDOM statement
to specify random effects and their prior distributions, and the UDS statement to define your own Gibbs
samplers to sample parameters in the model.
The following sections provide a description of each of these statements.
PROC MCMC Statement
PROC MCMC options ;
The PROC MCMC statement invokes the MCMC procedure. Table 74.1 summarizes the options available in
the PROC MCMC statement.
Table 74.1 PROC MCMC Statement Options
Option
Description
Basic options
DATA=
OUTPOST=
Names the input data set
Names the output data set for posterior samples of parameters
Debugging output
LIST
LISTCODE
TRACE
Displays the model program and variables
Displays the compiled model program
Displays detailed model execution messages
5646 F Chapter 74: The MCMC Procedure
Table 74.1 (continued)
Option
Description
Frequently used MCMC options
ALG=
Specifies the default sampling algorithm
MAXTUNE=
Specifies the maximum number of tuning loops
MINTUNE=
Specifies the minimum number of tuning loops
NBI=
Specifies the number of burn-in iterations
NMC=
Specifies the number of MCMC iterations, excluding the burn-in iterations
NTHREADS=
Specifies the number of threads to use
NTU=
Specifies the number of tuning iterations
PROPCOV=
Controls options for constructing the initial proposal covariance matrix
SEED=
Specifies the random seed for simulation
THIN=
Specifies the thinning rate
Less frequently used MCMC options
ACCEPTTOL=
Specifies a tolerance for acceptance probabilities
BINARYJOINT
Jointly samples a block of binary parameters
DISCRETE=
Controls the sampling of discrete parameters
INIT=
Controls the generation of initial values
MCHISTORY=
Displays the Markov chain sampling history
MAXINDEXPRINT=
Specifies the maximum number of observation indices to print in models
with missing data
MAXSUBVALUEPRINT= Specifies the maximum number of subject values to print in the “Random
Effects Parameters” table
REOBSINFO
Displays more detailed information about each random effect
SCALE=
Specifies the initial scale applied to the proposal distribution
TARGACCEPT=
Specifies the target acceptance rate for the random-walk sampler
TARGACCEPTI=
Specifies the target acceptance rate for the independence sampler
TUNEWT=
Specifies the weight used in covariance updating
Summary, diagnostics, and plotting options
AUTOCORLAG=
Specifies the number of autocorrelation lags used to compute effective
sample sizes and Monte Carlo errors
DIAGNOSTICS=
Controls the convergence diagnostics
DIC
Computes the deviance information criterion (DIC)
MONITOR=
Outputs the analysis for a list of symbols of interest
PLOTS=
Controls plotting
STATISTICS=
Controls posterior statistics
Other options
INF=
JOINTMODEL
MISSING=
NOLOGDIST
SIMREPORT=
SINGDEN=
Specifies the machine numerical limit for infinity
Specifies the joint log-likelihood function
Indicates how missing values are handled
Omits the calculation of the logarithm of the joint distribution of the
parameters
Controls the frequency of the report for the expected run time
Specifies the singularity tolerance
PROC MCMC Statement F 5647
These options are described in alphabetical order.
ACCEPTTOL=n
specifies a tolerance for acceptance probabilities. By default, ACCEPTTOL=0.075.
ALG=value
PROPDIST=value
specifies the default sampling algorithm for continuous parameters when more optimal algorithms,
such as conjugate samplers, are not available. For more information, see the sections “Hamiltonian
Monte Carlo Sampler” on page 134 and “Metropolis and Metropolis-Hastings Algorithms” on page 130
in Chapter 7, “Introduction to Bayesian Analysis Procedures.” By default, ALG=NORMAL, a normal
kernel based random-walk Metropolis.
You can specify the following values:
HMC< (hmc-options) >
specifies the Hamiltonian Monte Carlo algorithm with a fixed step size and predetermined number
of steps. You can specify the following hmc-options:
NSTEPS=value
N=value
specifies the number of steps in the HMC algorithm. By default, N=15.
SAVEGRAD
saves the gradient calculation in the OUTPOST= data set.
STEPSIZE=value
specifies the step size in the HMC algorithm. By default, STEPSIZE=0.1.
NORMAL
N
specifies a normal distribution as the proposal distribution in the random-walk Metropolis
algorithm. This is the default.
NUTS< (nuts-options) >
specifies the No-U-Turn Sampler of the Hamiltonian algorithm. You can specify the following
nuts-options:
DELTA=value
specifies the target acceptance rate during the tuning process. By default, DELTA=0.6.
Increasing the value can often improve mixing, but it can also significantly slow down the
sampling.
FCALLS
outputs the number of function evaluations at each iteration.
MAXHEIGHT=value
specifies the maximum height of the NUTS tree. The taller the tree, the more gradient
evaluations per iteration the procedure calculates. The number of evaluations is 2height .
By default, MAXHEIGHT=10. Usually, the height of a tree should be no more than 7 or
8 during the sampling stage, but it can go higher during the tuning stage. A larger number
5648 F Chapter 74: The MCMC Procedure
indicates that the algorithm is having difficulty converging. PROC MCMC stops when the
height of a NUTS tree surpasses MAXHEIGHT by the number of times specified in the
MAXTIME= option. You can increase the height of the tree and the MAXTIME value.
MAXTIME=value
specifies the maximum number of iterations that it takes the algorithm to surpass the
MAXHEIGHT of the NUTS tree before the procedure stops. By default, MAXTIME=1.
NTU=value
specifies the number of tuning iterations used by NUTS. By default, NTU=1000.
SAVEGRAD
saves the gradient calculation in the OUTPOST= data set.
T< (df ) >
specifies a t distribution with the degrees of freedom df in the random-walk Metropolis algorithm.
By default, df = 3. If df > 100, the normal distribution is used, because the two distributions are
almost identical.
AUTOCORLAG=n
ACLAG=n
specifies the maximum number of autocorrelation lags used in computing the effective sample size;
see the section “Effective Sample Size” on page 149 in Chapter 7, “Introduction to Bayesian Analysis
Procedures,” for more details. The value is used in the calculation of the Monte Carlo standard
error; see the section “Standard Error of the Mean Estimate” on page 150 in Chapter 7, “Introduction
to Bayesian Analysis Procedures.” By default, AUTOCORLAG=MIN(500, MCsample/4),
h where
i
NMC .
MCsample is the Markov chain sample size kept after thinning—that is, MCsample D NTHIN
If the value of the AUTOCORLAG= option is set too low, you might observe significant lags, and
the effective sample size cannot be calculated accurately. A warning message appears, and you can
increase either AUTOCORLAG= or NMC=, accordingly.
BINARYJOINT
jointly samples binary parameters in a block. Binary parameters in a block are sampled separately in
SAS/STAT 13.2 and later. This option reverts to the behavior in SAS/STAT 13.1 and earlier.
DISCRETE=keyword
specifies the proposal distribution used in sampling discrete parameters.
CRETE=BINNING.
By default, DIS-
You can specify the following keywords:
BINNING
uses continuous proposal distributions for all discrete parameter blocks. The proposed sample is
then discretized (binned) before further calculations. This sampling method approximates the
correlation structure among the discrete parameters in the block and could improve mixing in
some cases.
PROC MCMC Statement F 5649
GEO
uses independent symmetric geometric proposal distributions for all discrete parameter blocks.
This proposal does not take parameter correlations into account. However, it can work better than
the BINNING option in cases where the range of the parameters is relatively small and a normal
approximation can perform poorly.
DIAGNOSTICS=NONE | (keyword-list)
DIAG=NONE | (keyword-list)
specifies options for MCMC convergence diagnostics. By default, PROC MCMC computes the Geweke
test, sample autocorrelations, effective sample sizes, and Monte Carlo errors. The Raftery-Lewis and
Heidelberger-Welch tests are also available. See the section “Assessing Markov Chain Convergence” on
page 136 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details on convergence
diagnostics. You can request all of the diagnostic tests by specifying DIAGNOSTICS=ALL. You can
suppress all the tests by specifying DIAGNOSTICS=NONE.
You can use postprocessing autocall macros to calculate convergence diagnostics of the posterior
samples after PROC MCMC has exited. See the section “Autocall Macros for Postprocessing” on
page 5744.
The following options are available.
ALL
computes all diagnostic tests and statistics. You can combine the option ALL with any other
specific tests to modify test options. For example DIAGNOSTICS=(ALL AUTOCORR(LAGS=(1
5 35))) computes all tests with default settings and autocorrelations at lags 1, 5, and 35.
AUTOCORR < (autocorr-options) >
computes default autocorrelations at lags 1, 5, 10, and 50 for each variable. You can choose other
lags by using the following autocorr-options:
LAGS | AC=numeric-list
specifies autocorrelation lags. The numeric-list must take positive integer values.
ESS
computes the effective sample sizes (Kass et al. (1998)) of the posterior samples of each parameter.
It also computes the correlation time and the efficiency of the chain for each parameter. Small
values of ESS might indicate a lack of convergence. See the section “Effective Sample Size” on
page 149 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details.
GEWEKE < (Geweke-options) >
computes the Geweke spectral density diagnostics; this is a two-sample t-test between the first f1
portion and the last f2 portion of the chain. See the section “Geweke Diagnostics” on page 143
in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details. The default
is FRAC1=0.1 and FRAC2=0.5, but you can choose other fractions by using the following
Geweke-options:
FRAC1 | F1=value
specifies the beginning FRAC1 proportion of the Markov chain. By default, FRAC1=0.1.
5650 F Chapter 74: The MCMC Procedure
FRAC2 | F2=value
specifies the end FRAC2 proportion of the Markov chain. By default, FRAC2=0.5.
HEIDELBERGER | HEIDEL < (Heidel-options) >
computes the Heidelberger and Welch diagnostic (which consists of a stationarity test and a
halfwidth test) for each variable. The stationary diagnostic test tests the null hypothesis that
the posterior samples are generated from a stationary process. If the stationarity test is passed,
a halfwidth test is then carried out. See the section “Heidelberger and Welch Diagnostics” on
page 145 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details.
These diagnostics are not performed by default.
You can specify the DIAGNOSTICS=HEIDELBERGER option to request these diagnostics, and you can also specify
suboptions, such as DIAGNOSTICS=HEIDELBERGER(EPS=0.05), as follows:
SALPHA=value
specifies the ˛ level .0 < ˛ < 1/ for the stationarity test. By default, SALPHA=0.05.
HALPHA=value
specifies the ˛ level .0 < ˛ < 1/ for the halfwidth test. By default, HALPHA=0.05.
EPS=value
specifies a small positive number such that if the halfwidth is less than times the sample
mean of the retaining iterates, the halfwidth test is passed. By default, EPS=0.1.
MCSE
MCERROR
computes the Monte Carlo standard error for the posterior samples of each parameter.
NONE
suppresses all of the diagnostic tests and statistics. This is not recommended.
RAFTERY | RL < (Raftery-options) >
computes the Raftery and Lewis diagnostics, which evaluate the accuracy of the estimated
quantile (OQ for a given Q 2 .0; 1/) of a chain. OQ can achieve any degree of accuracy when
the chain is allowed to run for a long time. The algorithm stops when the estimated probability
POQ D Pr. OQ / reaches within ˙R of the value Q with probability S; that is, Pr.Q R PO Q Q C R/ D S. See the section “Raftery and Lewis Diagnostics” on page 146 in Chapter 7,
“Introduction to Bayesian Analysis Procedures,” for more details. The Raftery-options enable
you to specify Q, R, S, and a precision level for a stationary test.
These diagnostics are not performed by default. You can specify the DIAGNOSTICS=RAFERTY
option to request these diagnostics, and you can also specify suboptions, such as DIAGNOSTICS=RAFERTY(QUANTILE=0.05), as follows:
QUANTILE | Q=value
specifies the order (a value between 0 and 1) of the quantile of interest. By default, QUANTILE=0.025.
ACCURACY | R=value
specifies a small positive number as the margin of error for measuring the accuracy of
estimation of the quantile. By default, ACCURACY=0.005.
PROC MCMC Statement F 5651
PROB | S=value
specifies the probability of attaining the accuracy of the estimation of the quantile. By
default, PROB=0.95.
EPS=value
specifies the tolerance level (a small positive number) for the stationary test. By default,
EPS=0.001.
DIC
computes the Deviance Information Criterion (DIC). DIC is calculated using the posterior mean
estimates of the parameters. See the section “Deviance Information Criterion (DIC)” on page 152 in
Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details.
DATA=SAS-data-set
specifies the input data set. Observations in this data set are used to compute the log-likelihood function
that you specify with PROC MCMC statements.
INF=value
specifies the numerical definition of infinity in PROC MCMC. The default is INF=1E15. For example,
PROC MCMC considers 1E16 to be outside of the support of the normal distribution and assigns a
missing value to the log density evaluation. You can select a larger value with the INF= option. The
minimum value allowed is 1E10.
INIT=(keyword-list)
specifies options for generating the initial values for the parameters. These options apply only to
prior distributions that are recognized by PROC MCMC. See the section “Standard Distributions” on
page 5700 for a list of these distributions. If either of the functions GENERAL or DGENERAL is used,
you must supply explicit initial values for the parameters. By default, INIT=MODE. The following
keywords are used:
MODE
uses the mode of the prior density as the initial value of the parameter, if you did not provide one.
If the mode does not exist or if it is on the boundary of the support of the density, the mean value
is used. If the mean is outside of the support or on the boundary, which can happen if the prior
distribution is truncated, a random number drawn from the prior is used as the initial value.
PINIT
tabulates parameter values after the tuning phase. This option also tabulates the tuned proposal
parameters used by the Metropolis algorithm. These proposal parameters include covariance
matrices for continuous parameters and probability vectors for discrete parameters for each block.
By default, PROC MCMC does not display the initial values or the tuned proposal parameters
after the tuning phase.
RANDOM
generates a random number from the prior density and uses it as the initial value of the parameter,
if you did not provide one.
REINIT
resets the parameters, after the tuning phase, with the initial values that you provided explicitly
or that were assigned by PROC MCMC. By default, PROC MCMC does not reset the parameters
5652 F Chapter 74: The MCMC Procedure
because the tuning phase usually moves the Markov chains to a more favorable place in the
posterior distribution.
LIST
displays the model program and variable lists. The LIST option is a debugging feature and is not
normally needed.
LISTCODE
displays the compiled program code. The LISTCODE option is a debugging feature and is not normally
needed.
JOINTMODEL
JOINTLLIKE
specifies how the likelihood function is calculated. By default, PROC MCMC assumes that the
observations in the data set are independent so that the joint log-likelihood function is the sum of the
individual log-likelihood functions for the observations, where the individual log-likelihood function
is specified in the MODEL statement. When your data are not independent, you can specify the
JOINTMODEL option to modify the way that PROC MCMC computes the joint log-likelihood
function. In this situation, PROC MCMC no longer steps through the input data set to sum the
individual log likelihood.
To use this option correctly, you need to do the following two things:
create ARRAY symbols to store all data set variables that are used in the program. This can be
accomplished with the BEGINCNST and ENDCNST statements.
program the joint log-likelihood function by using these ARRAY symbols only. The MODEL
statement specifies the joint log-likelihood function for the entire data set. Typically, you use the
function GENERAL in the MODEL statement.
See the sections “BEGINCNST/ENDCNST Statement” on page 5663 and “Modeling Joint Likelihood”
on page 5729 for details.
MAXTUNE=n
specifies an upper limit for the number of proposal tuning loops. By default, MAXTUNE=24. See the
section “Covariance Tuning” on page 5695 for more details.
MAXINDEXPRINT=number | ALL
MAXIPRINT=number | ALL
specifies the maximum number of observation indices to print in the ODS tables “Missing Response
Information” table and “Missing Covariates Information” table. This option applies only to programs
that model missing data. The default value is 20. MAXINDEXPRINT=ALL prints all observation
indices for every missing variable that is modeled in PROC MCMC.
MAXSUBVALUEPRINT=number | ALL
MAXSVPRINT=number | ALL
specifies the maximum number of subject values to display in the “Subject Values” column of the
ODS table “Random Effects Parameters.” This option applies only to programs that have RANDOM
statements. The default value is 20. MAXSUBVALUEPRINT=ALL prints all subject values for every
random effect in the program.
PROC MCMC Statement F 5653
MCHISTORY=keyword
MCHIST=keyword
controls the display of the Markov chain sampling history.
BRIEF
produces a summary output for the tuning, burn-in, and sampling history tables. The tables show
the following when applicable:
“RWM Scale” shows the scale, or the range of the scales, used in each random walk
Metropolis block that is normal or is based on a t distribution.
“Probability” shows the proposal probability parameter, or the range of the parameters, used
in each random walk Metropolis block that is based on a geometric distribution.
“RWM Acceptance Rate” shows the acceptance rate, or the range of the acceptance rates,
for each random walk Metropolis block.
“IM Acceptance Rate” shows the acceptance rate, or the range of the acceptance rates, for
each independent Metropolis block.
DETAILED
produces detailed output of the tuning, burn-in, and sampling history tables, including scale
values, acceptance probabilities, blocking information, and so on. Use this option with caution,
especially in random-effects models that have a large number of random-effects groups. This
option can produce copious output.
NONE
produces none of the tuning history, burn-in history, and sampling history tables.
The default is MCHISTORY=NONE.
MINTUNE=n
specifies a lower limit for the number of proposal tuning loops. By default, MINTUNE=2. See the
section “Covariance Tuning” on page 5695 for more details.
MISSING=keyword
MISS=keyword
specifies how missing values are handled (see the section “Handling of Missing Data” on page 5753 for
more details). By default, PROC MCMC models missing response variables and discard observations
with missing covariates.
ALLCASE | AC
gives you the option to model the missing values in an all-case analysis directly. PROC MCMC
does not attempt to model the missing values. You can use any techniques that you see fit, for
example, fully Bayesian or multiple imputation.
COMPLETECASE | CC
assumes a complete case analysis, so all observations with missing variable values are discarded
prior to the simulation.
5654 F Chapter 74: The MCMC Procedure
MONITOR= (symbol-list)
outputs analysis for selected symbols of interest in the program. The symbols can be any of the
following: model parameters (symbols in the PARMS statement), secondary parameters (assigned
using the operator “=”), the log of the posterior density (LOGPOST), the log of the prior density
(LOGPRIOR), the log of the hyperprior density (LOGHYPER) if the HYPER statement is used, or the
log of the likelihood function (LOGLIKE). You can use the keyword _PARMS_ as a shorthand for all
of the model parameters. PROC MCMC performs only posterior analyses (such as plotting, diagnostics,
and summaries) on the symbols selected with the MONITOR= option. You can also choose to monitor
an entire array by specifying the name of the array. By default MONITOR=_PARMS_.
Posterior samples of any secondary parameters listed in the MONITOR= option are saved in the
OUTPOST= data set. Posterior samples of model parameters are always saved to the OUTPOST= data
set, regardless of whether they appear in the MONITOR= option.
NBI=n
specifies the number of burn-in iterations to perform before beginning to save parameter estimate
chains. By default, NBI=1000. See the section “Burn-In, Thinning, and Markov Chain Samples” on
page 135 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for more details.
NMC=n
specifies the number of iterations in the main simulation loop. This is the MCMC sample size if
THIN=1. By default, NMC=1000.
NOLOGDIST
omits the calculation of the logarithm of the joint distribution of the model parameters at each iteration.
The option applies only if all parameters in the model are updated directly from their target distribution,
either from the full conditional posterior via conjugacy or from the marginal distribution. Such
algorithms do not require the calculation of the joint posterior distribution; hence PROC MCMC runs
faster by avoiding these unnecessary calculations. As a result, the OUTPOST= data set does not
contain the LOGPRIOR, LOGLIKE, and LOGPOST variables.
NTHREADS=n
specifies the number of threads for simulation. PROC MCMC performs two types of threading. In
sampling model parameters, PROC MCMC allocates data to different threads and calculates the
objective function by accumulating values from each thread; in sampling of random-effects parameters
and missing data variables, each thread generates a subset of these parameters simultaneously at each
iteration. Most sampling algorithms are threaded. NTHREADS=–1 sets the number of available
threads to the number of hyperthreaded cores available on the system. By default, NTHREADS=1.
NTU=n
specifies the number of iterations to use in each proposal tuning phase. By default, NTU=500.
OUTPOST=SAS-data-set
specifies an output data set that contains the posterior samples of all model parameters, the iteration
numbers (variable name ITERATION), the log of the posterior density (LOGPOST), the log of the
prior density (LOGPRIOR), the log of the hyperprior density (LOGHYPER), if the HYPER statement
is used, and the log likelihood (LOGLIKE). Any secondary parameters (assigned using the operator
“=”) listed in the MONITOR= option are saved to this data set. By default, no OUTPOST= data set is
created.
PROC MCMC Statement F 5655
PLOTS< (global-plot-options) >= (plot-request < . . . plot-request >)
PLOT< (global-plot-options) >= (plot-request < . . . plot-request >)
controls the display of diagnostic plots. Three types of plots can be requested: trace plots, autocorrelation function plots, and kernel density plots. By default, the plots are displayed in panels unless the
global plot option UNPACK is specified. Also when more than one type of plot is specified, the plots
are grouped by parameter unless the global plot option GROUPBY=TYPE is specified. When you
specify only one plot request, you can omit the parentheses around the plot-request, as shown in the
following example:
plots=none
plots(unpack)=trace
plots=(trace density)
ODS Graphics must be enabled before plots can be requested. For example:
ods graphics on;
proc mcmc data=exi seed=7 outpost=p1 plots=all;
parm mu;
prior mu ~ normal(0, sd=10);
model y ~ normal(mu, sd=1);
run;
ods graphics off;
For more information about enabling and disabling ODS Graphics, see the section “Enabling and
Disabling ODS Graphics” on page 607 in Chapter 21, “Statistical Graphics Using ODS.”
If ODS Graphics is enabled but you do not specify the PLOTS= option, then PROC MCMC produces,
for each parameter, a panel that contains the trace plot, the autocorrelation function plot, and the density
plot. This is equivalent to specifying PLOTS=(TRACE AUTOCORR DENSITY).
The global-plot-options include the following:
FRINGE
adds a fringe plot to the horizontal axis of the density plot.
GROUPBY|GROUP=PARAMETER | TYPE
specifies how the plots are grouped when there is more than one type of plot.
GROUPBY=PARAMETER is the default. The choices are as follows:
TYPE
specifies that the plots are grouped by type.
PARAMETER
specifies that the plots are grouped by parameter.
LAGS=n
specifies the number of autocorrelation lags used in plotting the ACF graph. By default,
LAGS=50.
5656 F Chapter 74: The MCMC Procedure
SMOOTH
smooths the trace plot with a fitted penalized B-spline curve (Eilers and Marx 1996).
UNPACKPANEL
UNPACK
specifies that all paneled plots are to be unpacked, so that each plot in a panel is displayed
separately.
The plot-requests are as follows:
ALL
requests all types of plots. PLOTS=ALL is equivalent to specifying PLOTS=(TRACE AUTOCORR DENSITY).
AUTOCORR | ACF
displays the autocorrelation function plots for the parameters.
DENSITY | D | KERNEL | K
displays the kernel density plots for the parameters.
NONE
suppresses the display of all plots.
TRACE | T
displays the trace plots for the parameters.
Consider a model with four parameters, X1–X4. Displays for various specifications are depicted as
follows.
PLOTS=(TRACE AUTOCORR) displays the trace and autocorrelation plots for each parameter
side by side with two parameters per panel:
Display 1
Trace(X1)
Trace(X2)
Autocorr(X1)
Autocorr(X2)
Display 2
Trace(X3)
Trace(X4)
Autocorr(X3)
Autocorr(X4)
PLOTS(GROUPBY=TYPE)=(TRACE AUTOCORR) displays all the paneled trace plots, followed by panels of autocorrelation plots:
Display 1
Trace(X1)
Trace(X2)
Display 2
Trace(X3)
Trace(X4)
Display 3
Autocorr(X1)
Autocorr(X3)
Autocorr(X2)
Autocorr(X4)
PLOTS(UNPACK)=(TRACE AUTOCORR) displays a separate trace plot and a separate correlation plot, parameter by parameter:
PROC MCMC Statement F 5657
Display 1
Trace(X1)
Display 2
Autocorr(X1)
Display 3
Trace(X2)
Display 4
Autocorr(X2)
Display 5
Trace(X3)
Display 6
Autocorr(X3)
Display 7
Trace(X4)
Display 8
Autocorr(X4)
PLOTS(UNPACK GROUPBY=TYPE)=(TRACE AUTOCORR) displays all the separate trace
plots followed by the separate autocorrelation plots:
Display 1
Trace(X1)
Display 2
Trace(X2)
Display 3
Trace(X3)
Display 4
Trace(X4)
Display 5
Autocorr(X1)
Display 6
Autocorr(X2)
Display 7
Autocorr(X3)
Display 8
Autocorr(X4)
PROPCOV=value
specifies the method used in constructing the initial covariance matrix for the Metropolis-Hastings
algorithm. The QUANEW and NMSIMP methods find numerically approximated covariance matrices
at the optimum of the posterior density function with respect to all continuous parameters. The
optimization does not apply to discrete parameters. The tuning phase starts at the optimized values; in
some problems, this can greatly increase convergence performance. If the approximated covariance
matrix is not positive definite, then an identity matrix is used instead. Valid values are as follows:
IND
uses the identity covariance matrix. This is the default. See the section “Tuning the Proposal
Distribution” on page 5694.
5658 F Chapter 74: The MCMC Procedure
CONGRA< (optimize-options) >
performs a conjugate-gradient optimization.
DBLDOG< (optimize-options) >
performs a double-dogleg optimization.
QUANEW< (optimize-options) >
performs a quasi-Newton optimization.
NMSIMP | SIMPLEX< (optimize-options) >
performs a Nelder-Mead simplex optimization.
The optimize-options are as follows:
ITPRINT
prints optimization iteration steps and results.
REOBSINFO < (display-options) >
displays the ODS table “Random Effect Observation Information.” The table lists the name of each
random effect, the unique values in the corresponding subject variable, the number of observations in
each subject, and the observation indices for each subject value.
To understand how this option works, consider the following statements:
data input;
array names{*} $ n1-n10 ("John" "Mary" "Chris" "Rob" "Greg"
"Jen" "Henry" "Alice" "James" "Toby");
call streaminit(17);
do i = 1 to 20;
j = ceil(rand("uniform") * 10 );
index = names[j];
output;
end;
drop n: j;
run;
proc print data=input;
run;
The input data set (Figure 74.11) contains the index variable, which indicates subjects in a hypothetical
random-effects model.
PROC MCMC Statement F 5659
Figure 74.11 Subject Variable in an Input Data Set
Obs
i index
1
1 Mary
2
2 James
3
3 Mary
4
4 Greg
5
5 Chris
6
6 James
7
7 James
8
8 Chris
9
9 James
10 10 James
11 11 Chris
12 12 Rob
13 13 Rob
14 14 Greg
15 15 Greg
16 16 Alice
17 17 Jen
18 18 Alice
19 19 John
20 20 Chris
The following statements illustrate the use of the REOBSINFO option:
ods select reobsinfo;
proc mcmc data=input reobsinfo stats=none diag=none;
random u ~ normal(0, sd=1) subject=index;
model general(0);
run;
Figure 74.12 displays the “Random Effect Observation Information” table. The table contains the
name of the random-effect parameter (u), the values of the subject variable index, the total number of
observations, and the row index of these observations in each of the subject values.
Figure 74.12 Random Effect Observation Information
The MCMC Procedure
Random Effect Observation Information
Number of
Subject Observations Observation
Parameter Values
in Subject Indices
u
Mary
2 13
James
5 2 6 7 9 10
Greg
3 4 14 15
Chris
4 5 8 11 20
Rob
2 12 13
Alice
2 16 18
Jen
1 17
John
1 19
5660 F Chapter 74: The MCMC Procedure
The display-options are as follows:
MAXVALUEPRINT=number | ALL
MAXVPRINT=number | ALL
prints the number of subject values for each random effect (that is, the number of rows that are
displayed in the “Random Effect Observation Information” table for each random effect). The
default value is 20. MAXVALUEPRINT=ALL displays all subject values.
MAXOBSPRINT=number | ALL
MAXOPRINT=number | ALL
prints the number of observation indices for each subject value of every random effect (that is,
the maximum number of indices that are displayed in the “Observation Indices” column in the
“Random Effect Observation Information” table). The default value is 20. MAXOBSPRINT=ALL
displays indices for every subject value.
SCALE=value
controls the initial multiplicative scale to the covariance matrix of the proposal distribution. By default,
SCALE=2.38. See the section “Scale Tuning” on page 5694 for more details.
SEED=n
specifies the random number seed. By default, SEED=0, and PROC MCMC gets a random number
seed from the clock.
SIMREPORT=n
controls the number of times that PROC MCMC reports the expected run time of the simulation.
This can be useful for monitoring the progress of CPU-intensive programs. For example, with
SIMREPORT=2, PROC MCMC reports the simulation progress twice. By default, SIMREPORT=0,
and there is no reporting. The expected run times are displayed in the log file.
SINGDEN=value
defines the singularity criterion in PROC MCMC. By default, SINGDEN=1E-11. The value indicates
the exclusion of an endpoint in an interval. The mathematical notation “.0” is equivalent to “Œvalue” in
PROC MCMC—that is, x < 0 is treated as x value in PROC MCMC. The maximum SINGDEN
allowed is 1E-6.
STATISTICS< (global-options) > = NONE | ALL |stats-request
STATS< (global-options) > = NONE | ALL |stats-request
specifies options for posterior statistics. By default, PROC MCMC computes the posterior mean,
standard deviation, quantiles, and two 95% credible intervals: equal-tail and highest posterior density
(HPD). Other available statistics include the posterior correlation and covariance. See the section
“Summary Statistics” on page 150 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for
more details. You can request all of the posterior statistics by specifying STATS=ALL. You can
suppress all the calculations by specifying STATS=NONE.
You can use postprocessing autocall macros to calculate posterior summary statistics of the posterior
samples after PROC MCMC has exited. See the section “Autocall Macros for Postprocessing” on
page 5744.
You can specify the following global-options to display interval and percentile estimates:
PROC MCMC Statement F 5661
ALPHA=numeric-list
specifies the ˛ level for the equal-tail and HPD intervals. The value ˛ must be between 0 and 0.5.
By default, ALPHA=0.05.
PERCENTAGE | PERCENT=numeric-list
calculates the posterior percentages. The numeric-list contains values between 0 and 100. By
default, PERCENTAGE=(25 50 75).
You can specify the following stats-requests:
ALL
computes all posterior statistics. You can combine the option ALL with any other options. For
example STATS(ALPHA=(0.02 0.05 0.1))=ALL computes all statistics with the default settings
and intervals at ˛ levels of 0.02, 0.05, and 0.1.
BRIEF
computes the posterior means, standard deviations, and the 100.1
each variable.
˛/% equal-tail intervals for
CORR
computes the posterior correlation matrix.
COV
computes the posterior covariance matrix.
INTERVAL
INT
computes the 100.1 ˛/% equal-tail and HPD credible intervals for each variable. For more
information, see the sections “Equal-Tail Credible Interval” on page 151 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” and “Highest Posterior Density (HPD) Interval” on
page 151 in Chapter 7, “Introduction to Bayesian Analysis Procedures.” By default, ˛ D 0:05,
but you can use the ALPHA= global-option to request other intervals of any probabilities.
NONE
suppresses all of the statistics.
SUMMARY
SUM
computes the posterior means, standard deviations, and percentile points for each variable. By
default, the 25th, 50th, and 75th percentile points are produced, but you can use the global
PERCENT= option to request specific percentile points.
TARGACCEPT=value
specifies the target acceptance rate for the random walk Metropolis algorithm. For more information,
see the section “Metropolis and Metropolis-Hastings Algorithms” on page 130 in Chapter 7, “Introduction to Bayesian Analysis Procedures.” The numeric value must be between 0.01 and 0.99. By
default, TARGACCEPT=0.45 for models that have one parameter; TARGACCEPT=0.35 for models
that have two, three, or four parameters; and TARGACCEPT=0.234 for models that have more than
four parameters (Roberts, Gelman, and Gilks 1997; Roberts and Rosenthal 2001).
5662 F Chapter 74: The MCMC Procedure
TARGACCEPTI=value
specifies the target acceptance rate for the independence sampler algorithm. The independence sampler
is used for blocks of binary parameters. For more information, see the section “Independence Sampler”
on page 133 in Chapter 7, “Introduction to Bayesian Analysis Procedures.” The numeric value must be
between 0 and 1. By default, TARGACCEPTI=0.6.
THIN=n
NTHIN=n
controls the thinning rate of the simulation. PROC MCMC keeps every nth simulation sample and
discards the rest. All the posterior statistics and diagnostics are calculated using the thinned samples.
By default, THIN=1. For more information, see the section “Burn-In, Thinning, and Markov Chain
Samples” on page 135 in Chapter 7, “Introduction to Bayesian Analysis Procedures.”
TRACE
displays the result of each operation in each statement in the model program as it is executed. This
debugging option is very rarely needed, and it produces voluminous output. If you use this option, also
specify small numbers in the NMC=, NBI=, MAXTUNE=, and NTU= options.
TUNEWT=value
specifies the multiplicative weight used in updating the covariance matrix of the proposal distribution.
The numeric value must be between 0 and 1. By default, TUNEWT=0.75. For more information, see
the section “Covariance Tuning” on page 5695.
ARRAY Statement
ARRAY arrayname [ dimensions ] < $ > < variables-and-constants > ;
The ARRAY statement associates a name (of no more than eight characters) with a list of variables and
constants. The ARRAY statement is similar to, but not the same as, the ARRAY statement in the DATA step,
and it is the same as the ARRAY statements in the NLIN, NLP, NLMIXED, and MODEL procedures. The
array name is used with subscripts in the program to refer to the array elements, as illustrated in the following
statements:
array r[8] r1-r8;
do i = 1 to 8;
r[i] = 0;
end;
The ARRAY statement does not support all the features of the ARRAY statement in the DATA step. Implicit
indexing of variables cannot be used; all array references must have explicit subscript expressions. Only exact
array dimensions are allowed; lower-bound specifications are not supported. A maximum of six dimensions
is allowed.
Both variables and constants can be array elements. Constant array elements cannot have values assigned
to them while variables can. Both the dimension specification and the list of elements are optional, but at
least one must be specified. When the list of elements is not specified or fewer elements than the size of the
array are listed, array variables are created by appending element numbers to the array name to complete the
element list. You can index array elements by enclosing a subscript in braces .f g/ or brackets .Œ /, but not in
parentheses .. //. The parentheses are reserved for function calls only.
BEGINCNST/ENDCNST Statement F 5663
For example, the following statement names an array day:
array day[365];
By default, the variables names are day1 to day365. However, since day is a SAS function, any subscript
that uses parentheses gives you the wrong results. The expression day(4) returns the value 5 and does not
reference the array element day4.
BEGINCNST/ENDCNST Statement
BEGINCNST ;
ENDCNST ;
The BEGINCNST and ENDCNST statements define a block within which PROC MCMC processes the
programming statements only during the setup stage of the simulation. You can use the BEGINCNST and
ENDCNST statements to define constants or import data set variables into arrays. Storing data in arrays
enables you to work with data that are not identically distributed (see the section “Modeling Joint Likelihood”
on page 5729) or to implement your own Markov chain sampler (see the section “UDS Statement” on
page 5687). You can also use the BEGINCNST and ENDCNST statements to assign initial values to the
parameters (see the section “Assignments of Parameters” on page 5699).
Assign Constants
Whenever you have programming statements that calculate constants that do not need to be evaluated multiple
times throughout the simulation, you should put them within the BEGINCNST and ENDCNST statements.
Using these statements can reduce redundant processing. For example, you can assign a constant to a symbol
or fill in an array with numbers:
array cnst[17];
begincnst;
offset = 17;
do i = 1 to 17;
cnst[i] = i * i;
end;
endcnst;
During the setup process, PROC MCMC evaluates the programming statements within the BEGINCNST/ENDCNST once for each observation in the data set and ignores the statements in the rest of the
simulation.
READ_ARRAY Function
Sometimes you might need to store variables, either from the current input data set or from a different data
set, in arrays and use these arrays to specify your model. The READ_ARRAY function is convenient for that
purpose.
The following two forms of the READ_ARRAY function are available:
rc = READ_ARRAY (data-set, array ) ;
rc = READ_ARRAY (data-set, array < , "col-name1" > < , "col-name2" > < , . . . >) ;
5664 F Chapter 74: The MCMC Procedure
where
rc returns 0 if the function is able to successfully read the data set.
data-set specifies the name of the data set from which the array data is read. The value specified for
data-set must be a character literal or a variable that contains the member name (libname.memname)
of the data set to be read from.
array specifies the PROC MCMC array variable into which the data is read. The value specified for
array must be a local temporary array variable because the function might need to grow or shrink its
size to accommodate the size of the data set.
col-name specifies optional names for the specific columns of the data set that are read. If specified,
col-name must be a literal string enclosed in quotation marks. In addition, col-name cannot be a PROC
MCMC variable. If column names are not specified, PROC MCMC reads all of the columns in the
data set.
When SAS translates between an array and a data set, the array is indexed as [row,column].
The READ_ARRAY function attempts to dynamically resize the array to match the dimensions of the input
data set. Therefore, the array must be dynamic; that is, the array must be declared with the /NOSYMBOLS
option.
For examples that use the READ_ARRAY function, see “Modeling Joint Likelihood” on page 5729, “Example 74.14: Time Independent Cox Model” on page 5847, and “Example 74.19: Implement a New Sampling
Algorithm” on page 5873.
BEGINNODATA/ENDNODATA Statements
BEGINNODATA ;
ENDNODATA ;
BEGINPRIOR ;
ENDPRIOR ;
The BEGINNODATA and ENDNODATA statements define a block within which PROC MCMC processes
the programming statements without stepping through the entire data set. The programming statements are
executed only twice: at the first and the last observation of the data set. The BEGINNODATA and ENDNODATA statements are best used to reduce unnecessary observation-level computations. Any computations
that are identical to every observation, such as transformation of parameters, should be enclosed in these
statements.
At the first observation, PROC MCMC executes all programming statements, including those that are enclosed
by these two statements. This enables a quick update of all the symbols enclosed by the BEGINNODATA
and ENDNODATA statements. The goal is to ensure that subsequent statements (for example, the MODEL
statement) use symbol values that have been calculated correctly. At the last observation, PROC MCMC
executes the enclosed programming statements again and adds the log of the prior density to the log of the
posterior density.
BY Statement F 5665
The BEGINPRIOR and ENDPRIOR statements are aliases for the BEGINNODATA and ENDNODATA
statements, respectively. You can enclose PRIOR statements in the BEGINNODATA and ENDNODATA
statements.
BY Statement
BY variables ;
You can specify a BY statement with PROC MCMC to obtain separate analyses of observations in groups
that are defined by the BY variables. When a BY statement appears, the procedure expects the input data
set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one
specified is used.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data by using the SORT procedure with a similar BY statement.
Specify the NOTSORTED or DESCENDING option in the BY statement for the MCMC procedure.
The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged
in groups (according to values of the BY variables) and that these groups are not necessarily in
alphabetical or increasing numeric order.
Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).
For more information about BY-group processing, see the discussion in SAS Language Reference: Concepts.
For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
MODEL Statement
MODEL dependent-variable-list Ï distribution < options > ;
The MODEL statement specifies the conditional distribution of the data given the parameters (the likelihood
function). You specify a single dependent variable or a list of dependent variables, a tilde Ï, and then a
distribution with its arguments. The dependent variables can be variables from the input data set or functions
of the symbols in the program. You must specify the dependent variables unless you use the GENERAL
function or the DGENERAL function (see the section “Specifying a New Distribution” on page 5715 for
more details).
The MODEL statement assumes that the observations are independent of each other, conditional on the
model parameters. If you want to model dependent data—that is, f .yi j; yj / for j ¤ i —you can use the
JOINTMODEL option in the PROC MCMC statement. See the section “Modeling Joint Likelihood” on
page 5729 for more details. By default, the log-likelihood value is the sum of the individual log-likelihood
value for each observation.
You can specify multiple MODEL statements. You can define likelihood functions that are independent of
each other. For example, in the following statements, the dependent variables y1 and y2 are independent of
each other:
5666 F Chapter 74: The MCMC Procedure
model y1 ~ normal(alpha, var=s21);
model y2 ~ normal(beta, var=s22);
Alternatively, you can use marginal and conditional distributions to define a joint log-likelihood function
for multiple dependent variables. For example, the following statements jointly define a distribution over
.y1; y2/. They specify a marginal distribution for the dependent variable y1 and a conditional distribution for
the dependent variable y2:
model y1 ~ normal(alpha, var=s21);
model y2 ~ normal(beta * y1, var=s22);
Every program must have at least one MODEL statement. If you want to run a Monte Carlo simulation that
does not require a response variable, use the GENERAL function in the MODEL statement:
model general(0);
PROC MCMC interprets the statement as a flat likelihood function with a constant log-likelihood value of 0.
PROC MCMC is a programming language that is similar to the DATA step, and the order of statement
evaluation is important. For example, the MODEL statement must come after any SAS programming
statements that define or modify arguments used in the construction of the log likelihood. In PROC MCMC, a
symbol can be defined multiple times and used at different places. Using an expression out of order produces
erroneous results that can also be hard to detect.
Do not embed the MODEL statement within programming statements. For example, suppose you have
three response variables, y1, y2, and y3, and want to model each with a normal distribution. The following
statements lead to erroneous output:
array Y[3] y1 y2 y3;
do i = 1 to 3;
model y[i] ~ normal(mu, sd=s);
end;
Instead, you should do one of the following.
Use separate MODEL statements:
model y1 ~ normal(mu, sd=s);
model y2 ~ normal(mu, sd=s);
model y3 ~ normal(mu, sd=s);
Use the GENERAL function to construct a joint distribution of the three dependent variables and use a
single MODEL statement to specify the log-likelihood function:
llike = logpdf("normal", y1, mu, s) +
logpdf("normal", y2, mu, s) +
logpdf("normal", y3, mu, s);
model y1 y2 y3 ~ general(llike);
See the section “Specifying a New Distribution” on page 5715 for more information about how to use
the GENERAL function to specify an arbitrary distribution.
MODEL Statement F 5667
Missing data are allowed in the response variables; the MODEL statement augments missing data automatically. (In releases before SAS/STAT 12.1, observations with missing values were discarded prior to analysis
and PROC MCMC did not attempt to model these values.) In each iteration, PROC MCMC samples missing
values from their posterior distributions and incorporates them as part of the simulation. PROC MCMC
creates one variable for each missing response value. There are two ways to create the missing value variable
names; see the NAMESUFFIX= option for the naming convention of the variables.
Distributions in MODEL Statement
Standard distributions that the MODEL statement supports are listed in the Table 74.2 (univariate) and
Table 74.3 (multivariate). See the section “Standard Distributions” on page 5700 for density specifications.
You can also specify all distributions except the multinomial distribution in the PRIOR and HYPERPRIOR
statements. The RANDOM statement supports only a subset of the distributions (see Table 74.4).
PROC MCMC allows some distributions to be parameterized in multiple ways. For example, you can specify
a normal distribution with a variance, standard deviation, or precision parameter. For distributions that
have different parameterizations, you must specify an option to clearly name the ambiguous parameter. For
example, in the normal distribution, you must indicate whether the second argument represents variance,
standard deviation, or precision.
All univariate distributions, with the exception of binary and uniform, can have the optional LOWER= and
UPPER= arguments, which specify a truncated density. See the section “Truncation and Censoring” on
page 5719 for more details. Truncation is not supported for multivariate distributions.
Table 74.2 Univariate Distributions
Distribution Name
Definition
beta(< a= >˛, < b= >ˇ)
Beta distribution with shape parameters ˛ and ˇ
binary(< prob|p= > p)
Binary (Bernoulli) distribution with probability of
success p. You can use the alias bern for this
distribution.
binomial (< n= > n, < prob|p= > p)
Binomial distribution with count n and probability
of success p
cauchy (< location|loc|l= >, < scale|s= >)
Cauchy distribution with location and scale chisq(< df= > )
2 distribution with degrees of freedom
dgeneral(ll )
General log-likelihood function that you construct
using SAS programming statements for single or
multiple discrete parameters. Also see the
function general. The name dlogden is an alias
for this function.
expchisq(< df= > )
Log transformation of a 2 distribution with degrees of freedom:
chisq./ , log. / expchisq./. You
can use the alias echisq for this distribution.
5668 F Chapter 74: The MCMC Procedure
Table 74.2 (continued)
Distribution Name
Definition
expexpon(scale|s= )
expexpon(iscale|is= )
Log transformation of an exponential distribution
with scale or inverse-scale parameter :
expon./ , log. / expexpon./. You
can use the alias eexpon for this distribution.
expGamma(< shape|sp= > a, scale|s= )
expGamma(< shape|sp= > a, iscale|is= )
Log transformation of a gamma distribution with
shape a and scale or inverse-scale : gamma.a; / , log. / expgamma.a; /.
You can use the alias egamma for this
distribution.
expichisq(< df= > )
Log transformation of an inverse 2 distribution
with degrees of freedom:
ichisq./ , log. / expichisq./. You
can use the alias eichisq for this distribution.
expiGamma(< shape|sp= > a, scale|s= )
expiGamma(< shape|sp= > a, iscale|is= )
Log transformation of an inverse-gamma
distribution with shape a and scale or
inverse-scale : igamma.a; / , log. / expigamma.a; /. You can use the alias
eigamma for this distribution.
expsichisq(< df= > , < scale|s= > s)
Log transformation of a scaled inverse 2
distribution with degrees of freedom and scale
parameter s:
sichisq./ , log. / expsichisq./.
You can use the alias esichisq for this
distribution.
expon(scale|s= )
expon(iscale|is= )
Exponential distribution with scale or
inverse-scale parameter gamma(< shape|sp= > a, scale|s= )
gamma(< shape|sp= > a, iscale|is= )
Gamma distribution with shape a and scale or
inverse-scale geo(< prob|p= > p)
Geometric distribution with probability p
MODEL Statement F 5669
Table 74.2 (continued)
Distribution Name
Definition
general(ll )
General log-likelihood function that you construct
using SAS programming statements for a single or
multiple continuous parameters. The argument ll
is an expression for the log of the distribution. If
there are multiple variables specified before the
tilde in a MODEL, PRIOR, or HYPERPRIOR
statement, ll is interpreted as the log of the joint
distribution for these variables. Note that in the
MODEL statement, the response variable
specified before the tilde is just a place holder and
is of no consequence; the variable must have
appeared in the construction of ll in the
programming statements. general(constant ) is
equivalent to a uniform distribution on the real
line. You can use the alias logden for this
distribution.
ichisq(< df= >)
Inverse 2 distribution with degrees of freedom
igamma(< shape|sp= > a, scale|s= )
igamma(< shape|sp= > a, iscale|is= )
Inverse-gamma distribution with shape a and
scale or inverse-scale laplace(< location|loc|l= > , scale|s= )
laplace(< location|loc|l= > , iscale|is= )
Laplace distribution with location and scale or
inverse-scale . This is also known as the double
exponential distribution. You can use the alias
dexpon for this distribution.
logistic(< location|loc|l= > a, < scale|s= > b)
Logistic distribution with location a and scale b
lognormal(< mean|m= > , sd= )
lognormal(< mean|m= > , var|v= )
lognormal(< mean|m= > , prec= )
Log-normal distribution with mean and a value
of for the standard deviation, variance, or
precision. You can use the aliases lognormal or
lnorm for this distribution.
negbin(< n= > n, < prob|p= > p)
Negative binomial distribution with count n and
probability of success p. You can use the alias nb
for this distribution.
normal(< mean|m= > , sd= )
normal(< mean|m= > , var|v= )
normal(< mean|m= > , prec= )
Normal (Gaussian) distribution with mean and
a value of for the standard deviation, variance,
or precision. You can use the aliases gaussian,
norm, or n for this distribution.
pareto(< shape|sp= > a, < scale|s= > b)
Pareto distribution with shape a and scale b
5670 F Chapter 74: The MCMC Procedure
Table 74.2 (continued)
Distribution Name
Definition
poisson(< mean|m= > )
Poisson distribution with mean sichisq(< df= > , < scale|s= > s)
Scaled inverse 2 distribution with degrees of
freedom and scale parameter s
t(< mean|m= > , sd= , < df= > )
t(< mean|m= > , var|v= , < df= > )
t(< mean|m= > , prec= , < df= > )
T distribution with mean , standard deviation or
variance or precision , and degrees of freedom
table(< p= > p)
Table (categorical) distribution with probability
vector p. You can also use the alias cat for this
distribution.
uniform(< left|l= > a, < right|r= > b)
Uniform distribution with range a and b. You can
use the alias unif for this distribution.
wald(< mean|m= > , < iscale|is= > )
Wald distribution with mean parameter and
inverse scale parameter . This is also known as
the Inverse Gaussian distribution. You can use the
alias igaussian for this distribution.
weibull(; c; )
Weibull distribution with location (threshold)
parameter , shape parameter c, and scale
parameter .
Table 74.3 Multivariate Distributions
Distribution Name
Definition
dirichlet(< alpha= >˛)
Dirichlet distribution with parameter vector ˛,
where ˛ must be a one-dimensional array of
length greater than 1
iwish(< df= >, < scale= >S)
Inverse Wishart distribution with degrees of
freedom and symmetric positive definite scale
array S
multinom(< p= >p)
Multinomial distribution with probability vector p
mvn(< mu= >, < cov= >†)
Multivariate normal distribution with mean vector
and covariance matrix †
MVNAR(< mu= >, sd= , < rho= >)
MVNAR(< mu= >, var= , < rho= >)
MVNAR(< mu= >, prec= , < rho= >)
Multivariate normal distribution with mean vector
and a covariance matrix †. The covariance
matrix † is a multiple of the scale and a matrix
with a first-order autoregressive structure. When
RHO=0, this distribution becomes a multivariate
normal distribution with shared variance.
MODEL Statement F 5671
Options for the MODEL Statement
The options in the MODEL statement apply when there are missing values in the response variable, or in the
case of the ICOND= option, when there are lag or lead variables for the response variable. You can specify
the following options.
ICOND=variable-list | numeric-list
specifies the initial conditions (or initial states) of the lag or lead variables for the response variable
when the observation indices are out of the range. (For more information about rules of constructing lag
and lead variables in PROC MCMC, see the section “Access Lag and Lead Variables” on page 5731.)
For example, you can use the ICOND= option to specify the lag 1 value of the response for the first
observation. This option works similarly to the ICOND= option in the RANDOM statement, except
that the index is done according to observations, not a subject variable. The initial conditions can be
model parameters, functions of model parameters, or constants. By default, numeric-list is set to 0.
The ICOND= option in a MODEL statement sets the initial conditions for all lag or lead variables (for
the associated response variable) that appear in the program, not just those that appear in the MODEL
statement. Suppose you have a maximum L number of lag variables and a maximum M number of
lead variables of the response y in the program, and there are n observations. The program has the
following variables that need to be resolved during the simulation:
yLC1 ; : : : ; y0 ; y1 ; : : : ; yn ; ynC1 ; : : : ; ynCM
Of these variables, n are observations of y from the input data set and the remaining L+M are initial
conditions that are specified in the ICOND= option. In essence, the ICOND= numeric list stretches the
input data set by filling in the first L and last M values. As PROC MCMC steps through the input data
set, it resolves the current, lagged, and lead variables according to this stretched vector of observations.
The variable-list (or the number-list ) should be of length L+M, which can be greater than the number
of lag or lead response variables that appear in a program. Here is an example.
Suppose you want to fit an autoregressive model of order 2. And instead of two lagged values, the
model requires only the second lag,
Yi D A C Yi
2
C i
where the noise is assumed to be normal. To specify this autoregressive model, you would use the
statements
mu = A + phi * y.l2;
model y ~ normal(mu, var=s2) icond=(-2 -1);
where the Yi 2 , or the lag-2 of Y, variable is constructed by concatenating the variable name, the letter
L (for “lag”), and a lag number.
This model requires two initial conditions for the lag-2 variable of Y, at the first and second observations.
Therefore, the ICOND= option expects a numeric list of two values. In this example, at the first
observation, the variable y.l2 is given a value of –2; at the second observation, y.l2 is given a value
of –1. If you provide a partial list that contains less than the expected number of conditions, PROC
MCMC fills the remaining list with the value of 0.
5672 F Chapter 74: The MCMC Procedure
INITIAL=SAS-data-set | constant | numeric-list
specifies the initial values of the missing values. By default, PROC MCMC uses a sample average
of the nonmissing values of a response variable as the starting values for all missing values in the
simulation for that variable. You can use the INITIAL= option to start the Markov chain at a different
place.
If you use a SAS-data-set to store initial values, the data set must consist of variable names that agree
with the missing variable names that are used by PROC MCMC. The easiest way to find the names of
the internally created variables is to run a default analysis with a very small number of simulations and
check the variable names in the OUTPOST= data set. You can provide a subset of the initial values in
the SAS-data-set , and PROC MCMC uses a default mechanism to fill in the rest of the missing initial
values.
For example, the following statement creates a data set with initial values for the first three missing
values of a response variable:
data RandomInit;
input y_1 y_2 y_3;
datalines;
2.3 3 -3
;
The following MODEL statement uses the values in the RandomInit data set as the initial values of the
corresponding missing values in the model:
model y ~ normal(0,var=s2u) init=randominit;
Specifying a constant assigns that constant as the initial value to all missing values in that response
variable. For example, the following statement assigns the value 5 to be used as an initial value for all
missing yi in the model:
model y ~ normal(0,var=s2u) init=5;
If you have a multidimensional response variable, you can provide a list of numbers that have the
same length as the dimension of your response array. Each number is then given to all corresponding
missing variables in order. For example, the following statement assigns the value 2 to be used as an
initial value for all missing w1i and the value 3 to be used for all missing w2i in the model:
array w[2] w1 w2;
model w ~ mvn(mu, cov) init=(2 3);
MONITOR= (symbol-list | number-list | RANDOM(number ))
outputs analysis for selected missing data variables. You can choose to monitor the missing values by
listing the response variable names, the missing data variable names, or indices, or you can have them
randomly selected by PROC MCMC.
For example, suppose that the data set contains 10 observations and the response variable y has missing
values in observations 2, 3, 7, 9, and 10. To monitor all missing data variables (five in total), you
specify the response variable name in the MONITOR= option:
MODEL Statement F 5673
model y ~ normal(0,var=s2u) monitor=(y);
Suppose you want to monitor the missing data variables that correspond to the missing values in
observations 2, 3, and 10. You have two options: provide either a list of variable names or a list of
indices.
The following statement selects monitored variables by their variable names:
model y ~ normal(0,var=s2u) monitor=(y_2 y_3 y_10);
The variable names must match the internally created variable names for each missing value. See
NAMESUFFIX= option for the naming convention of the variables. By default, the names are created
by concatenating the response variable with the observation index; hence you use the name_obs format
to construct the names. The numbers 2, 3, and 10 are the corresponding observation indices to the
missing values in the input data set.
The following statement selects monitored variables by indices:
model y ~ normal(0,var=s2u) monitor=(1 2 5);
The indices are not a list of the observation numbers, but rather the order by which the missing values
appear in the data set: PROC MCMC reports back the first, the second, and the fifth missing value
variables that it creates. The actual variable names that appear in the output are still y_2, y_3, and
y_10, honoring the control of the NAMESUFFIX= option.
Lastly, PROC MCMC can randomly choose a subset of the variables to monitor. The following
statement randomly selects 3 variables to monitor:
model y ~ normal(0,var=s2u) monitor=(random(3));
The list of the random indices is controlled by the SEED= option in the PROC MCMC statement.
Therefore, the selected variables will be the same when the SEED= option is the same.
NAMESUFFIX=OBSERVATION | POSITION | ORDER
specifies how the names of the missing data variables are created. By default, the names are created by
concatenating the response variable symbol, an underscore (“_”), and the observation number of the
missing value.
NAMESUFFIX=OBSERVATION constructs the parameter names by appending the observation
number to the response variable symbol. This is the default. NAMESUFFIX=POSITION or NAMESUFFIX=ORDER construct the parameter names by appending the numbers 1, 2, 3, and so on, where
the number indicates the order in which the missing values appear in the data set.
For example, suppose you have a response variable y with 10 observations in total, of which five are
missing (observations 2, 3, 7, 9, and 10). By default, PROC MCMC creates five variable names y_2,
y_3, y_7, y_9, and y_10. Using NAMESUFFIX=POSITION changes the names to y_1, y_2, y_3, y_4,
and y_5.
5674 F Chapter 74: The MCMC Procedure
NOOUTPOST
suppresses the output of the posterior samples of missing data variables to the posterior output data set
(which is specified in the OUTPOST= option in the PROC MCMC statement). In models with a large
number of missing values (for example, tens of thousands), PROC MCMC can run faster if it does not
save the posterior samples.
When you specify both the NOOUTPOST option and the MONITOR= option, PROC MCMC outputs
the list of variables that are monitored.
The maximum number of variables that can be saved to an OUTPOST= data set is 32,767. If the total
number of parameters in your model, including the number of missing data variables, exceeds the limit,
the NOOUTPOST option is evoked automatically and PROC MCMC does not save the missing value
draws to the posterior output data set. You can use the MONITOR= option to select a subset of the
parameters to store in the OUTPOST= data set.
PARMS Statement
PARMS
name |(name-list)< = > < { > number | number-list < } >
< name |(name-list)< = > < { > number | number-list < } > . . . >
< / options > ;
The PARMS statement lists the names of the parameters in the model and specifies optional initial values
for these parameters. These parameters are referred to as the model parameters. You can specify multiple
PARMS statements. Each PARMS statement defines a block of parameters, and the blocked Metropolis
algorithm updates the parameters in each block simultaneously. See the section “Blocking of Parameters”
on page 5691 for more details. PROC MCMC generates missing initial values from the prior distributions
whenever needed, as long as they are the standard distributions and not the GENERAL or DGENERAL
function.
If your model contains a multidimensional parameter (for example, a parameter with a multivariate normal
prior distribution), you must declare the parameter as an array (using the ARRAY statement). You can use
braces f g after the parameter name in the PARM statement to assign initial values. For example:
array mu[3];
parms mu {1 2 3};
You cannot use the ARRAY statement to assign initial values. If you use the ARRAY statement to store
values in array elements, the declared array becomes a constant array and cannot be used as parameters in the
PARMS statement. For example, the following statement assigns three numbers to mu:
array mu[3] (1 2 3);
The array mu can no longer be a model parameter.
Every parameter in the PARMS statement must have a corresponding prior distribution in the PRIOR
statement. The program exits if this one-to-one requirement is not satisfied.
You can specify the following options to control different samplers explicitly for that block of parameters.
PREDDIST Statement F 5675
NORMAL | N
uses the normal proposal distribution in the random walk Metropolis. This is the default.
T < (df ) >
uses the t distribution with df degrees of freedom as an alternative proposal distribution. A t distribution
with a small number of degrees of freedom has thicker tails and can sometimes improve the mixing of
the Markov chain. When df > 100, the normal distribution is used instead.
SLICE
applies the slice sampler to each parameter in the PARMS statement individually. See the section“Slice
Sampler” on page 133 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for details. PROC
MCMC does not implement a multidimensional version of the slice sampler. Because the slice sampler
usually requires multiple evaluations of the objective function (the posterior distribution) in each
iteration, the associated computational cost could be potentially high with this sampling algorithm.
UDS
implements a user-defined sampler for any of the parameters in the block. See the section “UDS
Statement” on page 5687 for details and “Example 74.19: Implement a New Sampling Algorithm” on
page 5873 for a realistic example. When you specify the UDS option, PROC MCMC hands off the
sampling of these parameters to you at each iteration and relies on your sampler to return a random
draw from the conditional posterior distribution. This option is useful if you have a model-specific
sampler that you want to implement or a new algorithm that can improve the convergence and mixing
of the Markov chain. This functionality is for advanced users, and you should proceed with caution.
PREDDIST Statement
PREDDIST < 'label' > OUTPRED=SAS-data-set < NSIM=n > < COVARIATES=SAS-data-set >
< STATISTICS=options > ;
The PREDDIST statement creates a new SAS data set that contains random samples from the posterior
predictive distribution of the response variable. The posterior predictive distribution is the distribution of
unobserved observations (prediction) conditional on the observed data. Let y be the observed data, X be the
covariates, be the parameter, and ypred be the unobserved data. The posterior predictive distribution is
defined to be the following:
Z
p.ypred jy; X/ D
p.ypred ; jy; X/d
Z
D
p.ypred j; y; X/p.jy; X/d
Given the assumption that the observed and unobserved data are conditional independent given , the
posterior predictive distribution can be further simplified as the following:
Z
p.ypred jy; X/ D p.ypred j /p.jy; X/d
The posterior predictive distribution is an integral of the likelihood function p.ypred j / with respect to
the posterior distribution p.jy/. The PREDDIST statement generates samples from a posterior predictive
distribution based on draws from the posterior distribution of .
5676 F Chapter 74: The MCMC Procedure
The PREDDIST statement works only on response variables that have standard distributions, and it does not
support either the GENERAL or DGENERAL functions. Multiple PREDDIST statements can be specified,
and an optional label (specified as a quoted string) helps identify the output.
The following list explains specifications in the PREDDIST statement:
COVARIATES=SAS-data-set
names the SAS data set that contains the sets of explanatory variable values for which the predictions
are established. This data set must contain data with the same variable names as are used in the
likelihood function. If you omit the COVARIATES= option, the DATA= data set specified in the PROC
MCMC statement is used instead.
NSIM=n
specifies the number of simulated predicted values. By default, NSIM= uses the NMC= option value
specified in the PROC MCMC statement.
OUTPRED=SAS-data-set
creates an output data set to contain the samples from the posterior predictive distribution. The output
variable names are listed as resp_1–resp_m, where resp is the name of the response variable and m
is the number of observations in the COVARIATES= data set in the PREDDIST statement. If the
COVARIATES= data set is not specified, m is the number of observations in the DATA= data set
specified in the PROC statement.
SAVEPARM
outputs to the OUTPRED= data set sampled parameter values that are used in each predictive draw.
STATISTICS< (global-options) > = NONE | ALL |stats-request
STATS< (global-options) > = NONE | ALL |stats-request
specifies options for calculating posterior statistics. This option works identically to the STATISTICS=
option in the PROC statement. By default, this option takes the specification of the STATISTICS=
option in the PROC MCMC statement.
For an example that uses the PREDDIST statement, see the section “Posterior Predictive Distribution” on
page 5749.
PRIOR/HYPERPRIOR Statement
PRIOR parameter-list Ï distribution ;
HYPERPRIOR parameter-list Ï distribution ;
HYPER parameter-list Ï distribution ;
The PRIOR statement specifies the prior distribution of the model parameters. You must specify a single
parameter or a list of parameters, a tilde Ï, and then a distribution with its parameters.
Programming Statements F 5677
You can specify multiple PRIOR statements to define models with multiple prior components. Your model
can have as many hierarchical levels as you want. But in many cases, such as random-effects models, it
is better to use the RANDOM statements to build up the model hierarchy. The log of the prior is the sum
of the log prior values from each of the PRIOR statements. Similar to the MODEL statement, you can
use the PRIOR statement to specify marginal or conditional prior distributions. See the section “MODEL
Statement” on page 5665 for the names of the standard distributions and the section “Standard Distributions”
on page 5700 for density specification.
The PRIOR statements are processed twice at every Markov chain simulation—that is, twice per pass
through the data set. The statements are called at the first and the last observation of the data set, just as the
BEGINNODATA and ENDNODATA statements are processed. If you run a Monte Carlo simulation that
is data-independent, you can specify the NOLOGDIST option in the PROC MCMC statement to omit the
calculation of the prior distribution. Omitting this calculation enables PROC MCMC to run faster.
The HYPERPRIOR statement is treated internally the same as the PRIOR statement. It provides a notational
convenience in case you want to fit a multilevel hierarchical model. It specifies the hyperprior distribution of
the prior distribution parameters. The log of the hyperprior is the sum of the log hyperprior values from each
of the HYPERPRIOR statements.
Parameters in the PRIOR statements can appear as hyperparameters in the RANDOM statement. The reverse
is not allowed: random-effects parameters cannot be hyperparameters in a PRIOR statement.
You can have a program that contains a RANDOM statement but no PRIOR statements. (In SAS 9.3 and
earlier, each program had to contain a PRIOR statement.) A program that contains a RANDOM statement but
no PRIOR statements could be a random-effects model with no fixed-effects parameters or hyperparameters
to the random effects. A MODEL statement is still required in every program.
Programming Statements
This section lists the programming statements available in PROC MCMC to compute the priors and loglikelihood functions. This section also documents the differences between programming statements in
PROC MCMC and programming statements in the DATA step. The syntax of programming statements used
in PROC MCMC is identical to that used in the NLMIXED procedure (see Chapter 83, “The NLMIXED
Procedure”) and the MODEL procedure (see Chapter 25, “The MODEL Procedure” (SAS/ETS User’s Guide)).
Most of the programming statements that can be used in the DATA step can also be used in PROC MCMC.
See SAS Language Reference: Dictionary for a description of SAS programming statements.
There are also a number of unique functions in PROC MCMC that calculate the log density of various
distributions in the procedure. You can find them at the section “Using Density Functions in the Programming
Statements” on page 5716.
For the list of matrix-based functions that is supported in PROC MCMC, see the section “Matrix Functions
in PROC MCMC” on page 5723.
5678 F Chapter 74: The MCMC Procedure
The following are valid statements:
ABORT;
ARRAY arrayname < [ dimensions ] > < $ > < variables-and-constants >;
CALL name < (expression < , expression . . . >) >;
DELETE;
DO < variable = expression < TO expression > < BY expression > >
< , expression < TO expression > < BY expression > > . . .
< WHILE expression > < UNTIL expression >;
END;
GOTO statement-label;
IF expression;
IF expression THEN program-statement;
ELSE program-statement;
variable = expression;
variable + expression;
LINK statement-label;
PUT < variable > < = > . . . ;
RETURN;
SELECT < (expression) >;
STOP;
SUBSTR(variable, index, length)= expression;
WHEN (expression)program-statement;
OTHERWISE program-statement;
For the most part, the SAS programming statements work the same as they do in the DATA step, as
documented in SAS Language Reference: Concepts. However, there are several differences:
The ABORT statement does not allow any arguments.
The DO statement does not allow a character index variable. Thus
do i = 1,2,3;
is supported; however, the following statement is not supported:
do i = 'A','B','C';
The PUT statement, used mostly for program debugging in PROC MCMC (see the section “Handling
Error Messages” on page 5766), supports only some of the features of the DATA step PUT statement,
and it has some features that are not available with the DATA step PUT statement:
– The PROC MCMC PUT statement does not support line pointers, factored lists, iteration factors,
overprinting, _INFILE_, _OBS_, the colon (:) format modifier, or “$”.
– The PROC MCMC PUT statement does support expressions, but the expression must be enclosed
in parentheses. For example, the following statement displays the square root of x:
put (sqrt(x));
The WHEN and OTHERWISE statements enable you to specify more than one target statement. That
is, DO/END groups are not necessary for multiple statement WHENs. For example, the following
syntax is valid:
RANDOM Statement F 5679
select;
when (exp1) stmt1;
stmt2;
when (exp2) stmt3;
stmt4;
end;
You should avoid defining variables that begin with an underscore (_). They might conflict with internal
variables created by PROC MCMC. The MODEL statement must come after any SAS programming
statements that define or modify terms used in the construction of the log likelihood.
RANDOM Statement
RANDOM random-effect Ï distribution SUBJECT=variable < options > ;
The RANDOM statement defines a single random effect and its prior distribution or an array of random
effects and their prior distribution. The random-effect must be represented by either a symbol or an array.
The RANDOM statement must consist of the random-effect , a tilde (Ï), the distribution for the random
effect, and then a SUBJECT= variable.
SUBJECT=variable | _OBS_
identifies the subjects in the random-effects model. The variable must be part of the input data set,
and it can be either a numeric variable or character literal. The variable does not need to be sorted,
and the input data set does not need to be clustered according to it. SUBJECT=_OBS_ enables you
fit an observation-level random-effects model (each observation has its own random effect) without
specifying a subject variable in the input data set.
The random-effects parameters associated with each subject in the same RANDOM statement are assumed to
be conditionally independent of each other, given other parameters and data set variables in the model. The
other parameters include model parameters (declared in the PARMS statements), random-effects parameters
(from other RANDOM statements), and missing data variables.
Table 74.4 shows the distributions that you can specify in the RANDOM statement.
Table 74.4 Valid Distributions in the RANDOM Statement
Distribution Name
Definition
beta(< a= >˛, < b= >ˇ)
Beta distribution with shape parameters ˛ and ˇ
binary(< prob|p= > p)
Binary (Bernoulli) distribution with probability of
success p. You can use the alias bern for this
distribution.
gamma(< shape|sp= > a, scale|s= )
gamma(< shape|sp= > a, iscale|is= )
Gamma distribution with shape a and scale or
inverse-scale 5680 F Chapter 74: The MCMC Procedure
Table 74.4 (continued)
Distribution Name
Definition
dgeneral(ll )
General log-prior function that you construct
using SAS programming statements for univariate
or multivariate discrete random effects. See the
section “Specifying a New Distribution” on
page 5715 for more details.
general(ll )
General log-prior function that you construct
using SAS programming statements for univariate
or multivariate continuous random effects. See the
section “Specifying a New Distribution” on
page 5715 for more details.
igamma(< shape|sp= > a, scale|s= )
igamma(< shape|sp= > a, iscale|is= )
Inverse-gamma distribution with shape a and
scale or inverse-scale laplace(< location|loc|l= > , scale|s= )
laplace(< location|loc|l= > , iscale|is= )
Laplace distribution with location and scale or
inverse-scale . This is also known as the double
exponential distribution. You can use the alias
dexpon for this distribution.
normal(< mean|m= > , sd= )
normal(< mean|m= > , var|v= )
normal(< mean|m= > , prec= )
Normal (Gaussian) distribution with mean and
a value of for the standard deviation, variance,
or precision. You can use the aliases gaussian,
norm, or n for this distribution.
poisson(< mean|m= > )
Poisson distribution with mean table(< p= > p)
Table (categorical) distribution with probability
vector p. You can also use the alias cat for this
distribution
uniform(< left|l= > a, < right|r= > b)
Uniform distribution with range a and b. You can
use the alias unif for this distribution.
MVN(< mu= >, < cov= >†)
Multivariate normal distribution with mean vector
and covariance matrix †
Multivariate normal distribution with mean vector
and a covariance matrix †. The covariance
matrix † is a multiple of the scale and a matrix
with a first-order autoregressive structure
MVNAR(< mu= >, sd= , < rho= >)
MVNAR(< mu= >, var= , < rho= >)
MVNAR(< mu= >, prec= , < rho= >)
RANDOM Statement F 5681
Table 74.4 (continued)
Distribution Name
Definition
normalcar(neighbors=, num=, < sd= >)
normalcar(neighbors=, num=, < var= >)
normalcar(neighbors=, num=, < prec= >)
Intrinsic Gaussian conditional autoregressive
(CAR) distribution. The NUM= option specifies
the name of the data set variable that contains the
number of neighbors for each subject; the
NEIGHBORS= option specifies the prefix of data
set variables that contain the neighboring indices
of each subject; is the standard deviation,
variance, or precision of the distribution.
The following RANDOM statement specifies a scale effect, where s2u can be a constant or a model parameter
and index is a data set variable that indicates group membership of the random effect u:
random u ~ normal(0,var=s2u) subject=index;
The following statements specify multidimensional effects, where mu and cov can be either parameters in the
model or constant arrays:
array w[2];
array mu[2];
array cov[2,2];
random w ~ mvn(mu, cov) subject=index;
You can specify multiple RANDOM statements. Hyperparameters in the prior distribution of a random effect
can be other random effects in the model. For example, the following statements are allowed because the
random effect g appears in the distribution for the random effect u:
random g ~ normal(0,var=s2g) subject=month;
random u ~ normal(g,var=s2u) subject=day;
These two RANDOM statements specify a nested hierarchical model in which the random effect g is the
hyperparameter of the random effect u. You can build the hierarchical structure as deep as you want. You can
also use multiple RANDOM statements to build non-nested random-effects models.
The number of random-effects parameters in each RANDOM statement is determined by the number of
unique values in the SUBJECT= variable, which can be either unsorted numeric or unsorted character literal.
Unlike the model parameters that are explicitly declared in the PARMS statement (with therefore a fixed total
number), the number of random-effects parameters in a program depends on the values of the SUBJECT=
data set variable. That number can change from one BY group to another.
The order of the RANDOM statements, or their relative placement with respect to other statements in the
program (such as the PRIOR statement or the MODEL statement), is not important. The programming order
becomes relevant if any hyperparameters are defined variables in the program. For example, in the following
statements, the hyperparameter s is defined as a function of some variable or parameter in the model:
5682 F Chapter 74: The MCMC Procedure
s = sqrt(s2g);
random g ~ normal(0,sd=s) subject=month;
That definition of s must appear before the RANDOM statement that requires it. If you switched the order of
the statements as follows, PROC MCMC would not be able to calculate the prior density for some subjects
correctly and would produce erroneous results.
random g ~ normal(0,sd=s) subject=month;
s = sqrt(s2g);
The names of the random-effects parameters are created internally. See the NAMESUFFIX= option for
the naming convention of the random-effects parameters. The random-effects parameters are updated
conditionally in the simulation. All posterior draws are saved to the OUTPOST= output data set by default,
and you can use the MONITOR= option to monitor any of the parameters. For more information about
available sampling algorithms, see the ALGORITHM= option. For more information about how to set a
random-effects parameter to a constant (also known as corner-point constraint), see the CONSTRAINT
option.
You can specify the following options in the RANDOM statement:
ALGORITHM=option
ALG=option
specifies the algorithm to use to sample the posterior distribution. The following options are available:
RWM
uses the random-walk Metropolis algorithm with normal proposal.
SLICE
uses the slice sampling algorithm.
GEO
uses the discrete random-walk Metropolis with symmetric geometric proposal.
When possible, PROC MCMC samples directly from the full conditional distribution. Otherwise, the
default sampling algorithm is the RWM.
CENTER | NOCENTER
specifies whether to re-center the random-effects parameters after each draw. This option applies only
when you use the NORMALCAR prior. The default is CENTER.
CONSTRAINT(VALUE=value) = FIRST | LAST | NONE | ’formatted-value’
ZERO=FIRST | LAST | NONE | ’formatted-value’
sets one of the random-effects parameters to a fixed value. The default is ZERO=NONE, which
does not fix any of the parameters to be a constant. This option enables you to eliminate one of the
parameters.
For example, this option could be useful if you want to fit a regression model with categorical
covariates and, instead of creating a design matrix, you treat the parameters as “random effects” and fit
an equivalent random-effects model.
Suppose you have a regression that includes a categorical variable X with J levels. You can construct a
full-rank design matrix with J–1 dummy variables (X2 XJ with X1 being the base group) and fit a
RANDOM Statement F 5683
regression such as the following:
i D ˇ0 C ˇ2 X2 ˇJ XJ
The following statements in a PROC MCMC step fit such a hypothetical regression model:
parms beta0 betax2 ... betaxJ;
prior beta: ~ n(0, sd=100);
mu = beta0 + betax2 * x2 + ... betaxJ * xJ;
...
Equivalently, you can also treat this model as a random-effects model such as the following, where ˇj
are random effects for each category in X:
i D ˇ0 C ˇj for j D 1; : : : ; J
However, this random-effects model is over-parameterized. The ZERO= option rids the model with
one random-effects parameter of choice and fixes it to be zero. The following example statements fit
such a hypothetical random-effects model:
parms beta0;
prior beta0 ~ n(0, sd=100);
random beta ~ n(0, sd=100) subject=x zero=first;
mu = beta0 + beta;
...
The specification ZERO=FIRST sets the first random-effects parameter to 0, implying ˇ1 D 0. This
random-effects parameter corresponds to the first category in the SUBJECT= variable. The category is
what the first observation of the SUBJECT= variable takes.
The specification ZERO=LAST sets the last random-effects parameter to be 0, implying ˇJ D 0. This
random-effects parameter corresponds to the last category in the SUBJECT= variable. The category is
not necessarily the same category that the last observation of the SUBJECT= variable takes because
the SUBJECT= variable does not need to be sorted.
The specification ZERO=‘formatted-value’ sets the random-effects parameter for the category (in
the SUBJECT= variable) with a formatted value that matches ‘formatted-value’ to 0. For example,
ZERO=‘3’ sets ˇ3 D 0.
The CONSTRAINT(VALUE=value) option works similarly to the ZERO= option. You can assign
an arbitrary value to any one of the random-effects parameter. For example, the specification CONSTRAINT(VALUE=0)=FIRST is equivalent to ZERO=FIRST.
ICOND=variable-list | numeric-list
ISTATES=variable-list | numeric-list
specifies the initial conditions (or initial states) of the lag or lead variable of the random effect when the
subject indices are out of the range of the subjects. (For more information about rules of constructing lag
and lead variables in PROC MCMC, see the section “Access Lag and Lead Variables” on page 5731.)
This works similarly to the ICOND= option in the MODEL statement, except that the index is done
5684 F Chapter 74: The MCMC Procedure
according to a subject variable, not observations. The initial conditions can be model parameters,
functions of model parameters, or constants. By default, numeric-list is set to 0.
The ICOND= option in a RANDOM statement sets the initial conditions for all lag or lead variables (of
the associated random effect) that appear in the program, not just those that appear in the RANDOM
statement. Suppose you have a maximum L number of lag variables and a maximum M number of
lead variables of the random effect mu in the program, and there are n clusters. The program has the
following vector of variables that need to be resolved during simulation:
LC1 ; : : : ; 0 ; 1 ; : : : ; n ; nC1 ; : : : ; nCM
Of these variables, n (1 ; : : : ; n ) are random-effects parameters and the remaining L+M are initial
conditions that are specified in the ICOND= option. The variable-list (or the number-list ) should be a
vector of length L+M, which can be greater than the number of lag/lead random-effect variables that
appear in a program. If you provide a partial list that contains fewer than L+M states, PROC MCMC
fills the remaining vector with the value of 0.
INITIAL=SAS-data-set | constant | numeric-list
specifies the initial values of the random-effects parameters. By default, PROC MCMC uses the same
option as specified in the INIT= option to generate initial values for the random-effects parameter:
either it uses the mode of the prior density or it randomly draws a sample from that distribution. You
can start the Markov chain at different places by providing a SAS-data-set , a constant, or a numeric-list
for multivariate random-effects parameters.
If you use a SAS-data-set , the data set must consist of variable names that agree with the randomeffects parameters in the model (see the NAMESUFFIX= option for the naming convention of the
random-effects parameters). The easiest way to find the names of the internally created parameter
names is to run a default analysis with a very small number of simulations and check the variable
names in the OUTPOST= data set. You can provide a subset of the initial values in the SAS-data-set
and PROC MCMC will use the default mechanism to fill in the rest of the random-effects parameters.
For example, the following statement creates a data set with initial values for the random-effects
parameters u_1, u_2, and u_3:
data RandomInit;
input u_1 u_2 u_3;
datalines;
2.3 3 -3
;
The following RANDOM statement takes the values in the RandomInit data set to be the initial values
of the corresponding random-effects parameters in the model:
random u ~ normal(0,var=s2u) subject=index init=randominit;
Specifying a constant assigns that constant as the initial value to all random-effects parameters in the
statement. For example, the following statement assigns the value 5 to be used as an initial value for
all ui in the model:
RANDOM Statement F 5685
random u ~ normal(0,var=s2u) subject=index init=5;
If you have multiple effects, you can provide a list of numbers, where the length of the list the same as
the dimension of your random-effects array. Each number is then given to all corresponding randomeffects parameters in order. For example, the following statement assigns the value 2 to be used as an
initial value for all w1i and the value 3 to be used for all w2i in the model:
array w[2] w1 w2;
random w ~ mvn(mu, cov) subject=index init=(2 3);
If you use the GENERAL or DGENERAL functions in the RANDOM statement, you must provide
initial values for these parameters.
MONITOR= (symbol-list | number-list | RANDOM(number ))
outputs analysis for selected random-effects parameters. You can choose to monitor the random-effects
parameters by listing the effect names or effect indices, or you can have them randomly selected by
PROC MCMC.
To monitor all random-effects parameters, you specify the effect name in the MONITOR= option:
random u ~ normal(0,var=s2u) subject=index monitor=(u);
You have three options for monitoring a subset of the random-effects parameters. You can provide a
list of the parameter names, you can provide a number list of the parameter indices, or you can have
PROC MCMC randomly choose a subset of parameters for you.
For example, if you want to monitor analysis for parameters u_1 through u_10, u_23, and u_57, you
can provide the names as follows:
random u ~ normal(0,var=s2u) subject=index monitor=(u_1-u_10 u_23 u_57);
The naming convention in the symbol-list must agree with the NAMESUFFIX= option, which controls
how the parameter names of the random-effect are created. By default, NAMESUFFIX=SUBJECT,
and the symbol-list must use suffixes that correspond to the formatted values1 in the SUBJECT= data
set variable. With the NAMESUFFIX=POSITION option, the symbol-list must use suffixes that agree
with the input order of the SUBJECT= variable. If the SUBJECT= variable has a character value, you
cannot use the hyphen (-) in the symbol-list to indicate a range of variables.
To monitor the same list of random-effects parameters, you can provide their indices:
random u ~ normal(0,var=s2u) subject=index monitor=(1 to 10 by 1 23 57);
PROC MCMC can also randomly choose a subset of the parameters to monitor:
1 In
SAS/STAT 9.3, the random-effects parameters were created using the unformatted values of the SUBJECT= variable.
5686 F Chapter 74: The MCMC Procedure
random u ~ normal(0,var=s2u) subject=index monitor=(random(12));
The sequence of the random indices is controlled by the SEED= option in the PROC MCMC statement.
By default, PROC MCMC does not monitor any random-effects parameters. When you specify this
option, it takes the specification of the STATISTICS= and PLOTS= options in the PROC MCMC
statement. By default, PROC MCMC outputs all the posterior samples of all random-effects parameters
to the OUTPOST= output data set. You can use the NOOUTPOST option to suppress the saving of the
random-effects parameters.
NAMESUFFIX=option
specifies how the names of the random-effects parameters are internally created from the SUBJECT=
variable that is specified in the RANDOM statement. PROC MCMC creates the names by concatenating
the random-effect symbol with an underscore and a series of numbers or characters. The following
options control the type of methods that are used in such construction:
SUBJECT
constructs the parameter names by appending the formatted values of the SUBJECT= variable in
the input data set.2
POSITION
constructs the parameter names by appending the numbers 1, 2, 3, and so on, where the number
indicates the order in which the SUBJECT= variable appears in the data set.
For example, suppose you have an input data set with four observations and the SUBJECT= variable
zipcode has four values (with three of them unique): 27513, 01440, 27513, and 15217. The following
SAS statement creates three random-effects parameters named u_27513, u_01440, and u_15217:
random u ~ normal(0,var=10) subject=zipcode namesuffix=subject;
On the other hand, using NAMESUFFIX=POSITION creates three parameters named u_1, u_2, and
u_3:
random u ~ normal(0,var=10) subject=zipcode namesuffix=position;
By default, NAMESUFFIX=SUBJECT.
NOOUTPOST
suppresses the output of the posterior samples of random-effects parameters to the OUTPOST= data
set. In models with a large number of random-effects parameters (for example, tens of thousands),
PROC MCMC can run faster if it does not save the posterior samples of the random-effects parameters.
When you specify both the NOOUTPOST option and the MONITOR= option, PROC MCMC outputs
the list of variables that are monitored.
The maximum number of variables that can be saved to an OUTPOST= data set is 32,767. If you run a
large-scale random-effects model with the number of parameters exceeding the limit, the NOOUTPOST
option is evoked automatically and PROC MCMC does not save the random-effects parameter draws to
the posterior output data set. You can use the MONITOR= option to select a subset of the parameters
to store in the OUTPOST= data set.
2 In
SAS/STAT 9.3, the random-effects parameters were created using the unformatted values of the SUBJECT= variable.
UDS Statement F 5687
UDS Statement
UDS subroutine-name (subroutine-argument-list) ;
UDS stands for user defined sampler. The UDS statement enables you to use a separate algorithm, other than
the default random walk Metropolis, to update parameters in the model. The purpose of the UDS statement is
to give you a greater amount of flexibility and better control over the updating schemes of the Markov chain.
Multiple UDS statements are allowed.
For the UDS statement to work properly, you have to do the following:
write a subroutine by using PROC FCMP (see the FCMP Procedure in the Base SAS Procedures
Guide) and save it to a SAS catalog (see the example in this section). The subroutine must update some
parameters in the model. These are the UDS parameters. The subroutine is called the UDS subroutine.
declare any UDS parameters in the PARMS statement with a sampling option, as in < / UDS > (see the
section “PARMS Statement” on page 5674).
specify the prior distributions for all UDS parameters, using the PRIOR statements.
N OTE : All UDS parameters must appear in three places: the UDS statement, the PARMS statement, and the
PRIOR statement. Otherwise, PROC MCMC exits.
To obtain a valid Markov chain, a UDS subroutine must update a parameter from its full posterior conditional
distribution and not the posterior marginal distribution. The posterior conditional is something that you
need to provide. This conditional is implicitly based on a prior distribution. PROC MCMC has no means to
verify that the implied prior in the UDS subroutine is the same as the prior that you specified in the PRIOR
statement. You need to make sure that the two distributions agree; otherwise, you will get misleading results.
The priors in the PRIOR statements do not directly affect the sampling of the UDS parameters. They could
affect the sampling of the other parameters in the model, which, in turn, changes the behavior of the Markov
chain. You can see this by noting cases where the hyperparameters of the UDS parameters are model
parameters; the priors should be part of the posterior conditional distributions of these hyperparameters, and
they cannot be omitted.
Some additional information is listed to help you better understand the UDS statement:
Most features of the SAS programming language can be used in subroutines processed by PROC
FCMP (see the FCMP Procedure in the Base SAS Procedures Guide).
The UDS statement does not support FCMP functions—a FCMP function returns a value, while a
subroutine does not. A subroutine updates some of its subroutine arguments. These arguments are
called OUTARGS arguments.
The UDS parameters cannot be in the same block as other parameters. The optional argument < /
UDS > in the PARMS statement prevents parameters that use the default Metropolis from being mixed
with those that are updated by the UDS subroutines.
You can put all the UDS parameters in the same PARMS statement or have a separate UDS statement
for each of them.
5688 F Chapter 74: The MCMC Procedure
The same subroutine can be used in multiple UDS statements. This feature comes in handy if you have
a generic sampler that can be applied to different parameters.
PROC MCMC updates the UDS parameters by calling the UDS subroutines directly. At every iteration,
PROC MCMC first samples parameters that use the Metropolis algorithm, then the UDS parameters.
Sampling of the UDS parameters proceeds in the order in which the UDS statements are listed.
A UDS subroutine accepts any symbols in the program as well as any input data set variables as its
arguments.
Only the OUTARGS arguments in a UDS subroutine are updated in PROC MCMC. You can modify
other arguments in the subroutine, but the changes are not global in PROC MCMC.
If a UDS subroutine has an argument that is a SAS data set variable, PROC MCMC steps through the
data set while updating the UDS parameters. The subroutine is called once per observation in the data
set for every iteration.
If a UDS subroutine does not have any arguments that are data set variables, PROC MCMC does not
access the data set while executing the subroutine. The subroutine is called once per iteration.
To reduce the overhead in calling the UDS subroutine and accessing the data set repeatedly, you might
consider reading all the input data set variables into arrays and using the arrays as the subroutine
arguments. See the section “BEGINCNST/ENDCNST Statement” on page 5663 about how to use the
BEGINCNST and ENDCNST statements to store data set variables.
For an example that uses the UDS statement, see “Example 74.19: Implement a New Sampling Algorithm”
on page 5873.
Details: MCMC Procedure
How PROC MCMC Works
PROC MCMC is a simulation-based procedure that applies a variety of sampling algorithms to the program
at hand. The default sampling methods include conjugate sampling (from full conditional), direct sampling
from the marginal distribution, inverse cumulative distribution function, random walk Metropolis with normal
proposal, and discretized random walk Metropolis with normal proposal. You can request alternate sampling
algorithms, such as random walk Metropolis with t distribution proposal, discretized random walk Metropolis
with symmetric geometric proposal, and the slice sampling algorithm.
PROC MCMC applies the more efficient sampling algorithms first, whenever possible. When a parameter
does not appear in the conditional distributions of other random variables in the program, PROC MCMC
generates samples directly from its prior distribution (which is also its marginal distribution). This usually
occurs in data-independent Monte Carlo simulation programs (see “Example 74.1: Simulating Samples From
a Known Density” on page 5776 for an example) or missing data problems, where the missing response
variables are generated directly from the conditional sampling distribution (or the conditional likelihood).
When conjugacy is detected, PROC MCMC uses random number generators to draw values from the full
conditional distribution. (For information about detecting conjugacy, see the section “Conjugate Sampling”
How PROC MCMC Works F 5689
on page 5697.) In other situations, PROC MCMC resorts to the random walk Metropolis with normal
proposal to generate posterior samples for continuous parameters and a discretized version for discrete
parameters. See the section “Metropolis and Metropolis-Hastings Algorithms” on page 130 in Chapter 7,
“Introduction to Bayesian Analysis Procedures,” for details about the Metropolis algorithm. For the actual
implementation details of the Metropolis algorithm in PROC MCMC, such as tuning of the covariance
matrices, see the section “Tuning the Proposal Distribution” on page 5694.
A key component of the Metropolis algorithm is the calculation of the objective function. In most cases,
the objective function that PROC MCMC uses in a Metropolis step is the logarithm of the joint posterior
distribution, which is calculated with the inclusion of all data and parameters. The rest of this section describes
how PROC MCMC calculates the objective function for parameters that use the Metropolis algorithm.
Model Parameters
To calculate the log of the posterior density, PROC MCMC assumes that all observations in the data set are
independent,
log.p.jy// D log.. // C
n
X
log.f .yi j //
i D1
where is a parameter or a vector of parameters that are defined in the PARMS statements (referred to
as the model parameters). The term log.. // is the sum of the log of the prior densities specified in the
PRIOR and HYPERPRIOR statements. The term log.f .yi j // is the log likelihood specified in the MODEL
statement. The MODEL statement specifies the log likelihood for a single observation in the data set.
P
If you want to model dependent data—that is, log.f .yj // ¤ i log.f .yi j //—you can use the JOINTMODEL option in the PROC MCMC statement. See the section “Modeling Joint Likelihood” on page 5729
for more details.
The statements in PROC MCMC are similar to DATA step statements; PROC MCMC evaluates every
statement in order for each observation. At the beginning of the data set, the log likelihood is set to be 0.
As PROC MCMC steps through the data set, it cumulatively adds the log likelihood for each observation.
Statements between the BEGINNODATA and ENDNODATA statements are evaluated only at the first and
the last observations. At the last observation, the log of the prior and hyperprior distributions is added to the
sum of the log likelihood to obtain the log of the posterior distribution.
Calculation of the log.p.jy// objective function involves a complete pass through the data set, making it
potentially computationally expensive. If D f1 ; 2 g is multidimensional, you can choose to update a
portion of the parameters at each iteration step by declaring them in separate PARMS statements (see the
section “Blocking of Parameters” on page 5691 for more information). PROC MCMC updates each block of
parameters while holding others constant. The objective functions that are used in each update are the same
as the log of the joint posterior density:
log.p.1 jy; 2 // D log.p.2 jy; 1 // D log.p.jy//
In other words, PROC MCMC does not derive the conditional distribution explicitly for each block of
parameters, and it uses the full joint distribution in the Metropolis step for every block update.
5690 F Chapter 74: The MCMC Procedure
Random-Effects Models
For programs that require RANDOM statements, PROC MCMC includes the sum of the density evaluation
of the random-effects parameters in the calculation of the objective function for ,
log.p.j; y// D log.. // C
J
X
log..j j // C
j D1
n
X
log.f .yi j; //
i D1
where D f1 ; : : : ; J g are random-effects parameters and .j j / is the prior distribution of the randomeffects parameters. The likelihood function can be conditional on , but the prior distributions of , which
must be independent of , cannot.
The objective function used in the Metropolis step for the random-effects parameter j contains only the
portion of the data that belong to the jth cluster:
X
log.p.j j; y// D log..j j // C
log.f .yi j; j //
i 2fj th clusterg
The calculation does not include log. /, the prior density piece, because that is a known constant. Evaluation
of this objective function involves only a portion of the data set, making it more computationally efficient.
In fact, updating every random-effects parameters in a single RANDOM statement involves only one pass
through the data set.
You can have multiple RANDOM statements in a program, which adds more pieces to the posterior calculation,
such as
log.p.j; ˛; y// D log.. // C
J
X
log..j j // C
j D1
K
X
log..˛k j // C
kD1
n
X
log.f .yi j; ; ˛//
i D1
where ˛ D f˛1 ; : : : ; ˛K g is another random effect. The random effects and ˛ can form their own hierarchy
(as in a nested model), or they can enter the program in a non-nested fashion. The objective functions for j
and ˛k are calculated using only observations that belong to their respective clusters.
Models with Missing Values
Missing values in the response variables of the MODEL statement are treated as random variables, and they
add another layer in the conditional updates in the simulation. Suppose that
y D fyobs ; ymis g
The response variable y consists of n1 observed values yobs and n2 missing values ymis . The log of the
posterior distribution is thus formed by
log.p.j; ymis ; yobs // D log.. //C
J
X
j D1
n2
n1
X
X
log..j j //C
log.f .ymis;i j; //
log.f .yobs;i j; //
i D1
where the expression is evaluated at the drawn and yobs values.
The conditional distribution of the random-effects parameter j is
X
log.p.j j; y// D log..j j // C
log.f .yi j; j //
i 2fj th clusterg
i D1
Blocking of Parameters F 5691
where the yi are either the observed or the imputed values of the response variable.
The missing values are usually sampled directly from the sampling distribution and do not require the
Metropolis sampler. When a response variable takes
on a GENERAL function, the objective function is
simply the likelihood function: log f .ymis;i j; j / .
Blocking of Parameters
In a multivariate parameter model, if all k parameters are proposed with one joint distribution q.j/, acceptance
or rejection would occur for all of them. This can be rather inefficient, especially when parameters have vastly
different scales. A way to avoid this difficulty is to allocate the k parameters into d blocks and update them
separately. The PARMS statement puts model parameters in separate blocks, and each block of parameters is
updated sequentially in the procedure.
Suppose you want to sample from a multivariate distribution with probability density function p. jy/ where
D f1 ; 2 ; : : : ; k g: Now suppose that these k parameters are separated into d blocks—for example,
p. jx/ D fd .z/ where z D fz1 ; z2 ; : : : ; zd g, where each zj contains a nonempty subset of the fi g, and
where each i is contained in one and only one zj . In the MCMC context, the z’s are blocks of parameters.
In the blocked algorithm, a proposal consists of several parts. Instead of proposing a simultaneous move for
all the ’s, a proposal is made for the i ’s in z1 only, then for the i ’s in z2 , and so on for d subproposals.
Any accepted proposal can involve any number of the blocks moving. The parameters do not necessarily all
move at once as in the all-at-once Metropolis algorithm.
Formally, the blocked Metropolis algorithm is as follows. Let wj be the collection of i that are in block zj ,
and let qj .jwj / be a symmetric multivariate distribution that is centered at the current values of wj .
1. Let t D 0. Choose points for all wjt . A point can be an arbitrary point as long as p.wjt jy/ > 0.
2. For j D 1; : : : ; d :
a) Generate a new sample, wj;new , using the proposal distribution qj .jwjt /.
b) Calculate the following quantity:
(
)
1
p.wj;new jw1t ; : : : ; wjt 1 ; wjt C1
; : : : ; wdt ; y/
r D min
;1 :
p.wjt jw1t ; : : : ; wjt 1 ; wjt C11 ; : : : ; wdt ; y/
c) Sample u from the uniform distribution U.0; 1/.
d) Set wjt C1 D wj;new if r < a; wjt C1 D wjt otherwise.
3. Set t D t C 1. If t < T , the number of desired samples, go back to Step 2; otherwise, stop.
With PROC MCMC, you can sample all parameters simultaneously by putting them all in a single PARMS
statement, you can sample parameters individually by putting each parameter in its own PARMS statement, or
you can sample certain subsets of parameters together by grouping each subset in its own PARMS statements.
For example, if the model you are interested in has five parameters, alpha, beta, gamma, phi, sigma, the
all-at-once strategy is as follows:
5692 F Chapter 74: The MCMC Procedure
parms alpha beta gamma phi sigma;
The one-at-a-time strategy is as follows:
parms
parms
parms
parms
parms
alpha;
beta;
gamma;
phi;
sigma;
A two-block strategy could be as follows:
parms alpha beta gamma;
parms phi sigma;
The exceptions to the previously described blocking strategies are parameters that are sampled directly (either
from their full conditional or marginal distributions) and parameters that are array-based (with multivariate
prior distributions). In these cases, the parameters are taken out of an existing block and are updated
individually. You can use the sampling options in the PARMS statement to override the default behavior.
One of the greatest challenges in MCMC sampling is achieving good mixing of the chains—the chains should
quickly traverse the support of the stationary distribution. A number of factors determine the behavior of a
Metropolis sampler; blocking is one of them, so you want to be extremely careful when you choose a good
design. Generally speaking, forming blocks of parameters has its advantages, but it is not true that the larger
the block the faster the convergence.
When simultaneously sampling a large number of parameters, the algorithm might find it difficult to achieve
good mixing. As the number of parameters gets large, it is much more likely to have (proposal) samples that
fall well into the tails of the target distribution, producing too small a test ratio. As a result, few proposed
values are accepted and convergence is slow. On the other hand, when the algorithm samples each parameter
individually, the computational cost increases linearly. Each block of Metropolis parameters requires one
additional pass through the data set, so a five-block updating strategy could take five times longer than
a single-block updating strategy. In addition, there is a chance that the chain might mix far too slowly
because the conditional distributions (of i given all other ’s) might be very “narrow,” as a result of posterior
correlation among the parameters. When that happens, it takes a long time for the chain to fully explore
that dimension alone. There are no theoretical results that can help determine an optimal “blocking” for an
arbitrary parametric model. A rule followed in practice is to form small groups of correlated parameters
that belong to the same context in the formulation of the model. The best mixing is usually obtained with a
blocking strategy somewhere between the all-at-once and one-at-a-time strategies.
Sampling Methods
When suitable, PROC MCMC chooses the optimal sampling method for each parameter. That involves
direct sampling either from the conditional posterior via conjugacy (see the section “Conjugate Sampling” on
page 5697) or via the marginal posterior (see the section “Direct Sampling” on page 5696). Alternatively,
PROC MCMC samples according to Table 74.5. Each block of parameters is classified by the nature of the
prior distributions. “Continuous” means all priors of the parameters in the same block have a continuous
distribution. “Discrete” means all priors are discrete. “Mixed” means that some parameters are continuous
and others are discrete. Parameters that have binary priors are treated differently, as indicated in the table.
Sampling Methods F 5693
Table 74.5 Sampling Methods in PROC MCMC
Blocks
Default Method
Alternative Method
Continuous
Discrete (other than binary)
Mixed
Binary (single dimensional)
Binary (multidimensional)
Multivariate normal (MVN)
Binned MVN
MVN
Inverse CDF
Independence sampler
Multivariate t (MVT); slice sampler
Binned MVT or symmetric geometric
MVT
For a block of continuous parameters, PROC MCMC uses a multivariate normal distribution as the default
proposal distribution. In the tuning phase, PROC MCMC finds an optimal scale c and a tuning covariance
matrix †.
For a discrete block of parameters, PROC MCMC uses a discretized multivariate normal distribution as the
default proposal distribution. The scale c and covariance matrix † are tuned. Alternatively, you can use an
p/jj
independent symmetric geometric proposal distribution. The density has form p.1
and has variance
2.1 p/
.2 p/.1 p/
.
p2
In the tuning phase, the procedure finds an optimal proposal probability p for every parameter in
the block.
You can change the proposal distribution, from the normal to a t distribution. You can either use the PROC
option PROPDIST=T(df ) or PARMS statement option < / T(df ) > to make the change. The t distributions
have thicker tails, and they can propose to the tail areas more efficiently than the normal distribution.
It can help with the mixing of the Markov chain if some of the parameters have a skewed tails. See
“Example 74.6: Nonlinear Poisson Regression Models” on page 5804. The independence sampler (see the
section “Independence Sampler” on page 133 in Chapter 7, “Introduction to Bayesian Analysis Procedures”)
is used for a block of binary parameters. The inverse CDF method is used for a block that consists of a single
binary parameter.
For parameters with continuous prior distributions, you can use the slice sampler as an alternative sampling
algorithm. To do so, specify the SLICE option in the PARMS. When you specify the SLICE option, all
parameters are updated individually. PROC MCMC does not support a multivariate version of the slice
sampler. For more in information about the slice sampler, see the section“Slice Sampler” on page 133 in
Chapter 7, “Introduction to Bayesian Analysis Procedures.”
The sampling algorithms for the random-effects parameters are chosen in a similar fashion. The preferred
algorithms are the direct method either from the full conditional or the marginal. When these are not
attainable, Metropolis with normal proposal becomes the default for continuous random-effects parameters,
and discrete Metropolis with normal proposal becomes the default for discrete random-effects parameters.
You can use the ALGORITHM= option in the RANDOM statement to choose the slice sampler or discrete
Metropolis with symmetric geometric as the alternatives.
The sampling preference of the missing data variables is the same as the random-effects parameters. The
reserve sampling algorithm is the Metropolis. There is no alternative sampling method available for the
missing data variables.
5694 F Chapter 74: The MCMC Procedure
Tuning the Proposal Distribution
One key factor in achieving high efficiency of a Metropolis-based Markov chain is finding a good proposal
distribution for each block of parameters. This process is referred to as tuning. The tuning phase consists of a
number of loops. The minimum number of loops is controlled by the option MINTUNE=, with a default
value of 2. The option MAXTUNE= controls the maximum number of tuning loops, with a default value
of 24. Each loop lasts for NTU= iterations, where by default NTU= 500. At the end of every loop, PROC
MCMC examines the acceptance probability for each block. The acceptance probability is the percentage of
NTU= proposals that have been accepted. If the probability falls within the acceptance tolerance range (see
the section “Scale Tuning” on page 5694), the current configuration of c/† or p is kept. Otherwise, these
parameters are modified before the next tuning loop.
Continuous Distribution: Normal or t Distribution
A good proposal distribution should resemble the actual posterior distribution of the parameters. Large
sample theory states that the posterior distribution of the parameters approaches a multivariate normal
distribution (see Gelman et al. 2004, Appendix B, and Schervish 1995, Section 7.4). That is why a normal
proposal distribution often works well in practice. The default proposal distribution in PROC MCMC
is the normal distribution: qj .new j t / D MVN.new j t ; c 2 †/. As an alternative, you can choose a
multivariate t distribution as the proposal distribution. It is a good distribution to use if you think that the
posterior distribution has thick tails and a t distribution can improve the mixing of the Markov chain. See
“Example 74.6: Nonlinear Poisson Regression Models” on page 5804.
Scale Tuning
The acceptance rate is closely related to the sampling efficiency of a Metropolis chain. For a random
walk Metropolis, high acceptance rate means that most new samples occur right around the current data
point. Their frequent acceptance means that the Markov chain is moving rather slowly and not exploring
the parameter space fully. On the other hand, a low acceptance rate means that the proposed samples are
often rejected; hence the chain is not moving much. An efficient Metropolis sampler has an acceptance
rate that is neither too high nor too low. The scale c in the proposal distribution q.j/ effectively controls
this acceptance probability. Roberts, Gelman, and Gilks (1997) showed that if both the target and proposal
densities are normal, the optimal acceptance probability for the Markov chain should be around 0.45 in a
single dimensional problem, and asymptotically approaches 0.234 in higher dimensions. The corresponding
optimal scale is 2.38, which is the initial scale set for each block.
Due to the nature of stochastic simulations, it is impossible to fine-tune a set of variables such that the
Metropolis chain has the exact desired acceptance rate. In addition, Roberts and Rosenthal (2001) empirically
demonstrated that an acceptance rate between 0.15 and 0.5 is at least 80% efficient, so there is really no
need to fine-tune the algorithms to reach acceptance probability that is within small tolerance of the optimal
values. PROC MCMC works with a probability range, determined by the PROC options TARGACCEPT
˙ ACCEPTTOL. The default value of TARGACCEPT is a function of the number of parameters in the
model, as outlined in Roberts, Gelman, and Gilks (1997). The default value of ACCEPTTOL= is 0.075. If
the observed acceptance rate in a given tuning loop is less than the lower bound of the range, the scale is
reduced; if the observed acceptance rate is greater than the upper bound of the range, the scale is increased.
During the tuning phase, a scale parameter in the normal distribution is adjusted as a function of the observed
Tuning the Proposal Distribution F 5695
acceptance rate and the target acceptance rate. The following updating scheme is used in PROC MCMC 3 :
cnew D
ccur ˆ 1 .popt =2/
ˆ 1 .pcur =2/
where ccur is the current scale, pcur is the current acceptance rate, popt is the optimal acceptance probability.
Covariance Tuning
To tune a covariance matrix, PROC MCMC takes a weighted average of the old proposal covariance matrix
and the recent observed covariance matrix, based on NTU samples in the current loop. The TUNEWT=w
option determines how much weight is put on the recently observed covariance matrix. The formula used to
update the covariance matrix is as follows:
COVnew D w COVcur C .1
w /COVold
There are two ways to initialize the covariance matrix:
The default is an identity matrix multiplied by the initial scale of 2.38 (controlled by the PROC option
SCALE=) and divided by the square root of the number of estimated parameters in the model. It can
take a number of tuning phases before the proposal distribution is tuned to its optimal stage, since the
Markov chain needs to spend time learning about the posterior covariance structure. If the posterior
variances of your parameters vary by more than a few orders of magnitude, if the variances of your
parameters are much different from 1, or if the posterior correlations are high, then the proposal tuning
algorithm might have difficulty with forming an acceptable proposal distribution.
Alternatively, you can use a numerical optimization routine, such as the quasi-Newton method, to find
a starting covariance matrix. The optimization is performed on the joint posterior distribution, and the
covariance matrix is a quadratic approximation at the posterior mode. In some cases this is a better
and more efficient way of initializing the covariance matrix. However, there are cases, such as when
the number of parameters is large, where the optimization could fail to find a matrix that is positive
definite. In that case, the tuning covariance matrix is reset to the identity matrix.
A side product of the optimization routine is that it also finds the maximum a posteriori (MAP) estimates with
respect to the posterior distribution. The MAP estimates are used as the initial values of the Markov chain.
If any of the parameters are discrete, then the optimization is performed conditional on these discrete
parameters at their respective fixed initial values. On the other hand, if all parameters are continuous, you
can in some cases skip the tuning phase (by setting MAXTUNE=0) or the burn-in phase (by setting NBI=0).
Discrete Distribution: Symmetric Geometric
By default, PROC MCMC uses the normal density as the proposal distribution in all Metropolis random walks.
For parameters that have discrete prior distributions, PROC MCMC discretizes proposed samples. You can
choose an alternative symmetric geometric proposal distribution by specifying the option DISCRETE=GEO.
3
Roberts, Gelman, and Gilks (1997) and Roberts and Rosenthal
p (2001)
demonstrate that the relationship between acceptance
probability and scale in a random walk Metropolis is p D 2ˆ
I c=2 , where c is the scale, p is the acceptance rate, ˆ is the
CDF of a standard normal, and I Ef Œ.f 0 .x/=f .x//2 , f .x/ is the density function of samples. This relationship determines the
updating scheme, with I being replaced by the identity matrix to simplify calculation.
5696 F Chapter 74: The MCMC Procedure
The density of the symmetric geometric proposal distribution is as follows:
pg .1
2.1
pg /jj
pg /
where the symmetry centers at . The distribution has a variance of
2 D
.2
pg /.1
pg2
pg /
Tuning for the proposal pg uses the following formula:
ˆ
new
D
cur
ˆ
1 .p
opt =2/
cur =2/
1 .p
where new is the standard deviation of the new proposal geometric distribution, cur is the standard
deviation of the current proposal distribution, popt is the target acceptance probability, and pcur is the
current acceptance probability for the discrete parameter block.
The updated pg is the solution to the following equation that is between 0 and 1 :
s
cur ˆ 1 .popt =2/
.2 pg /.1 pg /
D
pg2
ˆ 1 .pcur =2/
Binary Distribution: Independence Sampler
Blocks consisting of a single parameter with a binary prior do not require any tuning; the inverse-CDF method
applies. Blocks that consist of multiple parameters with binary prior are sampled by using an independence
sampler with binary proposal distributions. See the section “Independence Sampler” on page 133 in Chapter 7,
“Introduction to Bayesian Analysis Procedures.” During the tuning phase, the success probability p of the
proposal distribution is taken to be the probability of acceptance in the current loop. Ideally, an independence
sampler works best if the acceptance rate is 100%, but that is rarely achieved. The algorithm stops when the
probability of success exceeds the TARGACCEPTI=value, which has a default value of 0.6.
Direct Sampling
The word “direct” is reserved for sampling that is done directly from the prior distribution of a model or a
random-effects parameter or from the sampling distribution of a missing data variable. If the parameter is
updated via sampling from its full conditional posterior distribution, the sampling method is referred to as
conjugate sampling. (See the section “Conjugate Sampling” on page 5697.)
Whenever a parameter does not appear in the hierarchy of another parameter in the model, PROC MCMC
samples directly from its distribution. For a model parameter or a random-effects parameter, this distribution
is its prior distribution. For a missing data variable, this distribution is the sampling distribution of the
response variable. Therefore, direct sampling takes place most frequently in data-independent Monte Carlo
simulations or the sampling of missing response variables.
Conjugate Sampling F 5697
Conjugate Sampling
Conjugate prior is a family of prior distributions in which the prior and the posterior distributions are of the
same family of distributions. For example, if you model an independently and identically distributed random
variable yi using a normal likelihood with known variance 2 ,
yi normal.; 2 /
a normal prior on normal.0 ; 02 /
is a conjugate prior because the posterior distribution of is also a normal distribution given y D fyi g, 2 ,
0 , and 02 :
0
! 1
!
! 11
n
n
y
N
1
n
1
0
A
C 2 ;
C 2
jy normal @ 2 C 2
0
02
02
Conjugate sampling is efficient because it enables the Markov chain to obtain samples from the target
distribution directly. When appropriate, PROC MCMC uses conjugate sampling methods to draw conditional
posterior samples. Table 74.6 lists scenarios that lead to conjugate sampling in PROC MCMC.
Table 74.6 Conjugate Sampling in PROC MCMC
Family
Parameter
Prior
Normal with known Normal with known Normal with known scale parameter ( 2 , , or )
Multivariate normal with known †
Multivariate normal with known Multinomial
Binomial/binary
Poisson
Variance 2
Precision Mean Mean Covariance †
p
p
Inverse gamma family
Gamma family
Normal
Multivariate normal
Inverse Wishart
Dirichlet
Beta
Gamma family
In most cases, Family in Output 74.6 refers to the likelihood function. However, it does not necessarily have
to be the case. The Family is a distribution that is conditional on the parameter of interest, and it can appear
in any level of the hierarchical model, including on the random-effects level.
PROC MCMC can detect conjugacy only if the model parameter (not a function or a transformation of the
model parameter) is used in the prior and Family distributions. For example, the following statements lead to
a conjugate sampler being used on the parameter mu:
parm mu;
prior mu ~ n(0, sd=1000);
model y ~ n(mu, var=s2);
However, if you modify the program slightly in the following way, although the conjugacy still holds in
theory, PROC MCMC cannot detect conjugacy on mu because the parameter enters the normal likelihood
5698 F Chapter 74: The MCMC Procedure
function through the symbol w:
parm mu;
prior mu ~ n(0, sd=1000);
w = mu;
model y ~ n(w, var=s2);
In this case, PROC MCMC resorts to the default sampling algorithm, which is a random walk Metropolis
based on a normal kernel.
Similarly, the following statements also prevent PROC MCMC from detecting conjugacy on the parameter
mu:
parm mu;
prior mu ~ n(0, sd=1000);
model y ~ n(mu + 2, var=s2);
In a normal family, an often-used and often-confused conjugate prior on the variance is the inverse gamma
distribution, and a conjugate prior on the precision is the gamma distribution. See “Gamma and InverseGamma Distributions” on page 5746 for typical usages of these prior distributions.
When conjugacy is detected in a model, PROC MCMC performs a numerical optimization on the joint
posterior distribution at the start of the MCMC simulation. If the only sampling methods required in the
program are conjugate samplers or direct samplers, PROC MCMC omits this optimization step. To turn off
this optimization routine, use the PROPCOV=IND option in the PROC MCMC statement.
Initial Values of the Markov Chains
There are three types of parameters in a PROC MCMC program: the model parameters in the PARMS
statement, the random-effects parameters in the RANDOM statement, and the missing data variables in the
MODEL statement. The last category is used to model missing values in the input data set.
When the model parameters and random-effects parameters have missing initial values, PROC MCMC
generates initial values based on the prior distributions. PROC MCMC either uses the mode value (the
default) or draws a random number (if the INIT=RANDOM option is specified). For distributions that do
not have modes, such as the uniform distribution, PROC MCMC uses the mean instead. In general, PROC
MCMC avoids using starting values that are close to the boundary of support of the prior distribution. For
example, the exponential prior has a mode at 0, and PROC MCMC starts an initial value at the mean. This
avoids some potential numerical problems. If you use the GENERAL or DGENERAL function in the PRIOR
statements, you must provide initial values for those parameters.
With missing data variables, PROC MCMC uses the sample average of the nonmissing values (of the response
variable) as the initial value. If all values of a particular variable are missing, PROC MCMC resorts to
using the mode value or a random number from the sampling distribution (the likelihood), depending on the
specification of the INIT= option.
To assign a different set of initial values to the model parameters, you use either the PARMS statements
or programming statements within the BEGINCNST and ENDCNST statements. See the section “Assignments of Parameters” on page 5699 for more information about how to assign parameter values within the
BEGINCNST and ENDCNST statements.
Assignments of Parameters F 5699
To assign initial values to the random-effects parameters, you can use the INIT= option in the RANDOM
statement. Either you can give a constant value to all random-effects parameters that are associated with that
statement (for example, use init=3), or you can assign values individually by providing a data set that stores
different values for different parameters.
A mirroring INIT= option in the MODEL statement enables you to assign different initial values to the
missing data variables.
If you use the PROPCOV= optimization option in the PROC MCMC statement, PROC MCMC starts the
tuning at the optimized values. PROC MCMC overwrites the initial values that you might have provided at
the beginning of the Markov chain unless you use the option INIT=REINIT.
Assignments of Parameters
In general, you cannot alter the values of any model parameters in PROC MCMC. For example, the following
assignment statement produces an error:
parms alpha;
alpha = 27;
This restriction prevents incorrect calculation of the posterior density—assignments of parameters in the
program would override the parameter values generated by PROC MCMC and lead to an incorrect value of
the density function.
However, you can modify parameter values and assign initial values to parameters within the block defined
by the BEGINCNST and ENDCNST statements. The following syntax is allowed:
parms alpha;
begincnst;
alpha = 27;
endcnst;
The initial value of alpha is 27. Assignments within the BEGINCNST/ENDCNST block override initial
values specified in the PARMS statement. For example, with the following statements, the Markov chain
starts at alpha = 27, not 23.
parms alpha 23;
begincnst;
alpha = 27;
endcnst;
This feature enables you to systematically assign initial values. Suppose that z is an array parameter of the
same length as the number of observations in the input data set. You want to start the Markov chain with
each zi having a different value depending on the data set variable y. The following statements set zi D jyj
for the first half of the observations and zi D 2:3 for the rest:
/* a rather artificial input data set. */
data inputdata;
do ind = 1 to 10;
y = rand('normal');
output;
end;
5700 F Chapter 74: The MCMC Procedure
run;
proc mcmc data=inputdata;
array z[10];
begincnst;
if ind <= 5 then z[ind] = abs(y);
else z[ind] = 2.3;
endcnst;
parms z:;
prior z: ~ normal(0, sd=1);
model general(0);
run;
Elements of z are modified as PROC MCMC executes the programming statements between the BEGINCNST
and ENDCNST statements. This feature could be useful when you use the GENERAL function and you find
that the PARMS statements are too cumbersome for assigning starting values.
Standard Distributions
The section “Univariate Distributions” on page 5700 (Table 74.7 through Table 74.36) lists all univariate
distributions that PROC MCMC recognizes. The section “Multivariate Distributions” on page 5712 (Table 74.37 through Table 74.41) lists all multivariate distributions that PROC MCMC recognizes. With the
exception of the multinomial distribution, all these distributions can be used in the MODEL, PRIOR, and
HYPERPRIOR statements. The multinomial distribution is supported only in the MODEL statement. The
RANDOM statement supports a limited number of distributions; see Table 74.4 for the complete list.
See the section “Using Density Functions in the Programming Statements” on page 5716 for information
about how to use distributions in the programming statements. To specify an arbitrary distribution, you
can use the GENERAL and DGENERAL functions. See the section “Specifying a New Distribution” on
page 5715 for more details. See the section “Truncation and Censoring” on page 5719 for tips about how to
work with truncated distributions and censoring data.
Univariate Distributions
Table 74.7
PROC specification
Beta Distribution
beta(a, b)
Density
€.aCb/ a 1
.1
€.a/€.b/
Parameter restriction
a
8 > 0, b > 0
ˆ Œ0; 1 when a D 1; b D 1
ˆ
ˆ
ˆ
ˆ
< Œ0; 1/ when a D 1; b ¤ 1
ˆ
.0; 1 when a ¤ 1; b D 1
ˆ
ˆ
ˆ
ˆ
: .0; 1/ otherwise
Range
Mean
a
aCb
Variance
ab
.aCb/2 .aCbC1/
/b
1
Standard Distributions F 5701
8
a 1
ˆ
ˆ
ˆ
aCb 2
ˆ
ˆ
ˆ
ˆ
0 and 1
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
< 0
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
1
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
: does not exist uniquely
Mode
Random number
Table 74.8
a D 1; b > 1
(
a 1; b < 1
a > 1; b D 1
aDbD1
Binary Distribution
binary(p)
Density
p .1
Parameter restriction
0p1
8
ˆ
when p D 0
ˆ
< f0g
f1g
when p D 1
ˆ
ˆ
: f0; 1g otherwise
p/1
Mean
round.p/
Variance
p.1 p/
(
f1g when p D 1
Mode
a < 1; b < 1
(
a < 1; b 1
If min.a; b/ > 1, see (Cheng 1978); if max.a; b/ < 1, see (Atkinson and Whittaker 1976) and (Atkinson 1979); if min.a; b/ < 1
and max.a; b/ > 1, see (Cheng 1978); if a = 1 or b = 1, use the
inversion method; if a D b D 1, use a uniform random number
generator.
PROC specification
Range
a > 1; b > 1
f0g otherwise
Random number
Table 74.9
PROC specification
Density
Generate u uniform.0; 1/. If u p, D 1; else, D 0:
Binomial Distribution
binomial(n, p)
!
n
p .1 p/n
Parameter restriction
n D 0; 1; 2; : : : 0 p 1
Range
2 f0; : : : ; ng
Mean
bnpc
Variance
np.1
Mode
b.n C 1/pc
p/
5702 F Chapter 74: The MCMC Procedure
Table 74.10
PROC specification
Cauchy Distribution
cauchy(a, b)
1
Density
b
b 2 C. a/2
Parameter restriction
b>0
Range
2 . 1; 1/
Mean
Does not exist.
Variance
Does not exist.
Mode
a
Random number
Generate u1 ; u2 uniform.0; 1/; let v D 2u2 1. Repeat the
procedure until u21 C v 2 < 1. y D v=u1 is a draw from the
standard Cauchy, and D a C by (Ripley 1987).
Table 74.11 2 Distribution
PROC specification
chisq()
Density
1
.=2/ 1 e =2
€.=2/2=2
Parameter restriction
>0
Range
2 Œ0; 1/ if D 2; .0; 1/ otherwise.
Mean
Variance
2
Mode
Random number
2
Table 74.12
2 if 2; does not exist otherwise.
is a special case of the gamma distribution:
gamma.=2; scale=2/ is a draw from the 2 distribution.
Exponential 2 Distribution
PROC specification
expchisq()
Density
1
€.=2/2=2
Parameter restriction
>0
Range
2 . 1; 1/
Mode
log./
Random number
Generate x1 2 ./, and D log.x1 / is a draw from the exponential 2 distribution.
Relationship to the 2
distribution
2 ./ , log. / exp 2 ./
exp. /=2 exp. exp. /=2/
Standard Distributions F 5703
Table 74.13
Exponential Exponential Distribution
PROC specification
expexpon(scale = b )
expexpon(iscale = ˇ )
Density
1
b
ˇ exp. / exp. exp. / ˇ/
Parameter restriction
b>0
ˇ>0
Range
2 . 1; 1/
Same
Mode
log.b/
log.1=ˇ/
Random number
Generate x1 expon.scale=b/, and D log.x1 / is a draw from
the exponential exponential distribution. Note that an exponential
exponential distribution is not the same as the double exponential
distribution.
exp. / exp. exp. /=b/
Relationship to the ex- expon.b/ , log. / expExpon.b/
ponential distribution
Table 74.14
Exponential Gamma Distribution
PROC specification
expgamma(a, scale = b )
expgamma(a, iscale = ˇ )
Density
1
e a
b a €.a/
ˇ a a
e
€.a/
Parameter restriction
a > 0; b > 0
a > 0; ˇ > 0
Range
2 . 1; 1/
Same
Mode
log.ab/
log.a=ˇ/
Random number
Generate x1 gamma.a; scale D b/, and D log.x1 / is a draw
from the exponential gamma distribution.
Relationship to the €
distribution
gamma.a; b/ , log. / expGamma.a; b/
Table 74.15
exp. e =b/
Exponential Inverse 2 Distribution
PROC specification
expichisq()
Density
1
€. 2 /2=2
Parameter restriction
>0
Range
2 . 1; 1/
Mode
exp. e ˇ/
exp. =2/ exp. 1=.2 exp. ///
log./
Random number
Generate x1 i2 ./, and D log.x1 / is a draw from the exponential inverse 2 distribution.
Relationship to the i2
distribution
i2 ./ , log. / exp i2 ./
5704 F Chapter 74: The MCMC Procedure
Table 74.16
PROC specification
Exponential Inverse-Gamma Distribution
expigamma(a, scale = b )
ba
exp. ˛ / exp. b= exp. //
expigamma(a, iscale = ˇ )
1
ˇ ˛ €.a/
exp. ˛ / exp.
Density
€.a/
Parameter restriction
a > 0; b > 0
a > 0; ˇ > 0
Range
2 . 1; 1/
Same
Mode
log.a=b/
1
/
ˇ exp. /
log.aˇ/
Random number
Generate x1 igamma.a; scale D b/, and D log.x1 / is a draw
from the exponential inverse-gamma distribution.
Relationship to the
i€ distribution
igamma.a; b/ , log. / eigamma.a; b/
Table 74.17
Exponential Scaled Inverse 2 Distribution
PROC specification
expsichisq(, s)
Density
. 2 /=2 s
€. 2 /
Parameter restriction
> 0; s > 0
Range
2 . 1; 1/
Mode
log.s 2 /
Random number
Generate x1 si2 .; s/, and D log.x1 / is a draw from the
exponential scaled inverse 2 distribution.
Relationship to the si2
distribution
si2 .; s/ , log. / exp si2 .; s/
Table 74.18
exp. =2/ exp. s 2 =.2 exp. ///
Exponential Distribution
PROC specification
expon(scale = b )
expon(iscale = ˇ )
Density
1
e =b
b
ˇe
Parameter restriction
b>0
ˇ>0
Range
2 Œ0; 1/
Same
Mean
b
1=ˇ
Variance
b2
1=ˇ 2
Mode
0
0
Random number
The exponential distribution is a special case of the gamma distribution: gamma.1; scale D b/ is a draw from the exponential
distribution.
ˇ
Standard Distributions F 5705
Table 74.19
Gamma Distribution
PROC specification
gamma(a, scale = b )
gamma(a, iscale = ˇ )
Density
1
a 1 e =b
b a €.a/
ˇa a 1
e ˇ
€.a/
Parameter restriction
a > 0; b > 0
a > 0; ˇ > 0
Range
2 Œ0; 1/ if a D 1I .0; 1/ oth- Same
erwise.
Mean
ab
a=ˇ
Variance
ab 2
a=ˇ 2
Mode
.a
Random number
See (McGrath and Irving 1973).
Table 74.20
PROC specification
Density
*
Parameter restriction
Range
1/b if a 1
.a
1/=ˇ if a 1
Geometric Distribution
geo(p)
p.1
p/
0<p1
(
f0; 1; 2; : : :g 0 < p < 1
2
f0g
pD1
Mean
round( 1 pp )
Variance
1 p
p2
Mode
0
Random number
Based on samples obtained from a Bernoulli distribution with probability p until the first success.
* The random variable is the total number of failures in an experiment before the first success. This density function is
not to be confused with another popular formulation, p.1
success.
p/
1,
which counts the total number of trials until the first
5706 F Chapter 74: The MCMC Procedure
Inverse 2 Distribution
Table 74.21
PROC specification
ichisq()
Density
1
.=2C1/ e 1=.2 /
€.=2/2=2
Parameter restriction
>0
Range
2 .0; 1/
Mean
1
2
Variance
2
if
. 2/2 . 4/
1
C2
Inverse 2 is
Mode
Random number
>4
a special case of the inverse-gamma distribution:
igamma.=2; iscale D 2/ is a draw from the inverse 2 distribution.
Table 74.22
PROC specification
if > 2
Inverse-Gamma Distribution
igamma(a, scale = b )
ba
.aC1/ e b=
igamma(a, iscale = ˇ )
1
.aC1/ e 1=ˇ
ˇ a €.a/
Density
€.a/
Parameter restriction
a > 0; b > 0
a > 0; ˇ > 0
Range
2 .0; 1/
Same
Mean
Variance
Mode
b
a 1
if a > 1
b2
.a 1/2 .a 2/
b
aC1
1
if a > 1
ˇ .a 1/
1
ˇ 2 .a 1/2 .a 2/
1
ˇ .aC1/
Random number
Generate x1 gamma.a; scale D b/, and D 1=x1 is a draw
from the igamma.a; iscale D b/ distribution.
Relationship to the
gamma distribution
gamma.a; iscale D b/ , 1= igamma.a; scale D b/
Standard Distributions F 5707
Table 74.23
Laplace (Double Exponential) Distribution
PROC specification
laplace(a, scale = b)
laplace(a, iscale = ˇ)
Density
1
e j aj=b
2b
ˇ
ˇ j aj
2e
Parameter restriction
b>0
ˇ>0
Range
2 . 1; 1/
Same
Mean
a
a
Variance
2b 2
2=ˇ 2
Mode
a
Random number
Inverse CDF. F . /
Parameter restriction
Range
2 . 1; 1/
Mean
a
Variance
2 b2
3
Mode
a
Random number
<a
Logistic Distribution
logistic(a, b)
exp. b a /
2
b .1Cexp. b a //
b>0
Density
a b
:
a
: 1 1 exp
a
2
b
Generate u1 ; u2 uniform.0; 1/. If u1 < 0:5; D a C b log.u2 /I
else D a b log.u2 /. is a draw from the Laplace distribution.
Table 74.24
PROC specification
D
8 a
< 1 exp
2
1
Inverse CDF method with F . / D 1 C exp. b a /
. Generate
u uniform.0; 1/, and D a b log.1=u 1/ is a draw from the
logistic distribution.
5708 F Chapter 74: The MCMC Procedure
Table 74.25
Lognormal Distribution
PROC speci- lognormal(, sd = s)
lognormal(, var = v)
lognormal(, prec = )
fication
q
.log /2
.log /2
.log /2
1
1
1
p
p
Density
exp
exp
exp
2v
2
2
2s 2
s
2
Parameter re- s > 0
striction
Range
Mean
Variance
2 .0; 1/
exp. C
s 2 =2/
2v
v>0
>0
Same
Same
exp. C v=2/
exp. C 1=.2 //
exp .2. C s 2 //
exp .2. C v//
exp .2. C 1= //
s2/
exp .2 C v/
exp .2 C 1= /
exp .2 C
s2/
v/
exp.
Random
number
Generate x1 normal.0; 1/, and D exp. C sx1 / is a draw from the
lognormal distribution.
Table 74.26
PROC specification
Density
Parameter restriction
Range
Mean
Variance
Mode
Random number
exp.
exp.
1= /
Mode
Negative Binomial Distribution
negbin(n, p)
Cn
n
1
1
!
p n .1
p/
n D 1; 2; : : :; and0 < p 1
(
f0; 1; 2; : : :g 0 < p < 1
2
f0g
pD1
round n.1p p/
n.1 p/
8 p2
< 0
: round
nD1
.n 1/.1 p/
p
n>1
Generate x1 gamma.n; 1/, and Poisson.x1 .1
(Fishman 1996).
p/=p/
Standard Distributions F 5709
Table 74.27
Normal Distribution
normal(, prec = )
PROC speci- normal(, sd = s)
fication
. /2
p1
Density
exp
2
2s
normal(, var = v)
Parameter re- s > 0
striction
v>0
>0
s
p1
2v
2
exp
. /2
2v
q
2
Range
2 . 1; 1/
Same
Same
Mean
Same
Same
Variance
s2
v
1=
Mode
Same
Same
Table 74.28
exp
. /2
2
NormalCAR Distribution
PROC speci- normalcar(neighbors=, normalcar(neighbors=, normalcar(neighbors=,
fication
num=, sd= s)
num=, var= v)
num=, prec= )
Density
i jPi
N. j 2N.i / j =mi ; s 2 /
i jPi
N. j 2N.i / j =mi ; v/
Notation
i is the area or site index, mi is the number of neighbors to i, and N.i / is
the set of neighbors to i.
Parameter re- s > 0
striction
i jPi
N. j 2N.i / j =mi ; 1= /
v>0
>0
Range
2 . 1; 1/
Same
Same
Mean
average of neighbors
Same
Same
Variance
s2
v
1=
Mode
average of neighbors
Same
Same
Table 74.29
PROC specification
Density
Pareto Distribution
pareto(a, b)
aC1
a
b
b
Parameter restriction
a > 0; b > 0
Range
2 Œb; 1/
Mean
ab
a 1
Variance
b2 a
.a 1/2 .a 2/
Mode
b
Random number
Inverse CDF method with F . / D 1 .b= /a . Generate u b
uniform.0; 1/, and D u1=a
is a draw from the Pareto distribution.
Useful transformation
x D 1= is Beta(a, 1)I{x < 1=b}.
if a > 1
if a > 2
5710 F Chapter 74: The MCMC Procedure
Table 74.30
Poisson Distribution
PROC specification
poisson()
Density
Š
exp. /
Parameter restriction
0
(
Range
2
Mean
Variance
, if > 0
Mode
round./
Table 74.31
f0; 1; : : :g if > 0
f0g
if D 0
Scaled Inverse 2 Distribution
PROC specification
sichisq(; s 2 )
Density
2
.s 2 =2/=2
.=2C1/ e s =.2 /
€.=2/
Parameter restriction
> 0; s > 0
Range
2 .0; 1/
Mean
2
2 s if > 2
2 2
s 4 if
. 2/2 . 4/
2
C2 s
Variance
Mode
>4
Scaled inverse 2 is a special case of the inverse-gamma distribution: igamma.=2; scale D .s 2 /=2/ is a draw from the
scaled inverse 2 distribution.
Random number
Table 74.32 t Distribution
PROC
specification
Density
t(, sd = s, )
€. C1
2p /
.1
€. 2 /s C
Parm re- s > 0, > 0
striction
t(, var = v, )
. /2
/
s 2
C1
2
€. C1
2 /
p
.1
€. 2 / v
C
t(, prec = , )
. /2
v /
C1
2
p
2
€. C1
2 p/ .1 C . / /
€. 2 / v > 0, > 0
> 0, > 0
Range
2 . 1; 1/
Same
Same
Mean
if > 1
Same
Same
Variance
2
2s
2v
Mode
Random
number
if > 2
Same
if > 2
1
2
if > 2
Same
p
x1 normal.0; 1/; x2 2 .d /; and D m C x1 d=x2 is a draw from the t
distribution.
C1
2
Standard Distributions F 5711
Table 74.33
Table (Categorical) Distribution
PROC specification
table(p), where p D fpi g, for i D 1; 2; : : : ; k
Density
Parameter restriction
f . D i / D pi
Pk
i pi D 1 with all pi > 0
Range
2 f1; 2; : : : ; kg
Mode
i such that pi D max.p1 ; : : : ; pk /
Random number
Inverse CDF method with F . D i / D
Table 74.34
PROC specification
Density
Pi
j D1 pj .
Uniform Distribution
uniform(a, b)
8
1
ˆ
ˆ
< a b if a > b
1
if b > a
b a
ˆ
ˆ
: 1
if a D b
Parameter restriction
none
Range
2 Œa; b
Mean
Variance
aCb
2
jb aj2
12
Mode
Does not exist
Random number
Mersenne Twister (Matsumoto and Kurita 1992, 1994; Matsumoto
and Nishimura 1998)
Table 74.35
Wald Distribution
Density
wald(, )
q
exp
2 3
Parameter restriction
> 0; > 0
Range
2 .0; 1/
Mean
Variance
Mode
3 =
1C
Random number
Generate 0 2.1/ . Let x1 D C
PROC specification
92
42
. /2
22 1=2
3
2
2 0
2
2
q
40 C 2 02
and x2 D 2 =x1 . Perform a Bernoulli trial, w Bernoulli. Cx
/.
1
If w D 1, choose D x1 ; otherwise, choose D x2 (Michael,
Schucany, and Haas 1976).
5712 F Chapter 74: The MCMC Procedure
Table 74.36
Weibull Distribution
Density
weibull(, c, )
c c
c exp
Parameter restriction
c > 0; > 0
Range
2 Œ; 1/ if c D 1I .; 1/ otherwise
Mean
C €.1 C 1=c/
Variance
2 Œ€.1 C 2=c/
Mode
C .1
Random number
Inverse CDF method with F . / D 1
PROC specification
1
€ 2 .1 C 1=c/
1=c/1=c if c > 1
exp
ate u uniform.0; 1/, and D C .
the Weibull distribution.
c 1=c
ln u/ is a
. Gener-
draw from
Multivariate Distributions
Table 74.37
Dirichlet Distribution
PROC specification
Density
dirich(˛), where D fi g ; ˛ D f˛i g, for i D 1; 2; : : : ; k
Qk
P
˛i 1
€.˛0 /
, where ˛0 D kiD1 ˛i
Qk
i D1 i
Parameter restriction
Range
Mean
Mode
˛i > 0
P
i > 0, kiD1 i D 1
˛j =˛0 ˛j 1 =.˛0 k/
iD1
Table 74.38
€.˛i /
Inverse Wishart Distribution
Density
iwishart(, S), both and S are k k matrices
k k.k 1/ Q
1 CkC1
k
C1 i
2
€
jSj 2 jj
exp
22 4
i D1
2
Parameter restriction
Range
Mean
Mode
S must be symmetric and positive definite; > k
is symmetric and positive definite
S =. k 1/
S =. C k C 1/
PROC specification
Table 74.39
1
1/
2 tr.S
1
Multivariate Normal Distribution
Density
mvn(, †), where D fk g ; D fk g, for i D 1; 2; : : : ; k,
and † is a k k variance matrix.
p
exp 12 . /0 † 1 . /
.2/k j†j
Parameter restriction
Range
Mean
Mode
† must be symmetric and positive definite
1 < i < 1
PROC specification
Usage of Multivariate Distributions F 5713
Table 74.40
Autoregressive Multivariate Normal
Distribution
PROC speci- MVNAR(, sd=,)
fication
Density
exp
1
2 .
2
6
6
6
6
†D6
6
6
4
/0 . 2 †/
1
2
3
::
:
Parameter restriction
Range
Mean
Mode
Special Case
1
MVNAR(, MVNAR(, prec=1= 2 ,
)
ˇ
ˇ
.q
/
.2/k ˇ. 2 †/ˇ where
var= 2 ,)
1 .
2
1
::
:
1
2
::
:
k k
k
3
2
1
::
:
2
k
3
k
k 1
k 2
k 3
::
::
:
:
1
3
7
7
7
7
7
7
7
5
> 0 and 1 < < 1
1 < i < 1
When D 0, the distribution simplifies to mvn(, 2 Ik ), where Ik denotes the
k k identity matrix
Table 74.41
PROC specification
Density
Parameter restriction
Range
Mean
Multinomial Distribution
multinom(p), where D fi g and p D fpi g, for i D
1; 2; : : : ; k
Pk
k
1
nŠ
,
where
p
p
i i D n
1
1 k
k
P
k
i pi D 1 with all pi > 0
i 2 f0; : : : ; ng, nonnegative integers
np
Usage of Multivariate Distributions
The following simple example illustrates the usage of the multivariate distributions in PROC MCMC. Suppose
you are interested in estimating the mean and covariance of multivariate data using this multivariate normal
model:
x1
1
11 12
MVN D
;† D
x2
2
21 22
5714 F Chapter 74: The MCMC Procedure
where
1
D
2
2:4 3
† D
3 8:1
You can use the following independent prior on and †:
0
100 0
MVN 0 D
; †0 D
0 100
0
1 0
† iWishart D 2; S D
0 1
The following IML procedure statements simulate 100 random multivariate normal samples:
title 'An Example that Uses Multivariate Distributions';
proc iml;
N = 100;
Mean = {1 2};
Cov = {2.4 3, 3 8.1};
call randseed(1);
x = RANDNORMAL( N, Mean, Cov );
SampleMean = x[:,];
n = nrow(x);
y = x - repeat( SampleMean, n );
SampleCov = y`*y / (n-1);
print SampleMean Mean, SampleCov Cov;
cname = {"x1", "x2"};
create inputdata from x [colname = cname];
append from x;
close inputdata;
quit;
Figure 74.13 prints the sample mean and covariance of the simulated data, in addition to the true mean and
covariance matrix.
Figure 74.13 Simulated Multivariate Normal Data
An Example that Uses Multivariate Distributions
SampleMean
Mean
0.9987751 2.115693
SampleCov
2.8252975 3.7190704
3.7190704 9.2916805
1 2
Cov
2.4
3
3 8.1
Specifying a New Distribution F 5715
The following PROC MCMC statements estimate the posterior mean and covariance of the multivariate
normal data:
proc mcmc data=inputdata seed=17 nmc=3000 diag=none;
ods select PostSumInt;
array data[2] x1 x2;
array mu[2];
array Sigma[2,2];
array mu0[2] (0 0);
array Sigma0[2,2] (100 0 0 100);
array S[2,2] (1 0 0 1);
parm mu Sigma;
prior mu ~ mvn(mu0, Sigma0);
prior Sigma ~ iwish(2, S);
model data ~ mvn(mu, Sigma);
run;
To use the multivariate distribution, you must specify parameters (or random variables in the MODEL
statement) in an array form. The first ARRAY statement creates an one-dimensional array data, which
contains two numeric variables, x1 and x2, from the input data set. The data variable is your response
variable. The subsequent statements defines two array-parameters (mu and Sigma) and three constant
array-hyperparameters (mu0, Sigma0, and S). The PARMS statement declares mu and Sigma to be model
parameters. The two PRIOR statements specify the multivariate normal and inverse Wishart distributions as
the prior for mu and Sigma, respectively. The MODEL statement specifies the multivariate normal likelihood
with data as the random variable, mu as the mean, and Sigma as the covariance matrix.
Figure 74.14 lists the estimated posterior statistics for the parameters.
Figure 74.14 Estimated Mean and Covariance
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
mu1
3000 0.9941
0.1763 0.6338
1.3106
mu2
3000 2.1135
0.3112 1.4939
2.7165
Sigma1
3000 2.8726
0.4084 2.1001
3.6723
Sigma2
3000 3.7573
0.6418 2.5791
5.0223
Sigma3
3000 3.7573
0.6418 2.5791
5.0223
Sigma4
3000 9.3987
1.3224 7.0155 12.0969
Specifying a New Distribution
To work with a new density that is not listed in the section “Standard Distributions” on page 5700, you can
use the GENERAL and DGENERAL functions. The letter “D” stands for discrete. The new distributions
have to be specified on the logarithm scale.
Suppose you want to use the inverse-beta distribution:
p.˛ja; b/ D
€.a C b/
˛ .a
€.a/ C €.b/
1/
.1 C ˛/
.aCb/
5716 F Chapter 74: The MCMC Procedure
The following statements in PROC MCMC define the density on its log scale:
a = 3; b = 5;
const = lgamma(a + b) - lgamma(a) - lgamma(b);
lp = const + (a - 1) * log(alpha) - (a + b) * log(1 + alpha);
prior alpha ~ general(lp);
The symbol lp is the expression for the log of an inverse-beta (a = 3, b = 5). The function general(lp)
assigns that distribution to alpha. The constant term, const, can be omitted because the Markov simulation
requires only the log of the density kernel.
You can use the GENERAL function to specify a distribution for a single variable or for multiple variables. It
is important to emphasize that the argument lp is an expression for the log of the joint distribution for these
variables. On the contrary, any standard distribution is applied separately to each random variable in the
statement.
When you use the GENERAL function in the MODEL statement, you do not need to specify the dependent
variable on the left of the tilde Ï. The log-likelihood function takes the dependent variable into account;
hence, there is no need to explicitly state the dependent variable in the MODEL statement. However, in the
PRIOR and RANDOM statements, you need to explicitly state the parameter names and a tilde with the
GENERAL function.
You can specify any distribution function by using the GENERAL and DGENERAL functions as long as
the distribution function is programmable with SAS statements. When the function is used in the PRIOR
statements, you must supply initial values in either the PARMS statement or within the BEGINCNST and
ENDCNST statements. See the sections “PARMS Statement” on page 5674 and “BEGINCNST/ENDCNST
Statement” on page 5663. When the function is used in the RANDOM statement, you must use the INITIAL=
option in the RANDOM statement to supply initial values
N OTE : PROC MCMC does not verify that the GENERAL function you specify is a valid distribution—that
is, an integrable density. You must use the function with caution.
Using Density Functions in the Programming Statements
Density Functions in PROC MCMC
PROC MCMC has a number of internally defined log-density functions for univariate and multivariate
distributions. These functions have the basic form of LPDFdist (x , parm-list ), where dist is the name of the
distribution (see Table 74.42 for univariate distributions and Table 74.43 for multivariate distributions). The
argument x is the random variable, and parm-list is the list of parameters.
In addition, the univariate functions allow for optional boundary arguments, such as LPDFdist (x , parm-list ,
< lower >, < upper >), where lower and upper are optional but positional boundary arguments. With the
exception of the Bernoulli and uniform distribution, you can specify limits on all univariate distributions.
To set a lower bound on the normal density:
lpdfnorm(x, 0, 1, -2);
To set just an upper bound, specify a missing value for the lower bound argument:
Using Density Functions in the Programming Statements F 5717
lpdfnorm(x, 0, 1, ., 2);
Leaving both limits out gives you the unbounded density. You can also specify both bounds:
lpdfnorm(x, 0, 1);
lpdfnorm(x, 0, 1, -3, 4);
See Table 74.42 for the function names of univariate distributions and Table 74.43 for multivariate distributions.
Table 74.42 Logarithm of Univariate Density Functions in PROC
MCMC
Distribution Name
Function Call
Beta
lpdfbeta(x, a, b,< lower >, < upper >);
Binary
lpdfbern(x, p);
Binomial
lpdfbin(x, n,p, < lower >, < upper >);
Cauchy
lpdfcau(x, loc, scale, < lower >, < upper >);
2
lpdfchisq(x, df,< lower >, < upper >);
Exponential 2
lpdfechisq(x, df, < lower >, < upper >);
Exponential gamma
lpdfegamma(x, sp,scale, < lower >, < upper >);
Exponential exponential
Exponential inverse
2
Exponential inversegamma
Exponential scaled
inverse 2
Exponential
lpdfeexpon(x, scale,< lower >, < upper >);
lpdfexpon(x, scale, < lower >, < upper >);
Gamma
lpdfgamma(x, sp, scale, < lower >, < upper >);
Geometric
lpdfgeo(x, p, < lower >, < upper >);
Inverse 2
lpdfichisq(x, df, < lower >, < upper >);
Inverse-gamma
lpdfigamma(x, sp, scale, < lower >, < upper >);
Laplace
lpdfdexp(x, loc, scale, < lower >, < upper >);
lpdfeichisq(x, df, < lower >, < upper >);
lpdfeigamma(x, sp, scale, < lower >, < upper >);
lpdfesichisq(x, df, scale, < lower >, < upper >);
5718 F Chapter 74: The MCMC Procedure
Table 74.42 (continued)
Distribution Name
Function Call
Logistic
lpdflogis(x, loc, scale, < lower >, < upper >);
Lognormal
lpdflnorm(x, loc, sd, < lower >, < upper >);
Negative binomial
lpdfnegbin(x, n, p, < lower >, < upper >);
Normal
lpdfnorm(x, mu, sd, < lower >, < upper >);
Pareto
lpdfpareto(x, sp, scale, < lower >, < upper >);
Poisson
lpdfpoi(x, mean, < lower >, < upper >);
Scaled inverse 2
lpdfsichisq(x, df, scale, < lower >, < upper >);
t
lpdft(x, mu, sd, df, < lower >,< upper >);
Uniform
lpdfunif(x, a, b);
Wald
lpdfwald(x, mean, scale, < lower >, < upper >);
Weibull
lpdfwei(x, loc, sp, scale, < lower >, < upper >);
In the multivariate log-density functions, arrays must be used in place for the random variable and parameters
in the model.
Table 74.43 Logarithm of Multivariate Density Functions in
PROC MCMC
Distribution Name
Function Call
Dirichlet
lpdfdirich(x_array, alpha_array );
Inverse Wishart
lpdfiwish(x_array, df, S_array );
Multivariate normal
lpdfmvn(x_array, mu_array, cov_array );
Multinomial
lpdfmnom(x_array, p_array );
Truncation and Censoring F 5719
Standard Distributions, the LOGPDF Functions, and the LPDFdist Functions
Standard distributions listed in the section “Standard Distributions” on page 5700 are names only, and they
can be used only in the MODEL, PRIOR, and HYPERPRIOR statements to specify either a prior distribution
or a conditional distribution of the data given parameters. They do not return any values, and you cannot use
them in the programming statements.
The LOGPDF functions are DATA step functions that compute the logarithm of various probability density
(mass) functions. For example,
logpdf("beta", x, 2, 15);
returns the log of a beta density with parameters a = 2 and b = 15, evaluated at x. All the LOGPDF functions
are supported in PROC MCMC.
The LPDFdist functions are unique to PROC MCMC. They compute the logarithm of various probability
density (mass) functions. The functions are the same as the LOGPDF functions when it comes to calculating
the log density. For example,
lpdfbeta(x, 2,15);
returns the same value as
logpdf("beta", x, 2, 15);
The LPDFdist functions cover a greater class of probability density functions, and the univariate distribution
functions take the optional but positional boundary arguments. There are no corresponding LCDFdist or
LSDFdist functions in PROC MCMC. To work with the cumulative probability function or the survival
functions, you need to use the LOGCDF and the LOGSDF DATA step functions.
Truncation and Censoring
Truncated Distributions
To specify a truncated distribution, you can use the LOWER= and/or UPPER= options. Almost all of the
standard distributions, including the GENERAL and DGENERALfunctions, take these optional truncation
arguments. The exceptions are the binary and uniform distributions.
For example, you can specify the following:
prior alpha ~ normal(mean = 0, sd = 1, lower = 3, upper = 45);
or
parms beta;
a = 3; b = 7;
ll = (a + 1) * log(b / beta);
prior beta ~ general(ll, upper = b + 17);
The preceding statements state that if beta is less than b+17, the log of the prior density is ll, as calculated by
the equation; otherwise, the log of the prior density is missing—the log of zero.
5720 F Chapter 74: The MCMC Procedure
When the same distribution is applied to multiple parameters in a PRIOR statement, the LOWER= and
UPPER= truncations apply to all parameters in that statement. For example, the following statements define
a Poisson density for theta and gamma:
parms theta gamma;
lambda = 7;
l1 = theta * log(lambda) - lgamma(1 + theta);
l2 = gamma * log(lambda) - lgamma(1 + gamma);
ll = l1 + l2;
prior theta gamma ~ dgeneral(ll, lower = 1);
The LOWER=1 condition is applied to both theta and gamma, meaning that for the assignment to ll to be
meaningful, both theta and gamma have to be greater than 1. If either of the parameters is less than 1, the log
of the joint prior density becomes a missing value.
In releases before SAS/STAT 13.1, only three distributions support parameters (or functions of parameters)
in the LOWER= and UPPER= options. These are the normal distribution, the GENERAL function, and the
DGENERAL function. Appropriate normalizing constants, which are required if the truncations involve
model parameters, are not calculated. Starting with SAS/STAT 13.1, PROC MCMC calculates the normalizing
constant in all truncated distributions, and you can use parameters in the LOWER= or UPPER= option.
Note that if you use either the GENERAL or DGENERAL function, you must compute the normalizing
constant in cases where it is required. A truncated distribution has the probability distribution
p. ja < < b/ D
p. /
F .a/ F .b/
where p./ is the density function and F ./ is the cumulative distribution function. In SAS functions, p./
is the probability density function and F ./ is the cumulative distribution function. The following example
shows how to construct a truncated gamma prior on theta, with SHAPE=3, SCALE=2, LOWER=A, and
UPPER=B:
lp = logpdf('gamma', theta, 3, 2)
- log(cdf('gamma', a, 3, 2) - cdf('gamma', b, 3, 2));
prior theta ~ general(lp);
This density specification is different from the following more naive definition, without taking into account
the normalizing constant:
lp = logpdf('gamma', theta, 3, 2);
prior theta ~ general(lp, lower=a, upper=b);
If a or b is a parameter, you get very different results from the two formulations.
Censoring
There is no built-in mechanism in PROC MCMC that models censoring automatically. You need to construct
the density function (using a combination of the LOGPDF, LOGCDF, and LOGSDF functions and IF-ELSE
statements) for the censored data.
Suppose you partition the data into four categories: uncensored (with observation x), left censored (with
observation xl), right censored (with observation xr), and interval censored (with observations xl and xr). The
likelihood is the normal with mean mu and standard deviation s. The following statements construct the
corresponding log likelihood for the observed data:
Some Useful SAS Functions F 5721
if uncensored then
ll = logpdf('normal', x, mu, s);
else if leftcensored then
ll = logcdf('normal', xl, mu, s);
else if rightcensored then
ll = logsdf('normal', xr, mu, s);
else /* this is the case of interval censored. */
ll = log(cdf('normal', xr, mu, s) - cdf('normal', xl, mu, s));
model general(ll);
See “Example 74.17: Normal Regression with Interval Censoring” on page 5866.
Some Useful SAS Functions
Table 74.44 Some Useful SAS Functions
SAS Function
abs(x);
airy(x);
beta(x1, x2);
Definition
jxj
Returns the value of the AIRY function.
R 1 x1 1
.1 z/x2 1 dz
0 z
call logistic(x);
exp.x/
1Cexp.x/
call softmax(x1, . . . , xn);
call stdize(x1, . . . , xn);
cdf();
cdf(’normal’, x, 0, 1);
comb(x1, x2);
P
Each element is replaced by exp.xj /= exp.xj /
Standardize values
Cumulative distribution function
Standard normal cumulative distribution function
constant(’.’);
cos(x);
css(x1, . . . , xn);
cv(x1, . . . , xn);
dairy(x);
dimN(m);
x1 eq x2
x1**x2
Calculate commonly used constants
cosine(x)
P
x/
N 2
i .xi
std(x) / mean(x) * 100
Derivative of the AIRY function
Returns the numbers of elements in the Nth dim of array m
Returns 1 if x1 = x2; 0 otherwise
x1x2
geomean(x1, . . . , xn);
difN(x);
digamma(x1);
erf(x);
x1Š
x2Š.x1 x2/Š
exp
log.x1 /CClog.xn /
n
Returns differences between the argument and its Nth lag
€ 0 .x1/
€.x1/
Rx
p2
0
exp. z 2 /dz
erfc(x);
fact(x);
floor(x);
gamma(x);
harmean(x1, . . . , xn);
1 – erf(x)
xŠ
Greatest
R 1 x 1integer x
exp. 1/dz
0 z
ibessel(nu, x, kode);
Modified Bessel function of order nu evaluated at x
n
1=x1 C1=xn
5722 F Chapter 74: The MCMC Procedure
Table 74.44 (continued)
SAS Function
Definition
jbessel(nu, x);
lagN(x);
largest(k, x1, . . . , xn);
lgamma(x);
lgamma(x+1);
log(x, logN(x));
logbeta(x1, x2);
logcdf();
logpdf();
logsdf();
max(x1, x2);
mean(of x1–xn);
median(of x1–xn);
min(x1, x2);
missing(x);
mod(x1, x2);
n(x1, . . . , xn);
nmiss(of y1–yn);
quantile();
pdf();
perm(n, r );
Bessel function of order nu evaluated at x
Returns values from a queue
Returns the kth largest element
ln.€.x//
ln.xŠ/
ln.x/
lgamma(x1 ) + lgamma(x2 ) - lgamma(x1 C x2 )
Log of a left cumulative distribution function
Log of a probability density (mass) function
Log of a survival function
Returns
x1 if x1 > x2 ; x2 otherwise
P
x
=n
i i
Returns the median of nonmissing values
Returns x1 if x1 < x2 ; x2 otherwise
Returns 1 if x is missing; 0 otherwise
Returns the remainder from x1 =x2
Returns number of nonmissing values
Number of missing values
Computes the quantile from a specific distribution
Probability density (mass) functions
put();
round(x);
Returns a value that uses a specified format
Rounds
x
q
rms(of x1–xn);
sdf();
sign(x);
sin(x);
smallest(s, x1, . . . , en );
sortn(of x1-xn);
sqrt(x);
std(x1, . . . , xn) );
sum(of x:);
trigamma(x);
uss(of x1–xn);
nŠ
.n r/Š
2
x12 Cxn
n
Survival function
Returns –1 if x < 0; 0 if x D 0; 1 if x > 0
sine(x)
Returns the sth smallest component of x1 ; : : : ; xn
Sorts the values of the variables
p
x
Standard
deviation of x1 ; : : : ; xn (n-1 in denominator)
P
x
i i
Derivative of the DIGAMMA(x) function
Uncorrected sum of squares
Here are examples of some commonly used transformations:
logit
mu = beta0 + beta1 * z1;
call logistic(mu);
log
Matrix Functions in PROC MCMC F 5723
w = beta0 + beta1 * z1;
mu = exp(w);
probit
w = beta0 + beta1 * z1;
mu = cdf(`normal', w, 0, 1);
cloglog
w = beta0 + beta1 * z1;
mu = 1 - exp(-exp(w));
Matrix Functions in PROC MCMC
The MCMC procedure provides you with a number of CALL routines for performing simple matrix operations
on declared arrays. With the exception of FILLMATRIX, IDENTITY, and ZEROMATRIX, the CALL
routines listed in Table 74.45 do not support matrices or arrays that contain missing values.
Table 74.45 Matrix Functions in PROC MCMC
CALL Routine
ADDMATRIX
Description
Performs an element-wise addition of two matrices or of a matrix and a
scalar.
CHOL
Calculates the Cholesky decomposition for a particular symmetric matrix.
DET
Calculates the determinant of a specified matrix, which must be square.
ELEMMULT
Performs an element-wise multiplication of two matrices.
FILLMATRIX
Replaces all of the element values of the input matrix with the specified
value. You can use this routine with multidimensional numeric arrays.
IDENTITY
Converts the input matrix to an identity matrix. Diagonal element values of
the matrix are set to 1, and the rest of the values are set to 0.
INV
Calculates a matrix that is the inverse of the input matrix. The input matrix
must be a square, nonsingular matrix.
MULT
Calculates the matrix product of two input matrices.
SUBTRACTMATRIX Performs an element-wide subtraction of two matrices or of a matrix and a
scalar.
TRANSPOSE
Returns the transpose of a matrix.
ZEROMATRIX
Replaces all of the element values of the numeric input matrix with 0.
5724 F Chapter 74: The MCMC Procedure
ADDMATRIX CALL Routine
The ADDMATRIX CALL routine performs an element-wise addition of two matrices or of a matrix and a
scalar.
The syntax of the ADDMATRIX CALL routine is
CALL ADDMATRIX (X , Y , Z ) ;
where
X specifies a scalar or an input matrix with dimensions m n (that is, X [m; n])
Y specifies a scalar or an input matrix with dimensions m n (that is, Y [m; n])
Z specifies an output matrix with dimensions m n (that is, Z [m; n])
such that
Z DX CY
CHOL CALL Routine
The CHOL CALL routine calculates the Cholesky decomposition for a particular symmetric matrix.
The syntax of the CHOL CALL routine is
CALL CHOL (X , Y < , validate >) ;
where
X specifies a symmetric positive-definite input matrix with dimensions m m (that is, X [m, m])
Y is a variable that contains the Cholesky decomposition and specifies an output matrix with dimensions
m m (that is, Y [m; m])
validate specifies an optional argument that can increase the processing speed by avoiding error
checking:
If validate = 0 or is not specified, then the matrix X is checked for symmetry.
If validate = 1, then the matrix X is assumed to be symmetric.
such that
X D YY
where Y is a lower triangular matrix with strictly positive diagonal entries and Y denotes the conjugate
transpose of Y .
Both input and output matrices must be square and have the same dimensions. If X is symmetric positivedefinite, Y is a lower triangle matrix. If X is not symmetric positive-definite, Y is filled with missing
values.
Matrix Functions in PROC MCMC F 5725
DET CALL Routine
The determinant, the product of the eigenvalues, is a single numeric value. If the determinant of a matrix
is zero, then that matrix is singular (that is, it does not have an inverse). The routine performs an LU
decomposition and collects the product of the diagonals.
The syntax of the DET CALL routine is
CALL DET (X , a) ;
where
X specifies an input matrix with dimensions m m (that is, X [m; m])
a specifies the returned determinate value
such that
a D jX j
ELEMMULT CALL Routine
The ELEMMULT CALL routine performs an element-wise multiplication of two matrices.
The syntax of the ELEMMULT CALL routine is
CALL ELEMMULT (X , Y , Z ) ;
where
X specifies an input matrix with dimensions m n (that is, X [m; n])
Y specifies an input matrix with dimensions m n (that is, Y [m; n])
Z specifies an output matrix with dimensions m n (that is, Z [m; n])
FILLMATRIX CALL Routine
The FILLMATRIX CALL routine replaces all of the element values of the input matrix with the specified
value. You can use the FILLMATRIX CALL routine with multidimensional numeric arrays.
The syntax of the FILLMATRIX CALL routine is
CALL FILLMATRIX (X , Y ) ;
where
X specifies an input numeric matrix
Y specifies the numeric value that is used to fill the matrix
5726 F Chapter 74: The MCMC Procedure
IDENTITY CALL Routine
The IDENTITY CALL routine converts the input matrix to an identity matrix. Diagonal element values of
the matrix are set to 1, and the rest of the values are set to 0.
The syntax of the IDENTITY CALL routine is
CALL IDENTITY (X ) ;
where
X specifies an input matrix with dimensions m m (that is, X [m; m])
INV CALL Routine
The INV CALL routine calculates a matrix that is the inverse of the input matrix. The input matrix must be a
square, nonsingular matrix.
The syntax of the INV CALL routine is
CALL INV (X , Y ) ;
where
X specifies an input matrix with dimensions m m (that is, X [m; m])
Y specifies an output matrix with dimensions m m (that is, Y [m; m])
MULT CALL Routine
The MULT CALL routine calculates the matrix product of two input matrices.
The syntax of the MULT CALL routine is
CALL MULT (X , Y , Z ) ;
where
X specifies an input matrix with dimensions m n (that is, X [m; n])
Y specifies an input matrix with dimensions n p (that is, Y [n; p])
Z specifies an output matrix with dimensions m p (that is, Z [m; p])
The number of columns for the first input matrix must be the same as the number of rows for the second
matrix. The calculated matrix is the last argument.
Matrix Functions in PROC MCMC F 5727
SUBTRACTMATRIX CALL Routine
The SUBTRACTMATRIX CALL routine performs an element-wide subtraction of two matrices or of a
matrix and a scalar.
The syntax of the SUBTRACTMATRIX CALL routine is
CALL SUBTRACTMATRIX (X , Y , Z ) ;
where
X specifies a scalar or an input matrix with dimensions m n (that is, X [m; n])
Y specifies a scalar or an input matrix with dimensions m n (that is, Y [m; n])
Z specifies an output matrix with dimensions m n (that is, Z [m; n])
such that
Z DX
Y
TRANSPOSE CALL Routine
The TRANSPOSE CALL routine returns the transpose of a matrix.
The syntax of the TRANSPOSE CALL routine is
CALL TRANSPOSE (X , Y ) ;
where
X specifies an input matrix with dimensions m n (that is, X [m; n])
Y specifies an output matrix with dimensions n m (that is, Y [n; m])
ZEROMATRIX CALL Routine
The ZEROMATRIX CALL routine replaces all of the element values of the numeric input matrix with 0.
You can use the ZEROMATRIX CALL routine with multidimensional numeric arrays.
The syntax of the ZEROMATRIX CALL routine is
CALL ZEROMATRIX (X ) ;
where
X specifies a numeric input matrix.
5728 F Chapter 74: The MCMC Procedure
Create Design Matrix
PROC MCMC does not support a CLASS statement; therefore you need to construct the right design matrix
(with dummy or indicator variables) prior to calling PROC MCMC. The best tool to use is the TRANSREG
procedure (see Chapter 119, “The TRANSREG Procedure”). This procedure offers both indicator and effects
coding methods. You can specify any categorical variables in the CLASS expansion, and use the ZERO=
option to select a reference category. You can also specify any other data set variables (predictors, the
responses, and so on) to the output data set in the ID statement.
For example, the following statements create a data set that contains two categorical variables (City and G),
and two continuous variables (x and resp):
title 'Create Design Matrix';
data categorical;
input City$ G$ x resp @@;
datalines;
Chicago F 69.0 112.5 Chicago
Chicago M 65.3 98.0 Chicago
NewYork M 62.8 102.5 NewYork
NewYork F 57.3 83.0 NewYork
;
F
M
M
M
56.5 84.0
59.8 84.5
63.5 102.5
57.5 85.0
Suppose you are interested in creating a design matrix that uses dummy variable coding for the categorical
variables City, G and their interaction City * G. You can use the following PROC TRANSREG statements:
proc transreg data=categorical design;
model class(city g city*g / zero=last);
id x resp;
output out=input_mcmc(drop=_: Int:);
run;
The DESIGN option specifies that the primary goal is to code the design matrix. The MODEL statement
indicates the variable of interest. The CLASS option in the MODEL statement expands the variables of
interest to a list of “dummy” variables. The ZERO=LAST option sets the reference level. The ID statement
includes x and resp in the OUT= data set. And the OUTPUT statement creates a new data set Input_MCMC
that stores the design matrix and original variables from the original data set.
A quick call of the PRINT procedure shows the output from the PROC TRANSREG call:
proc print data=input_mcmc;
run;
Figure 74.15 prints the design matrix that is generated by PROC TRANSREG. The Input_mcmc data set
contains all the variables from the original Categorical data set, in addition to corresponding dummy variables
(CityChicago, GF, and CityChicagoGF) for the categorical variables.
Modeling Joint Likelihood F 5729
Figure 74.15 Design Matrix Generated by PROC TRANSREG
Create Design Matrix
Obs CityChicago GF CityChicagoGF City
G
x resp
1
1
1
1 Chicago F 69.0 112.5
2
1
1
1 Chicago F 56.5
84.0
3
1
0
0 Chicago M 65.3
98.0
4
1
0
0 Chicago M 59.8
84.5
5
0
0
0 NewYork M 62.8 102.5
6
0
0
0 NewYork M 63.5 102.5
7
0
1
0 NewYork F 57.3
83.0
8
0
0
0 NewYork M 57.5
85.0
You can now proceed to call PROC MCMC using this input data set Input_mcmc and the corresponding
dummy variables.
PROC TRANSREG automatically creates a macro variable, &_TRGIND, which contains a list of variable
names that it creates. The %put &_trgind; statement prints the following:
CityChicago GF CityChicagoGF
The macro variable &_TRGIND can come handy if you want to build a regression model; you can refer to
&_TRGIND in the following way:
proc mcmc data=input_mcmc;
array data[5] 1 &_trgind x;
array beta[5] beta0-beta4;
...;
call mult(beta, data, mu);
...;
The first ARRAY statement defines a one-dimensional array of length 5, and it takes on five values: a constant
1 and variables CityChicago, GF, CityChicagoGF, and x. The second ARRAY statement defines an array
of beta, which are the model parameters. Later in the program, you can use the CALL MULT function to
calculate the regression mean and store the value in the symbol mu.
Modeling Joint Likelihood
PROC MCMC assumes that the input observations are independent and that the joint log likelihood is the
sum of individual log-likelihood functions. You specify the log likelihood of one observation in the MODEL
statement. PROC MCMC evaluates that function for each observation in the data set and cumulatively
sums them up. If observations are not independent of each other, this summation produces the incorrect log
likelihood.
There are two ways to model dependent data. You can either use the DATA step LAG function or use the
PROC option JOINTMODEL. The LAG function returns values of a variable from a queue. As PROC
MCMC steps through the data set, the LAG function queues each data set variable, and you have access to the
current value as well as to all previous values of any variable. If the log likelihood for observation xi depends
5730 F Chapter 74: The MCMC Procedure
only on observations 1 to i in the data set, you can use this SAS function to construct the log-likelihood
function for each observation. Note that the LAG function enables you to access observations from different
rows, but the log-likelihood function in the MODEL statement must be generic enough that it applies to all
observations. See “Example 74.14: Time Independent Cox Model” on page 5847 and “Example 74.15: Time
Dependent Cox Model” on page 5854 for how to use this LAG function.
A second option is to create arrays, store all relevant variables in the arrays, and construct the joint log
likelihood for the entire data set instead of for each observation. Following is a simple example that illustrates
the usage of this option. For a more realistic example that models dependent data, see “Example 74.14: Time
Independent Cox Model” on page 5847 and “Example 74.15: Time Dependent Cox Model” on page 5854.
/* allocate the sample size. */
data exi;
call streaminit(17);
do ind = 1 to 100;
y = rand("normal", 2.3, 1);
output;
end;
run;
The log-likelihood function for each observation is as follows:
log.f .yi j; // D log..yi I ; var D 2 //
The joint log-likelihood function is as follows:
X
log.f .yj; // D
log..yi I ; var D 2 //
i
The following statements fit a simple model with an unknown mean (mu) in PROC MCMC, with the variance
in the likelihood assumed known. The MODEL statement indicates a normal likelihood for each observation
y.
proc mcmc data=exi seed=7 outpost=p1;
parm mu;
prior mu ~ normal(0, sd=10);
model y ~ normal(mu, sd=1);
run;
The following statements show how you can specify the log-likelihood function for the entire data set:
data a;
run;
proc mcmc data=a seed=7 outpost=p2 jointmodel;
array data[1] / nosymbols;
begincnst;
rc = read_array("exi", data, "y");
n = dim(data, 1);
endcnst;
parm mu;
prior mu ~ normal(0, sd=10);
ll = 0;
Access Lag and Lead Variables F 5731
do i = 1 to n;
ll = ll + lpdfnorm(data[i], mu, 1);
end;
model general(ll);
run;
The JOINTMODEL option indicates that the function used in the MODEL statement calculates the log
likelihood for the entire data set, rather than just for one observation. Given this option, PROC MCMC no
longer steps through the input data during the simulation. Consequently, you can no longer use any data set
variables to construct the log-likelihood function. Instead, you store the data set in arrays and use arrays
instead of data set variables to calculate the log likelihood.
The ARRAY statement allocates a temporary array (data). The READ_ARRAY function selects the y
variable from the exi data set and stores it in the data array. See the section “READ_ARRAY Function” on
page 5663. In the programming statements, you use a DO loop to construct the joint log likelihood. The
expression ll in the GENERAL function now takes the value of the joint log likelihood for all data.
You can run the following statements to see that two PROC MCMC runs produce identical results.
proc compare data=p1 compare=p2;
var mu;
run;
Access Lag and Lead Variables
There are two types of random variables in PROC MCMC that are indexed: the response (MODEL statement)
is indexed by observations, and the random effect (RANDOM statement) is indexed by the SUBJECT=
option variable. As the procedure steps through the input data set, the response or the random-effects symbols
are filled with values of the current observation or the random-effects parameters in the current subject. Often
you might want to access lag or lead variables across an index. For example, the likelihood function for yi
can depend on yi 1 in an autoregressive model, or the prior distribution for j can depend on k , where
k ¤ j , in a dynamic linear model. In these situations, you can use the following rules to construct symbols
to access values from other observations or subjects:
rv.Li: the ith lag of the variable rv. This looks back to the past.
rv.Ni: the ith lead of the variable rv. This looks forward to the future.
The construction is allowed for random variables that are associated with an index, either a response variable
or a random-effects variable. You concatenate the variable name, a dot, either the letter L (for “lag”) or the
letter N (for “next”), and a lag or a lead number. PROC MCMC resolves these variables according to the
indices that are associated with the random variable, with respect to the current observation.
For example, the following RANDOM statement specifies a first-order Markov dependence in the random
effect mu that is indexed by the subject variable time:
random mu ~ normal(mu.l1,sd=1) subject=time;
This corresponds to the prior
t normal.t
1 ; sd
D 1/
5732 F Chapter 74: The MCMC Procedure
At each observation, PROC MCMC fills in the symbol mu with the random-effects parameter t that belongs
to the current cluster t. To fill in the symbol mu.l1, the procedure looks back and finds a lag-1 random-effects
parameter, t 1 , from the last cluster t-1. As the procedure moves forward in the input data set, these two
symbols are constantly updated, as appropriate.
When the index is out of range, such as t–1 when t is 1, PROC MCMC fills in the missing state from the
ICOND= option in either the MODEL or RANDOM statement. The following example illustrates how
PROC MCMC fills in the values of these lag and lead variables as it steps through the data set.
Assume that the random effect mu has five levels, indexed by sub D fa; b; c; d; eg. The model contains two
lag variables and one lead variable (mu.l1, mu.l2, and mu.n2):
mn = (mu.l1 + mu.l2 + mu.n2) / 3
random mu ~ normal(mn, sd=1) subject=time icond=(alpha beta gamma kappa);
In this setup, instead of a list of five random-effects parameters that the variable mu can be assigned values
to, there is now a list of nine variables for the variables mu, mu.l1, mu.l2, and mu.n2. The list lines up in the
following order:
alpha; beta; a ; b ; c ; d ; e ; gamma; kappa
PROC MCMC finds relevant symbol values according to this list, as the procedure steps through different
subject cluster. The process is illustrated in Table 74.46.
Table 74.46 Processing Lag and Lead Variables
sub
mu
a
b
c
d
e
a
b
c
d
e
mu.l2
alpha
beta
a
b
c
mu.l1
beta
a
b
c
d
mu.n2
c
d
e
gamma
kappa
For observations in cluster a, PROC MCMC sets the random-effects variable mu to a , looks back two lags
and fills in mu.l2 with alpha, looks back one lag and fills in mu.l1 with beta, and looks forward two leads and
fills in mu.n2 with c . As the procedure moves to observations in cluster b, mu becomes b , mu.l2 becomes
beta, mu.l1 becomes a , and mu.n2 becomes d . For observations in the last cluster, cluster e, mu becomes
e , mu.l2 becomes c , mu.l1 becomes d , and mu.n2 is filled with kappa.
Access Lag and Lead Variables F 5733
The following example fits a simple first-order dynamic linear model, in which the data set contains a time
index and the response variable y:
data dlm;
input time y;
datalines;
1
1.353412529
2
4.840739953
3
1.604892523
4
6.8947921
5
3.509644288
6
4.020173553
7
3.842884451
8
4.49057276
9
2.204570502
10
4.007351323
11
2.005515044
12
2.781756057
;
You can fit the following model to the data:
Yt
normal.t ; var=y2 /
t
normal.t
2
1 ; var= /
0 D ˛
˛ normal.0; var=10/
y2 ; 2
igamma.shape=3, scale=2/
The following PROC MCMC statements estimate parameters from this dynamic linear model:
proc mcmc data=dlm outpost=dlmO nmc=20000 seed=23;
ods select PostSumInt;
parms alpha 0;
parms var_y 1 var_mu 1;
prior alpha ~ n(0, sd=10);
prior var_y var_mu ~ igamma(shape=3, scale=2);
random mu ~ n(mu.l1,var=var_mu) s=time icond=(alpha) monitor=(mu);
model y~n(mu, var=var_y);
run;
The key component is the mu.l1 specification in the RANDOM statement, where the prior for mu depends on
its lag-1 value. The ICOND= option specifies the initial condition of mu and assigns it to be alpha, which is
a parameter in the model.
Figure 74.16 lists the estimated posterior statistics for the parameters.
5734 F Chapter 74: The MCMC Procedure
Figure 74.16 Posterior Summary Statistics of the Dynamic Linear Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
95%
Mean Deviation HPD Interval
alpha
20000 2.6498
1.2924 -0.0555 5.0366
var_y
20000 1.7400
0.8337 0.5651 3.3498
var_mu
20000 0.8299
0.5720 0.2109 1.9582
mu_1
20000 2.6899
0.9253 0.8380 4.5390
mu_2
20000 3.4175
0.7622 1.9336 4.9541
mu_3
20000 3.3820
0.7431 1.8543 4.7950
mu_4
20000 4.3364
0.8297 2.7406 6.0198
mu_5
20000 3.9308
0.7194 2.4967 5.3248
mu_6
20000 3.8642
0.7348 2.4117 5.3832
mu_7
20000 3.7716
0.7399 2.3019 5.1525
mu_8
20000 3.6754
0.7316 2.1683 5.0342
mu_9
20000 3.1796
0.7209 1.7485 4.6257
mu_10
20000 3.2237
0.7355 1.7802 4.6253
mu_11
20000 2.8074
0.7839 1.2985 4.3701
mu_12
20000 2.8006
0.8758 1.1286 4.5473
CALL ODE and CALL QUAD Subroutines
The CALL ODE subroutine numerically solves a set of first-order ordinary differential equations (ODEs),
including piecewise differential equations. The CALL QUAD subroutine calculates multidimensional
integrand. You can use them as programming statements in PROC MCMC. These subroutines require that
you define an objective function, for either a set of simultaneous differential equations or an integrand
function, by using PROC FCMP (see the FCMP procedure in the Base SAS Procedures Guide) and call these
subroutines in PROC MCMC.
CALL ODE
The CALL ODE subroutine performs numerical integration of first-order vector differential equations of the
form dy
D f .t; y.t // over the subinterval t 2 Œti ; tf  with the initial values y.ti / D y0 . The subroutine can
dt
also be used to solve piecewise differential equations.
You specify the CALL ODE subroutine in PROC MCMC by using the following syntax:
CALL ODE("DeqFun", Soln, Init, ti, tf, G1, G2, ... , Gk <, ode_opt>);
DeqFun: the name of the PROC FCMP subroutine of a set of simultaneous set of ordinary differential
equations,
dy
dt
D f .t; y.t //
Soln: an argument that contains solutions. Soln can be a numeric variable or an array. If it is an array, then
the size of the array determines the dimension of the problem. Otherwise, the dimension of the problem is
one.
Init: initial values of the variable y. Init can be either a numeric variable or an array. The dimension must
match that of Soln.
CALL ODE and CALL QUAD Subroutines F 5735
ti: initial time value of the subinterval t
tf: final time value of the subinterval t
Gi: input arguments in the DeqFun subroutine
ode_opt: subroutine options. ode_opt is a positional array of size 4, with the following definition:
ode_opt[1]: convergence accuracy. The default value is 1E–8.
ode_opt[2]: minimum allowable step size in the integration process. The default value is 1E–12.
ode_opt[3]: maximum allowable step size in the integration process. The default value is the largest
double-precision floating-point number.
ode_opt[4]: initial step size to start the integration process. The default value is 1E–5.
To specify the DeqFun subroutine by using PROC FCMP, use the following statements:
proc fcmp outlib=sasuser.funcs.ODE;
subroutine DeqFun(t,y[*],dy[*], A1,A2,...,Ak);
outargs dy;
dy[1] = -A1*y[1];
dy[2] = A2*y[1]-Ak-1*y[2];
...
dy[n] = t*y[n] + Ak;
endsub;
run;
The OUTLIB= option specifies an output data set to which the compiled subroutine is written. The first
three arguments of the DeqFun subroutine must be (1) the time variable t, (2) the with-respect-to variable
y, which can be an array, and (3) the differential equation function variable dy, which can also be an array.
The remaining optional input arguments are variables that are required in the construction of the differential
equations. You must declare variable dy as the updated variable in the OUTARGS statement.
To include the DeqFun in your PROC MCMC program, you must specify the SAS data set that contains the
compiler subroutine:
options cmplib=sasuser.funcs;
See “Example 74.22: One-Compartment Model with Pharmacokinetic Data” on page 5898 for an example of
fitting a pharmacokinetic model by using the CALL ODE subroutine.
The CALL ODE subroutine uses the polyalgorithm of Byrne and Hindmarsh (1975) to provide a numerical
solution for the set of ordinary differential equations. Note that you can model any n-order differential
equation as a system of first-order differential equations.
Piecewise Differential Equations
Suppose you have a system of piecewise ODEs of the following form:
5736 F Chapter 74: The MCMC Procedure
8
f1 .t; y.t //
ˆ
ˆ
ˆ
ˆ
ˆ
< f2 .t; y.t //
dy
::
D
:
ˆ
dt
ˆ
ˆ
fm 1 .t; y.t //
ˆ
ˆ
:
fm .t; y.t //
t1 t < t2
t2 t < t3
::
:
tm 1 t < tm
tm t
The system of equations is defined over m intervals (t 2 Œti ; tf ) with initial values y.tj / D y. tj 0 / for
j D 1; 2; : : : ; m. The variable y.t / can be a vector. The CALL ODE subroutine solves the system of
piecewise ODEs in each semiclosed interval Œti ; tj / for i < j D 1; 2; : : : ; m.
The CALL ODE subroutine differentiates between a regular ODE problem and a piecewise ODE problem
based on the dimension of the second input argument to the subroutine. That is the Soln argument, which
stores the solutions to the ODEs. If Soln is either a numeric variable or a one-dimensional array, the CALL
ODE subroutine treats the problem as a regular ODE problem; if Soln is a two-dimensional array, the CALL
ODE subroutine treats it as a piecewise ODE problem and solves accordingly.
You specify the CALL ODE subroutine in PROC MCMC to solve piecewise ODEs by using the following
syntax:
CALL ODE("DeqFun", Soln, ., ti, tf, G1, G2, ... , Gk,
ode_grid, "InitFun", H1, H2, ..., Hl <, ode_opt>);
DeqFun: the name of the PROC FCMP subroutine of a system of ODEs
Soln: an argument that contains solutions. Soln must be an m (number of intervals) n (number of
equations) array. Upon completion, the CALL ODE subroutine fills in the last row of the Soln array
(SolnŒm; 1; : : : ; SolnŒm; n) with the final solutions.
. (period): the third positional argument takes a missing value, or a period (.). In a regular ODE problem,
the third argument is reserved for initial values, which are fixed per solution. In a piecewise problem, initial
values can depend on solutions to previous set of ODEs, and they are set through a separate PROC FCMP
subroutine (see description of InitFun).
ti: initial time value of the subinterval t
tf: final time value of the subinterval t
Gi: input arguments in the DeqFun subroutine
ode_grid: numerical array that stores the interval boundaries. You must use the keyword ode_grid for this
array, and the array should be placed after all the Gi variables. Elements in ode_grid must be sorted in
ascending order. The dimension of this array should be m, the number of rows in Soln.
InitFun: the name of the PROC FCMP subroutine that specifies the initial values of the ODEs at different
intervals. The subroutine requires an output array argument that should be filled with the required initial
values for each interval. The InitFun subroutine should come after the ode_grid variable.
Hi: input arguments in the InitFun subroutine
ode_opt: subroutine options
The following example statements construct a two-interval piecewise ODE in the DeqFun subroutine, set
initial values in the InitFun subroutine, and call the general ODE solver in PROC MCMC:
CALL ODE and CALL QUAD Subroutines F 5737
proc fcmp outlib=sasuser.funcs.ODE;
subroutine DeqFun(t,y[*],dy[*],t1,beta,alpha);
outargs dy;
if (t <= t1) then do;
dy[1] = -y[1];
dy[2] = beta * y[1] - y[2];
dy[3] = - y[3];
end; else do;
dy[1] = 0;
dy[2] = alpha * y[1] + y[2];
dy[3] = alpha * y[3] * y[3];
end;
endsub;
run;
proc fcmp outlib=sasuser.funcs.ODE;
subroutine InitFun(init[*,*],d,Soln[*,*]);
outargs init;
init[1,1] = d;
init[1,2] = d;
init[1,3] = d;
init[2,1] = Soln[1,1];
init[2,2] = Soln[1,2];
init[2,3] = Soln[1,3];
endsub;
run;
options cmplib=sasuser.funcs;
proc mcmc ...;
array Soln[2, 3];
array ode_grid[2];
ode_grid[1]=0;
ode_grid[2]=ti;
call ode("DeqFun", Soln, ., 0, tf, ti, beta, alpha,
ode_grid, "InitFun", dose, Soln);
...;
run;
The DeqFun subroutine specifies three differential equations in two intervals, t t2 and t > t2 . The InitFun
subroutine specifies the initial values for the first set of equations as d (an input variable) and specifies the
initial values for the second set of equations as solutions to the first set of equations. In the PROC MCMC
statements, Soln is an array of size 2 3. The ode_grid array has two elements: the first element is the
lower bound (0 in this case), and the second element is ti (for example, a data set variable). The DeqFun and
InitFun subroutines are passed to the ODE solver. Three variables (ti, beta, and alpha) are input arguments to
the DeqFun subroutine. Two variables (dose and Soln) are input arguments to the InitFun subroutine. The
solution array, Soln, is passed to the InitFun subroutine and enables the ODE solver to use solutions to the
first set of ODEs as the initial values for the second set. The CALL ODE subroutine fills the second row of
the Soln array (Soln[2,1], Soln[2,2], and Soln[2,3]) with the final solution to the entire system of the ODEs
upon completion.
5738 F Chapter 74: The MCMC Procedure
CALL QUAD
The CALL QUAD subroutine is a numerical integrator that performs the integration in a twofold fashion
based on the dimension of the problem. In a unidimensional problem, the subroutine uses the adaptive
Romberg type of integration techniques; in a multidimensional problem, it uses the Laplace approximation.
You specify the CALL QUAD subroutine syntax in PROC MCMC by using the following syntax:
CALL QUAD("IntFun",Res, LL, UL, G1, G2, ... , Gk <, quad_init> <, quad_opt>);
IntFun: the name of the PROC FCMP subroutine of an integrand function
Res: contains the approximated integral value
LL: lower limit of the integral. LL must be an array if the problem is multidimensional.
UL: upper limit of the integral. UL must be an array if the problem is multidimensional.
Gi: input arguments in the IntFun subroutine
quad_init: starting values that are used in optimizing the IntFun function in Laplace approximation. This
optional array is applicable only in a multidimensional problem, and you must use the keyword quad_init for
this array. By default, the optimization starts at the average values of LL and UL. When you use the variable
quad_init, it must precede the integration option array quad_opt.
quad_opt: integration options. You must use the keyword quad_opt for this array. For a unidimensional
integration problem, quad_opt must be a positional array of size 4:
quad_opt[1]: relative accuracy. The default value is 1E–7.
quad_opt[2]: the approximate location of a maximum of the integrand. The default value is the centered
location between the LL and UL limits.
quad_opt[3]: approximate estimate of any scale in the integrand along the independent variables. The
default value is 1.
quad_opt[4]: the number of refinements allowed in order to achieve the required accuracy
In a multidimensional integration problem, quad_opt is a positional array of size 2:
quad_opt[1]: relative accuracy. The default value is 1E–7.
quad_opt[2]: optimization technique:
– 1: Quasi-Newton
– 2: Newton-Raphson
– 3: Newton-Raphson with ridging
– 4: Nelder-Mead simplex method
To specify the subroutine IntFun in PROC FCMP, you can use the following example:
CALL ODE and CALL QUAD Subroutines F 5739
proc fcmp outlib=sasuser.funcs.QUAD;
subroutine IntFun(y[*],fy,pi);
outargs fy;
fy = (exp(-(y[1]*y[1]+y[2]*y[2])/2))/(2*pi);
endsub;
run;
The first argument to IntFun must be the with-respect-to vector in the integral. The variable y can be a scalar
or an array, depending on your problem. The second argument must be the variable that contains the integrand
value, which should be declared as the updated variable in the OUTARGS statement. The remaining input
arguments are optional variables, if needed in constructing the integrand function.
To include the IntFun subroutine in your PROC MCMC program, you must specify the SAS data set that
contains the compiler subroutine:
options cmplib=sasuser.funcs;
proc mcmc ...;
array LL[2] (-100 -100);
array UL[2] ( 100 100);
call quad("IntFun", Res, LL, UL, pi);
...;
run;
In the PROC MCMC statements, LL and UL are arrays that contain the lower and upper limits of the integral.
The variable pi is an input argument to the IntFun subroutine. The solution is stored in Res.
In case of one-dimensional integration, the CALL QUAD subroutine uses adaptive Romberg-type integration
techniques. (See Rice 1973; Sikorsky 1982; Sikorsky and Stenger 1984; Stenger 1973b, a.) Many adaptive
numerical integration methods (Ralston and Rabinowitz 1978) start at one end of the interval and proceed
toward the other end, working on subintervals while locally maintaining a certain prescribed precision. This
is not the case with the CALL QUAD subroutine. The CALL QUAD subroutine is an adaptive global-type
integrator that produces a quick, rough estimate of the integration result and then refines the estimate until it
achieves the prescribed accuracy. This gives the subroutine an advantage over Gauss-Hermite and GaussLaguerre quadratures (Ralston and Rabinowitz 1978; Squire 1987), particularly for infinite and semi-infinite
intervals, because those methods perform only a single evaluation.
The Laplace approximation is used in the multidimensional problem. To approximate a k-dimensional
integration of an integrand, h(t), between the limits a and b, you can use the approximation formula
Z
a
s
b
h.t/dt Š h.c/
.2/k
jH.t /jt Dc
where logŒh.t / assumes a minimum over Œa; b at an interior critical point c, and H.t / is the Hessian of
logŒh.t/,
H.t/ D
@2 f log.h.t //g
@t @t 0
5740 F Chapter 74: The MCMC Procedure
The minimum of the negative log integrand is found by using numerical optimization. By default, the CALL
QUAD subroutine uses the quasi-Newton method. You can select other methods by using the quad_opt
option.
Regenerating Diagnostics Plots
By default, PROC MCMC generates three plots: the trace plot, the autocorrelation plot, and the kernel density
plot. Unless ODS Graphics is enabled before calling the procedure, it is hard to generate the same graph
afterwards. Directly using the Stat.MCMC.Graphics.TraceAutocorrDensity template is not feasible.
The easiest way to regenerate the same graph is with the %TADPlot autocall macro. The %TADPlot macro
requires you to specify an input data set (which usually is the output data set from a previous PROC MCMC
call) and a list of variables that you want to plot.
For more information about enabling and disabling ODS Graphics, see the section “Enabling and Disabling
ODS Graphics” on page 607 in Chapter 21, “Statistical Graphics Using ODS.”
A simple regression example, with three parameters, is used here for illustrational purposes. For an explanation of the regression model and the data involved, see the section “Simple Linear Regression” on page 5630.
The following statements generate a SAS data set and fit a regression model:
title 'Regenerating Diagnostics Plots';
data Class;
input Name $ Height Weight @@;
datalines;
Alfred 69.0 112.5
Alice 56.5 84.0
Carol
62.8 102.5
Henry 63.5 102.5
Jane
59.8 84.5
Janet 62.5 112.5
John
59.0 99.5
Joyce 51.3 50.5
Louise 56.3 77.0
Mary
66.5 112.0
Robert 64.8 128.0
Ronald 67.0 133.0
William 66.5 112.0
;
Barbara
James
Jeffrey
Judy
Philip
Thomas
65.3 98.0
57.3 83.0
62.5 84.0
64.3 90.0
72.0 150.0
57.5 85.0
ods select none;
proc mcmc data=class nmc=50000 thin=5 outpost=classout seed=246810;
parms beta0 0 beta1 0;
parms sigma2 1;
prior beta0 beta1 ~ normal(0, var = 1e6);
prior sigma2 ~ igamma(3/10, scale = 10/3);
mu = beta0 + beta1*height;
model weight ~ normal(mu, var = sigma2);
run;
ods select all;
Regenerating Diagnostics Plots F 5741
The output data set Classout contains posterior draws for beta0, beta1, and sigma2. It also stores the log of
the prior density (LogPrior), log of the likelihood (LogLike), and the log of the posterior density (LogPost). If
you want to examine the beta0 and LogPost variable, you can use the following statements to generate the
graphs:
ods graphics on;
%tadplot(data=classout, var=beta0 logpost);
ods graphics off;
Figure 74.17 displays the regenerated diagnostics plots for variables beta0 and Logpost from the data set
Classout.
Figure 74.17 Regenerated Diagnostics Plots for beta0 and Logpost
5742 F Chapter 74: The MCMC Procedure
Figure 74.17 continued
Caterpillar Plot
The caterpillar plot is a side-by-side bar plot of 95% intervals for multiple parameters. Typically, it is used to
visualize and compare random-effects parameters, which can come in large numbers in certain models. You
can use the %CATER autocall macro to create a caterpillar plot. The %CATER macro requires you specify
an input data set and a list of variables that you want to plot.
A random-effects model that has 21 random-effects parameters is used here for illustrational purpose. For
an explanation of the random-effects model and the data involved, see “Example 74.7: Logistic Regression
Random-Effects Model” on page 5812. The following statements generate a SAS data set and fit the model:
title 'Create a Caterpillar Plot';
data seeds;
input r n seed extract @@;
ind = _N_;
datalines;
10 39 0 0
23 62 0 0
17 39 0 0
5
6 0 1
32 51 0 1
46 79 0 1
10 30 1 0
8 28 1 0
3 12 1 1
22 41 1 1
3
7 1 1
;
23
53
10
23
15
81
74
13
45
30
0
0
0
1
1
0
1
1
0
1
26
55
8
0
32
51
72
16
4
51
0
0
1
1
1
0
1
0
0
1
Caterpillar Plot F 5743
ods select none;
proc mcmc data=seeds outpost=postout seed=332786 nmc=20000;
parms beta0 0 beta1 0 beta2 0 beta3 0 s2 1;
prior s2 ~ igamma(0.01, s=0.01);
prior beta: ~ general(0);
w = beta0 + beta1*seed + beta2*extract + beta3*seed*extract;
random delta ~ normal(w, var=s2) subject=ind;
pi = logistic(delta);
model r ~ binomial(n = n, p = pi);
run;
ods select all;
The output data set Postout contains posterior draws for all 21 random-effects parameters, delta_1 delta_21. You can use the following statements to generate a caterpillar plot for the 21 parameters:
%CATER(data=postout, var=delta:);
Figure 74.18 is a caterpillar plot of the random-effects parameters delta_1–delta_21.
Figure 74.18 Caterpillar Plot of the Random-Effects Parameters
If you want to change the display of the caterpillar plot, such as using a different line pattern, color, or size of
the markers, you need to first modify the Stat.MCMC.Graphics.Caterpillar template and then call the
%CATER macro again.
You can use the following statements to view the source of the Stat.MCMC.Graphics.Caterpillar
template:
5744 F Chapter 74: The MCMC Procedure
proc template;
path sashelp.tmplmst;
source Stat.MCMC.Graphics.Caterpillar;
run;
Figure 74.19 lists the source statements of the template that is used to generate the template for the caterpillar
plot.
Figure 74.19 Source Statements for Stat.MCMC.Graphics.Caterpillar Template
define statgraph Stat.MCMC.Graphics.Caterpillar;
dynamic _OverallMean _VarName _VarMean _XLower _XUpper _byline_ _bytitle_
_byfootnote_;
begingraph;
entrytitle "Caterpillar Plot";
layout overlay / yaxisopts=(offsetmin=0.05 offsetmax=0.05 display=(line
ticks tickvalues)) xaxisopts=(display=(line ticks tickvalues));
referenceline x=_OVERALLMEAN / lineattrs=(color=
GraphReference:ContrastColor);
HighLowPlot y=_VARNAME high=_XUPPER low=_XLOWER / lineattrs=
GRAPHCONFIDENCE;
scatterplot y=_VARNAME x=_VARMEAN / markerattrs=(size=5 symbol=
circlefilled);
endlayout;
if (_BYTITLE_)
entrytitle _BYLINE_ / textattrs=GRAPHVALUETEXT;
else
if (_BYFOOTNOTE_)
entryfootnote halign=left _BYLINE_;
endif;
endif;
endgraph;
end;
You can use the TEMPLATE procedure (see Chapter 21, “Statistical Graphics Using ODS”) to modify the
graph template. Subsequent calls to the %CATER macro will use the modified template to make the graph.
Autocall Macros for Postprocessing
Although PROC MCMC provides a number of convergence diagnostic tests and posterior summary statistics,
PROC MCMC performs the calculations only if you specify the options in advance. If you wish to analyze
the posterior draws of unmonitored parameters or functions of the parameters that are calculated in later
DATA step calls, you can use the autocall macros in Table 74.47.
Autocall Macros for Postprocessing F 5745
Table 74.47 Postprocessing Autocall Macros
Macro
Description
%ESS
%GEWEKE*
%HEIDEL*
%MCSE
%RAFTERY
%POSTACF
%POSTCOR
%POSTCOV
%POSTINT
%POSTSUM
%SUMINT
Effective sample sizes
Geweke diagnostic
Heidelberger-Welch diagnostic
Monte Carlo standard errors
Raftery diagnostic
Autocorrelation
Correlation matrix
Covariance matrix
Equal-tail and HPD intervals
Summary statistics
Mean, standard deviation, and HPD interval
* The %GEWEKE and %HEIDEL macros use a different optimization routine
than that used in PROC MCMC. As a result, there might be numerical differences
in some cases, especially when the sample size is small.
Table 74.48 lists options that are shared by all postprocessing autocall macros. See Table 74.49 for macrospecific options.
Table 74.48 Shared Options
Option
Description
DATA=SAS-data-set
VAR=variable-list
Input data set that contains posterior samples
Specifies the variables on which you want to carry out the calculation.
Displays the results. The default is YES.
Specifies a name for the output SAS data set to contain the results.
PRINT=YES | NO
OUT=SAS-data-set
Suppose that the data set that contains posterior samples is called post and that the variables of interest are
defined in the macro variable &PARMS. The following statements call the %ESS macro and calculates the
effective sample sizes for each variable:
%let parms = alpha beta u_1-u_17;
%ESS(data=post, var=&parms);
By default, the ESS estimates are displayed. You can choose not to display the result and save the output to a
data set with the following statement:
%ESS(data=post, var=&parms, print=NO, out=eout);
Some of the macros can take additional options, which are listed in Table 74.49.
5746 F Chapter 74: The MCMC Procedure
Table 74.49 Macro-Specific Options
Macro
Option
Description
%ESS
AUTOCORLAG=numeric
%HEIDEL
HIST=YES|NO
SALPHA=numeric
Specifies the maximum number of autocorrelation lags used
in computing the ESS estimates. By default, AUTOCORLAG=MIN(500, NOBS/4), where NOBS is the sample size
of the input data set.
Displays a histogram of all ESS estimates. The default is NO.
Specifies the ˛ level for the stationarity test. By default,
SALPHA=0.05.
Specifies the ˛ level for the halfwidth test. By default,
HALPHA=0.05.
Specifies a small positive number such that if the halfwidth is
less than times the sample mean of the remaining iterations,
the halfwidth test is passed. By default, EPS=0.1.
Specifies the earlier portion of the Markov chain used in the test.
By default, FRAC1=0.1.
Specifies the latter portion of the Markov chain used in the test.
By default, FRAC2=0.5.
Specifies the maximum number of autocorrelation lags used
in computing the Monte Carlo standard error estimates. By
default, AUTOCORLAG=MIN(500, NOBS/4), where NOBS is
the sample size of the input data set.
Specifies the order of the quantile of interest. By default,
Q=0.025.
Specifies the margin of error for measuring the accuracy of
estimation of the quantile. By default, R=0.005.
Specifies the probability of attaining the accuracy of the estimation of the quantile. By default, S=0.95.
Specifies the tolerance level for the stationary test. By default,
EPS=0.001.
Specifies autocorrelation lags calculated. The default values are
1, 5, 10, and 50.
Specifies the ˛ level .0 < ˛ < 1/ for the interval estimates. By
default, ALPHA=0.05.
HALPHA=numeric
EPS=numeric
%GEWEKE
FRAC1=numeric
FRAC2=numeric
%MCSE
AUTOCORLAG=numeric
%RAFTERY
Q=numeric
R=numeric
S=numeric
EPS=numeric
%POSTACF
LAGS=%STR(numeric-list )
%POSTINT
ALPHA=value
For example, the following statement calculates and displays autocorrelation at lags 1, 6, 11, 50, and 100.
Note that the lags in the numeric-list need to be separated by commas “,”.
%PostACF(data=post, var=&parms, lags=%str(1 to 15 by 5, 50, 100));
Gamma and Inverse-Gamma Distributions
The gamma and inverse gamma distributions are widely used in Bayesian analysis. With their respective
scale and inverse scale parameterizations, they are a frequent source of confusion in the field. This section
Gamma and Inverse-Gamma Distributions F 5747
aims to clarify their parameterizations and common usages.
The gamma distribution is often used as the conjugate prior for the precision parameter ( D 1= 2 ) in a
normal distribution. See Table 74.19 in the section “Standard Distributions” on page 5700 for the density
definitions. You can specify the distribution in two ways:
gamma(shape=, scale=) which has mean shapescale and variance shape scale2
shape
gamma(shape=, iscale=) which has mean iscale
and variance shape 2
iscale
The parameterization of the gamma distribution that is preferred by most Bayesian analysts is to have the
same number in both hyperparameter positions, which results in a prior distribution that has mean 1. To do
this, you should use the iscale= parameterization. In addition, if you choose a small value (for example,
0.01), the prior distribution takes on a large variance (100 in this example). To specify this prior in PROC
MCMC, use gamma(shape=0.01, iscale=0.01)4 , not gamma(shape=0.01, scale=0.01).
If you specify the scale= parameterization, as in gamma(shape=0.01, scale=0.01), you would get a
prior distribution that has mean 0.0001 and variance 0.000001. This would lead to a completely different
posterior inference: the prior would push the precision parameter estimate close to 0, or the variance estimate
to a large value.
The inverse-gamma distribution is often used as the conjugate prior of the variance parameter ( 2 ) in a
normal distribution. See Table 74.22 in the section “Standard Distributions” on page 5700 for the density
definitions. Similar to the gamma distribution, you can specify the inverse-gamma distribution in two ways:
igamma(shape=, scale=)
igamma(shape=, iscale=)
The inverse gamma distribution does not have a mean when the shape parameter is less than or equal to 1 and
does not have a variance when the shape parameter is less than or equal to 2.
A gamma prior distribution on the precision is the equivalent to an inverse gamma prior distribution on the
variance. The equivalency is the following:
gamma(shape=0.01, iscale=0.01) , 2 igamma(shape=0.01, scale=0.01)
N OTE : This mnemonic might help you remember the parameterization scheme of the distributions. If you
prefer to have identical hyperparameter values in the distribution, you should specify one and only one
“i.”. When the “i” appears in the igamma distribution name for the variance parameter, choose the scale=
parameterization; when the “i” appears in the iscale= parameterization, choose the gamma distribution for
the precision parameter.
If you are not sure about the choices of other hyperparameter values and what type of prior distributions they
induce, you can write a simple PROC MCMC program and see the distributions as in the following example:
4 Specifying
the same number at both positions and choosing a small value has been popularized by the WinBUGS
software program. The WinBUGS’s distribution specification of dgamma(0.01, 0.01) is equivalent to specifying
gamma(shape=0.01, iscale=0.01) in PROC MCMC.
5748 F Chapter 74: The MCMC Procedure
data a;
run;
ods graphics on;
ods select DensityPanel;
proc mcmc data=a stats=none diag=none nmc=10000 outpost=gout
plots=density seed=1;
parms gamma_3_is2 gamma_001_sc4 igamma_12_sc001 igamma_2_is01;
prior gamma_3_is2
~ gamma(shape=3, iscale=2);
prior gamma_001_sc4
~ gamma(shape=0.01, scale=4);
prior igamma_12_sc001 ~ igamma(shape=12, scale=0.01);
prior igamma_2_is01
~ igamma(shape=2, iscale=0.1);
model general(0);
run;
ods graphics off;
The preceding statements specify four different gamma and inverse gamma distributions with various scale
and inverse scale parameter values. The output of kernel density plots of these four prior distributions is
shown in Figure 74.20. Note how the X axis scales vary across different distributions.
Figure 74.20 Density Plots of Different Gamma and Inverse Gamma Distributions
Posterior Predictive Distribution F 5749
Posterior Predictive Distribution
The posterior predictive distribution
Z
p.ypred jy/ D p.ypred j /p.jy/d
can often be used to check whether the model is consistent with data. For more information about using
predictive distribution as a model checking tool, see Gelman et al. 2004, Chapter 6 and the bibliography in
i
that chapter. The idea is to generate replicate data from p.ypred jy/—call them ypred
, for i D 1; : : : ; M ,
where M is the total number of replicates—and compare them to the observed data to see whether there
are any large and systematic differences. Large discrepancies suggest a possible model misfit. One way to
compare the replicate data to the observed data is to first summarize the data to some test quantities, such as
the mean, standard deviation, order statistics, and so on. Then compute the tail-area probabilities of the test
statistics (based on the observed data) with respect to the estimated posterior predictive distribution that uses
the M replicate ypred samples.
Let T ./ denote the function of the test quantity, T .y/ the test quantity that uses the observed data, and
i
T .ypred
/ the test quantity that uses the ith replicate data from the posterior predictive distribution. You
calculate the tail-area probability by using the following formula:
Pr.T .ypred / > T .y/j /
The following example shows how you can use PROC MCMC to estimate this probability.
An Example for the Posterior Predictive Distribution
This example uses a normal mixed model to analyze the effects of coaching programs for the scholastic
aptitude test (SAT) in eight high schools. For the original analysis of the data, see Rubin (1981). The
presentation here follows the analysis and posterior predictive check presented in Gelman et al. (2004). The
data are as follows:
title 'An Example for the Posterior Predictive Distribution';
data SAT;
input effect se @@;
ind=_n_;
datalines;
28.39 14.9 7.94 10.2 -2.75 16.3
6.82 11.0 -0.64 9.4 0.63 11.4
18.01 10.4 12.16 17.6
;
The variable effect is the reported test score difference between coached and uncoached students in eight
schools. The variable se is the corresponding estimated standard error for each school. In a normal mixed
effect model, the variable effect is assumed to be normally distributed:
effecti normal.i ; se2 / for i D 1; : : : ; 8
The parameter i has a normal prior with hyperparameters .m; v/:
i normal.m; var = v/
5750 F Chapter 74: The MCMC Procedure
The hyperprior distribution on m is a uniform prior on the real axis, and the hyperprior distribution on v is a
uniform prior from 0 to infinity.
The following statements fit a normal mixed model and use the PREDDIST statement to generate draws from
the posterior predictive distribution.
ods listing close;
proc mcmc data=SAT outpost=out nmc=40000 seed=12;
parms m 0;
parms v 1 /slice;
prior m ~ general(0);
prior v ~ general(1,lower=0);
random mu ~ normal(m,var=v) subject=ind monitor=(mu);
model effect ~ normal(mu,sd=se);
preddist outpred=pout nsim=5000;
run;
ods listing;
The ODS LISTING CLOSE statement disables the listing output because you are primarily interested in the
samples of the predictive distribution. The HYPER, PRIOR, and MODEL statements specify the Bayesian
model of interest. The PREDDIST statement generates samples from the posterior predictive distribution and
stores the samples in the Pout data set. The predictive variables are named effect_1, : : :, effect_8. When no
COVARIATES option is specified, the covariates in the original input data set SAT are used in the prediction.
The NSIM= option specifies the number of predictive simulation iterations.
The following statements use the Pout data set to calculate the four test quantities of interest: the average
(mean), the sample standard deviation (sd), the maximum effect (max), and the minimum effect (min). The
output is stored in the Pred data set.
data pred;
set pout;
mean = mean(of effect:);
sd = std(of effect:);
max = max(of effect:);
min = min(of effect:);
run;
The following statements compute the corresponding test statistics, the mean, standard deviation, and the
minimum and maximum statistics on the real data and store them in macro variables. You then calculate
the tail-area probabilities by counting the number of samples in the data set Pred that are greater than the
observed test statistics based on the real data.
proc means data=SAT noprint;
var effect;
output out=stat mean=mean max=max min=min stddev=sd;
run;
data _null_;
set stat;
call symputx('mean',mean);
call symputx('sd',sd);
call symputx('min',min);
call symputx('max',max);
run;
Posterior Predictive Distribution F 5751
data _null_;
set pred end=eof nobs=nobs;
ctmean + (mean>&mean);
ctmin + (min>&min);
ctmax + (max>&max);
ctsd + (sd>&sd);
if eof then do;
pmean = ctmean/nobs; call symputx('pmean',pmean);
pmin = ctmin/nobs; call symputx('pmin',pmin);
pmax = ctmax/nobs; call symputx('pmax',pmax);
psd = ctsd/nobs; call symputx('psd',psd);
end;
run;
You can plot histograms of each test quantity to visualize the posterior predictive distributions. In addition,
you can see where the estimated p-values fall on these densities. Figure 74.21 shows the histograms. To put
all four histograms on the same panel, you need to use PROC TEMPLATE to define a new graph template.
(See Chapter 21, “Statistical Graphics Using ODS.”) The following statements define the template twobytwo:
proc template;
define statgraph twobytwo;
begingraph;
layout lattice / rows=2 columns=2;
layout overlay / yaxisopts=(display=none)
xaxisopts=(label="mean");
layout gridded / columns=2 border=false
autoalign=(topleft topright);
entry halign=right "p-value =";
entry halign=left eval(strip(put(&pmean, 12.2)));
endlayout;
histogram mean / binaxis=false;
lineparm x=&mean y=0 slope=. /
lineattrs=(color=red thickness=5);
endlayout;
layout overlay / yaxisopts=(display=none)
xaxisopts=(label="sd");
layout gridded / columns=2 border=false
autoalign=(topleft topright);
entry halign=right "p-value =";
entry halign=left eval(strip(put(&psd, 12.2)));
endlayout;
histogram sd / binaxis=false;
lineparm x=&sd y=0 slope=. /
lineattrs=(color=red thickness=5);
endlayout;
layout overlay / yaxisopts=(display=none)
xaxisopts=(label="max");
layout gridded / columns=2 border=false
autoalign=(topleft topright);
entry halign=right "p-value =";
entry halign=left eval(strip(put(&pmax, 12.2)));
endlayout;
histogram max / binaxis=false;
lineparm x=&max y=0 slope=. /
lineattrs=(color=red thickness=5);
5752 F Chapter 74: The MCMC Procedure
endlayout;
layout overlay / yaxisopts=(display=none)
xaxisopts=(label="min");
layout gridded / columns=2 border=false
autoalign=(topleft topright);
entry halign=right "p-value =";
entry halign=left eval(strip(put(&pmin, 12.2)));
endlayout;
histogram min / binaxis=false;
lineparm x=&min y=0 slope=. /
lineattrs=(color=red thickness=5);
endlayout;
endlayout;
endgraph;
end;
run;
You call PROC SGRENDER to create the graph, which is shown in Figure 74.21. (See the SGRENDER
procedure in the SAS ODS Graphics: Procedures Guide.) There are no extreme p-values observed; this
supports the notion that the predicted results are similar to the actual observations and that the model fits the
data.
proc sgrender data=pred template=twobytwo;
run;
Figure 74.21 Posterior Predictive Distribution Check for the SAT example
Note that the posterior predictive distribution is not the same as the prior predictive distribution. The prior
predictive distribution is p.y/, which is also known as the marginal distribution of the data. The prior
Handling of Missing Data F 5753
predictive distribution is an integral of the likelihood function with respect to the prior distribution
Z
p.ypred / D p.ypred j /p. /d
and the distribution is not conditional on observed data.
Handling of Missing Data
PROC MCMC automatically augments missing values5 via the use of the MODEL statement. PROC MCMC
treats missing values as unknown parameters, assigns distributions to the variables, and incorporates the
sampling of the missing data as part of Markov chain.
(In SAS/STAT 9.3 and earlier releases, by default, PROC MCMC discarded all observations that had missing
or partial missing values. PROC MCMC could not model missing values.)
You can use the MISSING= option in the PROC MCMC statement to specify how you want PROC MCMC
to handle the missing values. If you specify MISSING=CC (CC stands for complete cases), PROC MCMC
discards all observations that have missing or partial missing values before carrying out the simulation. If
you specify MISSING=AC (AC stands for all cases), PROC MCMC neither discards any missing values nor
augments them.
Generally speaking, there are three types of missing data models, as discussed by Rubin (1976). Also see
Little and Rubin (2002) for a comprehensive treatment of missing data analysis. The rest of this section
provides an overview of these three types of missing data models and explains how to use PROC MCMC to
fit them.
Missing Completely at Random (MCAR)
Data are said to be MCAR if the probability of a missing value (or the failure of observing a value) does
not depend on any other observations in the data set, regardless of whether they are observed or missing.
That is, the observed and unobserved values are independent of each other: if yi is missing, it is MCAR if
the probability of observing yi is independent of other yj (and other covariates xi ) in the data set. Under
this assumption, both the observed and unobserved data are random samples of all the data; hence, fitting
a model based only on the observed data does not introduce any biases. This type of analysis is called a
complete-case analysis. To carry out a complete-case analysis, you must specify MISSING=CC in the PROC
MCMC statement. (In SAS/STAT 9.3 and earlier, PROC MCMC performed a complete-case analysis when
the data contained missing values.)
Missing at Random (MAR)
Data are said to be MAR if the probability of a missing value can depend on some observed quantities but
does not depend on any unobserved data. For example, suppose that xi are completely observed for all
observations and some yi are missing. MAR states that the probability of observing yi is independent of
other missing yi (values that could have been observed) and that it depends only on xi (and, potentially,
observed yi ).
The MAR assumption states that the missing yi are no longer random samples and that they need to be
modeled (via the likelihood specification of the missing values). At the same time, the independence
5A
missing value is usually, although not necessarily, represented by a single period (.) in the input data set.
5754 F Chapter 74: The MCMC Procedure
assumption of the missing values on the unobserved quantities states that the missing mechanism (usually an
binary indicator variable such that ri D 1 if yi is missing and ri D 0 otherwise) can be ignored and does
not need to be taken into account. Hence, MAR is sometimes referred to as ignorably missing. It is not the
missing values that can be ignored, it is the missing mechanism that can be ignored.
By default, PROC MCMC treats the missing data as MAR (this assumes that you do not input a binary
indicator variable ri and model it specifically): each missing value becomes an extra parameter and PROC
MCMC updates it in every iteration. PROC MCMC assumes that both the missing values and observed
values arise from the same distribution (which is specified in the MODEL statement),
y D fyobs ; ymis g f .yj /
where y consists of observed (yobs ) and missing (ymis ) values, and f .yj / is the likelihood function with
parameters .
You can use the MODEL statement to model missing covariates. Using multiple MODEL statements enables
you to specify, for example, a marginal distribution for missing values in covariate x and a conditional
distribution for the response variable y given x as follows:
model x ~ normal(alpha, var=s2_x);
model y ~ normal(beta * x, var=s2_y);
In each iteration, PROC MCMC draws samples for every missing value in variable x, then every missing
value in variable y, conditional on the drawn values of the x variable.
Missing Not at Random (MNAR)
Data are said to be MNAR if the probability of a missing value depends on unobserved data (or data that
could have been observed): the probability that yi is missing depends on the missing values of other yi . This
is a very general scenario that assumes that the missing mechanism is no longer ignorable (it is sometimes
referred to as nonignorably missing) and that a model for the missing mechanism is required in order to make
correct inferences about the model parameters.
Let R D .r1 ; : : : ; rn / be the missing value indicator for Y D .y1 ; : : : ; yn /, where ri D 1 if yi is missing
and ri D 0 otherwise. This R is usually part of an input data set where you preprocess the response variable
and create this missing value indicator variable. Modeling MNAR data implies that you must specify a
joint likelihood function over R and Y W f .R; YjX; /, where X represents the covariates and represents
the model parameters. This joint distribution can be factored in two ways: a pattern-mixture model and a
selection model.
The selection model factors the joint distribution R and Y into a marginal distribution for Y and a conditional
distribution for R,
f .R; YjX; / / f .YjX; ˛/ f .RjY; X; ˇ/
where D .˛; ˇ/, f .RjY; X; ˛/ is usually a binary model with a logit or probit link that involves regression
parameters ˛, and f .YjX; ˇ/ is the sampling distribution that generates yi with model parameters ˇ.
The pattern-mixture model factors the opposite way, a marginal distribution for R and a conditional distribution for Y,
f .R; YjX; / / f .RjX; / f .YjR; X; ı/
where D .; ı/.
Functions of Random-Effects Parameters F 5755
You can use PROC MCMC to fit either model by specifying multiple MODEL statements: one for the
marginal distribution and one for the conditional distribution. Suppose that the variable r is the missing data
indicator, which is modeled using a logit model, and that the response variable y is a Poisson regression that
includes the missing variable indicator as one of its covariates. The following statements are a PROC MCMC
program that fits a pattern-mixture model:
pi = logistic(alpha * x1);
model r ~ binary(pi);
mu = beta0 + beta1 * x2 + beta3 * r;
model y ~ poisson(exp(mu));
The first MODEL statement uses a binary model with logit link to model the missing mechanism, and the
second MODEL statement models the response variable with a Poisson regression that includes the missing
value indicator as one of its covariates. Each of the two sets of regression has its covariates and regression
coefficients. If this hypothetical data set contained missing values in covariates x1 and x2, you could add two
more MODEL statements to handle each variable as follows:
model x1 ~ normal(mu1, var=s2_x1);
pi = logistic(alpha * x1);
model r ~ binary(pi);
model x2 ~ normal(mu2, var=s2_x2);
mu = beta0 + beta1 * x2 + beta3 * r;
model y ~ poisson(exp(mu));
Functions of Random-Effects Parameters
When you specify a RANDOM statement in a program, PROC MCMC internally creates a random-effects
parameter for every unique value in the SUBJECT= variable. You can calculate any transformations of these
random-effects parameters by applying SAS functions to the effect, and you can use the transformed variable
in the subsequent statements. For example, the following statements perform a logit transformation of an
effect:
random u ~ normal(mu, var=s2) subject=students;
p = logistic(u);
...
The value of the variable p changes with u as the procedure steps through the input data set: for different
unique values of the students variable, u takes on a different parameter value, and p changes accordingly.
To save all the transformed values in p to the OUTPOST= data set, you cannot just specify the MONITOR=(p)
option in the PROC MCMC statement. With such a specification, PROC MCMC can save only one value of
p (usually the value associated with the last observation in the data set); it cannot save all values. To output
all transformed values, you must create an array to store every transformation and use the MONITOR=
option to save the entire array to the OUTPOST= data set. The difficult part of the programming involves
the creation of the correct array index to use in different types of the SUBJECT= variables. The rest of this
section describes how to monitor functions of random-effects parameters in different situations.
5756 F Chapter 74: The MCMC Procedure
Indexing Subject Variable
This subsection describes how to monitor transformation of an effect u when the students variable is an
indexing subject variable. An indexing subject variable is an integer variable that takes value from one to the
total number of unique subjects in a variable. In other words, the variable can be used as an index in a SAS
array. The indexing subject variable does not need to be sorted for the example code in this section to work.
An example of an indexing variable takes the values of (1 2 3 4 5 1 2 3 4 5), where the total number of
observation is n=10 and the number of unique values is m=5.
The following statements create an indexing variable students in the data set a:
data a;
input students @@;
datalines;
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
;
The following statements run a random-effects model without any response variables. There are only randomeffects parameters in the model; the program calculates the logit transformation of each effect, saves the
results to the OUTPOST= data set, and produces Figure 74.22:
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
plots=none seed=1;
array p[10];
random u ~ n(0, sd=1) subject=students;
p[students] = logistic(u);
model general(0);
run;
proc print data=a1(obs=3);
run;
The ARRAY statement creates an array p of size 10, which is the number of unique values in students.
The p array stores all the transformed values. The RANDOM statement declares u to be an effect with the
subject variable students. The P[STUDENTS] assignment statement calculates the logit transformations
of u and saves them in appropriate array elements—this is why the students variable must be an indexing
variable. Because the students variable used in the p[] array is also the subject variable in the RANDOM
statement, PROC MCMC can match each random-effects parameter with the corresponding element in array
p. The MONITOR= option monitors all elements in p and saves the output to the a1 data set. The a1 data set
contains variables p1–p10. Figure 74.22 shows the first three observations of the OUTPOST= data set.
Figure 74.22 Monitor Functions of Random Effect u
Obs Iteration
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
u_1
u_2
1
1 0.5050 0.7334 0.5375 0.5871 0.6862 0.4944 0.5740 0.5991 0.6812 0.8678 0.0198 1.0120
2
2 0.5563 0.4254 0.7260 0.5280 0.4593 0.5797 0.5813 0.7360 0.9079 0.3351 0.2261 -0.3006
3
3 0.6132 0.8031 0.4735 0.3135 0.7047 0.5273 0.6761 0.2379 0.2938 0.3986 0.4607 1.4058
Obs
u_3
u_4
u_5
u_6
u_7
u_8
u_9
u_10 LogReff LogLike LogPost
1 0.1504 0.3521 0.7826 -0.0225 0.2983 0.4019 0.7595 1.8819 -12.2658
0 -12.2658
2 0.9742 0.1120 -0.1630 0.3213 0.3281 1.0253 2.2887 -0.6854 -13.2393
0 -13.2393
3 -0.1060 -0.7837 0.8698 0.1092 0.7361 -1.1641 -0.8772 -0.4112 -12.3984
0 -12.3984
Functions of Random-Effects Parameters F 5757
The variable p1 is the logit transformation of the variable u_1, p2 is the logit transformation of the variable
u_2, and so on.
The same idea works for a students variable that is unsorted. The following statements create an unsorted indexing variable students with repeated measures in each subject, fit the same model, and produce
Figure 74.23:
data a;
input students @@;
datalines;
1 1 1 3 5 3 4 5 3 1 5 5 4 4 2 2 2 2 4 3
;
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
plots=none seed=1;
array p[5];
random u ~ n(0, sd=1) subject=students;
p[students] = logistic(u);
model general(0);
run;
proc print data=a1(obs=3);
run;
Figure 74.23 Monitor Functions of Random Effect u When the students Variable is Unsorted
Obs Iteration
p1
p2
p3
p4
p5
u_1
u_3
u_5
u_4
u_2 LogReff LogLike LogPost
1
1 0.5050 0.6862 0.7334 0.5871 0.5375 0.0198 1.0120 0.1504 0.3521 0.7826 -5.4865
0
-5.4865
2
2 0.4944 0.8678 0.5740 0.6812 0.5991 -0.0225 0.2983 0.4019 0.7595 1.8819 -6.7793
0
-6.7793
3
3 0.5563 0.4593 0.4254 0.5280 0.7260 0.2261 -0.3006 0.9742 0.1120 -0.1630 -5.1595
0
-5.1595
There are five random-effects parameters in this example, and the array p also has five elements. The values
p1–p5 are the transformations of u_1–u_5, respectively. The u variables are not sorted from u_1 to u_5
because PROC MCMC creates the names according to the order by which the subject variable appears
in the input data set. Nevertheless, because students is an indexing variable, the first element p[1] stores
the transformation that corresponds to students=1 (which is u_1), the second element p[2] stores the
transformation that corresponds to students=2, and so on.
Non-Indexing Subject Variable
A non-indexing subject variable can take values of character literals (for example, names) or numerals (for
example, ZIP code or a person’s weight). This section illustrates how to monitor functions of random-effects
parameters in these situations.
Suppose you have unsorted character literals as the subject variable:
data a;
input students$ @@;
datalines;
smith john john mary kay smith lizzy ben ben dylan
ben toby abby mary kay kay lizzy ben dylan mary
;
5758 F Chapter 74: The MCMC Procedure
A statement such as following does not work anymore because a character variable cannot be used as an
array index:
p[students] = logistic(u);
In this situation, you usually need to do two things: (1) find out the number of unique values in the subject
variable, and (2) create a numeric index variable that replaces the students array index. You can use the
following statements to do the first task:
proc sql noprint;
select count(distinct(students)) into :nuniq from a;
quit;
%put &nuniq;
The PROC SQL call counts the distinct values in the students variable and saves the count to the macro
variable &nuniq. The macro variable is used later to specify the element size of the p array. In this example,
the a data set contains 20 observations and 9 unique elements (the value of &nuinq).
The following statements create an Index variable in the a data set that is in the order by which the students
names appear in the data set:
proc freq data=a order=data noprint;
tables students / out=_f(keep=students);
run;
proc print data=_f;
run;
data a(drop=n);
set a;
do i = 1 to nobs until(students=n);
set _f(keep=students rename=(students=n)) point=i nobs=nobs;
Index = i;
end;
run;
proc print data=a;
run;
The PROC FREQ call identifies the unique students names and saves them to the _f data set, which is
displayed in Figure 74.24.
Figure 74.24 Unique Names in the Variable Students
Obs students
1 smith
2 john
3 mary
4 kay
5 lizzy
6 ben
7 dylan
8 toby
9 abby
Functions of Random-Effects Parameters F 5759
The DATA step steps through the a data set and creates an Index variable to match the order in which the
students names appear in the data set. The new a data set6 is displayed in Figure 74.25.
Figure 74.25 New Index Variable in the a Data Set
Obs students Index
1 smith
1
2 john
2
3 john
2
4 mary
3
5 kay
4
6 smith
1
7 lizzy
5
8 ben
6
9 ben
6
10 dylan
7
11 ben
6
12 toby
8
13 abby
9
14 mary
3
15 kay
4
16 kay
4
17 lizzy
5
18 ben
6
19 dylan
7
20 mary
3
Student smith is the first subject, and his Index value is one. The same student appears again in the sixth
observation, which is given the same Index value. Now, this Index variable can be used to index the p array,
in a similar fashion as demonstrated in previous programs:
data _f;
set _f;
subj = compress('p_'||students);
run;
proc sql noprint;
select subj into :pnames separated by ' ' from _f;
quit;
%put &pnames;
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
plots=none seed=1;
array p[&nuniq] &pnames;
random u ~ n(0, sd=1) subject=students;
p[index] = logistic(u);
model general(0);
run;
proc print data=a1(obs=3);
run;
6 The
programming code that creates and adds the Index variable to the data set a keeps all variables from the original data set
and does not discard them.
5760 F Chapter 74: The MCMC Procedure
The first part of the DATA step and the PROC SQL call create array names for the p array that match the
subject names in the students variable. It is not necessary to include these steps for the PROC MCMC
program to work, but it makes the output more readable. The first DATA step steps through the _f data set
and creates a subj variable that concatenates the prefix characters p_ with the names of the students. The
PROC SQL calls put all the subj values to a macro called &pnames, which looks like the following:
p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby
In the PROC MCMC program, the ARRAY statement defines an p array with size &nuniq (9), and use the
macro variable &pnames to name the array elements. The P[INDEX] assignment statement uses the Index
variable to find the correct array element to store the transformation. Figure 74.26 displays the first few
observations of the OUTPOST=a1 data set.
Figure 74.26 First Few Observations of the Outpost Data Set
Obs Iteration p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby u_smith u_john
1
1
0.5050 0.7334
0.5375 0.5871 0.6862 0.4944
0.5740 0.5991 0.6812
0.0198 1.0120
2
2
0.8678 0.5563
0.4254 0.7260 0.5280 0.4593
0.5797 0.5813 0.7360
1.8819 0.2261
3
3
0.9079 0.3351
0.6132 0.8031 0.4735 0.3135
0.7047 0.5273 0.6761
2.2887 -0.6854
Obs u_mary u_kay u_lizzy
u_ben u_dylan u_toby u_abby LogReff LogLike LogPost
1 0.1504 0.3521 0.7826 -0.0225
0.2983 0.4019 0.7595
-9.5761
0
-9.5761
2 -0.3006 0.9742 0.1120 -0.1630
0.3213 0.3281 1.0253 -11.2370
0 -11.2370
3 0.4607 1.4058 -0.1060 -0.7837
0.8698 0.1092 0.7361 -13.1867
0 -13.1867
There are nine random-effects parameters (u_smith, u_john, and so on). There are nine elements of the p
array (p_smith, p_john, and so on); each is the logit transformation of corresponding u elements.
You can use the same statements for subject variables that are numeric non-indexing variables. The following
statements create a students variable that take large numbers that cannot be used as indices to an array. The
rest of the program monitors functions of the effect u. The output is not displayed here.
data a;
call streaminit(1);
do i = 1 to 20;
students = rand("poisson", 20);
output;
end;
drop i;
run;
proc sql noprint;
select count(distinct(students)) into :nuniq from a;
quit;
%put &nuniq;
proc freq data=a order=data noprint;
tables students / out=_f(keep=students);
run;
Spatial Prior F 5761
data a(drop=n);
set a;
do i = 1 to nobs until(students=n);
set _f(keep=students rename=(students=n)) point=i nobs=nobs;
Index = i;
end;
run;
data _f;
set _f;
subj = compress('p_'||students);
proc sql noprint;
select subj into :pnames separated by ' ' from _f;
quit;
%put &pnames;
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
plots=none seed=1;
array p[&nuniq] &pnames;
random u ~ n(0, sd=1) subject=students;
p[index] = logistic(u);
model general(0);
run;
proc print data=a1(obs=3);
run;
Spatial Prior
You can use the conditional autoregressive Gaussian (NORMALCAR) prior in the RANDOM statement to
model random effects that are spatially correlated. Suppose that the areas are indexed by the variable ID in a
SAS data set. To model spatial dependence among areas, you can use the following statement:
random a ~ normalcar(neighbors=, num=, sd|var|prec=) subject=ID;
The NUM= option specifies the data set variable that indicates the number of neighbors for each ID, and
the NEIGHBORS= option specifies the prefix (without quotation marks) of data variables that contain the
adjacent IDs (neighbors) for each subject. The names of all neighboring variables must have this same prefix
and must be followed by numbers 1, 2, : : :, N, where N is the maximum number of neighbors that a subject
has in the data set. Missing values are allowed in the neighboring ID variables if a subject has fewer than N
neighbors.
The default specification of the NORMALCAR prior corresponds to the full conditional distribution of each
subject is the following CAR model,
0
i j
i
normal @
1
X
j 2N.i /
j =mi ; s 2 A
5762 F Chapter 74: The MCMC Procedure
where N.i/ is the set of neighbors of area i, mi is the total number of neighbors of area i, and s is the standard
deviation of the normal distribution.
You can also specify a CAR prior in which the variance is weighed by the number of neighbors:
0
i j
i
normal @
1
X
j =mi ; s 2 =mi A
j 2N.i /
To specify this prior, you use the following statement, where NumNei is the variable of number of neighbors
and s2 is the variance parameter:
random a ~ normalcar(neighbor=, num=NumNei, var=s2/NumNei) subject=ID;
In most cases, the CAR prior is an improper prior because the corresponding covariance matrix of D
fi ; : : : ; p g is less than full rank. If the posterior distribution is proper, then using an improper prior is not
an issue. A common practice is to re-center all the draws after each iteration (Banerjee, Carlin, and Gelfand
2004, p. 164). In effect, this re-centering sets the prior mean of the random effects to be zero. The centering
step is performed by using the CENTER option in the RANDOM statement.
Here is an example that illustrates the use of the NORMALCAR prior distribution.
The following statements create a SAS data set that contains lung cancer data from a London Health Authority
annual report (Thomas et al. 2004):
title 'Spatial Analysis';
data London;
input ID N1-N9 NumNei O E Depriv;
datalines;
1
4
8
9 12 17
.
.
.
2
7 10 13 14
.
.
.
.
.
.
5
4
4
8
7.2090
7.8144
1.233
8.162
.
7
7
6.3737
3.961
... more lines ...
44 22
run;
24
25
26
31
32
41
.
The 44 wards are indicated by the ID variable. N1–N9 are variables of neighboring indices for each ward.
For example, ward 1 is adjacent to five wards: 4, 8, 9, 12, and 17. The NumNei variable is the total number
of neighbor wards that each ward has. The maximum number of neighbors wards is nine in the data set, so
every observation must have nine neighboring ID variables. Missing data (".") are used in neighboring ID
variables that have fewer than nine neighbors. The O and E variables are simulated observed and expected
counts of lung cancer incidence. The Depriv variable is the socioeconomic variable for each ward.
The following random-effects model is considered,
Oi
Poisson.i /
log.i / D log.Ei / C ˛ C ˇ Deprivei C bi C hi
Floating Point Errors and Overflows F 5763
where the random effects bi are given a spatial prior and hi is a normal prior for each area. You can fit this
model using the following PROC MCMC statements:
ods select none;
proc mcmc data=london seed=615926 nbi=10000 nmc=50000 thin=10
plots=none outpost=londonpost;
parms tau_b 0.5 tau_h 0.2;
parms alpha 0 beta 0;
prior tau: ~ gamma(0.5, is=0.0005);
prior alpha ~ general(0);
prior beta ~ n(0, prec=1e-5);
random h ~ n(0, prec=tau_h) s=id;
random b ~ normalcar(neighbors=n,num=numnei,prec=tau_b*numnei) s=id;
mu=e*exp(alpha + beta*depriv + b + h);
model o ~ poisson(mu);
run;
The two RANDOM statements specify the hi and the bi random effects; the NORMALCAR prior is assigned
to bi . In the NORMALCAR prior, NEIGHBORS=N specifies the prefix “n” for all the neighboring ID
variables, NUM=NUMNEI specifies the number of neighbors that each ID has, and PREC=TAU_B*NUMNEI
weights the precision parameter by the number of neighbors in the model.
The example assumes that the adjacency matrices have been created and stored in a SAS data set. More often,
you might have the map polygon data in an .shp file. In that case, you can use the %NEIGHBOR autocall
macro to import simple polygons and to compute the adjacent units and number of neighbors for each spatial
unit:
%neighbor("map.shp", IDVAR=ID, OUTNBRS=neighbor, OUTADJ=adjacent);
The first argument is the path to the shape file, which must be specified in quotation marks. The IDVAR=
option specifies the ID variable that identifies the sites or the units. The OUTNBRS= and OUTADJ= options
save the neighborhood and adjacency information, respectively, to two SAS data sets. The default values for
the IDVAR=, OUTNBRS=, and OUTADJ= options are ID, Neighbor, and Adjacent, respectively. You can
combine these SAS data sets with covariates and response information before you use PROC MCMC to fit
the spatial model.
Floating Point Errors and Overflows
When performing a Markov chain Monte Carlo simulation, you must calculate a proposed jump and an
objective function (usually a posterior density). These calculations might lead to arithmetic exceptions and
overflows. A typical cause of these problems is parameters with widely varying scales. If the posterior
variances of your parameters vary by more than a few orders of magnitude, the numerical stability of the
optimization problem can be severely reduced and can result in computational difficulties. A simple remedy
is to rescale all the parameters so that their posterior variances are all approximately equal. Changing the
SCALE= option might help if the scale of your parameters is much different than one. Another source of
numerical instability is highly correlated parameters. Often a model can be reparameterized to reduce the
posterior correlations between parameters.
If parameter rescaling does not help, consider the following actions:
5764 F Chapter 74: The MCMC Procedure
provide different initial values or try a different seed value
use boundary constraints to avoid the region where overflows might happen
change the algorithm (specified in programming statements) that computes the objective function
Problems Evaluating Code for Objective Function
The initial values must define a point for which the programming statements can be evaluated. However,
during simulation, the algorithm might iterate to a point where the objective function cannot be evaluated.
If you program your own likelihood, priors, and hyperpriors by using SAS statements and the GENERAL
function in the MODEL, PRIOR, AND HYPERPRIOR statements, you can specify that an expression cannot
be evaluated by setting the value you pass back through the GENERAL function to missing. This tells the
PROC MCMC that the proposed set of parameters is invalid, and the proposal will not be accepted. If you
use the shorthand notation that the MODEL, PRIOR, AND HYPERPRIOR statements provide, this error
checking is done for you automatically.
Long Run Times
PROC MCMC can take a long time to run for problems with complex models, many parameters, or large
input data sets. Although the techniques used by PROC MCMC are some of the best available, they are not
guaranteed to converge or proceed quickly for all problems. Ill-posed or misspecified models can cause the
algorithms to use more extensive calculations designed to achieve convergence, and this can result in longer
run times. You should make sure that your model is specified correctly, that your parameters are scaled to the
same order of magnitude, and that your data reasonably match the model that you are specifying.
To speed general computations, you should check over your programming statements to minimize the
number of unnecessary operations. For example, you can use the proportional kernel in the priors or the
likelihood and not add constants in the densities. You can also use the BEGINCNST and ENDCNST to
reduce unnecessary computations on constants, and the BEGINNODATA and ENDNODATA statements to
reduce observation-level calculations.
Reducing the number of blocks (the number of the PARMS statements) can speed up the sampling process. A
single-block program is approximately three times faster than a three-block program for the same number of
iterations. On the other hand, you do not want to put too many parameters in a single block, because blocks
with large size tend not to produce well-mixed Markov chains.
If some parameters satisfy the conditional independence assumption, such as in the random-effects models or
latent variable models, consider using the RANDOM statement to model these parameters. This statement
takes advantage of the conditional independence assumption and can sample a larger number of parameters
at a more efficient pace.
Slow or No Convergence
If the simulator is slow or fails to converge, you can try changing the model as follows:
Change the number of Monte Carlo iterations (NMC=), or the number of burn-in iterations (NBI=), or
both. Perhaps the chain just needs to run a little longer. Note that after the simulation, you can always
use the DATA step or the FIRSTOBS data set option to throw away initial observations where the
algorithm has not yet burned in, so it is not always necessary to set NBI= to a large value.
Floating Point Errors and Overflows F 5765
Increase the number of tuning. The proposal tuning can often work better in large models (models that
have more parameters) with larger values of NTU=. The idea of tuning is to find a proposal distribution
that is a good approximation to the posterior distribution. Sometimes 500 iterations per tuning phase
(the default) is not sufficient to find a good approximating covariance.
Change the initial values to more feasible starting values. Sometimes the proposal tuning starts badly
if the initial values are too far away from the main mass of the posterior density, and it might not be
able to recover.
Use the PROPCOV= option to start the Markov chain at better starting values. With the PROPCOV=QUANEW option, PROC MCMC optimizes the object function and uses the posterior mode as
the starting value of the Markov chain. In addition, a quadrature approximation to the posterior mode
is used as the proposal covariance matrix. This option works well in many cases and can improve the
mixing of the chain and shorten the tuning and burn-in time.
Parameterize your model to include conjugacy, such as using the gamma prior on the precision
parameter in a normal distribution or using an inverse gamma on the variance parameter. For a list of
conjugate sampling methods that PROC MCMC supports, see the section “Conjugate Sampling” on
page 5697.
Change the blocking by using the PARMS statements. Sometimes poor mixing and slow convergence
can be attributed to highly correlated parameters being in different parameter blocks.
Modify the target acceptance rate. A target acceptance rate of about 25% works well for many
multi-parameter problems, but if the mixing is slow, a lower target acceptance rate might be better.
Change the initial scaling or the TUNEWT= option to possibly help the proposal tuning.
Consider using a different proposal distribution. If from a trace plot you see that a chain traverses to
the tail area and sometimes takes quite a few simulations before it comes back, you can consider using
a t proposal distribution. You can do this by either using the PROC option PROPDIST=T or using a
PARMS statement option T.
Transform parameters and sample on a different scale. For example, if a parameter has a gamma
distribution, sample on the logarithm scale instead. A parameter a that has a gamma distribution is
equivalent to log.a/ that has an egamma distribution, with the same distribution specification. For
example, the following two formulations are equivalent:
parm a;
prior a ~ gamma(shape = 0.001, scale = 0.001);
and
parm la;
prior la ~ egamma(shape = 0.001, scale = 0.001);
a = exp(la);
5766 F Chapter 74: The MCMC Procedure
See “Example 74.6: Nonlinear Poisson Regression Models” on page 5804 and “Example 74.20: Using
a Transformation to Improve Mixing” on page 5882. You can also use the logit transformation on
parameters that have uniform.0; 1/ priors. This prior is often used on probability parameters. The logit
transformation is as follows: q D log. 1 pp /. The distribution on q is the Jacobian of the transformation:
exp. q/.1 C exp. q// 2 .
Again, the following two formulations are equivalent:
parm p;
prior p ~ uniform(0, 1);
and
parm q;
lp = -q - 2 * log(1 + exp(-q));
prior q ~ general(lp);
p = 1/(1+exp(-q));
Precision of Solution
In some applications, PROC MCMC might produce parameter values that are not precise enough. Usually,
this means that there were not enough iterations in the simulation. At best, the precision of MCMC estimates
increases with the square of the simulation sample size. Autocorrelation in the parameter values deflate the
precision of the estimates. For more information about autocorrelations in Markov chains, see the section
“Autocorrelations” on page 149 in Chapter 7, “Introduction to Bayesian Analysis Procedures.”
Handling Error Messages
PROC MCMC does not have a debugger. This section covers a few ways to debug and resolve error messages.
Using the PUT Statement
Adding the PUT statement often helps to find errors in a program. The following statements produce an error:
data a;
run;
proc mcmc data=a seed=1;
parms sigma lt w;
beginnodata;
prior sigma ~ unif(0.001,100);
s2 = sigma*sigma;
prior lt ~ gamma(shape=1, iscale=0.001);
t = exp(lt);
Handling Error Messages F 5767
c = t/s2;
d = 1/(s2);
prior w ~ gamma(shape=c, iscale=d);
endnodata;
model general(0);
run;
ERROR: PROC MCMC is unable to generate an initial value for the
parameter w. The first parameter in the prior distribution is
missing.
To find out why the shape parameter c is missing, you can add the put statement and examine all the
calculations that lead up to the assignment of c:
proc mcmc data=a seed=1;
parms sigma lt w;
beginnodata;
prior sigma ~ unif(0.001,100);
s2 = sigma*sigma;
prior lt ~ gamma(shape=1, iscale=0.001);
t = exp(lt);
c = t/s2;
d = 1/(s2);
put c= t= s2= lt=; /* display the values of these symbols. */
prior w ~ gamma(shape=c, iscale=d);
endnodata;
model general(0);
run;
In the log file, you see the following:
c=. t=. s2=. lt=.
c=. t=. s2=2500.0500003 lt=1000
c=. t=. s2=2500.0500003 lt=1000
ERROR: PROC MCMC is unable to generate an initial value for the parameter w.
The first parameter in the prior distribution is missing.
You can ignore the first few lines. They are the results of initial set up by PROC MCMC. The last line is
important. The variable c is missing because t is the exponential of a very large number, 1000, in lt. The
value 1000 is assigned to lt by PROC MCMC because none was given. The gamma prior with shape of 1
and inverse scale of 0.001 has mode 0 (see “Standard Distributions” on page 5700 for more details). PROC
MCMC avoids starting the Markov chain at the boundary of the support of the distribution, and it uses the
mean value here instead. The mean of the gamma prior is 1000, hence the problem. You can change how
the initial value is generated by using the PROC statement INIT=RANDOM. Remember to take out the put
statement once you identify the problem. Otherwise, you will see a voluminous output in the log file.
5768 F Chapter 74: The MCMC Procedure
Using the HYPER Statement
You can use the HYPER statement to narrow down possible errors in the prior distribution specification.
With multiple PRIOR statements in a program, you might see the following error message if one of the prior
distributions is not specified correctly:
ERROR: The initial prior parameter specifications must yield log
of positive prior density values.
This message is displayed when PROC MCMC detects an error in the prior distribution calculation but cannot
pinpoint the specific parameter at fault. It is frequently, although not necessarily, associated with parameters
that have GENERAL or DGENERAL distributions. If you have a complicated model with many PRIOR
statements, finding the parameter at fault can be time consuming. One way is to change a subset of the
PRIOR statements to HYPER statements. The two statements are treated the same in PROC MCMC and
the simulation is not affected, but you get a different message if the hyperprior distributions are calculated
incorrectly:
ERROR: The initial hyperprior parameter specifications must yield
log of positive hyperprior density values.
This message can help you identify more easily which distributions are producing the error, and you can then
use the PUT statement to further investigate.
Computational Resources
It is impossible to estimate how long it will take for a general Markov chain to converge to its stationary
distribution. It takes a skilled and thoughtful analysis of the chain to decide whether it has converged to
the target distribution and whether the chain is mixing rapidly enough. In some cases, you might be able
to estimate how long a particular simulation might take. The running time of a program that does not have
RANDOM statements is approximately linear to the following factors: the number of samples in the input
data set, the number of simulations, the number of blocks in the program, and the speed of your computer.
For an analysis that uses a data set of size nsamples, a simulation length of nsim, and a block design of
nblocks, PROC MCMC evaluates the log-likelihood function the following number of times, excluding the
tuning phase:
nsamples nsim nblocks
The faster your computer evaluates a single log-likelihood function, the faster this program runs. Suppose
you have nsamples equal to 200, nsim equal to 55,000, and nblocks equal to 3. PROC MCMC evaluates the
log-likelihood function approximately 3:3 107 times. If your computer can evaluate the log likelihood for
one observation 106 times per second, this program takes approximately a half a minute to run. If you want
to increase the number of simulations five-fold, the run time increases approximately five-fold.
Each RANDOM statement adds one pass through the input data at each iteration. If the Metropolis algorithm
is used to sample the random-effects parameter, the conditional density (objective function) is calculated
twice per pass through the data, which requires a computational resource that is approximately equivalent to
adding two blocks of parameters.
Displayed Output F 5769
Of course, larger problems take longer than shorter ones, and if your model is amenable to frequentist
treatment, then one of the other SAS procedures might be more suitable. With “regular” likelihoods and
a lot of data, the results of standard frequentist analysis are often asymptotically equivalent to a Bayesian
approach. If PROC MCMC requires too much CPU time, then perhaps another SAS/STAT tool would be
suitable.
Displayed Output
This section describes the output that PROC MCMC displays. For a quick reference of all ODS table
names, see the section “ODS Table Names” on page 5773. ODS tables are arranged under four groups,
which are listed in the following sections: “Model and Data Related ODS Tables” on page 5769, “Sampling
Related ODS Tables” on page 5770, “Posterior Statistics Related ODS Tables” on page 5771, “Convergence
Diagnostics Related ODS Tables” on page 5771, and “Optimization Related ODS Tables” on page 5773.
Model and Data Related ODS Tables
Missing Data Information Table
The “Missing Data Information” table (ODS table name MISSDATAINFO) displays the name of the response
variable that contains missing values, the number of missing observations, the corresponding observation
indices in the input data set, and the sampling method used in the simulation for the missing values.
Number of Observation Table
The “NObs” table (ODS table name NOBS) shows the number of observations that is in the data set and
the number of observations that is used in the analysis. By default, observations with missing values are not
used (see the section “Handling of Missing Data” on page 5753 for more details). This table is displayed by
default.
Parameters
The “Parameters” table (ODS table name Parameters) shows the name of each parameter, the block number
of each parameter, the sampling method used for the block, the initial values, and the prior or hyperprior
distributions. This table is displayed by default.
REObsInfo
The “Random Effect Observation Information” table (ODS table name REObsInfo) lists the name of the
random effect, each subject value, the number of observations in each subject, and their corresponding
observation indices in the input data set. You can request this table by specifying the REOBSINFO option.
REParameters
The “REParameters” table (ODS table name REParameters) lists the name of the random effect, sampling
algorithm, the subject variable, the number of subjects, unique values of the subject variable, and the prior
distribution. This table is displayed by default if a RANDOM statement is used in the program.
5770 F Chapter 74: The MCMC Procedure
Sampling Related ODS Tables
Burn-In History
The “Burn-In History” table (ODS table name BurnInHistory) shows the scales and acceptance rates for
each parameter block in the burn-in phase. The table is not displayed by default and can be requested by
specifying the option MCHISTORY=BRIEF | DETAILED.
Parameters Initial Value Table
The “Parameters Initial” table (ODS table name ParametersInit) shows the value of each parameter after
the tuning phase. This table is not displayed by default and can be requested by specifying the option
INIT=PINIT.
Posterior Samples
The “Posterior Samples” table (ODS table name PosteriorSample) stores posterior draws of all parameters.
It is not printed by PROC MCMC. You can create an ODS output data set of the chain by specifying the
following:
ODS OUTPUT PosteriorSample = SAS-data-set;
Sampling History
The “Sampling History” table (ODS table name SamplingHistory) shows the scales and acceptance rates for
each parameter block in the main sampling phase. The table is not displayed by default and can be requested
by specifying the option MCHISTORY=BRIEF | DETAILED.
Tuning Covariance
The “Tuning Covariance” table (ODS table name TuneCov) shows the proposal covariance matrices for
each parameter block after the tuning phase. The table is not displayed by default and can be requested
by specifying the option INIT=PINIT. For more details about proposal tuning, see the section “Tuning the
Proposal Distribution” on page 5694.
Tuning History
The “Tuning History” table (ODS table name TuningHistory) shows the number of tuning phases used
in establishing the proposal distribution. The table also displays the scales and acceptance rates for each
parameter block at each of the tuning phases. For more information about the self-adapting proposal tuning
algorithm used by PROC MCMC, see the section “Tuning the Proposal Distribution” on page 5694. The
table is not displayed by default and can be requested by specifying the option MCHISTORY=BRIEF |
DETAILED.
Tuning Probability Vector
The “Tuning Probability” table (ODS table name TuneP) shows the proposal probability vector for each
discrete parameter block (when the option DISCRETE=GEO is specified and the geometric proposal distribution is used for discrete parameters) after the tuning phase. The table is not displayed by default and can be
requested by specifying the option INIT=PINIT. For more information about proposal tuning, see the section
“Tuning the Proposal Distribution” on page 5694.
Displayed Output F 5771
Posterior Statistics Related ODS Tables
PROC MCMC calculates some essential posterior statistics and outputs them to a number of ODS tables that
you can request and save individually. For details of the calculations, see the section “Summary Statistics” on
page 150 in Chapter 7, “Introduction to Bayesian Analysis Procedures.”
Summary and Interval Statistics
The “Posterior Summaries and Intervals” table (ODS table name PostSumInt) contains a summary of basic
point and interval statistics for each parameter. The table lists the number of posterior samples, the posterior
mean and standard deviation estimates, and the 95% HPD interval estimates. This table is displayed by
default.
Summary Statistics
The “Posterior Summaries” table (ODS table name PostSummaries) contains basic statistics for each
parameter. The table lists the number of posterior samples, the posterior mean and standard deviation
estimates, and the percentile estimates. The table is not displayed by default and can be requested by
specifying the option STATISTICS=SUMMARY.
Correlation Matrix
The “Posterior Correlation Matrix” table (ODS table name Corr) contains the posterior correlation of
model parameters. The table is not displayed by default and can be requested by specifying the option
STATISTICS=CORR.
Covariance Matrix
The “Posterior Covariance Matrix” table (ODS table name Cov) contains the posterior covariance of model
parameters. The table is not displayed by default and can be requested by specifying the option STATISTICS=COV.
Deviance Information Criterion
The “Deviance Information Criterion” table (ODS table name DIC) contains the DIC of the model. The table
is not displayed by default and can be requested by specifying the option DIC. For details of the calculations,
see the section “Deviance Information Criterion (DIC)” on page 152 in Chapter 7, “Introduction to Bayesian
Analysis Procedures.”
Interval Statistics
The “Posterior Intervals” table (ODS table name PostIntervals) contains the equal-tail and highest posterior
density (HPD) interval estimates for each parameter. The default ˛ value is 0.05, and you can change it to
other levels by using the STATISTICS= option. The table is not displayed by default and can be requested by
specifying the option STATISTICS=INTERVAL.
Convergence Diagnostics Related ODS Tables
PROC MCMC has convergence diagnostic tests that check for Markov chain convergence. PROC MCMC
produces a number of ODS tables that you can request and save individually. For details in calculation,
see the section “Statistical Diagnostic Tests” on page 141 in Chapter 7, “Introduction to Bayesian Analysis
Procedures.”
5772 F Chapter 74: The MCMC Procedure
Autocorrelation
The “Autocorrelations” table (ODS table name AUTOCORR) contains the first-order autocorrelations of the
posterior samples for each parameter. The “Parameter” column states the name of the parameter. By default,
PROC MCMC displays lag 1, 5, 10, and 50 estimates of the autocorrelations. You can request different
autocorrelations by using the DIAGNOSTICS = AUTOCORR(LAGS=) option. The table is not displayed by
default and can be requested by specifying the option DIAGNOSTICS=AUTOCORR.
Effective Sample Size
The “Effective Sample Sizes” table (ODS table name ESS) calculates the effective sample size of each
parameter. See the section “Effective Sample Size” on page 149 in Chapter 7, “Introduction to Bayesian
Analysis Procedures,” for more details. The table is displayed by default.
Monte Carlo Standard Errors
The “Monte Carlo Standard Errors” table (ODS table name MCSE) calculates the standard errors of the
posterior mean estimate. See the section “Standard Error of the Mean Estimate” on page 150 in Chapter 7,
“Introduction to Bayesian Analysis Procedures,” for more details. The table is not displayed by default and
can be requested by specifying the option DIAGNOSTICS=MCSE.
Geweke Diagnostics
The “Geweke Diagnostics” table (ODS table name Geweke) lists the result of the Geweke diagnostic test. See
the section “Geweke Diagnostics” on page 143 in Chapter 7, “Introduction to Bayesian Analysis Procedures,”
for more details. The table is not displayed by default and can be requested by specifying the option
DIAGNOSTICS=GEWEKE.
Heidelberger-Welch Diagnostics
The “Heidelberger-Welch Diagnostics” table (ODS table name Heidelberger) lists the result of the
Heidelberger-Welch diagnostic test. The test is consisted of two parts: a stationary test and a half-width
test. See the section “Heidelberger and Welch Diagnostics” on page 145 in Chapter 7, “Introduction to
Bayesian Analysis Procedures,” for more details. The table is not displayed by default and can be requested
by specifying DIAGNOSTICS = HEIDEL.
Raftery-Lewis Diagnostics
The “Raftery-Lewis Diagnostics” table (ODS table name Raftery) lists the result of the Raftery-Lewis
diagnostic test. See the section “Raftery and Lewis Diagnostics” on page 146 in Chapter 7, “Introduction to
Bayesian Analysis Procedures,” for more details. The table is not displayed by default and can be requested
by specifying DIAGNOSTICS = RAFTERY.
Summary Statistics for Prediction
The “Posterior Summaries for Prediction” table (ODS table name PredSummaries) contains basic statistics
for each prediction. The table lists the number of posterior samples, the posterior mean and standard deviation
estimates, and the percentile estimates. This table is displayed by default if any PREDDIST statement is used
in the program.
Interval Statistics for Prediction
The “Posterior Intervals for Prediction” table (ODS table name PredIntervals) contains the equal-tail and
highest posterior density (HPD) interval estimates for each prediction. The default ˛ value is 0.05, and
you can change it to other levels by using the STATISTICS option in a PREDDIST statement, or the
ODS Table Names F 5773
STATISTICS= option in the PROC MCMC statement if the option is not specified in a statement. This table
is displayed by default if any PREDDIST statement is used in the program.
Optimization Related ODS Tables
PROC MCMC can perform optimization on the joint posterior distribution. This is requested by the
PROPCOV= option. The most commonly used optimization method is the quasi-Newton method: PROPCOV=QUANEW(ITPRINT). The ITPRINT option displays the ODS tables, listed as follows:
Input Options
The “Input Options” table (ODS table name InputOptions) lists optimization options used in the procedure.
Optimization Start
The “Optimization Start” table (ODS table name ProblemDescription) shows the initial state of the optimization.
Iteration History
The “Iteration History” table (ODS table name IterHist) shows iteration history of the optimization.
Optimization Results
The “Optimization Results” table (ODS table name IterStop) shows the results of the optimization, includes
information about the number of function calls, and the optimized objective function, which is the joint log
posterior density.
Convergence Status
The “Convergence Status” table (ODS table name ConvergenceStatus) shows whether the convergence
criterion is satisfied.
Parameters Value After Optimization Table
The “Parameter Values After Optimization” table (ODS table name OptiEstimates) lists the parameter values
that maximize the joint log posterior. These are the maximum a posteriori point estimates, and they are used
to start the Markov chain.
Covariance Matrix After Optimization Table
The “Proposal Covariance” table (ODS table name OptiCov) lists covariance matrices for each block
parameter by using quadrature approximation at the posterior mode. These covariance matrices are used in
the proposal distribution.
ODS Table Names
PROC MCMC assigns a name to each table it creates. You can use these names to refer to the table when you
use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in
Table 74.50. For more information about ODS, see Chapter 21, “Statistical Graphics Using ODS.”
5774 F Chapter 74: The MCMC Procedure
Table 74.50 ODS Tables Produced in PROC MCMC
ODS Table Name
Description
Statement or Option
AutoCorr
Autocorrelation statistics for each
parameter
History of burn-in phase sampling
Optimization convergence status
Correlation matrix of the posterior
samples
Covariance matrix of the posterior
samples
Deviance information criterion
Effective sample size for each
parameter
Monte Carlo standard error for each
parameter
Geweke diagnostics for each
parameter
Heidelberger-Welch diagnostics for
each parameter
Optimization input table
Optimization iteration history
Optimization results table
Response variable, number of missing
observations, missing observation
indices, and sampling algorithm
Number of observations
Parameter values after either
optimization
Covariance used in proposal
distribution after optimization
Summary of the PARMS,
BLOCKING, PRIOR, sampling
method, and initial value specification
Parameter values after the tuning
phase
Posterior samples for each parameter
Equal-tail and HPD intervals for each
parameter
Basic posterior statistics for each
parameter, including sample size,
mean, standard deviation, and HPD
intervals
Basic posterior statistics for each
parameter, including sample size,
mean, standard deviation, and
percentiles
DIAGNOSTICS=AUTOCORR
BurnInHistory
ConvergenceStatus
Corr
Cov
DIC
ESS
MCSE
Geweke
Heidelberger
InputOptions
IterHist
IterStop
MissDataInfo
NObs
OptiEstimates
OptiCov
Parameters
ParametersInit
PosteriorSample
PostIntervals
PostSumInt
PostSummaries
MCHISTORY=BRIEF | DETAILED
PROPCOV=method(ITPRINT)
STATS=CORR
STATS=COV
DIC
Default
DIAGNOSTICS=MCSE
DIAGNOSTICS=GEWEKE
DIAGNOSTICS=HEIDEL
PROPCOV=method(ITPRINT)
PROPCOV=method(ITPRINT)
PROPCOV=method(ITPRINT)
Default with sampling of missing values
Default
PROPCOV=method(ITPRINT)
PROPCOV=method(ITPRINT)
Default
INIT=PINIT
(For ODS output data set only)
STATISTICS=INTERVAL
Default
STATISTICS=SUMMARY
ODS Graphics F 5775
Table 74.50 (continued)
ODS Table Name
Description
Statement or Option
PredIntervals
Equal-tail and HPD intervals for each
prediction
Basic posterior statistics for each
prediction
Optimization table
Random effect, subject values,
number of observations in each unique
subject value, and corresponding
observation indices
Random effect, sampling method,
subject variable, number of subjects,
unique values of the subject variable,
and prior distribution of the random
effect
Raftery-Lewis diagnostics for each
parameter
History of main phase sampling
Proposal covariance matrix (for
continuous parameters) after the
tuning phase
Proposal probability vector (for
discrete parameters) after the tuning
phase
History of proposal distribution tuning
Default with any PREDDIST statement
PredSummaries
ProblemDescription
REObsInfo
REParameters
Raftery
SamplingHistory
TuneCov
TuneP
TuningHistory
Default with any PREDDIST statement
PROPCOV=method(ITPRINT)
REOBSINFO
Default with any RANDOM statement
DIAGNOSTICS=RAFTERY
MCHISTORY=BRIEF | DETAILED
INIT=PINIT
INIT=PINIT and DISCRETE=GEO
MCHISTORY=BRIEF | DETAILED
ODS Graphics
Statistical procedures use ODS Graphics to create graphs as part of their output. ODS Graphics is described
in detail in Chapter 21, “Statistical Graphics Using ODS.”
Before you create graphs, ODS Graphics must be enabled (for example, by specifying the ODS GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the section
“Enabling and Disabling ODS Graphics” on page 607 in Chapter 21, “Statistical Graphics Using ODS.”
The overall appearance of graphs is controlled by ODS styles. Styles and other aspects of using ODS
Graphics are discussed in the section “A Primer on ODS Statistical Graphics” on page 606 in Chapter 21,
“Statistical Graphics Using ODS.”
You can reference every graph produced through ODS Graphics with a name. The names of the graphs that
PROC MCMC generates are listed in Table 74.51.
5776 F Chapter 74: The MCMC Procedure
Table 74.51 Graphs Produced by PROC MCMC
ODS Graph Name
Plot Description
Statement & Option
ADPanel
Autocorrelation function
and density panel
Autocorrelation function
panel
Autocorrelation function
plot
Density panel
Density plot
Trace and autocorrelation
function panel
Trace, density, and autocorrelation function panel
Trace and density panel
Trace panel
Trace plot
PLOTS=(AUTOCORR DENSITY)
AutocorrPanel
AutocorrPlot
DensityPanel
DensityPlot
TAPanel
TADPanel
TDPanel
TracePanel
TracePlot
PLOTS=AUTOCORR
PLOTS(UNPACK)=AUTOCORR
PLOTS=DENSITY
PLOTS(UNPACK)=DENSITY
PLOTS=(TRACE AUTOCORR)
PLOTS=(TRACE AUTOCORR DENSITY)
PLOTS=(TRACE DENSITY)
PLOTS=TRACE
PLOTS(UNPACK)=TRACE
Examples: MCMC Procedure
Example 74.1: Simulating Samples From a Known Density
This example illustrates how you can obtain random samples from a known function. The target distributions
are the normal distribution (a standard distribution) and a mixture of the normal distributions (a nonstandard
distribution). For more information, see the sections “Standard Distributions” on page 5700 and “Specifying
a New Distribution” on page 5715). This example also shows how you can use PROC MCMC to estimate an
integral (area under a curve). Monte Carlo simulation is data-independent; hence, you do not need an input
data set from which to draw random samples from the desired distribution.
Sampling from a Normal Density
When you run a simulation without an input data set, the posterior distribution is the same as the prior
distribution. Hence, if you want to generate samples from a distribution, you declare the distribution in the
PRIOR statement and set the likelihood function to a constant. Although there is no contribution from any
data set variable to the likelihood calculation, you still must specify a data set and the MODEL statement
needs a distribution. You can input an empty data set and use the GENERAL function to provide a flat
likelihood. The following statements generate 10,000 samples from a standard normal distribution:
title 'Simulating Samples from a Normal Density';
data x;
run;
Example 74.1: Simulating Samples From a Known Density F 5777
ods graphics on;
proc mcmc data=x outpost=simout seed=23 nmc=10000 diagnostics=none;
ods exclude nobs;
parm alpha 0;
prior alpha ~ normal(0, sd=1);
model general(0);
run;
The ODS GRAPHICS ON statement enables ODS Graphics. The PROC MCMC statement specifies the
input and output data sets, a random number seed, and the size of the simulation sample. The STATISTICS=
option displays only the summary and interval statistics. The ODS EXCLUDE statement excludes the display
of the NObs table. PROC MCMC draws independent samples from the normal distribution directly (see
Output 74.1.1). Therefore, the simulation does not require any tuning, and PROC MCMC omits the default
burn-in phrase.
Output 74.1.1 Parameters Information
Simulating Samples from a Normal Density
The MCMC Procedure
Parameters
Sampling Initial
Block Parameter Method Value Prior Distribution
1 alpha
Direct
0 normal(0, sd=1)
The summary statistics (Output 74.1.2) are what you would expect from a standard normal distribution.
Output 74.1.2 MCMC Summary and Interval Statistics from a Normal Target Distribution
Simulating Samples from a Normal Density
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
alpha
N
Standard
95%
Mean Deviation HPD Interval
10000 0.00195
0.9949 -1.9664 1.9302
The trace plot (Output 74.1.3) shows perfect mixing with no autocorrelation in the lag plot. This is expected
because these are independent draws.
5778 F Chapter 74: The MCMC Procedure
Output 74.1.3 Diagnostics Plots for ˛
You can overlay the estimated kernel density with the true density to visually compare the densities, as
displayed in Output 74.1.4. To create the kernel comparison plot, you first call PROC KDE (see Chapter 67,
“The KDE Procedure”) to obtain a kernel density estimate of the posterior density on alpha. Then you
evaluate a grid of alpha values on a normal density. The following statements evaluate kernel density and
compute the corresponding normal density:
proc kde data=simout;
ods exclude inputs controls;
univar alpha /out=sample;
run;
data den;
set sample;
alpha = value;
true = pdf('normal', alpha, 0, 1);
keep alpha density true;
run;
Next you plot the two curves on top of each other by using PROC SGPLOT (see Chapter 21, “Statistical
Graphics Using ODS”) as follows:
proc sgplot data=den;
yaxis label="Density";
series y=density x=alpha / legendlabel = "MCMC Kernel";
series y=true x=alpha / legendlabel = "True Density";
discretelegend;
run;
Example 74.1: Simulating Samples From a Known Density F 5779
Output 74.1.4 shows the result. You can see that the kernel estimate and the true density are very similar to
each other.
Output 74.1.4 Estimated Density versus the True Density
Density Visualization Macro
In programs that do not involve any data set variables, PROC MCMC samples directly from the (joint)
prior distributions of the parameters. The modification makes the sampling from a known distribution more
efficient and more precise. For example, you can write simple programs, such as the following macro, to
understand different aspects of a prior distribution of interest, such as its moments, intervals, shape, spread,
and so on:
%macro density(dist=, seed=0);
%let savenote = %sysfunc(getoption(notes));
options nonotes;
title "&dist distribution.";
data _a;
run;
ods select densitypanel postsumint;
proc mcmc data=_a nmc=10000 diag=none nologdist
plots=density seed=&seed;
parms alpha;
prior alpha ~ &dist;
model general(0);
run;
5780 F Chapter 74: The MCMC Procedure
proc datasets nolist;
delete _a;
run;
options &savenote;
%mend;
%density(dist=beta(4, 12), seed=1);
The macro %density creates an empty data set, invokes PROC MCMC, draws 10,000 samples from a beta(4,
12) distribution, displays summary and interval statistics, and generates a kernel density plot. Summary and
interval statistics from the beta distribution are displayed in Output 74.1.5.
Output 74.1.5 Beta Distribution Statistics
beta(4, 12) distribution.
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
alpha
N
Standard
95%
Mean Deviation HPD Interval
10000 0.2494
0.1039 0.0657 0.4590
The distribution is displayed in Output 74.1.6.
Output 74.1.6 Density Plot
Example 74.1: Simulating Samples From a Known Density F 5781
Calculation of Integrals
One advantage of MCMC methods is to estimate any integral under the curve of a target distribution. This
can be done fairly easily using the MCMC procedure. Suppose you are interested in estimating the following
cumulative probability:
Z
1:3
.˛j0; 1/d˛
˛D0
To estimate this integral, PROC MCMC draws samples from the distribution and counts the portion of the
simulated values that fall within the desired range of [0, 1.3]. This becomes a Monte Carlo estimate of the
integral. The following statements simulate samples from a standard normal distribution and estimate the
integral:
proc mcmc data=x outpost=simout seed=23 nmc=10000 nologdist
monitor=(int) diagnostics=none;
ods select postsumint;
parm alpha 0;
prior alpha ~ normal(0, sd=1);
int = (0 <= alpha <= 1.3);
model general(0);
run;
The ODS SELECT statement displays the posterior summary statistics table. The MONITOR= option outputs
analysis on the variable int (the integral estimate). The STATISTICS= option computes the summary statistics.
The NOLOGDIST option omits the calculation of the log of the prior distribution at each iteration, shortening
the simulation time7 . The INT assignment statement sets int to be 1 if the simulated alpha value falls between
0 and 1.3, and 0 otherwise. PROC MCMC supports the usage of the IF-ELSE logical control if you need to
account for more complex conditions. Output 74.1.7 displays the estimated integral value:
Output 74.1.7 Monte Carlo Integral from a Normal Distribution
beta(4, 12) distribution.
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
int
N
Standard
Mean Deviation
10000 0.4079
0.4915
95%
HPD
Interval
0 1.0000
In this simulation, 4079 samples fall between 0 and 1.3, making the expected probability 0.4079. In this
example, you can verify the actual cumulative probability by calling the CDF function in the DATA step:
data _null_;
int = cdf("normal", 1.3, 0, 1) - cdf("normal", 0, 0, 1);
put int=;
run;
The value is 0.4032.
7 In this example, the NOLOGDIST option saves only a fraction of the time. But in more complex simulation schemes that
involve a larger number of distributions and parameters, the time reduction could be significant.
5782 F Chapter 74: The MCMC Procedure
Sampling from a Mixture of Normal Densities
Suppose you are interested in generating samples from a three-component mixture of normal distributions,
with the density specified as follows:
p.˛/ D 0:3 . 3; D 2/ C 0:4 .2; D 1/ C 0:3 .10; D 4/
You can either specify the distribution directly or use a latent variable approach to generate samples from the
normal mixture.
To specify the normal mixture density directly in PROC MCMC, you need to construct the function because
the normal mixture distribution is not one of the standard distributions that PROC MCMC supports. The
following statements generate random samples from the normal mixture density:
title 'Simulating Samples from a Mixture of Normal Densities';
data x;
run;
proc mcmc data=x outpost=simout seed=1234 nmc=30000;
ods select TADpanel;
parm alpha 0.3;
lp = logpdf('normalmix', alpha, 3, 0.3, 0.4, 0.3, -3, 2, 10, 2, 1, 4);
prior alpha ~ general(lp);
model general(0);
run;
The ODS SELECT statement displays the diagnostic plots. All other tables, such as the NObs tables, are
excluded. The PROC MCMC statement uses the input data set X, saves output to the Simout data set, sets a
random number seed, and draws 30,000 samples.
The LP assignment statement evaluates the log density of alpha at the mixture density, using the SAS function
LOGPDF. The number 3 after alpha in the LOGPDF function indicates that the density is a three-component
normal mixture. The following three numbers, 0.3, 0.4, and 0.3, are the weights in the mixture; –3, 2, and 10
are the means; 2, 1, and 4 are the standard deviations. The PRIOR statement assigns this log density function
to alpha as its prior. Note that the GENERAL function interprets the density on the log scale, and not the
original scale–you must use the LOGPDF function, not the PDF function. Output 74.1.8 displays the results.
The kernel density clearly shows three modes.
Example 74.1: Simulating Samples From a Known Density F 5783
Output 74.1.8 Plots of Posterior Samples from a Mixture Normal Distribution
Alternatively, the normal mixture distribution can also decomposed into a marginal distribution for the
component (call it Z) and a conditional model of the response variable Y given Z, as
z categorical.p1 ; p2 ; : : : ; pK /
yjz normal.z ; z2 /
where K is the total number of mixture components, z is the component indicator, and z and z2 are the
model parameters for the zth component.
Starting with SAS/STAT 13.2, PROC MCMC supports a categorical distribution that can be used to model
the discrete random variable for components. You can use PRIOR statements to specify a normal mixture
distribution and generate samples accordingly:
proc mcmc data=x outpost=simout_m seed=1234 nmc=30000;
array p[3] (0.3 0.4 0.3);
array mu[3] (-3 2 10);
array sd[3] (2 1 4);
parm z alpha;
prior z ~ table(p);
prior alpha ~ normal(mu[z], sd=sd[z]);
model general(0);
run;
5784 F Chapter 74: The MCMC Procedure
The ARRAY statements define one array p for the mixture weights, one array mu for the means of the normal
distributions, and one array sd for the corresponding standard deviations. The PRIOR statements specify
a categorical prior on the parameter z and a conditional normal prior on alpha. The mean and standard
deviation of the alpha parameter depend on the component indicator z. No output is created.
You can use the following set of statements, which are similar to the previous example, to overlay the
estimated kernel density with the true density. The comparison is shown in Output 74.1.9.
proc kde data=simout_m;
ods exclude inputs controls;
univar alpha /out=sample;
run;
data den;
set sample;
alpha = value;
true = pdf('normalmix', alpha, 3, 0.3, 0.4, 0.3, -3, 2, 10, 2, 1, 4);
keep alpha density true;
run;
proc sgplot data=den;
yaxis label="Density";
series y=density x=alpha /
legendlabel = "MCMC Kernel - Latent Variable Approach";
series y=true x=alpha / legendlabel = "True Density";
discretelegend;
run;
Output 74.1.9 Estimated Density (Latent Variable Approach) versus the True Density
Example 74.2: Box-Cox Transformation F 5785
Example 74.2: Box-Cox Transformation
Box-Cox transformations (Box and Cox 1964) are often used to find a power transformation of a dependent
variable to ensure the normality assumption in a linear regression model. This example illustrates how you
can use PROC MCMC to estimate a Box-Cox transformation for a linear regression model. Two different
priors on the transformation parameter are considered: a continuous prior and a discrete prior. You can
estimate the probability of being 0 with a discrete prior but not with a continuous prior. The IF-ELSE
statements are demonstrated in the example.
Using a Continuous Prior on The following statements create a SAS data set with measurements of y (the response variable) and x (a
single dependent variable):
title 'Box-Cox Transformation, with a Continuous Prior on Lambda';
data boxcox;
input y x @@;
datalines;
10.0 3.0 72.6 8.3 59.7 8.1 20.1 4.8 90.1 9.8
1.1 0.9
78.2 8.5 87.4 9.0
9.5 3.4
0.1 1.4
0.1 1.1 42.5 5.1
57.0 7.5
9.9 1.9
0.5 1.0 121.1 9.9 37.5 5.9 49.5 6.7
... more lines ...
2.6
1.8
58.6
7.9
81.2
8.1
37.2
6.9
;
The Box-Cox transformation of y takes on the form of:
( y
1
if ¤ 0I
y./ D
log.y/ if D 0:
The transformed response y./ is assumed to be normally distributed:
yi ./ normal.ˇ0 C ˇ1 xi ; 2 /
The likelihood with respect to the original response yi is as follows:
p.yi j; ˇ; 2 ; xi / / .yi jˇ0 C ˇ1 xi ; 2 / J.; yi /
where J.; yi / is the Jacobian:
1
yi
if ¤ 0I
J.; y/ D
1=yi if D 0:
And on the log-scale, the Jacobian becomes:
. 1/ log.yi / if ¤ 0I
log.J.; y// D
log.yi /
if D 0:
There are four model parameters: ; ˇ D fˇ0 ; ˇ1 g; and 2 . You can considering using a flat prior on ˇ and
a gamma prior on 2 .
5786 F Chapter 74: The MCMC Procedure
To consider only power transformations ( ¤ 0), you can use a continuous prior (for example, a uniform
prior from –2 to 2) on . One issue with using a continuous prior is that you cannot estimate the probability
of D 0. To do so, you need to consider a discrete prior that places positive probability mass on the point 0.
See “Modeling D 0” on page 5789.
The following statements fit a Box-Cox transformation model:
ods graphics on;
proc mcmc data=boxcox nmc=50000 propcov=quanew seed=12567
monitor=(lda);
ods select PostSumInt TADpanel;
parms beta0 0 beta1 0 lda 1 s2 1;
beginnodata;
prior beta: ~ general(0);
prior s2 ~ gamma(shape=3, scale=2);
prior lda ~ unif(-2,2);
sd = sqrt(s2);
endnodata;
ys = (y**lda-1)/lda;
mu = beta0+beta1*x;
ll = (lda-1)*log(y)+lpdfnorm(ys, mu, sd);
model general(ll);
run;
The PROPCOV= option initializes the Markov chain at the posterior mode and uses the estimated inverse
Hessian matrix as the initial proposal covariance matrix. The MONITOR= option selects as the variable to
report. The ODS SELECT statement displays the summary statistics table, the interval statistics table, and
the diagnostic plots.
The PARMS statement puts all four parameters, ˇ0 , ˇ1 , , and 2 , in a single block and assigns initial values
to each of them. Three PRIOR statements specify previously stated prior distributions for these parameters.
The assignment to sd transforms a variance to a standard deviation. It is better to place the transformation
inside the BEGINNODATA and ENDNODATA statements to save computational time.
The assignment to the symbol ys evaluates the Box-Cox transformation of y, where mu is the regression
mean and ll is the log likelihood of the transformed variable ys. Note that the log of the Jacobian term is
included in the calculation of ll.
Summary statistics and interval statistics for lda are listed in Output 74.2.1.
Output 74.2.1 Box-Cox Transformation
Box-Cox Transformation, with a Continuous Prior on Lambda
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
lda
N
Standard
95%
Mean Deviation HPD Interval
50000 0.4703
0.0287 0.4156 0.5284
The posterior mean of is 0.47, with a 95% equal-tail interval of Œ0:42; 0:53 and a similar HPD interval.
The preferred power transformation would be 0.5 (rounding up to the square root transformation).
Example 74.2: Box-Cox Transformation F 5787
Output 74.2.2 shows diagnostics plots for lda. The chain appears to converge, and you can proceed to make
inferences. The density plot shows that the posterior density is relatively symmetric around its mean estimate.
Output 74.2.2 Diagnostic Plots for To verify the results, you can use PROC TRANSREG (see Chapter 119, “The TRANSREG Procedure”) to
find the estimate of .
proc transreg data=boxcox details pbo;
ods output boxcox = bc;
model boxcox(y / convenient lambda=-2 to 2 by 0.01) = identity(x);
output out=trans;
run;
Output from PROC TRANSREG is shown in Output 74.2.5 and Output 74.2.4. PROC TRANSREG produces
a similar point estimate of D 0:46, and the 95% confidence interval is shown in Output 74.2.5.
5788 F Chapter 74: The MCMC Procedure
Output 74.2.3 Box-Cox Transformation Using PROC TRANSREG
Output 74.2.4 Estimates Reported by PROC TRANSREG
Box-Cox Transformation, with a Continuous Prior on Lambda
The TRANSREG Procedure
Model Statement Specification Details
Type DF Variable
Dep
Description
1 BoxCox(y) Lambda Used
Value
0.5
Lambda
0.46
Log Likelihood
-167.0
Conv. Lambda
0.5
Conv. Lambda LL -168.3
Ind
CI Limit
-169.0
Alpha
0.05
Options
Convenient Lambda Used
1 Identity(x) DF
1
The ODS data set Bc contains the 95% confidence interval estimates produced by PROC TRANSREG. This
ODS table is rather large, and you want to see only the relevant portion. The following statements generate
the part of the table that is important and display Output 74.2.5:
Example 74.2: Box-Cox Transformation F 5789
proc print noobs label data=bc(drop=rmse);
title2 'Confidence Interval';
where ci ne ' ' or abs(lambda - round(lambda, 0.5)) < 1e-6;
label convenient = '00'x ci = '00'x;
run;
The estimated 90% confidence interval is Œ0:41; 0:51, which is very close to the reported Bayesian credible
intervals. The resemblance of the intervals is probably due to the noninformative prior that you used in this
analysis.
Output 74.2.5 Estimated Confidence Interval on Box-Cox Transformation, with a Continuous Prior on Lambda
Confidence Interval
Dependent Lambda
Log
R-Square Likelihood
BoxCox(y)
-2.00
0.14
-1030.56
BoxCox(y)
-1.50
0.17
-810.50
BoxCox(y)
-1.00
0.22
-602.53
BoxCox(y)
-0.50
0.39
-415.56
BoxCox(y)
0.00
0.78
-257.92
BoxCox(y)
0.41
0.95
-168.40 *
BoxCox(y)
0.42
0.95
-167.86 *
BoxCox(y)
0.43
0.95
-167.46 *
BoxCox(y)
0.44
0.95
-167.19 *
BoxCox(y)
0.45
0.95
-167.05 *
BoxCox(y)
0.46
0.95
-167.04 <
BoxCox(y)
0.47
0.95
-167.16 *
BoxCox(y)
0.48
0.95
-167.41 *
BoxCox(y)
0.49
0.95
-167.79 *
BoxCox(y)
0.50 +
0.95
-168.28 *
BoxCox(y)
0.51
0.95
-168.89 *
BoxCox(y)
1.00
0.89
-253.09
BoxCox(y)
1.50
0.79
-345.35
BoxCox(y)
2.00
0.70
-435.01
Modeling D 0
With a continuous prior on , you can get only a continuous posterior distribution, and this makes the
probability of Pr. D 0jdata/ equal to 0 by definition. To consider D 0 as a viable solution to the Box-Cox
transformation, you need to use a discrete prior that places some probability mass on the point 0 and allows
for a meaningful posterior estimate of Pr. D 0jdata/.
This example uses a simulation study where the data are generated from an exponential likelihood. The
simulation implies that the correct transformation should be the logarithm and should be 0. Consider the
following exponential model:
y D exp.x C /;
5790 F Chapter 74: The MCMC Procedure
where normal.0; 1/. The transformed data can be fitted with a linear model:
log.y/ D x C The following statements generate a SAS data set with a gridded x and corresponding y:
title 'Box-Cox Transformation, Modeling Lambda = 0';
data boxcox;
do x = 1 to 8 by 0.025;
ly = x + normal(7);
y = exp(ly);
output;
end;
run;
The log-likelihood function, after taking the Jacobian into consideration, is as follows:
8
2
xi /
ˆ
< . 1/ log.yi / 1 log 2 C ..yi 1/=
C C1 if ¤ 0I
2
2
log p.yi j; xi / D
2
ˆ
: log.yi / 1 log 2 C .log.yi /2 xi / C C2
if D 0:
2
where C1 and C2 are two constants.
You can use the function DGENERAL to place a discrete prior on . The function is similar to the function
GENERAL, except that it indicates a discrete distribution. For example, you can specify a discrete uniform
prior from –2 to 2 using
prior lda ~ dgeneral(1, lower=-2, upper=2);
This places equal probability mass on five points, –2, –1, 0, 1, and 2. This prior might not work well here
because the grid is too coarse. To consider smaller values of , you can sample a parameter that takes a
wider range of integer values and transform it back to the space. For example, set alpha as your model
parameter and give it a discrete uniform prior from –200 to 200. Then define as alpha/100 so can take
values between –2 and 2 but on a finer grid.
The following statements fit a Box-Cox transformation by using a discrete prior on :
proc mcmc data=boxcox outpost=simout nmc=50000 seed=12567
monitor=(lda);
ods select PostSumInt;
parms s2 1 alpha 10;
beginnodata;
prior s2 ~ gamma(shape=3, scale=2);
if alpha=0 then lp = log(2);
else lp = log(1);
prior alpha ~ dgeneral(lp, lower=-200, upper=200);
lda = alpha * 0.01;
sd = sqrt(s2);
endnodata;
if alpha=0 then
ll = -ly+lpdfnorm(ly, x, sd);
else do;
ys = (y**lda - 1)/lda;
ll = (lda-1)*ly+lpdfnorm(ys, x, sd);
Example 74.2: Box-Cox Transformation F 5791
end;
model general(ll);
run;
There are two parameters, s2 and alpha, in the model. They are placed in a single PARMS statement so that
they are sampled in the same block.
The parameter s2 takes a gamma distribution, and alpha takes a discrete prior. The IF-ELSE statements
state that alpha takes twice as much prior density when it is 0 than otherwise. Note that on the original
scale, Pr.alpha D 0/ D 2 Pr.alpha ¤ 0/. Translating that to the log scale, the densities become log.2/ and
log.1/, respectively. The LDA assignment statement transforms alpha to the parameter of interest: lda takes
values between –2 and 2. You can model lda on a even smaller scale by dividing alpha by a larger constant.
However, an increment of 0.01 in the Box-Cox transformation is usually sufficient. The SD assignment
statement calculates the square root of the variance term.
The log-likelihood function uses another set of IF-ELSE statements, separating the case of D 0 from the
others. The formulas are stated previously. The output of the program is shown in Output 74.2.6.
Output 74.2.6 Box-Cox Transformation
Box-Cox Transformation, Modeling Lambda = 0
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
lda
N
95%
Standard
HPD
Mean Deviation Interval
50000 -0.00001
0.00199
0
0
From the summary statistics table, you see that the point estimate for is 0 and both of the 95% equal-tail
and HPD credible intervals are 0. This strongly suggests that D 0 is the best estimate for this problem. In
addition, you can also count the frequency of among posterior samples to get a more precise estimate on
the posterior probability of being 0.
The following statements use PROC FREQ to produce Output 74.2.7 and Output 74.2.8:
proc freq data=simout;
ods select onewayfreqs freqplot;
tables lda /nocum plot=freqplot(scale=percent);
run;
ods graphics off;
Output 74.2.7 shows the frequency count table. An estimate of Pr. D 0jdata/ is 96%. The conclusion is that
the log transformation should be the appropriate transformation used here, which agrees with the simulation
setup. Output 74.2.8 shows the histogram of .
5792 F Chapter 74: The MCMC Procedure
Output 74.2.7 Frequency Counts of Box-Cox Transformation, Modeling Lambda = 0
The FREQ Procedure
lda Frequency Percent
-0.0100
1011
2.02
0
48029
96.06
0.0100
960
1.92
Output 74.2.8 Histogram of Example 74.3: Logistic Regression Model with a Diffuse Prior
This example illustrates how to fit a logistic regression model with a diffuse prior in PROC MCMC. You can
also use the BAYES statement in PROC GENMOD. See Chapter 45, “The GENMOD Procedure.”
The following statements create a SAS data set with measurements of the number of deaths, y, among n
beetles that have been exposed to an environmental contaminant x:
title 'Logistic Regression Model with a Diffuse Prior';
data beetles;
input n y x @@;
datalines;
6 0 25.7
8 2 35.9
5 2 32.9
7 7 50.4
6 0
28.3
Example 74.3: Logistic Regression Model with a Diffuse Prior F 5793
7
6
8
;
2
6
2
32.3
49.6
35.2
5
6
6
1
3
6
33.2
39.8
51.3
8
6
5
3
4
3
40.9
43.6
42.5
6
6
7
0
1
0
36.5
34.1
31.3
6
7
3
1
1
2
36.5
37.4
40.6
You can model the data points yi with a binomial distribution,
yi jpi binomial.ni ; pi /
where pi is the success probability and links to the regression covariate xi through a logit transformation:
pi
D ˛ C ˇxi
logit.pi / D log
1 pi
The priors on ˛ and ˇ are both diffuse normal:
˛ normal.0; var D 10000/
ˇ normal.0; var D 10000/
These statements fit a logistic regression with PROC MCMC:
ods graphics on;
proc mcmc data=beetles ntu=1000 nmc=20000 propcov=quanew
diag=(mcse ess) outpost=beetleout seed=246810;
ods select PostSumInt mcse ess TADpanel;
parms (alpha beta) 0;
prior alpha beta ~ normal(0, var = 10000);
p = logistic(alpha + beta*x);
model y ~ binomial(n,p);
run;
The key statement in the program is the assignment to p that calculates the probability of death. The SAS
function LOGISTIC does the proper transformation. The MODEL statement specifies that the response
variable, y, is binomially distributed with parameters n (from the input data set) and p. The summary statistics
table, interval statistics table, the Monte Carlo standard error table, and the effective sample sizes table are
shown in Output 74.3.1.
Output 74.3.1 MCMC Results
Logistic Regression Model with a Diffuse Prior
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
alpha
20000 -11.7689
2.0942 -15.9412 -7.7491
beta
20000
0.0541
0.2919
0.1901 0.4029
5794 F Chapter 74: The MCMC Procedure
Output 74.3.1 continued
Logistic Regression Model with a Diffuse Prior
The MCMC Procedure
Monte Carlo Standard Errors
Parameter
Standard
MCSE Deviation MCSE/SD
alpha
0.0418
2.0942
0.0200
beta
0.00109
0.0541
0.0201
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
alpha
2507.0
7.9778
0.1253
beta
2478.5
8.0694
0.1239
The summary statistics table shows that the sample mean of the output chain for the parameter alpha is
–11.77. This is an estimate of the mean of the marginal posterior distribution for the intercept parameter
alpha. The estimated posterior standard deviation for alpha is 2.09. The two 95% credible intervals for
alpha are both negative, which indicates with very high probability that the intercept term is negative. On the
other hand, you observe a positive effect on the regression coefficient beta. Exposure to the environment
contaminant increases the probability of death.
The Monte Carlo standard errors of each parameter are significantly small relative to the posterior standard
deviations. A small MCSE/SD ratio indicates that the Markov chain has stabilized and the mean estimates do
not vary much over time. Note that the precision in the parameter estimates increases with the square of the
MCMC sample size, so if you want to double the precision, you must quadruple the MCMC sample size.
MCMC chains do not produce independent samples. Each sample point depends on the point before it. In this
case, the correlation time estimate, read from the effective sample sizes table, is roughly 8. This means that it
takes four observations from the MCMC output to make inferences about alpha with the same precision that
you would get from using an independent sample. The effective sample size of around 2500 reflects this loss
of efficiency. The coefficient beta has similar efficiency. You can often observe that some parameters have
significantly better mixing (better efficiency) than others, even in a single Markov chain run.
Example 74.3: Logistic Regression Model with a Diffuse Prior F 5795
Output 74.3.2 Plots for Parameters in the Logistic Regression Example
5796 F Chapter 74: The MCMC Procedure
Trace plots and autocorrelation plots of the posterior samples are shown in Output 74.3.2. Convergence looks
good in both parameters; there is good mixing in the trace plot and quick drop-off in the ACF plot.
One advantage of Bayesian methods is the ability to directly answer scientific questions. In this example, you
might want to find out the posterior probability that the environmental contaminant increases the probability
of death—that is, Pr.ˇ > 0jy/. This can be estimated using the following steps:
proc format;
value betafmt low-0 = 'beta <= 0' 0<-high = 'beta > 0';
run;
proc freq data=beetleout;
tables beta /nocum;
format beta betafmt.;
run;
Output 74.3.3 Frequency Counts
Logistic Regression Model with a Diffuse Prior
The FREQ Procedure
beta Frequency Percent
beta > 0
20000
100.00
All of the simulated values for ˇ are greater than zero, so the sample estimate of the posterior probability
that ˇ > 0 is 100%. The evidence overwhelmingly supports the hypothesis that increased levels of the
environmental contaminant increase the probability of death.
If you are interested in making inference based on any quantities that are transformations of the random
variables, you can either do it directly in PROC MCMC or by using the DATA step after you run the simulation.
Transformations sometimes can make parameter inference quite formidable using direct analytical methods,
but with simulated chains, it is easy to compute chains for any set of parameters. Suppose you are interested
in the lethal dose and want to estimate the level of the covariate x that corresponds to a probability of death,
p. Abbreviate this quantity as ldp. In other words, you want to solve the logit transformation with a fixed
value p. The lethal dose is as follows:
log 1 pp
˛
ldp D
ˇ
You can obtain an estimate of any ldp by using the posterior mean estimates for ˛ and ˇ. For example, lp95,
which corresponds to p D 0:95, is calculated as follows:
log 1 0:95
0:95 C 11:77
lp95 D
D 50:79
0:29
where –11.77 and 0.29 are the posterior mean estimates of ˛ and ˇ, respectively, and 50.79 is the estimated
lethal dose that leads to a 95% death rate.
While it is easy to obtain the point estimates, it is harder to estimate other posterior quantities, such as the
standard deviation directly. However, with PROC MCMC, you can trivially get estimates of any posterior
quantities of lp95. Consider the following program in PROC MCMC:
Example 74.3: Logistic Regression Model with a Diffuse Prior F 5797
proc mcmc data=beetles ntu=1000 nmc=20000 propcov=quanew
outpost=beetleout seed=246810 plot=density
monitor=(pi30 ld05 ld50 ld95);
ods select PostSumInt densitypanel;
parms (alpha beta) 0;
begincnst;
c1 = log(0.05 / 0.95);
c2 = -c1;
endcnst;
beginnodata;
prior alpha beta ~ normal(0, var = 10000);
pi30 = logistic(alpha + beta*30);
ld05 = (c1 - alpha) / beta;
ld50 = - alpha / beta;
ld95 = (c2 - alpha) / beta;
endnodata;
pi = logistic(alpha + beta*x);
model y ~ binomial(n,pi);
run;
ods graphics off;
The program estimates four additional posterior quantities. The three lpd quantities, ld05, ld50, and ld95, are
the three levels of the covariate that kills 5%, 50%, and 95% of the population, respectively. The predicted
probability when the covariate x takes the value of 30 is pi30. The MONITOR= option selects the quantities
of interest. The PLOTS= option selects kernel density plots as the only ODS graphical output, excluding the
trace plot and autocorrelation plot.
Programming statements between the BEGINCNST and ENDCNST statements define two constants. These
statements are executed once at the beginning of the simulation. The programming statements between the
BEGINNODATA and ENDNODATA statements evaluate the quantities of interest. The symbols, pi30, ld05,
ld50, and ld95, are functions of the parameters alpha and beta only. Hence, they should not be processed
at the observation level and should be included in the BEGINNODATA and ENDNODATA statements.
Output 74.3.4 lists the posterior summary and Output 74.3.5 shows the density plots of these posterior
quantities.
Output 74.3.4 PROC MCMC Results
Logistic Regression Model with a Diffuse Prior
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
0.0524
0.0252
95%
HPD Interval
pi30
20000
ld05
20000 29.9310
1.8731 26.2170 33.2648
0.0126
0.1022
ld50
20000 40.3781
0.9371 38.5334 42.1808
ld95
20000 50.8251
2.5327 46.2349 55.7958
5798 F Chapter 74: The MCMC Procedure
The posterior mean estimate of lp95 is 50.82, which is close to the estimate of 50.79 by using the posterior
mean estimates of the parameters. With PROC MCMC, in addition to the mean estimate, you can get the
standard deviation, quantiles, and interval estimates at any level of significance.
From the density plots, you can see, for example, that the sample distribution for 30 is skewed to the right,
and almost all of your posterior belief concerning 30 is concentrated in the region between zero and 0.15.
Output 74.3.5 Density Plots of Quantities of Interest in the Logistic Regression Example
It is easy to use the DATA step to calculate these quantities of interest. The following DATA step uses the
simulated values of ˛ and ˇ to create simulated values from the posterior distributions of ld05, ld50, ld95,
and 30 :
data transout;
set beetleout;
pi30 = logistic(alpha + beta*30);
ld05 = (log(0.05 / 0.95) - alpha) / beta;
ld50 = (log(0.50 / 0.50) - alpha) / beta;
ld95 = (log(0.95 / 0.05) - alpha) / beta;
run;
Subsequently, you can use SAS/INSIGHT, or the UNIVARIATE, CAPABILITY, or KDE procedures to
analyze the posterior sample. If you want to regenerate the default ODS graphs from PROC MCMC, see
“Regenerating Diagnostics Plots” on page 5740.
Example 74.4: Logistic Regression Model with Jeffreys’ Prior F 5799
Example 74.4: Logistic Regression Model with Jeffreys’ Prior
A controlled experiment was run to study the effect of the rate and volume of air inspired on a transient reflex
vasoconstriction in the skin of the fingers. Thirty-nine tests under various combinations of rate and volume of
air inspired were obtained (Finney 1947). The result of each test is whether or not vasoconstriction occurred.
Pregibon (1981) uses this set of data to illustrate the diagnostic measures he proposes for detecting influential
observations and to quantify their effects on various aspects of the maximum likelihood fit. The following
statements create the data set Vaso:
title 'Logistic Regression Model with Jeffreys Prior';
data vaso;
input vol rate resp @@;
lvol = log(vol);
lrate = log(rate);
ind = _n_;
cnst = 1;
datalines;
3.7 0.825 1 3.5 1.09 1 1.25 2.5
1 0.75 1.5
0.8 3.2
1 0.7 3.5
1 0.6
0.75 0 1.1
1.7
0.9 0.75
0 0.9 0.45 0 0.8
0.57 0 0.55 2.75
0.6 3.0
0 1.4 2.33 1 0.75 3.75 1 2.3 1.64
3.2 1.6
1 0.85 1.415 1 1.7
1.06 0 1.8 1.8
0.4 2.0
0 0.95 1.36 0 1.35 1.35 0 1.5 1.36
1.6 1.78
1 0.6 1.5
0 1.8
1.5
1 0.95 1.9
1.9 0.95
1 1.6 0.4
0 2.7
0.75 1 2.35 0.03
1.1 1.83
0 1.1 2.2
1 1.2
2.0
1 0.8 3.33
0.95 1.9
0 0.75 1.9
0 1.3
1.625 1
;
1
0
0
1
1
0
0
0
1
The variable resp represents the outcome of a test. The variable lvol represents the log of the volume of air
intake, and the variable lrate represents the log of the rate of air intake. You can model the data by using
logistic regression. You can model the response with a binary likelihood:
respi binary.pi /
with
pi D
1
1 C exp. .ˇ0 C ˇ1 lvoli C ˇ2 lratei //
Let X be the design matrix in the regression. Jeffreys’ prior for this model is
p.ˇ/ / jX0 MXj1=2
where M is a 39 by 39 matrix with off-diagonal elements being 0 and diagonal elements being pi .1 pi /.
For details on Jeffreys’ prior, see “Jeffreys’ Prior” on page 125 in Chapter 7, “Introduction to Bayesian
Analysis Procedures.” You can use a number of matrix functions, such as the determinant function, in PROC
MCMC to construct Jeffreys’ prior. The following statements illustrate how to fit a logistic regression with
Jeffreys’ prior:
5800 F Chapter 74: The MCMC Procedure
%let n = 39;
proc mcmc data=vaso nmc=10000 outpost=mcmcout seed=17;
ods select PostSumInt;
array
array
array
array
array
array
array
beta[3] beta0 beta1 beta2;
m[&n, &n];
x[1] / nosymbols;
xt[3, &n];
xtm[3, &n];
xmx[3, 3];
p[&n];
parms beta0 1 beta1 1 beta2 1;
begincnst;
if (ind
rc =
call
call
end;
endcnst;
eq 1) then do;
read_array("vaso", x, "cnst", "lvol", "lrate");
transpose(x, xt);
zeromatrix(m);
beginnodata;
call mult(x, beta, p);
do i = 1 to &n;
p[i] = 1 / (1 + exp(-p[i]));
m[i,i] = p[i] * (1-p[i]);
end;
call mult (xt, m, xtm);
call mult (xtm, x, xmx);
call det (xmx, lp);
lp = 0.5 * log(lp);
prior beta: ~ general(lp);
endnodata;
/* p = x * beta */
/* p[i] = 1/(1+exp(-x*beta)) */
/*
/*
/*
/*
xtm = xt * m
xmx = xtm * x
lp = det(xmx)
lp = -0.5 * log(lp)
*/
*/
*/
*/
model resp ~ bern(p[ind]);
run;
The first ARRAY statement defines an array beta with three elements: beta0, beta1, and beta2. The
subsequent statements define arrays that are used in the construction of Jeffreys’ prior. These include m (the
M matrix), x (the design matrix), xt (the transpose of x), and some additional work spaces.
The explanatory variables lvol and lrate are saved in the array x in the BEGINCNST and ENDCNST
statements. See “BEGINCNST/ENDCNST Statement” on page 5663 for details. After all the variables
are read into x, you transpose the x matrix and store it to xt. The ZEROMATRIX function call assigns all
elements in matrix m the value zero. To avoid redundant calculation, it is best to perform these calculations
as the last observation of the data set is processed—that is, when ind is 39.
You calculate Jeffreys’ prior in the BEGINNODATA and ENDNODATA statements. The probability vector p
is the product of the design matrix x and parameter vector beta. The diagonal elements in the matrix m are
pi .1 pi /. The expression lp is the logarithm of Jeffreys’ prior. The PRIOR statement assigns lp as the prior
for the ˇ regression coefficients. The MODEL statement assigns a binary likelihood to resp, with probability
p[ind]. The p array is calculated earlier using the matrix function MULT. You use the ind variable to pick out
the right probability value for each resp.
Example 74.4: Logistic Regression Model with Jeffreys’ Prior F 5801
Posterior summary statistics are displayed in Output 74.4.1.
Output 74.4.1 PROC MCMC Results, Jeffreys’ prior
Logistic Regression Model with Jeffreys Prior
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta0
10000 -2.9587
1.3258 -5.5936 -0.6027
beta1
10000 5.2905
1.8193 1.8590 8.7222
beta2
10000 4.6889
1.8189 1.3611 8.2490
You can also use PROC GENMOD to fit the same model by using the following statements:
proc genmod data=vaso descending;
ods select PostSummaries PostIntervals;
model resp = lvol lrate / d=bin link=logit;
bayes seed=17 coeffprior=jeffreys nmc=20000 thin=2;
run;
The MODEL statement indicates that resp is the response variable and lvol and lrate are the covariates. The
options in the MODEL statement specify a binary likelihood and a logit link function. The BAYES statement
requests Bayesian capability. The SEED=, NMC=, and THIN= arguments work in the same way as in PROC
MCMC. The COEFFPRIOR=JEFFREYS option requests Jeffreys’ prior in this analysis.
The PROC GENMOD statements produce Output 74.4.2, with estimates very similar to those reported in
Output 74.4.1. Note that you should not expect to see identical output from PROC GENMOD and PROC
MCMC, even with the simulation setup and identical random number seed. The two procedures use different
sampling algorithms. PROC GENMOD uses the adaptive rejection metropolis algorithm (ARMS) (Gilks and
Wild 1992; Gilks 2003) while PROC MCMC uses a random walk Metropolis algorithm. The asymptotic
answers, which means that you let both procedures run an very long time, would be the same as they both
generate samples from the same posterior distribution.
Output 74.4.2 PROC GENMOD Results
Logistic Regression Model with Jeffreys Prior
The GENMOD Procedure
Bayesian Analysis
Posterior Summaries
Percentiles
Parameter
N
Standard
Mean Deviation
25%
50%
75%
Intercept
10000 -2.8773
1.3213 -3.6821 -2.7326 -1.9097
lvol
10000 5.2059
1.8707 3.8535 4.9574 6.3337
lrate
10000 4.5525
1.8140 3.2281 4.3722 5.6643
5802 F Chapter 74: The MCMC Procedure
Output 74.4.2 continued
Posterior Intervals
Parameter Alpha
Equal-Tail
Interval
HPD Interval
Intercept
0.050 -5.7447 -0.6877 -5.4593 -0.5488
lvol
0.050 2.2066 9.4415 2.0729 9.2343
lrate
0.050 1.5906 8.5272 1.3351 8.1152
Example 74.5: Poisson Regression
You can use the Poisson distribution to model the distribution of cell counts in a multiway contingency
table. Aitkin et al. (1989) have used this method to model insurance claims data. Suppose the following
hypothetical insurance claims data are classified by two factors: age group (with two levels) and car type
(with three levels). The following statements create the data set:
title 'Poisson Regression';
data insure;
input n c car $ age;
ln = log(n);
datalines;
500
42 small 0
1200 37 medium 0
100
1 large 0
400 101 small 1
500
73 medium 1
300
14 large 1
;
proc transreg data=insure design;
model class(car / zero=last);
id n c age ln;
output out=input_insure(drop=_: Int:);
run;
The variable n represents the number of insurance policy holders and the variable c represents the number of
insurance claims. The variable car is the type of car involved (classified into three groups), and it is coded
into two levels. The variable age is the age group of a policy holder (classified into two groups).
Assume that the number of claims c has a Poisson probability distribution and that its mean, i , is related to
the factors car and age for observation i by
log.i / D log.ni / C x0 ˇ
D log.ni / C ˇ0 C
cari .1/ˇ1 C cari .2/ˇ2 C cari .3/ˇ3 C
agei .1/ˇ4 C agei .2/ˇ5
Example 74.5: Poisson Regression F 5803
The indicator variables cari .j / is associated with the jth level of the variable car for observation i in the
following way:
1 if car D j
cari .j / D
0 if car ¤ j
A similar coding applies to age. The ˇ’s are parameters. The logarithm of the variable n is used as an
offset—that is, a regression variable with a constant coefficient of 1 for each observation. Having the offset
constant in the model is equivalent to fitting an expanded data set with 3000 observations, each with response
variable y observed on an individual level. The log link relates the mean and the factors car and age.
The following statements run PROC MCMC:
proc mcmc data=input_insure outpost=insureout nmc=5000 propcov=quanew
maxtune=0 seed=7;
ods select PostSumInt;
array data[4] 1 &_trgind age;
array beta[4] alpha beta_car1 beta_car2 beta_age;
parms alpha beta:;
prior alpha beta: ~ normal(0, prec = 1e-6);
call mult(data, beta, mu);
model c ~ poisson(exp(mu+ln));
run;
The analysis uses a relatively flat prior on all the regression coefficients, with mean at 0 and precision at 10 6 .
The option MAXTUNE=0 skips the tuning phase because the optimization routine (PROPCOV=QUANEW)
provides good initial values and proposal covariance matrix.
There are four parameters in the model: alpha is the intercept; beta_car1 and beta_car2 are coefficients for
the CLASS variable car, which has three levels; and beta_age is the coefficient for age. The symbol mu
connects the regression model and the Poisson mean by using the log link. The MODEL statement specifies
a Poisson likelihood for the response variable c.
Posterior summary and interval statistics are shown in Output 74.5.1.
Output 74.5.1 MCMC Results
Poisson Regression
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
alpha
N
Standard
Mean Deviation
95%
HPD Interval
5000 -2.6403
0.1344 -2.9133 -2.3831
beta_car1 5000 -1.8335
0.2917 -2.4692 -1.3336
beta_car2 5000 -0.6931
0.1255 -0.9485 -0.4589
beta_age
0.1386 1.0387 1.5812
5000 1.3151
To fit the same model by using PROC GENMOD, you can do the following. Note that the default normal
prior on the coefficients ˇ is N.0; prec D 1e 6/, the same as used in the PROC MCMC. The following
statements run PROC GENMOD and create Output 74.5.2:
5804 F Chapter 74: The MCMC Procedure
proc genmod data=insure;
ods select PostSummaries PostIntervals;
class car age(descending);
model c = car age / dist=poisson link=log offset=ln;
bayes seed=17 nmc=5000 coeffprior=normal;
run;
To compare, posterior summary and interval statistics from PROC GENMOD are reported in Output 74.5.2,
and they are very similar to PROC MCMC results in Output 74.5.1.
Output 74.5.2 PROC GENMOD Results
Poisson Regression
The GENMOD Procedure
Bayesian Analysis
Posterior Summaries
Percentiles
Parameter
N
Standard
Mean Deviation
25%
50%
75%
Intercept
5000 -2.6424
0.1336 -2.7334 -2.6391 -2.5547
carlarge
5000 -1.8040
0.2764 -1.9859 -1.7929 -1.6101
carmedium 5000 -0.6908
0.1311 -0.7797 -0.6898 -0.6044
age1
0.1384 1.2264 1.3209 1.4140
5000 1.3207
Posterior Intervals
Parameter Alpha
Equal-Tail
Interval
HPD Interval
Intercept
0.050 -2.9154 -2.3893 -2.8997 -2.3850
carlarge
0.050 -2.3668 -1.2891 -2.2992 -1.2378
carmedium 0.050 -0.9437 -0.4231 -0.9434 -0.4230
age1
0.050 1.0455 1.5871 1.0266 1.5629
Note that the descending option in the CLASS statement reverses the sort order of the CLASS variable age
so that the results agree with PROC MCMC. If this option is not used, the estimate for age has a reversed
sign as compared to Output 74.5.2.
Example 74.6: Nonlinear Poisson Regression Models
This example illustrates how to fit a nonlinear Poisson regression with PROC MCMC. In addition, it shows
how you can improve the mixing of the Markov chain by selecting a different proposal distribution or by
sampling on the transformed scale of a parameter. This example shows how to analyze count data for calls to
a technical support help line in the weeks immediately following a product release. This information could
be used to decide upon the allocation of technical support resources for new products. You can model the
number of daily calls as a Poisson random variable, with the average number of calls modeled as a nonlinear
function of the number of weeks that have elapsed since the product’s release. The data are input into a SAS
data set as follows:
Example 74.6: Nonlinear Poisson Regression Models F 5805
title 'Nonlinear Poisson Regression';
data calls;
input weeks calls @@;
datalines;
1
0
1
2
2
2
2
1
3
1
4
5
4
8
5
5
5
9
6 17
7 24
7 16
8 23
8 27
;
3
6
3
9
During the first several weeks after a new product is released, the number of questions that technical support
receives concerning the product increases in a sigmoidal fashion. The expression for the mean value in the
classic Poisson regression involves the log link. There is some theoretical justification for this link, but with
MCMC methodologies, you are not constrained to exploring only models that are computationally convenient.
The number of calls to technical support tapers off after the initial release, so in this example you can use a
logistic-type function to model the mean number of calls received weekly for the time period immediately
following the initial release. The mean function .t / is modeled as follows:
i D
1 C exp Œ .˛ C ˇti /
The likelihood for every observation callsi is
callsi Poisson .i /
Past experience with technical support data for similar products suggests the following prior distributions:
gamma.shape D 3:5; scale D 12/
˛ normal. 5; sd D 0:5/
ˇ normal.0:75; sd D 0:5/
The following PROC MCMC statements fit this model:
ods graphics on;
proc mcmc data=calls outpost=callout seed=53197 ntu=1000 nmc=20000
propcov=quanew stats=none diag=ess;
ods select TADpanel ess;
parms alpha -4 beta 1 gamma 2;
prior gamma ~ gamma(3.5, scale=12);
prior alpha ~ normal(-5, sd=0.25);
prior beta ~ normal(0.75, sd=0.5);
lambda = gamma*logistic(alpha+beta*weeks);
model calls ~ poisson(lambda);
run;
The one PARMS statement defines a block of all parameters and sets their initial values individually. The
PRIOR statements specify the informative prior distributions for the three parameters. The assignment
statement defines , the mean number of calls. Instead of using the SAS function LOGISTIC, you can use
the following statement to calculate and get the same result:
5806 F Chapter 74: The MCMC Procedure
lambda = gamma / (1 + exp(-(alpha+beta*weeks)));
Mixing is not particularly good with this run of PROC MCMC. The ODS SELECT statement displays the
diagnostic graphs and effective sample sizes (ESS) calculation while excluding all other output. The graphical
output is shown in Output 74.6.1, and the ESS of each parameters are all relatively low (Output 74.6.2).
Output 74.6.1 Plots for Parameters
Example 74.6: Nonlinear Poisson Regression Models F 5807
Output 74.6.1 continued
5808 F Chapter 74: The MCMC Procedure
Output 74.6.2 Effective Sample Sizes
Nonlinear Poisson Regression
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
alpha
897.4
22.2870
0.0449
beta
231.6
86.3540
0.0116
gamma
162.9
122.8
0.0081
Often a simple scatter plot of the posterior samples can reveal a potential cause of the bad mixing. You can
use PROC SGSCATTER to generate pairwise scatter plots of the three model parameters. The following
statements generate Output 74.6.3:
proc sgscatter data=callout;
matrix alpha beta gamma;
run;
Output 74.6.3 Pairwise Scatter Plots of the Parameters
Example 74.6: Nonlinear Poisson Regression Models F 5809
The nonlinearity in parameters beta and gamma stands out immediately. This explains why a random walk
Metropolis with normal proposal has a difficult time exploring the joint distribution efficiently—the algorithm
works best when the target distribution is unimodal and symmetric (normal-like). When there is nonlinearity
in the parameters, it is impossible to find a single proposal scale parameter that optimally adapts to different
regions of the joint parameter space. As a result, the Markov chain can be inefficient in traversing some parts
of the distribution. This is evident in examining the trace plot of the gamma parameter. You see that the
Markov chain sometimes gets stuck in the far-right tail and does not travel back to the high-density area
quickly. This effect can be seen around the simulations 8,000 and 18,000 in Output 74.6.1.
Reparameterization can often improve the mixing of the Markov chain. Note that the parameter gamma has
a positive support and that the posterior distribution is right-skewed. This suggests that the chain might mix
more rapidly if you sample on the logarithm of the parameter gamma.
Let ı D log. /, and reparameterize the mean function as follows:
i D
exp.ı/
1 C exp Œ .˛ C ˇti /
To obtain the same inference, you use an induced prior on delta based on the gamma prior on the gamma
parameter. This involves a transformation of variables, and you can obtain the following equivalency, where
j exp.ı/j is the Jacobian:
. / D gamma.I a; scale D b/ D
1
b a €.a/
a
1
exp . =b/
, .ı/ D gamma. D exp.ı/I a; scale D b/ jexp.ı/j
The distribution on ı simplifies to the following:
.ı/ D
1
b a €.a/
exp .aı/ exp . exp .ı/ =b/
PROC MCMC supports such a distribution on the logarithm transformation of a gamma random variable. It
is called the ExpGamma distribution.
In the original model, you specify a prior on gamma:
prior gamma ~ gamma(3.5, scale=12);
You can obtain the same inference by specifying an ExpGamma prior on delta and take an exponential
transformation to get back to gamma:
prior delta ~ egamma(3.5, scale=12);
gamma = exp(delta);
The following statements produce Output 74.6.6 and Output 74.6.4:
proc mcmc data=calls outpost=tcallout seed=53197 ntu=1000 nmc=20000
propcov=quanew diag=ess plots=(trace) monitor=(alpha beta gamma);
ods select PostSumInt ESS TRACEpanel;
parms alpha -4 beta 1 delta 2;
prior alpha ~ normal(-5, sd=0.25);
prior beta ~ normal(0.75, sd=0.5);
prior delta ~ egamma(3.5, scale=12);
5810 F Chapter 74: The MCMC Procedure
gamma = exp(delta);
lambda = gamma*logistic(alpha+beta*weeks);
model calls ~ poisson(lambda);
run;
The PARMS statement declares delta, instead of gamma, as a model parameter. The prior distribution of
delta is egamma, as opposed to the gamma distribution. The GAMMA assignment statement transforms delta
to gamma. The LAMBDA assignment statement calculates the mean for the Poisson by using the gamma
parameter. The MODEL statement specifies a Poisson likelihood for the calls response.
The trace plots in Output 74.6.4 show better mixing of the parameters, and the effective sample sizes in
Output 74.6.5 show substantial improvements over the original formulation of the model. The improvements
are especially obvious in beta and gamma, where the increase is fivefold to tenfold.
Output 74.6.4 Plots for Parameters, Sampling on the Log Scale of Gamma
Output 74.6.5 Effective Sample Sizes, Sampling on the Log Scale of Gamma
Nonlinear Poisson Regression
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
alpha
1338.4
14.9430
0.0669
beta
1254.9
15.9379
0.0627
gamma
1073.4
18.6320
0.0537
Example 74.6: Nonlinear Poisson Regression Models F 5811
Output 74.6.6 shows the posterior summary and interval statistics of the nonlinear Poisson regression.
Output 74.6.6 MCMC Results, Sampling on the Log Scale of Gamma
Nonlinear Poisson Regression
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
alpha
20000 -4.9040
beta
20000
gamma
20000 46.7199
0.6899
95%
HPD Interval
0.2234 -5.3217 -4.4544
0.1154
0.4841
0.9169
19.4977 19.6588 86.2425
Note that the delta parameter has a more symmetric density than the skewed gamma parameter. A pairwise
scatter plot (Output 74.6.7) shows a more linear relationship between beta and delta. The Metropolis
algorithm always works better if the target distribution is approximately normal.
proc sgscatter data=tcallout;
matrix alpha beta delta;
run;
Output 74.6.7 Pairwise Scatter Plots of the Transformed Parameters
5812 F Chapter 74: The MCMC Procedure
If you are still unsatisfied with the slight nonlinearity in the parameters beta and delta, you can try another
transformation on beta. Normally you would not want to do a logarithm transformation on a parameter
that has support on the real axis, because you would risk taking the logarithm of negative values. However,
because all the beta samples are positive and the marginal posterior distribution is away from 0, you can try a
such a transformation.
Let D log.ˇ/. The prior distribution on is the following:
./ D normal.ˇ D exp./I ; 2 / jexp./j
You can specify the prior distribution in PROC MCMC by using a GENERAL function:
parms kappa;
lprior = logpdf("normal", exp(kappa), 0.75, 0.5) + kappa;
prior kappa ~ general(lp);
beta = exp(kappa);
The PARMS statement declares the transformed parameter kappa, which will be sampled. The LPRIOR
assignment statement defines the logarithm of the prior distribution on kappa. The LOGPDF function is
used here to simplify the specification of the distribution. The PRIOR statement specifies the nonstandard
distribution as the prior on kappa. Finally, the BETA assignment statement transforms kappa back to the
beta parameter.
Applying logarithm transformations on both beta and gamma yields the best mixing. (The results are not
shown here, but you can find the code in the file mcmcex6.sas in the SAS Sample Library.) The transformed
parameters kappa and delta have much clearer linear correlation. However, the improvement over the case
where gamma alone is transformed is only marginally significant (50% increase in ESS).
This example illustrates that PROC MCMC can fit Bayesian nonlinear models just as easily as Bayesian
linear models. More importantly, transformations can sometimes improve the efficiency of the Markov chain,
and that is something to always keep in mind. Also see “Example 74.20: Using a Transformation to Improve
Mixing” on page 5882 for another example of how transformations can improve mixing of the Markov
chains.
Example 74.7: Logistic Regression Random-Effects Model
This example illustrates how you can use PROC MCMC to fit random-effects models. In the example
“Random-Effects Model” on page 5640 in “Getting Started: MCMC Procedure” on page 5630, you already
saw PROC MCMC fit a linear random-effects model. This example shows how to fit a logistic random-effects
model in PROC MCMC. Although you can use PROC MCMC to analyze random-effects models, you might
want to first consider some other SAS procedures. For example, you can use PROC MIXED (see Chapter 78,
“The MIXED Procedure”) to analyze linear mixed effects models, PROC NLMIXED (see Chapter 83, “The
NLMIXED Procedure”) for nonlinear mixed effects models, and PROC GLIMMIX (see Chapter 46, “The
GLIMMIX Procedure”) for generalized linear mixed effects models. In addition, a sampling-based Bayesian
analysis is available in the MIXED procedure through the PRIOR statement (see “PRIOR Statement” on
page 6193 in Chapter 78, “The MIXED Procedure”).
The data are taken from Crowder (1978). The Seeds data set is a 2 2 factorial layout, with two types
of seeds, O. aegyptiaca 75 and O. aegyptiaca 73, and two root extracts, bean and cucumber. You observe
r, which is the number of germinated seeds, and n, which is the total number of seeds. The independent
variables are seed and extract.
The following statements create the data set:
Example 74.7: Logistic Regression Random-Effects Model F 5813
title 'Logistic Regression Random-Effects Model';
data seeds;
input r n seed extract @@;
ind = _N_;
datalines;
10 39 0 0
23 62 0 0
23 81 0 0
26
17 39 0 0
5
6 0 1
53 74 0 1
55
32 51 0 1
46 79 0 1
10 13 0 1
8
10 30 1 0
8 28 1 0
23 45 1 0
0
3 12 1 1
22 41 1 1
15 30 1 1
32
3
7 1 1
;
51
72
16
4
51
0
0
1
1
1
0
1
0
0
1
You can model each observation ri as having its own probability of success pi , and the likelihood is as
follows:
ri binomial.ni ; pi /
You can use the logit link function to link the covariates of each observation, seed and extract, to the
probability of success,
i
D ˇ0 C ˇ1 seedi C ˇ2 extracti C ˇ3 seedi extracti
pi
D logistic.i C ıi /
where ıi is assumed to be an iid random effect with a normal prior:
ıi normal.0; var D 2 /
The four ˇ regression coefficients and the standard deviation 2 in the random effects are model parameters;
they are given noninformative priors as follows:
.ˇ0 ; ˇ1 ; ˇ2 ; ˇ3 / / 1
2 igamma.shape D 0:01; scale D 0:01/
Another way of expressing the same model is as
pi D logistic.ıi /
where
ıi normal.ˇ0 C ˇ1 seedi C ˇ2 extracti C ˇ3 seedi extracti ; 2 /
The two models are equivalent. In the first model, the random effects ıi centers at 0 in the normal distribution,
and in the second model, ıi centers at the regression mean. This hierarchical centering can sometimes
improve mixing.
The following statements fit the second model and generate Output 74.7.1:
5814 F Chapter 74: The MCMC Procedure
proc mcmc data=seeds outpost=postout seed=332786 nmc=20000;
ods select PostSumInt;
parms beta0 0 beta1 0 beta2 0 beta3 0 s2 1;
prior s2 ~ igamma(0.01, s=0.01);
prior beta: ~ general(0);
w = beta0 + beta1*seed + beta2*extract + beta3*seed*extract;
random delta ~ normal(w, var=s2) subject=ind;
pi = logistic(delta);
model r ~ binomial(n = n, p = pi);
run;
The PROC MCMC statement specifies the input and output data sets, sets a seed for the random number
generator, and requests a large simulation size. The ODS SELECT statement displays the summary statistics
and interval statistics tables. The PARMS statement declares the model parameters, and the PRIOR statements
specify the prior distributions for ˇ and 2 .
The symbol w calculates the regression mean, and the RANDOM statement specifies the random effect, with
a normal prior distribution, centered at w with variance 2 . Note that the variable w is a function of the input
data set variables. You can use data set variable in constructing the hyperparameters of the random-effects
parameters, as long as the hyperparameters remain constant within each subject group. The SUBJECT=
option indicates the group index for the random-effects parameters.
The symbol pi is the logit transformation. The MODEL specifies the response variable r as a binomial
distribution with parameters n and pi.
Output 74.7.1 lists the posterior mean and interval estimates of the regression parameters.
Output 74.7.1 Logistic Regression Random-Effects Model
Logistic Regression Random-Effects Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta0
20000 -0.5570
0.1929 -0.9422 -0.1816
beta1
20000 0.0776
0.3276 -0.5690 0.7499
beta2
20000 1.3667
0.2923
beta3
20000 -0.8469
0.4718 -1.7741 0.0742
s2
20000 0.1171
0.0993 0.00163 0.3045
0.8463 1.9724
Example 74.8: Nonlinear Poisson Regression Multilevel Random-Effects
Model
This example uses the pump failure data of Gaver and O’Muircheartaigh (1987) to illustrate how to fit a
multilevel random-effects model with PROC MCMC. The number of failures and the time of operation are
recorded for 10 pumps. Each of the pumps is classified into one of two groups that correspond to either
continuous or intermittent operation. The following statements generate the data set:
Example 74.8: Nonlinear Poisson Regression Multilevel Random-Effects Model F 5815
title 'Nonlinear Poisson Regression Random-Effects Model';
data pump;
input y t group @@;
pump = _n_;
logtstd = log(t) - 2.4564900;
datalines;
5 94.320 1
1 15.720 2
5 62.880 1
14 125.760 1
3
5.240 2
19 31.440 1
1
1.048 2
1
1.048 2
4
2.096 2
22 10.480 2
;
Each row denotes data for a single pump, and the variable logtstd contains the centered operation times.
Letting yij denote the number of failures for the jth pump in the ith group, Draper (1996) considers the
following hierarchical model for these data, where the data set variable logtstd is log tij log t:
yij jij
Poisson.ij /
log ij
D ˛i C ˇi .log tij
log t / C eij
This model specifies different intercepts and slopes for each group (i = 1, 2), and the random effect eij is a
mechanism for accounting for overdispersion. You can use noninformative priors on the parameters ˛i , ˇi ,
and 2 , and a normal prior on eij ,
ui
D
˛i
ˇi
mvn
0
0
1e6
0
;
for i D 1; 2
0 1e6
2 igamma .shape D 0:01; scale D 0:01/
eij j 2 normal.0; 2 /
where ui is a multidimensional random effect. The following statements fit such a random-effects model:
ods graphics on;
proc mcmc data=pump outpost=postout seed=248601 nmc=10000
plots=trace stats=none diag=none;
ods select tracepanel;
array u[2] alpha beta;
array mu[2] (0 0);
parms s2;
prior s2 ~ igamma(0.01, scale=0.01);
random u ~ MVNAR(mu, sd=1e6, rho=0) subject=group monitor=(u);
random e ~ normal(0, var=s2) subject=pump monitor=(random(1));
w = alpha + beta * logtstd;
lambda = exp(w+e);
model y ~ poisson(lambda);
run;
The PROC MCMC statement specifies the input data set (Pump), the output data set (Postout), a seed for the
random number generator, and a simulation sample size of 10,000. The program requests that only trace
plots be produced, disabling all posterior calculations and convergence diagnostics tests. The ODS SELECT
statement displays the trace plots, which are the primary focus.
5816 F Chapter 74: The MCMC Procedure
The first ARRAY statement declares an array u of size 2 and names the elements alpha and beta. The array u
stores the random-effects parameters alpha and beta. The next ARRAY statement defines the mean of the
multivariate normal prior on u.
The PARMS statement declares the only model parameter here, the variance s2 in the prior distribution for
the random effect eij . The PRIOR statement specifies an inverse-gamma prior on the variance. The first
RANDOM statement specifies a multivariate normal prior on u. The MVNAR distribution is a multivariate
normal distribution with a first-order autoregressive covariance. When the argument rho is set to 0, this
distribution simplifies to a multivariate normal distribution with a shared variance. The RANDOM statement
also indicates the group variable as its subject index and monitors all elements u. The second RANDOM
statement specifies a normal prior on the effect e, where the subject index variable is pump. The MONITOR=
option requests that PROC MCMC randomly choose one of the 10 e random-effects parameters to monitor.
Next, programming statements construct the mean of the Poisson likelihood, and the MODEL statement
specifies the likelihood function for each observation.
Output 74.8.1 shows trace plots for 2 ; ˛1 ; ˛2 ; ˇ1 ; ˇ2 ; and e8 . You can see that the chains are mixing poorly.
Output 74.8.1 Trace Plots of 2 , ˛ , ˇ , and e8 without Hierarchical Centering
Example 74.8: Nonlinear Poisson Regression Multilevel Random-Effects Model F 5817
Output 74.8.1 continued
To improve mixing, you can repeat the same analysis by using a hierarchical centering technique, where
instead of using a normal prior centered at 0 on eij , you center the random effects on the group means:
yij jij
log ij
Poisson.ij /
normal ˛i C ˇi .log tij
log t /; var D 2
The following statements illustrate how to fit a multilevel hierarchical centering random-effects model:
proc mcmc data=pump outpost=postout_c seed=248601 nmc=10000
plots=trace diag=none;
ods select tracepanel postsumint;
array u[2] alpha beta;
array mu[2] (0 0);
parms s2 1;
prior s2 ~ igamma(0.01, scale=0.01);
random u ~ MVNAR(mu, sd=1e6, rho=0) subject=group monitor=(u);
w = alpha + beta * logtstd;
random llambda ~ normal(w, var = s2) subject=pump monitor=(random(1));
lambda = exp(llambda);
model y ~ poisson(lambda);
run;
The difference between these statements and the previous statements on page 5815 is that these statements
have the variable w as the prior mean of the random effect llambda. The symbol lambda is the exponential of
5818 F Chapter 74: The MCMC Procedure
the corresponding log ij (llambda), and the MODEL statement assigns the response variable y a Poisson
likelihood with a mean parameter lambda, the same way it did in the previous statements.
The trace plots of the monitored parameters are shown in Output 74.8.2. The mixing is significantly improved
over the previous model. The posterior summary and interval statistics tables are shown in Output 74.8.3.
Output 74.8.2 Trace Plots of 2 , ˛ , and ˇ with Hierarchical Centering
Example 74.8: Nonlinear Poisson Regression Multilevel Random-Effects Model F 5819
Output 74.8.2 continued
Output 74.8.3 Posterior Summary Statistics
Nonlinear Poisson Regression Random-Effects Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
95%
Mean Deviation HPD Interval
s2
10000 1.7606
2.2022 0.1039 4.7631
alpha_1
10000 2.9286
2.4247 -1.9416 7.4115
beta_1
10000 -0.4018
1.3110 -2.9323 2.0623
alpha_2
10000 1.6105
0.8801 -0.0436 3.2985
beta_2
10000 0.5652
0.5804 -0.5469 1.7381
llambda_8 10000 -0.0560
0.8395 -1.6933 1.4612
You can generate a caterpillar plot (Output 74.8.4) of the random-effects parameters by calling the %CATER
macro:
%CATER(data=postout_c, var=llambda_:);
ods graphics off;
Varying llambda indicates nonconstant dispersion in the Poisson model.
5820 F Chapter 74: The MCMC Procedure
Output 74.8.4 Caterpillar Plots of the Random-Effects Parameters
Example 74.9: Multivariate Normal Random-Effects Model
Gelfand et al. (1990) use a multivariate normal hierarchical model to estimate growth regression coefficients
for the growth of 30 young rats in a control group over a period of 5 weeks. The following statements create
a SAS data set with measurements of Weight, Age (in days), and Subject.
title 'Multivariate Normal Random-Effects Model';
data rats;
array days[5] (8 15 22 29 36);
input weight @@;
subject = ceil(_n_/5);
index = mod(_n_-1, 5) + 1;
age = days[index];
drop index days:;
datalines;
151 199 246 283 320 145 199 249 293 354
147 214 263 312 328 155 200 237 272 297
135 188 230 280 323 159 210 252 298 331
141 189 231 275 305 159 201 248 297 338
177 236 285 350 376 134 182 220 260 296
160 208 261 313 352 143 188 220 273 314
154 200 244 289 325 171 221 270 326 358
163 216 242 281 312 160 207 248 288 324
142 187 234 280 316 156 203 243 283 317
157 212 259 307 336 152 203 246 286 321
Example 74.9: Multivariate Normal Random-Effects Model F 5821
154
146
132
169
137
;
205
191
185
216
180
253
229
237
261
219
298
272
286
295
258
334
302
331
333
291
139
157
160
157
153
190
211
207
205
200
225
250
257
248
244
267
285
303
289
286
302
323
345
316
324
The model assumes normal measurement errors,
Weightij normal ˛i C ˇi Ageij ; 2 ; i D 1; : : : ; 30I j D 1; : : : ; 5
where i indexes rat (Subject variable), j indexes the time period, Weightij and Ageij denote the weight and
age of the ith rat in week j, and 2 is the variance in the normal likelihood. The individual intercept and slope
coefficients are modeled as the following:
˛i
˛c
i D
MVN c D
; †c ; i D 1; : : : ; 30
ˇi
ˇc
You can use the following independent prior distributions on c , †c , and 2 :
c
†c
0
1000
0
MVN 0 D
; †0 D
0
0
1000
0:01 0
iwishart D 2; S D 0
10
2 igamma .shape D 0:01; scale D 0:01/
The following statements fit this multivariate normal random-effects model:
proc mcmc data=rats nmc=10000 outpost=postout
seed=17 init=random;
ods select Parameters REParameters PostSumInt;
array theta[2] alpha beta;
array theta_c[2];
array Sig_c[2,2];
array mu0[2] (0 0);
array Sig0[2,2] (1000 0 0 1000);
array S[2,2] (0.02 0 0 20);
parms
prior
prior
prior
theta_c
theta_c
Sig_c ~
var_y ~
Sig_c {121 0 0 0.26} var_y;
~ mvn(mu0, Sig0);
iwish(2, S);
igamma(0.01, scale=0.01);
random theta ~ mvn(theta_c, Sig_c) subject=subject
monitor=(alpha_9 beta_9 alpha_25 beta_25);
mu = alpha + beta * age;
model weight ~ normal(mu, var=var_y);
run;
The ODS SELECT statement displays information about model parameters, random-effects parameters,
and the posterior summary statistics. The ARRAY statements allocate memory space for the multivariate
5822 F Chapter 74: The MCMC Procedure
parameters and hyperparameters in the model. The parameters are (theta where the variable name of each
element is alpha or beta), c (theta_c), and †c (Sig_c). The hyperparameters are 0 (mu0), †0 (Sig0), and
S (S). The multivariate hyperparameters are assigned with constant values using parentheses . /.
The PARMS statement declares model parameters and assigns initial values to Sig_c using braces f g.
The PRIOR statements specify the prior distributions. The RANDOM statement defines an array random
effect theta and specifies a multivariate normal prior distribution. The SUBJECT= option indicates cluster
membership for each of the random-effects parameter. The MONITOR= option monitors the individual
intercept and slope coefficients of subjects 9 and 25.
You can use the following syntax in the RANDOM statement to monitor all parameters in an array random
effect:
monitor=(theta)
This would produce posterior summary statistics on ˛1 ˛30 and ˇ1 ˇ30 .
The following syntax monitors all ˛i parameters:
monitor=(alpha)
If you did not name elements of theta to be alpha and beta, the SAS System creates variable names
automatically in a consecutive fashion, as in theta1 and theta2.
Output 74.9.1 Parameter and Random-Effects Parameter Information Table
Multivariate Normal Random-Effects Model
The MCMC Procedure
Parameters
Array Sampling
Block Parameter Index Method
1 theta_c1
Initial
Value Prior Distribution
Conjugate -4.5834 MVNormal(mu0, Sig0)
theta_c2
5.7930
2 Sig_c1
[1,1]
Sig_c2
[1,2]
Sig_c3
[2,1]
0
Sig_c4
[2,2]
0.2600
3 var_y
Conjugate
121.0 iWishart(2, S)
0
Conjugate 2806714 igamma(0.01, scale=0.01)
Random Effect Parameters
Sampling
Parameter Method
theta
Number of Subject
Subject Subjects Values
N-Metropolis subject
30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Prior
Distribution
MVNormal(theta_c, Sig_c)
Output 74.9.1 displays the parameter and random-effects parameter information tables. The Array Index
column in “Parameters” table shows the index reference of the elements in the array parameter Sig_c. The
total number of subjects in the study is 30.
Example 74.10: Missing at Random Analysis F 5823
Output 74.9.2 Multivariate Normal Random-Effects Model
Multivariate Normal Random-Effects Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
2.2486
101.7
110.6
0.1988
5.8058
6.5815
theta_c1
10000
106.1
theta_c2
10000
6.1975
Sig_c1
10000
110.8
45.9169 37.9670
203.8
Sig_c2
10000 -1.4267
2.3320 -6.2878
2.7756
Sig_c3
10000 -1.4267
2.3320 -6.2878
2.7756
Sig_c4
10000
0.2979
1.6538
var_y
10000 37.6855
5.9591 27.0943 49.4449
alpha_9
10000
119.4
5.6756
108.1
130.5
beta_9
10000
7.4670
0.2382
7.0146
7.9278
alpha_25
10000 86.5673
6.3694 74.4247 99.9007
beta_25
10000
0.2612
1.0591
6.7804
0.5549
6.2529
7.2906
Output 74.9.2 displays posterior summary statistics for model parameters and the random-effects parameters
for subjects 9 and 25. You can see that there is a substantial difference in the intercepts and growth rates
between the two rats.
A seemingly confusing message might occur if a symbol name matches an internally generated variable
name for elements of an array. For example, if, instead of using the symbol var_y in the SAS program for the
model variance 2 , you used s2, the SAS System produces the following error message:
ERROR: The initial value 0 for the parameter S2 is outside of the prior
distribution support set.
This is confusing because the program does not assign an initial value for the parameter s2 in the PARMS
statement, and you might expect that PROC MCMC would not generate an invalid initial value. The confusion
is caused by the ARRAY statement that defines the array variable S:
array S[2,2] (0.02 0 0 20);
Elements of S are automatically given names s1–s4. PROC MCMC interprets s2 as an element in S that was
given a value of 0, hence producing this error message.
Example 74.10: Missing at Random Analysis
This example illustrates how PROC MCMC treats missing at random (MAR) data. For a short overview of
missing data problems, see the section “Handling of Missing Data” on page 5753.
Researchers studied the effects of air pollution on respiratory disease in children. The response variable (y)
represented whether a child exhibited wheezing symptoms; it was recorded as 1 for symptoms exhibited and
0 for no symptoms exhibited. City of residency (x1) and maternal smoking status (x2) were the explanatory
variables. The variable x1 was coded as 1 if the child lived in the more polluted city, Steel City, and 0 if the
5824 F Chapter 74: The MCMC Procedure
child lived in Green Hills. The variable x2 was the number of cigarettes the mother reported that she smoked
per day. Both the covariates contain missing values: 17 for x1 and 30 for x2, respectively. The total number
of observations in the data set is 390. The following statements generate the data set air:
title 'Missing at Random Analysis';
data air;
input y x1 x2 @@;
datalines;
0 0 0
0 0 0
0 1 0
0 0
0 0 8
0 1 10
0 1 9
0 0
0 1 12
0 0 .
0 0 0
0 1
0 0 8
0 0 0
1 1 0
1 0
0 0 0
1 0 0
1 0 5
0 0
0
0
0
6
8
0
1
0
0
0
0 11
1 6
1 7
0 0
0 0
0
0
1
1
0
1 7
1 10
1 15
1 11
1 9
0 12
1 7
0 0
1 11
. 4
0
0
0
0
1
0 10
0 7
1 12
1 0
1 16
0
0
0
0
0
1 10
0 0
0 0
1 8
. 13
... more lines ...
0
0
0
0
0
0 11
1 11
. 11
1 0
. 0
0
0
1
1
1
0
0
1
1
0
0
9
6
8
0
0
1
0
0
1
0 6
0 11
0 8
0 0
1 10
0
0
0
0
0
;
Suppose you want to fit a logistic regression model for whether the subject develops wheezing symptoms
with density for the i D 1; : : : ; 390 subjects as follows:
yi
binary.pi /
logit.pi / D ˇ0 C ˇ1 x1i C ˇ2 x2i
.ˇ0 /; .ˇ1 /; .ˇ2 / D normal.0; 2 D 10/
Suppose you specify a joint distribution of x1 and x2 in terms of the product of a conditional and marginal
distribution; that is,
p.x1; x2j˛/ D p.x1jx2; ˛10 ; ˛11 /p.x2j˛20 /
where p.x1i jx2i ; ˛10 ; ˛11 / could be a logistic model and p.x2i j˛20 / could be a Poisson distribution that
models the following counts:
p.x1i jx2i ; ˛10 ; ˛11 / D binary.pc;i /
logit.pc;i / D ˛10 C ˛11 x2i
.˛10 /; .˛11 / D normal.0; 2 D 10/
p.x2i j˛20 / D Poisson.e ˛20 /
.˛20 / D normal.0; 2 D 2/
The researchers are interested in interpreting how the odds of developing a wheeze changes for a child living
in the more polluted city. The odds ratio can be written as the follows:
ORx1 D exp .ˇ1 /
Example 74.10: Missing at Random Analysis F 5825
Similarly, the odds ratio for the maternal smoking effect can be written as follows:
ORx2 D exp .ˇ2 /
The following statements fit a Bayesian logistic regression with missing covariates:
proc mcmc data=air seed=1181 nmc=10000 monitor=(_parms_ orx1 orx2)
diag=none plots=none;
parms beta0 -1 beta1 0.1 beta2 .01;
parms alpha10 0 alpha11 0 alpha20 0;
prior beta: alpha1: ~ normal(0,var=10);
prior alpha20 ~ normal(0,var=2);
beginnodata;
pm = exp(alpha20);
orx1 = exp(beta1);
orx2 = exp(beta2);
endnodata;
model x2 ~ poisson(pm) monitor=(1 3 10);
p1 = logistic(alpha10 + alpha11 * x2);
model x1 ~ binary(p1) monitor=(random(3));
p = logistic(beta0 + beta1*x1 + beta2*x2);
model y ~ binary(p);
run;
The PARMS statements specify the parameters in the model and assign initial values to each of them. The
PRIOR statements specify priors for all the model parameters. The notations beta: and alpha: in the PRIOR
statements are shorthand for all variables that start with “beta,” and “alpha,” respectively. The shorthand
notation is not necessary, but it keeps your code succinct.
The BEGINNODATA and ENDNODATA statements enclose three programming statements that calculate the
Poisson mean (pm) and the two odds ratios (ORX1 and ORX2). These enclosed statements are independent
of any data set variables, and they are run only once per iteration to reduce unnecessary observation-level
computations.
The first MODEL statement assigns a Poisson likelihood with mean pm to x2. The statement models
missing values in x2 automatically, creating one variable for each of the missing values, and augments them
accordingly. By default, PROC MCMC does not output analyses of the posterior samples of the missing
values. You can use the MONITOR= option to choose the missing values that you want to monitor. In the
example, the first, third, and tenth missing values are monitored.
The P1 assignment statement calculates pc:i . The second MODEL statement assigns a binary likelihood with
probability p1 and requests a random choice of three missing data variables of x1 to monitor.
The P assignment statement calculates pi in the logistic model. The third MODEL statement specifies the
complete data likelihood function for Y.
Output 74.10.1 displays the number of observations read from the DATA= data set, the number of observations
used in the analysis, and the “Missing Data Information” table. No observations were omitted from the data
set in the analysis.
The “Missing Data Information” table lists the variables that contain missing values, which are x1 and x2, the
number of missing observations in each variable, the observation indices of these missing values, and the
sampling algorithms used. By default, the first 20 observation indices of each variable are printed in the table.
5826 F Chapter 74: The MCMC Procedure
Output 74.10.1 Observation Information and Missing Data Information
Missing at Random Analysis
The MCMC Procedure
Number of Observations Read 390
Number of Observations Used 390
Missing Data Information Table
Number of Observation
Variable Missing Obs Indices
Sampling
Method
x2
30 14 41 50 55 59 66 71 83 88 90 118 158 174 175 178 183 196 203 210 212 ...
N-Metropolis
x1
17 50 92 93 167 194 231 273 296 303 304 308 330 349 373 385 388 390
Inverse CDF
There are 30 missing values in the variable x2, and 17 in x1. Internally, PROC MCMC creates 30 and 17
variables for the missing values in x2 and x1, respectively. The default naming convention for these missing
values is to concatenate the response variable and the observation number. For example, the first missing
value in x2 is the fourteenth observation, and the corresponding variable is x2_14.
Output 74.10.2 displays the summary and interval statistics for each parameter, the odds ratios, and the
monitored missing values.
Output 74.10.2 Posterior Summary and Interval Statistics
Missing at Random Analysis
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta0
10000 -1.3732
0.2078 -1.7909 -0.9676
beta1
10000 0.4797
0.2387 0.0268 0.9491
beta2
10000 0.0156
0.0227 -0.0265 0.0642
alpha10
10000 -0.2166
0.1422 -0.4662 0.0874
alpha11
10000 0.0126
0.0201 -0.0267 0.0521
alpha20
10000 1.5635
0.0235 1.5199 1.6105
orx1
10000 1.6627
0.4094 0.9205 2.4493
orx2
10000 1.0160
0.0231 0.9739 1.0663
x2_14
10000 4.9022
2.2083 1.0000 9.0000
x2_50
10000 4.8924
2.1626 1.0000 9.0000
x2_90
10000 4.8263
2.0816 1.0000 8.0000
x1_296
10000 0.4160
0.4929
0 1.0000
x1_304
10000 0.4460
0.4971
0 1.0000
x1_373
10000 0.4443
0.4969
0 1.0000
The odds ratio for x1 is the multiplicative change in the odds of a child wheezing in Steel City compared
to the odds of the child wheezing in Green Hills. The estimated odds ratio (ORX1) value is 1.6736 with a
corresponding 95% equal-tail credible interval of .1:0248; 2:5939/. City of residency is a significant factor
in a child’s wheezing status. The estimated odds ratio for x2 is the multiplicative change in the odds of
developing a wheeze for each additional reported cigarette smoked per day. The odds ratio of ORX2 indicates
Example 74.11: Nonignorably Missing Data (MNAR) Analysis F 5827
that the odds of a child developing a wheeze is 1.0150 times higher for each reported cigarette a mother
smokes. The corresponding 95% equal-tail credible interval is .0:9695; 1:0619/. Since this interval contains
the value 1, maternal smoking is not considered to be an influential effect.
Example 74.11: Nonignorably Missing Data (MNAR) Analysis
This example illustrates how to fit a nonignorably missing data model (MNAR) with PROC MCMC. For a
short overview of missing data problems, see the section “Handling of Missing Data” on page 5753.
This data set comes from an environmental study that involve workers in a cotton factory. A similar data
set was analyzed from Ibrahim, Chen, and Lipsitz (2001). There are 912 workers in the data set, and the
response variable of interest is whether they develop dyspnea (difficult or labored respiration). The data are
collected over three time points, and there are six covariates. The following statements create the data set:
title 'Nonignorably Missing Data Analysis';
data dyspnea;
input smoke1 smoke2 smoke3 y1 y2 y3 yrswrk1 yrswrk2 yrswrk3
age expd sex hgt;
datalines;
0 0 0 0 0 0 28.1 33.1 39.1 48 1 1 165.0
0 0 0 0 . 0
5.1 10.1 16.1 45 1 0 147.0
0 0 0 0 . 0 26.0 31.0 37.0 46 1 0 156.0
... more lines ...
1
0
1
0
1
0
0
0
.
.
.
.
6.0
20.0
11.0
25.0
17.0
31.0
25 0 1 180.0
42 0 0 159.0
;
The following variables are included in the data set:
y1, y2, and y3: dichotomous outcome at the three time periods, which takes the value 1 if the worker
has dyspnea, 0 if not (there are missing values in y2 and y3)
smoke1, smoke2, smoke3: smoking status (0=no, and 1=yes)
yrswrk1, yrswrk2, yrswrk3: years worked at the cotton factory
age: age of the worker
expd: cotton dust exposure (0=no, 1=yes)
sex: gender (0=female, 1=male)
hgt: height of the worker
Prior to the analysis, three missing data indicator variables (r1, r2, and r3, one for each of the response
variables) are created, and they are set to 1 if the response variable is missing, and 0 otherwise. The covariates
age, hgt, yrswrk1, yrswkr2, and yrswrk3 are standardized:
5828 F Chapter 74: The MCMC Procedure
data dyspnea;
array y[3] y1-y3;
array r[3];
set dyspnea;
do i = 1 to 3;
if y[i] = . then r[i] = 1;
else r[i] = 0;
end;
output;
run;
proc standard data=dyspnea out=dyspnea mean=0 std=1;
var age hgt yrswrk:;
run;
There are no missing values in response variable y1, 128 missing values in y2, and 131 in y3. Ibrahim, Chen,
and Lipsitz (2001) used a logistic regression for each of the response variables, where ıi is a scalar random
effect on the observational level:
yki
binary.pki / k D 1; 2; 3I i D 1; : : : ; 912
pki
D logistic.ki C ıi /
ki
D ˇ1 C ˇ2 expdi C ˇ3 sexi C ˇ4 hgti C ˇ5 agei C ˇ6 yrswrkki C ˇ7 smokeki
ıi
n.0; 2 /
Ibrahim, Chen, and Lipsitz (2001) noted that taking ıi to be higher dimensional (3) would make the model
either not identifiable or nearly not identifiable because of the multiple missing values for some subjects.
The first response variable y1 does not contain any missing values, making it meaningless to model the
corresponding r1 because every value is 1. Hence, only r2 and r3 are considered in the missing mechanism
part of the model. Ibrahim, Chen, and Lipsitz (2001) suggest the following logistic regression for r2 and
r3, where the regression mean for each r depends not only on the current response variable y but also the
response from previous time period:
rki
binary.qki / k D 2; 3I i D 1; : : : ; 912
qki
D logistic.ki /
c D 1 C 2 expdi C 3 sexi C 4 hgti C 5 agei C 6 yrswrkki C 7 smokeki
2i
D c C 8 y1i C 9 y2i
3i
D c C 9 y2i C 10 y3i
The missing mechanism model introduces an additional 10 parameters to the model. Normal priors with
large standard deviations are used here.
The following statements fit a nonignorably missing model to the dyspnea data set:
ods select MissDataInfo REParameters Postsumint;
proc mcmc data=dyspnea seed=17 outpost=dysp2 nmc=20000
propcov=simplex diag=none monitor=(beta1-beta7);
array p[3];
array yrswrk[3];
Example 74.11: Nonignorably Missing Data (MNAR) Analysis F 5829
array smoke[3];
parms beta1-beta7 s2;
parms phi1-phi10;
prior beta: phi: ~ n(0, var=1e6);
prior s2 ~ igamma(2, scale=2);
random d ~ n(0, var=s2) subject=_obs_;
mu = beta1 + beta2*expd + beta3*sex + beta4*hgt + beta5*age + d;
do j = 1 to 3;
p[j] = logistic(mu + beta6*yrswrk[j] + beta7*smoke[j]);
end;
model y1 ~ binary(p1);
model y2 ~ binary(p2);
model y3 ~ binary(p3);
nu = phi1 + phi2*expd + phi3*sex + phi4*hgt + phi5*age;
q2 = logistic(nu + phi6*yrswrk[2] + phi7*smoke[2] +
phi8*y1 + phi9*y2);
model r2 ~ binary(q2);
q3 = logistic(nu + phi6*yrswrk[3] + phi7*smoke[3] +
phi9*y2 + phi10*y3);
model r3 ~ binary(q3);
run;
The first ARRAY statement declares an array p of size 3. This arrays stores three binary probabilities of the
response variables. The next two ARRAY statements create storage arrays for some of yrswrk and smoke
variables for later programming convenience. The first PARMS statement declares eight parameters, ˇ1 ˇ7
and 2 . The second PARMS statement declares the 10 parameters for the missing mechanism model. The
PRIOR statements assign prior distributions to these parameters.
The RANDOM statement defines an observational-level random effect d that has a normal prior with variance
s2. The SUBJECT=_OBS_ option enables the specification of individual random effects without an explicit
input data set variable.
The MU assignment statement and the following DO loop statements calculate the binary probabilities for
the three response variables. Note that different yrswrk and smoke variables are used in the DO loop for
different years. The three MODEL statements assign three binary distributions to the response variables.
The NU assignment statement starts the calculation for the regression mean in the logistic model for r2 and r3.
The variables q2 and q3 are the binary probabilities for the missing mechanisms. Note that their calculations
are conditional on the response variables y (selection model). The last two MODEL statements for r2 and r3
complete the specification of the models.
Missing data information and random-effects parameters information are displayed in Output 74.11.1. You
can read the total number of missing observations from each variable and its indices from the table. The
missing values are sampled using the inverse CDF method. There are 912 random-effects parameters in the
model.
5830 F Chapter 74: The MCMC Procedure
Output 74.11.1 Missing Data and Random-Effects Information
Nonignorably Missing Data Analysis
The MCMC Procedure
Missing Data Information Table
Number of Observation
Variable Missing Obs Indices
Sampling
Method
y2
128 2 3 9 11 13 19 20 21 30 31 32 35 39 40 43 56 58 71 75 95 ...
Inverse CDF
y3
131 9 14 16 20 21 29 31 32 43 45 56 72 86 115 117 121 124 142 149 160 ...
Inverse CDF
Random Effect Parameters
Sampling
Parameter Method
d
Number of Subject
Subject Subjects Values
N-Metropolis _OBS_
Prior
Distribution
912 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
normal(0, var=s2)
The posterior summary and interval statistics of all the ˇ parameters are shown in Output 74.11.2. There are
a number of significant regression coefficients in modeling the probability of a worker developing dyspnea,
including those for expd (ˇ2 ), sex (ˇ3 ), age (ˇ5 ), and smoke (ˇ7 ).
Output 74.11.2 Posterior Summary Statistics for ˇ
Nonignorably Missing Data Analysis
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta1
20000 -2.3256
0.1771 -2.6670 -1.9826
beta2
20000 0.5327
0.1530 0.2306 0.8193
beta3
20000 -0.5966
0.2593 -1.0906 -0.0691
beta4
20000 -0.0682
0.1061 -0.2734 0.1462
beta5
20000 0.6252
0.1640 0.2992 0.9490
beta6
20000 -0.1776
0.1574 -0.4971 0.1218
beta7
20000 0.5862
0.2214 0.1433 1.0051
Example 74.12: Change Point Models
Consider the data set from Bacon and Watts (1971), where yi is the logarithm of the height of the stagnant
surface layer and the covariate xi is the logarithm of the flow rate of water. The following statements create
the data set:
title 'Change Point Model';
data stagnant;
input y x @@;
ind = _n_;
datalines;
1.12 -1.39
1.12 -1.39
0.92 -0.94
0.90 -0.80
0.99
0.81
-1.08
-0.63
1.03
0.83
-1.08
-0.63
Example 74.12: Change Point Models F 5831
0.65
0.51
0.33
0.13
-0.30
-0.65
;
-0.25
0.01
0.25
0.44
0.85
1.19
0.67
0.44
0.30
-0.01
-0.33
-0.25
0.11
0.25
0.59
0.85
0.60
0.43
0.25
-0.13
-0.46
-0.12
0.11
0.34
0.70
0.99
0.59
0.43
0.24
-0.14
-0.43
-0.12
0.11
0.34
0.70
0.99
A scatter plot (Output 74.12.1) shows the presence of a nonconstant slope in the data. This suggests a change
point regression model (Carlin, Gelfand, and Smith 1992). The following statements generate the scatter plot
in Output 74.12.1:
proc sgplot data=stagnant;
scatter x=x y=y;
run;
Output 74.12.1 Scatter Plot of the Stagnant Data Set
Let the change point be cp. Following formulation by Spiegelhalter et al. (1996b), the regression model is as
follows:
normal.˛ C ˇ1 .xi cp/; 2 / if xi < cp
yi normal.˛ C ˇ2 .xi cp/; 2 / if xi >D cp
You might consider the following diffuse prior distributions:
cp uniform. 1:3; 1:1/
˛; ˇ1 ; ˇ2 normal.0; var D 1e6/
2 uniform.0; 5/
5832 F Chapter 74: The MCMC Procedure
The following statements generate Output 74.12.2:
proc mcmc data=stagnant outpost=postout seed=24860 ntu=1000
nmc=20000;
ods select PostSumInt;
ods output PostSumInt=ds;
array beta[2];
parms alpha cp beta1 beta2;
parms s2;
prior cp ~ unif(-1.3, 1.1);
prior s2 ~ uniform(0, 5);
prior alpha beta: ~ normal(0, v = 1e6);
j = 1 + (x >= cp);
mu = alpha + beta[j] * (x - cp);
model y ~ normal(mu, var=s2);
run;
The PROC MCMC statement specifies the input data set (Stagnant), the output data set (Postout), a random
number seed, a tuning sample of 1000, and an MCMC sample of 20000. The ODS SELECT statement
displays only the summary statistics table. The ODS OUTPUT statement saves the summary statistics table
to the data set Ds.
The ARRAY statement allocates an array of size 2 for the beta parameters. You can use beta1 and beta2 as
parameter names without allocating an array, but having the array makes it easier to construct the likelihood
function. The two PARMS statements put the five model parameters in two blocks. The three PRIOR
statements specify the prior distributions for these parameters.
The symbol j indicates the segment component of the regression. When x is less than the change point, (x >=
cp) returns 0 and j is assigned the value 1; if x is greater than or equal to the change point, (x >= cp) returns 1
and j is 2. The symbol mu is the mean for the jth segment, and beta[j] changes between the two regression
coefficients depending on the segment component. The MODEL statement assigns the normal model to the
response variable y.
Posterior summary statistics are shown in Output 74.12.2.
Output 74.12.2 MCMC Estimates of the Change Point Regression Model
Change Point Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation 95% HPD Interval
alpha
20000
0.5349
0.0249
0.4843
cp
20000
0.0283
0.0314
-0.0353
0.5813
0.0846
beta1
20000
-0.4200
0.0146
-0.4482
-0.3911
beta2
20000
-1.0136
0.0167
-1.0476
-0.9817
s2
20000 0.000451 0.000145 0.000220 0.000735
Example 74.12: Change Point Models F 5833
You can use PROC SGPLOT to visualize the model fit. Output 74.12.3 shows the fitted regression lines
over the original data. In addition, on the bottom of the plot is the kernel density of the posterior marginal
distribution of cp, the change point. The kernel density plot shows the relative variability of the posterior
distribution on the data plot. You can use the following statements to create the plot:
data _null_;
set ds;
call symputx(parameter, mean);
run;
data b;
missing A;
input x1 @@;
if x1 eq .A then x1 = &cp;
if _n_ <= 2 then y1 = &alpha + &beta1 * (x1 - &cp);
else y1 = &alpha + &beta2 * (x1 - &cp);
datalines;
-1.5 A 1.2
;
proc kde data=postout;
univar cp / out=m1 (drop=count);
run;
data m1;
set m1;
density = (density / 25) - 0.653;
run;
data all;
set stagnant b m1;
run;
proc sgplot data=all noautolegend;
scatter x=x y=y;
series x=x1 y=y1 / lineattrs = graphdata2;
series x=value y=density / lineattrs = graphdata1;
run;
The macro variables &alpha, &beta1, &beta2, and &cp store the posterior mean estimates from the data
set Ds. The data set b contains three predicted values, at the minimum and maximum values of x and the
estimated change point &cp. These input values give you fitted values from the regression model. Data set
M1 contains the kernel density estimates of the parameter cp. The density is scaled down so the curve would
fit in the plot. Finally, you use PROC SGPLOT to overlay the scatter plot, regression line and kernel density
plots in the same graph.
5834 F Chapter 74: The MCMC Procedure
Output 74.12.3 Estimated Fit to the Stagnant Data Set
Example 74.13: Exponential and Weibull Survival Analysis
This example covers two commonly used survival analysis models: the exponential model and the Weibull
model. The deviance information criterion (DIC) is used to do model selections, and you can also find
programs that visualize posterior quantities. Exponential and Weibull models are widely used for survival
analysis. This example shows you how to use PROC MCMC to analyze the treatment effect for the E1684
melanoma clinical trial data. These data were collected to assess the effectiveness of using interferon alpha-2b
in chemotherapeutic treatment of melanoma. The following statements create the data set:
data e1684;
input t t_cen treatment @@;
if t = . then do;
t = t_cen;
v = 0;
end;
else
v = 1;
ifn = treatment - 1;
et = exp(t);
lt = log(t);
drop t_cen;
datalines;
1.57808 0.00000 2
1.48219
2.23288 0.00000 1
.
0.00000
9.38356
2
2
.
3.27671
7.33425
0.00000
1
1
Example 74.13: Exponential and Weibull Survival Analysis F 5835
.
1.68767
9.64384
0.00000
1
2
1.66575
2.34247
0.00000
0.00000
2
2
0.94247
0.89863
0.00000
0.00000
1
1
.
4.36164
2
.
4.81918
2
... more lines ...
3.39178
0.00000
1
;
The data set E1684 contains the following variables: t is the failure time that equals the censoring time
whether the observation was censored, v indicates whether the observation is an actual failure time or a
censoring time, treatment indicates two levels of treatments, and ifn indicates the use of interferon as a
treatment. The variables et and lt are the exponential and logarithm transformation of the time t. The
published data contains other potential covariates that are not listed here. This example concentrates on the
effectiveness of the interferon treatment.
Exponential Survival Model
The density function for exponentially distributed survival times is as follows:
p.ti ji / D i exp . i ti /
Note that this formulation of the exponential distribution is different from what is used in the SAS probability
function PDF. The definition used in PDF for the exponential distributions is as follows:
p.ti ji / D
1
ti
exp.
/
i
i
The relationship between and is as follows:
i D
1
i
The corresponding survival function, using the i formulation, is as follows:
S.ti ji / D exp . i ti /
If you have a sample fti g of n independent exponential survival times, each with mean i , then the likelihood
function in terms of is as follows:
L.jt/ D …niD1 p.ti ji /i S.ti ji /1
i
D …niD1 .i exp. i ti //i .exp. i ti //1
i
D …niD1 i i exp. i ti /
If you link the covariates to with i D exp x0i ˇ, where xi is the vector of covariates corresponding to the
ith observation and ˇ is a vector of regression coefficients, then the log-likelihood function is as follows:
l.ˇjt; x/ D
n
X
i x0i ˇ
ti exp.x0i ˇ/
i D1
In the absence of prior information about the parameters in this model, you can choose diffuse normal priors
for the ˇ:
ˇ normal.0; sd =10000/
5836 F Chapter 74: The MCMC Procedure
There are two ways to program the log-likelihood function in PROC MCMC. You can use the SAS functions
LOGPDF and LOGSDF. Alternatively, you can use the simplified log-likelihood function, which is more
computationally efficient. You get identical results by using either approaches.
The following PROC MCMC statements fit an exponential model with simplified log-likelihood function:
title 'Exponential Survival Model';
ods graphics on;
proc mcmc data=e1684 outpost=expsurvout nmc=10000 seed=4861
diag=(mcse ess);
ods select PostSumInt TADpanel
ess mcse;
parms (beta0 beta1) 0;
prior beta: ~ normal(0, sd = 10000);
/*****************************************************/
/* (1) the logpdf and logsdf functions are not used */
/*****************************************************/
/*
nu = 1/exp(beta0 + beta1*ifn);
llike = v*logpdf("exponential", t, nu) +
(1-v)*logsdf("exponential", t, nu);
*/
/****************************************************/
/* (2) the simplified likelihood formula is used
*/
/****************************************************/
l_h = beta0 + beta1*ifn;
llike = v*(l_h) - t*exp(l_h);
model general(llike);
run;
The two assignment statements that are commented out calculate the log-likelihood function by using the
SAS functions LOGPDF and LOGSDF for the exponential distribution. The next two assignment statements
calculate the log likelihood by using the simplified formula. The first approach is slower because of the
redundant calculation involved in calling both LOGPDF and LOGSDF.
An examination of the trace plots for ˇ0 and ˇ1 (see Output 74.13.1) reveals that the sampling has gone well
with no particular concerns about the convergence or mixing of the chains.
Example 74.13: Exponential and Weibull Survival Analysis F 5837
Output 74.13.1 Posterior Plots for ˇ0 and ˇ1 in the Exponential Survival Analysis
The MCMC results are shown in Output 74.13.2.
5838 F Chapter 74: The MCMC Procedure
Output 74.13.2 Posterior Summary and Interval Statistics
Exponential Survival Model
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
Standard
Mean Deviation
N
95%
HPD Interval
beta0
10000 -1.6715
0.1091 -1.8930 -1.4673
beta1
10000 -0.2879
0.1615 -0.6104 0.0169
The Monte Carlo standard errors and effective sample sizes are shown in Output 74.13.3. The posterior
means for ˇ0 and ˇ1 are estimated with high precision, with small standard errors with respect to the standard
deviation. This indicates that the mean estimates have stabilized and do not vary greatly in the course of the
simulation. The effective sample sizes are roughly the same for both parameters.
Output 74.13.3 MCSE and ESS
Exponential Survival Model
The MCMC Procedure
Monte Carlo Standard Errors
Standard
MCSE Deviation MCSE/SD
Parameter
beta0
0.00302
0.1091
0.0277
beta1
0.00485
0.1615
0.0301
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
beta0
1304.1
7.6682
0.1304
beta1
1107.2
9.0319
0.1107
The next part of this example shows fitting a Weibull regression to the data and then comparing the two
models with DIC to see which one provides a better fit to the data.
Weibull Survival Model
The density function for Weibull distributed survival times is as follows:
p.ti j˛; i / D ˛ti˛
1
exp.i
exp.i /ti˛ /
Note that this formulation of the Weibull distribution is different from what is used in the SAS probability
function PDF. The definition used in PDF is as follows:
˛ ˛ 1
ti
˛ ti
p.ti j˛; i / D exp
i
i i
The relationship between and in these two parameterizations is as follows:
i D
˛ log i
Example 74.13: Exponential and Weibull Survival Analysis F 5839
The corresponding survival function, using the i formulation, is as follows:
S.ti j˛; i / D exp. exp.i /ti˛ /
If you have a sample fti g of n independent Weibull survival times, with parameters ˛, and i , then the
likelihood function in terms of ˛ and is as follows:
L.˛; jt/ D …niD1 p.ti j˛; i /i S.ti j˛; i /1
i
D …niD1 .˛ti˛
1
exp.i
exp.i /ti˛ //i .exp. exp.i /ti˛ //1
D …niD1 .˛ti˛
1
exp.i //i .exp. exp.i /ti˛ //
i
If you link the covariates to with i D x0i ˇ, where xi is the vector of covariates corresponding to the ith
observation and ˇ is a vector of regression coefficients, the log-likelihood function becomes this:
l.˛; ˇjt; x/ D
n
X
i .log.˛/ C .˛
1/ log.ti / C x0i ˇ/
exp.x0i ˇ/ti˛ /
i D1
As with the exponential model, in the absence of prior information about the parameters in this model, you
can use diffuse normal priors on ˇ: You might want to choose a diffuse gamma distribution for ˛: Note that
when ˛ D 1, the Weibull survival likelihood reduces to the exponential survival likelihood. Equivalently, by
looking at the posterior distribution of ˛, you can conclude whether fitting an exponential survival model
would be more appropriate than the Weibull model.
PROC MCMC also enables you to make inference on any functions of the parameters. Quantities of interest
in survival analysis include the value of the survival function at specific times for specific treatments and the
relationship between the survival curves for different treatments. With PROC MCMC, you can compute a
sample from the posterior distribution of the interested survival functions at any number of points. The data
in this example range from about 0 to 10 years, and the treatment of interest is the use of interferon.
Like in the previous exponential model example, there are two ways to fit this model: using the SAS functions
LOGPDF and LOGSDF, or using the simplified log likelihood functions. The example uses the latter method.
The following statements run PROC MCMC and produce Output 74.13.4:
title 'Weibull Survival Model';
proc mcmc data=e1684 outpost=weisurvout nmc=10000 seed=1234
monitor=(_parms_ surv_ifn surv_noifn) stats=(summary intervals);
ods select PostSummaries;
ods output PostSummaries=ds PostIntervals=is;
array surv_ifn[10];
array surv_noifn[10];
parms alpha 1 (beta0 beta1) 0;
prior beta: ~ normal(0, var=10000);
prior alpha ~ gamma(0.001,is=0.001);
beginnodata;
do t1 = 1 to 10;
surv_ifn[t1] = exp(-exp(beta0+beta1)*t1**alpha);
surv_noifn[t1] = exp(-exp(beta0)*t1**alpha);
end;
endnodata;
5840 F Chapter 74: The MCMC Procedure
lambda = beta0 + beta1*ifn;
/*****************************************************/
/* (1) the logpdf and logsdf functions are not used */
/*****************************************************/
/*
gamma = exp(-lambda /alpha);
llike = v*logpdf('weibull', t, alpha, gamma) +
(1-v)*logsdf('weibull', t, alpha, gamma);
/
*
/****************************************************/
/* (2) the simplified likelihood formula is used
*/
/****************************************************/
llike = v*(log(alpha) + (alpha-1)*log(t) + lambda) exp(lambda)*(t**alpha);
model general(llike);
run;
The MONITOR= option indicates the parameters and quantities of interest that PROC MCMC tracks. The
symbol _PARMS_ specifies all model parameters. The array surv_ifn stores the expected survival probabilities
for patients who received interferon over a period of 10 years. Similarly, surv_noifn stores the expected
survival probabilities for patients who did not received interferon.
The BEGINNODATA and ENDNODATA statements enclose the calculations for the survival probabilities.
The assignment statements proceeding the MODEL statement calculate the log likelihood for the Weibull
survival model. The MODEL statement specifies the log likelihood that you programmed.
An examination of the trace plots for ˛, ˇ0 , and ˇ1 (not displayed here) reveals that the sampling has gone
well, with no particular concerns about the convergence or mixing of the chains.
Output 74.13.4 displays the posterior summary statistics.
Example 74.13: Exponential and Weibull Survival Analysis F 5841
Output 74.13.4 Posterior Summary Statistics
Weibull Survival Model
The MCMC Procedure
Posterior Summaries
Percentiles
Parameter
N
Standard
Mean Deviation
25
50
75
alpha
10000 0.7891
0.0539 0.7514 0.7880 0.8260
beta0
10000 -1.3581
0.1369 -1.4519 -1.3597 -1.2624
beta1
10000 -0.2512
0.1541 -0.3541 -0.2606 -0.1521
surv_ifn1
10000 0.8175
0.0227 0.8027 0.8187 0.8331
surv_ifn2
10000 0.7066
0.0291 0.6874 0.7072 0.7265
surv_ifn3
10000 0.6203
0.0331 0.5983 0.6205 0.6436
surv_ifn4
10000 0.5495
0.0360 0.5253 0.5497 0.5747
surv_ifn5
10000 0.4899
0.0381 0.4635 0.4895 0.5170
surv_ifn6
10000 0.4390
0.0396 0.4118 0.4382 0.4666
surv_ifn7
10000 0.3949
0.0406 0.3669 0.3934 0.4223
surv_ifn8
10000 0.3564
0.0413 0.3281 0.3551 0.3840
surv_ifn9
10000 0.3225
0.0416 0.2940 0.3212 0.3505
surv_ifn10
10000 0.2926
0.0416 0.2638 0.2911 0.3208
surv_noifn1
10000 0.7719
0.0274 0.7535 0.7736 0.7913
surv_noifn2
10000 0.6401
0.0339 0.6171 0.6415 0.6635
surv_noifn3
10000 0.5415
0.0374 0.5161 0.5428 0.5662
surv_noifn4
10000 0.4635
0.0395 0.4365 0.4636 0.4890
surv_noifn5
10000 0.4001
0.0406 0.3725 0.3995 0.4261
surv_noifn6
10000 0.3475
0.0411 0.3195 0.3459 0.3745
surv_noifn7
10000 0.3034
0.0411 0.2758 0.3012 0.3299
surv_noifn8
10000 0.2661
0.0406 0.2384 0.2630 0.2921
surv_noifn9
10000 0.2342
0.0399 0.2069 0.2311 0.2592
surv_noifn10 10000 0.2069
0.0389 0.1803 0.2035 0.2312
An examination of the ˛ parameter reveals that the exponential model might not be inappropriate here. The
estimated posterior mean of ˛ is 0.7856 with a posterior standard deviation of 0.0533. As noted previously, if
˛ D 1, then the Weibull survival distribution is the exponential survival distribution. With these data, you can
see that the evidence is in favor of ˛ < 1. The value 1 is almost 4 posterior standard deviations away from
the posterior mean. The following statements compute the posterior probability of the hypothesis that ˛ < 1::
proc format;
value alphafmt low-<1 = 'alpha < 1' 1-high = 'alpha >= 1';
run;
proc freq data=weisurvout;
tables alpha /nocum;
format alpha alphafmt.;
run;
The PROC FREQ results are shown in Output 74.13.5.
5842 F Chapter 74: The MCMC Procedure
Output 74.13.5 Frequency Analysis of ˛
Weibull Survival Model
The FREQ Procedure
alpha Frequency Percent
alpha < 1
9998
99.98
alpha >= 1
2
0.02
The output from PROC FREQ shows that 100% of the 10000 simulated values for ˛ are less than 1. This is a
very strong indication that the exponential model is too restrictive to model these data well.
You can examine the estimated survival probabilities over time individually, either through the posterior
summary statistics or by looking at the kernel density plots. Alternatively, you might find it more informative
to examine these quantities in relation with each other. For example, you can use a side-by-side box plot
to display these posterior distributions by using PROC SGPLOT. For more information, see the section
“Statistical Graphics Using ODS” on page 587 in Chapter 21, “Statistical Graphics Using ODS.” First you
need to take the posterior output data set Weisurvout and stack variables that you want to plot. For example,
to plot all the survival times for patients who received interferon, you want to stack surv_inf1–surv_inf10.
The macro %Stackdata takes an input data set dataset, stacks the wanted variables vars, and outputs them
into the output data set.
The following statements define the macro stackdata:
/* define macro stackdata */
%macro StackData(dataset,output,vars);
data &output;
length var $ 32;
if 0 then set &dataset nobs=nnn;
array lll[*] &vars;
do jjj=1 to dim(lll);
do iii=1 to nnn;
set &dataset point=iii;
value = lll[jjj];
call vname(lll[jjj],var);
output;
end;
end;
stop;
keep var value;
run;
%mend;
/* stack the surv_ifn variables and saved them to survifn. */
%StackData(weisurvout, survifn, surv_ifn1-surv_ifn10);
Once you stack the data, use PROC SGPLOT to create the side-by-side box plots. The following statements
generate Output 74.13.6:
Example 74.13: Exponential and Weibull Survival Analysis F 5843
proc sgplot data=survifn;
yaxis label='Survival Probability' values=(0 to 1 by 0.2);
xaxis label='Time' discreteorder=data;
vbox value / category=var;
run;
Output 74.13.6 Side-by-Side Box Plots of Estimated Survival Probabilities
There is a clear decreasing trend over time of the survival probabilities for patients who receive the treatment.
You might ask how does this group compare to those who did not receive the treatment? In this case, you
want to overlay the two predicted curves for the two groups of patients and add the corresponding credible
interval. See Output 74.13.7. To generate the graph, you first take the posterior mean estimates from the
ODS output table ds and the lower and upper HPD interval estimates is, store them in the data set Surv, and
draw the figure by using PROC SGPLOT.
The following statements generate data set Surv:
data surv;
set ds;
if _n_ >= 4 then do;
set is point=_n_;
group = 'with interferon
';
time = _n_ - 3;
if time > 10 then do;
time = time - 10;
group = 'without interferon';
end;
5844 F Chapter 74: The MCMC Procedure
output;
end;
keep time group mean hpdlower hpdupper;
run;
The following SGPLOT statements generate Output 74.13.7:
proc sgplot data=surv;
yaxis label="Survival Probability" values=(0 to 1 by 0.2);
series x=time y=mean / group = group name='i';
band x=time lower=hpdlower upper=hpdupper / group = group transparency=0.7;
keylegend 'i';
run;
ods graphics off;
In Output 74.13.7, the solid line is the survival curve for patients who received interferon; the shaded region
centers at the solid line is the 95% HPD intervals; the medium-dashed line is the survival curve for patients
who did not receive interferon; and the shaded region around the dashed line is the corresponding 95% HPD
intervals.
Output 74.13.7 Predicted Survival Probability Curves with 95% HPD Intervals
The plot suggests that there is an effect of using interferon because patients who received interferon have
sustained better survival probabilities than those who did not. However, the effect might not be very
significant, as the 95% credible intervals of the two groups do overlap. For more on these interferon studies,
see Ibrahim, Chen, and Lipsitz (2001).
Example 74.13: Exponential and Weibull Survival Analysis F 5845
Weibull or Exponential?
Although the evidence from the Weibull model fit shows that the posterior distribution of ˛ has a significant
amount of density mass less than 1, suggesting that the Weibull model is a better fit to the data than the
exponential model, you might still be interested in comparing the two models more formally. You can use the
Bayesian model selection criterion (see the section “Deviance Information Criterion (DIC)” on page 152 in
Chapter 7, “Introduction to Bayesian Analysis Procedures”) to determine which model fits the data better.
The PROC MCMC DIC option requests the calculation of DIC, and the procedure displays the ODS output
table DIC. The table includes the posterior mean of the deviation, D./, deviation at the estimate, D./,
effective number of parameters, pD , and DIC. It is important to remember that the standardizing term,
p.y/, which is a function of the data alone, is not taken into account in calculating the DIC. This term is
irrelevant only if you compare two models that have the same likelihood function. If you do not have identical
likelihood functions, using DIC for model selection purposes without taking this standardizing term into
account can produce incorrect results. In addition, you want to be careful in interpreting the DIC whenever
you use the GENERAL function to construct the log-likelihood, as the case in this example. Using the
GENERAL function, you can obtain identical posterior samples with two log-likelihood functions that differ
only by a constant. This difference translates to a difference in the DIC calculation, which could be very
misleading.
If ˛ D 1, the Weibull likelihood is identical to the exponential likelihood. It is safe in this case to directly
compare DICs from these two models. However, if you do not want to work out the mathematical detail or
you are uncertain of the equivalence, a better way of comparing the DICs is to run the Weibull model twice:
once with ˛ being a parameter and once with ˛ D 1. This ensures that the likelihood functions are the same,
and the DIC comparison is meaningful.
The following statements fit a Weibull model:
title 'Model Comparison between Weibull and Exponential';
proc mcmc data=e1684 outpost=weisurvout nmc=10000 seed=4861 dic;
ods select dic;
parms alpha 1 (beta0 beta1) 0;
prior beta: ~ normal(0, var=10000);
prior alpha ~ gamma(0.001,is=0.001);
lambda = beta0 + beta1*ifn;
llike = v*(log(alpha) + (alpha-1)*log(t) + lambda) exp(lambda)*(t**alpha);
model general(llike);
run;
The DIC option requests the calculation of DIC, and the table is displayed in Output 74.13.8.
5846 F Chapter 74: The MCMC Procedure
Output 74.13.8 DIC Table from the Weibull Model
Model Comparison between Weibull and Exponential
The MCMC Procedure
Deviance Information Criterion
Dbar (posterior mean of deviance)
858.623
Dmean (deviance evaluated at posterior mean) 855.633
pD (effective number of parameters)
DIC (smaller is better)
2.990
861.614
The GENERAL or DGENERAL function is used in
this program. To make meaningful comparisons,
you must ensure that all GENERAL or DGENERAL
functions include appropriate normalizing
constants. Otherwise,
DIC comparisons can be misleading.
The note in Output 74.13.8 reminds you of the importance of ensuring identical likelihood functions when
you use the GENERAL function. The DIC value is 861.6.
Based on the same set of code, the following statements fit an exponential model by setting ˛ D 1:
proc mcmc data=e1684 outpost=expsurvout nmc=10000 seed=4861 dic;
ods select dic;
parms beta0 beta1 0;
prior beta: ~ normal(0, var=10000);
begincnst;
alpha = 1;
endcnst;
lambda = beta0 + beta1*ifn;
llike = v*(log(alpha) + (alpha-1)*log(t) + lambda) exp(lambda)*(t**alpha);
model general(llike);
run;
Output 74.13.9 displays the DIC table.
Output 74.13.9 DIC Table from the Exponential Model
Model Comparison between Weibull and Exponential
The MCMC Procedure
Deviance Information Criterion
Dbar (posterior mean of deviance)
870.133
Dmean (deviance evaluated at posterior mean) 868.190
pD (effective number of parameters)
DIC (smaller is better)
1.943
872.075
The GENERAL or DGENERAL function is used in
this program. To make meaningful comparisons,
you must ensure that all GENERAL or DGENERAL
functions include appropriate normalizing
constants. Otherwise,
DIC comparisons can be misleading.
Example 74.14: Time Independent Cox Model F 5847
The DIC value of 872.075 is greater than 861. A smaller DIC indicates a better fit to the data; hence, you
can conclude that the Weibull model is more appropriate for this data set. You can see the equivalencing of
the exponential model you fitted in “Exponential Survival Model” on page 5835 by running the following
comparison.
The following statements are taken from the section “Exponential Survival Model” on page 5835, and they fit
the same exponential model:
proc mcmc data=e1684 outpost=expsurvout1 nmc=10000 seed=4861 dic;
ods select none;
parms (beta0 beta1) 0;
prior beta: ~ normal(0, sd = 10000);
l_h = beta0 + beta1*ifn;
llike = v*(l_h) - t*exp(l_h);
model general(llike);
run;
proc compare data=expsurvout compare=expsurvout1;
var beta0 beta1;
run;
The posterior samples of beta0 and beta1 in the data set Expsurvout1 are identical to those in the data set
Expsurvout. The comparison results are not shown here.
Example 74.14: Time Independent Cox Model
This example has two purposes. One is to illustrate how to use PROC MCMC to fit a Cox proportional hazard
model. Specifically, the time independent model. See “Example 74.15: Time Dependent Cox Model” on
page 5854 for an example on fitting time dependent Cox model. Note that it is much easier to fit a Bayesian
Cox model by specifying the BAYES statement in PROC PHREG (see Chapter 86, “The PHREG Procedure”).
If you are interested only in fitting a Cox regression survival model, you should use PROC PHREG.
The second objective of this example is to demonstrate how to model data that are not independent. That is
the case where the likelihood for observation i depends on other observations in the data
Q set. In other words,
if you work with a likelihood function that cannot be broken down simply as L.y/ D ni L.yi /, you can use
this example for illustrative purposes. By default, PROC MCMC assumes that the programming statements
and model specification is intended for a single row of observations in the data set. The Cox model is chosen
because the complexity in the data structure requires more elaborate coding.
The Cox proportional hazard model is widely used in the analysis of survival time, failure time, or other
duration data to explain the effect of exogenous explanatory variables. The data set used in this example
is taken from Krall, Uthoff, and Harley (1975), who analyzed data from a study on myeloma in which
researchers treated 65 patients with alkylating agents. Of those patients, 48 died during the study and 17
survived. The following statements generate the data set that is used in this example:
data Myeloma;
input Time Vstatus LogBUN HGB Platelet Age LogWBC Frac
LogPBM Protein SCalc;
label Time='survival time'
VStatus='0=alive 1=dead';
datalines;
5848 F Chapter 74: The MCMC Procedure
1.25
1.25
2.00
2.00
1
1
1
1
2.2175
1.9395
1.5185
1.7482
9.4
12.0
9.8
11.3
1
1
1
0
67
38
81
75
3.6628
3.9868
3.8751
3.8062
1
1
1
1
1.9542
1.9542
2.0000
1.2553
12
20
2
0
10
18
15
12
1
60
3.6812
0
0.9542
0
12
... more lines ...
77.00
;
0
1.0792
14.0
proc sort data = Myeloma;
by descending time;
run;
data _null_;
set Myeloma nobs=_n;
call symputx('N', _n);
stop;
run;
The variable Time represents the survival time in months from diagnosis. The variable VStatus consists
of two values, 0 and 1, indicating whether the patient was alive or dead, respectively, at the end of the
study. If the value of VStatus is 0, the corresponding value of Time is censored. The variables thought
to be related to survival are LogBUN (log.BUN/ at diagnosis), HGB (hemoglobin at diagnosis), Platelet
(platelets at diagnosis: 0=abnormal, 1=normal), Age (age at diagnosis in years), LogWBC (log(WBC) at
diagnosis), Frac (fractures at diagnosis: 0=none, 1=present), LogPBM (log percentage of plasma cells in
bone marrow), Protein (proteinuria at diagnosis), and SCalc (serum calcium at diagnosis). Interest lies in
identifying important prognostic factors from these explanatory variables. In addition, there are 65 (&n)
observations in the data set Myeloma. The likelihood used in these examples is the Breslow likelihood:
2
3 vi
di
n
0
Y
Y
exp.ˇ Zj .ti //
4
5
P
L.ˇ/ D
0 Z .t //
exp.ˇ
i
l
l2Ri
i D1
j D1
where
ˇ is the vector parameters
n is the total number of observations in the data set
ti is the ith time, which can be either event time or censored time
Zl .t/ is the vector explanatory variables for the lth individual at time t
di is the multiplicity of failures at ti . If there are no ties in time, di is 1 for all i.
Ri is the risk set for the ith time ti , which includes all observations that have survival time greater than
or equal to ti
vi indicates whether the patient is censored. The value 0 corresponds to censoring. Note that the
censored time ti enters the likelihood function only through the formation of the risk set Ri .
Example 74.14: Time Independent Cox Model F 5849
Priors on the coefficients are independent normal priors with very large variance (1e6). Throughout this
example,P
the symbol bZ represents the regression term ˇ 0 Zj .ti / in the likelihood, and the symbol S represents
the term l2Ri exp.ˇ 0 Zl .ti //.
The regression model considered in this example uses the following formula:
ˇ 0 Zj
D ˇ1 logbun C ˇ2 hgb C ˇ3 platelet C ˇ4 age C
ˇ5 logwbc C ˇ6 frac C ˇ7 logpbm C ˇ8 protein C ˇ9 scalc
The hard part of coding this in PROC MCMC is the construction of the risk set Ri . Ri contains all
observations that have survival time greater than or equal to ti . First suppose that there are no ties in
time. Sorting the data set by the variable time into descending order gives you Ri that is in the right order.
Observation i’s risk set consists of all data points j such that j <D i in the data set. You can cumulatively
increment S in the SAS statements.
With potential ties in time, at observation i, you need to know whether any subsequent observations, i + 1
and so on, have the same survival time as ti . Suppose that the ith, the i + 1, and the i + 2 observations all
have the same survival time; all three of them need to be included in the risk set calculation. This means
that to calculate the likelihood for some observations, you need to access both the previous and subsequent
observations in the data set. There are two ways to do this. One is to use the LAG function; the other is to
use the option JOINTMODEL.
The LAG function returns values from a queue (see SAS Language Reference: Dictionary). So for the ith
observation, you can use LAG1 to access variables from the previous row in the data set. You want to
compare the lag1 value of time with the current time value. Depending on whether the two time values are
equal, you can add correction terms in the calculation for the risk set S.
The following statements sort the data set by time into descending order, with the largest survival time on top:
title 'Cox Model with Time Independent Covariates';
proc freq data=myeloma;
ods select none;
tables time / out=freqs;
run;
proc sort data = freqs;
by descending time;
run;
data myelomaM;
set myeloma;
ind = _N_;
run;
ods select all;
The following statements run PROC MCMC and produce Output 74.14.1:
proc mcmc data=myelomaM outpost=outi nmc=50000 ntu=3000 seed=1;
ods select PostSumInt;
array beta[9];
parms beta: 0;
prior beta: ~ normal(0, var=1e6);
5850 F Chapter 74: The MCMC Procedure
bZ = beta1 * LogBUN + beta2 * HGB + beta3 * Platelet
+ beta4 * Age + beta5 * LogWBC + beta6 * Frac +
beta7 * LogPBM + beta8 * Protein + beta9 * SCalc;
if ind = 1 then do;
/* first observation
*/
S = exp(bZ);
l = vstatus * bZ;
v = vstatus;
end;
else if (1 < ind < &N) then do;
if (lag1(time) ne time) then do;
l = vstatus * bZ;
l = l - v * log(S); /* correct the loglike value
*/
v = vstatus;
/* reset v count value
*/
S = S + exp(bZ);
end;
else do;
/* still a tie
*/
l = vstatus * bZ;
S = S + exp(bZ);
v = v + vstatus;
/* add # of nonsensored values */
end;
end;
else do;
/* last observation
*/
if (lag1(time) ne time) then do;
l = - v * log(S);
/* correct the loglike value
*/
S = S + exp(bZ);
l = l + vstatus * (bZ - log(S));
end;
else do;
S = S + exp(bZ);
l = vstatus * bZ - (v + vstatus) * log(S);
end;
end;
model general(l);
run;
The symbol bZ is the regression term, which is independent of the time variable. The symbol ind indexes
observation numbers in the data set. The symbol S keeps track of the risk set term for every observation. The
symbol l calculates the log likelihood for each observation. Note that the value of l for observation ind is not
necessarily the correct log likelihood value for that observation, especially in cases where the observation ind
is in the tied times. Correction terms are added to subsequent values of l when the time variable becomes
different in order to make up the difference. The total sum of l calculated over the entire data set is correct.
The symbol v keeps track of the sum of vstatus, as censored data do not enter the likelihood and need to be
taken out.
You use the function LAG1 to detect if two adjacent time values are different. If they are, you know that
the current observation is in a different risk set than the last one. You then need to add a correction term
to the log likelihood value of l. The IF-ELSE statements break the observations into three parts: the first
observation, the last observation and everything in the middle.
Example 74.14: Time Independent Cox Model F 5851
Output 74.14.1 Summary Statistics on Cox Model with Time Independent Explanatory Variables and Ties
in the Survival Time, Using PROC MCMC
Cox Model with Time Independent Covariates
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta1
50000 1.7600
0.6441 0.5117
3.0465
beta2
50000 -0.1308
0.0720 -0.2746 0.00524
beta3
50000 -0.2017
0.5148 -1.2394
0.7984
beta4
50000 -0.0126
0.0194 -0.0512
0.0245
beta5
50000 0.3373
0.7256 -1.1124
1.7291
beta6
50000 0.3992
0.4337 -0.4385
1.2575
beta7
50000 0.3749
0.4861 -0.5423
1.3689
beta8
50000 0.0106
0.0271 -0.0451
0.0616
beta9
50000 0.1272
0.1064 -0.0763
0.3406
An alternative to using the LAG function is to use the PROC option JOINTMODEL. With this option,
the log-likelihood function you specify applies not to a single observation but to the entire data set. See
“Modeling Joint Likelihood” on page 5729 for details on how to properly use this option. The basic idea is
that you store all necessary data set variables in arrays and use only the arrays to construct the log likelihood
of the entire data set. This approach works here because for every observation i, you can use index to access
different values of arrays to construct the risk set S. To use the JOINTMODEL option, you need to do some
additional data manipulation. You want to create a stop variable for each observation, which indicates the
observation number that should be included in S for that observation. For example, if observations 4, 5, 6 all
have the same survival time, the stop value for all of them is 6.
The following statements generate a new data set MyelomaM that contains the stop variable:
data myelomaM;
merge myelomaM freqs(drop=percent);
by descending time;
retain stop;
if first.time then do;
stop = _n_ + count - 1;
end;
run;
The following SAS program fits the same Cox model by using the JOINTMODEL option:
data a;
run;
proc mcmc data=a outpost=outa nmc=50000 ntu=3000 seed=1 jointmodel;
ods select none;
array beta[9];
array data[1] / nosymbols;
array timeA[1] / nosymbols;
array vstatusA[1] / nosymbols;
array stopA[1] / nosymbols;
5852 F Chapter 74: The MCMC Procedure
array bZ[&n];
array S[&n];
begincnst;
rc = read_array("myelomam", data, "logbun", "hgb", "platelet",
"age", "logwbc", "frac", "logpbm", "protein", "scalc");
rc = read_array("myelomam", timeA, "time");
rc = read_array("myelomam", vstatusA, "vstatus");
rc = read_array("myelomam", stopA, "stop");
endcnst;
parms (beta:) 0;
prior beta: ~ normal(0, var=1e6);
jl = 0;
/* calculate each bZ and cumulatively adding S as if there are no ties.*/
call mult(data, beta, bZ);
S[1] = exp(bZ[1]);
do i = 2 to &n;
S[i] = S[i-1] + exp(bZ[i]);
end;
do i = 1 to &n;
/* correct the S[i] term, when needed. */
if(stopA[i] > i) then do;
do j = (i+1) to stopA[i];
S[i] = S[i] + exp(bZ[j]);
end;
end;
jl = jl + vstatusA[i] * (bZ[i] - log(S[i]));
end;
model general(jl);
run;
ods select all;
No output tables were produced because this PROC MCMC run produces identical posterior samples as does
the previous example.
Because the JOINTMODEL option is specified here, you do not need to specify myelomaM as the input data
set. An empty data set a is used to speed up the procedure run.
Multiple ARRAY statements allocate array symbols that are used to store the parameters (beta), the response
and the covariates (data, timeA, vstatusA, and stopA), and the work space (bZ and S). The data, timeA,
vstatusA, and stopA arrays are declared with the /NOSYMBOLS option. This option enables PROC
MCMC to dynamically resize these arrays to match the dimensions of the input data set. See the section
“READ_ARRAY Function” on page 5663. The bZ and S arrays store the regression term and the risk set
term for every observation.
The BEGINCNST and ENDCNST statements enclose programming statements that read the data set variables
into these arrays. The rest of the programming statements construct the log likelihood for the entire data set.
The CALL MULT function calculates the regression term in the model and stores the result in the array bZ. In
the first DO loop, you sum the risk set term S as if there are no ties in time. This underevaluates some of the
S elements. For observations that have a tied time, you make the necessary correction to the corresponding
S values. The correction takes place in the second DO loop. Any observation that has a tied time also has
Example 74.14: Time Independent Cox Model F 5853
a stopA[i] that is different from i. You add the right terms to S and sum up the joint log likelihood jl. The
MODEL statement specifies that the log likelihood takes on the value of jl.
To see that you get identical results from these two approaches, use PROC COMPARE to compare the
posterior samples from two runs:
proc compare data=outi compare=outa;
ods select comparesummary;
var beta1-beta9;
run;
The output is not shown here.
Generally, the JOINTMODEL option can be slightly faster than using the default setup. The savings come
from avoiding the overhead cost of accessing the data set repeatedly at every iteration. However, the speed
gain is not guaranteed because it largely depends on the efficiency of your programs.
PROC PHREG fits the same model, and you get very similar results to PROC MCMC. The following
statements fit the model using PROC PHREG and produce Output 74.14.2:
proc phreg data=Myeloma;
ods select PostSumInt;
model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC
Frac LogPBM Protein Scalc;
bayes seed=1 nmc=10000 outpost=phout;
run;
Output 74.14.2 Summary Statistics for Cox Model with Time Independent Explanatory Variables and Ties
in the Survival Time, Using PROC PHREG
Cox Model with Time Independent Covariates
The PHREG Procedure
Bayesian Analysis
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
LogBUN
10000 1.7610
0.6593 0.4107
HGB
10000 -0.1279
0.0727 -0.2801 0.00599
2.9958
Platelet
10000 -0.2179
0.5169 -1.1871
0.8341
Age
10000 -0.0130
0.0199 -0.0519
0.0251
LogWBC
10000 0.3150
0.7451 -1.1783
1.7483
Frac
10000 0.3766
0.4152 -0.4273
1.2021
LogPBM
10000 0.3792
0.4909 -0.5939
1.3241
Protein
10000 0.0102
0.0267 -0.0405
0.0637
SCalc
10000 0.1248
0.1062 -0.0846
0.3322
5854 F Chapter 74: The MCMC Procedure
Example 74.15: Time Dependent Cox Model
This example uses the same Myeloma data set as in “Example 74.14: Time Independent Cox Model” on
page 5847, and illustrates the fitting of a time dependent Cox model. The following statements generate the
data set once again:
data Myeloma;
input Time Vstatus LogBUN HGB Platelet
LogPBM Protein SCalc;
label Time='survival time'
VStatus='0=alive 1=dead';
datalines;
1.25 1 2.2175
9.4 1 67 3.6628 1
1.25 1 1.9395 12.0 1 38 3.9868 1
2.00 1 1.5185
9.8 1 81 3.8751 1
2.00 1 1.7482 11.3 0 75 3.8062 1
Age LogWBC Frac
1.9542
1.9542
2.0000
1.2553
12
20
2
0
10
18
15
12
0.9542
0
12
... more lines ...
77.00
;
0
1.0792
14.0
1
60
3.6812
0
To model Zi .ti / as a function of the survival time, you can relate time ti to covariates by using this formula:
ˇ 0 Zj .ti / D .ˇ1 C ˇ2 ti /logbun C .ˇ3 C ˇ4 ti /hgb C .ˇ5 C ˇ6 ti /platelet
For illustrational purposes, only three explanatory variables, LOGBUN, HBG, and PLATELET, are used in
this example.
P
Since Zj .ti / depends on ti , every term in the summation of l2Ri exp.ˇ 0 Zl .ti // is a product of the current
time ti and all observations that are in the risk set. You can use the JOINTMODEL option, as in the last
example, or you can modify the input data set such that every row contains not only the current observation
but also all observations that are in the corresponding risk set. When you construct the log likelihood for
each observation, you have all the relevant data at your disposal.
The following statements illustrate how you can create a new data set with different risk sets at different rows:
title 'Cox Model with Time Dependent Covariates';
proc sort data = Myeloma;
by descending time;
run;
data _null_;
set Myeloma nobs=_n;
call symputx('N', _n);
stop;
run;
ods select none;
proc freq data=myeloma;
tables time / out=freqs;
run;
Example 74.15: Time Dependent Cox Model F 5855
ods select all;
proc sort data = freqs;
by descending time;
run;
data myelomaM;
set myeloma;
ind = _N_;
run;
data myelomaM;
merge myelomaM freqs(drop=percent); by descending time;
retain stop;
if first.time then do;
stop = _n_ + count - 1;
end;
run;
%macro array(list);
%global mcmcarray;
%let mcmcarray = ;
%do i = 1 %to 32000;
%let v = %scan(&list, &i, %str( ));
%if %nrbquote(&v) ne %then %do;
array _&v[&n];
%let mcmcarray = &mcmcarray array _&v[&n] _&v.1 - _&v.&n%str(;);
do i = 1 to stop;
set myelomaM(keep=&v) point=i;
_&v[i] = &v;
end;
%end;
%else %let i = 32001;
%end;
%mend;
data z;
set myelomaM;
%array(logbun hgb platelet);
drop vstatus logbun hgb platelet count stop;
run;
data myelomaM;
merge myelomaM z; by descending time;
run;
The data set MyelomaM contains 65 observations and 209 variables. For each observation, you see added
variables stop, _logbun1 through _logbun65, _hgb1 through _hgb65, and _platelet1 through _platelet65.
The variable stop indicates the number of observations that are in the risk set of the current observation.
The rest are transposed values of model covariates of the entire data set. The data set contains a number
of missing values. This is due to the fact that only the relevant observations are kept, such as _logbun1 to
_logbunstop. The rest of the cells are filled in with missing values. For example, the first observation has a
unique survival time of 92 and stop is 1, making it a risk set of itself. You see nonmissing values only in
_logbun1, _hgb1, and _platelet1.
5856 F Chapter 74: The MCMC Procedure
The following statements fit the Cox model by using PROC MCMC:
proc mcmc data=myelomaM outpost=outi nmc=50000 ntu=3000 seed=17
missing=ac;
ods select PostSumInt;
array beta[6];
&mcmcarray
parms (beta:) 0;
prior beta: ~ normal(0, prec=1e-6);
b = (beta1 + beta2 * time) * logbun +
(beta3 + beta4 * time) * hgb +
(beta5 + beta6 * time) * platelet;
S = 0;
do i = 1 to stop;
S = S + exp( (beta1 + beta2 * time) * _logbun[i] +
(beta3 + beta4 * time) * _hgb[i] +
(beta5 + beta6 * time) * _platelet[i]);
end;
loglike = vstatus * (b - log(S));
model general(loglike);
run;
Note that the option MISSING= is set to AC. This is due to missing cells in the input data set. You must use
this option so that PROC MCMC retains observations that contain missing values.
The macro variable &mcmcarray is defined in the earlier part in this example. You can use a %put statement
to print its value:
%put &mcmcarray;
This statement prints the following:
array _logbun[65] _logbun1 - _logbun65; array _hgb[65] _hgb1 - _hgb65; array
_platelet[65] _platelet1 - _platelet65;
The macro uses the ARRAY statement to allocate three arrays, each of which links their corresponding data
set variables. This makes it easier to reference these data set variables in the program. The PARMS statement
puts all the parameters in the same block. The PRIOR statement gives them normal priors with large variance.
The symbol b is the regression term, and S is cumulatively added from 1 to stop for each observation in
the DO loop. The symbol loglike completes the construction of log likelihood for each observation and the
MODEL statement completes the model specification.
Posterior summary and interval statistics are shown in Output 74.15.1.
Example 74.15: Time Dependent Cox Model F 5857
Output 74.15.1 Summary Statistics on Cox Model with Time Dependent Explanatory Variables and Ties in
the Survival Time, Using PROC MCMC
Cox Model with Time Dependent Covariates
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta1
50000
3.2397
0.8226 1.6664
4.8752
beta2
50000
-0.1411
beta3
50000
-0.0369
beta4
50000 -0.00409
beta5
50000
0.3548
0.7359 -1.0394
1.8100
beta6
50000
-0.0417
0.0359 -0.1122
0.0269
0.0471 -0.2294 -0.0458
0.1017 -0.2272
0.1685
0.00360 -0.0112 0.00264
You can also use the option JOINTMODEL to get the same inference and avoid transposing the data for
every observation:
proc mcmc data=myelomaM outpost=outa nmc=50000 ntu=3000 seed=17 jointmodel;
ods select none;
array beta[6];
array timeA[&n];
array vstatusA[&n];
array logbunA[&n]; array hgbA[&n];
array plateletA[&n];
array stopA[&n];
array bZ[&n];
array S[&n];
begincnst;
timeA[ind]=time;
logbunA[ind]=logbun;
plateletA[ind]=platelet;
endcnst;
vstatusA[ind]=vstatus;
hgbA[ind]=hgb;
stopA[ind]=stop;
parms (beta:) 0;
prior beta: ~ normal(0, prec=1e-6);
jl = 0;
do i = 1 to &n;
v1 = beta1 +
v2 = beta3 +
v3 = beta5 +
bZ[i] = v1 *
beta2 * timeA[i];
beta4 * timeA[i];
beta6 * timeA[i];
logbunA[i] + v2 * hgbA[i] + v3 * plateletA[i];
/* sum over risk set without considering ties in time. */
S[i] = exp(bZ[i]);
if (i > 1) then do;
do j = 1 to (i-1);
b1 = v1 * logbunA[j] + v2 * hgbA[j] + v3 * plateletA[j];
S[i] = S[i] + exp(b1);
end;
end;
end;
5858 F Chapter 74: The MCMC Procedure
/* make correction to the risk set due to ties in time. */
do i = 1 to &n;
if(stopA[i] > i) then do;
v1 = beta1 + beta2 * timeA[i];
v2 = beta3 + beta4 * timeA[i];
v3 = beta5 + beta6 * timeA[i];
do j = (i+1) to stopA[i];
b1 = v1 * logbunA[j] + v2 * hgbA[j] + v3 * plateletA[j];
S[i] = S[i] + exp(b1);
end;
end;
jl = jl + vstatusA[i] * (bZ[i] - log(S[i]));
end;
model general(jl);
run;
The multiple ARRAY statements allocate array symbols that are used to store the parameters (beta), the
response (timeA), the covariates (vstatusA, logbunA, hgbA, plateletA, and stopA), and work space (bZ and
S). The bZ and S arrays store the regression term and the risk set term for every observation. Programming
statements in the BEGINCNST and ENDCNST statements input the response and covariates from the data
set to the arrays.
Using the same technique shown in the example “Example 74.14: Time Independent Cox Model” on
page 5847, the next DO loop calculates the regression term and corresponding S for every observation,
pretending that there are no ties in time. This means that the risk set for observation i involves only observation
1 to i. The correction terms are added to the corresponding S[i] in the second DO loop, conditional on whether
the stop variable is greater than the observation count itself. The symbol jl cumulatively adds the log
likelihood for the entire data set, and the MODEL statement specifies the joint log-likelihood function.
The following statements run PROC COMPARE and show that the output data set outa contains identical
posterior samples as outi:
proc compare data=outi compare=outa;
ods select comparesummary;
var beta1-beta6;
run;
The results are not shown here.
The following statements use PROC PHREG to fit the same time dependent Cox model:
proc phreg data=Myeloma;
ods select PostSumInt;
model Time*VStatus(0)=LogBUN z2 hgb z3 platelet z4;
z2 = Time*logbun;
z3 = Time*hgb;
z4 = Time*platelet;
bayes seed=1 nmc=10000 outpost=phout;
run;
Coding is simpler than PROC MCMC. See Output 74.15.2 for posterior summary and interval statistics:
Example 74.16: Piecewise Exponential Frailty Model F 5859
Output 74.15.2 Summary Statistics on Cox Model with Time Dependent Explanatory Variables and Ties in
the Survival Time, Using PROC PHREG
Cox Model with Time Dependent Covariates
The PHREG Procedure
Bayesian Analysis
Posterior Summaries and Intervals
Parameter
Standard
Mean Deviation
N
95%
HPD Interval
LogBUN
10000
3.2423
z2
10000
-0.1401
0.8311 1.5925
0.0482 -0.2354 -0.0492
4.8582
HGB
10000
-0.0382
0.1009 -0.2331
z3
10000 -0.00407
Platelet
10000
0.3778
0.7524 -1.1342
1.7968
z4
10000
-0.0419
0.0364 -0.1142
0.0274
0.1603
0.00363 -0.0109 0.00322
Example 74.16: Piecewise Exponential Frailty Model
This example illustrates how to fit a piecewise exponential frailty model using PROC MCMC. Part of the
notation and presentation in this example follows Clayton (1991) and the Luek example in Spiegelhalter et al.
(1996a).
Generally speaking, the proportional hazards model assumes the hazard function,
˚
i .tjzi / D 0 .t / exp ˇ 0 zi
where i D 1; : : : ; n indexes subject, 0 .t / is the baseline hazard function, and zi are the covariates for subject
i. If you define Ni .t / to be the number of observed failures of the ith subject up to time t, then the hazard
function for the ith subject can be seen as a special case of a multiplicative intensity model (Clayton 1991).
The intensity process for Ni .t / becomes
Ii .t/ D Yi .t /0 .t / exp.ˇ 0 zi /
where Yi .t/ indicates observation of the subject at time t (taking the value of 1 if the subject is observed and
0 otherwise). Under noninformative censoring, the corresponding likelihood is proportional to
n
Y
3dNi .t /
2
Y
4
i D1
Ii .t /5
Z
t 0
Ii .t /dt
exp
t 0
where dNi .t / is the increment of Ni .t / over the small time interval Œt; t C dt /: it takes a value of 1 if the
subject i fails in the time interval, 0 otherwise. This is a Poisson kernel with the random variable being the
increments of dNi and the means Ii .t /dt
dNi .t/ Poisson.Ii .t /dt /
where
Ii .t/dt D Yi .t / exp.ˇ 0 z/dƒ0 .t /
5860 F Chapter 74: The MCMC Procedure
and
Z
ƒ0 .t/ D
t
0 .u/du:
0
The integral is the increment in the integrated baseline hazard function that occurs during the time interval
Œt; t C dt/.
This formulation provides an alternative way to fit a piecewise exponential model. You partition the time axis
to a few intervals, where each interval has its own hazard rate, ƒ0 .t /. You count the Yi .t / and dNi .t / in
each interval, and fit a Poisson model to each count.
The following DATA step creates the data set Blind (Lin 1994) that represents 197 diabetic patients who have
a high risk of experiencing blindness in both eyes as defined by DRS criteria:
title 'Piecewise Exponential Model';
data Blind;
input ID Time Status DiabeticType Treatment
datalines;
5 46.23 0 1 1
5 46.23 0 1 0
14 42.50 0
16 42.27 0 0 1
16 42.27 0 0 0
25 20.60 0
29 38.77 0 0 1
29 0.30 1 0 0
46 65.23 0
49 63.50 0 0 1
49 10.80 1 0 0
56 23.17 0
@@;
0
0
0
0
1
1
1
1
14
25
46
56
31.30
20.60
54.27
23.17
1
0
1
0
0
0
0
0
0
0
0
0
... more lines ...
1705 8.00 0 0 1 1705 8.00 0 0 0 1717 51.60 0 1 1 1717 42.33 1 1 0
1727 49.97 0 1 1 1727 2.90 1 1 0 1746 45.90 0 0 1 1746 1.43 1 0 0
1749 41.93 0 1 1 1749 41.93 0 1 0
;
One eye of each patient is treated with laser photocoagulation. The hypothesis of interest is whether the laser
treatment delays the occurrence of blindness. The following variables are included in Blind:
ID, patient’s identification
Time, failure time
Status, event indicator (0=censored and 1=uncensored)
Treatment, treatment received (1=laser photocoagulation and 0=otherwise)
DiabeticType, type of diabetes (0=juvenile onset with age of onset at 20 or under, and 1= adult onset
with age of onset over 20)
For illustrational purposes, a piecewise exponential model that ignores the patient-level frailties is first fit
to the entire data set. The formulation of the Poisson counting process makes it straightforward to add the
frailty terms, as it is demonstrated later.
The following statements create a partition (of length 8) along the time axis, with s0 < s1 < s2 < < sJ ,
with s0 D 0:1 < yi and sJ D 80 > yi for all i. The time intervals are stored in the Partition data set:
Example 74.16: Piecewise Exponential Frailty Model F 5861
data partition;
input int_1-int_9;
datalines;
0.1 6.545 13.95 26.47
;
38.8
45.88
54.35
62 80
To obtain reasonable estimates, placing an equal number of observations in each interval is recommended.
You can find the partition points by calculating the percentile statistics of the time variable (for example, by
using the UNIVARIATE procedure).
The following regression model and prior distributions are used in the analysis:
ˇ 0 zi
D ˇ1 treatment C ˇ2 diabetictype C ˇ3 treatment * diabetictype
ˇ1 ; ˇ2 ; ˇ3 normal.0; var D 1e6/
j
gamma.shape D 0:01; iscale D 0:01/ for j D 1; : : : ; 8
The following statements calculate Yi .t / for each observation i, at every time point t in the Partition data set.
The statements also find the observed failure time interval, dNi .t /, for each observation:
%let n = 8;
data _a;
set blind;
if _n_ eq 1 then set partition;
array int[*] int_:;
array Y[&n];
array dN[&n];
do k = 1 to (dim(int)-1);
Y[k] = (time - int[k] + 0.001 >= 0);
dN[k] = Y[k] * ( int[k+1] - time - 0.001 >= 0) * status;
end;
output;
drop int_: k;
run;
The DATA step reads in the Blind data set. At the first observation, it also reads in the Partition data set.
The first ARRAY statement creates the int array and name the elements int_:. Because the names match the
variable names in the Partition data set, all values of the int_: variables (there is only one observation) in the
Partition data set are therefore stored in the int array. The next two ARRAY statements create arrays Y and
dN, each with length 8. They store values of Yi .t / and dNi .t /, resulting from each failure time in the Blind
data set.
The following statements print the first 10 observations of the constructed data set _a and display them in
Output 74.16.1:
proc print data=_a(obs=10);
run;
5862 F Chapter 74: The MCMC Procedure
Output 74.16.1 First 10 Observations of the Data Set _a
Piecewise Exponential Model
Obs ID Time Status DiabeticType Treatment Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 dN1 dN2 dN3 dN4 dN5 dN6 dN7 dN8
1
5 46.23
0
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
2
5 46.23
0
1
0
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
3 14 42.50
0
0
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
4 14 31.30
1
0
0
1
1
1
1
0
0
0
0
0
0
0
1
0
0
0
0
5 16 42.27
0
0
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
6 16 42.27
0
0
0
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
7 25 20.60
0
0
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
8 25 20.60
0
0
0
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
9 29 38.77
0
0
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
10 29
0.30
The first subject in _a experienced blindness in the left eye at time 46.23, and the time falls in the sixth
interval as defined in the Partition data set. Therefore, Y1 through Y6 all take a value of 1, and Y7 and Y8 are
0. The variable dN# takes on a value of 1 if the subject is observed to go blind in that interval. Since the
first observation is censored (status == 1), the actual failure time is unknown. Hence all dN# are 0. The first
observed failure time occurs in observation number 4 (the right eye of the second subject), where the time
variable takes a value of 31.30, Y1 through Y4 are 1, and dN4 is 1.
Note that each observation in the _a data set has 8 Y and 8 dN, meaning that you would need eight MODEL
statements in a PROC MCMC call, each for a Poisson likelihood. Alternatively, you can expand _a, put one
Y and one dN in every observation, and fit the data using a single MODEL statement in PROC MCMC. The
following statements expand the data set _a and save the results in the data set _b:
data _b;
set _a;
array y[*] y:;
array dn[*] dn:;
do i = 1 to (dim(y));
y_val
= y[i];
dn_val
= dn[i];
int_index
= i;
output;
end;
keep y_: dn_: diabetictype treatment int_index id;
run;
data _b;
set _b;
rename y_val=Y dn_val=dN;
run;
You can use the following PROC PRINT statements to see the first few observations in _b:
proc print data=_b(obs=10);
run;
Example 74.16: Piecewise Exponential Frailty Model F 5863
Output 74.16.2 First 20 Observations of the Data Set _b
Obs ID DiabeticType Treatment Y dN int_index
1 5
1
1 1
0
1
2 5
1
1 1
0
2
3 5
1
1 1
0
3
4 5
1
1 1
0
4
5 5
1
1 1
0
5
6 5
1
1 1
0
6
7 5
1
1 0
0
7
8 5
1
1 0
0
8
9 5
1
0 1
0
1
10 5
1
0 1
0
2
The data set _b now contains 3,152 observations (see Output 74.16.2 for the first few observations). The
Time and Status variables are no longer needed; hence they are discarded from the data set. The int_index
variable is an index variable that indicates interval membership of each observation.
Because the variable Y does not contribute to the likelihood calculation when it takes a value of 0 (it
amounts to a Poisson likelihood that has a mean and response variable that are both 0), you can remove these
observations. This speeds up the calculation in PROC MCMC:
data inputdata;
set _b;
if Y > 0;
run;
The data set Inputdata has 1,775 observations, as opposed to 3,152 observations in _b. The following
statements fit a piecewise exponential model in PROC MCMC:
proc mcmc data=inputdata nmc=10000 outpost=postout seed=12351
maxtune=5;
ods select PostSumInt ESS;
parms beta1-beta3 0;
prior beta: ~ normal(0, var = 1e6);
random lambda ~ gamma(0.01, iscale = 0.01) subject=int_index;
bZ = beta1*treatment + beta2*diabetictype + beta3*treatment*diabetictype;
idt = exp(bz) * lambda;
model dN ~ poisson(idt);
run;
The PARMS statement declares three regression parameters, beta1–beta3. The PRIOR statement specifies a
noninformative normal prior on the regression coefficients. The RANDOM statement specifies the random
effect, lambda, its prior distribution, and interval membership which is indexed by the data set variable
int_index.
The symbol bZ calculates the regression mean, and the symbol idt is the mean of the Poisson likelihood. It
corresponds to the equation
Ii .t/dt D Yi .t / exp.ˇ 0 z/dƒ0 .t /
Note that the Yi .t / term is omitted in the assignment statement because Y takes only the value of 1 in the
input data set.
5864 F Chapter 74: The MCMC Procedure
Output 74.16.3 displays posterior estimates of the three regression parameters.
Output 74.16.3 Posterior Summary Statistics
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta1
10000 -0.4174
0.2129 -0.8121 0.0203
beta2
10000 0.3138
0.1956 -0.0885 0.6958
beta3
10000 -0.7899
0.3308 -1.4300 -0.1046
To understand the results, you can create a 2 2 table (Table 74.52) and plug in the posterior mean estimates
to the regression model. A –0.41 estimate for subjects who received laser treatment and had juvenile diabetes
suggests that the laser treatment is effective in delaying blindness. And the effect is much more pronounced
(–0.80) for adult subjects who have diabetes and received treatment.
Table 74.52
Estimates of Regression Effects in the Survival Model
ˇO 0 Z
Treatment
0
1
Diabetic Type
0
1
0
0.32
–0.41 –0.80
You can also use the macro %CATER (“Caterpillar Plot” on page 5742) to draw a caterpillar plot to visualize
the eight hazards in the model:
%cater(data=postout, var=lambda_:);
Example 74.16: Piecewise Exponential Frailty Model F 5865
Output 74.16.4 Caterpillar Plot of the Hazards in the Piecewise Exponential Model
The fitted hazards show a nonconstant underlying hazard function (read along the y-axis as lambda_# are
hazards along the time-axis) in the model.
Now suppose you want to include patient-level information and fit a frailty model to the blind data set, where
the random effect enters the model through the regression term, where the subject is indexed by the variable
ID in the data.
ˇ 0 zi
D ˇ1 treatment C ˇ2 diabetictype C ˇ3 treatment * diabetictype C uid
uid
normal.0; var D 2 /
2 igamma.shape D 0:01; scale D 0:01/
where id indexes patient.
The actual coding in PROC MCMC of a piecewise exponential frailty model is rather straightforward:
ods select none;
proc mcmc data=inputdata nmc=10000 outpost=postout seed=12351
stats=summary diag=none;
parms beta1-beta3 0 s2;
prior beta: ~ normal(0, var = 1e6);
prior s2 ~ igamma(0.01, scale=0.01);
random lambda ~ gamma(0.01, iscale = 0.01) subject=int_index;
random u ~ normal(0, var=s2) subject=id;
bZ = beta1*treatment + beta2*diabetictype + beta3*treatment*diabetictype + u;
idt = exp(bZ) * lambda;
model dN ~ poisson(idt);
run;
5866 F Chapter 74: The MCMC Procedure
A second RANDOM statement defines a subject-level random effect u, and the random-effects parameters
enter the model in the term for the regression mean, bZ. An additional model parameter, s2, the variance of
the random-effects parameters, is needed for the model. The results are not shown here.
Example 74.17: Normal Regression with Interval Censoring
You can use PROC MCMC to fit failure time data that can be right, left, or interval censored. To illustrate, a
normal regression model is used in this example.
Assume that you have the following simple regression model with no covariates:
y D C where y is a vector of response values (the failure times), is the grand mean, is an unknown scale
parameter, and are errors from the standard normal distribution. Instead of observing yi directly, you only
observe a truncated value ti . If the true yi occurs after the censored time ti , it is called right censoring. If yi
occurs before the censored time, it is called left censoring. A failure time yi can be censored at both ends,
and this is called interval censoring. The likelihood for yi is as follows:
8
.yi j; /
if yi is uncensored
ˆ
ˆ
<
S.tl;i j/
if yi is right censored by tl;i
p.yi j/ D
1
S.t
j/
if yi is left censored by tr;i
ˆ
r;i
ˆ
:
S.tl;i j/ S.tr;i j/ if yi is interval censored by tl;i and tr;i
where S./ is the survival function, S.t / D Pr.T > t /.
Gentleman and Geyer (1994) uses the following data on cosmetic deterioration for early breast cancer patients
treated with radiotherapy:
title 'Normal Regression with Interval Censoring';
data cosmetic;
label tl = 'Time to Event (Months)';
input tl tr @@;
datalines;
45 .
6 10
. 7 46 . 46 .
7 16 17 .
7 14
37 44
. 8
4 11 15 . 11 15 22 . 46 . 46 .
25 37 46 . 26 40 46 . 27 34 36 44 46 . 36 48
37 . 40 . 17 25 46 . 11 18 38 .
5 12 37 .
. 5 18 . 24 . 36 .
5 11 19 35 17 25 24 .
32 . 33 . 19 26 37 . 34 . 36 .
;
The data consist of time interval endpoints (in months). Nonmissing equal endpoints (tl = tr) indicates
noncensoring; a nonmissing lower endpoint (tl ¤ .) and a missing upper endpoint (tr = .) indicates right
censoring; a missing lower endpoint (tl = .) and a nonmissing upper endpoint (tr ¤ .) indicates left censoring;
and nonmissing unequal endpoints (tl ¤ tr) indicates interval censoring.
With this data set, you can consider using proper but diffuse priors on both and , for example:
normal.0; sd D 1000/
gamma.0:001; iscale D 0:001/
The following SAS statements fit an interval censoring model and generate Output 74.17.1:
Example 74.18: Constrained Analysis F 5867
proc mcmc data=cosmetic outpost=postout seed=1 nmc=20000 missing=AC;
ods select PostSumInt;
parms mu 60 sigma 50;
prior mu ~ normal(0, sd=1000);
prior sigma ~ gamma(shape=0.001,iscale=0.001);
if (tl^=. and tr^=. and tl=tr) then
llike = logpdf('normal',tr,mu,sigma);
else if (tl^=. and tr=.) then
llike = logsdf('normal',tl,mu,sigma);
else if (tl=. and tr^=.) then
llike = logcdf('normal',tr,mu,sigma);
else
llike = log(sdf('normal',tl,mu,sigma) sdf('normal',tr,mu,sigma));
model general(llike);
run;
Because there are missing cells in the input data, you want to use the MISSING=AC option so that PROC
MCMC does not delete any observations that contain missing values. The IF-ELSE statements distinguish
different censoring cases for yi , according to the likelihood. The SAS functions LOGCDF, LOGSDF,
LOGPDF, and SDF are useful here. The MODEL statement assigns llike as the log likelihood to the response.
The Markov chain appears to have converged in this example (evidence not shown here), and the posterior
estimates are shown in Output 74.17.1.
Output 74.17.1 Interval Censoring
Normal Regression with Interval Censoring
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
mu
20000 41.7807
5.7882 31.3604 53.6115
sigma
20000 29.1122
6.0503 19.4041 41.6742
Example 74.18: Constrained Analysis
Conjoint analysis uses regression techniques to model consumer preferences and to estimate consumer utility
functions. A problem with conventional conjoint analysis is that sometimes your estimated utilities do not
make sense. Your results might suggest, for example, that the consumers would prefer to spend more on
a product than to spend less. With PROC MCMC, you can specify constraints on the part-worth utilities
(parameter estimates). Suppose that the consumer product being analyzed is an off-road motorcycle. The
relevant attributes are how large each motorcycle is (less than 300cc, 301–550cc, and more than 551cc),
how much it costs (less than $5000, $5001–$6000, $6001–$7000, and more than $7000), whether or not it
has an electric starter, whether or not the engine is counter-balanced, and whether the bike is from Japan or
5868 F Chapter 74: The MCMC Procedure
Europe. The preference variable is a ranking of the bikes. You could perform an ordinary conjoint analysis
with PROC TRANSREG (see Chapter 119, “The TRANSREG Procedure”) as follows:
options validvarname=any;
proc format;
value sizef 1 = '< 300cc' 2 = '300-550cc' 3 = '> 551cc';
value pricef 1 = '< $5000' 2 = '$5000 - $6000'
3 = '$6001 - $7000' 4 = '> $7000';
value startf 1 = 'Electric Start' 2 = 'Kick Start';
value balf
1 = 'Counter Balanced' 2 = 'Unbalanced';
value orif
1 = 'Japanese' 2 = 'European';
run;
data bikes;
input Size Price Start Balance Origin Rank @@;
format size sizef. price pricef. start startf.
balance balf. origin orif.;
datalines;
2 1 2 1 2 3 1 4 2 2 2 7 1 2 1 1 2 6
3 3 1 1 2 1 1 3 2 1 1 5 3 4 2 2 2 12
2 3 2 2 1 9 1 1 1 2 1 8 2 2 1 2 2 10
2 4 1 1 1 4 3 1 1 2 1 11 3 2 2 1 1 2
;
title 'Ordinary Conjoint Analysis by PROC TRANSREG';
proc transreg data=bikes utilities cprefix=0 lprefix=0;
ods select Utilities;
model identity(rank / reflect) =
class(size price start balance origin / zero=sum);
output out=coded(drop=intercept) replace;
run;
The DATA step reads the experimental design and dependent variable Rank and assigns formats to label the
factor levels. PROC TRANSREG is run specifying UTILITIES, which requests a conjoint analysis. The rank
variable is reflected around its mean (1 ! 12, 2 ! 11, . . . , 12 ! 1) so that in the analysis, larger part-worth
utilities correspond to higher preference. The OUT=CODED data set contains the reflected ranks and a
binary coding of the factors that can be used in other analyses. See Kuhfeld (2010) for more information
about conjoint analysis and coding with PROC TRANSREG.
The Utilities table from the conjoint analysis is shown in Output 74.18.1. Notice the part-worth utilities
for price. The part-worth utility for < $5000 is 0.25. As price increases to the $5000–$6000 range, utility
decreases to –0.5. Then as price increases to the $6001–$7000 range, part-worth utility increases to 0.5.
Finally, for the most expensive bikes, utility decreases again to –0.25. In cases like this, you might want to
impose constraints on the solution so that the part-worth utility for price never increases as prices go up.
Example 74.18: Constrained Analysis F 5869
Output 74.18.1 Ordinary Conjoint Analysis by PROC TRANSREG
Ordinary Conjoint Analysis by PROC TRANSREG
The TRANSREG Procedure
Utilities Table Based on the Usual Degrees of Freedom
Importance
(% Utility
Range) Variable
Label
Standard
Utility
Error
Intercept
6.5000
0.95743
Intercept
< 300cc
-0.0000
1.35401
0.000 Class.< 300cc
300-550cc
-0.0000
1.35401
Class.300-550cc
0.0000
1.35401
Class.> 551cc
13.333 Class.< $5000
> 551cc
< $5000
0.2500
1.75891
$5000 - $6000
-0.5000
1.75891
Class.$5000 - $6000
$6001 - $7000
0.5000
1.75891
Class.$6001 - $7000
> $7000
-0.2500
1.75891
Class.> $7000
Electric Start
-0.1250
1.01550
0.1250
1.01550
Class.Kick Start
Counter Balanced 3.0000
1.01550
80.000 Class.Counter Balanced
Unbalanced
-3.0000
1.01550
Japanese
-0.1250
1.01550
3.333 Class.Japanese
European
0.1250
1.01550
Class.European
Kick Start
3.333 Class.Electric Start
Class.Unbalanced
You could run PROC TRANSREG again, specifying monotonicity constraints on the part-worth utilities for
price:
title 'Constrained Conjoint Analysis by PROC TRANSREG';
proc transreg data=bikes utilities cprefix=0 lprefix=0;
ods select ConservUtilities;
model identity(rank / reflect) =
monotone(price / tstandard=center)
class(size start balance origin / zero=sum);
run;
The output from this PROC TRANSREG step is shown in Output 74.18.2.
5870 F Chapter 74: The MCMC Procedure
Output 74.18.2 Constrained Conjoint Analysis by PROC TRANSREG
Constrained Conjoint Analysis by PROC TRANSREG
The TRANSREG Procedure
Utilities Table Based on Conservative Degrees of Freedom
Label
Standard
Utility
Error
Intercept
6.5000
0.97658
Price
-0.1581
.
< $5000
0.2500
.
$5000 - $6000
0.0000
.
$6001 - $7000
0.0000
.
> $7000
-0.2500
.
< 300cc
-0.0000
1.38109
0.0000
1.38109
300-550cc
> 551cc
Electric Start
Kick Start
0.0000
1.38109
-0.2083
1.00663
Importance
(% Utility
Range) Variable
Intercept
7.143 Monotone(Price)
0.000 Class.< 300cc
Class.300-550cc
Class.> 551cc
5.952 Class.Electric Start
0.2083
1.00663
Class.Kick Start
Counter Balanced 3.0000
0.97658
85.714 Class.Counter Balanced
Unbalanced
-3.0000
0.97658
Japanese
-0.0417
1.00663
1.190 Class.Japanese
European
0.0417
1.00663
Class.European
Class.Unbalanced
This monotonicity constraint is one of the few constraints on the part-worth utilities that you can specify in
PROC TRANSREG. In contrast, PROC MCMC enables you to specify any constraint that can be written in
the DATA step language. You can perform the restricted conjoint analysis with PROC MCMC by using the
coded factors that were output from PROC TRANSREG. The data set is Coded.
The likelihood is a simple regression model:
ranki normal.x0i ˇ; /
where rank is the response, the covariates are ‘< 300cc’n, ‘300-500cc’n, ‘< $5000’n, ‘$5000 - $6000’n,
‘$6001 - $7000’n, ‘Electric Start’n, ‘Counter Balanced’n, and Japanese. Note that OPTIONS VALIDVARNAME=ANY enables PROC TRANSREG to create names for the coded variables with blanks and special
characters. That is why the name-literal notation (‘variable-name’n) is used for the input data set variables.
Suppose that there are two constraints you want to put on some of the parameters: one is that the parameters
for ‘< $5000’n, ‘$5000 - $6000’n, and ‘$6001 - $7000’n decrease in order, and the other is that the parameter
for ‘Counter Balanced’n is strictly positive. You can consider a truncated multivariate normal prior as follows:
.ˇ‘< $5000’n ; ˇ‘$5000 - $6000’n ; ˇ‘$6001 - $7000’n ; ˇ‘Counter Balanced’n / MVN.0; I/
with the following set of constraints:
ˇ‘< $5000’n > ˇ‘$5000 - $6000’n > ˇ‘$6001 - $7000’n > 0
ˇ‘Counter Balanced’n > 0
Example 74.18: Constrained Analysis F 5871
The condition that ˇ‘$6001 - $7000’n > 0 reflects an implied constraint that, by definition, 0 is the utility for the
highest price range, > $7000, which is the reference level for the binary coded price variable. The following
statements fit the desired model:
title 'Bayesian Constrained Conjoint Analysis by PROC MCMC';
proc mcmc data=coded outpost=bikesout ntu=3000 nmc=50000
propcov=quanew seed=448 diag=none;
ods select PostSumInt;
array pw[4] pw5000 pw5000_6000 pw6001_7000 pwCounterBalanced;
array sigma[4,4];
array mu[4];
begincnst;
call identity(sigma);
call mult(sigma, 100, sigma);
call zeromatrix(mu);
endcnst;
parms intercept pw300cc pw300_550cc pwElectricStart pwJapanese tau 1;
parms pw5000 0.3 pw5000_6000 0.2 pw6001_7000 0.1 pwCounterBalanced 1;
beginnodata;
prior intercept pw300: pwE: pwJ: ~ normal(0, var=100);
if (pw5000
>= pw5000_6000 & pw5000_6000 >= pw6001_7000 &
pw6001_7000 >= 0
& pwCounterBalanced > 0) then
lp = lpdfmvn(pw, mu, sigma);
else
lp = .;
prior pw5000 pw5000_6000 pw6001_7000 pwC: ~ general(lp);
prior tau ~ gamma(0.01, iscale=0.01);
endnodata;
mean = intercept +
pw300cc
* '< 300cc'n
pw300_550cc
* '300-550cc'n
pw5000
* '< $5000'n
pw5000_6000
* '$5000 - $6000'n
pw6001_7000
* '$6001 - $7000'n
pwElectricStart
* 'Electric Start'n
pwCounterBalanced * 'Counter Balanced'n
pwJapanese
* Japanese;
model rank ~ normal(mean, prec=tau);
run;
+
+
+
+
+
+
+
The two ARRAY statements allocate a 4 4 dimensional array for the prior covariance and an array of size 4
for the prior means. In the BEGINCNST and ENDCNST statements, the CALL IDENTITY function sets
sigma to be an identity matrix; the CALL MULT function sets sigma’s diagonal elements to be 100 (the
diagonal variance terms); and the CALL ZEROMATRIX function sets mu to be a vector of zeros (the prior
means). For matrix functions in PROC MCMC, see the section “Matrix Functions in PROC MCMC” on
page 5723.
There are two PARMS statements, with each of them naming a block of parameters. The first PARMS
statement blocks the following: the intercept, the two size parameters, the one start-type parameter, the one
origin parameter, and the precision. The second PARMS statement blocks the three price parameters and the
5872 F Chapter 74: The MCMC Procedure
one balance parameter, parameters that have the constraint multivariate normal prior. The second PARMS
statement also specifies initial values for the parameter estimates. The initial values reflect the constraints on
these parameters. The initial part-worth utilities all decrease from 0.3 to 0.2 to 0.1 to 0.0 (for the implicit
reference level) as the prices increase. Also, the initial part-worth utility for the counter-balanced engine is
set to a positive value, 1.
In the PRIOR statements, regression coefficients without constraints are given an independent normal prior
with mean at 0 and variance of 100. The next IF-ELSE construction imposes the constraints. When these
constraints are met, pw5000, pw5000_6000, pw6001_7000, pwCounterBalanced are jointly distributed as a
multivariate normal prior with mean mu and covariance sigma. Otherwise, the prior is not defined and lp
is assigned a missing value. The parameter tau is given a gamma prior, which is a conjugate prior for that
parameter.
The model specification is linear. The mean is comprised of an intercept and the sum of terms like pw300cc *
‘< 300cc’n, which is a parameter times an input data set variable. The MODEL statement specifies that the
linear model for rank is normally distributed with mean mean and precision tau.
The MCMC results are shown in Output 74.18.3.
Output 74.18.3 MCMC Results
Bayesian Constrained Conjoint Analysis by PROC MCMC
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
intercept
50000
2.2570
2.5131
-2.9083 7.1760
pw300cc
50000 0.00983
2.4903
-4.8014 5.3161
pw300_550cc
50000
0.0549
2.5097
-5.1371 4.9766
pwElectricStart
50000 -1.1319
2.1195
-5.6257 2.9663
pwJapanese
50000 -0.4567
2.1232
-4.9020 3.6599
tau
50000
0.1135
0.0765
0.00885 0.2643
pw5000
50000
4.1614
2.1803
0.5751 8.3740
pw5000_6000
50000
2.6147
1.6188
0.0587 5.6001
pw6001_7000
50000
1.5040
1.2530 0.000104 3.9803
pwCounterBalanced 50000
5.8880
2.0638
1.7161 9.9558
The estimates of the part-worth utility for the price categories are ordered as expected. This agrees with the
intuition that there is a higher preference for a less expensive motor bike when all other things are equal, and
that is what you see when you look at the estimated posterior means for the price part-worths. The estimated
standard deviations of the price part-worths in this model are of approximately the same order of magnitude
as the posterior means. This indicates that the part-worth utilities for this subject are not significantly far from
each other, and that this subject’s ranking of the options was not significantly influenced by the difference in
price.
One advantage of Bayesian analysis is that you can incorporate prior information in the data analysis.
Constraints on the parameter space are one possible source of information that you might have before you
examine the data. This example shows that it can be accomplished in PROC MCMC.
Example 74.19: Implement a New Sampling Algorithm F 5873
Example 74.19: Implement a New Sampling Algorithm
This example illustrates using the UDS statement to implement a new Markov chain sampler. The algorithm
demonstrated here is proposed by Holmes and Held (2006), hereafter referred to as HH. They presented a
Gibbs sampling algorithm for generating draws from the posterior distribution of the parameters in a probit
regression model. The notation follows closely to HH.
The data used here is the remission data set from a PROC LOGISTIC example:
title 'Implement a New Sampling Algorithm';
data inputdata;
input remiss cell smear infil li blast temp;
ind = _n_;
cnst = 1;
label remiss='Complete Remission';
datalines;
1 0.8
0.83 0.66 1.9 1.1
0.996
... more lines ...
0
1
0.73
0.73
0.7
0.398
0.986
;
The variable remiss is the cancer remission indicator variable with a value of 1 for remission and a value of 0
for nonremission. There are six explanatory variables: cell, smear, infil, li, blast, and temp. These variables
are the risk factors thought to be related to cancer remission. The binary regression model is as follows:
remissi binary.pi /
where the covariates are linked to pi through a probit transformation:
probit.pi / D x0 ˇ
ˇ are the regression coefficients and x0 the explanatory variables. Suppose you want to use independent
normal priors on the regression coefficients:
ˇi normal.0; var D 25/
Fitting a probit model with PROC MCMC is straightforward. You can use the following statements:
proc mcmc data=inputdata nmc=100000 propcov=quanew seed=17
outpost=mcmcout;
ods select PostSumInt ess;
parms beta0-beta6;
prior beta: ~ normal(0,var=25);
mu = beta0 + beta1*cell + beta2*smear +
beta3*infil + beta4*li + beta5*blast + beta6*temp;
p = cdf('normal', mu, 0, 1);
model remiss ~ bern(p);
run;
5874 F Chapter 74: The MCMC Procedure
The expression mu is the regression mean, and the CDF function links mu to the probability of remission p in
the binary likelihood.
The summary statistics and effective sample sizes tables are shown in Output 74.19.1. There are high
autocorrelations among the posterior samples, and efficiency is relatively low. The correlation time is reduced
only after a large amount of thinning.
Output 74.19.1 Random Walk Metropolis
Implement a New Sampling Algorithm
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta0
100000 -2.0531
3.8299
-9.5480 5.4131
beta1
100000 2.6300
2.8270
-2.7934 8.2334
beta2
100000 -0.8426
3.2108
-7.0459 5.4269
beta3
100000 1.5933
3.5491
-5.5342 8.3307
beta4
100000 2.0390
0.8796
0.4133 3.8654
beta5
100000 -0.3184
0.9543
-2.1420 1.5567
beta6
100000 -3.2611
3.7806 -10.7053 4.1000
Implement a New Sampling Algorithm
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
beta0
4280.8
23.3602
0.0428
beta1
4496.5
22.2398
0.0450
beta2
3434.1
29.1199
0.0343
beta3
3856.6
25.9294
0.0386
beta4
3659.7
27.3245
0.0366
beta5
3229.9
30.9610
0.0323
beta6
4430.7
22.5696
0.0443
As an alternative to the random walk Metropolis, you can use the Gibbs algorithm to sample from the posterior
distribution. The Gibbs algorithm is described in the section “Gibbs Sampler” on page 131 in Chapter 7,
“Introduction to Bayesian Analysis Procedures.” While the Gibbs algorithm generally applies to a wide range
of statistical models, the actual implementation can be problem-specific. In this example, performing a
Gibbs sampler involves introducing a class of auxiliary variables (also known as latent variables). You first
reformulate the model by adding a zi for each observation in the data set:
1 if zi > 0
0 otherwise
yi
D
zi
D x0i ˇ C i
normal.0; 1/
ˇ .ˇ/
Example 74.19: Implement a New Sampling Algorithm F 5875
If ˇ has a normal prior, such as .ˇ/ D N.b; v/, you can work out a closed form solution to the full
conditional distribution of ˇ given the data and the latent variables zi . The full conditional distribution is also
a multivariate normal, due to the conjugacy of the problem. See the section “Conjugate Priors” on page 125
in Chapter 7, “Introduction to Bayesian Analysis Procedures.” The formula is shown here:
ˇjz; x normal.B; V/
B D V..v/
V D .v
1
1
b C x0 z/
C x0 x/
1
The advantage of creating the latent variables is that the full conditional distribution of z is also easy to work
with. The distribution is a truncated normal distribution:
normal.xi ˇ; 1/I.zi > 0/ if yi D 1
zi jˇ; xi ; yi normal.xi ˇ; 1/I.zi 0/ otherwise
You can sample ˇ and z iteratively, by drawing ˇ given z and vice verse. HH point out that a high degree
of correlation could exist between ˇ and z, and it makes this iterative way of sampling inefficient. As an
improvement, HH proposed an algorithm that samples ˇ and z jointly. At each iteration, you sample zi from
the posterior marginal distribution (this is the distribution that is conditional only on the data and not on any
parameters) and then sample ˇ from the same posterior full conditional distribution as described previously:
1. Sample zi from its posterior marginal distribution:
mi
normal.mi ; vi /I.zi > 0/ if yi D 1
normal.mi ; vi /I.zi 0/ otherwise
D xi B wi .zi xi B/
vi
D 1 C wi
wi
D hi =.1
hi
D .H/i i; H D xVx0
zi jz i ; yi
hi /
2. Sample ˇ from the same posterior full conditional distribution described previously.
For a detailed description of each of the conditional terms, refer to the original paper.
PROC MCMC cannot sample from the probit model by using this sampling scheme but you can implement
the algorithm by using the UDS statement. To sample zi from its marginal, you need a function that draws
random variables from a truncated normal distribution. The functions, RLTNORM and RRTNORM, generate
left- and right-truncated normal variates, respectively. The algorithm is taken from Robert (1995).
The functions are written in PROC FCMP (see the FCMP Procedure in the Base SAS Procedures Guide):
proc fcmp outlib=sasuser.funcs.uds;
/******************************************/
/* Generate left-truncated normal variate */
/******************************************/
function rltnorm(mu,sig,lwr);
if lwr<mu then do;
ans = lwr-1;
5876 F Chapter 74: The MCMC Procedure
do while(ans<lwr);
ans = rand('normal',mu,sig);
end;
end;
else do;
mul = (lwr-mu)/sig;
alpha = (mul + sqrt(mul**2 + 4))/2;
accept=0;
do while(accept=0);
z = mul + rand('exponential')/alpha;
lrho = -(z-alpha)**2/2;
u = rand('uniform');
lu = log(u);
if lu <= lrho then accept=1;
end;
ans = sig*z + mu;
end;
return(ans);
endsub;
/*******************************************/
/* Generate right-truncated normal variate */
/*******************************************/
function rrtnorm(mu,sig,uppr);
ans = 2*mu - rltnorm(mu,sig, 2*mu-uppr);
return(ans);
endsub;
run;
The function call to RLTNORM(mu,sig,lwr) generates a random number from the left-truncated normal
distribution:
normal.mu; sd D sig/I. > lwr/
Similarly, the function call to RRTNORM(mu,sig,uppr) generates a random number from the right-truncated
normal distribution:
normal.mu; sd D sig/I. < uppr/
These functions are used to generate the latent variables zi .
Using the algorithm A1 from the HH paper as an example, Output 74.53 lists a line-by-line implementation
with the PROC MCMC coding style. The table is broken into three portions: set up the constants, initialize
the parameters, and sample one draw from the posterior distribution. The left column of the table is identical
to the A1 algorithm stated in the appendix of HH. The right column of the table lists SAS statements.
Example 74.19: Implement a New Sampling Algorithm F 5877
Table 74.53 Holmes and Held (2006), algorithm A1.
Side-by-Side Comparison to SAS
Define Constants
1/ 1
In the BEGINCNST/ENDCNST Statements
call
call
call
call
call
transpose(x,xt); /* xt = transpose(x) */
mult(xt,x,xtx);
inv(v,v); /* v = inverse(v) */
addmatrix(xtx,v,xtx); /* xtx = xtx+v */
inv(xtx,v); /* v = inverse(xtx) */
V
.X 0 X C v
L
Chol.V /
call chol(v,L);
S
V X0
call mult(v,xt,S);
FOR j = 1 to n
H Œj 
X Œj; S Œ; j 
W Œj 
H Œj =.1 H Œj /
QŒj 
W Œj  C 1
END
call mult(x,S,HatMat);
do j=1 to &n;
H = HatMat[j,j];
W[j] = H/(1-H);
sQ[j] = sqrt(W[j] + 1); /* use s.d.
end;
Initial Values
In the BEGINCNST/ENDCNST Statements
Z normal.0; In /Ind.Y; Z/
do j=1 to &n;
if(y[j]=1) then
Z[j] = rltnorm(0,1,0);
else
Z[j] = rrtnorm(0,1,0);
end;
B
call mult(S,Z,B);
SZ
in SAS */
5878 F Chapter 74: The MCMC Procedure
Table 74.53 (continued)
Draw One Sample
Subroutine HH
do j=1 to &n;
zold = Z[j];
m = 0;
do k= 1 to &p;
m = m + X[j,k] * B[k];
end;
m = m - W[j]*(Z[j]-m);
FOR j = 1 to n
if (y[j]=1) then
zold
ZŒj 
Z[j] = rltnorm(m,sQ[j],0);
m
X Œj; B
else
m
m W Œj .ZŒj  m/
Z[j] = rrtnorm(m,sQ[j],0);
ZŒj  normal.m; QŒj /Ind.Y Œj ; ZŒj /
diff = Z[j] - zold;
B
B C .ZŒj  zold /S Œ; j 
do k= 1 to &p;
END
B[k] = B[k] + diff * S[k,j];
T normal.0; Ip /
end;
ˇŒ; i
B C LT
end;
do j = 1 to &p;
T[j] = rand(’normal’);
end;
call mult(L,T,T);
call addmatrix(B,T,beta);
The following statements define the subroutine HH (algorithm A1) in PROC FCMP and store it in library
sasuser.funcs.uds:
/* define the HH algorithm in PROC FCMP. */
%let n = 27;
%let p = 7;
options cmplib=sasuser.funcs;
proc fcmp outlib=sasuser.funcs.uds;
subroutine HH(beta[*],Z[*],B[*],x[*,*],y[*],W[*],sQ[*],S[*,*],L[*,*]);
outargs beta, Z, B;
array T[&p] / nosym;
do j=1 to &n;
zold = Z[j];
m = 0;
do k = 1 to &p;
m = m + X[j,k] * B[k];
end;
m = m - W[j]*(Z[j]-m);
if (y[j]=1) then
Z[j] = rltnorm(m,sQ[j],0);
else
Z[j] = rrtnorm(m,sQ[j],0);
diff = Z[j] - zold;
do k = 1 to &p;
Example 74.19: Implement a New Sampling Algorithm F 5879
B[k] = B[k] + diff * S[k,j];
end;
end;
do j=1 to &p;
T[j] = rand('normal');
end;
call mult(L,T,T);
call addmatrix(B,T,beta);
endsub;
run;
Note that one-dimensional array arguments take the form of name[*] and two-dimensional array arguments
take the form of name[*,*]. Three variables, beta, Z, and B, are OUTARGS variables, making them the only
arguments that can be modified in the subroutine. For the UDS statement to work, all OUTARGS variables
have to be model parameters. Technically, only beta and Z are model parameters, and B is not. The reason
that B is declared as an OUTARGS is because the array must be updated throughout the simulation, and
this is the only way to modify its values. The input array x contains all of the explanatory variables, and the
array y stores the response. The rest of the input arrays, W, sQ, S, and L, store constants as detailed in the
algorithm. The following statements illustrate how to fit a Bayesian probit model by using the HH algorithm:
options cmplib=sasuser.funcs;
proc mcmc data=inputdata nmc=5000 monitor=(beta) outpost=hhout;
ods select PostSumInt ess;
array xtx[&p,&p];
/* work space
array xt[&p,&n];
/* work space
array v[&p,&p];
/* work space
array HatMat[&n,&n];
/* work space
array S[&p,&n];
/* V * Xt
array W[&n];
array y[1]/ nosymbols; /* y stores the response variable
array x[1]/ nosymbols; /* x stores the explanatory variables
array sQ[&n];
/* sqrt of the diagonal elements of Q
array B[&p];
/* conditional mean of beta
array L[&p,&p];
/* Cholesky decomp of conditional cov
array Z[&n];
/* latent variables Z
array beta[&p] beta0-beta6;
/* regression coefficients
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
*/
begincnst;
call streaminit(83101);
if ind=1 then do;
rc = read_array("inputdata", x, "cnst", "cell", "smear", "infil",
"li", "blast", "temp");
rc = read_array("inputdata", y, "remiss");
call identity(v);
call mult(v, 25, v);
call transpose(x,xt);
call mult(xt,x,xtx);
call inv(v,v);
call addmatrix(xtx,v,xtx);
call inv(xtx,v);
call chol(v,L);
call mult(v,xt,S);
5880 F Chapter 74: The MCMC Procedure
call mult(x,S,HatMat);
do j=1 to &n;
H = HatMat[j,j];
W[j] = H/(1-H);
sQ[j] = sqrt(W[j] + 1);
end;
do j=1 to &n;
if(y[j]=1) then
Z[j] = rltnorm(0,1,0);
else
Z[j] = rrtnorm(0,1,0);
end;
call mult(S,Z,B);
end;
endcnst;
uds HH(beta,Z,B,x,y,W,sQ,S,L);
parms z: beta: 0 B1-B7 / uds;
prior z: beta: B1-B7 ~ general(0);
model general(0);
run;
The OPTIONS statement names the catalog of FCMP subroutines to use. The cmplib library stores the
subroutine HH. You do not need to set a random number seed in the PROC MCMC statement because all
random numbers are generated from the HH subroutine. The initialization of the rand function is controlled
by the streaminit function, which is called in the program with a seed value of 83101.
A number of arrays are allocated. Some of them, such as xtx, xt, v, and HatMat, allocate work space for
constant arrays. Other arrays are used in the subroutine sampling. Explanations of the arrays are shown in
comments in the statements.
In the BEGINCNST and ENDCNST statement block, you read data set variables in the arrays x and y,
calculate all the constant terms, and assign initial values to Z and B. For the READ_ARRAY function, see
the section “READ_ARRAY Function” on page 5663. For listings of all array functions and their definitions,
see the section “Matrix Functions in PROC MCMC” on page 5723.
The UDS statement declares that the subroutine HH is used to sample the parameters beta, Z, and B. You
also specify the UDS option in the PARMS statement. Because all parameters are updated through the UDS
interface, it is not necessary to declare the actual form of the prior for any of the parameters. Each parameter
is declared to have a prior of general(0). Similarly, it is not necessary to declare the actual form of the
likelihood. The MODEL statement also takes a flat likelihood of the form general(0).
Summary statistics and effective sample sizes are shown in Output 74.19.2. The posterior estimates are
very close to what was shown in Output 74.19.1. The HH algorithm produces samples that are much less
correlated.
Example 74.19: Implement a New Sampling Algorithm F 5881
Output 74.19.2 Holms and Held
Implement a New Sampling Algorithm
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
Standard
Mean Deviation
95%
HPD Interval
beta0
5000 -2.0567
3.8260
-9.4031 5.2733
beta1
5000 2.7254
2.8079
-2.3940 8.5828
beta2
5000 -0.8318
3.2017
-6.6219 5.8170
beta3
5000 1.6319
3.5108
-5.7117 7.9353
beta4
5000 2.0567
0.8800
0.3155 3.7289
beta5
5000 -0.3473
0.9490
-2.1478 1.5889
beta6
5000 -3.3787
3.7991 -10.6821 4.1930
Implement a New Sampling Algorithm
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
beta0
3651.3
1.3694
0.7303
beta1
1563.8
3.1973
0.3128
beta2
5005.9
0.9988
1.0012
beta3
4853.2
1.0302
0.9706
beta4
2611.2
1.9148
0.5222
beta5
3049.2
1.6398
0.6098
beta6
3503.2
1.4273
0.7006
It is interesting to compare the two approaches of fitting a generalized linear model. The random walk
Metropolis on a seven-dimensional parameter space produces autocorrelations that are substantially higher
than the HH algorithm. A much longer chain is needed to produce roughly equivalent effective sample sizes.
On the other hand, the Metropolis algorithm is faster to run. The running time of these two examples is
roughly the same, with the random walk Metropolis with 100000 samples, a 20-fold increase over that in
the HH algorithm example. The speed difference can be attributed to a number of factors, ranging from
the implementation of the software and the overhead cost of calling PROC FCMP subroutine and functions.
In addition, the HH algorithm requires more parameters by creating an equal number of latent variables as
the sample size. Sampling more parameters takes time. A larger number of parameters also increases the
challenge in convergence diagnostics, because it is imperative to have convergence in all parameters before
you make valid posterior inferences. Finally, you might feel that coding in PROC MCMC is easier. However,
this really is not a fair comparison to make here. Writing a Metropolis algorithm from scratch would have
probably taken just as much, if not more, effort than the HH algorithm.
5882 F Chapter 74: The MCMC Procedure
Example 74.20: Using a Transformation to Improve Mixing
Proper transformations of parameters can often improve the mixing in PROC MCMC. You already saw this
in “Example 74.6: Nonlinear Poisson Regression Models” on page 5804, which sampled using the log scale
of parameters that priors that are strictly positive, such as the gamma priors. This example shows another
useful transformation: the logit transformation on parameters that take a uniform prior on [0, 1].
The data set is taken from Sharples (1990). It is used in Chaloner and Brant (1988) and Chaloner (1994) to
identify outliers in the data set in a two-level hierarchical model. Congdon (2003) also uses this data set to
demonstrates the same technique. This example uses the data set to illustrate how mixing can be improved
using transformation and does not address the question of outlier detection as in those papers. The following
statements create the data set:
data inputdata;
input nobs grp y
ind = _n_;
datalines;
1 1 24.80 2 1 26.90
4 1 30.93 5 1 33.77
1 2 23.96 2 2 28.92
4 2 26.16 5 2 21.34
1 3 18.30 2 3 23.67
4 3 24.45 5 3 24.89
1 4 51.42 2 4 27.97
4 4 26.67 5 4 17.58
1 5 34.12 2 5 46.87
4 5 38.11 5 5 47.59
;
@@;
3
6
3
6
3
6
3
6
3
6
1
1
2
2
3
3
4
4
5
5
26.65
63.31
28.19
29.46
14.47
28.95
24.76
24.29
58.59
44.67
There are five groups (grp, j D 1; : : : ; 5) with six observations (nobs, i D 1; : : : ; 6) in each. The two-level
hierarchical model is specified as follows:
yij
normal.j ; prec D w /
j
normal.; prec D b /
normal.0; prec D 1e
6/
gamma.0:001; iscale D 0:001/
p uniform.0; 1/
with the precision parameters related to each other in the following way:
b D =p
w
D b
The total number of parameters in this model is eight: 1 ; : : : ; 5 ; ; , and p.
Example 74.20: Using a Transformation to Improve Mixing F 5883
The following statements fit the model:
ods graphics on;
proc mcmc data=inputdata nmc=50000 thin=10 outpost=m1 seed=17
plot=trace;
ods select ess tracepanel;
parms p;
parms tau;
parms mu;
prior p ~ uniform(0,1);
prior tau ~ gamma(shape=0.001,iscale=0.001);
prior mu ~ normal(0,prec=0.00000001);
beginnodata;
taub = tau/p;
tauw = taub-tau;
endnodata;
random theta ~ normal(mu, prec=taub) subject=grp monitor=(theta);
model y ~ normal(theta,prec=tauw);
run;
The ODS SELECT statement displays the effective sample size table and the trace plots. The ODS GRAPHICS ON statement enables ODS Graphics. The PROC MCMC statement specifies the usual options for the
procedure run and produces trace plots (PLOTS=TRACE). The three PARMS statements put three model parameters, p, tau, and mu, in three different blocks. The PRIOR statements specify the prior distributions, and
the programming statements enclosed with the BEGINNODATA and ENDNODATA statements calculate the
transformation to taub and tauw. The RANDOM statement specifies the random effect, its prior distribution,
and the subject variable. The resulting trace plots are shown in Output 74.20.1, and the effective sample size
table is shown in Output 74.20.2.
5884 F Chapter 74: The MCMC Procedure
Output 74.20.1 Trace Plots
Example 74.20: Using a Transformation to Improve Mixing F 5885
Output 74.20.1 continued
Output 74.20.2 Bad Effective Sample Sizes
The MCMC Procedure
Effective Sample Sizes
Autocorrelation
Time Efficiency
Parameter
ESS
p
81.3
61.5342
0.0163
tau
61.2
81.7440
0.0122
mu
5000.0
1.0000
1.0000
theta_1
4839.9
1.0331
0.9680
theta_2
2739.7
1.8250
0.5479
theta_3
1346.6
3.7130
0.2693
theta_4
4897.5
1.0209
0.9795
theta_5
338.1
14.7866
0.0676
The trace plots show that most parameters have relatively good mixing. Two exceptions appear to be p
and . The trace plot of p shows a slow periodic movement. The parameter does not have good mixing
either. When the values are close to zero, the chain stays there for periods of time. An inspection of the
effective sample sizes table reveals the same conclusion: p and have much smaller ESSs than the rest of the
parameters.
A scatter plot of the posterior samples of p and reveals why mixing is bad in these two dimensions. The
following statements generate the scatter plot in Output 74.20.3:
5886 F Chapter 74: The MCMC Procedure
title 'Scatter Plot of Parameters on Original Scales';
proc sgplot data=m1;
yaxis label = 'p';
xaxis label = 'tau' values=(0 to 0.4 by 0.1);
scatter x = tau y = p;
run;
Output 74.20.3 Scatter Plot of versus p
The two parameters clearly have a nonlinear relationship. It is not surprising that the Metropolis algorithm
does not work well here. The algorithm is designed for cases where the parameters are linearly related with
each other.
To improve on mixing, you can sample on the log of , instead of sampling on . The formulation is:
gamma.shape D 0:001; iscale D 0:001/
log./ egamma.shape D 0:001; iscale D 0:001/
See the section “Standard Distributions” on page 5700 for the definitions of the gamma and egamma
distributions. In addition, you can sample on the logit of p. Note that
p uniform.0; 1/
is equivalent to
lgp D logit.p/ logistic.0; 1/
Example 74.20: Using a Transformation to Improve Mixing F 5887
The following statements fit the same model by using transformed parameters:
proc mcmc data=inputdata nmc=50000 thin=10 outpost=m2 seed=17
monitor=(p tau mu) plot=trace;
ods select ess tracepanel;
parms ltau lgp mu ;
prior ltau ~ egamma(shape=0.001,iscale=0.001);
prior lgp ~ logistic(0,1);
prior mu ~ normal(0,prec=0.00000001);
beginnodata;
tau = exp(ltau);
p = logistic(lgp);
taub = tau/p;
tauw = taub-tau;
endnodata;
random theta ~ normal(mu, prec=taub) subject=grp monitor=(theta);
model y ~ normal(theta,prec=tauw);
run;
The variable lgp is the logit transformation of p, and ltau is the log transformation of . The prior for ltau is
egamma, and the prior for lgp is logistic. The TAU and P assignment statements transform the parameters
back to their original scales. The rest of the programs remain unchanged. Trace plots (Output 74.20.4) and
effective sample size (Output 74.20.5) both show significant improvements in the mixing for both p and .
Output 74.20.4 Trace Plots after Transformation
5888 F Chapter 74: The MCMC Procedure
Output 74.20.4 continued
Example 74.20: Using a Transformation to Improve Mixing F 5889
Output 74.20.5 Effective Sample Sizes after Transformation
The MCMC Procedure
Effective Sample Sizes
Parameter
ESS
Autocorrelation
Time Efficiency
p
3119.4
1.6029
0.6239
tau
2588.0
1.9320
0.5176
mu
5000.0
1.0000
1.0000
theta_1
4866.0
1.0275
0.9732
theta_2
5244.5
0.9534
1.0489
theta_3
5000.0
1.0000
1.0000
theta_4
5000.0
1.0000
1.0000
theta_5
4054.8
1.2331
0.8110
The following statements generate Output 74.20.6 and Output 74.20.7:
title 'Scatter Plot of Parameters on Transformed Scales';
proc sgplot data=m2;
yaxis label = 'logit(p)';
xaxis label = 'log(tau)';
scatter x = ltau y = lgp;
run;
title 'Scatter Plot of Parameters on Original Scales';
proc sgplot data=m2;
yaxis label = 'p';
xaxis label = 'tau' values=(0 to 5.0 by 1);
scatter x = tau y = p;
run;
ods graphics off;
5890 F Chapter 74: The MCMC Procedure
Output 74.20.6 Scatter Plot of log. / versus logit.p/, After Transformation
Output 74.20.7 Scatter Plot of versus p, After Transformation
Example 74.21: Gelman-Rubin Diagnostics F 5891
The scatter plot of log. / versus logit.p/ shows a linear relationship between the two transformed parameters,
and this explains the improvement in mixing. In addition, the transformations also help the Markov chain
better explore in the original parameter space. Output 74.20.7 shows a scatter plot of versus p. The plot is
similar to Output 74.20.3. However, note that tau has a far longer tail in Output 74.20.7, extending all the
way to 5 as opposed to 0.15 in Output 74.20.3. This means that the second Markov chain can explore this
dimension of the parameter more efficiently, and as a result, you are able to draw more precise inference with
an equal number of simulations.
Example 74.21: Gelman-Rubin Diagnostics
PROC MCMC does not have the Gelman-Rubin test (see the section “Gelman and Rubin Diagnostics” on
page 142 in Chapter 7, “Introduction to Bayesian Analysis Procedures”) as a part of its diagnostics. The
Gelman-Rubin diagnostics rely on parallel chains to test whether they all converge to the same posterior
distribution. This example demonstrates how you can carry out this convergence test. The regression model
from the section “Simple Linear Regression” on page 5630 is used. The model has three parameters: ˇ0 and
ˇ1 are the regression coefficients, and 2 is the variance of the error distribution.
The following statements generate the data set:
title 'Simple Linear Regression, Gelman-Rubin Diagnostics';
data Class;
input Name $ Height Weight @@;
datalines;
Alfred 69.0 112.5
Alice 56.5 84.0
Carol
62.8 102.5
Henry 63.5 102.5
Jane
59.8 84.5
Janet 62.5 112.5
John
59.0 99.5
Joyce 51.3 50.5
Louise 56.3 77.0
Mary
66.5 112.0
Robert 64.8 128.0
Ronald 67.0 133.0
William 66.5 112.0
;
Barbara
James
Jeffrey
Judy
Philip
Thomas
65.3 98.0
57.3 83.0
62.5 84.0
64.3 90.0
72.0 150.0
57.5 85.0
To run a Gelman-Rubin diagnostic test, you want to start Markov chains at different places in the parameter
space. Suppose you want to start ˇ0 at 10, –15, and 0; ˇ1 at –5, 10, and 0; and 2 at 1, 20, and 50. You can
put these starting values in the following Init SAS data set:
data init;
input Chain beta0 beta1 sigma2;
datalines;
1
10 -5
1
2 -15 10 20
3
0
0 50
;
The following statements run PROC MCMC three times, each with starting values specified in the data set
Init:
/* define constants */
%let nchain = 3;
%let nparm = 3;
5892 F Chapter 74: The MCMC Procedure
%let nsim = 50000;
%let var = beta0 beta1 sigma2;
%macro gmcmc;
%do i=1 %to &nchain;
data _null_;
set init;
if Chain=&i;
%do j = 1 %to &nparm;
call symputx("init&j", %scan(&var, &j));
%end;
stop;
run;
proc mcmc data=class outpost=out&i init=reinit nbi=0 nmc=&nsim
stats=none seed=7;
parms beta0 &init1 beta1 &init2;
parms sigma2 &init3 / n;
prior beta0 beta1 ~ normal(0, var = 1e6);
prior sigma2 ~ igamma(3/10, scale = 10/3);
mu = beta0 + beta1*height;
model weight ~ normal(mu, var = sigma2);
run;
%end;
%mend;
ods listing close;
%gmcmc;
ods listing;
The macro variables nchain, nparm, nsim, and var define the number of chains, the number of parameters,
the number of Markov chain simulations, and the parameter names, respectively. The macro GMCMC gets
initial values from the data set Init, assigns them to the macro variables init1, init2 and init3, starts the Markov
chain at these initial values, and stores the posterior draws to three output data sets: Out1, Out2, and Out3.
In the PROC MCMC statement, the INIT=REINIT option restarts the Markov chain after tuning at the
assigned initial values. No burn-in is requested.
You can use the autocall macro GELMAN to calculate the Gelman-Rubin statistics by using the three chains.
The GELMAN macro has the following arguments:
%macro gelman(dset, nparm, var, nsim, nc=3, alpha=0.05);
The argument dset is the name of the data set that stores the posterior samples from all the runs, nparm is the
number of parameters, var is the name of the parameters, nsim is the number of simulations, nc is the number
of chains with a default value of 3, and alpha is the ˛ significant level in the test with a default value of 0.05.
This macro creates two data sets: _Gelman_Ests stores the diagnostic estimates and _Gelman_Parms stores
the names of the parameters.
Example 74.21: Gelman-Rubin Diagnostics F 5893
The following statements calculate the Gelman-Rubin diagnostics:
data all;
set out1(in=in1) out2(in=in2) out3(in=in3);
if in1 then Chain=1;
if in2 then Chain=2;
if in3 then Chain=3;
run;
%gelman(all, &nparm, &var, &nsim);
data GelmanRubin(label='Gelman-Rubin Diagnostics');
merge _Gelman_Parms _Gelman_Ests;
run;
proc print data=GelmanRubin;
run;
The Gelman-Rubin statistics are shown in Output 74.21.1.
Output 74.21.1 Gelman-Rubin Diagnostics of the Regression Example
Simple Linear Regression, Gelman-Rubin Diagnostics
Obs Parameter Between-chain Within-chain Estimate UpperBound
1 beta0
5384.76
1168.64
1.0002
1.0001
2 beta1
1.20
0.30
1.0002
1.0002
8034.41
2890.00
1.0010
1.0011
3 sigma2
The Gelman-Rubin statistics do not reveal any concerns about the convergence or the mixing of the multiple
chains. To get a better visual picture of the multiple chains, you can draw overlapping trace plots of these
parameters from the three Markov chains runs.
The following statements create Output 74.21.2:
/* plot the trace plots of three Markov chains. */
%macro trace;
%do i = 1 %to &nparm;
proc sgplot data=all cycleattrs;
series x=Iteration y=%scan(&var, &i) / group=Chain;
run;
%end;
%mend;
%trace;
5894 F Chapter 74: The MCMC Procedure
Output 74.21.2 Trace Plots of Three Chains for Each of the Parameters
Example 74.21: Gelman-Rubin Diagnostics F 5895
Output 74.21.2 continued
The trace plots show that three chains all eventually converge to the same regions even though they started
at very different locations. In addition to the trace plots, you can also plot the potential scale reduction
factor (PSRF). See the section “Gelman and Rubin Diagnostics” on page 142 in Chapter 7, “Introduction to
Bayesian Analysis Procedures,” for definition and details.
The following statements calculate PSRF for each parameter. They use the GELMAN macro repeatedly and
can take a while to run:
/* define sliding window size */
%let nwin = 200;
data PSRF;
run;
%macro PSRF(nsim);
%do k = 1 %to %sysevalf(&nsim/&nwin, floor);
%gelman(all, &nparm, &var, nsim=%sysevalf(&k*&nwin));
data GelmanRubin;
merge _Gelman_Parms _Gelman_Ests;
run;
data PSRF;
set PSRF GelmanRubin;
run;
%end;
%mend PSRF;
options nonotes;
%PSRF(&nsim);
5896 F Chapter 74: The MCMC Procedure
options notes;
data PSRF;
set PSRF;
if _n_ = 1 then delete;
run;
proc sort data=PSRF;
by Parameter;
run;
%macro sepPSRF(nparm=, var=, nsim=);
%do k = 1 %to &nparm;
data save&k; set PSRF;
if _n_ > %sysevalf(&k*&nsim/&nwin, floor) then delete;
if _n_ < %sysevalf((&k-1)*&nsim/&nwin + 1, floor) then delete;
Iteration + &nwin;
run;
proc sgplot data=save&k(firstobs=10) cycleattrs;
series x=Iteration y=Estimate;
series x=Iteration y=upperbound;
yaxis label="%scan(&var, &k)";
run;
%end;
%mend sepPSRF;
%sepPSRF(nparm=&nparm, var=&var, nsim=&nsim);
Example 74.21: Gelman-Rubin Diagnostics F 5897
Output 74.21.3 PSRF Plot for Each Parameter
5898 F Chapter 74: The MCMC Procedure
Output 74.21.3 continued
PSRF is the square root of the ratio of the between-chain variance and the within-chain variance. A large
PSRF indicates that the between-chain variance is substantially greater than the within-chain variance, so
that longer simulation is needed. You want the PSRF to converge to 1 eventually, as it appears to be the case
in this simulation study.
Example 74.22: One-Compartment Model with Pharmacokinetic Data
A popular application of nonlinear mixed models is in the field of pharmacokinetics, which studies how a
drug disperses through a living individual. This example considers the theophylline data from Pinheiro and
Bates (1995). Serum concentrations of the drug theophylline, which is used to treat respiratory diseases, are
measured in 12 subjects over a 25-hour period after oral administration. The data are as follows.
data theoph;
input subject time conc dose;
datalines;
1 0.00 0.74 4.02
1 0.25 2.84 4.02
1 0.57 6.57 4.02
1 1.12 10.50 4.02
1 2.02 9.66 4.02
1 3.82 8.58 4.02
1 5.10 8.36 4.02
1 7.03 7.47 4.02
Example 74.22: One-Compartment Model with Pharmacokinetic Data F 5899
... more lines ...
12 24.15
;
1.17 5.30
A commonly used one-compartment model is based on the differential equations
dA0 .t/
dt
dA.t/
dt
D
Ka A0 .t /
D Ka A0 .t /
Ke A.t /
where Ka is the absorption rate and Ke is the elimination rate.
The initial values are
Aa .t D 0/ D x
A.t D 0/ D 0
where x is the dosage.
The expected concentration of the substance in the body is computed by dividing the solution to the ODE,
A.t/, by Cl, the clearance,
i .t/ D Ai .t /=C l
yi .t/ normal.i .t /; 2 /
where i is the subject index.
Pinheiro and Bates (1995) consider the following first-order compartment model for these data, where
log.C l/ and log.Ka / are modeled using random effects to account for the patient-to-patient variability:
C li
D exp.ˇ1 C bi1 /
Kai
D exp.ˇ2 C bi 2 /
Kei
D exp.ˇ3 /
Here the ˇs denote fixed-effects parameters and the bi s denote random-effects parameters with an unknown
covariance matrix.
Although there is an analytical solution to this set of differential equations, this example illustrates the use of
a general ODE solver that does not require a closed-form solution. To use the ODE solver, you want to first
define the set of differential equations in PROC FCMP and store the objective function, called OneComp, in
a user library:
5900 F Chapter 74: The MCMC Procedure
proc fcmp outlib=sasuser.funcs.PK;
subroutine OneComp(t,y[*],dy[*],ka,ke);
outargs dy;
dy[1] = -ka*y[1];
dy[2] = ka*y[1]-ke*y[2];
endsub;
run;
The first argument of the OneComp subroutine is the time variable in the differential equation. The second
argument is the with-respect-to variable, which can be an array in case a multidimensional problem is
required. The third argument is an array that stores the differential equations. This dy argument must also be
the OUTARGS variable in the subroutine. The subsequent variables are additional variables, depending on
the problem at hand.
In the OneComp subroutine, you can define Ka and Ke as functions of ˇ1 , ˇ2 , and the random-effects
parameter b2 . The dy[1] and dy[2] are the differential equation, with dy[1] storing dAa =dt and dy[2] storing
dAc =dt.
The following PROC MCMC statements use the CALL ODE subroutine to solve the set of differential
equations defined in OneComp and then use that solution in the construction of the likelihood function:
options cmplib=sasuser.funcs;
proc mcmc data=theoph nmc=10000 seed=27 outpost=theophO diag=none
nthreads=8;
ods select PostSumInt;
array b[2];
array muB[2] (0 0);
array cov[2,2];
array S[2,2] (1 0 0 1);
array init[2] dose 0;
array sol[2];
parms beta1 -3.22 beta2 0.47 beta3 -2.45 ;
parms cov {0.03 0 0 0.4};
parms s2y;
prior beta: ~ normal(0, sd=100);
prior cov ~ iwish(2, S);
prior s2y ~ igamma(shape=3, scale=2);
random b ~ mvn(muB, cov) subject=subject;
cl
= exp(beta1 + b1);
ka
= exp(beta2 + b2);
ke
= exp(beta3);
v
= cl/ke;
call ode('OneComp',sol,init,0,time,ka,ke);
mu
= (sol[2]/v);
model conc ~ normal(mu,var=s2y);
run;
Example 74.22: One-Compartment Model with Pharmacokinetic Data F 5901
The INIT array stores the initial values of the two differential equations, with Aa .t D 0/ D dose and
A.t D 0/ D 0. The array is used as an input argument to the CALL ODE subroutine.
The RANDOM statement specifies two-dimensional random effects, b, with a multivariate normal prior. The
first random effect, b1, enters the model through the clearance variable, cl. The second random effect, b2,
is part of the differential equations. The CALL ODE subroutine solves the OneComp set of differential
equations and returns the solution to the SOL array. The second array element, SOL[2], is the solution to
Ai .t/ for every subject i at every time t.
Posterior summary statistics are reported in Output 74.22.1.
Output 74.22.1 Posterior Summary Statistics
The MCMC Procedure
Posterior Summaries and Intervals
Parameter
N
beta1
10000
beta2
beta3
Standard
Mean Deviation
95%
HPD Interval
-3.2073
0.0836 -3.3768 -3.0429
10000
0.4375
0.1860 0.1112 0.7671
10000
-2.4608
0.0547 -2.5642 -2.3576
cov1
10000
0.1312
0.0637 0.0429 0.2507
cov2
10000 -0.00248
0.0919 -0.1876 0.1793
cov3
10000 -0.00248
0.0919 -0.1876 0.1793
cov4
10000
0.6155
0.3175 0.1897 1.2190
s2y
10000
0.5246
0.0719 0.3954 0.6714
In this problem, the closed-form solution of the ODE is known:
Ci t D
Dkei kai
Œexp. kei t /
C li .kai kei /
exp. kai t / C ei t
You can manually enter the equation in PROC MCMC and use the following program to fit the same model:
proc mcmc data=theoph nmc=10000 seed=22 outpost=theophC;
array b[2];
array mu[2] (0 0);
array cov[2,2];
array S[2,2] (1 0 0 1);
parms beta1 -3.22 beta2 0.47 beta3 -2.45 ;
parms cov {0.03 0 0 0.4};
parms s2y;
prior beta: ~ normal(0, sd=100);
prior cov ~ iwish(2, S);
prior s2y ~ igamma(shape=3, scale=2);
random
cl
=
ka
=
ke
=
b ~ mvn(mu, cov) subject=subject;
exp(beta1 + b1);
exp(beta2 + b2);
exp(beta3);
5902 F Chapter 74: The MCMC Procedure
pred = dose*ke*ka*(exp(-ke*time)-exp(-ka*time))/cl/(ka-ke);
model conc ~ normal(pred,var=s2y);
run;
Because this program makes it unnecessary to numerically solve the ODE at every observation and every
iteration, it runs much faster than the program that uses the CALL ODE subroutine. But few pharmacokinetic
models have known solutions that enable you to do that.
References
Aitkin, M., Anderson, D. A., Francis, B., and Hinde, J. (1989). Statistical Modelling in GLIM. Oxford:
Oxford Science Publications.
Atkinson, A. C. (1979). “The Computer Generation of Poisson Random Variables.” Journal of the Royal
Statistical Society, Series C 28:29–35.
Atkinson, A. C., and Whittaker, J. (1976). “A Switching Algorithm for the Generation of Beta Random
Variables with at Least One Parameter Less Than One.” Journal of the Royal Statistical Society, Series A
139:462–467.
Bacon, D. W., and Watts, D. G. (1971). “Estimating the Transition between Two Intersecting Straight Lines.”
Biometrika 58:525–534.
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data.
Boca Raton, FL: Chapman & Hall/CRC.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. 2nd ed. New York: Springer-Verlag.
Box, G. E. P., and Cox, D. R. (1964). “An Analysis of Transformations.” Journal of the Royal Statistical
Society, Series B 26:211–234.
Byrne, G. D., and Hindmarsh, A. C. (1975). “A Polyalgorithm for the Numerical Solution of ODEs.” ACM
Transactions on Mathematical Software 1:71–96.
Carlin, B. P., Gelfand, A. E., and Smith, A. F. M. (1992). “Hierarchical Bayesian Analysis of Changepoint
Problems.” Journal of the Royal Statistical Society, Series C 41:389–405.
Chaloner, K. (1994). “Residual Analysis and Outliers in Bayesian Hierarchical Models.” In Aspects of
Uncertainty: A Tribute to D. V. Lindley, 149–157. New York: John Wiley & Sons.
Chaloner, K., and Brant, R. (1988). “A Bayesian Approach to Outlier Detection and Residual Analysis.”
Biometrika 75:651–659.
Cheng, R. C. H. (1978). “Generating Beta Variates with Non-integral Shape Parameters.” Communications
ACM 28:290–295.
Clayton, D. G. (1991). “A Monte Carlo Method for Bayesian Inference in Frailty Models.” Biometrics
47:467–485.
Congdon, P. (2003). Applied Bayesian Modeling. New York: John Wiley & Sons.
References F 5903
Crowder, M. J. (1978). “Beta-Binomial Anova for Proportions.” Journal of the Royal Statistical Society,
Series C 27:34–37.
Draper, D. (1996). “Discussion of the Paper by Lee and Nelder.” Journal of the Royal Statistical Society,
Series B 58:662–663.
Eilers, P. H. C., and Marx, B. D. (1996). “Flexible Smoothing with B-Splines and Penalties.” Statistical
Science 11:89–121. With discussion.
Finney, D. J. (1947). “The Estimation from Individual Records of the Relationship between Dose and Quantal
Response.” Biometrika 34:320–334.
Fisher, R. A. (1935). “The Fiducial Argument in Statistical Inference.” Annals of Eugenics 6:391–398.
Fishman, G. S. (1996). Monte Carlo: Concepts, Algorithms, and Applications. New York: John Wiley &
Sons.
Gaver, D. P., and O’Muircheartaigh, I. G. (1987). “Robust Empirical Bayes Analysis of Event Rates.”
Technometrics 29:1–15.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990). “Illustration of Bayesian
Inference in Normal Data Models Using Gibbs Sampling.” Journal of the American Statistical Association
85:972–985.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis. 2nd ed. London:
Chapman & Hall.
Gentleman, R., and Geyer, C. J. (1994). “Maximum Likelihood for Interval Censored Data: Consistency and
Computation.” Biometrika 81:618–623.
Gilks, W. R. (2003). “Adaptive Metropolis Rejection Sampling (ARMS).” Software from MRC Biostatistics Unit, Cambridge, UK. http://www.maths.leeds.ac.uk/~wally.gilks/adaptive.
rejection/web_page/Welcome.html.
Gilks, W. R., and Wild, P. (1992). “Adaptive Rejection Sampling for Gibbs Sampling.” Journal of the Royal
Statistical Society, Series C 41:337–348.
Holmes, C. C., and Held, L. (2006). “Bayesian Auxiliary Variable Models for Binary and Multinomial
Regression.” Bayesian Analysis 1:145–168.
Ibrahim, J. G., Chen, M.-H., and Lipsitz, S. R. (2001). “Missing Responses in Generalised Linear Mixed
Models When the Missing Data Mechanism Is Nonignorable.” Biometrika 88:551–564.
Kass, R. E., Carlin, B. P., Gelman, A., and Neal, R. M. (1998). “Markov Chain Monte Carlo in Practice: A
Roundtable Discussion.” American Statistician 52:93–100.
Krall, J. M., Uthoff, V. A., and Harley, J. B. (1975). “A Step-Up Procedure for Selecting Variables Associated
with Survival.” Biometrics 31:49–57.
Kuhfeld, W. F. (2010). Conjoint Analysis. Technical report, SAS Institute Inc., Cary, NC. http://
support.sas.com/resources/papers/tnote/tnote_marketresearch.html.
Lin, D. Y. (1994). “Cox Regression Analysis of Multivariate Failure Time Data: The Marginal Approach.”
Statistics in Medicine 13:2233–2247.
5904 F Chapter 74: The MCMC Procedure
Little, R. J. A., and Rubin, D. B. (2002). Statistical Analysis with Missing Data. 2nd ed. Hoboken, NJ: John
Wiley & Sons.
Matsumoto, M., and Kurita, Y. (1992). “Twisted GFSR Generators.” ACM Transactions on Modeling and
Computer Simulation 2:179–194.
Matsumoto, M., and Kurita, Y. (1994). “Twisted GFSR Generators II.” ACM Transactions on Modeling and
Computer Simulation 4:254–266.
Matsumoto, M., and Nishimura, T. (1998). “Mersenne Twister: A 623-Dimensionally Equidistributed
Uniform Pseudo-random Number Generator.” ACM Transactions on Modeling and Computer Simulation
8:3–30.
McGrath, E. J., and Irving, D. C. (1973). Techniques for Efficient Monte Carlo Simulation, Vol. 2: Random
Number Generation for Selected Probability Distributions. Technical report, Science Applications Inc., La
Jolla, CA.
Michael, J. R., Schucany, W. R., and Haas, R. W. (1976). “Generating Random Variates Using Transformations with Multiple Roots.” American Statistician 30:88–90.
Pinheiro, J. C., and Bates, D. M. (1995). “Approximations to the Log-Likelihood Function in the Nonlinear
Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4:12–35.
Pregibon, D. (1981). “Logistic Regression Diagnostics.” Annals of Statistics 9:705–724.
Ralston, A., and Rabinowitz, P. (1978). A First Course in Numerical Analysis. New York: McGraw-Hill.
Rice, S. O. (1973). “Efficient Evaluation of Integrals of Analytic Functions by the Trapezoidal Rule.” Bell
System Technical Journal 52:707–722.
Ripley, B. D. (1987). Stochastic Simulation. New York: John Wiley & Sons.
Robert, C. P. (1995). “Simulation of Truncated Normal Variables.” Statistics and Computing 5:121–125.
Roberts, G. O., Gelman, A., and Gilks, W. R. (1997). “Weak Convergence and Optimal Scaling of Random
Walk Metropolis Algorithms.” Annals of Applied Probability 7:110–120.
Roberts, G. O., and Rosenthal, J. S. (2001). “Optimal Scaling for Various Metropolis-Hastings Algorithms.”
Statistical Science 16:351–367.
Rubin, D. B. (1976). “Inference and Missing Data.” Biometrika 63:581–592.
Rubin, D. B. (1981). “Estimation in Parallel Randomized Experiments.” Journal of Educational Statistics
6:377–411.
Schervish, M. J. (1995). Theory of Statistics. New York: Springer-Verlag.
Sharples, L. (1990). “Identification and Accommodation of Outliers in General Hierarchical Models.”
Biometrika 77:445–453.
Sikorsky, K. (1982). “Optimal Quadrature Algorithms in HP Spaces.” Numerische Mathematik 39:405–410.
Sikorsky, K., and Stenger, F. (1984). “Optimal Quadratures in HP Spaces.” ACM Transactions on Mathematical Software 3:140–151.
References F 5905
Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R. (1996a). “BUGS Examples, Volume 1.”
Version 0.5 (version ii).
Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R. (1996b). “BUGS Examples, Volume 2.”
Version 0.5 (version ii).
Squire, W. (1987). “Comparison of Gauss-Hermite and Midpoint Quadrature with Application to the Voigt
Function.” In Numerical Integration: Recent Developments, edited by P. Keast and G. Fairweather,
111–112. Dordrecht, Netherlands: D. Reidel Publishing.
Stenger, F. (1973a). “Integration Formulas Based on the Trapezoidal Formula.” Journal of the Institute of
Mathematics and Its Applications 12:103–114.
Stenger, F. (1973b). “Remarks on Integration Formulas Based on the Trapezoidal Formula.” Journal of the
Institute of Mathematics and Its Applications 19:145–147.
Thomas, A., Best, N. G., Lunn, D., Arnold, R., and Spiegelhalter, D. (2004). “GeoBUGS User Manual.”
http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/geobugs12manual.pdf.
Subject Index
arrays
MCMC procedure, 5662
monitor values of (MCMC), 5839
Autoregressive Multivariate Normal Distribution
MCMC procedure, 5713
autoregressive multivariate normal distribution
definition of (MCMC), 5713
Behrens-Fisher problem
MCMC procedure, 5636
Bernoulli distribution
definition of (MCMC), 5701
MCMC procedure, 5667, 5679, 5701
beta distribution
definition of (MCMC), 5700
MCMC procedure, 5667, 5679, 5700
binary distribution
definition of (MCMC), 5701
MCMC procedure, 5667, 5679, 5701
binomial distribution
definition of (MCMC), 5701
MCMC procedure, 5667, 5701
blocking
MCMC procedure, 5691
Box-Cox transformation
estimate D 0, 5789
MCMC procedure, 5785
Categorical distribution
definition of (MCMC), 5711
MCMC procedure, 5711
Cauchy distribution
definition of (MCMC), 5702
MCMC procedure, 5667, 5702
censoring
MCMC procedure, 5719, 5866
chi-square distribution
definition of (MCMC), 5702
MCMC procedure, 5667, 5702
conjugate sampling
MCMC procedure, 5696
constants specification
MCMC procedure, 5663
convergence
MCMC procedure, 5764
corner-point constraint
MCMC procedure, 5682
Cox models
MCMC procedure, 5847, 5854
dgeneral distribution
MCMC procedure, 5667, 5680, 5715
direct sampling
MCMC procedure, 5696
Dirichlet distribution
MCMC procedure, 5670, 5712
dirichlet distribution
definition of (MCMC), 5712
dlogden distribution
MCMC procedure, 5667, 5680
double exponential distribution
definition of (MCMC), 5707
MCMC procedure, 5669, 5680, 5707
dynamic linear model
MCMC procedure, 5733
examples, MCMC
array subscripts, 5663
arrays, 5662
arrays, store data set variables, 5799
BEGINCNST/ENDCNST statements, 5799
Behrens-Fisher problem, 5636
blocking, 5691
Box-Cox transformation, 5785
Caterpillar Plot, 5742
censoring, 5720, 5866
change point models, 5830
cloglog transformation, 5723
constrained analysis, 5867
Cox models, 5847, 5854
Cox models, time dependent covariates, 5854
Cox models, time independent covariates, 5847
deviance information criterion, 5845
discrete priors, 5789
dynamic linear model, 5733
error finding using the PUT statement, 5766
estimate functionals, 5796, 5839
estimate posterior probabilities, 5639
exponential models, survival analysis, 5835
FCMP procedure, 5875, 5878
Gelman-Rubin diagnostics, 5891
generalized linear models, 5792, 5799, 5802
GENMOD procedure, BAYES statement, 5801,
5803
getting started, 5630
graphics, box plots, 5843
graphics, custom template, 5751
graphics, fit plots, 5834
graphics, kernel density comparisons, 5779, 5785
graphics, multiple chains, 5895
graphics, posterior predictive checks, 5752
graphics, PSRF plots, 5898
graphics, scatter plots, 5831, 5886, 5890, 5891
graphics, survival curves, 5844
hierarchical centering, 5813
IF-ELSE statement, 5638
implement a new sampling algorithm, 5873
improve mixing, 5804, 5882
improving mixing, 5813
initial values, 5699
interval censoring, 5866
Jeffreys’ prior, 5799
JOINTMODEL option, 5730, 5851, 5857
LAG functions, 5849
linear regression, 5630
log transformation, 5722
logistic regression, diffuse prior, 5792
logistic regression, Jeffreys’ prior, 5799
logistic regression, random-effects, 5812
logistic regression, sampling via Gibbs, 5873
logit transformation, 5722
matrix functions, 5799, 5871, 5879
missing at random (MAR), 5823
missing not at random (MNAR), 5827
MISSING= option, 5856
mixed-effects models, 5640, 5812
mixing, 5804, 5882
mixture of normal densities, 5782
model comparison, 5845
modelling dependent data, 5730
MONITOR= option, arrays, 5839
multilevel random-effects models, 5814
Multivariate Distribution, 5713
multivariate priors, 5871
nonignorably missing, 5827
nonlinear Poisson regression, 5804
pharmacokinetics, 5898
PHREG procedure, BAYES statement, 5853,
5858
Piecewise Exponential Frailty Models, 5859
Poisson regression, 5802
Poisson regression, multilevel random-effects,
5814
Poisson regression, nonlinear, 5804, 5814
posterior predictive distribution, 5749
probit transformation, 5723
proportional hazard models, 5847, 5854
random-effects models, 5640, 5812, 5814
regenerate diagnostics plots, 5740
SGPLOT procedure, 5778, 5784, 5831, 5833,
5842, 5844, 5885, 5889, 5893, 5895
SGRENDER procedure, 5752
specifying a new distribution, 5715
store data set variables in arrays, 5799
survival analysis, 5834
survival analysis, exponential models, 5835
survival analysis, Weibull model, 5838
TEMPLATE procedure, 5751
theophylline data, 5898
truncated distributions, 5720, 5871
UDS statement, 5873
use macros to construct log-likelihood, 5854
user-defined samplers, 5873
Weibull model, survival analysis, 5838
exponential chi-square distribution
definition of (MCMC), 5702
MCMC procedure, 5667, 5702
exponential distribution
definition of (MCMC), 5704
MCMC procedure, 5668, 5704
exponential exponential distribution
definition of (MCMC), 5702
MCMC procedure, 5668, 5702
exponential gamma distribution
definition of (MCMC), 5703
MCMC procedure, 5668, 5703
exponential inverse chi-square distribution
definition of (MCMC), 5703
MCMC procedure, 5668, 5703
exponential inverse-gamma distribution
definition of (MCMC), 5703
MCMC procedure, 5668, 5703
exponential scaled inverse chi-square distribution
definition of (MCMC), 5704
MCMC procedure, 5668, 5704
floating point errors
MCMC procedure, 5763
gamma distribution
definition of (MCMC), 5704
MCMC procedure, 5668, 5679, 5704, 5746
Gaussian distribution
definition of (MCMC), 5709
MCMC procedure, 5669, 5680, 5709
Gelman-Rubin diagnostics
MCMC procedure, 5891
general distribution
MCMC procedure, 5669, 5680, 5715
generalized linear models
MCMC procedure, 5792, 5799, 5802
geometric distribution
definition of (MCMC), 5705
MCMC procedure, 5668, 5705
handling error messages
MCMC procedure, 5766
hierarchical centering
MCMC procedure, 5813
initial values
MCMC procedure, 5630, 5651, 5674, 5695, 5698,
5699
intrinsic Gaussian CAR distribution
definition of (MCMC), 5709
MCMC procedure, 5709
inverse chi-square distribution
definition of (MCMC), 5706
MCMC procedure, 5669, 5706
inverse Gaussian distribution
definition of (MCMC), 5711
MCMC procedure, 5670, 5711
Inverse Wishart distribution
definition of (MCMC), 5712
MCMC procedure, 5712
inverse Wishart distribution
MCMC procedure, 5670
inverse-gamma distribution
definition of (MCMC), 5706
MCMC procedure, 5669, 5680, 5706, 5746
Laplace distribution
definition of (MCMC), 5707
MCMC procedure, 5669, 5680, 5707
likelihood function specification
MCMC procedure, 5665
logden distribution
MCMC procedure, 5669, 5680
logistic distribution
definition of (MCMC), 5707
MCMC procedure, 5669, 5707
lognormal distribution
definition of (MCMC), 5708
MCMC procedure, 5669, 5708
long run times
MCMC procedure, 5764
marginal distribution
MCMC procedure, 5752
Maximum a posteriori
MCMC procedure, 5695
MCMC procedure, 5628
arrays, 5662
Autoregressive Multivariate Normal Distribution,
5713
Behrens-Fisher problem, 5636
Bernoulli distribution, 5667, 5679, 5701
beta distribution, 5667, 5679, 5700
binary distribution, 5667, 5679, 5701
binomial distribution, 5667, 5701
blocking, 5691
Box-Cox transformation, 5785
CALL ODE subroutine, 5734
CALL QUAD subroutine, 5734
Categorical distribution, 5711
Cauchy distribution, 5667, 5702
censoring, 5719, 5866
chi-square distribution, 5667, 5702
compared with other SAS procedures, 5629
computational resources, 5768
conjugate sampling, 5696
constants specification, 5663
convergence, 5764
corner-point constraint, 5682
Cox models, 5847, 5854
deviance information criterion, 5845
dgeneral distribution, 5667, 5680, 5715
direct sampling, 5696
Dirichlet distribution, 5670, 5712
dlogden distribution, 5667, 5680
double exponential distribution, 5669, 5680, 5707
dynamic linear model, 5733
examples, see also examples, MCMC, 5776
exponential chi-square distribution, 5667, 5702
exponential distribution, 5668, 5704
exponential exponential distribution, 5668, 5702
exponential gamma distribution, 5668, 5703
exponential inverse chi-square distribution, 5668,
5703
exponential inverse-gamma distribution, 5668,
5703
exponential scaled inverse chi-square distribution,
5668, 5704
floating point errors, 5763
gamma distribution, 5668, 5679, 5704, 5746
Gaussian distribution, 5669, 5680, 5709
Gelman-Rubin diagnostics, 5891
general distribution, 5669, 5680, 5715
general integration function, 5734
general ODE solver, 5734
generalized linear models, 5792, 5799, 5802
geometric distribution, 5668, 5705
handling error messages, 5766
hierarchical centering, 5813
hyperprior distribution, 5664, 5676
initial values, 5630, 5651, 5674, 5695, 5698, 5699
intrinsic Gaussian CAR distribution, 5709
inverse chi-square distribution, 5669, 5706
inverse Gaussian distribution, 5670, 5711
Inverse Wishart distribution, 5712
inverse Wishart distribution, 5670
inverse-gamma distribution, 5669, 5680, 5706,
5746
Laplace distribution, 5669, 5680, 5707
likelihood function specification, 5665
logden distribution, 5669, 5680
logistic distribution, 5669, 5707
lognormal distribution, 5669, 5708
long run times, 5764
marginal distribution, 5752
Maximum a posteriori, 5695
mixed-effects models, 5812
mixing, 5804, 5882
model missing response variables, 5665
model parameters, 5674
model specification, 5665
modeling dependent data, 5847
Multinomial Distribution, 5713
Multinomial distribution, 5670
Multivariate Normal Distribution, 5712
MVN distribution, 5670, 5680
MVNAR distribution, 5670, 5680
negative binomial distribution, 5669, 5708
nonlinear Poisson regression, 5804
normal distribution, 5669, 5680, 5709
NORMALCAR distribution, 5681
normalcar distribution, 5709
options, 5647
options summary, 5645
output ODS Graphics table names, 5775
output table names, 5773
overflows, 5763
parameters specification, 5674
pareto distribution, 5669, 5709
pharmacokinetics, 5898
Piecewise Differential Equations, 5735
Piecewise Exponential Frailty Models, 5859
Poisson distribution, 5670, 5680, 5710
posterior predictive distribution, 5675, 5749
posterior samples data set, 5654
precision of solution, 5766
prior distribution, 5664, 5676
prior predictive distribution, 5752
programming statements, 5677
proposal distribution, 5694
random effects, 5679
random-effects models, 5812
run times, 5764, 5768
scaled inverse chi-square distribution, 5670, 5710
Spatial Prior, 5761
specifying a new distribution, 5715
standard distributions, 5700
survival analysis, 5834
syntax summary, 5645
t distribution, 5670, 5710
table (categorical) distribution, 5670, 5680
Table distribution, 5711
truncated distributions, 5719
tuning, 5694
UDS statement, 5687
uniform distribution, 5670, 5680, 5711
user defined sampler statement, 5687
user-defined distribution, 5669, 5680
user-defined samplers, 5873
using the IF-ELSE logical control, 5785
Wald distribution, 5670, 5711
Weibull distribution, 5670, 5712
WinBUGS specification of the gamma
distribution, 5747
Missing data
Missing Not at Random (MNAR), 5754
MNAR, 5754
missing data
Ignorably Missing, 5753
MAR, 5753
MCAR, 5753
Missing at Random (MAR), 5753
Missing Completely at Random (MCAR), 5753
Nonignorably Missing, 5754
Pattern-Mixture Model, 5754
Selection Model, 5754
mixed-effects models
MCMC procedure, 5812
mixing
convergence (MCMC), 5882
improving (MCMC), 5765, 5804, 5882
MCMC procedure, 5804, 5882
model missing response variables
MCMC procedure, 5665
model parameters
MCMC procedure, 5674
model specification
MCMC procedure, 5665
Multinomial Distribution
MCMC procedure, 5713
Multinomial distribution
MCMC procedure, 5670
multinomial distribution
definition of (MCMC), 5713
Multivariate Normal Distribution
MCMC procedure, 5712
multivariate normal distribution
definition of (MCMC), 5712
multivariate normal distribution with a first-order
autoregressive covariance
definition of (MCMC), 5713
MVN distribution
MCMC procedure, 5670, 5680
MVNAR distribution
MCMC procedure, 5670, 5680
negative binomial distribution
definition of (MCMC), 5708
MCMC procedure, 5669, 5708
nonlinear Poisson regression
MCMC procedure, 5804
normal distribution
definition of (MCMC), 5709
MCMC procedure, 5669, 5680, 5709
NORMALCAR distribution
MCMC procedure, 5681
normalcar distribution
definition of (MCMC), 5709
MCMC procedure, 5709
output ODS Graphics table names
MCMC procedure, 5775
output table names
MCMC procedure, 5773
overflows
MCMC procedure, 5763
parameters specification
MCMC procedure, 5674
pareto distribution
definition of (MCMC), 5709
MCMC procedure, 5669, 5709
pharmacokinetics
MCMC procedure, 5898
Piecewise Exponential Frailty Models
MCMC procedure, 5859
Poisson distribution
definition of (MCMC), 5710
MCMC procedure, 5670, 5680, 5710
posterior predictive distribution
MCMC procedure, 5675, 5749
precision of solution
MCMC procedure, 5766
prior distribution
distribution specification (MCMC), 5664, 5676
hyperprior specification (MCMC), 5664, 5676
predictive distribution (MCMC), 5752, 5753
user-defined (MCMC), 5669, 5680, 5715
programming statements
MCMC procedure, 5677
proposal distribution
MCMC procedure, 5694
random effects
MCMC procedure, 5679
random-effects models
MCMC procedure, 5812
run times
MCMC procedure, 5764, 5768
scaled inverse chi-square distribution
definition of (MCMC), 5710
MCMC procedure, 5670, 5710
Spatial Prior
MCMC procedure, 5761
specifying a new distribution
MCMC procedure, 5715
standard distributions
MCMC procedure, 5700
survival analysis
MCMC procedure, 5834
t distribution
definition of (MCMC), 5710
MCMC procedure, 5670, 5710
table (categorical) distribution
MCMC procedure, 5670, 5680
Table distribution
definition of (MCMC), 5711
MCMC procedure, 5711
theophylline data
examples, MCMC, 5898
truncated distributions
MCMC procedure, 5719
tuning
MCMC procedure, 5694
UDS statement
MCMC procedure, 5687
uniform distribution
definition of (MCMC), 5711
MCMC procedure, 5670, 5680, 5711
user defined sampler statement
MCMC procedure, 5687
user-defined distribution
MCMC procedure, 5669, 5680
user-defined samplers
MCMC procedure, 5873
using the IF-ELSE logical control
MCMC procedure, 5785
Wald distribution
definition of (MCMC), 5711
MCMC procedure, 5670, 5711
Weibull distribution
definition of (MCMC), 5712
MCMC procedure, 5670, 5712
WinBUGS specification of the gamma distribution
MCMC procedure, 5747
Syntax Index
ACCEPTTOL= option
PROC MCMC statement, 5647
ALG= option
PROC MCMC statement, 5647
ALGORITHM= option
RANDOM statement (MCMC), 5682
ARRAY statement
MCMC procedure, 5662
AUTOCORLAG= option
PROC MCMC statement, 5648
BEGINCNST statement
MCMC procedure, 5663
BEGINNODATA statement
MCMC procedure, 5664
BEGINPRIOR statement
MCMC procedure, 5664
BINARYJOINT option
PROC MCMC statement, 5648
BY statement
MCMC procedure, 5665
CENTER option
RANDOM statement, 5682
CONSTRAINT= option
RANDOM statement (MCMC), 5682
COVARIATES= option
PREDDIST statement (MCMC), 5676
DATA= option
PROC MCMC statement, 5651
DIAG= option
PROC MCMC statement, 5649
DIAGNOSTICS= option
PROC MCMC statement, 5649
DIC option
PROC MCMC statement, 5651
DISCRETE= option
PROC MCMC statement, 5648
ENDCNST statement
MCMC procedure, 5663
ENDNODATA statement
MCMC procedure, 5664
ENDPRIOR statement
MCMC procedure, 5664
HYPERPRIOR statement
MCMC procedure, 5676
ICOND= option
MODEL statement(MCMC), 5671
RANDOM statement(MCMC), 5683
INF= option
PROC MCMC statement, 5651
INIT= option
PROC MCMC statement, 5651
INITIAL= option
MODEL statement(MCMC), 5672
RANDOM statement(MCMC), 5684
ISTATES= option
RANDOM statement(MCMC), 5683
JOINTMODEL option
PROC MCMC statement, 5652
LIST option
PROC MCMC statement, 5652
LISTCODE option
PROC MCMC statement, 5652
MAXINDEXPRINT= option
PROC MCMC statement, 5652
MAXSUBVALUEPRINT= option
PROC MCMC statement, 5652
MAXTUNE= option
PROC MCMC statement, 5652
MCHISTORY= option
PROC MCMC statement, 5653
MCMC procedure, 5645
ARRAY statement, 5662
BEGINCNST statement, 5663
BEGINNODATA statement, 5664
BEGINPRIOR statement, 5664
ENDCNST statement, 5663
ENDNODATA statement, 5664
ENDPRIOR statement, 5664
HYPERPRIOR statement, 5676
MODEL statement, 5665
PARMS statement, 5674
PRED statement, 5675
PREDDIST statement, 5675
PRIOR statement, 5676
syntax, 5645
MCMC procedure, ARRAY statement, 5662
MCMC procedure, BEGINCNST statement, 5663
MCMC procedure, BEGINNODATA statement, 5664
MCMC procedure, BEGINPRIOR statement, 5664
MCMC procedure, BY statement, 5665
MCMC procedure, ENDCNST statement, 5663
MCMC procedure, ENDNODATA statement, 5664
MCMC procedure, ENDPRIOR statement, 5664
MCMC procedure, HYPERPRIOR statement, 5676
MCMC procedure, MODEL statement, 5665
ICOND= option, 5671
INITIAL= option, 5672
MONITOR= option, 5672
NAMESUFFIX= option, 5673
NOOUTPOST option, 5674
MCMC procedure, PARMS statement, 5674
NORMAL option, 5675
SLICE option, 5675
T option, 5675
UDS option, 5675
MCMC procedure, PRED statement, 5675
MCMC procedure, PREDDIST statement, 5675
COVARIATES= option, 5676
NSIM= option, 5676
OUTPRED= option, 5676
SAVEPARM option, 5676
STATISTICS= option, 5676
STATS= option, 5676
MCMC procedure, PRIOR statement, 5676
MCMC procedure, PROC MCMC statement
ACCEPTTOL= option, 5647
ALG= option, 5647
AUTOCORLAG= option, 5648
BINARYJOINT option, 5648
DATA= option, 5651
DIAG= option, 5649
DIAGNOSTICS= option, 5649
DIC option, 5651
DISCRETE= option, 5648
INF= option, 5651
INIT= option, 5651
JOINTMODEL option, 5652
LIST option, 5652
LISTCODE option, 5652
MAXINDEXPRINT= option, 5652
MAXSUBVALUEPRINT= option, 5652
MAXTUNE= option, 5652
MCHISTORY= option, 5653
MINTUNE= option, 5653
MISSING= option, 5653
MONITOR= option, 5654
NBI= option, 5654
NMC= option, 5654
NOLOGDIST option, 5654
NTHREADS= option, 5654
NTU= option, 5654
OUTPOST=option, 5654
PLOTS= option, 5655
PROPCOV= option, 5657
PROPDIST= option, 5647
REOBSINFO option, 5658
SCALE option, 5660
SEED option, 5660
SIMREPORT= option, 5660
SINGDEN= option, 5660
STATISTICS= option, 5660
STATS= option, 5660
TARGACCEPT= option, 5661
TARGACCEPTI= option, 5662
THIN= option, 5662
TRACE option, 5662
TUNEWT= option, 5662
MCMC procedure, Programming statements
ABORT statement, 5678
CALL statement, 5678
DELETE statement, 5678
DO statement, 5678
GOTO statement, 5678
IF statement, 5678
LINK statement, 5678
PUT statement, 5678
RETURN statement, 5678
SELECT statement, 5678
STOP statement, 5678
SUBSTR statement, 5678
WHEN statement, 5678
MCMC procedure, RANDOM statement, 5679
ALGORITHM= option, 5682
CENTER option, 5682
CONSTRAINT= option, 5682
ICOND= option, 5683
INITIAL= option, 5684
ISTATES= option, 5683
MONITOR= option, 5685
NAMESUFFIX= option, 5686
NOCENTER option, 5682
NOOUTPOST option, 5686
SUBJECT= option, 5679
ZERO= option, 5682
MINTUNE= option
PROC MCMC statement, 5653
MISSING= option
PROC MCMC statement, 5653
MODEL statement
MCMC procedure, 5665
MONITOR= option
MODEL statement, 5672
PROC MCMC statement, 5654
RANDOM statement, 5685
NAMESUFFIX= option
MODEL statement, 5673
RANDOM statement (MCMC), 5686
NBI= option
PROC MCMC statement, 5654
NMC= option
PROC MCMC statement, 5654
NOCENTER option
RANDOM statement, 5682
NOLOGDIST option
PROC MCMC statement, 5654
NOOUTPOST option
MODEL statement, 5674
RANDOM statement (MCMC), 5686
NORMAL option
PARMS statement, 5675
NSIM= option
PREDDIST statement (MCMC), 5676
NTHREADS= option
PROC MCMC statement, 5654
NTU= option
PROC MCMC statement, 5654
OUTPOST= option
PROC MCMC statement, 5654
OUTPRED= option
PREDDIST statement (MCMC), 5676
PARMS statement
MCMC procedure, 5674
PLOTS= option
PROC MCMC statement, 5655
PRED statement
MCMC procedure, 5675
PREDDIST statement
MCMC procedure, 5675
PRIOR statement
MCMC procedure, 5676
PROPCOV=method
PROC MCMC statement, 5657
PROPDIST= option
PROC MCMC statement, 5647
RANDOM statement
MCMC procedure, 5679
REOBSINFO option
PROC MCMC statement, 5658
SAVEPARM option
PREDDIST statement (MCMC), 5676
SCALE option
PROC MCMC statement, 5660
SEED option
PROC MCMC statement, 5660
SIMREPORT= option
PROC MCMC statement, 5660
SINGDEN= option
PROC MCMC statement, 5660
SLICE option
PARMS statement, 5675
STATISTICS= option
PREDDIST statement (MCMC), 5676
PROC MCMC statement, 5660
STATS= option
PREDDIST statement (MCMC), 5676
PROC MCMC statement, 5660
SUBJECT= option
RANDOM statement(MCMC), 5679
T option
PARMS statement, 5675
TARGACCEPT= option
PROC MCMC statement, 5661
TARGACCEPTI= option
PROC MCMC statement, 5662
THIN= option
PROC MCMC statement, 5662
TRACE option
PROC MCMC statement, 5662
TUNEWT= option
PROC MCMC statement, 5662
UDS option
PARMS statement, 5675
ZERO= option
RANDOM statement (MCMC), 5682
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement