SAS/STAT® 9.22 User's Guide
The PLM Procedure
(Book Excerpt)
SAS® Documentation
This document is an individual chapter from SAS/STAT® 9.22 User’s Guide.
The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT® 9.22 User’s
Guide. Cary, NC: SAS Institute Inc.
Copyright © 2010, SAS Institute Inc., Cary, NC, USA
All rights reserved. Produced in the United States of America.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at
the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation
by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19,
Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st electronic book, May 2010
SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to
its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the
SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Chapter 66
The PLM Procedure
Contents

Overview: PLM Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5408
    Basic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5408
    PROC PLM Contrasted with Other SAS Procedures . . . . . . . . . . . . . . . .  5409
Getting Started: PLM Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 5410
Syntax: PLM Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5417
    PROC PLM Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5418
    EFFECTPLOT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5421
    ESTIMATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5422
    FILTER Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5423
    LSMEANS Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5425
    LSMESTIMATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5426
    SCORE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5427
    SHOW Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5429
    SLICE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5431
    TEST Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5431
    WHERE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5432
Details: PLM Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5434
    BY Processing and the PLM Procedure . . . . . . . . . . . . . . . . . . . . .  5434
    Analysis Based on Posterior Estimates . . . . . . . . . . . . . . . . . . . .  5434
    User-Defined Formats and the PLM Procedure . . . . . . . . . . . . . . . . . . 5436
    ODS Table Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5437
    ODS Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5438
Examples: PLM Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5438
    Example 66.1: Scoring with PROC PLM . . . . . . . . . . . . . . . . . . . . .  5438
    Example 66.2: Working with Item Stores . . . . . . . . . . . . . . . . . . . . 5440
    Example 66.3: Group Comparisons in Ordinal Model . . . . . . . . . . . . . . . 5442
    Example 66.4: Posterior Inference for Binomial Data . . . . . . . . . . . . .  5444
    Example 66.5: By-Group Processing . . . . . . . . . . . . . . . . . . . . . .  5449
    Example 66.6: Comparing Multiple B-Splines . . . . . . . . . . . . . . . . . . 5454
    Example 66.7: Linear Inference with Arbitrary Estimates . . . . . . . . . . .  5460
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5463
Overview: PLM Procedure
The PLM procedure performs postfitting statistical analyses for the contents of a SAS item store
that was previously created with the STORE statement in some other SAS/STAT procedure. An
item store is a special SAS-defined binary file format used to store and restore information with a
hierarchical structure.
The statements available in the PLM procedure are designed to reveal the contents of the source item
store via the Output Delivery System (ODS) and to perform postfitting tasks such as the following:
testing hypotheses
computing confidence intervals
producing prediction plots
scoring a new data set
The use of item stores and PROC PLM enables you to separate common postprocessing tasks, such
as testing for treatment differences and predicting new observations under a fitted model, from
the process of model building and fitting. A numerically expensive model fitting technique can be
applied once to produce a source item store. The PLM procedure can then be called multiple times
and the results of the fitted model analyzed without incurring the model fitting expenditure again.
The PLM procedure offers the most advanced postprocessing techniques available in SAS/STAT
software. These techniques include step-down multiplicity adjustments for p-values, F tests with
order restrictions, analysis of means (ANOM), and sampling-based linear inference based on Bayes
posterior estimates.
The following procedures support the STORE statement for the generation of item stores that
can be processed with the PLM procedure: GENMOD, GLIMMIX, GLM, LOGISTIC, MIXED,
ORTHOREG, PHREG, SURVEYLOGISTIC, SURVEYPHREG, and SURVEYREG. For details
about the STORE statement, see the section “STORE Statement” on page 529 of Chapter 19, “Shared
Concepts and Topics.”
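As a sketch of the fit-once, postprocess-many-times workflow (the data set, variable, and store names here are hypothetical, not from this chapter), a model might be fit and stored once and then reused without refitting:

```sas
/* Fit once and save the model context and results to an item store.
   Work.Clinic, y, trt, and Work.NewPatients are hypothetical names. */
proc glimmix data=Work.Clinic;
   class trt;
   model y = trt;
   store sasuser.ClinicModel;
run;

/* Later, possibly in a different SAS session: postprocess without refitting */
proc plm source=sasuser.ClinicModel;
   lsmeans trt / diff;                                   /* treatment comparisons  */
   score data=Work.NewPatients out=Work.Scored predicted; /* score new observations */
run;
```

Because the item store is saved in the SASUSER library, it survives the end of the SAS session that created it.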
Basic Features
The PLM procedure, unlike most SAS/STAT procedures, does not operate primarily on an input data
set. Instead, the procedure requires you to specify an item store with the SOURCE= option in the
PROC PLM statement. The item store contains the necessary information and context about the
statistical model that was fit when the store was created. SAS data sets are used only to provide input
information in some circumstances, such as when scoring a data set or computing least squares
means with specially defined population margins. In other words, instead of reading raw data and
fitting a model, the PLM procedure reads the results of a model that has already been fit.
In order to interact with the item store and to reveal its contents, the PLM procedure supports the
SHOW statement, which converts item store information into standard ODS tables for viewing and
further processing.
The PLM procedure is sensitive to the contents of the item store. For example, if a BAYES statement
was in effect when the item store was created, the posterior parameter estimates are saved to the item
store so that the PLM procedure can perform postprocessing tasks by taking the posterior distribution
of estimable functions into account. As another example, for item stores that are generated by
a mixed model procedure using the Satterthwaite or Kenward-Roger (Kenward and Roger 1997)
degrees-of-freedom method, these methods continue to be available when the item store contents are
processed with the PLM procedure.
Because the PLM procedure does not read data and does not fit a model, the processing time of this
procedure is usually considerably less than the processing time of the procedure that generates the
item store.
PROC PLM Contrasted with Other SAS Procedures
In contrast to other analytic procedures in SAS/STAT software, the PLM procedure does not use an
input data set. Instead, it retrieves information from an item store.
Some of the statements in the PLM procedure are also available as postprocessing statements in other
procedures. Table 66.1 lists SAS/STAT procedures that support the same postprocessing statements
as PROC PLM does.
Table 66.1  SAS/STAT Procedures with Postprocessing Statements Similar to PROC PLM

[Table 66.1 cross-tabulates the procedures GENMOD, GLIMMIX, GLM, LOGISTIC, MIXED,
ORTHOREG, PHREG, SURVEYLOGISTIC, SURVEYPHREG, and SURVEYREG against the
EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, SLICE, and TEST statements; its
check-mark entries are not reproduced here.]

Table entries marked with one style of check mark indicate procedures that support statements with
the same functionality as in PROC PLM. Entries marked with the second style of check mark indicate
procedures that support statements with the same names but different syntax from PROC PLM. You
can find the most comprehensive set of features for these statements in the PLM procedure. For
example, the LSMEANS statement is available in all of the listed procedures. In contrast, the
ESTIMATE statement available in the GENMOD, GLIMMIX, GLM, and MIXED procedures does
not support all options that PROC PLM supports, such as multiple rows and multiplicity adjustments.
The WHERE statement in other procedures enables you to conditionally select a subset of the
observations from the input data set so that the procedure processes only the observations that meet
the specified conditions. Since the PLM procedure does not use an input data set, the WHERE
statement in the PLM procedure has different functionality. If the item store contains information
about By groups—that is, a BY statement was in effect when the item store was created—you can use
the WHERE statement to select specific BY groups for the analysis. You can also use the FILTER
statement in the PLM procedure to filter results from the ODS output and output data sets.
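As an illustration of the distinction (the store name here is hypothetical, and the FILTER expression assumes the ODS output contains a Probt column), the WHERE statement selects BY groups while the FILTER statement prunes result rows:

```sas
/* Assumes an item store created while BY sex was in effect */
proc plm source=sasuser.ByGroupModel;
   where sex = 'F';       /* selects a BY group, not individual observations */
   filter probt < 0.05;   /* keeps only rows with p-values below 0.05 in the output */
   test a b;
run;
```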
Getting Started: PLM Procedure
The following DATA step creates a data set from a randomized block experiment with a factorial
treatment structure of factors A and B:
data BlockDesign;
   input block a b y @@;
   datalines;
1 1 1 56  1 1 2 41
1 2 1 50  1 2 2 36
1 3 1 39  1 3 2 35
2 1 1 30  2 1 2 25
2 2 1 36  2 2 2 28
2 3 1 33  2 3 2 30
3 1 1 32  3 1 2 24
3 2 1 31  3 2 2 27
3 3 1 15  3 3 2 19
4 1 1 30  4 1 2 25
4 2 1 35  4 2 2 30
4 3 1 17  4 3 2 18
;
The GLM procedure is used in the following statements to fit the model and to create a source item
store for the PLM procedure:
proc glm data=BlockDesign;
   class block a b;
   model y = block a b a*b / solution;
   store sasuser.BlockAnalysis / label='PLM: Getting Started';
run;
The CLASS statement identifies the variables Block, A, and B as classification variables. The
MODEL statement specifies the response variable and the model effects. The block effect models
the design effect, and the a, b, and a*b effects model the factorial treatment structure. The STORE
statement requests that the context and results of this analysis be saved to an item store named
sasuser.BlockAnalysis. Because the SASUSER library is specified as the library name of the item
store, the store will be available after the SAS session completes. The optional label in the STORE
statement identifies the store in subsequent analyses with the PLM procedure.
Note that having BlockDesign as the name of the output store would not create a conflict with the
input data set name, because data sets and item stores are saved as files of different types.
Figure 66.1 displays the results from the GLM procedure. The “Class Level Information” table
shows the number of levels and their values for the three classification variables. The “Parameter
Estimates” table shows the estimates and their standard errors along with t tests.
Figure 66.1 Class Variable Information, Fit Statistics, and Parameter Estimates
                              The GLM Procedure

                          Class Level Information

                    Class        Levels    Values
                    block             4    1 2 3 4
                    a                 3    1 2 3
                    b                 2    1 2

          R-Square     Coeff Var      Root MSE        y Mean
          0.848966      15.05578      4.654747      30.91667

                                                Standard
   Parameter             Estimate                  Error    t Value    Pr > |t|
   Intercept          20.41666667 B           2.85043856       7.16      <.0001
   block      1       17.00000000 B           2.68741925       6.33      <.0001
   block      2        4.50000000 B           2.68741925       1.67      0.1148
   block      3       -1.16666667 B           2.68741925      -0.43      0.6704
   block      4        0.00000000 B                    .          .           .
   a          1        3.25000000 B           3.29140294       0.99      0.3391
   a          2        4.75000000 B           3.29140294       1.44      0.1695
   a          3        0.00000000 B                    .          .           .
   b          1        0.50000000 B           3.29140294       0.15      0.8813
   b          2        0.00000000 B                    .          .           .
   a*b        1 1      7.75000000 B           4.65474668       1.66      0.1167
   a*b        1 2      0.00000000 B                    .          .           .
   a*b        2 1      7.25000000 B           4.65474668       1.56      0.1402
   a*b        2 2      0.00000000 B                    .          .           .
   a*b        3 1      0.00000000 B                    .          .           .
   a*b        3 2      0.00000000 B                    .          .           .
The following statements invoke the PLM procedure and use sasuser.BlockAnalysis as the source
item store:
proc plm source=sasuser.BlockAnalysis;
run;
These statements produce Figure 66.2. The “Store Information” table displays information that is
gleaned from the source item store. For example, the store was created by the GLM procedure at the
indicated time and date, and the input data set for the analysis was WORK.BLOCKDESIGN. The
label used earlier in the STORE statement of the GLM procedure also appears as a descriptor in
Figure 66.2.
Figure 66.2 Default Information
                           The PLM Procedure

                          Store Information

          Item Store               SASUSER.BLOCKANALYSIS
          Label                    PLM: Getting Started
          Data Set Created From    WORK.BLOCKDESIGN
          Created By               PROC GLM
          Date Created             13JAN10:12:58:48
          Response Variable        y
          Class Variables          block a b
          Model Effects            Intercept block a b a*b

                       Class Level Information

                    Class        Levels    Values
                    block             4    1 2 3 4
                    a                 3    1 2 3
                    b                 2    1 2
The “Store Information” table also echoes partial information about the variables and model effects
that are used in the analysis. The “Class Level Information” table is produced by the PLM procedure
by default whenever the model contains effects that depend on CLASS variables.
The following statements request a display of the fit statistics and the parameter estimates from the
source item store and a test of the treatment main effects and their interactions:
proc plm source=sasuser.BlockAnalysis;
show fit parms;
test a b a*b;
run;
The statements produce Figure 66.3. Notice that the estimates and standard errors in the “Parameter
Estimates” table agree with the results displayed earlier by the GLM procedure, except for small
differences in formatting.
Figure 66.3 Fit Statistics, Parameter Estimates, and Tests of Effects
                           The PLM Procedure

                            Fit Statistics

                       MSE                21.66667
                       Error df                 15

                          Parameter Estimates

                                                   Standard
   Effect        block    a    b    Estimate          Error
   Intercept                         20.4167         2.8504
   block             1               17.0000         2.6874
   block             2                4.5000         2.6874
   block             3               -1.1667         2.6874
   block             4                     0              .
   a                      1           3.2500         3.2914
   a                      2           4.7500         3.2914
   a                      3                0              .
   b                           1      0.5000         3.2914
   b                           2           0              .
   a*b                    1    1      7.7500         4.6547
   a*b                    1    2           0              .
   a*b                    2    1      7.2500         4.6547
   a*b                    2    2           0              .
   a*b                    3    1           0              .
   a*b                    3    2           0              .

                    Type III Tests of Model Effects

                        Num      Den
          Effect         DF       DF     F Value     Pr > F
          a               2       15        7.54     0.0054
          b               1       15        8.38     0.0111
          a*b             2       15        1.74     0.2097
Since the main effects, but not the interaction, are significant in this experiment, the subsequent
analysis focuses on the main effects, in particular on the effect of variable A.
The following statements request the least squares means of the A effect along with their pairwise
differences:
proc plm source=sasuser.BlockAnalysis seed=3;
   lsmeans a / diff;
   lsmestimate a -1  1,
                  1  1 -2 / uppertailed ftest;
run;
The LSMESTIMATE statement tests two linear combinations of the A least squares means: equality
of the first two levels and whether the sum of the first two level effects equals twice the effect of
the third level. The FTEST option in the LSMESTIMATE statement requests a joint F test for
this two-row contrast. The UPPERTAILED option requests that the F test also be carried out under
one-sided order restrictions. Since F tests under order restrictions (chi-bar-square statistic) require a
simulation-based approach for the calculation of p-values, the random number stream is initialized
with a known seed value through the SEED= option in the PROC PLM statement.
The results of the LSMEANS and the LSMESTIMATE statement are shown in Figure 66.4.
Figure 66.4 LS-Means Related Inference for A Effect
                           The PLM Procedure

                        a Least Squares Means

                          Standard
   a        Estimate        Error      DF    t Value    Pr > |t|
   1         32.8750       1.6457      15      19.98      <.0001
   2         34.1250       1.6457      15      20.74      <.0001
   3         25.7500       1.6457      15      15.65      <.0001

                 Differences of a Least Squares Means

                               Standard
   a    _a     Estimate           Error      DF    t Value    Pr > |t|
   1     2      -1.2500          2.3274      15      -0.54      0.5991
   1     3       7.1250          2.3274      15       3.06      0.0079
   2     3       8.3750          2.3274      15       3.60      0.0026

                    Least Squares Means Estimates

                                 Standard
   Effect   Label     Estimate      Error    DF   t Value   Tails    Pr > t
   a        Row 1       1.2500     2.3274    15      0.54   Upper    0.2995
   a        Row 2      15.5000     4.0311    15      3.85   Upper    0.0008

                F Test for Least Squares Means Estimates

               Num    Den                          ChiBarSq        Pr >
   Effect       DF     DF    F Value    Pr > F        Value    ChiBarSq
   a             2     15       7.54    0.0054        15.07      0.0001
The least squares means for the three levels of variable A are 32.875, 34.125, and 25.75. The
differences between the third level and the first and second levels are statistically significant at the
5% level (p-values of 0.0079 and 0.0026, respectively). There is no significant difference between
the first two levels. The first row of the “Least Squares Means Estimates” table also displays the
difference between the first two levels of factor A. Although the (absolute value of the) estimate
and its standard error are identical to those in the “Differences of a Least Squares Means” table, the
p-values do not agree because one-sided tests were requested in the LSMESTIMATE statement.
The “F Test” table in Figure 66.4 shows the two-degrees-of-freedom test for the linear combinations
of the LS-means. The F value of 7.54 with p-value of 0.0054 represents the usual (two-sided) F
test. Under the one-sided right-tailed order restriction imposed by the UPPERTAILED option, the
ChiBarSq value of 15.07 represents the observed value of the chi-bar-square statistic of Silvapulle
and Sen (2004). The associated p-value of 0.0001 was obtained by simulation.
Now suppose that you are interested in analyzing the relationship of the interaction cell means.
(Typically this would not be the case in this example since the a*b interaction is not significant; see
Figure 66.3.) The SLICE statement in the following PROC PLM run produces an F test of equality
and all pair-wise differences of the interaction means for the subset (partition) where variable B is at
level ‘1’. With the ODS GRAPHICS ON statement, the pairwise differences are also visualized in a
diffogram by default.
ods graphics on;
proc plm source=sasuser.BlockAnalysis;
slice a*b / sliceby(b='1') diff;
run;
ods graphics off;
The results are shown in Figure 66.5. Since variable A has three levels, the test of equality of the
A means at level ‘1’ of B is a two-degrees-of-freedom comparison. This comparison is statistically
significant (p-value of 0.0040). You can conclude that the three levels of A are not the same for the
first level of B.
Figure 66.5 Results from Analyzing an Interaction Partition
                           The PLM Procedure

              F Test for a*b Least Squares Means Slice

                     Num    Den
          Slice       DF     DF    F Value    Pr > F
          b 1          2     15       8.18    0.0040

             Simple Differences of a*b Least Squares Means

                                     Standard
   Slice   a    _a     Estimate         Error    DF   t Value   Pr > |t|
   b 1     1     2      -1.0000        3.2914    15     -0.30     0.7654
   b 1     1     3      11.0000        3.2914    15      3.34     0.0045
   b 1     2     3      12.0000        3.2914    15      3.65     0.0024
The table of “Simple Differences” was produced by the DIFF option in the SLICE statement. As is
the case with the marginal comparisons in Figure 66.4, there are significant differences against the
third level of A if variable B is held fixed at ‘1’.
Figure 66.6 shows the diffogram that displays the three pairwise least squares mean differences and
their significance. Each line segment corresponds to a comparison; it is centered at the pair of least
squares means, and its length corresponds to the projected width of a confidence interval for the
difference. When variable B is held fixed at ‘1’, the first two levels are both significantly different
from the third level, but the difference between the first and the second level is not significant.
Figure 66.6 LS-Means Difference Diffogram
Syntax: PLM Procedure
You can specify the following statements in the PLM procedure:
PROC PLM SOURCE=item-store-specification < options > ;
EFFECTPLOT < plot-type < (plot-definition-options) > > < / options > ;
ESTIMATE < ‘label’ > estimate-specification < (divisor =n) >
< , . . . < ‘label’ > estimate-specification < (divisor =n) > > < / options > ;
FILTER expression ;
LSMEANS < model-effects > < / options > ;
LSMESTIMATE model-effect < ‘label’ > values < divisor =n >
< , . . . < ‘label’ > values < divisor =n > > < / options > ;
SCORE DATA=SAS-data-set < OUT=SAS-data-set >
< keyword< =name > >. . .
< keyword< =name > > < / options > ;
SHOW options ;
SLICE model-effect < / options > ;
TEST < model-effects > < / options > ;
WHERE expression ;
With the exception of the PROC PLM statement and the FILTER statement, any statement can
appear multiple times and in any order. The default order in which the statements are processed
by the PLM procedure depends on the specification in the item store and can be modified with the
STMTORDER= option in the PROC PLM statement.
In contrast to many other SAS/STAT modeling procedures, the PLM procedure does not have
common modeling statements such as the CLASS and MODEL statements. This is because the
information about classification variables and model effects is contained in the source item store that
is passed to the procedure in the PROC PLM statement. All subsequent statements are checked for
consistency with the stored model. For example, the statement
lsmeans c / diff;
is detected as not valid unless all of the following conditions were true at the time when the source
store was created:
The effect C was used in the model.
C was specified in the CLASS statement.
The CLASS variables in the model had a GLM parameterization.
The FILTER, SCORE, SHOW, and WHERE statements are described in full after the PROC PLM
statement in alphabetical order. The EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE,
SLICE, and TEST statements are also used by many other procedures. Summary descriptions
of functionality and syntax for these statements are also given after the PROC PLM statement in
alphabetical order, but full documentation about them is available in Chapter 19, “Shared Concepts
and Topics.”
PROC PLM Statement
PROC PLM SOURCE=item-store-specification < options > ;
The PROC PLM statement invokes the procedure. The SOURCE= option with an item-store-specification is required.
You can specify the following options:
ALPHA=α
specifies the nominal significance level for multiplicity corrections and for the construction
of confidence intervals. The value of α must be between 0 and 1. The default is the value
specified in the source item store, or 0.05 if the item store does not provide a value. The
confidence level based on α is 1 − α.
DDFMETHOD=RESIDUAL | RES | ERROR
DDFMETHOD=NONE
DDFMETHOD=KENROG | KR | KENWARDROGER
DDFMETHOD=SATTERTH | SAT | SATTERTHWAITE
specifies the method for determining denominator degrees of freedom for tests and confidence
intervals. The default degree-of-freedom method is determined by the contents of the item
store. You can override the default to some extent with the DDFMETHOD= option.
If you choose DDFMETHOD=NONE, then infinite denominator degrees of freedom are
assumed for tests and confidence intervals. This essentially produces z tests and intervals
instead of t tests and intervals and chi-square tests instead of F tests.
The KENWARDROGER and SATTERTHWAITE methods require that the source item store
contain information about these methods. This information is currently available for item
stores that were created with the MIXED or GLIMMIX procedures when the appropriate
DDFM= option was in effect.
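For example, the following sketch (the item store name is hypothetical) overrides the stored degrees-of-freedom method so that large-sample inference is used throughout:

```sas
/* z tests and intervals instead of t tests; chi-square tests instead of F tests */
proc plm source=sasuser.MixedModel ddfmethod=none;
   lsmeans trt / diff;   /* trt is assumed to be a CLASS effect in the stored model */
run;
```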
ESTEPS=ε
specifies the tolerance value ε used in determining the estimability of linear functions. The
default value is determined by the contents of the source item store; it is usually 1E−4.
FORMAT=NOLOAD | RELOAD
specifies how the PLM procedure handles user-defined formats, which are not permanent.
When the item store is created, user-defined formats are stored. When the PLM procedure
opens an item store, these formats are loaded by default. If the format already exists in your
SAS session, this operation amounts to a reloading of the format (FORMAT=RELOAD) that
replaces the existing format.
With FORMAT=NOLOAD, you prevent the PLM procedure from reloading the format from
the item store. As a consequence, PLM statements might fail if a format was present at the item
store creation and is not available in your SAS session. Also, if you modify the format that was
used in the item store creation and use FORMAT=NOLOAD, you might obtain unexpected
results because levels of classification variables are remapped.
The “Class Level Information” table always displays the formatted values of classification
variables that were used in fitting the model, regardless of the FORMAT= option. For more
details about using formats with the PLM procedure, see “User-Defined Formats and the PLM
Procedure” on page 5436.
MAXLEN=n
determines the maximum length of informational strings in the “Store Information” table. This
table displays, for example, lists of classification or BY variables and lists of model effects.
The value of n determines the truncation length for these strings. The minimum and maximum
values for n are 20 and 256, respectively. The default is n = 100.
NOCLPRINT< =number >
suppresses the display of the “Class Level Information” table if you do not specify number. If
you specify number, only levels with totals that are less than number are listed in the table.
The PLM procedure produces the “Class Level Information” table by default when the model
contains effects that depend on classification variables.
NOINFO
suppresses the display of the “Store Information” table.
NOPRINT
suppresses the generation of tabular and graphical output. When the NOPRINT option is in
effect, ODS tables are also not produced.
PERCENTILES=value-list
PERCENTILE=value-list
supplies a list of percentiles for the construction of highest posterior density (HPD) intervals
when the PLM procedure performs a sampling-based analysis (for example, when processing
an item store that contains posterior parameter estimates from a Bayesian analysis). The
default set of percentiles depends on the contents of the source item store; it is typically
PERCENTILES=25, 50, 75. The entries in value-list must be strictly between 0 and 100.
PLOTS < (global-plot-option) > < =specific-plot-options >
requests that the PLM procedure produce statistical graphics via the Output Delivery System,
provided that the ODS GRAPHICS ON statement has been specified. For general information
about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” You can request
statistical graphics in the EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, and
SLICE statements. For information about these plots, see the corresponding sections of
Chapter 19, “Shared Concepts and Topics.”
Global Plot Option
The following global-plot-option applies to all plots produced by PROC PLM.
UNPACKPANEL
UNPACK
breaks a graphic that is otherwise paneled into individual component plots.
Specific Plot Options
You can specify the following specific-plot-options:
ALL
requests that all the appropriate plots be produced.
NONE
suppresses all plots.
SEED=number
specifies the random number seed for analyses that depend on a random number stream.
You can also specify the random number seed through some PLM statements (for example,
through the SEED= options in the ESTIMATE, LSMEANS, and LSMESTIMATE statements).
However, note that there is only a single random number stream per procedure run. Specifying
the SEED= option in the PROC PLM statement initializes the stream for all subsequent
statements. If you do not specify a random number seed, the source item store might supply
one for you. If a seed is in effect when the PLM procedure opens the source store, the “Store
Information” table displays its value.
If the random number seed is less than or equal to zero, the seed is generated from reading the
time of day from the computer clock and a log message indicates the chosen seed value.
SINGCHOL=number
tunes the singularity criterion in Cholesky decompositions. The default value depends on the
contents of the source item store. The default value is typically 1E4 times the machine epsilon;
this product is approximately 1E−12 on most computers.
SINGRES=number
sets the tolerance for which the residual variance or scale parameter is considered to be
zero. The default value depends on the contents of the source item store. The default value
is typically 1E4 times the machine epsilon; this product is approximately 1E−12 on most
computers.
SINGULAR=number
tunes the general singularity criterion applied by the PLM procedure in divisions and inversions.
The default value used by the PLM procedure depends on the contents of the item store. The
default value is typically 1E4 times the machine epsilon; this product is approximately 1E−12
on most computers.
SOURCE=item-store-specification
RESTORE=item-store-specification
specifies the source item store for processing. This option is required because, in contrast to
SAS data sets, there is no default item store. An item-store-specification consists of a one- or
two-level name as with SAS data sets. As with data sets, the default library association of an
item store is with the WORK library, and any stores created in this library are deleted when
the SAS session concludes.
STMTORDER=SYNTAX | GROUP
STMT=SYNTAX | GROUP
affects the order in which statements are grouped during processing. The default behavior
depends on the contents of the source item store and can be modified with the STMTORDER=
option. If STMTORDER=SYNTAX is in effect, the statements are processed in the order
in which they appear. Note that this precludes the hierarchical grouping of ODS objects.
If STMTORDER=GROUP is in effect, the statements are processed in groups and in the
following order: SHOW, TEST, LSMEANS, SLICE, LSMESTIMATE, ESTIMATE, and
SCORE.
WHEREFORMAT
specifies that the constants (literals) specified in WHERE expressions for group selection are
in terms of the formatted values of the BY variables. By default, WHERE expressions are
specified in terms of the unformatted (raw) values of the BY variables, as in the SAS DATA
step.
ZETA=number
tunes the sensitivity in forming Type III functions. Any element in the estimable function basis
with an absolute value less than number is set to 0. The default depends on the contents of the
source item store; it usually is 1E−8.
EFFECTPLOT Statement
EFFECTPLOT < plot-type < (plot-definition-options) > > < / options > ;
The EFFECTPLOT statement produces a display of the fitted model and provides options for
changing and enhancing the displays. Table 66.2 describes the available plot-types and their plot-definition-options.
Table 66.2 Plot-Types and Plot-Definition-Options
Description
Plot-Definition-Options
BOX plot-type
Displays a box plot of continuous response data at each
level of a CLASS effect, with predicted values
superimposed and connected by a line. This is an
alternative to the INTERACTION plot-type.
PLOTBY= variable or CLASS effect
X= CLASS variable or effect
CONTOUR plot-type
Displays a contour plot of predicted values against two
continuous covariates.
PLOTBY= variable or CLASS effect
X= continuous variable
Y= continuous variable
FIT plot-type
Displays a curve of predicted values versus a
continuous variable.
PLOTBY= variable or CLASS effect
X= continuous variable
INTERACTION plot-type
Displays a plot of predicted values (possibly with error
bars) versus the levels of a CLASS effect. The
predicted values are connected with lines and can be
grouped by the levels of another CLASS effect.
PLOTBY= variable or CLASS effect
SLICEBY= variable or CLASS effect
X= CLASS variable or effect
SLICEFIT plot-type
Displays a curve of predicted values versus a
continuous variable grouped by the levels of a
CLASS effect.
PLOTBY= variable or CLASS effect
SLICEBY= variable or CLASS effect
X= continuous variable
For full details about the syntax and options of the EFFECTPLOT statement, see the section
“EFFECTPLOT Statement” on page 436 of Chapter 19, “Shared Concepts and Topics.”
ESTIMATE Statement
ESTIMATE < ‘label’ > estimate-specification < (divisor=n) >
< , . . . < ‘label’ > estimate-specification < (divisor=n) > >
< / options > ;
The ESTIMATE statement provides a mechanism for obtaining custom hypothesis tests. Estimates
are formed as linear estimable functions of the form Lβ. You can perform hypothesis tests for the
estimable functions, construct confidence limits, and obtain specific nonlinear transformations.
Table 66.3 summarizes important options in the ESTIMATE statement.
Table 66.3 Important ESTIMATE Statement Options
Option
Description
Construction and Computation of Estimable Functions
DIVISOR=
Specifies a list of values to divide the coefficients
NOFILL
Suppresses the automatic fill-in of coefficients for higher-order
effects
SINGULAR=
Tunes the estimability checking difference
Degrees of Freedom and p-values
ADJUST=
Determines the method for multiple comparison adjustment of
estimates
ALPHA=α
Determines the confidence level (1 − α)
LOWER
Performs one-sided, lower-tailed inference
STEPDOWN
Adjusts multiplicity-corrected p-values further in a step-down fashion
TESTVALUE=
Specifies values under the null hypothesis for tests
UPPER
Performs one-sided, upper-tailed inference
Statistical Output
CL
Constructs confidence limits
CORR
Displays the correlation matrix of estimates
COV
Displays the covariance matrix of estimates
E
Prints the L matrix
JOINT
Produces a joint F or chi-square test for the estimable functions
PLOTS=
Requests ODS statistical graphics if the analysis is sampling-based
SEED=
Specifies the seed for computations that depend on random numbers
Generalized Linear Modeling
CATEGORY=
Specifies how to construct estimable functions with multinomial
data
EXP
Exponentiates and displays estimates
ILINK
Computes and displays estimates and standard errors on the inverse
linked scale
For details about the syntax of the ESTIMATE statement, see the section “ESTIMATE Statement”
on page 462 of Chapter 19, “Shared Concepts and Topics.”
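As a hypothetical illustration of a custom estimable function (the item store MyStore and the three-level CLASS variable a are assumptions, not part of the chapter's examples), the following statements estimate and test the difference between the first two levels of a:

```sas
proc plm source=MyStore;
   /* L coefficients for level 1 minus level 2 of a; CL adds
      confidence limits and E prints the L matrix */
   estimate 'a1 vs a2' a 1 -1 0 / cl e;
run;
```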
FILTER Statement
FILTER expression ;
The FILTER statement enables you to filter the results of the PLM procedure, specifically the contents
of ODS tables and the output data sets. There can be at most one FILTER statement per PROC
PLM run, and the filter is applied to all BY groups and to all queries generated through WHERE
expressions.
A filter expression follows the same pattern as a where-expression in the WHERE statement. The
expressions consist of operands and operators. For more information about specifying where-expressions, see the WHERE statement for the PLM procedure and SAS Language Reference:
Concepts.
Valid keywords for the formation of operands in the FILTER statement are shown in Table 66.4.
Table 66.4 Keywords for Filtering Results
Keyword
Description
Prob
Regular (unadjusted) p-values from t, F, or chi-square tests
ProbChi
Regular (unadjusted) p-values from chi-square tests
ProbF
Regular (unadjusted) p-values from F tests
ProbT
Regular (unadjusted) p-values from t tests
AdjP
Adjusted p-values
Estimate
Results displayed in “Estimates” column of ODS tables
Pred
Predicted values in SCORE output data sets
Resid
Residuals in SCORE output data sets
Std
Standard errors in ODS tables and in SCORE results
Mu
Results displayed in the “Mean” column of ODS tables (this column
is typically produced by the ILINK option)
tValue
The value of the usual t statistic
FValue
The value of the usual F statistic
Chisq
The value of the chi-square statistic
testStat
The value of the test statistic (a generic keyword for the ‘tValue’,
‘FValue’, and ‘Chisq’ tokens)
Lower
The lower confidence limit displayed in ODS tables
Upper
The upper confidence limit displayed in ODS tables
AdjLower
The adjusted lower confidence limit displayed in ODS tables
AdjUpper
The adjusted upper confidence limit displayed in ODS tables
LowerMu
The lower confidence limit for the mean displayed in ODS tables
UpperMu
The upper confidence limit for the mean displayed in ODS tables
AdjLowerMu
The adjusted lower confidence limit for the mean displayed in ODS
tables
AdjUpperMu
The adjusted upper confidence limit for the mean displayed in ODS
tables
When you write filtering expressions, be advised that filtering variables that are not used in the results
are typically set to missing values. For example, the following statements select all results (filter
nothing) because no adjusted p-values are computed:
proc plm source=MyStore;
lsmeans a / diff;
filter adjp < 0.05;
run;
If the adjusted p-values are set to missing values, the condition < 0.05 is true in each case (missing
values always compare smaller than the smallest nonmissing value).
See “Example 66.6: Comparing Multiple B-Splines” on page 5454 for an example of using the
FILTER statement.
Filtering results has no effect on the item store contents that are displayed with the SHOW statement.
However, BY-group selection with the WHERE statement can limit the amount of information that is
displayed by the SHOW statements.
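As another hedged sketch (the item store MyStore is hypothetical), a filter on the ProbF keyword retains only those rows of the TEST statement's output whose unadjusted F-test p-values are significant at the 0.05 level:

```sas
proc plm source=MyStore;
   test;
   /* keep only rows with F-test p-value below 0.05 */
   filter probf < 0.05;
run;
```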
LSMEANS Statement
LSMEANS < model-effects > < / options > ;
The LSMEANS statement computes and compares least squares means (LS-means) of fixed effects.
LS-means are predicted population margins—that is, they estimate the marginal means over a
balanced population. In a sense, LS-means are to unbalanced designs as class and subclass arithmetic
means are to balanced designs.
Table 66.5 summarizes important options in the LSMEANS statement.
Table 66.5 Important LSMEANS Statement Options
Option
Description
Construction and Computation of LS-Means
AT
Modifies the covariate value in computing LS-means
BYLEVEL
Computes separate margins
DIFF
Requests differences of LS-means
OM=
Specifies the weighting scheme for LS-means computation as determined by the input data set
SINGULAR=
Tunes estimability checking
Degrees of Freedom and p-values
ADJUST=
Determines the method for multiple comparison adjustment of LS-means differences
ALPHA=α
Determines the confidence level (1 − α)
STEPDOWN
Adjusts multiple comparison p-values further in a step-down
fashion
Statistical Output
CL
Constructs confidence limits for means and mean differences
CORR
Displays the correlation matrix of LS-means
COV
Displays the covariance matrix of LS-means
E
Prints the L matrix
LINES
Produces a “Lines” display for pairwise LS-means differences
MEANS
Prints the LS-means
PLOTS=
Requests ODS statistical graphics of means and mean comparisons
SEED=
Specifies the seed for computations that depend on random numbers
Generalized Linear Modeling
EXP
Exponentiates and displays estimates of LS-means or LS-means
differences
ILINK
Computes and displays estimates and standard errors of LS-means
(but not differences) on the inverse linked scale
ODDSRATIO
Reports (simple) differences of least squares means in terms of
odds ratios if permitted by the link function
For details about the syntax of the LSMEANS statement, see the section “LSMEANS Statement” on
page 479 of Chapter 19, “Shared Concepts and Topics.”
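For example, the following sketch (assuming the hypothetical item store MyStore holds a model with the CLASS variable a) requests Tukey-adjusted pairwise comparisons of the LS-means with confidence limits:

```sas
proc plm source=MyStore;
   /* all pairwise differences of a's LS-means,
      adjusted for multiplicity */
   lsmeans a / diff adjust=tukey cl;
run;
```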
LSMESTIMATE Statement
LSMESTIMATE model-effect < ‘label’ > values < divisor=n >
< , . . . < ‘label’ > values < divisor=n > >
< / options > ;
The LSMESTIMATE statement provides a mechanism for obtaining custom hypothesis tests among
least squares means.
Table 66.6 summarizes important options in the LSMESTIMATE statement.
Table 66.6 Important LSMESTIMATE Statement Options
Option
Description
Construction and Computation of LS-Means
AT
Modifies covariate values in computing LS-means
BYLEVEL
Computes separate margins
DIVISOR=
Specifies a list of values to divide the coefficients
OM=
Specifies the weighting scheme for LS-means computation as determined by a data set
SINGULAR=
Tunes estimability checking
Degrees of Freedom and p-values
ADJUST=
Determines the method for multiple comparison adjustment of LS-means differences
ALPHA=α
Determines the confidence level (1 − α)
LOWER
Performs one-sided, lower-tailed inference
STEPDOWN
Adjusts multiple comparison p-values further in a step-down fashion
TESTVALUE=
Specifies values under the null hypothesis for tests
UPPER
Performs one-sided, upper-tailed inference
Statistical Output
CL
Constructs confidence limits for means and mean differences
CORR
Displays the correlation matrix of LS-means
COV
Displays the covariance matrix of LS-means
E
Prints the L matrix
ELSM
Prints the K matrix
JOINT
Produces a joint F or chi-square test for the LS-means and LS-means differences
PLOTS=
Requests ODS statistical graphics of means and mean comparisons
SEED=
Specifies the seed for computations that depend on random numbers
Generalized Linear Modeling
CATEGORY=
Specifies how to construct estimable functions with multinomial
data
EXP
Exponentiates and displays LS-means estimates
ILINK
Computes and displays estimates and standard errors of LS-means
(but not differences) on the inverse linked scale
For details about the syntax of the LSMESTIMATE statement, see the section “LSMESTIMATE
Statement” on page 496 of Chapter 19, “Shared Concepts and Topics.”
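As an illustrative sketch (again assuming a three-level CLASS variable a in the hypothetical store MyStore), the following statements compare the first LS-mean against the average of the other two; DIVISOR= rescales the integer coefficients:

```sas
proc plm source=MyStore;
   /* (2, -1, -1)/2 contrasts level 1 with the mean of levels 2 and 3 */
   lsmestimate a 'first vs average of others' 2 -1 -1 / divisor=2 cl;
run;
```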
SCORE Statement
SCORE DATA=SAS-data-set < OUT=SAS-data-set >
< keyword< =name > >. . .
< keyword< =name > > < / options > ;
The SCORE statement applies the contents of the source item store to compute predicted values and
other observation-wise statistics for a SAS data set.
You can specify the following syntax elements in the SCORE statement before the option slash (/):
DATA=SAS-data-set
specifies the input data set for scoring. This option is required, and the data set is examined for
congruity with the previously fitted (and stored) model. For example, all necessary variables
to form a row of the X matrix must be present in the input data set and must be of the correct
type and format. The following variables do not have to be present in the input data set:
- the response variable
- the events and trials variables used in the events/trials syntax for binomial data
- variables used in WEIGHT or FREQ statements
OUT=SAS-data-set
specifies the name of the output data set. If you do not specify an output data set with the
OUT= option, the PLM procedure uses the DATAn convention to name the output data set.
keyword< =name >
specifies a statistic to be included in the OUT= data set and optionally assigns the statistic the
variable name name. Table 66.7 lists the keywords and the default names assigned by the PLM
procedure if you do not specify a name.
Table 66.7 Keywords for Output Statistics
Keyword      Description                                        Expression    Name
PREDICTED    Linear predictor                                   η̂ = xβ̂        Predicted
STDERR       Standard deviation of the linear predictor         √Var(η̂)      StdErr
RESIDUAL     Residual                                           y − g⁻¹(η̂)   Resid
LCLM         Lower confidence limit for the linear predictor                  LCLM
UCLM         Upper confidence limit for the linear predictor                  UCLM
LCL          Lower prediction limit for the linear predictor                  LCL
UCL          Upper prediction limit for the linear predictor                  UCL
Prediction limits (LCL, UCL) are available only for statistical models that allow such limits, typically
regression-type models for normally distributed data with an identity link function.
You can specify the following options in the SCORE statement after a slash (/):
ALPHA=number
determines the coverage probability for two-sided confidence and prediction intervals. The
coverage probability is computed as 1 − number. The value of number must be between 0 and
1; the default is 0.05.
DF=number
specifies the degrees of freedom to use in the construction of prediction and confidence limits.
ILINK
requests that predicted values be inversely linked to produce predictions on the data scale. By
default, predictions are produced on the linear scale where covariate effects are additive.
NOUNIQUE
requests that names not be made unique in the case of naming conflicts. By default, the PLM
procedure avoids naming conflicts by assigning a unique name to each output variable. If you
specify the NOUNIQUE option, variables with conflicting names are not renamed. In that
case, the first variable added to the output data set takes precedence.
NOVAR
requests that variables from the input data set not be added to the output data set.
OBSCAT
requests that statistics in models for multinomial data be written to the output data set only for
the response level that corresponds to the observed level of the observation.
SAMPLE
requests that the sample of parameter estimates in the item store be used to form scoring
statistics. This option is useful when the item store contains the results of a Bayesian analysis
and a posterior sample of parameter estimates. The predicted value is then computed as the
average predicted value across the posterior estimates, and the standard error measures the
standard deviation of these estimates. For example, let β̂₁, …, β̂ₖ denote the k posterior
sample estimates of β, and let xᵢ denote the x-vector for the ith observation in the scoring
data set. If the SAMPLE option is in effect, the output statistics for the predicted value, the
standard error, and the residual of the ith observation are computed as
\[ \eta_{ij} = x_i \hat\beta_j \]
\[ \mathrm{PRED}_i = \bar\eta_i = \frac{1}{k} \sum_{j=1}^{k} \eta_{ij} \]
\[ \mathrm{STDERR}_i = \left( \frac{1}{k-1} \sum_{j=1}^{k} \left( \eta_{ij} - \bar\eta_i \right)^2 \right)^{1/2} \]
\[ \mathrm{RESIDUAL}_i = y_i - g^{-1}(\bar\eta_i) \]
where g⁻¹(·) denotes the inverse link function.
If, in addition, the ILINK option is in effect, the calculations are as follows:
\[ \eta_{ij} = x_i \hat\beta_j \]
\[ \mathrm{PRED}_i = \frac{1}{k} \sum_{j=1}^{k} g^{-1}(\eta_{ij}) \]
\[ \mathrm{STDERR}_i = \left( \frac{1}{k-1} \sum_{j=1}^{k} \left( g^{-1}(\eta_{ij}) - \mathrm{PRED}_i \right)^2 \right)^{1/2} \]
\[ \mathrm{RESIDUAL}_i = y_i - \mathrm{PRED}_i \]
The LCL and UCL statistics are not available with the SAMPLE option. When the LCLM and
UCLM statistics are requested, the SAMPLE option yields the lower 100(α/2)% and upper
100(1 − α/2)% percentiles of the predicted values under the sample (posterior) distribution.
When you request residuals with the SAMPLE option, the calculation depends on whether the
ILINK option is specified.
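Putting these elements together, the following sketch (the scoring data set NewObs, the item store MyStore, and the output variable names are all assumptions for illustration) writes predicted values, standard errors, and confidence limits on the data scale:

```sas
proc plm source=MyStore;
   /* ILINK maps the linear predictor back to the data scale */
   score data=NewObs out=Scored predicted=p stderr=se lclm uclm / ilink;
run;
```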
SHOW Statement
SHOW options ;
The SHOW statement uses the Output Delivery System to display contents of the item store. This
statement is useful for verifying that the contents of the item store apply to the analysis and for
generating ODS tables. You can specify the following options after the SHOW statement:
ALL | _ALL_
displays all applicable contents.
BYVAR | BY
displays information about the BY variables in the source item store. If a BY statement was
present when the item store was created, the PLM procedure performs the analysis separately
for each BY group.
CLASSLEVELS | CLASS
displays the “Class Level Information” table. This table is produced by the PLM procedure by
default if the model contains effects that depend on classification variables.
CORRELATION | CORR | CORRB
produces the correlation matrix of the parameter estimates. If the source item store contains a
posterior sample of parameter estimates, the computed matrix is the correlation matrix of the
sample covariance matrix.
COVARIANCE | COV | COVB
produces the covariance matrix of the parameter estimates. If the source item store contains a
posterior sample of parameter estimates, the PLM procedure computes the empirical sample
covariance matrix from the posterior estimates. You can convert this matrix into a sample
correlation matrix with the CORRELATION option in the SHOW statement.
EFFECTS
displays information about the constructed effects in the model. Constructed effects are those
that were created with the EFFECT statement in the procedure run that generated the source
item store.
FITSTATS | FIT
displays the fit statistics from the item store.
HESSIAN | HESS
displays the Hessian matrix.
HERMITE | HERM
generates the Hermite matrix H = (X′X)⁻(X′X). The PLM procedure chooses a reflexive,
g2-inverse for the generalized inverse of the crossproduct matrix X′X. See “Important Linear
Algebra Concepts” on page 47 of Chapter 3, “Introduction to Statistical Modeling with
SAS/STAT Software,” for information about generalized inverses and the sweep operator.
PARAMETERS< =n >
PARMS< =n >
displays the parameter estimates. The structure of the display depends on whether a posterior
sample of parameter estimates is available in the source item store. If such a sample is present,
up to the first 20 parameter vectors are shown in wide format. You can modify this number
with the n argument.
If no posterior sample is present, the single vector of parameter estimates is shown in narrow
format. If the store contains information about the covariance matrix of the parameter estimates,
then standard errors are added.
PROGRAM< (WIDTH=n) >
PROG< (WIDTH=n) >
displays the SAS program that generated the item store, provided that this was stored at
store generation time. The program does not include comments, titles, or some other global
statements. The optional width parameter n determines the display width of the source code.
XPX | CROSSPRODUCT
displays the crossproduct matrix X′X.
XPXI
displays the generalized inverse of the crossproduct matrix X′X. The PLM procedure obtains
a reflexive g2-inverse by sweeping. See “Important Linear Algebra Concepts” on page 47 of
Chapter 3, “Introduction to Statistical Modeling with SAS/STAT Software,” for information
about generalized inverses and the sweep operator.
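For example, the following hedged sketch (the item store MyStore is hypothetical) displays several pieces of the stored model at once:

```sas
proc plm source=MyStore;
   /* class levels, fit statistics, and parameter estimates */
   show class fitstats parms;
run;
```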
SLICE Statement
SLICE model-effect < / options > ;
The SLICE statement provides a general mechanism for performing a partitioned analysis of the
LS-means for an interaction. This analysis is also known as an analysis of simple effects.
The SLICE statement uses the same options as the LSMEANS statement, which are summarized in
Table 19.19. For details about the syntax of the SLICE statement, see the section “SLICE Statement”
on page 526 of Chapter 19, “Shared Concepts and Topics.”
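As a sketch of a simple-effects analysis (assuming the hypothetical store MyStore contains an a*b interaction), the following statements test differences among the LS-means of a within each level of b:

```sas
proc plm source=MyStore;
   /* partition the a*b LS-means by the levels of b */
   slice a*b / sliceby=b diff adjust=bonferroni;
run;
```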
TEST Statement
TEST < model-effects > < / options > ;
The TEST statement enables you to perform F tests for model effects that test Type I, Type II, or
III hypotheses. See Chapter 15, “The Four Types of Estimable Functions,” for details about the
construction of Type I, II, and III estimable functions.
Table 66.8 summarizes options in the TEST statement.
Table 66.8 TEST Statement Options
Option
Description
CHISQ
Requests chi-square tests
DDF=
Specifies denominator degrees of freedom for fixed effects
E
Requests Type I, Type II, and Type III coefficients
E1
Requests Type I coefficients
E2
Requests Type II coefficients
E3
Requests Type III coefficients
HTYPE=
Indicates the type of hypothesis test to perform
INTERCEPT
Adds a row that corresponds to the overall intercept
For details about the syntax of the TEST statement, see the section “TEST Statement” on page 530
of Chapter 19, “Shared Concepts and Topics.”
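For illustration (the effects a and b and the item store MyStore are assumptions), the following statements request Type I and Type III tests along with the Type III coefficient matrices:

```sas
proc plm source=MyStore;
   /* HTYPE= selects the hypothesis types; E3 prints the
      Type III coefficients */
   test a b / htype=1,3 e3;
run;
```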
WHERE Statement
WHERE expression ;
The WHERE statement in the PLM procedure is helpful when the item store contains BY-variable
information and you want to apply the PROC PLM statements to only a subset of the BY groups.
A WHERE expression is a type of SAS expression that defines a condition. In the DATA step
and in procedures that use SAS data sets as input source, the WHERE expression is used to select
observations for inclusion in the DATA step or in the analysis. In the PLM procedure, which does
not accept a SAS data set but rather takes an item store that was created by a qualifying SAS/STAT
procedure, the WHERE statement is also used to specify conditions. The conditional selection does
not apply to observations in PROC PLM, however. Instead, you use the WHERE statement in the
PLM procedure to select a subset of BY groups from the item store to which to apply the PROC
PLM statements.
The general syntax of the WHERE statement is
WHERE operand < operator > < operand > ;
where
- operand is something to be operated on. The operand can be the name of a BY variable in
the item store, a SAS function, a constant, or a predefined name to identify columns in result
tables.
- operator is a symbol that requests a comparison, logical operation, or arithmetic calculation.
All SAS expression operators are valid for a WHERE expression.
For more details about how to specify general WHERE expressions, see SAS Language Reference:
Concepts. Notice that the FILTER statement accepts similar expressions that are specified in terms
of predefined keywords. Expressions in the WHERE statement of the PLM procedure are written in
terms of BY variables.
There is no limit to the number of WHERE statements in the PLM procedure. When you specify
multiple WHERE statements, the statements are not cumulative. Each WHERE statement is executed
separately. You can think of each selection WHERE statement as one analytic query to the item
store: the WHERE statement defines the query, and the PLM procedure is the querying engine. For
example, suppose that the item store contains results for the numeric BY variables A and B. The
following statements define two separate queries of the item store:
WHERE a = 4;
WHERE (b < 3) and (a > 4);
The PLM procedure first applies the requested analysis to all BY groups where a equals 4 (irrespective
of the value of variable b). The analysis is then repeated for all BY groups where b is less than 3 and
a is greater than 4.
Group selection with WHERE statements is possible only if the item store contains BY variables.
You can use the BYVAR option in the SHOW statement to display the BY variables in the item store.
Note that WHERE expressions in the SAS DATA step and in many procedures are specified in terms
of the unformatted values of data set variables, even if a format was applied to the variable. If you
specify the WHEREFORMAT option in the PROC PLM statement, the PLM procedure evaluates
WHERE expressions for BY variables in terms of the formatted values. For example, assume that
the following format was applied to the variable tx when the item store was created:
proc format;
value bf 1 = 'Control'
2 = 'Treated';
run;
Then the following two PROC PLM runs are equivalent:
proc plm source=MyStore;
show parms;
where b = 2;
run;
proc plm source=MyStore whereformat;
show parms;
where b = 'Treated';
run;
Details: PLM Procedure
BY Processing and the PLM Procedure
When a BY statement is in effect for the analysis that creates an item store, the information about
BY variables and BY-group-specific modeling results are transferred to the item store. In this case,
the PLM procedure automatically assumes a processing mode for the item store that is akin to BY
processing, with the PLM statements being applied in turn for each of the BY groups. Also, you
can then obtain a table of BY groups with the BYVAR option in the SHOW statement. The “Source
Information” table also displays the variable names of the BY variables if BY groups are present.
The WHERE statement can be used to restrict the analysis to specific BY groups that meet the
conditions of the WHERE expression.
See Example 66.4 for an example that uses BY-group-specific information in the source item store.
As with procedures that operate on input data sets, the BY variable information is added automatically
to any output data sets and ODS tables produced by the PLM procedure.
When you score a data set with the SCORE statement and the item store contains BY variables, three
situations can arise:
- None of the BY variables are present in the scoring data set. In this situation the results of the
BY groups in the item store are applied in turn to the entire scoring data set. For example, if
the scoring data set contains 50 observations and no BY-variable information, the number of
observations in the output data set of the SCORE statement equals 50 times the number of BY
groups.
- The scoring data set contains only some of the BY variables, or the variables have a different
type or format. The PLM procedure does not process such an incompatible scoring data set.
- All BY variables are present in the scoring data set with the same type and format as when the
item store was created. The BY-group-specific results are applied to each observation in the
scoring data set. The scoring data set does not have to be sorted or grouped by the BY variables.
However, it is computationally more efficient if the scoring data set is arranged by groups of
the BY variables.
Analysis Based on Posterior Estimates
If an item store was saved from a Bayesian analysis (by PROC GENMOD or PROC PHREG), then
PROC PLM can perform sampling-based inference based on the Bayes posterior estimates that are
saved in the item store. For example, the following statements perform a Bayesian analysis and save
the results to an item store named sasuser.gmd. For the Bayesian analysis, the random number
generator seed is set to 1. By default, a noninformative distribution is set as the prior distribution for
the regression coefficients and the posterior sample size is 10,000.
proc genmod data=gs;
class a b;
model y = a b;
bayes seed=1;
store sasuser.gmd / label='Bayesian Analysis';
run;
When the PLM procedure opens the item store sasuser.gmd, it detects that the results were saved
from a Bayesian analysis. The posterior sample of regression coefficient estimates are then loaded to
perform statistical inference tasks.
The majority of postprocessing tasks involve inference based on an estimable linear function Lβ̂,
which often requires its mean and variance. When standard frequentist analyses are performed,
the mean and variance have explicit forms because the parameter estimate β̂ is analytically tractable.
However, explicit forms are not usually available when Bayesian models are fitted. Instead, empirical
means and variance-covariance matrices for the estimable function are constructed from the posterior
sample.
Let β̂ᵢ, i = 1, …, N_p, denote the N_p vectors of posterior sample estimates of β saved in
sasuser.gmd. Use these vectors to construct the posterior sample of estimable functions Lβ̂ᵢ.
The posterior mean of the estimable function is thus
\[ \overline{L\hat\beta} = \frac{1}{N_p} \sum_{i=1}^{N_p} L\hat\beta_i \]
and the posterior variance of the estimable function is
\[ V\!\left(L\hat\beta\right) = \frac{1}{N_p - 1} \sum_{i=1}^{N_p} \left( L\hat\beta_i - \overline{L\hat\beta} \right)^2 \]
Sometimes statistical inference on a transformation of Lβ̂ is requested. For example, the EXP
option for the ESTIMATE and LSMESTIMATE statements requests analysis based on exp(Lβ̂),
the exponentiation of the estimable function. If this type of analysis is requested, the posterior sample
of transformed estimable functions is constructed by transforming each of the estimable functions
evaluated at the posterior sample: f(Lβ̂ᵢ), i = 1, …, N_p. The posterior mean and variance for
f(Lβ̂) are then computed from the constructed sample to make the inference:
\[ \overline{f(L\hat\beta)} = \frac{1}{N_p} \sum_{i=1}^{N_p} f\!\left(L\hat\beta_i\right) \]
\[ V\!\left(f(L\hat\beta)\right) = \frac{1}{N_p - 1} \sum_{i=1}^{N_p} \left( f\!\left(L\hat\beta_i\right) - \overline{f(L\hat\beta)} \right)^2 \]
After obtaining the posterior mean and variance, the PLM procedure proceeds to perform statistical
inference tasks based on them.
User-Defined Formats and the PLM Procedure
The PLM procedure does not support a FORMAT statement because it operates without an input
data set, and also because changing the format properties of variables could alter the interpretation of
parameter estimates, thus creating a dissonance with variable properties in effect when the item store
was created. Instead, user-defined formats that are applied to classification variables when the item
store is created are saved to the store and are by default reloaded by the PLM procedure. When the
PLM procedure loads a format, notes are issued to the log.
You can change the load behavior for formats with the user-defined FORMAT= option in the PROC
PLM statement.
User-defined formats do not need to be supplied in a new SAS session. However, when a user-defined
format with the same name as a stored format exists and the default FORMAT=RELOAD option is
in effect, the format definition loaded from the item store replaces the format currently in effect.
In the following statements, the format AFORM is created and applied to the variable a in the PROC
GLM step. This format definition is transferred to the item store sasuser.glm through the STORE
statement.
proc format;
value aform 1='One' 2='Two' 3='Three';
run;
proc glm data=sp;
format a aform.;
class block a b;
model y = block a b x;
store sasuser.glm;
weight x;
run;
The following statements replace the format definition for aform in the PROC FORMAT step. The
PLM step then reloads the AFORM format and thereby restores its original state.
proc format;
value aform 1='Un' 2='Deux' 3='Trois';
run;
proc plm source=sasuser.glm;
show class;
score data=sp out=plmout lcl lclm ucl uclm;
run;
The following notes, issued by the PLM procedure, inform you that the procedure loaded the format,
the format already existed, and the existing format was replaced:
NOTE: The format AFORM was loaded from item store SASUSER.GLM.
NOTE: Format AFORM is already on the library.
NOTE: Format AFORM has been output.
After the PROC PLM run, the definition that is in effect for the format AFORM corresponds to the
following SAS statements:
proc format;
value aform 1='One' 2='Two' 3='Three';
run;
ODS Table Names
PROC PLM assigns a name to each table it creates. You can use these names to refer to the table
when you use the Output Delivery System (ODS) to select tables and create output data sets. These
names are listed in Table 66.9. For more information about ODS, see Chapter 20, “Using the Output
Delivery System.”
Each of the EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, and SLICE statements
also creates tables, which are not listed in Table 66.9. For information about these tables, see the
corresponding sections of Chapter 19, “Shared Concepts and Topics.”
Table 66.9 ODS Tables Produced by PROC PLM
Table Name           Description                                          Required Option
ByVarInfo            Information about BY variables in source item        SHOW BYVAR
                     store (if present)
ClassLevels          Level information from the CLASS statement           Default output when model effects
                                                                          depend on CLASS variables
Corr                 Correlation matrix of parameter estimates            SHOW CORR
Cov                  Covariance matrix of parameter estimates             SHOW COV
FitStatistics        Fit statistics                                       SHOW FIT
Hessian              Hessian matrix                                       SHOW HESSIAN
Hermite              Hermite matrix                                       SHOW HERMITE
ParameterEstimates   Parameter estimates                                  SHOW PARMS
ParameterSample      Sampled (posterior) parameter estimates              SHOW PARMS
Program              Originating source code                              SHOW PROGRAM
StoreInfo            Information about source item store                  Default
XpX                  X′X matrix                                           SHOW XPX
XpXI                 (X′X)⁻ matrix                                        SHOW XPXI
ODS Graphics
When ODS Graphics is in effect, each of the EFFECTPLOT, ESTIMATE, LSMEANS,
LSMESTIMATE, and SLICE statements can produce plots associated with their analyses. For
information about these plots, see the corresponding sections of Chapter 19, “Shared Concepts and
Topics.”
Examples: PLM Procedure
Example 66.1: Scoring with PROC PLM
Logistic regression with model selection is often used to extract useful information and build
interpretable models for classification problems with many variables. This example demonstrates
how you can use PROC LOGISTIC to build a spline model on a simulated data set and how you can
later use the fitted model to classify new observations.
The following DATA step creates a data set named SimuData, which contains 5,000 observations
and 100 continuous variables:
%let nObs  = 5000;
%let nVars = 100;
data SimuData;
array x{&nVars};
do obsNum=1 to &nObs;
do j=1 to &nVars;
x{j}=ranuni(1);
end;
linp =
10 + 11*x1 - 10*sqrt(x2) + 2/x3 - 8*exp(x4) + 7*x5*x5
- 6*x6**1.5 + 5*log(x7) - 4*sin(3.14*x8) + 3*x9 - 2*x10;
TrueProb = 1/(1+exp(-linp));
if ranuni(1) < TrueProb then y=1;
else y=0;
output;
end;
run;
The response is binary based on the inversely transformed logit values. The true logit is a function of
only 10 of the 100 variables, including nonlinear transformations of seven variables, as follows:
logit(p) = 10 + 11x1 - 10 sqrt(x2) + 2/x3 - 8 exp(x4) + 7 x5^2 - 6 x6^1.5
           + 5 log(x7) - 4 sin(3.14 x8) + 3 x9 - 2 x10
Now suppose the true model is not known. With some exploratory data analysis, you determine that
the dependency of the logit on some variables is nonlinear. Therefore, you decide to use splines to
model this nonlinear dependence. Also, you want to use stepwise regression to remove unimportant
variable transformations. The following statements perform the task:
proc logistic data=SimuData;
effect splines = spline(x1-x&nVars/separate);
model y = splines/selection=stepwise;
store sasuser.SimuModel;
run;
By default, PROC LOGISTIC models the probability that y = 0. The EFFECT statement requests
an effect named splines that is constructed from all predictors in the data. The SEPARATE option
specifies that the spline basis for each variable be treated as a separate set so that model selection
applies to each individual set. The SELECTION=STEPWISE option specifies stepwise regression as the
model selection technique. The STORE statement requests that the fitted model be saved to an item store
selection technique. The STORE statement requests that the fitted model be saved to an item store
sasuser.SimuModel. See “Example 66.2: Working with Item Stores” on page 5440 for an example
with more details about working with item stores.
The spline effect for each predictor produces seven columns in the design matrix, making stepwise
regression computationally intensive. For example, a typical Pentium 4 workstation takes around ten
minutes to run the preceding statements. Real data sets for classification can be much larger; see the
examples at the UCI Machine Learning Repository (Asuncion and Newman 2007). If the new observations
about which you want to make predictions are available at model-fitting time, you can add the
SCORE statement in the LOGISTIC procedure. However, if observations to predict become available
after the model is fitted, you would have to use the LOGISTIC procedure to refit the model in order to
make predictions for them. With PROC PLM, you do not have to repeat this time-consuming model-fitting
process. You can use the SCORE statement in the PLM procedure to score new
observations based on the item store sasuser.SimuModel that was created during the initial model
building. For example, to compute the probability of y = 0 for one new observation with all
predictor values equal to 0.15 in the data set test, you can use the following statements:
data test;
array x{&nVars};
do j=1 to &nVars;
x{j}=0.15;
end;
drop j;
output;
run;
proc plm source=sasuser.SimuModel;
score data=test out=testout predicted / ilink;
run;
The ILINK option in the SCORE statement requests that predicted values be inversely transformed
to the response scale. In this case, the result is the predicted probability of y = 0. Output 66.1.1 shows
the predicted probability for the new observation.
Output 66.1.1 Predicted Probability for One New Observation

Obs    Predicted
  1      0.56649
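For the logit link, the inverse link applied by ILINK is the logistic function. A quick arithmetic check outside SAS (Python here; the function names are mine) that the reported probability 0.56649 corresponds to a linear predictor of about 0.2675:

```python
# Relationship between a logit-scale linear predictor and the
# ILINK-transformed probability (illustration, not PROC PLM code).
import math

def ilink_logit(eta):
    """Inverse of the logit link: the logistic function."""
    return 1/(1 + math.exp(-eta))

def logit(p):
    """The logit link itself."""
    return math.log(p/(1 - p))

eta = logit(0.56649)   # recover the linear predictor from Output 66.1.1
```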
Example 66.2: Working with Item Stores
This example demonstrates how procedures save statistical analysis context and results into item
stores and how you can use PROC PLM to make post hoc inference based on saved item stores.
The data are taken from McCullagh and Nelder (1989) and concern the effects on taste of various
cheese additives. Four cheese additives were tested, and 52 response ratings for each additive were
obtained. The response was measured on a scale of nine categories that range from strong dislike (1)
to excellent taste (9). The following program saves the data in the data set Cheese. The variable y
contains the taste rating, the variable Additive contains cheese additive types, and the variable freq
contains the frequencies with which each additive received each rating.
data Cheese;
do Additive = 1 to 4;
do y = 1 to 9;
input freq @@;
output;
end;
end;
label y='Taste Rating';
datalines;
0 0 1 7 8 8 19 8 1
6 9 12 11 7 6 1 0 0
1 1 6 8 23 7 5 1 0
0 0 0 1 3 7 14 16 11
;
The response y is a categorical variable that contains nine ordered levels. You can use PROC
LOGISTIC to fit an ordinal model to investigate the effects of the cheese additive types on taste
ratings. Suppose you also want to save the ordinal model into an item store so that you can make
statistical inference later. You can use the following statements to perform the tasks:
proc logistic data=cheese;
freq freq;
class additive y / param=glm;
model y=additive;
store sasuser.cheese;
title 'Ordinal Model on Cheese Additives';
run;
By default, PROC LOGISTIC uses the cumulative logit model for the ordered categorical response. The STORE statement requests that the fitted model be saved to a SAS item store named
sasuser.cheese. The name is a two-level SAS name of the form libname.membername. If libname
is not specified in the STORE statement, the fitted results are saved in work.membername and the
item store is deleted after the current SAS session ends. With this example, the fitted model is saved
to an item store named sasuser.cheese in the SASUSER library. It is not deleted after the current
SAS session ends. You can use PROC PLM to restore the results later.
The following statements use PROC PLM to load the saved model context and results by specifying
SOURCE= with the target item store sasuser.cheese. Then they use two SHOW statements to
display separate information saved in the item store. The first SHOW statement with the PROGRAM
option displays the program that was used to generate the item store sasuser.cheese. The second
SHOW statement with the PARMS option displays parameter estimates and associated statistics of
the fitted ordinal model.
proc plm source=sasuser.cheese;
show program;
show parms;
run;
Output 66.2.1 displays the program that generated the item store sasuser.cheese. Except for the
title information, it matches the original program.
Output 66.2.1 Program Information from sasuser.cheese
Ordinal Model on Cheese Additives
The PLM Procedure
SAS Program Information
proc logistic data=cheese;
freq freq;
class additive y / param=glm;
model y=additive;
store sasuser.cheese;
run;
Output 66.2.2 displays estimates of the intercept terms and covariates and associated statistics. The
intercept terms correspond to the eight cumulative logits defined on the taste ratings; that is, the ith
intercept for the ith logit is

   log( sum_{j<=i} p_j / (1 - sum_{j<=i} p_j) )
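The cumulative logits defined above can be computed directly from a set of category probabilities; the following Python sketch uses made-up probabilities, not data from this example:

```python
# Cumulative logits for ordered category probabilities (illustration only).
import math

def cumulative_logit(p, i):
    """log of P(Y <= i) over P(Y > i) for 1-based category index i."""
    c = sum(p[:i])
    return math.log(c/(1 - c))

# hypothetical probabilities for nine ordered categories
p = [0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.04, 0.01]
logits = [cumulative_logit(p, i) for i in range(1, 9)]
```

Because the cumulative probability is nondecreasing in i, the cumulative logits are nondecreasing as well, matching the ordering of the intercepts in Output 66.2.2.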
Output 66.2.2 Parameter Estimates of the Ordinal Model

Parameter Estimates

             Taste                Standard
Parameter    Rating   Estimate    Error
Intercept    1        -7.0801     0.5624
Intercept    2        -6.0249     0.4755
Intercept    3        -4.9254     0.4272
Intercept    4        -3.8568     0.3902
Intercept    5        -2.5205     0.3431
Intercept    6        -1.5685     0.3086
Intercept    7        -0.06688    0.2658
Intercept    8         1.4930     0.3310
Additive 1             1.6128     0.3778
Additive 2             4.9645     0.4741
Additive 3             3.3227     0.4251
Additive 4             0           .
You can perform various statistical inference tasks from a saved item store, as long as the task is
applicable under the model context. For example, you can perform group comparisons between
different cheese additive types. See the next example for details.
Example 66.3: Group Comparisons in Ordinal Model
This example continues the study of the effects on taste of various cheese additives. You have
finished fitting an ordinal logistic model and saved it to an item store named sasuser.cheese in the
previous example. Suppose you want to make comparisons between any pair of cheese additives.
You can conduct the analysis by using the ESTIMATE statement and constructing an appropriate L
matrix, or by using the LSMEANS statement to compute least squares means differences. For an
ordinal logistic model with the cumulative logit link, the least squares means are predicted population
margins of the cumulative logits. The following statements compute and display differences between
least squares means of cheese additive:
ods graphics on;
proc plm source=sasuser.cheese;
lsmeans additive / cl diff oddsratio plot=diff;
run;
ods graphics off;
Four options are specified in the LSMEANS statement. The DIFF option requests least squares
means differences for the cheese additives. Because the fitted model is an ordinal logistic model
with the cumulative logit link, the least squares means differences represent log cumulative odds
ratios. The ODDSRATIO option requests exponentiation of the LS-means differences, which
produces cumulative odds ratios. The CL option requests that confidence limits be constructed for
the LS-means differences. When ODS GRAPHICS ON is specified, the PLOTS=DIFF option
requests a display of all pairwise least squares means differences and their significance.
Output 66.3.1 displays the LS-means differences. The reported log odds ratios indicate the relative
differences among the cheese additives. A negative log odds ratio indicates that the first category
(displayed in the "Additive" column) is less likely than the second category (displayed in the
"_Additive" column) to have a lower taste rating. For example, the log odds ratio between cheese
additives 1 and 2 is -3.3517, and the corresponding odds ratio is 0.035. This means the odds of
cheese additive 1 receiving a poor rating are 0.035 times the odds of cheese additive 2 receiving a
poor rating. In addition to the highly significant p-value (< 0.0001), the confidence limits for both
the log odds ratio and the odds ratio indicate that you can reject the null hypothesis that the odds of
cheese additive 1 having a lower taste rating are the same as those of cheese additive 2. Similarly,
the odds of cheese additive 2 having a lower rating are 143.241 (with 95% confidence limits
(56.558, 362.777)) times the odds of cheese additive 4 having a lower rating. By the same logic,
you can conclude that the preference order for the four cheese additives, from most favorable to
least favorable, is 4, 1, 3, and 2.
Output 66.3.1 LS-Means Differences of Additive

Ordinal Model on Cheese Additives

The PLM Procedure

Differences of Additive Least Squares Means

                                Standard
Additive  _Additive  Estimate   Error      z Value   Pr > |z|   Alpha
1         2          -3.3517    0.4235     -7.91     <.0001     0.05
1         3          -1.7098    0.3731     -4.58     <.0001     0.05
1         4           1.6128    0.3778      4.27     <.0001     0.05
2         3           1.6419    0.3738      4.39     <.0001     0.05
2         4           4.9645    0.4741     10.47     <.0001     0.05
3         4           3.3227    0.4251      7.82     <.0001     0.05

Differences of Additive Least Squares Means

                                                    Lower        Upper
                                                    Confidence   Confidence
                                          Odds      Limit for    Limit for
Additive  _Additive   Lower     Upper     Ratio     Odds Ratio   Odds Ratio
1         2          -4.1818   -2.5216      0.035     0.015        0.080
1         3          -2.4410   -0.9787      0.181     0.087        0.376
1         4           0.8724    2.3532      5.017     2.393       10.520
2         3           0.9092    2.3746      5.165     2.482       10.746
2         4           4.0353    5.8938    143.241    56.558      362.777
3         4           2.4895    4.1558     27.734    12.055       63.805
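The "Odds Ratio" column is simply the exponential of the "Estimate" column; a quick arithmetic check in Python (outside SAS; agreement with the table is up to rounding of the printed estimates):

```python
# Verify that the odds ratios are the exponentiated LS-means differences.
import math

# (Additive, _Additive): LS-means difference from Output 66.3.1
pairs = {
    (1, 2): -3.3517, (1, 3): -1.7098, (1, 4): 1.6128,
    (2, 3):  1.6419, (2, 4):  4.9645, (3, 4): 3.3227,
}
odds_ratios = {k: math.exp(v) for k, v in pairs.items()}
```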
Output 66.3.2 displays the DiffPlot. It shows that all pairs of LS-means differences, which are
equivalent to log odds ratios in this case, are significant at the α = 0.05 level. This means that the
preferences between any pair of the four cheese additives are statistically significantly different.
Output 66.3.2 LS-Means Plot of Pairwise Differences
Example 66.4: Posterior Inference for Binomial Data
This example demonstrates how you can use PROC PLM to perform posterior inference from a
Bayesian analysis. The data for this example are taken from Weisberg (1985) and concern the effect
of small electrical currents on farm animals. The ultimate goal of the experiment was to understand
the effects of high-voltage power lines on livestock and to better protect farm animals. Seven cows
and six shock intensities were used in two experiments. In one experiment, each cow was given 30
electrical shocks with five at each shock intensity in random order. The number of shock responses
was recorded for each cow at each shock level. The experiment was then repeated to investigate
whether the response diminished due to fatigue of cows, or due to learning. So each cow received
a total of 60 shocks. For the following analysis, the cow difference is ignored. The following
DATA step lists the data where the variable current represents the shock level, the variable response
represents the number of shock responses, the variable trial represents the total number of trials at
each shock level, and the variable experiment represents the experiment number (1 for the initial
experiment and 2 for the repeated one):
data cow;
input current response trial experiment;
datalines;
0 0 35 1
0 0 35 2
1 6 35 1
1 3 35 2
2 13 35 1
2 8 35 2
3 26 35 1
3 21 35 2
4 33 35 1
4 27 35 2
5 34 35 1
5 29 35 2
;
Suppose you are interested in modeling the distribution of the shock response based on the level of
the current and the experiment number. You can use the GENMOD procedure to fit a frequentist
logistic model for the data. However, if you have some prior information about parameter estimates,
you can fit a Bayesian logistic regression model to take this prior information into account. In this
case, suppose you believe the logit of response has a positive association with the shock level but you
are uncertain about the ranges of other regression coefficients. To incorporate this prior information
in the regression model, you can use the following statements:
data prior;
   input _type_$ current;
   datalines;
mean 100
var 50
;
proc genmod data=cow;
class experiment;
bayes coeffprior=normal(input=prior) seed=1;
model response/trial = current|experiment / dist=binomial;
store cowgmd;
title 'Bayesian Logistic Model on Cow';
run;
The DATA step before the GENMOD procedure creates a data set prior that specifies the prior
distribution information for current, which in this case is a normal distribution with mean 100 and
variance 50. This reflects a rough belief in a positive coefficient in a moderate range for current. The
prior distribution parameters are not specified for experiment and the interaction between experiment
and current, and so PROC GENMOD assigns a default prior for them, which is a normal distribution
with mean 0 and variance 1E6.
After the DATA step, the BAYES statement in PROC GENMOD specifies that the regression
coefficients follow a normal distribution with mean and variance specified in the input data set named
prior. It also specifies 1 as the seed for the random number generator in the simulation of the posterior
sample. The MODEL statement requests a logistic regression model with a logit link. The STORE
statement requests that the fitted results be saved into an item store named cowgmd.
The convergence diagnostics in the output of PROC GENMOD indicate that the Markov chain has
converged. Output 66.4.1 displays summaries of the posterior sample of the regression coefficients.
The posterior mean for the intercept is -3.5857 with a 95% HPD interval of (-4.5226, -2.6303). The
posterior mean of the coefficient for current is 1.1893 with a 95% HPD interval of (0.8950, 1.4946),
which indicates a positive association between the logit of response and the shock level. Further
investigation of whether the shock reaction differed between the two experiments is warranted.
Output 66.4.1 Posterior Summaries on the Bayesian Logistic Model

Bayesian Logistic Model on Cow

The GENMOD Procedure
Bayesian Analysis

Posterior Summaries

                                       Standard            Percentiles
Parameter            N       Mean      Deviation    25%        50%        75%
Intercept           10000   -3.5857    0.4822      -3.9014    -3.5704    -3.2553
current             10000    1.1893    0.1536       1.0833     1.1843     1.2893
experiment1         10000    0.00727   0.7025      -0.4483     0.00849    0.4879
experiment1current  10000    0.3695    0.2529       0.1977     0.3651     0.5332

Posterior Intervals

Parameter           Alpha    Equal-Tail Interval       HPD Interval
Intercept           0.050    -4.5814   -2.6799        -4.5226   -2.6303
current             0.050     0.9016    1.5047         0.8950    1.4946
experiment1         0.050    -1.4347    1.3517        -1.4390    1.3439
experiment1current  0.050    -0.1134    0.8802        -0.1105    0.8809
Bayesian model fitting typically involves a large amount of simulation. Using the item store and
PROC PLM, you do not need to refit the model to perform further posterior inference. Suppose
you want to determine whether the shock reaction for the current level is different between the two
experiments. You can use PROC PLM with the ESTIMATE statement in the following statements:
proc plm source=cowgmd;
   estimate
      'Diff at current 0' experiment 1 -1 current*experiment [1, 0 1] [-1, 0 2],
      'Diff at current 1' experiment 1 -1 current*experiment [1, 1 1] [-1, 1 2],
      'Diff at current 2' experiment 1 -1 current*experiment [1, 2 1] [-1, 2 2],
      'Diff at current 3' experiment 1 -1 current*experiment [1, 3 1] [-1, 3 2],
      'Diff at current 4' experiment 1 -1 current*experiment [1, 4 1] [-1, 4 2],
      'Diff at current 5' experiment 1 -1 current*experiment [1, 5 1] [-1, 5 2]
      / exp cl;
run;
Each line in the ESTIMATE statement compares the fits between the two groups at each current
level. The nonpositional syntax is used for the interaction effect current*experiment. For example, the
first line requests coefficient 1 for the interaction effect at current level 0 for the initial experiment,
and coefficient –1 for the effect at current level 0 for the repeated experiment. The two terms are
then added to derive the difference. For more details about the nonpositional syntax, see “Positional
and Nonpositional Syntax for Coefficients in Linear Functions” on page 473 of Chapter 19, “Shared
Concepts and Topics.”
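Because experiment uses the GLM parameterization with level 2 as the reference, each 'Diff at current c' contrast reduces to the experiment1 main effect plus c times the experiment1-by-current interaction. A small Python sketch (outside SAS) makes this arithmetic explicit, using the rounded posterior means from Output 66.4.1, so agreement with the printed estimates is only up to rounding:

```python
# Contrast arithmetic for 'Diff at current c' (illustration, not SAS code).
b_exp1 = 0.00727   # posterior mean of experiment1 (Output 66.4.1)
b_int  = 0.3695    # posterior mean of experiment1*current (Output 66.4.1)

def diff_at(c):
    """Difference between experiments at current level c."""
    return b_exp1 + c*b_int

estimates = [diff_at(c) for c in range(6)]
```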
The EXP option exponentiates log odds ratios to produce odds ratios. The CL option requests that
confidence limits be constructed for both log odds ratios and odds ratios. Output 66.4.2 lists the
posterior sample estimates for differences between experiments at different current levels.
Output 66.4.2 Comparisons between Experiments at Different Current Levels

Bayesian Logistic Model on Cow

The PLM Procedure

Sample Estimates

                                   Standard    --------Percentiles--------
Label              N     Estimate  Deviation    25th      50th      75th
Diff at current 0  10000  0.007272  0.7025     -0.4483   0.00849   0.4879
Diff at current 1  10000  0.3767    0.4840      0.0590   0.3802    0.7051
Diff at current 2  10000  0.7462    0.3207      0.5316   0.7500    0.9581
Diff at current 3  10000  1.1156    0.3151      0.9023   1.1113    1.3253
Diff at current 4  10000  1.4851    0.4729      1.1681   1.4739    1.7943
Diff at current 5  10000  1.8546    0.6899      1.3925   1.8382    2.3004

Sample Estimates

                                                            Standard
                           Lower     Upper                  Deviation of
Label              Alpha   HPD       HPD     Exponentiated  Exponentiated
Diff at current 0  0.05   -1.4390    1.3439     1.2811      0.974564
Diff at current 1  0.05   -0.6113    1.3007     1.6362      0.824141
Diff at current 2  0.05    0.1091    1.3665     2.2202      0.730518
Diff at current 3  0.05    0.5205    1.7407     3.2082      1.052458
Diff at current 4  0.05    0.5601    2.4287     4.9514      2.602392
Diff at current 5  0.05    0.4712    3.1885     8.1917      7.099208

Sample Estimates

                   -Percentiles for Exponentiated-  Lower HPD of   Upper HPD of
Label               25th      50th      75th        Exponentiated  Exponentiated
Diff at current 0   0.6387    1.0085    1.6289        0.07387        3.1001
Diff at current 1   1.0608    1.4626    2.0240        0.4184         3.2783
Diff at current 2   1.7017    2.1170    2.6066        0.9713         3.6418
Diff at current 3   2.4652    3.0383    3.7632        1.4772         5.3149
Diff at current 4   3.2157    4.3661    6.0152        1.3250         9.9922
Diff at current 5   4.0250    6.2849    9.9777        0.8604        20.2432
The sample statistics are constructed from the posterior sample saved in the item store cowgmd.
From the output, the odds of a cow showing a shock reaction at level 0 in the initial experiment are
1.2811 (with a 95% HPD interval of (0.07387, 3.1001)) times the odds in the repeated experiment. The
HPD interval for the odds ratio is constructed based on the mean and variance of the sample of
exponentiated log odds ratios, instead of on the exponentiated mean and variance of the posterior
sample of log odds ratios. The HPD interval suggests that there is not much evidence that
the cows responded differently at current level 0 between the two experiments. Similar conclusions
can be drawn for current levels 1, 2, and 5. However, there is strong evidence that the cows responded
differently at current levels 3 and 4 between the two experiments. A possible explanation is that, if
the current level was so small that the cows could hardly feel it, or so strong that they could hardly
bear it, the cows responded consistently in the two experiments. If the current level was moderate,
the cows might have become accustomed to it, and their response diminished in the repeated experiment.
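One common construction of an HPD interval from a posterior sample is the shortest interval that spans a fraction 1 - α of the sorted draws. The following Python sketch illustrates that idea on an arbitrary deterministic toy sample; it is not SAS's exact algorithm, and the data are made up:

```python
# A minimal HPD-interval sketch (illustration only, not SAS's algorithm).
import math

def hpd(sample, alpha=0.05):
    """Shortest interval containing a fraction 1-alpha of the draws."""
    s = sorted(sample)
    n = len(s)
    m = int(math.ceil((1 - alpha)*n))              # draws inside the interval
    widths = [s[i + m - 1] - s[i] for i in range(n - m + 1)]
    i = widths.index(min(widths))                  # index of shortest window
    return s[i], s[i + m - 1]

# deterministic toy "posterior sample" of log odds ratios
draws = [math.sin(k*0.7)*1.5 for k in range(2000)]
lo, hi = hpd(draws)
# HPD of the exponentiated draws, as described in the text above
lo_e, hi_e = hpd([math.exp(d) for d in draws])
```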
You can visualize the distribution of the posterior sample of log odds ratios by specifying the PLOTS=
option in the ESTIMATE statement. In the following statements, ODS Graphics is enabled by the
ODS GRAPHICS ON statement, and the PLOTS=BOXPLOT option requests a box plot of the posterior
distribution of the log odds ratios. The suboption ORIENT=HORIZONTAL specifies a horizontal
orientation of the boxes.
ods graphics on;
proc plm source=cowgmd;
   estimate
      'Diff at current 0' experiment 1 -1 current*experiment [1, 0 1] [-1, 0 2],
      'Diff at current 1' experiment 1 -1 current*experiment [1, 1 1] [-1, 1 2],
      'Diff at current 2' experiment 1 -1 current*experiment [1, 2 1] [-1, 2 2],
      'Diff at current 3' experiment 1 -1 current*experiment [1, 3 1] [-1, 3 2],
      'Diff at current 4' experiment 1 -1 current*experiment [1, 4 1] [-1, 4 2],
      'Diff at current 5' experiment 1 -1 current*experiment [1, 5 1] [-1, 5 2]
      / plots=boxplot(orient=horizontal);
run;
ods graphics off;
Output 66.4.3 displays the box plot of the posterior sample of log odds ratios. The two boxes for the
differences at current levels 3 and 4 show that the corresponding log odds ratios are significantly
larger than the reference value x = 0. This indicates clear evidence that the probability of cow
response was larger in the initial experiment than in the repeated one at these two current levels.
The other four boxes show that the corresponding log odds ratios are not significantly different from
0, which suggests that there is no obvious reaction difference at current levels 0, 1, 2, and 5 between
the two experiments.
Output 66.4.3 Box Plot of Difference between Two Experiments
Example 66.5: By-Group Processing
This example uses data from a study of the analgesic effects of treatments on elderly patients
with neuralgia. The purpose of this example is to show how PROC PLM behaves in different
situations when BY-group processing is present. Two test treatments and a placebo are compared to
test whether the patient reported pain. For each patient, age, gender, and the duration of complaint
before the treatment began were recorded. The following DATA step creates the data set named
Neuralgia:
Data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P F 68  1 No   B M 74 16 No   P F 67 30 No
P M 66 26 Yes  B F 67 28 No   B F 77 16 No
A F 71 12 No   B F 72 50 No   B F 76  9 Yes
A M 71 17 Yes  A F 63 27 No   A F 69 18 Yes
B F 66 12 No   A M 62 42 No   P F 64  1 Yes
A F 64 17 No   P M 74  4 No   A F 72 25 No
P M 70  1 Yes  B M 66 19 No   B M 59 29 No
A F 64 30 No   A M 70 28 No   A M 69  1 No
B F 78  1 No   P M 83  1 Yes  B F 69 42 No
B M 75 30 Yes  P M 77 29 Yes  P F 79 20 Yes
A M 70 12 No   A F 69 12 No   B F 65 14 No
B M 70  1 No   B M 67 23 No   A M 76 25 Yes
P M 78 12 Yes  B M 77  1 Yes  B F 69 24 No
P M 66  4 Yes  P F 65 29 No   P M 60 26 Yes
A M 78 15 Yes  B M 75 21 Yes  A F 67 11 No
P F 72 27 No   P F 70 13 Yes  A M 75  6 Yes
B F 65  7 No   P F 68 27 Yes  P M 68 11 Yes
P M 67 17 Yes  B M 70 22 No   A M 65 15 No
P F 67  1 Yes  A M 67 10 No   P F 72 11 Yes
A F 74  1 No   B M 80 21 Yes  A F 69  3 No
;
The data set contains five variables. Treatment is a classification variable that has three levels: A and
B represent the two test treatments, and P represents the placebo treatment. Sex is a classification
variable that indicates each patient’s gender. Age is a continuous variable that indicates the age in
years of each patient when a treatment began. Duration is a continuous variable that indicates the
duration of complaint in months. The last variable Pain is the response variable with two levels: ‘Yes’
if pain was reported, ‘No’ if no pain was reported.
Suppose there is some preliminary belief that the dependency of pain on the explanatory variables is
different for male and female patients, leading to separate models between genders. You believe there
might be redundant information for predicting the probability of Pain. Thus, you want to perform
model selection to eliminate unnecessary effects. You can use the following statements:
proc sort data=Neuralgia;
by sex;
run;
proc logistic data=Neuralgia;
class Treatment / param=glm;
model pain = Treatment Age Duration / selection=backward;
by sex;
store painmodel;
title 'Logistic Model on Neuralgia';
run;
PROC SORT is called to sort the data by the variable Sex. The LOGISTIC procedure is then called
to model the probability of no pain. Three effects are specified for the full model: Treatment, Age,
and Duration. Backward elimination is used as the model selection method. The BY statement
specifies that separate models be fitted for male and female patients. Finally, the STORE statement
specifies that the fitted results be saved to an item store named painmodel.
Output 66.5.1 lists parameter estimates from the two models after backward elimination is performed.
From the model for female patients, Treatment is the only factor that affects the probability of no pain,
and Treatment A and B have the same positive effect in predicting the probability of no pain. From
the model for male patients, both Treatment and Age are included in the selected model. Treatment A
and B have different positive effects, while Age has a negative effect in predicting the probability of
no pain.
Output 66.5.1 Parameter Estimates for Male and Female Patients

Logistic Model on Neuralgia

------------------------------------ Sex=F ------------------------------------

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF   Estimate     Error       Chi-Square   Pr > ChiSq
Intercept     1    -0.4055     0.6455      0.3946       0.5299
Treatment A   1     2.6027     1.2360      4.4339       0.0352
Treatment B   1     2.6027     1.2360      4.4339       0.0352
Treatment P   0     0          .           .            .

------------------------------------ Sex=M ------------------------------------

Analysis of Maximum Likelihood Estimates

                               Standard    Wald
Parameter    DF   Estimate     Error       Chi-Square   Pr > ChiSq
Intercept     1    20.6178     9.1638      5.0621       0.0245
Treatment A   1     3.9982     1.7333      5.3208       0.0211
Treatment B   1     4.5556     1.9252      5.5993       0.0180
Treatment P   0     0          .           .            .
Age           1    -0.3416     0.1408      5.8869       0.0153
Now the fitted models are saved to the item store painmodel. Suppose you want to use it to score
several new observations. The following DATA steps create three data sets for scoring:
data score1;
input Treatment $ Sex $ Age;
datalines;
A F 20
B F 30
P F 40
A M 20
B M 30
P M 40
;
data score2;
set score1(drop=sex);
run;
data score3;
set score2(drop=Age);
run;
The first score data set score1 contains six observations and all the variables that are specified in
the full model. The second score data set score2 is a duplicate of score1 except that Sex is dropped.
The third score data set score3 is a duplicate of score2 except that Age is dropped. You can use the
following statements to score the three data sets:
proc plm source=painmodel;
   score data=score1 out=score1out predicted;
   score data=score2 out=score2out predicted;
   score data=score3 out=score3out predicted;
run;
Output 66.5.2 lists the store information that PROC PLM reads from the item store painmodel. The
“Model Effects” entry lists all three variables that are specified in the full model before the By-group
processing.
Output 66.5.2 Item Store Information for painmodel
Logistic Model on Neuralgia
The PLM Procedure
Store Information

Item Store             WORK.PAINMODEL
Data Set Created From  WORK.NEURALGIA
Created By             PROC LOGISTIC
Date Created           13JAN10:13:02:41
By Variable            Sex
Response Variable      Pain
Link Function          Logit
Distribution           Binary
Class Variables        Treatment Pain
Model Effects          Intercept Treatment Age Duration
With the three SCORE statements, three data sets are thus produced: score1out, score2out, and
score3out. They contain the linear predictors in addition to all original variables. The data set
score1out contains the values shown in Output 66.5.3:
Output 66.5.3 Values of Data Set score1out

Logistic Model on Neuralgia

Obs   Treatment   Sex   Age   Predicted
 1    A           F     20      2.1972
 2    B           F     30      2.1972
 3    P           F     40     -0.4055
 4    A           M     20     17.7850
 5    B           M     30     14.9269
 6    P           M     40      6.9557
Linear predictors are computed for all six observations. Because the BY variable Sex is available in
score1, PROC PLM uses separate models to score observations of male and female patients. So an
observation with the same Treatment and Age has different linear predictors for different genders.
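You can reproduce these linear predictors by hand from the parameter estimates in Output 66.5.1. A small Python check (outside SAS; agreement is up to rounding of the printed estimates):

```python
# Linear predictors for score1 from the Output 66.5.1 estimates
# (illustration only; dictionaries and function names are my own).
est_f = {'Intercept': -0.4055, 'A': 2.6027, 'B': 2.6027, 'P': 0.0}
est_m = {'Intercept': 20.6178, 'A': 3.9982, 'B': 4.5556, 'P': 0.0,
         'Age': -0.3416}

def linp_female(trt):
    """Female model: Treatment only."""
    return est_f['Intercept'] + est_f[trt]

def linp_male(trt, age):
    """Male model: Treatment and Age."""
    return est_m['Intercept'] + est_m[trt] + est_m['Age']*age
```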
The data set score2out contains the values shown in Output 66.5.4:
Output 66.5.4 Values of Data Set score2out

Logistic Model on Neuralgia

Obs   Sex   Treatment   Age   Predicted
 1    F     A           20      2.1972
 2    F     B           30      2.1972
 3    F     P           40     -0.4055
 4    F     A           20      2.1972
 5    F     B           30      2.1972
 6    F     P           40     -0.4055
 7    M     A           20     17.7850
 8    M     B           30     14.9269
 9    M     P           40      6.9557
10    M     A           20     17.7850
11    M     B           30     14.9269
12    M     P           40      6.9557
The second score data set, score2, does not contain the BY variable Sex. PROC PLM therefore
scores the full data set twice, each time based on the fitted model for the corresponding BY group.
In the output data set, Sex is added as the first column as the BY-group indicator. The first six
entries correspond to the model for female patients, and the next six entries correspond to the model
for male patients. Age is not included in the first model, and Treatment A and B have the same
parameter estimates, so observations 1, 2, 4, and 5 have the same linear predictor value.
The data set score3out contains the values shown in Output 66.5.5:
Output 66.5.5 Values of Data Set score3out

Logistic Model on Neuralgia

Obs   Sex   Treatment   Predicted
 1    F     A             2.19722
 2    F     B             2.19722
 3    F     P            -0.40547
 4    F     A             2.19722
 5    F     B             2.19722
 6    F     P            -0.40547
 7    M     A             .
 8    M     B             .
 9    M     P             .
10    M     A             .
11    M     B             .
12    M     P             .
Like score2, the third score data set, score3, does not contain the BY variable Sex, so PROC PLM
again scores the full data twice with the separate models. Furthermore, score3 does not contain the
variable Age, which is a selected variable for predicting the probability of no pain for male patients.
Thus, PROC PLM computes linear predictor values for score3 by using the first model for female
patients, and it sets the linear predictor to missing when using the second model for male patients to
score the data set.
Example 66.6: Comparing Multiple B-Splines
This example conducts an analysis similar to Example 15 in Chapter 38, "The GLIMMIX
Procedure." It uses simulated data to perform multiple comparisons among predicted values in
a model with group-specific trends that are modeled through regression splines. The estimable
functions are formed using nonpositional syntax with constructed effects. Consider the data in
the following DATA step. Each of the 100 observations for the continuous response variable y is
associated with one of two groups.
data spline;
   input group y @@;
   x = _n_;
   datalines;
1 -.020 1 0.199 2 -1.36  1 -.026
2 -.397 1 0.065 2 -.861  1 0.251
1 0.253 2 -.460 2 0.195  2 -.108
1 0.379 1 0.971 1 0.712  2 0.811
2 0.574 2 0.755 1 0.316  2 0.961
2 1.088 2 0.607 2 0.959  1 0.653
1 0.629 2 1.237 2 0.734  2 0.299
2 1.002 2 1.201 1 1.520  1 1.105
1 1.329 1 1.580 2 1.098  1 1.613
2 1.052 2 1.108 2 1.257  2 2.005
2 1.726 2 1.179 2 1.338  1 1.707
2 2.105 2 1.828 2 1.368  1 2.252
1 1.984 2 1.867 1 2.771  1 2.052
2 1.522 2 2.200 1 2.562  1 2.517
1 2.769 1 2.534 2 1.969  1 2.460
1 2.873 1 2.678 1 3.135  2 1.705
1 2.893 1 3.023 1 3.050  2 2.273
2 2.549 1 2.836 2 2.375  2 1.841
1 3.727 1 3.806 1 3.269  1 3.533
1 2.948 2 1.954 2 2.326  2 2.017
1 3.744 2 2.431 2 2.040  1 3.995
2 1.996 2 2.028 2 2.321  2 2.479
2 2.337 1 4.516 2 2.326  2 2.144
2 2.474 2 2.221 1 4.867  2 2.453
1 5.253 2 3.024 2 2.403  1 5.498
;
The following statements fit a model with separate trends for the two groups; the trends are modeled
as B-splines.
proc orthoreg data=spline;
   class group;
   effect spl = spline(x);
   model y = group spl*group / noint;
   store ortho_spline;
   title 'B-splines Comparisons';
run;
Results from this analysis are shown in Output 66.6.1. The “Parameter Estimates” table shows the
estimates for the spline coefficients in the two groups.
Output 66.6.1 Results for Group-Specific Spline Model

B-splines Comparisons

The ORTHOREG Procedure
Dependent Variable: y

Source               DF    Sum of Squares    Mean Square     F Value    Pr > F
Model                14    481.92117059      34.422940756     468.24    <.0001
Error                86    6.3223804119      0.0735160513
Uncorrected Total   100    488.243551

Root MSE    0.2711384357
R-Square    0.9603214326

Parameter       DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
(group='1')      1      9.70265463962039      3.1341899987       3.10      0.0026
(group='2')      1      6.30619220563569      2.6299147768       2.40      0.0187
spl_group_1_1    1     -11.1786451718041      3.7008097395      -3.02      0.0033
spl_group_1_2    1     -20.1946092746139      3.9765046236      -5.08      <.0001
spl_group_2_1    1     -9.53273697995301      3.2575832048      -2.93      0.0044
spl_group_2_2    1     -5.85652496534967      2.7906116773      -2.10      0.0388
spl_group_3_1    1     -8.96118371893294      3.0717508806      -2.92      0.0045
spl_group_3_2    1     -5.55671605245205      2.5716715573      -2.16      0.0335
spl_group_4_1    1     -7.26153231478755      3.243690314       -2.24      0.0278
spl_group_4_2    1     -4.36778889738236      2.7246809593      -1.60      0.1126
spl_group_5_1    1     -6.44615256510896      2.9616955361      -2.18      0.0323
spl_group_5_2    1     -4.03801618914902      2.4588839125      -1.64      0.1042
spl_group_6_1    1     -4.63816959094139      3.7094636319      -1.25      0.2146
spl_group_6_2    1     -4.30290104395061      3.0478540171      -1.41      0.1616
spl_group_7_1    0      0                     .                  .         .
spl_group_7_2    0      0                     .                  .         .
By default, the ORTHOREG procedure constructs B-splines with seven knots. Because the B-spline
basis functions satisfy a sum-to-one constraint at any value of x and because the model contains
group-specific intercepts, the last spline coefficient for each group is redundant and is estimated as 0.
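This sum-to-one (partition of unity) property is easy to verify numerically outside of SAS. The following Python sketch, which is not part of the SAS documentation, builds a cubic B-spline basis on knots like those reported in Output 66.6.3; it assumes the boundary is extended by one additional equally spaced knot on each side, consistent with the equal spacing of the reported knots. It then checks that the seven basis columns sum to one over the data range:

```python
import numpy as np
from scipy.interpolate import BSpline

# Knots as reported in Output 66.6.3, padded with one extra equally spaced
# knot on each side (an assumption about how the boundary is extended) so
# that a complete cubic basis exists on the data range [1, 100].
degree = 3
step = 24.75
reported = np.array([-48.5, -23.75, 1.0, 25.75, 50.5, 75.25,
                     100.0, 124.75, 149.5])
knots = np.concatenate([[reported[0] - step], reported,
                        [reported[-1] + step]])

# Design matrix of the cubic B-spline basis columns over the data range
x = np.linspace(1.0, 99.9, 50)
B = BSpline.design_matrix(x, knots, degree).toarray()

print(B.shape[1])                             # 7 basis columns
print(bool(np.allclose(B.sum(axis=1), 1.0)))  # True: each row sums to one
```

Because the basis columns always sum to one, including all of them together with a separate intercept leaves one column linearly dependent, which is why one coefficient per group is set to zero.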
The following statements make a prediction for the input data set by using the SCORE statement
with PROC PLM and graph the observed and predicted values in the two groups:
proc plm source=ortho_spline;
   score data=spline out=ortho_pred predicted=p;
run;

proc sgplot data=ortho_pred;
   series y=p x=x / group=group name="fit";
   scatter y=y x=x / group=group;
   keylegend "fit" / title="Group";
run;
The prediction plot in Output 66.6.2 suggests that there is some separation of the group trends for
small values of x and for values that exceed about x = 40.
Output 66.6.2 Observed Data and Predicted Values by Group
In order to determine the range on which the trends separate significantly, the PLM procedure is
executed in the following statements with an ESTIMATE statement that applies group comparisons
at a number of values for the spline variable x:
%macro GroupDiff;
   %do x=0 %to 75 %by 5;
      "Diff at x=&x" group 1 -1 group*spl [1,1 &x] [-1,2 &x],
   %end;
   'Diff at x=80' group 1 -1 group*spl [1,1 80] [-1,2 80]
%mend;

proc plm source=ortho_spline;
   show effects;
   estimate %GroupDiff / adjust=simulate seed=1 stepdown;
run;
For example, the following ESTIMATE statement compares the trends between the two groups at
x = 25:

estimate 'Diff at x=25' group 1 -1 group*spl [1,1 25] [-1,2 25];

The nonpositional syntax is used for the group*spl effect. For example, the specification [-1,2 25]
requests that the spline be computed at x = 25 for the second level of the variable group. The resulting
coefficients are multiplied by -1 and added to the L vector for the estimate.
Because comparisons are made at a large number of values for x, a multiplicity correction is in
order to adjust the p-values to reflect familywise error control. Simulated p-values with step-down
adjustment are used here.
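For intuition, the step-down logic can be sketched in a few lines of Python. This is an outside-of-SAS illustration using Holm's classical Bonferroni bound, not the exact computation PROC PLM performs: with ADJUST=SIMULATE the per-step multiplier is replaced by a simulated multivariate t reference, which is why the Adj P column in Output 66.6.4 differs from plain Holm adjustment.

```python
import numpy as np

def holm_adjust(p):
    """Holm step-down with the Bonferroni bound: visit the raw p-values
    from smallest to largest, multiply each by the number of hypotheses
    still in play at that step, and enforce monotonicity."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    adj = np.empty(m)
    running = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running = max(running, (m - rank) * p[idx])
        adj[idx] = min(1.0, running)
    return adj

print(holm_adjust([0.010, 0.040, 0.030]))   # [0.03 0.06 0.06]
```

The smallest p-value is multiplied by the full family size, the next by one less, and so on; the running maximum guarantees that the adjusted p-values never decrease as the raw p-values increase.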
Output 66.6.3 displays the “Store Information” for the item store and information about the spline
effect (the result of the SHOW statement).
Output 66.6.3 Spline Details

B-splines Comparisons

The PLM Procedure

Store Information

Item Store                 WORK.ORTHO_SPLINE
Data Set Created From      WORK.SPLINE
Created By                 PROC ORTHOREG
Date Created               13JAN10:13:03:14
Response Variable          y
Class Variable             group
Constructed Effect         spl
Model Effects              group spl*group

Knots for Spline Effect spl

Knot
Number    Boundary            x
   1         *        -48.50000
   2         *        -23.75000
   3         *          1.00000
   4                   25.75000
   5                   50.50000
   6                   75.25000
   7         *        100.00000
   8         *        124.75000
   9         *        149.50000

Output 66.6.3 continued

B-splines Comparisons

The PLM Procedure

Basis Details for Spline Effect spl

Column    --------Support--------    Support Knots
   1      -48.50000     25.75000         1-4
   2      -48.50000     50.50000         1-5
   3      -23.75000     75.25000         2-6
   4        1.00000    100.00000         3-7
   5       25.75000    124.75000         4-8
   6       50.50000    149.50000         5-9
   7       75.25000    149.50000         6-9
Output 66.6.4 displays the results from the ESTIMATE statement.
Output 66.6.4 Estimate Results with Multiplicity Correction

Estimates
Adjustment for Multiplicity: Holm-Simulated

Label           Estimate   Standard Error   DF   t Value   Pr > |t|     Adj P
Diff at x=0      12.4124       4.2130       86      2.95     0.0041    0.0206
Diff at x=5       1.0376       0.1759       86      5.90     <.0001    <.0001
Diff at x=10      0.3778       0.1540       86      2.45     0.0162    0.0545
Diff at x=15     0.05822       0.1481       86      0.39     0.6952    0.9101
Diff at x=20    -0.02602       0.1243       86     -0.21     0.8346    0.9565
Diff at x=25     0.02014       0.1312       86      0.15     0.8783    0.9565
Diff at x=30      0.1023       0.1378       86      0.74     0.4600    0.7418
Diff at x=35      0.1924       0.1236       86      1.56     0.1231    0.2925
Diff at x=40      0.2883       0.1114       86      2.59     0.0113    0.0450
Diff at x=45      0.3877       0.1195       86      3.24     0.0017    0.0096
Diff at x=50      0.4885       0.1308       86      3.74     0.0003    0.0024
Diff at x=55      0.5903       0.1231       86      4.79     <.0001    <.0001
Diff at x=60      0.7031       0.1125       86      6.25     <.0001    <.0001
Diff at x=65      0.8401       0.1203       86      6.99     <.0001    <.0001
Diff at x=70      1.0147       0.1348       86      7.52     <.0001    <.0001
Diff at x=75      1.2400       0.1326       86      9.35     <.0001    <.0001
Diff at x=80      1.5237       0.1281       86     11.89     <.0001    <.0001
Notice that the “Store Information” in Output 66.6.3 displays the classification variables (from the
CLASS statement in PROC ORTHOREG), the constructed effects (from the EFFECT statement in
PROC ORTHOREG), and the model effects (from the MODEL statement in PROC ORTHOREG).
Output 66.6.4 shows that at the 5% significance level the trends are significantly different for x < 10
and for x ≥ 40. Between those values you cannot reject the hypothesis of trend congruity.
To see this effect more clearly, you can filter the results by adding the following FILTER statement to
the previous PROC PLM run:
filter adjp > 0.05;
This produces Output 66.6.5, which displays the subset of the results in Output 66.6.4 that meets the
condition in the FILTER expression.
Output 66.6.5 Filtered Estimate Results

B-splines Comparisons

The PLM Procedure

Estimates
Adjustment for Multiplicity: Holm-Simulated

Label           Estimate   Standard Error   DF   t Value   Pr > |t|     Adj P
Diff at x=10      0.3778       0.1540       86      2.45     0.0162    0.0545
Diff at x=15     0.05822       0.1481       86      0.39     0.6952    0.9101
Diff at x=20    -0.02602       0.1243       86     -0.21     0.8346    0.9565
Diff at x=25     0.02014       0.1312       86      0.15     0.8783    0.9565
Diff at x=30      0.1023       0.1378       86      0.74     0.4600    0.7418
Diff at x=35      0.1924       0.1236       86      1.56     0.1231    0.2925
Example 66.7: Linear Inference with Arbitrary Estimates
Suppose that you have calculated a vector of parameter estimates of dimension (p × 1) and its
associated variance-covariance matrix by some statistical method. You are now interested in using
these results to perform linear inference, or perhaps to score a data set and to calculate predicted
values and their standard errors.
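The scoring computation itself is plain linear algebra: for an estimate vector b with covariance matrix V, a covariate row x yields the predicted value x'b with standard error sqrt(x'Vx). A minimal Python sketch of this idea, using hypothetical numbers that are not taken from this example:

```python
import numpy as np

# Generic scoring step: for an estimate vector b with covariance matrix V,
# a covariate row x gives the prediction x'b with standard error sqrt(x'Vx).
# All numbers below are hypothetical illustrations.
b = np.array([-3.5, 0.44, -2.6])
V = np.array([[ 0.0075, -0.0052,  0.0102],
              [-0.0052,  0.0482, -0.0106],
              [ 0.0102, -0.0106,  0.2160]])
x = np.array([1.0, 2.0, 1.0])      # hypothetical covariate row

pred = x @ b                       # linear predictor
se = np.sqrt(x @ V @ x)            # its standard error

print(round(float(pred), 2))       # -5.22
```

This is exactly the computation that the SCORE statement of PROC PLM performs for each observation once the estimates and their covariance matrix are available in an item store.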
The following DATA steps create two SAS data sets. The first, called parms, contains six estimates
that represent two uncorrelated groups. The data set cov contains the covariance matrix of the
estimates. The lack of correlation between the two sets of three parameters is evident in the
block-diagonal structure of the covariance matrix.
data parms;
   length name $6;
   input Name$ Value;
   datalines;
alpha1 -3.5671
beta1   0.4421
gamma1 -2.6230
alpha2 -3.0111
beta2   0.3977
gamma2 -2.4442
;
data cov;
input Parm row col1-col6;
datalines;
1 1 0.007462 -0.005222 0.010234 0.000000 0.000000 0.000000
1 2 -0.005222 0.048197 -0.010590 0.000000 0.000000 0.000000
1 3 0.010234 -0.010590 0.215999 0.000000 0.000000 0.000000
1 4 0.000000 0.000000 0.000000 0.031261 -0.009096 0.015785
1 5 0.000000 0.000000 0.000000 -0.009096 0.039487 -0.019996
1 6 0.000000 0.000000 0.000000 0.015785 -0.019996 0.126172
;
Suppose that you are interested in testing whether the parameters are homogeneous across groups,
that is, whether α1 = α2, β1 = β2, and γ1 = γ2. You are interested in testing the hypotheses jointly
and separately with multiplicity adjustment.
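Because the estimates and their covariance matrix are given explicitly, the homogeneity tests can be previewed directly as Wald statistics. The following Python sketch is an outside-of-SAS check that uses the values from the parms and cov data sets and reproduces the t statistics and joint F value that PROC PLM reports later in Output 66.7.2:

```python
import numpy as np
from scipy.stats import chi2

# Estimates and covariance exactly as in the parms and cov data sets,
# in the order alpha1, beta1, gamma1, alpha2, beta2, gamma2.
b = np.array([-3.5671, 0.4421, -2.6230, -3.0111, 0.3977, -2.4442])
V = np.array([
    [ 0.007462, -0.005222,  0.010234,  0.0,       0.0,       0.0     ],
    [-0.005222,  0.048197, -0.010590,  0.0,       0.0,       0.0     ],
    [ 0.010234, -0.010590,  0.215999,  0.0,       0.0,       0.0     ],
    [ 0.0,       0.0,       0.0,       0.031261, -0.009096,  0.015785],
    [ 0.0,       0.0,       0.0,      -0.009096,  0.039487, -0.019996],
    [ 0.0,       0.0,       0.0,       0.015785, -0.019996,  0.126172]])

# Rows of L encode alpha1-alpha2, beta1-beta2, gamma1-gamma2
L = np.array([[1, 0, 0, -1,  0,  0],
              [0, 1, 0,  0, -1,  0],
              [0, 0, 1,  0,  0, -1]], dtype=float)

est = L @ b                               # estimated differences
se  = np.sqrt(np.diag(L @ V @ L.T))       # their standard errors
t   = est / se                            # t statistics (infinite df)

# Joint Wald statistic (Lb)'(LVL')^{-1}(Lb); divide by rank(L) for F scale
wald = est @ np.linalg.solve(L @ V @ L.T, est)
F = wald / 3

print(np.round(t, 2))                     # [-2.83  0.15 -0.31]
print(round(float(F), 2))                 # 2.79
print(round(float(chi2.sf(wald, 3)), 4))  # 0.0389: Pr > F with infinite den df
```

With infinite denominator degrees of freedom, the F test is equivalent to comparing the Wald statistic to a chi-square distribution with 3 degrees of freedom.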
In order to use the facilities of the PLM procedure, you first need to create an item store that contains
the necessary information as if the preceding parameter vector and covariance matrix were the
result of a statistical modeling procedure. The following statements use the multivariate facilities of
the GLIMMIX procedure to create such an item store, by fitting a saturated linear model with the
GLIMMIX procedure where the data set that contains the parameter estimates serves as the input
data set:
proc glimmix data=parms order=data;
   class Name;
   model Value = Name / noint ddfm=none s;
   random _residual_ / type=lin(1) ldata=cov v;
   parms (1) / noiter;
   store ArtificialModel;
   title 'Linear Inference';
run;
The RANDOM statement is used to form the covariance structure for the estimates. The PARMS
statement prevents iterative updates of the covariance parameters. The resulting marginal covariance
matrix of the “data” is thus identical to the covariance matrix in the data set cov. The ORDER=DATA
option in the PROC GLIMMIX statement is used to arrange the levels of the classification variable
Name in the order in which they appear in the data set so that the order of the parameters matches
that of the covariance matrix.
The results of this analysis are shown in Output 66.7.1. Note that the parameter estimates are
identical to the values passed in the input data set, and their standard errors equal the square roots
of the corresponding diagonal elements of the covariance matrix in the cov data set.
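The fit reproduces the input exactly because the model is saturated generalized least squares with design matrix X = I and known covariance V: the GLS estimate inv(X'V⁻¹X)X'V⁻¹y collapses to y, and its covariance inv(X'V⁻¹X) collapses to V. A quick numerical check of this identity in Python (outside of SAS):

```python
import numpy as np

# Saturated GLS: with design X = I and known covariance V, the estimate
# equals the "data" y and its covariance equals V, which is why GLIMMIX
# returns the input estimates and standard errors unchanged.
y = np.array([-3.5671, 0.4421, -2.6230, -3.0111, 0.3977, -2.4442])
V = np.array([
    [ 0.007462, -0.005222,  0.010234,  0.0,       0.0,       0.0     ],
    [-0.005222,  0.048197, -0.010590,  0.0,       0.0,       0.0     ],
    [ 0.010234, -0.010590,  0.215999,  0.0,       0.0,       0.0     ],
    [ 0.0,       0.0,       0.0,       0.031261, -0.009096,  0.015785],
    [ 0.0,       0.0,       0.0,      -0.009096,  0.039487, -0.019996],
    [ 0.0,       0.0,       0.0,       0.015785, -0.019996,  0.126172]])
X = np.eye(6)

Vinv = np.linalg.inv(V)
covb = np.linalg.inv(X.T @ Vinv @ X)   # equals V
bhat = covb @ (X.T @ Vinv @ y)         # equals y

print(bool(np.allclose(bhat, y)))      # True
print(bool(np.allclose(covb, V)))      # True
print(np.round(np.sqrt(np.diag(covb)), 5))   # the standard errors in Output 66.7.1
```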
Output 66.7.1 "Fitted" Parameter Estimates and Covariance Matrix

Linear Inference

The GLIMMIX Procedure

Estimated V Matrix for Subject 1

Row        Col1        Col2        Col3        Col4        Col5        Col6
  1    0.007462    -0.00522     0.01023
  2    -0.00522     0.04820    -0.01059
  3     0.01023    -0.01059      0.2160
  4                                         0.03126    -0.00910     0.01579
  5                                        -0.00910     0.03949    -0.02000
  6                                         0.01579    -0.02000      0.1262

Solutions for Fixed Effects

                            Standard
Effect    name    Estimate     Error      DF       t Value    Pr > |t|
name     alpha1    -3.5671     0.08638    Infty     -41.29      <.0001
name     beta1      0.4421     0.2195     Infty       2.01      0.0440
name     gamma1    -2.6230     0.4648     Infty      -5.64      <.0001
name     alpha2    -3.0111     0.1768     Infty     -17.03      <.0001
name     beta2      0.3977     0.1987     Infty       2.00      0.0454
name     gamma2    -2.4442     0.3552     Infty      -6.88      <.0001
There are other ways to fit a saturated model with the GLIMMIX procedure. For example, you
can use the TYPE=UN covariance structure in the RANDOM statement with a properly prepared
input data set for the PDATA= option in the PARMS statement. See Example 17 in Chapter 38,
"The GLIMMIX Procedure," for details.
Once the item store exists, you can apply the linear inference capabilities of the PLM procedure.
For example, the ESTIMATE statement in the following step tests the hypothesis of parameter
homogeneity across groups:

proc plm source=ArtificialModel;
   estimate
      'alpha1 = alpha2' Name 1 0 0 -1  0  0,
      'beta1 = beta2  ' Name 0 1 0  0 -1  0,
      'gamma1 = gamma2' Name 0 0 1  0  0 -1 /
      adjust=bon stepdown ftest(label='Homogeneity');
run;
Output 66.7.2 Results from the PLM Procedure

Linear Inference

The PLM Procedure

Estimates
Adjustment for Multiplicity: Holm

Label              Estimate   Standard Error    DF       t Value    Pr > |t|     Adj P
alpha1 = alpha2     -0.5560       0.1968        Infty      -2.83      0.0047    0.0142
beta1 = beta2       0.04440       0.2961        Infty       0.15      0.8808    1.0000
gamma1 = gamma2     -0.1788       0.5850        Infty      -0.31      0.7599    1.0000

F Test for Estimates

Label          Num DF    Den DF    F Value    Pr > F
Homogeneity       3       Infty       2.79    0.0389
The F test in Output 66.7.2 shows that the joint test of homogeneity is rejected. The individual tests
with familywise control of the Type I error show that the overall difference is due to a significant
change in the α parameters. The hypothesis of homogeneity across the two groups cannot be rejected
for the β and γ parameters.
Subject Index
alpha level
PLM procedure, 5428
degrees of freedom
PLM procedure, 5428
options summary
ESTIMATE statement, 5422
PLM procedure
alpha level, 5428
BY processing, 5434
common postprocessing statements, 5409
degrees of freedom, 5428
filter PLM results, 5423
item store, 5408
least squares means, 5429
ODS graph names, 5438
ODS Graphics, 5419
ODS table names, 5437
posterior inference, 5434
scoring statistics, 5429
user-defined formats, 5436
scoring statistics
PLM procedure, 5429
Syntax Index
procedure, EFFECTPLOT statement, 5421
procedure, ESTIMATE statement, 5422
procedure, LSMEANS statement, 5425
procedure, LSMESTIMATE statement, 5426
procedure, SLICE statement, 5431
procedure, TEST statement, 5431
ALL option
SHOW statement (PLM), 5430
ALPHA= option
PROC PLM statement (PLM), 5418
SCORE statement (PLM), 5428
BYVAR option
SHOW statement (PLM), 5430
CLASS option
SHOW statement (PLM), 5430
CORRELATION option
SHOW statement (PLM), 5430
COVARIANCE option
SHOW statement (PLM), 5430
DDFMETHOD= option
PROC PLM statement (PLM), 5418
DF= option
SCORE statement (PLM), 5428
EFFECTPLOT statement
procedure, 5421
EFFECTS option
SHOW statement (PLM), 5430
ESTEPS= option
PROC PLM statement (PLM), 5418
ESTIMATE statement
procedure, 5422
FILTER statement
PLM procedure, 5423
FITSTATS option
SHOW statement (PLM), 5430
FORMAT= option
PROC PLM statement (PLM), 5418
HERMITE option
SHOW statement (PLM), 5430
HESSIAN option
SHOW statement (PLM), 5430
ILINK option
SCORE statement (PLM), 5428
LSMEANS statement
procedure, 5425
LSMESTIMATE statement
procedure, 5426
MAXLEN= option
PROC PLM statement (PLM), 5419
NOCLPRINT option
PROC PLM statement (PLM), 5419
NOINFO option
PROC PLM statement (PLM), 5419
NOPRINT option
PROC PLM statement (PLM), 5419
NOUNIQUE option
SCORE statement (PLM), 5428
NOVAR option
SCORE statement (PLM), 5428
OBSCAT option
SCORE statement (PLM), 5428
PARAMETERS option
SHOW statement (PLM), 5430
PERCENTILES= option
PROC PLM statement (PLM), 5419
PLM procedure, 5417
FILTER statement, 5423
PROC PLM statement, 5418
SHOW statement, 5429
syntax, 5417
WHERE statement, 5432
PLM procedure, FILTER statement, 5423
PLM procedure, PROC PLM statement, 5418
ALPHA= option, 5418
DDFMETHOD= option, 5418
ESTEPS= option, 5418
FORMAT= option, 5418
MAXLEN= option, 5419
NOCLPRINT option, 5419
NOINFO option, 5419
PERCENTILES= option, 5419
PLOT option, 5419
PLOTS option, 5419
SEED= option, 5420
SINGCHOL= option, 5420
SINGRES= option, 5420
SINGULAR= option, 5420
SOURCE= option, 5420
STMTORDER= option, 5421
WHEREFORMAT option, 5421
ZETA= option, 5421
PLM procedure, SCORE statement
ALPHA= option, 5428
DF= option, 5428
ILINK option, 5428
NOUNIQUE option, 5428
NOVAR option, 5428
OBSCAT option, 5428
SAMPLE option, 5429
PLM procedure, SHOW statement, 5429
ALL option, 5430
BYVAR option, 5430
CLASS option, 5430
CORRELATION option, 5430
COVARIANCE option, 5430
EFFECTS option, 5430
FITSTATS option, 5430
HERMITE option, 5430
HESSIAN option, 5430
PARAMETERS option, 5430
PROGRAM option, 5431
XPX option, 5431
XPXI option, 5431
PLM procedure, WHERE statement, 5432
PLOT option
PROC PLM statement, 5419
PLOTS option
PROC PLM statement, 5419
PROC PLM statement, see PLM procedure
PLM procedure, 5418
PROGRAM option
SHOW statement (PLM), 5431
SAMPLE option
SCORE statement (PLM), 5429
SEED= option
PROC PLM statement (PLM), 5420
SHOW statement
PLM procedure, 5429
SINGCHOL= option
PROC PLM statement (PLM), 5420
SINGRES= option
PROC PLM statement (PLM), 5420
SINGULAR= option
PROC PLM statement (PLM), 5420
SLICE statement
procedure, 5431
SOURCE= option
PROC PLM statement (PLM), 5420
STMTORDER= option
PROC PLM statement (PLM), 5421
TEST statement
procedure, 5431
WHERE statement
PLM procedure, 5432
WHEREFORMAT option
PROC PLM statement (PLM), 5421
XPX option
SHOW statement (PLM), 5431
XPXI option
SHOW statement (PLM), 5431
ZETA= option
PROC PLM statement (PLM), 5421