The Four Types of Estimable Functions SAS/STAT User’s Guide

This document is an individual chapter from SAS/STAT® 9.22 User’s Guide.
The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT® 9.22 User’s
Guide. Cary, NC: SAS Institute Inc.
Copyright © 2010, SAS Institute Inc., Cary, NC, USA
1st electronic book, May 2010
Chapter 15
The Four Types of Estimable Functions
Contents
Overview
Estimability
    General Form of an Estimable Function
    Introduction to Reduction Notation
    Examples
Estimable Functions
    Type I SS and Estimable Functions
    Type II SS and Estimable Functions
    Type III and IV SS and Estimable Functions
References
Overview
Many regression and analysis of variance procedures in SAS/STAT label tests for various effects in
the model as Type I, Type II, Type III, or Type IV. These four types of hypotheses might not always
be sufficient for a statistician to perform all desired inferences, but they should suffice for the vast
majority of analyses. This chapter explains the hypotheses involved in each of the four test types.
For additional discussion, see Freund, Littell, and Spector (1991) or Milliken and Johnson (1984).
The primary context of the discussion is testing linear hypotheses in least squares regression and
analysis of variance, such as with PROC GLM. In this context, tests correspond to hypotheses about
linear functions of the true parameters and are evaluated using sums of squares of the estimated
parameters. Thus, there will be frequent references to Type I, II, III, and IV (estimable) functions
and corresponding Type I, II, III, and IV sums of squares, or simply SS.
Estimability
Given a response or dependent variable $Y$, predictors or independent variables $X$, and a linear
expectation model $E[Y] = X\beta$ relating the two, a primary analytical goal is to estimate or test for
the significance of certain linear combinations of the elements of $\beta$. For least squares regression
and analysis of variance, this is accomplished by computing linear combinations of the observed
$Y$s. An unbiased linear estimate of a specific linear function of the individual $\beta$s, say $L\beta$, is a linear
combination of the $Y$s that has an expected value of $L\beta$. Hence, the following definition:

    A linear combination of the parameters $L\beta$ is estimable if and only if a linear combination
    of the $Y$s exists that has expected value $L\beta$.

Any linear combination of the $Y$s, for instance $KY$, will have expectation $E[KY] = KX\beta$. Thus,
the expected value of any linear combination of the $Y$s is equal to that same linear combination of
the rows of $X$ multiplied by $\beta$. Therefore,

    $L\beta$ is estimable if and only if there is a linear combination of the rows of $X$ that is equal
    to $L$; that is, if and only if there is a $K$ such that $L = KX$.

Thus, the rows of $X$ form a generating set from which any estimable $L$ can be constructed. Since the
row space of $X$ is the same as the row space of $X'X$, the rows of $X'X$ also form a generating set from
which all estimable $L$s can be constructed. Similarly, the rows of $(X'X)^{-}X'X$ also form a generating
set for $L$.
Therefore, if $L$ can be written as a linear combination of the rows of $X$, $X'X$, or $(X'X)^{-}X'X$, then
$L\beta$ is estimable.
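The row-space condition can be checked numerically. The following is a minimal sketch, assuming SAS/IML is available; the one-way design matrix and the two candidate vectors (named Ldiff and Lsingle here purely for illustration) are assumptions chosen for the example, and the Moore-Penrose inverse from GINV stands in for any generalized inverse of $X'X$. A vector $L$ is estimable exactly when $L (X'X)^{-}X'X = L$.

```sas
proc iml;
   /* Assumed design: intercept plus indicators for three groups, two rows per group */
   X = {1 1 0 0,
        1 1 0 0,
        1 0 1 0,
        1 0 1 0,
        1 0 0 1,
        1 0 0 1};
   XpX = X` * X;
   H   = ginv(XpX) * XpX;       /* projects onto the row space of X */

   Ldiff   = {0 1 -1 0};        /* A1 - A2: difference of two rows of X */
   Lsingle = {0 1  0 0};        /* A1 alone */

   /* 1 if L*H reproduces L (estimable), 0 otherwise */
   estDiff   = (max(abs(Ldiff*H   - Ldiff))   < 1e-8);
   estSingle = (max(abs(Lsingle*H - Lsingle)) < 1e-8);
   print estDiff estSingle;
quit;
```

Under these assumptions the difference A1 - A2 should be flagged as estimable and A1 alone should not, because (0, 1, 0, 0) is not a linear combination of the rows of $X$.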
In the context of least squares regression and analysis of variance, an estimable linear function
$L\beta$ can be estimated by $L\hat{\beta}$, where $\hat{\beta} = (X'X)^{-}X'Y$. From the general theory of linear models,
the unbiased estimator $L\hat{\beta}$ is, in fact, the best linear unbiased estimator of $L\beta$, in the sense of
having minimum variance as well as maximum likelihood when the residuals are normal. To test the
hypothesis that $L\beta = 0$, compute the sum of squares

\[ SS(H_0\colon L\beta = 0) = (L\hat{\beta})' \, (L(X'X)^{-}L')^{-1} \, L\hat{\beta} \]

and form an $F$ test with the appropriate error term. Note that in contexts more general than least
squares regression (for example, generalized and/or mixed linear models), linear hypotheses are
often tested by analogous sums of squares of the estimated linear parameters
$(L\hat{\beta})' \, (\mathrm{Var}[L\hat{\beta}])^{-1} \, L\hat{\beta}$.
General Form of an Estimable Function
This section demonstrates a shorthand technique for displaying the generating set for any estimable
$L$. Suppose

\[
X = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}
\quad \text{and} \quad
\beta = \begin{bmatrix} \mu \\ A_1 \\ A_2 \\ A_3 \end{bmatrix}
\]

$X$ is a generating set for $L$, but so is the smaller set

\[
X^{*} = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}
\]

$X^{*}$ is formed from $X$ by deleting duplicate rows.
Since all estimable $L$s must be linear functions of the rows of $X^{*}$ for $L\beta$ to be estimable, an $L$ for a
single-degree-of-freedom estimate can be represented symbolically as

\[ L1 \times (1 \;\; 1 \;\; 0 \;\; 0) + L2 \times (1 \;\; 0 \;\; 1 \;\; 0) + L3 \times (1 \;\; 0 \;\; 0 \;\; 1) \]

or

\[ L = (L1 + L2 + L3, \; L1, \; L2, \; L3) \]

For this example, $L\beta$ is estimable if and only if the first element of $L$ is equal to the sum of the other
elements of $L$ or if

\[ L\beta = (L1 + L2 + L3)\,\mu + L1\,A_1 + L2\,A_2 + L3\,A_3 \]

is estimable for any values of L1, L2, and L3.
If other generating sets for $L$ are represented symbolically, the symbolic notation looks different.
However, the inherent nature of the rules is the same. For example, if row operations are performed
on $X^{*}$ to produce an identity matrix in the first 3 × 3 submatrix of the resulting matrix

\[
X^{**} = \begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & -1 \\
0 & 0 & 1 & -1
\end{bmatrix}
\]

then $X^{**}$ is also a generating set for $L$. An estimable $L$ generated from $X^{**}$ can be represented
symbolically as

\[ L = (L1, \; L2, \; L3, \; L1 - L2 - L3) \]

Note that, again, the first element of $L$ is equal to the sum of the other elements.
With multiple generating sets available, the question arises as to which one is the best to represent
$L$ symbolically. Clearly, a generating set containing a minimum of rows (of full row rank) and a
maximum of zero elements is desirable.
The generalized g2-inverse $(X'X)^{-}$ of $X'X$ computed by the modified sweep operation (Goodnight
1979) has the property that $(X'X)^{-}X'X$ usually contains numerous zeros. For this reason, in PROC
GLM the nonzero rows of $(X'X)^{-}X'X$ are used to represent $L$ symbolically.
If the generating set represented symbolically is of full row rank, the number of symbols (L1, L2, ...)
represents the maximum rank of any testable hypothesis (in other words, the maximum number of
linearly independent rows for any $L$ matrix that can be constructed). By letting each symbol in turn
take on the value of 1 while the others are set to 0, the original generating set can be reconstructed.
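To see the symbolic form for a concrete design, PROC GLM can print it through the E option in the MODEL statement. The sketch below assumes a small one-way layout with three levels; the data set name oneway and the response values are hypothetical and serve only to make the step runnable. The general form depends on the design, not on the response values.

```sas
/* Hypothetical data: three levels of A, two observations each; y values are illustrative only */
data oneway;
   input a y @@;
   datalines;
1 10  1 12  2 15  2 14  3 20  3 19
;

proc glm data=oneway;
   class a;
   model y = a / e;   /* E requests the general form of estimable functions */
run;
quit;
```

The printed table lists one coefficient expression per parameter, written in terms of the free symbols described above.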
Introduction to Reduction Notation
Reduction notation can be used to represent differences in sums of squares (SS) for two models.
The notation $R(\mu, A, B, C)$ denotes the complete main-effects model for effects $A$, $B$, and $C$. The
notation

\[ R(A \mid \mu, B, C) \]

denotes the difference between the model SS for the complete main-effects model containing $A$, $B$,
and $C$ and the model SS for the reduced model containing only $B$ and $C$.
In other words, this notation represents the differences in model SS produced by
proc glm;
class a b c;
model y = a b c;
run;
and
proc glm;
class b c;
model y = b c;
run;
As another example, consider a regression equation with four independent variables. The notation
$R(\beta_3, \beta_4 \mid \beta_1, \beta_2)$ denotes the differences in model SS between

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon \]

and

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \]

This is the difference in the model SS for the models produced, respectively, by
model y = x1 x2 x3 x4;
and
model y = x1 x2;
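Written out, the reduction for this regression example is a difference of two model sums of squares. In the sketch below, $SS_{\text{model}}(\cdot)$ and $SS_{\text{error}}(\cdot)$ are shorthand introduced here for the model and error SS of the fit that uses the listed regressors (plus intercept); they are not SAS notation. The second equality assumes both models are fit to the same observations, so that the total SS is identical.

```latex
\begin{aligned}
R(\beta_3, \beta_4 \mid \beta_1, \beta_2)
  &= SS_{\text{model}}(x_1, x_2, x_3, x_4) - SS_{\text{model}}(x_1, x_2) \\
  &= SS_{\text{error}}(x_1, x_2) - SS_{\text{error}}(x_1, x_2, x_3, x_4)
\end{aligned}
```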
The following examples demonstrate the ability to manipulate the symbolic representation of a
generating set. Note that any operations performed on the symbolic notation have corresponding row
operations that are performed on the generating set itself.
Examples
A One-Way Classification Model
For the model

\[ Y = \mu + A_i + \epsilon, \qquad i = 1, 2, 3 \]

the general form of estimable functions $L\beta$ is (from the previous example)

\[ L\beta = L1\,\mu + L2\,A_1 + L3\,A_2 + (L1 - L2 - L3)\,A_3 \]

Thus,

\[ L = (L1, \; L2, \; L3, \; L1 - L2 - L3) \]

Tests involving only the parameters $A_1$, $A_2$, and $A_3$ must have an $L$ of the form

\[ L = (0, \; L2, \; L3, \; -L2 - L3) \]

Since this $L$ for the $A$ parameters involves only two symbols, hypotheses with at most two degrees of
freedom can be constructed. For example, letting (L2, L3) be (1, 0) and (0, 1), respectively, yields

\[
L = \begin{bmatrix}
0 & 1 & 0 & -1 \\
0 & 0 & 1 & -1
\end{bmatrix}
\]

The preceding $L$ can be used to test the hypothesis that $A_1 = A_2 = A_3$. For this example, any $L$
with two linearly independent rows with column 1 equal to zero produces the same sum of squares.
For example, a joint test for linear and quadratic effects of $A$

\[
L = \begin{bmatrix}
0 & -1 & 0 & 1 \\
0 & 1 & -2 & 1
\end{bmatrix}
\]

gives the same SS. In fact, for any $L$ of full row rank and any nonsingular matrix $K$ of conformable
dimensions,

\[ SS(H_0\colon L\beta = 0) = SS(H_0\colon KL\beta = 0) \]
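The invariance of the sum of squares can be checked with two CONTRAST statements in PROC GLM. This is a sketch under assumptions: the data set oneway and its response values are hypothetical, and the contrast labels are arbitrary. The first contrast uses the rows (0, 1, 0, -1) and (0, 0, 1, -1); the second uses the linear and quadratic rows (0, -1, 0, 1) and (0, 1, -2, 1). The intercept coefficient is zero in every row, so only the coefficients for A are listed.

```sas
/* Hypothetical data: three levels of A; y values are illustrative only */
data oneway;
   input a y @@;
   datalines;
1 10  1 12  2 15  2 14  3 20  3 19
;

proc glm data=oneway;
   class a;
   model y = a;
   /* Two-row contrast testing A1 = A2 = A3 */
   contrast 'A1 = A2 = A3'         a  1  0 -1,
                                   a  0  1 -1;
   /* Joint linear and quadratic contrast spanning the same row space */
   contrast 'linear and quadratic' a -1  0  1,
                                   a  1 -2  1;
run;
quit;
```

Because the two sets of rows span the same two-dimensional space, both contrasts should report the same sum of squares, which in this one-way model also matches the SS for A.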
A Three-Factor Main-Effects Model
Consider a three-factor main-effects model involving the CLASS variables $A$, $B$, and $C$, as shown in
Table 15.1.

Table 15.1 Three-Factor Main-Effects Model

Obs    A    B    C
1      1    2    1
2      1    1    2
3      2    1    3
4      2    2    2
5      2    2    2

The general form of an estimable function is shown in Table 15.2.

Table 15.2 General Form of an Estimable Function for Three-Factor Main-Effects Model

Parameter        Coefficient
μ (Intercept)    L1
A1               L2
A2               L1 - L2
B1               L4
B2               L1 - L4
C1               L6
C2               L1 + L2 - L4 - 2*L6
C3               -L2 + L4 + L6

Since only four symbols (L1, L2, L4, and L6) are involved, any testable hypothesis will have
at most four degrees of freedom. If you form an $L$ matrix with four linearly independent rows
according to the preceding rules, then testing $L\beta = 0$ is equivalent to testing that $E[Y]$ is uniformly
0. Symbolically,

\[ SS(H_0\colon L\beta = 0) = R(\mu, A, B, C) \]

In a main-effects model, the usual hypothesis of interest for a main effect is the equality of all the
parameters. In this example, it is not possible to unambiguously test such a hypothesis because of
confounding: any test for the equality of the parameters for any one of $A$, $B$, or $C$ will necessarily
involve the parameters for the other two effects. One way to proceed is to construct a maximum rank
hypothesis (MRH) involving only the parameters of the main effect in question. This can be done
using the general form of estimable functions. Note the following:

• To get an MRH involving only the parameters of $A$, the coefficients of $L$ associated with $\mu$,
  B1, B2, C1, C2, and C3 must be equated to zero. Starting at the top of the general form, let
  L1 = 0, then L4 = 0, then L6 = 0. If C2 and C3 are not to be involved, then L2 must also
  be zero. Thus, $A_1 - A_2$ is not estimable; that is, the MRH involving only the $A$ parameters
  has zero rank and $R(A \mid \mu, B, C) = 0$.

• To obtain the MRH involving only the $B$ parameters, let L1 = L2 = L6 = 0. But then to
  remove C2 and C3 from the comparison, L4 must also be set to 0. Thus, $B_1 - B_2$ is not
  estimable and $R(B \mid \mu, A, C) = 0$.

• To obtain the MRH involving only the $C$ parameters, let L1 = L2 = L4 = 0. Thus, the
  MRH involving only $C$ parameters is

  \[ C_1 - 2\,C_2 + C_3 = K \qquad \text{(for any } K\text{)} \]

  or any multiple of the left-hand side equal to $K$. Furthermore,

  \[ SS(H_0\colon C_1 - 2\,C_2 + C_3 = 0) = R(C \mid \mu, A, B) \]
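The general form in Table 15.2 can be reproduced with the E option. In the sketch below, the A, B, and C values come from Table 15.1; the data set name threeway and the response values y are hypothetical, added only so that PROC GLM has something to fit. The general form is determined by the design columns alone.

```sas
/* A, B, C follow Table 15.1; the y values are hypothetical */
data threeway;
   input a b c y;
   datalines;
1 2 1 3.1
1 1 2 4.0
2 1 3 5.2
2 2 2 4.4
2 2 2 4.9
;

proc glm data=threeway;
   class a b c;
   model y = a b c / e;   /* E prints the general form of estimable functions */
run;
quit;
```

Setting the symbols in the printed form to zero in the order described in the list above is a quick way to confirm which main-effect comparisons survive.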
A Multiple Regression Model
Suppose

\[ E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \]

where the $X'X$ matrix has full rank. The general form of estimable functions is as shown in Table 15.3.

Table 15.3 General Form of Estimable Functions for a Multiple Regression Model When $X'X$
Matrix Is of Full Rank

Parameter    Coefficient
β0           L1
β1           L2
β2           L3
β3           L4

For example, to test the hypothesis that $\beta_2 = 0$, let L1 = L2 = L4 = 0 and let L3 = 1. Then
$SS(L\beta = 0) = R(\beta_2 \mid \beta_0, \beta_1, \beta_3)$. In this full-rank case, all parameters, as well as any linear
combination of parameters, are estimable.
Suppose, however, that $x_3 = 2x_1 + 3x_2$. The general form of estimable functions is shown in
Table 15.4.

Table 15.4 General Form of Estimable Functions for a Multiple Regression Model When $X'X$
Matrix Is Not of Full Rank

Parameter    Coefficient
β0           L1
β1           L2
β2           L3
β3           2*L2 + 3*L3

For this example, it is possible to test $H_0\colon \beta_0 = 0$. However, $\beta_1$, $\beta_2$, and $\beta_3$ are not jointly estimable;
that is,

\[
\begin{aligned}
R(\beta_1 \mid \beta_0, \beta_2, \beta_3) &= 0 \\
R(\beta_2 \mid \beta_0, \beta_1, \beta_3) &= 0 \\
R(\beta_3 \mid \beta_0, \beta_1, \beta_2) &= 0
\end{aligned}
\]
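The rank-deficient case can be set up directly by constructing the third regressor from the other two. The following is a sketch only: the data set name collin and all numeric values are hypothetical, and x3 is computed in the DATA step as 2*x1 + 3*x2 to match the dependency described above.

```sas
/* Hypothetical data; x3 is built as an exact linear combination of x1 and x2 */
data collin;
   input x1 x2 y;
   x3 = 2*x1 + 3*x2;
   datalines;
1 2  5.0
2 1  4.1
3 4  9.8
4 3  8.7
5 6 14.2
6 5 12.9
;

proc glm data=collin;
   model y = x1 x2 x3 / e;   /* general form should tie the x3 coefficient to those of x1 and x2 */
run;
quit;
```

With this dependency, the coefficient shown for x3 in the general form is expected to be a function of the x1 and x2 symbols rather than a free symbol of its own.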
Estimable Functions
Type I SS and Estimable Functions
In PROC GLM, the Type I SS and the associated hypotheses they test are byproducts of the modified
sweep operator used to compute a generalized g2-inverse of $X'X$ and a solution to the normal
equations. For the model $E[Y] = x_1\beta_1 + x_2\beta_2 + x_3\beta_3$, the Type I SS for each effect are as follows:

Effect    Type I SS
x1        $R(\beta_1)$
x2        $R(\beta_2 \mid \beta_1)$
x3        $R(\beta_3 \mid \beta_1, \beta_2)$

Note that some other SAS/STAT procedures compute Type I hypotheses by sweeping $X'X$ (for
example, PROC MIXED and PROC GLIMMIX), but their test statistics are not necessarily equivalent
to the results of using those procedures to fit models that contain successively more effects.
The Type I SS are model-order dependent; each effect is adjusted only for the preceding effects
in the model.
There are numerous ways to obtain a Type I hypothesis matrix $L$ for each effect. One way is to form
the $X'X$ matrix and then reduce $X'X$ to an upper triangular matrix by row operations, skipping over
any rows with a zero diagonal. The nonzero rows of the resulting matrix associated with $x_1$ provide
an $L$ such that

\[ SS(H_0\colon L\beta = 0) = R(\beta_1) \]

The nonzero rows of the resulting matrix associated with $x_2$ provide an $L$ such that

\[ SS(H_0\colon L\beta = 0) = R(\beta_2 \mid \beta_1) \]

The last set of nonzero rows (associated with $x_3$) provide an $L$ such that

\[ SS(H_0\colon L\beta = 0) = R(\beta_3 \mid \beta_1, \beta_2) \]

Another more formalized representation of Type I generating sets for $x_1$, $x_2$, and $x_3$, respectively, is

\[
\begin{aligned}
G_1 &= (\; X_1'X_1 \;\mid\; X_1'X_2 \;\mid\; X_1'X_3 \;) \\
G_2 &= (\; 0 \;\mid\; X_2'M_1X_2 \;\mid\; X_2'M_1X_3 \;) \\
G_3 &= (\; 0 \;\mid\; 0 \;\mid\; X_3'M_2X_3 \;)
\end{aligned}
\]

where

\[ M_1 = I - X_1(X_1'X_1)^{-}X_1' \]

and

\[ M_2 = M_1 - M_1X_2(X_2'M_1X_2)^{-}X_2'M_1 \]

Using the Type I generating set $G_2$ (for example), if an $L$ is formed from linear combinations of the
rows of $G_2$ such that $L$ is of full row rank and of the same row rank as $G_2$, then
$SS(H_0\colon L\beta = 0) = R(\beta_2 \mid \beta_1)$.
In the GLM procedure, the Type I estimable functions displayed symbolically when the E1 option is
requested are

\[
\begin{aligned}
G_1^{*} &= (X_1'X_1)^{-}\,G_1 \\
G_2^{*} &= (X_2'M_1X_2)^{-}\,G_2 \\
G_3^{*} &= (X_3'M_2X_3)^{-}\,G_3
\end{aligned}
\]

As can be seen from the nature of the generating sets $G_1$, $G_2$, and $G_3$, only the Type I estimable
functions for $\beta_3$ are guaranteed not to involve the $\beta_1$ and $\beta_2$ parameters. The Type I hypothesis
for $\beta_2$ can (and often does) involve $\beta_3$ parameters, and likewise the Type I hypothesis for $\beta_1$ often
involves $\beta_2$ and $\beta_3$ parameters.
There are, however, a number of models for which the Type I hypotheses are considered appropriate.
These are as follows:

• balanced ANOVA models specified in proper sequence (that is, interactions do not precede
  main effects in the MODEL statement and so forth)
• purely nested models (specified in the proper sequence)
• polynomial regression models (in the proper sequence)
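The order dependence of Type I SS can be observed by fitting the same regressors in two different orders. The sketch below is illustrative: the data set name reg3 and every numeric value in it are hypothetical. SS1 requests the Type I SS and E1 requests the Type I estimable functions.

```sas
/* Hypothetical data with three non-orthogonal regressors */
data reg3;
   input x1 x2 x3 y;
   datalines;
1 2 1  4.2
2 1 3  6.8
3 4 2  9.1
4 3 5 12.5
5 6 4 14.0
6 5 7 18.3
7 9 6 20.9
;

proc glm data=reg3;
   model y = x1 x2 x3 / ss1 e1;   /* x1 unadjusted, x2 adjusted for x1, x3 adjusted for both */
run;

proc glm data=reg3;
   model y = x3 x2 x1 / ss1;      /* same effects, reversed order */
run;
quit;
```

In both fits the three Type I SS add up to the model SS, but the individual SS for an effect generally change with its position, except for whichever effect is listed last, whose Type I SS is adjusted for all the others.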
Type II SS and Estimable Functions
For main-effects models and regression models, the general form of estimable functions can be
manipulated to provide tests of hypotheses involving only the parameters of the effect in question.
The same result can also be obtained by entering each effect in turn as the last effect in the model
and obtaining the Type I SS for that effect. These are the Type II SS. Using a modified reversible
sweep operator, it is possible to obtain the Type II SS without actually refitting the model.
Thus, the Type II SS correspond to the $R$ notation in which each effect is adjusted for all other
appropriate effects. For a regression model such as

\[ E[Y] = x_1\beta_1 + x_2\beta_2 + x_3\beta_3 \]

the Type II SS correspond to

Effect    Type II SS
x1        $R(\beta_1 \mid \beta_2, \beta_3)$
x2        $R(\beta_2 \mid \beta_1, \beta_3)$
x3        $R(\beta_3 \mid \beta_1, \beta_2)$

For a main-effects model ($A$, $B$, and $C$ as classification variables), the Type II SS correspond to

Effect    Type II SS
A         $R(A \mid B, C)$
B         $R(B \mid A, C)$
C         $R(C \mid A, B)$

As the discussion in the section “A Three-Factor Main-Effects Model” indicates, for
regression and main-effects models the Type II SS provide an MRH for each effect that does not
involve the parameters of the other effects.
In order to see what effects are appropriate to adjust for in computing Type II estimable functions,
note that for models involving interactions and nested effects, in the absence of a priori parametric
restrictions, it is not possible to obtain a test of a hypothesis for a main effect free of parameters of
higher-level interaction effects with which the main effect is involved. It is reasonable to assume,
then, that any test of a hypothesis concerning an effect should involve the parameters of that effect
and only those other parameters with which that effect is involved. The concept of effect containment
helps to define this involvement.
Contained Effect
Given two effects F1 and F2, F1 is said to be contained in F2 provided that the following two
conditions are met:

• Both effects involve the same continuous variables (if any).
• F2 has more CLASS variables than F1 does, and if F1 has CLASS variables, they all appear
  in F2.

Note that the intercept effect $\mu$ is contained in all pure CLASS effects, but it is not contained in any
effect involving a continuous variable. No effect is contained by $\mu$.
Type II, Type III, and Type IV estimable functions rely on this definition, and they all have one thing
in common: the estimable functions involving an effect F1 also involve the parameters of all effects
that contain F1, and they do not involve the parameters of effects that do not contain F1 (other than
F1).
Hypothesis Matrix for Type II Estimable Functions
The Type II estimable functions for an effect F1 have an $L$ (before reduction to full row rank) of the
following form:

• All columns of $L$ associated with effects not containing F1 (except F1) are zero.
• The submatrix of $L$ associated with effect F1 is $(X_1'MX_1)^{-}(X_1'MX_1)$.
• Each of the remaining submatrices of $L$ associated with an effect F2 that contains F1 is
  $(X_1'MX_1)^{-}(X_1'MX_2)$.

In these submatrices,

    $X_0$ = the columns of $X$ whose associated effects do not contain F1
    $X_1$ = the columns of $X$ associated with F1
    $X_2$ = the columns of $X$ associated with an F2 effect that contains F1
    $M = I - X_0(X_0'X_0)^{-}X_0'$

For the model
class A B;
model Y = A B A*B;
the Type II SS correspond to

\[ R(A \mid \mu, B), \quad R(B \mid \mu, A), \quad R(A{*}B \mid \mu, A, B) \]

for effects A, B, and A*B, respectively. For the model
class A B C;
model Y = A B(A) C(A B);
the Type II SS correspond to

\[ R(A \mid \mu), \quad R(B(A) \mid \mu, A), \quad R(C(AB) \mid \mu, A, B(A)) \]

for effects A, B(A), and C(A B), respectively. For the model
model Y = x x*x;
the Type II SS correspond to

\[ R(X \mid \mu, X{*}X) \quad \text{and} \quad R(X{*}X \mid \mu, X) \]

for x and x*x, respectively.
Note that, as in the situation for Type I tests, PROC MIXED and PROC GLIMMIX compute Type
I hypotheses by sweeping $X'X$, but their test statistics are not necessarily equivalent to the results
of sequentially fitting with those procedures models that contain successively more effects; while
PROC TRANSREG computes tests labeled as being Type II by leaving out each effect in turn, but
the specific linear hypotheses associated with these tests might not be precisely the same as the ones
derived from successively sweeping $X'X$.
Example of Type II Estimable Functions
For a 2 × 2 factorial with w observations per cell, the general form of estimable functions is shown in
Table 15.5. Any nonzero values for L2, L4, and L6 can be used to construct $L$ vectors for computing
the Type II SS for A, B, and A*B, respectively.

Table 15.5 General Form of Estimable Functions for 2 × 2 Factorial

Effect    Coefficient
μ         L1
A1        L2
A2        L1 - L2
B1        L4
B2        L1 - L4
AB11      L6
AB12      L2 - L6
AB21      L4 - L6
AB22      L1 - L2 - L4 + L6

For a balanced 2 × 2 factorial with the same number of observations in every cell, the Type II
estimable functions are shown in Table 15.6.

Table 15.6 Type II Estimable Functions for Balanced 2 × 2 Factorial

          Coefficients for Effect
Effect    A          B          A*B
μ         0          0          0
A1        L2         0          0
A2        -L2        0          0
B1        0          L4         0
B2        0          -L4        0
AB11      0.5*L2     0.5*L4     L6
AB12      0.5*L2     -0.5*L4    -L6
AB21      -0.5*L2    0.5*L4     -L6
AB22      -0.5*L2    -0.5*L4    L6

Now consider an unbalanced 2 × 2 factorial with two observations in every cell except the AB22
cell, which contains only one observation. The general form of estimable functions is the same as if
it were balanced, since the same effects are still estimable. However, the Type II estimable functions
for A and B are not the same as they were for the balanced design. The Type II estimable functions
for this unbalanced 2 × 2 factorial are shown in Table 15.7.

Table 15.7 Type II Estimable Functions for Unbalanced 2 × 2 Factorial

          Coefficients for Effect
Effect    A          B          A*B
μ         0          0          0
A1        L2         0          0
A2        -L2        0          0
B1        0          L4         0
B2        0          -L4        0
AB11      0.6*L2     0.6*L4     L6
AB12      0.4*L2     -0.6*L4    -L6
AB21      -0.6*L2    0.4*L4     -L6
AB22      -0.4*L2    -0.4*L4    L6
By comparing the hypothesis being tested in the balanced case to the hypothesis being tested in
the unbalanced case for effects A and B, you can note that the Type II hypotheses for A and B
are dependent on the cell frequencies in the design. For unbalanced designs in which the cell
frequencies are not proportional to the background population, the Type II hypotheses for effects that
are contained in other effects are of questionable value.
However, if an effect is not contained in any other effect, the Type II hypothesis for that effect is an
MRH that does not involve any parameters except those associated with the effect in question.
Thus, Type II SS are appropriate for the following models:
• any balanced model
• any main-effects model
• any pure regression model
• an effect not contained in any other effect (regardless of the model)
In addition to the preceding models, Type II SS are generally accepted by most statisticians for
purely nested models.
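The dependence of Type II hypotheses on cell frequencies can be inspected with the E2 option. The sketch below sets up an unbalanced 2 × 2 layout with two observations in every cell except the (2, 2) cell, which has one; the data set name unbal and the response values are hypothetical.

```sas
/* Hypothetical data: cell counts 2, 2, 2, 1; y values are illustrative only */
data unbal;
   input a b y @@;
   datalines;
1 1 12  1 1 14  1 2 11  1 2 9  2 1 20  2 1 18  2 2 17
;

proc glm data=unbal;
   class a b;
   model y = a b a*b / e2 ss2;   /* E2: Type II estimable functions; SS2: Type II SS */
run;
quit;
```

Adding a second observation to the (2, 2) cell and rerunning the step is a simple way to compare the balanced and unbalanced coefficient patterns for A and B.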
Type III and IV SS and Estimable Functions
When an effect is contained in another effect, the Type II hypotheses for that effect are dependent on
the cell frequencies. The philosophy behind both the Type III and Type IV hypotheses is that the
hypotheses tested for any given effect should be the same for all designs with the same general form
of estimable functions.
To demonstrate this concept, recall the hypotheses being tested by the Type II SS in the balanced
2 × 2 factorial shown in Table 15.6. Those hypotheses are precisely the ones that the Type III and
Type IV hypotheses employ for all 2 × 2 factorials that have at least one observation per cell. The
Type III and Type IV hypotheses for a design without missing cells usually differ from the hypothesis
employed for the same design with missing cells since the general form of estimable functions
usually differs.
Many SAS/STAT procedures can perform tests of Type III hypotheses, but only PROC GLM offers
Type IV tests as well.
Type III Estimable Functions
Type III hypotheses are constructed by working directly with the general form of estimable functions.
The following steps are used to construct a hypothesis for an effect F1:

1. For every effect in the model except F1 and those effects that contain F1, equate the
   coefficients in the general form of estimable functions to zero.
   If F1 is not contained in any other effect, this step defines the Type III hypothesis (as well as
   the Type II and Type IV hypotheses). If F1 is contained in other effects, go on to step 2. (See
   the section “Type II SS and Estimable Functions” for a definition of when effect
   F1 is contained in another effect.)
2. If necessary, equate new symbols to compound expressions in the F1 block in order to obtain
   the simplest form for the F1 coefficients.
3. Equate all symbolic coefficients outside the F1 block to a linear function of the symbols in the
   F1 block in order to make the F1 hypothesis orthogonal to hypotheses associated with effects
   that contain F1.

By once again observing the Type II hypotheses being tested in the balanced 2 × 2 factorial, it is
possible to verify that the A and A*B hypotheses are orthogonal and also that the B and A*B
hypotheses are orthogonal. This principle of orthogonality between an effect and any effect that
contains it holds for all balanced designs. Thus, construction of Type III hypotheses for any design is
a logical extension of a process that is used for balanced designs.
The Type III hypotheses are precisely the hypotheses being tested by programs that reparameterize
using the usual assumptions (for example, constraining all parameters for an effect to sum to zero).
When no missing cells exist in a factorial model, Type III SS coincide with Yates’ weighted
squares-of-means technique. When cells are missing in factorial models, the Type III SS coincide with those
discussed in Harvey (1960) and Henderson (1953).
The following discussion illustrates the construction of Type III estimable functions for a 2 × 2
factorial with no missing cells.
To obtain the A*B interaction hypothesis, start with the general form and equate the coefficients for
effects $\mu$, A, and B to zero, as shown in Table 15.8.
Table 15.8 Type III Hypothesis for A*B Interaction

Effect    General Form           L1 = L2 = L4 = 0
μ         L1                     0
A1        L2                     0
A2        L1 - L2                0
B1        L4                     0
B2        L1 - L4                0
AB11      L6                     L6
AB12      L2 - L6                -L6
AB21      L4 - L6                -L6
AB22      L1 - L2 - L4 + L6      L6

The last column in Table 15.8 represents the form of the MRH for A*B.
To obtain the Type III hypothesis for A, first start with the general form and equate the coefficients
for effects $\mu$ and B to zero (let L1 = L4 = 0). Next let L6 = K*L2, and find the value of K that
makes the A hypothesis orthogonal to the A*B hypothesis. In this case, K = 0.5. Each of these
steps is shown in Table 15.9.
In Table 15.9, the fourth column (under L6 = K*L2) represents the form of all estimable functions
not involving $\mu$, B1, or B2. The prime difference between the Type II and Type III hypotheses for A
is the way K is determined. Type II chooses K as a function of the cell frequencies, whereas Type
III chooses K such that the estimable functions for A are orthogonal to the estimable functions for
A*B.

Table 15.9 Type III Hypothesis for A

Effect    General Form           L1 = L4 = 0    L6 = K*L2       K = 0.5
μ         L1                     0              0               0
A1        L2                     L2             L2              L2
A2        L1 - L2                -L2            -L2             -L2
B1        L4                     0              0               0
B2        L1 - L4                0              0               0
AB11      L6                     L6             K*L2            0.5*L2
AB12      L2 - L6                L2 - L6        (1 - K)*L2      0.5*L2
AB21      L4 - L6                -L6            -K*L2           -0.5*L2
AB22      L1 - L2 - L4 + L6      -L2 + L6       -(1 - K)*L2     -0.5*L2
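The value K = 0.5 can be checked by hand. The worked step below restates the orthogonality condition using the interaction coefficients in the order (AB11, AB12, AB21, AB22): the A hypothesis contributes (K*L2, (1 - K)*L2, -K*L2, -(1 - K)*L2) and the A*B hypothesis contributes (L6, -L6, -L6, L6). Every other coefficient of the A*B hypothesis is zero, so only these four entries enter the inner product. This is an illustrative derivation, not additional SAS output.

```latex
\begin{aligned}
&\bigl(K\,L2,\ (1-K)\,L2,\ -K\,L2,\ -(1-K)\,L2\bigr)\cdot\bigl(L6,\ -L6,\ -L6,\ L6\bigr) \\
&\qquad = L2\,L6\,\bigl[\,K - (1-K) + K - (1-K)\,\bigr] \\
&\qquad = L2\,L6\,(4K - 2) = 0
\quad\Longrightarrow\quad K = 0.5
\end{aligned}
```

The condition must hold for arbitrary nonzero L2 and L6, which is why the bracketed factor alone determines K.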
An example of Type III estimable functions in a 3 × 3 factorial with unequal cell frequencies and
missing diagonals is given in Table 15.10 (N1 through N6 represent the nonzero cell frequencies).

Table 15.10 3 × 3 Factorial Design with Unequal Cell Frequencies and Missing Diagonals

                 B
          1      2      3
     1           N1     N2
A    2    N3            N4
     3    N5     N6

For any nonzero values of N1 through N6, the Type III estimable functions for each effect are shown
in Table 15.11.

Table 15.11 Type III Estimable Functions for 3 × 3 Factorial Design with Unequal Cell
Frequencies and Missing Diagonals

Effect    A                          B                          A*B
μ         0                          0                          0
A1        L2                         0                          0
A2        L3                         0                          0
A3        -L2 - L3                   0                          0
B1        0                          L5                         0
B2        0                          L6                         0
B3        0                          -L5 - L6                   0
AB12      0.667*L2 + 0.333*L3        0.333*L5 + 0.667*L6        L8
AB13      0.333*L2 - 0.333*L3        -0.333*L5 - 0.667*L6       -L8
AB21      0.333*L2 + 0.667*L3        0.667*L5 + 0.333*L6        -L8
AB23      -0.333*L2 + 0.333*L3       -0.667*L5 - 0.333*L6       L8
AB31      -0.333*L2 - 0.667*L3       0.333*L5 - 0.333*L6        L8
AB32      -0.667*L2 - 0.333*L3       -0.333*L5 + 0.333*L6       -L8
Type IV Estimable Functions
By once again looking at the Type II hypotheses being tested in the balanced 2 2 factorial (see
Table 15.6), you can see another characteristic of the hypotheses employed for balanced designs:
the coefficients of lower-order effects are averaged across each higher-level effect involving the
same subscripts. For example, in the A hypothesis, the coefficients of AB11 and AB12 are equal
to one-half the coefficient of A1, and the coefficients of AB21 and AB22 are equal to one-half the
coefficient of A2. With this in mind, the basic concept used to construct Type IV hypotheses is that
the coefficients of any effect, say F 1, are distributed equitably across higher-level effects that contain
F 1. When missing cells occur, this same general philosophy is adhered to, but care must be taken in
the way the distributive concept is applied.
Construction of Type IV hypotheses begins as does the construction of the Type III hypotheses. That
is, for an effect F 1, equate to zero all coefficients in the general form that do not belong to F 1
or to any other effect containing F 1. If F 1 is not contained in any other effect, then the Type IV
hypothesis (and Type II and III) has been found. If F 1 is contained in other effects, then simplify, if
Type III and IV SS and Estimable Functions F 289
necessary, the coefficients associated with F 1 so that they are all free coefficients or functions of
other free coefficients in the F 1 block.
To illustrate the method of resolving the free coefficients outside the F 1 block, suppose that you are
interested in the estimable functions for an effect A and that A is contained in AB, AC , and ABC .
(In other words, the main effects in the model are A, B, and C .)
With missing cells, the coefficients of intermediate effects (here they are AB and AC ) do not always
have an equal distribution of the lower-order coefficients, so the coefficients of the highest-order
effects are determined first (here it is ABC ). Once the highest-order coefficients are determined, the
coefficients of intermediate effects are automatically determined.
The following process is performed for each free coefficient of A in turn. The resulting symbolic
vectors are then added together to give the Type IV estimable functions for A.
1. Select a free coefficient of A, and set all other free coefficients of A to zero.
2. If any of the levels of A have zero as a coefficient, equate all of the coefficients of higher-level
effects involving that level of A to zero. This step alone usually resolves most of the free
coefficients remaining.
3. Check to see if any higher-level coefficients are now zero when the coefficient of the associated
level of A is not zero. If this situation occurs, the Type IV estimable functions for A are not
unique.
4. For each level of A in turn, if the A coefficient for that level is nonzero, count the number
of times that level occurs in the higher-level effect. Then equate each of the higher-level
coefficients to the coefficient of that level of A divided by the count.
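Steps 2 and 4 can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm that PROC GLM implements: it assumes a two-factor layout in which AB is the only higher-level effect, it assumes that the coefficients of A have already been resolved, and it omits the uniqueness check in step 3. The function name distribute_over_cells and the input values are hypothetical.

```python
from fractions import Fraction

def distribute_over_cells(a_coef, cells):
    """Spread each A coefficient evenly over the observed AB cells at that level.

    a_coef: dict mapping a level of A to its (already resolved) coefficient.
    cells:  iterable of (a_level, b_level) pairs that contain data.
    Returns a dict mapping each observed cell to its AB coefficient.
    Assumes every level in a_coef occurs in at least one observed cell.
    """
    cells = list(cells)
    ab_coef = {}
    for a_level, coef in a_coef.items():
        level_cells = [c for c in cells if c[0] == a_level]
        if coef == 0:
            # Step 2: a zero A coefficient forces zero higher-level coefficients.
            for c in level_cells:
                ab_coef[c] = Fraction(0)
        else:
            # Step 4: divide the A coefficient by the number of times
            # the level occurs in the higher-level effect.
            count = len(level_cells)
            for c in level_cells:
                ab_coef[c] = Fraction(coef) / count
    return ab_coef

# Illustrative input: five observed cells of a 3 x 3 layout, and a
# comparison of level 2 of A with level 1 of A (level 3 not involved).
observed = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]
a_coef = {1: -1, 2: 1, 3: 0}

ab = distribute_over_cells(a_coef, observed)
for cell in observed:
    print(cell, ab[cell])
```

With these inputs, each of the two observed cells at level 1 of A receives half of that level's coefficient, each of the two cells at level 2 receives half of its coefficient, and the single cell at level 3 receives zero.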
An example of a 3 × 3 factorial with four missing cells (N1 through N5 represent positive cell
frequencies) is shown in Table 15.12.

Table 15.12 3 × 3 Factorial Design with Four Missing Cells

                 A
            1     2     3
B     1     N1    N3
      2     N2    N4
      3                 N5

The Type IV estimable functions are shown in Table 15.13.

Table 15.13 Type IV Estimable Functions for 3 × 3 Factorial Design with Four Missing Cells

Effect     A            B            AB
μ          0            0            0
A1         -L3          0            0
A2         L3           0            0
A3         0            0            0
B1         0            L5           0
B2         0            -L5          0
B3         0            0            0
AB11       -0.5 L3      0.5 L5       L8
AB12       -0.5 L3      -0.5 L5      -L8
AB21       0.5 L3       0.5 L5       -L8
AB22       0.5 L3       -0.5 L5      L8
AB33       0            0            0

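As a check on these functions, recall that a linear function of the parameters is estimable when its coefficient vector is a linear combination of the rows of the design matrix X. The following Python sketch builds the five distinct rows of X for the observed cells of Table 15.12 and tests whether appending a coefficient vector leaves the rank unchanged. It is an illustration under stated assumptions: L3, L5, and L8 are each set to 1, the signs are chosen so that each function is a contrast among the observed cell means, and the helper names (rank, design_row, is_estimable) are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below position rk with a nonzero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

PARAMS = ["mu", "A1", "A2", "A3", "B1", "B2", "B3",
          "AB11", "AB12", "AB21", "AB22", "AB33"]
CELLS = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]  # cells with data

def design_row(a, b):
    """Row of X for an observation in cell (a, b)."""
    wanted = {"mu", f"A{a}", f"B{b}", f"AB{a}{b}"}
    return [1 if p in wanted else 0 for p in PARAMS]

X = [design_row(a, b) for a, b in CELLS]

h = Fraction(1, 2)
L_A = [0, -1, 1, 0, 0, 0, 0, -h, -h, h, h, 0]    # A column, L3 = 1
L_B = [0, 0, 0, 0, 1, -1, 0, h, -h, h, -h, 0]    # B column, L5 = 1
L_AB = [0, 0, 0, 0, 0, 0, 0, 1, -1, -1, 1, 0]    # AB column, L8 = 1

def is_estimable(L):
    """True if L lies in the row space of X."""
    return rank(X + [L]) == rank(X)

for name, L in [("A", L_A), ("B", L_B), ("AB", L_AB)]:
    print(name, is_estimable(L))
```

Exact rational arithmetic is used so that the rank comparison is not affected by floating-point rounding. A vector such as a coefficient of 1 on A1 alone fails the same check, because no combination of the observed cell means isolates a single main-effect parameter.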
A Comparison of Type III and Type IV Hypotheses
For the vast majority of designs, Type III and Type IV hypotheses for a given effect are the same.
Specifically, they are the same for any effect F1 that is not contained in other effects for any design
(with or without missing cells). For factorial designs with no missing cells, the Type III and Type
IV hypotheses coincide for all effects. When there are missing cells, the hypotheses can differ. By
using the GLM procedure, you can study the differences in the hypotheses and then decide on the
appropriateness of the hypotheses for a particular model.
The Type III hypotheses for three-factor and higher completely nested designs with unequal Ns in
the lowest level differ from the Type II hypotheses; however, the Type IV hypotheses do correspond
to the Type II hypotheses in this case.
When missing cells occur in a design, the Type IV hypotheses might not be unique. If this occurs
in PROC GLM, you are notified, and you might need to consider defining your own specific
comparisons.
References
Freund, R. J., Littell, R. C., and Spector, P. C. (1991), SAS System for Linear Models, Cary, NC: SAS
Institute Inc.
Goodnight, J. H. (1978), Tests of the Hypotheses in Fixed-Effects Linear Models, Technical Report
R-101, SAS Institute Inc., Cary, NC.
Goodnight, J. H. (1979), “A Tutorial on the Sweep Operator,” The American Statistician, 33, 149–158.
Harvey, W. R. (1960), Least-Squares Analysis of Data with Unequal Subclass Frequencies, Technical
Report ARS 20-8, USDA, Agriculture Research Service.
Henderson, C. R. (1953), “Estimation of Variance and Covariance Components,” Biometrics, 9,
226–252.
Milliken, G. A. and Johnson, D. E. (1984), Analysis of Messy Data, Volume I: Designed Experiments,
Belmont, CA: Lifetime Learning Publications.