The RSREG Procedure SAS/STAT User’s Guide (Book Excerpt)

SAS/STAT 9.22 User’s Guide
The RSREG Procedure
(Book Excerpt)
SAS® Documentation
This document is an individual chapter from SAS/STAT® 9.22 User’s Guide.
The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT® 9.22 User’s
Guide. Cary, NC: SAS Institute Inc.
Copyright © 2010, SAS Institute Inc., Cary, NC, USA
All rights reserved. Produced in the United States of America.
1st electronic book, May 2010
Chapter 76
The RSREG Procedure
Contents
Overview: RSREG Procedure
    Comparison to Other SAS Software
    Terminology
Getting Started: RSREG Procedure
    A Response Surface with a Simple Optimum
Syntax: RSREG Procedure
    PROC RSREG Statement
    BY Statement
    ID Statement
    MODEL Statement
    RIDGE Statement
    WEIGHT Statement
Details: RSREG Procedure
    Introduction to Response Surface Experiments
    Coding the Factor Variables
    Missing Values
    Plotting the Surface
    Searching for Multiple Response Conditions
    Handling Covariates
    Computational Method
    Output Data Sets
    Displayed Output
    ODS Table Names
    ODS Graphics
Examples: RSREG Procedure
    Example 76.1: A Saddle Surface Response Using Ridge Analysis
    Example 76.2: Response Surface Analysis with Covariates
References
Overview: RSREG Procedure
The RSREG procedure uses the method of least squares to fit quadratic response surface regression
models. Response surface models are a kind of general linear model in which attention focuses
on characteristics of the fit response function and in particular, where optimum estimated response
values occur.
In addition to fitting a quadratic function, you can use the RSREG procedure to do the following:
test for lack of fit
test for the significance of individual factors
analyze the canonical structure of the estimated response surface
compute the ridge of optimum response
predict new values of the response
The RSREG procedure uses ODS Graphics to display the response surfaces, residuals, fit diagnostics,
and ridges of optimum response. For general information about ODS Graphics, see Chapter 21,
“Statistical Graphics Using ODS.”
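For reference, the quadratic response surface in k factors is conventionally written as shown below. This is the standard textbook form, given here as an aid and not quoted from this chapter; the symbols β and ε are notation assumed for the sketch.

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2
    + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon
```

With k = 3 this gives 1 + 3 + 3 + 3 = 10 parameters: the intercept plus the nine regressors written out in the PROC REG statements in the next section.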
Comparison to Other SAS Software
Other SAS/STAT procedures can be used to fit the response surface, but the RSREG procedure is
more specialized. PROC RSREG uses a much more compact model syntax than other procedures;
for example, the following statements model a three-factor response surface in the REG, GLM, and
RSREG procedures:
proc reg;
model y=x1 x1*x1
x2 x1*x2 x2*x2
x3 x1*x3 x2*x3 x3*x3;
run;
proc glm;
model y=x1|x2|x3@2;
run;
proc rsreg;
model y=x1 x2 x3;
run;
Additionally, PROC RSREG includes specialized methodology for analyzing the fitted response
surface, such as canonical analysis and optimum response ridges.
Note that the ADX Interface in SAS/QC software provides an interactive environment for constructing and analyzing many different kinds of experiments, including response surface experiments.
The ADX Interface is the preferred interactive SAS System tool for analyzing experiments, since it
includes facilities for checking underlying assumptions and graphically optimizing the response surface; see Getting Started with the SAS ADX Interface for more information. The RSREG procedure
is appropriate for analyzing experiments in a batch environment.
Terminology
Variables are referred to according to the following conventions:
factor variables
independent variables used to construct the quadratic response surface. To
estimate the necessary parameters, each variable must have at least three
distinct values in the data. Independent variables must be numeric.
response variables
the dependent variables to which the quadratic response surfaces are fit. Dependent variables must be numeric.
covariates
additional independent variables for use in the regression but not in the formation of the quadratic response surface. Covariates must be numeric.
WEIGHT variable
a variable for weighting the observations in the regression. The WEIGHT
variable must be numeric.
ID variables
variables not previously described that are transferred to an output data set
containing statistics for each observation in the input data set. This data set
is created by using the OUT= option in the PROC RSREG statement. ID
variables can be either character or numeric.
BY variables
variables for grouping observations. Separate analyses are obtained for each
BY group. BY variables can be either character or numeric.
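As a rough illustration of how these roles map onto statements, the following sketch uses each kind of variable once. The data set work.trial and every variable name in it (Batch, RunID, W, Yield, Age, Temp, Press) are hypothetical and do not come from this chapter; the input is assumed to be sorted by Batch.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc rsreg data=work.trial out=work.stats;
   by Batch;        /* BY variable: one analysis per batch        */
   id RunID;        /* ID variable: copied to the OUT= data set   */
   weight W;        /* WEIGHT variable: numeric observation weight */
   /* Yield is the response; COVAR=1 makes Age (the first regressor) */
   /* a covariate, so Temp and Press are the factor variables        */
   model Yield = Age Temp Press / covar=1 predict residual;
run;
```

Here Temp and Press would each need at least three distinct values, as described for factor variables above.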
Getting Started: RSREG Procedure
A Response Surface with a Simple Optimum
This example uses the three-factor quadratic model discussed in John (1971). Settings of the
temperature, gas–liquid ratio, and packing height are controlled factors in the production of a certain
chemical; Schneider and Stockett (1963) performed an experiment in order to determine the values
of these three factors that minimize the unpleasant odor of the chemical. The following statements
input the SAS data set smell; the variable Odor is the response, while the variables T, R, and H are the
independent factors.
title 'Response Surface with a Simple Optimum';
data smell;
   input Odor T R H @@;
   label
      T = "Temperature"
      R = "Gas-Liquid Ratio"
      H = "Packing Height";
   datalines;
 66  40 .3 4     39 120 .3 4     43  40 .7 4     49 120 .7 4
 58  40 .5 2     17 120 .5 2     -5  40 .5 6    -40 120 .5 6
 65  80 .3 2      7  80 .7 2     43  80 .3 6    -22  80 .7 6
-31  80 .5 4    -35  80 .5 4    -26  80 .5 4
;
The following statements invoke PROC RSREG on the data set smell. Figure 76.1 through Figure 76.3
display the results of the analysis, including a lack-of-fit test requested with the LACKFIT option.
proc rsreg data=smell;
model Odor = T R H / lackfit;
run;
Figure 76.1 displays the coding coefficients for the transformation of the independent variables to lie
between -1 and 1, simple statistics for the response variable, hypothesis tests for linear, quadratic,
and crossproduct terms, and the lack-of-fit test. The hypothesis tests can be used to gain a rough idea
of the importance of the effects; here the crossproduct terms are not significant. However, the lack of fit
for the model is significant, so more complicated modeling or further experimentation with additional
variables should be performed before firm conclusions are made concerning the underlying process.
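The coding can be reproduced by hand as a rough check. In the smell data, T takes the values 40, 80, and 120, R takes .3, .5, and .7, and H takes 2, 4, and 6, so the midpoints and half-ranges are 80 and 40, 0.5 and 0.2, and 4 and 2, consistent with the Subtracted off and Divided by values in Figure 76.1. The sketch below assumes that reading; the names smell_coded, Tc, Rc, and Hc are hypothetical.

```sas
/* Hypothetical names: smell_coded, Tc, Rc, Hc are for illustration only */
data smell_coded;
   set smell;
   Tc = (T - 80)  / 40;    /* midpoint 80,  half-range 40  */
   Rc = (R - 0.5) / 0.2;   /* midpoint 0.5, half-range 0.2 */
   Hc = (H - 4)   / 2;     /* midpoint 4,   half-range 2   */
run;

/* The coded variables should span -1 to 1, up to floating-point rounding */
proc means data=smell_coded min max;
   var Tc Rc Hc;
run;
```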
Figure 76.1 Summary Statistics and Analysis of Variance

Response Surface with a Simple Optimum
The RSREG Procedure

Coding Coefficients for the Independent Variables

Factor    Subtracted off    Divided by
T              80.000000     40.000000
R               0.500000      0.200000
H               4.000000      2.000000

Response Surface for Variable Odor

Response Mean                 15.200000
Root MSE                      22.478508
R-Square                         0.8820
Coefficient of Variation       147.8849

                       Type I Sum
Regression       DF    of Squares    R-Square    F Value    Pr > F
Linear            3   7143.250000      0.3337       4.71    0.0641
Quadratic         3         11445      0.5346       7.55    0.0264
Crossproduct      3    293.500000      0.0137       0.19    0.8965
Total Model       9         18882      0.8820       4.15    0.0657

                           Sum of
Residual         DF       Squares    Mean Square    F Value    Pr > F
Lack of Fit       3   2485.750000     828.583333      40.75    0.0240
Pure Error        2     40.666667      20.333333
Total Error       5   2526.416667     505.283333
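Several entries in Figure 76.1 can be checked against one another. The arithmetic below is a sketch using the usual definitions (lack-of-fit F as the ratio of the lack-of-fit and pure-error mean squares, Root MSE as the square root of the total-error mean square, and R-square as model sum of squares over total sum of squares); these definitions are assumed here, not quoted from this excerpt, and the displayed sums of squares are rounded.

```latex
\begin{aligned}
F_{\mathrm{lack\ of\ fit}} &= \frac{828.583333}{20.333333} \approx 40.75, \\
\mathrm{Root\ MSE} &= \sqrt{505.283333} \approx 22.4785, \\
R^2 &= \frac{18882}{18882 + 2526.416667} \approx 0.8820 .
\end{aligned}
```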
Parameter estimates and the factor ANOVA are shown in Figure 76.2. Looking at the parameter
estimates, you can see that the crossproduct terms are not significantly different from zero, as noted
previously. The Estimate column contains estimates based on the raw data, and the Parameter Estimate from Coded Data column contains estimates based on the coded data. The factor ANOVA table
displays tests for all four parameters corresponding to each factor—the parameters corresponding to
the linear effect, the quadratic effect, and the effects of the crossproducts with each of the other two
factors. The only factor with a significant overall effect is R, indicating that the level of noise left
unexplained by the model is still too high to estimate the effects of T and H accurately. This might be
due to the lack of fit.
Figure 76.2 Parameter Estimates and Hypothesis Tests

                                                                        Parameter
                                                                         Estimate
                                     Standard                          from Coded
Parameter    DF         Estimate        Error    t Value    Pr > |t|         Data
Intercept     1       568.958333   134.609816       4.23      0.0083   -30.666667
T             1        -4.102083     1.489024      -2.75      0.0401   -12.125000
R             1     -1345.833333   335.220685      -4.01      0.0102   -17.000000
H             1       -22.166667    29.780489      -0.74      0.4902   -21.375000
T*T           1         0.020052     0.007311       2.74      0.0407    32.083333
R*T           1         1.031250     1.404907       0.73      0.4959     8.250000
R*R           1      1195.833333   292.454665       4.09      0.0095    47.833333
H*T           1         0.018750     0.140491       0.13      0.8990     1.500000
H*R           1        -4.375000    28.098135      -0.16      0.8824    -1.750000
H*H           1         1.520833     2.924547       0.52      0.6252     6.083333

                   Sum of
Factor    DF      Squares    Mean Square    F Value    Pr > F    Label
T          4  5258.016026    1314.504006       2.60    0.1613    Temperature
R          4        11045    2761.150641       5.46    0.0454    Gas-Liquid Ratio
H          4  3813.016026     953.254006       1.89    0.2510    Packing Height
Figure 76.3 displays the canonical analysis and eigenvectors. The canonical analysis indicates that the
directions of principal orientation for the predicted response surface are along the axes associated with
the three factors, confirming the small interaction effect in the regression ANOVA (Figure 76.1). The
largest eigenvalue (48.8588) corresponds to the eigenvector {0.238091, 0.971116, -0.015690}, the
largest component of which (0.971116) is associated with R; similarly, the second-largest eigenvalue
(31.1035) is associated with T. The third eigenvalue (6.0377), associated with H, is quite a bit smaller
than the other two, indicating that the response surface is relatively insensitive to changes in this
factor. The coded form of the canonical analysis indicates that the estimated response surface is at a
minimum when T and R are both near the middle of their respective ranges (that is, the coded critical
values for T and R are both near 0) and H is relatively high; in uncoded terms, the model predicts that
the unpleasant odor is minimized when T = 84.876502, R = 0.539915, and H = 7.541050.
Figure 76.3 Canonical Analysis and Eigenvectors

               Critical Value
Factor        Coded       Uncoded    Label
T          0.121913     84.876502    Temperature
R          0.199575      0.539915    Gas-Liquid Ratio
H          1.770525      7.541050    Packing Height

Predicted value at stationary point: -52.024631

                              Eigenvectors
Eigenvalues             T             R             H
  48.858807      0.238091      0.971116     -0.015690
  31.103461      0.970696     -0.237384      0.037399
   6.037732     -0.032594      0.024135      0.999177

Stationary point is a minimum.
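The uncoded critical values follow from the coded ones by reversing the coding (multiply by the Divided by value and add the Subtracted off value from Figure 76.1). The predicted value at the stationary point can also be approximately recovered from the coded estimates in Figure 76.2, using the standard response surface identity for the fitted value at a stationary point. That identity is assumed here, not stated in this excerpt; b_0 denotes the coded intercept, b the vector of coded linear estimates, and x_s the coded critical values.

```latex
\begin{aligned}
T &= 80 + 40\,(0.121913) \approx 84.8765, \\
R &= 0.5 + 0.2\,(0.199575) = 0.539915, \\
H &= 4 + 2\,(1.770525) = 7.541050, \\
\hat{y}_s &= b_0 + \tfrac{1}{2}\, x_s^{\top} b \\
  &= -30.666667 + \tfrac{1}{2}\bigl[(0.121913)(-12.125) + (0.199575)(-17) + (1.770525)(-21.375)\bigr] \\
  &\approx -52.0246 .
\end{aligned}
```

The small difference from the reported -52.024631 comes from rounding in the displayed values. As a further check, the three eigenvalues sum to 86.000000, which equals the sum of the coded quadratic estimates 32.083333 + 47.833333 + 6.083333.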
To plot the response surface with respect to two of the factor variables, fix H, the least significant
factor variable, at its estimated optimum value. The following statements use ODS Graphics to
display the surface:
ods graphics on;
proc rsreg data=smell
plots(only unpack)=surface(3d at(H=7.541050));
model Odor = T R H;
ods select 'T * R = Pred';
run;
ods graphics off;
Note that the ODS SELECT statement is specified to select the plot of interest.
Figure 76.4 The Response Surface at the Optimum H
Alternatively, the following statements produce an output data set containing the surface information,
which you can then use for plotting surfaces or searching for optima. The first DATA step fixes H, the
least significant factor variable, at its estimated optimum value (7.541), and generates a grid of points
for T and R. To ensure that the grid data do not affect parameter estimates, the response variable
(Odor) is set to missing. (See the section “Missing Values” on page 6472.) The second DATA step
concatenates these grid points to the original data. Then PROC RSREG computes predictions for
the combined data. The last DATA step subsets the predicted values over just the grid points, which
excludes the predictions at the original data.
data grid;
   do;
      Odor = . ;
      H    = 7.541;
      do T = 20 to 140 by 5;
         do R = .1 to .9 by .05;
            output;
         end;
      end;
   end;
data grid;
   set smell grid;
run;
proc rsreg data=grid out=predict noprint;
   model Odor = T R H / predict;
run;
data grid;
   set predict;
   if H = 7.541;
run;
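One simple way to search the resulting grid for an optimum is to sort it by the predicted value. The sketch below assumes that, because PREDICT is the only statistic requested, the variable Odor in the final grid data set holds the predicted response for each grid point; the data set name grid_sorted is hypothetical.

```sas
/* Sort the grid predictions so the smallest predicted Odor comes first */
proc sort data=grid out=grid_sorted;
   by Odor;
run;

/* Show the five grid points with the lowest predicted response */
proc print data=grid_sorted(obs=5);
   var T R H Odor;
run;
```

The result is only as fine as the grid spacing (5 units in T and 0.05 in R), so it approximates the stationary point reported by the canonical analysis.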
Syntax: RSREG Procedure
The following statements are available in PROC RSREG.
PROC RSREG < options > ;
MODEL responses= independents < / options > ;
RIDGE < options > ;
WEIGHT variable ;
ID variables ;
BY variables ;
The PROC RSREG and MODEL statements are required.
The BY, ID, MODEL, RIDGE, and WEIGHT statements are described after the PROC RSREG
statement, and they can appear in any order.
PROC RSREG Statement
PROC RSREG < options > ;
The PROC RSREG statement invokes the procedure. You can specify the following options in the
PROC RSREG statement.
DATA=SAS-data-set
specifies the input SAS data set that contains the data to be analyzed. By default, PROC
RSREG uses the most recently created SAS data set.
NOPRINT
suppresses the normal display of results when only the output data set is required.
For more information, see the description of the NOPRINT option in the MODEL and RIDGE
statements.
Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 20,
“Using the Output Delivery System,” for more information.
OUT=SAS-data-set
creates an output SAS data set that contains statistics for each observation in the input data set.
In particular, this data set contains the BY variables, the ID variables, the WEIGHT variable,
the variables in the MODEL statement, and the output options requested in the MODEL
statement. You must specify output statistic options in the MODEL statement; otherwise, the
output data set is created but contains no observations. To create a permanent SAS data set, you
must specify a two-level name (see SAS Language Reference: Concepts for more information
about permanent SAS data sets). For more details, see the section “OUT=SAS-data-set” on
page 6477.
PLOTS < (global-plot-option) >=plot-request< (options) >
PLOTS < (global-plot-option) >=(plot-request< (options) >< ... plot-request< (options) > >)
controls the plots produced through ODS Graphics. When you specify only one plot-request,
you can omit the parentheses from around the plot-request. For example:
plots = all
plots = (diagnostics ridge surface(unpack))
plots(unpack) = surface(overlaypairs)
In order to produce plots, you must enable ODS Graphics and specify a plot-request, as shown
in the following statements:
ods graphics on;
proc rsreg plots=all;
model y=x;
run;
ods graphics off;
See Figure 76.4, Output 76.1.5, Output 76.1.6, Output 76.2.3, and Output 76.2.4 for examples
of the ODS graphical displays. For general information about ODS graphics, see Chapter 21,
“Statistical Graphics Using ODS.”
The following global-plot-option is available.
UNPACKPANELS | UNPACK
suppresses paneling. By default, multiple plots can appear in some output panels.
Specify the UNPACK option to display each plot separately.
The following plot-requests are available.
ALL
produces all appropriate plots. You can specify other options with ALL; for example,
to display all plots and unpack the SURFACE contours you can specify plots=(all
surface(unpack)).
DIAGNOSTICS < (LABEL | UNPACK ) >
displays a panel of summary fit diagnostic plots. The plots produced and their usage are
discussed in Table 76.1.
Table 76.1 Diagnostic Plots

Diagnostic Plot                          Usage
Cook’s D statistic versus                Evaluate influence of an observation on the
observation number                       entire parameter estimate vector
Dependent variable values versus         Evaluate adequacy of fit and detect influential
predicted values                         observations
Externally studentized residuals         Detect outliers and influential (high-leverage)
(RStudent) versus leverage               observations
Externally studentized residuals         Evaluate adequacy of fit and detect outliers
versus predicted values
Histogram of residuals                   Confirm normality of error terms
Normal quantile plot of residuals        Confirm normality and homogeneity of error
                                         terms, and detect outliers
Residuals versus predicted values        Evaluate adequacy of fit and detect outliers
Residual-fit (RF) spread plot            Side-by-side quantile plots of the centered fit and
                                         the residuals show “how much variation in the
                                         data is explained by the fit and how much remains
                                         in the residuals” (Cleveland 1993)
Observations satisfying RStudent > 2 or RStudent < –2 are called outliers, and observations with leverage > 2p/n are called influential, where n is the number of observations
used in fitting the model and p is the number of parameters used in the model (Rawlings,
Pantula, and Dickey 1998). Specifying the LABEL option labels the influential and
outlying observations—the label is the first ID variable if the ID statement is specified;
otherwise, it is the observation number. Note in the Cook’s D plot that only observations
with D exceeding 4/n are labeled; these are also called influential observations. The
UNPACK option displays each diagnostic plot separately. See Output 76.2.3 for an
example of the diagnostics panel.
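As a sketch of how the LABEL option and these cutoffs might apply to the smell data from the “Getting Started” section: that fit uses n = 15 observations and p = 10 parameters, so the Cook’s D cutoff is 4/15, about 0.267, and the leverage cutoff is 2(10)/15, about 1.33. Because leverage cannot exceed 1, the leverage rule would not flag any observation in such a small design. This reading of the cutoffs is an illustration, not output from the procedure.

```sas
ods graphics on;

/* Label outlying and influential observations in the diagnostics panel. */
/* No ID statement is given, so labels are observation numbers.          */
proc rsreg data=smell plots=diagnostics(label);
   model Odor = T R H;
run;

ods graphics off;
```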
FIT < (GRIDSIZE=number ) >
plots the predicted values against a single predictor when you have only one factor or
only one covariate in the model. The GRIDSIZE= option specifies the number of points
at which the fitted values are computed; by default, GRIDSIZE=200.
NONE
suppresses all plots.
RESIDUALS < (UNPACK | SMOOTH) >
displays plots of residuals against each factor and covariate. The UNPACK option
displays each residual plot separately. The SMOOTH option overlays a loess smooth on
each residual plot; see Chapter 50, “The LOESS Procedure,” for more information. See
Output 76.1.5 for an example of this plot.
RIDGE < (UNPACK) >
displays the maximum and/or minimum ridge plots. This option is available only when a
MAXIMUM or MINIMUM option is specified in the RIDGE statement. The UNPACK
option displays the estimated response and factor level ridge plots separately. See
Output 76.1.5 for an example of this plot.
SURFACE < (surface-options) >
displays the response surface for each response variable and each pair of factors with all
other factors and covariates fixed at their means. By default a panel of contour plots is
produced; see Output 76.1.6 for an example of this plot. The following surface-options
can be specified:
3D
displays three-dimensional surface plots instead of contour plots. See Figure 76.4 for an example of this plot.
AT < keyword >< (variable=value-list | keyword < ...variable=value-list | keyword >) >
specifies fixed values for factors and covariates. You can specify one or more
numbers in the value-list or one of the following keywords:
MIN       sets the variable to its minimum value.
MEAN      sets the variable to its mean value.
MIDRANGE  sets the variable to the middle value: (max + min)/2.
MAX       sets the variable to its maximum value.
Specifying a keyword immediately after AT sets the default value of all
variables; for example, AT MIN sets all variables not displayed on an axis to
their minimum values. By default, continuous variables are set to their means
(AT MEAN) when they are not used on an axis. For example, if your model
contains variables X1, X2, and X3, then specifying AT(X1=7 9) produces a
contour plot of X2 versus X3 fixing X1 = 7 and then another contour plot with
X1 = 9, along with contour plots of X1 versus X2 fixing X3 at its mean, and
X1 versus X3 fixing X2 at its mean.
EXTEND=value
extends the surface value times the range of each factor in each
direction, which enables you to see more of the fitted surface. For example,
if factor A has range [0, 10], then specifying EXTEND=0.1 will compute and
display the surface for A in [-1, 11]. You can specify value ≥ 0; by default,
value = 0.1.
FILL=PRED | SE | NONE
produces a filled contour plot for either the predicted values
or the standard errors. FILL=SE is the default. If the 3D option is also
specified, then the contour plot is projected onto the surface.
GRIDSIZE=n
creates an n × n grid of points at which the estimated values for the
surface and standard errors are computed, for n ≥ 1. By default, n = 50.
LINE< =PRED | SE | NONE >
produces a contour line plot for either the predicted
values or the standard errors. LINE=PRED is the default. If the 3D option is
also specified, then specifying LINE displays a grid on the surface, and the
other LINE= specifications are ignored.
NODESIGN
suppresses the display of the design points on the contour surface plots
and the overlaid contour-line plots.
OVERLAYPAIRS
produces overlaid contour line plots for all pairs of response variables in addition to the contour surface plots. See Figure 76.6 for an example
of this plot.
ROTATE=angle
rotates the 3D surface plots angle degrees, –180 < angle < 180. By
default, angle = 57.
TILT=angle
tilts the 3D surface plots angle degrees, –180 < angle < 180. By default,
angle = 20.
UNPACKPANELS | UNPACK
suppresses paneling, and displays each surface plot
separately.
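The following sketch combines several of the surface-options described above, using the smell data from the “Getting Started” section. The particular values (H fixed at 2 and at 6, a 20% extension, a 30 × 30 grid) are arbitrary choices for illustration.

```sas
ods graphics on;

/* Contour plots of the fitted surface, one plot per graph (UNPACK).   */
/* AT(H=2 6): T-by-R plots are drawn with H fixed at 2 and again at 6. */
/* EXTEND=0.2 widens each axis by 0.2 times the factor range per side. */
/* GRIDSIZE=30 evaluates the surface on a 30 x 30 grid.                */
/* NODESIGN hides the design points.                                   */
proc rsreg data=smell
      plots(unpack)=surface(at(H=2 6) extend=0.2 gridsize=30 nodesign);
   model Odor = T R H;
run;

ods graphics off;
```

Factors that are not on an axis and are not named in AT stay at their means, which is the documented default.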
BY Statement
BY variables ;
You can specify a BY statement with PROC RSREG to obtain separate analyses on observations in
groups that are defined by the BY variables. When a BY statement appears, the procedure expects the
input data set to be sorted in order of the BY variables. If you specify more than one BY statement,
only the last one specified is used.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data by using the SORT procedure with a similar BY statement.
Specify the NOTSORTED or DESCENDING option in the BY statement for the RSREG
procedure. The NOTSORTED option does not mean that the data are unsorted but rather that
the data are arranged in groups (according to values of the BY variables) and that these groups
are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables by using the DATASETS procedure (in Base SAS
software).
For more information about BY-group processing, see the discussion in SAS Language Reference:
Concepts. For more information about the DATASETS procedure, see the discussion in the Base
SAS Procedures Guide.
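A minimal sketch of the first alternative, sorting and then analyzing by group, is shown below. The data set work.trial and the variables Batch, Yield, Temp, and Press are hypothetical names used only for illustration.

```sas
/* Sort the input by the BY variable first */
proc sort data=work.trial;
   by Batch;
run;

/* One response surface analysis is produced for each value of Batch */
proc rsreg data=work.trial;
   by Batch;
   model Yield = Temp Press;
run;
```

If the observations are already grouped by Batch but the groups are not in ascending order, the BY statement could instead be written as by Batch notsorted; and the sort step dropped.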
ID Statement
ID variables ;
The ID statement names variables that are to be transferred to the data set created by the OUT=
option in the PROC RSREG statement.
MODEL Statement
MODEL responses=independents < / options > ;
In the MODEL statement, you specify the response (dependent) variables followed by an equal sign
and then the independent variables, some of which can be covariates.
Table 76.2 summarizes the options available in the MODEL statement. The statistic options specify
which statistics are output to the OUT= data set. If none of the statistic options are selected, the
data set is created but contains no observations. The statistic option keywords become values of the
special variable _TYPE_ in the output data set.
Table 76.2 MODEL Statement Options

Task                                Options
Analyze original data               NOCODE
Fit model to first BY group only    BYOUT
Declare covariates                  COVAR=
Request additional statistics       PRESS
Request additional tests            LACKFIT
Suppress displayed output           NOANOVA
                                    NOOPTIMAL
                                    NOPRINT

Task                                Statistic Options
Output statistics                   ACTUAL
                                    PREDICT
                                    RESIDUAL
                                    L95
                                    U95
                                    L95M
                                    U95M
                                    D
The following list describes these options in alphabetical order.
ACTUAL
specifies that the observed response values from the input data set be written to the output data
set.
BYOUT
uses only the first BY group to estimate the model. Subsequent BY groups have scoring
statistics computed in the output data set only. The BYOUT option is used only when a BY
statement is specified.
COVAR=n
declares that the first n variables on the right side of the model are simple linear regressors
(covariates) and not factors in the quadratic response surface. By default, PROC RSREG forms
quadratic and crossproduct effects for all regressor variables in the MODEL statement.
See the section “Handling Covariates” on page 6475 for more details and Example 76.2 for an
example that uses covariates.
D
specifies that Cook’s D influence statistic be written to the output data set.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
LACKFIT
performs a lack-of-fit test.
See Draper and Smith (1981) for a discussion of lack-of-fit tests.
L95
specifies that the lower bound of a 95% confidence interval for an individual predicted value
be written to the output data set. The variance used in calculating this bound is a function of
both the mean square error and the variance of the parameter estimates.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
L95M
specifies that the lower bound of a 95% confidence interval for the expected value of the
dependent variable be written to the output data set. The variance used in calculating this
bound is a function of the variance of the parameter estimates.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
NOANOVA
NOAOV
suppresses the display of the analysis of variance and parameter estimates from the model fit.
NOCODE
performs the canonical and ridge analyses with the parameter estimates derived from fitting
the response to the original values of the factor variables, rather than their coded values (see
the section “Coding the Factor Variables” on page 6472 for more details). Use this option if
the data are already stored in a coded form.
NOOPTIMAL
NOOPT
suppresses the display of the canonical analysis for the quadratic response surface.
NOPRINT
suppresses the display of both the analysis of variance and the canonical analysis.
PREDICT
specifies that the values predicted by the model be written to the output data set.
PRESS
computes and displays the predicted residual sum of squares (PRESS) statistic for each
dependent variable in the model. The PRESS statistic is added to the summary information
at the beginning of the analysis of variance, so if the NOANOVA or NOPRINT option is
specified, then the PRESS option has no effect.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
RESIDUAL
specifies that the residuals, calculated as ACTUAL − PREDICTED, be written to the output
data set.
U95
specifies that the upper bound of a 95% confidence interval for an individual predicted value
be written to the output data set. The variance used in calculating this bound is a function of
both the mean square error and the variance of the parameter estimates.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
U95M
specifies that the upper bound of a 95% confidence interval for the expected value of the
dependent variable be written to the output data set. The variance used in calculating this
bound is a function of the variance of the parameter estimates.
See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.
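To see how the statistic options populate the OUT= data set, the following sketch requests several of them for the smell data from the “Getting Started” section. The output data set name stats is a hypothetical choice.

```sas
/* Write observed values, predictions, residuals, and 95% limits for */
/* the expected response to the OUT= data set; suppress the listing. */
proc rsreg data=smell out=stats noprint;
   model Odor = T R H / actual predict residual l95m u95m;
run;

/* Each requested statistic is identified by the _TYPE_ variable */
proc print data=stats(obs=10);
run;
```

Omitting all of the statistic options would leave stats with no observations, as noted in the description of the OUT= option.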
RIDGE Statement
RIDGE < options > ;
A RIDGE statement computes the ridge of optimum response. The ridge starts at a given point x0,
and the point on the ridge at radius r from x0 is the collection of factor settings that optimizes the
predicted response at this radius. You can think of the ridge as climbing or falling as fast as possible
on the surface of predicted response. Thus, the ridge analysis can be used as a tool to help interpret
an existing response surface or to indicate the direction in which further experimentation should be
performed.
The default starting point, x0, has each coordinate equal to the point midway between the highest
and lowest values of the factor in the design. The default radii at which the ridge is computed are 0,
0.1, ..., 0.9, 1. If the ridge analysis is based on the response surface fit to coded values for the factor
variables (see the section “Coding the Factor Variables” on page 6472 for details), then this results
in a ridge that starts at the point with a coded zero value for each coordinate and extends toward,
but not beyond, the edge of the range of experimentation. Alternatively, both the center point of the
ridge and the radii at which it is to be computed can be specified.
You can specify the following options in the RIDGE statement:
CENTER=uncoded-factor-values
gives the coordinates of the point x0 from which to begin the ridge. The coordinates should
be given in the original (uncoded) factor variable values and should be separated by commas.
There must be as many coordinates specified as there are factors in the model, and the order of
the coordinates must be the same as that used in the MODEL statement. This starting point
should be well inside the range of experimentation. The default sets each coordinate equal to
the value midway between the highest and lowest values for the associated factor.
MAXIMUM
MAX
computes the ridge of maximum response. Both the MIN and MAX options can be specified;
at least one must be specified.
MINIMUM
MIN
computes the ridge of minimum response. Both the MIN and MAX options can be specified;
at least one must be specified.
NOPRINT
suppresses the display of the ridge analysis when only an output data set is required.
OUTR=SAS-data-set
creates an output SAS data set containing the computed optimum ridge.
For details, see the section “OUTR=SAS-data-set” on page 6477.
RADIUS=coded-radii
gives the distances from the ridge starting point at which to compute the optima. The values in
the list represent distances between coded points. The list can take any of the following forms
or can be composed of mixtures of them:
m1, m2, ..., mn
specifies several values.
m TO n
specifies a sequence where m equals the starting value, n equals the ending
value, and the increment equals 1.
m TO n BY i
specifies a sequence where m equals the starting value, n equals the ending
value, and i equals the increment.
Mixtures of the preceding forms should be separated by commas. The default list runs from 0
to 1 by increments of 0.1. The following are examples of valid lists.
radius=0 to 5 by .5;
radius=0, .2, .25, .3, .5 to 1.0 by .1;
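As a rough illustration of how these options combine, the following sketch requests both ridges at a mixed list of radii and saves them with the OUTR= option. The data are the MBT measurements from Example 76.1; the data set names MBTData and RidgeOut are assumed names chosen for this illustration.

```sas
/* Sketch only: MBTData and RidgeOut are illustrative names. */
data MBTData;
   input Time Temp MBT @@;
   datalines;
 4.0 250 83.8   20.0 250 81.7   12.0 250 82.4   12.0 250 82.9
12.0 220 84.7   12.0 280 57.9   12.0 250 81.2    6.3 229 81.3
 6.3 271 83.1   17.7 229 85.3   17.7 271 72.7    4.0 250 82.0
;

proc rsreg data=MBTData;
   model MBT=Time Temp;
   /* Minimum and maximum ridges at coded radii 0, 0.25, 0.5, 0.6, ..., 1.0 */
   ridge min max radius=0, .25, .5 to 1.0 by .1 outr=RidgeOut;
run;

proc print data=RidgeOut;
run;
```

Because no CENTER= value is given, the ridge starts from the default point midway between the lowest and highest value of each factor.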
WEIGHT Statement
WEIGHT variable ;
When a WEIGHT statement is specified, a weighted residual sum of squares
$$\sum_i w_i (y_i - \hat{y}_i)^2$$
is minimized, where $w_i$ is the value of the variable specified in the WEIGHT statement, $y_i$ is the
observed value of the response variable, and $\hat{y}_i$ is the predicted value of the response variable.
The observation is used in the analysis only if the value of the WEIGHT statement variable is greater
than zero. The WEIGHT statement has no effect on degrees of freedom or number of observations.
If the weights for the observations are proportional to the reciprocals of the error variances, then the
weighted least squares estimates are best linear unbiased estimators (BLUE).
Details: RSREG Procedure
Introduction to Response Surface Experiments
Many industrial experiments are conducted to discover which values of given factor variables
optimize a response. If each factor is measured at three or more values, a quadratic response surface
can be estimated by least squares regression. The predicted optimal value can be found from the
estimated surface if the surface is shaped like a simple hill or valley. If the estimated surface is more
complicated, or if the predicted optimum is far from the region of experimentation, then the shape of
the surface can be analyzed to indicate the directions in which new experiments should be performed.
Suppose that a response variable y is measured at combinations of values of two factor variables, $x_1$
and $x_2$. The quadratic response surface model for this variable is written as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$$
The steps in the analysis for such data are as follows:
1. model fitting and analysis of variance, including lack-of-fit testing, to estimate parameters
2. canonical analysis to investigate the shape of the predicted response surface
3. ridge analysis to search for the region of optimum response
Model Fitting and Analysis of Variance
The first task in analyzing the response surface is to estimate the parameters of the model by least
squares regression and to obtain information about the fit in the form of an analysis of variance. The
estimated surface is typically curved: a hill with the peak occurring at the unique estimated point of
maximum response, a valley, or a saddle surface with no unique minimum or maximum. Use the
results of this phase of the analysis to answer the following questions:
What is the contribution of each type of effect—linear, quadratic, and crossproduct—to the
statistical fit? The ANOVA table with sources labeled “Regression” addresses this question.
What part of the residual error is due to lack of fit? Does the quadratic response model
adequately represent the true response surface? If you specify the LACKFIT option in the
MODEL statement, then the ANOVA table with sources labeled “Residual” addresses this
question. See the section “Lack-of-Fit Test” on page 6470 for details.
What is the contribution of each factor variable to the statistical fit? Can the response be
predicted accurately if the variable is removed? The ANOVA table with sources labeled
“Factor” addresses this question.
What are the predicted responses for a grid of factor values? (See the section “Plotting the
Surface” on page 6472 and the section “Searching for Multiple Response Conditions” on
page 6473.)
Lack-of-Fit Test
The lack-of-fit test compares the variation around the model with pure variation within replicated
observations. This measures the adequacy of the quadratic response surface model. In particular,
if there are $n_i$ replicated observations $Y_{i1}, \ldots, Y_{in_i}$ of the response all at the same values $x_i$ of the
factors, then you can predict the true response at $x_i$ either by using the predicted value $\hat{Y}_i$ based
on the model or by using the mean $\bar{Y}_i$ of the replicated values. The lack-of-fit test decomposes the
residual error into a component due to the variation of the replications around their mean value (the
pure error) and a component due to the variation of the mean values around the model prediction
(the bias error):
$$\sum_i \sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_i)^2 = \sum_i \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_i n_i (\bar{Y}_i - \hat{Y}_i)^2$$
If the model is adequate, then both components estimate the nominal level of error; however, if the
bias component of error is much larger than the pure error, then this constitutes evidence that there is
significant lack of fit.
If some observations in your design are replicated, you can test for lack of fit by specifying the
LACKFIT option in the MODEL statement. Note that, since all other tests use total error rather than
pure error, you might want to hand-calculate the tests with respect to pure error if the lack of fit is
significant. On the other hand, significant lack of fit indicates that the quadratic model is inadequate,
so if this is a problem you can also try to refine the model, possibly by using PROC GLM for general
polynomial modeling; see Chapter 39, “The GLM Procedure,” for more information. Example 76.1
illustrates the use of the LACKFIT option.
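As a worked illustration, the lack-of-fit F statistic is the ratio of the lack-of-fit (bias) mean square to the pure-error mean square. The subscripted symbols below are notation introduced here for illustration; the numbers are the ones reported for Example 76.1 in Output 76.1.2.

```latex
F = \frac{SS_{\mathrm{LOF}} / df_{\mathrm{LOF}}}{SS_{\mathrm{PE}} / df_{\mathrm{PE}}}
  = \frac{124.696053 / 3}{3.146667 / 3}
  = \frac{41.565351}{1.048889}
  \approx 39.63
```

The two sums of squares add up to the Total Error sum of squares in that output, 124.696053 + 3.146667 = 127.842720.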
Canonical Analysis
The second task in analyzing the response surface is to examine the overall shape of the curve and
determine whether the estimated stationary point is a maximum, a minimum, or a saddle point. The
canonical analysis can be used to answer the following questions:
Is the surface shaped like a hill, a valley, or a saddle, or is it flat?
If there is a unique optimum combination of factor values, where is it?
To which factor or factors are the predicted responses most sensitive?
The eigenvalues and eigenvectors of the matrix of second-order parameters characterize the shape
of the response surface. The eigenvectors point in the directions of principal orientation for the
surface, and the signs and magnitudes of the associated eigenvalues give the shape of the surface
in these directions. Positive eigenvalues indicate directions of upward curvature, and negative
eigenvalues indicate directions of downward curvature. The larger an eigenvalue is in absolute value,
the more pronounced is the curvature of the response surface in the associated direction. Often, all
the coefficients of an eigenvector except for one are relatively small, indicating that the vector points
roughly along the axis associated with the factor corresponding to the single large coefficient. In
this case, the canonical analysis can be used to determine the relative sensitivity of the predicted
response surface to variations in that factor. (See the section “Getting Started: RSREG Procedure”
on page 6456 for an example.)
Ridge Analysis
If the estimated surface is found to have a simple optimum well within the range of experimentation,
the analysis performed by the preceding two steps might be sufficient. In more complicated situations,
further search for the region of optimum response is required. The method of ridge analysis computes
the estimated ridge of optimum response for increasing radii from the center of the original design.
The ridge analysis answers the following question:
If there is not a unique optimum of the response surface within the range of experimentation,
in which direction should further searching be done in order to locate the optimum?
You can use the RIDGE statement to compute the ridge of maximum or minimum response.
Coding the Factor Variables
For the results of the canonical and ridge analyses to be interpretable, the values of different factor
variables should be comparable. This is because the canonical and ridge analyses of the response
surface are not invariant with respect to differences in scale and location of the factor variables. The
analysis of variance is not affected by these changes. Although the actual predicted surface does not
change, its parameterization does. The usual solution to this problem is to code each factor variable
so that its minimum in the experiment is -1 and its maximum is 1 and to carry through the analysis
with the coded values instead of the original ones. This practice has the added benefit of making 1 a
reasonable boundary radius for the ridge analysis since 1 represents approximately the edge of the
experimental region. By default, PROC RSREG computes the linear transformation to perform this
coding as the data are initially read in, and the canonical and ridge analyses are performed on the
model fit to the coded data. The actual form of the coding operation for each value of a variable is
$$\text{coded value} = (\text{original value} - M)/S$$
where M is the average of the highest and lowest values for the variable in the design and S is half
their difference.
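For instance, in Example 76.1 the factor Time ranges from 4 to 20 and the factor Temp ranges from 220 to 280, so the coding constants work out as follows. The subscripted symbols are notation introduced for this illustration.

```latex
\begin{aligned}
M_{\mathrm{Time}} &= \frac{20 + 4}{2} = 12, & S_{\mathrm{Time}} &= \frac{20 - 4}{2} = 8 \\
M_{\mathrm{Temp}} &= \frac{280 + 220}{2} = 250, & S_{\mathrm{Temp}} &= \frac{280 - 220}{2} = 30 \\
\text{coded Time at } 17.7 &= \frac{17.7 - 12}{8} = 0.7125, &
\text{coded Temp at } 229 &= \frac{229 - 250}{30} = -0.7
\end{aligned}
```

These constants agree with the Subtracted off and Divided by columns shown for that example in Output 76.1.1.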
Missing Values
If an observation has missing data for any of the variables used by the procedure, then that observation
is not used in the estimation process. If one or more response variables are missing, but no factor or
covariate variables are missing, then predicted values and confidence limits are computed for the
output data set, but the residual and Cook’s D statistic are missing.
Plotting the Surface
Specifying the PLOTS=SURFACE option in the PROC RSREG statement displays contour plots for
all pairs of factors in the model (see Example 76.1), while specifying the PLOTS=SURFACE(3D)
option displays a three-dimensional surface as shown in Figure 76.4.
You can also generate predicted values for a grid of points with the PREDICT option (see the section
“Getting Started: RSREG Procedure” on page 6456 for an example) and then use these values to
create a contour plot or a three-dimensional plot of the response surface over a two-dimensional grid.
Any two factor variables can be chosen to form the grid for the plot. Several plots can be generated
by using different pairs of factor variables.
Searching for Multiple Response Conditions
Suppose you have the following data with two factors and three responses, and you want to find the
factor setting that produces responses in a certain region:
data a;
   input x1 x2 y1 y2 y3;
   datalines;
-1       -1        1.8  1.940   3.6398
-1        1        2.6  1.843   4.9123
 1       -1        5.4  1.063   6.0128
 1        1        0.7  1.639   2.3629
 0        0        8.5  0.134   9.0910
 0        0        3.0  0.545   3.7349
 0        0        9.8  0.453  10.4412
 0        0        4.1  1.117   5.0042
 0        0        4.8  1.690   6.6245
 0        0        5.9  1.165   6.9420
 0        0        7.3  1.013   8.7442
 0        0        9.3  1.179  10.2762
 1.4142   0        3.9  0.945   5.0245
-1.4142   0        1.7  0.333   2.4041
 0        1.4142   3.0  1.869   5.2695
 0       -1.4142   5.7  0.099   5.4346
;
You want to find the values of x1 and x2 that maximize y1 subject to y2<2 and y3<y2+y1. The
exact answer is not easy to obtain analytically, but you can obtain a practically feasible solution by
checking conditions across a grid of values in the range of interest. First, append a grid of factor
values to the observed data, with missing values for the responses:
data b;
   set a end=eof;
   output;
   if eof then do;
      y1=.;
      y2=.;
      y3=.;
      do x1=-2 to 2 by .1;
         do x2=-2 to 2 by .1;
            output;
         end;
      end;
   end;
run;
Next, use PROC RSREG to fit a response surface model to the data and to compute predicted values
for both the observed data and the grid, putting the predicted values in a data set c:
proc rsreg data=b out=c;
model y1 y2 y3=x1 x2 / predict;
run;
Finally, find the subset of predicted values that satisfy the constraints, sort by the unconstrained
variable, and display the top five predictions:
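data d;
   set c;
   if y2<2;          /* keep predictions that satisfy the first constraint */
   if y3<y2+y1;      /* and the second constraint */
run;
proc sort data=d;
   by descending y1;
run;
data d;
   set d;
   if (_n_ <= 5);    /* keep the five largest predicted values of y1 */
run;
proc print;
run;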
The results are displayed in Figure 76.5. They indicate that optimal values of the factors are around
0.3 for x1 and around –0.5 for x2.
Figure 76.5 Top Five Predictions
Obs    x1     x2    _TYPE_       y1         y2         y3
  1    0.3   -0.5   PREDICT    6.92570    0.75784    7.60471
  2    0.3   -0.6   PREDICT    6.91424    0.74174    7.54194
  3    0.3   -0.4   PREDICT    6.91003    0.77870    7.64341
  4    0.4   -0.6   PREDICT    6.90769    0.73357    7.51836
  5    0.4   -0.5   PREDICT    6.90540    0.75135    7.56883
If you are also interested in simultaneously optimizing y1 and y2, you can specify the following
statements to make a visual comparison of the two response surfaces by overlaying their contour
plots:
ods graphics on;
proc rsreg data=a plots(only)=surface(overlaypairs);
model y1 y2=x1 x2;
run;
ods graphics off;
Figure 76.6 shows that you have to make some compromises in any attempt to maximize both y1 and
y2; however, you might be able to maximize y1 while minimizing y2.
Figure 76.6 Overlaid Line Contours of Predicted Responses
Handling Covariates
Covariate regressors are added to a response surface model because they are believed to account for
a sizable yet relatively uninteresting portion of the variation in the data. What the experimenter is
really interested in is the response corrected for the effect of the covariates. A common example is
the block effect in a block design. In the canonical and ridge analyses of a response surface, which
estimate responses at hypothetical levels of the factor variables, the actual value of the predicted
response is computed by using the average values of the covariates. The estimated response values
do optimize the estimated surface of the response corrected for covariates, but true prediction of the
response requires actual values for the covariates. You can use the COVAR= option in the MODEL
statement to include covariates in the response surface model. Example 76.2 illustrates the use of
this option.
Computational Method
Canonical Analysis
For each response variable, the model can be written in the form
$$y_i = \mathbf{x}_i' \mathbf{A} \mathbf{x}_i + \mathbf{b}' \mathbf{x}_i + \mathbf{c}' \mathbf{z}_i + \epsilon_i$$
where
$y_i$ is the ith observation of the response variable.
$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$ are the k factor variables for the ith observation.
$\mathbf{z}_i = (z_{i1}, z_{i2}, \ldots, z_{iL})'$ are the L covariates, including the intercept term.
$\mathbf{A}$ is the $k \times k$ symmetrized matrix of quadratic parameters, with diagonal elements equal to the
coefficients of the pure quadratic terms in the model and off-diagonal elements equal to half
the coefficient of the corresponding crossproduct.
$\mathbf{b}$ is the $k \times 1$ vector of linear parameters.
$\mathbf{c}$ is the $L \times 1$ vector of covariate parameters, one of which is the intercept.
$\epsilon_i$ is the error associated with the ith observation. Tests performed by PROC RSREG assume
that errors are independently and normally distributed with mean zero and variance $\sigma^2$.
The parameters in A, b, and c are estimated by least squares. To optimize y with respect to x, take
partial derivatives, set them to zero, and solve:
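$$\frac{\partial y}{\partial \mathbf{x}} = 2\mathbf{x}'\mathbf{A} + \mathbf{b}' = 0 \quad \Longrightarrow \quad \mathbf{x} = -\frac{1}{2} \mathbf{A}^{-1} \mathbf{b}$$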
You can determine if the solution is a maximum or minimum by looking at the eigenvalues of A:
If the eigenvalues...     then the solution is...
are all negative          a maximum
are all positive          a minimum
have mixed signs          a saddle point
contain zeros             in a flat area
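A minimal SAS/IML sketch of these two steps follows. It assumes that SAS/IML is available and uses the coded-data parameter estimates reported for Example 76.1 in Output 76.1.2; the matrix names A, b, xs, lambda, and V are chosen here for illustration.

```sas
proc iml;
   /* Symmetrized quadratic matrix: pure quadratic coefficients on the   */
   /* diagonal, half the crossproduct coefficient (-7.218045/2) off it   */
   A = {  1.384394   -3.6090225,
         -3.6090225  -8.852519  };
   b = { -1.014287, -8.676768 };     /* linear coefficients for Time and Temp */

   xs = -0.5 * inv(A) * b;           /* stationary point in coded units */
   call eigen(lambda, V, A);         /* eigenvalues and eigenvectors of A */

   print xs, lambda V;
quit;
```

The printed values should agree, up to rounding, with the coded critical values (-0.441758, -0.309976) and the eigenvalues (2.528816, -9.996940) in Output 76.1.3. The mixed signs of the eigenvalues correspond to a saddle point. Eigenvectors are determined only up to sign, so the signs of a column of V can differ from the displayed output.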
Ridge Analysis
If the largest eigenvalue is positive, its eigenvector gives the direction of steepest ascent from the
stationary point; if the largest eigenvalue is negative, its eigenvector gives the direction of steepest
descent. The eigenvectors corresponding to small or zero eigenvalues point in directions of relative
flatness.
The point on the optimum response ridge at a given radius R from the ridge origin is found by
optimizing
$$(\mathbf{x}_0 + \mathbf{d})' \mathbf{A} (\mathbf{x}_0 + \mathbf{d}) + \mathbf{b}' (\mathbf{x}_0 + \mathbf{d})$$
over $\mathbf{d}$ satisfying $\mathbf{d}'\mathbf{d} = R^2$, where $\mathbf{x}_0$ is the $k \times 1$ vector containing the ridge origin and $\mathbf{A}$ and $\mathbf{b}$ are
as previously discussed. By the method of Lagrange multipliers, the optimal $\mathbf{d}$ has the form
$$\mathbf{d} = -(\mathbf{A} - \mu\mathbf{I})^{-1} (\mathbf{A}\mathbf{x}_0 + 0.5\,\mathbf{b})$$
where $\mathbf{I}$ is the $k \times k$ identity matrix and $\mu$ is chosen so that $\mathbf{d}'\mathbf{d} = R^2$. There can be several values
of $\mu$ that satisfy this constraint; the correct one depends on which sort of response ridge is of interest.
If you are searching for the ridge of maximum response, then the appropriate $\mu$ is the unique one
that satisfies the constraint and is greater than all the eigenvalues of $\mathbf{A}$. Similarly, the appropriate $\mu$
for the ridge of minimum response satisfies the constraint and is less than all the eigenvalues of $\mathbf{A}$.
(See Myers and Montgomery (1995) for details.)
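The form of the optimal d can be recovered with a short derivation. This is a sketch of the standard Lagrange-multiplier argument; the Lagrangian $L$ and the symbol $\mu$ for the multiplier are notation assumed here for illustration.

```latex
\begin{aligned}
L(\mathbf{d}, \mu) &= (\mathbf{x}_0 + \mathbf{d})' \mathbf{A} (\mathbf{x}_0 + \mathbf{d})
   + \mathbf{b}' (\mathbf{x}_0 + \mathbf{d}) - \mu\,(\mathbf{d}'\mathbf{d} - R^2) \\
\frac{\partial L}{\partial \mathbf{d}} &= 2\mathbf{A}(\mathbf{x}_0 + \mathbf{d}) + \mathbf{b} - 2\mu\,\mathbf{d} = \mathbf{0} \\
(\mathbf{A} - \mu\mathbf{I})\,\mathbf{d} &= -(\mathbf{A}\mathbf{x}_0 + 0.5\,\mathbf{b}) \\
\mathbf{d} &= -(\mathbf{A} - \mu\mathbf{I})^{-1} (\mathbf{A}\mathbf{x}_0 + 0.5\,\mathbf{b})
\end{aligned}
```

The second line uses the symmetry of A, and the last line requires that the multiplier not be an eigenvalue of A, so that the matrix being inverted is nonsingular.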
Output Data Sets
OUT=SAS-data-set
An output data set containing statistics requested with options in the MODEL statement for each
observation in the input data set is created whenever the OUT= option is specified in the PROC
RSREG statement. The data set contains the following variables:
the BY variables
the ID variables
the WEIGHT variable
the independent variables in the MODEL statement
the variable _TYPE_, which identifies the observation type in the output data set. _TYPE_ is a
character variable with a length of eight, and it takes on the values ‘ACTUAL’, ‘PREDICT’,
‘RESIDUAL’, ‘U95M’, ‘L95M’, ‘U95’, ‘L95’, and ‘D’, corresponding to the options specified.
the response variables containing special output values identified by the _TYPE_ variable
All confidence limits use the two-tailed Student’s t value.
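As an illustrative sketch, the following statements request several of these statistics in the MODEL statement and then list only the predicted values. The factor settings and y1 values are those of the data set a in the section “Searching for Multiple Response Conditions”; the data set names Resp and Stats are assumptions made for this example.

```sas
/* Sketch only: Resp and Stats are illustrative names. */
data Resp;
   input x1 x2 y1 @@;
   datalines;
-1 -1 1.8   -1 1 2.6   1 -1 5.4   1 1 0.7
0 0 8.5   0 0 3.0   0 0 9.8   0 0 4.1
0 0 4.8   0 0 5.9   0 0 7.3   0 0 9.3
1.4142 0 3.9   -1.4142 0 1.7   0 1.4142 3.0   0 -1.4142 5.7
;

proc rsreg data=Resp out=Stats;
   /* Each requested statistic becomes one _TYPE_ value in the OUT= data set */
   model y1=x1 x2 / actual predict residual l95 u95;
run;

proc print data=Stats;
   where _TYPE_='PREDICT';
run;
```

In the resulting data set, the response variable y1 holds the statistic named by _TYPE_ for each observation.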
OUTR=SAS-data-set
An output data set containing the optimum response ridge is created when the OUTR= option is
specified in the RIDGE statement. The data set contains the following variables:
the current values of the BY variables
a character variable _DEPVAR_ containing the name of the dependent variable
a character variable _TYPE_ identifying the type of ridge being computed, MINIMUM or
MAXIMUM. If both MAXIMUM and MINIMUM are specified, the data set contains observations for the minimum ridge followed by observations for the maximum ridge.
a numeric variable _RADIUS_ giving the distance from the ridge starting point
the values of the model factors at the estimated optimum point at distance _RADIUS_ from the
ridge starting point
a numeric variable _PRED_, which is the estimated expected value of the dependent variable at
the optimum
a numeric variable _STDERR_, which is the standard error of the estimated expected value
Displayed Output
All estimates and hypothesis tests assume that the model is correctly specified and the errors are
distributed according to classical statistical assumptions.
The output displayed by PROC RSREG includes the following.
Estimation and Analysis of Variance
The actual form of the coding operation for each value of a variable is
$$\text{coded value} = \frac{1}{S}\,(\text{original value} - M)$$
where M is the average of the highest and lowest values for the variable in the design and S is
half their difference. The Subtracted off column contains the M values for this formula for
each factor variable, and S is found in the Divided by column.
The summary table for the response variable contains the following information.
“Response Mean” is the mean of the response variable in the sample. When a WEIGHT
statement is specified, the mean $\bar{y}$ is calculated by
$$\bar{y} = \frac{\sum_i w_i y_i}{\sum_i w_i}$$
“Root MSE” estimates the standard deviation of the response variable and is calculated
as the square root of the “Total Error” mean square.
The “R-Square” value is $R^2$, or the coefficient of determination. $R^2$ measures the
proportion of the variation in the response that is attributed to the model rather than to
random error.
The “Coefficient of Variation” is 100 times the ratio of the “Root MSE” to the “Response
Mean.”
A table analyzing the significance of the terms of the regression is displayed. Terms are
brought into the regression in four steps: (1) the “Intercept” and any covariates in the model,
(2) “Linear” terms like X1 and X2, (3) pure “Quadratic” terms like X1*X1 or X2*X2, and (4)
“Crossproduct” terms like X1*X2. The table displays the following information:
the degrees of freedom in the DF column, which should be the same as the number of
corresponding parameters unless one or more of the parameters are not estimable
Type I Sum of Squares, also called the sequential sums of squares, which measures the
reduction in the error sum of squares as sets of terms (Linear, Quadratic, and so forth)
are added to the model
R-Square, which measures the portion of total $R^2$ contributed as each set of terms (Linear,
Quadratic, and so forth) is added to the model
F Value, which tests the null hypothesis that all parameters in the term are zero by using
the Total Error mean square as the denominator. This is a test of a Type I hypothesis,
containing the usual F test numerator, conditional on the effects of subsequent variables
not being in the model.
Pr > F, which is the significance value or probability of obtaining at least as great an F
ratio given that the null hypothesis is true.
The Sum of Squares column partitions the “Total Error” into “Lack of Fit” and “Pure Error.”
When “Lack of Fit” is significant, there is variation around the model other than random error
(such as cubic effects of the factor variables).
The “Total Error” Mean Square estimates $\sigma^2$, the variance.
F Value tests the null hypothesis that the variation is adequately described by random
error.
A table containing the parameter estimates from the model is displayed.
The Estimate column contains the parameter estimates based on the uncoded values of
the factor variables. If an effect is a linear combination of previous effects, the parameter
for the effect is not estimable. When this happens, the degrees of freedom are zero,
the parameter estimate is set to zero, and estimates and tests on other parameters are
conditional on this parameter being zero.
The Standard Error column contains the estimated standard deviations of the parameter
estimates based on uncoded data.
The t Value column contains t values of a test of the null hypothesis that the true
parameter is zero when the uncoded values of the factor variables are used.
The Pr > |T| column gives the significance value or probability of a greater absolute t
ratio given that the true parameter is zero.
The Parameter Estimate from Coded Data column contains the parameter estimates
based on the coded values of the factor variables. These are the estimates used in the
subsequent canonical and ridge analyses.
The sums of squares are partitioned by the factors in the model, and an analysis table is
displayed. The test on a factor is a joint test on all the parameters involving that factor. For
example, the test for the factor X1 tests the null hypothesis that the true parameters for X1,
X1*X1, and X1*X2 are all zero.
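As a worked check of how the summary statistics relate to one another, the following uses the values reported for Example 76.1 in Output 76.1.1 and Output 76.1.2. The abbreviations SS and MS for sum of squares and mean square are notation assumed here, and the example is unweighted.

```latex
\begin{aligned}
\text{Root MSE} &= \sqrt{MS_{\text{Total Error}}} = \sqrt{21.307120} \approx 4.6160 \\
R^2 &= \frac{SS_{\text{Total Model}}}{SS_{\text{Total Model}} + SS_{\text{Total Error}}}
     = \frac{512.193947}{512.193947 + 127.842720} \approx 0.8003 \\
\text{Coefficient of Variation} &= 100 \times \frac{\text{Root MSE}}{\text{Response Mean}}
     = 100 \times \frac{4.615964}{79.916667} \approx 5.7760
\end{aligned}
```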
Canonical Analysis
The Critical Value columns contain the values of the factor variables that correspond to the
stationary point of the fitted response surface. The critical values can be at a minimum,
maximum, or saddle point.
The eigenvalues and eigenvectors are from the matrix of quadratic parameter estimates based
on the coded data. They characterize the shape of the response surface.
Ridge Analysis
The Coded Radius column contains the distance from the coded version of the associated point
to the coded version of the origin of the ridge. The origin is given by the point at radius zero.
The Estimated Response column contains the estimated value of the response variable at the
associated point. The standard error of this estimate is also given. This quantity is useful for
assessing the relative credibility of the prediction at a given radius. Typically, this standard
error increases rapidly as the ridge moves up to and beyond the design perimeter, reflecting
the inherent difficulty of making predictions beyond the range of experimentation.
The Uncoded Factor Values columns contain the values of the uncoded factor variables that
give the optimum response at this radius from the ridge origin.
ODS Table Names
PROC RSREG assigns a name to each table it creates. You can use these names to reference the table
when using the Output Delivery System (ODS) to select tables and create output data sets. These
names are listed in Table 76.3. For more information about ODS, see Chapter 20, “Using the Output
Delivery System.”
Table 76.3 ODS Tables Produced by PROC RSREG
ODS Table Name        Description                                          Statement
Coding                Coding coefficients for the independent variables    default
ErrorANOVA            Error analysis of variance                           default
FactorANOVA           Factor analysis of variance                          default
FitStatistics         Overall statistics for fit                           default
ModelANOVA            Model analysis of variance                           default
ParameterEstimates    Estimated linear parameters                          default
Ridge                 Ridge analysis for optimum response                  RIDGE
Spectral              Spectral analysis                                    default
StationaryPoint       Stationary point of response surface                 default
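For example, the following sketch uses the ODS OUTPUT statement with two of the table names in Table 76.3 to save the parameter estimates and the stationary point as data sets. The data are those of Example 76.1; the data set names MBTData, Parms, and StatPt are assumed names chosen for illustration.

```sas
/* Sketch only: MBTData, Parms, and StatPt are illustrative names. */
data MBTData;
   input Time Temp MBT @@;
   datalines;
 4.0 250 83.8   20.0 250 81.7   12.0 250 82.4   12.0 250 82.9
12.0 220 84.7   12.0 280 57.9   12.0 250 81.2    6.3 229 81.3
 6.3 271 83.1   17.7 229 85.3   17.7 271 72.7    4.0 250 82.0
;

/* Route two of the displayed tables to output data sets */
ods output ParameterEstimates=Parms StationaryPoint=StatPt;
proc rsreg data=MBTData;
   model MBT=Time Temp;
run;

proc print data=Parms;
run;
proc print data=StatPt;
run;
```

The Ridge table can be captured in the same way, provided that a RIDGE statement is also specified.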
ODS Graphics
PROC RSREG assigns a name to each graph it creates using ODS. You can use these names to
reference the graphs when using ODS. The names are listed in Table 76.4.
To request these graphs you must specify the ODS GRAPHICS statement in addition to the PLOTS=
option and any other options indicated in Table 76.4. For more information about the ODS GRAPHICS statement, see Chapter 21, “Statistical Graphics Using ODS.”
Table 76.4 ODS Graphics Produced by PROC RSREG
ODS Graph Name         Plot Description                        PLOTS= Option
FitPlot                Fit plot for 1 predictor                FIT
DiagnosticsPanel       Panel of fit diagnostics                DIAGNOSTICS
CooksDPlot             Cook’s D plot                           DIAGNOSTICS(UNPACK)
ObservedByPredicted    Observed by predicted                   DIAGNOSTICS(UNPACK)
QQPlot                 Residual Q-Q plot                       DIAGNOSTICS(UNPACK)
ResidualByPredicted    Residual by predicted values            DIAGNOSTICS(UNPACK)
ResidualHistogram      Residual histogram                      DIAGNOSTICS(UNPACK)
RFPlot                 RF plot                                 DIAGNOSTICS(UNPACK)
RStudentByPredicted    Studentized residuals by predicted      DIAGNOSTICS(UNPACK)
RStudentByLeverage     RStudent by hat diagonals               DIAGNOSTICS(UNPACK)
ResidualPlots          Panel of residuals by predictors        RESIDUALS
                       Residuals by predictors                 RESIDUALS(UNPACK)
RidgePlots             Panel of ridge plot and factors         RIDGE (with RIDGE MAX or MIN)
                       Ridge plot                              RIDGE(UNPACK) (with RIDGE MAX or MIN)
                       Ridge factors                           RIDGE(UNPACK) (with RIDGE MAX or MIN)
Contour                Panel of contour plots                  SURFACE
                       Contour plots                           SURFACE(UNPACK)
Surface                Panel of 3D surface plots               SURFACE(3D)
                       3D surface plots                        SURFACE(3D UNPACK)
ContourOverlay         Panel of overlaid line-contour plots    SURFACE(OVERLAYPAIRS)
                       Overlaid line-contour plots             SURFACE(OVERLAYPAIRS UNPACK)
Examples: RSREG Procedure
Example 76.1: A Saddle Surface Response Using Ridge Analysis
Myers (1976) analyzes an experiment reported by Frankel (1961) aimed at maximizing the yield of
mercaptobenzothiazole (MBT) by varying processing time and temperature. Myers (1976) uses a
two-factor model in which the estimated surface does not have a unique optimum. A ridge analysis is
used to determine the region in which the optimum lies. The objective is to find the settings of time
and temperature in the processing of a chemical that maximize the yield. The following statements
produce Output 76.1.1 through Output 76.1.6:
data d;
   input Time Temp MBT;
   label Time = "Reaction Time (Hours)"
         Temp = "Temperature (Degrees Centigrade)"
         MBT  = "Percent Yield Mercaptobenzothiazole";
   datalines;
 4.0   250   83.8
20.0   250   81.7
12.0   250   82.4
12.0   250   82.9
12.0   220   84.7
12.0   280   57.9
12.0   250   81.2
 6.3   229   81.3
 6.3   271   83.1
17.7   229   85.3
17.7   271   72.7
 4.0   250   82.0
;
ods graphics on;
proc rsreg data=d plots=(ridge surface);
model MBT=Time Temp / lackfit;
ridge max;
run;
ods graphics off;
Output 76.1.1 displays the coding coefficients for the transformation of the independent variables to
lie between -1 and 1 and some simple statistics for the response variable.
Output 76.1.1 Coding and Response Variable Information
The RSREG Procedure
Coding Coefficients for the Independent Variables
Factor    Subtracted off    Divided by
Time           12.000000      8.000000
Temp          250.000000     30.000000

Response Surface for Variable MBT: Percent Yield Mercaptobenzothiazole
Response Mean               79.916667
Root MSE                     4.615964
R-Square                       0.8003
Coefficient of Variation       5.7760
Output 76.1.2 shows that the lack of fit for the model is highly significant. Since the quadratic model
does not fit the data very well, firm statements about the underlying process should not be based only
on the current analysis. Note from the analysis of variance for the model that the test for the time
factor is not significant. If further experimentation is undertaken, it might be best to fix Time at a
moderate to high value and to concentrate on the effect of temperature. In the actual experiment
discussed here, extra runs were made that confirmed the results of the following analysis.
Output 76.1.2 Analyses of Variance
                        Type I Sum
Regression       DF     of Squares    R-Square    F Value    Pr > F
Linear            2     313.585803      0.4899       7.36    0.0243
Quadratic         2     146.768144      0.2293       3.44    0.1009
Crossproduct      1      51.840000      0.0810       2.43    0.1698
Total Model       5     512.193947      0.8003       4.81    0.0410

                            Sum of
Residual         DF        Squares    Mean Square    F Value    Pr > F
Lack of Fit       3     124.696053      41.565351      39.63    0.0065
Pure Error        3       3.146667       1.048889
Total Error       6     127.842720      21.307120

                                                                         Parameter
                                                                          Estimate
                                      Standard                          from Coded
Parameter    DF       Estimate           Error    t Value    Pr > |t|         Data
Intercept     1    -545.867976      277.145373      -1.97      0.0964    82.173110
Time          1       6.872863        5.004928       1.37      0.2188    -1.014287
Temp          1       4.989743        2.165839       2.30      0.0608    -8.676768
Time*Time     1       0.021631        0.056784       0.38      0.7164     1.384394
Temp*Time     1      -0.030075        0.019281      -1.56      0.1698    -7.218045
Temp*Temp     1      -0.009836        0.004304      -2.29      0.0623    -8.852519

                    Sum of
Factor    DF       Squares    Mean Square    F Value    Pr > F    Label
Time       3     61.290957      20.430319       0.96    0.4704    Reaction Time (Hours)
Temp       3    461.250925     153.750308       7.22    0.0205    Temperature (Degrees Centigrade)
The canonical analysis (Output 76.1.3) indicates that the predicted response surface is shaped like a
saddle. The eigenvalue of 2.5 shows that the valley orientation of the saddle is less curved than the
hill orientation, with an eigenvalue of -9.99. The coefficients of the associated eigenvectors show
that the valley is more aligned with Time and the hill with Temp. Because the canonical analysis
resulted in a saddle point, the estimated surface does not have a unique optimum.
Output 76.1.3 Canonical Analysis
               Critical Value
Factor        Coded       Uncoded    Label
Time      -0.441758      8.465935    Reaction Time (Hours)
Temp      -0.309976    240.700718    Temperature (Degrees Centigrade)

Predicted value at stationary point: 83.741940

                      Eigenvectors
Eigenvalues         Time         Temp
   2.528816     0.953223    -0.302267
  -9.996940     0.302267     0.953223

Stationary point is a saddle point.
However, the ridge analysis in Output 76.1.4 and the ridge plot in Output 76.1.5 indicate that
maximum yields result from relatively high reaction times and low temperatures. A contour plot of
the predicted response surface, shown in Output 76.1.6, confirms this conclusion.
Output 76.1.4 Ridge Analysis
Estimated Ridge of Maximum Response for Variable
MBT: Percent Yield Mercaptobenzothiazole
 Coded     Estimated     Standard      Uncoded Factor Values
Radius      Response        Error          Time          Temp
   0.0     82.173110     2.665023     12.000000    250.000000
   0.1     82.952909     2.648671     11.964493    247.002956
   0.2     83.558260     2.602270     12.142790    244.023941
   0.3     84.037098     2.533296     12.704153    241.396084
   0.4     84.470454     2.457836     13.517555    239.435227
   0.5     84.914099     2.404616     14.370977    237.919138
   0.6     85.390012     2.410981     15.212247    236.624811
   0.7     85.906767     2.516619     16.037822    235.449230
   0.8     86.468277     2.752355     16.850813    234.344204
   0.9     87.076587     3.130961     17.654321    233.284652
   1.0     87.732874     3.648568     18.450682    232.256238
Output 76.1.5 Ridge Plot of Predicted Response Surface
Output 76.1.6 Contour Plot of Predicted Response Surface
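
The coded radius in Output 76.1.4 can be related to the uncoded factor values with a short calculation. The following Python sketch is an illustration only and is not SAS code. The center and scale values are assumptions inferred from the printed output: the radius-0 row gives the centers (Time = 12, Temp = 250), and comparing the coded and uncoded critical values in Output 76.1.3 gives scales of 8 and 30. The names CENTER, SCALE, coded, and coded_radius are hypothetical.

```python
import math

# Assumed coding, inferred from Outputs 76.1.3 and 76.1.4:
#   coded = (uncoded - center) / scale
CENTER = {"Time": 12.0, "Temp": 250.0}
SCALE = {"Time": 8.0, "Temp": 30.0}


def coded(factor, uncoded):
    """Convert an uncoded factor value to its coded value."""
    return (uncoded - CENTER[factor]) / SCALE[factor]


def coded_radius(time, temp):
    """Distance from the ridge center, measured in coded units."""
    return math.hypot(coded("Time", time), coded("Temp", temp))


# (reported coded radius, uncoded Time, uncoded Temp) from Output 76.1.4
ridge_rows = [
    (0.0, 12.000000, 250.000000),
    (0.5, 14.370977, 237.919138),
    (1.0, 18.450682, 232.256238),
]
for reported, time, temp in ridge_rows:
    print(f"reported {reported:.1f}   recomputed {coded_radius(time, temp):.4f}")
```

Under these assumed values, the recomputed radii agree with the reported ones to the printed precision, and moving along the ridge raises Time while lowering Temp, as the text concludes.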
Example 76.2: Response Surface Analysis with Covariates
One way of viewing covariates is as extra sources of variation in the dependent variable that can mask
the variation due to primary factors. This example demonstrates the use of the COVAR= option in
PROC RSREG to fit a response surface model to the dependent variables corrected for the covariates.
You have a chemical process with a yield that you hypothesize to be dependent on three factors:
reaction time, reaction temperature, and reaction pressure. You perform an experiment to measure
this dependence. You are willing to include up to 20 runs in your experiment, but you can perform
no more than 8 runs on the same day, so the design for the experiment is composed of three blocks.
Additionally, you know that the grade of raw material for the reaction has a significant impact on the
yield. You have no control over this, but you keep track of it. The following statements create a SAS
data set containing the results of the experiment:
data Experiment;
   input Day Grade Time Temp Pressure Yield;
   datalines;
1 67  -1      -1      -1       32.98
1 68  -1       1       1       47.04
1 70   1      -1       1       67.11
1 66   1       1      -1       26.94
1 74   0       0       0      103.22
1 68   0       0       0       42.94
2 75  -1      -1       1      122.93
2 69  -1       1      -1       62.97
2 70   1      -1      -1       72.96
2 71   1       1       1       94.93
2 72   0       0       0       93.11
2 74   0       0       0      112.97
3 69   1.633   0       0       78.88
3 67  -1.633   0       0       52.53
3 68   0       1.633   0       68.96
3 71   0      -1.633   0       92.56
3 70   0       0       1.633   88.99
3 72   0       0      -1.633  102.50
3 70   0       0       0       82.84
3 72   0       0       0      103.12
;
Your first analysis neglects to take the covariates into account. The following statements use PROC
RSREG to fit a response surface to the observed yield, but note that Day and Grade are omitted:
proc rsreg data=Experiment;
model Yield = Time Temp Pressure;
run;
The ANOVA results shown in Output 76.2.1 indicate that no process variable effects are significantly
larger than the background noise.
Output 76.2.1 Analysis of Variance Ignoring Covariates

The RSREG Procedure

                       Type I Sum
Regression      DF     of Squares    R-Square    F Value    Pr > F
Linear           3    1880.842426      0.1353       0.67    0.5915
Quadratic        3    2370.438681      0.1706       0.84    0.5023
Crossproduct     3     241.873250      0.0174       0.09    0.9663
Total Model      9    4493.154356      0.3233       0.53    0.8226

                           Sum of
Residual        DF        Squares    Mean Square
Total Error     10    9405.129724     940.512972
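
The F values and R-square values in Output 76.2.1 follow from the sums of squares by simple arithmetic. The following Python sketch is an illustration only and is not SAS code. It recomputes each F value as the term's mean square divided by the error mean square, and each R-square as the term's sum of squares divided by the total sum of squares. The variable and function names are hypothetical; the numbers are copied from the output.

```python
# (degrees of freedom, Type I sum of squares) copied from Output 76.2.1
terms = {
    "Linear": (3, 1880.842426),
    "Quadratic": (3, 2370.438681),
    "Crossproduct": (3, 241.873250),
}
error_df, error_ss = 10, 9405.129724

mse = error_ss / error_df                      # error mean square
model_df = sum(df for df, _ in terms.values())
model_ss = sum(ss for _, ss in terms.values())
total_ss = model_ss + error_ss                 # corrected total sum of squares


def f_value(df, ss):
    """Mean square of a term divided by the error mean square."""
    return (ss / df) / mse


def r_square(ss):
    """Share of the total sum of squares attributed to a term."""
    return ss / total_ss


for name, (df, ss) in terms.items():
    print(f"{name:<13}{df:>3}  R-Square {r_square(ss):.4f}  F {f_value(df, ss):.2f}")
print(f"{'Total Model':<13}{model_df:>3}  R-Square {r_square(model_ss):.4f}  "
      f"F {f_value(model_df, model_ss):.2f}")
```

Every F value is below 1, so each term's mean square is smaller than the error mean square of about 940.5. This is the sense in which no process variable effect stands out from the background noise.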
However, when the yields are adjusted for covariate effects of day and grade of raw material, very
strong process variable effects are revealed. The following statements produce the ANOVA results in
Output 76.2.2. Note that in order to include the effects of the classification factor Day as covariates,
you need to create dummy variables indicating each day separately.
data Experiment;
   set Experiment;
   d1 = (Day = 1);
   d2 = (Day = 2);
   d3 = (Day = 3);
run;
ods graphics on;
proc rsreg data=Experiment plots=all;
model Yield = d1-d3 Grade Time Temp Pressure / covar=4;
run;
ods graphics off;
The results show very strong effects due to both the covariates and the process variables.
Output 76.2.2 Analysis of Variance Including Covariates

The RSREG Procedure

                       Type I Sum
Regression      DF     of Squares    R-Square    F Value    Pr > F
Covariates       3          13695      0.9854     316957    <.0001
Linear           3     156.524497      0.0113    3622.53    <.0001
Quadratic        3      22.989775      0.0017     532.06    <.0001
Crossproduct     3      23.403614      0.0017     541.64    <.0001
Total Model     12          13898      1.0000    80413.2    <.0001

                           Sum of
Residual        DF        Squares    Mean Square
Total Error      7       0.100820       0.014403
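
Output 76.2.2 lists 3 degrees of freedom for the covariates even though COVAR=4 names four variables (d1, d2, d3, and Grade). A likely explanation, offered here as an assumption rather than a statement from the procedure's documentation, is that the three day indicators always sum to 1 and therefore duplicate the intercept, so only two of them carry independent information. The following Python sketch is an illustration only and is not SAS code. It checks that property on the Day column of the Experiment data set.

```python
# Day column of the Experiment data set: 6 runs on day 1, 6 on day 2, 8 on day 3.
days = [1] * 6 + [2] * 6 + [3] * 8

# Indicator (dummy) variables, built the same way as d1, d2, d3 in the DATA step.
rows = [(int(day == 1), int(day == 2), int(day == 3)) for day in days]

# Every row sums to 1, so d1 + d2 + d3 reproduces the intercept column.
print(all(sum(row) == 1 for row in rows))

# Independent covariate terms: (number of days - 1) for Day, plus 1 for Grade.
independent_covariate_df = (len(set(days)) - 1) + 1
print(independent_covariate_df)
```

The count of 3 matches the Covariates row of the output. The total of 12 model plus 7 error degrees of freedom also matches the 19 available from 20 observations.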
The number of observations in the data set might be too small for the diagnostic plots in Output 76.2.3
to dependably identify problems; however, some outliers are indicated. The residual plots in
Output 76.2.4 do not display any obvious structure.
Output 76.2.3 Fit Diagnostics
Output 76.2.4 Residual Plots
References
Box, G. E. P. (1954), “The Exploration and Exploitation of Response Surfaces: Some General
Considerations,” Biometrics, 10, 16.
Box, G. E. P. and Draper, N. R. (1982), “Measures of Lack of Fit for Response Surface Designs and
Predictor Variable Transformations,” Technometrics, 24, 1–8.
Box, G. E. P. and Draper, N. R. (1987), Empirical Model Building and Response Surfaces, New
York: John Wiley & Sons.
Box, G. E. P. and Hunter, J. S. (1957), “Multifactor Experimental Designs for Exploring Response
Surfaces,” Annals of Mathematical Statistics, 28, 195–242.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John
Wiley & Sons.
Box, G. E. P. and Wilson, K. J. (1951), “On the Experimental Attainment of Optimum Conditions,”
Journal of the Royal Statistical Society.
Cleveland, W. S. (1993), Visualizing Data, Summit, NJ: Hobart Press.
Cochran, W. G. and Cox, G. M. (1957), Experimental Designs, Second Edition, New York: John
Wiley & Sons.
Draper, N. R. (1963), “Ridge Analysis of Response Surfaces,” Technometrics, 5, 469–479.
Draper, N. R. and John, J. A. (1988), “Response Surface Designs for Quantitative and Qualitative
Variables,” Technometrics, 30, 423–428.
Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, Second Edition, New York: John
Wiley & Sons.
Frankel, S. A. (1961), “Statistical Design of Experiments for Process Development of MBT,” Rubber
Age, 89, 453.
John, P. W. M. (1971), Statistical Design and Analysis of Experiments, New York: Macmillan.
Mead, R. and Pike, D. J. (1975), “A Review of Response Surface Methodology from a Biometric
Point of View,” Biometrics, 31, 803.
Meyer, D. C. (1963), “Response Surface Methodology in Education and Psychology,” Journal of
Experimental Education, 31, 329.
Myers, R. H. (1976), Response Surface Methodology, Blacksburg: Virginia Polytechnic Institute and
State University.
Myers, R. H. and Montgomery, D. C. (1995), Response Surface Methodology: Process and Product
Optimization Using Designed Experiments, New York: John Wiley & Sons.
Rawlings, J. O., Pantula, S. G., and Dickey, D. A. (1998), Applied Regression Analysis: A Research
Tool, Springer Texts in Statistics, Second Edition, New York: Springer-Verlag.
6492 F Chapter 76: The RSREG Procedure
Schneider, A. M. and Stockett, A. L. (1963), “An Experiment to Select Optimum Operating Conditions on the Basis of Arbitrary Preference Ratings,” Chemical Engineering Progress Symposium
Series.