SAS/STAT® 9.22 User’s Guide
The RSREG Procedure (Book Excerpt)
SAS® Documentation

This document is an individual chapter from SAS/STAT® 9.22 User’s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT® 9.22 User’s Guide. Cary, NC: SAS Institute Inc.

Copyright © 2010, SAS Institute Inc., Cary, NC, USA. All rights reserved. Produced in the United States of America. 1st electronic book, May 2010.

Chapter 76
The RSREG Procedure

Contents
Overview: RSREG Procedure
   Comparison to Other SAS Software
   Terminology
Getting Started: RSREG Procedure
   A Response Surface with a Simple Optimum
Syntax: RSREG Procedure
   PROC RSREG Statement
   BY Statement
   ID Statement
   MODEL Statement
   RIDGE Statement
   WEIGHT Statement
Details: RSREG Procedure
   Introduction to Response Surface Experiments
   Coding the Factor Variables
   Missing Values
   Plotting the Surface
   Searching for Multiple Response Conditions
   Handling Covariates
   Computational Method
   Output Data Sets
   Displayed Output
   ODS Table Names
   ODS Graphics
Examples: RSREG Procedure
   Example 76.1: A Saddle Surface Response Using Ridge Analysis
   Example 76.2: Response Surface Analysis with Covariates
References
Overview: RSREG Procedure

The RSREG procedure uses the method of least squares to fit quadratic response surface regression models. Response surface models are a kind of general linear model in which attention focuses on characteristics of the fitted response function and, in particular, on where optimum estimated response values occur.

In addition to fitting a quadratic function, you can use the RSREG procedure to do the following:

• test for lack of fit
• test for the significance of individual factors
• analyze the canonical structure of the estimated response surface
• compute the ridge of optimum response
• predict new values of the response

The RSREG procedure uses ODS Graphics to display the response surfaces, residuals, fit diagnostics, and ridges of optimum response. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.”

Comparison to Other SAS Software

Other SAS/STAT procedures can be used to fit the response surface, but the RSREG procedure is more specialized. PROC RSREG uses a much more compact model syntax than other procedures; for example, the following statements model a three-factor response surface in the REG, GLM, and RSREG procedures. Because the MODEL statement in PROC REG accepts only variable names, the squared and crossproduct variables for PROC REG are first created in a DATA step:

   data quad;
      set a;   /* A is a hypothetical input data set name */
      x1x1 = x1*x1;  x1x2 = x1*x2;  x2x2 = x2*x2;
      x1x3 = x1*x3;  x2x3 = x2*x3;  x3x3 = x3*x3;
   run;

   proc reg data=quad;
      model y=x1 x1x1 x2 x1x2 x2x2 x3 x1x3 x2x3 x3x3;
   run;

   proc glm;
      model y=x1|x1|x2|x2|x3|x3@2;
   run;

   proc rsreg;
      model y=x1 x2 x3;
   run;

Additionally, PROC RSREG includes specialized methodology for analyzing the fitted response surface, such as canonical analysis and optimum response ridges.

Note that the ADX Interface in SAS/QC software provides an interactive environment for constructing and analyzing many different kinds of experiments, including response surface experiments.
The ADX Interface is the preferred interactive SAS System tool for analyzing experiments, because it includes facilities for checking underlying assumptions and graphically optimizing the response surface; see Getting Started with the SAS ADX Interface for more information. The RSREG procedure is appropriate for analyzing experiments in a batch environment.

Terminology

Variables are referred to according to the following conventions:

factor variables
   independent variables used to construct the quadratic response surface. To estimate the necessary parameters, each variable must have at least three distinct values in the data. Independent variables must be numeric.

response variables
   the dependent variables to which the quadratic response surfaces are fit. Dependent variables must be numeric.

covariates
   additional independent variables that are used in the regression but not in the formation of the quadratic response surface. Covariates must be numeric.

WEIGHT variable
   a variable for weighting the observations in the regression. The WEIGHT variable must be numeric.

ID variables
   variables not previously described that are transferred to an output data set containing statistics for each observation in the input data set. This data set is created by using the OUT= option in the PROC RSREG statement. ID variables can be either character or numeric.

BY variables
   variables for grouping observations. Separate analyses are obtained for each BY group. BY variables can be either character or numeric.

Getting Started: RSREG Procedure

A Response Surface with a Simple Optimum

This example uses the three-factor quadratic model discussed in John (1971). Settings of the temperature, gas–liquid ratio, and packing height are controlled factors in the production of a certain chemical. Schneider and Stockett (1963) performed an experiment to determine the values of these three factors that minimize the unpleasant odor of the chemical.
The following statements create the SAS data set smell; the variable Odor is the response, and the variables T, R, and H are the independent factors.

   title 'Response Surface with a Simple Optimum';
   data smell;
      input Odor T R H @@;
      label T = "Temperature"
            R = "Gas-Liquid Ratio"
            H = "Packing Height";
      datalines;
    66  40 .3 4    39 120 .3 4    43  40 .7 4    49 120 .7 4
    58  40 .5 2    17 120 .5 2    -5  40 .5 6   -40 120 .5 6
    65  80 .3 2     7  80 .7 2    43  80 .3 6   -22  80 .7 6
   -31  80 .5 4   -35  80 .5 4   -26  80 .5 4
   ;

The following statements invoke PROC RSREG on the data set smell. Figure 76.1 through Figure 76.3 display the results of the analysis, including a lack-of-fit test requested with the LACKFIT option.

   proc rsreg data=smell;
      model Odor = T R H / lackfit;
   run;

Figure 76.1 displays the coding coefficients for the transformation of the independent variables to lie between –1 and 1, simple statistics for the response variable, hypothesis tests for linear, quadratic, and crossproduct terms, and the lack-of-fit test. The hypothesis tests give a rough idea of the importance of the effects; here the crossproduct terms are not significant (p = 0.8965). However, the lack of fit for the model is significant (p = 0.0240), so more complicated modeling or further experimentation with additional variables should be performed before firm conclusions are drawn about the underlying process.
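The coding coefficients in Figure 76.1 are consistent with subtracting each factor's midrange and dividing by its half-range, which maps the design values onto [–1, 1]. The following Python sketch reproduces them from the smell design under that assumption. It is an illustration, not SAS code, and the function names `coding_coefficients` and `code` are hypothetical.

```python
# Factor settings from the smell data, in the same order as the DATALINES.
T = [40, 120, 40, 120, 40, 120, 40, 120, 80, 80, 80, 80, 80, 80, 80]
R = [0.3, 0.3, 0.7, 0.7, 0.5, 0.5, 0.5, 0.5, 0.3, 0.7, 0.3, 0.7, 0.5, 0.5, 0.5]
H = [4, 4, 4, 4, 2, 2, 6, 6, 2, 2, 6, 6, 4, 4, 4]


def coding_coefficients(values):
    """Return (subtracted off, divided by) = (midrange, half-range)."""
    lo, hi = min(values), max(values)
    return (hi + lo) / 2, (hi - lo) / 2


def code(value, center, half_range):
    """Map a raw factor value onto the coded scale."""
    return (value - center) / half_range


for name, values in (("T", T), ("R", R), ("H", H)):
    center, half_range = coding_coefficients(values)
    coded = sorted({round(code(v, center, half_range), 6) for v in values})
    print(name, center, half_range, coded)
```

Each factor in this Box-Behnken-style design takes exactly three levels, so every coded column contains only –1, 0, and 1.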
Figure 76.1 Summary Statistics and Analysis of Variance

   Response Surface with a Simple Optimum
   The RSREG Procedure

   Coding Coefficients for the Independent Variables

   Factor    Subtracted off    Divided by
   T              80.000000     40.000000
   R               0.500000      0.200000
   H               4.000000      2.000000

   Response Surface for Variable Odor

   Response Mean                15.200000
   Root MSE                     22.478508
   R-Square                        0.8820
   Coefficient of Variation      147.8849

   Regression     DF   Type I Sum of Squares   R-Square   F Value   Pr > F
   Linear          3             7143.250000     0.3337      4.71   0.0641
   Quadratic       3                   11445     0.5346      7.55   0.0264
   Crossproduct    3              293.500000     0.0137      0.19   0.8965
   Total Model     9                   18882     0.8820      4.15   0.0657

   Residual       DF   Sum of Squares   Mean Square   F Value   Pr > F
   Lack of Fit     3      2485.750000    828.583333     40.75   0.0240
   Pure Error      2        40.666667     20.333333
   Total Error     5      2526.416667    505.283333

Parameter estimates and the factor ANOVA are shown in Figure 76.2. The parameter estimates confirm that the crossproduct terms are not significantly different from zero, as noted previously. The Estimate column contains estimates based on the raw data, and the Parameter Estimate from Coded Data column contains estimates based on the coded data. The factor ANOVA table displays tests for all four parameters corresponding to each factor: the parameters for the linear effect, the quadratic effect, and the crossproducts with each of the other two factors. The only factor with a significant overall effect is R, indicating that the level of noise left unexplained by the model is still too high to estimate the effects of T and H accurately. This might be due to the lack of fit.
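Because the coded variables are linear transformations of the raw ones, the Estimate column and the coded-data column of Figure 76.2 describe the same fitted surface. The following Python check evaluates both forms at the same points, using the rounded estimates printed in Figure 76.2 and the coding coefficients from Figure 76.1. The helper names are hypothetical, and the two predictions agree only up to the rounding of the printed estimates.

```python
# Estimates copied from Figure 76.2 (rounded as printed there).
RAW = {"1": 568.958333, "T": -4.102083, "R": -1345.833333, "H": -22.166667,
       "TT": 0.020052, "RT": 1.031250, "RR": 1195.833333,
       "HT": 0.018750, "HR": -4.375000, "HH": 1.520833}
CODED = {"1": -30.666667, "T": -12.125000, "R": -17.000000, "H": -21.375000,
         "TT": 32.083333, "RT": 8.250000, "RR": 47.833333,
         "HT": 1.500000, "HR": -1.750000, "HH": 6.083333}
# (subtracted off, divided by) from Figure 76.1.
CODING = {"T": (80.0, 40.0), "R": (0.5, 0.2), "H": (4.0, 2.0)}


def quadratic(beta, t, r, h):
    """Evaluate the full three-factor quadratic with coefficients beta."""
    return (beta["1"] + beta["T"] * t + beta["R"] * r + beta["H"] * h
            + beta["TT"] * t * t + beta["RT"] * r * t + beta["RR"] * r * r
            + beta["HT"] * h * t + beta["HR"] * h * r + beta["HH"] * h * h)


def predict_raw(t, r, h):
    """Predict from the raw-data estimates at raw factor values."""
    return quadratic(RAW, t, r, h)


def predict_coded(t, r, h):
    """Code the raw factor values first, then use the coded estimates."""
    ct, cr, ch = ((v - CODING[k][0]) / CODING[k][1]
                  for k, v in (("T", t), ("R", r), ("H", h)))
    return quadratic(CODED, ct, cr, ch)


for point in [(80, 0.5, 4), (40, 0.3, 2), (120, 0.7, 6)]:
    print(point, round(predict_raw(*point), 3), round(predict_coded(*point), 3))
```

At the design center (80, 0.5, 4) every coded variable is 0, so the coded prediction is simply the coded intercept, –30.666667.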
Figure 76.2 Parameter Estimates and Hypothesis Tests

   Parameter  DF       Estimate  Standard Error  t Value  Pr > |t|  Parameter Estimate from Coded Data
   Intercept   1     568.958333      134.609816     4.23    0.0083                          -30.666667
   T           1      -4.102083        1.489024    -2.75    0.0401                          -12.125000
   R           1   -1345.833333      335.220685    -4.01    0.0102                          -17.000000
   H           1     -22.166667       29.780489    -0.74    0.4902                          -21.375000
   T*T         1       0.020052        0.007311     2.74    0.0407                           32.083333
   R*T         1       1.031250        1.404907     0.73    0.4959                            8.250000
   R*R         1    1195.833333      292.454665     4.09    0.0095                           47.833333
   H*T         1       0.018750        0.140491     0.13    0.8990                            1.500000
   H*R         1      -4.375000       28.098135    -0.16    0.8824                           -1.750000
   H*H         1       1.520833        2.924547     0.52    0.6252                            6.083333

   Factor  DF  Sum of Squares  Mean Square  F Value  Pr > F  Label
   T        4     5258.016026  1314.504006     2.60  0.1613  Temperature
   R        4           11045  2761.150641     5.46  0.0454  Gas-Liquid Ratio
   H        4     3813.016026   953.254006     1.89  0.2510  Packing Height

Figure 76.3 displays the canonical analysis and eigenvectors. The canonical analysis indicates that the directions of principal orientation for the predicted response surface are along the axes associated with the three factors, confirming the small interaction effect in the regression ANOVA (Figure 76.1). The largest eigenvalue (48.8588) corresponds to the eigenvector {0.238091, 0.971116, –0.015690}, whose largest component (0.971116) is associated with R; similarly, the second-largest eigenvalue (31.1035) is associated with T. The third eigenvalue (6.0377), associated with H, is much smaller than the other two, indicating that the response surface is relatively insensitive to changes in this factor. The coded form of the canonical analysis indicates that the estimated response surface is at a minimum when T and R are both near the middle of their respective ranges (that is, the coded critical values for T and R are both near 0) and H is relatively high (in fact, above the largest packing height in the design, 6); in uncoded terms, the model predicts that the unpleasant odor is minimized when T = 84.876502, R = 0.539915, and H = 7.541050.
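The critical values and the classification in Figure 76.3 can be checked by hand from the coded estimates in Figure 76.2. Write the fitted coded surface as ŷ = b0 + x'b + x'Bx, where B holds the pure quadratic coefficients on its diagonal and half of each crossproduct coefficient off the diagonal. The stationary point then solves 2Bx = –b, and the predicted value there is b0 + x'b/2. The following Python sketch is a hand check under those assumptions, not the procedure's own algorithm. It uses plain Gaussian elimination, and it classifies the point with Sylvester's criterion (a positive definite B means a minimum) instead of computing eigenvalues. All function names are hypothetical.

```python
def solve(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def det(a):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def positive_definite(a):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in a[:k]]) > 0 for k in range(1, len(a) + 1))


def stationary_point(b0, b, B):
    """Return (x_s, predicted value at x_s, kind) for y = b0 + x'b + x'Bx."""
    n = len(b)
    two_B = [[2 * B[i][j] for j in range(n)] for i in range(n)]
    xs = solve(two_B, [-v for v in b])
    ys = b0 + 0.5 * sum(bi * xi for bi, xi in zip(b, xs))
    if positive_definite(B):
        kind = "minimum"
    elif positive_definite([[-v for v in row] for row in B]):
        kind = "maximum"
    else:
        kind = "saddle point or flat ridge"
    return xs, ys, kind


# Coded estimates from Figure 76.2, factors ordered (T, R, H).
b0 = -30.666667
b = [-12.125, -17.0, -21.375]
B = [[32.083333, 8.25 / 2, 1.5 / 2],      # T*T, R*T/2, H*T/2
     [8.25 / 2, 47.833333, -1.75 / 2],    # R*T/2, R*R, H*R/2
     [1.5 / 2, -1.75 / 2, 6.083333]]      # H*T/2, H*R/2, H*H

xs, ys, kind = stationary_point(b0, b, B)
# Undo the coding from Figure 76.1: raw = subtracted off + coded * divided by.
uncoded = [c + x * d for x, (c, d) in zip(xs, [(80, 40), (0.5, 0.2), (4, 2)])]
print(xs, uncoded, ys, kind)
```

As a further consistency check, the trace of B (the sum of the pure quadratic coefficients) equals the sum of the three eigenvalues reported in Figure 76.3.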
Figure 76.3 Canonical Analysis and Eigenvectors

                    Critical Value
   Factor        Coded       Uncoded   Label
   T          0.121913     84.876502   Temperature
   R          0.199575      0.539915   Gas-Liquid Ratio
   H          1.770525      7.541050   Packing Height

   Predicted value at stationary point: -52.024631

                              Eigenvectors
   Eigenvalues           T            R            H
     48.858807    0.238091     0.971116    -0.015690
     31.103461    0.970696    -0.237384     0.037399
      6.037732   -0.032594     0.024135     0.999177

   Stationary point is a minimum.

To plot the response surface with respect to two of the factor variables, fix H, the least significant factor variable, at its estimated optimum value. The following statements use ODS Graphics to display the surface:

   ods graphics on;
   proc rsreg data=smell plots(only unpack)=surface(3d at(H=7.541050));
      model Odor = T R H;
      ods select 'T * R = Pred';
   run;
   ods graphics off;

The ODS SELECT statement selects only the plot of interest.

Figure 76.4 The Response Surface at the Optimum H

Alternatively, the following statements produce an output data set containing the surface information, which you can then use for plotting surfaces or searching for optima. The first DATA step fixes H, the least significant factor variable, at its estimated optimum value (7.541) and generates a grid of points for T and R. To ensure that the grid data do not affect parameter estimates, the response variable (Odor) is set to missing. (See the section “Missing Values” on page 6472.) The second DATA step concatenates these grid points to the original data. Then PROC RSREG computes predictions for the combined data. The last DATA step subsets the predicted values to just the grid points, excluding the predictions at the original data.

   data grid;
      Odor = .;      /* missing response: grid points do not affect the fit */
      H    = 7.541;
      do T = 20 to 140 by 5;
         do R = .1 to .9 by .05;
            output;
         end;
      end;
   run;

   data grid;
      set smell grid;
   run;

   proc rsreg data=grid out=predict noprint;
      model Odor = T R H / predict;
   run;

   data grid;
      set predict;
      if H = 7.541;  /* keep only the grid points */
   run;

Syntax: RSREG Procedure

The following statements are available in PROC RSREG.

   PROC RSREG < options > ;
      MODEL responses = independents < / options > ;
      RIDGE < options > ;
      WEIGHT variable ;
      ID variables ;
      BY variables ;

The PROC RSREG and MODEL statements are required. The BY, ID, MODEL, RIDGE, and WEIGHT statements are described after the PROC RSREG statement, and they can appear in any order.

PROC RSREG Statement

   PROC RSREG < options > ;

The PROC RSREG statement invokes the procedure. You can specify the following options in the PROC RSREG statement.

DATA=SAS-data-set
   specifies the input SAS data set that contains the data to be analyzed. By default, PROC RSREG uses the most recently created SAS data set.

NOPRINT
   suppresses the normal display of results when only the output data set is required. For more information, see the description of the NOPRINT option in the MODEL and RIDGE statements. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 20, “Using the Output Delivery System,” for more information.

OUT=SAS-data-set
   creates an output SAS data set that contains statistics for each observation in the input data set. In particular, this data set contains the BY variables, the ID variables, the WEIGHT variable, the variables in the MODEL statement, and the output options requested in the MODEL statement. You must specify output statistic options in the MODEL statement; otherwise, the output data set is created but contains no observations. To create a permanent SAS data set, you must specify a two-level name (see SAS Language Reference: Concepts for more information about permanent SAS data sets). For more details, see the section “OUT=SAS-data-set” on page 6477.
PLOTS < (global-plot-option) > = plot-request < (options) >
PLOTS < (global-plot-option) > = (plot-request < (options) > < ... plot-request < (options) > >)
   controls the plots produced through ODS Graphics. When you specify only one plot-request, you can omit the parentheses around it. For example:

      plots = all
      plots = (diagnostics ridge surface(unpack))
      plots(unpack) = surface(overlaypairs)

   To produce plots, you must enable ODS Graphics and specify a plot-request, as shown in the following statements:

      ods graphics on;
      proc rsreg plots=all;
         model y=x;
      run;
      ods graphics off;

   See Figure 76.4, Output 76.1.5, Output 76.1.6, Output 76.2.3, and Output 76.2.4 for examples of the ODS graphical displays. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.”

   The following global-plot-option is available:

   UNPACKPANELS | UNPACK
      suppresses paneling. By default, multiple plots can appear in some output panels. Specify the UNPACK option to display each plot separately.

   The following plot-requests are available:

   ALL
      produces all appropriate plots. You can specify other options with ALL; for example, to display all plots and unpack the SURFACE contours, you can specify plots=(all surface(unpack)).

   DIAGNOSTICS < (LABEL | UNPACK) >
      displays a panel of summary fit diagnostic plots. The plots produced and their usage are described in Table 76.1.
      Table 76.1 Diagnostic Plots

      Diagnostic Plot                                              Usage
      Cook’s D statistic versus observation number                 Evaluate influence of an observation on the entire parameter estimate vector
      Dependent variable values versus predicted values            Evaluate adequacy of fit and detect influential observations
      Externally studentized residuals (RStudent) versus leverage  Detect outliers and influential (high-leverage) observations
      Externally studentized residuals versus predicted values     Evaluate adequacy of fit and detect outliers
      Histogram of residuals                                       Confirm normality of error terms
      Normal quantile plot of residuals                            Confirm normality and homogeneity of error terms, and detect outliers
      Residuals versus predicted values                            Evaluate adequacy of fit and detect outliers
      Residual-fit (RF) spread plot                                Side-by-side quantile plots of the centered fit and the residuals show “how much variation in the data is explained by the fit and how much remains in the residuals” (Cleveland 1993)

      Observations satisfying RStudent > 2 or RStudent < –2 are called outliers, and observations with leverage > 2p/n are called influential, where n is the number of observations used in fitting the model and p is the number of parameters in the model (Rawlings, Pantula, and Dickey 1998). The LABEL option labels the influential and outlying observations; the label is the first ID variable if the ID statement is specified, and otherwise it is the observation number. In the Cook’s D plot, only observations with D exceeding 4/n are labeled; these are also called influential observations. The UNPACK option displays each diagnostic plot separately. See Output 76.2.3 for an example of the diagnostics panel.

   FIT < (GRIDSIZE=number) >
      plots the predicted values against a single predictor when you have only one factor or only one covariate in the model. The GRIDSIZE= option specifies the number of points at which the fitted values are computed; by default, GRIDSIZE=200.
   NONE
      suppresses all plots.

   RESIDUALS < (UNPACK | SMOOTH) >
      displays plots of residuals against each factor and covariate. The UNPACK option displays each residual plot separately. The SMOOTH option overlays a loess smooth on each residual plot; see Chapter 50, “The LOESS Procedure,” for more information. See Output 76.1.5 for an example of this plot.

   RIDGE < (UNPACK) >
      displays the maximum and/or minimum ridge plots. This option is available only when a MAXIMUM or MINIMUM option is specified in the RIDGE statement. The UNPACK option displays the estimated response and factor level ridge plots separately. See Output 76.1.5 for an example of this plot.

   SURFACE < (surface-options) >
      displays the response surface for each response variable and each pair of factors, with all other factors and covariates fixed at their means. By default, a panel of contour plots is produced; see Output 76.1.6 for an example of this plot. The following surface-options can be specified:

      3D
         displays three-dimensional surface plots instead of contour plots. See Figure 76.4 for an example of this plot.

      AT < keyword > < (variable=value-list | keyword < ... variable=value-list | keyword >) >
         specifies fixed values for factors and covariates. You can specify one or more numbers in the value-list or one of the following keywords:

         MIN        sets the variable to its minimum value.
         MEAN       sets the variable to its mean value.
         MIDRANGE   sets the variable to the middle value, (min + max)/2.
         MAX        sets the variable to its maximum value.

         Specifying a keyword immediately after AT sets the default value of all variables; for example, AT MIN sets all variables not displayed on an axis to their minimum values. By default, continuous variables are set to their means (AT MEAN) when they are not used on an axis.
         For example, if your model contains variables X1, X2, and X3, then specifying AT(X1=7 9) produces a contour plot of X2 versus X3 with X1 = 7 and another contour plot with X1 = 9, along with contour plots of X1 versus X2 with X3 fixed at its mean, and of X1 versus X3 with X2 fixed at its mean.

      EXTEND=value
         extends the surface by value times the range of each factor in each direction, which enables you to see more of the fitted surface. For example, if factor A has range [0, 10], then specifying EXTEND=0.1 computes and displays the surface for A in [–1, 11]. You can specify value ≥ 0; by default, value = 0.1.

      FILL=PRED | SE | NONE
         produces a filled contour plot for either the predicted values or the standard errors. FILL=SE is the default. If the 3D option is also specified, then the contour plot is projected onto the surface.

      GRIDSIZE=n
         creates an n × n grid of points at which the estimated values for the surface and standard errors are computed, for n ≥ 1. By default, n = 50.

      LINE < =PRED | SE | NONE >
         produces a contour line plot for either the predicted values or the standard errors. LINE=PRED is the default. If the 3D option is also specified, then specifying LINE displays a grid on the surface, and the other LINE= specifications are ignored.

      NODESIGN
         suppresses the display of the design points on the contour surface plots and the overlaid contour-line plots.

      OVERLAYPAIRS
         produces overlaid contour line plots for all pairs of response variables in addition to the contour surface plots. See Figure 76.6 for an example of this plot.

      ROTATE=angle
         rotates the 3D surface plots angle degrees, –180 < angle < 180. By default, angle = 57.

      TILT=angle
         tilts the 3D surface plots angle degrees, –180 < angle < 180. By default, angle = 20.

      UNPACKPANELS | UNPACK
         suppresses paneling and displays each surface plot separately.
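The labeling rules for the DIAGNOSTICS plot-request reduce to three cutoffs: |RStudent| > 2, leverage > 2p/n, and Cook's D > 4/n. The Python sketch below applies them to per-observation statistics that are assumed to be already computed (computing RStudent, leverage, and Cook's D is not shown). The function name and the sample values are hypothetical. One consequence of the leverage rule is worth noting: for the smell example (n = 15, p = 10), the cutoff 2p/n ≈ 1.33 exceeds 1, the largest possible leverage, so no observation there can be flagged as high leverage.

```python
def diagnostic_flags(rstudent, leverage, cooks_d, p):
    """Apply the outlier, leverage, and Cook's D cutoffs to each observation.

    rstudent, leverage, cooks_d: per-observation statistics (same length n).
    p: number of parameters in the model.
    """
    n = len(rstudent)
    leverage_cutoff = 2 * p / n
    cooks_cutoff = 4 / n
    flags = []
    for obs, (r, h, d) in enumerate(zip(rstudent, leverage, cooks_d), start=1):
        flags.append({
            "obs": obs,
            "outlier": r > 2 or r < -2,
            "high_leverage": h > leverage_cutoff,
            "cooks_d_labeled": d > cooks_cutoff,
        })
    return flags


# Hypothetical statistics for a 5-observation fit with p = 2 parameters,
# so both the leverage cutoff and the Cook's D cutoff equal 0.8.
flags = diagnostic_flags(rstudent=[0.5, -2.3, 1.1, 2.05, -0.4],
                         leverage=[0.2, 0.9, 0.3, 0.1, 0.5],
                         cooks_d=[0.1, 1.2, 0.05, 0.5, 0.9],
                         p=2)
for f in flags:
    print(f)
```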
BY Statement

   BY variables ;

You can specify a BY statement with PROC RSREG to obtain separate analyses on observations in groups that are defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one specified is used.

If your input data set is not sorted in ascending order, use one of the following alternatives:

• Sort the data by using the SORT procedure with a similar BY statement.
• Specify the NOTSORTED or DESCENDING option in the BY statement for the RSREG procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
• Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).

For more information about BY-group processing, see the discussion in SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.

ID Statement

   ID variables ;

The ID statement names variables that are to be transferred to the data set created by the OUT= option in the PROC RSREG statement.

MODEL Statement

   MODEL responses = independents < / options > ;

In the MODEL statement, you specify the response (dependent) variables followed by an equal sign and then the independent variables, some of which can be covariates.

Table 76.2 summarizes the options available in the MODEL statement. The statistic options specify which statistics are output to the OUT= data set. If none of the statistic options are selected, the data set is created but contains no observations. The statistic option keywords become values of the special variable _TYPE_ in the output data set.
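The compactness of the MODEL statement comes from the terms that PROC RSREG generates implicitly. As an illustration only, the following hypothetical Python helper lists the regression terms implied by a list of independents and a COVAR= value: an intercept, one linear term per covariate, and linear, quadratic, and crossproduct terms for the factors. The factor terms are ordered as in the parameter listing of Figure 76.2; where covariate terms appear in the procedure's displayed output is not modeled.

```python
def rsreg_terms(independents, covar=0):
    """List the regression terms implied by MODEL y = independents / COVAR=covar."""
    covariates = independents[:covar]   # first COVAR= variables enter linearly only
    factors = independents[covar:]
    terms = ["Intercept"] + covariates + factors
    for i, fi in enumerate(factors):
        for fj in factors[: i + 1]:
            terms.append(f"{fi}*{fj}")  # e.g. R*T, then R*R
    return terms


print(rsreg_terms(["T", "R", "H"]))
print(len(rsreg_terms(["T", "R", "H"])))  # parameters in the smell model
```

For the smell model this gives the same ten terms that Figure 76.2 lists, which is also one more than the nine model degrees of freedom in Figure 76.1.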
Table 76.2 MODEL Statement Options

   Task                               Options
   Analyze original data              NOCODE
   Fit model to first BY group only   BYOUT
   Declare covariates                 COVAR=
   Request additional statistics      PRESS
   Request additional tests           LACKFIT
   Suppress displayed output          NOANOVA, NOOPTIMAL, NOPRINT

   Task                               Statistic Options
   Output statistics                  ACTUAL, PREDICT, RESIDUAL, L95, U95, L95M, U95M, D

The following list describes these options in alphabetical order.

ACTUAL
   specifies that the observed response values from the input data set be written to the output data set.

BYOUT
   uses only the first BY group to estimate the model. Subsequent BY groups have scoring statistics computed in the output data set only. The BYOUT option is used only when a BY statement is specified.

COVAR=n
   declares that the first n variables on the right side of the model are simple linear regressors (covariates) and not factors in the quadratic response surface. By default, PROC RSREG forms quadratic and crossproduct effects for all regressor variables in the MODEL statement. See the section “Handling Covariates” on page 6475 for more details and Example 76.2 for an example that uses covariates.

D
   specifies that Cook’s D influence statistic be written to the output data set. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

LACKFIT
   performs a lack-of-fit test. See Draper and Smith (1981) for a discussion of lack-of-fit tests.

L95
   specifies that the lower bound of a 95% confidence interval for an individual predicted value be written to the output data set. The variance used in calculating this bound is a function of both the mean square error and the variance of the parameter estimates. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

L95M
   specifies that the lower bound of a 95% confidence interval for the expected value of the dependent variable be written to the output data set.
   The variance used in calculating this bound is a function of the variance of the parameter estimates. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

NOANOVA
NOAOV
   suppresses the display of the analysis of variance and parameter estimates from the model fit.

NOCODE
   performs the canonical and ridge analyses with the parameter estimates derived from fitting the response to the original values of the factor variables, rather than their coded values (see the section “Coding the Factor Variables” on page 6472 for more details). Use this option if the data are already stored in a coded form.

NOOPTIMAL
NOOPT
   suppresses the display of the canonical analysis for the quadratic response surface.

NOPRINT
   suppresses the display of both the analysis of variance and the canonical analysis.

PREDICT
   specifies that the values predicted by the model be written to the output data set.

PRESS
   computes and displays the predicted residual sum of squares (PRESS) statistic for each dependent variable in the model. The PRESS statistic is added to the summary information at the beginning of the analysis of variance, so if the NOANOVA or NOPRINT option is specified, then the PRESS option has no effect. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

RESIDUAL
   specifies that the residuals, calculated as ACTUAL – PREDICTED, be written to the output data set.

U95
   specifies that the upper bound of a 95% confidence interval for an individual predicted value be written to the output data set. The variance used in calculating this bound is a function of both the mean square error and the variance of the parameter estimates. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

U95M
   specifies that the upper bound of a 95% confidence interval for the expected value of the dependent variable be written to the output data set.
   The variance used in calculating this bound is a function of the variance of the parameter estimates. See Chapter 4, “Introduction to Regression Procedures,” for details and formulas.

RIDGE Statement

   RIDGE < options > ;

A RIDGE statement computes the ridge of optimum response. The ridge starts at a given point x0, and the point on the ridge at radius r from x0 is the collection of factor settings that optimizes the predicted response at this radius. You can think of the ridge as climbing or falling as fast as possible on the surface of predicted response. Thus, the ridge analysis can be used as a tool to help interpret an existing response surface or to indicate the direction in which further experimentation should be performed.

The default starting point, x0, has each coordinate equal to the point midway between the highest and lowest values of the factor in the design. The default radii at which the ridge is computed are 0, 0.1, ..., 0.9, 1. If the ridge analysis is based on the response surface fit to coded values for the factor variables (see the section “Coding the Factor Variables” on page 6472 for details), then this results in a ridge that starts at the point with a coded zero value for each coordinate and extends toward, but not beyond, the edge of the range of experimentation. Alternatively, both the center point of the ridge and the radii at which it is to be computed can be specified.

You can specify the following options in the RIDGE statement:

CENTER=uncoded-factor-values
   gives the coordinates of the point x0 from which to begin the ridge. The coordinates should be given in the original (uncoded) factor variable values and should be separated by commas. There must be as many coordinates specified as there are factors in the model, and the order of the coordinates must be the same as that used in the MODEL statement. This starting point should be well inside the range of experimentation.
   The default sets each coordinate equal to the value midway between the highest and lowest values for the associated factor.

MAXIMUM
MAX
   computes the ridge of maximum response. Both the MIN and MAX options can be specified; at least one must be specified.

MINIMUM
MIN
   computes the ridge of minimum response. Both the MIN and MAX options can be specified; at least one must be specified.

NOPRINT
   suppresses the display of the ridge analysis when only an output data set is required.

OUTR=SAS-data-set
   creates an output SAS data set containing the computed optimum ridge. For details, see the section “OUTR=SAS-data-set” on page 6477.

RADIUS=coded-radii
   gives the distances from the ridge starting point at which to compute the optima. The values in the list represent distances between coded points. The list can take any of the following forms or can be composed of mixtures of them:

   m1, m2, ..., mn   specifies several values.
   m TO n            specifies a sequence where m equals the starting value, n equals the ending value, and the increment equals 1.
   m TO n BY i       specifies a sequence where m equals the starting value, n equals the ending value, and i equals the increment.

   Mixtures of the preceding forms should be separated by commas. The default list runs from 0 to 1 by increments of 0.1. The following are examples of valid lists:

      radius=0 to 5 by .5;
      radius=0, .2, .25, .3, .5 to 1.0 by .1;

WEIGHT Statement

   WEIGHT variable ;

When a WEIGHT statement is specified, the weighted residual sum of squares

   $$\sum_i w_i (y_i - \hat{y}_i)^2$$

is minimized, where $w_i$ is the value of the variable specified in the WEIGHT statement, $y_i$ is the observed value of the response variable, and $\hat{y}_i$ is the predicted value of the response variable. An observation is used in the analysis only if the value of the WEIGHT variable is greater than zero. The WEIGHT statement has no effect on degrees of freedom or number of observations.
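The weighted criterion for the WEIGHT statement can be evaluated directly. This minimal Python sketch, with a hypothetical function name and made-up numbers, computes the sum of w_i (y_i − ŷ_i)² and, following the rule that only observations with weight greater than zero are used, skips any observation whose weight is zero or negative. With all weights equal to 1, it reduces to the ordinary residual sum of squares.

```python
def weighted_rss(observed, predicted, weights):
    """Sum of w_i * (y_i - yhat_i)**2 over observations with w_i > 0."""
    return sum(w * (y - yhat) ** 2
               for y, yhat, w in zip(observed, predicted, weights)
               if w > 0)


# Hypothetical observed and predicted responses.
y = [3.0, 5.0, 4.0, 6.0]
yhat = [2.5, 5.0, 5.0, 4.0]
print(weighted_rss(y, yhat, [2.0, 1.0, 1.0, 0.0]))  # last observation ignored
print(weighted_rss(y, yhat, [1.0] * 4))             # ordinary RSS
```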
If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least squares estimates are best linear unbiased estimators (BLUE).

Details: RSREG Procedure

Introduction to Response Surface Experiments

Many industrial experiments are conducted to discover which values of given factor variables optimize a response. If each factor is measured at three or more values, a quadratic response surface can be estimated by least squares regression. The predicted optimal value can be found from the estimated surface if the surface is shaped like a simple hill or valley. If the estimated surface is more complicated, or if the predicted optimum is far from the region of experimentation, then the shape of the surface can be analyzed to indicate the directions in which new experiments should be performed.

Suppose that a response variable $y$ is measured at combinations of values of two factor variables, $x_1$ and $x_2$. The quadratic response surface model for this variable is written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$$

The steps in the analysis for such data are as follows:

1. model fitting and analysis of variance, including lack-of-fit testing, to estimate parameters
2. canonical analysis to investigate the shape of the predicted response surface
3. ridge analysis to search for the region of optimum response

Model Fitting and Analysis of Variance

The first task in analyzing the response surface is to estimate the parameters of the model by least squares regression and to obtain information about the fit in the form of an analysis of variance. The estimated surface is typically curved in one of three ways. It can be a hill with its peak at the unique estimated point of maximum response, a valley, or a saddle surface with no unique minimum or maximum.
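The quadratic model above is estimated by ordinary least squares. The following plain-Python sketch illustrates that step; the helper names are hypothetical and this is not how PROC RSREG is implemented. It builds the six regressors $1, x_1, x_2, x_1^2, x_2^2, x_1 x_2$ for each design point and solves the normal equations $X'X\beta = X'y$ by Gaussian elimination. It assumes two factors, no covariates or weights, and a design that can estimate every term. The check uses noise-free responses on a 3 × 3 factorial, so the fit should recover the generating coefficients.

```python
def quadratic_row(x1, x2):
    """Regressors for y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2 + b5*x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear(m, v):
    """Solve m @ beta = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: design cannot estimate all terms")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(a[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (a[r][n] - s) / a[r][r]
    return beta


def fit_quadratic(points, y):
    """Least squares estimates of the six quadratic-model parameters."""
    rows = [quadratic_row(x1, x2) for x1, x2 in points]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical check: 3x3 factorial with noise-free responses from known betas.
true_beta = [10.0, 1.0, -2.0, -3.0, -1.5, 0.5]
points = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
y = [sum(c * t for c, t in zip(quadratic_row(*pt), true_beta)) for pt in points]
beta_hat = fit_quadratic(points, y)
print([round(b, 6) for b in beta_hat])  # should match true_beta up to rounding
```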
Use the results of this phase of the analysis to answer the following questions:

What is the contribution of each type of effect (linear, quadratic, and crossproduct) to the statistical fit? The ANOVA table with sources labeled "Regression" addresses this question.

What part of the residual error is due to lack of fit? Does the quadratic response model adequately represent the true response surface? If you specify the LACKFIT option in the MODEL statement, then the ANOVA table with sources labeled "Residual" addresses this question. See the section "Lack-of-Fit Test" on page 6470 for details.

What is the contribution of each factor variable to the statistical fit? Can the response be predicted accurately if the variable is removed? The ANOVA table with sources labeled "Factor" addresses this question.

What are the predicted responses for a grid of factor values? (See the section "Plotting the Surface" on page 6472 and the section "Searching for Multiple Response Conditions" on page 6473.)

Lack-of-Fit Test

The lack-of-fit test compares the variation around the model with pure variation within replicated observations. This measures the adequacy of the quadratic response surface model. Suppose there are $n_i$ replicated observations $Y_{i1}, \ldots, Y_{in_i}$ of the response, all at the same values $x_i$ of the factors. You can then predict the true response at $x_i$ either by using the predicted value $\hat{Y}_i$ based on the model or by using the mean $\bar{Y}_i$ of the replicated values.
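The pure-error part of this comparison can be computed directly from the replicated settings. The following plain-Python sketch uses the MBT data from Example 76.1; the function names are hypothetical. It groups observations by their factor values, sums squared deviations of each group around its own mean $\bar{Y}_i$, and forms the lack-of-fit F ratio. The Total Error sum of squares and degrees of freedom (127.842720 on 6 df) are taken from Output 76.1.2 rather than recomputed here.

```python
from collections import defaultdict

# (Time, Temp, MBT) observations from the data set in Example 76.1.
mbt_data = [
    (4.0, 250, 83.8), (20.0, 250, 81.7), (12.0, 250, 82.4), (12.0, 250, 82.9),
    (12.0, 220, 84.7), (12.0, 280, 57.9), (12.0, 250, 81.2), (6.3, 229, 81.3),
    (6.3, 271, 83.1), (17.7, 229, 85.3), (17.7, 271, 72.7), (4.0, 250, 82.0),
]


def pure_error(obs):
    """Return (SS, df) of replicates around their own group means."""
    groups = defaultdict(list)
    for *x, y in obs:
        groups[tuple(x)].append(y)
    ss, df = 0.0, 0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss += sum((y - mean) ** 2 for y in ys)
        df += len(ys) - 1  # a setting observed only once contributes nothing
    return ss, df


def lack_of_fit_f(ss_total_error, df_total_error, ss_pure, df_pure):
    """F = [(SSE - SS_pure) / (df_E - df_pure)] / [SS_pure / df_pure]."""
    ss_lof = ss_total_error - ss_pure
    df_lof = df_total_error - df_pure
    return (ss_lof / df_lof) / (ss_pure / df_pure)


ss_pe, df_pe = pure_error(mbt_data)
print(round(ss_pe, 6), df_pe)  # 3.146667 3
print(round(lack_of_fit_f(127.842720, 6, ss_pe, df_pe), 2))  # 39.63
```

Both numbers agree with the "Pure Error" row and the lack-of-fit F Value in Output 76.1.2.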
The lack-of-fit test decomposes the residual error into two components. The first is due to the variation of the replications around their mean value (the pure error). The second is due to the variation of the mean values around the model prediction (the bias error):

$$\sum_i \sum_{j=1}^{n_i} \left(Y_{ij} - \hat{Y}_i\right)^2 = \sum_i \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_i\right)^2 + \sum_i n_i \left(\bar{Y}_i - \hat{Y}_i\right)^2$$

If the model is adequate, then both components estimate the nominal level of error. However, if the bias component of error is much larger than the pure error, then this constitutes evidence that there is significant lack of fit.

If some observations in your design are replicated, you can test for lack of fit by specifying the LACKFIT option in the MODEL statement. All other tests use total error rather than pure error. If the lack of fit is significant, you might therefore want to hand-calculate those tests with respect to pure error. On the other hand, significant lack of fit indicates that the quadratic model is inadequate. If this is a problem, you can also try to refine the model, possibly by using PROC GLM for general polynomial modeling; see Chapter 39, "The GLM Procedure," for more information. Example 76.1 illustrates the use of the LACKFIT option.

Canonical Analysis

The second task in analyzing the response surface is to examine the overall shape of the curve and determine whether the estimated stationary point is a maximum, a minimum, or a saddle point. The canonical analysis can be used to answer the following questions:

Is the surface shaped like a hill, a valley, or a saddle, or is it flat?
If there is a unique optimum combination of factor values, where is it?
To which factor or factors are the predicted responses most sensitive?

The eigenvalues and eigenvectors of the matrix of second-order parameters characterize the shape of the response surface.
The eigenvectors point in the directions of principal orientation for the surface, and the signs and magnitudes of the associated eigenvalues give the shape of the surface in these directions. Positive eigenvalues indicate directions of upward curvature, and negative eigenvalues indicate directions of downward curvature. The larger an eigenvalue is in absolute value, the more pronounced is the curvature of the response surface in the associated direction. Often, all the coefficients of an eigenvector except for one are relatively small. This indicates that the vector points roughly along the axis of the factor that has the single large coefficient. In this case, the canonical analysis can be used to determine the relative sensitivity of the predicted response surface to variations in that factor. (See the section "Getting Started: RSREG Procedure" on page 6456 for an example.)

Ridge Analysis

If the estimated surface is found to have a simple optimum well within the range of experimentation, the analysis performed by the preceding two steps might be sufficient. In more complicated situations, further search for the region of optimum response is required. The method of ridge analysis computes the estimated ridge of optimum response for increasing radii from the center of the original design. The ridge analysis answers the following question:

If there is not a unique optimum of the response surface within the range of experimentation, in which direction should further searching be done in order to locate the optimum?

You can use the RIDGE statement to compute the ridge of maximum or minimum response.

Coding the Factor Variables

For the results of the canonical and ridge analyses to be interpretable, the values of different factor variables should be comparable.
This is because the canonical and ridge analyses of the response surface are not invariant with respect to differences in scale and location of the factor variables. The analysis of variance is not affected by these changes. The actual predicted surface does not change either, but its parameterization does. The usual solution to this problem is to code each factor variable so that its minimum in the experiment is -1 and its maximum is 1. The analysis is then carried through with the coded values instead of the original ones. This practice has the added benefit of making 1 a reasonable boundary radius for the ridge analysis, since 1 represents approximately the edge of the experimental region. By default, PROC RSREG computes the linear transformation to perform this coding as the data are initially read in, and the canonical and ridge analyses are performed on the model fit to the coded data. The actual form of the coding operation for each value of a variable is

$$\text{coded value} = \frac{\text{original value} - M}{S}$$

where M is the average of the highest and lowest values for the variable in the design and S is half their difference.

Missing Values

If an observation has missing data for any of the variables used by the procedure, then that observation is not used in the estimation process. Suppose one or more response variables are missing but no factor or covariate variables are missing. In that case, predicted values and confidence limits are computed for the output data set, but the residual and Cook's D statistic are missing.

Plotting the Surface

Specifying the PLOTS=SURFACE option in the PROC RSREG statement displays contour plots for all pairs of factors in the model (see Example 76.1). Specifying the PLOTS=SURFACE(3D) option displays a three-dimensional surface, as shown in Figure 76.4.
You can also generate predicted values for a grid of points with the PREDICT option (see the section "Getting Started: RSREG Procedure" on page 6456 for an example). You can then use these values to create a contour plot or a three-dimensional plot of the response surface over a two-dimensional grid. Any two factor variables can be chosen to form the grid for the plot. Several plots can be generated by using different pairs of factor variables.

Searching for Multiple Response Conditions

Suppose you have the following data with two factors and three responses, and you want to find the factor setting that produces responses in a certain region:

data a;
   input x1 x2 y1 y2 y3;
   datalines;
-1 -1 1.8 1.940 3.6398
-1 1 2.6 1.843 4.9123
1 -1 5.4 1.063 6.0128
1 1 0.7 1.639 2.3629
0 0 8.5 0.134 9.0910
0 0 3.0 0.545 3.7349
0 0 9.8 0.453 10.4412
0 0 4.1 1.117 5.0042
0 0 4.8 1.690 6.6245
0 0 5.9 1.165 6.9420
0 0 7.3 1.013 8.7442
0 0 9.3 1.179 10.2762
1.4142 0 3.9 0.945 5.0245
-1.4142 0 1.7 0.333 2.4041
0 1.4142 3.0 1.869 5.2695
0 -1.4142 5.7 0.099 5.4346
;

You want to find the values of x1 and x2 that maximize y1 subject to y2 < 2 and y3 < y2 + y1. The exact answer is not easy to obtain analytically. However, you can obtain a practically feasible solution by checking the conditions across a grid of values in the range of interest.
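The grid-checking idea can be sketched in plain Python independently of SAS. The sketch assumes that `pred_y1`, `pred_y2`, and `pred_y3` are callables that return predicted responses, for example from the fitted quadratic surfaces. The toy predictors in the usage lines are hypothetical and are not the surfaces that PROC RSREG fits to data set a. The grid is built from integer step counts so that floating-point drift does not drop the endpoint.

```python
def grid(lo, hi, step):
    """lo, lo+step, ..., hi built from integer counts to avoid float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n + 1)]


def constrained_grid_search(pred_y1, pred_y2, pred_y3,
                            lo=-2.0, hi=2.0, step=0.1, top=5):
    """Return the `top` grid points (y1, x1, x2) with the largest predicted y1
    that satisfy y2 < 2 and y3 < y2 + y1."""
    feasible = []
    for x1 in grid(lo, hi, step):
        for x2 in grid(lo, hi, step):
            y1, y2, y3 = pred_y1(x1, x2), pred_y2(x1, x2), pred_y3(x1, x2)
            if y2 < 2 and y3 < y2 + y1:
                feasible.append((y1, x1, x2))
    feasible.sort(key=lambda t: t[0], reverse=True)  # descending y1
    return feasible[:top]


# Hypothetical toy predictors, used only to exercise the search.
p1 = lambda x1, x2: 7 - (x1 - 0.3) ** 2 - (x2 + 0.5) ** 2
p2 = lambda x1, x2: 1.0
p3 = lambda x1, x2: 0.0
best = constrained_grid_search(p1, p2, p3)
print(best[0])  # (7.0, 0.3, -0.5)
```

The default grid has 41 values per factor, which matches the range -2 to 2 by 0.1 used in the DATA step that follows.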
First, append a grid of factor values to the observed data, with missing values for the responses:

data b;
   set a end=eof;
   output;
   if eof then do;
      y1=.; y2=.; y3=.;
      do x1=-2 to 2 by .1;
         do x2=-2 to 2 by .1;
            output;
         end;
      end;
   end;
run;

Next, use PROC RSREG to fit a response surface model to the data and to compute predicted values for both the observed data and the grid. The predicted values are stored in the data set c:

proc rsreg data=b out=c;
   model y1 y2 y3=x1 x2 / predict;
run;

Finally, find the subset of predicted values that satisfy the constraints, sort by the unconstrained variable, and display the top five predictions:

data d;
   set c;
   if _TYPE_='PREDICT';   /* keep only the predicted values */
   if y2<2;
   if y3<y2+y1;
run;
proc sort data=d;
   by descending y1;
run;
data d;
   set d;
   if (_n_ <= 5);
run;
proc print;
run;

The results are displayed in Figure 76.5. They indicate that the optimal values of the factors are around 0.3 for x1 and around -0.5 for x2.

Figure 76.5 Top Five Predictions

Obs   x1    x2    _TYPE_    y1        y2        y3
1     0.3   -0.5  PREDICT   6.92570   0.75784   7.60471
2     0.3   -0.6  PREDICT   6.91424   0.74174   7.54194
3     0.3   -0.4  PREDICT   6.91003   0.77870   7.64341
4     0.4   -0.6  PREDICT   6.90769   0.73357   7.51836
5     0.4   -0.5  PREDICT   6.90540   0.75135   7.56883

Suppose you are also interested in optimizing y1 and y2 simultaneously. You can specify the following statements to make a visual comparison of the two response surfaces by overlaying their contour plots:

ods graphics on;
proc rsreg data=a plots(only)=surface(overlaypairs);
   model y1 y2=x1 x2;
run;
ods graphics off;

Figure 76.6 shows that you have to make some compromises in any attempt to maximize both y1 and y2. However, you might be able to maximize y1 while minimizing y2.

Figure 76.6 Overlaid Line Contours of Predicted Responses

Handling Covariates

Covariate regressors are added to a response surface model because they are believed to account for a sizable yet relatively uninteresting portion of the variation in the data.
What the experimenter is really interested in is the response corrected for the effect of the covariates. A common example is the block effect in a block design. The canonical and ridge analyses estimate responses at hypothetical levels of the factor variables. In these analyses, the actual value of the predicted response is computed by using the average values of the covariates. The estimated response values do optimize the estimated surface of the response corrected for covariates, but true prediction of the response requires actual values for the covariates. You can use the COVAR= option in the MODEL statement to include covariates in the response surface model. Example 76.2 illustrates the use of this option.

Computational Method

Canonical Analysis

For each response variable, the model can be written in the form

$$y_i = x_i' A x_i + b' x_i + c' z_i + \epsilon_i$$

where

$y_i$ is the $i$th observation of the response variable.
$x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$ are the $k$ factor variables for the $i$th observation.
$z_i = (z_{i1}, z_{i2}, \ldots, z_{iL})'$ are the $L$ covariates, including the intercept term.
$A$ is the $k \times k$ symmetrized matrix of quadratic parameters. Its diagonal elements equal the coefficients of the pure quadratic terms in the model, and its off-diagonal elements equal half the coefficient of the corresponding crossproduct.
$b$ is the $k \times 1$ vector of linear parameters.
$c$ is the $L \times 1$ vector of covariate parameters, one of which is the intercept.
$\epsilon_i$ is the error associated with the $i$th observation. Tests performed by PROC RSREG assume that errors are independently and normally distributed with mean zero and variance $\sigma^2$.

The parameters in $A$, $b$, and $c$ are estimated by least squares. To optimize $y$ with respect to $x$, take partial derivatives, set them to zero, and solve:

$$\frac{\partial y}{\partial x} = 2x'A + b' = 0 \quad\Longrightarrow\quad x = -\frac{1}{2} A^{-1} b$$

You can determine whether the solution is a maximum or a minimum by looking at the eigenvalues of $A$:

If the eigenvalues...    then the solution is...
are all negative         a maximum
are all positive         a minimum
have mixed signs         a saddle point
contain zeros            in a flat area

Ridge Analysis

If the largest eigenvalue is positive, its eigenvector gives the direction of steepest ascent from the stationary point; if the largest eigenvalue is negative, its eigenvector gives the direction of steepest descent. The eigenvectors corresponding to small or zero eigenvalues point in directions of relative flatness.

The point on the optimum response ridge at a given radius $R$ from the ridge origin is found by optimizing

$$(x_0 + d)' A (x_0 + d) + b'(x_0 + d)$$

over $d$ satisfying $d'd = R^2$. Here $x_0$ is the $k \times 1$ vector containing the ridge origin, and $A$ and $b$ are as previously discussed. By the method of Lagrange multipliers, the optimal $d$ has the form

$$d = -(A - \mu I)^{-1}(A x_0 + 0.5\, b)$$

where $I$ is the $k \times k$ identity matrix and $\mu$ is chosen so that $d'd = R^2$. Several values of $\mu$ can satisfy this constraint, and the correct one depends on which kind of response ridge is of interest. For the ridge of maximum response, the appropriate $\mu$ is the unique one that satisfies the constraint and is greater than all the eigenvalues of $A$. Similarly, the appropriate $\mu$ for the ridge of minimum response satisfies the constraint and is less than all the eigenvalues of $A$. (See Myers and Montgomery (1995) for details.)

Output Data Sets

OUT=SAS-data-set

Specifying the OUT= option in the PROC RSREG statement creates an output data set. For each observation in the input data set, it contains the statistics requested with options in the MODEL statement. The data set contains the following variables:

the BY variables
the ID variables
the WEIGHT variable
the independent variables in the MODEL statement
the variable _TYPE_, which identifies the observation type in the output data set.
_TYPE_ is a character variable with a length of eight. It takes the values 'ACTUAL', 'PREDICT', 'RESIDUAL', 'U95M', 'L95M', 'U95', 'L95', and 'D', corresponding to the options specified.
the response variables containing special output values identified by the _TYPE_ variable

All confidence limits use the two-tailed Student's t value.

OUTR=SAS-data-set

Specifying the OUTR= option in the RIDGE statement creates an output data set containing the optimum response ridge. The data set contains the following variables:

the current values of the BY variables
a character variable _DEPVAR_ containing the name of the dependent variable
a character variable _TYPE_ identifying the type of ridge being computed, MINIMUM or MAXIMUM. If both MAXIMUM and MINIMUM are specified, the data set contains observations for the minimum ridge followed by observations for the maximum ridge.
a numeric variable _RADIUS_ giving the distance from the ridge starting point
the values of the model factors at the estimated optimum point at distance _RADIUS_ from the ridge starting point
a numeric variable _PRED_, which is the estimated expected value of the dependent variable at the optimum
a numeric variable _STDERR_, which is the standard error of the estimated expected value

Displayed Output

All estimates and hypothesis tests assume that the model is correctly specified and that the errors satisfy the classical statistical assumptions. The output displayed by PROC RSREG includes the following.

Estimation and Analysis of Variance

The actual form of the coding operation for each value of a variable is

$$\text{coded value} = \frac{1}{S}\left(\text{original value} - M\right)$$

where M is the average of the highest and lowest values for the variable in the design and S is half their difference. The Subtracted off column contains the M value for each factor variable, and the Divided by column contains S.
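The following plain-Python sketch applies the coding formula to the factor settings of Example 76.1. It is not SAS code, and the helper names are hypothetical. It computes M and S from the lowest and highest design values and codes and uncodes individual values. The resulting midpoints, 12 and 250, also form the default RIDGE starting point described in the RIDGE statement section.

```python
def coding_coefficients(values):
    """Return (M, S): the midrange and half-range of a factor's design values."""
    lo, hi = min(values), max(values)
    return (hi + lo) / 2.0, (hi - lo) / 2.0


def code(value, m, s):
    """coded value = (original value - M) / S"""
    return (value - m) / s


def uncode(coded, m, s):
    """Inverse transformation: original value = M + S * coded value."""
    return m + s * coded


# Factor settings of Time and Temp from the data set in Example 76.1.
time = [4.0, 20.0, 12.0, 12.0, 12.0, 12.0, 12.0, 6.3, 6.3, 17.7, 17.7, 4.0]
temp = [250, 250, 250, 250, 220, 280, 250, 229, 271, 229, 271, 250]

m_time, s_time = coding_coefficients(time)  # (12.0, 8.0)
m_temp, s_temp = coding_coefficients(temp)  # (250.0, 30.0)
# The design endpoints map to -1 and 1.
print(code(4.0, m_time, s_time), code(20.0, m_time, s_time))  # -1.0 1.0
```

For example, `uncode(-0.309976, m_temp, s_temp)` gives about 240.7007. This agrees, to rounding, with the uncoded Temp critical value that Output 76.1.3 reports for coded value -0.309976.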
The summary table for the response variable contains the following information.

"Response Mean" is the mean of the response variable in the sample. When a WEIGHT statement is specified, the mean $\bar{y}$ is calculated by

$$\bar{y} = \frac{\sum_i w_i y_i}{\sum_i w_i}$$

"Root MSE" estimates the standard deviation of the response variable and is calculated as the square root of the "Total Error" mean square.
The "R-Square" value is $R^2$, or the coefficient of determination. $R^2$ measures the proportion of the variation in the response that is attributed to the model rather than to random error.
The "Coefficient of Variation" is 100 times the ratio of the "Root MSE" to the "Response Mean."

A table analyzing the significance of the terms of the regression is displayed. Terms are brought into the regression in four steps: (1) the "Intercept" and any covariates in the model, (2) "Linear" terms like X1 and X2, (3) pure "Quadratic" terms like X1*X1 or X2*X2, and (4) "Crossproduct" terms like X1*X2. The table displays the following information:

the degrees of freedom in the DF column, which should be the same as the number of corresponding parameters unless one or more of the parameters are not estimable
Type I Sum of Squares, also called the sequential sums of squares, which measures the reduction in the error sum of squares as sets of terms (Linear, Quadratic, and so forth) are added to the model
R-Square, which measures the portion of total $R^2$ contributed as each set of terms (Linear, Quadratic, and so forth) is added to the model
F Value, which tests the null hypothesis that all parameters in the term are zero by using the Total Error mean square as the denominator. This is a test of a Type I hypothesis. It uses the usual F test numerator, conditional on the effects of subsequent variables not being in the model.
Pr > F, which is the significance value or probability of obtaining at least as great an F ratio given that the null hypothesis is true.
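The summary statistics above follow from a few sums. The following plain-Python sketch reproduces the Example 76.1 values in Output 76.1.1; the function names are hypothetical. The model and Total Error sums of squares and the error degrees of freedom are taken from Output 76.1.2. The sketch also assumes that the model sum of squares covers every term except the intercept, so that model SS plus error SS equals the corrected total sum of squares.

```python
import math

# MBT responses from the data set in Example 76.1.
mbt = [83.8, 81.7, 82.4, 82.9, 84.7, 57.9, 81.2, 81.3, 83.1, 85.3, 72.7, 82.0]


def response_mean(y, w=None):
    """Unweighted mean, or sum(w*y) / sum(w) when weights are supplied."""
    if w is None:
        w = [1.0] * len(y)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def fit_statistics(y, ss_model, ss_error, df_error, w=None):
    """Response Mean, Root MSE, R-Square, and Coefficient of Variation."""
    mean = response_mean(y, w)
    root_mse = math.sqrt(ss_error / df_error)   # sqrt of Total Error mean square
    r_square = ss_model / (ss_model + ss_error)  # assumes corrected total = model + error
    cv = 100.0 * root_mse / mean
    return mean, root_mse, r_square, cv


mean, root_mse, r2, cv = fit_statistics(mbt, 512.193947, 127.842720, 6)
print(round(mean, 6), round(root_mse, 6), round(r2, 4), round(cv, 4))
```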
The Sum of Squares column partitions the "Total Error" into "Lack of Fit" and "Pure Error." When "Lack of Fit" is significant, there is variation around the model other than random error (such as cubic effects of the factor variables). The "Total Error" Mean Square estimates $\sigma^2$, the error variance. F Value tests the null hypothesis that the variation is adequately described by random error.

A table containing the parameter estimates from the model is displayed.

The Estimate column contains the parameter estimates based on the uncoded values of the factor variables. If an effect is a linear combination of previous effects, the parameter for the effect is not estimable. When this happens, the degrees of freedom are zero and the parameter estimate is set to zero. Estimates and tests on the other parameters are then conditional on this parameter being zero.
The Standard Error column contains the estimated standard deviations of the parameter estimates based on uncoded data.
The t Value column contains t values of a test of the null hypothesis that the true parameter is zero when the uncoded values of the factor variables are used.
The Pr > |t| column gives the significance value or probability of a greater absolute t ratio given that the true parameter is zero.
The Parameter Estimate from Coded Data column contains the parameter estimates based on the coded values of the factor variables. These are the estimates used in the subsequent canonical and ridge analyses.

The sums of squares are partitioned by the factors in the model, and an analysis table is displayed. The test on a factor is a joint test on all the parameters involving that factor. For example, the test for the factor X1 tests the null hypothesis that the true parameters for X1, X1*X1, and X1*X2 are all zero.

Canonical Analysis

The Critical Value columns contain the values of the factor variables that correspond to the stationary point of the fitted response surface.
The critical values can be at a minimum, maximum, or saddle point.
The eigenvalues and eigenvectors are from the matrix of quadratic parameter estimates based on the coded data. They characterize the shape of the response surface.

Ridge Analysis

The Coded Radius column contains the distance from the coded version of the associated point to the coded version of the origin of the ridge. The origin is given by the point at radius zero.
The Estimated Response column contains the estimated value of the response variable at the associated point. The standard error of this estimate is also given. This quantity is useful for assessing the relative credibility of the prediction at a given radius. Typically, this standard error increases rapidly as the ridge moves up to and beyond the design perimeter. This reflects the inherent difficulty of making predictions beyond the range of experimentation.
The Uncoded Factor Values columns contain the values of the uncoded factor variables that give the optimum response at this radius from the ridge origin.

ODS Table Names

PROC RSREG assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 76.3.
For more information about ODS, see Chapter 20, "Using the Output Delivery System."

Table 76.3 ODS Tables Produced by PROC RSREG

ODS Table Name       Description                                          Statement
Coding               Coding coefficients for the independent variables    default
ErrorANOVA           Error analysis of variance                           default
FactorANOVA          Factor analysis of variance                          default
FitStatistics        Overall statistics for fit                           default
ModelANOVA           Model analysis of variance                           default
ParameterEstimates   Estimated linear parameters                          default
Ridge                Ridge analysis for optimum response                  RIDGE
Spectral             Spectral analysis                                    default
StationaryPoint      Stationary point of response surface                 default

ODS Graphics

PROC RSREG assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 76.4. To request these graphs, you must specify the ODS GRAPHICS statement in addition to the PLOTS= option and any other options indicated in Table 76.4. For more information about the ODS GRAPHICS statement, see Chapter 21, "Statistical Graphics Using ODS."

Table 76.4 ODS Graphics Produced by PROC RSREG

ODS Graph Name        Plot Description                        PLOTS= Option
FitPlot               Fit plot for 1 predictor                FIT
DiagnosticsPanel      Panel of fit diagnostics                DIAGNOSTICS
CooksDPlot            Cook's D plot                           DIAGNOSTICS(UNPACK)
ObservedByPredicted   Observed by predicted                   DIAGNOSTICS(UNPACK)
QQPlot                Residual Q-Q plot                       DIAGNOSTICS(UNPACK)
ResidualByPredicted   Residual by predicted values            DIAGNOSTICS(UNPACK)
ResidualHistogram     Residual histogram                      DIAGNOSTICS(UNPACK)
RFPlot                RF plot                                 DIAGNOSTICS(UNPACK)
RStudentByPredicted   Studentized residuals by predicted      DIAGNOSTICS(UNPACK)
RStudentByLeverage    RStudent by hat diagonals               DIAGNOSTICS(UNPACK)
ResidualPlots         Panel of residuals by predictors        RESIDUALS
                      Residuals by predictors                 RESIDUALS(UNPACK)
RidgePlots            Panel of ridge plot and factors         RIDGE (with RIDGE MAX or MIN)
                      Ridge plot                              RIDGE(UNPACK) (with RIDGE MAX or MIN)
                      Ridge factors                           RIDGE(UNPACK) (with RIDGE MAX or MIN)
Contour               Panel of contour plots                  SURFACE
                      Contour plots                           SURFACE(UNPACK)
Surface               Panel of 3D surface plots               SURFACE(3D)
                      3D surface plots                        SURFACE(3D UNPACK)
ContourOverlay        Panel of overlaid line-contour plots    SURFACE(OVERLAYPAIRS)
                      Overlaid line-contour plots             SURFACE(OVERLAYPAIRS UNPACK)

Examples: RSREG Procedure

Example 76.1: A Saddle Surface Response Using Ridge Analysis

Myers (1976) analyzes an experiment reported by Frankel (1961) aimed at maximizing the yield of mercaptobenzothiazole (MBT) by varying processing time and temperature. Myers (1976) uses a two-factor model in which the estimated surface does not have a unique optimum. A ridge analysis is used to determine the region in which the optimum lies. The objective is to find the settings of time and temperature in the processing of a chemical that maximize the yield. The following statements produce Output 76.1.1 through Output 76.1.6:

data d;
   input Time Temp MBT;
   label Time = "Reaction Time (Hours)"
         Temp = "Temperature (Degrees Centigrade)"
         MBT  = "Percent Yield Mercaptobenzothiazole";
   datalines;
4.0 250 83.8
20.0 250 81.7
12.0 250 82.4
12.0 250 82.9
12.0 220 84.7
12.0 280 57.9
12.0 250 81.2
6.3 229 81.3
6.3 271 83.1
17.7 229 85.3
17.7 271 72.7
4.0 250 82.0
;

ods graphics on;
proc rsreg data=d plots=(ridge surface);
   model MBT=Time Temp / lackfit;
   ridge max;
run;
ods graphics off;

Output 76.1.1 displays the coding coefficients for the transformation of the independent variables to lie between -1 and 1, along with some simple statistics for the response variable.
Output 76.1.1 Coding and Response Variable Information

The RSREG Procedure

Coding Coefficients for the Independent Variables

Factor   Subtracted off   Divided by
Time          12.000000     8.000000
Temp         250.000000    30.000000

Response Surface for Variable MBT: Percent Yield Mercaptobenzothiazole

Response Mean               79.916667
Root MSE                     4.615964
R-Square                       0.8003
Coefficient of Variation       5.7760

Output 76.1.2 shows that the lack of fit for the model is highly significant. Since the quadratic model does not fit the data very well, firm statements about the underlying process should not be based only on the current analysis. Note from the analysis of variance for the model that the test for the time factor is not significant. If further experimentation is undertaken, it might be best to fix Time at a moderate to high value and to concentrate on the effect of temperature. In the actual experiment discussed here, extra runs were made that confirmed the results of the following analysis.
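The two estimate columns in Output 76.1.2 are linked through the coding in Output 76.1.1. Substituting $u_j = (x_j - M_j)/S_j$ into the coded model and expanding gives the uncoded coefficients. The following plain-Python sketch does this for two factors; the function name is hypothetical, and the derivation is worked out here rather than taken from PROC RSREG. The inputs are the six-decimal coded estimates, so the results match the Estimate column only to rounding.

```python
def uncode_quadratic_2(c0, b, a11, a22, a12, m, s):
    """Convert coded-scale estimates of a two-factor quadratic to uncoded scale.

    Coded model: c0 + b[0]*u1 + b[1]*u2 + a11*u1^2 + a22*u2^2 + a12*u1*u2,
    where u_j = (x_j - m[j]) / s[j] and a12 is the crossproduct coefficient.
    """
    (m1, m2), (s1, s2) = m, s
    q11 = a11 / s1 ** 2          # uncoded pure quadratic terms
    q22 = a22 / s2 ** 2
    q12 = a12 / (s1 * s2)        # uncoded crossproduct term
    lin1 = b[0] / s1 - 2 * q11 * m1 - q12 * m2
    lin2 = b[1] / s2 - 2 * q22 * m2 - q12 * m1
    intercept = (c0 - b[0] * m1 / s1 - b[1] * m2 / s2
                 + q11 * m1 ** 2 + q22 * m2 ** 2 + q12 * m1 * m2)
    return {"Intercept": intercept, "Time": lin1, "Temp": lin2,
            "Time*Time": q11, "Temp*Temp": q22, "Temp*Time": q12}


# Coded estimates (Output 76.1.2) and coding coefficients (Output 76.1.1).
est = uncode_quadratic_2(82.173110, (-1.014287, -8.676768),
                         1.384394, -8.852519, -7.218045,
                         m=(12.0, 250.0), s=(8.0, 30.0))
for name, value in est.items():
    print(f"{name:10s} {value:12.6f}")
```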
Output 76.1.2 Analyses of Variance

Regression      DF   Type I Sum of Squares   R-Square   F Value   Pr > F
Linear           2              313.585803     0.4899      7.36   0.0243
Quadratic        2              146.768144     0.2293      3.44   0.1009
Crossproduct     1               51.840000     0.0810      2.43   0.1698
Total Model      5              512.193947     0.8003      4.81   0.0410

Residual        DF   Sum of Squares   Mean Square   F Value   Pr > F
Lack of Fit      3       124.696053     41.565351     39.63   0.0065
Pure Error       3         3.146667      1.048889
Total Error      6       127.842720     21.307120

Parameter   DF      Estimate   Standard Error   t Value   Pr > |t|   Parameter Estimate from Coded Data
Intercept    1   -545.867976       277.145373     -1.97     0.0964    82.173110
Time         1      6.872863         5.004928      1.37     0.2188    -1.014287
Temp         1      4.989743         2.165839      2.30     0.0608    -8.676768
Time*Time    1      0.021631         0.056784      0.38     0.7164     1.384394
Temp*Time    1     -0.030075         0.019281     -1.56     0.1698    -7.218045
Temp*Temp    1     -0.009836         0.004304     -2.29     0.0623    -8.852519

Output 76.1.2 continued

Factor   DF   Sum of Squares   Mean Square   F Value   Pr > F   Label
Time      3        61.290957     20.430319      0.96   0.4704   Reaction Time (Hours)
Temp      3       461.250925    153.750308      7.22   0.0205   Temperature (Degrees Centigrade)

The canonical analysis (Output 76.1.3) indicates that the predicted response surface is shaped like a saddle. The eigenvalue of 2.5 shows that the valley orientation of the saddle is less curved than the hill orientation, which has an eigenvalue of -9.99. The coefficients of the associated eigenvectors show that the valley is more aligned with Time and the hill with Temp. Because the canonical analysis resulted in a saddle point, the estimated surface does not have a unique optimum.

Output 76.1.3 Canonical Analysis

               Critical Value
Factor       Coded       Uncoded   Label
Time     -0.441758      8.465935   Reaction Time (Hours)
Temp     -0.309976    240.700718   Temperature (Degrees Centigrade)

Predicted value at stationary point: 83.741940

                    Eigenvectors
Eigenvalues        Time        Temp
  2.528816     0.953223   -0.302267
 -9.996940     0.302267    0.953223

Stationary point is a saddle point.
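Both the canonical analysis in Output 76.1.3 and the ridge in Output 76.1.4 can be approximately reproduced from the coded estimates in Output 76.1.2. The following plain-Python sketch applies the formulas from the section "Computational Method." It computes the eigenvalues of $A$, the stationary point $-\frac{1}{2}A^{-1}b$, and the classification. It then finds the maximum-ridge point $d = -(A - \mu I)^{-1}(Ax_0 + 0.5b)$ with $\mu$ above the largest eigenvalue. Several assumptions apply. The sketch handles two factors only, with no covariates, and assumes distinct eigenvalues. Using bisection to find $\mu$ is a choice made here, not necessarily what PROC RSREG does, and the eigenvector sign convention was picked to match the displayed output. The sketch does not compute standard errors.

```python
import math

# Coded-data estimates from Output 76.1.2 (factor 1 = Time, factor 2 = Temp).
C0 = 82.173110
B = (-1.014287, -8.676768)          # linear coefficients b
A = ((1.384394, -7.218045 / 2),     # symmetrized quadratic matrix A:
     (-7.218045 / 2, -8.852519))    # off-diagonals are half the crossproduct


def predict(x):
    """Coded-scale prediction c0 + b'x + x'Ax."""
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return C0 + B[0] * x[0] + B[1] * x[1] + quad


def eigen_sym2(a):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2x2 matrix.
    Signs are arbitrary; the largest-magnitude component is made positive."""
    p, q, r = a[0][0], a[0][1], a[1][1]
    mid, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    vals = (mid + rad, mid - rad)
    vecs = []
    for lam in vals:
        if abs(q) > 1e-12:
            v = (q, lam - p)                  # solves (p - lam) v1 + q v2 = 0
        elif abs(lam - p) <= abs(lam - r):
            v = (1.0, 0.0)                    # diagonal matrix: coordinate axes
        else:
            v = (0.0, 1.0)
        n = math.hypot(*v)
        sign = 1.0 if max(v, key=abs) > 0 else -1.0
        vecs.append((sign * v[0] / n, sign * v[1] / n))
    return vals, vecs


def stationary_point():
    """x_s = -(1/2) A^{-1} b, using the closed-form 2x2 inverse."""
    (p, q), (_, r) = A
    det = p * r - q * q
    return (-0.5 * (r * B[0] - q * B[1]) / det,
            -0.5 * (-q * B[0] + p * B[1]) / det)


def classify(vals, tol=1e-8):
    """Eigenvalue table from the Computational Method section."""
    if any(abs(v) < tol for v in vals):
        return "flat area"
    if all(v < 0 for v in vals):
        return "maximum"
    if all(v > 0 for v in vals):
        return "minimum"
    return "saddle point"


def ridge_max_point(radius, x0=(0.0, 0.0)):
    """Maximum-response ridge point at coded distance `radius` from x0."""
    if radius == 0:
        return x0
    (p, q), (_, r) = A
    g = (p * x0[0] + q * x0[1] + 0.5 * B[0], q * x0[0] + r * x0[1] + 0.5 * B[1])
    lam_max = eigen_sym2(A)[0][0]

    def step(mu):
        pm, rm = p - mu, r - mu               # diagonal of A - mu*I
        det = pm * rm - q * q
        return (-(rm * g[0] - q * g[1]) / det, -(-q * g[0] + pm * g[1]) / det)

    lo, hi = lam_max + 1e-12, lam_max + math.hypot(*g) / radius + 1.0
    for _ in range(200):                      # |d(mu)| shrinks as mu grows
        mid = (lo + hi) / 2
        if math.hypot(*step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    d = step((lo + hi) / 2)
    return (x0[0] + d[0], x0[1] + d[1])


vals, vecs = eigen_sym2(A)
xs = stationary_point()
print(vals, vecs, classify(vals), xs, predict(xs))
M, S = (12.0, 250.0), (8.0, 30.0)             # coding coefficients, Output 76.1.1
for k in range(11):
    x = ridge_max_point(k / 10)
    print(k / 10, round(predict(x), 4), [round(m + s * u, 3) for m, s, u in zip(M, S, x)])
```

The printed ridge should track the Estimated Response and Uncoded Factor Values columns of Output 76.1.4 to within rounding of the input estimates.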
However, the ridge analysis in Output 76.1.4 and the ridge plot in Output 76.1.5 indicate that maximum yields result from relatively high reaction times and low temperatures. A contour plot of the predicted response surface, shown in Output 76.1.6, confirms this conclusion.

Output 76.1.4 Ridge Analysis

Estimated Ridge of Maximum Response for Variable MBT: Percent Yield Mercaptobenzothiazole

Coded    Estimated    Standard    Uncoded Factor Values
Radius   Response     Error       Time         Temp
0.0      82.173110    2.665023    12.000000    250.000000
0.1      82.952909    2.648671    11.964493    247.002956
0.2      83.558260    2.602270    12.142790    244.023941
0.3      84.037098    2.533296    12.704153    241.396084
0.4      84.470454    2.457836    13.517555    239.435227
0.5      84.914099    2.404616    14.370977    237.919138
0.6      85.390012    2.410981    15.212247    236.624811
0.7      85.906767    2.516619    16.037822    235.449230
0.8      86.468277    2.752355    16.850813    234.344204
0.9      87.076587    3.130961    17.654321    233.284652
1.0      87.732874    3.648568    18.450682    232.256238

Output 76.1.5 Ridge Plot of Predicted Response Surface

Output 76.1.6 Contour Plot of Predicted Response Surface

Example 76.2: Response Surface Analysis with Covariates

One way of viewing covariates is as extra sources of variation in the dependent variable that can mask the variation due to primary factors. This example demonstrates the use of the COVAR= option in PROC RSREG to fit a response surface model to the dependent variable corrected for the covariates.

You have a chemical process with a yield that you hypothesize to be dependent on three factors: reaction time, reaction temperature, and reaction pressure. You perform an experiment to measure this dependence. You are willing to include up to 20 runs in your experiment, but you can perform no more than 8 runs on the same day, so the design for the experiment is composed of three blocks.
Additionally, you know that the grade of raw material for the reaction has a significant impact on the yield. You have no control over this, but you keep track of it. The following statements create a SAS data set containing the results of the experiment:

data Experiment;
   input Day Grade Time Temp Pressure Yield;
   datalines;
1 67 -1 -1 -1 32.98
1 68 -1 1 1 47.04
1 70 1 -1 1 67.11
1 66 1 1 -1 26.94
1 74 0 0 0 103.22
1 68 0 0 0 42.94
2 75 -1 -1 1 122.93
2 69 -1 1 -1 62.97
2 70 1 -1 -1 72.96
2 71 1 1 1 94.93
2 72 0 0 0 93.11
2 74 0 0 0 112.97
3 69 1.633 0 0 78.88
3 67 -1.633 0 0 52.53
3 68 0 1.633 0 68.96
3 71 0 -1.633 0 92.56
3 70 0 0 1.633 88.99
3 72 0 0 -1.633 102.50
3 70 0 0 0 82.84
3 72 0 0 0 103.12
;

Your first analysis neglects to take the covariates into account. The following statements use PROC RSREG to fit a response surface to the observed yield; note that Day and Grade are omitted:

proc rsreg data=Experiment;
   model Yield = Time Temp Pressure;
run;

The ANOVA results shown in Output 76.2.1 indicate that no process variable effects are significantly larger than the background noise.

Output 76.2.1 Analysis of Variance Ignoring Covariates

The RSREG Procedure

Regression      DF   Type I Sum of Squares   R-Square   F Value   Pr > F
Linear           3             1880.842426     0.1353      0.67   0.5915
Quadratic        3             2370.438681     0.1706      0.84   0.5023
Crossproduct     3              241.873250     0.0174      0.09   0.9663
Total Model      9             4493.154356     0.3233      0.53   0.8226

Residual        DF   Sum of Squares   Mean Square
Total Error     10      9405.129724    940.512972

However, when the yields are adjusted for the covariate effects of day and grade of raw material, very strong process variable effects are revealed. The following statements produce the ANOVA results in Output 76.2.2. To include the effects of the classification factor Day as covariates, you need to create a separate dummy variable for each day.
   data Experiment;
      set Experiment;
      d1 = (Day = 1);
      d2 = (Day = 2);
      d3 = (Day = 3);
   run;

   ods graphics on;

   proc rsreg data=Experiment plots=all;
      model Yield = d1-d3 Grade Time Temp Pressure / covar=4;
   run;

   ods graphics off;

The results show very strong effects due to both the covariates and the process variables.

Output 76.2.2 Analysis of Variance Including Covariates

The RSREG Procedure

Regression     DF   Type I Sum of Squares   R-Square   F Value   Pr > F
Covariates      3   13695                     0.9854    316957   <.0001
Linear          3   156.524497                0.0113   3622.53   <.0001
Quadratic       3   22.989775                 0.0017    532.06   <.0001
Crossproduct    3   23.403614                 0.0017    541.64   <.0001
Total Model    12   13898                     1.0000   80413.2   <.0001

Residual       DF   Sum of Squares   Mean Square
Total Error     7   0.100820         0.014403

The number of observations in the data set might be too small for the diagnostic plots in Output 76.2.3 to dependably identify problems; however, some outliers are indicated. The residual plots in Output 76.2.4 do not display any obvious structure.

Output 76.2.3 Fit Diagnostics

Output 76.2.4 Residual Plots

References

Box, G. E. P. (1954), “The Exploration and Exploitation of Response Surfaces: Some General Considerations,” Biometrics, 10, 16.
Box, G. E. P. and Draper, N. R. (1982), “Measures of Lack of Fit for Response Surface Designs and Predictor Variable Transformations,” Technometrics, 24, 1–8.
Box, G. E. P. and Draper, N. R. (1987), Empirical Model Building and Response Surfaces, New York: John Wiley & Sons.
Box, G. E. P. and Hunter, J. S. (1957), “Multifactor Experimental Designs for Exploring Response Surfaces,” Annals of Mathematical Statistics, 28, 195–242.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John Wiley & Sons.
Box, G. E. P. and Wilson, K. J. (1951), “On the Experimental Attainment of Optimum Conditions,” Journal of the Royal Statistical Society.
Cleveland, W. S. (1993), Visualizing Data, Summit, NJ: Hobart Press.
Cochran, W. G. and Cox, G. M. (1957), Experimental Designs, Second Edition, New York: John Wiley & Sons.
Draper, N. R. (1963), “Ridge Analysis of Response Surfaces,” Technometrics, 5, 469–479.
Draper, N. R. and John, J. A. (1988), “Response Surface Designs for Quantitative and Qualitative Variables,” Technometrics, 30, 423–428.
Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, Second Edition, New York: John Wiley & Sons.
Frankel, S. A. (1961), “Statistical Design of Experiments for Process Development of MBT,” Rubber Age, 89, 453.
John, P. W. M. (1971), Statistical Design and Analysis of Experiments, New York: Macmillan.
Mead, R. and Pike, D. J. (1975), “A Review of Response Surface Methodology from a Biometric Point of View,” Biometrics, 31, 803.
Meyer, D. C. (1963), “Response Surface Methodology in Education and Psychology,” Journal of Experimental Education, 31, 329.
Myers, R. H. (1976), Response Surface Methodology, Blacksburg: Virginia Polytechnic Institute and State University.
Myers, R. H. and Montgomery, D. C. (1995), Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: John Wiley & Sons.
Rawlings, J. O., Pantula, S. G., and Dickey, D. A. (1998), Applied Regression Analysis: A Research Tool, Springer Texts in Statistics, Second Edition, New York: Springer-Verlag.
Schneider, A. M. and Stockett, A. L. (1963), “An Experiment to Select Optimum Operating Conditions on the Basis of Arbitrary Preference Ratings,” Chemical Engineering Progress Symposium Series.
