YASAI User Manual

YASAIw.xla – A modified version of an open-source add-in for Excel to provide additional functions for Monte Carlo simulation
By Greg Pelletier, Department of Ecology, P.O. Box 47710, Olympia, WA 98504-7710 (gpel461@ecy.wa.gov)
June 2009
Introduction
YASAI is a freely available, open-source add-in for Microsoft Excel that was developed at Rutgers University primarily to teach Monte Carlo simulation at the university level (Eckstein and Riedmueller, 2001). The basic functionality and theory of YASAI are presented in detail by Eckstein et al. (2000), Eckstein and Riedmueller (2002), and Eckstein and Riedmueller (2002). We modified the original version 2.0 of YASAI (from the Rutgers Web site at http://www.yasai.rutgers.edu/) by adding the new functions and features described in this document. The modified version is called YASAIw version 2.0w. Some of the documentation provided by Eckstein and Riedmueller (2002) is repeated here to provide context for the new functions in YASAIw. All of the original functions of YASAI are preserved in the new version. The following new features have been added in YASAIw:
• Additional distributions for specifying random variables.
• Functions for generating correlated random variables.
• Sensitivity analysis to estimate the correlation between each forecast variable and each assumed input variable, and the contribution of each assumption variable to the variance of each forecast variable.
• The ability to call user-defined VBA macro subroutines during each iteration of the simulation. This allows the user to construct more complex models for predicting the forecast variables using VBA.
• The ability to use YASAIw functions in any or all of the worksheets in the workbook used for the simulation.
[Table: example cumulative frequency distribution (CFD), with cumulative frequencies from 0.05 to 1 and values from 1490 to 2275]
=GENLIMITNORMAL(m, s, min, max, optional method): Returns a value from an underlying normal distribution with mean m and standard deviation s that is bounded by min and max. The optional ‘method’ argument selects one of the following two calculation methods:
• Method 0 (or missing) samples from the underlying distribution until a sampled value falls within the bounds of min and max. This method changes the percentiles relative to the underlying distribution: the result follows the density of the underlying distribution conditioned on being within the truncation bounds.
• Method 1 samples from the unbounded distribution and replaces any value above max with max and any value below min with min. This method preserves the percentiles of the unbounded distribution within the bounds of min and max.
=GENLIMITLOGNORMAL(m, s, min, max, optional method): Returns a value from an underlying lognormal distribution with mean m and standard deviation s that is bounded by min and max. The optional ‘method’ argument is the same as described for GENLIMITNORMAL.
=GENBETAPERT(min, mode, max, optional weight): Returns a value from a PERT-beta distribution with a minimum (min), most likely value (mode), and maximum (max). The optional weight parameter is a weighting factor used to estimate the mean value (weight = 4 is assumed if weight is not specified). As weight increases, the estimated mean approaches the most likely value (mode). The PERT-beta distribution is useful for modeling expert opinion about a variable that has known or estimated bounds (min and max) and a “most likely value” that is known or can be estimated. It is a smoother alternative to the triangular distribution.
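The two GENLIMITNORMAL truncation methods, and the way GENBETAPERT maps its arguments onto a standard beta distribution (using the relationships given in the following paragraph), can be sketched in Python as follows. This is a minimal illustration that uses Python's random module as the random source. The names limit_normal, pert_alphas and pert_sample are hypothetical and are not part of YASAIw.

```python
import random


def limit_normal(m, s, lo, hi, method=0, rng=random):
    """Hypothetical sketch of the two GENLIMITNORMAL truncation methods."""
    if method == 0:
        # Method 0: rejection sampling. Redraw until the value lies in [lo, hi].
        # This gives the normal density conditioned on the bounds.
        # (It can loop for a long time if [lo, hi] lies far out in a tail.)
        while True:
            x = rng.gauss(m, s)
            if lo <= x <= hi:
                return x
    # Method 1: draw once and clamp to the nearest bound, so percentiles
    # inside [lo, hi] match those of the unbounded normal.
    return min(max(rng.gauss(m, s), lo), hi)


def pert_alphas(lo, mode, hi, weight=4.0):
    """Beta shape parameters (alpha1, alpha2) from the PERT relationships."""
    mu = (lo + weight * mode + hi) / (weight + 2)
    if abs(2 * mode - lo - hi) <= 1e-12 * (hi - lo):
        # Symmetric case: the alpha1 expression becomes 0/0 here.
        # Its limit is 1 + weight/2.
        a1 = 1 + weight / 2
    else:
        a1 = ((mu - lo) * (2 * mode - lo - hi)) / ((mode - mu) * (hi - lo))
    a2 = a1 * (hi - mu) / (mu - lo)
    return a1, a2


def pert_sample(lo, mode, hi, weight=4.0, rng=random):
    """Draw min + (max - min) * Beta(alpha1, alpha2)."""
    a1, a2 = pert_alphas(lo, mode, hi, weight)
    return lo + (hi - lo) * rng.betavariate(a1, a2)
```

Substituting µ into the relationships shows that α1 simplifies to 1 + weight*(mode - min)/(max - min) and α2 to 1 + weight*(max - mode)/(max - min). For example, pert_alphas(0, 2, 10) gives µ = 3, α1 = 1.8 and α2 = 4.2. The guard above only handles the 0/0 form that the unsimplified α1 expression takes when the mode is the midpoint.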
The PERT-beta distribution βp(min, mode, max, weight) is related to the standard beta distribution β(α1, α2) through the following relationships (EPA, 2007; Vose, 2009):
$$\beta_p(\min, \text{mode}, \max, \text{weight}) = \min + (\max - \min)\,\beta(\alpha_1, \alpha_2)$$
$$\alpha_1 = \frac{(\mu - \min)(2\,\text{mode} - \min - \max)}{(\text{mode} - \mu)(\max - \min)}$$
$$\alpha_2 = \frac{\alpha_1 (\max - \mu)}{\mu - \min}$$
$$\mu = \frac{\min + \text{weight} \cdot \text{mode} + \max}{\text{weight} + 2}$$
The mean (µ) of the PERT-beta distribution is assumed to follow the last equation above, and it is used to estimate the appropriate shape parameters (α1 and α2) of the corresponding beta distribution. Unlike the mean of the triangular distribution, the mean of the PERT-beta distribution is more strongly influenced by the most likely value (mode), to a degree that depends on the weight parameter. The weight parameter also controls the spread of the distribution around the most likely value: larger weights give a narrower spread. The default weight = 4 follows the standard PERT network assumption that the best estimate of the duration of a task is (min + 4*mode + max)/6 (Vose, 2009).
Correlated variables
YASAIw generates two sequences of random numbers from standardized normal distributions with a given linear correlation in two steps:
1. Generate two sequences of uncorrelated standardized normal variables (e.g. X1 and X2).
2. Define a new sequence of correlated standardized normal variables Y1, where r is the correlation coefficient between Y1 and X1:
$$Y_1 = r X_1 + \sqrt{1 - r^2}\,X_2$$
YASAIw generates two sequences of random numbers from uniform distributions with a given linear correlation in three steps:
1. Generate two sequences of uncorrelated standardized normal variables (e.g. X1 and X2).
2. Define a new sequence of correlated standardized normal variables Y1:
$$Y_1 = r_{adj} X_1 + \sqrt{1 - r_{adj}^2}\,X_2, \qquad r_{adj} = 2\sin\left(\frac{\pi}{6}\,r\right)$$
Here r is the assumed linear correlation between the uniform variables that are used to sample from the nonparametric distribution. radj is the linear correlation between Y1 and X1 that is obtained by adjusting r. The new sequence Y1 is a standard normal variable with linear correlation radj with X1. Note that r is also the Spearman rank correlation between the standardized normals, as well as being the linear correlation between the uniform variables.
3. Calculate the new sequence of linearly correlated uniform variables (0 to 1) using Excel’s NORMSDIST function. This function returns the probability (0 to 1) that a standardized normal variable is less than or equal to Y1.
YASAIw uses correlated standard normal variables to generate correlated normal or lognormal variables. It uses correlated uniform variables to generate correlated nonparametric variables with specified CFDs.
Nine functions are provided for generating correlated random variables from either normal or lognormal distributions:
=GENZ(): Generates an independent standardized normal variable.
=GENNORMALX(z1, m, s): Generates an independent normal variable with mean m and standard deviation s, using z1, an independent standardized normal variable.
=GENNORMALY(z1, z2, r, m, s): Generates a normal variable with mean m and standard deviation s that is correlated with the independent variable by correlation r, using z1 and z2, which are independent standardized normal variables.
=GENLOGNORMALX(z1, m, s): Generates an independent lognormal variable with mean m and standard deviation s, using z1, an independent standardized normal variable.
=GENLOGNORMALY(z1, z2, r, m, s): Generates a lognormal variable with mean m and standard deviation s that is correlated with the independent variable by correlation r, using z1 and z2, which are independent standardized normal variables.
=GENLIMITNORMALX(z1, m, s, min, max): Generates a truncated independent normal variable with mean m and standard deviation s, using z1, an independent standardized normal variable. min and max are the minimum and maximum values of the truncated distribution. The truncation method is method 1 as described above.
=GENLIMITNORMALY(z1, z2, r, m, s, min, max): Generates a truncated normal variable with mean m and standard deviation s that is correlated with the independent variable by correlation r, using z1 and z2, which are independent standardized normal variables. min and max are the minimum and maximum values of the truncated distribution. The truncation method is method 1 as described above.
=GENLIMITLOGNORMALX(z1, m, s, min, max): Generates a truncated independent lognormal variable with mean m and standard deviation s, using z1, an independent standardized normal variable. min and max are the minimum and maximum values of the truncated distribution. The truncation method is method 1 as described above.
=GENLIMITLOGNORMALY(z1, z2, r, m, s, min, max): Generates a truncated lognormal variable with mean m and standard deviation s that is correlated with the independent variable by correlation r, using z1 and z2, which are independent standardized normal variables. min and max are the minimum and maximum values of the truncated distribution. The truncation method is method 1 as described above.
Two additional functions are provided for generating correlated random variables from nonparametric cumulative frequency distributions (CFDs):
=GENCFDX(range, z1): Returns an independent X value that is randomly selected from a cumulative frequency distribution (CFD). The CFD is specified in a contiguous two-column range of worksheet cells (as described above for GENCFD). The second argument refers to a worksheet cell containing a GENZ function. This standardized normal variable is used to derive the uniform value that is used to sample from the CFD.
=GENCFDY(range, z1, z2, r): Returns a Y value that is rank-correlated with the X value and is randomly selected from a CFD. The CFD is specified in a contiguous two-column range of worksheet cells (as described above for GENCFD). The second and third arguments (z1 and z2) refer to worksheet cells containing independent GENZ functions. These are used to compute a correlated standardized normal variable, from which a linearly correlated uniform value is derived to sample from the CFD. The fourth argument (r) is the linear correlation between the uniform variables that are used to derive the independent and dependent X and Y values from the CFDs. r is also the Spearman rank correlation between the standardized normals and between the resulting nonparametric random variables.
The following instructions describe how to generate normal random values of a dependent variable that are correlated with an independent variable, for use as assumptions in Monte Carlo simulation with YASAIw (Figure 1):
1. Use =GENZ() to generate uncorrelated standardized normal variables in worksheet cells for the independent and the dependent variables (e.g. cells C4 and C5 in the example worksheet shown in Figure 1).
2. Enter the mean "m" and standard deviation "s" of the independent variable in worksheet cells (e.g. cells E4 and F4 in the example worksheet).
3. Enter the correlation coefficient "r", mean "m", and standard deviation "s" of the dependent variable in worksheet cells (e.g. cells D5, E5, and F5 in the example worksheet). Here r^2 represents the fraction of the variance of the dependent variable that is explained by the variance of the independent variable, m is the mean of the dependent variable, and s is its standard deviation.
4. Use =GENNORMALX(z1, m, s) to generate normal random values of the independent variable (e.g. cell G4 in the example worksheet). z1 references the worksheet cell in which =GENZ() generates the standardized normal variable for the independent variable. For lognormal variables, use =GENLOGNORMALX(z1, m, s) (e.g. cell K4).
5. Use =GENNORMALY(z1, z2, r, m, s) to generate normal random values of the dependent variable (e.g. cell G5 in the example worksheet). The arguments are as follows:
• z1 references the worksheet cell in which =GENZ() generates the standardized normal variable for the independent variable.
• z2 references the =GENZ() cell for the dependent variable.
• r is the correlation coefficient between the independent and dependent variables.
• m is the mean of the dependent variable, and s is its standard deviation.
For lognormal variables, use =GENLOGNORMALY(z1, z2, r, m, s) (e.g. cell K5).
The independent and dependent variables may also come from different parametric or nonparametric distributions. For example, the independent variable may follow a nonparametric distribution (GENCFDX) and the dependent variable a normal distribution (GENNORMALY), or vice versa (GENNORMALX and GENCFDY). In these cases of mixed distribution types, the correlation between the independent and dependent variables refers to Spearman’s rank correlation.
Figure 1. Example worksheet using correlated variables.
Sensitivity analysis
Sensitivity analysis produces a worksheet containing Spearman’s rank correlation coefficient for each forecast/assumption pair among the SIMOUTPUT variables. In practice, it is most useful to define SIMOUTPUT variables as either “echoes of the assumptions” or “forecasts”. This is done with a new optional third argument of the SIMOUTPUT function (1 = echo of assumption, 2 = forecast). “Assumptions” are the cells that contain functions for generating random variables that are used as inputs to the model (e.g. GENNORMAL or other functions that specify random variables). The “echoes of the assumptions” are cells containing SIMOUTPUT functions that take the assumptions as their argument. These save the assumption values for the sensitivity analysis, or for viewing their cumulative frequency distributions. The “forecast” cells are the model calculations that depend on the assumptions, and they contain SIMOUTPUT functions that use the model equations as their argument.
In the example shown in Figure 1, the input variables X and Y are the model assumptions contained in cells G4 and G5, respectively. They are echoed as SIMOUTPUT assumptions by entering the following functions in cells H4 and H5, respectively. Note that the optional third argument is 1, which specifies these SIMOUTPUT functions as echoes of assumptions:
=SIMOUTPUT(G4, "X", 1)
=SIMOUTPUT(G5, "Y", 1)
A model typically has several assumption variables, and any of them can be echoed in the same way. The model result that depends on these assumptions is a “forecast”. For example, suppose the model calculation in cell G7 is the sum of X in cell G4 and Y in cell G5. The model result is then specified as a SIMOUTPUT forecast in cell G7 as follows, with the optional third argument set to 2 to mark it as a forecast:
=SIMOUTPUT(G4+G5, "X+Y", 2)
If the optional third argument is omitted (or is a value other than 1 or 2), the SIMOUTPUT function is not included in the sensitivity analysis.
The sensitivity analysis presents Spearman’s rank correlation coefficient between each forecast and each assumption. It also presents the “contribution to variance” that each assumption provides for each forecast.
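The sensitivity statistics can be illustrated for an example like Figure 1 (assumptions X and Y, forecast X+Y) with a minimal Python sketch. It makes several assumptions:
• GENNORMALX and GENNORMALY return m + s*z, where z is built with the standardized-normal construction from the Correlated variables section.
• Tied values receive their average rank.
• Contribution to variance is each assumption's squared rank correlation divided by the sum of squared rank correlations for that forecast (Vose, 2009).
The means, standard deviations and correlation are hypothetical, not the values in Figure 1. The function names are illustrative only and do not come from YASAIw.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    # Spearman's rank correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def contribution_to_variance(forecast, assumptions):
    # Squared rank correlation of each assumption, normalized by their sum.
    rho = {name: spearman(vals, forecast) for name, vals in assumptions.items()}
    total = sum(r * r for r in rho.values())
    return {name: r * r / total for name, r in rho.items()}


# Hypothetical inputs: X ~ N(10, 2), Y ~ N(5, 1), correlation r = 0.5.
rng = random.Random(2009)
r = 0.5
xs, ys = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # two independent GENZ() draws
    xs.append(10 + 2 * z1)                      # assumed GENNORMALX(z1, 10, 2)
    y1 = r * z1 + math.sqrt(1 - r * r) * z2     # correlated standard normal
    ys.append(5 + 1 * y1)                       # assumed GENNORMALY(z1, z2, r, 5, 1)
forecast = [x + y for x, y in zip(xs, ys)]      # like SIMOUTPUT(G4+G5, "X+Y", 2)

cv = contribution_to_variance(forecast, {"X": xs, "Y": ys})
```

With these inputs X has the larger spread, so its rank correlation with the forecast is higher and it receives roughly 62% of the contribution to variance. Because Spearman's coefficient depends only on ranks, the same calculation applies to assumptions and forecasts from any distribution.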
Contribution to variance is estimated by summing the squared rank correlation coefficients for each forecast, and then taking the ratio of the squared rank correlation coefficient of each assumption to that sum (Vose, 2009).
Calling user-defined macro VBA subroutines during each iteration
YASAIw can optionally call user-defined VBA macro subroutines during each iteration of the simulation. This allows the user to construct more complex models for predicting the forecast variables using VBA. For example, randomly generated assumption variables may be used as inputs to models that are executed by user-defined VBA subroutines or macros. Macros can be run before the workbook is recalculated, after it is recalculated, or both.
The YASAIw functions for specifying random variables generate a new set of random values each time the workbook is recalculated. During each iteration, YASAIw forces a full calculation of all of the worksheets in the active workbook. In some cases, the user may want to run a macro before the workbook is recalculated, to provide outputs that will be used to generate the new set of variables. In other cases, the user may want to run a macro after the workbook is recalculated. This uses the new set of assumption variables as input to a complex model whose output will be used as forecasts or predictions.
To run macros during the simulation, the user must include subroutines with one or both of the following names in a VBA module of the active workbook:
• Public Sub YASAIwBeforeRecalc() – if a subroutine with this name is present in a VBA module of the active workbook, it is run during each iteration before YASAIw recalculates the workbook.
• Public Sub YASAIwAfterRecalc() – if a subroutine with this name is present in a VBA module of the active workbook, it is run during each iteration after YASAIw recalculates the workbook.
As an example of the use of user-defined macros, suppose that the user has a subroutine named “RunQ2K” that runs a complex model. It must be run during each iteration after YASAIw recalculates the workbook, so that the model uses the new set of assumptions as inputs. The following subroutine would need to be present in a VBA module of the active workbook:
    Public Sub YASAIwAfterRecalc()
        Call RunQ2K
    End Sub
The following VBA statements should generally not be used in any user-defined VBA subroutines or functions that are executed during the simulation. They cause the YASAIw functions in the worksheets to generate a new set of random variables, which produces incorrect sensitivity analysis results:
    ActiveSheet.Calculate
    Application.Calculate
    Application.CalculateFull
    Application.CalculateFullRebuild
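A simplified Python model of the iteration order described in this section shows why a recalculation inside a macro corrupts the sensitivity analysis. The sketch is hypothetical:
• A dictionary stands in for the workbook.
• recalc() stands in for the full recalculation that redraws every random input.
• run_model and run_model_then_recalc are illustrative stand-ins for user macros.
• Outputs are assumed to be recorded after YASAIwAfterRecalc returns.
It is not YASAIw's implementation.

```python
import random


def run_simulation(n_iter, before_recalc=None, after_recalc=None, seed=1):
    """Hypothetical model of a YASAIw-style iteration loop (not YASAIw code)."""
    rng = random.Random(seed)
    sheet = {}

    def recalc():
        # Full recalculation: every random-input cell gets a new draw.
        sheet["X"] = rng.gauss(10.0, 2.0)

    records = []
    for _ in range(n_iter):
        if before_recalc is not None:
            before_recalc(sheet, recalc)   # plays the role of YASAIwBeforeRecalc
        recalc()
        if after_recalc is not None:
            after_recalc(sheet, recalc)    # plays the role of YASAIwAfterRecalc
        # Assumption: the assumption echo and the macro-written forecast are
        # recorded together after the after-recalc hook returns.
        records.append((sheet["X"], sheet.get("forecast")))
    return records


def run_model(sheet, recalc):
    # Stand-in for a user model that writes its result into a worksheet cell.
    sheet["forecast"] = 2.0 * sheet["X"]


def run_model_then_recalc(sheet, recalc):
    run_model(sheet, recalc)
    recalc()  # analogous to calling Application.Calculate inside the macro


good = run_simulation(200, after_recalc=run_model)
bad = run_simulation(200, after_recalc=run_model_then_recalc)
```

In the first run, every recorded forecast is computed from the recorded assumption value. In the second run, each recorded assumption comes from a later draw than the one the model used. The rank correlations between assumptions and forecasts would therefore be computed from mismatched pairs.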
Saving results of user-defined macro VBA subroutines during each iteration
To tabulate output statistics for a value that a user-defined VBA subroutine or macro writes to a worksheet cell, enter the following formula somewhere in a worksheet:
=VBAOUTPUT(x, name, optional sensitivity)
The arguments are as follows:
• x is the address of the cell where the output of the VBA subroutine is located.
• name specifies what the output should be called.
• The optional sensitivity argument, if used, is set to either 1 (use the output as an assumption for sensitivity analysis) or 2 (use the output as a forecast for sensitivity analysis).
The VBAOUTPUT function always returns the value in cell x. When working interactively with a spreadsheet, it does nothing more. However, when a simulation is running, VBAOUTPUT records each value it sees for later analysis.
For example, suppose that a user-defined VBA subroutine runs a water quality model of a river and writes the predicted dissolved oxygen concentration at river mile 3.5 into cell B5. To name that variable “DO at RM 3.5” and use it as a forecast variable for the sensitivity analysis, the following formula would be entered in a worksheet cell:
=VBAOUTPUT(B5, "DO at RM 3.5", 2)
It is possible to save outputs from user-defined subroutines using the SIMOUTPUT function, but this should be avoided because it produces incorrect sensitivity analysis results. The user should also avoid placing any YASAIw functions in cells that the user-defined subroutines use for output, because the contents of these cells are replaced with output values during each iteration.
Use of multiple worksheets in the workbook for the simulation model
The original version of YASAI required all of the functions for the simulation model to be on a single worksheet, and the simulation had to be run from that worksheet.
In YASAIw we have changed this, so the user can place YASAIw functions for the simulation model in any cells of any worksheets in the workbook. The simulation can also be started from any worksheet. This allows the user to use any or all of the worksheets in the workbook for the simulation model. We recommend that each workbook be used for only one simulation model with YASAIw.
Running the simulation
To run a simulation, select “YASAI Simulation” from Excel’s Add-ins menu. This brings up YASAIw’s main dialog box, shown in Figure 2.
Figure 2. YASAIw’s main dialog box.
The dialog box allows the user to specify the number of scenarios and the number of sample recalculations per scenario. Ordinarily, nothing else needs to be entered. There is, however, an option to specify a fixed random number seed. This is important when trying to reproduce simulation behavior exactly, and useful for model debugging. If the random seed field is left blank, YASAIw constructs a seed from the system clock. A final option, turned on by default in accordance with standard practice, causes the same random seed value to be used for all scenarios. Two other check boxes are also present in the main dialog box, and both are activated by default:
• “Write output of all iterations” produces a worksheet called “Iterations Output” that lists the value of every SIMOUTPUT function for every iteration.
• “Run sensitivity analysis” produces a worksheet called “Sensitivity Output” with Spearman’s rank correlation coefficient for each forecast/assumption pair among the SIMOUTPUT or VBAOUTPUT variables, as described above. The sensitivity analysis can take a long time (e.g. many hours) for models with many forecast/assumption pairs and many iterations. This checkbox therefore lets the user turn off the sensitivity analysis for faster run times when it is not needed.
Clicking the “Simulate” button in the dialog box starts the simulation. YASAIw recalculates the spreadsheet NS times, where N is the sample size and S is the number of scenarios. It stores the x arguments of all SIMOUTPUT and VBAOUTPUT functions, organized by output name and scenario number. During this process, YASAIw shows a moving progress bar, and the user may abort the simulation by clicking an “Abort” button. When all NS recalculations are complete, YASAIw produces at least two and up to four worksheets with output reports. The output reports become new worksheets of the current workbook, as follows:
• Simulation Output – this worksheet contains information for each output name/scenario number pair. For each such pair, the report contains the mean, standard deviation, minimum, maximum, and selected percentiles at 1% intervals.
• CFD Output – this worksheet contains the cumulative frequency distributions (CFDs) for each output name/scenario number pair. For each such pair, the report contains the mean, standard deviation, minimum, maximum, and the CFD at 1% intervals.
• Iterations Output – this optional worksheet contains the values of all of the SIMOUTPUT variables for each iteration. It is written if this option is selected in the YASAIw main dialog box.
•