SUGI 30 Tutorials

Paper 237-30

HOLY MACRO! An Intuitive Approach to Understanding the Macro Facility

Michael J. Molter, Howard M Proskin and Associates, Rochester, NY

ABSTRACT

The macro facility allows us to write macro code that instructs SAS® how to generate the more familiar open code found in DATA steps and PROCs. With such a tool, we can consolidate families of related programs, execute the same piece of code repetitively, and develop applications for others to use. This replaces error-prone, high-maintenance jobs such as saving multiple versions of essentially the same program (with slight differences depending on the task) or excessive typing or copying and pasting. The best part is that most of the code is similar to what we find in open code. The key to understanding how the macro processor works is in understanding the differences, both in the code itself and in the way the two types of code are processed. This theme will be emphasized throughout the paper.

Discussion begins with macro variables, beginning with initialization and referencing, followed by other subjects such as functions, quoting, and combinations. Macros are then presented, beginning with creation and referencing. Examples begin with “open-code” macros followed by a discussion of macro logic and examples of its use with open code in “mixed-code” macros. Finally, I will discuss different ways macros are written and used, depending on who they are written for.

This paper is perfect for programmers who are comfortable writing in BASE SAS and are ready to take the next step toward automating their programs. It is applicable in any operating system.

INTRODUCTION

In your first algebra class one of the first exercises you learn is to evaluate an algebraic expression for a given value of the variable. For example, evaluate the expression 3x+2 when x=5. The idea is to take the given value of x, plug it into the expression and see what you get.
At first, the way this exercise is expressed just seems like a fancy way of saying “triple 5 and then add 2”. It is only when you start plugging in several different values of x and start understanding what a variable is, that you start to appreciate the value of this type of exercise. The macro facility is not much different. Imagine replacing the constants 3 and 2 in the algebraic expression with valid SAS code. The expression 3x+2 has now been replaced with a SAS program, but what does the x from the algebraic expression translate into? This is where the macro facility comes in. Now imagine that x can play a role in the program similar to that in the algebraic expression, except instead of a range of numbers constituting the valid values, imagine that different pieces of valid SAS code constitute valid values. By valid, I mean anything that when substituted into the “variable” generates syntactically correct SAS code. So numerical values are valid if the “variable” is in a part of the program where substitution of a number makes sense. Other times, substitution of a single word, a statement, or even groups of DATA steps and PROCs may be valid, depending on the reference. The analogy is now complete. The constants 3 and 2 correspond to “open” SAS code. The variable x in the algebraic expression, a stand-in for a range of numbers, now corresponds to a macro facility reference, a stand-in for SAS code. The exercise of replacing the variable with the given value and evaluating the expression corresponds to actual execution of the program. You see x and replace it with 5. The SAS system sees a macro facility reference, calls on the macro facility, generates the code, and executes the program. Sounds simple, right? A SAS program laden with macro facility references is like an algebraic expression laden with variables. The process of replacing those macro facility references is the same as replacing the algebraic variables with a given value. 
On the surface, the two exercises are fundamentally the same. Below the surface, two parts of this exercise, referencing and definition, begin to illustrate why code-generation is a non-trivial task.

Referencing refers to the way you invoke or call the macro facility. In an algebraic expression, you use letters to represent variables because they are easily distinguishable from the numbers. However, letters and numbers can be found in open SAS code. A macro facility reference represented only by a letter, a number, or even combinations of the two would not be distinguishable from open code. For that reason, though references to the macro facility contain letters or combinations of letters and numbers, when used in a program, they must be preceded by special characters known only to the macro facility. Not only that, but combinations of macro facility references and even combinations of these special characters can be used to provide specific instructions to the macro facility on how to substitute. Since a reference is replaced with SAS code, it can be used in any part of a program where the substitution results in valid SAS syntax.

Definition refers to the way you tell the SAS system which value is to be substituted. In the algebraic expression, the part of the exercise that specified “when x=5” told you which value in the range of x to substitute for x. But how do you tell the SAS system which value to substitute for the reference, or in other words, what code to generate? The answer is not much different from the way you always communicate with the SAS system. We tell the SAS system what data sets to create, what output to produce, and what analyses to run through open code. Ironically, you tell the macro facility how to generate code through more code: macro code. Without macro code, almost all SAS code revolves around a set of data. If you are not creating a data set with a DATA step, you are using one in a PROC.
The exception to this rule is the set of global statements such as TITLE, ODS, and LIBNAME, which help set up the environment. Just as the choice of 5 for x was made independent of the algebraic expression, defining the way code is generated is generally independent of the rest of the program and any data. For that reason, these definitions tend to exist outside of DATA steps and PROCs. Up to this point, I have used the term “macro facility reference” as a generic term to mean a call to the macro facility for a substitution of code, but gave no indication as to how the macro facility knows what to substitute. I am now ready to be more specific. The macro facility gives us two ways to define through macro code how open code is generated. One way is by storing the code itself in memory. Macro variables are units of memory where you can store text. Part of initializing (or defining) a macro variable involves naming it. Once this is done, you can generate within open code the stored text by referring to its name preceded by an ampersand, the special character for macro variables, in the place where the substitution is to take place. The second way is by storing logic that produces the code. A collection of such logic is called a macro. Again, part of defining a macro involves naming it. After that, just as with macro variables, you can refer to it in code by name, but preceded by a percent sign rather than an ampersand. In both cases, before the DATA step or PROC in which a reference is found can be processed, the substitutions must be made. The good news is that macro code is similar to DATA step code. Both follow a similar syntax that includes semicolons at the end of statements. Both have statements that allow you to direct messages to the log, create and initialize new variables, and define windows. Both contain IF-THEN logic to execute such statements conditionally, as well as DO-END blocks to execute several of them conditionally. 
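As a minimal sketch of the two kinds of definition and reference just described, consider the following. The names THRESHOLD and SHOWCUT are hypothetical, and the SASHELP.CLASS sample data set is used only as an assumption so that the code can run anywhere; none of these come from an actual application.

```sas
/* Define a macro variable: the text 60 is stored under the name THRESHOLD. */
%let threshold = 60;

/* Define a macro: stored logic that generates a TITLE statement and a PROC step. */
%macro showcut;
   title "Students at least &threshold inches tall";
   proc print data=sashelp.class;
      where height >= &threshold;
   run;
%mend showcut;

/* Reference the macro by name, preceded by a percent sign. */
%showcut
```

The ampersand reference appears inside the macro here, but it could just as well appear directly in open code. In either place the stored text replaces the reference before the step is processed.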
The data set contains variables that reflect variability within a set of data, the reason you analyze data. The macro facility contains macro variables that reflect coding variability within a program, which reflects the changing circumstances that dictate what code to execute. In the DATA step, it is often the value of a data set variable that determines the result of a statement and ultimately the observation that is written to the new data set. Similarly, it is the value of a macro variable that determines the code that is generated. In addition, data set variable functions provide you with information about the value of a variable on any given observation while macro variable functions provide you with similar information about the value of a macro variable. Finally, the DATA step has iterative DO blocks that allow you to process a set of DATA step statements with each iteration of a data set variable, while similar DO blocks process macro statements and generate code with each iteration of a macro variable. It is convenient that the means to the end for creating a data set is similar to that for creating code. Because of that, this paper will attempt to help you develop an understanding of the macro facility by drawing on your knowledge of the DATA step. In order to fully develop this understanding, you must learn where the similarities stop and the differences begin. Data step code is geared toward generating observations for a new data set, and macro code is geared toward generating open code. Because the code found in a DATA step applies to all observations being created, it must be processed iteratively or once per data record being read. Because macro code defines how to generate code, it does not have to be processed iteratively. Additionally, because macro code generates open code and open code is what is needed to execute a DATA step or a PROC, macro facility references must be defined and resolved prior to execution of the DATA step or PROC. 
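The difference in processing can be sketched with a small step that mixes both kinds of code. The macro variable MINAGE and its value are assumptions made for illustration, and SASHELP.CLASS again stands in for real data.

```sas
%let minage = 14;   /* assumed threshold, stored as text */

data _null_;
   set sashelp.class;
   /* DATA step statement: executes once for every observation read. */
   if age >= &minage then put name= age=;
   /* Macro statement: handled once by the macro facility while the step is being read. */
   %put The filter above was generated with the value &minage;
run;
```

The message written by %PUT should appear in the log a single time, before any observation is processed, while the PUT statement writes a line for each qualifying observation. By the time the step is compiled, the reference in the IF statement has already been replaced by 14.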
As the title suggests, this paper is intended to guide you through the workings of the macro facility using your intuition. It is assumed that you have a solid knowledge of what is available in the DATA step. We have discussed some of the similarities between macro code and DATA step code as well as differences. Throughout this paper we will continue to draw upon those similarities while continuing to emphasize the differences. In the end, your view of SAS code as a series of DATA steps, PROCs, and a few scattered global statements will be expanded. This is not intended to be a fully comprehensive instruction on the entire macro facility. Rather, the focus will be split between macro variables and macros. Other topics such as system options or macro statements will only be discussed within these contexts. The next section will discuss why you need code-generation capabilities and how they work. Following that, we will get into macro variable specifics. We will start with features and properties of macro variables, and then move into detailed discussions on definition and referencing. Comparisons with data set variables will be used throughout. Following that will be a section on macros, again detailing properties, features, definition and referencing. Comparisons to DATA step logic will be used throughout.

WHY DO YOU NEED CODE GENERATORS?

One advantage of macro facility references is that they allow you to broaden the scope of your programs by accepting and controlling changing circumstances in an organized way. Often you find that while the general purpose of a program is useful under many circumstances, each of those circumstances has slightly different criteria, which require slightly different code. But how do you change that code?
If the program is small enough, the number of places in the program that require change is small enough, and the number of users with write-access to make these changes is under control, then it may be safe to scan through the program, either manually or by searching, and make the changes. This may mean replacing old code with new, adding new code and commenting out the old, or making changes and saving the program under a different name. Generally speaking, this is not a good practice, for several reasons. Programs are often too long, and many times the task is spread across multiple programs. Also, the primary user of the program, who has the responsibility of implementing the new criteria, is often not the person who wrote the program, and may not be someone who should be changing code.

The macro facility allows you to accommodate multiple circumstances without having to make changes to the core of the program. This is achieved by dividing a program or an application into three distinct pieces. One piece accepts user input, usually through initialization of macro variables. A second piece, if necessary, is macro code that defines how user input generates code. The third piece is the core code. Rather than writing code in the core that is subject to change, the programmer replaces each such piece of code with a reference to the appropriate definition. Upon execution, when this reference is reached, code is generated according to the definition being referenced. With this setup, the input piece is designed for straightforward specification of the criteria by users and drives code generation throughout the program.

In our first example, an application is designed to provide statistical summaries of all employees whose salary is at a chosen threshold or higher. Because the database contains a data set for each department, several DATA steps must be written to extract data for all employees whose salary qualifies, each filtering on the chosen threshold.
In addition, TITLE statements are issued that each states the threshold. Rather than asking users to find all places where this number is to be part of the code, you create a macro variable in the beginning of the program where they can enter a value. Each filter in each DATA step as well as each TITLE statement will now contain a reference to this macro variable, and upon execution, that code will be generated in these spots. This example is characterized by open code containing references to a macro variable whose value is directly set by the user. Though any user could scan through the program and insert their choice of threshold, this use of macro facility references offers the advantage of making changes in one place instead of many, reducing the likelihood of overlooking substitutions. In this case the generated code represents a data set variable value, but this does not have to be the case. This approach can be useful whenever a programmer feels that the code to be generated can be specified by the user. For example, if a group of users is familiar with variables in a data set, the programmer can create a user-defined macro variable to be referenced in a KEEP statement. If users are familiar with the statistics keywords that the MEANS procedure recognizes, then a user-defined macro variable populated with these can be referenced in PROC MEANS. If users are familiar with the directory structure, they can specify the directory path where they want data sets to be saved in a macro variable to be referenced in a LIBNAME statement. For our second example, we return to the situation in example 1. Because the macro variable was to be used as a filtering value of a numerical data set variable, users were instructed to supply only the number, without other characters such as dollar signs and commas. You now realize that you do want these formatted values in TITLE statements. One solution is to ask users to initialize two macro variables. 
The first is the same as in the first example, the unformatted value, and the second is the formatted value. For example, the macro variable NBR1 is initialized at 30000, and the macro variable NBR2 is initialized at $30,000. A second option is to ask users for one of these and, through macro functions and/or macro logic, derive the other. Something to consider when choosing one of these options is that the more you rely on user input, the higher the likelihood of error. If you allow users to initialize both, they may forget to specify one, in which case the titles do not reflect the actual summaries generated. Inconsistency is also a possibility if, for example, commas or dollar signs are sometimes neglected. In instances where the input is not numerical, inconsistencies in spelling or case can also occur. Generally speaking, deriving values from minimal user input generates more consistent code than relying on users to supply every value.

While code such as data set variable names or values or data set names may be reasonable for users to supply as part of the criteria specification, sometimes the code to be generated is too long or too technical to expect users to specify. In such cases, you define for users meaningful keys to use for input. You then use these keys to create links between the input and the code to be generated. This can be accomplished using a combination of macros, macro functions, and other macro variables.

For our third example, consider an application that creates daily, weekly, monthly, or annual reports based on user input, but the layout, and therefore the TABLE statement in the TABULATE procedure, is different for each type. Of course, not only is the text used to construct a TABLE statement too complicated to ask a user to specify, but there is also no meaningful relationship between such text and the type of report it is supposed to represent.
For that reason, you create a macro variable in the beginning of the program where users enter a value such as D, W, M, or A (for Daily, Weekly, Monthly, and Annually respectively), only this time, instead of generating code in the core of the program from that value, you generate code based on macro logic that links these keys with their respective text. The advantage is obvious for users who are not programmers and do not know how to construct a TABLE statement, but even the world’s most knowledgeable experts on PROC TABULATE draw benefits. Every possible TABLE statement only has to be thought about and written once. Once these are part of the macro definition, every subsequent call of the macro consistently generates the defined code.

Why use D, W, M, and A instead of asking users to spell out the words? There is no technical reason why you cannot use the words for input, but keep in mind that this is more vulnerable to differences in spelling and casing. For example, if your macro code constructs the TABLE statement based on user input values of “Annual”, “Weekly”, “Monthly”, and “Daily”, then nothing will be generated for users that specify “Annually”, “Weeky”, “DAILY”, or “monthly”. In such a case, extra caution should be taken when specifying in your macro logic the conditions to be met to generate the TABLE statement. For example, you might consider executing based on the UPCASE of the first character of the input. This would catch all misspellings that at least started with the correct character, and would also erase any concerns regarding casing.

Sometimes you may decide to create input keys for the user, not because the generated code is too technical or too long, but simply to create consistency in output. Consider example 4 with the same situation as example 3, but now you do allow users to spell out the words rather than supply the first letter.
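The key-driven generation from example 3, including the first-character check that tolerates spelled-out input, might be sketched as follows. Everything specific here is an assumption made for illustration: the macro name TABLESTMT, the input macro variable RPTTYPE, the data set WORK.SALES, its variables, and the four TABLE statements themselves.

```sas
/* User input: D, W, M, or A, or a spelled-out word starting with one of them. */
%let rpttype = weekly;

/* Hypothetical macro that links each key to its TABLE statement. */
%macro tablestmt;
   %local key;
   /* Upper-case first character of the input, so "weekly" and "Weeky" both give W. */
   %let key = %upcase(%substr(&rpttype,1,1));

   %if &key = D %then %do;
      table region, saledate * amount * sum;
   %end;
   %else %if &key = W %then %do;
      table region, saleweek * amount * sum;
   %end;
   %else %if &key = M %then %do;
      table region, salemonth * amount * sum;
   %end;
   %else %if &key = A %then %do;
      table region, saleyear * amount * sum;
   %end;
   %else %do;
      %put WARNING: Report type &rpttype was not recognized, so no TABLE statement was generated.;
   %end;
%mend tablestmt;

/* Core code: the macro reference stands where the TABLE statement belongs. */
proc tabulate data=work.sales;
   class region saledate saleweek salemonth saleyear;
   var amount;
   %tablestmt
run;
```

Each TABLE statement is written once, inside the macro, and the core PROC TABULATE step never changes. Note that this sketch does not guard against an empty RPTTYPE, which would cause %SUBSTR to fail.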
As discussed, proper care has been taken to ensure that the TABLE statement will be generated, even from different spellings and casings of the input. Unfortunately, it is discovered that the inconsistencies in the input have led to inconsistencies in the titles on these reports, generated from TITLE statements containing references to the macro variables used for input. For example, some daily reports contain the text “Daily reports” in their title, some contain “DAILY reports”, and some contain “daily reports”. To create consistency across reports as well as within the title, you derive a macro variable from the input with the spelling and the casing you require for all reports.

The examples in this section thus far have involved code generation in relatively small sizes, but there is no limit on the amount of code the macro facility can generate. As the scope of an application grows to handle a wider variety of different tasks, it may be entire sections of code consisting of multiple DATA steps and PROCs that change.

For our fifth example, consider an application that creates a series of reports for several divisions of a company. The reports are to look the same regardless of which division they represent, but each division stores its data in a very different way. One division stores all of its data in a SAS database in which each data set differs from the others according to the year it represents. A second division stores all of its data in an Oracle database, and tables differ from each other not by time, but by the class of parameters they represent. For example, one table represents demographic information while another represents sales information. A third division uses a DB2 database and also separates its tables by parameter groups, although not the same groups as the second division. Additionally, variables in different databases that appear to represent the same parameter may use different values from one database to the next (e.g.
“F” and “M” represent GENDER in one database whereas “1” and “2” are used in another). Not only should reports from different divisions look the same but they should also represent the same data. For example, a report by age should calculate age consistently across the divisions. The general purpose of the application is to extract data from the appropriate database based on the division supplied by the user. Following extraction is whatever code is needed to transform data from that database into a common set of variables with common definitions. After this, a common set of code will create the reports regardless of the data source.

Conceptually, this example is not much different from the last two examples. The programmer provides keys for users to indicate a division, and macro code is used to link them with generated code. The difference between this example and example 3 is the size of the task that is subject to change, which reflects the amount of code necessary to generate. In example 3, all that differed between the types of reports was the layout of the table. From a programming perspective, all that differed was one statement. In example 5, it is the entire preparation of the data that differs between divisions. With division 1 you can use a DATA step to extract, but with divisions 2 and 3 you need to run SQL native to the environment. After that, each division will require its own code to create the common variables based on variable names, values, and definitions used in the database from which it came.

Through many examples, we have now seen that code that is likely to change from one execution of a program to the next based on different criteria and circumstances is a good candidate for being generated according to a previously defined specification or algorithm. Such code can represent values of a data set variable, names of data set variables or data sets, entire statements or even sections of code.
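The smallest of these cases, the salary threshold from examples 1 and 2, can be sketched in a few lines. NBR1 and NBR2 are the macro variable names used earlier; the libref HR, the data set names, and the choice of %SYSFUNC with PUTN and the DOLLAR10. format to derive the formatted value are assumptions made for illustration, not necessarily how the application was written.

```sas
/* Input piece: the only line users edit (an unformatted number). */
%let nbr1 = 30000;

/* Derived piece: build the formatted value from NBR1 instead of asking for it. */
/* With this input, NBR2 is expected to hold $30,000.                           */
%let nbr2 = %sysfunc(putn(&nbr1,dollar10.));

/* Core code: one of the per-department extracts. HR.SALES is hypothetical. */
data work.sales_high;
   set hr.sales;
   where salary >= &nbr1;
run;

/* Double quotes are required for the reference in the title to be resolved. */
title "Employees earning &nbr2 or more";
proc means data=work.sales_high n mean min max;
   var salary;
run;
```

Because NBR2 is derived rather than entered, the title and the filter cannot disagree, which is the consistency argument made for the second option in example 2.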
Sometimes, it is more than just the content of the code that needs to be generated, but also the quantity. Sections of code that contain repeated patterns can be generated by defining the pattern in a macro. For example 6, we return to example 1 where we extracted from multiple data sets those whose salaries exceeded a threshold. Suppose you have fifty data sets from which you need to extract. One way to write this part of the program would be to write the DATA step, copy it, and paste it forty-nine times. After that, you have to change the data set name in each SET statement as well as each DATA statement. On the other hand, if you can use a predictable text pattern to name each of the fifty data sets being read and another pattern to name each of the fifty being created, then you can use the iterations of a macro DO loop to define a general pattern for this DATA step. Calling the macro will generate the DATA step as many times as the loop iterates. The macro variable references present in the DATA step that represent the number of the current iteration will be replaced by that number as the macro executes. This way, any spec changes can be made in the pattern definition rather than making the change in all fifty DATA steps.

Of course you can generate repeating patterns of any amount of code. In the current example, we have generated a DATA step fifty times. In order to generate summaries based on all employees, we need to concatenate all fifty data sets that were just created. For this there is no need to generate fifty DATA steps but we do need to generate the names of all fifty data sets in the SET statement. Again, we use the iterations of a macro DO loop to define the pattern of data set names to be listed in the SET statement. More details on macro DO loops will be provided in the Macros section.

We generate code for two main reasons.
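Before turning to those reasons, the loop-driven generation just described for example 6 can be sketched as follows. The naming patterns DEPT1 through DEPT50 and HIGH1 through HIGH50, the macro name EXTRACT, and its parameter N are all hypothetical; the paper only assumes that some predictable pattern exists.

```sas
%let nbr1 = 30000;   /* salary threshold from example 1 */

%macro extract(n=50);
   %local i;

   /* Generate one DATA step per iteration: high1 from dept1, high2 from dept2, ... */
   %do i = 1 %to &n;
      data high&i;
         set dept&i;
         where salary >= &nbr1;
      run;
   %end;

   /* Generate a single DATA step whose SET statement lists all N data sets. */
   data allhigh;
      set
         %do i = 1 %to &n;
            high&i
         %end;
      ;   /* this semicolon ends the generated SET statement */
   run;
%mend extract;

%extract(n=50)
```

The first loop generates whole steps, while the second generates only a list of names inside one statement. In both, a change to the pattern is made once, in the macro definition.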
One is to accommodate regularly changing criteria in a way that preserves the general purpose of the application while leaving the core code untouched. A second is that the necessary code contains repeated patterns. By supplying the macro facility with the block of code that needs repeating and a definition of the pattern, you can save yourself the time of either typing or copying and pasting all the necessary occurrences of the block and manually changing whatever differs between them.

Sometimes both of the reasons are applicable. In such cases, a predictable pattern may exist but the number of iterations may vary from one execution to the next based on user input. For our seventh and final example of this section, suppose a summary of a data set variable called VAR1 is to be created in which users choose the statistics to be computed by listing them in a macro variable assignment, each separated by spaces. For each statistic requested, the resulting variable in the output data set will be named by the name of the keyword that represents the statistic preceded by the text VAR1 (e.g. max=VAR1max), specified in the OUTPUT statement. Because you have given users the flexibility to specify as many statistics as they want, you have no upper bound for an iterative DO loop and so cannot include any set number of variable naming specifications of the form stat=VAR1stat in the OUTPUT statement. However, just as the DATA step has DO-WHILE and DO-UNTIL loops to handle these kinds of situations, the macro facility has corresponding functionality. With the help of %SCAN (see Macro Functions section later) to parse the input macro variable into the individual statistics, the code to request all of the statistics can be generated. Alternatively, macro variables can be created to represent each of the statistics requested, as well as one that counts them. This last macro variable can then be used as an upper bound for an iterative DO loop to generate the code.

HOW DOES IT WORK?
At some point prior to referencing a macro or a macro variable, it has to be defined. Standard practice is to define macros outside of steps since they are usually independent of any data. References to macros defined within steps will be resolved as long as they follow the definition. However, in the DATA step, if a definition follows a reference, the reference will not be resolved for ANY of the iterations of the DATA step. In this paper, all discussions on macros will assume that the macro has been defined outside any step. All macros are defined by the percent sign followed by the string MACRO. Macro variables are also generally defined outside of steps (we will see exceptions to this rule in the Macro Variables section), either within macros (in which case their use may be limited by the environment in which they were defined) or in open code (in which case their use is unlimited). Details on defining macros and macro variables will be discussed later in their respective sections.

Almost every key on your keyboard, when found in a SAS program, is accepted as literal text by the SAS system. Alone they may not mean much, but certain combinations of them have special meaning within the SAS syntax. For example, the combination of the letters D-A-T-A in the beginning of a program or following a semicolon tells the SAS system that the following text will name a new data set. The combination of the letters K-E-E-P following a semicolon tells the SAS system before execution the list of variables to keep in the data set it is about to create and to make room for them in the program data vector. The two keys NOT always accepted as literals are the ampersand and the percent sign.
When the SAS system sees either of these two characters followed immediately by text allowable in naming macro variables and macros (mostly letters, underscores and numerals, though numerals cannot begin the name of a macro variable or macro), then the combination is not part of the code to be compiled or executed. Rather, it is a call to the macro facility to generate code in its place. The exceptions to this are %INCLUDE, %LIST, and %RUN which are not executed with the macro facility.

You know that the steps (DATA steps and PROCs) that make up a program are executed sequentially, but once inside any given step, compile-time statements are processed before execution-time statements. For example, before creating observations, the SAS system has to survey the situation or look over the code in order to set up a program data vector. If no macro facility references are present, then the SAS system can proceed through compiling, executing and moving to the next step. On the other hand, once the SAS system reaches a step that does contain macro facility references, before executing or even compiling any code, the SAS system has to know what the code is! Therefore when an ampersand or a percent sign is seen, the macro facility is called in to generate code.

When the character is an ampersand, the macro facility looks through its list of macro variables for one named by the string of characters that follow the ampersand. When the character is a percent sign, a few possibilities exist. One possibility is that the string of characters that follows the percent sign matches that of a macro function. If an argument of a function contains a reference to a macro variable, the list of macro variables is consulted to resolve it. Either way, the macro facility evaluates the function after macro variables have been resolved, and generates the result at the reference.
A second possibility is that if the string of characters does not match that of any keywords for definitions or functions, but is allowable for naming macros, then it must represent a macro. Again, the macro facility is consulted to generate the code. Of course these calls to the macro facility do not have to be found within steps. When a macro contains entire steps, it can be found on its own, independent of any step. Also, global statements such as TITLE and LIBNAME can contain macro variables and functions of them. MACRO VARIABLES With a conceptual idea of what a macro variable is, we are now ready to dive into the details. We will start by observing some of the high-level similarities and differences between macro variables and data set variables. For starters, they are both variables. That means that both of them are representations of variability whose values can change with circumstances. Second, you can work with each type of variability with SAS code. With this code, you can create new variables whose values are either literals or based on other variable values. In the case of the latter, you have text functions as well as mathematical functions. You can also create both kinds of variables with iterative DO loops and use their values as conditions on which to base further action. Where they start to differ is in what they represent and what their ultimate goal is. While a macro variable represents variability in potential code, the data set variable represents variability among observations of a data set. Because of its tie to the data set, a data set variable must be created in the DATA step, making use of the iterative process to define values for every record of data being read and observation being created. As a result of this, the value of a data set variable value is in memory only as long as the corresponding iteration is processed. With the help of RETAIN it stays in memory longer, but it is gone after the DATA step is completed. 
Macro variables are generally independent of data, and so do not need to be created within a DATA step. This means that their values are remembered not only across iterations of a DATA step, but also outside of DATA steps and across steps of a program until explicitly changed.

References to data set variables in a DATA step are references to the value of the variable for the observation under consideration. In a PROC, these references pertain to the variability of the data set being analyzed. Macro variable references are references to strings of text, possibly to be used as code.

References to data set variables and their variability can only be made in areas of steps where the syntax allows. For example, in several procedures, you obtain descriptive statistics on a data set variable by referencing it in the VAR statement. You separate analyses according to the values of a data set variable by referencing it in a BY statement. You create a data set variable by referencing it immediately after a semicolon and following it with an equal sign. If a string of text is the same as that of a data set variable name and it is found anywhere else, such as within quotation marks of a FOOTNOTE statement or following DATA in a DATA statement, it is not a reference to the variable. However, since a macro variable is nothing more than a string of text, its definition presumes nothing about how it will be used. For that reason, a macro variable reference can be found anywhere in code, as long as resolution of it leads to syntactically correct code. For example, a reference to a macro variable might be found in a DATA statement to name a data set and in a CLASS statement in a subsequent PROC TABULATE if its value happens to coincide with the name of a variable found in the data set being analyzed, as is the case in the following piece of code.
    data &sample ;
       /* DATA step code */
       keep &sample /* other variables */ ;
    run;
    title "Average &sample by department for 2004" ;
    proc tabulate data=&sample ;
       class &sample ;
       /* other statements */
    run;

Here the macro variable SAMPLE serves two purposes. First, its value matches the string of text used for a data set variable name. For that reason, it is reasonable to find it wherever you might find data set variable references, such as the KEEP statement in the DATA step and the CLASS statement of the PROC. Second, the same text string, that is, the same value of the macro variable, is being used to name a data set. With this example we see the value of a macro variable being used across multiple steps of a program, and used in different ways. In all cases, the effect is the same: code generation. On the other hand, the use of a text string inside a TITLE statement or to name a data set that happens to match the name of a data set variable is in no way a reference to the variability represented by that variable.

Data set variables are the object of our analysis. We also use them for other reasons while creating data sets, such as filtering, creating other data set variables, iterative DO loops, etc., but in the end, it is the variability within a group of data that drives our analyses and the decisions we make based on them. Though by definition, macro variables vary in their values too, we usually are not interested in analyzing them. Their greatest contribution is in helping us create code. To the extent that they share characteristics, the SAS system provides us with functionality for macro variables similar to that of their counterparts. In these cases, we can use what we know about data set variables to help us learn more about using macro variables.
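The skeleton above can be filled in to give something that actually runs. The following is only a sketch under stated assumptions: it uses the SASHELP.CLASS sample data set that ships with SAS, picks HEIGHT as a hypothetical value for SAMPLE, and swaps PROC TABULATE for PROC MEANS to keep the example short. None of these choices come from the original program.

```sas
/* Hypothetical value for illustration: HEIGHT is a variable in SASHELP.CLASS */
%let sample = height;

data &sample;                    /* generates: data height;                  */
   set sashelp.class;
   keep &sample sex;             /* generates: keep height sex;              */
run;

title "Average &sample by sex";  /* double quotes let &sample resolve        */
proc means data=&sample mean;    /* generates: proc means data=height mean;  */
   class sex;
   var &sample;                  /* generates: var height;                   */
run;
title;                           /* clear the title for later steps          */
```

The same value is used once as a data set name and twice as a variable name, and once as plain text in a title. Changing the single %LET statement changes all four places.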
PROPERTIES

Because macro variables are text strings by definition and primarily used for code substitution, they do not have the attributes that data set variables have such as length, formats, informats, and most notably, type. Unlike data set variables that can be numeric or character, macro variables are always character.

Just as data set variables can be defined as literals or in terms of other data set variables, macro variables can be defined as literals or in terms of other macro variables. The difference is in how literals are distinguished from variable references. With data set variables, literals are marked by quotation marks. You use quotation marks when you define them (e.g. NAME="Mike") and everywhere you refer to such values (e.g. if NAME="Mike"). Anything is fair game within the quotation marks too. In other words, special characters (e.g. semicolons and spaces) and mnemonics (OR and AND) or anything else that may otherwise have special meaning outside of quotes are masked or considered part of text within quotation marks. The absence of quotation marks represents a reference to other data set variables or a function of them.

Quotation marks can also mask macro variable values but unlike data set variables, when used to define macro variables, they are included as part of the text value. Also unlike data set variables, they are not required to define a macro variable as a literal. While the absence of a marker indicates a reference to another data set variable or a function in the DATA step, reference to another macro variable is marked by an ampersand. Additionally, macro functions of macro variables are marked with percent signs. The absence of such a marker is interpreted as a literal by the macro facility. Consider the difference in the way quotation marks are used in the following table. Specifics of the macro statements and functions will be discussed in later sections.
    DATA Step Statement                                   Macro Facility Statement
    DEPARTMENT="SALES" ;                                  %LET DEPARTMENT=SALES ;
    IF DEPARTMENT="SALES" THEN ...                        %IF &DEPARTMENT=SALES %THEN ...
    PUT "THE DEPARTMENT OF THE MONTH IS " DEPARTMENT ;    %PUT THE DEPARTMENT OF THE MONTH IS &DEPARTMENT ;
    SCAN(DEPARTMENT,1,"L")                                %SCAN(&DEPARTMENT,1,L)

Of course in addition to marking literals, quotation marks also mask special characters and mnemonics. Consider the following DATA step statement.

    VAR3="PENCILS ; ERASERS" ;

Because of the quotation marks, the DATA step is not tempted to end the statement at the semicolon that follows PENCILS. Without quotation marks, how does the macro facility know which semicolons to interpret as part of the text, and which one indicates the end of the statement? For another example, consider a macro statement that conditionally executes a statement based on the value of a macro variable. Suppose the value of the macro variable contains a space followed by OR followed by another space. OR is a logical operator in the macro facility. Does the macro facility interpret this as text or as a logical operator? Finally, consider a macro function that expects three arguments, but one of the arguments is populated with a macro variable whose resolution contains a comma. How do we tell the macro facility to interpret this comma as text rather than an argument delimiter? Just as quotation marks mask the meanings of special characters in the DATA step, macro quoting functions are available to clear up ambiguity that can arise from situations similar to those mentioned here.

Another consequence of macro variables always being text is that you can assign numbers to macro variables, but again, it is only the numeral text that is assigned and not any quantity as with numeric data set variables. To extend that idea, you know that the DATA step statement VAR2=8+4 will store the value 12 in VAR2, but assigning 8+4 to the macro variable VAR2 will store only that sequence of three characters, not the arithmetic result.
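Both points can be checked in the log with a couple of %LET and %PUT statements. This is a minimal sketch: the macro variable names VAR2 and VAR3 echo the DATA step examples above, and %STR is used here as one of the macro quoting functions the text alludes to. The paper itself has not yet said which quoting function it has in mind.

```sas
/* The macro facility stores the characters 8+4, not the number 12 */
%let var2 = 8+4;
%put VAR2 is &var2;          /* writes: VAR2 is 8+4 */

/* %STR masks the semicolon so that it becomes part of the value   */
/* instead of ending the %LET statement                            */
%let var3 = %str(PENCILS ; ERASERS);
%put VAR3 is &var3;
```

Without %STR, the first semicolon would end the %LET statement after PENCILS, and the remaining text would be passed along as open code.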
We will see later that the macro facility does have special functions that allow for such calculations.

With the help of macro quoting functions, macro variables can be as flexible as you need them to be in terms of possible values, but you need to determine what the best way is to use that flexibility for a given application. Macro variable values can be one “word” (e.g. a data set variable name, a variable value, library references, a data set name, any alphanumeric string without spaces) or many words. They can be partial words (e.g. data set variable name prefixes) or they can be null (e.g. when what varies is the decision to include a piece of code or not). At the other extreme, macro variables can be as long as 65,534 characters. Also, you are unlimited in the number of macro variables you have. Technically, you can write a program with nothing but several macro variable references, or a program containing nothing but one macro variable reference, where that variable includes several DATA steps and PROCs.

The price you pay for macro variables, however, is memory, maintenance, and readability. As a programmer, just as a consumer in a department store, you have to ask yourself what you have bought with macro variables and if it is worth it. Generally macro variables are most useful when they represent code that has the potential to change relatively frequently. A program laden with macro variable references may help address all the variability in a task, but also may be hard to read. Just as a book is harder to read when it is written in French, forcing you to continually consult a French-English dictionary, than if it were written in English, programs with macro variable references are harder to read than those without. Also, more macro variables mean more macro variables to maintain. When these get lost in the shuffle, they may accidentally be overwritten or not overwritten when they should have been, and can lead to unexpected results.
OTHER PROPERTIES

Macro variable references do not resolve inside single quotation marks. Keep this in mind when using a macro variable inside a TITLE or a FOOTNOTE statement, or a PUT statement. Suppose also that you assign a string of text to a macro variable that you plan to use as a literal in a DATA step. Since literals must be surrounded by quotation marks in a DATA step, any reference to this macro variable must be surrounded by double quotes. For example, you are creating a data set with a variable called OFFICER whose value at any given execution will be the title of an officer of choice (e.g. PRESIDENT, SECRETARY, etc.). Following a macro variable definition statement that initializes a macro variable that is also called OFFICER, you have a DATA step with a statement that reads OFFICER="&OFFICER";.

Macro variables are not always available for referencing. In general, macro variables defined outside of macros are always available and are described as “global” macro variables, whereas macro variables defined inside macros are available only during the execution of that macro and any sub-macro, and are described as “local” to that macro or environment. The exception to this is when a macro variable of the same name already exists “more globally”, in which case the existing macro variable is updated rather than a new local one being created. This exception can be overridden with a %LOCAL statement inside a macro, which keeps separate two macro variables of the same name defined in two environments. References to the macro variable within the macro affect the value of the one defined in the macro while references outside the macro affect the one defined outside. The use of this statement is recommended in applications that use many macro variables defined in many different environments. On the other hand, a macro variable that would otherwise be local can be made global by naming it (without an ampersand) in a %GLOBAL statement before it is defined.
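Both properties are easy to see in the log. The sketch below reuses the OFFICER example from the text. The macro name SCOPE_DEMO is hypothetical, and the macro definition syntax (%MACRO and %MEND) is covered later in the paper.

```sas
%let officer = PRESIDENT;

data _null_;
   length officer $20;
   officer = "&officer";   /* double quotes: the reference resolves   */
   put officer=;           /* writes: officer=PRESIDENT               */
   officer = '&officer';   /* single quotes: no resolution takes place */
   put officer=;           /* writes: officer=&officer                */
run;

/* %LOCAL keeps the macro's OFFICER separate from the global one */
%macro scope_demo;
   %local officer;
   %let officer = SECRETARY;
   %put Inside the macro: &officer;    /* writes: Inside the macro: SECRETARY  */
%mend scope_demo;

%scope_demo
%put Outside the macro: &officer;      /* writes: Outside the macro: PRESIDENT */
```

If the %LOCAL statement were removed, the %LET inside the macro would overwrite the global OFFICER, and the last %PUT would write SECRETARY instead.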
DEFINING MACRO VARIABLES

The method by which you define a macro variable somewhat depends on how you plan to use it and who will be initializing it. You know that the ultimate goal is to generate code with it, but you now know that this can be accomplished in a couple of ways. Sometimes a macro variable is directly referenced in open code, and sometimes it is used in macros only to create other macro variables or generate code directly without creating other macro variables. You know that a macro variable is the means by which a user specifies a set of circumstances, and you know that the user may only be ourselves, a group of other programmers, or a group of non-programmers who need a user-friendly interface on a multi-purpose application. This section will describe three general ways of initializing macro variables, none of which require a macro to do so. Two other methods that require a macro will be discussed in the Macro section.

The most versatile and straightforward method for defining a macro variable is with the macro statement %LET, a statement that is useful inside and outside of macros. Found outside of macros, this is one common way of accepting input from a user. Rather than asking a user to search through a program to make parameter changes, you create a block of %LET statements in the beginning of the program and provide instructions (by way of comments or a separate document) on how to use them. Returning to an earlier example, users specify a salary to use as a threshold for filtering data sets, both in an unformatted and formatted way.

    %LET NBR1=30000 ;
    %LET NBR2=$30,000 ;

This code is in the beginning of the program, and references to NBR1 and NBR2 throughout the program are replaced by their respective values.

Inside of macros, %LET is a quick and convenient way to create links between user input and code to be generated. Reasons for using it mirror those for creating data set variables in the DATA step.
Just as created data set variables may be the target of analysis or just a means to an end in a DATA step, we can create macro variables to be referenced in code or to lead to other code to be generated. Also like its DATA step counterpart, we can use %LET to create macro variables based on the values of other macro variables or as literal text. These statements can also be executed conditionally using macro conditional statements.

Before moving on to other methods, consider the following illustration of how some of the differences between macro variables and data set variables manifest themselves using %LET.

    %LET GAME=CHESS ;
    DATA HOBBIES1 ;
       LENGTH GAME $10 ;
       SET HOBBIES0 ;
       GAME='CHESS' ;
    RUN;

Of course, a data set variable must be created in a DATA step, whereas a macro variable need not be. Whether in a macro or in open code, %LET is processed once, while the data set variable assignment is processed for each observation read from the data set HOBBIES0. The value of the macro variable persists through the DATA step and into other steps until it is explicitly changed. The data set variable is restricted to contexts involving HOBBIES1. As emphasized by the LENGTH statement, data set variables have attributes such as length while macro variables do not. While both types of assignment follow the form new variable = value, the macro variable assignment needs the percent sign to signal the macro facility, whereas the DATA step assignment begins with the new variable name. Finally, the macro variable assignment needs no quotation marks.

Depending on the user, you may feel that allowing users to specify input with a %LET statement is letting them too close to the code. For that reason there is %WINDOW. %WINDOW is a method of creating macro variables exclusively from user input by way of attractive, user-friendly input screens created by the programmer containing messages from the programmer (e.g. instructions) and fields for input.
While its scope is not as broad as that of %LET, it does keep users further away from the code. Just as with its DATA step counterpart, the WINDOW statement, %WINDOW allows the programmer to specify the background color of the screen, the font color, number of columns and rows, and where to direct messages and input fields based on the current position of the pointer. Fields can be assigned attributes such as length and appearance (e.g. underline, reverse video, blinking). While the name of the field in the WINDOW statement of a DATA step corresponds to a data set variable name, the field name corresponds to a macro variable name in %WINDOW. When %WINDOW executes, it creates macro variables for each field if they are not defined already. With %DISPLAY, users have the opportunity to initialize these macro variables by typing their input into the field. Until now, we have talked about the macro facility as a separate entity from the DATA step with processing taking place outside of any data context. There are exceptions. Sometimes code needs to be generated based on data. In other words, information from a data set is needed outside of the DATA step to determine what happens next. CALL SYMPUT is a tool in the DATA step that allows you to create macro variables whose values are found in the data set. Because it is found in the DATA step, it does get processed iteratively with the rest of the executable statements, and its syntax is more typical of the DATA step. The macro variables created by CALL SYMPUT are local to the environment that is current when the end of the step is reached, but are not available until after the step is complete. If that current environment contains no macro variables, it will be created in the nearest non-empty environment in which the current environment is nested (though exceptions do occur). CALL SYMPUT has two arguments. The first is the name of the macro variable you are creating, and the second is the value being assigned. 
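The two-argument form can be sketched as follows. This is an illustration only: it assumes the SASHELP.CLASS sample data set that ships with SAS, and the macro variable name NOBS is hypothetical. The TRIM and LEFT calls remove the blanks that the numeric-to-character conversion would otherwise leave in the value.

```sas
data _null_;
   set sashelp.class end=last;
   /* first argument: macro variable name (a quoted literal)   */
   /* second argument: the value, converted to character text  */
   if last then call symput('nobs', trim(left(put(_n_, 8.))));
run;

/* The reference appears after the step has finished executing */
%put The data set has &nobs observations;
```

A reference to NOBS placed inside the same DATA step would be resolved when the step is compiled, before CALL SYMPUT has executed, so the new value would not be seen there.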
The first argument can be literal text, a combination of variables from the data set, or a combination of literal text and data set variables. As is always the case in the DATA step, literal text must be enclosed in quotation marks. Anything else is assumed to be a variable name. Of course, using only literal text means that only one macro variable is being created. This is useful when you only need one piece of information from the data set, such as the number of observations. For that reason, in such cases, CALL SYMPUT is usually executed conditionally, where the condition is true for only one iteration of the DATA step, such as the first or last observation. If it is executed more than once, the macro variable is overwritten each time.

On the other hand, the first argument could be the name of one of the data set variables, or a function of multiple data set variables. In this case, the name of the macro variable created in a given iteration is the result of the expression for that iteration of the DATA step. Suppose a data set has an ID variable in which every value is found in only one observation. By naming this variable in the first argument, we create as many macro variables as there are observations in the data set, and carry outside of the data set information from each observation. Of course even without such a variable, every DATA step has available the automatic variable _N_. Combining this with literal text is another way to create a macro variable for each observation.

The second argument is the same as the first in terms of the syntax. Literal text is indicated with quotation marks. Any valid SAS expression involving combinations of literal text and functions of DATA step variables is allowed. When the second argument is the name of a character variable, the SAS system will write it using a format that reflects the length of the variable. Values shorter than that length will be padded with spaces on the right.
For this reason, it can be a good idea to use LEFT and TRIM or COMPRESS to remove unwanted spaces. Similarly, when the variable is numeric, the SAS system uses the BEST12. format to convert the number to character. Without suppressing spaces, written instances of this macro variable will be right-aligned. Version 9 offers CALL SYMPUTX, which relieves you of the hassle of using these compression-type functions to remove unwanted spaces. Consider the following three examples.

Example 1 contains a data set called GEOGRAPHY with a variable called STATE, in which each value of STATE can be found in any number of observations. You need to use the number of unique values of this variable as the upper bound of an iterative DO loop later in the program, so you obtain this information with a DATA step that reads GEOGRAPHY and passes the information out of the DATA step by way of CALL SYMPUT. Assuming that GEOGRAPHY has been sorted by STATE, observe the following code.

    data _null_ ;
       set GEOGRAPHY end=thatsit ;
       by STATE ;
       if first.state then counter+1;
       if thatsit then call symput('stateno',put(counter,2.));
    run;

A few observations are noteworthy. First, you only needed this DATA step to pass along this information, but you did not need a data set, so DATA _NULL_ makes sense. Second, the second argument of the CALL SYMPUT turns the numeric variable into a character value before assigning it to the macro variable. We know that macro variables are always character strings and this is no exception. Without this step, you leave the conversion to the SAS system. This example is characterized by the fact that you only needed one piece of information from the data set, and so CALL SYMPUT is executed only once, and a literal text string is used to name the macro variable.

For example 2, in addition to the number of unique values, you need to pass along the values themselves. Consider the following change to the code.
    data _null_ ;
       set GEOGRAPHY end=thatsit ;
       by STATE ;
       if first.state then do;
          counter+1;
          call symput(compress('VAR'||put(counter,2.)),state);
       end;
       if thatsit then call symput('stateno',put(counter,2.));
    run;

In addition to incrementing the counter by 1 at each new value of STATE, we also assign the value of STATE to a macro variable. The name of the macro variable begins with the text 'VAR', and is suffixed by a unique number, thereby creating unique macro variables. In the Macro section, we will see how you can use and reference these macro variables. This example is also characterized by the fact that you did not know which states would appear in the data set, so you used macro variables to store them.

For example 3, suppose in the same database but in a different data set you have the variables STATE and CAPITAL, where each state in the United States is represented by STATE, and the corresponding value of CAPITAL is the capital of that state. Consider the following code.

    data _null_ ;
       set capitals ;
       call symput(state,capital) ;
    run;

Here CALL SYMPUT is executed for each observation of the data set. In this case, we know that all states are represented, and so each state name is the name of a macro variable.

The SQL procedure also creates macro variables from data, but with fewer options and less flexibility. Using an INTO clause between SELECT and FROM, you can specify a comma-delimited list of macro variable names, each preceded by a colon. The value of the nth variable listed in the SELECT clause will be assigned to the nth macro variable listed. When specifying only macro variable names and not ranges of variable names, only the values of the first observation of the data set will populate the macro variables. On the other hand, when specifying a range of n macro variable names, either with a dash or with the word THROUGH or THRU (e.g. :MACVAR1 thru :MACVARn), values of the corresponding data set variable from the first n observations will populate the n macro variables. Finally, by specifying one macro variable name followed by the keyword SEPARATED BY and a delimiter of your choice enclosed in quotation marks, you can create one macro variable whose value is the string of all values of the corresponding data set variable, separated by the chosen delimiter. Note that the ability to create macro variables named by values of a variable does not exist with PROC SQL.

Observe the following code that accomplishes the same task as that of example 2 above in which each value of STATE was assigned to a macro variable along with the total number of unique values of STATE.

    proc sql noprint ;
       select count(distinct state) into :stateno
          from GEOGRAPHY ;
       select distinct state into :var1 thru :var%left(&stateno)
          from GEOGRAPHY ;
    quit;

The first statement counts the number of unique values of STATE without having to create the COUNTER variable as was done in the DATA step. The second statement uses the value of the macro variable created in the first statement as an upper limit on the number of macro variables to be created. As we will see later in the Macro Functions section, the effect of %LEFT is to left-align its argument. As with CALL SYMPUT, when creating macro variables for only the first observation of the data set being read, numeric values will be right-aligned. Without %LEFT, VAR&STATENO would resolve to the text string VAR followed by padded blanks followed by the value of the macro variable STATENO, thus causing an error. When specifying ranges of macro variables or when using the keyword SEPARATED BY, leading and trailing blanks are removed.

Finally, a group of macro variables that need no defining are the automatic variables supplied by the SAS system.
These macro variables mostly contain information about the environment, such as the name of the current PROC or macro that is executing, recent return codes, operating system information, and information known to your system such as the current time, day, and date. Because the names of most of these macro variables begin with the string SYS, it is a good idea to avoid creating macro variables named in this way. PROC SQL also creates macro variables to provide information about how many rows were processed by PROC SQL and SQL return codes. These macro variables begin with the string SQL. Other macro variables that begin with the string SQLX provide information about the results of an SQL query submitted through the Pass-Through facility.

MACRO VARIABLE REFERENCING

At this point, what else can we say about macro variable referencing? At the sight of an ampersand followed immediately by an allowable macro variable name, the macro facility is called upon to generate code in its place based on what has been stored in memory. We know that what is stored in memory is nothing more than text without any presumption about how the text will be used, once generated. For that reason, macro variables can be referenced anywhere in open code where the resolution generates syntactically correct code.

We have talked about how big the value of a macro variable can get. You can assign to them such “words” as data set variable names or variable values or data set names, etc., or you can assign longer text such as a list of variable names or data set names or even entire statements or blocks of code. What we have not talked about is the opposite situation: shorter values of macro variables, to be used as portions of “words”. How do you combine macro variable references with text to produce a “word”? For example, how do you combine the text string “20” with a reference to a macro variable that represents two-digit years to generate four-digit years?
In this section, we will discuss combining macro variable references with text as well as with other macro variable references. Along with generating text in open code, you can also generate with macro variables the names of other macro variables. When this happens, how is the macro processor to know whether these are to be used as text, or whether the macro facility should attempt to resolve them? We will discuss the notation used to instruct the macro facility to treat the result of macro variable resolution as another macro variable reference. Finally, we will discuss macro functions and their relationships, not only to DATA step functions, but also to macro variable references.

For our first example, we will use the example alluded to above. Knowing that a macro variable reference is replaced by its value upon execution, it should be of no surprise that 20&YR2 would generate a four-digit year if YR2 is the name of the macro variable representing the two-digit year. When using a reference like this, make sure to consider how YR2 was initialized. If it was assigned through CALL SYMPUT or PROC SQL, make sure that proper care was taken to eliminate leading blanks and to avoid the stripping of leading zeros.

Combining macro variables works the same way. For our second example, consider a medical claims database in which the data sets are named both for the type of claims they contain (IP for inpatient and OP for outpatient) and the year in which they were incurred (four-digit years). With the macro variables TYPE and YEAR, users specify which claims to extract from the database. Combining the two macro variable references in a SET statement generates the name of the appropriate data set to process.

    data extract ;
       set db.&type&year ;
       /* more code */

Ambiguity arises though when the combination of text with a macro variable reference requires the reference to precede the text.
For our third example, suppose that a smaller application than that of example 2 extracts only 2004 claims, but users still have a choice of claim type. If you take out the reference &YEAR and replace it with 2004, the SET statement now looks like this.

    set db.&type2004 ;

The macro facility looks for a macro variable named TYPE2004, a valid name for a macro variable, does not find it, and issues a warning that the apparent reference could not be resolved. The unresolved text is left in the SET statement, where it causes an error. This was not an issue in the first two examples. In the first, the macro variable reference came at the end of a “word”, and in the second, it was followed by an ampersand. The question here is how does the macro facility know when it has reached the end of a reference?

Remember that certain characters are not allowed as part of macro variable names. The space is one such character. For that reason, in example 1, once a space is reached after the 2, the macro processor knows that this must be the end of the reference. Ampersands are also prohibited, thereby removing any ambiguity that otherwise may arise in example 2. Numerals such as 2, however, are allowed. In the reference &TYPE2004, the macro processor has no way of knowing that the reference ends after the E. For that reason, the period is provided as a delimiter for the macro processor. The period allows you to concatenate macro variable references with text without having to separate them with unwanted characters such as spaces. You insert the period to signify the end of the macro variable reference, but because of its special status as a delimiter, it does not become part of the code.

    set db.&type.2004 ;

If the value of TYPE is initialized to IP, then the generated code becomes set db.ip2004 ;. So a period that immediately follows a macro variable reference is always “swallowed” by the reference, and never becomes part of the code. Of course, this may not always be the desired effect.
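The first three examples can be tried directly in the log with %PUT, which writes the resolved text without needing the claims database. This is a small sketch, and the values assigned to YR2, TYPE and YEAR are hypothetical.

```sas
%let yr2  = 04;
%let type = ip;
%let year = 2004;

%put 20&yr2;        /* reference ends the word:   writes 2004   */
%put &type&year;    /* ampersand ends &type:      writes ip2004 */
%put &type.2004;    /* period ends &type:         writes ip2004 */
```

Writing the reference as &type2004 instead would produce the unresolved-reference warning described above, because no macro variable named TYPE2004 exists.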
For our fourth example, suppose that the first level of the data set name is represented by a macro variable LIBREF, instead of “db”. By removing “db” and inserting the macro variable reference, you get the following.

    set &libref.&type.2004 ;

Just as the reference to TYPE swallows the period that follows, the reference to LIBREF does the same. If TYPE is initialized as IP and LIBREF is initialized as DB, upon resolution, the following is generated.

    set dbip2004 ;

Rather than trying to process a permanent data set, the SAS system will try to process a WORK data set called DBIP2004. To fix this, keep in mind that the period is the end of a reference. Anything that follows it up until the next ampersand is just part of the open code. Therefore, following the delimiter period with another period will leave one period in the open code.

    set &libref..&type.2004;

We know that we use macro variables primarily to represent different possibilities for a given piece of code. What if it is the case that not only does a piece of code vary, but also the macro variable to be used could vary from one execution to the next? We now return to the example where we had a data set with two variables, STATE and CAPITAL, where each value of CAPITAL represents the capital city of the corresponding value of STATE. In that example, we used CALL SYMPUT to create macro variables named for each value of STATE, whose values were the corresponding values of CAPITAL. With the user’s choice of a state, specified through a macro variable called PICKASTATE, you write an application that writes a message to the log indicating that state’s capital. It sounds simple enough, but there is one problem: because it coincides with the choice of the user, the macro variable that generates the capital city changes with every execution.
Certainly if the message was always to generate a statement about the capital city of Oklahoma, the %PUT statement could read as follows:

   %put The capital city of Oklahoma is &Oklahoma;

However, since users are free to choose other states through the macro variable, we know that the statement would begin as follows:

   %put The capital city of &pickastate etc…

How do you finish this statement? We know that the result of macro variable resolution is text. Sometimes, that text may be the name of a macro variable, but of course, once the resolution is over, the ampersand is gone, and the name of a macro variable is just another piece of text. In this example, if the user chooses Oklahoma, then the resolution of &PICKASTATE is Oklahoma. Though Oklahoma is the name of a macro variable, without any ampersands, it is just another string of text. However, when resolution of a macro variable reference yields an ampersand, the macro processor will try to resolve this result. In this case, this second read will generate the name of the capital city in the code.

The macro processor will generate ampersands only from references that contain consecutive ampersands. Knowing how many to use and where to put them in a reference requires knowledge of three rules the macro processor follows for resolving such references. The first rule states that pairs of ampersands always resolve to one ampersand. Second, the macro processor makes a complete read of a macro variable reference from left to right before making a second read. Third, the presence of at least two consecutive ampersands guarantees a subsequent read and, in fact, is the only way to get a subsequent read.

Several consequences follow from this set of rules. First, because of rules one and two, the number of ampersands present at any one time will be cut in half after a single read by the macro processor. In the case of an even number of ampersands, this means that no resolution of the macro variable takes place.
When the number is odd, the last ampersand resolves the macro variable while the remaining even number preceding it reduces to half. A second consequence, not unrelated to the first, is that the placement and the number of ampersands used are anything but arbitrary.

Suppose that in the example above, we finish the %PUT statement in the following way.

   %put The capital city of &pickastate is &&pickastate... ;

How does the second reference resolve, according to the rules outlined above? We know that the pair reduces to one ampersand. Continuing to the right, you have no more ampersands, but only the text PICKASTATE. After the first read, you are left with &PICKASTATE. Because of rule 3, we know that the macro processor will turn around and make another read, but now it is reading the same reference as the first reference in the statement. The result is a message in the log reporting that the capital city of Oklahoma is Oklahoma, not the message you wanted. Generally, a reference containing only one macro variable with a pair of ampersands gives you nothing more than what you get with one ampersand.

Suppose you add a third ampersand. This way, during the first read, the first two reduce to one and the third resolves the macro variable. What remains is &OKLAHOMA. Being guaranteed a second read, final resolution generates the capital city. To generate the message in the log, you use the following statement.

   %put The capital city of &pickastate is &&&pickastate... ;

Sometimes an even number of ampersands is used to delay resolution of one part of a reference until another part is resolved. Suppose an application contains a FOOTNOTE statement that sometimes is to indicate the current day and sometimes the current date, depending on the value of the macro variable FOOTNOTE, initialized by the user. You have available the automatic macro variables SYSDAY and SYSDATE, but how do you make a reference that generates the user’s choice?
One option would be similar to above. If the two choices for the user are the text strings SYSDAY or SYSDATE, then by referencing the macro variable FOOTNOTE with three ampersands, you would generate the desired output. However, the prefix SYS does not mean anything to users. All they care about is the choice between the day and the date. For that reason, you make these the two choices. Of course these two choices are the suffixes of the two automatic macro variables of choice, and so you are faced with the situation of combining macro variables with text as described above.

If you simply combine the text string SYS with a reference to the macro variable FOOTNOTE, then the result of SYS&FOOTNOTE is the text string SYSDAY or SYSDATE, depending on the user’s choice. Again, these are macro variable names, but without any ampersands left, they are just text. Suppose you place an ampersand in the front of the reference. Then &SYS&FOOTNOTE asks the macro processor to resolve a macro variable called SYS, which does not exist. What you need is to use enough ampersands in the beginning of the reference to force the macro processor to wait until the reference in the suffix is resolved before resolving anything in the beginning.

Notice what happens with a second ampersand in the beginning. With &&SYS&FOOTNOTE, the first pair resolves to one, but the text string “SYS” is left alone. Continuing from left to right, &FOOTNOTE resolves to DATE (if that is the user’s choice), and after the first read, you have the reference &SYSDATE. With a second read, the current date is generated in code.

   footnote "Today is &&sys&footnote..." ;

Note the use of the three periods following the macro variable reference. Since this reference will be read twice, the first two periods will be swallowed, leaving one period to end the sentence after complete resolution. For a more detailed discussion on the use of consecutive ampersands, please see Molter, 2004.
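The three resolution rules can be watched at work by turning on the SYMBOLGEN system option, which writes each step of macro variable resolution to the log. The sketch below is only illustrative: the two %LET statements stand in for the macro variables that the text creates with CALL SYMPUT and user input, and the value DATE for FOOTNOTE is an assumed user choice.

```sas
/* Hypothetical values standing in for the CALL SYMPUT results and user input */
%let Oklahoma   = Oklahoma City;
%let pickastate = Oklahoma;

/* Log each step of macro variable resolution */
options symbolgen;

/* One and two ampersands both stop at the state name */
%put One ampersand: &pickastate;
%put Two ampersands: &&pickastate;

/* Three ampersands force a second read that resolves the state name itself */
%put Three ampersands: &&&pickastate;

/* A leading pair delays the prefix until the suffix has been resolved */
%let footnote = DATE;
%put Today is &&sys&footnote...;

options nosymbolgen;
```

Reading the SYMBOLGEN notes for the last two statements is a quick way to confirm how many reads the macro processor makes and how many periods each read consumes.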
FUNCTIONS

Just as the DATA step has functions to provide you with information about data set variables, the macro facility has functions to provide you with information about macro variables. Just like macro variables, macro functions can be referenced in macros or in open code. Since the macro facility treats anything without a percent sign or an ampersand as literal text, macro functions must always be preceded by a percent sign. Also because of this property, the macro facility provides us with other functions the DATA step does not have or need. Finally, the macro facility provides us with one function that allows us to use DATA step functions that have no macro counterparts, such as numeric functions (e.g. MIN and MAX).

One class of macro functions that requires little discussion is the text functions. These include %LENGTH, %INDEX, %SCAN, %SUBSTR, and %UPCASE. These work the same way as their DATA step counterparts with the same arguments. As is the case with the rest of the macro facility, literal arguments need no quotation marks. Of course these correspond to only a small subset of DATA step functions, but more are available through autocall macros. Though technically these are macros and not macro functions, I include them in this discussion because many of them remind us of DATA step functions.

Autocall macros are macros that come with your SAS system. To use them, two system options must be in effect. The SASAUTOS option points to where these macros are. This should be part of your configuration file so that you do not have to worry about remembering to activate it. Also, MAUTOSOURCE should be turned on to direct SAS to search these libraries for macros. This option is turned on by default. Included in these macros are %LEFT, %TRIM, %LOWCASE, and %CMPRES (which removes leading and trailing blanks and collapses multiple blanks into one, making it a limited macro counterpart to the COMPRESS function).

In another class of functions are the quoting functions.
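The text functions and autocall macros just described can be tried in open code with %PUT. This is a minimal sketch; the macro variable PHRASE and its value are assumptions made for illustration, and the %LOWCASE call assumes that the autocall library supplied with SAS is available through SASAUTOS and MAUTOSOURCE.

```sas
/* Hypothetical value; note that no quotation marks are needed */
%let phrase = holy macro;

/* Text functions, each preceded by a percent sign */
%put Length: %length(&phrase);
%put Second word: %scan(&phrase,2);
%put First four characters: %substr(&phrase,1,4);
%put Uppercase: %upcase(&phrase);
%put Position of mac: %index(&phrase,mac);

/* An autocall macro that behaves like a function */
%put Lowercase: %lowcase(HOLY MACRO);
```

Arguments are written without spaces after the commas so that no unintended blanks become part of an argument.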
Because literals are not enclosed in quotation marks in the macro facility, you need another way to distinguish between literals and characters or mnemonics that have special meaning such as commas, semicolons, logical operators, etc. Suppose for example that you need %SCAN to parse a text string delimited with commas. While the third argument of the DATA step counterpart would surround the comma with quotation marks, without quotation marks, the third argument of the macro function is indistinguishable from the commas used to separate arguments of the function. For a second example, imagine trying to assign with %LET a string of text that contains a semicolon to a macro variable. The quotation marks that surround the literal in a DATA step assignment statement mask the special meaning of the semicolon, but without the quotation marks, the semicolon intended to be part of the macro variable value ends the macro statement. For instances like these where characters intended to be literals could be interpreted as special characters during the definition or compilation of a macro or macro variable, the macro facility offers %STR to mask special meanings and treat them as literals.

Also available are quoting functions such as %BQUOTE that mask special characters during execution. While %STR allows you to define a macro variable or a macro without ambiguity, what guarantee do you have that the resolution of a macro variable will not lead to more ambiguity? For example, if a conditional macro statement is based on a string of text stored in a macro variable that happens to contain a space, followed by the text string OR, followed by another space, then without a quoting function, the macro facility will interpret this as a logical operator within the condition and likely lead to errors.
To guard against this, you use %BQUOTE to mask any special characters that may be contained in the resolution of macro variables, especially when values of macro variables can be long and unpredictable (e.g. when the user specifies syntactically correct filtering criteria for a step). Both %STR and %BQUOTE also allow you to use unmatched quotation marks or parentheses without the SAS system expecting them to be matched, though %STR requires these to be marked with special characters. By specifying %NRSTR and %NRBQUOTE, percent signs and ampersands are also masked. Finally, each of the text functions mentioned earlier has a corresponding function with the same name preceded by a Q (e.g. %QSUBSTR) in order to mask the result of the function at execution time.

We know that to assign the result of a mathematical operation to a data set variable, you can simply provide the numbers or the names of numeric variables along with the operator symbols (e.g. “+”). For example, the assignment statement X=10+5 assigns the value of 15 to the numeric variable X, but the macro statement %LET X=10+5, which contains no macro function to trigger evaluation, simply assigns the text “10+5” to the macro variable. For that reason, the macro facility gives you %EVAL to allow you to perform mathematical operations on integers. To assign 15 to the macro variable X, you use the statement %LET X=%EVAL(10+5). %SYSEVALF is provided to perform mathematical operations on numbers with decimal points.

Finally, for most of those DATA step functions without macro function counterparts, the macro facility gives us %SYSFUNC and %QSYSFUNC. For example, to compress b out of the string abc and assign the result to the macro variable x, you use the macro statement %LET X=%SYSFUNC(COMPRESS(abc,b)). Be careful though because you cannot nest DATA step functions within one %SYSFUNC function. You can however nest multiple %SYSFUNC functions. For example, suppose you create a macro variable with the statement %LET X=%STR( )ABC%STR( ).
The macro variable X contains a leading and a trailing blank, which can be trimmed out with %SYSFUNC(LEFT(%SYSFUNC(TRIM(&X)))).

MACROS

We have discussed two main reasons for generating code. We also know that the macro facility provides us with two ways of doing so. One way is by storing potential code in memory, but we have seen that that is not always enough. One reason to generate code is to repeatedly execute a piece of code, with each execution sometimes differing from the others according to a defined pattern. While a macro variable can help us track iterations of this kind of execution, it takes more than the storage of a text string in memory to define the pattern. We also generate code to allow a program to run under multiple circumstances. When writing such a program to accept user input, we do our users a favor by offering them meaningful, intuitive choices for input specification. If we allow a user to specify gender, we ask them to use M or F, or Male or Female. We do not ask them to specify 1 and 2 just because those are the coded values in our database. When the code to be generated is more than what we want to ask of our users, we need logic to make the translation.

Consider a task in which you are to analyze several variables. In front of you is a database with several data sets, each containing several variables, some of which will be analyzed and some of which will not. Some of those variables not being analyzed are still used to create additional variables not present in the database that do need to be analyzed. These are the two roles a DATA step variable plays: either it is analyzed, or through the DATA step, using DATA step tools, it is used to create other variables to be analyzed. An analogous statement can be made of macro variables. While the DATA step facilitates analysis, the macro facility facilitates code generation.
Just as some database variables are ready from the beginning for analysis without manipulation, some macro variables can be referenced in open code. Just as sometimes we need logic to get to the other analyses, we also need logic to generate other code.

To the extent that a tool is applicable to both environments, macros allow you to do with macro variables what the DATA step allows us to do with DATA step variables. Of course DATA step statements that help define a data set such as KEEP, RETAIN, and FORMAT are not necessary in the macro facility. However, a DATA step statement of the form IF (DATA step variable) (operator) (DATA step variable value) THEN (DATA step statement) clearly has use in the form of %IF (macro variable) (operator) (macro variable value) %THEN (macro statement) in the macro facility. Notice the percent signs to signal the macro processor. Among the possibilities for macro statement is %DO, to be followed later by %END. Similarly, the DATA step statement of the form DO (data set variable) = a TO b, where a and b can be numbers or data set variables with numeric values, has a useful macro facility counterpart of the form %DO (macro variable) = x %TO y, where x and y can be numbers or macro variables with numeric values. DO-WHILE and DO-UNTIL also have macro facility counterparts. Finally, maybe the most important analogy to understand is that between the DATA step OUTPUT statement and its macro facility counterpart. Whether executed conditionally or not, implicitly or explicitly, OUTPUT is what creates the observation in the new data set. In this section, we will see what macros use to create code.

Before getting into these details, we will briefly discuss the syntax of macro definition and referencing. A macro definition always begins with a %MACRO statement and ends with a %MEND statement. Following %MACRO is the name you give the macro along with parameter definitions if you choose to use them.
As with other macro statements, a semicolon ends the %MACRO statement. The %MEND statement ends the definition of a macro. Optionally, the name of the macro that is being ended can follow %MEND. This is recommended particularly when macros are defined within other macros in order to easily detect which macro definition is ending. When referencing or calling a macro, you type the name of the macro immediately after a percent sign.

Macro parameters are tools used to accept user input. The value supplied by the user becomes the value of a macro variable named for the parameter. This macro variable is always local to the macro that defines it. To define macro parameters, you follow the name of the macro in the %MACRO statement with a set of parentheses that enclose their definitions. If you are defining more than one parameter, each definition is separated by a comma. The way you define any given parameter depends on which of two types you choose to use.

One is the keyword parameter. When defining a keyword parameter, you indicate the name of the parameter followed by an equal sign. Optionally, you can assign a default value on the right side of the equal sign. The following example illustrates the %MACRO statement for a macro that uses two keyword parameters, the second of which assigns a default value.

   %macro example1(parm1=,parm2=75);

The second kind of macro parameter is the positional parameter. When defining a macro with positional parameters, you must define them before any keyword parameters. With these parameters, an equal sign is not used and default values cannot be assigned. The following example defines a macro with both types of parameters.

   %macro example2(posparm1,posparm2,kparm1=,kparm2=75);

To reference a macro with parameters, you specify values according to the same rules by which they are defined. Keyword parameter values are specified by indicating the name of the parameter, followed by an equal sign, followed by the desired value.
The keyword parameters can be specified in any order, but no keyword parameter value can be specified before any positional parameter value. Finally, these parameter values are optional. When not specified, the value assigned will be the default specified in the definition, or the null value if no such default was assigned. Positional parameter values are specified by indicating only the desired value without the parameter name. They must be specified in the same order in which they were defined. Because position alone identifies them, a positional parameter whose value is omitted is left with the null value, so in practice you supply every one. The following illustrates a call to the macro EXAMPLE2 defined above.

   %example2(5,10,kparm2=50,kparm1=0);

In this case, &POSPARM1=5, &POSPARM2=10, &KPARM1=0, and &KPARM2=50. Note that the keyword values do not need to be specified in the order in which the parameters were defined.

Once inside the macro, a number of macro statements, some of which we have discussed in varying amounts of detail, are available to you. You know about %LET, %WINDOW, and %DISPLAY. %INPUT provides functionality similar to that of %WINDOW and %DISPLAY. You also know that %GLOBAL and %LOCAL allow you to change referencing environments for macro variables. Two other classes of macro statements that we have alluded to are conditional statements and DO loops.

As discussed earlier, the macro facility has the %IF-%THEN statement whose form mirrors that of its DATA step counterpart. While the DATA step statement checks the value of a data set variable, the macro statement checks the value of a macro variable. Following THEN is a DATA step executable statement. Among these is the DO statement, in which case, everything that follows until the END statement is to be executed when the condition is satisfied. OUTPUT is also available, in which case observations are created conditionally. Following %THEN can be macro statements that we have discussed, such as %LET, %WINDOW, %DISPLAY, and %PUT.
%DO can also follow, in which case, everything until %END, macro statements or code generation, will be executed if the condition is satisfied. Just as the generation of an observation can follow THEN with the OUTPUT statement, generation of code can follow %THEN. Also, as is the case with ELSE in the DATA step, %ELSE offers alternatives when the macro condition is not satisfied.

Macro DO loops also mirror in form their DATA step counterparts. While iterative DO loops in the DATA step iterate through values of a data set variable, iterative DO loops in the macro facility iterate through values of a macro variable. Of course the macro facility requires the percent sign in front of DO and TO. Similarly, a percent sign is required to precede WHILE and UNTIL. Inside parentheses that follow either of these macro keywords is a valid macro expression. As you would expect, all %DO loops end with %END. Until that %END, macro statements as well as code generation can appear.

There is nothing in the rules that says that a macro has to generate code, or has to have statements that may generate code. You might need a macro that does nothing more than create other macro variables or write a message to the log. We now return to the example in which users indicate with a macro parameter the type of report they want. By asking them to spell the word that describes the report, we introduce the possibility of inconsistent spelling or casing in the title. The following macro keys on the first letter of the input to create a macro variable to be referenced in the TITLE statement.

   %macro title(report=);
      %global rpt;
      %if %upcase(%substr(&report,1,1))=A %then %let rpt=Annual;
      %if %upcase(%substr(&report,1,1))=M %then %let rpt=Monthly;
      %if %upcase(%substr(&report,1,1))=W %then %let rpt=Weekly;
      %if %upcase(%substr(&report,1,1))=D %then %let rpt=Daily;
   %mend;

Referencing &RPT in the TITLE statement instead of &REPORT yields a consistent report title.
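To see the effect of keying on the first letter, the macro can be called with several spellings of the same request. The sketch below repeats the macro definition only so that it runs on its own; the spellings passed to REPORT and the words "Cost Report" in the TITLE statement are assumptions made for illustration.

```sas
/* The macro from the text, repeated here so the sketch runs on its own */
%macro title(report=);
   %global rpt;
   %if %upcase(%substr(&report,1,1))=A %then %let rpt=Annual;
   %if %upcase(%substr(&report,1,1))=M %then %let rpt=Monthly;
   %if %upcase(%substr(&report,1,1))=W %then %let rpt=Weekly;
   %if %upcase(%substr(&report,1,1))=D %then %let rpt=Daily;
%mend title;

/* Three spellings of the same request */
%title(report=monthly)
%put After report=monthly: &rpt;

%title(report=MONTH)
%put After report=MONTH: &rpt;

%title(report=m)
%put After report=m: &rpt;

/* Hypothetical title text; only &rpt comes from the macro */
title "&rpt Cost Report";
```

All three calls should leave RPT with the same value. Note that when the first letter matches none of the four choices, none of the %LET statements executes and RPT keeps whatever value it had before the call.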
Of course %LET is allowed in open code, but executing it conditionally is not, which is why we used the macro.

The next example uses the macro variables created earlier where the name of each state is a macro variable whose value is its capital city. From a data set, you have selected certain states whose capital cities will be identified in a statement in the log. The number of such states is stored in the macro variable STATENO, and each state name is stored in a macro variable named with the text string VAR followed by a unique number. Consider the following macro.

   %macro logmessages ;
      %do i=1 %to &stateno ;
         %let pickastate=&&var&i ;
         %put The capital city of &pickastate is &&&pickastate ;
      %end;
   %mend;

We know that %PUT can be used in open code without the help of a macro. Of course you can also use it multiple times in open code, but with a macro variable representing the upper limit of the loop, this macro is suggesting that the number of states, and therefore the number of messages written to the log, can vary, making the loop, and therefore the macro, necessary.

We now turn our attention to macros that generate code. Unlike the DATA step, which has the OUTPUT statement to generate observations of a data set, the macro facility has no statement to generate code. In place of an OUTPUT statement, you simply provide the code to be generated. Certain ways of using OUTPUT in the DATA step have no counterpart in the macro facility, such as naming a data set after OUTPUT to direct the observation to a particular data set, but other ways do have counterparts. One way is based on conditional logic. Just as the OUTPUT statement can be executed conditionally based on the values of data set variables, you can also follow %THEN with code to be generated when conditions based on macro variables are met. You can also execute OUTPUT unconditionally, either inside or outside DO loops.
Similarly, we can provide code between macro statements, either inside %DO loops or on its own. We are now ready to take a look at examples of macros that generate code.

For example 1, consider a macro that contains no macro logic.

   %macro nologic ;
      and
   %mend;

   data test1 ;
      set test0 ;
      if var1 %nologic var2 ;
   run;

The macro NOLOGIC does nothing more than generate the text AND. It is not a very practical macro since a user can type the word AND in the program as easily as calling a macro, but it is worthy of some observations. Of course, the first semicolon of the macro ends the %MACRO statement. Since what follows is not preceded by a percent sign, it is code that will be generated at the location the macro is called. Notice that there are no semicolons other than those that end %MACRO and %MEND. This is ok. Again, in the absence of macro statements, everything is just code to be generated. Here a semicolon is not desired in the generated code.

For example 2, consider the following change to the macro.

   %macro nologic ;
      var2 ;
   %mend;

   data test1 ;
      set test0 ;
      if var1 and %nologic
   run;

In this case, the macro generates the text VAR2 plus a semicolon. For that reason, the IF statement in the DATA step does not need a semicolon because the semicolon is generated unconditionally by the macro. Note also that the semicolon is unmasked. Because it is not part of a macro statement, there is no ambiguity as to how it will be used, and so masking is unnecessary.

The code generated by a macro that contains no macro logic or macro statements can also be generated by a %INCLUDE statement, without the overhead of compiling. For that reason, most macros contain a mixture of macro logic and code to be generated. Such macros present the biggest challenge because it can be difficult to distinguish macro code from open code. The key to this understanding is that macro statements begin with percent signs and end with the first unmasked semicolon.
Anything in between is part of the statement and anything after the semicolon and before the next percent sign is code to be generated unconditionally. Anything following %THEN that does not follow a percent sign is code to be generated conditionally. The remainder of this section will focus on these types of macros.

For example 3, consider the following macro in which the operator that is generated depends on the value of the macro variable OPERATOR.

   %macro mixed1 ;
      %if &operator=ANY %then or ;
      %else %if &operator=ALL %then and ;
   %mend;

   data test2 ;
      set test1 ;
      if var1 %mixed1 var2 ;
   run;

For beginning macro writers, this code is hard to look at. We are not used to seeing THEN followed by an operator like OR or AND. The best way to get around this confusion is by separating what is to be generated from the macro code. Since the text following %THEN is not preceded by a percent sign, it is code to be generated when the condition is met. Because the semicolon ends the macro statement, it is not considered part of the generated text.

When generating code conditionally, because the code to be generated is actually part of the macro statement, a challenge is presented when a semicolon is part of that code. The first unmasked semicolon after %THEN will end the macro statement. Therefore, any semicolon to be generated conditionally should be masked with %STR. Observe the following example in which the code intended to be generated is a DATA step DO loop, but the semicolon is unmasked.

   %if &a=1 %then do i=1 to 2 ; output ; end;

Because the first semicolon is unmasked, it ends the macro statement. That means that the code DO I=1 TO 2 is generated only when the value of the macro variable A is 1. On the other hand, the OUTPUT and the END statements are always generated. If A resolves to 1, then the DO loop is generated without a semicolon between the DO statement and the OUTPUT statement, resulting in an error.
When A does not resolve to 1, the DO statement is not generated, and an error results from having an END statement without a DO or SELECT statement. The following fixes the problem.

   %if &a=1 %then do i=1 to 2 %str(;) output %str(;) end %str(;) ;

Alternatively, when the code to be generated from a %IF statement includes multiple semicolons, a %DO-%END block saves us from having to worry about masking.

   %if &a=1 %then %do;
      do i=1 to 2 ;
         output ;
      end;
   %end;

Here we have replaced the code to be generated in the %IF statement with %DO. Following the semicolon, everything until the next percent sign will be generated when A resolves to 1. Now the semicolons that were masked when they followed %THEN can be unmasked because they are no longer part of a macro statement.

For example 4, we return to the DATA step in examples 1 and 2. Suppose that the data set TEST0 contains 100 variables, all named with the string VAR followed by a unique number between 1 and 100. The positional parameter CONDITIONS represents how many of the first 100 variables must be listed in the subsetting IF. If no such statement is needed, the macro variable will be initialized to 0.

   %macro mixed2(conditions) ;
      %if &conditions>0 %then %do;
         if var1
         %if &conditions>1 %then %do i=2 %to &conditions ;
            and var&i
         %end;
         ;
      %end;
   %mend;

The following generates a subsetting IF that checks the values of five variables.

   data test1;
      set test0;
      %mixed2(5)
   run;

This time the macro contains a mixture of macro code and code to be generated. We will step through this carefully to distinguish between them. First, nothing is necessary when CONDITIONS resolves to 0. Next, everything after the %IF statement until the next percent sign is code to be generated. In this case, it will be used as the start of a subsetting IF, but without a semicolon to be generated, it is not the whole statement. After this is another percent sign signifying the beginning of another macro statement.
The text (with the macro variable resolved) that follows the semicolon will be generated for each iteration of the %DO loop. For example, if the value of CONDITIONS resolves to 3, the text AND VAR2 AND VAR3 will be generated. If CONDITIONS resolves to 1, nothing further will be generated. Once the loop completes, the subsetting IF is completed except for a semicolon to end it. Again, everything between the semicolon that ends %END and the next percent sign is code to be generated. In this case, a semicolon is generated.

These macros have generated relatively short pieces of code. Some examples have generated a part of a statement, while others have generated part of a DATA step. For example 5, we revisit several earlier examples with a macro that accomplishes several tasks. Consider an application that retrieves data from a database that depends on the division of the company, as specified by the user through a keyword parameter called DIVISION. As before, Division 1 data is held in a SAS database from which SAS DATA steps can extract. Division 2 data is stored in an Oracle database, and Division 3 data in a DB2 database. The keyword parameter COST allows users to request certain types of cost reports. Specifying A produces an annual report, M a monthly report, W a weekly report, and D a daily report. Finally, through the keyword parameter SALESSUMMARY, a SAS data set is created containing any number of summaries, as chosen by the user. We begin with a look at the macro code that decides from which database to extract.
   %if &division=1 %then %do;
      data extract1 ;
         other DATA step code
   %end;
   %else %do;
      proc sql;
         connect to
         %if &division=2 %then oracle ;
         %else %if &division=3 %then db2 ;
         as mydbms (user=username password=password) ;
         create table extract1 as select * from connection to mydbms
         (
         %if &division=2 %then oracle specific SQL query ;
         %else %if &division=3 %then db2 specific SQL query ;
         ) ;
         disconnect from mydbms ;
      quit;
   %end;

Once again, %DO-%END blocks are used for Division 1 and for the combination of Divisions 2 and 3. Within the latter though are interruptions of code to be generated with macro code. The first place is in naming the DBMS. Once again, following the %IF and %ELSE statements are semicolons, but they end the macro statements and are not to be generated. Without a percent sign, the line that begins AS MYDBMS is unconditionally generated until the next percent sign. This example assumes the same username and password for both Oracle and DB2. If this is not the case, macro logic similar to above may be used within the parentheses to generate the appropriate username and password. Finally, after the semicolon following the last %ELSE statement is code to be generated: a closing parenthesis, a semicolon, the DISCONNECT statement, and the QUIT statement.

This portion of the application could have been coded in several different ways. First, it might make sense to require the DBMS-specific username and password in order to even use the application. If so, you can make these either keyword parameters or ask for them in a prompt screen. Second, this portion was written with the approach that only minimal code was to be generated conditionally. Anything that would be common to both choices of DIVISION would not be part of the %IF-%THEN statements. This meant using multiple %IF-%THEN statements. Another approach would have been to create separate blocks for Division 2 and Division 3.
Some of the code would be redundant, but as the number of places that require conditional logic increases, it may be less cumbersome.

After data is extracted and put into a common format, PROC TABULATE generates the cost report of choice.

   proc tabulate data=extract2 ;
      class year month week date ;
      var cost ;
      table
         %if &cost=A %then TABLE statement 1 ;
         %else %if &cost=M %then TABLE statement 2 ;
         %else %if &cost=W %then TABLE statement 3 ;
         %else %if &cost=D %then TABLE statement 4 ;
      ;
   run ;

In this case, the TABLE statement is interrupted by macro logic. Once it is complete, a semicolon and a RUN statement are generated.

Finally, the SUMMARY procedure creates data sets that summarize sales. The types of summaries, as well as the number of them, are up to the user.

   proc summary data=extract2 ;
      class salesperson ;
      var sales ;
      output out=summaries
         %let i=1;
         %let summary=%scan(&salessummary,1,%str( ));
         %do %until(&summary=%str()) ;
            &summary=&summary.sales
            %let i=%eval(&i+1);
            %let summary=%scan(&salessummary,&i,%str( ));
         %end;
      ;
   run;

Here the OUTPUT statement in PROC SUMMARY is interrupted by a series of macro statements, including an entire %DO-%UNTIL loop. At the ith iteration of the loop, the macro variable SUMMARY represents the ith summary requested in the parameter. The only place within this macro logic where code is generated is immediately after the %DO statement. For example, if the second summary requested is MIN, then MIN=MINSALES is generated after the generated code for the first statistic requested (note the role of the period as a macro variable delimiter here). Once out of the loop, a semicolon is generated to end the OUTPUT statement, followed by a RUN statement.

CONCLUSION

The macro facility allows you to build flexibility into your applications. An application becomes more flexible as the number of circumstances under which it can be run with minimal user intervention increases. Of course, different circumstances require different pieces of code.
Through the macro facility you can efficiently generate circumstance-dependent code without having to touch the core of the program. This is accomplished by designating an area of the program for specification of circumstances, possibly followed by logic that defines how to turn any given circumstance into code. Places in the core of the program that would otherwise depend on the circumstance are replaced by references either to the specification itself or to the logic that generates the code based on the circumstance. Rather than searching throughout a program for all pieces of code that depend on the current circumstance, a user can specify the circumstance in the designated area, and the references in the core of the program will generate the appropriate code.

In addition to flexibility, the macro facility can also make applications easy to use. Since macro logic can be used to translate specifications into code, you can write the macro to accept meaningful, easy-to-remember input keys from the user. In the end, users are kept away from the core of the program.

In this paper we have seen that you can develop an understanding of the macro facility by starting with your knowledge of the DATA step. Concepts such as variables, DO loops, and conditional logic are already familiar to you, so understanding slight differences such as the use of quotation marks, quoting functions, and %EVAL is not too much to ask, especially with a solid understanding of the difference in objectives between the DATA step and the macro facility. With all that said, the role of the macro facility in generating open code makes the mixing of macro code and open code inevitable, and this is where confusion often arises. Statements such as %LET VAR1=VAR1=5 and %IF &VAR2=5 %THEN IF VAR2=5 do not have a place in your understanding of the SAS language. In this paper I have attempted to clear up some of this confusion.
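To make the two mixed statements just mentioned concrete, here is a small sketch that uses both. It is only an illustration: the data set names TEST and SUBSET, the macro name SUBSET, and its keyword parameter are hypothetical and chosen for this example.

```sas
/* Hypothetical test data: all combinations of var1 and var2 in 4..6 */
data test;
  do var1=4 to 6;
    do var2=4 to 6;
      output;
    end;
  end;
run;

/* The macro variable VAR1 receives the text to the right of the */
/* first equal sign, that is, the seven characters: var1=5       */
%let var1=var1=5;

%macro subset(var2=0);
  data subset;
    set test;
    where &var1;                   /* generates: where var1=5; */
    /* The semicolon on the next line ends the %IF statement   */
    /* and is not generated; only IF VAR2=5 is generated, and  */
    /* only when the macro variable VAR2 resolves to 5.        */
    %if &var2=5 %then if var2=5;
    ;                              /* generated semicolon      */
  run;
%mend subset;

%subset(var2=5)
```

With VAR2=5 in the call, the generated DATA step contains WHERE VAR1=5; followed by IF VAR2=5; and the data set SUBSET should hold the single observation with both values equal to 5. With any other value, the %IF generates nothing and the lone semicolon is simply a null statement.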
Just as in data set variable assignments, the text on the right side of the equal sign is the value assigned to the variable. The only differences are the absence of quotation marks and the presence of %LET. Just as conditional creation of an observation is accomplished with the OUTPUT statement following THEN, the code to be generated conditionally follows %THEN. In summary, amid all the mixing of open code and macro code, open code does have certain “places” where it is allowed to be. In macros, these places correspond to those where OUTPUT can be found in DATA step code. By remembering these facts, distinguishing macro code from open code becomes easier, and ultimately, so does reading and writing it.

The SAS system is famous for having many ways to accomplish almost any given task. Many programmers get along fine without the macro facility, while others use it every chance they get. Among those who do, some create macros to generate only the code with potential to change, while others may include static code in their macros. Given a choice, some may use macro variables in situations where others would use macros. Issues to consider include memory and performance, but perhaps the most important is who will be using the code. Liberties can be taken when you write a macro only for your own convenience. When other programmers use it, you may get away with asking them to supply syntactically correct SAS code in a macro variable that is referenced directly in code. However, as the number of users increases, especially users who are not programmers, extra effort should be made to keep them away from the code while making input as intuitive and easy as possible.
This may involve allowing them to populate the right side of the equal sign in a %LET statement with text that will become part of the code, allowing them to call a macro with keyword parameters whose values, for example, are the first letters of the possible types of reports, or creating an icon for their desktop which, upon double-clicking, invokes a prompt screen with a list of choices they can mark with an “x”. As is the case with everything else, you have many choices.

REFERENCES

Molter, Michael (2004), “The Role of Consecutive Ampersands in Macro Variable Resolution and the Mathematical Patterns That Follow,” Proceedings of the Twenty-Ninth Annual SAS Users Group International Conference.

CONTACT INFORMATION

I am happy to answer any questions you may have regarding this subject. Please feel free to contact me in any of the following ways.

Mike Molter
Howard Proskin and Associates
300 Red Creek Drive
Rochester, New York
Phone: (585) 359-2420
Fax:
E-Mail:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.