Although part of the SAS Format facility for well over two decades, PICTURE Formats are often not well understood, and consequently tend to be underutilized by even experienced SAS users. Yet, they provide a wealth of tools to effectively portray the values of numeric variables and often avoid the need for either tedious data step coding or to create new variables in your data sets. This tutorial describes how PICTURE Formats “work,” how to write them in
PROC FORMAT statements and how to apply them in both Data and Procedure Steps. By mastering the concepts and techniques shown in this paper you will be able to apply the power of PICTURE formats in your SAS programs and in so doing often complete your data management, reporting and analysis tasks faster and easier than resorting to other potentially more tedious and time-intensive methods in the SAS System.
A Format contains the instructions used by the SAS System to display, or portray the values of variables. More formally, formats control the “external representation” of a variable’s values. There are two broad classes of SAS
Formats: VALUE Formats and PICTURE Formats. VALUE Formats are either “supplied” or “internal” to the SAS
System (that is, they ‘come with’ our installation of SAS System Software) or you can create your own using PROC
FORMAT and the VALUE Statement. VALUE Formats can be used to externally represent the values of either character or numeric variables.
Experienced SAS users are already aware of the broad array of tools, and the flexibility, offered by appropriate use of both SAS-supplied and user-created VALUE Formats. Using them enhances the appearance of our output, reduces ambiguity about the definitions of values of variables appearing in our reports, and can serve as a resource-saving alternative to aggregating (that is, “rolling up”) data from one unit of analysis to another. They are also exceptionally well suited for tasks such as “recoding” or “bucketing” the values of a variable into a smaller number of discrete groups. For more information about Value Formats, please see my paper, “My Friend the SAS Format,” in the SUGI 30 Proceedings.
Picture Formats are different from Value Formats in two critical respects. First, Picture Formats can be used ONLY with numeric variables. Second, a Picture Format creates a template that is used to display the values of numeric variables. (Note: A “Picture Format Template” is NOT the same thing as a Style or Table Template in the Output Delivery System.) Also, there are no “SAS-supplied” Picture Formats; rather, you create them using the PICTURE Statement in PROC FORMAT. Picture Formats are stored in a FORMATS Catalog in a SAS Library, just like Value
Formats. And, you can use PROC FORMAT tools such as the FMTLIB, CNTLIN, CNTLOUT and MULTILABEL
Options with Picture Formats as well as Value Formats.
So, what is a Picture Format Template? It is a series of commands that control how the values of numeric variables are displayed in your SAS-generated output. Once you master the core concepts and functionalities of Picture
Formats you will find them a powerful and flexible tool with which to enhance the quality of your reports and analyses.
To fix ideas, suppose we have a numeric variable in a data set representing a series of telephone numbers in the
United States. The first three digits are the area code, the next three are the “exchange” and the last four are the number itself. What we want to do is insert a hyphen (dash) between the third and fourth digit and between the sixth and seventh digit. Here’s an example data set, where the “raw data” are included within the Data Step creating the data set using a “Datalines” statement.
* example 1;
data phones1;
  input phone_number;
  datalines;
;
run;
options nonumber nocenter nodate;
proc print data=phones1;
  title 'Getting in to the Picture Format';
  title2 'Phone Number Data Set 1';
run;
The PROC PRINT-generated output is:
Now, what the boss really wants is to have the output displayed with hyphens between the area code and exchange and between the exchange and the remaining four digits of the telephone number. Since we don’t have a SAS-supplied “telephone number format,” there’s no help there. And, it’s pretty clear that a VALUE format isn’t going to help either, since it would replace each value (or range of values) with a fixed label rather than insert characters between the digits of the value.
One approach might be to turn the telephone number into a character, rather than a numeric, variable and then use a combination of the SUBSTR (substring) SAS Programming Language Function and the concatenation operator in a Data Step to “break apart” the “pieces” of the telephone number and then essentially reassemble them in a new variable with the required hyphens. A Data Step implementing this approach might look like this (with a PROC PRINT to display the results):
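A minimal sketch of such a Data Step follows. It assumes the phones1 data set and the phone_number variable from the example above, and that every value has ten digits. The data set name phones2 and the variables phone_char and phone_display are hypothetical names introduced here for illustration.

```sas
data phones2;
  set phones1;
  length phone_char $10 phone_display $12;
  /* numeric-to-character conversion of the ten-digit phone number */
  phone_char = put(phone_number, 10.);
  /* break the number apart with SUBSTR, then reassemble it with hyphens */
  phone_display = substr(phone_char, 1, 3) || '-' ||
                  substr(phone_char, 4, 3) || '-' ||
                  substr(phone_char, 7, 4);
run;

proc print data=phones2;
  var phone_number phone_display;
run;
```

Note that this approach adds two character variables to the data set just to change how one number is displayed.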
The output is shown on the next page:
Well, we got what we wanted, but at the expense of doing a lot of work in the Data Step (i.e., numeric-to-character conversion of the value of the original telephone number variable, then a fairly complex assignment statement to display the values of telephone number in the desired way). While there are other ways this Data Step could have been written, the “take-home message” here is that because of the tools available in the Picture Format facility, there’s no need to apply a Data Step approach to this problem in the first place.
EXAMPLE 1: USING A SIMPLE PICTURE FORMAT
Here’s how a Picture Format is created, and then applied to the telephone number data set. After seeing what it does, we’ll go over the syntax and options of this initial, and very basic Picture Format, and use it to set the stage for identifying more of the Picture Format’s functionalities.
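The sketch below is one way to write such a format and apply it. It assumes the phones1 data set and phone_number variable from the earlier example; the exact template shown ('999-999-9999') is an assumption that is consistent with the description in this section, not a copy of the original listing.

```sas
proc format;
  /* one value range (all non-missing values) mapped to one template */
  picture phone_a low-high = '999-999-9999';
run;

proc print data=phones1;
  /* associate the Picture Format with the numeric variable */
  format phone_number phone_a.;
run;
```

The digits of each value fill the nines in the template from right to left, and the hyphens are printed wherever they appear in the template.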
Let’s go over the syntax step by step. First, we’re calling, or starting, the Format Procedure with the PROC FORMAT statement, just like we would to create a Value Format. But, the PICTURE Statement tells the PROC that we’re about to create a Picture Format, the name of which is given immediately to the right of “PICTURE.” So far, we know we’re creating a Picture Format called “phone_a.” Next, we’re supplying the range of values to which the Picture is going to be applied. In this example, we’re using the ‘low’ and ‘high’ keywords, separated by a dash, to instruct PROC FORMAT that the Picture Format we’re creating will be used to display all (non-missing) values of the variable with which it will be associated in either a subsequent Data or Procedure Step. To the right of the equals sign is the template, or instructions as to how the value of the variable is to be displayed. The strings of “9’s” are called digit selectors and will be explained in more detail later in this paper.
Once the Phone_a Picture Format is created, we can associate it to the values of a variable in, say a PROC PRINT
Step, and see the results. Here they are:
Just from this basic example we can see that the Picture Format gave us exactly what we needed without a lot of tedious, potentially error-prone and inefficient Data Step programming. With the Picture Format, we did NOT need to do anything other than create the format and then associate it to a variable to obtain the data display we need.
Now, let’s identify some other basic, but very useful aspects of the Picture Format facility before delving into its details and advanced capabilities. Suppose, having seen the fine job you’ve done on the previous task, the boss changes her mind and asks that the area code be enclosed in parentheses, with one space between the right parenthesis and the exchange, and she still wants a hyphen between the exchange and the rest of the telephone number. Most of you know this as a typical “boss question” that usually starts with something like “Well, this is nice, but how hard would it be to make a couple of tiny changes…”
Fortunately, the Picture Format’s PREFIX option will give us just what we want. While we’ll get into the details of digit selectors and other details of the Picture Format facility shortly, one core rule for Picture Formats is that if you use digit selectors as the template for displaying your data, the first position of the template must be a digit selector. That means we can’t make the left parenthesis we need part of the template itself, since it is obviously not a digit.
Instead, we will instruct PROC FORMAT to make the left parenthesis the prefix to the template, which will start with the (required) digit selector. Here we go:
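Here is a hedged sketch of a format along these lines. The name phone_b and the exact template are assumptions. The PROC FORMAT documentation notes that a prefix is only used with zero digit selectors and that the picture must be wide enough to hold both the value and the prefix, so this sketch adds one extra leading zero digit selector to make room for the left parenthesis.

```sas
proc format;
  picture phone_b (default=16)
    /* 11 zero digit selectors for a 10-digit value:    */
    /* the unused leading position receives the prefix  */
    low-high = '0000) 000-0000' (prefix='(');
run;

proc print data=phones1;
  format phone_number phone_b.;
run;
```

The right parenthesis, the blank and the hyphen are ordinary characters in the template; only the left parenthesis has to be supplied through the PREFIX option.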
This PROC FORMAT task shows you two important options to the PICTURE statement. First, we’re making the default length of the Picture Format 16 characters, which is wide enough to accommodate BOTH the template AND the specified prefix. The prefix we want is given in the PREFIX option, which, like the DEFAULT option, is enclosed in parentheses. Be careful: The PREFIX option is in the parentheses, and the value of the option is a left parenthesis symbol enclosed in single quotes. The PROC PRINT output is:
So, with just a small amount of additional work in the PROC FORMAT step, we have exactly what the boss wants
(until, of course, she changes her mind again) without resort to complex and tedious Data Step coding.
One more fairly basic example and then we’ll take a more detailed look at
PICTURE Statement syntax, options, rules and more advanced capabilities.
Suppose we have a data set that looks like the one shown to the left.
In the United States, some of the values of Number in this data set correspond to local numbers (seven digits, three for the exchange and four for the number) and others to long distance numbers (ten digits, with the first three representing the area code). We also have three “bad” observations in the data set. A valid local number (again, in the USA) has to be at least seven digits long and its first digit must be at least two. Valid area codes in long distance numbers also start with a digit of at least two. Under these “rules,” observations 5, 6 and 10 have invalid values of a telephone number. What we now need to do is create a picture format that has three “rules” or value ranges to it: 1) if the length of the telephone number is ten digits and the first digit is at least two, then we want the area code enclosed in parentheses and a hyphen separating the exchange from the number; or, 2) if the length of the telephone number is seven digits and the first digit is at least two, then we want a hyphen separating the exchange from the number; or, 3) if the first digit of a ten or seven digit number is a 1, or the length of the phone number is LESS than seven digits, then we want “Invalid Number” displayed in our output. Needless to say, coding these rules in a Data Step, especially when we are “starting” with a numeric variable, could be very tedious.
Fortunately, the Picture Format facility lets us avoid a lot of unnecessary programming and still get what we need, quickly and easily. Here’s the solution:
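One way to express the three rules as value ranges is sketched below. The range boundaries follow from the rules just stated (seven-digit and ten-digit numbers whose first digit is at least two). The NOEDIT option, the OTHER range and the extra leading zero digit selector that makes room for the prefix are assumptions added for this sketch.

```sas
proc format;
  picture phone_c (default=16)
    /* fewer than seven digits, or seven digits starting with 1 */
    low        - 1999999    = 'Invalid Number' (noedit)
    /* valid local numbers */
    2000000    - 9999999    = '999-9999'
    /* eight or nine digits, or ten digits starting with 1 */
    10000000   - 1999999999 = 'Invalid Number' (noedit)
    /* valid long distance numbers */
    2000000000 - 9999999999 = '0000) 000-0000' (prefix='(')
    /* anything else, including missing values */
    other                   = 'Invalid Number' (noedit);
run;
```

Because the ranges are defined on the numeric value itself, no test of the “length” of the number is needed: every seven-digit number lies between 1000000 and 9999999, and every ten-digit number between 1000000000 and 9999999999.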
In this example, we can see how a series of value ranges was supplied, each with a different picture template. So, the variable’s display is controlled by its internal value. The results are shown below and to the left:
The three examples of Picture Formats we’ve seen so far should be enough to convince you that they offer a powerful range of tools to display or portray the values of numeric variables without extensive, tedious Data Step coding. So, having seen these examples, I hope you’ll want to continue reading this paper to see even more tools and capabilities of Picture Formats and how you can apply them in your work.
Picture Format names can be up to 32 characters in length starting with the release of SAS 9.1 Software. The name you give to a Picture Format cannot be the name of a SAS-Supplied format, nor may it end in a number. Picture
Formats are used to display the values of numeric variables. Like Value Formats, Picture Formats can be stored in either a temporary or permanent Formats Catalog. Although not discussed in this paper, the new (to SAS 8, and enhanced in SAS 9) MULTILABEL option can be used to create a Picture Format with overlapping value ranges. (For more information, please see the PROC FORMAT documentation and/or my paper “Using Multilabel Formats,” available for download at www.SierraInformation.com.)
A picture may be up to 40 characters in length. A ‘picture’ or ‘template’ is a series of characters in single quote marks. The characters forming the ‘template’ or ‘picture’ can be one of three types:
o Digit selectors, which are numeric characters ranging from zero to nine defining positions in which the values of numbers in the variable will be displayed. As we will soon see, there is a critical difference in results when you apply a “zero digit selector” versus a “non-zero digit selector.”
o Message characters, which are non-numeric characters that will be printed in the picture. For an example of message characters, see the Phone_C Picture Format above, where some values of telephone number were displayed as either “Invalid Long Distance Number” or “Invalid Local Number.”
o Directives, which control the display of date, time or datetime variables. These special characters require specification of the DATATYPE= option in the PICTURE Statement, and will be discussed in detail below.
Perhaps one of the most confusing aspects of Picture Formats is the “digit selector.” But, by working through a few examples, we’ll see how digit selectors work and how to specify them correctly for your particular data presentation needs. First, let’s review some core concepts: 1) a Picture Format is creating a template (or ‘picture’) that will display the values of a numeric variable in your SAS-generated output; 2) if you are using a Picture Format to display numeric values (as opposed to message characters), you’ll need to tell PROC FORMAT how many “slots” or “spaces” in the Picture Format are needed to display the values of the numeric variable and what to display if there’s no value to display in a “slot.” That’s what the digit selector does.
A digit selector is either a zero (0) or one of the numbers one (1) through nine (9). Most SAS users, and the PROC FORMAT documentation, use zero and nine as digit selectors, so that’s the convention I’ll apply in this paper. When you specify zero as the digit selector, any leading zeros in the number to be displayed are shown as blanks. When nine is specified as the digit selector, the leading zeros are displayed in the output. Perhaps the easiest way to remember how digit selectors work is the saying taught to me by Pete Lund of Looking Glass Analytics, who has also written extensively about SAS Formats (see below). The saying is: “Nines print zeros, and zeros print blanks.”
Let’s take a look at how digit selectors are used in a Picture Format, and what happens when you use either a zero or a nine for a digit selector. The example data set below shows some made-up values for sales of parts in an automobile store.
First, a Picture Format with a string of nines as digit selectors is created and applied to the variable SALES.
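A small sketch of this first attempt is shown below. The format name nines_fmt, the template and the three sales values are made up for illustration; they are not the data from the example data set.

```sas
proc format;
  /* every position is a nine, so unused positions print as zeros */
  picture nines_fmt low-high = '999,999';
run;

data _null_;
  do sales = 0, 42, 1250;           /* made-up values */
    put sales best8. +2 sales nines_fmt.;
  end;
run;
```

Under the “nines print zeros” rule, the value 42 is written to the log as 000,042.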
The output looks like this:
Using a series of nines as the digit selectors results in having zeros displayed in the output for every “position” in the picture template for which there was no value of the variable to which it was applied. Remembering that “nines print zeros and zeros print blanks,” one potential approach to a better-looking result might be to replace all the nines with zeros. What happens when we do that?
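The same experiment with zeros in every position can be sketched as follows; again, the format name zeros_fmt and the values are hypothetical.

```sas
proc format;
  /* every position is a zero, so leading zeros print as blanks */
  picture zeros_fmt low-high = '000,000';
run;

data _null_;
  do sales = 0, 42, 1250;           /* made-up values */
    put sales best8. +2 sales zeros_fmt.;
  end;
run;
```

Here the value 42 is shown as 42 preceded by blanks, but a value of 0 has no significant digit at all, so its formatted value is entirely blank.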
The output now looks like this:
Well, by using zeros as our digit selectors we’ve managed to address the problem with leading zeros. But, remember, since “zeros print blanks,” if the value of the variable to which the Picture Format is applied is a zero, then the formatted value that appears in our output is a blank! In many situations, that can lead to some confusion…in our example, having a ‘blank’ value of sales for distributor caps is misleading. Does it mean we sold no distributor caps, or does it mean we are missing data in the source data set for this value of the parts variable? Since we don’t want to give our clients/customers confusing reports, one way to solve this problem is to use both zeros and nines as digit selectors in the same Picture template. Here’s an example that will give us the solution we need:
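A sketch of a combined template follows. The format name mixed_fmt and the values are hypothetical; the key point is that the last position is a nine, so at least one digit is always printed.

```sas
proc format;
  /* zeros suppress leading zeros; the final nine forces one digit */
  picture mixed_fmt low-high = '000,009';
run;

data _null_;
  do sales = 0, 42, 1250;           /* made-up values */
    put sales best8. +2 sales mixed_fmt.;
  end;
run;
```

With this template a value of 0 is displayed as a single 0, while 1250 is displayed as 1,250 without leading zeros.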
In this Picture Format I’ve combined both zero and nine digit selectors in one template. The result, shown below, gives us exactly what we need. If you’re creating a picture format to display numeric variables that may have values of zeros, I’d recommend your using an appropriate combination of zero and nine digit selectors so that you have zero values displayed in your output.
In my opinion, two of the most useful and powerful tools in the Picture Format “arsenal” are the MULTIPLIER (or MULT) and ROUND options. With these, we can usually obtain the output we need without first “pre-processing” observations in a potentially tedious, time-consuming or resource-intensive Data Step.
Let’s first take a look at the MULT option, and then the ROUND Option, and then we’ll use both in one Picture Format.
THE MULT OPTION
This option allows you to provide a constant by which the value of the variable is multiplied before it is formatted. With it, you can easily carry out tasks such as “rounding up” financial data to the nearest thousand (or some other appropriate value) or converting values from one unit of measurement to another (e.g., from inches to centimeters or from US dollars to another currency). Using the MULT option is not only easy, but it avoids unnecessary Data Step processing and allows you to easily change the value of the multiplier if, for example, you are using it to apply currency exchange rates that change between each “run” of a report.
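As a small illustration of the unit conversion case, here is a sketch that displays inches as centimeters. The format name in_to_cm and the test values are hypothetical. The decimal point in a template is just a character to be printed, so to show two decimal places the multiplier is the conversion factor times 100 (2.54 x 100 = 254).

```sas
proc format;
  /* value * 254 supplies the digits; the template places the decimal point */
  picture in_to_cm low-high = '0009.99' (mult=254);
run;

data _null_;
  do inches = 1, 10, 36;            /* made-up values */
    put inches best8. ' in = ' inches in_to_cm. ' cm';
  end;
run;
```

For example, 10 inches becomes 2540 after multiplication, and those four digits fill the template to give 25.40. Changing the conversion only requires editing the MULT value and re-running PROC FORMAT; the stored data are untouched.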
Here is a PROC REPORT task that generates a report from some (simulated) credit card transaction data. This very
powerful PROC is used to group and sum credit card charges by year and credit card used. This data set has over
265,000 observations in it, so processing time may be something to keep in mind as we consider requests to change the report.
The output is:
So far, so good. But, what do we do when the boss asks her typical “how hard would it be” question: can we display the credit card charges rounded to the nearest thousand dollars? Some might want to create a new data set, and create a new variable in that data set, where each of the more than quarter-million individual records had their values of the variable charge_amount rounded to the nearest thousand, and then have PROC REPORT re-generate the analysis we need.
USING THE ROUND OPTION
We can avoid the Data Step with the MULT Option, and avoid any potential truncation problems by also specifying the ROUND Option. Without the ROUND Option, PROC FORMAT would multiply the variable’s value by the multiplier, truncate any decimal portion of the result and then display it according to the defined template. When you specify the ROUND Option and the MULT Option, PROC FORMAT 1) multiplies the variable’s value by the supplied multiplier, 2) rounds the result to the nearest integer and 3) formats the value according to the template. According to the PROC FORMAT documentation, a value of exactly .5 is rounded to the next highest integer.
For complete details on the steps PROC FORMAT follows to build Picture Formats, see Chapter 23 of the BASE SAS Procedures Documentation, and in particular, Table 23.1 (“Building a Picture Format”).
The code sample below shows how the MULT Option is added to the Picture Format, and then how the Format is applied in a PROC REPORT Define Statement. Then, the output generated by the PROC REPORT task is displayed.
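A hedged sketch of such a format and report is given below. The variable charge_amount comes from the example in this paper; the data set name card_trans, the grouping variables trans_year and card_type, and the format name thousands are hypothetical names used only for illustration.

```sas
proc format;
  /* multiply by .001, round to the nearest integer, then fill the template */
  picture thousands low-high = '000,000,009' (mult=.001 round);
run;

proc report data=card_trans nowd;   /* hypothetical data set name */
  column trans_year card_type charge_amount;
  define trans_year    / group 'Year';
  define card_type     / group 'Credit Card';
  /* the Picture Format is applied to the summed value at display time */
  define charge_amount / analysis sum format=thousands. 'Charges (000s)';
run;
```

No new data set or variable is created: the individual records keep their original values, and only the displayed sums are scaled and rounded.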
The resulting output looks like this:
USING PICTURE FORMATS TO CONVERT VALUES OF VARIABLES
In my opinion, the most powerful application of Picture Formats is to carry out conversions of variable values from one unit of measurement to another without having to use a Data Step to operate on every observation in the source data set. Instead, a Picture Format can give us what we want and affords greater flexibility in our programming.
Perhaps the best example of how Picture Formats can help us to convert values of one variable to another is a conversion of numeric variables from one currency to another. This is a fairly common requirement for SAS users working with data that are drawn from financial/accounting systems in many countries, each with its own currency.
And, since the rate at which currencies are exchanged changes frequently, a SAS program that takes data stored in one currency and displays it in another currency needs to be updated with the latest exchange rate prior to its execution.
Here is an example of how a Picture Format is used to display values that are stored in US dollars in the credit card transaction file seen above as the equivalent value in Japanese Yen. The Picture Format code below shows how the value supplied to the MULT option is the exchange rate on the date I first generated this example. Obviously, if this is something you need to do at your job on an ongoing basis, it’s very easy to update the value in the MULT option with the latest exchange rate just before you run your program again. Here are the PROC FORMAT and PROC REPORT steps that generate the desired report. Notice that I’ve modified the PROC REPORT task so that the output it generates clearly indicates that the values displayed are in Japanese Yen.
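The sketch below shows the general shape of such a program. The exchange rate of 117 Yen per US dollar is a placeholder, not the rate used when this example was first run, and the macro variable jpy_per_usd, the format name usd_to_jpy, the data set card_trans and the grouping variables are hypothetical names.

```sas
%let jpy_per_usd = 117;   /* placeholder rate; update before each run */

proc format;
  /* US dollars * rate, rounded to the nearest whole Yen */
  picture usd_to_jpy low-high = '000,000,000,009'
          (mult=&jpy_per_usd round);
run;

proc report data=card_trans nowd;   /* hypothetical data set name */
  column trans_year card_type charge_amount;
  define trans_year    / group 'Year';
  define card_type     / group 'Credit Card';
  define charge_amount / analysis sum format=usd_to_jpy.
                         'Charges (Japanese Yen)';
run;
```

Keeping the rate in a single macro variable is one design choice that makes the update a one-line change; editing the MULT value directly works just as well.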
The resulting output is shown on the next page:
SUGI 31 Tutorials
USING PICTURE FORMATS IN VARIABLE ASSIGNMENT STATEMENTS
Picture Formats can also be used to assign the values of new variables in Data Steps. While I generally want to avoid creating “extra” variables in data sets, there may be times when we need to assign values to a new variable based on the commands placed in a Picture Format. Here is an example: suppose the boss asks: “How hard would it be to give me an Excel™ spreadsheet that shows the amount of credit card transactions by credit card and year, with separate columns showing total charge amounts in US Dollars, Japanese Yen and British Pounds?” Here is what we will do:
1) Create Picture Formats for Great Britain Pounds (GBP) and Japanese Yen (JPY). The variable charge_amount already “holds” the data in US Dollars.
2) Use the PUT Function in a Data Step to create two new variables, each of which applies a previously created Picture Format to the values of charge_amount.
3) Before exporting the results to an Excel workbook, we will look at the results of steps 1 and 2 using PROC
PRINT. If we are happy with them, then a tool such as PROC EXPORT or the Export Wizard can be employed to create an Excel workbook from the data set.
First, let’s create the appropriate Picture Formats:
Now, we’ll use these Picture Formats to assign values to new variables in a Data Step and then look at the results:
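Steps 1 and 2 can be sketched together as follows. The exchange rates (0.54 Pounds and 117 Yen per US dollar) are placeholders, and the format names gbp_fmt and jpy_fmt, the data sets card_trans and card_trans_fx, and the new variables charge_gbp and charge_jpy are hypothetical names. Only charge_amount comes from the example itself.

```sas
proc format;
  /* placeholder rates: 0.54 GBP and 117 JPY per US dollar       */
  /* GBP shows two decimals, so the multiplier is 0.54 * 100 = 54 */
  picture gbp_fmt low-high = '000,000,009.99'  (mult=54 round);
  picture jpy_fmt low-high = '000,000,000,009' (mult=117 round);
run;

data card_trans_fx;
  set card_trans;                   /* hypothetical data set name */
  length charge_gbp charge_jpy $16;
  /* PUT applies each Picture Format and returns a character value */
  charge_gbp = put(charge_amount, gbp_fmt.);
  charge_jpy = put(charge_amount, jpy_fmt.);
run;

proc print data=card_trans_fx (obs=10);
  var charge_amount charge_gbp charge_jpy;
run;
```

Keep in mind that the PUT Function always returns a character value, so the two new variables are character variables holding the formatted text rather than numbers.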
The results are:
The last section of this paper discusses the use of directives, which are used to format the values of date, time or datetime variables in Picture Formats. Even though we have a wide range of useful SAS-supplied Value Formats to associate with variables representing dates and times, these Picture Format directives give us an even broader array of ways to display this type of data. In order to apply directives in a Picture Format, you need to include the DATATYPE= option in the Picture Statement. The valid values of this option are DATE, TIME or DATETIME.
This table shows some of the directives you can use in a Picture Format to customize the display of values of your date, time or datetime variables. Please consult the PROC FORMAT documentation for a complete list.
To fix ideas about how directives might be used to create a customized Picture Format, below is an excerpt from the credit card transaction file that’s been used for several other examples in this paper. Notice that the variable trans_date is a
SAS Date variable (that is, an integer representing the number of days from January 1, 1960.)
Here is an example of a PROC FORMAT task where a Picture Format is created using directives to display the values of the SAS date variable trans_date.
Next, the date_b_fmt Picture Format is associated to the variable trans_date in the following PROC PRINT step:
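A sketch of these two steps is shown below. The format name date_b_fmt and the variables trans_date and charge_amount come from this paper; the particular directives chosen, the DEFAULT length and the data set name card_trans are assumptions made for illustration. The directives used are %A (full weekday name), %B (full month name), %d (day of the month) and %Y (four-digit year).

```sas
proc format;
  /* DEFAULT= is set because the formatted text (e.g., a long weekday */
  /* and month name) is wider than the template string itself         */
  picture date_b_fmt (default=32)
    low-high = '%A, %B %d, %Y' (datatype=date);
run;

proc print data=card_trans (obs=10);   /* hypothetical data set name */
  var trans_date charge_amount;
  format trans_date date_b_fmt.;
run;
```

The template is enclosed in single quotes so that the percent signs are not interpreted by the macro processor.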
The output is:
The Picture Format facility contains a wealth of tools you can use to work effectively, and efficiently, with your data. Taking the time to master the concepts and capabilities of this aspect of the SAS System can provide you with a range of enhanced capabilities to portray, manage and analyze your data.
Karp, Andrew H., “My Friend the SAS Format,” Proceedings of the Thirtieth Annual SAS Users Group International Conference, Cary, NC: SAS Institute, Inc., 2005
Lund, Pete, “More Than Just VALUE: A Look Into the Depths of PROC FORMAT,” Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference, Cary, NC: SAS Institute, Inc., 2002
SAS Institute, Inc., BASE SAS 9.1.3 Procedures Guide, Cary, NC: SAS Institute, Inc., 2004
Thanks to Rick Langston of SAS Institute’s Research and Development unit for his insights in to the SAS Format facility and for answering several questions I had while preparing this paper. I’d also like to thank Pete Lund of
Looking Glass Analytics, Olympia, Washington, for his advice and insights during the development of this paper.
Your comments and questions are valued and encouraged. Please contact me at:
Andrew H. Karp
Sierra Information Services
19229 Sonoma Hwy PMB 264
Sonoma, CA 95476 USA
707 996 7380 voice
www.sierrainformation.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.