SCO® UNIX® Development System

Programming Tools Guide
© 1983-1992 The Santa Cruz Operation, Inc.
© 1980-1992 Microsoft Corporation.
© 1989-1992 UNIX System Laboratories, Inc.
All Rights Reserved.
No part of this publication may be reproduced, transmitted, stored in a retrieval system,
nor translated into any human or computer language, in any form or by any means,
electronic, mechanical, magnetic, optical, chemical, manual, or otherwise, without the prior
written permission of the copyright owner, The Santa Cruz Operation, Inc., 400 Encinal,
Santa Cruz, California, 95060, U.S.A. Copyright infringement is a serious matter under the
United States and foreign Copyright Laws.
The copyrighted software that accompanies this manual is licensed to the End User only
for use in strict accordance with the End User License Agreement, which should be read
carefully before commencing use of the software. Information in this document is subject
to change without notice and does not represent a commitment on the part of The Santa
Cruz Operation, Inc.
The following legend applies to all contracts and subcontracts governed by the Rights in
Technical Data and Computer Software Clause of the United States Department of Defense
Federal Acquisition Regulations Supplement:
95060, U.S.A.
Microsoft, MS-DOS, and XENIX are registered trademarks of Microsoft Corporation.
DEC, PDP, VAX, and VT100 are trademarks of Digital Equipment Corporation.
Intel is a registered trademark of Intel Corporation.
UNIX is a registered trademark of UNIX System Laboratories, Inc.
Date: 10 December 1991
Document Version: 3.2.4D
Table of contents

Creating programs
lex and yacc
Maintaining program source files
Using this guide
Documentation conventions
Development System documentation set
Using the Documentation
Commercial books and articles

Compiling and linking C language programs
Compiling simple programs
cc options
Specifying input files
File extensions
Specifying source files
Specifying output files
Naming object files
Naming the executable file
Linking
Linking with additional libraries
Error and warning messages
Setting the warning level
Checking syntax
Preparing for debugging
Standards conformance
Portable Operating System for UNIX (POSIX)
X/Open Portability Guide 3 (XPG3)
System V Interface Definition Issue 2 (SVID)
Intel Binary Compatibility Standard 2 (IBCS2)
Compiling programs for XENIX
Compiling programs for DOS and OS/2

The Link Editor
Sections
Memory configuration
Object files
Link editor command language
Assignment statements
Specifying a memory configuration
Section definition directives
Changing the entry point
Using archive libraries
Allocation algorithm
Incremental link editing
Output file blocking
Nonrelocatable input files
Syntax diagram for input directives

lint message types
Unused variables and functions
Set/used information
Flow of control
Function values
Type checking
Type casts
Non-portable character use
Assignments of longs to ints
Unusual constructions
Multiple uses and side effects

C Programmer's Productivity Tools
Introducing the C Programmer's Productivity Tools
Creating a profiled version of a program
Running the profiled program
The PROFOPTS environment variable
Examples of using PROFOPTS
Interpreting profiling output
Viewing the profiled source listing
Specifying program and data files to lprof
Files needed by lprof
Source listing for a subset of files
Summary option
Merging option
Cautionary notes on using lprof
Improving performance with prof and lprof
Improving test coverage with lprof
Using lprof with rcc
cscope
Configuring the environment
Using cscope
Running cscope
The cross-reference file
A tutorial example: locating the source of the error message
Conditional compilation directives
Examples of using cscope
Changing a text string
Adding an argument to a function
Changing the value of a variable

Basic features
makefiles and substitutions
Dependency line syntax
Dependency information
Macro definitions
Executable commands
Output translations
Recursive makefiles
Suffixes and transformation rules
Implicit rules
Archive libraries
Tildes (~) in SCCS Filenames
The null suffix
Creating new suffix rules
include files
Dynamic dependency parameters
Environment variables
Suggestions and warnings
Internal rules

Source code control system (SCCS)
SCCS for beginners
Creating an SCCS file using admin
Retrieving a file by means of get
Recording changes by using delta
Additional information about get
Delta numbering
SCCS command conventions
x.files and z.files
SCCS commands
Error messages
The help command
The get command
The delta command
The admin command
The prs command
The sact command
The rmdel command
The cdc command
The what command
The sccsdiff command
The comb command
The val command
The vc command
SCCS files

Shared libraries
What is a shared library?
Building an a.out file
Deciding whether to use a shared library
Space considerations
Saving space
Increase space usage in memory
Coding an application
Identifying a.out files that use shared libraries
Debugging a.out files that use shared libraries
Implementing shared libraries
The host library and target library
The branch table
Building a shared library
The building process
Guidelines for writing shared library code
Choosing library members
Include large, frequently used routines
Exclude infrequently used routines
Exclude routines that use much static data
Exclude routines that complicate maintenance
Include routines the library itself needs
Changing existing code for the shared library
Minimize global data
Using the specification file for compatibility
Importing symbols
Providing compatibility with non-shared libraries
Tuning the shared library code
Checking for compatibility
An example

Writing lex programs
The rules section
The definitions section
The subroutines section
Advanced lex usage
Disambiguating rules
Context sensitivity
lex I/O routines
Routines to reprocess input
Using lex with yacc
Using lex under UNIX systems
Compilation
Execution
Using make with lex

Basic specifications
The rules section
Terminal and non-terminal symbols
Actions
The declarations section
The start symbol
C declarations
Support for arbitrary value types
Other declarations
The subroutines section
Lexical analysis
lex and yylval
Token numbers
The end marker
Reserved token names
The yacc environment
Compiling and running the parser
Compiling
Running the parser
Parser operation
The shift action
The reduce action
Ambiguity and conflicts
Disambiguating rules
Precedence
Assigning a precedence to grammar rules
Error handling
The error token
Interactive error recovery
Hints for preparing specifications
Input style
Left recursion
Lexical tie-ins
Reserved words
A simple example
An advanced example

m4: A macro processor
Invoking m4
Defining macros
Changing the quoting marks
Using arguments
Using built-in arithmetic functions
Manipulating files
Using system commands
Using conditionals
Manipulating strings

Appendix A
ANSI implementation-defined behavior
Identifying diagnostics
Arguments to main()
Interactive devices
Significant characters without external linkage
Significant characters with external linkage
Significance of character case
Source and execution character sets
Multi-byte shift states
Bits per character
Mapping character sets
Constants with unrepresented characters and escape sequences
Constants with multiple or wide characters
Locale used for multi-byte conversion
Range of char values
Integer range and representation
Demotion of integers
Signed bitwise operations
Sign of division remainder
Right shift of negative-valued signed integer
Floating point
Floating-point range and representation
Converting an integer to a floating-point
Converting a floating-point to a narrower floating-point
Arrays and pointers
Largest array size
Casting pointers
Pointer subtraction
Using registers
Structures, unions, enumerations, and bit-fields
Improper access to a union
Padding and alignment of members of structures
Sign of bit-fields
Order of allocation of bit-fields
Alignment of bit-fields
The type of values of an enumerated type
Access to volatile objects
Maximum number of declarators
Maximum number of case values
Preprocessing directives
Character constants and conditional inclusion
Locating includable source files
Including files with quoted names
Character sequences
Definitions for date and time
Library functions
Expanding the NULL macro
Diagnostic printed by the assert function
Character testing
Math functions and domain errors
Underflow of floating-point values
Domain errors and the fmod function
The signal function
Default signals
Signal blocking
The SIGILL signal
Terminating new-line characters
Space characters before a new-line character
Null characters appended to a binary stream
File position in append mode
Writing on text stream
File buffering
The existence of zero-length files
Composing valid file names
File access limits
Removing open files
Renaming with a name that exists
Output of pointer values
Input of pointer values
Reading ranges
File position errors
Messages generated by the perror function
Allocating zero memory
The abort function and open and temporary files
The exit function
Environment names
The system function
The strerror function
The time zone
The clock function
Locale-specific behavior
Content of execution character set
Direction of printing
Decimal point character
Character testing and case mapping
Collation sequence
Time and date formats
C6.0 Implementation limits description
Environmental limits
Translation limits

Appendix B
Compiler exit codes and error messages
Compiler exit codes
Command-line error messages
Command-line fatal-error messages
Command-line error messages
Command-line warning messages
Compiler error messages
Fatal-error messages
Compilation-error messages
Warning messages
This guide explains how to use the UNIX Development System to create and
maintain C programs. The system provides a broad spectrum of programs
and commands to help you design and develop application and system software.
The following sections introduce the programs and commands of the UNIX
Software Development System.
Creating programs
The C programming language can meet the needs of most programming projects.
The UNIX Development System provides a full set of standards-conforming
libraries that support you in developing portable software.
You can use the cc command to compile and link C programs. This command
also accepts and processes object files, library archives, and assembly language
files.
The link editor ld links relocatable object files to produce executable programs.
Note that the cc command invokes the linker automatically, so the use
of ld is optional.
lex and yacc
You can create source files for lexical analyzers and parsers using the program
generators lex and yacc. You use lexical analyzers in programs to pick patterns
out of complex input and convert these patterns into meaningful values
or tokens. You use parsers in programs to convert meaningful sequences of
tokens and values into actions. The lex program generates lexical analyzers,
written in C, from given specification files. The yacc program generates
parsers, written in C, from given specification files. You can use lex and yacc
together to make complete programs.
You can preprocess C and assembly language source files, or even lex and
yacc source files, using the m4 macro processor. The m4 program performs
several preprocessing functions, such as converting macros to their defined
values and including the contents of files into a source file.
Maintaining program source files
You can automate the creation of executable programs from C and assembly
language source files and maintain your source files using the make program
and the SCCS (Source Code Control System) commands.
The make program is the UNIX program maintainer. It automates the steps
required to create executable programs and provides a mechanism for ensuring
that programs are up-to-date.
You can use the make(CP) command to simplify the tasks of maintaining,
updating, and regenerating your executable programs. This tool is especially
useful for large programming projects, because it ensures that all necessary
compilations are performed, and minimizes the time wasted by unnecessary
compilation. It can also be used to advantage with smaller programming projects.
make executes the instructions in the makefile to update one or more target
files. It checks all of the components of the specified target file to ensure that
only the portions that have been changed since the target file was last created
are compiled. After compiling all updated source files, make can call the
linker to create the executable file. make can be used to run UNIX shell commands
so that it can take advantage of other operating system features.
make follows certain filename extension conventions that direct it to call
other processors such as lex(CP) and yacc(CP). Files that need to be processed
by one of these utilities can be included in the makefile.
make can extract files from SCCS when it updates your target file(s). See the
following section for more information on SCCS.
The Source Code Control System (SCCS) is an integral part of the Development
System. The SCCS commands allow you to maintain a history of the
changes to your source files. SCCS keeps all previous versions of your source
files so that you can trace changes, or even regenerate earlier versions.
Refer to the SCCS chapter in this guide for more information on using SCCS.
Using this guide
This guide is intended for programmers who are familiar with the C programming
language and the UNIX system. The following list briefly describes each chapter:
Chapter 1, "Compiling and Linking Your Program," explains how to compile
and link using the cc command.
Chapter 2, "The Link Editor," contains information on the link editor and its
command language, which can be used to modify the link editor's default behavior.
Chapter 3, "lint," describes lint, which examines C language source files to
detect bugs and obscure code constructions.
Chapter 4, "C Programmer's Productivity Tools," teaches you how to use
cscope, a browser, and lprof, a line profiling tool.
Chapter 5, "make," describes make, which helps you keep track of file-to-file
relationships, the order of command executions, and general file maintenance.
Chapter 6, "Source Code Control System (SCCS)," explains how to control and
maintain all versions of a project's source files using the SCCS commands.
Chapter 7, "Shared Libraries," describes the C shared libraries.
Chapter 8, "lex," explains how to create lexical analyzers using the program
generator, lex.
Chapter 9, "yacc," explains how to create parsers using the program generator,
yacc.
Chapter 10, "m4," explains how to embed m4 macros in your files and preprocess
them.
Documentation conventions
SCO documents use font changes and other typographic conventions to
distinguish text elements. The following table shows these conventions:
Font conventions
Example            Used for
cc or cc(CP)       commands. The "CP" indicates the manual page section in
                   which the command is documented.
open or open(S)    system calls, library routines, kernel functions, C keywords.
                   The "S" indicates the manual page section in which the
                   command is documented.
b . errno          structure member
                   environment or shell variable
                   named constant
                   data value
                   user input
Development System documentation set
The contents of the SCO UNIX System V/386 Development System documentation
set are illustrated here. In addition to these books, one set of Release and
Installation Notes is shipped with each copy of the Development System software.
Developer's Overview
Programmer's Reference Manual (2 volumes)
Programming Tools Guide
Developer's Topics
Debugging Tools Guide
User Interfaces Guide
Device Driver Writer's Guide
Macro Assembler Writer's Guide
The books included with the Development System are described here.
Developer's Overview
introduces the Development System facilities and gives general in­
formation about developing software to run on sea UNIX systems
and the supported cross-development environments
Programmer's Reference Mll nual
a two-volume set that includes manual pages for the entire Devel­
opment System, including sections (CP), (DOS), (FP), (S), and
(XNX). See the "Manual pages" article in the Encyclopedia for a
description of these manual page sections.
Encyclopedia
contains articles that give background information about system
internals, descriptions of facilities, and other general issues. Arti­
cles are arranged alphabetically.
Developer's Topics
a collection of technical papers about topics of interest to users of
the Development System. Many of these papers include extensive
examples that illustrate features that SCO added to the porting
Programming Tools Guide
provides generally useful information about programming tools
and their use in developing software. It also provides
implementation-specific details about the ANSI-conforming C
compiler that is provided with the Development System.
Debugging Tools Guide
provides generally useful information about the debugging tools,
and their use in tracking down and eradicating problems in C and
assembly language programs. In addition, it includes information
on using dbXtra to debug C and C++ programs in a Motif-based
windowing environment.
User Interfaces Guide
introduces the facilities that are available for developing user
interfaces, and gives detailed instructions for using curses(S), the
Extended Terminal Interface (ETI), and writing user interfaces that
can be run on ASCII terminals. This book includes numerous
examples of curses programs.
In addition, the following books are available for Development System custo­
mers who need them:
Device Driver Writer's Guide
Provides guide and reference information describing how to write
device drivers for SCO UNIX systems. This book is sold separately.
Macro Assembler Writer's Guide
Provides information on the assembly language for the Intel 286,
386, and 486 processors. Developers who are writing device
drivers or applications that have tight performance requirements
may want to write portions in assembly language. Return the
coupon enclosed with the Development System documentation to
receive a copy.
Using the Documentation
We recommend the following approach to using this documentation set:
1. All users should read the Developer's Overview thoroughly, soon after
installing the software. Experienced users may skim some sections, but this
book gives a comprehensive overview of the Development System's capabilities.
2. Use the Encyclopedia to look up topics about which you need more
information. The Encyclopedia gives both summary and background information
on many topics, and includes cross-references to sources of more
in-depth information.
3. The Programmer's Reference Manual contains complete reference informa­
tion about all capabilities provided in the Development System. New
users are warned, however, that the manual pages provide reference
material that may be difficult to utilize if they do not understand basic
The Programmer's Reference Manual contains separate sections for Development
System commands (CP), system services (S), Development System
file formats (FP), and other topics. The pages in each section of the
Programmer's Reference Manual are arranged alphabetically, although some
related facilities are grouped together on a page. If you cannot locate the
correct manual page, use the permuted index to find the one that contains
the information you want. The "Manual pages" article in the Encyclopedia
explains how to use the permuted index.
4. The Programming Tools Guide and Debugging Tools Guide are mainly for
users who have not used one of these tools before, or who want to reac­
quaint themselves with some aspect of that tool's operation. These books
are good places to look when the Encyclopedia and Programmer's Reference
Manual are giving you lots of detail, but not enough guidance. Nonethe­
less, even experienced programmers will discover helpful hints and practi­
cal tips on the tools discussed in these books.
Commercial books and articles
A number of fine books and articles are published commercially that discuss
how to develop software on the UNIX Operating System. We have not
attempted to replicate guide information on all of these topics; manual pages
are provided, and many articles in the Encyclopedia list additional sources of
information on their subject matter. In addition, the Encyclopedia includes a
large "Bibliography" that will interest users of the Development System.
In particular, we recommend that all developers have the following standard
textbooks on their shelves:
Brian Kernighan and Dennis Ritchie, The C Programming Language, 2nd Edition.
Marc Rochkind, Advanced UNIX Programming.
W. Richard Stevens, UNIX Network Programming.
Maurice Bach, Design of the UNIX Operating System
See the "Bibliography" in the Encyclopedia for full citations if you are not fami­
liar with these books.
Chapter 1
Compiling and linking C language programs
The Development System provides two C language compilers: cc(CP) is a
fully ANSI-compliant implementation, based on the Microsoft C compiler Ver­
sion 6.0, while the AT&T compiler rcc(CP) is available for programmers who
need to produce "K&R" (Kernighan and Ritchie) code. Most of the informa­
tion in this chapter deals with the cc command. See rcc(CP) for specific infor­
mation on this command.
The cc command accepts C source files, object files, assembly-language files
and library files; it compiles and links these files to create an executable file.
For further information, see the "Specifying input files" and "Specifying out­
put files" sections in this chapter.
cc also controls the linking process. You can alter the way in which files are
linked by using command line options to the cc command. For information
on linker options, see the "Linking" section.
The parser called by cc can be used to check the syntax of your C language
source files without compiling them. See the section called "Checking syntax"
for more information.
The cc command automatically produces optimized machine language code.
You do not have to specify optimizing instructions to cc unless you want to
change the way cc optimizes, request more sophisticated optimizations, or
disable optimization altogether for ease of debugging. For more information,
see the "Optimizing" and "Preparing for debugging" sections in this chapter.
The cc command can ensure that the source code conforms to the ANSI,
POSIX, XPG3, IBCS2 and SVID standards for C language compilers. For further
information, see the sections pertaining to each standard.
You can use cc to compile programs to run under SCO XENIX, or to incorporate
functions from the XENIX libraries. For more information, see the
"Compiling programs for XENIX" section.
You can also compile programs for DOS or OS/2 systems with the cc command.
For more information, see the "Compiling programs for DOS and
OS/2" section.
There are several other programming tools that you may want to use for various
development tasks, and these are discussed at the end of this chapter in
the "Using other programming tools" section.
Refer to the cc(CP) manual page for detailed information on cc functionality.
Compiling simple programs
You can compile a simple C language source file like the one in the following
example to produce an executable program using cc without specifying any
options:
#include <stdio.h>
main()
{
        printf("This is a test\n");
}
To compile and link this program, enter this command:
cc test.c
cc echoes the filename to the screen and then returns you to the system
prompt. The executable file produced by cc is called a.out. To test the
program, enter the command:
a.out
You will see the following response on your screen:
This is a test
cc options
cc offers a large number of command options to control the compiler. You can
use command options to rename your executable file, save the generated
object files, modify or disable optimization, and so on. Options consist of a
dash (-) followed by one or more letters. Refer to the cc(CP) manual page for
detailed information on the command line options.
Options can appear anywhere on the cc command line. In general, an option
applies to all files that follow it on the command line, and does not apply to
files preceding it. Except where specifically noted in the manual page, options
do not affect any object files specified on the command line.
Specifying input files
The cc command can process source files, object files, and library files. It is
common practice to use the file extension to indicate the type of file, although
this is not mandatory when using cc.
File extensions
cc uses the filename extension to determine what kind of processing the file
requires, as shown in Table 1-1.
Table 1-1 File extensions
Extension       Action taken by cc
.c              compiled as a C source file
.o              linked as an object file
.a              passed to the linker and searched as a library file, unless
                the -c option, which suppresses the linking stage, is specified
                passed to masm as an OMF assembly-language source file
                passed to as as a COFF assembly-language source file
                linked as an object file
anything else   linked as an object file, unless the -Tc option (see
                below) is used
Specifying source files
To indicate to cc that a file is a C source file even though it does not have
the .c extension, use the -Tc command line option. This tells cc that the file
that follows the option is a C source file. One or more spaces can appear
between -Tc and the filename. A separate -Tc option must appear before each
source file that has an extension other than .c, as shown below:
cc main.c -Tc test.prg -Tc collate.prg print.prg
This example compiles three source files: main.c, test.prg, and collate.prg.
Because print.prg is specified without a -Tc option, cc treats it as an object
file. After compiling the three source files, cc links the object files main.o,
test.o, collate.o, and print.prg, creating the executable file, a.out.
Specifying output files
Normally, after the linker creates the executable file from the object files and
libraries, it removes the generated object files. If the linker is unable to create
the executable file, the object files are not removed from the system. Object
files which were specified on the cc command line are not removed at the end
of the link phase.
Naming object files
When cc compiles C source files, it gives each object file created the base name
of the corresponding source file plus the .o extension. (The base name consists
of all characters in the filename up to the final dot.) The -Fo option lets you
specify alternate object filenames or cause object files to be created in a
different directory. Object files specified on a -Fo option will not be removed
at the end of the link phase.
The -Fo option is used as follows:
cc -Foobjfile.o file1.c
The object filename argument must appear immediately after the option, with
no intervening spaces; the corresponding source file must follow the object
filename, separated from it by whitespace. In the preceding example, file1.c is
compiled into the object file objfile.o.
NOTE You are free to supply any name and extension you like for objfile.
However, it is recommended that you use the conventional .o extension
because the linker uses .o as the default extension when processing object
files. A .o extension is added if you do not specify one.
If you specify a directory (which must end with a forward slash) following the
-Fo option, cc creates the object files in the specified directory and gives them
the default filename (the base name with the extension .o). Otherwise, the
object file is created in the current directory. Here is an example of naming
object files:
cc -Foobject1/ this.c that.c -Fo/src/newthose those.c
In this example, the first -Fo option causes object1/this.o to be created as a
result of compiling this.c. The compiler also creates object1/that.o as a result of
compiling that.c. The second -Fo tells the compiler to create /src/newthose.o
from those.c.
Naming the executable file
If the cc command successfully compiles your source files, it calls the linker to
create the executable file. By default, cc names the executable file a.out. You
can specify an alternate name by using the -o command line option. This
option lets you specify the name of the executable file or cause it to be created
in a different directory. The cc command creates only one executable file, so
you can specify the -o option anywhere on the command line. If more than
one -o option appears, the last one on the command line is used. Here is an
example:
cc -o process ./*.c
This example compiles all source files in the current directory with the
extension .c. The resulting object files are linked and the executable file is
named process.
Linking
After compiling the source files specified on the command line, cc calls the
linker, ld, to create the executable file. The ld command combines the object
files generated by cc with any object files specified on the command line,
resolves any external references, and copies the result to the executable file.
You can specify linker options after the -link option, which must appear at the
end of the cc command line. These are passed to the linker at the end of the
compilation. As well, any options not understood by the compiler will be
passed on to the linker.
When an object file is linked, the linker tries to resolve any external references
in the object file by searching the library file(s). Any definitions required from
the standard C library are automatically linked in without your having to
specify this library on the command line. There are a number of standard
libraries: the memory-model (-M) option you specify on the cc command line
determines which are used. If you use the default small memory-model
(-Ms), cc uses the libraries libc.a and libintl.a .
The linker called by cc is /bin/ld, which produces executable files in Common
Object File Format (COFF) by default. If the object files specified on the com­
mand line are in Object Module Format (OMF), the linker creates a XENIX exe­
cutable file. You must use one of the cc command line options -xenix, -xout,
or -x2.3 to produce OMF files. You cannot link OMF and COFF files together in
one executable.
Linking with additional libraries
The -l option may be used to specify additional libraries. The -l option and its
argument must appear after the file or files which require the library. If you
specify additional libraries to cc (such as the math, curses, and lex libraries),
the linker searches these libraries before it searches the default library to
resolve external references in the object files. To compile a program with the
lex library, use the l argument after the -l option, as shown below:
cc lex.yy.c -ll
If a library name includes a path specification, the linker searches only that
path for the library. If you specify only a library name (without a path specifi­
cation), the linker searches for the specified library file in the locations listed
below, and in the order given:
the current directory
any paths specified, in their order of appearance on the command line
the default library location, /lib
If a library name is specified without an extension, the linker automatically
assumes the .a extension. To link with a library file that has any other exten­
sion, specify the complete library name, including the extension.
Error and warning messages
Error and warning messages can appear at different stages of program
development and usage:
During compilation, the compiler generates a broad range of error and
warning messages to help locate errors and potential problems in the C
source code. Compiler error and warning messages begin with the letter C,
followed by a four-digit error number.
During linking, the linker generates error messages.
During program execution, the system displays runtime error messages.
This category includes messages about core dumps, segmentation viola­
tions, and floating-point exceptions (errors generated by an 8087, 80287, or
80387 coprocessor).
Compiler error messages are sent to standard error, which is usually the ter­
minal. The general form of error messages is as follows:
filename(line number) : error or warning number: message
For a full list of all possible error and warning messages, see "Appendix E."
Compiler error and warning messages are numbered in the following way:
Fatal compiler errors are numbered from C1000.
Other compiler errors are numbered from C2000.
Compiler warnings are numbered from C4000.
You can redirect the messages to a file by using the Bourne shell or Korn shell
redirection symbols at the end of your command line:
cc file.c -o file.o 2> file.err
If you are using the C shell, you can redirect the messages to a file by using
standard redirection syntax:
cc file.c -o file.o >& file.err
Setting the warning level
You can suppress warning messages produced by the compiler by using the
-W[0123] or -w (warning) options. Compiler warning messages are any messages
numbered C4000 and up. Warnings indicate potential problems (rather
than actual errors) with statements that may not be compiled as you intend.
The -W[0123] and -w options apply only to source files specified on the
command line: they do not apply to object files.
The -W0 and -w options turn off warning messages. These options are useful
when you compile programs that deliberately include questionable statements.
The -W0 or -w option applies to the remainder of the command line or
until the next -W[0123] option on the command line.
The -W1 option (the default) causes the compiler to display most warning
messages.
The -W2 option causes the compiler to display more warning messages.
Level 2 warnings may or may not indicate serious problems. They include the
following:
use of functions with no declared return type
failure to put return statements in functions with non-void return types
data conversions that cause loss of data or precision
The -W3 option displays the highest level of warning messages, including
warnings about the uses of non-ANSI features and extended keywords and
about function calls before the appearance of function prototypes in the program.
You can also use the -WX option to make any warning message function as a
fatal error. In this case, you would only create an executable file when no
warning or error messages are generated.
Checking syntax
The cc command can be used to check the syntax of source code. This can be
useful in detecting typing errors, mismatched braces and parentheses, and
other problems.
The -Zs option tells the compiler to perform a syntax check on the source files
that follow it on the command line. This option provides a quick way to find
and correct syntax errors before you try to compile and link a source file. The
compiler will display error messages if the source file has syntax errors, but
will not produce object files, object listings, or executable files.
The -fs option directs cc to generate a source listing that shows these error
messages in the locations they occurred.
Preparing for debugging
CodeView, or sdb, you must create an executable file with full
symbolic-debugging information and source-code line numbers so that the
debugger can match the object code to the relevant source code. The -g
option to cc causes it to incorporate the information necessary for debugging
in the executable file.
The -Od option tells the compiler not to perform most optimizations: some
simple optimizations will still be performed. Without the -Od option, cc
optimizes by default. This is the only -O option that should be used when
you plan to use a symbolic debugger with your object file, because optimization
usually involves rearrangement of instructions in the object code. This
could make it difficult to understand the order of execution when debugging.
If the -g option is specified with no explicit -O options, all optimizations
involving code rearrangement are suppressed, although the compiler may
continue to perform simple optimizations. If any explicit -O options are
specified, all requested optimizations are performed. Other optimization
options are discussed in the section on "Optimizing."
Optimizing
The optimizing capabilities of the C compiler can reduce the storage space or
execution time required for your program by eliminating unnecessary instruc­
tions and rearranging code. The compiler performs some optimizations by
default: you can use the -O options to exercise control over the optimizations
performed.
Optimizing for speed
When you do not specify a -O option, cc automatically uses -Ot, causing
program-execution speed to be favored in the optimization. Wherever the
compiler has a choice between producing smaller (but perhaps slower) and
faster (but perhaps larger) code, the compiler generates faster code.
Optimizing for code size
To cause the compiler to favor smaller code size instead of the default faster
code, use the -Os option.
Disabling optimization
The -Od option turns off all optimizations, except for some optimizations
within basic blocks (blocks of instructions that have a single entry point and a
single exit point). This is useful in the early stages of program development
to avoid spending the time optimizing code that will later be changed.
Because optimization may involve rearrangement of instructions, you should
specify the -Od option when you use a debugger with your program or when
you want to examine an object-file listing. Note that turning off or restricting
optimization of a program usually increases the size of the generated code.
Achieving consistent floating-point results
The -Op option is useful when floating-point results must be consistent
within a program. This option changes the way in which the program han­
dles floating-point values.
Ordinarily the compiler stores each floating-point value in an 80-bit register.
In subsequent references to that value, the compiler reads the value from the
register. When the final value is written to memory, it is truncated, because
floating-point types are allocated fewer than 80 bits of storage (32 bits for float
type and 64 bits for double type). Therefore, the value stored in the register
may actually be more precise than the same value stored in a floating-point
variable.
If you use the -Op option, when floating-point values are referenced, the com­
piler reloads them from floating-point variables rather than from registers.
Because the value is truncated each time it is written to memory, over the
course of the program a value which was maintained in the register may
become quite different from what it would have been had it been loaded from
memory each time the variable was referenced. Using -Op gives less precise
results than using registers, and it may increase the size of the generated code.
However, it gives you more control over the truncation (and hence the con­
sistency) of floating-point values.
Wide return values
Many programmers assume that if the C language source code does not have
accurate function declarations in scope, the functions will return type int
values anyway. This can cause problems if a function in your program actu­
ally returns a type char or type short value, due to the possibility that the
high-order part of the int is non-zero. Therefore, the cc command "widens"
the return value by default for 386 compilations.
You can use the -Oh option to cc to ensure that the char or short return values
are not "widened" to int. This option applies only to 386 compilations,
because the compiler does not "widen" return values when the target
platform is not a 386.
Producing maximum optimization
The -Ox option is a shorthand way to combine optimizing options to produce
the fastest possible program. Its effect is the same as using -Otlige on the
command line.
Unsafe optimizations
Some optimizations are not always safe, that is, they rely on assumptions
about the code which will not be true of all C code. This does not imply a bug
in the optimization: rather, certain more "aggressive" optimization procedures
will be carried out on the assumption that if the user specifies particular
optimization options, the code satisfies the criteria for safe optimization. In
particular, the -Oa option, which relaxes alias checking, the -Oc option, which
enables optimization of block-level common subexpressions, and the -Oe
option, which enables global register allocation, should be used with caution.
The -Ox option, which causes maximum optimization, will turn on some
optimizations which are possibly unsafe.
Standards conformance
The cc command conforms to the ANSI, POSIX, XPG3, and SVID standards.
For more information, see Appendix A, "ANSI Implementation defined lim­
Enabling or disabling language extensions
The C compiler is fully compliant with the ANSI C standard. In addition, it
offers a number of features beyond those specified in the ANSI C standard.
These additional features are enabled when the -Ze (default) option is in effect
and disabled when the -ansi option is in effect. They include the following:
the cdecl, far, fortran, huge, near, and pascal keywords
use of casts to produce lvalues, as in this example:
int *p;
((long *)p)++;
The preceding example could be rewritten to conform with ANSI C as
shown here:
p = (int *)((char *)p + sizeof(long));
use of trailing commas (,) without ellipses (,...) in function declarations to
indicate variable-length argument lists, such as:
int printf(char *,);
use of bit fields with base types other than unsigned int or signed int
Use the -ansi option if you are porting your program to other environments.
This ensures that the code conforms to the ANSI standard. The -ansi option
tells the compiler to treat extended keywords as simple identifiers and disable
the other extensions listed previously.
Predefined macro names
The C compiler supports all of the predefined macro names found in the ANSI
standard for the C language. These provide a convenient means for obtaining
the date and time of the compilation and for indicating whether the compiler
purports to conform fully to the ANSI standard.
The following list explains each of these macro names:
__DATE__    date of compilation, expressed as a string literal in the
            form: Mmm [d]d yyyy
__FILE__    filename, expressed as a string literal
__LINE__    number of the line on which the __LINE__ macro appears
__STDC__    integer constant 0. If equal to 1, this macro indicates
            full conformity with the ANSI C language standard.
__TIME__    time of compilation, expressed as a string literal in the
            form: hh:mm:ss
The __TIMESTAMP__ macro name offers a capability not found in the ANSI
standard. It gives the date and time of last modification of the source file,
expressed as a string literal in the form:
Ddd Mmm [d]d hh:mm:ss yyyy
The time and date provided by __TIMESTAMP__ indicate the actual times­
tamp of the source file, whereas __DATE__ and __TIME__ indicate the time
of compilation.
The following code fragment uses three predefined macros with the message
#pragma to display informational messages at the time of compilation.
#pragma message("Compilation date: " __DATE__)
#pragma message("Compiling: " __FILE__)
#pragma message("Last modification: " __TIMESTAMP__)
Here is the output you might see from the preceding code fragment:
Compilation date: Feb 11 1992
Compiling: sample.c
Last modification: Mon Feb 10 12:02:51 1992
Portable Operating System for UNIX (POSIX)
You can force the cc command to compile only POSIX-conforming code by
using the -posix command line option.
X/Open Portability Guide 3 (XPG3)
You can force the cc command to compile only XPG3-conforming code by
using the -xpg3 command line option.
System V Interface Definition Issue 2 (SVID)
You can force the cc command to compile only SVID-conforming code by
using the -svid command line option.
Intel Binary Compatibility Standard 2 (IBCS2)
You can force the cc command to compile only IBCS2-conforming code by
using the -iBCS2 command line option.
Compiling programs for XENIX
By default, cc produces object and executable files in the COFF format, which
is the same format used by the AT&T development system. The
-xenix option causes cc to produce OMF object and executable files, which are
compatible with the XENIX System V Development System tools. When the
-xenix option is used with any of the options that produce assembly-language
output, the warning message normally issued ("masm directives") is
The -xout option causes cc to produce OMF object and executable files that
include the functionality of SCO UNIX System V/386 Release 3.2. Programs
generated using this option may not run properly under XENIX.
The -x2.3 option is equivalent to the -xenix option, but it also includes the
extended functions available with XENIX System V/386 release 2.3. When
used with the memory model option -M2, which enables 80286 code genera­
tion, this option produces XENIX System V/286 2.3 compatible files.
SCO UNIX System V can execute either COFF or OMF programs.
Compiling programs for DOS and OS/2
The cc command is capable of creating object code that executes in the DOS or
OS/2 environments. The -dos option instructs the compiler to use the set of
libraries in /usr/lib/dos and to use a different linker, dosld(CP).
The -os2 option instructs the compiler to use the libraries in /usr/lib/os2 and to
call the os2ld linker command.
Programs compiled with -dos or -os2 cannot run in the UNIX System V
environment, and many UNIX System V system calls are not supported in DOS
or OS/2.
There is a variety of -FP options that can be used with -dos or -os2 to control
floating-point operations. See the DOS and OS/2 Development Guide for more
information.
NOTE If either -dos or -os2 is specified on the cc command line, it overrides
any other options specified on the command line, and the resulting
executable file is in DOS or OS/2 format. If both are present, the first one to
appear will be used.
The AT&T compiler rcc is provided in the Development System to ensure that
pre-ANSI C source programs can be compiled. The rcc command produces
COFF files only.
rcc accepts input files with .c extensions as C language source files; files with
.o extensions are assumed to be COFF object files. The rcc command calls the
AT&T assembler (as) to assemble any files that have the extension .s. Input
files with .i extensions are assumed to be preprocessed C source files. Files
with any other extension or with no extension are passed to the linker as
object files.
For more information on this compiler and the available options, see the
rcc(CP) manual page.
The Link Editor
This chapter contains information on the link editor, ld(CP), and the command
language which can be used to direct its operations.
ld creates a program by combining object files, performing relocations, and
resolving external symbols. The input files are normally relocatable object
files, or library files containing relocatable object files. These will have been
created by cc(CP) or by previous invocations of ld. The output file created by
ld may be a relocatable object file or an absolute (executable) object file.
The command language supported by ld allows the user a greater degree of
control over object files and their memory locations.
The discussion that follows relies heavily on the reader having knowledge of
the COFF file format. In particular, familiarity with the concept of a section is
essential to understanding this material. Refer to the
SCO UNIX System V/386
Development System Encyclopedia for more information.
A section of an object file is the smallest unit of relocation and must be a
contiguous block of memory. A section is identified by a starting address and a
size. Information describing all the sections in a file is stored in section
headers at the start of the file. Sections in an ld input object file are called
input sections; sections in a file created by ld are called output sections. Sections
from input files are combined to form output sections that contain executable
text, data, or a mixture of both. The link editor performs allocation, that is,
setting aside some portion of the virtual address space for a section. Although
there may be holes or gaps between input sections and between output
sections, storage is allocated contiguously within each output section and may
not overlap a hole in memory.
Memory configuration
The virtual memory of the target machine is, for allocation purposes, parti­
tioned into configured and unconfigured memory. By default, ld treats all
memory as configured. It is common with microprocessor applications, how­
ever, to have different types of memory at different addresses. For example,
an application might have 3K of PROM (Programmable Read-Only Memory)
beginning at address 0, and 8K of ROM (Read-Only Memory) starting at 20K.
In this case, addresses in the range 3K to 20K-1 are not configured. Unconfig­
ured memory is treated as reserved or unusable by ld.
Unless otherwise specified, all discussions of memory, addresses, and so
forth, refer to the configured sections of the address space.
The physical address of a section or symbol is the relative offset from address
zero of the address space. The physical address of an object is not necessarily
the location at which it is placed when the process is executed. For example,
on a system with paging, the address is with respect to address zero of the vir­
tual space, and the system performs another address translation.
It is often necessary to have a section begin at a specific address. Specifying
this starting address is called binding, and the section in question is said to be
"bound to" or "bound at" the required address. While binding is most
commonly relevant to output sections, it is also possible to bind special absolute
global symbols with a command language assignment statement.
Object files
Object files are produced both by the compiler (which generates an OMF file
and converts it to COFF using cvtomf) and by ld. ld accepts relocatable object
files as input and produces an output object file that may or may not be relo­
catable. Under certain special circumstances, the object files given to ld can
also be executable files.
Files produced by compilation may contain sections called .text, .data, .bss,
.init, and .fini. The .text section contains the instruction text (executable
instructions), .data contains initialized data variables, and .bss contains
uninitialized data variables. Consider the following fragment of C code:
    char abc[200];
    int i = 100;
The uninitialized variable abc (line 1) is located in .bss, the initialized variable
i (line 2) is located in .data, compiled code from the C assignment (line 4) is
stored in .text. The .init section, if it exists, contains initialization code used in
shared libraries. The .fini section, if it exists, contains finalization code used in
shared libraries. Sections with other names than these may be present in
object files that were created by ld.
Link editor command language
This section gives information on the ld command language. The command
language enables you to:
specify the memory configuration of the target machine
direct how ld combines the sections of an object file
bind sections to specific addresses or within specific portions of memory
define or redefine global symbols
Under normal circumstances there is no need for such tight control over
object files and their memory locations. When you do need to have precise
control over link editor output, you do it with the command language.
The link editor is called using the command:
    ld [options] filename1 filename2 ...
Link editor command language directives are passed in a file named on the
command line. Such a file is often called an ifile. Any file named on the com­
mand line that is not identifiable (from the magic number) as an object
module or an archive library is assumed to contain link editor directives.
Expressions in the link editor command language may contain global symbols,
constants, and most of the basic C language operators. (See the section
"Syntax diagram for input directives.") Constants are the same as in C, with a
number recognized as hexadecimal if it starts with '0x', as octal if it starts with
'0', and as decimal otherwise. All numbers are treated as long integers. Symbol
names may contain uppercase or lowercase letters, digits, and the underscore
character (_). Symbols within an expression have the value of the
address of the symbol only. ld does not do symbol table lookup to find the
contents of a symbol, the dimensionality of an array, structure elements
declared in a C program, and so on.
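The constant rules described above are the same ones the C library applies when strtol() is called with a base of 0. The following is a minimal sketch of that convention only; parse_ld_constant is a hypothetical helper name and is not part of ld.

```c
#include <stdlib.h>

/* Parse a numeric constant using the C conventions described above:
 * a leading "0x" means hexadecimal, a leading "0" means octal, and
 * anything else is decimal. Passing base 0 asks strtol() to apply
 * exactly these rules. The result is a long, matching the statement
 * that all numbers are treated as long integers.
 * parse_ld_constant is a hypothetical name used for illustration. */
long parse_ld_constant(const char *text)
{
    return strtol(text, NULL, 0);
}
```

For instance, the strings "0x10", "010", and "10" are read as hexadecimal, octal, and decimal respectively.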
The following names are reserved and hence unavailable as symbol or section
The supported operators are shown in Figure 2-1, in order of precedence from
highest to lowest:
    ! ~ - (UNARY Minus)
    * / %
    + - (BINARY Minus)
    >> <<
    == != > < <= >=
    = += -= *= /=
Figure 2-1 Operator Symbols
The operators have the same meaning as in the C language. Operators on the
same line in the table have the same precedence.
Assignment statements
External symbols are globally visible functions or data items. They may be
defined and assigned addresses by an assignment statement in a file of ld
directives. The syntax of the assignment statement is:
    symbol op expression;
op may be one of the operators "=", "+=", "-=", "*=", or "/=".
Assignment statements must be terminated by a semicolon.
Interaction with source code
Assignment statements in an ifile may refer to symbols defined in the object
files being linked. These statements cannot directly affect the value of the
symbols in the object file, but they can affect the addresses of these symbols.
The following is an example of how assignment statements interact with the
source code. Consider the C source file:
    int i = 100;
    int k = 22;
    int j;

    main()
    {
        printf("Value of j is %d\n", j);
    }
Suppose that the object file corresponding to this program is linked with the
following ifile:
    j = i + 4;
This directive causes the external symbol j to be assigned the address of i plus
4 bytes. This is the same address as external symbol k. The value found at this
address will be the integer 22. The action of the C program will be to print:
    Value of j is 22
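The effect of giving j the address of i plus 4 can be imitated in ordinary C with a toy model in which symbols are byte offsets into an image of the .data section. This is only a sketch: data_image, value_at, and value_of_j_after_link are hypothetical names, and the layout (i at offset 0, k at offset 4) is assumed from the example above rather than taken from any real object file.

```c
#include <stdint.h>
#include <string.h>

/* Toy image of the .data section: i lives at offset 0 and k at
 * offset 4. int32_t is used so each variable is exactly 4 bytes. */
static const int32_t data_image[2] = { 100, 22 };

/* Read the 4-byte integer stored at a byte offset in the image. */
int32_t value_at(long addr)
{
    int32_t v;
    memcpy(&v, (const char *)data_image + addr, sizeof v);
    return v;
}

/* Model of the ifile assignment "j = i + 4;": j receives an address,
 * not a value, so reading j yields whatever is stored there (k). */
int32_t value_of_j_after_link(void)
{
    long addr_i = 0;
    long addr_j = addr_i + 4;
    return value_at(addr_j);
}
```

Reading through j returns 22 because the assignment changed where j refers to, not what any variable contains.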
All assignment statements (with one exception, described in the next
paragraph) are evaluated after allocation has been performed. This occurs after all
symbols defined in the input file are appropriately relocated but before the
actual relocation of the text and data itself. Therefore, if an assignment
statement contains any symbol name, the address used for that symbol in the
evaluation of the expression reflects the symbol's address in the output object
file. References within text and data to symbols given a value through an
assignment statement access this latest assigned value. Assignment
statements are processed in the order in which they are input to ld.
Location counter symbol
Assignment statements are normally placed outside the scope of section
definition directives (see "Section Definition Directives"). However, there is a
special symbol, the dot (.), that can occur only within a section definition
directive. This symbol refers to the current address of ld's location counter (a
pointer to the next memory address which may be allocated). Assignment
expressions involving "." are evaluated during the allocation phase of ld.
Assigning a value to "." within a section definition directive can increment
(but not decrement) the location counter and create holes within the section,
as described in "Section Definition Directives." Assigning the value of "." to a
symbol permits the final allocated address of a section to be saved.
The pseudo-function ALIGN is used to increment the location counter so as to
align a symbol to an n-byte boundary within an output section, where n is a
power of 2. This is done with the expression:
    ALIGN(n)
This function call is equivalent to:
    (. + n - 1) & ~(n - 1)
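The same rounding can be tried out in C. The sketch below assumes n is a power of 2, as the text requires; align_up is a hypothetical name, and addr stands in for the location counter.

```c
/* Round addr up to the next multiple of n, where n is a power of 2.
 * This mirrors the expression (. + n - 1) & ~(n - 1), with addr
 * playing the role of the location counter ".".
 * align_up is a hypothetical name used for illustration. */
unsigned long align_up(unsigned long addr, unsigned long n)
{
    return (addr + n - 1) & ~(n - 1);
}
```

An address that is already a multiple of n is returned unchanged; any other address moves forward to the next boundary.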
SIZEOF and ADDR are pseudo-functions that, given the name of a section,
return the size or address of the section, respectively. These may be used in
assignment statements.
Types of expression values
Link editor expressions may have either an absolute or a relocatable value.
When ld creates a symbol through an assignment statement, the symbol's
value is the same type, absolute or relocatable, as the expression. That type
depends on the following rules:
an expression with a single relocatable symbol and zero or more constants
or absolute symbols is relocatable
the difference of two relocatable symbols from the same section is absolute
all other expressions are combinations of the above
Specifying a memory configuration
MEMORY directives are used to specify:
the total size of the virtual space of the target machine
the configured and unconfigured areas of the virtual space
If no directives are supplied, ld assumes that all memory is configured. The
size of the default memory is dependent upon the target machine.
MEMORY directives are used to assign a name to a virtual address range. This
range is described by specifying its origin and length. Output sections can
then be bound to virtual addresses within specifically named memory areas.
Memory names can be up to 8 characters long and may contain uppercase or
lowercase letters, digits, and the characters "$", ".", and "_". Names of
memory ranges are used only by ld and are not carried in the output file symbol
table or headers.
The syntax of the MEMORY directive is as follows:
    MEMORY
    {
        name1 (attr) : origin = n1, length = n2
        name2 (attr) : origin = n3, length = n4
    }
For example:
    MEMORY
    {
        mem1:       origin=0x000000, length=0x10000
        mem2 (RW):  origin=0x020000, length=0x40000
    }
The keyword origin (or org or o) must precede the origin of a memory range,
and length (or len or l) must precede the length as shown in the above
example. The origin operand refers to the virtual address of the memory range.
The origin and length are entered as long integer constants in either decimal,
octal, or hexadecimal. The origin and length specifications, as well as
individual MEMORY directives, may be separated by white space or a comma.
Attributes may be associated with a named memory area. These attributes fol­
low the memory area name and are enclosed in parentheses. The attributes
that may be specified are:
R - readable memory
W - writable memory
X - executable (instructions may reside in this memory)
I - initializable (stack areas are typically not initialized)
If no attributes are specified on a MEMORY directive or if no MEMORY
directives are supplied, memory areas assume all of the attributes R, W, X, and I.
By way of MEMORY directives, ld(CP) can be told that memory is configured
in some manner other than the default. If MEMORY directives are used, all
virtual memory not described in some MEMORY directive is considered to be
unconfigured. Unconfigured memory is not used in ld's allocation process:
nothing except DSECT sections (discussed later) can be link edited or bound to
an address within unconfigured memory. For example, it may be necessary to
prevent anything from being linked to the first 0x10000 words of memory.
This can be accomplished by using MEMORY directives which do not mention
that portion of memory.
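The distinction between configured and unconfigured memory can be pictured as a lookup over the ranges named in MEMORY directives. The following sketch uses the two ranges from the MEMORY example earlier in this section; the structure and function names are hypothetical and do not describe how ld itself stores this information.

```c
#include <stddef.h>

/* One named range from a MEMORY directive (hypothetical layout). */
struct mem_range {
    const char   *name;
    unsigned long origin;
    unsigned long length;
};

/* The two ranges from the MEMORY example: mem1 and mem2. */
static const struct mem_range ranges[] = {
    { "mem1", 0x000000UL, 0x10000UL },
    { "mem2", 0x020000UL, 0x40000UL },
};

/* Return 1 if addr lies inside some described range (configured),
 * or 0 if no MEMORY directive mentions it (unconfigured). */
int is_configured(unsigned long addr)
{
    size_t i;

    for (i = 0; i < sizeof ranges / sizeof ranges[0]; i++) {
        if (addr >= ranges[i].origin &&
            addr - ranges[i].origin < ranges[i].length)
            return 1;
    }
    return 0;
}
```

With these two ranges, addresses 0x10000 through 0x1FFFF fall in a gap that no directive mentions, so they are treated as unconfigured.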
Section definition directives
The purpose of the SECTIONS directive is to describe how input sections are
to be combined, to direct where to place output sections (both in relation to
each other and to the entire virtual memory space), and to permit the renam­
ing of output sections.
In the default case where no SECTIONS directives are given, all input sections
of the same name appear in an output section of that name. If two object files
are linked, one containing sections s1 and s2 and the other containing sections
s2 and s3, the output object file contains three output sections s1, s2, and s3.
The input sections s1 and s3 appear in output sections of the same name; the
two input sections named s2 appear in an output section named s2. The order
of these output sections depends on the order in which the link editor sees the
input files.
The basic syntax of the SECTIONS directive is:
    SECTIONS
    {
        secname1 :
        {
            file_specifications,
            assignment_statements
        }
        secname2 :
        {
            file_specifications,
            assignment_statements
        }
    }
File specifications
Within a section definition (a SECTIONS directive), the files and sections of
files to be included in the output section are listed in the order in which they
are to appear in the output section. Sections from an input file are specified in
a statement of the form:
    filename ( secname )
    filename ( secnam1 secnam2 . . . )
White space or commas are used to separate file specifications and to separate
input section names within file specifications.
The following is an example of a SECTIONS directive:
    SECTIONS
    {
        outsec1:
        {
            file1.o (sec1)
            file2.o
            file3.o (sec1, sec2)
        }
    }
According to this directive, the order in which the input sections appear in the
output section outsec1 would be:
1. section sec1 from file file1.o
2. all sections from file2.o, in the order they appear in the input file
3. section sec1 from file file3.o, and then section sec2 from file file3.o
If there are any additional input files that contain input sections also named
outsec1, these sections are linked following the last section named in the
definition of outsec1. If there are any other input sections in file1.o or file3.o,
they will be placed in output sections with the same names as the input
sections unless they are included in other file specifications.
To refer to all the uninitialized, unallocated global symbols in a file, the fol­
lowing statement may be used in a file specification:
    filename [COMMON]
If a file name appears with no sections listed, then all sections from the file
(but not the uninitialized, unallocated globals) are linked into the current out­
put section.
The following code may be used in a file specification to refer to all previously
unallocated input sections of the given name, regardless of what input file
they are contained in:
    * ( secname )
Loading a section at a specified address
To bind an output section to a specific virtual address, use a SECTIONS direc­
tive of the following form:
    outsec addr :
The value addr is a C constant which specifies the binding address. If outsec
does not fit at addr (perhaps because of holes in the memory configuration or
because outsec is too large to fit without overlapping some other output sec­
tion), ld issues an appropriate error message. addr may also be the word
BIND, followed by a parenthesized expression. The expression may use the
pseudo-functions SIZEOF, ADDR, or NEXT. NEXT accepts a constant and
returns the first multiple of that value that falls into configured unallocated
memory; SIZEOF and ADDR accept previously defined sections.
As long as output sections do not overlap and there is enough space, they can
be bound anywhere in configured memory. The SECTIONS directives defining
output sections need not be given to ld in any particular order, unless SIZEOF
or ADDR is used.
ld does not ensure that the size of each section consists of an even number of
bytes or that each section starts on an even byte boundary. The assembler
ensures that the size (in bytes) of a section is evenly divisible by 4.
The ld directives can be used to force a section to start on an odd byte bound­
ary, although this is not recommended. If a section starts on an odd byte
boundary, the section's contents are either accessed incorrectly or are not exe­
cuted properly. When a user specifies an odd byte boundary, ld issues a
warning message.
Aligning an output section
An output section may be bound to a virtual address that falls on an n-byte
boundary, where n is a power of 2. This may be done in order to take advan­
tage of the underlying architecture. For example, it may be possible to reduce
the number of instructions necessary to address a data object by aligning the
data object on a word boundary. This is performed using an ALIGN in a
SECTIONS directive. For example:
    outsec ALIGN(0x20000) :
The output section outsec is not bound to any specific address but is placed at
some virtual address that is a multiple of 0x20000.
Grouping sections together
The default allocation algorithm for ld does the following:
Links all input .init sections followed by .text sections into one output
section. This output section is called .text and is bound to the address 0x0 plus
the size of all headers in the output file.
Links all input .data sections together into one output section. This output
section is called .data and, in paging systems, is bound to an address
aligned to a machine-dependent constant plus a number dependent on the
size of headers and text.
Links all input .bss sections together with all uninitialized, unallocated glo­
bal symbols, into one output section. This output section is called .bss and
is allocated so as to immediately follow the output section .data.
If any SECTIONS directives are specified, they replace the default allocation
algorithm. Rather than relying on the ld default algorithm when manipulat­
ing COFF files, the one certain way of determining address and order informa­
tion is to take it from the file and section headers. The default allocation of ld
is equivalent to the following directive, where align_value and
sizeof_headers are machine-dependent constants:
    SECTIONS
    {
        .text sizeof_headers : { *(.init) *(.text) *(.fini) }
        GROUP BIND( NEXT(align_value) +
                    ((SIZEOF(.text) + ADDR(.text)) % 0x2000)) :
        {
            .data : { }
            .bss  : { }
        }
    }
The GROUP command ensures that the two output sections .data and .bss are
allocated together. Binding or alignment information is supplied only for the
group and not for the output sections contained within the group. The
sections making up the group are allocated in the order listed in the directive.
For compatibility with UNIX System V Release 2, the addresses of these
sections cannot change. Unfortunately, .init sections in the algorithm above will
interfere with the placement of the signal recovery routines. Hence the .text
sections are linked into the a.out .text section first. The .init sections (for
shared libraries) and the .fini sections follow all of the .text sections. Routines
in crt1.o (a C runtime startup routine) branch to the .init sections before
calling the main( ) function of the program.
The following SECTIONS directive may be used to place .text, .data, and .bss
in the same segment of memory:
    SECTIONS
    {
        GROUP :
        {
            .text : { }
            .data : { }
            .bss  : { }
        }
    }
Note that there are still three output sections (.text, .data, and .bss), but now
they are allocated into consecutive virtual memory.
This entire group of output sections could be bound to a starting address or
aligned simply by adding a field to the GROUP directive in the above example.
To bind the group to 0xC0000, add the address after the GROUP keyword:
    GROUP 0xC0000 :
This change causes the output section .text to be bound at 0xC0000, followed
by the remaining members of the group in order of their appearance. To align
the group to 0x10000, add an ALIGN after the GROUP keyword:
    GROUP ALIGN(0x10000) :
This change causes the output section .text to be aligned to 0x10000,
followed by the remaining members of the group.
When the GROUP directive is not used, each output section is treated as an
independent entity:
    SECTIONS
    {
        .text : { }
        .data ALIGN(0x400000) : { }
        .bss  : { }
    }
In this example, the .text section starts at virtual address 0x0 (provided that
address is in configured memory) and the .data section starts at a virtual
address aligned to 0x400000. The .bss section immediately follows the .text
section if there is enough space. If not, it follows the .data section. The order
in which output sections are defined to ld cannot be used to force a certain
allocation order in the output file.
Creating holes within output sections
The special symbol dot ("."), representing ld's location counter, appears only
within section definitions and assignment statements. When it appears on the
left side of an assignment statement, "." causes the location counter to be
reset, leaving a hole in the output section. Holes built into output sections in
this manner take up physical space in the output file and are initialized using
a fill character. The default fill character is 0x00; alternatively, the user may
supply a fill character. See the discussion of filling holes in "Initialized section
holes or .bss sections".
Consider the following section definition:
    outsec:
    {
        . += 0x1000;
        f1.o (.text)
        . += 0x100;
        f2.o (.text)
        . = align (4);
        f3.o (.text)
    }
The effect of this command is as follows:
line 3 increments the location counter by 0x1000, thereby leaving a 0x1000
byte hole, filled with the default fill character, at the beginning of the
section
in line 4, the .text section of input file f1.o is linked after the hole that was
just left
in line 5, the location counter is incremented by 0x100, leaving a 0x100 byte
hole filled with the default fill character
in line 6, the .text section of input file f2.o is linked following the second
hole; this section begins 0x100 bytes from the end of f1.o (.text)
line 7 causes the location counter to be aligned with the next 4-byte
boundary (that is, the next double-word boundary)
in line 8, the .text section of f3.o is linked; the effect of lines 7 and 8 is to
cause this section to start at the next full word boundary following the .text
section of f2.o. The boundary is determined relative to the beginning of the
output section outsec.
For the purposes of allocating and aligning addresses within an output
section, ld treats the output section as if it began at address zero. As a result, in
the above example, if outsec ultimately is linked to start at an odd address,
then the part of outsec built from f3.o (.text) also starts at an odd address,
even though f3.o (.text) is aligned to a full word boundary. This may be
prevented by specifying an alignment for the entire output section, as follows:
    outsec ALIGN(4) :
Expressions that decrement "." are illegal. Subtracting a value from the loca­
tion counter is not allowed, since this can cause memory to be overwritten.
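The way holes accumulate in the outsec example can be followed by tracking the location counter by hand. The sketch below does this in C for caller-supplied input section sizes; the sizes, the structure, and the function name lay_out_outsec are hypothetical, and offsets are measured from the start of outsec, which is treated as address zero as described above.

```c
/* Offsets, relative to the start of outsec, at which each input
 * section lands in the example definition (hypothetical structure). */
struct outsec_layout {
    unsigned long f1_text;  /* offset of f1.o (.text) */
    unsigned long f2_text;  /* offset of f2.o (.text) */
    unsigned long f3_text;  /* offset of f3.o (.text) */
    unsigned long size;     /* total size of outsec, holes included */
};

/* Walk the location counter through the outsec definition for the
 * given input section sizes. The counter only ever moves forward. */
struct outsec_layout lay_out_outsec(unsigned long f1_size,
                                    unsigned long f2_size,
                                    unsigned long f3_size)
{
    struct outsec_layout out;
    unsigned long dot = 0;            /* section treated as starting at 0 */

    dot += 0x1000;                    /* . += 0x1000;  leading hole */
    out.f1_text = dot;
    dot += f1_size;                   /* f1.o (.text) */
    dot += 0x100;                     /* . += 0x100;   second hole */
    out.f2_text = dot;
    dot += f2_size;                   /* f2.o (.text) */
    dot = (dot + 4 - 1) & ~(4UL - 1); /* . = align (4); */
    out.f3_text = dot;
    dot += f3_size;                   /* f3.o (.text) */
    out.size = dot;
    return out;
}
```

For example, with assumed sizes of 0x30, 0x12, and 0x20 bytes, f2.o (.text) ends at 0x1142, so the alignment step inserts a 2-byte hole and f3.o (.text) begins at 0x1144.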
Creating and defining symbols at link-edit time
Assignment statements can be used to give symbols a value that is link-edit
dependent. For example, we just saw that the "." symbol can be used to adjust
the location counter during allocation. It is possible to assign
allocation-dependent values to other symbols. These can be symbols that
were defined in an object file that is being linked, or they may be symbols that
are used only in the ifile. This provides a way to assign to symbols addresses
known only after allocation.
For example:
    SECTIONS
    {
        outsc1: { ... }
        outsc2:
        {
            file1.o (s1)
            s2_start = . ;
            file2.o (s2)
            s2_end = . - 1;
        }
    }
The symbol s2_start is defined to be the address of file2.o(s2), and s2_end is
the address of the last byte of file2.o(s2).
Consider the following example:
    SECTIONS
    {
        outsc1:
        {
            file1.o (.data)
            mark = . ;
            . += 4;
            file2.o (.data)
        }
    }
In this example, the symbol mark is created and is equal to the address of the
first byte beyond the end of file1.o's .data section. Four bytes are reserved for a
future run-time initialization of the symbol mark. The type of the symbol is a
long integer (32 bits).
Assignment instructions involving "." must appear within SECTIONS
definitions since they are evaluated during allocation. Assignment instruc­
tions that do not involve "." can appear within SECTIONS definitions but
typically do not. Such instructions are evaluated after allocation is complete.
Reassignment of a defined symbol to a different address is dangerous. For
example, if a symbol within .data is defined, initialized, and referenced within
a set of object files being link-edited, the symbol table entry for that symbol is
changed to reflect the new, reassigned physical address. However, the
associated initialized data is not moved to the new address, and there may be
references to the old address. ld issues warning messages for each defined symbol
that is being redefined within an ifile. However, it is safe to assign absolute
values to new symbols because there are no references or initialized data
associated with these symbols.
Allocating a section into named memory
It is possible to specify that a section be linked somewhere within a named
memory range, previously defined on a MEMORY directive.
For example:
    MEMORY
    {
        mem1:       o=0x000000  l=0x10000
        mem2 (RW):  o=0x020000  l=0x40000
        mem3 (RW):  o=0x070000  l=0x40000
        mem1:       o=0x120000  l=0x04000
    }
    SECTIONS
    {
        outsec1: { f1.o (.data) } > mem1
        outsec2: { f2.o (.data) } > mem3
    }
The '>' operator (analogous to the UNIX System redirection operator) directs
ld to place outsec1 anywhere within the memory area named mem1 (that is,
somewhere within the address range 0x0-0xFFFF or 0x120000-0x123FFF). The
output section outsec2 is to be placed somewhere in the area named mem3,
that is, the address range 0x70000-0xAFFFF.
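Because a memory name may cover more than one address range, placing a section in named memory amounts to searching the ranges that carry that name for one large enough to hold the section. The sketch below models only that search, using the mem1 and mem3 address ranges quoted above. It ignores sections that have already been allocated, and memory_map and place_in are hypothetical names, not part of ld.

```c
#include <stddef.h>
#include <string.h>

/* One named range (hypothetical layout). */
struct named_range {
    const char   *name;
    unsigned long origin;
    unsigned long length;
};

/* mem1 covers 0x0-0xFFFF and 0x120000-0x123FFF;
 * mem3 covers 0x70000-0xAFFFF. */
static const struct named_range memory_map[] = {
    { "mem1", 0x000000UL, 0x10000UL },
    { "mem3", 0x070000UL, 0x40000UL },
    { "mem1", 0x120000UL, 0x04000UL },
};

/* Return the origin of the first range called name that can hold
 * size bytes, or -1 if no range with that name is large enough.
 * Simplification: earlier allocations are not taken into account. */
long place_in(const char *name, unsigned long size)
{
    size_t i;

    for (i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        if (strcmp(memory_map[i].name, name) == 0 &&
            size <= memory_map[i].length)
            return (long)memory_map[i].origin;
    }
    return -1L;
}
```

A section of 0x12000 bytes directed to mem1 fits in neither of the two mem1 ranges, which is the kind of situation in which ld would report an error.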
Initialized section holes or .bss sections
When holes are created within a section, ld normally fills them with bytes of
zero (0x00). By default, .bss sections are not initialized at all; that is, no
initialized data is generated for any .bss section by the assembler nor supplied by
the link editor.
SECTIONS directives may be used to initialize such holes or .bss output sec­
tions to an arbitrary 2-byte pattern. Such initialization options apply only to
.bss sections or holes. For example, an application might want an uninitial­
ized data table to be initialized to a constant value without recompiling the .o
file, or a hole in the text area to be filled with a transfer to an error routine.
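Filling a hole with a 2-byte pattern simply repeats the two bytes across the hole. The following sketch illustrates the idea; fill_hole is a hypothetical name, and writing the high-order byte first is an assumption made for the illustration, not a statement about the byte order ld uses.

```c
#include <stddef.h>

/* Fill len bytes of a hole with a repeating 2-byte pattern.
 * Assumption for this sketch: the high-order byte of the pattern is
 * written first. fill_hole is a hypothetical name. */
void fill_hole(unsigned char *hole, size_t len, unsigned int pattern)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (i % 2 == 0)
            hole[i] = (unsigned char)((pattern >> 8) & 0xFF);
        else
            hole[i] = (unsigned char)(pattern & 0xFF);
    }
}
```

A hole whose length is odd ends with the first byte of the pattern.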
An entire output section may be initialized, or specific areas within an output
section. However, since no text is generated for an uninitialized .bss section, if
part of such a section is initialized, then the entire section is initialized. In
other words, if a .bss section is to be combined with a .text or .data section
(both of which are initialized) or if part of an output .bss section is to be ini­
tialized, then one of the following will apply:
Explicit initialization options may be used to initialize all .bss sections in
the output section.
ld will use the default fill value to initialize all .bss sections in the output
section.
Holes are filled using a statement of the form:
    section_name :
    {
        file_specification = long_int
    } = long_int
Consider the following ld file:
    SECTIONS
    {
        sec1:
        {
            f1.o
            . += 0x200;
            f2.o (.text)
        } = 0xDFFF
        sec2:
        {
            f1.o (.bss)
            f2.o (.bss) = 0x1234
        }
        sec3:
        {
            f3.o (.bss)
        } = 0xFFFF
        sec4: { f4.o (.bss) }
    }
In the example above, the 0x200 byte hole in section sec1 is filled with the
value 0xDFFF. In section sec2, f1.o (.bss) is initialized to the default fill value of
0x00, and f2.o (.bss) is initialized to 0x1234. All .bss sections within sec3 as
well as all holes are initialized to 0xFFFF. Section sec4 is not initialized; that is,
no data is written to the object file for this section.
When unconfigured areas exist in the virtual memory, each application must
assume responsibility for forming output sections that will fit into memory.
For example, assume that memory is configured as follows:
    mem1:
    mem2:
    mem3:
Suppose that the files f1.o, f2.o, ... fn.o each contain three sections .text, .data,
and .bss, with the combined .text section length being 0x12000 bytes. There is
no configured area of memory in which this section can be placed, because the
longest memory range that has been defined is only 0x10000 bytes long.
Appropriate directives must be supplied to break up the .text output section
so ld may do the allocation. The following set of directives groups the .text
sections from the input files into a number of output sections txt1, txt2, and so
on:
    SECTIONS
    {
        txt1:
        {
            f1.o (.text)
            f2.o (.text)
            f3.o (.text)
        }
        txt2:
        {
            f4.o (.text)
            f5.o (.text)
            f6.o (.text)
        }
    }
Changing the entry point
The UNIX System a.out optional header contains a field for the (primary) entry
point of the file. This field is set using one of the following rules; the rule used
is the first one (in the order given) that applies:
1. The value of the symbol specified with the -e option to ld is used if present
2. The value of the symbol _start is used if present
3. The value of the symbol main (defined somewhere in the user program) is
used if present
4. The value zero is used
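The four rules form a simple first-match chain, which can be sketched in C as follows. The symbol table layout and the names find_symbol and entry_point are hypothetical. The sketch also assumes that a symbol named with -e but absent from the table falls through to the next rule, which is one reading of "if present" and not a documented behavior.

```c
#include <stddef.h>
#include <string.h>

/* A minimal symbol table entry (hypothetical layout). */
struct symbol {
    const char   *name;
    unsigned long value;
};

/* Look up name in a table of n symbols; NULL if it is not there. */
static const struct symbol *find_symbol(const struct symbol *tab,
                                        size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Apply the rules in order: the -e symbol (e_option is NULL when -e
 * was not given), then _start, then main, then zero. */
unsigned long entry_point(const struct symbol *tab, size_t n,
                          const char *e_option)
{
    const struct symbol *s;

    if (e_option != NULL && (s = find_symbol(tab, n, e_option)) != NULL)
        return s->value;
    if ((s = find_symbol(tab, n, "_start")) != NULL)
        return s->value;
    if ((s = find_symbol(tab, n, "main")) != NULL)
        return s->value;
    return 0;
}
```

When a startup routine defining _start is linked in, rule 2 applies before main is ever considered.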
The _start symbol (the C language startup routine) is usually defined in the crt
file found in /lib. It may be defined in an ifile using an assignment statement
of the form:
    _start = expression;
If ld is called through cc, a startup routine is automatically linked in. When
the program is executed, this startup routine causes exit( ) to be called after
main( ) finishes to close file descriptors and do other cleanup. When calling ld
directly or when changing the entry point, the user must supply the start-up
routine or make sure that the program always calls exit rather than falling
through the end. Otherwise, the program will dump core.
Using archive libraries
Each member of an archive library is a complete object file. Archive libraries
are created with the ar(CP) command from object files generated by cc or as.
Libraries may be searched by the link editor to find definitions for undefined
symbols. For a member to be extracted from the library it must satisfy a refer­
ence that is known to be unresolved at the time the library is searched. Only
those members that are required to resolve existing undefined-symbol refer­
ences are taken from the library for link editing.
Libraries can be placed both inside and outside section definitions. When a
library member is included by searching the library inside a SECTIONS
directive, all input sections from the library member are included in the output
section being defined by that directive. When a library member is included by
searching the library outside of a SECTIONS directive, all input sections from
the library member are included in an output section with the same name as
that input section. If necessary, new output sections are defined to provide a
place to put the input sections. Note, however, the following:
specific members of a library cannot be referenced explicitly in an ld
directive
the default rules for the placement of members and sections cannot be
overridden when they apply to archive library members
-l option
The -l option to ld is used to link with an input file coming from a
predefined set of directories and having a predefined name. Such files are
usually archive libraries; however, they do not have to be. Furthermore,
archive libraries can be specified without using the -l option by simply giving
the full or relative UNIX System file path.
An option of the form -lx, where x is a string of up to nine characters, will
cause a library with filename libx.a, located in the directory /lib, to be searched.
The effect of this option is the same as if the library file name was substituted
for the option on the command line.
The ordering of archive libraries is important since the library must be
searched after the unresolved symbol reference is encountered. Archive
libraries can be specified more than once, and they are searched every time
they are encountered. Archive files have a symbol table at the beginning. If
the definition that the library provides for a symbol gives rise to more
undefined symbols, ld will cycle through the symbol table until it has determined
that it cannot resolve any more references from the current library.
Consider the following example involving two input files, file1.o and file2.o:
The input files file1.o and file2.o each contain a reference to the external
function FCN.
Input file1.o contains a reference to symbol ABC.
Input file2.o contains a reference to symbol XYZ.
Library liba.a, member 0, contains a definition of XYZ.
Library libc.a, member 0, contains a definition of ABC.
Both libraries have a member 1 that defines FCN.
The following ld command line will cause the FCN references to be satisfied
by liba.a, member 1, ABC to be obtained from libc.a, member 0, and XYZ to
remain undefined:
ld file1.o -la file2.o -lc
When file1.o is processed by ld, references to ABC and FCN are encountered.
The -la option causes liba.a to be searched, and the unresolved reference to
FCN is satisfied by the definition in member 1 of that library; ABC remains
unresolved at this point. When file2.o is processed, references to
FCN and XYZ are encountered. A definition for FCN has already been found,
and hence this definition (from liba.a) resolves the reference. The -lc option
causes libc.a to be searched. The reference to XYZ cannot be resolved from
libc.a because there is no definition of that symbol in the library. There is a
definition of XYZ in liba.a but that definition cannot be obtained at this point,
because ld did not know at the time it was searching liba.a that there would be
an unresolved reference to XYZ.
The following command line will allow the reference to XYZ to be resolved by
the definition of that symbol in member 0 of the library liba.a. This is possible
because the -la option appears to the right of the filename file2.o. ld processes
the file file2.o and encounters the unresolved reference to XYZ; when it
searches the library liba.a it knows that it must resolve that reference and
hence obtains the definition from the library. The other references are
resolved as in the previous example.
ld file1.o file2.o -la -lc
With the following command line, the FCN references are satisfied by libc.a,
member 1, ABC is obtained from libc.a, member 0, and XYZ is obtained from
liba.a, member 0.
ld file1.o file2.o -lc -la
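The effect of library ordering in the examples above can be modeled in a few lines of C. The sketch below is an illustration only and is not the algorithm used by ld: the bit-mask representation, the structure names, and the functions link_object, search_archive, link_a_between, and link_a_after are all assumptions made for this example. It extracts an archive member only when that member defines a symbol that is undefined at the time the archive is searched, and it rescans the archive until a pass extracts nothing.

```c
#include <stddef.h>

/* Symbols from the example, one bit each. */
enum { FCN = 1, ABC = 2, XYZ = 4 };

/* An object file or archive member: symbols it defines and references. */
struct object {
    unsigned defs;
    unsigned refs;
};

/* Model linker state: symbols defined so far and references still open. */
struct link_state {
    unsigned defined;
    unsigned undefined;
};

/* Link one object: its definitions close open references, and its own
 * references become open unless a definition is already known. */
void link_object(struct link_state *st, const struct object *obj)
{
    st->defined |= obj->defs;
    st->undefined |= obj->refs;
    st->undefined &= ~st->defined;
}

/* Search one archive (up to 32 members in this sketch). A member is
 * extracted only if it defines a currently undefined symbol; the scan
 * repeats until a full pass extracts nothing. */
void search_archive(struct link_state *st, const struct object *members,
                    size_t n)
{
    unsigned taken = 0;   /* bit i is set once member i is extracted */
    int progress = 1;
    size_t i;

    while (progress) {
        progress = 0;
        for (i = 0; i < n; i++) {
            if (!(taken & (1u << i)) && (members[i].defs & st->undefined)) {
                link_object(st, &members[i]);
                taken |= 1u << i;
                progress = 1;
            }
        }
    }
}

static const struct object file1 = { 0, FCN | ABC };
static const struct object file2 = { 0, FCN | XYZ };
static const struct object lib_a[] = { { XYZ, 0 }, { FCN, 0 } };
static const struct object lib_c[] = { { ABC, 0 }, { FCN, 0 } };

/* Model of file1.o, liba.a, file2.o, libc.a; returns the unresolved set. */
unsigned link_a_between(void)
{
    struct link_state st = { 0, 0 };
    link_object(&st, &file1);
    search_archive(&st, lib_a, 2);
    link_object(&st, &file2);
    search_archive(&st, lib_c, 2);
    return st.undefined;
}

/* Model of file1.o, file2.o, liba.a, libc.a; returns the unresolved set. */
unsigned link_a_after(void)
{
    struct link_state st = { 0, 0 };
    link_object(&st, &file1);
    link_object(&st, &file2);
    search_archive(&st, lib_a, 2);
    search_archive(&st, lib_c, 2);
    return st.undefined;
}
```

Under this model, link_a_between() leaves XYZ unresolved because liba.a is searched before the reference from file2.o is known, while link_a_after() resolves every symbol.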
-u option
The -u option is used to force the linking of a library member that contains a
definition of a symbol even if none of the source files contain an unresolved
reference to that symbol. As is done in the next example, this option may also
be used without any source files to extract a member from a library.
The following command line creates an undefined symbol called routl in ld's
global symbol table. If any member of library liba.a defines this symbol, that
member is extracted. Without the -u option, there would have been no
unresolved references or undefined symbols to cause ld to search the archive
library. Note that the -u option must appear on the command line before the
library that you want to link a member from.
ld -u routl -la
Allocation algorithm
An output section is formed as a result of a SECTIONS directive, by combining
input sections of the same name, or by combining .text and .init into .text. An
output section can consist of zero or more input sections. After the composi­
tion of an output section is determined, it must then be allocated into config­
ured virtual memory. The link editor uses an algorithm that attempts to
minimize fragmentation of memory, and hence increase the possibility that a
link edit run will be able to allocate all output sections within the specified
virtual memory configuration. The algorithm proceeds as follows:
1. Any output sections for which explicit binding addresses are specified are
allocated.
2. Any output sections to be included in a specific named memory range are
allocated. Each output section is placed into the first available space
within named memory with any alignment taken into consideration.
3. Output sections not handled by one of the above steps are allocated. Each
output section is placed into the first available space within memory with
any alignment taken into consideration.
If all memory is contiguous and configured (the default case), and no SECTIONS
directives are given, then output sections are allocated in the order
they appear to ld.
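Steps 2 and 3 above both place a section "into the first available space" after alignment. A minimal sketch of such a first-fit placement follows. It is not taken from ld: the range structure, the function names align_up and first_fit, and the decision to discard alignment padding are assumptions made for this example.

```c
#include <assert.h>

/* A free range of configured memory: [start, start + length). */
struct range {
    unsigned long start;
    unsigned long length;
};

/* Round addr up to the next multiple of align (align must be nonzero). */
unsigned long align_up(unsigned long addr, unsigned long align)
{
    assert(align != 0);
    return (addr + align - 1) / align * align;
}

/* Place a section of the given size in the first free range that can hold
 * it once the start address is aligned. On success the chosen address is
 * stored in *addr, the range is shrunk, and 1 is returned; 0 means that
 * no range was large enough. */
int first_fit(struct range *free_list, int n, unsigned long size,
              unsigned long align, unsigned long *addr)
{
    int i;

    for (i = 0; i < n; i++) {
        unsigned long start = align_up(free_list[i].start, align);
        unsigned long pad = start - free_list[i].start;

        if (pad <= free_list[i].length &&
            size <= free_list[i].length - pad) {
            *addr = start;
            /* Simplification: the padding before start is not reused. */
            free_list[i].length -= pad + size;
            free_list[i].start = start + size;
            return 1;
        }
    }
    return 0;
}
```

With free ranges at 0x10 (length 0x20) and 0x1000 (length 0x1000), a section of size 0x100 aligned to 0x100 skips the first range, which is too small after alignment, and lands at 0x1000.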
Incremental link editing
As previously mentioned, the output of ld can be used as an input file to sub­
sequent ld runs provided that the relocation information is retained in the ini­
tial run by using the -r option. With large applications, it may be desirable to
partition the C programs into subsystems, link each subsystem indepen­
dently, and then link edit the entire application.
The first two steps in the following example are described by giving an ld
command line followed by the file of ld directives that was used in that com­
mand. The third step is the final link edit.
Step 1:
ld -r -o outfile1 ifile1 infile1.o
/* ifile1 */
SECTIONS
{
    ss1 :
    {
        f1.o
        f2.o
        ...
        fn.o
    }
}
Step 2:
ld -r -o outfile2 ifile2 infile2.o
/* ifile2 */
SECTIONS
{
    ss2 :
    {
        g1.o
        g2.o
        ...
        gn.o
    }
}
Step 3:
ld -o final.out outfile1 outfile2
By judiciously forming subsystems, applications may achieve a form of incre­
mental link editing whereby it is necessary to relink only a portion of the total
link edit when a few files are recompiled.
There are two simple rules to follow when applying this technique:
Intermediate link edits should have only SECTIONS declarations in the link
edit command language file, and be concerned only with the formation of
output sections from input files and input sections. No binding of output
sections should be done in these runs.
All allocation and memory directives, as well as any assignment statements,
are included only in the final ld run.
Sections may be given a type in a section definition, as shown in the following
example:
SECTIONS
{
    name1 0x200000 (DSECT)   : { file1.o }
    name2 0x400000 (COPY)    : { file2.o }
    name3 0x600000 (NOLOAD)  : { file3.o }
    name4          (INFO)    : { file4.o }
    name5 0x900000 (OVERLAY) : { file5.o }
}
The DSECT option creates a dummy section. A dummy section has the follow­
ing properties:
It does not participate in the memory allocation for output sections. As a
result, it takes up no memory and does not show up in the memory map
generated by ld.
It may overlay other output sections and even unconfigured memory.
DSECTs may overlay other DSECTs.
The global symbols defined within the dummy section are relocated nor­
mally. That is, they appear in the output file's symbol table with the same
value they would have had if the DSECT were actually loaded at its virtual
address. Symbols defined in a DSECT may be referenced by other input
sections. Undefined external symbols found within a DSECT cause
specified archive libraries to be searched: any members which define such
symbols are link edited normally.
None of the section contents, relocation information, or line number information associated with the section is written to the output file.
In the above example, none of the sections from file1.o are allocated, but all
symbols are relocated as though the sections were link edited at the specified
address. Other sections could refer to any of the global symbols and they are
resolved correctly.
A copy section created by the COPY option is similar to a dummy section.
The difference between a COPY section and a dummy section is that the contents
of a COPY section and all associated information are written to the output
file.
An INFO section is similar to a COPY section: instead of containing valid text
and data, as a COPY section does, its purpose is to carry information about the
object file. INFO sections are usually used to contain file version identification
information.
A section with type NOLOAD differs in only one respect from a normal output
section: its text or data is not written to the output file. A NOLOAD section is
allocated virtual space, appears in the memory map, and so on.
An OVERLAY section is relocated and written to the output file. It is different
from a normal section in that it is not allocated and may overlay other sections
or unconfigured memory.
Output file blocking
The BLOCK option, applied to any output section or GROUP directive, is used
to direct ld to align a section at a specified byte offset in the output file. Note
that this has no effect on the address at which the section is allocated nor on
any part of the link edit process. It is used purely to adjust the physical posi­
tion of the section in the output file.
SECTIONS
{
    .text BLOCK(0x200) : { }
    .data ALIGN(0x20000) BLOCK(0x200) : { }
}
With this SECTIONS directive, ld assures that each section, .text and .data, is
physically written at a file offset that is a multiple of 0x200 (at an offset of 0,
0x200, 0x400, and so forth, in the file).
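The arithmetic behind blocking is a round-up to the next multiple of the block size. The sketch below shows that calculation for a run of consecutive sections. It is an illustration under stated assumptions and does not describe how ld lays out a file: the function names block_align and assign_file_offsets, and the idea of passing the first free offset as a parameter, are invented for this example.

```c
#include <assert.h>

/* Round offset up to the next multiple of block (block must be nonzero). */
unsigned long block_align(unsigned long offset, unsigned long block)
{
    assert(block != 0);
    return (offset + block - 1) / block * block;
}

/* Lay out n sections one after another, starting no earlier than
 * first_free, so that each section begins on a multiple of block.
 * The chosen offsets are stored in offsets[]; the return value is the
 * file offset just past the last section. */
unsigned long assign_file_offsets(const unsigned long *sizes,
                                  unsigned long *offsets, int n,
                                  unsigned long first_free,
                                  unsigned long block)
{
    unsigned long pos = first_free;
    int i;

    for (i = 0; i < n; i++) {
        pos = block_align(pos, block);
        offsets[i] = pos;
        pos += sizes[i];
    }
    return pos;
}
```

For instance, with a block size of 0x200, a first free offset of 0x1a8, and section sizes of 0x350 and 0x120, the sections are placed at file offsets 0x200 and 0x600. The virtual addresses of the sections play no part in this calculation.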
Nonrelocatable input files
If a file produced by ld is intended to be used in a subsequent ld run, the first
ld run should have the -r option set. This preserves relocation information
and permits the sections of the file to be relocated by the subsequent run.
If an ld input file does not have relocation or symbol table information
(perhaps from the action of a strip(CP) command, or from being link edited
without a -r option, or with a -s option), the link edit run continues using the
nonrelocatable input file.
For such a link edit to be successful (that is, to correctly link all input files,
relocate all symbols, resolve unresolved references, and so on), two conditions
on the nonrelocatable input files must be met:
Each input file must have no unresolved external references
Each input file must be bound to the same virtual address as it was bound
to in the ld run that created it
If these two conditions are not met for all nonrelocatable input files, no error
messages are issued. Because of this fact, extreme care must be taken when
supplying such input files to ld.
Syntax diagram for input directives
Square brackets and curly braces have two meanings in this diagram.
Where the actual symbols, [ ] and { }, are used, they are part of the syntax
and must be present when the directive is specified.
Where the symbols [ and ] (larger and in bold) appear, the material enclosed is
optional.
Where the symbols { and } (larger and in bold) appear, multiple occurrences
of the material enclosed are permitted.
Expanded Directives
MEMORY { <memory_spec>
{ [,] <memory_spec> } }
<name> [ <attributes> ]
<origin_spec> [,] <length_spec>
({R | W | X | I})
<origin> <long>
<length> <long>
ORIGIN | o | org | origin
LENGTH | l | len | length
SECTIONS { { <sec_or_group>} }
<section> | <group> | <library>
GROUP <group_options> : {
<section_list> } [ <mem_spec>]
<section> { [,] <section> }
<name> <sec_options> :
{ <statement> }
[ <fill> ] [ <mem_spec> ]
[ <addr>] | [ <align_option>] [ <block_option>]
[ <addr>] | [ <align_option>]
[ <block_option> ] [ <type_option> ]
<long> | <bind>( <expr> )
<align> ( <expr> )
ALIGN | align
<block> ( <long> )
BLOCK | block
= <long>
> <name>
> <attributes>
<filename> ( <name_list> ) | [COMMON]
* ( <name_list> ) | [COMMON]
<section_name> [,] { <section_name> }
BIND | bind
<lside> <assign_op> <expr> <end>
<name> | .
= | += | -= | *= | /=
; | ,
<expr> <binary_op> <expr>
* | / | %
+ | >> | <<
== | != | > | < | <= | >=
<align> ( <term> )
( <expr> )
<unary_op> <term>
<phy> ( <lside>)
<next>( <long>)
<addr>( <sectionname>)
! | PHY | phy
SIZEOF | sizeof
NEXT | next
ADDR | addr
-l<name>
Any valid symbol name
Any valid long integer constant
Blanks, tabs, and newlines
Any valid UNIX Operating System file name. This
may include a full or partial path name.
Any valid section name, up to 8 characters
Any valid UNIX Operating System path name (full or
partial)
Chapter 3
lint examines C language source files to detect bugs and obscure code con­
structions. It enforces the type rules of C more strictly than the C compiler
does. lint also enforces any portability restrictions involved in moving pro­
grams between different machines or operating systems. In addition, lint can
detect legal C constructs that are wasteful or error-prone. lint accepts multi­
ple input files and library specifications and checks them for consistency.
Two versions of lint are currently available: lint and rlint. lint is
based on the first pass of the Microsoft C compiler, which understands both
the ANSI and Microsoft dialects of the C language. rlint is based on the first
pass of the AT&T C compiler. All references to lint in this chapter also
apply to rlint unless explicitly stated otherwise.
The syntax for lint is:
lint [options] files ... libraries ...
The syntax for rlint is:
rlint [options] files ... libraries ...
The arguments are as follows:
options      flags to change the default settings for lint or to
             suppress certain types of messages displayed by lint
files        one or more files on which lint is to be run
libraries    additional libraries to be used, other than the Standard
             C Library which is used by default
The names of files written in C must end with the suffix .c. Files ending with
.ln are either lint library files or output files from lint's first pass. If rlint is
used, the equivalent files would end with .rln.
For the full list of options, refer to the lint(CP) and rlint(CP) manual pages.
Multiple options can be combined into a single argument, such as -ab or -xha.
lint accepts the option -lx, which names a lint library file of the form
llib-lx.ln found in the directory /usr/lib. The lint math library
/usr/lib/llib-lm.ln is called with the option -lm. The library
/usr/lib/llib-lm.rln is referenced if -lm is used with rlint. The -p
option is used to call the portable lint library in place of the standard C lint
library used by default. lint libraries are not actual libraries but are library
description files corresponding to the equivalent C libraries. These description
files all begin with the comment:
/* LINTLIBRARY */
A series of dummy function definitions follow the comment. The critical
parts of these definitions are: the declaration of the function return type; the
number and types of arguments to the function; and whether the dummy
function returns a value, for example:
int putchar(c) int c; { return (c); }
In this example, int putchar indicates that the function putchar returns an
integer. The function name is followed by a list of argument type definitions
separated by semi-colons (;). In the above example, int c indicates that the
single argument c must be an integer. Finally, return (c) indicates that the
function returns its argument.
lint libraries are processed like ordinary source files. However, functions that
are defined in a library, but not used in a source file, do not result in warning
messages. lint does not simulate a full library-search algorithm, and it will
print messages if the source files contain a redefinition of a library routine.
lint checks programs against the standard C lint library by default. To
suppress checking against the standard C lint library or the portable lint
library the -n option can be used.
lint message types
The following paragraphs describe the major categories of messages printed
by lint.
Unused variables and functions
As programs evolve, variables and arguments to functions may cease to be
used. It is not uncommon for external variables or even entire functions to
become unnecessary but remain in the source. These errors rarely cause programs
to fail, but they are inefficient and make the source harder to understand.
lint prints messages about defined but unused variables and functions. These
messages can be suppressed with the -u or -x option.
A function can be written that does not use all of its arguments. Normally
lint will produce warning messages that indicate which arguments are not
used in a function. If the program is still being developed it may be desirable
to suppress these messages. This can be done with the -v option to lint. This
suppresses all messages about unused arguments except those arguments
which are also declared to be register variables. An unused register argument
is an active (and preventable) waste of the register resources of the machine.
The comment /* ARGSUSED */ can be placed before a function in the source
code to suppress messages about unused arguments. This has the same effect
as using the -v option with lint but applies only to the single function.
The comment /* VARARGSn */ can be used before functions in source files
and in lint library files to indicate that the function can take a variable number
of arguments. The value n indicates the number of arguments that lint should
perform type checking on, for example:
/* VARARGS1 */
int printf(s) const char *s; { return (0); }
This entry in the standard C lint library file indicates that printf can take multiple
arguments and that lint should only perform type checking on the first
argument. If /* VARARGS */ is used without an n value then no argument
type checking is done.
When lint is applied to some, but not all, files of a collection that are to be
loaded together, it issues warnings about unused or undefined variables.
Functions and variables that are defined may not be used in the files checked
by lint; conversely, functions and variables defined elsewhere may be used.
The -u option suppresses these spurious messages.
If a program is divided into many files it is likely that some of these files will
have external variable declarations. If external variables are declared but
unused in any of these files then lint will produce a warning message. These
warning messages can be suppressed by use of the -x option to lint.
Set/used information
lint attempts to detect cases where a variable is used before it is set. It detects
local variables (automatic and register storage classes) whose first use appears
physically earlier in the input file than the first assignment to the variable. It
assumes that taking the address of a variable constitutes a "use," because the
actual use may occur at any later time, in a data-dependent fashion.
The restriction to the physical appearance of variables in the file makes the
algorithm very simple and quick to implement, because the true flow of con­
trol need not be discovered. lint can print error messages about program
fragments that are legal but are considered bad stylistically. Because static
and external variables are initialized to zero, no meaningful information can
be discovered about their uses. lint does deal with initialized automatic variables.
The set/used information also allows recognition of those local variables that
are set but never used. These are frequent sources of inefficiencies and may
also be symptomatic of bugs.
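Because the check follows the physical order of the file rather than the flow of control, legal code can be reported. The contrived sketch below shows such a case. The function names answer and answer_plain are invented for this example, and the statement that lint would report the first function is an expectation drawn from the description above, not a recorded lint run.

```c
/* Legal C with a well-defined result: control always reaches the
 * assignment before the use. The first use of x nevertheless appears
 * physically earlier in the file than the first assignment, which is
 * the situation the set/used check looks for. */
int answer(void)
{
    int x;

    goto set;
use:
    return x + 1;   /* first use of x in file order */
set:
    x = 41;         /* first assignment, later in file order */
    goto use;
}

/* The same computation written so that the assignment comes first. */
int answer_plain(void)
{
    int x = 41;

    return x + 1;
}
```

Both functions return the same value; only the second keeps the assignment ahead of the use in the source text.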
Flow of control
lint attempts to detect unreachable portions of a program. It prints messages
about unlabeled statements immediately following goto, break, continue, or
return statements. It attempts to detect loops that cannot be left at the bottom,
and to recognize the special cases while(const) and for(;;) as infinite
loops. lint also attempts to warn about loops that cannot be entered at the
top. Valid programs may have such loops, but they are considered bad style.
Use the -b option to suppress warnings about unreached code.
The lint command has no way of detecting functions that are called and never
return. Thus, a call to exit may cause unreachable code that lint does not
detect. The most serious effects of this are in the determination of returned
function values (see the next section, "Function values"). If you think that a
particular place in the program is unreachable in a way that is not apparent to
lint, the following comment can be added to the source code at the appropriate
place:
/* NOTREACHED */
This informs lint that a portion of the program cannot be reached, and to
ignore it (that is, do not print a warning).
Programs generated by lex and yacc may have hundreds of unreachable
break statements, but messages about them are of little importance. There is
typically nothing the user can do about them, and the resulting messages
would clutter up lint's output. We recommend invoking lint with the -b
option when dealing with such input.
Function values
Sometimes functions return values that are never used; sometimes programs
incorrectly use function values that have never been returned. lint addresses
these problems in a number of ways.
Locally, the appearance of both of these lines within a function definition is
cause for alarm:
return (expr);
return;
lint returns the message:
function name has return(e) and return
The difficulty with this is detecting when a function return is implied by flow
of control reaching the end of the function. This can be seen with a simple
example:
f(a)
{
    if (a) return (3);
    g();
}
If a tests false, f calls g and then returns with no defined return value; this
triggers a message from lint. If g, like exit, never returns, the message is still
produced, even though nothing is wrong. The comment /*NOTREACHED*/ in
the source code suppresses this warning. In practice, this feature has detected
some potentially serious bugs.
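The pattern described above can be sketched in a small, self-contained form. The helper fatal and the function safe_div are hypothetical names chosen for this example. The sketch assumes that the checker cannot tell that fatal never returns, which is why the comment is placed after the call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints a message and terminates the program. Like exit, this function
 * never returns, but a checker that does not trace calls cannot know that. */
void fatal(const char *msg)
{
    (void) fprintf(stderr, "%s\n", msg);   /* return value deliberately unused */
    exit(1);
}

/* Returns a / b. When b is zero the function does not return at all,
 * so the end of the function body is never reached. */
int safe_div(int a, int b)
{
    if (b != 0)
        return (a / b);
    fatal("division by zero");
    /* NOTREACHED */
}
```

Without the comment, the function appears to contain both a return with a value and an implied return without one. A modern compiler may still warn that control reaches the end of a non-void function, since the comment is addressed to lint only.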
Globally, lint detects cases where a function returns a value that is sometimes
or never used. When the value is never used, it may constitute an inefficiency
in the function definition that can be overcome by specifying the function as
being of type (void). For example:
(void) fprintf(stderr, "File busy. Try again later!\n");
When the value is sometimes unused, it may represent bad style (such as not
testing for error conditions).
The opposite problem, using a function value when the function does not
return one, is also detected.
Type checking
lint enforces the C type-checking rules more strictly than compilers do. The
additional checking is in four major areas:
across certain binary operators and implied assignments
at the structure-selection operators
between the definition and uses of functions
in enumerations
A number of operators have an implied balancing between operand types.
The assignment, conditional, and relational operators have this property. The
argument of a return statement and expressions used in initialization suffer
similar conversions. In these operations, char, short, int, long, unsigned,
float, and double types may be freely intermixed. Pointer types must agree
exactly, except that arrays of x's can be intermixed with pointers to x's.
The type-checking rules for structure references also require that the left
operand of the "->" be a pointer to structure, the left operand of the "." be a
structure, and the right operand of these operators be a member of the structure
implied by the left operand. Similar checking is done for references to
unions.
Strict rules apply to function-argument and return-value matching. The types
float and double may be freely matched, as may the types char, short, int, and
unsigned. Also, pointers can be matched with the associated arrays. Aside
from this, all actual arguments must be the same types as their declared
counterparts.
With enumerations, checks are made that enumeration variables or members
are not mixed with other types or other enumerations and that the only operations
applied are "=", initialization, "==", "!=", function arguments, and return
values.
If you want to turn off strict type-checking for an expression, the comment
/* NOSTRICT */ should be added to the source code immediately before the
expression. This comment prevents strict type-checking for only the next line
in the program.
Type casts
The C type cast feature was introduced as an aid to portability. Consider this
assignment, where p is a character pointer:
p = 1;
lint warns you about the assignment. In this assignment, a cast is used to
convert the integer to a character pointer:
p = (char *) 1;
The programmer obviously had a strong motivation for doing this and has
clearly signaled those intentions. Nevertheless, lint prints a warning message
about it.
Non-portable character use
On some systems, characters are signed quantities with a range from -128 to
127. On other C language implementations, characters take on only positive
values. Thus, lint prints messages about certain comparisons and assign­
ments being illegal or non-portable. For example, this code fragment works
on one machine but fails on machines where characters always take on posi­
tive values:
char c;
if ((c = getchar()) < 0) ...
The real solution is to declare c as an integer, because getchar is actually
returning integer values. In any case, lint prints the message:
nonportable character comparison
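The difference between the two declarations can be made concrete without reading from a terminal. In the sketch below, next_byte is a hypothetical stand-in for getchar that reads from a buffer, and the structure and function names are assumptions made for this example. The second function uses signed char explicitly to reproduce the behavior of a machine on which plain char is signed.

```c
#include <stddef.h>
#include <stdio.h>

/* A buffer to read from, in place of standard input. */
struct source {
    const unsigned char *p;
    const unsigned char *end;
};

/* Stand-in for getchar(): returns the next byte as a nonnegative int,
 * or EOF (a negative int) when the buffer is exhausted. */
int next_byte(struct source *s)
{
    return s->p < s->end ? *s->p++ : EOF;
}

/* Recommended form: the result is kept in an int, so every byte value
 * is nonnegative and only EOF is negative. */
size_t count_with_int(struct source s)
{
    int c;
    size_t n = 0;

    while ((c = next_byte(&s)) >= 0)
        n++;
    return n;
}

/* Signed character form: a byte such as 0xFF becomes negative when it
 * is stored (the conversion is implementation-defined, and yields -1
 * on common machines), so the loop stops before the end of the data. */
size_t count_with_signed_char(struct source s)
{
    signed char c;
    size_t n = 0;

    while ((c = (signed char) next_byte(&s)) >= 0)
        n++;
    return n;
}
```

For the three bytes 'a', 0xFF, 'b', the int version counts all three, while the signed character version stops at the second byte. On a machine where characters are never negative, the same loop written with char would not stop at all.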
A similar issue arises with bit fields. When assignments of constant values
are made to bit fields, the fields may be too small to hold the values. This is
especially true because bit fields on some machines are considered signed
quantities. While it may seem logical that a 2-bit field declared type int can­
not hold the value 3, the problem disappears if the bit field is declared to be
type unsigned.
Assignments of longs to ints
Bugs can arise from the assignment of a long to an int, which truncates the
contents. This can happen in programs that have been incompletely con­
verted to use typedefs. When a typedef variable is changed from int to long,
the program can stop working because some intermediate results may be
assigned to ints, which are truncated. The -a option can suppress messages
about the assignment of longs to ints.
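Where a long value must be stored in an int, one defensive approach is to check the range first. The sketch below is one way to do that; the function names fits_in_int and long_to_int are invented for this example and are not part of any library.

```c
#include <limits.h>

/* Returns 1 if v can be stored in an int without truncation, else 0. */
int fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Checked narrowing: stores (int) v in *out and returns 1 only when no
 * information would be lost; otherwise leaves *out alone and returns 0. */
int long_to_int(long v, int *out)
{
    if (!fits_in_int(v))
        return 0;
    *out = (int) v;
    return 1;
}
```

On a machine where int and long are the same size the check always succeeds, which is exactly why truncation bugs of this kind tend to appear only after a program is moved to another machine.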
Unusual constructions
Several legal but unusual constructions are detected by lint. It is hoped that
the messages encourage clearer style, and even point out bugs. The -h option
can suppress these checks. For example, in the following statement, the *
does nothing.
*p++;
lint returns the message:
null effect
The following program fragment results in a test that will never succeed:
unsigned x;
if (x < 0) ...
Similarly, the next two tests are equivalent, although that may not be the
intended action:
if (x > 0) ...
if (x != 0) ...
lint prints the following message in these cases:
degenerate unsigned comparison
Suppose a program contains something similar to this:
if (1 != 0) ...
lint prints this message because the comparison of 1 with 0 gives a constant
result:
constant in conditional context
Another construction detected by lint involves operator precedence. Bugs
that arise from misunderstanding operator precedence can be accentuated by
spacing and formatting, making such bugs extremely hard to find. For exam­
ple, the following statements probably do not act as intended:
if (x&077 == 0) ...
x<<2 + 40
The best solution is to use parentheses in such expressions, and lint
encourages this with an appropriate message.
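The effect of the precedence rules can be checked directly. In the sketch below the function names are invented for this example, and a small addend is used in the shift expressions so that the shift count stays within the width of an int. A modern compiler may warn about the unparenthesized forms, which is the same advice that lint gives.

```c
/* Parsed as x & (077 == 0). Since 077 == 0 is false, the result is
 * always 0, whatever the value of x. */
int low_bits_clear_wrong(int x)
{
    return x & 077 == 0;
}

/* Intended test: are the low six bits of x all zero? */
int low_bits_clear(int x)
{
    return (x & 077) == 0;
}

/* Parsed as x << (2 + 3), because + binds more tightly than <<. */
int scaled_wrong(int x)
{
    return x << 2 + 3;
}

/* Intended computation: shift first, then add. */
int scaled(int x)
{
    return (x << 2) + 3;
}
```

For x equal to 0100 (octal), the low six bits are clear, yet only the parenthesized test reports it. For x equal to 1, the two shift expressions give 32 and 7 respectively.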
Multiple uses and side effects
With complicated expressions, the best order in which to evaluate subexpres­
sions may be machine-dependent. For example, on machines in which the
stack runs backwards, function arguments are probably best evaluated from
right to left. On machines with a stack running forward, left to right evalua­
tion seems most efficient.
Function calls embedded as arguments of other functions may or may not be
treated comparably to ordinary arguments. Similar issues arise with other
operators that have side effects, such as the assignment operators and the
increment and decrement operators.
The C language leaves the order of evaluation of complicated expressions up
to the local compiler, so that efficiency of C on a particular machine is not
unduly compromised. In fact, various C compilers have considerable
differences in the order in which they evaluate complicated expressions. In
particular, if any variable is changed by a side effect and also used elsewhere
in the same expression, the result is explicitly undefined.
lint checks for the important special case where a simple scalar variable is
affected. For example:
a[i] = b[i++];
This statement causes lint to call attention to this condition by printing:
warning: i evaluation order undefined
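The usual remedy is to move the side effect into a statement of its own, so that the variable is not both changed and used in one expression. A minimal sketch follows; the function name copy_and_advance is an assumption made for this example.

```c
/* Copies b[*i] into a[*i] and then advances *i. The increment is a
 * separate statement, so the order of evaluation is fully defined. */
void copy_and_advance(int *a, const int *b, int *i)
{
    a[*i] = b[*i];
    (*i)++;
}
```

Written this way, the index used on both sides of the assignment is the value before the increment on every compiler.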
Chapter 4
C Programmer's Productivity Tools
Introducing the C Programmer's Productivity Tools
This chapter will teach you how to use the C Programmer's Productivity Tools
(CPPT). Step-by-step instructions are provided through basic examples, so
you can start using these tools right away. Additional examples demonstrate
various options that allow you to make the best use of the tools.
The CPPT package consists of two tools: cscope, a browser, and lprof, a
profiler.
The cscope browser is an interactive program that locates specified parts of
code in a set of C source files and allows you to edit the files. It can reduce
significantly the amount of time you must spend searching for functions,
function calls, macros, and variables in the code. It is especially helpful for a
programmer working on unfamiliar code.
A profiler is a tool for analyzing a program's run-time behavior, a procedure
known as "dynamic analysis." The lprof tool allows a programmer or tester to
identify those parts of the source code that are most often executed and those
that are never executed when a program is run. It provides line by line fre­
quency profiling, reporting how many times each line of source code is exe­
cuted. You can request test coverage analysis so that lprof only reports which
lines of code are not executed at run time. lprof can be used over a set of tests
such as a test suite.
There are two profilers available for dynamic analysis of C programs written
in a UNIX System environment:
the prof tool performs time-profiling; it reports how much time is spent
executing various portions of a program
the lprof tool performs line-by-line frequency-profiling; it reports how
many times each line of source code is executed
This chapter explains the use of lprof; refer to the prof(CP) manual page for
information on using prof.
Creating a profiled version of a program
If a program is to be profiled with lprof, it must be compiled with the -ql
option to the cc command so that line count data will be saved. For example:
cc -ql travel.c
If you wish to create relocatable object files and link them later, you must
specify -ql when you link as well as when you compile:
$ cc -ql -c travel.c
$ cc -ql -c misc.c
$ cc -ql -o travel travel.o misc.o
To profile an individual source file, rather than the source for the whole pro­
gram, create a profiled version by using the -ql option when you compile the
source file in question, and again when you link.
Running the profiled program
When you execute the program, run-time data is stored in a data file whose
name consists of the program name plus the extension .cnt. When the program
ends, a message such as the following one is printed to stderr:
dumping profiling data from process 'travel'
CNTFILE 'travel.cnt' created
PROFOPTS environment variable
The environment variable PROFOPTS provides run-time control over
profiling, allowing you to override some of the default behavior. When the
profiled program is about to terminate, it examines the value of PROFOPTS to
determine how the profiling data is to be handled.
The PROFOPTS environment variable is a comma-separated list of options
interpreted by the program being profiled. If PROFOPTS is not defined in the
environment, then the default action is taken: the profiling data is saved in a
file (with the default name) in the current directory. If PROFOPTS is set to the
null string, no profiling data is produced.
The following options can be specified for PROFOPTS. They are explained in
more detail in the examples.
msg=[y | n]
    If msg=y is specified, a message is printed to stderr stating that
    profile data is being created. If msg=n is specified, only profiling
    error messages are printed. The default is msg=y.
merge=[y | n]
    If merge=n is specified, data files are not merged after successive
    runs; the data file will be overwritten after each execution. If
    merge=y is specified, the data will be merged. The merge will fail if
    the program has been recompiled between runs; in that case the data
    file will be stored in TMPDIR. The default is merge=n.
pid=[y | n]
    If pid=y is specified, the name of the data file will include the
    process ID of the profiled program. This allows the creation of
    different data files for programs calling fork(S). If pid=n is
    specified, the default name is used. The default is pid=n.
dir=dirname
    The data file is placed in the directory dirname if this option is
    specified. Otherwise, the data file is created in the directory that
    is current at the end of execution.
file=filename
    filename is used as the name of the data file created by the profiled
    program if this option is specified. Otherwise the default name is
    used. (See "Profiling programs that fork" for an example.)
Examples of using PROFOPTS
The following examples show how PROFOPTS can be used to configure the
environment in typical profiling situations.
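As a rough illustration of how a comma-separated option list such as PROFOPTS can be interpreted, the following shell sketch extracts one option from the list and falls back to a default when the option is absent. The function name profopt_get is hypothetical; the profiled program does its own parsing internally, and this guide does not show that code.

```shell
#!/bin/sh
# Sketch only: imitates reading one option out of a comma-separated list.
# profopt_get NAME DEFAULT
#   Print the value of option NAME in $PROFOPTS, or DEFAULT if NAME is absent.
# The body runs in a subshell, so the IFS change does not leak to the caller.
profopt_get() (
    name=$1
    value=$2
    IFS=,
    for opt in $PROFOPTS; do
        case $opt in
            "$name"=*) value=${opt#*=} ;;   # keep the text after the first "="
        esac
    done
    printf '%s\n' "$value"
)
```

For example, with PROFOPTS set to 'msg=n,merge=y,file=test1.cnt', the command profopt_get merge n prints y, while profopt_get pid n prints n because pid does not appear in the list.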
Turning off profiling
If you do not want to profile a particular run, you can set PROFOPTS to the
null string on the command line when you run a profiled version of a program:
PROFOPTS="" a.out
However, this value will remain in effect for only one execution of one program.
If you want to turn off profiling for more than one program or run, you must
export the value of PROFOPTS.
Exporting the variable eliminates the need to specify it every time you run the
program. Once you have exported PROFOPTS, it keeps the value you have
given it until you unset or redefine that variable. This makes the value of
PROFOPTS accessible to all runs of any profiled programs.
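The difference between a one-time setting and an exported setting can be seen in the following sketch. A real profiled program is replaced here by a small sh -c command (an assumption made only so that the example runs anywhere) that reports what it finds in its environment.

```shell
#!/bin/sh
# Stand-in for a profiled program (hypothetical): it only reports PROFOPTS.
probe='if [ "${PROFOPTS+set}" = set ]; then echo "set to [$PROFOPTS]"; else echo "not set"; fi'

unset PROFOPTS
one_shot=$(PROFOPTS="" sh -c "$probe")   # null string for this one command only
after=$(sh -c "$probe")                  # the next run does not see PROFOPTS

PROFOPTS=""; export PROFOPTS             # null string for every later run
exported=$(sh -c "$probe")

unset PROFOPTS                           # restore the default behavior
restored=$(sh -c "$probe")

echo "one-shot run:   $one_shot"
echo "following run:  $after"
echo "after export:   $exported"
echo "after unset:    $restored"
```

The one-shot form affects a single command, whereas the exported value stays in effect for every later command until the variable is unset or redefined.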
Merging data files
Information collected from multiple executions can be collected into a merged
data file. When data files created with the lprof compiling option are merged,
the execution counts for all files are added together.
The following example shows how to configure the environment if you want
data files from successive runs to be merged:
dumping profiling data from process 'travel'
CNTFILE 'travel.cnt' created
dumping profiling data from process 'travel'
CNTFILE 'travel.cnt' updated
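The following sketch shows one way a series of merged runs might be driven from the shell. The function name run_merged is hypothetical, and the program is assumed to have been compiled and linked with -ql; each invocation sets merge=y for that run only, so nothing needs to be exported.

```shell
#!/bin/sh
# Sketch only.
# run_merged PROGRAM ARG...
#   Run PROGRAM once for each ARG with merge=y, so that the counts from
#   every run are added to the same data file. Stops at the first failure.
run_merged() {
    prog=$1
    shift
    for arg in "$@"; do
        PROFOPTS="merge=y" "$prog" "$arg" || return 1
    done
}
```

A call such as run_merged ./travel input1 input2 (the input names are placeholders) would run travel twice; judging from the messages shown above, the first run creates the data file and the second updates it.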
Keeping data files in a separate directory
To avoid cluttering the current directory, you may want to create a separate
directory for data files. This directory must be specified in PROFOPTS. For example:
PROFOPTS="dir=cntfiles" travel
All the data files will be created in the subdirectory cntfiles.
Profiling within a shell script
You may want to write a shell script to run profiled programs automatically.
This facilitates specifying several PROFOPTS conditions for a particular program.
The shell script in the next example implements the following conditions:
no notification that profiling data is being created
data merged automatically
a name is specified for the data files
Here is the script:
PROFOPTS='msg=n,merge=y,file=test1.cnt' myprog
test1
Profiling programs that fork
If a program uses the system call fork(S), the data files of the parent and child
processes will have the same name by default. You can avoid this by using
the PROFOPTS option pid. By setting pid=y, you ensure that the data file
name will include the process ID of the program being profiled. As a result,
multiple data files will be created, each with a unique name.
If you run a program that forks without setting the pid=y option, then:
if you have set merge=y, the data will be merged; data from separate
processes will be indistinguishable
if you have set merge=n, the last process to dump data will overwrite the
data file
The following example shows how the pid option works. Notice the names of
the data files that are created (as reported by the messages sent to stderr):
PROFOPTS="pid=y" forkprog
dumping profiling data from process 'forkprog'
CNTFILE '922.forkprog.cnt' created
dumping profiling data from process 'forkprog'
CNTFILE '923.forkprog.cnt' created
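Judging from the messages above, the data file name produced with pid=y has the form PID.program.cnt. The sketch below composes the expected name so that a script can locate the file afterwards. The function name cnt_name is hypothetical, and the naming pattern is inferred from the example output rather than from a formal specification.

```shell
#!/bin/sh
# Sketch only: the pid=y pattern is inferred from the example messages.
# cnt_name PROGRAM PIDFLAG [PID]
#   Print the data file name expected for PROGRAM.
#   PIDFLAG is y or n, mirroring the pid option of PROFOPTS.
cnt_name() {
    if [ "$2" = y ]; then
        printf '%s.%s.cnt\n' "$3" "$1"   # for example 922.forkprog.cnt
    else
        printf '%s.cnt\n' "$1"           # default name: program name plus .cnt
    fi
}
```

For instance, cnt_name forkprog y 922 prints 922.forkprog.cnt, and cnt_name travel n prints travel.cnt.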
Interpreting profiling output
You can use lprof to:
produce source listing reports of profile data
produce summary reports of profile data
merge profile data files
Viewing the profiled source listing
Along with profiling information, lprof produces a source listing by default.
Once you have executed your profiled program and the data file has been created,
you can view the profile data by entering the following command:
$ lprof
The lprof output consists of a source listing with profiling information in the
left margin, as shown in the following example:
                #include <stdio.h>

                main()
    1  [3]      {
                /* Note that declarations are not executable and
                   therefore have no line-number or execution
                   status associated with them. */
                        int i;

   10  [10]             for (i=0; i < 10; i++)
   10  [11]                     sub1();
    1  [12]     }

                sub1()
   10  [15]     {
                /* The following declaration *is* an executable
                   statement. */
   10  [20]             int i = 0;

   10  [22]             if (i > 0)
                        {
                /* The next line is never executed. */
    0  [25]                     sub2();
    0  [26]             }
    0  [27]             else {
   10  [28]                     sub3();
   10  [29]             }
   10  [30]     }

                sub2()
    0  [33]     {
                        /* do nothing */
    0  [35]     }

                sub3()
   10  [38]     {
                        /* do same as sub2() */
   10  [40]     }
The square brackets enclose line numbers for the file. Each number to the left
of a line number shows how many times the corresponding source line was
executed.
Showing unexecuted lines
If you use the -x option to lprof, the output highlights the lines that have not
been executed. Lines that have been executed are marked only by line numbers.
Lines that have not been executed are marked with a line number preceded
by [U]. The following example shows output produced by the -x option:
                #include <stdio.h>

                main()
       [3]      {
                /* Note that declarations are not executable and
                   therefore have no line-number or execution
                   status associated with them. */
                        int i;

       [10]             for (i=0; i < 10; i++)
       [11]                     sub1();
       [12]     }

                sub1()
       [15]     {
                /* The following declaration *is* an executable
                   statement. */
       [20]             int i = 0;

       [22]             if (i > 0)
                        {
                /* The next line is never executed. */
  [U]  [25]                     sub2();
  [U]  [26]             }
  [U]  [27]             else {
       [28]                     sub3();
       [29]             }
       [30]     }

                sub2()
  [U]  [33]     {
                        /* do nothing */
  [U]  [35]     }

                sub3()
       [38]     {
                        /* do same as sub2() */
       [40]     }
In any lprof output, certain lines (such as declarations, comments, and blank
lines) do not have line numbers associated with them. This allows you to
distinguish lines that were not executed during a particular run from
those that are not executable. In the previous example, neither line 24 nor line
25 in sub1 was executed, but line 25 is marked with a line number while line
24 is not. This is because line 24 is not executable; line 25 is executable but
was not executed in the run that produced this output.
Specifying program and data files to lprof
By default, lprof expects the profiled program to be called a.out, and the data
file, a.out.cnt.
To run lprof on a program with a name other than a.out, specify the name
after the -o option. For example:
lprof -o sample
lprof will assume that the data file is called sample.cnt.
You can specify a data file other than sample.cnt by using the -c option. For
example:
lprof -c newdata.cnt
Files needed by lprof
The lprof tool must have access to three kinds of files in order to produce
profiling information:
the executable program (compiled with -ql)
the data file associated with one or more program runs
the source code files for the program being profiled
Depending on which operation you request, lprof will not run if it cannot
find one of these files.
When you run a profiled program, the name of the program is stored in the
data file exactly as it appears on the command line. If you do not specify the
-o option when you run lprof, it consults the data file to obtain the name of
the program. Therefore, you may invoke lprof, specifying the name of the
data file, and let lprof determine the name of the program. Because the name
of the data file is not stored in the program itself, the reverse is not true: you
cannot specify the name of the program and expect lprof to determine the
name of the data file if it is not the default name.
The lprof tool will not be able to display data if you perform the following
steps in the order shown:
1. use a relative path name for the executable file when you run your
profiled program
2. run lprof from a different directory, specifying only the name of the data
file (that is, without specifying the program name)
When you run lprof from a directory other than the one in which you have
executed your profiled program, and you have used a relative path name
when executing the profiled program, then you must specify the -o option to
lprof with either the profiled program's full pathname or the program's pathname
relative to your current directory. The following example illustrates this:
cd $HOME
mybin/a.out
cd src
lprof -o ../mybin/a.out -c ../a.out.cnt
In this example, the -c is necessary because you are no longer in the directory
where the program was run from (and hence where the data file was created).
The -o is necessary because otherwise lprof will look for an executable file
called mybin/a.out in the current directory ($HOME/src). It is assumed in this
example that the source files are in the current directory.
Source files in a different directory
The names of the source files for the profiled program are also stored in the
data file. However, only the file names, not the directories, are stored.
Consequently, when lprof searches for source files, it must assume that the source
files are in the current directory. If they are in another directory, you must
specify their location with the -I option and a path name.
The following example is a modified version of the previous one that demonstrates
the use of -I. As before, assume that the source files are in the directory
$HOME/src.
cd $HOME
mybin/a.out
cd doc
lprof -o $HOME/mybin/a.out -c $HOME/a.out.cnt -I $HOME/src
In this example, lprof is run from the directory $HOME/doc, but the source
files are in $HOME/src.
You can specify multiple -I arguments.
Source listing for a subset of files
If you want profiling output for a limited number of selected files, use the -r
option:
lprof -r file1.c -r file2.c
This command line will produce output only for file1.c and file2.c. This is useful
if you want to examine a few files rather than an entire program.
Summary option
You can obtain a summary report of the profile data by using the -s option:
lprof -s -c sample.cnt
Because a source listing is not produced with lprof -s, the -r and -I options do
not need to be specified: lprof needs only to find the data file. The following
example shows output produced with the -s option:
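Coverage Data Source: a.out.cnt
Date of Coverage Data Source: Wed Sep 18 16:42:19 1991
Object: ./a.out

[Summary table with one row per function; the columns give the percentage of
lines covered, the number of lines covered, the total number of lines, and the
function name. The surviving entries are 100.0, 66.7, and 100.0 percent (one
row names main), followed by 70.6 percent on the final line.]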
This table gives the percentage of lines in each function that are actually
executed during the program run.
Merging option
As described in the section "The PROFOPTS environment variable", data files
can be merged automatically at run-time. You can also merge existing data
files using the lprof command, as illustrated in the following example:
lprof -d destfile -m file1.cnt file2.cnt file3.cnt
The -d option is followed by the name of the file that will contain the merged
data. The -m option is followed by the names of two or more data files to be
merged. The data files must have been created by the same profiled program;
if they have not, lprof will issue an error message:
$ lprof -d merged.cnt -m prog1.cnt prog2.cnt
ERROR: 'prog1', 'prog2'
Object file entry names & timestamps don't match.
*** no merged output ***
You may have multiple data files created by the same program that have
different time stamps. This will happen, for example, if you recompile a program.
If you want to merge data from runs of different versions of the same
program, you can override the time-stamp check by specifying the -T (time stamp)
option.
NOTE You must be extremely cautious when using the -T option. If the
control flow of the recompiled program has changed, the new merged data
file is very likely to be erroneous, and lprof will produce an incorrect report.
Cautionary notes on using lprof
This section describes solutions for several problems that may arise while you
are using lprof.
Trouble at compile time
Sometimes if you are compiling with one or more forms of optimization
together with the profiling option, you will get an error warning that there is
an incompatibility between obtaining profiling information and optimization.
This is usually a permissible combination of options, but occasionally the
compiler does not accept it.
You may not need to have the function in question profiled. If not, ignore this
warning; data will be collected in the data file for all other functions. If you
do want data for the function in question, compile your program again with
the profiling option but without optimization. The warning should not reappear.
Non-terminating programs
If the profiled program does not terminate normally, no profiling data will be
saved. The profiling data is saved at termination by the system call exit(S). If
exit is never called, no profiling data is saved.
Failure of data to merge
If a program has been recompiled, a new data file will be created in a temporary
directory. The path name of the new file will be printed to stderr.
Trouble at the end of execution
At the end of execution, you may see the following error message:
dumping profiling data from process 'a.out'
*** unable to seek to symbol table
. . .
Usually this is caused by running a stripped version of a profiled program.
Never strip files to be profiled. If necessary, change "makefiles" so that they
do not produce stripped files.
No data collected
Sometimes you may get no data after running a profiled program. The program
terminates normally, and you receive neither a message about data
being saved, nor an error message. This may be caused by one of the following
problems:
You may not have specified -ql at both compile time and link time. If you
forget to specify -ql when you link, the profiled program will run, but a
data file will not be created.
The profiled program may include a call to _exit that is causing the program
to quit without calling exit(S), the procedure that saves your profiling
data. Replace calls to _exit with calls to exit(S) to save profiling data.
The PROFOPTS variable may be set to the null string.
Data file not found
Occasionally, lprof may not be able to find the data file, despite the fact that
the profiled program has terminated normally and you have received a message
saying that the data file has been created.
The profiled program creates the data file in the directory in which the program
is located when it terminates. If the program changes directories, the
data file may be created in a directory different from both the one from which
you executed the program and the one in which the shell is located when the
program terminates.
Use the dir option of PROFOPTS to specify exactly where the data file is to be
created, so you will be able to find it.
Using lprof with program generators
Program-generators such as lex and yacc make use of the #line preprocessor
directive to change line numbering. This generally disrupts line profiling such
as is done with lprof.
Using lprof with shared libraries
It is recommended that when profiling with lprof, you use archived versions
of libraries rather than shared versions.
Improving performance with prof and lprof
The prof and lprof profilers can help a programmer locate the time-consuming
parts of a C program.
prof provides a time profile, that is, a list of the most time-consuming functions
and the amount of time taken by each. The lprof tool provides a list of
the lines that are being executed most frequently. Once these potential problem
areas have been identified, it is the programmer's job to rewrite those
parts of the code so that the program runs more efficiently.
Although either of these profilers can be used singly, they are most effective if
you use them together. First, profile your program with prof to identify the
most time-consuming functions. Then, profile only those functions (rather
than the entire program) with lprof to determine which lines are being executed
most frequently. This two-step approach takes the guesswork out of
determining which lines of code are the most time-consuming.
It is important to profile programs with data that are typical of what the program
will encounter in normal use. Most test cases fail to provide profiling
data representative of typical usage.
Improving test coverage with lprof
It is difficult to write test suites that fully exercise programs if you have no
way of determining how much of the code is exercised. The lprof tool
removes the guesswork by showing which lines of code are executed. This
allows the tester to know exactly what has been tested. It also makes it easier
to refine and improve tests.
To measure how well a given test suite tests a program, profile the program
and look at the summary output, to see how much of the code is exercised.
More specifically, you can examine individual functions that do not have
100% coverage (that is, not all lines in the function were executed). This
analysis may suggest ways of improving the tests.
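A small filter can pick out the functions that need attention. The sketch below assumes that each function's row in the lprof -s report consists of four whitespace-separated fields (percentage of lines covered, lines covered, total lines, function name). That layout is an assumption based on the column headings of the summary report, and the function name below_full_coverage is hypothetical.

```shell
#!/bin/sh
# Sketch only: the four-field row layout is an assumption.
# below_full_coverage
#   Read an lprof summary report on standard input and print the name of
#   every function whose coverage percentage is below 100.
below_full_coverage() {
    awk 'NF == 4 && $1 ~ /^[0-9]+(\.[0-9]+)?$/ && ($1 + 0) < 100 { print $4 }'
}
```

It might be used as lprof -s -c sample.cnt | below_full_coverage. Header lines are skipped because their first field is not a number; if the report ends with a four-field totals row, that row would be printed as well.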
The following two examples illustrate why certain functions may not have
100% coverage. The first example demonstrates how to uncover a feature that
is usually missed in the test suite. The second example shows how to uncover
a function that is never called.
Example 1: searching for undocumented features
First, examine a source listing to see what parts of the code are not executed.
The portion of code profiled in the following example processes command-line
options.
                while ((c = getopt(argc, argv, "blcnsvi")) != EOF)
                        switch (c) {
                        case 'v':
  [U]  [34]                     vflag++;
                                break;
                        case 'c':
                                cflag++;
                                break;
                        case 'n':
                                nflag++;
                                break;
                        case 'b':
                                bflag++;
                                break;
                        case 's':
                                sflag++;
                                break;
                        case 'l':
                                lflag++;
                                break;
                        case 'i':
       [52]                     iflag++;
                                break;
                        case '?':
  [U]  [55]                     errflg++;
       [56]             }
The output shows that the code for the -v option was never executed. To
correct this, create a test that exercises the option.
Example 2: functions that are never called
Consider the following lprof summary.
Coverage Data Source: test.cnt
Date of Coverage Data Source: Wed Mar 6 11:11:58 1991
Object: myprog

percent lines   total   function
covered         lines   name

 91.5            106    compile
100.0             18    step
100.0             73    advance
100.0              4    getrnge
 42.9             28    main
100.0             29    execute
100.0             19    succeed
 42.9              7    putdata
  0.0             19    regerr
100.0             21    fgetl

 85.2
None of the lines in the function regerr are executed. To find out why, invoke
cscope (described later in this chapter) and request a list of the functions that
call it. If cscope reports that no function calls regerr, it may be possible to
delete it from the code.
Using lprof with rcc
The examples given so far in this chapter illustrate using lprof with cc. This
tool may be used with rcc also. As with cc, profiled programs must be compiled
with the -ql option. All the lprof options remain the same. There will,
however, be some differences in behavior.
Values are reported for goto labels and case labels when using cc even
though these statements do not correspond to executable code. These
values are not meaningful.
Similarly, values are reported for a line containing only the closing bracket
of an if statement, and for a line that contains an else statement. These
values are carry-overs from previous lines.
The number of times a function was called and exited is marked
differently. If rcc is being used, the beginning and ending curly braces are
marked:
foo(i)
int i;
If cc is used, the first formal parameter declaration and the ending curly
brace are marked:
foo(i)
int i;
If a break or continue statement is executed inside a loop, lprof, used with
cc, will mark the closing curly brace as having been executed a smaller
number of times than the beginning curly brace, as shown in the next
example:
[10]
while (i)
if (exp) break;
If lprof is used with rcc, the beginning and ending curly brace will both be
marked with the actual number of executions of the first curly brace:
while (i)
if (exp) break;
Similarly, functions that use setjmp() and longjmp() may have their start-
and end-points marked with a different number of executions.
These differences occur because of differences in the way cc and rcc generate
code and line number information.
The cscope browser builds a cross-reference symbol table for the functions,
function calls, macros, and variables in designated source files. It then allows
you to query that table about the locations of symbols. cscope presents a
menu and asks you to choose the type of search you would like to perform.
When cscope has completed this search, it prints a list of the lines on which it
has found the item requested. You can then indicate which of these lines you
want to examine. After you have requested a subset of the lines, cscope
allows you to edit a line or to begin another search.
Throughout a cscope session, you have the option of returning to the menu
from the editor to request a new search. There are a variety of
single-character commands available for manipulating the menu.
Because the procedure you follow will depend on the task you select, there is
no single set of instructions for using cscope. To show how this browser
works, a tutorial is provided later in the chapter.
Configuring the environment
Before you can use cscope, you must ensure that it can employ the editor and
terminal that you wish to use.
Setting the terminal type
Check the value of the TERM environment variable to make sure you have set
it to the correct terminal type for your terminal. This is done with commands
such as the following:
TERM=ansi; export TERM
These commands may be used at the UNIX System prompt or in a .profile file.
When you invoke cscope, you may see the following error message:
cscope: 'term' is not in the terminal database.
If this message appears, your terminal may not be listed in the terminal
information (terminfo) database that is currently loaded. Try reloading the
database from the Terminal Information Utilities.
You may also see the message:
cscope: TERM variable is not set or is not exported in your .profile
If this message appears, set and export the TERM variable.
Choosing an editor
By default, cscope invokes the vi editor. If you want to use another editor,
such as the emacs editor, you must set the EDITOR environment variable.
This is done with commands such as the following:
EDITOR=emacs; export EDITOR
These commands may be used at the UNIX System prompt or in a .profile file.
cscope expects vi and any other editor it uses to have a standard
command-line syntax of the following form:
editor +linenum filename
However, if the editor you want to use does not conform to this
command-line syntax, you must write an interface between cscope and the
editor.
For example, suppose you want to use ed. Because ed does not allow specification
of a line number on the command line, you will not be able to edit or
view any files while using cscope. To solve this problem, write a shell script
called, for example, myedit, that consists of the following line:
/bin/ed $2
Then set the value of EDITOR to the name of your shell script:
EDITOR=myedit; export EDITOR
Normally, when cscope invokes the editor, it uses a command line such as:
vi +17 file.c
Now, when cscope invokes the editor, it will call the shell script myedit with
the same arguments; this script will discard the line number and call ed
correctly with the filename ($2).
NOTE The ed tool has one other drawback as a cscope editor that you
should take into consideration when selecting an editor: it cannot move you
to specified lines in the file. If you use the shell script shown in the previous
example, you will have to move to specified lines manually.
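The same adapter can be written with the argument handling made explicit. This is a sketch only: it is shown as a shell function so that it can be tried interactively, and REAL_EDITOR is a hypothetical variable introduced here to make the underlying editor replaceable. The script described above simply runs /bin/ed on its second argument.

```shell
#!/bin/sh
# Sketch only. cscope is assumed to call the editor as:
#     myedit +linenum filename
# myedit drops the +linenum argument, which ed cannot accept, and passes
# only the file name to the real editor.
myedit() {
    linearg=$1    # for example +17; not used
    file=$2
    # REAL_EDITOR is hypothetical; without it the sketch falls back to /bin/ed.
    "${REAL_EDITOR:-/bin/ed}" "$file"
}
```

To see what would be passed on without starting an editor, set REAL_EDITOR=echo and run myedit +17 file.c; only file.c is printed.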
Choosing a browser
If you want to use cscope only for browsing (without editing) you can set the
VIEWER environment variable to pg and export VIEWER. cscope will then
invoke pg instead of vi.
Using cscope
If all the source files for the program to be browsed (with the possible exception
of standard system header files) are in the current directory, invoke
cscope without any arguments:
cscope
By default, cscope builds its cross-reference table for all the C, lex, and yacc
source files in the current directory. Therefore, typing cscope without any
arguments is equivalent to the following command line:
cscope *.[chly]
cscope also searches the standard directories for any header files that you
include with #include.
To browse through specific files, invoke cscope with file names as arguments
on the command line:
cscope filel.c file2.c file3.h
A list of all the files to be browsed may be read from a file: use the -i option to
specify this file. For example:
cscope -i list
If the source files are in a directory tree, the following commands will allow
you to examine all the source files easily:
find . -name '*.[chly]' -print | sort > filelist
cscope -i filelist
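The file-list step can be wrapped in a small function so that it works for any source tree. The function name make_filelist is hypothetical; the find pattern is the one shown above.

```shell
#!/bin/sh
# Sketch only.
# make_filelist DIR OUTFILE
#   Write a sorted list of the C, lex, and yacc source files found under
#   DIR (names ending in .c, .h, .l, or .y) to OUTFILE.
make_filelist() {
    find "$1" -name '*.[chly]' -print | sort > "$2"
}
```

After make_filelist . filelist, the list can be passed to cscope with the -i option as shown above.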
Searching for #include files
cscope automatically searches for the #include files that it encounters when
scanning. The -I option for cscope is similar to the -I option for cc. It directs
cscope to search specified directories for #include files. For example:
cscope -I ../hdr
The cscope tool searches for #include files in directories in this order:
1. the current directory
2. directories specified by the -I option
3. the standard location for header files (usually /usr/include).
Searching for source files
By default, cscope searches for source files only in the current directory. The
environment variable VPATH can be used to extend your search for source
files from a single directory to a set of directories.
VPATH should be set to the list of directories you want searched, in the order
you want them searched. Separate the directory names with colons. You
must specify the current directory in VPATH if you want it to be searched.
The current directory can be represented by the dot (".") symbol. The VPATH
variable may be set with commands such as the following:
export VPATH
These commands may be used on the command line or in a .profile file. In this
example, cscope will first search for files in the directory /fs2/mydirectory. If
the file is not in that directory, cscope continues searching for it in the other
directories specified in VPATH until it finds the file. The current directory,
represented by ".", is the last directory to be searched in this example.
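The search order described above can be pictured with the following sketch, which walks the directories in VPATH from left to right and reports the first one that contains a given file. It only imitates the documented behavior; it is not cscope's own code, and the function name find_in_vpath is hypothetical.

```shell
#!/bin/sh
# Sketch only: imitates a left-to-right search through $VPATH.
# find_in_vpath FILE
#   Print the path of FILE in the first VPATH directory that contains it.
#   The exit status is 1 if no directory contains FILE.
# The body runs in a subshell, so the IFS change does not leak to the caller.
find_in_vpath() (
    file=$1
    IFS=:
    for dir in $VPATH; do
        if [ -f "$dir/$file" ]; then
            printf '%s\n' "$dir/$file"
            exit 0
        fi
    done
    exit 1
)
```

With VPATH set to /fs2/mydirectory:. the command find_in_vpath alloc.c (the file name is a placeholder) would look in /fs2/mydirectory first and in the current directory last.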
Running cscope
After cscope has been invoked and the cross-reference information processed,
the cscope task menu appears on the screen:
cscope
Press the ? key for help
List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
Press the (Tab) or (Return) key to move the cursor down the screen (with
wraparound at the bottom of the display), and (Ctrl)p to move the cursor up.
Once the cursor is at the desired input field, enter the text to be searched, and
press the (Return) key.
The following single-key commands are available at any time during a cscope
session:
Table 4-1 Menu manipulation commands

(Tab)      move to next input field
(Return)   move to next input field
(Ctrl)n    move to next input field
(Ctrl)p    move to previous input field
(Ctrl)y    search with the last text typed
(Ctrl)r    rebuild the cross-reference
!          start an interactive shell
           (type (Ctrl)d to return to cscope)
(Ctrl)l    redraw the screen
?          display list of commands
(Ctrl)d    exit cscope
NOTE To type control characters such as (Ctrl)p hold down the (Ctrl) key
and press the letter shown.
The cross-reference file
When cscope is invoked it checks whether a file containing a cross-reference
symbol table exists in the current directory. If it does not exist, cscope builds
this table and refers to it during subsequent sessions. This table is created in
the current directory and is called cscope.out. The next time cscope is invoked,
it checks cscope.out for changes. cscope modifies the table if the list of source
files has been changed. If the table has been modified, cscope rebuilds only
the modified portions. Because copying information is much faster than
building it, subsequent calls to cscope should require less start-up time than
the initial call.
A cross-reference file other than cscope.out may be specified using the -f
option. This is useful for keeping separate symbol cross-reference files in the
same directory. You may want to do this if two programs are in the same
directory but do not share all the same files. For example:
cscope -f admin.ref admin.c common.c aux.c libs.c
cscope -f delta.ref delta.c common.c aux.c libs.c
In the preceding example, the source files for two programs are in the same
directory, but the programs involve different files. By specifying two reference
files, the cross-reference information for the two programs can be kept
separate.
As with cscope.out, if the file specified on the -f option does not exist, cscope
will build the cross-reference and leave it in the file specified.
The -d option allows you to prevent updating of the cross-reference table and
thereby save time. You should use this option only if you are sure that your
source files have not been changed. Because it is usually more important to
safeguard against generating erroneous data than to save time, avoid using
the -d option unless absolutely necessary. If you specify -d with cscope under
the erroneous impression that your source files have not been changed,
cscope will give you outdated information.
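Whether -d is safe can be checked mechanically before cscope is started. The following shell sketch is one way to do so. The function name xref_is_stale is hypothetical, and the sketch assumes that the sources are the *.c and *.h files under one directory; neither is part of cscope itself.

```shell
# Hypothetical helper: succeeds (exit status 0) when the cross-reference
# file is missing, or when any C source or header under the directory is
# newer than it. In either case -d should not be used.
xref_is_stale() {
    dir=${1:-.}
    ref=${2:-cscope.out}
    # No cross-reference yet: cscope has to build one.
    [ -f "$dir/$ref" ] || return 0
    # Collect the sources modified after the cross-reference was written.
    newer=$(find "$dir" -type f \( -name '*.c' -o -name '*.h' \) -newer "$dir/$ref" -print)
    [ -n "$newer" ]
}
```

A wrapper could then run cscope without -d when xref_is_stale succeeds, and cscope -d otherwise.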
A tutorial example: locating the source of the error message
In this example, the task is to locate the source of an error message, "out of
storage", that is printed when a program is run. Assume that you are working
with unfamiliar code. You will invoke cscope and start your search for the
problem by locating the section of code where the error message is generated.
Move the cursor to the fifth menu item, that is:
List lines containing this text string
Now enter the text to be searched for:
out of storage
This process is shown in the next example.
cscope
Press the ? key for help

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string: out of storage
Change this text string:
List file names containing this text string:
Press the (Return) key. The cscope tool searches for the specified text and
finds one line that contains it.
NOTE The same procedure is followed to perform any other task listed in
the menu, except the sixth, changing a text string. For a description and
examples of changing a text string, see the section "Examples of using
cscope" later in this chapter.
cscope reports its findings as follows:
Text string: out of storage

  File     Line
1 alloc.c  56   (void) fprintf(stderr,
                "\n%s: out of storage\n",
                argv[0]);

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
After cscope shows the results of a successful search in this way, you have
several options. For example, you may want to edit one of the lines found. If
cscope has found several lines and a list of them will not fit on the screen at
once, you may want to look at the next part of the list. The following table
shows the commands available after cscope has found the specified text.
Table 4-2 Commands for use after initial search
edit this line
(the number you type corresponds to an item
in the list of lines printed by cscope)
display the lines after the current line
display the lines after the current line
display the lines before the current line
edit all lines
append the list of lines being displayed to a file
NOTE If the first character of the text you are searching for matches one of
these commands, precede it with a "\" (backslash).
Now examine the code around the newly found line. Enter "1" (the number
of the line in the list). The editor will be invoked with the file alloc.c; the cursor
will be at the beginning of line 56 of the text file.
	return(alloctest(realloc(p, (unsigned) size)));
}

/* check for memory allocation failure */

static char *
alloctest(p)
char *p;
{
	if (p == NULL) {
		(void) fprintf(stderr,
			"\n%s: out of storage\n",
			argv[0]);
		exit(1);
	}
	return(p);
}
"alloc.c" 60 lines, 1022 characters
By examining the code, you learn that the error message is generated when
the variable p is NULL. To determine how an argument passed to alloctest
could have been NULL, you must first identify the functions that call alloctest.
Exit the editor by using normal write and quit conventions, and return to the
menu of tasks. Now request a list of functions that call alloctest, as shown in
the next example:
Text string: out of storage

  File     Line
1 alloc.c  56   (void) fprintf(stderr,
                "\n%s: out of storage\n",
                argv[0]);

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function: alloctest
List lines containing this text string:
Change this text string:
List file names containing this text string:
cscope finds and lists three such functions:
Functions calling this function: alloctest

  File     Function   Line
1 alloc.c  mymalloc   26 return(alloctest(malloc((unsigned) size)));
2 alloc.c  mycalloc   36 return(alloctest(calloc((unsigned) nelem,
                         (unsigned) size)));
3 alloc.c  myrealloc  46 return(alloctest(realloc(p,
                         (unsigned) size)));

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
Now you need to know which functions call mymalloc. cscope finds ten such
functions. It lists seven of them on the screen and instructs you to press the
space bar to see the rest of the list.
Functions calling this function: mymalloc

  File       Function        Line
1 alloc.c    stralloc        17  return(strcpy(mymalloc(strlen(s)
                                 + 1), s));
2 dir.c      makesrcdirlist  70  srcdirs = (char **)
                                 mymalloc(nsrcdirs * sizeof(char *));
3 dir.c      makesrcdirlist  89  s = mymalloc(strlen(srcdirs[i])
                                 + n);
4 dir.c      makefilelist    115 srcfiles = (char **)
                                 mymalloc(msrcfiles * sizeof(char *));
5 dir.c      makefilelist    116 srcnames = (char **)
                                 mymalloc(msrcfiles * sizeof(char *));
6 dir.c      addincdir       212 incdirs = (char **)
                                 mymalloc(sizeof(char *));
7 display.c  dispinit        76  displine = (int *)
                                 mymalloc(mdisprefs * sizeof(int));

* 3 more lines - press the space bar to display more *

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
Because you know that the error message ("out of storage") is generated at
the beginning of the program, you can guess that the problem may have
occurred in the function dispinit (display initialization). To view dispinit, the
seventh function on the list, type "7":
void
dispinit()
{
	/* calculate the maximum displayed reference lines */
	lastdispline = FLDLINE - 4;
	mdisprefs = lastdispline - REFLINE + 1;
	if (mdisprefs > 9) {
		mdisprefs = 9;
	}
	/* allocate the displayed line array */
	displine = (int *) mymalloc(mdisprefs * sizeof(int));
}
^L/* display a page of the references */

void
display()
{
	char	file[PATHLEN + 1];	/* file name */
	char	function[PATLEN + 1];	/* function name */
	char	linenum[NUMLEN + 1];	/* line number */
	int	screenline;		/* screen line number */
	int	width;			/* source line display width */
	register int	i, j;
"display.c" 440 lines, 10198 characters
Examining this code, you will see that mymalloc failed because it was called
with a negative number. You are now in a position to fix the problem. The
program needs a mechanism such that if the value of the variable mdisprefs
is negative, it will abort after printing a meaningful error message.
Stacking cscope and editor calls
cscope and editor calls can be stacked. This means that when cscope puts you
in the editor to display one symbol reference and there is another symbol of
interest, you can call cscope again from within the editor without exiting the
current invocation of either cscope or the editor. You can then back up to a
previous invocation by exiting the appropriate cscope and editor calls.
Conditional compilation directives
Conditional compilation directives such as #if, #ifdef, and #ifndef allow you
to put more than one definition of a function in your program, provided that
after the preprocessor interprets the directives, only one definition of the func­
tion is passed on to the compiler. These conditional compilation directives
are ignored by cscope. If there are multiple definitions of a function, cscope
recognizes only the first definition that appears in the source text; this occurs
even if the normal interpretation of the preprocessor directives would cause a
different definition to be compiled.
A consequence of this is that the correct definition of a function may not be
accessible via the "List functions called by this function" and "List
functions calling this function" menu items. You may obtain a list of
lines containing the name of the function using the "List lines containing
this text string" menu item. References to the function may be obtained
using the "List references to this C symbol" menu item.
Examples of using cscope
This section gives three examples of how cscope can be used to perform tasks:
change a constant to a preprocessor symbol, add an argument to a function,
and change the value of a variable.
Changing a text string
If you select the sixth item in the task menu, "Change this text string",
cscope accepts a search string, prompts you for new text and displays the
lines containing the old text. You can select the lines you want changed with
any of the following single-key commands:
Table 4-3 Commands for selecting lines to be changed
mark or unmark the line to be changed
mark or unmark all displayed lines to be changed
display next lines
display next lines
display previous lines
mark all lines to be changed
change the marked lines and exit
exit without changing the marked lines
Suppose you want to change a constant, 100, to a preprocessor symbol,
MAXSIZE. Select the menu item "Change this text string" and enter \100.
NOTE The 1 must be preceded by a "\" (backslash) because it has a special
meaning (item 1 on the menu) to cscope.
Now press (Return); cscope will prompt you for the new text string. Type
MAXSIZE. This process is illustrated in the following example:
cscope
Press the ? key for help

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string: \100
List file names containing this text string:
The lines containing the particular text string are displayed; cscope waits for
you to specify the lines in which you want the text to be changed.
Change "100" to "MAXSIZE"

  File    Line
1 init.c  4   char s[100];
2 init.c  26  for (i = 0; i < 100; i++)
3 find.c  8   if (c < 100) {
4 read.c  12  f = (bb & 0100);
5 err.c   19  p = total / 100.0; /* get percentage */

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
Select lines to change (press the ? key for help):
Occurrences of 100 in lines 1, 2, and 3 of the list (from lines 4, 26, and 8 of the
program) are to be changed to MAXSIZE. The occurrences of 100 in read.c and
err.c (lines 4 and 5 of the list) have a different meaning; in these lines, 100
should not be changed. Enter 1, 2, and 3.
The numbers you type are not printed on the screen. Instead, cscope prints a
">" (greater than) symbol after each number of the list that you type. For
example, after you type 1, a > symbol is printed after the number 1 in the list
(and before the line "init.c 4 char s[100];"), as shown in the following
example:
Change "100" to "MAXSIZE"

   File    Line
1> init.c  4   char s[100];
2> init.c  26  for (i = 0; i < 100; i++)
3> find.c  8   if (c < 100) {
4  read.c  12  f = (bb & 0100);
5  err.c   19  p = total / 100.0; /* get percentage */

List references to this C symbol:
Edit this function or #define:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
Change this text string:
List file names containing this text string:
Select lines to change (press the ? key for help):
After selecting lines, type (Ctrl)d to change them. Then cscope displays the
lines that have been changed:
Changed lines:

    char s[MAXSIZE];
    for (i = 0; i < MAXSIZE; i++)
    if (c < MAXSIZE) {

Type any character to continue:
When you type a character in response to this prompt, cscope will pause and
redraw the screen before allowing you to continue with the session.
The next step is to add the #define for the new symbol MAXSIZE. Escape to
the shell by typing an exclamation mark "!". The shell prompt will appear at
the bottom of the screen. Then enter the editor and add the #define. To
resume the cscope session, quit the editor and type (Ctrl)d to exit the shell.
Adding an argument to a function
The cscope tool makes it easy to add an argument to a function. This involves
two steps: editing the function itself and adding the new argument to each
place where the function is called.
First, edit the function by using the second menu item, "Edit this function
or #define". Next, find out where the function is called. By invoking the
fourth menu item, "List functions calling this function", you can get a
list of all functions that call it. With this list, you can either invoke the editor
on each line found by entering the list number for each line individually, or
invoke the editor on all lines automatically by typing (Ctrl)e. cscope is especially
useful when making this kind of change because it helps ensure that none
of the functions you need to edit will be overlooked.
Changing the value of a variable
The value of cscope as a browser becomes apparent when you want to see
how a proposed change will affect your code. Suppose you want to change
the value of a variable or preprocessor symbol. Before doing so, use the first
menu item, "List references to this C symbol", to obtain a list of references
that will be affected. Then use the editor to examine each one. This will
help you predict the overall effects of your proposed change. Later, you can
use cscope this way again to verify that your changes have been made.
Chapter 5
Most programming projects encompass a large number of individual files.
Keeping track of file interdependencies often gets too complex to maintain on
a piece of paper or by memory. make(CP), documented in the Programmer's
Reference, was designed to keep track of file-to-file relationships, the order of
command executions, and general file maintenance. make is a command gen­
erator. It generates command sequences to be executed within a UNIX system
shell. Within a make description file, more commonly known as a makefile or
Makefile, a user defines the file interdependencies and command sequences to
be executed.
If a program must be linked from object files and libraries, which are in turn
created from assembly or high-level language source files, then invoking
make performs this task automatically.
By using make, a programmer no longer has to be concerned with the follow­
ing scenario: if file A depends on file B, and if file B was modified after file A,
then file A must be compiled and linked before the program can run correctly.
A programmer can now let make remember:
file-to-file interdependencies
which files were modified recently
which files require recompilation after source changes
the exact sequence of operations required to generate a new version of the
program
NOTE For a detailed explanation of the make command line usage, see the
on- and off-line manual pages and the Programmer's Reference.
Basic features
make's primary function is to execute the steps required to produce a new
version of a specific (target) program.
make operates using the following information sources:
user-defined description file
filesystem data and timestamp information
set of suffix rules
The user-defined description file, conventionally called makefile or Makefile,
holds the information on interfile dependencies and command sequences. For
example, a program under development consists of the following:
three C language source files: x.c, y.c, and z.c (assume that x.c and y.c share
some declarations found in defs.h)
assembly language source in assmb.s, called by one of the C sources
a set of library routines in /usr/fred/lib/abc.a
The following code is a typical description file (makefile or Makefile), containing
all the information required by make (line numbers are included for illustra­
tion purposes only).
1  menu: x.o y.o z.o assmb.s abc.a
2  	cc -o menu x.o y.o z.o /usr/fred/lib/abc.a
3  x.o: x.c defs.h
4  	cc -c x.c
5  y.o: y.c defs.h
6  	cc -c y.c
7  z.o: z.c
8  	cc -c z.c
The description file contains four entries. Each entry contains a line with a
colon (the dependency line), and one or more command lines beginning with
a tab. To the left of the colon, on the dependency line, are one or more targets;
to the right of the colon are the files (components) on which the targets
depend. The tab-indented lines illustrate how targets are made from their
components.
For example, line 1 states that the program depends on files x.o, y.o, z.o, the
assmb.s assembly source file and the library abc.a. Line 2 specifies the linker
and compiler commands used in creating the program from its components.
make executes this command if menu does not exist or if any of the com­
ponent files was modified after the 'last-modified' (timestamp) date of menu.
make uses the remainder of the information contained in the description file
to ensure that each component is up-to-date before executing the cc command
(line 2). Line 3 indicates that x.o depends on x.c and defs.h. If either of these has
been modified, then x.c is recompiled, as directed by the command line (line
4). y.o has similar dependencies, as described in line 5. Line 7 shows that z.o
has only one dependency. After all the subordinate dependencies have been
brought up-to-date, make executes the command ( cc) shown in line 2.
If none of the source or object files have changed since the last time menu was
created, and all of the files are current, the make command announces this
fact and stops.
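The timestamp test that drives this decision can be sketched in a few lines of shell. The function name needs_rebuild is hypothetical, and the sketch covers only the comparison described above (target missing, or a component newer than the target). It does not attempt the recursive checking of components that make performs, and it silently ignores a component that does not exist.

```shell
# Hypothetical helper: succeeds (exit status 0) when TARGET must be remade,
# that is, when it does not exist or when any listed component is newer.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # find prints the component only if it was modified after the target.
        if [ -n "$(find "$dep" -newer "$target" -print 2>/dev/null)" ]; then
            return 0
        fi
    done
    return 1
}
```

For the example above, a call such as needs_rebuild menu x.o y.o z.o succeeds exactly when the cc command on line 2 would have to be run again.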
These entries can take advantage of make's ability to generate files and substitute
macros. (For information about macros, see 'makefiles and substitutions'
later in this chapter.) For example, an entry 'save' might be included
to copy a certain set of files, or an entry 'clean' might be used to throw away
unneeded intermediate files. For example:
cleanup:
	rm *.o
If the following command is issued, all object files are deleted.
make cleanup
If a file exists after such commands are executed, the file's time of last
modification is used in further decisions. If the file does not exist after the
commands are executed, the current time is used in making further decisions.
Maintaining a zero-length file to keep track of the time at which certain
actions were performed is useful for updating remote archives and listings.
The above description file can be simplified even further by using the existing
naming and compiling conventions, provided in the UNIX system environ­
ment. C language sources always have a .c suffix; a convention imposed by
the cc compiler. Similarly, assembly language sources have a .s suffix. These
conventions allow make, acting upon a set of ' suffix rules', to perform many
tasks. For example:
menu: x.o y.o z.o /usr/fred/lib/abc.a
	cc -o menu x.o y.o z.o /usr/fred/lib/abc.a
In building menu, make first checks whether x.o is up-to-date. It checks any
file in the current directory that, according to the standard suffix rules, could
be used to make x.o. If it locates a file, x.c, and if that file has been changed
since x.o was last made, make applies one of its own rules by invoking the C
compiler on x.c. Similarly, this occurs with the assembler files, which end in
a .s.
By knowing and using the default suffix rules, or by creating new ones, the
complexity of the description files can be reduced. For more information on
suffixes, see 'Suffixes and transformation rules', later in this chapter.
makefiles and substitutions
A makefile contains the following:
dependency information
macro definitions
executable commands
Dependency line syntax
A dependency line has the form:
target1 [target2 ...] :[:] [dependent1 ...] [; commands] [# ...]
[\t commands] [# ...]
Items inside brackets may be omitted. Targets and dependents are strings of
letters, digits, periods, and slashes. Shell metacharacters such as * and ? are
expanded when the line is evaluated. Commands appear either after a semicolon
on a dependency line or on lines beginning with tabs immediately following
a dependency line. A command is any string of characters not including
a number sign (#), except when the number sign is in quotes.
A number sign "#" denotes a comment. All characters after the number
sign, on the same line, are ignored. Blank lines are ignored.
Continuation lines
If a non-comment line is too long, it can be continued by using a backslash
"\" at the end of the line. If the last character on a line is a backslash,
then the backslash and everything following it (the newline and any leading
blanks or tabs on the next line) are replaced by a single blank.
Dependency information
A dependency line lists both targets and components, separated by either one
or two colons. A target name can appear on more than one dependency line,
but all of those lines must be of the same (single or double-colon) type. For
example, if a single colon is used on a dependency line, then only a single set
of commands can be associated with that target. If a target appears on more
than one single-colon line, only one of those lines can have a command
sequence associated with it. The occurrence of targets on multiple single-colon
lines is used as a notational convenience for clarifying complex
makefiles. make concatenates all the components following each occurrence
of the repeated target into a single dependency list. For example:
target1: comp1 comp2
	commands
target1: comp2 comp3
This is identical to:
target1: comp1 comp2 comp3
	commands
A double colon allows the listing of the same target on two or more dependency
lines, permitting commands to be associated with each target line. For
example:
target1 target2:: comp1 comp2
	commands
target2:: comp3
	commands
Macro definitions
A macro definition is an identifier followed by an equal sign. The identifier
must not be preceded by a colon or a tab. The name (string of letters and
digits) to the left of the equal sign (trailing blanks and tabs are stripped) is
assigned the string of characters following the equal sign (leading blanks and
tabs are stripped). The following are valid macro definitions:
2 = xyz
abc = -ll -ly -lm
LIBES =
The last definition assigns LIBES the null string. A macro that is never explic­
itly defined has the null string as its value. Remember, however, that some
macros are explicitly defined in make's own rules.
make uses a simple macro mechanism for substitution in dependency lines
and command strings. Macros can either be defined by
command-line arguments or included in the makefile. A macro is invoked by
preceding the name with a dollar sign. Macro names longer than one charac­
ter must be put in parentheses. The following are valid macro invocations:
$(CFLAGS)
$2
$(xy)
$Z
$(Z)
The last two macros are equivalent.
$*, $@, $?, and $< are four special macros that change values during the exe­
cution of the command. The following fragment shows assignment and use
of some macros:
OBJECTS = x.o y.o z.o
LIBES = -lm
menu: $(OBJECTS)
	cc $(OBJECTS) $(LIBES) -o $@
Before any command is issued, certain internally maintained macros are set.
The $@ macro is set to the full name of the current target. The $@ macro is
evaluated only for explicitly named dependencies. The $? macro is set to the
string of names that were found to be younger than the target. The $? macro
is evaluated along with explicit rules from the makefile. If the command was
generated by an implicit rule, the $< macro is the name of the related file that
caused the action, and the $* macro is the prefix, shared by the current and
the dependent filenames.
Given the preceding fragment, consider the following command:
make LIBES="-ll -lm"
This command loads the three objects with both the lex (-ll) and the math
(-lm) libraries, because macro definitions on the command line override
definitions in the makefile. (In UNIX system commands, arguments with
embedded blanks must be quoted.)
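The override can be observed without compiling anything. The following shell sketch writes a small makefile whose only entry echoes the $@ and LIBES macros, then runs it with and without a command-line definition. The target name show and the scratch directory are made up for the illustration, and the sketch does nothing if no make command is installed.

```shell
# Write a one-entry makefile into a scratch directory; printf supplies the
# tab that must begin the command line. The target name "show" is made up.
demo=$(mktemp -d)
printf 'LIBES = -lm\nshow:\n\t@echo "target=$@ libes=$(LIBES)"\n' > "$demo/makefile"

if command -v make >/dev/null 2>&1; then
    # LIBES takes its value from the makefile.
    from_file=$(cd "$demo" && make -s show)
    # LIBES is supplied on the command line; the quotes keep the blank.
    from_cmdline=$(cd "$demo" && make -s show "LIBES=-ll -lm")
    echo "$from_file"
    echo "$from_cmdline"
fi
rm -rf "$demo"
```

The first line printed should report the value from the makefile, and the second the value given on the command line.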
For example, consider a makefile to maintain the make command itself. The
code for make is spread over a number of C language source files and has a
yacc grammar. An example of the makefile follows:
# makefile for the make command
FILES = Makefile defs.h main.c doname.c misc.c \
	files.c dosys.c gram.y
OBJECTS = main.o doname.o misc.o files.o \
	dosys.o gram.o
LIBES = -lld
LINT = lint -p
CFLAGS = -O
LP = /usr/bin/lp

make: $(OBJECTS)
	$(CC) $(CFLAGS) $(OBJECTS) $(LIBES) -o make
	@size make

$(OBJECTS): defs.h

cleanup:
	-rm *.o gram.c
	-du

install:
	@size make /usr/bin/make
	cp make /usr/bin/make && rm make

lint: dosys.c doname.c files.c main.c misc.c gram.c
	$(LINT) dosys.c doname.c files.c main.c misc.c \
	gram.c

# print files that are out-of-date
# with respect to 'print' file.
print: $(FILES)
	pr $? | $(LP)
	touch print
The make program prints out each command before issuing it.
The following output results from typing make in a directory containing only
the source and makefiles:
cc -O -c main.c
cc -O -c doname.c
cc -O -c misc.c
cc -O -c files.c
cc -O -c dosys.c
yacc gram.y
mv y.tab.c gram.c
cc -O -c gram.c
cc main.o doname.o misc.o files.o dosys.o
   gram.o -lld -o make
13188 + 3348 + 3044 = 19580
The string of digits results from the size make command. The printing of the
command line itself was suppressed by an "at" sign ( @ ) in the makefile.
Extensions of $*, $@, and $<
The internally generated macros $*, $@, and $< are useful generic terms for
current targets and out-of-date relatives. A makefile may also use the following
related macros: $(@D), $(@F), $(*D), $(*F), $(<D), and $(<F). The D refers to
the directory part of the single-character macro. The F refers to the filename
part of the single-character macro. These additions are useful when building
hierarchical makefiles. They allow access to directory names for purposes of
using the UNIX cd command.
For example, the following command can be used:
cd $(<D); $(MAKE) $(<F)
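The split into a directory part and a filename part resembles what the standard dirname and basename utilities report for a path. The following shell sketch shows that split for a hypothetical target name. It only imitates the D and F forms and makes no claim about how make computes them internally.

```shell
# Hypothetical target path, standing in for the value of $@ or $<.
target=src/lib/parse.o

dir_part=$(dirname "$target")     # analogous to $(@D)
file_part=$(basename "$target")   # analogous to $(@F)

echo "directory part: $dir_part"
echo "filename part:  $file_part"
```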
Executable commands
Commands contained within a makefile invoke a UNIX system shell and are
executed there. A shell is separately invoked for each successive command.
For example, the following makefile results in three invocations of the shell:
Report: data1 data2 awk1
	sort -bdf data1 data2 > /usr/tmp/rptxxx
	awk -f awk1 /usr/tmp/rptxxx > report
	rm /usr/tmp/rptxxx
In general, any shell command can be used in a makefile, but there are a few
restrictions. Each command must be a single line, which constrains the use of
the shell multi-line constructs. A way to circumvent this limitation is to use
backslashes to prevent the shell interpretation of a newline. For example:
test:
	echo 'start'
	for i in 1 2 3; do \
		echo 'starting'; \
		echo '... done'; \
	done
	echo 'end'
The middle four lines comprise a single shell command. The backslashes
suppress the normal interpretation of a newline, resulting in the four lines
being considered a single line.
A further restriction results from the fact that each command line is executed
in a separate shell, so the effects of one command line do not carry over to the
next. Consequently, a cd command remains in effect only within a single
command line. For example:
target1: comp1 comp2
	cd newdir
	ls
The ls command lists the directory where make was invoked, not newdir. To
achieve the desired result, try:
target1: comp1 comp2
	cd newdir; ls
In this example, the shell is invoked on the entire command line, and the cd
remains in effect when the ls is executed.
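The same behavior can be reproduced outside make. In the following shell sketch, each sh -c invocation stands in for one command line of a makefile entry. This is an illustration of the separate-shell rule, not a trace of what make itself executes.

```shell
# Each "sh -c" below stands in for one command line of a makefile entry.
start=$(pwd)

# Two command lines: the cd happens in the first shell and is then lost.
sh -c 'cd /'
separate=$(sh -c 'pwd')

# One command line: cd and pwd share a shell, so the cd is still in effect.
combined=$(sh -c 'cd /; pwd')

echo "separate lines ran in: $separate"
echo "combined line ran in:  $combined"
```

The first result is the directory the sketch was started in; the second is the directory named in the cd.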
If a file must be made but there are no explicit commands or relevant built-in
rules, the commands associated with the target name .DEFAULT are used. If
there is no such name, make prints a message and stops.
Output translations
Macros in shell commands are translated when executed. The form is as follows:
$(macro:string1=string2)
For each appearance of string1 in the evaluated macro, string2 is substituted.
Finding string1 in $(macro) means that the evaluated $(macro) is considered a
series of strings, each delimited by white space (blanks or tabs). Thus, the
occurrence of string1 in $(macro) means that a regular expression of the following
form was found:
.*<string1>[TAB|BLANK]
This particular form was chosen because make usually concerns itself with
suffixes. This type of translation is useful when maintaining archive libraries.
Now, all that is necessary is to accumulate the out-of-date members and write
a shell script that can handle all the C language programs (which are those
files ending in .c). Thus, the following fragment optimizes the executions of
make for maintaining an archive library:
$(LIB): $(LIB)(a.o) $(LIB)(b.o) $(LIB)(c.o)
	$(CC) -c $(CFLAGS) $(?:.o=.c)
	$(AR) $(ARFLAGS) $(LIB) $?
	rm $?
A dependency of the preceding form is necessary for each o f the different
types of source files (suffixes) that define the archive library. These transla­
tions are added in an effort to make more general use of the wealth of infor­
mation that make generates.
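The translation can be tried in isolation. The following is a minimal sketch, assuming a make command is installed and accepts the POSIX "target: ; command" form; OBJS and the show target are hypothetical names chosen for the illustration.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cat > "$work/Makefile" <<'EOF'
# OBJS is an illustrative macro; b.o.bak does not END in .o.
OBJS = a.o b.o.bak c.o
show: ; @echo $(OBJS:.o=.c)
EOF

# Only a string1 that ends a white-space-delimited word is replaced.
translated=$(make -s -f "$work/Makefile" show)
echo "$translated"
```

The middle word is left alone because .o is not its suffix, which matches the regular expression given above.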
Recursive makefiles
Another feature of make concerns the environment and recursive invocations.
For testing purposes, make -n ... can be executed and everything that would
be done is printed, including output from lower-level invocations of make. For
example:
    make -n
This command prints out the commands that make would issue without actually
taking the time to execute them. If the sequence $(MAKE) appears anywhere
in a shell-command line, the line is executed even if the -n flag is set. Because
the -n flag is exported across invocations of make (through the MAKEFLAGS
variable), the only thing that is executed is the make command itself. This
feature is useful when a hierarchy of makefile(s) describes a set of software
subsystems.
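A two-level hierarchy is enough to see this behavior. The following is a minimal sketch, assuming a make command is installed; the file sub.mk and the file name built-by-sub are hypothetical.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

# Top level: the command line contains $(MAKE), so it runs even under -n.
cat > Makefile <<'EOF'
all: ; $(MAKE) -f sub.mk
EOF

# Lower level: an ordinary command that -n should print but not run.
cat > sub.mk <<'EOF'
all: ; touch built-by-sub
EOF

dry_run=$(make -n)
echo "$dry_run"
```

The lower-level touch command appears in the output because the nested make was really invoked, but no built-by-sub file is created because that nested make inherited -n.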
Suffixes and transformation rules
The make program uses an internal table of rules to learn how to transform a
file with one suffix into a file with another suffix. If the -r flag is used on the
make command line, the internal table is not used.
The list of suffixes is actually the dependency list for the name .SUFFIXES. The
make program searches for a file with any of the suffixes on the list. If one is
found, make transforms it into a file with another suffix. The transformation
rule names are the concatenation of the before and after suffixes. Thus, the
name of the rule to transform a .r file to a .o file is .r.o. If the rule is present
and no explicit command sequence has been given in the user's makefiles, the
command sequence for the rule .r.o is used. If a command is generated by
using one of these suffixing rules, the macro $* is given the value of the stem
(everything but the suffix) of the name of the file to be made, and the macro $<
is the full name of the dependent that caused the action.
The order of the suffix list is significant because the list is scanned from left to
right. The first name formed that has both a file and a rule associated with it
is used. If new names are to be appended, the user can add an entry for
.SUFFIXES in the makefile. The dependents are added to the usual list. A
.SUFFIXES line without any dependents deletes the current list. It is necessary
to clear the current list if the order of names is to be changed.
Implicit rules
The make program uses a table of suffixes and a set of transformation rules to
supply default dependency information and implied commands. The default
suffix list is as follows:
    .o      object file
    .c      C source file
    .c~     SCCS C source file
    .f      FORTRAN source file
    .f~     SCCS FORTRAN source file
    .s      assembler source file
    .s~     SCCS assembler source file
    .y      yacc source grammar
    .y~     SCCS yacc source grammar
    .l      lex source grammar
    .l~     SCCS lex source grammar
    .h      header file
    .h~     SCCS header file
    .sh     shell file
    .sh~    SCCS shell file
Figure 5-1 summarizes the default transformation paths. If there are two
paths connecting a pair of suffixes, the longer one is used only if the
intermediate file exists or is named in the makefile.
Figure 5-1 Summary of default transformation path
If the file x.o is needed and an x.c is found in the makefile directory, the x.c
would be compiled. If there is an x.l and an x.c file and the x.l has been
modified (that is, has a later modification date than the x.c), then make runs
lex on the x.l file to re-create the x.c file. Otherwise, just the x.c file would be
recompiled to produce the x.o file. However, if there is no x.c but there is an
x.l, make uses the direct link as shown in Figure 5-1.
It is possible to change the names of some of the compilers used in the default
rules, or the flag arguments with which they are invoked, by knowing the macro
names used. AS, CC, F77, YACC, and LEX are the macro names of the compilers
used by make. By specifically changing a compiler macro, a programmer
can select which compiler is to be used. For example:
    make CC=newcc
This causes the newcc command to be used instead of the usual C language
compiler. The macros ASFLAGS, CFLAGS, F77FLAGS, YFLAGS, and
LFLAGS can be set to cause these commands to be issued with optional flags.
For example:
    make "CFLAGS=-g"
This causes the cc command to include debugging information.
Archive libraries
The make program has an interface to archive libraries. A user can name a
member of a library, as in these examples:
    projlib(object.o)
    projlib((entrypt))
The second method actually refers to an entry point of an object file within the
library. (The make program looks through the library, locates the entry point,
and translates it to the correct object-filename.)
To use this procedure to maintain an archive library, the following type of
makefile is required for each object:
    projlib:: projlib(pfile1.o)
            $(CC) -c -O pfile1.c
            $(AR) $(ARFLAGS) projlib pfile1.o
            rm pfile1.o
    projlib:: projlib(pfile2.o)
            $(CC) -c -O pfile2.c
            $(AR) $(ARFLAGS) projlib pfile2.o
            rm pfile2.o
This is tedious and error-prone. In most cases, the command sequences for
adding a C language file to a library are the same for each invocation; the file
name is the only difference each time.
The make command also gives the user access to a rule for building libraries.
The handle for the rule is the .a suffix. A .c.a rule is the rule for compiling a C
language source file, adding it to the library, and removing the .o file.
Similarly, the .y.a, the .s.a, and the .l.a rules rebuild yacc, assembler, and lex files,
respectively. The archive rules defined internally are .c.a, .c~.a, .f.a, .f~.a, and
.s~.a. (The tilde (~) syntax will be described shortly.) The user may define
other needed rules in the makefile.
The two-member library mentioned earlier is then maintained with the
following shorter makefile:
    projlib: projlib(pfile1.o) projlib(pfile2.o)
            @echo projlib up-to-date
The internal rules are already defined to complete the preceding library
maintenance. The actual .c.a rule is as follows:
    .c.a:
            $(CC) -c $(CFLAGS) $<
            $(AR) $(ARFLAGS) $@ $*.o
            rm -f $*.o
In this example, the $@ macro is the .a target (projlib); the $< and $* macros
are set to the out-of-date C language file, and the filename minus the suffix,
respectively (pfile1.c and pfile1). The $< macro (in the preceding rule) could
have been changed to $*.c.
This is what make does when it sees the construction:
    projlib: projlib(pfile1.o)
            @echo projlib up-to-date
Assume the object in the library is out-of-date with respect to pfile1.c. Also,
there is no pfile1.o file.
1. Enter: make projlib.
2. Before making projlib, check each dependent of projlib.
3. projlib(pfile1.o) is a dependent of projlib and needs to be generated.
4. Before generating projlib(pfile1.o), check each dependent of projlib(pfile1.o).
5. Use internal rules to try to create projlib(pfile1.o). (There is no explicit rule.)
   Note that projlib(pfile1.o) has parentheses in the name to identify the target
   suffix as .a. This is the key. There is no explicit .a at the end of the projlib
   library name. The parentheses imply the .a suffix. In this sense, the .a is
   hard-wired into make.
6. Break the name projlib(pfile1.o) up into projlib and pfile1.o. Define two
   macros, $@ (=projlib) and $* (=pfile1).
7. Look for a rule .X.a and a file $*.X. The first .X (in the .SUFFIXES list) that
   fulfills these conditions is .c, so the rule is .c.a, and the file is pfile1.c. Set $<
   to pfile1.c and execute the rule. In fact, make must then compile pfile1.c.
8. The library has been updated. Execute the command associated with the
   projlib: dependency. That is:
       @echo projlib up-to-date
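The name splitting in steps 6 and 7 can be sketched with ordinary shell parameter expansion. This is only an imitation of the bookkeeping, not make's own code; the variable names are hypothetical and the .c suffix is assumed to be the first one that matches.

```shell
# The dependent exactly as it appears in the makefile.
target='projlib(pfile1.o)'

lib=${target%%"("*}      # text before the parenthesis     -> $@
member=${target#*"("}    # drop everything up to the "("
member=${member%")"}     # drop the closing ")"            -> $%
stem=${member%.*}        # member name minus its suffix    -> $*
src=$stem.c              # first .X with a .X.a rule is .c -> $<

echo "\$@ = $lib"
echo "\$% = $member"
echo "\$* = $stem"
echo "\$< = $src"
```

With these four values in hand, the .c.a rule has everything it needs to compile the source and replace the member in the library.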
Note that to let pfile1.o have dependencies, the following syntax
is required:
    projlib(pfile1.o): $(INCDIR)/stdio.h pfile1.c
There is also a macro for referencing the archive member name when this
form is used. The $% macro is evaluated each time $@ is evaluated. If there is
no current archive member, $% is null. If an archive member exists, then $%
evaluates to the expression between the parentheses.
Tildes (~) in SCCS filenames
Files under SCCS control have the prefix s. followed by the filename.
However, the syntax of make does not directly permit referencing of prefixes. The
SCCS files are the exception to this. To allow make to access the prefix s., the
suffix rules use tildes (~) to identify SCCS files. For example, the default suffix
rule .c~.o describes how to transform a C language source file under SCCS
control into an object file:
    .c~.o:
            $(GET) $(GFLAGS) $<
            $(CC) $(CFLAGS) -c $*.c
            -rm -f $*.c
The tilde appended to any suffix transforms the file search into an SCCS file
name search with the actual suffix named by the dot and all characters up to
(but not including) the tilde.
NOTE makefile or Makefile files under SCCS control are accessible to make.
That is, if make is run and only a file named s.makefile or s.Makefile exists,
make does a get on the file, then reads and removes the file.
The following SCCS suffixes and suffix rules are internally defined:
    .c~       C source file
    .f~       Fortran source file
    .sh~      shell file
    .c~.a     transforms an SCCS C source file to a library (archive) file
    .c~.c     transforms an SCCS C source file to a C source file
    .c~.o     transforms an SCCS C source file to an object file
    .f~.a     transforms an SCCS Fortran source file to a library (archive) file
    .f~.f     transforms an SCCS Fortran source file to a Fortran source file
    .f~.o     transforms an SCCS Fortran source file to an object file
    .s~.a     transforms an SCCS assembler source file to a library (archive) file
    .s~.s     transforms an SCCS assembler source file to an assembler source file
    .s~.o     transforms an SCCS assembler source file to an object file
    .y~.c     transforms an SCCS yacc source file to a C source file
    .y~.o     transforms an SCCS yacc source file to an object file
    .l~.l     transforms an SCCS lex source file to a lex source file
    .l~.o     transforms an SCCS lex source file to an object file
    .h~.h     transforms an SCCS C header file to a C header file
Obviously, the user can define other rules and suffixes, which may prove
useful. The tilde provides a handle on the SCCS filename format so that this is
possible.
The null suffix
Many programs consist of a single source file. make handles this case with a
null suffix rule. The following example specifies how to create a file with a
null suffix (visualize the null suffix occurring between the given suffix and the
colon) from a file with a .c suffix:
    .c:
            $(CC) $(CFLAGS) $< -o $@
In fact, this .c: rule is internally defined, so no makefile is necessary at all. The
user only needs to type:
    make cat dd echo date
All of these are UNIX system single-file programs, and all four C language
source files are passed through the above shell command line associated with
the .c: rule. The internally defined single-suffix rules are:
    .c:  .c~:  .f:  .f~:  .sh:  .sh~:
Others may be added to the makefile by the user.
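A single-suffix rule can be exercised without a compiler by using shell files. The following is a minimal sketch, assuming a make command is installed; greet.sh is a hypothetical file, and the rule is written out in the makefile (with mode 0755 rather than the internal rule's 0777) so that the example does not depend on any built-in table.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

# A null suffix rule: build "name" from "name.sh".
cat > Makefile <<'EOF'
.SUFFIXES:
.SUFFIXES: .sh
.sh: ; cp $< $@; chmod 0755 $@
EOF

# An illustrative one-line shell program.
printf '%s\n' '#!/bin/sh' 'echo hello' > greet.sh

make -s greet
output=$(./greet)
echo "$output"
```

The target greet has no suffix, so make matches it against the .sh: rule and copies greet.sh into place as an executable file.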
Creating new suffix rules
Both the list of significant suffixes and the rules used in transforming files of
one suffix type into another can be modified. Including the following line in the
makefile adds .q, .w, and .t to the list of available suffixes:
    .SUFFIXES: .q .w .t
To delete all currently recognized suffixes, add this line:
    .SUFFIXES:
Therefore, the combination of the two lines replaces the current suffixes with
.q, .w, and .t:
    .SUFFIXES:
    .SUFFIXES: .q .w .t
NOTE Be sure to include spaces or tabs between each suffix on the
.SUFFIXES line.
To define new suffix rules, simply add the rules to the makefile. Each rule
appears as a normal entry. For example, suppose make is to help maintain a set
of text files requiring formatting and printing. Add the suffixes and define suffix
rules as follows:
    TBL = tbl
    EQN = eqn
    NEQN = neqn
    .SUFFIXES:
    .SUFFIXES: .s .t .l
    .s.l: # format troff -ms source file for line printer
            $(TBL) $< | $(NEQN) | troff -ms -e -Tlp -u5 $(ROFFARGS) > $*.l
    .s.t: # format troff -ms source file for terminal
            $(TBL) $< | $(NEQN) | troff -ms -e -Tlp -u5 $(ROFFARGS) | ul -i > $*.t
    .s.a: # format troff -ms source file for apple laser printer
            $(TBL) $< | $(EQN) | troff -ms -e -Tspc $(ROFFARGS) > $*.a
The first .SUFFIXES line nullifies the default suffix list of make. The
second .SUFFIXES line establishes three suffixes as significant. The source
files carry the .s suffix, indicating their use of the -ms formatting macros. The
first entry declares how to make a .l file out of a .s file. That provides the rule
for converting a source file containing -ms macros into a formatted output file
for a line printer. Similarly, the third entry describes how to create a .a file (for
output to a laser printer). These rules work in the same manner as regular
entries: the second suffix belongs to the 'target' and the first suffix belongs to
the 'component'.
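The same mechanics can be tried without any text formatters installed. The following is a minimal sketch, assuming a make command is installed; the .in and .out suffixes, the file note.in, and the use of tr as a stand-in "formatter" are all hypothetical.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

# Clear the suffix list, declare two new suffixes, and define one rule.
cat > Makefile <<'EOF'
.SUFFIXES:
.SUFFIXES: .in .out
.in.out: ; tr a-z A-Z < $< > $@
EOF

echo "hello" > note.in

# No explicit entry names note.out; the .in.out rule supplies the commands.
make -s note.out
result=$(cat note.out)
echo "$result"
```

Here .out is the suffix of the target and .in the suffix of the component, in the same order as in the troff rules above.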
include files
The make program has a capability similar to the #include directive of the C
preprocessor. If the string include appears as the first seven letters of a line in
a makefile and is followed by a blank or a tab, the rest of the line is assumed to
be a filename, which the current invocation of make reads. Macros may be
used in filenames. The file descriptors are stacked for reading include files so
that no more than 16 levels of nested include files are supported.
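A short trial shows the include line at work. The following is a minimal sketch, assuming a make command is installed; defs.mk, GREETING, and the show target are hypothetical names.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

# Macro definitions kept in a separate file.
cat > defs.mk <<'EOF'
GREETING = hello from defs.mk
EOF

# "include" starts the line and is followed by a blank and a filename.
cat > Makefile <<'EOF'
include defs.mk
show: ; @echo $(GREETING)
EOF

included=$(make -s show)
echo "$included"
```

The macro defined in defs.mk is visible in the including makefile exactly as if it had been written there.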
Dynamic dependency parameters
A dependency line consists of a target, to the left of the colon, and
dependents (components) to the right of the colon. A makefile may have to
build a large number of executable files, each from a single source file. Each
of the executable files must be compiled separately using its own source file,
and writing one entry per file creates a long and tedious makefile. Instead,
make can handle a list of executables under one name. For example:
    CMDS = cat dd echo date cc cmp comm ar ld chown
When $$@ is used to the right of the colon in a makefile, make replaces it
with the name of the current target (each of the executables in the list, in
turn). This is the dynamic dependency parameter. It is dynamic because it
is evaluated separately for each of the targets contained in CMDS. For
example, the following compiles all the components previously defined in
CMDS with a two-line entry in the makefile:
    $(CMDS): $$@.c
            $(CC) -O $? -o $@
Obviously, this is a subset of all the single-file programs. For multiple-file
programs, a directory is usually allocated and a separate makefile is made. For
any particular file that has a peculiar compilation procedure, a specific entry
must be made in the makefile.
The second useful form of the dependency parameter is $$(@F). It represents
the filename part of $$@. Again, it is evaluated at execution time. Its
usefulness becomes evident when trying to maintain the /usr/include directory
from a makefile in the /usr/src/head directory. The /usr/src/head/makefile
looks like this:
    INCDIR = /usr/include
    INCLUDES = \
            $(INCDIR)/stdio.h \
            $(INCDIR)/pwd.h \
            $(INCDIR)/dir.h \
            $(INCDIR)/a.out.h
    $(INCLUDES): $$(@F)
            cp $? $@
            chmod 0444 $@
This completely maintains the /usr/include directory whenever one of the
above files in /usr/src/head is updated.
Environment variables
Environment variables are read and added to the macro definitions each time
make executes. Precedence is a prime consideration in doing this properly.
The following describes make's interaction with the environment.
make automatically defines a macro, MAKEFLAGS, containing the
command-line option flags given in the invocation of make. The intent of the
MAKEFLAGS macro, upon startup, is to set environment variables to be used
by successive invocations of make.
When executed, make assigns macro definitions in the following order:
1. Read the MAKEFLAGS environment variable. If it is not present or null,
   the internal make variable MAKEFLAGS is set to the null string.
   Otherwise, each letter in MAKEFLAGS is assumed to be an input flag argument
   and is processed as such. (The only exceptions are the -f, -p, and -r flags.)
2. Read the internal list of macro definitions.
3. Read the environment. The environment variables are treated as macro
   definitions and marked as exported (in the shell sense).
4. Read the makefile(s). The assignments in the makefile(s) override the
   environment. This order is chosen so that it is possible to know what to
   expect by reading the makefile being executed. However, if the -e
   command-line flag is used, the environment overrides the makefile
   assignments. Also, MAKEFLAGS overrides the environment if assigned.
   This is useful for further invocations of make from the current makefile.
The priority of assignments, from least binding to most binding, is:
1. internal definitions
2. environment
3. makefile(s)
4. command line
NOTE If the -e option is specified, then the priority of assignments is
rearranged to:
1. internal definitions
2. makefile(s)
3. environment
4. command line
This order is general enough to allow a programmer to define a makefile or set
of makefiles whose parameters are dynamically definable.
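The precedence can be checked with one macro defined in three places. The following is a minimal sketch, assuming a make command is installed; WHO and the show target are hypothetical names.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

cat > Makefile <<'EOF'
WHO = makefile
show: ; @echo $(WHO)
EOF

# Default order: the makefile assignment overrides the environment.
plain=$(WHO=environment make -s show)

# With -e: the environment overrides the makefile assignment.
with_e=$(WHO=environment make -s -e show)

# A command-line definition is the most binding in either case.
cmdline=$(WHO=environment make -s -e show WHO=command-line)

echo "default: $plain"
echo "with -e: $with_e"
echo "command line: $cmdline"
```

Each run prints the source of the definition that won, so the three lines follow the two priority lists above.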
Suggestions and warnings
If a change to a file is minor (such as adding a comment to an include file), the
-t (touch) option can save a lot of time. Instead of issuing a large number of
superfluous recompilations, make updates the modification times on the
affected files. For example:
    make -ts
This command (-ts stands for touch silently) causes the relevant files to
appear up-to-date. Obvious care is necessary because this mode of operation
subverts the intention of make and destroys all memory of the previous
relationships.
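The danger is easy to demonstrate. The following is a minimal sketch, assuming a make command is installed that creates a missing target when touching it (as GNU make does); report.in and report.out are hypothetical files.

```shell
# Skip quietly when make is not installed.
command -v make >/dev/null 2>&1 || exit 0

work=$(mktemp -d)
cd "$work" || exit 1

# Normally report.out would be produced by copying report.in.
cat > Makefile <<'EOF'
report.out: report.in ; cp report.in report.out
EOF
echo "data" > report.in

# Touch silently: the target is marked current, but cp never runs.
make -ts report.out

# Ask make whether anything is still out of date.
if make -q report.out; then uptodate=yes; else uptodate=no; fi
echo "up to date: $uptodate, size: $(wc -c < report.out) bytes"
```

make now regards report.out as current even though it has none of the contents the cp command would have given it.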
Internal rules
The standard set of internal rules used by make is reproduced in the examples
shown on the following pages.
Example 5-1 make internal rules
    .SUFFIXES: .o .c .c~ .y .y~ .l .l~ .s .s~ .h .h~ .sh .sh~ .f .f~
    ARFLAGS=-rv
    CFLAGS=-O
    F77FLAGS=
    LEX=lex
    LD=ld
    YACC=yacc
    .c:
            $(CC) $(CFLAGS) $< $(LDFLAGS) -o $@
    .c~:
            $(GET) $(GFLAGS) $<
            $(CC) $(CFLAGS) $*.c $(LDFLAGS) -o $*
            -rm -f $*.c
    .f:
            $(F77) $(F77FLAGS) $< $(LDFLAGS) -o $@
    .f~:
            $(GET) $(GFLAGS) $<
            $(F77) $(F77FLAGS) $< $(LDFLAGS) -o $*
            -rm -f $*.f
    .sh:
            cp $< $@; chmod 0777 $@
    .sh~:
            $(GET) $(GFLAGS) $<
            cp $*.sh $*; chmod 0777 $@
            -rm -f $*.sh
    .c~.c .f~.f .s~.s .sh~.sh .l~.l .h~.h:
            $(GET) $(GFLAGS) $<
    .c.a:
            $(CC) -c $(CFLAGS) $<
            $(AR) $(ARFLAGS) $@ $*.o
            rm -f $*.o
    .c~.a:
            $(GET) $(GFLAGS) $<
            $(CC) -c $(CFLAGS) $*.c
            $(AR) $(ARFLAGS) $@ $*.o
            rm -f $*.[co]
    .c.o:
            $(CC) $(CFLAGS) -c $<
    .c~.o:
            $(GET) $(GFLAGS) $<
            $(CC) $(CFLAGS) -c $*.c
            -rm -f $*.c
    .f.a:
            $(F77) $(F77FLAGS) -c $*.f $(LDFLAGS)
            $(AR) $(ARFLAGS) $@ $*.o
            -rm -f $*.o
    .f~.a:
            $(GET) $(GFLAGS) $<
            $(F77) $(F77FLAGS) -c $*.f $(LDFLAGS)
            $(AR) $(ARFLAGS) $@ $*.o
            -rm -f $*.[fo]
    .f.o:
            $(F77) $(F77FLAGS) -c $*.f $(LDFLAGS)
    .f~.o:
            $(GET) $(GFLAGS) $<
            $(F77) $(F77FLAGS) -c $*.f $(LDFLAGS)
            -rm -f $*.f
    .s~.a:
            $(GET) $(GFLAGS) $<
            $(AS) $(ASFLAGS) -o $*.o $*.s
            $(AR) $(ARFLAGS) $@ $*.o
            -rm -f $*.[so]
    .s.o:
            $(AS) $(ASFLAGS) -o $@ $<
    .s~.o:
            $(GET) $(GFLAGS) $<
            $(AS) $(ASFLAGS) -o $*.o $*.s
            -rm -f $*.s
    .l.c:
            $(LEX) $(LFLAGS) $<
            mv lex.yy.c $@
    .l~.c:
            $(GET) $(GFLAGS) $<
            $(LEX) $(LFLAGS) $*.l
            mv lex.yy.c $@
            -rm -f $*.l
    .l.o:
            $(LEX) $(LFLAGS) $<
            $(CC) $(CFLAGS) -c lex.yy.c
            rm lex.yy.c
            mv lex.yy.o $@
    .l~.o:
            $(GET) $(GFLAGS) $<
            $(LEX) $(LFLAGS) $*.l
            $(CC) $(CFLAGS) -c lex.yy.c
            rm -f lex.yy.c $*.l
            mv lex.yy.o $*.o
    .y.c:
            $(YACC) $(YFLAGS) $<
            mv y.tab.c $@
    .y~.c:
            $(GET) $(GFLAGS) $<
            $(YACC) $(YFLAGS) $*.y
            mv y.tab.c $*.c
            -rm -f $*.y
    .y.o:
            $(YACC) $(YFLAGS) $<
            $(CC) $(CFLAGS) -c y.tab.c
            rm y.tab.c
            mv y.tab.o $@
    .y~.o:
            $(GET) $(GFLAGS) $<
            $(YACC) $(YFLAGS) $*.y
            $(CC) $(CFLAGS) -c y.tab.c
            rm -f y.tab.c $*.y
            mv y.tab.o $*.o
Chapter 6
Source code control system (SCCS)
Source Code Control System (SCCS) is a collection of UNIX system commands
that maintain and track changes made to source code or documentation.
SCCS is basically a file custodian. Under SCCS, whenever changes are made
to a file, SCCS records those changes and maintains the original. SCCS can:
    store text files
    retrieve specific versions of files
    control updating access to files
    identify the version of a retrieved file
    record when and why changes were made to a file, and the person making
    them
These features are important if code and documentation undergo frequent
changes due to maintenance or enhancement work. Whenever changes are
applied to a file under SCCS control, only the latest set of changes is added
to the SCCS image of that file. By using this approach, SCCS can maintain
successive versions of the file, while consuming only a minimal amount of
increased disk space. This allows for the efficient use of resources and easy
regeneration of previous versions of a file.
This chapter covers the following topics:
    SCCS for beginners: how to create, retrieve, and update versions of SCCS
    files
    Delta numbering: how versions are numbered and named under SCCS
    SCCS command conventions: conventions and rules applied to SCCS
    commands
    SCCS commands: explanation of SCCS commands, with argument usage
    SCCS files: how SCCS files are formatted, protected, and audited
NOTE Installation and implementation of SCCS are not covered here.
SCCS for beginners
This section presents several terminal session fragments. The best way to
learn SCCS is to use it.
Files under SCCS are composed of one or more sets of changes applied to the
original version of the file. A set of changes depends on all the previous sets.
For SCCS to keep track of the changes, each set of alterations to a file is
stored separately as a delta.
Each delta is assigned a name known as an SCCS IDentification string (SID).
An SID has up to four components. The first two are the release and level
numbers, separated by a period. The next two are the branch and sequence
numbers, also separated by a period. This is explained under the 'Delta
Numbering' section of this chapter.
The SID for any original file turned over to SCCS is composed of release
number 1 and level number 1, stated as 1.1. The SID for the first set of changes
made to that file (that is, its first delta) is release 1 and level 2, or 1.2. The
next delta would be 1.3, the next 1.4, and so on. There will be more on delta
numbering later. At this point, it is enough to know that, by default, SCCS
assigns SIDs automatically.
Creating an SCCS file using admin
Using a text editor, create a file called lang, which contains a list of the following
programming languages:
    PL/1
Custody of the lang file can be given to SCCS by using the admin (administer)
command. The following command creates an SCCS file from the lang file:
    admin -ilang s.lang
All SCCS file names must begin with s., hence s.lang. The -i option,
together with its value argument (lang), indicates that admin is to create a
new SCCS file with the contents of the file lang.
The output of the admin command is:
    No id keywords (cm7)
This is a warning message that can also be issued by other SCCS commands.
Ignore it for now. Its significance is described later with the get command
under 'SCCS commands.' In the following examples, this warning message is
not shown, although it may be issued.
Remove the lang file. It is no longer needed because it now exists under SCCS
as s.lang. Enter:
    rm lang
Retrieving a file by means of get
Using the get command, retrieve the file s.lang:
    get s.lang
The output of the get command is:
    1.1
    5 lines
This indicates that get has retrieved version 1.1 of the file, which is made up
of five lines of text.
The retrieved file has been placed in a new file known as a 'g.file.' SCCS
forms the g.file name by deleting the prefix s. from the name of the SCCS file
(the g.file is now called lang). Thus, the original lang file has been recreated.
An ls(C) command (documented in the User's Reference) lists both lang and
s.lang files in the directory. SCCS retains the s.lang for other users.
The get s.lang command creates lang as read-only, keeping no information
regarding its creation. When a file is retrieved in the above manner, it cannot
be edited. If changes are to be made to the file, use the -e option to the get
command as follows:
    get -e s.lang
The get -e command causes SCCS to create lang for editing. It also places
pertinent information about lang in another new file, called the 'p.file' (p.lang, in
this case), which is needed later by the delta command.
The get -e command prints additional status information, including the SID of
the new delta that will be created for this file:
    1.1
    new delta 1.2
    5 lines
Recording changes by using delta
Modify the lang file by adding two more programming languages:
To record the changes made to lang, issue the following command:
    delta s.lang
The following prompt appears:
    comments?
Respond with a meaningful description of the changes just applied to the file.
For example:
    comments? added more languages
The delta command now reads the p.file (p.lang) and determines what
changes have occurred. It does this by performing its own get to retrieve the
original version, and applying the diff(C) command (documented in the User's
Reference). The delta command compares the original and the edited versions
of the lang file: it determines the differences and stores those changes in s.lang.
The p.lang and lang files, which are no longer needed, are automatically
removed.
When this process is complete, delta outputs the following:
    1.2            (1.2 is the SID of the delta just created)
    2 inserted     (indicates how many new lines are in the file)
    0 deleted      (indicates how many lines were deleted from the file)
    5 unchanged    (indicates unchanged lines in the file)
Additional information about get
As shown in the previous example, get retrieves the latest version of the file
s.lang. This is accomplished by starting with the original version and
successively applying the changes (deltas) in order until all have been applied. If the
get command is issued now, it retrieves version 1.2 of the file s.lang. Any of the
following invocations of get produces the same result:
    get s.lang
    get -r1 s.lang
    get -r1.2 s.lang
The numbers following the -r are SIDs. When the level number of the SID is
omitted (as in get -r1 s.lang), the default is the highest level number existing
within that specific release. The second command (get -r1 s.lang) requests the
retrieval of the latest version of release 1. The third command specifically
requests the retrieval of a particular version, release 1 level 2.
Whenever a significant change is made to a file, the usual identification
method is to increase the release number (the first number of the SID).
Because normal automatic numbering of deltas proceeds by incrementing the
level number, the user must explicitly instruct SCCS to increment the release
number. This is accomplished as follows:
get -e -r2 s.lang
Release 2 does not exist, so get retrieves the latest version before release 2.
The get command also interprets this as a request to change the release
number of the new delta to 2, thereby naming it 2.1 rather than 1.3. The output
that follows indicates that version 1.2 has been retrieved, and 2.1 is the new
delta version that will be created.
    1.2
    new delta 2.1
    7 lines
If the file is now edited (for example, by deleting COBOL from the list of
languages) and delta is executed:
    delta s.lang
    comments? deleted COBOL from the list of languages
Then the delta's output is:
    2.1
    0 inserted
    1 deleted
    6 unchanged
Deltas can now be created in release 2 (deltas 2.2, 2.3, and so on), or another
new release can be created in a similar manner.
Delta numbering
Deltas may be thought of as the nodes of a tree in which the root node is the
initial version of the file. The root node (original file) is normally named 1.1
and successor deltas (nodes) are named 1.2, 1.3, 1.4, and so on. The two
numbers in these names are the first two components of an SID, known as the
'release' and 'level' numbers, respectively. SCCS automatically names new
deltas by incrementing the level numbers whenever a delta is made.
In addition, if a release number is incremented, indicating a major change, the
new release number is applied to all subsequent deltas. This evolutionary
process is represented in Figure 6-1.
Figure 6-1 Evolution of an SCCS file
This is the normal sequential development of an SCCS file, with each delta
dependent on the preceding deltas. Such a structure is called the trunk of an
SCCS tree.
There are situations that require branching an SCCS tree. That is, changes are
planned for a given delta that are not dependent on all previous deltas. For
example, consider a program in production, at version 1.3, and for which
development work on release 2 is already in progress. Release 2 may already
have a delta in progress as shown in Figure 6-1. Assume that a problem is
reported in version 1.3 that cannot wait for correction until release 2. The
changes necessary to correct the problem are applied as a delta to version 1.3.
This creates a new version that will then be released to the user, but does not
affect the changes being applied for release 2 (that is, deltas 1.4, 2.1, 2.2, and so
on). This new delta is the first node of a new branch of the tree.
Branch delta names always have four SID components: the same release
number and level number as the trunk delta, plus a branch number and
sequence number. The format is as follows:
    release.level.branch.sequence
The branch number of the first delta branching off any trunk delta is always 1,
and its sequence number is also 1. For example, the full SID for a delta
branching off trunk delta 1.3 will be 1.3.1.1. As other deltas on that same
branch are created, only the sequence number changes: 1.3.1.2, 1.3.1.3, and so
forth. This is shown in Figure 6-2:
Figure 6-2 Tree structure with branch deltas
The branch number is incremented only when a delta is created that starts a
new branch off an existing branch, as shown in Figure 6-3. As this secondary
branch develops, the sequence numbers of its deltas are incremented (1.3.2.1,
1.3.2.2, and so on), but the secondary branch number remains the same.
Figure 6-3 Extended branching concept
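The automatic numbering amounts to simple arithmetic on the last SID component. The following is a minimal sketch in POSIX shell; next_sid and first_branch are hypothetical helper functions written for this illustration and are not SCCS commands.

```shell
# Successor on the same line of development: add 1 to the last component
# (the level number on the trunk, the sequence number on a branch).
next_sid() {
    prefix=${1%.*}
    last=${1##*.}
    echo "$prefix.$((last + 1))"
}

# First delta branching off a trunk delta: branch 1, sequence 1.
first_branch() {
    echo "$1.1.1"
}

trunk_next=$(next_sid 1.2)
branch_start=$(first_branch 1.3)
branch_next=$(next_sid "$branch_start")

echo "after 1.2 on the trunk:  $trunk_next"
echo "first branch off 1.3:    $branch_start"
echo "next delta on that branch: $branch_next"
```

Starting a new release (for example, 2.1) is not covered by these helpers, because that numbering is requested explicitly by the user rather than assigned automatically.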
The concept of branching may be extended to any delta in the tree, and the
numbering of the resulting deltas proceeds as shown here. SCCS allows the
generation of complex tree structures. Although this capability has been
provided for certain specialized uses, the SCCS tree should be kept as simple as
possible.
SCCS command conventions
SCCS commands accept two types of arguments: key letters and file arguments.
A key letter represents an option. It begins with a minus sign (-), followed
by a lowercase letter and, in some instances, a value.
File arguments, which may be file or directory names, specify the file(s) that the
SCCS command is to process. Naming a directory is equivalent to naming all
the SCCS files contained in the directory. Non-SCCS files and unreadable files
are silently ignored.
In general, a filename argument may not begin with a minus sign. If a
filename - (a single minus sign) is specified, the command reads the standard
input (usually your terminal) for lines and takes each line as the name of an
SCCS file to be processed. The standard input is read until end-of-file. This
feature is often used in pipelines with, for example, the commands find(C)
and ls(C), which are also documented in the User's Reference.
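The way file arguments are interpreted can be modelled roughly as follows. This
is a sketch under stated assumptions, not the code SCCS uses: the function name
is hypothetical, directory entries are sorted only to make the result
predictable, and explicitly named files are passed through unchanged so that the
command itself can report an error for them.

```python
import os

def expand_file_args(args, stdin_lines=()):
    """Return the files an SCCS command would process, in left-to-right order.

    A directory stands for every SCCS file (name starting with 's.') in it.
    A single '-' means: take one filename per line from the standard input.
    In both of those cases non-SCCS and unreadable files are skipped silently.
    """
    result = []
    for arg in args:
        if arg == "-":
            candidates = [line.strip() for line in stdin_lines if line.strip()]
        elif os.path.isdir(arg):
            candidates = [os.path.join(arg, name) for name in sorted(os.listdir(arg))]
        else:
            result.append(arg)      # named explicitly: the command checks it later
            continue
        for path in candidates:
            if os.path.basename(path).startswith("s.") and os.access(path, os.R_OK):
                result.append(path)
    return result
```

In this model, calling the function with a directory name returns only the
s. files found there, which mirrors the rule that naming a directory is
equivalent to naming all the SCCS files it contains.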
Key letters are processed before filenames. Therefore, the placement of key
letters is arbitrary; that is, they may be interspersed with filenames.
Filenames, however, are processed left to right. Somewhat different conventions
apply to what(CP), sccsdiff(CP), and val(CP), detailed later under 'SCCS
commands' and documented in the Programmer's Reference.
Certain actions of various SCCS commands are controlled by flags appearing
in SCCS files. Some of these flags are discussed in this chapter; for a complete
description, see admin(CP) in the Programmer's Reference.
There is a distinction between the real user (see passwd(C)) and the effective
user when discussing various actions of SCCS commands. For now, assume that
the person logged into the UNIX system is both the real and effective user.
x.files and z.files
All SCCS commands that modify an SCCS file do so by writing a copy called
the 'x.file.' This is done to ensure that the SCCS file is not damaged if
processing terminates abnormally. SCCS names the x.file by replacing the s. of
the SCCS filename with x. The x.file is created in the same directory as the
SCCS file, given the same mode (see chmod(C) in the User's Reference), and
owned by the effective user. When processing is complete, the old SCCS file is
destroyed, and the modified x.file is renamed (with x. replaced by s.) and
becomes the new SCCS file.
To prevent simultaneous updates to an SCCS file, the same modifying
commands also create a lock file called the 'z.file.' SCCS forms its name by
replacing the s. of the SCCS filename with a z. prefix. The z.file contains the
process number of the command that created it, and its existence prevents
other commands from processing the SCCS file. The z.file is created with
access permission mode 444 (read only) in the same directory as the SCCS file,
and is owned by the effective user. Like the x.file, the z.file exists only for the
duration of the execution of the command that creates it.
In general, users can ignore x.files and z.files. They are useful only in the
event of system crashes or similar situations.
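The x.file and z.file conventions amount to a lock, a write to a copy, and a
rename. The following Python sketch illustrates that pattern on a UNIX-like
system. It is not the SCCS implementation; the function names are hypothetical,
and the use of exclusive file creation for the lock is an assumption about how
such a lock can be taken.

```python
import os

def aux_name(sccs_path, prefix):
    """Name of an auxiliary file: 's.abc' -> 'x.abc', 'z.abc', and so on."""
    directory, name = os.path.split(sccs_path)
    if not name.startswith("s."):
        raise ValueError("not an SCCS file name: " + name)
    return os.path.join(directory, prefix + name[1:])

def update_sccs_file(sccs_path, new_contents):
    """Replace sccs_path safely: lock with a z.file, write an x.file, rename."""
    zfile = aux_name(sccs_path, "z")
    xfile = aux_name(sccs_path, "x")
    # O_EXCL makes creation fail if another command already holds the lock.
    fd = os.open(zfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o444)
    try:
        with os.fdopen(fd, "w") as lock:
            lock.write("%d\n" % os.getpid())    # process number of the locker
        with open(xfile, "w") as out:
            out.write(new_contents)
        # Give the x.file the same mode as the SCCS file it will replace.
        os.chmod(xfile, os.stat(sccs_path).st_mode & 0o777)
        os.replace(xfile, sccs_path)            # x.file becomes the new s.file
    finally:
        os.unlink(zfile)                        # lock exists only while we run
```

If processing stops before the rename, the original SCCS file is untouched and
only the x.file and z.file are left behind, which is why these files matter
mainly after a crash.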
SCCS commands
This section describes the major features of the SCCS commands along with
their most common arguments. Full descriptions with details of all arguments
are in the Programmer's Reference.
Here is a quick-reference overview of the commands:
admin      initializes SCCS files, manipulates their descriptive text, and
           controls delta creation rights
cdc        changes the commentary associated with a delta
comb       combines consecutive deltas into one to reduce the size of an
           SCCS file
delta      applies deltas (changes) to SCCS files and creates new versions
get        retrieves versions of SCCS files
help       provides an explanation of error messages
prs        prints portions of an SCCS file in user-specified format
rmdel      removes a delta from an SCCS file; allows removal of deltas
           created by mistake
sact       prints information about files that are currently out for editing
sccsdiff   shows differences between any two versions of an SCCS file
unget      undoes the effect of a get -e before the file is changed
val        validates an SCCS file
vc         a filter that may be used for version control
what       searches any UNIX system file(s) for all occurrences of a special
           pattern and prints out what follows it; useful in identifying
           information inserted by the get command
Error messages
SCCS commands produce error messages on the diagnostic output in this
format:
ERROR [name-of-file-being-processed]: message text (code)
The code in parentheses can be used as an argument to the help command to
obtain a further explanation of the message. Detection of a fatal error during
the processing of a file causes the SCCS command to stop processing that file
and proceed with the next file specified.
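A script that wraps SCCS commands may want to pull the file name and the help
code out of such a message. The following Python sketch assumes that messages
follow exactly the format shown above; the function name and the regular
expression are illustrative, not part of SCCS.

```python
import re

# ERROR [name-of-file-being-processed]: message text (code)
ERROR_LINE = re.compile(
    r"^ERROR(?: \[(?P<file>[^\]]*)\])?: (?P<text>.*) \((?P<code>\w+)\)$"
)

def parse_error(line):
    """Return (file, message text, code), or None if the line does not match."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("file"), match.group("text"), match.group("code")

print(parse_error("ERROR [lang]: not an SCCS file (co1)"))
# ('lang', 'not an SCCS file', 'co1')
```

The third element of the result is the code that would be passed to the help
command.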
The help command
The help facility provides an explanation of the error messages. For example,
suppose the following command is entered:
get lang
This error message is displayed:
ERROR [lang]: not an SCCS file (co1)
For more information on this error message, use the help facility in
conjunction with the error code (co1), for example:
help co1
This provides the following explanation of why get lang produced an error
message:
co1:
'not an SCCS file'
A file that you think is an SCCS file
does not begin with the characters 's.'.
The help command, along with an error code, will provide assistance in
understanding the meaning of most SCCS messages.
The get command
The get(CP) command creates a file containing a specified version of an SCCS
file. The version is retrieved by beginning with the initial version and then
applying deltas, in order, until the desired version is obtained. The resulting
file is called the 'g.file.' It is created in the current directory and owned by the
real user. The mode assigned to the g.file depends on how the get command
is used.
The most common use of get is:
get s.abc
This retrieves the latest version of file abc from the SCCS file tree trunk and
produces (for example) on the standard output:
1.3
67 lines
No id keywords (cm7)
This means version 1.3 of file s.abc has been retrieved (assuming 1.3 is the
latest trunk delta), it has 67 lines of text, and no ID keywords have been
substituted in the file.
The generated g.file (file abc) is given access permission mode 444 (read only).
This particular way of using get produces g.files only for inspection, compila­
tion, and so on. It is not intended for editing (making deltas).
When several files are specified, the same information is output for each one.
For example, enter:
get s.abc s.xyz
It produces:
s.abc:
67 lines
No id keywords (cm7)
s.xyz:
85 lines
No id keywords (cm7)
keywords in SCCS
In generating a g.file for compilation, it is useful to record the date and time of
creation, the version retrieved, and the module's name within the g.file. This
information eventually appears in a load module when one is created. SCCS
provides a convenient mechanism for doing this automatically. Identification
(ID) keywords appearing anywhere in the generated file are replaced by
appropriate values according to the definitions of those ID keywords. The
format of an ID keyword is an uppercase letter enclosed by percent signs (%).
For example, %I% is the ID keyword replaced by the SID of the retrieved
version of a file. Similarly, %H% is the current date in MM/DD/YY format and
%M% is the name of the g.file. Suppose get is executed on an SCCS file
containing the following PL/1 declaration:
DCL ID CHAR(100) VAR INIT('%M% %I% %H%');
The retrieved g.file then contains a declaration of the following form, in which
name and SID stand for the values substituted for %M% and %I%:
DCL ID CHAR(100) VAR INIT('name SID 07/18/85');
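Keyword substitution of this kind can be imitated with a short Python function.
This is a sketch of the idea for three keywords only, not the algorithm used by
get; the function name is hypothetical, and the module name abc and SID 1.3 in
the example are made-up values.

```python
import re
from datetime import date

def expand_keywords(text, module, sid, today):
    """Replace the %M%, %I% and %H% ID keywords; leave anything else alone."""
    values = {
        "M": module,                       # module name (g.file name by default)
        "I": sid,                          # SID of the retrieved version
        "H": today.strftime("%m/%d/%y"),   # current date, MM/DD/YY
    }
    return re.sub(r"%([A-Z])%", lambda m: values.get(m.group(1), m.group(0)), text)

line = "DCL ID CHAR(100) VAR INIT('%M% %I% %H%');"
print(expand_keywords(line, "abc", "1.3", date(1985, 7, 18)))
# DCL ID CHAR(100) VAR INIT('abc 1.3 07/18/85');
```

Keywords that the sketch does not know about are left in place rather than
removed.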
When no ID keywords are substituted by get, the following message is issued:
No id keywords (cm7)
This message is normally treated as a warning by get, although the presence
of the -i flag in the SCCS file causes it to be treated as an error. For a complete
list of the approximately 20 ID keywords provided, see get(CP) in the
Programmer's Reference.
Retrieval of different versions
By default, the version of an SCCS file that get retrieves is the most recently
created delta of the highest-numbered trunk release. However, any other
version can be retrieved with get -r by specifying the version's SID. For example:
get -r1.3
This command retrieves version 1.3 of the file and produces (for example) on
the standard output:
1.3
64 lines
A branch delta can be retrieved similarly:
get -r1.5.2.3
This produces (for example) on the standard output:
1.5.2.3
234 lines
When an SID is specified and the particular version does not exist in the SCCS
file, an error message results.
This command omits the level number:
get -r3
This causes the retrieval of the trunk delta with the highest level number
within the given release. The output of the above command might be:
213 lines
If the given release does not exist, get retrieves the trunk delta with the
highest level number within the highest-numbered existing release that is
lower than the given release. For details on how deltas are numbered, see the
'Delta numbering' section. For example:
get -r9
Executing the above might produce:
7.6
420 lines
This indicates that trunk delta 7.6 is the latest version of the file below
release 9.
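The rules for choosing a trunk delta from a full or partial -r value can be
expressed compactly. The Python sketch below is a model of the rules as
described, not of the get source code; the function name is hypothetical, and
raising an exception for a value below every existing release is an assumption.

```python
def resolve_trunk(trunk, release=None, level=None):
    """Pick the trunk delta retrieved for no -r, -r<release> or -r<release>.<level>.

    `trunk` is a collection of (release, level) pairs present in the file.
    """
    trunk = sorted(trunk)
    if release is None:                  # no -r: latest trunk delta
        return trunk[-1]
    if level is not None:                # a complete trunk SID must exist
        if (release, level) not in trunk:
            raise LookupError("nonexistent SID %d.%d" % (release, level))
        return (release, level)
    # Release only: highest level in that release, or, if the release does
    # not exist, in the highest-numbered existing release below it.
    candidates = [sid for sid in trunk if sid[0] <= release]
    if not candidates:
        raise LookupError("no release at or below %d" % release)
    return candidates[-1]

trunk = {(1, 1), (1, 2), (3, 1), (3, 2), (7, 5), (7, 6)}
print(resolve_trunk(trunk, 9))   # (7, 6)
print(resolve_trunk(trunk, 3))   # (3, 2)
```

With the made-up set of deltas above, asking for release 9 falls back to 7.6,
as in the example in the text.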
Similarly, omitting the sequence number, for example:
get -r4.3.2
results in the retrieval of the branch delta with the highest sequence number
on the given branch. (If the given branch does not exist, an error message
results.) This might result in the following output:
89 lines
The get -t command retrieves the latest (top) version of a particular release
when no -r is used or when its value is simply a release number. The latest
version is the delta produced most recently, independent of its location on the
SCCS file tree. For example, assume that the most recent delta in release 3 is
3.5 and that the following command is issued:
get -r3 -t
It might produce:
3.5
59 lines
However, if a branch delta were the latest delta (created after delta 3.5),
the same command might produce:
46 lines
Retrieval with intent to make a delta
The get -e command indicates an intent to make a delta. First, get checks the
following conditions:
It checks whether the login name or group ID of the person executing get is
present in the user list. The login name or group ID must be present for the
user to be allowed to make deltas. (See 'The admin command' for a
discussion of making user lists.)
It checks whether the release number (R) of the version being retrieved
satisfies the relation:
floor is less than or equal to R,
which is less than or equal to ceiling.
This check determines whether the release being accessed is protected. The
floor and ceiling are flags in the SCCS file, representing the start and end of
the range.
It checks whether R is locked against editing. The lock is a flag in the
SCCS file.
It checks whether multiple concurrent edits are allowed for the SCCS file
by the -j flag in the SCCS file.
A failure of any of the first three conditions causes the processing of the
corresponding SCCS file to terminate.
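The checks made by get -e can be summarized as a single decision function. The
Python below is an illustrative model, not SCCS code. The function name, the
dictionary keys, and the default floor and ceiling values of 1 and 9999 are
assumptions made for the sketch; the rule that an empty user list admits
everyone comes from the description of admin -a later in this chapter.

```python
def may_edit(release, login, group_id, flags, user_list, sid_in_progress):
    """Return None if get -e may proceed, otherwise the reason it is refused.

    `flags` models the SCCS file flags as a dict with optional keys 'floor',
    'ceiling', 'locked' (a set of release numbers) and 'joint_edit' (-j).
    `sid_in_progress` is True if this SID is already out for editing.
    """
    if user_list and login not in user_list and group_id not in user_list:
        return "user not in user list"
    floor = flags.get("floor", 1)            # assumed default
    ceiling = flags.get("ceiling", 9999)     # assumed default
    if not floor <= release <= ceiling:
        return "release outside floor..ceiling"
    if release in flags.get("locked", set()):
        return "release locked against editing"
    if sid_in_progress and not flags.get("joint_edit", False):
        return "SID already being edited"
    return None

print(may_edit(2, "xyz", 1234, {"ceiling": 3}, {"xyz"}, False))   # None
print(may_edit(4, "xyz", 1234, {"ceiling": 3}, {"xyz"}, False))
# release outside floor..ceiling
```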
If the above checks succeed, get -e causes the creation of a g.file in the current
directory with mode 644 (readable by everyone, writable only by the owner),
owned by the real user. If a writable g.file already exists, get terminates with
an error. This is to prevent inadvertent destruction of a g.file while it is being
edited for the purpose of making a delta. Any ID keywords appearing in the
g.file are not substituted by get -e because the generated g.file is subsequently
used to create another delta. Replacement of ID keywords would cause them to
be permanently changed in the SCCS file. As a direct result of this, get does not
check for the presence of ID keywords in the g.file. The following message is
never output when get -e is used:
No id keywords (cm7)
In addition, get -e causes the creation (or updating) of a p.file that is used to
pass information to the delta command.
For example:
get -e
The output of this command is:
new delta 1.4
67 lines
Undoing a get -e
There may be times when a file is erroneously retrieved for editing, when
there is really no editing that needs to be done at the time. In such cases, the
unget command cancels the delta reservation that was set up. For example,
enter either of the following commands:
unget -r1.4
These commands produce:
Additional get options
If the -r and/or -t key letters are used together with -e, the version
retrieved for editing is the one specified with -r and/or -t.
The get -i and get -x commands specify a list of deltas to be included and
excluded, respectively. (See get(CP) in the Programmer's Reference for the
syntax of such a list.) Including a delta means forcing its changes to be
included in the retrieved version. This is useful in applying the same changes
to more than one version of the SCCS file. Excluding a delta means forcing it
not to be applied. This may be used to undo the effects of a previous delta in
the version to be created.
Whenever deltas are included or excluded, get checks for possible interference
with other deltas. For example, two deltas can interfere when each one
changes the same line of the retrieved g.file. A warning shows the range of
lines within the retrieved g.file where the problem may exist.
NOTE The user should examine the g.file to determine what the problem is
and take appropriate corrective steps (such as editing the file).
The get -i and get -x commands should be used with care.
The get -k command has two uses. The first is to regenerate a g.file that may
have been accidentally removed or corrupted after get -e. The second use is
to generate a g.file in which the replacement of ID keywords has been
suppressed. A g.file generated by get -k is identical to one produced by
get -e. However, no processing related to the p.file takes place.
Concurrent edits of different SID
The ability to retrieve different versions of an SCCS file allows several deltas
to be in progress at any given time. This means that several get -e commands
may be executed on the same file, provided that no two executions retrieve the
same version (unless multiple concurrent edits are allowed).
The p.file created by get -e is named by automatic replacement of the SCCS
filename's prefix s. with p. It is created in the same directory as the SCCS file,
given mode 644 (readable by everyone, writable only by the owner), and
owned by the effective user. The p.file contains the following information for
each delta that is still in progress:
SID of the retrieved version
SID given to the new delta when it is created
login name of the real user executing get
The first execution of get -e causes the creation of a p.file for the
corresponding SCCS file. Subsequent executions simply update the p.file with a
line containing the above information. Before updating, however, get checks
to ensure that the version to be retrieved has not already been retrieved for
editing (unless multiple concurrent edits are allowed). If the check
succeeds, the user is informed that other deltas are in progress, and
processing continues. If the check fails, an error message results.
Concurrent executions of get -e must be carried out from different
directories. Subsequent executions from the same directory attempt
to overwrite the g.file, which is an SCCS error condition. In practice, this
problem does not arise because each user normally has a different working
directory. See 'Protection' under 'SCCS files' for a discussion of how different
users are permitted to use SCCS commands on the same files.
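The bookkeeping that the p.file supports can be modelled as a list of entries
with a duplicate check. This Python sketch models the behavior described above
and does not reproduce the on-disk p.file format; the names PEntry and
add_p_entry are hypothetical.

```python
from collections import namedtuple

# One entry per delta in progress, with the three items listed above.
PEntry = namedtuple("PEntry", "retrieved_sid new_sid login")

def add_p_entry(entries, retrieved_sid, new_sid, login, joint_edit=False):
    """Record a new edit in progress and return the updated list of entries.

    A second edit of the same retrieved SID is refused unless multiple
    concurrent edits are allowed (joint_edit, the -j flag).
    """
    if not joint_edit and any(e.retrieved_sid == retrieved_sid for e in entries):
        raise RuntimeError("SID %s is already being edited" % retrieved_sid)
    return entries + [PEntry(retrieved_sid, new_sid, login)]

entries = add_p_entry([], "1.3", "1.4", "xyz")
entries = add_p_entry(entries, "2.1", "2.2", "wql")   # different SID: allowed
print(len(entries))   # 2
```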
Table 6-1 shows the possible SID components a user can specify with get
(left-most column), the version that is then retrieved by get, and the resulting
SID for the delta, which delta creates (right-most column).
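Table 6-1 Determination of new SID

SID specified  -b keyletter  Other                     SID retrieved  SID of delta to be
in get*        used†         conditions                by get         created by delta
-------------  ------------  ------------------------  -------------  ------------------
none§          no            R defaults to mR          mR.mL          mR.(mL+1)
none§          yes           R defaults to mR          mR.mL          mR.mL.(mB+1).1
R              no            R > mR                    mR.mL          R.1**
R              no            R = mR                    mR.mL          mR.(mL+1)
R              yes           R > mR                    mR.mL          mR.mL.(mB+1).1
R              yes           R = mR                    mR.mL          mR.mL.(mB+1).1
R              -             R < mR and R does         hR.mL‡         hR.mL.(mB+1).1
                             not exist
R              -             Trunk successor number    R.mL           R.mL.(mB+1).1
                             in release > R, and
                             R exists
R.L            no            No trunk successor        R.L            R.(L+1)
R.L            yes           No trunk successor        R.L            R.L.(mB+1).1
R.L            -             Trunk successor in        R.L            R.L.(mB+1).1
                             release ≥ R
R.L.B          no            No branch successor       R.L.B.mS       R.L.B.(mS+1)
R.L.B          yes           No branch successor       R.L.B.mS       R.L.(mB+1).1
R.L.B.S        no            No branch successor       R.L.B.S        R.L.B.(S+1)
R.L.B.S        yes           No branch successor       R.L.B.S        R.L.(mB+1).1
R.L.B.S        -             Branch successor          R.L.B.S        R.L.(mB+1).1

Footnotes *, †, ‡, §, and ** follow the table.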
Footnotes to Table 6-1:
*  R, L, B, and S mean release, level, branch, and sequence numbers in
   the SID, and m means maximum. For example, R.mL means the
   maximum level number within release R. R.L.(mB+1).1 means the first
   sequence number on the new branch (that is, the maximum branch
   number plus 1) of level L within release R. Note that, if the SID
   specified is R.L, R.L.B, or R.L.B.S, then each of these specified SID
   numbers must exist.
†  The -b key letter is effective only if the -b flag (discussed under
   admin(CP) in the Programmer's Reference) is present in the file. An
   entry of - means 'irrelevant'.
‡  The release hR is the highest existing release that is lower than the
   specified, nonexistent release R.
§  This case applies if the -d (default SID) flag is not present. If the -d
   flag is present in the file, the SID is interpreted as though specified on
   the command line. Therefore, one of the other cases in this table
   applies.
** This forces the creation of the first delta in a new release.
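Some rows of Table 6-1 can be checked with a small Python model. The sketch
below covers only the cases in which -b is not used and the SID given to get is
absent, a release number, or a complete trunk SID. It is an illustration of the
rules as stated, not the logic of get itself; the function name is hypothetical,
and it assumes that a trunk delta has a trunk successor exactly when a
higher-numbered trunk delta exists.

```python
def new_sid(existing, requested=None):
    """Return (SID retrieved, SID of the delta to be created) for get -e.

    `existing` is a set of SID tuples; `requested` is None, (R,) or (R, L).
    """
    trunk = sorted(s for s in existing if len(s) == 2)
    max_release, max_level = trunk[-1]

    def branch_off(sid):
        # R.L.(mB+1).1: first delta on a new branch of trunk delta `sid`
        used = [s[2] for s in existing if len(s) == 4 and s[:2] == sid]
        return sid + (max(used, default=0) + 1, 1)

    if requested is None or requested == (max_release,):
        return trunk[-1], (max_release, max_level + 1)
    if len(requested) == 1:
        release = requested[0]
        if release > max_release:
            return trunk[-1], (release, 1)     # first delta of a new release
        # Highest trunk delta in R, or in the highest lower release (hR).
        retrieved = max(s for s in trunk if s[0] <= release)
        return retrieved, branch_off(retrieved)
    if requested not in trunk:
        raise LookupError("SID does not exist")
    if requested == trunk[-1]:                 # no trunk successor
        return requested, (requested[0], requested[1] + 1)
    return requested, branch_off(requested)

existing = {(1, 1), (1, 2), (1, 3), (2, 1)}
print(new_sid(existing))            # ((2, 1), (2, 2))
print(new_sid(existing, (1, 3)))    # ((1, 3), (1, 3, 1, 1))
print(new_sid(existing, (3,)))      # ((2, 1), (3, 1))
```

The second call reproduces the situation from 'Delta numbering': a fix applied
to version 1.3 while release 2 is already in progress becomes branch delta
1.3.1.1.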
Concurrent edits of same SID
Under normal conditions, more than one get -e for the same SID is not
permitted. That is, delta must be executed before a subsequent get -e is
executed on the same SID.
Multiple concurrent edits are allowed if the -j flag is set in the SCCS file. For
example:
get -e
This produces:
new delta 1.2
5 lines
This can be immediately followed by (without an intervening delta):
get -e
This produces:
new delta 1.1.1.1
5 lines
In this case, a delta after the first get produces delta 1.2 (assuming that 1.1 is
the most recent trunk delta), and a delta after the second get produces delta
1.1.1.1.
Key letters that affect output
The get -p command causes the retrieved text to be written to the standard
output, rather than to a g.file. In addition, all output normally directed to the
standard output (such as the SID of the version retrieved and the number of
lines retrieved) is directed instead to the diagnostic output. The get -p
command can be used to create a g.file with an arbitrary name, as in:
get -p > arbitrary-filename
The get -s command suppresses output normally directed to the standard
output, such as the SID of the retrieved version and the number of lines retrieved,
but it does not affect messages normally directed to the diagnostic output.
The get -s command prevents nondiagnostic messages from appearing on the
user's terminal and is often used with -p to pipe the output, as in:
get -p -s | pg
The get -g command suppresses the retrieval of the text of an SCCS file. This
is useful in several ways. It may be used to verify that a particular SID exists
in an SCCS file. For example:
get -g -r4.3
This outputs the SID 4.3 if it exists in the SCCS file, or an error message if
it does not.
Another use of get -g is in regenerating a p.file that may have been
accidentally destroyed, as in:
get -e -g
The get -l command causes SCCS to create an 'l.file.' It is named by replacing
the s. of the SCCS filename with l., created in the current directory with mode
444 (read only), and owned by the real user. The l.file contains a table (whose
format is described under get(CP) in the Programmer's Reference) showing the
deltas used in constructing a particular version of the SCCS file.
The following command generates an l.file, showing the deltas applied to
retrieve version 2.3 of the file:
get -r2.3 -l
Specifying -p with -l results in the output being written to the standard
output, rather than to the l.file. For example:
get -lp -r2.3
The get -g command can be used with -l to suppress the retrieval of the text.
For example:
get -g -l
The get -m command identifies the changes applied to an SCCS file. Each line
of the g.file is preceded by the SID of the delta that caused the line to be
inserted. The SID is separated from the text of the line by a tab character.
The get -n command causes each line of a g.file to be preceded by the value of
the %M% ID keyword and a tab character. This is most often used in a pipeline
with grep(C), which is documented in the User's Reference. For example, to find
all lines that match a given pattern in the latest version of each SCCS file in a
directory, the following can be executed:
get -p -n -s directory | grep pattern
If both -m and -n are specified, each line of the generated g.file is preceded by
the value of the %M% ID keyword and a tab (the effect of -n), and is followed
by the line in the format produced by -m. Because use of -m and/or -n causes
the contents of the g.file to be modified, such a g.file must not be used for
creating a delta. Therefore, neither -m nor -n may be specified together with
get -e.
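The line formats produced by -m and -n can be illustrated with a short Python
function. This is a sketch of the output layout as described above, not of how
get builds it; the function name and the sample lines are hypothetical.

```python
def annotate(lines, module=None, show_sid=False):
    """Prefix each retrieved line the way get -n and get -m are described.

    `lines` is a list of (sid, text) pairs, where sid names the delta that
    inserted the line. Pass a module name for -n and show_sid=True for -m.
    """
    out = []
    for sid, text in lines:
        if show_sid:                  # -m: SID, tab, text
            text = sid + "\t" + text
        if module is not None:        # -n: %M% value, tab, then the rest
            text = module + "\t" + text
        out.append(text)
    return out

lines = [("1.1", "int x;"), ("1.3", "int y;")]
for line in annotate(lines, module="abc", show_sid=True):
    print(line)
```

With both options, each line starts with the module name, then the SID, then
the original text, separated by tabs.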
NOTE See get(CP) in the Programmer's Reference for a full description of
additional key letters.
The delta command
The delta(CP) command incorporates changes made to a g.file into the
corresponding SCCS file; that is, it creates a delta and, therefore, a new version
of the file.
The delta command requires the existence of a p.file (created by means of get
-e). It examines the p.file to verify the presence of an entry containing the
user's login name. If none is found, an error message results.
The delta command performs the same permission checks that get -e
performs. If all checks are successful, delta determines what has been changed in
the g.file by comparing it, using diff(C) (documented in the User's Reference),
with its own temporary copy of the g.file as it was before editing. This
temporary copy of the g.file is called the d.file, and is obtained by performing an
internal get on the SID specified in the p.file entry.
The required p.file entry is the one containing the login name of the user
executing delta, because the user who retrieved the g.file must be the one who
creates the delta. However, if the login name of the user appears in more than
one entry, the same user has executed get -e more than once on the same
SCCS file. In that case, delta -r must be used to specify the SID that uniquely
identifies the p.file entry. This entry becomes the one used to obtain the SID
of the delta to be created.
In practice, the following is the most common use of delta:
This command results in the following prompt:
comments?
The user replies with a description of why the delta is being made, ending the
reply with a new-line character. The user's response may be up to 512
characters long, with new-lines that are not intended to terminate the response
escaped by backslashes (\).
If the SCCS file has a -v flag, delta first prompts with:
MRs?
A Modification Request (MR) is a formal way of asking for a correction or
enhancement to a file. After the MRs? prompt, the standard input is
read for MR numbers. These are separated by blanks and/or tabs, ending with
a new-line character. In some controlled environments where changes to
source files are tracked, deltas are permitted only when initiated by a trouble
report/ticket, change request, and so on, collectively known as MRs.
Recording MR numbers within deltas is a way of enforcing the rules of the
change-management process.
The delta -y and/or delta -m commands can be used to enter comments and
MR numbers on the command line rather than through the standard input, for
example:
delta -y"descriptive comment" -m"mrnum1 mrnum2"
In this case, the prompts for comments and MRs are not printed, and the
standard input is not read. These two key letters are useful when delta is
executed from within a shell procedure. (See sh(C) in the User's Reference.)
The delta -m command is allowed only if the SCCS file has a -v flag.
All comments and MR numbers, whether solicited by delta or supplied by
key letters, are recorded as part of the entry for the delta being created. They
are applicable to all SCCS files specified with the same invocation of delta.
If delta is used with more than one file argument and the first file named has a
-v flag, all files named must have this flag. Similarly, if the first file named
does not have the flag, none of the files named may have it.
When delta processing is complete, the standard output displays the SID of
the new delta (from the p.file) and the number of lines inserted, deleted, and
left unchanged. Here is an example:
14 inserted
7 deleted
unchanged
If line counts do not agree with the user's perception of the changes made to a
g.file, it may be because there are various ways to describe a set of changes,
especially if lines are moved around in the g.file. However, the total number
of lines in the new delta (the number inserted plus the number left
unchanged) should always agree with the number of lines in the edited g.file.
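The relationship between the three counts can be demonstrated with Python's
difflib module standing in for diff(C). This is an illustration of the
arithmetic only; delta's own comparison may describe the same edit with
different counts, and the function name is hypothetical.

```python
import difflib

def delta_counts(old_lines, new_lines):
    """Count lines inserted, deleted and left unchanged between two versions."""
    inserted = deleted = unchanged = 0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:                          # 'replace', 'delete' or 'insert'
            deleted += i2 - i1
            inserted += j2 - j1
    return inserted, deleted, unchanged

old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
inserted, deleted, unchanged = delta_counts(old, new)
print(inserted, deleted, unchanged)          # 2 1 3
print(inserted + unchanged == len(new))      # True
```

Whatever way the changes are described, the inserted and unchanged counts add
up to the length of the edited file, and the deleted and unchanged counts add
up to the length of the version that was retrieved.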
If, in the process of making a delta, delta finds no ID keywords in the edited
g.file, the following message is issued after the prompts for commentary but
before any other output:
No id keywords (cm7)
This means that any ID keywords that may have existed in the SCCS file have
been replaced by their values or deleted during the editing process. This
could be caused by:
making a delta from a g.file that was created by a get without -e (ID keywords are replaced by get in such a case)
accidentally deleting or changing ID keywords while editing the g.file
the file having no ID keywords
In any case, the delta is created unless there is an -i flag in the SCCS file
(meaning the error should be treated as fatal), in which case the delta is not
created.
After the processing of an SCCS file is complete, the corresponding p.file entry
is removed from the p.file. All updates to the p.file are made to a temporary
copy, the 'q.file,' whose use is similar to that of the x.file described earlier
under 'SCCS command conventions.' If there is only one entry in the p.file,
then the p.file itself is removed.
In addition, delta removes the edited g.file unless -n is specified. For example,
the following command keeps the g.file after processing:
delta -n
The delta -s command suppresses all output normally directed to the
standard output, other than the comments? and MRs? prompts. Thus, use of -s
with -y (and/or -m) causes delta not to read the standard input or write the
standard output.
The changes made to the g.file constitute the delta and may be printed on the
standard output by using delta -p. The format of this output is similar to that
produced by diff(C), documented in the User's Reference.
The admin command
The admin(CP) command, documented in the Programmer's Reference,
administers SCCS files; that is, it creates new SCCS files and changes the parameters
of existing ones. When an SCCS file is created, its parameters are initialized
by use of key letters with admin, or are assigned default values if no key
letters are supplied. The same key letters are used to change the parameters
of existing SCCS files.
Two key letters are used in detecting and correcting corrupted SCCS files.
(See 'Auditing' under 'SCCS files' later in this chapter.)
Newly created SCCS files are given access permission mode 444 (read only),
and are owned by the effective user. Only a user with write permission in the
directory containing the SCCS file may use the admin command on that file.
Creation of SCCS files
An SCCS file can be created by executing the following command:
admin -ifirst
The value first with the -i key letter is the name of the file from which the text
of the initial delta of the SCCS file is to be taken. Omission of a value with -i
indicates that admin is to read the standard input for the text of the initial
delta.
The following command is equivalent to the previous example:
admin -i < first
If the text of the initial delta does not contain ID keywords, admin issues the
following warning message:
No id keywords (cm7)
However, if the command also sets the -i flag (not to be confused with the -i
key letter), then the message is treated as an error and the SCCS file is not
created. Only one SCCS file may be created at a time using admin -i.
The -r key letter used in conjunction with admin -i specifies the release number
for the first delta. For example:
admin -ifirst -r3
This command specifies that the first delta should be named 3.1 rather than
the normal 1.1. Because -r has meaning only when creating the first delta, its
use is permitted only with -i.
Inserting commentary for the initial delta
When an SCCS file is created, the user may want to record why this was done.
Comments (admin -y) and/or MR numbers (-m) can be entered in exactly the
same way as for delta.
If the -y option is omitted, a comment line of the following form is generated:
date and time created yy/mm/dd hh:mm:ss by logname
If it is desired to supply MR numbers (admin -m), the -v flag must be set by
means of -f. The -v flag simply determines whether MR numbers must be
supplied when using any SCCS command that modifies a delta commentary.
(See sccsfile(F) in the Programmer's Reference.) For example:
admin -ifirst -mmrnum1 -fv
The -y and -m key letters are effective only if a new SCCS file is being created.
Initialization and modification of SCCS file parameters
Part of an SCCS file is reserved for descriptive text, usually a summary of the
file's contents and purpose. It can be initialized or changed by using admin -t.
When an SCCS file is first being created and -t is used, it must be followed by
the name of a file from which the descriptive text is to be taken. For example,
the following command specifies that the descriptive text is to be taken from
file desc.
admin -ifirst -tdesc
When processing an existing SCCS file, -t specifies that the descriptive text (if
any) currently in the file is to be replaced with the text in the named file.
Omission of the filename after the -t key letter results in the removal of the
descriptive text from the SCCS file. For example:
admin -t
The flags of an SCCS file may be initialized or changed by admin -f or deleted
by means of -d.
SCCS file flags direct certain actions of the various commands. (See
admin(CP) in the Programmer's Reference for a description of all the flags.) For
example, the -i flag specifies that a warning message (stating that there are no
ID keywords contained in the SCCS file) should be treated as an error. The -d
(default SID) flag specifies the default version of the SCCS file to be retrieved
by the get command.
The admin -f command sets flags and, if desired, their values. In the
following example, the -i and -m (module name) flags are set.
1 40
Programming Tools Guide
sees commands
admin -ifirst -fi -fmmodname
The value modname specified for the -m flag is the value that the get command
uses to replace the %M% ID keyword. (In the absence of the -m fla� the
name of the g.file is used as the replacement for the %M% ID keyword.)
Several -f key letters may be supplied on a single admin, and they may be
used whether the command is creating a new sees file or processing an exist­
ing one.
The admin -d command deletes a flag from an existing sees file. For exam­
admin -dm
This command removes the -m flag from the sees file. Several -d key letters
may be used with one admin and may be intermixed with -f.
sees files contain a list of login names and/or group IDs of users who are
allowed to create deltas. This list is empty by default, allowing anyone to cre­
ate deltas. To create a user list (or add to an existing one), use admin -a: For
admin -axyz -awql -a1234
This adds the login names xyz and wql and the group ID 1234 to the list. The
admin -a command may be used whether creating a new SCCS file or
processing an existing one.
The admin -e command (erase) removes login names or group IDs from the
list.
prs command
The prs(CP) command prints all or part of an SCCS file on the standard
output. If prs -d is used, the output is in a format called data specification. Data
specification is a string of SCCS file data keywords (not to be confused with
get ID keywords) interspersed with optional user text.
Data keywords are replaced by appropriate values, according to their
definitions. For example, :I: is defined as the data keyword replaced by the SID
of a specified delta. Similarly, :F: is the data keyword for the SCCS filename
currently being processed, and :C: is the comment line associated with a
specified delta. All parts of an SCCS file have an associated data keyword.
For a complete list, see prs(CP) in the Programmer's Reference.
There is no limit to the number of times a data keyword can appear in a data
specification. For example:
prs -d":I: this is the top delta for :F: :I:"
This produces the following output:
2.1 this is the top delta for s.abc 2.1
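The substitution shown above can be sketched in C. This is only an illustration of the idea, not the prs implementation: the function name expand_spec is hypothetical, and the sketch handles only the three keywords named in this section (:I:, :F:, and :C:), copying everything else through as user text.

```c
#include <stddef.h>
#include <string.h>

/* Append src to out (capacity outsz); returns 0 if it does not fit. */
static int append(char *out, size_t outsz, const char *src)
{
    size_t used = strlen(out), n = strlen(src);
    if (used + n + 1 > outsz)
        return 0;
    memcpy(out + used, src, n + 1);
    return 1;
}

/*
 * Hypothetical helper: expand :I:, :F:, and :C: in a data specification.
 * sid, fname, and comment stand for the values of the delta being printed.
 * Returns 1 on success, 0 if the output buffer is too small.
 */
int expand_spec(const char *spec, const char *sid, const char *fname,
                const char *comment, char *out, size_t outsz)
{
    if (outsz == 0)
        return 0;
    out[0] = '\0';
    while (*spec) {
        const char *value = NULL;
        /* A keyword here is a colon, one letter, and a colon. */
        if (spec[0] == ':' && spec[1] != '\0' && spec[2] == ':') {
            switch (spec[1]) {
            case 'I': value = sid; break;
            case 'F': value = fname; break;
            case 'C': value = comment; break;
            default: break;
            }
        }
        if (value) {
            if (!append(out, outsz, value))
                return 0;
            spec += 3;
        } else {
            /* Not a known keyword: copy one character of user text. */
            char one[2] = { *spec, '\0' };
            if (!append(out, outsz, one))
                return 0;
            spec++;
        }
    }
    return 1;
}
```

Calling expand_spec with the specification from the example, the SID 2.1, and the filename s.abc yields the output line shown above. The real command recognizes many more keywords; see prs(CP).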
Information can be obtained from a single delta by specifying its SID using
prs -r. For example:
prs -d":F:: :I: comment line is: :C:" -r1.4
This produces the following output:
s.abc:
1.4 comment line is: THIS IS A COMMENT
If -r is not specified, the value of the SID defaults to the most recently created
delta.
In addition, information from a range of deltas may be obtained with -l or -e.
Using prs -e substitutes data keywords for the SID designated by means of -r
and all deltas created earlier, while prs -l substitutes data keywords for the
SID designated by means of -r and all deltas created later. For example:
prs -d:I: -r1.4 -e
This produces the following output:
Another example is:
prs -d:I: -r1.4 -l
This produces the following output:
Substitution of data keywords for all deltas of the SCCS file may be obtained
by specifying both -e and -l.
The sact command
The sact(CP) command is like a special form of the prs command that
produces a report about files that are out for editing. The command takes only
one type of argument: a list of file or directory names. The report shows the
SID of any file in the list that is out for editing, the SID of the impending delta,
the login of the user who executed the get -e command, and the date and time
the get -e was executed. It is a useful command for an administrator. It is
described in the Programmer's Reference.
The rmdel command
The rmdel(CP) command, documented in the Programmer's Reference, allows
removal of a delta from an SCCS file. Its use should be reserved for deltas in
which incorrect global changes were made. The delta to be removed must be
a leaf delta. That is, it must be the most recently created delta on its branch or
on the trunk of the SCCS file tree. In Figure 6-3, only deltas 1.3.1.2, 1.3.2.2, and
2.2 can be removed. Only after they are removed can deltas 1.3.2.1 and 2.1 be
removed.
To remove a delta, the effective user must have write permission in the
directory containing the SCCS file. In addition, the real user must be either the one
who created the delta being removed or the owner of the SCCS file and its
directory.
The -r key letter is mandatory with rmdel. It specifies the complete SID of the
delta to be removed. For example, to specify the removal of trunk delta 2.3,
use the command:
rmdel -r2.3
Before removing the delta, rmdel checks that the release number (%R%) of the
given SID satisfies the relation:
floor less than or equal to R less than or equal to ceiling
The rmdel command also checks the SID to make sure it is not for a version
on which a get for editing has been executed and whose associated delta has
not yet been made. In addition, the login name or group ID of the user must
appear in the file's user list (or the user list must be empty). Also, the release
specified cannot be locked against editing. That is, if the l flag is set, the
release must not be contained in the list. (See admin(CP) in the Programmer's
Reference.) If these conditions are not satisfied, processing is terminated, and
the delta is not removed.
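Two of the checks described above, the floor and ceiling relation and the release lock list, can be sketched in C as follows. The function name and the representation of the lock list as an array of release numbers are assumptions made for illustration. The sketch does not cover the other conditions (leaf delta, outstanding get -e, user list, and ownership).

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if release r passes the floor/ceiling
 * relation and is not in the list of locked releases, 0 otherwise.
 */
int release_removable(int r, int rel_floor, int rel_ceiling,
                      const int *locked, size_t nlocked)
{
    size_t i;

    /* floor <= R <= ceiling */
    if (r < rel_floor || r > rel_ceiling)
        return 0;

    /* The release must not appear in the lock list. */
    for (i = 0; i < nlocked; i++)
        if (locked[i] == r)
            return 0;

    return 1;
}
```

With a floor of 1, a ceiling of 3, and release 2 locked, the sketch accepts releases 1 and 3 and rejects releases 2 and 4.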
Once a specified delta has been removed, its type indicator in the delta table
of the SCCS file is changed from D (delta) to R (removed).
The cdc command
The cdc(CP) command, documented in the Programmer's Reference, changes
the commentary made when the delta was created. It is similar to the rmdel
command (for instance, -r and full SID are necessary), although the delta need
not be a leaf delta.
cdc -r3.4
The above example specifies that the commentary of delta 3.4 is to be
changed. New commentary is then prompted for, as with delta.
The old commentary is kept, but it is preceded by a comment line indicating
that it has been superseded, and the new commentary is entered ahead of the
comment line. The inserted comment line records the login name of the user
executing cdc and the time of its execution.
The cdc command also allows for the insertion of new and deletion of old ("!"
prefix) MR numbers. The following command inserts mrnum3 and deletes
mrnum1 for delta 1.4.
cdc -r1.4
MRs? mrnum3 !mrnum1
comments? deleted wrong MR number and
inserted correct MR number
(The 'MRs?' prompt appears only if the -v flag has been set.)
NOTE An MR (Modification Request) is described in 'The delta command'
section of this guide.
what command
The what(CP) command, described in the Programmer's Reference, finds
identifying information within any UNIX system file whose name is given as an
argument. No key letters are accepted. The what command searches the
given file(s) for all occurrences of the string @(#), which is the replacement for
the %Z% ID keyword. (See get(CP) in the Programmer's Reference.) It prints on
the standard output whatever follows the string until the first double quote
("), greater than (>), backslash (\), new-line, or nonprinting NULL character.
For example, an SCCS file called s.prog.c (a C language program) contains the
following line:
char id[] = "%W%";
When the following command is used, the resulting g.file is compiled to
produce prog.o and a.out:
get -r3.4 s.prog.c
Then, the following command is executed:
what prog.c prog.o a.out
prog.c:
prog.c:
prog.o:
prog.c:
a.out:
prog.c:
The string searched for by what need not be inserted by means of an ID key­
word of get; you can insert it in any convenient manner.
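The search that what performs can be sketched in C. This is a minimal illustration of the rule stated above (find @(#), then copy until a double quote, greater than, backslash, new-line, or NULL character). It is not the source of the what command, and the function name what_next is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Characters that end an identification string, as listed in the text. */
static int is_terminator(char c)
{
    return c == '"' || c == '>' || c == '\\' || c == '\n' || c == '\0';
}

/*
 * Hypothetical helper: find the next "@(#)" at or after offset `from`
 * in buf (len bytes, which may contain NUL bytes as object files do).
 * Copies the text that follows into out and returns the offset just
 * past it, or 0 if no further "@(#)" is present.
 */
size_t what_next(const char *buf, size_t len, size_t from,
                 char *out, size_t outsz)
{
    size_t i, n;

    for (i = from; i + 4 <= len; i++) {
        if (memcmp(buf + i, "@(#)", 4) != 0)
            continue;
        i += 4;
        n = 0;
        while (i < len && !is_terminator(buf[i])) {
            if (n + 1 < outsz)      /* leave room for the final NUL */
                out[n++] = buf[i];
            i++;
        }
        if (outsz > 0)
            out[n] = '\0';
        return i;
    }
    return 0;
}
```

A caller would read a file into memory and call what_next repeatedly, passing the returned offset as the next `from`, until it returns 0. Because the scan works on raw bytes, it behaves the same on source, object, and executable files.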
sccsdiff command
The sccsdiff(CP) command, documented in the Programmer's Reference,
determines (and prints on the standard output) the differences between any two
versions of an SCCS file. The versions to be compared are specified with
sccsdiff -r in the same way as with get -r. SID numbers must be specified as
the first two arguments. Any following key letters are interpreted as
arguments to the pr(C) command, documented in the User's Reference (which
prints the differences), and must appear before any filenames. The SCCS
file(s) to be processed are named last. Directory names and a name '-' (a
single minus sign) are not acceptable to sccsdiff.
The following is an example of the format of sccsdiff:
sccsdiff -r3.4 -r5.6
The differences are printed the same way as by diff(C), which is discussed in
the User's Reference.
comb command
The comb(CP) command, documented in the Programmer's Reference, lets the
user try to reduce the size of an SCCS file. It generates a shell procedure (see
sh(C) in the User's Reference) on the standard output, which reconstructs the
file by discarding unwanted deltas and combining other specified deltas. (It is
not recommended that comb be used as a matter of routine.)
In the absence of any key letters, comb preserves only leaf deltas and the
minimum number of ancestor deltas necessary to preserve the shape of an SCCS
tree. The effect of this is to eliminate middle deltas on the trunk and on all
branches of the tree. Thus, in Figure 6-3, deltas 1.2, 1.3.2.1, 1.4, and 2.1 would
be eliminated.
Some of the key letter options used with this command are as follows:
comb -s
generates a shell procedure that produces a report of the
percentage space (if any) the user will save. This is often useful
as an advance step.
comb -p
specifies the oldest delta the user wants preserved
comb -c
specifies a list (see get(CP) in the Programmer's Reference for its
syntax) of deltas the user wants preserved. All other deltas
are discarded.
The shell procedure generated by comb is not guaranteed to save space. A
reconstructed file may even be larger than the original. Note, too, that the
shape of an SCCS file tree can be altered by the reconstruction process.
val command
The val(CP) command (see the Programmer's Reference) determines whether a
file is an SCCS file meeting the characteristics specified by certain key letters.
It checks for the existence of a particular delta when the SID for that delta is
specified with -r.
The string following -y or -m is checked against the value set by the -t or the
-m flag, respectively. See admin(CP) in the Programmer's Reference for
descriptions of these flags.
The val command treats the special argument - differently from other SCCS
commands. It allows val to read the argument list from the standard input
instead of from the command line: the standard input is read until an
end-of-file ((Ctrl)d) is entered. This permits one val command with different values
for key letters and file arguments. For example:
val -
-yc -mabc
-mxyz -ypl1
val first checks whether file has a value c for its type flag and value abc
for the module name flag. Once this is done, val processes the remaining file,
in this case,
The val command returns an 8-bit code. Each bit set shows a specific error.
(See val(CP) for a description of errors and codes.) In addition, an
appropriate diagnostic is printed, unless suppressed by -s. A return code of 0 means
all files meet the characteristics specified.
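A program that runs val can test the individual bits of the return code. The sketch below shows one way to do so in C. The bit assignments in the table are taken from the commonly documented val manual page and should be treated as an assumption here; check them against val(CP) on your system. The names val_bits and val_decode are hypothetical.

```c
#include <stddef.h>

struct val_bit {
    unsigned mask;
    const char *meaning;
};

/* Assumed bit meanings; verify against val(CP) before relying on them. */
static const struct val_bit val_bits[] = {
    { 0x80, "missing file argument" },
    { 0x40, "unknown or duplicate key letter argument" },
    { 0x20, "corrupted SCCS file" },
    { 0x10, "cannot open file or file is not an SCCS file" },
    { 0x08, "SID is invalid or ambiguous" },
    { 0x04, "SID does not exist" },
    { 0x02, "type flag does not match -y value" },
    { 0x01, "module name flag does not match -m value" }
};

/*
 * Store in out[] (at most max entries) a description of each bit set
 * in code. Returns the number of entries stored; 0 means no bit was
 * set, that is, all files met the specified characteristics.
 */
int val_decode(unsigned code, const char **out, int max)
{
    size_t i;
    int n = 0;

    for (i = 0; i < sizeof val_bits / sizeof val_bits[0]; i++)
        if ((code & val_bits[i].mask) && n < max)
            out[n++] = val_bits[i].meaning;
    return n;
}
```

Because each condition has its own bit, a single return code can report several problems at once; a code of 3 under these assumptions would indicate both a type mismatch and a module name mismatch.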
vc command
The vc(CP) command is an awk-like tool used for version control of sets of
files. While it is distributed as part of the SCCS package, it does not require
the files it operates on to be under SCCS control. A complete description of vc
may be found in the Programmer's Reference.
SCCS files
The following topics are discussed in this section:
protection mechanisms used by SCCS files
format of SCCS files
recommended procedures for auditing SCCS files
SCCS relies on the capabilities of the UNIX system for most of the protection
mechanisms required to prevent unauthorized changes to SCCS files, that is,
changes by non-SCCS commands. The only protection features directly
provided by SCCS are:
release lock flag
release floor flag
ceiling flag
user list
Files created by the admin command are given access permission mode 444
(read only). This mode should remain unchanged because it prevents
modification of SCCS files by non-SCCS commands. Directories containing
SCCS files should be given mode 755, which allows only the owner of the
directory to modify it.
SCCS files should be kept in directories containing only SCCS files and any
temporary files created by SCCS. This simplifies their protection and
auditing. Directories should contain logical groupings of SCCS files: for example,
subsystems of the same large project.
SCCS files must have only one link (name) because commands that modify
SCCS files do so by creating a copy of the file, the x.file (see 'SCCS command
conventions'). When processing is completed, the x.file is automatically renamed
with an s. prefix. If the old file had more than one link, the renaming would
break them. Rather than process these files, SCCS commands produce an
error message.
When only one person uses SCCS, the real and effective user IDs are the same;
the user ID owns the directories containing SCCS files. Therefore, SCCS can
be used directly without any preliminary preparation.
When several users with unique user IDs are assigned SCCS responsibilities,
one user ID should be selected as the owner of the SCCS files. This person is
responsible for all administration (admin) of the SCCS files. This limits the
privileges and permissions allowed to other users. To work around this
limitation, it is recommended that a project-dependent user interface be set up
allowing other (non-SCCS administrator) users access to the get, delta, and
rmdel SCCS commands.
The interface program must be owned by the SCCS administrator and must
have the set-user-ID-on-execution bit on. (See chmod(C) in the User's
Reference.) This assures that the effective user ID is that of the SCCS administrator.
The owner of an SCCS file can modify it at will. Other users whose login
names or group IDs are in the user list for that file (but are not the owner) are
given the necessary permissions only for the duration of the execution of the
interface program. Thus, they may modify SCCS files only with delta and,
possibly, rmdel and cdc.
SCCS files are composed of lines of ASCII text arranged in six parts as
follows:
Checksum
This is a line containing the logical sum of all the
characters of the file (not including the checksum itself).
Delta table
This provides information about each delta, such as
type, SID, date and time of creation, and commentary.
User names
This is the list of login names and/or group IDs of
users who are allowed to modify the file by adding or
removing deltas.
Flags
These are indicators that control certain actions of
SCCS commands.
Descriptive text
This is usually a summary of the contents and purpose
of the file.
Body
This is the text administered by SCCS, intermixed with
internal SCCS control lines.
Details on these file sections can be found in sccsfile(F). The checksum is
discussed next under "Auditing."
Since SCCS files are ASCII files, they can be processed by non-SCCS commands
like ed(C), grep(C), and cat(C), all documented in the User's Reference.
This is convenient when an SCCS file must be modified manually (such as
when a delta's time and date were recorded incorrectly because the system
clock was set incorrectly), or when a user wants simply to look at the file.
NOTE Care should be exercised when modifying SCCS files with non-SCCS
commands.
When a system or hardware malfunction destroys an SCCS file, any command
issues an error message. Commands also use the checksum stored in an SCCS
file to determine whether the file has been corrupted since it was last accessed
(possibly by having lost one or more blocks or by having been modified with
ed(C)). No SCCS command processes a corrupted SCCS file except the admin
command with -h or -z, as described below.
SCCS files should be audited for possible corruptions on a regular basis. The
simplest and fastest way to do an audit is to use admin -h and specify all
SCCS files. Either command works:
admin -h s.file1 s.file2 ...
admin -h directory1 directory2 ...
If the new checksum of any file is not equal to the checksum in the first line of
that file, the following message is produced for that file:
corrupted file (co6)
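The comparison that admin -h makes can be sketched in C. The text describes the checksum as the sum of all the characters of the file, not including the checksum line itself. The sketch below assumes that the sum is kept to 16 bits and that characters are added as unsigned values; both are assumptions, and sccsfile(F) is the authority on the actual format. The function names are hypothetical.

```c
#include <stddef.h>

/*
 * Sum every byte that follows the first line (the checksum line) of an
 * SCCS file image held in memory. Assumption: 16-bit unsigned sum.
 */
unsigned sccs_body_sum(const char *data, size_t len)
{
    size_t i = 0;
    unsigned sum = 0;

    /* Skip the first line, which holds the stored checksum. */
    while (i < len && data[i] != '\n')
        i++;
    if (i < len)
        i++;

    for (; i < len; i++)
        sum = (sum + (unsigned char)data[i]) & 0xFFFFu;
    return sum;
}

/* Returns nonzero if the recomputed sum disagrees with the stored one. */
int sccs_is_corrupted(unsigned stored, const char *data, size_t len)
{
    return sccs_body_sum(data, len) != (stored & 0xFFFFu);
}
```

A simple sum of this kind detects lost blocks and most manual edits, but it cannot tell which part of the file changed, and two edits that cancel each other out would go unnoticed.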
The process continues until all specified files have been examined. When
examining directories (as in the second example above), the checksum process
does not detect missing files. A simple way to learn whether files are missing
from a directory is to execute the ls(C) command, described in the User's
Reference, periodically, and compare the outputs. Any file whose name appeared
in a previous output but not in the current one no longer exists.
When a file has been corrupted, the way to restore it depends on the extent of
the corruption. If damage is extensive, the best solution is to contact the local
UNIX system operations group and request that the file be restored from a
backup copy. If the damage is minor, repair through editing may be possible.
After such a repair, the admin command must be executed:
The purpose of this command is to recompute the checksum and bring it into
agreement with the contents of the file. After this command is executed, any
corruption that existed in the file is no longer detectable.
Shared libraries
Efficient use of disk storage space, memory, and computing power is becom­
ing increasingly important. A shared library can offer savings in all three
areas. For example, if constructed properly, a shared library can make a.out
files (executable object files) smaller on disk storage and processes (a.out files
that are executing) smaller in memory.
The section "What is a shared library?" describes what a shared library is and
how to use one to build a.out files. It also offers suggestions about when and
when not to use a shared library and how to determine whether an a.out uses
a shared library.
The section "Building a shared library" describes how to build a shared
library. Specifically, this part describes how to use the tool mkshlib(CP) and
how to write C code for shared libraries. Also described is how to use the tool
chkshlib(CP), which checks the compatibility of versions of shared libraries.
NOTE Shared libraries are a new feature of UNIX System V Release 3.0 and
later. An executable object file that needs shared libraries will not run on
previous releases of UNIX System V.
What is a shared library?
A shared library is a file containing object code that several a.out files may use
simultaneously while executing. When a program is link edited with a shared
library, the library code that defines the program's external references is not
copied into the program's object file. Instead, a special section called .lib that
identifies the library code is created in the object file. When the UNIX System
executes the resulting a.out file, it uses the information in this section to bring
the required shared library code into the address space of the process.
The implementation behind these concepts is a shared library with two pieces.
The first, called the host shared library, is an archive that the link editor
searches to resolve user references and to create the .lib section in a.out files.
The structure and operation of this archive is the same as any archive without
shared library members. It must be present on the system when the a.out files
are link edited.
The second part of a shared library is the target shared library. This is the file
that the UNIX System uses when running a.out files built with the host shared
library. It contains the actual code for the routines in the library. It must be
present on the system where the a.out files will be run.
A shared library offers several benefits by not copying code into a.out files. It
can:
save disk storage space
Because shared library code is not copied into all the a.out files that use that
code, a.out files are smaller and use less disk space.
save memory
By sharing library code at run time, the dynamic memory needs of the
processes are reduced.
make executable files using library code easier to maintain
At run time, shared library code is brought into the processes' address space.
Therefore, updating a shared library effectively updates all executable files
that use the library. If an error in shared library code is fixed, all processes
automatically use the corrected code.
Non-shared libraries cannot offer this benefit: changes to archive libraries do
not affect executable files, because code from the libraries is copied to the files
during link editing, not during execution.
Shared libraries examples
The C and networking libraries are available as shared libraries.
Host Library
Command Line Option
Target Library
C Library
Networking Library
The _s suffix is used to indicate that the library is shared and statically linked.
Building an a.out file
The standard (non-shared) C library is still available with releases of the C
Programming Language Utilities; this library is searched by default during the
compilation or link editing of C programs.
To build a.out files, a shared library's name must be directly referenced by
using the -l option with the C compiler. You direct the link editor to search a
shared library the same way you direct a search of any library on the UNIX
System:
cc file.c -o file ... -llibrary_file ...
To direct a search of the networking library, for example, you use the
following command line:
cc file.c -o file ... -lnsl_s ...
And to link all the files in your current directory together with the shared C
library you'd use the following command line:
cc *.c -lc_s
Including the -lc_s argument after all other -l arguments on a command line
indicates that the shared C library will be treated like the relocatable C library,
which is searched by default after all other libraries specified on a command
line.
A shared library might be built with references to other shared libraries. That
is, the first shared library might contain references to symbols that are
resolved in a second shared library.
For example, if the shared library libX_s.a references symbols in the shared C
library, the command line would be as follows:
cc *.c -lX_s -lc_s
Notice that the shared library containing the references to symbols must be
listed on the command line before the shared library needed to resolve those
references. For more information on inter-library dependencies, see the
section "Referencing Symbols in a Shared Library from Another Shared Library"
later in this chapter.
Deciding whether to use a shared library
The decision to use a shared library should be based on whether it saves space
in disk storage and memory. A well-designed shared library almost always
saves space. As a general rule, use a shared library when it is available.
To determine what savings are gained from using a shared library, try
building the same application with both a non-shared and a shared library,
assuming both kinds are available. Then compare the two versions of the
application for size and performance. For example:
$ cat hello.c
main()
{
    printf("Hello\n");
}
$ cc -o unshared hello.c
$ cc -o shared hello.c -lc_s
$ size unshared shared
unshared: 8680 + 1388 + 2248 = 12316
shared: 300 + 680 + 2248 = 3228
If the application calls only a few library members, it is possible that using a
shared library could take more disk storage or memory. For a more detailed
discussion see "Space Considerations" later in this section.
Space considerations
The following section outlines the benefits and drawbacks of using a shared
library. Covered are:
how shared libraries save space that non-shared libraries cannot
how shared libraries might increase space usage
Saving space
To better understand how a shared library saves space, compare the space
used by a shared library and a non-shared library.
A host shared library resembles a non-shared library in three ways. First,
both are archive files. Second, the object code in the library typically defines
commonly used text symbols and data symbols. The symbols defined inside,
and made visible outside, the library are external symbols. Note that the
library may also have imported symbols, symbols that it uses but does not
define. Third, the link editor searches the library for these symbols when
linking a program to resolve its external references. By resolving the
references, the link editor produces an executable version of the program, the a.out
file.
NOTE The link editor on the UNIX System is a static linking tool; static
linking requires all symbolic references in a program be resolved before the
program may be executed. The link editor uses static linking with both a
non-shared library and a shared library.
Although these similarities exist, a shared library differs significantly from a
non-shared library. The major differences are related to how the libraries are
handled to resolve symbolic references.
To produce an a.out file using a non-shared library, the link editor copies the
library code that defines a program's unresolved external reference from the
library into appropriate .text and .data sections in the program's object file. In
contrast, to produce an a.out file using a shared library, the link editor copies
from the shared library into the program's object file only a small amount of
code for initialization of imported symbols. (See the section "Importing
Symbols" later in the chapter for more details on imported symbols.) For the bulk
of the library code, it creates a special section called .lib in the file that
identifies the library code needed at run time and resolves the external
references to shared library symbols with their correct values. When the UNIX
System executes the resulting a.out file, it uses the information in the .lib section
to bring the required shared library code into the address space of the process.
Figure 7-1 depicts the a.out files produced using a non-shared version and a
shared version of the standard C library to compile the following program:
main()
{
    printf("How do you like this manual?\n");
    result = strcmp("I do.", answer);
}
Notice that the shared version is smaller. Figure 7-2 depicts the process
images in memory of these two files when they are executed.
[Figure 7-1: the a.out built with the non-shared library contains program .text
and program .data plus library code for printf(S) and strcmp(S), copied to the
file by the link editor. The a.out built with the shared library contains program
.text and program .data plus a .lib section, created by the link editor, that
refers to library code for printf(S) and strcmp(S).]
Figure 7-1 a.out files created using a non-shared library and a shared library
Now consider what happens when several a.out files need the same code from
a library. When using a non-shared library, each file gets its own copy of the
code. This results in duplication of the same code on the disk and in memory
when the a.out files are run as processes. In contrast, when a shared library is
used, the library code remains separate from the code in the a.out files, as
indicated in Figure 7-2. This separation enables all processes using the same
shared library to reference a single copy of the code.
[Figure 7-2: library code referred to by .lib is brought into the address space
of the process and may be brought into other processes.]
Figure 7-2 Processes using an archive and a shared library
Increase space usage in memory
A target library might add space to a process. A shared library's target file
may have both text and data regions connected to a process. The text region
is shared by all processes that use the library; the data region is not. Every
process that uses the library gets its own private copy of the entire library
data region. This adds to the process's memory requirements. As a result, if
an application uses only a small part of a shared library's text and data,
executing the application might require more memory with a shared library than
without one. For example, using the shared C library to access only strcmp(S)
saves disk storage and text memory. However, the memory cost of a private
copy of the shared C library's entire data region outweighs the savings. The
non-shared version of the library would be more appropriate.
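The trade-off described above can be worked through with a little arithmetic. The C sketch below is a simplified model, not a measurement tool: it assumes n different applications each run as one process, that the library text is counted once when shared, and that every process carries a private copy of the whole library data region. The structure, field names, and function names are hypothetical, and any sizes passed in are illustrative values only.

```c
/* Sizes in bytes; all fields are illustrative inputs. */
struct lib_use {
    unsigned long used_text;   /* library text one application actually needs */
    unsigned long used_data;   /* library data one application actually needs */
    unsigned long whole_text;  /* entire text region of the target library */
    unsigned long whole_data;  /* entire data region of the target library */
};

/* Non-shared: each of n applications carries its own copy of what it uses. */
unsigned long nonshared_total(const struct lib_use *u, unsigned long n)
{
    return n * (u->used_text + u->used_data);
}

/* Shared: one copy of the text, plus a private data region per process. */
unsigned long shared_total(const struct lib_use *u, unsigned long n)
{
    return u->whole_text + n * u->whole_data;
}

/* Returns 1 if the shared library uses less memory under this model. */
int shared_saves_memory(const struct lib_use *u, unsigned long n)
{
    return shared_total(u, n) < nonshared_total(u, n);
}
```

Under this model, an application that needs only a few hundred bytes of library text never benefits, because each process still pays for the entire data region. An application that uses a large share of the library benefits as soon as a few processes run at once.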
Coding an application
Application source code in C or assembly language is compatible with both
non-shared and shared libraries. As a result, there should be no coding
changes required in applications that currently use a shared library. When
coding a new application for use with a shared library, observe standard cod­
ing conventions.
Observe the following two points when using either a non-shared or a shared
library:
Don't define application symbols with the same names as those in a library.
Although exceptions exist, avoid redefining standard library routines, such
as printf(S) and strcmp(S). Replacements that are incompatibly defined
can cause any library, shared or unshared, to behave incorrectly.
Don't use undocumented archive routines.
Use only the functions and data mentioned on the manual pages describing
the routines in Section (S) of the Programmer's Reference.
Identifying a.out files that use shared libraries
Use the dump(CP) command (documented in the Programmer's Reference) to
look at the section headers for the file:
dump -hv a.out
If the file has a .lib section, a shared library is needed. If the a.out does not
have a .lib section, it does not use shared libraries.
To display the shared libraries used by a.out, use the -L option as shown in the
following example:
dump -L a.out
Debugging a.out files that use shared libraries
dbxtra reads the shared libraries' symbol tables and performs as documented
(in the Programmer's Reference) using the available debugging information.
The branch table is hidden so that functions in shared libraries can be refer­
enced by their names.
Shared library data are not dumped to core files, however. If an error
results in a core dump and does not appear to be in the application's
code, debugging might be easier if the application is rebuilt with the
non-shared version of the library.
Implementing shared libraries
The following section describes host and target shared libraries and the
branch table.
host library and target library
Every shared library has two parts: the host library used for linking that
resides on the host machine and the target library used for execution that
resides on the target machine. The host machine is the machine on which an
a.out file is built; the target machine is the machine on which the file is run.
The host and target may be the same machine, but it is not a requirement.
The host library is just like a non-shared library. Each of its members (typi­
cally a complete object file) defines text and data symbols in its symbol table.
The link editor searches this file when a shared library is used during the com­
pilation of a program.
The search is for definitions of symbols referenced in the program but not
defined there. However, the link editor does not copy the library code
defining the symbols into the program's object file. Instead, it uses the library
members to locate the definitions and then places symbols in the file that tell
where the library code is. The result is the special section in the a.out file
shown in Figure 7-1 as .lib.
The target library, used for execution, resembles an a.out file. During execu­
tion, if a process needs a shared library, this file is read. The .lib section in the
a.out file tells which shared libraries are needed. When the UNIX System exe­
cutes the a.out file, it uses this section to bring the appropriate library code
into the address space of the process. In this way, before the process starts to
run, all required library code has been made available.
Shared libraries enable the sharing of .text sections in the target library, which
is where text symbols are defined. Although processes that use the shared
library have their own virtual address spaces, they share a single physical
copy of the library's text among them.
The target library cannot share its .data sections. Each process using data
from the library has its own private data region that mirrors the .data section
of the target library. Processes that share text do not share data and stack area
in order that they do not interfere with one another.
As suggested above, the target library is a lot like an a.out file, which can also
share its text, but not its data. Processes must have execute permission for a
target library in order to execute an a.out file that uses the library.
branch table
When the link editor resolves an external reference in a program, it gets the
address of the referenced symbol from the host library. A static linking loader
like ld binds symbols to addresses during link editing. In this way, the a.out
file for the program has an address for each referenced symbol.
If non-shared library code is updated and the address of a symbol changes,
the a.out file will still run, since that file already has a copy of the code
defining the symbol. However, that type of change can adversely affect an
a.out file built with a shared library. This file has only a single symbol telling
where the required library code is. Therefore, if the a.out file ran after a
change, the operating system could bring in the wrong code.
To avoid this, two options are available. The first is to recompile the library
after each update. The second is to implement the shared library with a
branch table. A branch table associates text symbols with absolute addresses.
Each address labels a jump instruction to the address of the code defining a
symbol. Instead of being directly associated with the addresses of code, text
symbols have addresses in the branch table.
Figure 7-3 shows two a.out files executing a call to printf(S). The process on
the left was built using a non-shared library. It already has a copy of the
library code defining the printf(S) symbol. The process on the right was built
using a shared library. This file references an absolute address (10) in the
branch table of the shared library at run time; at this address, a jump instruc­
tion references the needed code.
Figure 7-3 A branch table in a shared library
[Figure: two a.out files, each containing "call printf(S)". An archive
library does not use a branch table. A shared library uses a branch table.]
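The idea of a branch table can be approximated in ordinary C with an array of function pointers. The following is only a sketch of the concept, not what mkshlib generates: the slot numbers stand in for the fixed branch table addresses, and the routine names (lib_open_v2, lib_close_v2, call_slot) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical library routines; their own addresses may move between releases. */
static int lib_open_v2(int x)  { return x + 1; }
static int lib_close_v2(int x) { return x - 1; }

typedef int (*branch_fn)(int);

/* Slot numbers play the role of the fixed branch table addresses. */
enum { SLOT_OPEN = 0, SLOT_CLOSE = 1, SLOT_COUNT };

/* Each slot "jumps" to the code that currently defines the symbol. */
static const branch_fn branch_table[SLOT_COUNT] = {
    lib_open_v2,   /* slot 0 */
    lib_close_v2   /* slot 1 */
};

/* A caller binds only to the slot, never to the routine's own address. */
int call_slot(size_t slot, int arg)
{
    assert(slot < SLOT_COUNT);
    return branch_table[slot](arg);
}
```

A new release can replace lib_open_v2 with a routine at a different address. As long as slot 0 still leads to it, callers that were bound to slot 0 need no change.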
Data symbols do not have a mechanism to prevent a change of address
between shared libraries. The tool chkshlib(CP) compares a.out files
with a shared library to check compatibility and help you decide if the
files need to be recompiled. See "Checking Versions of Shared
Libraries Using chkshlib(CP)."
Building a shared library
This part of the chapter explains how to build a shared library. It covers the
major steps in the building process, the use of the UNIX System tool
mkshlib(CP) that builds the host and target libraries, and some guidelines for
writing shared library code. An example is provided to demonstrate the
building process.
This section assumes that you are:
an advanced C programmer
familiar with the archive library building process
familiar with basic operating system concepts
building process
To build a shared library the following six steps must be completed:
choosing region addresses
choosing the pathname for the shared library target file
selecting the library contents
rewriting existing library code to be included in the shared library
writing the library specification file
using the mkshlib tool to build the host and target libraries
Step 1: choosing region addresses
Choose region addresses for your shared library.
Shared library regions on the 386-based computer correspond to memory
management unit (MMU) segment sizes, each of which is 4 MB. The follow­
ing table gives a list of the segment assignments on the 386-based computer
(as of the copyright date for this guide) and shows what virtual addresses are
available for libraries.
Reserved for AT&T
System Shared C Library
AT&T Networking
Generic Database Library
Generic Statistical Library
Generic User Interface Library
Generic Screen Handling Library
Generic Graphics Library
Generic Networking Library
Scan Code library
Libprot library
For private use
If a shared library is built within a reserved address region, there is a risk of
conflicting with future products.
A number of segments are allocated for shared libraries that provide various
services such as graphics, database access, and so on. These categories are
intended to reduce the chance of address conflicts among commercially avail­
able libraries. Although two libraries of the same type may conflict, that
doesn't matter. A single process should not usually need to use two shared
libraries of the same type. If the need arises, a program can use one shared
library and one non-shared library instead.
Any number of libraries can use the same virtual addresses, even on
the same machine. Conflicts occur only within a single process, not among
separate processes. Thus two shared libraries can have the same region
addresses without causing problems, as long as a single a.out file doesn't
need to use both libraries.
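The conflict rule can be illustrated with a little arithmetic. The sketch below assumes only the 4 MB segment size stated earlier; the function names and any addresses passed to them are hypothetical and are not taken from the segment assignment table.

```c
#include <assert.h>

#define SEGMENT_SIZE (4UL * 1024UL * 1024UL)  /* 4 MB per MMU segment */

/* Index of the 4 MB segment that contains a virtual address. */
unsigned long segment_of(unsigned long vaddr)
{
    return vaddr / SEGMENT_SIZE;
}

/* Nonzero if two regions, each given as start address and size in bytes,
 * occupy at least one common segment. Only libraries mapped into the
 * same process need to pass this check. */
int regions_conflict(unsigned long a_start, unsigned long a_size,
                     unsigned long b_start, unsigned long b_size)
{
    unsigned long a_first, a_last, b_first, b_last;

    assert(a_size > 0 && b_size > 0);
    a_first = segment_of(a_start);
    a_last  = segment_of(a_start + a_size - 1);
    b_first = segment_of(b_start);
    b_last  = segment_of(b_start + b_size - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

Two libraries whose regions fall in the same segment conflict only if one a.out file needs both. Libraries used by different processes never have to be compared.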
Several segments are reserved for private use. If you are building a large
system with many a.out files and processes, shared libraries might improve its
performance. As long as you don't intend to release the shared libraries as
separate products, you should use the private region addresses. You can put
your shared libraries into these segments and avoid conflicting with
commercial shared libraries. You should also use these segments when you will own
all the a.out files that access your shared library. Don't risk address conflicts.
NOTE If you plan to build a commercial shared library, you are
encouraged to provide a compatible, relocatable archive as well.
Step 2: choosing the target library pathname
After selecting the region addresses, choose the pathname for the target
library. For example, /shlib/libc_s was chosen for the shared C library and
/shlib/libnsl_s for the networking library. To choose a pathname for your
shared library, consult the established list of names for your computer or see
your system administrator.
NOTE Shared libraries required to boot a UNIX System should be located in
/shlib; other application libraries normally reside in /usr/lib or in private
application directories.
Step 3: selecting library contents
The most important task in the building process is selecting the contents for
the shared library. Some routines are prime candidates for sharing; others are
not. For example, it's a good idea to include large, frequently used routines in
a shared library but to exclude smaller routines that aren't used as much. For
general guidelines see the section "Choosing Library Members" in this chapter.
Also see the guidelines in the following sections: "Importing Symbols,"
"Referencing Symbols in a Shared Library from Another Shared Library," and
"Tuning the Shared Library Code."
Step 4: rewriting existing library code
If you choose to include some existing code from an archive library in a
shared library, changing some of the code will make the shared code easier to
maintain. See the section "Changing Existing Code for the Shared Library" in
this chapter.
Step 5: writing the library specification file
After you select and edit all the code for your shared library, you have to
build the shared library specification file. The library specification file con­
tains all the information that mkshlib needs to build both the host and target
libraries. See section "An example" for a sample specification file. Also, see
the section "Using the Specification File for Compatibility" in this chapter.
The contents and format of the specification file are given by the directives
list. A description of each directive is provided in the mkshlib(CP) manual
page. References to these directives are made throughout this chapter. Prior
review of the mkshlib(CP) manual page is recommended.
Step 6: using mkshlib to build the host and target libraries
The mkshlib(CP) command builds both the host and target libraries.
mkshlib invokes other tools such as the assembler, as(CP), and the link editor,
ld(CP). Tools are invoked through the use of execvp [see exec(S)], which
searches directories in a user's $PATH environment variable. Also, prefixes to
mkshlib are parsed in much the same manner as prefixes to the cc(CP) com­
mand and invoked tools are given the prefix, where appropriate. For exam­
ple, 3mkshlib invokes 3ld. These commands all are documented in the
Programmer's Reference.
Guidelines for writing shared library code
The main advantage of a shared library over an archive library is sharing and
the space it saves. These guidelines stress ways to increase sharing while
avoiding the disadvantages of a shared library. The guidelines also stress
upward compatibility.
We recommend that you read these guidelines once to get a perspective of the
concepts involved in building a shared library.
When building a shared library there are a few restrictions involving static
linking that should be considered. For example:
External symbols have fixed addresses.
If an external symbol moves, you have to re-link all a.out files that use the
library. This restriction applies both to text and data symbols.
Use of the #hide directive to limit externally visible symbols can help avoid
problems in this area. (See "Use #hide and #export to Limit Externally
Visible Symbols" in the "Using the Specification File for Compatibility"
section for more details.)
If the library's text changes for one process at run time, it changes for all
processes.
If the library uses a symbol directly, that symbol's run time value (address)
must be known when the library is built.
Imported symbols cannot be referenced directly.
Their addresses are not known when you build the library, and they can be
different for different processes. You can use imported symbols by adding
an indirection through a pointer in the library's data.
Choosing library members
Include large, frequently used routines
Large, frequently used routines are prime candidates for sharing. Placing
them in a shared library saves code space for individual a.out files and saves
memory, too, when several concurrent processes need the same code.
printf(S) and related C library routines (which are documented in the
Programmer's Reference) are good examples.
NOTE Since the printf(S) family of routines is used frequently, we included
printf(S) and related routines when we built the shared C library.
Exclude infrequently used routines
Putting infrequently used routines in a shared library can degrade
performance, particularly on paging systems. Traditional a.out files contain all code
they need at run time. The code in an a.out file is related to the process.
Therefore, if a process calls a function, it may already be in memory because
of its proximity to other text in the process. See also "Organize to Improve
Locality" in the "Tuning the Shared Library Code" section.
Exclude routines that use much static data
Routines that use much static data increase the size of processes. Every pro­
cess that uses a shared library gets its own private copy of the library's data,
regardless of how much of the data is needed. Library data is static: it is not
shared and cannot be loaded selectively, although unreferenced pages may be
removed from the working set.
For example, getgrent(S) (documented in the Programmer's Reference), is not
used by many standard UNIX System commands. Some versions of the
module define over 1400 bytes of unshared, static data. It probably should
not be included in a shared library.
Exclude routines that complicate maintenance
All external symbols must remain at constant addresses. The branch table
makes this easy for text symbols, but data symbols don't have an equivalent
mechanism. The more data a library has, the more likely some of them will
have to change size. Any change in the size of external data may affect sym­
bol addresses and break compatibility.
Include routines the library itself needs
Make the library self-contained. For example, printf(S) requires much of the
standard I/O library. A shared library containing printf(S) should contain the
rest of the standard I/O routines, too.
NOTE This guideline should not take priority over the others in this section.
If you exclude some routine that the library itself needs based on a previous
guideline, consider leaving the symbol out of the library and importing it.
Changing existing code for the shared library
All C code that works in a shared library will also work in a non-shared
library. However, the reverse is not true because a shared library must explic­
itly handle imported symbols. The following guidelines are meant to help
you produce shared library code that is still valid for non-shared libraries
(although it may be slightly bigger and slower). The guidelines explain how
to structure data for ease of maintenance, since most compatibility problems
involve restructuring data.
Minimize global data
All external data symbols are visible to applications. This can make mainte­
nance difficult. You should try to reduce global data, as described below.
First, try to use automatic (stack) variables. Don't use permanent storage if
automatic variables work. Using automatic variables saves static data space
and reduces the number of symbols visible to application processes.
Second, see whether variables really must be external. Static symbols are not
visible outside the library, so they may change addresses between library ver­
sions. Only external variables must remain constant. See "Use #hide and
#export to Limit Externally Visible Symbols" in the section "Using the Specifi­
cation File for Compatibility" later in this chapter for further tips.
Third, allocate buffers at run time instead of defining them at compile time.
This reduces the size of the library's data region for all processes (saving
memory). It also allows the size of the buffer to change from one release to
the next without affecting compatibility. Statically allocated buffers cannot
change size without affecting the addresses of other symbols and, perhaps,
breaking compatibility.
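As a minimal sketch of the third point, the code below keeps only a pointer in static data and allocates the buffer on first use. The names lib_buf, lib_copy, and LIB_BUF_SIZE are hypothetical, and the buffer size is an arbitrary value chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LIB_BUF_SIZE 512   /* hypothetical size; free to change between releases */

/* Only this pointer occupies space in the library's data region. */
static char *lib_buf = NULL;

/* Copy s into the library's scratch buffer, allocating it on first use.
 * Returns the buffer, or NULL if the allocation failed. */
char *lib_copy(const char *s)
{
    assert(s != NULL);
    if (lib_buf == NULL)
        lib_buf = malloc(LIB_BUF_SIZE);
    if (lib_buf == NULL)
        return NULL;
    strncpy(lib_buf, s, LIB_BUF_SIZE - 1);
    lib_buf[LIB_BUF_SIZE - 1] = '\0';
    return lib_buf;
}
```

Because lib_buf is static and has a fixed size regardless of LIB_BUF_SIZE, a later release can enlarge the buffer without moving any other data symbol.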
Define text and global data in separate source files
Separate text from global data to prevent data symbols from moving. If
new external variables are needed, they can be added at the end of the old
definitions to preserve the old symbols' addresses.
Libraries let the link editor extract individual members. This works fine for
relocatable files, but shared libraries have a different set of restrictions. If
external variables are scattered throughout the library modules, external and
static data become intermixed. Changing static data, like the string "hello" in
the following example, then moves subsequent data symbols, even the external
ones.

Before
    int head = 0;
    func()
    {
        . . .
        p = "hello";
        . . .
    }
    int tail = 0;

Broken Successor
    int head = 0;
    func()
    {
        . . .
        p = "hello, world";
        . . .
    }
    int tail = 0;
Assume the relative virtual address of head is 0 for both examples. The string
literals will have the same address too, but they have different lengths. The
old and new addresses of tail thus might be 12 and 20, respectively. If tail is
supposed to be visible outside the library, the two versions will not be
compatible.
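The figures 12 and 20 can be reproduced with a short calculation. This sketch assumes a 4-byte head at offset 0, the string literal stored immediately after it, and tail aligned on a 4-byte boundary; a real compiler may lay out data differently, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Round n up to the next multiple of align (align must be a power of two). */
unsigned long round_up(unsigned long n, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Offset of tail under the assumed layout: head, then literal, then tail. */
unsigned long tail_offset(const char *literal)
{
    unsigned long after_head = 4;   /* assumed size of head */
    unsigned long after_str;

    assert(literal != NULL);
    after_str = after_head + (unsigned long)strlen(literal) + 1;  /* + NUL */
    return round_up(after_str, 4);
}
```

Under these assumptions tail_offset("hello") is 12 and tail_offset("hello, world") is 20, so lengthening a hidden string moves the exported symbol by 8 bytes.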
NOTE The compilation system sometimes defines and uses static data
invisibly to the user (e.g. tables for switch statements).
Adding new external variables to a shared library may change the addresses
of static symbols, but this doesn't affect compatibility. An a.out file has no
way to reference static library symbols directly, so it cannot depend on their
values.
NOTE There is a real danger of mixing in static data with exported data
when building a shared library. This may not immediately cause problems
but may cause incompatibilities in later versions of the shared library. If the
external data modules are not first, a seemingly harmless change (such as a
new string literal) can break existing a.out files. Even changing the code
may not be necessary to cause a problem: the user might count on the com­
piler to place statics at a known default location. If the compiler is ever
replaced that default location may change and then existing a.out files will
also break. By this time it may be too late to do anything about it, and the
only alternative may be to distribute multiple versions of the same shared
library.
To avoid these problems, group all exported data symbols and place them at
lower addresses than static (hidden) data. The following are suggestions for
locating exported data symbols at lower addresses than static data:
1. The exported data symbols should go in a file (or files) by themselves.
2. Do not put other external data in this file. Remember, if the #hide directive
has been used all other external data may become static.
3. Do not initialize exported data symbols with static data. For example, ini­
tializing an exported data symbol to an unnamed string literal is a bad
idea since the exported data file will now contain static data intermixed
with the exported data. If there is a need to initialize the exported data
symbol then do so in a way that the initializations are themselves data
symbols which can be defined in another data file.
4. Put all other external data in a separate data file. Shared library users get
all library data at run time, regardless of the source file organization. Con­
sequently, all external variables' definitions can be put in a single source
file without a space penalty.
5. Place data and text object files in the #objects list as follows:
a. imported symbol definition files (remember that symbols mentioned in
#init and #branch directives are always external)
b. exported data files
c. all other data files
d. all other text files
6. Check the exported data object file by dumping the object file after it has
been built. The size of all data should exactly match the size of the
exported data symbols.
7. Add new exported data to the end of the exported data file.
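Suggestion 3 can be sketched as follows. The block shows three hypothetical source files in one listing so that it compiles as a unit; in a real library each part would be a separate file, and exported_data.c would declare lib_default_name with extern. All names here are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* --- other_data.c: named data that exported symbols may point to --- */
char lib_default_name[] = "default";

/* --- exported_data.c: exported symbols only, every one initialized --- */
char *lib_name = lib_default_name;  /* a named symbol, not a string literal */
int lib_errno = 0;
/* new exported data is appended here, after the existing symbols */

/* --- text.c: library code that uses the exported data --- */
int lib_name_is(const char *s)
{
    assert(s != NULL);
    return strcmp(lib_name, s) == 0;
}
```

Because lib_name is initialized from a named symbol defined elsewhere, the exported data file contains no anonymous static data, so its size should match the size of its exported symbols.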
Initialize global data
Initialize external variables, including the pointers for imported symbols.
Although this uses more disk space in the target shared library, the expansion
is limited to a single file. mkshlib will give a fatal error if it finds an uninitial­
ized external symbol.
Using the specification file for compatibility
The way in which directives are used in the specification file can affect compa­
tibility across versions of a shared library. This section gives some guidelines
on how to use the directives #branch, #hide, and #export (see also
mkshlib(CP) man page).
Preserve branch table order
Add new functions only at the end of the branch table. Try to maintain
compatibility with previous versions after a specification file for the library is cre­
ated. New functions can be added without breaking old a.out files as long as
the previous assignments are not changed. This allows distribution of a new
version of the library without having to re-link all of the a.out files that used
the previous version.
Use #hide and #export to limit externally visible symbols
Variables (or functions) can be referenced from several object files for inclu­
sion in the shared library. However, they are not intended to be available to
users of the shared library. They must be external so that the link editor can
properly resolve all references to symbols and create the target shared library,
but should be hidden from the user's view to prevent their use. Unintended
use can result in compatibility problems if the symbols move or are removed
between versions of the shared library.
The #hide and #export directives can resolve this dilemma. The #hide
directive causes mkshlib, after resolving all references within the shared library, to
alter the symbol tables of the shared library so that all specified external
symbols are made static and inaccessible from user code. You can specify the
symbols to be so treated either individually or through the use of regular
expressions.
The #export directive allows you to specify those symbols in the range of an
accompanying #hide directive regular expression which should remain
external. For example, in the shared C library all data symbols are hidden by
default. Symbols required outside of the library are then explicitly exported:
    #hide linker *
    #export linker
        optarg
        opterr
        optind
        optopt
The advantage to this approach is that future changes to the library won't
introduce new external symbols (possibly causing name collisions), unless the
new symbols are explicitly exported. The symbols to be exported are chosen
by looking at a list of all the current external symbols in the shared C library
and finding out what each symbol is used for. The symbols that are global but
were only used in the shared C library are not exported; these symbols will be
hidden from applications code. All other symbols are explicitly exported.
NOTE It is a fatal error to try to explicitly name the same symbol in a #hide
and an #export directive.
The #export directive is useful when building a complicated shared library
where many symbols are to be made static. In these cases, it is more efficient
to use regular expressions to make all external variables static and individu­
ally list those symbols you need to be external.
NOTE Symbols mentioned in the #branch and #init directives are services
of the shared library, must be external symbols, and cannot be made static
through the use of these directives.
Importing symbols
Shared library code cannot directly use symbols defined outside a library, but
an escape hatch exists. You can define pointers in the data area and arrange
for those pointers to be initialized to the addresses of imported symbols.
Library code then accesses imported symbols indirectly, delaying symbol
binding until run time. Libraries can import both text and data symbols.
Moreover, imported symbols can come from the user's code, another library,
or even the library itself. In Figure 7-4, the symbols _libc_ptr1 and _libc_ptr2
are imported from the user's code and the symbol _libc_malloc from the library
itself.
Figure 7-4 Imported symbols in a shared library
[Figure: panels "Shared Library" and "a.out File", with the labels malloc(),
_libc_ptr1, _libc_malloc, and _libc_ptr2.]
The following guidelines describe when and how to use imported symbols.
Imported symbols that the library does not define
Non-shared libraries typically contain relocatable files, which allow undefined
references. Although the host shared library is an archive, too, that archive is
constructed to mirror the target library, which more closely resembles an a.out
file. Neither target shared libraries nor a.out files can have unresolved refer­
ences to symbols.
Shared libraries must import any symbols they use but do not define. Some
shared libraries will derive from existing non-shared libraries. For the reasons
stated above, it may not be appropriate to include all the non-shared archive's
modules in the target shared library. Remember though that if you exclude a
symbol from the target shared library that is referenced from the target shared
library, you will have to import the excluded symbol.
Imported symbols that users must be able to redefine
Optionally, shared libraries can import their own symbols. Two standard
libraries, libc and libmalloc, provide a malloc family. Even though most
UNIX System commands use the malloc from the C library, they can choose
either library or define their own.
Three possible strategies exist for building the shared C library. First, exclude
the malloc(S) family. But other library members might need it, and so it will
have to be an imported symbol. This will work, but it means less savings.
Second, include the malloc family and don't import it. This provides more
savings for typical commands. However, other library routines call malloc
directly, and those calls cannot be overridden. If an application tries to
redefine malloc, the library calls will not use the alternate version.
Furthermore, the link editor will find multiple definitions of malloc while
building the application. To resolve this the library developer will have to
change source code to remove the custom malloc, or refrain from using the
shared library.
Finally, the most flexible strategy is to include malloc in a shared library, treating it as
an imported symbol. Even though malloc is in the library, nothing else there
refers to it directly; all references are through an imported symbol pointer. If
the application does not redefine malloc, both application and library calls are
routed to the library version. All calls are mapped to the alternate, if present.
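The routing described for the third strategy can be modeled in plain C. This is only a simulation under stated assumptions: bind_alloc stands in for what the link editor and initialization code would do, and all names are hypothetical.

```c
#include <assert.h>

static int lib_version_calls = 0;
static int app_version_calls = 0;

static void lib_alloc_impl(void) { lib_version_calls++; }  /* library's copy */
static void app_alloc_impl(void) { app_version_calls++; }  /* application's replacement */

/* Imported symbol pointer: every call inside the library goes through it. */
static void (*alloc_ptr)(void) = 0;

/* Stand-in for symbol resolution: prefer the application's definition. */
void bind_alloc(int app_redefines)
{
    alloc_ptr = app_redefines ? app_alloc_impl : lib_alloc_impl;
}

/* A library routine that needs the allocator. */
void lib_routine(void)
{
    assert(alloc_ptr != 0);
    alloc_ptr();
}

int lib_calls(void) { return lib_version_calls; }
int app_calls(void) { return app_version_calls; }
```

With bind_alloc(0) the library routine reaches the library's own version. With bind_alloc(1) the same routine reaches the application's version, which a direct call inside the library could not do.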
You might want to permit redefinition of all library symbols in some libraries.
You can do this by importing all symbols the library defines, in addition to
those it uses but does not define. Although this adds a little space and time
overhead to the library, the technique allows a shared library to be one hun­
dred percent compatible with an existing non-shared library at link time and
run time. This is the strategy used for the installed version of the Shared C
Library.
Mechanics of importing symbols
For example, assume a shared library wants to import the symbol malloc.
The original non-shared code and the shared library code appear below.
Non-Shared Library Code
    extern char *malloc();
    funce()
    {
        . . .
        p = malloc(n);
        . . .
    }

Shared Library Code
    /* See pointers.c below */
    extern char *(*_libc_malloc)();
    funce()
    {
        . . .
        p = (*_libc_malloc)(n);
        . . .
    }
Making this transformation is straightforward, but two sets of source code
would be necessary to support both a non-shared and a shared library.
Some simple macro definitions can hide the transformations and allow source
code compatibility. A header file defines the macros, and a different version
of this header file would exist for each type of library. The -I flag to cc(CP),
documented in the Programmer's Reference, would direct the C preprocessor to
look in the appropriate directory to find the desired file.
Non-shared import.h
    /* empty */

Shared import.h
    /*
     * Macros for importing
     * symbols. One #define
     * per symbol.
     */
    #define malloc (*_libc_malloc)
    extern char *malloc();
These header files allow one source both to serve the original archive source
and to serve a shared library, too, because they supply the indirections for
imported symbols. The declaration of malloc in import.h actually declares
the pointer _libc_malloc.
Common Source
    #include "import.h"
    extern char *malloc();
    funce()
    {
        . . .
        p = malloc(n);
        . . .
    }
Alternatively, one can hide the #include with #ifdef:
Common Source
    #ifdef SHLIB
    #include "import.h"
    #endif
    extern char *malloc();
    funce()
    {
        . . .
        p = malloc(n);
        . . .
    }
NOTE When building the shared library, the code can be conditionally
turned on by defining SHLIB via the -D flag to cc(CP).
Of course the transformation is not complete. You must define the pointer
_libc_malloc.
File pointers.c
    char *(*_libc_malloc)() = 0;
NOTE _libc_malloc is initialized to zero, because it is an external data
symbol.
Special initialization code sets the pointers. Shared library code should not
use the pointer before it contains the correct value. In the example the
address of malloc must be assigned to _libc_malloc. Tools that build the
shared library generate the initialization code according to the library specifi­
cation file.
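The pieces described in this section can be put together in one compilable unit. The sketch below deliberately uses the hypothetical names lib_alloc and lib_alloc_ptr instead of malloc and _libc_malloc, to avoid redefining a standard library name, and lib_init is a hand-written stand-in for the initialization code that the tools would generate.

```c
#include <assert.h>
#include <stddef.h>

#define SHLIB 1   /* normally supplied on the cc command line */

/* The imported routine (stand-in for malloc in the text). */
static char pool[64];
char *lib_alloc(unsigned n) { return n <= sizeof pool ? pool : NULL; }

/* pointers.c: the imported symbol pointer, initialized to zero. */
char *(*lib_alloc_ptr)(unsigned) = 0;

/* Stand-in for the generated initialization code. It is defined before
 * the macro below so that lib_alloc here names the real routine. */
void lib_init(void) { lib_alloc_ptr = lib_alloc; }

#ifdef SHLIB
/* Shared import.h: one #define per imported symbol. */
#define lib_alloc (*lib_alloc_ptr)
#endif

/* Common source: written as a direct call, compiled as an indirect one. */
char *func(unsigned n)
{
    assert(lib_alloc_ptr != 0);   /* the pointer must be set before use */
    return lib_alloc(n);
}
```

Without SHLIB defined, func calls lib_alloc directly. With it, the same source line expands to a call through lib_alloc_ptr, which is why the pointer has to be set before any library code runs.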
Pointer initialization fragments
A host shared library archive member can define one or many imported sym­
bol pointers. Regardless of the number, every imported symbol pointer
should have initialization code.
This code goes into the a.out file and does two things. First, it creates an
unresolved reference to make sure the symbol being imported gets resolved.
Second, initialization fragments set the imported symbol pointers to their
values before the process reaches main. If the imported symbol pointer can
be used at run time, the imported symbol will be present, and the imported
symbol pointer will be set properly.
NOTE Initialization fragments reside in the host, not the target, shared
library. The link editor copies initialization code into a.out files to set
imported pointers to their correct values.
Library specification files describe how to initialize the imported symbol
pointers. For example, the following specification line would set
_libc_malloc to the address of malloc:
    #init pmalloc.o
        _libc_malloc    malloc
When mkshlib builds the host library, it modifies the file pmalloc.o, adding
relocatable code to perform the following assignment statement:
    _libc_malloc = &malloc;
When the link editor extracts pmalloc.o from the host library, the relocatable
code goes into the a.out file. As the link editor builds the final a.out file, it
resolves the unresolved references and collects all initialization fragments.
When the a.out file is executed, the run time startup routines execute the ini­
tialization fragments to set the library pointers.
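The collect-then-run behavior of initialization fragments can be imitated with a table of small functions. This is a simulation only: the real fragments are relocatable code emitted by mkshlib and run by the startup routines, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* An imported symbol supplied by the application, and its pointer. */
static int app_hook(void) { return 7; }
int (*lib_hook_ptr)(void) = 0;

/* One fragment per imported pointer: performs "pointer = &symbol". */
static void init_hook(void) { lib_hook_ptr = app_hook; }

typedef void (*fragment_fn)(void);

/* Fragments collected by the link editor (modeled as an array). */
static const fragment_fn fragments[] = { init_hook };

/* Stand-in for the startup code that runs before main. */
void run_fragments(void)
{
    size_t i;

    for (i = 0; i < sizeof fragments / sizeof fragments[0]; i++)
        fragments[i]();
}

/* Library code that uses the imported symbol through its pointer. */
int lib_call_hook(void)
{
    assert(lib_hook_ptr != 0);
    return lib_hook_ptr();
}
```

Calling run_fragments before any library routine mirrors the guarantee in the text that pointers are set before the process reaches main.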
Selectively loading imported symbols
You can reduce unnecessary loading by writing C source files that define
imported symbol pointers singly or in related groups. If an imported symbol
must be individually selectable, put its pointer in its own source file (and
archive member). This will give the link editor a finer granularity to use when
it resolves the reference to the symbol.
For example, a single source file might define all pointers to imported
symbols:
Old pointers.c
    int (*_libc_ptr1)() = 0;
    char *(*_libc_malloc)() = 0;
    int (*_libc_ptr2)() = 0;
Allowing the loader to resolve only those references that are needed requires
multiple source files and archive members. Each of the new files defines a
single pointer:
ptr1.c
    int (*_libc_ptr1)() = 0;
pmalloc.c
    char *(*_libc_malloc)() = 0;
ptr2.c
    int (*_libc_ptr2)() = 0;
Using the three files ensures that the link editor will only look for definitions
for imported symbols and load in the corresponding initialization code in
cases where the symbols are actually used.
Referencing symbols in a shared library from another shared library
In general, import all symbols defined outside the shared library whenever
possible.
However, this is not always possible, as for example when floating-point
operations are performed in a shared library to be built. When such opera­
tions are encountered in any C code, the standard C compiler generates calls
to functions to perform the actual operations. These functions are defined in
the C library and are normally resolved in a manner invisible to the user when
an a.out is created, since the cc command automatically causes the relocatable
(non-shared) version of the C library to be searched. These floating-point rou­
tine references must be resolved at the time the shared library is being built.
But the symbols cannot be imported, because their names and usage are
invisible to the user.
The #objects noload directive of mkshlib(CP) has been provided to allow
symbol references such as these to be resolved at the time the shared library is
built, provided that the symbols are defined in another shared library. If there
are unresolved references to symbols after the object files listed with the
#objects directive have been link edited, the host shared libraries specified
with the #objects noload directive are searched for absolute definitions of the
symbols. The normal use of the directive would be to search the shared ver­
sion of the C library to resolve references to floating-point routines.
For this use, the syntax in the specification file would be
    #objects noload
        -lc_s
This would cause mkshlib to search for the host shared library libc_s.a in the
default library locations and to use it to resolve references to any symbols left
unresolved in the shared library being built. The -L option can be used to
cause mkshlib to look for the specified library in other than the default locations.
When building a shared library using #objects noload, you must make sure
that for each symbol with an unresolved reference there is a version of the
symbol with an absolute definition in the searched host shared libraries,
before any relocatable version of that symbol. mkshlib will give a fatal error
if this is not the case, because relocatable definitions do not have absolute
addresses and therefore do not allow complete resolution of the target shared library.
When using a shared library built with references to symbols resolved from
another shared library, both libraries must be specified on the cc command
line. The dependent library must be specified on the command line before the
libraries on which it depends. (See the section "Building an a.out File" for
more details.) If you provide a shared library which references symbols in
another shared library, you should make sure that your documentation
clearly states that users must specify both libraries when building a.out files.
It is also possible to use #objects noload to resolve references to any symbols
not defined in a shared library, as long as they are defined in some other
shared library. Even so, we strongly encourage you to import as many symbols
as possible and to use #objects noload only when absolutely necessary.
You will probably need this feature only to resolve references to
floating-point routines generated by the C compiler.
Importing symbols has several important benefits over resolving
references through #objects noload. First, importing symbols is more flexible
in that it allows you to define your own version of library routines. You can
define your own versions with archive versions of a library. Preserving this
ability with the shared versions helps maintain compatibility.
Importing symbols also helps prevent unexpected name space collisions. The
link editor will complain about multiple definitions of a symbol, references to
which are resolved through the #objects noload mechanism, if a user of the
shared library also has an external definition of the symbol.
Finally, #objects noload has the drawback that both the library you build and
all the libraries on which it depends must be available on all the systems.
Anyone who wishes to create a.out files using your shared library will need to
use the host shared libraries. Also, the targets of all the libraries must be
available on all systems on which the a.out files are to be run.
Providing compatibility with non-shared libraries
Having compatible libraries makes it easy to substitute one for the other. In
almost all cases, this can be done without makefile or source file changes. For
example, the shared C library is built using the existing non-shared library as
the base.
The host library archive file is compatible with the relocatable non-shared C
library. However, the shared library target file does not include all routines
from the archive, because including them all would have hurt performance.
These goals are reached as follows. The host library is built in two steps.
First, the available shared library tools are used to create the host library to
exactly match the target. The resulting archive file may not be compatible
with the archive C library at this point. Second, add to the host library the set
of relocatable objects residing in the archive C library that are missing from
the host library. Although this set is not in the shared library target, its inclusion
in the host library makes the relocatable and shared C libraries compatible.
Tuning the shared library code
Some suggestions for how to organize shared library code to improve perfor­
mance are presented here. They apply to paging systems, such as UNIX Sys­
tem V Release 3.0.
The non-shared C library contains several diverse groups of functions. Many
processes use different combinations of these groups, making the paging
behavior of any shared C library difficult to predict. A shared library should
offer greater benefits for more homogeneous collections of code. For example,
a database library probably could be organized to reduce system paging substantially,
if its static and dynamic calling dependencies were more predictable.
Profile the code
To begin, profile the code that might go into the shared library (see the
prof(CP) command in the Programmer's Reference).
Choose library contents
Based on profiling information, make some decisions about what to include in
the shared library. a.out file size is a static property, and paging is a dynamic
property. These static and dynamic characteristics may conflict, so you have
to decide whether the performance lost is worth the disk space gained. See
"Choosing Library Members" earlier in this chapter for more information.
Organize to improve locality
Try to improve locality of reference by grouping dynamically related func­
tions. If every call of funcA generates calls to funcB and funcC, try to put
them in the same page. cflow(CP) (documented in the Programmer's Reference)
generates this static dependency information. Combine it with profiling to
see what things actually are called, as opposed to what things might be called.
Align for paging
The key is to arrange the shared library target's object files so that frequently
used functions do not unnecessarily cross page boundaries. When arranging
object files within the target library, be sure to keep the text and data files
separate. You can reorder text object files without breaking compatibility; the
same is not true for object files that define global data. Use name lists and
disassemblies of the shared library target file, to determine where the page
boundaries fall.
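As a rough illustration of the check described above, the following C sketch decides whether a function crosses a page boundary, given its starting offset and size as they might be read from a name list of the target file. The 4096-byte page size and the names page_of and crosses_page are assumptions made for this example only; substitute the page size of the system on which the library will run.

```c
#include <assert.h>

/* Assumed page size for illustration; the real value is system dependent. */
#define PAGE_SIZE 4096UL

/* Index of the page that contains the given offset in the text region. */
unsigned long page_of(unsigned long offset)
{
    return offset / PAGE_SIZE;
}

/*
 * Return 1 if a function occupying the bytes [start, start + size)
 * spans more than one page, 0 if it fits entirely within one page.
 */
int crosses_page(unsigned long start, unsigned long size)
{
    assert(size > 0);
    return page_of(start) != page_of(start + size - 1);
}
```

A frequently used function for which crosses_page returns 1 is a candidate for moving, for example by reordering the text object files so that it starts on a later page.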
After grouping related functions, break them into page-sized chunks.
Although some object files and functions are larger than a single page, most of
them are smaller. Use the infrequently called functions as glue between the
chunks. Because the glue between pages is referenced less frequently than the
page contents, the probability of a page fault decreases.
After determining the branch table, arrange the library's object files without
breaking compatibility. Put frequently used, unrelated functions together
because they probably will be called randomly enough to keep the pages in
memory. System calls go into another page as a group, and so on. The fol­
lowing example shows how to change the order of the C library's object files:
Before:
    #objects
    printf.o
    fopen.o
    malloc.o
    strcmp.o
After:
    #objects
    strcmp.o
    malloc.o
    printf.o
    fopen.o
Avoid hardware thrashing
Better performance results by arranging the typical process to avoid cache
entry conflicts. If a heavily used library had both its text and its data segment
mapped to the same cache entry, the performance penalty would be particu­
larly severe. Every library instruction would bring the text segment informa­
tion into the cache. Instructions that referenced data would flush the entry to
load the data segment.
Checking for compatibility
The following guidelines explain how to check for upwardly compatible
shared libraries. However, upward compatibility may not always be an issue.
Consider the case in which a shared library is one piece of a larger system and
is not delivered as a separate product. In this restricted case, identify all a.out
files that use a particular library. As long as all the a.out files are rebuilt every
time the library changes, the a.out files will run successfully, even though versions
of the library are not compatible.
Checking versions of shared libraries using chkshlib(CP)
a.out files will not execute properly if newer versions of a library are not compatible
with the previous ones.
If you use shared libraries, you might need to find out if different versions of a
shared library are compatible, or if executable files could have been built with
a particular host shared library or can run with a particular target shared
library. For example, you might have a new version of a target shared library,
and you need to know if all the executable files that ran with the older version
will run with the new one. You might need to find out if a particular target
shared library can reference symbols in another shared library. A command,
chkshlib(CP) (documented in the Programmer's Reference), has been provided
to allow you to do these and other comparisons.
chkshlib takes names of target shared libraries, host shared libraries, and exe­
cutable files as input, and checks to see if those files satisfy the compatibility
criteria. chkshlib checks to see if every library symbol in the first file that
needs to be matched exists in the second file and has the same address.
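The compatibility criterion just stated (every symbol of concern in the first file must exist in the second file at the same address) can be sketched in C as follows. This is not the implementation of chkshlib; the struct sym record and the functions find_sym and compatible are hypothetical, and the addresses used with them are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified symbol record: a name and an absolute address. */
struct sym {
    const char *name;
    unsigned long addr;
};

/* Look up a symbol by name; return a pointer to it, or 0 if it is absent. */
const struct sym *find_sym(const struct sym *tab, int n, const char *name)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return 0;
}

/* Return 1 if every symbol in first exists in second at the same address. */
int compatible(const struct sym *first, int nfirst,
               const struct sym *second, int nsecond)
{
    int i;

    assert(nfirst >= 0 && nsecond >= 0);
    for (i = 0; i < nfirst; i++) {
        const struct sym *s = find_sym(second, nsecond, first[i].name);

        if (s == 0 || s->addr != first[i].addr)
            return 0;
    }
    return 1;
}
```

Note that the test is not symmetric: a second file that has extra symbols still satisfies it, but a second file in which a symbol is missing or has moved does not.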
The following table shows what types of files and how many of them
chkshlib accepts as input. The rows listed down represent the first input
given, and the columns listed across represent secondary input. For example,
if the first input file you give chkshlib is a target shared library, you must give
another input file that is a target or host shared library.
* The executable file must be one that was built using a host shared library. A useful way to
confirm this is to use dump -L to find out which target file(s) gets loaded when the program is
run. See dump(CP), documented in the Programmer's Reference.
* You can also have executable target1...targetn and executable host1...hostn.
An example of a chkshlib command line is shown below:
chkshlib /shlib/libc_s /lib/libc_s.a
In this example, /shlib/libc_s is a target shared library and /lib/libc_s.a is a host
shared library. chkshlib will check to see if executable files built with
/lib/libc_s.a would be able to run with /shlib/libc_s.
Depending on the input it receives, chkshlib checks to find out if the following
is true:
an executable file will run with the given target shared library
an executable file could have been built using the given host shared library
an executable file produced with a given host shared library will run with a given target shared library
an executable file that ran with an old version of a target shared library will run with a new version
a new host shared library can replace the old host shared library; that is, executable files built with the new host shared library will run with the old target shared library
a target shared library can reference symbols in another target shared library
To determine if files are compatible, chkshlib has to determine which library
symbols in the first file need to be matched in the second file.
For target shared libraries, the symbols of concern are all external, defined
symbols with non-zero values, except for branch labels (branch labels
always start with .bt), and the special symbols etext, edata, and end.
For host shared libraries, the symbols of concern are all external, absolute
symbols with a non-zero value.
For executable files, the symbols of concern are all external, absolute symbols with a non-zero value, except for the special symbols etext, edata, and end.
For two files to be compatible, the target pathnames must be identical in both
files (unless the chkshlib -i option has been specified).
The following table displays the output you will receive when you use
chkshlib to check different combinations of files for compatibility. In this
table file1 represents the name of the first file given, and file2,3,... represents
the names of any more files given as input.
Input                               Output
file1 is executable                 file1 can [may not] execute using file2
file2,3,... (if any) are targets    file1 can [may not] execute using file3

file1 is executable                 file1 may [may not] have been produced using file2
file2,3,... are hosts               file1 may [may not] have been produced using file3

file1 is host                       file1 can [may not] produce executables which
file2 (if any) is target            will run with file2

file1 is target                     file2 can [may not] produce executables which
file2 is host                       will run with file1

both files are targets or           file1 can [may not] replace file2
both files are hosts                file2 can [may not] replace file1

both files are targets and          file1 can [may not] include file2
-n option is specified•
• The -n option tells chkshlib that the two files are target shared libraries, the first of which can
reference (include) symbols in the other. See "Referencing Symbols in a Shared Library from
Another Shared Library" for details.
For more information on chkshlib, see chkshlib(CP), documented in the
Programmer's Reference.
NOTE Symbols that have been hidden via the #hide directive cannot be
referenced directly. chkshlib will ignore them in its check for compatibility.
Dealing with incompatible libraries
There are two methods of dealing with incompatible libraries. First, you can
rebuild all the a.out files that use your library. If feasible, this is probably the
best choice. Unfortunately, you might not be able to find those a.out files, let
alone force their owners to rebuild them with your new library.
So your second choice is to give a different target pathname to the new ver­
sion of the library. The host and target pathnames are independent; you don't
have to change the host library pathname. New a.out files will use your new
target library, but old a.out files will continue to access the old library.
As the library developer, it is your responsibility to check for compatibility
and, probably, to provide a new target library pathname for a new version of
a library that is incompatible with older versions. If you fail to resolve compatibility
problems, a.out files that use your library will not work properly.
NOTE You should try to avoid multiple library versions. If too many copies
of the same shared library exist, they might actually use more disk space
and more memory than the equivalent relocatable version would have.
An example
This section contains the process by which a small specialized shared library
is created and built. We refer to the guidelines given earlier in this chapter.
Original source
The name of the library to be built is libmaux (for math auxiliary library). The
interface consists of three functions, an external variable, and a header file.
The three functions:
    logd        floating-point logarithm to a given base; defined in the file log.c
    polyd       evaluate a polynomial; defined in the file poly.c
    maux_stat   return usage counts for the other two routines in a structure;
                defined in stats.c
The external variable:
    mauxerr     set to non-zero if there is an error in the processing of any of
                the functions in the library and set to zero if there is no error
                (unlike errno in the C library)
And the header file:
    maux.h      declares the return types of the functions and the structure
                returned by maux_stat
The source files before any modifications for inclusion in a shared library follow:
    /* log.c */
    #include "maux.h"
    #include <math.h>

    /*
     * Return the log of "x" relative to the base "a".
     *      logd(base, x) := log(x) / log(base);
     * where "log" is "log to the base E".
     */
    double logd(base, x)
    double base, x;
    {
        extern int stats_logd;
        extern int total_calls;
        double logbase;
        double logx;

        total_calls++;
        stats_logd++;
        logbase = log((double)base);
        logx = log((double)x);
        if (logbase == -HUGE || logx == -HUGE) {
            mauxerr = 1;
            return(0);
        }
        mauxerr = 0;
        return(logx / logbase);
    }
    /* poly.c */
    #include "maux.h"
    #include <math.h>

    /*
     * Evaluate the polynomial
     *      f(x) := a[0] * (x ^ n) + a[1] * (x ^ (n-1)) + ... + a[n];
     * Note that there are N+1 coefficients!
     * This uses Horner's Method, which is:
     *      f(x) := (((((a[0] * x) + a[1]) * x) + a[2]) + ...) + a[n];
     * It's equivalent, but uses many fewer operations
     * and is more precise.
     */
    double polyd(a, n, x)
    double a[];
    int n;
    double x;
    {
        extern int stats_polyd;
        extern int total_calls;
        double result;
        int i;

        total_calls++;
        stats_polyd++;
        if (n < 0) {
            mauxerr = 1;
            return(0);
        }
        result = a[0];
        for (i = 1; i <= n; i++) {
            result *= (double)x;
            result += (double)a[i];
        }
        mauxerr = 0;
        return(result);
    }
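To make the claim in the poly.c comment concrete, the following sketch places Horner's method next to a naive evaluation that recomputes each power of x. It is written in ANSI C rather than the older style of the listing, omits the usage counters and mauxerr, and the function names horner and naive are chosen for this example only.

```c
#include <assert.h>

/* Horner's method: a[0]*x^n + a[1]*x^(n-1) + ... + a[n]; requires n >= 0. */
double horner(const double a[], int n, double x)
{
    double result;
    int i;

    assert(n >= 0);
    result = a[0];
    for (i = 1; i <= n; i++)
        result = result * x + a[i];   /* one multiply and one add per term */
    return result;
}

/* Naive evaluation for comparison: each term rebuilds its power of x. */
double naive(const double a[], int n, double x)
{
    double sum = 0.0;
    int i, j;

    assert(n >= 0);
    for (i = 0; i <= n; i++) {
        double term = a[i];

        for (j = 0; j < n - i; j++)
            term *= x;
        sum += term;
    }
    return sum;
}
```

For the coefficients {2, -3, 5} and x = 3, both functions evaluate 2x^2 - 3x + 5 and return 14; Horner's method uses n multiplications where the naive loop uses n(n+1)/2.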
    /* stats.c */
    #include "maux.h"

    int total_calls = 0;
    int stats_logd = 0;
    int stats_polyd = 0;
    int mauxerr;

    /* Return structure with usage stats for functions in library
     * or 0 if space cannot be allocated for the structure */
    struct mstats *
    maux_stat()
    {
        extern char *malloc();
        struct mstats *st;

        if ((st = (struct mstats *)
                malloc(sizeof(struct mstats))) == 0)
            return(0);
        st->st_polyd = stats_polyd;
        st->st_logd = stats_logd;
        st->st_total = total_calls;
        return(st);
    }
    /* maux.h */
    struct mstats {
        int st_polyd;
        int st_logd;
        int st_total;
    };

    extern double polyd();
    extern double logd();
    extern struct mstats *maux_stat();
    extern int mauxerr;
Choosing region addresses and the target pathname
To begin, we choose the region addresses for the library's .text and .data sec­
tions from the segments reserved for private use on the 80386 Computer.
Note that the region addresses must be on a segment boundary (4 MB):
    .text    0xB0680000
    .data    0xB06a0000
Also we choose the pathname for our target library:
    /my/directory/libmaux_s
The choice of region addresses can be important. See the comments
in "Step 1: Choosing Region Addresses" of "The Building Process" earlier in
this chapter. A table of existing and suggested addresses is also given there.
Selecting library contents
This example is for illustration purposes, and so we will include everything in
the shared library. In a real case, it is unlikely that you would make a shared
library with these three small routines, unless you had many programmers
using them frequently.
Rewriting existing code
According to the guidelines given earlier in the chapter, we need to first
minimize the global data. We realize that total_calls, stats_logd, and
stats_polyd do not need to be visible outside the library, but are needed in
multiple files within the library. Hence, we will use the #hide directive in our
specification file to make these variables static after the shared library is built.
We need to define text and global data in separate source files. The only piece
of global data we have left is mauxerr, which we will remove from stats.c and
put in a new file maux_defs.c. We will also have to initialize it to zero, since
shared libraries cannot have any uninitialized variables.
Next, we notice that there are some references to symbols that we do not
define in our shared library (i.e. log and malloc). We can import these sym­
bols. To do so, we create a new header file, import.h, which will be included
in each of log.c, poly.c, and stats.c. The header file defines C preprocessor
macros for these symbols to make transparent the use of indirection in the
actual C source files.
We use the _libmaux_ prefixes on the pointers to the symbols because those
pointers are made external, and the use of the library name as a prefix helps
prevent name conflicts.
    /* New header file import.h */
    #define malloc  (*_libmaux_malloc)
    #define log     (*_libmaux_log)

    extern char *malloc();
    extern double log();
Now, we need to define the imported symbol pointers somewhere. We have
already created a file for global data maux_defs.c, so we will add the
definitions to it.
    /* Data file maux_defs.c */
    int mauxerr = 0;
    double (*_libmaux_log)() = 0;
    char *(*_libmaux_malloc)() = 0;
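The indirection set up by import.h and maux_defs.c can be tried in an ordinary program. The sketch below is an illustration under stated assumptions, not part of libmaux: the routine twice, the pointer _libdemo_twice, and the function bind_imports are hypothetical, and bind_imports stands in for the assignment that the #init directive arranges when a real shared library is used.

```c
#include <assert.h>

/* Stand-in for an external routine that the library wants to import. */
double twice(double x)
{
    return 2.0 * x;
}

/* Imported symbol pointer, explicitly initialized as the text requires. */
double (*_libdemo_twice)(double) = 0;

/* What the import header would contain: calls now go through the pointer. */
#define twice (*_libdemo_twice)

/* Library code is written as if it called twice() directly. */
double library_routine(double x)
{
    assert(_libdemo_twice != 0);
    return twice(x) + 1.0;      /* expands to (*_libdemo_twice)(x) + 1.0 */
}

#undef twice

/* Done by hand here; with mkshlib the #init directive sets up the pointer. */
void bind_imports(void)
{
    _libdemo_twice = twice;
}
```

After bind_imports() has run, library_routine(3.0) calls the real twice through the pointer and returns 7.0. The source of library_routine never mentions the pointer, which is the transparency the macros are meant to provide.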
Finally, we observe that there are floating-point operations in the code, and
we remember that the routines for these cannot be imported. (If we tried to
write the specification file and build the shared library without taking this
into account, mkshlib would give us errors about unresolved references.)
This means we will have to use the #objects noload directive in our specification
file to search the C host shared library to resolve the references.
Writing the specification file
This is the specification file for libmaux:
     1  ##
     2  ## libmaux.sl - libmaux specification file
     3  #address .text 0xB0680000
     4  #address .data 0xB06a0000
     5  #target /my/directory/libmaux_s
     6  #branch
     7      polyd           1
     8      logd            2
     9      maux_stat       3
    10  #objects
    11      maux_defs.o
    12      poly.o
    13      log.o
    14      stats.o
    15  #objects noload
    16      -lc_s
    17  #hide linker *
    18  #export linker
    19      mauxerr
    20  #init maux_defs.o
    21      _libmaux_malloc     malloc
    22      _libmaux_log        log
Briefly, here is what the specification file does. Lines 1 and 2 are comment
lines. Lines 3 and 4 give the virtual addresses for the shared library text and
data regions, respectively. Line 5 gives the pathname of the shared library on
the target machine. The target shared library must be installed there for a.out
files that use it to work correctly. Line 6 contains the #branch directive. Lines
7 through 9 specify the branch table. They assign the functions polyd( ),
logd( ), and maux_stat( ) to branch table entries 1, 2, and 3. Only external text
symbols, such as C functions, should be placed in the branch table.
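The role of the branch table can be pictured with an ordinary C array of function pointers. This is only an analogy and not what mkshlib generates: the routines, the slot names, and call_slot below are hypothetical, and the slots are numbered from 0 where the specification file numbers its entries from 1.

```c
#include <assert.h>

/* Two hypothetical library routines. */
static int first_routine(int x)  { return x + 1; }
static int second_routine(int x) { return x * 2; }

typedef int (*routine_t)(int);

/* The slot numbers are the fixed part of the interface. */
enum { SLOT_FIRST = 0, SLOT_SECOND = 1, NSLOTS = 2 };

/* The table maps each fixed slot to wherever the routine currently lives. */
static const routine_t branch_table[NSLOTS] = {
    first_routine,
    second_routine
};

/* Callers reach a routine only through its slot. */
int call_slot(int slot, int arg)
{
    assert(slot >= 0 && slot < NSLOTS);
    return branch_table[slot](arg);
}
```

Because callers depend only on the slot number, the routines themselves can grow or move in a later version without disturbing existing callers, provided each one keeps its slot.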
Line 10 contains the #objects directive. Lines 11 through 14 give the list of
object files that will be used to construct the host and target shared libraries.
When building the host shared library archive, each file listed here will reside
in its own archive member. When building the target library, the order of
object files will be preserved. The data files must be first. Otherwise, an addi­
tion of static data to poly.o, for example, would move external data symbols
and break compatibility.
Line 15 contains the #objects noload directive, and line 16 gives information
about where to resolve the references to the floating-point routines.
Lines 17 through 19 contain the #hide linker and #export linker directives,
which tell what external symbols are to be left external after the shared library
is built. Together, these #hide and #export directives say that only mauxerr
will remain external. The symbols in the branch table and those specified in
the #init directive will remain external by definition.
Line 20 contains the #init directive. Lines 21 and 22 give imported symbol in­
formation for the object file maux_defs.o. You can imagine assignments of the
symbol values on the right to the symbols on the left. Thus _libmaux_malloc will
hold a pointer to malloc, and so on.
Building the shared library
Now, we have to compile the .o files as we would for any other library:
cc -c maux_defs.c poly.c log.c stats.c
Next, we need to invoke mkshlib to build our host and target libraries:
mkshlib -s libmaux.sl -t libmaux_s -h libmaux_s.a
Presuming all of the source files have been compiled appropriately, the
mkshlib command line shown above will create both the host library,
libmaux_s.a, and the target library, libmaux_s. Before any a.out files built
with libmaux_s.a can be executed, the target shared library libmaux_s will
have to be moved to /my/directory/libmaux_s as specified in the specification file.
Using the shared library
To use the shared library with a file, x.c, which contains a reference to one or
more of the routines in libmaux, you would issue the following command line:
cc x.c libmaux_s.a -lm -lc_s
This command line causes the following:
the imported symbol pointer reference to log is resolved from libm
the imported symbol pointer reference to malloc is resolved with the
shared version from libc_s.
The most important thing to note from the command line, however, is that
you have to specify the C host shared library (in this case with the -lc_s) on
the command line, since libmaux was built with direct references to the
floating-point routines in that library.
This chapter describes the UNIX System shared libraries and explains how to
use them. It also explains how to build your own shared libraries. Using any
shared library almost always saves disk storage space, memory, and com­
puter power; and running the UNIX System on smaller machines makes the
efficient use of these resources increasingly important. Therefore, you should
normally use a shared library whenever it's available.
Chapter 8
The software tool lex lets you quickly generate solutions to problems that
involve lexical analysis, that is, the recognition of strings of characters that
satisfy certain characteristics. This enables you to solve a wide class of prob­
lems drawn from text processing, code enciphering, compiler writing, and
other areas. For example:
in text processing, checking the spelling of words for errors
in code enciphering, translating certain patterns of characters into others
in compiler writing, determining what the tokens (smallest meaningful
sequences of characters) are in the program to be compiled
It is not essential to use lex to handle problems of this kind: you could write
programs in a standard language like C to handle them. What lex does is gen­
erate such C programs, based on a set of specifications that you give it. These
lex specifications name and describe the classes of strings that you wish to
recognize, and often give actions to be carried out when a particular kind of
string is found. lex is referred to as a "program generator": more specifically,
it is a "lexical analyzer generator." It offers a faster, easier way to create pro­
grams to perform lexical analysis. Its weakness is that it often produces C
programs that are longer and execute more slowly than hand-coded programs
that do the same task. In many applications size and speed are minor considerations,
and the advantages of using lex considerably outweigh these disadvantages.
To understand what lex does, refer to Figure 8-1. It begins with a lex specification,
sometimes referred to as a lex source program. The source is read by
the lex program generator. The output of the program generator is a C program
which, in turn, must be compiled in order to generate an executable program
that performs the lexical analysis. The lexical analyzer program produced
by this process accepts as input any source file and produces the
specified output, such as altered text or a list of tokens.
Figure 8-1 Creation and use of a lexical analyzer with lex
Programs generated by lex can also be used to collect statistical data on fea­
tures of the input, such as character count, word length, and the number of oc­
currences of particular words. In later sections of this chapter, you will see
how to:
write lex specifications to perform some of these tasks
translate lex specifications into C
compile, link, and execute the lexical analyzer in C
run the lexical analyzer program
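For comparison with what lex generates, here is a hand-coded C sketch of the kind of statistics gathering mentioned above: it counts characters and words in a string and records the longest word length. It is an illustration only; the struct text_stats type and the scan_text function are names chosen for this example, and a word is assumed to be a run of alphabetic characters.

```c
#include <assert.h>
#include <ctype.h>

/* Statistics gathered in one pass over the input. */
struct text_stats {
    long chars;         /* total characters seen */
    long words;         /* runs of alphabetic characters */
    long longest_word;  /* length of the longest such run */
};

struct text_stats scan_text(const char *s)
{
    struct text_stats st = { 0, 0, 0 };
    long current = 0;   /* length of the word being scanned, 0 if none */

    assert(s != 0);
    for (; *s != '\0'; s++) {
        st.chars++;
        if (isalpha((unsigned char)*s)) {
            if (current == 0)
                st.words++;         /* first letter of a new word */
            current++;
            if (current > st.longest_word)
                st.longest_word = current;
        } else {
            current = 0;            /* any other character ends the word */
        }
    }
    return st;
}
```

Even this small scanner has to track its own state by hand. With lex, the notion of a word would be expressed as a pattern and the counting as the action attached to it.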
Writing lex programs
A lex specification consists of a mandatory rules section, and optional sec­
tions for definitions and user subroutines.
The definitions section, if present, must be the first section in the lex program.
The mandatory rules section follows the definitions; if there are no definitions,
then the rules section is first. In both cases, the rules section must start with
the delimiter %%. If there is a subroutines section, it follows the rules section
and is separated from the rules by another %% delimiter. If there is no second
%% delimiter, the rules section is presumed to continue to the end of the program.
The following is a small, though complete, lex specification, illustrating all
three sections:
    %{
    int count;
    %}
    %%
    fly     { count++; noise(); }
            { printf("Woof!\n"); }
            { printf("Hello world!\n"); }
    %%
    noise()
    {
            printf("Bzzzz!\n");
    }
In this example, the definitions section (lines 1-3) declares a variable which is
used as a counter later in the program. The rules section (lines 4-7) consists of
three rules, each of which consists of a pattern, followed by some C code. The
subroutines section (lines 8-12) defines a function that is used in one of the rules.
The rules section
Each rule consists of a pattern to be searched for in the input, followed on the
same line by an action to be performed when the pattern is matched. Because
lexical analyzers are often used in conjunction with parsers, as in program­
ming language compilation and interpretation, the patterns can be said to
define the classes of tokens that may be found in the input.
Regular expressions
The patterns describing the classes of strings to be searched for are written
using regular expressions in a notation similar to that used in awk and sed.
The terms "pattern" and "regular expression" are often used interchangeably.
A regular expression is formed by concatenating characters and, usually, certain
operators. The notation used with lex is summarized in the following list:
A string of text characters with no operators at all just matches the literal
string. To match the word orange, use:
    orange
To match a literal string that contains spaces or tabs, surround the expression
with double quotes. To match the phrase red apple, use the expression:
    "red apple"
An expression, followed by the "*" operator, matches 0 or more occurrences
of that expression. To match a string containing any number of m's, or
the null string, use the expression:
    m*
An expression, followed by the "+" operator, matches one or more occurrences
of that expression. To match a string containing one or more m's,
but not the null string, use the expression:
    m+
An expression, followed by the "?" operator, matches 0 or 1 occurrence(s)
of that expression. This is equivalent to saying that the expression is
optional. To match one occurrence of the letter "m", or the null string, use
the expression:
    m?
The period character, (.), matches any single character. To match any five-letter
string starting with "m" and ending with "y", use the expression:
    m...y
Alternation in regular expressions is supported using the vertical bar, (|).
To match either of the strings love and money, use the expression:
    love|money
Expressions may be grouped using parentheses, '(' and ')'. To match a
string that consists of any number of a's and b's, followed by a "c", use the
expression:
    (a|b)*c
The circumflex, '^', followed by a pattern, signifies that the pattern must
match at the beginning of a line. The following rule matches the word First
at the beginning of a line.
    ^First
The dollar sign, '$', is appended to a pattern to indicate that it must match
at the end of a line. The following rule matches the word cow at the end of
a line:
    cow$
To indicate that a regular expression should be matched a specific number
of times, follow that expression with a number enclosed in curly braces, '{'
and '}'. To match three repetitions of cd, that is, cdcdcd, use the expression:
    (cd){3}
To specify a range of repetitions, follow the expression with two numbers,
separated by a comma and enclosed in curly braces. To match three, four,
or five repetitions of ab, that is, ababab, abababab, or ababababab, use the
expression:
    (ab){3,5}
A sequence of characters inside square brackets, '[' and ']', matches any one
character in the sequence. To match any one of "d", "g", "k", and "a",
use the expression:
    [dgka]
If the circumflex, '^', is the first character inside the square brackets, then
the pattern matches any character that does not appear inside the brackets.
In this context, the circumflex does not signify the start of a line, as it does
when prepended to a pattern. To match any character other than "a", "b",
and "c", use the expression:
    [^abc]
Ranges within a standard alphabetic or numeric order are indicated with a
hyphen (-). The following expression matches any digit, uppercase letter,
or lowercase letter:
    [0-9A-Za-z]
Regular expressions can be concatenated. The resulting expression matches
whatever the first expression matches followed by whatever the second
expression matches. The following regular expression matches an
identifier in many programming languages. An identifier, thus defined, is a
letter followed by zero or more letters or digits.
    [a-zA-Z][0-9a-zA-Z]*
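Several of the expressions above can be tried out from an ordinary C program before they are placed in a lex specification. The following is only a sketch: it uses the POSIX regcomp( ) and regexec( ) routines rather than lex, it assumes that POSIX extended syntax agrees with lex for these particular patterns, and the helper name matches_whole( ) is invented for the illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of s matches the extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern cannot be compiled.
 * (Hypothetical helper; not part of lex.) */
int matches_whole(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Wrap the pattern so that it must cover the entire string. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, matches_whole("[a-zA-Z][0-9a-zA-Z]*", "x25") reports a match, while the same pattern applied to "25x" does not match because the string begins with a digit.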
To treat an otherwise special character as a literal character, enclose the
character in quotation marks or precede it with a backslash (\). Either of
the following expressions could be used to match an asterisk followed by one
or more digits:
    \*[0-9]+
    "*"[0-9]+
To recognize a backslash itself, either of these expressions could be used:
    "\\"
    \\
lex understands the standard C escape sequences, such as \n for the end of a line.
An action is a block of C code that is executed whenever the corresponding
pattern in the lex specification is matched. Once the lex-generated lexical
analyzer matches a regular expression specified in a rule in the specification, it
looks to the right of the rule for the action to be performed. Actions typically
involve operations such as a transformation of the matched string, returning a
token to a parser, or compiling statistics on the input.
The simplest action contains no statements at all. Input text that matches the
pattern associated with a null action is ignored. A sequence of characters that
does not match any pattern in the rules section is written to the standard
output without being modified in any way. To cause lex to generate a lexical
analyzer that prints everything in the input text with the exception of the
word orange, which is ignored, the following rules section is used:
    orange    ;
Note that there must be some white space (spaces or tabs) between the
pattern and the semicolon.
You may want to print out a message noting that a string of text was found, or
a message transforming the text in some way. To recognize the expression
Amelia Earhart, the following rule can be used:
    "Amelia Earhart"    printf("found Amelia's bookcase!\n");
To replace a lengthy medical term with its acronym, a rule such as this is
called for:
    Electroencephalogram    printf("EEG");
In order to count the lines in a text file, the analyzer must recognize
end-of-lines and increment a counter. The following rule is used for this
purpose:
    %{
    int lineno = 0;
    %}
    %%
    \n    lineno++;
NOTE If an action consists of two or more C statements spread over two or
more lines, the code must be enclosed in curly braces, '{' and '}'.
yytext, yyleng
When a character string matches some pattern in the lex specification, it is
stored in a character array called yytext. The contents of this array may be
operated on by the action associated with the pattern: it can be printed or
manipulated as necessary. lex also provides a variable yyleng, which gives
the number of characters matched by the pattern.
For example, the following rule directs the lexical analyzer to count the digit
strings in an input text and print the running total, and print out the text of
each string as soon as it is found.
    %{
    int digstringcount = 0;
    %}
    %%
    [-+]?[0-9]+    {
                       digstringcount++;
                       printf("%d %s\n", digstringcount, yytext);
                   }
This specification matches negative digit strings, and positive strings whether
or not they are preceded by a plus sign; the " ? " indicates that the preceding
sign is optional.
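To see what the generated analyzer does with yytext and yyleng for this rule, the same scan can be written by hand. The following C function is a rough sketch only, not the code that lex produces: count_digit_strings( ) is an invented name, and unlike a lex analyzer it silently skips unmatched characters instead of echoing them.

```c
#include <ctype.h>
#include <stdio.h>

/* Count the substrings of text that match [-+]?[0-9]+, scanning left to
 * right and taking the longest match at each position. */
int count_digit_strings(const char *text)
{
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        const char *start = p;    /* plays the role of yytext */
        const char *q = p;

        if (*q == '-' || *q == '+')
            q++;                  /* optional sign */
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q))
                q++;              /* one or more digits */
            count++;
            /* q - start plays the role of yyleng */
            printf("%d %.*s\n", count, (int)(q - start), start);
            p = q;
        } else {
            p++;                  /* no match here: move on one character */
        }
    }
    return count;
}
```

Given the text "a -12 +7 x9", the function finds three digit strings: -12, +7, and 9.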
The macro ECHO is a shorthand way of printing out the text of the token. The
two rules in the next example have the same effect.
    Jim|James    { ECHO; }
    Jim|James    { printf("%s", yytext); }
The following lex specification draws together several of the points discussed
above:
     1  %{
     2  int subprogcount = 0;
     3  int gstringcount = 0;
     4  %}
     5  %%
     6  -[0-9]+          printf("negative integer\n");
     7  "+"?[0-9]+       printf("positive integer\n");
     8  -0\.[0-9]+       printf("negative real number, no whole number part\n");
     9  rail[ ]+road     printf("railroad is one word\n");
    10  crook            printf("Here's a crook!\n");
    11  function         subprogcount++;
    12  G[a-zA-Z]*       {
    13                       printf("may have a G word here: %s\n", yytext);
    14                       gstringcount++;
    15                   }
The first three rules (lines 6-8) recognize negative integers, positive integers,
and negative real numbers between 0 and -1. The fourth rule (line 9) matches
cases where one or more blanks intervene between the two syllables of the
word railroad. The fifth rule (line 10) matches the word crook and
prints a useful warning. The rule recognizing function (line 11) increments a
counter. The last rule (lines 12-15) illustrates a multiline action, and the use of
yytext.
definitions section
The lex definitions section may contain any of several classes of items. The
most important are external definitions, #include statements, and abbreviations.
External definitions
Recall that for legal lex source this section is optional, but in most cases some
of these items are necessary. External definitions have the same form and
function as in C. They declare that variables globally defined elsewhere
(perhaps in another source file) can be accessed in your lexical analyzer.
You may want to define a variable for use only within the action associated
with one rule, for example, a loop index. You can declare such a variable to be
local to a block (in this case, the action sequence) as you normally would in
C, by declaring it directly after the left curly brace.
#include statements
The purpose of the #include statement is the same as in C: to include files
that are important to the lexical analyzer program. An example of this usage
occurs when lex is used with yacc. yacc is a program generator that generates
parsers, programs that analyze input to ensure that it is syntactically correct.
These parsers call a lexical analyzer. When using yacc, the file y.tab.h,
generated by yacc, should be included. This file can contain definitions for
token names. #include statements and variable declarations must be placed
between the delimiters '%{' and '%}', for example:
    %{
    #include "y.tab.h"
    extern int tokval;
    int lineno;
    %}
The definitions section can also contain abbreviations for regular expressions
to be used in the rules section. The purpose of abbreviations is to avoid need­
less repetition in writing your specifications and to provide clarity in reading
them. A definition for an identifier would often be placed in this section.
Abbreviations must appear after the '%}' delimiter that ends your #include
statements and variable declarations (if there are any).
The abbreviation appears on the left of the line and its definition appears on
the right, separated by one or more spaces. When abbreviations are used in
rules, they must be enclosed within curly braces. The following example
illustrates how abbreviations are defined and used.
    dig      [0-9]
    ident    [a-zA-Z][a-zA-Z0-9]*
    %%
    -{dig}+       printf("negative integer\n");
    "+"?{dig}+    printf("positive integer\n");
    -0\.{dig}+    printf("negative real number, no whole number part\n");
    {ident}       printf("%s", yytext);
subroutines section
Subroutines can be used in a lex specification for the same purposes as in
other C programs. Code used in actions for several rules can be written once
and called when needed. Other reasons to use a subroutine are to highlight
some code of interest and to simplify the rules section, even if the code is used
in one rule only. As an example, consider the following routine which is used
to compile statistics on an input text.
    %{
    int the_count = 0;
    int a_count = 0;
    int an_count = 0;
    %}
    article    a|an|the
    %%
    {article}    do_article(yytext);
    %%
    do_article(s)
    char *s;
    {
        if (!strcmp(s, "a")) {
            a_count++;
            return 0;
        } else if (!strcmp(s, "an")) {
            an_count++;
            return 0;
        } else if (!strcmp(s, "the")) {
            the_count++;
            return 0;
        }
        printf("text not an article: %s\n", s);
        return 1;
    }
Other examples of subroutines are programmer-defined versions of the
I/O routines input( ), unput( ), and output( ), which will be discussed later.
Subroutines that may be exploited by many different programs should prob­
ably be stored in their own individual file or library to be called as needed.
Advanced lex usage
This section discusses advanced features of lex that help the user solve more
complex problems that would be difficult to deal with using ordinary pro­
gramming techniques.
Disambiguating rules
lex follows two rules to resolve ambiguities that may arise from the lex specif­
ication. These rules are:
when a string of input text can match two or more rules in a specification,
the first rule in the specification is the one which is matched, and the one
whose action is executed
if a string in the input text can match a rule in the specification, but a longer
string which has the first string as its prefix will also match a rule, then the
longer string matches
A situation of the type addressed by the first rule could arise if a specification
had patterns to match keywords and identifiers:
    %{
    #include "y.tab.h"
    %}
    %%
    START                   return(STARTTOK);
    BREAK                   return(BREAKTOK);
    END                     return(ENDTOK);
    [a-zA-Z][a-zA-Z0-9]*    return(yytext);
The string "START" could be matched by either the first or the fourth rule;
because START is a reserved word, you want only the action associated with
the first rule to be executed. By placing the rule for START and the other
reserved words before the rule for identifiers, you ensure that reserved words
are recognized as such.
The second kind of ambiguity could arise, for example, if the input text was in
a programming language that had operators that were similar. Part of a lex
specification for the C language might look like this:
    %%
    "+"     { return(PLUS); }
    "++"    { return(INC); }
The lexical analyzer should recognize the increment operator "++", not the
addition operator "+", when it reads the following statement:
    i++;
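The two disambiguating rules can be stated compactly in C. The function below is a sketch under simplifying assumptions: every rule is a literal string rather than a regular expression, and the name pick_rule( ) is invented for this illustration. It is not how lex itself is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Choose the rule for the text at input: the longest matching literal
 * wins, and among equally long matches the earliest rule wins. Returns
 * the rule index and stores the match length in *len, or returns -1
 * (with *len set to 0) if no rule matches. */
int pick_rule(const char *input, const char *const rules[], int nrules,
              size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    int i;

    for (i = 0; i < nrules; i++) {
        size_t n = strlen(rules[i]);

        /* The strict '>' keeps the earlier rule when two matches tie. */
        if (n > 0 && strncmp(input, rules[i], n) == 0 && n > best_len) {
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With the rules "+" and "++" in that order, the input "++;" selects the second rule with a length of 2, which corresponds to the analyzer returning INC rather than PLUS.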
lex has a number of mechanisms that help deal with problems of context sensitivity.
Trailing context
A potential problem exists when the lexical analyzer must read characters
beyond the pattern being sought because it cannot be sure it has found the
pattern until some additional information is known about the context in which
it appears. A classic example of this involves the DO statement in FORTRAN.
Consider the following DO statement:
    DO 50 k = 1 , 20 , 2
Consider the sequence consisting of the characters preceding the comma in
the statement above:
    DO 50 k = 1
Because FORTRAN ignores blanks, this sequence might be equivalent to the
assignment statement:
    DO50k = 1
It is not possible to know that the "1" is the initial value of the index "k", and
that the characters "DO" are a keyword, not the first two characters of an
identifier, until the first comma is read. Therefore the lexical analyzer would
not always interpret the string "DO" in the desired way if the lex specification
contained the rule:
    DO    { return(DOTOK); }
The way to handle this is to use the slash, (/) which signifies that what follows
is trailing context. Trailing context is a second pattern that is expected to fol­
low the token that is being searched for. The token is not matched unless the
trailing context is also matched. The string that matches the trailing context is
not stored in yytext, because it is not part of the token itself. The pattern to
recognize the FORTRAN DO statement could be
    DO/[ ]*[0-9]+[ ]*[a-zA-Z0-9]+[a-zA-Z0-9]*=[ ]*[a-zA-Z0-9]+[a-zA-Z0-9]*[ ]*,
(To simplify the example, the rule accepts an index name of any length.) The
"$" operator, discussed in the section on regular expressions, is a form of
trailing context. Note that this operator is not exactly the same as a newline,
\n. Consider the specification
    hello\n    { printf("%s", yytext); }
    world$     { printf("%s", yytext); }
The token matched by the first rule is printed with the newline still attached
as part of the token. The token matched by the second rule is printed without
a newline; a newline matched by "$", like any trailing context, is not part of
the token.
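The distinction between the token and its trailing context can be sketched in C for the simple case where both are literal strings. The function name match_with_context( ) is invented for this illustration; lex handles trailing context for full regular expressions, which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* Return the token length if input begins with token immediately followed
 * by context, otherwise 0. The context is examined but not counted, just
 * as text matched after "/" or by "$" is left out of yytext. */
size_t match_with_context(const char *input, const char *token,
                          const char *context)
{
    size_t tlen = strlen(token);
    size_t clen = strlen(context);

    if (strncmp(input, token, tlen) != 0)
        return 0;                 /* the token itself is not there */
    if (strncmp(input + tlen, context, clen) != 0)
        return 0;                 /* the token is there, the context is not */
    return tlen;                  /* only the token is consumed */
}
```

For the input "world" followed by a newline, match_with_context(input, "world", "\n") returns 5: the newline is required for the match but remains in the input.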
Starting state
lex allows you to set a kind of flag called a starting state that designates cer­
tain rules as applying only when that state is active. There are a number of
steps that have to be followed to employ this mechanism.
For the example that follows, assume that you are dealing with a program­
ming language in which programs start with the keyword #go and end with
the keyword #stop, and which has a keyword while. Assume also that the
input text contains pieces of code interspersed with blocks of text. In this
case, there needs to be a way for the lexical analyzer to distinguish between
the word while as a keyword and as an ordinary word.
     1  %{
     2  #include "defs.h"
     3  %}
     4  %start PROG
     5  %%
     6  #go            { BEGIN PROG; }
     7  <PROG>while    { return(WHILE); }
     8  ...
     9  [a-zA-Z]*      { put_in_tab(yytext); }
    10  <PROG>#stop    { ECHO; BEGIN 0; }
    11  %%
    12  int put_in_tab()
The line in the declarations section beginning with %start (line 4) is necessary
to define the state PROG. This state indicates to the lexical analyzer that it is
reading code, not text.
The first rule (line 6) determines that when the analyzer reads the keyword
#go it activates the state PROG by means of the BEGIN macro followed by the
state name.
A rule is associated with a state by prefacing it with the state name enclosed
in angle brackets, '<' and '>'. A rule that has been so designated is applied if
and only if that state is active. According to rule two (line 7), if the state PROG
is active, then if the character sequence while is seen, a token is returned indi­
cating that it is a keyword.
A rule that is not prefaced by a state name is applied no matter which, if any,
state is active. Rule three (line 8) is an example of this. Rule four (line 9) in the
example is also not associated with any state. It will not, however, match the
word while if the state PROG is active. This is because that word will have
matched the pattern in the earlier rule, rule two, according to the first disam­
biguating rule discussed in the section on "Disambiguating rules".
Rule five (line 10) in the specification deactivates the state PROG if that state is
active and the keyword #stop is read. "BEGIN 0 " deactivates the current state
and does not make any other state active.
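A flag variable can imitate the effect of the PROG state in ordinary C. The following sketch assumes, for simplicity, that the input is already split into words separated by white space and is shorter than 512 characters; count_keyword_while( ) and in_prog are names invented for the illustration.

```c
#include <string.h>

/* Count the occurrences of "while" that fall between "#go" and "#stop".
 * The variable in_prog plays the role of the PROG starting state. */
int count_keyword_while(const char *text)
{
    char copy[512];
    char *word;
    int in_prog = 0;      /* 0 corresponds to the state after BEGIN 0 */
    int keywords = 0;

    /* strtok( ) modifies its argument, so work on a private copy. */
    strncpy(copy, text, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (word = strtok(copy, " \t\n"); word != NULL;
         word = strtok(NULL, " \t\n")) {
        if (strcmp(word, "#go") == 0)
            in_prog = 1;                      /* like BEGIN PROG */
        else if (in_prog && strcmp(word, "#stop") == 0)
            in_prog = 0;                      /* like BEGIN 0 */
        else if (in_prog && strcmp(word, "while") == 0)
            keywords++;                       /* like <PROG>while */
        /* any other word is ordinary text */
    }
    return keywords;
}
```

In the text "wait a while #go while x #stop while", only the second while is counted as a keyword.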
Programming techniques such as flag variables may also be used to mark
context-sensitive conditions.
lex I/O routines
Some actions may require reading another character, putting a character back
into the input stream, or writing a character to the standard output. lex sup­
plies three functions to handle these tasks: input( ), unput( ), and output( ),
respectively. input( ) takes no arguments; unput( ) and output( ) take a single
character-valued argument.
The following example illustrates the use of input( ) and unput( ). The
subroutine skipcmnts( ) is used to ignore comments in a language like C, where
comments occur between "/*" and "*/":
    %%
    "/*"    skipcmnts();
    %%
    skipcmnts()
    {
        for (;;) {
            while (input() != '*')
                ;
            if (input() != '/') {
                unput(yytext[yyleng-1]);
            } else
                return;
        }
    }
After the token "/*" is read, the lexical analyzer continues reading characters
until an asterisk (*) is found. If the character after the asterisk is a "/", the
function returns. Otherwise, that character is returned to the input stream and
the function keeps on reading characters. The important thing to note here is
that the analyzer does not try to match any patterns with the characters that
are read by input( ). When it resumes pattern matching, after the function
returns, it starts with the first character in the input stream after the characters
read by the subroutine.
There are three other things to note in this example. First, the unput( )
function (which puts back the last character read) is necessary to avoid missing
the final "/" if the comment ends with "**/". In this case, having read an "*",
the analyzer finds that the next character is not the terminal "/" and must read
some more. Second, the expression yytext[yyleng-1] refers to the last
character read. Third, this routine assumes that the comments are not nested, as is
indeed the case with the C language. If, unlike in C, they are nested in the
source text, after reading the first "*/" ending the inner group of comments, the
lexical analyzer reads the rest of the comments as if they were part of the
input to be searched for patterns.
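The same comment-skipping logic can be tried outside lex on a string held in memory. In this sketch the array index i stands in for the input stream: advancing it corresponds to input( ), and examining a character without advancing has the effect of reading it and putting it back with unput( ). The function name strip_comments( ) is an assumption of the example, and, unlike skipcmnts( ), the sketch also stops at the end of the text if a comment is never closed.

```c
#include <stddef.h>

/* Copy src to dst, leaving out C-style comments. dst must be at least as
 * large as src. Comments are assumed not to be nested. */
void strip_comments(const char *src, char *dst)
{
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        if (src[i] == '/' && src[i + 1] == '*') {
            i += 2;                       /* the token that opens the comment */
            while (src[i] != '\0') {
                if (src[i++] != '*')
                    continue;             /* keep reading up to an asterisk */
                if (src[i] == '/') {
                    i++;                  /* asterisk then slash: comment ends */
                    break;
                }
                /* The character after the asterisk was only examined, so
                 * it is read again on the next pass (the unput case). */
            }
        } else {
            dst[o++] = src[i++];          /* ordinary text is copied */
        }
    }
    dst[o] = '\0';
}
```

Applied to the text a/* x **/b, the function produces ab: the second asterisk is examined again after the first one fails to end the comment, so the closing slash is not missed.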
To handle special I/O needs, such as writing to several files, standard I/O
routines in C can be used to rewrite the functions input( ), unput( ), and out­
put( ). These and other programmer-defined functions should be placed in the
subroutine section. The new routines will then replace the standard ones.
lex's input( ) is equivalent to getchar( ), and output( ) is equivalent to
putchar( ).
Routines to reprocess input
There are a number of lex routines that let you handle sequences of characters
that are to be processed in more than one way.
The text matching a given pattern is stored in the yytext array. In general,
after the action associated with it is performed, the characters in yytext are
overwritten with succeeding characters in the input stream to form the next
match. The function yymore( ) prevents this overwriting, and causes the
characters matching the next pattern to be appended to those already in yytext.
This allows you to process in the same action a set of characters associated
with two (or more) successive pattern matches. It is useful when one string of
characters is significant and a longer string, that includes the first, is signifi­
cant as well.
Consider a character string bounded by B's and interspersed with one "B" at
an arbitrary location. For example,
    BiiiBjjjB
You may want to count the number of characters between the first and second
"B" and add it to the number of characters between the second and third "B",
and print the result. (The last "B" is not to be counted.)
Code to do this is:
    %{
    int flag = 0;
    %}
    %%
    B[^B]*    {
                  if (flag == 0) {
                      flag = 1;
                      yymore();
                  } else {
                      flag = 0;
                      printf("%d\n", yyleng);
                  }
              }
The variable flag is used to distinguish the character sequence terminating
just before the second " B " from that terminating just before the third. If the
input text consists of the "word" BoomBoxB, then the pattern first matches the
string Boom, causing that string to be put into yytext. Next, the pattern
matches the string Box, which would normally cause only that string to be put
into yytext. However, yymore( ) was called in the action associated with the
previous pattern match, so yytext contains BoomBox, and yyleng consequently
equals 7.
The function yyless( ) resets the end point of the current token. yyless( ) takes
a single integer argument: yyless(n) causes the current token to consist of the
first n characters of what was originally matched (that is, up to yytext[n-1]).
The remaining yyleng-n characters are returned to the input stream. Consider
the following specification:
    [A-Z][a-z]*    { if (yyleng > 5)
                         yyless(5);
                   }
    [a-z]*         printf("%s", yytext);
The lexical analyzer generated from this specification removes the first 5
letters from any word that starts with an uppercase letter.
REJECT allows the lexical analyzer to try to match the current token against
the remaining patterns in the specification. Its function is the same as if
yyless(O) were executed (that is, all the characters in the token were returned
to the input stream), except that pattern matching resumes at the pattern
following the current one, rather than at the first pattern in the specification.
If you want to count the number of occurrences of both the regular expression
snapdragon and its subexpression dragon, the following works:
    snapdragon    { countflowers++; REJECT; }
    dragon        countmonsters++;
As an example of one pattern overlapping another, the following counts the
number of occurrences of the expressions comedian and diana, even where the
input text has sequences such as comediana:
    comedian    { comiccount++; REJECT; }
    diana       princesscount++;
Note that the actions here may be considerably more complicated than simply
incrementing a counter.
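The totals that these REJECT examples are meant to produce can be checked with a short C function that counts every position at which a word occurs, whether or not the text has already been claimed by another word. This is only a cross-check written with strstr( ); it is not a description of how REJECT works internally, and count_occurrences( ) is an invented name.

```c
#include <string.h>

/* Count every position in text at which word occurs, including
 * occurrences that overlap text already matched by another word. */
int count_occurrences(const char *text, const char *word)
{
    int count = 0;
    const char *p = text;

    if (*word == '\0')
        return 0;              /* an empty word is not counted */
    while ((p = strstr(p, word)) != NULL) {
        count++;
        p++;                   /* resume one character later, not after the match */
    }
    return count;
}
```

For the text comediana, the function reports one occurrence of comedian and one of diana, even though the two words share the letters "dian".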
End-of-file processing
The routine yywrap( ) is used to deal with end-of-file processing. In its default
form, yywrap( ) simply returns 1 if the end-of-file has been reached, and 0 oth­
erwise. A user-defined yywrap( ) may be substituted to provide some other
action at the end of input. This routine should be linked before the lex library.
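A user-defined yywrap( ) might look like the following sketch, which prints a summary when the input is exhausted. The counter lineno is assumed to be maintained by a rule elsewhere in the specification, such as the line-counting rule shown earlier; it is defined here only so that the fragment is complete.

```c
#include <stdio.h>

int lineno = 0;    /* assumed to be incremented by a rule such as \n */

/* Replacement for the default yywrap( ): report a total, then return 1
 * to tell the analyzer that no further input follows. */
int yywrap(void)
{
    printf("%d lines read\n", lineno);
    return 1;
}
```

Returning 1 ends the scan, as the default version does at the end of the input.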
Using lex with yacc
If you work on a compiler project or develop a program to check the validity
of an input language, you may want to use the UNIX System program tool
yacc. lex can conveniently be used with yacc to develop compilers. (For a
more complete discussion, see the chapter on yacc in this guide.) Whether or
not you plan to use lex with yacc, this section is useful because it covers infor­
mation of interest to all lex programmers.
No matter what kind of parser you are using, the lexical analyzer generated
by lex is invoked through a call to the function yylex( ) . yylex is an integer­
valued function. This name is convenient because the built-in function call
that yacc-generated parsers use to invoke the lexical analyzer is also yylex( ) .
Returning tokens
If your lexical analyzer is to be used along with a parser, each lex action
should be ended with a return statement of the form
    return token;
or
    return(token);
Here, token is an integer-valued variable, literal, or macro, or a character
enclosed in single quotes. The value returned indicates to the parser what
kind of token the lexical analyzer has found. The parser then resumes control
and later makes another call to the lexical analyzer (via a call to yylex( )) when
it needs another token.
The different return values indicate to the parser whether the token is a
reserved word, identifier, constant, arithmetic operator, relational operator, or
other token type. In some cases, such as reserved words, and sometimes
operators, the return value indicates which token was recognized: for exam­
ple, a value of 16 might indicate the reserved word else in a C compiler. In
the other cases, especially identifiers and constants, there needs to be another
mechanism for telling the parser the exact value of the token. Consider the
following portion of lex source for a lexical analyzer for a hypothetical pro­
gramming language:
     1  %{
     2  #include "token.defs"
     3  extern int tokval;
     4  %}
     5  %%
     6  begin                   return(BEGIN);
     7  end                     return(END);
     8  while                   return(WHILE);
     9  if                      return(IF);
    10  package                 return(PACKAGE);
    11  reverse                 return(REVERSE);
    12  loop                    return(LOOP);
    13  "("                     return('(');
    14  ")"                     return(')');
    15  [a-zA-Z][a-zA-Z0-9]*    { tokval = put_in_tabl();
    16                            return(IDENTIFIER); }
    17  [0-9]+                  { tokval = put_in_tabl();
    18                            return(INTEGER); }
    19  "+"                     { tokval = PLUS;
    20                            return(ARITHOP); }
    21  "-"                     { tokval = MINUS;
    22                            return(ARITHOP); }
    23  ">"                     { tokval = GREATER;
    24                            return(RELOP); }
    25  ">="                    { tokval = GREATEREQL;
    26                            return(RELOP); }
    27  %%
    28  put_in_tabl()
The identifiers BEGIN, END, and WHILE, used as return values would typi­
cally be integer-valued macros defined in some header file: in this case,
token.defs (see line 2). The header file would look something like this:
    #define BEGIN
    #define END
    #define WHILE
    #define RELOP
If it becomes necessary to change the value of some token type, then a #define
statement in the header file is changed. If using yacc to generate the parser,
you may insert the following statement into the definitions section of your lex
specification:
    #include "y.tab.h"
NOTE yacc, with the -d option, generates y.tab.h on the basis of the yacc
specification, which includes token declarations. If lex and yacc are used
together, presumably the same token set is used with both. If the file y.tab.h
is included in the lex source, then yacc must be run before the lex-generated
file is compiled.
To indicate the reserved words in the example, the returned integer values
suffice. If a token consists of a single character, like the "(" and ")" tokens in
lines 13 and 14 of the example, its literal value may be passed to the parser in
the return statement. For the other token types (lines 15-26), the integer value
identifying the token is stored in the programmer-defined variable tokval.
This variable is globally defined so that the parser as well as the lexical
analyzer can access it. If the parser has been generated by yacc, the variable
yylval should be used for this purpose. Parsers generated by yacc look in that
variable for the value of the token. yylval is defined in a file generated by
yacc. It is an integer variable by default, but it can be redefined as a C union
in the yacc specification.
Using a symbol table
This example illustrates the use of a symbol table. The symbol table is a data
structure that is globally accessible (or at least accessible by the parser); it
stores the text of most or all of the tokens found in a particular input and asso­
ciates them with identifying indexes. The function put_in_tabl( ) helps main­
tain the symbol table. This function may be defined in the subroutines section
of the lex specification, or in an included file. The implementation details of
this function are not important here, but it may be understood to carry out the
following tasks:
When a token is recognized, put_in_tabl( ) takes the value in yytext and
compares it with all the strings currently in the symbol table.
If it finds that value already in the table, put_in_tabl( ) returns the index of
the value.
The index is a unique identifying integer: no other identifier may have the
same index. The index could be an array subscript, but the details of this
are unimportant to us as long as the value is unique.
If the current token does not already appear in the symbol table, a new
entry is created for it and an index is generated. (The function may store
some other information in the symbol table such as whether the token is an
identifier, constant, or other token type.) put_in_tabl( ) then returns the
index.
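The tasks listed above can be sketched in C as follows. This is one possible arrangement, not the implementation assumed by the example: the function is named put_in_tabl_sketch( ) to keep it distinct, it receives the token text as an argument instead of reading yytext directly, and the table sizes are arbitrary assumptions.

```c
#include <string.h>

#define MAXSYMS   256    /* assumed capacity of the table */
#define MAXSYMLEN 64     /* assumed longest token, including the terminator */

static char symtab[MAXSYMS][MAXSYMLEN];
static int nsyms = 0;

/* Return the index of text in the table, adding it first if it is new.
 * Returns -1 if the table is full or the token is too long. */
int put_in_tabl_sketch(const char *text)
{
    int i;

    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], text) == 0)
            return i;              /* already present: reuse its index */

    if (nsyms == MAXSYMS || strlen(text) >= MAXSYMLEN)
        return -1;
    strcpy(symtab[nsyms], text);
    return nsyms++;                /* the array subscript serves as the index */
}
```

In a lex action the call would be written as tokval = put_in_tabl_sketch(yytext). A linear search is adequate for a small table; a hash table would be the usual choice for a large one.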
Different parts of the program know about the text of an identifier by way of
the index stored in tokval. For example, when the parser receives a value
from the lexical analyzer indicating that an identifier (line 15) has been
recognized, it can find the string in the symbol table which has an index equal to
the value in tokval.
Note that the example shows two ways to assign a value to tokval.
put_in_tabl( ) may assign a symbol table index to tokval. In the last few
actions of the example (lines 15-26), tokval is assigned a specific integer (in the
form of a defined macro) indicating which operator the analyzer recognized.
For example, in the last rule in the specification (lines 25 and 26), the lexical
analyzer indicates the general class of operator by returning the value RELOP
to the parser. This enables the parser to check the input for syntactic correct­
ness. Further, the lexical analyzer indicates the specific operator by assigning
the value GREATEREQL to tokval. The parser has some way of associating this
value with the string >= . Quite possibly the symbol table will have a set of
base entries that includes >= . The definition of GREATEREQL, and the other
predefined token indexes will not necessarily appear in the same set of
definitions as the token types.
Using lex under UNIX systems
This section explains how to create C code from lex specifications, then how
to compile and run the lexical analyzer program.
Running lex
If lex.l is the file containing the lex specification, the C source for the lexical
analyzer is produced by running the command:
    lex lex.l
lex produces a C file called lex.yy.c.
There are several options available with the lex command. If you use one or
more of them, place them between the command name lex and the filename
argument.
The -t option sends lex's output to the standard output rather than to the file
lex.yy.c.
The -v option prints out a small set of statistics describing the so-called finite
automata that lex produces with the C program lex.yy.c. (For a detailed
account of finite automata and their importance to lex, see the Aho, Sethi, and
Ullman book, Compilers: Principles, Techniques, and Tools, Addison-Wesley.)
lex uses a table (a two-dimensional array in C) to represent its finite
automaton. The maximum number of states that the finite automaton requires is set
by default to 500. If your lex source has a large number of rules or the rules
are very complex, this default value may be too small. You can enlarge the
value by placing the following entry in the definitions section of your lex
specification:
    %n 700
This entry tells lex to make the table large enough to handle as many as 700
states. (The -v option indicates how large a number you should choose.) If
you need to increase the maximum number of state transitions beyond
2000, the designated parameter is a, for example:
    %a 2800
You may refer to the Programmer's Reference page on lex for a list of all the
options available with the lex command.
The file lex.yy.c may be compiled and linked in the same way as any C
program. The -ll option is used to link the object file created from this C source
with the lex library:
    cc lex.yy.c -ll
The lex library provides a default main( ) program that calls the lexical
analyzer under the name yylex( ), so you do not have to supply your own
main( ).
If you have the lex specification spread across several files, you can run lex on
each of them individually, but be sure to rename or move each lex.yy.c file
before you run lex on the next one. Otherwise, each file overwrites the previ­
ous one. Once you have generated all the C files, you can compile all of them
in one command line.
To compile and link the output files produced by lex and yacc, run:
    cc lex.yy.c y.tab.c -ly -ll
Note that the yacc library is linked (with the -ly option) before the lex library
(with the -ll option) to ensure that the main( ) program supplied will call the
yacc parser.
By default, the lexical analyzer takes input from the standard input. To have
it take input from a file, use redirection; for example:
    a.out < inputfile
Here a.out is the executable lexical analyzer and inputfile stands for the name
of the file to be read.
Output is sent to the standard output. You can redirect this as well:
    a.out < inputfile > text.out
Using make with lex
The make utility can be used to maintain programs that involve lex. make
assumes that a file that has an extension of .l is a lex source file. It knows how
such a file must be processed to create an object file.
Suppose that a list of dependencies in a makefile contains a filename x.o, and
there exists a file x.l. If x.l was modified later than the file x.c (or if x.c does not
exist), then make will cause lex to be run on x.l, and then cause the object file
x.o to be created from the resulting lex.yy.c. The make internal macro LFLAGS
can be used to specify lex options to be invoked automatically by make. (See
the chapter on make in this guide for more information.)
Chapter 9
yacc is a "parser generator"; it generates C code that can be compiled to
produce a parser. A parser is a program that examines the input stream and,
possibly among other things, checks whether it is syntactically correct
according to a given grammar.
yacc's input typically consists of:
a grammar: a set of rules describing the expected syntactic structure of the
input to the parser
actions: C code to be invoked when a rule is recognized
auxiliary declarations and subroutines
A yacc-generated parser calls a low-level input scanner, called a lexical
analyzer. This routine reads the input stream and separates it into items
called "tokens". The sequence of tokens that the parser receives from the lexi­
cal analyzer is compared against the grammar rules. When a rule is recog­
nized, an action (code that the user has supplied for this rule) is executed.
Actions can return values, use values returned by previous actions, and carry
out any other operations possible in C.
The nucleus of the yacc specification is the collection of grammar rules. Each
rule describes a construct and gives it a name. For example, the following rule
defines a symbol "date" in terms of other symbols "month", "day", and "year":
date : month day ',' year ;
The symbols to the right of the colon will have been defined as tokens, defined
in other rules in the specification, or else will be literals such as the comma in
the rule above. In the example, the comma is enclosed in single quotes,
indicating that the comma is to appear literally in the input. The colon and
semicolon serve as punctuation in the rule and have no significance in evaluating
the input. Input such as the following would be matched by this rule:
April 16, 1961
This chapter discusses the following topics:
preparing a yacc specification
the parser operation
operator precedences in arithmetic expressions
error detection and recovery
the operating environment and special features of the parsers yacc produces
suggestions to improve the style and efficiency of the specifications
examples of yacc usage
Basic specifications
A yacc specification consists of a mandatory rules section, and optional
sections for definitions and user subroutines.
The definitions section, if present, must be the first section in the yacc
program. The mandatory rules section follows the definitions; if there are no
definitions, then the rules section is first. In both cases, the rules section must
start with the delimiter %%. If there is a subroutines section, it follows the
rules section and is separated from the rules by another %% delimiter. If
there is no second %% delimiter, the rules section continues to the end of the
file.
When all sections are present, a specification file has the format:
declarations
%%
rules
%%
subroutines
The following example is a complete yacc specification:
%union
{
        char *text;
        int ival;
}
%token <ival> t_DAY
%token <text> t_MONTH
%token <ival> t_YEAR
%%
date : t_MONTH t_DAY ',' t_YEAR
        { print_date($2, $1, $4); }
     ;
%%
void print_date(d, m, y)
char *m;
int d, y;
{
        printf("%d %s %d\n", d, m, y);
}
The sample program generates a parser which takes input in the form:
month day, year
This input is converted to output in the form:
day month year
In the example, the declarations section defines a data structure used to hold
the values associated with tokens, and declares all the token names used in
the rules section. The rules section contains one rule and an action associated
with it. The subroutines section defines a function that is called in the action.
The parser uses a lexical analyzer that can return the tokens t_DAY,
t_MONTH, and t_YEAR, and also can associate each token with some value.
The mechanisms by which this may be implemented are discussed later in
this chapter, and also in the chapter on lex in this guide.
While the lexical analyzer may be included as a routine defined in the
specification file, it would be more usual to define it elsewhere.
Blanks, tabs, and newlines are ignored in a yacc specification, but they cannot
appear in names or in multicharacter reserved symbols. Comments can
appear wherever a symbol is legal. They are enclosed in /* */, as in C.
The rules section
This section describes the grammar rules and actions that appear in the rules
section of a yacc specification.
Terminal and non-terminal symbols
The tokens that are recognized by the lexical analyzer and passed on to the
parser are referred to as "terminal symbols", because they cannot be broken
down into smaller units. The terms "token" and "terminal symbol" are
essentially synonymous. By contrast, symbols that can be broken down into other
symbols are called "non-terminal symbols". Symbols of each type are also
called "names".
The rules section is made up of one or more grammar rules, each of which
has the form:
name : definition ;
The purpose of the rule is to define name in terms of other symbols. In the
example, name is a non-terminal symbol, as are all symbols that appear to the
left of the colon in some rule. The definition of name can consist of a sequence
of terminal symbols, a sequence of other non-terminal symbols, or a sequence
of both terminal and non-terminal symbols. Non-terminal symbols that
appear in the definition of another symbol are still regarded as non-terminal
symbols in that context. The colon and the semicolon are yacc delimiters: the
colon separates the non-terminal symbol on the left from its definition and the
semicolon must be the last character in the rule.
If actions (discussed later in this chapter) are associated with the rule, they
can appear between the colon and the semicolon.
Symbols can be any length and are composed of letters, dots, underscores,
and digits, although a digit cannot be the first character of a symbol. Upper­
case and lowercase letters are distinct. The NULL character (\0 or 0) should
never be used in a grammar rule.
Literals and escape sequences
A literal in a definition consists of a character enclosed in single quotes (').
Literal characters must be passed to the parser by the lexical analyzer, and are
considered to be tokens. As in C, the backslash (\) is an escape character
within literals; all the C escapes are recognized. The following escapes are
understood by yacc:
'\n'      newline
'\r'      return
'\''      single quote (')
'\\'      backslash (\)
'\t'      tab
'\b'      backspace
'\f'      form feed
'\nnn'    a character in octal notation
Alternation in grammar rules
If there are several grammar rules with the same left-hand side, the vertical
bar (|) can be used to combine them and thereby avoid rewriting the left-hand
side. The semicolon at the end of a rule is dropped before a vertical bar.
The following constructions are equivalent:
A : B C D ;
A : E F ;
A : G ;
and
A : B C D
  | E F
  | G
  ;
It is not necessary for all grammar rules with the same left side to appear
together in the grammar rules section, although if they do the input will be
more readable and easier to change.
Empty rules
If a non-terminal symbol matches the empty string, this can be indicated by
an empty definition, such as the following:
epsilon : ;
Representing constructs as tokens or non-terminals
Depending on the situation, you can recognize constructs using either the
lexical analyzer or grammar rules. For example, the following rules can be used
to define the symbol month.
month : 'J' 'a' 'n' ;
month : 'F' 'e' 'b' ;
        . . .
month : 'D' 'e' 'c' ;
In this example, month is a non-terminal symbol. The lexical analyzer only
needs to recognize individual letters, and therefore may be very simple.
However, such low-level rules are wasteful and result in a complicated
specification. To avoid this problem, have the lexical analyzer recognize strings
such as "January" and return an indication that a "month" token was seen. In
that case, "month" is a terminal symbol and the detailed rules are not needed.
Erroneous specifications
In some cases, yacc fails to produce a parser when given a set of specifica­
tions. The specifications may be self-contradictory or they may require a
more powerful recognition mechanism than that available to yacc. The
former cases represent design errors; the latter often can be corrected by
making the lexical analyzer more powerful or rewriting some of the grammar
rules.
Actions
An action is a set of C statements enclosed in curly braces, '{' and '}'. The user
can associate one or more actions with each grammar rule; these actions are
performed when the rule is recognized. An action can appear anywhere in the
list of symbols that define a rule, including before the first symbol: usually,
actions follow the final symbol in the definition. Actions can return values,
obtain the values returned by previous actions, and use values for tokens
returned by the lexical analyzer. They can also carry out other tasks that can
be programmed in C, such as doing input and output, calling subroutines,
modifying variables, and so on.
The following are examples of grammar rules with actions:
A : '(' B ')'
        { hello(1, "abc"); }
  ;
XXX : YYY ZZZ
        {
          (void) printf("a message\n");
          flag = 25;
        }
    ;
Values of symbols
Each symbol in a grammar rule, including the one to the left of the colon, may
have some value associated with it. In the case of a terminal symbol, this
value can be assigned by the lexical analyzer (for example, the literal value of
an identifier). Non-terminal symbols recognized by the parser can have values
associated with them by parser actions. These values can be numbers, text, or
another kind of data structure.
The dollar-sign symbol ($) is used in actions to access the value of a symbol.
The pseudo-variable $$ represents the value returned by the action. For
example, the following action returns the value 1:
{ $$ = 1; }
If the action follows the final symbol in the definition, then $$ is the value
associated with the symbol on the left of the colon, and will be that symbol's
value when it appears on the right of the colon in another grammar rule.
To obtain the values returned by previous actions and the lexical analyzer, the
action can use the pseudo-variables $1, $2, ..., $n. $n refers to the value of the
nth symbol or action to the right of the colon. In the following rule, $2 is the
value returned by C, and $3 is the value returned by D.
A : B C D ;
Consider the rule:
expr : '(' expr ')' ;
One would expect the value returned by this rule to be the value of the expr
within the parentheses. Since the first component of the rule is the literal
left parenthesis, the desired result can be obtained with the following action:
expr : '(' expr ')'
        { $$ = $2; }
     ;
The default action
By default, the value of a rule (that is, the value assigned to the symbol to the
left of the colon) is the value of the first element in the definition ($1). Thus,
grammar rules such as the following example, which has only one symbol to
the right of the colon, often need not have an explicit action:
A : B ;
This example is equivalent to:
A : B
        { $$ = $1; }
  ;
Actions in the middle of rules
In previous examples, all actions came at the end of rules. Sometimes it is
desirable to have an action take place before a rule is fully parsed. yacc per­
mits an action to be written in the middle of a rule as well. This action can
return a value that is accessible by the actions to the right of it through the
usual $ mechanism. In turn, it can access the values returned by the symbols
or actions to its left. The following example of such a rule sets x to 1, and sets
y to the value returned by C.
A : B
        { $$ = 1; }
    C
        { x = $2; y = $3; }
  ;
The first action is given a value via the assignment to $$. Because that action
is the second component of the list to the right of the colon, its value is
referred to in subsequent actions as $2. The value returned by C, which would
normally have been $2, is now $3.
yacc treats the previous example as if it were written as follows, where ACT is
a new non-terminal symbol defined by an empty rule:
ACT : /* empty */
        { $$ = 1; }
    ;
A : B ACT C
        { x = $2; y = $3; }
  ;
Accessing left-context symbols
The following discussion is somewhat advanced and therefore should be
given careful examination.
An action associated with the left-hand symbol in a rule may need to refer to
values associated with symbols that occurred before the current left-hand
symbol. These values are referred to as left-context values, because they are
associated with symbols that appeared to the left of the current left-hand sym­
bol in another rule in the specification. Consider the following yacc specifica­
tion for a grammar that recognizes dates.
%union {
        char *text;
        int ival;
}
%token <text> t_MONTH
%token <ival> t_DAY t_YEAR
%type <text> month
%type <ival> day year
%%
date  : year day month
      ;
month : t_MONTH
        { if (!strcmp($1, "February"))
              if ($<ival>0 == 29 && ($<ival>-1) % 4 != 0)
                  printf("Too many days!\n");
          $$ = $1;
        }
      ;
day   : t_DAY
        { $$ = $1; }
      ;
year  : t_YEAR
        { $$ = $1; }
      ;
In this example the lexical analyzer routine associates an integer value with
the tokens t_DAY and t_YEAR, and a character string with the token
t_MONTH.
The action associated with the symbol "month" checks whether "February 29"
occurs in a non-leap year. To do so, it needs to know what values are
associated with the "day" and "year" symbols. These symbols appear to the left of
"month" in the first rule in the specification, and so their values are
left-context values with respect to the symbol "month".
There are two constructions for accessing left-context values. The value
associated with the symbol immediately to the left of the current left-hand symbol
is referred to as $0. Values farther to the left are referred to by constructions of
the form $-n. For example, the pseudo-variable $-1 refers to the value
associated with the symbol which is two symbols to the left of "month". In general,
the pseudo-variable $-n refers to the value associated with the symbol that is
n+1 symbols to the left of the current symbol. In the action associated with
"month" in the example, $0 refers to the value associated with "day", and $-1
refers to the value associated with "year".
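One way to picture left-context access is as indexing into the value stack
below the entries that belong to the current rule. The following C sketch is
illustrative only: the names value_t, dollar, and too_many_days are
hypothetical and are not part of yacc, and the code that yacc actually
generates differs in detail.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical value type; a real parser uses the type declared with %union. */
typedef union {
    char *text;
    int   ival;
} value_t;

/*
 * Return the value-stack entry that $n denotes inside an action.
 *   top      entry for the last right-hand-side symbol recognized so far
 *   rhs_len  number of right-hand-side symbols recognized so far
 *   n        the number written after the dollar sign (zero or negative
 *            values reach entries pushed before the current rule began)
 */
value_t *dollar(value_t *top, int rhs_len, int n)
{
    assert(n <= rhs_len);   /* symbols not yet seen have no value */
    return top + (n - rhs_len);
}

/* The test made by the "month" action, whose rule has one symbol. */
int too_many_days(value_t *top)
{
    int day  = dollar(top, 1, 0)->ival;     /* $0  */
    int year = dollar(top, 1, -1)->ival;    /* $-1 */

    return strcmp(dollar(top, 1, 1)->text, "February") == 0
        && day == 29 && year % 4 != 0;
}
```

With the values for "year", "day", and t_MONTH stored in that order, $1 is the
top entry, $0 is the entry below it, and $-1 is the entry below that.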
Parse trees
In many applications, output is not done directly by the actions. A data
structure, such as a parse tree, is constructed in memory, and transformations are
applied to it before output is generated. Parse trees are particularly easy to
construct, given routines to build and maintain the tree structure desired.
Suppose there is a C function node written so that the following call
creates a node with label L and descendants n1 and n2, and returns a pointer
to the newly-created node:
node(L, n1, n2)
Then a parse tree can be built by supplying actions such as:
expr : expr '+' expr
        { $$ = node('+', $1, $3); }
     ;
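The node routine itself is left to the user. A minimal sketch of one possible
version follows; the structure layout and the helper count_nodes are
assumptions made for illustration, and a parser that uses it would also need
a value type (declared with %union) that can hold a pointer to struct node.

```c
#include <stdlib.h>

/* Hypothetical tree node: a label and two descendants. */
struct node {
    int          label;
    struct node *left;
    struct node *right;
};

/*
 * Create a node with label L and descendants n1 and n2.
 * Returns a pointer to the new node, or NULL if no memory is available.
 */
struct node *node(int L, struct node *n1, struct node *n2)
{
    struct node *np = malloc(sizeof *np);

    if (np == NULL)
        return NULL;
    np->label = L;
    np->left  = n1;
    np->right = n2;
    return np;
}

/* Count the nodes in a tree; an empty tree has none. */
int count_nodes(const struct node *np)
{
    if (np == NULL)
        return 0;
    return 1 + count_nodes(np->left) + count_nodes(np->right);
}
```

Leaves are created by passing NULL for both descendants.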
The declarations section
The declarations section is used to declare and describe constructs that are
needed by the parsing mechanism and the actions associated with rules in the
rules section.
Any token that appears in a rule in the rules section must be declared. There
are several ways of doing this: the most common way is by using a %token
statement. This has the form:
%token name1 name2 ...
Each name that appears after the keyword %token is thereby declared as a
token. You can declare several tokens on the same statement, and have several
such statements. Tokens can be declared in a similar way using the %left,
%right and %nonassoc keywords, discussed in the section "Precedence."
Every name that appears in the rules section, but is not defined in the declara­
tions section, is assumed to represent a nonterminal symbol. Every nontermi­
nal symbol must appear on the left side of at least one rule.
The start symbol
The start symbol is the top-level non-terminal symbol in the grammar. By
default, the start symbol is the left-hand side of the first grammar rule in the
rules section. You can declare a start symbol explicitly in the declarations
section, using the %start keyword:
%start somename
C declarations
C code to be used by the parser can appear in the declarations section,
enclosed between the delimiters '%{' and '%}'. Declarations made here have
global scope, so they are known to the action statements and can be made
known to the lexical analyzer. This section is usually used for variable
declarations and #include statements, though other C code can appear here,
as shown in the following example:
%{
#include "global.h"
int ival = 0;
%}
Names beginning with yy should be avoided, because internal variables used
by the parser begin with these characters.
Support for arbitrary value types
By default, the values that parser actions associate with symbols are integers.
yacc can also support values of other types, including structures. You can
declare a C union that holds the different kinds of value that symbols can
have. The parser maintains a data structure called the value stack that is
declared to be of this union type. To declare the union, use a declaration in the
following form:
%union
{
        ... body of union
}
For example:
%union
{
        char *text;
        int ival;
        double dub;
}
In addition to the value stack, the external variables yylval and yyval are
declared to have type equal to this union. If yacc is invoked with the -d
option, the union declaration is defined under the name YYSTYPE in the file
y.tab.h.
Once YYSTYPE is defined, the union member names must be associated with
the various terminal and nonterminal names. This enables yacc to automati­
cally associate the right type with the pseudo-variables used in actions so that
the resulting parser is type-checked. For non-terminal symbols, this
association is done using the %type keyword. The following declarations associate
symbols with the members of the union in the example above:
%type <text> s1 s2
%type <ival> s3
%type <dub> s4
To associate a terminal symbol (token) with a union member name, the
%token keyword is normally used. The following declaration associates the
tokens s5 and s6 with the union member ival.
%token <ival> s5 s6
In some cases, these mechanisms are insufficient. For example, there is no
default type for the value returned by an action that occurs in the middle of a
rule. Similarly, yacc must be told explicitly about the type of left-context
values such as $0. In such cases, a type can be imposed by inserting a union
member name between angle brackets, '<' and '>', immediately after the first
"$" in a pseudo-variable. The following example shows this usage.
rule : aaa
        { $<intval>$ = 3; }
       bbb
        { fun($<intval>2, $<other>0); }
     ;
Other declarations
The keywords %left, %right, and %nonassoc can replace %token in the
preceding examples. However, these keywords are used principally to deal
with operator precedence and associativity. An understanding of precedence
and associativity relies heavily on the discussions which follow, and hence
consideration of these topics will be delayed until the section "Precedence".
The subroutines section
This section contains user-defined routines. Usually the routines appearing
here are used by actions that appear in the rules section. Additionally, a parser
generated by yacc must be supplied with three routines: yylex to perform lexi­
cal analysis, yyerror to deal with error messages, and a main( ) routine. These
routines can be supplied in the subroutines section.
The subroutines section is available as a convenience, because any functions
that appear there can also be included using an include directive in the
declarations section, or else linked to the parser module.
Lexical analysis
The user must supply a lexical analyzer to read the input stream and pass
tokens to the parser. This analyzer may also have to associate values with the
tokens and make these values known to the parser. A parser generated by
yacc calls an integer-valued function called yylex to provide tokens. The func­
tion returns an integer, the "token number'', that represents the kind of token
being read. If a value is associated with that token, the lexical analyzer should
assign it to the external variable yylval. A parser generated by yacc knows to
look in this variable for the values of symbols that are defined as tokens. The
variable yylval takes an integer value by default. It can be redefined to take on
other kinds of values, as explained in the earlier section "Support for arbitrary
value types."
The parser and the lexical analyzer must agree on these token numbers before
they can communicate. The numbers can be chosen by yacc or the user.
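As an illustration of this interface, the following is a minimal sketch of a
hand-written yylex for the date grammar. It rests on several assumptions that
are not taken from yacc itself: the token numbers 257 to 259 and the YYSTYPE
union are written out by hand here (normally they come from the file yacc
generates), the input is read from a string through a hypothetical pointer
named input rather than from the standard input, any alphabetic word is
treated as a month, and a four-digit number is treated as a year.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed token numbers and value type; normally generated by yacc. */
#define t_DAY   257
#define t_MONTH 258
#define t_YEAR  259

typedef union {
    char *text;
    int   ival;
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical input source: the text that remains to be scanned. */
const char *input = "";

int yylex(void)
{
    size_t len = 0;
    char *copy;

    while (*input == ' ' || *input == '\t' || *input == '\n')
        input++;                        /* skip white space */

    if (*input == '\0')
        return 0;                       /* end-marker */

    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)input[len]))
            len++;
        yylval.ival = atoi(input);      /* value that goes with the token */
        input += len;
        return len == 4 ? t_YEAR : t_DAY;
    }

    if (isalpha((unsigned char)*input)) {
        while (isalpha((unsigned char)input[len]))
            len++;
        copy = malloc(len + 1);
        if (copy == NULL)
            return 0;                   /* out of memory: stop the parse */
        memcpy(copy, input, len);
        copy[len] = '\0';
        yylval.text = copy;
        input += len;
        return t_MONTH;
    }

    /* Any other character is a literal token, such as ','. */
    return (unsigned char)*input++;
}
```

Each call returns one token number and, where the token carries a value,
leaves that value in yylval for the parser to pick up.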
lex and yylval
The lex utility is a program generation tool for constructing lexical analyzers.
Lexical analyzers produced by lex are designed to work with yacc parsers.
lex generates a lexical analyzer that is invoked by calling the function yylex.
The following example of a lex specification for a lexical analyzer can be used
with either of the parsers for recognizing dates, generated from the yacc
specifications given earlier in this chapter.
%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
char *p;
%}
mon     January|February|March|April|May|June|July|August|September|October|November|December
%%
{mon}           {
                  p = (char *) calloc(strlen(yytext)+1, sizeof(char));
                  strcpy(p, yytext);
                  yylval.text = p;
                  return (t_MONTH);
                }
[0-9]{1,2}      {
                  yylval.ival = atoi(yytext);
                  return (t_DAY);
                }
[0-9]{4}        {
                  yylval.ival = atoi(yytext);
                  return (t_YEAR);
                }
The analyzer returns a token t_MONTH, t_DAY, or t_YEAR when it recognizes
the corresponding sequence of characters. The lexical analyzer associates a
t_DAY or t_YEAR token with an integer value and a t_MONTH token with a
character string. The tokens are declared in the yacc specification and
subsequently defined in the file y.tab.h, generated by yacc with the -d option.
The previous example illustrates the use of yylval. This variable is defined as
a C union that has a member called text to point to character strings, and a
member ival to hold an integer value. This definition was performed in the
yacc specification.
In the action for the first pattern, the lexical analyzer puts the value of the
string that it matches into the array yytext. A copy is made of yytext, and
yylval.text is assigned a pointer to this value. If a pointer to yytext had been
assigned to yylval.text, a problem could arise because the value in yytext
could get overwritten by the lexical analyzer by the time yylval was used by
the parser.
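The difference between keeping a pointer to yytext and keeping a copy can be
shown in isolation. In the sketch below, the array yytext is only a stand-in
for the buffer maintained by lex (its size is an assumption), and save_text is
a hypothetical helper that does the same job as the calloc and strcpy calls in
the specification.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the buffer that the lexical analyzer reuses for each match. */
char yytext[64];

/* Hypothetical helper: return a private copy of s, or NULL if out of memory. */
char *save_text(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

If "April" is matched and then "16" is matched, a pointer to yytext now points
at "16", while the string returned by save_text still holds "April".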
The actions for the second and third patterns convert the matched string to an
integer and assign yylval.ival this value.
Token numbers
By default, token numbers are chosen by yacc. The default token number for
a literal character is the numerical value of the character in the local character
set. Other names are assigned token numbers starting at 257. These
definitions are placed in the file y.tab.h when yacc is run with the -d option.
If the user prefers to assign the token numbers, the first appearance of the
token name or literal in the declarations section must be followed immedi­
ately by a nonnegative integer. This integer is taken to be the token number of
the name or literal. Names and literals not defined this way are assigned
default definitions by yacc. The potential for duplication exists here, so care
must be taken to make sure that all token numbers are distinct.
The end-marker
The end of the input to the parser is signaled by a special token, called the
"end-marker". The end-marker is represented by either a zero or a negative
number. Thus, every lexical analyzer should be prepared to return 0 or a
negative number as a token upon reaching the end of its input. If the tokens
up to, but not including, the end-marker form a construct that matches the
start symbol, the parser function accepts the input and returns to its caller
after the end-marker is seen. If the end-marker is seen in any other context, it
is an error.
It is the responsibility of the user-supplied lexical analyzer to return the end­
marker when appropriate. Usually the end-marker represents some reasonably
obvious I/O status, such as end-of-file or end-of-record. The programmer
does not have to deal with this if using lex, which takes care of this job.
Reserved token names
Avoid using any token names in the grammar that are reserved or significant
in the C language or the parser. For example, the use of token names if or while
will almost certainly cause severe difficulties when the lexical analyzer is com­
piled. The token name error is reserved for error handling and should not be
used naively. One technique for avoiding this kind of difficulty is to preface
all token names with some string, such as t_ , that is unlikely to appear in any
reserved word.
The yacc environment
When yacc processes a specification, the output is a file of C code, named
y.tab.c. The function produced by yacc is called yyparse( ) and is an
integer-valued function. When yyparse( ) is called, it repeatedly calls yylex( ),
the lexical analyzer supplied, to obtain input tokens. If the lexical analyzer
returns the end-marker token and the parser accepts the input, yyparse( )
returns the value 0. If an error is detected and no error recovery is possible,
yyparse( ) returns the value 1.
A main( ) routine that calls yyparse( ) must be defined. In addition, a routine
called yyerror( ) is needed to print a message when a syntax error is detected.
The user must supply these two routines. A library has been provided with
default versions of main( ) and yyerror( ). The library is accessed by using the
-ly option to cc(CP) or ld(CP). The following source code shows the triviality
of these default programs.
main()
{
        return (yyparse());
}

#include <stdio.h>
yyerror(s)
char *s;
{
        (void) fprintf(stderr, "%s\n", s);
}
The argument to yyerror( ) is a string containing an error message, usually the
string "syntax error". An application may require more sophisticated error
reporting. The program should keep track of the input-line number and print
it with the message when a syntax error is detected. The external integer vari­
able yychar contains the look-ahead token number at the time the error was
detected. This may be useful in giving better diagnostics.
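A minimal sketch of such a yyerror follows. It makes several assumptions: the
counter lineno is hypothetical and would have to be incremented by the lexical
analyzer at each newline, format_error is a helper introduced only for this
example, snprintf requires a C99 library, and yychar is defined here only so
that the fragment stands alone (in a real program it is defined by the
generated parser and would be declared extern).

```c
#include <stdio.h>
#include <string.h>

int yychar;         /* look-ahead token number; defined by the parser itself */
int lineno = 1;     /* hypothetical line counter kept by the lexical analyzer */

/* Build the diagnostic text in buf; returns the length snprintf reports. */
int format_error(char *buf, size_t size, const char *msg)
{
    return snprintf(buf, size, "line %d: %s (look-ahead token %d)",
                    lineno, msg, yychar);
}

/* Replacement for the library yyerror: report where the error was detected. */
int yyerror(const char *s)
{
    char buf[256];

    format_error(buf, sizeof buf, s);
    fprintf(stderr, "%s\n", buf);
    return 0;
}
```

A module containing this routine is linked before the yacc library so that it
is used in place of the default version.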
The external integer variable yydebug is normally set to 0. If not, the parser
outputs a verbose description of its actions, including the input symbols read
and the parser actions. You can set this variable by using a debugger (adb,
sdb, codeview, or dbxtra).
Compiling and running the parser
This section explains how to create and run the parser once yacc is run to pro­
duce the C code.
The file produced by yacc can be compiled with cc as follows:
cc y.tab.c -ly
The -ly option causes the object file generated to be linked with the yacc
library. If you supply your own versions of main( ) and yyerror( ), link the
module containing these routines before the library.
A routine named yylex( ) must be provided to do lexical analysis and return
tokens to the parser. This routine can be supplied by the user or generated
from a lex specification. A lexical analyzer generated by lex can be linked to the
parser using the following command line:
cc y.tab.c lex.yy.c -ly -ll
lex.yy.c is the C file that lex produces. If you are using the library version of
main( ), the -ly option must appear before -ll, so that the version in the yacc
library is used.
Using make
The make facility can be used to help maintain the parser code. make
interprets files with the extension .y as yacc source files. Suppose a list of
dependencies in a makefile contains a filename x.o, and there exists a file x.y. If
x.y has been modified later than x.c, or if x.c does not exist, then make will
cause yacc to be run on x.y, and cause the object file x.o to be created from the
resulting y.tab.c.
Running the parser
The resulting executable file accepts input from the standard input. If, for
example, the data to be parsed is in a file called textin, it can be processed
using the following command line:
a.out < textin
The parser's output is sent to the standard output. This can be captured in a
file by redirecting the standard output.
Parser operation
The algorithm which yacc uses to go from the specification to the C code for
the parser is complex and will not be discussed here. The parser itself is rela­
tively simple and understanding how it works will make treatment of error
recovery and ambiguities easier. (Understanding this section is not essential
to being able to use yacc, but it will likely prove helpful.)
The parser produced by yacc consists of a finite-state machine with a state
stack. The parser is also capable of reading and remembering the next input
token, called the look-ahead token. The current state is always the one on the
top of the stack. The states of the finite-state machine are given small integer
labels. In addition to the state stack, a parallel stack, the value stack, holds the
values returned by the lexical analyzer and the actions. Initially, the machine
is in state 0 (that is, the state stack contains only state 0) and no look-ahead
token has been read.
The machine has only five actions available: shift, reduce, accept, error, and
goto. The goto is always performed as a component of a reduce action. The
parser operates in the following manner:
1. Based on its current state, the parser decides whether it needs a
look-ahead token to choose the action to be taken. If so, it calls yylex( ) to
obtain the next token.
2. Using the current state and the look-ahead token if needed, the parser
decides on its next action and carries it out. This may result in states being
pushed onto or popped off the stack, and in the look-ahead token being
processed or left alone.
When yacc is invoked with the -v option, a file called y.output is produced
with a human-readable description of the parser. The actions referred to in
the discussion that follows are taken from such a description file.
The shift action
The shift action is the most common one the parser takes. A shift occurs
whenever a token is recognized. Whenever a shift action is taken, there is
always a look-ahead token. A shift is represented in y.output as follows
(assume that the current state is 56):
LOOP    shift 34
This says that in state 56, if the look-ahead token is LOOP, then the current
state (56) is pushed down on the stack, and state 34 becomes the current state,
that is, it is pushed on the stack. In addition, the look-ahead token is cleared,
and the variable yylval is pushed onto the value stack.
The reduce action
A reduce action takes place when the parser determines that all the items on
the right-hand side of some grammar rule have been seen. In a reduce action,
all the states that were put on the stack while the right-hand side of the rule
was being recognized are popped from the stack. Then a new state is put on
the stack, based on the symbol on the left of the rule, the state that is currently
at the top of the stack, and sometimes the look-ahead token. Suppose the
following rule is being reduced:
A : x y z ;
The reduce action depends on the symbol to the left of the colon and the num­
ber of symbols on the right-hand side (in this case, three). This reduction first
involves popping three states off the top of the stack. (In general, the number
of states popped equals the number of symbols on the right side of the rule.)
In effect, these states were the ones put on the stack while recognizing x, y,
and z and no longer serve any useful purpose. After these states have been
popped, the state that is on top of the stack is the one the parser was in before
it recognized any of the symbols on the right side of the rule. Next, something
similar to a shift of A using this uncovered state is performed. A new state is
obtained and pushed onto the stack, and parsing continues. This action is
called a goto. There are differences between a goto and an ordinary shift of a
token. In particular, the look-ahead token is cleared by a shift but is not
affected by a goto. Sometimes, but not usually, it will be necessary for the
parser to refer to the look-ahead token to decide whether or not to reduce.
In effect, the reduce action turns back the clock in the parse, popping the
states off the stack so that the stack has the same contents as before the sym­
bols on the right side of the rule were recognized. The parser then treats the
symbol on the left side of the rule as if it were an input token and performs an
action accordingly. Note that if the rule has an empty right-hand side, no
states are popped off the stack.
The reduce actions are associated with individual grammar rules. In the
y.output file, these rules are given small integer numbers, so rule numbers
are easily confused with state numbers. The following action refers to grammar
rule 18:
    reduce 18
This action refers to state 34:
    shift 34
In any case, the state which is uncovered when symbols are popped off the
stack on a reduce will contain an entry such as the following:
    A  goto 20
If the left side of the current rule consists of the symbol "A", this action
causes state 20 to be pushed onto the stack.
The reduce action is also important in the treatment of user-supplied actions
and values. When a rule is reduced, the code supplied with the rule is exe­
cuted before the stack is adjusted. When a shift takes place, the external vari­
able yylval is copied onto the value stack. A reduction takes place after the
action code associated with the rule is carried out. When the goto action is
done, the external variable yyval is copied onto the value stack. The pseudo­
variables $1, $2, and so on, refer to the value stack.
The accept action
The other two parser actions are conceptually much simpler. The accept
action indicates that the entire input has been seen and that it matches the
specification. This action appears only when the look-ahead token is the
end-marker, and indicates that the parser has done its job.
The occurrence of accept can be simulated in an action by use of the macro
YYACCEPT. The YYACCEPT macro causes yyparse() to return the value 0,
indicating a successful parse. Here is an example of its use:
    quest
        : wealth
        | love
        | holy grail
          {
              YYACCEPT;
          }
        ;
The error action
The error action, on the other hand, represents a place where the parser can no
longer continue parsing according to the specification. The input tokens it has
seen (together with the look-ahead token) cannot be followed by anything
that would result in a legal input. The parser reports an error and attempts to
recover the situation and resume parsing. Error recovery is discussed later in
this chapter.
The error parser action can be simulated in action code by use of the macro
YYERROR. YYERROR causes the parser to behave as if the current input
symbol had been a syntax error. The function yyerror() is called and error
recovery takes place. Here is an example of such usage:
    seq
        : first second third
          {
              if (!($1 < $2) || !($2 < $3)) {
                  printf("Values out of order!\n");
                  YYERROR;
              }
          }
        ;
Interpreting the y.output file
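Consider the following yacc specification:
    %token DING DONG DELL
    %%
    rhyme : sound place
          ;
    sound : DING DONG
          ;
    place : DELL
          ;
The y.output file corresponding to the preceding grammar (with some
statistics stripped off the end) is:
    state 0
            $accept : _rhyme $end
            DING  shift 3
            .  error
            rhyme  goto 1
            sound  goto 2
    state 1
            $accept : rhyme_$end
            $end  accept
            .  error
    state 2
            rhyme : sound_place
            DELL  shift 5
            .  error
            place  goto 4
    state 3
            sound : DING_DONG
            DONG  shift 6
            .  error
    state 4
            rhyme : sound place_    (1)
            .  reduce 1
    state 5
            place : DELL_    (3)
            .  reduce 3
    state 6
            sound : DING DONG_    (2)
            .  reduce 2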
The actions for each state are specified, with a description of the parsing rules
processed in each state. The underscore (_) character in a rule separates the
symbols that have been seen from those that are expected to follow. The
following input can be used to track the operations of the parser:
    DING DONG DELL
The input is processed in the following steps:
1. Initially, the current state is state 0. The parser needs to refer to the input
to decide between the actions available in state 0, so the first token, DING,
is read and becomes the look-ahead token. The action in state 0 on DING
is shift 3, so state 3 is pushed onto the stack and the look-ahead token is
cleared. State 3 becomes the current state.
2. The next token, DONG, is read and becomes the look-ahead token. The
action in state 3 on the token DONG is shift 6, so state 6 is pushed onto
the stack and the look-ahead is cleared. The stack now contains 0, 3, and
6.
3. In state 6, without even consulting the look-ahead, the parser reduces by
rule 2:
    sound : DING DONG
In the reduction, two states, 6 and 3, are popped off the stack, uncovering
state 0.
4. Now sound, the left side of rule 2, has just been recognized. Consulting the
description of state 0, we see that there is a goto on sound:
    sound  goto 2
This causes state 2 to be pushed onto the stack and become the current
state.
5. In state 2, the next token, DELL, is read. The action is shift 5, so state 5 is
pushed onto the stack, which now has 0, 2, and 5 on it. The look-ahead
token is cleared.
6. In state 5, the only action is to reduce by rule 3:
    place : DELL
This rule has one symbol on the right-hand side, so one state, 5, is popped
off, and state 2 is uncovered.
7. There is a goto on place in state 2; this causes the state to become 4. Now,
the stack contains 0, 2, and 4.
8. In state 4, the only action is to reduce by rule 1:
    rhyme : sound place
There are two symbols on the right, so the top two states are popped off,
uncovering state 0.
9. In state 0, there is a goto on rhyme, causing the parser to enter state 1.
10. In state 1, the end-marker, indicated by $end, is obtained when the input is
read. The accept action in state 1 (when the end-marker is seen)
successfully ends the parse.
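The steps above can be replayed with a small table-driven loop. The following
is a minimal sketch, not the code that yacc generates: the table layout, the
token numbering, and the names parse, act_token, act_state, default_reduce,
and goto_tab are all assumptions made for illustration. The table contents
themselves are transcribed from the y.output listing for the rhyme grammar.

```c
#include <assert.h>

/* Token and nonterminal codes; the numbering is an assumption of this sketch. */
enum { DING, DONG, DELL, END };
enum { RHYME, SOUND, PLACE };
enum { ACCEPT = -1, STACK_MAX = 16 };

/* Per-state tables transcribed from the y.output listing (states 0 to 6). */
static const int act_token[7]      = { DING, END, DELL, DONG, -1, -1, -1 };
static const int act_state[7]      = { 3, ACCEPT, 5, 6, 0, 0, 0 };
static const int default_reduce[7] = { 0, 0, 0, 0, 1, 3, 2 }; /* 0: ". error" */
static const int goto_tab[7][3] = {
    {  1,  2, -1 }, { -1, -1, -1 }, { -1, -1,  4 }, { -1, -1, -1 },
    { -1, -1, -1 }, { -1, -1, -1 }, { -1, -1, -1 }
};
static const int rule_len[4] = { 0, 2, 2, 1 };  /* symbols on the right side */
static const int rule_lhs[4] = { 0, RHYME, SOUND, PLACE };

/*
 * Parse an END-terminated token array. Each state that becomes current is
 * recorded in trace (up to max_trace entries). Returns the number of states
 * recorded on accept, or -1 on a syntax error.
 */
int parse(const int *input, int *trace, int max_trace)
{
    int stack[STACK_MAX];
    int sp = 0, pos = 0, n = 0;
    int have_la = 0, la = END;

    stack[0] = 0;
    for (;;) {
        int state = stack[sp];
        int rule = default_reduce[state];

        if (n < max_trace)
            trace[n++] = state;

        if (rule != 0) {
            /* Reduce: pop one state per right-side symbol, then goto. */
            sp -= rule_len[rule];
            assert(sp >= 0);
            assert(goto_tab[stack[sp]][rule_lhs[rule]] >= 0);
            stack[sp + 1] = goto_tab[stack[sp]][rule_lhs[rule]];
            sp++;
            continue;                /* the look-ahead is left untouched */
        }
        if (!have_la) {              /* fetch a look-ahead only when needed */
            la = input[pos++];
            have_la = 1;
        }
        if (la != act_token[state])
            return -1;               /* ". error" */
        if (act_state[state] == ACCEPT)
            return n;
        assert(sp + 1 < STACK_MAX);
        stack[++sp] = act_state[state];   /* shift */
        have_la = 0;                      /* a shift clears the look-ahead */
    }
}
```

Called with the tokens DING, DONG, DELL, END, the recorded states are 0, 3, 6,
2, 5, 4, and 1, which is the sequence of current states in the walk-through.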
Ambiguity and Conflicts
A set of grammar rules is ambiguous if there is some input string that can be
structured in two or more different ways. For example, the following grammar
rule is a natural way of expressing the fact that one way of forming an
arithmetic expression is to join two other expressions with a minus sign:
    expr : expr '-' expr
Unfortunately, this grammar rule does not completely specify how all complex
inputs should be structured.
The yacc program detects such ambiguities when it is attempting to build the
parser. Consider the problem that confronts the parser when it is given the
input:
    expr - expr - expr
After the parser reads the second expr, the visible input is:
    expr - expr
This matches the right side of the preceding grammar rule. The parser can
reduce the input by the rule. After applying this rule, the input is reduced to
expr (the left side of the rule). The parser then reads the final part of the input:
    - expr
A similar reduction then takes place. This interpretation of the rule is called
the "left associative" interpretation.
Alternatively, the parser could defer the immediate application of the rule
when it has seen:
    expr - expr
Instead, the parser can continue reading until it sees the whole input:
    expr - expr - expr
It can then apply the rule to the rightmost three symbols, reducing them to
expr, leaving:
    expr - expr
Now the rule can be reduced once more. This interpretation is called "right
associative".
Depending on which interpretation is used, the parser can do either a shift or a
reduction after reading:
    expr - expr
This is called a shift-reduce conflict. The parser can also have a choice of two
legal reductions, called a reduce-reduce conflict. There are no shift-shift
conflicts.
Disambiguating rules
yacc uses disambiguating rules so that it can produce a parser when
shift-reduce or reduce-reduce conflicts occur.
The two default disambiguating rules are:
1. where there is a shift-reduce conflict, perform the shift
2. where there is a reduce-reduce conflict, reduce by the grammar rule that
appears earlier in the yacc specification
Rule 1 means that reductions are deferred in favor of shifts when there is a
choice. Rule 2 gives the user rather crude control over the behavior of the
parser when there are reduce-reduce conflicts; these should be avoided
whenever possible.
In general, when it is possible to apply disambiguating rules to a grammar to
produce a correct parser, it is also possible to write equivalent grammar rules
which do not have the ambiguities.
yacc always reports the number of shift-reduce and reduce-reduce conflicts
resolved by the two disambiguating rules. The conflict messages of yacc are
best understood by examining the y.output file. Here is an example which
describes the conflicts that arise from an if-else construct, a common source
of ambiguity in programming language grammars:
    23: shift-reduce conflict (shift 45, reduce 18) on ELSE
    state 23
            stat : IF ( cond ) stat_    (18)
            stat : IF ( cond ) stat_ELSE stat
            ELSE  shift 45
            .  reduce 18
The first line describes the conflict, giving the state and the input symbol. The
state description gives the grammar rules active in the state and the parser
actions. In state 23, the parser has seen input corresponding to:
    IF ( cond ) stat
Two grammar rules are active at this time, and so the parser can do one of two
things. If the input symbol is ELSE, it will shift into state 45. State 45 will
have the following line as part of its description, indicating that ELSE has
been shifted to arrive at this state.
    stat : IF ( cond ) stat ELSE_stat
In state 23, the default action (designated by a dot, .) will be performed if the
input symbol is something other than ELSE. In that case, the parser reduces
by grammar rule 18:
    stat : IF ( cond ) stat
The fact that the action that occurs when ELSE is read appears first in y.output
indicates that the shift is the favored action.
Users who encounter unexpected shift-reduce conflicts will probably want to
look at the verbose output to decide whether the default actions are
appropriate.
The disambiguating rules given above are not sufficient to resolve the
conflicts that arise in the parsing of arithmetic expressions. These situations
require that the parser be given some information about precedence and
associativity. Most of the commonly-used constructions for arithmetic
expressions can be described naturally by the notion of precedence levels for
operators, together with left and right associativity. It turns out that
ambiguous grammars with appropriate disambiguating rules can be used to
create parsers that are faster and easier to write than those constructed from
unambiguous grammars.
Grammar rules for binary operators are typically written in the form:
    expr : expr OP expr
Rules for unary operators typically look like:
    expr : UNARY expr
These create a very ambiguous grammar with many parsing conflicts. To
avoid ambiguity, the user can specify the precedence of all the operators and
the associativities of the binary operators. This information is sufficient to
allow yacc to resolve the parsing conflicts and construct a parser that
implements the precedences and associativities.
In an expression where there is a choice of two operators to evaluate,
precedence determines which of the two is evaluated. For example, the following
expression could be evaluated to 23 or 35, depending on whether '+' or '*' is
evaluated first.
    3 + 4 * 5
The operator evaluated first is said to have higher precedence.
Associativity determines which side of an expression involving a particular
operator should be evaluated first. The following example could be evaluated
to 2 or 4:
    6 - 3 - 1
The result depends on whether the logical grouping is:
    (6-3)-1
or:
    6-(3-1)
If an operator is left associative, then the expression to the left of the operator is
evaluated first, as in the first case above. If it is right associative, then the right
side is evaluated first, as in the second case above.
The precedences and associativities are attached to tokens in the declarations
section of the yacc specification. This is done with a series of lines starting
with one of the yacc keywords %left, %right, or %nonassoc, followed by a list
of tokens. All of the tokens on the same line have the same precedence level
and associativity; the lines occur in order of increasing precedence. The
following lines describe the precedence and associativity of the four arithmetic
operators:
    %left '+' '-'
    %left '*' '/'
Plus and minus are left-associative and have lower precedence than star and
slash, which are also left-associative. The keyword %right is used to indicate
right-associative operators, and the keyword %nonassoc is used to describe
operators, like .LT. in FORTRAN, that may not associate with themselves. The
following statement is illegal in FORTRAN, and therefore the .LT. operator
should be described with the keyword %nonassoc:
    A .LT. B .LT. C
Here is a yacc specification for expressions involving operators:
    %right '='
    %left  '+' '-'
    %left  '*' '/'
    %%
    expr : expr '=' expr
         | expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | NAME
         ;
Now consider the following input tokens:
    a = b = c*d - e - f*g
A parser that followed the specification just given would structure the input
as follows:
    a = ( b = ( ((c*d)-e) - (f*g) ) )
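The grouping shown above can be checked with a short program. The sketch
below uses precedence climbing in a hand-written recursive routine, which is a
different technique from the tables that yacc builds; it is offered only to
confirm how the declared precedences and associativities group the input. The
function names (group, expr, prec, peek) and the restriction to
single-character operand names are assumptions of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUFSZ 128

static const char *cur;              /* input cursor */

/* Skip blanks and return the next character without consuming it. */
static int peek(void)
{
    while (*cur == ' ')
        cur++;
    return (unsigned char)*cur;
}

/* Precedence levels in the order of the %right and %left lines; 0 = none. */
static int prec(int op)
{
    switch (op) {
    case '=':           return 1;    /* %right '='      */
    case '+': case '-': return 2;    /* %left  '+' '-'  */
    case '*': case '/': return 3;    /* %left  '*' '/'  */
    default:            return 0;
    }
}

/* Parse operators of precedence >= min_prec; write the grouped text to buf. */
static void expr(int min_prec, char *buf)
{
    char lhs[BUFSZ], rhs[BUFSZ], tmp[BUFSZ];
    int op;

    lhs[0] = (char)peek();           /* operand: a one-character NAME */
    lhs[1] = '\0';
    if (lhs[0] != '\0')
        cur++;

    while (prec(op = peek()) >= min_prec) {
        cur++;
        /* A right-associative operator lets the same level recur on its
           right side; a left-associative one demands a higher level. */
        expr(op == '=' ? prec(op) : prec(op) + 1, rhs);
        snprintf(tmp, sizeof tmp, "(%s%c%s)", lhs, op, rhs);
        strcpy(lhs, tmp);
    }
    strcpy(buf, lhs);
}

/* Return the fully parenthesized form of input (held in a static buffer). */
const char *group(const char *input)
{
    static char out[BUFSZ];

    assert(input != NULL);
    cur = input;
    expr(1, out);
    return out;
}
```

For the input a = b = c*d - e - f*g, group returns the same structure as shown
above, with one additional pair of parentheses around the whole expression.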
Assigning a precedence to grammar rules
In general, unary operators must be given a precedence. Sometimes a unary
operator and a binary operator have the same symbolic representation but
different precedences. An example is the unary and binary minus, (-). Unary
minus often has the same strength as multiplication, or even higher, while
binary minus will have a lower strength than multiplication.
Since the representations of the two operators are the same, the keywords
%left and %right cannot be used to set the precedences for both. Rather, the
precedence for the unary minus is associated with the grammar rule in which it
appears. The keyword %prec is used to change the precedence level
associated with a particular grammar rule. Here is a grammar that involves unary
and binary minuses:
    %left '+' '-'
    %left '*' '/'
    %%
    expr : expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | '-' expr %prec '*'
         | NAME
         ;
The keyword %prec appears immediately after the body of the grammar rule,
before an action or closing semicolon, and is followed by a token name or
literal. It causes the precedence of the grammar rule to become that of the
following token name or literal. In this example, it is used to give unary minus
the same precedence as the multiplication operator (*).
If a token appears after one of the keywords %left, %right, and %nonassoc, it
does not need to be declared by %token as well, though it is not incorrect to
do so.
Precedences and associativities used by yacc to resolve parsing conflicts give
rise to the following disambiguating rules:
1. The precedence and associativity associated with a grammar rule is that of
the last token or literal in the body of the rule. If the %prec construction is
used, it overrides this default. Some grammar rules may have no
precedence and associativity associated with them.
2. When there is a reduce-reduce conflict, or when there is a shift-reduce
conflict and either the input symbol or the grammar rule has no precedence
and associativity, then the two default disambiguating rules given at the
beginning of the section are used and the conflicts are reported.
3. If there is a shift-reduce conflict, and both the grammar rule and the input
character have precedence and associativity associated with them, then the
conflict is resolved in favor of the action (shift or reduce) associated with
the higher precedence. If precedences are equal, then associativity is used.
Left-associativity implies reduce, right-associativity implies shift, and
nonassociativity implies error.
Conflicts resolved by precedence are not counted in the number of
shift-reduce and reduce-reduce conflicts that yacc reports. This means that
mistakes in the precedence specification may disguise errors in the input
grammar. It is a good idea to be sparing with precedences and to use them in a
cookbook fashion until you are experienced in using them. The y.output file is
very useful in deciding whether the parser is actually doing what was intended.
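The way a single shift-reduce conflict is settled under these rules can be
written out as a small decision function. This is a sketch of the rules as
stated in the text, not an excerpt from yacc; the type and function names
(struct prec, resolve_shift_reduce) and the convention that level 0 means "no
precedence" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum assoc  { NONE, LEFT, RIGHT, NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Precedence attached to a token or a rule; level 0 means "none declared". */
struct prec {
    int level;
    enum assoc assoc;
};

/*
 * Decide a shift-reduce conflict between a rule and the input token.
 * *reported is set to 1 when the default rule (shift) had to be used, in
 * which case the conflict is counted, and to 0 when precedence settled it.
 */
enum action resolve_shift_reduce(struct prec rule, struct prec token,
                                 int *reported)
{
    assert(reported != NULL);

    if (rule.level == 0 || token.level == 0) {
        *reported = 1;               /* default disambiguating rule 1 */
        return ACT_SHIFT;
    }
    *reported = 0;
    if (token.level > rule.level)
        return ACT_SHIFT;            /* the token binds more tightly */
    if (rule.level > token.level)
        return ACT_REDUCE;           /* the rule binds more tightly */

    /* Equal precedence: tokens on one line share an associativity. */
    switch (token.assoc) {
    case LEFT:  return ACT_REDUCE;
    case RIGHT: return ACT_SHIFT;
    default:    return ACT_ERROR;    /* %nonassoc */
    }
}
```

For example, with a rule ending in '-' and the input token '*', the token has
the higher level and the function chooses the shift, so the multiplication is
grouped first.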
Error handling
The input to the parser will sometimes not conform to the specifications. If no
provision has been made to deal with errors, the processing halts when an
error is detected. Instead of stopping all processing when an error is found, it
is often more useful to continue scanning the input to find other syntax errors.
Some of the problems associated with error handling are semantic ones.
When an error is found, for example, it may be necessary to reclaim parse-tree
storage, delete or alter symbol-table entries, or set switches to avoid generat­
ing any further output.
Error handling mechanisms can be provided as part of the input specifica­
tions. This permits the reentry of data after bad data has been seen, or the
continuation of the input process after skipping over the bad data. This leads
to the problem of where the parser should resume parsing after an error. A
general class of algorithms to do this involves discarding a number of tokens
from the input string and attempting to adjust the parser so that input can
continue.
The error token
The token name error is provided by yacc to allow the user some control over
this process. This name can be used in grammar rules. In effect, it suggests
places where errors are expected and recovery might take place. If an error
occurs, the parser pops its stack until it enters a state where the token error is
legal. It then behaves as if the token error were the current look-ahead token,
and performs the action encountered. The look-ahead token is then reset to
the token that caused the error.
In order to prevent a cascade of error messages, after detecting an error, the
parser remains in an error state until three tokens have been successfully read
and shifted. If an error is detected when the parser is already in error state, no
message is given, and the input token is ignored.
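The message-suppression behavior just described can be modeled with a single
counter. The sketch below is a simplified model of the text, not yacc's own
code: the structure, the function names, and the event letters ('E' for a
detected error, 'S' for a token successfully shifted, 'K' for yyerrok) are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the error state described in the text. */
struct recovery {
    int errflag;      /* shifts still needed before leaving the error state */
    int messages;     /* errors actually reported */
    int discarded;    /* tokens dropped silently while in the error state */
};

static void on_error(struct recovery *r)
{
    if (r->errflag == 0)
        r->messages++;        /* a fresh error: report it */
    else
        r->discarded++;       /* already recovering: no message, drop token */
    r->errflag = 3;           /* three good shifts are needed from here */
}

static void on_shift(struct recovery *r)
{
    if (r->errflag > 0)
        r->errflag--;
}

/* Replay a string of events and return the number of messages printed. */
int count_messages(const char *events)
{
    struct recovery r = { 0, 0, 0 };

    assert(events != NULL);
    for (; *events != '\0'; events++) {
        switch (*events) {
        case 'E': on_error(&r);  break;
        case 'S': on_shift(&r);  break;
        case 'K': r.errflag = 0; break;   /* yyerrok: back to normal mode */
        default:  break;
        }
    }
    return r.messages;
}
```

In this model the sequence "ESSESSSE" yields two messages: the second error
arrives after only two shifts and is suppressed, while the last one follows
three good shifts and is reported.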
As an example, the following rule means that on a syntax error the parser will
pop its state stack until the symbol stat is valid. It will then act as if stat has
been seen and perform the code associated with the token error.
    stat : error
               {
                   action code
               }
         ;
Error rules such as this are very general but difficult to control. Rules like the
following are somewhat easier:
    stat : error '.'
Here, when there is an error, the parser attempts to skip over the statement to
the next period. Tokens following the error and preceding the next period
cannot be shifted and are discarded. When the period is seen, this rule will be
reduced and any action associated with it performed.
Interactive error recovery
Another form of error rule can be used in interactive applications where it
may be desirable to permit a line to be reentered after an error. The following
example is one way to do this:
    input
        : error '\n'
          {
              (void) printf("Reenter last line: ");
          }
          input
          {
              $$ = $4;
          }
        ;
There is one potential difficulty with this approach. The parser must correctly
process three input tokens before it considers that it has correctly resynchron­
ized after the error: if the reentered line contains an error in the first two
tokens, the parser deletes the offending tokens and gives no message. This is
most likely unacceptable, and so there is a mechanism that can force the
parser to believe that error recovery has been accomplished. The following
statement in an action resets the parser to its normal mode:
    yyerrok;
The last example can be rewritten as follows:
    input
        : error '\n'
          {
              yyerrok;
              (void) printf("Reenter last line: ");
          }
          input
          {
              $$ = $4;
          }
        ;
As previously mentioned, the next token seen after the error symbol is the
same token that was seen when the error was discovered. Sometimes this is
inappropriate: for example, an error recovery action might take upon itself the
job of finding the correct place to resume input. In this case, the previous
look-ahead token must be cleared. This can be done with the statement
    yyclearin;
For example, suppose the action after error were to call some sophisticated
resynchronization routine that attempted to advance the input to the
beginning of the next valid statement. After this routine was called, the next
token returned by the lexical analyzer would presumably be the first token in a
legal statement. The old illegal token would have to be discarded and the error
state reset. An action similar to the following could perform this:
    resynch();
    yyerrok;
    yyclearin;
Hints for preparing specifications
This part contains miscellaneous hints on preparing clear, efficient, and easily
changeable specifications.
Input style
It is difficult to provide rules with substantial actions and still have a readable
specification file. Here are a few style hints:
Use all uppercase letters for token names and all lowercase letters for non­
terminal names. This helps in debugging.
Preface token names with an unusual sequence of characters, such as t_, to
ensure that there are no conflicts with C reserved words.
Put grammar rules and actions on separate lines to make the specification
easier to read and edit.
Put all rules with the same left side together. The left side should appear
only once and each rule after the first one should begin with a vertical bar.
Put the semicolon on a separate line after the last rule. This allows new
rules to be added easily.
Indent the body of a rule by one tab stop and the body of an action by two
tab stops.
Put complicated actions into subroutines.
Left recursion
The algorithm used by the yacc parser encourages left-recursive grammar
rules. Rules of the following form match this algorithm:
    name : name rest_of_rule ;
Rules like this arise frequently during the writing of specifications for
sequences and lists. For example:
    list : item
         | list ',' item
         ;
    seq  : item
         | seq item
         ;
The first rule in each group will be reduced for the first item only, and the
second rule will be reduced for the second and all succeeding items.
With right recursive rules such as the following, the parser is somewhat
bigger, and the items are seen and reduced from right to left:
    seq : item
        | item seq
        ;
A more serious problem is that an internal stack in the parser is in danger of
overflowing if a very long sequence is read. Thus, the user should use left
recursion whenever possible.
It is worth considering whether a sequence with zero elements has any
meaning; if it does, consider writing the sequence specification using an empty
rule.
    seq : /* empty */
        | seq item
        ;
Once again, the first rule would always be reduced exactly once before the
first item was read, and then the second rule would be reduced once for each
item read. Permitting empty sequences often leads to increased generality.
However, conflicts might arise if the parser is asked to decide between empty
sequences that could satisfy more than one rule, if not enough input has been
seen to know which one is appropriate.
Lexical tie-ins
Some lexical decisions depend on context. For example, the lexical analyzer
might normally want to delete blanks, but not within quoted strings; names
might be entered into a symbol table in declarations, but not in expressions.
One way of handling these situations is to create a global flag that is set by
actions in the parser but which can be accessed by the lexical analyzer. Then
the lexical analyzer can make decisions on the basis of this variable's value.
The following specification specifies a program that consists of zero or more
declarations followed by zero or more statements.
    %{
    int dflag;
    %}
    ... other declarations ...
    %%
    prog
        : decls stats
        ;
    decls
        : /* empty */
          {
              dflag = 1;
          }
        | decls declaration
        ;
    stats
        : /* empty */
          {
              dflag = 0;
          }
        | stats statement
        ;
    ... other rules ...
The flag dflag is set to 0 when reading statements and 1 when reading
declarations. The first token in the first statement must be seen by the parser
before it can tell that the declarations section has ended and the statements
have begun. In many cases, this single token exception does not affect the
lexical scan.
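On the lexical analyzer's side, consulting such a flag might look like the
following sketch. Only dflag comes from the specification above; the symbol
table, its size limits, and the function names lookup and lex_name are
hypothetical and stand in for whatever the real analyzer uses.

```c
#include <assert.h>
#include <string.h>

#define MAXSYM  32
#define NAMELEN 16

int dflag = 1;    /* set by parser actions: 1 in declarations, 0 in statements */

static char symtab[MAXSYM][NAMELEN];   /* hypothetical symbol table */
static int nsym;

/* Return the table index of name, or -1 if it has not been entered. */
static int lookup(const char *name)
{
    int i;

    for (i = 0; i < nsym; i++)
        if (strncmp(symtab[i], name, NAMELEN - 1) == 0)
            return i;
    return -1;
}

/*
 * Called by the lexical analyzer for each name it reads. While dflag is 1
 * a new name is entered into the table; while dflag is 0 it is not.
 * Returns 1 if the name is known after the call, 0 otherwise.
 */
int lex_name(const char *name)
{
    assert(name != NULL);

    if (lookup(name) >= 0)
        return 1;
    if (dflag && nsym < MAXSYM) {
        strncpy(symtab[nsym], name, NAMELEN - 1);
        symtab[nsym][NAMELEN - 1] = '\0';
        nsym++;
        return 1;
    }
    return 0;     /* an undeclared name met while reading statements */
}
```

A name first seen while dflag is 1 is entered and later recognized in
statements; a name first seen while dflag is 0 is left out of the table.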
Reserved words
Some programming languages permit you to use words like if, which are nor­
mally reserved as label or variable names, provided that such use does not
conflict with the legal use of these names in the programming language. This
is extremely hard to do in the framework of yacc. It is difficult to pass the
lexical analyzer information telling it one instance of if is a keyword and another
instance is a variable. The user can attempt to implement this using the mech­
anism described in the last subsection, but this is not trivial. It is better that
the keywords be reserved, that is, forbidden for use as variable names.
This section contains two extended examples. They display many of the fea­
tures of yacc discussed in this chapter.
A simple example
This example gives the complete yacc application for a small desk calculator;
the calculator has 26 registers, labeled a through z, and accepts arithmetic
expressions made up of the operators:
    +, -, *, /, % (mod operator), & (bitwise and),
    | (bitwise or), and assignments.
If an expression at the top level is an assignment, only the assignment is done;
otherwise, the expression is printed. As in C, an integer that begins with 0
(zero) is assumed to be octal; otherwise, it is assumed to be decimal.
As an example of a yacc specification, the desk calculator shows how pre­
cedence and ambiguities are used and demonstrates simple recovery. The
major oversimplifications are that the lexical analyzer is much simpler than
for most applications, and the output is produced immediately, line by line.
Note the way that decimal and octal integers are read in by grammar rules.
This job is probably better done by the lexical analyzer.
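As a sketch of what doing this job in the lexical analyzer might involve, the
routine below accumulates a number digit by digit in the same way as the
number rules of the specification (the first digit chooses the base, and each
later digit is folded in as base * value + digit). The name scan_number and
its string argument are assumptions; the calculator itself reads characters
with getchar().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Convert the run of digits at the start of s. A leading 0 selects octal,
 * anything else decimal. Returns -1 if s does not start with a digit.
 * As with the grammar rules, the digits 8 and 9 are not rejected in an
 * octal number.
 */
int scan_number(const char *s)
{
    int base, value;

    assert(s != NULL);
    if (!isdigit((unsigned char)*s))
        return -1;

    value = *s - '0';
    base = (value == 0) ? 8 : 10;      /* first digit chooses the base */
    for (s++; isdigit((unsigned char)*s); s++)
        value = base * value + (*s - '0');
    return value;
}
```

With this routine, "017" is read as octal and yields 15, while "17" yields 17.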
    %{
    #include <stdio.h>
    #include <ctype.h>

    int regs[26];
    int base;
    %}

    %start list
    %token DIGIT LETTER
    %left '|'
    %left '&'
    %left '+' '-'
    %left '*' '/' '%'
    %left UMINUS    /* supplies precedence for unary minus */

    %%      /* beginning of rules section */

    list
        : /* empty */
        | list stat '\n'
        | list error '\n'
          { yyerrok; }
        ;

    stat
        : expr
          { (void) printf("%d\n", $1); }
        | LETTER '=' expr
          { regs[$1] = $3; }
        ;

    expr
        : '(' expr ')'
          { $$ = $2; }
        | expr '+' expr
          { $$ = $1 + $3; }
        | expr '-' expr
          { $$ = $1 - $3; }
        | expr '*' expr
          { $$ = $1 * $3; }
        | expr '/' expr
          { $$ = $1 / $3; }
        | expr '%' expr
          { $$ = $1 % $3; }
        | expr '&' expr
          { $$ = $1 & $3; }
        | expr '|' expr
          { $$ = $1 | $3; }
        | '-' expr %prec UMINUS
          { $$ = -$2; }
        | LETTER
          { $$ = regs[$1]; }
        | number
        ;

    number
        : DIGIT
          { $$ = $1; base = ($1 == 0) ? 8 : 10; }
        | number DIGIT
          { $$ = base * $1 + $2; }
        ;

    %%      /* beginning of subroutines section */

    int yylex()     /* lexical analysis routine */
    {
        /* return LETTER for lowercase letter, yylval = 0 through 25 */
        /* returns DIGIT for digit, yylval = 0 through 9 */
        /* all other characters are returned immediately */
        int c;

        /* skip blanks */
        while ((c = getchar()) == ' ')
            ;

        /* c is now nonblank */
        if (islower(c)) {
            yylval = c - 'a';
            return (LETTER);
        }
        if (isdigit(c)) {
            yylval = c - '0';
            return (DIGIT);
        }
        return (c);
    }
An advanced example
This section gives an example of a grammar using some of the advanced
features. The desk calculator example is modified to provide a desk calculator
that does floating-point interval arithmetic. The calculator understands
floating-point constants, the arithmetic operations +, -, *, /, and unary minus
(-), and registers labeled a through z. It also understands intervals written in
the following format, where X is less than or equal to Y:
    (X, Y)
There are 26 interval-valued variables, A through Z, that may also be used.
The usage is similar to that in the previous example: assignments return no
value and print nothing, while expressions print the floating or interval value.
Intervals are represented by a structure consisting of the left and right end­
point values stored as doubles. This structure is given a type name, INTER­
VAL. The yacc value stack can also contain floating point scalars, and integers
used to index into the arrays holding the variable values.
YYERROR is used to handle error conditions. The errors dealt with in this
way are division by an interval containing 0, and an interval presented in the
wrong order. The error recovery mechanism of yacc is used to discard the
rest of the offending line.
In addition to the mixing of types on the value stack, this grammar also
demonstrates an interesting use of syntax to keep track of the type (for
example, scalar or interval) of intermediate expressions. Note that a scalar can
be automatically promoted to an interval if the context demands an interval
value. This causes a large number of conflicts when the grammar is processed
by yacc: 18 shift-reduce and 26 reduce-reduce. The problem can be seen by
looking at the two input lines:
    2.5 + (3.5 - 4.)
    2.5 + (3.5, 4)
Notice that the 2.5 is to be used in an interval value expression in the second
example, but this fact is not known until the comma is read. By this time, 2.5
is finished, and the parser cannot go back and change its mind. More
generally, it might be necessary to look ahead an arbitrary number of tokens to
decide whether to convert a scalar to an interval. This problem is avoided by
having two rules for each binary interval valued operator-one when the left
operand is a scalar, and one rule when the left operand is an interval. In the
second case, the right operand must be an interval, so the conversion will be
applied automatically. Despite this, there are still many cases where the
conversion may be applied or not, leading to the above conflicts. These are
resolved by listing the rules that yield scalars first in the specification file: in
this way, the conflict will be resolved in favor of keeping scalar valued expres­
sions scalar valued, until they are forced to become intervals.
This way of handling multiple types is very instructive. If there were many
kinds of expression types instead of just two, the number of rules needed
would increase dramatically and the conflicts even more so. Thus, while this
example is instructive, it is better practice in a more normal programming lan­
guage environment to maintain the type information at the lexical level, and
not as part of the grammar.
Finally, a word about the lexical analysis. The only unusual feature is the
treatment of floating-point constants. The C library routine atof() is used to
do the actual conversion from a character string to a double-precision value.
If the lexical analyzer detects an error, it responds by returning a token that is
illegal in the grammar, provoking a syntax error in the parser and then error
recovery.
    %{
    #include <stdio.h>
    #include <ctype.h>

    typedef struct interval {
        double lo, hi;
    } INTERVAL;

    INTERVAL vmul(), vdiv();
    double atof();
    double dreg[26];
    INTERVAL vreg[26];
    %}

    %start lines

    %union {
        int ival;
        double dval;
        INTERVAL vval;
    }

    %token <ival> DREG VREG     /* indices into dreg, vreg arrays */
    %token <dval> CONST         /* floating point constant */
    %type <dval> dexp           /* expression */
    %type <vval> vexp           /* interval expression */

    /* precedence information about the operators */
    %left '+' '-'
    %left '*' '/'
    %left UMINUS                /* precedence for unary minus */
    %%      /* beginning of rules section */

    lines
        : /* empty */
        | lines line
        ;

    line
        : dexp '\n'
          { (void) printf("%15.8f\n", $1); }
        | vexp '\n'
          { (void) printf("(%15.8f, %15.8f)\n", $1.lo, $1.hi); }
        | DREG '=' dexp '\n'
          { dreg[$1] = $3; }
        | VREG '=' vexp '\n'
          { vreg[$1] = $3; }
        | error '\n'
          { yyerrok; }
        ;

    dexp
        : CONST
        | DREG
          { $$ = dreg[$1]; }
        | dexp '+' dexp
          { $$ = $1 + $3; }
        | dexp '-' dexp
          { $$ = $1 - $3; }
        | dexp '*' dexp
          { $$ = $1 * $3; }
        | dexp '/' dexp
          { $$ = $1 / $3; }
        | '-' dexp %prec UMINUS
          { $$ = -$2; }
        | '(' dexp ')'
          { $$ = $2; }
        ;

    vexp
        : dexp
          { $$.hi = $$.lo = $1; }
        | '(' dexp ',' dexp ')'
          {
              $$.lo = $2;
              $$.hi = $4;
              if ($$.lo > $$.hi) {
                  (void) printf("interval out of order\n");
                  YYERROR;
              }
          }
        | VREG
          { $$ = vreg[$1]; }
        | vexp '+' vexp
          {
              $$.hi = $1.hi + $3.hi;
              $$.lo = $1.lo + $3.lo;
          }
        | dexp '+' vexp
          {
              $$.hi = $1 + $3.hi;
              $$.lo = $1 + $3.lo;
          }
        | vexp '-' vexp
          {
              $$.hi = $1.hi - $3.lo;
              $$.lo = $1.lo - $3.hi;
          }
        | dexp '-' vexp
          {
              $$.hi = $1 - $3.lo;
              $$.lo = $1 - $3.hi;
          }
vexp ' * ' vexp
$$ = vmu l ( $ 1 . l o , $ 1 . h i , $ 3 ) ;
dexp ' * ' vexp
$$ = vm u l ( $ 1 , $ 1 , $ 3 ) ;
vexp ' / ' vexp
i f ( dchec k ( $3 l l Y Y E R ROR ;
$ $ = vd i v ( $ 1 , $ 1 , $3 J ;
dexp ' / ' vexp
i f ( dchec k ( $3 ) ) Y Y ERROR ;
$ $ = vd i v ( $ 1 . l o , $ 1 . h i , $ 3 ) ;
' - ' vexp
%prec UM I NUS
$$ .hi
- $2 . lo ; $$ . l o = - $ 2 . hi ;
' ( ' vexp ' J '
$$ = $2 ;
%%
        /* beginning of subroutines section */

#define BSZ 50          /* buffer size for floating point number */

        /* lexical analysis */
int yylex()
{
        register int c;

        /* skip over blanks */
        while ((c = getchar()) == ' ')
                ;
        if (isupper(c)) {
                yylval.ival = c - 'A';
                return (VREG);
        }
        if (islower(c)) {
                yylval.ival = c - 'a';
                return (DREG);
        }
        /* gobble up digits, points, exponents */
        if (isdigit(c) || c == '.') {
                char buf[BSZ + 1], *cp = buf;
                int dot = 0, exp = 0;

                for (; (cp - buf) < BSZ; ++cp, c = getchar()) {
                        *cp = c;
                        if (isdigit(c))
                                continue;
                        if (c == '.') {
                                if (dot++ || exp)
                                        return ('.');   /* will cause syntax error */
                                continue;
                        }
                        if (c == 'e') {
                                if (exp++)
                                        return ('e');   /* will cause syntax error */
                                continue;
                        }
                        /* end of number */
                        break;
                }
                *cp = '\0';
                if (cp - buf >= BSZ)
                        (void) printf("constant too long - truncated\n");
                else
                        ungetc(c, stdin);       /* push back last char read */
                yylval.dval = atof(buf);
                return (CONST);
        }
        return (c);
}
INTERVAL
hilo(a, b, c, d)
double a, b, c, d;
{
        /* returns the smallest interval containing a, b, c, and d */
        /* used by *, / routine */
        INTERVAL v;

        if (a > b) {
                v.hi = a;
                v.lo = b;
        } else {
                v.hi = b;
                v.lo = a;
        }
        if (c > d) {
                if (c > v.hi)
                        v.hi = c;
                if (d < v.lo)
                        v.lo = d;
        } else {
                if (d > v.hi)
                        v.hi = d;
                if (c < v.lo)
                        v.lo = c;
        }
        return (v);
}

INTERVAL
vmul(a, b, v)
double a, b;
INTERVAL v;
{
        return (hilo(a * v.hi, a * v.lo, b * v.hi, b * v.lo));
}

dcheck(v)
INTERVAL v;
{
        if (v.hi >= 0. && v.lo <= 0.) {
                (void) printf("divisor interval contains 0.\n");
                return (1);
        }
        return (0);
}

INTERVAL
vdiv(a, b, v)
double a, b;
INTERVAL v;
{
        return (hilo(a / v.hi, a / v.lo, b / v.hi, b / v.lo));
}
Chapter 10
m4: A macro processor
The m4(CP) macro processor allows the user to define and process strings of
characters called macros. Macros are strings in a file which get replaced by
some other text when the file is processed. The substituted text will often be
another literal string. Macros can also be defined in terms of a set of program­
ming constructs that m4 supports, and so macros may act as functions. These
functions can:
accept arguments
perform arithmetic
check conditions
perform file manipulation
do string processing
The m4 macro processor may be used to enhance the functionality of some
other application, such as a compiler or word processor.
The basic operation of m4 is to copy its input to its output. As the input is
read, each alphanumeric string (that is, string of letters and digits) is checked.
If the string is the name of a macro, the name is replaced by its definition. m4
then rereads the resulting string and may perform further manipulations on it.
When macros are called with arguments, the arguments are collected and sub­
stituted in the correct position in the defining text before m4 rescans the text.
The m4 macro processor provides a collection of about twenty built-in mac­
ros. In addition, the user can define new macros. This chapter describes some
of the most commonly used built-in macros and explains how you can define
your own macros. Built-in and user-defined macros work the same way,
except that some of the built-in macros have side effects on the state of the
process. For more information about the built-in macros, see m4(CP).
Invoking m4
To invoke m4, use a command of the form:
m4 [filenames]
Filename arguments are processed in order. If there are no arguments, or if a
dash (-) appears in the argument list, then the standard input is read. The pro­
cessed text is written to the standard output, and can be redirected as shown
by the following command:
m4 file1 file2 - > outputfile
The dash in the above example indicates that the standard input is processed
after file1 and file2.
Defining macros
The built-in function define is used to define new macros. The following
statement defines the name name as contents:
define(name, contents)
All subsequent occurrences of name will be replaced by contents. name must
be alphanumeric and must begin with a letter. (The underscore (_) is
regarded as a letter.) The string contents could be any text, including text that
contains balanced parentheses, and it may stretch over multiple lines. Here is
a short example of a file which is to be processed by m4:
define(Bob, Robert)
define(Sandy, Sandra)
define(I, James)
define(Julia, her mother)
Bob and I had lunch with Sandy and Julia.
If this file were processed by m4, the output would be:
Robert and James had lunch with Sandra and her mother.
The left parenthesis must immediately follow the word define, to signal that
define has arguments. If a macro or built-in name is not followed immedi­
ately by a left parenthesis it is assumed to have no arguments.
A macro name is only recognized as such if it is surrounded by
nonalphanumeric characters. A slightly modified version of the previous example
demonstrates this:
define(Bob, Robert)
define(Sandy, Sandra)
define(I, James)
define(Julia, her mother)
Bob and Ian had lunch with Sandy and Julia.
The output will be:
Robert and Ian had lunch with Sandra and her mother.
Notice that the letter "I" in the word "Ian" is not replaced by the string
"James". This is because one of the characters surrounding "I" is an
alphanumeric character.
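The whole-word rule can be sketched in C as follows. This is a simplified illustration, not how m4 itself is implemented: the table of names is fixed, the function name expand_names is hypothetical, and the replacement text is not rescanned.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical fixed macro table; m4 builds its table from define(). */
static const char *names[] = { "Bob", "Sandy", "I", "Julia" };
static const char *texts[] = { "Robert", "Sandra", "James", "her mother" };

/* Letters, digits, and the underscore are treated as name characters. */
static int is_name_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Copy in to out, replacing each complete token found in the table.
 * The caller must supply an out buffer large enough for the result.
 * Unlike m4, this sketch does not rescan the replacement text. */
void expand_names(const char *in, char *out)
{
    size_t i, len;
    const char *start, *repl;

    while (*in != '\0') {
        if (!is_name_char((unsigned char)*in)) {
            *out++ = *in++;            /* punctuation and white space */
            continue;
        }
        /* Collect the longest run of name characters: one whole token. */
        start = in;
        while (is_name_char((unsigned char)*in))
            in++;
        len = (size_t)(in - start);

        /* Replace the token only if it matches a name exactly. */
        repl = NULL;
        for (i = 0; i < sizeof names / sizeof names[0]; i++)
            if (strlen(names[i]) == len && strncmp(names[i], start, len) == 0)
                repl = texts[i];

        if (repl != NULL) {
            strcpy(out, repl);
            out += strlen(repl);
        } else {
            memcpy(out, start, len);
            out += len;
        }
    }
    *out = '\0';
}
```

Because "Ian" is collected as a single three-character token, it never matches the one-character name "I".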
Macro names or arguments can also be defined in terms of other names or
arguments. The following statements define macros M and N each to be 100:
define(N, 100)
define(M, N)
Since N is defined to be 100, and M is defined to be N, M is defined to be 100.
This illustrates two aspects of m4's behavior:
macro names can be expanded into their defining text at any time after they
are defined
m4 expands macro names into their defining text at the first opportunity
The first fact implies that the definition of N may be used anywhere in the text
that follows it, even inside another define statement. The second fact implies
that the second define statement above is equivalent to:
define(M, 100)
As soon as m4 reads the name N, it substitutes the text '100'. Therefore, even
if N subsequently changes, M does not.
If this is not the desired result, it may be avoided by interchanging the order
of the definitions:
define(M, N)
define(N, 100)
Now M is defined as the string "N", so when M is read later, m4 will first
replace it with the string "N". Then, because m4 rereads substituted text, the
name N will be replaced by '100'. The net effect is that M will be replaced by
whatever the value of N is at that time.
The more general solution to the problem described in the last section is to
delay the expansion of the arguments of define. This is done by quoting
them. Any text surrounded by a left single quotation mark (`) and a right
single quotation mark (') is not expanded immediately, but has the marks
stripped off. In the second statement below, the quotation marks are
removed from the string `N' as the argument is being collected, but the
macro N is not expanded. The effect is that M is defined as the string "N", not
as 100:
define(N, 100)
define(M, `N')
The general rule is that m4 strips off one level of single quotation marks
whenever it evaluates something. This is true even outside of macros. Given
the definitions in the last example, if you want the letter "M" to pass through
m4 unchanged and appear in the output, it must be quoted. For example:
"Come in, Mr Bond," `M' said.
The following definitions show another application of quoting:
define(N, 100)
define(N, 200)
The "N" in the second definition is evaluated as soon as it is seen: that is, it is
replaced by 100. This makes the second definition equivalent to:
define(100, 200)
Obviously, this statement doesn't have the desired effect. Moreover, it will be
ignored by m4, since only names (alphanumeric strings starting with a letter)
can be defined. To really redefine N, evaluation must be delayed by quoting,
as shown below:
define(N, 100)
define(`N', 200)
When using m4, it is often wise to quote the first argument of a define
statement.
Changing the quoting marks
If the single quote and grave marks used for quoting are not convenient for
some reason, the quoting marks can be changed with the built-in function
changequote. For example, the following statement defines the new quota­
tion marks to be the left and right square brackets:
changequote([, ])
The original characters can be restored by calling changequote with no
arguments.
The built-in function undefine removes the definition of a macro or built-in
function. The following statement removes the definition of N :
undefine(`N')
A built-in function can also be removed with undefine, as in the following
statement:
undefine(`define')
Notice that the macros and functions being undefined must be quoted. Once
a built-in function is removed, its value cannot be retrieved.
The built-in function ifdef determines whether a macro is currently defined.
This function takes three arguments, but the third argument may be null.
The first argument is a macro name, the second and third are strings (which
may contain macros). If the macro given as the first argument is defined, the
function returns the second argument (that is, the function call is replaced by
the text of the argument). If the macro is undefined, the third argument is
returned. If the third argument is null, the function returns a null string.
As an example, suppose that one of the macros xenix and unix is defined in a
particular version of a program. To define a macro system that identifies the system being
used, you could use the following statements:
ifdef(`xenix', `define(system, 1)')
ifdef(`unix', `define(system, 2)')
Again, note the quotation marks in this example.
Here is an example of an ifdef call with three arguments:
ifdef(`xenix', on XENIX, not on XENIX)
Using arguments
The simplest form of macro definition involves replacing a string with
another fixed string. User-defined macros can also have arguments, so
different invocations can have different results. Within the replacement text
for a macro, any occurrence of $n will be replaced by the nth argument when
the macro is actually used. For example, the replicate macro, defined in the
following example, generates code to double its argument. It uses the eval
macro, explained in the section "Using built-in arithmetic functions".
define(replicate, `eval($1+$1)')
Six of one and half-a-dozen of the other gives replicate(6)
When m4 processes the input file above, the output is:
Six of one and half-a-dozen of the other gives 12
Only the first nine macro arguments, $1 through $9, are accessible. The
argument $0 is the macro name itself. Arguments which are referred to in a macro
definition but not supplied when the macro is called are replaced by null
strings. The following macro simply concatenates up to nine arguments:
define(cat, $1$2$3$4$5$6$7$8$9)
It may be called with fewer than nine arguments, as in the call:
cat(x, y, z)
The result of this call will be the string xyz. The arguments $4 through $9 are
null, since no corresponding arguments were provided.
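The substitution of $n in a macro body can be sketched in C as follows. This is an illustrative sketch rather than the actual m4 implementation; the function name subst_args and its calling convention are assumptions made for the example.

```c
#include <string.h>

/* Expand $0..$9 in body. argv[0] is the macro name and argv[1..argc-1]
 * are the supplied arguments; a missing argument expands to nothing.
 * The caller must supply an out buffer large enough for the result. */
void subst_args(const char *body, int argc, const char *argv[], char *out)
{
    int n;

    while (*body != '\0') {
        if (body[0] == '$' && body[1] >= '0' && body[1] <= '9') {
            n = body[1] - '0';
            if (n < argc) {
                strcpy(out, argv[n]);
                out += strlen(argv[n]);
            }
            body += 2;                 /* skip the '$' and the digit */
        } else {
            *out++ = *body++;
        }
    }
    *out = '\0';
}
```

Called with the body of cat and the three arguments x, y, and z, the sketch produces xyz, because $4 through $9 contribute nothing.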
Leading unquoted spaces, tabs, and newlines that occur during argument
collection are discarded. All other white space is retained. The following
statement defines a to be b c:
define(a,   b c)
Arguments are separated by commas, but parentheses are counted properly,
so a comma protected by parentheses does not terminate an argument.
Therefore, in the statement below there are only two arguments: the second is
literally (b,c).
define(a, (b,c))
A literal comma or parenthesis can be inserted in an argument list by quoting
it.
Using built-in arithmetic functions
The m4 processor provides built-in functions for doing arithmetic on integers.
There are two simple functions: incr, which returns the result of incrementing
its numeric argument by 1, and decr, which returns the result of decrementing
it by 1. To perform the equivalent of the assignment statement 'M=N+1', use
the following statements:
define(N, 100)
define(M, `incr(N)')
Note that incr(N) is not equivalent to 'N=N+1', that is, it does not actually
change the value of N.
The more general mechanism for arithmetic is the built-in function eval,
which supports ordinary arithmetic on integers. It provides the following
operators (in decreasing order of precedence):
Arithmetic Addition
Arithmetic Negation
Not Equal
Less than
Less than or Equal to
Greater than
Greater than or Equal to
Logical NOT
Logical AND
Logical AND
Logical OR
Logical OR
Parentheses can be used to group operations where needed. All the operands
of an expression given to eval must ultimately evaluate to numbers. The
numeric value of a relation that evaluates to true, for example, '1>0', is 1; the
value of a relation which evaluates to false is 0. The precision of eval depends
on the implementation.
To set the value of the macro M to 2**N+1, use the following statements:
define(N, 3)
define(M, `eval(2**N+1)')
It is advisable to quote the defining text for a macro unless it is very simple
(for example, just a number).
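For comparison, the value that eval(2**N+1) computes can be written in C as follows. C has no exponentiation operator, so this sketch uses a hypothetical helper, ipow; the function names are assumptions made for the example. Relations in C yield 1 or 0 in the same way as described above for eval.

```c
/* Integer exponentiation by repeated multiplication; n is assumed
 * to be non-negative. C has no ** operator. */
long ipow(long base, long n)
{
    long r = 1;
    while (n-- > 0)
        r *= base;
    return r;
}

/* What eval(2**N+1) computes for a given N: exponentiation binds
 * more tightly than addition. */
long two_to_n_plus_1(long n)
{
    return ipow(2, n) + 1;
}
```

With N defined as 3, the result is 9.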
eval may appear on its own in the input. For example:
2 + 2 = eval(2 + 2)
Manipulating files
A new file can be included in the input at any time by using the built-in func­
tion include. The following example inserts the contents of the file boiler into
the file being processed:
include(boiler)
The m4 processor replaces the function call with the contents of the file. Files
containing a set of define statements are often included in this way.
It is a fatal error if include cannot access the file named. To avoid such errors,
the alternate form sinclude can be used. If sinclude (silent include) cannot
find a file, no messages are issued and processing continues.
The output of m4 can be diverted to files during processing and the collected
material output later. For this purpose, m4 maintains ten output streams,
called diversions. These are numbered 0 through 9, and the files diverted into
are referred to by their diversion number. Diversion 0 is the standard output.
The following statement puts all subsequent output at the end of a temporary
file referred to as 1:
divert(1)
Diverting to this file is stopped by another divert macro. A divert call which
has an argument in the range 2 through 9 will stop diverting to the current file
and start diverting to another file; a call which has no arguments or an argu­
ment of 0 will cause output to be sent to the standard output. Diverting to a
diversion whose name is not between 0 and 9, inclusive, causes the diverted
text to be discarded.
Diverted text is normally output all at once at the end of processing, with the
diversions output in numeric order. It is possible, however, to bring back
diversions at any time, that is, to append them to the current diversion. This
is done using the undivert macro. Diversion n may be brought back with a
call of the form
undivert(n)
The macro call is replaced by the contents of the diversion. undivert can take
multiple arguments, separated by commas; the diversions are brought back in
the order specified in the argument list. The following macro call will bring
back all the diversions, in numeric order:
undivert
When a diversion is "undiverted", it is emptied of all text.
The built-in divnum function returns the number of the currently active
diversion. This value is 0 during normal processing.
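The behavior of diversions can be modeled in C as follows. This is a simplified sketch, not the m4 implementation: diversions are held in fixed-size memory buffers rather than temporary files, diversion 0 is modeled as a buffer instead of being written out immediately, and the names div_buf, cur_div, and emit are hypothetical.

```c
#include <string.h>

#define NDIV  10     /* diversions 0 through 9 */
#define DIVSZ 256    /* arbitrary per-diversion capacity for this sketch */

static char div_buf[NDIV][DIVSZ];
static int  cur_div = 0;             /* the value divnum would report */

/* Select the diversion that receives subsequent output. */
void divert(int n)
{
    cur_div = n;
}

/* Append text to the current diversion; text sent to a diversion
 * outside 0..9, or beyond the buffer capacity, is discarded. */
void emit(const char *text)
{
    if (cur_div < 0 || cur_div >= NDIV)
        return;
    if (strlen(div_buf[cur_div]) + strlen(text) < DIVSZ)
        strcat(div_buf[cur_div], text);
}

/* Bring diversion n back into the current diversion and empty it.
 * A request to undivert the current diversion is ignored here. */
void undivert(int n)
{
    char saved[DIVSZ];

    if (n < 0 || n >= NDIV || n == cur_div)
        return;
    strcpy(saved, div_buf[n]);
    div_buf[n][0] = '\0';
    emit(saved);
}
```

Text emitted after divert(1) is held back until undivert(1) is called from another diversion, at which point it is appended there and diversion 1 becomes empty.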
Using system commands
Any program on the local operating system can be run from a file being pro­
cessed by m4 by way of the built-in syscmd macro. For example, the follow­
ing statement runs the UNIX System date command:
syscmd(date)
The macro syscmd can be used to create a file which is subsequently included
using the include macro.
Using conditionals
There is a built-in macro called ifelse that enables conditional testing. The
behavior of this macro is somewhat unusual, so the description and examples
below should be read carefully. ifelse takes three or more arguments. In the
simplest case, where only three arguments are present, the values of the first
two arguments are compared, and if they are the same string, then the macro
returns the value of the third argument. For example:
define(Bill, William)
ifelse(William, Bill, `Same guy -- book him!')
In this example, the first two arguments are the same after macro substitution
takes place, so the third argument will be output. If the first two arguments
are not the same, then a null string is returned.
If exactly four arguments are present, then if arguments one and two are not
identical, the fourth argument is returned. For example, you could define a
macro called compare that compares two strings and returns 1 if they are the
same, and 0 if they are not:
define(compare, `ifelse($1, $2, 1, 0)')
Note the quotation marks, which prevent the immediate evaluation of ifelse.
If the fourth argument is missing, it is treated as empty.
If there are exactly five arguments to ifelse, the fifth argument is ignored. If
there are more than five arguments, and if the first two arguments are not
identical, then the result will be as if ifelse were called with the fourth and
remaining arguments. For example:
ifelse(a, A, 1, b, b, 2)
This expression will return 2. a and A are not identical, so the result is the
same as if the following call had been made:
ifelse(b, b, 2)
Since the first two arguments are the same in this case, the value 2 is returned.
This method of evaluation works with even longer argument lists. Consider
this example:
ifelse(a, A, 1, b, B, 2, c, c, 3, d)
Since the first two arguments are not the same, the result is the same as that of
the macro call:
ifelse(b, B, 2, c, c, 3, d)
Again, the first two arguments are not the same, and so the result of this
macro call is now the same as the result of:
ifelse(c, c, 3, d)
This macro call will return 3. The ifelse macro thus provides a limited form
of multiple-decision capability.
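The decision rule described above can be sketched in C as follows. This is an illustration of the rule only, assuming the arguments have already undergone macro substitution; the array-based calling convention is an assumption made for the example.

```c
#include <string.h>

/* Sketch of the ifelse decision rule. args[0..n-1] are the arguments
 * after macro substitution; the return value is the selected argument,
 * or "" when nothing is selected. */
const char *ifelse(int n, const char *args[])
{
    while (n >= 3) {
        if (strcmp(args[0], args[1]) == 0)
            return args[2];            /* first two match: third wins */
        if (n == 4 || n == 5)
            return args[3];            /* four or five: fourth is the "else" */
        if (n == 3)
            return "";                 /* no "else" argument supplied */
        args += 3;                     /* six or more: drop three, retry */
        n -= 3;
    }
    return "";
}
```

Each pass through the loop corresponds to one of the rewriting steps shown in the examples above: a failed comparison with more than five arguments discards the first three arguments and starts again.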
Manipulating strings
A number of m4 built-in macros for performing string manipulation are dis­
cussed in this section.
The built-in len macro takes a string as its argument and returns its length.
The following statement will return a value of 6:
len(abcdef)
All characters within the parentheses are counted, so the following statement
will return a value of 5:
len((a,b))
The built-in substr macro returns substrings of strings. The macro call has the
form:
substr(s, i, n)
This call returns the substring of s that starts at position i and is n characters
long. The first character in the string is considered to be in position zero. The
following example returns the string "horses":
substr(`totiehorsesto', 5, 6)
If n is omitted, the remainder of the string will be returned.
The built-in macro index takes two strings as its arguments and returns the
starting position at which the second string appears in the first string. It
returns -1 if the second string does not appear in the first. The following
macro call returns "2":
index(`totiehorsesto', `tie')
As with the substr macro, the first character of a string is in position zero.
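The zero-based positions used by index and substr can be sketched in C as follows. The functions m4_index and m4_substr are hypothetical names for this illustration; a negative length stands in for an omitted third argument.

```c
#include <string.h>

/* Position of the first occurrence of t in s, counting from zero,
 * or -1 if t does not occur in s. */
int m4_index(const char *s, const char *t)
{
    const char *p = strstr(s, t);
    return p == NULL ? -1 : (int)(p - s);
}

/* Copy up to n characters of s, starting at position i, into out.
 * A negative n means "the rest of the string", standing in for an
 * omitted third argument. out must have room for the result. */
void m4_substr(const char *s, int i, int n, char *out)
{
    int len = (int)strlen(s);

    out[0] = '\0';
    if (i < 0 || i >= len)
        return;                        /* start is outside the string */
    if (n < 0 || n > len - i)
        n = len - i;                   /* clamp to what is available */
    memcpy(out, s + i, (size_t)n);
    out[n] = '\0';
}
```

For the string macroprocessor, m4_index finds pro at position 5, and m4_substr with a start of 5 and a length of 4 yields proc.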
The built-in translit macro performs character transliteration. The macro call
takes the form:
translit(s, f, t)
The arguments s, f, and t are all strings; the macro returns a string obtained by
replacing all characters in s that appear in f with the corresponding character
in t. The following macro call replaces each of the letters in aeiou with the cor­
responding digit:
translit(questionable, aeiou, 12345)
The returned value will be q52st34n1bl2. If t is shorter than f, characters of f
that don't have an entry in t are replaced by the null string. Therefore, the
following statement deletes vowels from the first argument:
translit(questionable, aeiou)
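The transliteration rule can be sketched in C as follows. This is an illustration of the rule as described, not the m4 source; an empty string for t stands in for an omitted third argument.

```c
#include <string.h>

/* Replace each character of s that appears in f by the character at
 * the same position in t; characters of f with no counterpart in t
 * are deleted. out must have room for strlen(s) + 1 bytes. */
void translit(const char *s, const char *f, const char *t, char *out)
{
    size_t tlen = strlen(t);
    const char *p;

    for (; *s != '\0'; s++) {
        p = strchr(f, *s);
        if (p == NULL)
            *out++ = *s;               /* not in f: copy unchanged */
        else if ((size_t)(p - f) < tlen)
            *out++ = t[p - f];         /* mapped character */
        /* otherwise t is too short: drop the character */
    }
    *out = '\0';
}
```

Applied to questionable with aeiou and an empty t, the sketch removes every vowel and leaves qstnbl.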
The built-in macro errprint writes its arguments to the standard error. The
statement below prints the message "fatal error":
errprint(`fatal error')
The dumpdef macro is a debugging aid that dumps the current definitions of
defined terms. If there are no arguments, all definitions are printed. Other­
wise, the definitions named as arguments are printed.
Appendix A
ANSI implementation-defined behavior
The American National Standards Institute (ANSI) has adopted a standard
that specifies the form and establishes the interpretation of programs written
in the programming language C. The standard is detailed in ANSI Document
Number X3J11/90-013. A list of the characteristics and extensions that must
be documented is provided in an appendix to the ANSI document, subsection
"F.3 Implementation-Defined Behaviour". This list serves as a guide to docu­
menting implementation-defined behaviour. Note that it is not an exhaustive
list but a guide provided as a convenience.
According to the ANSI standard, each ANSI-conforming implementation of the
C language must be accompanied by a document that
defines all implementation-defined characteristics and extensions. This
appendix provides this information for the Microsoft C Compiler, version 6.
It is organized into the following sections, corresponding to the sections in the
appendix to the ANSI document:
Floating Point
Arrays and Pointers
Structures, Unions, Enumerations, and Bit-fields
Preprocessing Directives
Library Functions
Locale-specific Behaviour
This section describes the implementation-defined characteristics of source
code translation. It corresponds to section "F.3.1 Translation" of the ANSI
document.
Identifying diagnostics
The compiler generates diagnostic messages in the following form:
filename ( line-number) : diagnostic level error source error number message
filename and line-number indicate the location of the offending statement.
diagnostic level indicates whether the message is an error or a warning.
error source indicates which program in the compilation process generated
the error as follows:
C a compiler error
D a command line error
The leading digit of error number indicates the severity of the message as
follows:
1 a fatal error
2 a non-fatal error
4 a warning
message is an explanation of the error or warning.
This section describes the implementation-defined characteristics of the
execution environment. It corresponds to section "F.3.2 Environment" of the
ANSI document.
Arguments to main()
The function main can have zero or two arguments, and is declared as follows:
int main(void)
int main(int argc, char *argv[])
argc will always be greater than or equal to 1.
argv[0] contains the full pathname of the program being executed.
argv[1] through argv[argc-1] contain the exact strings as supplied to exec. No
transformations on the arguments are performed by exec, although the shell
may transform the arguments.
Interactive devices
The input and output dynamics of an interactive device are specified by the
standard. An interactive device is defined by the UNIX system, typically the
console and terminals.
This section describes the implementation-defined characteristics of
identifiers. It corresponds to section "F.3.3 Identifiers" in the ANSI document.
Significant characters without external linkage
The Microsoft C Compiler recognizes up to 254 significant initial characters in
an identifier without external linkage. For debugging purposes, only the first
128 characters are recognized.
Significant characters with external linkage
The Microsoft C Compiler recognizes up to 254 significant initial characters in
an identifier with external linkage. For debugging purposes, only the first 128
characters are recognized.
Significance of character case
Case is significant in an identifier with external linkage.
This section describes the implementation-defined characteristics of charac­
ters. It corresponds to section "F.3.4 Characters" in the ANSI document.
Source and execution character sets
The source character set is the set of legal characters that can appear in source
files. The execution character set is the set of legal characters interpreted in
the execution environment. For the Microsoft C compiler, the source and exe­
cution character sets are the same, the standard 8-bit character set.
Multi-byte shift states
Multi-byte characters are used by some C language implementations to
represent foreign-language characters not represented in the standard ASCII
set. The Microsoft C compiler supports multi-byte characters but does not
implement shift states; only the "C" locale is implemented. Consequently, no
useful operations can be performed on multi-byte characters.
Bits per character
The number of bits in a character in the execution character set is represented
by the manifest constant CHAR_BIT, which is defined in limits.h. It is defined as
8 bits.
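A program can use the constant to compute the width of any object in bits, as in the following short sketch. The function name bits_in is hypothetical.

```c
#include <limits.h>

/* Number of bits in an object that occupies nbytes bytes. */
unsigned long bits_in(unsigned long nbytes)
{
    return nbytes * CHAR_BIT;
}
```

For example, bits_in(sizeof(char)) is 8 on an implementation where CHAR_BIT is 8.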
Mapping character sets
The source code character set maps directly to the execution character set.
The source character set, a proper subset of the ASCII character set, com­
mences with ASCII 32 and ends with ASCII 126. This includes all of the print­
able graphic characters of the ASCII character set.
Constants with unrepresented characters and escape sequences
Certain nongraphic characters may be represented in the source character set
by escape sequences which begin with a backslash followed by a lowercase
letter. These escape sequences map onto specific characters in the ASCII set as
shown in the following table:
Table A-1
Escape sequence   ASCII value   Character Name
\'                39            single quote
\"                34            double quote
\?                63            question mark
\r                13            carriage return
\t                9             horizontal tab
\v                11            vertical tab
For escape sequences other than those listed in Table A-1, the backslash is
stripped and the characters in the sequence are treated like ordinary charac­
ters. All other characters map directly from the source character set to the
external character set.
Constants with multiple or wide characters
When a character constant contains more than one character, the individual
values of the characters are concatenated together to form an int. Since int is
equal to 4 bytes and each byte can hold a character, a character constant may
be up to four characters long. Excess characters are dropped from the left
side. For example, if
int i = 'abcd';
then the value of i in hex is 61626364, the individual values for "a", "b", "c",
and "d" concatenated together. Again, if
int ii = 'abcdef';
then the value of ii in hex is 63646566, the individual values for "c", "d", "e", and
"f" concatenated together. Note that the excess characters on the left, "a" and
"b", are dropped.
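The packing rule can be reproduced portably with the following sketch, which builds the value from a string instead of relying on a multi-character constant. The function name pack_chars is hypothetical, and the sketch assumes the ASCII character set, in which "a" has the hex value 61.

```c
/* Pack the characters of s into a 32-bit value the way the text
 * describes: each new character is shifted in on the right, and
 * characters pushed past the fourth byte fall off the left. */
unsigned long pack_chars(const char *s)
{
    unsigned long v = 0;

    while (*s != '\0') {
        v = ((v << 8) | (unsigned char)*s) & 0xFFFFFFFFUL;
        s++;
    }
    return v;
}
```

Under the ASCII assumption, pack_chars("abcd") gives hex 61626364, and pack_chars("abcdef") gives hex 63646566 because the two leftmost characters are shifted out.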
Locale used for multi-byte conversion
The compiler uses the "C" locale to convert multibyte characters into corre­
sponding wide characters.
Range of char values
A "plain" char has the same range of values as a signed char. The -J
command line option changes the range of values to be the same as an unsigned
char.
This section describes the implementation-defined characteristics of integers.
It corresponds to section "F.3.5 Integers" in the ANSI document.
Integer range and representation
Integers are represented as follows:
signed char
unsigned char
signed short
unsigned short
signed int
signed long
unsigned int
unsigned long
NOTE The section "C6.0 Implementation Limits description", later in this
appendix, presents the range of values each integer type can take on.
Demotion of integers
When an integer is converted to a shorter signed integer, if the value cannot
be represented, the excess bytes are truncated. For example, if
char c = (char) 0xabcdef;
then the value of c is 0xef.
When an unsigned integer is converted to a signed integer of equal length, if
the value cannot be represented, the conversion is the same, but the value is
treated like a signed number.
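The two steps described above, truncation to the low-order byte and reinterpretation as a signed number, can be written out explicitly as follows. This sketch assumes an 8-bit, two's-complement char; the function names are hypothetical.

```c
/* Keep only the low-order byte of v, as the demotion to char does. */
unsigned char low_byte(unsigned long v)
{
    return (unsigned char)(v & 0xFF);
}

/* Reinterpret an 8-bit pattern as a two's-complement signed value. */
int as_signed_byte(unsigned char b)
{
    return b >= 0x80 ? (int)b - 256 : (int)b;
}
```

For the value 0xabcdef, low_byte returns 0xef, and as_signed_byte(0xef) returns -17, which is the value a signed char holding that bit pattern represents.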
Signed bitwise operations
In bitwise operations on signed integers, each of the operands of a bitwise
operation is subject to the "usual arithmetic conversions". Consequently, nar­
row integers are subject to "integral promotions". For example, a short is pro­
moted into an int. If the value of the short is negative, the sign is extended in
the int. After the promotion, bitwise operations on signed ints are applied as
if they were applied on unsigned ints.
Sign of division remainder
When integers are divided and the division is inexact, the sign of the
remainder is the same as the sign of the dividend.
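The following sketch shows the rule in use. The 1989 ANSI standard leaves the sign of the remainder to the implementation; later revisions of the C standard, starting with C99, require the behavior described here. The function name remainder_of is hypothetical.

```c
/* Remainder of a / b; b must be nonzero. With truncating division
 * the result takes the sign of the dividend a. */
int remainder_of(int a, int b)
{
    return a % b;
}
```

With this rule, remainder_of(-7, 2) is -1 and remainder_of(7, -2) is 1, and in both cases (a / b) * b + a % b equals a.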
Right shift of negative-valued signed integer
When a signed integral type with a negative value is right shifted, the sign is
propagated.
Floating point
This section describes the implementation-defined characteristics of floating
point numbers. It corresponds to section "F.3.6 Floating Point" in the ANSI
document.
Floating-point range and representation
The representations and sets of values of the various types of floating-point
numbers in the Microsoft C Compiler are:
float        4 byte IEEE format
double       8 byte IEEE format
long double  8 byte IEEE format
NOTE The section "C6.0 Implementation Limits description", later in this
appendix, presents the range of values each floating point type can take on.
Converting an integer to a floating-point
An integral value is truncated to the nearest floating-point representation if it
cannot be converted exactly.
Converting a floating-point to a narrower floating-point
When a floating-point number is converted to a narrower floating-point num­
ber, the direction of truncation or rounding is towards the nearest
floating-point number.
An overflow causes a run-time exception to occur. For example,
double d;
float f;
main()
{
    d = 1.2345e300;
    f = d;    /* dumps core */
}
If an underflow occurs, the value is rounded towards 0.
Arrays and pointers
This section describes the implementation-defined characteristics of arrays
and pointers. It corresponds to section "F.3.7 Arrays and Pointers" in the ANSI
document.
Largest array size
The type of integer required to hold the maximum size of an array, that is, the
type of the sizeof operator, size_t, is an unsigned int.
Casting pointers
The effect of casting a pointer to an integer or vice versa depends on the type
of int. When casting any pointer into a long int or an int and back, no infor­
mation is lost. However, when casting any pointer to type char or short and
back, the original pointer will not be restored.
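The round-trip behavior can be sketched as follows. This example uses uintptr_t from <stdint.h>, a later (C99) facility that is assumed here and is not part of the C 6.0 compiler; both function names are hypothetical.

```c
#include <stdint.h>

/* Cast a pointer to an integer wide enough to hold it and back again;
   the original pointer is expected to be restored. */
int pointer_survives_wide_cast(void *p)
{
    uintptr_t wide = (uintptr_t)p;
    return (void *)wide == p;
}

/* Cast a pointer through a one-byte integer; all but the low-order
   byte of the address is lost, so the pointer is normally not restored. */
int pointer_survives_narrow_cast(void *p)
{
    unsigned char narrow = (unsigned char)(uintptr_t)p;
    return (void *)(uintptr_t)narrow == p;
}
```

On the i386 described in this appendix, int and long are both as wide as a pointer, which is why the text singles them out as safe.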
Pointer subtraction
The difference between two pointers to members of the same array is held in
ptrdiff_t, a signed int.
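A minimal sketch of pointer subtraction, using the hypothetical helper element_distance. The result counts array elements, not bytes, and is negative when the second pointer precedes the first.

```c
#include <stddef.h>

/* Number of elements between two pointers into the same int array. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}
```

Both pointers must refer to elements of the same array; subtracting unrelated pointers is undefined.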
Registers
This section describes the implementation-defined characteristics of registers.
It corresponds to section "F.3.8 Registers" in the ANSI document.
Using registers
The compiler will place the first 3 variables declared with the register
storage-class specifier into registers as long as the variable is accessed and it is
of the correct type. Three registers in the i386 are used for register variables:
EBX, ESI, and EDI. These registers can hold values of types long, int, and short.
Structures, unions, enumerations, and bit-fields
This section describes the implementation-defined characteristics of struc­
tures, unions, enumerations, and bit-fields. It corresponds to section "F.3.9
Structures, Unions, Enumerations, and Bit-Fields" in the ANSI document.
Improper access to a union
Attempts to access a union member will always succeed. The value of the
member object will be the value of the first "n" bytes of the union (where "n"
is the size of the object being accessed) interpreted according to the type of the
accessed member.
Padding and alignment of members of structures
The Intel 80x86 does not impose a restriction on the alignment of objects; any
object can start at any address. However, for certain objects, having a particu­
lar starting address can speed up processor access.
The C 6.0 compiler aligns the whole structure on a 4-byte boundary by default
(see "Pragmas" below). All 4-, 8-, and 10-byte objects are aligned on a 4-byte
boundary, 2-byte objects are aligned on a 2-byte boundary, while 1-byte
objects are not aligned.
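The effect of alignment on member offsets can be inspected with offsetof. The sketch below is illustrative only: the struct names and helper functions are hypothetical, and it assumes a compiler that accepts #pragma pack(1) and an empty #pragma pack() to restore the default, as many current compilers do.

```c
#include <stddef.h>

/* Default alignment: padding may follow the 1-byte member. */
struct natural_layout {
    char tag;
    long value;
};

/* Packed on a 1-byte boundary: no padding is inserted. */
#pragma pack(1)
struct packed_layout {
    char tag;
    long value;
};
#pragma pack()

size_t natural_offset(void) { return offsetof(struct natural_layout, value); }
size_t packed_offset(void)  { return offsetof(struct packed_layout, value); }
```

With the 4-byte default described above, natural_offset() would be expected to return 4 under C 6.0, while packed_offset() returns 1.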
Sign of bit-fields
A "plain" int bit-field is treated as a signed int bit-field.
Order of allocation of bit-fields
Bit-fields are allocated in an int starting from the lowest order bit to the
highest. For example, the structure
struct { int x:10; int y:20; int z:2; } y;
would be stored like
zzyyyyyyyyyyyyyyyyyyyyxxxxxxxxxx
44444444333333332222222211111111
where z, y, and x represent the bits of the corresponding bit-field, and
1, 2, 3, and 4 represent the bits of the corresponding bytes.
The order of allocation of bit-fields within an int is determined by the archi­
tecture of Intel processors which store the low order byte of an int at a lower
address than the highest order byte in that int. For example:
1st byte - low memory or address 'x'
2nd byte - address 'x' + 1
3rd byte - address 'x' + 2
4th byte - address 'x' + 3
Alignment of bit-fields
A bit-field cannot straddle its storage-unit boundary, an int, which is 32 bits
type of values of an enumerated type
The values of an enumeration type are represented by an int.
Qualifiers
This section describes the implementation-defined characteristics of qualifiers.
It corresponds to section "F.3.10 Qualifiers" in the ANSI document.
Access to volatile objects
Any reference to or use of an object that has been qualified with the volatile
type constitutes an access to that object. All parts of a referenced object are
assumed to be referenced if any part of the object is referenced.
Declarators
This section describes the implementation-defined characteristics of declara­
tors. It corresponds to section "F.3.11 Declarators" in the ANSI document.
Maximum number of declarators
An unlimited number of declarators (pointer, array, or function) may modify
an arithmetic, structure, or union type. Note that for COFF debugging pur­
poses only, the maximum number of declarators is "6".
Statements
This section describes the implementation-dependent characteristics of statements.
It corresponds to section "F.3.12 Statements" in the ANSI document.
Maximum number of case values
There is no fixed limit on the number of case values in a switch statement.
Preprocessing directives
This section describes the implementation-defined characteristics of prepro­
cessing directives. It corresponds to section "F.3.13 Preprocessing Directives"
in the ANSI document.
Character constants and conditional inclusion
The value of a single-character character constant in a constant expression
that controls conditional inclusion matches the value of the same character
constant in the execution character set. This may be a negative value.
Locating includable source files
To locate bracketed, includable source files, the preprocessor searches through
any paths specified on the command line by the -I option and then through
the default path, normally /usr/include. The -X option disables the search
through the default path.
Including files with quoted names
To locate quoted, includable source files, the preprocessor looks in the direc­
tory containing the file being compiled, then searches through any paths
specified on the command line by the -I option, and then searches through the
default path, normally /usr/include. The -X option disables the search through
the default path. If the file is not found, the preprocessor reparses the
"#include string newline" as if it were a "< >" delimited file and repeats the
search as described in the previous section.
Filenames can be any legal UNIX filename including a full specification of
Character sequences
Both source and execution sets include the full ASCII character sets. The de­
limited sequence is mapped directly into the external file name.
The behavior of each recognized #pragma directive in C 6.0 is listed below:
Places the specified functions in the given text segment.
#pragma alloc_text(text_segment, function [, function] ...)
Places the comment string into the comment segment of the object file.
#pragma comment(comment_type, "string")
comment_type is one of: compiler, lib, exestr, or user.
Allocates a segment named "segment" with class name "name1", "name2", ....
#pragma code_seg("segment", "name1" [, "name2"] ...)
Tells the compiler to turn stack probes on or off. The default is off.
#pragma check_stack(on | off)
Turns pointer checking on or off. The default is off.
#pragma check_pointer(on | off)
This pragma is recognized, but performs no useful function under UNIX.
Allocates a segment named "segment" with class name "name1", "name2", ....
#pragma data_seg("segment", "name1" [, "name2"] ...)
Tells the compiler to generate code for function calls instead of using the
inline (intrinsic) form of the function(s) listed.
#pragma function(function1 [, function2] ...)
Tells the compiler to generate inline code for the listed function(s).
#pragma intrinsic(name)
Turns loop optimization on and off.
#pragma loop_opt(on | off)
The default is off. Optimization can be toggled for the complete compilation
using the -Ol option.
Prints string on the standard output.
#pragma message("string")
If string is empty, the compiler uses default values. If on or off is omitted, the
setting for the given options is toggled.
#pragma optimize("string", on | off)
string is one or more of:
Assume no aliases in the following code.
Enable local common subexpression.
Enable global register allocation.
Enable global common subexpressions.
Optimize loops.
Disable unsafe loop optimizations.
Ensure floating-point consistency.
Optimize for speed.
Assume no aliases except across function calls.
This pragma is recognized but performs no useful function under UNIX. on or
off toggles last setting.
#pragma plmn(on | off)
This pragma is recognized but performs no useful function under UNIX. on or
off toggles last setting.
#pragma plmf(on | off)
Sets the number of lines on the source listing page to a number greater than 15
and less than or equal to 255. The default is 63.
#pragma pagesize(integer)
Sets the listing line size in a source listing to a number greater than 79 and less
than or equal to 132. The default is 79.
#pragma linesize(integer)
When the compiler is generating a source listing, generates integer blank
pages. The number of blank pages may be greater than or equal to 1 and less
than or equal to 127. The default is to generate one page.
#pragma page(integer)
When the compiler is generating a source listing, generates integer blank
lines. The number of blank lines may be greater than or equal to 1 and less
than or equal to 127. The default is to generate one blank line.
#pragma skip(integer)
Instructs the compiler to pack structs on a 1, 2, 4, or 8 byte boundary (defaults
to 4).
#pragma pack(1 | 2 | 4 | 8)
Controls whether or not code is generated to perform range checking on
switch expressions.
#pragma switch_check(on | off)
on or off toggles the last setting.
Allocates a segment named identifier in one of _TEXT, _DATA, _BSS, CONST.
#pragma segment(_TEXT | _DATA | _BSS | CONST, identifier, identifier, ...)
This pragma is recognized, but performs no useful function under UNIX.
#pragma searchlib
Tells the compiler to assume the listed "far" variables are allocated in the same
segment. Must be used with the -ND command line option.
#pragma same_seg(variable1, variable2, ...)
Sets a subtitle, string, on the source listing page.
#pragma subtitle("string")
Sets a title, string, on the source listing page.
#pragma title("string")
Definitions for date and time
The macros __DATE__ and __TIME__ expand to the date and time of translation
respectively. The date and time are always available to the macros.
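A minimal sketch of how the translation date and time might be embedded in a program follows. The function build_stamp is a hypothetical helper, and snprintf is a later (C99) library function assumed here for bounds safety. The ANSI standard fixes the formats as "Mmm dd yyyy" and "hh:mm:ss".

```c
#include <stdio.h>

/* Writes "Built on <date> at <time>" into buf and returns the number
   of characters the full message requires (excluding the final null). */
int build_stamp(char *buf, size_t size)
{
    return snprintf(buf, size, "Built on %s at %s", __DATE__, __TIME__);
}
```

Because the two macros are string literals fixed at compile time, the stamp changes only when the source file is recompiled.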
Library functions
This section describes the implementation-defined characteristics of the compiler
library functions. It corresponds to section "F.3.14 Library Functions" in
the ANSI document.
Expanding the NULL macro
The macro NULL expands to the null pointer constant,
((void *)0);
Diagnostic printed by the assert function
The assert function prints the diagnostic
'Assertion failed: <expression>, <file name>, <line number>'
On termination, the assert function prints out a string of the form:
Assertion failed: <comparison operation>, file <source filename>, line <source line number>.
Character testing
The sets of characters that the isalnum, isalpha, iscntrl, islower, isprint, and
isupper functions test for are listed below in the section "Locale-Specific
Behavior".
Math functions and domain errors
Domain errors occur when an input argument is outside the domain over
which the mathematical function is defined. The values returned by the
mathematics functions on the occurrence of domain errors are presented in the
following table:
Table A-2
pow( )
sqrt( )
asin( )
acos( )
fmod( )
atan2( )
-HUGE_VAL or -HUGE or 0.0 depending on input values.
The value of EDOM is "33".
Underflow of floating-point values
On underflow range errors, the mathematics functions listed below set the
integer expression errno to the value of the macro ERANGE.
sin( )
cos( )
exp( )
log( )
tan( )
pow( )
sinh( )
cosh( )
tanh( )
ldexp( )
Domain errors and the fmod function
The fmod function computes the floating point remainder of x/y. A domain
error occurs and zero is returned when the fmod function has a second argument
of zero (errno=EDOM).
signal function
The set of signals for the signal function are:
Table A-3
1 1 [1]
illegal instruction (not reset when caught)
used by abort, replaces SIGIOT
floating point exception
segmentation violation
software termination signal
Default signals
The default handling for each signal recognized by the signal function is to
not ignore these signals.
A program runs in a process which inherits its handling of signals from its
parent process (such as the shell). If signals are set to SIG_IGN in the parent
process, that remains the same. If the signals were set to be caught in the
parent they are reset to SIG_DFL which may be to ignore the signal or ter­
minate the program. The setting of SIG_DFL remains the same.
Signal blocking
If the equivalent of "signal(sig, SIG_DFL);" is not executed prior to the call of a
signal handler, signal blocking is performed. Signals are set to their default
values just before program execution.
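The following sketch shows one conventional way to install a handler with signal and observe that it runs. The names caught, on_term, and catch_sigterm_once are hypothetical. The handler only sets a flag, which is the portable minimum a handler may do.

```c
#include <signal.h>

/* Flag set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t caught = 0;

static void on_term(int sig)
{
    (void)sig;      /* unused */
    caught = 1;
}

/* Install the handler, send SIGTERM to this process, and report
   whether the handler ran. Returns -1 if the handler cannot be set. */
int catch_sigterm_once(void)
{
    caught = 0;
    if (signal(SIGTERM, on_term) == SIG_ERR)
        return -1;
    raise(SIGTERM);
    signal(SIGTERM, SIG_DFL);   /* restore the default disposition */
    return caught;
}
```

A program that must keep catching a signal typically reinstalls its handler inside the handler itself, since the disposition may be reset to SIG_DFL on delivery.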
SIGILL signal
The default is not reset if the SIGILL signal is received by a handler specified
to the signal function.
Terminating new-line characters
A text stream is an ordered sequence of characters composed into lines, each
line consisting of zero or more characters plus a new-line character. The last
line of a text stream does not require a terminating new-line character.
Space characters before a new-line character
Space characters written out to a text stream immediately before a new-line
character appear when read in.
Null characters appended to a binary stream
A binary stream is an ordered sequence of characters that can transparently
record internal data. An unlimited number of null characters may be
appended to data written to a binary stream.
File position in append mode
The file position indicator of an append mode stream is initially positioned at
the end of the file.
Writing on text stream
Writing to a text stream causes the associated file to be truncated beyond that
File buffering
Files are line buffered or unbuffered depending on the setting of setbuf( ) and
setvbuf( ). The default buffer size is set by BUFSIZ in stdio.h.
Existence of zero-length files
A zero-length file, one on which no characters have been written by an output
stream, actually exists.
Composing valid file names
The rules for composing valid file names are:
The maximum file name length is 14 characters under SCO UNIX 3.2.2.
The maximum file name length is 255 characters under SCO UNIX 3.2.4.
All ASCII characters are acceptable with the exception of "/", "." and "..".
File access limits
The same file can be simultaneously open multiple times.
Removing open files
The effect of the remove function on an open file is to make the file inaccessi­
ble to other programs or to users. The file remains accessible to the program,
and other running programs that have already opened the file, through
already open file descriptors, but not through new attempts to open a file
with the same name. Once all open file descriptors are closed, the file will be
irrevocably gone.
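The behavior described above can be demonstrated with a short sketch. The function read_after_remove is a hypothetical helper; it assumes a UNIX-style file system and a writable working directory.

```c
#include <stdio.h>

/* Returns 1 if data written before remove() can still be read through
   the already-open stream while the name can no longer be opened,
   0 if not, and -1 if the file could not be created or removed. */
int read_after_remove(const char *path)
{
    FILE *fp = fopen(path, "w+");
    FILE *again;
    int c, reopened;

    if (fp == NULL)
        return -1;
    fputs("x", fp);
    fflush(fp);
    if (remove(path) != 0) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    c = getc(fp);                 /* still readable through fp */

    again = fopen(path, "r");     /* a new open by name should fail */
    reopened = (again != NULL);
    if (again != NULL)
        fclose(again);

    fclose(fp);                   /* the storage is released here */
    return c == 'x' && !reopened;
}
```

This is the usual idiom for scratch files that must disappear even if the program terminates abnormally.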
Renaming with a name that exists
If a file with the new name exists prior to a call to the rename function,
rename will succeed, overwriting the existing file in the process.
Output of pointer values
The output for %p conversion in the fprintf function is a hex number. The
value may be preceded by a "0X" prefix if %#p is used.
Input of pointer values
The input for %p conversion in the fscanf function is a number of hex digits
interpreted as a pointer. If %#p is used, "0x" is used as a prefix. This option is
identical to the %x conversion specifier.
Reading ranges
A "-" character that is neither the first nor the last character in the scanlist for
%[ conversion in the fscanf function is interpreted as a range indicator as long
as the character to the left is less than the character to the right of the "-". The
following example selects all upper-case ASCII characters from "A" to "Z":
%[A-Z]
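A range in a scanlist might be used as in the sketch below. The function leading_upper is a hypothetical helper that applies the same conversion through sscanf, with a field width added to bound the output buffer.

```c
#include <stdio.h>

/* Copies the leading run of upper-case letters from text into out
   (at most 31 characters). Returns the number of items converted:
   1 if at least one upper-case letter was found, otherwise 0. */
int leading_upper(const char *text, char out[32])
{
    out[0] = '\0';
    return sscanf(text, "%31[A-Z]", out);
}
```

Applied to "HELLOworld", the conversion stops at the first character outside the range and stores "HELLO".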
File position errors
The macro errno is set to 2 by the fgetpos or ftell function on failure. A 9 or a
22 may be generated by the lseek( ) system call (ftell( ) and fgetpos( ) call
lseek( )).
Messages generated by the perror function
The messages generated by the perror function look like:
<some user message>: <error message>
The error message is selected, from the list shown below under strerror,
depending on the value of errno just before calling perror( ). If errno has a
value of "0" then the first string in the list is used and so on.
Allocating zero memory
The calloc, malloc, and realloc functions return a pointer to a zero-sized block
if the size requested is zero. This is different behaviour from the behaviour of
the memory allocation functions in libmalloc.a which always return a NULL
on malloc(0).
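Because the two libraries differ on zero-size requests, portable code usually avoids treating a NULL result from a zero-size request as a failure. A minimal sketch, with the hypothetical helper alloc_bytes:

```c
#include <stdlib.h>

/* Allocates n bytes and stores the result in *out.
   Returns 0 on success and -1 on a genuine allocation failure.
   A NULL result for n == 0 is accepted, so the caller behaves the
   same with either allocator described above. */
int alloc_bytes(size_t n, void **out)
{
    *out = malloc(n);
    if (*out == NULL && n != 0)
        return -1;
    return 0;
}
```

In either case the returned pointer may be passed to free, since free(NULL) is defined to do nothing.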
abort function and open and temporary files
When the abort function is executed, the buffers of open and temporary files
are flushed, all files are closed, and execution returns to the shell.
exit function
The exit( ) function returns the lower byte of its argument to the shell if the
value of the argument is other than zero, EXIT_SUCCESS, or EXIT_FAILURE.
The shell or the user may interpret this value in any way.
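The truncation of the exit value to its lower byte can be observed from a parent process. The sketch below uses the POSIX calls fork and waitpid; the function status_seen_by_parent is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child process that terminates with the given code and
   returns the status the parent observes, or -1 on error. */
int status_seen_by_parent(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: terminate immediately */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);   /* only the low-order 8 bits survive */
}
```

A child that exits with 300 is therefore reported as 44, which is why exit values are best kept in the range 0 through 255.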
Environment names
The environment list is altered by the function putenv. putenv("string");
where string is of the format: variable=value.
NOTE Since the underlying operating system is POSIX and X/OPEN compa­
tible, the size of the environment is determined by a manifest defined by
system function
Any UNIX command line that is acceptable to the Bourne Shell can be passed
as the string to the system function. The Bourne Shell is assumed to be the
command processor and that shell is run as a child process to the program's
(that is, parent) process.
strerror function
The strerror function generates the following messages:
0. Error 0
1. Not owner
2. No such file or directory
3. No such process
4. Interrupted system call
5. I/O error
6. No such device or address
7. Arg list too long
8. Exec format error
9. Bad file number
10. No child processes
11. No more processes
12. Not enough space
13. Permission denied
14. Bad address
15. Block device required
16. Device busy
17. File exists
18. Cross-device link
19. No such device
20. Not a directory
21. Is a directory
22. Invalid argument
23. File table overflow
24. Too many open files
25. Not a typewriter
26. Text file busy
27. File too large
28. No space left on device
29. Illegal seek
30. Read-only file system
31. Too many links
32. Broken pipe
33. Argument out of domain
34. Result too large
35. No message of desired type
36. Identifier removed
37. Channel number out of range
38. Level 2 not synchronized
39. Level 3 halted
40. Level 3 reset
41. Link number out of range
42. Protocol driver not attached
43. No CSI structure available
44. Level 2 halted
45. Deadlock situation detected/avoided
46. No record locks available
47. Error 47
48. Error 48
49. Error 49
50. Bad exchange descriptor
51. Bad request descriptor
52. Message tables full
53. Anode table overflow
54. Bad request code
55. Invalid slot
56. File locking deadlock
57. Bad font file format
58. Error 58
59. Error 59
60. Not a stream device
61. No data available
62. Timer expired
63. Out of stream resources
64. Machine is not on the network
65. Package not installed
66. Object is remote
67. Link has been severed
68. Advertise error
69. Srmount error
70. Communication error on send
71. Protocol error
72. Error 72
73. Error 73
74. Multihop attempted
75. Error 75
76. Error 76
77. Not a data message
78. Filename too long
79. Error 79
80. Name not unique on network
81. File descriptor in bad state
82. Remote address changed
83. Can not access a needed shared library
84. Accessing a corrupted shared library
85. .lib section in a.out corrupted
86. Attempting to link in more shared libraries than system limit
87. Can not exec a shared library directly
88. Error 88
89. Function not implemented
90. Error 90
91. Error 91
92. Error 92
93. Error 93
94. Error 94
95. Error 95
96. Error 96
97. Error 97
98. Error 98
99. Error 99
100. Error 100
101. Error 101
102. Error 102
103. Error 103
104. Error 104
105. Error 105
106. Error 106
107. Error 107
108. Error 108
109. Error 109
110. Error 110
111. Error 111
112. Error 112
113. Error 113
114. Error 114
115. Error 115
116. Error 116
117. Error 117
118. Error 118
119. Error 119
120. Error 120
121. Error 121
122. Error 122
123. Error 123
124. Error 124
125. Error 125
126. Error 126
127. Error 127
128. Error 128
129. Error 129
130. Error 130
131. Error 131
132. Error 132
133. Error 133
134. Error 134
135. Structure needs cleaning
136. Error 136
137. Not a name file
138. Not available
139. Is a name file
140. remote i/o error
141. reserved for future use
142. Error 142
143. Error 143
144. Error 144
145. Directory not empty
Errors 33 and 34 are available as the manifest constants EDOM and ERANGE
time zone
The local time zone (with or without daylight savings) is read from an
environment variable called TZ at program runtime. That variable assists in
interpreting the system clock.
clock function
The era for the clock function is: January 1 1970 00:00 GMT.
Locale-specific behavior
This section describes the locale-specific behaviour of the hosted environment.
It corresponds to section "F.4 Locale-Specific Behaviour" in the ANSI document.
Content of execution character set
In the default "C" locale, the execution character set contains the entire 8-bit
character set.
Direction of printing
The direction of printing is left to right.
Decimal point character
The decimal-point character is ".".
Character testing and case mapping
The implementation-defined aspects of character testing functions are
presented in the following table:
Table A-4 Character Testing Functions
Function       Characters               ASCII value
isalnum( )     0-9                      48 through 57
               A-Z                      65 through 90
               a-z                      97 through 122
isalpha( )     A-Z                      65 through 90
               a-z                      97 through 122
iscntrl( )                              0 through 31 and 127
isdigit( )     0-9                      48 through 57
isgraph( )     0-9                      48 through 57
               A-Z                      65 through 90
               a-z                      97 through 122
               !"#$%&'( )*+,-./         33 through 47
               :;<=>?@                  58 through 64
               [\]^_`                   91 through 96
               {|}~                     123 through 126
islower( )     a-z                      97 through 122
isprint( )     0-9                      48 through 57
               A-Z                      65 through 90
               a-z                      97 through 122
               !"#$%&'( )*+,-./         33 through 47
               :;<=>?@                  58 through 64
               [\]^_`                   91 through 96
               {|}~                     123 through 126
ispunct( )     !"#$%&'( )*+,-./         33 through 47
               :;<=>?@                  58 through 64
               [\]^_`                   91 through 96
               {|}~                     123 through 126
isspace( )     " ", \f, \n, \r, \t, \v  32, 12, 10, 13, 9, 11
isupper( )     A-Z                      65 through 90
isxdigit( )    0-9                      48 through 57
               A-F                      65 through 70
               a-f                      97 through 102
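The ranges in the table can be cross-checked by counting how many 7-bit ASCII codes each function accepts. The helper count_matching below is hypothetical; it assumes the default "C" locale.

```c
#include <ctype.h>

/* Counts the ASCII codes 0 through 127 for which the given
   character testing function returns a nonzero value. */
int count_matching(int (*classify)(int))
{
    int c, n = 0;

    for (c = 0; c < 128; c++)
        if (classify(c))
            n++;
    return n;
}
```

For example, count_matching(isdigit) gives 10 and count_matching(isxdigit) gives 22, that is, ten digits plus the six letters A-F in each case.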
The implementation-defined aspects of case mapping functions are presented
in the following table:
Table A-5 Case Mapping Functions
Function       Input Character(s)       Output Character(s)
toupper( )
tolower( )
Collation sequence
In the default "C" locale, characters are collated as they would be if they were
interpreted as numeric values. For normal letters, this corresponds to the nor­
mal alphabetic collation sequence.
Time and date formats
The formats for time and date are AM and PM for morning and evening
12-hour periods of the day respectively.
The short formats for names of the days of the week are: Sun, Mon, Tue, Wed,
Thu, Fri, Sat. The long formats for names of the days of the week are: Sunday,
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday.
The short formats for the names of the months are: Jan, Feb, Mar, Apr, May, Jun,
Jul, Aug, Sep, Oct, Nov, Dec. The long formats for the names of the months are:
January, February, March, April, May, June, July, August, September, October,
November, December.
C6.0 Implementation limits description
This section describes the limits on values defined for this implementation.
Environmental limits
The following environmental limits are defined for this implementation:
Table A-6 limits.h
Table A-7
( (float)1 .17549435e-38)
1 .7976931348623147e+308
Translation limits
This section describes the translation limits defined for this environment. The
following translation limits are defined for this implementation:
Table A-8
nesting levels of compound statements, iteration
control structures, and selection control structures.
nesting levels of conditional inclusion
declarators nested by parentheses within a full
expressions nested by parentheses within a full
significant initial characters in an internal identifier or a macro name
significant initial characters in an external identifier
external identifiers in one translation unit
macro identifiers simultaneously defined in one
translation unit
parameters in one function definition
bytes in an object (in a hosted environment)
nesting levels for #include'd files
case labels for a switch statement (excluding
those for any nested switch statements)
members in a single structure or union
pointer, array, and function declarators (in any
combinations) modifying an arithmetic, a structure, a union, or an incomplete type in a declaration
identifiers with block scope declared in one
arguments in one function call
parameters in one macro definition
arguments in one macro invocation
characters in a logical source line
characters in a character string literal or wide
string literal (after concatenation)
enumeration constants in a single enumeration
levels of nested structure or union definitions in
a single struct-declaration-list
functions that can be registered by atexit( ).
Compiler exit codes and error messages
This appendix describes the exit codes returned by the compiler. It also lists
error messages you may encounter as you develop a program, and gives a
brief description of actions you can take to correct the errors.
Compiler exit codes
All the programs in the C Compiler package return an exit code (sometimes
called an "errorlevel" code) that can be used by other programs such as make.
If the program finishes without errors, it returns a code of 0. The code
returned varies depending on the error encountered.
Code Meaning
No fatal error
Program error (such as compiler error)
System level error (such as out of disk space or compiler internal error)
Messages that indicate errors on the command line used to invoke the com­
piler have one of the following formats:
command line fatal error D1xxx: messagetext
(fatal error)
command line error D2xxx: messagetext
command line warning D4xxx: messagetext
If possible, the compiler continues operation, printing a warning message. In
some cases, command-line errors are fatal and the compiler terminates pro­
Appendix B Compiler exit codes and error messages
Command-line fatal-error messages
The following messages identify fatal errors. The compiler driver cannot
recover from a fatal error; it terminates after printing the error message.
The compiler detected an unknown fatal-error condition.
D1001 could not execute
The compiler could not find the given file in the current working
directory or any of the other directories named in the PATH variable.
D1002 too many open files, cannot redirect
No more file descriptors were available to redirect the output of the
-P option to a file.
D1003 no more processes, try later
Can't fork any more processes. Process table is full.
Command-line error messages
When the compiler driver encounters any of the errors listed in this section, it
continues compiling the program (if possible) and outputs additional error
messages. However, no object file is produced.
The compiler detected an unknown error condition.
D2001 too many symbols predefined with -D
Too many symbolic constants were defined using the -D option on
the command line.
The limit on command-line definitions is normally 16; you can use the
-U or -u option to increase the limit to 200.
D2002 a previously defined model specification has been overridden
Two different memory models were specified; the model specified
later on the command line was used.
D2003 missing source file name
You did not give the name of the source file to be compiled.
D2007 bad option flag, would overwrite string1 with string2
The specified option was given more than once, with conflicting arguments
string1 and string2.
D2008 too many option flags, string
Too many letters were given with the specified option (for example,
with the -O option).
D2009 unknown option character in optionstring
One of the letters in the given option was not recognized.
D2012 too many linker flags on command line
You tried to pass more than 128 separate options and object files to
the linker.
D2013 incomplete model specification
Not enough characters were given for the -Astring option. The
-Astring option requires all three letters (to specify the data-pointer
size, code-pointer size, and segment setup).
D2014 -ND not allowed with -Ad
You cannot rename the default data segment unless you give the
-Auxx option (SS != DS, load DS) on the command line.
D2016 -Gw and -ND name are incompatible
You tried to rename the default data segment to the given name when
you specified the -Gw option. Renaming the default data segment is
illegal in this case because the -Gw option requires the -Awxx option.
D2017 -Gw and -Au flags are incompatible
You tried to specify the -Auxx option (SS != DS, load DS) with the
-Gw option. Specifying -Auxx with -Gw is illegal because the -Gw
option requires the -Awxx option.
D2019 cannot overwrite the source file,
You specified the source file as an output-file name. The compiler does
not allow the source file to be overwritten by one of the compiler output
files.
D2020 -Gc option requires extended keywords to be enabled (-Ze)
The -Gc option and the -Za option were specified on the same command
line. The -Gc option requires the extended keyword cdecl to be
enabled if library functions are to be accessible.
D2021 invalid numerical argument
A non-numerical string was specified following an option that
required a numerical argument.
D2022 cannot open help file, cc.hlp
The -help option was given, but the file containing the help messages
(cc.hlp) was not in the default directory (/usr/lib/286) or in any of the
directories specified by the PATH environment variable.
D2024 -Gm and -ND are incompatible options
You compiled with both the -Gm and -ND compiler options. These
options are incompatible because -Gm indicates that string literals
and near const data items should be allocated in the CONST segment,
while the -ND option attempts to allocate the same items in a
different, named segment.
Command-line warning messages
The messages listed in this section indicate potential problems but do not
hinder compilation and linking.
An unknown warning has been detected by the compiler.
D4001 listing has precedence over assembly output
Two different listing options were chosen; the assembly listing is not
created.
D4002 ignoring unknown flag string
One of the options given on the command line was not recognized
and is ignored.
D4003 80186/286 selected over 8086 for code generation
Both the -G0 option and either the -G1 or -G2 option were given; -G1
or -G2 takes precedence.
D4004 optimizing for time over space
This message confirms that the -Ot option is used for optimizing.
D4006 only one of -P/-E/-EP allowed, -P selected
Only one preprocessor output option can be specified at one time.
D4007 -C ignored (must also specify -P or -E or -EP)
The -C option must be used in conjunction with one of the preprocessor
output flags, -E, -EP, or -P.
D4008 non-standard model -- defaulting to small model libraries
A nonstandard memory model was specified with the -Astring
option. The library search records in the object model were set to use
the small-model libraries.
D4009 threshold only for far/huge data, ignored
The -Gt option cannot be used in memory models that have near data
pointers. It can be used only in compact, large, and huge models.
D4011 preprocessing overrides source listing
Only a preprocessor listing was generated, since the compiler cannot
generate both a source listing and a preprocessor listing at the same
time.
D4012 function declarations override source listing
The compiler cannot generate both a source-listing file and the function
prototype declarations at the same time.
D4013 combined listing has precedence over object listing
When -Fc is specified along with either -Fl or -Fa, the combined listing
(-Fc) is created.
D4014 invalid value number for string. Default number is used
An invalid value was given in a context where a particular numerical
value was expected.
D4017 conflicting stack checking options - stack checking disabled
Both the -Ge and the -Gs flags are given in one compile command
(-Ge enables stack checking, -Gs disables it).
Compiler error messages
The error messages produced by the Microsoft C compiler fall into three
categories:
1. Fatal-error messages
2. Compilation-error messages
3. Warning messages
The messages for each category are listed below in numerical order, with a
brief explanation of each error. To look up an error message, first determine
the message category, then find the error number. All messages give the file
name and line number where the error occurs.
Fatal-error messages
Fatal-error messages indicate a severe problem, one that prevents the com­
piler from processing your program any further. These messages have the
following format:
filename(line) : fatal error C1xxx: messagetext
After the compiler displays a fatal-error message, it terminates without pro­
ducing an object file or checking for further errors.
Compilation-error messages
Compilation-error messages identify actual program errors. These messages
appear in the following format:
filename(line) : error C2xxx: messagetext
The compiler does not produce an object file for a source file that has compila­
tion errors in the program. When the compiler encounters such errors, it
attempts to recover from the error. If possible, it continues to process the
source file and produce error messages. If errors are too numerous or too
severe, the compiler stops processing.
Warning messages
Warning messages are informational only; they do not prevent compilation
and linking. These messages appear in the following format:
filename(line) : warning C4xxx: messagetext
You can use the -W option to control the level of warnings that the compiler
generates. This option is described in the "Compiling and linking C language
programs" chapter of this guide.
Fatal-error messages
The following messages identify fatal errors. The compiler cannot recover
from a fatal error; it terminates after printing the error message.
C1000 UNKNOWN FATAL ERROR Contact SCO Technical Support
An unknown error condition has been detected by the compiler.
C1001 Internal Compiler Error (compiler file filename, line n) Contact
SCO Technical Support
The compiler detected an internal inconsistency. The filename refers to
an internal compiler file, not your source file.
C1002 out of heap space in Pass 2
The compiler has run out of dynamic memory space. This usually
means that your program has many symbols and/or complex expres­
sions. To correct the problem, divide the file into several smaller
source files, or break expressions into subexpressions.
C1003 error count exceeds n; stopping compilation
Errors in the program were too numerous or too severe to allow
recovery, and the compiler must terminate.
C1004 unexpected end-of-file found
This message appears when you have insufficient disk space for the
compiler to create the temporary files it needs. The space required is
approximately two times the size of the source file. This message can
also occur when a comment does not have a closing delimiter ( */ ), or
when an #if directive occurs without a corresponding closing #endif
directive.
C1005 string too big for buffer
A string in a compiler intermediate file overflowed a buffer.
C1006 write error on compiler-generated file
The compiler was unable to create the intermediate files used in the
compilation process.
The following conditions commonly cause this error:
1. A system file or the inode table is full at time of compilation
2. Not enough space on a device containing a compiler intermediate
file
C1007 unrecognized flag string in option
The string in the command-line option was not a valid option.
C1008 no input file specified
The compiler found no input file on the command line.
C1009 compiler limit : macros nested too deeply
The expansion of a macro exceeds the available space. Check to see if
the macro is recursively defined, or if the expanded text is too large.
C1010 compiler limit : macro expansion too big
The expansion of a macro exceeds the available space.
C1011 compiler limit : macro definition too big
The expanded macro definition is larger than the internal buffer size
used to save it. This size is currently: (24*1024) - 5.
C1012 unmatched parenthesis - missing character
The parentheses in a preprocessor directive were not matched; character
is either a left or right parenthesis.
C1013 cannot open source file filename
The given file either did not exist, could not be opened, or was not
found. Make sure your environment settings are valid and that you
have given the correct path name for the file.
C1014 too many include files
Nesting of #include directives exceeds 10 levels.
C1016 #if[n]def expected an identifier
You must specify an identifier with the #ifdef and #ifndef directives.
C1017 invalid integer constant expression
The expression in an #if directive must evaluate to a constant.
C1018 unexpected #elif
The #elif directive is legal only when it appears within an #if, #ifdef,
or #ifndef directive.
C1019 unexpected #else
The #else directive is legal only when it appears within an #if, #ifdef,
or #ifndef directive.
C1020 unexpected #endif
An #endif directive appears without a matching #if, #ifdef, or #ifndef
directive.
C1021 invalid preprocessor command string
The characters following the number sign (#) do not form a valid
preprocessor directive.
C1022 expected #endif
An #if, #ifdef, or #ifndef directive was not terminated with an #endif
directive.
C1023 cannot open source file filename
Unable to open filename for reading. See open(S) for possible reasons
as to why the open might fail.
C1024 cannot open include file filename
Cannot find or open include file filename in the list of search
directories.
C1026 parser stack overflow, please simplify your program
Your program cannot be processed because the space required to
parse the program causes a stack overflow in the compiler. To solve
this problem, try to simplify your program.
C1027 DGROUP data al locat ion exceeds 64K
More than 64K of variables was allocated to the default data segment.
For compact-, medium-, large-, or huge-model programs, use the -Gt
option to move items into separate segments.
C1032 cannot open object listing file filename
One of the following statements about the file name or path name
given (filename) is true:
1. The given name is not valid.
2. The file with the given name cannot be opened for lack of space.
3. A read-only file with the given name already exists.
C1033 cannot open assembly-language output file filename
One of the conditions listed under error message C1032 prevents the
given file from being opened.
C1035 expression too complex, please simplify
The compiler cannot generate the code for a complex expression.
Break the expression into simpler subexpressions and recompile.
C1036 cannot open source listing file filename
One of the conditions listed under error message C1032 prevents the
given file from being opened.
C1037 cannot open object file filename
One of the conditions listed under error message C1032 prevents the
given file from being opened.
C1039 unrecoverable heap overflow in Pass 3
The post-optimizer compiler pass overflowed the heap and could not
continue. Try recompiling with the -Od option (see "Compiling and
linking C language programs") or try rewriting the function containing
the line that caused the error.
C1040 unexpected end-of-file in source file filename
The compiler detected an unexpected end-of-file condition while
creating a source listing or mingled source/object listing. This error
probably occurred because the source file was edited during
compilation.
C1041 cannot open compiler intermediate file - no more files
The compiler could not create intermediate files used in the compilation
process because no more file descriptors were available.
C1042 cannot open compiler intermediate file - no such file or directory
The compiler could not create intermediate files used in the compilation
process because the /tmp directory did not exist.
C1043 cannot open compiler intermediate file
The compiler could not create intermediate files used in the compilation
process. The exact reason is unknown.
C1044 out of disk space for compiler intermediate file
The compiler could not create intermediate files used in the compila­
tion process because no more space was available. To correct the
problem, make more space available on the disk and recompile.
C1045 floating point overflow
The compiler generated a floating-point exception while doing constant
arithmetic on floating-point items at compile time, as in the following
example:
float fp_val = 1.0e100;
In this example, the double-precision constant 1.0e100 exceeds the
maximum allowable value for a floating-point data item.
C1046 compiler limit : struct/union nested too deeply
The struct/union is nested too deeply. Try simplifying the definition.
C1046 bad option flag, would overwrite flag with string
A string option is being set, but setting flag with string would
overwrite flag.
C1047 too many option flags, string
The option appeared too many times. The string contains the occurrence
of the option that caused the error.
C1047 limit of -D/-I exceeded at string
The -D/-I options appeared too many times. The string contains the
occurrence of the option that caused the error.
C1048 Unknown option character in optionstring
The character was not a valid letter for optionstring.
C1049 invalid numerical argument string
A numerical argument was expected instead of string.
C1050 segmentname code segment too large
A code segment grew to within 36 bytes of 64K during compilation. A
36-byte pad is used because of a bug in some 80286 chips that can
cause programs to exhibit strange behavior when, among other conditions,
the size of a code segment is within 36 bytes of 64K.
C1052 compiler limit : #if/#ifdef blocks nested too deeply
You have exceeded the maximum nesting level for #if/#ifdef
directives.
C1053 compiler limit : struct/union nested too deeply
Structure and union definitions were nested to more than 10 levels.
C1054 compiler limit : initializers nested too deeply
The compiler limit on nesting of initializers was exceeded. The limit
ranges from 10 to 15 levels, depending on the combination of types
being initialized. To correct this problem, simplify the data type being
initialized to reduce the levels of nesting, or assign initial values in
separate statements after the declaration.
C1055 compiler limit : out of keys
The maximum number of keys was exceeded.
C1056 compiler limit : out of macro expansion space
The compiler has overflowed an internal buffer during the expansion
of a macro; reduce the complexity of the macro.
C1057 unexpected end-of-file in macro expansion; (missing ')'?)
The compiler has encountered the end of the source file while gathering
the arguments of a macro invocation. Usually this is the result of
a missing closing parenthesis ( ) ) on the macro invocation.
C1059 compiler is out of near heap space
The compiler has run out of storage for items that it stores in the near
(default data segment) heap. This usually means that your program
has too many symbols or complex expressions. To correct the prob­
lem, divide the file into several smaller source files, or break expres­
sions into smaller subexpressions.
C1060 compiler is out of far heap space
The compiler has run out of storage for items that it stores in the far
heap. Usually this is the result of too many symbols in the symbol
table.
C1064 compiler limit : token overflowed internal buffer
You defined more than 10 distinct text segments with the alloc_text
pragma.
C1068 cannot open file stdout
Can't open the standard output file for writing. Check if current directory
is read-only or if stdout file exists and is read only.
C1070 mismatched #if/#endif pair in file filename
Mismatched #if/#endif pair in file filename.
C1071 unexpected end-of-file found in comment
An unterminated comment was found.
C1072 filename : cannot read file
Some problem occurred while trying to read the file. See read(S).
C1090 name data allocation exceeds 64K
Segment name is bigger than 64K.
C1100 invalid number: num, in #line preprocessing directive
The linenumber is negative or too big.
C1126 name : automatic allocation exceeds %s
There are more than num bytes of automatic (local) variables in the
function.
C1127 Segment : segment redefinition
Segment name was redefined.
C1501 compiler limit : too many temporary variables
This is an internal limit. Ran out of temporary variables.
Compilation-Error messages
The messages listed below indicate that your program has errors. When the
compiler encounters any of the errors listed in this section, it continues pars­
ing the program (if possible) and outputs additional error messages. How­
ever, no object file is produced.
C2000 UNKNOWN ERROR Contact SCO Technical Support
The compiler detected an unknown error condition.
C2001 newline in constant
A new-line character in a character or string constant was not in the
correct escape-sequence format (\n).
C2003 expected defined id
The identifier to be checked in an #if directive was not enclosed in
parentheses.
C2004 expected defined(id)
An #if directive caused a syntax error.
C2005 #line expected a line number, found token
A #line directive lacked the required line-number specification.
C2006 #include expected a filename, found token
An #include directive lacked the required filename specification.
C2007 #define syntax
A #define directive caused a syntax error.
C2008 character : unexpected in macro definition
The given character was used incorrectly in a macro definition.
C2009 reuse of macro formal identifier
The given identifier was used twice in the formal-parameter list of a
macro definition.
C2010 character : unexpected in macro formal-parameter list
The given character was used incorrectly in the formal-parameter list
of a macro definition.
C2011 identifier : identifier type redefinition
The given macro definitions exceeded 256 bytes.
C2012 missing name following <
An #include directive lacked the required filename specification.
C2013 missing >
The closing angle bracket ( > ) was missing from an #include
directive.
C2014 preprocessor command must start as first non-white-space
Non-white-space characters appear before the number sign ( # ) of a
preprocessor directive on the same line.
C2015 too many characters in constant
A character constant containing more than one character or escape
sequence was used.
C2016 no closing single quotation mark
A character constant was not enclosed in single quotation marks.
C2017 illegal escape sequence
The character or characters after the escape character ( \ ) did not
form a valid escape sequence.
C2018 unknown character 0xcharacter
The given hexadecimal number does not correspond to a character.
C2019 expected preprocessor command, found character
The given character followed a number sign (#), but it was not the
first letter of a preprocessor directive.
C2020 member : keyword member redefinition
Member member was redefined. Classification is keyword.
C2021 expected exponent value, not character
The given character was used as the exponent of a floating-point constant
but was not a valid number.
C2022 number : too big for char
The number was too large to be represented as a character.
C2023 divide by 0
The second operand in a division operation ( / ) evaluated to zero, giving
undefined results.
C2024 mod by 0
The second operand in a remainder operation ( % ) evaluated to zero,
giving undefined results.
C2025 identifier : enum/struct/union type redefinition
The given identifier had already been used for an enumeration, structure,
or union tag.
C2026 identifier : member of enum redefinition
The given identifier had already been used for an enumeration constant,
either within the same enumeration type or within another
enumeration type with the same visibility.
C2027 use of undefined type type
The member type was not defined.
C2028 struct/union member must be inside a struct/union
Structure and union members must be declared within the structure
or union. This error may be caused by an enumeration declaration
that contains a declaration of a structure member, as in the following
example:
enum a {
    january,
    february,
    int march;    /* structure declaration: illegal */
};
C2030 identifier : struct/union member redefinition
The identifier was used for more than one member of the same structure
or union.
C2031 identifier : function cannot be struct/union member
The given function was declared to be a member of a structure. To
correct this error, use a pointer to the function instead.
C2032 identifier : function cannot be member of keyword tagname
Member/fields cannot be functions. Use a pointer to function instead.
C2033 identifier : bit field cannot have indirection
The given bit field was declared as a pointer ( * ), which is not
allowed.
C2034 identifier : type of bit field too small for number of bits
The number of bits specified in the bit field declaration exceeded the
number of bits in the given base type.
C2035 struct/union identifier : unknown size
The given structure or union had an undefined size.
C2036 struct/union identifier keyword : unknown size
The given identifier has an unknown size.
C2037 left of character specifies undefined struct/union identifier
The expression before the member-selection operator (-> or .)
identified a structure or union type that was not defined.
C2038 identifier : not struct/union member
The given identifier was used in a context that required a structure or
union member.
C2039 identifier : not keyword member
The identifier is not a struct/union member.
C2041 illegal digit digit for base base
A number of given base has an illegal digit in it.
C2042 signed/unsigned keywords mutually exclusive
The signed and unsigned keywords may not appear in the same
declaration.
C2043 illegal break
A break statement is legal only when it appears within a do, for,
while, or switch statement.
C2044 illegal continue
A continue statement is legal only when it appears within a do, for, or
while statement.
C2045 identifier : label redefined
The given label appeared before more than one statement in the same
function.
C2046 illegal case
The case keyword may appear only within a switch statement.
C2047 illegal default
The default keyword may appear only within a switch statement.
C2048 more than one default
A switch statement contained more than one default label.
C2049 case value number already used
A duplicate case value was used.
C2050 nonintegral switch expression
A switch expression was not integral.
C2051 case expression not constant
Case expressions must be integral constants.
C2052 case expression not integral
Case expressions must be integral constants.
C2054 expected ( to follow identifier
The context requires parentheses after the function identifier.
C2055 expected formal-parameter list, not a type list
An argument-type list appeared in a function definition instead of a
formal-parameter list.
C2056 illegal expression
An expression was illegal because of a previous error. (The previous
error may not have produced an error message.)
C2057 expected constant expression
The context requires a constant expression.
C2058 constant expression is not integral
The context requires an integral constant expression.
C2059 syntax error : token
The given token caused a syntax error.
C2060 syntax error : end-of-file found
The end of the file was encountered unexpectedly, causing a syntax
error. This error can be caused by a missing closing curly brace ( } ) at
the end of your program.
C2061 syntax error : identifier identifier
The given identifier caused a syntax error.
C2062 type type unexpected
The given type was misused.
C2063 identifier : not a function
The given identifier was not declared as a function, but an attempt was
made to use it as a function.
C2064 term does not evaluate to a function
An attempt was made to call a function through an expression that
did not evaluate to a function pointer.
C2065 identifier : undefined
The given identifier was not defined.
C2066 cast to function type is illegal
An object was cast to a function type.
C2067 cast to array type is illegal
An object was cast to an array type.
C2068 illegal cast
A type used in a cast operation was not a legal type.
C2069 cast of void term to nonvoid
The void type was cast to a different type.
C2070 illegal sizeof operand
The operand of a sizeof expression was not an identifier or a type
name.
C2071 class : illegal storage class
The given storage class cannot be used in this context.
C2072 identifier : initialization of a function
An attempt was made to initialize a function.
C2075 identifier : array initialization needs curly braces
The braces ( { } ) around the given array initializer were missing.
C2076 identifier : struct/union initialization needs curly braces
The braces ( { } ) around the given structure or union initializer were
missing.
C2077 nonscalar field initializer identifier
An attempt was made to initialize a bit field member of a structure
with a non-integral value.
C2078 too many initializers
The number of initializers exceeded the number of objects to be
initialized.
C2079 expression uses undefined struct/union
The given identifier was declared as a structure or union type that
had not been defined.
C2082 redefinition of formal parameter identifier
A formal parameter to a function was redeclared within the function
body.
C2083 keyword comparison illegal
Cannot compare structs/unions.
C2084 function identifier already has a body
The given function had already been defined.
C2085 identifier : not in formal-parameter list
The given parameter was declared in a function definition for a
nonexistent formal parameter.
C2086 identifier : redefinition
The given identifier was defined more than once.
C2087 identifier : missing subscript
The definition of an array with multiple subscripts was missing a subscript
value for a dimension other than the first dimension, as in the
following example:
int func(a)
char a[10][];    /* Illegal */
int func(a)
char a[][5];     /* Legal */
C2088 op : illegal for struct/union
This is an illegal operation for a struct/union.
C2089 tagname : struct/union too large
The struct/union is larger than the maximum allowable size.
C2090 function returns array
A function cannot return an array. (It can return a pointer to an
array.)
C2091 function returns function
A function cannot return a function. (It can return a pointer to a
function.)
C2092 array element type cannot be function
Arrays of functions are not allowed; however, arrays of pointers to
functions are allowed.
C2093 cannot use address of automatic variable as static initializer
You cannot use the address of an automatic variable in the initializer
of a static item.
C2094 label identifier was undefined
The function did not contain a statement labeled with the given
identifier.
C2095 function : actual has type void : parameter number
An attempt was made to pass a void argument to a function. Formal
parameters and arguments to functions cannot have type void; they
can, however, have type void * (pointer to void).
C2096 struct/union comparison illegal
You cannot compare two structures or unions. (You can, however,
compare individual members within structures and unions.)
C2097 illegal initialization
An attempt was made to initialize a variable using a nonconstant
value.
C2098 nonaddress expression
An attempt was made to initialize an item that was not an lvalue.
C2099 nonconstant offset
An initializer used a nonconstant offset.
C2100 illegal indirection
The indirection operator ( * ) was applied to a nonpointer value.
C2101 & on constant
The address-of operator ( & ) did not have an lvalue as its operand.
C2102 & requires lvalue
The address-of operator must be applied to an lvalue expression.
C2103 & on register variable
An attempt was made to take the address of a register variable.
C2104 & on bit field ignored
An attempt was made to take the address of a bit field.
C2105 operator needs lvalue
The given operator did not have an lvalue operand.
C2106 operator : left operand must be lvalue
The left operand of the given operator was not an lvalue.
C2107 illegal index, indirection not allowed
A subscript was applied to an expression that did not evaluate to a
pointer.
C2108 nonintegral index
A nonintegral expression was used in an array subscript.
C2109 subscript on nonarray
A subscript was used on a variable that was not an array.
C2110 pointer + pointer
An attempt was made to add one pointer to another.
C2111 pointer + nonintegral value
An attempt was made to add a nonintegral value to a pointer.
C2112 illegal pointer subtraction
An attempt was made to subtract pointers that did not point to the
same type.
C2113 pointer subtracted from nonpointer
Cannot subtract pointer from int.
C2114 operator : pointer on left; needs integral right
The left operand of the given operator was a pointer; the right operand
must be an integral value.
C2115 identifier : incompatible types
An expression contained incompatible types.
C2117 operator : illegal for struct/union
Structure and union type values are not allowed with the given
operator.
C2118 negative subscript
A value defining an array size was negative.
C2119 typedefs both define indirection
Two typedef types were used to declare an item and both typedef
types had indirection. For example, the declaration of p in the follow­
ing example is illegal:
typedef int *P_INT;
typedef short *P_SHORT;
/* this declaration is illegal */
P_SHORT P_INT p;
C2120 void illegal with all types
The void type was used in a declaration with another type.
C2121 op : bad left/right operand
Bad operand.
C2124 divide or mod by zero
Attempting to divide or mod by zero.
C2125 identifier : allocation exceeds 64K
The given item exceeds the size limit of 64K. The only items that are
allowed to exceed 64K are huge arrays.
C2127 parameter allocation exceeds 32K
The storage space required for the parameters to a function exceeded
the limit of 32K.
C2128 identifier : huge array cannot be aligned to segment boundary
The given array violated one of the restrictions imposed on huge
arrays; see the "Memory models" section of the manual page for more
information on these restrictions.
C2129 static function identifier not found
A forward reference was made to a static function that was never
defined.
C2130 #line expected a string containing the file name, found token
A file name was missing from a #line directive.
C2131 more than one memory attribute
More than one near, far, or huge attribute was applied to an item, as
in the following example:
typedef int near NINT;
NINT far a;    /* Illegal */
C2132 syntax error : unexpected identifier
An identifier appeared in a syntactically illegal context.
C2133 identifier : unknown size
An attempt was made to declare an unsized array as a local variable, as
in the following example:
int mat_add(array1)
int array1[];    /* Legal */
{
int array2[];    /* Illegal */
}
C2134 identifier : struct/union too large
The size of a structure or union exceeded the compiler limit (2^32
bytes). This limit is 64K on 80286 systems.
C2136 symbol : prototype must have parameter types
Wrong kind of formals for prototype.
C2137 empty character constant
An illegal character constant was used.
C2139 type following type is illegal
An illegal type combination such as the following was used:
long char a;
C2140 argument cannot be function type
A function was declared as a formal parameter of another function, as
in the following example:
int func1(a)
int a();    /* Illegal */
C2141 value out of range for enum constant
An enumeration constant had a value outside the range of values
allowed for type int.
C2143 syntax error : missing token1 before token2
The compiler expected token1 to appear before token2. This message
may appear if a required closing curly brace ( } ), right parenthesis ( ) ),
or semicolon ( ; ) is missing.
C2144 syntax error : missing token before type type
The compiler expected the given token to appear before the given type
name. This message may appear if a required closing curly brace ( } ),
right parenthesis ( ) ), or semicolon ( ; ) is missing.
C2145 syntax error : missing token before identifier
The compiler expected the given token to appear before an identifier.
This message may appear if a semicolon (;) does not appear after the
last declaration of a block.
C2146 syntax error : missing token before identifier identifier
The compiler expected the given token to appear before the given
identifier.
C2147 unknown size
An attempt was made to increment an index or pointer to an array
whose base type has not yet been declared.
C2148 array too large
An array exceeded the maximum legal size (2^32 bytes).
C2149 identifier : named bit field cannot have 0 width
The given named bit field had a zero width. Only unnamed bit fields
are allowed to have zero width.
C2150 identifier : bit field must have type int, signed int, or unsigned int
The ANSI C standard requires bit fields to have types of int, signed
int, or unsigned int. This message appears only if you compiled your
program with the -Za option.
C2151 more than one language attribute specified
More than one keyword specifying a function-calling convention was
specified.
C2152 identifier : pointers to functions with different attributes
An attempt was made to assign a pointer to a function declared with
one calling convention (cdecl, fortran, or pascal) to a pointer to a
function declared with a different calling convention.
C2153 hex constants must have at least 1 hex digit
At least one hexadecimal digit must follow the "x". The hexadecimal
constants 0x and 0X are illegal.
C2154 name : does not refer to a segment name
The name was the first identifier given in an alloc_text pragma argument
list and it is already defined as something other than a segment
name.
C2156 pragma must be outside function
Certain pragmas must be specified at a global level, outside a function
body, and there is an occurrence of one of these pragmas within a
function.
C2157 name : must be declared before use in pragma list
The function name in the list of functions for an alloc_text pragma has
not been declared prior to being referenced in the list.
C2158 name : is a function
name was specified in the list of variables in a same_segment pragma,
but was previously declared as a function.
C2159 more than one storage class specified
Illegal declaration - only one storage class is allowed.
C2160 ## cannot occur at the beginning of a macro definition
A macro definition cannot begin with a token-pasting (##) operator.
C2161 ## cannot occur at the end of a macro definition
A macro definition cannot end with a token-pasting (##) operator.
C2162 expected macro formal parameter
The token following a stringizing operator (#) must be a formal
parameter name.
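For comparison, the following is a minimal sketch of macro definitions that satisfy the rules behind C2160, C2161, and C2162: the ## operator sits between two tokens, and # is followed directly by a formal parameter name. The names PASTE, STRINGIZE, counter1, and the functions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* ## appears between two tokens, never first or last in the definition. */
#define PASTE(a, b) a##b

/* # is followed directly by a formal parameter name. */
#define STRINGIZE(x) #x

static int counter1 = 7;    /* hypothetical variable, for illustration only */

/* PASTE(counter, 1) becomes the single identifier counter1. */
int paste_demo(void)
{
    return PASTE(counter, 1);
}

/* STRINGIZE(counter1) becomes the string literal "counter1". */
const char *stringize_demo(void)
{
    return STRINGIZE(counter1);
}

/* Self-check; call it from main() to confirm both expansions. */
void check_macros(void)
{
    assert(paste_demo() == 7);
    assert(strcmp(stringize_demo(), "counter1") == 0);
}
```

Moving ## to either end of the PASTE definition, or following # with anything other than x, would be expected to produce the corresponding error above.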
C2163 string : not available as an intrinsic function
A function specified in the list of functions for an intrinsic or function
pragma is not one of the functions available in intrinsic form.
C2164 identifier : intrinsic function not declared
The intrinsic function was not declared.
C2165 keyword : cannot modify pointers to data
Bad use of fortran, pascal, or cdecl keyword to modify pointer to data.
C2167 name : too many actual parameters for intrinsic
A reference to the intrinsic function name contains too many actual
parameters.
C2168 name : too few actual parameters for intrinsic
A reference to the intrinsic function name contains too few actual
parameters.
C2169 name : intrinsic function, cannot be defined
An attempt was made to provide a function definition for a function
already declared as an intrinsic.
C2170 identifier : not declared as a function, cannot be intrinsic
You tried to use the intrinsic pragma for an item other than a function,
or for a function that does not have an intrinsic form.
C2171 operator : illegal operand
Illegal operand type for the specified unary operator.
C2172 actual : actual is not a pointer : parameter number
The expression result type is checked against the given argument.
C2173 actual : actual is not a pointer : parameter number, parameter list number
The expression result type is checked against the given argument.
C2174 actual : actual has type void : parameter number, parameter list number
The expression result type is checked against the given argument.
C2177 constant too big
Information was lost because a constant value was too large to be
represented in the type to which it was assigned. (1)
C2180 controlling expression has type void
Result type of the expression is void.
C2182 identifier : has type void
Illegal non-indirected void.
C2187 : cast of near function pointer to far function pointer
You attempted to cast a near function pointer as a far function
pointer.
C2189 #error : string
An #error directive was encountered; string is the text given in the
directive.
C2193 identifier : already in a segment
The symbol was already in a segment.
C2194 segment : is a text segment
The named segment is a text segment.
C2195 segment : is a data segment
The named segment is a data segment.
C2200 name : function has already been defined
The function name has already been defined.
C2201 identifier : storage class must be extern
The identifier is declared auto or register; it should be extern.
C2205 identifier : cannot initialize extern block-scoped variables
Cannot initialize extern block-scoped variables.
C2206 name : typedef cannot be used for function definition
A declarator was declared to be a function by using a typedef.
C2208 no members defined using this type
Tried to define a member of a struct/union of inappropriate type.
C2219 syntax error : type qualifier must be after '*'
Type qualifier must be after '*'.
C2223 left of ->identifier must point to struct/union
Left operand must point to struct/union.
C2224 left of .identifier must have struct/union type
Left operand must have struct/union type.
C2226 syntax error : unexpected type typename
Unexpected type.
C2229 member in keyword nametag has a zero-sized array
A zero-sized array was detected.
C2231 '.' : left operand points to keyword, use '->'
The indirection is an address. Use '->' instead.
C2232 '->' : left operand has keyword type, use '.'
No indirection. Use '.' instead.
C2400 in-line syntax error in context, found token
The asm inline has a syntax error. The context is either operand n or
opcode. The token is the offending input token.
C2401 register : register must be base in operand n
The asm inline statement is attempting to use register as a
base register. A valid base register is required.
C2402 register : register must be index in operand n
The asm inline statement is attempting to use register as an index
register. A valid index register is required.
C2403 register : register must be base/index in operand n
The asm inline statement is attempting to use register as a base/index
register. A valid base/index register is required.
C2404 register : illegal register in operand n
The asm inline statement is attempting to use register in an invalid
C2405 illegal short forward reference with offset
The asm inline statement is attempting a short jump or call to a forward
location. This forward location cannot be range checked. The
asm code must be written to avoid the short forward reference.
C2406 symbol : name undefined in operand n
The asm inline statement is using symbol which is undefined.
C2407 illegal float register in operand n
The asm inline statement has an operand which is not a valid float
register.
C2408 illegal type on PTR operator in operand n
The asm inline statement has a bad operand. The PTR operator
specifies a bad type.
C2409 illegal type used as operator in operand n
The asm inline statement has a bad operand. A type specifier is being
used as an operator.
C2410 member : ambiguous member name in operand n
The asm inline statement has a bad operand. The member name is
ambiguous. The structure/union has members which the asm inline
mechanism cannot distinguish.
C2411 member : illegal struct/union member in operand n
The asm inline statement has a bad operand. The member is not a
member of the struct/union specified.
C2412 labelname : case insensitive label redefined
The asm inline statement is trying to define a label that has already
been used. Use a different label name. Note that these label names are
case insensitive.
C2413 value : illegal align size
The asm inline statement has specified an illegal value to the alignment
directive. The allowed range is 0 through 4, inclusive.
C2414 illegal number of operands
The asm inline statement has the wrong number of operands for the
specified opcode.
C2415 improper operand type
The asm inline statement has the wrong type of operands for the
specified opcode. Or else, a SEG or OFFSET operator has an improper
C2416 opcode : illegal opcode for processor
The asm inline statement uses opcode which is not a valid opcode for
the processor.
C2417 divide by zero in operand n
The asm inline statement has an operand which results in a divide by
zero. Constant folding is applied to the operand, so the bad operand
value is detected at compile time, instead of runtime.
C2419 mod by zero in operand n
The asm inline statement has an operand which results in a modulus
by zero. Constant folding is applied to the operand, so the bad operand
value is detected at compile time, instead of runtime.
C2420 identifier : illegal symbol in operand n
Error in asm inline statement. The n-th operand uses an illegal symbol.
The allowable symbols are symbols which have already been
seen, or a label, function, or data symbol that has not yet been seen.
C2421 PTR operator used with register in operand n
Error in asm inline statement. The n-th operand is a register, and an
illegal attempt to apply the PTR operator to the register is made.
C2422 illegal segment override in operand n
Error in asm inline statement. The n-th operand has an illegal segment
override. A valid segment register is required, and the operand to
which the segment override applies cannot be in a register.
C2423 value : illegal scale
Error in asm inline statement. The scale value for an index instruction
must be one of 1, 2, 4, or 8.
C2424 op : improper expression in %s
Error in asm inline statement. The operands specified for the op operator
are invalid.
C2425 op : nonconstant expression in operand n
Error in asm inline statement. A constant was expected for one of the
operands of op.
C2426 op : illegal operator in operand n
Error in asm inline statement. The n-th operand has more than one
level of brackets (i.e., indexing).
C2427 label : jump referencing label is out of range
A short jump is out of range. The allowed range is +/-127.
C2429 label : illegal far label reference
Error in asm inline statement. A jump or call to a far label is not
allowed.
C2430 more than one index register in operand n
Error in asm inline statement. Only one of the operands can use an
index register in a single instruction.
C2431 illegal index register in operand n
Error in asm inline statement. The n-th operand uses an illegal register
for indexing. Use an allowable index register.
C2432 illegal reference to 16-bit data in operand n
Error in asm inline statement. The n-th operand tries to access memory
in a 16-bit mode which is not supported in the 386 compiler.
Warning messages
The messages listed in this section indicate potential problems but do not
hinder compilation and linking. The number in parentheses at the end of each
warning-message description (if any) gives the minimum warning level that
must be set for the message to appear.
C4000 UNKNOWN
Contact SCO Technical Support
The compiler detected an unknown error condition.
C4001 nonstandard extension used - identifier
A non-standard extension was used - see -Ze flag.
C4002 too many actual parameters for macro identifier
The number of actual arguments specified with the given identifier
was greater than the number of formal parameters given in the macro
definition of the identifier. (1)
C4003 not enough actual parameters for macro identifier
The number of actual arguments specified with the given identifier
was less than the number of formal parameters given in the macro
definition of the identifier. (1)
C4004 missing ')' after defined
The closing parenthesis was missing from an #if defined phrase. (1)
C4005 identifier : macro redefinition
The given identifier was redefined. (1)
C4006 #undef expected an identifier
The name of the identifier whose definition was to be removed was
not given with the #undef directive. (1)
C4007 identifier : must be calltype
The identifier is the entry point from the program start-up code. It will
use the regular C calling convention, whether or not some other calling
convention is specified.
C4008 identifier : calltype attribute on data ignored
The identifier is not a function, so the calltype attribute is ignored.
C4009 string too big, trailing chars truncated
A string exceeded the compiler limit on string size. To correct this
problem, break the string into two or more strings. (1)
C4011 identifier truncated to identifier
Only the identifier's first 31 characters are significant. (1)
C4012 float constant in a cross compilation
The code is being cross-compiled. Value of constant could be different
if compiled on target machine.
C4015 identifier : type of bit field must be integral
The given bit field was not declared as an integral type.
Bit fields must be declared as unsigned integral types. A conversion
has been supplied. (1)
C4016 identifier : no function return type, using int as default
The given function had not yet been declared or defined, so the return
type was unknown.
The default return type (int) is assumed. (2)
C4017 cast of int expression to far pointer
A far pointer represents a full segmented address. On an 8086/8088
processor, casting an int value to a far pointer may produce an
address with a meaningless segment value. (1)
C4018 operator : signed/unsigned mismatch
Signed/unsigned mismatch between operands.
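To illustrate why a signed/unsigned mismatch deserves attention, the sketch below spells out the conversion that standard C applies when an int is compared with an unsigned int, next to a comparison that accounts for the sign. The function names are hypothetical; the explicit cast stands in for the conversion the compiler would otherwise perform implicitly.

```c
#include <assert.h>

/* What a mixed comparison does: the int operand is converted to
   unsigned int before the two values are compared. */
int mixed_less(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* A comparison that treats any negative a as less than b. */
int signed_aware_less(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}

/* Self-check; call it from main() to see the difference. */
void check_compare(void)
{
    assert(mixed_less(-1, 1u) == 0);         /* -1 converts to UINT_MAX */
    assert(signed_aware_less(-1, 1u) == 1);
    assert(mixed_less(0, 1u) == 1);
    assert(signed_aware_less(2, 1u) == 0);
}
```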
C4020 funcname : too many actual parameters
The number of arguments specified in a function call was greater than
the number of parameters specified in the argument-type list or function
definition. (1)
C4021 funcname : too few actual parameters
The number of arguments specified in a function call was less than
the number of parameters specified in the argument-type list or function
definition. (1)
C4022 pointer mismatch : parameter n
The pointer type of the given parameter was different from the
pointer type specified in the argument-type list or function definition.
C4024 funcname : different types : parameter n
The type of the given parameter in a function call did not agree with
the type given in the argument-type list or function definition. (1)
C4026 function was declared with formal argument list
The function was declared to take arguments, but the function
definition did not declare formal parameters. (1)
C4027 function was declared without formal argument list
The function was declared to take no arguments (the argument-type
list consisted of the word void), but formal parameters were declared
in the function definition or arguments were given in a call to the
function. (1)
C4028 parameter n declaration different
The type of the given parameter did not agree with the corresponding
type in the argument-type list or with the corresponding formal
parameter. (1)
C4029 declared parameter list different from definition
The argument-type list given in a function declaration did not agree
with the types of the formal parameters given in the function
definition. (1)
C4030 first parameter list is longer than the second
A function was declared more than once with different argument-type
lists in the declarations. (1)
C4031 second parameter list is longer than the first
A function was declared more than once with different argument-type
lists. (1)
C4032 unnamed struct/union as parameter
The structure or union type being passed as an argument was not
named, so the declaration of the formal parameter cannot use the
name and must declare the type. (1)
C4033 function must return a value
A function is expected to return a value unless it is declared as void.
C4034 sizeof returns 0
The sizeof operator was applied to an operand that yielded a size of
zero. (1)
C4035 identifier : no return value
A function declared to return a value did not do so. (2)
C4036 unnamed struct/union as parameter
Unnamed struct/union as parameter: see -Zg flag.
C4040 memory attribute on identifier ignored
The near or far keyword has no effect in the declaration of the given
identifier and is ignored. (1)
C4042 identifier : has bad storage class
The specified storage class cannot be used in this context (for example,
function parameters cannot be given extern class). The default
storage class for that context was used in place of the illegal class. (1)
C4044 _huge on identifier ignored, must be an array
The huge keyword was used to declare the given nonarray item. (1)
C4045 identifier : array bounds overflow
Too many initializers were present for the given array. The excess
initializers are ignored. (1)
C4047 operator : different levels of indirection
An expression involving the specified operator had inconsistent levels
of indirection. (1)
The following example illustrates this condition:
char **p;
char *q;
p = q;
C4048 arrays declared subscripts different
An array was declared twice with different sizes. The larger size is
used. (1)
C4049 operator : indirection to different types
The indirection operator (*) was used in an expression to access
values of different types. (1)
C4050 operator : different code attributes
The operands to the operator are pointers to functions. However, the
interrupt, saveregs, export, or loads modifiers of the operands do not
match.
C4051 type conversion - possible loss of data
Two data items in an expression had different types, causing the type
of one item to be converted. (2)
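As a rough illustration of the data loss that C4051 refers to, the sketch below narrows an unsigned long to an unsigned char, first without a check and then with a range check. The function names are hypothetical, and the example is a sketch of the general C conversion rule rather than a description of what this compiler emits.

```c
#include <assert.h>
#include <limits.h>

/* Unchecked narrowing: the value is reduced modulo UCHAR_MAX + 1. */
unsigned char narrow_unchecked(unsigned long value)
{
    return (unsigned char)value;
}

/* Checked narrowing: stores the result and returns 1 only if value fits. */
int narrow_checked(unsigned long value, unsigned char *out)
{
    if (value > UCHAR_MAX)
        return 0;
    *out = (unsigned char)value;
    return 1;
}

/* Self-check; call it from main() to compare the two approaches. */
void check_narrowing(void)
{
    unsigned char c = 0;

    assert(narrow_unchecked(300UL) == 300UL % (UCHAR_MAX + 1UL));
    assert(narrow_checked(300UL, &c) == 0 && c == 0);
    assert(narrow_checked(200UL, &c) == 1 && c == 200);
}
```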
C4053 at least one void operand
An expression with type void was used as an operand. (1)
C4056 overflow in constant arithmetic
The result of an operation exceeded 0x7FFFFFFF. (1)
C4057 overflow in constant multiplication
The result of an operation exceeded 0x7FFFFFFF. (1)
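One way to reason about the 0x7FFFFFFF limit named in C4056 and C4057 is to test a product by division before forming it. The sketch below assumes non-negative operands; LIMIT and product_fits are hypothetical names introduced only for this example.

```c
#include <assert.h>

#define LIMIT 0x7FFFFFFFL   /* the largest value named in the message text */

/* Returns 1 if a * b stays within LIMIT, for non-negative a and b.
   The test divides instead of multiplying, so it cannot itself overflow. */
int product_fits(long a, long b)
{
    if (a == 0 || b == 0)
        return 1;
    return a <= LIMIT / b;
}

/* Self-check; call it from main(). */
void check_product(void)
{
    assert(product_fits(65536L, 32767L) == 1);   /* 2147418112 fits */
    assert(product_fits(65536L, 32768L) == 0);   /* 2147483648 does not */
    assert(product_fits(0L, 123456L) == 1);
}
```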
C4058 address of automatic (local) variable taken, DS != SS
The program was compiled with the default data segment (DS) not
equal to the stack segment (SS), and the program tried to point to a
frame variable with a near pointer. (1)
C4059 segment lost in conversion
The conversion of a far pointer (a full segmented address) to a near
pointer (a segment offset) resulted in the loss of the segment address.
C4061 long/short mismatch in argument : conversion supplied
The base types of the actual and formal arguments of a function were
different. The actual argument is converted to the type of the formal
parameter. (1)
C4062 near/far mismatch in argument : conversion supplied
The pointer sizes of the actual and formal arguments of a function
were different. The actual argument is converted to the type of the
formal parameter. (1)
C4063 identifier : function too large for post-optimizer
The given function was not optimized because not enough space was
available. To correct this problem, reduce the size of the function by
dividing it into two or more smaller functions. (0)
C4065 recoverable heap overflow in post-optimizer - some optimizations
may be missed
Some optimizations were skipped because not enough space was
available for optimization. To correct this problem, reduce the size of
the function by dividing it into two or more smaller functions. (0)
C4066 local symbol table overflow - some local symbols may be missing
in listings
The listing generator ran out of heap space for local variables, so the
source listing may not contain symbol-table information for all local
symbols.
C4067 unexpected characters following directive directive - newline expected
Extra characters followed a preprocessor directive, as in the following
example (1):
#endif NO_EXT_KEYS
This is accepted in Version 3.0, but not in Versions 4.0 and 5.0. Versions
4.0 and 5.0 require comment delimiters, such as the following:
#endif /* NO_EXT_KEYS */
C4068 unknown pragma
The compiler did not recognize a pragma and ignored it. (1)
C4069 conversion of near pointer to long integer
A near pointer was converted to a long integer, which involves first
extending the high-order word with the current data-segment value,
not 0 as in Version 3.0. (1)
C4071 identifier : no function prototype given
The given function was called before the compiler found the corresponding
function prototype. (3)
C4072 funcname : no function prototype on _fastcall function
No function prototype.
C4073 scoping too deep, deepest scoping merged when debugging
Declarations appeared at a static nesting level greater than 13. As a
result, all declarations will seem to appear at the same level. (1)
C4076 type : may be used on integral types only
The type modifiers signed and unsigned can be combined only with
other integral types.
C4077 unknown check_stack option
Unknown option given when using the old form of the check_stack
pragma. The option must be empty, +, or -.
C4078 case constant number too big for the type of switch expression
Switch expression limits the size that can be used as a case constant.
C4079 unexpected token token
Unexpected separator token found in argument list of a pragma.
C4080 expected identifier for segment name, found string
Expecting an identifier for segment name in the pragma.
C4081 expected a comma, found string
There is a missing comma ( , ) between two arguments of a pragma.
C4082 expected an identifier, found string
There is a missing identifier in the list of arguments to a pragma.
C4083 expected '(', found string
There is a missing opening parenthesis ( ( ) in the argument list for a
pragma.
C4084 expected a pragma directive, found string
The token following the pragma keyword is not an identifier.
C4085 expected [on | off]
Bad argument given for new form of check_stack pragma.
C4086 expected [1 | 2 | 4]
Bad argument given for pack pragma.
C4087 name : declared with void parameter list
The function name was declared as taking no parameters, but a call to
the function specifies actual parameters.
C4088 identifier : pointer mismatch : parameter number, parameter list number
Inline assembler warning.
C4089 identifier : different types : parameter number, parameter list number
Inline assembler warning.
C4090 different const/volatile qualifiers
The program passed a pointer to a const item to a function where the
corresponding formal parameter is a pointer to a non-const item,
which means the item could be modified by the function undetected.
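A common way to avoid the situation described under C4090 is to declare the formal parameter as a pointer to const whenever the function only reads the data. The following is a minimal sketch under that assumption; count_char and banner are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter is a pointer to const, so callers may pass either const
   or non-const data, and the function cannot modify what it points to. */
size_t count_char(const char *text, char wanted)
{
    size_t n = 0;

    for (; *text != '\0'; text++)
        if (*text == wanted)
            n++;
    return n;
}

static const char banner[] = "read only";   /* hypothetical const data */

/* Self-check; call it from main(). */
void check_count(void)
{
    assert(count_char(banner, 'o') == 1);
    assert(count_char(banner, 'z') == 0);
}
```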
C4091 no symbols were declared
An empty declaration was detected. (2)
C4092 untagged enum/struct/union declared no symbols
An empty declaration was detected that used an untagged
enum/struct/union. (2)
C4093 unescaped newline in character constant in inactive code
The constant expression of an #if, #elif, #ifdef, or #ifndef preprocessor
directive evaluated to 0, making the following code inactive,
and a new-line character appeared between a single or double quotation
mark and the matching single or double quotation mark in that
inactive code.
C4094 untagged member declared no symbols
An unnamed member is allowed to be an enum/struct/union only if
the enhancements are enabled. The ANSI C Standard only allows
bitfield members to be unnamed.
C4095 expected ), found string
Pragma requires a right parenthesis ')' but did not find it.
C4096 modifier must be used with modifier
Certain function modifiers must appear in combination. A warning is
given when the compiler adds the necessary constraint.
C4098 void function returning a value
A function was declared as not returning a value (i.e., void), but a
return value was specified in the function body.
C4100 formal : unreferenced formal parameter
The parameter formal was not used in the function body.
C4101 name : unreferenced local variable
The local variable name was not used in the function body.
C4102 label : unreferenced label
The label label was not used in the function body.
C4104 identifier : near data in same_seg pragma, ignored
Near data in same_seg pragma, ignored.
C4105 identifier : code modifiers only on function or pointer to function
Code modifications are only allowed on function or pointer to function.
C4106 : pragma requires integer between 1 and 127
You must supply an integer constant in the range 1-127, inclusive, for
the given pragma.
C4107 : pragma requires integer between 15 and 255
You must supply an integer constant in the range 15-255, inclusive,
for the given pragma.
C4108 : pragma requires integer between 79 and 132
You must supply an integer constant in the range 79-132, inclusive,
for the given pragma.
C4109 : unexpected identifier token
The designated line contains an unexpected token.
C4110 : unexpected token 'int constant'
The designated line contains an unexpected integer constant.
C4111 : unexpected token string
The designated line contains an unexpected string.
C4112 : macro name name is reserved, command ignored
You attempted to define a predefined macro name or the preprocessor
operator defined. This warning also occurs if you attempt to
undefine a predefined macro name. If you attempt to define or
undefine a predefined macro name using command-line options, command
will still be either #define or #undef.
C4113 function parameter lists differed
Function parameter lists differed.
C4114 same type qualifier used more than once
Same type of qualifiers used more than once, example unsigned
C4115 name : type definition in formal parameter list
Tag type definition in formal parameter list.
C4116 (no tag) : type definition in formal parameter list
Unnamed tag type definition in formal parameter list.
C4118 pragma not supported
This pragma is not supported.
C4124 _fastcall with stack checking is inefficient
_fastcall and stack checking are two contradictory goals.
C4125 decimal digit terminates octal escape sequence
A non-octal digit was found and terminated the octal sequence.
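The effect behind C4125 can be seen in a short string literal. In the sketch below, which relies only on the standard C rule that an octal escape takes at most three octal digits, the digit 8 ends the escape and becomes an ordinary character. The names split, explicit_form, and check_escape are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* In "\128" the escape is \12 (value 10); the 8 is not an octal digit,
   so it ends the escape and stands as an ordinary character. */
static const char split[] = "\128";

/* Splitting the literal makes the same two characters explicit. */
static const char explicit_form[] = "\12" "8";

/* Self-check; call it from main(). */
void check_escape(void)
{
    assert(strlen(split) == 2);
    assert(split[0] == 10 && split[1] == '8');
    assert(strcmp(split, explicit_form) == 0);
}
```

Writing the literal in the split form states the intent directly and may avoid the warning.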
C4126 <char> : unknown memory model flag
An unknown memory model flag was specified in pass 1.
C4127 conditional expression is constant
The expression has a constant value; it is always true or false.
C4128 storage-class specifier after type
A storage-class specifier was found after the type.
C4129 <char> : unrecognized character escape sequence
While getting an escaped character (\ddd) an unrecognized character
was found.
C4130 string : logical operation on address of string constant
Attempting to do a logical operation on address of string constant.
C4131 name : uses old-style declarator
Using old-style declarator.
C4132 identifier : const object should be initialized
A non-extern const item has not been initialized.
C4135 conversion between different integral types
Converting an integral type to a smaller integral type.
C4136 conversion between different floating types
Converting a real type to a smaller real type.
C4137 funcname : no return value from floating-point function
Expected a return value from floating-point function.
C4138 '*/' found outside of comment
Improper comment.
C4139 hexvalue : hex escape sequence is out of range
Hex escape number is out of range; it must fit into a byte.
C4186 string too long - truncated to num characters
The string was too large; it has been truncated to num characters.
C4200 local variable identifier used without having been initialized
The local variable will have random contents.
C4201 local variable identifier may be used without having been initialized
The local variable may not have been initialized before being used.
C4202 unreachable code
This code will never be run.
C4203 funcname : function too large for global optimizations
Ran out of internal buffers necessary to do global optimizations.
C4204 name : in-line assembler precludes global optimizations
In-line assembler prevents global optimizations.
C4205 statement has no effect
The statement does nothing.
C4206 assignment within conditional expression
An assignment occurred inside a conditional expression.
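The reason C4206 is worth a second look is that = and == are easy to confuse inside a condition. The sketch below contrasts the two; the function names are hypothetical, and the extra parentheses around the assignment are there only to mark it as deliberate for compilers that check for this.

```c
#include <assert.h>

/* Assignment: stores 0 in *flag, then tests the stored value, so the
   result is always 0 and the caller's variable is overwritten. */
int test_by_assignment(int *flag)
{
    if ((*flag = 0))
        return 1;
    return 0;
}

/* Comparison: tests *flag against 0 and leaves it unchanged. */
int test_by_comparison(const int *flag)
{
    if (*flag == 0)
        return 1;
    return 0;
}

/* Self-check; call it from main(). */
void check_condition(void)
{
    int a = 5, b = 5;

    assert(test_by_assignment(&a) == 0 && a == 0);
    assert(test_by_comparison(&b) == 0 && b == 5);
    b = 0;
    assert(test_by_comparison(&b) == 1);
}
```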
C4209 comma operator within array index expression
Comma operator within array index expression.
C4300 insufficient memory to process debugging information
Ran out of memory while processing debug information.
C4 3 0 1 loss o f debugging informat ion caus ed by opt imi z a t i o�
Optimization may have scrambled debugging information. Debug­
ging information will be lost.
C4 3 0 2 l o s s o f opt imizat ion with symbol i c debug s e l e c t ed
Will not do certain optimizations if debug is selected.
C4 3 2 3 pot ent ial divide by 0
Potential divide by 0.
C4 3 2 4 pot ent ial mod by 0
Potential mod by 0.
C4 4 0 1 identifier : member i s bit f ield
Inline assembly warning. The identifier is a bit field number.
C4 4 0 2 mus t use PTR operator
Inline assembly warning.
C4 4 0 3 i l l egal PTR operator
Inline assembly warning.
C4 4 0 4 period on direct ive ignored
Inline assembly warning. The ALIGN and EVEN directives do not
require a period prefix.
C4 4 0 5 identifier : ident i f i er i s reserved word
Inline assembly warning. The identifier is a reserved word. The context
in this case is able to determine that it is an identifier. It is not recom­
mended that reserved words be used as identifiers.
C4 4 0 6 operand on direct ive ignored
Inline assembly warning. The assembler directive does not require the
operand. The operand is ignored.
C4 4 0 7 operand s i z e con f l i c t
Inline assembly warning. The size o f the operands are in conflict with
each other. The size chosen is the size associated with the given
opcode. If the opcode does not specify a size, then the size of which­
ever operand is a register is chosen.
C4409 illegal instruction size
Inline assembly warning. The size specified for the opcode is not valid
for the opcode. A default size appropriate for the opcode is chosen. It
is much safer to specify the correct size than to depend on the default
value.
C4410 illegal size for operand
Inline assembly warning. The size of some operand conflicts with the
requirements of the opcode. A default size appropriate for the opcode
is chosen. It is much safer to specify the correct size than to depend on
the default value.
C4411 identifier : symbol resolves to displacement register
Inline assembly warning. The identifier is a register argument. Make
sure that the inline assembly is appropriate, given this fact.
C4414 identifier : short jump to function converted to near
Inline assembly warning. The size specified for the call instruction is
byte sized. The call is converted to a near call.
Abbreviations used in lex, 201
associated with yacc rules, 218, 220
default in yacc specification, 221
executed by parser, 215
in the middle of yacc rules, 222
binding, ld(CP), 24
physical, ld(CP), 24
admin command, SCCS, 118, 124-125, 139
algorithm in ld(CP), 43
virtual address space, ld(CP), 23
Ambiguities in lex. See Disambiguating rules
Ambiguous grammar rules for yacc, 237
ANSI, predefined macro names, cc(CP), 19
a.out file
building, 153
compatibility, 181
differences with shared and non-shared
libraries, 155
identification, 158
importing symbols, 173
incompatible libraries, 184
-l option, 153
non-shared libraries, 155
referencing symbols, 153
shared libraries, 151-152, 155, 159-160, 166,
169, 176, 190
Appending to yytext with yymore, 207
Arbitrary value types in yacc specifications, 225
Archive libraries
compatibility, 179
make(CP), 101, 104
non-shared libraries, 152
searching, ld(CP), 41
shared library, 152
m4(CP), 262, 266
macro, 266
unused in lint, 53
Arithmetic expression parsing, 239
arithmetic macro, m4(CP), 267
Assembly language, shared and non-shared
libraries, 157
Assignment statement
ld(CP), 26, 37
source code, ld(CP), 27
Associativity rules for yacc grammar, 239
Auditing, in SCCS, 149
Binding address, ld(CP), 24
Blanks in yacc specifications, 217
libraries, 160
table in shared libraries, 160, 167
#branch in shared libraries, 170
Branching, in SCCS, 122
browser. See cscope
Browser choice for cscope, 79
.bss, ld(CP), 24
Built-in macros in m4(CP), 261
C, shared and non-shared libraries, 157
C code
created from lex specifications, 212
translating lex specifications into C, 213
C compiler. See Compiling and linking C
C declarations in yacc specifications, 225
C Programmer's Productivity Tools
(CPPT), 61
C programs, compiling and linking, cc(CP)
checking syntax, 16
compiling simple programs, 10
disabling and enabling language
extensions, 19
disabling optimization, 17
error messages, 14
file processing, 11
format, 11
cc(CP) (continued)
linking libraries, 14
linking object files, 13
maximum optimization, 18
naming executable files, 13
naming object files, 12
options, 12-13, 15-21
redirecting compiler error messages, 15
specifying file, 11-12
specifying output file, 12
syntax, 11
warning level, 15
warning messages, 14
cc(CP) optimizing
code size, 17
execution time, 17
for consistent floating point results, 17
for speed, 17
cc(CP) option
-D, 175
-l, 174
-xenix, 21
cdc command, SCCS, 125, 143
cflow(CP), 180
changequote macro, m4(CP), 265
Character, non-portable use with lint, 57
Checksum, in SCCS, 148
chkshlib(CP), 151, 181
chmod, in SCCS, 124
.cnt files, 62
Code size, optimizing, cc(CP), 17
CodeView debugger, with cc(CP), 16
comb command, SCCS, 125, 145
Command language
expressions, ld(CP), 25
symbols, ld(CP), 36
Command line, error messages, 305
SCCS, 124
in make(CP), 96
in yacc specifications, 217
Microsoft C Version 6.0, 9
error message, 309
error messages, cc(CP), 15
rcc, cc(CP), 9, 22
redirecting error messages, cc(CP), 15
writing with lex, 193
Compilers. See Compiling and linking C
C programs, cc(CP), 9
and linking C programs, cc(CP), 9
programs, cc command, 10
programs for DOS and OS/2, cc(CP), 21
programs for XENIX, cc(CP), 21
Concurrent edits
of different SID in SCCS, 132
of same SID in SCCS, 134
Conditional compilation directives in
cscope, 88
Conditionals, m4 macro processor, 269
Configured memory, ld(CP), 24
Context sensitivity in lex, 203
Continuation lines, in make(CP), 96
COPY section in ld(CP), 45
Core files, shared libraries, 158
Correctable error messages, 309
Coverage analysis with lprof, 61
CPPT (C Programmer's Productivity Tools)
cscope. See cscope
introduced, 61
lprof. See lprof
SCCS file using admin, 118
SCCS files, 139
new macros, m4(CP), 262
Cross-reference symbol table for cscope, 82
browser choice, 79
cross-reference symbol table, 78, 82
cscope.out cross-reference table, 82
editing, 78-79, 88
environment variables, 78-80
escape to the shell, 91
examples of use, 89
introduced, 61
locating error message source, 83
setting terminal type, 78
stacking editor calls, 88
terminal information (terminfo)
database, 78
use, 80
ld(CP), 24
shared libraries, 155
dbxtra, 158
CodeView with cc(CP), 16
cc(CP), 16
dbxtra with cc(CP), 16
sdb with cc(CP), 16
shared libraries, 158
yacc, 230
Declarations in yacc specification. See yacc
Default action in yacc specification, 221
arguments in m4(CP), 262
expansion delayed with quoting in
m4(CP), 264
macro in m4(CP), 262
lex, 195, 200
yacc specification. See yacc declarations
in yacc, 216, 218, 225
comments, SCCS, 140
numbering, SCCS, 122
delta command, SCCS, 120, 125, 136-137
delta file, SCCS, 118, 120
Descriptive text, in SCCS, 148
Disabling optimization, cc(CP), 17
Disabling or enabling language extensions,
cc(CP), 19
Disambiguating rules
lex, 203, 205
yacc, 238, 242
Disk storage space, 151
divert macro, m4(CP), 268
divnum macro, m4(CP), 269
Dollar-sign symbol in yacc rules, 221
dosld(CP), 21
DSECT dummy section, ld(CP), 45
Dummy section, DSECT option, ld(CP), 45
dump(CP), 158
dumpdef macro, m4(CP), 272
dynamic analysis, 61
Dynamic dependency parameters
in make(CP), 109
ECHO macro in lex, 199
calls stacked with cscope, 88
choice for cscope, 79
link editor. See ld(CP)
EDITOR environment variable for cscope, 79
Empty rules in yacc specification, 219
Enabling or disabling language extensions,
cc(CP), 19
End marker in yacc specifications, 229
Entry point, changing, ld(CP), 40
Enumeration, type checking with lint, 56
handling in yacc, 242, 244
recovery, interactive in yacc, 244
recovery, YYERROR( ) function in
yacc, 234
reporting in yacc, 230
token in yacc, 242
Error message
SCCS, 126, 128-129, 132
cc(CP), 14-15
compiler, 305, 309-310, 331
cscope, 78
fatal, cc(CP), 15
format. See Error message, compiler
help, SCCS, 125-126
ld(CP), 32
lint, 54
locating source with cscope, 83
m4(CP), 272
profiled program, lprof, 71
redirecting, cc(CP), 15
syntax, cc(CP), 16
warning as fatal, cc(CP), 16
yacc, 227, 230
Errorlevel codes. See Exit codes
errprint macro, m4(CP), 272
Escape sequences in yacc specifications, 218
eval macro, m4(CP), 266-267
Examples of yacc specifications, 248
Executable files, naming, cc(CP), 13
Exit code, 305
#export, 166, 168
Extensions, object files, cc(CP), 12
Fatal-error messages, 309
SCCS, 147
SCCS delta file, 118
a.out file. See a.out file
arguments in SCCS, 124
creating, in SCCS, 118
creating in SCCS, 139
delta file in SCCS, 120
end-of-file processing in lex, 209
extensions, cc(CP), 11
in SCCS, 119
interdependencies. See make(CP)
linking object files, cc(CP), 13
naming executable files, cc command, 13
naming files, cc(CP), 12
nonrelocatable input file, ld(CP), 46
object file, cc(CP), 11-12
object, ld(CP), 24
output blocking, ld(CP), 46
parameters in SCCS, 140
processing, cc(CP), 11
protection in SCCS, 147
retrieving SCCS file with get, 119
source file, cc(CP), 11
source file, shared libraries, 185
specifying file, cc(CP), 11
find command, SCCS, 124
.fini, ld(CP), 25
finite-state machine, 231
Flags, in SCCS, 148
Floating point, optimizing for consistent
results, cc(CP), 17
fork(S), 65
Format, yacc specifications, 216
Formatting, in SCCS, 148
Frequency profiling with lprof, 61
unused in lint, 53, 55
get command options, SCCS, 119, 128-131,
get command, SCCS, 119-120, 125-126
getgrent(S) and shared library, 167
in SCCS, 119
Global data
shared libraries, 168, 170, 189
Grammar rules
in yacc, 215, 218-219
left associativity, yacc, 237
right associativity, yacc, 237
GROUP directive, ld(CP), 34
Grouping, output sections, ld(CP),
help command, SCCS, 125-126
#hide, 166, 168, 170
creating within a section, ld(CP), 38
in physical memory, ld(CP), 39
Host shared library, 152, 159, 165, 176, 182
ID keywords, in SCCS, 127
ifdef macro, m4(CP), 265
ifelse macro, m4(CP), 269
ifile, ld(CP), 25
Implicit rules, in make(CP), 102
Imported symbol, 177
include files, in make(CP), 109
include macro, m4(CP), 268
incr macro, m4(CP), 267
Incremental link editing, ld(CP), 43
INFO section in ld(CP), 45
#init, 170
.init, initialized variable, ld(CP), 25
Initialization and modification of file parameters
in SCCS, 140
Initialized variable, ld(CP), 25
section, ld(CP), 23
style for yacc specifications, 245
Inserting commentary for the initial delta
in SCCS, 140
Interactive error recovery in yacc, 244
Internal rules of make(CP), 111
Interpreting profiling output, 66
I/O routines for lex. See lex I/O routines
j flag, in SCCS,
Kernighan and Ritchie code, compiling with
rcc, cc(CP), 9
Key letters that affect output
in SCCS, 135
Keywords in yacc specifications, 224, 240-241
Language extensions, enabling or disabling,
cc(CP), 19
ADDR pseudo-function, 28, 32
ALIGN pseudo-function, 28, 32
C language operators in command language, 25
COPY, 45
DSECT dummy section, 45
GROUP directive, 34
INFO, 45
MEMORY directives, 28
NEXT pseudo-function, 32
SECTIONS directive, 30, 32
SECTIONS output alignment, 32
SIZEOF pseudo-function, 28, 32
address, 24
allocation, 23, 43
archive libraries, 41
assignment statement, 26-27, 37
binding address, 24
.bss, 24
changing entry point, 40
command language expressions, 25
configured memory, 24
constants in command language, 25
.data, 24
dummy section, 45
expressions, 25
.fini, 25
global symbols in command language, 25
grouping output sections, 33
holes, 35, 39
ld(CP) (continued)
ifile, 25
incremental link editing, 43
.init, ld(CP), 25
initialized section holes, 38
input section, 23
introduced, 23
-l option, 41
linking, cc(CP), 13
location counter, 27
memory configuration, 24-25, 28
nonrelocatable input file, 46
output, 23, 46
-r option, 43
section, 23-24, 37
shared libraries, 160
source code and assignment statement
interaction, 27
symbol creating and defining, 36
syntax for input directives, 47
.text, 24
-u option, 42
unconfigured memory, 24
Left associativity rules for yacc
grammar, 239
Left recursion rules for yacc, 246
len macro, m4(CP), 270
Level numbering, delta numbering in SCCS,
I/O routines, 206
UNIX systems, 212
actions, 198-199
compiling and linking, 213
context sensitivity, 203-204
definitions, 195, 200-201
disambiguating rules, 203, 205
end-of-file processing with yywrap, 209
introduced, 193
library default main( ) program, 213
options, 213
output file, 213
programs checking with lint, 55
regular expressions, 195
reprocessing input with yymore, 207
rules, 195
running lexical analyzer program, 213
specification, 195
specifications, 193
subroutines, 201
lex (continued)
symbol table, 211
tokens, 195, 209
used, 214
used with yacc, 217, 227
user subroutines, 195
y.tab.h file, 211
yyleng variable, 199
yyless, 208
yylex( ) lexical analyzer, 209
yytext character array, 199
yyval variable, 211
Lexical analysis,
lex, 193
yacc, 215, 227
Lexical analyzer yylex( ) generated by
lex, 209
Lexical tie-ins, yacc, 246
lex.yy.c file, 213
.lib, 151
libc, 173
libmalloc, 173
archive. See Archive library
cc(CP) -M options, 13
host, 165, 176
lex, 213
libc, 173
libmalloc, 173
lint, 52
math, 52
networking, 152-153
non-shared, 152
search path, cc(CP), 14
shared. See Shared libraries
target, 165
yacc access, 230
Line count data, 62
Lines, newlines in yacc specifications, 217
Link edit, shared libraries, 151
Link editor. See ld(CP)
shared libraries, 160
standard C library, 13
with additional libraries, cc(CP), 14
with cc(CP), 13
Linking and compiling C programs,
cc(CP), 9
flow of control, 54
introduction, 51
lint (continued)
libraries, 52
longs to ints, 57
math library, 52
messages, 53
non-portable character use, 57
options, 52
portable library, 52
set, 54
subexpression evaluation order, 58
syntax, 51
type casts, 56
type checking, 56
unused arguments, 53
unused function values, 55
unused variables and functions, 53
usage, 51
used with lex or yacc, 55
variable used before set, 54
Literals in yacc specifications, 218
Location counter, ld(CP), 27
look-ahead token, 231, 244
PROFOPTS environment variable, 62
behavior differences with rcc, 76
.cnt file, 62
coverage analysis, 61
creating profiled program, 62
data files in separate directory, 65
dynamic analysis, 61
frequency profiling, 61-62
incompatibility with optimization, 72
interpreting output, 66
introduced, 62
line count data, 62
merging data files, 64
non-terminating programs, 72
profiled source listing, 66
programs that fork(S), 65
run-time data, 62
summary option, 71
test coverage improvement, 74
timestamp override, 71
unexecuted lines, 68
used with program generators, 73
used with rcc compiler, 76
used with shared libraries, 73
ls command, SCCS, 124
Macro processor (continued)
system commands in m4(CP),
main( ) routine in yacc, 227
See also Macro processor
basic operation,
Include files,
archive libraries,
quoting to delay define expansion,
redirection, 262
removing macro definitions,
standard input,
using arguments,
defining, m4(CP),
maintaining parser code,
definition removed with undefine,
definitions, make(CP), 97
divnum m4(CP),
len, m4(CP),
syscmd, m4(CP),
used with yacc,
using shell commands,
173
macro processors,
Manipulating strings, m4 macro processor
Macro processor
Math library, lint,
Memory configuration, ld(CP), 24-25, 28
MEMORY directives, ld(CP),
Merging, cscope, 64, 71
warning and error messages, cc(CP),
Microsoft C compiler, 9
See m4(CP)
conditionals in m4(CP),
m4(CP), 261
printing, 272
Manipulating files
include function,
user-defined in m4(CP),
translations, make(CP), 101
translit, m4(CP), 271
undefine in m4(CP), 265
undivert, m4(CP), 268
timestamp information,
substr, m4(CP),
94, 101-102, 107-108
makefile. See make(CP)
sinclude, m4(CP),
suffix rules,
Makefile. See make(CP)
ifdef, m4(CP), 265
ifelse, m4(CP), 269
include, m4(CP),
recursive makefiles,
errprint, m4(CP),
93-94, 96
null suffix,
dumpdef in m4(CP),
macro definitions, 97
macro translations, 101
changequote, 265
defining in m4(CP),
internal rules,
built-in, m4(CP),
dependency, 96
dynamic dependency parameters,
implicit rules,
arithmetic, m4(CP),
creating new suffix rules,
example makefile, 94, 99
file interdependencies, 93-94, 109
undefine function,
divert, m4(CP),
environment variables,
standard output,
101, 104
continuation lines,
eval, m4(CP),
Tilde in SCCS filenames,
ECHO in lex,
109
151, 161, 165, 170, 178, 190-191
199
Multi-statement action in lex, 199
Multiple function definitions and cscope, 88
Multi-line action in lex,
executable files, cc(CP), 13
object file naming with cc command, 12
yacc specifications, 218
Newlines in yacc specifications, 217
noload, 178
NOLOAD section in ld(CP), 46
Non-portable character use
with lint, 57
Non-shared libraries, 152
Non-terminating profiled programs, 72
Null suffix, in make(CP), 107
Numbering, delta in SCCS, 122
Parse trees in yacc rules, 224
compiling and running, 231
error recovery, 234
generator. See also yacc
operation, 232
reset with yyerrok, 244
See also yacc
arithmetic expressions, 239
Pattern in lex specification. See also Regular
Performance improvement with prof and
lprof, 73
p.file, in SCCS, 120
Physical address, ld(CP), 24
Portability restrictions, checking with lint,
Portable lint library, 52
Portable operating system for UNIX. See
POSIX conforming code, cc(CP), 20
Precedence rules for yacc grammar, 239
Preparing specifications for yacc, 245
printf(S), 158, 160, 166-167
Printing, m4(CP), 272
prof, time-profiling, 62
Profile options. See PROFOPTS
Profiled program
creating, 62
profiler. See lprof
error messages, escape, 63
programs that fork(S), 65
within a shell script, 65
environment variable, 62
examples, 63
options list, 63
profiling within a shell script, 65
turning off, 64
checker. See lint
creating profiled version, 62
generator. See lex
generator and lprof, 73
#object, 170
Object file
cc(CP), 11
default extension, cc(CP), 11
extensions, cc(CP), 12
ld(CP), 24
linking, cc(CP), 13
naming with cc command, 12
section, ld(CP), 23
Operand types, type checking with lint, 56
consistent floating point results, cc(CP), 17
default, cc(CP), 17
disabling, cc(CP), 16-17
for code size, cc(CP), 17
for execution time, cc(CP), 17
for speed, cc(CP), 17
incompatibility with profiling, 72
maximum, cc(CP), 18
unsafe procedures, cc(CP), 18
Output file
blocking, ld(CP), 46
specifying, cc(CP), 12
Output section
creating holes, ld(CP), 35
ld(CP), 23
Output translations, in make(CP), 101
OVERLAY section, ld(CP), 46
Protection, in SCCS, 147
prs command, SCCS, 125, 141
Pseudo-variable in yacc rules, 221, 223
compiler, cc(CP), 9
used with lprof, 76
Recording changes
SCCS, 120
Recursion rules for yacc, 246
Reduce-reduce conflict in yacc, 238, 242
Region addresses, shared libraries, 162
Regular expressions in lex. See also Pattern
REJECT in lex, 208
Release numbering, delta numbering in SCCS,
Removing macro definitions with
undefine, 265
Reserved words in yacc, 247
Resetting parser with yyerrok, 244
different versions, SCCS, 128
get command, SCCS, 119
to make a delta, SCCS, 129
Return codes. See Exit codes
Return value, type checking with lint, 56
associativity rules for yacc grammar, 239
recursion rules for yacc, 246
rlint. See lint
rmdel command, SCCS, 125, 143
Rules, yacc specification, 216, 218
Runtime analysis. See lprof
Run-time data file, 62
sact command, SCCS, 125, 142
ID keywords, 127
SCCS identification string (SID). See SID
SID, 122
SCCS (continued)
access permission, 125
auditing, 149
beginners, 118
body, 148
branching, 122
checksum, 148
chmod, 124
commands, 118-120, 124-125, 128-129, 131,
135-137, 139-146
concurrent edits of different SID, 132
concurrent edits of same SID, 134
creating a file using admin, 118
creating of SCCS files, 139
delta comments, 140
delta file, 118, 120
delta numbering, 122
delta table, 148
descriptive text, 148
error messages, 126
file arguments, 124
filenames in make(CP), 106
files, 147
flags, 148
formatting, 148
get command, 126, 135
get options, 131
g.file, 119
help command, 126
id keywords, 127
initialization and modification of file
parameters, 140
inserting commentary for the initial
delta, 140
introduced, 117
j flag, 129
key letters that affect output, 135
permission, 125
p.file, 120
protection, 147
recording changes by using delta, 120
retrieving a file, 119, 128-129
terminology, 118
tree structure, 122
undoing a get -e, 130
user names, 148
x.files and z.files, 124
SCCS identification string (SID). See SID
sccsdiff command, SCCS, 124-125, 145
sdb debugger, with cc(CP), 16
Searching for patterns with lex, 195
allocation into named memory, ld(CP), 37
holes in physical memory, ld(CP), 39
initialized holes, ld(CP), 38
ld(CP), 23
physical address, ld(CP), 24
Section definition, ld(CP)
COPY, 45
INFO, 45
Section type. See Section definition
assignment statement, ld(CP), 37
directive, ld(CP), 30
file specification directive, ld(CP), 30
loading to a specific address, ld(CP), 32
output, ld(CP), 32
Shared libraries
C library, 152
a.out file, 155, 158-160, 166, 169
assembly language, 157
benefits, 152
#branch, 170
branch table, 160, 167, 171
building, 161, 191
building a.out file, 153
building host libraries, 165
building target libraries, 165
changing existing code, 167
checking for different version, 181
chkshlib(CP), 181
code conventions, 157
coding an application, 157
compatibility, 170, 179, 181
data symbol, 168
debugging, 158
description, 151
example, 152, 185
#export, 166, 168, 171, 191
external symbol, 168
getgrent(S), 167
global data, 168, 170
guidelines, 165
#hide, 166, 168, 170-171
#hide linker, 191
host, 152
host library, 159
implementing, 159
importing symbols, 172, 177
incompatible libraries, 184
Shared libraries (continued)
#init, 170
-lc_s option, 152
ld(CP) link editor, 160
.lib, 155
libc, 172-173
libmalloc, 173
link edit, 151
-lnsl_s option, 152
malloc(S), 173
members, 166
memory, 151
memory management unit (MMU), 162
mkshlib(CP), 190-191
networking library, 152
noload, 178
#object, 170
#objects noload, 191
paging, 180
printf(S), 160, 166-167
prof(CP), 180
profiling, 73
region addresses, 162, 188
rewriting existing code, 165, 189
routines, 166
_s suffix, 152
selecting contents, 164, 189
source code, 157
source files, 185
space considerations, 154, 157
specification file, 165, 170, 190
stack variables, 168
static data, 168
static library symbols, 169
static linking, 160
strcmp(S), 157
target library, 152, 159
target library pathname, 164
target pathname, 188
tuning code, 180
using, 191
when to use, 154
Shell commands, in make(CP), 100
Shift-reduce conflict in yacc, 238, 242
SCCS identification string, 118
concurrent edits of different SID, 132
concurrent edits of same SID, 134
retrieving different versions, 128
sinclude macro, m4(CP), 268
Source code and assignment statement,
ld(CP), 27
Source code control system. See SCCS
Source file
default extension, cc(CP), 11
in a different directory, 70
searching with cscope, 80
shared libraries, 185, 187, 189
specifying, cc(CP), 12
Source listing for a subset of files, 70
Specification file, 190
Specifying program and data file to lprof, 69
Stacking cscope and editor calls, 88
Standard C library, linking, cc(CP), 13
Standards conformance, cc(CP), 19
Start symbol in yacc specifications, 225
Starting state in lex, 204
Static data, 167, 169
Static link editor, shared libraries, 160
strcmp(S), 157-158
Structure reference, type checking with lint
Subexpression, evaluation order with lint
lex, 201
yacc. See yacc subroutines
substr macro, m4(CP), 271
Suffix rules, in make(CP), 94
Suffixes and transformation rules
in make(CP), 102
SVID conforming code, cc(CP), 21
($) dollar-sign in yacc rules, 221
importing into shared libraries, 172-173
ld(CP) command language, 36
left context in yacc specification, 223
table, cscope, 82
table, lex, 211
terminal and non-terminal in yacc specification, 218
values in yacc rules, 221
checking, cc(CP), 16
diagram for input directives, ld(CP), 47
error messages, cc(CP), 16
error messages in yacc, 230
ld(CP), 26
lint, 51
syscmd macro, m4(CP), 269
System commands, m4 macro processor,
Tabs in yacc specifications, 217
Target program, using make(CP) to produce
Target shared libraries, 152, 159, 165, 182
TERM environment variable for cscope, 78
and non-terminal symbols in yacc specification, 218
information (terminfo) database, 78
type setting for cscope, 78
Terminfo database, 78
Terminology, in SCCS, 118
Test coverage improvement with lprof, 74
.text, executable instructions
ld(CP), 24
shared libraries, 155
Time-profiling with prof, 62
in make(CP), 94
override with lprof, 71
changing the value, 211
declared in yacc specifications, 224
in yacc, 218
lex specification, 195
lexical analysis, 215
look ahead clearing with yyclearin, 244
look ahead use in yacc, 231
names, reserved, 229
numbers in yacc, 229
returning, 209
use with lexical analyzer and parser, 209
yacc error token, 242
yyclearin, 244
translit macro, m4(CP), 271
Tree structure, in SCCS, 122
casts with lint, 56
checking with lint, 56
Unconfigured memory, ld(CP), 24
undefine macro in m4(CP), 265
undivert macro, m4(CP), 268
Undoing a get -e, in SCCS, 130
unget command, SCCS, 125
Uninitialized variable, ld(CP), 25
arguments, lint, 53
function values, lint, 55
functions, lint, 53
variables and functions, lint, 53
defined macros in m4(CP), 261
defined subroutines. See yacc subroutines
names, SCCS, 148
subroutines, lex, 195
XENIX, compiling, cc(CP), 21
x.files and z.files, in SCCS, 124
XPG3 conforming code, cc(CP), 20
val command, SCCS, 124-125, 146
left context in yacc specification, 223
stack in yacc declarations, 225
types in yacc specifications, 225
initialized, ld(CP), 25
uninitialized, ld(CP), 25
unused in lint, 53
vc command, SCCS, 125, 146
VIEWER environment variable for
cscope, 79
VPATH environment variable for cscope,
Warning error messages, 310, 331
Warning message
SCCS, 128
cc(CP), 14
setting level, cc(CP), 15
what command, SCCS, 124-125, 144
Wide return values, cc(CP), 18
YYERROR( ) parser error recovery, 234
compiling and running the parser, 231
declarations, 216, 224-226, 240-241
delimiters, 218
error. See yacc error
forced error recovery, 244
grammar rules. See yacc grammar rules
grammar use, 215
interactive error recovery, 244
introduced, 215
lexical analysis, 227
library, accessing, 230
maintenance using make(CP), 231
output file, 230
parser operation, 231
program checking with lint, 55
reserved words, 247
rules. See yacc rules
source files, 231
specifications. See yacc specifications
subroutines. See yacc subroutines
token names reserved, 229
used with lex, 209, 217, 227
yychar, 230
yydebug, 230
yyparse( ) function, 230
yacc error
forced recovery, 244
handling, 242
interactive recovery, 244
recovery, reset with yyerrok, 244
reporting, 230
token, 242
yacc grammar rules
ambiguous, 237
associativity, 239
disambiguating rules, 238, 242
left associativity, yacc, 237
precedence, 239
reduce-reduce conflict, 238, 242
right associativity, yacc, 237
shift-reduce conflict, 238, 242
yacc rules
actions, 218, 220, 222
default action, 221
dollar-sign symbol, 221
empty rules, 219
escape sequences, 218
grammar, 218-219
left recursion, 246
literals, 218
names, 218
parse trees, 224
pseudo-variable, 221, 223-224
right recursion, 246
specification, 218
symbols, 218, 221, 223
terminal and non-terminal symbols, 218
tokens, 218
values, left context, 223
yacc specifications
blanks, 217
comments, 217
delimiter %%, 216
examples, 248
format, 216
input style, 245
newlines, 217
tabs, 217
yacc subroutines
end marker, 229
external variable yyval, 227
lexical analyzer, 227
main( ) routine, 227
token numbers, 229
user-defined, 227
yyerror routine, 227
yylex routine, 227
output file for yacc, 230
used with yacc, 229
used with lex, 211
used with yacc, 211
yychar variable used in yacc, 230
yyclearin token in yacc, 244
yydebug used in yacc, 230
yyerrok, parser reset in yacc, 244
YYERROR( ) parser error recovery, 234
yyerror routine in yacc, 227
yyleng variable in lex, 199
yyless function in lex, 208
yylex, routine in yacc, 227
yylex( ) lexical analyzer generated by
lex, 209
assignment, 227
variable used in yacc subroutines, 227
yylval variable used in yacc declaration, 226
yymore used to append to yytext, 207
yyparse( ) function in yacc, 230
YYSTYPE union declaration in yacc, 226
yytext character array in lex, 199, 207
yyval, variable used in yacc declaration, 226
yyval variable, 211
yywrap end-of-file processing in lex, 209