Advanced Programming in the UNIX Environment - X

ABC Amber CHM Converter Trial version, http://www.processtext.com/abcchm.html
Advanced Programming in the UNIX® Environment: Second Edition
By W. Richard Stevens, Stephen A. Rago
Publisher: Addison Wesley Professional
Pub Date: June 17, 2005
ISBN: 0201433079
Pages: 960
"Stephen Rago's update is a long overdue benefit to the community of professionals using the
versatile family of UNIX and UNIX-like operating environments. It removes obsolescence and
includes newer developments. It also thoroughly updates the context of all topics, examples, and
applications to recent releases of popular implementations of UNIX and UNIX-like environments.
And yet, it does all this while retaining the style and taste of the original classic."
--Mukesh Kacker, cofounder and former CTO of Pronto Networks, Inc.

"One of the essential classics of UNIX programming."
--Eric S. Raymond, author of The Art of UNIX Programming

"This is the definitive reference book for any serious or professional UNIX systems programmer.
Rago has updated and extended the classic Stevens text while keeping true to the original. The
APIs are illuminated by clear examples of their use. He also mentions many of the pitfalls to
look out for when programming across different UNIX system implementations and points out how
to avoid these pitfalls using relevant standards such as POSIX 1003.1, 2004 edition and the
Single UNIX Specification, Version 3."
--Andrew Josey, Director, Certification, The Open Group, and Chair of the POSIX 1003.1 Working
Group

"Advanced Programming in the UNIX® Environment, Second Edition, is an essential reference for
anyone writing programs for a UNIX system. It's the first book I turn to when I want to
understand or re-learn any of the various system interfaces. Stephen Rago has successfully
revised this book to incorporate newer operating systems such as GNU/Linux and Apple's OS X
while keeping true to the first edition in terms of both readability and usefulness. It will
always have a place right next to my computer."
--Dr. Benjamin Kuperman, Swarthmore College

Praise for the First Edition

"Advanced Programming in the UNIX® Environment is a must-have for any serious C programmer who
works under UNIX. Its depth, thoroughness, and clarity of explanation are unmatched."
--UniForum Monthly

"Numerous readers recommended Advanced Programming in the UNIX® Environment by W. Richard
Stevens (Addison-Wesley), and I'm glad they did; I hadn't even heard of this book, and it's been
out since 1992. I just got my hands on a copy, and the first few chapters have been
fascinating."
--Open Systems Today

"A much more readable and detailed treatment of UNIX internals can be found in Advanced
Programming in the UNIX® Environment by W. Richard Stevens (Addison-Wesley). This book
includes lots of realistic examples, and I find it quite helpful when I have systems programming
tasks to do."
--RS/Magazine

"This is the definitive reference book for any serious or professional UNIX systems programmer.
Rago has updated and extended the original Stevens classic while keeping true to the original."
--Andrew Josey, Director, Certification, The Open Group, and Chair of the POSIX 1003.1 Working
Group

For over a decade, serious C programmers have relied on one book for practical, in-depth
knowledge of the programming interfaces that drive the UNIX and Linux kernels: W. Richard
Stevens' Advanced Programming in the UNIX® Environment. Now, Stevens' colleague Stephen Rago
has thoroughly updated this classic to reflect the latest technical advances and add support
for today's leading UNIX and Linux platforms.

Rago carefully retains the spirit and approach that made this book a classic. Building on
Stevens' work, he begins with basic topics such as files, directories, and processes, carefully
laying the groundwork for understanding more advanced techniques, such as signal handling and
terminal I/O.

Substantial new material includes chapters on threads and multithreaded programming, using the
socket interface to drive interprocess communication (IPC), and extensive coverage of the
interfaces added to the latest version of the POSIX.1 standard. Nearly all examples have been
tested on four of today's most widely used UNIX/Linux platforms: FreeBSD 5.2.1; the Linux 2.4.22
kernel; Solaris 9; and Darwin 7.4.0, the FreeBSD/Mach hybrid underlying Apple's Mac OS X 10.3.

As in the first edition, you'll learn through example, including more than 10,000 lines of
downloadable, ANSI C source code. More than 400 system calls and functions are demonstrated
with concise, complete programs that clearly illustrate their usage, arguments, and return
values. To tie together what you've learned, the book presents several chapter-length case
studies, each fully updated for contemporary environments.

Advanced Programming in the UNIX® Environment has helped a generation of programmers write code
with exceptional power, performance, and reliability. Now updated for today's UNIX/Linux
systems, this second edition will be even more indispensable.
Table of Contents
Copyright
Praise for Advanced Programming in the UNIX® Environment, Second Edition
Praise for the First Edition
Addison-Wesley Professional Computing Series
Foreword
Preface
Introduction
Changes from the First Edition
Acknowledgments
Preface to the First Edition
Introduction
Unix Standards
Organization of the Book
Examples in the Text
Systems Used to Test the Examples
Acknowledgments
Chapter 1. UNIX System Overview
Section 1.1. Introduction
Section 1.2. UNIX Architecture
Section 1.3. Logging In
Section 1.4. Files and Directories
Section 1.5. Input and Output
Section 1.6. Programs and Processes
Section 1.7. Error Handling
Section 1.8. User Identification
Section 1.9. Signals
Section 1.10. Time Values
Section 1.11. System Calls and Library Functions
Section 1.12. Summary
Exercises
Chapter 2. UNIX Standardization and Implementations
Section 2.1. Introduction
Section 2.2. UNIX Standardization
Section 2.3. UNIX System Implementations
Section 2.4. Relationship of Standards and Implementations
Section 2.5. Limits
Section 2.6. Options
Section 2.7. Feature Test Macros
Section 2.8. Primitive System Data Types
Section 2.9. Conflicts Between Standards
Section 2.10. Summary
Exercises
Chapter 3. File I/O
Section 3.1. Introduction
Section 3.2. File Descriptors
Section 3.3. open Function
Section 3.4. creat Function
Section 3.5. close Function
Section 3.6. lseek Function
Section 3.7. read Function
Section 3.8. write Function
Section 3.9. I/O Efficiency
Section 3.10. File Sharing
Section 3.11. Atomic Operations
Section 3.12. dup and dup2 Functions
Section 3.13. sync, fsync, and fdatasync Functions
Section 3.14. fcntl Function
Section 3.15. ioctl Function
Section 3.16. /dev/fd
Section 3.17. Summary
Exercises
Chapter 4. Files and Directories
Section 4.1. Introduction
Section 4.2. stat, fstat, and lstat Functions
Section 4.3. File Types
Section 4.4. Set-User-ID and Set-Group-ID
Section 4.5. File Access Permissions
Section 4.6. Ownership of New Files and Directories
Section 4.7. access Function
Section 4.8. umask Function
Section 4.9. chmod and fchmod Functions
Section 4.10. Sticky Bit
Section 4.11. chown, fchown, and lchown Functions
Section 4.12. File Size
Section 4.13. File Truncation
Section 4.14. File Systems
Section 4.15. link, unlink, remove, and rename Functions
Section 4.16. Symbolic Links
Section 4.17. symlink and readlink Functions
Section 4.18. File Times
Section 4.19. utime Function
Section 4.20. mkdir and rmdir Functions
Section 4.21. Reading Directories
Section 4.22. chdir, fchdir, and getcwd Functions
Section 4.23. Device Special Files
Section 4.24. Summary of File Access Permission Bits
Section 4.25. Summary
Exercises
Chapter 5. Standard I/O Library
Section 5.1. Introduction
Section 5.2. Streams and FILE Objects
Section 5.3. Standard Input, Standard Output, and Standard Error
Section 5.4. Buffering
Section 5.5. Opening a Stream
Section 5.6. Reading and Writing a Stream
Section 5.7. Line-at-a-Time I/O
Section 5.8. Standard I/O Efficiency
Section 5.9. Binary I/O
Section 5.10. Positioning a Stream
Section 5.11. Formatted I/O
Section 5.12. Implementation Details
Section 5.13. Temporary Files
Section 5.14. Alternatives to Standard I/O
Section 5.15. Summary
Exercises
Chapter 6. System Data Files and Information
Section 6.1. Introduction
Section 6.2. Password File
Section 6.3. Shadow Passwords
Section 6.4. Group File
Section 6.5. Supplementary Group IDs
Section 6.6. Implementation Differences
Section 6.7. Other Data Files
Section 6.8. Login Accounting
Section 6.9. System Identification
Section 6.10. Time and Date Routines
Section 6.11. Summary
Exercises
Chapter 7. Process Environment
Section 7.1. Introduction
Section 7.2. main Function
Section 7.3. Process Termination
Section 7.4. Command-Line Arguments
Section 7.5. Environment List
Section 7.6. Memory Layout of a C Program
Section 7.7. Shared Libraries
Section 7.8. Memory Allocation
Section 7.9. Environment Variables
Section 7.10. setjmp and longjmp Functions
Section 7.11. getrlimit and setrlimit Functions
Section 7.12. Summary
Exercises
Chapter 8. Process Control
Section 8.1. Introduction
Section 8.2. Process Identifiers
Section 8.3. fork Function
Section 8.4. vfork Function
Section 8.5. exit Functions
Section 8.6. wait and waitpid Functions
Section 8.7. waitid Function
Section 8.8. wait3 and wait4 Functions
Section 8.9. Race Conditions
Section 8.10. exec Functions
Section 8.11. Changing User IDs and Group IDs
Section 8.12. Interpreter Files
Section 8.13. system Function
Section 8.14. Process Accounting
Section 8.15. User Identification
Section 8.16. Process Times
Section 8.17. Summary
Exercises
Chapter 9. Process Relationships
Section 9.1. Introduction
Section 9.2. Terminal Logins
Section 9.3. Network Logins
Section 9.4. Process Groups
Section 9.5. Sessions
Section 9.6. Controlling Terminal
Section 9.7. tcgetpgrp, tcsetpgrp, and tcgetsid Functions
Section 9.8. Job Control
Section 9.9. Shell Execution of Programs
Section 9.10. Orphaned Process Groups
Section 9.11. FreeBSD Implementation
Section 9.12. Summary
Exercises
Chapter 10. Signals
Section 10.1. Introduction
Section 10.2. Signal Concepts
Section 10.3. signal Function
Section 10.4. Unreliable Signals
Section 10.5. Interrupted System Calls
Section 10.6. Reentrant Functions
Section 10.7. SIGCLD Semantics
Section 10.8. Reliable-Signal Terminology and Semantics
Section 10.9. kill and raise Functions
Section 10.10. alarm and pause Functions
Section 10.11. Signal Sets
Section 10.12. sigprocmask Function
Section 10.13. sigpending Function
Section 10.14. sigaction Function
Section 10.15. sigsetjmp and siglongjmp Functions
Section 10.16. sigsuspend Function
Section 10.17. abort Function
Section 10.18. system Function
Section 10.19. sleep Function
Section 10.20. Job-Control Signals
Section 10.21. Additional Features
Section 10.22. Summary
Exercises
Chapter 11. Threads
Section 11.1. Introduction
Section 11.2. Thread Concepts
Section 11.3. Thread Identification
Section 11.4. Thread Creation
Section 11.5. Thread Termination
Section 11.6. Thread Synchronization
Section 11.7. Summary
Exercises
Chapter 12. Thread Control
Section 12.1. Introduction
Section 12.2. Thread Limits
Section 12.3. Thread Attributes
Section 12.4. Synchronization Attributes
Section 12.5. Reentrancy
Section 12.6. Thread-Specific Data
Section 12.7. Cancel Options
Section 12.8. Threads and Signals
Section 12.9. Threads and fork
Section 12.10. Threads and I/O
Section 12.11. Summary
Exercises
Chapter 13. Daemon Processes
Section 13.1. Introduction
Section 13.2. Daemon Characteristics
Section 13.3. Coding Rules
Section 13.4. Error Logging
Section 13.5. Single-Instance Daemons
Section 13.6. Daemon Conventions
Section 13.7. Client-Server Model
Section 13.8. Summary
Exercises
Chapter 14. Advanced I/O
Section 14.1. Introduction
Section 14.2. Nonblocking I/O
Section 14.3. Record Locking
Section 14.4. STREAMS
Section 14.5. I/O Multiplexing
Section 14.6. Asynchronous I/O
Section 14.7. readv and writev Functions
Section 14.8. readn and writen Functions
Section 14.9. Memory-Mapped I/O
Section 14.10. Summary
Exercises
Chapter 15. Interprocess Communication
Section 15.1. Introduction
Section 15.2. Pipes
Section 15.3. popen and pclose Functions
Section 15.4. Coprocesses
Section 15.5. FIFOs
Section 15.6. XSI IPC
Section 15.7. Message Queues
Section 15.8. Semaphores
Section 15.9. Shared Memory
Section 15.10. Client-Server Properties
Section 15.11. Summary
Exercises
Chapter 16. Network IPC: Sockets
Section 16.1. Introduction
Section 16.2. Socket Descriptors
Section 16.3. Addressing
Section 16.4. Connection Establishment
Section 16.5. Data Transfer
Section 16.6. Socket Options
Section 16.7. Out-of-Band Data
Section 16.8. Nonblocking and Asynchronous I/O
Section 16.9. Summary
Exercises
Chapter 17. Advanced IPC
Section 17.1. Introduction
Section 17.2. STREAMS-Based Pipes
Section 17.3. UNIX Domain Sockets
Section 17.4. Passing File Descriptors
Section 17.5. An Open Server, Version 1
Section 17.6. An Open Server, Version 2
Section 17.7. Summary
Exercises
Chapter 18. Terminal I/O
Section 18.1. Introduction
Section 18.2. Overview
Section 18.3. Special Input Characters
Section 18.4. Getting and Setting Terminal Attributes
Section 18.5. Terminal Option Flags
Section 18.6. stty Command
Section 18.7. Baud Rate Functions
Section 18.8. Line Control Functions
Section 18.9. Terminal Identification
Section 18.10. Canonical Mode
Section 18.11. Noncanonical Mode
Section 18.12. Terminal Window Size
Section 18.13. termcap, terminfo, and curses
Section 18.14. Summary
Exercises
Chapter 19. Pseudo Terminals
Section 19.1. Introduction
Section 19.2. Overview
Section 19.3. Opening Pseudo-Terminal Devices
Section 19.4. pty_fork Function
Section 19.5. pty Program
Section 19.6. Using the pty Program
Section 19.7. Advanced Features
Section 19.8. Summary
Exercises
Chapter 20. A Database Library
Section 20.1. Introduction
Section 20.2. History
Section 20.3. The Library
Section 20.4. Implementation Overview
Section 20.5. Centralized or Decentralized?
Section 20.6. Concurrency
Section 20.7. Building the Library
Section 20.8. Source Code
Section 20.9. Performance
Section 20.10. Summary
Exercises
Chapter 21. Communicating with a Network Printer
Section 21.1. Introduction
Section 21.2. The Internet Printing Protocol
Section 21.3. The Hypertext Transfer Protocol
Section 21.4. Printer Spooling
Section 21.5. Source Code
Section 21.6. Summary
Exercises
Appendix A. Function Prototypes
Appendix B. Miscellaneous Source Code
Section B.1. Our Header File
Section B.2. Standard Error Routines
Appendix C. Solutions to Selected Exercises
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Chapter 21
Bibliography
Index
Copyright
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed with initial capital letters or in
all capitals.
The authors and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising out
of the use of the information or programs contained herein.
The publisher offers excellent discounts on this book when ordered in quantity for bulk
purchases or special sales, which may include electronic versions and/or custom covers and
content particular to your business, training goals, marketing focus, and branding interests.
For more information, please contact:
U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com
For sales outside the U.S., please contact:
International Sales
international@pearsoned.com
Visit us on the Web: www.awprofessional.com
Library of Congress Cataloging-in-Publication Data:
Stevens, W. Richard.
Advanced programming in the Unix environment / W. Richard Stevens,
Stephen A. Rago. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-201-43307-9 (hardcover : alk. paper)
1. Operating systems (Computers) 2. UNIX (Computer file) I. Rago,
Stephen A. II. Title.
QA76.76.O63S754 2005
005.4'32--dc22
2005007943
Copyright © 2005 Pearson Education, Inc.
All rights reserved. Printed in the United States of America. This publication is protected by
copyright, and permission must be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means,
electronic, mechanical, photocopying, recording, or likewise. For information regarding
permissions, write to:
Pearson Education, Inc.
Rights and Contracts Department
One Lake Street
Upper Saddle River, NJ 07458
0-201-43307-9
Text printed in the United States on recycled paper at Courier in Westford, Massachusetts.
First printing, June 2005
Dedication
To Jeanne
Praise for Advanced Programming in the UNIX® Environment, Second Edition
"Stephen Rago's update is a long overdue benefit to the community of professionals using the
versatile family of UNIX and UNIX-like operating environments. It removes obsolescence and
includes newer developments. It also thoroughly updates the context of all topics, examples,
and applications to recent releases of popular implementations of UNIX and UNIX-like
environments. And yet, it does all this while retaining the style and taste of the original
classic."
Mukesh Kacker, cofounder and former CTO of Pronto Networks, Inc.
"One of the essential classics of UNIX programming."
Eric S. Raymond, author of The Art of UNIX Programming
"This is the definitive reference book for any serious or professional UNIX systems programmer.
Rago has updated and extended the classic Stevens text while keeping true to the original.
The APIs are illuminated by clear examples of their use. He also mentions many of the pitfalls
to look out for when programming across different UNIX system implementations and points
out how to avoid these pitfalls using relevant standards such as POSIX 1003.1, 2004 edition
and the Single UNIX Specification, Version 3."
Andrew Josey, Director, Certification, The Open Group, and Chair of the POSIX 1003.1 Working
Group
"Advanced Programming in the UNIX® Environment, Second Edition, is an essential reference
for anyone writing programs for a UNIX system. It's the first book I turn to when I want to
understand or re-learn any of the various system interfaces. Stephen Rago has successfully
revised this book to incorporate newer operating systems such as GNU/Linux and Apple's OS X
while keeping true to the first edition in terms of both readability and usefulness. It will always
have a place right next to my computer."
Dr. Benjamin Kuperman, Swarthmore College
Praise for the First Edition
"Advanced Programming in the UNIX® Environment is a must-have for any serious C
programmer who works under UNIX. Its depth, thoroughness, and clarity of explanation are
unmatched."
UniForum Monthly
"Numerous readers recommended Advanced Programming in the UNIX® Environment by W.
Richard Stevens (Addison-Wesley), and I'm glad they did; I hadn't even heard of this book,
and it's been out since 1992. I just got my hands on a copy, and the first few chapters have
been fascinating."
Open Systems Today
"A much more readable and detailed treatment of [UNIX internals] can be found in Advanced
Programming in the UNIX® Environment by W. Richard Stevens (Addison-Wesley). This book
includes lots of realistic examples, and I find it quite helpful when I have systems programming
tasks to do."
RS/Magazine
Addison-Wesley Professional Computing Series
Brian W. Kernighan, Consulting Editor
Matthew H. Austern, Generic Programming and the STL: Using and Extending the C++
Standard Template Library
David R. Butenhof, Programming with POSIX® Threads
Brent Callaghan, NFS Illustrated
Tom Cargill, C++ Programming Style
William R. Cheswick/Steven M. Bellovin/Aviel D. Rubin, Firewalls and Internet Security, Second
Edition: Repelling the Wily Hacker
David A. Curry, UNIX® System Security: A Guide for Users and System Administrators
Stephen C. Dewhurst, C++ Gotchas: Avoiding Common Problems in Coding and Design
Dan Farmer/Wietse Venema, Forensic Discovery
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns CD: Elements of
Reusable Object-Oriented Software
Peter Haggar, Practical Java™ Programming Language Guide
David R. Hanson, C Interfaces and Implementations: Techniques for Creating Reusable
Software
Mark Harrison/Michael McLennan, Effective Tcl/Tk Programming: Writing Better Programs with
Tcl and Tk
Michi Henning/Steve Vinoski, Advanced CORBA® Programming with C++
Brian W. Kernighan/Rob Pike, The Practice of Programming
S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet,
and the Telephone Network
John Lakos, Large-Scale C++ Software Design
Scott Meyers, Effective C++ CD: 85 Specific Ways to Improve Your Programs and Designs
Scott Meyers, Effective C++, Third Edition: 55 Specific Ways to Improve Your Programs and
Designs
Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs
Scott Meyers, Effective STL: 50 Specific Ways to Improve Your Use of the Standard
Template Library
Robert B. Murray, C++ Strategies and Tactics
David R. Musser/Gillmer J. Derge/Atul Saini, STL Tutorial and Reference Guide, Second Edition:
C++ Programming with the Standard Template Library
John K. Ousterhout, Tcl and the Tk Toolkit
Craig Partridge, Gigabit Networking
Radia Perlman, Interconnections, Second Edition: Bridges, Routers, Switches, and
Internetworking Protocols
Stephen A. Rago, UNIX® System V Network Programming
Eric S. Raymond, The Art of UNIX Programming
Marc J. Rochkind, Advanced UNIX Programming, Second Edition
Curt Schimmel, UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and
Caching for Kernel Programmers
W. Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols
W. Richard Stevens, TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the
UNIX® Domain Protocols
W. Richard Stevens/Bill Fenner/Andrew M. Rudoff, UNIX Network Programming Volume 1,
Third Edition: The Sockets Networking API
W. Richard Stevens/Stephen A. Rago, Advanced Programming in the UNIX® Environment,
Second Edition
W. Richard Stevens/Gary R. Wright, TCP/IP Illustrated Volumes 1-3 Boxed Set
John Viega/Gary McGraw, Building Secure Software: How to Avoid Security Problems the Right
Way
Gary R. Wright/W. Richard Stevens, TCP/IP Illustrated, Volume 2: The Implementation
Ruixi Yuan/W. Timothy Strayer, Virtual Private Networks: Technologies and Solutions
Visit www.awprofessional.com/series/professionalcomputing for more information
about these titles.
Foreword
At some point during nearly every interview I give, as well as in question periods after talks, I
get asked some variant of the same question: "Did you expect Unix to last for so long?" And of
course the answer is always the same: No, we didn't quite anticipate what has happened.
Even the observation that the system, in some form, has been around for well more than half
the lifetime of the commercial computing industry is now dated.
The course of developments has been turbulent and complicated. Computer technology has
changed greatly since the early 1970s, most notably in universal networking, ubiquitous
graphics, and readily available personal computing, but the system has somehow managed to
accommodate all of these phenomena. The commercial environment, although today
dominated on the desktop by Microsoft and Intel, has in some ways moved from
single-supplier to multiple sources and, in recent years, to increasing reliance on public
standards and on freely available source.
Fortunately, Unix, considered as a phenomenon and not just a brand, has been able to move
with and even lead this wave. AT&T in the 1970s and 1980s was protective of the actual Unix
source code, but encouraged standardization efforts based on the system's interfaces and
languages. For example, the SVID (the System V Interface Definition) was published by AT&T,
and it became the basis for the POSIX work and its follow-ons. As it happened, Unix was able
to adapt rather gracefully to a networked environment and, perhaps less elegantly, but still
adequately, to a graphical one. And as it also happened, the basic Unix kernel interface and
many of its characteristic user-level tools were incorporated into the technological
foundations of the open-source movement.
It is important that papers and writings about the Unix system were always encouraged, even
while the software of the system itself was proprietary, for example Maurice Bach's book, The
Design of the Unix Operating System. In fact, I would claim that a central reason for the
system's longevity has been that it has attracted remarkably talented writers to explain its
beauties and mysteries. Brian Kernighan is one of these; Rich Stevens is certainly another.
The first edition of this book and his series of books about networking are rightfully
regarded as remarkably well-crafted works of exposition, and became hugely popular.
However, the first edition of this book was published before Linux and the several open-source
renditions of the Unix interface that stemmed from the Berkeley CSRG became widespread,
and also at a time when many people's networking consisted of a serial modem. Steve Rago
has carefully updated this book to account for the technology changes, as well as
developments in various ISO and IEEE standards since its first publication. Thus his examples
are fresh, and freshly tested.
It's a most worthy second edition of a classic.
Murray Hill, New Jersey
Dennis Ritchie
March 2005
Preface
Introduction
Changes from the First Edition
Acknowledgments
Introduction
Rich Stevens and I first met through an e-mail exchange when I reported a typographical error
in his first book, UNIX Network Programming. He used to kid me about being the person to
send him his first errata notice for the book. Until his death in 1999, we exchanged e-mail
irregularly, usually when one of us had a question we thought the other might be able to
answer. We met for dinner at USENIX conferences and when Rich was teaching in the area.
Rich Stevens was a friend who always conducted himself as a gentleman. When I wrote UNIX
System V Network Programming in 1993, I intended it to be a System V version of Rich's UNIX
Network Programming. As was his nature, Rich gladly reviewed chapters for me, and treated
me not as a competitor, but as a colleague. We often talked about collaborating on a
STREAMS version of his TCP/IP Illustrated book. Had events been different, we might have
actually done it, but since Rich is no longer with us, revising Advanced Programming in the
UNIX Environment is the closest I'll ever get to writing a book with him.
When the editors at Addison-Wesley told me that they wanted to update Rich's book, I
thought that there wouldn't be too much to change. Even after 13 years, Rich's work still
holds up well. But the UNIX industry is vastly different today from what it was when the book
was first published.

- The System V variants are slowly being replaced by Linux. The major system vendors
  that ship their hardware with their own versions of the UNIX System have either made
  Linux ports available or announced support for Linux. Solaris is perhaps the last
  descendant of UNIX System V Release 4 with any appreciable market share.

- After 4.4BSD was released, the Computer Systems Research Group (CSRG) from the
  University of California at Berkeley decided to put an end to its development of the
  UNIX operating system, but several different groups of volunteers still maintain
  publicly available versions.

- The introduction of Linux, supported by thousands of volunteers, has made it possible
  for anyone with a computer to run an operating system similar to the UNIX System,
  with freely available source code for the newest hardware devices. The success of
  Linux is something of a curiosity, given that several free BSD alternatives are
  readily available.

- Continuing its trend as an innovative company, Apple Computer abandoned its old Mac
  operating system and replaced it with one based on Mach and FreeBSD.

Thus, I've tried to update the information presented in this book to reflect these four
platforms.
After Rich wrote Advanced Programming in the UNIX Environment in 1992, I got rid of most of
my UNIX programmer's manuals. To this day, the two books I keep closest to my desk are a
dictionary and a copy of Advanced Programming in the UNIX Environment. I hope you find
this revision equally useful.
Changes from the First Edition
Rich's work holds up well. I've tried not to change his original vision for this book, but a lot has
happened in 13 years. This is especially true with the standards that affect the UNIX
programming interface.
Throughout the book, I've updated interfaces that have changed from the ongoing efforts in
standards organizations. This is most noticeable in Chapter 2, since its primary topic is
standards. The 2001 version of the POSIX.1 standard, which we use in this revision, is much
more comprehensive than the 1990 version on which the first edition of this book was based.
The 1990 ISO C standard was updated in 1999, and some changes affect the interfaces in the
POSIX.1 standard.
A lot more interfaces are now covered by the POSIX.1 specification. The base specifications
of the Single UNIX Specification (published by The Open Group, formerly X/Open) have been
merged with POSIX.1. POSIX.1 now includes several 1003.1 standards and draft standards
that were formerly published separately.
Accordingly, I've added chapters to cover some new topics. Threads and multithreaded
programming are important concepts because they present a cleaner way for programmers to
deal with concurrency and asynchrony.
The socket interface is now part of POSIX.1. It provides a single interface to interprocess
communication (IPC), regardless of the location of the process, and is a natural extension of
the IPC chapters.
I've omitted most of the real-time interfaces that appear in POSIX.1. These are best treated
in a text devoted to real-time programming. One such book appears in the bibliography.
I've updated the case studies in the last chapters to cover more relevant real-world examples.
For example, few systems these days are connected to a PostScript printer via a serial or
parallel port. Most PostScript printers today are accessed via a network, so I've changed the
case study that deals with PostScript printer communication to take this into account.
The chapter on modem communication is less relevant these days. So that the original
material is not lost, however, it is available on the book's Web site in two formats: PostScript
(http://www.apuebook.com/lostchapter/modem.ps) and PDF
(http://www.apuebook.com/lostchapter/modem.pdf).
The source code for the examples shown in this book is also available at www.apuebook.com.
Most of the examples have been run on four platforms:
1. FreeBSD 5.2.1, a derivative of the 4.4BSD release from the Computer Systems
   Research Group at the University of California at Berkeley, running on an Intel
   Pentium processor
2. Linux 2.4.22 (the Mandrake 9.2 distribution), a free UNIX-like operating system,
   running on Intel Pentium processors
3. Solaris 9, a derivative of System V Release 4 from Sun Microsystems, running on a
   64-bit UltraSPARC IIi processor
4. Darwin 7.4.0, an operating environment based on FreeBSD and Mach, supported by
   Apple Mac OS X, version 10.3, on a PowerPC processor
Acknowledgments
Rich Stevens wrote the first edition of this book on his own, and it became an instant classic.
I couldn't have updated this book without the support of my family. They put up with piles of
papers scattered about the house (well, more so than usual), my monopolizing most of the
computers in the house, and lots of hours with my face buried behind a computer terminal. My
wife, Jeanne, even helped out by installing Linux for me on one of the test machines.
The technical reviewers suggested many improvements and helped make sure that the
content was accurate. Many thanks to David Bausum, David Boreham, Keith Bostic, Mark Ellis,
Phil Howard, Andrew Josey, Mukesh Kacker, Brian Kernighan, Bengt Kleberg, Ben Kuperman,
Eric Raymond, and Andy Rudoff.
I'd also like to thank Andy Rudoff for answering questions about Solaris and Dennis Ritchie for
digging up old papers and answering history questions. Once again, the staff at
Addison-Wesley was great to work with. Thanks to Tyrrell Albaugh, Mary Franz, John Fuller,
Karen Gettman, Jessica Goldstein, Noreen Regina, and John Wait. My thanks to Evelyn Pyle for
the fine job of copyediting.
As Rich did, I also welcome electronic mail from any readers with comments, suggestions, or
bug fixes.
Warren, New Jersey
April 2005
Stephen A. Rago
sar@apuebook.com
Preface to the First Edition
Introduction
Unix Standards
Organization of the Book
Examples in the Text
Systems Used to Test the Examples
Acknowledgments
Introduction
This book describes the programming interface to the Unix system: the system call interface
and many of the functions provided in the standard C library. It is intended for anyone writing
programs that run under Unix.
Like most operating systems, Unix provides numerous services to the programs that are
running: open a file, read a file, start a new program, allocate a region of memory, get the
current time-of-day, and so on. This has been termed the system call interface. Additionally,
the standard C library provides numerous functions that are used by almost every C program
(format a variable's value for output, compare two strings, etc.).
The system call interface and the library routines have traditionally been described in Sections
2 and 3 of the Unix Programmer's Manual. This book is not a duplication of these sections.
Examples and rationale are missing from the Unix Programmer's Manual, and that's what this
book provides.
Unix Standards
The proliferation of different versions of Unix during the 1980s has been tempered by the
various international standards that were started during the late 1980s. These include the
ANSI standard for the C programming language, the IEEE POSIX family (still being developed),
and the X/Open portability guide.
This book also describes these standards. But instead of just describing the standards by
themselves, we describe them in relation to popular implementations of the standards: System
V Release 4 and the forthcoming 4.4BSD. This provides a real-world description, which is often
lacking from the standard itself and from books that describe only the standard.
Organization of the Book
This book is divided into six parts:
1. An overview and introduction to basic Unix programming concepts and terminology
   (Chapter 1), with a discussion of the various Unix standardization efforts and
   different Unix implementations (Chapter 2).
2. I/O: unbuffered I/O (Chapter 3), properties of files and directories (Chapter 4),
   the standard I/O library (Chapter 5), and the standard system data files (Chapter 6).
3. Processes: the environment of a Unix process (Chapter 7), process control
   (Chapter 8), the relationships between different processes (Chapter 9), and
   signals (Chapter 10).
4. More I/O: terminal I/O (Chapter 11), advanced I/O (Chapter 12), and daemon
   processes (Chapter 13).
5. IPC: interprocess communication (Chapters 14 and 15).
6. Examples: a database library (Chapter 16), communicating with a PostScript
   printer (Chapter 17), a modem dialing program (Chapter 18), and using pseudo
   terminals (Chapter 19).
A reading familiarity with C would be beneficial, as would some experience using Unix. No prior
programming experience with Unix is assumed. This text is intended for programmers familiar
with Unix and programmers familiar with some other operating system who wish to learn the
details of the services provided by most Unix systems.
Examples in the Text
This book contains many examples: approximately 10,000 lines of source code. All the examples
are in the C programming language. Furthermore, these examples are in ANSI C. You should
have a copy of the Unix Programmer's Manual for your system handy while reading this book,
since reference is made to it for some of the more esoteric and implementation-dependent
features.
Almost every function and system call is demonstrated with a small, complete program. This
lets us see the arguments and return values and is often easier to comprehend than the use
of the function in a much larger program. But since some of the small programs are contrived
examples, a few bigger examples are also included (Chapters 16, 17, 18, and 19). These
demonstrate the programming techniques in larger, real-world settings.
All the examples have been included in the text directly from their source files. A
machine-readable copy of all the examples is available via anonymous FTP from the Internet
host ftp.uu.net in the file published/books/stevens.advprog.tar.Z. Obtaining the source code
allows you to modify the programs from this text and experiment with them on your system.
Systems Used to Test the Examples
Unfortunately all operating systems are moving targets. Unix is no exception. The following
diagram shows the recent evolution of the various versions of System V and 4.xBSD.
4.xBSD are the various systems from the Computer Systems Research Group at the University
of California at Berkeley. This group also distributes the BSD Net 1 and BSD Net 2
releases (publicly available source code from the 4.xBSD systems). SVRx refers to System V
Release x from AT&T. XPG3 is the X/Open Portability Guide, Issue 3, and ANSI C is the ANSI
standard for the C programming language. POSIX.1 is the IEEE and ISO standard for the
interface to a Unix-like system. We'll have more to say about these different standards and
the various versions of Unix in Sections 2.2 and 2.3.
In this text we use the term 4.3+BSD to refer to the Unix system from Berkeley that is
somewhere between the BSD Net 2 release and 4.4BSD.
At the time of this writing, 4.4BSD had not been released, so the system could not be called
4.4BSD. Nevertheless, a simple name was needed to refer to this system, and 4.3+BSD is used
throughout the text.
Most of the examples in this text have been run on four different versions of Unix:
1. Unix System V/386 Release 4.0 Version 2.0 ("vanilla SVR4") from U.H. Corp. (UHC),
   on an Intel 80386 processor.
2. 4.3+BSD at the Computer Systems Research Group, Computer Science Division,
   University of California at Berkeley, on a Hewlett Packard workstation.
3. BSD/386 (a derivative of the BSD Net 2 release) from Berkeley Software Design, Inc.,
   on an Intel 80386 processor. This system is almost identical to what we call 4.3+BSD.
4. SunOS 4.1.1 and 4.1.2 (systems with a strong Berkeley heritage but many System V
   features) from Sun Microsystems, on a SPARCstation SLC.
Numerous timing tests are provided in the text and the systems used for the test are
identified.
Acknowledgments
Once again I am indebted to my family for their love, support, and many lost weekends over
the past year and a half. Writing a book is, in many ways, a family affair. Thank you Sally, Bill,
Ellen, and David.
I am especially grateful to Brian Kernighan for his help in the book. His numerous thorough
reviews of the entire manuscript and his gentle prodding for better prose hopefully show in the
final result. Steve Rago was also a great resource, both in reviewing the entire manuscript and
answering many questions about the details and history of System V. My thanks to the other
technical reviewers used by Addison-Wesley, who provided valuable comments on various
portions of the manuscript: Maury Bach, Mark Ellis, Jeff Gitlin, Peter Honeyman, John
Linderman, Doug McIlroy, Evi Nemeth, Craig Partridge, Dave Presotto, Gary Wilson, and Gary
Wright.
Keith Bostic and Kirk McKusick at the U.C. Berkeley CSRG provided an account that was used
to test the examples on the latest BSD system. (Many thanks to Peter Salus too.) Sam
Nataros and Joachim Sacksen at UHC provided the copy of SVR4 used to test the examples.
Trent Hein helped obtain the alpha and beta copies of BSD/386.
Other friends have helped in many small, but significant ways over the past few years: Paul
Lucchina, Joe Godsil, Jim Hogue, Ed Tankus, and Gary Wright. My editor at Addison-Wesley,
John Wait, has been a great friend through it all. He never complained when the due date
slipped and the page count kept increasing. A special thanks to the National Optical
Astronomy Observatories (NOAO), especially Sidney Wolff, Richard Wolff, and Steve Grandi,
for providing computer time.
Real Unix books are written using troff and this book follows that time-honored tradition.
Camera-ready copy of the book was produced by the author using the groff package written
by James Clark. Many thanks to James Clark for providing this excellent system and for his
rapid response to bug fixes. Perhaps someday I will really understand troff footer traps.
I welcome electronic mail from any readers with comments, suggestions, or bug fixes.
Tucson, Arizona
April 1992
W. Richard Stevens
rstevens@kohala.com
http://www.kohala.com/~rstevens
Chapter 1. UNIX System Overview
Section 1.1. Introduction
Section 1.2. UNIX Architecture
Section 1.3. Logging In
Section 1.4. Files and Directories
Section 1.5. Input and Output
Section 1.6. Programs and Processes
Section 1.7. Error Handling
Section 1.8. User Identification
Section 1.9. Signals
Section 1.10. Time Values
Section 1.11. System Calls and Library Functions
Section 1.12. Summary
Exercises
1.1. Introduction
All operating systems provide services for programs they run. Typical services include
executing a new program, opening a file, reading a file, allocating a region of memory, getting
the current time of day, and so on. The focus of this text is to describe the services provided
by various versions of the UNIX operating system.
Describing the UNIX System in a strictly linear fashion, without any forward references to
terms that haven't been described yet, is nearly impossible (and would probably be boring).
This chapter provides a whirlwind tour of the UNIX System from a programmer's perspective.
We'll give some brief descriptions and examples of terms and concepts that appear throughout
the text. We describe these features in much more detail in later chapters. This chapter also
provides an introduction and overview of the services provided by the UNIX System, for
programmers new to this environment.
1.2. UNIX Architecture
In a strict sense, an operating system can be defined as the software that controls the
hardware resources of the computer and provides an environment under which programs can
run. Generally, we call this software the kernel, since it is relatively small and resides at the
core of the environment. Figure 1.1 shows a diagram of the UNIX System architecture.
Figure 1.1. Architecture of the UNIX operating system
The interface to the kernel is a layer of software called the system calls (the shaded portion
in Figure 1.1). Libraries of common functions are built on top of the system call interface, but
applications are free to use both. (We talk more about system calls and library functions in
Section 1.11.) The shell is a special application that provides an interface for running other
applications.
In a broad sense, an operating system is the kernel and all the other software that makes a
computer useful and gives the computer its personality. This other software includes system
utilities, applications, shells, libraries of common functions, and so on.
For example, Linux is the kernel used by the GNU operating system. Some people refer to this
as the GNU/Linux operating system, but it is more commonly referred to as simply Linux.
Although this usage may not be correct in a strict sense, it is understandable, given the dual
meaning of the phrase operating system. (It also has the advantage of being more succinct.)
1.3. Logging In
Login Name
When we log in to a UNIX system, we enter our login name, followed by our password. The
system then looks up our login name in its password file, usually the file /etc/passwd. If we
look at our entry in the password file we see that it's composed of seven colon-separated
fields: the login name, encrypted password, numeric user ID (205), numeric group ID (105), a
comment field, home directory (/home/sar), and shell program (/bin/ksh).
sar:x:205:105:Stephen Rago:/home/sar:/bin/ksh
All contemporary systems have moved the encrypted password to a different file. In Chapter 6,
we'll look at these files and some functions to access them.
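To make the seven colon-separated fields concrete, here is a minimal sketch that splits one password-file line in place. The struct pw_fields type and the split_passwd_line function are hypothetical names used only for this illustration; real programs should use the library functions described in Chapter 6 rather than parse the password file themselves.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one member per field of a password-file line. */
struct pw_fields {
    char *name;     /* login name */
    char *passwd;   /* encrypted password, or a placeholder such as "x" */
    long  uid;      /* numeric user ID */
    long  gid;      /* numeric group ID */
    char *gecos;    /* comment field */
    char *dir;      /* home directory */
    char *shell;    /* shell program */
};

/*
 * Split LINE in place at each ':' into exactly seven fields.
 * Returns 0 on success, -1 if the line does not have seven fields.
 * Empty fields (such as an empty comment) are allowed.
 */
static int
split_passwd_line(char *line, struct pw_fields *pf)
{
    char *field[7];
    int   n = 0;
    char *p = line;

    line[strcspn(line, "\n")] = '\0';   /* drop a trailing newline, if any */
    for (;;) {
        char *colon = strchr(p, ':');

        if (n == 7)
            return -1;                  /* more than seven fields */
        field[n++] = p;
        if (colon == NULL)
            break;                      /* this was the last field */
        *colon = '\0';                  /* terminate the current field */
        p = colon + 1;
    }
    if (n != 7)
        return -1;                      /* fewer than seven fields */

    pf->name   = field[0];
    pf->passwd = field[1];
    pf->uid    = strtol(field[2], NULL, 10);
    pf->gid    = strtol(field[3], NULL, 10);
    pf->gecos  = field[4];
    pf->dir    = field[5];
    pf->shell  = field[6];
    return 0;
}
```

For the sample entry shown above, pf.uid is 205, pf.gid is 105, and pf.shell points to "/bin/ksh".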
Shells
Once we log in, some system information messages are typically displayed, and then we can
type commands to the shell program. (Some systems start a window management program
when you log in, but you generally end up with a shell running in one of the windows.) A shell
is a command-line interpreter that reads user input and executes commands. The user input
to a shell is normally from the terminal (an interactive shell) or sometimes from a file (called a
shell script). The common shells in use are summarized in Figure 1.2.
Figure 1.2. Common shells used on UNIX systems
Name                Path       FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9
Bourne shell        /bin/sh    •              link to bash  link to bash   •
Bourne-again shell  /bin/bash  optional       •             •              •
C shell             /bin/csh   link to tcsh   link to tcsh  link to tcsh   •
Korn shell          /bin/ksh                                               •
TENEX C shell       /bin/tcsh  •              •             •              •
The system knows which shell to execute for us from the final field in our entry in the
password file.
The Bourne shell, developed by Steve Bourne at Bell Labs, has been in use since Version 7 and
is provided with almost every UNIX system in existence. The control-flow constructs of the
Bourne shell are reminiscent of Algol 68.
The C shell, developed by Bill Joy at Berkeley, is provided with all the BSD releases.
Additionally, the C shell was provided by AT&T with System V/386 Release 3.2 and is also in
System V Release 4 (SVR4). (We'll have more to say about these different versions of the
UNIX System in the next chapter.) The C shell was built on the 6th Edition shell, not the
Bourne shell. Its control flow looks more like the C language, and it supports additional
features that weren't provided by the Bourne shell: job control, a history mechanism, and
command line editing.
The Korn shell is considered a successor to the Bourne shell and was first provided with SVR4.
The Korn shell, developed by David Korn at Bell Labs, runs on most UNIX systems, but before
SVR4 was usually an extra-cost add-on, so it is not as widespread as the other two shells. It
is upward compatible with the Bourne shell and includes those features that made the C shell
popular: job control, command line editing, and so on.
The Bourne-again shell is the GNU shell provided with all Linux systems. It was designed to be
POSIX-conformant, while still remaining compatible with the Bourne shell. It supports features
from both the C shell and the Korn shell.
The TENEX C shell is an enhanced version of the C shell. It borrows several features, such as
command completion, from the TENEX operating system (developed in 1972 at Bolt Beranek
and Newman). The TENEX C shell adds many features to the C shell and is often used as a
replacement for the C shell.
Linux uses the Bourne-again shell for its default shell. In fact, /bin/sh is a link to /bin/bash.
The default user shell in FreeBSD and Mac OS X is the TENEX C shell, but they use the Bourne
shell for their administrative shell scripts because the C shell's programming language is
notoriously difficult to use. Solaris, having its heritage in both BSD and System V, provides all
the shells shown in Figure 1.2. Free ports of most of the shells are available on the Internet.
Throughout the text, we will use parenthetical notes such as this to describe historical notes
and to compare different implementations of the UNIX System. Often the reason for a
particular implementation technique becomes clear when the historical reasons are described.
Throughout this text, we'll show interactive shell examples to execute a program that we've
developed. These examples use features common to the Bourne shell, the Korn shell, and the
Bourne-again shell.
1.4. Files and Directories
File System
The UNIX file system is a hierarchical arrangement of directories and files. Everything starts in
the directory called root, whose name is the single character /.
A directory is a file that contains directory entries. Logically, we can think of each directory
entry as containing a filename along with a structure of information describing the attributes
of the file. The attributes of a file are such things as the type of file (regular file, directory),
the size of the file, the owner of the file, permissions for the file (whether other users may
access this file), and when the file was last modified. The stat and fstat functions return a structure of
information containing all the attributes of a file. In Chapter 4, we'll examine all the attributes
of a file in great detail.
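As a small preview of Chapter 4, the following sketch calls stat and examines the st_mode member to report a file's type. The function name file_type_name is hypothetical, and only a few of the possible file types are distinguished.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return a short description of the type of PATH, or NULL if stat fails.
 * stat follows symbolic links, so the type reported is that of the target.
 */
static const char *
file_type_name(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) < 0)
        return NULL;                    /* errno tells why */
    if (S_ISREG(sb.st_mode))
        return "regular file";
    if (S_ISDIR(sb.st_mode))
        return "directory";
    if (S_ISCHR(sb.st_mode))
        return "character special";
    if (S_ISBLK(sb.st_mode))
        return "block special";
    if (S_ISFIFO(sb.st_mode))
        return "FIFO";
    return "other";
}
```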
We make a distinction between the logical view of a directory entry and the way it is actually
stored on disk. Most implementations of UNIX file systems don't store attributes in the
directory entries themselves, because of the difficulty of keeping them in synch when a file
has multiple hard links. This will become clear when we discuss hard links in Chapter 4.
Filename
The names in a directory are called filenames. The only two characters that cannot appear in
a filename are the slash character (/) and the null character. The slash separates the
filenames that form a pathname (described next) and the null character terminates a
pathname. Nevertheless, it's good practice to restrict the characters in a filename to a subset
of the normal printing characters. (We restrict the characters because if we use some of the
shell's special characters in the filename, we have to use the shell's quoting mechanism to
reference the filename, and this can get complicated.)
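One way to follow that advice is to check a name against the POSIX portable filename character set: the letters A through Z and a through z, the digits 0 through 9, period, underscore, and hyphen. The helper below is a hypothetical function written for this illustration, and it assumes an ASCII-compatible character encoding.

```c
/*
 * Return 1 if NAME is a nonempty filename made only of characters from
 * the portable filename character set, 0 otherwise.  A '/' is always
 * rejected, since it separates the filenames in a pathname.
 */
static int
is_portable_filename(const char *name)
{
    const char *p;

    if (*name == '\0')
        return 0;                       /* an empty string is not a filename */
    for (p = name; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        int ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                 (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';

        if (!ok)
            return 0;                   /* found a character outside the set */
    }
    return 1;
}
```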
Two filenames are automatically created whenever a new directory is created: . (called dot)
and .. (called dot-dot). Dot refers to the current directory, and dot-dot refers to the parent
directory. In the root directory, dot-dot is the same as dot.
The Research UNIX System and some older UNIX System V file systems restricted a filename
to 14 characters. BSD versions extended this limit to 255 characters. Today, almost all
commercial UNIX file systems support at least 255-character filenames.
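The statement that dot-dot is the same as dot in the root directory can be checked by comparing device and i-node numbers, which together identify a file (i-nodes are discussed in Chapter 4). The helper same_file below is a hypothetical name used only for this sketch.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/stat.h>

/*
 * Return 1 if pathnames A and B refer to the same file, 0 if they do not,
 * and -1 if either one cannot be examined with stat.
 */
static int
same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With this helper, same_file("/", "/..") should return 1, whereas same_file(".", "..") returns 0 unless the working directory is the root.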
Pathname
A sequence of one or more filenames, separated by slashes and optionally starting with a
slash, forms a pathname. A pathname that begins with a slash is called an absolute pathname;
otherwise, it's called a relative pathname. Relative pathnames refer to files relative to the
current directory. The name for the root of the file system (/) is a special-case absolute
pathname that has no filename component.
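The difference between absolute and relative pathnames, and the role of the slash as a separator, can be sketched with two small string functions. Both are hypothetical helpers that look only at the string and never consult the file system. Following POSIX, a run of consecutive slashes is treated as a single separator; the implementation-defined case of a pathname beginning with exactly two slashes is ignored here.

```c
/* Return 1 if PATH is an absolute pathname (it begins with a slash). */
static int
is_absolute(const char *path)
{
    return path[0] == '/';
}

/*
 * Count the filenames in PATH.  Runs of slashes count as one separator,
 * so "/" has no filename components and "doc//memo/" has two.
 */
static int
count_components(const char *path)
{
    int         n = 0;
    const char *p = path;

    while (*p != '\0') {
        while (*p == '/')
            p++;                        /* skip one or more separators */
        if (*p == '\0')
            break;                      /* only trailing slashes remained */
        n++;                            /* start of a filename */
        while (*p != '\0' && *p != '/')
            p++;                        /* skip to the end of this filename */
    }
    return n;
}
```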
Example
Listing the names of all the files in a directory is not difficult. Figure 1.3 shows a bare-bones
implementation of the ls(1) command.
The notation ls(1) is the normal way to reference a particular entry in the UNIX system
manuals. It refers to the entry for ls in Section 1. The sections are normally numbered 1
through 8, and all the entries within each section are arranged alphabetically. Throughout this
text, we assume that you have a copy of the manuals for your UNIX system.
Historically, UNIX systems lumped all eight sections together into what was called the UNIX
Programmer's Manual. As the page count increased, the trend changed to distributing the
sections among separate manuals: one for users, one for programmers, and one for system
administrators, for example.
Some UNIX systems further divide the manual pages within a given section, using an
uppercase letter. For example, all the standard input/output (I/O) functions in AT&T [1990e]
are indicated as being in Section 3S, as in fopen(3S). Other systems have replaced the
numeric sections with alphabetic ones, such as C for commands.
Today, most manuals are distributed in electronic form. If your manuals are online, the way to
see the manual pages for the ls command would be something like
man 1 ls
or
man -s1 ls
Figure 1.3 is a program that just prints the name of every file in a directory, and nothing else.
If the source file is named myls.c, we compile it into the default a.out executable file by
cc myls.c
Historically, cc(1) is the C compiler. On systems with the GNU C compilation system, the C
compiler is gcc(1). Here, cc is often linked to gcc.
Some sample output is
$ ./a.out /dev
.
..
console
tty
mem
kmem
null
mouse
stdin
stdout
stderr
zero
many more lines that aren't shown
cdrom
$ ./a.out /var/spool/cron
can't open /var/spool/cron: Permission denied
$ ./a.out /dev/tty
can't open /dev/tty: Not a directory
Throughout this text, we'll show commands that we run and the resulting output in this
fashion: Characters that we type are shown in this font, whereas output from programs is
shown like this. If we need to add comments to this output, we'll show the comments in
italics. The dollar sign that precedes our input is the prompt that is printed by the shell. We'll
always show the shell prompt as a dollar sign.
Note that the directory listing is not in alphabetical order. The ls command sorts the names
before printing them.
There are many details to consider in this 20-line program.

First, we include a header of our own: apue.h. We include this header in almost every
program in this text. This header includes some standard system headers and defines
numerous constants and function prototypes that we use throughout the examples in
the text. A listing of this header is in Appendix B.

The declaration of the main function uses the style supported by the ISO C standard.
(We'll have more to say about the ISO C standard in the next chapter.)

We take an argument from the command line, argv[1], as the name of the directory to
list. In Chapter 7, we'll look at how the main function is called and how the
command-line arguments and environment variables are accessible to the program.

Because the actual format of directory entries varies from one UNIX system to another,
we use the functions opendir, readdir, and closedir to manipulate the directory.

The opendir function returns a pointer to a DIR structure, and we pass this pointer to
the readdir function. We don't care what's in the DIR structure. We then call readdir in
a loop, to read each directory entry. The readdir function returns a pointer to a dirent
structure or, when it's finished with the directory, a null pointer. All we examine in the
dirent structure is the name of each directory entry (d_name). Using this name, we
could then call the stat function (Section 4.2) to determine all the attributes of the
file.

We call two functions of our own to handle the errors: err_sys and err_quit. We can
see from the preceding output that the err_sys function prints an informative message
describing what type of error was encountered ("Permission denied" or "Not a
directory"). These two error functions are shown and described in Appendix B. We also
talk more about error handling in Section 1.7.

When the program is done, it calls the function exit with an argument of 0. The
function exit terminates a program. By convention, an argument of 0 means OK, and
an argument between 1 and 255 means that an error occurred. In Section 8.5, we
show how any program, such as a shell or a program that we write, can obtain the
exit status of a program that it executes.
Figure 1.3. List all the files in a directory
#include "apue.h"
#include <dirent.h>

int
main(int argc, char *argv[])
{
    DIR             *dp;
    struct dirent   *dirp;

    if (argc != 2)
        err_quit("usage: ls directory_name");

    if ((dp = opendir(argv[1])) == NULL)
        err_sys("can't open %s", argv[1]);
    while ((dirp = readdir(dp)) != NULL)
        printf("%s\n", dirp->d_name);

    closedir(dp);
    exit(0);
}
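Unlike ls, the program in Figure 1.3 prints names in whatever order readdir returns them. A minimal sketch of a sorted variant reads every name first and then sorts the array with qsort. The functions sorted_names, cmp_names, and print_sorted are hypothetical helpers, and their byte-by-byte strcmp ordering may differ from the locale-aware order that a real ls uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison: each array element is a char * pointing to a name. */
static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Read every entry name in DIRNAME into a malloc'ed array, sort it, and
 * store it in *NAMESP.  Returns the number of names, or -1 on error.
 * The caller frees each name and then the array itself.
 */
static long
sorted_names(const char *dirname, char ***namesp)
{
    DIR            *dp;
    struct dirent  *dirp;
    char          **names = NULL;
    size_t          n = 0, cap = 0;

    if ((dp = opendir(dirname)) == NULL)
        return -1;
    while ((dirp = readdir(dp)) != NULL) {
        if (n == cap) {                 /* grow the array as needed */
            size_t newcap = (cap == 0) ? 16 : cap * 2;
            char **tmp = realloc(names, newcap * sizeof(*tmp));

            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap = newcap;
        }
        if ((names[n] = strdup(dirp->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(dp);
    if (n > 0)
        qsort(names, n, sizeof(*names), cmp_names);
    *namesp = names;
    return (long)n;

fail:
    while (n > 0)
        free(names[--n]);
    free(names);
    closedir(dp);
    return -1;
}

/* Print the entries of DIRNAME in sorted order, one per line. */
static int
print_sorted(const char *dirname)
{
    char **names;
    long   i, n;

    if ((n = sorted_names(dirname, &names)) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        printf("%s\n", names[i]);
        free(names[i]);
    }
    free(names);
    return 0;
}
```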
Working Directory
Every process has a working directory, sometimes called the current working directory. This
is the directory from which all relative pathnames are interpreted. A process can change its
working directory with the chdir function.
For example, the relative pathname doc/memo/joe refers to the file or directory joe, in the
directory memo, in the directory doc, which must be a directory within the working directory.
From looking just at this pathname, we know that both doc and memo have to be directories,
but we can't tell whether joe is a file or a directory. The pathname /usr/lib/lint is an
absolute pathname that refers to the file or directory lint in the directory lib, in the
directory usr, which is in the root directory.
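A minimal sketch of changing the working directory and then asking for its absolute pathname, using chdir and getcwd from <unistd.h>. The wrapper name chdir_and_report is hypothetical. Note that chdir changes the working directory only of the process that calls it.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <unistd.h>

/*
 * Make DIR the working directory of this process, then store the
 * absolute pathname of the new working directory in BUF.
 * Returns BUF on success, or NULL if either call fails.
 */
static char *
chdir_and_report(const char *dir, char *buf, size_t size)
{
    if (chdir(dir) < 0)
        return NULL;            /* e.g., no such directory, or no permission */
    return getcwd(buf, size);   /* NULL if, for example, BUF is too small */
}
```

After a successful chdir_and_report("/usr", buf, sizeof(buf)), a relative pathname such as lib/lint would be interpreted as /usr/lib/lint.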
Home Directory
When we log in, the working directory is set to our home directory. Our home directory is
obtained from our entry in the password file (Section 1.3).
1.5. Input and Output
File Descriptors
File descriptors are normally small non-negative integers that the kernel uses to identify the
files being accessed by a particular process. Whenever it opens an existing file or creates a
new file, the kernel returns a file descriptor that we use when we want to read or write the
file.
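One property of descriptors worth seeing early: POSIX requires open to return the lowest-numbered descriptor not currently open in the process. The hypothetical function below shows that a descriptor freed by close is handed out again by the next open. It assumes a single-threaded process, so that no other code opens or closes descriptors in between.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/*
 * Open PATH twice, close the first descriptor, and open PATH again.
 * Returns 1 if the third open reused the closed descriptor, 0 if it
 * did not, or -1 if an open fails.
 */
static int
lowest_fd_reused(const char *path)
{
    int a, b, c, reused;

    if ((a = open(path, O_RDONLY)) < 0)
        return -1;
    if ((b = open(path, O_RDONLY)) < 0) {   /* b gets the next free slot above a */
        close(a);
        return -1;
    }
    close(a);                               /* a is again the lowest free slot */
    if ((c = open(path, O_RDONLY)) < 0) {
        close(b);
        return -1;
    }
    reused = (c == a);
    close(b);
    close(c);
    return reused;
}
```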
Standard Input, Standard Output, and Standard Error
By convention, all shells open three descriptors whenever a new program is run: standard
input, standard output, and standard error. If nothing special is done, as in the simple
command
ls
then all three are connected to the terminal. Most shells provide a way to redirect any or all
of these three descriptors to any file. For example,
ls > file.list
executes the ls command with its standard output redirected to the file named file.list.
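A shell typically performs this redirection in the new process just before running the program, using functions described in Chapter 3. As a rough sketch only, the same effect can be produced inside a single process with open and dup2. The names redirect_stdout and restore_stdout are hypothetical, and unlike a real shell this sketch creates no new process and uses a fixed permission mode of 0644.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/*
 * Point descriptor STDOUT_FILENO at PATH (created or truncated), similar
 * to "> path" in the shell.  Returns a duplicate of the original standard
 * output so that it can be restored later, or -1 on error.
 */
static int
redirect_stdout(const char *path)
{
    int saved, fd;

    if ((saved = dup(STDOUT_FILENO)) < 0)
        return -1;
    if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                  /* descriptor 1 now refers to the file */
    return saved;
}

/* Undo redirect_stdout, given the descriptor it returned. */
static int
restore_stdout(int saved)
{
    int r = dup2(saved, STDOUT_FILENO);

    close(saved);
    return (r < 0) ? -1 : 0;
}
```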
Unbuffered I/O
Unbuffered I/O is provided by the functions open, read, write, lseek, and close. These
functions all work with file descriptors.
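A minimal sketch that exercises all five functions on one file: it writes a string, uses lseek to move the current file offset back to the start, and reads the bytes back. The function write_then_read and its parameters are hypothetical, and the file named by PATH is created or truncated.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write MSG to PATH, rewind with lseek, and read up to SIZE bytes back
 * into BUF.  Returns the number of bytes read, or -1 on error.
 */
static ssize_t
write_then_read(const char *path, const char *msg, char *buf, size_t size)
{
    int     fd;
    ssize_t n;
    size_t  len = strlen(msg);

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len ||   /* treat a short write as an error */
        lseek(fd, 0, SEEK_SET) == -1) {          /* back to the start of the file */
        close(fd);
        return -1;
    }
    n = read(fd, buf, size);                     /* 0 here would mean end of file */
    close(fd);
    return n;
}
```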
Example
If we're willing to read from the standard input and write to the standard output, then the
program in Figure 1.4 copies any regular file on a UNIX system.
The <unistd.h> header, included by apue.h, and the two constants STDIN_FILENO and
STDOUT_FILENO are part of the POSIX standard (about which we'll have a lot more to say in the
next chapter). In this header are function prototypes for many of the UNIX system services,
such as the read and write functions that we call.
The constants STDIN_FILENO and STDOUT_FILENO are defined in <unistd.h> and specify the file
descriptors for standard input and standard output. These values are typically 0 and 1,
respectively, but we'll use the new names for portability.
In Section 3.9, we'll examine the BUFFSIZE constant in detail, seeing how various values affect
the efficiency of the program. Regardless of the value of this constant, however, this program
still copies any regular file.
The read function returns the number of bytes that are read, and this value is used as the
number of bytes to write. When the end of the input file is encountered, read returns 0 and
the program stops. If a read error occurs, read returns -1. Most of the system functions
return -1 when an error occurs.
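The three possible results from read (a positive byte count, 0 at end of file, or -1 on error) fit naturally in a loop. The hypothetical function count_bytes below reads a descriptor until end of file and totals the bytes it sees; BUFFSIZE reuses the value from Figure 1.4 as a local assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <unistd.h>

#define BUFFSIZE 4096   /* same value as Figure 1.4; any positive size works */

/*
 * Read FD until end of file.  Returns the total number of bytes read,
 * or -1 if read reports an error.
 */
static long
count_bytes(int fd)
{
    char    buf[BUFFSIZE];
    ssize_t n;
    long    total = 0;

    while ((n = read(fd, buf, BUFFSIZE)) > 0)
        total += n;             /* positive: n bytes arrived */
    if (n < 0)
        return -1;              /* -1: an error, with errno set */
    return total;               /* read returned 0: end of file */
}
```

For example, if five bytes are written to a pipe and the write end is then closed, count_bytes on the read end returns 5.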
If we compile the program into the standard name (a.out) and execute it as
./a.out > data
standard input is the terminal, standard output is redirected to the file data, and standard
error is also the terminal. If this output file doesn't exist, the shell creates it by default. The
program copies lines that we type to the standard output until we type the end-of-file
character (usually Control-D).
If we run
./a.out < infile > outfile
then the file named infile will be copied to the file named outfile.
Figure 1.4. Copy standard input to standard output
#include "apue.h"

#define BUFFSIZE    4096

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)
        err_sys("read error");

    exit(0);
}
In Chapter 3, we describe the unbuffered I/O functions in more detail.
Standard I/O
The standard I/O functions provide a buffered interface to the unbuffered I/O functions. Using
standard I/O prevents us from having to worry about choosing optimal buffer sizes, such as
the BUFFSIZE constant in Figure 1.4. Another advantage of using the standard I/O functions is
that they simplify dealing with lines of input (a common occurrence in UNIX applications). The
fgets function, for example, reads an entire line. The read function, on the other hand, reads
a specified number of bytes. As we shall see in Section 5.4, the standard I/O library provides
functions that let us control the style of buffering used by the library.
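A minimal sketch of the line-at-a-time style that fgets makes easy, assuming a fixed buffer (a line longer than the buffer is simply copied in several pieces). The function name copy_lines and the 4096-byte buffer are illustrative choices, not part of the text; the standard I/O library does the buffering of the underlying reads and writes.

```c
#include <stdio.h>
#include <string.h>

/* Copy in to out one line at a time with fgets/fputs.
 * Returns the number of newline-terminated lines copied, or -1 on error.
 * A line longer than the buffer is copied in pieces, which is harmless
 * for copying; it is counted once, when its newline arrives. */
long
copy_lines(FILE *in, FILE *out)
{
    char line[4096];   /* illustrative size */
    long lines = 0;

    while (fgets(line, sizeof(line), in) != NULL) {
        size_t len = strlen(line);

        if (fputs(line, out) == EOF)
            return -1;
        if (len > 0 && line[len - 1] == '\n')
            lines++;
    }
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return lines;
}
```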
The most common standard I/O function is printf. In programs that call printf, we'll always
include <stdio.h> (normally by including apue.h), as this header contains the function prototypes
for all the standard I/O functions.
Example
The program in Figure 1.5, which we'll examine in more detail in Section 5.8, is like the
previous program that called read and write. This program copies standard input to standard
output and can copy any regular file.
The function getc reads one character at a time, and this character is written by putc. After
the last byte of input has been read, getc returns the constant EOF (defined in <stdio.h>). The
standard I/O constants stdin and stdout are also defined in the <stdio.h> header and refer to
the standard input and standard output.
Figure 1.5. Copy standard input to standard output, using standard I/O
#include "apue.h"

int
main(void)
{
    int     c;

    while ((c = getc(stdin)) != EOF)
        if (putc(c, stdout) == EOF)
            err_sys("output error");

    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}
1.6. Programs and Processes
Program
A program is an executable file residing on disk in a directory. A program is read into memory
and is executed by the kernel as a result of one of the six exec functions. We'll cover these
functions in Section 8.10.
Processes and Process ID
An executing instance of a program is called a process, a term used on almost every page of
this text. Some operating systems use the term task to refer to a program that is being
executed.
The UNIX System guarantees that every process has a unique numeric identifier called the
process ID. The process ID is always a non-negative integer.
Example
The program in Figure 1.6 prints its process ID.
If we compile this program into the file a.out and execute it, we have
$ ./a.out
hello world from process ID 851
$ ./a.out
hello world from process ID 854
When this program runs, it calls the function getpid to obtain its process ID.
Figure 1.6. Print the process ID
#include "apue.h"

int
main(void)
{
    printf("hello world from process ID %d\n", getpid());
    exit(0);
}
Process Control
There are three primary functions for process control: fork, exec, and waitpid. (The exec
function has six variants, but we often refer to them collectively as simply the exec function.)
Example
The process control features of the UNIX System are demonstrated using a simple program
(Figure 1.7) that reads commands from standard input and executes the commands. This is a
bare-bones implementation of a shell-like program. There are several features to consider in
this 30-line program.

• We use the standard I/O function fgets to read one line at a time from the standard
input. When we type the end-of-file character (which is often Control-D) as the first
character of a line, fgets returns a null pointer, the loop stops, and the process
terminates. In Chapter 18, we describe all the special terminal characters (end of file,
backspace one character, erase entire line, and so on) and how to change them.

• Because each line returned by fgets is terminated with a newline character, followed
by a null byte, we use the standard C function strlen to calculate the length of the
string, and then replace the newline with a null byte. We do this because the execlp
function wants a null-terminated argument, not a newline-terminated argument.

• We call fork to create a new process, which is a copy of the caller. We say that the
caller is the parent and that the newly created process is the child. Then fork returns
the non-negative process ID of the new child process to the parent, and returns 0 to
the child. Because fork creates a new process, we say that it is called once (by the
parent) but returns twice (in the parent and in the child).

• In the child, we call execlp to execute the command that was read from the standard
input. This replaces the child process with the new program file. The combination of a
fork, followed by an exec, is what some operating systems call spawning a new
process. In the UNIX System, the two parts are separated into individual functions.
We'll have a lot more to say about these functions in Chapter 8.

• Because the child calls execlp to execute the new program file, the parent wants to
wait for the child to terminate. This is done by calling waitpid, specifying which
process we want to wait for: the pid argument, which is the process ID of the child.
The waitpid function also returns the termination status of the child (the status
variable), but in this simple program, we don't do anything with this value. We could
examine it to determine exactly how the child terminated.

• The most fundamental limitation of this program is that we can't pass arguments to the
command that we execute. We can't, for example, specify the name of a directory to
list. We can execute ls only on the working directory. To allow arguments would
require that we parse the input line, separating the arguments by some convention,
probably spaces or tabs, and then pass each argument as a separate argument to the
execlp function. Nevertheless, this program is still a useful demonstration of the
process control functions of the UNIX System.
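The last two points in the list can be made concrete. The sketch below (hypothetical names split_args, run_command, and MAXARGS) splits the line on spaces and tabs with strtok and passes the resulting vector to execvp, a member of the exec family that takes an argument array instead of a fixed list; it also decodes the waitpid status with the standard WIFEXITED, WEXITSTATUS, WIFSIGNALED, and WTERMSIG macros. Quoting, redirection, and pipelines are deliberately not handled, and reporting 128 plus the signal number for a killed child is merely a shell-like convention chosen here.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXARGS 64   /* hypothetical cap on words per command line */

/* Split line in place on spaces, tabs, and newlines. Fills argv and
 * terminates it with NULL, as execvp requires. Returns the word count. */
int
split_args(char *line, char *argv[], int maxargs)
{
    int   argc = 0;
    char *word = strtok(line, " \t\n");

    while (word != NULL && argc < maxargs - 1) {
        argv[argc++] = word;
        word = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}

/* Run one command line the way Figure 1.7 does, but with arguments.
 * Returns the child's exit status, 128 + the signal number if a signal
 * killed it, 0 for an empty line, or -1 if fork or waitpid failed. */
int
run_command(char *line)
{
    char  *argv[MAXARGS];
    pid_t  pid;
    int    status;

    if (split_args(line, argv, MAXARGS) == 0)
        return 0;                       /* nothing to run */

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                     /* child */
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)   /* parent */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

With this change, typing ls -l /tmp at the prompt would list /tmp instead of failing to find a program named "ls -l /tmp".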
If we run this program, we get the following results. Note that our program has a different
prompt (the percent sign) to distinguish it from the shell's prompt.
$ ./a.out
% date
Sun Aug 1 03:04:47 EDT 2004         programmers work late
% who
sar      :0       Jul 26 22:54
sar      pts/0    Jul 26 22:54 (:0)
sar      pts/1    Jul 26 22:54 (:0)
sar      pts/2    Jul 26 22:54 (:0)
% pwd
/home/sar/bk/apue/2e
% ls
Makefile
a.out
shell1.c
% ^D                                type the end-of-file character
$                                   the regular shell prompt
Figure 1.7. Read commands from standard input and execute them
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    char    buf[MAXLINE];   /* from apue.h */
    pid_t   pid;
    int     status;

    printf("%% ");  /* print prompt (printf requires %% to print %) */
    while (fgets(buf, MAXLINE, stdin) != NULL) {
        if (buf[strlen(buf) - 1] == '\n')
            buf[strlen(buf) - 1] = 0; /* replace newline with null */

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* child */
            execlp(buf, buf, (char *)0);
            err_ret("couldn't execute: %s", buf);
            exit(127);
        }

        /* parent */
        if ((pid = waitpid(pid, &status, 0)) < 0)
            err_sys("waitpid error");
        printf("%% ");
    }
    exit(0);
}
The notation ^D is used to indicate a control character. Control characters are special
characters formed by holding down the control key (often labeled Control or Ctrl) on your
keyboard and then pressing another key at the same time. Control-D, or ^D, is the default
end-of-file character. We'll see many more control characters when we discuss terminal I/O in
Chapter 18.
Threads and Thread IDs
Usually, a process has only one thread of control: one set of machine instructions executing at
a time. Some problems are easier to solve when more than one thread of control can operate
on different parts of the problem. Additionally, multiple threads of control can exploit the
parallelism possible on multiprocessor systems.
All the threads within a process share the same address space, file descriptors, and
process-related attributes; each thread has its own stack, but those stacks also live in the
shared address space. Because they can access the same memory, the threads need to
synchronize access to shared data among themselves to avoid inconsistencies.
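To illustrate why that synchronization matters, here is a sketch (hypothetical names count_with_threads, worker, and MAX_THREADS) in which several POSIX threads increment one shared counter. A mutex serializes the updates, so no increment is lost even when the threads run in parallel. On some systems the program must be compiled and linked with -pthread.

```c
#include <pthread.h>

#define MAX_THREADS 16   /* hypothetical cap for this sketch */

/* Shared by all threads, because they share the process's address space. */
static long            counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread start routine: add *arg to counter, one locked increment at a time. */
static void *
worker(void *arg)
{
    long iterations = *(const long *)arg;

    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);   /* only one thread updates at once */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers (capped at MAX_THREADS) and return the final count,
 * or -1 if any thread could not be created. */
long
count_with_threads(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int       started;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (started = 0; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);    /* wait, so iterations stays valid */
    return started == nthreads ? counter : -1;
}
```

Removing the lock and unlock calls typically makes the result come out short, because two threads can read the same old value of counter and both store the same incremented value.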
As with processes, threads are identified by IDs. Thread IDs, however, are local to a process.
A thread ID from one process has no meaning in another process. We use thread IDs to refer
to specific threads as we manipulate the threads within a process.
Functions to control threads parallel those used to control processes. Because threads were
added to the UNIX System long after the process model was established, however, the thread
model and the process model have some complicated interactions, as we shall see in Chapter
12.
1.7. Error Handling
When an error occurs in one of the UNIX System functions, a negative value is often returned,
and the integer errno is usually set to a value that gives additional information. For example,
the open function returns either a non-negative file descriptor if all is OK or -1 if an error
occurs. An error from open has about 15 possible errno values, such as file doesn't exist,
permission problem, and so on. Some functions use a convention other than returning a
negative value. For example, most functions that return a pointer to an object return a null
pointer to indicate an error.
The file <errno.h> defines the symbol errno and constants for each value that errno can
assume. Each of these constants begins with the character E. Also, the first page of Section
2 of the UNIX system manuals, named intro(2), usually lists all these error constants. For
example, if errno is equal to the constant EACCES, this indicates a permission problem, such as
insufficient permission to open the requested file.
On Linux, the error constants are listed in the errno(3) manual page.
POSIX and ISO C define errno as a symbol expanding into a modifiable lvalue of type int.
This can be either an integer that contains the error number or an expression that
dereferences a pointer to the error number returned by a function. The historical definition is
extern int errno;
But in an environment that supports threads, the process address space is shared among
multiple threads, and each thread needs its own local copy of errno to prevent one thread
from interfering with another. Linux, for example, supports multithreaded access to errno by
defining it as
extern int *__errno_location(void);
#define errno (*__errno_location())
There are two rules to be aware of with respect to errno. First, its value is never cleared by a
routine if an error does not occur. Therefore, we should examine its value only when the
return value from a function indicates that an error occurred. Second, the value of errno is
never set to 0 by any of the functions, and none of the constants defined in <errno.h> has a
value of 0.
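The first rule translates into a simple pattern: look at errno only on the path where the call has already reported failure through its return value. A sketch, where try_open is a hypothetical helper:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open path for reading. Returns 0 on success, otherwise the errno
 * value that open left behind. */
int
try_open(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;   /* open returned -1, so errno is meaningful here */

    /* Success: errno may still hold a stale value from some earlier call,
     * so it is deliberately not examined on this path. */
    close(fd);
    return 0;
}
```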
Two functions are defined by the C standard to help with printing error messages.
#include <string.h>
char *strerror(int errnum);
Returns: pointer to message string
This function maps errnum, which is typically the errno value, into an error message string
and returns a pointer to the string.
The perror function produces an error message on the standard error, based on the current
value of errno, and returns.
#include <stdio.h>
void perror(const char *msg);
It outputs the string pointed to by msg, followed by a colon and a space, followed by the
error message corresponding to the value of errno, followed by a newline.
Example
Figure 1.8 shows the use of these two error functions.
If this program is compiled into the file a.out, we have
$ ./a.out
EACCES: Permission denied
./a.out: No such file or directory
Note that we pass the name of the program (argv[0], whose value is ./a.out) as the argument
to perror. This is a standard convention in the UNIX System. By doing this, if the program is
executed as part of a pipeline, as in
prog1 < inputfile | prog2 | prog3 > outputfile
we are able to tell which of the three programs generated a particular error message.
Figure 1.8. Demonstrate strerror and perror
#include "apue.h"
#include <errno.h>

int
main(int argc, char *argv[])
{
    fprintf(stderr, "EACCES: %s\n", strerror(EACCES));
    errno = ENOENT;
    perror(argv[0]);
    exit(0);
}
Instead of calling either strerror or perror directly, all the examples in this text use the error
functions shown in Appendix B. The error functions in this appendix let us use the variable
argument list facility of ISO C to handle error conditions with a single C statement.
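The Appendix B functions themselves are not reproduced here. As a hedged illustration of the same variable-argument technique, the hypothetical format_error below uses va_list and vsnprintf to format the caller's printf-style message, then appends a colon, a space, and the strerror text. This is roughly the shape of message such error functions print, but the name and interface are assumptions of this sketch.

```c
#include <errno.h>       /* callers usually pass errno as err */
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a printf-style message into buf, then append
 * ": " and the text strerror gives for err. */
void
format_error(char *buf, size_t size, int err, const char *fmt, ...)
{
    va_list ap;
    int     len;

    va_start(ap, fmt);
    len = vsnprintf(buf, size, fmt, ap);    /* the caller's message */
    va_end(ap);

    if (len >= 0 && (size_t)len < size)     /* room left to append? */
        snprintf(buf + len, size - len, ": %s", strerror(err));
}
```

A single statement such as format_error(msg, sizeof(msg), errno, "can't open %s", path) then builds the whole message.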
Error Recovery
The errors defined in <errno.h> can be divided into two categories: fatal and nonfatal. A fatal
error has no recovery action. The best we can do is print an error message on the user's
screen or write an error message into a log file, and then exit. Nonfatal errors, on the other
hand, can sometimes be dealt with more robustly. Most nonfatal errors are temporary in
nature, such as with a resource shortage, and might not occur when there is less activity on
the system.
Resource-related nonfatal errors include EAGAIN, ENFILE, ENOBUFS, ENOLCK, ENOSPC, ENOSR,
EWOULDBLOCK, and sometimes ENOMEM. EBUSY can be treated as a nonfatal error when it indicates
that a shared resource is in use. Sometimes, EINTR can be treated as a nonfatal error when it
interrupts a slow system call (more on this in Section 10.5).
The typical recovery action for a resource-related nonfatal error is to delay a little and try
again later. This technique can be applied in other circumstances. For example, if an error
indicates that a network connection is no longer functioning, it might be possible for the
application to delay a short time and then reestablish the connection. Some applications use
an exponential backoff algorithm, waiting a longer period of time each iteration.
Ultimately, it is up to the application developer to determine which errors are recoverable. If a
reasonable strategy can be used to recover from an error, we can improve the robustness of
our application by avoiding an abnormal exit.
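A sketch of the delay-and-retry strategy with exponential backoff described above. Which errno values count as temporary is an assumption here (a subset of the resource-related list plus EINTR), as are the names retry_with_backoff, is_transient, and flaky_operation; the last one only simulates an operation that fails with EAGAIN a few times before succeeding.

```c
#include <errno.h>
#include <time.h>

/* Assumption: the errors this sketch treats as temporary. */
static int
is_transient(int err)
{
    return err == EAGAIN || err == EINTR || err == ENOBUFS || err == ENOLCK;
}

/* Call op(arg) until it returns >= 0, fails with a non-transient errno, or
 * max_tries attempts have been made. Sleeps initial_ms milliseconds after the
 * first failure and doubles the delay after each further failure. Returns the
 * last result; on failure errno is left as op set it. */
int
retry_with_backoff(int (*op)(void *), void *arg, int max_tries, long initial_ms)
{
    long delay_ms = initial_ms;
    int  rc = -1;

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        rc = op(arg);
        if (rc >= 0 || !is_transient(errno))
            return rc;                      /* success, or treat as fatal */
        if (attempt == max_tries)
            break;                          /* out of attempts; keep op's errno */

        struct timespec ts;
        ts.tv_sec = delay_ms / 1000;
        ts.tv_nsec = (delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
        delay_ms *= 2;                      /* exponential backoff */
    }
    return rc;
}

/* Hypothetical stand-in for a real operation: fails with EAGAIN while
 * *arg is positive (decrementing it each time), then succeeds. */
int
flaky_operation(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

Doubling the delay keeps a struggling resource from being hammered by retries, while the attempt limit keeps a genuinely persistent failure from becoming an infinite loop.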
1.8. User Identification
User ID
The user ID from our entry in the password file is a numeric value that identifies us to the
system. This user ID is assigned by the system administrator when our login name is assigned,
and we cannot change it. The user ID is normally assigned to be unique for every user. We'll
see how the kernel uses the user ID to check whether we have the appropriate permissions to
perform certain operations.
We call the user whose user ID is 0 either root or the superuser. The entry in the password
file normally has a login name of root, and we refer to the special privileges of this user as
superuser privileges. As we'll see in Chapter 4, if a process has superuser privileges, most file
permission checks are bypassed. Some operating system functions are restricted to the
superuser. The superuser has free rein over the system.
Client versions of Mac OS X ship with the superuser account disabled; server versions ship
with the account already enabled. Instructions are available on Apple's Web site describing
how to enable it. See http://docs.info.apple.com/article.html?artnum=106290.
Group ID
Our entry in the password file also specifies our numeric group ID. This too is assigned by the
system administrator when our login name is assigned. Typically, the password file contains
multiple entries that specify the same group ID. Groups are normally used to collect users
together into projects or departments. This allows the sharing of resources, such as files,
among members of the same group. We'll see in Section 4.5 that we can set the permissions
on a file so that all members of a group can access the file, whereas others outside the group
cannot.
There is also a group file that maps group names into numeric group IDs. The group file is
usually /etc/group.
The use of numeric user IDs and numeric group IDs for permissions is historical. With every file
on disk, the file system stores both the user ID and the group ID of a file's owner. Storing
both of these values requires only four bytes, assuming that each is stored as a two-byte
integer. If the full ASCII login name and group name were used instead, additional disk space
would be required. In addition, comparing strings during permission checks is more expensive
than comparing integers.
Users, however, work better with names than with numbers, so the password file maintains
the mapping between login names and user IDs, and the group file provides the mapping
between group names and group IDs. The ls -l command, for example, prints the login name
of the owner of a file, using the password file to map the numeric user ID into the
corresponding login name.
Early UNIX systems used 16-bit integers to represent user and group IDs. Contemporary UNIX
systems use 32-bit integers.
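A sketch of the name lookup that ls -l performs, using the POSIX getpwuid function to fetch the password file entry for a numeric user ID. The helper name uid_to_name is hypothetical; when no entry exists, it falls back to printing the number.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Map a numeric user ID to its login name from the password file, as
 * ls -l does; if there is no entry, format the number instead. */
const char *
uid_to_name(uid_t uid, char *buf, size_t size)
{
    struct passwd *pw = getpwuid(uid);

    if (pw != NULL)
        snprintf(buf, size, "%s", pw->pw_name);
    else
        snprintf(buf, size, "%lu", (unsigned long)uid);
    return buf;
}
```

For example, uid_to_name(0, buf, sizeof(buf)) normally yields root, the superuser's login name.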
Example
The program in Figure 1.9 prints the user ID and the group ID.
We call the functions getuid and getgid to return the user ID and the group ID. Running the
program yields
$ ./a.out
uid = 205, gid = 105
Figure 1.9. Print user ID and group ID
#include "apue.h"

int
main(void)
{
    printf("uid = %d, gid = %d\n", getuid(), getgid());
    exit(0);
}
Supplementary Group IDs
In addition to the group ID specified in the password file for a login name, most versions of the
UNIX System allow a user to belong to additional groups. This started with 4.2BSD, which
allowed a user to belong to up to 16 additional groups. These supplementary group IDs are
obtained at login time by reading the file /etc/group and finding the first 16 entries that list
the user as a member. As we shall see in the next chapter, POSIX requires that a system
support at least eight supplementary groups per process, but most systems support at least
16.
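The limit on a given system can be queried at run time with sysconf(_SC_NGROUPS_MAX) (sysconf is covered in Section 2.5.4), and the groups themselves with the POSIX getgroups function. The sketch below, with the hypothetical name print_supplementary_groups, combines the two. It sizes the array at the limit plus one because POSIX permits getgroups to report the effective group ID as well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the calling process's supplementary group IDs to out, separated by
 * spaces, and return how many there were (or -1 on error). */
int
print_supplementary_groups(FILE *out)
{
    long   max = sysconf(_SC_NGROUPS_MAX);   /* POSIX: at least 8 */
    gid_t *groups;
    int    n;

    if (max < 0)
        return -1;
    /* +1 because getgroups may also report the effective group ID. */
    if ((groups = malloc((size_t)(max + 1) * sizeof(gid_t))) == NULL)
        return -1;
    n = getgroups((int)(max + 1), groups);
    for (int i = 0; i < n; i++)
        fprintf(out, "%lu%s", (unsigned long)groups[i], i + 1 < n ? " " : "\n");
    free(groups);
    return n;
}
```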
1.9. Signals
Signals are a technique used to notify a process that some condition has occurred. For
example, if a process divides by zero, the signal whose name is SIGFPE (floating-point
exception) is sent to the process. The process has three choices for dealing with the signal.
1. Ignore the signal. This option isn't recommended for signals that denote a hardware
   exception, such as dividing by zero or referencing memory outside the address space
   of the process, as the results are undefined.
2. Let the default action occur. For a divide-by-zero condition, the default is to terminate
   the process.
3. Provide a function that is called when the signal occurs (this is called "catching" the
   signal). By providing a function of our own, we'll know when the signal occurs and we
   can handle it as we wish.
Many conditions generate signals. Two terminal keys, called the interrupt key (often the
DELETE key or Control-C) and the quit key (often Control-backslash), are used to interrupt the
currently running process. Another way to generate a signal is by calling the kill function.
We can call this function from a process to send a signal to another process. Naturally, there
are limitations: we have to be the owner of the other process (or the superuser) to be able to
send it a signal.
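A minimal sketch of sending a signal with kill: the parent forks a child (which it therefore owns), signals it, and uses the WIFSIGNALED and WTERMSIG macros on the waitpid status to confirm which signal ended it. The helper name kill_child_with is hypothetical. A child inherits signal dispositions and the signal mask, so if the caller ignores or blocks the chosen signal the child would not terminate; SIGKILL, which can be neither caught nor ignored, is the safest choice for a demonstration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just waits, send it signo with kill(), and return the
 * number of the signal that terminated it, or -1 on failure. */
int
kill_child_with(int signo)
{
    pid_t pid;
    int   status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                 /* child: sleep until a signal arrives */
        for (;;)
            pause();
    }
    /* We created the child, so we are permitted to signal it. If the signal
     * arrives before the child reaches pause(), the default action still applies. */
    if (kill(pid, signo) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```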
Example
Recall the bare-bones shell example (Figure 1.7). If we invoke this program and press the
interrupt key, the process terminates because the default action for this signal, named SIGINT,
is to terminate the process. The process hasn't told the kernel to do anything other than the
default with this signal, so the process terminates.
To catch this signal, the program needs to call the signal function, specifying the name of
the function to call when the SIGINT signal is generated. The function is named sig_int; when
it's called, it just prints a message and a new prompt. Adding 11 lines to the program in Figure
1.7 gives us the version in Figure 1.10. (The 11 new lines are indicated with a plus sign at the
beginning of the line.)
In Chapter 10, we'll take a long look at signals, as most nontrivial applications deal with them.
Figure 1.10. Read commands from standard input and execute them
  #include "apue.h"
  #include <sys/wait.h>

+ static void sig_int(int);   /* our signal-catching function */
+
  int
  main(void)
  {
      char    buf[MAXLINE];   /* from apue.h */
      pid_t   pid;
      int     status;

+     if (signal(SIGINT, sig_int) == SIG_ERR)
+         err_sys("signal error");
+
      printf("%% ");  /* print prompt (printf requires %% to print %) */
      while (fgets(buf, MAXLINE, stdin) != NULL) {
          if (buf[strlen(buf) - 1] == '\n')
              buf[strlen(buf) - 1] = 0; /* replace newline with null */

          if ((pid = fork()) < 0) {
              err_sys("fork error");
          } else if (pid == 0) {      /* child */
              execlp(buf, buf, (char *)0);
              err_ret("couldn't execute: %s", buf);
              exit(127);
          }

          /* parent */
          if ((pid = waitpid(pid, &status, 0)) < 0)
              err_sys("waitpid error");
          printf("%% ");
      }
      exit(0);
  }
+
+ void
+ sig_int(int signo)
+ {
+     printf("interrupt\n%% ");
+ }
1.10. Time Values
Historically, UNIX systems have maintained two different time values:
1. Calendar time. This value counts the number of seconds since the Epoch: 00:00:00
   January 1, 1970, Coordinated Universal Time (UTC). (Older manuals refer to UTC as
   Greenwich Mean Time.) These time values are used to record the time when a file was
   last modified, for example.
   The primitive system data type time_t holds these time values.
2. Process time. This is also called CPU time and measures the central processor
   resources used by a process. Process time is measured in clock ticks, which have
   historically been 50, 60, or 100 ticks per second.
   The primitive system data type clock_t holds these time values. (We'll show how to
   obtain the number of clock ticks per second with the sysconf function in Section 2.5.4.)
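A quick way to see that a time_t counts seconds from the Epoch is to convert a few small values back to a UTC date. The sketch below uses the POSIX gmtime_r and the ISO C strftime; format_utc and clock_ticks_per_second are hypothetical helper names, the latter simply wrapping the sysconf call mentioned above.

```c
#include <time.h>
#include <unistd.h>

/* Format calendar time t (seconds since the Epoch) as a UTC date and time.
 * Returns buf, or NULL if the conversion or formatting failed. */
char *
format_utc(time_t t, char *buf, size_t size)
{
    struct tm tm;

    if (gmtime_r(&t, &tm) == NULL)
        return NULL;
    if (strftime(buf, size, "%Y-%m-%d %H:%M:%S UTC", &tm) == 0)
        return NULL;
    return buf;
}

/* Clock ticks per second, the unit of clock_t process times. */
long
clock_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}
```

Here format_utc(0, ...) yields 1970-01-01 00:00:00 UTC, and format_utc(86400, ...) gives the same time one day later.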
When we measure the execution time of a process, as in Section 3.9, we'll see that the UNIX
System maintains three values for a process:

• Clock time
• User CPU time
• System CPU time
The clock time, sometimes called wall clock time, is the amount of time the process takes to
run, and its value depends on the number of other processes being run on the system.
Whenever we report the clock time, the measurements are made with no other activities on
the system.
The user CPU time is the CPU time attributed to user instructions. The system CPU time is the
CPU time attributed to the kernel when it executes on behalf of the process. For example,
whenever a process executes a system service, such as read or write, the time spent within
the kernel performing that system service is charged to the process. The sum of user CPU
time and system CPU time is often called the CPU time.
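These three values can be obtained for a child process with the POSIX times function, which returns elapsed clock time in clock ticks and fills a struct tms whose tms_cutime and tms_cstime fields accumulate the user and system CPU time of children that have been waited for. Section 8.16 covers this properly; the sketch below (hypothetical names time_command and struct cmd_times) runs a command through /bin/sh -c and converts tick counts to seconds using sysconf(_SC_CLK_TCK).

```c
#include <sys/times.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct cmd_times {              /* hypothetical result type for this sketch */
    double real;                /* clock (wall) time, seconds */
    double user;                /* user CPU time of the child, seconds */
    double sys;                 /* system CPU time of the child, seconds */
};

/* Run cmd via /bin/sh -c, wait for it, and fill *out. Returns the wait
 * status from waitpid, or -1 on error. */
int
time_command(const char *cmd, struct cmd_times *out)
{
    long       ticks = sysconf(_SC_CLK_TCK);
    struct tms start, end;
    clock_t    t0, t1;
    pid_t      pid;
    int        status;

    if (ticks <= 0 || (t0 = times(&start)) == (clock_t)-1)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                             /* child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if ((t1 = times(&end)) == (clock_t)-1)
        return -1;

    /* A waited-for child's CPU time is added to tms_cutime/tms_cstime. */
    out->real = (double)(t1 - t0) / ticks;
    out->user = (double)(end.tms_cutime - start.tms_cutime) / ticks;
    out->sys  = (double)(end.tms_cstime - start.tms_cstime) / ticks;
    return status;
}
```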
It is easy to measure the clock time, user time, and system time of any process: simply
execute the time(1) command, with the argument to the time command being the command
we want to measure. For example:
$ cd /usr/include
$ time -p grep _POSIX_SOURCE */*.h > /dev/null
real    0m0.81s
user    0m0.11s
sys     0m0.07s
The output format from the time command depends on the shell being used, because some
shells don't run /usr/bin/time, but instead have a separate built-in function to measure the
time it takes commands to run.
In Section 8.16, we'll see how to obtain these three times from a running process. The general
topic of times and dates is covered in Section 6.10.
1.11. System Calls and Library Functions
All operating systems provide service points through which programs request services from the
kernel. All implementations of the UNIX System provide a well-defined, limited number of entry
points directly into the kernel called system calls (recall Figure 1.1). Version 7 of the Research
UNIX System provided about 50 system calls, 4.4BSD provided about 110, and SVR4 had
around 120. Linux has anywhere between 240 and 260 system calls, depending on the version.
FreeBSD has around 320.
The system call interface has always been documented in Section 2 of the UNIX
Programmer's Manual. Its definition is in the C language, regardless of the actual
implementation technique used on any given system to invoke a system call. This differs from
many older operating systems, which traditionally defined the kernel entry points in the
assembler language of the machine.
The technique used on UNIX systems is for each system call to have a function of the same
name in the standard C library. The user process calls this function, using the standard C
calling sequence. This function then invokes the appropriate kernel service, using whatever
technique is required on the system. For example, the function may put one or more of the C
arguments into general registers and then execute some machine instruction that generates a
software interrupt in the kernel. For our purposes, we can consider the system calls as being
C functions.
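On Linux the two layers can be seen side by side: the C library's write wrapper, and the generic syscall function, which takes a system call number such as SYS_write from <sys/syscall.h> and enters the kernel directly. This sketch is Linux-specific (syscall is provided by the C library there, not by POSIX), and the helper name write_both_ways is hypothetical.

```c
#define _GNU_SOURCE              /* glibc: declare syscall() */
#include <string.h>
#include <sys/syscall.h>         /* SYS_write, SYS_getpid (Linux) */
#include <unistd.h>

/* Write msg to fd twice: once through the C library's write() wrapper and
 * once through syscall(), which invokes the kernel entry point by number.
 * Returns the total bytes written, or -1 if either call failed. */
long
write_both_ways(int fd, const char *msg)
{
    size_t len = strlen(msg);
    long   via_libc = write(fd, msg, len);
    long   via_syscall = syscall(SYS_write, fd, msg, len);

    if (via_libc < 0 || via_syscall < 0)
        return -1;
    return via_libc + via_syscall;
}
```

Both paths end at the same kernel service; the wrapper only packages the C arguments and the error return (setting errno and returning -1) in the conventional way.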
Section 3 of the UNIX Programmer's Manual defines the general-purpose functions available
to programmers. These functions aren't entry points into the kernel, although they may invoke
one or more of the kernel's system calls. For example, the printf function may use the write
system call to output a string, but the strcpy (copy a string) and atoi (convert ASCII to
integer) functions don't involve the kernel at all.
From an implementor's point of view, the distinction between a system call and a library
function is fundamental. But from a user's perspective, the difference is not as critical. From
our perspective in this text, both system calls and library functions appear as normal C
functions. Both exist to provide services for application programs. We should realize, however,
that we can replace the library functions, if desired, whereas the system calls usually cannot
be replaced.
Consider the memory allocation function malloc as an example. There are many ways to do
memory allocation and its associated garbage collection (best fit, first fit, and so on). No
single technique is optimal for all programs. The UNIX system call that handles memory
allocation, sbrk(2), is not a general-purpose memory manager. It increases or decreases the
address space of the process by a specified number of bytes. How that space is managed is
up to the process. The memory allocation function, malloc(3), implements one particular type
of allocation. If we don't like its operation, we can define our own malloc function, which will
probably use the sbrk system call. In fact, numerous software packages implement their own
memory allocation algorithms with the sbrk system call. Figure 1.11 shows the relationship
between the application, the malloc function, and the sbrk system call.
Figure 1.11. Separation of malloc function and sbrk system call
Here we have a clean separation of duties: the system call in the kernel allocates an
additional chunk of space on behalf of the process. The malloc library function manages this
space from user level.
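To make this division of labor concrete, here is a deliberately naive allocator layered on sbrk. It is a sketch for illustration only, not how any real malloc works: every request simply moves the break upward, nothing is ever freed, and the 16-byte alignment is an assumption that suits common 64-bit ABIs. bump_alloc and ALLOC_ALIGN are hypothetical names. sbrk is treated as a legacy interface on current systems, and modern malloc implementations also obtain memory with mmap.

```c
#define _DEFAULT_SOURCE          /* glibc: declare sbrk() in <unistd.h> */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define ALLOC_ALIGN 16           /* assumption: enough for basic types on common ABIs */

/* Naive "bump" allocator: grow the data segment with sbrk for every request.
 * Memory is never returned; there is no matching free(). */
void *
bump_alloc(size_t size)
{
    size_t    rounded;
    char     *p;
    uintptr_t aligned;

    if (size > (size_t)INTPTR_MAX / 2)
        return NULL;                              /* refuse absurd sizes */
    rounded = (size + ALLOC_ALIGN - 1) & ~(size_t)(ALLOC_ALIGN - 1);

    /* Ask the kernel for the block plus slack for aligning its start. */
    p = sbrk((intptr_t)(rounded + ALLOC_ALIGN - 1));
    if (p == (char *)-1)
        return NULL;                              /* kernel refused to grow */

    aligned = ((uintptr_t)p + ALLOC_ALIGN - 1) & ~(uintptr_t)(ALLOC_ALIGN - 1);
    return (void *)aligned;
}
```

Everything a real malloc adds (free lists, coalescing, reuse, thread safety) lives in this user-level layer; the kernel's only job is the sbrk call.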
Another example to illustrate the difference between a system call and a library function is the
interface the UNIX System provides to determine the current time and date. Some operating
systems provide one system call to return the time and another to return the date. Any
special handling, such as the switch to or from daylight saving time, is handled by the kernel
or requires human intervention. The UNIX System, on the other hand, provides a single system
call that returns the number of seconds since the Epoch: midnight, January 1, 1970,
Coordinated Universal Time. Any interpretation of this value, such as converting it to a
human-readable time and date using the local time zone, is left to the user process. The
standard C library provides routines to handle most cases. These library routines handle such
details as the various algorithms for daylight saving time.
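A sketch of that user-level interpretation: the hypothetical format_in_zone sets the TZ environment variable, calls tzset so the library picks up the change, and lets localtime_r and strftime apply the zone's offset and daylight saving rules. EST5EDT is just an example POSIX TZ value. Changing TZ affects the whole process, which is acceptable for a demonstration but not for library code.

```c
#include <stdlib.h>
#include <time.h>

/* Format calendar time t as local time in the POSIX time zone tz.
 * Note: this sets TZ for the entire process. Returns buf or NULL. */
char *
format_in_zone(time_t t, const char *tz, char *buf, size_t size)
{
    struct tm tm;

    if (setenv("TZ", tz, 1) != 0)
        return NULL;
    tzset();                            /* have the library reread TZ */
    if (localtime_r(&t, &tm) == NULL)   /* applies offset and DST rules */
        return NULL;
    if (strftime(buf, size, "%Y-%m-%d %H:%M:%S %Z", &tm) == 0)
        return NULL;
    return buf;
}
```

With EST5EDT, the value 0 formats as 1969-12-31 19:00:00 EST, while 1091343887 formats as 2004-08-01 03:04:47 EDT, the instant the date command printed in the sample shell session of Section 1.6. The kernel supplied only the seconds count; the switch from EST to EDT was decided entirely in the library.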
An application can call either a system call or a library routine. Also realize that many library
routines invoke a system call. This is shown in Figure 1.12.
Figure 1.12. Difference between C library functions and system calls
Another difference between system calls and library functions is that system calls usually
provide a minimal interface, whereas library functions often provide more elaborate
functionality. We've seen this already in the difference between the sbrk system call and the
malloc library function. We'll see this difference later, when we compare the unbuffered I/O
functions (Chapter 3) and the standard I/O functions (Chapter 5).
The process control system calls (fork, exec, and wait) are usually invoked by the user's
application code directly. (Recall the bare-bones shell in Figure 1.7.) But some library routines
exist to simplify certain common cases: the system and popen library routines, for example. In
Section 8.13, we'll show an implementation of the system function that invokes the basic
process control system calls. We'll enhance this example in Section 10.18 to handle signals
correctly.
To define the interface to the UNIX System that most programmers use, we have to describe
both the system calls and some of the library functions. If we described only the sbrk system
call, for example, we would skip the more programmer-friendly malloc library function that
many applications use. In this text, we'll use the term function to refer to both system calls
and library functions, except when the distinction is necessary.
1.12. Summary
This chapter has been a short tour of the UNIX System. We've described some of the
fundamental terms that we'll encounter over and over again. We've seen numerous small
examples of UNIX programs to give us a feel for what the remainder of the text talks about.
The next chapter is about standardization of the UNIX System and the effect of work in this
area on current systems. Standards, particularly the ISO C standard and the POSIX.1
standard, will affect the rest of the text.
Exercises
1.1  Verify on your system that the directories dot and dot-dot are not the same,
     except in the root directory.
1.2  In the output from the program in Figure 1.6, what happened to the processes
     with process IDs 852 and 853?
1.3  In Section 1.7, the argument to perror is defined with the ISO C attribute const,
     whereas the integer argument to strerror isn't defined with this attribute.
     Why?
1.4  In the error-handling function err_sys in Appendix B, why is the value of errno
     saved when the function is called?
1.5  If the calendar time is stored as a signed 32-bit integer, in what year will it
     overflow? What ways can be used to extend the overflow point? Are they
     compatible with existing applications?
1.6  If the process time is stored as a signed 32-bit integer, and if the system
     counts 100 ticks per second, after how many days will the value overflow?
Chapter 2. UNIX Standardization and Implementations
Section 2.1. Introduction
Section 2.2. UNIX Standardization
Section 2.3. UNIX System Implementations
Section 2.4. Relationship of Standards and Implementations
Section 2.5. Limits
Section 2.6. Options
Section 2.7. Feature Test Macros
Section 2.8. Primitive System Data Types
Section 2.9. Conflicts Between Standards
Section 2.10. Summary
Exercises
2.1. Introduction
Much work has gone into standardizing the UNIX programming environment and the C
programming language. Although applications have always been quite portable across different
versions of the UNIX operating system, the proliferation of versions and differences during the
1980s led many large users, such as the U.S. government, to call for standardization.
In this chapter we first look at the various standardization efforts that have been under way
over the past two decades. We then discuss the effects of these UNIX programming
standards on the operating system implementations that are described in this book. An
important part of all the standardization efforts is the specification of various limits that each
implementation must define, so we look at these limits and the various ways to determine their
values.
2.2. UNIX Standardization
2.2.1. ISO C
In late 1989, ANSI Standard X3.159-1989 for the C programming language was approved. This
standard has also been adopted as international standard ISO/IEC 9899:1990. ANSI is the
American National Standards Institute, the U.S. member in the International Organization for
Standardization (ISO). IEC stands for the International Electrotechnical Commission.
The C standard is now maintained and developed by the ISO/IEC international standardization
working group for the C programming language, known as ISO/IEC JTC1/SC22/WG14, or WG14
for short. The intent of the ISO C standard is to provide portability of conforming C programs
to a wide variety of operating systems, not only the UNIX System. This standard defines not
only the syntax and semantics of the programming language but also a standard library
[Chapter 7 of ISO 1999; Plauger 1992; Appendix B of Kernighan and Ritchie 1988]. This library
is important because all contemporary UNIX systems, such as the ones described in this book,
provide the library routines that are specified in the C standard.
In 1999, the ISO C standard was updated and approved as ISO/IEC 9899:1999, largely to
improve support for applications that perform numerical processing. The changes don't affect
the POSIX standards described in this book, except for the addition of the restrict keyword
to some of the function prototypes. This keyword is used to tell the compiler which pointer
references can be optimized, by indicating that the object to which the pointer refers is
accessed in the function only via that pointer.
As with most standards, there is a delay between the standard's approval and the modification
of software to conform to it. As each vendor's compilation systems evolve, they add more
support for the latest version of the ISO C standard.
A summary of the current level of conformance of gcc to the 1999 version of the ISO C
standard is available at http://www.gnu.org/software/gcc/c99status.html.
The ISO C library can be divided into 24 areas, based on the headers defined by the
standard. Figure 2.1 lists the headers defined by the C standard. The POSIX.1 standard
includes these headers, as well as others. We also list which of these headers are supported
by the four implementations (FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9) that
are described later in this chapter.
Figure 2.1. Headers defined by the ISO C standard
Header          FreeBSD  Linux   Mac OS X  Solaris  Description
                5.2.1    2.4.22  10.3      9
<assert.h>      •        •       •         •        verify program assertion
<complex.h>     •        •       •                  complex arithmetic support
<ctype.h>       •        •       •         •        character types
<errno.h>       •        •       •         •        error codes (Section 1.7)
<fenv.h>                 •       •                  floating-point environment
<float.h>       •        •       •         •        floating-point constants
<inttypes.h>    •        •       •         •        integer type format conversion
<iso646.h>      •        •       •         •        alternate relational operator macros
<limits.h>      •        •       •         •        implementation constants (Section 2.5)
<locale.h>      •        •       •         •        locale categories
<math.h>        •        •       •         •        mathematical constants
<setjmp.h>      •        •       •         •        nonlocal goto (Section 7.10)
<signal.h>      •        •       •         •        signals (Chapter 10)
<stdarg.h>      •        •       •         •        variable argument lists
<stdbool.h>     •        •       •         •        boolean type and values
<stddef.h>      •        •       •         •        standard definitions
<stdint.h>      •        •       •                  integer types
<stdio.h>       •        •       •         •        standard I/O library (Chapter 5)
<stdlib.h>      •        •       •         •        utility functions
<string.h>      •        •       •         •        string operations
<tgmath.h>               •                          type-generic math macros
<time.h>        •        •       •         •        time and date (Section 6.10)
<wchar.h>       •        •       •         •        extended multibyte and wide character support
<wctype.h>      •        •       •         •        wide character classification and mapping support
The ISO C headers depend on which version of the C compiler is used with the operating
system. When considering Figure 2.1, note that FreeBSD 5.2.1 ships with version 3.3.3 of gcc,
Solaris 9 ships with both version 2.95.3 and version 3.2 of gcc, Mandrake 9.2 (Linux 2.4.22)
ships with version 3.3.1 of gcc, and Mac OS X 10.3 ships with version 3.3 of gcc. Mac OS X
also includes older versions of gcc.
2.2.2. IEEE POSIX
POSIX is a family of standards developed by the IEEE (Institute of Electrical and Electronics
Engineers). POSIX stands for Portable Operating System Interface. It originally referred only to
the IEEE Standard 1003.1-1988 (the operating system interface) but was later extended to include
many of the standards and draft standards with the 1003 designation, including the shell and
utilities (1003.2).
Of specific interest to this book is the 1003.1 operating system interface standard, whose goal
is to promote the portability of applications among various UNIX System environments. This
standard defines the services that must be provided by an operating system if it is to be
"POSIX compliant," and has been adopted by most computer vendors. Although the 1003.1
standard is based on the UNIX operating system, the standard is not restricted to UNIX and
UNIX-like systems. Indeed, some vendors supplying proprietary operating systems claim that
these systems have been made POSIX compliant, while still leaving all their proprietary
features in place.
Because the 1003.1 standard specifies an interface and not an implementation, no distinction
is made between system calls and library functions. All the routines in the standard are called
functions.
Standards are continually evolving, and the 1003.1 standard is no exception. The 1988 version
of this standard, IEEE Standard 1003.1-1988, was modified and submitted to the International
Organization for Standardization. No new interfaces or features were added, but the text was
revised. The resulting document was published as IEEE Std 1003.1-1990 [IEEE 1990]. This is
also the international standard ISO/IEC 9945-1:1990. This standard is commonly referred to as
POSIX.1, which we'll use in this text.
The IEEE 1003.1 working group continued to make changes to the standard. In 1993, a
revised version of the IEEE 1003.1 standard was published. It included the 1003.1-1990 standard
and the 1003.1b-1993 real-time extensions standard. In 1996, the standard was again
updated as international standard ISO/IEC 9945-1:1996. It included interfaces for
multithreaded programming, called pthreads for POSIX threads. More real-time interfaces were
added in 1999 with the publication of IEEE Standard 1003.1d-1999. A year later, IEEE
Standard 1003.1j-2000 was published, including even more real-time interfaces, and IEEE
Standard 1003.1q-2000 was published, adding event-tracing extensions to the standard.
The 2001 version of 1003.1 departed from the prior versions in that it combined several 1003.1
amendments, the 1003.2 standard, and portions of the Single UNIX Specification (SUS),
Version 2 (more on this later). The resulting standard, IEEE Standard 1003.1-2001, includes
the following other standards:
• ISO/IEC 9945-1 (IEEE Standard 1003.1-1996), which includes
  o IEEE Standard 1003.1-1990
  o IEEE Standard 1003.1b-1993 (real-time extensions)
  o IEEE Standard 1003.1c-1995 (pthreads)
  o IEEE Standard 1003.1i-1995 (real-time technical corrigenda)
• IEEE P1003.1a draft standard (system interface revision)
• IEEE Standard 1003.1d-1999 (advanced real-time extensions)
• IEEE Standard 1003.1j-2000 (more advanced real-time extensions)
• IEEE Standard 1003.1q-2000 (tracing)
• IEEE Standard 1003.2d-1994 (batch extensions)
• IEEE P1003.2b draft standard (additional utilities)
• Parts of IEEE Standard 1003.1g-2000 (protocol-independent interfaces)
• ISO/IEC 9945-2 (IEEE Standard 1003.2-1993)
• The Base Specifications of the Single UNIX Specification, Version 2, which include
  o System Interface Definitions, Issue 5
  o Commands and Utilities, Issue 5
  o System Interfaces and Headers, Issue 5
• Open Group Technical Standard, Networking Services, Issue 5.2
• ISO/IEC 9899:1999, Programming Languages - C
Figure 2.2, Figure 2.3, and Figure 2.4 summarize the required and optional headers as specified
by POSIX.1. Because POSIX.1 includes the ISO C standard library functions, it also requires
the headers listed in Figure 2.1. All four figures summarize which headers are included in the
implementations discussed in this book.
Figure 2.2. Required headers defined by the POSIX standard
Header          FreeBSD  Linux   Mac OS X  Solaris  Description
                5.2.1    2.4.22  10.3      9
<dirent.h>      •        •       •         •        directory entries (Section 4.21)
<fcntl.h>       •        •       •         •        file control (Section 3.14)
<fnmatch.h>     •        •       •         •        filename-matching types
<glob.h>        •        •       •         •        pathname pattern-matching types
<grp.h>         •        •       •         •        group file (Section 6.4)
<netdb.h>       •        •       •         •        network database operations
<pwd.h>         •        •       •         •        password file (Section 6.2)
<regex.h>       •        •       •         •        regular expressions
<tar.h>         •        •       •         •        tar archive values
<termios.h>     •        •       •         •        terminal I/O (Chapter 18)
<unistd.h>      •        •       •         •        symbolic constants
<utime.h>       •        •       •         •        file times (Section 4.19)
<wordexp.h>     •        •                 •        word-expansion types
<arpa/inet.h>   •        •       •         •        Internet definitions (Chapter 16)
<net/if.h>      •        •       •         •        socket local interfaces (Chapter 16)
<netinet/in.h>  •        •       •         •        Internet address family (Section 16.3)
<netinet/tcp.h> •        •       •         •        Transmission Control Protocol definitions
<sys/mman.h>    •        •       •         •        memory management declarations
<sys/select.h>  •        •       •         •        select function (Section 14.5.1)
<sys/socket.h>  •        •       •         •        sockets interface (Chapter 16)
<sys/stat.h>    •        •       •         •        file status (Chapter 4)
<sys/times.h>   •        •       •         •        process times (Section 8.16)
<sys/types.h>   •        •       •         •        primitive system data types (Section 2.8)
<sys/un.h>      •        •       •         •        UNIX domain socket definitions (Section 17.3)
<sys/utsname.h> •        •       •         •        system name (Section 6.9)
<sys/wait.h>    •        •       •         •        process control (Section 8.6)
Figure 2.3. XSI extension headers defined by the POSIX standard
FreeBSD
5.2.1
Linux
2.4.22
<cpio.h>
•
•
<dlfcn.h>
•
•
<fmtmsg.h>
•
Header
Mac OS X
10.3
Solaris
9
Description
•
cpio archive values
•
dynamic linking
•
•
message display structures
<ftw.h>
•
•
file tree walking (Section
4.21)
<iconv.h>
•
•
•
codeset conversion utility
•
<langinfo.h>
•
•
•
•
language information
constants
<libgen.h>
•
•
•
•
definitions for
pattern-matching function
<monetary.h>
•
•
•
•
monetary types
<ndbm.h>
•
•
•
database operations
<nl_types.h>
•
•
•
•
message catalogs
<poll.h>
•
•
•
•
poll function (Section
14.5.2)
<search.h>
•
•
•
•
search tables
<strings.h>
•
•
•
•
string operations
<syslog.h>
•
•
•
•
system error logging (
Section 13.4)
<ucontext.h>
•
•
•
•
user context
<ulimit.h>
•
•
•
•
user limits
•
user accounting database
•
IPC (Section 15.6)
•
message queues (Section
15.7)
<utmpx.h>
•
<sys/ipc.h>
•
•
<sys/msg.h>
•
•
•
Figure 2.3. XSI extension headers defined by the POSIX standard
Header
FreeBSD
5.2.1
Linux
2.4.22
Mac OS X
10.3
Solaris
9
<sys/resource.h>
•
•
•
•
resource operations (
Section 7.11)
<sys/sem.h>
•
•
•
•
semaphores (Section 15.8)
<sys/shm.h>
•
•
•
•
shared memory (Section
15.9)
<sys/statvfs.h>
•
•
•
file system information
<sys/time.h>
•
•
•
•
time types
<sys/timeb.h>
•
•
•
•
additional date and time
definitions
<sys/uio.h>
•
•
•
•
vector I/O operations (
Section 14.7)
Description
Figure 2.4. Optional headers defined by the POSIX standard
FreeBSD
5.2.1
Linux
2.4.22
Mac OS X
10.3
Solaris
9
<aio.h>
•
•
•
•
asynchronous I/O
<mqueue.h>
•
•
message queues
<pthread.h>
•
•
•
•
threads (Chapters 11 and
12)
<sched.h>
•
•
•
•
execution scheduling
<semaphore.h>
•
•
•
•
semaphores
Header
<spawn.h>
•
<stropts.h>
•
<trace.h>
Description
real-time spawn interface
•
XSI STREAMS interface (
Section 14.4)
event tracing
In this text we describe the 2001 version of POSIX.1, which includes the functions specified in
the ISO C standard. Its interfaces are divided into required ones and optional ones. The
optional interfaces are further divided into 50 sections, based on functionality. The sections
containing nonobsolete programming interfaces are summarized in Figure 2.5 with their
respective option codes. Option codes are two- to three-character abbreviations that help
identify the interfaces that belong to each functional area. On manual pages, the option codes
mark the text describing interfaces that depend on the support of a particular option. Many of the
options deal with real-time extensions.
Figure 2.5. POSIX.1 optional interface groups and codes
Code  SUS       Symbolic constant                  Description
      mandatory
ADV             _POSIX_ADVISORY_INFO               advisory information (real-time)
AIO             _POSIX_ASYNCHRONOUS_IO             asynchronous input and output (real-time)
BAR             _POSIX_BARRIERS                    barriers (real-time)
CPT             _POSIX_CPUTIME                     process CPU time clocks (real-time)
CS              _POSIX_CLOCK_SELECTION             clock selection (real-time)
CX    •                                            extension to ISO C standard
FSC   •         _POSIX_FSYNC                       file synchronization
IP6             _POSIX_IPV6                        IPv6 interfaces
MF    •         _POSIX_MAPPED_FILES                memory-mapped files
ML              _POSIX_MEMLOCK                     process memory locking (real-time)
MLR             _POSIX_MEMLOCK_RANGE               memory range locking (real-time)
MON             _POSIX_MONOTONIC_CLOCK             monotonic clock (real-time)
MPR   •         _POSIX_MEMORY_PROTECTION           memory protection
MSG             _POSIX_MESSAGE_PASSING             message passing (real-time)
MX                                                 IEC 60559 floating-point option
PIO             _POSIX_PRIORITIZED_IO              prioritized input and output
PS              _POSIX_PRIORITY_SCHEDULING         process scheduling (real-time)
RS              _POSIX_RAW_SOCKETS                 raw sockets
RTS             _POSIX_REALTIME_SIGNALS            real-time signals extension
SEM             _POSIX_SEMAPHORES                  semaphores (real-time)
SHM             _POSIX_SHARED_MEMORY_OBJECTS       shared memory objects (real-time)
SIO             _POSIX_SYNCHRONIZED_IO             synchronized input and output (real-time)
SPI             _POSIX_SPIN_LOCKS                  spin locks (real-time)
SPN             _POSIX_SPAWN                       spawn (real-time)
SS              _POSIX_SPORADIC_SERVER             process sporadic server (real-time)
TCT             _POSIX_THREAD_CPUTIME              thread CPU time clocks (real-time)
TEF             _POSIX_TRACE_EVENT_FILTER          trace event filter
THR   •         _POSIX_THREADS                     threads
TMO             _POSIX_TIMEOUTS                    timeouts (real-time)
TMR             _POSIX_TIMERS                      timers (real-time)
TPI             _POSIX_THREAD_PRIO_INHERIT         thread priority inheritance (real-time)
TPP             _POSIX_THREAD_PRIO_PROTECT         thread priority protection (real-time)
TPS             _POSIX_THREAD_PRIORITY_SCHEDULING  thread execution scheduling (real-time)
TRC             _POSIX_TRACE                       trace
TRI             _POSIX_TRACE_INHERIT               trace inherit
TRL             _POSIX_TRACE_LOG                   trace log
TSA   •         _POSIX_THREAD_ATTR_STACKADDR       thread stack address attribute
TSF   •         _POSIX_THREAD_SAFE_FUNCTIONS       thread-safe functions
TSH   •         _POSIX_THREAD_PROCESS_SHARED       thread process-shared synchronization
TSP             _POSIX_THREAD_SPORADIC_SERVER      thread sporadic server (real-time)
TSS   •         _POSIX_THREAD_ATTR_STACKSIZE       thread stack size attribute
TYM             _POSIX_TYPED_MEMORY_OBJECTS        typed memory objects (real-time)
XSI   •         _XOPEN_UNIX                        X/Open extended interfaces
XSR             _XOPEN_STREAMS                     XSI STREAMS
POSIX.1 does not include the notion of a superuser. Instead, certain operations require
"appropriate privileges," although POSIX.1 leaves the definition of this term up to the
implementation. UNIX systems that conform to the Department of Defense security guidelines
have many levels of security. In this text, however, we use the traditional terminology and
refer to operations that require superuser privilege.
After almost twenty years of work, the standards are mature and stable. The POSIX.1
standard is maintained by an open working group known as the Austin Group (
http://www.opengroup.org/austin). To ensure that they are still relevant, the standards need
to be either updated or reaffirmed every so often.
2.2.3. The Single UNIX Specification
The Single UNIX Specification, a superset of the POSIX.1 standard, specifies additional
interfaces that extend the functionality provided by the basic POSIX.1 specification. The
complete set of system interfaces is called the X/Open System Interface (XSI). The
_XOPEN_UNIX symbolic constant identifies interfaces that are part of the XSI extensions to the
base POSIX.1 interfaces.
The XSI also defines which optional portions of POSIX.1 must be supported for an
implementation to be deemed XSI conforming. These include file synchronization,
memory-mapped files, memory protection, and thread interfaces, and are marked in Figure 2.5
as "SUS mandatory." Only XSI-conforming implementations can be called UNIX systems.
The Open Group owns the UNIX trademark and uses the Single UNIX Specification to define
the interfaces an implementation must support to call itself a UNIX system. Implementations
must file conformance statements, pass test suites that verify conformance, and license the
right to use the UNIX trademark.
Some of the additional interfaces defined in the XSI are required, whereas others are optional.
The interfaces are divided into option groups based on common functionality, as follows:
• Encryption: denoted by the _XOPEN_CRYPT symbolic constant
• Real-time: denoted by the _XOPEN_REALTIME symbolic constant
• Advanced real-time
• Real-time threads: denoted by the _XOPEN_REALTIME_THREADS symbolic constant
• Advanced real-time threads
• Tracing
• XSI STREAMS: denoted by the _XOPEN_STREAMS symbolic constant
• Legacy: denoted by the _XOPEN_LEGACY symbolic constant
The Single UNIX Specification (SUS) is a publication of The Open Group, which was formed in
1996 as a merger of X/Open and the Open Software Foundation (OSF), both industry
consortia. X/Open used to publish the X/Open Portability Guide, which adopted specific
standards and filled in the gaps where functionality was missing. The goal of these guides was
to improve application portability past what was possible by merely conforming to published
standards.
The first version of the Single UNIX Specification was published by X/Open in 1994. It was also
known as "Spec 1170," because it contained roughly 1,170 interfaces. It grew out of the
Common Open Software Environment (COSE) initiative, whose goal was to further improve
application portability across all implementations of the UNIX operating system. The COSE
group (Sun, IBM, HP, Novell/USL, and OSF) went further than endorsing standards. In addition,
they investigated interfaces used by common commercial applications. The resulting 1,170
interfaces were selected from these applications, and also included the X/Open Common
Application Environment (CAE), Issue 4 (known as "XPG4" as a historical reference to its
predecessor, the X/Open Portability Guide), the System V Interface Definition (SVID), Edition
3, Level 1 interfaces, and the OSF Application Environment Specification (AES) Full Use
interfaces.
The second version of the Single UNIX Specification was published by The Open Group in
1997. The new version added support for threads, real-time interfaces, 64-bit processing,
large files, and enhanced multibyte character processing.
The third version of the Single UNIX Specification (SUSv3, for short) was published by The
Open Group in 2001. The Base Specifications of SUSv3 are the same as the IEEE Standard
1003.1-2001 and are divided into four sections: Base Definitions, System Interfaces, Shell and
Utilities, and Rationale. SUSv3 also includes X/Open Curses Issue 4, Version 2, but this
specification is not part of POSIX.1.
In 2002, ISO approved this version as International Standard ISO/IEC 9945:2002. The Open
Group updated the 1003.1 standard again in 2003 to include technical corrections, and ISO
approved this as International Standard ISO/IEC 9945:2003. In April 2004, The Open Group
published the Single UNIX Specification, Version 3, 2004 Edition. It included more technical
corrections edited in with the main text of the standard.
2.2.4. FIPS
FIPS stands for Federal Information Processing Standard. It was published by the U.S.
government, which used it for the procurement of computer systems. FIPS 151-1 (April 1989)
was based on the IEEE Std. 1003.1-1988 and a draft of the ANSI C standard. This was followed
by FIPS 151-2 (May 1993), which was based on the IEEE Standard 1003.1-1990. FIPS 151-2
required some features that POSIX.1 listed as optional. All these options have been included
as mandatory in POSIX.1-2001.
The effect of the POSIX.1 FIPS was to require any vendor that wished to sell
POSIX.1-compliant computer systems to the U.S. government to support some of the optional
features of POSIX.1. The POSIX.1 FIPS has since been withdrawn, so we won't consider it
further in this text.
2.3. UNIX System Implementations
The previous section described ISO C, IEEE POSIX, and the Single UNIX Specification: three
standards created by independent organizations. Standards, however, are interface
specifications. How do these standards relate to the real world? These standards are taken by
vendors and turned into actual implementations. In this book, we are interested in both these
standards and their implementation.
Section 1.1 of McKusick et al. [1996] gives a detailed history (and a nice picture) of the UNIX
System family tree. Everything starts from the Sixth Edition (1976) and Seventh Edition (1979)
of the UNIX Time-Sharing System on the PDP-11 (usually called Version 6 and Version 7).
These were the first releases widely distributed outside of Bell Laboratories. Three branches of
the tree evolved.
1. One at AT&T that led to System III and System V, the so-called commercial versions
   of the UNIX System.
2. One at the University of California at Berkeley that led to the 4.xBSD implementations.
3. The research version of the UNIX System, developed at the Computing Science
   Research Center of AT&T Bell Laboratories, that led to the UNIX Time-Sharing System
   8th Edition, 9th Edition, and ended with the 10th Edition in 1990.
2.3.1. UNIX System V Release 4
UNIX System V Release 4 (SVR4) was a product of AT&T's UNIX System Laboratories (USL,
formerly AT&T's UNIX Software Operation). SVR4 merged functionality from AT&T UNIX System
V Release 3.2 (SVR3.2), the SunOS operating system from Sun Microsystems, the 4.3BSD
release from the University of California, and the Xenix system from Microsoft into one
coherent operating system. (Xenix was originally developed from Version 7, with many features
later taken from System V.) The SVR4 source code was released in late 1989, with the first
end-user copies becoming available during 1990. SVR4 conformed to both the POSIX 1003.1
standard and the X/Open Portability Guide, Issue 3 (XPG3).
AT&T also published the System V Interface Definition (SVID) [AT&T 1989]. Issue 3 of the
SVID specified the functionality that an operating system must offer to qualify as a conforming
implementation of UNIX System V Release 4. As with POSIX.1, the SVID specified an interface,
not an implementation. No distinction was made in the SVID between system calls and library
functions. The reference manual for an actual implementation of SVR4 must be consulted to
see this distinction [AT&T 1990e].
2.3.2. 4.4BSD
The Berkeley Software Distribution (BSD) releases were produced and distributed by the
Computer Systems Research Group (CSRG) at the University of California at Berkeley; 4.2BSD
was released in 1983 and 4.3BSD in 1986. Both of these releases ran on the VAX
minicomputer. The next release, 4.3BSD Tahoe in 1988, also ran on a particular minicomputer
called the Tahoe. (The book by Leffler et al. [1989] describes the 4.3BSD Tahoe release.) This
was followed in 1990 with the 4.3BSD Reno release; 4.3BSD Reno supported many of the
POSIX.1 features.
The original BSD systems contained proprietary AT&T source code and were covered by AT&T
licenses. To obtain the source code to the BSD system you had to have a UNIX source license
from AT&T. This changed as more and more of the AT&T source code was replaced over the
years with non-AT&T source code and as many of the new features added to the Berkeley
system were derived from non-AT&T sources.
In 1989, Berkeley identified much of the non-AT&T source code in the 4.3BSD Tahoe release
and made it publicly available as the BSD Networking Software, Release 1.0. This was followed
in 1991 with Release 2.0 of the BSD Networking Software, which was derived from the 4.3BSD
Reno release. The intent was that most, if not all, of the 4.4BSD system would be free of any
AT&T license restrictions, thus making the source code available to all.
4.4BSD-Lite was intended to be the final release from the CSRG. Its introduction was delayed,
however, because of legal battles with USL. Once the legal differences were resolved,
4.4BSD-Lite was released in 1994, fully unencumbered, so no UNIX source license was needed
to receive it. The CSRG followed this with a bug-fix release in 1995. This release, 4.4BSD-Lite,
release 2, was the final version of BSD from the CSRG. (This version of BSD is described in the
book by McKusick et al. [1996].)
The UNIX system development done at Berkeley started with PDP-11s, then moved to the VAX
minicomputer, and then to other so-called workstations. During the early 1990s, support was
provided to Berkeley for the popular 80386-based personal computers, leading to what is
called 386BSD. This was done by Bill Jolitz and was documented in a series of monthly articles
in Dr. Dobb's Journal throughout 1991. Much of this code appears in the BSD Networking
Software, Release 2.0.
2.3.3. FreeBSD
FreeBSD is based on the 4.4BSD-Lite operating system. The FreeBSD project was formed to
carry on the BSD line after the Computer Systems Research Group at the University of
California at Berkeley decided to end its work on the BSD versions of the UNIX operating
system, and the 386BSD project seemed to be neglected for too long.
All software produced by the FreeBSD project is freely available in both binary and source
forms. The FreeBSD 5.2.1 operating system was one of the four used to test the examples in
this book.
Several other BSD-based free operating systems are available. The NetBSD project (
http://www.netbsd.org) is similar to the FreeBSD project, with an emphasis on portability
between hardware platforms. The OpenBSD project (http://www.openbsd.org) is similar to
FreeBSD but with an emphasis on security.
2.3.4. Linux
Linux is an operating system that provides a rich UNIX programming environment, and is freely
available under the GNU General Public License. The popularity of Linux is somewhat of a phenomenon
in the computer industry. Linux is distinguished by often being the first operating system to
support new hardware.
Linux was created in 1991 by Linus Torvalds as a replacement for MINIX. A grass-roots effort
then sprang up, whereby many developers across the world volunteered their time to use and
enhance it.
The Mandrake 9.2 distribution of Linux was one of the operating systems used to test the
examples in this book. That distribution uses the 2.4.22 version of the Linux operating system
kernel.
2.3.5. Mac OS X
Mac OS X is based on entirely different technology than prior versions. The core operating
system is called "Darwin," and is based on a combination of the Mach kernel (Accetta et al. [1986])
and the FreeBSD operating system. Darwin is managed as an open source project,
similar to FreeBSD and Linux.
Mac OS X version 10.3 (Darwin 7.4.0) was used as one of the operating systems to test the
examples in this book.
2.3.6. Solaris
Solaris is the version of the UNIX System developed by Sun Microsystems. It is based on
System V Release 4, with more than ten years of enhancements from the engineers at Sun
Microsystems. It is the only commercially successful SVR4 descendant, and is formally certified
to be a UNIX system. (For more information on UNIX certification, see
http://www.opengroup.org/certification/idx/unix.html.)
The Solaris 9 UNIX system was one of the operating systems used to test the examples in this
book.
2.3.7. Other UNIX Systems
Other versions of the UNIX system that have been certified in the past include
• AIX, IBM's version of the UNIX System
• HP-UX, Hewlett-Packard's version of the UNIX System
• IRIX, the UNIX System version shipped by Silicon Graphics
• UnixWare, the UNIX System descended from SVR4 and currently sold by SCO
2.4. Relationship of Standards and Implementations
The standards that we've mentioned define a subset of any actual system. The focus of this
book is on four real systems: FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9.
Although only Solaris can call itself a UNIX system, all four provide a UNIX programming
environment. Because all four are POSIX compliant to varying degrees, we will also
concentrate on the features that are required by the POSIX.1 standard, noting any
differences between POSIX and the actual implementations of these four systems. Those
features and routines that are specific to only a particular implementation are clearly marked.
As SUSv3 is a superset of POSIX.1, we'll also note any features that are part of SUSv3 but
not part of POSIX.1.
Be aware that the implementations provide backward compatibility for features in earlier
releases, such as SVR3.2 and 4.3BSD. For example, Solaris supports both the POSIX.1
specification for nonblocking I/O (O_NONBLOCK) and the traditional System V method (O_NDELAY).
In this text, we'll use only the POSIX.1 feature, although we'll mention the nonstandard
feature that it replaces. Similarly, both SVR3.2 and 4.3BSD provided reliable signals in a way
that differs from the POSIX.1 standard. In Chapter 10 we describe only the POSIX.1 signal
mechanism.
2.5. Limits
The implementations define many magic numbers and constants. Many of these have been
hard coded into programs or were determined using ad hoc techniques. With the various
standardization efforts that we've described, more portable methods are now provided to
determine these magic numbers and implementation-defined limits, greatly aiding the
portability of our software.
Two types of limits are needed:
1. Compile-time limits (e.g., what's the largest value of a short integer?)
2. Runtime limits (e.g., how many characters in a filename?)
Compile-time limits can be defined in headers that any program can include at compile time.
But runtime limits require the process to call a function to obtain the value of the limit.
Additionally, some limits can be fixed on a given implementation (and could therefore be defined
statically in a header) yet vary on another implementation (and would therefore require a runtime
function call). An example of this type of limit is the maximum number of characters in a filename.
Before SVR4, System V historically allowed only 14 characters in a filename, whereas BSD-derived
systems increased this number to 255. Most UNIX System implementations these days support
multiple file system types, and each type has its own limit. This is an example of a runtime limit
that depends on where in the file system the file in question is located. A filename in the root
file system, for example, could have a 14-character limit, whereas a filename in another file
system could have a 255-character limit.
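The filename limit depends on the file system, so a program has to ask about a particular directory. The following sketch wraps pathconf with _PC_NAME_MAX (see Section 2.5.4). The function name name_max_in and its -1/-2 return convention are our own. We chose that convention so that an indeterminate limit stays distinct from a failed call.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Ask for the filename-length limit (NAME_MAX) of the file system that
 * holds directory dir. Returns the limit, -1 if the limit is
 * indeterminate, or -2 if pathconf failed (errno describes the error).
 */
long
name_max_in(const char *dir)
{
    long val;

    errno = 0;
    if ((val = pathconf(dir, _PC_NAME_MAX)) < 0)
        return (errno == 0) ? -1 : -2;
    return val;
}
```

Two directories on different file system types can return different values from name_max_in.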
To solve these problems, three types of limits are provided:
1. Compile-time limits (headers)
2. Runtime limits that are not associated with a file or directory (the sysconf function)
3. Runtime limits that are associated with a file or a directory (the pathconf and fpathconf functions)
To further confuse things, if a particular runtime limit does not vary on a given system, it can
be defined statically in a header. If it is not defined in a header, however, the application must
call one of the three conf functions (which we describe shortly) to determine its value at
runtime.
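The rule in the previous paragraph is: use the header value if one exists, otherwise ask at runtime. A sketch of that rule for the page size follows. It assumes only that an implementation may or may not define PAGESIZE in <limits.h>. The function name page_size is hypothetical.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Return the system page size. If the implementation defines PAGESIZE
 * statically in <limits.h>, use that compile-time value; otherwise ask
 * sysconf at runtime (which returns -1 on error or if indeterminate).
 */
long
page_size(void)
{
#ifdef PAGESIZE
    return PAGESIZE;
#else
    return sysconf(_SC_PAGESIZE);
#endif
}
```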
2.5.1. ISO C Limits
All the limits defined by ISO C are compile-time limits. Figure 2.6 shows the limits from the C
standard that are defined in the file <limits.h>. These constants are always defined in the
header and don't change in a given system. The third column shows the minimum acceptable
values from the ISO C standard. This allows for a system with 16-bit integers using
one's-complement arithmetic. The fourth column shows the values from a Linux system with
32-bit integers using two's-complement arithmetic. Note that none of the unsigned data types
has a minimum value, as this value must be 0 for an unsigned data type. On a 64-bit system,
the values for long integer maximums match the maximum values for long long integers.
Figure 2.6. Sizes of integral values from <limits.h>
Name         Description                       Minimum acceptable value     Typical value
CHAR_BIT     bits in a char                    8                            8
CHAR_MAX     max value of char                 (see later)                  127
CHAR_MIN     min value of char                 (see later)                  -128
SCHAR_MAX    max value of signed char          127                          127
SCHAR_MIN    min value of signed char          -127                         -128
UCHAR_MAX    max value of unsigned char        255                          255
INT_MAX      max value of int                  32,767                       2,147,483,647
INT_MIN      min value of int                  -32,767                      -2,147,483,648
UINT_MAX     max value of unsigned int         65,535                       4,294,967,295
SHRT_MIN     min value of short                -32,767                      -32,768
SHRT_MAX     max value of short                32,767                       32,767
USHRT_MAX    max value of unsigned short       65,535                       65,535
LONG_MAX     max value of long                 2,147,483,647                2,147,483,647
LONG_MIN     min value of long                 -2,147,483,647               -2,147,483,648
ULONG_MAX    max value of unsigned long        4,294,967,295                4,294,967,295
LLONG_MAX    max value of long long            9,223,372,036,854,775,807    9,223,372,036,854,775,807
LLONG_MIN    min value of long long            -9,223,372,036,854,775,807   -9,223,372,036,854,775,808
ULLONG_MAX   max value of unsigned long long   18,446,744,073,709,551,615   18,446,744,073,709,551,615
MB_LEN_MAX   max number of bytes in a          1                            16
             multibyte character constant
16
One difference that we will encounter is whether a system provides signed or unsigned
character values. From the fourth column in Figure 2.6, we see that this particular system
uses signed characters. We see that CHAR_MIN equals SCHAR_MIN and that CHAR_MAX equals
SCHAR_MAX. If the system uses unsigned characters, we would have CHAR_MIN equal to 0 and
CHAR_MAX equal to UCHAR_MAX.
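CHAR_MIN and CHAR_MAX are compile-time constants, so a program can use the preprocessor to test whether plain char is signed. Below is a minimal sketch. The names char_is_signed and print_char_limits are hypothetical helpers.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Return 1 if plain char is signed on this implementation, 0 otherwise.
 * Decided at compile time from the <limits.h> constants.
 */
int
char_is_signed(void)
{
#if CHAR_MIN < 0
    return 1;   /* signed: CHAR_MIN == SCHAR_MIN, CHAR_MAX == SCHAR_MAX */
#else
    return 0;   /* unsigned: CHAR_MIN == 0, CHAR_MAX == UCHAR_MAX */
#endif
}

/* Print the char-related limits along with the signedness verdict. */
void
print_char_limits(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("CHAR_MIN = %d, CHAR_MAX = %d (%s)\n", CHAR_MIN, CHAR_MAX,
           char_is_signed() ? "signed" : "unsigned");
}
```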
The floating-point data types in the header <float.h> have a similar set of definitions. Anyone
doing serious floating-point work should examine this file.
Another ISO C constant that we'll encounter is FOPEN_MAX, the minimum number of standard I/O
streams that the implementation guarantees can be open at once. This value is in the
<stdio.h> header, and its minimum value is 8. The POSIX.1 value STREAM_MAX, if defined, must
have the same value as FOPEN_MAX.
ISO C also defines the constant TMP_MAX in <stdio.h>. It is the maximum number of unique
filenames generated by the tmpnam function. We'll have more to say about this constant in
Section 5.13.
In Figure 2.7, we show the values of FOPEN_MAX and TMP_MAX on the four platforms we discuss
in this book.
Figure 2.7. ISO limits on various platforms
Limit        FreeBSD 5.2.1   Linux 2.4.22    Mac OS X 10.3   Solaris 9
FOPEN_MAX    20              16              20              20
TMP_MAX      308,915,776     238,328         308,915,776     17,576
ISO C also defines the constant FILENAME_MAX, but we avoid using it, because some operating
system implementations historically have defined it to be too small to be of use.
2.5.2. POSIX Limits
POSIX.1 defines numerous constants that deal with implementation limits of the operating
system. Unfortunately, this is one of the more confusing aspects of POSIX.1. Although
POSIX.1 defines numerous limits and constants, we'll only concern ourselves with the ones
that affect the base POSIX.1 interfaces. These limits and constants are divided into the
following five categories:
1. Invariant minimum values: the 19 constants in Figure 2.8
2. Invariant value: SSIZE_MAX
3. Runtime increasable values: CHARCLASS_NAME_MAX, COLL_WEIGHTS_MAX, LINE_MAX,
NGROUPS_MAX, and RE_DUP_MAX
4. Runtime invariant values, possibly indeterminate: ARG_MAX, CHILD_MAX, HOST_NAME_MAX,
LOGIN_NAME_MAX, OPEN_MAX, PAGESIZE, RE_DUP_MAX, STREAM_MAX, SYMLOOP_MAX, TTY_NAME_MAX,
and TZNAME_MAX
5. Pathname variable values, possibly indeterminate: FILESIZEBITS, LINK_MAX, MAX_CANON,
MAX_INPUT, NAME_MAX, PATH_MAX, PIPE_BUF, and SYMLINK_MAX
Figure 2.8. POSIX.1 invariant minimum values from <limits.h>
Name                   Description: minimum acceptable value for                 Value
_POSIX_ARG_MAX         length of arguments to exec functions                     4,096
_POSIX_CHILD_MAX       number of child processes per real user ID                25
_POSIX_HOST_NAME_MAX   maximum length of a host name as returned by              255
                       gethostname
_POSIX_LINK_MAX        number of links to a file                                 8
_POSIX_LOGIN_NAME_MAX  maximum length of a login name                            9
_POSIX_MAX_CANON       number of bytes on a terminal's canonical input queue     255
_POSIX_MAX_INPUT       space available on a terminal's input queue               255
_POSIX_NAME_MAX        number of bytes in a filename, not including the          14
                       terminating null
_POSIX_NGROUPS_MAX     number of simultaneous supplementary group IDs per        8
                       process
_POSIX_OPEN_MAX        number of open files per process                          20
_POSIX_PATH_MAX        number of bytes in a pathname, including the              256
                       terminating null
_POSIX_PIPE_BUF        number of bytes that can be written atomically to a pipe  512
_POSIX_RE_DUP_MAX      number of repeated occurrences of a basic regular         255
                       expression permitted by the regexec and regcomp
                       functions when using the interval notation \{m,n\}
_POSIX_SSIZE_MAX       value that can be stored in an ssize_t object             32,767
_POSIX_STREAM_MAX      number of standard I/O streams a process can have open    8
                       at once
_POSIX_SYMLINK_MAX     number of bytes in a symbolic link                        255
_POSIX_SYMLOOP_MAX     number of symbolic links that can be traversed during     8
                       pathname resolution
_POSIX_TTY_NAME_MAX    length of a terminal device name, including the           9
                       terminating null
_POSIX_TZNAME_MAX      number of bytes for the name of a time zone               6
Of these 44 limits and constants, some may be defined in <limits.h>, and others may or may
not be defined, depending on certain conditions. We describe the limits and constants that
may or may not be defined in Section 2.5.4, when we describe the sysconf, pathconf, and
fpathconf functions. The 19 invariant minimum values are shown in Figure 2.8.
These values are invariant; they do not change from one system to another. They specify the
most restrictive values for these features. A conforming POSIX.1 implementation must provide
values that are at least this large. This is why they are called minimums, although their names
all contain MAX. Also, to ensure portability, a strictly-conforming application must not require a
larger value. We describe what each of these constants refers to as we proceed through the
text.
A strictly-conforming POSIX application is different from an application that is merely POSIX
conforming. A POSIX-conforming application uses only interfaces defined in IEEE Standard
1003.1-2001. A strictly-conforming application is a POSIX-conforming application that does
not rely on any undefined behavior, does not use any obsolescent interfaces, and does not
require values of constants larger than the minimums shown in Figure 2.8.
Unfortunately, some of these invariant minimum values are too small to be of practical use. For
example, most UNIX systems today provide far more than 20 open files per process. Also, the
minimum limit of 256 for _POSIX_PATH_MAX is too small. Pathnames can exceed this limit. This
means that we can't use the two constants _POSIX_OPEN_MAX and _POSIX_PATH_MAX as array
sizes at compile time.
Each of the 19 invariant minimum values in Figure 2.8 has an associated implementation value
whose name is formed by removing the _POSIX_ prefix from the name in Figure 2.8. The names
without the leading _POSIX_ were intended to be the actual values that a given implementation
supports. (These 19 implementation values are items 2 through 5 from our list earlier in this section: the
invariant value, the runtime increasable value, the runtime invariant values, and the pathname
variable values.) The problem is that not all of the 19 implementation values are guaranteed to
be defined in the <limits.h> header.
For example, a particular value may not be included in the header if its actual value for a given
process depends on the amount of memory on the system. If the values are not defined in the
header, we can't use them as array bounds at compile time. So, POSIX.1 decided to provide
three runtime functions for us to call (sysconf, pathconf, and fpathconf) to determine the actual
implementation value at runtime. There is still a problem, however, because some of the
values are defined by POSIX.1 as being possibly "indeterminate" (logically infinite). This means
that the value has no practical upper bound. On Linux, for example, the number of iovec
structures you can use with readv or writev is limited only by the amount of memory on the
system. Thus, IOV_MAX is considered indeterminate on Linux. We'll return to this problem of
indeterminate runtime limits in Section 2.5.5.
2.5.3. XSI Limits
The XSI also defines constants that deal with implementation limits. They include:
1. Invariant minimum values: the ten constants in Figure 2.9
2. Numerical limits: LONG_BIT and WORD_BIT
3. Runtime invariant values, possibly indeterminate: ATEXIT_MAX, IOV_MAX, and PAGE_SIZE
Figure 2.9. XSI invariant minimum values from <limits.h>
Name              Description                                   Minimum acceptable value   Typical value
NL_ARGMAX         maximum value of digit in calls to printf     9                          9
                  and scanf
NL_LANGMAX        maximum number of bytes in LANG environment   14                         14
                  variable
NL_MSGMAX         maximum message number                        32,767                     32,767
NL_NMAX           maximum number of bytes in N-to-1 mapping     (none specified)           1
                  characters
NL_SETMAX         maximum set number                            255                        255
NL_TEXTMAX        maximum number of bytes in a message string   _POSIX2_LINE_MAX           2,048
NZERO             default process priority                      20                         20
_XOPEN_IOV_MAX    maximum number of iovec structures that can   16                         16
                  be used with readv or writev
_XOPEN_NAME_MAX   number of bytes in a filename                 255                        255
_XOPEN_PATH_MAX   number of bytes in a pathname                 1,024                      1,024
The invariant minimum values are listed in Figure 2.9. Many of these values deal with message
catalogs. The last two illustrate the situation in which the POSIX.1 minimums were too
small (presumably to allow for embedded POSIX.1 implementations), so the Single UNIX
Specification added symbols with larger minimum values for XSI-conforming systems.
2.5.4. sysconf, pathconf, and fpathconf Functions
We've listed various minimum values that an implementation must support, but how do we find
out the limits that a particular system actually supports? As we mentioned earlier, some of
these limits might be available at compile time; others must be determined at runtime. We've
also mentioned that some don't change in a given system, whereas others can change
because they are associated with a file or directory. The runtime limits are obtained by calling
one of the following three functions.
#include <unistd.h>
long sysconf(int name);
long pathconf(const char *pathname, int name);
long fpathconf(int filedes, int name);
All three return: corresponding value if OK, -1 on error (see later)
The difference between the last two functions is that one takes a pathname as its argument
and the other takes a file descriptor argument.
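The following sketch shows the descriptor-based variant. It opens a directory and asks fpathconf for its NAME_MAX. For the same directory, the answer should match what pathconf reports for the pathname. The helper name name_max_via_fd is our own. Error handling is reduced to returning -1, so this sketch cannot tell an error from an indeterminate value.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Query NAME_MAX for directory dir through a file descriptor, using
 * fpathconf instead of pathconf. Returns the limit, or -1 on error or
 * if the value is indeterminate.
 */
long
name_max_via_fd(const char *dir)
{
    int fd;
    long val;

    if ((fd = open(dir, O_RDONLY)) < 0)
        return -1;
    val = fpathconf(fd, _PC_NAME_MAX);
    close(fd);
    return val;
}
```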
Figure 2.10 lists the name arguments that sysconf uses to identify system limits. Constants
beginning with _SC_ are used as arguments to sysconf to identify the runtime limit. Figure 2.11
lists the name arguments that are used by pathconf and fpathconf to identify system limits.
Constants beginning with _PC_ are used as arguments to pathconf and fpathconf to identify
the runtime limit.
Figure 2.10. Limits and name arguments to sysconf
Name of limit        Description                                          name argument
ARG_MAX              maximum length, in bytes, of arguments to the        _SC_ARG_MAX
                     exec functions
ATEXIT_MAX           maximum number of functions that can be registered   _SC_ATEXIT_MAX
                     with the atexit function
CHILD_MAX            maximum number of processes per real user ID         _SC_CHILD_MAX
clock ticks/second   number of clock ticks per second                     _SC_CLK_TCK
COLL_WEIGHTS_MAX     maximum number of weights that can be assigned to    _SC_COLL_WEIGHTS_MAX
                     an entry of the LC_COLLATE order keyword in the
                     locale definition file
HOST_NAME_MAX        maximum length of a host name as returned by         _SC_HOST_NAME_MAX
                     gethostname
IOV_MAX              maximum number of iovec structures that can be       _SC_IOV_MAX
                     used with readv or writev
LINE_MAX             maximum length of a utility's input line             _SC_LINE_MAX
LOGIN_NAME_MAX       maximum length of a login name                       _SC_LOGIN_NAME_MAX
NGROUPS_MAX          maximum number of simultaneous supplementary         _SC_NGROUPS_MAX
                     group IDs per process
OPEN_MAX             maximum number of open files per process             _SC_OPEN_MAX
PAGESIZE             system memory page size, in bytes                    _SC_PAGESIZE
PAGE_SIZE            system memory page size, in bytes                    _SC_PAGE_SIZE
RE_DUP_MAX           number of repeated occurrences of a basic regular    _SC_RE_DUP_MAX
                     expression permitted by the regexec and regcomp
                     functions when using the interval notation \{m,n\}
STREAM_MAX           maximum number of standard I/O streams per process   _SC_STREAM_MAX
                     at any given time; if defined, it must have the
                     same value as FOPEN_MAX
SYMLOOP_MAX          number of symbolic links that can be traversed       _SC_SYMLOOP_MAX
                     during pathname resolution
TTY_NAME_MAX         length of a terminal device name, including the      _SC_TTY_NAME_MAX
                     terminating null
TZNAME_MAX           maximum number of bytes for the name of a time zone  _SC_TZNAME_MAX
Figure 2.11. Limits and name arguments to pathconf and fpathconf
Name of limit   Description                                                  name argument
FILESIZEBITS    minimum number of bits needed to represent, as a signed      _PC_FILESIZEBITS
                integer value, the maximum size of a regular file allowed
                in the specified directory
LINK_MAX        maximum value of a file's link count                         _PC_LINK_MAX
MAX_CANON       maximum number of bytes on a terminal's canonical input      _PC_MAX_CANON
                queue
MAX_INPUT       number of bytes for which space is available on a            _PC_MAX_INPUT
                terminal's input queue
NAME_MAX        maximum number of bytes in a filename (does not include a    _PC_NAME_MAX
                null at end)
PATH_MAX        maximum number of bytes in a relative pathname, including    _PC_PATH_MAX
                the terminating null
PIPE_BUF        maximum number of bytes that can be written atomically to    _PC_PIPE_BUF
                a pipe
SYMLINK_MAX     number of bytes in a symbolic link                           _PC_SYMLINK_MAX
We need to look in more detail at the different return values from these three functions.
1. All three functions return -1 and set errno to EINVAL if the name isn't one of the
appropriate constants. The third column in Figures 2.10 and 2.11 lists the limit
constants we'll deal with throughout the rest of this book.
2. Some names can return either the value of the variable (a return value >= 0) or an
indication that the value is indeterminate. An indeterminate value is indicated by
returning -1 and not changing the value of errno.
3. The value returned for _SC_CLK_TCK is the number of clock ticks per second, for use
with the return values from the times function (Section 8.16).
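Here is a minimal sketch of the conversion in item 3. To get seconds, divide a tick count returned by times by the value of sysconf(_SC_CLK_TCK). The function name user_cpu_seconds is hypothetical. It looks only at the user CPU time (tms_utime) of the calling process.

```c
#include <sys/times.h>
#include <unistd.h>

/*
 * Convert the user CPU time reported by times() from clock ticks to
 * seconds, using the tick rate from sysconf(_SC_CLK_TCK).
 * Returns -1.0 on error.
 */
double
user_cpu_seconds(void)
{
    struct tms t;
    long ticks_per_sec;

    if ((ticks_per_sec = sysconf(_SC_CLK_TCK)) <= 0)
        return -1.0;
    if (times(&t) == (clock_t)-1)
        return -1.0;
    return (double)t.tms_utime / (double)ticks_per_sec;
}
```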
There are some restrictions for the pathname argument to pathconf and the filedes argument
to fpathconf. If any of these restrictions isn't met, the results are undefined.
1. The referenced file for _PC_MAX_CANON and _PC_MAX_INPUT must be a terminal file.
2. The referenced file for _PC_LINK_MAX can be either a file or a directory. If the
referenced file is a directory, the return value applies to the directory itself, not to the
filename entries within the directory.
3. The referenced file for _PC_FILESIZEBITS and _PC_NAME_MAX must be a directory. The
return value applies to filenames within the directory.
4. The referenced file for _PC_PATH_MAX must be a directory. The value returned is the
maximum length of a relative pathname when the specified directory is the working
directory. (Unfortunately, this isn't the real maximum length of an absolute pathname,
which is what we want to know. We'll return to this problem in Section 2.5.5.)
5. The referenced file for _PC_PIPE_BUF must be a pipe, FIFO, or directory. In the first two
cases (pipe or FIFO) the return value is the limit for the referenced pipe or FIFO. For
the other case (a directory) the return value is the limit for any FIFO created in that
directory.
6. The referenced file for _PC_SYMLINK_MAX must be a directory. The value returned is the
maximum length of the string that a symbolic link in that directory can contain.
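Restriction 5 allows a pipe itself as the argument to fpathconf. The following sketch creates a pipe, asks for _PC_PIPE_BUF on its read end, and closes both ends. The helper name pipe_buf_for_new_pipe is hypothetical. The result should be at least the POSIX.1 minimum of 512 from Figure 2.8.

```c
#include <unistd.h>

/*
 * Return PIPE_BUF for a freshly created pipe, or -1 on error or if the
 * value is indeterminate. Both ends of the pipe are closed before return.
 */
long
pipe_buf_for_new_pipe(void)
{
    int fds[2];
    long val;

    if (pipe(fds) < 0)
        return -1;
    val = fpathconf(fds[0], _PC_PIPE_BUF);
    close(fds[0]);
    close(fds[1]);
    return val;
}
```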
Example
The awk(1) program shown in Figure 2.12 builds a C program that prints the value of each
pathconf and sysconf symbol.
The awk program reads two input files (pathconf.sym and sysconf.sym) that contain lists of the
limit name and symbol, separated by tabs. Not all symbols are defined on every platform, so
the awk program surrounds each call to pathconf and sysconf with the necessary #ifdef
statements.
For example, the awk program transforms a line in the input file that looks like
NAME_MAX	_PC_NAME_MAX
into the following C code:
#ifdef NAME_MAX
    printf("NAME_MAX defined to be %d\n", NAME_MAX+0);
#else
    printf("no symbol for NAME_MAX\n");
#endif
#ifdef _PC_NAME_MAX
    pr_pathconf("NAME_MAX =", argv[1], _PC_NAME_MAX);
#else
    printf("no symbol for _PC_NAME_MAX\n");
#endif
The program in Figure 2.13, generated by the awk program, prints all these limits, handling the
case in which a limit is not defined.
Figure 2.14 summarizes results from Figure 2.13 for the four systems we discuss in this book.
The entry "no symbol" means that the system doesn't provide a corresponding _SC or _PC
symbol to query the value of the constant. Thus, the limit is undefined in this case. In
contrast, the entry "unsupported" means that the symbol is defined by the system but
unrecognized by the sysconf or pathconf functions. The entry "no limit" means that the system
defines no limit for the constant, but this doesn't mean that the limit is infinite.
We'll see in Section 4.14 that UFS is the SVR4 implementation of the Berkeley fast file system.
PCFS is the MS-DOS FAT file system implementation for Solaris.
Figure 2.12. Build C program to print all supported configuration limits
BEGIN {
    printf("#include \"apue.h\"\n")
    printf("#include <errno.h>\n")
    printf("#include <limits.h>\n")
    printf("\n")
    printf("static void pr_sysconf(char *, int);\n")
    printf("static void pr_pathconf(char *, char *, int);\n")
    printf("\n")
    printf("int\n")
    printf("main(int argc, char *argv[])\n")
    printf("{\n")
    printf("\tif (argc != 2)\n")
    printf("\t\terr_quit(\"usage: a.out <dirname>\");\n\n")
    FS="\t+"
    while ((getline <"sysconf.sym") > 0) {
        printf("#ifdef %s\n", $1)
        printf("\tprintf(\"%s defined to be %%d\\n\", %s+0);\n", $1, $1)
        printf("#else\n")
        printf("\tprintf(\"no symbol for %s\\n\");\n", $1)
        printf("#endif\n")
        printf("#ifdef %s\n", $2)
        printf("\tpr_sysconf(\"%s =\", %s);\n", $1, $2)
        printf("#else\n")
        printf("\tprintf(\"no symbol for %s\\n\");\n", $2)
        printf("#endif\n")
    }
    close("sysconf.sym")
    while ((getline <"pathconf.sym") > 0) {
        printf("#ifdef %s\n", $1)
        printf("\tprintf(\"%s defined to be %%d\\n\", %s+0);\n", $1, $1)
        printf("#else\n")
        printf("\tprintf(\"no symbol for %s\\n\");\n", $1)
        printf("#endif\n")
        printf("#ifdef %s\n", $2)
        printf("\tpr_pathconf(\"%s =\", argv[1], %s);\n", $1, $2)
        printf("#else\n")
        printf("\tprintf(\"no symbol for %s\\n\");\n", $2)
        printf("#endif\n")
    }
    close("pathconf.sym")
    exit
}
END {
    printf("\texit(0);\n")
    printf("}\n\n")
    printf("static void\n")
    printf("pr_sysconf(char *mesg, int name)\n")
    printf("{\n")
    printf("\tlong val;\n\n")
    printf("\tfputs(mesg, stdout);\n")
    printf("\terrno = 0;\n")
    printf("\tif ((val = sysconf(name)) < 0) {\n")
    printf("\t\tif (errno != 0) {\n")
    printf("\t\t\tif (errno == EINVAL)\n")
    printf("\t\t\t\tfputs(\" (not supported)\\n\", stdout);\n")
    printf("\t\t\telse\n")
    printf("\t\t\t\terr_sys(\"sysconf error\");\n")
    printf("\t\t} else {\n")
    printf("\t\t\tfputs(\" (no limit)\\n\", stdout);\n")
    printf("\t\t}\n")
    printf("\t} else {\n")
    printf("\t\tprintf(\" %%ld\\n\", val);\n")
    printf("\t}\n")
    printf("}\n\n")
    printf("static void\n")
    printf("pr_pathconf(char *mesg, char *path, int name)\n")
    printf("{\n")
    printf("\tlong val;\n")
    printf("\n")
    printf("\tfputs(mesg, stdout);\n")
    printf("\terrno = 0;\n")
    printf("\tif ((val = pathconf(path, name)) < 0) {\n")
    printf("\t\tif (errno != 0) {\n")
    printf("\t\t\tif (errno == EINVAL)\n")
    printf("\t\t\t\tfputs(\" (not supported)\\n\", stdout);\n")
    printf("\t\t\telse\n")
    printf("\t\t\t\terr_sys(\"pathconf error, path = %%s\", path);\n")
    printf("\t\t} else {\n")
    printf("\t\t\tfputs(\" (no limit)\\n\", stdout);\n")
    printf("\t\t}\n")
    printf("\t} else {\n")
    printf("\t\tprintf(\" %%ld\\n\", val);\n")
    printf("\t}\n")
    printf("}\n")
}
Figure 2.13. Print all possible sysconf and pathconf values
#include "apue.h"
#include <errno.h>
#include <limits.h>

static void pr_sysconf(char *, int);
static void pr_pathconf(char *, char *, int);

int
main(int argc, char *argv[])
{
    if (argc != 2)
        err_quit("usage: a.out <dirname>");

#ifdef ARG_MAX
    printf("ARG_MAX defined to be %d\n", ARG_MAX+0);
#else
    printf("no symbol for ARG_MAX\n");
#endif
#ifdef _SC_ARG_MAX
    pr_sysconf("ARG_MAX =", _SC_ARG_MAX);
#else
    printf("no symbol for _SC_ARG_MAX\n");
#endif

/* similar processing for all the rest of the sysconf symbols... */

#ifdef MAX_CANON
    printf("MAX_CANON defined to be %d\n", MAX_CANON+0);
#else
    printf("no symbol for MAX_CANON\n");
#endif
#ifdef _PC_MAX_CANON
    pr_pathconf("MAX_CANON =", argv[1], _PC_MAX_CANON);
#else
    printf("no symbol for _PC_MAX_CANON\n");
#endif

/* similar processing for all the rest of the pathconf symbols... */

    exit(0);
}

static void
pr_sysconf(char *mesg, int name)
{
    long val;

    fputs(mesg, stdout);
    errno = 0;
    if ((val = sysconf(name)) < 0) {
        if (errno != 0) {
            if (errno == EINVAL)
                fputs(" (not supported)\n", stdout);
            else
                err_sys("sysconf error");
        } else {
            fputs(" (no limit)\n", stdout);
        }
    } else {
        printf(" %ld\n", val);
    }
}

static void
pr_pathconf(char *mesg, char *path, int name)
{
    long val;

    fputs(mesg, stdout);
    errno = 0;
    if ((val = pathconf(path, name)) < 0) {
        if (errno != 0) {
            if (errno == EINVAL)
                fputs(" (not supported)\n", stdout);
            else
                err_sys("pathconf error, path = %s", path);
        } else {
            fputs(" (no limit)\n", stdout);
        }
    } else {
        printf(" %ld\n", val);
    }
}
Figure 2.14. Examples of configuration limits
Limit               FreeBSD 5.2.1  Linux 2.4.22   Mac OS X 10.3  Solaris 9 UFS  Solaris 9 PCFS
ARG_MAX             65,536         131,072        262,144        1,048,320      1,048,320
ATEXIT_MAX          32             2,147,483,647  no symbol      no limit       no limit
CHARCLASS_NAME_MAX  no symbol      2,048          no symbol      14             14
CHILD_MAX           867            999            100            7,877          7,877
clock ticks/second  128            100            100            100            100
COLL_WEIGHTS_MAX    0              255            2              10             10
FILESIZEBITS        unsupported    64             no symbol      41             unsupported
HOST_NAME_MAX       255            unsupported    no symbol      no symbol      no symbol
IOV_MAX             1,024          no limit       no symbol      16             16
LINE_MAX            2,048          2,048          2,048          2,048          2,048
LINK_MAX            32,767         32,000         32,767         32,767         1
LOGIN_NAME_MAX      17             256            no symbol      9              9
MAX_CANON           255            255            255            256            256
MAX_INPUT           255            255            255            512            512
NAME_MAX            255            255            765            255            8
NGROUPS_MAX         16             32             16             16             16
OPEN_MAX            1,735          1,024          256            256            256
PAGESIZE            4,096          4,096          4,096          8,192          8,192
PAGE_SIZE           4,096          4,096          no symbol      8,192          8,192
PATH_MAX            1,024          4,096          1,024          1,024          1,024
PIPE_BUF            512            4,096          512            5,120          5,120
RE_DUP_MAX          255            32,767         255            255            255
STREAM_MAX          1,735          16             20             256            256
SYMLINK_MAX         unsupported    no limit       no symbol      no symbol      no symbol
SYMLOOP_MAX         32             no limit       no symbol      no symbol      no symbol
TTY_NAME_MAX        255            32             no symbol      128            128
TZNAME_MAX          255            6              255            no limit       no limit
2.5.5. Indeterminate Runtime Limits
We mentioned that some of the limits can be indeterminate. The problem we encounter is that
if these limits aren't defined in the <limits.h> header, we can't use them at compile time. But
they might not be defined at runtime if their value is indeterminate! Let's look at two specific
cases: allocating storage for a pathname and determining the number of file descriptors.
Pathnames
Many programs need to allocate storage for a pathname. Typically, the storage has been
allocated at compile time, and various magic numbers (few of which are the correct value) have
been used by different programs as the array size: 256, 512, 1024, or the standard I/O
constant BUFSIZ. The 4.3BSD constant MAXPATHLEN in the header <sys/param.h> is the correct
value, but many 4.3BSD applications didn't use it.
POSIX.1 tries to help with the PATH_MAX value, but if this value is indeterminate, we're still out
of luck. Figure 2.15 shows a function that we'll use throughout this text to allocate storage
dynamically for a pathname.
Figure 2.15. Dynamically allocate space for a pathname
#include "apue.h"
#include <errno.h>
#include <limits.h>

#ifdef PATH_MAX
static int pathmax = PATH_MAX;
#else
static int pathmax = 0;
#endif

#define SUSV3 200112L

static long posix_version = 0;

/* If PATH_MAX is indeterminate, no guarantee this is adequate */
#define PATH_MAX_GUESS 1024

char *
path_alloc(int *sizep) /* also return allocated size, if nonnull */
{
    char *ptr;
    int size;

    if (posix_version == 0)
        posix_version = sysconf(_SC_VERSION);

    if (pathmax == 0) {     /* first time through */
        errno = 0;
        if ((pathmax = pathconf("/", _PC_PATH_MAX)) < 0) {
            if (errno == 0)
                pathmax = PATH_MAX_GUESS;   /* it's indeterminate */
            else
                err_sys("pathconf error for _PC_PATH_MAX");
        } else {
            pathmax++;      /* add one since it's relative to root */
        }
    }
    if (posix_version < SUSV3)
        size = pathmax + 1;
    else
        size = pathmax;

    if ((ptr = malloc(size)) == NULL)
        err_sys("malloc error for pathname");

    if (sizep != NULL)
        *sizep = size;
    return(ptr);
}
If the constant PATH_MAX is defined in <limits.h>, then we're all set. If it's not, we need to
call pathconf. The value returned by pathconf is the maximum size of a relative pathname when
the first argument is the working directory, so we specify the root as the first argument and
add 1 to the result. If pathconf indicates that PATH_MAX is indeterminate, we have to punt and
just guess a value.
Standards prior to SUSv3 were unclear as to whether or not PATH_MAX included a null byte at
the end of the pathname. If the operating system implementation conforms to one of these
prior versions, we need to add 1 to the amount of memory we allocate for a pathname, just to
be on the safe side.
The correct way to handle the case of an indeterminate result depends on how the allocated
space is being used. If we were allocating space for a call to getcwd, for example (to return the
absolute pathname of the current working directory; see Section 4.22), and if the allocated
space is too small, an error is returned and errno is set to ERANGE. We could then increase the
allocated space by calling realloc (see Section 7.8 and Exercise 4.16) and try again. We could
keep doing this until the call to getcwd succeeded.
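The grow-and-retry approach for getcwd can be sketched as follows. The helper name getcwd_alloc and the starting size of 128 bytes are our own choices, not values from any standard. Each time getcwd fails with ERANGE, the buffer is doubled with realloc.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return the absolute pathname of the current working directory in a
 * buffer obtained from malloc (the caller frees it), or NULL on error.
 * The buffer starts at an arbitrary guess and is doubled with realloc
 * each time getcwd fails with ERANGE.
 */
char *
getcwd_alloc(void)
{
    size_t size = 128;          /* arbitrary starting guess */
    char *buf = NULL, *nbuf;

    for (;;) {
        if ((nbuf = realloc(buf, size)) == NULL) {
            free(buf);
            return NULL;
        }
        buf = nbuf;
        if (getcwd(buf, size) != NULL)
            return buf;         /* the pathname fit */
        if (errno != ERANGE || size > SIZE_MAX / 2) {
            free(buf);          /* a real error, or we can't grow further */
            return NULL;
        }
        size *= 2;              /* too small: double and retry */
    }
}
```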
Maximum Number of Open Files
A common sequence of code in a daemon process (a process that runs in the background, not
connected to a terminal) is one that closes all open files. Some programs have the following
code sequence, assuming the constant NOFILE was defined in the <sys/param.h> header:
#include <sys/param.h>
for (i = 0; i < NOFILE; i++)
    close(i);
Other programs use the constant _NFILE that some versions of <stdio.h> provide as the upper
limit. Some hard code the upper limit as 20.
We would hope to use the POSIX.1 value OPEN_MAX to determine this value portably, but if the
value is indeterminate, we still have a problem. If we wrote the following and if OPEN_MAX was
indeterminate, the loop would never execute, since sysconf would return -1:
#include <unistd.h>
for (i = 0; i < sysconf(_SC_OPEN_MAX); i++)
    close(i);
Our best option in this case is just to close all descriptors up to some arbitrary limit, say 256.
As with our pathname example, this is not guaranteed to work for all cases, but it's the best
we can do. We show this technique in Figure 2.16.
Figure 2.16. Determine the number of file descriptors
#include "apue.h"
#include <errno.h>
#include <limits.h>

#ifdef OPEN_MAX
static long openmax = OPEN_MAX;
#else
static long openmax = 0;
#endif

/*
 * If OPEN_MAX is indeterminate, we're not
 * guaranteed that this is adequate.
 */
#define OPEN_MAX_GUESS 256

long
open_max(void)
{
    if (openmax == 0) {     /* first time through */
        errno = 0;
        if ((openmax = sysconf(_SC_OPEN_MAX)) < 0) {
            if (errno == 0)
                openmax = OPEN_MAX_GUESS;   /* it's indeterminate */
            else
                err_sys("sysconf error for _SC_OPEN_MAX");
        }
    }
    return(openmax);
}
We might be tempted to call close until we get an error return, but the error return from close
(EBADF) doesn't distinguish between an invalid descriptor and a descriptor that wasn't open. If
we tried this technique and descriptor 9 was not open but descriptor 10 was, we would stop
on 9 and never close 10. The dup function (Section 3.12) does return a specific error when
OPEN_MAX is exceeded, but duplicating a descriptor a couple of hundred times is an extreme
way to determine this value.
Some implementations will return LONG_MAX for limit values that are effectively unlimited. Such
is the case with the Linux limit for ATEXIT_MAX (see Figure 2.14). This isn't a good idea,
because it can cause programs to behave badly.
For example, we can use the ulimit command built into the Bourne-again shell to change the
maximum number of files our processes can have open at one time. This generally requires
special (superuser) privileges if the limit is to be effectively unlimited. But once set to infinite,
sysconf will report LONG_MAX as the limit for OPEN_MAX. A program that relies on this value as the
upper bound of file descriptors to close, as shown in Figure 2.16, will waste a lot of time trying
to close 2,147,483,647 file descriptors (the value of LONG_MAX when long integers are 32 bits),
most of which aren't even in use.
Systems that support the XSI extensions in the Single UNIX Specification will provide the
getrlimit(2) function (Section 7.11). It can be used to return the maximum number of
descriptors that a process can have open. With it, we can detect that there is no configured
upper bound to the number of open files our processes can open, so we can avoid this
problem.
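Here is a rough sketch of that idea, which amounts to one possible approach to Exercise 2.3. The hypothetical open_max_bounded function asks getrlimit for the soft RLIMIT_NOFILE limit. When the limit is RLIM_INFINITY, it falls back to the same arbitrary guess of 256 used in Figure 2.16. Both the name and the fallback value are assumptions for illustration.

```c
#include <limits.h>
#include <sys/resource.h>

#define OPEN_MAX_GUESS 256  /* same arbitrary fallback as Figure 2.16 */

/*
 * Hypothetical helper: return a descriptor bound that is safe to loop
 * over. If getrlimit fails, or the soft RLIMIT_NOFILE limit is
 * RLIM_INFINITY or too large for a long, use OPEN_MAX_GUESS instead of
 * looping all the way to LONG_MAX.
 */
long
open_max_bounded(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return OPEN_MAX_GUESS;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return OPEN_MAX_GUESS;      /* no configured upper bound */
    return (long)rl.rlim_cur;
}
```

A daemon could compute long n = open_max_bounded(); once and then loop for (i = 0; i < n; i++) close(i);. Computing the bound once avoids a getrlimit call on every iteration.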
The OPEN_MAX value is called runtime invariant by POSIX, meaning that its value should not
change during the lifetime of a process. But on systems that support the XSI extensions, we
can call the setrlimit(2) function (Section 7.11) to change this value for a running process.
(This value can also be changed from the C shell with the limit command, and from the
Bourne, Bourne-again, and Korn shells with the ulimit command.) If our system supports this
functionality, we could change the function in Figure 2.16 to call sysconf every time it is
called, not only the first time.
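The following sketch shows why a cached value can go stale. The hypothetical helper set_open_file_soft_limit lowers the soft RLIMIT_NOFILE limit with setrlimit. The companion function open_max_now simply calls sysconf on every call. We assume that sysconf(_SC_OPEN_MAX) reports the current soft limit, as glibc on Linux does; other implementations may behave differently.

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Call sysconf every time instead of caching the first answer, because
 * setrlimit (or the shell's ulimit or limit command) can change the limit
 * while the process is running.
 */
long
open_max_now(void)
{
    return sysconf(_SC_OPEN_MAX);
}

/*
 * Hypothetical helper: set the soft RLIMIT_NOFILE limit to n, keeping the
 * hard limit unchanged. Returns 0 on success, -1 on error (setrlimit fails,
 * for example, if n exceeds the hard limit).
 */
int
set_open_file_soft_limit(rlim_t n)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    rl.rlim_cur = n;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Suppose set_open_file_soft_limit(128) succeeds after open_max in Figure 2.16 has already run. The cached openmax would still hold the old limit, whereas open_max_now would report 128 on a system that behaves as assumed above.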
2.6. Options
We saw the list of POSIX.1 options in Figure 2.5 and discussed XSI option groups in Section
2.2.3. If we are to write portable applications that depend on any of these
optionally-supported features, we need a portable way to determine whether an
implementation supports a given option.
Just as with limits (Section 2.5), the Single UNIX Specification defines three ways to do this.
1. Compile-time options are defined in <unistd.h>.
2. Runtime options that are not associated with a file or a directory are identified with
   the sysconf function.
3. Runtime options that are associated with a file or a directory are discovered by calling
   either the pathconf or the fpathconf function.
The options include the symbols listed in the third column of Figure 2.5, as well as the symbols
listed in Figures 2.17 and 2.18. If the symbolic constant is not defined, we must use sysconf,
pathconf, or fpathconf to determine whether the option is supported. In this case, the name
argument to the function is formed by replacing the _POSIX at the beginning of the symbol
with _SC or _PC. For constants that begin with _XOPEN, the name argument is formed by
prepending the string with _SC or _PC. For example, if the constant _POSIX_THREADS is
undefined, we can call sysconf with the name argument set to _SC_THREADS to determine
whether the platform supports the POSIX threads option. If the constant _XOPEN_UNIX is
undefined, we can call sysconf with the name argument set to _SC_XOPEN_UNIX to determine
whether the platform supports the XSI extensions.
Figure 2.17. Options and name arguments to sysconf
Name of option              Description                                     name argument
_POSIX_JOB_CONTROL          indicates whether the implementation supports   _SC_JOB_CONTROL
                            job control
_POSIX_READER_WRITER_LOCKS  indicates whether the implementation supports   _SC_READER_WRITER_LOCKS
                            reader-writer locks
_POSIX_SAVED_IDS            indicates whether the implementation supports   _SC_SAVED_IDS
                            the saved set-user-ID and the saved set-group-ID
_POSIX_SHELL                indicates whether the implementation supports   _SC_SHELL
                            the POSIX shell
_POSIX_VERSION              indicates the POSIX.1 version                   _SC_VERSION
_XOPEN_CRYPT                indicates whether the implementation supports   _SC_XOPEN_CRYPT
                            the XSI encryption option group
_XOPEN_LEGACY               indicates whether the implementation supports   _SC_XOPEN_LEGACY
                            the XSI legacy option group
_XOPEN_REALTIME             indicates whether the implementation supports   _SC_XOPEN_REALTIME
                            the XSI real-time option group
_XOPEN_REALTIME_THREADS     indicates whether the implementation supports   _SC_XOPEN_REALTIME_THREADS
                            the XSI real-time threads option group
_XOPEN_VERSION              indicates the XSI version                       _SC_XOPEN_VERSION
Figure 2.18. Options and name arguments to pathconf and fpathconf
Name of option            Description                                   name argument
_POSIX_CHOWN_RESTRICTED   indicates whether use of chown is restricted  _PC_CHOWN_RESTRICTED
_POSIX_NO_TRUNC           indicates whether pathnames longer than       _PC_NO_TRUNC
                          NAME_MAX generate an error
_POSIX_VDISABLE           if defined, terminal special characters can   _PC_VDISABLE
                          be disabled with this value
_POSIX_ASYNC_IO           indicates whether asynchronous I/O can        _PC_ASYNC_IO
                          be used with the associated file
_POSIX_PRIO_IO            indicates whether prioritized I/O can         _PC_PRIO_IO
                          be used with the associated file
_POSIX_SYNC_IO            indicates whether synchronized I/O can        _PC_SYNC_IO
                          be used with the associated file
If the symbolic constant is defined by the platform, we have three possibilities.
1. If the symbolic constant is defined to have the value -1, then the corresponding option
   is unsupported by the platform.
2. If the symbolic constant is defined to be greater than zero, then the corresponding
   option is supported.
3. If the symbolic constant is defined to be equal to zero, then we must call sysconf,
   pathconf, or fpathconf to determine whether the option is supported.
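Putting these rules together for the POSIX threads option gives something like the following sketch. The function name threads_supported is hypothetical. The expression (_POSIX_THREADS + 0) is a common preprocessor idiom that also copes with a symbol defined with no value, as in the Mac OS X case discussed in this section. Such a symbol is treated as zero, so the check defers to sysconf.

```c
#include <unistd.h>

/*
 * Hypothetical helper: decide whether POSIX threads are available.
 * A compile-time value of -1 means unsupported, a value greater than 0
 * means supported, and 0 (or no definition at all) means we must ask
 * sysconf at run time.
 */
int
threads_supported(void)
{
#if defined(_POSIX_THREADS) && (_POSIX_THREADS + 0) == -1
    return 0;                           /* case 1: never supported */
#elif defined(_POSIX_THREADS) && (_POSIX_THREADS + 0) > 0
    return 1;                           /* case 2: always supported */
#elif defined(_SC_THREADS)
    return sysconf(_SC_THREADS) > 0;    /* case 3, or symbol undefined */
#else
    return 0;   /* neither symbol is known: assume unsupported */
#endif
}
```

Because the first two cases are decided by the preprocessor, a program built on a platform that defines _POSIX_THREADS as a positive value pays no run-time cost for the check.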
Figure 2.17 summarizes the options and their symbolic constants that can be used with
sysconf, in addition to those listed in Figure 2.5.
The symbolic constants used with pathconf and fpathconf are summarized in Figure 2.18. As
with the system limits, there are several points to note regarding how options are treated by
sysconf, pathconf, and fpathconf.
1. The value returned for _SC_VERSION indicates the four-digit year and two-digit month of
   the standard. This value can be 198808L, 199009L, 199506L, or some other value for a
   later version of the standard. The value associated with Version 3 of the Single UNIX
   Specification is 200112L.
2. The value returned for _SC_XOPEN_VERSION indicates the version of the XSI that the
   system complies with. The value associated with Version 3 of the Single UNIX
   Specification is 600.
3. The values _SC_JOB_CONTROL, _SC_SAVED_IDS, and _PC_VDISABLE no longer represent
   optional features. As of Version 3 of the Single UNIX Specification, these features are
   now required, although these symbols are retained for backward compatibility.
4. _PC_CHOWN_RESTRICTED and _PC_NO_TRUNC return -1 without changing errno if the feature is
   not supported for the specified pathname or filedes.
5. The referenced file for _PC_CHOWN_RESTRICTED must be either a file or a directory. If it is
   a directory, the return value indicates whether this option applies to files within that
   directory.
6. The referenced file for _PC_NO_TRUNC must be a directory. The return value applies to
   filenames within the directory.
7. The referenced file for _PC_VDISABLE must be a terminal file.
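The following minimal sketch applies these rules to a pathname. The helper name path_option is hypothetical. It treats any return value other than -1 as meaning the option is in effect. Following point 4, it uses errno to tell "not supported" apart from a real error.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: report how a pathconf option applies to path.
 * Returns 1 if the option is in effect, 0 if pathconf returned -1
 * without changing errno (not supported), or -1 on a real error.
 */
int
path_option(const char *path, int name)
{
    long val;

    errno = 0;                  /* so we can detect an unchanged errno */
    val = pathconf(path, name);
    if (val == -1)
        return (errno == 0) ? 0 : -1;
    return 1;
}
```

For example, path_option("/tmp", _PC_NO_TRUNC) asks whether overlong filenames in /tmp generate an error rather than being truncated. Per point 6, the argument should name a directory.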
In Figure 2.19 we show several configuration options and their corresponding values on the
four sample systems we discuss in this text. Note that several of the systems haven't yet
caught up to the latest version of the Single UNIX Specification. For example, Mac OS X 10.3
supports POSIX threads but defines _POSIX_THREADS as
#define _POSIX_THREADS
without specifying a value. To conform to Version 3 of the Single UNIX Specification, the
symbol, if defined, should be set to -1, 0, or 200112.
Figure 2.19. Examples of configuration options
Limit                     FreeBSD      Linux        Mac OS X     Solaris 9    Solaris 9
                          5.2.1        2.4.22       10.3         (UFS)        (PCFS)
_POSIX_CHOWN_RESTRICTED   1            1            1            1            1
_POSIX_JOB_CONTROL        1            1            1            1            1
_POSIX_NO_TRUNC           1            1            1            1            unsupported
_POSIX_SAVED_IDS          unsupported  1            unsupported  1            1
_POSIX_THREADS            200112       200112       defined      1            1
_POSIX_VDISABLE           255          0            255          0            0
_POSIX_VERSION            200112       200112       198808       199506       199506
_XOPEN_UNIX               unsupported  1            undefined    1            1
_XOPEN_VERSION            unsupported  500          undefined    3            3
An entry is marked as "undefined" if the feature is not defined, i.e., the system doesn't define
the symbolic constant or its corresponding _PC or _SC name. In contrast, the "defined" entry
means that the symbolic constant is defined, but no value is specified, as in the preceding
_POSIX_THREADS example. An entry is "unsupported" if the system defines the symbolic
constant, but it has a value of -1, or it has a value of 0 but the corresponding sysconf or
pathconf call returned -1.
Note that pathconf returns a value of -1 for _PC_NO_TRUNC when used with a file from a PCFS file
system on Solaris. The PCFS file system supports the DOS format (for floppy disks), and DOS
filenames are silently truncated to the 8.3 format limit that the DOS file system requires.
2.7. Feature Test Macros
The headers define numerous POSIX.1 and XSI symbols, as we've described. But most
implementations can add their own definitions to these headers, in addition to the POSIX.1
and XSI definitions. If we want to compile a program so that it depends only on the POSIX
definitions and doesn't use any implementation-defined limits, we need to define the constant
_POSIX_C_SOURCE. All the POSIX.1 headers use this constant to exclude any
implementation-defined definitions when _POSIX_C_SOURCE is defined.
Previous versions of the POSIX.1 standard defined the _POSIX_SOURCE constant. This has been
superseded by the _POSIX_C_SOURCE constant in the 2001 version of POSIX.1.
The constants _POSIX_C_SOURCE and _XOPEN_SOURCE are called feature test macros. All feature
test macros begin with an underscore. When used, they are typically defined in the cc
command, as in
cc -D_POSIX_C_SOURCE=200112 file.c
This causes the feature test macro to be defined before any header files are included by the
C program. If we want to use only the POSIX.1 definitions, we can also set the first line of a
source file to
#define _POSIX_C_SOURCE 200112
To make the functionality of Version 3 of the Single UNIX Specification available to
applications, we need to define the constant _XOPEN_SOURCE to be 600. This has the same
effect as defining _POSIX_C_SOURCE to be 200112L as far as POSIX.1 functionality is concerned.
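A minimal sketch of the in-source form follows. The macro has to appear before the first #include, because the headers examine it when they are first read. The helper posix_level is hypothetical. With glibc, for example, defining _XOPEN_SOURCE as 600 causes the headers to set _POSIX_C_SOURCE to 200112L; other implementations may handle this differently.

```c
/*
 * Equivalent to compiling with -D_XOPEN_SOURCE=600: the definition must
 * precede every #include, since the headers test it as they are read.
 */
#define _XOPEN_SOURCE 600

#include <unistd.h>

/*
 * Hypothetical helper: report the POSIX.1 level the headers ended up
 * selecting, or -1 if _POSIX_C_SOURCE was left undefined.
 */
long
posix_level(void)
{
#ifdef _POSIX_C_SOURCE
    return _POSIX_C_SOURCE;
#else
    return -1L;
#endif
}
```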
The Single UNIX Specification defines the c99 utility as the interface to the C compilation
environment. With it we can compile a file as follows:
c99 -D_XOPEN_SOURCE=600 file.c -o file
To enable the 1999 ISO C extensions in the gcc C compiler, we use the -std=c99 option, as in
gcc -D_XOPEN_SOURCE=600 -std=c99 file.c -o file
Another feature test macro is __STDC__, which is automatically defined by the C compiler if
the compiler conforms to the ISO C standard. This allows us to write C programs that compile
under both ISO C compilers and non-ISO C compilers. For example, to take advantage of the
ISO C prototype feature, if supported, a header could contain
#ifdef __STDC__
void *myfunc(const char *, int);
#else
void *myfunc();
#endif
Although most C compilers these days support the ISO C standard, this use of the __STDC__
feature test macro can still be found in many header files.
2.8. Primitive System Data Types
Historically, certain C data types have been associated with certain UNIX system variables.
For example, the major and minor device numbers have historically been stored in a 16-bit
short integer, with 8 bits for the major device number and 8 bits for the minor device number.
But many larger systems need more than 256 values for these device numbers, so a different
technique is needed. (Indeed, Solaris uses 32 bits for the device number: 14 bits for the major
and 18 bits for the minor.)
The header <sys/types.h> defines some implementation-dependent data types, called the
primitive system data types. Other headers define more of these data types as well.
These data types are defined in the headers with the C typedef facility, and most end in _t.
Figure 2.20 lists many of the primitive system data types that we'll encounter in this text.
Figure 2.20. Some common primitive system data types
Type          Description
caddr_t       core address (Section 14.9)
clock_t       counter of clock ticks (process time) (Section 1.10)
comp_t        compressed clock ticks (Section 8.14)
dev_t         device numbers (major and minor) (Section 4.23)
fd_set        file descriptor sets (Section 14.5.1)
fpos_t        file position (Section 5.10)
gid_t         numeric group IDs
ino_t         i-node numbers (Section 4.14)
mode_t        file type, file creation mode (Section 4.5)
nlink_t       link counts for directory entries (Section 4.14)
off_t         file sizes and offsets (signed) (lseek, Section 3.6)
pid_t         process IDs and process group IDs (signed) (Sections 8.2 and 9.4)
ptrdiff_t     result of subtracting two pointers (signed)
rlim_t        resource limits (Section 7.11)
sig_atomic_t  data type that can be accessed atomically (Section 10.15)
sigset_t      signal set (Section 10.11)
size_t        sizes of objects (such as strings) (unsigned) (Section 3.7)
ssize_t       functions that return a count of bytes (signed) (read, write, Section 3.7)
time_t        counter of seconds of calendar time (Section 1.10)
uid_t         numeric user IDs
wchar_t       can represent all distinct character codes
By defining these data types this way, we do not build into our programs implementation
details that can change from one system to another. We describe what each of these data
types is used for when we encounter them later in the text.
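Because the size of these types varies between implementations, printing them takes some care. One portable approach is to cast signed types such as pid_t and off_t to intmax_t and print them with the %jd conversion. This sketch assumes a C99 compiler, which provides both. The helper name format_pid_off is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a process ID and a file offset without
 * assuming how wide pid_t and off_t are on this system. Both are signed,
 * so widening them to intmax_t loses nothing. Returns snprintf's count.
 */
int
format_pid_off(char *buf, size_t len, pid_t pid, off_t off)
{
    return snprintf(buf, len, "pid=%jd offset=%jd",
                    (intmax_t)pid, (intmax_t)off);
}
```

Casting to long instead would silently truncate a 64-bit off_t on a system whose long is 32 bits; intmax_t is guaranteed to be at least as wide as any signed integer type.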
2.9. Conflicts Between Standards
All in all, these various standards fit together nicely. Our main concern is any differences
between the ISO C standard and POSIX.1, since SUSv3 is a superset of POSIX.1. There are
some differences.
ISO C defines the function clock to return the amount of CPU time used by a process. The
value returned is a clock_t value. To convert this value to seconds, we divide it by
CLOCKS_PER_SEC, which is defined in the <time.h> header. POSIX.1 defines the function times
that returns both the CPU time (for the caller and all its terminated children) and the clock
time. All these time values are clock_t values. The sysconf function is used to obtain the
number of clock ticks per second for use with the return values from the times function. What
we have is the same term, clock ticks per second, defined differently by ISO C and POSIX.1.
Both standards also use the same data type (clock_t) to hold these different values. The
difference can be seen in Solaris, where clock returns microseconds (hence CLOCKS_PER_SEC is
1 million), whereas sysconf returns the value 100 for clock ticks per second.
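The following sketch computes CPU time both ways so that the two meanings of "clock ticks" are visible side by side. The helper names are hypothetical. cpu_seconds_iso divides the value from clock by CLOCKS_PER_SEC. cpu_seconds_posix divides the user and system times from times by the value sysconf returns for _SC_CLK_TCK. Mixing up the two divisors would give wrong answers.

```c
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* ISO C view: processor time from clock, in units of CLOCKS_PER_SEC. */
double
cpu_seconds_iso(void)
{
    clock_t c = clock();

    return (c == (clock_t)-1) ? -1.0 : (double)c / CLOCKS_PER_SEC;
}

/*
 * POSIX.1 view: user plus system time from times, in units of
 * sysconf(_SC_CLK_TCK) ticks per second, not CLOCKS_PER_SEC.
 */
double
cpu_seconds_posix(void)
{
    struct tms t;
    long       ticks = sysconf(_SC_CLK_TCK);

    if (ticks <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)(t.tms_utime + t.tms_stime) / ticks;
}
```

Both functions should report roughly the same number of seconds for the same process. Only the resolution differs, because each uses its own divisor.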
Another area of potential conflict is when the ISO C standard specifies a function, but doesn't
specify it as strongly as POSIX.1 does. This is the case for functions that require a different
implementation in a POSIX environment (with multiple processes) than in an ISO C
environment (where very little can be assumed about the host operating system).
Nevertheless, many POSIX-compliant systems implement the ISO C function, for compatibility.
The signal function is an example. If we unknowingly use the signal function provided by
Solaris (hoping to write portable code that can be run in ISO C environments and under older
UNIX systems), it'll provide semantics different from the POSIX.1 sigaction function. We'll
have more to say about the signal function in Chapter 10.
2.10. Summary
Much has happened over the past two decades with the standardization of the UNIX
programming environment. We've described the dominant standards (ISO C, POSIX, and the
Single UNIX Specification) and their effect on the four implementations that we'll examine in this
text: FreeBSD, Linux, Mac OS X, and Solaris. These standards try to define certain parameters
that can change with each implementation, but we've seen that these limits are imperfect.
We'll encounter many of these limits and magic constants as we proceed through the text.
The bibliography specifies how one can obtain copies of the standards that we've discussed.
Exercises
2.1  We mentioned in Section 2.8 that some of the primitive system data types are
     defined in more than one header. For example, on FreeBSD 5.2.1, size_t is
     defined in 26 different headers. Because all 26 headers could be included in a
     program and because ISO C does not allow multiple typedefs for the same
     name, how must the headers be written?
2.2  Examine your system's headers and list the actual data types used to
     implement the primitive system data types.
2.3  Update the program in Figure 2.16 to avoid the needless processing that occurs
     when sysconf returns LONG_MAX as the limit for OPEN_MAX.
Chapter 3. File I/O
Section 3.1. Introduction
Section 3.2. File Descriptors
Section 3.3. open Function
Section 3.4. creat Function
Section 3.5. close Function
Section 3.6. lseek Function
Section 3.7. read Function
Section 3.8. write Function
Section 3.9. I/O Efficiency
Section 3.10. File Sharing
Section 3.11. Atomic Operations
Section 3.12. dup and dup2 Functions
Section 3.13. sync, fsync, and fdatasync Functions
Section 3.14. fcntl Function
Section 3.15. ioctl Function
Section 3.16. /dev/fd
Section 3.17. Summary
Exercises
3.1. Introduction
We'll start our discussion of the UNIX System by describing the functions available for file
I/O: open a file, read a file, write a file, and so on. Most file I/O on a UNIX system can be
performed using only five functions: open, read, write, lseek, and close. We then examine the
effect of various buffer sizes on the read and write functions.
The functions described in this chapter are often referred to as unbuffered I/O, in contrast to
the standard I/O routines, which we describe in Chapter 5. The term unbuffered means that
each read or write invokes a system call in the kernel. These unbuffered I/O functions are not
part of ISO C, but are part of POSIX.1 and the Single UNIX Specification.
Whenever we describe the sharing of resources among multiple processes, the concept of an
atomic operation becomes important. We examine this concept with regard to file I/O and the
arguments to the open function. This leads to a discussion of how files are shared among
multiple processes and the kernel data structures involved. After describing these features,
we describe the dup, fcntl, sync, fsync, and ioctl functions.
3.2. File Descriptors
To the kernel, all open files are referred to by file descriptors. A file descriptor is a
non-negative integer. When we open an existing file or create a new file, the kernel returns a
file descriptor to the process. When we want to read or write a file, we identify the file with
the file descriptor that was returned by open or creat as an argument to either read or write.
By convention, UNIX System shells associate file descriptor 0 with the standard input of a
process, file descriptor 1 with the standard output, and file descriptor 2 with the standard
error. This convention is used by the shells and many applications; it is not a feature of the
UNIX kernel. Nevertheless, many applications would break if these associations weren't
followed.
The magic numbers 0, 1, and 2 should be replaced in POSIX-compliant applications with the
symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. These constants are
defined in the <unistd.h> header.
File descriptors range from 0 through OPEN_MAX - 1. (Recall Figure 2.10.) Early historical
implementations of the UNIX System had an upper limit of 19, allowing a maximum of 20 open
files per process, but many systems increased this limit to 63.
With FreeBSD 5.2.1, Mac OS X 10.3, and Solaris 9, the limit is essentially infinite, bounded by
the amount of memory on the system, the size of an integer, and any hard and soft limits
configured by the system administrator. Linux 2.4.22 places a hard limit of 1,048,576 on the
number of file descriptors per process.
3.3. open Function
A file is opened or created by calling the open function.
#include <fcntl.h>
int open(const char *pathname, int oflag, ... /* mode_t mode */ );
Returns: file descriptor if OK, -1 on error
We show the third argument as ..., which is the ISO C way to specify that the number and
types of the remaining arguments may vary. For this function, the third argument is used only
when a new file is being created, as we describe later. We show this argument as a comment
in the prototype.
The pathname is the name of the file to open or create. This function has a multitude of
options, which are specified by the oflag argument. This argument is formed by ORing together
one or more of the following constants from the <fcntl.h> header:
O_RDONLY    Open for reading only.
O_WRONLY    Open for writing only.
O_RDWR      Open for reading and writing.
Most implementations define O_RDONLY as 0, O_WRONLY as 1, and O_RDWR as 2, for compatibility
with older programs.
One and only one of these three constants must be specified. The following constants are
optional:
O_APPEND    Append to the end of file on each write. We describe this option in detail in
            Section 3.11.
O_CREAT     Create the file if it doesn't exist. This option requires a third argument to the
            open function, the mode, which specifies the access permission bits of the new
            file. (When we describe a file's access permission bits in Section 4.5, we'll see
            how to specify the mode and how it can be modified by the umask value of a
            process.)
O_EXCL      Generate an error if O_CREAT is also specified and the file already exists. This
            test for whether the file already exists and the creation of the file if it doesn't
            exist is an atomic operation. We describe atomic operations in more detail in
            Section 3.11.
O_TRUNC     If the file exists and if it is successfully opened for either write-only or
            read-write, truncate its length to 0.
O_NOCTTY    If the pathname refers to a terminal device, do not allocate the device as the
            controlling terminal for this process. We talk about controlling terminals in
            Section 9.6.
O_NONBLOCK  If the pathname refers to a FIFO, a block special file, or a character special
            file, this option sets the nonblocking mode for both the opening of the file and
            subsequent I/O. We describe this mode in Section 14.2.
In earlier releases of System V, the O_NDELAY (no delay) flag was introduced. This option is
similar to the O_NONBLOCK (nonblocking) option, but an ambiguity was introduced in the return
value from a read operation. The no-delay option causes a read to return 0 if there is no data
to be read from a pipe, FIFO, or device, but this conflicts with a return value of 0, indicating
an end of file. SVR4-based systems still support the no-delay option, with the old semantics,
but new applications should use the nonblocking option instead.
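As an illustration of O_CREAT combined with O_EXCL, the following sketch creates a file only if it doesn't already exist. The helper name create_exclusive and the choice of owner read/write permissions are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create pathname only if it does not already exist.
 * O_CREAT | O_EXCL makes the existence test and the creation a single
 * atomic step, so two processes racing here cannot both succeed.
 * Returns the descriptor, or -1 with errno set (EEXIST if the file exists).
 */
int
create_exclusive(const char *pathname)
{
    return open(pathname, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

A second call with the same pathname fails with errno set to EEXIST. This property is what makes the pattern usable as a simple lock file.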
The following three flags are also optional. They are part of the synchronized input and output
option of the Single UNIX Specification (and thus POSIX.1):
O_DSYNC     Have each write wait for physical I/O to complete, but don't wait for file attributes
            to be updated if they don't affect the ability to read the data just written.
O_RSYNC     Have each read operation on the file descriptor wait until any pending writes for
            the same portion of the file are complete.
O_SYNC      Have each write wait for physical I/O to complete, including I/O necessary to
            update file attributes modified as a result of the write. We use this option in
            Section 3.14.
The O_DSYNC and O_SYNC flags are similar, but subtly different. The O_DSYNC flag affects a file's
attributes only when they need to be updated to reflect a change in the file's data (for
example, update the file's size to reflect more data). With the O_SYNC flag, data and attributes
are always updated synchronously. When overwriting an existing part of a file opened with
the O_DSYNC flag, the file times wouldn't be updated synchronously. In contrast, if we had
opened the file with the O_SYNC flag, every write to the file would update the file's times
before the write returns, regardless of whether we were writing over existing bytes or
appending to the file.
Solaris 9 supports all three flags. FreeBSD 5.2.1 and Mac OS X 10.3 have a separate flag
(O_FSYNC) that does the same thing as O_SYNC. Because the two flags are equivalent, FreeBSD
5.2.1 defines them to have the same value (but curiously, Mac OS X 10.3 doesn't define
O_SYNC). FreeBSD 5.2.1 and Mac OS X 10.3 don't support the O_DSYNC or O_RSYNC flags. Linux
2.4.22 treats both flags the same as O_SYNC.
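Given these differences, a program that wants synchronized writes might choose the flag at compile time, as in this sketch. The helper names sync_write_flag and write_file_synchronously are hypothetical. The order of preference (O_DSYNC, then O_SYNC, then O_FSYNC) is our own choice. Whether the system actually honors the chosen flag is still up to the implementation.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: pick whichever synchronized-write flag exists. */
int
sync_write_flag(void)
{
#if defined(O_DSYNC)
    return O_DSYNC;
#elif defined(O_SYNC)
    return O_SYNC;
#elif defined(O_FSYNC)
    return O_FSYNC;     /* BSD spelling */
#else
    return 0;           /* no synchronized-write flag available */
#endif
}

/*
 * Hypothetical helper: create or truncate pathname and write nbytes to it
 * with the chosen flag, so write waits for the physical I/O when the
 * implementation honors the flag. Returns 0 on success, -1 on error.
 */
int
write_file_synchronously(const char *pathname, const void *buf, size_t nbytes)
{
    int     fd;
    ssize_t n;

    fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC | sync_write_flag(),
              S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    n = write(fd, buf, nbytes);
    if (close(fd) < 0 || n != (ssize_t)nbytes)
        return -1;
    return 0;
}
```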
The file descriptor returned by open is guaranteed to be the lowest-numbered unused
descriptor. This fact is used by some applications to open a new file on standard input,
standard output, or standard error. For example, an application might close standard
output (normally, file descriptor 1) and then open another file, knowing that it will be opened on
file descriptor 1. We'll see a better way to guarantee that a file is open on a given descriptor
in Section 3.12 with the dup2 function.
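The lowest-numbered rule can be observed without disturbing the standard descriptors, as in this minimal sketch. It opens /dev/null twice, closes the first descriptor, and opens again. The helper name lowest_fd_reused and the use of /dev/null are assumptions for illustration. The result holds only in a single-threaded process, since another thread could open or close descriptors in between.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if reopening after a close reuses the
 * descriptor number just freed, 0 if not, -1 on error.
 */
int
lowest_fd_reused(void)
{
    int fd1, fd2, fd3, reused;

    if ((fd1 = open("/dev/null", O_RDONLY)) < 0)
        return -1;
    if ((fd2 = open("/dev/null", O_RDONLY)) < 0) {
        close(fd1);
        return -1;
    }
    close(fd1);         /* fd1 is again the lowest unused descriptor */
    if ((fd3 = open("/dev/null", O_RDONLY)) < 0) {
        close(fd2);
        return -1;
    }
    reused = (fd3 == fd1);
    close(fd2);
    close(fd3);
    return reused;
}
```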
Filename and Pathname Truncation
What happens if NAME_MAX is 14 and we try to create a new file in the current directory with a
filename containing 15 characters? Traditionally, early releases of System V, such as SVR2,
allowed this to happen, silently truncating the filename beyond the 14th character.
BSD-derived systems returned an error status, with errno set to ENAMETOOLONG. Silently
truncating the filename presents a problem that affects more than simply the creation of new
files. If NAME_MAX is 14 and a file exists whose name is exactly 14 characters, any function that
accepts a pathname argument, such as open or stat, has no way to determine what the
original name of the file was, as the original name might have been truncated.
With POSIX.1, the constant _POSIX_NO_TRUNC determines whether long filenames and long
pathnames are truncated or whether an error is returned. As we saw in Chapter 2, this value
can vary based on the type of the file system.
Whether or not an error is returned is largely historical. For example, SVR4-based systems do
not generate an error for the traditional System V file system, S5. For the BSD-style file
system (known as UFS), however, SVR4-based systems do generate an error.
As another example, see Figure 2.19. Solaris will return an error for UFS, but not for PCFS, the
DOS-compatible file system, as DOS silently truncates filenames that don't fit in an 8.3
format.
BSD-derived systems and Linux always return an error.
If _POSIX_NO_TRUNC is in effect, errno is set to ENAMETOOLONG, and an error status is returned if
the entire pathname exceeds PATH_MAX or any filename component of the pathname exceeds
NAME_MAX.
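To see which behavior a given directory has, we can ask pathconf for _PC_NAME_MAX and then try to create a name one byte longer, as in this sketch. The helper name name_too_long_errno is hypothetical. Where _POSIX_NO_TRUNC is in effect, we'd expect the open to fail with ENAMETOOLONG. On a file system that silently truncates, the create succeeds, and the helper removes the file again.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create dir/<NAME_MAX + 1 'a' characters>.
 * Returns the errno from the failed open (ENAMETOOLONG when names are not
 * truncated), 0 if the create succeeded (the name was truncated), or -1
 * if the test couldn't be set up.
 */
int
name_too_long_errno(const char *dir)
{
    long   name_max;
    size_t dirlen, namelen;
    char  *path;
    int    fd, err;

    if ((name_max = pathconf(dir, _PC_NAME_MAX)) < 0)
        return -1;                  /* error, or no fixed limit */
    dirlen = strlen(dir);
    namelen = (size_t)name_max + 1; /* one byte more than NAME_MAX */
    if ((path = malloc(dirlen + 1 + namelen + 1)) == NULL)
        return -1;
    memcpy(path, dir, dirlen);
    path[dirlen] = '/';
    memset(path + dirlen + 1, 'a', namelen);
    path[dirlen + 1 + namelen] = '\0';

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    err = (fd < 0) ? errno : 0;
    if (fd >= 0) {                  /* name was silently truncated */
        close(fd);
        unlink(path);               /* truncated the same way, so this removes it */
    }
    free(path);
    return err;
}
```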
3.4. creat Function
A new file can also be created by calling the creat function.
#include <fcntl.h>
int creat(const char *pathname, mode_t mode);
Returns: file descriptor opened for write-only if OK, -1 on error
Note that this function is equivalent to
open (pathname, O_WRONLY | O_CREAT | O_TRUNC, mode);
Historically, in early versions of the UNIX System, the second argument to open could be only
0, 1, or 2. There was no way to open a file that didn't already exist. Therefore, a separate
system call, creat, was needed to create new files. With the O_CREAT and O_TRUNC options now
provided by open, a separate creat function is no longer needed.
We'll show how to specify mode in Section 4.5 when we describe a file's access permissions in
detail.
One deficiency with creat is that the file is opened only for writing. Before the new version of
open was provided, if we were creating a temporary file that we wanted to write and then
read back, we had to call creat, close, and then open. A better way is to use the open
function, as in
open (pathname, O_RDWR | O_CREAT | O_TRUNC, mode);
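The following sketch shows the open-based replacement in use. A single descriptor opened with O_RDWR lets us write the file and then read it back after repositioning with lseek (described in Section 3.6). The helper name write_then_read_back and its parameters are hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create or truncate pathname, write len bytes of
 * data, rewind, and read up to outlen bytes back into out, all through
 * one descriptor. Returns the number of bytes read, or -1 on error.
 */
ssize_t
write_then_read_back(const char *pathname, const char *data, size_t len,
                     char *out, size_t outlen)
{
    int     fd;
    ssize_t n;

    fd = open(pathname, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len ||
        lseek(fd, 0, SEEK_SET) == -1) {     /* rewind to the start */
        close(fd);
        return -1;
    }
    n = read(fd, out, outlen);
    close(fd);
    return n;
}
```

With creat, the read would fail because the descriptor is write-only, so a second open would be required.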
3.5. close Function
An open file is closed by calling the close function.
#include <unistd.h>
int close(int filedes);
Returns: 0 if OK, -1 on error
Closing a file also releases any record locks that the process may have on the file. We'll
discuss this in Section 14.3.
When a process terminates, all of its open files are closed automatically by the kernel. Many
programs take advantage of this fact and don't explicitly close open files. See the program in
Figure 1.4, for example.
3.6. lseek Function
Every open file has an associated "current file offset," normally a non-negative integer that
measures the number of bytes from the beginning of the file. (We describe some exceptions to
the "non-negative" qualifier later in this section.) Read and write operations normally start at
the current file offset and cause the offset to be incremented by the number of bytes read or
written. By default, this offset is initialized to 0 when a file is opened, unless the O_APPEND
option is specified.
An open file's offset can be set explicitly by calling lseek.
#include <unistd.h>
off_t lseek(int filedes, off_t offset, int whence);
Returns: new file offset if OK, -1 on error
The interpretation of the offset depends on the value of the whence argument.

If whence is SEEK_SET, the file's offset is set to offset bytes from the beginning of the
file.

If whence is SEEK_CUR, the file's offset is set to its current value plus the offset. The
offset can be positive or negative.

If whence is SEEK_END, the file's offset is set to the size of the file plus the offset. The
offset can be positive or negative.
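To make the three whence cases concrete, here is a small sketch that applies each of them to a descriptor for a file assumed to be exactly 100 bytes long and records the offset lseek returns. The function name whence_demo and the 100-byte file are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: exercise SEEK_SET, SEEK_CUR, and SEEK_END on a descriptor whose
 * file is assumed to be exactly 100 bytes long. Stores each new offset in
 * res[0..3]; returns 0 on success, -1 if any lseek fails.
 */
int
whence_demo(int fd, off_t res[4])
{
    if ((res[0] = lseek(fd, 10, SEEK_SET)) == -1)   /* from start: 10 */
        return -1;
    if ((res[1] = lseek(fd, 5, SEEK_CUR)) == -1)    /* 10 + 5 = 15 */
        return -1;
    if ((res[2] = lseek(fd, -3, SEEK_CUR)) == -1)   /* 15 - 3 = 12 */
        return -1;
    if ((res[3] = lseek(fd, -20, SEEK_END)) == -1)  /* 100 - 20 = 80 */
        return -1;
    return 0;
}
```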
Because a successful call to lseek returns the new file offset, we can seek zero bytes from
the current position to determine the current offset:
off_t    currpos;
currpos = lseek(fd, 0, SEEK_CUR);
This technique can also be used to determine if a file is capable of seeking. If the file
descriptor refers to a pipe, FIFO, or socket, lseek sets errno to ESPIPE and returns -1.
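A sketch of this seekability test that also distinguishes ESPIPE from other failures; is_seekable is a hypothetical name. It compares the lseek result with -1 rather than testing for a negative value, for the reason given later in this section.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: returns 1 if `fd` supports seeking, 0 if lseek failed with
 * ESPIPE (pipe, FIFO, or socket), and -1 for any other error (for
 * example, EBADF for a descriptor that is not open).
 */
int
is_seekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) != -1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```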
The three symbolic constants (SEEK_SET, SEEK_CUR, and SEEK_END) were introduced with System V.
Prior to this, whence was specified as 0 (absolute), 1 (relative to current offset), or 2
(relative to end of file). Much software still exists with these numbers hard coded.
The character l in the name lseek means "long integer." Before the introduction of the off_t
data type, the offset argument and the return value were long integers. lseek was introduced
with Version 7 when long integers were added to C. (Similar functionality was provided in
Version 6 by the functions seek and tell.)
Example
The program in Figure 3.1 tests its standard input to see whether it is capable of seeking.
If we invoke this program interactively, we get
$ ./a.out < /etc/motd
seek OK
$ cat < /etc/motd | ./a.out
cannot seek
$ ./a.out < /var/spool/cron/FIFO
cannot seek
Figure 3.1. Test whether standard input is capable of seeking
#include "apue.h"

int
main(void)
{
    if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1)
        printf("cannot seek\n");
    else
        printf("seek OK\n");
    exit(0);
}
Normally, a file's current offset must be a non-negative integer. It is possible, however, that
certain devices could allow negative offsets. But for regular files, the offset must be
non-negative. Because negative offsets are possible, we should be careful to compare the
return value from lseek as being equal to or not equal to -1, and not test whether it's less than 0.
The /dev/kmem device on FreeBSD for the Intel x86 processor supports negative offsets.
Because the offset (off_t) is a signed data type (Figure 2.20), we lose a factor of 2 in the
maximum file size. If off_t is a 32-bit integer, the maximum file size is $2^{31}-1$ bytes.
lseek only records the current file offset within the kernel; it does not cause any I/O to take
place. This offset is then used by the next read or write operation.
The file's offset can be greater than the file's current size, in which case the next write to
the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any
bytes in a file that have not been written are read back as 0.
A hole in a file isn't required to have storage backing it on disk. Depending on the file system
implementation, when you write after seeking past the end of the file, new disk blocks might
be allocated to store the data, but there is no need to allocate disk blocks for the data
between the old end of file and the location where you start writing.
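The following sketch checks the claim that unwritten bytes read back as 0: it writes one byte, seeks well past the end of file, writes another byte, and then reads a byte from the middle of the hole. The helper name read_hole_byte and the one-byte layout are assumptions for illustration; whether the hole actually occupies disk blocks is up to the file system and is not tested here.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: create `path` with one byte at offset 0 and one byte at offset
 * `gap` (gap >= 2), then read the byte at gap / 2, inside the hole.
 * Returns that byte (expected 0), or -1 on error.
 */
int
read_hole_byte(const char *path, off_t gap)
{
    char c;
    int  fd;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0)
        return -1;
    if (write(fd, "a", 1) != 1 ||              /* offset now 1 */
        lseek(fd, gap, SEEK_SET) == -1 ||       /* seek past end of file */
        write(fd, "b", 1) != 1 ||              /* file size now gap + 1 */
        lseek(fd, gap / 2, SEEK_SET) == -1 ||   /* back into the hole */
        read(fd, &c, 1) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return (unsigned char)c;
}
```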
Example
The program shown in Figure 3.2 creates a file with a hole in it.
Running this program gives us
$ ./a.out
$ ls -l file.hole                  check its size
-rw-r--r--  1 sar          16394 Nov 25 01:01 file.hole
$ od -c file.hole                  let's look at the actual contents
0000000   a  b  c  d  e  f  g  h  i  j \0 \0 \0 \0 \0 \0
0000020  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0040000   A  B  C  D  E  F  G  H  I  J
0040012
We use the od(1) command to look at the contents of the file. The -c flag tells it to print the
contents as characters. We can see that the unwritten bytes in the middle are read back as
zero. The seven-digit number at the beginning of each line is the byte offset in octal.
To prove that there is really a hole in the file, let's compare the file we've just created with a
file of the same size, but without holes:
$ ls -ls file.hole file.nohole     compare sizes
 8 -rw-r--r--  1 sar       16394 Nov 25 01:01 file.hole
20 -rw-r--r--  1 sar       16394 Nov 25 01:03 file.nohole
Although both files are the same size, the file without holes consumes 20 disk blocks, whereas
the file with holes consumes only 8 blocks.
In this example, we call the write function (Section 3.8). We'll have more to say about files
with holes in Section 4.12.
Figure 3.2. Create a file with a hole in it
#include "apue.h"
#include <fcntl.h>

char    buf1[] = "abcdefghij";
char    buf2[] = "ABCDEFGHIJ";

int
main(void)
{
    int     fd;

    if ((fd = creat("file.hole", FILE_MODE)) < 0)
        err_sys("creat error");

    if (write(fd, buf1, 10) != 10)
        err_sys("buf1 write error");
    /* offset now = 10 */

    if (lseek(fd, 16384, SEEK_SET) == -1)
        err_sys("lseek error");
    /* offset now = 16384 */

    if (write(fd, buf2, 10) != 10)
        err_sys("buf2 write error");
    /* offset now = 16394 */

    exit(0);
}
Because the offset address that lseek uses is represented by an off_t, implementations are
allowed to support whatever size is appropriate on their particular platform. Most platforms
today provide two sets of interfaces to manipulate file offsets: one set that uses 32-bit file
offsets and another set that uses 64-bit file offsets.
The Single UNIX Specification provides a way for applications to determine which
environments are supported through the sysconf function (Section 2.5.4). Figure 3.3
summarizes the sysconf constants that are defined.
Figure 3.3. Data size options and name arguments to sysconf
Name of option           Description                              name argument
_POSIX_V6_ILP32_OFF32    int, long, pointer, and off_t types      _SC_V6_ILP32_OFF32
                         are 32 bits.
_POSIX_V6_ILP32_OFFBIG   int, long, and pointer types are 32      _SC_V6_ILP32_OFFBIG
                         bits; off_t types are at least 64 bits.
_POSIX_V6_LP64_OFF64     int types are 32 bits; long, pointer,    _SC_V6_LP64_OFF64
                         and off_t types are 64 bits.
_POSIX_V6_LPBIG_OFFBIG   int types are at least 32 bits; long,    _SC_V6_LPBIG_OFFBIG
                         pointer, and off_t types are at least
                         64 bits.
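As a sketch of how a program might probe these options at run time, the function below calls sysconf with each name argument from Figure 3.3 (using the POSIX spelling _SC_V6_LPBIG_OFFBIG for the last one) and prints those reported as supported. Each constant is guarded with #ifdef because an implementation may not define all of them; print_size_models and report are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <stdio.h>
#include <unistd.h>

/* Print `label` and bump *count if sysconf reported the option. */
static void
report(const char *label, long value, int *count)
{
    if (value != -1) {      /* -1 means unsupported (or indeterminate) */
        printf("%s supported\n", label);
        (*count)++;
    }
}

/* Sketch: returns how many of the data size models are supported. */
int
print_size_models(void)
{
    int count = 0;

#ifdef _SC_V6_ILP32_OFF32
    report("_POSIX_V6_ILP32_OFF32", sysconf(_SC_V6_ILP32_OFF32), &count);
#endif
#ifdef _SC_V6_ILP32_OFFBIG
    report("_POSIX_V6_ILP32_OFFBIG", sysconf(_SC_V6_ILP32_OFFBIG), &count);
#endif
#ifdef _SC_V6_LP64_OFF64
    report("_POSIX_V6_LP64_OFF64", sysconf(_SC_V6_LP64_OFF64), &count);
#endif
#ifdef _SC_V6_LPBIG_OFFBIG
    report("_POSIX_V6_LPBIG_OFFBIG", sysconf(_SC_V6_LPBIG_OFFBIG), &count);
#endif
    return count;
}
```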
The c99 compiler requires that we use the getconf(1) command to map the desired data size
model to the flags necessary to compile and link our programs. Different flags and libraries
might be needed, depending on the environments supported by each platform.
Unfortunately, this is one area in which implementations haven't caught up to the standards.
Confusing things further are the name changes that were made between Version 2 and Version
3 of the Single UNIX Specification.
To get around this, applications can set the _FILE_OFFSET_BITS constant to 64 to enable
64-bit offsets. Doing so changes the definition of off_t to be a 64-bit signed integer. Setting
_FILE_OFFSET_BITS to 32 enables 32-bit file offsets. Be aware, however, that although all four
platforms discussed in this text support both 32-bit and 64-bit file offsets by setting the
_FILE_OFFSET_BITS constant to the desired value, this is not guaranteed to be portable.
Note that even though you might enable 64-bit file offsets, your ability to create a file larger
than 2 GB ($2^{31}-1$ bytes) depends on the underlying file system type.
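A minimal sketch of the _FILE_OFFSET_BITS approach, assuming a platform that honors the macro (glibc-based Linux does; elsewhere it may have no effect). The definition must precede every system header, because the headers consult it when deciding how to define off_t. off_t_bits is a hypothetical name.

```c
/* Must appear before any system header is included. */
#define _FILE_OFFSET_BITS 64
#include <limits.h>
#include <sys/types.h>

/*
 * Sketch: report the width of off_t in this compilation environment.
 * Expected to be 64 where the macro is honored, even in a 32-bit build.
 */
int
off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```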
3.7. read Function
Data is read from an open file with the read function.
#include <unistd.h>
ssize_t read(int filedes, void *buf, size_t nbytes);
Returns: number of bytes read, 0 if end of file, -1 on error
If the read is successful, the number of bytes read is returned. If the end of file is
encountered, 0 is returned.
There are several cases in which the number of bytes actually read is less than the amount
requested:

When reading from a regular file, if the end of file is reached before the requested
number of bytes has been read. For example, if 30 bytes remain until the end of file
and we try to read 100 bytes, read returns 30. The next time we call read, it will return
0 (end of file).

When reading from a terminal device. Normally, up to one line is read at a time. (We'll
see how to change this in Chapter 18.)

When reading from a network. Buffering within the network may cause less than the
requested amount to be returned.

When reading from a pipe or FIFO. If the pipe contains fewer bytes than requested,
read will return only what is available.

When reading from a record-oriented device. Some record-oriented devices, such as
magnetic tape, can return up to a single record at a time.

When interrupted by a signal and a partial amount of data has already been read. We
discuss this further in Section 10.5.
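Because any of these cases can return fewer bytes than requested, code that needs an exact amount usually loops. The sketch below, with the hypothetical name read_fully, keeps reading until it has nbytes, reaches end of file, or sees an error other than EINTR (a read interrupted before any data was transferred). It is one common way to structure such a loop, not a standard function.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: read until `nbytes` have arrived, end of file, or a real error.
 * Returns the total read (short only at end of file), or -1 on error.
 */
ssize_t
read_fully(int fd, void *buf, size_t nbytes)
{
    char   *p = buf;
    size_t  left = nbytes;

    while (left > 0) {
        ssize_t n = read(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        p += n;                 /* a short read just continues the loop */
        left -= (size_t)n;
    }
    return (ssize_t)(nbytes - left);
}
```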
The read operation starts at the file's current offset. Before a successful return, the offset is
incremented by the number of bytes actually read.
POSIX.1 changed the prototype for this function in several ways. The classic definition is
int read(int filedes, char *buf, unsigned nbytes);

First, the second argument was changed from a char * to a void * to be consistent
with ISO C: the type void * is used for generic pointers.

Next, the return value must be a signed integer (ssize_t) to return a positive byte
count, 0 (for end of file), or -1 (for an error).

Finally, the third argument historically has been an unsigned integer, to allow a 16-bit
implementation to read or write up to 65,534 bytes at a time. With the 1990 POSIX.1
standard, the primitive system data type ssize_t was introduced to provide the signed
return value, and the unsigned size_t was used for the third argument. (Recall the
SSIZE_MAX constant from Section 2.5.2.)
3.8. write Function
Data is written to an open file with the write function.
#include <unistd.h>
ssize_t write(int filedes, const void *buf, size_t nbytes);
Returns: number of bytes written if OK, -1 on error
The return value is usually equal to the nbytes argument; otherwise, an error has occurred. A
common cause for a write error is either filling up a disk or exceeding the file size limit for a
given process (Section 7.11 and Exercise 10.11).
For a regular file, the write starts at the file's current offset. If the O_APPEND option was
specified when the file was opened, the file's offset is set to the current end of file before
each write operation. After a successful write, the file's offset is incremented by the number
of bytes actually written.
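A sketch showing that O_APPEND overrides an explicit lseek: after rewinding to offset 0, the next write still lands at the end of file. The function name append_ignores_seek and the two 3-byte writes are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: returns the file offset after the second write (expected 6,
 * the new end of file), or -1 on error.
 */
off_t
append_ignores_seek(const char *path)
{
    off_t off;
    int   fd;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0644)) < 0)
        return -1;
    if (write(fd, "abc", 3) != 3 ||
        lseek(fd, 0, SEEK_SET) == -1 ||     /* try to rewind ... */
        write(fd, "xyz", 3) != 3) {         /* ... but this still appends */
        close(fd);
        return -1;
    }
    off = lseek(fd, 0, SEEK_CUR);           /* offset is at end of file */
    close(fd);
    return off;
}
```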
3.9. I/O Efficiency
The program in Figure 3.4 copies a file, using only the read and write functions. The following
caveats apply to this program.
Figure 3.4. Copy standard input to standard output
#include "apue.h"

#define BUFFSIZE    4096

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)
        err_sys("read error");

    exit(0);
}

It reads from standard input and writes to standard output, assuming that these have
been set up by the shell before this program is executed. Indeed, all normal UNIX
system shells provide a way to open a file for reading on standard input and to create
(or rewrite) a file on standard output. This prevents the program from having to open
the input and output files.

Many applications assume that standard input is file descriptor 0 and that standard
output is file descriptor 1. In this example, we use the two defined names,
STDIN_FILENO and STDOUT_FILENO, from <unistd.h>.

The program doesn't close the input file or output file. Instead, the program uses the
feature of the UNIX kernel that closes all open file descriptors in a process when that
process terminates.

This example works for both text files and binary files, since there is no difference
between the two to the UNIX kernel.
One question we haven't answered, however, is how we chose the BUFFSIZE value. Before
answering that, let's run the program using different values for BUFFSIZE. Figure 3.5 shows the
results for reading a 103,316,352-byte file, using 20 different buffer sizes.
The file was read using the program shown in Figure 3.4, with standard output redirected to
/dev/null. The file system used for this test was the Linux ext2 file system with 4,096-byte
blocks. (The st_blksize value, which we describe in Section 4.12, is 4,096.) This accounts for
the minimum in the system time occurring at a BUFFSIZE of 4,096. Increasing the buffer size
beyond this has little positive effect.
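One way to apply this result is to size the copy buffer from the st_blksize value that fstat reports for the input, as in the sketch below. copy_fd is a hypothetical name, and the 4,096-byte fallback is an assumption borrowed from the test system above, not a value any standard requires.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: copy `in` to `out` with a buffer sized from the input's
 * st_blksize (falling back to 4,096). Returns 0 on success, -1 on error.
 */
int
copy_fd(int in, int out)
{
    struct stat st;
    size_t      bufsize = 4096;
    char       *buf;
    ssize_t     n;
    int         rc = 0;

    if (fstat(in, &st) == 0 && st.st_blksize > 0)
        bufsize = (size_t)st.st_blksize;
    if ((buf = malloc(bufsize)) == NULL)
        return -1;

    while ((n = read(in, buf, bufsize)) > 0)
        if (write(out, buf, (size_t)n) != n) {
            rc = -1;            /* short or failed write */
            break;
        }
    if (n < 0)
        rc = -1;                /* read error */
    free(buf);
    return rc;
}
```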
Most file systems support some kind of read-ahead to improve performance. When sequential
reads are detected, the system tries to read in more data than an application requests,
assuming that the application will read it shortly. From the last few entries in Figure 3.5, it
appears that read-ahead in ext2 stops having an effect after 128 KB.
Figure 3.5. Timing results for reading with different buffer sizes on Linux
BUFFSIZE    User CPU   System CPU   Clock time        #loops
           (seconds)    (seconds)    (seconds)
       1      124.89       161.65       288.64   103,316,352
       2       63.10        80.96       145.81    51,658,176
       4       31.84        40.00        72.75    25,829,088
       8       15.17        21.01        36.85    12,914,544
      16        7.86        10.27        18.76     6,457,272
      32        4.13         5.01         9.76     3,228,636
      64        2.11         2.48         6.76     1,614,318
     128        1.01         1.27         6.82       807,159
     256        0.56         0.62         6.80       403,579
     512        0.27         0.41         7.03       201,789
   1,024        0.17         0.23         7.84       100,894
   2,048        0.05         0.19         6.82        50,447
   4,096        0.03         0.16         6.86        25,223
   8,192        0.01         0.18         6.67        12,611
  16,384        0.02         0.18         6.87         6,305
  32,768        0.00         0.16         6.70         3,152
  65,536        0.02         0.19         6.92         1,576
 131,072        0.00         0.16         6.84           788
 262,144        0.01         0.25         7.30           394
 524,288        0.00         0.22         7.35           198
We'll return to this timing example later in the text. In Section 3.14, we show the effect of
synchronous writes; in Section 5.8, we compare these unbuffered I/O times with the standard
I/O library.
Beware when trying to measure the performance of programs that read and write files. The
operating system will try to cache the file incore, so if you measure the performance of the
program repeatedly, the successive timings will likely be better than the first. This is because
the first run will cause the file to be entered into the system's cache, and successive runs will
access the file from the system's cache instead of from the disk. (The term incore means in
main memory. Back in the day, a computer's main memory was built out of ferrite core. This
is where the phrase "core dump" comes from: the main memory image of a program stored in a
file on disk for diagnosis.)
In the tests reported in Figure 3.5, each run with a different buffer size was made using a
different copy of the file so that the current run didn't find the data in the cache from the
previous run. The files are large enough that not all of them remain in the cache (the test
system was configured with 512 MB of RAM).
3.10. File Sharing
The UNIX System supports the sharing of open files among different processes. Before
describing the dup function, we need to describe this sharing. To do this, we'll examine the
data structures used by the kernel for all I/O.
The following description is conceptual. It may or may not match a particular implementation.
Refer to Bach [1986] for a discussion of these structures in System V. McKusick et al. [1996]
describe these structures in 4.4BSD. McKusick and Neville-Neil [2005] cover FreeBSD 5.2. For
a similar discussion of Solaris, see Mauro and McDougall [2001].
The kernel uses three data structures to represent an open file, and the relationships among
them determine the effect one process has on another with regard to file sharing.
1. Every process has an entry in the process table. Within each process table entry is a
   table of open file descriptors, which we can think of as a vector, with one entry per
   descriptor. Associated with each file descriptor are
   a. The file descriptor flags (close-on-exec; refer to Figure 3.6 and Section 3.14)
   b. A pointer to a file table entry
2. The kernel maintains a file table for all open files. Each file table entry contains
   a. The file status flags for the file, such as read, write, append, sync, and
      nonblocking; more on these in Section 3.14
   b. The current file offset
   c. A pointer to the v-node table entry for the file
3. Each open file (or device) has a v-node structure that contains information about the
   type of file and pointers to functions that operate on the file. For most files, the
   v-node also contains the i-node for the file. This information is read from disk when the
   file is opened, so that all the pertinent information about the file is readily available.
   For example, the i-node contains the owner of the file, the size of the file, pointers to
   where the actual data blocks for the file are located on disk, and so on. (We talk more
   about i-nodes in Section 4.14 when we describe the typical UNIX file system in more
   detail.)
Linux has no v-node. Instead, a generic i-node structure is used. Although the
implementations differ, the v-node is conceptually the same as a generic i-node. Both
point to an i-node structure specific to the file system.
We're ignoring some implementation details that don't affect our discussion. For example, the
table of open file descriptors can be stored in the user area instead of the process table.
These tables can be implemented in numerous ways: they need not be arrays; they could be
implemented as linked lists of structures, for example. These implementation details don't
affect our discussion of file sharing.
Figure 3.6 shows a pictorial arrangement of these three tables for a single process that has
two different files open: one file is open on standard input (file descriptor 0), and the other is
open on standard output (file descriptor 1). The arrangement of these three tables has
existed since the early versions of the UNIX System [Thompson 1978], and this arrangement
is critical to the way files are shared among processes. We'll return to this figure in later
chapters, when we describe additional ways that files are shared.
Figure 3.6. Kernel data structures for open files
The v-node was invented to provide support for multiple file system types on a single
computer system. This work was done independently by Peter Weinberger (Bell Laboratories)
and Bill Joy (Sun Microsystems). Sun called this the Virtual File System and called the file
system-independent portion of the i-node the v-node [Kleiman 1986]. The v-node propagated
through various vendor implementations as support for Sun's Network File System (NFS) was
added. The first release from Berkeley to provide v-nodes was the 4.3BSD Reno release, when
NFS was added.
In SVR4, the v-node replaced the file system-independent i-node of SVR3. Solaris is derived
from SVR4 and thus uses v-nodes.
Instead of splitting the data structures into a v-node and an i-node, Linux uses a file
system-independent i-node and a file system-dependent i-node.
If two independent processes have the same file open, we could have the arrangement shown
in Figure 3.7. We assume here that the first process has the file open on descriptor 3 and that
the second process has that same file open on descriptor 4. Each process that opens the file
gets its own file table entry, but only a single v-node table entry is required for a given file.
One reason each process gets its own file table entry is so that each process has its own
current offset for the file.
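We can observe separate offsets without two processes, since each call to open creates its own file table entry even within a single process. This sketch (hypothetical name second_offset_after_read) reads through one descriptor and checks that the other descriptor's offset has not moved; it assumes path names a readable file of at least 4 bytes.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: open `path` twice, read 4 bytes through the first descriptor,
 * and return the second descriptor's offset (expected 0), or -1 on error.
 */
off_t
second_offset_after_read(const char *path)
{
    char  buf[4];
    off_t off = -1;
    int   fd1, fd2;

    if ((fd1 = open(path, O_RDONLY)) < 0)
        return -1;
    if ((fd2 = open(path, O_RDONLY)) < 0) {
        close(fd1);
        return -1;
    }
    if (read(fd1, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) /* fd1: 4 */
        off = lseek(fd2, 0, SEEK_CUR);      /* fd2 has its own entry */
    close(fd1);
    close(fd2);
    return off;
}
```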
Figure 3.7. Two independent processes with the same file open
Given these data structures, we now need to be more specific about what happens with
certain operations that we've already described.

After each write is complete, the current file offset in the file table entry is
incremented by the number of bytes written. If this causes the current file offset to
exceed the current file size, the current file size in the i-node table entry is set to the
current file offset (that is, the file is extended).

If a file is opened with the O_APPEND flag, a corresponding flag is set in the file status
flags of the file table entry. Each time a write is performed for a file with this append
flag set, the current file offset in the file table entry is first set to the current file size
from the i-node table entry. This forces every write to be appended to the current end
of file.

If a file is positioned to its current end of file using lseek, all that happens is the
current file offset in the file table entry is set to the current file size from the i-node
table entry. (Note that this is not the same as if the file was opened with the O_APPEND
flag, as we will see in Section 3.11.)

The lseek function modifies only the current file offset in the file table entry. No I/O
takes place.
It is possible for more than one file descriptor entry to point to the same file table entry, as
we'll see when we discuss the dup function in Section 3.12. This also happens after a fork
when the parent and the child share the same file table entry for each open descriptor
(Section 8.3).
Note the difference in scope between the file descriptor flags and the file status flags. The
former apply only to a single descriptor in a single process, whereas the latter apply to all
descriptors in any process that point to the given file table entry. When we describe the fcntl
function in Section 3.14, we'll see how to fetch and modify both the file descriptor flags and
the file status flags.
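The scope difference can be demonstrated with a descriptor and its duplicate, which share one file table entry. The sketch below uses fcntl ahead of its full description in Section 3.14: it sets O_APPEND (a file status flag) and FD_CLOEXEC (a file descriptor flag) through one descriptor and inspects the other. flag_scope_demo is a hypothetical name.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch: returns 1 if the duplicate sees O_APPEND but not FD_CLOEXEC
 * after both were set through `fd`, 0 otherwise, -1 on error.
 */
int
flag_scope_demo(int fd)
{
    int dupfd, fl, fdflags, result = -1;

    if ((dupfd = dup(fd)) < 0)
        return -1;
    if ((fl = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, fl | O_APPEND) == -1 ||          /* shared */
        (fdflags = fcntl(fd, F_GETFD)) == -1 ||
        fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) == -1)     /* per fd */
        goto out;

    fl = fcntl(dupfd, F_GETFL);
    fdflags = fcntl(dupfd, F_GETFD);
    if (fl != -1 && fdflags != -1)
        result = (fl & O_APPEND) && !(fdflags & FD_CLOEXEC);
out:
    close(dupfd);
    return result;
}
```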
Everything that we've described so far in this section works fine for multiple processes that
are reading the same file. Each process has its own file table entry with its own current file
offset. Unexpected results can arise, however, when multiple processes write to the same file.
To see how to avoid some surprises, we need to understand the concept of atomic
operations.
3.11. Atomic Operations
Appending to a File
Consider a single process that wants to append to the end of a file. Older versions of the
UNIX System didn't support the O_APPEND option to open, so the program was coded as follows:
if (lseek(fd, 0L, 2) < 0)           /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100)     /* and write */
    err_sys("write error");
This works fine for a single process, but problems arise if multiple processes use this technique
to append to the same file. (This scenario can arise if multiple instances of the same program
are appending messages to a log file, for example.)
Assume that two independent processes, A and B, are appending to the same file. Each has
opened the file but without the O_APPEND flag. This gives us the same picture as Figure 3.7.
Each process has its own file table entry, but they share a single v-node table entry. Assume
that process A does the lseek and that this sets the current offset for the file for process A
to byte offset 1,500 (the current end of file). Then the kernel switches processes, and B
continues running. Process B then does the lseek, which sets the current offset for the file for
process B to byte offset 1,500 also (the current end of file). Then B calls write, which
increments B's current file offset for the file to 1,600. Because the file's size has been
extended, the kernel also updates the current file size in the v-node to 1,600. Then the kernel
switches processes and A resumes. When A calls write, the data is written starting at the
current file offset for A, which is byte offset 1,500. This overwrites the data that B wrote to
the file.
The problem here is that our logical operation of "position to the end of file and write" requires
two separate function calls (as we've shown it). The solution is to have the positioning to the
current end of file and the write be an atomic operation with regard to other processes. Any
operation that requires more than one function call cannot be atomic, as there is always the
possibility that the kernel can temporarily suspend the process between the two function calls
(as we assumed previously).
The UNIX System provides an atomic way to do this operation if we set the O_APPEND flag
when a file is opened. As we described in the previous section, this causes the kernel to
position the file to its current end of file before each write. We no longer have to call lseek
before each write.
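The A/B scenario can be reproduced within one process by opening the file twice, which gives two file table entries as in Figure 3.7, and issuing the calls in the unlucky order described above. This is a simulation under that assumption (the file starts empty rather than at 1,500 bytes); interleaved_append is a hypothetical name.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: both "processes" position to EOF, then B writes 100 bytes, then
 * A writes 100 bytes. Returns the final file size: 200 if `flags`
 * includes O_APPEND, 100 without it (A overwrote B), or -1 on error.
 */
off_t
interleaved_append(const char *path, int flags)
{
    char  buf[100] = {0};
    off_t size = -1;
    int   a, b;

    if ((a = open(path, O_WRONLY | O_CREAT | O_TRUNC | flags, 0644)) < 0)
        return -1;
    if ((b = open(path, O_WRONLY | flags)) < 0) {
        close(a);
        return -1;
    }
    if (lseek(a, 0, SEEK_END) != -1 &&      /* A: offset = EOF = 0 */
        lseek(b, 0, SEEK_END) != -1 &&      /* B: offset = EOF = 0 */
        write(b, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&  /* size 100 */
        write(a, buf, sizeof(buf)) == (ssize_t)sizeof(buf))    /* A writes */
        size = lseek(a, 0, SEEK_END);
    close(a);
    close(b);
    return size;
}
```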
pread and pwrite Functions
The Single UNIX Specification includes XSI extensions that allow applications to seek and
perform I/O atomically. These extensions are pread and pwrite.
#include <unistd.h>
ssize_t pread(int filedes, void *buf, size_t nbytes, off_t offset);
Returns: number of bytes read, 0 if end of file, -1 on error
ssize_t pwrite(int filedes, const void *buf, size_t nbytes, off_t offset);
Returns: number of bytes written if OK, -1 on error
Calling pread is equivalent to calling lseek followed by a call to read, with the following
exceptions.

There is no way to interrupt the two operations using pread.

The current file offset is not updated.
Calling pwrite is equivalent to calling lseek followed by a call to write, with similar exceptions.
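A short sketch of positional I/O: pwrite and pread operate at an explicit offset, and the descriptor's current file offset stays where the preceding write left it. The function name positional_io_demo and the data are illustrative.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX/XSI declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: write "0123456789" (offset becomes 10), overwrite byte 2 with
 * pwrite, read it back with pread into *out, and return the current
 * offset (expected still 10), or -1 on error.
 */
off_t
positional_io_demo(const char *path, char *out)
{
    off_t off = -1;
    int   fd;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0)
        return -1;
    if (write(fd, "0123456789", 10) == 10 &&    /* offset: 10 */
        pwrite(fd, "X", 1, 2) == 1 &&           /* offset still 10 */
        pread(fd, out, 1, 2) == 1)              /* offset still 10 */
        off = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```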
Creating a File
We saw another example of an atomic operation when we described the O_CREAT and O_EXCL
options for the open function. When both of these options are specified, the open will fail if the
file already exists. We also said that the check for the existence of the file and the creation
of the file was performed as an atomic operation. If we didn't have this atomic operation, we
might try
if ((fd = open(pathname, O_WRONLY)) < 0) {
    if (errno == ENOENT) {
        if ((fd = creat(pathname, mode)) < 0)
            err_sys("creat error");
    } else {
        err_sys("open error");
    }
}
The problem occurs if the file is created by another process between the open and the creat.
If the file is created by another process between these two function calls, and if that other
process writes something to the file, that data is erased when this creat is executed.
Combining the test for existence and the creation into a single atomic operation avoids this
problem.
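With O_CREAT and O_EXCL, the racy sequence above collapses into a single call. create_exclusive is a hypothetical wrapper; callers check for errno == EEXIST to learn that the file already existed.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <errno.h>          /* callers test errno against EEXIST */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: create `path` only if it does not already exist. The existence
 * check and the creation are one atomic open, so no other process can
 * create the file in between. Returns the new descriptor, or -1 with
 * errno set (EEXIST if the file was already there).
 */
int
create_exclusive(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```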
In general, the term atomic operation refers to an operation that might be composed of
multiple steps. If the operation is performed atomically, either all the steps are performed, or
none are performed. It must not be possible for a subset of the steps to be performed. We'll
return to the topic of atomic operations when we describe the link function (Section 4.15)
and record locking (Section 14.3).
3.12. dup and dup2 Functions
An existing file descriptor is duplicated by either of the following functions.
#include <unistd.h>
int dup(int filedes);
int dup2(int filedes, int filedes2);
Both return: new file descriptor if OK, -1 on error
The new file descriptor returned by dup is guaranteed to be the lowest-numbered available file
descriptor. With dup2, we specify the value of the new descriptor with the filedes2 argument.
If filedes2 is already open, it is first closed. If filedes equals filedes2, then dup2 returns filedes2
without closing it.
The new file descriptor that is returned as the value of the functions shares the same file
table entry as the filedes argument. We show this in Figure 3.8.
Figure 3.8. Kernel data structures after dup(1)
In this figure, we're assuming that when it's started, the process executes
newfd = dup(1);
We assume that the next available descriptor is 3 (which it probably is, since 0, 1, and 2 are
opened by the shell). Because both descriptors point to the same file table entry, they share
the same file status flags (read, write, append, and so on) and the same current file offset.
Each descriptor has its own set of file descriptor flags. As we describe in the next section, the
close-on-exec file descriptor flag for the new descriptor is always cleared by the dup
functions.
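A sketch illustrating that the two descriptors share the current file offset: bytes written through the duplicate advance the offset seen through the original. It assumes fd was just opened on an empty regular file, so the offset starts at 0; dup_shares_offset is a hypothetical name.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: write 3 bytes through `fd`, duplicate it, write 2 bytes through
 * the copy, and return the offset seen through `fd` (expected 5), or -1.
 */
off_t
dup_shares_offset(int fd)
{
    off_t off = -1;
    int   newfd;

    if (write(fd, "abc", 3) != 3 || (newfd = dup(fd)) < 0)
        return -1;
    if (write(newfd, "de", 2) == 2)         /* advances the shared offset */
        off = lseek(fd, 0, SEEK_CUR);       /* 3 + 2 = 5 */
    close(newfd);                           /* fd itself stays open */
    return off;
}
```

Because dup2 returns filedes2 without closing it when the two arguments are equal, a call such as dup2(fd, fd) leaves fd usable afterward.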
Another way to duplicate a descriptor is with the fcntl function, which we describe in Section
3.14. Indeed, the call
dup(filedes);
is equivalent to
fcntl(filedes, F_DUPFD, 0);
Similarly, the call
dup2(filedes, filedes2);
is equivalent to
close(filedes2);
fcntl(filedes, F_DUPFD, filedes2);
In this last case, the dup2 is not exactly the same as a close followed by an fcntl. The
differences are as follows.
1. dup2 is an atomic operation, whereas the alternate form involves two function calls. It
   is possible in the latter case to have a signal catcher called between the close and
   the fcntl that could modify the file descriptors. (We describe signals in Chapter 10.)
2. There are some errno differences between dup2 and fcntl.
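Where the exact descriptor number does not matter, the lower bound that F_DUPFD takes is often enough, and it closes nothing, so the race described in item 1 does not arise. The wrapper below is a hypothetical convenience name for that single fcntl call.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch: duplicate `fd` onto the lowest free descriptor >= `minfd`
 * (dup is the minfd == 0 case). Unlike dup2, no descriptor is closed and
 * the caller does not choose the exact number. Returns the new
 * descriptor, or -1 on error.
 */
int
dup_at_least(int fd, int minfd)
{
    return fcntl(fd, F_DUPFD, minfd);
}
```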
The dup2 system call originated with Version 7 and propagated through the BSD
releases. The fcntl method for duplicating file descriptors appeared with System III
and continued with System V. SVR3.2 picked up the dup2 function, and 4.2BSD picked
up the fcntl function and the F_DUPFD functionality. POSIX.1 requires both dup2 and
the F_DUPFD feature of fcntl.
3.13. sync, fsync, and fdatasync Functions
Traditional implementations of the UNIX System have a buffer cache or page cache in the
kernel through which most disk I/O passes. When we write data to a file, the data is normally
copied by the kernel into one of its buffers and queued for writing to disk at some later time.
This is called delayed write. (Chapter 3 of Bach [1986] discusses this buffer cache in detail.)
The kernel eventually writes all the delayed-write blocks to disk, normally when it needs to
reuse the buffer for some other disk block. To ensure consistency of the file system on disk
with the contents of the buffer cache, the sync, fsync, and fdatasync functions are provided.
#include <unistd.h>
int fsync(int filedes);
int fdatasync(int filedes);
Returns: 0 if OK, -1 on error
void sync(void);
The sync function simply queues all the modified block buffers for writing and returns; it does
not wait for the disk writes to take place.
The function sync is normally called periodically (usually every 30 seconds) from a system
daemon, often called update. This guarantees regular flushing of the kernel's block buffers. The
command sync(1) also calls the sync function.
The function fsync refers only to a single file, specified by the file descriptor filedes, and waits
for the disk writes to complete before returning. The intended use of fsync is for an
application, such as a database, that needs to be sure that the modified blocks have been
written to the disk.
The fdatasync function is similar to fsync, but it affects only the data portions of a file. With
fsync, the file's attributes are also updated synchronously.
All four of the platforms described in this book support sync and fsync. However, FreeBSD
5.2.1 and Mac OS X 10.3 do not support fdatasync.
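A sketch of the database-style use of fsync described above: append a record and wait for the disk writes to complete before reporting success. It uses fsync rather than fdatasync because, as noted, not every platform provides fdatasync; append_record_durably is a hypothetical name, and the 0644 creation mode is an assumption.

```c
#define _XOPEN_SOURCE 700   /* assumed: expose POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch: append `nbytes` from `buf` to `path` and return 0 only after
 * fsync reports the data and attributes written; -1 on any error.
 */
int
append_record_durably(const char *path, const void *buf, size_t nbytes)
{
    int fd, rc = -1;

    if ((fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644)) < 0)
        return -1;
    if (write(fd, buf, nbytes) == (ssize_t)nbytes &&
        fsync(fd) == 0)             /* wait for the disk writes */
        rc = 0;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```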
3.14. fcntl Function
The fcntl function can change the properties of a file that is already open.
#include <fcntl.h>
int fcntl(int filedes, int cmd, ... /* int arg */ );
Returns: depends on cmd if OK (see following), -1 on error
In the examples in this section, the third argument is always an integer, corresponding to the
comment in the function prototype just shown. But when we describe record locking in
Section 14.3, the third argument becomes a pointer to a structure.
The fcntl function is used for five different purposes.
1. Duplicate an existing descriptor (cmd = F_DUPFD)
2. Get/set file descriptor flags (cmd = F_GETFD or F_SETFD)
3. Get/set file status flags (cmd = F_GETFL or F_SETFL)
4. Get/set asynchronous I/O ownership (cmd = F_GETOWN or F_SETOWN)
5. Get/set record locks (cmd = F_GETLK, F_SETLK, or F_SETLKW)
We'll now describe the first seven of these ten cmd values. (We'll wait until Section 14.3 to
describe the last three, which deal with record locking.) Refer to Figure 3.6, since we'll be
referring to both the file descriptor flags associated with each file descriptor in the process
table entry and the file status flags associated with each file table entry.
F_DUPFD Duplicate the file descriptor filedes. The new file descriptor is returned as the
value of the function. It is the lowest-numbered descriptor that is not already
open, that is greater than or equal to the third argument (taken as an integer).
The new descriptor shares the same file table entry as filedes. (Refer to Figure 3.8.)
But the new descriptor has its own set of file descriptor flags, and its FD_CLOEXEC
file descriptor flag is cleared. (This means that the descriptor is left open across
an exec, which we discuss in Chapter 8.)
F_GETFD Return the file descriptor flags for filedes as the value of the function. Currently,
only one file descriptor flag is defined: the FD_CLOEXEC flag.
F_SETFD Set the file descriptor flags for filedes. The new flag value is set from the third
argument (taken as an integer).
Be aware that some existing programs that deal with the file descriptor flags don't
use the constant FD_CLOEXEC. Instead, the programs set the flag to either 0 (don't
close-on-exec, the default) or 1 (do close-on-exec).
F_GETFL Return the file status flags for filedes as the value of the function. We described
the file status flags when we described the open function. They are listed in Figure
3.9.
Figure 3.9. File status flags for fcntl
File status flag   Description
O_RDONLY           open for reading only
O_WRONLY           open for writing only
O_RDWR             open for reading and writing
O_APPEND           append on each write
O_NONBLOCK         nonblocking mode
O_SYNC             wait for writes to complete (data and attributes)
O_DSYNC            wait for writes to complete (data only)
O_RSYNC            synchronize reads and writes
O_FSYNC            wait for writes to complete (FreeBSD and Mac OS X only)
O_ASYNC            asynchronous I/O (FreeBSD and Mac OS X only)
Unfortunately, the three access-mode flags (O_RDONLY, O_WRONLY, and O_RDWR) are not
separate bits that can be tested. (As we mentioned earlier, these three often
have the values 0, 1, and 2, respectively, for historical reasons. Also, these three
values are mutually exclusive; a file can have only one of the three enabled.)
Therefore, we must first use the O_ACCMODE mask to obtain the access-mode bits
and then compare the result against any of the three values.
F_SETFL
Set the file status flags to the value of the third argument (taken as an integer).
The only flags that can be changed are O_APPEND, O_NONBLOCK, O_SYNC, O_DSYNC,
O_RSYNC, O_FSYNC, and O_ASYNC.
F_GETOWN Get the process ID or process group ID currently receiving the SIGIO and SIGURG
signals. We describe these asynchronous I/O signals in Section 14.6.2.
F_SETOWN Set the process ID or process group ID to receive the SIGIO and SIGURG signals. A
positive arg specifies a process ID. A negative arg implies a process group ID
equal to the absolute value of arg.
The return value from fcntl depends on the command. All commands return -1 on an error or
some other value if OK. The following four commands have special return values: F_DUPFD,
F_GETFD, F_GETFL, and F_GETOWN. The first returns the new file descriptor, the next two return
the corresponding flags, and the final one returns a positive process ID or a negative process
group ID.
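The following sketch shows two of these special return values in use. F_DUPFD returns the new descriptor. F_GETFD returns the descriptor flags, which set_cloexec then modifies and stores back with F_SETFD. Both function names are hypothetical. set_cloexec uses the FD_CLOEXEC constant rather than the literal 1, and it preserves any other descriptor flags an implementation might define.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/*
 * F_DUPFD: return the lowest unused descriptor >= minfd that shares fd's
 * file table entry (offset and file status flags), or -1 on error.
 * The new descriptor starts with FD_CLOEXEC cleared.
 */
int dup_at_least(int fd, int minfd)
{
    return fcntl(fd, F_DUPFD, minfd);
}

/*
 * F_GETFD/F_SETFD: turn on close-on-exec for fd without disturbing any
 * other descriptor flags. Returns 0 on success, -1 on error.
 */
int set_cloexec(int fd)
{
    int flags;

    if ((flags = fcntl(fd, F_GETFD, 0)) < 0)   /* value of F_GETFD is the flags */
        return -1;
    flags |= FD_CLOEXEC;                        /* the constant, not 1 */
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}
```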
Example
The program in Figure 3.10 takes a single command-line argument that specifies a file
descriptor and prints a description of selected file flags for that descriptor.
Note that we use the feature test macro _POSIX_C_SOURCE and conditionally compile the file
access flags that are not part of POSIX.1. The following script shows the operation of the
program, when invoked from bash (the Bourne-again shell). Results vary, depending on which
shell you use.
$ ./a.out 0 < /dev/tty
read only
$ ./a.out 1 > temp.foo
$ cat temp.foo
write only
$ ./a.out 2 2>>temp.foo
write only, append
$ ./a.out 5 5<>temp.foo
read write
The clause 5<>temp.foo opens the file temp.foo for reading and writing on file descriptor 5.
Figure 3.10. Print file flags for specified descriptor
#include "apue.h"
#include <fcntl.h>

int
main(int argc, char *argv[])
{
    int     val;

    if (argc != 2)
        err_quit("usage: a.out <descriptor#>");

    if ((val = fcntl(atoi(argv[1]), F_GETFL, 0)) < 0)
        err_sys("fcntl error for fd %d", atoi(argv[1]));

    switch (val & O_ACCMODE) {
    case O_RDONLY:
        printf("read only");
        break;

    case O_WRONLY:
        printf("write only");
        break;

    case O_RDWR:
        printf("read write");
        break;

    default:
        err_dump("unknown access mode");
    }

    if (val & O_APPEND)
        printf(", append");
    if (val & O_NONBLOCK)
        printf(", nonblocking");
#if defined(O_SYNC)
    if (val & O_SYNC)
        printf(", synchronous writes");
#endif
#if !defined(_POSIX_C_SOURCE) && defined(O_FSYNC)
    if (val & O_FSYNC)
        printf(", synchronous writes");
#endif
    putchar('\n');
    exit(0);
}
Example
When we modify either the file descriptor flags or the file status flags, we must be careful to
fetch the existing flag value, modify it as desired, and then set the new flag value. We can't
simply do an F_SETFD or an F_SETFL, as this could turn off flag bits that were previously set.
Figure 3.11 shows a function that sets one or more of the file status flags for a descriptor.
If we change the middle statement to
    val &= ~flags;          /* turn flags off */
we have a function named clr_fl, which we'll use in some later examples. This statement
logically ANDs the one's complement of flags with the current val.
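Here is a self-contained sketch of clr_fl. It assumes errors should be reported through the return value (0 or -1) rather than through err_sys from apue.h. It follows the same fetch, modify, set sequence as set_fl.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/*
 * Turn off the file status flags in 'flags' for fd.
 * Returns 0 on success, -1 on error (errno set by fcntl).
 */
int clr_fl(int fd, int flags)
{
    int val;

    if ((val = fcntl(fd, F_GETFL, 0)) < 0)   /* fetch current flags */
        return -1;
    val &= ~flags;                           /* turn flags off */
    if (fcntl(fd, F_SETFL, val) < 0)         /* store the result */
        return -1;
    return 0;
}
```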
If we call set_fl from Figure 3.4 by adding the line
set_fl(STDOUT_FILENO, O_SYNC);
at the beginning of the program, we'll turn on the synchronous-write flag. This causes each
write to wait for the data to be written to disk before returning. Normally in the UNIX System,
a write only queues the data for writing; the actual disk write operation can take place
sometime later. A database system is a likely candidate for using O_SYNC, so that it knows on
return from a write that the data is actually on the disk, in case of an abnormal system
failure.
We expect the O_SYNC flag to increase the clock time when the program runs. To test this, we
can run the program in Figure 3.4, copying 98.5 MB of data from one file on disk to another
and compare this with a version that does the same thing with the O_SYNC flag set. The results
from a Linux system using the ext2 file system are shown in Figure 3.12.
The six rows in Figure 3.12 were all measured with a BUFFSIZE of 4,096. The results in Figure
3.5 were measured reading a disk file and writing to /dev/null, so there was no disk output.
The second row in Figure 3.12 corresponds to reading a disk file and writing to another disk
file. This is why the first and second rows in Figure 3.12 are different. The system time
increases when we write to a disk file, because the kernel now copies the data from our
process and queues the data for writing by the disk driver. We expect the clock time to
increase also when we write to a disk file, but it doesn't increase significantly for this test,
which indicates that our writes go to the system cache, and we don't measure the cost to
actually write the data to disk.
When we enable synchronous writes, the system time and the clock time should increase
significantly. As the third row shows, the time for writing synchronously is about the same as
when we used delayed writes. This implies that the Linux ext2 file system isn't honoring the
O_SYNC flag. This suspicion is supported by the sixth line, which shows that the time to do
synchronous writes followed by a call to fsync is just as large as calling fsync after writing the
file without synchronous writes (line 5). After writing a file synchronously, we expect that a
call to fsync will have no effect.
Figure 3.13 shows timing results for the same tests on Mac OS X 10.3. Note that the times
match our expectations: synchronous writes are far more expensive than delayed writes, and
using fsync with synchronous writes makes no measurable difference. Note also that adding a
call to fsync at the end of the delayed writes makes no measurable difference. It is likely that
the operating system flushed previously written data to disk as we were writing new data to
the file, so by the time that we called fsync, very little work was left to be done.
Compare fsync and fdatasync, which update a file's contents when we say so, with the O_SYNC
flag, which updates a file's contents every time we write to the file.
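A comparison like the one in Figures 3.12 and 3.13 needs one write path per strategy. The sketch below is an illustration built on assumptions and is not the program used for those measurements. The enum names and write_blocks are hypothetical. The sketch passes O_SYNC to open rather than setting it later with F_SETFL, because the Linux fcntl(2) manual page documents that F_SETFL cannot change O_SYNC or O_DSYNC there. To get the user, system, and clock columns, you could time a driver program that calls it, for example with the shell's time command.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names for the strategies compared in the timing figures. */
enum sync_mode {
    DELAYED,        /* plain writes; the kernel flushes them later */
    FSYNC_AT_END,   /* plain writes, then one fsync before close */
    OSYNC_EACH      /* O_SYNC at open: every write waits for the disk */
};

/*
 * Write nblocks blocks of blksize bytes (at most 8192) to path, which is
 * created or truncated. Returns the number of bytes written, or -1.
 */
long write_blocks(const char *path, enum sync_mode mode, int nblocks,
                  size_t blksize)
{
    char buf[8192];
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    long total = 0;
    int fd, i;

    if (blksize > sizeof(buf))
        return -1;
    memset(buf, 'x', blksize);

    if (mode == OSYNC_EACH)
        flags |= O_SYNC;            /* set at open time, not via F_SETFL */
    if ((fd = open(path, flags, 0644)) < 0)
        return -1;

    for (i = 0; i < nblocks; i++) {
        ssize_t n = write(fd, buf, blksize);
        if (n != (ssize_t)blksize) {
            close(fd);
            return -1;
        }
        total += n;
    }

    if (mode == FSYNC_AT_END && fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return total;
}
```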
Figure 3.11. Turn on one or more of the file status flags for a descriptor
#include "apue.h"
#include <fcntl.h>

void
set_fl(int fd, int flags) /* flags are file status flags to turn on */
{
    int     val;

    if ((val = fcntl(fd, F_GETFL, 0)) < 0)
        err_sys("fcntl F_GETFL error");

    val |= flags;       /* turn on flags */

    if (fcntl(fd, F_SETFL, val) < 0)
        err_sys("fcntl F_SETFL error");
}
Figure 3.12. Linux ext2 timing results using various synchronization mechanisms
Operation                                         User CPU (s)  System CPU (s)  Clock time (s)
read time from Figure 3.5 for BUFFSIZE = 4,096            0.03            0.16            6.86
normal write to disk file                                 0.02            0.30            6.87
write to disk file with O_SYNC set                        0.03            0.30            6.83
write to disk followed by fdatasync                       0.03            0.42           18.28
write to disk followed by fsync                           0.03            0.37           17.95
write to disk with O_SYNC set followed by fsync           0.05            0.44           17.95
Figure 3.13. Mac OS X timing results using various synchronization mechanisms
Operation                                         User CPU (s)  System CPU (s)  Clock time (s)
write to /dev/null                                        0.06            0.79            4.33
normal write to disk file                                 0.05            3.56           14.40
write to disk file with O_FSYNC set                       0.13            9.53           22.48
write to disk followed by fsync                           0.11            3.31           14.12
write to disk with O_FSYNC set followed by fsync          0.17            9.14           22.12
With this example, we see the need for fcntl. Our program operates on a descriptor (standard
output), never knowing the name of the file that was opened by the shell on that descriptor.
We can't set the O_SYNC flag when the file is opened, since the shell opened the file. With
fcntl, we can modify the properties of a descriptor, knowing only the descriptor for the open
file. We'll see another need for fcntl when we describe nonblocking pipes (Section 15.2),
since all we have with a pipe is a descriptor.
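Here is a sketch of the pipe case. The code turns on O_NONBLOCK for a descriptor using the same fetch, modify, set sequence as set_fl, then reads without waiting. The names set_nonblocking and try_read are hypothetical, and so is try_read's use of -2 to mean "no data yet". Both are conventions chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into nonblocking mode, knowing nothing but the descriptor. */
int set_nonblocking(int fd)
{
    int val = fcntl(fd, F_GETFL, 0);

    if (val < 0)
        return -1;
    return fcntl(fd, F_SETFL, val | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read without waiting: returns the byte count, 0 at end of file,
 * -2 if no data is available yet, or -1 on any other error.
 */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;          /* would have blocked */
    return n;
}
```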
3.15. ioctl Function
The ioctl function has always been the catchall for I/O operations. Anything that couldn't be
expressed using one of the other functions in this chapter usually ended up being specified
with an ioctl. Terminal I/O was the biggest user of this function. (When we get to Chapter 18,
we'll see that POSIX.1 has replaced the terminal I/O operations with separate functions.)
#include <unistd.h>      /* System V */
#include <sys/ioctl.h>   /* BSD and Linux */
#include <stropts.h>     /* XSI STREAMS */
int ioctl(int filedes, int request, ...);
Returns: -1 on error, something else if OK
The ioctl function is included in the Single UNIX Specification only as an extension for dealing
with STREAMS devices [Rago 1993]. UNIX System implementations, however, use it for many
miscellaneous device operations. Some implementations have even extended it for use with
regular files.
The prototype that we show corresponds to POSIX.1. FreeBSD 5.2.1 and Mac OS X 10.3
declare the second argument as an unsigned long. This detail doesn't matter, since the
second argument is always a #defined name from a header.
For the ISO C prototype, an ellipsis is used for the remaining arguments. Normally, however,
there is only one more argument, and it's usually a pointer to a variable or a structure.
In this prototype, we show only the headers required for the function itself. Normally,
additional device-specific headers are required. For example, the ioctl commands for terminal
I/O, beyond the basic operations specified by POSIX.1, all require the <termios.h> header.
Each device driver can define its own set of ioctl commands. The system, however, provides
generic ioctl commands for different classes of devices. Examples of some of the categories
for these generic ioctl commands supported in FreeBSD are summarized in Figure 3.14.
Figure 3.14. Common FreeBSD ioctl operations
Category      Constant names  Header              Number of ioctls
disk labels   DIOxxx          <sys/disklabel.h>    6
file I/O      FIOxxx          <sys/filio.h>        9
mag tape I/O  MTIOxxx         <sys/mtio.h>        11
socket I/O    SIOxxx          <sys/sockio.h>      60
terminal I/O  TIOxxx          <sys/ttycom.h>      44
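Here is an example of one of the generic FIOxxx requests. The sketch uses FIONREAD to ask how many bytes are waiting to be read on a descriptor. FIONREAD is widely available: Linux declares it through <sys/ioctl.h>, and BSD-derived systems define it in <sys/filio.h>. It is not specified by POSIX.1, however, so treat this as a platform-dependent sketch. The wrapper name bytes_ready is hypothetical.

```c
#include <sys/ioctl.h>          /* ioctl(); FIONREAD on Linux */
#include <unistd.h>
#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/filio.h>          /* FIONREAD on BSD-derived systems */
#endif

/* Return how many bytes can be read from fd right now, or -1 on error. */
int bytes_ready(int fd)
{
    int n;

    if (ioctl(fd, FIONREAD, &n) < 0)    /* third argument: pointer to int */
        return -1;
    return n;
}
```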
The mag tape operations allow us to write end-of-file marks on a tape, rewind a tape, space
forward over a specified number of files or records, and the like. None of these operations is
easily expressed in terms of the other functions in the chapter (read, write, lseek, and so on),
so the easiest way to handle these devices has always been to access their operations using
ioctl.
We use the ioctl function in Section 14.4 when we describe the STREAMS system, in Section
18.12 to fetch and set the size of a terminal's window, and in Section 19.7 when we access
the advanced features of pseudo terminals.
3.16. /dev/fd
Newer systems provide a directory named /dev/fd whose entries are files named 0, 1, 2, and
so on. Opening the file /dev/fd/n is equivalent to duplicating descriptor n, assuming that
descriptor n is open.
The /dev/fd feature was developed by Tom Duff and appeared in the 8th Edition of the
Research UNIX System. It is supported by all of the systems described in this book: FreeBSD
5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9. It is not part of POSIX.1.
In the function call
fd = open("/dev/fd/0", mode);
most systems ignore the specified mode, whereas others require that it be a subset of the
mode used when the referenced file (standard input, in this case) was originally opened.
Because the previous open is equivalent to
fd = dup(0);
the descriptors 0 and fd share the same file table entry (Figure 3.8). For example, if descriptor
0 was opened read-only, we can only read on fd. Even if the system ignores the open mode,
and the call
fd = open("/dev/fd/0", O_RDWR);
succeeds, we still can't write to fd.
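The sketch below builds a /dev/fd pathname for a descriptor and opens it. open_dev_fd is a hypothetical helper, and the sketch assumes the system provides /dev/fd. Implementations differ in the details. On Linux, for example, /dev/fd is a link into /proc, and opening a regular file through it creates a new file table entry instead of sharing the existing one. The test below therefore uses a pipe, which behaves the same either way.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open /dev/fd/n with the given open flags; returns a descriptor or -1. */
int open_dev_fd(int n, int oflag)
{
    char path[32];

    if (snprintf(path, sizeof(path), "/dev/fd/%d", n) >= (int)sizeof(path))
        return -1;                      /* cannot happen for an int, but check */
    return open(path, oflag);
}
```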
We can also call creat with a /dev/fd pathname argument, as well as specifying O_CREAT in a
call to open. This allows a program that calls creat to still work if the pathname argument is
/dev/fd/1, for example.
Some systems provide the pathnames /dev/stdin, /dev/stdout, and /dev/stderr. These
pathnames are equivalent to /dev/fd/0, /dev/fd/1, and /dev/fd/2.
The main use of the /dev/fd files is from the shell. It allows programs that use pathname
arguments to handle standard input and standard output in the same manner as other
pathnames. For example, the cat(1) program specifically looks for an input filename of - and
uses this to mean standard input. The command
filter file2 | cat file1 - file3 | lpr
is an example. First, cat reads file1, next its standard input (the output of the filter
program on file2), then file3. If /dev/fd is supported, the special handling of - can be
removed from cat, and we can enter
filter file2 | cat file1 /dev/fd/0 file3 | lpr
The special meaning of - as a command-line argument to refer to the standard input or
standard output is a kludge that has crept into many programs. There are also problems if we
specify - as the first file, as it looks like the start of another command-line option. Using
/dev/fd is a step toward uniformity and cleanliness.
3.17. Summary
This chapter has described the basic I/O functions provided by the UNIX System. These are
often called the unbuffered I/O functions because each read or write invokes a system call
into the kernel. Using only read and write, we looked at the effect of various I/O sizes on the
amount of time required to read a file. We also looked at several ways to flush written data to
disk and their effect on application performance.
Atomic operations were introduced when multiple processes append to the same file and when
multiple processes create the same file. We also looked at the data structures used by the
kernel to share information about open files. We'll return to these data structures later in the
text.
We also described the ioctl and fcntl functions. We return to both of these functions in
Chapter 14, where we'll use ioctl with the STREAMS I/O system, and fcntl for record locking.
Exercises
3.1
When reading or writing a disk file, are the functions described in this chapter
really unbuffered? Explain.
3.2
Write your own dup2 function that performs the same service as the dup2
function described in Section 3.12, without calling the fcntl function. Be sure
to handle errors correctly.
3.3
Assume that a process executes the following three function calls:
fd1 = open(pathname, oflags);
fd2 = dup(fd1);
fd3 = open(pathname, oflags);
Draw the resulting picture, similar to Figure 3.8. Which descriptors are affected
by an fcntl on fd1 with a command of F_SETFD? Which descriptors are affected
by an fcntl on fd1 with a command of F_SETFL?
3.4
The following sequence of code has been observed in various programs:
dup2(fd, 0);
dup2(fd, 1);
dup2(fd, 2);
if (fd > 2)
close(fd);
To see why the if test is needed, assume that fd is 1 and draw a picture of
what happens to the three descriptor entries and the corresponding file table
entry with each call to dup2. Then assume that fd is 3 and draw the same
picture.
3.5
The Bourne shell, Bourne-again shell, and Korn shell notation
digit1>&digit2
says to redirect descriptor digit1 to the same file as descriptor digit2. What is
the difference between the two commands
./a.out > outfile 2>&1
./a.out 2>&1 > outfile
(Hint: the shells process their command lines from left to right.)
3.6
If you open a file for read-write with the append flag, can you still read from
anywhere in the file using lseek? Can you use lseek to replace existing data in
the file? Write a program to verify this.
Chapter 4. Files and Directories
Section 4.1. Introduction
Section 4.2. stat, fstat, and lstat Functions
Section 4.3. File Types
Section 4.4. Set-User-ID and Set-Group-ID
Section 4.5. File Access Permissions
Section 4.6. Ownership of New Files and Directories
Section 4.7. access Function
Section 4.8. umask Function
Section 4.9. chmod and fchmod Functions
Section 4.10. Sticky Bit
Section 4.11. chown, fchown, and lchown Functions
Section 4.12. File Size
Section 4.13. File Truncation
Section 4.14. File Systems
Section 4.15. link, unlink, remove, and rename Functions
Section 4.16. Symbolic Links
Section 4.17. symlink and readlink Functions
Section 4.18. File Times
Section 4.19. utime Function
Section 4.20. mkdir and rmdir Functions
Section 4.21. Reading Directories
Section 4.22. chdir, fchdir, and getcwd Functions
Section 4.23. Device Special Files
Section 4.24. Summary of File Access Permission Bits
Section 4.25. Summary
Exercises
4.1. Introduction
In the previous chapter we covered the basic functions that perform I/O. The discussion
centered around I/O for regular files: opening a file, and reading or writing a file. We'll now look
at additional features of the file system and the properties of a file. We'll start with the stat
functions and go through each member of the stat structure, looking at all the attributes of a
file. In this process, we'll also describe each of the functions that modify these attributes:
change the owner, change the permissions, and so on. We'll also look in more detail at the
structure of a UNIX file system and symbolic links. We finish this chapter with the functions
that operate on directories, and we develop a function that descends through a directory
hierarchy.
4.2. stat, fstat, and lstat Functions
The discussion in this chapter centers around the three stat functions and the information
they return.
#include <sys/stat.h>
int stat(const char *restrict pathname, struct stat *restrict buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *restrict pathname, struct stat *restrict buf);
All three return: 0 if OK, -1 on error
Given a pathname, the stat function returns a structure of information about the named file.
The fstat function obtains information about the file that is already open on the descriptor
filedes. The lstat function is similar to stat, but when the named file is a symbolic link, lstat
returns information about the symbolic link, not the file referenced by the symbolic link. (We'll
need lstat in Section 4.21 when we walk down a directory hierarchy. We describe symbolic
links in more detail in Section 4.16.)
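The sketch below makes the stat/lstat distinction concrete. It reports whether a pathname is a symbolic link whose target is missing. In that case lstat succeeds, because it examines the link itself, while stat follows the link and fails with ENOENT. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Return 1 if path is a symbolic link whose target does not exist,
 * 0 if it is not such a link, or -1 if path itself cannot be examined.
 */
int is_dangling_symlink(const char *path)
{
    struct stat lbuf, sbuf;

    if (lstat(path, &lbuf) < 0)         /* information about the link itself */
        return -1;
    if (!S_ISLNK(lbuf.st_mode))
        return 0;                       /* not a symbolic link at all */
    if (stat(path, &sbuf) == 0)         /* stat follows the link */
        return 0;                       /* target exists */
    return errno == ENOENT ? 1 : -1;
}
```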
The second argument is a pointer to a structure that we must supply. The function fills in the
structure pointed to by buf. The definition of the structure can differ among implementations,
but it could look like
struct stat {
    mode_t    st_mode;     /* file type & mode (permissions) */
    ino_t     st_ino;      /* i-node number (serial number) */
    dev_t     st_dev;      /* device number (file system) */
    dev_t     st_rdev;     /* device number for special files */
    nlink_t   st_nlink;    /* number of links */
    uid_t     st_uid;      /* user ID of owner */
    gid_t     st_gid;      /* group ID of owner */
    off_t     st_size;     /* size in bytes, for regular files */
    time_t    st_atime;    /* time of last access */
    time_t    st_mtime;    /* time of last modification */
    time_t    st_ctime;    /* time of last file status change */
    blksize_t st_blksize;  /* best I/O block size */
    blkcnt_t  st_blocks;   /* number of disk blocks allocated */
};
The st_rdev, st_blksize, and st_blocks fields are not required by POSIX.1. They are defined
as XSI extensions in the Single UNIX Specification.
Note that each member is specified by a primitive system data type (see Section 2.8). We'll
go through each member of this structure to examine the attributes of a file.
The biggest user of the stat functions is probably the ls -l command, to learn all the
information about a file.
4.3. File Types
We've talked about two different types of files so far: regular files and directories. Most files
on a UNIX system are either regular files or directories, but there are additional types of files.
The types are:
1. Regular file. The most common type of file, which contains data of some form. There is
   no distinction to the UNIX kernel whether this data is text or binary. Any interpretation
   of the contents of a regular file is left to the application processing the file.
   One notable exception to this is with binary executable files. To execute a program,
   the kernel must understand its format. All binary executable files conform to a format
   that allows the kernel to identify where to load a program's text and data.
2. Directory file. A file that contains the names of other files and pointers to information
   on these files. Any process that has read permission for a directory file can read the
   contents of the directory, but only the kernel can write directly to a directory file.
   Processes must use the functions described in this chapter to make changes to a
   directory.
3. Block special file. A type of file providing buffered I/O access in fixed-size units to
   devices such as disk drives.
4. Character special file. A type of file providing unbuffered I/O access in variable-sized
   units to devices. All devices on a system are either block special files or character
   special files.
5. FIFO. A type of file used for communication between processes. It's sometimes called a
   named pipe. We describe FIFOs in Section 15.5.
6. Socket. A type of file used for network communication between processes. A socket
   can also be used for non-network communication between processes on a single host.
   We use sockets for interprocess communication in Chapter 16.
7. Symbolic link. A type of file that points to another file. We talk more about symbolic
   links in Section 4.16.
The type of a file is encoded in the st_mode member of the stat structure. We can determine
the file type with the macros shown in Figure 4.1. The argument to each of these macros is
the st_mode member from the stat structure.
Figure 4.1. File type macros in <sys/stat.h>
Macro        Type of file
S_ISREG()    regular file
S_ISDIR()    directory file
S_ISCHR()    character special file
S_ISBLK()    block special file
S_ISFIFO()   pipe or FIFO
S_ISLNK()    symbolic link
S_ISSOCK()   socket
POSIX.1 allows implementations to represent interprocess communication (IPC) objects, such
as message queues and semaphores, as files. The macros shown in Figure 4.2 allow us to
determine the type of IPC object from the stat structure. Instead of taking the st_mode
member as an argument, these macros differ from those in Figure 4.1 in that their argument is
a pointer to the stat structure.
Figure 4.2. IPC type macros in <sys/stat.h>
Macro           Type of object
S_TYPEISMQ()    message queue
S_TYPEISSEM()   semaphore
S_TYPEISSHM()   shared memory object
Message queues, semaphores, and shared memory objects are discussed in Chapter 15.
However, none of the various implementations of the UNIX System discussed in this book
represent these objects as files.
Example
The program in Figure 4.3 prints the type of file for each command-line argument.
Sample output from Figure 4.3 is
$ ./a.out /etc/passwd /etc /dev/initctl /dev/log /dev/tty \
> /dev/scsi/host0/bus0/target0/lun0/cd /dev/cdrom
/etc/passwd: regular
/etc: directory
/dev/initctl: fifo
/dev/log: socket
/dev/tty: character special
/dev/scsi/host0/bus0/target0/lun0/cd: block special
/dev/cdrom: symbolic link
(Here, we have explicitly entered a backslash at the end of the first command line, telling the
shell that we want to continue entering the command on another line. The shell then prompts
us with its secondary prompt, >, on the next line.) We have specifically used the lstat
function instead of the stat function to detect symbolic links. If we used the stat function,
we would never see symbolic links.
To compile this program on a Linux system, we must define _GNU_SOURCE to include the
definition of the S_ISSOCK macro.
Figure 4.3. Print type of file for each command-line argument
#include "apue.h"

int
main(int argc, char *argv[])
{
    int         i;
    struct stat buf;
    char        *ptr;

    for (i = 1; i < argc; i++) {
        printf("%s: ", argv[i]);
        if (lstat(argv[i], &buf) < 0) {
            err_ret("lstat error");
            continue;
        }
        if (S_ISREG(buf.st_mode))
            ptr = "regular";
        else if (S_ISDIR(buf.st_mode))
            ptr = "directory";
        else if (S_ISCHR(buf.st_mode))
            ptr = "character special";
        else if (S_ISBLK(buf.st_mode))
            ptr = "block special";
        else if (S_ISFIFO(buf.st_mode))
            ptr = "fifo";
        else if (S_ISLNK(buf.st_mode))
            ptr = "symbolic link";
        else if (S_ISSOCK(buf.st_mode))
            ptr = "socket";
        else
            ptr = "** unknown mode **";
        printf("%s\n", ptr);
    }
    exit(0);
}
Historically, early versions of the UNIX System didn't provide the S_ISxxx macros. Instead, we
had to logically AND the st_mode value with the mask S_IFMT and then compare the result with
the constants whose names are S_IFxxx. Most systems define this mask and the related
constants in the file <sys/stat.h>. If we examine this file, we'll find the S_ISDIR macro defined
something like
#define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR)
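For comparison with the S_ISxxx macros, here is a sketch of the older style. It masks st_mode with S_IFMT and switches on the S_IFxxx constants. Returning the type letter that ls -l prints is an illustrative choice. The sketch defines _XOPEN_SOURCE because S_IFMT and the S_IFxxx constants are XSI extensions rather than base POSIX.1 definitions.

```c
#define _XOPEN_SOURCE 700   /* S_IFMT and S_IFxxx are XSI definitions */
#include <sys/stat.h>

/* Classify a mode the pre-S_ISxxx way; returns the letter ls -l shows. */
char type_char(mode_t mode)
{
    switch (mode & S_IFMT) {            /* isolate the file-type bits */
    case S_IFREG:  return '-';
    case S_IFDIR:  return 'd';
    case S_IFCHR:  return 'c';
    case S_IFBLK:  return 'b';
    case S_IFIFO:  return 'p';
    case S_IFLNK:  return 'l';
    case S_IFSOCK: return 's';
    default:       return '?';          /* type unknown to this sketch */
    }
}
```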
We've said that regular files are predominant, but it is interesting to see what percentage of
the files on a given system are of each file type. Figure 4.4 shows the counts and
percentages for a Linux system that is used as a single-user workstation. This data was
obtained from the program that we show in Section 4.21.
Figure 4.4. Counts and percentages of different file types
File type             Count   Percentage (%)
regular file        226,856   88.22
directory            23,017    8.95
symbolic link         6,442    2.51
character special       447    0.17
block special           312    0.12
socket                   69    0.03
FIFO                      1    0.00
4.4. Set-User-ID and Set-Group-ID
Every process has six or more IDs associated with it. These are shown in Figure 4.5.
Figure 4.5. User IDs and group IDs associated with each process
real user ID, real group ID: who we really are
effective user ID, effective group ID, supplementary group IDs: used for file access permission checks
saved set-user-ID, saved set-group-ID: saved by exec functions
• The real user ID and real group ID identify who we really are. These two fields are
taken from our entry in the password file when we log in. Normally, these values don't
change during a login session, although there are ways for a superuser process to
change them, which we describe in Section 8.11.
• The effective user ID, effective group ID, and supplementary group IDs determine our
file access permissions, as we describe in the next section. (We defined supplementary
group IDs in Section 1.8.)
• The saved set-user-ID and saved set-group-ID contain copies of the effective user ID
and the effective group ID when a program is executed. We describe the function of
these two saved values when we describe the setuid function in Section 8.11.
The saved IDs are required with the 2001 version of POSIX.1. They used to be optional
in older versions of POSIX. An application can test for the constant _POSIX_SAVED_IDS
at compile time or can call sysconf with the _SC_SAVED_IDS argument at runtime, to see
whether the implementation supports this feature.
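Below is a minimal sketch of the runtime check, together with a related test of whether the effective user ID currently differs from the real one. Both function names are hypothetical. A compile-time check would instead test whether <unistd.h> defines _POSIX_SAVED_IDS.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Runtime check: 1 if saved set-user-ID/set-group-ID are supported, else 0. */
int have_saved_ids(void)
{
    return sysconf(_SC_SAVED_IDS) > 0;      /* -1 means "not supported" */
}

/* 1 if the effective user ID differs from the real user ID, as it does
   while a set-user-ID program owned by another user is running. */
int running_with_other_euid(void)
{
    return getuid() != geteuid();
}
```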
Normally, the effective user ID equals the real user ID, and the effective group ID equals the
real group ID.
Every file has an owner and a group owner. The owner is specified by the st_uid member of
the stat structure; the group owner, by the st_gid member.
When we execute a program file, the effective user ID of the process is usually the real user
ID, and the effective group ID is usually the real group ID. But the capability exists to set a
special flag in the file's mode word (st_mode) that says "when this file is executed, set the
effective user ID of the process to be the owner of the file (st_uid)." Similarly, another bit
can be set in the file's mode word that causes the effective group ID to be the group owner
of the file (st_gid). These two bits in the file's mode word are called the set-user-ID bit and
the set-group-ID bit.
For example, if the owner of the file is the superuser and if the file's set-user-ID bit is set,
then while that program file is running as a process, it has superuser privileges. This happens
regardless of the real user ID of the process that executes the file. As an example, the UNIX
System program that allows anyone to change his or her password, passwd(1), is a set-user-ID
program. This is required so that the program can write the new password to the password
file, typically either /etc/passwd or /etc/shadow, files that should be writable only by the
superuser. Because a process that is running set-user-ID to some other user usually assumes
extra permissions, it must be written carefully. We'll discuss these types of programs in more
detail in Chapter 8.
Returning to the stat function, the set-user-ID bit and the set-group-ID bit are contained in
the file's st_mode value. These two bits can be tested against the constants S_ISUID and
S_ISGID.
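For instance, a sketch along these lines (report_setid_bits is a hypothetical helper, not a standard function) tests both bits after a call to stat:

```c
#include <sys/stat.h>

/* Report whether pathname has its set-user-ID and set-group-ID bits on.
 * Returns 0 on success, -1 if stat fails. */
int
report_setid_bits(const char *pathname, int *has_suid, int *has_sgid)
{
    struct stat sb;

    if (stat(pathname, &sb) < 0)
        return -1;
    *has_suid = (sb.st_mode & S_ISUID) != 0;
    *has_sgid = (sb.st_mode & S_ISGID) != 0;
    return 0;
}
```

On many systems, calling it on the passwd program (its path varies, commonly /usr/bin/passwd) should report has_suid as 1.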
4.5. File Access Permissions
The st_mode value also encodes the access permission bits for the file. When we say file, we
mean any of the file types that we described earlier. All the file types (directories, character
special files, and so on) have permissions. Many people think only of regular files as having
access permissions.
There are nine permission bits for each file, divided into three categories. These are shown in
Figure 4.6.
Figure 4.6. The nine file access permission bits, from <sys/stat.h>
st_mode mask   Meaning
S_IRUSR        user-read
S_IWUSR        user-write
S_IXUSR        user-execute
S_IRGRP        group-read
S_IWGRP        group-write
S_IXGRP        group-execute
S_IROTH        other-read
S_IWOTH        other-write
S_IXOTH        other-execute
The term user in the first three rows in Figure 4.6 refers to the owner of the file. The chmod(1)
command, which is typically used to modify these nine permission bits, allows us to specify u
for user (owner), g for group, and o for other. Some books refer to these three as owner,
group, and world; this is confusing, as the chmod command uses o to mean other, not owner.
We'll use the terms user, group, and other, to be consistent with the chmod command.
The three categories in Figure 4.6 (read, write, and execute) are used in various ways by
different functions. We'll summarize them here, and return to them when we describe the
actual functions.

The first rule is that whenever we want to open any type of file by name, we must
have execute permission in each directory mentioned in the name, including the current
directory, if it is implied. This is why the execute permission bit for a directory is often
called the search bit.
For example, to open the file /usr/include/stdio.h, we need execute permission in the
directory /, execute permission in the directory /usr, and execute permission in the
directory /usr/include. We then need appropriate permission for the file itself,
depending on how we're trying to open it: read-only, read-write, and so on.
If the current directory is /usr/include, then we need execute permission in the
current directory to open the file stdio.h. This is an example of the current directory
being implied, not specifically mentioned. It is identical to our opening the file ./stdio.h.
Note that read permission for a directory and execute permission for a directory mean
different things. Read permission lets us read the directory, obtaining a list of all the
filenames in the directory. Execute permission lets us pass through the directory when
it is a component of a pathname that we are trying to access. (We need to search the
directory to look for a specific filename.)
Another example of an implicit directory reference is if the PATH environment variable,
described in Section 8.10, specifies a directory that does not have execute permission
enabled. In this case, the shell will never find executable files in that directory.

The read permission for a file determines whether we can open an existing file for
reading: the O_RDONLY and O_RDWR flags for the open function.

The write permission for a file determines whether we can open an existing file for
writing: the O_WRONLY and O_RDWR flags for the open function.

We must have write permission for a file to specify the O_TRUNC flag in the open
function.

We cannot create a new file in a directory unless we have write permission and
execute permission in the directory.

To delete an existing file, we need write permission and execute permission in the
directory containing the file. We do not need read permission or write permission for
the file itself.

Execute permission for a file must be on if we want to execute the file using any of the
six exec functions (Section 8.10). The file also has to be a regular file.
The file access tests that the kernel performs each time a process opens, creates, or deletes
a file depend on the owners of the file (st_uid and st_gid), the effective IDs of the process
(effective user ID and effective group ID), and the supplementary group IDs of the process, if
supported. The two owner IDs are properties of the file, whereas the two effective IDs and
the supplementary group IDs are properties of the process. The tests performed by the kernel
are as follows.
1. If the effective user ID of the process is 0 (the superuser), access is allowed. This
gives the superuser free rein throughout the entire file system.
2. If the effective user ID of the process equals the owner ID of the file (i.e., the process
owns the file), access is allowed if the appropriate user access permission bit is set.
Otherwise, permission is denied. By appropriate access permission bit, we mean that if
the process is opening the file for reading, the user-read bit must be on. If the process
is opening the file for writing, the user-write bit must be on. If the process is executing
the file, the user-execute bit must be on.
3. If the effective group ID of the process or one of the supplementary group IDs of the
process equals the group ID of the file, access is allowed if the appropriate group
access permission bit is set. Otherwise, permission is denied.
4. If the appropriate other access permission bit is set, access is allowed. Otherwise,
permission is denied.
These four steps are tried in sequence. Note that if the process owns the file (step 2),
access is granted or denied based only on the user access permissions; the group permissions
are never looked at. Similarly, if the process does not own the file, but belongs to an
appropriate group, access is granted or denied based only on the group access permissions;
the other permissions are not looked at.
4.6. Ownership of New Files and Directories
When we described the creation of a new file in Chapter 3, using either open or creat, we
never said what values were assigned to the user ID and group ID of the new file. We'll see
how to create a new directory in Section 4.20 when we describe the mkdir function. The rules
for the ownership of a new directory are identical to the rules in this section for the ownership
of a new file.
The user ID of a new file is set to the effective user ID of the process. POSIX.1 allows an
implementation to choose one of the following options to determine the group ID of a new file.
1. The group ID of a new file can be the effective group ID of the process.
2. The group ID of a new file can be the group ID of the directory in which the file is
being created.
FreeBSD 5.2.1 and Mac OS X 10.3 always use the group ID of the directory as the
group ID of the new file.
The Linux ext2 and ext3 file systems allow the choice between these two POSIX.1
options to be made on a file system basis, using a special flag to the mount(1)
command. On Linux 2.4.22 (with the proper mount option) and Solaris 9, the group ID
of a new file depends on whether the set-group-ID bit is set for the directory in which
the file is being created. If this bit is set for the directory, the group ID of the new file
is set to the group ID of the directory; otherwise, the group ID of the new file is set to
the effective group ID of the process.
Using the second option (inheriting the group ID of the directory) assures us that all files and
directories created in that directory will have the group ID belonging to the directory. This
group ownership of files and directories will then propagate down the hierarchy from that
point. This is used, for example, in the /var/spool/mail directory on Linux.
As we mentioned, this option for group ownership is the default for FreeBSD 5.2.1 and Mac OS
X 10.3, but an option for Linux and Solaris. Under Linux 2.4.22 and Solaris 9, we have to
enable the set-group-ID bit, and the mkdir function has to propagate a directory's
set-group-ID bit automatically for this to work. (This is described in Section 4.20.)
4.7. access Function
As we described earlier, when we open a file, the kernel performs its access tests based on
the effective user and group IDs. There are times when a process wants to test accessibility
based on the real user and group IDs. This is useful when a process is running as someone
else, using either the set-user-ID or the set-group-ID feature. Even though a process might
be set-user-ID to root, it could still want to verify that the real user can access a given file.
The access function bases its tests on the real user and group IDs. (Replace effective with
real in the four steps at the end of Section 4.5.)
#include <unistd.h>
int access(const char *pathname, int mode);
Returns: 0 if OK, -1 on error
The mode is the bitwise OR of any of the constants shown in Figure 4.7.
Figure 4.7. The mode constants for access function, from <unistd.h>
mode    Description
R_OK    test for read permission
W_OK    test for write permission
X_OK    test for execute permission
F_OK    test for existence of file
Example
Figure 4.8 shows the use of the access function.
Here is a sample session with this program:
$ ls -l a.out
-rwxrwxr-x 1 sar 15945 Nov 30 12:10 a.out
$ ./a.out a.out
read access OK
open for reading OK
$ ls -l /etc/shadow
-r-------- 1 root 1315 Jul 17 2002 /etc/shadow
$ ./a.out /etc/shadow
access error for /etc/shadow: Permission denied
open error for /etc/shadow: Permission denied
$ su                        become superuser
Password:                   enter superuser password
# chown root a.out          change file's user ID to root
# chmod u+s a.out           and turn on set-user-ID bit
# ls -l a.out               check owner and SUID bit
-rwsrwxr-x 1 root 15945 Nov 30 12:10 a.out
# exit                      go back to normal user
$ ./a.out /etc/shadow
access error for /etc/shadow: Permission denied
open for reading OK
In this example, the set-user-ID program can determine that the real user cannot normally
read the file, even though the open function will succeed.
Figure 4.8. Example of access function
#include "apue.h"
#include <fcntl.h>

int
main(int argc, char *argv[])
{
    if (argc != 2)
        err_quit("usage: a.out <pathname>");
    if (access(argv[1], R_OK) < 0)
        err_ret("access error for %s", argv[1]);
    else
        printf("read access OK\n");
    if (open(argv[1], O_RDONLY) < 0)
        err_ret("open error for %s", argv[1]);
    else
        printf("open for reading OK\n");
    exit(0);
}
In the preceding example and in Chapter 8, we'll sometimes switch to become the superuser,
to demonstrate how something works. If you're on a multiuser system and do not have
superuser permission, you won't be able to duplicate these examples completely.
4.8. umask Function
Now that we've described the nine permission bits associated with every file, we can describe
the file mode creation mask that is associated with every process.
The umask function sets the file mode creation mask for the process and returns the previous
value. (This is one of the few functions that doesn't have an error return.)
#include <sys/stat.h>
mode_t umask(mode_t cmask);
Returns: previous file mode creation mask
The cmask argument is formed as the bitwise OR of any of the nine constants from Figure 4.6:
S_IRUSR, S_IWUSR, and so on.
The file mode creation mask is used whenever the process creates a new file or a new
directory. (Recall from Sections 3.3 and 3.4 our description of the open and creat functions.
Both accept a mode argument that specifies the new file's access permission bits.) We
describe how to create a new directory in Section 4.20. Any bits that are on in the file mode
creation mask are turned off in the file's mode.
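The rule in the last sentence amounts to a bitwise AND with the complement of the mask. As a minimal sketch (effective_create_mode is a hypothetical helper; the kernel applies this rule internally), only the nine permission bits of the mask are considered:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Permission bits a new file actually receives: every permission bit
 * that is on in the creation mask is turned off in the requested mode. */
mode_t
effective_create_mode(mode_t requested, mode_t cmask)
{
    return requested & ~(cmask & (S_IRWXU | S_IRWXG | S_IRWXO));
}
```

Applied to the program in Figure 4.9, RWRWRW (0666) with a mask of S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0066) yields 0600, the rw------- shown for bar.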
Example
The program in Figure 4.9 creates two files, one with a umask of 0 and one with a umask that
disables all the group and other permission bits.
If we run this program, we can see how the permission bits have been set.
$ umask                  first print the current file mode creation mask
002
$ ./a.out
$ ls -l foo bar
-rw------- 1 sar 0 Dec 7 21:20 bar
-rw-rw-rw- 1 sar 0 Dec 7 21:20 foo
$ umask                  see if the file mode creation mask changed
002
Figure 4.9. Example of umask function
#include "apue.h"
#include <fcntl.h>

#define RWRWRW (S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)

int
main(void)
{
    umask(0);
    if (creat("foo", RWRWRW) < 0)
        err_sys("creat error for foo");
    umask(S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    if (creat("bar", RWRWRW) < 0)
        err_sys("creat error for bar");
    exit(0);
}
Most users of UNIX systems never deal with their umask value. It is usually set once, on login,
by the shell's start-up file, and never changed. Nevertheless, when writing programs that
create new files, if we want to ensure that specific access permission bits are enabled, we
must modify the umask value while the process is running. For example, if we want to ensure
that anyone can read a file, we should set the umask to 0. Otherwise, the umask value that is
in effect when our process is running can cause permission bits to be turned off.
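One common pattern, sketched below under the assumption that the process is single-threaded (the umask is a per-process attribute, so another thread could create files while it is cleared), is to clear the mask just for the create call and then restore the caller's value. The helper create_exact is hypothetical. The mode argument of open takes effect only when the file is actually created; an existing file keeps its permissions, so a program that must guarantee the bits might call fchmod afterward instead.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create pathname with exactly the permission bits in mode, regardless of
 * the inherited umask. The previous mask is restored before returning.
 * Returns a file descriptor, or -1 on error. */
int
create_exact(const char *pathname, mode_t mode)
{
    mode_t old = umask(0);      /* umask returns the previous mask */
    int fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode);

    umask(old);                 /* put the caller's mask back */
    return fd;
}
```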
In the preceding example, we use the shell's umask command to print the file mode creation
mask before we run the program and after. This shows us that changing the file mode creation
mask of a process doesn't affect the mask of its parent (often a shell). All of the shells have a
built-in umask command that we can use to set or print the current file mode creation mask.
Users can set the umask value to control the default permissions on the files they create. The
value is expressed in octal, with one bit representing one permission to be masked off, as
shown in Figure 4.10. Permissions can be denied by setting the corresponding bits. Some
common umask values are 002 to prevent others from writing your files, 022 to prevent group
members and others from writing your files, and 027 to prevent group members from writing
your files and others from reading, writing, or executing your files.
Figure 4.10. The umask file access permission bits
Mask bit   Meaning
0400       user-read
0200       user-write
0100       user-execute
0040       group-read
0020       group-write
0010       group-execute
0004       other-read
0002       other-write
0001       other-execute
The Single UNIX Specification requires that the shell support a symbolic form of the umask
command. Unlike the octal format, the symbolic format specifies which permissions are to be
allowed (i.e., clear in the file creation mask) instead of which ones are to be denied (i.e., set
in the file creation mask). Compare both forms of the command, shown below.
$ umask                  first print the current file mode creation mask
002
$ umask -S               print the symbolic form
u=rwx,g=rwx,o=rx
$ umask 027              change the file mode creation mask
$ umask -S               print the symbolic form
u=rwx,g=rx,o=
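The relationship between the two forms is mechanical: a permission letter appears in the symbolic output exactly when the corresponding bit from Figure 4.10 is clear in the mask. A sketch of that conversion, using a hypothetical helper umask_symbolic, reproduces the output shown above (002 becomes u=rwx,g=rwx,o=rx, and 027 becomes u=rwx,g=rx,o=).

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write the "umask -S" form of cmask into buf, which must hold at least
 * 18 bytes ("u=rwx,g=rwx,o=rwx" plus the terminating null byte).
 * A letter is printed when its bit is CLEAR in the mask, i.e. allowed. */
void
umask_symbolic(mode_t cmask, char *buf)
{
    static const char who[3] = { 'u', 'g', 'o' };
    static const char sym[3] = { 'r', 'w', 'x' };
    /* Read, write, and execute mask bits for each class (Figure 4.10). */
    static const mode_t bits[3][3] = {
        { S_IRUSR, S_IWUSR, S_IXUSR },
        { S_IRGRP, S_IWGRP, S_IXGRP },
        { S_IROTH, S_IWOTH, S_IXOTH },
    };

    buf[0] = '\0';
    for (int i = 0; i < 3; i++) {
        char part[7];               /* longest piece: ",g=rwx" plus null */
        char *p = part;

        if (i > 0)
            *p++ = ',';
        *p++ = who[i];
        *p++ = '=';
        for (int j = 0; j < 3; j++)
            if ((cmask & bits[i][j]) == 0)
                *p++ = sym[j];
        *p = '\0';
        strcat(buf, part);
    }
}
```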
4.9. chmod and fchmod Functions
These two functions allow us to change the file access permissions for an existing file.
#include <sys/stat.h>
int chmod(const char *pathname, mode_t mode);
int fchmod(int filedes, mode_t mode);
Both return: 0 if OK, -1 on error
The chmod function operates on the specified file, whereas the fchmod function operates on a
file that has already been opened.
To change the permission bits of a file, the effective user ID of the process must be equal to
the owner ID of the file, or the process must have superuser permissions.
The mode is specified as the bitwise OR of the constants shown in Figure 4.11.
Figure 4.11. The mode constants for chmod functions, from <sys/stat.h>
mode       Description
S_ISUID    set-user-ID on execution
S_ISGID    set-group-ID on execution
S_ISVTX    saved-text (sticky bit)
S_IRWXU    read, write, and execute by user (owner)
S_IRUSR    read by user (owner)
S_IWUSR    write by user (owner)
S_IXUSR    execute by user (owner)
S_IRWXG    read, write, and execute by group
S_IRGRP    read by group
S_IWGRP    write by group
S_IXGRP    execute by group
S_IRWXO    read, write, and execute by other (world)
S_IROTH    read by other (world)
S_IWOTH    write by other (world)
S_IXOTH    execute by other (world)
Note that nine of the entries in Figure 4.11 are the nine file access permission bits from Figure
4.6. We've added the two set-ID constants (S_ISUID and S_ISGID), the saved-text constant
(S_ISVTX), and the three combined constants (S_IRWXU, S_IRWXG, and S_IRWXO).
The saved-text bit (S_ISVTX) is not part of POSIX.1. It is defined as an XSI extension in the
Single UNIX Specification. We describe its purpose in the next section.
Example
Recall the final state of the files foo and bar when we ran the program in Figure 4.9 to
demonstrate the umask function:
$ ls -l foo bar
-rw------- 1 sar 0 Dec 7 21:20 bar
-rw-rw-rw- 1 sar 0 Dec 7 21:20 foo
The program shown in Figure 4.12 modifies the mode of these two files.
After running the program in Figure 4.12, we see that the final state of the two files is
$ ls -l foo bar
-rw-r--r-- 1 sar 0 Dec 7 21:20 bar
-rw-rwSrw- 1 sar 0 Dec 7 21:20 foo
In this example, we have set the permissions of the file bar to an absolute value, regardless of
the current permission bits. For the file foo, we set the permissions relative to their current
state. To do this, we first call stat to obtain the current permissions and then modify them.
We have explicitly turned on the set-group-ID bit and turned off the group-execute bit. Note
that the ls command lists the group-execute permission as S to signify that the set-group-ID
bit is set without the group-execute bit being set.
On Solaris, the ls command displays an l instead of an S to indicate that mandatory file and
record locking has been enabled for this file. This applies only to regular files, but we'll discuss
this more in Section 14.3.
Finally, note that the time and date listed by the ls command did not change after we ran the
program in Figure 4.12. We'll see in Section 4.18 that the chmod function updates only the time
that the i-node was last changed. By default, ls -l lists the time when the contents of
the file were last modified.
Figure 4.12. Example of chmod function
#include "apue.h"

int
main(void)
{
    struct stat statbuf;

    /* turn on set-group-ID and turn off group-execute */
    if (stat("foo", &statbuf) < 0)
        err_sys("stat error for foo");
    if (chmod("foo", (statbuf.st_mode & ~S_IXGRP) | S_ISGID) < 0)
        err_sys("chmod error for foo");

    /* set absolute mode to "rw-r--r--" */
    if (chmod("bar", S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) < 0)
        err_sys("chmod error for bar");
    exit(0);
}
The chmod functions automatically clear two of the permission bits under the following
conditions:

On systems, such as Solaris, that place special meaning on the sticky bit when used
with regular files, if we try to set the sticky bit (S_ISVTX) on a regular file and do not
have superuser privileges, the sticky bit in the mode is automatically turned off. (We
describe the sticky bit in the next section.) This means that only the superuser can
set the sticky bit of a regular file. The reason is to prevent malicious users from setting
the sticky bit and adversely affecting system performance.
On FreeBSD 5.2.1, Mac OS X 10.3, and Solaris 9, only the superuser can set the sticky
bit on a regular file. Linux 2.4.22 places no such restriction on the setting of the sticky
bit, because the bit has no meaning when applied to regular files on Linux. Although
the bit also has no meaning when applied to regular files on FreeBSD and Mac OS X,
these systems prevent everyone but the superuser from setting it on a regular file.

It is possible that the group ID of a newly created file is a group that the calling
process does not belong to. Recall from Section 4.6 that it's possible for the group ID
of the new file to be the group ID of the parent directory. Specifically, if the group ID
of the new file does not equal either the effective group ID of the process or one of
the process's supplementary group IDs and if the process does not have superuser
privileges, then the set-group-ID bit is automatically turned off. This prevents a user
from creating a set-group-ID file owned by a group that the user doesn't belong to.
FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9 add another security
feature to try to prevent misuse of some of the protection bits. If a process that does
not have superuser privileges writes to a file, the set-user-ID and set-group-ID bits
are automatically turned off. If malicious users find a set-group-ID or a set-user-ID file
they can write to, even though they can modify the file, they lose the special
privileges of the file.
4.10. Sticky Bit
The S_ISVTX bit has an interesting history. On versions of the UNIX System that predated
demand paging, this bit was known as the sticky bit. If it was set for an executable program
file, then the first time the program was executed, a copy of the program's text was saved in
the swap area when the process terminated. (The text portion of a program is the machine
instructions.) This caused the program to load into memory more quickly the next time it was
executed, because the swap area was handled as a contiguous file, compared to the possibly
random location of data blocks in a normal UNIX file system. The sticky bit was often set for
common application programs, such as the text editor and the passes of the C compiler.
Naturally, there was a limit to the number of sticky files that could be contained in the swap
area before running out of swap space, but it was a useful technique. The name sticky came
about because the text portion of the file stuck around in the swap area until the system was
rebooted. Later versions of the UNIX System referred to this as the saved-text bit; hence,
the constant S_ISVTX. With today's newer UNIX systems, most of which have a virtual memory
system and a faster file system, the need for this technique has disappeared.
On contemporary systems, the use of the sticky bit has been extended. The Single UNIX
Specification allows the sticky bit to be set for a directory. If the bit is set for a directory, a
file in the directory can be removed or renamed only if the user has write permission for the
directory and one of the following:

Owns the file

Owns the directory

Is the superuser
The directories /tmp and /var/spool/uucppublic are typical candidates for the sticky bit: they
are directories in which any user can typically create files. The permissions for these two
directories are often read, write, and execute for everyone (user, group, and other). But users
should not be able to delete or rename files owned by others.
The saved-text bit is not part of POSIX.1. It is an XSI extension to the basic POSIX.1
functionality defined in the Single UNIX Specification, and is supported by FreeBSD 5.2.1,
Linux 2.4.22, Mac OS X 10.3, and Solaris 9.
Solaris 9 places special meaning on the sticky bit if it is set on a regular file. In this case, if
none of the execute bits is set, the operating system will not cache the contents of the file.
4.11. chown, fchown, and lchown Functions
The chown functions allow us to change the user ID of a file and the group ID of a file.
#include <unistd.h>
int chown(const char *pathname, uid_t owner, gid_t group);
int fchown(int filedes, uid_t owner, gid_t group);
int lchown(const char *pathname, uid_t owner, gid_t group);
All three return: 0 if OK, -1 on error
These three functions operate similarly unless the referenced file is a symbolic link. In that
case, lchown changes the owners of the symbolic link itself, not the file pointed to by the
symbolic link.
The lchown function is an XSI extension to the POSIX.1 functionality defined in the Single
UNIX Specification. As such, all UNIX System implementations are expected to provide it.
If either of the arguments owner or group is -1, the corresponding ID is left unchanged.
Historically, BSD-based systems have enforced the restriction that only the superuser can
change the ownership of a file. This is to prevent users from giving away their files to others,
thereby defeating any disk space quota restrictions. System V, however, has allowed any user
to change the ownership of any files they own.
POSIX.1 allows either form of operation, depending on the value of _POSIX_CHOWN_RESTRICTED.
With Solaris 9, this functionality is a configuration option, whose default value is to enforce
the restriction. FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3 always enforce the chown
restriction.
Recall from Section 2.6 that the _POSIX_CHOWN_RESTRICTED constant can optionally be defined in
the header <unistd.h>, and can always be queried using either the pathconf function or the
fpathconf function. Also recall that this option can depend on the referenced file; it can be
enabled or disabled on a per file system basis. We'll use the phrase, if _POSIX_CHOWN_RESTRICTED
is in effect, to mean if it applies to the particular file that we're talking about, regardless of
whether this actual constant is defined in the header.
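A minimal sketch of the runtime query follows (chown_restricted is a hypothetical helper). It relies on the common pathconf convention for this option: any return value other than -1 means the restriction is in effect, and -1 with errno left unchanged means it is not. Some implementations answer without consulting the file at all, so the per-file-system distinction may not be visible everywhere.

```c
#include <errno.h>
#include <unistd.h>

/* Ask whether _POSIX_CHOWN_RESTRICTED applies to pathname.
 * Returns 1 if restricted, 0 if not, -1 on error. */
int
chown_restricted(const char *pathname)
{
    long val;

    errno = 0;
    val = pathconf(pathname, _PC_CHOWN_RESTRICTED);
    if (val == -1)
        return errno == 0 ? 0 : -1;   /* unchanged errno: not in effect */
    return 1;                         /* any other value: restricted */
}
```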
If _POSIX_CHOWN_RESTRICTED is in effect for the specified file, then
1. Only a superuser process can change the user ID of the file.
2. A nonsuperuser process can change the group ID of the file if the process owns the file
(the effective user ID equals the user ID of the file), owner is specified as -1 or equals
the user ID of the file, and group equals either the effective group ID of the process or
one of the process's supplementary group IDs.
This means that when _POSIX_CHOWN_RESTRICTED is in effect, you can't change the user ID of
other users' files. You can change the group ID of files that you own, but only to groups that
you belong to.
If these functions are called by a process other than a superuser process, on successful
return, both the set-user-ID and the set-group-ID bits are cleared.
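Passing -1 for owner, as described above, is how a process changes only the group of a file it owns. A minimal sketch, using the hypothetical helper name set_group_only; the cast is needed because uid_t is an unsigned type on most systems.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Change only the group of pathname, leaving the owner alone, and report
 * the resulting group ID. With _POSIX_CHOWN_RESTRICTED in effect, 'group'
 * must be our effective GID or one of our supplementary GIDs.
 * Returns 0 on success, -1 on error. */
int
set_group_only(const char *pathname, gid_t group, gid_t *newgid)
{
    struct stat sb;

    if (chown(pathname, (uid_t)-1, group) < 0)
        return -1;
    if (stat(pathname, &sb) < 0)
        return -1;
    *newgid = sb.st_gid;
    return 0;
}
```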
4.12. File Size
The st_size member of the stat structure contains the size of the file in bytes. This field is
meaningful only for regular files, directories, and symbolic links.
Solaris also defines the file size for a pipe as the number of bytes that are available for
reading from the pipe. We'll discuss pipes in Section 15.2.
For a regular file, a file size of 0 is allowed. We'll get an end-of-file indication on the first read
of the file.
For a directory, the file size is usually a multiple of a number, such as 16 or 512. We talk
about reading directories in Section 4.21.
For a symbolic link, the file size is the number of bytes in the filename. For example, in the
following case, the file size of 7 is the length of the pathname usr/lib:
lrwxrwxrwx 1 root 7 Sep 25 07:14 lib -> usr/lib
(Note that symbolic links do not contain the normal C null byte at the end of the name, as the
length is always specified by st_size.)
Most contemporary UNIX systems provide the fields st_blksize and st_blocks. The first is the
preferred block size for I/O for the file, and the latter is the actual number of 512-byte blocks
that are allocated. Recall from Section 3.9 that we encountered the minimum amount of time
required to read a file when we used st_blksize for the read operations. The standard I/O
library, which we describe in Chapter 5, also tries to read or write st_blksize bytes at a time,
for efficiency.
Be aware that different versions of the UNIX System use units other than 512-byte blocks for
st_blocks. Using this value is nonportable.
Holes in a File
In Section 3.6, we mentioned that a regular file can contain "holes." We showed an example of
this in Figure 3.2. Holes are created by seeking past the current end of file and writing some
data. As an example, consider the following:
$ ls -l core
-rw-r--r-- 1 sar 8483248 Nov 18 12:18 core
$ du -s core
272 core
The size of the file core is just over 8 MB, yet the du command reports that the amount of
disk space used by the file is 272 512-byte blocks (139,264 bytes). (The du command on
many BSD-derived systems reports the number of 1,024-byte blocks; Solaris reports the
number of 512-byte blocks.) Obviously, this file has many holes.
As we mentioned in Section 3.6, the read function returns data bytes of 0 for any byte
positions that have not been written. If we execute the following, we can see that the normal
I/O operations read up through the size of the file:
$ wc -c core
8483248 core
The wc(1) command with the -c option counts the number of characters (bytes) in the file.
If we make a copy of this file, using a utility such as cat(1), all these holes are written out as
actual data bytes of 0:
$ cat core > core.copy
$ ls -l core*
-rw-r--r-- 1 sar 8483248 Nov 18 12:18 core
-rw-rw-r-- 1 sar 8483248 Nov 18 12:27 core.copy
$ du -s core*
272 core
16592 core.copy
Here, the actual number of bytes used by the new file is 8,495,104 (512 x 16,592). The
difference between this size and the size reported by ls is caused by the number of blocks
used by the file system to hold pointers to the actual data blocks.
Interested readers should refer to Section 4.2 of Bach [1986], Sections 7.2 and 7.3 of
McKusick et al. [1996] (or Sections 8.2 and 8.3 in McKusick and Neville-Neil [2005]), and
Section 14.2 of Mauro and McDougall [2001] for additional details on the physical layout of
files.
4.13. File Truncation
There are times when we would like to truncate a file by chopping off data at the end of the
file. Emptying a file, which we can do with the O_TRUNC flag to open, is a special case of
truncation.
#include <unistd.h>
int truncate(const char *pathname, off_t length);
int ftruncate(int filedes, off_t length);
Both return: 0 if OK, -1 on error
These two functions truncate an existing file to length bytes. If the previous size of the file
was greater than length, the data beyond length is no longer accessible. If the previous size
was less than length, the effect is system dependent, but XSI-conforming systems will
increase the file size. If the implementation does extend a file, data between the old end of
file and the new end of file will read as 0 (i.e., a hole is probably created in the file).
The ftruncate function is part of POSIX.1. The truncate function is an XSI extension to the
POSIX.1 functionality defined in the Single UNIX Specification.
BSD releases prior to 4.4BSD could only make a file smaller with truncate.
Solaris also includes an extension to fcntl (F_FREESP) that allows us to free any part of a file,
not just a chunk at the end of the file.
We use ftruncate in the program shown in Figure 13.6 when we need to empty a file after
obtaining a lock on the file.
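The following short sketch exercises ftruncate in both directions on an open descriptor. It assumes an XSI-conforming system, so that extending the file is permitted. The helper name resize_and_peek is hypothetical. After the size change, it reads back the last byte with pread, and that byte should be 0 whenever the file has just grown.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the size of the open file fd to length bytes
 * (length must be > 0). Returns the value of the last byte afterward
 * (0..255), or -1 on error. Bytes added by extending the file read as 0.
 */
int
resize_and_peek(int fd, off_t length)
{
    unsigned char c;

    if (ftruncate(fd, length) < 0)
        return -1;
    /* pread reads at an explicit offset without moving the file offset. */
    if (pread(fd, &c, 1, length - 1) != 1)
        return -1;
    return c;
}
```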
4.14. File Systems
To appreciate the concept of links to a file, we need a conceptual understanding of the
structure of the UNIX file system. Understanding the difference between an i-node and a
directory entry that points to an i-node is also useful.
Various implementations of the UNIX file system are in use today. Solaris, for example,
supports several different types of disk file systems: the traditional BSD-derived UNIX file
system (called UFS), a file system (called PCFS) to read and write DOS-formatted diskettes,
and a file system (called HSFS) to read CD file systems. We saw one difference between file
system types in Figure 2.19. UFS is based on the Berkeley fast file system, which we describe
in this section.
We can think of a disk drive being divided into one or more partitions. Each partition can
contain a file system, as shown in Figure 4.13.
Figure 4.13. Disk drive, partitions, and a file system
The i-nodes are fixed-length entries that contain most of the information about a file.
If we examine the i-node and data block portion of a cylinder group in more detail, we could
have what is shown in Figure 4.14.
Figure 4.14. Cylinder group's i-nodes and data blocks in more detail
Note the following points from Figure 4.14.

We show two directory entries that point to the same i-node entry. Every i-node has a
link count that contains the number of directory entries that point to the i-node. Only
when the link count goes to 0 can the file be deleted (i.e., can the data blocks
associated with the file be released). This is why the operation of "unlinking a file" does
not always mean "deleting the blocks associated with the file," and why the function
that removes a directory entry is called unlink, not delete. In the stat structure, the
link count is contained in the st_nlink member. Its primitive system data type is
nlink_t. These types of links are called hard links. Recall from Section 2.5.2 that the
POSIX.1 constant LINK_MAX specifies the maximum value for a file's link count.

The other type of link is called a symbolic link. With a symbolic link, the actual
contents of the file (the data blocks) store the name of the file that the symbolic link
points to. In the following example, the filename in the directory entry is the
three-character string lib and the 7 bytes of data in the file are usr/lib:


lrwxrwxrwx  1 root          7 Sep 25 07:14 lib -> usr/lib
The file type in the i-node would be S_IFLNK so that the system knows that this is a
symbolic link.

The i-node contains all the information about the file: the file type, the file's access
permission bits, the size of the file, pointers to the file's data blocks, and so on. Most
of the information in the stat structure is obtained from the i-node. Only two items of
interest are stored in the directory entry: the filename and the i-node number; the
other items (the length of the filename and the length of the directory record) are not of
interest to this discussion. The data type for the i-node number is ino_t.

Because the i-node number in the directory entry points to an i-node in the same file
system, we cannot have a directory entry point to an i-node in a different file system.
This is why the ln(1) command (make a new directory entry that points to an existing
file) can't cross file systems. We describe the link function in the next section.

When renaming a file without changing file systems, the actual contents of the file
need not be moved; all that needs to be done is to add a new directory entry that
points to the existing i-node, and then unlink the old directory entry. The link count will
remain the same. For example, to rename the file /usr/lib/foo to /usr/foo, the
contents of the file foo need not be moved if the directories /usr/lib and /usr are on
the same file system. This is how the mv(1) command usually operates.
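A directory entry holds only a name and an i-node number. Two pathnames therefore refer to the same file exactly when they lead to the same i-node on the same file system. The following minimal sketch of that test compares st_dev and st_ino; the function name same_file is hypothetical. The device check matters because i-node numbers are unique only within one file system.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/*
 * Hypothetical helper: return 1 if path1 and path2 name the same file
 * (same device and same i-node number), 0 if they do not, and -1 if
 * either stat fails. stat follows symbolic links, so a symbolic link and
 * its target also compare equal.
 */
int
same_file(const char *path1, const char *path2)
{
    struct stat sb1, sb2;

    if (stat(path1, &sb1) < 0 || stat(path2, &sb2) < 0)
        return -1;
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```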
We've talked about the concept of a link count for a regular file, but what about the link
count field for a directory? Assume that we make a new directory in the working directory, as
in
$ mkdir testdir
Figure 4.15 shows the result. Note that in this figure, we explicitly show the entries for dot
and dot-dot.
Figure 4.15. Sample cylinder group after creating the directory testdir
The i-node whose number is 2549 has a type field of "directory" and a link count equal to 2.
Any leaf directory (a directory that does not contain any other directories) always has a link
count of 2. The value of 2 is from the directory entry that names the directory (testdir) and
from the entry for dot in that directory. The i-node whose number is 1267 has a type field of
"directory" and a link count that is greater than or equal to 3. The reason we know that the
link count is greater than or equal to 3 is that minimally, it is pointed to from the directory
entry that names it (which we don't show in Figure 4.15), from dot, and from dot-dot in the
testdir directory. Note that every subdirectory in a parent directory causes the parent
directory's link count to be increased by 1.
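The link-count rule lets a program estimate how many subdirectories a directory has without reading it: subtract 2 (the parent's entry and dot) from st_nlink. The following sketch is hypothetical and relies on the traditional behavior described here. Some file systems do not maintain directory link counts this way (a few report 1 for every directory), so the function treats a count below 2 as unknown.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: estimate the number of subdirectories of dirpath
 * from its link count, using the rule described above (one link from the
 * entry that names it, one from dot, one from each subdirectory's
 * dot-dot). Returns -1 on error, if dirpath is not a directory, or if the
 * file system does not follow the rule.
 */
long
subdir_count(const char *dirpath)
{
    struct stat sb;

    if (stat(dirpath, &sb) < 0 || !S_ISDIR(sb.st_mode) || sb.st_nlink < 2)
        return -1;
    return (long)sb.st_nlink - 2;
}
```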
This format is similar to the classic format of the UNIX file system, which is described in detail
in Chapter 4 of Bach [1986]. Refer to Chapter 7 of McKusick et al. [1996] or Chapter 8 of
McKusick and Neville-Neil [2005] for additional information on the changes made with the
Berkeley fast file system. See Chapter 14 of Mauro and McDougall [2001] for details on UFS,
the Solaris version of the Berkeley fast file system.
4.15. link, unlink, remove, and rename Functions
As we saw in the previous section, any file can have multiple directory entries pointing to its
i-node. The way we create a link to an existing file is with the link function.
#include <unistd.h>
int link(const char *existingpath, const char *newpath);
Returns: 0 if OK, -1 on error
This function creates a new directory entry, newpath, that references the existing file
existingpath. If the newpath already exists, an error is returned. Only the last component of
the newpath is created. The rest of the path must already exist.
The creation of the new directory entry and the increment of the link count must be an
atomic operation. (Recall the discussion of atomic operations in Section 3.11.)
Most implementations require that both pathnames be on the same file system, although
POSIX.1 allows an implementation to support linking across file systems. If an implementation
supports the creation of hard links to directories, only the superuser can create them. The
reason is that doing this can cause loops in the file system, which most utilities that process
the file system aren't capable of handling. (We show an example of a loop introduced by a
symbolic link in Section 4.16.) Many file system implementations disallow hard links to
directories for this reason.
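The following sketch shows link and the link count working together. The helper add_hard_link is hypothetical. It calls link and then uses stat on the new name to report st_nlink. That count should be one more than before, because both directory entries now refer to the same i-node.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create newpath as a hard link to existingpath and
 * return the file's link count afterward, or -1 on error. Both names
 * refer to the same i-node, so stat on either reports the same count.
 */
long
add_hard_link(const char *existingpath, const char *newpath)
{
    struct stat sb;

    if (link(existingpath, newpath) < 0)   /* fails (EEXIST) if newpath exists */
        return -1;
    if (stat(newpath, &sb) < 0)
        return -1;
    return (long)sb.st_nlink;
}
```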
To remove an existing directory entry, we call the unlink function.
#include <unistd.h>
int unlink(const char *pathname);
Returns: 0 if OK, -1 on error
This function removes the directory entry and decrements the link count of the file referenced
by pathname. If there are other links to the file, the data in the file is still accessible through
the other links. The file is not changed if an error occurs.
We've mentioned before that to unlink a file, we must have write permission and execute
permission in the directory containing the directory entry, as it is the directory entry that we
will be removing. Also, we mentioned in Section 4.10 that if the sticky bit is set in this
directory we must have write permission for the directory and one of the following:

Own the file

Own the directory

Have superuser privileges
Only when the link count reaches 0 can the contents of the file be deleted. One other
condition prevents the contents of a file from being deleted: as long as some process has the
file open, its contents will not be deleted. When a file is closed, the kernel first checks the
count of the number of processes that have the file open. If this count has reached 0, the
kernel then checks the link count; if it is 0, the file's contents are deleted.
Example
The program shown in Figure 4.16 opens a file and then unlinks it. The program then goes to
sleep for 15 seconds before terminating.
Running this program gives us
$ ls -l tempfile                look at how big the file is
-rw-r-----  1 sar     413265408 Jan 21 07:14 tempfile
$ df /home                      check how much free space is available
Filesystem     1K-blocks     Used  Available  Use%  Mounted on
/dev/hda4       11021440  1956332    9065108   18%  /home
$ ./a.out &                     run the program in Figure 4.16 in the background
1364                            the shell prints its process ID
$ file unlinked                 the file is unlinked
ls -l tempfile                  see if the filename is still there
ls: tempfile: No such file or directory
                                the directory entry is gone
$ df /home                      see if the space is available yet
Filesystem     1K-blocks     Used  Available  Use%  Mounted on
/dev/hda4       11021440  1956332    9065108   18%  /home
$ done                          the program is done, all open files are closed
df /home                        now the disk space should be available
Filesystem     1K-blocks     Used  Available  Use%  Mounted on
/dev/hda4       11021440  1552352    9469088   15%  /home
                                now the 394.1 MB of disk space are available
Figure 4.16. Open a file and then unlink it
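#include "apue.h"
#include <fcntl.h>

int
main(void)
{
    if (open("tempfile", O_RDWR) < 0)     /* keep a descriptor open */
        err_sys("open error");
    if (unlink("tempfile") < 0)           /* remove the only directory entry */
        err_sys("unlink error");
    printf("file unlinked\n");
    sleep(15);                            /* file still open: blocks not freed */
    printf("done\n");
    exit(0);                              /* exit closes the file; blocks freed */
}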
This property of unlink is often used by a program to ensure that a temporary file it creates
won't be left around in case the program crashes. The process creates a file using either open
or creat and then immediately calls unlink. The file is not deleted, however, because it is still
open. Only when the process either closes the file or terminates, which causes the kernel to
close all its open files, is the file deleted.
If pathname is a symbolic link, unlink removes the symbolic link, not the file referenced by the
link. There is no function to remove the file referenced by a symbolic link given the name of
the link.
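The create-then-unlink idiom can be packaged as a small helper. This is a sketch with a hypothetical name, open_unnamed_temp. It uses the POSIX mkstemp function, which this section does not discuss, to create a uniquely named file, and then unlinks the name at once so that only the descriptor keeps the file alive.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper implementing the create-then-unlink idiom.
 * template must be a writable string ending in "XXXXXX", as mkstemp
 * requires. Returns an open descriptor for a file that no longer has a
 * name, or -1 on error. The kernel frees the file's blocks when the last
 * descriptor is closed, including when the process terminates abnormally.
 */
int
open_unnamed_temp(char *template)
{
    int fd;

    if ((fd = mkstemp(template)) < 0)   /* creates the file, mode 0600 */
        return -1;
    if (unlink(template) < 0) {         /* drop the directory entry now */
        close(fd);
        return -1;
    }
    return fd;                          /* data still reachable through fd */
}
```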
The superuser can call unlink with pathname specifying a directory, but the function rmdir
should be used instead to unlink a directory. We describe the rmdir function in Section 4.20.
We can also unlink a file or a directory with the remove function. For a file, remove is identical
to unlink. For a directory, remove is identical to rmdir.
#include <stdio.h>
int remove(const char *pathname);
Returns: 0 if OK, -1 on error
ISO C specifies the remove function to delete a file. The name was changed from the historical
UNIX name of unlink because most non-UNIX systems that implement the C standard didn't
support the concept of links to a file at the time.
A file or a directory is renamed with the rename function.
#include <stdio.h>
int rename(const char *oldname, const char *newname);
Returns: 0 if OK, -1 on error
This function is defined by ISO C for files. (The C standard doesn't deal with directories.)
POSIX.1 expanded the definition to include directories and symbolic links.
There are several conditions to describe, depending on whether oldname refers to a file, a
directory, or a symbolic link. We must also describe what happens if newname already exists.
1. If oldname specifies a file that is not a directory, then we are renaming a file or a
   symbolic link. In this case, if newname exists, it cannot refer to a directory. If
   newname exists and is not a directory, it is removed, and oldname is renamed to
   newname. We must have write permission for the directory containing oldname and for
   the directory containing newname, since we are changing both directories.
2. If oldname specifies a directory, then we are renaming a directory. If newname exists,
   it must refer to a directory, and that directory must be empty. (When we say that a
   directory is empty, we mean that the only entries in the directory are dot and
   dot-dot.) If newname exists and is an empty directory, it is removed, and oldname is
   renamed to newname. Additionally, when we're renaming a directory, newname cannot
   contain a path prefix that names oldname. For example, we can't rename /usr/foo to
   /usr/foo/testdir, since the old name (/usr/foo) is a path prefix of the new name and
   cannot be removed.
3. If either oldname or newname refers to a symbolic link, then the link itself is
   processed, not the file to which it resolves.
4. As a special case, if the oldname and newname refer to the same file, the function
   returns successfully without changing anything.
If newname already exists, we need permissions as if we were deleting it. Also, because we're
removing the directory entry for oldname and possibly creating a directory entry for newname,
we need write permission and execute permission in the directory containing oldname and in
the directory containing newname.
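One common use of case 1, where an existing newname that is not a directory is replaced, is to update a file safely. The program writes the new contents under a temporary name and then renames that file over the old one. The following is a sketch of that pattern, not a function from this book. The name replace_file, the .tmp suffix, the fixed 4096-byte name buffer, and the call to fsync (which asks that the data reach the disk before the rename) are all our own choices.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of pathname with len bytes
 * from data. The bytes are written to pathname.tmp, in the same directory
 * and hence on the same file system, and rename then moves that file over
 * pathname. Returns 0 if OK, -1 on error.
 */
int
replace_file(const char *pathname, const void *data, size_t len)
{
    char    tmpname[4096];
    int     fd, n;

    n = snprintf(tmpname, sizeof(tmpname), "%s.tmp", pathname);
    if (n < 0 || (size_t)n >= sizeof(tmpname))
        return -1;                          /* name would not fit */
    if ((fd = open(tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmpname);                    /* don't leave a partial file */
        return -1;
    }
    if (close(fd) < 0 || rename(tmpname, pathname) < 0) {
        unlink(tmpname);
        return -1;
    }
    return 0;
}
```

POSIX requires that a link named newname remain visible throughout the rename. A process that opens pathname should therefore see either the complete old file or the complete new one, never a partial write.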
4.16. Symbolic Links
A symbolic link is an indirect pointer to a file, unlike the hard links from the previous section,
which pointed directly to the i-node of the file. Symbolic links were introduced to get around
the limitations of hard links.

Hard links normally require that the link and the file reside in the same file system

Only the superuser can create a hard link to a directory
There are no file system limitations on a symbolic link and what it points to, and anyone can
create a symbolic link to a directory. Symbolic links are typically used to move a file or an
entire directory hierarchy to another location on a system.
Symbolic links were introduced with 4.2BSD and subsequently supported by SVR4.
When using functions that refer to a file by name, we always need to know whether the
function follows a symbolic link. If the function follows a symbolic link, a pathname argument
to the function refers to the file pointed to by the symbolic link. Otherwise, a pathname
argument refers to the link itself, not the file pointed to by the link. Figure 4.17 summarizes
whether the functions described in this chapter follow a symbolic link. The functions mkdir,
mkfifo, mknod, and rmdir are not in this figure, as they return an error when the pathname is a
symbolic link. Also, the functions that take a file descriptor argument, such as fstat and
fchmod, are not listed, as the handling of a symbolic link is done by the function that returns
the file descriptor (usually open). Whether or not chown follows a symbolic link depends on the
implementation.
In older versions of Linux (those before version 2.1.81), chown didn't follow symbolic links. From
version 2.1.81 onward, chown follows symbolic links. With FreeBSD 5.2.1 and Mac OS X 10.3,
chown follows symbolic links. (Prior to 4.4BSD, chown didn't follow symbolic links, but this was
changed in 4.4BSD.) In Solaris 9, chown also follows symbolic links. All of these platforms
provide implementations of lchown to change the ownership of symbolic links themselves.
One exception to Figure 4.17 is when the open function is called with both O_CREAT and O_EXCL
set. In this case, if the pathname refers to a symbolic link, open will fail with errno set to
EEXIST. This behavior is intended to close a security hole so that privileged processes can't be
fooled into writing to the wrong files.
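The following sketch shows the O_CREAT and O_EXCL case from the calling side; the function name create_exclusive is hypothetical. When open fails with EEXIST, the function uses lstat, which does not follow symbolic links, to report whether the existing entry is itself a symbolic link. In that case the link's target is never created.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create pathname only if no directory entry of that
 * name exists. Returns a descriptor, or -1 with errno set. On EEXIST,
 * *was_symlink is set to 1 if the existing entry is a symbolic link
 * (possibly a dangling one), otherwise 0.
 */
int
create_exclusive(const char *pathname, mode_t mode, int *was_symlink)
{
    struct stat sb;
    int         fd, saved;

    *was_symlink = 0;
    fd = open(pathname, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd < 0 && errno == EEXIST) {
        saved = errno;
        /* lstat examines the entry itself, not what it points to */
        if (lstat(pathname, &sb) == 0 && S_ISLNK(sb.st_mode))
            *was_symlink = 1;
        errno = saved;      /* report the original EEXIST to the caller */
    }
    return fd;
}
```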
Figure 4.17. Treatment of symbolic links by various functions

Function     Does not follow symbolic link   Follows symbolic link
access                                       •
chdir                                        •
chmod                                        •
chown        •                               •
creat                                        •
exec                                         •
lchown       •
link                                         •
lstat        •
open                                         •
opendir                                      •
pathconf                                     •
readlink     •
remove       •
rename       •
stat                                         •
truncate                                     •
unlink       •
Example
It is possible to introduce loops into the file system by using symbolic links. Most functions
that look up a pathname return an errno of ELOOP when this occurs. Consider the following
commands:
$ mkdir foo                     make a new directory
$ touch foo/a                   create a 0-length file
$ ln -s ../foo foo/testdir      create a symbolic link
$ ls -l foo
total 0
-rw-r-----  1 sar         0 Jan 22 00:16 a
lrwxrwxrwx  1 sar         6 Jan 22 00:16 testdir -> ../foo
This creates a directory foo that contains the file a and a symbolic link that points to foo. We
show this arrangement in Figure 4.18, drawing a directory as a circle and a file as a square. If
we write a simple program that uses the standard function ftw(3) on Solaris to descend
through a file hierarchy, printing each pathname encountered, the output is
foo
foo/a
foo/testdir
foo/testdir/a
foo/testdir/testdir
foo/testdir/testdir/a
foo/testdir/testdir/testdir
foo/testdir/testdir/testdir/a
(many more lines until we encounter an ELOOP error)
In Section 4.21, we provide our own version of the ftw function that uses lstat instead of
stat, to prevent it from following symbolic links.
Note that on Linux, the ftw function uses lstat, so it doesn't display this behavior.
A loop of this form is easy to remove. We are able to unlink the file foo/testdir, as unlink
does not follow a symbolic link. But if we create a hard link that forms a loop of this type, its
removal is much more difficult. This is why the link function will not form a hard link to a
directory unless the process has superuser privileges.
Indeed, Rich Stevens did this on his own system as an experiment while writing the original
version of this section. The file system got corrupted and the normal fsck(1) utility couldn't fix
things. The deprecated tools clri(8) and dcheck(8) were needed to repair the file system.
The need for hard links to directories has long since passed. With symbolic links and the mkdir
function, there is no longer any need for users to create hard links to directories.
When we open a file, if the pathname passed to open specifies a symbolic link, open follows the
link to the specified file. If the file pointed to by the symbolic link doesn't exist, open returns
an error saying that it can't open the file. This can confuse users who aren't familiar with
symbolic links. For example,
$ ln -s /no/such/file myfile    create a symbolic link
$ ls myfile
myfile                          ls says it's there
$ cat myfile                    so we try to look at it
cat: myfile: No such file or directory
$ ls -l myfile                  try -l option
lrwxrwxrwx  1 sar        13 Jan 22 00:26 myfile -> /no/such/file
The file myfile does exist, yet cat says there is no such file, because myfile is a symbolic link
and the file pointed to by the symbolic link doesn't exist. The -l option to ls gives us two
hints: the first character is an l, which means a symbolic link, and the sequence -> also
indicates a symbolic link. The ls command has another option (-F) that appends an at-sign to
filenames that are symbolic links, which can help spot symbolic links in a directory listing
without the -l option.
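A program can make the same distinction that ls -l makes by calling both lstat and stat. In the following hypothetical sketch, is_dangling_symlink reports a symbolic link whose target is missing. The lstat call succeeds on the link itself, while stat, which follows the link as open does, fails with ENOENT.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if pathname is a symbolic link whose
 * target does not exist (the myfile case above), 0 if it is not, and -1
 * on any other error (no such entry, or a loop reported as ELOOP).
 */
int
is_dangling_symlink(const char *pathname)
{
    struct stat sb;

    if (lstat(pathname, &sb) < 0)
        return -1;              /* no directory entry at all */
    if (!S_ISLNK(sb.st_mode))
        return 0;               /* an ordinary entry */
    if (stat(pathname, &sb) == 0)
        return 0;               /* the link resolves to an existing file */
    return errno == ENOENT ? 1 : -1;
}
```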
Figure 4.18. Symbolic link testdir that creates a loop
4.17. symlink and readlink Functions
A symbolic link is created with the symlink function.
#include <unistd.h>
int symlink(const char *actualpath, const char *sympath);
Returns: 0 if OK, -1 on error
A new directory entry, sympath, is created that points to actualpath. It is not required that
actualpath exist when the symbolic link is created. (We saw this in the example at the end of
the previous section.) Also, actualpath and sympath need not reside in the same file system.
Because the open function follows a symbolic link, we need a way to open the link itself and
read the name in the link. The readlink function does this.
#include <unistd.h>
ssize_t readlink(const char *restrict pathname, char *restrict buf,
                 size_t bufsize);
Returns: number of bytes read if OK, -1 on error
This function combines the actions of open, read, and close. If the function is successful, it
returns the number of bytes placed into buf. The contents of the symbolic link that are
returned in buf are not null terminated.
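Because readlink does not null-terminate buf, callers usually reserve a byte for the terminator. A minimal sketch of such a wrapper, with the hypothetical name readlink_str, follows. Treating a completely filled buffer as possible truncation is our own conservative choice; readlink itself does not report truncation.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: read the contents of the symbolic link pathname
 * into buf as a null-terminated string. One byte is reserved for the
 * terminator, and a result that fills the remaining space is treated as
 * possibly truncated. Returns the string length, or -1 on error or
 * possible truncation.
 */
ssize_t
readlink_str(const char *pathname, char *buf, size_t bufsize)
{
    ssize_t n;

    if (bufsize < 2)
        return -1;
    if ((n = readlink(pathname, buf, bufsize - 1)) < 0)
        return -1;
    if ((size_t)n == bufsize - 1)
        return -1;          /* cannot tell whether the name was cut off */
    buf[n] = '\0';          /* readlink does not add this */
    return n;
}
```

For the lib -> usr/lib link shown in Section 4.14, readlink_str would store the 7-character string usr/lib in buf and return 7.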
4.18. File Times
Three time fields are maintained for each file. Their purpose is summarized in Figure 4.19.
Figure 4.19. The three time values associated with each file
Field      Description                           Example        ls(1) option
st_atime   last-access time of file data         read           -u
st_mtime   last-modification time of file data   write          default
st_ctime   last-change time of i-node status     chmod, chown   -c
Note the difference between the modification time (st_mtime) and the changed-status time
(st_ctime). The modification time is when the contents of the file were last modified. The
changed-status time is when the i-node of the file was last modified. In this chapter, we've
described many operations that affect the i-node without changing the actual contents of the
file: changing the file access permissions, changing the user ID, changing the number of links,
and so on. Because all the information in the i-node is stored separately from the actual
contents of the file, we need the changed-status time, in addition to the modification time.
Note that the system does not maintain the last-access time for an i-node. This is why the
functions access and stat, for example, don't change any of the three times.
The access time is often used by system administrators to delete files that have not been
accessed for a certain amount of time. The classic example is the removal of files named a.out
or core that haven't been accessed in the past week. The find(1) command is often used for
this type of operation.
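The check that such a cleanup applies to each file can be sketched as follows. The name not_accessed_since is hypothetical, and the caller computes the one-week cutoff. The result depends on the recorded access times. On Linux, for example, file systems mounted with noatime or relatime may not update st_atime on every read.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper modeled on the cleanup rule above: return 1 if
 * pathname has not been accessed since cutoff (a calendar time), 0 if it
 * has, and -1 if stat fails. Calling stat does not itself change st_atime.
 */
int
not_accessed_since(const char *pathname, time_t cutoff)
{
    struct stat sb;

    if (stat(pathname, &sb) < 0)
        return -1;
    return sb.st_atime < cutoff;
}
```

For example, not_accessed_since("core", time(NULL) - 7 * 24 * 60 * 60) asks whether core has gone a week without being read.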
The modification time and the changed-status time can be used to archive only those files
that have had their contents modified or their i-node modified.
The ls command displays or sorts only on one of the three time values. By default, when
invoked with either the -l or the -t option, it uses the modification time of a file. The -u option
causes it to use the access time, and the -c option causes it to use the changed-status time.
Figure 4.20 summarizes the effects of the various functions that we've described on these
three times. Recall from Section 4.14 that a directory is simply a file containing directory
entries: filenames and associated i-node numbers. Adding, deleting, or modifying these
directory entries can affect the three times associated with that directory. This is why Figure
4.20 contains one column for the three times associated with the file or directory and another
column for the three times associated with the parent directory of the referenced file or
directory. For example, creating a new file affects the directory that contains the new file, and
it affects the i-node for the new file. Reading or writing a file, however, affects only the i-node
of the file and has no effect on the directory. (The mkdir and rmdir functions are covered in
Section 4.20. The utime function is covered in the next section. The six exec functions are
described in Section 8.10. We describe the mkfifo and pipe functions in Chapter 15.)
Figure 4.20. Effect of various functions on the access, modification,
and changed-status times

                     Referenced file    Parent directory of
                     or directory       referenced file or
                                        directory
Function             a   m   c          a   m   c          Section   Note
chmod, fchmod                •                             4.9
chown, fchown                •                             4.11
creat                •   •   •              •   •          3.4       O_CREAT new file
creat                    •   •                             3.4       O_TRUNC existing file
exec                 •                                     8.10
lchown                       •                             4.11
link                         •              •   •          4.15      parent of second argument
mkdir                •   •   •              •   •          4.20
mkfifo               •   •   •              •   •          15.5
open                 •   •   •              •   •          3.3       O_CREAT new file
open                     •   •                             3.3       O_TRUNC existing file
pipe                 •   •   •                             15.2
read                 •                                     3.7
remove                       •              •   •          4.15      remove file = unlink
remove                                      •   •          4.15      remove directory = rmdir
rename                       •              •   •          4.15      for both arguments
rmdir                                       •   •          4.20
truncate, ftruncate      •   •                             4.13
unlink                       •              •   •          4.15
utime                •   •   •                             4.19
write                    •   •                             3.8
4.19. utime Function
The access time and the modification time of a file can be changed with the utime function.
#include <utime.h>
int utime(const char *pathname, const struct utimbuf *times);
Returns: 0 if OK, -1 on error
The structure used by this function is
struct utimbuf {
  time_t actime;    /* access time */
  time_t modtime;   /* modification time */
};
The two time values in the structure are calendar times, which count seconds since the
Epoch, as described in Section 1.10.
The operation of this function, and the privileges required to execute it, depend on whether
the times argument is NULL.

If times is a null pointer, the access time and the modification time are both set to the
current time. To do this, either the effective user ID of the process must equal the
owner ID of the file, or the process must have write permission for the file.

If times is a non-null pointer, the access time and the modification time are set to the
values in the structure pointed to by times. For this case, the effective user ID of the
process must equal the owner ID of the file, or the process must be a superuser
process. Merely having write permission for the file is not adequate.
Note that we are unable to specify a value for the changed-status time, st_ctime (the time the
i-node was last changed), as this field is automatically updated when the utime function is
called.
On some versions of the UNIX System, the touch(1) command uses this function. Also, the
standard archive programs, tar(1) and cpio(1), optionally call utime to set the times for a file
to the time values saved when the file was archived.
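The following sketch shows the kind of call an archive program might make when restoring a file. The hypothetical function copy_times gives one file the access and modification times of another. Because it passes a non-null times pointer, the ownership rule above applies, and the destination's changed-status time is still set to the current time.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <utime.h>

/*
 * Hypothetical helper: set dst's access and modification times to those
 * of src. The caller must own dst or be superuser, because times is
 * non-null. dst's st_ctime becomes the current time regardless.
 * Returns 0 if OK, -1 on error.
 */
int
copy_times(const char *src, const char *dst)
{
    struct stat    sb;
    struct utimbuf tb;

    if (stat(src, &sb) < 0)
        return -1;
    tb.actime = sb.st_atime;
    tb.modtime = sb.st_mtime;
    return utime(dst, &tb);
}
```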
Example
The program shown in Figure 4.21 truncates files to zero length using the O_TRUNC option of
the open function, but does not change their access time or modification time. To do this, the
program first obtains the times with the stat function, truncates the file, and then resets the
times with the utime function.
We can demonstrate the program in Figure 4.21 with the following script:
$ ls -l changemod times         look at sizes and last-modification times
-rwxrwxr-x  1 sar     15019 Nov 18 18:53 changemod
-rwxrwxr-x  1 sar     16172 Nov 19 20:05 times
$ ls -lu changemod times        look at last-access times
-rwxrwxr-x  1 sar     15019 Nov 18 18:53 changemod
-rwxrwxr-x  1 sar     16172 Nov 19 20:05 times
$ date                          print today's date
Thu Jan 22 06:55:17 EST 2004
$ ./a.out changemod times       run the program in Figure 4.21
$ ls -l changemod times         and check the results
-rwxrwxr-x  1 sar         0 Nov 18 18:53 changemod
-rwxrwxr-x  1 sar         0 Nov 19 20:05 times
$ ls -lu changemod times        check the last-access times also
-rwxrwxr-x  1 sar         0 Nov 18 18:53 changemod
-rwxrwxr-x  1 sar         0 Nov 19 20:05 times
$ ls -lc changemod times        and the changed-status times
-rwxrwxr-x  1 sar         0 Jan 22 06:55 changemod
-rwxrwxr-x  1 sar         0 Jan 22 06:55 times
As we expect, the last-modification times and the last-access times are not changed. The
changed-status times, however, are changed to the time that the program was run.
Figure 4.21. Example of utime function
#include "apue.h"
#include <fcntl.h>
#include <utime.h>

int
main(int argc, char *argv[])
{
    int             i, fd;
    struct stat     statbuf;
    struct utimbuf  timebuf;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &statbuf) < 0) {  /* fetch current times */
            err_ret("%s: stat error", argv[i]);
            continue;
        }
        if ((fd = open(argv[i], O_RDWR | O_TRUNC)) < 0) { /* truncate */
            err_ret("%s: open error", argv[i]);
            continue;
        }
        close(fd);
        timebuf.actime  = statbuf.st_atime;
        timebuf.modtime = statbuf.st_mtime;
        if (utime(argv[i], &timebuf) < 0) {     /* reset times */
            err_ret("%s: utime error", argv[i]);
            continue;
        }
    }
    exit(0);
}
4.20. mkdir and rmdir Functions
Directories are created with the mkdir function and deleted with the rmdir function.
#include <sys/stat.h>
int mkdir(const char *pathname, mode_t mode);
Returns: 0 if OK, -1 on error
This function creates a new, empty directory. The entries for dot and dot-dot are
automatically created. The specified file access permissions, mode, are modified by the file
mode creation mask of the process.
A common mistake is to specify the same mode as for a file: read and write permissions only.
But for a directory, we normally want at least one of the execute bits enabled, to allow
access to filenames within the directory. (See Exercise 4.16.)
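The following minimal sketch avoids that mistake, assuming a typical mask such as 022. The hypothetical function make_searchable_dir requests mode 0755 (spelled with the S_I* constants). It then uses stat to check that the owner's execute bit survived the file mode creation mask. Removing the directory again if that bit was masked off is our own design choice.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a directory that its owner can list and
 * search. Mode 0755 is an illustrative choice; mkdir clears any bits set
 * in the process's file mode creation mask. Returns 0 if OK, -1 on error.
 * If the mask removed the owner's search (execute) bit, the unusable
 * directory is removed again and -1 is returned.
 */
int
make_searchable_dir(const char *pathname)
{
    struct stat sb;

    if (mkdir(pathname, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH) < 0)
        return -1;
    if (stat(pathname, &sb) < 0)
        return -1;
    if (!(sb.st_mode & S_IXUSR)) {
        rmdir(pathname);
        return -1;
    }
    return 0;
}
```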
The user ID and group ID of the new directory are established according to the rules we
described in Section 4.6.
Solaris 9 and Linux 2.4.22 also have the new directory inherit the set-group-ID bit from the
parent directory. This is so that files created in the new directory will inherit the group ID of
that directory. With Linux, the file system implementation determines whether this is
supported. For example, the ext2 and ext3 file systems allow this behavior to be controlled by
an option to the mount(1) command. With the Linux implementation of the UFS file system,
however, the behavior is not selectable; it inherits the set-group-ID bit to mimic the historical
BSD implementation, where the group ID of a directory is inherited from the parent directory.
BSD-based implementations don't propagate the set-group-ID bit; they simply inherit the
group ID as a matter of policy. Because FreeBSD 5.2.1 and Mac OS X 10.3 are based on
4.4BSD, they do not require this inheriting of the set-group-ID bit. On these platforms, newly
created files and directories always inherit the group ID of the parent directory, regardless of
the set-group-ID bit.
Earlier versions of the UNIX System did not have the mkdir function. It was introduced with
4.2BSD and SVR3. In the earlier versions, a process had to call the mknod function to create a
new directory. But use of the mknod function was restricted to superuser processes. To
circumvent this, the normal command that created a directory, mkdir(1), had to be owned by
root with the set-user-ID bit on. To create a directory from a process, the mkdir(1) command
had to be invoked with the system(3) function.
An empty directory is deleted with the rmdir function. Recall that an empty directory is one
that contains entries only for dot and dot-dot.
#include <unistd.h>
int rmdir(const char *pathname);
Returns: 0 if OK, -1 on error
If the link count of the directory becomes 0 with this call, and if no other process has the
directory open, then the space occupied by the directory is freed. If one or more processes
have the directory open when the link count reaches 0, the last link is removed and the dot
and dot-dot entries are removed before this function returns. Additionally, no new files can be
created in the directory. The directory is not freed, however, until the last process closes it.
(Even though some other process has the directory open, it can't be doing much in the
directory, as the directory had to be empty for the rmdir function to succeed.)
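You can observe the rule that only an empty directory can be removed directly. In this sketch, try_rmdir and is_not_empty_error are hypothetical names. rmdir fails while the directory still holds a file and succeeds once the file has been unlinked. POSIX permits either ENOTEMPTY or EEXIST for the non-empty case, so the check accepts both.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* try_rmdir (hypothetical): 0 on success, otherwise the errno value. */
int try_rmdir(const char *path)
{
    return rmdir(path) == 0 ? 0 : errno;
}

/*
 * POSIX allows rmdir on a directory that still has entries other
 * than dot and dot-dot to fail with either ENOTEMPTY or EEXIST.
 */
int is_not_empty_error(int err)
{
    return err == ENOTEMPTY || err == EEXIST;
}
```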
4.21. Reading Directories
Directories can be read by anyone who has access permission to read the directory. But only
the kernel can write to a directory, to preserve file system sanity. Recall from Section 4.5
that the write permission bits and execute permission bits for a directory determine if we can
create new files in the directory and remove files from the directory; they don't specify if we
can write to the directory itself.
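One visible consequence of the kernel owning directory contents is that open fails when a directory is opened for writing. POSIX requires the error to be EISDIR. The helper below (a hypothetical name) simply reports the errno from such an attempt.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * open_dir_for_write (hypothetical): try to open directory "path" for
 * writing.  Returns the errno value from the failed open, or 0 if the
 * open unexpectedly succeeded.
 */
int open_dir_for_write(const char *path)
{
    int fd = open(path, O_WRONLY);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}
```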
The actual format of a directory depends on the UNIX System implementation and the design
of the file system. Earlier systems, such as Version 7, had a simple structure: each directory
entry was 16 bytes, with 14 bytes for the filename and 2 bytes for the i-node number. When
longer filenames were added to 4.2BSD, each entry became variable length, which means that
any program that reads a directory is now system dependent. To simplify this, a set of
directory routines was developed and is part of POSIX.1. Many implementations prevent
applications from using the read function to access the contents of directories, thereby
further isolating applications from the implementation-specific details of directory formats.
#include <dirent.h>
DIR *opendir(const char *pathname);
Returns: pointer if OK, NULL on error
struct dirent *readdir(DIR *dp);
Returns: pointer if OK, NULL at end of directory or error
void rewinddir(DIR *dp);
int closedir(DIR *dp);
Returns: 0 if OK, -1 on error
long telldir(DIR *dp);
Returns: current location in directory associated with dp
void seekdir(DIR *dp, long loc);
The telldir and seekdir functions are not part of the base POSIX.1 standard. They are XSI
extensions in the Single UNIX Specification, so all conforming UNIX System implementations
are expected to provide them.
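The following sketch assumes an XSI-conforming system that provides telldir and seekdir. It saves the stream position before the first readdir, reads to the end of the directory, and then uses seekdir to return and reread the same entry. The name is copied out first because a later readdir on the same stream may overwrite the structure it returned. The helper name reread_first_entry is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * reread_first_entry (hypothetical): read the first entry of "path",
 * read on to the end, then seekdir back and read the first entry again.
 * Returns 1 if both reads gave the same name, 0 if not, -1 on error.
 */
int reread_first_entry(const char *path)
{
    DIR *dp;
    struct dirent *dirp;
    char first[1024];
    long loc;
    int same;

    if ((dp = opendir(path)) == NULL)
        return -1;
    loc = telldir(dp);                  /* position of the first entry */
    if ((dirp = readdir(dp)) == NULL) {
        closedir(dp);
        return -1;
    }
    /* copy the name: a later readdir may overwrite *dirp */
    snprintf(first, sizeof first, "%s", dirp->d_name);
    while (readdir(dp) != NULL)         /* move on to the end */
        ;
    seekdir(dp, loc);                   /* back to the saved position */
    dirp = readdir(dp);
    same = dirp != NULL && strcmp(first, dirp->d_name) == 0;
    closedir(dp);
    return same;
}
```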
Recall our use of several of these functions in the program shown in Figure 1.3, our
bare-bones implementation of the ls command.
The dirent structure defined in the file <dirent.h> is implementation dependent.
Implementations define the structure to contain at least the following two members:
struct dirent {
    ino_t d_ino;                  /* i-node number */
    char  d_name[NAME_MAX + 1];   /* null-terminated filename */
};
The d_ino entry is not defined by POSIX.1, since it's an implementation feature, but it is
defined in the XSI extension to POSIX.1. POSIX.1 defines only the d_name entry in this
structure.
Note that NAME_MAX is not a defined constant with Solaris; its value depends on the file system
in which the directory resides, and its value is usually obtained from the fpathconf function. A
common value for NAME_MAX is 255. (Recall Figure 2.14.) Since the filename is null terminated,
however, it doesn't matter how the array d_name is defined in the header, because the array
size doesn't indicate the length of the filename.
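Because NAME_MAX can vary from one file system to another, a portable program asks for it at run time. This sketch uses pathconf, the pathname counterpart of fpathconf. name_max_for is a hypothetical helper. It reports an indeterminate limit as 0 so that callers can tell that case apart from an error.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/*
 * name_max_for (hypothetical): maximum filename length on the file
 * system holding "path".  Returns the limit, 0 if the limit is
 * indeterminate (pathconf returns -1 without changing errno), or -1
 * on error.
 */
long name_max_for(const char *path)
{
    long val;

    errno = 0;
    if ((val = pathconf(path, _PC_NAME_MAX)) < 0)
        return errno == 0 ? 0 : -1;
    return val;
}
```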
The DIR structure is an internal structure used by these six functions to maintain information
about the directory being read. The purpose of the DIR structure is similar to that of the FILE
structure maintained by the standard I/O library, which we describe in Chapter 5.
The pointer to a DIR structure that is returned by opendir is then used with the other five
functions. The opendir function initializes things so that the first readdir reads the first entry
in the directory. The ordering of entries within the directory is implementation dependent and
is usually not alphabetical.
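The following minimal sketch puts opendir, readdir, and closedir together. It counts the entries in a directory, skipping dot and dot-dot. Because readdir returns NULL both at the end of the directory and on error, the loop clears errno before each call and checks it afterward. The helper name count_entries is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * count_entries (hypothetical): number of entries in directory "path",
 * not counting dot and dot-dot, or -1 on error.
 */
long count_entries(const char *path)
{
    DIR *dp;
    struct dirent *dirp;
    long n = 0;

    if ((dp = opendir(path)) == NULL)
        return -1;
    for (;;) {
        errno = 0;                      /* tells end-of-directory from error */
        if ((dirp = readdir(dp)) == NULL)
            break;
        if (strcmp(dirp->d_name, ".") != 0 &&
            strcmp(dirp->d_name, "..") != 0)
            n++;
    }
    if (errno != 0)                     /* readdir failed */
        n = -1;
    closedir(dp);
    return n;
}
```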
Example
We'll use these directory routines to write a program that traverses a file hierarchy. The goal
is to produce the count of the various types of files that we show in Figure 4.4. The program
shown in Figure 4.22 takes a single argument (the starting pathname) and recursively descends
the hierarchy from that point. Solaris provides a function, ftw(3), that performs the actual
traversal of the hierarchy, calling a user-defined function for each file. The problem with this
function is that it calls the stat function for each file, which causes the program to follow
symbolic links. For example, if we start at the root and have a symbolic link named /lib that
points to /usr/lib, all the files in the directory /usr/lib are counted twice. To correct this,
Solaris provides an additional function, nftw(3), with an option that stops it from following
symbolic links. Although we could use nftw, we'll write our own simple file walker to show the
use of the directory routines.
In the Single UNIX Specification, both ftw and nftw are included in the XSI extensions to the
base POSIX.1 specification. Implementations are included in Solaris 9 and Linux 2.4.22.
BSD-based systems have a different function, fts(3), that provides similar functionality. It is
available in FreeBSD 5.2.1, Mac OS X 10.3, and Linux 2.4.22.
We have provided more generality in this program than needed. This was done to illustrate
the ftw function. For example, the function myfunc always returns 0, even though the function
that calls it is prepared to handle a nonzero return.
Figure 4.22. Recursively descend a directory hierarchy, counting file
types
#include "apue.h"
#include <dirent.h>
#include <limits.h>

/* function type that is called for each filename */
typedef int Myfunc(const char *, const struct stat *, int);

static Myfunc   myfunc;
static int      myftw(char *, Myfunc *);
static int      dopath(Myfunc *);

static long nreg, ndir, nblk, nchr, nfifo, nslink, nsock, ntot;

int
main(int argc, char *argv[])
{
    int     ret;

    if (argc != 2)
        err_quit("usage: ftw <starting-pathname>");

    ret = myftw(argv[1], myfunc);       /* does it all */

    ntot = nreg + ndir + nblk + nchr + nfifo + nslink + nsock;
    if (ntot == 0)
        ntot = 1;       /* avoid divide by 0; print 0 for all counts */
    printf("regular files  = %7ld, %5.2f %%\n", nreg,
      nreg*100.0/ntot);
    printf("directories    = %7ld, %5.2f %%\n", ndir,
      ndir*100.0/ntot);
    printf("block special  = %7ld, %5.2f %%\n", nblk,
      nblk*100.0/ntot);
    printf("char special   = %7ld, %5.2f %%\n", nchr,
      nchr*100.0/ntot);
    printf("FIFOs          = %7ld, %5.2f %%\n", nfifo,
      nfifo*100.0/ntot);
    printf("symbolic links = %7ld, %5.2f %%\n", nslink,
      nslink*100.0/ntot);
    printf("sockets        = %7ld, %5.2f %%\n", nsock,
      nsock*100.0/ntot);

    exit(ret);
}
/*
 * Descend through the hierarchy, starting at "pathname".
 * The caller's func() is called for every file.
 */
#define FTW_F   1       /* file other than directory */
#define FTW_D   2       /* directory */
#define FTW_DNR 3       /* directory that can't be read */
#define FTW_NS  4       /* file that we can't stat */

static char *fullpath;      /* contains full pathname for every file */

static int                  /* we return whatever func() returns */
myftw(char *pathname, Myfunc *func)
{
    int len;

    fullpath = path_alloc(&len);    /* malloc's for PATH_MAX+1 bytes */
                                    /* (Figure 2.15) */
    strncpy(fullpath, pathname, len);   /* protect against */
    fullpath[len-1] = 0;                /* buffer overrun */

    return(dopath(func));
}
/*
 * Descend through the hierarchy, starting at "fullpath".
 * If "fullpath" is anything other than a directory, we lstat() it,
 * call func(), and return. For a directory, we call ourself
 * recursively for each name in the directory.
 */
static int                  /* we return whatever func() returns */
dopath(Myfunc* func)
{
    struct stat     statbuf;
    struct dirent   *dirp;
    DIR             *dp;
    int             ret;
    char            *ptr;

    if (lstat(fullpath, &statbuf) < 0)  /* stat error */
        return(func(fullpath, &statbuf, FTW_NS));
    if (S_ISDIR(statbuf.st_mode) == 0)  /* not a directory */
        return(func(fullpath, &statbuf, FTW_F));

    /*
     * It's a directory. First call func() for the directory,
     * then process each filename in the directory.
     */
    if ((ret = func(fullpath, &statbuf, FTW_D)) != 0)
        return(ret);

    ptr = fullpath + strlen(fullpath);  /* point to end of fullpath */
    *ptr++ = '/';
    *ptr = 0;

    if ((dp = opendir(fullpath)) == NULL)   /* can't read directory */
        return(func(fullpath, &statbuf, FTW_DNR));

    while ((dirp = readdir(dp)) != NULL) {
        if (strcmp(dirp->d_name, ".") == 0 ||
            strcmp(dirp->d_name, "..") == 0)
                continue;       /* ignore dot and dot-dot */

        strcpy(ptr, dirp->d_name);  /* append name after slash */

        if ((ret = dopath(func)) != 0)      /* recursive */
            break;  /* time to leave */
    }
    ptr[-1] = 0;    /* erase everything from slash onwards */

    if (closedir(dp) < 0)
        err_ret("can't close directory %s", fullpath);

    return(ret);
}
static int
myfunc(const char *pathname, const struct stat *statptr, int type)
{
    switch (type) {
    case FTW_F:
        switch (statptr->st_mode & S_IFMT) {
        case S_IFREG:   nreg++;     break;
        case S_IFBLK:   nblk++;     break;
        case S_IFCHR:   nchr++;     break;
        case S_IFIFO:   nfifo++;    break;
        case S_IFLNK:   nslink++;   break;
        case S_IFSOCK:  nsock++;    break;
        case S_IFDIR:
            err_dump("for S_IFDIR for %s", pathname);
                    /* directories should have type = FTW_D */
        }
        break;

    case FTW_D:
        ndir++;
        break;

    case FTW_DNR:
        err_ret("can't read directory %s", pathname);
        break;

    case FTW_NS:
        err_ret("stat error for %s", pathname);
        break;

    default:
        err_dump("unknown type %d for pathname %s", type, pathname);
    }

    return(0);
}
For additional information on descending through a file system and the use of this technique in
many standard UNIX System commands (find, ls, tar, and so on), refer to Fowler, Korn, and Vo
[1989].
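For comparison with the hand-written walker in Figure 4.22, here is a sketch that uses the XSI nftw function with the FTW_PHYS flag. With FTW_PHYS, nftw reports a symbolic link as FTW_SL instead of following it, which avoids the /lib to /usr/lib double counting described earlier. The counter and function names are hypothetical. The limit of 20 simultaneously open directories is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counters filled in by count_tree (all names hypothetical). */
static long nreg, ndir, nslink, nother;

/* Called by nftw once for each file in the hierarchy. */
static int
count_one(const char *path, const struct stat *sb, int type, struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    switch (type) {
    case FTW_F:                 /* any non-directory */
        if (S_ISREG(sb->st_mode))
            nreg++;
        else
            nother++;
        break;
    case FTW_D:
        ndir++;
        break;
    case FTW_SL:                /* symbolic link, not followed */
        nslink++;
        break;
    default:                    /* FTW_DNR, FTW_NS, ... */
        nother++;
        break;
    }
    return 0;                   /* nonzero would stop the walk */
}

/* Walk "path" without following symbolic links; returns nftw's result. */
int count_tree(const char *path)
{
    nreg = ndir = nslink = nother = 0;
    /* up to 20 directories may be held open at once */
    return nftw(path, count_one, 20, FTW_PHYS);
}
```

As with myfunc in Figure 4.22, a nonzero return from the callback ends the walk early, and nftw passes that value back to its caller.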
4.22. chdir, fchdir, and getcwd Functions
Every process has a current working directory. This directory is where the search for all
relative pathnames starts (all pathnames that do not begin with a slash). When a user logs in
to a UNIX system, the current working directory normally starts at the directory specified by
the sixth field in the /etc/passwd file: the user's home directory. The current working directory is
an attribute of a process; the home directory is an attribute of a login name.
We can change the current working directory of the calling process by calling the chdir or
fchdir functions.
#include <unistd.h>
int chdir(const char *pathname);
int fchdir(int filedes);
Both return: 0 if OK, -1 on error
We can specify the new current working directory either as a pathname or through an open
file descriptor.
The fchdir function is not part of the base POSIX.1 specification. It is an XSI extension in the
Single UNIX Specification. All four platforms discussed in this book support fchdir.
Example
Because it is an attribute of a process, the current working directory cannot affect processes
that invoke the process that executes the chdir. (We describe the relationship between
processes in more detail in Chapter 8.) This means that the program in Figure 4.23 doesn't do
what we might expect.
If we compile it and call the executable mycd, we get the following:
$ pwd
/usr/lib
$ mycd
chdir to /tmp succeeded
$ pwd
/usr/lib
The current working directory for the shell that executed the mycd program didn't change. This
is a side effect of the way that the shell executes programs. Each program is run in a
separate process, so the current working directory of the shell is unaffected by the call to
chdir in the program. For this reason, the chdir function has to be called directly from the
shell, so the cd command is built into the shells.
Figure 4.23. Example of chdir function
#include "apue.h"

int
main(void)
{
    if (chdir("/tmp") < 0)
        err_sys("chdir failed");
    printf("chdir to /tmp succeeded\n");
    exit(0);
}
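You can check the same point from a program. In this sketch, child_chdir_is_private is a hypothetical name. A child created with fork changes its own working directory and exits. The parent then verifies that its working directory is unchanged, just as the shell's was unaffected by mycd. fork and waitpid are described in Chapter 8.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * child_chdir_is_private (hypothetical): fork a child that changes its
 * working directory to "dir", then check that the parent's working
 * directory did not change.  Returns 1 if unchanged, 0 if changed,
 * -1 on error (including a failed chdir in the child).
 */
int child_chdir_is_private(const char *dir)
{
    char before[4096], after[4096];
    pid_t pid;
    int status;

    if (getcwd(before, sizeof before) == NULL)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0)                       /* child: changes only its own cwd */
        _exit(chdir(dir) == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```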
Because the kernel must maintain knowledge of the current working directory, we should be
able to fetch its current value. Unfortunately, the kernel doesn't maintain the full pathname of
the directory. Instead, the kernel keeps information about the directory, such as a pointer to
the directory's v-node.
What we need is a function that starts at the current working directory (dot) and works its
way up the directory hierarchy, using dot-dot to move up one level. At each directory, the
function reads the directory entries until it finds the name that corresponds to the i-node of
the directory that it just came from. Repeating this procedure until the root is encountered
yields the entire absolute pathname of the current working directory. Fortunately, a function
is already provided for us that does this task.
#include <unistd.h>
char *getcwd(char *buf, size_t size);
Returns: buf if OK, NULL on error
We must pass to this function the address of a buffer, buf, and its size (in bytes). The buffer
must be large enough to accommodate the absolute pathname plus a terminating null byte, or
an error is returned. (Recall the discussion of allocating space for a maximum-sized pathname
in Section 2.5.5.)
Some older implementations of getcwd allow the first argument buf to be NULL. In this case, the
function calls malloc to allocate size number of bytes dynamically. This is not part of POSIX.1
or the Single UNIX Specification and should be avoided.
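A portable alternative to passing NULL is to start with a small buffer and grow it for as long as getcwd fails with ERANGE. ERANGE is the error POSIX specifies for a buffer that is too small. getcwd_alloc is a hypothetical helper. Its starting size of 64 bytes is deliberately small so that the growth path gets exercised.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * getcwd_alloc (hypothetical): return the working directory in a
 * malloc'ed buffer that the caller frees, or NULL on error.  The
 * buffer is doubled for as long as getcwd fails with ERANGE.
 */
char *getcwd_alloc(void)
{
    size_t size = 64;                   /* deliberately small first guess */
    char *buf = NULL, *nbuf;

    for (;;) {
        if ((nbuf = realloc(buf, size)) == NULL)
            break;
        buf = nbuf;
        if (getcwd(buf, size) != NULL)
            return buf;
        if (errno != ERANGE)            /* a real error, not "too small" */
            break;
        size *= 2;
    }
    free(buf);
    return NULL;
}
```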
Example
The program in Figure 4.24 changes to a specific directory and then calls getcwd to print the
working directory. If we run the program, we get
$ ./a.out
cwd = /var/spool/uucppublic
$ ls -l /usr/spool
lrwxrwxrwx 1 root 12 Jan 31 07:57 /usr/spool -> ../var/spool
Note that chdir follows the symbolic link (as we expect it to, from Figure 4.17), but when it goes
up the directory tree, getcwd has no idea when it hits the /var/spool directory that it is
pointed to by the symbolic link /usr/spool. This is a characteristic of symbolic links.
Figure 4.24. Example of getcwd function
#include "apue.h"

int
main(void)
{
    char    *ptr;
    int     size;

    if (chdir("/usr/spool/uucppublic") < 0)
        err_sys("chdir failed");

    ptr = path_alloc(&size);    /* our own function */
    if (getcwd(ptr, size) == NULL)
        err_sys("getcwd failed");

    printf("cwd = %s\n", ptr);
    exit(0);
}
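The dot-dot walk described earlier in this section can be sketched directly. walk_getcwd is a hypothetical, simplified stand-in for getcwd. At each level it compares the device and i-node numbers of dot with those of every entry in dot-dot. It calls lstat on each entry rather than trusting the d_ino field, because at a mount point d_ino describes the covered directory, not the root of the mounted file system. The function changes directories as it climbs and returns to its starting point through fchdir. Because it follows only the physical dot-dot links, it also shows why getcwd cannot know that /usr/spool was reached through a symbolic link.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * walk_getcwd (hypothetical): rebuild the absolute pathname of the
 * working directory by climbing through "..".  Returns buf, or NULL
 * on error (buffer too small, unreadable ancestor, ...).  The working
 * directory is restored before returning.
 */
char *walk_getcwd(char *buf, size_t size)
{
    struct stat cur, par, ent;
    struct dirent *dirp;
    DIR *dp;
    char *tmp, *ret = NULL;
    int start, n;

    if (size < 2 || (tmp = malloc(size)) == NULL)
        return NULL;
    if ((start = open(".", O_RDONLY)) < 0) {    /* where we began */
        free(tmp);
        return NULL;
    }
    buf[0] = '\0';                  /* result is built right to left */
    for (;;) {
        if (stat(".", &cur) < 0 || stat("..", &par) < 0)
            goto done;
        if (cur.st_dev == par.st_dev && cur.st_ino == par.st_ino)
            break;                  /* only the root is its own parent */
        if ((dp = opendir("..")) == NULL)
            goto done;
        while ((dirp = readdir(dp)) != NULL) {
            if (strcmp(dirp->d_name, ".") == 0 ||
                strcmp(dirp->d_name, "..") == 0)
                continue;
            n = snprintf(tmp, size, "../%s", dirp->d_name);
            if (n < 0 || (size_t)n >= size)
                continue;           /* name too long for our buffer */
            /* lstat, not d_ino: the two differ at a mount point */
            if (lstat(tmp, &ent) == 0 && ent.st_dev == cur.st_dev &&
                ent.st_ino == cur.st_ino)
                break;              /* found our own name */
        }
        n = dirp == NULL ? -1 : snprintf(tmp, size, "/%s%s", dirp->d_name, buf);
        closedir(dp);
        if (n < 0 || (size_t)n >= size)
            goto done;              /* no match, or buffer too small */
        strcpy(buf, tmp);           /* prepend "/name" */
        if (chdir("..") < 0)
            goto done;
    }
    if (buf[0] == '\0')
        strcpy(buf, "/");           /* we started at the root */
    ret = buf;
done:
    if (fchdir(start) < 0)          /* return to the starting directory */
        ret = NULL;
    close(start);
    free(tmp);
    return ret;
}
```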
The getcwd function is useful when we have an application that needs to return to the
location in the file system where it started out. We can save the starting location by calling
getcwd before we change our working directory. After we complete our processing, we can
pass the pathname obtained from getcwd to chdir to return to our starting location in the file
system.
The fchdir function provides us with an easy way to accomplish this task. Instead of calling
getcwd, we can open the current directory and save the file descriptor before we change to a
different location in the file system. When we want to return to where we started, we can
simply pass the file descriptor to fchdir.
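Here is a minimal sketch of that technique, using hypothetical helper names. save_cwd opens dot. restore_cwd passes the saved descriptor to fchdir and then closes it. Unlike the getcwd approach, the descriptor refers to the directory itself, so no pathname buffer has to be sized or allocated.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* save_cwd (hypothetical): descriptor for the current directory, or -1. */
int save_cwd(void)
{
    return open(".", O_RDONLY);
}

/*
 * restore_cwd (hypothetical): change back to the directory saved by
 * save_cwd and release the descriptor.  Returns 0 on success, -1 on error.
 */
int restore_cwd(int fd)
{
    int ret = fchdir(fd);

    close(fd);
    return ret;
}
```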
4.23. Device Special Files
The two fields st_dev and st_rdev are often confused. We'll need to use these fields in Section
18.9 when we write the ttyname function. The rules are simple.

Every file system is known by its major and minor device numbers, which are encoded
in the primitive system data type dev_t. The major number identifies the device driver
and sometimes encodes which peripheral board to communicate with; the minor number
identifies the specific subdevice. Recall from Figure 4.13 that a disk drive often
contains several file systems. Each file system on the same disk drive would usually
have the same major number, but a different minor number.

We can usually access the major and minor device numbers through two macros
defined by most implementations: major and minor. This means that we don't care how
the two numbers are stored in a dev_t object.
Early systems stored the device number in a 16-bit integer, with 8 bits for the major
number and 8 bits for the minor number. FreeBSD 5.2.1 and Mac OS X 10.3 use a
32-bit integer, with 8 bits for the major number and 24 bits for the minor number. On
32-bit systems, Solaris 9 uses a 32-bit integer for dev_t, with 14 bits designated as the
major number and 18 bits designated as the minor number. On 64-bit systems, Solaris 9
represents dev_t as a 64-bit integer, with 32 bits for each number. On Linux 2.4.22,
although dev_t is a 64-bit integer, currently the major and minor numbers are each only
8 bits.
POSIX.1 states that the dev_t type exists, but doesn't define what it contains or how
to get at its contents. The macros major and minor are defined by most
implementations. Which header they are defined in depends on the system. They can
be found in <sys/types.h> on BSD-based systems. Solaris defines them in <sys/mkdev.h>.
Linux defines these macros in <sys/sysmacros.h>, which is included by <sys/types.h>.

The st_dev value for every filename on a system is the device number of the file
system containing that filename and its corresponding i-node.

Only character special files and block special files have an st_rdev value. This value
contains the device number for the actual device.
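The following sketch reads the two numbers through the macros without depending on how they are packed into a dev_t. The header choice follows the platform notes above: <sys/sysmacros.h> on Linux, <sys/mkdev.h> on Solaris, and <sys/types.h> on BSD-based systems. describe_dev is a hypothetical helper. The test also uses makedev, the inverse of major and minor. Like them, makedev is widely available but not specified by POSIX.1.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#if defined(__linux__)
#include <sys/sysmacros.h>      /* major, minor, makedev */
#elif defined(__sun)
#include <sys/mkdev.h>
#endif

/*
 * describe_dev (hypothetical): format a dev_t as "major/minor"
 * without knowing how the two numbers are packed inside it.
 */
void describe_dev(dev_t dev, char *buf, size_t size)
{
    snprintf(buf, size, "%u/%u",
             (unsigned)major(dev), (unsigned)minor(dev));
}
```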
Example
The program in Figure 4.25 prints the device number for each command-line argument.
Additionally, if the argument refers to a character special file or a block special file, the
st_rdev value for the special file is also printed.
Running this program gives us the following output:
$ ./a.out / /home/sar /dev/tty[01]
/: dev = 3/3
/home/sar: dev = 3/4
/dev/tty0: dev = 0/7 (character) rdev = 4/0
/dev/tty1: dev = 0/7 (character) rdev = 4/1
$ mount
which directories are mounted on which devices?
/dev/hda3 on / type ext2 (rw,noatime)
/dev/hda4 on /home type ext2 (rw,noatime)
$ ls -lL /dev/tty[01] /dev/hda[34]
brw-------  1 root       3,   3 Dec 31  1969 /dev/hda3
brw-------  1 root       3,   4 Dec 31  1969 /dev/hda4
crw-------  1 root       4,   0 Dec 31  1969 /dev/tty0
crw-------  1 root       4,   1 Jan 18 15:36 /dev/tty1
The first two arguments to the program are directories (/ and /home/sar), and the next two
are the device names /dev/tty[01]. (We use the shell's regular expression language to shorten
the amount of typing we need to do. The shell will expand the string /dev/tty[01] to /dev/tty0
/dev/tty1.)
We expect the devices to be character special files. The output from the program shows that
the root directory has a different device number than does the /home/sar directory. This
indicates that they are on different file systems. Running the mount(1) command verifies this.
We then use ls to look at the two disk devices reported by mount and the two terminal
devices. The two disk devices are block special files, and the two terminal devices are
character special files. (Normally, the only types of devices that are block special files are
those that can contain random-access file systems: disk drives, floppy disk drives, and
CD-ROMs, for example. Some older versions of the UNIX System supported magnetic tapes for
file systems, but this was never widely used.)
Note that the filenames and i-nodes for the two terminal devices (st_dev) are on device
0/7 (the devfs pseudo file system, which implements /dev), but that their actual device
numbers are 4/0 and 4/1.
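Comparing st_dev values is also how a program can tell whether two pathnames are on the same file system. The different dev values for / and /home/sar in the output above show exactly this. same_filesystem is a hypothetical helper. It uses stat, so a symbolic link is judged by the file system of its target.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/*
 * same_filesystem (hypothetical): 1 if both pathnames are on the same
 * file system (equal st_dev), 0 if not, -1 if either stat fails.
 */
int same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```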
Figure 4.25. Print st_dev and st_rdev values
#include "apue.h"
#ifdef SOLARIS
#include <sys/mkdev.h>
#endif

int
main(int argc, char *argv[])
{
    int         i;
    struct stat buf;

    for (i = 1; i < argc; i++) {
        printf("%s: ", argv[i]);
        if (stat(argv[i], &buf) < 0) {
            err_ret("stat error");
            continue;
        }

        printf("dev = %d/%d", major(buf.st_dev), minor(buf.st_dev));
        if (S_ISCHR(buf.st_mode) || S_ISBLK(buf.st_mode)) {
            printf(" (%s) rdev = %d/%d",
                    (S_ISCHR(buf.st_mode)) ? "character" : "block",
                    major(buf.st_rdev), minor(buf.st_rdev));
        }
        printf("\n");
    }

    exit(0);
}
4.24. Summary of File Access Permission Bits
We've covered all the file access permission bits, some of which serve multiple purposes.
Figure 4.26 summarizes all these permission bits and their interpretation when applied to a
regular file and a directory.
Figure 4.26. Summary of file access permission bits

Constant  Description     Effect on regular file                  Effect on directory
--------  --------------  --------------------------------------  ------------------------------------------
S_ISUID   set-user-ID     set effective user ID on execution      (not used)
S_ISGID   set-group-ID    if group-execute set, then set          set group ID of new files created in
                          effective group ID on execution;        directory to group ID of directory
                          otherwise enable mandatory record
                          locking (if supported)
S_ISVTX   sticky bit      control caching of file contents        restrict removal and renaming of files
                          (if supported)                          in directory
S_IRUSR   user-read       user permission to read file            user permission to read directory entries
S_IWUSR   user-write      user permission to write file           user permission to remove and create
                                                                  files in directory
S_IXUSR   user-execute    user permission to execute file         user permission to search for given
                                                                  pathname in directory
S_IRGRP   group-read      group permission to read file           group permission to read directory entries
S_IWGRP   group-write     group permission to write file          group permission to remove and create
                                                                  files in directory
S_IXGRP   group-execute   group permission to execute file        group permission to search for given
                                                                  pathname in directory
S_IROTH   other-read      other permission to read file           other permission to read directory entries
S_IWOTH   other-write     other permission to write file          other permission to remove and create
                                                                  files in directory
S_IXOTH   other-execute   other permission to execute file        other permission to search for given
                                                                  pathname in directory
The final nine constants can also be grouped into threes, since
S_IRWXU = S_IRUSR | S_IWUSR | S_IXUSR
S_IRWXG = S_IRGRP | S_IWGRP | S_IXGRP
S_IRWXO = S_IROTH | S_IWOTH | S_IXOTH
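To tie the constants together, this sketch renders a mode in the nine-character form printed by ls -l. The function name is hypothetical. The s/S and t/T conventions for the set-ID and sticky bits follow common ls output. Some systems display a set-group-ID bit without group-execute differently, for example as l to signal mandatory locking.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * mode_string (hypothetical): write the permission bits of "mode" as
 * ls -l shows them, e.g. "rwxr-xr-x", into buf (at least 10 bytes).
 * s/t: set-ID or sticky bit with execute on; S/T: with execute off.
 */
void mode_string(mode_t mode, char buf[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    if (mode & S_ISUID)
        buf[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID)
        buf[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX)
        buf[8] = (mode & S_IXOTH) ? 't' : 'T';
    buf[9] = '\0';
}
```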
4.25. Summary
This chapter has centered around the stat function. We've gone through each member in the
stat structure in detail. This in turn led us to examine all the attributes of UNIX files. A
thorough understanding of all the properties of a file and all the functions that operate on files
is essential to UNIX programming.
Exercises
4.1   Modify the program in Figure 4.3 to use stat instead of lstat. What changes if
      one of the command-line arguments is a symbolic link?
4.2   What happens if the file mode creation mask is set to 777 (octal)? Verify the
      results using your shell's umask command.
4.3   Verify that turning off user-read permission for a file that you own denies your
      access to the file.
4.4   Run the program in Figure 4.9 after creating the files foo and bar. What
      happens?
4.5   In Section 4.12, we said that a file size of 0 is valid for a regular file. We also
      said that the st_size field is defined for directories and symbolic links. Should
      we ever see a file size of 0 for a directory or a symbolic link?
4.6   Write a utility like cp(1) that copies a file containing holes, without writing the
      bytes of 0 to the output file.
4.7   Note in the output from the ls command in Section 4.12 that the files core and
      core.copy have different access permissions. If the umask value didn't change
      between the creation of the two files, explain how the difference could have
      occurred.
4.8   When running the program in Figure 4.16, we check the available disk space
      with the df(1) command. Why didn't we use the du(1) command?
4.9   In Figure 4.20, we show the unlink function as modifying the changed-status
      time of the file itself. How can this happen?
4.10  In Section 4.21, how does the system's limit on the number of open files affect
      the myftw function?
4.11  In Section 4.21, our version of ftw never changes its directory. Modify this
      routine so that each time it encounters a directory, it does a chdir to that
      directory, allowing it to use the filename and not the pathname for each call to
      lstat. When all the entries in a directory have been processed, execute
      chdir(".."). Compare the time used by this version and the version in the text.
4.12  Each process also has a root directory that is used for resolution of absolute
      pathnames. This root directory can be changed with the chroot function. Look
      up the description for this function in your manuals. When might this function
      be useful?
4.13  How can you set only one of the two time values with the utime function?
4.14  Some versions of the finger(1) command output "New mail received ..." and
      "unread since ..." where ... are the corresponding times and dates. How can
      the program determine these two times and dates?
4.15  Examine the archive formats used by the cpio(1) and tar(1) commands. (These
      descriptions are usually found in Section 5 of the UNIX Programmer's Manual.)
      How many of the three possible time values are saved for each file? When a file
      is restored, what value do you think the access time is set to, and why?
4.16  Does the UNIX System have a fundamental limitation on the depth of a
      directory tree? To find out, write a program that creates a directory and then
      changes to that directory, in a loop. Make certain that the length of the
      absolute pathname of the leaf of this directory is greater than your system's
      PATH_MAX limit. Can you call getcwd to fetch the directory's pathname? How do
      the standard UNIX System tools deal with this long pathname? Can you archive
      the directory using either tar or cpio?
4.17  In Section 3.16, we described the /dev/fd feature. For any user to be able to
      access these files, their permissions must be rw-rw-rw-. Some programs that
      create an output file delete the file first, in case it already exists, ignoring the
      return code:
          unlink(path);
          if ((fd = creat(path, FILE_MODE)) < 0)
              err_sys(...);
      What happens if path is /dev/fd/1?
Chapter 5. Standard I/O Library
Section 5.1. Introduction
Section 5.2. Streams and FILE Objects
Section 5.3. Standard Input, Standard Output, and Standard Error
Section 5.4. Buffering
Section 5.5. Opening a Stream
Section 5.6. Reading and Writing a Stream
Section 5.7. Line-at-a-Time I/O
Section 5.8. Standard I/O Efficiency
Section 5.9. Binary I/O
Section 5.10. Positioning a Stream
Section 5.11. Formatted I/O
Section 5.12. Implementation Details
Section 5.13. Temporary Files
Section 5.14. Alternatives to Standard I/O
Section 5.15. Summary
Exercises
5.1. Introduction
In this chapter, we describe the standard I/O library. This library is specified by the ISO C
standard because it has been implemented on many operating systems other than the UNIX
System. Additional interfaces are defined as extensions to the ISO C standard by the Single
UNIX Specification.
The standard I/O library handles such details as buffer allocation and performing I/O in
optimal-sized chunks, obviating our need to worry about using the correct block size (as in
Section 3.9). This makes the library easy to use, but at the same time introduces another set
of problems if we're not cognizant of what's going on.
The standard I/O library was written by Dennis Ritchie around 1975. It was a major revision of
the Portable I/O library written by Mike Lesk. Surprisingly, little has changed in the standard
I/O library in the 30 years since.
5.2. Streams and FILE Objects
In Chapter 3, all the I/O routines centered around file descriptors. When a file is opened, a file
descriptor is returned, and that descriptor is then used for all subsequent I/O operations. With
the standard I/O library, the discussion centers around streams. (Do not confuse the
standard I/O term stream with the STREAMS I/O system that is part of System V and
standardized in the XSI STREAMS option in the Single UNIX Specification.) When we open or
create a file with the standard I/O library, we say that we have associated a stream with the
file.
With the ASCII character set, a single character is represented by a single byte. With
international character sets, a character can be represented by more than one byte.
Standard I/O file streams can be used with single-byte and multibyte ("wide") character sets.
A stream's orientation determines whether the characters that are read and written are
single-byte or multibyte. Initially, when a stream is created, it has no orientation. If a
multibyte I/O function (see <wchar.h>) is used on a stream without orientation, the stream's
orientation is set to wide-oriented. If a byte I/O function is used on a stream without
orientation, the stream's orientation is set to byte-oriented. Only two functions can change
the orientation once set. The freopen function (discussed shortly) will clear a stream's
orientation; the fwide function can be used to set a stream's orientation.
#include <stdio.h>
#include <wchar.h>
int fwide(FILE *fp, int mode);
Returns: positive if stream is wide-oriented,
negative if stream is byte-oriented,
or 0 if stream has no orientation
The fwide function performs different tasks, depending on the value of the mode argument.

If the mode argument is negative, fwide will try to make the specified stream
byte-oriented.

If the mode argument is positive, fwide will try to make the specified stream
wide-oriented.

If the mode argument is zero, fwide will not try to set the orientation, but will still
return a value identifying the stream's orientation.
Note that fwide will not change the orientation of a stream that is already oriented. Also note
that there is no error return. Consider what would happen if the stream is invalid. The only
recourse we have is to clear errno before calling fwide and check the value of errno when it
returns. Throughout the rest of this book, we will deal only with byte-oriented streams.
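Here is a minimal sketch of these rules on a freshly created stream (orientation_demo is a hypothetical name). A query with a mode of 0 reports no orientation. A negative mode then makes the stream byte-oriented, and a later positive request cannot change that.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * orientation_demo (hypothetical): record fwide's result on a fresh
 * stream before any I/O, after asking for byte orientation, and after
 * then asking for wide orientation.  Returns 0 on success, -1 if the
 * temporary stream cannot be created.
 */
int orientation_demo(int results[3])
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    results[0] = fwide(fp, 0);   /* query only: 0 means no orientation yet */
    results[1] = fwide(fp, -1);  /* request byte orientation: negative */
    results[2] = fwide(fp, 1);   /* already oriented, so still negative */
    fclose(fp);
    return 0;
}
```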
When we open a stream, the standard I/O function fopen returns a pointer to a FILE object.
This object is normally a structure that contains all the information required by the standard
I/O library to manage the stream: the file descriptor used for actual I/O, a pointer to a buffer
for the stream, the size of the buffer, a count of the number of characters currently in the
buffer, an error flag, and the like.
Application software should never need to examine a FILE object. To reference the stream,
we pass its FILE pointer as an argument to each standard I/O function. Throughout this text,
we'll refer to a pointer to a FILE object, the type FILE *, as a file pointer.
Throughout this chapter, we describe the standard I/O library in the context of a UNIX
system. As we mentioned, this library has already been ported to a wide variety of other
operating systems. But to provide some insight about how this library can be implemented, we
will talk about its typical implementation on a UNIX system.
5.3. Standard Input, Standard Output, and Standard Error
Three streams are predefined and automatically available to a process: standard input,
standard output, and standard error. These streams refer to the same files as the file
descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, which we mentioned in Section
3.2.
These three standard I/O streams are referenced through the predefined file pointers stdin,
stdout, and stderr. The file pointers are defined in the <stdio.h> header.
5.4. Buffering
The goal of the buffering provided by the standard I/O library is to use the minimum number
of read and write calls. (Recall Figure 3.5, where we showed the amount of CPU time required
to perform I/O using various buffer sizes.) Also, it tries to do its buffering automatically for
each I/O stream, obviating the need for the application to worry about it. Unfortunately, the
single aspect of the standard I/O library that generates the most confusion is its buffering.
Three types of buffering are provided:
1.
Fully buffered. In this case, actual I/O takes place when the standard I/O buffer is
filled. Files residing on disk are normally fully buffered by the standard I/O library. The
buffer used is usually obtained by one of the standard I/O functions calling malloc
(Section 7.8) the first time I/O is performed on a stream.
The term flush describes the writing of a standard I/O buffer. A buffer can be flushed
automatically by the standard I/O routines, such as when a buffer fills, or we can call
the function fflush to flush a stream. Unfortunately, in the UNIX environment, flush
means two different things. In terms of the standard I/O library, it means writing out
the contents of a buffer, which may be partially filled. In terms of the terminal driver,
such as the tcflush function in Chapter 18, it means to discard the data that's already
stored in a buffer.
2.
Line buffered. In this case, the standard I/O library performs I/O when a newline
character is encountered on input or output. This allows us to output a single
character at a time (with the standard I/O fputc function), knowing that actual I/O will
take place only when we finish writing each line. Line buffering is typically used on a
stream when it refers to a terminal: standard input and standard output, for example.
Line buffering comes with two caveats. First, the size of the buffer that the standard
I/O library is using to collect each line is fixed, so I/O might take place if we fill this
buffer before writing a newline. Second, whenever input is requested through the
standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream
(that requires data to be requested from the kernel), all line-buffered output streams
are flushed. The reason for the qualifier on (b) is that the requested data may already
be in the buffer, which doesn't require data to be read from the kernel. Obviously, any
input from an unbuffered stream, item (a), requires data to be obtained from the
kernel.
3.
Unbuffered. The standard I/O library does not buffer the characters. If we write 15
characters with the standard I/O fputs function, for example, we expect these 15
characters to be output as soon as possible, probably with the write function from
Section 3.8.
The standard error stream, for example, is normally unbuffered. This is so that any
error messages are displayed as quickly as possible, regardless of whether they contain
a newline.
ISO C requires the following buffering characteristics.

Standard input and standard output are fully buffered, if and only if they do not refer
to an interactive device.

Standard error is never fully buffered.
This, however, doesn't tell us whether standard input and standard output can be unbuffered
or line buffered if they refer to an interactive device and whether standard error should be
unbuffered or line buffered. Most implementations default to the following types of buffering.

Standard error is always unbuffered.

All other streams are line buffered if they refer to a terminal device; otherwise, they
are fully buffered.
The four platforms discussed in this book follow these conventions for standard I/O
buffering: standard error is unbuffered, streams open to terminal devices are line
buffered, and all other streams are fully buffered.
We explore standard I/O buffering in more detail in Section 5.12 and Figure 5.11.
If we don't like these defaults for any given stream, we can change the buffering by calling
either of the following two functions.
#include <stdio.h>
void setbuf(FILE *restrict fp, char *restrict buf);
int setvbuf(FILE *restrict fp, char *restrict buf, int mode,
            size_t size);
Returns: 0 if OK, nonzero on error
These functions must be called after the stream has been opened (obviously, since each
requires a valid file pointer as its first argument) but before any other operation is performed
on the stream.
With setbuf, we can turn buffering on or off. To enable buffering, buf must point to a buffer of
length BUFSIZ, a constant defined in <stdio.h>. Normally, the stream is then fully buffered, but
some systems may set line buffering if the stream is associated with a terminal device. To
disable buffering, we set buf to NULL.
With setvbuf, we specify exactly which type of buffering we want. This is done with the mode
argument:
_IOFBF fully buffered
_IOLBF line buffered
_IONBF unbuffered
If we specify an unbuffered stream, the buf and size arguments are ignored. If we specify fully
buffered or line buffered, buf and size can optionally specify a buffer and its size. If the
stream is buffered and buf is NULL, the standard I/O library will automatically allocate its own
buffer of the appropriate size for the stream. By appropriate size, we mean the value specified
by the constant BUFSIZ.
Some C library implementations use the value from the st_blksize member of the stat
structure (see Section 4.2) to determine the optimal standard I/O buffer size. As we will see
later in this chapter, the GNU C library uses this method.
Figure 5.1 summarizes the actions of these two functions and their various options.
Figure 5.1. Summary of the setbuf and setvbuf functions
Function  mode     buf        Buffer and length                    Type of buffering
setbuf             non-null   user buf of length BUFSIZ            fully buffered or line buffered
setbuf             NULL       (no buffer)                          unbuffered
setvbuf   _IOFBF   non-null   user buf of length size              fully buffered
setvbuf   _IOFBF   NULL       system buffer of appropriate length  fully buffered
setvbuf   _IOLBF   non-null   user buf of length size              line buffered
setvbuf   _IOLBF   NULL       system buffer of appropriate length  line buffered
setvbuf   _IONBF   (ignored)  (no buffer)                          unbuffered
Be aware that if we allocate a standard I/O buffer as an automatic variable within a function,
we have to close the stream before returning from the function. (We'll discuss this more in
Section 7.8.) Also, some implementations use part of the buffer for internal bookkeeping, so
the actual number of bytes of data that can be stored in the buffer is less than size. In
general, we should let the system choose the buffer size and automatically allocate the
buffer. When we do this, the standard I/O library automatically releases the buffer when we
close the stream.
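The following sketch shows the calling pattern for setvbuf: it is called right after the stream is opened and before any other operation. The helper names set_buffering and use_static_buffer are hypothetical. Passing NULL for buf lets the library allocate and later release the buffer; when we supply our own, a static array sidesteps the automatic-variable problem mentioned above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: select _IOFBF, _IOLBF, or _IONBF for a stream
 * that has just been opened.  buf is NULL, so the library allocates
 * (and on fclose releases) any buffer it needs; for _IONBF the buf
 * and size arguments are ignored.  Returns 0 on success.
 */
int
set_buffering(FILE *fp, int mode)
{
    return setvbuf(fp, NULL, mode, BUFSIZ);
}

/*
 * A user-supplied buffer must outlive the stream.  A static array
 * satisfies that, but it must not be shared by two open streams.
 */
static char logbuf[BUFSIZ];

int
use_static_buffer(FILE *fp)
{
    return setvbuf(fp, logbuf, _IOFBF, sizeof(logbuf));
}
```

Either helper has to run before the first read or write on the stream; calling setvbuf later has undefined behavior under ISO C.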
At any time, we can force a stream to be flushed.
#include <stdio.h>
int fflush(FILE *fp);
Returns: 0 if OK, EOF on error
This function causes any unwritten data for the stream to be passed to the kernel. As a
special case, if fp is NULL, this function causes all output streams to be flushed.
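To observe that fflush is what hands buffered data to the kernel, this sketch compares the file size that fstat reports before and after the call. It assumes a POSIX system (for fileno and fstat), and the helper name kernel_size is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the size the kernel currently records
 * for the file underlying fp, or -1 on error.  Bytes still sitting in
 * the standard I/O buffer are not included.
 */
long
kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) < 0)
        return -1;
    return (long)st.st_size;
}
```

With a fully buffered stream from tmpfile, writing "hello" with fputs leaves kernel_size at 0; after fflush it reports 5.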
5.5. Opening a Stream
The following three functions open a standard I/O stream.
#include <stdio.h>
FILE *fopen(const char *restrict pathname, const char *restrict type);
FILE *freopen(const char *restrict pathname, const char *restrict type,
              FILE *restrict fp);
FILE *fdopen(int filedes, const char *type);
All three return: file pointer if OK, NULL on error
The differences in these three functions are as follows.
1.
The fopen function opens a specified file.
2.
The freopen function opens a specified file on a specified stream, closing the stream
first if it is already open. If the stream previously had an orientation, freopen clears it.
This function is typically used to open a specified file as one of the predefined
streams: standard input, standard output, or standard error.
3.
The fdopen function takes an existing file descriptor, which we could obtain from the
open, dup, dup2, fcntl, pipe, socket, socketpair, or accept functions, and associates a
standard I/O stream with the descriptor. This function is often used with descriptors
that are returned by the functions that create pipes and network communication
channels. Because these special types of files cannot be opened with the standard
I/O fopen function, we have to call the device-specific function to obtain a file
descriptor, and then associate this descriptor with a standard I/O stream using fdopen.
Both fopen and freopen are part of ISO C; fdopen is part of POSIX.1, since ISO C
doesn't deal with file descriptors.
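As an illustration of the typical fdopen use, the sketch below wraps both ends of a pipe in standard I/O streams, since fopen cannot open a pipe. It assumes a POSIX system; the helper name pipe_streams is hypothetical, and the error-path cleanup is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a pipe and associate a stream with each
 * descriptor.  On success *rp reads from the pipe, *wp writes to it,
 * and 0 is returned; on failure -1 is returned.
 */
int
pipe_streams(FILE **rp, FILE **wp)
{
    int fd[2];

    if (pipe(fd) < 0)
        return -1;
    if ((*rp = fdopen(fd[0], "r")) == NULL) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if ((*wp = fdopen(fd[1], "w")) == NULL) {
        fclose(*rp);        /* also closes fd[0] */
        close(fd[1]);
        return -1;
    }
    return 0;
}
```

Because a pipe is not a terminal, the write stream is fully buffered, so the writer needs fflush (or fclose) before the reader can see the data.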
ISO C specifies 15 values for the type argument, shown in Figure 5.2.
Figure 5.2. The type argument for opening a standard I/O stream
type                  Description
r or rb               open for reading
w or wb               truncate to 0 length or create for writing
a or ab               append; open for writing at end of file, or create for writing
r+ or r+b or rb+      open for reading and writing
w+ or w+b or wb+      truncate to 0 length or create for reading and writing
a+ or a+b or ab+      open or create for reading and writing at end of file
Using the character b as part of the type allows the standard I/O system to differentiate
between a text file and a binary file. Since the UNIX kernel doesn't differentiate between
these types of files, specifying the character b as part of the type has no effect.
With fdopen, the meanings of the type argument differ slightly. The descriptor has already
been opened, so opening for write does not truncate the file. (If the descriptor was created
by the open function, for example, and the file already existed, the O_TRUNC flag would control
whether or not the file was truncated. The fdopen function cannot simply truncate any file it
opens for writing.) Also, the standard I/O append mode cannot create the file (since the file
has to exist if a descriptor refers to it).
When a file is opened with a type of append, each write will take place at the then-current
end of file. If multiple processes open the same file with the standard I/O append mode, the
data from each process will be correctly written to the file.
Versions of fopen from Berkeley before 4.4BSD and the simple version shown on page 177 of
Kernighan and Ritchie [1988] do not handle the append mode correctly. These versions do an
lseek to the end of file when the stream is opened. To correctly support the append mode
when multiple processes are involved, the file must be opened with the O_APPEND flag, which
we discussed in Section 3.3. Doing an lseek before each write won't work either, as we
discussed in Section 3.11.
When a file is opened for reading and writing (the plus sign in the type), the following
restrictions apply.

Output cannot be directly followed by input without an intervening fflush, fseek,
fsetpos, or rewind.

Input cannot be directly followed by output without an intervening fseek, fsetpos, or
rewind, or an input operation that encounters an end of file.
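The first restriction in practice: after writing to an update stream, we must reposition (here with rewind) before reading. The helper name write_then_read is hypothetical; the usage below relies on tmpfile, which ISO C specifies opens its file with type "wb+".

```c
#include <stdio.h>

/*
 * Hypothetical helper: write s to an update-mode stream, then read the
 * first character back.  The rewind call is what makes the switch
 * from output to input legal.  Returns that character, or EOF.
 */
int
write_then_read(FILE *fp, const char *s)
{
    if (fputs(s, fp) == EOF)
        return EOF;
    rewind(fp);             /* required between output and input */
    return getc(fp);
}
```

Omitting the rewind would leave the stream in an undefined state; on many implementations the getc would simply return EOF.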
We can summarize the six ways to open a stream from Figure 5.2 in Figure 5.3.
Figure 5.3. Six ways to open a standard I/O stream
Restriction                          r    w    a    r+   w+   a+
file must already exist              •              •
previous contents of file discarded       •              •
stream can be read                   •              •    •    •
stream can be written                     •    •    •    •    •
stream can be written only at end              •              •
Note that if a new file is created by specifying a type of either w or a, we are not able to
specify the file's access permission bits, as we were able to do with the open function and
the creat function in Chapter 3.
By default, the stream that is opened is fully buffered, unless it refers to a terminal device, in
which case it is line buffered. Once the stream is opened, but before we do any other
operation on the stream, we can change the buffering if we want to, with the setbuf or
setvbuf functions from the previous section.
An open stream is closed by calling fclose.
#include <stdio.h>
int fclose(FILE *fp);
Returns: 0 if OK, EOF on error
Any buffered output data is flushed before the file is closed. Any input data that may be
buffered is discarded. If the standard I/O library had automatically allocated a buffer for the
stream, that buffer is released.
When a process terminates normally, either by calling the exit function directly or by returning
from the main function, all standard I/O streams with unwritten buffered data are flushed, and
all open standard I/O streams are closed.
5.6. Reading and Writing a Stream
Once we open a stream, we can choose from among three types of unformatted I/O:
1.
Character-at-a-time I/O. We can read or write one character at a time, with the
standard I/O functions handling all the buffering, if the stream is buffered.
2.
Line-at-a-time I/O. If we want to read or write a line at a time, we use fgets and
fputs. Each line is terminated with a newline character, and we have to specify the
maximum line length that we can handle when we call fgets. We describe these two
functions in Section 5.7.
3.
Direct I/O. This type of I/O is supported by the fread and fwrite functions. For each
I/O operation, we read or write some number of objects, where each object is of a
specified size. These two functions are often used for binary files where we read or
write a structure with each operation. We describe these two functions in Section 5.9.
The term direct I/O, from the ISO C standard, is known by many names: binary I/O,
object-at-a-time I/O, record-oriented I/O, or structure-oriented I/O.
(We describe the formatted I/O functions, such as printf and scanf, in Section 5.11.)
Input Functions
Three functions allow us to read one character at a time.
#include <stdio.h>
int getc(FILE *fp);
int fgetc(FILE *fp);
int getchar(void);
All three return: next character if OK, EOF on end of file or error
The function getchar is defined to be equivalent to getc(stdin). The difference between the
first two functions is that getc can be implemented as a macro, whereas fgetc cannot be
implemented as a macro. This means three things.
1.
The argument to getc should not be an expression with side effects.
2.
Since fgetc is guaranteed to be a function, we can take its address. This allows us to
pass the address of fgetc as an argument to another function.
3.
Calls to fgetc probably take longer than calls to getc, as it usually takes more time to
call a function.
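Point 2 above in code: since fgetc is guaranteed to be a function, its address can be passed to another function. The names reader_fn and count_chars are hypothetical.

```c
#include <stdio.h>

/* Any function with fgetc's signature can serve as a reader. */
typedef int (*reader_fn)(FILE *);

/*
 * Hypothetical helper: count the characters remaining on fp, calling
 * the supplied reader once per character until it returns EOF.
 */
long
count_chars(FILE *fp, reader_fn rd)
{
    long n = 0;

    while (rd(fp) != EOF)
        n++;
    return n;
}
```

A call such as count_chars(fp, fgetc) is always valid. Passing getc would rely on the implementation also providing getc as a real function, which the text's point 2 does not assume.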
These three functions return the next character as an unsigned char converted to an int. The
reason for specifying unsigned is so that the high-order bit, if set, doesn't cause the return
value to be negative. The reason for requiring an integer return value is so that all possible
character values can be returned, along with an indication that either an error occurred or the
end of file has been encountered. The constant EOF in <stdio.h> is required to be a negative
value. Its value is often -1. This representation also means that we cannot store the return
value from these three functions in a character variable and compare this value later against
the constant EOF.
Note that these functions return the same value whether an error occurs or the end of file is
reached. To distinguish between the two, we must call either ferror or feof.
#include <stdio.h>
int ferror(FILE *fp);
int feof(FILE *fp);
Both return: nonzero (true) if condition is true, 0 (false) otherwise
void clearerr(FILE *fp);
In most implementations, two flags are maintained for each stream in the FILE object:

An error flag

An end-of-file flag
Both flags are cleared by calling clearerr.
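A sketch that ties these pieces together: the character is kept in an int so that EOF stays distinct from every byte value, and ferror tells us why getc stopped. The helper name drain is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read fp to the end, storing the number of
 * characters read in *count.  Returns 0 if reading stopped at end of
 * file, -1 if it stopped because of an error.  Both stream flags are
 * cleared before returning.
 */
int
drain(FILE *fp, long *count)
{
    int c;      /* int, not char, so EOF stays distinct from every byte */
    int rc;

    *count = 0;
    while ((c = getc(fp)) != EOF)
        (*count)++;
    rc = ferror(fp) ? -1 : 0;   /* no error flag means feof(fp) is set */
    clearerr(fp);               /* reset both the error and end-of-file flags */
    return rc;
}
```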
After reading from a stream, we can push back characters by calling ungetc.
#include <stdio.h>
int ungetc(int c, FILE *fp);
Returns: c if OK, EOF on error
The characters that are pushed back are returned by subsequent reads on the stream in
reverse order of their pushing. Be aware, however, that although ISO C allows an
implementation to support any amount of pushback, an implementation is required to provide
only a single character of pushback. We should not count on more than a single character.
The character that we push back does not have to be the same character that was read. We
are not able to push back EOF. But when we've reached the end of file, we can push back a
character. The next read will return that character, and the read after that will return EOF.
This works because a successful call to ungetc clears the end-of-file indication for the stream.
Pushback is often used when we're reading an input stream and breaking the input into words
or tokens of some form. Sometimes we need to peek at the next character to determine how
to handle the current character. It's then easy to push back the character that we peeked
at, for the next call to getc to return. If the standard I/O library didn't provide this pushback
capability, we would have to store the character in a variable of our own, along with a flag
telling us to use this character instead of calling getc the next time we need a character.
When we push characters back with ungetc, they don't get written back to the underlying file
or device. They are kept incore in the standard I/O library's buffer for the stream.
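The tokenizing pattern described above, sketched for decimal numbers: read digits until a non-digit appears, then push that one character back so the next read sees it. The helper name read_number is hypothetical, it uses only the single character of pushback that ISO C guarantees, and it does not guard against overflow.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a run of decimal digits from fp and return
 * its value, or -1 if the next character is not a digit.  The first
 * non-digit is pushed back with ungetc.
 */
long
read_number(FILE *fp)
{
    int c;
    long n = -1;

    while ((c = getc(fp)) != EOF && isdigit(c))
        n = (n < 0 ? 0 : n * 10) + (c - '0');
    if (c != EOF)
        ungetc(c, fp);      /* give back the character we peeked at */
    return n;
}
```

Given the input "123+45", successive calls return 123, then (after a getc that returns '+') 45, and finally -1 at end of file.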
Output Functions
We'll find an output function that corresponds to each of the input functions that we've
already described.
#include <stdio.h>
int putc(int c, FILE *fp);
int fputc(int c, FILE *fp);
int putchar(int c);
All three return: c if OK, EOF on error
Like the input functions, putchar(c) is equivalent to putc(c, stdout), and putc can be
implemented as a macro, whereas fputc cannot be implemented as a macro.
5.7. Line-at-a-Time I/O
Line-at-a-time input is provided by the following two functions.
#include <stdio.h>
char *fgets(char *restrict buf, int n, FILE *restrict fp);
char *gets(char *buf);
Both return: buf if OK, NULL on end of file or error
Both specify the address of the buffer to read the line into. The gets function reads from
standard input, whereas fgets reads from the specified stream.
With fgets, we have to specify the size of the buffer, n. This function reads up through and
including the next newline, but no more than n-1 characters, into the buffer. The buffer is
terminated with a null byte. If the line, including the terminating newline, is longer than n-1,
only a partial line is returned, but the buffer is always null terminated. Another call to fgets
will read what follows on the line.
The gets function should never be used. The problem is that it doesn't allow the caller to
specify the buffer size. This allows the buffer to overflow, if the line is longer than the buffer,
writing over whatever happens to follow the buffer in memory. For a description of how this
flaw was used as part of the Internet worm of 1988, see the June 1989 issue (vol. 32, no. 6)
of Communications of the ACM . An additional difference with gets is that it doesn't store the
newline in the buffer, as does fgets.
This difference in newline handling between the two functions goes way back in the evolution
of the UNIX System. Even the Version 7 manual (1979) states "gets deletes a newline, fgets
keeps it, all in the name of backward compatibility."
Even though ISO C requires an implementation to provide gets, use fgets instead.
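A sketch of how fgets delivers a line longer than its buffer: each call returns at most n-1 characters, and the piece ending in a newline completes the line. The helper name line_pieces is hypothetical; it simply counts how many fgets calls one line required.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: consume one line from fp using a buffer of n
 * bytes and return the number of fgets calls needed (0 at end of
 * file).  On return, buf holds the last piece read.
 */
int
line_pieces(FILE *fp, char *buf, int n)
{
    int calls = 0;
    size_t len;

    while (fgets(buf, n, fp) != NULL) {
        calls++;
        len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n')
            break;          /* newline seen: the line is complete */
    }
    return calls;
}
```

With a 4-byte buffer, the line "abcdefg\n" takes three calls ("abc", "def", "g\n"), while "xy\n" fits in one.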
Line-at-a-time output is provided by fputs and puts.
#include <stdio.h>
int fputs(const char *restrict str, FILE *restrict fp);
int puts(const char *str);
Both return: non-negative value if OK, EOF on error
The function fputs writes the null-terminated string to the specified stream. The null byte at
the end is not written. Note that this need not be line-at-a-time output, since the string need
not contain a newline as the last non-null character. Usually, this is the case (the last non-null
character is a newline), but it's not required.
The puts function writes the null-terminated string to the standard output, without writing the
null byte. But puts then writes a newline character to the standard output.
Unlike its counterpart gets, the puts function is not unsafe. Nevertheless, we'll avoid using it, to
prevent having to remember whether it appends a newline. If we always use fgets and fputs,
we know that we always have to deal with the newline character at the end of each line.
5.8. Standard I/O Efficiency
Using the functions from the previous section, we can get an idea of the efficiency of the
standard I/O system. The program in Figure 5.4 is like the one in Figure 3.4: it simply copies
standard input to standard output, using getc and putc. These two routines can be
implemented as macros.
Figure 5.4. Copy standard input to standard output using getc and putc
#include "apue.h"

int
main(void)
{
    int     c;

    while ((c = getc(stdin)) != EOF)
        if (putc(c, stdout) == EOF)
            err_sys("output error");

    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}
We can make another version of this program that uses fgetc and fputc, which should be
functions, not macros. (We don't show this trivial change to the source code.)
Finally, we have a version that reads and writes lines, shown in Figure 5.5.
Figure 5.5. Copy standard input to standard output using fgets and fputs
#include "apue.h"

int
main(void)
{
    char    buf[MAXLINE];

    while (fgets(buf, MAXLINE, stdin) != NULL)
        if (fputs(buf, stdout) == EOF)
            err_sys("output error");

    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}
Note that we do not close the standard I/O streams explicitly in Figure 5.4 or Figure 5.5.
Instead, we know that the exit function will flush any unwritten data and then close all open
streams. (We'll discuss this in Section 8.5.) It is interesting to compare the timing of these
three programs with the timing data from Figure 3.5. We show this data when operating on
the same file (98.5 MB with 3 million lines) in Figure 5.6.
Figure 5.6. Timing results using standard I/O routines
Function                          User CPU   System CPU   Clock time   Bytes of
                                  (seconds)  (seconds)    (seconds)    program text
best time from Figure 3.5         0.01       0.18         6.67
fgets, fputs                      2.59       0.19         7.15         139
getc, putc                        10.84      0.27         12.07        120
fgetc, fputc                      10.44      0.27         11.42        120
single byte time from Figure 3.5  124.89     161.65       288.64
For each of the three standard I/O versions, the user CPU time is larger than the best read
version from Figure 3.5, because the character-at-a-time standard I/O versions have a loop
that is executed 100 million times, and the loop in the line-at-a-time version is executed
3,144,984 times. In the read version, its loop is executed only 12,611 times (for a buffer size
of 8,192). This difference in clock times is from the difference in user times and the difference
in the times spent waiting for I/O to complete, as the system times are comparable.
The system CPU time is about the same as before, because roughly the same number of kernel
requests are being made. Note that an advantage of using the standard I/O routines is that
we don't have to worry about buffering or choosing the optimal I/O size. We do have to
determine the maximum line size for the version that uses fgets, but that's easier than trying
to choose the optimal I/O size.
The final column in Figure 5.6 is the number of bytes of text space (the machine instructions
generated by the C compiler) for each of the main functions. We can see that the version using
getc and putc takes the same amount of space as the one using the fgetc and fputc
functions. Usually, getc and putc are implemented as macros, but in the GNU C library
implementation, the macro simply expands to a function call.
The version using line-at-a-time I/O is almost twice as fast as the version using
character-at-a-time I/O. If the fgets and fputs functions are implemented using getc and putc
(see Section 7.7 of Kernighan and Ritchie [1988], for example), then we would expect the
timing to be similar to the getc version. Actually, we might expect the line-at-a-time version
to take longer, since we would be adding the overhead of 200 million extra function calls to
the existing 6 million ones. What is happening with this example is that the line-at-a-time
functions are implemented using memccpy(3). Often, the memccpy function is implemented in
assembler instead of C, for efficiency.
The last point of interest with these timing numbers is that the fgetc version is so much faster
than the BUFFSIZE=1 version from Figure 3.5. Both involve the same number of function
calls (about 200 million), yet the fgetc version is almost 12 times faster in user CPU time and
slightly more than 25 times faster in clock time. The difference is that the version using read
executes 200 million function calls, which in turn execute 200 million system calls. With the
fgetc version, we still execute 200 million function calls, but this ends up being only 25,222
system calls. System calls are usually much more expensive than ordinary function calls.
As a disclaimer, you should be aware that these timing results are valid only on the single
system they were run on. The results depend on many implementation features that aren't the
same on every UNIX system. Nevertheless, having a set of numbers such as these, and
explaining why the various versions differ, helps us understand the system better. From this
section and Section 3.9, we've learned that the standard I/O library is not much slower than
calling the read and write functions directly. The approximate cost that we've seen is about
0.11 seconds of CPU time to copy a megabyte of data using getc and putc. For most nontrivial
applications, the largest amount of the user CPU time is taken by the application, not by the
standard I/O routines.
5.9. Binary I/O
The functions from Section 5.6 operated with one character at a time, and the functions
from Section 5.7 operated with one line at a time. If we're doing binary I/O, we often would
like to read or write an entire structure at a time. To do this using getc or putc, we have to
loop through the entire structure, one byte at a time, reading or writing each byte. We can't
use the line-at-a-time functions, since fputs stops writing when it hits a null byte, and there
might be null bytes within the structure. Similarly, fgets won't work right on input if any of the
data bytes are nulls or newlines. Therefore, the following two functions are provided for binary
I/O.
#include <stdio.h>
size_t fread(void *restrict ptr, size_t size, size_t nobj,
             FILE *restrict fp);
size_t fwrite(const void *restrict ptr, size_t size, size_t nobj,
              FILE *restrict fp);
Both return: number of objects read or written
These functions have two common uses:
1.
Read or write a binary array. For example, to write elements 2 through 5 of a
floating-point array, we could write
    float   data[10];

    if (fwrite(&data[2], sizeof(float), 4, fp) != 4)
        err_sys("fwrite error");
Here, we specify size as the size of each element of the array and nobj as the number
of elements.
2.
Read or write a structure. For example, we could write
    struct {
        short   count;
        long    total;
        char    name[NAMESIZE];
    } item;

    if (fwrite(&item, sizeof(item), 1, fp) != 1)
        err_sys("fwrite error");
Here, we specify size as the size of the structure and nobj as one (the number of objects
to write).
The obvious generalization of these two cases is to read or write an array of structures. To
do this, size would be the sizeof the structure, and nobj would be the number of elements in
the array.
Both fread and fwrite return the number of objects read or written. For the read case, this
number can be less than nobj if an error occurs or if the end of file is encountered. In this
case ferror or feof must be called. For the write case, if the return value is less than the
requested nobj, an error has occurred.
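The return-value rules just described, applied to an array of structures. The record type struct rec and the helpers write_all and read_all are hypothetical, and the sketch assumes the data is read back on the same system that wrote it.

```c
#include <stdio.h>

/* Hypothetical record type used only for this illustration. */
struct rec {
    int    id;
    double value;
};

/* For fwrite, any short count means an error occurred. */
int
write_all(FILE *fp, const struct rec *out, size_t nrec)
{
    return fwrite(out, sizeof(struct rec), nrec, fp) == nrec ? 0 : -1;
}

/*
 * Read up to max records.  A short count is ambiguous, so ferror is
 * consulted: if no error is flagged, the short count came from end of
 * file.  Returns the number of records read, or -1 on error.
 */
long
read_all(FILE *fp, struct rec *in, size_t max)
{
    size_t n = fread(in, sizeof(struct rec), max, fp);

    if (n < max && ferror(fp))
        return -1;
    return (long)n;
}
```

Writing three records, rewinding, and asking read_all for up to eight yields 3, with feof set on the stream.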
A fundamental problem with binary I/O is that it can be used to read only data that has been
written on the same system. This was OK many years ago, when all the UNIX systems were
PDP-11s, but the norm today is to have heterogeneous systems connected together with
networks. It is common to want to write data on one system and process it on another.
These two functions won't work, for two reasons.
1.
The offset of a member within a structure can differ between compilers and systems,
because of different alignment requirements. Indeed, some compilers have an option
allowing structures to be packed tightly, to save space with a possible runtime
performance penalty, or aligned accurately, to optimize runtime access of each
member. This means that even on a single system, the binary layout of a structure can
differ, depending on compiler options.
2.
The binary formats used to store multibyte integers and floating-point values differ
among machine architectures.
We'll touch on some of these issues when we discuss sockets in Chapter 16. The real solution
for exchanging binary data among different systems is to use a higher-level protocol. Refer to
Section 8.2 of Rago [1993] or Section 5.18 of Stevens, Fenner, & Rudoff [2004] for a
description of some techniques various network protocols use to exchange binary data.
We'll return to the fread function in Section 8.14, when we use it to read a binary structure,
the UNIX process accounting records.
5.10. Positioning a Stream
There are three ways to position a standard I/O stream:
1.
The two functions ftell and fseek. They have been around since Version 7, but they
assume that a file's position can be stored in a long integer.
2.
The two functions ftello and fseeko. They were introduced in the Single UNIX
Specification to allow for file offsets that might not fit in a long integer. They replace
the long integer with the off_t data type.
3.
The two functions fgetpos and fsetpos. They were introduced by ISO C. They use an
abstract data type, fpos_t, that records a file's position. This data type can be made
as big as necessary to record a file's position.
Portable applications that need to move to non-UNIX systems should use fgetpos and fsetpos.
#include <stdio.h>
long ftell(FILE *fp);
Returns: current file position indicator if OK, -1L on error
int fseek(FILE *fp, long offset, int whence);
Returns: 0 if OK, nonzero on error
void rewind(FILE *fp);
For a binary file, a file's position indicator is measured in bytes from the beginning of the file.
The value returned by ftell for a binary file is this byte position. To position a binary file
using fseek, we must specify a byte offset and how that offset is interpreted. The values for
whence are the same as for the lseek function from Section 3.6: SEEK_SET means from the
beginning of the file, SEEK_CUR means from the current file position, and SEEK_END means from
the end of file. ISO C doesn't require an implementation to support the SEEK_END specification
for a binary file, as some systems require a binary file to be padded at the end with zeros to
make the file size a multiple of some magic number. Under the UNIX System, however,
SEEK_END is supported for binary files.
For text files, the file's current position may not be measurable as a simple byte offset. Again,
this is mainly under non-UNIX systems that might store text files in a different format. To
position a text file, whence has to be SEEK_SET, and only two values for offset are allowed:
0 (meaning rewind the file to its beginning) or a value that was returned by ftell for that file. A
stream can also be set to the beginning of the file with the rewind function.
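To make the byte-offset rules concrete, here is a minimal sketch that writes a string to a binary stream obtained from tmpfile (Section 5.13) and then positions it with fseek, ftell, and rewind. The helper name probe_stream and its out-parameters are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/*
 * Write `data` to a temporary binary stream, then exercise the
 * positioning functions on it.  On success, stores the file size
 * (found with SEEK_END) in *size and the byte at `offset` (found with
 * SEEK_SET) in *byte_at, and returns 0.  Returns -1 on any error.
 * probe_stream is a hypothetical helper used only for illustration.
 */
int
probe_stream(const char *data, long offset, long *size, int *byte_at)
{
    FILE    *fp;
    int     ok;

    if ((fp = tmpfile()) == NULL)           /* binary stream, mode "wb+" */
        return -1;

    ok = fputs(data, fp) != EOF &&
         fseek(fp, 0L, SEEK_END) == 0 &&    /* 0 bytes from the end */
         (*size = ftell(fp)) != -1L &&      /* position == size in bytes */
         fseek(fp, offset, SEEK_SET) == 0 &&    /* from the beginning */
         (*byte_at = getc(fp)) != EOF;

    if (ok) {
        rewind(fp);     /* like fseek(fp, 0L, SEEK_SET), and clears the error flag */
        ok = (ftell(fp) == 0L);
    }
    fclose(fp);
    return ok ? 0 : -1;
}
```

For example, probe_stream("hello, world", 7L, &size, &c) should report a size of 12 and the byte 'w'.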
The ftello function is the same as ftell, and the fseeko function is the same as fseek,
except that the type of the offset is off_t instead of long.
#include <stdio.h>
off_t ftello(FILE *fp);
Returns: current file position indicator if OK, (off_t)-1 on error
int fseeko(FILE *fp, off_t offset, int whence);
Returns: 0 if OK, nonzero on error
Recall the discussion of the off_t data type in Section 3.6. Implementations can define the
off_t type to be larger than 32 bits.
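A common use of the off_t-based pair is finding a file's size without being limited to a long. The sketch below assumes a POSIX system where fseeko and ftello are declared in <stdio.h>; the _POSIX_C_SOURCE definition only makes those declarations visible under strict ISO C compiler modes. On some 32-bit systems, glibc additionally needs _FILE_OFFSET_BITS defined as 64 before any header for off_t to be 64 bits wide. The name stream_size is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* make fseeko/ftello visible in strict modes */
#include <stdio.h>
#include <sys/types.h>

/*
 * Return the size in bytes of the file open on fp, or (off_t)-1 on
 * error.  The stream's position is restored before returning.
 * stream_size is a hypothetical helper.
 */
off_t
stream_size(FILE *fp)
{
    off_t   here, size;

    if ((here = ftello(fp)) == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    size = ftello(fp);                      /* offset of end == file size */
    if (fseeko(fp, here, SEEK_SET) != 0)    /* go back where we were */
        return (off_t)-1;
    return size;
}
```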
As we mentioned, the fgetpos and fsetpos functions were introduced by the ISO C standard.
#include <stdio.h>
int fgetpos(FILE *restrict fp, fpos_t *restrict pos);
int fsetpos(FILE *fp, const fpos_t *pos);
Both return: 0 if OK, nonzero on error
The fgetpos function stores the current value of the file's position indicator in the object
pointed to by pos. This value can be used in a later call to fsetpos to reposition the stream to
that location.
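Because fpos_t is opaque, the only portable things to do with it are to fill it in with fgetpos and hand it back to fsetpos. The following sketch uses that pair as a bookmark: it reads a line, returns to the saved position, and reads the same line again. reread_line is a hypothetical helper, and its 128-byte line buffers are an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Save the current position, read one line, return to the saved
 * position with fsetpos, and read the same line again.
 * Returns 1 if both reads match, 0 if they differ, -1 on error.
 * reread_line is a hypothetical name used only for illustration.
 */
int
reread_line(FILE *fp)
{
    fpos_t  mark;       /* opaque: no arithmetic on it, just save and restore */
    char    first[128], second[128];

    if (fgetpos(fp, &mark) != 0)
        return -1;
    if (fgets(first, sizeof(first), fp) == NULL)
        return -1;
    if (fsetpos(fp, &mark) != 0)            /* back to the bookmark */
        return -1;
    if (fgets(second, sizeof(second), fp) == NULL)
        return -1;
    return strcmp(first, second) == 0;
}
```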
5.11. Formatted I/O
Formatted Output
Formatted output is handled by the four printf functions.
#include <stdio.h>
int printf(const char *restrict format, ...);
int fprintf(FILE *restrict fp, const char *restrict format, ...);
Both return: number of characters output if OK, negative value if output error
int sprintf(char *restrict buf, const char *restrict format, ...);
int snprintf(char *restrict buf, size_t n, const char *restrict format, ...);
Both return: number of characters stored in array if OK, negative value if encoding error
The printf function writes to the standard output, fprintf writes to the specified stream,
and sprintf places the formatted characters in the array buf. The sprintf function
automatically appends a null byte at the end of the array, but this null byte is not included in
the return value.
Note that it's possible for sprintf to overflow the buffer pointed to by buf. It's the caller's
responsibility to ensure that the buffer is large enough. Because this can lead to
buffer-overflow problems, snprintf was introduced. With it, the size of the buffer is an explicit
parameter; any characters that would have been written past the end of the buffer are
discarded instead. The snprintf function returns the number of characters that would have
been written to the buffer had it been big enough. As with sprintf, the return value doesn't
include the terminating null byte. If snprintf returns a non-negative value less than the buffer size
n, then the output was not truncated. If an encoding error occurs, snprintf returns a
negative value.
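The return-value rule gives a simple, portable truncation test: compare the result with the buffer size. A minimal sketch, assuming a hypothetical helper named format_pair:

```c
#include <stdio.h>

/*
 * Format "name=value" into buf (size n).  Returns 0 if the whole
 * string fit, 1 if it was truncated, -1 on an encoding error.
 * format_pair is a hypothetical helper.
 */
int
format_pair(char *buf, size_t n, const char *name, int value)
{
    int     len = snprintf(buf, n, "%s=%d", name, value);

    if (len < 0)
        return -1;              /* encoding error */
    if ((size_t)len >= n)
        return 1;               /* truncated: len + 1 bytes were needed */
    return 0;                   /* fit, including the null byte */
}
```

Because snprintf with n equal to 0 writes nothing (buf may then be a null pointer) but still returns the length the output would need, the same call can be used to size a buffer before allocating it.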
The format specification controls how the remainder of the arguments will be encoded and
ultimately displayed. Each argument is encoded according to a conversion specification that
starts with a percent sign (%). Except for the conversion specifications, other characters in
the format are copied unmodified. A conversion specification has four optional components,
shown in square brackets below:
%[flags][fldwidth][precision][lenmodifier]convtype
The flags are summarized in Figure 5.7.
Figure 5.7. The flags component of a conversion specification
Flag      Description
-         left-justify the output in the field
+         always display sign of a signed conversion
(space)   prefix by a space if no sign is generated
#         convert using alternate form (include 0x prefix for hex format, for example)
0         prefix with leading zeros instead of padding with spaces
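One way to see what each flag does is to format a few integers and compare the results with what we expect. This table-driven sketch is illustrative only; the names flag_cases and check_flags are hypothetical, and the expected strings follow the ISO C rules for these conversions.

```c
#include <stdio.h>
#include <string.h>

/* One case: a format with a single int conversion and its expected text. */
struct flag_case {
    const char  *format;
    int         value;
    const char  *expected;
};

static const struct flag_case flag_cases[] = {
    { "[%-6d]",  42, "[42    ]" },  /* -     : left-justify in the field */
    { "[%6d]",   42, "[    42]" },  /* none  : right-justify, pad with spaces */
    { "%+d",     42, "+42"      },  /* +     : always show the sign */
    { "% d",     42, " 42"      },  /* space : blank where a '+' would go */
    { "% d",    -42, "-42"      },  /*         but a '-' still appears */
    { "%#x",    255, "0xff"     },  /* #     : alternate form for hex */
    { "%#o",      8, "010"      },  /* #     : alternate form for octal */
    { "%06d",    42, "000042"   },  /* 0     : pad with leading zeros */
    { "%06d",   -42, "-00042"   },  /*         zeros go after the sign */
};

/* Format every case, print it, and return how many differ from expected. */
int
check_flags(void)
{
    char    buf[32];
    int     bad = 0;
    size_t  i;

    for (i = 0; i < sizeof(flag_cases) / sizeof(flag_cases[0]); i++) {
        snprintf(buf, sizeof(buf), flag_cases[i].format, flag_cases[i].value);
        printf("%-8s -> \"%s\"\n", flag_cases[i].format, buf);
        if (strcmp(buf, flag_cases[i].expected) != 0)
            bad++;
    }
    return bad;
}
```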
The fldwidth component specifies a minimum field width for the conversion. If the conversion
results in fewer characters, it is padded with spaces. The field width is a non-negative
decimal integer or an asterisk.
The precision component specifies the minimum number of digits to appear for integer
conversions, the minimum number of digits to appear to the right of the decimal point for
floating-point conversions, or the maximum number of bytes for string conversions. The
precision is a period (.) followed by an optional non-negative decimal integer or an asterisk.
Both the field width and precision can be an asterisk. In this case, an integer argument
specifies the value to be used. The argument appears directly before the argument to be
converted.
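The asterisk form lets a program choose the width and precision at run time. In the following sketch (pad_name is a hypothetical helper), the call passes the width, then the precision, then the string, matching the order of the asterisks in %*.*s.

```c
#include <stdio.h>

/*
 * Right-justify `name` in a field `width` characters wide, using at
 * most `maxlen` bytes of it.  Both numbers are supplied at run time
 * through the asterisks, each int appearing just before the argument
 * it modifies.  pad_name is a hypothetical helper; it returns what
 * snprintf returns.
 */
int
pad_name(char *buf, size_t n, const char *name, int width, int maxlen)
{
    return snprintf(buf, n, "%*.*s", width, maxlen, name);
}
```

ISO C treats a negative width supplied this way as the - flag followed by a positive width, so pad_name(buf, sizeof(buf), "io", -5, 10) left-justifies "io" in five columns; a negative precision is taken as if the precision were omitted.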
The lenmodifier component specifies the size of the argument. Possible values are
summarized in Figure 5.8.
Figure 5.8. The length modifier component of a conversion specification
Length modifier   Description
hh                signed or unsigned char
h                 signed or unsigned short
l                 signed or unsigned long or wide character
ll                signed or unsigned long long
j                 intmax_t or uintmax_t
z                 size_t
t                 ptrdiff_t
L                 long double
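The length modifier has to match the type of the argument actually passed, which matters most for typedef'd types whose size varies between systems. A sketch, assuming a C99 compiler and library (format_sizes is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one value of each of several types, using the length
 * modifier that matches each type.  format_sizes is a hypothetical
 * helper; it returns what snprintf returns.
 */
int
format_sizes(char *buf, size_t n)
{
    char            text[] = "abcdef";
    size_t          len = sizeof(text);             /* 7: counts the null byte */
    ptrdiff_t       dist = &text[5] - &text[1];     /* 4 */
    intmax_t        big = -INTMAX_C(9000000000);
    long long       ll = 123456789012LL;
    unsigned char   uc = 200;

    return snprintf(buf, n, "%zu %td %jd %lld %hhu",
                    len, dist, big, ll, uc);
}
```

For the values shown, the buffer should receive "7 4 -9000000000 123456789012 200".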
The convtype component is not optional. It controls how the argument is interpreted. The
various conversion types are summarized in Figure 5.9.
Figure 5.9. The conversion type component of a conversion specification
Conversion type   Description
d,i               signed decimal
o                 unsigned octal
u                 unsigned decimal
x,X               unsigned hexadecimal
f,F               double floating-point number
e,E               double floating-point number in exponential format
g,G               interpreted as f, F, e, or E, depending on value converted
a,A               double floating-point number in hexadecimal exponential format
c                 character (with l length modifier, wide character)
s                 string (with l length modifier, wide character string)
p                 pointer to a void
n                 pointer to a signed integer into which is written the number of characters
                  written so far
%                 a % character
C                 wide character (an XSI extension, equivalent to lc)
S                 wide character string (an XSI extension, equivalent to ls)
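To tie the conversion types to actual output, this sketch pushes the same integer through the d, o, x, and X conversions, uses %n to record how far the output had gotten, and then formats two doubles, a character, and a string. show_conversions is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Format one integer four ways, record the character count so far
 * with %n, then format two doubles, a char, a string, and a literal %.
 * show_conversions is a hypothetical helper; it returns what snprintf
 * returns.
 */
int
show_conversions(char *buf, size_t n, int *count)
{
    return snprintf(buf, n, "%d %o %x %X %n%e %g %c %s %%",
                    255, 255, 255, 255,     /* d, o, x, X */
                    count,                  /* n: receives 14 here */
                    1234.5,                 /* e */
                    0.0001,                 /* g */
                    'A', "ok");             /* c, s */
}
```

The buffer should end up holding "255 377 ff FF 1.234500e+03 0.0001 A ok %", with 14 stored through the %n pointer. The %g conversion chose the f style for 0.0001 because its exponent (-4) is not less than -4.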
The following four variants of the printf family are similar to the previous four, but the
variable argument list (...) is replaced with arg.
#include <stdarg.h>
#include <stdio.h>
int vprintf(const char *restrict format, va_list arg);
int vfprintf(FILE *restrict fp, const char *restrict format, va_list arg);
Both return: number of characters output if OK, negative value if output error
int vsprintf(char *restrict buf, const char *restrict format, va_list arg);
int vsnprintf(char *restrict buf, size_t n, const char *restrict format, va_list arg);
Both return: number of characters stored in array if OK, negative value if encoding error
We use the vsnprintf function in the error routines in Appendix B.
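The Appendix B routines are not reproduced here, but the pattern they rely on is easy to sketch: a function with its own ... argument list captures the variable arguments with va_start and passes the resulting va_list to vsnprintf. The name fmt_error and its fixed "error: " prefix are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a message behind a fixed "error: " prefix.  The caller's
 * arguments are collected into a va_list and passed to vsnprintf.
 * Returns vsnprintf's result for the caller's part of the message,
 * or -1 if buf cannot even hold the prefix.  fmt_error is a
 * hypothetical name, not one of the Appendix B routines.
 */
int
fmt_error(char *buf, size_t n, const char *fmt, ...)
{
    static const char   prefix[] = "error: ";
    size_t              plen = sizeof(prefix) - 1;
    va_list             ap;
    int                 len;

    if (n <= plen)
        return -1;
    memcpy(buf, prefix, plen);
    va_start(ap, fmt);
    len = vsnprintf(buf + plen, n - plen, fmt, ap);
    va_end(ap);             /* every va_start needs a matching va_end */
    return len;
}
```

With GCC or Clang, adding __attribute__((format(printf, 3, 4))) to the prototype (a compiler extension, not ISO C) lets the compiler check the arguments against fmt just as it does for printf.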
Refer to Section 7.3 of Kernighan and Ritchie [1988] for additional details on handling
variable-length argument lists with ISO Standard C. Be aware that the variable-length
argument list routines provided with ISO C (the <stdarg.h> header and its associated
routines) differ from the <varargs.h> routines that were provided with older UNIX systems.
Formatted Input
Formatted input is handled by the three scanf functions.
#include <stdio.h>
int scanf(const char *restrict format, ...);
int fscanf(FILE *restrict fp, const char *restrict format, ...);
int sscanf(const char *restrict buf, const char *restrict format, ...);
All three return: number of input items assigned,
EOF if input error or end of file before any conversion
The scanf family is used to parse an input string and convert character sequences into
variables of specified types. The arguments following the format contain the addresses of the
variables to initialize with the results of the conversions.
The format specification controls how the arguments are converted for assignment. The
percent sign (%) indicates the start of a conversion specification. Except for the conversion
specifications and white space, other characters in the format have to match the input. If a
character doesn't match, processing stops, leaving the remainder of the input unread.
There are three optional components to a conversion specification, shown in square brackets
below:
%[*][fldwidth][lenmodifier]convtype
The optional leading asterisk is used to suppress conversion. Input is converted as specified
by the rest of the conversion specification, but the result is not stored in an argument.
The fldwidth component specifies the maximum field width in characters. The lenmodifier
component specifies the size of the argument to be initialized with the result of the
conversion. The same length modifiers supported by the printf family of functions are
supported by the scanf family of functions (see Figure 5.8 for a list of the length modifiers).
The convtype field is similar to the conversion type field used by the printf family, but there
are some differences. One difference is that results that are stored in unsigned types can
optionally be signed on input. For example, on a system with 32-bit integers, -1 will scan as
4294967295 into an unsigned integer. Figure 5.10 summarizes the conversion types supported by
the scanf family of functions.
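A quick way to observe the sign rule is to scan "-1" with an unsigned conversion. The sketch below uses %lu with an unsigned long, so the result corresponds exactly to what strtoul would produce (ULONG_MAX, the same value as (unsigned long)-1) regardless of the size of int. scan_unsigned_long is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Scan an unsigned long with %lu.  The input may carry a sign; "-1"
 * is converted as strtoul would convert it, giving ULONG_MAX.
 * scan_unsigned_long is a hypothetical helper that returns sscanf's
 * count of assigned items.
 */
int
scan_unsigned_long(const char *text, unsigned long *out)
{
    return sscanf(text, "%lu", out);
}
```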
Figure 5.10. The conversion type component of a conversion specification
Conversion type   Description
d                 signed decimal, base 10
i                 signed decimal, base determined by format of input
o                 unsigned octal (input optionally signed)
u                 unsigned decimal, base 10 (input optionally signed)
x                 unsigned hexadecimal (input optionally signed)
a,A,e,E,f,F,g,G   floating-point number
c                 character (with l length modifier, wide character)
s                 string (with l length modifier, wide character string)
[                 matches a sequence of listed characters, ending with ]
[^                matches all characters except the ones listed, ending with ]
p                 pointer to a void
n                 pointer to a signed integer into which is written the number of characters
                  read so far
%                 a % character
C                 wide character (an XSI extension, equivalent to lc)
S                 wide character string (an XSI extension, equivalent to ls)
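Several of these features can be combined in one format. The following sketch, whose input layout and helper name parse_entry are made up for illustration, matches literal text, suppresses assignment of a number with %*d, limits a scanset conversion to the size of the destination buffer, and records how much of the input was consumed with %n.

```c
#include <stdio.h>

/*
 * Parse text such as "id=17 user=sar:home".
 *   "id="      must match the input literally
 *   %*d        reads the number but stores nothing
 *   " user="   the blank matches any amount of white space
 *   %15[^:]    reads up to 15 characters that are not ':'
 *   %n         stores how many characters have been consumed
 * `user` must hold at least 16 bytes.  parse_entry is hypothetical;
 * it returns sscanf's count of assigned items.
 */
int
parse_entry(const char *line, char *user, int *used)
{
    return sscanf(line, "id=%*d user=%15[^:]:%n", user, used);
}
```

Neither the suppressed %*d nor %n counts toward the return value, so a successful parse returns 1. The field width of 15 leaves room for the terminating null byte in a 16-byte array; without a width, %[ (like %s) can overrun its destination.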
As with the printf family, the scanf family also supports functions that use variable argument
lists as specified by <stdarg.h>.
#include <stdarg.h>
#include <stdio.h>
int vscanf(const char *restrict format, va_list arg);
int vfscanf(FILE *restrict fp, const char *restrict format, va_list arg);
int vsscanf(const char *restrict buf, const char *restrict format, va_list arg);
All three return: number of input items assigned,
EOF if input error or end of file before any conversion
Refer to your UNIX system manual for additional details on the scanf family of functions.
5.12. Implementation Details
As we've mentioned, under the UNIX System, the standard I/O library ends up calling the I/O
routines that we described in Chapter 3. Each standard I/O stream has an associated file
descriptor, and we can obtain the descriptor for a stream by calling fileno.
Note that fileno is not part of the ISO C standard, but an extension supported by POSIX.1.
#include <stdio.h>
int fileno(FILE *fp);
Returns: the file descriptor associated with the stream
We need this function if we want to call the dup or fcntl functions, for example.
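As an example of handing a stream's descriptor to a descriptor-level function, this sketch uses fileno and then fcntl with F_GETFL (Section 3.14) to find out how the underlying descriptor was opened. stream_access_mode is a hypothetical name; O_ACCMODE is the POSIX.1 mask for the access-mode bits.

```c
#define _POSIX_C_SOURCE 200809L  /* make fileno visible in strict modes */
#include <fcntl.h>
#include <stdio.h>

/*
 * Report how the descriptor underlying a stream was opened: fileno
 * gives the descriptor and fcntl(F_GETFL) its file status flags.
 * Returns O_RDONLY, O_WRONLY, or O_RDWR, or -1 on error.
 * stream_access_mode is a hypothetical name.
 */
int
stream_access_mode(FILE *fp)
{
    int     fd, flags;

    if ((fd = fileno(fp)) == -1)
        return -1;
    if ((flags = fcntl(fd, F_GETFL, 0)) == -1)
        return -1;
    return flags & O_ACCMODE;   /* keep only the access-mode bits */
}
```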
To look at the implementation of the standard I/O library on your system, start with the
header <stdio.h>. This will show how the FILE object is defined, the definitions of the
per-stream flags, and any standard I/O routines, such as getc, that are defined as macros.
Section 8.5 of Kernighan and Ritchie [1988] has a sample implementation that shows the
flavor of many implementations on UNIX systems. Chapter 12 of Plauger [1992] provides the
complete source code for an implementation of the standard I/O library. The implementation of
the GNU standard I/O library is also publicly available.
Example
The program in Figure 5.11 prints the buffering for the three standard streams and for a
stream that is associated with a regular file.
Note that we perform I/O on each stream before printing its buffering status, since the first
I/O operation usually causes the buffers to be allocated for a stream. The structure members
_IO_file_flags, _IO_buf_base, and _IO_buf_end and the constants _IO_UNBUFFERED and
_IO_LINE_BUF are defined by the GNU standard I/O library used on Linux. Be aware that
other UNIX systems may have different implementations of the standard I/O library.
If we run the program in Figure 5.11 twice, once with the three standard streams connected
to the terminal and once with the three standard streams redirected to files, we get the
following result:
$ ./a.out                                     stdin, stdout, and stderr connected to terminal
enter any character
                                              we type a newline
one line to standard error
stream = stdin, line buffered, buffer size = 1024
stream = stdout, line buffered, buffer size = 1024
stream = stderr, unbuffered, buffer size = 1
stream = /etc/motd, fully buffered, buffer size = 4096
$ ./a.out < /etc/termcap > std.out 2> std.err
                                              run it again with all three streams redirected
$ cat std.err
one line to standard error
$ cat std.out
enter any character
stream = stdin, fully buffered, buffer size = 4096
stream = stdout, fully buffered, buffer size = 4096
stream = stderr, unbuffered, buffer size = 1
stream = /etc/motd, fully buffered, buffer size = 4096
We can see that the default for this system is to have standard input and standard output
line buffered when they're connected to a terminal. The line buffer is 1,024 bytes. Note that
this doesn't restrict us to 1,024-byte input and output lines; that's just the size of the buffer.
Writing a 2,048-byte line to standard output will require two write system calls. When we
redirect these two streams to regular files, they become fully buffered, with buffer sizes equal
to the preferred I/O size (the st_blksize value from the stat structure) for the file system. We
also see that the standard error is always unbuffered, as it should be, and that a regular file
defaults to fully buffered.
Figure 5.11. Print buffering for various standard I/O streams
#include "apue.h"

void    pr_stdio(const char *, FILE *);

int
main(void)
{
    FILE    *fp;

    fputs("enter any character\n", stdout);
    if (getchar() == EOF)
        err_sys("getchar error");
    fputs("one line to standard error\n", stderr);

    pr_stdio("stdin",  stdin);
    pr_stdio("stdout", stdout);
    pr_stdio("stderr", stderr);

    if ((fp = fopen("/etc/motd", "r")) == NULL)
        err_sys("fopen error");
    if (getc(fp) == EOF)
        err_sys("getc error");
    pr_stdio("/etc/motd", fp);
    exit(0);
}

void
pr_stdio(const char *name, FILE *fp)
{
    printf("stream = %s, ", name);

    /*
     * The following is nonportable: it reads members and flags that
     * are internal to the GNU standard I/O library.
     */
    if (fp->_IO_file_flags & _IO_UNBUFFERED)
        printf("unbuffered");
    else if (fp->_IO_file_flags & _IO_LINE_BUF)
        printf("line buffered");
    else /* if neither of above */
        printf("fully buffered");
    /* a pointer difference is a ptrdiff_t; cast it to match %d */
    printf(", buffer size = %d\n", (int)(fp->_IO_buf_end - fp->_IO_buf_base));
}
5.13. Temporary Files
The ISO C standard defines two functions that are provided by the standard I/O library to
assist in creating temporary files.
#include <stdio.h>
char *tmpnam(char *ptr);
Returns: pointer to unique pathname
FILE *tmpfile(void);
Returns: file pointer if OK, NULL on error
The tmpnam function generates a string that is a valid pathname and that is not the same
name as an existing file. This function generates a different pathname each time it is called,
up to TMP_MAX times. TMP_MAX is defined in <stdio.h>.
Although ISO C defines TMP_MAX, the C standard requires only that its value be at least 25.
The Single UNIX Specification, however, requires that XSI-conforming systems support a value
of at least 10,000. Although this minimum value allows an implementation to use four digits
(0000 through 9999), most implementations on UNIX systems use lowercase or uppercase characters.
If ptr is NULL, the generated pathname is stored in a static area, and a pointer to this area is
returned as the value of the function. Subsequent calls to tmpnam can overwrite this static
area. (This means that if we call this function more than once and we want to save the
pathname, we have to save a copy of the pathname, not a copy of the pointer.) If ptr is not
NULL, it is assumed that it points to an array of at least L_tmpnam characters. (The constant
L_tmpnam is defined in <stdio.h>.) The generated pathname is stored in this array, and ptr is
also returned as the value of the function.
The tmpfile function creates a temporary binary file (type wb+) that is automatically removed
when it is closed or on program termination. Under the UNIX System, it makes no difference
that this file is a binary file.
Example
The program in Figure 5.12 demonstrates these two functions.
If we execute the program in Figure 5.12, we get
$ ./a.out
/tmp/fileC1Icwc
/tmp/filemSkHSe
one line of output
Figure 5.12. Demonstrate tmpnam and tmpfile functions
#include "apue.h"

int
main(void)
{
    char    name[L_tmpnam], line[MAXLINE];
    FILE    *fp;

    printf("%s\n", tmpnam(NULL));       /* first temp name */

    tmpnam(name);                       /* second temp name */
    printf("%s\n", name);

    if ((fp = tmpfile()) == NULL)       /* create temp file */
        err_sys("tmpfile error");
    fputs("one line of output\n", fp);  /* write to temp file */
    rewind(fp);                         /* then read it back */
    if (fgets(line, sizeof(line), fp) == NULL)
        err_sys("fgets error");
    fputs(line, stdout);                /* print the line we wrote */

    exit(0);
}
The standard technique often used by the tmpfile function is to create a unique pathname by
calling tmpnam, then create the file, and immediately unlink it. Recall from Section 4.15 that
unlinking a file does not delete its contents until the file is closed. This way, when the file is
closed, either explicitly or on program termination, the contents of the file are deleted.
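The create-then-unlink technique can be sketched in a few lines. This version deliberately swaps in mkstemp (described later in this section) for tmpnam, so that the name is chosen and the file created in a single step, and then uses fdopen (Section 5.5) to wrap the descriptor in a stream. The directory /tmp, the template, and the name my_tmpfile are assumptions; this is not how any particular C library implements tmpfile.

```c
#define _POSIX_C_SOURCE 200809L  /* mkstemp, fdopen */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * A sketch of the "create, then unlink" technique behind tmpfile.
 * Returns a read/write stream on an anonymous file, or NULL on error.
 * my_tmpfile and the /tmp template are hypothetical.
 */
FILE *
my_tmpfile(void)
{
    char    path[] = "/tmp/mytmpXXXXXX";    /* must be a writable array */
    int     fd;
    FILE    *fp;

    if ((fd = mkstemp(path)) == -1)         /* create and open atomically */
        return NULL;
    unlink(path);       /* the name is gone; the data lives until the last close */
    if ((fp = fdopen(fd, "w+")) == NULL) {
        close(fd);
        return NULL;
    }
    return fp;
}
```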
The Single UNIX Specification defines two additional functions as XSI extensions for dealing
with temporary files. The first of these is the tempnam function.
#include <stdio.h>
char *tempnam(const char *directory, const char *prefix);
Returns: pointer to unique pathname
The tempnam function is a variation of tmpnam that allows the caller to specify both the
directory and a prefix for the generated pathname. There are four possible choices for the
directory, and the first one that is true is used.
1. If the environment variable TMPDIR is defined, it is used as the directory. (We describe
   environment variables in Section 7.9.)
2. If directory is not NULL, it is used as the directory.
3. The string P_tmpdir in <stdio.h> is used as the directory.
4. A local directory, usually /tmp, is used as the directory.
If the prefix argument is not NULL, it should be a string of up to five bytes to be used as the
first characters of the filename.
This function calls the malloc function to allocate dynamic storage for the constructed
pathname. We can free this storage when we're done with the pathname. (We describe the
malloc and free functions in Section 7.8.)
Example
The program in Figure 5.13 shows the use of tempnam.
Note that if either command-line argument (the directory or the prefix) begins with a blank, we
pass a null pointer to the function. We can now show the various ways to use it:
$ ./a.out /home/sar TEMP                         specify both directory and prefix
/home/sar/TEMPsf00zi
$ ./a.out " " PFX                                use default directory: P_tmpdir
/tmp/PFXfBw7Gi
$ TMPDIR=/var/tmp ./a.out /usr/tmp " "           use environment variable; no prefix
/var/tmp/file8fVYNi                              environment variable overrides directory
$ TMPDIR=/no/such/dir ./a.out /home/sar/tmp QQQ
/home/sar/tmp/QQQ98s8Ui                          invalid environment directory is ignored
As the four steps that we listed earlier for specifying the directory name are tried in order,
this function also checks whether the corresponding directory name makes sense. If the
directory doesn't exist (the /no/such/dir example), that case is skipped, and the next choice
for the directory name is tried. From this example, we can see that for this implementation,
the P_tmpdir directory is /tmp. The technique that we used to set the environment variable,
specifying TMPDIR= before the program name, is used by the Bourne shell, the Korn shell, and
bash.
Figure 5.13. Demonstrate tempnam function
#include "apue.h"

int
main(int argc, char *argv[])
{
    if (argc != 3)
        err_quit("usage: a.out <directory> <prefix>");

    printf("%s\n", tempnam(argv[1][0] != ' ' ? argv[1] : NULL,
      argv[2][0] != ' ' ? argv[2] : NULL));

    exit(0);
}
The second function that XSI defines is mkstemp. It is similar to tmpfile, but returns an open
file descriptor for the temporary file instead of a file pointer.
#include <stdlib.h>
int mkstemp(char *template);
Returns: file descriptor if OK, -1 on error
The returned file descriptor is open for reading and writing. The name of the temporary file is
selected using the template string. This string is a pathname whose last six characters are
set to XXXXXX. The function replaces these with different characters to create a unique
pathname. If mkstemp returns success, it modifies the template string to reflect the name of
the temporary file.
Unlike tmpfile, the temporary file created by mkstemp is not removed automatically for us. If
we want to remove it from the file system namespace, we need to unlink it ourselves.
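The following sketch shows both points: the template array is rewritten in place, and the file stays in the file system until we unlink it ourselves. The template /tmp/demoXXXXXX and the helper name mkstemp_demo are hypothetical. Note that the template must be a modifiable array, not a pointer to a string literal.

```c
#define _POSIX_C_SOURCE 200809L  /* mkstemp */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file with mkstemp, confirm that the name was
 * filled in and exists, then remove the name explicitly.  The final
 * name is copied to name_out.  Returns 0 on success, -1 on error.
 * mkstemp_demo is a hypothetical helper.
 */
int
mkstemp_demo(char *name_out, size_t n)
{
    char        path[] = "/tmp/demoXXXXXX";
    struct stat sb;
    int         fd, rc = -1;

    if ((fd = mkstemp(path)) == -1)
        return -1;
    /* mkstemp replaced the six X's in place; path is now the real name */
    if (strcmp(path, "/tmp/demoXXXXXX") != 0 &&
        stat(path, &sb) == 0 &&         /* the name is visible in /tmp */
        unlink(path) == 0)              /* not automatic: remove it ourselves */
        rc = 0;
    close(fd);
    snprintf(name_out, n, "%s", path);
    return rc;
}
```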
There is a drawback to using tmpnam and tempnam: a window exists between the time that the
unique pathname is returned and the time that an application creates a file with that name.
During this timing window, another process can create a file of the same name. The tmpfile
and mkstemp functions should be used instead, as they don't suffer from this problem.
The mktemp function is similar to mkstemp, except that it creates a name suitable only for use
as a temporary file. The mktemp function doesn't create a file, so it suffers from the same
drawback as tmpnam and tempnam. The mktemp function is marked as a legacy interface in the
Single UNIX Specification. Legacy interfaces might be withdrawn in future versions of the
Single UNIX Specification, and so should be avoided.
5.14. Alternatives to Standard I/O
The standard I/O library is not perfect. Korn and Vo [1991] list numerous defects: some in the
basic design, but most in the various implementations.
One inefficiency inherent in the standard I/O library is the amount of data copying that takes
place. When we use the line-at-a-time functions, fgets and fputs, the data is usually copied
twice: once between the kernel and the standard I/O buffer (when the corresponding read or
write is issued) and again between the standard I/O buffer and our line buffer. The Fast I/O
library [fio(3) in AT&T 1990a] gets around this by having the function that reads a line return
a pointer to the line instead of copying the line into another buffer. Hume [1988] reports a
threefold increase in the speed of a version of the grep(1) utility, simply by making this
change.
Korn and Vo [1991] describe another replacement for the standard I/O library: sfio. This
package is similar in speed to the fio library and normally faster than the standard I/O library.
The sfio package also provides some new features that aren't in the others: I/O streams
generalized to represent both files and regions of memory, processing modules that can be
written and stacked on an I/O stream to change the operation of a stream, and better
exception handling.
Krieger, Stumm, and Unrau [1992] describe another alternative that uses mapped files: the mmap
function that we describe in Section 14.9. This new package is called ASI, the Alloc Stream
Interface. The programming interface resembles the UNIX System memory allocation functions
(malloc, realloc, and free, described in Section 7.8). As with the sfio package, ASI attempts
to minimize the amount of data copying by using pointers.
Several implementations of the standard I/O library are available in C libraries that were
designed for systems with small memory footprints, such as embedded systems. These
implementations emphasize modest memory requirements over portability, speed, or
functionality. Two such implementations are the uClibc C library (see http://www.uclibc.org
for more information) and the newlib C library (http://sources.redhat.com/newlib).
5.15. Summary
The standard I/O library is used by most UNIX applications. We have looked at all the
functions provided by this library, as well as at some implementation details and efficiency
considerations. Be aware of the buffering that takes place with this library, as this is the area
that generates the most problems and confusion.
Exercises
5.1  Implement setbuf using setvbuf.
5.2  Type in the program that copies a file using line-at-a-time I/O (fgets and fputs)
     from Section 5.8, but use a MAXLINE of 4. What happens if you copy lines that
     exceed this length? Explain what is happening.
5.3  What does a return value of 0 from printf mean?
5.4  The following code works correctly on some machines, but not on others. What
     could be the problem?

     #include <stdio.h>

     int
     main(void)
     {
         char    c;

         while ((c = getchar()) != EOF)
             putchar(c);
     }

5.5  Why does tempnam restrict the prefix to five characters?
5.6  How would you use the fsync function (Section 3.13) with a standard I/O
     stream?
5.7  In the programs in Figures 1.7 and 1.10, the prompt that is printed does not
     contain a newline, and we don't call fflush. What causes the prompt to be
     output?
Chapter 6. System Data Files and Information
Section 6.1. Introduction
Section 6.2. Password File
Section 6.3. Shadow Passwords
Section 6.4. Group File
Section 6.5. Supplementary Group IDs
Section 6.6. Implementation Differences
Section 6.7. Other Data Files
Section 6.8. Login Accounting
Section 6.9. System Identification
Section 6.10. Time and Date Routines
Section 6.11. Summary
Exercises
6.1. Introduction
A UNIX system requires numerous data files for normal operation: the password file
/etc/passwd and the group file /etc/group are two files that are frequently used by various
programs. For example, the password file is used every time a user logs in to a UNIX system
and every time someone executes an ls -l command.
Historically, these data files have been ASCII text files and were read with the standard I/O
library. But for larger systems, a sequential scan through the password file becomes time
consuming. We want to be able to store these data files in a format other than ASCII text,
but still provide an interface for an application program that works with any file format. The
portable interfaces to these data files are the subject of this chapter. We also cover the
system identification functions and the time and date functions.
6.2. Password File
The UNIX System's password file, called the user database by POSIX.1, contains the fields
shown in Figure 6.1. These fields are contained in a passwd structure that is defined in <pwd.h>.
Figure 6.1. Fields in /etc/passwd file

Description                   struct passwd member  POSIX.1  FreeBSD 5.2.1   Linux 2.4.22   Mac OS X 10.3   Solaris 9
user name                     char *pw_name         •        •               •              •               •
encrypted password            char *pw_passwd                •               •              •               •
numerical user ID             uid_t pw_uid          •        •               •              •               •
numerical group ID            gid_t pw_gid          •        •               •              •               •
comment field                 char *pw_gecos                 •               •              •               •
initial working directory     char *pw_dir          •        •               •              •               •
initial shell (user program)  char *pw_shell        •        •               •              •               •
user access class             char *pw_class                 •                              •
next time to change password  time_t pw_change               •                              •
account expiration time       time_t pw_expire               •                              •
Note that POSIX.1 specifies only five of the ten fields in the passwd structure. Most platforms
support at least seven of the fields. The BSD-derived platforms support all ten.
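Programs normally read these fields through the functions in <pwd.h> rather than by parsing the file themselves. As a sketch, assuming a POSIX system, the following uses getpwuid to look up a user ID and prints only the five fields that POSIX.1 requires. show_user is a hypothetical helper; the structure returned by getpwuid may be overwritten by later calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Look up a numerical user ID in the user database and print the five
 * fields that POSIX.1 specifies.  Returns 0 if the ID was found,
 * -1 otherwise.  show_user is a hypothetical helper.
 */
int
show_user(uid_t uid)
{
    struct passwd   *pw = getpwuid(uid);    /* may point to static storage */

    if (pw == NULL)
        return -1;
    printf("name=%s uid=%ld gid=%ld dir=%s shell=%s\n",
           pw->pw_name, (long)pw->pw_uid, (long)pw->pw_gid,
           pw->pw_dir, pw->pw_shell);
    return 0;
}
```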
Historically, the password file has been stored in /etc/passwd and has been an ASCII file. Each
line contains the fields described in Figure 6.1, separated by colons. For example, four lines
from the /etc/passwd file on Linux could be
root:x:0:0:root:/root:/bin/bash
squid:x:23:23::/var/spool/squid:/dev/null
nobody:x:65534:65534:Nobody:/home:/bin/sh
sar:x:205:105:Stephen Rago:/home/sar:/bin/bash
Note the following points about these entries.
• There is usually an entry with the user name root. This entry has a user ID of 0 (the
  superuser).
• The encrypted password field contains a single character as a placeholder where older
  versions of the UNIX System used to store the encrypted password. Because it is a
  security hole to store the encrypted password in a file that is readable by everyone,
  encrypted passwords are now kept elsewhere. We'll cover this issue in more detail in
  the next section when we discuss passwords.
• Some fields in a password file entry can be empty. If the encrypted password field is
  empty, it usually means that the user does not have a password. (This is not
  recommended.) The entry for squid has one blank field: the comment field. An empty
  comment field has no effect.
• The shell field contains the name of the executable program to be used as the login
  shell for the user. The default value for an empty shell field is usually /bin/sh. Note,
  however, that the entry for squid has /dev/null as the login shell. Obviously, this is a
  device and cannot be executed, so its use here is to prevent anyone from logging in to
  our system as user squid.
  Many services have separate user IDs for the daemon processes (Chapter 13) that
  help implement the service. The squid entry is for the processes implementing the
  squid proxy cache service.

There are several alternatives to using /dev/null to prevent a particular user from
logging in to a system. It is common to see /bin/false used as the login shell. It simply
exits with an unsuccessful (nonzero) status; the shell evaluates the exit status as
false. It is also common to see /bin/true used to disable an account. All it does is exit
with a successful (zero) status. Some systems provide the nologin command. It prints
a customizable error message and exits with a nonzero exit status.

The nobody user name can be used to allow people to log in to a system, but with a
user ID (65534) and group ID (65534) that provide no privileges. The only files that this
user ID and group ID can access are those that are readable or writable by the world.
(This assumes that there are no files specifically owned by user ID 65534 or group ID
65534, which should be the case.)

Some systems that provide the finger(1) command support additional information in
the comment field. Each of these fields is separated by a comma: the user's name,
office location, office phone number, and home phone number. Additionally, an
ampersand in the comment field is replaced with the login name (capitalized) by some
utilities. For example, we could have


sar:x:205:105:Steve Rago, SF 5-121, 555-1111, 555-2222:/home/sar:/bin/sh
Then we could use finger to print information about Steve Rago.
$ finger -p sar
Login: sar                        Name: Steve Rago
Directory: /home/sar              Shell: /bin/sh
Office: SF 5-121, 555-1111        Home Phone: 555-2222
On since Mon Jan 19 03:57 (EST) on ttyv0 (messages off)
No Mail.
Even if your system doesn't support the finger command, these fields can still go into
the comment field, since that field is simply a comment and not interpreted by system
utilities.
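The comment-field conventions just described can be sketched in C as follows. The sketch assumes the finger-style layout, with the full name first and comma-separated extras after it, and it applies the capitalize-the-login-name rule for '&'. gecos_full_name and append_char are hypothetical helpers, not library functions. Individual utilities may treat '&' differently.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append c if there is room, always keeping one byte for the '\0'. */
static int append_char(char *out, size_t outsize, size_t *n, char c)
{
    if (*n + 1 >= outsize)
        return 0;
    out[(*n)++] = c;
    return 1;
}

/*
 * Hypothetical helper: copy the user's full name (the comment field up
 * to the first comma) into out, replacing each '&' with the login name
 * with its first letter capitalized. Truncates to fit outsize and
 * returns the number of characters stored, excluding the '\0'.
 */
size_t gecos_full_name(const char *comment, const char *login,
                       char *out, size_t outsize)
{
    size_t len = strcspn(comment, ",");   /* full name ends at first comma */
    size_t n = 0;
    int    room = 1;

    if (outsize == 0)
        return 0;
    for (size_t i = 0; room && i < len; i++) {
        if (comment[i] != '&') {
            room = append_char(out, outsize, &n, comment[i]);
            continue;
        }
        /* '&' expands to the login name, first letter capitalized */
        for (const char *q = login; room && *q != '\0'; q++) {
            char c = (q == login) ? (char)toupper((unsigned char)*q) : *q;
            room = append_char(out, outsize, &n, c);
        }
    }
    out[n] = '\0';
    return n;
}
```

With the entry above, gecos_full_name("Steve Rago, SF 5-121, 555-1111, 555-2222", "sar", buf, sizeof buf) stores "Steve Rago". A comment field of "& User" for login sar would become "Sar User".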
Some systems provide the vipw command to allow administrators to edit the password file.
The vipw command serializes changes to the password file and makes sure that any additional
files are consistent with the changes made. It is also common for systems to provide similar
functionality through graphical user interfaces.
POSIX.1 defines only two functions to fetch entries from the password file. These functions
allow us to look up an entry given a user's login name or numerical user ID.
#include <pwd.h>
struct passwd *getpwuid(uid_t uid);
struct passwd *getpwnam(const char *name);
Both return: pointer if OK, NULL on error
The getpwuid function is used by the ls(1) program to map the numerical user ID contained in
an i-node into a user's login name. The getpwnam function is used by the login(1) program
when we enter our login name.
Both functions return a pointer to a passwd structure that the functions fill in. This structure is
usually a static variable within the function, so its contents are overwritten each time we call
either of these functions.
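Because the returned structure may be overwritten by the next call, anything worth keeping has to be copied out before either function is called again. The following is a minimal sketch using a hypothetical helper name, login_name_for_uid. The usage below assumes that user ID 0 is named root, as in the listing earlier in this section.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: copy the login name for uid into buf.
 * getpwuid returns a pointer to storage that the next getpwuid or
 * getpwnam call may overwrite, so the name is copied out right away.
 * Returns 0 on success, -1 if uid has no entry or buf is too small.
 */
int login_name_for_uid(uid_t uid, char *buf, size_t bufsize)
{
    struct passwd *pw = getpwuid(uid);

    if (pw == NULL)
        return -1;                      /* no such user (or an error) */
    size_t len = strlen(pw->pw_name);
    if (len >= bufsize)
        return -1;                      /* would not fit with its '\0' */
    memcpy(buf, pw->pw_name, len + 1);  /* copy, including the '\0' */
    return 0;
}
```

Where the shared static structure is a problem, in multithreaded programs for instance, POSIX also provides getpwuid_r and getpwnam_r, which fill in caller-supplied storage instead.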
These two POSIX.1 functions are fine if we want to look up either a login name or a user ID,
but some programs need to go through the entire password file. The following three functions
can be used for this.
#include <pwd.h>
struct passwd *getpwent(void);
Returns: pointer if OK, NULL on error or end of file
void setpwent(void);
void endpwent(void);
These three functions are not part of the base POSIX.1 standard. They are defined as XSI
extensions in the Single UNIX Specification. As such, all UNIX systems are expected to provide
them.
We call getpwent to return the next entry in the password file. As with the two POSIX.1
functions, getpwent returns a pointer to a structure that it has filled in. This structure is
normally overwritten each time we call this function. If this is the first call to this function, it
opens whatever files it uses. There is no order implied when we use this function; the entries
can be in any order, because some systems use a hashed version of the file /etc/passwd.
The function setpwent rewinds whatever files it uses, and endpwent closes these files. When
using getpwent, we must always be sure to close these files by calling endpwent when we're
through. Although getpwent is smart enough to know when it has to open its files (the first
time we call it), it never knows when we're through.
Example
Figure 6.2 shows an implementation of the function getpwnam.
The call to setpwent at the beginning is self-defense: we ensure that the files are rewound, in
case the caller has already opened them by calling getpwent. The call to endpwent when we're
done is because neither getpwnam nor getpwuid should leave any of the files open.
Figure 6.2. The getpwnam function
#include <pwd.h>
#include <stddef.h>
#include <string.h>

struct passwd *
getpwnam(const char *name)
{
    struct passwd  *ptr;

    setpwent();
    while ((ptr = getpwent()) != NULL)
        if (strcmp(name, ptr->pw_name) == 0)
            break;      /* found a match */
    endpwent();
    return(ptr);        /* ptr is NULL if no match found */
}
6.3. Shadow Passwords
The encrypted password is a copy of the user's password that has been put through a
one-way encryption algorithm. Because this algorithm is one-way, we can't guess the original
password from the encrypted version.
Historically, the algorithm that was used (see Morris and Thompson [1979]) always generated
13 printable characters from the 64-character set [a-zA-Z0-9./]. Some newer systems use an
MD5 algorithm to encrypt passwords, generating 31 characters per encrypted password. (The
more characters used to store the encrypted password, the more combinations there are, and
the harder it will be to guess the password by trying all possible variations.) When we place a
single character in the encrypted password field, we ensure that an encrypted password will
never match this value.
Given an encrypted password, we can't apply an algorithm that inverts it and returns the
plaintext password. (The plaintext password is what we enter at the Password: prompt.) But
we could guess a password, run it through the one-way algorithm, and compare the result to
the encrypted password. If user passwords were randomly chosen, this brute-force approach
wouldn't be too successful. Users, however, tend to choose nonrandom passwords, such as
spouse's name, street names, or pet names. A common experiment is for someone to obtain a
copy of the password file and try guessing the passwords. (Chapter 4 of Garfinkel et al. [2003]
contains additional details and history on passwords and the password encryption scheme
used on UNIX systems.)
To make it more difficult to obtain the raw materials (the encrypted passwords), systems now
store the encrypted password in another file, often called the shadow password file. Minimally,
this file has to contain the user name and the encrypted password. Other information relating
to the password is also stored here (Figure 6.3).
Figure 6.3. Fields in /etc/shadow file
Description                                 struct spwd member
user login name                             char *sp_namp
encrypted password                          char *sp_pwdp
days since Epoch of last password change    int sp_lstchg
days until change allowed                   int sp_min
days before change required                 int sp_max
days warning for expiration                 int sp_warn
days before account inactive                int sp_inact
days since Epoch when account expires       int sp_expire
reserved                                    unsigned int sp_flag
The only two mandatory fields are the user's login name and encrypted password. The other
fields control how often the password is to change (known as "password aging") and how long an
account is allowed to remain active.
The shadow password file should not be readable by the world. Only a few programs need to
access encrypted passwords (login(1) and passwd(1), for example), and these programs are often
set-user-ID root. With shadow passwords, the regular password file, /etc/passwd, can be left
readable by the world.
On Linux 2.4.22 and Solaris 9, a separate set of functions is available to access the shadow
password file, similar to the set of functions used to access the password file.
#include <shadow.h>
struct spwd *getspnam(const char *name);
struct spwd *getspent(void);
Both return: pointer if OK, NULL on error
void setspent(void);
void endspent(void);
On FreeBSD 5.2.1 and Mac OS X 10.3, there is no shadow password structure. The additional
account information is stored in the password file (refer back to Figure 6.1).
6.4. Group File
The UNIX System's group file, called the group database by POSIX.1, contains the fields
shown in Figure 6.4. These fields are contained in a group structure that is defined in <grp.h>.
Figure 6.4. Fields in /etc/group file
Description                                 struct group member  POSIX.1  FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9
group name                                  char *gr_name        •        •              •             •              •
encrypted password                          char *gr_passwd               •              •             •              •
numerical group ID                          int gr_gid           •        •              •             •              •
array of pointers to individual user names  char **gr_mem        •        •              •             •              •
The field gr_mem is an array of pointers to the user names that belong to this group. This array
is terminated by a null pointer.
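Because gr_mem ends with a null pointer rather than carrying a count, code that examines it walks the array until it reaches that terminator. The following is a minimal sketch with a hypothetical helper name, group_has_member.

```c
#include <grp.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if user appears in the group's member
 * list, 0 otherwise. gr_mem is terminated by a null pointer, so we
 * walk it until we reach that entry.
 */
int group_has_member(const struct group *grp, const char *user)
{
    for (char **mem = grp->gr_mem; *mem != NULL; mem++)
        if (strcmp(*mem, user) == 0)
            return 1;
    return 0;
}
```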
We can look up either a group name or a numerical group ID with the following two functions,
which are defined by POSIX.1.
#include <grp.h>
struct group *getgrgid(gid_t gid);
struct group *getgrnam(const char *name);
Both return: pointer if OK, NULL on error
As with the password file functions, both of these functions normally return pointers to a
static variable, which is overwritten on each call.
If we want to search the entire group file, we need some additional functions. The following
three functions are like their counterparts for the password file.
#include <grp.h>
struct group *getgrent(void);
Returns: pointer if OK, NULL on error or end of file
void setgrent(void);
void endgrent(void);
These three functions are not part of the base POSIX.1 standard. They are defined as XSI
extensions in the Single UNIX Specification. All UNIX Systems provide them.
The setgrent function opens the group file, if it's not already open, and rewinds it. The
getgrent function reads the next entry from the group file, opening the file first, if it's not
already open. The endgrent function closes the group file.
6.5. Supplementary Group IDs
The use of groups in the UNIX System has changed over time. With Version 7, each user
belonged to a single group at any point in time. When we logged in, we were assigned the real
group ID corresponding to the numerical group ID in our password file entry. We could change
this at any point by executing newgrp(1). If the newgrp command succeeded (refer to the
manual page for the permission rules), our real group ID was changed to the new group's ID,
and this was used for all subsequent file access permission checks. We could always go back
to our original group by executing newgrp without any arguments.
This form of group membership persisted until it was changed in 4.2BSD (circa 1983). With
4.2BSD, the concept of supplementary group IDs was introduced. Not only did we belong to
the group corresponding to the group ID in our password file entry, but we also could belong
to up to 16 additional groups. The file access permission checks were modified so that not
only was the effective group ID compared to the file's group ID, but also all the supplementary
group IDs were compared to the file's group ID.
Supplementary group IDs are a required feature of POSIX.1. (In older versions of POSIX.1,
they were optional.) The constant NGROUPS_MAX (Figure 2.10) specifies the number of
supplementary group IDs. A common value is 16 (Figure 2.14).
The advantage in using supplementary group IDs is that we no longer have to change groups
explicitly. It is not uncommon to belong to multiple groups (i.e., participate in multiple
projects) at the same time.
Three functions are provided to fetch and set the supplementary group IDs.
#include <unistd.h>
int getgroups(int gidsetsize, gid_t grouplist[]);
Returns: number of supplementary group IDs if OK, -1 on error
#include <grp.h>      /* on Linux */
#include <unistd.h>   /* on FreeBSD, Mac OS X, and Solaris */
int setgroups(int ngroups, const gid_t grouplist[]);
#include <grp.h>      /* on Linux and Solaris */
#include <unistd.h>   /* on FreeBSD and Mac OS X */
int initgroups(const char *username, gid_t basegid);
Both return: 0 if OK, -1 on error
Of these three functions, only getgroups is specified by POSIX.1. Because setgroups and
initgroups are privileged operations, they are not part of POSIX.1. All four platforms covered
in this book, however, support all three functions.
On Mac OS X 10.3, basegid is declared to be of type int.
The getgroups function fills in the array grouplist with the supplementary group IDs. Up to
gidsetsize elements are stored in the array. The number of supplementary group IDs stored in
the array is returned by the function.
As a special case, if gidsetsize is 0, the function returns only the number of supplementary
group IDs. The array grouplist is not modified. (This allows the caller to determine the size of
the grouplist array to allocate.)
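This special case allows a two-call idiom, which looks roughly like the following. fetch_supplementary_groups is a hypothetical name. The sketch allocates at least one element so that a process with no supplementary groups still gets a non-NULL result.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the supplementary group IDs of the
 * calling process into a newly allocated array (which the caller
 * frees), storing the count in *count. Returns NULL on error.
 */
gid_t *fetch_supplementary_groups(int *count)
{
    int n = getgroups(0, NULL);     /* gidsetsize 0: only report the count */
    if (n < 0)
        return NULL;

    /* allocate at least one element so success is never NULL */
    gid_t *list = malloc(sizeof(gid_t) * (size_t)(n > 0 ? n : 1));
    if (list == NULL)
        return NULL;

    n = getgroups(n, list);         /* now fill in up to n entries */
    if (n < 0) {
        free(list);
        return NULL;
    }
    *count = n;
    return list;
}
```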
The setgroups function can be called by the superuser to set the supplementary group ID list
for the calling process: grouplist contains the array of group IDs, and ngroups specifies the
number of elements in the array. The value of ngroups cannot be larger than NGROUPS_MAX.
The only use of setgroups is usually from the initgroups function, which reads the entire
group file (with the functions getgrent, setgrent, and endgrent, which we described earlier) and
determines the group membership for username. It then calls setgroups to initialize the
supplementary group ID list for the user. One must be superuser to call initgroups, since it
calls setgroups. In addition to finding all the groups that username is a member of in the group
file, initgroups also includes basegid in the supplementary group ID list; basegid is the group
ID from the password file for username.
The initgroups function is called by only a few programs: the login(1) program, for example,
calls it when we log in.
6.6. Implementation Differences
We've already discussed the shadow password file supported by Linux and Solaris. FreeBSD
and Mac OS X store encrypted passwords differently. Figure 6.5 summarizes how the four
platforms covered in this book store user and group information.
Figure 6.5. Account implementation differences
Information             FreeBSD 5.2.1        Linux 2.4.22   Mac OS X 10.3   Solaris 9
Account information     /etc/passwd          /etc/passwd    netinfo         /etc/passwd
Encrypted passwords     /etc/master.passwd   /etc/shadow    netinfo         /etc/shadow
Hashed password files?  yes                  no             no              no
Group information       /etc/group           /etc/group     netinfo         /etc/group
On FreeBSD, the shadow password file is /etc/master.passwd. Special commands are used to
edit it, which in turn generate a copy of /etc/passwd from the shadow password file. In
addition, hashed versions of the files are also generated: /etc/pwd.db is the hashed version of
/etc/passwd, and /etc/spwd.db is the hashed version of /etc/master.passwd. These provide
better performance for large installations.
On Mac OS X, however, /etc/passwd and /etc/master.passwd are used only in single-user mode
(when the system is undergoing maintenance; single-user mode usually means that no system
services are enabled). In multiuser mode (during normal operation), the netinfo directory service
provides access to account information for users and groups.
Although Linux and Solaris support similar shadow password interfaces, there are some subtle
differences. For example, the integer fields shown in Figure 6.3 are defined as type int on
Solaris, but as long int on Linux. Another difference is the account-inactive field. Solaris
defines it to be the number of days since the user last logged in to the system, whereas Linux
defines it to be the number of days after which the maximum password age has been reached.
On many systems, the user and group databases are implemented using the Network
Information Service (NIS). This allows administrators to edit a master copy of the databases
and distribute them automatically to all servers in an organization. Client systems contact
servers to look up information about users and groups. NIS+ and the Lightweight Directory
Access Protocol (LDAP) provide similar functionality. Many systems control the method used
to administer each type of information through the /etc/nsswitch.conf configuration file.
6.7. Other Data Files
We've discussed only two of the system's data files so far: the password file and the group
file. Numerous other files are used by UNIX systems in normal day-to-day operation. For
example, the BSD networking software has one data file for the services provided by the
various network servers (/etc/services), one for the protocols (/etc/protocols), and one for
the networks (/etc/networks). Fortunately, the interfaces to these various files are like the
ones we've already described for the password and group files.
The general principle is that every data file has at least three functions:
1. A get function that reads the next record, opening the file if necessary. These
functions normally return a pointer to a structure. A null pointer is returned when the
end of file is reached. Most of the get functions return a pointer to a static structure,
so we always have to copy it if we want to save it.
2. A set function that opens the file, if not already open, and rewinds the file. This
function is used when we know we want to start again at the beginning of the file.
3. An end entry that closes the data file. As we mentioned earlier, we always have to
call this when we're done, to close all the files.
Additionally, if the data file supports some form of keyed lookup, routines are provided to
search for a record with a specific key. For example, two keyed lookup routines are provided
for the password file: getpwnam looks for a record with a specific user name, and getpwuid looks
for a record with a specific user ID.
Figure 6.6 shows some of these routines, which are common to UNIX systems. In this figure,
we show the functions for the password files and group file, which we discussed earlier in this
chapter, and some of the networking functions. There are get, set, and end functions for all
the data files in this figure.
Figure 6.6. Similar routines for accessing system data files
Description  Data file       Header      Structure  Additional keyed lookup functions
passwords    /etc/passwd     <pwd.h>     passwd     getpwnam, getpwuid
groups       /etc/group      <grp.h>     group      getgrnam, getgrgid
shadow       /etc/shadow     <shadow.h>  spwd       getspnam
hosts        /etc/hosts      <netdb.h>   hostent    gethostbyname, gethostbyaddr
networks     /etc/networks   <netdb.h>   netent     getnetbyname, getnetbyaddr
protocols    /etc/protocols  <netdb.h>   protoent   getprotobyname, getprotobynumber
services     /etc/services   <netdb.h>   servent    getservbyname, getservbyport
Under Solaris, the last four data files in Figure 6.6 are symbolic links to files of the same name
in the directory /etc/inet. Most UNIX System implementations have additional functions that
are like these, but the additional functions tend to deal with system administration files and
are specific to each implementation.
6.8. Login Accounting
Two data files that have been provided with most UNIX systems are the utmp file, which keeps
track of all the users currently logged in, and the wtmp file, which keeps track of all logins and
logouts. With Version 7, one type of record was written to both files, a binary record
consisting of the following structure:
struct utmp {
    char  ut_line[8];   /* tty line: "ttyh0", "ttyd0", "ttyp0", ... */
    char  ut_name[8];   /* login name */
    long  ut_time;      /* seconds since Epoch */
};
On login, one of these structures was filled in and written to the utmp file by the login
program, and the same structure was appended to the wtmp file. On logout, the entry in the
utmp file was erased (filled with null bytes) by the init process, and a new entry was appended
to the wtmp file. This logout entry in the wtmp file had the ut_name field zeroed out. Special
entries were appended to the wtmp file to indicate when the system was rebooted and right
before and after the system's time and date was changed. The who(1) program read the utmp
file and printed its contents in a readable form. Later versions of the UNIX System provided
the last(1) command, which read through the wtmp file and printed selected entries.
Most versions of the UNIX System still provide the utmp and wtmp files, but as expected, the
amount of information in these files has grown. The 20-byte structure that was written by
Version 7 grew to 36 bytes with SVR2, and the extended utmp structure with SVR4 takes over
350 bytes!
The detailed format of these records in Solaris is given in the utmpx(4) manual page. With
Solaris 9, both files are in the /var/adm directory. Solaris provides numerous functions
described in getutx(3) to read and write these two files.
On FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3, the utmp(5) manual page gives the
format of their versions of these login records. The pathnames of these two files are
/var/run/utmp and /var/log/wtmp.
6.9. System Identification
POSIX.1 defines the uname function to return information on the current host and operating
system.
#include <sys/utsname.h>
int uname(struct utsname *name);
Returns: non-negative value if OK, -1 on error
We pass the address of a utsname structure, and the function fills it in. POSIX.1 defines only
the minimum fields in the structure, which are all character arrays, and it's up to each
implementation to set the size of each array. Some implementations provide additional fields in
the structure.
struct utsname {
    char  sysname[];    /* name of the operating system */
    char  nodename[];   /* name of this node */
    char  release[];    /* current release of operating system */
    char  version[];    /* current version of this release */
    char  machine[];    /* name of hardware type */
};
Each string is null-terminated. The maximum name lengths supported by the four platforms
discussed in this book are listed in Figure 6.7. The information in the utsname structure can
usually be printed with the uname(1) command.
POSIX.1 warns that the nodename element may not be adequate to reference the host on a
communications network. This function is from System V, and in older days, the nodename
element was adequate for referencing the host on a UUCP network.
Realize also that the information in this structure does not give any information on the
POSIX.1 level. This should be obtained using _POSIX_VERSION, as described in Section 2.6.
Finally, this function gives us a way only to fetch the information in the structure; there is
nothing specified by POSIX.1 about initializing this information.
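The following is a minimal sketch of uname in use. It assumes a hypothetical helper, describe_system, that produces roughly what uname -s -r -m prints.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: format "sysname release machine" into buf.
 * Returns 0 on success, -1 if uname fails or buf is too small.
 */
int describe_system(char *buf, size_t bufsize)
{
    struct utsname u;

    if (uname(&u) < 0)              /* non-negative means success */
        return -1;
    int n = snprintf(buf, bufsize, "%s %s %s",
                     u.sysname, u.release, u.machine);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```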
Historically, BSD-derived systems provide the gethostname function to return only the name of
the host. This name is usually the name of the host on a TCP/IP network.
#include <unistd.h>
int gethostname(char *name, int namelen);
Returns: 0 if OK, -1 on error
The namelen argument specifies the size of the name buffer. If enough space is provided, the
string returned through name is null terminated. If insufficient room is provided, however, it is
unspecified whether the string is null terminated.
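Because of that unspecified case, a cautious caller can force termination itself. The following sketch uses a hypothetical helper, hostname_terminated. Current POSIX and glibc declare the length argument as size_t rather than int.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the host name and guarantee a
 * null-terminated result even if the name was truncated.
 * Returns 0 on success, -1 on error.
 */
int hostname_terminated(char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return -1;
    if (gethostname(buf, bufsize) < 0)
        return -1;
    buf[bufsize - 1] = '\0';        /* force termination in the truncated case */
    return 0;
}
```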
The gethostname function, now defined as part of POSIX.1, specifies that the maximum host
name length is HOST_NAME_MAX. The maximum name lengths supported by the four
implementations covered in this book are summarized in Figure 6.7.
Figure 6.7. System identification name limits
                          Maximum name length
Interface     FreeBSD 5.2.1   Linux 2.4.22   Mac OS X 10.3   Solaris 9
uname         256             65             256             257
gethostname   256             64             256             256
If the host is connected to a TCP/IP network, the host name is normally the fully qualified
domain name of the host.
There is also a hostname(1) command that can fetch or set the host name. (The host name is
set by the superuser using a similar function, sethostname.) The host name is normally set at
bootstrap time from one of the start-up files invoked by /etc/rc or init.
6.10. Time and Date Routines
The basic time service provided by the UNIX kernel counts the number of seconds that have
passed since the Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time (UTC). In
Section 1.10, we said that these seconds are represented in a time_t data type, and we call
them calendar times. These calendar times represent both the time and the date. The UNIX
System has always differed from other operating systems in (a) keeping time in UTC instead of
the local time, (b) automatically handling conversions, such as daylight saving time, and (c)
keeping the time and date as a single quantity.
The time function returns the current time and date.
#include <time.h>
time_t time(time_t *calptr);
Returns: value of time if OK, -1 on error
The time value is always returned as the value of the function. If the argument is non-null,
the time value is also stored at the location pointed to by calptr.
We haven't said how the kernel's notion of the current time is initialized. Historically, on
implementations derived from System V, the stime(2) function was called, whereas
BSD-derived systems used settimeofday(2).
The Single UNIX Specification doesn't specify how a system sets its current time.
The gettimeofday function provides greater resolution (up to a microsecond) than the time
function. This is important for some applications.
#include <sys/time.h>
int gettimeofday(struct timeval *restrict tp, void *restrict tzp);
Returns: 0 always
This function is defined as an XSI extension in the Single UNIX Specification. The only legal
value for tzp is NULL; other values result in unspecified behavior. Some platforms support the
specification of a time zone through the use of tzp, but this is implementation-specific and not
defined by the Single UNIX Specification.
The gettimeofday function stores the current time as measured from the Epoch in the memory
pointed to by tp. This time is represented as a timeval structure, which stores seconds and
microseconds:
struct timeval {
    time_t  tv_sec;     /* seconds */
    long    tv_usec;    /* microseconds */
};
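The following is a minimal sketch that combines the two timeval fields into a single count of microseconds since the Epoch. epoch_usec is a hypothetical name. Passing NULL for tzp follows the rule above.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current time in microseconds since the Epoch, or -1. */
long long epoch_usec(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)   /* tzp must be NULL */
        return -1;
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

Newer revisions of POSIX mark gettimeofday obsolescent in favor of clock_gettime, which reports nanoseconds in a timespec structure.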
Once we have the integer value that counts the number of seconds since the Epoch, we
normally call one of the other time functions to convert it to a human-readable time and
date. Figure 6.8 shows the relationships between the various time functions.
Figure 6.8. Relationship of the various time functions
(The four functions in this figure that are shown with dashed lines, namely localtime, mktime,
ctime, and strftime, are all affected by the TZ environment variable, which we describe later in
this section.)
The two functions localtime and gmtime convert a calendar time into what's called a
broken-down time, a tm structure.
struct tm {         /* a broken-down time */
    int  tm_sec;    /* seconds after the minute: [0 - 60] */
    int  tm_min;    /* minutes after the hour: [0 - 59] */
    int  tm_hour;   /* hours after midnight: [0 - 23] */
    int  tm_mday;   /* day of the month: [1 - 31] */
    int  tm_mon;    /* months since January: [0 - 11] */
    int  tm_year;   /* years since 1900 */
    int  tm_wday;   /* days since Sunday: [0 - 6] */
    int  tm_yday;   /* days since January 1: [0 - 365] */
    int  tm_isdst;  /* daylight saving time flag: <0, 0, >0 */
};
The reason that the seconds can be greater than 59 is to allow for a leap second. Note that
all the fields except the day of the month are 0-based. The daylight saving time flag is
positive if daylight saving time is in effect, 0 if it's not in effect, and negative if the
information isn't available.
In previous versions of the Single UNIX Specification, double leap seconds were allowed. Thus,
the valid range of values for the tm_sec member was 0-61. The formal definition of UTC doesn't
allow for double leap seconds, so the valid range for seconds is now defined to be 0-60.
#include <time.h>
struct tm *gmtime(const time_t *calptr);
struct tm *localtime(const time_t *calptr);
Both return: pointer to broken-down time
The difference between localtime and gmtime is that the first converts the calendar time to
the local time, taking into account the local time zone and daylight saving time flag, whereas
the latter converts the calendar time into a broken-down time expressed as UTC.
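To see the 0-based fields and the 1900 offset in practice, the following sketch breaks down the Epoch itself with gmtime. epoch_broken_down is a hypothetical helper. It copies the result because gmtime, like the password functions, may return a pointer to static storage.

```c
#include <time.h>

/* Hypothetical helper: the Epoch as a broken-down UTC time (a copy). */
struct tm epoch_broken_down(void)
{
    time_t     t = 0;
    struct tm  out = {0};
    struct tm *tm = gmtime(&t);     /* may point to static storage */

    if (tm != NULL)
        out = *tm;                  /* copy before another call overwrites it */
    return out;
}
```

The result has tm_year 70 (years since 1900), tm_mon 0 (January), tm_mday 1, and tm_wday 4, because January 1, 1970 was a Thursday.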
The function mktime takes a broken-down time, expressed as a local time, and converts it into
a time_t value.
#include <time.h>
time_t mktime(struct tm *tmptr);
Returns: calendar time if OK, -1 on error
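mktime also normalizes out-of-range fields and fills in tm_wday and tm_yday, which is useful for date arithmetic. The sketch below (normalize_example is a hypothetical name) asks for "January 32, 2004" at midday local time. Choosing midday and setting tm_isdst to -1 are assumptions intended to keep daylight saving time from shifting the result.

```c
#include <time.h>

/* Hypothetical helper: let mktime normalize "January 32, 2004". */
time_t normalize_example(struct tm *tm)
{
    *tm = (struct tm){0};
    tm->tm_year = 2004 - 1900;      /* years since 1900 */
    tm->tm_mon = 0;                 /* January */
    tm->tm_mday = 32;               /* one day past the end of January */
    tm->tm_hour = 12;               /* midday, away from DST transitions */
    tm->tm_isdst = -1;              /* let mktime determine DST */
    return mktime(tm);              /* also rewrites *tm in normalized form */
}
```

After the call, the structure describes Sunday, February 1, 2004: tm_mon is 1, tm_mday is 1, tm_yday is 31, and tm_wday is 0.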
The asctime and ctime functions produce the familiar 26-byte string that is similar to the
default output of the date(1) command:
Tue Feb 10 18:27:38 2004\n\0
#include <time.h>
char *asctime(const struct tm *tmptr);
char *ctime(const time_t *calptr);
Both return: pointer to null-terminated string
The argument to asctime is a pointer to a broken-down time structure, whereas the argument to ctime
is a pointer to a calendar time.
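The following short sketch shows the fixed layout by applying asctime to the Epoch broken down as UTC. epoch_asctime is a hypothetical helper. It copies the string out because asctime, like ctime, returns a pointer to static storage. Single-digit days of the month are padded with an extra space.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy asctime's rendering of the Epoch (UTC) into buf. */
int epoch_asctime(char buf[26])
{
    time_t     t = 0;
    struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return -1;
    char *s = asctime(tm);          /* points to static storage */
    if (s == NULL || strlen(s) >= 26)
        return -1;
    strcpy(buf, s);                 /* 25 characters plus the '\0' */
    return 0;
}
```

The copied string is "Thu Jan  1 00:00:00 1970\n", which is 24 visible characters plus the newline. Together with the terminating null, that makes 26 bytes.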
The final time function, strftime, is the most complicated. It is a printf-like function for time
values.
#include <time.h>
size_t strftime(char *restrict buf, size_t maxsize,
const char *restrict format,
const struct tm *restrict tmptr);
Returns: number of characters stored in array if room, 0 otherwise
The final argument is the time value to format, specified by a pointer to a broken-down time
value. The formatted result is stored in the array buf whose size is maxsize characters. If the
size of the result, including the terminating null, fits in the buffer, the function returns the
number of characters stored in buf, excluding the terminating null. Otherwise, the function
returns 0.
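This boundary is easy to get wrong by one, because the buffer must also hold the terminating null. In the following sketch (format_epoch_date is a hypothetical helper), the date 1970-01-01 is 10 characters long. A maxsize of 11 therefore succeeds, and a maxsize of 10 makes strftime return 0.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: format the Epoch (UTC) as YYYY-MM-DD into buf. */
size_t format_epoch_date(char *buf, size_t maxsize)
{
    time_t     t = 0;
    struct tm *tm = gmtime(&t);

    if (tm == NULL)
        return 0;
    return strftime(buf, maxsize, "%Y-%m-%d", tm);
}
```

A successful conversion can also legitimately produce an empty string. A return of 0 is therefore an unambiguous error indication only when the format is known to produce at least one character.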
The format argument controls the formatting of the time value. Like the printf functions,
conversion specifiers are given as a percent followed by a special character. All other
characters in the format string are copied to the output. Two percents in a row generate a
single percent in the output. Unlike the printf functions, each conversion specifier generates
a different fixed-size output string; there are no field widths in the format string. Figure 6.9
describes the 37 ISO C conversion specifiers. The third column of this figure is from the output
of strftime under Linux, corresponding to the time and date Tue Feb 10 18:27:38 EST 2004.
Figure 6.9. Conversion specifiers for strftime
Format  Description                                           Example
%a      abbreviated weekday name                              Tue
%A      full weekday name                                     Tuesday
%b      abbreviated month name                                Feb
%B      full month name                                       February
%c      date and time                                         Tue Feb 10 18:27:38 2004
%C      year/100: [00-99]                                     20
%d      day of the month: [01-31]                             10
%D      date [MM/DD/YY]                                       02/10/04
%e      day of month (single digit preceded by space) [1-31]  10
%F      ISO 8601 date format [YYYY-MM-DD]                     2004-02-10
%g      last two digits of ISO 8601 week-based year [00-99]   04
%G      ISO 8601 week-based year                              2004
%h      same as %b                                            Feb
%H      hour of the day (24-hour format): [00-23]             18
%I      hour of the day (12-hour format): [01-12]             06
%j      day of the year: [001-366]                            041
%m      month: [01-12]                                        02
%M      minute: [00-59]                                       27
%n      newline character
%p      AM/PM                                                 PM
%r      locale's time (12-hour format)                        06:27:38 PM
%R      same as "%H:%M"                                       18:27
%S      second: [00-60]                                       38
%t      horizontal tab character
%T      same as "%H:%M:%S"                                    18:27:38
%u      ISO 8601 weekday [Monday=1, 1-7]                      2
%U      Sunday week number: [00-53]                           06
%V      ISO 8601 week number: [01-53]                         07
%w      weekday: [0=Sunday, 0-6]                              2
%W      Monday week number: [00-53]                           06
%x      date                                                  02/10/04
%X      time                                                  18:27:38
%y      last two digits of year: [00-99]                      04
%Y      year                                                  2004
%z      offset from UTC in ISO 8601 format                    -0500
%Z      time zone name                                        EST
%%      translates to a percent sign                          %
The only specifiers that are not self-evident are %U, %V, and %W. The %U specifier represents
the week number of the year, where the week containing the first Sunday is week 1. The %W
specifier represents the week number of the year, where the week containing the first Monday
is week 1. The %V specifier is different. If the week containing the first day in January has four
or more days in the new year, then this is treated as week 1. Otherwise, it is treated as the
last week of the previous year. In both cases, Monday is treated as the first day of the week.
As with printf, strftime supports modifiers for some of the conversion specifiers. The E and O
modifiers can be used to generate an alternate format if supported by the locale.
Some systems support additional, nonstandard extensions to the format string for strftime.
We mentioned that the four functions in Figure 6.8 with dashed lines were affected by the TZ
environment variable: localtime, mktime, ctime, and strftime. If defined, the value of this
environment variable is used by these functions instead of the default time zone. If the
variable is defined to be a null string, such as TZ=, then UTC is normally used. The value of
this environment variable is often something like TZ=EST5EDT, but POSIX.1 allows a much more
detailed specification. Refer to the Environment Variables chapter of the Single UNIX
Specification [Open Group 2004] for all the details on the TZ variable.
All the time and date functions described in this section, except gettimeofday, are defined by
the ISO C standard. POSIX.1, however, added the TZ environment variable. On FreeBSD 5.2.1,
Linux 2.4.22, and Mac OS X 10.3, more information on the TZ variable can be found in the
tzset(3) manual page. On Solaris 9, this information is in the environ(5) manual page.
6.11. Summary
The password file and the group file are used on all UNIX systems. We've looked at the various
functions that read these files. We've also talked about shadow passwords, which can help
system security. Supplementary group IDs provide a way to participate in multiple groups at
the same time. We also looked at how similar functions are provided by most systems to
access other system-related data files. We discussed the POSIX.1 functions that programs
can use to identify the system on which they are running. We finished the chapter with a look
at the time and date functions provided by ISO C and the Single UNIX Specification.
Exercises
6.1  If the system uses a shadow file and we need to obtain the encrypted
     password, how do we do it?
6.2  If you have superuser access and your system uses shadow passwords,
     implement the previous exercise.
6.3  Write a program that calls uname and prints all the fields in the utsname
     structure. Compare the output to the output from the uname(1) command.
6.4  Calculate the latest time that can be represented by the time_t data type.
     After it wraps around, what happens?
6.5  Write a program to obtain the current time and print it using strftime, so that
     it looks like the default output from date(1). Set the TZ environment variable to
     different values and see what happens.
Chapter 7. Process Environment
Section 7.1. Introduction
Section 7.2. main Function
Section 7.3. Process Termination
Section 7.4. Command-Line Arguments
Section 7.5. Environment List
Section 7.6. Memory Layout of a C Program
Section 7.7. Shared Libraries
Section 7.8. Memory Allocation
Section 7.9. Environment Variables
Section 7.10. setjmp and longjmp Functions
Section 7.11. getrlimit and setrlimit Functions
Section 7.12. Summary
Exercises
7.1. Introduction
Before looking at the process control primitives in the next chapter, we need to examine the
environment of a single process. In this chapter, we'll see how the main function is called
when the program is executed, how command-line arguments are passed to the new program,
what the typical memory layout looks like, how to allocate additional memory, how the
process can use environment variables, and various ways for the process to terminate.
Additionally, we'll look at the longjmp and setjmp functions and their interaction with the
stack. We finish the chapter by examining the resource limits of a process.
7.2. main Function
A C program starts execution with a function called main. The prototype for the main function
is
int main(int argc, char *argv[]);
where argc is the number of command-line arguments, and argv is an array of pointers to the
arguments. We describe these arguments in Section 7.4.
When a C program is executed by the kernel (by one of the exec functions, which we describe
in Section 8.10), a special start-up routine is called before the main function is called. The
executable program file specifies this routine as the starting address for the program; this is
set up by the link editor when it is invoked by the C compiler. This start-up routine takes
values from the kernel (the command-line arguments and the environment) and sets things up so
that the main function is called as shown earlier.
7.3. Process Termination
There are eight ways for a process to terminate. Normal termination occurs in five ways:
1. Return from main
2. Calling exit
3. Calling _exit or _Exit
4. Return of the last thread from its start routine (Section 11.5)
5. Calling pthread_exit (Section 11.5) from the last thread
Abnormal termination occurs in three ways:
6. Calling abort (Section 10.17)
7. Receipt of a signal (Section 10.2)
8. Response of the last thread to a cancellation request (Sections 11.5 and 12.7)
For now, we'll ignore the three termination methods specific to threads until we discuss
threads in Chapters 11 and 12.
The start-up routine that we mentioned in the previous section is also written so that if the
main function returns, the exit function is called. If the start-up routine were coded in C (it is
often coded in assembler), the call to main could look like
exit(main(argc, argv));
Exit Functions
Three functions terminate a program normally: _exit and _Exit, which return to the kernel
immediately, and exit, which performs certain cleanup processing and then returns to the
kernel.
#include <stdlib.h>
void exit(int status);
void _Exit(int status);
#include <unistd.h>
void _exit(int status);
We'll discuss the effect of these three functions on other processes, such as the children and
the parent of the terminating process, in Section 8.5.
The reason for the different headers is that exit and _Exit are specified by ISO C, whereas
_exit is specified by POSIX.1.
Historically, the exit function has always performed a clean shutdown of the standard I/O
library: the fclose function is called for all open streams. Recall from Section 5.5 that this
causes all buffered output data to be flushed (written to the file).
All three exit functions expect a single integer argument, which we call the exit status. Most
UNIX System shells provide a way to examine the exit status of a process. If (a) any of these
functions is called without an exit status, (b) main does a return without a return value, or (c)
the main function is not declared to return an integer, the exit status of the process is
undefined. However, if the return type of main is an integer and main "falls off the end" (an
implicit return), the exit status of the process is 0.
This behavior is new with the 1999 version of the ISO C standard. Historically, the exit status
was undefined if the end of the main function was reached without an explicit return
statement or call to the exit function.
Returning an integer value from the main function is equivalent to calling exit with the same
value. Thus
exit(0);
is the same as
return(0);
from the main function.
Example
The program in Figure 7.1 is the classic "hello, world" example.
When we compile and run the program in Figure 7.1, we see that the exit code is random. If
we compile the same program on different systems, we are likely to get different exit codes,
depending on the contents of the stack and register contents at the time that the main
function returns:
$ cc hello.c
$ ./a.out
hello, world
$ echo $?                     print the exit status
13
Now if we enable the 1999 ISO C compiler extensions, we see that the exit code changes:
$ cc -std=c99 hello.c         enable gcc's 1999 ISO C extensions
hello.c:4: warning: return type defaults to 'int'
$ ./a.out
hello, world
$ echo $?                     print the exit status
0
Note the compiler warning when we enable the 1999 ISO C extensions. This warning is printed
because the type of the main function is not explicitly declared to be an integer. If we were to
add this declaration, the message would go away. However, if we were to enable all
recommended warnings from the compiler (with the -Wall flag), then we would see a warning
message something like "control reaches end of nonvoid function."
The declaration of main as returning an integer and the use of exit instead of return produces
needless warnings from some compilers and the lint(1) program. The problem is that these
compilers don't know that an exit from main is the same as a return. One way around these
warnings, which become annoying after a while, is to use return instead of exit from main. But
doing this prevents us from using the UNIX System's grep utility to locate all calls to exit from
a program. Another solution is to declare main as returning void, instead of int, and continue
calling exit. This gets rid of the compiler warning but doesn't look right (especially in a
programming text), and can generate other compiler warnings, since the return type of main is
supposed to be a signed integer. In this text, we show main as returning an integer, since that
is the definition specified by both ISO C and POSIX.1.
Different compilers vary in the verbosity of their warnings. Note that the GNU C compiler
usually doesn't emit these extraneous compiler warnings unless additional warning options are
used.
Figure 7.1. Classic C program
#include <stdio.h>

main()
{
    printf("hello, world\n");
}
In the next chapter, we'll see how any process can cause a program to be executed, wait for
the process to complete, and then fetch its exit status.
atexit Function
With ISO C, a process can register at least 32 functions that are automatically called by exit.
These are called exit handlers and are registered by calling the atexit function.
#include <stdlib.h>
int atexit(void (*func)(void));
Returns: 0 if OK, nonzero on error
This declaration says that we pass the address of a function as the argument to atexit.
When this function is called, it is not passed any arguments and is not expected to return a
value. The exit function calls these functions in reverse order of their registration. Each
function is called as many times as it was registered.
These exit handlers first appeared in the ANSI C Standard in 1989. Systems that predate ANSI
C, such as SVR3 and 4.3BSD, did not provide these exit handlers.
ISO C requires that systems support at least 32 exit handlers. The sysconf function can be
used to determine the maximum number of exit handlers supported by a given platform (see
Figure 2.14).
With ISO C and POSIX.1, exit first calls the exit handlers and then closes (via fclose) all open
streams. POSIX.1 extends the ISO C standard by specifying that any exit handlers installed
will be cleared if the program calls any of the exec family of functions. Figure 7.2 summarizes
how a C program is started and the various ways it can terminate.
Figure 7.2. How a C program is started and how it terminates
Note that the only way a program is executed by the kernel is when one of the exec functions
is called. The only way a process voluntarily terminates is when _exit or _Exit is called, either
explicitly or implicitly (by calling exit). A process can also be involuntarily terminated by a
signal (not shown in Figure 7.2).
Example
The program in Figure 7.3 demonstrates the use of the atexit function.
Executing the program in Figure 7.3 yields
$ ./a.out
main is done
first exit handler
first exit handler
second exit handler
An exit handler is called once for each time it is registered. In Figure 7.3, the first exit handler
is registered twice, so it is called two times. Note that we don't call exit; instead, we return
from main.
Figure 7.3. Example of exit handlers
#include "apue.h"

static void my_exit1(void);
static void my_exit2(void);

int
main(void)
{
    if (atexit(my_exit2) != 0)
        err_sys("can't register my_exit2");

    if (atexit(my_exit1) != 0)
        err_sys("can't register my_exit1");
    if (atexit(my_exit1) != 0)
        err_sys("can't register my_exit1");

    printf("main is done\n");
    return(0);
}

static void
my_exit1(void)
{
    printf("first exit handler\n");
}

static void
my_exit2(void)
{
    printf("second exit handler\n");
}
7.4. Command-Line Arguments
When a program is executed, the process that does the exec can pass command-line
arguments to the new program. This is part of the normal operation of the UNIX system shells.
We have already seen this in many of the examples from earlier chapters.
Example
The program in Figure 7.4 echoes all its command-line arguments to standard output. Note
that the normal echo(1) program doesn't echo the zeroth argument.
If we compile this program and name the executable echoarg, we have
$ ./echoarg arg1 TEST foo
argv[0]: ./echoarg
argv[1]: arg1
argv[2]: TEST
argv[3]: foo
We are guaranteed by both ISO C and POSIX.1 that argv[argc] is a null pointer. This lets us
alternatively code the argument-processing loop as
for (i = 0; argv[i] != NULL; i++)
Figure 7.4. Echo all command-line arguments to standard output
#include "apue.h"

int
main(int argc, char *argv[])
{
    int     i;

    for (i = 0; i < argc; i++)      /* echo all command-line args */
        printf("argv[%d]: %s\n", i, argv[i]);
    exit(0);
}
7.5. Environment List
Each program is also passed an environment list. Like the argument list, the environment list is
an array of character pointers, with each pointer containing the address of a null-terminated
C string. The address of the array of pointers is contained in the global variable environ:
extern char **environ;
For example, if the environment consisted of five strings, it could look like Figure 7.5. Here we
explicitly show the null bytes at the end of each string. We'll call environ the environment
pointer, the array of pointers the environment list, and the strings they point to the
environment strings.
Figure 7.5. Environment consisting of five C character strings
By convention, the environment consists of
name=value
strings, as shown in Figure 7.5. Most predefined names are entirely uppercase, but this is only
a convention.
Historically, most UNIX systems have provided a third argument to the main function that is
the address of the environment list:
int main(int argc, char *argv[], char *envp[]);
Because ISO C specifies that the main function be written with two arguments, and because
this third argument provides no benefit over the global variable environ, POSIX.1 specifies
that environ should be used instead of the (possible) third argument. Access to specific
environment variables is normally through the getenv and putenv functions, described in
Section 7.9, instead of through the environ variable. But to go through the entire
environment, the environ pointer must be used.
7.6. Memory Layout of a C Program
Historically, a C program has been composed of the following pieces:

• Text segment, the machine instructions that the CPU executes. Usually, the text
  segment is sharable so that only a single copy needs to be in memory for frequently
  executed programs, such as text editors, the C compiler, the shells, and so on. Also,
  the text segment is often read-only, to prevent a program from accidentally modifying
  its instructions.
• Initialized data segment, usually called simply the data segment, containing variables
  that are specifically initialized in the program. For example, the C declaration
      int   maxcount = 99;
  appearing outside any function causes this variable to be stored in the initialized data
  segment with its initial value.
• Uninitialized data segment, often called the "bss" segment, named after an ancient
  assembler operator that stood for "block started by symbol." Data in this segment is
  initialized by the kernel to arithmetic 0 or null pointers before the program starts
  executing. The C declaration
      long  sum[1000];
  appearing outside any function causes this variable to be stored in the uninitialized
  data segment.
• Stack, where automatic variables are stored, along with information that is saved each
  time a function is called. Each time a function is called, the address of where to return
  to and certain information about the caller's environment, such as some of the machine
  registers, are saved on the stack. The newly called function then allocates room on
  the stack for its automatic and temporary variables. This is how recursive functions in
  C can work. Each time a recursive function calls itself, a new stack frame is used, so
  one set of variables doesn't interfere with the variables from another instance of the
  function.
• Heap, where dynamic memory allocation usually takes place. Historically, the heap has
  been located between the uninitialized data and the stack.
Figure 7.6 shows the typical arrangement of these segments. This is a logical picture of how a
program looks; there is no requirement that a given implementation arrange its memory in this
fashion. Nevertheless, this gives us a typical arrangement to describe. With Linux on an Intel
x86 processor, the text segment starts at location 0x08048000, and the bottom of the stack
starts just below 0xC0000000. (The stack grows from higher-numbered addresses to
lower-numbered addresses on this particular architecture.) The unused virtual address space
between the top of the heap and the top of the stack is large.
Figure 7.6. Typical memory arrangement
Several more segment types exist in an a.out, containing the symbol table, debugging
information, linkage tables for dynamic shared libraries, and the like. These additional sections
don't get loaded as part of the program's image executed by a process.
Note from Figure 7.6 that the contents of the uninitialized data segment are not stored in the
program file on disk. This is because the kernel sets it to 0 before the program starts running.
The only portions of the program that need to be saved in the program file are the text
segment and the initialized data.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. For
example:
$ size /usr/bin/cc /bin/sh
   text    data     bss     dec     hex filename
  79606    1536     916   82058   1408a /usr/bin/cc
 619234   21120   18260  658614   a0cb6 /bin/sh
The fourth and fifth columns are the total of the three sizes, displayed in decimal and
hexadecimal, respectively.
7.7. Shared Libraries
Most UNIX systems today support shared libraries. Arnold [1986] describes an early
implementation under System V, and Gingell et al. [1987] describe a different implementation
under SunOS. Shared libraries remove the common library routines from the executable file,
instead maintaining a single copy of the library routine somewhere in memory that all
processes reference. This reduces the size of each executable file but may add some runtime
overhead, either when the program is first executed or the first time each shared library
function is called. Another advantage of shared libraries is that library functions can be
replaced with new versions without having to relink every program that uses the library.
(This assumes that the number and type of arguments haven't changed.)
Different systems provide different ways for a program to say that it wants to use or not use
the shared libraries. Options for the cc(1) and ld(1) commands are typical. As an example of
the size differences, the following executable file (the classic hello.c program) was first created
without shared libraries:
$ cc -static hello1.c           prevent gcc from using shared libraries
$ ls -l a.out
-rwxrwxr-x 1 sar   475570 Feb 18 23:17 a.out
$ size a.out
   text    data     bss     dec     hex filename
 375657    3780    3220  382657   5d6c1 a.out
If we compile this program to use shared libraries, the text and data sizes of the executable
file are greatly decreased:
$ cc hello1.c                   gcc defaults to use shared libraries
$ ls -l a.out
-rwxrwxr-x 1 sar    11410 Feb 18 23:19 a.out
$ size a.out
   text    data     bss     dec     hex filename
    872     256       4    1132     46c a.out
7.8. Memory Allocation
ISO C specifies three functions for memory allocation:
1. malloc, which allocates a specified number of bytes of memory. The initial value of the
   memory is indeterminate.
2. calloc, which allocates space for a specified number of objects of a specified size. The
   space is initialized to all 0 bits.
3. realloc, which increases or decreases the size of a previously allocated area. When
   the size increases, it may involve moving the previously allocated area somewhere
   else, to provide the additional room at the end. Also, when the size increases, the
   initial value of the space between the old contents and the end of the new area is
   indeterminate.
#include <stdlib.h>
void *malloc(size_t size);
void *calloc(size_t nobj, size_t size);
void *realloc(void *ptr, size_t newsize);
All three return: non-null pointer if OK, NULL on error
void free(void *ptr);
The pointer returned by the three allocation functions is guaranteed to be suitably aligned so
that it can be used for any data object. For example, if the most restrictive alignment
requirement on a particular system requires that doubles must start at memory locations that
are multiples of 8, then all pointers returned by these three functions would be so aligned.
Because the three alloc functions return a generic void * pointer, if we #include <stdlib.h>
(to obtain the function prototypes), we do not explicitly have to cast the pointer returned by
these functions when we assign it to a pointer of a different type.
The function free causes the space pointed to by ptr to be deallocated. This freed space is
usually put into a pool of available memory and can be allocated in a later call to one of the
three alloc functions.
The realloc function lets us increase or decrease the size of a previously allocated area. (The
most common usage is to increase an area.) For example, if we allocate room for 512 elements
in an array that we fill in at runtime but find that we need room for more than 512 elements,
we can call realloc. If there is room beyond the end of the existing region for the requested
space, then realloc doesn't have to move anything; it simply allocates the additional area at
the end and returns the same pointer that we passed it. But if there isn't room at the end of
the existing region, realloc allocates another area that is large enough, copies the existing
512-element array to the new area, frees the old area, and returns the pointer to the new
area. Because the area may move, we shouldn't have any pointers into this area. Exercise
4.16 shows the use of realloc with getcwd to handle any length pathname. Figure 17.36
shows an example that uses realloc to avoid arrays with fixed, compile-time sizes.
Note that the final argument to realloc is the new size of the region, not the difference
between the old and new sizes. As a special case, if ptr is a null pointer, realloc behaves like
malloc and allocates a region of the specified newsize.
Older versions of these routines allowed us to realloc a block that we had freed since the
last call to malloc, realloc, or calloc. This trick dates back to Version 7 and exploited the
search strategy of malloc to perform storage compaction. Solaris still supports this feature,
but many other platforms do not. This feature is deprecated and should not be used.
The allocation routines are usually implemented with the sbrk(2) system call. This system call
expands (or contracts) the heap of the process. (Refer to Figure 7.6.) A sample
implementation of malloc and free is given in Section 8.7 of Kernighan and Ritchie [1988].
Although sbrk can expand or contract the memory of a process, most versions of malloc and
free never decrease their memory size. The space that we free is available for a later
allocation, but the freed space is not usually returned to the kernel; that space is kept in the
malloc pool.
It is important to realize that most implementations allocate a little more space than is
requested and use the additional space for record keeping: the size of the allocated block, a
pointer to the next allocated block, and the like. This means that writing past the end of an
allocated area could overwrite this record-keeping information in a later block. These types of
errors are often catastrophic, but difficult to find, because the error may not show up until
much later. Also, it is possible to overwrite this record keeping by writing before the start of
the allocated area.
Writing past the end or before the beginning of a dynamically allocated buffer can corrupt
more than internal record-keeping information. The memory before and after a
dynamically allocated buffer can potentially be used for other dynamically allocated objects.
These objects can be unrelated to the code corrupting them, making it even more difficult to
find the source of the corruption.
Other possible errors that can be fatal are freeing a block that was already freed and calling
free with a pointer that was not obtained from one of the three alloc functions. If a process
calls malloc, but forgets to call free, its memory usage continually increases; this is called
leakage. By not calling free to return unused space, the size of a process's address space
slowly increases until no free space is left. During this time, performance can degrade from
excess paging overhead.
Because memory allocation errors are difficult to track down, some systems provide versions
of these functions that do additional error checking every time one of the three alloc
functions or free is called. These versions of the functions are often specified by including a
special library for the link editor. There are also publicly available sources that you can compile
with special flags to enable additional runtime checking.
FreeBSD, Mac OS X, and Linux support additional debugging through the setting of
environment variables. In addition, options can be passed to the FreeBSD library through the
symbolic link /etc/malloc.conf.
Alternate Memory Allocators
Many replacements for malloc and free are available. Some systems already include libraries
providing alternate memory allocator implementations. Other systems provide only the
standard allocator, leaving it up to software developers to download alternatives, if desired.
We discuss some of the alternatives here.
libmalloc
SVR4-based systems, such as Solaris, include the libmalloc library, which provides a set of
interfaces matching the ISO C memory allocation functions. The libmalloc library includes
mallopt, a function that allows a process to set certain variables that control the operation of
the storage allocator. A function called mallinfo is also available to provide statistics on the
memory allocator.
vmalloc
Vo [1996] describes a memory allocator that allows processes to allocate memory using
different techniques for different regions of memory. In addition to the functions specific to
vmalloc, the library also provides emulations of the ISO C memory allocation functions.
quick-fit
Historically, the standard malloc algorithm used either a best-fit or a first-fit memory
allocation strategy. Quick-fit is faster than either, but tends to use more memory. Weinstock
and Wulf [1988] describe the algorithm, which is based on splitting up memory into buffers of
various sizes and maintaining unused buffers on different free lists, depending on the size of
the buffers. Free implementations of malloc and free based on quick-fit are readily available
from several FTP sites.
alloca Function
One additional function is also worth mentioning. The function alloca has the same calling
sequence as malloc; however, instead of allocating memory from the heap, the memory is
allocated from the stack frame of the current function. The advantage is that we don't have
to free the space; it goes away automatically when the function returns. The alloca function
increases the size of the stack frame. The disadvantage is that some systems can't support
alloca, if it's impossible to increase the size of the stack frame after the function has been
called. Nevertheless, many software packages use it, and implementations exist for a wide
variety of systems.
All four platforms discussed in this text provide the alloca function.
7.9. Environment Variables
As we mentioned earlier, the environment strings are usually of the form
name=value
The UNIX kernel never looks at these strings; their interpretation is up to the various
applications. The shells, for example, use numerous environment variables. Some, such as HOME
and USER, are set automatically at login, and others are for us to set. We normally set
environment variables in a shell start-up file to control the shell's actions. If we set the
environment variable MAILPATH, for example, it tells the Bourne shell, GNU Bourne-again shell,
and Korn shell where to look for mail.
ISO C defines a function that we can use to fetch values from the environment, but this
standard says that the contents of the environment are implementation defined.
#include <stdlib.h>
char *getenv(const char *name);
Returns: pointer to value associated with name, NULL if not found
Note that this function returns a pointer to the value of a name=value string. We should
always use getenv to fetch a specific value from the environment, instead of accessing
environ directly.
Some environment variables are defined by POSIX.1 in the Single UNIX Specification, whereas
others are defined only if the XSI extensions are supported. Figure 7.7 lists the environment
variables defined by the Single UNIX Specification and also notes which implementations
support the variables. Any environment variable defined by POSIX.1 is marked with •;
otherwise, it is an XSI extension. Many additional implementation-dependent environment
variables are used in the four implementations described in this book. Note that ISO C doesn't
define any environment variables.
Figure 7.7. Environment variables defined in the Single UNIX Specification

Variable    POSIX.1  FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9  Description
COLUMNS     •        •              •             •              •          terminal width
DATEMSK     XSI                     •                            •          getdate(3) template file pathname
HOME        •        •              •             •              •          home directory
LANG        •        •              •             •              •          name of locale
LC_ALL      •        •              •             •              •          name of locale
LC_COLLATE  •        •              •             •              •          name of locale for collation
LC_CTYPE    •        •              •             •              •          name of locale for character classification
LC_MESSAGES •        •              •             •              •          name of locale for messages
LC_MONETARY •        •              •             •              •          name of locale for monetary editing
LC_NUMERIC  •        •              •             •              •          name of locale for numeric editing
LC_TIME     •        •              •             •              •          name of locale for date/time formatting
LINES       •        •              •             •              •          terminal height
LOGNAME     •        •              •             •              •          login name
MSGVERB     XSI      •                                           •          fmtmsg(3) message components to process
NLSPATH     XSI      •              •             •              •          sequence of templates for message catalogs
PATH        •        •              •             •              •          list of path prefixes to search for executable file
PWD         •        •              •             •              •          absolute pathname of current working directory
SHELL       •        •              •             •              •          name of user's preferred shell
TERM        •        •              •             •              •          terminal type
TMPDIR      •        •              •             •              •          pathname of directory for creating temporary files
TZ          •        •              •             •              •          time zone information
In addition to fetching the value of an environment variable, sometimes we may want to set
an environment variable. We may want to change the value of an existing variable or add a
new variable to the environment. (In the next chapter, we'll see that we can affect the
environment of only the current process and any child processes that we invoke. We cannot
affect the environment of the parent process, which is often a shell. Nevertheless, it is still
useful to be able to modify the environment list.) Unfortunately, not all systems support this
capability. Figure 7.8 shows the functions that are supported by the various standards and
implementations.
Figure 7.8. Support for various environment list functions

Function  ISO C  POSIX.1  FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9
getenv    •      •        •              •             •              •
putenv           XSI      •              •             •              •
setenv           •        •              •             •
unsetenv         •        •              •             •
clearenv                                 •
clearenv is not part of the Single UNIX Specification. It is used to remove all entries from the
environment list.
The prototypes for the middle three functions listed in Figure 7.8 are
#include <stdlib.h>
int putenv(char *str);
int setenv(const char *name, const char *value, int rewrite);
int unsetenv(const char *name);
All return: 0 if OK, nonzero on error
The operation of these three functions is as follows.
• The putenv function takes a string of the form name=value and places it in the
  environment list. If name already exists, its old definition is first removed.
• The setenv function sets name to value. If name already exists in the environment,
  then (a) if rewrite is nonzero, the existing definition for name is first removed; (b) if
  rewrite is 0, an existing definition for name is not removed, name is not set to the
  new value, and no error occurs.
• The unsetenv function removes any definition of name. It is not an error if such a
  definition does not exist.
Note the difference between putenv and setenv. Whereas setenv must allocate memory
to create the name=value string from its arguments, putenv is free to place the string
passed to it directly into the environment. Indeed, on Linux and Solaris, the putenv
implementation places the address of the string we pass to it directly into the
environment list. In this case, it would be an error to pass it a string allocated on the
stack, since the memory would be reused after we return from the current function.
It is interesting to examine how these functions must operate when modifying the environment
list. Recall Figure 7.6: the environment list (the array of pointers to the actual name=value
strings) and the environment strings are typically stored at the top of a process's memory
space, above the stack. Deleting a string is simple; we just find the pointer in the
environment list and move all subsequent pointers down one. But adding a string or modifying
an existing string is more difficult. The space at the top of the stack cannot be expanded:
it is often at the top of the address space of the process and so can't expand upward, and
it can't be expanded downward, because all the stack frames below it can't be moved.
1. If we're modifying an existing name:
   a. If the size of the new value is less than or equal to the size of the existing
      value, we can just copy the new string over the old string.
   b. If the size of the new value is larger than the old one, however, we must malloc
      to obtain room for the new string, copy the new string to this area, and then
      replace the old pointer in the environment list for name with the pointer to this
      allocated area.
2. If we're adding a new name, it's more complicated. First, we have to call malloc to
   allocate room for the name=value string and copy the string to this area.
   a. Then, if it's the first time we've added a new name, we have to call malloc to
      obtain room for a new list of pointers. We copy the old environment list to this
      new area and store a pointer to the name=value string at the end of this list of
      pointers. We also store a null pointer at the end of this list, of course. Finally,
      we set environ to point to this new list of pointers. Note from Figure 7.6 that if
      the original environment list was contained above the top of the stack, as is
      common, then we have moved this list of pointers to the heap. But most of the
      pointers in this list still point to name=value strings above the top of the stack.
   b. If this isn't the first time we've added new strings to the environment list, then
      we know that we've already allocated room for the list on the heap, so we just
      call realloc to allocate room for one more pointer. The pointer to the new
      name=value string is stored at the end of the list (on top of the previous null
      pointer), followed by a null pointer.
7.10. setjmp and longjmp Functions
In C, we can't goto a label that's in another function. Instead, we must use the setjmp and
longjmp functions to perform this type of branching. As we'll see, these two functions are
useful for handling error conditions that occur in a deeply nested function call.
Consider the skeleton in Figure 7.9. It consists of a main loop that reads lines from standard
input and calls the function do_line to process each line. This function then calls get_token to
fetch the next token from the input line. The first token of a line is assumed to be a command
of some form, and a switch statement selects each command. For the single command shown,
the function cmd_add is called.
Figure 7.9. Typical program skeleton for command processing
#include "apue.h"

#define TOK_ADD    5

void    do_line(char *);
void    cmd_add(void);
int     get_token(void);

int
main(void)
{
    char    line[MAXLINE];

    while (fgets(line, MAXLINE, stdin) != NULL)
        do_line(line);
    exit(0);
}

char    *tok_ptr;       /* global pointer for get_token() */

void
do_line(char *ptr)      /* process one line of input */
{
    int     cmd;

    tok_ptr = ptr;
    while ((cmd = get_token()) > 0) {
        switch (cmd) {  /* one case for each command */
        case TOK_ADD:
            cmd_add();
            break;
        }
    }
}

void
cmd_add(void)
{
    int     token;

    token = get_token();
    /* rest of processing for this command */
}

int
get_token(void)
{
    /* fetch next token from line pointed to by tok_ptr */
}
The skeleton in Figure 7.9 is typical for programs that read commands, determine the
command type, and then call functions to process each command. Figure 7.10 shows what
the stack could look like after cmd_add has been called.
Figure 7.10. Stack frames after cmd_add has been called
Storage for the automatic variables is within the stack frame for each function. The array line
is in the stack frame for main, the integer cmd is in the stack frame for do_line, and the
integer token is in the stack frame for cmd_add.
As we've said, this type of arrangement of the stack is typical, but not required. Stacks do
not have to grow toward lower memory addresses. On systems that don't have built-in
hardware support for stacks, a C implementation might use a linked list for its stack frames.
The coding problem that's often encountered with programs like the one shown in Figure 7.9 is
how to handle nonfatal errors. For example, if the cmd_add function encounters an error (say, an
invalid number), it might want to print an error, ignore the rest of the input line, and return to
the main function to read the next input line. But when we're deeply nested numerous levels
down from the main function, this is difficult to do in C. (In this example, in the cmd_add
function, we're only two levels down from main, but it's not uncommon to be five or more
levels down from where we want to return to.) It becomes messy if we have to code each
function with a special return value that tells it to return one level.
The solution to this problem is to use a nonlocal goto: the setjmp and longjmp functions. We call
it nonlocal because we're not doing a normal C goto statement within a function;
instead, we're branching back through the call frames to a function that is in the call path of
the current function.
#include <setjmp.h>
int setjmp(jmp_buf env);
Returns: 0 if called directly, nonzero if returning from a call to longjmp
void longjmp(jmp_buf env, int val);
We call setjmp from the location that we want to return to, which in this example is in the
main function. In this case, setjmp returns 0 because we called it directly. In the call to setjmp,
the argument env is of the special type jmp_buf. This data type is some form of array that is
capable of holding all the information required to restore the status of the stack to the state
when we call longjmp. Normally, the env variable is a global variable, since we'll need to
reference it from another function.
When we encounter an error (say, in the cmd_add function), we call longjmp with two arguments.
The first is the same env that we used in a call to setjmp, and the second, val, is a nonzero
value that becomes the return value from setjmp. The reason for the second argument is to
allow us to have more than one longjmp for each setjmp. For example, we could longjmp from
cmd_add with a val of 1 and also call longjmp from get_token with a val of 2. In the main
function, the return value from setjmp is either 1 or 2, and we can test this value, if we want,
and determine whether the longjmp was from cmd_add or get_token.
Let's return to the example. Figure 7.11 shows both the main and cmd_add functions. (The
other two functions, do_line and get_token, haven't changed.)
Figure 7.11. Example of setjmp and longjmp
#include "apue.h"
#include <setjmp.h>

#define TOK_ADD    5

jmp_buf jmpbuffer;

int
main(void)
{
    char    line[MAXLINE];

    if (setjmp(jmpbuffer) != 0)
        printf("error");
    while (fgets(line, MAXLINE, stdin) != NULL)
        do_line(line);
    exit(0);
}

...

void
cmd_add(void)
{
    int     token;

    token = get_token();
    if (token < 0)      /* an error has occurred */
        longjmp(jmpbuffer, 1);
    /* rest of processing for this command */
}
When main is executed, we call setjmp, which records whatever information it needs to in the
variable jmpbuffer and returns 0. We then call do_line, which calls cmd_add, and assume that
an error of some form is detected. Before the call to longjmp in cmd_add, the stack looks like
that in Figure 7.10. But longjmp causes the stack to be "unwound" back to the main function,
throwing away the stack frames for cmd_add and do_line (Figure 7.12). Calling longjmp causes
the setjmp in main to return, but this time it returns with a value of 1 (the second argument
for longjmp).
Figure 7.12. Stack frame after longjmp has been called
Automatic, Register, and Volatile Variables
We've seen what the stack looks like after calling longjmp. The next question is, "what are the
states of the automatic variables and register variables in the main function?" When main is
returned to by the longjmp, do these variables have values corresponding to when the setjmp
was previously called (i.e., are their values rolled back), or are their values left alone so that
their values are whatever they were when do_line was called (which caused cmd_add to be
called, which caused longjmp to be called)? Unfortunately, the answer is "it depends." Most
implementations do not try to roll back these automatic variables and register variables, but
the standards say only that their values are indeterminate. If you have an automatic variable
that you don't want rolled back, define it with the volatile attribute. Variables that are
declared global or static are left alone when longjmp is executed.
Example
The program in Figure 7.13 demonstrates the different behavior that can be seen with
automatic, global, register, static, and volatile variables after calling longjmp.
If we compile and test the program in Figure 7.13, with and without compiler optimizations,
the results are different:
$ cc testjmp.c                compile without any optimization
$ ./a.out
in f1():
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
after longjmp:
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
$ cc -O testjmp.c             compile with full optimization
$ ./a.out
in f1():
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
after longjmp:
globval = 95, autoval = 2, regival = 3, volaval = 98, statval = 99
Note that the optimizations don't affect the global, static, and volatile variables; their values
after the longjmp are the last values that they assumed. The setjmp(3) manual page on one
system states that variables stored in memory will have values as of the time of the longjmp,
whereas variables in the CPU and floating-point registers are restored to their values when
setjmp was called. This is indeed what we see when we run the program in Figure 7.13.
Without optimization, all five variables are stored in memory (the register hint is ignored for
regival). When we enable optimization, both autoval and regival go into registers, even
though the former wasn't declared register, and the volatile variable stays in memory. The
thing to realize with this example is that you must use the volatile attribute if you're writing
portable code that uses nonlocal jumps. Anything else can change from one system to the
next.
Some printf format strings in Figure 7.13 are longer than will fit comfortably for display in a
programming text. Instead of making multiple calls to printf, we rely on ISO C's string
concatenation feature, where the sequence
"string1" "string2"
is equivalent to
"string1string2"
Figure 7.13. Effect of longjmp on various types of variables
#include "apue.h"
#include <setjmp.h>

static void f1(int, int, int, int);
static void f2(void);

static jmp_buf  jmpbuffer;
static int      globval;

int
main(void)
{
    int             autoval;
    register int    regival;
    volatile int    volaval;
    static int      statval;

    globval = 1; autoval = 2; regival = 3; volaval = 4; statval = 5;

    if (setjmp(jmpbuffer) != 0) {
        printf("after longjmp:\n");
        printf("globval = %d, autoval = %d, regival = %d,"
            " volaval = %d, statval = %d\n",
            globval, autoval, regival, volaval, statval);
        exit(0);
    }

    /*
     * Change variables after setjmp, but before longjmp.
     */
    globval = 95; autoval = 96; regival = 97; volaval = 98;
    statval = 99;

    f1(autoval, regival, volaval, statval); /* never returns */
    exit(0);
}

static void
f1(int i, int j, int k, int l)
{
    printf("in f1():\n");
    printf("globval = %d, autoval = %d, regival = %d,"
        " volaval = %d, statval = %d\n", globval, i, j, k, l);
    f2();
}

static void
f2(void)
{
    longjmp(jmpbuffer, 1);
}
We'll return to these two functions, setjmp and longjmp, in Chapter 10 when we discuss signal
handlers and their signal versions: sigsetjmp and siglongjmp.
Potential Problem with Automatic Variables
Having looked at the way stack frames are usually handled, it is worth looking at a potential
error in dealing with automatic variables. The basic rule is that an automatic variable can
never be referenced after the function that declared it returns. There are numerous warnings
about this throughout the UNIX System manuals.
Figure 7.14 shows a function called open_data that opens a standard I/O stream and sets the
buffering for the stream.
Figure 7.14. Incorrect usage of an automatic variable
#include    <stdio.h>

#define DATAFILE    "datafile"

FILE *
open_data(void)
{
    FILE    *fp;
    char    databuf[BUFSIZ];  /* setvbuf makes this the stdio buffer */

    if ((fp = fopen(DATAFILE, "r")) == NULL)
        return(NULL);
    if (setvbuf(fp, databuf, _IOLBF, BUFSIZ) != 0)
        return(NULL);       /* error */
    return(fp);
}
The problem is that when open_data returns, the space it used on the stack will be used by
the stack frame for the next function that is called. But the standard I/O library will still be
using that portion of memory for its stream buffer. Chaos is sure to result. To correct this
problem, the array databuf needs to be allocated from global memory, either statically (static
or extern) or dynamically (one of the alloc functions).
7.11. getrlimit and setrlimit Functions
Every process has a set of resource limits, some of which can be queried and changed by the
getrlimit and setrlimit functions.
#include <sys/resource.h>
int getrlimit(int resource, struct rlimit *rlptr);
int setrlimit(int resource, const struct rlimit *rlptr);
Both return: 0 if OK, nonzero on error
These two functions are defined as XSI extensions in the Single UNIX Specification. The
resource limits for a process are normally established by process 0 when the system is
initialized and then inherited by each successive process. Each implementation has its own
way of tuning the various limits.
Each call to these two functions specifies a single resource and a pointer to the following
structure:
struct rlimit {
    rlim_t rlim_cur;    /* soft limit: current limit */
    rlim_t rlim_max;    /* hard limit: maximum value for rlim_cur */
};
Three rules govern the changing of the resource limits.
1. A process can change its soft limit to a value less than or equal to its hard limit.
2. A process can lower its hard limit to a value greater than or equal to its soft limit. This
   lowering of the hard limit is irreversible for normal users.
3. Only a superuser process can raise a hard limit.
An infinite limit is specified by the constant RLIM_INFINITY.
The resource argument takes on one of the following values. Figure 7.15 shows which limits
are defined by the Single UNIX Specification and supported by each implementation.
Figure 7.15. Support for resource limits

Limit           XSI  FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9
RLIMIT_AS       •                   •                            •
RLIMIT_CORE     •    •              •             •              •
RLIMIT_CPU      •    •              •             •              •
RLIMIT_DATA     •    •              •             •              •
RLIMIT_FSIZE    •    •              •             •              •
RLIMIT_LOCKS                        •
RLIMIT_MEMLOCK       •              •             •
RLIMIT_NOFILE   •    •              •             •              •
RLIMIT_NPROC         •              •             •
RLIMIT_RSS           •              •             •
RLIMIT_SBSIZE        •
RLIMIT_STACK    •    •              •             •              •
RLIMIT_VMEM          •                                           •
RLIMIT_AS
The maximum size in bytes of a process's total available memory. This
affects the sbrk function (Section 1.11) and the mmap function (Section
14.9).
RLIMIT_CORE
The maximum size in bytes of a core file. A limit of 0 prevents the creation
of a core file.
RLIMIT_CPU
The maximum amount of CPU time in seconds. When the soft limit is
exceeded, the SIGXCPU signal is sent to the process.
RLIMIT_DATA
The maximum size in bytes of the data segment: the sum of the initialized
data, uninitialized data, and heap from Figure 7.6.
RLIMIT_FSIZE
The maximum size in bytes of a file that may be created. When the soft
limit is exceeded, the process is sent the SIGXFSZ signal.
RLIMIT_LOCKS
The maximum number of file locks a process can hold. (This number also
includes file leases, a Linux-specific feature. See the Linux fcntl(2) manual
page for more information.)
RLIMIT_MEMLOCK
The maximum amount of memory in bytes that a process can lock into
memory using mlock(2).
RLIMIT_NOFILE
The maximum number of open files per process. Changing this limit affects
the value returned by the sysconf function for its _SC_OPEN_MAX argument
(Section 2.5.4). See Figure 2.16 also.
RLIMIT_NPROC
The maximum number of child processes per real user ID. Changing this limit
affects the value returned for _SC_CHILD_MAX by the sysconf function
(Section 2.5.4).
RLIMIT_RSS
Maximum resident set size (RSS) in bytes. If available physical memory is
low, the kernel takes memory from processes that exceed their RSS.
RLIMIT_SBSIZE
The maximum size in bytes of socket buffers that a user can consume at
any given time.
RLIMIT_STACK
The maximum size in bytes of the stack. See Figure 7.6.
RLIMIT_VMEM
This is a synonym for RLIMIT_AS.
The resource limits affect the calling process and are inherited by any of its children. This
means that the setting of resource limits needs to be built into the shells to affect all our
future processes. Indeed, the Bourne shell, the GNU Bourne-again shell, and the Korn shell
have the built-in ulimit command, and the C shell has the built-in limit command. (The umask
and chdir functions also have to be handled as shell built-ins.)
Example
The program in Figure 7.16 prints out the current soft limit and hard limit for all the resource
limits supported on the system. To compile this program on all the various implementations, we
have conditionally included the resource names that differ. Note also that we must use a
different printf format on platforms that define rlim_t to be an unsigned long long instead of
an unsigned long.
Note that we've used the ISO C string-creation operator (#) in the doit macro, to generate
the string value for each resource name. When we say
doit(RLIMIT_CORE);
the C preprocessor expands this into
pr_limits("RLIMIT_CORE", RLIMIT_CORE);
Running this program under FreeBSD gives us the following:
$ ./a.out
RLIMIT_CORE    (infinite) (infinite)
RLIMIT_CPU     (infinite) (infinite)
RLIMIT_DATA     536870912  536870912
RLIMIT_FSIZE   (infinite) (infinite)
RLIMIT_MEMLOCK (infinite) (infinite)
RLIMIT_NOFILE        1735       1735
RLIMIT_NPROC          867        867
RLIMIT_RSS     (infinite) (infinite)
RLIMIT_SBSIZE  (infinite) (infinite)
RLIMIT_STACK     67108864   67108864
RLIMIT_VMEM    (infinite) (infinite)
Solaris gives us the following results:
$ ./a.out
RLIMIT_AS      (infinite) (infinite)
RLIMIT_CORE    (infinite) (infinite)
RLIMIT_CPU     (infinite) (infinite)
RLIMIT_DATA    (infinite) (infinite)
RLIMIT_FSIZE   (infinite) (infinite)
RLIMIT_NOFILE         256      65536
RLIMIT_STACK      8388608 (infinite)
RLIMIT_VMEM    (infinite) (infinite)
Figure 7.16. Print the current resource limits
#include "apue.h"
#if defined(BSD) || defined(MACOS)
#include <sys/time.h>
#define FMT "%10lld "
#else
#define FMT "%10ld "
#endif
#include <sys/resource.h>
#define doit(name) pr_limits(#name, name)
static void pr_limits(char *, int);
int
main(void)
{
#ifdef RLIMIT_AS
doit(RLIMIT_AS);
#endif
doit(RLIMIT_CORE);
doit(RLIMIT_CPU);
doit(RLIMIT_DATA);
doit(RLIMIT_FSIZE);
#ifdef RLIMIT_LOCKS
doit(RLIMIT_LOCKS);
#endif
#ifdef RLIMIT_MEMLOCK
doit(RLIMIT_MEMLOCK);
#endif
doit(RLIMIT_NOFILE);
#ifdef RLIMIT_NPROC
doit(RLIMIT_NPROC);
#endif
#ifdef RLIMIT_RSS
doit(RLIMIT_RSS);
#endif
#ifdef RLIMIT_SBSIZE
doit(RLIMIT_SBSIZE);
#endif
doit(RLIMIT_STACK);
#ifdef RLIMIT_VMEM
doit(RLIMIT_VMEM);
#endif
exit(0);
}
static void
pr_limits(char *name, int resource)
{
struct rlimit limit;
if (getrlimit(resource, &limit) < 0)
err_sys("getrlimit error for %s", name);
printf("%-14s ", name);
if (limit.rlim_cur == RLIM_INFINITY)
printf("(infinite) ");
else
printf(FMT, limit.rlim_cur);
if (limit.rlim_max == RLIM_INFINITY)
printf("(infinite)");
else
printf(FMT, limit.rlim_max);
putchar((int)'\n');
}
Exercise 10.11 continues the discussion of resource limits, after we've covered signals.
7.12. Summary
Understanding the environment of a C program in a UNIX system is a
prerequisite to understanding the process control features of the UNIX System. In this
chapter, we've looked at how a process is started, how it can terminate, and how it's passed
an argument list and an environment. Although both are uninterpreted by the kernel, it is the
kernel that passes both from the caller of exec to the new process.
We've also examined the typical memory layout of a C program and how a process can
dynamically allocate and free memory. It is worthwhile to look in detail at the functions
available for manipulating the environment, since they involve memory allocation. The
functions setjmp and longjmp were presented, providing a way to perform nonlocal branching
within a process. We finished the chapter by describing the resource limits that various
implementations provide.
Exercises
7.1  On an Intel x86 system under both FreeBSD and Linux, if we execute the program that prints
     "hello, world" and do not call exit or return, the termination status of the program, which
     we can examine with the shell, is 13. Why?
7.2  When is the output from the printfs in Figure 7.3 actually output?
7.3  Is there any way for a function that is called by main to examine the command-line arguments
     without (a) passing argc and argv as arguments from main to the function or (b) having main
     copy argc and argv into global variables?
7.4  Some UNIX system implementations purposely arrange that, when a program is executed,
     location 0 in the data segment is not accessible. Why?
7.5  Use the typedef facility of C to define a new data type Exitfunc for an exit handler. Redo
     the prototype for atexit using this data type.
7.6  If we allocate an array of longs using calloc, is the array initialized to 0? If we allocate
     an array of pointers using calloc, is the array initialized to null pointers?
7.7  In the output from the size command at the end of Section 7.6, why aren't any sizes given
     for the heap and the stack?
7.8  In Section 7.7, the two file sizes (475570 and 11410) don't equal the sums of their
     respective text and data sizes. Why?
7.9  In Section 7.7, why is there such a difference in the size of the executable file when
     using shared libraries for such a trivial program?
7.10 At the end of Section 7.10, we showed how a function can't return a pointer to an automatic
     variable. Is the following code correct?

         int
         f1(int val)
         {
             int     *ptr;

             if (val == 0) {
                 int     val;

                 val = 5;
                 ptr = &val;
             }
             return(*ptr + 1);
         }
Chapter 8. Process Control
Section 8.1. Introduction
Section 8.2. Process Identifiers
Section 8.3. fork Function
Section 8.4. vfork Function
Section 8.5. exit Functions
Section 8.6. wait and waitpid Functions
Section 8.7. waitid Function
Section 8.8. wait3 and wait4 Functions
Section 8.9. Race Conditions
Section 8.10. exec Functions
Section 8.11. Changing User IDs and Group IDs
Section 8.12. Interpreter Files
Section 8.13. system Function
Section 8.14. Process Accounting
Section 8.15. User Identification
Section 8.16. Process Times
Section 8.17. Summary
Exercises
8.1. Introduction
We now turn to the process control provided by the UNIX System. This includes the creation
of new processes, program execution, and process termination. We also look at the various
IDs that are the property of the process (real, effective, and saved; user and group IDs) and
how they're affected by the process control primitives. Interpreter files and the system
function are also covered. We conclude the chapter by looking at the process accounting
provided by most UNIX systems. This lets us look at the process control functions from a
different perspective.
8.2. Process Identifiers
Every process has a unique process ID, a non-negative integer. Because the process ID is the
only well-known identifier of a process that is always unique, it is often used as a piece of
other identifiers, to guarantee uniqueness. For example, applications sometimes include the
process ID as part of a filename in an attempt to generate unique filenames.
Although unique, process IDs are reused. As processes terminate, their IDs become
candidates for reuse. Most UNIX systems implement algorithms to delay reuse, however, so
that newly created processes are assigned IDs different from those used by processes that
terminated recently. This prevents a new process from being mistaken for the previous
process to have used the same ID.
There are some special processes, but the details differ from implementation to
implementation. Process ID 0 is usually the scheduler process and is often known as the
swapper. No program on disk corresponds to this process, which is part of the kernel and is
known as a system process. Process ID 1 is usually the init process and is invoked by the
kernel at the end of the bootstrap procedure. The program file for this process was /etc/init
in older versions of the UNIX System and is /sbin/init in newer versions. This process is
responsible for bringing up a UNIX system after the kernel has been bootstrapped. init usually
reads the system-dependent initialization files (the /etc/rc* files or /etc/inittab and the files
in /etc/init.d) and brings the system to a certain state, such as multiuser. The init process
never dies. It is a normal user process, not a system process within the kernel, like the
swapper, although it does run with superuser privileges. Later in this chapter, we'll see how
init becomes the parent process of any orphaned child process.
Each UNIX System implementation has its own set of kernel processes that provide operating
system services. For example, on some virtual memory implementations of the UNIX System,
process ID 2 is the pagedaemon. This process is responsible for supporting the paging of the
virtual memory system.
In addition to the process ID, there are other identifiers for every process. The following
functions return these identifiers.
#include <unistd.h>
pid_t getpid(void);
Returns: process ID of calling process
pid_t getppid(void);
Returns: parent process ID of calling process
uid_t getuid(void);
Returns: real user ID of calling process
uid_t geteuid(void);
Returns: effective user ID of calling process
gid_t getgid(void);
Returns: real group ID of calling process
gid_t getegid(void);
Returns: effective group ID of calling process
Note that none of these functions has an error return. We'll return to the parent process ID in
the next section when we discuss the fork function. The real and effective user and group
IDs were discussed in Section 4.4.
8.3. fork Function
An existing process can create a new one by calling the fork function.
#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error
The new process created by fork is called the child process. This function is called once but
returns twice. The only difference in the returns is that the return value in the child is 0,
whereas the return value in the parent is the process ID of the new child. The reason the
child's process ID is returned to the parent is that a process can have more than one child,
and there is no function that allows a process to obtain the process IDs of its children. The
reason fork returns 0 to the child is that a process can have only a single parent, and the
child can always call getppid to obtain the process ID of its parent. (Process ID 0 is reserved
for use by the kernel, so it's not possible for 0 to be the process ID of a child.)
Both the child and the parent continue executing with the instruction that follows the call to
fork. The child is a copy of the parent. For example, the child gets a copy of the parent's
data space, heap, and stack. Note that this is a copy for the child; the parent and the child
do not share these portions of memory. The parent and the child share the text segment
(Section 7.6).
Current implementations don't perform a complete copy of the parent's data, stack, and heap,
since a fork is often followed by an exec. Instead, a technique called copy-on-write (COW) is
used. These regions are shared by the parent and the child and have their protection changed
by the kernel to read-only. If either process tries to modify these regions, the kernel then
makes a copy of that piece of memory only, typically a "page" in a virtual memory system.
Section 9.2 of Bach [1986] and Sections 5.6 and 5.7 of McKusick et al. [1996] provide more
detail on this feature.
Variations of the fork function are provided by some platforms. All four platforms discussed in
this book support the vfork(2) variant discussed in the next section.
Linux 2.4.22 also provides new process creation through the clone(2) system call. This is a
generalized form of fork that allows the caller to control what is shared between parent and
child.
FreeBSD 5.2.1 provides the rfork(2) system call, which is similar to the Linux clone system
call. The rfork call is derived from the Plan 9 operating system (Pike et al. [1995]).
Solaris 9 provides two threads libraries: one for POSIX threads (pthreads) and one for Solaris
threads. The behavior of fork differs between the two thread libraries. For POSIX threads,
fork creates a process containing only the calling thread, but for Solaris threads, fork creates
a process containing copies of all threads from the process of the calling thread. To provide
similar semantics as POSIX threads, Solaris provides the fork1 function, which can be used to
create a process that duplicates only the calling thread, regardless of the thread library used.
Threads are discussed in detail in Chapters 11 and 12.
Example
The program in Figure 8.1 demonstrates the fork function, showing how changes to variables
in a child process do not affect the value of the variables in the parent process.
If we execute this program, we get
$ ./a.out
a write to stdout
before fork
pid = 430, glob = 7, var = 89       child's variables were changed
pid = 429, glob = 6, var = 88       parent's copy was not changed
$ ./a.out > temp.out
$ cat temp.out
a write to stdout
before fork
pid = 432, glob = 7, var = 89
before fork
pid = 431, glob = 6, var = 88
In general, we never know whether the child starts executing before the parent or vice versa.
This depends on the scheduling algorithm used by the kernel. If it's required that the child and
parent synchronize, some form of interprocess communication is required. In the program
shown in Figure 8.1, we simply have the parent put itself to sleep for 2 seconds, to let the
child execute. There is no guarantee that this is adequate, and we talk about this and other
types of synchronization in Section 8.9 when we discuss race conditions. In Section 10.16,
we show how to use signals to synchronize a parent and a child after a fork.
When we write to standard output, we subtract 1 from the size of buf to avoid writing the
terminating null byte. Although strlen will calculate the length of a string not including the
terminating null byte, sizeof calculates the size of the buffer, which does include the
terminating null byte. Another difference is that using strlen requires a function call, whereas
sizeof calculates the buffer length at compile time, as the buffer is initialized with a known
string, and its size is fixed.
Note the interaction of fork with the I/O functions in the program in Figure 8.1. Recall from
Chapter 3 that the write function is not buffered. Because write is called before the fork, its
data is written once to standard output. The standard I/O library, however, is buffered. Recall
from Section 5.12 that standard output is line buffered if it's connected to a terminal device;
otherwise, it's fully buffered. When we run the program interactively, we get only a single
copy of the printf line, because the standard output buffer is flushed by the newline. But
when we redirect standard output to a file, we get two copies of the printf line. In this
second case, the printf before the fork is called once, but the line remains in the buffer
when fork is called. This buffer is then copied into the child when the parent's data space is
copied to the child. Both the parent and the child now have a standard I/O buffer with this
line in it. The second printf, right before the exit, just appends its data to the existing
buffer. When each process terminates, its copy of the buffer is finally flushed.
Figure 8.1. Example of fork function
#include "apue.h"

int     glob = 6;       /* external variable in initialized data */
char    buf[] = "a write to stdout\n";

int
main(void)
{
    int     var;        /* automatic variable on the stack */
    pid_t   pid;

    var = 88;
    if (write(STDOUT_FILENO, buf, sizeof(buf)-1) != sizeof(buf)-1)
        err_sys("write error");
    printf("before fork\n");    /* we don't flush stdout */

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {      /* child */
        glob++;                 /* modify variables */
        var++;
    } else {
        sleep(2);               /* parent */
    }

    printf("pid = %d, glob = %d, var = %d\n", getpid(), glob, var);
    exit(0);
}
File Sharing
When we redirect the standard output of the parent from the program in Figure 8.1, the
child's standard output is also redirected. Indeed, one characteristic of fork is that all file
descriptors that are open in the parent are duplicated in the child. We say "duplicated"
because it's as if the dup function had been called for each descriptor. The parent and the
child share a file table entry for every open descriptor (recall Figure 3.8).
Consider a process that has three different files opened for standard input, standard output,
and standard error. On return from fork, we have the arrangement shown in Figure 8.2.
Figure 8.2. Sharing of open files between parent and child after fork
It is important that the parent and the child share the same file offset. Consider a process
that forks a child, then waits for the child to complete. Assume that both processes write to
standard output as part of their normal processing. If the parent has its standard output
redirected (by a shell, perhaps) it is essential that the parent's file offset be updated by the
child when the child writes to standard output. In this case, the child can write to standard
output while the parent is waiting for it; on completion of the child, the parent can continue
writing to standard output, knowing that its output will be appended to whatever the child
wrote. If the parent and the child did not share the same file offset, this type of interaction
would be more difficult to accomplish and would require explicit actions by the parent.
If both parent and child write to the same descriptor, without any form of synchronization,
such as having the parent wait for the child, their output will be intermixed (assuming it's a
descriptor that was open before the fork). Although this is possible (we saw it in Figure 8.2),
it's not the normal mode of operation.
There are two normal cases for handling the descriptors after a fork.
1.
The parent waits for the child to complete. In this case, the parent does not need to
do anything with its descriptors. When the child terminates, any of the shared
descriptors that the child read from or wrote to will have their file offsets updated
accordingly.
2.
Both the parent and the child go their own ways. Here, after the fork, the parent
closes the descriptors that it doesn't need, and the child does the same thing. This
way, neither interferes with the other's open descriptors. This scenario is often the
case with network servers.
Besides the open files, there are numerous other properties of the parent that are inherited by
the child:
• Real user ID, real group ID, effective user ID, effective group ID
• Supplementary group IDs
• Process group ID
• Session ID
• Controlling terminal
• The set-user-ID and set-group-ID flags
• Current working directory
• Root directory
• File mode creation mask
• Signal mask and dispositions
• The close-on-exec flag for any open file descriptors
• Environment
• Attached shared memory segments
• Memory mappings
• Resource limits
The differences between the parent and child are
• The return value from fork
• The process IDs are different
• The two processes have different parent process IDs: the parent process ID of the
child is the parent; the parent process ID of the parent doesn't change
• The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0
• File locks set by the parent are not inherited by the child
• Pending alarms are cleared for the child
• The set of pending signals for the child is set to the empty set
Many of these features haven't been discussed yet; we'll cover them in later chapters.
The two main reasons for fork to fail are (a) if too many processes are already in the system,
which usually means that something else is wrong, or (b) if the total number of processes for
this real user ID exceeds the system's limit. Recall from Figure 2.10 that CHILD_MAX specifies
the maximum number of simultaneous processes per real user ID.
There are two uses for fork:
1.
When a process wants to duplicate itself so that the parent and child can each
execute different sections of code at the same time. This is common for network
servers: the parent waits for a service request from a client. When the request arrives,
the parent calls fork and lets the child handle the request. The parent goes back to
waiting for the next service request to arrive.
2.
When a process wants to execute a different program. This is common for shells. In
this case, the child does an exec (which we describe in Section 8.10) right after it
returns from the fork.
Some operating systems combine the operations from step 2 (a fork followed by an exec) into a
single operation called a spawn. The UNIX System separates the two, as there are numerous
cases where it is useful to fork without doing an exec. Also, separating the two allows the
child to change the per-process attributes between the fork and the exec, such as I/O
redirection, user ID, signal disposition, and so on. We'll see numerous examples of this in
Chapter 15.
The Single UNIX Specification does include spawn interfaces in the advanced real-time option
group. These interfaces are not intended to be replacements for fork and exec, however.
They are intended to support systems that have difficulty implementing fork efficiently,
especially systems without hardware support for memory management.
8.4. vfork Function
The function vfork has the same calling sequence and same return values as fork. But the
semantics of the two functions differ.
The vfork function originated with 2.9BSD. Some consider the function a blemish, but all the
platforms covered in this book support it. In fact, the BSD developers removed it from the
4.4BSD release, but all the open source BSD distributions that derive from 4.4BSD added
support for it back into their own releases. The vfork function is marked as an obsolete
interface in Version 3 of the Single UNIX Specification.
The vfork function is intended to create a new process when the purpose of the new process
is to exec a new program (step 2 at the end of the previous section). The bare-bones shell in
the program from Figure 1.7 is also an example of this type of program. The vfork function
creates the new process, just like fork, without copying the address space of the parent into
the child, as the child won't reference that address space; the child simply calls exec (or exit)
right after the vfork. Instead, while the child is running and until it calls either exec or exit,
the child runs in the address space of the parent. This optimization provides an efficiency gain
on some paged virtual-memory implementations of the UNIX System. (As we mentioned in the
previous section, implementations use copy-on-write to improve the efficiency of a fork
followed by an exec, but no copying is still faster than some copying.)
Another difference between the two functions is that vfork guarantees that the child runs
first, until the child calls exec or exit. When the child calls either of these functions, the
parent resumes. (This can lead to deadlock if the child depends on further actions of the
parent before calling either of these two functions.)
Example
The program in Figure 8.3 is a modified version of the program from Figure 8.1. We've replaced
the call to fork with vfork and removed the write to standard output. Also, we don't need to
have the parent call sleep, as we're guaranteed that it is put to sleep by the kernel until the
child calls either exec or exit.
Running this program gives us
$ ./a.out
before vfork
pid = 29039, glob = 7, var = 89
Here, the incrementing of the variables done by the child changes the values in the parent.
Because the child runs in the address space of the parent, this doesn't surprise us. This
behavior, however, differs from fork.
Note in Figure 8.3 that we call _exit instead of exit. As we described in Section 7.3, _exit
does not perform any flushing of standard I/O buffers. If we call exit instead, the results are
indeterminate. Depending on the implementation of the standard I/O library, we might see no
difference in the output, or we might find that the output from the parent's printf has
disappeared.
If the child calls exit, the implementation flushes the standard I/O streams. If this is the only
action taken by the library, then we will see no difference with the output generated if the
child called _exit. If the implementation also closes the standard I/O streams, however, the
memory representing the FILE object for the standard output will be cleared out. Because the
child is borrowing the parent's address space, when the parent resumes and calls printf, no
output will appear and printf will return -1. Note that the parent's STDOUT_FILENO is still valid,
as the child gets a copy of the parent's file descriptor array (refer back to Figure 8.2).
Most modern implementations of exit will not bother to close the streams. Because the
process is about to exit, the kernel will close all the file descriptors open in the process.
Closing them in the library simply adds overhead without any benefit.
Figure 8.3. Example of vfork function
#include "apue.h"

int     glob = 6;       /* external variable in initialized data */

int
main(void)
{
    int     var;        /* automatic variable on the stack */
    pid_t   pid;

    var = 88;
    printf("before vfork\n");   /* we don't flush stdio */
    if ((pid = vfork()) < 0) {
        err_sys("vfork error");
    } else if (pid == 0) {      /* child */
        glob++;                 /* modify parent's variables */
        var++;
        _exit(0);               /* child terminates */
    }
    /*
     * Parent continues here.
     */
    printf("pid = %d, glob = %d, var = %d\n", getpid(), glob, var);
    exit(0);
}
Section 5.6 of McKusick et al. [1996] contains additional information on the implementation
issues of fork and vfork. Exercises 8.1 and 8.2 continue the discussion of vfork.
8.5. exit Functions
As we described in Section 7.3, a process can terminate normally in five ways:
1.
Executing a return from the main function. As we saw in Section 7.3, this is equivalent
to calling exit.
2.
Calling the exit function. This function is defined by ISO C and includes the calling of
all exit handlers that have been registered by calling atexit and closing all standard I/O
streams. Because ISO C does not deal with file descriptors, multiple processes (parents
and children), and job control, the definition of this function is incomplete for a UNIX
system.
3.
Calling the _exit or _Exit function. ISO C defines _Exit to provide a way for a process
to terminate without running exit handlers or signal handlers. Whether or not standard
I/O streams are flushed depends on the implementation. On UNIX systems, _Exit and
_exit are synonymous and do not flush standard I/O streams. The _exit function is
called by exit and handles the UNIX system-specific details; _exit is specified by
POSIX.1.
In most UNIX system implementations, exit(3) is a function in the standard C library,
whereas _exit(2) is a system call.
4.
Executing a return from the start routine of the last thread in the process. The return
value of the thread is not used as the return value of the process, however. When the
last thread returns from its start routine, the process exits with a termination status of
0.
5.
Calling the pthread_exit function from the last thread in the process. As with the
previous case, the exit status of the process in this situation is always 0, regardless of
the argument passed to pthread_exit. We'll say more about pthread_exit in Section
11.5.
The three forms of abnormal termination are as follows:
1.
Calling abort. This is a special case of the next item, as it generates the SIGABRT signal.
2.
When the process receives certain signals. (We describe signals in more detail in
Chapter 10). The signal can be generated by the process itself (for example, by calling
the abort function), by some other process, or by the kernel. Examples of signals
generated by the kernel include the process referencing a memory location not within
its address space or trying to divide by 0.
3.
The last thread responds to a cancellation request. By default, cancellation occurs in a
deferred manner: one thread requests that another be canceled, and sometime later,
the target thread terminates. We discuss cancellation requests in detail in Sections
11.5 and 12.7.
Regardless of how a process terminates, the same code in the kernel is eventually executed.
This kernel code closes all the open descriptors for the process, releases the memory that it
was using, and the like.
For any of the preceding cases, we want the terminating process to be able to notify its
parent how it terminated. For the three exit functions (exit, _exit, and _Exit), this is done by
passing an exit status as the argument to the function. In the case of an abnormal
termination, however, the kernel, not the process, generates a termination status to indicate
the reason for the abnormal termination. In any case, the parent of the process can obtain
the termination status from either the wait or the waitpid function (described in the next
section).
Note that we differentiate between the exit status, which is the argument to one of the three
exit functions or the return value from main, and the termination status. The exit status is
converted into a termination status by the kernel when _exit is finally called (recall Figure 7.2).
Figure 8.4 describes the various ways the parent can examine the termination status of a
child. If the child terminated normally, the parent can obtain the exit status of the child.
Figure 8.4. Macros to examine the termination status returned by wait and waitpid

Macro                  Description
WIFEXITED(status)      True if status was returned for a child that terminated normally.
                       In this case, we can execute WEXITSTATUS(status) to fetch the
                       low-order 8 bits of the argument that the child passed to exit,
                       _exit, or _Exit.
WIFSIGNALED(status)    True if status was returned for a child that terminated abnormally,
                       by receipt of a signal that it didn't catch. In this case, we can
                       execute WTERMSIG(status) to fetch the signal number that caused
                       the termination.
                       Additionally, some implementations (but not the Single UNIX
                       Specification) define the macro WCOREDUMP(status) that returns
                       true if a core file of the terminated process was generated.
WIFSTOPPED(status)     True if status was returned for a child that is currently stopped.
                       In this case, we can execute WSTOPSIG(status) to fetch the signal
                       number that caused the child to stop.
WIFCONTINUED(status)   True if status was returned for a child that has been continued
                       after a job control stop (XSI extension to POSIX.1; waitpid only).
When we described the fork function, it was obvious that the child has a parent process after
the call to fork. Now we're talking about returning a termination status to the parent. But
what happens if the parent terminates before the child? The answer is that the init process
becomes the parent process of any process whose parent terminates. We say that the
process has been inherited by init. What normally happens is that whenever a process
terminates, the kernel goes through all active processes to see whether the terminating
process is the parent of any process that still exists. If so, the parent process ID of the
surviving process is changed to be 1 (the process ID of init). This way, we're guaranteed
that every process has a parent.
Another condition we have to worry about is when a child terminates before its parent. If the
child completely disappeared, the parent wouldn't be able to fetch its termination status when
and if the parent were finally ready to check if the child had terminated. The kernel keeps a
small amount of information for every terminating process, so that the information is available
when the parent of the terminating process calls wait or waitpid. Minimally, this information
consists of the process ID, the termination status of the process, and the amount of CPU time
taken by the process. The kernel can discard all the memory used by the process and close
its open files. In UNIX System terminology, a process that has terminated, but whose parent
has not yet waited for it, is called a zombie. The ps(1) command prints the state of a zombie
process as Z. If we write a long-running program that forks many child processes, they
become zombies unless we wait for them and fetch their termination status.
Some systems provide ways to prevent the creation of zombies, as we describe in Section
10.7.
The final condition to consider is this: what happens when a process that has been inherited
by init terminates? Does it become a zombie? The answer is "no," because init is written so
that whenever one of its children terminates, init calls one of the wait functions to fetch the
termination status. By doing this, init prevents the system from being clogged by zombies.
When we say "one of init's children," we mean either a process that init generates directly
(such as getty, which we describe in Section 9.2) or a process whose parent has terminated
and has been subsequently inherited by init.
8.6. wait and waitpid Functions
When a process terminates, either normally or abnormally, the kernel notifies the parent by
sending the SIGCHLD signal to the parent. Because the termination of a child is an
asynchronous event (it can happen at any time while the parent is running), this signal is the
asynchronous notification from the kernel to the parent. The parent can choose to ignore this
signal, or it can provide a function that is called when the signal occurs: a signal handler. The
default action for this signal is to be ignored. We describe these options in Chapter 10. For
now, we need to be aware that a process that calls wait or waitpid can
• Block, if all of its children are still running
• Return immediately with the termination status of a child, if a child has terminated and
is waiting for its termination status to be fetched
• Return immediately with an error, if it doesn't have any child processes
If the process is calling wait because it received the SIGCHLD signal, we expect wait to return
immediately. But if we call it at any random point in time, it can block.
#include <sys/wait.h>
pid_t wait(int *statloc);
pid_t waitpid(pid_t pid, int *statloc, int options);
Both return: process ID if OK, 0 (see later), or -1 on error
The differences between these two functions are as follows.
• The wait function can block the caller until a child process terminates, whereas waitpid
has an option that prevents it from blocking.
• The waitpid function doesn't wait for the child that terminates first; it has a number of
options that control which process it waits for.
If a child has already terminated and is a zombie, wait returns immediately with that child's
status. Otherwise, it blocks the caller until a child terminates. If the caller blocks and has
multiple children, wait returns when one terminates. We can always tell which child
terminated, because the process ID is returned by the function.
For both functions, the argument statloc is a pointer to an integer. If this argument is not a
null pointer, the termination status of the terminated process is stored in the location pointed
to by the argument. If we don't care about the termination status, we simply pass a null
pointer as this argument.
Traditionally, the integer status that these two functions return has been defined by the
implementation, with certain bits indicating the exit status (for a normal return), other bits
indicating the signal number (for an abnormal return), one bit to indicate whether a core file
was generated, and so on. POSIX.1 specifies that the termination status is to be looked at
using various macros that are defined in <sys/wait.h>. Four mutually exclusive macros tell us
how the process terminated, and they all begin with WIF. Based on which of these four macros
is true, other macros are used to obtain the exit status, signal number, and the like. The four
mutually-exclusive macros are shown in Figure 8.4.
We'll discuss how a process can be stopped in Section 9.8 when we discuss job control.
Example
The function pr_exit in Figure 8.5 uses the macros from Figure 8.4 to print a description of
the termination status. We'll call this function from numerous programs in the text. Note that
this function handles the WCOREDUMP macro, if it is defined.
FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9 all support the WCOREDUMP macro.
The program shown in Figure 8.6 calls the pr_exit function, demonstrating the various values
for the termination status. If we run the program in Figure 8.6, we get
$ ./a.out
normal termination, exit status = 7
abnormal termination, signal number = 6 (core file generated)
abnormal termination, signal number = 8 (core file generated)
Unfortunately, there is no portable way to map the signal numbers from WTERMSIG into
descriptive names. (See Section 10.21 for one method.) We have to look at the <signal.h>
header to verify that SIGABRT has a value of 6 and that SIGFPE has a value of 8.
Figure 8.5. Print a description of the exit status
#include "apue.h"
#include <sys/wait.h>

void
pr_exit(int status)
{
    if (WIFEXITED(status))
        printf("normal termination, exit status = %d\n",
                WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("abnormal termination, signal number = %d%s\n",
                WTERMSIG(status),
#ifdef WCOREDUMP
                WCOREDUMP(status) ? " (core file generated)" : "");
#else
                "");
#endif
    else if (WIFSTOPPED(status))
        printf("child stopped, signal number = %d\n",
                WSTOPSIG(status));
}
Figure 8.6. Demonstrate various exit statuses
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    pid_t   pid;
    int     status;

    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid == 0)              /* child */
        exit(7);

    if (wait(&status) != pid)       /* wait for child */
        err_sys("wait error");
    pr_exit(status);                /* and print its status */

    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid == 0)              /* child */
        abort();                    /* generates SIGABRT */

    if (wait(&status) != pid)       /* wait for child */
        err_sys("wait error");
    pr_exit(status);                /* and print its status */

    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid == 0)              /* child */
        status /= 0;                /* divide by 0 generates SIGFPE */

    if (wait(&status) != pid)       /* wait for child */
        err_sys("wait error");
    pr_exit(status);                /* and print its status */

    exit(0);
}
As we mentioned, if we have more than one child, wait returns on termination of any of the
children. What if we want to wait for a specific process to terminate (assuming we know
which process ID we want to wait for)? In older versions of the UNIX System, we would have
to call wait and compare the returned process ID with the one we're interested in. If the
terminated process wasn't the one we wanted, we would have to save the process ID and
termination status and call wait again. We would need to continue doing this until the desired
process terminated. The next time we wanted to wait for a specific process, we would go
through the list of already terminated processes to see whether we had already waited for it,
and if not, call wait again. What we need is a function that waits for a specific process. This
functionality (and more) is provided by the POSIX.1 waitpid function.
The interpretation of the pid argument for waitpid depends on its value:
pid == -1   Waits for any child process. In this respect, waitpid is equivalent to wait.
pid > 0     Waits for the child whose process ID equals pid.
pid == 0    Waits for any child whose process group ID equals that of the calling process.
            (We discuss process groups in Section 9.4.)
pid < -1    Waits for any child whose process group ID equals the absolute value of pid.
The waitpid function returns the process ID of the child that terminated and stores the child's
termination status in the memory location pointed to by statloc. With wait, the only real error
is if the calling process has no children. (Another error return is possible, in case the function
call is interrupted by a signal. We'll discuss this in Chapter 10.) With waitpid, however, it's
also possible to get an error if the specified process or process group does not exist or is not
a child of the calling process.
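As a minimal sketch of waiting for one particular child, the hypothetical helper wait_for_child below calls waitpid with an explicit pid and an options value of 0, so it blocks until that child terminates. It also retries when the call fails with EINTR (the signal-interruption case mentioned above); any other failure, such as ECHILD for a process that is not our child, is passed back to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until the child identified by pid terminates.
 * Stores its termination status in *statusp and returns 0, or returns -1
 * with errno set (for example, ECHILD if pid is not our child).
 */
int
wait_for_child(pid_t pid, int *statusp)
{
    while (waitpid(pid, statusp, 0) < 0) {
        if (errno != EINTR)
            return -1;          /* a real error, not an interrupted call */
        /* interrupted by a signal handler: simply try again */
    }
    return 0;
}
```

Because waitpid names the child explicitly, a parent can reap its children in whatever order it likes, regardless of the order in which they terminate; with plain wait it would have to remember the statuses of the other children itself.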
The options argument lets us further control the operation of waitpid. This argument is either
0 or is constructed from the bitwise OR of the constants in Figure 8.7.
Figure 8.7. The options constants for waitpid
Constant      Description
WCONTINUED    If the implementation supports job control, the status of any child
              specified by pid that has been continued after being stopped, but whose
              status has not yet been reported, is returned (XSI extension to POSIX.1).
WNOHANG       The waitpid function will not block if a child specified by pid is not
              immediately available. In this case, the return value is 0.
WUNTRACED     If the implementation supports job control, the status of any child
              specified by pid that has stopped, and whose status has not been reported
              since it has stopped, is returned. The WIFSTOPPED macro determines whether
              the return value corresponds to a stopped child process.
Solaris supports one additional, but nonstandard, option constant. WNOWAIT has the system
keep the process whose termination status is returned by waitpid in a wait state, so that it
may be waited for again.
The waitpid function provides three features that aren't provided by the wait function.
1. The waitpid function lets us wait for one particular process, whereas the wait function
   returns the status of any terminated child. We'll return to this feature when we discuss
   the popen function.
2. The waitpid function provides a nonblocking version of wait. There are times when we
   want to fetch a child's status, but we don't want to block.
3. The waitpid function provides support for job control with the WUNTRACED and
   WCONTINUED options.
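Feature 2 can be sketched as follows, assuming we care only about termination (WUNTRACED is not passed, so stopped children are not reported). The hypothetical helper child_done calls waitpid with WNOHANG, so it always returns immediately; a return value of 0 from waitpid means the child exists but has not terminated yet.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: nonblocking check on one child.
 * Returns 1 if the child has terminated (status stored in *statusp),
 * 0 if it is still running, and -1 on error (for example, ECHILD).
 */
int
child_done(pid_t pid, int *statusp)
{
    pid_t r = waitpid(pid, statusp, WNOHANG);

    if (r == pid)
        return 1;       /* child terminated and has now been reaped */
    if (r == 0)
        return 0;       /* WNOHANG: child exists but has not exited yet */
    return -1;          /* error */
}
```

A program can call child_done between units of other work instead of blocking in wait; once it returns 1, the child has been reaped and will not linger as a zombie.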
Example
Recall our discussion in Section 8.5 about zombie processes. If we want to write a process so
that it forks a child but we don't want to wait for the child to complete and we don't want
the child to become a zombie until we terminate, the trick is to call fork twice. The program
in Figure 8.8 does this.
We call sleep in the second child to ensure that the first child terminates before printing the
parent process ID. After a fork, either the parent or the child can continue executing; we
never know which will resume execution first. If we didn't put the second child to sleep, and if
it resumed execution after the fork before its parent, the parent process ID that it printed
would be that of its parent, not process ID 1.
Executing the program in Figure 8.8 gives us
$ ./a.out
$ second child, parent pid = 1
Note that the shell prints its prompt when the original process terminates, which is before the
second child prints its parent process ID.
Figure 8.8. Avoid zombie processes by calling fork twice
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    pid_t   pid;

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {      /* first child */
        if ((pid = fork()) < 0)
            err_sys("fork error");
        else if (pid > 0)
            exit(0);    /* parent from second fork == first child */

        /*
         * We're the second child; our parent becomes init as soon
         * as our real parent calls exit() in the statement above.
         * Here's where we'd continue executing, knowing that when
         * we're done, init will reap our status.
         */
        sleep(2);
        printf("second child, parent pid = %d\n", getppid());
        exit(0);
    }

    if (waitpid(pid, NULL, 0) != pid)   /* wait for first child */
        err_sys("waitpid error");

    /*
     * We're the parent (the original process); we continue executing,
     * knowing that we're not the parent of the second child.
     */
    exit(0);
}
8.7. waitid Function
The XSI extension of the Single UNIX Specification includes an additional function to retrieve
the exit status of a process. The waitid function is similar to waitpid, but provides extra
flexibility.
#include <sys/wait.h>

int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);

Returns: 0 if OK, -1 on error
Like waitpid, waitid allows a process to specify which children to wait for. Instead of
encoding this information in a single argument combined with the process ID or process group
ID, two separate arguments are used. The id parameter is interpreted based on the value of
idtype. The types supported are summarized in Figure 8.9.
Figure 8.9. The idtype constants for waitid
Constant   Description
P_PID      Wait for a particular process: id contains the process ID of the child to wait for.
P_PGID     Wait for any child process in a particular process group: id contains the
           process group ID of the children to wait for.
P_ALL      Wait for any child process: id is ignored.
The options argument is a bitwise OR of the flags shown in Figure 8.10. These flags indicate
which state changes the caller is interested in.
Figure 8.10. The options constants for waitid
Constant      Description
WCONTINUED    Wait for a process that has previously stopped and has been continued, and
              whose status has not yet been reported.
WEXITED       Wait for processes that have exited.
WNOHANG       Return immediately instead of blocking if there is no child exit status
              available.
WNOWAIT       Don't destroy the child exit status. The child's exit status can be
              retrieved by a subsequent call to wait, waitid, or waitpid.
WSTOPPED      Wait for a process that has stopped and whose status has not yet been
              reported.
The infop argument is a pointer to a siginfo structure. This structure contains detailed
information about the signal that caused the state change in the child process.
The siginfo structure is discussed further in Section 10.14.
Of the four platforms covered in this book, only Solaris provides support for waitid.
8.8. wait3 and wait4 Functions
Most UNIX system implementations provide two additional functions: wait3 and wait4.
Historically, these two variants descend from the BSD branch of the UNIX System. The only
feature provided by these two functions that isn't provided by the wait, waitid, and waitpid
functions is an additional argument that allows the kernel to return a summary of the
resources used by the terminated process and all its child processes.
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/resource.h>

pid_t wait3(int *statloc, int options, struct rusage *rusage);

pid_t wait4(pid_t pid, int *statloc, int options, struct rusage *rusage);

Both return: process ID if OK, 0, or -1 on error
The resource information includes such statistics as the amount of user CPU time, the amount
of system CPU time, number of page faults, number of signals received, and the like. Refer to
the getrusage(2) manual page for additional details. (This resource information differs from the
resource limits we described in Section 7.11.) Figure 8.11 details the various arguments
supported by the wait functions.
Figure 8.11. Arguments supported by wait functions on various systems

Function  pid  options  rusage  POSIX.1  FreeBSD 5.2.1  Linux 2.4.22  Mac OS X 10.3  Solaris 9
wait                            •        •              •             •              •
waitid    •    •                XSI                                                  •
waitpid   •    •                •        •              •             •              •
wait3          •        •                •              •             •              •
wait4     •    •        •                •              •             •              •
The wait3 function was included in earlier versions of the Single UNIX Specification. In Version
2, wait3 was moved to the legacy category; wait3 was removed from the specification in
Version 3.
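A minimal sketch of the extra rusage argument, assuming a system that still provides wait4 (glibc declares it in <sys/wait.h> by default, although a strict standards-conformance mode may require a feature-test macro such as _DEFAULT_SOURCE). The hypothetical helper reap_with_usage reaps one child and fills in a struct rusage; the hypothetical cpu_usec then adds the user and system CPU times, which are struct timeval values.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap child pid with wait4, storing its termination
 * status in *statusp and its resource usage in *ru. Returns 0 or -1.
 */
int
reap_with_usage(pid_t pid, int *statusp, struct rusage *ru)
{
    return wait4(pid, statusp, 0, ru) == pid ? 0 : -1;
}

/* Total CPU time (user + system) recorded in *ru, in microseconds. */
long long
cpu_usec(const struct rusage *ru)
{
    return ((long long)ru->ru_utime.tv_sec + ru->ru_stime.tv_sec) * 1000000LL
         + ru->ru_utime.tv_usec + ru->ru_stime.tv_usec;
}
```

The rest of struct rusage (for example, page-fault counts) can be read the same way; getrusage(2) describes which fields a given system actually fills in.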
8.9. Race Conditions
For our purposes, a race condition occurs when multiple processes are trying to do something
with shared data and the final outcome depends on the order in which the processes run. The
fork function is a lively breeding ground for race conditions, if any of the logic after the fork
either explicitly or implicitly depends on whether the parent or child runs first after the fork. In
general, we cannot predict which process runs first. Even if we knew which process would run
first, what happens after that process starts running depends on the system load and the
kernel's scheduling algorithm.
We saw a potential race condition in the program in Figure 8.8 when the second child printed
its parent process ID. If the second child runs before the first child, then its parent process
will be the first child. But if the first child runs first and has enough time to exit, then the
parent process of the second child is init. Even calling sleep, as we did, guarantees nothing.
If the system was heavily loaded, the second child could resume after sleep returns, before
the first child has a chance to run. Problems of this form can be difficult to debug because
they tend to work "most of the time."
A process that wants to wait for a child to terminate must call one of the wait functions. If a
process wants to wait for its parent to terminate, as in the program from Figure 8.8, a loop of
the following form could be used:
while (getppid() != 1)
    sleep(1);
The problem with this type of loop, called polling, is that it wastes CPU time, as the caller is
awakened every second to test the condition.
To avoid race conditions and to avoid polling, some form of signaling is required between
multiple processes. Signals can be used, and we describe one way to do this in Section 10.16.
Various forms of interprocess communication (IPC) can also be used. We'll discuss some of
these in Chapters 15 and 17.
For a parent and child relationship, we often have the following scenario. After the fork, both
the parent and the child have something to do. For example, the parent could update a record
in a log file with the child's process ID, and the child might have to create a file for the
parent. In this example, we require that each process tell the other when it has finished its
initial set of operations, and that each wait for the other to complete, before heading off on
its own. The following code illustrates this scenario:
#include   "apue.h"

TELL_WAIT();    /* set things up for TELL_xxx & WAIT_xxx */

if ((pid = fork()) < 0) {
    err_sys("fork error");
} else if (pid == 0) {            /* child */

    /* child does whatever is necessary ... */

    TELL_PARENT(getppid());     /* tell parent we're done */
    WAIT_PARENT();              /* and wait for parent */

    /* and the child continues on its way ... */

    exit(0);
}

/* parent does whatever is necessary ... */

TELL_CHILD(pid);            /* tell child we're done */
WAIT_CHILD();               /* and wait for child */

/* and the parent continues on its way ... */

exit(0);
We assume that the header apue.h defines whatever variables are required. The five routines
TELL_WAIT, TELL_PARENT, TELL_CHILD, WAIT_PARENT, and WAIT_CHILD can be either macros or
functions.
We'll show various ways to implement these TELL and WAIT routines in later chapters: Section
10.16 shows an implementation using signals; Figure 15.7 shows an implementation using
pipes. Let's look at an example that uses these five routines.
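To make the calling pattern concrete, here is one possible sketch of the five routines built on two pipes, one for each direction. It is an illustration under our own assumptions, not the implementation from Section 10.16 or Figure 15.7: the pipe arrays pfd1 and pfd2 and the one-byte messages are our choices, and the routines return -1 on failure instead of calling err_sys. The TELL routines take a process ID only to match the call sites in the outline; a pipe does not need it.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid, used by the caller to reap the child */
#include <unistd.h>

/*
 * Hypothetical pipe-based TELL/WAIT routines (a sketch).
 * pfd1 carries "parent is done" to the child; pfd2 carries "child is done"
 * to the parent. Each message is a single byte. All return 0 or -1.
 */
static int pfd1[2], pfd2[2];

int
TELL_WAIT(void)             /* call once, before fork */
{
    if (pipe(pfd1) < 0 || pipe(pfd2) < 0)
        return -1;
    return 0;
}

int
TELL_PARENT(pid_t pid)      /* child: tell parent we're done */
{
    (void)pid;              /* unused: a pipe needs no process ID */
    return write(pfd2[1], "c", 1) == 1 ? 0 : -1;
}

int
WAIT_PARENT(void)           /* child: block until the parent writes */
{
    char c;

    return (read(pfd1[0], &c, 1) == 1 && c == 'p') ? 0 : -1;
}

int
TELL_CHILD(pid_t pid)       /* parent: tell child we're done */
{
    (void)pid;
    return write(pfd1[1], "p", 1) == 1 ? 0 : -1;
}

int
WAIT_CHILD(void)            /* parent: block until the child writes */
{
    char c;

    return (read(pfd2[0], &c, 1) == 1 && c == 'c') ? 0 : -1;
}
```

The parent calls TELL_WAIT before fork, and both sides then use the routines exactly as in the outline, with the parent reaping the child afterward via waitpid. Because read blocks until a byte arrives, neither process polls.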
Example
The program in Figure 8.12 outputs two strings: one from the child and one from the parent.
The program contains a race condition because the output depends on the order in which the
processes are run by the kernel and for how long each process runs.
We set the standard output unbuffered, so every character output generates a write. The
goal in this example is to allow the kernel to switch between the two processes as often as
possible to demonstrate the race condition. (If we didn't do this, we might never see the type
of output that follows. Not seeing the erroneous output doesn't mean that the race condition
doesn't exist; it simply means that we can't see it on this particular system.) The following
actual output shows how the results can vary:
$ ./a.out
ooutput from child
utput from parent
$ ./a.out
ooutput from child
utput from parent
$ ./a.out
output from child
output from parent
We need to change the program in Figure 8.12 to use the TELL and WAIT functions. The
program in Figure 8.13 does this. The lines preceded by a plus sign are new lines.
When we run this program, the output is as we expect; there is no intermixing of output from
the two processes.
In the program shown in Figure 8.13, the parent goes first. The child goes first if we change
the lines following the fork to be
} else if (pid == 0) {
    charatatime("output from child\n");
    TELL_PARENT(getppid());
} else {
    WAIT_CHILD();           /* child goes first */
    charatatime("output from parent\n");
}
Exercise 8.3 continues this example.
Figure 8.12. Program with a race condition
#include "apue.h"

static void charatatime(char *);

int
main(void)
{
    pid_t   pid;

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {
        charatatime("output from child\n");
    } else {
        charatatime("output from parent\n");
    }
    exit(0);
}

static void
charatatime(char *str)
{
    char    *ptr;
    int     c;

    setbuf(stdout, NULL);           /* set unbuffered */
    for (ptr = str; (c = *ptr++) != 0; )
        putc(c, stdout);
}
Figure 8.13. Modification of Figure 8.12 to avoid race condition
#include "apue.h"

static void charatatime(char *);

int
main(void)
{
    pid_t   pid;

+   TELL_WAIT();
+
    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {
+       WAIT_PARENT();      /* parent goes first */
        charatatime("output from child\n");
    } else {
        charatatime("output from parent\n");
+       TELL_CHILD(pid);
    }
    exit(0);
}

static void
charatatime(char *str)
{
    char    *ptr;
    int     c;

    setbuf(stdout, NULL);           /* set unbuffered */
    for (ptr = str; (c = *ptr++) != 0; )
        putc(c, stdout);
}
8.10. exec Functions
We mentioned in Section 8.3 that one use of the fork function is to create a new process
(the child) that then causes another program to be executed by calling one of the exec
functions. When a process calls one of the exec functions, that process is completely replaced
by the new program, and the new program starts executing at its main function. The process
ID does not change across an exec, because a new process is not created; exec merely
replaces the current process (its text, data, heap, and stack segments) with a brand-new
program from disk.
There are six different exec functions, but we'll often simply refer to "the exec function," which
means that we could use any of the six functions. These six functions round out the UNIX
System process control primitives. With fork, we can create new processes; and with the
exec functions, we can initiate new programs. The exit function and the wait functions handle
termination and waiting for termination. These are the only process control primitives we
need. We'll use these primitives in later sections to build additional functions, such as popen
and system.
#include <unistd.h>

int execl(const char *pathname, const char *arg0, ... /* (char *)0 */ );

int execv(const char *pathname, char *const argv []);

int execle(const char *pathname, const char *arg0, ...
           /* (char *)0, char *const envp[] */ );

int execve(const char *pathname, char *const argv[], char *const envp []);

int execlp(const char *filename, const char *arg0, ... /* (char *)0 */ );

int execvp(const char *filename, char *const argv []);

All six return: -1 on error, no return on success
The first difference in these functions is that the first four take a pathname argument,
whereas the last two take a filename argument. When a filename argument is specified:
• If filename contains a slash, it is taken as a pathname.
• Otherwise, the executable file is searched for in the directories specified by the PATH
  environment variable.
The PATH variable contains a list of directories, called path prefixes, that are separated by
colons. For example, the name=value environment string
PATH=/bin:/usr/bin:/usr/local/bin/:.
specifies four directories to search. The last path prefix specifies the current directory. (A
zero-length prefix also means the current directory. It can be specified as a colon at the
beginning of the value, two colons in a row, or a colon at the end of the value.)
There are security reasons for never including the current directory in the search path. See
Garfinkel et al. [2003].
If either execlp or execvp finds an executable file using one of the path prefixes, but the file
isn't a machine executable that was generated by the link editor, the function assumes that
the file is a shell script and tries to invoke /bin/sh with the filename as input to the shell.
The next difference concerns the passing of the argument list (l stands for list and v stands
for vector). The functions execl, execlp, and execle require each of the command-line
arguments to the new program to be specified as separate arguments. We mark the end of
the arguments with a null pointer. For the other three functions (execv, execvp, and execve),
we have to build an array of pointers to the arguments, and the address of this array is the
argument to these three functions.
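A short sketch of the vector form, using execv: the hypothetical helper run_vector forks, executes path with a caller-built argv array, and waits for the child. The array must end with a null pointer, just as the list form must. Exiting with status 127 when exec fails is a shell convention adopted here, not something exec requires.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, execv path with a caller-built argv[] vector,
 * and wait for the child. Returns the raw wait status, or -1 if fork or
 * waitpid fails. argv must be terminated by a null pointer.
 */
int
run_vector(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);      /* returns only on error */
        _exit(127);             /* shell convention for "could not exec" */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

With the list form, the same program would be started as execl("/bin/sh", "sh", "-c", "exit 3", (char *)0); the vector form is handier when the number of arguments is known only at run time.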
Before using ISO C prototypes, the normal way to show the command-line arguments for the
three functions execl, execle, and execlp was
char *arg0, char *arg1, ..., char *argn, (char *)0
This specifically shows that the final command-line argument is followed by a null pointer. If
this null pointer is specified by the constant 0, we must explicitly cast it to a pointer; if we
don't, it's interpreted as an integer argument. If the size of an integer is different from the
size of a char *, the actual arguments to the exec function will be wrong.
The final difference is the passing of the environment list to the new program. The two
functions whose names end in an e (execle and execve) allow us to pass a pointer to an array
of pointers to the environment strings. The other four functions, however, use the environ
variable in the calling process to copy the existing environment for the new program. (Recall
our discussion of the environment strings in Section 7.9 and Figure 7.8. We mentioned that if
the system supported such functions as setenv and putenv, we could change the current
environment and the environment of any subsequent child processes, but we couldn't affect
the environment of the parent process.) Normally, a process allows its environment to be
propagated to its children, but in some cases, a process wants to specify a certain
environment for a child. One example of the latter is the login program when a new login shell
is initiated. Normally, login creates a specific environment with only a few variables defined
and lets us, through the shell start-up file, add variables to the environment when we log in.
Before using ISO C prototypes, the arguments to execle were shown as
char *pathname, char *arg0, ..., char *argn, (char *)0, char *envp[]
This specifically shows that the final argument is the address of the array of character
pointers to the environment strings. The ISO C prototype doesn't show this, as all the
command-line arguments, the null pointer, and the envp pointer are shown with the ellipsis
notation (...).
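The e variants can be sketched the same way. In this minimal example, the hypothetical helper run_with_env uses execle to start /bin/sh with an environment array that we build ourselves; note the (char *)0 that ends the argument list, followed by envp. Because execle does not consult environ, a variable set only in the parent is invisible to the child.

```c
#include <stdlib.h>     /* setenv, used in the usage note below */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a shell command under an environment array we
 * supply, instead of the caller's environ. Returns the command's exit
 * status, or -1 if fork/waitpid fails or the child did not exit normally.
 * An exit status of 127 means execle itself failed.
 */
int
run_with_env(const char *cmd, char *const envp[])
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        /* argument list ends with (char *)0; envp follows it */
        execle("/bin/sh", "sh", "-c", cmd, (char *)0, envp);
        _exit(127);             /* reached only if execle fails */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, with char *env[] = { "GREETING=hello", NULL }, the command test "$GREETING" = hello exits with 0. Even after setenv("ONLY_IN_PARENT", "1", 1) in the parent, the command test -z "${ONLY_IN_PARENT+x}" also exits with 0, because env is the child's entire environment.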
The arguments for these six exec functions are difficult to remember. The letters in the
function names help somewhat. The letter p means that the function takes a filename
argument and uses the PATH environment variable to find the executable file. The letter l
means that the function takes a list of arguments and is mutually exclusive with the letter v,
which means that it takes an argv[] vector. Finally, the letter e means that the function
takes an envp[] array instead of using the current environment. Figure 8.14 shows the
differences among these six functions.
Figure 8.14. Differences among the six exec functions
Function           pathname  filename  Arg list  argv[]  environ  envp[]
execl              •                   •                 •
execlp                       •         •                 •
execle             •                   •                          •
execv              •                             •       •
execvp                       •                   •       •
execve             •                             •                •
(letter in name)             p         l         v                e
Every system has a limit on the total size of the argument list and the environment list. From
Section 2.5.2 and Figure 2.8, this limit is given by ARG_MAX. This value must be at least 4,096
bytes on a POSIX.1 system. We sometimes encounter this limit when using the shell's filename
expansion feature to generate a list of filenames. On some systems, for example, the
command
grep getrlimit /usr/share/man/*/*
can generate a shell error of the form
Argument list too long
Historically, the limit in older System V implementations was 5,120 bytes. Older BSD systems
had a limit of 20,480 bytes. The limit in current systems is much higher. (See the output from
the program in Figure 2.13, which is summarized in Figure 2.14.)
To get around the limitation in argument list size, we can use the xargs(1) command to break
up long argument lists. To look for all the occurrences of getrlimit in the man pages on our
system, we could use
find /usr/share/man -type f -print | xargs grep getrlimit
If the man pages on our system are compressed, however, we could try
find /usr/share/man -type f -print | xargs bzgrep getrlimit
We use the -type f option to the find command to restrict the list to contain only regular
files, because the grep commands can't search for patterns in directories, and we want to
avoid unnecessary error messages.
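Rather than discovering the limit through an "Argument list too long" error, a program can query it with sysconf(_SC_ARG_MAX) before calling exec. The sketch below rests on an assumption: how an implementation charges the argument and environment lists against ARG_MAX varies, so exec_list_size conservatively counts every string with its terminating null byte plus one pointer per entry. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: conservative estimate of the space exec needs for
 * argv and envp. Counts each string plus its null byte, one pointer per
 * entry, and the null pointer that terminates each list.
 */
size_t
exec_list_size(char *const argv[], char *const envp[])
{
    size_t total = 0;
    char *const *p;

    for (p = argv; *p != NULL; p++)
        total += strlen(*p) + 1 + sizeof(char *);
    for (p = envp; *p != NULL; p++)
        total += strlen(*p) + 1 + sizeof(char *);
    total += 2 * sizeof(char *);    /* the two terminating null pointers */
    return total;
}

/*
 * Hypothetical helper: 1 if the lists fit under sysconf(_SC_ARG_MAX),
 * 0 if they do not. If sysconf reports no definite limit (-1), assume 1.
 */
int
fits_arg_max(char *const argv[], char *const envp[])
{
    long max = sysconf(_SC_ARG_MAX);

    if (max < 0)
        return 1;
    return exec_list_size(argv, envp) <= (size_t)max;
}
```

This mirrors what xargs does internally: it keeps appending filenames to a command line until the next one would push it past a safe size, then runs the command and starts a new line.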
We've mentioned that the process ID does not change after an exec, but the new program
inherits additional properties from the calling process:
• Process ID and parent process ID
• Real user ID and real group ID
• Supplementary group IDs
• Process group ID
• Session ID
• Controlling terminal
• Time left until alarm clock
• Current working directory
• Root directory
• File mode creation mask
• File locks
• Process signal mask
• Pending signals
• Resource limits
• Values for tms_utime, tms_stime, tms_cutime, and tms_cstime
The handling of open files depends on the value of the close-on-exec flag for each descriptor.
Recall from Figure 3.6 and our mention of the FD_CLOEXEC flag in Section 3.14 that every open
descriptor in a process has a close-on-exec flag. If this flag is set, the descriptor is closed
across an exec. Otherwise, the descriptor is left open across the exec. The default is to leave
the descriptor open across the exec unless we specifically set the close-on-exec flag using
fcntl.
POSIX.1 specifically requires that open directory streams (recall the opendir function from
Section 4.21) be closed across an exec. This is normally done by the opendir function calling
fcntl to set the close-on-exec flag for the descriptor corresponding to the open directory
stream.
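Setting the flag explicitly can be sketched as follows. The hypothetical helper set_cloexec reads the current descriptor flags with F_GETFD and writes them back with FD_CLOEXEC added using F_SETFD, so any other descriptor flags are preserved.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: turn on the close-on-exec flag for fd, keeping any
 * other descriptor flags. Returns 0 on success, -1 on error.
 */
int
set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

Later revisions of POSIX.1 (the 2008 edition) also added an O_CLOEXEC flag for open, which sets the flag at the moment the descriptor is created instead of in a separate fcntl call.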
Note that the real user ID and the real group ID remain the same across the exec, but the
effective IDs can change, depending on the status of the set-user-ID and the set-group-ID
bits for the program file that is executed. If the set-user-ID bit is set for the new program,
the effective user ID becomes the owner ID of the program file. Otherwise, the effective user
ID is not changed (it's not set to the real user ID). The group ID is handled in the same way.
In many UNIX system implementations, only one of these six functions, execve, is a system call
within the kernel. The other five are just library functions that eventually invoke this system
call. We can illustrate the relationship among these six functions as shown in Figure 8.15.
Figure 8.15. Relationship of the six exec functions
In this arrangement, the library functions execlp and execvp process the PATH environment
variable, looking for the first path prefix that contains an executable file named filename.
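The PATH search can be sketched as follows. find_in_path is a hypothetical helper, and it simplifies what execlp and execvp do: it checks candidates with stat and access instead of attempting an exec on each one, and it does not fall back to running the file with /bin/sh. It does follow the rules given earlier: a filename containing a slash is used as is, prefixes are tried left to right, and a zero-length prefix means the current directory.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if p names a regular file that we are allowed to execute. */
static int
is_exec_file(const char *p)
{
    struct stat sb;

    return stat(p, &sb) == 0 && S_ISREG(sb.st_mode) && access(p, X_OK) == 0;
}

/*
 * Hypothetical helper: locate filename using a PATH-style string of
 * colon-separated prefixes. On success, stores the pathname in buf and
 * returns 0; returns -1 if no prefix yields an executable file.
 */
int
find_in_path(const char *filename, const char *path, char *buf, size_t bufsize)
{
    const char *p = path;

    if (strchr(filename, '/') != NULL) {    /* contains a slash: a pathname */
        int n = snprintf(buf, bufsize, "%s", filename);
        return (n >= 0 && (size_t)n < bufsize && is_exec_file(buf)) ? 0 : -1;
    }
    for (;;) {
        const char *colon = strchr(p, ':');
        size_t len = (colon != NULL) ? (size_t)(colon - p) : strlen(p);
        int n;

        if (len == 0)           /* zero-length prefix: current directory */
            n = snprintf(buf, bufsize, "./%s", filename);
        else
            n = snprintf(buf, bufsize, "%.*s/%s", (int)len, p, filename);
        if (n >= 0 && (size_t)n < bufsize && is_exec_file(buf))
            return 0;           /* first matching prefix wins */
        if (colon == NULL)
            return -1;          /* no more prefixes to try */
        p = colon + 1;
    }
}
```

For example, find_in_path("sh", "/nonexistent:/bin:/usr/bin", buf, sizeof buf) skips the missing directory and stores /bin/sh in buf.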
Example
The program in Figure 8.16 demonstrates the exec functions.
We first call execle, which requires a pathname and a specific environment. The next call is
to execlp, which uses a filename and passes the caller's environment to the new program. The
only reason the call to execlp works is that the directory /home/sar/bin is one of the current
path prefixes. Note also that we set the first argument, argv[0] in the new program, to be the
filename component of the pathname. Some shells set this argument to be the complete
pathname. This is a convention only. We can set argv[0] to any string we like. The login
command does this when it executes the shell. Before executing the shell, login adds a dash
as a prefix to argv[0] to indicate to the shell that it is being invoked as a login shell. A login
shell will execute the start-up profile commands, whereas a nonlogin shell will not.
The program echoall that is executed twice in the program in Figure 8.16 is shown in Figure
8.17. It is a trivial program that echoes all its command-line arguments and its entire
environment list.
When we execute the program from Figure 8.16, we get
$ ./a.out
argv[0]: echoall
argv[1]: myarg1
argv[2]: MY ARG2
USER=unknown
PATH=/tmp
$ argv[0]: echoall
argv[1]: only 1 arg
USER=sar
LOGNAME=sar
SHELL=/bin/bash
47 more lines that aren't shown
HOME=/home/sar
Note that the shell prompt appeared before the printing of argv[0] from the second exec. This
is because the parent did not wait for this child process to finish.
Figure 8.16. Example of exec functions
#include "apue.h"
#include <sys/wait.h>

char    *env_init[] = { "USER=unknown", "PATH=/tmp", NULL };

int
main(void)
{
    pid_t   pid;

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {  /* specify pathname, specify environment */
        if (execle("/home/sar/bin/echoall", "echoall", "myarg1",
                "MY ARG2", (char *)0, env_init) < 0)
            err_sys("execle error");
    }

    if (waitpid(pid, NULL, 0) < 0)
        err_sys("wait error");

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {  /* specify filename, inherit environment */
        if (execlp("echoall", "echoall", "only 1 arg", (char *)0) < 0)
            err_sys("execlp error");
    }

    exit(0);
}
Figure 8.17. Echo all command-line arguments and all environment strings
#include "apue.h"

int
main(int argc, char *argv[])
{
    int         i;
    char        **ptr;
    extern char **environ;

    for (i = 0; i < argc; i++)      /* echo all command-line args */
        printf("argv[%d]: %s\n", i, argv[i]);

    for (ptr = environ; *ptr != 0; ptr++)   /* and all env strings */
        printf("%s\n", *ptr);

    exit(0);
}
8.11. Changing User IDs and Group IDs
In the UNIX System, privileges, such as being able to change the system's notion of the
current date, and access control, such as being able to read or write a particular file, are
based on user and group IDs. When our programs need additional privileges or need to gain
access to resources that they currently aren't allowed to access, they need to change their
user or group ID to an ID that has the appropriate privilege or access. Similarly, when our
programs need to lower their privileges or prevent access to certain resources, they do so by
changing either their user ID or group ID to an ID without the privilege or the ability to
access the resource.
In general, we try to use the least-privilege model when we design our applications. Following
this model, our programs should use the least privilege necessary to accomplish any given
task. This reduces the likelihood that security can be compromised by a malicious user trying
to trick our programs into using their privileges in unintended ways.
We can set the real user ID and effective user ID with the setuid function. Similarly, we can
set the real group ID and the effective group ID with the setgid function.
#include <unistd.h>
int setuid(uid_t uid);
int setgid(gid_t gid);
Both return: 0 if OK, -1 on error
There are rules for who can change the IDs. Let's consider only the user ID for now.
(Everything we describe for the user ID also applies to the group ID.)
1. If the process has superuser privileges, the setuid function sets the real user ID,
   effective user ID, and saved set-user-ID to uid.
2. If the process does not have superuser privileges, but uid equals either the real user ID
   or the saved set-user-ID, setuid sets only the effective user ID to uid. The real user
   ID and the saved set-user-ID are not changed.
3. If neither of these two conditions is true, errno is set to EPERM, and -1 is returned.
Here, we are assuming that _POSIX_SAVED_IDS is true. If this feature isn't provided, then delete
all preceding references to the saved set-user-ID.
The saved IDs are a mandatory feature in the 2001 version of POSIX.1. They used to be
optional in older versions of POSIX. To see whether an implementation supports this feature,
an application can test for the constant _POSIX_SAVED_IDS at compile time or call sysconf with
the _SC_SAVED_IDS argument at runtime.
We can make a few statements about the three user IDs that the kernel maintains.
1. Only a superuser process can change the real user ID. Normally, the real user ID is set
   by the login(1) program when we log in and never changes. Because login is a
   superuser process, it sets all three user IDs when it calls setuid.
2. The effective user ID is set by the exec functions only if the set-user-ID bit is set for
   the program file. If the set-user-ID bit is not set, the exec functions leave the
   effective user ID as its current value. We can call setuid at any time to set the
   effective user ID to either the real user ID or the saved set-user-ID. Naturally, we
   can't set the effective user ID to any random value.
3. The saved set-user-ID is copied from the effective user ID by exec. If the file's
   set-user-ID bit is set, this copy is saved after exec stores the effective user ID from
   the file's user ID.
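A small sketch of these rules in practice, assuming a POSIX.1-2001 system. The hypothetical saved_ids_supported makes the runtime sysconf check described earlier, and the hypothetical drop_to_real_uid calls setuid(getuid()), the same call the man example uses. For an unprivileged process, rule 2 of the setuid rules applies, so only the effective user ID changes and the saved set-user-ID still permits switching back; a superuser process would change all three IDs.

```c
#include <sys/types.h>
#include <unistd.h>

/* Runtime check for saved set-user-IDs; returns 1 if supported. */
int
saved_ids_supported(void)
{
    return sysconf(_SC_SAVED_IDS) > 0;
}

/*
 * Hypothetical helper: make the effective user ID equal the real user ID.
 * Unprivileged process: only the effective ID changes (saved ID is kept).
 * Superuser process: real, effective, and saved IDs are all set.
 * Returns 0 if the effective user ID now equals the real user ID.
 */
int
drop_to_real_uid(void)
{
    if (setuid(getuid()) < 0)
        return -1;
    return geteuid() == getuid() ? 0 : -1;
}
```

For a program that is not set-user-ID, the effective and real user IDs are already equal, so drop_to_real_uid changes nothing and simply succeeds.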
Figure 8.18 summarizes the various ways these three user IDs can be changed.
Figure 8.18. Ways to change the three user IDs
                     exec                                          setuid(uid)
ID                   set-user-ID bit off   set-user-ID bit on      superuser     unprivileged user
real user ID         unchanged             unchanged               set to uid    unchanged
effective user ID    unchanged             set from user ID of     set to uid    set to uid
                                           program file
saved set-user ID    copied from           copied from             set to uid    unchanged
                     effective user ID     effective user ID
Note that we can obtain only the current value of the real user ID and the effective user ID
with the functions getuid and geteuid from Section 8.2. We can't obtain the current value of
the saved set-user-ID.
Example
To see the utility of the saved set-user-ID feature, let's examine the operation of a program
that uses it. We'll look at the man(1) program, which is used to display online manual pages.
The man program can be installed either set-user-ID or set-group-ID to a specific user or
group, usually one reserved for man itself. The man program can be made to read and possibly
overwrite files in locations that are chosen either through a configuration file (usually
/etc/man.config or /etc/manpath.config) or using a command-line option.
The man program might have to execute several other commands to process the files
containing the manual page to be displayed. To prevent being tricked into running the wrong
commands or overwriting the wrong files, the man command has to switch between two sets of
privileges: those of the user running the man command and those of the user that owns the
man executable file. The following steps take place.
1. Assuming that the man program file is owned by the user name man and has its
set-user-ID bit set, when we exec it, we have
     real user ID = our user ID
     effective user ID = man
     saved set-user-ID = man
2. The man program accesses the required configuration files and manual pages. These
files are owned by the user name man, but because the effective user ID is man, file
access is allowed.
3. Before man runs any command on our behalf, it calls setuid(getuid()). Because we are
not a superuser process, this changes only the effective user ID. We have
     real user ID = our user ID (unchanged)
     effective user ID = our user ID
     saved set-user-ID = man (unchanged)
Now the man process is running with our user ID as its effective user ID. This means
that we can access only the files to which we have normal access. We have no
additional permissions. It can safely execute any filter on our behalf.
4. When the filter is done, man calls setuid(euid), where euid is the numerical user ID for
the user name man. (This was saved by man by calling geteuid.) This call is allowed
because the argument to setuid equals the saved set-user-ID. (This is why we need
the saved set-user-ID.) Now we have
     real user ID = our user ID (unchanged)
     effective user ID = man
     saved set-user-ID = man (unchanged)
5. The man program can now operate on its files, as its effective user ID is man.
By using the saved set-user-ID in this fashion, we can use the extra privileges granted to us
by the set-user-ID of the program file at the beginning of the process and at the end of the
process. In between, however, the process runs with our normal permissions. If we weren't
able to switch back to the saved set-user-ID at the end, we might be tempted to retain the
extra permissions the whole time we were running (which is asking for trouble).
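The privilege bracketing in the steps above can be sketched in C as follows. This is not the man program's actual source; the helper names are hypothetical, and the sketch assumes the program is set-user-ID to an ordinary (non-root) user. For a program that is set-user-ID to root, setuid would instead change all three user IDs.

```c
#include <sys/types.h>
#include <unistd.h>

/* Effective user ID at startup (hypothetical global for this sketch). */
static uid_t privileged_euid;

/* Step 1: remember the privileged ID, as man does by calling geteuid. */
void
remember_privileged_euid(void)
{
    privileged_euid = geteuid();
}

/* Step 3: switch to our normal permissions; for a non-superuser
 * process this changes only the effective user ID. */
int
drop_to_real_user(void)
{
    return setuid(getuid());
}

/* Step 4: allowed because privileged_euid equals the saved set-user-ID. */
int
regain_privileges(void)
{
    return setuid(privileged_euid);
}
```

A caller would invoke remember_privileged_euid once at startup, call drop_to_real_user before running any filter, and call regain_privileges afterward.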
Let's look at what happens if man spawns a shell for us while it is running. (The shell is
spawned using fork and exec.) Because the real user ID and the effective user ID are both our
normal user ID (step 3), the shell has no extra permissions. The shell can't access the saved
set-user-ID that is set to man while man is running, because the saved set-user-ID for the shell
is copied from the effective user ID by exec. So in the child process that does the exec, all
three user IDs are our normal user ID.
Our description of how man uses the setuid function is not correct if the program is
set-user-ID to root, because a call to setuid with superuser privileges sets all three user IDs.
For the example to work as described, we need setuid to set only the effective user ID.
setreuid and setregid Functions
Historically, BSD supported the swapping of the real user ID and the effective user ID with
the setreuid function.
#include <unistd.h>
int setreuid(uid_t ruid, uid_t euid);
int setregid(gid_t rgid, gid_t egid);
Both return: 0 if OK, -1 on error
We can supply a value of -1 for any of the arguments to indicate that the corresponding ID
should remain unchanged.
The rule is simple: an unprivileged user can always swap between the real user ID and the
effective user ID. This allows a set-user-ID program to swap to the user's normal permissions
and swap back again later for set-user-ID operations. When the saved set-user-ID feature
was introduced with POSIX.1, the rule was enhanced to also allow an unprivileged user to set
its effective user ID to its saved set-user-ID.
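A minimal sketch of the swap rule, using only getuid, geteuid, and setreuid; swap_real_and_effective is a hypothetical name. Calling it a second time returns the process to the IDs it started with.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Swap the real and effective user IDs in the BSD style.
 * Passing (uid_t)-1 for either argument of setreuid would leave
 * that ID unchanged instead.
 */
int
swap_real_and_effective(void)
{
    uid_t ruid = getuid();
    uid_t euid = geteuid();

    return setreuid(euid, ruid);    /* new real = old effective, and vice versa */
}
```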
Both setreuid and setregid are XSI extensions in the Single UNIX Specification. As such, all
UNIX System implementations are expected to provide support for them.
4.3BSD didn't have the saved set-user-ID feature described earlier. It used setreuid and
setregid instead. This allowed an unprivileged user to swap back and forth between the two
values. Be aware, however, that when programs that used this feature spawned a shell, they
had to set the real user ID to the normal user ID before the exec. If they didn't do this, the
real user ID could be privileged (from the swap done by setreuid) and the shell process could
call setreuid to swap the two and assume the permissions of the more privileged user. As a
defensive programming measure to solve this problem, programs set both the real user ID and
the effective user ID to the normal user ID before the call to exec in the child.
seteuid and setegid Functions
POSIX.1 includes the two functions seteuid and setegid. These functions are similar to setuid
and setgid, but only the effective user ID or effective group ID is changed.
#include <unistd.h>
int seteuid(uid_t uid);
int setegid(gid_t gid);
Both return: 0 if OK, -1 on error
An unprivileged user can set its effective user ID to either its real user ID or its saved
set-user-ID. For a privileged user, only the effective user ID is set to uid. (This differs from
the setuid function, which changes all three user IDs.)
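Because seteuid changes only the effective user ID, even for a superuser, it supports the same bracketing as the man example while also working for programs that are set-user-ID to root. The sketch below assumes the effective user ID has not changed since exec, so it still equals the saved set-user-ID; run_unprivileged is a hypothetical helper.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Run task (if non-NULL) with the real user ID as the effective user
 * ID, then restore the previous effective user ID.
 * Returns 0 on success, -1 if either seteuid call fails.
 */
int
run_unprivileged(void (*task)(void))
{
    uid_t saved = geteuid();    /* assumed equal to the saved set-user-ID */

    if (seteuid(getuid()) < 0)
        return -1;
    if (task != NULL)
        task();                 /* runs with our normal permissions */
    return seteuid(saved);      /* allowed: matches the saved set-user-ID */
}
```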
Figure 8.19 summarizes all the functions that we've described here that modify the three user
IDs.
Figure 8.19. Summary of all the functions that set the various user IDs
Group IDs
Everything that we've said so far in this section also applies in a similar fashion to group IDs.
The supplementary group IDs are not affected by setgid, setregid, or setegid.
8.12. Interpreter Files
All contemporary UNIX systems support interpreter files. These files are text files that begin
with a line of the form
#! pathname [ optional-argument ]
The space between the exclamation point and the pathname is optional. The most common of
these interpreter files begin with the line
#!/bin/sh
The pathname is normally an absolute pathname, since no special operations are performed on
it (i.e., PATH is not used). The recognition of these files is done within the kernel as part of
processing the exec system call. The actual file that gets executed by the kernel is not the
interpreter file, but the file specified by the pathname on the first line of the interpreter file.
Be sure to differentiate between the interpreter file (a text file that begins with #!) and the
interpreter, which is specified by the pathname on the first line of the interpreter file.
Be aware that systems place a size limit on the first line of an interpreter file. This limit
includes the #!, the pathname, the optional argument, the terminating newline, and any
spaces.
On FreeBSD 5.2.1, this limit is 128 bytes. Mac OS X 10.3 extends this limit to 512 bytes. Linux
2.4.22 supports a limit of 127 bytes, whereas Solaris 9 places the limit at 1,023 bytes.
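To make the #! line format concrete, here is a user-space sketch that splits such a first line into the interpreter pathname and the optional argument. It only imitates what the kernel does; parse_interp_line is a hypothetical name, it treats everything after the pathname as a single optional argument, and it does not enforce any of the length limits listed above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "#! pathname [ optional-argument ]" in place.
 * On success, *pathp points to the pathname and *argp to the optional
 * argument (or NULL if there is none); returns 0.  Returns -1 if the
 * line doesn't start with "#!" or names no interpreter.
 */
int
parse_interp_line(char *line, char **pathp, char **argp)
{
    char *p, *end;

    if (strncmp(line, "#!", 2) != 0)
        return -1;
    p = line + 2;
    while (*p == ' ' || *p == '\t')     /* the space after #! is optional */
        p++;
    if (*p == '\0' || *p == '\n')
        return -1;
    *pathp = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '\n') {
        *p = '\0';                      /* no optional argument */
        *argp = NULL;
        return 0;
    }
    *p++ = '\0';                        /* terminate the pathname */
    while (*p == ' ' || *p == '\t')
        p++;
    end = p + strcspn(p, "\n");         /* argument runs to end of line */
    while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
        end--;                          /* drop trailing blanks */
    *end = '\0';
    *argp = (*p != '\0') ? p : NULL;
    return 0;
}
```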
Example
Let's look at an example to see what the kernel does with the arguments to the exec function
when the file being executed is an interpreter file, and with the optional argument on the first
line of the interpreter file. The program in Figure 8.20 execs an interpreter file.
The following shows the contents of the one-line interpreter file that is executed and the
result from running the program in Figure 8.20:
$ cat /home/sar/bin/testinterp
#!/home/sar/bin/echoarg foo
$ ./a.out
argv[0]: /home/sar/bin/echoarg
argv[1]: foo
argv[2]: /home/sar/bin/testinterp
argv[3]: myarg1
argv[4]: MY ARG2
The program echoarg (the interpreter) just echoes each of its command-line arguments. (This
is the program from Figure 7.4.) Note that when the kernel execs the interpreter
(/home/sar/bin/echoarg), argv[0] is the pathname of the interpreter, argv[1] is the optional
argument from the interpreter file, and the remaining arguments are the pathname
(/home/sar/bin/testinterp) and the second and third arguments from the call to execl in the
program shown in Figure 8.20 (myarg1 and MY ARG2). Both argv[1] and argv[2] from the call to
execl have been shifted right two positions. Note that the kernel takes the pathname from
the execl call instead of the first argument (testinterp), on the assumption that the
pathname might contain more information than the first argument.
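The argument rewriting just described can be modeled in user space as follows. This is a sketch, not kernel code; build_interp_argv is a hypothetical name. It places the interpreter pathname, the optional argument (if any), and the pathname given to exec in front of the caller's arguments, dropping the caller's argv[0].

```c
#include <stdlib.h>

/*
 * Build the argument list an interpreter receives:
 *     interp [interp_arg] pathname argv[1] ... argv[n-1]
 * interp_arg may be NULL.  Returns a malloc'd, NULL-terminated array
 * (free the array, not the strings), or NULL if memory runs out.
 */
char **
build_interp_argv(char *interp, char *interp_arg, char *pathname,
                  char *const argv[])
{
    size_t n = 0, i, j = 0;
    char **nargv;

    while (argv[n] != NULL)     /* count the caller's arguments */
        n++;
    /* room for interp, interp_arg, pathname, argv[1..n-1], and NULL */
    if ((nargv = malloc((n + 4) * sizeof(char *))) == NULL)
        return NULL;
    nargv[j++] = interp;
    if (interp_arg != NULL)
        nargv[j++] = interp_arg;
    nargv[j++] = pathname;
    for (i = 1; i < n; i++)     /* the caller's argv[0] is replaced */
        nargv[j++] = argv[i];
    nargv[j] = NULL;
    return nargv;
}
```

Given the interpreter file and execl call from Figure 8.20, this produces the same five arguments that echoarg printed above.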
Figure 8.20. A program that execs an interpreter file
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    pid_t   pid;

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {              /* child */
        if (execl("/home/sar/bin/testinterp",
                  "testinterp", "myarg1", "MY ARG2", (char *)0) < 0)
            err_sys("execl error");
    }
    if (waitpid(pid, NULL, 0) < 0)      /* parent */
        err_sys("waitpid error");
    exit(0);
}
Example
A common use for the optional argument following the interpreter pathname is to specify the
-f option for programs that support this option. For example, an awk(1) program can be
executed as
awk -f myfile
which tells awk to read the awk program from the file myfile.
Systems derived from UNIX System V often include two versions of the awk language. On
these systems, awk is often called "old awk" and corresponds to the original version distributed
with Version 7. In contrast, nawk (new awk) contains numerous enhancements and corresponds
to the language described in Aho, Kernighan, and Weinberger [1988]. This newer version
provides access to the command-line arguments, which we need for the example that follows.
Solaris 9 provides both versions.
The awk program is one of the utilities included by POSIX in its 1003.2 standard, which is now
part of the base POSIX.1 specification in the Single UNIX Specification. This utility is also
based on the language described in Aho, Kernighan, and Weinberger [1988].
The version of awk in Mac OS X 10.3 is based on the Bell Laboratories version that Lucent has
placed in the public domain. FreeBSD 5.2.1 and Linux 2.4.22 ship with GNU awk, called gawk,
which is linked to the name awk. The gawk version conforms to the POSIX standard, but also
includes other extensions. Because they are more up-to-date, the version of awk from Bell
Laboratories and gawk are preferred to either nawk or old awk. (The version of awk from Bell
Laboratories is available at http://cm.bell-labs.com/cm/cs/awkbook/index.html.)
Using the -f option with an interpreter file lets us write
#!/bin/awk -f
(awk program follows in the interpreter file)
For example, Figure 8.21 shows /usr/local/bin/awkexample (an interpreter file).
If one of the path prefixes in our PATH is /usr/local/bin, we can execute the program in
Figure 8.21 (assuming that we've turned on the execute bit for the file) as
$ awkexample file1 FILENAME2 f3
ARGV[0] = awk
ARGV[1] = file1
ARGV[2] = FILENAME2
ARGV[3] = f3
When /bin/awk is executed, its command-line arguments are
/bin/awk -f /usr/local/bin/awkexample file1 FILENAME2 f3
The pathname of the interpreter file (/usr/local/bin/awkexample) is passed to the interpreter.
The filename portion of this pathname (what we typed to the shell) isn't adequate, because
the interpreter (/bin/awk in this example) can't be expected to use the PATH variable to locate
files. When it reads the interpreter file, awk ignores the first line, since the pound sign is awk's
comment character.
We can verify these command-line arguments with the following commands:
$ /bin/su                               become superuser
Password:                               enter superuser password
# mv /bin/awk /bin/awk.save             save the original program
# cp /home/sar/bin/echoarg /bin/awk     and replace it temporarily
# suspend                               suspend the superuser shell using job control
[1] + Stopped          /bin/su
$ awkexample file1 FILENAME2 f3
argv[0]: /bin/awk
argv[1]: -f
argv[2]: /usr/local/bin/awkexample
argv[3]: file1
argv[4]: FILENAME2
argv[5]: f3
$ fg                                    resume superuser shell using job control
/bin/su
# mv /bin/awk.save /bin/awk             restore the original program
# exit                                  and exit the superuser shell
In this example, the -f option for the interpreter is required. As we said, this tells awk where to
look for the awk program. If we remove the -f option from the interpreter file, an error
message usually results when we try to run it. The exact text of the message varies,
depending on where the interpreter file is stored and whether the remaining arguments
represent existing files. This is because the command-line arguments in this case are
/bin/awk /usr/local/bin/awkexample file1 FILENAME2 f3
and awk is trying to interpret the string /usr/local/bin/awkexample as an awk program. If we
couldn't pass at least a single optional argument to the interpreter (-f in this case), these
interpreter files would be usable only with the shells.
Figure 8.21. An awk program as an interpreter file
#!/bin/awk -f
BEGIN {
    for (i = 0; i < ARGC; i++)
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    exit
}
Are interpreter files required? Not really. They provide an efficiency gain for the user at some
expense in the kernel (since it's the kernel that recognizes these files). Interpreter files are
useful for the following reasons.
1. They hide that certain programs are scripts in some other language. For example, to
execute the program in Figure 8.21, we just say
     awkexample optional-arguments
instead of needing to know that the program is really an awk script that we would
otherwise have to execute as
     awk -f awkexample optional-arguments
2. Interpreter scripts provide an efficiency gain. Consider the previous example again. We
could still hide that the program is an awk script, by wrapping it in a shell script:
     awk 'BEGIN {
         for (i = 0; i < ARGC; i++)
             printf "ARGV[%d] = %s\n", i, ARGV[i]
         exit
     }' $*
The problem with this solution is that more work is required. First, the shell reads the
command and tries to execlp the filename. Because the shell script is an executable
file, but isn't a machine executable, an error is returned, and execlp assumes that the
file is a shell script (which it is). Then /bin/sh is executed with the pathname of the
shell script as its argument. The shell correctly runs our script, but to run the awk
program, the shell does a fork, exec, and wait. Thus, there is more overhead in
replacing an interpreter script with a shell script.
3. Interpreter scripts let us write shell scripts using shells other than /bin/sh. When it
finds an executable file that isn't a machine executable, execlp has to choose a shell to
invoke, and it always uses /bin/sh. Using an interpreter script, however, we can simply
write
     #!/bin/csh
     (C shell script follows in the interpreter file)
Again, we could wrap this all in a /bin/sh script (that invokes the C shell), as we
described earlier, but more overhead is required.
None of this would work as we've shown if the three shells and awk didn't use the pound sign
as their comment character.
8.13. system Function
It is convenient to execute a command string from within a program. For example, assume
that we want to put a time-and-date stamp into a certain file. We could use the functions we
describe in Section 6.10 to do this: call time to get the current calendar time, then call
localtime to convert it to a broken-down time, then call strftime to format the result,
and finally write the result to the file. It is much easier, however, to say
system("date > file");
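For comparison, here is a sketch of the longer route through time, localtime, and strftime. The function name is hypothetical, and the strftime format only approximates the default output of date(1).

```c
#include <stdio.h>
#include <time.h>

/*
 * Write a date(1)-style time-and-date stamp to fp.
 * Returns 0 on success, -1 on failure.
 */
int
write_timestamp(FILE *fp)
{
    time_t     t;
    struct tm *tmp;
    char       buf[64];

    if ((t = time(NULL)) == (time_t)-1)         /* current calendar time */
        return -1;
    if ((tmp = localtime(&t)) == NULL)          /* broken-down time */
        return -1;
    if (strftime(buf, sizeof(buf), "%a %b %e %H:%M:%S %Z %Y", tmp) == 0)
        return -1;                              /* buffer too small */
    return fprintf(fp, "%s\n", buf) < 0 ? -1 : 0;
}
```

Even this small function needs three library calls and three error checks, which is the convenience argument for the one-line call to system.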
ISO C defines the system function, but its operation is strongly system dependent. POSIX.1
includes the system interface, expanding on the ISO C definition to describe its behavior in a
POSIX environment.
#include <stdlib.h>
int system(const char *cmdstring);
Returns: (see below)
If cmdstring is a null pointer, system returns nonzero only if a command processor is available.
This feature determines whether the system function is supported on a given operating
system. Under the UNIX System, system is always available.
Because system is implemented by calling fork, exec, and waitpid, there are three types of
return values.
1. If either the fork fails or waitpid returns an error other than EINTR, system returns -1
with errno set to indicate the error.
2. If the exec fails, implying that the shell can't be executed, the return value is as if the
shell had executed exit(127).
3. Otherwise, all three functions (fork, exec, and waitpid) succeed, and the return value
from system is the termination status of the shell, in the format specified for waitpid.
Some older implementations of system returned an error (EINTR) if waitpid was
interrupted by a caught signal. Because there is no cleanup strategy that an
application can use to recover from this type of error, POSIX later added the
requirement that system not return an error in this case. (We discuss interrupted
system calls in Section 10.5.)
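A caller can sort the return value into the three cases with the waitpid macros. This is a sketch under the assumptions above; command_exit_status is a hypothetical name, and a result of 127 remains ambiguous because the shell also uses that exit status for a command it can't find.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run cmd with system and return its exit status (0-255) on normal
 * termination, or -1 if system itself failed or the shell was
 * terminated by a signal.
 */
int
command_exit_status(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)               /* fork or waitpid failed; errno is set */
        return -1;
    if (WIFEXITED(status))          /* includes the exec-failure case (127) */
        return WEXITSTATUS(status);
    return -1;                      /* for example, WIFSIGNALED(status) */
}
```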
Figure 8.22 shows an implementation of the system function. The one feature that it doesn't
handle is signals. We'll update this function with signal handling in Section 10.18.
Figure 8.22. The system function, without signal handling
#include    <sys/wait.h>
#include    <errno.h>
#include    <unistd.h>

int
system(const char *cmdstring)   /* version without signal handling */
{
    pid_t   pid;
    int     status;

    if (cmdstring == NULL)
        return(1);      /* always a command processor with UNIX */

    if ((pid = fork()) < 0) {
        status = -1;    /* probably out of processes */
    } else if (pid == 0) {              /* child */
        execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
        _exit(127);     /* execl error */
    } else {                            /* parent */
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) {
                status = -1; /* error other than EINTR from waitpid() */
                break;
            }
        }
    }

    return(status);
}
The shell's -c option tells it to take the next command-line argument (cmdstring, in this case) as
its command input instead of reading from standard input or from a given file. The shell parses
this null-terminated C string and breaks it up into separate command-line arguments for the
command. The actual command string that is passed to the shell can contain any valid shell
commands. For example, input and output redirection using < and > can be used.
If we didn't use the shell to execute the command, but tried to execute the command ourself,
it would be more difficult. First, we would want to call execlp instead of execl, to use the PATH
variable, like the shell. We would also have to break up the null-terminated C string into
separate command-line arguments for the call to execlp. Finally, we wouldn't be able to use
any of the shell metacharacters.
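The extra work just listed can be sketched as follows: split the string on blanks, fork, and call execvp so that PATH is searched. This is a hypothetical helper for illustration (run_without_shell and MAXARGS are not from any library), and it deliberately handles no quoting or metacharacters.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXARGS 32      /* arbitrary limit chosen for this sketch */

/*
 * Run cmd without a shell.  cmd is modified in place, and words
 * beyond MAXARGS - 1 are silently ignored.  Returns the termination
 * status in waitpid format, or -1 on error.
 */
int
run_without_shell(char *cmd)
{
    char   *argv[MAXARGS];
    char   *tok;
    int     argc = 0, status;
    pid_t   pid;

    for (tok = strtok(cmd, " \t"); tok != NULL && argc < MAXARGS - 1;
         tok = strtok(NULL, " \t"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    if (argc == 0)
        return -1;              /* empty command */

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {             /* child */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed: same convention as system */
    }
    while (waitpid(pid, &status, 0) < 0) {      /* parent */
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Passing it a writable copy of "date > file" would hand ">" and "file" to date as ordinary arguments instead of redirecting its output, which is exactly what the shell does for us when we use system.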
Note that we call _exit instead of exit. We do this to prevent any standard I/O buffers,
which would have been copied from the parent to the child across the fork, from being
flushed in the child.
We can test this version of system with the program shown in Figure 8.23. (The pr_exit
function was defined in Figure 8.5.)
Figure 8.23. Calling the system function
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    int     status;

    if ((status = system("date")) < 0)
        err_sys("system() error");
    pr_exit(status);

    if ((status = system("nosuchcommand")) < 0)
        err_sys("system() error");
    pr_exit(status);

    if ((status = system("who; exit 44")) < 0)
        err_sys("system() error");
    pr_exit(status);

    exit(0);
}
Running the program in Figure 8.23 gives us
$ ./a.out
Sun Mar 21 18:41:32 EST 2004
normal termination, exit status = 0            for date
sh: nosuchcommand: command not found
normal termination, exit status = 127          for nosuchcommand
sar      :0           Mar 18 19:45
sar      pts/0        Mar 18 19:45 (:0)
sar      pts/1        Mar 18 19:45 (:0)
sar      pts/2        Mar 18 19:45 (:0)
sar      pts/3        Mar 18 19:45 (:0)
normal termination, exit status = 44           for exit
The advantage in using system, instead of using fork and exec directly, is that system does all
the required error handling and (in our next version of this function in Section 10.18) all the
required signal handling.
Earlier systems, including SVR3.2 and 4.3BSD, didn't have the waitpid function available.
Instead, the parent waited for the child, using a statement such as
while ((lastpid = wait(&status)) != pid && lastpid != -1)
;
A problem occurs if the process that calls system has spawned its own children before calling
system. Because the while statement above keeps looping until the child that was generated
by system terminates, if any children of the process terminate before the process identified by
pid, then the process ID and termination status of these other children are discarded by the
while statement. Indeed, this inability to wait for a specific child is one of the reasons given in
the POSIX.1 Rationale for including the waitpid function. We'll see in Section 15.3 that the
same problem occurs with the popen and pclose functions, if the system doesn't provide a
waitpid function.
Set-User-ID Programs
What happens if we call system from a set-user-ID program? Doing so is a security hole and
should never be done. Figure 8.24 shows a simple program that just calls system for its
command-line argument.
Figure 8.24. Execute the command-line argument using system
#include "apue.h"

int
main(int argc, char *argv[])
{
    int     status;

    if (argc < 2)
        err_quit("command-line argument required");

    if ((status = system(argv[1])) < 0)
        err_sys("system() error");
    pr_exit(status);

    exit(0);
}
We'll compile this program into the executable file tsys.
Figure 8.25 shows another simple program that prints its real and effective user IDs.
Figure 8.25. Print real and effective user IDs
#include "apue.h"
int
main(void)
{
printf("real uid = %d, effective uid = %d\n", getuid(), geteuid());
exit(0);
}
We'll compile this program into the executable file printuids. Running both programs gives us
the following:
$ tsys printuids                      normal execution, no special privileges
real uid = 205, effective uid = 205
normal termination, exit status = 0
$ su                                  become superuser
Password:                             enter superuser password
# chown root tsys                     change owner
# chmod u+s tsys                      make set-user-ID
# ls -l tsys                          verify file's permissions and owner
-rwsrwxr-x  1 root   16361 Mar 16 16:59 tsys
# exit                                leave superuser shell
$ tsys printuids
real uid = 205, effective uid = 0     oops, this is a security hole
normal termination, exit status = 0
The superuser permissions that we gave the tsys program are retained across the fork and
exec that are done by system.
When /bin/sh is bash version 2, the previous example doesn't work, because bash will reset
the effective user ID to the real user ID when they don't match.
A process that is running with special permissions (either set-user-ID or set-group-ID) and
wants to spawn another process should use fork and exec directly, being certain to change
back to normal permissions after the fork and before calling exec. The system function should
never be used from a set-user-ID or a set-group-ID program.
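A minimal sketch of that recommendation follows; spawn_unprivileged is a hypothetical name. The child resets its group ID first, because once setuid has given up superuser privileges the process might no longer be allowed to change it. The sketch calls execv directly (no shell, no PATH search) and leaves supplementary group IDs alone, since setgid doesn't affect them.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, give up set-user-ID/set-group-ID privileges in the child,
 * then exec path with argv.  Returns the child's termination status
 * in waitpid format, or -1 on error.
 */
int
spawn_unprivileged(const char *path, char *const argv[])
{
    pid_t pid;
    int   status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                                 /* child */
        if (setgid(getgid()) < 0 || setuid(getuid()) < 0)
            _exit(127);                 /* refuse to run with extra privileges */
        /* exec copies the now-normal effective IDs into the saved IDs,
           so the new program can't switch back (unless path is itself
           set-user-ID or set-group-ID). */
        execv(path, argv);
        _exit(127);                     /* execv failed */
    }
    while (waitpid(pid, &status, 0) < 0) {          /* parent */
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```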
One reason for this admonition is that system invokes the shell to parse the command string,
and the shell uses its IFS variable as the input field separator. Older versions of the shell didn't
reset this variable to a normal set of characters when invoked. This allowed a malicious user
to set IFS before system was called, causing system to execute a different program.
8.14. Process Accounting
Most UNIX systems provide an option to do process accounting. When enabled, the kernel
writes an accounting record each time a process terminates. These accounting records are
typically a small amount of binary data with the name of the command, the amount of CPU
time used, the user ID and group ID, the starting time, and so on. We'll take a closer look at
these accounting records in this section, as it gives us a chance to look at processes again
and to use the fread function from Section 5.9.
Process accounting is not specified by any of the standards. Thus, all the implementations
have annoying differences. For example, the I/O counts maintained on Solaris 9 are in units of
bytes, whereas FreeBSD 5.2.1 and Mac OS X 10.3 maintain units of blocks, although there is
no distinction between different block sizes, making the counter effectively useless. Linux
2.4.22, on the other hand, doesn't try to maintain I/O statistics at all.
Each implementation also has its own set of administrative commands to process raw
accounting data. For example, Solaris provides runacct(1m) and acctcom(1), whereas FreeBSD
provides the sa(8) command to process and summarize the raw accounting data.
A function we haven't described (acct) enables and disables process accounting. The only use
of this function is from the accton(8) command (which happens to be one of the few
similarities among platforms). A superuser executes accton with a pathname argument to
enable accounting. The accounting records are written to the specified file, which is usually
/var/account/acct on FreeBSD and Mac OS X, /var/account/pacct on Linux, and /var/adm/pacct
on Solaris. Accounting is turned off by executing accton without any arguments.
The structure of the accounting records is defined in the header <sys/acct.h> and looks
something like
typedef  u_short comp_t;   /* 3-bit base 8 exponent; 13-bit fraction */

struct  acct
{
  char    ac_flag;     /* flag (see Figure 8.26) */
  char    ac_stat;     /* termination status (signal & core flag only) */
                       /* (Solaris only) */
  uid_t   ac_uid;      /* real user ID */
  gid_t   ac_gid;      /* real group ID */
  dev_t   ac_tty;      /* controlling terminal */
  time_t  ac_btime;    /* starting calendar time */
  comp_t  ac_utime;    /* user CPU time (clock ticks) */
  comp_t  ac_stime;    /* system CPU time (clock ticks) */
  comp_t  ac_etime;    /* elapsed time (clock ticks) */
  comp_t  ac_mem;      /* average memory usage */
  comp_t  ac_io;       /* bytes transferred (by read and write) */
                       /* "blocks" on BSD systems */
  comp_t  ac_rw;       /* blocks read or written */
                       /* (not present on BSD systems) */
  char    ac_comm[8];  /* command name: [8] for Solaris, */
                       /* [10] for Mac OS X, [16] for FreeBSD, and */
                       /* [17] for Linux */
};
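The comp_t fields are compressed, so a program reading accounting records has to expand them. The sketch below assumes the encoding given in the comment on the typedef (a 3-bit base 8 exponent in the high bits, a 13-bit fraction in the low bits). The typedef my_comp_t is a hypothetical stand-in for the one in <sys/acct.h>, used so that the sketch compiles on its own.

```c
typedef unsigned short my_comp_t;   /* hypothetical stand-in for comp_t */

/* Expand a comp_t value into an ordinary integer. */
unsigned long
comp_to_ulong(my_comp_t comptime)
{
    unsigned long val = comptime & 0x1fff;      /* low 13 bits: fraction */
    int           exp = (comptime >> 13) & 7;   /* high 3 bits: exponent */

    while (exp-- > 0)
        val *= 8;                               /* each step is a factor of 8 */
    return val;
}
```

For example, an exponent of 2 with a fraction of 3 represents 3 × 8 × 8 = 192.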
The ac_flag member records certain events during the execution of the process. These
events are described in Figure 8.26.
Figure 8.26. Values for ac_flag from accounting record

ac_flag  | Description                                          | FreeBSD 5.2.1 | Linux 2.4.22 | Mac OS X 10.3 | Solaris 9
AFORK    | process is the result of fork, but never called exec |       •       |      •       |       •       |     •
ASU      | process used superuser privileges                    |               |      •       |       •       |     •
ACOMPAT  | process used compatibility mode                      |               |              |               |
ACORE    | process dumped core                                  |       •       |      •       |       •       |
AXSIG    | process was killed by a signal                       |       •       |      •       |       •       |
AEXPND   | expanded accounting entry                            |               |              |               |     •
The data required for the accounting record, such as CPU times and number of characters
transferred, is kept by the kernel in the process table and initialized whenever a new process
is created, as in the child after a fork. Each accounting record is written when the process
terminates. This means that the order of the records in the accounting file corresponds to the
termination order of the processes, not the order in which they were started. To know the
starting order, we would have to go through the accounting file and sort by the starting
calendar time. But this isn't perfect, since calendar times are in units of seconds (Section 1.10),
and it's possible for many processes to be started in any given second. Alternatively, the
elapsed time is given in clock ticks, which are usually between 60 and 128 ticks per second.
But we don't know the ending time of a process; all we know is its starting time and ending
order. This means that even though the elapsed time is more accurate than the starting time,
we still can't reconstruct the exact starting order of various processes, given the data in the
accounting file.
The accounting records correspond to processes, not programs. A new record is initialized by
the kernel for the child after a fork, not when a new program is executed. Although exec
doesn't create a new accounting record, the command name changes, and the AFORK flag is
cleared. This means that if we have a chain of three programs (A execs B, then B execs C, and
C exits), only a single accounting record is written. The command name in the record
corresponds to program C, but the CPU times, for example, are the sum for programs A, B, and
C.
Example
To have some accounting data to examine, we'll create a test program to implement the
diagram shown in Figure 8.27.
The source for the test program is shown in Figure 8.28. It calls fork four times. Each child
does something different and then terminates.
We'll run the test program on Solaris and then use the program in Figure 8.29 to print out
selected fields from the accounting records.
BSD-derived platforms don't support the ac_stat member, so we define the HAS_SA_STAT
constant on the platforms that do support this member. Basing the defined symbol on the
feature instead of on the platform reads better and allows us to modify the program simply by
adding the additional definition to our compilation command. The alternative would be to use
#if defined(BSD) || defined(MACOS)
which becomes unwieldy as we port our application to additional platforms.
We define similar constants to determine whether the platform supports the ACORE and AXSIG
accounting flags. We can't use the flag symbols themselves, because on Linux, they are
defined as enum values, which we can't use in a #ifdef expression.
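The feature-based approach can be sketched as a small helper that turns an ac_flag value into flag letters. This is not the program in Figure 8.29. The function name is hypothetical, and the sketch assumes that <sys/acct.h> defines ASU and AFORK on the target platform. The ACORE and AXSIG tests are compiled only when the build command defines HAS_ACORE or HAS_AXSIG (for example, cc -DHAS_ACORE -DHAS_AXSIG), which works even where those flags are enum values.

```c
#include <sys/acct.h>

/*
 * Store flag letters for an ac_flag value in buf (at least 5 bytes):
 * S = superuser privileges, F = fork without exec, and, if compiled
 * in, D = dumped core, X = killed by a signal.
 */
void
acct_flag_letters(int flag, char *buf)
{
    char *p = buf;

    if (flag & ASU)
        *p++ = 'S';
    if (flag & AFORK)
        *p++ = 'F';
#ifdef HAS_ACORE
    if (flag & ACORE)
        *p++ = 'D';
#endif
#ifdef HAS_AXSIG
    if (flag & AXSIG)
        *p++ = 'X';
#endif
    *p = '\0';
}
```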
To perform our test, we do the following:
1. Become superuser and enable accounting using the accton command. Note that when
this command terminates, accounting should be on; therefore, the first record in the
accounting file should be from this command.
2. Exit the superuser shell and run the program in Figure 8.28. This should append six
records to the accounting file: one for the superuser shell, one for the test parent, and
one for each of the four test children.
A new process is not created by the execl in the second child. There is only a single
accounting record for the second child.
3. Become superuser and turn accounting off. Since accounting is off when this accton
command terminates, it should not appear in the accounting file.
4. Run the program in Figure 8.29 to print the selected fields from the accounting file.
The output from step 4 follows. We have appended a description of the process to each line
produced by our test program, for the discussion later.
accton   e =      6, chars =       0, stat =   0: S
sh       e =   2106, chars =   15632, stat =   0: S
dd       e =      8, chars =  273344, stat =   0:       second child
a.out    e =    202, chars =     921, stat =   0:       parent
a.out    e =    407, chars =       0, stat = 134: F     first child
a.out    e =    600, chars =       0, stat =   9: F     fourth child
a.out    e =    801, chars =       0, stat =   0: F     third child
The elapsed time values are measured in clock ticks. From Figure 2.14, this system has 100
clock ticks per second. For example, the sleep(2) in the parent corresponds to the
elapsed time of 202 clock ticks. For the first child, the sleep(4) becomes 407 clock ticks. Note
that the amount of time a process sleeps is not exact. (We'll return to the sleep function in
Chapter 10.) Also, the calls to fork and exit take some amount of time.
Note that the ac_stat member is not the true termination status of the process, but
corresponds to a portion of the termination status that we discussed in Section 8.6. The only
information in this byte is a core-flag bit (usually the high-order bit) and the signal number
(usually the seven low-order bits), if the process terminated abnormally. If the process
terminated normally, we are not able to obtain the exit status from the accounting file. For
the first child, this value is 128 + 6. The 128 is the core flag bit, and 6 happens to be the
value on this system for SIGABRT, which is generated by the call to abort. The value 9 for the
fourth child corresponds to the value of SIGKILL. We can't tell from the accounting data that
the parent's argument to exit was 2 and that the third child's argument to exit was 0.
The size of the file /etc/termcap that the dd process copies in the second child is 136,663
bytes. The number of characters of I/O is just over twice this value. It is about twice the value
because 136,663 bytes are read in, then 136,663 bytes are written out. Even though the output goes
to the null device, the bytes are still accounted for.
The ac_flag values are as we expect. The F flag is set for all the child processes except the
second child, which does the execl. The F flag is not set for the parent, because the
interactive shell that executed the parent did a fork and then an exec of the a.out file. The
first child process calls abort, which generates a SIGABRT signal; that signal causes the core dump.
Note that neither the X flag nor the D flag is on, as they are not supported on Solaris; the
information they represent can be derived from the ac_stat field. The fourth child also
terminates because of a signal, but the SIGKILL signal does not generate a core dump; it only
terminates the process.
As a final note, the first child has a 0 count for the number of characters of I/O, yet this
process generated a core file. It appears that the I/O required to write the core file is not
charged to the process.
Figure 8.27. Process structure for accounting example
Figure 8.28. Program to generate accounting data
#include "apue.h"

int
main(void)
{
    pid_t   pid;

    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid != 0) {        /* parent */
        sleep(2);
        exit(2);                /* terminate with exit status 2 */
    }

                                /* first child */
    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid != 0) {
        sleep(4);
        abort();                /* terminate with core dump */
    }

                                /* second child */
    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid != 0) {
        /* the variadic execl needs a null pointer of type char *, not bare NULL */
        execl("/bin/dd", "dd", "if=/etc/termcap", "of=/dev/null", (char *)0);
        exit(7);                /* shouldn't get here */
    }

                                /* third child */
    if ((pid = fork()) < 0)
        err_sys("fork error");
    else if (pid != 0) {
        sleep(8);
        exit(0);                /* normal exit */
    }

                                /* fourth child */
    sleep(6);
    kill(getpid(), SIGKILL);    /* terminate w/signal, no core dump */
    exit(6);                    /* shouldn't get here */
}
Figure 8.29. Print selected fields from system's accounting file
#include "apue.h"
#include <sys/acct.h>

/* compt2ulong returns unsigned long, so the counts are printed with %lu */
#ifdef HAS_SA_STAT
#define FMT "%-*.*s  e = %6lu, chars = %7lu, stat = %3u: %c %c %c %c\n"
#else
#define FMT "%-*.*s  e = %6lu, chars = %7lu, %c %c %c %c\n"
#endif
#ifndef HAS_ACORE
#define ACORE 0
#endif
#ifndef HAS_AXSIG
#define AXSIG 0
#endif

static unsigned long
compt2ulong(comp_t comptime)    /* convert comp_t to unsigned long */
{
    unsigned long   val;
    int             exp;

    val = comptime & 0x1fff;    /* 13-bit fraction */
    exp = (comptime >> 13) & 7; /* 3-bit exponent (0-7) */
    while (exp-- > 0)
        val *= 8;
    return(val);
}

int
main(int argc, char *argv[])
{
    struct acct     acdata;
    FILE            *fp;

    if (argc != 2)
        err_quit("usage: pracct filename");
    if ((fp = fopen(argv[1], "r")) == NULL)
        err_sys("can't open %s", argv[1]);
    while (fread(&acdata, sizeof(acdata), 1, fp) == 1) {
        printf(FMT, (int)sizeof(acdata.ac_comm),
            (int)sizeof(acdata.ac_comm), acdata.ac_comm,
            compt2ulong(acdata.ac_etime), compt2ulong(acdata.ac_io),
#ifdef HAS_SA_STAT
            (unsigned char) acdata.ac_stat,
#endif
            acdata.ac_flag & ACORE ? 'D' : ' ',
            acdata.ac_flag & AXSIG ? 'X' : ' ',
            acdata.ac_flag & AFORK ? 'F' : ' ',
            acdata.ac_flag & ASU   ? 'S' : ' ');
    }
    if (ferror(fp))
        err_sys("read error");
    exit(0);
}
8.15. User Identification
Any process can find out its real and effective user ID and group ID. Sometimes, however, we
want to find out the login name of the user who's running the program. We could call
getpwuid(getuid()), but what if a single user has multiple login names, each with the same
user ID? (A person might have multiple entries in the password file with the same user ID to
have a different login shell for each entry.) The system normally keeps track of the name we
log in under (Section 6.8), and the getlogin function provides a way to fetch that login name.
#include <unistd.h>
char *getlogin(void);
Returns: pointer to string giving login name if OK, NULL on error
This function can fail if the process is not attached to a terminal that a user logged in to. We
normally call these processes daemons. We discuss them in Chapter 13.
Given the login name, we can then use it to look up the user in the password file (to determine
the login shell, for example) using getpwnam.
To find the login name, UNIX systems have historically called the ttyname function (Section
18.9) and then tried to find a matching entry in the utmp file (Section 6.8). FreeBSD and Mac
OS X store the login name in the session structure associated with the process table entry
and provide system calls to fetch and store this name.
System V provided the cuserid function to return the login name. This function called getlogin
and, if that failed, did a getpwuid(getuid()). The IEEE Standard 1003.1-1988 specified cuserid,
but it called for the effective user ID to be used, instead of the real user ID. The 1990 version
of POSIX.1 dropped the cuserid function.
The environment variable LOGNAME is usually initialized with the user's login name by login(1)
and inherited by the login shell. Realize, however, that a user can modify an environment
variable, so we shouldn't use LOGNAME to validate the user in any way. Instead, getlogin
should be used.
8.16. Process Times
In Section 1.10, we described three times that we can measure: wall clock time, user CPU
time, and system CPU time. Any process can call the times function to obtain these values for
itself and any terminated children.
#include <sys/times.h>
clock_t times(struct tms *buf);
Returns: elapsed wall clock time in clock ticks if OK, -1 on error
This function fills in the tms structure pointed to by buf:
struct tms {
    clock_t  tms_utime;    /* user CPU time */
    clock_t  tms_stime;    /* system CPU time */
    clock_t  tms_cutime;   /* user CPU time, terminated children */
    clock_t  tms_cstime;   /* system CPU time, terminated children */
};
Note that the structure does not contain any measurement for the wall clock time. Instead,
the function returns the wall clock time as the value of the function, each time it's called.
This value is measured from some arbitrary point in the past, so we can't use its absolute
value; instead, we use its relative value. For example, we call times and save the return
value. At some later time, we call times again and subtract the earlier return value from the
new return value. The difference is the wall clock time. (It is possible, though unlikely, for a
long-running process to overflow the wall clock time; see Exercise 1.6.)
The two structure fields for child processes contain values only for children that we have
waited for with wait, waitid, or waitpid.
All the clock_t values returned by this function are converted to seconds using the number of
clock ticks per second, the _SC_CLK_TCK value returned by sysconf (Section 2.5.4).
Most implementations provide the getrusage(2) function. This function returns the CPU times
and 14 other values indicating resource usage. Historically, this function originated with the
BSD operating system, so BSD-derived implementations generally support more of the fields
than do other implementations.
Example
The program in Figure 8.30 executes each command-line argument as a shell command string,
timing the command and printing the values from the tms structure.
If we run this program, we get
$ ./a.out "sleep 5" "date"

command: sleep 5
  real:     5.02
  user:     0.00
  sys:      0.00
  child user:     0.01
  child sys:      0.00
normal termination, exit status = 0

command: date
Mon Mar 22 00:43:58 EST 2004
  real:     0.01
  user:     0.00
  sys:      0.00
  child user:     0.01
  child sys:      0.00
normal termination, exit status = 0
In these two examples, all the CPU time appears in the child process, which is where the shell
and the command execute.
Figure 8.30. Time and execute all command-line arguments
#include "apue.h"
#include <sys/times.h>

static void pr_times(clock_t, struct tms *, struct tms *);
static void do_cmd(char *);

int
main(int argc, char *argv[])
{
    int     i;

    setbuf(stdout, NULL);
    for (i = 1; i < argc; i++)
        do_cmd(argv[i]);    /* once for each command-line arg */
    exit(0);
}

static void
do_cmd(char *cmd)           /* execute and time the "cmd" */
{
    struct tms  tmsstart, tmsend;
    clock_t     start, end;
    int         status;

    printf("\ncommand: %s\n", cmd);

    if ((start = times(&tmsstart)) == -1)   /* starting values */
        err_sys("times error");

    if ((status = system(cmd)) < 0)         /* execute command */
        err_sys("system() error");

    if ((end = times(&tmsend)) == -1)       /* ending values */
        err_sys("times error");

    pr_times(end-start, &tmsstart, &tmsend);
    pr_exit(status);
}

static void
pr_times(clock_t real, struct tms *tmsstart, struct tms *tmsend)
{
    static long     clktck = 0;

    if (clktck == 0)    /* fetch clock ticks per second first time */
        if ((clktck = sysconf(_SC_CLK_TCK)) < 0)
            err_sys("sysconf error");
    printf("  real:  %7.2f\n", real / (double) clktck);
    printf("  user:  %7.2f\n",
      (tmsend->tms_utime - tmsstart->tms_utime) / (double) clktck);
    printf("  sys:   %7.2f\n",
      (tmsend->tms_stime - tmsstart->tms_stime) / (double) clktck);
    printf("  child user:  %7.2f\n",
      (tmsend->tms_cutime - tmsstart->tms_cutime) / (double) clktck);
    printf("  child sys:   %7.2f\n",
      (tmsend->tms_cstime - tmsstart->tms_cstime) / (double) clktck);
}
8.17. Summary
A thorough understanding of the UNIX System's process control is essential for advanced
programming. There are only a few functions to master: fork, the exec family, _exit, wait,
and waitpid. These primitives are used in many applications. The fork function also gave us
an opportunity to look at race conditions.
Our examination of the system function and process accounting gave us another look at all
these process control functions. We also looked at another variation of the exec functions:
interpreter files and how they operate. An understanding of the various user IDs and group IDs
that are provided (real, effective, and saved) is critical to writing safe set-user-ID programs.
Given an understanding of a single process and its children, in the next chapter we examine
the relationship of a process to other processes: sessions and job control. We then complete
our discussion of processes in Chapter 10 when we describe signals.
Exercises
8.1  In Figure 8.3, we said that replacing the call to _exit with a call to exit might
     cause the standard output to be closed and printf to return -1. Modify the
     program to check whether your implementation behaves this way. If it does
     not, how can you simulate this behavior?
8.2  Recall the typical arrangement of memory in Figure 7.6. Because the stack
     frames corresponding to each function call are usually stored in the stack, and
     because after a vfork, the child runs in the address space of the parent, what
     happens if the call to vfork is from a function other than main and the child
     does a return from this function after the vfork? Write a test program to verify
     this, and draw a picture of what's happening.
8.3  When we execute the program in Figure 8.13 one time, as in
         $ ./a.out
     the output is correct. But if we execute the program multiple times, one right
     after the other, as in
         $ ./a.out ; ./a.out ; ./a.out
         output from parent
         ooutput from parent
         ouotuptut from child
         put from parent
         output from child
         utput from child
     the output is not correct. What's happening? How can we correct this? Can
     this problem happen if we let the child write its output first?
8.4  In the program shown in Figure 8.20, we call execl, specifying the pathname of
     the interpreter file. If we called execlp instead, specifying a filename of
     testinterp, and if the directory /home/sar/bin was a path prefix, what would be
     printed as argv[2] when the program is run?
8.5  How can a process obtain its saved set-user-ID?
8.6  Write a program that creates a zombie, and then call system to execute the
     ps(1) command to verify that the process is a zombie.
8.7  We mentioned in Section 8.10 that POSIX.1 requires that open directory
     streams be closed across an exec. Verify this as follows: call opendir for the
     root directory, peek at your system's implementation of the DIR structure, and
     print the close-on-exec flag. Then open the same directory for reading, and
     print the close-on-exec flag.
Chapter 9. Process Relationships
Section 9.1. Introduction
Section 9.2. Terminal Logins
Section 9.3. Network Logins
Section 9.4. Process Groups
Section 9.5. Sessions
Section 9.6. Controlling Terminal
Section 9.7. tcgetpgrp, tcsetpgrp, and tcgetsid Functions
Section 9.8. Job Control
Section 9.9. Shell Execution of Programs
Section 9.10. Orphaned Process Groups
Section 9.11. FreeBSD Implementation
Section 9.12. Summary
Exercises
9.1. Introduction
We learned in the previous chapter that there are relationships between processes. First,
every process has a parent process (the initial kernel-level process is usually its own parent).
The parent is notified when the child terminates, and the parent can obtain the child's exit
status. We also mentioned process groups when we described the waitpid function (Section
8.6) and how we can wait for any process in a process group to terminate.
In this chapter, we'll look at process groups in more detail and the concept of sessions that
was introduced by POSIX.1. We'll also look at the relationship between the login shell that is
invoked for us when we log in and all the processes that we start from our login shell.
It is impossible to describe these relationships without talking about signals, and to talk about
signals, we need many of the concepts in this chapter. If you are unfamiliar with the UNIX
System signal mechanism, you may want to skim through Chapter 10 at this point.
9.2. Terminal Logins
Let's start by looking at the programs that are executed when we log in to a UNIX system. In
early UNIX systems, such as Version 7, users logged in using dumb terminals that were
connected to the host with hard-wired connections. The terminals were either local (directly
connected) or remote (connected through a modem). In either case, these logins came
through a terminal device driver in the kernel. For example, the common devices on PDP-11s
were DH-11s and DZ-11s. A host had a fixed number of these terminal devices, so there was
a known upper limit on the number of simultaneous logins.
As bit-mapped graphical terminals became available, windowing systems were developed to
provide users with new ways to interact with host computers. Applications were developed to
create "terminal windows" to emulate character-based terminals, allowing users to interact
with hosts in familiar ways (i.e., via the shell command line).
Today, some platforms allow you to start a windowing system after logging in, whereas other
platforms automatically start the windowing system for you. In the latter case, you might still
have to log in, depending on how the windowing system is configured (some windowing
systems can be configured to log you in automatically).
The procedure that we now describe is used to log in to a UNIX system using a terminal. The
procedure is similar regardless of the type of terminal we use: it could be a character-based
terminal, a graphical terminal emulating a simple character-based terminal, or a graphical
terminal running a windowing system.
BSD Terminal Logins
This procedure has not changed much over the past 30 years. The system administrator
creates a file, usually /etc/ttys, that has one line per terminal device. Each line specifies the
name of the device and other parameters that are passed to the getty program. One
parameter is the baud rate of the terminal, for example. When the system is bootstrapped,
the kernel creates process ID 1, the init process, and it is init that brings the system up
multiuser. The init process reads the file /etc/ttys and, for every terminal device that allows
a login, does a fork followed by an exec of the program getty. This gives us the processes
shown in Figure 9.1.
Figure 9.1. Processes invoked by init to allow terminal logins
All the processes shown in Figure 9.1 have a real user ID of 0 and an effective user ID of 0
(i.e., they all have superuser privileges). The init process also execs the getty program with
an empty environment.
It is getty that calls open for the terminal device. The terminal is opened for reading and
writing. If the device is a modem, the open may delay inside the device driver until the modem
is dialed and the call is answered. Once the device is open, file descriptors 0, 1, and 2 are set
to the device. Then getty outputs something like login: and waits for us to enter our user
name. If the terminal supports multiple speeds, getty can detect special characters that tell it
to change the terminal's speed (baud rate). Consult your UNIX system manuals for additional
details on the getty program and the data files (gettytab) that can drive its actions.
When we enter our user name, getty's job is complete, and it then invokes the login program,
similar to
execle("/bin/login", "login", "-p", username, (char *)0, envp);
(There can be options in the gettytab file to have it invoke other programs, but the default is
the login program.) init invokes getty with an empty environment; getty creates an
environment for login (the envp argument) with the name of the terminal (something like
TERM=foo, where the type of terminal foo is taken from the gettytab file) and any environment
strings that are specified in the gettytab. The -p flag to login tells it to preserve the
environment that it is passed and to add to that environment, not replace it. Figure 9.2 shows
the state of these processes right after login has been invoked.
Figure 9.2. State of processes after login has been invoked
All the processes shown in Figure 9.2 have superuser privileges, since the original init process
has superuser privileges. The process ID of the bottom three processes in Figure 9.2 is the
same, since the process ID does not change across an exec. Also, all the processes other
than the original init process have a parent process ID of 1.
The login program does many things. Since it has our user name, it can call getpwnam to fetch
our password file entry. Then login calls getpass(3) to display the prompt Password: and read
our password (with echoing disabled, of course). It calls crypt(3) to encrypt the password
that we entered and compares the encrypted result to the pw_passwd field from our shadow
password file entry. If the login attempt fails because of an invalid password (after a few
tries), login calls exit with an argument of 1. This termination will be noticed by the parent
(init), and it will do another fork followed by an exec of getty, starting the procedure over
again for this terminal.
This is the traditional authentication procedure used on UNIX systems. Modern UNIX systems
have evolved to support multiple authentication procedures. For example, FreeBSD, Linux, Mac
OS X, and Solaris all support a more flexible scheme known as PAM (Pluggable Authentication
Modules). PAM allows an administrator to configure the authentication methods to be used to
access services that are written to use the PAM library.
If our application needs to verify that a user has the appropriate permission to perform a task,
we can either hard code the authentication mechanism in the application, or we can use the
PAM library to give us the equivalent functionality. The advantage to using PAM is that
administrators can configure different ways to authenticate users for different tasks, based on
the local site policies.
If we log in correctly, login will
• Change to our home directory (chdir)
• Change the ownership of our terminal device (chown) so we own it
• Change the access permissions for our terminal device so we have permission to read
  from and write to it
• Set our group IDs by calling setgid and initgroups
• Initialize the environment with all the information that login has: our home directory
  (HOME), shell (SHELL), user name (USER and LOGNAME), and a default path (PATH)
• Change to our user ID (setuid) and invoke our login shell, as in
      execl("/bin/sh", "-sh", (char *)0);
  The minus sign as the first character of argv[0] is a flag to all the shells that they are
  being invoked as a login shell. The shells can look at this character and modify their
  start-up accordingly.
The login program really does more than we've described here. It optionally prints the
message-of-the-day file, checks for new mail, and performs other tasks. We're interested only
in the features that we've described.
Recall from our discussion of the setuid function in Section 8.11 that since it is called by a
superuser process, setuid changes all three user IDs: the real user ID, effective user ID, and
saved set-user-ID. The call to setgid that was done earlier by login has the same effect on
all three group IDs.
At this point, our login shell is running. Its parent process ID is the original init process
(process ID 1), so when our login shell terminates, init is notified (it is sent a SIGCHLD signal),
and it can start the whole procedure over again for this terminal. File descriptors 0, 1, and 2
for our login shell are set to the terminal device. Figure 9.3 shows this arrangement.
Figure 9.3. Arrangement of processes after everything is set for a
terminal login
Our login shell now reads its start-up files (.profile for the Bourne shell and Korn shell;
.bash_profile, .bash_login, or .profile for the GNU Bourne-again shell; and .cshrc and .login
for the C shell). These start-up files usually change some of the environment variables and
add many additional variables to the environment. For example, most users set their own PATH
and often prompt for the actual terminal type (TERM). When the start-up files are done, we
finally get the shell's prompt and can enter commands.
Mac OS X Terminal Logins
On Mac OS X, the terminal login process follows the same steps as in the BSD login process,
since Mac OS X is based in part on FreeBSD. With Mac OS X, however, we are presented with
a graphical-based login screen from the start.
Linux Terminal Logins
The Linux login procedure is very similar to the BSD procedure. Indeed, the Linux login
command is derived from the 4.3BSD login command. The main difference between the BSD
login procedure and the Linux login procedure is in the way the terminal configuration is
specified.
On Linux, /etc/inittab contains the configuration information specifying the terminal devices
for which init should start a getty process, similar to the way it is done on System V.
Depending on the version of getty in use, the terminal characteristics are specified either on
the command line (as with agetty) or in the file /etc/gettydefs (as with mgetty).
Solaris Terminal Logins
Solaris supports two forms of terminal logins: (a) getty style, as described previously for BSD,
and (b) ttymon logins, a feature introduced with SVR4. Normally, getty is used for the console,
and ttymon is used for other terminal logins.
The ttymon command is part of a larger facility termed SAF, the Service Access Facility. The
goal of the SAF was to provide a consistent way to administer services that provide access to
a system. (See Chapter 6 of Rago [1993] for more details.) For our purposes, we end up with
the same picture as in Figure 9.3, with a different set of steps between init and the login
shell. init is the parent of sac (the service access controller), which does a fork and exec of
the ttymon program when the system enters multiuser state. The ttymon program monitors all
the terminal ports listed in its configuration file and does a fork when we've entered our login
name. This child of ttymon does an exec of login, and login prompts us for our password.
Once this is done, login execs our login shell, and we're at the position shown in Figure 9.3.
One difference is that the parent of our login shell is now ttymon, whereas the parent of the
login shell from a getty login is init.
9.3. Network Logins
The main (physical) difference between logging in to a system through a serial terminal and
logging in to a system through a network is that the connection between the terminal and the
computer isn't point-to-point. In this case, login is simply a service available, just like any
other network service, such as FTP or SMTP.
With the terminal logins that we described in the previous section, init knows which terminal
devices are enabled for logins and spawns a getty process for each device. In the case of
network logins, however, all the logins come through the kernel's network interface drivers
(e.g., the Ethernet driver), and we don't know ahead of time how many of these will occur.
Instead of having a process waiting for each possible login, we now have to wait for a
network connection request to arrive.
To allow the same software to process logins over both terminal logins and network logins, a
software driver called a pseudo terminal is used to emulate the behavior of a serial terminal
and map terminal operations to network operations, and vice versa. (In Chapter 19, we'll talk
about pseudo terminals in detail.)
BSD Network Logins
In BSD, a single process waits for most network connections: the inetd process, sometimes
called the Internet superserver. In this section, we'll look at the sequence of processes
involved in network logins for a BSD system. We are not interested in the detailed network
programming aspects of these processes; refer to Stevens, Fenner, and Rudoff [2004] for all
the details.
As part of the system start-up, init invokes a shell that executes the shell script /etc/rc.
One of the daemons that is started by this shell script is inetd. Once the shell script
terminates, the parent process of inetd becomes init; inetd waits for TCP/IP connection
requests to arrive at the host. When a connection request arrives for it to handle, inetd does
a fork and exec of the appropriate program.
Let's assume that a TCP connection request arrives for the TELNET server. TELNET is a
remote login application that uses the TCP protocol. A user on another host (that is
connected to the server's host through a network of some form) or on the same host initiates
the login by starting the TELNET client:
telnet hostname
The client opens a TCP connection to hostname, and the program that's started on hostname
is called the TELNET server. The client and the server then exchange data across the TCP
connection using the TELNET application protocol. What has happened is that the user who
started the client program is now logged in to the server's host. (This assumes, of course,
that the user has a valid account on the server's host.) Figure 9.4 shows the sequence of
processes involved in executing the TELNET server, called telnetd.
Figure 9.4. Sequence of processes involved in executing TELNET server
The telnetd process then opens a pseudo-terminal device and splits into two processes using
fork. The parent handles the communication across the network connection, and the child
does an exec of the login program. The parent and the child are connected through the
pseudo terminal. Before doing the exec, the child sets up file descriptors 0, 1, and 2 to the
pseudo terminal. If we log in correctly, login performs the same steps we described in Section
9.2: it changes to our home directory and sets our group IDs, user ID, and our initial
environment. Then login replaces itself with our login shell by calling exec. Figure 9.5 shows
the arrangement of the processes at this point.
Figure 9.5. Arrangement of processes after everything is set for a
network login
Obviously, a lot is going on between the pseudo-terminal device driver and the actual user at
the terminal. We'll show all the processes involved in this type of arrangement in Chapter 19
when we talk about pseudo terminals in more detail.
The important thing to understand is that whether we log in through a terminal (Figure 9.3) or
a network (Figure 9.5), we have a login shell with its standard input, standard output, and
standard error connected to either a terminal device or a pseudo-terminal device. We'll see in
the coming sections that this login shell is the start of a POSIX.1 session, and that the
terminal or pseudo terminal is the controlling terminal for the session.
Mac OS X Network Logins
Logging in to a Mac OS X system over a network is identical to a BSD system, because Mac
OS X is based partially on FreeBSD.
Linux Network Logins
Network logins under Linux are the same as under BSD, except that an alternate inetd process
is used, called the extended Internet services daemon, xinetd. The xinetd process provides a
finer level of control over services it starts than does inetd.
Solaris Network Logins
The scenario for network logins under Solaris is almost identical to the steps under BSD and
Linux. An inetd server is used similar to the BSD version. The Solaris version has the additional
ability to run under the service access facility framework, although it is not configured to do
so. Instead, the inetd server is started by init. Either way, we end up with the same overall
picture as in Figure 9.5.
9.4. Process Groups
In addition to having a process ID, each process also belongs to a process group. We'll
encounter process groups again when we discuss signals in Chapter 10.
A process group is a collection of one or more processes, usually associated with the same job
(job control is discussed in Section 9.8), that can receive signals from the same terminal.
Each process group has a unique process group ID. Process group IDs are similar to process
IDs: they are positive integers and can be stored in a pid_t data type. The function getpgrp
returns the process group ID of the calling process.
#include <unistd.h>
pid_t getpgrp(void);
Returns: process group ID of calling process
In older BSD-derived systems, the getpgrp function took a pid argument and returned the
process group for that process. The Single UNIX Specification defines the getpgid function as
an XSI extension that mimics this behavior.
#include <unistd.h>
pid_t getpgid(pid_t pid);
Returns: process group ID if OK, -1 on error
If pid is 0, the process group ID of the calling process is returned. Thus,
getpgid(0);
is equivalent to
getpgrp();
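A minimal sketch of this equivalence, assuming a POSIX system where the XSI function getpgid is available; the helper name pgrp_calls_agree is hypothetical, not a standard function:

```c
#define _XOPEN_SOURCE 700   /* expose getpgid() on glibc */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: ask for the caller's process group ID both ways.
   Returns 1 if getpgrp() and getpgid(0) agree, 0 otherwise. */
int pgrp_calls_agree(void)
{
    pid_t a = getpgrp();    /* always succeeds */
    pid_t b = getpgid(0);   /* pid 0 means "the calling process" */

    if (b == -1) {
        perror("getpgid");
        return 0;
    }
    printf("getpgrp() = %ld, getpgid(0) = %ld\n", (long)a, (long)b);
    return a == b;
}
```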
Each process group can have a process group leader. The leader is identified by its process
group ID being equal to its process ID.
It is possible for a process group leader to create a process group, create processes in the
group, and then terminate. The process group still exists, as long as at least one process is in
the group, regardless of whether the group leader terminates. This is called the process group
lifetime: the period of time that begins when the group is created and ends when the last
remaining process leaves the group. The last remaining process in the process group can
either terminate or enter some other process group.
A process joins an existing process group or creates a new process group by calling setpgid.
(In the next section, we'll see that setsid also creates a new process group.)
#include <unistd.h>
int setpgid(pid_t pid, pid_t pgid);
Returns: 0 if OK, -1 on error
This function sets the process group ID to pgid in the process whose process ID equals pid. If
the two arguments are equal, the process specified by pid becomes a process group leader.
If pid is 0, the process ID of the caller is used. Also, if pgid is 0, the process ID specified by
pid is used as the process group ID.
A process can set the process group ID of only itself or any of its children. Furthermore, it
can't change the process group ID of one of its children after that child has called one of the
exec functions.
In most job-control shells, this function is called after a fork to have the parent set the
process group ID of the child, and to have the child set its own process group ID. One of
these calls is redundant, but by doing both, we are guaranteed that the child is placed into its
own process group before either process assumes that this has happened. If we didn't do this,
we would have a race condition, since the child's process group membership would depend on
which process executes first.
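The fork-then-setpgid-twice pattern might be sketched as follows. Both function names are hypothetical, and the sketch ignores the controlling terminal, which a real job-control shell would also hand to the new group.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: like fork(), but the child leads a new process
   group whose ID is its own PID.  Parent and child make the same
   setpgid() request, so by the time this returns in either process the
   child is already in its group (unless setpgid() fails).
   Returns the child's PID in the parent, 0 in the child, -1 on error. */
pid_t fork_into_own_group(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* pid 0 = "me"; pgid 0 = "use my PID as the group ID" */
        if (setpgid(0, 0) < 0)
            perror("setpgid (child)");
        return 0;
    }
    /* EACCES means the child has already called exec; it made this
       same call before doing so, so the error is harmless here. */
    if (setpgid(pid, pid) < 0 && errno != EACCES)
        perror("setpgid (parent)");
    return pid;
}

/* Hypothetical helper: run argv in its own process group and wait for
   it.  Returns the wait status, or -1 on error. */
int run_in_own_group(char *const argv[])
{
    int status;
    pid_t pid = fork_into_own_group();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```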
When we discuss signals, we'll see how we can send a signal to either a single process
(identified by its process ID) or a process group (identified by its process group ID). Similarly,
the waitpid function from Section 8.6 lets us wait for either a single process or one process
from a specified process group.
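As a sketch of both group-wide operations, the hypothetical helper below passes a negative process ID to kill and to waitpid, which POSIX.1 interprets as "the process group whose ID is the absolute value of this argument." It assumes the caller is not itself a member of pgid; otherwise kill would signal the caller too.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send sig to every process in group pgid, then
   reap every child of ours in that group.  Returns the number of
   children reaped, or -1 if kill() fails. */
int signal_and_reap_group(pid_t pgid, int sig)
{
    int status, n = 0;

    if (kill(-pgid, sig) < 0) {     /* -pgid: the whole process group */
        perror("kill");
        return -1;
    }
    /* waitpid(-pgid, ...) waits for any child whose process group ID
       is pgid; it returns -1 (ECHILD) once no such children remain. */
    while (waitpid(-pgid, &status, 0) > 0)
        n++;
    return n;
}
```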
9.5. Sessions
A session is a collection of one or more process groups. For example, we could have the
arrangement shown in Figure 9.6. Here we have three process groups in a single session.
Figure 9.6. Arrangement of processes into process groups and sessions
The processes in a process group are usually placed there by a shell pipeline. For example, the
arrangement shown in Figure 9.6 could have been generated by shell commands of the form
proc1 | proc2 &
proc3 | proc4 | proc5
A process establishes a new session by calling the setsid function.
#include <unistd.h>
pid_t setsid(void);
Returns: process group ID if OK, -1 on error
If the calling process is not a process group leader, this function creates a new session. Three
things happen.
1. The process becomes the session leader of this new session. (A session leader is the
process that creates a session.) The process is the only process in this new session.
2. The process becomes the process group leader of a new process group. The new
process group ID is the process ID of the calling process.
3. The process has no controlling terminal. (We'll discuss controlling terminals in the next
section.) If the process had a controlling terminal before calling setsid, that
association is broken.
This function returns an error if the caller is already a process group leader. To ensure this is
not the case, the usual practice is to call fork and have the parent terminate and the child
continue. We are guaranteed that the child is not a process group leader, because the
process group ID of the parent is inherited by the child, but the child gets a new process ID.
Hence, it is impossible for the child's process ID to equal its inherited process group ID.
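A sketch of the fork-then-setsid idiom that checks the first two effects in the list above, assuming the XSI function getsid (described below) is available. Both function names are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose getsid() on glibc */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, run in the forked child.  Returns 0 if setsid()
   behaved as described, or a nonzero code naming the failed check. */
static int become_session_leader(void)
{
    pid_t me = getpid();

    /* We inherited the parent's process group ID, which cannot equal
       our new PID, so we are not a group leader and setsid() may work. */
    if (setsid() != me)
        return 1;
    if (getsid(0) != me)            /* effect 1: we lead the session */
        return 2;
    if (getpgrp() != me)            /* effect 2: we lead a new group */
        return 3;
    /* Now we ARE a process group leader, so a second call must fail. */
    if (setsid() != -1 || errno != EPERM)
        return 4;
    return 0;
}

/* Hypothetical helper: fork, run the checks in the child, and return
   the child's exit status (0 = all checks passed), or -1 if fork() or
   waitpid() fails. */
int check_new_session(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(become_session_leader());
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```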
The Single UNIX Specification talks only about a "session leader." There is no "session ID"
similar to a process ID or a process group ID. Obviously, a session leader is a single process
that has a unique process ID, so we could talk about a session ID that is the process ID of
the session leader. This concept of a session ID was introduced in SVR4. Historically,
BSD-based systems didn't support this notion, but have since been updated to include it. The
getsid function returns the process group ID of a process's session leader. The getsid
function is included as an XSI extension in the Single UNIX Specification.
Some implementations, such as Solaris, join with the Single UNIX Specification in the practice
of avoiding the use of the phrase "session ID," opting instead to refer to this as the "process
group ID of the session leader." The two are equivalent, since the session leader is always the
leader of a process group.
#include <unistd.h>
pid_t getsid(pid_t pid);
Returns: session leader's process group ID if OK, -1 on error
If pid is 0, getsid returns the process group ID of the calling process's session leader. For
security reasons, some implementations may restrict the calling process from obtaining the
process group ID of the session leader if pid doesn't belong to the same session as the caller.
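A small sketch, assuming getsid is available: fork does not change the session, so a child reports the same value as its parent, and getsid(0) is shorthand for getsid(getpid()). The helper name is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose getsid() on glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child reports the same
   session as its parent, 0 otherwise (or on error). */
int child_shares_session(void)
{
    pid_t parent_sid = getsid(0);   /* 0 = the calling process */
    int status;
    pid_t pid;

    if (parent_sid == -1 || getsid(getpid()) != parent_sid)
        return 0;
    if ((pid = fork()) < 0)
        return 0;
    if (pid == 0)
        _exit(getsid(0) == parent_sid ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```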
9.6. Controlling Terminal
Sessions and process groups have a few other characteristics.
• A session can have a single controlling terminal. This is usually the terminal device (in
the case of a terminal login) or pseudo-terminal device (in the case of a network login)
on which we log in.
• The session leader that establishes the connection to the controlling terminal is called
the controlling process.
• The process groups within a session can be divided into a single foreground process
group and one or more background process groups.
• If a session has a controlling terminal, it has a single foreground process group, and all
other process groups in the session are background process groups.
• Whenever we type the terminal's interrupt key (often DELETE or Control-C), the
interrupt signal is sent to all processes in the foreground process group.
• Whenever we type the terminal's quit key (often Control-backslash), the quit signal is
sent to all processes in the foreground process group.
• If a modem (or network) disconnect is detected by the terminal interface, the hang-up
signal is sent to the controlling process (the session leader).
These characteristics are shown in Figure 9.7.
Figure 9.7. Process groups and sessions showing controlling terminal
Usually, we don't have to worry about the controlling terminal; it is established automatically
when we log in.
POSIX.1 leaves the choice of the mechanism used to allocate a controlling terminal up to each
individual implementation. We'll show the actual steps in Section 19.4.
Systems derived from UNIX System V allocate the controlling terminal for a session when the
session leader opens the first terminal device that is not already associated with a session.
This assumes that the call to open by the session leader does not specify the O_NOCTTY flag
(Section 3.3).
BSD-based systems allocate the controlling terminal for a session when the session leader
calls ioctl with a request argument of TIOCSCTTY (the third argument is a null pointer). The
session cannot already have a controlling terminal for this call to succeed. (Normally, this call
to ioctl follows a call to setsid, which guarantees that the process is a session leader
without a controlling terminal.) The POSIX.1 O_NOCTTY flag to open is not used by BSD-based
systems, except in compatibility-mode support for other systems.
There are times when a program wants to talk to the controlling terminal, regardless of
whether the standard input or standard output is redirected. The way a program guarantees
that it is talking to the controlling terminal is to open the file /dev/tty. This special file is a
synonym within the kernel for the controlling terminal. Naturally, if the program doesn't have a
controlling terminal, the open of this device will fail.
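The sketch below (hypothetical helper names) opens /dev/tty and demonstrates the failure case: a child that has just called setsid has no controlling terminal, so its open of /dev/tty fails. On Linux, for example, the error is ENXIO; the sketch checks only that the open fails.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open the controlling terminal, wherever stdin
   and stdout are redirected.  Returns a descriptor, or -1 (errno set)
   if there is no controlling terminal or the open fails otherwise. */
int open_controlling_tty(void)
{
    return open("/dev/tty", O_RDWR);
}

/* Hypothetical helper: fork a child that calls setsid() and then tries
   to open /dev/tty.  Returns 1 if that open failed, as it should. */
int new_session_has_no_tty(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0) {
        if (setsid() < 0)
            _exit(2);
        _exit(open_controlling_tty() < 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```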
The classic example is the getpass(3) function, which reads a password (with terminal echoing
turned off, of course). This function is called by the crypt(1) program, which can be used in a
pipeline. For example,
crypt < salaries | lpr
decrypts the file salaries and pipes the output to the print spooler. Because crypt reads its
input file on its standard input, the standard input can't be used to enter the password. Also,
crypt is designed so that we have to enter the encryption password each time we run the
program, to prevent us from saving the password in a file (which could be a security hole).
There are known ways to break the encoding used by the crypt program. See Garfinkel et al.
[2003] for more details on encrypting files.
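How a getpass-style function reaches the terminal despite redirection can be sketched as follows, assuming a POSIX termios interface. read_password is a hypothetical name, and unlike a real getpass this sketch does not block SIGINT or SIGTSTP while echoing is off, so an interrupt at that moment would leave echo disabled.

```c
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: prompt on, and read one line from, the
   controlling terminal with echo disabled.  Stores at most len - 1
   bytes plus a NUL in buf, without the newline.  Returns buf, or NULL
   on error (including "no controlling terminal"). */
char *read_password(const char *prompt, char *buf, size_t len)
{
    struct termios saved, noecho;
    ssize_t n;
    int fd;

    if (len < 2 || (fd = open("/dev/tty", O_RDWR)) < 0)
        return NULL;
    if (tcgetattr(fd, &saved) < 0) {
        close(fd);
        return NULL;
    }
    noecho = saved;
    noecho.c_lflag &= ~(ECHO | ECHOE | ECHOK | ECHONL);
    if (tcsetattr(fd, TCSAFLUSH, &noecho) < 0) {
        close(fd);
        return NULL;
    }

    write(fd, prompt, strlen(prompt));
    n = read(fd, buf, len - 1);         /* canonical mode: one line */
    write(fd, "\n", 1);                 /* RETURN was not echoed */

    tcsetattr(fd, TCSAFLUSH, &saved);   /* restore echoing */
    close(fd);

    if (n < 0)
        return NULL;
    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline */
    return buf;
}
```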
9.7. tcgetpgrp, tcsetpgrp, and tcgetsid Functions
We need a way to tell the kernel which process group is the foreground process group, so
that the terminal device driver knows where to send the terminal input and the
terminal-generated signals (Figure 9.7).
#include <unistd.h>
pid_t tcgetpgrp(int filedes);
Returns: process group ID of foreground process group if OK, -1 on error
int tcsetpgrp(int filedes, pid_t pgrpid);
Returns: 0 if OK, -1 on error
The function tcgetpgrp returns the process group ID of the foreground process group
associated with the terminal open on filedes.
If the process has a controlling terminal, the process can call tcsetpgrp to set the foreground
process group ID to pgrpid. The value of pgrpid must be the process group ID of a process
group in the same session, and filedes must refer to the controlling terminal of the session.
Most applications don't call these two functions directly. They are normally called by
job-control shells.
The Single UNIX Specification defines an XSI extension called tcgetsid to allow an application
to obtain the process group ID for the session leader given a file descriptor for the controlling
TTY.
#include <termios.h>
pid_t tcgetsid(int filedes);
Returns: session leader's process group ID if OK, -1 on error
Applications that need to manage controlling terminals can use tcgetsid to identify the
session ID of the controlling terminal's session leader (which is equivalent to the session
leader's process group ID).
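The sketch below asks the terminal, rather than the process, which process group is in the foreground and which session the terminal belongs to. It assumes tcgetsid is available; in_foreground is a hypothetical name. If fd is not a terminal, both calls fail, typically with ENOTTY.

```c
#define _XOPEN_SOURCE 700   /* expose tcgetsid() and getsid() on glibc */
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller's process group is the
   terminal's foreground group, 0 if it is a background group, and -1
   if the terminal query fails (e.g., ENOTTY: fd is not our
   controlling terminal). */
int in_foreground(int fd)
{
    pid_t fg = tcgetpgrp(fd);     /* an attribute of the terminal */
    pid_t sid = tcgetsid(fd);     /* the session that owns the terminal */

    if (fg == -1 || sid == -1)
        return -1;
    printf("terminal: foreground pgrp %ld, session %ld; "
           "caller: pgrp %ld, session %ld\n",
           (long)fg, (long)sid, (long)getpgrp(), (long)getsid(0));
    return fg == getpgrp();
}
```

Called as in_foreground(STDIN_FILENO), this would typically return 1 for a program run in the foreground from a job-control shell and 0 for the same program started with &.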
9.8. Job Control
Job control is a feature added to BSD around 1980. This feature allows us to start multiple
jobs (groups of processes) from a single terminal and to control which jobs can access the
terminal and which jobs are to run in the background. Job control requires three forms of
support:
1. A shell that supports job control
2. Support for job control in the kernel's terminal driver
3. Kernel support for certain job-control signals
SVR3 provided a different form of job control called shell layers. The BSD form of job
control, however, was selected by POSIX.1 and is what we describe here. In earlier
versions of the standard, job control support was optional, but POSIX.1 now requires
platforms to support it.
From our perspective, using job control from a shell, we can start a job in either the
foreground or the background. A job is simply a collection of processes, often a pipeline of
processes. For example,
vi main.c
starts a job consisting of one process in the foreground. The commands
pr *.c | lpr &
make all &
start two jobs in the background. All the processes invoked by these background jobs are in
the background.
As we said, to use the features provided by job control, we need to be using a shell that
supports job control. With older systems, it was simple to say which shells supported job
control and which didn't. The C shell supported job control, the Bourne shell didn't, and it was
an option with the Korn shell, depending on whether the host supported job control. But the C
shell has been ported to systems (e.g., earlier versions of System V) that don't support job
control, and the SVR4 Bourne shell, when invoked by the name jsh instead of sh, supports job
control. The Korn shell continues to support job control if the host does. The Bourne-again
shell also supports job control. We'll just talk generically about a shell that supports job
control, versus one that doesn't, when the difference between the various shells doesn't
matter.
When we start a background job, the shell assigns it a job identifier and prints one or more of
the process IDs. The following script shows how the Korn shell handles this:
$ make all > Make.out &
[1]     1475
$ pr *.c | lpr &
[2]     1490
$                              just press RETURN
[2] +  Done                    pr *.c | lpr &
[1] +  Done                    make all > Make.out &
The make is job number 1 and the starting process ID is 1475. The next pipeline is job number
2 and the process ID of the first process is 1490. When the jobs are done and when we press
RETURN, the shell tells us that the jobs are complete. The reason we have to press RETURN is
to have the shell print its prompt. The shell doesn't print the changed status of background
jobs at any random time; it does so only right before it prints its prompt, to let us enter a new command
line. If the shell didn't do this, it could produce output while we were entering an input line.
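A shell might implement this with a non-blocking poll just before printing each prompt. The sketch below is a simplification: report_finished_jobs is hypothetical, and it prints process IDs where a real shell would map them to job numbers such as [1] and [2].

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper for a job-control shell, called just before it
   prints a prompt: collect status changes of children without
   blocking and report them.  Returns how many children finished. */
int report_finished_jobs(void)
{
    int status, done = 0;
    pid_t pid;

    /* WNOHANG: return 0 instead of blocking if no child has changed.
       WUNTRACED: also report children that have stopped.
       The loop ends on 0 (nothing more to report) or -1 (for example,
       ECHILD: there are no children at all). */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (WIFSTOPPED(status)) {
            printf("[pid %ld] Stopped (signal %d)\n",
                   (long)pid, WSTOPSIG(status));
        } else {
            printf("[pid %ld] Done\n", (long)pid);
            done++;
        }
    }
    return done;
}
```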
The interaction with the terminal driver arises because a special terminal character affects the
foreground job: the suspend key (typically Control-Z). Entering this character causes the
terminal driver to send the SIGTSTP signal to all processes in the foreground process group.
The jobs in any background process groups aren't affected. The terminal driver looks for three
special characters, which generate signals to the foreground process group.
• The interrupt character (typically DELETE or Control-C) generates SIGINT.
• The quit character (typically Control-backslash) generates SIGQUIT.
• The suspend character (typically Control-Z) generates SIGTSTP.
In Chapter 18, we'll see how we can change these three characters to be any characters we
choose and how we can disable the terminal driver's processing of these special characters.
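Without a terminal, we can imitate what the driver does on Control-Z by sending SIGTSTP to a process group with kill, and observe the stop the way a shell does, with waitpid and WUNTRACED. This sketch assumes the caller has not blocked or ignored SIGTSTP (both are inherited across fork); tstp_demo is a hypothetical name. The child is put in its own process group, as a job-control shell would do. That group is not orphaned, because the child's parent is in a different group of the same session, which matters because POSIX.1 does not let members of an orphaned process group (Section 9.10) be stopped by SIGTSTP.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop a child with SIGTSTP, observe the stop,
   continue it with SIGCONT, then kill it.  Returns 1 if every step
   was observed as expected. */
int tstp_demo(void)
{
    int status, stopped;
    pid_t pid = fork();

    if (pid < 0)
        return 0;
    if (pid == 0) {
        setpgid(0, 0);              /* own process group, like a job */
        for (;;)
            pause();                /* wait to be signaled */
    }
    setpgid(pid, pid);              /* same request from the parent */

    kill(-pid, SIGTSTP);            /* what the driver does on Control-Z */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return 0;
    stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP;

    kill(-pid, SIGCONT);            /* what fg or bg sends */
    kill(-pid, SIGKILL);            /* clean up */
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return stopped && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```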
Another job control condition can arise that must be handled by the terminal driver. Since we
can have a foreground job and one or more background jobs, which of these receives the
characters that we enter at the terminal? Only the foreground job receives terminal input. It is
not an error for a background job to try to read from the terminal, but the terminal driver
detects this and sends a special signal to the background job: SIGTTIN. This signal normally
stops the background job; by using the shell, we are notified of this and can bring the job into
the foreground so that it can read from the terminal. The following demonstrates this:
$ cat > temp.foo &             start in background, but it'll read from standard input
[1]     1681
$                              we press RETURN
[1] + Stopped (SIGTTIN)        cat > temp.foo &
$ fg %1                        bring job number 1 into the foreground
cat > temp.foo                 the shell tells us which job is now in the foreground
hello, world                   enter one line
^D                             type the end-of-file character
$ cat temp.foo                 check that the one line was put into the file
hello, world
The shell starts the cat process in the background, but when cat tries to read its standard
input (the controlling terminal), the terminal driver, knowing that it is a background job, sends
the SIGTTIN signal to the background job. The shell detects this change in status of its child
(recall our discussion of the wait and waitpid functions in Section 8.6) and tells us that the job
has been stopped. We then move the stopped job into the foreground with the shell's fg
command. (Refer to the manual page for the shell that you are using, for all the details on its
job control commands, such as fg and bg, and the various ways to identify the different jobs.)
Doing this causes the shell to place the job into the foreground process group (tcsetpgrp) and
send the continue signal (SIGCONT) to the process group. Since it is now in the foreground
process group, the job can read from the controlling terminal.
What happens if a background job outputs to the controlling terminal? This is an option that
we can allow or disallow. Normally, we use the stty(1) command to change this option. (We'll
see in Chapter 18 how we can change this option from a program.) The following shows how
this works:
$ cat temp.foo &               execute in background
[1]     1719
$ hello, world                 the output from the background job appears after the prompt
                               we press RETURN
[1] + Done                     cat temp.foo &
$ stty tostop                  disable ability of background jobs to output to controlling terminal
$ cat temp.foo &               try it again in the background
[1]     1721
$                              we press RETURN and find the job is stopped
[1] + Stopped(SIGTTOU)         cat temp.foo &
$ fg %1                        resume stopped job in the foreground
cat temp.foo                   the shell tells us which job is now in the foreground
hello, world                   and here is its output
When we disallow background jobs from writing to the controlling terminal, cat will block when
it tries to write to its standard output, because the terminal driver identifies the write as
coming from a background process and sends the job the SIGTTOU signal. As with the previous
example, when we use the shell's fg command to bring the job into the foreground, the job
completes.
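The stty tostop setting corresponds to the TOSTOP flag in the c_lflag member of the termios structure. A sketch of setting it from a program, assuming a POSIX termios interface (set_tostop is a hypothetical name):

```c
#include <termios.h>

/* Hypothetical helper: the programmatic equivalent of "stty tostop"
   (on != 0) or "stty -tostop" (on == 0) for the terminal open on fd.
   With TOSTOP set, a background process that writes to the terminal
   is sent SIGTTOU.  Returns 0 on success, -1 on error (for example,
   if fd is not an open terminal). */
int set_tostop(int fd, int on)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    if (on)
        t.c_lflag |= TOSTOP;
    else
        t.c_lflag &= ~TOSTOP;
    return tcsetattr(fd, TCSANOW, &t);
}
```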
Figure 9.8 summarizes some of the features of job control that we've been describing. The
solid lines through the terminal driver box mean that the terminal I/O and the
terminal-generated signals are always connected from the foreground process group to the
actual terminal. The dashed line corresponding to the SIGTTOU signal means that whether the
output from a process in the background process group appears on the terminal is an option.
Figure 9.8. Summary of job control features with foreground and background jobs, and terminal driver
Is job control necessary or desirable? Job control was originally designed and implemented
before windowing terminals were widespread. Some people claim that a well-designed
windowing system removes any need for job control. Some complain that the implementation
of job control (requiring support from the kernel, the terminal driver, the shell, and some
applications) is a hack. Some use job control with a windowing system, claiming a need for
both. Regardless of your opinion, job control is a required feature of POSIX.1.
9.9. Shell Execution of Programs
Let's examine how the shells execute programs and how this relates to the concepts of
process groups, controlling terminals, and sessions. To do this, we'll use the ps command
again.
First, we'll use a shell that doesn't support job control: the classic Bourne shell running on
Solaris. If we execute
ps -o pid,ppid,pgid,sid,comm
the output is
PID   PPID  PGID  SID   COMMAND
949   947   949   949   sh
1774  949   949   949   ps
The parent of the ps command is the shell, which we would expect. Both the shell and the ps
command are in the same session and foreground process group (949). We say that 949 is the
foreground process group because that is what you get when you execute a command with a
shell that doesn't support job control.
Some platforms support an option to have the ps(1) command print the process group ID
associated with the session's controlling terminal. This value would be shown under the TPGID
column. Unfortunately, the output of the ps command often differs among versions of the
UNIX System. For example, Solaris 9 doesn't support this option. Under FreeBSD 5.2.1 and
Mac OS X 10.3, the command
ps -o pid,ppid,pgid,sess,tpgid,command
and under Linux 2.4.22, the command
ps -o pid,ppid,pgrp,session,tpgid,comm
print exactly the information we want.
Note that it is a misnomer to associate a process with a terminal process group ID (the TPGID
column). A process does not have a terminal process group. A process belongs to a
process group, and the process group belongs to a session. The session may or may not have
a controlling terminal. If the session does have a controlling terminal, then the terminal device
knows the process group ID of the foreground process group. This value can be set in the terminal
driver with the tcsetpgrp function, as we show in Figure 9.8. The foreground process group ID
is an attribute of the terminal, not the process. This value from the terminal device driver is
what ps prints as the TPGID. If it finds that the session doesn't have a controlling terminal, ps
prints -1.
If we execute the command in the background,
ps -o pid,ppid,pgid,sid,comm &
the only value that changes is the process ID of the command:
PID   PPID  PGID  SID   COMMAND
949   947   949   949   sh
1812  949   949   949   ps
This shell doesn't know about job control, so the background job is not put into its own
process group and the controlling terminal isn't taken away from the background job.
Let's now look at how the Bourne shell handles a pipeline. When we execute
ps -o pid,ppid,pgid,sid,comm | cat1
the output is
PID   PPID  PGID  SID   COMMAND
949   947   949   949   sh
1823  949   949   949   cat1
1824  1823  949   949   ps
(The program cat1 is just a copy of the standard cat program, with a different name. We have
another copy of cat with the name cat2, which we'll use later in this section. When we have
two copies of cat in a pipeline, the different names let us differentiate between the two
programs.) Note that the last process in the pipeline is the child of the shell and that the first
process in the pipeline is a child of the last process. It appears that the shell forks a copy of
itself and that this copy then forks to make each of the previous processes in the pipeline.
If we execute the pipeline in the background,
ps -o pid,ppid,pgid,sid,comm | cat1 &
only the process IDs change. Since the shell doesn't handle job control, the process group ID
of the background processes remains 949, as does the process group ID of the session.
What happens in this case if a background process tries to read from its controlling terminal?
For example, suppose that we execute
cat > temp.foo &
With job control, this is handled by placing the background job into a background process
group, which causes the signal SIGTTIN to be generated if the background job tries to read
from the controlling terminal. The way this is handled without job control is that the shell
automatically redirects the standard input of a background process to /dev/null, if the
process doesn't redirect standard input itself. A read from /dev/null generates an end of file.
This means that our background cat process immediately reads an end of file and terminates
normally.
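In the child, between fork and exec, such a shell might do something like the following sketch. stdin_from_dev_null is a hypothetical name, and the shell would call it only when the command line does not redirect standard input itself.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, called in the child after fork() and before
   exec for a background command: make fd 0 refer to /dev/null, so a
   read returns end of file instead of competing with the shell for
   the terminal.  Returns 0 on success, -1 on error. */
int stdin_from_dev_null(void)
{
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0)
        return -1;
    if (fd != STDIN_FILENO) {
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```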
The previous paragraph adequately handles the case of a background process accessing the
controlling terminal through its standard input, but what happens if a background process
specifically opens /dev/tty and reads from the controlling terminal? The answer is "it depends,"
but it's probably not what we want. For example,
crypt < salaries | lpr &
is such a pipeline. We run it in the background, but the crypt program opens /dev/tty,
changes the terminal characteristics (to disable echoing), reads from the device, and resets
the terminal characteristics. When we execute this background pipeline, the prompt Password:
from crypt is printed on the terminal, but what we enter (the encryption password) is read by
the shell, which tries to execute a command of that name. The next line we enter to the shell
is taken as the password, and the file is not encrypted correctly, sending junk to the printer.
Here we have two processes trying to read from the same device at the same time, and the
result depends on the system. Job control, as we described earlier, handles this multiplexing of
a single terminal between multiple processes in a better fashion.
Returning to our Bourne shell example, if we execute three processes in the pipeline, we can
examine the process control used by this shell:
ps -o pid,ppid,pgid,sid,comm | cat1 | cat2
generates the following output:
PID   PPID  PGID  SID   COMMAND
949   947   949   949   sh
1988  949   949   949   cat2
1989  1988  949   949   ps
1990  1988  949   949   cat1
Don't be alarmed if the output on your system doesn't show the proper command names.
Sometimes you might get results such as
PID   PPID  PGID  SID   COMMAND
949   947   949   949   sh
1831  949   949   949   sh
1832  1831  949   949   ps
1833  1831  949   949   sh
What's happening here is that the ps process is racing with the shell, which is forking and
executing the cat commands. In this case, the shell hasn't yet completed the call to exec
when ps has obtained the list of processes to print.
Again, the last process in the pipeline is the child of the shell, and all previous processes in
the pipeline are children of the last process. Figure 9.9 shows what is happening. Since the
last process in the pipeline is the child of the login shell, the shell is notified when that
process (cat2) terminates.
Figure 9.9. Processes in the pipeline ps | cat1 | cat2 when invoked by Bourne shell
Now let's examine the same examples using a job-control shell running on Linux. This shows
the way these shells handle background jobs. We'll use the Bourne-again shell in this example;
the results with other job-control shells are almost identical.
ps -o pid,ppid,pgrp,session,tpgid,comm
gives us
PID   PPID  PGRP  SESS  TPGID  COMMAND
2837  2818  2837  2837  5796   bash
5796  2837  5796  2837  5796   ps
(Starting with this example, the foreground process group is the one whose PGRP value equals the TPGID value.) We
immediately have a difference from our Bourne shell example. The Bourne-again shell places
the foreground job (ps) into its own process group (5796). The ps command is the process
group leader and the only process in this process group.
Furthermore, this process group is the foreground process group, since it has the controlling
terminal. Our login shell is a background process group while the ps command executes. Note,
however, that both process groups, 2837 and 5796, are members of the same session.
Indeed, we'll see that the session never changes through our examples in this section.
Executing this process in the background,
ps -o pid,ppid,pgrp,session,tpgid,comm &
gives us
PID   PPID  PGRP  SESS  TPGID  COMMAND
2837  2818  2837  2837  2837   bash
5797  2837  5797  2837  2837   ps
Again, the ps command is placed into its own process group, but this time the process group
(5797) is no longer the foreground process group. It is a background process group. The
TPGID of 2837 indicates that the foreground process group is our login shell.
Executing two processes in a pipeline, as in
ps -o pid,ppid,pgrp,session,tpgid,comm | cat1
gives us
PID   PPID  PGRP  SESS  TPGID  COMMAND
2837  2818  2837  2837  5799   bash
5799  2837  5799  2837  5799   ps
5800  2837  5799  2837  5799   cat1
Both processes, ps and cat1, are placed into a new process group (5799), and this is the
foreground process group. We can also see another difference between this example and the
similar Bourne shell example. The Bourne shell created the last process in the pipeline first, and
this final process was the parent of the first process. Here, the Bourne-again shell is the
parent of both processes. If we execute this pipeline in the background,
ps -o pid,ppid,pgrp,session,tpgid,comm | cat1 &
the results are similar, but now ps and cat1 are placed in the same background process group:
PID   PPID  PGRP  SESS  TPGID  COMMAND
2837  2818  2837  2837  2837   bash
5801  2837  5801  2837  2837   ps
5802  2837  5801  2837  2837   cat1
Note that the order in which a shell creates processes can differ depending on the particular
shell in use.
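The job-control arrangement seen above, where the shell is the parent of every process in the pipeline and all of them share one new process group led by the first, can be sketched for a two-stage pipeline. run_pipeline2 is a hypothetical name; the sketch does not give the terminal to the new group, and error handling is trimmed (for example, a failed second fork leaves the first process running).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" as one job.  Both processes
   are children of the caller and share a new process group whose ID
   is left's PID.  Returns right's wait status, or -1 on error. */
int run_pipeline2(char *const left[], char *const right[])
{
    int fd[2], status;
    pid_t lpid, rpid;

    if (pipe(fd) < 0)
        return -1;

    if ((lpid = fork()) < 0)
        return -1;
    if (lpid == 0) {                    /* first stage: stdout -> pipe */
        setpgid(0, 0);                  /* lead a new process group */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(left[0], left);
        _exit(127);
    }
    setpgid(lpid, lpid);                /* same request, from the shell */

    if ((rpid = fork()) < 0)
        return -1;
    if (rpid == 0) {                    /* second stage: stdin <- pipe */
        setpgid(0, lpid);               /* join the first stage's group */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(right[0], right);
        _exit(127);
    }
    setpgid(rpid, lpid);

    close(fd[0]);                       /* the shell keeps no pipe ends */
    close(fd[1]);

    if (waitpid(lpid, &status, 0) != lpid ||
        waitpid(rpid, &status, 0) != rpid)
        return -1;
    return status;
}
```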
9.10. Orphaned Process Groups
We've mentioned that a process whose parent terminates is called an orphan and is inherited
by the init process. We now look at entire process groups that can be orphaned and how
POSIX.1 handles this situation.
Example
Consider a process that forks a child and then terminates. Although this is nothing abnormal
(it happens all the time), what happens if the child is stopped (using job control) when the
parent terminates? How will the child ever be continued, and does the child know that it has
been orphaned? Figure 9.10 shows this situation: the parent process has forked a child that
stops, and the parent is about to exit.
The program that creates this situation is shown in Figure 9.11. This program has some new
features. Here, we are assuming a job-control shell. Recall from the previous section that the
shell places the foreground process into its own process group (6099 in this example) and that
the shell stays in its own process group (2837). The child inherits the process group of its
parent (6099). After the fork,
• The parent sleeps for 5 seconds. This is our (imperfect) way of letting the child
execute before the parent terminates.
• The child establishes a signal handler for the hang-up signal (SIGHUP). This is so we can
see whether SIGHUP is sent to the child. (We discuss signal handlers in Chapter 10.)
• The child sends itself the stop signal (SIGTSTP) with the kill function. This stops the
child, similar to our stopping a foreground job with our terminal's suspend character
(Control-Z).
• When the parent terminates, the child is orphaned, so the child's parent process ID
becomes 1, the init process ID.
• At this point, the child is now a member of an orphaned process group. The POSIX.1
definition of an orphaned process group is one in which the parent of every member is
either itself a member of the group or is not a member of the group's session. Another
way of wording this is that the process group is not orphaned as long as a process in
the group has a parent in a different process group but in the same session. If the
process group is not orphaned, there is a chance that one of those parents in a
different process group but in the same session will restart a stopped process in the
process group that is not orphaned. Here, the parent of every process in the group
(e.g., process 1 is the parent of process 6100) belongs to another session.
• Since the process group is orphaned when the parent terminates, POSIX.1 requires
that every process in the newly orphaned process group that is stopped (as our child
is) be sent the hang-up signal (SIGHUP) followed by the continue signal (SIGCONT).
• This causes the child to be continued, after processing the hang-up signal. The default
action for the hang-up signal is to terminate the process, so we have to provide a
signal handler to catch the signal. We therefore expect the printf in the sig_hup
function to appear before the printf in the pr_ids function.
Here is the output from the program shown in Figure 9.11:
$ ./a.out
parent: pid = 6099, ppid = 2837, pgrp = 6099, tpgrp = 6099
child: pid = 6100, ppid = 6099, pgrp = 6099, tpgrp = 6099
$ SIGHUP received, pid = 6100
child: pid = 6100, ppid = 1, pgrp = 6099, tpgrp = 2837
read error from controlling TTY, errno = 5
Note that our shell prompt appears with the output from the child, since two processes, our
login shell and the child, are writing to the terminal. As we expect, the parent process ID of the
child has become 1.
After calling pr_ids in the child, the program tries to read from standard input. As we saw
earlier in this chapter, when a background process group tries to read from its controlling
terminal, SIGTTIN is generated for the background process group. But here we have an
orphaned process group; if the kernel were to stop it with this signal, the processes in the
process group would probably never be continued. POSIX.1 specifies that the read is to return
an error with errno set to EIO (whose value is 5 on this system) in this situation.
Finally, note that our child was placed in a background process group when the parent
terminated, since the parent was executed as a foreground job by the shell.
Figure 9.10. Example of a process group about to be orphaned
Figure 9.11. Creating an orphaned process group
#include "apue.h"
#include <errno.h>

static void
sig_hup(int signo)
{
    printf("SIGHUP received, pid = %d\n", getpid());
}

static void
pr_ids(char *name)
{
    printf("%s: pid = %d, ppid = %d, pgrp = %d, tpgrp = %d\n",
        name, getpid(), getppid(), getpgrp(), tcgetpgrp(STDIN_FILENO));
    fflush(stdout);
}

int
main(void)
{
    char    c;
    pid_t   pid;

    pr_ids("parent");
    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid > 0) {           /* parent */
        sleep(5);                   /* sleep to let child stop itself */
        exit(0);                    /* then parent exits */
    } else {                        /* child */
        pr_ids("child");
        signal(SIGHUP, sig_hup);    /* establish signal handler */
        kill(getpid(), SIGTSTP);    /* stop ourself */
        pr_ids("child");            /* prints only if we're continued */
        if (read(STDIN_FILENO, &c, 1) != 1)
            printf("read error from controlling TTY, errno = %d\n",
                errno);
        exit(0);
    }
}
We'll see another example of orphaned process groups in Section 19.5 with the pty program.
9.11. FreeBSD Implementation
Having talked about the various attributes of a process, process group, session, and
controlling terminal, it's worth looking at how all this can be implemented. We'll look briefly at
the implementation used by FreeBSD. Some details of the SVR4 implementation of these
features can be found in Williams [1989]. Figure 9.12 shows the various data structures used
by FreeBSD.
Figure 9.12. FreeBSD implementation of sessions and process groups
Let's look at all the fields that we've labeled, starting with the session structure. One of these
structures is allocated for each session (e.g., each time setsid is called).

s_count is the number of process groups in the session. When this counter is
decremented to 0, the structure can be freed.

s_leader is a pointer to the proc structure of the session leader.

s_ttyvp is a pointer to the vnode structure of the controlling terminal.

s_ttyp is a pointer to the tty structure of the controlling terminal.

s_sid is the session ID. Recall that the concept of a session ID is not part of the Single
UNIX Specification.
When setsid is called, a new session structure is allocated within the kernel. Now s_count is
set to 1, s_leader is set to point to the proc structure of the calling process, s_sid is set to
the process ID, and s_ttyvp and s_ttyp are set to null pointers, since the new session doesn't
have a controlling terminal.
Let's move to the tty structure. The kernel contains one of these structures for each terminal
device and each pseudo-terminal device. (We talk more about pseudo terminals in Chapter 19.)

t_session points to the session structure that has this terminal as its controlling
terminal. (Note that the tty structure points to the session structure and vice versa.)
This pointer is used by the terminal to send a hang-up signal to the session leader if
the terminal loses carrier (Figure 9.7).

t_pgrp points to the pgrp structure of the foreground process group. This field is used
by the terminal driver to send signals to the foreground process group. The three
signals generated by entering special characters (interrupt, quit, and suspend) are
sent to the foreground process group.

t_termios is a structure containing all the special characters and related information for
this terminal, such as the baud rate, whether echo is on or off, and so on. We'll return to this
structure in Chapter 18.

t_winsize is a winsize structure that contains the current size of the terminal window.
When the size of the terminal window changes, the SIGWINCH signal is sent to the
foreground process group. We show how to set and fetch the terminal's current
window size in Section 18.12.
Note that to find the foreground process group of a particular session, the kernel has to start
with the session structure, follow s_ttyp to get to the controlling terminal's tty structure, and
then follow t_pgrp to get to the foreground process group's pgrp structure. The pgrp structure
contains the information for a particular process group.

pg_id is the process group ID.

pg_session points to the session structure for the session to which this process group
belongs.

pg_members is a pointer to the list of proc structures that are members of this process
group. The p_pglist field in each proc structure is a doubly linked list entry that points
to both the next process and the previous process in the group; following the next pointers
eventually leads to a null pointer in the proc structure of the last process in the group.
The proc structure contains all the information for a single process.

p_pid contains the process ID.

p_pptr is a pointer to the proc structure of the parent process.

p_pgrp points to the pgrp structure of the process group to which this process belongs.

p_pglist is a structure containing pointers to the next and previous processes in the
process group, as we mentioned earlier.
Finally, we have the vnode structure. This structure is allocated when the controlling terminal
device is opened. All references to /dev/tty in a process go through this vnode structure. We
show the actual i-node as being part of the v-node.
9.12. Summary
This chapter has described the relationships between groups of processes: sessions, which
are made up of process groups. Job control is a feature supported by most UNIX systems
today, and we've described how it's implemented by a shell that supports job control. The
controlling terminal for a process, /dev/tty, is also involved in these process relationships.
We've made numerous references to the signals that are used in all these process
relationships. The next chapter continues the discussion of signals, looking at all the UNIX
System signals in detail.
Exercises
9.1 Refer back to our discussion of the utmp and wtmp files in Section 6.8. Why are
the logout records written by the init process? Is this handled the same way
for a network login?
9.2 Write a small program that calls fork and has the child create a new session.
Verify that the child becomes a process group leader and that the child no
longer has a controlling terminal.
Chapter 10. Signals
Section 10.1. Introduction
Section 10.2. Signal Concepts
Section 10.3. signal Function
Section 10.4. Unreliable Signals
Section 10.5. Interrupted System Calls
Section 10.6. Reentrant Functions
Section 10.7. SIGCLD Semantics
Section 10.8. Reliable-Signal Terminology and Semantics
Section 10.9. kill and raise Functions
Section 10.10. alarm and pause Functions
Section 10.11. Signal Sets
Section 10.12. sigprocmask Function
Section 10.13. sigpending Function
Section 10.14. sigaction Function
Section 10.15. sigsetjmp and siglongjmp Functions
Section 10.16. sigsuspend Function
Section 10.17. abort Function
Section 10.18. system Function
Section 10.19. sleep Function
Section 10.20. Job-Control Signals
Section 10.21. Additional Features
Section 10.22. Summary
Exercises
10.1. Introduction
Signals are software interrupts. Most nontrivial application programs need to deal with signals.
Signals provide a way of handling asynchronous events: a user at a terminal typing the
interrupt key to stop a program or the next program in a pipeline terminating prematurely.
Signals have been provided since the early versions of the UNIX System, but the signal model
provided with systems such as Version 7 was not reliable. Signals could get lost, and it was
difficult for a process to turn off selected signals when executing critical regions of code. Both
4.3BSD and SVR3 made changes to the signal model, adding what are called reliable signals.
But the changes made by Berkeley and AT&T were incompatible. Fortunately, POSIX.1
standardized the reliable-signal routines, and that is what we describe here.
In this chapter, we start with an overview of signals and a description of what each signal is
normally used for. Then we look at the problems with earlier implementations. It is often
important to understand what is wrong with an implementation before seeing how to do things
correctly. This chapter contains numerous examples that are not entirely correct and a
discussion of the defects.
10.2. Signal Concepts
First, every signal has a name. These names all begin with the three characters SIG. For
example, SIGABRT is the abort signal that is generated when a process calls the abort
function. SIGALRM is the alarm signal that is generated when the timer set by the alarm
function goes off. Version 7 had 15 different signals; SVR4 and 4.4BSD both have 31 different
signals. FreeBSD 5.2.1, Mac OS X 10.3, and Linux 2.4.22 support 31 different signals, whereas
Solaris 9 supports 38 different signals. Both Linux and Solaris, however, support additional
application-defined signals as real-time extensions (the real-time extensions in POSIX aren't
covered in this book; refer to Gallmeister [1995] for more information).
These names are all defined by positive integer constants (the signal number) in the header
<signal.h>.
Implementations actually define the individual signals in an alternate header file, but this
header file is included by <signal.h>. It is considered bad form for the kernel to include header
files meant for user-level applications, so if the applications and the kernel both need the same
definitions, the information is placed in a kernel header file that is then included by the
user-level header file. Thus, both FreeBSD 5.2.1 and Mac OS X 10.3 define the signals in
<sys/signal.h>. Linux 2.4.22 defines the signals in <bits/signum.h>, and Solaris 9 defines them
in <sys/iso/signal_iso.h>.
No signal has a signal number of 0. We'll see in Section 10.9 that the kill function uses the
signal number of 0 for a special case. POSIX.1 calls this value the null signal.
Numerous conditions can generate a signal.

The terminal-generated signals occur when users press certain terminal keys. Pressing
the DELETE key on the terminal (or Control-C on many systems) normally causes the
interrupt signal (SIGINT) to be generated. This is how to stop a runaway program. (We'll
see in Chapter 18 how this signal can be mapped to any character on the terminal.)

Hardware exceptions generate signals: divide by 0, invalid memory reference, and the
like. These conditions are usually detected by the hardware, and the kernel is notified.
The kernel then generates the appropriate signal for the process that was running at
the time the condition occurred. For example, SIGSEGV is generated for a process that
executes an invalid memory reference.

The kill(2) function allows a process to send any signal to another process or process
group. Naturally, there are limitations: we have to be the owner of the process that
we're sending the signal to, or we have to be the superuser.

The kill(1) command allows us to send signals to other processes. This program is just
an interface to the kill function. This command is often used to terminate a runaway
background process.

Software conditions can generate signals when something happens about which the
process should be notified. These aren't hardware-generated conditions (as is the
divide-by-0 condition), but software conditions. Examples are SIGURG (generated when
out-of-band data arrives over a network connection), SIGPIPE (generated when a
process writes to a pipe after the reader of the pipe has terminated), and SIGALRM
(generated when an alarm clock set by the process expires).
Signals are classic examples of asynchronous events. Signals occur at what appear to be
random times to the process. The process can't simply test a variable (such as errno) to see
whether a signal has occurred; instead, the process has to tell the kernel "if and when this
signal occurs, do the following."
We can tell the kernel to do one of three things when a signal occurs. We call this the
disposition of the signal, or the action associated with a signal.
1.
Ignore the signal. This works for most signals, but two signals can never be ignored:
SIGKILL and SIGSTOP. The reason these two signals can't be ignored is to provide the
kernel and the superuser with a surefire way of either killing or stopping any process.
Also, if we ignore some of the signals that are generated by a hardware exception
(such as illegal memory reference or divide by 0), the behavior of the process is
undefined.
2.
Catch the signal. To do this, we tell the kernel to call a function of ours whenever the
signal occurs. In our function, we can do whatever we want to handle the condition. If
we're writing a command interpreter, for example, when the user generates the
interrupt signal at the keyboard, we probably want to return to the main loop of the
program, terminating whatever command we were executing for the user. If the SIGCHLD
signal is caught, it means that a child process has terminated, so the signal-catching
function can call waitpid to fetch the child's process ID and termination status. As
another example, if the process has created temporary files, we may want to write a
signal-catching function for the SIGTERM signal (the termination signal that is the default
signal sent by the kill command) to clean up the temporary files. Note that the two
signals SIGKILL and SIGSTOP can't be caught.
3.
Let the default action apply. Every signal has a default action, shown in Figure 10.1.
Note that the default action for most signals is to terminate the process.
Figure 10.1. UNIX System signals
Name        Description                           ISO C  SUS  FreeBSD  Linux   Mac OS X  Solaris  Default action
                                                              5.2.1    2.4.22  10.3      9
SIGABRT     abnormal termination (abort)          •      •    •        •       •         •        terminate+core
SIGALRM     timer expired (alarm)                 -      •    •        •       •         •        terminate
SIGBUS      hardware fault                        -      •    •        •       •         •        terminate+core
SIGCANCEL   threads library internal use          -      -    -        -       -         •        ignore
SIGCHLD     change in status of child             -      •    •        •       •         •        ignore
SIGCONT     continue stopped process              -      •    •        •       •         •        continue/ignore
SIGEMT      hardware fault                        -      -    •        •       •         •        terminate+core
SIGFPE      arithmetic exception                  •      •    •        •       •         •        terminate+core
SIGFREEZE   checkpoint freeze                     -      -    -        -       -         •        ignore
SIGHUP      hangup                                -      •    •        •       •         •        terminate
SIGILL      illegal instruction                   •      •    •        •       •         •        terminate+core
SIGINFO     status request from keyboard          -      -    •        -       •         -        ignore
SIGINT      terminal interrupt character          •      •    •        •       •         •        terminate
SIGIO       asynchronous I/O                      -      -    •        •       •         •        terminate/ignore
SIGIOT      hardware fault                        -      -    •        •       •         •        terminate+core
SIGKILL     termination                           -      •    •        •       •         •        terminate
SIGLWP      threads library internal use          -      -    -        -       -         •        ignore
SIGPIPE     write to pipe with no readers         -      •    •        •       •         •        terminate
SIGPOLL     pollable event (poll)                 -      XSI  -        •       -         •        terminate
SIGPROF     profiling time alarm (setitimer)      -      XSI  •        •       •         •        terminate
SIGPWR      power fail/restart                    -      -    -        •       -         •        terminate/ignore
SIGQUIT     terminal quit character               -      •    •        •       •         •        terminate+core
SIGSEGV     invalid memory reference              •      •    •        •       •         •        terminate+core
SIGSTKFLT   coprocessor stack fault               -      -    -        •       -         -        terminate
SIGSTOP     stop                                  -      •    •        •       •         •        stop process
SIGSYS      invalid system call                   -      XSI  •        •       •         •        terminate+core
SIGTERM     termination                           •      •    •        •       •         •        terminate
SIGTHAW     checkpoint thaw                       -      -    -        -       -         •        ignore
SIGTRAP     hardware fault                        -      XSI  •        •       •         •        terminate+core
SIGTSTP     terminal stop character               -      •    •        •       •         •        stop process
SIGTTIN     background read from control tty      -      •    •        •       •         •        stop process
SIGTTOU     background write to control tty       -      •    •        •       •         •        stop process
SIGURG      urgent condition (sockets)            -      •    •        •       •         •        ignore
SIGUSR1     user-defined signal                   -      •    •        •       •         •        terminate
SIGUSR2     user-defined signal                   -      •    •        •       •         •        terminate
SIGVTALRM   virtual time alarm (setitimer)        -      XSI  •        •       •         •        terminate
SIGWAITING  threads library internal use          -      -    -        -       -         •        ignore
SIGWINCH    terminal window size change           -      -    •        •       •         •        ignore
SIGXCPU     CPU limit exceeded (setrlimit)        -      XSI  •        •       •         •        terminate+core/ignore
SIGXFSZ     file size limit exceeded (setrlimit)  -      XSI  •        •       •         •        terminate+core/ignore
SIGXRES     resource control exceeded             -      -    -        -       -         •        ignore
Figure 10.1 lists the names of all the signals, an indication of which systems support the
signal, and the default action for the signal. The SUS column contains • if the signal is defined
as part of the base POSIX.1 specification and XSI if it is defined as an XSI extension to the
base.
When the default action is labeled "terminate+core," it means that a memory image of the
process is left in the file named core in the current working directory of the process. (The
name core, which dates from the era of magnetic-core memory, shows how long this feature has
been part of the UNIX System.) This file can be used with most UNIX System debuggers to
examine the state of the process at the time it terminated.
The generation of the core file is an implementation feature of most versions of the UNIX
System. Although this feature is not part of POSIX.1, it is mentioned as a potential
implementation-specific action in the Single UNIX Specification's XSI extension.
The name of the core file varies among implementations. On FreeBSD 5.2.1, for example, the
core file is named cmdname.core, where cmdname is the name of the command corresponding
to the process that received the signal. On Mac OS X 10.3, the core file is named core.pid,
where pid is the ID of the process that received the signal. (These systems allow the core
filename to be configured via a sysctl parameter.)
Most implementations leave the core file in the current working directory of the corresponding
process; Mac OS X places all core files in /cores instead.
The core file will not be generated if (a) the process was set-user-ID and the current user is
not the owner of the program file, (b) the process was set-group-ID and the current user is
not the group owner of the file, (c) the user does not have permission to write in the current
working directory, (d) the file already exists and the user does not have permission to write to
it, or (e) the file is too big (recall the RLIMIT_CORE limit in Section 7.11). The permissions of
the core file (assuming that the file doesn't already exist) are usually user-read and user-write,
although Mac OS X sets only user-read.
In Figure 10.1, the signals with a description "hardware fault" correspond to
implementation-defined hardware faults. Many of these names are taken from the original
PDP-11 implementation of the UNIX System. Check your system's manuals to determine exactly
what type of error these signals correspond to.
We now describe each of these signals in more detail.
SIGABRT
This signal is generated by calling the abort function (Section 10.17). The
process terminates abnormally.
SIGALRM
This signal is generated when a timer set with the alarm function expires (see
Section 10.10 for more details). This signal is also generated when an interval
timer set by the setitimer(2) function expires.
SIGBUS
This indicates an implementation-defined hardware fault. Implementations
usually generate this signal on certain types of memory faults, as we describe
in Section 14.9.
SIGCANCEL
This signal is used internally by the Solaris threads library. It is not meant for
general use.
SIGCHLD
Whenever a process terminates or stops, the SIGCHLD signal is sent to the
parent. By default, this signal is ignored, so the parent must catch this signal if
it wants to be notified whenever a child's status changes. The normal action in
the signal-catching function is to call one of the wait functions to fetch the
child's process ID and termination status.
Earlier releases of System V had a similar signal named SIGCLD (without the H).
The semantics of this signal were different from those of other signals, and as
far back as SVR2, the manual page strongly discouraged its use in new
programs. (Strangely enough, this warning disappeared in the SVR3 and SVR4
versions of the manual page.) Applications should use the standard SIGCHLD
signal, but be aware that many systems define SIGCLD to be the same as
SIGCHLD for backward compatibility. If you maintain software that uses SIGCLD,
you need to check your system's manual page to see what semantics it
follows. We discuss these two signals in Section 10.7.
SIGCONT
This job-control signal is sent to a stopped process when it is continued. The
default action is to continue a stopped process, but to ignore the signal if the
process wasn't stopped. A full-screen editor, for example, might catch this
signal and use the signal handler to make a note to redraw the terminal screen.
See Section 10.20 for additional details.
SIGEMT
This indicates an implementation-defined hardware fault.
The name EMT comes from the PDP-11 "emulator trap" instruction. Not all
platforms support this signal. On Linux, for example, SIGEMT is supported only for
selected architectures, such as SPARC, MIPS, and PA-RISC.
SIGFPE
This signals an arithmetic exception, such as divide by 0, floating-point
overflow, and so on.
SIGFREEZE
This signal is defined only by Solaris. It is used to notify processes that need to
take special action before freezing the system state, such as might happen
when a system goes into hibernation or suspended mode.
SIGHUP
This signal is sent to the controlling process (session leader) associated with a
controlling terminal if a disconnect is detected by the terminal interface.
Referring to Figure 9.12, we see that the signal is sent to the process pointed
to by the s_leader field in the session structure. This signal is generated for
this condition only if the terminal's CLOCAL flag is not set. (The CLOCAL flag for a
terminal is set if the attached terminal is local. The flag tells the terminal driver
to ignore all modem status lines. We describe how to set this flag in Chapter 18.)
Note that the session leader that receives this signal may be in the
background; see Figure 9.7 for an example. This differs from the normal
terminal-generated signals (interrupt, quit, and suspend), which are always
delivered to the foreground process group.
This signal is also generated if the session leader terminates. In this case, the
signal is sent to each process in the foreground process group.
This signal is commonly used to notify daemon processes (Chapter 13) to
reread their configuration files. The reason SIGHUP is chosen for this is that a
daemon should not have a controlling terminal and would normally never receive
this signal.
SIGILL
This signal indicates that the process has executed an illegal hardware
instruction.
4.3BSD generated this signal from the abort function. SIGABRT is now used for
this.
SIGINFO
This BSD signal is generated by the terminal driver when we type the status
key (often Control-T). This signal is sent to all processes in the foreground
process group (refer to Figure 9.8). This signal normally causes status
information on processes in the foreground process group to be displayed on
the terminal.
Linux doesn't provide support for SIGINFO except on the Alpha platform, where
it is defined to be the same value as SIGPWR.
SIGINT
This signal is generated by the terminal driver when we type the interrupt key
(often DELETE or Control-C). This signal is sent to all processes in the
foreground process group (refer to Figure 9.8). This signal is often used to
terminate a runaway program, especially when it's generating a lot of unwanted
output on the screen.
SIGIO
This signal indicates an asynchronous I/O event. We discuss it in Section
14.6.2.
In Figure 10.1, we labeled the default action for SIGIO as either "terminate" or
"ignore." Unfortunately, the default depends on the system. Under System V,
SIGIO is identical to SIGPOLL, so its default action is to terminate the process.
Under BSD, the default is to ignore the signal.
Linux 2.4.22 and Solaris 9 define SIGIO to be the same value as SIGPOLL, so the
default behavior is to terminate the process. On FreeBSD 5.2.1 and Mac OS X
10.3, the default is to ignore the signal.
SIGIOT
This indicates an implementation-defined hardware fault.
The name IOT comes from the PDP-11 mnemonic for the "input/output TRAP"
instruction. Earlier versions of System V generated this signal from the abort
function. SIGABRT is now used for this.
On FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3, and Solaris 9, SIGIOT is defined
to be the same value as SIGABRT.
SIGKILL
This signal is one of the two that can't be caught or ignored. It provides the
system administrator with a sure way to kill any process.
SIGLWP
This signal is used internally by the Solaris threads library, and is not available
for general use.
SIGPIPE
If we write to a pipeline but the reader has terminated, SIGPIPE is generated.
We describe pipes in Section 15.2. This signal is also generated when a process
writes to a socket of type SOCK_STREAM that is no longer connected. We
describe sockets in Chapter 16.
SIGPOLL
This signal can be generated when a specific event occurs on a pollable device.
We describe this signal with the poll function in Section 14.5.2. SIGPOLL
originated with SVR3, and loosely corresponds to the BSD SIGIO and SIGURG
signals.
On Linux and Solaris, SIGPOLL is defined to have the same value as SIGIO.
SIGPROF
This signal is generated when a profiling interval timer set by the setitimer(2)
function expires.
SIGPWR
This signal is system dependent. Its main use is on a system that has an
uninterruptible power supply (UPS). If power fails, the UPS takes over and the
software can usually be notified. Nothing needs to be done at this point, as the
system continues running on battery power. But if the battery gets low (if the
power is off for an extended period), the software is usually notified again; at
this point, it behooves the system to shut everything down within about 15 to 30
seconds. This is when SIGPWR should be sent. Most systems have the process
that is notified of the low-battery condition send the SIGPWR signal to the init
process, and init handles the shutdown.
Linux 2.4.22 and Solaris 9 have entries in the inittab file for this purpose:
powerfail and powerwait (or powerokwait).
In Figure 10.1, we labeled the default action for SIGPWR as either "terminate" or
"ignore." Unfortunately, the default depends on the system. The default on
Linux is to terminate the process. On Solaris, the signal is ignored by default.
SIGQUIT
This signal is generated by the terminal driver when we type the terminal quit
key (often Control-backslash). This signal is sent to all processes in the
foreground process group (refer to Figure 9.8). This signal not only terminates
the foreground process group (as does SIGINT), but also generates a core file.
SIGSEGV
This signal indicates that the process has made an invalid memory reference.
The name SEGV stands for "segmentation violation."
SIGSTKFLT
This signal is defined only by Linux. This signal showed up in the earliest
versions of Linux, intended to be used for stack faults taken by the math
coprocessor. This signal is not generated by the kernel, but remains for
backward compatibility.
SIGSTOP
This job-control signal stops a process. It is like the interactive stop signal
(SIGTSTP), but SIGSTOP cannot be caught or ignored.
SIGSYS
This signals an invalid system call. Somehow, the process executed a machine
instruction that the kernel thought was a system call, but the parameter with
the instruction that indicates the type of system call was invalid. This might
happen if you build a program that uses a new system call and you then try to
run the same binary on an older version of the operating system where the
system call doesn't exist.
SIGTERM
This is the termination signal sent by the kill(1) command by default.
SIGTHAW
This signal is defined only by Solaris and is used to notify processes that need
to take special action when the system resumes operation after being
suspended.
SIGTRAP
This indicates an implementation-defined hardware fault.
The signal name comes from the PDP-11 TRAP instruction. Implementations
often use this signal to transfer control to a debugger when a breakpoint
instruction is executed.
SIGTSTP
This interactive stop signal is generated by the terminal driver when we type
the terminal suspend key (often Control-Z). This signal is sent to all processes
in the foreground process group (refer to Figure 9.8).
Unfortunately, the term stop has different meanings. When discussing job
control and signals, we talk about stopping and continuing jobs. The terminal
driver, however, has historically used the term stop to refer to stopping and
starting the terminal output using the Control-S and Control-Q characters.
Therefore, the terminal driver calls the character that generates the interactive
stop signal the suspend character, not the stop character.
SIGTTIN
This signal is generated by the terminal driver when a process in a background
process group tries to read from its controlling terminal. (Refer to the
discussion of this topic in Section 9.8.) As special cases, if either (a) the
reading process is ignoring or blocking this signal or (b) the process group of
the reading process is orphaned, then the signal is not generated; instead, the
read operation returns an error with errno set to EIO.
SIGTTOU
This signal is generated by the terminal driver when a process in a background
process group tries to write to its controlling terminal. (Refer to the discussion
of this topic in Section 9.8.) Unlike the SIGTTIN signal just described, a process
has a choice of allowing background writes to the controlling terminal. We
describe how to change this option in Chapter 18.
If background writes are not allowed, then like the SIGTTIN signal, there are
two special cases: if either (a) the writing process is ignoring or blocking this
signal or (b) the process group of the writing process is orphaned, then the
signal is not generated; instead, the write operation returns an error with errno
set to EIO.
Regardless of whether background writes are allowed, certain terminal
operations (other than writing) can also generate the SIGTTOU signal: tcsetattr,
tcsendbreak, tcdrain, tcflush, tcflow, and tcsetpgrp. We describe these
terminal operations in Chapter 18.
SIGURG
This signal notifies the process that an urgent condition has occurred. This
signal is optionally generated when out-of-band data is received on a network
connection.
SIGUSR1
This is a user-defined signal, for use in application programs.
SIGUSR2
This is another user-defined signal, similar to SIGUSR1, for use in application
programs.
SIGVTALRM
This signal is generated when a virtual interval timer set by the setitimer(2)
function expires.
SIGWAITING
This signal is used internally by the Solaris threads library and is not available
for general use.
SIGWINCH
The kernel maintains the size of the window associated with each terminal and
pseudo terminal. A process can get and set the window size with the ioctl
function, which we describe in Section 18.12. If a process changes the window
size from its previous value using the ioctl set-window-size command, the
kernel generates the SIGWINCH signal for the foreground process group.
SIGXCPU
The Single UNIX Specification supports the concept of resource limits as an XSI
extension; refer to Section 7.11. If the process exceeds its soft CPU time limit,
the SIGXCPU signal is generated.
In Figure 10.1, we labeled the default action for SIGXCPU as either "terminate
with a core file" or "ignore." Unfortunately, the default depends on the
operating system. Linux 2.4.22 and Solaris 9 support a default action of
terminate with a core file, whereas FreeBSD 5.2.1 and Mac OS X 10.3 support a
default action of ignore. The Single UNIX Specification requires that the default
action be to terminate the process abnormally. Whether a core file is generated
is left up to the implementation.
SIGXFSZ
This signal is generated if the process exceeds its soft file size limit; refer to
Section 7.11.
Just as with SIGXCPU, the default action taken with SIGXFSZ depends on the
operating system. On Linux 2.4.22 and Solaris 9, the default is to terminate the
process and create a core file. On FreeBSD 5.2.1 and Mac OS X 10.3, the
default is to be ignored. The Single UNIX Specification requires that the default
action be to terminate the process abnormally. Whether a core file is generated
is left up to the implementation.
SIGXRES
This signal is defined only by Solaris. This signal is optionally used to notify
processes that have exceeded a preconfigured resource value. The Solaris
resource control mechanism is a general facility for controlling the use of
shared resources among independent application sets.
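Here is a minimal sketch of the SIGXFSZ case. It assumes a POSIX/XSI system with a writable /tmp, and demo_fsize_limit is a hypothetical name. The work runs in a child process so that the lowered limit cannot affect the caller. The child ignores SIGXFSZ, lowers its soft RLIMIT_FSIZE limit to 10 bytes, and then writes past that limit. With the signal ignored, POSIX specifies two outcomes. A write that would cross the limit transfers only the bytes that fit. A write that starts at the limit fails with errno set to EFBIG.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if both writes behave as described,
   otherwise the child's exit code identifying the step that differed. */
int demo_fsize_limit(void)
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                        /* child: the limit dies with it */
        char path[] = "/tmp/fsizeXXXXXX";  /* assumed writable directory */
        char buf[20];
        struct rlimit rl;
        int fd;

        if ((fd = mkstemp(path)) < 0)
            _exit(2);
        unlink(path);                      /* file vanishes once fd is closed */
        signal(SIGXFSZ, SIG_IGN);          /* write fails instead of terminating */
        if (getrlimit(RLIMIT_FSIZE, &rl) < 0)
            _exit(3);
        rl.rlim_cur = 10;                  /* soft limit: 10 bytes */
        if (setrlimit(RLIMIT_FSIZE, &rl) < 0)
            _exit(3);
        memset(buf, 'x', sizeof(buf));
        if (write(fd, buf, sizeof(buf)) != 10)
            _exit(4);                      /* expected: partial write up to the limit */
        if (write(fd, buf, sizeof(buf)) != -1 || errno != EFBIG)
            _exit(5);                      /* expected: EFBIG once at the limit */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

If SIGXFSZ were left at its default instead, the second write would deliver the signal. On Linux and Solaris that would terminate the child with a core file, as described above.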
10.3. signal Function
The simplest interface to the signal features of the UNIX System is the signal function.
#include <signal.h>
void (*signal(int signo, void (*func)(int)))(int);
Returns: previous disposition of signal (see following) if OK, SIG_ERR on error
The signal function is defined by ISO C, which doesn't involve multiple processes, process
groups, terminal I/O, and the like. Therefore, its definition of signals is vague enough to be
almost useless for UNIX systems.
Implementations derived from UNIX System V support the signal function, but it provides the
old unreliable-signal semantics. (We describe these older semantics in Section 10.4.) This
function provides backward compatibility for applications that require the older semantics.
New applications should not use these unreliable signals.
4.4BSD also provides the signal function, but it is defined in terms of the sigaction function
(which we describe in Section 10.14), so using it under 4.4BSD provides the newer
reliable-signal semantics. FreeBSD 5.2.1 and Mac OS X 10.3 follow this strategy.
Solaris 9 has roots in both System V and BSD, but it chooses to follow the System V
semantics for the signal function.
On Linux 2.4.22, the semantics of signal can follow either the BSD or System V semantics,
depending on the version of the C library and how you compile your application.
Because the semantics of signal differ among implementations, it is better to use the
sigaction function instead. When we describe the sigaction function in Section 10.14, we
provide an implementation of signal that uses it. All the examples in this text use the signal
function that we show in Figure 10.18.
The signo argument is just the name of the signal from Figure 10.1. The value of func is (a)
the constant SIG_IGN, (b) the constant SIG_DFL, or (c) the address of a function to be called
when the signal occurs. If we specify SIG_IGN, we are telling the system to ignore the signal.
(Remember that we cannot ignore the two signals SIGKILL and SIGSTOP.) When we specify
SIG_DFL, we are setting the action associated with the signal to its default value (see the final
column in Figure 10.1). When we specify the address of a function to be called when the
signal occurs, we are arranging to "catch" the signal. We call the function either the signal
handler or the signal-catching function.
The prototype for the signal function states that the function requires two arguments and
returns a pointer to a function that returns nothing (void). The signal function's first
argument, signo, is an integer. The second argument is a pointer to a function that takes a
single integer argument and returns nothing. The function whose address is returned as the
value of signal takes a single integer argument (the final (int)). In plain English, this
declaration says that the signal handler is passed a single integer argument (the signal
number) and that it returns nothing. When we call signal to establish the signal handler, the
second argument is a pointer to the function. The return value from signal is the pointer to
the previous signal handler.
Many systems call the signal handler with additional, implementation-dependent arguments.
We discuss this further in Section 10.14.
The perplexing signal function prototype shown at the beginning of this section can be made
much simpler through the use of the following typedef [Plauger 1992]:
typedef void Sigfunc(int);
Then the prototype becomes
Sigfunc *signal(int, Sigfunc *);
We've included this typedef in apue.h (Appendix B) and use it with the functions in this
chapter.
If we examine the system's header <signal.h>, we probably find declarations of the form
#define SIG_ERR (void (*)())-1
#define SIG_DFL (void (*)())0
#define SIG_IGN (void (*)())1
These constants can be used in place of the "pointer to a function that takes an integer
argument and returns nothing," the second argument to signal, and the return value from
signal. The three values used for these constants need not be -1, 0, and 1. They must be
three values that can never be the address of any declarable function. Most UNIX systems
use the values shown.
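The following sketch exercises these return values on a POSIX system. The function and handler names are hypothetical. It shows SIG_ERR, with errno set to EINVAL, when we try to catch SIGKILL or ignore SIGSTOP. It also shows, using the Sigfunc typedef, that each successful call returns the previous disposition.

The return values are checked before the signal is raised. This ordering matters because, as noted earlier, implementations differ on whether a handler installed with signal stays installed after delivery. With glibc, the behavior even depends on which feature-test macros are in effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>

typedef void Sigfunc(int);           /* the typedef from the text */

static volatile sig_atomic_t caught;

static void on_usr1(int signo)       /* hypothetical handler */
{
    (void)signo;
    caught = 1;                      /* only set a flag */
}

/* Hypothetical demo: returns 0 if all checks pass, else the failing step. */
int demo_signal_returns(void)
{
    Sigfunc *prev;

    errno = 0;
    if (signal(SIGKILL, on_usr1) != SIG_ERR || errno != EINVAL)
        return 1;                    /* SIGKILL cannot be caught */
    if (signal(SIGSTOP, SIG_IGN) != SIG_ERR)
        return 2;                    /* SIGSTOP cannot be ignored */

    signal(SIGUSR1, SIG_DFL);        /* start from a known disposition */
    prev = signal(SIGUSR1, on_usr1);
    if (prev != SIG_DFL)
        return 3;                    /* previous disposition: the default */
    prev = signal(SIGUSR1, on_usr1);
    if (prev != on_usr1)
        return 4;                    /* previous disposition: our handler */

    raise(SIGUSR1);                  /* send the signal to ourselves */
    if (!caught)
        return 5;
    signal(SIGUSR1, SIG_DFL);        /* restore the default action */
    return 0;
}
```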
Example
Figure 10.2 shows a simple signal handler that catches either of the two user-defined signals
and prints the signal number. In Section 10.10, we describe the pause function, which simply
suspends the calling process until a signal is received.
We invoke the program in the background and use the kill(1) command to send it signals.
Note that the term kill in the UNIX System is a misnomer. The kill(1) command and the
kill(2) function just send a signal to a process or process group. Whether or not that signal
terminates the process depends on which signal is sent and whether the process has arranged
to catch the signal.
$ ./a.out &                 start process in background
[1]     7216                job-control shell prints job number and process ID
$ kill -USR1 7216           send it SIGUSR1
received SIGUSR1
$ kill -USR2 7216           send it SIGUSR2
received SIGUSR2
$ kill 7216                 now send it SIGTERM
[1]+  Terminated            ./a.out
When we send the SIGTERM signal, the process is terminated, since it doesn't catch the signal,
and the default action for the signal is termination.
Figure 10.2. Simple program to catch SIGUSR1 and SIGUSR2
#include "apue.h"

static void sig_usr(int);   /* one handler for both signals */

int
main(void)
{
    if (signal(SIGUSR1, sig_usr) == SIG_ERR)
        err_sys("can't catch SIGUSR1");
    if (signal(SIGUSR2, sig_usr) == SIG_ERR)
        err_sys("can't catch SIGUSR2");
    for ( ; ; )
        pause();
}

static void
sig_usr(int signo)          /* argument is signal number */
{
    if (signo == SIGUSR1)
        printf("received SIGUSR1\n");
    else if (signo == SIGUSR2)
        printf("received SIGUSR2\n");
    else
        err_dump("received signal %d\n", signo);
}
Program Start-Up
When a program is executed, the status of all signals is either default or ignore. Normally, all
signals are set to their default action, unless the process that calls exec is ignoring the signal.
Specifically, the exec functions change the disposition of any signals being caught to their
default action and leave the status of all other signals alone. (Naturally, a signal that is being
caught by a process that calls exec cannot be caught by the same function in the new
program, since the address of the signal-catching function in the caller probably has no
meaning in the new program file that is executed.)
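Here is a minimal sketch of this exec rule. It assumes that /bin/sh exists and that its built-in kill accepts the -USR1 form. The names exec_with_disposition and on_usr1 are hypothetical. A child first sets SIGUSR1 either to be caught or to be ignored. It then execs a shell that sends SIGUSR1 to itself.

A caught signal reverts to its default action across exec, so the shell is terminated by the signal. An ignored signal stays ignored, so the shell exits normally.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_usr1(int signo)   /* hypothetical handler; never runs after exec */
{
    (void)signo;
}

/* Hypothetical demo: returns the raw wait status of the exec'ed shell. */
int exec_with_disposition(void (*disp)(int))
{
    struct sigaction sa;
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        sa.sa_handler = disp;        /* caught, SIG_IGN, or SIG_DFL */
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR1, &sa, NULL);
        /* $$ is the shell's own PID, so the shell signals itself. */
        execl("/bin/sh", "sh", "-c", "kill -USR1 $$; exit 0", (char *)0);
        _exit(127);                  /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling exec_with_disposition(on_usr1) should yield a status for which WIFSIGNALED is true and WTERMSIG is SIGUSR1. Calling exec_with_disposition(SIG_IGN) should yield a normal exit with status 0.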
One specific example is how an interactive shell treats the interrupt and quit signals for a
background process. With a shell that doesn't support job control, when we execute a
process in the background, as in
cc main.c &
the shell automatically sets the disposition of the interrupt and quit signals in the background
process to be ignored. This is so that if we type the interrupt character, it doesn't affect the
background process. If this weren't done and we typed the interrupt character, it would
terminate not only the foreground process, but also all the background processes.
Many interactive programs that catch these two signals have code that looks like
void sig_int(int), sig_quit(int);

if (signal(SIGINT, SIG_IGN) != SIG_IGN)
    signal(SIGINT, sig_int);
if (signal(SIGQUIT, SIG_IGN) != SIG_IGN)
    signal(SIGQUIT, sig_quit);
With this code, the process catches each signal only if the signal is not currently being ignored.
These two calls to signal also show a limitation of the signal function: we are not able to
determine the current disposition of a signal without changing the disposition. We'll see later in
this chapter how the sigaction function allows us to determine a signal's disposition without
changing it.
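With sigaction, passing a null pointer as the new action asks only for the current action. This lets us write the ignore-check idiom above without temporarily ignoring the signal. The sketch below assumes POSIX sigaction. The helpers current_disposition and install_unless_ignored, and the handler sig_int, are hypothetical. The reported handler is meaningful only when the action was not set with SA_SIGINFO.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

typedef void Sigfunc(int);        /* the typedef from the text */

/* Hypothetical helper: report the current action for signo without
   changing it; SIG_ERR on error. */
Sigfunc *current_disposition(int signo)
{
    struct sigaction old;

    if (sigaction(signo, NULL, &old) < 0)   /* null act: query only */
        return SIG_ERR;
    return old.sa_handler;
}

static void sig_int(int signo)    /* hypothetical handler */
{
    (void)signo;
}

/* Hypothetical helper: catch signo only if it is not already ignored
   (e.g., by a shell that started us in the background).
   Returns 1 if installed, 0 if left alone, -1 on error. */
int install_unless_ignored(int signo, Sigfunc *handler)
{
    struct sigaction sa;

    if (current_disposition(signo) == SIG_IGN)
        return 0;                 /* leave an inherited SIG_IGN alone */
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL) < 0 ? -1 : 1;
}
```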
Process Creation
When a process calls fork, the child inherits the parent's signal dispositions. Here, since the
child starts off with a copy of the parent's memory image, the address of a signal-catching
function has meaning in the child.
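Here is a small sketch of this inheritance, assuming POSIX fork and sigaction. The names are hypothetical. The parent installs a SIGUSR1 handler and forks. The child installs nothing itself. It sends itself SIGUSR1 and reports, through its exit status, whether the inherited handler ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got;

static void on_usr1(int signo)       /* hypothetical handler */
{
    (void)signo;
    got = 1;
}

/* Hypothetical demo: returns the child's exit status (42 if the inherited
   handler ran in the child), or -1 on error. */
int demo_fork_inherits(void)
{
    struct sigaction sa;
    pid_t pid;
    int status;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                  /* child: no sigaction call of its own */
        raise(SIGUSR1);
        _exit(got ? 42 : 1);         /* the handler address is valid here too */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);      /* the parent's own flag stays 0 */
}
```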
10.4. Unreliable Signals
In earlier versions of the UNIX System (such as Version 7), signals were unreliable. By this we
mean that signals could get lost: a signal could occur and the process would never know
about it. Also, a process had little control over a signal: a process could catch the signal or
ignore it. Sometimes, we would like to tell the kernel to block a signal: don't ignore it, just
remember if it occurs, and tell us later when we're ready.
Changes were made with 4.2BSD to provide what are called reliable signals. A different set of
changes was then made in SVR3 to provide reliable signals under System V. POSIX.1 chose
the BSD model to standardize.
One problem with these early versions is that the action for a signal was reset to its default
each time the signal occurred. (In the previous example, when we ran the program in Figure
10.2, we avoided this detail by catching each signal only once.) The classic example from
programming books that described these earlier systems concerns how to handle the interrupt
signal. The code that was described usually looked like
int     sig_int();              /* my signal handling function */
...
signal(SIGINT, sig_int);        /* establish handler */
...

sig_int()
{
    signal(SIGINT, sig_int);    /* reestablish handler for next time */
    ...                         /* process the signal ... */
}
(The reason the signal handler is declared as returning an integer is that these early systems
didn't support the ISO C void data type.)
The problem with this code fragment is that there is a window of time (after the signal has
occurred, but before the call to signal in the signal handler) when the interrupt signal could
occur another time. This second signal would cause the default action to occur, which for this
signal terminates the process. This is one of those conditions that works correctly most of the
time, causing us to think that it is correct, when it isn't.
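Reliable-signal implementations keep the handler installed. However, sigaction can still request the old reset-to-default behavior with the SA_RESETHAND flag. The sketch below assumes POSIX sigaction, and its names are hypothetical. It delivers one signal under each setting and then asks whether the handler is still in place. With SA_RESETHAND, the disposition has already reverted to SIG_DFL. That is the state in which a second signal would take the default action.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t hits;   /* number of times the handler ran */

static void on_usr1(int signo)       /* hypothetical handler */
{
    (void)signo;
    hits++;
}

/* Hypothetical demo: install with (old_style != 0) or without SA_RESETHAND,
   deliver SIGUSR1 once, and report whether the handler is still installed. */
int handler_survives(int old_style)
{
    struct sigaction sa, cur;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = old_style ? SA_RESETHAND : 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    raise(SIGUSR1);
    if (sigaction(SIGUSR1, NULL, &cur) < 0)   /* query without changing */
        return -1;
    return cur.sa_handler == on_usr1;
}
```

Call handler_survives(1) before handler_survives(0). After the first call, SIGUSR1 is back at its default action, which would terminate the process if the signal were raised again.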
Another problem with these earlier systems is that the process was unable to turn a signal off
when it didn't want the signal to occur. All the process could do was ignore the signal. There
are times when we would like to tell the system "prevent the following signals from occurring,
but remember if they do occur." The classic example that demonstrates this flaw is shown by
a piece of code that catches a signal and sets a flag for the process that indicates that the
signal occurred:
int     sig_int_flag;           /* set nonzero when signal occurs */

main()
{
    int     sig_int();          /* my signal handling function */
    ...
    signal(SIGINT, sig_int);    /* establish handler */
    ...
    while (sig_int_flag == 0)
        pause();                /* go to sleep, waiting for signal */
    ...
}

sig_int()
{
    signal(SIGINT, sig_int);    /* reestablish handler for next time */
    sig_int_flag = 1;           /* set flag for main loop to examine */
}
Here, the process is calling the pause function to put it to sleep until a signal is caught. When
the signal is caught, the signal handler just sets the flag sig_int_flag to a nonzero value. The
process is automatically awakened by the kernel after the signal handler returns, notices that
the flag is nonzero, and does whatever it needs to do. But there is a window of time when
things can go wrong. If the signal occurs after the test of sig_int_flag, but before the call
to pause, the process could go to sleep forever (assuming that the signal is never generated
again). This occurrence of the signal is lost. This is another example of some code that isn't
right, yet it works most of the time. Debugging this type of problem can be difficult.
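The window closes if two conditions hold. First, the signal must be blocked while the flag is tested. Second, it must be unblocked only by the call that puts the process to sleep. The sigsuspend function, described in Section 10.16, performs the unblock and the sleep in one atomic step.

The sketch below assumes POSIX signals, and its names are hypothetical. It models the worst case by raising SIGINT immediately after the flag test. Because SIGINT is blocked at that moment, the signal stays pending instead of being lost. It is delivered as soon as sigsuspend installs the wait mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t sig_int_flag;   /* set nonzero when signal occurs */

static void sig_int(int signo)               /* hypothetical handler */
{
    (void)signo;
    sig_int_flag = 1;                        /* set flag for main loop */
}

/* Hypothetical demo: returns the flag value (1) once the signal was seen. */
int wait_for_flag_demo(void)
{
    struct sigaction sa, oldsa;
    sigset_t block, oldmask, waitmask;

    sa.sa_handler = sig_int;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGINT, &sa, &oldsa) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, &oldmask) < 0)   /* block before testing */
        return -1;
    waitmask = oldmask;
    sigdelset(&waitmask, SIGINT);            /* mask to use while sleeping */

    while (sig_int_flag == 0) {
        raise(SIGINT);           /* worst-case timing: arrives after the test */
        sigsuspend(&waitmask);   /* atomically unblock and sleep */
    }

    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGINT, &oldsa, NULL);         /* restore the previous action */
    return sig_int_flag;
}
```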
10.5. Interrupted System Calls
A characteristic of earlier UNIX systems is that if a process caught a signal while the process
was blocked in a "slow" system call, the system call was interrupted. The system call returned
an error and errno was set to EINTR. This was done under the assumption that since a signal
occurred and the process caught it, there is a good chance that something has happened
that should wake up the blocked system call.
Here, we have to differentiate between a system call and a function. It is a system call within
the kernel that is interrupted when a signal is caught.
To support this feature, the system calls are divided into two categories: the "slow" system
calls and all the others. The slow system calls are those that can block forever. Included in
this category are
• Reads that can block the caller forever if data isn't present with certain file types
  (pipes, terminal devices, and network devices)
• Writes that can block the caller forever if the data can't be accepted immediately by
  these same file types
• Opens that block until some condition occurs on certain file types (such as an open of
  a terminal device that waits until an attached modem answers the phone)
• The pause function (which by definition puts the calling process to sleep until a signal is
  caught) and the wait function
• Certain ioctl operations
• Some of the interprocess communication functions (Chapter 15)
The notable exception to these slow system calls is anything related to disk I/O. Although a
read or a write of a disk file can block the caller temporarily (while the disk driver queues the
request and then the request is executed), unless a hardware error occurs, the I/O operation
always returns and unblocks the caller quickly.
One condition that is handled by interrupted system calls, for example, is when a process
initiates a read from a terminal device and the user at the terminal walks away from the
terminal for an extended period. In this example, the process could be blocked for hours or
days and would remain so unless the system was taken down.
POSIX.1 semantics for interrupted reads and writes changed with the 2001 version of the
standard. Earlier versions gave implementations a choice for how to deal with reads and writes
that have processed partial amounts of data. If read has received and transferred data to an
application's buffer, but has not yet received all that the application requested and is then
interrupted, the operating system could either fail the system call with errno set to EINTR or
allow the system call to succeed, returning the partial amount of data received. Similarly, if
write is interrupted after transferring some of the data in an application's buffer, the operating
system could either fail the system call with errno set to EINTR or allow the system call to
succeed, returning the partial amount of data written. Historically, implementations derived
from System V fail the system call, whereas BSD-derived implementations return partial
success. With the 2001 version of the POSIX.1 standard, the BSD-style semantics are
required.
The problem with interrupted system calls is that we now have to handle the error return
explicitly. The typical code sequence (assuming a read operation and assuming that we want
to restart the read even if it's interrupted) would be
again:
    if ((n = read(fd, buf, BUFFSIZE)) < 0) {
        if (errno == EINTR)
            goto again;     /* just an interrupted system call */
        /* handle other errors */
    }
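The same logic can be packaged as a helper that reissues the read whenever it fails with EINTR. The sketch below assumes POSIX signals and pipes. The names read_restart and demo_read_restart are hypothetical.

In the demo, SIGALRM is installed without SA_RESTART, so it interrupts a read that is blocked on an empty pipe. The handler writes one byte into the pipe, which gives the reissued read something to return. The handler also saves and restores errno, for the reason discussed in Section 10.6.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: read() that reissues the call after EINTR and
   counts how many times it was interrupted. */
ssize_t read_restart(int fd, void *buf, size_t nbytes, int *ninterrupts)
{
    ssize_t n;

    while ((n = read(fd, buf, nbytes)) < 0 && errno == EINTR)
        (*ninterrupts)++;            /* just an interrupted system call */
    return n;                        /* data, end of file (0), or another error */
}

static int pipefd[2];

static void on_alarm(int signo)      /* hypothetical handler */
{
    int saved_errno = errno;
    char c = 'x';
    ssize_t r;

    (void)signo;
    r = write(pipefd[1], &c, 1);     /* write is safe to call here */
    (void)r;
    errno = saved_errno;
}

/* Hypothetical demo: returns the byte read, or -1 on error. */
int demo_read_restart(int *ninterrupts)
{
    struct sigaction sa;
    char c;
    ssize_t n;

    if (pipe(pipefd) < 0)
        return -1;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART: read fails with EINTR */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    *ninterrupts = 0;
    alarm(1);                        /* interrupt the blocked read in about 1 s */
    n = read_restart(pipefd[0], &c, 1, ninterrupts);
    close(pipefd[0]);
    close(pipefd[1]);
    return n == 1 ? c : -1;
}
```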
To prevent applications from having to handle interrupted system calls, 4.2BSD introduced the
automatic restarting of certain interrupted system calls. The system calls that were
automatically restarted are ioctl, read, readv, write, writev, wait, and waitpid. As we've
mentioned, the first five of these functions are interrupted by a signal only if they are
operating on a slow device; wait and waitpid are always interrupted when a signal is caught.
Since this caused a problem for some applications that didn't want the operation restarted if it
was interrupted, 4.3BSD allowed the process to disable this feature on a per-signal basis.
POSIX.1 allows an implementation to restart system calls, but it is not required. The Single
UNIX Specification defines the SA_RESTART flag as an XSI extension to sigaction to allow
applications to request that interrupted system calls be restarted.
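Here is a sketch of the difference SA_RESTART makes, assuming an implementation that supports the flag. The name read_with_flags is hypothetical. A single read on an empty pipe is interrupted by SIGALRM about one second later, and the handler writes one byte into the pipe. Without SA_RESTART, the read fails with EINTR. With SA_RESTART, the system call is restarted and returns the byte.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static int fds[2];

static void on_alarm(int signo)      /* hypothetical handler */
{
    int saved_errno = errno;
    char c = 'r';
    ssize_t r;

    (void)signo;
    r = write(fds[1], &c, 1);        /* give the read something to return */
    (void)r;
    errno = saved_errno;
}

/* Hypothetical demo: block in ONE read() on an empty pipe and let SIGALRM
   interrupt it. Returns read's result; *err gets errno if read failed. */
ssize_t read_with_flags(int flags, int *err)
{
    struct sigaction sa;
    char c;
    ssize_t n;

    if (pipe(fds) < 0)
        return -2;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;             /* SA_RESTART or 0 */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -2;

    alarm(1);
    n = read(fds[0], &c, 1);         /* no retry loop here */
    *err = (n < 0) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On the platforms listed in Figure 10.3, read_with_flags(SA_RESTART, &e) is expected to return 1. read_with_flags(0, &e) is expected to return -1 with e equal to EINTR.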
System V has never restarted system calls by default. BSD, on the other hand, restarts them
if interrupted by signals. By default, FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3 restart
system calls interrupted by signals. The default on Solaris 9, however, is to return an error
(EINTR) instead.
One of the reasons 4.2BSD introduced the automatic restart feature is that sometimes we
don't know that the input or output device is a slow device. If the program we write can be
used interactively, then it might be reading or writing a slow device, since terminals fall into
this category. If we catch signals in this program, and if the system doesn't provide the
restart capability, then we have to test every read or write for the interrupted error return
and reissue the read or write.
Figure 10.3 summarizes the signal functions and their semantics provided by the various
implementations.
Figure 10.3. Features provided by various signal implementations
Function   System                          Handler      Ability to   Automatic restart of
                                           remains      block        interrupted system
                                           installed    signals      calls?
signal     ISO C, POSIX.1                  unspecified  unspecified  unspecified
signal     V7, SVR2, SVR3, SVR4, Solaris                             never
signal     4.2BSD                          •            •            always
signal     4.3BSD, 4.4BSD, FreeBSD,        •            •            default
           Linux, Mac OS X
sigset     XSI                             •            •            unspecified
sigset     SVR3, SVR4, Linux, Solaris      •            •            never
sigvec     4.2BSD                          •            •            always
sigvec     4.3BSD, 4.4BSD, FreeBSD,        •            •            default
           Mac OS X
sigaction  POSIX.1                         •            •            unspecified
sigaction  XSI, 4.4BSD, SVR4, FreeBSD,     •            •            optional
           Mac OS X, Linux, Solaris
We don't discuss the older sigset and sigvec functions. Their use has been superseded by
the sigaction function; they are included only for completeness. In contrast, some
implementations promote the signal function as a simplified interface to sigaction.
Be aware that UNIX systems from other vendors can have values different from those shown
in this figure. For example, sigaction under SunOS 4.1.2 restarts an interrupted system call by
default, different from the platforms listed in Figure 10.3.
In Figure 10.18, we provide our own version of the signal function that automatically tries to
restart interrupted system calls (other than for the SIGALRM signal). In Figure 10.19, we
provide another function, signal_intr, that tries to never do the restart.
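A signal replacement built on sigaction along those lines could look like the following sketch. It is not the code of Figure 10.18. The name signal_restart is hypothetical. Omitting SA_RESTART only for SIGALRM follows the policy described above, on the assumption that SIGALRM is used to put a timeout on a blocking call.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

typedef void Sigfunc(int);          /* the typedef from the text */

/* Hypothetical sigaction-based signal(): the handler stays installed, and
   interrupted system calls are restarted, except for SIGALRM. */
Sigfunc *signal_restart(int signo, Sigfunc *func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo != SIGALRM)
        act.sa_flags |= SA_RESTART; /* let SIGALRM still interrupt a call */
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;         /* previous disposition, as signal does */
}
```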
We talk more about interrupted system calls in Section 14.5 with regard to the select and
poll functions.
10.6. Reentrant Functions
When a signal that is being caught is handled by a process, the normal sequence of
instructions being executed by the process is temporarily interrupted by the signal handler.
The process then continues executing, but the instructions in the signal handler are now
executed. If the signal handler returns (instead of calling exit or longjmp, for example), then
the normal sequence of instructions that the process was executing when the signal was
caught continues executing. (This is similar to what happens when a hardware interrupt
occurs.) But in the signal handler, we can't tell where the process was executing when the
signal was caught. What if the process was in the middle of allocating additional memory on
its heap using malloc, and we call malloc from the signal handler? Or, what if the process was
in the middle of a call to a function, such as getpwnam (Section 6.2), that stores its result in a
static location, and we call the same function from the signal handler? In the malloc example,
havoc can result for the process, since malloc usually maintains a linked list of all its allocated
areas, and it may have been in the middle of changing this list. In the case of getpwnam, the
information returned to the normal caller can get overwritten with the information returned to
the signal handler.
The Single UNIX Specification specifies the functions that are guaranteed to be reentrant.
Figure 10.4 lists these reentrant functions.
Figure 10.4. Reentrant functions that may be called from a signal handler
accept          fchmod        lseek               sendto        stat
access          fchown        lstat               setgid        symlink
aio_error       fcntl         mkdir               setpgid       sysconf
aio_return      fdatasync     mkfifo              setsid        tcdrain
aio_suspend     fork          open                setsockopt    tcflow
alarm           fpathconf     pathconf            setuid        tcflush
bind            fstat         pause               shutdown      tcgetattr
cfgetispeed     fsync         pipe                sigaction     tcgetpgrp
cfgetospeed     ftruncate     poll                sigaddset     tcsendbreak
cfsetispeed     getegid       posix_trace_event   sigdelset     tcsetattr
cfsetospeed     geteuid       pselect             sigemptyset   tcsetpgrp
chdir           getgid        raise               sigfillset    time
chmod           getgroups     read                sigismember   timer_getoverrun
chown           getpeername   readlink            signal        timer_gettime
clock_gettime   getpgrp       recv                sigpause      timer_settime
close           getpid        recvfrom            sigpending    times
connect         getppid       recvmsg             sigprocmask   umask
creat           getsockname   rename              sigqueue      uname
dup             getsockopt    rmdir               sigset        unlink
dup2            getuid        select              sigsuspend    utime
execle          kill          sem_post            sleep         wait
execve          link          send                socket        waitpid
_Exit & _exit   listen        sendmsg             socketpair    write
Most functions that are not in Figure 10.4 are missing because (a) they are known to use
static data structures, (b) they call malloc or free, or (c) they are part of the standard I/O
library. Most implementations of the standard I/O library use global data structures in a
nonreentrant way. Note that even though we call printf from signal handlers in some of our
examples, it is not guaranteed to produce the expected results, since the signal handler can
interrupt a call to printf from our main program.
Be aware that even if we call a function listed in Figure 10.4 from a signal handler, there is
only one errno variable per thread (recall the discussion of errno and threads in Section 1.7),
and we might modify its value. Consider a signal handler that is invoked right after main has
set errno. If the signal handler calls read, for example, this call can change the value of errno,
wiping out the value that was just stored in main. Therefore, as a general rule, when calling
the functions listed in Figure 10.4 from a signal handler, we should save and restore errno. (Be
aware that a commonly caught signal is SIGCHLD, and its signal handler usually calls one of the
wait functions. All the wait functions can change errno.)
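Here is a sketch of that rule, assuming POSIX signals; the names are hypothetical. The SIGCHLD handler saves errno on entry and restores it before returning. In between, it reaps every terminated child with waitpid and WNOHANG. A loop is needed because a pending SIGCHLD is not queued, so one delivery can stand for several children. The demo blocks SIGCHLD before forking and then waits with sigsuspend, so the child's termination cannot be missed.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;       /* children reaped by the handler */
static volatile sig_atomic_t last_status;  /* exit status of the last one */

static void sig_chld(int signo)            /* hypothetical handler */
{
    int saved_errno = errno;   /* waitpid below may overwrite errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        reaped++;
        if (WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    }
    errno = saved_errno;       /* the interrupted code sees its own errno */
}

/* Hypothetical demo: fork a child that exits with status 7 and wait for
   the handler to reap it. Returns 0 on success, -1 on error. */
int demo_sigchld(void)
{
    struct sigaction sa;
    sigset_t block, oldmask, waitmask;
    pid_t pid;

    sa.sa_handler = sig_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &oldmask) < 0)
        return -1;
    waitmask = oldmask;
    sigdelset(&waitmask, SIGCHLD);

    if ((pid = fork()) < 0) {
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(7);                  /* child terminates immediately */

    while (reaped == 0)
        sigsuspend(&waitmask);     /* sleep until the handler has run */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return 0;
}
```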
Note that longjmp (Section 7.10) and siglongjmp (Section 10.15) are missing from Figure 10.4,
because the signal may have occurred while the main routine was updating a data structure in
a nonreentrant way. This data structure could be left half updated if we call siglongjmp
instead of returning from the signal handler. If an application is going to update global
data structures in this way while catching signals that cause sigsetjmp to be executed, it
needs to block those signals while updating the data structures.
Example
Figure 10.5 shows a program that calls the nonreentrant function getpwnam from a signal
handler that is called every second. We describe the alarm function in Section 10.10. We use
it here to generate a SIGALRM signal every second.
When this program was run, the results were random. Usually, the program would be
terminated by a SIGSEGV signal when the signal handler returned after several iterations. An
examination of the core file showed that the main function had called getpwnam, but that some
internal pointers had been corrupted when the signal handler called the same function.
Occasionally, the program would run for several seconds before crashing with a SIGSEGV error.
When the main function did run correctly after the signal had been caught, the return value
was sometimes corrupted and sometimes fine. Once (on Mac OS X), messages were printed
from the malloc library routine warning about freeing pointers not allocated through malloc.
As shown by this example, if we call a nonreentrant function from a signal handler, the results
are unpredictable.
Figure 10.5. Call a nonreentrant function from a signal handler
#include "apue.h"
#include <pwd.h>

static void
my_alarm(int signo)
{
    struct passwd   *rootptr;

    printf("in signal handler\n");
    if ((rootptr = getpwnam("root")) == NULL)
        err_sys("getpwnam(root) error");
    alarm(1);
}

int
main(void)
{
    struct passwd   *ptr;

    signal(SIGALRM, my_alarm);
    alarm(1);
    for ( ; ; ) {
        if ((ptr = getpwnam("sar")) == NULL)
            err_sys("getpwnam error");
        if (strcmp(ptr->pw_name, "sar") != 0)
            printf("return value corrupted!, pw_name = %s\n",
                    ptr->pw_name);
    }
}
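A common way to avoid the problem in Figure 10.5 is to split the work. The handler does nothing but set a flag of type volatile sig_atomic_t, and the nonreentrant work, such as the getpwnam call, runs in the normal flow of the program. The following is a minimal sketch with hypothetical names. It uses raise as a stand-in for the periodic alarm, so the example runs without delays. A real program that sleeps between flag tests would also need to block the signal around the test, as in the pause discussion of Section 10.4.

```c
#define _XOPEN_SOURCE 700
#include <pwd.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t alarm_pending;

static void my_alarm(int signo)      /* hypothetical handler */
{
    (void)signo;
    alarm_pending = 1;               /* only record that the signal occurred */
}

/* Hypothetical demo: deliver SIGALRM n times and perform the nonreentrant
   work outside the handler. Returns how many times that work ran. */
int demo_deferred_work(int n)
{
    struct sigaction sa;
    int done = 0;

    sa.sa_handler = my_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        raise(SIGALRM);              /* stands in for alarm(1) expiring */
        if (alarm_pending) {
            alarm_pending = 0;
            (void)getpwnam("root");  /* normal flow, not inside the handler */
            done++;
        }
    }
    return done;
}
```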
10.7. SIGCLD Semantics
Two signals that continually generate confusion are SIGCLD and SIGCHLD. First, SIGCLD (without
the H) is the System V name, and this signal has different semantics from the BSD signal,
named SIGCHLD. The POSIX.1 signal is also named SIGCHLD.
The semantics of the BSD SIGCHLD signal are normal, in that its semantics are similar to those
of all other signals. When the signal occurs, the status of a child has changed, and we need
to call one of the wait functions to determine what has happened.
System V, however, has traditionally handled the SIGCLD signal differently from other signals.
SVR4-based systems continue this questionable tradition (i.e., compatibility constraint) if we
set its disposition using either signal or sigset (the older, SVR3-compatible functions to set
the disposition of a signal). This older handling of SIGCLD consists of the following.
1. If the process specifically sets its disposition to SIG_IGN, children of the calling process
will not generate zombie processes. Note that this is different from its default action
(SIG_DFL), which from Figure 10.1 is to be ignored. Instead, on termination, the status
of these child processes is discarded. If it subsequently calls one of the wait functions,
the calling process will block until all its children have terminated, and then wait returns
-1 with errno set to ECHILD. (The default disposition of this signal is to be ignored, but
this default will not cause the preceding semantics to occur. Instead, we specifically
have to set its disposition to SIG_IGN.)
POSIX.1 does not specify what happens when SIGCHLD is ignored, so this behavior is
allowed. The Single UNIX Specification includes an XSI extension specifying that this
behavior be supported for SIGCHLD.
4.4BSD always generates zombies if SIGCHLD is ignored. If we want to avoid zombies,
we have to wait for our children. FreeBSD 5.2.1 works like 4.4BSD. Mac OS X 10.3,
however, doesn't create zombies when SIGCHLD is ignored.
With SVR4, if either signal or sigset is called to set the disposition of SIGCHLD to be
ignored, zombies are never generated. Solaris 9 and Linux 2.4.22 follow SVR4 in this
behavior.
With sigaction, we can set the SA_NOCLDWAIT flag (Figure 10.16) to avoid zombies. This
action is supported on all four platforms: FreeBSD 5.2.1, Linux 2.4.22, Mac OS X 10.3,
and Solaris 9.
2.
If we set the disposition of SIGCLD to be caught, the kernel immediately checks
whether any child processes are ready to be waited for and, if so, calls the SIGCLD
handler.
Item 2 changes the way we have to write a signal handler for this signal, as illustrated in the
following example.
Example
Recall from Section 10.4 that the first thing to do on entry to a signal handler is to call signal
again, to reestablish the handler. (This action was to minimize the window of time when the
signal is reset back to its default and could get lost.) We show this in Figure 10.6. This
program doesn't work on some platforms. If we compile and run it under a traditional System V
platform, such as OpenServer 5 or UnixWare 7, the output is a continual string of SIGCLD
received lines. Eventually, the process runs out of stack space and terminates abnormally.
FreeBSD 5.2.1 and Mac OS X 10.3 don't exhibit this problem, because BSD-based systems
generally don't support historic System V semantics for SIGCLD. Linux 2.4.22 also doesn't
exhibit this problem, because it doesn't call the SIGCHLD signal handler when a process
arranges to catch SIGCHLD and child processes are ready to be waited for, even though SIGCLD
and SIGCHLD are defined to be the same value. Solaris 9, on the other hand, does call the
signal handler in this situation, but includes extra code in the kernel to avoid this problem.
Although the four platforms described in this book solve this problem, realize that platforms
(such as UnixWare) still exist that haven't addressed it.
The problem with this program is that the call to signal at the beginning of the signal handler
invokes item 2 from the preceding discussion: the kernel checks whether a child needs to be
waited for (and one does, since we're processing a SIGCLD signal), so it generates another call
to the signal handler. The signal handler calls signal, and the whole process starts over again.
To fix this program, we have to move the call to signal after the call to wait. By doing this,
we call signal after fetching the child's termination status; the signal is generated again by
the kernel only if some other child has since terminated.
POSIX.1 states that when we establish a signal handler for SIGCHLD and there exists a
terminated child we have not yet waited for, it is unspecified whether the signal is generated.
This allows the behavior described previously. But since POSIX.1 does not reset a signal's
disposition to its default when the signal occurs (assuming that we're using the POSIX.1
sigaction function to set its disposition), there is no need for us to ever establish a signal
handler for SIGCHLD within that handler.
Figure 10.6. System V SIGCLD handler that doesn't work
#include "apue.h"
#include <sys/wait.h>

static void sig_cld(int);

int
main()
{
    pid_t   pid;

    if (signal(SIGCLD, sig_cld) == SIG_ERR)
        perror("signal error");
    if ((pid = fork()) < 0) {
        perror("fork error");
    } else if (pid == 0) {      /* child */
        sleep(2);
        _exit(0);
    }

    pause();    /* parent */
    exit(0);
}

static void
sig_cld(int signo)      /* interrupts pause() */
{
    pid_t   pid;
    int     status;

    printf("SIGCLD received\n");

    if (signal(SIGCLD, sig_cld) == SIG_ERR)     /* reestablish handler */
        perror("signal error");

    if ((pid = wait(&status)) < 0)      /* fetch child status */
        perror("wait error");

    printf("pid = %d\n", pid);
}
Be cognizant of the SIGCHLD semantics for your implementation. Be especially aware of some
systems that #define SIGCHLD to be SIGCLD or vice versa. Changing the name may allow you to
compile a program that was written for another system, but if that program depends on the
other semantics, it may not work.
On the four platforms described in this text, SIGCLD is equivalent to SIGCHLD.
10.8. Reliable-Signal Terminology and Semantics
We need to define some of the terms used throughout our discussion of signals. First, a signal
is generated for a process (or sent to a process) when the event that causes the signal
occurs. The event could be a hardware exception (e.g., divide by 0), a software condition
(e.g., an alarm timer expiring), a terminal-generated signal, or a call to the kill function.
When the signal is generated, the kernel usually sets a flag of some form in the process table.
We say that a signal is delivered to a process when the action for a signal is taken. During the
time between the generation of a signal and its delivery, the signal is said to be pending.
A process has the option of blocking the delivery of a signal. If a signal that is blocked is
generated for a process, and if the action for that signal is either the default action or to
catch the signal, then the signal remains pending for the process until the process either (a)
unblocks the signal or (b) changes the action to ignore the signal. The system determines
what to do with a blocked signal when the signal is delivered, not when it's generated. This
allows the process to change the action for the signal before it's delivered. The sigpending
function (Section 10.13) can be called by a process to determine which signals are blocked
and pending.
What happens if a blocked signal is generated more than once before the process unblocks
the signal? POSIX.1 allows the system to deliver the signal either once or more than once. If
the system delivers the signal more than once, we say that the signals are queued. Most UNIX
systems, however, do not queue signals unless they support the real-time extensions to
POSIX.1. Instead, the UNIX kernel simply delivers the signal once.
The manual pages for SVR2 claimed that the SIGCLD signal was queued while the process was
executing its SIGCLD signal handler. Although this might have been true on a conceptual level,
the actual implementation was different. Instead, the signal was regenerated by the kernel as
we described in Section 10.7. In SVR3, the manual was changed to indicate that the SIGCLD
signal was ignored while the process was executing its signal handler for SIGCLD. The SVR4
manual removed any mention of what happens to SIGCLD signals that are generated while a
process is executing its SIGCLD signal handler.
The SVR4 sigaction(2) manual page in AT&T [1990e] claims that the SA_SIGINFO flag (Figure
10.16) causes signals to be reliably queued. This is wrong. Apparently, this feature was
partially implemented within the kernel, but it is not enabled in SVR4. Curiously, the SVID
doesn't make the same claims of reliable queuing.
What happens if more than one signal is ready to be delivered to a process? POSIX.1 does not
specify the order in which the signals are delivered to the process. The Rationale for POSIX.1
does suggest, however, that signals related to the current state of the process be delivered
before other signals. (SIGSEGV is one such signal.)
Each process has a signal mask that defines the set of signals currently blocked from delivery
to that process. We can think of this mask as having one bit for each possible signal. If the
bit is on for a given signal, that signal is currently blocked. A process can examine and change
its current signal mask by calling sigprocmask, which we describe in Section 10.12.
Since it is possible for the number of signals to exceed the number of bits in an integer,
POSIX.1 defines a data type, called sigset_t, that holds a signal set. The signal mask, for
example, is stored in one of these signal sets. We describe five functions that operate on
signal sets in Section 10.11.
10.9. kill and raise Functions
The kill function sends a signal to a process or a group of processes. The raise function
allows a process to send a signal to itself.
raise was originally defined by ISO C. POSIX.1 includes it to align itself with the ISO C
standard, but POSIX.1 extends the specification of raise to deal with threads (we discuss
how threads interact with signals in Section 12.8). Since ISO C does not deal with multiple
processes, it could not define a function, such as kill, that requires a process ID argument.
#include <signal.h>

int kill(pid_t pid, int signo);
int raise(int signo);

Both return: 0 if OK, -1 on error
The call
raise(signo);
is equivalent to the call
kill(getpid(), signo);
There are four different conditions for the pid argument to kill.
pid > 0     The signal is sent to the process whose process ID is pid.
pid == 0    The signal is sent to all processes whose process group ID equals the process
            group ID of the sender and for which the sender has permission to send the
            signal. Note that the term all processes excludes an implementation-defined
            set of system processes. For most UNIX systems, this set of system processes
            includes the kernel processes and init (pid 1).
pid < 0     The signal is sent to all processes whose process group ID equals the absolute
            value of pid and for which the sender has permission to send the signal.
            Again, the set of all processes excludes certain system processes, as
            described earlier.
pid == -1   The signal is sent to all processes on the system for which the sender has
            permission to send the signal. As before, the set of processes excludes
            certain system processes.
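To make the pid < 0 case concrete, the sketch below forks a child, places it in a process group of its own with setpgid, and then signals the whole group by passing the negated group ID to kill. It assumes a POSIX system on which the chosen signal is not ignored (the child inherits the parent's dispositions); signal_child_group is a hypothetical name.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo of kill with pid < 0. Returns the signal that
 * terminated the child, 0 if it exited normally, or -1 on error.
 */
int
signal_child_group(int signo)
{
    pid_t pid;
    int   status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {             /* child: wait in a group of its own */
        setpgid(0, 0);
        for (;;)
            pause();
    }

    /* Parent sets the group too, so kill can't run before it exists. */
    if (setpgid(pid, pid) < 0 || kill(-pid, signo) < 0) {
        kill(pid, SIGKILL);     /* don't leave the child behind */
        waitpid(pid, NULL, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Called as signal_child_group(SIGTERM), this should return SIGTERM, because the default action of SIGTERM terminates the child.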
As we've mentioned, a process needs permission to send a signal to another process. The
superuser can send a signal to any process. For other users, the basic rule is that the real or
effective user ID of the sender has to equal the real or effective user ID of the receiver. If
the implementation supports _POSIX_SAVED_IDS (as POSIX.1 now requires), the saved
set-user-ID of the receiver is checked instead of its effective user ID. There is also one
special case for the permission testing: if the signal being sent is SIGCONT, a process can send
it to any other process in the same session.
POSIX.1 defines signal number 0 as the null signal. If the signo argument is 0, then the normal
error checking is performed by kill, but no signal is sent. This is often used to determine if a
specific process still exists. If we send the process the null signal and it doesn't exist, kill
returns -1 and errno is set to ESRCH. Be aware, however, that UNIX systems recycle process
IDs after some amount of time, so the existence of a process with a given process ID does
not mean that it's the process that you think it is.
Also understand that the test for process existence is not atomic. By the time that kill
returns the answer to the caller, the process in question might have exited, so the answer is
of limited value.
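A sketch of the null-signal probe, assuming a POSIX system; process_exists is a hypothetical name. An EPERM error also means that the process exists (we just lack permission to signal it). As the text warns, the answer can be stale as soon as it is returned, and a recycled process ID can make it refer to a different process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe for a process with the null signal.
 * Returns 1 if a process with this ID exists, 0 if not, -1 on other errors.
 */
int
process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists and we may signal it */
    if (errno == EPERM)
        return 1;               /* exists, but we may not signal it */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}
```

The <sys/wait.h> header is there for a typical check: after a child has been reaped with waitpid, process_exists on its old process ID should return 0 (barring immediate reuse of that ID).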
If the call to kill causes the signal to be generated for the calling process and if the signal is
not blocked, either signo or some other pending, unblocked signal is delivered to the process
before kill returns. (Additional conditions occur with threads; see Section 12.8 for more
information.)
10.10. alarm and pause Functions
The alarm function allows us to set a timer that will expire at a specified time in the future.
When the timer expires, the SIGALRM signal is generated. If we neither ignore nor catch this
signal, its default action is to terminate the process.
#include <unistd.h>
unsigned int alarm(unsigned int seconds);
Returns: 0 or number of seconds until previously set alarm
The seconds value is the number of clock seconds in the future when the signal should be
generated. Be aware that when that time occurs, the signal is generated by the kernel, but
there could be additional time before the process gets control to handle the signal, because of
processor scheduling delays.
Earlier UNIX System implementations warned that the signal could also be sent up to 1 second
early. POSIX.1 does not allow this.
There is only one of these alarm clocks per process. If, when we call alarm, a previously
registered alarm clock for the process has not yet expired, the number of seconds left for that
alarm clock is returned as the value of this function. That previously registered alarm clock is
replaced by the new value.
If a previously registered alarm clock for the process has not yet expired and if the seconds
value is 0, the previous alarm clock is canceled. The number of seconds left for that previous
alarm clock is still returned as the value of the function.
Although the default action for SIGALRM is to terminate the process, most processes that use
an alarm clock catch this signal. If the process then wants to terminate, it can perform
whatever cleanup is required before terminating. If we intend to catch SIGALRM, we need to be
careful to install its signal handler before calling alarm. If we call alarm first and are sent
SIGALRM before we can install the signal handler, our process will terminate.
The pause function suspends the calling process until a signal is caught.
#include <unistd.h>
int pause(void);
Returns: -1 with errno set to EINTR
The only time pause returns is if a signal handler is executed and that handler returns. In that
case, pause returns -1 with errno set to EINTR.
Example
Using alarm and pause, we can put a process to sleep for a specified amount of time. The
sleep1 function in Figure 10.7 appears to do this (but it has problems, as we shall see
shortly).
This function looks like the sleep function, which we describe in Section 10.19, but this simple
implementation has three problems.
1.
If the caller already has an alarm set, that alarm is erased by the first call to alarm. We
can correct this by looking at the return value from the first call to alarm. If the
number of seconds until some previously set alarm is less than the argument, then we
should wait only until the previously set alarm expires. If the previously set alarm will
go off after ours, then before returning we should reset this alarm to occur at its
designated time in the future.
2.
We have modified the disposition for SIGALRM. If we're writing a function for others to
call, we should save the disposition when we're called and restore it when we're done.
We can correct this by saving the return value from signal and resetting the
disposition before we return.
3.
There is a race condition between the first call to alarm and the call to pause. On a
busy system, it's possible for the alarm to go off and the signal handler to be called
before we call pause. If that happens, the caller is suspended forever in the call to
pause (assuming that some other signal isn't caught).
Earlier implementations of sleep looked like our program, with problems 1 and 2 corrected as
described. There are two ways to correct problem 3. The first uses setjmp, which we show in
the next example. The other uses sigprocmask and sigsuspend, and we describe it in Section
10.19.
Figure 10.7. Simple, incomplete implementation of sleep
#include <signal.h>
#include <unistd.h>

static void
sig_alrm(int signo)
{
    /* nothing to do, just return to wake up the pause */
}

unsigned int
sleep1(unsigned int nsecs)
{
    if (signal(SIGALRM, sig_alrm) == SIG_ERR)
        return(nsecs);
    alarm(nsecs);       /* start the timer */
    pause();            /* next caught signal wakes us up */
    return(alarm(0));   /* turn off timer, return unslept time */
}
Example
The SVR2 implementation of sleep used setjmp and longjmp (Section 7.10) to avoid the race
condition described in problem 3 of the previous example. A simple version of this function,
called sleep2, is shown in Figure 10.8. (To reduce the size of this example, we don't handle
problems 1 and 2 described earlier.)
The sleep2 function avoids the race condition from Figure 10.7. Even if the pause is never
executed, the sleep2 function returns when the SIGALRM occurs.
There is, however, another subtle problem with the sleep2 function that involves its
interaction with other signals. If the SIGALRM interrupts some other signal handler, when we
call longjmp, we abort the other signal handler. Figure 10.9 shows this scenario. The loop in
the SIGINT handler was written so that it executes for longer than 5 seconds on one of the
systems used by the author. We simply want it to execute longer than the argument to sleep2.
The integer k is declared volatile to prevent an optimizing compiler from discarding the loop.
Executing the program shown in Figure 10.9 gives us
$ ./a.out
^?                      we type the interrupt character
sig_int starting
sleep2 returned: 0
We can see that the longjmp from the sleep2 function aborted the other signal handler,
sig_int, even though it wasn't finished. This is what you'll encounter if you mix the SVR2
sleep function with other signal handling. See Exercise 10.3.
Figure 10.8. Another (imperfect) implementation of sleep
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

static jmp_buf  env_alrm;

static void
sig_alrm(int signo)
{
    longjmp(env_alrm, 1);
}

unsigned int
sleep2(unsigned int nsecs)
{
    if (signal(SIGALRM, sig_alrm) == SIG_ERR)
        return(nsecs);
    if (setjmp(env_alrm) == 0) {
        alarm(nsecs);       /* start the timer */
        pause();            /* next caught signal wakes us up */
    }
    return(alarm(0));       /* turn off timer, return unslept time */
}
Figure 10.9. Calling sleep2 from a program that catches other signals
#include "apue.h"

unsigned int    sleep2(unsigned int);
static void     sig_int(int);

int
main(void)
{
    unsigned int    unslept;

    if (signal(SIGINT, sig_int) == SIG_ERR)
        err_sys("signal(SIGINT) error");
    unslept = sleep2(5);
    printf("sleep2 returned: %u\n", unslept);
    exit(0);
}

static void
sig_int(int signo)
{
    int             i, j;
    volatile int    k;

    /*
     * Tune these loops to run for more than 5 seconds
     * on whatever system this test program is run.
     */
    printf("\nsig_int starting\n");
    for (i = 0; i < 300000; i++)
        for (j = 0; j < 4000; j++)
            k += i * j;
    printf("sig_int finished\n");
}
The purpose of these two examples, the sleep1 and sleep2 functions, is to show the pitfalls in
dealing naively with signals. The following sections will show ways around all these problems,
so we can handle signals reliably, without interfering with other pieces of code.
Example
A common use for alarm, in addition to implementing the sleep function, is to put an upper
time limit on operations that can block. For example, if we have a read operation on a device
that can block (a "slow" device, as described in Section 10.5), we might want the read to time
out after some amount of time. The program in Figure 10.10 does this, reading one line from
standard input and writing it to standard output.
This sequence of code is common in UNIX applications, but this program has two problems.
1.
The program in Figure 10.10 has one of the same flaws that we described in Figure
10.7: a race condition between the first call to alarm and the call to read. If the kernel
blocks the process between these two function calls for longer than the alarm period,
the read could block forever. Most operations of this type use a long alarm period, such
as a minute or more, making this unlikely; nevertheless, it is a race condition.
2.
If system calls are automatically restarted, the read is not interrupted when the
SIGALRM signal handler returns. In this case, the timeout does nothing.
Here, we specifically want a slow system call to be interrupted. POSIX.1 does not give us a
portable way to do this; however, the XSI extension in the Single UNIX Specification does.
We'll discuss this more in Section 10.14.
Figure 10.10. Calling read with a timeout
#include "apue.h"

static void sig_alrm(int);

int
main(void)
{
    int     n;
    char    line[MAXLINE];

    if (signal(SIGALRM, sig_alrm) == SIG_ERR)
        err_sys("signal(SIGALRM) error");

    alarm(10);
    if ((n = read(STDIN_FILENO, line, MAXLINE)) < 0)
        err_sys("read error");
    alarm(0);

    write(STDOUT_FILENO, line, n);
    exit(0);
}

static void
sig_alrm(int signo)
{
    /* nothing to do, just return to interrupt the read */
}
Example
Let's redo the preceding example using longjmp. This way, we don't need to worry about
whether a slow system call is interrupted.
This version works as expected, regardless of whether the system restarts interrupted system
calls. Realize, however, that we still have the problem of interactions with other signal
handlers, as in Figure 10.8.
Figure 10.11. Calling read with a timeout, using longjmp
#include "apue.h"
#include <setjmp.h>

static void     sig_alrm(int);
static jmp_buf  env_alrm;

int
main(void)
{
    int     n;
    char    line[MAXLINE];

    if (signal(SIGALRM, sig_alrm) == SIG_ERR)
        err_sys("signal(SIGALRM) error");
    if (setjmp(env_alrm) != 0)
        err_quit("read timeout");

    alarm(10);
    if ((n = read(STDIN_FILENO, line, MAXLINE)) < 0)
        err_sys("read error");
    alarm(0);

    write(STDOUT_FILENO, line, n);
    exit(0);
}

static void
sig_alrm(int signo)
{
    longjmp(env_alrm, 1);
}
If we want to set a time limit on an I/O operation, we need to use longjmp, as shown
previously, realizing its possible interaction with other signal handlers. Another option is to use
the select or poll functions, described in Sections 14.5.1 and 14.5.2.
10.11. Signal Sets
We need a data type to represent multiple signals: a signal set. We'll use this with such
functions as sigprocmask (in the next section) to tell the kernel not to allow any of the signals
in the set to occur. As we mentioned earlier, the number of different signals can exceed the
number of bits in an integer, so in general, we can't use an integer to represent the set with
one bit per signal. POSIX.1 defines the data type sigset_t to contain a signal set and the
following five functions to manipulate signal sets.
#include <signal.h>

int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signo);
int sigdelset(sigset_t *set, int signo);

All four return: 0 if OK, -1 on error

int sigismember(const sigset_t *set, int signo);

Returns: 1 if true, 0 if false, -1 on error
The function sigemptyset initializes the signal set pointed to by set so that all signals are
excluded. The function sigfillset initializes the signal set so that all signals are included. All
applications have to call either sigemptyset or sigfillset once for each signal set, before
using the signal set, because we cannot assume that the C initialization for external and
static variables (0) corresponds to the implementation of signal sets on a given system.
Once we have initialized a signal set, we can add and delete specific signals in the set. The
function sigaddset adds a single signal to an existing set, and sigdelset removes a single
signal from a set. In all the functions that take a signal set as an argument, we always pass
the address of the signal set as the argument.
Implementation
If the implementation has fewer signals than bits in an integer, a signal set can be
implemented using one bit per signal. For the remainder of this section, assume that an
implementation has 31 signals and 32-bit integers. The sigemptyset function zeros the integer,
and the sigfillset function turns on all the bits in the integer. These two functions can be
implemented as macros in the <signal.h> header:
#define sigemptyset(ptr)    (*(ptr) = 0)
#define sigfillset(ptr)     (*(ptr) = ~(sigset_t)0, 0)
Note that sigfillset must return 0, in addition to setting all the bits on in the signal set, so
we use C's comma operator, which returns the value after the comma as the value of the
expression.
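The comma-operator trick can be tried in isolation. This sketch copies the two macros onto a hypothetical my_sigset_t type (an unsigned int) with hypothetical my_ names, so that they don't collide with the real definitions in <signal.h>; like the text, it assumes that one bit per signal fits in that integer.

```c
#include <limits.h>

/* Hypothetical one-bit-per-signal type, as in the text's assumption. */
typedef unsigned int my_sigset_t;

/* Same shape as the macros above, renamed to avoid <signal.h>. */
#define my_sigemptyset(ptr)  (*(ptr) = 0)
#define my_sigfillset(ptr)   (*(ptr) = ~(my_sigset_t)0, 0)
```

Here my_sigfillset(&s) evaluates to 0 while leaving s equal to UINT_MAX, and my_sigemptyset(&s) evaluates to 0 because the value of an assignment expression is the value assigned.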
Using this implementation, sigaddset turns on a single bit and sigdelset turns off a single bit;
sigismember tests a certain bit. Since no signal is ever numbered 0, we subtract 1 from the
signal number to obtain the bit to manipulate. Figure 10.12 shows implementations of these
functions.
Figure 10.12. An implementation of sigaddset, sigdelset, and sigismember
#include <signal.h>
#include <errno.h>

/* <signal.h> usually defines NSIG to include signal number 0 */
#define SIGBAD(signo)   ((signo) <= 0 || (signo) >= NSIG)

int
sigaddset(sigset_t *set, int signo)
{
    if (SIGBAD(signo)) { errno = EINVAL; return(-1); }

    *set |= 1 << (signo - 1);       /* turn bit on */
    return(0);
}

int
sigdelset(sigset_t *set, int signo)
{
    if (SIGBAD(signo)) { errno = EINVAL; return(-1); }

    *set &= ~(1 << (signo - 1));    /* turn bit off */
    return(0);
}

int
sigismember(const sigset_t *set, int signo)
{
    if (SIGBAD(signo)) { errno = EINVAL; return(-1); }

    return((*set & (1 << (signo - 1))) != 0);
}
We might be tempted to implement these three functions as one-line macros in the <signal.h>
header, but POSIX.1 requires us to check the signal number argument for validity and to set
errno if it is invalid. This is more difficult to do in a macro than in a function.
10.12. sigprocmask Function
Recall from Section 10.8 that the signal mask of a process is the set of signals currently
blocked from delivery to that process. A process can examine its signal mask, change its
signal mask, or perform both operations in one step by calling the following function.
#include <signal.h>
int sigprocmask(int how, const sigset_t *restrict set,
sigset_t *restrict oset);
Returns: 0 if OK, -1 on error
First, if oset is a non-null pointer, the current signal mask for the process is returned through
oset.
Second, if set is a non-null pointer, the how argument indicates how the current signal mask
is modified. Figure 10.13 describes the possible values for how. SIG_BLOCK is an inclusive-OR
operation, whereas SIG_SETMASK is an assignment. Note that SIGKILL and SIGSTOP can't be
blocked.
Figure 10.13. Ways to change current signal mask using sigprocmask
how           Description
SIG_BLOCK     The new signal mask for the process is the union of its current signal
              mask and the signal set pointed to by set. That is, set contains the
              additional signals that we want to block.
SIG_UNBLOCK   The new signal mask for the process is the intersection of its current
              signal mask and the complement of the signal set pointed to by set. That
              is, set contains the signals that we want to unblock.
SIG_SETMASK   The new signal mask for the process is replaced by the value of the
              signal set pointed to by set.
If set is a null pointer, the signal mask of the process is not changed, and how is ignored.
After calling sigprocmask, if any unblocked signals are pending, at least one of these signals is
delivered to the process before sigprocmask returns.
The sigprocmask function is defined only for single-threaded processes. A separate function is
provided to manipulate a thread's signal mask in a multithreaded process. We'll discuss this in
Section 12.8.
Example
Figure 10.14 shows a function that prints the names of the signals in the signal mask of the
calling process. We call this function from the programs shown in Figure 10.20 and Figure
10.22.
To save space, we don't test the signal mask for every signal that we listed in Figure 10.1.
(See Exercise 10.9.)
Figure 10.14. Print the signal mask for the process
#include "apue.h"
#include <errno.h>

void
pr_mask(const char *str)
{
    sigset_t    sigset;
    int         errno_save;

    errno_save = errno;     /* we can be called by signal handlers */
    if (sigprocmask(0, NULL, &sigset) < 0)
        err_sys("sigprocmask error");

    printf("%s", str);
    if (sigismember(&sigset, SIGINT))   printf("SIGINT ");
    if (sigismember(&sigset, SIGQUIT))  printf("SIGQUIT ");
    if (sigismember(&sigset, SIGUSR1))  printf("SIGUSR1 ");
    if (sigismember(&sigset, SIGALRM))  printf("SIGALRM ");

    /* remaining signals can go here */

    printf("\n");
    errno = errno_save;
}
10.13. sigpending Function
The sigpending function returns the set of signals that are blocked from delivery and currently
pending for the calling process. The set of signals is returned through the set argument.
#include <signal.h>
int sigpending(sigset_t *set);
Returns: 0 if OK, -1 on error
Example
Figure 10.15 shows many of the signal features that we've been describing.
The process blocks SIGQUIT, saving its current signal mask (to reset later), and then goes to
sleep for 5 seconds. Any occurrence of the quit signal during this period is blocked and won't
be delivered until the signal is unblocked. At the end of the 5-second sleep, we check
whether the signal is pending and unblock the signal.
Note that we saved the old mask when we blocked the signal. To unblock the signal, we did a
SIG_SETMASK of the old mask. Alternatively, we could SIG_UNBLOCK only the signal that we had
blocked. Be aware, however, if we write a function that can be called by others and if we
need to block a signal in our function, we can't use SIG_UNBLOCK to unblock the signal. In this
case, we have to use SIG_SETMASK and reset the signal mask to its prior value, because it's
possible that the caller had specifically blocked this signal before calling our function. We'll see
an example of this in the system function in Section 10.18.
If we generate the quit signal during this sleep period, the signal is now pending and
unblocked, so it is delivered before sigprocmask returns. We'll see this occur because the
printf in the signal handler is output before the printf that follows the call to sigprocmask.
The process then goes to sleep for another 5 seconds. If we generate the quit signal during
this sleep period, the signal should terminate the process, since we reset the handling of the
signal to its default when we caught it. In the following output, the terminal prints ^\ when
we input Control-backslash, the terminal quit character:
$ ./a.out
^\                              generate signal once (before 5 seconds are up)
SIGQUIT pending                 after return from sleep
caught SIGQUIT                  in signal handler
SIGQUIT unblocked               after return from sigprocmask
^\Quit(coredump)                generate signal again
$ ./a.out
^\^\^\^\^\^\^\^\^\^\            generate signal 10 times (before 5 seconds are up)
SIGQUIT pending
caught SIGQUIT                  signal is generated only once
SIGQUIT unblocked
^\Quit(coredump)                generate signal again
The message Quit(coredump) is printed by the shell when it sees that its child terminated
abnormally. Note that when we run the program the second time, we generate the quit signal
ten times while the process is asleep, yet the signal is delivered only once to the process
when it's unblocked. This demonstrates that signals are not queued on this system.
Figure 10.15. Example of signal sets and sigprocmask
#include "apue.h"

static void sig_quit(int);

int
main(void)
{
    sigset_t    newmask, oldmask, pendmask;

    if (signal(SIGQUIT, sig_quit) == SIG_ERR)
        err_sys("can't catch SIGQUIT");

    /*
     * Block SIGQUIT and save current signal mask.
     */
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGQUIT);
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");

    sleep(5);   /* SIGQUIT here will remain pending */

    if (sigpending(&pendmask) < 0)
        err_sys("sigpending error");
    if (sigismember(&pendmask, SIGQUIT))
        printf("\nSIGQUIT pending\n");

    /*
     * Reset signal mask which unblocks SIGQUIT.
     */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");
    printf("SIGQUIT unblocked\n");

    sleep(5);   /* SIGQUIT here will terminate with core file */
    exit(0);
}

static void
sig_quit(int signo)
{
    printf("caught SIGQUIT\n");
    if (signal(SIGQUIT, SIG_DFL) == SIG_ERR)
        err_sys("can't reset SIGQUIT");
}
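The same non-queuing behavior can be checked without a terminal. The following is a minimal sketch, not taken from the text: it uses sigaction (Section 10.14) instead of signal, raise instead of typing Control-backslash, and SIGUSR1 instead of SIGQUIT; count_deliveries and count_handler are hypothetical names. On Linux, where ordinary signals are not queued, it should return 1 for any times >= 1. POSIX leaves it to the implementation whether a non-real-time signal generated several times while pending is delivered more than once.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries;   /* bumped by the handler */

static void count_handler(int signo)
{
    (void)signo;
    deliveries = deliveries + 1;
}

/*
 * Hypothetical helper: block sig, generate it `times` times while it is
 * blocked, then unblock it and return how many times the handler ran.
 * Returns -1 if a call fails or the signal never became pending.
 */
int count_deliveries(int sig, int times)
{
    struct sigaction act;
    sigset_t block, old, pend;
    int i;

    memset(&act, 0, sizeof(act));
    act.sa_handler = count_handler;
    sigemptyset(&act.sa_mask);
    if (sigaction(sig, &act, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    deliveries = 0;
    for (i = 0; i < times; i++)
        raise(sig);                       /* blocked, so it stays pending */

    if (sigpending(&pend) < 0 || !sigismember(&pend, sig)) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Unblocking delivers the pending signal before sigprocmask returns. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) < 0)
        return -1;
    return deliveries;
}
```

Calling count_deliveries(SIGUSR1, 10) corresponds to the second run above: ten generations, one delivery.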
10.14. sigaction Function
The sigaction function allows us to examine or modify (or both) the action associated with a
particular signal. This function supersedes the signal function from earlier releases of the
UNIX System. Indeed, at the end of this section, we show an implementation of signal using
sigaction.
#include <signal.h>
int sigaction(int signo, const struct sigaction *restrict act,
              struct sigaction *restrict oact);
Returns: 0 if OK, -1 on error
The argument signo is the signal number whose action we are examining or modifying. If the
act pointer is non-null, we are modifying the action. If the oact pointer is non-null, the system
returns the previous action for the signal through the oact pointer. This function uses the
following structure:
struct sigaction {
    void      (*sa_handler)(int);   /* addr of signal handler, */
                                    /* or SIG_IGN, or SIG_DFL */
    sigset_t  sa_mask;              /* additional signals to block */
    int       sa_flags;             /* signal options, Figure 10.16 */

    /* alternate handler */
    void      (*sa_sigaction)(int, siginfo_t *, void *);
};
When changing the action for a signal, if the sa_handler field contains the address of a
signal-catching function (as opposed to the constants SIG_IGN or SIG_DFL), then the sa_mask
field specifies a set of signals that are added to the signal mask of the process before the
signal-catching function is called. If and when the signal-catching function returns, the signal
mask of the process is reset to its previous value. This way, we are able to block certain
signals whenever a signal handler is invoked. The operating system includes the signal being
delivered in the signal mask when the handler is invoked. Hence, we are guaranteed that
whenever we are processing a given signal, another occurrence of that same signal is blocked
until we're finished processing the first occurrence. Recall from Section 10.8 that additional
occurrences of the same signal are usually not queued. If the signal occurs five times while it
is blocked, when we unblock the signal, the signal-handling function for that signal will usually
be invoked only one time.
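A small sketch can check both claims from inside a handler: the signal being caught and every signal in sa_mask are blocked while the handler runs, and the previous mask is back once it returns. The helper names are hypothetical, and SIGUSR1 and SIGUSR2 are chosen arbitrarily. The handler relies on sigprocmask and sigismember being async-signal-safe, which POSIX.1 specifies.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t self_blocked, extra_blocked;

static void probe_handler(int signo)
{
    sigset_t cur;

    /* With a NULL set, sigprocmask only reports the current mask. */
    sigprocmask(SIG_BLOCK, NULL, &cur);
    self_blocked = sigismember(&cur, signo);
    extra_blocked = sigismember(&cur, SIGUSR2);
}

/*
 * Hypothetical helper: catch SIGUSR1 with SIGUSR2 in sa_mask. Returns 1 if
 * SIGUSR1 is unblocked again once the handler has returned, 0 if it is
 * still blocked, -1 on error.
 */
int probe_handler_mask(void)
{
    struct sigaction act;
    sigset_t after;

    memset(&act, 0, sizeof(act));
    act.sa_handler = probe_handler;
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGUSR2);     /* extra signal to block */
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        return -1;

    self_blocked = extra_blocked = 0;
    raise(SIGUSR1);               /* handler runs before raise returns */

    sigprocmask(SIG_BLOCK, NULL, &after);
    return !sigismember(&after, SIGUSR1);
}
```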
Once we install an action for a given signal, that action remains installed until we explicitly
change it by calling sigaction. Unlike earlier systems with their unreliable signals, POSIX.1
requires that a signal handler remain installed until explicitly changed.
The sa_flags field of the act structure specifies various options for the handling of this signal.
Figure 10.16 details the meaning of these options when set. The SUS column contains • if the
flag is defined as part of the base POSIX.1 specification, and XSI if it is defined as an XSI
extension to the base.
Figure 10.16. Option flags (sa_flags) for the handling of each signal
Option         SUS   FreeBSD 5.2.1   Linux 2.4.22   Mac OS X 10.3   Solaris 9
SA_INTERRUPT                         •
SA_NOCLDSTOP   •     •               •              •               •
SA_NOCLDWAIT   XSI   •               •              •               •
SA_NODEFER     XSI   •               •              •               •
SA_ONSTACK     XSI   •               •              •               •
SA_RESETHAND   XSI   •               •              •               •
SA_RESTART     XSI   •               •              •               •
SA_SIGINFO     •     •               •              •               •
Descriptions:
SA_INTERRUPT: System calls interrupted by this signal are not automatically restarted (the XSI
default for sigaction). See Section 10.5 for more information.
SA_NOCLDSTOP: If signo is SIGCHLD, do not generate this signal when a child process stops (job
control). This signal is still generated, of course, when a child terminates (but see the
SA_NOCLDWAIT option below). As an XSI extension, SIGCHLD won't be sent when a stopped child
continues if this flag is set.
SA_NOCLDWAIT: If signo is SIGCHLD, this option prevents the system from creating zombie
processes when children of the calling process terminate. If it subsequently calls wait, the
calling process blocks until all its child processes have terminated and then returns -1 with
errno set to ECHILD. (Recall Section 10.7.)
SA_NODEFER: When this signal is caught, the signal is not automatically blocked by the system
while the signal-catching function executes (unless the signal is also included in sa_mask).
Note that this type of operation corresponds to the earlier unreliable signals.
SA_ONSTACK: If an alternate stack has been declared with sigaltstack(2), this signal is
delivered to the process on the alternate stack.
SA_RESETHAND: The disposition for this signal is reset to SIG_DFL, and the SA_SIGINFO flag is
cleared on entry to the signal-catching function. Note that this type of operation corresponds
to the earlier unreliable signals. The disposition for the two signals SIGILL and SIGTRAP can't
be reset automatically, however. Setting this flag causes sigaction to behave as if SA_NODEFER
is also set.
SA_RESTART: System calls interrupted by this signal are automatically restarted. (Refer to
Section 10.5.)
SA_SIGINFO: This option provides additional information to a signal handler: a pointer to a
siginfo structure and a pointer to an identifier for the process context.
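To see SA_RESETHAND at work, this sketch (hypothetical names, SIGUSR1 chosen arbitrarily) catches a signal once and then queries the disposition by calling sigaction with a null act pointer, which changes nothing. The handler should have run, and the disposition should be back to SIG_DFL. Raising the signal a second time would therefore take the default action, which for SIGUSR1 terminates the process, so the sketch deliberately stops after one delivery.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t caught;

static void once_handler(int signo)
{
    (void)signo;
    caught = 1;
}

/*
 * Hypothetical helper: catch sig once with SA_RESETHAND, then report
 * whether the handler ran and the disposition is back to SIG_DFL
 * (1 = yes, 0 = no, -1 = error).
 */
int reset_after_catch(int sig)
{
    struct sigaction act, now;

    memset(&act, 0, sizeof(act));
    act.sa_handler = once_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESETHAND;         /* one-shot handler */
    if (sigaction(sig, &act, NULL) < 0)
        return -1;

    caught = 0;
    raise(sig);                          /* caught by once_handler */

    /* A null act pointer only queries the current action. */
    if (sigaction(sig, NULL, &now) < 0)
        return -1;
    return caught && now.sa_handler == SIG_DFL;
}
```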
The sa_sigaction field is an alternate signal handler used when the SA_SIGINFO flag is used
with sigaction. Implementations might use the same storage for both the sa_sigaction field
and the sa_handler field, so applications can use only one of these fields at a time.
Normally, the signal handler is called as
void handler(int signo);
but if the SA_SIGINFO flag is set, the signal handler is called as
void handler(int signo, siginfo_t *info, void *context);
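A minimal sketch of a three-argument handler, assuming an XSI-conforming system such as Linux; the helper names are hypothetical. The signal is sent with kill(getpid(), sig) rather than raise, because POSIX specifies SI_USER (Figure 10.17) for signals sent by kill, while some implementations report a different code for raise (Linux, for example, reports SI_TKILL). The handler copies fields into sig_atomic_t variables; storing si_pid that way assumes a process ID fits in a sig_atomic_t, which holds on Linux.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t info_signo, info_code, info_pid;

static void info_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;             /* could be cast to ucontext_t * if needed */
    info_signo = info->si_signo;
    info_code = info->si_code;
    info_pid = (sig_atomic_t)info->si_pid;   /* assumes pid_t fits */
}

/* Hypothetical helper: install info_handler for sig, then signal ourself. */
int send_self_with_info(int sig)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = info_handler;   /* sa_sigaction, not sa_handler */
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    if (sigaction(sig, &act, NULL) < 0)
        return -1;

    /* Delivered before kill returns: sig is unblocked, single thread. */
    return kill(getpid(), sig);
}
```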
The siginfo_t structure contains information about why the signal was generated. An example
of what it might look like is shown below. All POSIX.1-compliant implementations must include
at least the si_signo and si_code members. Additionally, implementations that are XSI
compliant contain at least the following fields:
struct siginfo {
    int    si_signo;   /* signal number */
    int    si_errno;   /* if nonzero, errno value from <errno.h> */
    int    si_code;    /* additional info (depends on signal) */
    pid_t  si_pid;     /* sending process ID */
    uid_t  si_uid;     /* sending process real user ID */
    void  *si_addr;    /* address that caused the fault */
    int    si_status;  /* exit value or signal number */
    long   si_band;    /* band number for SIGPOLL */
    /* possibly other fields also */
};
Figure 10.17 shows values of si_code for various signals, as defined by the Single UNIX
Specification. Note that implementations may define additional code values.
Figure 10.17. siginfo_t code values
Signal    Code           Reason
SIGILL    ILL_ILLOPC     illegal opcode
          ILL_ILLOPN     illegal operand
          ILL_ILLADR     illegal addressing mode
          ILL_ILLTRP     illegal trap
          ILL_PRVOPC     privileged opcode
          ILL_PRVREG     privileged register
          ILL_COPROC     coprocessor error
          ILL_BADSTK     internal stack error
SIGFPE    FPE_INTDIV     integer divide by zero
          FPE_INTOVF     integer overflow
          FPE_FLTDIV     floating-point divide by zero
          FPE_FLTOVF     floating-point overflow
          FPE_FLTUND     floating-point underflow
          FPE_FLTRES     floating-point inexact result
          FPE_FLTINV     invalid floating-point operation
          FPE_FLTSUB     subscript out of range
SIGSEGV   SEGV_MAPERR    address not mapped to object
          SEGV_ACCERR    invalid permissions for mapped object
SIGBUS    BUS_ADRALN     invalid address alignment
          BUS_ADRERR     nonexistent physical address
          BUS_OBJERR     object-specific hardware error
SIGTRAP   TRAP_BRKPT     process breakpoint trap
          TRAP_TRACE     process trace trap
SIGCHLD   CLD_EXITED     child has exited
          CLD_KILLED     child has terminated abnormally (no core)
          CLD_DUMPED     child has terminated abnormally with core
          CLD_TRAPPED    traced child has trapped
          CLD_STOPPED    child has stopped
          CLD_CONTINUED  stopped child has continued
SIGPOLL   POLL_IN        data can be read
          POLL_OUT       data can be written
          POLL_MSG       input message available
          POLL_ERR       I/O error
          POLL_PRI       high-priority message available
          POLL_HUP       device disconnected
Any       SI_USER        signal sent by kill
          SI_QUEUE       signal sent by sigqueue (real-time extension)
          SI_TIMER       expiration of a timer set by timer_settime (real-time extension)
          SI_ASYNCIO     completion of asynchronous I/O request (real-time extension)
          SI_MESGQ       arrival of a message on a message queue (real-time extension)
If the signal is SIGCHLD, then the si_pid, si_status, and si_uid fields will be set. If the signal is
SIGILL or SIGSEGV, then the si_addr field contains the address responsible for the fault, although
the address might not be accurate. If the signal is SIGPOLL, then the si_band field will contain
the priority band for STREAMS messages that generate the POLL_IN, POLL_OUT, or POLL_MSG
events. (For a complete discussion of priority bands, see Rago [1993].) The si_errno field
contains the error number corresponding to the condition that caused the signal to be
generated, although its use is implementation defined.
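The SIGCHLD case can be sketched as follows, assuming an XSI system such as Linux; all names are hypothetical. SIGCHLD is blocked before the fork so that the notification cannot arrive before the parent is ready to wait for it, and sigsuspend (Section 10.16) then unblocks it and waits in one step. The handler only records fields from the siginfo structure; the child must still be reaped with waitpid afterward.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t chld_seen, chld_code, chld_status, chld_pid;

static void chld_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    chld_code = info->si_code;        /* e.g., CLD_EXITED */
    chld_status = info->si_status;    /* exit value for CLD_EXITED */
    chld_pid = (sig_atomic_t)info->si_pid;
    chld_seen = 1;
}

/*
 * Hypothetical helper: fork a child that exits with `code`, wait for the
 * SIGCHLD, then reap the child. Returns the child's PID, or -1 on error.
 */
pid_t run_child_and_catch(int code)
{
    struct sigaction act;
    sigset_t block, old;
    pid_t pid;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = chld_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &act, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    chld_seen = 0;
    if ((pid = fork()) < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);                  /* child */

    while (!chld_seen)
        sigsuspend(&old);             /* assumes old doesn't block SIGCHLD */

    waitpid(pid, NULL, 0);            /* the signal alone doesn't reap it */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```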
The context argument to the signal handler is a typeless pointer that can be cast to a
ucontext_t structure identifying the process context at the time of signal delivery.
When an implementation supports the real-time signal extensions, signal handlers established
with the SA_SIGINFO flag will result in signals being queued reliably. A separate range of
reserved signals is available for real-time application use. The siginfo structure can contain
application-specific data if the signal is generated by sigqueue. We do not discuss the
real-time extensions further. Refer to Gallmeister [1995] for more details.
Example: signal Function
Let's now implement the signal function using sigaction. This is what many platforms do (and
what a note in the POSIX.1 Rationale states was the intent of POSIX). Systems with binary
compatibility constraints, on the other hand, might provide a signal function that supports
the older, unreliable-signal semantics. Unless you specifically require these older, unreliable
semantics (for backward compatibility), you should use the following implementation of signal
or call sigaction directly. (As you might guess, an implementation of signal with the old
semantics could call sigaction specifying SA_RESETHAND and SA_NODEFER.) All the examples in
this text that call signal call the function shown in Figure 10.18.
Note that we must use sigemptyset to initialize the sa_mask member of the structure. We're
not guaranteed that
act.sa_mask = 0;
does the same thing.
We intentionally try to set the SA_RESTART flag for all signals other than SIGALRM, so that any
system call interrupted by these other signals is automatically restarted. The reason we don't
want SIGALRM restarted is to allow us to set a timeout for I/O operations. (Recall the
discussion of Figure 10.10.)
Some older systems, such as SunOS, define the SA_INTERRUPT flag. These systems restart
interrupted system calls by default, so specifying this flag causes system calls to be
interrupted. Linux defines the SA_INTERRUPT flag for compatibility with applications that use it,
but the default is to not restart system calls when the signal handler is installed with
sigaction. The XSI extension of the Single UNIX Specification specifies that the sigaction
function not restart interrupted system calls unless the SA_RESTART flag is specified.
Figure 10.18. An implementation of signal using sigaction
#include "apue.h"

/* Reliable version of signal(), using POSIX sigaction(). */
Sigfunc *
signal(int signo, Sigfunc *func)
{
    struct sigaction    act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo == SIGALRM) {
#ifdef  SA_INTERRUPT
        act.sa_flags |= SA_INTERRUPT;
#endif
    } else {
#ifdef  SA_RESTART
        act.sa_flags |= SA_RESTART;
#endif
    }
    if (sigaction(signo, &act, &oact) < 0)
        return(SIG_ERR);
    return(oact.sa_handler);
}
Example: signal_intr Function
Figure 10.19 shows a version of the signal function that tries to prevent any interrupted
system calls from being restarted.
For improved portability, we specify the SA_INTERRUPT flag, if defined by the system, to
prevent interrupted system calls from being restarted.
Figure 10.19. The signal_intr function
#include "apue.h"

Sigfunc *
signal_intr(int signo, Sigfunc *func)
{
    struct sigaction    act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
#ifdef  SA_INTERRUPT
    act.sa_flags |= SA_INTERRUPT;
#endif
    if (sigaction(signo, &act, &oact) < 0)
        return(SIG_ERR);
    return(oact.sa_handler);
}
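The difference between the two installation styles is easy to observe. In this sketch, read on an empty pipe is interrupted by SIGALRM. The assumptions: a POSIX system with setitimer, a one-shot 100 ms timer standing in for alarm, a child that writes one byte after roughly 500 ms, and hypothetical helper names. Installed without SA_RESTART, as signal_intr does, read should fail with EINTR; installed with SA_RESTART, read should be restarted and eventually return the byte. The timings make this a demonstration rather than a guarantee on a heavily loaded machine.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void alrm_noop(int signo)
{
    (void)signo;                  /* exists only to interrupt read */
}

/*
 * Hypothetical helper: block in read() while a one-shot 100 ms timer
 * fires SIGALRM. Returns read's result (or -2 on setup failure) and
 * stores errno from a failed read in *err (0 otherwise).
 */
ssize_t read_with_alarm(int restart, int *err)
{
    struct sigaction act;
    struct itimerval tv;
    int fd[2];
    char c;
    ssize_t n;
    pid_t pid;

    memset(&act, 0, sizeof(act));
    act.sa_handler = alrm_noop;
    sigemptyset(&act.sa_mask);
    act.sa_flags = restart ? SA_RESTART : 0;  /* 0: like signal_intr */
    if (sigaction(SIGALRM, &act, NULL) < 0 || pipe(fd) < 0)
        return -2;

    if ((pid = fork()) < 0)
        return -2;
    if (pid == 0) {                           /* child: late writer */
        struct timespec delay = { 0, 500000000L };
        close(fd[0]);
        nanosleep(&delay, NULL);
        _exit(write(fd[1], "x", 1) == 1 ? 0 : 1);
    }
    close(fd[1]);

    memset(&tv, 0, sizeof(tv));
    tv.it_value.tv_usec = 100000;             /* 100 ms, no reload */
    setitimer(ITIMER_REAL, &tv, NULL);

    n = read(fd[0], &c, 1);                   /* SIGALRM arrives here */
    *err = (n < 0) ? errno : 0;

    waitpid(pid, NULL, 0);
    close(fd[0]);
    return n;
}
```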
10.15. sigsetjmp and siglongjmp Functions
In Section 7.10, we described the setjmp and longjmp functions, which can be used for
nonlocal branching. The longjmp function is often called from a signal handler to return to the
main loop of a program, instead of returning from the handler. We saw this in Figures 10.8
and 10.11.
There is a problem in calling longjmp, however. When a signal is caught, the signal-catching
function is entered with the current signal automatically being added to the signal mask of the
process. This prevents subsequent occurrences of that signal from interrupting the signal
handler. If we longjmp out of the signal handler, what happens to the signal mask for the
process?
Under FreeBSD 5.2.1 and Mac OS X 10.3, setjmp and longjmp save and restore the signal
mask. Linux 2.4.22 and Solaris 9, however, do not do this. FreeBSD and Mac OS X provide the
functions _setjmp and _longjmp, which do not save and restore the signal mask.
To allow either form of behavior, POSIX.1 does not specify the effect of setjmp and longjmp on
signal masks. Instead, two new functions, sigsetjmp and siglongjmp, are defined by POSIX.1.
These two functions should always be used when branching from a signal handler.
#include <setjmp.h>
int sigsetjmp(sigjmp_buf env, int savemask);
Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
void siglongjmp(sigjmp_buf env, int val);
The only difference between these functions and the setjmp and longjmp functions is that
sigsetjmp has an additional argument. If savemask is nonzero, then sigsetjmp also saves the
current signal mask of the process in env. When siglongjmp is called, if the env argument was
saved by a call to sigsetjmp with a nonzero savemask, then siglongjmp restores the saved
signal mask.
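A minimal sketch of the difference savemask makes, using hypothetical names and SIGUSR1 raised by the process itself. The handler leaves with siglongjmp and never returns, so the system never undoes the blocking of SIGUSR1 that it applied on entry. With a nonzero savemask, siglongjmp restores the mask saved by sigsetjmp and SIGUSR1 is unblocked again; with a zero savemask, the mask is left as it was inside the handler. The canjump guard is the technique described in the example that follows.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jmpenv;
static volatile sig_atomic_t canjump;    /* nonzero once jmpenv is valid */

static void jump_handler(int signo)
{
    (void)signo;
    if (canjump == 0)
        return;                          /* unexpected signal, ignore */
    canjump = 0;
    siglongjmp(jmpenv, 1);               /* leave the handler, don't return */
}

/*
 * Hypothetical helper: returns 1 if SIGUSR1 is still blocked after
 * jumping out of its handler, 0 if not, -1 on error.
 */
int blocked_after_jump(int savemask)
{
    struct sigaction act;
    sigset_t cur, usr1;
    int blocked;

    memset(&act, 0, sizeof(act));
    act.sa_handler = jump_handler;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        return -1;

    if (sigsetjmp(jmpenv, savemask) == 0) {
        canjump = 1;
        raise(SIGUSR1);                  /* handler jumps back above */
        return -1;                       /* not reached */
    }

    sigprocmask(SIG_BLOCK, NULL, &cur);
    blocked = sigismember(&cur, SIGUSR1);

    sigemptyset(&usr1);                  /* clean up for the caller */
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, NULL);
    return blocked;
}
```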
Example
The program in Figure 10.20 demonstrates how the signal mask that is installed by the system
when a signal handler is invoked automatically includes the signal being caught. The program
also illustrates the use of the sigsetjmp and siglongjmp functions.
This program demonstrates another technique that should be used whenever siglongjmp is
called from a signal handler. We set the variable canjump to a nonzero value only after we've
called sigsetjmp. This variable is also examined in the signal handler, and siglongjmp is called
only if the flag canjump is nonzero. This provides protection against the signal handler being
called at some earlier or later time, when the jump buffer isn't initialized by sigsetjmp. (In this
trivial program, we terminate quickly after the siglongjmp, but in larger programs, the signal
handler may remain installed long after the siglongjmp.) Providing this type of protection
usually isn't required with longjmp in normal C code (as opposed to a signal handler). Since a
signal can occur at any time, however, we need the added protection in a signal handler.
Here, we use the data type sig_atomic_t, which is defined by the ISO C standard to be the
type of variable that can be written without being interrupted. By this we mean that a
variable of this type should not extend across page boundaries on a system with virtual
memory and can be accessed with a single machine instruction, for example. We always
include the ISO type qualifier volatile for these data types too, since the variable is being
accessed by two different threads of control: the main function and the asynchronously
executing signal handler. Figure 10.21 shows a time line for this program.
We can divide Figure 10.21 into three parts: the left part (corresponding to main), the center
part (sig_usr1), and the right part (sig_alrm). While the process is executing in the left part,
its signal mask is 0 (no signals are blocked). While executing in the center part, its signal mask
is SIGUSR1. While executing in the right part, its signal mask is SIGUSR1|SIGALRM.
Let's examine the output when the program in Figure 10.20 is executed:
$ ./a.out &                         start process in background
starting main:
[1]     531                         the job-control shell prints its process ID
$ kill -USR1 531                    send the process SIGUSR1
starting sig_usr1: SIGUSR1
$ in sig_alrm: SIGUSR1 SIGALRM
finishing sig_usr1: SIGUSR1
ending main:
                                    just press RETURN
[1] + Done          ./a.out &
The output is as we expect: when a signal handler is invoked, the signal being caught is
added to the current signal mask of the process. The original mask is restored when the signal
handler returns. Also, siglongjmp restores the signal mask that was saved by sigsetjmp.
If we change the program in Figure 10.20 so that the calls to sigsetjmp and siglongjmp are
replaced with calls to setjmp and longjmp on Linux (or _setjmp and _longjmp on FreeBSD), the
final line of output becomes
ending main: SIGUSR1
This means that the main function is executing with the SIGUSR1 signal blocked, after the call
to setjmp. This probably isn't what we want.
Figure 10.20. Example of signal masks, sigsetjmp, and siglongjmp
#include "apue.h"
#include <setjmp.h>
#include <time.h>

static void                         sig_usr1(int), sig_alrm(int);
static sigjmp_buf                   jmpbuf;
static volatile sig_atomic_t        canjump;

int
main(void)
{
    if (signal(SIGUSR1, sig_usr1) == SIG_ERR)
        err_sys("signal(SIGUSR1) error");
    if (signal(SIGALRM, sig_alrm) == SIG_ERR)
        err_sys("signal(SIGALRM) error");
    pr_mask("starting main: ");     /* Figure 10.14 */

    if (sigsetjmp(jmpbuf, 1)) {
        pr_mask("ending main: ");
        exit(0);
    }
    canjump = 1;    /* now sigsetjmp() is OK */

    for ( ; ; )
        pause();
}

static void
sig_usr1(int signo)
{
    time_t  starttime;

    if (canjump == 0)
        return;     /* unexpected signal, ignore */

    pr_mask("starting sig_usr1: ");
    alarm(3);               /* SIGALRM in 3 seconds */
    starttime = time(NULL);
    for ( ; ; )             /* busy wait for 5 seconds */
        if (time(NULL) > starttime + 5)
            break;
    pr_mask("finishing sig_usr1: ");

    canjump = 0;
    siglongjmp(jmpbuf, 1);  /* jump back to main, don't return */
}

static void
sig_alrm(int signo)
{
    pr_mask("in sig_alrm: ");
}
Figure 10.21. Time line for example program handling two signals
10.16. sigsuspend Function
We have seen how we can change the signal mask for a process to block and unblock
selected signals. We can use this technique to protect critical regions of code that we don't
want interrupted by a signal. What if we want to unblock a signal and then pause, waiting for
the previously blocked signal to occur? Assuming that the signal is SIGINT, the incorrect way
to do this is
sigset_t    newmask, oldmask;

sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);

/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
    err_sys("SIG_BLOCK error");

/* critical region of code */

/* reset signal mask, which unblocks SIGINT */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
    err_sys("SIG_SETMASK error");

/* window is open */
pause();  /* wait for signal to occur */

/* continue processing */
If the signal is sent to the process while it is blocked, the signal delivery will be deferred until
the signal is unblocked. To the application, this can look as if the signal occurs between the
unblocking and the pause (depending on how the kernel implements signals). If this happens,
or if the signal does occur between the unblocking and the pause, we have a problem. Any
occurrence of the signal in this window of time is lost in the sense that we might not see the
signal again, in which case the pause will block indefinitely. This is another problem with the
earlier unreliable signals.
To correct this problem, we need a way to both reset the signal mask and put the process to
sleep in a single atomic operation. This feature is provided by the sigsuspend function.
#include <signal.h>
int sigsuspend(const sigset_t *sigmask);
Returns: -1 with errno set to EINTR
The signal mask of the process is set to the value pointed to by sigmask. Then the process is
suspended until a signal is caught or until a signal occurs that terminates the process. If a
signal is caught and if the signal handler returns, then sigsuspend returns, and the signal mask
of the process is set to its value before the call to sigsuspend.
Note that there is no successful return from this function. If it returns to the caller, it always
returns -1 with errno set to EINTR (indicating an interrupted system call).
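The atomicity is what closes the window. In the sketch below (hypothetical names, SIGUSR1 chosen arbitrarily), the signal is raised while it is still blocked, which is exactly the case that made pause block forever in the incorrect version. Because sigsuspend installs the new mask and suspends in one step, the pending signal is delivered at once, sigsuspend returns -1 with errno set to EINTR, and the blocking mask is back in place on return.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_flag;   /* set by the handler */

static void set_flag(int signo)
{
    (void)signo;
    usr1_flag = 1;
}

/*
 * Hypothetical helper. Returns sigsuspend's result (-2 on setup failure),
 * stores errno in *err, and sets *still_blocked to 1 if SIGUSR1 was
 * blocked again when sigsuspend returned.
 */
int wait_without_race(int *err, int *still_blocked)
{
    struct sigaction act;
    sigset_t block, old, waitmask, cur;
    int r;

    memset(&act, 0, sizeof(act));
    act.sa_handler = set_flag;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        return -2;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -2;

    usr1_flag = 0;
    raise(SIGUSR1);               /* arrives "in the window": stays pending */

    waitmask = old;
    sigdelset(&waitmask, SIGUSR1);
    r = sigsuspend(&waitmask);    /* unblock and pause in one step */
    *err = errno;

    sigprocmask(SIG_BLOCK, NULL, &cur);       /* mask after return */
    *still_blocked = sigismember(&cur, SIGUSR1);

    sigprocmask(SIG_SETMASK, &old, NULL);     /* caller's mask again */
    return r;
}
```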
Example
Figure 10.22 shows the correct way to protect a critical region of code from a specific signal.
Note that when sigsuspend returns, it sets the signal mask to its value before the call. In this
example, the SIGINT signal will be blocked. We therefore reset the signal mask to the value
that we saved earlier (oldmask).
Running the program from Figure 10.22 produces the following output:
$ ./a.out
program start:
in critical region: SIGINT
^?                          type the interrupt character
in sig_int: SIGINT SIGUSR1
after return from sigsuspend: SIGINT
program exit:
We added SIGUSR1 to the mask installed when we called sigsuspend so that when the signal
handler ran, we could tell that the mask had actually changed. We can see that when
sigsuspend returns, it restores the signal mask to its value before the call.
Figure 10.22. Protecting a critical region from a signal
#include "apue.h"

static void sig_int(int);

int
main(void)
{
    sigset_t    newmask, oldmask, waitmask;

    pr_mask("program start: ");

    if (signal(SIGINT, sig_int) == SIG_ERR)
        err_sys("signal(SIGINT) error");
    sigemptyset(&waitmask);
    sigaddset(&waitmask, SIGUSR1);
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGINT);

    /*
     * Block SIGINT and save current signal mask.
     */
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");

    /*
     * Critical region of code.
     */
    pr_mask("in critical region: ");

    /*
     * Pause, allowing all signals except SIGUSR1.
     */
    if (sigsuspend(&waitmask) != -1)
        err_sys("sigsuspend error");

    pr_mask("after return from sigsuspend: ");

    /*
     * Reset signal mask which unblocks SIGINT.
     */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");

    /*
     * And continue processing ...
     */
    pr_mask("program exit: ");

    exit(0);
}

static void
sig_int(int signo)
{
    pr_mask("\nin sig_int: ");
}
Example
Another use of sigsuspend is to wait for a signal handler to set a global variable. In the
program shown in Figure 10.23, we catch both the interrupt signal and the quit signal, but
want to wake up the main routine only when the quit signal is caught.
Sample output from this program is
$ ./a.out
^?                  type the interrupt character
interrupt
^?                  type the interrupt character again
interrupt
^?                  and again
interrupt
^?                  and again
interrupt
^?                  and again
interrupt
^?                  and again
interrupt
^?                  and again
interrupt
^\ $                now terminate with quit character
Figure 10.23. Using sigsuspend to wait for a global variable to be set
#include "apue.h"

volatile sig_atomic_t   quitflag;   /* set nonzero by signal handler */

static void
sig_int(int signo)  /* one signal handler for SIGINT and SIGQUIT */
{
    if (signo == SIGINT)
        printf("\ninterrupt\n");
    else if (signo == SIGQUIT)
        quitflag = 1;   /* set flag for main loop */
}

int
main(void)
{
    sigset_t    newmask, oldmask, zeromask;

    if (signal(SIGINT, sig_int) == SIG_ERR)
        err_sys("signal(SIGINT) error");
    if (signal(SIGQUIT, sig_int) == SIG_ERR)
        err_sys("signal(SIGQUIT) error");

    sigemptyset(&zeromask);
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGQUIT);

    /*
     * Block SIGQUIT and save current signal mask.
     */
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");

    while (quitflag == 0)
        sigsuspend(&zeromask);

    /*
     * SIGQUIT has been caught and is now blocked; do whatever.
     */
    quitflag = 0;

    /*
     * Reset signal mask which unblocks SIGQUIT.
     */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");

    exit(0);
}
For portability between POSIX.1 systems and non-POSIX systems that support ISO C, the only
thing we should do within a signal handler is assign a value to a variable of type
volatile sig_atomic_t, and nothing else. POSIX.1 goes further and specifies a list of functions
that are safe to call from within a signal handler (Figure 10.4), but if we rely on that list,
our code may not run correctly on non-POSIX systems.
Example
As another example of signals, we show how signals can be used to synchronize a parent and
child. Figure 10.24 shows implementations of the five routines TELL_WAIT, TELL_PARENT,
TELL_CHILD, WAIT_PARENT, and WAIT_CHILD from Section 8.9.
We use the two user-defined signals: SIGUSR1 is sent by the parent to the child, and SIGUSR2
is sent by the child to the parent. In Figure 15.7, we show another implementation of these
five functions using pipes.
Figure 10.24. Routines to allow a parent and child to synchronize
#include "apue.h"

static volatile sig_atomic_t sigflag; /* set nonzero by sig handler */
static sigset_t newmask, oldmask, zeromask;

static void
sig_usr(int signo)  /* one signal handler for SIGUSR1 and SIGUSR2 */
{
    sigflag = 1;
}

void
TELL_WAIT(void)
{
    if (signal(SIGUSR1, sig_usr) == SIG_ERR)
        err_sys("signal(SIGUSR1) error");
    if (signal(SIGUSR2, sig_usr) == SIG_ERR)
        err_sys("signal(SIGUSR2) error");
    sigemptyset(&zeromask);
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGUSR1);
    sigaddset(&newmask, SIGUSR2);

    /*
     * Block SIGUSR1 and SIGUSR2, and save current signal mask.
     */
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");
}

void
TELL_PARENT(pid_t pid)
{
    kill(pid, SIGUSR2);     /* tell parent we're done */
}

void
WAIT_PARENT(void)
{
    while (sigflag == 0)
        sigsuspend(&zeromask);  /* and wait for parent */
    sigflag = 0;

    /*
     * Reset signal mask to original value.
     */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");
}

void
TELL_CHILD(pid_t pid)
{
    kill(pid, SIGUSR1);     /* tell child we're done */
}

void
WAIT_CHILD(void)
{
    while (sigflag == 0)
        sigsuspend(&zeromask);  /* and wait for child */
    sigflag = 0;

    /*
     * Reset signal mask to original value.
     */
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");
}
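The same technique can be exercised in one self-contained function. In this sketch, tell_wait and wait_parent are hypothetical stand-ins for TELL_WAIT and WAIT_PARENT reduced to SIGUSR1 only, and a direct kill plays the role of TELL_CHILD. The child writes to a shared pipe only after the parent has written and signaled it, so the bytes should always arrive as "PC". Because the signal is blocked before the fork, a signal sent before the child reaches sigsuspend stays pending rather than being lost.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigflag;    /* set nonzero by sig_usr */
static sigset_t newmask, oldmask, zeromask;

static void sig_usr(int signo)
{
    (void)signo;
    sigflag = 1;
}

static int tell_wait(void)               /* hypothetical TELL_WAIT stand-in */
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = sig_usr;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        return -1;
    sigemptyset(&zeromask);
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &newmask, &oldmask);  /* before fork */
}

static void wait_parent(void)            /* hypothetical WAIT_PARENT stand-in */
{
    while (sigflag == 0)
        sigsuspend(&zeromask);
    sigflag = 0;
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
}

/*
 * Hypothetical demo: the parent writes 'P', then signals the child, which
 * writes 'C' only after waiting. Returns the number of bytes read into
 * out (2 on success) or -1 on error.
 */
int ordered_writes(char out[2])
{
    int fd[2];
    pid_t pid;
    ssize_t n, got = 0;

    if (pipe(fd) < 0 || tell_wait() < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                      /* child */
        wait_parent();
        _exit(write(fd[1], "C", 1) == 1 ? 0 : 1);
    }

    if (write(fd[1], "P", 1) != 1)       /* parent goes first ... */
        return -1;
    kill(pid, SIGUSR1);                  /* ... then acts as TELL_CHILD */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);

    while (got < 2 && (n = read(fd[0], out + got, 2 - got)) > 0)
        got += n;
    waitpid(pid, NULL, 0);
    close(fd[0]);
    close(fd[1]);
    return (int)got;
}
```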
The sigsuspend function is fine if we want to go to sleep while waiting for a signal to occur (as
we've shown in the previous two examples), but what if we want to call other system
functions while we're waiting? Unfortunately, this problem has no bulletproof solution unless
we use multiple threads and dedicate a separate thread to handling signals, as we discuss in
Section 12.8.
Without using threads, the best we can do is to set a global variable in the signal handler
when the signal occurs. For example, if we catch both SIGINT and SIGALRM and install the
signal handlers using the signal_intr function, the signals will interrupt any slow system call
that is blocked. The signals are most likely to occur when we're blocked in a call to the select
function (Section 14.5.1), waiting for input from a slow device. (This is especially true for
SIGALRM, since we set the alarm clock to prevent us from waiting forever for input.) The code
to handle this looks similar to the following:
if (intr_flag)      /* flag set by our SIGINT handler */
    handle_intr();
if (alrm_flag)      /* flag set by our SIGALRM handler */
    handle_alrm();

/* signals occurring in here are lost */

while (select( ... ) < 0) {
    if (errno == EINTR) {
        if (alrm_flag)
            handle_alrm();
        else if (intr_flag)
            handle_intr();
    } else {
        /* some other error */
    }
}
We test each of the global flags before calling select and again if select returns an
interrupted system call error. The problem occurs if either signal is caught between the first
two if statements and the subsequent call to select. Signals occurring in here are lost, as
indicated by the code comment. The signal handlers are called, and they set the appropriate
global variable, but the select never returns (unless some data is ready to be read).
What we would like to be able to do is the following sequence of steps, in order.
1. Block SIGINT and SIGALRM.
2. Test the two global variables to see whether either signal has occurred and, if so,
   handle the condition.
3. Call select (or any other system function, such as read) and unblock the two signals,
   as an atomic operation.
The sigsuspend function helps us only if step 3 is a pause operation.
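For select specifically, POSIX.1 also provides pselect (covered with select in Section 14.5.1), which takes a signal mask, installs it for the duration of the wait, and restores the old mask on return, all atomically. A minimal sketch of the three steps, with hypothetical names and SIGUSR1 standing in for SIGINT and SIGALRM: the signal is raised after step 2 while still blocked, yet pselect returns -1 with errno set to EINTR instead of waiting. The 5-second timeout is only a safety net for the sketch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t intr_flag;   /* set by the handler */

static void on_signal(int signo)
{
    (void)signo;
    intr_flag = 1;
}

/*
 * Hypothetical sketch of steps 1-3. Returns pselect's result and stores
 * errno in *err; -2 means setup failed.
 */
int select_without_race(int *err)
{
    struct sigaction act;
    sigset_t block, old, waitmask;
    struct timespec timeout = { 5, 0 };   /* safety net only */
    int r;

    memset(&act, 0, sizeof(act));
    act.sa_handler = on_signal;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, NULL) < 0)
        return -2;

    /* Step 1: block the signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -2;

    /* Step 2 would test intr_flag here; then the signal "arrives". */
    intr_flag = 0;
    raise(SIGUSR1);                       /* stays pending: still blocked */

    /* Step 3: unblock and wait as one atomic operation. */
    waitmask = old;
    sigdelset(&waitmask, SIGUSR1);
    r = pselect(0, NULL, NULL, NULL, &timeout, &waitmask);
    *err = errno;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return r;
}
```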
10.17. abort Function
We mentioned earlier that the abort function causes abnormal program termination.
#include <stdlib.h>
void abort(void);
This function never returns
This function sends the SIGABRT signal to the caller. (Processes should not ignore this signal.)
ISO C states that calling abort will deliver an unsuccessful termination notification to the host
environment by calling raise(SIGABRT).
ISO C requires that if the signal is caught and the signal handler returns, abort still doesn't
return to its caller. If this signal is caught, the only way the signal handler can avoid returning is
to call exit, _exit, _Exit, longjmp, or siglongjmp. (Section 10.15 discusses the differences
between longjmp and siglongjmp.) POSIX.1 also specifies that abort overrides the blocking or
ignoring of the signal by the process.
The intent of letting the process catch the SIGABRT is to allow it to perform any cleanup that
it wants to do before the process terminates. If the process doesn't terminate itself from this
signal handler, POSIX.1 states that, when the signal handler returns, abort terminates the
process.
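A sketch of this required behavior, assuming a POSIX.1 system and using hypothetical names: a child installs a SIGABRT handler that simply returns and then calls abort. The parent should nevertheless see the child terminated by SIGABRT. The test runs in a child so that the abort does not end the calling program; depending on resource limits, the child may also leave a core file.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void abrt_handler(int signo)
{
    (void)signo;
    /* cleanup would go here; simply returning does not cancel abort */
}

/*
 * Hypothetical helper: run abort() in a child whose SIGABRT handler
 * returns. Returns the child's wait status, or -1 on error.
 */
int abort_status_with_handler(void)
{
    pid_t pid;
    int status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                      /* child */
        struct sigaction act;

        memset(&act, 0, sizeof(act));
        act.sa_handler = abrt_handler;
        sigemptyset(&act.sa_mask);
        sigaction(SIGABRT, &act, NULL);
        abort();                         /* does not return */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```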
The ISO C specification of this function leaves it up to the implementation as to whether
output streams are flushed and whether temporary files (Section 5.13) are deleted. POSIX.1
goes further and requires that if the call to abort terminates the process, then the effect on
the open standard I/O streams in the process will be the same as if the process had called
fclose on each stream before terminating.
Earlier versions of System V generated the SIGIOT signal from the abort function. Furthermore,
it was possible for a process to ignore this signal or to catch it and return from the signal
handler, in which case abort returned to its caller.
4.3BSD generated the SIGILL signal. Before doing this, the 4.3BSD function unblocked the
signal and reset its disposition to SIG_DFL (terminate with core file). This prevented a process
from either ignoring the signal or catching it.
Historically, implementations of abort differ in how they deal with standard I/O streams. For
defensive programming and improved portability, if we want standard I/O streams to be
flushed, we specifically do it before calling abort. We do this in the err_dump function
(Appendix B).
Since most UNIX System implementations of tmpfile call unlink immediately after creating the
file, the ISO C warning about temporary files does not usually concern us.
Example
Figure 10.25 shows an implementation of the abort function as specified by POSIX.1.
We first see whether the default action will occur; if so, we flush all the standard I/O streams.
This is not equivalent to an fclose on all the open streams (since it just flushes them and
doesn't close them), but when the process terminates, the system closes all open files. If the
process catches the signal and returns, we flush all the streams again, since the process
could have generated more output. The only condition we don't handle is if the process
catches the signal and calls _exit or _Exit. In this case, any unflushed standard I/O buffers in
memory are discarded. We assume that a caller that does this doesn't want the buffers
flushed.
Recall from Section 10.9 that if calling kill causes the signal to be generated for the caller,
and if the signal is not blocked (which we guarantee in Figure 10.25), then the signal (or some
other pending, unblocked signal) is delivered to the process before kill returns. We block all
signals except SIGABRT, so we know that if the call to kill returns, the process caught the
signal and the signal handler returned.
Figure 10.25. Implementation of POSIX.1 abort
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void
abort(void)         /* POSIX-style abort() function */
{
    sigset_t            mask;
    struct sigaction    action;

    /*
     * Caller can't ignore SIGABRT, if so reset to default.
     */
    sigaction(SIGABRT, NULL, &action);
    if (action.sa_handler == SIG_IGN) {
        action.sa_handler = SIG_DFL;
        sigaction(SIGABRT, &action, NULL);
    }
    if (action.sa_handler == SIG_DFL)
        fflush(NULL);           /* flush all open stdio streams */

    /*
     * Caller can't block SIGABRT; make sure it's unblocked.
     */
    sigfillset(&mask);
    sigdelset(&mask, SIGABRT);  /* mask has only SIGABRT turned off */
    sigprocmask(SIG_SETMASK, &mask, NULL);
    kill(getpid(), SIGABRT);    /* send the signal */

    /*
     * If we're here, process caught SIGABRT and returned.
     */
    fflush(NULL);               /* flush all open stdio streams */
    action.sa_handler = SIG_DFL;
    sigaction(SIGABRT, &action, NULL);      /* reset to default */
    sigprocmask(SIG_SETMASK, &mask, NULL);  /* just in case ... */
    kill(getpid(), SIGABRT);                /* and one more time */
    exit(1);    /* this should never be executed ... */
}
10.18. system Function
In Section 8.13, we showed an implementation of the system function. That version, however,
did not do any signal handling. POSIX.1 requires that system ignore SIGINT and SIGQUIT and
block SIGCHLD. Before showing a version that correctly handles these signals, let's see why we
need to worry about signal handling.
Example
The program shown in Figure 10.26 uses the version of system from Section 8.13 to invoke
the ed(1) editor. (This editor has been part of UNIX systems for a long time. We use it here
because it is an interactive program that catches the interrupt and quit signals. If we invoke
ed from a shell and type the interrupt character, it catches the interrupt signal and prints a
question mark. The ed program also sets the disposition of the quit signal so that it is
ignored.) The program in Figure 10.26 catches both SIGINT and SIGCHLD. If we invoke the
program, we get
$ ./a.out
a                            append text to the editor's buffer
Here is one line of text
.                            period on a line by itself stops append mode
1,$p                         print first through last lines of buffer to see what's there
Here is one line of text
w temp.foo                   write the buffer to a file
25                           editor says it wrote 25 bytes
q                            and leave the editor
caught SIGCHLD
When the editor terminates, the system sends the SIGCHLD signal to the parent (the a.out
process). We catch it and return from the signal handler. But if it is catching the SIGCHLD
signal, the parent should be doing so because it has created its own children, so that it knows
when its children have terminated. The delivery of this signal in the parent should be blocked
while the system function is executing. Indeed, this is what POSIX.1 specifies. Otherwise,
when the child created by system terminates, it would fool the caller of system into thinking
that one of its own children terminated. The caller would then use one of the wait functions
to get the termination status of the child, thus preventing the system function from being able
to obtain the child's termination status for its return value.
If we run the program again, this time sending the editor an interrupt signal, we get
$ ./a.out
a                            append text to the editor's buffer
hello, world
.                            period on a line by itself stops append mode
1,$p                         print first through last lines to see what's there
hello, world
w temp.foo                   write the buffer to a file
13                           editor says it wrote 13 bytes
^?                           type the interrupt character
?                            editor catches signal, prints question mark
caught SIGINT                and so does the parent process
q                            leave editor
caught SIGCHLD
Recall from Section 9.6 that typing the interrupt character causes the interrupt signal to be
sent to all the processes in the foreground process group. Figure 10.27 shows the
arrangement of the processes when the editor is running.
In this example, SIGINT is sent to all three foreground processes. (The shell ignores it.) As we
can see from the output, both the a.out process and the editor catch the signal. But when
we're running another program with the system function, we shouldn't have both the parent
and the child catching the two terminal-generated signals: interrupt and quit. These two
signals should really be sent to the program that is running: the child. Since the command
that is executed by system can be an interactive command (as is the ed program in this
example) and since the caller of system gives up control while the program executes, waiting
for it to finish, the caller of system should not be receiving these two terminal-generated
signals. This is why POSIX.1 specifies that the system function should ignore these two signals
while waiting for the command to complete.
Figure 10.26. Using system to invoke the ed editor
#include "apue.h"

static void
sig_int(int signo)
{
    printf("caught SIGINT\n");
}

static void
sig_chld(int signo)
{
    printf("caught SIGCHLD\n");
}

int
main(void)
{
    if (signal(SIGINT, sig_int) == SIG_ERR)
        err_sys("signal(SIGINT) error");
    if (signal(SIGCHLD, sig_chld) == SIG_ERR)
        err_sys("signal(SIGCHLD) error");
    if (system("/bin/ed") < 0)
        err_sys("system() error");
    exit(0);
}
Figure 10.27. Foreground and background process groups for Figure 10.26
Example
Figure 10.28 shows an implementation of the system function with the required signal handling.
If we link the program in Figure 10.26 with this implementation of the system function, the
resulting binary differs from the last (flawed) one in the following ways.
1. No signal is sent to the calling process when we type the interrupt or quit character.
2. When the ed command exits, SIGCHLD is not sent to the calling process. Instead, it is
blocked until we unblock it in the last call to sigprocmask, after the system function
retrieves the child's termination status by calling waitpid.
POSIX.1 states that if wait or waitpid returns the status of a child process while
SIGCHLD is pending, then SIGCHLD should not be delivered to the process unless the
status of another child process is also available. None of the four implementations
discussed in this book implements this semantic. Instead, SIGCHLD remains pending after
the system function calls waitpid; when the signal is unblocked, it is delivered to the
caller. If we called wait in the sig_chld function in Figure 10.26, it would return -1 with
errno set to ECHILD, since the system function already retrieved the termination status
of the child.
Many older texts show the ignoring of the interrupt and quit signals as follows:
    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {      /* child */
        execl(...);
        _exit(127);
    }

    /* parent */
    old_intr = signal(SIGINT, SIG_IGN);
    old_quit = signal(SIGQUIT, SIG_IGN);
    waitpid(pid, &status, 0);
    signal(SIGINT, old_intr);
    signal(SIGQUIT, old_quit);
The problem with this sequence of code is that we have no guarantee after the fork whether
the parent or child runs first. If the child runs first and the parent doesn't run for some time
after, it's possible for an interrupt signal to be generated before the parent is able to change
its disposition to be ignored. For this reason, in Figure 10.28, we change the disposition of the
signals before the fork.
Note that we have to reset the dispositions of these two signals in the child before the call
to execl. This allows execl to change their dispositions to the default, based on the caller's
dispositions, as we described in Section 8.10.
Figure 10.28. Correct POSIX.1 implementation of system function
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

int
system(const char *cmdstring)   /* with appropriate signal handling */
{
    pid_t               pid;
    int                 status;
    struct sigaction    ignore, saveintr, savequit;
    sigset_t            chldmask, savemask;

    if (cmdstring == NULL)
        return(1);      /* always a command processor with UNIX */

    ignore.sa_handler = SIG_IGN;    /* ignore SIGINT and SIGQUIT */
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    if (sigaction(SIGINT, &ignore, &saveintr) < 0)
        return(-1);
    if (sigaction(SIGQUIT, &ignore, &savequit) < 0)
        return(-1);
    sigemptyset(&chldmask);         /* now block SIGCHLD */
    sigaddset(&chldmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chldmask, &savemask) < 0)
        return(-1);

    if ((pid = fork()) < 0) {
        status = -1;    /* probably out of processes */
    } else if (pid == 0) {          /* child */
        /* restore previous signal actions & reset signal mask */
        sigaction(SIGINT, &saveintr, NULL);
        sigaction(SIGQUIT, &savequit, NULL);
        sigprocmask(SIG_SETMASK, &savemask, NULL);

        execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
        _exit(127);     /* exec error */
    } else {                        /* parent */
        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR) {
                status = -1; /* error other than EINTR from waitpid() */
                break;
            }
    }

    /* restore previous signal actions & reset signal mask */
    if (sigaction(SIGINT, &saveintr, NULL) < 0)
        return(-1);
    if (sigaction(SIGQUIT, &savequit, NULL) < 0)
        return(-1);
    if (sigprocmask(SIG_SETMASK, &savemask, NULL) < 0)
        return(-1);

    return(status);
}
Return Value from system
Beware of the return value from system. It is the termination status of the shell, which isn't
always the termination status of the command string. We saw some examples in Figure 8.23,
and the results were as we expected: if we execute a simple command, such as date, the
termination status is 0. Executing the shell command exit 44 gave us a termination status of
44. What happens with signals?
Let's run the program in Figure 8.24 and send some signals to the command that's executing:
$ tsys "sleep 30"
^?normal termination, exit status = 130       we type the interrupt key
$ tsys "sleep 30"
^\sh: 946 Quit                                we type the quit key
normal termination, exit status = 131
When we terminate the sleep with the interrupt signal, the pr_exit function (Figure 8.5)
thinks that it terminated normally. The same thing happens when we kill the sleep with the
quit key. What is happening here is that the Bourne shell has a poorly documented feature
that its termination status is 128 plus the signal number, when the command it was executing
is terminated by a signal. We can see this with the shell interactively.
$ sh                                  make sure we're running the Bourne shell
$ sh -c "sleep 30"
^?                                    type the interrupt key
$ echo $?                             print termination status of last command
130
$ sh -c "sleep 30"
^\sh: 962 Quit - core dumped          type the quit key
$ echo $?                             print termination status of last command
131
$ exit                                leave Bourne shell
On the system being used, SIGINT has a value of 2 and SIGQUIT has a value of 3, giving us the
shell's termination statuses of 130 and 131.
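Given this convention, a caller of system can try to recover the signal number from the shell's exit status. The helper below is a hypothetical sketch: it reports WTERMSIG when the shell itself was killed, and otherwise treats an exit status above 128 as 128 plus a signal number. That second rule is only a heuristic, because a command may also exit with a status such as 130 deliberately.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: given the value returned by system(), return the
 * signal number that (apparently) ended the command, or 0 if none.
 */
int
command_signal(int status)
{
    if (status == -1)
        return 0;                           /* system itself failed */
    if (WIFSIGNALED(status))
        return WTERMSIG(status);            /* the shell itself was killed */
    if (WIFEXITED(status) && WEXITSTATUS(status) > 128)
        return WEXITSTATUS(status) - 128;   /* shell's 128 + signo convention */
    return 0;
}
```

With the signal values given above, command_signal(system("exit 130")) yields 2 (SIGINT), while the status from exit 44 yields 0.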
Let's try a similar example, but this time we'll send a signal directly to the shell and see what
gets returned by system:
$ tsys "sleep 30" &                   start it in background this time
9257
$ ps -f                               look at the process IDs
     UID   PID  PPID  TTY     TIME CMD
     sar  9260   949  pts/5   0:00 ps -f
     sar  9258  9257  pts/5   0:00 sh -c sleep 60
     sar   949   947  pts/5   0:01 /bin/sh
     sar  9257   949  pts/5   0:00 tsys sleep 60
     sar  9259  9258  pts/5   0:00 sleep 60
$ kill -KILL 9258                     kill the shell itself
abnormal termination, signal number = 9
Here, we can see that the return value from system reports an abnormal termination only when
the shell itself abnormally terminates.
When writing programs that use the system function, be sure to interpret the return value
correctly. If you call fork, exec, and wait yourself, the termination status is not the same as if
you call system.
10.19. sleep Function
We've used the sleep function in numerous examples throughout the text, and we showed two
flawed implementations of it in Figures 10.7 and 10.8.
#include <unistd.h>
unsigned int sleep(unsigned int seconds);
Returns: 0 or number of unslept seconds
This function causes the calling process to be suspended until either
1. The amount of wall clock time specified by seconds has elapsed.
2. A signal is caught by the process and the signal handler returns.
As with an alarm signal, the actual return may be at a time later than requested, because of
other system activity.
In case 1, the return value is 0. When sleep returns early, because of some signal being
caught (case 2), the return value is the number of unslept seconds (the requested time minus
the actual time slept).
Although sleep can be implemented with the alarm function (Section 10.10), this isn't
required. If alarm is used, however, there can be interactions between the two functions. The
POSIX.1 standard leaves all these interactions unspecified. For example, if we do an alarm(10)
and 3 wall clock seconds later do a sleep(5), what happens? The sleep will return in 5
seconds (assuming that some other signal isn't caught in that time), but will another SIGALRM
be generated 2 seconds later? These details depend on the implementation.
Solaris 9 implements sleep using alarm. The Solaris sleep(3) manual page says that a
previously scheduled alarm is properly handled. For example, in the preceding scenario, before
sleep returns, it will reschedule the alarm to happen 2 seconds later; sleep returns 0 in this
case. (Obviously, sleep must save the address of the signal handler for SIGALRM and reset it
before returning.) Also, if we do an alarm(6) and 3 wall clock seconds later do a sleep(5), the
sleep returns in 3 seconds (when the alarm goes off), not in 5 seconds. Here, the return value
from sleep is 2 (the number of unslept seconds).
FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3, on the other hand, use another technique:
the delay is provided by nanosleep(2). This function is specified to be a high-resolution delay
by the real-time extensions in the Single UNIX Specification. This function allows the
implementation of sleep to be independent of signals.
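A signal-independent sleep along these lines might be sketched as follows. The name is hypothetical and this is not any particular system's implementation. One design decision is how to report a partial second; this version rounds up, so an interrupted call never returns 0, which would be indistinguishable from a completed sleep.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical sleep replacement built on nanosleep: no SIGALRM and no
 * handler to save or restore.  Returns 0, or the unslept seconds if a
 * caught signal interrupts it.
 */
unsigned int
sleep_via_nanosleep(unsigned int seconds)
{
    struct timespec req, rem;

    req.tv_sec = seconds;
    req.tv_nsec = 0;
    if (nanosleep(&req, &rem) == 0)
        return 0;
    if (errno != EINTR)
        return seconds;     /* unexpected failure: report nothing slept */
    /* round a partial second up so an interruption never reports 0 */
    return (unsigned int)rem.tv_sec + (rem.tv_nsec > 0);
}
```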
For portability, you shouldn't make any assumptions about the implementation of sleep, but if
you have any intentions of mixing calls to sleep with any other timing functions, you need to
be aware of possible interactions.
Example
Figure 10.29 shows an implementation of the POSIX.1 sleep function. This function is a
modification of Figure 10.7 that handles signals reliably, avoiding the race condition in the
earlier implementation. We still do not handle any interactions with previously set alarms. (As
we mentioned, POSIX.1 explicitly leaves these interactions unspecified.)
It takes more code to write this reliable implementation than what is shown in Figure 10.7. We
don't use any form of nonlocal branching (as we did in Figure 10.8 to avoid the race condition
between the alarm and pause), so there is no effect on other signal handlers that may be
executing when the SIGALRM is handled.
Figure 10.29. Reliable implementation of sleep
#include "apue.h"

static void
sig_alrm(int signo)
{
    /* nothing to do, just returning wakes up sigsuspend() */
}

unsigned int
sleep(unsigned int nsecs)
{
    struct sigaction    newact, oldact;
    sigset_t            newmask, oldmask, suspmask;
    unsigned int        unslept;

    /* set our handler, save previous information */
    newact.sa_handler = sig_alrm;
    sigemptyset(&newact.sa_mask);
    newact.sa_flags = 0;
    sigaction(SIGALRM, &newact, &oldact);

    /* block SIGALRM and save current signal mask */
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGALRM);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);

    alarm(nsecs);

    suspmask = oldmask;
    sigdelset(&suspmask, SIGALRM);  /* make sure SIGALRM isn't blocked */
    sigsuspend(&suspmask);          /* wait for any signal to be caught */

    /* some signal has been caught, SIGALRM is now blocked */

    unslept = alarm(0);
    sigaction(SIGALRM, &oldact, NULL);  /* reset previous action */

    /* reset signal mask, which unblocks SIGALRM */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return(unslept);
}
10.20. Job-Control Signals
Of the signals shown in Figure 10.1, POSIX.1 considers six to be job-control signals:
SIGCHLD    Child process has stopped or terminated.
SIGCONT    Continue process, if stopped.
SIGSTOP    Stop signal (can't be caught or ignored).
SIGTSTP    Interactive stop signal.
SIGTTIN    Read from controlling terminal by member of a background process group.
SIGTTOU    Write to controlling terminal by member of a background process group.
Except for SIGCHLD, most application programs don't handle these signals: interactive shells
usually do all the work required to handle these signals. When we type the suspend character
(usually Control-Z), SIGTSTP is sent to all processes in the foreground process group. When we
tell the shell to resume a job in the foreground or background, the shell sends all the
processes in the job the SIGCONT signal. Similarly, if SIGTTIN or SIGTTOU is delivered to a
process, the process is stopped by default, and the job-control shell recognizes this and
notifies us.
An exception is a process that is managing the terminal: the vi(1) editor, for example. It
needs to know when the user wants to suspend it, so that it can restore the terminal's state
to the way it was when vi was started. Also, when it resumes in the foreground, the vi editor
needs to set the terminal state back to the way it wants it, and it needs to redraw the
terminal screen. We see how a program such as vi handles this in the example that follows.
There are some interactions between the job-control signals. When any of the four stop
signals (SIGTSTP, SIGSTOP, SIGTTIN, or SIGTTOU) is generated for a process, any pending SIGCONT
signal for that process is discarded. Similarly, when the SIGCONT signal is generated for a
process, any pending stop signals for that same process are discarded.
Note that the default action for SIGCONT is to continue the process, if it is stopped; otherwise,
the signal is ignored. Normally, we don't have to do anything with this signal. When SIGCONT is
generated for a process that is stopped, the process is continued, even if the signal is
blocked or ignored.
Example
The program in Figure 10.30 demonstrates the normal sequence of code used when a program
handles job control. This program simply copies its standard input to its standard output, but
comments are given in the signal handler for typical actions performed by a program that
manages a screen. When the program in Figure 10.30 starts, it arranges to catch the SIGTSTP
signal only if the signal's disposition is SIG_DFL. The reason is that when the program is started
by a shell that doesn't support job control (/bin/sh, for example), the signal's disposition
should be set to SIG_IGN. In fact, the shell doesn't explicitly ignore this signal; init sets the
disposition of the three job-control signals (SIGTSTP, SIGTTIN, and SIGTTOU) to SIG_IGN. This
disposition is then inherited by all login shells. Only a job-control shell should reset the
disposition of these three signals to SIG_DFL.
When we type the suspend character, the process receives the SIGTSTP signal, and the signal
handler is invoked. At this point, we would do any terminal-related processing: move the
cursor to the lower-left corner, restore the terminal mode, and so on. We then send ourselves
the same signal, SIGTSTP, after resetting its disposition to its default (stop the process) and
unblocking the signal. We have to unblock it since we're currently handling that same signal,
and the system blocks it automatically while it's being caught. At this point, the system stops
the process. It is continued only when it receives (usually from the job-control shell, in
response to an interactive fg command) a SIGCONT signal. We don't catch SIGCONT. Its default
disposition is to continue the stopped process; when this happens, the program continues as
though it returned from the kill function. When the program is continued, we reset the
disposition for the SIGTSTP signal and do whatever terminal processing we want (we could
redraw the screen, for example).
Figure 10.30. How to handle SIGTSTP
#include "apue.h"

#define BUFFSIZE    1024

static void sig_tstp(int);

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

    /*
     * Only catch SIGTSTP if we're running with a job-control shell.
     */
    if (signal(SIGTSTP, SIG_IGN) == SIG_DFL)
        signal(SIGTSTP, sig_tstp);

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)
        err_sys("read error");

    exit(0);
}

static void
sig_tstp(int signo) /* signal handler for SIGTSTP */
{
    sigset_t    mask;

    /* ... move cursor to lower left corner, reset tty mode ... */

    /*
     * Unblock SIGTSTP, since it's blocked while we're handling it.
     */
    sigemptyset(&mask);
    sigaddset(&mask, SIGTSTP);
    sigprocmask(SIG_UNBLOCK, &mask, NULL);

    signal(SIGTSTP, SIG_DFL);   /* reset disposition to default */

    kill(getpid(), SIGTSTP);    /* and send the signal to ourselves */

    /* we won't return from the kill until we're continued */

    signal(SIGTSTP, sig_tstp);  /* reestablish signal handler */

    /* ... reset tty mode, redraw screen ... */
}
10.21. Additional Features
In this section, we describe some additional implementation-dependent features of signals.
Signal Names
Some systems provide the array
extern char *sys_siglist[];
The array index is the signal number, giving a pointer to the character string name of the
signal.
FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3 all provide this array of signal names. Solaris
9 does, too, but it uses the name _sys_siglist instead.
These systems normally provide the function psignal also.
#include <signal.h>
void psignal(int signo, const char *msg);
The string msg (which is normally the name of the program) is output to the standard error,
followed by a colon and a space, followed by a description of the signal, followed by a
newline. This function is similar to perror (Section 1.7).
Another common function is strsignal. This function is similar to strerror (also described in
Section 1.7).
#include <string.h>
char *strsignal(int signo);
Returns: a pointer to a string describing the signal
Given a signal number, strsignal will return a string that describes the signal. This string can
be used by applications to print error messages about signals received.
All the platforms discussed in this book provide the psignal and strsignal functions, but
differences do occur. On Solaris 9, strsignal will return a null pointer if the signal number is
invalid, whereas FreeBSD 5.2.1, Linux 2.4.22, and Mac OS X 10.3 return a string indicating
that the signal number is unrecognized. Also, to get the function prototype for psignal on
Solaris, you need to include <siginfo.h>.
Signal Mappings
Solaris provides a couple of functions to map a signal number to a signal name and vice versa.
#include <signal.h>
int sig2str(int signo, char *str);
int str2sig(const char *str, int *signop);
Both return: 0 if OK, -1 on error
These functions are useful when writing interactive programs that need to accept and print
signal names and numbers.
The sig2str function translates the given signal number into a string and stores the result in
the memory pointed to by str. The caller must ensure that the memory is large enough to hold
the longest string, including the terminating null byte. Solaris provides the constant
SIG2STR_MAX in <signal.h> to define the maximum string length. The string consists of the
signal name without the "SIG" prefix. For example, translating SIGKILL would result in the string
"KILL" being stored in the str memory buffer.
The str2sig function translates the given name into a signal number. The signal number is
stored in the integer pointed to by signop. The name can be either the signal name without
the "SIG" prefix or a string representation of the decimal signal number (i.e., "9").
Note that sig2str and str2sig depart from common practice and don't set errno when they
fail.
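On systems without sig2str and str2sig, functions with the behavior described above can be sketched with a lookup table. The names my_sig2str and my_str2sig are hypothetical, and the table covers only about a dozen common signals, so a complete version (the subject of Exercise 10.2) would need every signal the system defines.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately short name table. */
static const struct {
    int         signo;
    const char *name;       /* name without the "SIG" prefix */
} sigtab[] = {
    { SIGHUP,  "HUP"  }, { SIGINT,  "INT"  }, { SIGQUIT, "QUIT" },
    { SIGABRT, "ABRT" }, { SIGKILL, "KILL" }, { SIGALRM, "ALRM" },
    { SIGTERM, "TERM" }, { SIGCHLD, "CHLD" }, { SIGCONT, "CONT" },
    { SIGSTOP, "STOP" }, { SIGTSTP, "TSTP" }, { SIGUSR1, "USR1" },
    { SIGUSR2, "USR2" },
};
#define NSIGTAB (sizeof(sigtab) / sizeof(sigtab[0]))

/* Like sig2str: copy the name into str; 0 if OK, -1 if unknown. */
int
my_sig2str(int signo, char *str)
{
    size_t i;

    for (i = 0; i < NSIGTAB; i++)
        if (sigtab[i].signo == signo) {
            strcpy(str, sigtab[i].name);
            return 0;
        }
    return -1;
}

/* Like str2sig: accept a name such as "KILL" or a decimal string such as "9". */
int
my_str2sig(const char *str, int *signop)
{
    size_t i;
    char *end;
    long n = strtol(str, &end, 10);
    int numeric = (end != str && *end == '\0');

    for (i = 0; i < NSIGTAB; i++)
        if (strcmp(str, sigtab[i].name) == 0
            || (numeric && n == sigtab[i].signo)) {
            *signop = sigtab[i].signo;
            return 0;
        }
    return -1;
}
```

As with sig2str, the caller must supply a buffer large enough for the longest name; for this table, 5 bytes (for a name such as "QUIT" plus its terminating null byte) suffice.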
10.22. Summary
Signals are used in most nontrivial applications. An understanding of the hows and whys of
signal handling is essential to advanced UNIX System programming. This chapter has been a
long and thorough look at UNIX System signals. We started by looking at the warts in previous
implementations of signals and how they manifest themselves. We then proceeded to the
POSIX.1 reliable-signal concept and all the related functions. Once we covered all these
details, we were able to provide implementations of the POSIX.1 abort, system, and sleep
functions. We finished with a look at the job-control signals and the ways that we can
convert between signal names and signal numbers.
Exercises
10.1 In Figure 10.2, remove the for (;;) statement. What happens and why?
10.2 Implement the sig2str function described in Section 10.21.
10.3 Draw pictures of the stack frames when we run the program from Figure 10.9.
10.4 In Figure 10.11, we showed a technique that's often used to set a timeout on
an I/O operation using setjmp and longjmp. The following code has also been
seen:
    signal(SIGALRM, sig_alrm);
    alarm(60);
    if (setjmp(env_alrm) != 0) {
        /* handle timeout */
        ...
    }
    ...
What else is wrong with this sequence of code?
10.5 Using only a single timer (either alarm or the higher-precision setitimer),
provide a set of functions that allows a process to set any number of timers.
10.6 Write the following program to test the parent-child synchronization functions in
Figure 10.24. The process creates a file and writes the integer 0 to the file.
The process then calls fork, and the parent and child alternate incrementing
the counter in the file. Each time the counter is incremented, print which
process (parent or child) is doing the increment.
10.7 In the function shown in Figure 10.25, if the caller catches SIGABRT and returns
from the signal handler, why do we go to the trouble of resetting the
disposition to its default and call kill the second time, instead of simply calling
_exit?
10.8 Why do you think the siginfo structure (Section 10.14) includes the real user
ID, instead of the effective user ID, in the si_uid field?
10.9 Rewrite the function in Figure 10.14 to handle all the signals from Figure 10.1.
The function should consist of a single loop that iterates once for every signal
in the current signal mask (not once for every possible signal).
10.10 Write a program that calls sleep(60) in an infinite loop. Every five times through
the loop (every 5 minutes), fetch the current time of day and print the tm_sec
field. Run the program overnight and explain the results. How would a program
such as the BSD cron daemon, which runs every minute on the minute, handle
this?
10.11 Modify Figure 3.4 as follows: (a) change BUFFSIZE to 100; (b) catch the SIGXFSZ
signal using the signal_intr function, printing a message when it's caught, and
returning from the signal handler; and (c) print the return value from write if
the requested number of bytes weren't written. Modify the soft RLIMIT_FSIZE
resource limit (Section 7.11) to 1,024 bytes and run your new program, copying
a file that is larger than 1,024 bytes. (Try to set the soft resource limit from
your shell. If you can't do this from your shell, call setrlimit directly from the
program.) Run this program on the different systems that you have access to.
What happens and why?
10.12 Write a program that calls fwrite with a large buffer (a few hundred
megabytes). Before calling fwrite, call alarm to schedule a signal in 1 second.
In your signal handler, print that the signal was caught and return. Does the
call to fwrite complete? What's happening?
Chapter 11. Threads
Section 11.1. Introduction
Section 11.2. Thread Concepts
Section 11.3. Thread Identification
Section 11.4. Thread Creation
Section 11.5. Thread Termination
Section 11.6. Thread Synchronization
Section 11.7. Summary
Exercises
11.1. Introduction
We discussed processes in earlier chapters. We learned about the environment of a UNIX
process, the relationships between processes, and ways to control processes. We saw that a
limited amount of sharing can occur between related processes.
In this chapter, we'll look inside a process further to see how we can use multiple threads of
control (or simply threads) to perform multiple tasks within the environment of a single
process. All threads within a single process have access to the same process components,
such as file descriptors and memory.
Any time you try to share a single resource among multiple users, you have to deal with
consistency. We'll conclude the chapter with a look at the synchronization mechanisms
available to prevent multiple threads from viewing inconsistencies in their shared resources.
11.2. Thread Concepts
A typical UNIX process can be thought of as having a single thread of control: each process is
doing only one thing at a time. With multiple threads of control, we can design our programs
to do more than one thing at a time within a single process, with each thread handling a
separate task. This approach can have several benefits.
• We can simplify code that deals with asynchronous events by assigning a separate
thread to handle each event type. Each thread can then handle its event using a
synchronous programming model. A synchronous programming model is much simpler
than an asynchronous one.
• Multiple processes have to use complex mechanisms provided by the operating system
to share memory and file descriptors, as we will see in Chapters 15 and 17. Threads,
on the other hand, automatically have access to the same memory address space and
file descriptors.
• Some problems can be partitioned so that overall program throughput can be improved.
A single process that has multiple tasks to perform implicitly serializes those tasks,
because there is only one thread of control. With multiple threads of control, the
processing of independent tasks can be interleaved by assigning a separate thread per
task. Two tasks can be interleaved only if they don't depend on the processing
performed by each other.
• Similarly, interactive programs can realize improved response time by using multiple
threads to separate the portions of the program that deal with user input and output
from the other parts of the program.
Some people associate multithreaded programming with multiprocessor systems. The benefits
of a multithreaded programming model can be realized even if your program is running on a
uniprocessor. A program can be simplified using threads regardless of the number of
processors, because the number of processors doesn't affect the program structure.
Furthermore, as long as your program has to block when serializing tasks, you can still see
improvements in response time and throughput when running on a uniprocessor, because some
threads might be able to run while others are blocked.
A thread consists of the information necessary to represent an execution context within a
process. This includes a thread ID that identifies the thread within a process, a set of register
values, a stack, a scheduling priority and policy, a signal mask, an errno variable (recall
Section 1.7), and thread-specific data (Section 12.6). Everything within a process is sharable
among the threads in a process, including the text of the executable program, the program's
global and heap memory, the stacks, and the file descriptors.
The threads interface we're about to see is from POSIX.1-2001. The threads interface, also
known as "pthreads" for "POSIX threads," is an optional feature in POSIX.1-2001. The feature
test macro for POSIX threads is _POSIX_THREADS. Applications can either use this in an #ifdef
test to determine at compile time whether threads are supported or call sysconf with the
_SC_THREADS constant to determine at runtime whether threads are supported.
11.3. Thread Identification
Just as every process has a process ID, every thread has a thread ID. Unlike the process ID,
which is unique in the system, the thread ID has significance only within the context of the
process to which it belongs.
Recall that a process ID, represented by the pid_t data type, is a non-negative integer. A
thread ID is represented by the pthread_t data type. Implementations are allowed to use a
structure to represent the pthread_t data type, so portable implementations can't treat them
as integers. Therefore, a function must be used to compare two thread IDs.
#include <pthread.h>
int pthread_equal(pthread_t tid1, pthread_t tid2);
Returns: nonzero if equal, 0 otherwise
Linux 2.4.22 uses an unsigned long integer for the pthread_t data type. Solaris 9 represents
the pthread_t data type as an unsigned integer. FreeBSD 5.2.1 and Mac OS X 10.3 use a
pointer to the pthread structure for the pthread_t data type.
A consequence of allowing the pthread_t data type to be a structure is that there is no
portable way to print its value. Sometimes, it is useful to print thread IDs during program
debugging, but there is usually no need to do so otherwise. At worst, this results in
nonportable debug code, so it is not much of a limitation.
A thread can obtain its own thread ID by calling the pthread_self function.
#include <pthread.h>
pthread_t pthread_self(void);
Returns: the thread ID of the calling thread
This function can be used with pthread_equal when a thread needs to identify data structures
that are tagged with its thread ID. For example, a master thread might place work
assignments on a queue and use the thread ID to control which jobs go to each worker
thread. This is illustrated in Figure 11.1. A single master thread places new jobs on a work
queue. A pool of three worker threads removes jobs from the queue. Instead of allowing each
thread to process whichever job is at the head of the queue, the master thread controls job
assignment by placing the ID of the thread that should process the job in each job structure.
Each worker thread then removes only jobs that are tagged with its own thread ID.
Figure 11.1. Work queue example
11.4. Thread Creation
The traditional UNIX process model supports only one thread of control per process.
Conceptually, this is the same as a threads-based model whereby each process is made up of
only one thread. With pthreads, when a program runs, it also starts out as a single process
with a single thread of control. As the program runs, its behavior should be indistinguishable
from the traditional process, until it creates more threads of control. Additional threads can be
created by calling the pthread_create function.
#include <pthread.h>
int pthread_create(pthread_t *restrict tidp,
                   const pthread_attr_t *restrict attr,
                   void *(*start_rtn)(void *), void *restrict arg);
Returns: 0 if OK, error number on failure
The memory location pointed to by tidp is set to the thread ID of the newly created thread
when pthread_create returns successfully. The attr argument is used to customize various
thread attributes. We'll cover thread attributes in Section 12.3, but for now, we'll set this to
NULL to create a thread with the default attributes.
The newly created thread starts running at the address of the start_rtn function. This
function takes a single argument, arg, which is a typeless pointer. If you need to pass more
than one argument to the start_rtn function, then you need to store them in a structure and
pass the address of the structure in arg.
When a thread is created, there is no guarantee which runs first: the newly created thread or
the calling thread. The newly created thread has access to the process address space and
inherits the calling thread's floating-point environment and signal mask; however, the set of
pending signals for the thread is cleared.
Note that the pthread functions usually return an error code when they fail. They don't set
errno like the other POSIX functions. The per thread copy of errno is provided only for
compatibility with existing functions that use it. With threads, it is cleaner to return the error
code from the function, thereby restricting the scope of the error to the function that caused
it, instead of relying on some global state that is changed as a side effect of the function.
Example
Although there is no portable way to print the thread ID, we can write a small test program
that does, to gain some insight into how threads work. The program in Figure 11.2 creates
one thread and prints the process and thread IDs of the new thread and the initial thread.
This example has two oddities, necessary to handle races between the main thread and the
new thread. (We'll learn better ways to deal with these later in this chapter.) The first is the
need to sleep in the main thread. If it doesn't sleep, the main thread might exit, thereby
terminating the entire process before the new thread gets a chance to run. This behavior is
dependent on the operating system's threads implementation and scheduling algorithms.
The second oddity is that the new thread obtains its thread ID by calling pthread_self instead
of reading it out of shared memory or receiving it as an argument to its thread-start routine.
Recall that pthread_create will return the thread ID of the newly created thread through the
first parameter (tidp). In our example, the main thread stores this in ntid, but the new thread
can't safely use it. If the new thread runs before the main thread returns from calling
pthread_create, then the new thread will see the uninitialized contents of ntid instead of the
thread ID.
Running the program in Figure 11.2 on Solaris gives us
$ ./a.out
main thread: pid 7225 tid 1 (0x1)
new thread: pid 7225 tid 4 (0x4)
As we expect, both threads have the same process ID, but different thread IDs. Running the
program in Figure 11.2 on FreeBSD gives us
$ ./a.out
main thread: pid 14954 tid 134529024 (0x804c000)
new thread: pid 14954 tid 134530048 (0x804c400)
As we expect, both threads have the same process ID. If we look at the thread IDs as
decimal integers, the values look strange, but if we look at them in hexadecimal, they make
more sense. As we noted earlier, FreeBSD uses a pointer to the thread data structure for its
thread ID.
We would expect Mac OS X to be similar to FreeBSD; however, the thread ID for the main
thread is from a different address range than the thread IDs for threads created with
pthread_create:
$ ./a.out
main thread: pid 779 tid 2684396012 (0xa000a1ec)
new thread: pid 779 tid 25166336 (0x1800200)
Running the same program on Linux gives us slightly different results:
$ ./a.out
new thread: pid 6628 tid 1026 (0x402)
main thread: pid 6626 tid 1024 (0x400)
The Linux thread IDs look more reasonable, but the process IDs don't match. This is an
artifact of the Linux threads implementation, where the clone system call is used to
implement pthread_create. The clone system call creates a child process that can share a
configurable amount of its parent's execution context, such as file descriptors and memory.
Note also that the output from the main thread appears before the output from the thread we
create, except on Linux. This illustrates that we can't make any assumptions about how
threads will be scheduled.
Figure 11.2. Printing thread IDs
#include "apue.h"
#include <pthread.h>

pthread_t ntid;

void
printids(const char *s)
{
    pid_t      pid;
    pthread_t  tid;

    pid = getpid();
    tid = pthread_self();
    printf("%s pid %lu tid %lu (0x%lx)\n", s, (unsigned long)pid,
      (unsigned long)tid, (unsigned long)tid);
}

void *
thr_fn(void *arg)
{
    printids("new thread: ");
    return((void *)0);
}

int
main(void)
{
    int     err;

    err = pthread_create(&ntid, NULL, thr_fn, NULL);
    if (err != 0)
        err_quit("can't create thread: %s\n", strerror(err));
    printids("main thread:");
    sleep(1);
    exit(0);
}
11.5. Thread Termination
If any thread within a process calls exit, _Exit, or _exit, then the entire process terminates.
Similarly, when the default action is to terminate the process, a signal sent to a thread will
terminate the entire process (we'll talk more about the interactions between signals and
threads in Section 12.8).
A single thread can exit in three ways, thereby stopping its flow of control, without
terminating the entire process.
1. The thread can simply return from the start routine. The return value is the thread's
exit code.
2. The thread can be canceled by another thread in the same process.
3. The thread can call pthread_exit.
#include <pthread.h>
void pthread_exit(void *rval_ptr);
The rval_ptr is a typeless pointer, similar to the single argument passed to the start routine.
This pointer is available to other threads in the process by calling the pthread_join function.
#include <pthread.h>
int pthread_join(pthread_t thread, void **rval_ptr);
Returns: 0 if OK, error number on failure
The calling thread will block until the specified thread calls pthread_exit, returns from its start
routine, or is canceled. If the thread simply returned from its start routine, the location that
rval_ptr points to will contain the return code. If the thread was canceled, that location is set
to PTHREAD_CANCELED.
By calling pthread_join, we automatically place a thread in the detached state (discussed
shortly) so that its resources can be recovered. If the thread was already in the detached
state, pthread_join can fail, returning EINVAL, although this behavior is implementation-specific.
If we're not interested in a thread's return value, we can set rval_ptr to NULL. In this case,
calling pthread_join allows us to wait for the specified thread, but does not retrieve the
thread's termination status.
Example
Figure 11.3 shows how to fetch the exit code from a thread that has terminated.
Running the program in Figure 11.3 gives us
$ ./a.out
thread 1 returning
thread 2 exiting
thread 1 exit code 1
thread 2 exit code 2
As we can see, when a thread exits by calling pthread_exit or by simply returning from the
start routine, the exit status can be obtained by another thread by calling pthread_join.
Figure 11.3. Fetching the thread exit status
#include "apue.h"
#include <pthread.h>

void *
thr_fn1(void *arg)
{
    printf("thread 1 returning\n");
    return((void *)1);
}

void *
thr_fn2(void *arg)
{
    printf("thread 2 exiting\n");
    pthread_exit((void *)2);
}

int
main(void)
{
    int        err;
    pthread_t  tid1, tid2;
    void       *tret;

    err = pthread_create(&tid1, NULL, thr_fn1, NULL);
    if (err != 0)
        err_quit("can't create thread 1: %s\n", strerror(err));
    err = pthread_create(&tid2, NULL, thr_fn2, NULL);
    if (err != 0)
        err_quit("can't create thread 2: %s\n", strerror(err));
    err = pthread_join(tid1, &tret);
    if (err != 0)
        err_quit("can't join with thread 1: %s\n", strerror(err));
    printf("thread 1 exit code %ld\n", (long)tret);
    err = pthread_join(tid2, &tret);
    if (err != 0)
        err_quit("can't join with thread 2: %s\n", strerror(err));
    printf("thread 2 exit code %ld\n", (long)tret);
    exit(0);
}
The typeless pointer passed to pthread_create and pthread_exit can be used to pass more
than a single value: it can hold the address of a structure containing more complex
information. Be careful that the memory used for the structure is still valid when the caller
has completed. If the structure was allocated on the caller's stack, its contents might have
changed by the time the structure is used. For instance, if a thread allocates a structure on
its stack and passes a pointer to this structure to pthread_exit, then the stack might be
destroyed and its memory reused for something else by the time the caller of pthread_join
tries to use it.
Example
The program in Figure 11.4 shows the problem with using an automatic variable (allocated on
the stack) as the argument to pthread_exit.
When we run this program on Linux, we get
$ ./a.out
thread 1:
structure at 0x409a2abc
foo.a = 1
foo.b = 2
foo.c = 3
foo.d = 4
parent starting second thread
thread 2: ID is 32770
parent:
structure at 0x409a2abc
foo.a = 0
foo.b = 32770
foo.c = 1075430560
foo.d = 1073937284
Of course, the results vary, depending on the memory architecture, the compiler, and the
implementation of the threads library. The results on FreeBSD are similar:
$ ./a.out
thread 1:
structure at 0xbfafefc0
foo.a = 1
foo.b = 2
foo.c = 3
foo.d = 4
parent starting second thread
thread 2: ID is 134534144
parent:
structure at 0xbfafefc0
foo.a = 0
foo.b = 134534144
foo.c = 3
foo.d = 671642590
As we can see, the contents of the structure (allocated on the stack of thread tid1) have
changed by the time the main thread can access the structure. Note how the stack of the
second thread (tid2) has overwritten the first thread's stack. To solve this problem, we can
either use a global structure or allocate the structure using malloc.
Figure 11.4. Incorrect use of pthread_exit argument
#include "apue.h"
#include <pthread.h>

struct foo {
    int a, b, c, d;
};

void
printfoo(const char *s, const struct foo *fp)
{
    printf("%s", s);
    printf(" structure at 0x%lx\n", (unsigned long)fp);
    printf(" foo.a = %d\n", fp->a);
    printf(" foo.b = %d\n", fp->b);
    printf(" foo.c = %d\n", fp->c);
    printf(" foo.d = %d\n", fp->d);
}

void *
thr_fn1(void *arg)
{
    struct foo  foo = {1, 2, 3, 4};

    printfoo("thread 1:\n", &foo);
    pthread_exit((void *)&foo);
}

void *
thr_fn2(void *arg)
{
    printf("thread 2: ID is %lu\n", (unsigned long)pthread_self());
    pthread_exit((void *)0);
}

int
main(void)
{
    int         err;
    pthread_t   tid1, tid2;
    struct foo  *fp;

    err = pthread_create(&tid1, NULL, thr_fn1, NULL);
    if (err != 0)
        err_quit("can't create thread 1: %s\n", strerror(err));
    err = pthread_join(tid1, (void *)&fp);
    if (err != 0)
        err_quit("can't join with thread 1: %s\n", strerror(err));
    sleep(1);
    printf("parent starting second thread\n");
    err = pthread_create(&tid2, NULL, thr_fn2, NULL);
    if (err != 0)
        err_quit("can't create thread 2: %s\n", strerror(err));
    sleep(1);
    printfoo("parent:\n", fp);
    exit(0);
}
One thread can request that another in the same process be canceled by calling the
pthread_cancel function.
#include <pthread.h>
int pthread_cancel(pthread_t tid);
Returns: 0 if OK, error number on failure
In the default circumstances, pthread_cancel will cause the thread specified by tid to behave
as if it had called pthread_exit with an argument of PTHREAD_CANCELED. However, a thread can
elect to ignore or otherwise control how it is canceled. We will discuss this in detail in Section
12.7. Note that pthread_cancel doesn't wait for the thread to terminate. It merely makes the
request.
A thread can arrange for functions to be called when it exits, similar to the way that the
atexit function (Section 7.3) can be used by a process to arrange that functions are
called when the process exits. The functions are known as thread cleanup handlers. More
than one cleanup handler can be established for a thread. The handlers are recorded in a
stack, which means that they are executed in the reverse order of their registration.
#include <pthread.h>
void pthread_cleanup_push(void (*rtn)(void *), void *arg);
void pthread_cleanup_pop(int execute);
The pthread_cleanup_push function schedules the cleanup function, rtn, to be called with the
single argument, arg, when the thread performs one of the following actions:

Makes a call to pthread_exit

Responds to a cancellation request

Makes a call to pthread_cleanup_pop with a nonzero execute argument
If the execute argument is set to zero, the cleanup function is not called. In either case,
pthread_cleanup_pop removes the cleanup handler established by the last call to
pthread_cleanup_push.
A restriction with these functions is that, because they can be implemented as macros, they
must be used in matched pairs within the same scope in a thread. The macro definition of
pthread_cleanup_push can include a { character, in which case the matching } character is in
the pthread_cleanup_pop definition.
Example
Figure 11.5 shows how to use thread cleanup handlers. Although the example is somewhat
contrived, it illustrates the mechanics involved. Note that although we never intend to pass
zero as an argument to the thread start-up routines, we still need to match calls to
pthread_cleanup_pop with the calls to pthread_cleanup_push; otherwise, the program might not
compile.
Running the program in Figure 11.5 gives us
$ ./a.out
thread 1 start
thread 1 push complete
thread 2 start
thread 2 push complete
cleanup: thread 2 second handler
cleanup: thread 2 first handler
thread 1 exit code 1
thread 2 exit code 2
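From the output, we can see that both threads start properly and exit, but that only the
second thread's cleanup handlers are called. Thus, if the thread terminates by returning from
its start routine, its cleanup handlers are not called. Be aware, though, that POSIX leaves
undefined the effect of using return to leave a block bounded by pthread_cleanup_push and
pthread_cleanup_pop, so thr_fn1 depends on implementation behavior and other systems may
produce different results. Also note that the cleanup handlers are called in the reverse order
from which they were installed.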
Figure 11.5. Thread cleanup handler
#include "apue.h"
#include <pthread.h>

void
cleanup(void *arg)
{
    printf("cleanup: %s\n", (char *)arg);
}

void *
thr_fn1(void *arg)
{
    printf("thread 1 start\n");
    pthread_cleanup_push(cleanup, "thread 1 first handler");
    pthread_cleanup_push(cleanup, "thread 1 second handler");
    printf("thread 1 push complete\n");
    if (arg)
        return((void *)1);
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return((void *)1);
}

void *
thr_fn2(void *arg)
{
    printf("thread 2 start\n");
    pthread_cleanup_push(cleanup, "thread 2 first handler");
    pthread_cleanup_push(cleanup, "thread 2 second handler");
    printf("thread 2 push complete\n");
    if (arg)
        pthread_exit((void *)2);
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    pthread_exit((void *)2);
}

int
main(void)
{
    int        err;
    pthread_t  tid1, tid2;
    void       *tret;

    err = pthread_create(&tid1, NULL, thr_fn1, (void *)1);
    if (err != 0)
        err_quit("can't create thread 1: %s\n", strerror(err));
    err = pthread_create(&tid2, NULL, thr_fn2, (void *)1);
    if (err != 0)
        err_quit("can't create thread 2: %s\n", strerror(err));
    err = pthread_join(tid1, &tret);
    if (err != 0)
        err_quit("can't join with thread 1: %s\n", strerror(err));
    printf("thread 1 exit code %ld\n", (long)tret);
    err = pthread_join(tid2, &tret);
    if (err != 0)
        err_quit("can't join with thread 2: %s\n", strerror(err));
    printf("thread 2 exit code %ld\n", (long)tret);
    exit(0);
}
By now, you should begin to see similarities between the thread functions and the process
functions. Figure 11.6 summarizes the similar functions.
Figure 11.6. Comparison of process and thread primitives
Process primitive  Thread primitive      Description
fork               pthread_create        create a new flow of control
exit               pthread_exit          exit from an existing flow of control
waitpid            pthread_join          get exit status from flow of control
atexit             pthread_cleanup_push  register function to be called at exit from flow of control
getpid             pthread_self          get ID for flow of control
abort              pthread_cancel        request abnormal termination of flow of control
By default, a thread's termination status is retained until pthread_join is called for that
thread. A thread's underlying storage can be reclaimed immediately on termination if that
thread has been detached. When a thread is detached, the pthread_join function can't be
used to wait for its termination status. A call to pthread_join for a detached thread can fail,
returning EINVAL, although this behavior is implementation-specific. We can detach a thread
by calling pthread_detach.
#include <pthread.h>
int pthread_detach(pthread_t tid);
Returns: 0 if OK, error number on failure
As we will see in the next chapter, we can create a thread that is already in the detached
state by modifying the thread attributes we pass to pthread_create.
11.6. Thread Synchronization
When multiple threads of control share the same memory, we need to make sure that each
thread sees a consistent view of its data. If each thread uses variables that other threads
don't read or modify, no consistency problems exist. Similarly, if a variable is read-only, there
is no consistency problem with more than one thread reading its value at the same time.
However, when one thread can modify a variable that other threads can read or modify, we
need to synchronize the threads to ensure that they don't use an invalid value when
accessing the variable's memory contents.
When one thread modifies a variable, other threads can potentially see inconsistencies when
reading the value of the variable. On processor architectures in which the modification takes
more than one memory cycle, this can happen when the memory read is interleaved between
the memory write cycles. Of course, this behavior is architecture dependent, but portable
programs can't make any assumptions about what type of processor architecture is being
used.
Figure 11.7 shows a hypothetical example of two threads reading and writing the same
variable. In this example, thread A reads the variable and then writes a new value to it, but
the write operation takes two memory cycles. If thread B reads the same variable between
the two write cycles, it will see an inconsistent value.
Figure 11.7. Interleaved memory cycles with two threads
To solve this problem, the threads have to use a lock that will allow only one thread to access
the variable at a time. Figure 11.8 shows this synchronization. If it wants to read the variable,
thread B acquires a lock. Similarly, when thread A updates the variable, it acquires the same
lock. Thus, thread B will be unable to read the variable until thread A releases the lock.
Figure 11.8. Two threads synchronizing memory access
You also need to synchronize two or more threads that might try to modify the same variable
at the same time. Consider the case in which you increment a variable (Figure 11.9). The
increment operation is usually broken down into three steps.
1. Read the memory location into a register.
2. Increment the value in the register.
3. Write the new value back to the memory location.
Figure 11.9. Two unsynchronized threads incrementing the same variable
If two threads try to increment the same variable at almost the same time without
synchronizing with each other, the results can be inconsistent. You end up with a value that
is either one or two greater than before, depending on the value observed when the second
thread starts its operation. If the second thread performs step 1 before the first thread
performs step 3, the second thread will read the same initial value as the first thread,
increment it, and write it back, overwriting the first thread's result. The variable ends up
only one greater than before, so one of the two increments is lost.
If the modification is atomic, then there isn't a race. In the previous example, if the increment
takes only one memory cycle, then no race exists. If our data always appears to be
sequentially consistent, then we need no additional synchronization. Our operations are
sequentially consistent when multiple threads can't observe inconsistencies in our data. In
modern computer systems, memory accesses take multiple bus cycles, and multiprocessors
generally interleave bus cycles among multiple processors, so we aren't guaranteed that our
data is sequentially consistent.
In a sequentially consistent environment, we can explain modifications to our data as a
sequence of operations taken by the running threads. We can say such things as
"Thread A incremented the variable, then thread B incremented the variable, so its value is
two greater than before" or "Thread B incremented the variable, then thread A incremented
the variable, so its value is two greater than before." No possible ordering of the two threads
can result in any other value of the variable.
Besides the computer architecture, races can arise from the ways in which our programs use
variables, creating places where it is possible to view inconsistencies. For example, we might
increment a variable and then make a decision based on its value. The combination of the
increment step and the decision-making step isn't atomic, so this opens a window where
inconsistencies can arise.
Mutexes
We can protect our data and ensure access by only one thread at a time by using the
pthreads mutual-exclusion interfaces. A mutex is basically a lock that we set (lock) before
accessing a shared resource and release (unlock) when we're done. While it is set, any other
thread that tries to set it will block until we release it. If more than one thread is blocked
when we unlock the mutex, then all threads blocked on the lock will be made runnable, and
the first one to run will be able to set the lock. The others will see that the mutex is still
locked and go back to waiting for it to become available again. In this way, only one thread
will proceed at a time.
This mutual-exclusion mechanism works only if we design our threads to follow the same
data-access rules. The operating system doesn't serialize access to data for us. If we allow
one thread to access a shared resource without first acquiring a lock, then inconsistencies
can occur even though the rest of our threads do acquire the lock before attempting to
access the shared resource.
A mutex variable is represented by the pthread_mutex_t data type. Before we can use a mutex
variable, we must first initialize it, either by setting it to the constant
PTHREAD_MUTEX_INITIALIZER (for statically allocated mutexes only) or by calling
pthread_mutex_init. If we allocate the mutex dynamically (by calling malloc, for example), then
we need to call pthread_mutex_destroy before freeing the memory.
#include <pthread.h>
int pthread_mutex_init(pthread_mutex_t *restrict mutex,
                       const pthread_mutexattr_t *restrict attr);
int pthread_mutex_destroy(pthread_mutex_t *mutex);
Both return: 0 if OK, error number on failure
To initialize a mutex with the default attributes, we set attr to NULL. We will discuss
nondefault mutex attributes in Section 12.4.
To lock a mutex, we call pthread_mutex_lock. If the mutex is already locked, the calling thread
will block until the mutex is unlocked. To unlock a mutex, we call pthread_mutex_unlock.
#include <pthread.h>
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
All return: 0 if OK, error number on failure
If a thread can't afford to block, it can use pthread_mutex_trylock to lock the mutex
conditionally. If the mutex is unlocked at the time pthread_mutex_trylock is called, then
pthread_mutex_trylock will lock the mutex without blocking and return 0. Otherwise,
pthread_mutex_trylock will fail, returning EBUSY without locking the mutex.
Example
Figure 11.10 illustrates a mutex used to protect a data structure. When more than one thread
needs to access a dynamically allocated object, we can embed a reference count in the
object to ensure that we don't free its memory before all threads are done using it.
We lock the mutex before incrementing the reference count, decrementing the reference
count, and checking whether the reference count reaches zero. No locking is necessary when
we initialize the reference count to 1 in the foo_alloc function, because the allocating thread
holds the only reference to it so far. If we were to place the structure on a list at this point, it
could be found by other threads, so we would need to lock it first.
Before using the object, threads are expected to add a reference to it by calling foo_hold.
When they are done, they must release the reference by calling foo_rele. When the last
reference is released, the object's memory is freed.
Figure 11.10. Using a mutex to protect a data structure
#include <stdlib.h>
#include <pthread.h>

struct foo {
    int             f_count;
    pthread_mutex_t f_lock;
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(void) /* allocate the object */
{
    struct foo *fp;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        /* ... continue initialization ... */
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    fp->f_count++;
    pthread_mutex_unlock(&fp->f_lock);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    if (--fp->f_count == 0) { /* last reference */
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        pthread_mutex_unlock(&fp->f_lock);
    }
}
Deadlock Avoidance
A thread will deadlock itself if it tries to lock the same mutex twice, but there are less obvious
ways to create deadlocks with mutexes. For example, when we use more than one mutex in
our programs, a deadlock can occur if we allow one thread to hold a mutex and block while
trying to lock a second mutex at the same time that another thread holding the second mutex
tries to lock the first mutex. Neither thread can proceed, because each needs a resource that
is held by the other, so we have a deadlock.
Deadlocks can be avoided by carefully controlling the order in which mutexes are locked. For
example, assume that you have two mutexes, A and B, that you need to lock at the same
time. If all threads always lock mutex A before mutex B, no deadlock can occur from the use
of the two mutexes (but you can still deadlock on other resources). Similarly, if all threads
always lock mutex B before mutex A, no deadlock will occur. You'll have the potential for a
deadlock only when one thread attempts to lock the mutexes in the opposite order from
another thread.
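The text fixes the order by convention (A before B). When code must lock two mutexes that are chosen at run time, one common technique, which the text does not describe, is to derive the order from the mutexes' addresses. The sketch below assumes the two mutexes are distinct; lock_pair and unlock_pair are hypothetical helper names.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical helper: lock two distinct mutexes in one global order
 * (lower address first). Two threads calling lock_pair(&a, &b) and
 * lock_pair(&b, &a) then acquire them in the same order and cannot
 * deadlock on this pair. Assumes m1 != m2.
 */
void
lock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2)
{
    if ((uintptr_t)m1 > (uintptr_t)m2) {
        pthread_mutex_t *tmp = m1;   /* swap so m1 has the lower address */
        m1 = m2;
        m2 = tmp;
    }
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);
}

void
unlock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2)
{
    /* The unlock order does not matter for deadlock avoidance. */
    pthread_mutex_unlock(m1);
    pthread_mutex_unlock(m2);
}
```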
Sometimes, an application's architecture makes it difficult to apply a lock ordering. If enough
locks and data structures are involved that the functions you have available can't be molded
to fit a simple hierarchy, then you'll have to try some other approach. In this case, you might
be able to release your locks and try again at a later time. You can use the
pthread_mutex_trylock interface to avoid deadlocking in this case. If you are already holding
locks and pthread_mutex_trylock is successful, then you can proceed. If it can't acquire the
lock, however, you can release the locks you already hold, clean up, and try again later.
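A sketch of the back-off strategy just described, assuming the caller needs two mutexes and has no other application state to undo between attempts. lock_both_backoff is a hypothetical name; the sched_yield call is one simple way to let the other thread make progress before retrying, not a requirement.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: lock `held`, then try to add `wanted`.
 * If `wanted` is busy, release `held`, yield, and start over, so we
 * never block while holding a lock that another thread may need.
 * Returns 0 with both mutexes locked, or an error number.
 */
int
lock_both_backoff(pthread_mutex_t *held, pthread_mutex_t *wanted)
{
    int err;

    for (;;) {
        if ((err = pthread_mutex_lock(held)) != 0)
            return err;
        err = pthread_mutex_trylock(wanted);
        if (err == 0)
            return 0;                 /* got both */
        pthread_mutex_unlock(held);   /* back off: drop what we hold */
        if (err != EBUSY)
            return err;               /* a real error, not contention */
        sched_yield();                /* let the other thread proceed */
    }
}
```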
Example
In this example, we update Figure 11.10 to show the use of two mutexes. We avoid deadlocks
by ensuring that when we need to acquire two mutexes at the same time, we always lock
them in the same order. The second mutex protects a hash list that we use to keep track of
the foo data structures. Thus, the hashlock mutex protects both the fh hash table and the
f_next hash link field in the foo structure. The f_lock mutex in the foo structure protects
access to the remainder of the foo structure's fields.
Comparing Figure 11.11 with Figure 11.10, we see that our allocation function now locks the
hash list lock, adds the new structure to a hash bucket, and before unlocking the hash list
lock, locks the mutex in the new structure. Since the new structure is placed on a global list,
other threads can find it, so we need to block them if they try to access the new structure,
until we are done initializing it.
The foo_find function locks the hash list lock and searches for the requested structure. If it is
found, we increase the reference count and return a pointer to the structure. Note that we
honor the lock ordering by locking the hash list lock in foo_find before foo_hold locks the foo
structure's f_lock mutex.
Now with two locks, the foo_rele function is more complicated. If this is the last reference,
we need to unlock the structure mutex so that we can acquire the hash list lock, since we'll
need to remove the structure from the hash list. Then we reacquire the structure mutex.
Because we could have blocked since the last time we held the structure mutex, we need to
recheck the condition to see whether we still need to free the structure. If another thread
found the structure and added a reference to it while we blocked to honor the lock ordering,
we simply need to decrement the reference count, unlock everything, and return.
This locking is complex, so we need to revisit our design. We can simplify things considerably
by using the hash list lock to protect the structure reference count, too. The structure mutex
can be used to protect everything else in the foo structure. Figure 11.12 reflects this change.
Note how much simpler the program in Figure 11.12 is compared to the program in Figure
11.11. The lock-ordering issues surrounding the hash list and the reference count go away
when we use the same lock for both purposes. Multithreaded software design involves these
types of tradeoffs. If your locking granularity is too coarse, you end up with too many threads
blocking behind the same locks, with little improvement possible from concurrency. If your
locking granularity is too fine, then you suffer bad performance from excess locking overhead,
and you end up with complex code. As a programmer, you need to find the correct balance
between code complexity and performance, and still satisfy your locking requirements.
Figure 11.11. Using two mutexes
#include <stdlib.h>
#include <pthread.h>

#define NHASH 29
#define HASH(id) (((unsigned long)(id))%NHASH)

struct foo *fh[NHASH];

pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;

struct foo {
    int             f_count;
    pthread_mutex_t f_lock;
    struct foo     *f_next; /* protected by hashlock */
    int             f_id;
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(int id) /* allocate the object */
{
    struct foo *fp;
    int         idx;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        fp->f_id = id;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        idx = HASH(id);
        pthread_mutex_lock(&hashlock);
        fp->f_next = fh[idx];
        fh[idx] = fp;                 /* link the new structure at the head */
        pthread_mutex_lock(&fp->f_lock);
        pthread_mutex_unlock(&hashlock);
        /* ... continue initialization ... */
        pthread_mutex_unlock(&fp->f_lock);
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    fp->f_count++;
    pthread_mutex_unlock(&fp->f_lock);
}

struct foo *
foo_find(int id) /* find an existing object */
{
    struct foo *fp;
    int         idx;

    idx = HASH(id);                   /* hash the key, not the pointer */
    pthread_mutex_lock(&hashlock);
    for (fp = fh[idx]; fp != NULL; fp = fp->f_next) {
        if (fp->f_id == id) {
            foo_hold(fp);
            break;
        }
    }
    pthread_mutex_unlock(&hashlock);
    return(fp);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    struct foo *tfp;
    int         idx;

    pthread_mutex_lock(&fp->f_lock);
    if (fp->f_count == 1) { /* last reference */
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_lock(&hashlock);
        pthread_mutex_lock(&fp->f_lock);
        /* need to recheck the condition */
        if (fp->f_count != 1) {
            fp->f_count--;
            pthread_mutex_unlock(&fp->f_lock);
            pthread_mutex_unlock(&hashlock);
            return;
        }
        /* remove from list */
        idx = HASH(fp->f_id);
        tfp = fh[idx];
        if (tfp == fp) {
            fh[idx] = fp->f_next;
        } else {
            while (tfp->f_next != fp)
                tfp = tfp->f_next;
            tfp->f_next = fp->f_next;
        }
        pthread_mutex_unlock(&hashlock);
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        fp->f_count--;
        pthread_mutex_unlock(&fp->f_lock);
    }
}
Figure 11.12. Simplified locking
#include <stdlib.h>
#include <pthread.h>

#define NHASH 29
#define HASH(id) (((unsigned long)(id))%NHASH)

struct foo *fh[NHASH];

pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;

struct foo {
    int             f_count; /* protected by hashlock */
    pthread_mutex_t f_lock;
    struct foo     *f_next;  /* protected by hashlock */
    int             f_id;
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(int id) /* allocate the object */
{
    struct foo *fp;
    int         idx;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        fp->f_id = id;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        idx = HASH(id);
        pthread_mutex_lock(&hashlock);
        fp->f_next = fh[idx];
        fh[idx] = fp;                 /* link the new structure at the head */
        pthread_mutex_lock(&fp->f_lock);
        pthread_mutex_unlock(&hashlock);
        /* ... continue initialization ... */
        pthread_mutex_unlock(&fp->f_lock);
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&hashlock);
    fp->f_count++;
    pthread_mutex_unlock(&hashlock);
}

struct foo *
foo_find(int id) /* find an existing object */
{
    struct foo *fp;
    int         idx;

    idx = HASH(id);                   /* hash the key, not the pointer */
    pthread_mutex_lock(&hashlock);
    for (fp = fh[idx]; fp != NULL; fp = fp->f_next) {
        if (fp->f_id == id) {
            fp->f_count++;
            break;
        }
    }
    pthread_mutex_unlock(&hashlock);
    return(fp);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    struct foo *tfp;
    int         idx;

    pthread_mutex_lock(&hashlock);
    if (--fp->f_count == 0) { /* last reference, remove from list */
        idx = HASH(fp->f_id);
        tfp = fh[idx];
        if (tfp == fp) {
            fh[idx] = fp->f_next;
        } else {
            while (tfp->f_next != fp)
                tfp = tfp->f_next;
            tfp->f_next = fp->f_next;
        }
        pthread_mutex_unlock(&hashlock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        pthread_mutex_unlock(&hashlock);
    }
}
Reader-Writer Locks
Reader-writer locks are similar to mutexes, except that they allow for higher degrees of
parallelism. With a mutex, the state is either locked or unlocked, and only one thread can lock
it at a time. Three states are possible with a reader-writer lock: locked in read mode, locked in
write mode, and unlocked. Only one thread at a time can hold a reader-writer lock in write
mode, but multiple threads can hold a reader-writer lock in read mode at the same time.
When a reader-writer lock is write-locked, all threads attempting to lock it block until it is
unlocked. When a reader-writer lock is read-locked, all threads attempting to lock it in read
mode are given access, but any threads attempting to lock it in write mode block until all the
threads have relinquished their read locks. Although implementations vary, reader-writer locks
usually block additional readers if a lock is already held in read mode and a thread is blocked
trying to acquire the lock in write mode. This prevents a constant stream of readers from
starving waiting writers.
Reader-writer locks are well suited for situations in which data structures are read more often
than they are modified. When a reader-writer lock is held in write mode, the data structure it
protects can be modified safely, since only one thread at a time can hold the lock in write
mode. When the reader-writer lock is held in read mode, the data structure it protects can be
read by multiple threads, as long as the threads first acquire the lock in read mode.
Reader-writer locks are also called shared-exclusive locks. When a reader-writer lock is
read-locked, it is said to be locked in shared mode. When it is write-locked, it is said to be
locked in exclusive mode.
As with mutexes, reader-writer locks must be initialized before use and destroyed before
freeing their underlying memory.
#include <pthread.h>
int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock,
                        const pthread_rwlockattr_t *restrict attr);
int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);
Both return: 0 if OK, error number on failure
A reader-writer lock is initialized by calling pthread_rwlock_init. We can pass a null pointer for
attr if we want the reader-writer lock to have the default attributes. We discuss reader-writer
lock attributes in Section 12.4.
Before freeing the memory backing a reader-writer lock, we need to call pthread_rwlock_destroy
to clean it up. If pthread_rwlock_init allocated any resources for the reader-writer lock,
pthread_rwlock_destroy frees those resources. If we free the memory backing a reader-writer
lock without first calling pthread_rwlock_destroy, any resources assigned to the lock will be
lost.
To lock a reader-writer lock in read mode, we call pthread_rwlock_rdlock. To write-lock a
reader-writer lock, we call pthread_rwlock_wrlock. Regardless of how we lock a reader-writer
lock, we can call pthread_rwlock_unlock to unlock it.
#include <pthread.h>
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
All return: 0 if OK, error number on failure
Implementations might place a limit on the number of times a reader-writer lock can be locked
in shared mode, so we need to check the return value of pthread_rwlock_rdlock. Even though
pthread_rwlock_wrlock and pthread_rwlock_unlock have error returns, we don't need to check
them if we design our locking properly. The only errors defined for them occur when we use
them improperly, such as with an uninitialized lock, or when we might deadlock by attempting
to acquire a lock we already own.
The Single UNIX Specification also defines conditional versions of the reader-writer locking
primitives.
#include <pthread.h>
int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock);
Both return: 0 if OK, error number on failure
When the lock can be acquired, these functions return 0. Otherwise, they return the error
EBUSY. These functions can be used in situations in which conforming to a lock hierarchy isn't
enough to avoid a deadlock, as we discussed previously.
Example
The program in Figure 11.13 illustrates the use of reader-writer locks. A queue of job requests
is protected by a single reader-writer lock. This example shows a possible implementation of
Figure 11.1, whereby multiple worker threads obtain jobs assigned to them by a single master
thread.
In this example, we lock the queue's reader-writer lock in write mode whenever we need to add
a job to the queue or remove a job from the queue. Whenever we search the queue, we grab
the lock in read mode, allowing all the worker threads to search the queue concurrently. Using
a reader-writer lock will improve performance in this case only if threads search the queue
much more frequently than they add or remove jobs.
The worker threads take only those jobs that match their thread ID off the queue. Since the
job structures are used only by one thread at a time, they don't need any extra locking.
Figure 11.13. Using reader-writer locks
#include <stdlib.h>
#include <pthread.h>

struct job {
    struct job *j_next;
    struct job *j_prev;
    pthread_t   j_id;   /* tells which thread handles this job */
    /* ... more stuff here ... */
};

struct queue {
    struct job      *q_head;
    struct job      *q_tail;
    pthread_rwlock_t q_lock;
};

/*
 * Initialize a queue.
 */
int
queue_init(struct queue *qp)
{
    int err;

    qp->q_head = NULL;
    qp->q_tail = NULL;
    err = pthread_rwlock_init(&qp->q_lock, NULL);
    if (err != 0)
        return(err);
    /* ... continue initialization ... */
    return(0);
}

/*
 * Insert a job at the head of the queue.
 */
void
job_insert(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    jp->j_next = qp->q_head;
    jp->j_prev = NULL;
    if (qp->q_head != NULL)
        qp->q_head->j_prev = jp;
    else
        qp->q_tail = jp;    /* list was empty */
    qp->q_head = jp;
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Append a job on the tail of the queue.
 */
void
job_append(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    jp->j_next = NULL;
    jp->j_prev = qp->q_tail;
    if (qp->q_tail != NULL)
        qp->q_tail->j_next = jp;
    else
        qp->q_head = jp;    /* list was empty */
    qp->q_tail = jp;
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Remove the given job from a queue.
 */
void
job_remove(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    if (jp == qp->q_head) {
        qp->q_head = jp->j_next;
        if (qp->q_tail == jp)
            qp->q_tail = NULL;           /* it was the only job */
        else
            jp->j_next->j_prev = NULL;   /* new head has no predecessor */
    } else if (jp == qp->q_tail) {
        qp->q_tail = jp->j_prev;
        jp->j_prev->j_next = NULL;       /* new tail has no successor */
    } else {
        jp->j_prev->j_next = jp->j_next;
        jp->j_next->j_prev = jp->j_prev;
    }
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Find a job for the given thread ID.
 */
struct job *
job_find(struct queue *qp, pthread_t id)
{
    struct job *jp;

    if (pthread_rwlock_rdlock(&qp->q_lock) != 0)
        return(NULL);
    for (jp = qp->q_head; jp != NULL; jp = jp->j_next)
        if (pthread_equal(jp->j_id, id))
            break;
    pthread_rwlock_unlock(&qp->q_lock);
    return(jp);
}
Condition Variables
Condition variables are another synchronization mechanism available to threads. Condition
variables provide a place for threads to rendezvous. When used with mutexes, condition
variables allow threads to wait in a race-free way for arbitrary conditions to occur.
The condition itself is protected by a mutex. A thread must first lock the mutex to change the
condition state. Other threads will not notice the change until they acquire the mutex,
because the mutex must be locked to be able to evaluate the condition.
Before a condition variable is used, it must first be initialized. A condition variable, represented
by the pthread_cond_t data type, can be initialized in two ways. We can assign the constant
PTHREAD_COND_INITIALIZER to a statically allocated condition variable, but if the condition
variable is allocated dynamically, we can use the pthread_cond_init function to initialize it.
We can use the pthread_cond_destroy function to deinitialize a condition variable before
freeing its underlying memory.
#include <pthread.h>
int pthread_cond_init(pthread_cond_t *restrict cond,
                      const pthread_condattr_t *restrict attr);
int pthread_cond_destroy(pthread_cond_t *cond);
Both return: 0 if OK, error number on failure
Unless you need to create a condition variable with nondefault attributes, the attr argument
to pthread_cond_init can be set to NULL. We will discuss condition variable attributes in
Section 12.4.
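A sketch of a dynamically allocated object that embeds a condition variable: it is initialized with pthread_cond_init (default attributes, attr set to NULL) rather than PTHREAD_COND_INITIALIZER, and deinitialized with pthread_cond_destroy before its memory is freed. struct event, event_alloc, and event_free are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object that threads can wait on. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             fired;   /* the condition, protected by lock */
};

struct event *
event_alloc(void)
{
    struct event *ep = malloc(sizeof(struct event));

    if (ep == NULL)
        return NULL;
    ep->fired = 0;
    if (pthread_mutex_init(&ep->lock, NULL) != 0) {
        free(ep);
        return NULL;
    }
    if (pthread_cond_init(&ep->cond, NULL) != 0) {   /* default attributes */
        pthread_mutex_destroy(&ep->lock);
        free(ep);
        return NULL;
    }
    return ep;
}

void
event_free(struct event *ep)
{
    pthread_cond_destroy(&ep->cond);   /* deinitialize before freeing */
    pthread_mutex_destroy(&ep->lock);
    free(ep);
}
```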
We use pthread_cond_wait to wait for a condition to be true. A variant is provided to return an
error code if the condition hasn't been satisfied in the specified amount of time.
#include <pthread.h>
int pthread_cond_wait(pthread_cond_t *restrict cond,
                      pthread_mutex_t *restrict mutex);
int pthread_cond_timedwait(pthread_cond_t *restrict cond,
                           pthread_mutex_t *restrict mutex,
                           const struct timespec *restrict timeout);
Both return: 0 if OK, error number on failure
The mutex passed to pthread_cond_wait protects the condition. The caller passes it locked to
the function, which then atomically places the calling thread on the list of threads waiting for
the condition and unlocks the mutex. This closes the window between the time that the
condition is checked and the time that the thread goes to sleep waiting for the condition to
change, so that the thread doesn't miss a change in the condition. When pthread_cond_wait
returns, the mutex is again locked.
The pthread_cond_timedwait function works the same as the pthread_cond_wait function with
the addition of the timeout. The timeout value specifies how long we will wait. It is specified
by the timespec structure, where a time value is represented by a number of seconds and
partial seconds. Partial seconds are specified in units of nanoseconds:
struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
};
Using this structure, we need to specify how long we are willing to wait as an absolute time
instead of a relative time. For example, if we are willing to wait 3 minutes, instead of
translating 3 minutes into a timespec structure, we need to translate now + 3 minutes into a
timespec structure.
We can use gettimeofday (Section 6.10) to get the current time expressed as a timeval
structure and translate this into a timespec structure. To obtain the absolute time for the
timeout value, we can use the following function:
#include <sys/time.h>
#include <time.h>

void
maketimeout(struct timespec *tsp, long minutes)
{
    struct timeval now;

    /* get the current time */
    gettimeofday(&now, NULL);
    tsp->tv_sec = now.tv_sec;
    tsp->tv_nsec = now.tv_usec * 1000; /* usec to nsec */
    /* add the offset to get timeout value */
    tsp->tv_sec += minutes * 60;
}
If the timeout expires without the condition occurring, pthread_cond_timedwait will reacquire
the mutex and return the error ETIMEDOUT. When it returns from a successful call to
pthread_cond_wait or pthread_cond_timedwait, a thread needs to reevaluate the condition,
since another thread might have run and already changed the condition.
There are two functions to notify threads that a condition has been satisfied. The
pthread_cond_signal function will wake up one thread waiting on a condition, whereas the
pthread_cond_broadcast function will wake up all threads waiting on a condition.
The POSIX specification allows for implementations of pthread_cond_signal to wake up more
than one thread, to make the implementation simpler.
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
Both return: 0 if OK, error number on failure
When we call pthread_cond_signal or pthread_cond_broadcast, we are said to be signaling the
thread or condition. We have to be careful to signal the threads only after changing the state
of the condition.
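When several threads wait for the same state change, pthread_cond_broadcast wakes all of them; pthread_cond_signal guarantees to wake only one. The sketch below shows a one-shot "gate" that follows the rule above: the state is changed under the mutex before any thread is signaled. struct gate, gate_wait, gate_open, and gate_waiter are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical one-shot gate that many threads can wait on. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  opened_cv;
    int             opened;    /* the condition, protected by lock */
};

void
gate_wait(struct gate *gp)
{
    pthread_mutex_lock(&gp->lock);
    while (!gp->opened)                /* recheck after every wakeup */
        pthread_cond_wait(&gp->opened_cv, &gp->lock);
    pthread_mutex_unlock(&gp->lock);
}

void
gate_open(struct gate *gp)
{
    pthread_mutex_lock(&gp->lock);
    gp->opened = 1;                          /* change the state first */
    pthread_mutex_unlock(&gp->lock);
    pthread_cond_broadcast(&gp->opened_cv);  /* then wake every waiter */
}

/* Thread start routine: wait at the gate, then return its argument. */
void *
gate_waiter(void *arg)
{
    gate_wait(arg);
    return arg;
}
```

A thread that reaches gate_wait after the gate has opened sees opened set and never sleeps, so no wakeup can be missed.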
Example
Figure 11.14 shows an example of how to use condition variables and mutexes together to
synchronize threads.
The condition is the state of the work queue. We protect the condition with a mutex and
evaluate the condition in a while loop. When we put a message on the work queue, we need
to hold the mutex, but we don't need to hold the mutex when we signal the waiting threads.
As long as it is okay for a thread to pull the message off the queue before we call
pthread_cond_signal, we can do this after releasing the mutex. Since we check the condition in a while loop, this
doesn't present a problem: a thread will wake up, find that the queue is still empty, and go
back to waiting again. If the code couldn't tolerate this race, we would need to hold the
mutex when we signal the threads.
Figure 11.14. Using condition variables
#include <pthread.h>

struct msg {
    struct msg *m_next;
    /* ... more stuff here ... */
};

struct msg *workq;

pthread_cond_t qready = PTHREAD_COND_INITIALIZER;

pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

void
process_msg(void)
{
    struct msg *mp;

    for (;;) {
        pthread_mutex_lock(&qlock);
        while (workq == NULL)
            pthread_cond_wait(&qready, &qlock);
        mp = workq;
        workq = mp->m_next;
        pthread_mutex_unlock(&qlock);
        /* now process the message mp */
    }
}

void
enqueue_msg(struct msg *mp)
{
    pthread_mutex_lock(&qlock);
    mp->m_next = workq;
    workq = mp;
    pthread_mutex_unlock(&qlock);
    pthread_cond_signal(&qready);
}
11.7. Summary
In this chapter, we introduced the concept of threads and discussed the POSIX.1 primitives
available to create and destroy them. We also introduced the problem of thread
synchronization. We discussed three fundamental synchronization mechanisms (mutexes,
reader-writer locks, and condition variables), and we saw how to use them to protect shared
resources.
Exercises
11.1 Modify the example shown in Figure 11.4 to pass the structure between the
threads properly.
11.2 In the example shown in Figure 11.13, what additional synchronization (if any)
is necessary to allow the master thread to change the thread ID associated
with a pending job? How would this affect the job_remove function?
11.3 Apply the techniques shown in Figure 11.14 to the worker thread example
(Figure 11.1 and Figure 11.13) to implement the worker thread function. Don't
forget to update the queue_init function to initialize the condition variable and
change the job_insert and job_append functions to signal the worker
threads. What difficulties arise?
11.4 Which sequence of steps is correct?
    1. Lock a mutex (pthread_mutex_lock).
    2. Change the condition protected by the mutex.
    3. Signal threads waiting on the condition (pthread_cond_broadcast).
    4. Unlock the mutex (pthread_mutex_unlock).
    or
    1. Lock a mutex (pthread_mutex_lock).
    2. Change the condition protected by the mutex.
    3. Unlock the mutex (pthread_mutex_unlock).
    4. Signal threads waiting on the condition (pthread_cond_broadcast).
Chapter 12. Thread Control
Section 12.1. Introduction
Section 12.2. Thread Limits
Section 12.3. Thread Attributes
Section 12.4. Synchronization Attributes
Section 12.5. Reentrancy
Section 12.6. Thread-Specific Data
Section 12.7. Cancel Options
Section 12.8. Threads and Signals
Section 12.9. Threads and fork
Section 12.10. Threads and I/O
Section 12.11. Summary
Exercises
12.1. Introduction
In Chapter 11, we learned the basics about threads and thread synchronization. In this
chapter, we will learn the details of controlling thread behavior. We will look at thread
attributes and synchronization primitive attributes, which we ignored in the previous chapter
in favor of the default behaviors.
We will follow this with a look at how threads can keep data private from other threads in the
same process. Then we will wrap up the chapter with a look at how some process-based
system calls interact with threads.
12.2. Thread Limits
We discussed the sysconf function in Section 2.5.4. The Single UNIX Specification defines
several limits associated with the operation of threads, which we didn't show in Figure 2.10.
As with other system limits, the thread limits can be queried using sysconf. Figure 12.1
summarizes these limits.
Figure 12.1. Thread limits and name arguments to sysconf
Name of limit | Description | name argument
PTHREAD_DESTRUCTOR_ITERATIONS | maximum number of times an implementation will try to destroy the thread-specific data when a thread exits (Section 12.6) | _SC_THREAD_DESTRUCTOR_ITERATIONS
PTHREAD_KEYS_MAX | maximum number of keys that can be created by a process (Section 12.6) | _SC_THREAD_KEYS_MAX
PTHREAD_STACK_MIN | minimum number of bytes that can be used for a thread's stack (Section 12.3) | _SC_THREAD_STACK_MIN
PTHREAD_THREADS_MAX | maximum number of threads that can be created in a process (Section 12.3) | _SC_THREAD_THREADS_MAX
As with the other limits reported by sysconf, use of these limits is intended to promote
application portability among different operating system implementations. For example, if your
application requires that you create four threads for every file you manage, you might have to
limit the number of files you can manage concurrently if the system won't let you create
enough threads.
Figure 12.2 shows the values of the thread limits for the four implementations described in this
book. When the implementation doesn't define the corresponding sysconf symbol (starting
with _SC_), "no symbol" is listed. If the implementation's limit is indeterminate, "no limit" is
listed. This doesn't mean that the value is unlimited, however. An "unsupported" entry means
that the implementation defines the corresponding sysconf limit symbol, but the sysconf
function doesn't recognize it.
Note that although an implementation may not provide access to these limits, that doesn't
mean that the limits don't exist. It just means that the implementation doesn't provide us with
a way to get at them using sysconf.
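A sketch that queries the four limits in Figure 12.1 at run time and reports them with the same wording as Figure 12.2: a missing _SC_ symbol is reported as "no symbol", a -1 return with errno unchanged as "no limit", and a -1 return with errno set as "unsupported". It also applies the hypothetical four-threads-per-file policy from the text. All function names are our own.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Query one limit and print it. Returns the value, or -1. */
long
show_limit(const char *label, int name)
{
    long val;

    errno = 0;
    val = sysconf(name);
    if (val >= 0)
        printf("%s: %ld\n", label, val);
    else if (errno == 0)
        printf("%s: no limit\n", label);      /* indeterminate */
    else
        printf("%s: unsupported\n", label);   /* name not recognized */
    return val;
}

void
show_thread_limits(void)
{
#ifdef _SC_THREAD_DESTRUCTOR_ITERATIONS
    show_limit("PTHREAD_DESTRUCTOR_ITERATIONS", _SC_THREAD_DESTRUCTOR_ITERATIONS);
#else
    printf("PTHREAD_DESTRUCTOR_ITERATIONS: no symbol\n");
#endif
#ifdef _SC_THREAD_KEYS_MAX
    show_limit("PTHREAD_KEYS_MAX", _SC_THREAD_KEYS_MAX);
#else
    printf("PTHREAD_KEYS_MAX: no symbol\n");
#endif
#ifdef _SC_THREAD_STACK_MIN
    show_limit("PTHREAD_STACK_MIN", _SC_THREAD_STACK_MIN);
#else
    printf("PTHREAD_STACK_MIN: no symbol\n");
#endif
#ifdef _SC_THREAD_THREADS_MAX
    show_limit("PTHREAD_THREADS_MAX", _SC_THREAD_THREADS_MAX);
#else
    printf("PTHREAD_THREADS_MAX: no symbol\n");
#endif
}

/*
 * Hypothetical policy: every managed file needs threads_per_file
 * threads. Returns how many files we can manage concurrently, or -1
 * if the thread limit is indeterminate or cannot be queried.
 */
long
max_concurrent_files(long threads_per_file)
{
#ifdef _SC_THREAD_THREADS_MAX
    long tmax = sysconf(_SC_THREAD_THREADS_MAX);

    if (tmax > 0)
        return tmax / threads_per_file;
#endif
    (void)threads_per_file;
    return -1;
}
```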
Figure 12.2. Examples of thread configuration limits
Limit | FreeBSD 5.2.1 | Linux 2.4.22 | Mac OS X 10.3 | Solaris 9
PTHREAD_DESTRUCTOR_ITERATIONS | no symbol | unsupported | no symbol | no limit
PTHREAD_KEYS_MAX | no symbol | unsupported | no symbol | no limit
PTHREAD_STACK_MIN | no symbol | unsupported | no symbol | 4,096
PTHREAD_THREADS_MAX | no symbol | unsupported | no symbol | no limit
12.3. Thread Attributes
In all the examples in which we called pthread_create in Chapter 11, we passed in a null
pointer instead of passing in a pointer to a pthread_attr_t structure. We can use the
pthread_attr_t structure to modify the default attributes, and associate these attributes with
threads that we create. We use the pthread_attr_init function to initialize the pthread_attr_t
structure. After calling pthread_attr_init, the pthread_attr_t structure contains the default
values for all the thread attributes supported by the implementation. To change individual
attributes, we need to call other functions, as described later in this section.
#include <pthread.h>
int pthread_attr_init(pthread_attr_t *attr);
int pthread_attr_destroy(pthread_attr_t *attr);
Both return: 0 if OK, error number on failure
To deinitialize a pthread_attr_t structure, we call pthread_attr_destroy. If an implementation
of pthread_attr_init allocated any dynamic memory for the attribute object,
pthread_attr_destroy will free that memory. In addition, pthread_attr_destroy will initialize the
attribute object with invalid values, so if it is used by mistake, pthread_create will return an
error.
The pthread_attr_t structure is opaque to applications. This means that applications aren't
supposed to know anything about its internal structure, thus promoting application portability.
Following this model, POSIX.1 defines separate functions to query and set each attribute.
The thread attributes defined by POSIX.1 are summarized in Figure 12.3. POSIX.1 defines
additional attributes in the real-time threads option, but we don't discuss those here. In
Figure 12.3, we also show which platforms support each thread attribute. If the attribute is
accessible through an obsolete interface, we show ob in the table entry.
Figure 12.3. POSIX.1 thread attributes
Name | Description | FreeBSD 5.2.1 | Linux 2.4.22 | Mac OS X 10.3 | Solaris 9
detachstate | detached thread attribute | • | • | • | •
guardsize | guard buffer size in bytes at end of thread stack | • | • |   | •
stackaddr | lowest address of thread stack | ob | • | • | ob
stacksize | size in bytes of thread stack | • | • | • | •
In Section 11.5, we introduced the concept of detached threads. If we are no longer
interested in an existing thread's termination status, we can use pthread_detach to allow the
operating system to reclaim the thread's resources when the thread exits.
If we know that we don't need the thread's termination status at the time we create the
thread, we can arrange for the thread to start out in the detached state by modifying the
detachstate thread attribute in the pthread_attr_t structure. We can use the
pthread_attr_setdetachstate function to set the detachstate thread attribute to one of two
legal values: PTHREAD_CREATE_DETACHED to start the thread in the detached state or
PTHREAD_CREATE_JOINABLE to start the thread normally, so its termination status can be
retrieved by the application.
#include <pthread.h>
int pthread_attr_getdetachstate(const pthread_attr_t *restrict attr,
                                int *detachstate);
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
Both return: 0 if OK, error number on failure
We can call pthread_attr_getdetachstate to obtain the current detachstate attribute. The
integer pointed to by the second argument is set to either PTHREAD_CREATE_DETACHED or
PTHREAD_CREATE_JOINABLE, depending on the value of the attribute in the given pthread_attr_t
structure.
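A short sketch that sets the detachstate attribute and reads it back with pthread_attr_getdetachstate. attr_detached is a hypothetical helper; on success the caller still owns the attribute object and must eventually call pthread_attr_destroy.

```c
#include <pthread.h>

/*
 * Hypothetical helper: initialize *attr so that threads created with it
 * start detached, then query the setting into *statep.
 * Returns 0 on success or an error number (attr is destroyed on failure).
 */
int
attr_detached(pthread_attr_t *attr, int *statep)
{
    int err;

    if ((err = pthread_attr_init(attr)) != 0)
        return err;
    err = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
    if (err == 0)
        err = pthread_attr_getdetachstate(attr, statep);
    if (err != 0)
        pthread_attr_destroy(attr);   /* clean up on failure */
    return err;
}
```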
Example
Figure 12.4 shows a function that can be used to create a thread in the detached state.
Note that we ignore the return value from the call to pthread_attr_destroy. In this case, we
initialized the thread attributes properly, so pthread_attr_destroy shouldn't fail. Nonetheless, if
it does fail, cleaning up would be difficult: we would have to destroy the thread we just
created, which is possibly already running, asynchronous to the execution of this function. By
ignoring the error return from pthread_attr_destroy, the worst that can happen is that we leak
a small amount of memory if pthread_attr_init allocated any. But if pthread_attr_init
succeeded in initializing the thread attributes and then pthread_attr_destroy failed to clean
up, we have no recovery strategy anyway, because the attributes structure is opaque to the
application. The only interface defined to clean up the structure is pthread_attr_destroy, and
it just failed.
Figure 12.4. Creating a thread in the detached state
#include "apue.h"
#include <pthread.h>

int
makethread(void *(*fn)(void *), void *arg)
{
    int             err;
    pthread_t       tid;
    pthread_attr_t  attr;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return(err);
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0)
        err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return(err);
}
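Because a thread created by makethread can't be joined, any result it produces has to come back some other way. The sketch below assumes a mutex and condition variable for the hand-off; the names worker, run_detached, done, and answer are hypothetical, and makethread is repeated locally so the example builds without apue.h.

```c
#include <pthread.h>

/* Local copy of makethread (Figure 12.4) so the sketch is self-contained. */
static int
makethread(void *(*fn)(void *), void *arg)
{
    int err;
    pthread_t tid;
    pthread_attr_t attr;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0)
        err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return err;
}

/* Hand-off area: a detached thread can't be joined, so its result
   is passed back under a mutex and signaled with a condition variable. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done, answer;

static void *
worker(void *arg)
{
    pthread_mutex_lock(&done_lock);
    answer = *(int *)arg * 2;
    done = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;        /* resources are reclaimed automatically on exit */
}

/* Start a detached worker and wait for its answer; -1 on failure. */
int
run_detached(int input)
{
    int result;

    pthread_mutex_lock(&done_lock);
    done = 0;
    pthread_mutex_unlock(&done_lock);
    if (makethread(worker, &input) != 0)
        return -1;
    pthread_mutex_lock(&done_lock);
    while (!done)       /* loop guards against spurious wakeups */
        pthread_cond_wait(&done_cond, &done_lock);
    result = answer;
    pthread_mutex_unlock(&done_lock);
    return result;
}
```

Passing the address of the local variable input is safe here only because run_detached waits until the worker has read it.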
Support for thread stack attributes is optional for a POSIX-conforming operating system, but
is required if the system is to conform to the XSI. At compile time, you can check whether
your system supports each thread stack attribute using the _POSIX_THREAD_ATTR_STACKADDR
and _POSIX_THREAD_ATTR_STACKSIZE symbols. If one is defined, then the system supports the
corresponding thread stack attribute. You can also check at runtime, by using the
_SC_THREAD_ATTR_STACKADDR and _SC_THREAD_ATTR_STACKSIZE parameters to the sysconf function.
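A portable check might combine the compile-time and runtime tests. This sketch follows the POSIX convention for option symbols (a value of -1 means never supported, 0 means the answer must be obtained from sysconf at runtime, and a positive value means always supported); the function names are hypothetical.

```c
#include <unistd.h>

/* Returns 1 if the stackaddr attribute is supported, 0 if not. */
int
stackaddr_attr_supported(void)
{
#if defined(_POSIX_THREAD_ATTR_STACKADDR) && _POSIX_THREAD_ATTR_STACKADDR > 0
    return 1;                                   /* always supported */
#elif defined(_POSIX_THREAD_ATTR_STACKADDR) && _POSIX_THREAD_ATTR_STACKADDR == -1
    return 0;                                   /* never supported */
#else
    return sysconf(_SC_THREAD_ATTR_STACKADDR) > 0;  /* ask at runtime */
#endif
}

/* Returns 1 if the stacksize attribute is supported, 0 if not. */
int
stacksize_attr_supported(void)
{
#if defined(_POSIX_THREAD_ATTR_STACKSIZE) && _POSIX_THREAD_ATTR_STACKSIZE > 0
    return 1;
#elif defined(_POSIX_THREAD_ATTR_STACKSIZE) && _POSIX_THREAD_ATTR_STACKSIZE == -1
    return 0;
#else
    return sysconf(_SC_THREAD_ATTR_STACKSIZE) > 0;  /* -1 if unsupported */
#endif
}
```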
POSIX.1 defines several interfaces to manipulate thread stack attributes. Two older
functions, pthread_attr_getstackaddr and pthread_attr_setstackaddr, are marked as obsolete
in Version 3 of the Single UNIX Specification, although many pthreads implementations still
provide them. The preferred way to query and modify a thread's stack attributes is to use the
newer functions pthread_attr_getstack and pthread_attr_setstack. These functions clear up
ambiguities present in the definition of the older interfaces.
#include <pthread.h>
int pthread_attr_getstack(const pthread_attr_t *restrict attr,
                          void **restrict stackaddr,
                          size_t *restrict stacksize);
int pthread_attr_setstack(pthread_attr_t *attr, void *stackaddr,
                          size_t stacksize);
Both return: 0 if OK, error number on failure
These two functions are used to manage both the stackaddr and the stacksize thread
attributes.
With a process, the amount of virtual address space is fixed. Since there is only one stack, its
size usually isn't a problem. With threads, however, the same amount of virtual address space
must be shared by all the thread stacks. You might have to reduce your default thread stack
size if your application uses so many threads that the cumulative size of their stacks exceeds
the available virtual address space. On the other hand, if your threads call functions that
allocate large automatic variables or call functions many stack frames deep, you might need
more than the default stack size.
If you run out of virtual address space for thread stacks, you can use malloc or mmap (see
Section 14.9) to allocate space for an alternate stack and use pthread_attr_setstack to
change the stack location of threads you create. The address specified by the stackaddr
parameter is the lowest addressable address in the range of memory to be used as the
thread's stack, aligned at the proper boundary for the processor architecture.
The stackaddr thread attribute is defined as the lowest memory address for the stack. This is
not necessarily the start of the stack, however. If stacks grow from higher addresses to lower
addresses for a given processor architecture, the stackaddr thread attribute will be the end of
the stack instead of the beginning.
The drawback with pthread_attr_getstackaddr and pthread_attr_setstackaddr is that the
stackaddr parameter was underspecified. It could have been interpreted as the start of the
stack or as the lowest memory address of the memory extent to use as the stack. On
architectures in which the stacks grow down from higher memory addresses to lower
addresses, if the stackaddr parameter is the lowest memory address of the stack, then you
need to know the stack size to determine the start of the stack. The pthread_attr_getstack
and pthread_attr_setstack functions correct these shortcomings.
An application can also get and set the stacksize thread attribute using the
pthread_attr_getstacksize and pthread_attr_setstacksize functions.
#include <pthread.h>
int pthread_attr_getstacksize(const pthread_attr_t *restrict attr,
                              size_t *restrict stacksize);
int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize);
Both return: 0 if OK, error number on failure
The pthread_attr_setstacksize function is useful when you want to change the default stack
size but don't want to deal with allocating the thread stacks on your own.
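The following sketch shows that case: a thread that needs a large automatic buffer is given a bigger stack with pthread_attr_setstacksize, while the system still allocates and frees the stack itself. The 256 KiB buffer, the 4 MiB request in the usage below, and the names big_frame_worker and run_with_stacksize are hypothetical.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical worker that needs a large automatic variable. */
static void *
big_frame_worker(void *arg)
{
    char buf[256 * 1024];

    memset(buf, 'x', sizeof(buf));
    *(int *)arg = (buf[sizeof(buf) - 1] == 'x');
    return NULL;
}

/*
 * Request a stack of `want` bytes, store the size read back with
 * pthread_attr_getstacksize in *actual, and return 1 if the worker
 * ran to completion (0 otherwise).
 */
int
run_with_stacksize(size_t want, size_t *actual)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ran = 0;
    int err;

    *actual = 0;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    err = pthread_attr_setstacksize(&attr, want);
    if (err == 0)
        err = pthread_attr_getstacksize(&attr, actual);
    if (err == 0)
        err = pthread_create(&tid, &attr, big_frame_worker, &ran);
    if (err == 0)
        pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return ran;
}
```

A call such as run_with_stacksize(4 * 1024 * 1024, &actual) requests a 4 MiB stack; the request fails with EINVAL if it is smaller than PTHREAD_STACK_MIN.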
The guardsize thread attribute controls the size of the memory extent after the end of the
thread's stack to protect against stack overflow. By default, this is set to PAGESIZE bytes. We
can set the guardsize thread attribute to 0 to disable this feature: no guard buffer will be
provided in this case. Also, if we change the stackaddr thread attribute, the system assumes
that we will be managing our own stacks and disables stack guard buffers, just as if we had
set the guardsize thread attribute to 0.
#include <pthread.h>
int pthread_attr_getguardsize(const pthread_attr_t *restrict attr,
                              size_t *restrict guardsize);
int pthread_attr_setguardsize(pthread_attr_t *attr, size_t guardsize);
Both return: 0 if OK, error number on failure
If the guardsize thread attribute is modified, the operating system might round it up to an
integral multiple of the page size. If the thread's stack pointer overflows into the guard area,
the application will receive an error, possibly with a signal.
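POSIX requires pthread_attr_getguardsize to report the value last stored with pthread_attr_setguardsize, even when the system rounds it up internally. The hypothetical helper below round-trips a value through an attribute object; storing 0 disables the guard area for threads later created with that object.

```c
#include <pthread.h>

/*
 * Store `guard` as the guardsize attribute and return the value read
 * back.  Returns (size_t)-1 on error.
 */
size_t
roundtrip_guardsize(size_t guard)
{
    pthread_attr_t attr;
    size_t got = (size_t)-1;

    if (pthread_attr_init(&attr) != 0)
        return (size_t)-1;
    if (pthread_attr_setguardsize(&attr, guard) == 0 &&
        pthread_attr_getguardsize(&attr, &got) != 0)
        got = (size_t)-1;
    pthread_attr_destroy(&attr);
    return got;
}
```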
The Single UNIX Specification defines several other optional thread attributes as part of the
real-time threads option. We will not discuss them here.
More Thread Attributes
Threads have other attributes not represented by the pthread_attr_t structure:
• The cancelability state (discussed in Section 12.7)
• The cancelability type (also discussed in Section 12.7)
• The concurrency level
The concurrency level controls the number of kernel threads or processes on top of which the
user-level threads are mapped. If an implementation keeps a one-to-one mapping between
kernel-level threads and user-level threads, then changing the concurrency level will have no
effect, since it is possible for all user-level threads to be scheduled. If the implementation
multiplexes user-level threads on top of kernel-level threads or processes, however, you might
be able to improve performance by increasing the number of user-level threads that can run
at a given time. The pthread_setconcurrency function can be used to provide a hint to the
system of the desired level of concurrency.
#include <pthread.h>
int pthread_getconcurrency(void);
Returns: current concurrency level
int pthread_setconcurrency(int level);
Returns: 0 if OK, error number on failure
The pthread_getconcurrency function returns the current concurrency level. If the operating
system is controlling the concurrency level (i.e., if no prior call to pthread_setconcurrency has
been made), then pthread_getconcurrency will return 0.
The concurrency level specified by pthread_setconcurrency is only a hint to the system. There
is no guarantee that the requested concurrency level will be honored. You can tell the system
that you want it to decide for itself what concurrency level to use by passing a level of 0.
Thus, an application can undo the effects of a prior call to pthread_setconcurrency with a
nonzero value of level by calling it again with level set to 0.
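The sequence just described (hint, observe, then hand control back with 0) can be sketched as follows; the function name hint_then_reset is hypothetical, and the value read back says nothing about whether the system actually honored the hint.

```c
#include <pthread.h>

/*
 * Hint a concurrency level, read it back, then let the system decide
 * again by passing 0.  Returns the level observed after the hint, or
 * -1 if pthread_setconcurrency rejected it.
 */
int
hint_then_reset(int level)
{
    int seen;

    if (pthread_setconcurrency(level) != 0)
        return -1;                     /* e.g., EINVAL for a negative level */
    seen = pthread_getconcurrency();   /* value from the previous set call */
    pthread_setconcurrency(0);         /* undo: system chooses again */
    return seen;
}
```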
12.4. Synchronization Attributes
Just as threads have attributes, so too do their synchronization objects. In this section, we
discuss the attributes of mutexes, reader-writer locks, and condition variables.
Mutex Attributes
We use pthread_mutexattr_init to initialize a pthread_mutexattr_t structure and
pthread_mutexattr_destroy to deinitialize one.
#include <pthread.h>
int pthread_mutexattr_init(pthread_mutexattr_t *attr);
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr);
Both return: 0 if OK, error number on failure
The pthread_mutexattr_init function will initialize the pthread_mutexattr_t structure with the
default mutex attributes. Two attributes of interest are the process-shared attribute and the
type attribute. Within POSIX.1, the process-shared attribute is optional; you can test
whether a platform supports it by checking whether the _POSIX_THREAD_PROCESS_SHARED symbol
is defined. You can also check at runtime by passing the _SC_THREAD_PROCESS_SHARED parameter
to the sysconf function. Although this option is not required to be provided by
POSIX-conforming operating systems, the Single UNIX Specification requires that
XSI-conforming operating systems do support this option.
Within a process, multiple threads can access the same synchronization object. This is the
default behavior, as we saw in Chapter 11. In this case, the process-shared mutex attribute
is set to PTHREAD_PROCESS_PRIVATE.
As we shall see in Chapters 14 and 15, mechanisms exist that allow independent processes to
map the same extent of memory into their independent address spaces. Access to shared
data by multiple processes usually requires synchronization, just as does access to shared
data by multiple threads. If the process-shared mutex attribute is set to
PTHREAD_PROCESS_SHARED, a mutex allocated from a memory extent shared between multiple
processes may be used for synchronization by those processes.
We can use the pthread_mutexattr_getpshared function to query a pthread_mutexattr_t
structure for its process-shared attribute. We can change the process-shared attribute with
the pthread_mutexattr_setpshared function.
#include <pthread.h>
int pthread_mutexattr_getpshared(const pthread_mutexattr_t *restrict attr,
                                 int *restrict pshared);
int pthread_mutexattr_setpshared(pthread_mutexattr_t *attr, int pshared);
Both return: 0 if OK, error number on failure
The process-shared mutex attribute allows the pthread library to provide more efficient mutex
implementations when the attribute is set to PTHREAD_PROCESS_PRIVATE, which is the default
case with multithreaded applications. Then the pthread library can restrict the more expensive
implementation to the case in which mutexes are shared among processes.
The type mutex attribute controls the characteristics of the mutex. POSIX.1 defines four
types. The PTHREAD_MUTEX_NORMAL type is a standard mutex that doesn't do any special error
checking or deadlock detection. The PTHREAD_MUTEX_ERRORCHECK mutex type provides error
checking.
The PTHREAD_MUTEX_RECURSIVE mutex type allows the same thread to lock it multiple times
without first unlocking it. A recursive mutex maintains a lock count and isn't released until it is
unlocked the same number of times it is locked. So if you lock a recursive mutex twice and
then unlock it, the mutex remains locked until it is unlocked a second time.
Finally, the PTHREAD_MUTEX_DEFAULT type can be used to request default semantics.
Implementations are free to map this to one of the other types. On Linux, for example, this
type is mapped to the normal mutex type.
The behavior of the four types is shown in Figure 12.5. The "Unlock when not owned" column
refers to one thread unlocking a mutex that was locked by a different thread. The "Unlock
when unlocked" column refers to what happens when a thread unlocks a mutex that is already
unlocked, which usually is a coding mistake.
Figure 12.5. Mutex type behavior
Mutex type                Relock without unlock?  Unlock when not owned?  Unlock when unlocked?
PTHREAD_MUTEX_NORMAL      deadlock                undefined               undefined
PTHREAD_MUTEX_ERRORCHECK  returns error           returns error           returns error
PTHREAD_MUTEX_RECURSIVE   allowed                 returns error           returns error
PTHREAD_MUTEX_DEFAULT     undefined               undefined               undefined
We can use pthread_mutexattr_gettype to get the mutex type attribute and
pthread_mutexattr_settype to change the mutex type attribute.
#include <pthread.h>
int pthread_mutexattr_gettype(const pthread_mutexattr_t *restrict attr,
                              int *restrict type);
int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int type);
Both return: 0 if OK, error number on failure
Recall from Section 11.6 that a mutex is used to protect the condition that is associated with
a condition variable. Before blocking the thread, the pthread_cond_wait and the
pthread_cond_timedwait functions release the mutex associated with the condition. This allows
other threads to acquire the mutex, change the condition, release the mutex, and signal the
condition variable. Since the mutex must be held to change the condition, it is not a good idea
to use a recursive mutex. If a recursive mutex is locked multiple times and used in a call to
pthread_cond_wait, the condition can never be satisfied, because the unlock done by
pthread_cond_wait doesn't release the mutex.
Recursive mutexes are useful when you need to adapt existing single-threaded interfaces to a
multithreaded environment, but can't change the interfaces to your functions because of
compatibility constraints. However, using recursive locks can be tricky, and they should be
used only when no other solution is possible.
Example
Figure 12.6 illustrates a situation in which a recursive mutex might seem to solve a
concurrency problem. Assume that func1 and func2 are existing functions in a library whose
interfaces can't be changed, because applications exist that call them, and the applications
can't be changed.
To keep the interfaces the same, we embed a mutex in the data structure whose address (x)
is passed in as an argument. This is possible only if we have provided an allocator function for
the structure, so the application doesn't know about its size (assuming we must increase its
size when we add a mutex to it).
This is also possible if we originally defined the structure with enough padding to allow us now
to replace some pad fields with a mutex. Unfortunately, most programmers are unskilled at
predicting the future, so this is not a common practice.
If both func1 and func2 must manipulate the structure and it is possible to access it from more
than one thread at a time, then func1 and func2 must lock the mutex before manipulating the
data. If func1 must call func2, we will deadlock if the mutex type is not recursive. We could
avoid using a recursive mutex if we could release the mutex before calling func2 and reacquire
it after func2 returns, but this opens a window where another thread can possibly grab control
of the mutex and change the data structure in the middle of func1. This may not be
acceptable, depending on what protection the mutex is intended to provide.
Figure 12.7 shows an alternative to using a recursive mutex in this case. We can leave the
interfaces to func1 and func2 unchanged and avoid a recursive mutex by providing a private
version of func2, called func2_locked. To call func2_locked, we must hold the mutex embedded
in the data structure whose address we pass as the argument. The body of func2_locked
contains a copy of func2, and func2 now simply acquires the mutex, calls func2_locked, and
then releases the mutex.
If we didn't have to leave the interfaces to the library functions unchanged, we could have
added a second parameter to each function to indicate whether the structure is locked by the
caller. It is usually better to leave the interfaces unchanged if we can, however, instead of
polluting them with implementation artifacts.
The strategy of providing locked and unlocked versions of functions is usually applicable in
simple situations. In more complex situations, such as when the library needs to call a function
outside the library, which then might call back into the library, we need to rely on recursive
locks.
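The locked/unlocked split described above can be sketched as follows. This is our own illustration of the func2_locked pattern, not the code from Figures 12.6 and 12.7; struct foo, its count field, foo_alloc, and the arithmetic in the function bodies are hypothetical. Because func1 never relocks the mutex, an ordinary (nonrecursive) mutex suffices.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical library structure with an embedded mutex. */
struct foo {
    pthread_mutex_t lock;
    int             count;
};

/* Allocator keeps the structure's size hidden from applications. */
struct foo *
foo_alloc(void)
{
    struct foo *x = malloc(sizeof(*x));

    if (x != NULL) {
        if (pthread_mutex_init(&x->lock, NULL) != 0) {
            free(x);
            return NULL;
        }
        x->count = 0;
    }
    return x;
}

/* Private version: the caller must already hold x->lock. */
static void
func2_locked(struct foo *x)
{
    x->count += 10;         /* the body that used to live in func2 */
}

/* Public interface unchanged: lock, delegate, unlock. */
void
func2(struct foo *x)
{
    pthread_mutex_lock(&x->lock);
    func2_locked(x);
    pthread_mutex_unlock(&x->lock);
}

/* Holds the lock across its whole update and calls the locked variant. */
void
func1(struct foo *x)
{
    pthread_mutex_lock(&x->lock);
    x->count += 1;
    func2_locked(x);        /* calling func2 here would self-deadlock */
    pthread_mutex_unlock(&x->lock);
}
```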
Figure 12.6. Recursive locking opportunity
Figure 12.7. Avoiding a recursive locking opportunity
Example
The program in Figure 12.8 illustrates another situation in which a recursive mutex is
necessary. Here, we have a "timeout" function that allows us to schedule another function to
be run at some time in the future. Assuming that threads are an inexpensive resource, we can
create a thread for each pending timeout. The thread waits until the time has been reached,
and then it calls the function we've requested.
The problem arises when we can't create a thread or when the scheduled time to run the
function has already passed. In these cases, we simply call the requested function now, from
the current context. Since the function acquires the same lock that we currently hold, a
deadlock will occur unless the lock is recursive.
We use the makethread function from Figure 12.4 to create a thread in the detached state.
We want the function to run in the future, and we don't want to wait around for the thread to
complete.
We could call sleep to wait for the timeout to expire, but that gives us only second
granularity. If we want to wait for some time other than an integral number of seconds, we
need to use nanosleep(2), which provides similar functionality.
Although nanosleep is required to be implemented only in the real-time extensions of the Single
UNIX Specification, all the platforms discussed in this text support it.
The caller of timeout needs to hold a mutex to check the condition and to schedule the retry
function as an atomic operation. The retry function will try to lock the same mutex. Unless
the mutex is recursive, a deadlock will occur if the timeout function calls retry directly.
Figure 12.8. Using a recursive mutex
#include "apue.h"
#include <pthread.h>
#include <time.h>
#include <sys/time.h>
extern int makethread(void *(*)(void *), void *);
struct to_info {
    void            (*to_fn)(void *);   /* function */
    void            *to_arg;            /* argument */
    struct timespec to_wait;            /* time to wait */
};

#define SECTONSEC  1000000000   /* seconds to nanoseconds */
#define USECTONSEC 1000         /* microseconds to nanoseconds */
void *
timeout_helper(void *arg)
{
    struct to_info  *tip;

    tip = (struct to_info *)arg;
    nanosleep(&tip->to_wait, NULL);
    (*tip->to_fn)(tip->to_arg);
    free(arg);      /* the to_info structure was malloc'ed by timeout */
    return(0);
}
void
timeout(const struct timespec *when, void (*func)(void *), void *arg)
{
    struct timespec now;
    struct timeval  tv;
    struct to_info  *tip;
    int             err;

    gettimeofday(&tv, NULL);
    now.tv_sec = tv.tv_sec;
    now.tv_nsec = tv.tv_usec * USECTONSEC;
    if ((when->tv_sec > now.tv_sec) ||
      (when->tv_sec == now.tv_sec && when->tv_nsec > now.tv_nsec)) {
        tip = malloc(sizeof(struct to_info));
        if (tip != NULL) {
            tip->to_fn = func;
            tip->to_arg = arg;
            tip->to_wait.tv_sec = when->tv_sec - now.tv_sec;
            if (when->tv_nsec >= now.tv_nsec) {
                tip->to_wait.tv_nsec = when->tv_nsec - now.tv_nsec;
            } else {
                tip->to_wait.tv_sec--;
                tip->to_wait.tv_nsec = SECTONSEC - now.tv_nsec +
                  when->tv_nsec;
            }
            err = makethread(timeout_helper, (void *)tip);
            if (err == 0)
                return;
            free(tip);      /* no thread will run timeout_helper */
        }
    }

    /*
     * We get here if (a) when <= now, or (b) malloc fails, or
     * (c) we can't make a thread, so we just call the function now.
     */
    (*func)(arg);
}
pthread_mutexattr_t attr;
pthread_mutex_t mutex;

void
retry(void *arg)
{
    pthread_mutex_lock(&mutex);

    /* perform retry steps ... */

    pthread_mutex_unlock(&mutex);
}
int
main(void)
{
    int             err, condition, arg;
    struct timespec when;

    if ((err = pthread_mutexattr_init(&attr)) != 0)
        err_exit(err, "pthread_mutexattr_init failed");
    if ((err = pthread_mutexattr_settype(&attr,
      PTHREAD_MUTEX_RECURSIVE)) != 0)
        err_exit(err, "can't set recursive type");
    if ((err = pthread_mutex_init(&mutex, &attr)) != 0)
        err_exit(err, "can't create recursive mutex");

    /* ... */

    pthread_mutex_lock(&mutex);

    /* ... */

    if (condition) {
        /* calculate target time "when" */
        timeout(&when, retry, (void *)arg);
    }

    /* ... */

    pthread_mutex_unlock(&mutex);

    /* ... */

    exit(0);
}
Reader-Writer Lock Attributes
Reader-writer locks also have attributes, similar to mutexes. We use pthread_rwlockattr_init
to initialize a pthread_rwlockattr_t structure and pthread_rwlockattr_destroy to deinitialize the
structure.
#include <pthread.h>
int pthread_rwlockattr_init(pthread_rwlockattr_t *attr);
int pthread_rwlockattr_destroy(pthread_rwlockattr_t *attr);
Both return: 0 if OK, error number on failure
The only attribute supported for reader-writer locks is the process-shared attribute. It is
identical to the mutex process-shared attribute. Just as with the mutex process-shared
attributes, a pair of functions is provided to get and set the process-shared attributes of
reader-writer locks.
#include <pthread.h>
int pthread_rwlockattr_getpshared(const pthread_rwlockattr_t *restrict attr,
                                  int *restrict pshared);
int pthread_rwlockattr_setpshared(pthread_rwlockattr_t *attr, int pshared);
Both return: 0 if OK, error number on failure
Although POSIX defines only one reader-writer lock attribute, implementations are free to
define additional, nonstandard ones.
Condition Variable Attributes
Condition variables have attributes, too. There is a pair of functions for initializing and
deinitializing them, similar to mutexes and reader-writer locks.
#include <pthread.h>
int pthread_condattr_init(pthread_condattr_t *attr);
int pthread_condattr_destroy(pthread_condattr_t *attr);
Both return: 0 if OK, error number on failure
Just as with the other synchronization primitives, condition variables support the
process-shared attribute.
#include <pthread.h>
int pthread_condattr_getpshared(const pthread_condattr_t *restrict attr,
                                int *restrict pshared);
int pthread_condattr_setpshared(pthread_condattr_t *attr, int pshared);
Both return: 0 if OK, error number on failure
12.5. Reentrancy
We discussed reentrant functions and signal handlers in Section 10.6. Threads are similar to
signal handlers when it comes to reentrancy. With both signal handlers and threads, multiple
threads of control can potentially call the same function at the same time.
If a function can be safely called by multiple threads at the same time, we say that the
function is thread-safe. All functions defined in the Single UNIX Specification are guaranteed
to be thread-safe, except those listed in Figure 12.9. In addition, the ctermid and tmpnam
functions are not guaranteed to be thread-safe if they are passed a null pointer. Similarly,
there is no guarantee that wcrtomb and wcsrtombs are thread-safe when they are passed a null
pointer for their mbstate_t argument.
Figure 12.9. Functions not guaranteed to be thread-safe by POSIX.1
asctime       ecvt              gethostent        getutxline   putc_unlocked
basename      encrypt           getlogin          gmtime       putchar_unlocked
catgets       endgrent          getnetbyaddr      hcreate      putenv
crypt         endpwent          getnetbyname      hdestroy     pututxline
ctime         endutxent         getnetent         hsearch      rand
dbm_clearerr  fcvt              getopt            inet_ntoa    readdir
dbm_close     ftw               getprotobyname    l64a         setenv
dbm_delete    gcvt              getprotobynumber  lgamma       setgrent
dbm_error     getc_unlocked     getprotoent       lgammaf      setkey
dbm_fetch     getchar_unlocked  getpwent          lgammal      setpwent
dbm_firstkey  getdate           getpwnam          localeconv   setutxent
dbm_nextkey   getenv            getpwuid          localtime    strerror
dbm_open      getgrent          getservbyname     lrand48      strtok
dbm_store     getgrgid          getservbyport     mrand48      ttyname
dirname       getgrnam          getservent        nftw         unsetenv
dlerror       gethostbyaddr     getutxent         nl_langinfo  wcstombs
drand48       gethostbyname     getutxid          ptsname      wctomb
Implementations that support thread-safe functions will define the
_POSIX_THREAD_SAFE_FUNCTIONS symbol in <unistd.h>. Applications can also use the
_SC_THREAD_SAFE_FUNCTIONS argument with sysconf to check for support of thread-safe
functions at runtime. All XSI-conforming implementations are required to support thread-safe
functions.
When it supports the thread-safe functions feature, an implementation provides alternate,
thread-safe versions of some of the POSIX.1 functions that aren't thread-safe. Figure 12.10
lists the thread-safe versions of these functions. Many functions are not thread-safe
because they return data stored in a static memory buffer. They are made thread-safe by
changing their interfaces to require that the caller provide its own buffer.
Figure 12.10. Alternate thread-safe functions
asctime_r     gmtime_r
ctime_r       localtime_r
getgrgid_r    rand_r
getgrnam_r    readdir_r
getlogin_r    strerror_r
getpwnam_r    strtok_r
getpwuid_r    ttyname_r
The functions listed in Figure 12.10 are named the same as their non-thread-safe relatives,
but with an _r appended at the end of the name, signifying that these versions are reentrant.
If a function is reentrant with respect to multiple threads, we say that it is thread-safe. This
doesn't tell us, however, whether the function is reentrant with respect to signal handlers. We
say that a function that is safe to be reentered from an asynchronous signal handler is
async-signal safe. We saw the async-signal safe functions in Figure 10.4 when we discussed
reentrant functions in Section 10.6.
In addition to the functions listed in Figure 12.10, POSIX.1 provides a way to manage FILE
objects in a thread-safe way. You can use flockfile and ftrylockfile to obtain a lock
associated with a given FILE object. This lock is recursive: you can acquire it again, while you
already hold it, without deadlocking. Although the exact implementation of the lock is
unspecified, it is required that all standard I/O routines that manipulate FILE objects behave
as if they call flockfile and funlockfile internally.
#include <stdio.h>
int ftrylockfile(FILE *fp);
Returns: 0 if OK, nonzero if lock can't be acquired
void flockfile(FILE *fp);
void funlockfile(FILE *fp);
Although the standard I/O routines might be implemented to be thread-safe from the
perspective of their own internal data structures, it is still useful to expose the locking to
applications. This allows applications to compose multiple calls to standard I/O functions into
atomic sequences. Of course, when dealing with multiple FILE objects, you need to beware of
potential deadlocks and to order your locks carefully.
If the standard I/O routines acquire their own locks, then we can run into serious performance
degradation when doing character-at-a-time I/O. In this situation, we end up acquiring and
releasing a lock for every character read or written. To avoid this overhead, unlocked versions
of the character-based standard I/O routines are available.
#include <stdio.h>
int getchar_unlocked(void);
int getc_unlocked(FILE *fp);
Both return: the next character if OK, EOF on end of file or error
int putchar_unlocked(int c);
int putc_unlocked(int c, FILE *fp);
Both return: c if OK, EOF on error
These four functions should not be called unless surrounded by calls to flockfile (or
ftrylockfile) and funlockfile. Otherwise, unpredictable results can occur (i.e., the types of
problems that result from unsynchronized access to data by multiple threads of control).
Once you lock the FILE object, you can make multiple calls to these functions before releasing
the lock. This amortizes the locking overhead across the amount of data read or written.
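A minimal sketch of that pattern follows (the function name put_line_atomically is hypothetical): flockfile is called once, so output from other threads can't be interleaved with the line, and putc_unlocked avoids relocking for every character.

```c
#include <stdio.h>

/*
 * Write a string followed by a newline as one atomic sequence.
 * Returns the number of characters written, or -1 on error.
 */
int
put_line_atomically(FILE *fp, const char *s)
{
    int n = 0;

    flockfile(fp);
    for (; *s != '\0'; s++) {
        if (putc_unlocked(*s, fp) == EOF)
            break;
        n++;
    }
    if (*s != '\0' || putc_unlocked('\n', fp) == EOF)
        n = -1;             /* a write failed part way through */
    else
        n++;                /* count the newline */
    funlockfile(fp);        /* always release the lock, even on error */
    return n;
}
```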
Example
Figure 12.11 shows a possible implementation of getenv (Section 7.9). This version is not
reentrant. If two threads call it at the same time, they will see inconsistent results, because
the string returned is stored in a single static buffer that is shared by all threads calling getenv.
We show a reentrant version of getenv in Figure 12.12. This version is called getenv_r. It uses
the pthread_once function (described in Section 12.6) to ensure that the thread_init function
is called only once per process.
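A small sketch of the pthread_once guarantee: several threads race to call pthread_once with the same control variable, and the initializer still runs exactly once. The names once_ctl, init_calls, caller, and race_once, and the limit of 16 threads, are hypothetical.

```c
#include <pthread.h>

#define MAXCALLERS 16           /* arbitrary cap for this sketch */

static pthread_once_t once_ctl = PTHREAD_ONCE_INIT;
static int init_calls;          /* how many times init_once actually ran */

static void
init_once(void)
{
    init_calls++;               /* pthread_once guarantees a single call */
}

static void *
caller(void *arg)
{
    (void)arg;
    pthread_once(&once_ctl, init_once);
    return NULL;
}

/* Start up to MAXCALLERS racing threads; return the initializer count. */
int
race_once(int n)
{
    pthread_t tids[MAXCALLERS];
    int i, started;

    if (n > MAXCALLERS)
        n = MAXCALLERS;
    for (i = 0; i < n; i++)
        if (pthread_create(&tids[i], NULL, caller, NULL) != 0)
            break;
    started = i;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);    /* makes init_calls visible here */
    return init_calls;
}
```

Calling race_once a second time still reports 1, because the control variable records that initialization has already happened.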
To make getenv_r reentrant, we changed the interface so that the caller must provide its own
buffer. Thus, each thread can use a different buffer to avoid interfering with the others. Note,
however, that this is not enough to make getenv_r thread-safe. To make getenv_r
thread-safe, we need to protect against changes to the environment while we are searching
for the requested string. We can use a mutex to serialize access to the environment list by
getenv_r and putenv.
We could have used a reader-writer lock to allow multiple concurrent calls to getenv_r, but the
added concurrency probably wouldn't improve the performance of our program by very much,
for two reasons. First, the environment list usually isn't very long, so we won't hold the mutex
for too long while we scan the list. Second, calls to getenv and putenv are infrequent, so if we
improve their performance, we won't affect the overall performance of the program very
much.
If we make getenv_r thread-safe, that doesn't mean that it is reentrant with respect to signal
handlers. If we use a nonrecursive mutex, we run the risk that a thread will deadlock itself if it
calls getenv_r from a signal handler. If the signal handler interrupts the thread while it is
executing getenv_r, we will already be holding env_mutex locked, so another attempt to lock it
will block, causing the thread to deadlock. Thus, we must use a recursive mutex to prevent
other threads from changing the data structures while we look at them, and also prevent
deadlocks from signal handlers. The problem is that the pthread functions are not guaranteed
to be async-signal safe, so we can't use them to make another function async-signal safe.
Figure 12.11. A nonreentrant version of getenv
#include <limits.h>
#include <string.h>

static char envbuf[ARG_MAX];

extern char **environ;

char *
getenv(const char *name)
{
    int i, len;

    len = strlen(name);
    for (i = 0; environ[i] != NULL; i++) {
        if ((strncmp(name, environ[i], len) == 0) &&
          (environ[i][len] == '=')) {
            strcpy(envbuf, &environ[i][len+1]);
            return(envbuf);
        }
    }
    return(NULL);
}
Figure 12.12. A reentrant (thread-safe) version of getenv
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
extern char **environ;
pthread_mutex_t env_mutex;
static pthread_once_t init_done = PTHREAD_ONCE_INIT;
Page 535
ABC Amber CHM Converter Trial version, http://www.processtext.com/abcchm.html
static void
thread_init(void)
{
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(&env_mutex, &attr);
pthread_mutexattr_destroy(&attr);
}
int
getenv_r(const char *name, char *buf, int buflen)
{
int i, len, olen;
pthread_once(&init_done, thread_init);
len = strlen(name);
pthread_mutex_lock(&env_mutex);
for (i = 0; environ[i] != NULL; i++) {
if ((strncmp(name, environ[i], len) == 0) &&
(environ[i][len] == '=')) {
olen = strlen(&environ[i][len+1]);
if (olen >= buflen) {
pthread_mutex_unlock(&env_mutex);
return(ENOSPC);
}
strcpy(buf, &environ[i][len+1]);
pthread_mutex_unlock(&env_mutex);
return(0);
}
}
pthread_mutex_unlock(&env_mutex);
return(ENOENT);
}
12.6. Thread-Specific Data
Thread-specific data, also known as thread-private data, is a mechanism for storing and
finding data associated with a particular thread. The reason we call the data thread-specific,
or thread-private, is that we'd like each thread to access its own separate copy of the data,
without worrying about synchronizing access with other threads.
Many people went to a lot of trouble designing a threads model that promotes sharing process
data and attributes. So why would anyone want to promote interfaces that prevent sharing in
this model? There are two reasons.
First, sometimes we need to maintain data on a per-thread basis. Since there is no guarantee
that thread IDs are small, sequential integers, we can't simply allocate an array of per-thread
data and use the thread ID as the index. Even if we could depend on small, sequential thread
IDs, we'd like a little extra protection so that one thread can't mess with another's data.
The second reason for thread-private data is to provide a mechanism for adapting
process-based interfaces to a multithreaded environment. An obvious example of this is errno.
Recall the discussion of errno in Section 1.7. Older interfaces (before the advent of threads)
defined errno as an integer accessible globally within the context of a process. System calls
and library routines set errno as a side effect of failing. To make it possible for threads to use
these same system calls and library routines, errno is redefined as thread-private data. Thus,
one thread making a call that sets errno doesn't affect the value of errno for the other
threads in the process.
Recall that all threads in a process have access to the entire address space of the process.
Other than using registers, there is no way for one thread to prevent another from accessing
its data. This is true even for thread-specific data. Even though the underlying implementation
doesn't prevent access, the functions provided to manage thread-specific data promote data
separation among threads.
Before allocating thread-specific data, we need to create a key to associate with the data.
The key will be used to gain access to the thread-specific data. We use pthread_key_create
to create a key.
#include <pthread.h>
int pthread_key_create(pthread_key_t *keyp, void (*destructor)(void *));
Returns: 0 if OK, error number on failure
The key created is stored in the memory location pointed to by keyp. The same key can be
used by all threads in the process, but each thread will associate a different thread-specific
data address with the key. When the key is created, the data address for each thread is set
to a null value.
In addition to creating a key, pthread_key_create associates an optional destructor function
with the key. When the thread exits, if the data address has been set to a non-null value, the
destructor function is called with the data address as the only argument. If destructor is null,
then no destructor function is associated with the key. When the thread exits normally, by
calling pthread_exit or by returning, the destructor is called. But if the thread calls exit, _exit,
_Exit, or abort, or otherwise exits abnormally, the destructor is not called.
Threads usually use malloc to allocate memory for their thread-specific data. The destructor
function usually frees the memory that was allocated. If the thread exited without freeing the
memory, then the memory would be lost: leaked by the process.
A thread can allocate multiple keys for thread-specific data. Each key can have a destructor
associated with it. There can be a different destructor function for each key, or they can all
use the same function. Each operating system implementation can place a limit on the number
of keys a process can allocate (recall PTHREAD_KEYS_MAX from Figure 12.1).
When a thread exits, the destructors for its thread-specific data are called in an
implementation-defined order. It is possible for the destructor function to call another function
that might create new thread-specific data and associate it with the key. After all destructors
are called, the system will check whether any non-null thread-specific values were associated
with the keys and, if so, call the destructors again. This process will repeat until either all
keys for the thread have null thread-specific data values or a maximum of
PTHREAD_DESTRUCTOR_ITERATIONS (Figure 12.1) attempts have been made.
We can break the association of a key with the thread-specific data values for all threads by
calling pthread_key_delete.
#include <pthread.h>
int pthread_key_delete(pthread_key_t key);
Returns: 0 if OK, error number on failure
Note that calling pthread_key_delete will not invoke the destructor function associated with
the key. To free any memory associated with the key's thread-specific data values, we need
to take additional steps in the application.
We need to ensure that a key we allocate doesn't change because of a race during
initialization. Code like the following can result in two threads both calling pthread_key_create:
void destructor(void *);

pthread_key_t key;
int init_done = 0;

void *
threadfunc(void *arg)
{
    int err;

    if (!init_done) {
        init_done = 1;
        err = pthread_key_create(&key, destructor);
    }
    ...
}
Depending on how the system schedules threads, some threads might see one key value,
whereas other threads might see a different value. The way to solve this race is to use
pthread_once.
#include <pthread.h>
pthread_once_t initflag = PTHREAD_ONCE_INIT;
int pthread_once(pthread_once_t *initflag, void (*initfn)(void));
Returns: 0 if OK, error number on failure
The initflag must be a nonlocal variable (i.e., global or static) and initialized to
PTHREAD_ONCE_INIT.
If each thread calls pthread_once, the system guarantees that the initialization routine, initfn,
will be called only once, on the first call to pthread_once. The proper way to create a key
without a race is as follows:
void destructor(void *);

pthread_key_t key;
pthread_once_t init_done = PTHREAD_ONCE_INIT;

void
thread_init(void)
{
    int err;

    err = pthread_key_create(&key, destructor);
}

void *
threadfunc(void *arg)
{
    pthread_once(&init_done, thread_init);
    ...
}
Once a key is created, we can associate thread-specific data with the key by calling
pthread_setspecific. We can obtain the address of the thread-specific data with
pthread_getspecific.
#include <pthread.h>
void *pthread_getspecific(pthread_key_t key);
Returns: thread-specific data value or NULL if no value
has been associated with the key
int pthread_setspecific(pthread_key_t key, const void *value);
Returns: 0 if OK, error number on failure
If no thread-specific data has been associated with a key, pthread_getspecific will return a
null pointer. We can use this to determine whether we need to call pthread_setspecific.
Example
In Figure 12.11, we showed a hypothetical implementation of getenv. We came up with a new
interface to provide the same functionality, but in a thread-safe way (Figure 12.12). But what
would happen if we couldn't modify our application programs to use the new interface? In that
case, we could use thread-specific data to maintain a per thread copy of the data buffer used
to hold the return string. This is shown in Figure 12.13.
We use pthread_once to ensure that only one key is created for the thread-specific data we
will use. If pthread_getspecific returns a null pointer, we need to allocate the memory buffer
and associate it with the key. Otherwise, we use the memory buffer returned by
pthread_getspecific. For the destructor function, we use free to free the memory previously
allocated by malloc. The destructor function will be called with the value of the
thread-specific data only if the value is non-null.
Note that although this version of getenv is thread-safe, it is not async-signal safe. Even if we
made the mutex recursive, we could not make it reentrant with respect to signal handlers,
because it calls malloc, which itself is not async-signal safe.
Figure 12.13. A thread-safe, compatible version of getenv
#include <limits.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t key;
static pthread_once_t init_done = PTHREAD_ONCE_INIT;
pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

extern char **environ;

static void
thread_init(void)
{
    pthread_key_create(&key, free);
}

char *
getenv(const char *name)
{
    int     i, len;
    char    *envbuf;

    pthread_once(&init_done, thread_init);
    pthread_mutex_lock(&env_mutex);
    envbuf = (char *)pthread_getspecific(key);
    if (envbuf == NULL) {
        envbuf = malloc(ARG_MAX);
        if (envbuf == NULL) {
            pthread_mutex_unlock(&env_mutex);
            return(NULL);
        }
        pthread_setspecific(key, envbuf);
    }
    len = strlen(name);
    for (i = 0; environ[i] != NULL; i++) {
        if ((strncmp(name, environ[i], len) == 0) &&
          (environ[i][len] == '=')) {
            strcpy(envbuf, &environ[i][len+1]);
            pthread_mutex_unlock(&env_mutex);
            return(envbuf);
        }
    }
    pthread_mutex_unlock(&env_mutex);
    return(NULL);
}
12.7. Cancel Options
Two thread attributes that are not included in the pthread_attr_t structure are the
cancelability state and the cancelability type. These attributes affect the behavior of a
thread in response to a call to pthread_cancel (Section 11.5).
The cancelability state attribute can be either PTHREAD_CANCEL_ENABLE or
PTHREAD_CANCEL_DISABLE. A thread can change its cancelability state by calling
pthread_setcancelstate.
#include <pthread.h>
int pthread_setcancelstate(int state, int *oldstate);
Returns: 0 if OK, error number on failure
In one atomic operation, pthread_setcancelstate sets the current cancelability state to state
and stores the previous cancelability state in the memory location pointed to by oldstate.
Recall from Section 11.5 that a call to pthread_cancel doesn't wait for a thread to terminate.
In the default case, a thread will continue to execute after a cancellation request is made,
until the thread reaches a cancellation point. A cancellation point is a place where the thread
checks to see whether it has been canceled, and then acts on the request. POSIX.1
guarantees that cancellation points will occur when a thread calls any of the functions listed
in Figure 12.14.
Figure 12.14. Cancellation points defined by POSIX.1
accept            mq_timedsend             putpmsg         sigsuspend
aio_suspend       msgrcv                   pwrite          sigtimedwait
clock_nanosleep   msgsnd                   read            sigwait
close             msync                    readv           sigwaitinfo
connect           nanosleep                recv            sleep
creat             open                     recvfrom        system
fcntl*            pause                    recvmsg         tcdrain
fsync             poll                     select          usleep
getmsg            pread                    sem_timedwait   wait
getpmsg           pthread_cond_timedwait   sem_wait        waitid
lockf             pthread_cond_wait        send            waitpid
mq_receive        pthread_join             sendmsg         write
mq_send           pthread_testcancel       sendto          writev
mq_timedreceive   putmsg                   sigpause
* When the cmd argument is F_SETLKW.
A thread starts with a default cancelability state of PTHREAD_CANCEL_ENABLE. When the state is
set to PTHREAD_CANCEL_DISABLE, a call to pthread_cancel will not kill the thread. Instead, the
cancellation request remains pending for the thread. When the state is enabled again, the
thread will act on any pending cancellation requests at the next cancellation point.
In addition to the functions listed in Figure 12.14, POSIX.1 specifies the functions listed in
Figure 12.15 as optional cancellation points.
Figure 12.15. Optional cancellation points defined by POSIX.1
catclose      ftell              getwc                                  printf
catgets       ftello             getwchar                               putc
catopen       ftw                getwd                                  putc_unlocked
closedir      fwprintf           glob                                   putchar
closelog      fwrite             iconv_close                            putchar_unlocked
ctermid       fwscanf            iconv_open                             puts
dbm_close     getc               ioctl                                  pututxline
dbm_delete    getc_unlocked      lseek                                  putwc
dbm_fetch     getchar            mkstemp                                putwchar
dbm_nextkey   getchar_unlocked   nftw                                   readdir
dbm_open      getcwd             opendir                                readdir_r
dbm_store     getdate            openlog                                remove
dlclose       getgrent           pclose                                 rename
dlopen        getgrgid           perror                                 rewind
endgrent      getgrgid_r         popen                                  rewinddir
endhostent    getgrnam           posix_fadvise                          scanf
endnetent     getgrnam_r         posix_fallocate                        seekdir
endprotoent   gethostbyaddr      posix_madvise                          semop
endpwent      gethostbyname      posix_spawn                            setgrent
endservent    gethostent         posix_spawnp                           sethostent
endutxent     gethostname        posix_trace_clear                      setnetent
fclose        getlogin           posix_trace_close                      setprotoent
fcntl         getlogin_r         posix_trace_create                     setpwent
fflush        getnetbyaddr       posix_trace_create_withlog             setservent
fgetc         getnetbyname       posix_trace_eventtypelist_getnext_id   setutxent
fgetpos       getnetent          posix_trace_eventtypelist_rewind       strerror
fgets         getprotobyname     posix_trace_flush                      syslog
fgetwc        getprotobynumber   posix_trace_get_attr                   tmpfile
fgetws        getprotoent        posix_trace_get_filter                 tmpnam
fopen         getpwent           posix_trace_get_status                 ttyname
fprintf       getpwnam           posix_trace_getnext_event              ttyname_r
fputc         getpwnam_r         posix_trace_open                       ungetc
fputs         getpwuid           posix_trace_rewind                     ungetwc
fputwc        getpwuid_r         posix_trace_set_filter                 unlink
fputws        gets               posix_trace_shutdown                   vfprintf
fread         getservbyname      posix_trace_timedgetnext_event         vfwprintf
freopen       getservbyport      posix_typed_mem_open                   vprintf
fscanf        getservent         pthread_rwlock_rdlock                  vwprintf
fseek         getutxent          pthread_rwlock_timedrdlock             wprintf
fseeko        getutxid           pthread_rwlock_timedwrlock             wscanf
fsetpos       getutxline         pthread_rwlock_wrlock
Note that several of the functions listed in Figure 12.15 are not discussed further in this text.
Many are optional in the Single UNIX Specification.
If your application doesn't call one of the functions in Figure 12.14 or Figure 12.15 for a long
period of time (if it is compute-bound, for example), then you can call pthread_testcancel to
add your own cancellation points to the program.
#include <pthread.h>
void pthread_testcancel(void);
When you call pthread_testcancel, if a cancellation request is pending and if cancellation has
not been disabled, the thread will be canceled. When cancellation is disabled, however, calls
to pthread_testcancel have no effect.
The default cancellation type we have been describing is known as deferred cancellation.
After a call to pthread_cancel, the actual cancellation doesn't occur until the thread hits a
cancellation point. We can change the cancellation type by calling pthread_setcanceltype.
#include <pthread.h>
int pthread_setcanceltype(int type, int *oldtype);
Returns: 0 if OK, error number on failure
The type parameter can be either PTHREAD_CANCEL_DEFERRED or PTHREAD_CANCEL_ASYNCHRONOUS.
The pthread_setcanceltype function sets the cancellation type to type and returns the
previous type in the integer pointed to by oldtype.
Asynchronous cancellation differs from deferred cancellation in that the thread can be
canceled at any time. The thread doesn't necessarily need to hit a cancellation point for it to
be canceled.
12.8. Threads and Signals
Dealing with signals can be complicated even with a process-based paradigm. Introducing
threads into the picture makes things even more complicated.
Each thread has its own signal mask, but the signal disposition is shared by all threads in the
process. This means that individual threads can block signals, but when a thread modifies the
action associated with a given signal, all threads share the action. Thus, if one thread
chooses to ignore a given signal, another thread can undo that choice by restoring the default
disposition or installing a signal handler for the signal.
Signals are delivered to a single thread in the process. If the signal is related to a hardware
fault or expiring timer, the signal is sent to the thread whose action caused the event. Other
signals, on the other hand, are delivered to an arbitrary thread.
In Section 10.12, we discussed how processes can use sigprocmask to block signals from
delivery. The behavior of sigprocmask is undefined in a multithreaded process. Threads have to
use pthread_sigmask instead.
#include <signal.h>
int pthread_sigmask(int how, const sigset_t *restrict set,
                    sigset_t *restrict oset);
Returns: 0 if OK, error number on failure
The pthread_sigmask function is identical to sigprocmask, except that pthread_sigmask works
with threads and returns an error code on failure instead of setting errno and returning -1.
A thread can wait for one or more signals to occur by calling sigwait.
#include <signal.h>
int sigwait(const sigset_t *restrict set, int *restrict signop);
Returns: 0 if OK, error number on failure
The set argument specifies the set of signals for which the thread is waiting. On return, the
integer to which signop points will contain the number of the signal that was delivered.
If one of the signals specified in the set is pending at the time sigwait is called, then sigwait
will return without blocking. Before returning, sigwait removes the signal from the set of
signals pending for the process. To avoid erroneous behavior, a thread must block the signals
it is waiting for before calling sigwait. The sigwait function will atomically unblock the signals
and wait until one is delivered. Before returning, sigwait will restore the thread's signal mask.
If the signals are not blocked at the time that sigwait is called, then a timing window is
opened up where one of the signals can be delivered to the thread before it completes its call
to sigwait.
The advantage to using sigwait is that it can simplify signal handling by allowing us to treat
asynchronously generated signals in a synchronous manner. We can prevent the signals from
interrupting the threads by adding them to each thread's signal mask. Then we can dedicate
specific threads to handling the signals. These dedicated threads can make function calls
without having to worry about which functions are safe to call from a signal handler, because
they are being called from normal thread context, not from a traditional signal handler
interrupting a normal thread's execution.
If multiple threads are blocked in calls to sigwait for the same signal, only one of the threads
will return from sigwait when the signal is delivered. If a signal is being caught (the process
has established a signal handler by using sigaction, for example) and a thread is waiting for
the same signal in a call to sigwait, it is left up to the implementation to decide which way to
deliver the signal. In this case, the implementation could either allow sigwait to return or
invoke the signal handler, but not both.
To send a signal to a process, we call kill (Section 10.9). To send a signal to a thread, we
call pthread_kill.
#include <signal.h>
int pthread_kill(pthread_t thread, int signo);
Returns: 0 if OK, error number on failure
We can pass a signo value of 0 to check for existence of the thread. If the default action for
a signal is to terminate the process, then sending the signal to a thread will still kill the entire
process.
Note that alarm timers are a process resource, and all threads share the same set of alarms.
Thus, it is not possible for multiple threads in a process to use alarm timers without interfering
(or cooperating) with one another (this is the subject of Exercise 12.6).
Example
Recall that in Figure 10.23, we waited for the signal handler to set a flag indicating that the
main program should exit. The only threads of control that could run were the main thread and
the signal handler, so blocking the signals was sufficient to avoid missing a change to the flag.
With threads, we need to use a mutex to protect the flag, as we show in the program in
Figure 12.16.
Instead of relying on a signal handler that interrupts the main thread of control, we dedicate a
separate thread of control to handle the signals. We change the value of quitflag under the
protection of a mutex so that the main thread of control can't miss the wake-up call made
when we call pthread_cond_signal. We use the same mutex in the main thread of control to
check the value of the flag, and atomically release the mutex and wait for the condition.
Note that we block SIGINT and SIGQUIT at the beginning of the main thread. When we create
the thread to handle signals, the thread inherits the current signal mask. Since sigwait will
unblock the signals, only one thread is available to receive signals. This enables us to code
the main thread without having to worry about interrupts from these signals.
If we run this program, we get output similar to that from Figure 10.23:
$ ./a.out
^?                       type the interrupt character
interrupt
^?                       type the interrupt character again
interrupt
^?                       and again
interrupt
^\ $                     now terminate with quit character
Figure 12.16. Synchronous signal handling
#include "apue.h"
#include <pthread.h>

int         quitflag;   /* set nonzero by thread */
sigset_t    mask;

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  wait = PTHREAD_COND_INITIALIZER;

void *
thr_fn(void *arg)
{
    int err, signo;

    for (;;) {
        err = sigwait(&mask, &signo);
        if (err != 0)
            err_exit(err, "sigwait failed");
        switch (signo) {
        case SIGINT:
            printf("\ninterrupt\n");
            break;

        case SIGQUIT:
            pthread_mutex_lock(&lock);
            quitflag = 1;
            pthread_mutex_unlock(&lock);
            pthread_cond_signal(&wait);
            return(0);

        default:
            printf("unexpected signal %d\n", signo);
            exit(1);
        }
    }
}

int
main(void)
{
    int         err;
    sigset_t    oldmask;
    pthread_t   tid;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGQUIT);
    if ((err = pthread_sigmask(SIG_BLOCK, &mask, &oldmask)) != 0)
        err_exit(err, "SIG_BLOCK error");

    err = pthread_create(&tid, NULL, thr_fn, 0);
    if (err != 0)
        err_exit(err, "can't create thread");

    pthread_mutex_lock(&lock);
    while (quitflag == 0)
        pthread_cond_wait(&wait, &lock);
    pthread_mutex_unlock(&lock);

    /* SIGQUIT has been caught and is now blocked; do whatever */
    quitflag = 0;

    /* reset signal mask which unblocks SIGQUIT */
    if ((err = pthread_sigmask(SIG_SETMASK, &oldmask, NULL)) != 0)
        err_exit(err, "SIG_SETMASK error");
    exit(0);
}
The LinuxThreads implementation used on older Linux systems implements threads as separate
processes, sharing resources using clone(2). Because of this, the behavior of threads on these
systems differs from that on other implementations when it comes to signals. In the POSIX.1
thread model, asynchronous signals are sent to a process, and then an individual thread within
the process is selected to receive the signal, based on which threads are not currently blocking
the signal. With LinuxThreads, an asynchronous signal is sent to a particular thread, and since
each thread executes as a separate process, the system is unable to select a thread that isn't
currently blocking the signal. The result is that the thread may not notice the signal. Thus,
programs like the one in Figure 12.16 work when the signal is generated from the terminal driver,
which signals the process group, but when you try to send a signal to the process using kill,
they don't work as expected on these systems. The NPTL implementation that replaced
LinuxThreads on Linux 2.6-based systems follows the POSIX.1 model.
12.9. Threads and fork
When a thread calls fork, a copy of the entire process address space is made for the child.
Recall the discussion of copy-on-write in Section 8.3. The child is an entirely different process
from the parent, and as long as neither one makes changes to its memory contents, copies of
the memory pages can be shared between parent and child.
By inheriting a copy of the address space, the child also inherits the state of every mutex,
reader-writer lock, and condition variable from the parent process. If the parent consists of
more than one thread, the child will need to clean up the lock state if it isn't going to call exec
immediately after fork returns.
Inside the child process, only one thread exists. It is made from a copy of the thread that
called fork in the parent. If the threads in the parent process hold any locks, the locks will
also be held in the child process. The problem is that the child process doesn't contain copies
of the threads holding the locks, so there is no way for the child to know which locks are held
and need to be unlocked.
This problem can be avoided if the child calls one of the exec functions directly after returning
from fork. In this case, the old address space is discarded, so the lock state doesn't matter.
This is not always possible, however, so if the child needs to continue processing, we need to
use a different strategy.
To clean up the lock state, we can establish fork handlers by calling the function
pthread_atfork.
#include <pthread.h>
int pthread_atfork(void (*prepare)(void), void (*parent)(void),
                   void (*child)(void));
Returns: 0 if OK, error number on failure
With pthread_atfork, we can install up to three functions to help clean up the locks. The
prepare fork handler is called in the parent before fork creates the child process. This fork
handler's job is to acquire all locks defined by the parent. The parent fork handler is called in
the context of the parent after fork has created the child process, but before fork has
returned. This fork handler's job is to unlock all the locks acquired by the prepare fork handler.
The child fork handler is called in the context of the child process before returning from fork.
Like the parent fork handler, the child fork handler too must release all the locks acquired by
the prepare fork handler.
Note that the locks are not locked once and unlocked twice, as it may appear. When the child
address space is created, it gets a copy of all locks that the parent defined. Because the
prepare fork handler acquired all the locks, the memory in the parent and the memory in the
child start out with identical contents. When the parent and the child unlock their "copy" of
the locks, new memory is allocated for the child, and the memory contents from the parent
are copied to the child's memory (copy-on-write), so we are left with a situation that looks as
if the parent locked all its copies of the locks and the child locked all its copies of the locks.
The parent and the child end up unlocking duplicate locks stored in different memory
locations, as if the following sequence of events occurred.
1. The parent acquired all its locks.
2. The child acquired all its locks.
3. The parent released its locks.
4. The child released its locks.
We can call pthread_atfork multiple times to install more than one set of fork handlers. If we
don't have a need to use one of the handlers, we can pass a null pointer for the particular
handler argument, and it will have no effect. When multiple fork handlers are used, the order
in which the handlers are called differs. The parent and child fork handlers are called in the
order in which they were registered, whereas the prepare fork handlers are called in the
opposite order from which they were registered. This allows multiple modules to register their
own fork handlers and still honor the locking hierarchy.
For example, assume that module A calls functions from module B and that each module has
its own set of locks. If the locking hierarchy is A before B, module B must install its fork
handlers before module A. When the parent calls fork, the following steps are taken, assuming
that the child process runs before the parent.
1. The prepare fork handler from module A is called to acquire all module A's locks.
2. The prepare fork handler from module B is called to acquire all module B's locks.
3. A child process is created.
4. The child fork handler from module B is called to release all module B's locks in the child
   process.
5. The child fork handler from module A is called to release all module A's locks in the child
   process.
6. The fork function returns to the child.
7. The parent fork handler from module B is called to release all module B's locks in the
   parent process.
8. The parent fork handler from module A is called to release all module A's locks in the
   parent process.
9. The fork function returns to the parent.
If the fork handlers serve to clean up the lock state, what cleans up the state of condition
variables? On some implementations, condition variables might not need any cleaning up.
However, an implementation that uses a lock as part of the implementation of condition
variables will require cleaning up. The problem is that no interface exists to allow us to do this.
If the lock is embedded in the condition variable data structure, then we can't use condition
variables after calling fork, because there is no portable way to clean up its state. On the
other hand, if an implementation uses a global lock to protect all condition variable data
structures in a process, then the implementation itself can clean up the lock in the fork library
routine. Application programs shouldn't rely on implementation details like this, however.
Example
The program in Figure 12.17 illustrates the use of pthread_atfork and fork handlers.
We define two mutexes, lock1 and lock2. The prepare fork handler acquires them both, the
child fork handler releases them in the context of the child process, and the parent fork
handler releases them in the context of the parent process.
When we run this program, we get the following output:
$ ./a.out
thread started...
parent about to fork...
preparing locks...
child unlocking locks...
child returned from fork
parent unlocking locks...
parent returned from fork
As we can see, the prepare fork handler runs after fork is called, the child fork handler runs
before fork returns in the child, and the parent fork handler runs before fork returns in the
parent.
Figure 12.17. pthread_atfork example
#include "apue.h"
#include <pthread.h>

pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

void
prepare(void)
{
    printf("preparing locks...\n");
    pthread_mutex_lock(&lock1);
    pthread_mutex_lock(&lock2);
}

void
parent(void)
{
    printf("parent unlocking locks...\n");
    pthread_mutex_unlock(&lock1);
    pthread_mutex_unlock(&lock2);
}

void
child(void)
{
    printf("child unlocking locks...\n");
    pthread_mutex_unlock(&lock1);
    pthread_mutex_unlock(&lock2);
}

void *
thr_fn(void *arg)
{
    printf("thread started...\n");
    pause();
    return(0);
}

int
main(void)
{
    int         err;
    pid_t       pid;
    pthread_t   tid;

#if defined(BSD) || defined(MACOS)
    printf("pthread_atfork is unsupported\n");
#else
    if ((err = pthread_atfork(prepare, parent, child)) != 0)
        err_exit(err, "can't install fork handlers");
    err = pthread_create(&tid, NULL, thr_fn, 0);
    if (err != 0)
        err_exit(err, "can't create thread");

    sleep(2);
    printf("parent about to fork...\n");
    if ((pid = fork()) < 0)
        err_quit("fork failed");
    else if (pid == 0)  /* child */
        printf("child returned from fork\n");
    else                /* parent */
        printf("parent returned from fork\n");
#endif
    exit(0);
}
12.10. Threads and I/O
We introduced the pread and pwrite functions in Section 3.11. These functions are helpful in a
multithreaded environment, because all threads in a process share the same file descriptors.
Consider two threads reading from or writing to the same file descriptor at the same time.
Thread A                        Thread B
lseek(fd, 300, SEEK_SET);       lseek(fd, 700, SEEK_SET);
read(fd, buf1, 100);            read(fd, buf2, 100);
If thread A executes the lseek and then thread B calls lseek before thread A calls read, then
both threads will end up reading the same record. Clearly, this isn't what was intended.
To solve this problem, we can use pread to make the setting of the offset and the reading of
the data one atomic operation.
Thread A                        Thread B
pread(fd, buf1, 100, 300);      pread(fd, buf2, 100, 700);
Using pread, we can ensure that thread A reads the record at offset 300, whereas thread B
reads the record at offset 700. We can use pwrite to solve the problem of concurrent threads
writing to the same file.
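To make the difference concrete, here is a minimal sketch, assuming a POSIX system with pthreads, in which two threads share one descriptor and each fetches its own 100-byte record with pread. The helpers make_record_file and read_two_records, and the layout of ten 100-byte records each filled with one letter, are illustrative assumptions of ours, not part of the text.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One reader thread's request: which record to read and where to put it. */
struct read_req {
    int     fd;
    off_t   offset;
    char    buf[100];
    ssize_t nread;
};

/*
 * Hypothetical setup helper: create path holding ten 100-byte records,
 * record n filled with the letter 'A' + n. Returns an open descriptor
 * (its offset left at 1000, the end of the file) or -1 on error.
 */
int
make_record_file(const char *path)
{
    char rec[100];
    int  fd, n;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0)
        return -1;
    for (n = 0; n < 10; n++) {
        memset(rec, 'A' + n, sizeof(rec));
        if (write(fd, rec, sizeof(rec)) != (ssize_t)sizeof(rec)) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/*
 * Thread start routine. pread takes its own offset, so this thread
 * neither uses nor changes the file offset shared through fd.
 */
static void *
reader(void *arg)
{
    struct read_req *req = arg;

    req->nread = pread(req->fd, req->buf, sizeof(req->buf), req->offset);
    return NULL;
}

/*
 * Hypothetical helper: read the records at offsets 300 and 700 of fd
 * from two concurrent threads. Returns 0 if both reads returned a
 * full 100-byte record, -1 otherwise.
 */
int
read_two_records(int fd, char rec300[100], char rec700[100])
{
    struct read_req a = { .fd = fd, .offset = 300 };
    struct read_req b = { .fd = fd, .offset = 700 };
    pthread_t       ta, tb;

    if (pthread_create(&ta, NULL, reader, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, reader, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    if (a.nread != 100 || b.nread != 100)
        return -1;
    memcpy(rec300, a.buf, 100);
    memcpy(rec700, b.buf, 100);
    return 0;
}
```

However the two threads are scheduled, the record at offset 300 comes back filled with 'D' and the one at offset 700 with 'H', and the descriptor's own offset is still 1000 afterward, because neither thread ever moved it. pwrite likewise takes an explicit offset, so the same pattern applies to concurrent writers.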
12.11. Summary
Threads provide an alternate model for partitioning concurrent tasks in UNIX systems. Threads
promote sharing among separate threads of control, but present unique synchronization
problems. In this chapter, we looked at how we can fine-tune our threads and their
synchronization primitives. We discussed reentrancy with threads. We also looked at how
threads interact with some of the process-oriented system calls.
Exercises
12.1  Run the program in Figure 12.17 on a Linux system, but redirect the output into
      a file. Explain the results.
12.2  Implement putenv_r, a reentrant version of putenv. Make sure that your
      implementation is async-signal safe as well as thread-safe.
12.3  Can you make the program in Figure 12.13 async-signal safe by blocking signals
      at the beginning of the function and restoring the previous signal mask before
      returning? Explain.
12.4  Write a program to exercise the version of getenv from Figure 12.13. Compile
      and run the program on FreeBSD. What happens? Explain.
12.5  Given that you can create multiple threads to perform different tasks within a
      program, explain why you might still need to use fork.
12.6  Reimplement the program in Figure 10.29 to make it thread-safe without using
      nanosleep.
12.7  After calling fork, could we safely reinitialize a condition variable in the child
      process by first destroying the condition variable with pthread_cond_destroy and
      then initializing it with pthread_cond_init?
Chapter 13. Daemon Processes
Section 13.1. Introduction
Section 13.2. Daemon Characteristics
Section 13.3. Coding Rules
Section 13.4. Error Logging
Section 13.5. Single-Instance Daemons
Section 13.6. Daemon Conventions
Section 13.7. Client-Server Model
Section 13.8. Summary
Exercises
13.1. Introduction
Daemons are processes that live for a long time. They are often started when the system is
bootstrapped and terminate only when the system is shut down. Because they don't have a
controlling terminal, we say that they run in the background. UNIX systems have numerous
daemons that perform day-to-day activities.
In this chapter, we look at the process structure of daemons and how to write a daemon.
Since a daemon does not have a controlling terminal, we need to see how a daemon can
report error conditions when something goes wrong.
For a discussion of the historical background of the term daemon as it applies to computer
systems, see Raymond [1996].
13.2. Daemon Characteristics
Let's look at some common system daemons and how they relate to the concepts of process
groups, controlling terminals, and sessions that we described in Chapter 9. The ps(1)
command prints the status of various processes in the system. There are a multitude of
options; consult your system's manual for all the details. We'll execute
ps -axj
under BSD-based systems to see the information we need for this discussion. The -a option
shows the status of processes owned by others, and -x shows processes that don't have a
controlling terminal. The -j option displays the job-related information: the session ID, process
group ID, controlling terminal, and terminal process group ID. Under System V-based systems, a
similar command is ps -efjc. (In an attempt to improve security, some UNIX systems don't
allow us to use ps to look at any processes other than our own.) The output from ps looks like this:
 PPID   PID  PGID   SID TTY TPGID UID COMMAND
    0     1     0     0 ?      -1   0 init
    1     2     1     1 ?      -1   0 [keventd]
    1     3     1     1 ?      -1   0 [kapmd]
    0     5     1     1 ?      -1   0 [kswapd]
    0     6     1     1 ?      -1   0 [bdflush]
    0     7     1     1 ?      -1   0 [kupdated]
    1  1009  1009  1009 ?      -1  32 portmap
    1  1048  1048  1048 ?      -1   0 syslogd -m 0
    1  1335  1335  1335 ?      -1   0 xinetd -pidfile /var/run/xinetd.pid
    1  1403     1     1 ?      -1   0 [nfsd]
    1  1405     1     1 ?      -1   0 [lockd]
 1405  1406     1     1 ?      -1   0 [rpciod]
    1  1853  1853  1853 ?      -1   0 crond
    1  2182  2182  2182 ?      -1   0 /usr/sbin/cupsd
We have removed a few columns that don't interest us, such as the accumulated CPU time.
The column headings, in order, are the parent process ID, process ID, process group ID,
session ID, terminal name, terminal process group ID (the foreground process group associated
with the controlling terminal), user ID, and command string.
The system that this ps command was run on (Linux) supports the notion of a session ID,
which we mentioned with the setsid function in Section 9.5. The session ID is simply the
process ID of the session leader. A BSD-based system, however, will print the address of the
session structure corresponding to the process group that the process belongs to (Section
9.11).
The system processes you see will depend on the operating system implementation. Anything
with a parent process ID of 0 is usually a kernel process started as part of the system
bootstrap procedure. (An exception to this is init, since it is a user-level command started by
the kernel at boot time.) Kernel processes are special and generally exist for the entire lifetime
of the system. They run with superuser privileges and have no controlling terminal and no
command line.
Process 1 is usually init, as we described in Section 8.2. It is a system daemon responsible
for, among other things, starting system services specific to various run levels. These services
are usually implemented with the help of their own daemons.
On Linux, the keventd daemon provides process context for running scheduled functions in the
kernel. The kapmd daemon provides support for the advanced power management features
available with various computer systems. The kswapd daemon is also known as the pageout
daemon. It supports the virtual memory subsystem by writing dirty pages to disk slowly over
time, so the pages can be reclaimed.
The Linux kernel flushes cached data to disk using two additional daemons: bdflush and
kupdated. The bdflush daemon flushes dirty buffers from the buffer cache back to disk when
available memory reaches a low-water mark. The kupdated daemon flushes dirty pages back to
disk at regular intervals to decrease data loss in the event of a system failure.
The portmapper daemon, portmap, provides the service of mapping RPC (Remote Procedure
Call) program numbers to network port numbers. The syslogd daemon is available to any
program to log system messages for an operator. The messages may be printed on a console
device and also written to a file. (We describe the syslog facility in Section 13.4.)
We talked about the inetd daemon (xinetd) in Section 9.3. It listens on the system's network
interfaces for incoming requests for various network servers. The nfsd, lockd, and rpciod
daemons provide support for the Network File System (NFS).
The cron daemon (crond) executes commands at specified dates and times. Numerous system
administration tasks are handled by having programs executed regularly by cron. The cupsd
daemon is a print spooler; it handles print requests on the system.
Note that most of the daemons run with superuser privilege (a user ID of 0). None of the
daemons has a controlling terminal: the terminal name is set to a question mark, and the
terminal foreground process group is -1. The kernel daemons are started without a controlling
terminal. The lack of a controlling terminal in the user-level daemons is probably the result of
the daemons having called setsid. All the user-level daemons are process group leaders and
session leaders and are the only processes in their process group and session. Finally, note
that the parent of most of these daemons is the init process.
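A program can observe the effect of setsid directly. The following is a minimal sketch, not taken from the text, that assumes opening /dev/tty fails for a process with no controlling terminal (as it does on Linux and BSD-derived systems). The hypothetical helper child_becomes_session_leader forks, calls setsid in the child, and checks that the child's session ID and process group ID both equal its own process ID.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical check: fork a child, have it call setsid, and report
 * whether it became a session leader and process group leader with
 * no controlling terminal. Returns 1 if so, 0 if not, -1 on error.
 */
int
child_becomes_session_leader(void)
{
    pid_t pid;
    int   status;

    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {                         /* child */
        int fd;

        /* A fresh child is never a process group leader, so this succeeds. */
        if (setsid() < 0)
            _exit(2);
        /* Session ID and process group ID are now the child's own PID. */
        if (getsid(0) != getpid() || getpgrp() != getpid())
            _exit(1);
        /* No controlling terminal: /dev/tty cannot be opened. */
        if ((fd = open("/dev/tty", O_RDWR)) >= 0) {
            close(fd);
            _exit(1);
        }
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

This mirrors what the ps output shows for the user-level daemons: PID, PGID, and SID are equal, and the TTY column is a question mark.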
13.3. Coding Rules
Some basic rules for coding a daemon prevent unwanted interactions from happening. We
state these rules and then show a function, daemonize, that implements them.
1.
The first thing to do is call umask to set the file mode creation mask to 0. The file mode
creation mask that's inherited could be set to deny certain permissions. If the daemon
process is going to create files, it may want to set specific permissions. For example, if
it specifically creates files with group-read and group-write enabled, a file mode
creation mask that turns off either of these permissions would undo its efforts.
2.
Call fork and have the parent exit. This does several things. First, if the daemon was
started as a simple shell command, having the parent terminate makes the shell think
that the command is done. Second, the child inherits the process group ID of the
parent but gets a new process ID, so we're guaranteed that the child is not a process
group leader. This is a prerequisite for the call to setsid that is done next.
3.
Call setsid to create a new session. The three steps listed in Section 9.5 occur. The
process (a) becomes a session leader of a new session, (b) becomes the process group
leader of a new process group, and (c) has no controlling terminal.
Under System V-based systems, some people recommend calling fork again at this point
and having the parent terminate. The second child continues as the daemon. This
guarantees that the daemon is not a session leader, which prevents it from acquiring a
controlling terminal under the System V rules (Section 9.6). Alternatively, to avoid
acquiring a controlling terminal, be sure to specify O_NOCTTY whenever opening a
terminal device.
4.
Change the current working directory to the root directory. The current working
directory inherited from the parent could be on a mounted file system. Since daemons
normally exist until the system is rebooted, if the daemon stays on a mounted file
system, that file system cannot be unmounted.
Alternatively, some daemons might change the current working directory to some
specific location, where they will do all their work. For example, line printer spooling
daemons often change to their spool directory.
5.
Unneeded file descriptors should be closed. This prevents the daemon from holding
open any descriptors that it may have inherited from its parent (which could be a shell
or some other process). We can use our open_max function (Figure 2.16) or the
getrlimit function (Section 7.11) to determine the highest descriptor and close all
descriptors up to that value.
6.
Some daemons open file descriptors 0, 1, and 2 to /dev/null so that any library
routines that try to read from standard input or write to standard output or standard
error will have no effect. Since the daemon is not associated with a terminal device,
there is nowhere for output to be displayed; nor is there anywhere to receive input
from an interactive user. Even if the daemon was started from an interactive session,
the daemon runs in the background, and the login session can terminate without
affecting the daemon. If other users log in on the same terminal device, we wouldn't
want output from the daemon showing up on the terminal, and the users wouldn't
expect their input to be read by the daemon.
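The interaction described in rule 1 can be checked with a small experiment. This is a sketch under stated assumptions: the helper mode_after_umask is hypothetical, it creates a scratch file in a temporary directory under /tmp, and it assumes no default ACLs on that directory alter the resulting mode.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file with the requested mode
 * while `mask` is the file mode creation mask, and return the permission
 * bits the file actually received (or -1 on error). The caller's umask
 * is restored, and the scratch file and directory are removed.
 */
int
mode_after_umask(mode_t mask, mode_t requested)
{
    char        dir[] = "/tmp/umaskXXXXXX";
    char        path[sizeof(dir) + 8];
    struct stat sb;
    mode_t      old;
    int         fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/file", dir);

    old = umask(mask);          /* bits set in mask are cleared on create */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, requested);
    umask(old);

    if (fd >= 0) {
        close(fd);
        if (stat(path, &sb) == 0)
            result = (int)(sb.st_mode & 0777);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

With an inherited mask of 027, mode_after_umask(027, 0660) yields 0640: the group-write bit the daemon asked for is silently dropped. After umask(0), as in rule 1, the file keeps exactly 0660.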
Example
Figure 13.1 shows a function that can be called from a program that wants to initialize itself
as a daemon.
If the daemonize function is called from a main program that then goes to sleep, we can check
the status of the daemon with the ps command:
$ ./a.out
$ ps -axj
 PPID   PID  PGID   SID TTY TPGID UID COMMAND
    1  3346  3345  3345 ?      -1 501 ./a.out
$ ps -axj | grep 3345
    1  3346  3345  3345 ?      -1 501 ./a.out
We can also use ps to verify that no active process exists with ID 3345. This means that our
daemon is in an orphaned process group (Section 9.10) and is not a session leader and thus
has no chance of allocating a controlling terminal. This is a result of performing the second
fork in the daemonize function. We can see that our daemon has been initialized correctly.
Figure 13.1. Initialize a daemon process
#include "apue.h"
#include <syslog.h>
#include <fcntl.h>
#include <sys/resource.h>

void
daemonize(const char *cmd)
{
    int                 i, fd0, fd1, fd2;
    pid_t               pid;
    struct rlimit       rl;
    struct sigaction    sa;

    /*
     * Clear file creation mask.
     */
    umask(0);

    /*
     * Get maximum number of file descriptors.
     */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        err_quit("%s: can't get file limit", cmd);

    /*
     * Become a session leader to lose controlling TTY.
     */
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);
    setsid();

    /*
     * Ensure future opens won't allocate controlling TTYs.
     */
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        err_quit("%s: can't ignore SIGHUP", cmd);
    if ((pid = fork()) < 0)
        err_quit("%s: can't fork", cmd);
    else if (pid != 0) /* parent */
        exit(0);

    /*
     * Change the current working directory to the root so
     * we won't prevent file systems from being unmounted.
     */
    if (chdir("/") < 0)
        err_quit("%s: can't change directory to /", cmd);

    /*
     * Close all open file descriptors.
     */
    if (rl.rlim_max == RLIM_INFINITY)
        rl.rlim_max = 1024;
    for (i = 0; i < rl.rlim_max; i++)
        close(i);

    /*
     * Attach file descriptors 0, 1, and 2 to /dev/null.
     */
    fd0 = open("/dev/null", O_RDWR);
    fd1 = dup(0);
    fd2 = dup(0);

    /*
     * Initialize the log file.
     */
    openlog(cmd, LOG_CONS, LOG_DAEMON);
    if (fd0 != 0 || fd1 != 1 || fd2 != 2) {
        syslog(LOG_ERR, "unexpected file descriptors %d %d %d",
          fd0, fd1, fd2);
        exit(1);
    }
}
13.4. Error Logging
One problem a daemon has is how to handle error messages. It can't simply write to standard
error, since it shouldn't have a controlling terminal. We don't want all the daemons writing to
the console device, since on many workstations, the console device runs a windowing system.
We also don't want each daemon writing its own error messages into a separate file. It would
be a headache for anyone administering the system to keep up with which daemon writes to
which log file and to check these files on a regular basis. A central daemon error-logging
facility is required.
The BSD syslog facility was developed at Berkeley and used widely in 4.2BSD. Most systems
derived from BSD support syslog.
Until SVR4, System V never had a central daemon logging facility.
The syslog function is included as an XSI extension in the Single UNIX Specification.
The BSD syslog facility has been widely used since 4.2BSD. Most daemons use this facility.
Figure 13.2 illustrates its structure.
Figure 13.2. The BSD syslog facility
There are three ways to generate log messages:
1.
Kernel routines can call the log function. These messages can be read by any user
process that opens and reads the /dev/klog device. We won't describe this function
any further, since we're not interested in writing kernel routines.
2.
Most user processes (daemons) call the syslog(3) function to generate log messages.
We describe its calling sequence later. This causes the message to be sent to the
UNIX domain datagram socket /dev/log.
3.
A user process on this host, or on some other host that is connected to this host by a
TCP/IP network, can send log messages to UDP port 514. Note that the syslog
function never generates these UDP datagrams: they require explicit network
programming by the process generating the log message.
Refer to Stevens, Fenner, and Rudoff [2004] for details on UNIX domain sockets and UDP
sockets.
Normally, the syslogd daemon reads all three forms of log messages. On start-up, this daemon
reads a configuration file, usually /etc/syslog.conf, which determines where different classes
of messages are to be sent. For example, urgent messages can be sent to the system
administrator (if logged in) and printed on the console, whereas warnings may be logged to a
file.
Our interface to this facility is through the syslog function.
#include <syslog.h>
void openlog(const char *ident, int option, int facility);
void syslog(int priority, const char *format, ...);
void closelog(void);
int setlogmask(int maskpri);
        Returns: previous log priority mask value
Calling openlog is optional. If it's not called, the first time syslog is called, openlog is called
automatically. Calling closelog is also optional; it just closes the descriptor that was being used
to communicate with the syslogd daemon.
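A typical calling sequence might look like the following sketch. The ident "mydaemon" and the function name log_startup are placeholders of ours. The mask is built with LOG_UPTO, a macro that Linux and BSD-derived systems provide in <syslog.h> but that the Single UNIX Specification does not define; the portable alternative is to OR together LOG_MASK values for each level to keep.

```c
#include <syslog.h>

/*
 * Sketch of a typical sequence. Returns the log mask that was in effect
 * before this call changed it (setlogmask's return value).
 */
int
log_startup(void)
{
    int oldmask;

    /* Tag messages with "mydaemon" and our PID; fall back to the console. */
    openlog("mydaemon", LOG_PID | LOG_CONS, LOG_DAEMON);

    /* Keep LOG_EMERG through LOG_NOTICE; LOG_INFO and LOG_DEBUG are dropped. */
    oldmask = setlogmask(LOG_UPTO(LOG_NOTICE));

    syslog(LOG_NOTICE, "starting up");          /* passes the mask */
    syslog(LOG_DEBUG, "not delivered");         /* masked off in this process */

    closelog();     /* optional: closes the descriptor to syslogd */
    return oldmask;
}
```

Masked-off messages are discarded by syslog in the calling process, before anything is sent to syslogd. Calling setlogmask(0) returns the current mask without changing it.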
Calling openlog lets us specify an ident that is added to each log message. This is normally the
name of the program (cron, inetd, etc.). The option argument is a bitmask specifying various
options. Figure 13.3 describes the available options, including a bullet in the XSI column if the
option is included in the openlog definition in the Single UNIX Specification.
Figure 13.3. The option argument for openlog
option      XSI  Description
LOG_CONS     •   If the log message can't be sent to syslogd via the UNIX domain
                 datagram, the message is written to the console instead.
LOG_NDELAY   •   Open the UNIX domain datagram socket to the syslogd daemon
                 immediately; don't wait until the first message is logged. Normally,
                 the socket is not opened until the first message is logged.
LOG_NOWAIT   •   Do not wait for child processes that might have been created in the
                 process of logging the message. This prevents conflicts with
                 applications that catch SIGCHLD, since the application might have
                 retrieved the child's status by the time that syslog calls wait.
LOG_ODELAY   •   Delay the open of the connection to the syslogd daemon until the
                 first message is logged.
LOG_PERROR       Write the log message to standard error in addition to sending it
                 to syslogd. (Unavailable on Solaris.)
LOG_PID      •   Log the process ID with each message. This is intended for daemons
                 that fork a child process to handle different requests (as compared
                 to daemons, such as syslogd, that never call fork).
The facility argument for openlog is taken from Figure 13.4. Note that the Single UNIX
Specification defines only a subset of the facility codes typically available on a given platform.
The reason for the facility argument is to let the configuration file specify that messages from
different facilities are to be handled differently. If we don't call openlog, or if we call it with a
facility of 0, we can still specify the facility as part of the priority argument to syslog.
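For instance, a program that never calls openlog could tag a single message with a facility by ORing it into the priority. In this sketch the function name report_disk_usage, the choice of LOG_LOCAL3, and the message text are arbitrary illustrations; where local3 messages end up is decided by the site's syslog configuration.

```c
#include <syslog.h>

/*
 * Sketch: the facility (LOG_LOCAL3) and the level (LOG_WARNING) are
 * combined in the priority argument of a single syslog call.
 * Returns the combined priority value that was passed to syslog.
 */
int
report_disk_usage(const char *dev, int pct_full)
{
    int priority = LOG_LOCAL3 | LOG_WARNING;

    syslog(priority, "%s is %d%% full", dev, pct_full);
    return priority;
}
```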
Figure 13.4. The facility argument for openlog
facility      XSI  Description
LOG_AUTH           authorization programs: login, su, getty, ...
LOG_AUTHPRIV       same as LOG_AUTH, but logged to file with restricted permissions
LOG_CRON           cron and at
LOG_DAEMON         system daemons: inetd, routed, ...
LOG_FTP            the FTP daemon (ftpd)
LOG_KERN           messages generated by the kernel
LOG_LOCAL0     •   reserved for local use
LOG_LOCAL1     •   reserved for local use
LOG_LOCAL2     •   reserved for local use
LOG_LOCAL3     •   reserved for local use
LOG_LOCAL4     •   reserved for local use
LOG_LOCAL5     •   reserved for local use
LOG_LOCAL6     •   reserved for local use
LOG_LOCAL7     •   reserved for local use
LOG_LPR            line printer system: lpd, lpc, ...
LOG_MA