Paul Sheer
August 14, 2001
Pages up to and including this page are not included by Prentice Hall.
“The reason we don’t sell billions and billions of Guides,” continued Harl, after wiping his mouth, “is the expense. What we do is we sell one Guide billions and billions of times. We exploit the multidimensional nature of the Universe to cut down on manufacturing costs. And we don’t sell to penniless hitchhikers. What a stupid notion that was! Find the one section of the market that, more or less by definition, doesn’t have any money, and try to sell to it. No. We sell to the affluent business traveler and his vacationing wife in a billion, billion different futures. This is the most radical, dynamic and thrusting business venture in the entire multidimensional infinity of space-time-probability ever.”
. . .
Ford was completely at a loss for what to do next.
“Look,” he said in a stern voice. But he wasn’t certain how far saying things like “Look” in a stern voice was necessarily going to get him, and time was not on his side. What the hell, he thought, you’re only young once, and threw himself out of the window. That would at least keep the element of surprise on his side.
. . .
In a spirit of scientific inquiry he hurled himself out of the window again.
Douglas Adams
Mostly Harmless
Strangely, the thing that least intrigued me was how they’d managed to get it all done. I suppose I sort of knew. If I’d learned one thing from traveling, it was that the way to get things done was to go ahead and do them. Don’t talk about going to Borneo. Book a ticket, get a visa, pack a bag, and it just happens.
Alex Garland
The Beach
Chapter Summary

1 Introduction ..... 1
2 Computing Sub-basics ..... 5
3 PC Hardware ..... 15
4 Basic Commands ..... 25
5 Regular Expressions ..... 49
6 Editing Text Files ..... 53
7 Shell Scripting ..... 61
8 Streams and sed — The Stream Editor ..... 73
9 Processes, Environment Variables ..... 81
10 Mail ..... 97
11 User Accounts and Ownerships ..... 101
12 Using Internet Services ..... 111
13 LINUX Resources ..... 117
14 Permission and Modification Times ..... 123
15 Symbolic and Hard Links ..... 127
16 Pre-installed Documentation ..... 131
17 Overview of the UNIX Directory Layout ..... 135
18 UNIX Devices ..... 141
19 Partitions, File Systems, Formatting, Mounting ..... 153
20 Advanced Shell Scripting ..... 171
21 System Services and lpd ..... 193
22 Trivial Introduction to C ..... 207
23 Shared Libraries ..... 233
24 Source and Binary Packages ..... 237
25 Introduction to IP ..... 247
26 TCP and UDP ..... 263
27 DNS and Name Resolution ..... 273
28 Network File System, NFS ..... 285
29 Services Running Under inetd ..... 291
30 exim and sendmail ..... 299
31 lilo, initrd, and Booting ..... 317
32 init, ?getty, and UNIX Run Levels ..... 325
33 Sending Faxes ..... 333
34 uucp and uux ..... 337
35 The LINUX File System Standard ..... 347
36 httpd — Apache Web Server ..... 389
37 crond and atd ..... 409
38 postgres SQL Server ..... 413
39 smbd — Samba NT Server ..... 425
40 named — Domain Name Server ..... 437
41 Point-to-Point Protocol — Dialup Networking ..... 453
42 The LINUX Kernel Source, Modules, and Hardware Support ..... 463
43 The X Window System ..... 485
44 UNIX Security ..... 511
A Lecture Schedule ..... 525
B LPI Certification Cross-Reference ..... 531
C RHCE Certification Cross-Reference ..... 543
D LINUX Advocacy FAQ ..... 551
E The GNU General Public License Version 2 ..... 573
Index ..... 581
Contents

Acknowledgments ..... xxxi

1 Introduction ..... 1
    1.1 What This Book Covers
    1.2 Read This Next. . .
    1.3 What Do I Need to Get Started?
    1.4 More About This Book
    1.5 I Get Frustrated with UNIX Documentation That I Don't Understand
    1.6 LPI and RHCE Requirements
    1.7 Not RedHat: RedHat-like
    1.8 Updates and Errata

2 Computing Sub-basics ..... 5
    2.1 Binary, Octal, Decimal, and Hexadecimal
    2.2 Files
    2.3 Commands
    2.4 Login and Password Change
    2.5 Listing Files
    2.6 Command-Line Editing Keys
    2.7 Console Keys
    2.8 Creating Files
    2.9 Allowable Characters for File Names
    2.10 Directories

3 PC Hardware ..... 15
    3.1 Motherboard
    3.2 Master/Slave IDE
    3.3 CMOS
    3.4 Serial Devices
    3.5 Modems

4 Basic Commands ..... 25
    4.1 The ls Command, Hidden Files, Command-Line Options
    4.2 Error Messages
    4.3 Wildcards, Names, Extensions, and glob Expressions
        4.3.1 File naming
        4.3.2 Glob expressions
    4.4 Usage Summaries and the Copy Command
    4.5 Directory Manipulation
    4.6 Relative vs. Absolute Pathnames
    4.7 System Manual Pages
    4.8 System info Pages
    4.9 Some Basic Commands
    4.10 The mc File Manager
    4.11 Multimedia Commands for Fun
    4.12 Terminating Commands
    4.13 Compressed Files
    4.14 Searching for Files
    4.15 Searching Within Files
    4.16 Copying to MS-DOS and Windows Formatted Floppy Disks
    4.17 Archives and Backups
    4.18 The PATH Where Commands Are Searched For
    4.19 The -- Option

5 Regular Expressions ..... 49
    5.1 Overview
    5.2 The fgrep Command
    5.3 Regular Expression \{ \} Notation
    5.4 + ? \< \> ( ) | Notation
    5.5 Regular Expression Subexpressions

6 Editing Text Files ..... 53
    6.1 vi
    6.2 Syntax Highlighting
    6.3 Editors
        6.3.1 Cooledit
        6.3.2 vi and vim
        6.3.3 Emacs
        6.3.4 Other editors

7 Shell Scripting ..... 61
    7.1 Introduction
    7.2 Looping: the while and until Statements
    7.3 Looping: the for Statement
    7.4 breaking Out of Loops and continueing
    7.5 Looping Over Glob Expressions
    7.6 The case Statement
    7.7 Using Functions: the function Keyword
    7.8 Properly Processing Command-Line Args: shift
    7.9 More on Command-Line Arguments: $@ and $0
    7.10 Single Forward Quote Notation
    7.11 Double-Quote Notation
    7.12 Backward-Quote Substitution

8 Streams and sed — The Stream Editor ..... 73
    8.1 Introduction
    8.2 Tutorial
    8.3 Piping Using | Notation
    8.4 A Complex Piping Example
    8.5 Redirecting Streams with >&
    8.6 Using sed to Edit Streams
    8.7 Regular Expression Subexpressions
    8.8 Inserting and Deleting Lines

9 Processes, Environment Variables ..... 81
    9.1 Introduction
    9.2 ps — List Running Processes
    9.3 Controlling Jobs
    9.4 Creating Background Processes
    9.5 killing a Process, Sending Signals
    9.6 List of Common Signals
    9.7 Niceness of Processes, Scheduling Priority
    9.8 Process CPU/Memory Consumption, top
    9.9 Environments of Processes

10 Mail ..... 97
    10.1 Sending and Reading Mail
    10.2 The SMTP Protocol — Sending Mail Raw to Port 25

11 User Accounts and Ownerships ..... 101
    11.1 File Ownerships
    11.2 The Password File /etc/passwd
    11.3 Shadow Password File: /etc/shadow
    11.4 The groups Command and /etc/group
    11.5 Manually Creating a User Account
    11.6 Automatically: useradd and groupadd
    11.7 User Logins
        11.7.1 The login command
        11.7.2 The set user, su command
        11.7.3 The who, w, and users commands to see who is logged in
        11.7.4 The id command and effective UID
        11.7.5 User limits

12 Using Internet Services ..... 111
    12.1 ssh, not telnet or rlogin
    12.2 rcp and scp
    12.3 rsh
    12.4 FTP
    12.5 finger
    12.6 Sending Files by Email
        12.6.1 uuencode and uudecode
        12.6.2 MIME encapsulation

13 LINUX Resources ..... 117
    13.1 FTP Sites and the sunsite Mirror
    13.2 HTTP — Web Sites
    13.3 SourceForge
    13.4 Mailing Lists
        13.4.1 Majordomo and Listserv
        13.4.2 *-request
    13.5 Newsgroups
    13.6 RFCs

14 Permission and Modification Times ..... 123
    14.1 The chmod Command
    14.2 The umask Command
    14.3 Modification Times: stat

15 Symbolic and Hard Links ..... 127
    15.1 Soft Links
    15.2 Hard Links

16 Pre-installed Documentation ..... 131

17 Overview of the UNIX Directory Layout ..... 135
    17.1 Packages
    17.2 UNIX Directory Superstructure
    17.3 LINUX on a Single Floppy Disk

18 UNIX Devices ..... 141
    18.1 Device Files
    18.2 Block and Character Devices
    18.3 Major and Minor Device Numbers
    18.4 Common Device Names
    18.5 dd, tar, and Tricks with Block Devices
        18.5.1 Creating boot disks from boot images
        18.5.2 Erasing disks
        18.5.3 Identifying data on raw disks
        18.5.4 Duplicating a disk
        18.5.5 Backing up to floppies
        18.5.6 Tape backups
        18.5.7 Hiding program output, creating blocks of zeros
    18.6 Creating Devices with mknod and /dev/MAKEDEV

19 Partitions, File Systems, Formatting, Mounting ..... 153
    19.1 The Physical Disk Structure
        19.1.1 Cylinders, heads, and sectors
        19.1.2 Large Block Addressing
        19.1.3 Extended partitions
    19.2 Partitioning a New Disk
    19.3 Formatting Devices
        19.3.1 File systems
        19.3.2 mke2fs
        19.3.3 Formatting floppies and removable drives
        19.3.4 Creating MS-DOS floppies
        19.3.5 mkswap, swapon, and swapoff
    19.4 Device Mounting
        19.4.1 Mounting CD-ROMs
        19.4.2 Mounting floppy disks
        19.4.3 Mounting Windows and NT partitions
    19.5 File System Repair: fsck
    19.6 File System Errors on Boot
    19.7 Automatic Mounts: fstab
    19.8 Manually Mounting /proc
    19.9 RAM and Loopback Devices
        19.9.1 Formatting a floppy inside a file
        19.9.2 CD-ROM files
    19.10 Remounting
    19.11 Disk sync

20 Advanced Shell Scripting ..... 171
    20.1 Lists of Commands
    20.2 Special Parameters: $?, $*, . . .
    20.3 Expansion
    20.4 Built-in Commands
    20.5 Trapping Signals — the trap Command
    20.6 Internal Settings — the set Command
    20.7 Useful Scripts and Commands
        20.7.1 chroot
        20.7.2 if conditionals
        20.7.3 patching and diffing
        20.7.4 Internet connectivity test
        20.7.5 Recursive grep (search)
        20.7.6 Recursive search and replace
        20.7.7 cut and awk — manipulating text file fields
        20.7.8 Calculations with bc
        20.7.9 Conversion of graphics formats of many files
        20.7.10 Securely erasing files
        20.7.11 Persistent background processes
        20.7.12 Processing the process list
    20.8 Shell Initialization
        20.8.1 Customizing the PATH and LD_LIBRARY_PATH
    20.9 File Locking
        20.9.1 Locking a mailbox file
        20.9.2 Locking over NFS
        20.9.3 Directory versus file locking
        20.9.4 Locking inside C programs

21 System Services and lpd ..... 193
    21.1 Using lpr
    21.2 Downloading and Installing
    21.3 LPRng vs. Legacy lpr-0.nn
    21.4 Package Elements
        21.4.1 Documentation files
        21.4.2 Web pages, mailing lists, and download points
        21.4.3 User programs
        21.4.4 Daemon and administrator programs
        21.4.5 Configuration files
        21.4.6 Service initialization files
        21.4.7 Spool files
        21.4.8 Log files
        21.4.9 Log file rotation
        21.4.10 Environment variables
    21.5 The printcap File in Detail
    21.6 PostScript and the Print Filter
    21.7 Access Control
    21.8 Printing Troubleshooting
    21.9 Useful Programs
        21.9.1 printtool
        21.9.2 apsfilter
        21.9.3 mpage
        21.9.4 psutils
    21.10 Printing to Things Besides Printers

22 Trivial Introduction to C ..... 207
    22.1 C Fundamentals
        22.1.1 The simplest C program
        22.1.2 Variables and types
        22.1.3 Functions
        22.1.4 for, while, if, and switch statements
        22.1.5 Strings, arrays, and memory allocation
        22.1.6 String operations
        22.1.7 File operations
        22.1.8 Reading command-line arguments inside C programs
        22.1.9 A more complicated example
        22.1.10 #include statements and prototypes
        22.1.11 C comments
        22.1.12 #define and #if — C macros
    22.2 Debugging with gdb and strace
        22.2.1 gdb
        22.2.2 Examining core files
        22.2.3 strace
    22.3 C Libraries
    22.4 C Projects — Makefiles
        22.4.1 Completing our example Makefile
        22.4.2 Putting it all together

23 Shared Libraries ..... 233
    23.1 Creating DLL .so Files
    23.2 DLL Versioning
    23.3 Installing DLL .so Files

24 Source and Binary Packages ..... 237
    24.1 Building GNU Source Packages
    24.2 RedHat and Debian Binary Packages
        24.2.1 Package versioning
        24.2.2 Installing, upgrading, and deleting
        24.2.3 Dependencies
        24.2.4 Package queries
        24.2.5 File lists and file queries
        24.2.6 Package verification
        24.2.7 Special queries
        24.2.8 dpkg/apt versus rpm
    24.3 Source Packages

25 Introduction to IP ..... 247
    25.1 Internet Communication
    25.2 Special IP Addresses
    25.3 Network Masks and Addresses
    25.4 Computers on a LAN
    25.5 Configuring Interfaces
    25.6 Configuring Routing
    25.7 Configuring Startup Scripts
        25.7.1 RedHat networking scripts
        25.7.2 Debian networking scripts
    25.8 Complex Routing — a Many-Hop Example
    25.9 Interface Aliasing — Many IPs on One Physical Card
    25.10 Diagnostic Utilities
        25.10.1 ping
        25.10.2 traceroute
        25.10.3 tcpdump

26 TCP and UDP ..... 263
    26.1 The TCP Header
    26.2 A Sample TCP Session
    26.3 User Datagram Protocol (UDP)
    26.4 /etc/services File
    26.5 Encrypting and Forwarding TCP

27 DNS and Name Resolution ..... 273
    27.1 Top-Level Domains (TLDs)
    27.2 Resolving DNS Names to IP Addresses
        27.2.1 The Internet DNS infrastructure
        27.2.2 The name resolution process
    27.3 Configuring Your Local Machine
    27.4 Reverse Lookups
    27.5 Authoritative for a Domain
    27.6 The host, ping, and whois Command
    27.7 The nslookup Command
        27.7.1 NS, MX, PTR, A and CNAME records
    27.8 The dig Command

28 Network File System, NFS ..... 285
    28.1 Software
    28.2 Configuration Example
    28.3 Access Permissions
    28.4 Security
    28.5 Kernel NFS

29 Services Running Under inetd ..... 291
    29.1 The inetd Package
    29.2 Invoking Services with /etc/inetd.conf
        29.2.1 Invoking a standalone service
        29.2.2 Invoking an inetd service
        29.2.3 Invoking an inetd “TCP wrapper” service
        29.2.4 Distribution conventions
    29.3 Various Service Explanations
    29.4 The xinetd Alternative
    29.5 Configuration Files
        29.5.1 Limiting access
    29.6 Security

30 exim and sendmail ..... 299
    30.1 Introduction
        30.1.1 How mail works
        30.1.2 Configuring a POP/IMAP server
        30.1.3 Why exim?
    30.2 exim Package Contents
    30.3 exim Configuration File
        30.3.1 Global settings
        30.3.2 Transports
        30.3.3 Directors
        30.3.4 Routers
    30.4 Full-blown Mail server
    30.5 Shell Commands for exim Administration
    30.6 The Queue
    30.7 /etc/aliases for Equivalent Addresses
    30.8 Real-Time Blocking List — Combating Spam
        30.8.1 What is spam?
        30.8.2 Basic spam prevention
        30.8.3 Real-time blocking list
        30.8.4 Mail administrator and user responsibilities
    30.9 Sendmail

31 lilo, initrd, and Booting ..... 317
    31.1 Usage
    31.2 Theory
        31.2.1 Kernel boot sequence
        31.2.2 Master boot record
        31.2.3 Booting partitions
        31.2.4 Limitations
    31.3 lilo.conf and the lilo Command
    31.4 Creating Boot Floppy Disks
    31.5 SCSI Installation Complications and initrd
    31.6 Creating an initrd Image
    31.7 Modifying lilo.conf for initrd
    31.8 Using mkinitrd

32 init, ?getty, and UNIX Run Levels ..... 325
    32.1 init — the First Process
    32.2 /etc/inittab
        32.2.1 Minimal configuration
        32.2.2 Rereading inittab
        32.2.3 The respawning too fast error
    32.3 Useful Run Levels
    32.4 getty Invocation
    32.5 Bootup Summary
    32.6 Incoming Faxes and Modem Logins
        32.6.1 mgetty with character terminals
        32.6.2 mgetty log files
        32.6.3 mgetty with modems
        32.6.4 mgetty receiving faxes

33 Sending Faxes ..... 333
    33.1 Fax Through Printing
    33.2 Setgid Wrapper Binary

34 uucp and uux ..... 337
    34.1 Command-Line Operation
    34.2 Configuration
    34.3 Modem Dial
    34.4 tty/UUCP Lock Files
    34.5 Debugging uucp
    34.6 Using uux with exim
    34.7 Scheduling Dialouts

35 The LINUX File System Standard ..... 347
    35.1 Introduction
        35.1.1 Purpose
        35.1.2 Conventions
    35.2 The Filesystem
    35.3 The Root Filesystem
        35.3.1 Purpose
        35.3.2 Requirements
        35.3.3 Specific Options
        35.3.4 /bin : Essential user command binaries (for use by all users)
        35.3.5 /boot : Static files of the boot loader
        35.3.6 /dev : Device files
        35.3.7 /etc : Host-specific system configuration
        35.3.8 /home : User home directories (optional)
        35.3.9 /lib : Essential shared libraries and kernel modules
        35.3.10 /lib<qual> : Alternate format essential shared libraries (optional)
        35.3.11 /mnt : Mount point for a temporarily mounted filesystem
        35.3.12 /opt : Add-on application software packages
        35.3.13 /root : Home directory for the root user (optional)
        35.3.14 /sbin : System binaries
        35.3.15 /tmp : Temporary files
    35.4 The /usr Hierarchy
        35.4.1 Purpose
        35.4.2 Requirements
        35.4.3 Specific Options
        35.4.4 /usr/X11R6 : X Window System, Version 11 Release 6 (optional)
        35.4.5 /usr/bin : Most user commands
        35.4.6 /usr/include : Directory for standard include files
        35.4.7 /usr/lib : Libraries for programming and packages
        35.4.8 /usr/lib<qual> : Alternate format libraries (optional)
        35.4.9 /usr/local : Local hierarchy
        35.4.10 /usr/sbin : Non-essential standard system binaries
        35.4.11 /usr/share : Architecture-independent data
        35.4.12 /usr/src : Source code (optional)
    35.5 The /var Hierarchy
        35.5.1 Purpose
        35.5.2 Requirements
        35.5.3 Specific Options
        35.5.4 /var/account : Process accounting logs (optional)
        35.5.5 /var/cache : Application cache data
        35.5.6 /var/crash : System crash dumps (optional)
        35.5.7 /var/games : Variable game data (optional)
        35.5.8 /var/lib : Variable state information
        35.5.9 /var/lock : Lock files
        35.5.10 /var/log : Log files and directories
        35.5.11 /var/mail : User mailbox files (optional)
        35.5.12 /var/opt : Variable data for /opt
        35.5.13 /var/run : Run-time variable data
        35.5.14 /var/spool : Application spool data
        35.5.15 /var/tmp : Temporary files preserved between system reboots
        35.5.16 /var/yp : Network Information Service (NIS) database files (optional)
    35.6 Operating System Specific Annex
        35.6.1 Linux
    35.7 Appendix
        35.7.1 The FHS mailing list
        35.7.2 Background of the FHS
        35.7.3 General Guidelines
        35.7.4 Scope
        35.7.5 Acknowledgments
        35.7.6 Contributors

36 httpd — Apache Web Server ..... 389
    36.1 Web Server Basics
    36.2 Installing and Configuring Apache
        36.2.1 Sample httpd.conf
        36.2.2 Common directives
        36.2.3 User HTML directories
        36.2.4 Aliasing
        36.2.5 Fancy indexes
        36.2.6 Encoding and language negotiation
        36.2.7 Server-side includes — SSI
        36.2.8 CGI — Common Gateway Interface
        36.2.9 Forms and CGI
        36.2.10 Setuid CGIs
        36.2.11 Apache modules and PHP
        36.2.12 Virtual hosts

37 crond and atd ..... 409
    37.1 /etc/crontab Configuration File
    37.2 The at Command
    37.3 Other cron Packages

38 postgres SQL Server ..... 413
    38.1 Structured Query Language
    38.2 postgres
    38.3 postgres Package Content
    38.4 Installing and Initializing postgres
    38.5 Database Queries with psql
    38.6 Introduction to SQL
        38.6.1 Creating tables
        38.6.2 Listing a table
        38.6.3 Adding a column
        38.6.4 Deleting (dropping) a column
        38.6.5 Deleting (dropping) a table
        38.6.6 Inserting rows, “object relational”
        38.6.7 Locating rows
        38.6.8 Listing selected columns, and the oid column
        38.6.9 Creating tables from other tables
        38.6.10 Deleting rows
        38.6.11 Searches
        38.6.12 Migrating from another database; dumping and restoring tables as plain text
        38.6.13 Dumping an entire database
        38.6.14 More advanced searches
    38.7 Real Database Projects

39 smbd — Samba NT Server ..... 425
    39.1 Samba: An Introduction by Christopher R. Hertel
    39.2 Configuring Samba
    39.3 Configuring Windows
    39.4 Configuring a Windows Printer
    39.5 Configuring swat
    39.6 Windows NT Caveats

40 named — Domain Name Server ..... 437
    40.1 Documentation
    40.2 Configuring bind
        40.2.1 Example configuration
        40.2.2 Starting the name server
        40.2.3 Configuration in detail
    40.3 Round-Robin Load-Sharing
    40.4 Configuring named for Dialup Use
        40.4.1 Example caching name server
        40.4.2 Dynamic IP addresses
    40.5 Secondary or Slave DNS Servers

41 Point-to-Point Protocol — Dialup Networking ..... 453
    41.1 Basic Dialup
        41.1.1 Determining your chat script
        41.1.2 CHAP and PAP
        41.1.3 Running pppd
    41.2 Demand-Dial, Masquerading
    41.3 Dialup DNS
    41.4 Dial-in Servers
    41.5 Using tcpdump
    41.6 ISDN Instead of Modems

42 The LINUX Kernel Source, Modules, and Hardware Support ..... 463
    42.1 Kernel Constitution
    42.2 Kernel Version Numbers
    42.3 Modules, insmod Command, and Siblings
    42.4 Interrupts, I/O Ports, and DMA Channels
    42.5 Module Options and Device Configuration
        42.5.1 Five ways to pass options to a module
        42.5.2 Module documentation sources
    42.6 Configuring Various Devices
        42.6.1 Sound and pnpdump
        42.6.2 Parallel port
        42.6.3 NIC — Ethernet, PCI, and old ISA
        42.6.4 PCI vendor ID and device ID
        42.6.5 PCI and sound
        42.6.6 Commercial sound drivers
        42.6.7 The ALSA sound project
        42.6.8 Multiple Ethernet cards
        42.6.9 SCSI disks
        42.6.10 SCSI termination and cooling
        42.6.11 CD writers
        42.6.12 Serial devices
    42.7 Modem Cards
    42.8 More on LILO: Options
    42.9 Building the Kernel
        42.9.1 Unpacking and patching
        42.9.2 Configuring
    42.10 Using Packaged Kernel Source
    42.11 Building, Installing

43 The X Window System ..... 485
    43.1 The X Protocol
    43.2 Widget Libraries and Desktops
        43.2.1 Background
        43.2.2 Qt
        43.2.3 Gtk
        43.2.4 GNUStep
    43.3 XFree86
        43.3.1 Running X and key conventions
        43.3.2 Running X utilities
        43.3.3 Running two X sessions
        43.3.4 Running a window manager
        43.3.5 X access control and remote display
        43.3.6 X selections, cutting, and pasting
    43.4 The X Distribution
    43.5 X Documentation
        43.5.1 Programming
        43.5.2 Configuration documentation
        43.5.3 XFree86 web site
    43.6 X Configuration
        43.6.1 Simple 16-color X server
        43.6.2 Plug-and-Play operation
        43.6.3 Proper X configuration
    43.7 Visuals
    43.8 The startx and xinit Commands
    43.9 Login Screen
    43.10 X Font Naming Conventions
    43.11 Font Configuration
    43.12 The Font Server

44 UNIX Security ..... 511
    44.1 Common Attacks
        44.1.1 Buffer overflow attacks
        44.1.2 Setuid programs
        44.1.3 Network client programs
        44.1.4 /tmp file vulnerability
        44.1.5 Permission problems
        44.1.6 Environment variables
        44.1.7 Password sniffing
        44.1.8 Password cracking
        44.1.9 Denial of service attacks
    44.2 Other Types of Attack
    44.3 Counter Measures
        44.3.1 Removing known risks: outdated packages
        44.3.2 Removing known risks: compromised packages
        44.3.3 Removing known risks: permissions
        44.3.4 Password management
        44.3.5 Disabling inherently insecure services
        44.3.6 Removing potential risks: network
        44.3.7 Removing potential risks: setuid programs
        44.3.8 Making life difficult
        44.3.9 Custom security paradigms
        44.3.10 Proactive cunning
    44.4 Important Reading
    44.5 Security Quick-Quiz
    44.6 Security Auditing

A Lecture Schedule ..... 525
    A.1 Hardware Requirements
    A.2 Student Selection
    A.3 Lecture Style

B LPI Certification Cross-Reference ..... 531
    B.1 Exam Details for 101
    B.2 Exam Details for 102

C RHCE Certification Cross-Reference ..... 543
    C.1 RH020, RH030, RH033, RH120, RH130, and RH133
    C.2 RH300
    C.3 RH220 (RH253 Part 1)
    C.4 RH250 (RH253 Part 2)

D LINUX Advocacy FAQ ..... 551
    D.1 LINUX Overview
    D.2 LINUX, GNU, and Licensing
    D.3 LINUX Distributions
    D.4 LINUX Support
    D.5 LINUX Compared to Other Systems
    D.6 Migrating to LINUX
    D.7 Technical

E The GNU General Public License Version 2 ..... 573

Index ..... 581
When I began working with GNU/LINUX in 1994, it was straight from the DOS world. Though UNIX was unfamiliar territory, LINUX books assumed that anyone using LINUX was migrating from System V or BSD—systems that I had never heard of. It is a sensible adage to create, for others to share, the recipe that you would most like to have had. Indeed, I am not convinced that a single unifying text exists, even now, without this book. Even so, I give it to you desperately incomplete; but there is only so much one can explain in a single volume.

I hope that readers will now have a single text to guide them through all facets of GNU/LINUX.
Acknowledgments

A special thanks goes to my technical reviewer, Abraham van der Merwe, and my production editor, Jane Bonnell. Thanks to Jonathan Maltz, Jarrod Cinman, and Alan Tredgold for introducing me to GNU/Linux back in 1994 or so. Credits are owed to all the authors of LaTeX, TeX, GhostScript, GhostView, Autotrace, the LaTeX extension styles, DVIPS, DVIPDFM, ImageMagick, XDVI, XPDF, and LaTeX2HTML, without which this document would scarcely be possible. To name a few: John Bradley, David Carlisle, Eric Cooper, John Cristy, Peter Deutsch, Nikos Drakos, Mark Eichin, Brian Fox, Carsten Heinz, Spencer Kimball, Paul King, Donald Knuth, Peter Mattis, Frank Mittelbach, Ross Moore, Derek B. Noonburg, Johannes Plass, Sebastian Rahtz, Chet Ramey, Tomas Rokicki, Bob Scheifler, Rainer Schoepf, Brian Smith, Supoj Sutanthavibul, Herb Swan, Tim Theisen, Paul Vojta, Martin Weber, Mark Wicks, Masatake Yamato, Ken Yap, Hermann Zapf.
Thanks to Christopher R. Hertel for contributing his introduction to Samba.
An enormous thanks to the GNU project of the Free Software Foundation, to the countless developers of Free software, and to the many readers that gave valuable feedback on the web site.
Whereas books shelved beside this one will get your feet wet, this one lets you actually paddle for a bit, then thrusts your head underwater while feeding you oxygen.
This book covers GNU/LINUX system administration, for popular distributions like RedHat and Debian, as a tutorial for new users and a reference for advanced administrators. It aims to give concise, thorough explanations and practical examples of each aspect of a UNIX system. Anyone who wants a comprehensive text on (what is commercially called) “LINUX” need look no further—there is little that is not covered here.
The ordering of the chapters is carefully designed to allow you to read in sequence without missing anything. You should hence read from beginning to end, in order that later chapters do not reference unseen material. I have also packed in useful examples which you must practice as you read.
You will need to install a basic LINUX system. A number of vendors now ship point-and-click-install CDs: you should try to get a Debian or “RedHat-like” distribution.
One hint: try and install as much as possible so that when I mention a software package in this text, you are likely to have it installed already and can use it immediately.
Most cities with a sizable IT infrastructure will have a LINUX user group to help you source a cheap CD. These are getting really easy to install, and there is no longer much need to read lengthy installation instructions.
Chapter 16 contains a fairly comprehensive list of all reference documentation available on your system. This book supplements that material with a tutorial that is both comprehensive and independent of any previous UNIX knowledge.
The book also aims to satisfy the requirements for course notes for a GNU/LINUX training course. Here in South Africa, I use the initial chapters as part of a 36-hour GNU/LINUX training course given in 12 lessons. The details of the layout for this course are given in Appendix A.
Note that all “LINUX” systems are really composed mostly of GNU software, but from now on I will refer to the GNU system as “LINUX” in the way almost everyone (incorrectly) does.
Any UNIX system reference will require you to read it at least three times before you get a reasonable picture of what to do. If you need to read it more than three times, then there is probably some other information that you really should be reading first. If you are reading a document only once, then you are being too impatient with yourself.

It is important to identify the exact terms that you fail to understand in a document. Always try to backtrack to the precise word before you continue.

It's also probably not a good idea to learn new things according to deadlines. Your knowledge should evolve by grace and fascination, rather than pressure.
The difference between being able to pass an exam and being able to do something useful, of course, is huge.
The LPI and RHCE are two certifications that introduce you to LINUX. This book covers far more than both these two certifications in most places, but occasionally leaves out minor items as an exercise. It certainly covers in excess of what you need to know to pass both these certifications.
The LPI and RHCE requirements are given in Appendix B and C.
These two certifications are merely introductions to UNIX. To earn them, users are not expected to write nifty shell scripts to do tricky things, or understand the subtle or advanced features of many standard services, let alone be knowledgeable of the enormous numbers of non-standard and useful applications out there. To be blunt: you can pass these courses and still be considered quite incapable by the standards of companies that do system integration. &System integration is my own term. It refers to the act of getting LINUX to do nonbasic functions, like writing complex shell scripts; setting up wide-area dialup networks; creating custom distributions; or interfacing database, web, and email services together.-
In fact, these certifications make no reference to computer programming whatsoever.
Throughout this book I refer to examples specific to “RedHat” and “Debian”. What I actually mean by this are systems that use .rpm (RedHat package manager) packages as opposed to systems that use .deb (Debian) packages—there are lots of both. This just means that there is no reason to avoid using a distribution like Mandrake, which is .rpm based and viewed by many as being better than RedHat.
In short, brand names no longer have any meaning in the Free software community.
(Note that the same applies to the word UNIX, which we take to mean the common denominator between all the UNIX variants, including RISC, mainframe, and PC variants of both System V and BSD.)
Corrections to this book will be posted on http://www.icon.co.za/˜psheer/rute-errata.html. Please check this web page before notifying me of errors.
This chapter explains some basics that most computer users will already be familiar with. If you are new to UNIX, however, you may want to gloss over the commonly used key bindings for reference.
The best way of thinking about how a computer stores and manages information is to ask yourself how you would. Most often the way a computer works is exactly the way you would expect it to if you were inventing it for the first time. The only limitations on this are those imposed by logical feasibility and imagination, but almost anything else is allowed.
When you first learned to count, you did so with 10 digits. Ordinary numbers (like telephone numbers) are called “base ten” numbers. Postal codes that include letters and digits are called “base 36” numbers because of the addition of 26 letters onto the usual 10 digits. The simplest base possible is “base two” which uses only two digits: 0 and 1. Now, a 7-digit telephone number has 10 × 10 × 10 × 10 × 10 × 10 × 10 (7 digits) = 10^7 = 10,000,000 possible combinations. A postal code with four characters has 36^4 = 1,679,616 possible combinations. However, an 8-digit binary number only has 2^8 = 256 possible combinations.
Since the internal representation of numbers within a computer is binary and since it is rather tedious to convert between decimal and binary, computer scientists have come up with new bases to represent numbers: these are “base sixteen” and “base eight,” known as hexadecimal and octal, respectively. Hexadecimal numbers use the digits 0 through 9 and the letters A through F, whereas octal numbers use only the digits 0 through 7. Hexadecimal is often abbreviated as hex.
Consider a 4-digit binary number. It has 2^4 = 16 possible combinations and can therefore be easily represented by one of the 16 hex digits. A 3-digit binary number has 2^3 = 8 possible combinations and can thus be represented by a single octal digit. Hence, a binary number can be represented with hex or octal digits without much calculation, as shown in Table 2.1.
Table 2.1 Binary, hexadecimal, and octal representation

Binary  Hexadecimal     Binary  Octal
0000    0               000     0
0001    1               001     1
0010    2               010     2
0011    3               011     3
0100    4               100     4
0101    5               101     5
0110    6               110     6
0111    7               111     7
1000    8
1001    9
1010    A
1011    B
1100    C
1101    D
1110    E
1111    F
A binary number 01001011 can be represented in hex as 4B and in octal as 113 by simply separating the binary digits into groups of four or three, respectively.
In UNIX administration, and also in many programming languages, there is often the ambiguity of whether a number is in fact a hex, decimal, or octal number. For instance, a hex number 56 is 01010110, but an octal number 56 is 101110, whereas a decimal number 56 is 111000 (computed through a more tedious calculation). To distinguish between them, hex numbers are often prefixed with the characters “0x”, while octal numbers are prefixed with a “0”. If the first digit is 1 through 9, then it is a decimal number that is probably being referred to. We would then write 0x56 for hex, and 056 for octal. Another representation is to append the letter H, D, O, or B (or h, d, o, b) to the number to indicate its base.
UNIX makes heavy use of 8-, 16-, and 32-digit binary numbers, often representing them as 2-, 4-, and 8-digit hex numbers. You should get used to seeing numbers like 0xffff (or FFFFh), which in decimal is 65535 and in binary is 1111111111111111.
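If you have a bash shell handy, you can experiment with these notations directly. The following is a minimal sketch using bash arithmetic expansion and printf; any reasonably modern bash should behave this way:

    echo $(( 0x56 ))        # hex 0x56 is 86 in decimal
    echo $(( 056 ))         # octal 056 is 46 in decimal
    echo $(( 2#01010110 ))  # binary 01010110 is 86 in decimal
    printf '%x %o\n' 86 86  # decimal 86 printed as hex (56) and octal (126)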
Common to every computer system invented is the file. A file holds a single contiguous block of data. Any kind of data can be stored in a file, and there is no data that cannot be stored in a file. Furthermore, there is no kind of data that is stored anywhere else except in files. A file holds data of the same type, for instance, a single picture will be stored in one file. During production, this book had each chapter stored in a file. It is uncommon for different types of data (say, text and pictures) to be stored together in the same file because it is inconvenient. A computer will typically contain about 10,000 files that have a great many purposes. Each file will have its own name. The file name on a LINUX or UNIX machine can be up to 256 characters long.
The file name is usually explanatory—you might call a letter you wrote to your friend something like Mary Jones.letter (from now on, whenever you see the typewriter font &A style of print: here is typewriter font.-, it means that those are words that might be read off the screen of the computer). The name you choose has no meaning to the computer and could just as well be any other combination of letters or digits; however, you will refer to that data with that file name whenever you give an instruction to the computer regarding that data, so you would like it to be descriptive.
&It is important to internalize the fact that computers do not have an interpretation for anything. A computer operates with a set of interdependent logical rules. Interdependent means that the rules have no apex, in the sense that computers have no fixed or single way of working. For example, the reason a computer has files at all is because computer programmers have decided that this is the most universal and convenient way of storing data, and if you think about it, it really is.-
The data in each file is merely a long list of numbers. The size of the file is just the length of the list of numbers. Each number is called a byte. Each byte contains 8 bits. Each bit is either a one or a zero and therefore, once again, there are 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 (8 bits, 1 byte) = 256 possible combinations. Hence a byte can only hold a number as large as 255. There is no type of data that cannot be represented as a list of bytes. Bytes are sometimes also called octets. Your letter to Mary will be encoded into bytes for storage on the computer. We all know that a television picture is just a sequence of dots on the screen that scan from left to right. In that way, a picture might be represented in a file: that is, as a sequence of bytes where each byte is interpreted as a level of brightness—0 for black and 255 for white. For your letter, the convention is to store an A as 65, a B as 66, and so on. Each punctuation character also has a numerical equivalent.
A mapping between numbers and characters is called a character mapping or a character set. The most common character set in use in the world today is the ASCII character set, which stands for the American Standard Code for Information Interchange. Table 2.2 shows the complete ASCII mappings between characters and their hex, decimal, and octal equivalents.
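You can check these values on a working system with the od (octal dump) command; a minimal sketch, assuming GNU od is installed (it is part of the standard file utilities):

    echo -n 'A' | od -An -tx1 -td1 -to1

The three output lines show the byte for the letter A as hex 41, decimal 65, and octal 101, respectively.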
Table 2.2 ASCII character set

Oct Dec Hex Char     Oct Dec Hex Char      Oct Dec Hex Char     Oct Dec Hex Char
000   0  00 NUL      040  32  20 SPACE     100  64  40 @        140  96  60 `
001   1  01 SOH      041  33  21 !         101  65  41 A        141  97  61 a
002   2  02 STX      042  34  22 "         102  66  42 B        142  98  62 b
003   3  03 ETX      043  35  23 #         103  67  43 C        143  99  63 c
004   4  04 EOT      044  36  24 $         104  68  44 D        144 100  64 d
005   5  05 ENQ      045  37  25 %         105  69  45 E        145 101  65 e
006   6  06 ACK      046  38  26 &         106  70  46 F        146 102  66 f
007   7  07 BEL      047  39  27 '         107  71  47 G        147 103  67 g
010   8  08 BS       050  40  28 (         110  72  48 H        150 104  68 h
011   9  09 HT       051  41  29 )         111  73  49 I        151 105  69 i
012  10  0A LF       052  42  2A *         112  74  4A J        152 106  6A j
013  11  0B VT       053  43  2B +         113  75  4B K        153 107  6B k
014  12  0C FF       054  44  2C ,         114  76  4C L        154 108  6C l
015  13  0D CR       055  45  2D -         115  77  4D M        155 109  6D m
016  14  0E SO       056  46  2E .         116  78  4E N        156 110  6E n
017  15  0F SI       057  47  2F /         117  79  4F O        157 111  6F o
020  16  10 DLE      060  48  30 0         120  80  50 P        160 112  70 p
021  17  11 DC1      061  49  31 1         121  81  51 Q        161 113  71 q
022  18  12 DC2      062  50  32 2         122  82  52 R        162 114  72 r
023  19  13 DC3      063  51  33 3         123  83  53 S        163 115  73 s
024  20  14 DC4      064  52  34 4         124  84  54 T        164 116  74 t
025  21  15 NAK      065  53  35 5         125  85  55 U        165 117  75 u
026  22  16 SYN      066  54  36 6         126  86  56 V        166 118  76 v
027  23  17 ETB      067  55  37 7         127  87  57 W        167 119  77 w
030  24  18 CAN      070  56  38 8         130  88  58 X        170 120  78 x
031  25  19 EM       071  57  39 9         131  89  59 Y        171 121  79 y
032  26  1A SUB      072  58  3A :         132  90  5A Z        172 122  7A z
033  27  1B ESC      073  59  3B ;         133  91  5B [        173 123  7B {
034  28  1C FS       074  60  3C <         134  92  5C \        174 124  7C |
035  29  1D GS       075  61  3D =         135  93  5D ]        175 125  7D }
036  30  1E RS       076  62  3E >         136  94  5E ^        176 126  7E ~
037  31  1F US       077  63  3F ?         137  95  5F _        177 127  7F DEL
The second thing common to every computer system invented is the command. You tell the computer what to do with single words typed into the computer one at a time. Modern computers appear to have done away with the typing of commands by having beautiful graphical displays that work with a mouse, but, fundamentally, all that is happening is that commands are being secretly typed in for you. Using commands is still the only way to have complete power over the computer. You don't really know anything about a computer until you come to grips with the commands it uses. Using a computer will very much involve typing in a word, pressing Enter, and then waiting for the computer screen to spit something back at you. Most commands are typed in to do something useful to a file.
Turn on your LINUX box. After a few minutes of initialization, you will see the login prompt. A prompt is one or more characters displayed on the screen that you are expected to follow with some typing of your own. Here the prompt may state the name of the computer (each computer has a name—typically consisting of about eight lowercase letters) and then the word login:. LINUX machines now come with a graphical desktop by default (most of the time), so you might get a pretty graphical login with the same effect. Now you should type your login name—a sequence of about eight lower case letters that would have been assigned to you by your computer administrator—and then press the Enter (or Return) key.
A password prompt will appear after which you should type your password. Your password may be the same as your login name. Note that your password will not be shown on the screen as you type it but will be invisible. After typing your password, press the Enter or Return key again. The screen might show some message and prompt you for a log in again—in this case, you have probably typed something incorrectly and should give it another try. From now on, you will be expected to know that the Enter or Return key should be pressed at the end of every line you type in, analogous to the mechanical typewriter. You will also be expected to know that human error is very common; when you type something incorrectly, the computer will give an error message, and you should try again until you get it right. It is uncommon for a person to understand computer concepts after a first reading or to get commands to work on the first try.
Now that you have logged in you will see a shell prompt—a shell is the place where you can type commands. The shell is where you will spend most of your time as a system administrator &Computer manager.-, but it needn't look as bland as you see now. Your first exercise is to change your password. Type the command passwd. You will be asked for a new password and then asked to confirm that password. The password you choose should consist of letters, numbers, and punctuation—you will see later on why this security measure is a good idea. Take good note of your password for the next time you log in. Then the shell will return. The password you have chosen will take effect immediately, replacing the previous password that you used to log in.
The password command might also have given some message indicating what effect it actually had. You may not understand the message, but you should try to get an idea of whether the connotation was positive or negative.
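A password change might look something like the following sketch; the exact prompts differ slightly between distributions, and the user name jsmith is of course made up:

    $ passwd
    Changing password for jsmith
    (current) UNIX password:
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully

Nothing is echoed while you type the passwords themselves.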
When you are using a computer, it is useful to imagine yourself as being in different places within the computer, rather than just typing commands into it. After you entered the passwd command, you were no longer in the shell, but moved into the passwd place. You could not use the shell until you had moved out of the passwd command.
Type in the command ls. ls is short for list, abbreviated to two letters like most other UNIX commands. ls lists all your current files. You may find that ls does nothing, but just returns you back to the shell. This would be because you have no files as yet.
Most UNIX commands do not give any kind of message unless something went wrong (the passwd command above was an exception). If there were files, you would see their names listed rather blandly in columns with no indication of what they are for.
The following keys are useful for editing the command-line. Note that UNIX has had a long and twisted evolution from the mainframe, and some keys may not work properly. The following key bindings are, however, common throughout many LINUX applications:

Ctrl-a      Move to the beginning of the line (Home).
Ctrl-e      Move to the end of the line (End).
Ctrl-h      Erase backward (Backspace).
Ctrl-d      Erase forward (Delete).
Ctrl-f      Move forward one character (right arrow).
Ctrl-b      Move backward one character (left arrow).
Alt-f       Move forward one word.
Alt-b       Move backward one word.
Alt-Ctrl-f  Erase forward one word.
Alt-Ctrl-b  Erase backward one word.
Ctrl-p      Previous command (up arrow).
Ctrl-n      Next command (down arrow).
Note that the prefixes Alt, Ctrl, and Shift mean to hold the respective key down through the pressing and releasing of the letter key. These are known as key modifiers. Note also that the Ctrl key is always case insensitive; hence Ctrl-D (that is, Ctrl–Shift–d) and Ctrl-d (that is, Ctrl–d) are identical. The Alt modifier is in fact a short way of pressing and releasing Esc before entering the key combination; hence Esc then f is the same as Alt-f—UNIX is different from other operating systems in this use of Esc. The Alt modifier is not case insensitive, although some applications will make a special effort to respond insensitively. The Alt key is also sometimes referred to as the Meta key. All of these keys are sometimes referred to by their abbreviations: for example, C-a for Ctrl-a, or M-f for Meta-f and Alt-f. The Ctrl modifier is sometimes also designated with a caret: for example, ˆC for Ctrl-C.
Your command-line keeps a history of all the commands you have typed in. Ctrl-p and Ctrl-n will cycle through previous commands entered. New users seem to gain tremendous satisfaction from typing in lengthy commands over and over. Never type in anything more than once—use your command history instead.
Ctrl-s is used to suspend the current session, causing the keyboard to stop responding. Ctrl-q reverses this condition.
Ctrl-r activates a search on your command history. Pressing Ctrl-r in the middle of a search finds the next match whereas Ctrl-s reverts to the previous match (although some distributions have this confused with suspend).
The Tab command is tremendously useful for saving key strokes. Typing a partial directory name, file name, or command, and then pressing Tab once or twice in sequence completes the word for you without your having to type it all in full.
You can make Tab and other keys stop beeping in the irritating way that they do by editing the file /etc/inputrc and adding the line

    set bell-style none

and then logging out and logging in again. (More about this later.)
There are several special keys interpreted directly by the LINUX console or text mode interface. The Ctrl-Alt-Del combination initiates a complete shutdown and hardware reboot, which is the preferred method of restarting LINUX.
The Ctrl-PgUp and Ctrl-PgDn keys scroll the console, which is very useful for seeing text that has disappeared off the top of the terminal.
You can use Alt-F2 to switch to a new, independent login session. Here you can log in again and run a separate session. There are six of these virtual consoles—Alt-F1 through Alt-F6—to choose from; they are also called virtual terminals. If you are in graphical mode, you will have to instead press Ctrl-Alt-F? because the Alt-F? keys are often used by applications. The convention is that the seventh virtual console is graphical, so Alt-F7 will always take you back to graphical mode.
There are many ways of creating a file. Type cat > Mary Jones.letter and then type out a few lines of text. You will use this file in later examples. The cat command is used here to write from the keyboard into a file Mary Jones.letter. At the end of the last line, press Enter one more time and then press Ctrl-D. Now, if you type ls again, you will see the file Mary Jones.letter listed with any other files. Type cat Mary Jones.letter without the >. You will see that the command cat writes the contents of a file to the screen, allowing you to view your letter. It should match exactly what you typed in.
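Such a session might look something like this sketch. The quotes are needed here only because the example file name contains a space (which, as noted just below, is best avoided in real file names), and the letter text is obviously made up:

    $ cat > 'Mary Jones.letter'
    Dear Mary,
    This is a test letter.
    $ ls
    Mary Jones.letter
    $ cat 'Mary Jones.letter'
    Dear Mary,
    This is a test letter.

The Ctrl-D that ends the typing is not displayed on the screen.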
Although UNIX file names can contain almost any character, standards dictate that only the following characters are preferred in file names:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . - ˜

Hence, never use other punctuation characters, brackets, or control characters to name files. Also, never use the space or tab character in a file name, and never begin a file name with a - character.
I mentioned that a system may typically contain 10,000 files. Since it would be cumbersome if you were to see all 10,000 of them whenever you typed ls, files are placed in different “cabinets” so that files of the same type are placed together and can be easily isolated from other files. For instance, your letter above might go in a separate “cabinet” with other letters. A “cabinet” in computer terms is actually called a directory. This is the third commonality between all computer systems: all files go in one or another directory. To get an idea of how directories work, type the command mkdir letters, where mkdir stands for make directory. Now type ls. This will show the file Mary Jones.letter as well as a new file, letters. The file letters is not really a file at all, but the name of a directory in which a number of other files can be placed. To go into the directory letters, you can type cd letters where cd stands for change directory. Since the directory is newly created, you would not expect it to contain any files, and typing ls now will verify such by not listing anything. You can now create a file by using the cat command as you did before (try this). To go back to the original directory that you were in, you can use the command cd .. where the .. has the special meaning of taking you out of the current directory. Type ls again to verify that you have actually gone up a directory.

It is, however, bothersome that we cannot tell the difference between files and directories. The way to differentiate is with the ls -l command. -l stands for long format. If you enter this command, you will see a lot of details about the files that may not yet be comprehensible to you. The three things you can watch for are the file name on the far right, the file size (i.e., the number of bytes that the file contains) in the fifth column from the left, and the file type on the far left. The file type is a string of letters of which you will only be interested in one: the character on the far left is either a - or a d. A - signifies a regular file, and a d signifies a directory. The command ls -l Mary Jones.letter will list only the single file Mary Jones.letter and is useful for finding out the size of a single file.

In fact, there is no limitation on how many directories you can create within each other. In what follows, you will glimpse the layout of all the directories on the computer.
Type the command cd /, where the / has the special meaning to go to the topmost directory on the computer called the root directory. Now type ls -l. The listing may be quite long and may go off the top of the screen; in that case, try ls -l | less (then use PgUp and PgDn, and press q when done). You will see that most, if not all, are directories. You can now practice moving around the system with the cd command, not forgetting that cd .. takes you up and cd / takes you to the root directory.
At any time you can type pwd (present working directory) to show the directory you are currently in.
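Putting the commands of this section together, a session might look something like this sketch (the user name, dates, and sizes are made up, and your own output will differ):

    $ pwd
    /home/jsmith
    $ ls -l
    -rw-r--r--   1 jsmith   users        49 Aug 14 11:55 Mary Jones.letter
    drwxr-xr-x   2 jsmith   users      1024 Aug 14 12:00 letters
    $ cd letters
    $ pwd
    /home/jsmith/letters
    $ cd ..

Note the leading - and d characters that distinguish the regular file from the directory.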
When you have finished, log out of the computer by using the logout command.
This chapter explains a little about PC hardware. Readers who have built their own PC or who have experience configuring myriad devices on Windows can probably skip this section. It is added purely for completeness. This chapter actually comes under the subject of Microcomputer Organization, that is, how your machine is electronically structured.
Inside your machine you will find a single, large circuit board called the motherboard (see Figure 3.1). It is powered by a humming power supply and has connector leads to the keyboard and other peripheral devices. &Anything that is not the motherboard, not the power supply and not purely mechanical.-
The motherboard contains several large microchips and many small ones. The important ones are listed below.
RAM  Random Access Memory or just memory. The memory is a single linear sequence of bytes that are erased when there is no power. It contains sequences of simple coded instructions of one to several bytes in length. Examples are: add this number to that; move this number to this device; go to another part of RAM to get other instructions; copy this part of RAM to this other part. When your machine has “64 megs” (64 megabytes), it has 64 × 1024 × 1024 bytes of RAM. Locations within that space are called memory addresses, so that saying “memory address 1000” means the 1000th byte in memory.
ROM  A small part of RAM does not reset when the computer switches off. It is called ROM, Read Only Memory. It is factory fixed and usually never changes through the life of a PC, hence the name. It overlaps the area of RAM close to the end of the first megabyte of memory, so that area of RAM is not physically usable. ROM contains instructions to start up the PC and access certain peripherals.

Figure 3.1 Partially assembled motherboard
CPU  Central Processing Unit. It is the thing that is called 80486, 80586, Pentium, or whatever. On startup, it jumps to memory address 1040475 (0xFE05B) and starts reading instructions. The first instructions it gets are actually to fetch more instructions from disk and give a Boot failure message to the screen if it finds nothing useful. The CPU requires a timer to drive it. The timer operates at a high speed of hundreds of millions of ticks per second (hertz). That's why the machine is named, for example, a “400 MHz” (400 megahertz) machine. The MHz of the machine is roughly proportional to the number of instructions it can process per second from RAM.
I/O ports  Stands for Input/Output ports. The ports are a block of RAM that sits in parallel to the normal RAM. There are 65,536 I/O ports, hence I/O is small compared to RAM. I/O ports are used to write to peripherals. When the CPU writes a byte to I/O port 632 (0x278), it is actually sending out a byte through your parallel port. Most I/O ports are not used. There is no specific I/O port chip, though.
There is more stuff on the motherboard:
ISA slots  ISA (eye-sah) is a shape of socket for plugging in peripheral devices like modem cards and sound cards. Each card expects to be talked to via an I/O port (or several consecutive I/O ports). What I/O port the card uses is sometimes configured by the manufacturer, and other times is selectable on the card through jumpers &Little pin bridges that you can pull off with your fingers.- or switches on the card. Other times still, it can be set by the CPU using a system called Plug and Pray &This means that you plug the device in, then beckon your favorite deity for spiritual assistance. Actually, some people complained that this might be taken seriously—no, it's a joke: the real term is Plug 'n Play or PnP.-. A card also sometimes needs to signal the CPU to indicate that it is ready to send or receive more bytes through an I/O port. They do this through 1 of 16 connectors inside the ISA slot. These are called Interrupt Request lines or IRQ lines (or sometimes just Interrupts), so numbered 0 through 15. Like I/O ports, the IRQ your card uses is sometimes also jumper selectable, sometimes not. If you unplug an old ISA card, you can often see the actual copper thread that goes from the IRQ jumper to the edge connector. Finally, ISA cards can also access memory directly through one of eight Direct Memory Access Channels or DMA Channels, which are also possibly selectable by jumpers. Not all cards use DMA, however.

In summary, the peripheral and the CPU need to cooperate on three things: the I/O port, the IRQ, and the DMA. If any two cards clash by using either the same I/O port, IRQ number, or DMA channel, then they won't work (at worst your machine will crash &Come to a halt and stop responding.-). A quick way to see these assignments on a running LINUX system is sketched just after this list.
“8-bit” ISA slots  Old motherboards have shorter ISA slots. You will notice yours is a double slot (called “16-bit” ISA) with a gap between them. The larger slot can still take an older 8-bit ISA card: like many modem cards.
PCI slots  PCI (pee-see-eye) slots are like ISA but are a new standard aimed at high-performance peripherals like networking cards and graphics cards. They also use an IRQ, I/O port and possibly a DMA channel. These, however, are automatically configured by the CPU as a part of the PCI standard, hence there will rarely be jumpers on the card.
AGP slots  AGP slots are even higher performance slots for Accelerated Graphics Processors, in other words, cards that do 3D graphics for games. They are also autoconfigured.
Serial ports  A serial port connection may come straight from your motherboard to a socket on your case. There are usually two of these. They may drive an external modem and some kinds of mice and printers. Serial is a simple and cheap way to connect a machine where relatively slow (less than 10 kilobytes per second) data transfer speeds are needed. Serial ports have their own “ISA card” built into the motherboard which uses I/O port 0x3F8–0x3FF and IRQ 4 for the first serial port (also called COM1 under DOS/Windows) and I/O port 0x2F8–0x2FF and IRQ 3 for COM2. A discussion on serial port technology proceeds in Section 3.4 below.
Parallel port  Normally, only your printer would plug in here. Parallel ports are, however, extremely fast (being able to transfer 50 kilobytes per second), and hence many types of parallel port devices (like CD-ROM drives that plug into a parallel port) are available. Parallel port cables, however, can only be a few meters in length before you start getting transmission errors. The parallel port uses I/O port 0x378–0x37A and IRQ 7. If you have two parallel ports, then the second one uses I/O port 0x278–0x27A, but does not use an IRQ at all.
USB port  The Universal Serial Bus aims to allow any type of hardware to plug into one plug. The idea is that one day all serial and parallel ports will be scrapped in favor of a single USB socket from which all external peripherals will daisy chain. I will not go into USB here.
IDE ribbon  The IDE ribbon plugs into your hard disk drive or C: drive on Windows/DOS and also into your CD-ROM drive (sometimes called an IDE CD-ROM). The IDE cable actually attaches to its own PCI card internal to the motherboard. There are two IDE connectors that use I/O ports 0xF000–0xF007 and 0xF008–0xF00F, and IRQ 14 and 15, respectively. Most IDE CD-ROMs are also ATAPI CD-ROMs. ATAPI is a standard (similar to SCSI, below) that enables many other kinds of devices to plug into an IDE ribbon cable. You get special floppy drives, tape drives, and other devices that plug into the same ribbon. They will be all called ATAPI-(this or that).
SCSI ribbon  Another ribbon might be present, coming out of a card (called the SCSI host adaptor or SCSI card) or your motherboard. Home PCs will rarely have SCSI, such being expensive and used mostly for high-end servers. SCSI cables are more densely wired than are IDE cables. They also end in a disk drive, tape drive, CD-ROM, or some other device. SCSI cables are not allowed to just-be-plugged-in: they must be connected end on end with the last device connected in a special way called SCSI termination. There are, however, a few SCSI devices that are automatically terminated. More on this on page 477.
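On a running LINUX system you can inspect these I/O port, IRQ, and DMA assignments without opening the case; a minimal sketch, assuming the /proc file system is mounted (it is on virtually every LINUX installation):

    cat /proc/interrupts   # which IRQ lines are in use, and by which driver
    cat /proc/ioports      # the I/O port ranges claimed by each device
    cat /proc/dma          # the DMA channels currently allocated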
Two IDE hard drives can be connected to a single IDE ribbon. The ribbon alone has nothing to distinguish which connector is which, so the drive itself has jumper pins on it (see Figure 3.2) that can be set to one of several options. These are one of Master (MA), Slave (SL), Cable Select (CS), or Master-only/Single-Drive/and the like. The MA option means that your drive is the “first” drive of two on this IDE ribbon. The SL option means that your drive is the “second” drive of two on this IDE ribbon. The CS option means that your machine is to make its own decision (some boxes only work with this setting), and the Master-only option means that there is no second drive on this ribbon.
Figure 3.2 Connection end of a typical IDE drive
There might also be a second IDE ribbon, giving you a total of four possible drives. The first ribbon is known as IDE1 (labeled on your motherboard) or the primary ribbon, and the second is known as IDE2 or the secondary ribbon. Your four drives are then called primary master, primary slave, secondary master, and secondary slave. Their labeling under LINUX is discussed in Section 18.4.
The “CMOS” &Stands for Complementary Metal Oxide Semiconductor, which has to do with the technology used to store setup information through power-downs.- is a small application built into ROM. It is also known as the ROM BIOS configuration. You can start it instead of your operating system (OS) by pressing the appropriate key (often Del or F2, or something else) just after you switch your machine on. There will usually be a message Press <key> to enter setup to explain this. Doing so will take you inside the CMOS program where you can change your machine's configuration. CMOS programs are different between motherboard manufacturers.
Inside the CMOS, you can enable or disable built-in devices (like your mouse and serial ports); set your machine's “hardware clock” (so that your machine has the correct time and date); and select the boot sequence (whether to load the operating system off the hard drive or CD-ROM—which you will need for installing LINUX from a bootable CD-ROM). Boot means to start up the computer. &The term comes from the lack of resources with which to begin: the operating system is on disk, but you might need the operating system to load from the disk—like trying to lift yourself up from your “bootstraps.”- Ordinarily, you should always select your hard drive.

You can also configure Hard drive autodetection &Autodetection refers to a system that, though having incomplete information, configures itself. In this case the CMOS program probes the drive to determine its capacity. Very old CMOS programs required you to enter the drive's details manually.- whenever installing a new machine or adding/removing disks. Different CMOSs will have different procedures, so browse through all the menus to see what your CMOS can do.
The CMOS is important when it comes to configuring certain devices built into the motherboard. Modern CMOSs allow you to set the I/O ports and IRQ numbers that you would like particular devices to use. For instance, you can make your CMOS switch COM1 with COM2 or use a non-standard I/O port for your parallel port. When it comes to getting such devices to work under LINUX, you will often have to power down your machine to see what the CMOS has to say about that device. More on this in Chapter 42.
Serial ports facilitate low speed communications over a short distance using simple 8 core (or less) cable. The standards are old and communication is not particularly fault tolerant. There are so many variations on serial communication that it has become somewhat of a black art to get serial devices to work properly. Here I give a short explanation of the protocols, electronics, and hardware. The Serial-HOWTO and Modem-HOWTO documents contain an exhaustive treatment (see Chapter 16).
Some devices that communicate using serial lines are:

• Ordinary domestic dial-up modems.
• Some permanent modem-like Internet connections.
• Mice and other pointing devices.
• Character text terminals.
• Printers.
• Cash registers.
• Magnetic card readers.
• Uninterruptible power supply (UPS) units.
• Embedded microprocessor devices.
A device is connected to your computer by a cable with a 9-pin or 25-pin, male or female connector at each end. These are known as DB-9 or DB-25 connectors. Only eight of the pins are ever used, however. See Table 3.1.
Table 3.1 Pin assignments for DB-9 and DB-25 sockets

DB-9 pin  DB-25 pin  Acronym  Full-Name             Direction (PC/device)
3         2          TD       Transmit Data         PC → device
2         3          RD       Receive Data          PC ← device
7         4          RTS      Request To Send       PC → device
8         5          CTS      Clear To Send         PC ← device
6         6          DSR      Data Set Ready        PC ← device
4         20         DTR      Data Terminal Ready   PC → device
1         8          CD       Data Carrier Detect   PC ← device
9         22         RI       Ring Indicator        PC ← device
5         7                   Signal Ground
The way serial devices communicate is very straightforward: A stream of bytes is sent between the computer and the peripheral by dividing each byte into eight bits. The voltage is toggled on a pin called the TD pin or transmit pin according to whether a bit is 1 or 0. A bit of 1 is indicated by a negative voltage (-15 to -5 volts) and a bit of 0 is indicated by a positive voltage (+5 to +15 volts). The RD pin or receive pin receives bytes in a similar way. The computer and the serial device need to agree on a data rate (also called the serial port speed) so that the toggling and reading of voltage levels is properly synchronized. The speed is usually quoted in bps (bits per second). Table 3.2 shows a list of possible serial port speeds.
Table 3.2 Serial port speeds in bps

50         75         110        134        150        200
300        600        1,200      1,800      2,400      4,800
9,600      19,200     38,400     57,600     115,200    230,400
460,800    500,000    576,000    921,600    1,000,000  1,152,000
1,500,000  2,000,000  2,500,000  3,000,000  3,500,000  4,000,000
A typical mouse communicates between 1,200 and 9,600 bps. Modems communicate at 19,200, 38,400, 57,600, or 115,200 bps. It is rare to find serial ports or peripherals that support the speeds not blocked in Table 3.2.
To further synchronize the peripheral with the computer, an additional start bit precedes each byte and up to two stop bits follow each byte. There may also be a parity bit which tells whether there is an even or odd number of 1s in the byte (for error checking). In theory, there may be as many as 12 bits sent for each data byte. These additional bits are optional and device specific. Ordinary modems communicate with an 8N1 protocol—8 data bits, No parity bit, and 1 stop bit. A mouse communicates with 8 bits and no start, stop, or parity bits. Some devices only use 7 data bits and hence are limited to send only ASCII data (since ASCII characters range only up to 127).
Some types of devices use two more pins called the request to send (RTS) and clear to send (CTS) pins. Either the computer or the peripheral pulls the respective pin to +12 volts to indicate that it is ready to receive data. A further two pins called the DTR (data terminal ready) pin and the DSR (data set ready) pin are sometimes used instead—these work the same way, but just use different pin numbers. In particular, domestic modems make full use of the RTS/CTS pins. This mechanism is called RTS/CTS flow control or hardware flow control. Some simpler devices make no use of flow control at all. Devices that do not use flow control will lose data which is sent without the receiver's readiness.
Some other devices also need to communicate whether they are ready to receive data, but do not have RTS/CTS pins (or DSR/DTR pins) available to them. These emit special control characters, sent amid the data stream, to indicate that flow should halt or restart. This is known as software flow control. Devices that optionally support either type of flow control should always be configured to use hardware flow control. In particular, a modem used with LINUX must have hardware flow control enabled.
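Under LINUX you can view and set these line parameters with the stty command; a minimal sketch, assuming the device in question is the first serial port, /dev/ttyS0:

    stty -F /dev/ttyS0 -a                # show the current speed and settings
    stty -F /dev/ttyS0 115200 crtscts    # 115,200 bps with RTS/CTS (hardware) flow control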
Two other pins are the ring indicator (RI) pin and the carrier detect (CD) pin. These are only used by modems to indicate an incoming call and the detection of a peer modem, respectively.
The above pin assignments and protocol (including some hard-core electrical specifications which I have omitted) are known as RS-232. It is implemented using a standard chip called a 16550 UART (Universal Asynchronous Receiver-Transmitter) chip. RS-232 is easily affected by electrical noise, which limits the length and speed at which you can communicate: A half meter cable can carry 115,200 bps without errors, but a 15 meter cable is reliable at no more than 19,200 bps. Other protocols (like RS-423 or RS-422) can go much greater distances and there are converter appliances that give a more advantageous speed/distance tradeoff.
Telephone lines, having been designed to carry voice, have peculiar limitations when it comes to transmitting data. It turns out that the best way to send a binary digit over a telephone line is to beep it at the listener using two different pitches: a low pitch for 0 and a high pitch for 1. Figure 3.3 shows this operation schematically.
Figure 3.3 Communication between two remote computers by modem
Converting voltages to pitches and back again is known as modulation-demodulation and is where the word modem comes from. The word baud means the number of possible pitch switches per second, which is sometimes used interchangeably with bps. There are many newer modulation techniques used to get the most out of a telephone line, so that 57,600 bps modems are now the standard (as of this writing). Modems also do other things to the data besides modulating it: They may pack the data to reduce redundancies (bit compression) and perform error detection and compensation (error correction). Such modem protocols are given names like V.90 (57,600 bps), V.34 (33,600 bps or 28,800 bps), V.42 (14,400 bps) or V.32 (14,400 bps and lower). When two modems connect, they need to negotiate a “V” protocol to use. This negotiation is based on their respective capabilities and the current line quality.
A modem can be in one of two states: command mode or connect mode. A modem is connected if it can hear a peer modem's carrier signal over a live telephone call (and is probably transmitting and receiving data in the way explained), otherwise it is in command mode. In command mode the modem does not modulate or transmit data but interprets special text sequences sent to it through the serial line. These text sequences begin with the letters AT and are called ATtention commands.
AT commands are sent by your computer to configure your modem for the current telephone line conditions, intended function, and serial port capability—for example, there are commands to: enable automatic answering on ring; set the flow control method; dial a number; and hang up. The sequence of commands used to configure the modem is called the modem initialization string. How to manually issue these commands is discussed in Sections 32.6.3, 34.3, and 41.1 and will become relevant when you want to dial your Internet service provider (ISP).
Because each modem brand supports a slightly different set of modem commands, it is worthwhile familiarizing yourself with your modem manual. Most modern modems now support the Hayes command set—a generic set of the most useful modem commands. However, Hayes has a way of enabling hardware flow control that many popular modems do not adhere to. Whenever in this book I give examples of modem initialization, I include a footnote referring to this section. It is usually sufficient to configure your modem to “factory default settings”, but often a second command is required to enable hardware flow control. There are no initialization strings that work on all modems. The web sites http://www.teleport.com/˜curt/modems.html and http://www.spy.net/˜dustin/modem/ are useful resources for finding out modem specifications.
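To give the flavor of command mode, here are a few of the generic Hayes commands as you might type them into a terminal program connected to the modem. The telephone number is made up, and the command that enables hardware flow control is deliberately omitted because it is exactly the part that varies between brands:

    ATZ            reset the modem to its stored profile
    AT&F           restore factory default settings
    ATDT5551234    dial 555-1234 using tone dialing
    +++            (preceded by a pause) escape from connect mode back to command mode
    ATH0           hang up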
All of UNIX is case sensitive. A command with even a single letter's capitalization altered is considered to be a completely different command. The same goes for files, directories, configuration file formats, and the syntax of all native programming languages.
In addition to directories and ordinary text files, there are other types of files, although all files contain the same kind of data (i.e., a list of bytes). The hidden file is a file that will not ordinarily appear when you type the command ls to list the contents of a directory. To see a hidden file you must use the command ls -a. The -a option means to list all files as well as hidden files. Another variant is ls -l, which lists the contents in long format. The - is used in this way to indicate variations on a UNIX command. These are called command-line options or command-line arguments, and most commands can take a number of them. They can be strung together in any way that is convenient &Commands under the GNU free software license are superior in this way: they have a greater number of options than traditional UNIX commands and are therefore more flexible.-, for example, ls -a -l, ls -l -a, or ls -al—any of these will list all files in long format.
All GNU commands take the additional arguments -h and --help. You can type a command with just this on the command-line and get a usage summary. This is some brief help that will summarize options that you may have forgotten if you are already familiar with the command—it will never be an exhaustive description of the usage. See the later explanation about man pages.
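For example, to page through a long usage summary you might type:

    ls --help | less

The same --help option works with most of the other GNU commands used in this book.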
The difference between a name of a hidden file and an ordinary file is merely that the hidden file starts with a period. Hiding files in this way is not for security, but for convenience.

The option ls -l is somewhat cryptic for the novice. Its more explanatory version is ls --format=long. Similarly, the all option can be given as ls --all, and means the same thing as ls -a.
Although commands usually do not display a message when they execute &The computer accepted and processed the command.- successfully, commands do report errors in a consistent format. The format varies from one command to another but often appears as follows: command-name: what was attempted: error message. For example, the command ls -l qwerty gives an error ls: qwerty: No such file or directory. What actually happened was that the command ls attempted to read the file qwerty. Since this file does not exist, an error code 2 arose. This error code corresponds to a situation where a file or directory is not being found. The error code is automatically translated into the sentence No such file or directory. It is important to understand the distinction between an explanatory message that a command gives (such as the messages reported by the passwd command in the previous chapter) and an error code that was just translated into a sentence. The reason is that a lot of different kinds of problems can result in an identical error code (there are only about a hundred different error codes). Experience will teach you that error messages do not tell you what to do, only what went wrong, and should not be taken as gospel.
The file /usr/include/asm/errno.h contains a complete list of basic error codes. In addition to these, several other header files &Files ending in .h- might define their own error codes. Under UNIX, however, these are 99% of all the errors you are ever likely to get. Most of them will be meaningless to you at the moment but are included in Table 4.1 as a reference.
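You can see the connection between a failed command and its error code for yourself; a small sketch, assuming no file called qwerty exists in the current directory and that the kernel header files are installed in the usual place:

    ls -l qwerty
    # ls: qwerty: No such file or directory
    grep 'ENOENT' /usr/include/asm/errno.h
    # #define ENOENT   2   /* No such file or directory */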
Table 4.1 LINUX error codes

Number  C define          Message
0                         Success
1       EPERM             Operation not permitted
2       ENOENT            No such file or directory
3       ESRCH             No such process
4       EINTR             Interrupted system call
5       EIO               Input/output error
6       ENXIO             Device not configured
7       E2BIG             Argument list too long
8       ENOEXEC           Exec format error
9       EBADF             Bad file descriptor
10      ECHILD            No child processes
11      EAGAIN            Resource temporarily unavailable
11      EWOULDBLOCK       Resource temporarily unavailable
12      ENOMEM            Cannot allocate memory
13      EACCES            Permission denied
14      EFAULT            Bad address
15      ENOTBLK           Block device required
16      EBUSY             Device or resource busy
17      EEXIST            File exists
18      EXDEV             Invalid cross-device link
19      ENODEV            No such device
20      ENOTDIR           Not a directory
21      EISDIR            Is a directory
22      EINVAL            Invalid argument
23      ENFILE            Too many open files in system
24      EMFILE            Too many open files
25      ENOTTY            Inappropriate ioctl for device
26      ETXTBSY           Text file busy
27      EFBIG             File too large
28      ENOSPC            No space left on device
29      ESPIPE            Illegal seek
30      EROFS             Read-only file system
31      EMLINK            Too many links
32      EPIPE             Broken pipe
33      EDOM              Numerical argument out of domain
34      ERANGE            Numerical result out of range
35      EDEADLK           Resource deadlock avoided
35      EDEADLOCK         Resource deadlock avoided
36      ENAMETOOLONG      File name too long
37      ENOLCK            No locks available
38      ENOSYS            Function not implemented
39      ENOTEMPTY         Directory not empty
40      ELOOP             Too many levels of symbolic links
        EWOULDBLOCK       (same as EAGAIN)
42      ENOMSG            No message of desired type
43      EIDRM             Identifier removed
44      ECHRNG            Channel number out of range
45      EL2NSYNC          Level 2 not synchronized
46      EL3HLT            Level 3 halted
47      EL3RST            Level 3 reset
48      ELNRNG            Link number out of range
49      EUNATCH           Protocol driver not attached
50      ENOCSI            No CSI structure available
51      EL2HLT            Level 2 halted
52      EBADE             Invalid exchange
53      EBADR             Invalid request descriptor
54      EXFULL            Exchange full
55      ENOANO            No anode
56      EBADRQC           Invalid request code
57      EBADSLT           Invalid slot
        EDEADLOCK         (same as EDEADLK)
59      EBFONT            Bad font file format
60      ENOSTR            Device not a stream
61      ENODATA           No data available
62      ETIME             Timer expired
63      ENOSR             Out of streams resources
64      ENONET            Machine is not on the network
65      ENOPKG            Package not installed
66      EREMOTE           Object is remote
67      ENOLINK           Link has been severed
68      EADV              Advertise error
69      ESRMNT            Srmount error
70      ECOMM             Communication error on send
71      EPROTO            Protocol error
72      EMULTIHOP         Multihop attempted
73      EDOTDOT           RFS specific error
74      EBADMSG           Bad message
75      EOVERFLOW         Value too large for defined data type
76      ENOTUNIQ          Name not unique on network
77      EBADFD            File descriptor in bad state
78      EREMCHG           Remote address changed
79      ELIBACC           Can not access a needed shared library
80      ELIBBAD           Accessing a corrupted shared library
81      ELIBSCN           .lib section in a.out corrupted
82      ELIBMAX           Attempting to link in too many shared libraries
83      ELIBEXEC          Cannot exec a shared library directly
84      EILSEQ            Invalid or incomplete multibyte or wide character
85      ERESTART          Interrupted system call should be restarted
86      ESTRPIPE          Streams pipe error
87      EUSERS            Too many users
88      ENOTSOCK          Socket operation on non-socket
89      EDESTADDRREQ      Destination address required
90      EMSGSIZE          Message too long
91      EPROTOTYPE        Protocol wrong type for socket
92      ENOPROTOOPT       Protocol not available
93      EPROTONOSUPPORT   Protocol not supported
94      ESOCKTNOSUPPORT   Socket type not supported
95      EOPNOTSUPP        Operation not supported
96      EPFNOSUPPORT      Protocol family not supported
97      EAFNOSUPPORT      Address family not supported by protocol
98      EADDRINUSE        Address already in use
99      EADDRNOTAVAIL     Cannot assign requested address
100     ENETDOWN          Network is down
101     ENETUNREACH       Network is unreachable
102     ENETRESET         Network dropped connection on reset
103     ECONNABORTED      Software caused connection abort
104     ECONNRESET        Connection reset by peer
105     ENOBUFS           No buffer space available
106     EISCONN           Transport endpoint is already connected
107     ENOTCONN          Transport endpoint is not connected
108     ESHUTDOWN         Cannot send after transport endpoint shutdown
109     ETOOMANYREFS      Too many references: cannot splice
110     ETIMEDOUT         Connection timed out
111     ECONNREFUSED      Connection refused
112     EHOSTDOWN         Host is down
113     EHOSTUNREACH      No route to host
114     EALREADY          Operation already in progress
115     EINPROGRESS       Operation now in progress
116     ESTALE            Stale NFS file handle
117     EUCLEAN           Structure needs cleaning
118     ENOTNAM           Not a XENIX named type file
119     ENAVAIL           No XENIX semaphores available
120     EISNAM            Is a named type file
121     EREMOTEIO         Remote I/O error
122     EDQUOT            Disk quota exceeded
123     ENOMEDIUM         No medium found
124     EMEDIUMTYPE       Wrong medium type
ls can produce a lot of output if there are a large number of files in a directory. Now say that we are only interested in files that ended with the letters tter. To list only these files, you can use ls *tter. The * matches any number of any other characters. So, for example, the files Tina.letter, Mary Jones.letter, and the file splatter, would all be listed if they were present, whereas a file Harlette would not be listed. While the * matches any length of characters, the ? matches only one character. For example, the command ls ?ar* would list the files Mary Jones.letter and Harlette.
When naming files, it is a good idea to choose names that group files of the same type together. You do this by adding an extension to the file name that describes the type of file it is. We have already demonstrated this by calling a file Mary Jones.letter instead of just Mary Jones. If you keep this convention, you will be able to easily list all the files that are letters by entering ls *.letter. The file name Mary Jones.letter is then said to be composed of two parts: the name, Mary Jones, and the extension, letter.
Some common UNIX extensions you may see are:
.a
Archive.
lib*.a
is a static library.
.alias
X Window System font alias catalog.
.avi
Video format.
.au
Audio format (original Sun Microsystems generic sound file).
.awk
awk program source file.
.bib
bibtex A TEX bibliography source file.
.bmp
Microsoft Bitmap file image format.
.bz2
File compressed with the bzip2 compression program.
.cc
, .cxx
, .C
, .cpp
C++ program source code.
.cf
, .cfg
Configuration file or script.
.cgi
Executable script that produces web page output.
.conf
, .config
Configuration file.
29
4.3. Wildcards, Names, Extensions, and glob Expressions 4. Basic Commands
.csh
csh shell script.
.c
C
program source code.
.db
Database file.
.dir
X Window System font/other database directory.
.deb
Debian package for the Debian distribution.
.diff
Output of the diff program indicating the difference between files or source trees.
.dvi
Device-independent file. Formatted output of a .tex LaTeX file.
.el
Lisp program source.
.g3
G3 fax format image file.
.gif
, .giff
GIF image file.
.gz
File compressed with the gzip compression program.
.htm , .html , .shtm , .shtml
Hypertext Markup Language. A web page of some sort.
.h
C/C++ program header file.
.i
SWIG source, or C preprocessor output.
.in
configure input file.
.info
Info pages read with the info command.
.jpg
, .jpeg
JPEG image file.
.lj
LaserJet file. Suitable input to a HP LaserJet printer.
.log
Log file of a system service. This file grows with status messages of some system program.
.lsm
LINUX Software Map entry.
.lyx
LyX word processor document.
.man
Man page.
.mf
Meta-Font font program source file.
.pbm
PBM image file format.
.pcf
PCF image file—intermediate representation for fonts. X Window System font.
.pcx
PCX image file.
.pfb
X Window System font file.
X Window System font file.
.pdf
Formatted document similar to PostScript or dvi.
.php
PHP program source code (used for web page design).
.pl
Perl program source code.
.ps
PostScript file, for printing or viewing.
.py
Python program source code.
.rpm
RedHat Package Manager rpm file.
.sgml
Standard Generalized Markup Language. Used to create documents to be converted to many different formats.
.sh
sh shell script.
.so
Shared object file. lib*.so is a Dynamically Linked Library (executable program code shared by more than one program to save disk space and memory).
.spd
Speedo X Window System font file.
.tar
tarred directory tree.
.tcl
Tcl/Tk source code (programming language).
.texi
, .texinfo
Texinfo source. Info pages are compiled from these.
.tex
TeX or LaTeX document. LaTeX is for document processing and typesetting.
.tga
TARGA image file.
.tgz
Directory tree that has been archived with tar , and then compressed with gzip .
Also a package for the Slackware distribution.
.tiff
TIFF image file.
.tfm
LaTeX font metric file.
.ttf
Truetype font.
.txt
Plain English text file.
.voc
Audio format (Soundblaster’s own format).
.wav
Audio format (sound files common to Microsoft Windows).
.xpm
XPM image file.
.y
yacc source file.
.Z
File compressed with the compress compression program.
.zip
File compressed with the pkzip (or PKZIP.EXE for DOS) compression program.
.1
, .2
. . .
Man page.
In addition, files that have no extension and a capitalized descriptive name are usually plain English text and meant for your reading. They come bundled with packages and are for documentation purposes. You will see them hanging around all over the place.
Some full file names you may see are:
AUTHORS
List of people who contributed to or wrote a package.
ChangeLog
List of developer changes made to a package.
COPYING
Copyright (usually GPL) for a package.
INSTALL
Installation instructions.
README
Help information to be read first, pertaining to the directory the README is in.
TODO
List of future desired work to be done to package.
BUGS
List of errata.
NEWS
Info about new features and changes for the layman about this package.
THANKS
List of contributors to a package.
VERSION
Version information of the package.
There is a way to restrict file listings to within the ranges of certain characters. If you only want to list the files that begin with A through M, you can run ls [A-M]*. Here the brackets have a special meaning—they match a single character like a ?, but only those given by the range. You can use this feature in a variety of ways, for example, [a-dJW-Y]* matches all files beginning with a, b, c, d, J, W, X or Y; and *[a-d]id matches all files ending with aid, bid, cid or did; and *.{cpp,c,cxx} matches all files ending in .cpp, .c or .cxx. This way of specifying a file name is called a glob expression. Glob expressions are used in many different contexts, as you will see later.
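For example, assuming a few scratch files like the ones named above (the touch commands below just create empty files to experiment with), a quick session might look like this:

    touch Tina.letter splatter Harlette bid did cid
    ls *tter          # splatter, plus any .letter files ending in tter
    ls ?ar*           # Harlette
    ls [A-M]*         # files beginning with A through M
    ls *[a-d]id       # bid, cid, did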
4.4 Usage Summaries and the Copy Command
The command cp stands for copy. It duplicates one or more files. The format is

    cp <file> <newfile>
    cp <file> [<file> ...] <dir>

or

    cp file newfile
    cp file [file ...] dir

The above lines are called a usage summary. The < and > signs mean that you don't actually type out these characters but replace <file> with a file name of your own. These are also sometimes written in italics like, cp file newfile. In rare cases they are written in capitals like, cp FILE NEWFILE. <file> and <dir> are called parameters.
Sometimes they are obviously numeric, like a command that takes <ioport>. (Anyone emailing me to ask why typing in literal <, i, o, p, o, r, t and > characters did not work will get a rude reply.) These are common conventions used to specify the usage of a command. The [ and ] brackets are also not actually typed but mean that the contents between them are optional. The ellipses ... mean that <file> can be given repeatedly, and these also are never actually typed. From now on you will be expected to substitute your own parameters by interpreting the usage summary. You can see that the second of the above lines is actually just saying that one or more file names can be listed with a directory name last.

From the above usage summary it is obvious that there are two ways to use the cp command. If the last name is not a directory, then cp copies that file and renames it to the file name given. If the last name is a directory, then cp copies all the files listed into that directory.
The usage summary of the ls command is as follows:

    ls [-l, --format=long] [-a, --all] <file> <file> ...
    ls -al

where the comma indicates that either option is valid. Similarly, with the command:

    passwd [<username>]
    passwd

You should practice using the cp command now by moving some of your files from place to place.
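A possible practice session, interpreting the usage summary above (the file names here are invented; any of your own files will do):

    touch letter1 letter2               # create two empty practice files
    cp letter1 letter1.backup           # first form: copy and rename
    mkdir old-letters
    cp letter1 letter2 old-letters      # second form: copy several files into a directory
    ls old-letters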
4.5 Directory Manipulation
The cd command is used to take you to different directories. Create a directory new with mkdir new. You could create a directory one by doing cd new and then mkdir one, but there is a more direct way of doing this with mkdir new/one. You can then change directly to the one directory with cd new/one. And similarly you can get back to where you were with cd ../... In this way, the / is used to represent directories within directories. The directory one is called a subdirectory of new.

The command pwd stands for present working directory (also called the current directory) and tells what directory you are currently in. Entering pwd gives some output like /home/<username>. Experiment by changing to the root directory (with cd /) and then back into the directory /home/<username> (with cd /home/<username>). The directory /home/<username> is called your home directory, and is where all your personal files are kept. It can be used at any time with the abbreviation ~. In other words, entering cd ~ is the same as entering cd /home/<username>. The process whereby a ~ is substituted for your home directory is called tilde expansion.
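A short session that ties these commands together might look like this (the directory names are just examples):

    pwd
    mkdir new
    mkdir new/one
    cd new/one
    pwd
    cd ../..
    cd ~        # back to your home directory
    pwd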
To remove (i.e., erase or delete) a file, use the command rm <filename>. To remove a directory, use the command rmdir <dir>. Practice using these two commands. Note that you cannot remove a directory unless it is empty. To remove a directory as well as any contents it might contain, use the command rm -R <dir>. The -R option specifies to dive into any subdirectories of <dir> and delete their contents. The process whereby a command dives into subdirectories of subdirectories of . . . is called recursion. -R stands for recursively. This is a very dangerous command. Although you may be used to "undeleting" files on other systems, on UNIX a deleted file is, at best, extremely difficult to recover.

The cp command also takes the -R option, allowing it to copy whole directories. The mv command is used to move files and directories. It really just renames a file to a different directory. Note that with cp -R you should use the -p and -d options to preserve all attributes of a file and properly reproduce symlinks (discussed later). Hence, always use cp -dpR <dir> <newdir> instead of cp -R <dir> <newdir>.
Commands can be given file name arguments in two ways. If you are in the same directory as the file (i.e., the file is in the current directory), then you can just enter the file name on its own (e.g., cp my_file new_file). Otherwise, you can enter the full path name, like cp /home/jack/my_file /home/jack/new_file. Very often administrators use the notation ./my_file to be clear about the distinction, for instance, cp ./my_file ./new_file. The leading ./ makes it clear that both files are relative to the current directory. File names not starting with a / are called relative path names, and otherwise, absolute path names.

4.7 System Manual Pages
(See Chapter 16 for a complete overview of all documentation on the system, and also how to print manual pages in a properly typeset format.)
The command man [<section>|-a] <command> displays help on a particular topic and stands for manual. Every command on the entire system is documented in so-named man pages. In the past few years a new format of documentation, called info, has evolved. This is considered the modern way to document commands, but most system documentation is still available only through man. Very few packages are not documented in man, however.

Man pages are the authoritative reference on how a command works because they are usually written by the very programmer who created the command. Under UNIX, any printed documentation should be considered as being second-hand information. Man pages, however, will often not contain the underlying concepts needed for understanding the context in which a command is used. Hence, it is not possible for a person to learn about UNIX purely from man pages. However, once you have the necessary background for a command, then its man page becomes an indispensable source of information and you can discard other introductory material.
Now, man pages are divided into sections, numbered 1 through 9. Section 1 contains all man pages for system commands like the ones you have been using. Sections
2-7 contain information for programmers and the like, which you will probably not have to refer to just yet. Section 8 contains pages specifically for system administration commands. There are some additional sections labeled with letters; other than these, there are no manual pages besides the sections 1 through 9. The sections are
    .../man1     User programs
    .../man2     System calls
    .../man3     Library calls
    .../man4     Special files
    .../man5     File formats
    .../man6     Games
    .../man7     Miscellaneous
    .../man8     System administration
    .../man9     Kernel documentation
You should now use the man command to look up the manual pages for all the commands that you have learned. Type man cp, man mv, man rm, man mkdir, man rmdir, man passwd, man cd, man pwd, and of course man man. Much of the information might be incomprehensible to you at this stage. Skim through the pages to get an idea of how they are structured and what headings they usually contain. Man pages are referenced with notation like cp(1), for the cp command in Section 1, which can be read with man 1 cp. This notation will be used from here on.

4.8 System info Pages
info pages contain some excellent reference and tutorial information in hypertext linked format. Type info on its own to go to the top-level menu of the entire info hierarchy. You can also type info <command> for help on many basic commands.
Some packages will, however, not have info pages, and other UNIX systems do not support info at all.
info is an interactive program with keys to navigate and search documentation. Inside info, typing ? will invoke the help screen from where you can learn more commands.

4.9 Some Basic Commands

You should practice using each of these commands.
bc
A calculator program that handles arbitrary precision (very large) numbers. It is useful for doing any kind of calculation on the command-line. Its use is left as an exercise.
cal [[1-12] 1-9999]
Prints out a nicely formatted calendar of the current month, a specified month, or a specified whole year. Try cal 1 for fun, and cal 9 1752, when the pope had a few days scrapped to compensate for roundoff error.
cat <filename> [<filename> ...]
Writes the contents of all the files listed to the screen. cat can join a lot of files together with cat <filename> <filename> ... > <newfile>. The file <newfile> will be an end-on-end concatenation of all the files specified.
clear
Erases all the text in the current terminal.
date
Prints out the current date and time. (The command time, though, does something entirely different.)
df
Stands for disk free and tells you how much free space is left on your system. The available space usually has the units of kilobytes (1024 bytes) (although on some other UNIX systems this will be 512 bytes or 2048 bytes). The right-most column tells the directory (in combination with any directories below that) under which that much space is available.
dircmp
Directory compare. This command compares directories to see if changes have been made between them. You will often want to see where two trees differ (e.g., check for missing files), possibly on different computers. Run man dircmp (that is, dircmp(1)). (This is a System 5 command and is not present on LINUX. You can, however, compare directories with the Midnight Commander, mc.)
du <directory>
Stands for disk usage and prints out the amount of space occupied by a directory. It recurses into any subdirectories and can print only a summary with du -s <directory>. Also try du --max-depth=1 /var and du -x / on a system with /usr and /home on separate partitions (see page 143).
dmesg
Prints a complete log of all messages printed to the screen during the bootup process. This is useful if you blinked when your machine was initializing. These messages might not yet be meaningful, however.
echo
Prints a message to the terminal. Try echo 'hello there', echo $[10*3+2], echo '$[10*3+2]'. The command echo -e allows interpretation of certain backslash sequences, for example echo -e "\a", which prints a bell, or in other words, beeps the terminal. echo -n does the same without printing the trailing newline. In other words, it does not cause a wrap to the next line after the text is printed. echo -e -n "\b" prints a back-space character only, which will erase the last character printed.
exit
Logs you out.
expr <expression>
Calculates the numerical expression <expression>. Most arithmetic operations that you are accustomed to will work. Try expr 5 + 10 '*' 2. Observe how mathematical precedence is obeyed (i.e., the * is worked out before the +).
file <filename>
Prints out the type of data contained in a file. file portrait.jpg will tell you that portrait.jpg is a JPEG image data, JFIF standard. The command file detects an enormous amount of file types, across every platform. file works by checking whether the first few bytes of a file match certain tell-tale byte sequences. The byte sequences are called magic numbers. Their complete list is stored in /usr/share/magic. (The word "magic" under UNIX normally refers to byte sequences or numbers that have a specific meaning or implication. So-called magic numbers are invented for source code, file formats, and file systems.)
free
Prints out available free memory. You will notice two listings: swap space and physical memory. These are contiguous as far as the user is concerned. The swap space is a continuation of your installed memory that exists on disk. It is obviously slow to access but provides the illusion of much more available RAM
and avoids the possibility of ever running out of memory (which can be quite fatal).
head [-n <lines>] <filename>
Prints the first <lines> lines of a file or 10 lines if the -n option is not given. (See also tail below.)
hostname [<new-name>]
With no options, hostname prints the name of your machine, otherwise it sets the name to <new-name> .
kbdrate -r <chars-per-second> -d <repeat-delay>
Changes the repeat rate of your keys. Most users will like this rate set to kbdrate -r 32 -d 250, which unfortunately is the fastest the PC can go.
more
Displays a long file by stopping at the end of each page. Run the following: ls -l /bin > bin-ls, and then try more bin-ls. The first command creates a file with the contents of the output of ls. This will be a long file because the directory /bin has a great many entries. The second command views the file. Use the space bar to page through the file. When you get bored, just press q. You can also try ls -l /bin | more which will do the same thing in one go.
less
The GNU version of more, but with extra features. On your system, the two commands may be the same. With less, you can use the arrow keys to page up and down through the file. You can do searches by pressing /, and then typing in a word to search for and then pressing Enter. Found words will be highlighted, and the text will be scrolled to the first found word. The important commands are:

    G        Go to the end of a file.
    ?ssss    Search backward through a file for the text ssss.
    /ssss    Search forward through a file for the text ssss. (Actually ssss is a regular expression. See Chapter 5 for more info.)
    F        Scroll forward and keep trying to read more of the file in case some other program is appending to it—useful for log files.
    nnnG     Go to line nnn of the file.
    q        Quit. Used by many UNIX text-based applications (sometimes Ctrl-C).

(You can make less stop beeping in the irritating way that it does by editing the file /etc/profile and adding the lines

    LESS=-Q
    export LESS

and then logging out and logging in again. But this is an aside that will make more sense later.)
lynx <url>
Opens a URL (URL stands for Uniform Resource Locator—a web address) at the console. Try lynx http://lwn.net/.
links <url>
Another text-based web browser.
nohup <command> &
Runs a command in the background, appending any output the command may produce to the file nohup.out
in your home directory.
nohup has the useful feature that the command will continue to run even after you have logged out. Uses for nohup will become obvious later.
sleep <seconds>
Pauses for <seconds> seconds. See also usleep .
sort <filename>
Prints a file with lines sorted in alphabetical order. Create a file called telephone with each line containing a short telephone book entry. Then type sort telephone, or sort telephone | less, and see what happens. sort takes many interesting options to sort in reverse (sort -r), to eliminate duplicate entries (sort -u), to ignore leading whitespace (sort -b), and so on. See sort(1) for details.
strings [-n <len>] <filename>
Writes out a binary file, but strips any unreadable characters. Readable groups of characters are placed on separate lines. If you have a binary file that you think may contain something interesting but looks completely garbled when viewed normally, use strings to sift out the interesting stuff: try less /bin/cp and then try strings /bin/cp. By default strings does not print sequences smaller than 4. The -n option can alter this limit.
split ...
Splits a file into many separate files. This might have been used when a file was too big to be copied onto a floppy disk and needed to be split into, say, 360-KB pieces. Its sister, csplit , can split files along specified lines of text within the file. The commands are seldom used on their own but are very useful within programs that manipulate text.
tac <filename> [<filename> ...]
Writes the contents of all the files listed to the screen, reversing the order of the lines—that is, printing the last line of the file first.
tac is cat backwards and behaves similarly.
tail [-f] [-n <lines>] <filename>
Prints the last <lines> lines of a file or 10 lines if the -n option is not given. The -f option means to watch the file for lines being appended to the end of it. (See also head above.)
uname
Prints the name of the UNIX operating system you are currently using. In this case, LINUX.
uniq <filename>
Prints a file with duplicate lines deleted. The file must first be sorted.
usleep <microseconds>
Pauses for <microseconds> microseconds (1/1,000,000 of a second).
wc [-c] [-w] [-l] <filename>
Counts the number of bytes (with -c for character), or words (with -w), or lines (with -l) in a file.
whatis <command>
Gives the first line of the man page corresponding to <command>, unless no such page exists, in which case it prints nothing appropriate.
whoami
Prints your login name.
4.10 The mc File Manager

Those who come from the DOS world may remember the famous Norton Commander file manager. The GNU project has a Free clone called the Midnight Commander, mc. It is essential to at least try out this package—it allows you to move around files and directories extremely rapidly, giving a wide-angle picture of the file system. This will drastically reduce the number of tedious commands you will have to type by hand.
4.11 Multimedia Commands for Fun

You should practice using each of these commands if you have your sound card configured. (I don't want to give the impression that LINUX does not have graphical applications to do all the functions in this section, but you should be aware that for every graphical application, there is a text-mode one that works better and consumes fewer resources.) You may also find that some of these packages are not installed, in which case you can come back to this later.
play [-v <volume>] <filename>
Plays linear audio formats out through your sound card. These formats are .8svx, .aiff, .au, .cdr, .cvs, .dat, .gsm, .hcom, .maud, .sf, .smp, .txw, .vms, .voc, .wav, .wve, .raw, .ub, .sb, .uw, .sw, or .ul files. In other words, it plays almost every type of "basic" sound file there is: most often this will be a simple Windows .wav file. Specify <volume> in percent.
rec <filename>
Records from your microphone into a file. (play and rec are from the same package.)
mpg123 <filename>
Plays audio from MPEG files level 1, 2, or 3. Useful options are
-b 1024 (for increasing the buffer size to prevent jumping) and --2to1 (downsamples by a factor of 2 for reducing CPU load). MPEG files contain sound and/or video, stored very compactly using digital signal processing techniques that the commercial software industry seems to think are very sophisticated.
cdplay
Plays a regular music CD .
cdp is the interactive version.
aumix
Sets your sound card’s volume, gain, recording volume, etc. You can use it interactively or just enter aumix -v <volume> to immediately set the volume in percent. Note that this is a dedicated
mixer
program and is considered to be an application separate from any that play music. Preferably do not set the volume from within a sound-playing application, even if it claims this feature—you have much better control with aumix .
mikmod --interpolate -hq --renice Y <filename>
Plays Mod files. Mod files are a special type of audio format that stores only the duration and pitch of the notes that constitute a song, along with samples of each musical instrument needed to play the song. This makes for high-quality audio with phenomenally small file size. mikmod supports 669, AMF, DSM, FAR, GDM, IMF, IT, MED, MOD, MTM, S3M, STM, STX, ULT, UNI, and XM audio formats—that is, probably every type in existence. Actually, a lot of excellent listening music is available on the Internet in Mod file format. The most common formats are .it, .mod, .s3m, and .xm. (Original .mod files are the product of Commodore-Amiga computers and had only four tracks. Today's 16 (and more) track Mod files are comparable to any recorded music.)
4.12 Terminating Commands

You usually use Ctrl-C to stop an application or command that runs continuously. You must type this at the same prompt where you entered the command. If this doesn't work, the section on processes (Section 9.5) will explain about signalling a running application to quit.
4.13 Compressed Files

Files typically contain a lot of data that one can imagine might be represented with a smaller number of bytes. Take for example the letter you typed out. The word "the" was probably repeated many times. You were probably also using lowercase letters most of the time. The file was by far not a completely random set of bytes, and it repeatedly used spaces as well as using some letters more than others. (English text in fact contains, on average, only about 1.3 useful bits (there are eight bits in a byte) of data per byte.) Because of this the file can be compressed to take up less space. Compression involves representing the same data by using a smaller number of bytes, in such a way that the original data can be reconstructed exactly. Such compression usually involves finding patterns in the data. The command to compress a file is gzip <filename>, which stands for GNU zip. Run gzip on a file in your home directory and then run ls to see what happened. Now, use more to view the compressed file. To uncompress the file use gzip -d <filename>. Now, use more to view the file again. Many files on the system are stored in compressed format. For example, man pages are often stored compressed and are uncompressed automatically when you read them.
You previously used the command cat to view a file. You can use the command zcat to do the same thing with a compressed file. Gzip a file and then type zcat <filename>. You will see that the contents of the file are written to the screen. Generally, when commands and files have a z in them they have something to do with compression—the letter z stands for zip. You can use zcat <filename> | less to view a compressed file proper. You can also use the command zless <filename>, which does the same as zcat <filename> | less. (Note that your less may actually have the functionality of zless combined.)

A new addition to the arsenal is bzip2. This is a compression program very much like gzip, except that it is slower and compresses 20%-30% better. It is useful for compressing files that will be downloaded from the Internet (to reduce the transfer volume). Files that are compressed with bzip2 have an extension .bz2. Note that the improvement in compression depends very much on the type of data being compressed. Sometimes there will be negligible size reduction at the expense of a huge speed penalty, while occasionally it is well worth it. Files that are frequently compressed and uncompressed should never use bzip2.
4.14 Searching for Files

You can use the command find to search for files. Change to the root directory, and enter find. It will spew out all the files it can see by recursively descending (it goes into each subdirectory and all its subdirectories, and repeats the command find) into all subdirectories. In other words, find, when executed from the root directory, prints all the files on the system. find will work for a long time if you enter it as you have—press Ctrl-C to stop it.
Now change back to your home directory and type find again. You will see all your personal files. You can specify a number of options to find to look for specific files.
find -type d
Shows only directories and not the files they contain.
find -type f
Shows only files and not the directories that contain them, even though it will still descend into all directories.
find -name <filename>
Finds only files that have the name <filename>. For instance, find -name '*.c' will find all files that end in a .c extension (find -name *.c without the quote characters will not work. You will see why later). find -name Mary Jones.letter will find the file with the name Mary Jones.letter.
find -size [[+|-]]<size>
Finds only files that have a size larger (for +) or smaller (for -) than <size> kilobytes, or the same as <size> kilobytes if the sign is not specified.
find <directory> [<directory> ...]
Starts find in each of the specified directories.
There are many more options for doing just about any type of search for a file. See find(1) for more details (that is, run man 1 find). Look also at the -exec option, which causes find to execute a command for each file it finds.
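For example, something along the following lines lists the details of every regular file under /usr (this particular command is only an illustration of -exec, not the only way to use it):

    find /usr -type f -exec ls -l {} \;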
find has the deficiency of actively reading directories to find files. This process is slow, especially when you start from the root directory. An alternative command is locate <filename>. This searches through a previously created database of all the files on the system and hence finds files instantaneously. Its counterpart updatedb updates the database of files used by locate. On some systems, updatedb runs automatically every day at 04h00.
Try these (updatedb will take several minutes):

    updatedb
    locate rpm
    locate deb
    locate passwd
    locate HOWTO
    locate README
4.15 Searching Within Files

Very often you will want to search through a number of files to find a particular word or phrase, for example, when a number of files contain lists of telephone numbers with people's names and addresses. The command grep does a line-by-line search through a file and prints only those lines that contain a word that you have specified. grep has the command summary:

    grep [options] <pattern> <filename> [<filename> ...]

(The words word, string, or pattern are used synonymously in this context, basically meaning a short length of letters and-or numbers that you are trying to find matches for. A pattern can also be a string with kinds of wildcards in it that match different characters, as we shall see later.)
Run grep for the word "the" to display all lines containing it: grep 'the' Mary Jones.letter. Now try grep 'the' *.letter.
grep -n <pattern> <filename>
shows the line number in the file where the word was found.
grep -<num> <pattern> <filename>
prints out <num> of the lines that came before and after each of the lines in which the word was found.
grep -A <num> <pattern> <filename>
prints out <num> of the lines that came After each of the lines in which the word was found.
grep -B <num> <pattern> <filename>
prints out <num> of the lines that came Before each of the lines in which the word was found.
grep -v <pattern> <filename>
prints out only those lines that do not contain the word you are searching for. (You may think that the -v option is no longer doing the same kind of thing that grep is advertised to do: i.e., searching for strings. In fact, UNIX commands often suffer from this—they have such versatility that their functionality often overlaps with that of other commands. One actually never stops learning new and nifty ways of doing things hidden in the dark corners of man pages.)
grep -i <pattern> <filename>
does the same as an ordinary grep but is case insensitive.
4.16 Copying to MS-DOS and Windows Formatted Floppy Disks

A package, called the mtools package, enables reading and writing to MS-DOS/Windows floppy disks. These are not standard UNIX commands but are packaged with most LINUX distributions. The commands support Windows "long file name" floppy disks. Put an MS-DOS disk in your A: drive. Try

    mdir A:
    touch myfile
    mcopy myfile A:
    mdir A:

Note that there is no such thing as an A: disk under LINUX. Only the mtools package understands A: in order to retain familiarity for MS-DOS users. The complete list of commands is

    floppyd mattrib mcopy mdel mformat minfo mmount mmove mshowfat mtoolstest
    mbadblocks mcat mcd mdeltree mdir mdu mkmanifest mlabel mmd mpartition
    mrd mren mtype mzip xcopy

Entering info mtools will give detailed help. In general, any MS-DOS command, put into lower case with an m prefixed to it, gives the corresponding LINUX command.
4.17 Archives and Backups

Never begin any work before you have a fail-safe method of backing it up. One of the primary activities of a system administrator is to make backups. It is essential never to underestimate the volatility (ability to evaporate or become chaotic) of information in a computer. Backups of data are therefore continually made. A backup is a duplicate of your files that can be used as a replacement should any or all of the computer be destroyed. The idea is that all of the data in a directory (as usual, meaning a directory and all its subdirectories and all the files in those subdirectories, etc.) are stored in a separate place—often compressed—and can be retrieved in case of an emergency. When we want to store a number of files in this way, it is useful to be able to pack many files into one file so that we can perform operations on that single file only. When many files are packed together into one, this packed file is called an archive. Usually archives have the extension .tar, which stands for tape archive.
To create an archive of a directory, use the tar command:

    tar -c -f <filename> <directory>

Create a directory with a few files in it, and run the tar command to back it up. A file of <filename> will be created. Take careful note of any error messages that tar reports. List the file and check that its size is appropriate for the size of the directory you are archiving. You can also use the verify option (see the man page) of the tar command to check the integrity of <filename>. Now remove the directory, and then restore it with the extract option of the tar command:

    tar -x -f <filename>
You should see your directory recreated with all its files intact. A nice option to give to tar is -v . This option lists all the files that are being added to or extracted from the archive as they are processed, and is useful for monitoring the progress of archiving.
It is obvious that you can call your archive anything you like; however, the common practice is to call it <directory>.tar, which makes it clear to all exactly what it is.
Another important option is -p which preserves detailed attribute information of files.
Once you have your .tar file, you would probably want to compress it with gzip. This will create a file <directory>.tar.gz, which is sometimes called <directory>.tgz for brevity.
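As a sketch, archiving and compressing a directory called letters, and restoring it later, might go as follows (the directory name is only an example):

    tar -cvf letters.tar letters    # -v lists files as they are archived
    gzip letters.tar
    ls letters.tar.gz
    # and later, to restore:
    gzip -d letters.tar.gz
    tar -xvf letters.tar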
A second kind of archiving utility is cpio .
cpio is actually more powerful than tar, but is considered to be more cryptic to use. The principles of cpio are quite similar and its use is left as an exercise.
4.18 The PATH Where Commands Are Searched For

When you type a command at the shell prompt, it has to be read off disk out of one or other directory. On UNIX, all such executable commands are located in one of about four directories. A file is located in the directory tree according to its type, rather than according to what software package it belongs to. For example, a word processor may have its actual executable stored in a directory with all other executables, while its font files are stored in a directory with other fonts from all other packages.
The shell has a procedure for searching for executables when you type them in. If you type in a command with slashes, like /bin/cp, then the shell tries to run the named program, cp, out of the /bin directory. If you just type cp on its own, then it tries to find the cp command in each of the subdirectories of your PATH. To see what your PATH is, just type

    echo $PATH

You will see a colon separated list of four or more directories. Note that the current directory . is not listed. It is important that the current directory not be listed for reasons of security. Hence, to execute a command in the current directory, we always use ./<command>.
To append, for example, a new directory /opt/gnome/bin to your PATH, do

    PATH="$PATH:/opt/gnome/bin"
    export PATH

LINUX supports the convenience of doing this in one line:

    export PATH="$PATH:/opt/gnome/bin"
There is a further command, which, to check whether a command is locatable from the PATH. Sometimes there are two commands of the same name in different directories of the PATH. (This is more often true of Solaris systems than LINUX.) Typing which <command> locates the one that your shell would execute. Try:

    which ls
    which cp mv rm
    which which
    which cranzgots

which is also useful in shell scripts to tell if there is a command at all, and hence check whether a particular package is installed, for example, which netscape.
4.19 The -- Option

If a file name happens to begin with a - then it would be impossible to use that file name as an argument to a command. To overcome this circumstance, most commands take an option --. This option specifies that no more options follow on the command-line—everything else must be treated as a literal file name. For instance

    touch -- -stupid_file_name
    rm -- -stupid_file_name
5. Regular Expressions

5.1 Overview
A regular expression is a sequence of characters that forms a template used to search for strings (words, phrases, or just about any sequence of characters) within text. In other words, it is a search pattern. To get an idea of when you would need to do this, consider the example of having a list of names and telephone numbers. If you want to find a telephone number that contains a 3 in the second place and ends with an 8, regular expressions provide a way of doing that kind of search. Or consider the case where you would like to send an email to fifty people, replacing the word after the "Dear" with their own name to make the letter more personal. Regular expressions allow for this type of searching and replacing.
Many utilities use the regular expression to give them greater power when manipulating text. The grep command is an example. Previously you used the grep command to locate only simple letter sequences in text. Now we will use it to search for regular expressions.
In the previous chapter you learned that the ? character can be used to signify that any character can take its place. This is said to be a wildcard and works with file names. With regular expressions, the wildcard to use is the . character. So, you can use the command grep .3....8 <filename> to find the seven-character telephone number that you are looking for in the above example.
Regular expressions are used for line-by-line searches. For instance, if the seven characters were spread over two lines (i.e., they had a line break in the middle), then grep wouldn’t find them. In general, a program that uses regular expressions will consider searches one line at a time.
Here are some regular expression examples that will teach you the regular expression basics. We use the grep command to show the use of regular expressions (remember that the -w option matches whole words only). Here the expression itself is enclosed in ' quotes for reasons that are explained later.
grep -w ’t[a-i]e’
Matches the words tee , the , and tie . The brackets have a special significance. They mean to match one character that can be anything from a to i .
grep -w ’t[i-z]e’
Matches the words tie and toe .
grep -w ’cr[a-m]*t’
Matches the words craft , credit , and cricket . The * means to match any number of the previous character, which in this case is any character from a through m .
grep -w ’kr.*n’
Matches the words kremlin and krypton, because the . matches any character and the * means to match the dot any number of times.
egrep -w ’(th|sh).*rt’
Matches the words shirt, short, and thwart. The | means to match either the th or the sh. egrep is just like grep but supports extended regular expressions that allow for the | feature. (The | character often denotes a logical OR, meaning that either the thing on the left or the right of the | is applicable. This is true of many programming languages.) Note how the square brackets mean one-of-several-characters and the round brackets with |'s mean one-of-several-words.
grep -w ’thr[aeiou]*t’
Matches the words threat and throat . As you can see, a list of possible characters can be placed inside the square brackets.
grep -w ’thr[ˆa-f]*t’
Matches the words throughput and thrust. The ˆ after the first bracket means to match any character except the characters listed. For example, the word thrift is not matched because it contains an f.
The above regular expressions all match whole words (because of the -w option). If the -w option was not present, they might match parts of words, resulting in a far greater number of matches. Also note that although the * means to match any number of characters, it also will match no characters as well; for example: t[a-i]*e could actually match the letter sequence te, that is, a t and an e with zero characters between them.
Usually, you will use regular expressions to search for whole lines that match, and sometimes you would like to match a line that begins or ends with a certain string. The ˆ character specifies the beginning of a line, and the $ character the end of the line. For example, ˆThe matches all lines that start with a The, and hack$ matches all lines that end with hack, and ’ˆ *The.*hack *$’ matches all lines that begin with The and end with hack, even if there is whitespace at the beginning or end of the line.
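To experiment with the anchors yourself, you might create a small test file first (the file name and its contents here are invented for the example):

    printf 'The hack\n  The big hack  \nno match here\n' > regex-test
    grep '^The' regex-test           # lines starting with "The"
    grep 'hack$' regex-test          # lines ending with "hack"
    grep '^ *The.*hack *$' regex-test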
Because regular expressions use certain characters in a special way (these are . \ [ ] * + ?), these characters cannot be used to match characters. This restriction severely limits you from trying to match, say, file names, which often use the . character. To match a . you can use the sequence \. which forces interpretation as an actual . and not as a wildcard. Hence, the regular expression myfile.txt might match the letter sequence myfileqtxt or myfile.txt, but the regular expression myfile\.txt will match only myfile.txt.

You can specify most special characters by adding a \ character before them, for example, use \[ for an actual [, a \$ for an actual $, a \\ for an actual \, \+ for an actual +, and \? for an actual ?. (? and + are explained below.)
5.2 The fgrep Command

fgrep is an alternative to grep. The difference is that while grep (the more commonly used command) matches regular expressions, fgrep matches literal strings. In other words you can use fgrep when you would like to search for an ordinary string that is not a regular expression, instead of preceding special characters with \.

5.3 The \{ \} Notation
x* matches zero to infinite instances of a character x. You can specify other ranges of numbers of characters to be matched with, for example, x\{3,5\}, which will match at least three but not more than five x's, that is xxx, xxxx, or xxxxx. x\{4\} can then be used to match 4 x's exactly: no more and no less. x\{7,\} will match seven or more x's—the upper limit is omitted to mean that there is no maximum number of x's. As in all the examples above, the x can be a range of characters (like [a-k]) just as well as a single character.
grep -w ’th[a-t]\{2,3\}t’
Matches the words theft, thirst, threat, thrift, and throat.
grep -w ’th[a-t]\{4,5\}t’
Matches the words theorist, thicket, and thinnest.
5.4 + ? \< \> ( ) | Notation
An enhanced version of regular expressions allows for a few more useful features.
Where these conflict with existing notation, they are only available through the egrep command.
+     is analogous to \{1,\}. It does the same as * but matches one or more characters instead of zero or more characters.

?     is analogous to \{0,1\}. It matches zero or one character.

\< \>     can surround a string to match only whole words.

( )     can surround several strings, separated by |. This notation will match any of these strings. (egrep only.)

\( \)     can surround several strings, separated by \|. This notation will match any of these strings. (grep only.)
The following examples should make the last two notations clearer.
grep ’trot’
Matches the words electrotherapist, betroth, and so on, but grep ’\<trot\>’ matches only the word trot.

egrep -w ’(this|that|c[aeiou]*t)’
Matches the words this, that, cot, coat, cat, and cut.
Subexpressions are covered in Chapter 8.
6. Editing Text Files
To edit a text file means to interactively modify its content. The creation and modification of an ordinary text file is known as text editing. A word processor is a kind of editor, but more basic than that is the UNIX or DOS text editor.
6.1 vi

The important editor to learn how to use is vi. After that you can read why, and a little more about other, more user-friendly editors.
Type simply,

    vi <filename>

to edit any file, or the compatible, but more advanced

    vim <filename>
To exit vi, press Esc, then the key sequence :q! and then press Enter.
vi has a short tutorial which should get you going in 20 minutes. If you get bored in the middle, you can skip it and learn vi as you need to edit things. To read the tutorial, enter:
    vimtutor

which edits the file
/usr/doc/vim-common-5.7/tutor, /usr/share/vim/vim56/tutor/tutor, or /usr/share/doc/vim-common-5.7/tutor/tutor, depending on your distribution. (By this you should be getting an idea of the kinds of differences there are between different LINUX distributions.)
You will then see the following at the top of your screen:
    ===============================================================================
    =   W e l c o m e   t o   t h e   V I M   T u t o r   -   Version 1.4        =
    ===============================================================================

    Vim is a very powerful editor that has many commands, too many to explain in
    a tutor such as this.  This tutor is designed to describe enough of the
    commands that you will be able to easily use Vim as an all-purpose editor.

    The approximate time required to complete the tutor is 25-30 minutes,
You are supposed to edit the tutor file itself as practice, following through 6 lessons. Copy it first to your home directory.
Table 6.1 is a quick reference for vi . It contains only a few of the many hundreds of available commands but is enough to do all basic editing operations. Take note of the following:
• vi has several modes of operation. If you press i, you enter insert-mode. You then enter text as you would in a normal DOS text editor, but you cannot arbitrarily move the cursor and delete characters while in insert mode. Pressing Esc will get you out of insert mode, where you are not able to insert characters, but can now do things like arbitrary deletions and moves.
• Pressing : (the colon) gets you into command-line mode, where you can do operations like importing files, saving of the current file, searches, and text processing. Typically, you type : then some text, and then hit Enter.
• The word register is used below. A register is a hidden clipboard.
•
A useful tip is to enter :set ruler before doing anything. This shows, in the bottom right corner of the screen, what line and column you are on.
Table 6.1 Common vi commands

Key combination      Function
h or left-arrow      Cursor left.
l or right-arrow     Cursor right.
k or up-arrow        Cursor up.
j or down-arrow      Cursor down.
b                    Cursor left one word.
w                    Cursor right one word.
{                    Cursor up one paragraph.
}                    Cursor down one paragraph.
^                    Cursor to line start.
$                    Cursor to line end.
gg                   Cursor to first line.
G                    Cursor to last line.
Esc                  Get out of current mode.
i                    Start insert mode.
o                    Insert a blank line below the current line and then start insert mode.
O                    Insert a blank line above the current line and then start insert mode.
a                    Append (start insert mode after the current character).
R                    Replace (start insert mode with overwrite).
:wq                  Save (write) and quit.
:q                   Quit.
:q!                  Quit forced (without checking whether a save is required).
x                    Delete (delete under cursor and copy to register).
X                    Backspace (delete left of cursor and copy to register).
dd                   Delete line (and copy to register).
:j!                  Join line (remove newline at end of current line).
Ctrl-J               Same.
u                    Undo.
Ctrl-R               Redo.
de                   Delete to word end (and copy to register).
continues...
Table 6.1 (continued)

Key combination                              Function
db                                           Delete to word start (and copy to register).
d$                                           Delete to line end (and copy to register).
d^                                           Delete to line beginning (and copy to register).
dd                                           Delete current line (and copy to register).
2dd                                          Delete two lines (and copy to register).
5dd                                          Delete five lines (and copy to register).
p                                            Paste clipboard (insert register).
Ctrl-G                                       Show cursor position.
5G                                           Cursor to line five.
16G                                          Cursor to line sixteen.
G                                            Cursor to last line.
/search-string                               Search forwards for search-string.
?search-string                               Search backwards for search-string.
:,$s/search-string/replace-string/gc         Search and replace with confirmation starting at current line.
:-1,$s/search-string/replace-string/gc       Search and replace with confirmation starting at line below cursor.
:,$s/\<search-string\>/replace-string/gc     Search and replace whole words.
:8,22s/search-string/replace-string/g        Search and replace in lines 8 through 22 without confirmation.
:%s/search-string/replace-string/g           Search and replace whole file without confirmation.
:w filename                                  Save to file filename.
:5,20w filename                              Save lines 5 through 20 to file filename (use Ctrl-G to get line numbers if needed).
:5,$w! filename                              Force save lines 5 through to last line to file filename.
:r filename                                  Insert file filename.
v                                            Visual mode (start highlighting).
y                                            Copy highlighted text to register.
d                                            Delete highlighted text (and copy to register).
p                                            Paste clipboard (insert register).
Press v, then move cursor down a few lines, then, :s/search-string/replace-string/g
                                             Search and replace within highlighted text.
continues...
Table 6.1 (continued)

Key combination      Function
:help                Reference manual (open new window with help screen inside—probably the most important command here!).
:new                 Open new blank window.
:split filename      Open new window with filename.
:q                   Close current window.
:qa                  Close all windows.
Ctrl-W j             Move cursor to window below.
Ctrl-W k             Move cursor to window above.
Ctrl-W -             Make window smaller.
Ctrl-W +             Make window larger.
6.2 Syntax Highlighting

Something all UNIX users are used to (and have come to expect) is syntax highlighting. This basically means that a bash (explained later) script will be displayed with its keywords and strings colorized, instead of as plain, uniformly colored text.
Syntax highlighting is meant to preempt programming errors by colorizing correct keywords. You can set syntax highlighting in vim by using :syntax on (but not in vi). Enable syntax highlighting whenever possible—all good text editors support it.
6.3 Editors

Although UNIX has had full graphics capability for a long time now, most administration of low-level services still takes place inside text configuration files. Word processing is also best accomplished with typesetting systems that require creation of ordinary text files. (This is in spite of all the hype regarding the WYSIWYG (what you see is what you get) word processor. This book was typeset with LaTeX and the Cooledit text editor.)
Historically, the standard text editor used to be ed .
ed allows the user to see only one line of text of a file at a time (primitive by today’s standards). Today, ed is mostly used in its streaming version, sed .
ed has long since been superseded by vi .
The editor is the place you will probably spend most of your time, whether you are doing word processing, creating web pages, programming, or administrating. It is your primary interactive application.
Cooledit
(Read this if you "just-want-to-open-a-file-and-start-typing-like-under-Windows.") The best editor for day-to-day work is Cooledit, available from the Cooledit web page http://cooledit.sourceforge.net/. (As Cooledit's author, I am probably biased in this view.) Cooledit is a graphical (runs under X) editor. It is also a full-featured Integrated Development Environment (IDE) for whatever you may be doing. Those considering buying an IDE for development need look no further than installing Cooledit for free.

People coming from a Windows background will find Cooledit the easiest and most powerful editor to use. It requires no tutelage; just enter cooledit under X and start typing. Its counterpart in text mode is mcedit, which comes with the GNU Midnight Commander package mc. The text-mode version is inferior to other text-mode editors like emacs and jed but is adequate if you don't spend a lot of time in text mode. Cooledit has pull-down menus and intuitive keys. It is not necessary to read any documentation before using Cooledit.
Today vi is considered the standard. It is the only editor that will be installed by default on any UNIX system. vim is a "Charityware" version that (as usual) improves upon the original vi with a host of features. It is important to learn the basics of vi even if your day-to-day editor is not going to be vi. The reason is that every administrator is bound to one day have to edit a text file over some really slow network link and vi is the best for this.

On the other hand, new users will probably find vi unintuitive and tedious and will spend a lot of time learning and remembering how to do all the things they need to. I myself cringe at the thought of vi pundits recommending it to new UNIX users.

In defense of vi, it should be said that many people use it exclusively, and it is probably the only editor that really can do absolutely everything. It is also one of the few editors that has working versions and consistent behavior across all UNIX and non-UNIX systems. vim works on AmigaOS, AtariMiNT, BeOS, DOS, MacOS, OS/2, RiscOS, VMS, and Windows (95/98/NT4/NT5/2000) as well as all UNIX variants.
Emacs stands for Editor MACroS. It is the monster of all editors and can do almost everything one could imagine that a single software package might. It has become a de facto standard alongside vi .
Emacs is more than just a text editor. It is a complete system of using a computer for development, communications, file management, and things you wouldn't even imagine there are programs for. There is even an X Window System version available which can browse the web.
Other editors to watch out for are joe , jed , nedit , pico , nano , and many others that try to emulate the look and feel of well-known DOS, Windows, or Apple Mac development environments, or to bring better interfaces by using Gtk/Gnome or Qt/KDE.
The list gets longer each time I look. In short, don’t think that the text editors that your vendor has chosen to put on your CD are the best or only free ones out there. The same goes for other applications.
7. Shell Scripting
This chapter introduces you to the concept of computer programming. So far, you have entered commands one at a time. Computer programming is merely the idea of getting a number of commands to be executed, that in combination do some unique powerful function.
To execute a number of commands in sequence, create a file with a .sh extension, into which you will enter your commands. The .sh extension is not strictly necessary but serves as a reminder that the file contains special text called a shell script. From now on, the word script will be used to describe any sequence of commands placed in a text file. Create a text file myfile.sh, and make it executable with

    chmod 0755 myfile.sh

which allows the file to be run in the explained way.

Edit the file using your favorite text editor. The first line should be as follows with no whitespace (whitespace are tabs and spaces, and in some contexts, newline (end of line) characters):

    #!/bin/sh
The line dictates that the following program is a shell script, meaning that it accepts the same sort of commands that you have normally been typing at the prompt. Now enter a number of commands that you would like to be executed. You can start with:

    echo "Hi there"
    echo "what is your name? (Type your name here and press Enter)"
    read NM
    echo "Hello $NM"
Now, exit from your editor and type ./myfile.sh. This will execute (cause the computer to read and act on your list of commands, also called running the program) the file. Note that typing ./myfile.sh is no different from typing any other command at the shell prompt. Your file myfile.sh has in fact become a new UNIX command all of its own.

Note what the read command is doing. It creates a pigeonhole called NM, and then inserts text read from the keyboard into that pigeonhole. Thereafter, whenever the shell encounters NM, its contents are written out instead of the letters NM (provided you write a $ in front of it). We say that NM is a variable because its contents can vary.
You can use shell scripts like a calculator. Try

    echo "I will work out X*Y"
    echo "Enter X"
    read X
    echo "Enter Y"
    read Y
    echo "X*Y = $X*$Y = $[X*Y]"

The [ and ] mean that everything between them must be evaluated (substituted, worked out, or reduced to some simplified form) as a numerical expression (a sequence of numbers with +, -, *, etc.). You can, in fact, do a calculation at any time by typing such an expression directly at the prompt. (Note that the shell that you are using must allow such [ ] notation. On some UNIX systems you will have to use the expr command to get the same effect.)
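For example (a quick sketch with arbitrary numbers), either of the following typed at the prompt should print 20:

    echo $[3*6+2]
    expr 3 '*' 6 + 2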
The shell reads each line in succession from top to bottom: this is called program flow.
Now suppose you would like a command to be executed more than once—you would like to alter the program flow so that the shell reads particular commands repeatedly. The while command executes a sequence of commands many times. Here is an example (-le stands for less than or equal):

    N=1
    while test "$N" -le "10"
    do
        echo "Number $N"
        N=$[N+1]
    done
The N=1 creates a variable called N and places the number 1 into it. The while command executes all the commands between the do and the done repetitively until the test condition is no longer true (i.e., until N is greater than 10). The -le stands for less than or equal to. See test(1) (that is, run man 1 test) to learn about the other types of tests you can do on variables. Also be aware of how N is replaced with a new value that becomes 1 greater with each repetition of the while loop.
You should note here that each line is a distinct command—the commands are newline-separated. You can also have more than one command on a line by separating them with semicolons:

    N=1 ; while test "$N" -le "10"; do echo "Number $N"; N=$[N+1] ; done

(Try counting down from 10 with -ge (greater than or equal).) It is easy to see that shell scripts are extremely powerful, because any kind of command can be executed with conditions and loops.
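A sketch of that suggested countdown:

    N=10 ; while test "$N" -ge "1"; do echo "Number $N"; N=$[N-1] ; done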
The until statement is identical to while except that the reverse logic is applied: the loop runs until the condition becomes true. The same functionality can be achieved with -gt (greater than).
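For instance, a sketch of the same counting loop written with until:

    N=1 ; until test "$N" -gt "10"; do echo "Number $N"; N=$[N+1] ; done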
The for command also allows execution of commands multiple times. It works like this:

    for i in cows sheep chickens pigs
    do
        echo "$i is a farm animal"
    done
    echo -e "but\nGNUs are not farm animals"

The for command takes each string after the in, and executes the lines between do and done with i substituted for that string. The strings can be anything (even numbers) but are often file names.
The if command executes a number of commands if a condition is met (-gt stands for greater than, -lt stands for less than). The if command executes all the lines between the if and the fi ("if" spelled backwards).
    X=10
    Y=5
    if test "$X" -gt "$Y" ; then
        echo "$X is greater than $Y"
    fi
The if command in its full form can contain as much as:

    X=10
    Y=5
    if test "$X" -gt "$Y" ; then
        echo "$X is greater than $Y"
    elif test "$X" -lt "$Y" ; then
        echo "$X is less than $Y"
    else
        echo "$X is equal to $Y"
    fi
Now let us create a script that interprets its arguments. Create a new script called backup-lots.sh, containing:

    #!/bin/sh
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        cp $1 $1.BAK-$i
    done
Now create a file important_data with anything in it and then run ./backup-lots.sh important_data, which will copy the file 10 times with 10 different extensions. As you can see, the variable $1 has a special meaning—it is the first argument on the command-line. Now let's get a little bit more sophisticated (-e tests whether the file exists):

    #!/bin/sh
    if test "$1" = "" ; then
        echo "Usage: backup-lots.sh <filename>"
        exit
    fi
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE=$1.BAK-$i
        if test -e $NEW_FILE ; then
            echo "backup-lots.sh: **warning** $NEW_FILE"
            echo "    already exists - skipping"
        else
            cp $1 $NEW_FILE
        fi
    done
A loop that requires premature termination can include the break statement within it:

    #!/bin/sh
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE=$1.BAK-$i
        if test -e $NEW_FILE ; then
            echo "backup-lots.sh: **error** $NEW_FILE"
            echo "    already exists - exiting"
            break
        else
            cp $1 $NEW_FILE
        fi
    done

which causes program execution to continue on the line after the done. If two loops are nested within each other, then the command break 2 causes program execution to break out of both loops; and so on for values above 2.
The continue statement is also useful, for terminating the current iteration of the loop. This means that if a continue statement is encountered, execution will immediately continue from the top of the loop, thus ignoring the remainder of the body of the loop:

    #!/bin/sh
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE=$1.BAK-$i
        if test -e $NEW_FILE ; then
            echo "backup-lots.sh: **warning** $NEW_FILE"
            echo "    already exists - skipping"
            continue
        fi
        cp $1 $NEW_FILE
    done

Note that both break and continue work inside for, while, and until loops.
We know that the shell can expand file names when given wildcards. For instance, we can type ls *.txt to list all files ending with .txt. This applies equally well in any situation, for instance:

    #!/bin/sh
    for i in *.txt ; do
        echo "found a file:" $i
    done

The *.txt is expanded to all matching files. These files are searched for in the current directory. If you include an absolute path then the shell will search in that directory:

    #!/bin/sh
    for i in /usr/doc/*/*.txt ; do
        echo "found a file:" $i
    done

This example demonstrates the shell's ability to search for matching files and expand an absolute path.
The case statement can make a potentially complicated program very short. It is best explained with an example.

    #!/bin/sh
    case $1 in
        --test|-t)
            echo "you used the --test option"
            exit 0
            ;;
        --help|-h)
            echo "Usage:"
            echo "    myprog.sh [--test|--help|--version]"
            exit 0
            ;;
        --version|-v)
            echo "myprog.sh version 0.0.1"
            exit 0
            ;;
        -*)
            echo "No such option $1"
            echo "Usage:"
            echo "    myprog.sh [--test|--help|--version]"
            exit 1
            ;;
    esac

Above you can see that we are trying to process the first argument to a program. It can be one of several options, so using if statements will result in a long program. The case statement allows us to specify several possible statement blocks depending on the value of a variable. Note how each statement block is separated by ;;. The strings before the ) are glob expression matches. The first successful match causes that block to be executed. The | symbol enables us to enter several possible glob expressions.
So far, our programs execute mostly from top to bottom. Often, code needs to be repeated, but it is considered bad programming practice to repeat groups of statements that have the same functionality. Function definitions provide a way to group statement blocks into one. A function groups a list of commands and assigns it a name. For example:

    #!/bin/sh

    function usage ()
    {
        echo "Usage:"
        echo "    myprog.sh [--test|--help|--version]"
    }

    case $1 in
        --test|-t)
            echo "you used the --test option"
            exit 0
            ;;
        --help|-h)
            usage
            ;;
        --version|-v)
            echo "myprog.sh version 0.0.2"
            exit 0
            ;;
        -*)
            echo "Error: no such option $1"
            usage
            exit 1
            ;;
    esac
    echo "You typed \"$1\" on the command-line"

Wherever the word usage appears, it is effectively substituted for the two lines inside the { and }. There are obvious advantages to this approach: if you would like to change the program usage description, you only need to change it in one place in the code. Good programs use functions so liberally that they never have more than 50 lines of program code in a row.
Most programs we have seen can take many command-line arguments, sometimes in any order. Here is how we can make our own shell scripts with this functionality. Within a script, the command-line arguments can be reached with $1, $2, $3, and so on.
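As a small illustration (the script name and wording here are my own, not fixed by the text), a script that simply prints its first three arguments:

    #!/bin/sh
    echo "first: $1, second: $2, third: $3"

Running it as, say, ./showargs.sh cows sheep pigs prints first: cows, second: sheep, third: pigs.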
Now we need to loop through each argument and decide what to do with it. A script like

    for i in $1 $2 $3 $4 ; do
        <statements>
    done

doesn't give us much flexibility. The shift keyword is meant to make things easier. It shifts up all the arguments by one place so that $1 gets the value of $2, $2 gets the value of $3, and so on. (The != below tests that "$1" is not equal to "", that is, whether it is empty and hence past the last argument.) Try
    while test "$1" != "" ; do
        echo $1
        shift
    done

and run the program with lots of arguments.
Now we can put any sort of condition statements within the loop to process the arguments in turn:

    #!/bin/sh

    function usage ()
    {
        echo "Usage:"
        echo "    myprog.sh [--test|--help|--version] [--echo <text>]"
    }

    while test "$1" != "" ; do
        case $1 in
            --echo|-e)
                echo "$2"
                shift
                ;;
            --test|-t)
                echo "you used the --test option"
                ;;
            --help|-h)
                usage
                exit 0
                ;;
            --version|-v)
                echo "myprog.sh version 0.0.3"
                exit 0
                ;;
            -*)
                echo "Error: no such option $1"
                usage
                exit 1
                ;;
        esac
        shift
    done

myprog.sh can now run with multiple arguments on the command-line.
Whereas $1, $2, $3, etc. expand to the individual arguments passed to the program, $@ expands to all arguments. This behavior is useful for passing all remaining arguments onto a second command. For instance,

    if test "$1" = "--special" ; then
        shift
        myprog2.sh "$@"
    fi

$0 means the name of the program itself and not any command-line argument. It is the command used to invoke the current program. In the above cases, it is ./myprog.sh. Note that $0 is immune to shift operations.
Single forward quotes ' protect the enclosed text from the shell. In other words, you can place any odd characters inside forward quotes, and the shell will treat them literally and reproduce your text exactly. For instance, you may want to echo an actual $ to the screen to produce an output like costs $1000. You can use echo 'costs $1000' instead of echo "costs $1000".
Double quotes " have the opposite sense of single quotes. They allow all shell interpretations to take place inside them. The reason they are used at all is only to group text containing whitespace into a single word, because the shell will usually break up text along whitespace boundaries. Try

    for i in "henry john mary sue" ; do
        echo "$i is a person"
    done

compared to

    for i in henry john mary sue ; do
        echo $i is a person
    done
Backward quotes ` have a special meaning to the shell. When a command is inside backward quotes it means that the command should be run and its output substituted in place of the backquotes. Take, for example, the cat command. Create a small file, to_be_catted, with only the text daisy inside it, and create a shell script containing:

    X=`cat to_be_catted`
    echo $X

The value of X is set to the output of the cat command, which in this case is the word daisy. This is a powerful tool. Consider the expr command:

    X=`expr 100 + 50 '*' 3`
    echo $X

Hence we can use expr and backquotes to do mathematics inside our shell script.
Here is a function to calculate factorials. Note how we enclose the * in forward quotes. They prevent the shell from expanding the * into matching file names:

    function factorial ()
    {
        N=$1
        A=1
        while test $N -gt 0 ; do
            A=`expr $A '*' $N`
            N=`expr $N - 1`
        done
        echo $A
    }
We can see that the square braces used further above can actually suffice for most of the times where we would like to use expr. (However, the $[ ] notation is an extension of the GNU shells and is not a standard feature on all variants of UNIX.) We can now run factorial 20 and see the output. If we want to assign the output to a variable, we can do this with X=`factorial 20`.
Note that another notation which gives the effect of a backward quote is $(command), which is identical to `command`. Here, I will always use the older backward quote style.
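As a quick sketch, the expr example above could equally have been written with the $( ) form:

    X=$(expr 100 + 50 '*' 3)
    echo $X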
The ability to use pipes is one of the powers of UNIX. This is one of the principal deficiencies of some non-UNIX systems. Pipes used on the command-line, as explained in this chapter, are a neat trick, but pipes used inside C programs enormously simplify program interaction. Without pipes, huge amounts of complex and buggy code usually needs to be written to perform simple tasks. It is hoped that this chapter will give the reader an idea of why UNIX is such a ubiquitous and enduring standard.
The commands grep, echo, df, and so on print some output to the screen. In fact, what is happening on a lower level is that they are printing characters one by one into a theoretical data stream (also called a pipe) called the stdout pipe. The shell itself performs the action of reading those characters one by one and displaying them on the screen. The word pipe itself means exactly that: a program places data in the one end of a funnel while another program reads that data from the other end. Pipes allow two separate programs to perform simple communications with each other. In this case, the program is merely communicating with the shell in order to display some output.
The same is true with the cat command explained previously. This command, when run with no arguments, reads from the stdin pipe. By default, this pipe is the keyboard. One further pipe is the stderr pipe, to which a program writes error messages.
It is not possible to see whether a program message is caused by the program writing to its stderr or stdout pipe because usually both are directed to the screen. Good programs, however, always write to the appropriate pipes to allow output to be specially separated for diagnostic purposes if need be.
Create a text file with lots of lines that contain the word GNU and one line that contains the word GNU as well as the word Linux. Then run grep GNU myfile.txt. The result is printed to stdout as usual.
Now try grep GNU myfile.txt > gnu_lines.txt. What is happening here is that the output of the grep command is being redirected into a file. The > gnu_lines.txt tells the shell to create a new file gnu_lines.txt and to fill it with any output from stdout instead of displaying the output as it usually does. If the file already exists, it will be truncated (shortened to zero length).
Now suppose you want to append further output to this file. Using >> instead of > does not truncate the file, but appends output to it. Try

    echo "morestuff" >> gnu_lines.txt

then view the contents of gnu_lines.txt.
The real power of pipes is realized when one program can read from the output of another program. Consider the grep command, which reads from stdin when given no arguments; run grep with one argument on the command-line:

    grep GNU
    A line without that word in it
    Another line without that word in it
    A line with the word GNU in it
    A line with the word GNU in it
    I have the idea now
    ^C
    #

grep's default behavior is to read from stdin when no files are given. As you can see, it is doing its usual work of printing lines that have the word GNU in them. Hence, lines containing GNU will be printed twice—as you type them in and again when grep reads them and decides that they contain GNU.
Now try grep GNU myfile.txt | grep Linux. The first grep outputs all lines with the word GNU in them to stdout. The | specifies that all stdout is to be fed as stdin (just as we typed it in above) into the next command, which is also a grep command. The second grep command scans that data for lines with the word Linux in them. grep is often used this way as a filter (something that screens data) and can be used multiple times, for example,

    grep L myfile.txt | grep i | grep n | grep u | grep x
The < character redirects the contents of a file in place of stdin. In other words, the contents of a file replace what would normally come from the keyboard.
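A minimal illustration, reusing the myfile.txt from the examples above (it behaves the same as naming the file on the command-line):

    grep GNU < myfile.txt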
In Chapter 5 we used grep on a dictionary to demonstrate regular expressions. This is how a dictionary of words can be created (your dictionary might be under /var/share/ or under /usr/lib/aspell instead):

    cat /usr/lib/ispell/english.hash | strings | tr 'A-Z' 'a-z' \
        | grep '^[a-z]' | sort -u > mydict

(A backslash \ as the last character on a line indicates that the line is to be continued. You can leave out the \, but then you must leave out the newline as well; this is known as line continuation.)
The file english.hash contains the UNIX dictionary normally used for spell checking. With a bit of filtering, you can create a dictionary that will make solving crossword puzzles a breeze. First, we use the command strings, explained previously, to extract readable bits of text. Here we are using its alternate mode of operation, where it reads from stdin when no files are specified on its command-line. The tr command (abbreviated from translate—see tr(1)) then converts upper to lower case. The grep command then filters out lines that do not start with a letter. Finally, the sort command sorts the words in alphabetical order. The -u option stands for unique, and specifies that duplicate lines of text should be stripped. Now try less mydict.
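For instance, to list the words in mydict that fit a five-letter crossword pattern (a small sketch, assuming mydict was built as above):

    grep '^s..th$' mydict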
Try the command ls nofile.txt > A. We expect that ls will give an error message if the file doesn't exist. The error message is, however, displayed and not written into the file A. The reason is that ls has written its error message to stderr while > has only redirected stdout. The way to get both stdout and stderr to go to the same file is to use a redirection operator. As far as the shell is concerned, stdout is called 1 and stderr is called 2, and commands can be appended with a redirection like 2>&1 to dictate that stderr is to be mixed into the output of stdout. The actual words stderr and stdout are only used in C programming, where the numbers 1 and 2 are known as file numbers or file descriptors. Try the following:
    touch existing_file
    rm -f non-existing_file
    ls existing_file non-existing_file

ls will output two lines: a line containing a listing for the file existing_file and a line containing an error message to explain that the file non-existing_file does not exist. The error message would have been written to stderr, or file descriptor number 2, and the remaining line would have been written to stdout, or file descriptor number 1.
Next we try

    ls existing_file non-existing_file 2>A
    cat A

Now A contains the error message, while the remaining output came to the screen. Now try

    ls existing_file non-existing_file 1>A
    cat A

The notation 1>A is the same as >A because the shell assumes that you are referring to file descriptor 1 when you don't specify a file descriptor. Now A contains the stdout output, while the error message has been redirected to the screen. Now try

    ls existing_file non-existing_file 1>A 2>&1
    cat A
Now A contains both the error message and the normal output. >& is called a redirection operator: x>&y tells the shell to write pipe x into pipe y. Redirections are processed from left to right on the command-line. Hence, the above command first redirects stdout to the file A and then mixes stderr into stdout, which at that point already refers to A, so both end up in the file.
Finally,

    ls existing_file non-existing_file 2>A 1>&2
    cat A

has the same effect, except that here we are doing it the other way around: first redirecting stderr into the file A and then redirecting stdout into stderr (which by then refers to A).
To see what happens if we swap the order, we can try

    ls existing_file non-existing_file 2>&1 1>A
    cat A

which first duplicates stderr onto stdout (still the screen at that point) and then redirects stdout into the file A. This command will therefore not mix stderr and stdout in A: the error message appears on the screen and only the normal output lands in the file, because stderr was redirected before stdout was pointed at A.
ed used to be the standard text editor for UNIX. It is cryptic to use but is compact and programmable. sed stands for stream editor and is the only incarnation of ed that is commonly used today.
sed allows editing of files non-interactively. In the way that grep can search for words and filter lines of text, sed can do search-replace operations and insert and delete lines into text files. sed is one of those programs with no man page to speak of. Do info sed to see sed's comprehensive info pages with examples.
The most common usage of sed is to replace words in a stream with alternative words. sed reads from stdin and writes to stdout. Like grep, it is line buffered, which means that it reads one line in at a time and then writes that line out again after performing whatever editing operations. Replacements are typically done with

    cat <file> | sed -e 's/<search-regexp>/<replace-text>/<option>' \
        > <resultfile>

where <search-regexp> is a regular expression, <replace-text> is the text you would like to replace each occurrence with, and <option> is nothing or g, which means to replace every occurrence in the same line (usually sed just replaces the first occurrence of the regular expression in each line). (There are other <option>s; see the sed info page.) For demonstration, type

    sed -e 's/e/E/g'

and type out a few lines of English text.
This section explains how to do the apparently complex task of moving text around within lines. Consider, for example, the output of ls: say you want to automatically strip out only the size column. sed can do this sort of editing if you use the special \( \) notation to group parts of the regular expression together. Consider the following example:

    sed -e 's/\(\<[^ ]*\>\)\([ ]*\)\(\<[^ ]*\>\)/\3\2\1/g'

Here sed is searching for the expression \<.*\>[ ]*\<.*\>. From the chapter on regular expressions, we can see that it matches a whole word, an arbitrary amount of whitespace, and then another whole word. The \( \) groups these three parts so that they can be referred to in <replace-text>. Each part of the regular expression inside \( \) is called a subexpression of the regular expression. Each subexpression is numbered—namely, \1, \2, etc. Hence, \1 in <replace-text> is the first \<[^ ]*\>, \2 is [ ]*, and \3 is the second \<[^ ]*\>.
Now test to see what happens when you run this:

    sed -e 's/\(\<[^ ]*\>\)\([ ]*\)\(\<[^ ]*\>\)/\3\2\1/g'
    GNU Linux is cool
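Each adjacent pair of words should come out swapped, so for the line typed above you would expect output along the lines of:

    Linux GNU cool is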
To return to our ls example (note that this is just an example; to count file sizes you should instead use the du command), think about how we could sum the byte sizes of all the files in a directory:

    expr 0 `ls -l | grep '^-' | \
        sed 's/^\([^ ]*[ ]*\)\{4,4\}\([0-9]*\).*$/ + \2/'`

We know that ls -l output lines start with - for ordinary files. So we use grep to strip lines not starting with -. If we do an ls -l, we see that the output is divided into four columns of stuff we are not interested in, and then a number indicating the size of the file. A column (or field) can be described by the regular expression [^ ]*[ ]*, that is, a length of text with no whitespace, followed by a length of whitespace. There are four of these, so we bracket it with \( \) and then use the \{ \} notation to specify that we want exactly 4. After that come our number characters, [0-9]*, and then any trailing characters, .*$. Notice here that we have neglected to use \< \> notation to indicate whole words. The reason is that sed tries to match the maximum number of characters legally allowed and, in the situation we have here, that has exactly the same effect.
If you haven't yet figured it out, we are trying to reduce that column of byte sizes to something like

     + 438
     + 1525
     + 76
     + 92146

so that expr can understand it. Hence, we replace each line with subexpression \2 and a leading + sign. Backquotes give the output of this to expr, which studiously sums them, ignoring any newline characters as though the summation were typed in on a single line. There is one minor problem here: the first line would contain a + with nothing before it, which would cause expr to complain. To get around this, we just add a 0 to the front, so that the expression becomes 0 + . . . .
sed can perform a few operations that make it easy to write scripts that edit configuration files for you. For instance,

    sed -e '7a\
    an extra line.\
    another one.\
    one more.'

appends three lines after line 7, whereas

    sed -e '7i\
    an extra line.\
    another one.\
    one more.'

inserts three lines before line 7. Then

    sed -e '3,5D'

deletes lines 3 through 5.
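These commands read stdin and write stdout just like the earlier ones; to edit an actual file you would typically name it and redirect the result (the file names here are only placeholders of mine):

    sed -e '3,5D' oldconfig.txt > newconfig.txt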
In sed terminology, the numbers here are called addresses, which can also be regular expression matches. To demonstrate:

    sed -e '/Dear Henry/,/Love Jane/D'

deletes all the lines starting from a line matching the regular expression Dear Henry up to a line matching Love Jane (or to the end of the file if no such line exists). This behavior applies just as well to insertions:

    sed -e '/Love Jane/i\
    Love Carol\
    Love Beth'
Note that the $ symbol indicates the last line:

    sed -e '$i\
    The new second last line\
    The new last line.'

and finally, the negation symbol, !, is used to match all lines not specified; for instance,

    sed -e '7,11!D'

deletes all lines except lines 7 through 11.
80
From this chapter you will get an idea about what is happening under the hood of your
U
NIX system, but go have some coffee first.
On UNIX, when you run a program (like any of the shell commands you have been using), the actual computer instructions are read from a file on disk from one of the bin/ directories and placed in RAM. The program is then executed in memory and becomes a process. A process is some command/program/shell script that is being run (or executed) in memory. When the process has finished running, it is removed from memory. There are usually about 50 processes running simultaneously at any one time on a system with one person logged in. The CPU hops between each of them to give a share of its execution time (time given to carry out the instructions of a particular program; note this is in contrast to Windows or DOS, where the program itself has to allow the others a share of the CPU: under UNIX, the process has no say in the matter).
Each process is given a process number called the PID (process ID). Besides the memory actually occupied by the executable, the process itself seizes additional memory for its operations.
In the same way that a file is owned by a particular user and group, a process also has an owner—usually the person who ran the program. Whenever a process tries to access a file, its ownership is compared to that of the file to decide if the access is permissible. Because all devices are files, the only way a process can do anything is through a file, and hence file permission restrictions are the only kind of restrictions ever needed on UNIX (there are some exceptions to this). This is how UNIX access control and security works.
The center of this operation is called the UNIX kernel. The kernel is what actually does the hardware access, execution, allocation of process IDs, sharing of CPU time, and ownership management.
Log in on a terminal and type the command ps. You should get some output like:

      PID TTY STAT TIME COMMAND
     5995   2 S    0:00 /bin/login -- myname
     5999   2 S    0:00 -bash
     6030   2 R    0:00 ps

ps with no options shows three processes to be running. These are the only three processes visible to you as a user, although there are other system processes not belonging to you. The first process was the program that logged you in by displaying the login prompt and requesting a password. It then ran a second process called bash, the Bourne Again shell (the Bourne shell was the original UNIX shell), where you have been typing commands. Finally, you ran ps, which must have found itself when it checked which processes were running, but then exited immediately afterward.
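As an aside (an extra illustration of mine, not part of the original example), you can list every process on the system, not just your own, with something like:

    ps awx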
The shell has many facilities for controlling and executing processes—this is called job control. Create a small script called proc.sh:

    #!/bin/sh
    echo "proc.sh: is running"
    sleep 1000

Make the script executable with chmod 0755 proc.sh and then run it with ./proc.sh. The shell blocks, waiting for the process to exit. Now press ^Z. This will cause the process to stop (that is, pause but not terminate). Now do a ps again. You will see your script listed. However, it is not presently running because it is in the condition of being stopped. Type bg (for background). The script will now be "unstopped" and run in the background. You can now try to run other processes in the meantime. Type fg, and the script returns to the foreground. You can then type ^C to interrupt the process.
Create a program that does something a little more interesting:

    #!/bin/sh
    echo "proc.sh: is running"
    while true ; do
        echo -e '\a'
        sleep 2
    done

Now perform the ^Z, bg, fg, and ^C operations from before. To put a process immediately into the background, you can use:

    ./proc.sh &
The JOB CONTROL section of the bash man page (bash(1)) looks like this (the footnotes are mine; thanks to Brian Fox and Chet Ramey for the quoted material):
JOB CONTROL
Job control refers to the ability to selectively stop (suspend) the execution of processes and continue (resume) their execution at a later point. A user typically employs this facility via an interactive interface supplied jointly by the system's terminal driver and bash.
The shell associates a job with each pipeline. It keeps a table of currently executing jobs, which may be listed with the jobs command. When bash starts a job asynchronously (in the background), it prints a line that looks like:

    [1] 25647

indicating that this job is job number 1 and that the process ID of the last process in the pipeline associated with this job is 25647. [What does this mean? It means that each time you execute something in the background, it gets its own unique number, called the job number.] All of the processes in a single pipeline are members of the same job. Bash uses the job abstraction as the basis for job control.
To facilitate the implementation of the user interface to job control, the system maintains the notion of a current terminal process group ID. Members of this process group (processes whose process group ID is equal to the current terminal process group ID) receive keyboard-generated signals such as SIGINT. These processes are said to be in the foreground. Background processes are those whose process group ID differs from the terminal's; such processes are immune to keyboard-generated signals. Only foreground processes are allowed to read from or write to the terminal. Background processes which attempt to read from (write to) the terminal are sent a SIGTTIN (SIGTTOU) signal by the terminal driver, which, unless caught, suspends the process.
If the operating system on which bash is running supports job control, bash allows you to use it. Typing the suspend character (typically ^Z, Control-Z) while a process is running causes that process to be stopped and returns you to bash. Typing the delayed suspend character (typically ^Y, Control-Y) causes the process to be stopped when it attempts to read input from the terminal, and control to be returned to bash. You may then manipulate the state of this job, using the bg command to continue it in the background, the fg command to continue it in the foreground, or the kill command to kill it. A ^Z takes effect immediately, and has the additional side effect of causing pending output and typeahead to be discarded.
There are a number of ways to refer to a job in the shell. The character % introduces a job name. Job number n may be referred to as %n. A job may also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. For example, %ce refers to a stopped ce job. If a prefix matches more than one job, bash reports an error. Using %?ce, on the other hand, refers to any job containing the string ce in its command line. If the substring matches more than one job, bash reports an error. The symbols %% and %+ refer to the shell's notion of the current job, which is the last job stopped while it was in the foreground. The previous job may be referenced using %-. In output pertaining to jobs (e.g., the output of the jobs command), the current job is always flagged with a +, and the previous job with a -.
Simply naming a job can be used to bring it into the foreground: %1 is a synonym for "fg %1", bringing job 1 from the background into the foreground. Similarly, "%1 &" resumes job 1 in the background, equivalent to "bg %1".
The shell learns immediately whenever a job changes state. Normally, bash waits until it is about to print a prompt before reporting changes in a job's status so as to not interrupt any other output. If the -b option to the set builtin command is set, bash reports such changes immediately. (See also the description of the notify variable under Shell Variables above.)
If you attempt to exit bash while jobs are stopped, the shell prints a message warning you. You may then use the jobs command to inspect their status. If you do this, or try to exit again immediately, you are not warned again, and the stopped jobs are terminated.
To terminate a process, use the kill command:

    kill <PID>
The kill command actually sends a termination signal to the process. The sending of a signal simply means that the process is asked to execute one of 30 predefined functions. In some cases, developers would not have bothered to define a function for a particular signal number (defining one is called catching the signal), in which case the kernel will substitute the default behavior for that signal. The default behavior for a signal is usually to ignore the signal, to stop the process, or to terminate the process. The default behavior for the termination signal is to terminate the process.
To send a specific signal to a process, you can name the signal on the command-line or use its numerical equivalent:

    kill -SIGTERM 12345

or

    kill -15 12345

which is the signal that kill normally sends when none is specified on the command-line.
To unconditionally terminate a process:

    kill -SIGKILL 12345

or

    kill -9 12345

which should only be used as a last resort. Processes are prohibited from ever catching the SIGKILL signal.
It is cumbersome to have to constantly look up the PID of a process. Hence the GNU utilities have a command, killall, that sends a signal to all processes of the same name:

    killall -<signal> <process_name>
This command is useful when you are sure that there is only one of a process running, either because no one else is logged in on the system or because you are not logged in as superuser.
Note that on other UNIX systems, the killall command kills all the processes that you are allowed to kill. If you are root, this action would crash the machine.
The full list of signals can be gotten from signal(7), and in the file /usr/include/asm/signal.h.

SIGHUP (1)  Hang up. If the terminal becomes disconnected from a process, this signal is sent automatically to the process. Sending a process this signal often causes it to reread its configuration files, so it is useful instead of restarting the process. Always check the man page to see if a process has this behavior.
SIGINT (2)  Interrupt from keyboard. Issued if you press ^C.

SIGQUIT (3)  Quit from keyboard. Issued if you press the quit key (usually ^\).

SIGFPE (8)  Floating point exception. Issued automatically to a program performing some kind of illegal mathematical operation.

SIGKILL (9)  Kill signal. This is one of the signals that can never be caught by a process. If a process gets this signal it must quit immediately and will not perform any clean-up operations (like closing files or removing temporary files). You can send a process a SIGKILL signal if there is no other means of destroying it.

SIGUSR1 (10), SIGUSR2 (12)  User signals. These signals are available to developers when they need extra functionality. For example, some processes begin logging debug messages when you send them SIGUSR1.

SIGSEGV (11)  Segmentation violation. Issued automatically when a process tries to access memory outside of its allowable address space, equivalent to a Fatal Exception or General Protection Fault under Windows. Note that programs with bugs or programs in the process of being developed often get these signals. A program receiving a SIGSEGV, however, can never cause the rest of the system to be compromised. If the kernel itself were to receive such an error, it would cause the system to come down, but such an event is extremely rare.

SIGPIPE (13)  Pipe died. A program was writing to a pipe, the other end of which is no longer available.

SIGTERM (15)  Terminate. Causes the program to quit gracefully.

SIGCHLD (17)  Child terminate. Sent to a parent process every time one of its spawned processes dies.
All processes are allocated execution time by the kernel. If all processes were allocated the same amount of time, performance would obviously get worse as the number of processes increased. The kernel uses heuristics (sets of rules) to guess how much time each process should be allocated. The kernel tries to be fair—two users competing for CPU usage should both get the same amount.
Most processes spend their time waiting for either a key press, some network input, some device to send data, or some time to elapse. They hence do not consume CPU.
On the other hand, when more than one process runs flat out, it can be difficult for the kernel to decide which should be given greater priority. What if a process is doing some operation more important than another process? How does the kernel tell? The answer is the UNIX feature of scheduling priority or niceness. Scheduling priority ranges from -20 (most favored) to +19 (least favored). You can set a process's niceness with the renice command:

    renice <priority> <pid>
    renice <priority> -u <user>
    renice <priority> -g <group>
A typical example is the SETI program. (SETI stands for Search for Extraterrestrial Intelligence. SETI is an initiative funded by various obscure sources to scan the skies for radio signals from other civilizations. The data that SETI gathers has to be intensively processed. SETI distributes part of that data to anyone who wants to run a seti program in the background. This puts the idle time of millions of machines to "good" use. There is even a SETI screen-saver that has become quite popular. Unfortunately for the colleague in my office, he runs seti at -19 instead of +19 scheduling priority, so nothing on his machine works right. On the other hand, I have inside information that the millions of other civilizations in this galaxy and others are probably not using radio signals to communicate at all :-) ) Set its priority with

    renice +19 <pid>

to make it disrupt your machine as little as possible.
Note that nice values have the reverse meaning that you would expect: +19 means a process that eats little CPU, while -19 is a process that eats lots. Only superuser can set processes to negative nice values.
Mostly, multimedia applications and some device utilities are the only processes that need negative renicing, and most of these will have their own command-line options to set the nice value. See, for example, cdrecord(1) and mikmod(1); a negative nice value will prevent skips in your playback.
(LINUX will soon have so-called real time process scheduling. This is a kernel feature that reduces scheduling latency, that is, the gaps between CPU execution time of a process, as well as the time it takes for a process to wake. There are already some kernel patches that accomplish this goal.)
Also useful are the -u and -g options, which set the priority of all the processes that a user or group owns.
Further, we have the nice command, which starts a program under a defined niceness relative to the current nice value of the present user. Note that nice starts a new command rather than adjusting a running process (use renice for that). For example,

    nice -n +<adjustment> <command>
    nice -n -<adjustment> <command>
Finally, the snice command can both display and set the current niceness. This command doesn't seem to work on my machine.

    snice -v <pid>
The top command sorts all processes by their CPU and memory consumption and displays the top twenty or so in a table. Use top whenever you want to see what's hogging your system. top -q -d 2 is useful for scheduling the top command itself to a high priority, so that it is sure to refresh its listing without lag. top -n 1 -b > top.txt lists all processes once, and top -n 1 -b -p <pid> prints information on one process. top has some useful interactive responses to key presses:
f  Shows a list of displayed fields that you can alter interactively. By default the only fields shown are USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND, which is usually what you are most interested in. (The field meanings are given below.)

r  Renices a process.

k  Kills a process.
The top man page describes the field meanings. Some of these are confusing and assume knowledge of the internals of C programs. The main question people ask is: How much memory is a process using? The answer is given by the RSS field, which stands for Resident Set Size. RSS means the amount of RAM that a process consumes alone. The following examples show totals for all processes running on my system (which had 65536 kilobytes of RAM at the time). They represent the total of the SIZE, RSS, and SHARE fields, respectively.
    echo `echo '0 ' ; top -q -n 1 -b | sed -e '1,/PID *USER *PRI/D' | \
        awk '{print "+" $5}' | sed -e 's/M/\\*1024/'` | bc
    68016

    echo `echo '0 ' ; top -q -n 1 -b | sed -e '1,/PID *USER *PRI/D' | \
        awk '{print "+" $6}' | sed -e 's/M/\\*1024/'` | bc
    58908

    echo `echo '0 ' ; top -q -n 1 -b | sed -e '1,/PID *USER *PRI/D' | \
        awk '{print "+" $7}' | sed -e 's/M/\\*1024/'` | bc

The SIZE represents the total memory usage of a process. RSS is the same, but excludes memory not needing actual RAM (this would be memory swapped to the swap partition). SHARE is the amount shared between processes.
Other fields are described by the top man page (quoted verbatim) as follows:
uptime
This line displays the time the system has been up, and the three load averages for the system. The load averages are the average number of processes ready to run during the last 1, 5 and 15 minutes. This line is just like the output of uptime(1). The uptime display may be toggled by the interactive l command.
processes
The total number of processes running at the time of the last update.
This is also broken down into the number of tasks which are running, sleeping, stopped, or undead. The processes and states display may be toggled by the t interactive command.
CPU states
Shows the percentage of CPU time in user mode, system mode, niced tasks, and idle. (Niced tasks are only those whose nice value is negative.) Time spent in niced tasks will also be counted in system and user time, so the total will be more than 100%. The processes and states display may be toggled by the t interactive command.
Mem
Statistics on memory usage, including total available memory, free memory, used memory, shared memory, and memory used for buffers. The display of memory information may be toggled by the m interactive command.
Swap
Statistics on swap space, including total swap space, available swap space, and used swap space. This and Mem are just like the output of free(1).
PID
The process ID of each task.
PPID
The parent process ID of each task.
UID
The user ID of the task’s owner.
USER
The user name of the task’s owner.
PRI
The priority of the task.
NI
The nice value of the task. Negative nice values are higher priority.
SIZE
The size of the task’s code plus data plus stack space, in kilobytes, is shown here.
TSIZE
The code size of the task. This gives strange values for kernel processes and is broken for ELF processes.
DSIZE
Data + Stack size. This is broken for ELF processes.
TRS
Text resident size.
SWAP
Size of the swapped out part of the task.
D
Size of pages marked dirty.
LIB
Size of library pages used. This does not work for ELF processes.
RSS
The total amount of physical memory used by the task, in kilobytes, is shown here. For ELF processes used library pages are counted here, for a.out processes not.
SHARE
The amount of shared memory used by the task is shown in this column.
STAT
The state of the task is shown here. The state is either S for sleeping, D for uninterruptible sleep, R for running, Z for zombies, or T for stopped or traced. These states are modified by a trailing < for a process with negative nice value, N for a process with positive nice value, W for a swapped out process (this does not work correctly for kernel processes).
WCHAN
depending on the availability of either /boot/psdatabase or the kernel link map /boot/System.map this shows the address or the name of the kernel function the task currently is sleeping in.
TIME
Total CPU time the task has used since it started. If cumulative mode is on, this also includes the CPU time used by the process’s children which have died. You can set cumulative mode with the S command line option or toggle it with the interactive command S. The header line will then be changed to
CTIME.
%CPU
The task’s share of the CPU time since the last screen update, expressed as a percentage of total CPU time per processor.
%MEM
The task’s share of the physical memory.
COMMAND
The task’s command name, which will be truncated if it is too long to be displayed on one line. Tasks in memory will have a full command line, but swapped-out tasks will only have the name of the program in parentheses (for example, ”(getty)”).
Each process that runs does so with the knowledge of several var=value text pairs. All this means is that a process can look up the value of some variable that it may have inherited from its parent process. The complete list of these text pairs is called the environment of the process, and each var is called an environment variable. Each process has its own environment, which is copied from the parent process's environment.
After you have logged in and have a shell prompt, the process you are using (the shell itself) is just like any other process, with an environment containing environment variables. To get a complete list of these variables, just type:

    set
This command is useful for finding the value of an environment variable whose name you are unsure of:

    set | grep <regexp>
Try set | grep PATH to see the PATH environment variable discussed previously.
The purpose of an environment is just to have an alternative way of passing parameters to a program (in addition to command-line arguments). The difference is that an environment is inherited from one process to the next: for example, a shell might have a certain variable set and may run a file manager, which may run a word processor. The word processor inherited its environment from file manager which inherited its environment from the shell. If you had set an environment variable PRINTER within the shell, it would have been inherited all the way to the word processor, thus eliminating the need to separately configure which printer the word processor should use.
Try

    X="Hi there"
    echo $X

You have set a variable. But now run

    bash

You have now run a new process which is a child of the process you were just in. Type

    echo $X

You will see that X is not set. The reason is that the variable was not exported as an environment variable and hence was not inherited. Now type

    exit

which breaks to the parent process. Then type

    export X
    bash
    echo $X

You will see that the new bash now knows about X.
Above we are setting an arbitrary variable for our own use. bash (and many other programs) automatically set many of their own environment variables. The bash man page lists these (when it talks about unsetting a variable, it means using the command unset <variable>). You may not understand some of these at the moment, but they are included here as a complete reference for later.
The following is quoted verbatim from the bash man page (thanks to Brian Fox and Chet Ramey for this material). You will see that some variables are of the type that provide special information and are read but never set, whereas other variables configure behavioral features of the shell (or other programs) and can be set at any time.
Shell Variables
The following variables are set by the shell:
PPID
The process ID of the shell’s parent.
PWD
The current working directory as set by the cd command.
OLDPWD
The previous working directory as set by the cd command.
REPLY
Set to the line of input read by the read builtin command when no arguments are supplied.
UID
Expands to the user ID of the current user, initialized at shell startup.
EUID
Expands to the effective user ID of the current user, initialized at shell startup.
BASH
Expands to the full pathname used to invoke this instance of
bash
.
BASH VERSION
Expands to the version number of this instance of
bash
.
SHLVL
Incremented by one each time an instance of
bash
is started.
RANDOM
Each time this parameter is referenced, a random integer is generated.
The sequence of random numbers may be initialized by assigning a value to
RANDOM
. If
RANDOM
is unset, it loses its special properties, even if it is subsequently reset.
SECONDS
Each time this parameter is referenced, the number of seconds since shell invocation is returned. If a value is assigned to SECONDS, the value returned upon subsequent references is the number of seconds since the assignment plus the value assigned. If SECONDS is unset, it loses its special properties, even if it is subsequently reset.
LINENO
Each time this parameter is referenced, the shell substitutes a decimal number representing the current sequential line number (starting with 1) within a script or function. When not in a script or function, the value substituted is not guaranteed to be meaningful. When in a function, the value is not the number of the source line that the command appears on (that information has been lost by the time the function is executed), but is an approximation of the number of simple commands executed in the current function. If LINENO is unset, it loses its special properties, even if it is subsequently reset.
HISTCMD
The history number, or index in the history list, of the current command. If HISTCMD is unset, it loses its special properties, even if it is subsequently reset.
OPTARG
The value of the last option argument processed by the getopts builtin command (see SHELL BUILTIN COMMANDS below).
OPTIND
The index of the next argument to be processed by the getopts builtin command (see SHELL BUILTIN COMMANDS below).
HOSTTYPE
Automatically set to a string that uniquely describes the type of machine on which
bash
is executing. The default is system-dependent.
OSTYPE
Automatically set to a string that describes the operating system on which
bash
is executing. The default is system-dependent.
The following variables are used by the shell. In some cases, bash assigns a default value to a variable; these cases are noted below.
IFS
The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is "<space><tab><newline>".
PATH
The search path for commands. It is a colon-separated list of directories in which the shell looks for commands (see COMMAND EXECUTION below). The default path is system-dependent, and is set by the administrator who installs bash. A common value is "/usr/gnu/bin:/usr/local/bin:/usr/ucb:/bin:/usr/bin:.".
HOME
The home directory of the current user; the default argument for the cd builtin command.
CDPATH
The search path for the
cd
command. This is a colon-separated list of directories in which the shell looks for destination directories specified by the
cd
command. A sample value is ‘‘.:˜:/usr’’.
ENV
If this parameter is set when
bash
is executing a shell script, its value is interpreted as a filename containing commands to initialize the shell, as in
.bashrc
.
The value of
ENV
is subjected to parameter expansion, command substitution, and arithmetic expansion before being interpreted as a pathname.
PATH
is not used to search for the resultant pathname.
MAIL
If this parameter is set to a filename and the MAILPATH variable is not set, bash informs the user of the arrival of mail in the specified file.
MAILCHECK
Specifies how often (in seconds)
bash
checks for mail. The default is
60 seconds. When it is time to check for mail, the shell does so before prompting. If this variable is unset, the shell disables mail checking.
MAILPATH
A colon-separated list of pathnames to be checked for mail. The message to be printed may be specified by separating the pathname from the message with a ‘?’. $ stands for the name of the current mailfile. Example:
MAILPATH=’/usr/spool/mail/bfox?"You have mail":˜/shell-mail?"$_ has mail!"’
Bash
supplies a default value for this variable, but the location of the user mail files that it uses is system dependent (e.g., /usr/spool/mail/
$USER
).
MAIL WARNING
If set, and a file that
bash
is checking for mail has been accessed since the last time it was checked, the message “The mail in
mailfile
has been read” is printed.
PS1
The value of this parameter is expanded (see PROMPTING below) and used as the primary prompt string. The default value is "bash\$ ".
PS2
The value of this parameter is expanded and used as the secondary prompt string. The default is “
>
”.
PS3
The value of this parameter is used as the prompt for the select command (see SHELL GRAMMAR above).
PS4
The value of this parameter is expanded and the value is printed before each command
bash
displays during an execution trace. The first character of
PS4
is replicated multiple times, as necessary, to indicate multiple levels of indirection. The default is “
+
”.
HISTSIZE
The number of commands to remember in the command history (see
HISTORY
below). The default value is 500.
HISTFILE
The name of the file in which command history is saved. (See
HISTORY
below.) The default value is
˜/.bash history
. If unset, the command history is not saved when an interactive shell exits.
HISTFILESIZE
The maximum number of lines contained in the history file. When this variable is assigned a value, the history file is truncated, if necessary, to contain no more than that number of lines. The default value is 500.
OPTERR
If set to the value 1, bash displays error messages generated by the getopts builtin command (see SHELL BUILTIN COMMANDS below). OPTERR is initialized to 1 each time the shell is invoked or a shell script is executed.
PROMPT COMMAND
If set, the value is executed as a command prior to issuing each primary prompt.
IGNOREEOF
Controls the action of the shell on receipt of an EOF character as the sole input. If set, the value is the number of consecutive EOF characters typed as the first characters on an input line before bash exits. If the variable exists but does not have a numeric value, or has no value, the default value is 10. If it does not exist, EOF signifies the end of input to the shell. This is only in effect for interactive shells.
TMOUT
If set to a value greater than zero, the value is interpreted as the number of seconds to wait for input after issuing the primary prompt.
Bash
terminates after waiting for that number of seconds if input does not arrive.
FCEDIT
The default editor for the
fc
builtin command.
FIGNORE
A colon-separated list of suffixes to ignore when performing filename completion (see the entries in
READLINE
FIGNORE
below). A filename whose suffix matches one of is excluded from the list of matched filenames. A sample value is “.o:˜”.
INPUTRC
The filename for the readline startup file, overriding the default of
˜/.inputrc
(see
READLINE
below).
notify
If set,
bash
reports terminated background jobs immediately, rather than waiting until before printing the next primary prompt (see also the
-b
option to the
set
builtin command).
94
9. Processes, Environment Variables 9.9. Environments of Processes
history_control, HISTCONTROL  If set to a value of ignorespace, lines which begin with a space character are not entered on the history list. If set to a value of ignoredups, lines matching the last history line are not entered. A value of ignoreboth combines the two options. If unset, or if set to any other value than those above, all lines read by the parser are saved on the history list.

command_oriented_history  If set, bash attempts to save all lines of a multiple-line command in the same history entry. This allows easy re-editing of multi-line commands.

glob_dot_filenames  If set, bash includes filenames beginning with a '.' in the results of pathname expansion.

allow_null_glob_expansion  If set, bash allows pathname patterns which match no files (see Pathname Expansion below) to expand to a null string, rather than themselves.

histchars  The two or three characters which control history expansion and tokenization (see HISTORY EXPANSION below). The first character is the history expansion character, that is, the character which signals the start of a history expansion, normally '!'. The second character is the quick substitution character, which is used as shorthand for re-running the previous command entered, substituting one string for another in the command. The default is '^'. The optional third character is the character which signifies that the remainder of the line is a comment, when found as the first character of a word, normally '#'. The history comment character causes history substitution to be skipped for the remaining words on the line. It does not necessarily cause the shell parser to treat the rest of the line as a comment.

nolinks  If set, the shell does not follow symbolic links when executing commands that change the current working directory. It uses the physical directory structure instead. By default, bash follows the logical chain of directories when performing commands which change the current directory, such as cd. See also the description of the -P option to the set builtin (SHELL BUILTIN COMMANDS below).
hostname_completion_file, HOSTFILE  Contains the name of a file in the same format as /etc/hosts that should be read when the shell needs to complete a hostname. The file may be changed interactively; the next time hostname completion is attempted, bash adds the contents of the new file to the already existing database.

noclobber  If set, bash does not overwrite an existing file with the >, >&, and <> redirection operators. This variable may be overridden when creating output files by using the redirection operator >| instead of > (see also the -C option to the set builtin command).
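A quick sketch of the effect (assuming a file important.txt that already exists):

set -C                        # equivalent to setting noclobber
echo "new" > important.txt    # fails: the file may not be overwritten
echo "new" >| important.txt   # the >| operator overrides noclobber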
auto_resume  This variable controls how the shell interacts with the user and job control. If this variable is set, single word simple commands without redirections are treated as candidates for resumption of an existing stopped job. There is no ambiguity allowed; if there is more than one job beginning with the string typed, the job most recently accessed is selected. The name of a stopped job, in this context, is the command line used to start it. If set to the value exact, the string supplied must match the name of a stopped job exactly; if set to substring, the string supplied needs to match a substring of the name of a stopped job. The substring value provides functionality analogous to the %? job id (see JOB CONTROL below). If set to any other value, the supplied string must be a prefix of a stopped job's name; this provides functionality analogous to the % job id.

no_exit_on_failed_exec  If this variable exists, a non-interactive shell will not exit if it cannot execute the file specified in the exec builtin command. An interactive shell does not exit if exec fails.

cdable_vars  If this is set, an argument to the cd builtin command that is not a directory is assumed to be the name of a variable whose value is the directory to change to.
Electronic Mail, or email, is the way most people first come into contact with the Internet. Although you may have used email in a graphical environment, here we show you how mail was first intended to be used on a multiuser system. To a large extent what applies here is really what is going on in the background of any system that supports mail.

A mail message is a block of text sent from one user to another, using some mail command or mailer program. A mail message will usually also be accompanied by a subject explaining what the mail is about. The idea of mail is that a message can be sent to someone even though he may not be logged in at the time, and the mail will be stored for him until he is around to read it. An email address is probably familiar to you, for example: bruce@kangeroo.co.au. This means that bruce has a user account on a computer called kangeroo.co.au, which often means that he can log in as bruce on that machine. The text after the @ is always the name of the machine. Today's Internet does not obey this exactly, but there is always a machine that bruce does have an account on where mail is eventually sent. (That machine is also usually a UNIX-like machine.)

Sometimes email addresses are written in a more user-friendly form, like Bruce Wallaby <bruce@kangeroo.co.au> or bruce@kangeroo.co.au (Bruce Wallaby). In this case, the surrounding characters are purely cosmetic; only bruce@kangeroo.co.au is ever used.
When mail is received for you (from another user on the system or from a user from another system) it is appended to the file /var/spool/mail/<username> called the
mail file
or
mailbox file
; <username> is your login name. You then run some program that interprets your mail file, allowing you to browse the file as a sequence of mail messages and read and reply to them.
An actual addition to your mail file might look like this:
From [email protected]  Mon Jun  1 21:20:21 1998
Return-Path: <[email protected]>
Received: from pizza.cranzgot.co.za ([email protected] [192.168.2.254]) by onion.cranzgot.co.za (8.8.7/8.8.7) with ESMTP id VAA11942 for <[email protected]>; Mon, 1 Jun 1998 21:20:20 +0200
Received: from mail450.icon.co.za (mail450.icon.co.za [196.26.208.3]) by pizza.cranzgot.co.za (8.8.5/8.8.5) with ESMTP id VAA19357 for <[email protected]>; Mon, 1 Jun 1998 21:17:06 +0200
Received: from smtp02.inetafrica.com (smtp02.inetafrica.com [196.7.0.140]) by mail450.icon.co.za (8.8.8/8.8.8) with SMTP id VAA02315 for <[email protected]>; Mon, 1 Jun 1998 21:24:21 +0200 (GMT)
Received: from default [196.31.19.216] (fullmoon) by smtp02.inetafrica.com with smtp (Exim 1.73 #1) id 0ygTDL-00041u-00; Mon, 1 Jun 1998 13:57:20 +0200
Message-ID: <[email protected]>
Date: Mon, 01 Jun 1998 13:56:15 +0200
From: a person <[email protected]>
Reply-To: [email protected]
Organization: private
X-Mailer: Mozilla 3.01 (Win95; I)
MIME-Version: 1.0
To: paul sheer <[email protected]>
Subject: hello
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO
X-Status: A

hey paul its me how r u doing i am well what u been upot hows life hope your well
Each mail message begins with a From at the beginning of a line, followed by a space. Then comes the mail header, explaining where the message was routed from to get to your mailbox, who sent the message, where replies should go, the subject of the mail, and various other mail header fields. Above, the header is longer than the mail message itself. Examine the header carefully.

The header ends with the first blank line. The message itself (or body) starts right after. The next header in the file will once again start with a From. From's at the beginning of a line never exist within the body. If they do, the mailbox is considered to be corrupt.

Some mail readers store their messages in a different format. However, the above format (called the mbox format) is the most common for UNIX. Of interest is a format called Maildir, which is one format that does not store mail messages in a single contiguous file. Instead, Maildir stores each message as a separate file within a directory. The name of the directory is then considered to be the mailbox "file"; by default Maildir uses a directory Maildir within the user's home directory.
The simplest way to send mail is to use the mail command. Type mail -s "hello there" <username>. The mail program will then wait for you to type out your message. When you are finished, enter a . on its own on a single line. The user name will be another user on your system. If no one else is on your system, then send mail to root with mail -s "Hello there" root or mail -s "Hello there" root@localhost (if the @ is not present, then the local machine, localhost, is implied). Sending files over email is discussed in Section 12.6.
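A short interactive session might therefore look something like this (the recipient and wording are only examples; the exact prompts vary between mail implementations):

mail -s "Hello there" root
hi there
this is a test message
.

The message is sent as soon as the . is typed on a line by itself.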
You can use mail to view your mailbox. This is a primitive utility in comparison with modern graphical mail readers but is probably the only mail reader that can handle arbitrarily sized mailboxes. Sometimes you may get a mailbox that is over a gigabyte in size, and mail is the only way to delete messages from it. To view your mailbox, type mail. Then type z to read your next window of messages, and z- to view the previous window. Most commands work like command <message number>; for example, delete 14 or reply 7. The message number is in the left column, with an N next to it (for a New message).
. The message number is the left column with an N next mutt
For the state of the art in terminal-based mail readers (also called mail and pine .
& pine ’s license is not Free.
-
clients
), try
There are also some graphical mail readers in various stages of development. At the time I am writing this, I have been using balsa for a few months, which was the best mail reader I could find.
To send mail, you need not use a mail client at all. The mail client just follows SMTP (Simple Mail Transfer Protocol), which you can type in from the keyboard. For example, you can send mail by telnet'ing to port 25 of a machine that has an MTA (Mail Transfer Agent—also called the mailer daemon or mail server) running. The word daemon denotes programs that run silently without user intervention.

This is, in fact, how so-called anonymous mail or spam mail is sent on the Internet. (Spam is a term used to indicate unsolicited email—that is, junk mail that is posted in bulk to large numbers of arbitrary email addresses. Sending spam is considered unethical Internet practice.) A mailer daemon runs in most small institutions in the world and has the simple task of receiving mail requests and relaying them on to other mail servers. Try this, for example (obviously substituting mail.cranzgot.co.za for the name of a mail server that you normally use):
telnet mail.cranzgot.co.za 25
Trying 192.168.2.1...
Connected to 192.168.2.1.
Escape character is ’ˆ]’.
220 onion.cranzgot.co.za ESMTP Sendmail 8.9.3/8.9.3; Wed, 2 Feb 2000 14:54:47 +0200
HELO cericon.cranzgot.co.za
250 onion.cranzgot.co.za Hello cericon.ctn.cranzgot.co.za [192.168.3.9], pleased to meet yo
MAIL FROM:[email protected]
250 [email protected] Sender ok
RCPT TO:[email protected]
250 [email protected] Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
Subject: just to say hi

hi there heres a short message
.
250 OAA04620 Message accepted for delivery
QUIT
221 onion.cranzgot.co.za closing connection
Connection closed by foreign host.
The above causes the message "hi there heres a short message" to be delivered to [email protected] (the RCPT, i.e., the ReCiPienT). Of course, I can enter any address that I like as the sender, and it can be difficult to determine who sent the message.
Now, you may have tried this and gotten a rude error message. This might be because the MTA is configured
not
to relay mail except from specific trusted machines— say, only those machines within that organization. In this way anonymous email is prevented.
On the other hand, if you are connecting to the user’s very own mail server, it has to necessarily receive the mail, regardless of who sent it. Hence, the above is a useful way to supply a bogus FROM address and thereby send mail almost anonymously. By
“almost” I mean that the mail server would still have logged the machine from which you connected and the time of connection—there is no perfect anonymity for properly configured mail servers.
The above technique is often the only way to properly test a mail server, and is a skill worth practicing for later.
UNIX intrinsically supports multiple users. Each user has a personal home directory, /home/<username>, in which the user's files are stored, hidden from other users.

So far you may have been using the machine as the root user, who is the system administrator and has complete access to every file on the system. The root user is also called the superuser. Note that there is an ambiguity here: the root directory is the topmost directory, known as the / directory. The root user's home directory is /root and is called the home directory of root.
Other than the superuser, every other user has
limited
access to files and directories. Always use your machine as a normal user. Log in as root only to do system administration. This practice will save you from the destructive power that the root user has. In this chapter we show how to manually and automatically create new users.
Users are also divided into sets, called
groups
. A user can belong to several groups and there can be as many groups on the system as you like. Each group is defined by a list of users that are part of that set. In addition, each user may have a group of the same name (as the user’s login name), to which only that user belongs.
Each file on a system is owned by a particular user and also owned by a particular group. When you run ls -al, you can see the user that owns the file in the third column and the group that owns the file in the fourth column (these will often be identical, indicating that the file's group is a group to which only the user belongs). To change the ownership of the file, simply use the chown (change ownerships) command as follows:

chown <user>[:<group>] <filename>
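For example (assuming the user jack and a file of his that ended up owned by root; the filename is only illustrative):

chown jack:jack /home/jack/README
chown -R jack:jack /home/jack      # -R recurses through the whole directory tree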
The only place in the whole system where a user name is registered is in the file /etc/passwd. (Exceptions to this rule are several distributed authentication schemes and the Samba package, but you needn't worry about these for now.) Once a user is added to this file, that user is said to exist on the system. If you thought that user accounts were stored in some unreachable dark corner, then this should dispel that idea. This file is also known as the password file to administrators. View this file with less:

root:x:0:0:Paul Sheer:/root:/bin/bash
bin:x:1:1:bin:/bin:
daemon:x:2:2:daemon:/sbin:
adm:x:3:4:adm:/var/adm:
lp:x:4:7:lp:/var/spool/lpd:
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:
news:x:9:13:news:/var/spool/news:
uucp:x:10:14:uucp:/var/spool/uucp:
gopher:x:13:30:gopher:/usr/lib/gopher-data:
ftp:x:14:50:FTP User:/home/ftp:
nobody:x:99:99:Nobody:/:
alias:x:501:501::/var/qmail/alias:/bin/bash
paul:x:509:510:Paul Sheer:/home/paul:/bin/bash
jack:x:511:512:Jack Robbins:/home/jack:/bin/bash
silvia:x:511:512:Silvia Smith:/home/silvia:/bin/bash
Above is an extract of my own password file. Each user is stored on a separate line. Many of these are not human login accounts but are used by other programs.
Each line contains seven fields separated by colons. The account for jack looks like this:

jack:x:511:512:Jack Robbins:/home/jack:/bin/bash
jack
The user’s login name. It should be composed of lowercase letters and numbers.
Other characters are allowed, but are not preferable. In particular, there should
never
be two user names that differ only by their capitalization.
x  The user's encrypted password. An x in this field indicates that it is stored in a separate file, /etc/shadow. This shadow password file is a later addition to UNIX systems. It contains additional information about the user.
511  The user's user identification number, UID. (This is used by programs as a short alternative to the user's login name. In fact, internally, the login name is never used, only the UID.)

512  The user's group identification number, GID. (A similar remark applies to the GID. Groups will be discussed later.)

Jack Robbins  The user's full name. (Few programs ever make use of this field.)

/home/jack  The user's home directory. The HOME environment variable will be set to this when the user logs in.
/bin/bash
The shell to start when the user logs in.
The problem with traditional passwd files is that they had to be world readable (everyone on the system can read the file) in order for programs to extract information, such as the user's full name, about the user. This means that everyone can see the encrypted password in the second field. Anyone can copy any other user's password field and then try billions of different passwords to see if they match. If you have a hundred users on the system, there are bound to be several that chose passwords that matched some word in the dictionary. The so-called dictionary attack will simply try all 80,000 common English words until a match is found. If you think you are clever to add a number in front of an easy-to-guess dictionary word, password cracking algorithms know about these as well. (And about every other trick you can think of.)

To solve this problem the shadow password file was invented. The shadow password file is used only for authentication (verifying that the user is the genuine owner of the account) and is not world readable—there is no information in the shadow password file that a common program will ever need—and no regular user has permission to see the encrypted password field. The fields are colon separated just like the passwd file.
Here is an example line from a /etc/shadow file:
jack:Q,Jpl.or6u2e7:10795:0:99999:7:-1:-1:134537220
jack
The user’s login name.
Q,Jpl.or6u2e7  The user's encrypted password, known as the hash of the password. This is the user's 8-character password with a one-way hash function applied to it. It is simply a mathematical algorithm applied to the password that is known to produce a unique result for each password. To demonstrate: the (rather poor) password Loghimin hashes to :lZ1F.0VSRRucs: in the shadow file. An almost identical password, loghimin, gives a completely different hash, :CavHIpD1W.cmg:. Hence, trying to guess the password from the hash can only be done by trying every possible password. Such a brute force attack is therefore considered computationally expensive but not impossible. To check if an entered password matches, just apply the identical mathematical algorithm to it: if it matches, then the password is correct. This is how the login command works.
Sometimes you will see a * in place of a hashed password. This means that the account has been disabled.
10795
Days since January 1, 1970, that the password was last changed.
0
Days before which password may not be changed. Usually zero. This field is not often used.
99999
Days after which password must be changed. This is also rarely used, and will be set to 99999 by default.
7
Days before password is to expire that user is warned of pending password expiration.
-1  Days after password expires that account is considered inactive and disabled. -1 is used to indicate infinity—that is, to mean we are effectively not using this feature.
-1
Days since January 1, 1970, when account will be disabled.
134537220
Flag reserved for future use.
On a UNIX system you may want to give a number of users the same access rights. For instance, you may have five users that should be allowed to access some privileged file and another ten users that are allowed to run a certain program. You can group these users into, for example, two groups previl and wproc and then make the relevant files and directories owned by that group with, say,

chown root:previl /home/somefile
chown root:wproc /usr/lib/wproc

Permissions (explained later) dictate the kind of access, but for the meantime, the file/directory must at least be owned by that group.

The /etc/group file is also colon separated. A line might look like this:

wproc:x:524:jack,mary,henry,arthur,sue,lester,fred,sally
wproc  The name of the group. There should really be a user of this name as well.

x  The group's password. This field is usually set with an x and is not used.

524  The GID (group ID). This must be unique in the group file.

jack,mary,henry,arthur,sue,lester,fred,sally  The list of users that belong to the group. This must be comma separated with no spaces.
You can obviously study the group file to find out which groups a user belongs to (that is, not "which users does a group consist of?", which is easy to see at a glance), but when there are a lot of groups, it can be tedious to scan through the entire file. The groups command prints out this information.
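For example (the output shown is only illustrative; your group names will differ):

groups jack
jack : jack wproc

The same command with no argument lists the groups of the current user.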
The following steps are required to create a user account (a consolidated command sketch follows this list):

/etc/passwd entry  To create an entry in this file, simply edit it and copy an existing line. (When editing configuration files, never write out a line from scratch if it has some kind of special format. Always copy an existing entry that has proved itself to be correct, and then edit in the appropriate changes. This will prevent you from making errors.) Always add users from the bottom and try to preserve the "pattern" of the file—that is, if you see numbers increasing, make yours fit in; if you are adding a normal user, add it after the existing lines of normal users. Each user must have a unique UID and should usually have a unique GID. So if you are adding a line to the end of the file, make your new UID and GID the same as the last line's but incremented by 1.
/etc/shadow entry  Create a new shadow password entry. At this stage you do not know what the hash is, so just make it a *. You can set the password with the passwd command later.

/etc/group entry  Create a new group entry for the user's group. Make sure the number in the group entry matches that in the passwd file.

/etc/skel  This directory contains a template home directory for the user. Copy the entire directory and all its contents into the /home directory, renaming it to the name of the user. In the case of our jack example, you should have a directory /home/jack.

Home directory ownerships  You need to now change the ownerships of the home directory to match the user. The command chown -R jack:jack /home/jack will accomplish this change.

Setting the password  Use passwd <username> to set the user's password.
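Putting these steps together, a minimal sketch of creating the jack account by hand might look like this (the UID/GID and shadow fields simply echo the examples above; on a real system pick an unused UID/GID, and use whatever editor you prefer):

vi /etc/passwd            # add: jack:x:511:512:Jack Robbins:/home/jack:/bin/bash
vi /etc/shadow            # add: jack:*:10795:0:99999:7:-1:-1:134537220
vi /etc/group             # add: jack:x:512:
cp -a /etc/skel /home/jack
chown -R jack:jack /home/jack
passwd jack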
The above process is tedious. The commands that perform all these updates automatically are useradd, userdel, and usermod. The man pages explain the use of these commands in detail. Note that different flavors of UNIX have different commands to do this. Some may even have graphical programs or web interfaces to assist in creating users.

In addition, the commands groupadd, groupdel, and groupmod do the same with respect to groups.
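A rough equivalent of the manual procedure above might then be (option letters vary slightly between flavors, so check the man pages; the extra group is hypothetical):

useradd -m -c "Jack Robbins" -s /bin/bash jack   # -m creates the home directory from /etc/skel
passwd jack
groupadd wproc                                   # create an additional group
usermod -G wproc jack                            # make jack a member of it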
It is possible to switch from one user to another, as well as view your login status and the status of other users. Logging in also follows a silent procedure which is important to understand.

A user most often gains access to the system through the login program. This program looks up the UID and GID from the passwd and group files and authenticates the user. The following is quoted from the login man page, and explains this procedure in detail:
login
is used when signing onto a system. It can also be used to switch from one user to another at any time (most modern shells have support for this feature built into them, however).
If an argument is not given,
login
prompts for the username.
If the user is
not
root, and if
/etc/nologin
exists, the contents of this file are printed to the screen, and the login is terminated. This is typically used to prevent logins when the system is being taken down.
If special access restrictions are specified for the user in /etc/usertty, these must be met, or the login attempt will be denied and a syslog message will be generated. (syslog is the system error log program—it writes all system messages to the file /var/log/messages.) See the section on "Special Access Restrictions."
If the user is root, then the login must be occurring on a tty listed in /etc/securetty. (If this file is not present, then root logins will be allowed from anywhere. It is worth deleting this file if your machine is protected by a firewall and you would like to easily log in from another machine on your LAN.) If /etc/securetty is present, then logins are only allowed from the terminals it lists.
Failures will be logged with the
syslog
facility.
After these conditions have been checked, the password will be requested and checked (if a password is required for this username). Ten attempts are allowed before login dies, but after the first three, the response starts to get very slow. Login failures are reported via the syslog facility. This facility is also used to report any successful root logins.
If the file
.hushlogin
exists, then a ”quiet” login is performed (this disables the checking of mail and the printing of the last login time and message of the day). Otherwise, if
/var/log/lastlog
exists, the last login time is printed (and the current login is recorded).
Random administrative things, such as setting the UID and GID of the tty, are performed. The TERM environment variable is preserved, if it exists (other environment variables are preserved if the -p option is used). Then the HOME, PATH, SHELL, TERM, MAIL, and LOGNAME environment variables are set. PATH defaults to /usr/local/bin:/bin:/usr/bin: for normal users, and to /sbin:/bin:/usr/sbin:/usr/bin for root. (Note that the . —the current directory—is listed in the PATH of normal users. This is only the default PATH, however.) Last, if this is not a "quiet" login, the message of the day is printed and the file with the user's name in /usr/spool/mail will be checked, and a message printed if it has non-zero length.
The user's shell is then started. If no shell is specified for the user in /etc/passwd, then /bin/sh is used. If there is no directory specified in /etc/passwd, then / is used (the home directory is checked for the .hushlogin file described above).
To temporarily become another user, you can use the su program:

su jack

This command prompts you for a password (unless you are the root user to begin with). It does nothing more than change the current user to have the access rights of jack. Most environment variables will remain the same. The HOME, LOGNAME, and USER environment variables will be set to jack, but all other environment variables will be inherited. su is, therefore, not the same as a normal login.

To get the equivalent of a login with su, run

su - jack

This will cause all initialization scripts (that are normally run when the user logs in) to be executed. (What actually happens is that the subsequent shell is started with a - in front of the zeroth argument. This makes the shell read the user's personal profile. The login command also does this.) Hence, after running su with the - option, you are logged in as if with the login command.
who and w print a list of users logged in to the system, as well as their CPU consumption and other statistics. who --help gives:
Usage: who [OPTION]... [ FILE | ARG1 ARG2 ]

  -H, --heading     print line of column headings
  -i, -u, --idle    add user idle time as HOURS:MINUTES, . or old
  -m                only hostname and user associated with stdin
  -q, --count       all login names and number of users logged on
  -s                (ignored)
  -T, -w, --mesg    add user's message status as +, - or ?
      --message     same as -T
      --writable    same as -T
      --help        display this help and exit
      --version     output version information and exit

If FILE is not specified, use /var/run/utmp.  /var/log/wtmp as FILE is common.
A little more information can be gathered from the info pages for this command.
The idle time indicates how long since the user has last pressed a key. Most often, one just types who -Hiw .
w is similar. An extract of the w man page says: w displays information about the users currently on the machine, and their processes. The header shows, in this order, the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
The following entries are displayed for each user: login name, the tty name, the remote host, login time, idle time, JCPU, PCPU, and the command line of their current process.
The JCPU time is the time used by all processes attached to the tty. It does not include past background jobs, but does include currently running background jobs.
The PCPU time is the time used by the current process, named in the ”what” field.
Finally, from a shell script, the users command is useful for just seeing who is logged in. You can use it in a shell script, for example:

for user in `users` ; do
    <etc>
done
id prints your real and effective UID and GID. A user normally has a UID and a GID, but may also have an effective UID and GID. The real UID and GID are what a process will generally think you are logged in as. The effective UID and GID are the actual access permissions that you have when trying to read, write, and execute files.
There is a file, /etc/security/limits.conf, that stipulates the limitations on CPU usage, process consumption, and other resources on a per-user basis. The documentation for this config file is contained in /usr/[share/]doc/pam-<version>/txts/README.pam_limits.
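An entry in this file has the general form <domain> <type> <item> <value>. For example, a hypothetical pair of lines might read:

jack    hard    nproc   100    # jack may never run more than 100 processes
@wproc  soft    core    0      # members of group wproc get no core dumps by default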
This chapter summarizes remote access and the various methods of transferring files and data over the Internet.
telnet is a program for talking to a UNIX network service. It is most often used to do a remote login. Try

telnet <remote_machine>
telnet localhost

to log in to your remote machine. It needn't matter if there is no physical network; network services always work regardless because the machine always has an internal link to itself.
rlogin is like a minimal version of telnet that allows login access only. You can type

rlogin -l <username> <remote_machine>
rlogin -l jack localhost

if the system is configured to support remote logins.
These two services are the domain of old world UNIX; for security reasons, ssh is now the preferable service for logging in remotely:

ssh [-l <username>] <remote_machine>
Though rlogin and telnet are very convenient, they should never be used across a public network because your password can easily be read off the wire as you type it in.

rcp stands for remote copy and scp is the secure version from the ssh package. These two commands copy files from one machine to another using a similar notation to cp:

rcp [-r] [<remote_machine>:]<file> [<remote_machine>:]<file>
Here is an example:
rcp /var/spool/mail/psheer \ divinian.cranzgot.co.za:/home/psheer/mail/cericon
scp /var/spool/mail/psheer \ divinian.cranzgot.co.za:/home/psheer/mail/cericon
The authenticity of host ’divinian.cranzgot.co.za’ can’t be established.
RSA key fingerprint is 43:14:36:5d:bf:4f:f3:ac:19:08:5d:4b:70:4a:7e:6a.
Are you sure you want to continue connecting (yes/no)?
yes
Warning: Permanently added ’divinian.cranzgot.co.za’ (RSA) to the list of known hosts.
[email protected]’s password:
100% |***************************************| 4266 KB 01:18
The -r option copies recursively, and copies can take place in either direction or even between two nonlocal machines. scp should always be used instead of rcp for security reasons. Notice also the warning given by scp for this first-time connection. See the ssh documentation for how to make your first connection securely. All commands in the ssh package have this same behavior.
rsh (remote shell) is a useful utility for executing a command on a remote machine. Here are some examples:
rsh divinian.cranzgot.co.za hostname
divinian.cranzgot.co.za
rsh divinian.cranzgot.co.za \ tar -czf - /home/psheer | dd of=/dev/fd0 bs=1024
tar: Removing leading ‘/’ from member names
20+0 records in
20+0 records out
cat /var/spool/mail/psheer | rsh divinian.cranzgot.co.za \
    sh -c 'cat >> /home/psheer/mail/cericon'
The first command prints the host name of the remote machine. The second command backs up my remote home directory to my local floppy disk. (More about dd and /dev/fd0 comes later.) The last command appends my local mailbox file to a remote mailbox file. Notice how stdin, stdout, and stderr are properly redirected to the local terminal. After reading Chapter 29, see rsh(8) or in.rshd(8) to configure this service.
Once again, for security reasons rsh should never be available across a public network.
FTP stands for File Transfer Protocol. If FTP is set up on your local machine, then other machines can download files. Type

ftp metalab.unc.edu

or

ncftp metalab.unc.edu

ftp is the traditional command-line UNIX FTP client ("client" always indicates the user program accessing some remote service), while ncftp is a more powerful client that will not always be installed.
You will now be inside an FTP session. You will be asked for a login name and a password. The site metalab.unc.edu is one that allows anonymous logins. This means that you can type anonymous as your user name, and then anything you like as a password. You will notice that the session will ask you for an email address as your password. Any sequence of letters with an @ symbol will suffice, but you should put your actual email address out of politeness.
The FTP session is like a reduced shell. You can type cd, ls, and ls -al to view file lists. help brings up a list of commands, and you can also type help <command> to get help on a specific command. You can download a file by using the get <filename> command, but before you do this, you must set the transfer type to binary. The transfer type indicates whether or not newline characters will be translated to DOS format. Typing ascii turns on this feature, while binary turns it off. You may also want to enter hash, which will print a # for every 1024 bytes of download. This is useful for watching the progress of a download. Go to a directory that has a README file in it and enter

get README
The file will be downloaded into your current directory. You can also cd to the /incoming directory and upload files. Try

put README

to upload the file that you have just downloaded. Most FTP sites have an /incoming directory that is flushed periodically.
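Putting this together, an anonymous session might look something like the following sketch (the directory name is only illustrative, and prompts differ between clients):

ftp metalab.unc.edu
Name: anonymous
Password: (type your email address)
ftp> binary
ftp> hash
ftp> cd /pub/Linux/docs
ftp> get README
ftp> quit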
FTP allows far more than just uploading of files, although the administrator has the option to restrict access to any further features. You can create directories, change ownerships, and do almost anything you can on a local file system.
If you have several machines on a trusted LAN (Local Area Network—that is, your private office or home network), all should have FTP enabled to allow users to easily copy files between machines. How to install and configure one of the many available FTP servers will become obvious later in this book.
finger is a service for remotely listing who is logged in on a remote system. Try finger @<hostname> to see who is logged in on <hostname>. The finger service will often be disabled on machines for security reasons.
Mail is being used more and more for transferring files between machines. It is bad practice to send mail messages over 64 kilobytes over the Internet because it tends to excessively load mail servers. Any file larger than 64 kilobytes should be uploaded by FTP onto some common FTP server. Most small images are smaller than this size; hence, sending a small JPEG image is considered acceptable. (JPEG is a common Internet image file format. These images are especially compressed and are usually under 100 kilobytes for a typical screen-sized photograph.)

If you must send files by mail then you can do it by using uuencode. This utility packs binary files into a format that mail servers can handle. If you send a mail message containing arbitrary binary data, it will more than likely be corrupted on the way because mail agents are only designed to handle a limited range of characters. uuencode represents a binary file with allowable characters, albeit taking up slightly more space.
Here is a neat trick to pack up a directory and send it to someone by mail:

tar -czf - <mydir> | uuencode <mydir>.tar.gz \
    | mail -s "Here are some files" <user>@<machine>

To unpack a uuencoded file, use the uudecode command:

uudecode <myfile>.uu
Most graphical mail readers have the ability to attach files to mail messages and read these attachments. The way they do this is not with uuencode but with a special format known as MIME encapsulation. MIME (Multipurpose Internet Mail Extensions) is a way of representing multiple files inside a single mail message. The way binary data is handled is similar to uuencode, but in a format known as base64.
Each MIME attachment to a mail message has a particular type, known as the MIME type. MIME types merely classify the attached file as an image, an audio clip, a formatted document, or some other type of data. The MIME type is a text tag with the format <major>/<minor>. The major part is called the major MIME type and the minor part is called the minor MIME type. Available major types match all the kinds of files that you would expect to exist. They are usually one of application, audio, image, message, text, or video. The application type means a file format specific to a particular utility. The minor MIME types run into the hundreds. A long list of MIME types can be found in /etc/mime.types.
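Typical lines in /etc/mime.types look something like this (the extension lists vary between systems):

text/html          html htm
image/jpeg         jpeg jpg jpe
application/pdf    pdf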
If needed, some useful command-line utilities in the same vein as uuencode can create and extract MIME messages. These are mpack, munpack, and mmencode (or mimencode).
Very often it is not even necessary to connect to the Internet to find the information you need. Chapter 16 contains a description of most of the documentation on a LINUX distribution.

It is, however, essential to get the most up-to-date information where security and hardware driver support are concerned. It is also fun and worthwhile to interact with LINUX users from around the globe. The rapid development of Free software could mean that you may miss out on important new features that could streamline IT services. Hence, reviewing web magazines, reading newsgroups, and subscribing to mailing lists are essential parts of a system administrator's role.

The metalab.unc.edu FTP site (previously called sunsite.unc.edu) is one of the traditional sites for free software. It is mirrored in almost every country that has a significant IT infrastructure. If you point your web browser there, you will find a list of mirrors. For faster access, do pick a mirror in your own country.

It is advisable to browse around this FTP site. In particular you should try to find the locations of:
• The directory where all sources for official GNU packages are stored. This would be a mirror of the Free Software Foundation's FTP archives. These are packages that were commissioned by the FSF and not merely released under the GPL (GNU General Public License). The FSF will distribute them in source form (.tar.gz) for inclusion into various distributions. They will, of course, compile and work under any UNIX.

• The generic Linux download directory. It contains innumerable UNIX packages in source and binary form, categorized in a directory tree. For instance, mail clients have their own directory with many mail packages inside. metalab is the place where new developers can host any new software that they have produced. There are instructions on the FTP site to upload software and to request it to be placed into a directory.
• The kernel sources. This is a mirror of the kernel archives where Linus and other maintainers upload new stable (meaning that the software is well tested and free of serious bugs) and beta (meaning that the software is in its development stages) kernel versions and kernel patches.
•
The various distributions. RedHat, Debian , and possibly other popular distributions may be present.
This list is by no means exhaustive. Depending on the willingness of the site maintainer, there may be mirrors to far more sites from around the world.
The FTP site is how you will download free software. Often, maintainers will host their software on a web site, but every popular package will almost always have an FTP site where versions are persistently stored. An example is metalab.unc.edu in the directory /pub/Linux/apps/editors/X/cooledit/ where the author's own Cooledit package is distributed.
Most users should already be familiar with using a web browser. You should also become familiar with the concept of a web search. (Do I need to explain this?) You search the web when you point your web browser to a popular search engine like http://www.google.com/, http://www.google.com/linux, http://infoseek.go.com/, http://www.altavista.com/, or http://www.yahoo.com/ and search for particular key words. Searching is a bit of a black art with the billions of web pages out there. Always consult the search engine's advanced search options to see how you can do more complex searches than just plain word searches.
The web sites in the FAQ (Frequently Asked Questions) (see Appendix D) should all be consulted to get an overview of some of the primary sites of interest to LINUX users. Especially important is that you keep up with the latest LINUX news. I find the Linux Weekly News http://lwn.net/ an excellent source. Also, the famous (and infamous) SlashDot http://slashdot.org/ web site gives daily updates about "stuff that matters" (and therefore contains a lot about free software).

Fresh Meat http://freshmeat.net/ is a web site devoted to new software releases. You will find new or updated packages announced every few hours or so.
Linux Planet http://www.linuxplanet.com/ seems to be a new(?) web site that I just found while writing this. It looks like it contains lots of tutorial information on LINUX.

News Forge http://www.newsforge.net/ also contains daily information about software issues.

Lycos http://download.lycos.com/static/advanced_search.asp is an efficient FTP search engine for locating packages. It is one of the few search engines that understand regular expressions.

Realistically, though, a new LINUX web site is created every week; almost anything prepended or appended to "linux" is probably a web site already.
A new phenomenon in the free software community is the SourceForge web site, http://www.sourceforge.net/
. Developers can use this service at no charge to host their project’s web site, FTP archives, and mailing lists. SourceForge has mushroomed so rapidly that it has come to host the better half of all free software projects.
A mailing list is a special address that, when posted to, automatically sends email to a long list of other addresses. You usually subscribe to a mailing list by sending some specially formatted email or by requesting a subscription from the mailing list manager.
Once you have subscribed to a list, any email you post to the list will be sent to every other subscriber, and every other subscriber’s posts to the list will be sent to you.
There are mostly three types of mailing lists: the majordomo type, the listserv type, and the *-request type.

To subscribe to the majordomo variety, send a mail message to majordomo@<machine> with no subject and a one-line message:

subscribe <mailing-list-name>
This command adds your name to the mailing list <mailing-list-name>@<machine>, to which messages are posted.

Do the same for listserv-type lists, by sending the same message to listserv@<machine>.

For instance, if you are an administrator for any machine that is exposed to the Internet, you should get on bugtraq. Send email containing the line

subscribe bugtraq

to [email protected], and become one of the tens of thousands of users that read and report security problems about LINUX.
To unsubscribe from a list is just as simple. Send an email message:

unsubscribe <mailing-list-name>

Never send subscribe or unsubscribe messages to the mailing list itself. Send subscribe or unsubscribe messages only to the address majordomo@<machine> or listserv@<machine>.
You subscribe to *-request-type mailing lists by sending an empty email message to <mailing-list-name>-request@<machine> with the word subscribe as the subject. The same email with the word unsubscribe removes you from the list. Once again, never send subscribe or unsubscribe messages to the mailing list itself.
A newsgroup is a notice board that everyone in the world can see. There are tens of thousands of newsgroups and each group is unique in the world.

The client software you use to read a newsgroup is called a news reader (or news client). rtin is a popular text-mode reader, while netscape is graphical. pan is an excellent graphical news reader that I use.

Newsgroups are named like Internet hosts. One you might be interested in is comp.os.linux.announce. The comp is the broadest subject description for computers; os stands for operating systems; and so on. Many other newsgroups are devoted to various LINUX issues.
Newsgroups servers are big hungry beasts. They form a tree-like structure on the
Internet. When you send mail to a newsgroup it takes about a day or so for the mail you sent to propagate to every other server in the world. Likewise, you can see a list of all the messages posted to each newsgroup by anyone anywhere.
What’s the difference between a newsgroup and a mailing list? The advantage of a newsgroup is that you don’t have to download the messages you are not interested in. If you are on a mailing list, you get all the mail sent to the list. With a newsgroup you can look at the message list and retrieve only the messages you are interested in.
Why not just put the mailing list on a web page? If you did, then everyone in the world would have to go over international links to get to the web page. It would load the server in proportion to the number of subscribers. This is exactly what SlashDot is.
However, your newsgroup server is local, so you retrieve mail over a faster link and save Internet traffic.
An indispensable source of information for serious administrators or developers is the RFCs. RFC stands for Request For Comments. RFCs are Internet standards written by authorities to define everything about Internet communication. Very often, documentation will refer to RFCs. (There are also a few nonsense RFCs out there. For example, there is an RFC to communicate using pigeons, and one to facilitate an infinite number of monkeys trying to write the complete works of Shakespeare. Keep a close eye on Slashdot http://slashdot.org/ to catch these.)
ftp://metalab.unc.edu/pub/docs/rfc/
(and mirrors) has the complete RFCs archived for download. There are about 2,500 of them. The index file rfc-index.txt
is probably where you should start. It has entries like:
2045 Multipurpose Internet Mail Extensions (MIME) Part One: Format of
     Internet Message Bodies. N. Freed & N. Borenstein. November 1996.
     (Format: TXT=72932 bytes) (Obsoletes RFC1521, RFC1522, RFC1590)
     (Updated by RFC2184, RFC2231) (Status: DRAFT STANDARD)

2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media
     Types. N. Freed & N. Borenstein. November 1996. (Format: TXT=105854
     bytes) (Obsoletes RFC1521, RFC1522, RFC1590) (Status: DRAFT STANDARD)

2068 Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys,
     J. Mogul, H. Frystyk, T. Berners-Lee. January 1997. (Format:
     TXT=378114 bytes) (Status: PROPOSED STANDARD)

Well, you get the idea.
Every file and directory on a UNIX system, besides being owned by a user and a group, has access flags (a flag is a switch that can either be on or off), also called access bits, dictating what kind of access that user and group have to the file.
Running ls -ald /bin/cp /etc/passwd /tmp gives you a listing like this:

-rwxr-xr-x    1 root     root        28628 Mar 24  1999 /bin/cp
-rw-r--r--    1 root     root         1151 Jul 23 22:42 /etc/passwd
drwxrwxrwt    5 root     root         4096 Sep 25 15:23 /tmp
In the leftmost column are flags which completely describe the access rights to the file. So far I have explained only that the leftmost character is either - or d, indicating an ordinary file or a directory. The remaining nine characters are either - to indicate an unset value or one of several possible characters. Table 14.1 gives a complete description of file system permissions.

You use the chmod command to change the permissions of a file. It's usually used as

chmod [-R] [u|g|o|a][+|-][r|w|x|s|t] <file> [<file>] ...
Table 14.1  File and directory permissions (the possible characters are shown for each position; - means the bit is unset)

User, u:
  r       Directories: user can read the contents of the directory.
          Files: user can read the file.
  w       Directories: with x or s, user can create and remove files in the directory.
          Files: user can write to the file.
  x, s, S Directories: user can access the contents of the files in the directory for x or s; S has no effect.
          Files: user can execute the file for x or s. s, known as the setuid bit, means to set the user owner of the subsequent process to that of the file. S has no effect.

Group, g:
  r       Directories: group can read the contents of the directory.
          Files: group can read the file.
  w       Directories: with x or s, group can create and remove files in the directory.
          Files: group can write to the file.
  x, s, S Directories: group can access the contents of the files in the directory for x. For s, force all files in this directory to the same group as the directory. S has no effect.
          Files: group can execute the file for x or s. s, known as the setgid bit, means to set the group owner of the subsequent process to that of the file. S has no effect.

Other, o:
  r       Directories: everyone can read the contents of the directory.
          Files: everyone can read the file.
  w       Directories: with x or t, everyone can create and remove files in the directory.
          Files: everyone can write to the file.
  x, t, T Directories: everyone can access the contents of the files in the directory for x and t. t, known as the sticky bit, prevents users from removing files that they do not own; hence users are free to append to the directory but not to remove other users' files. T has no effect.
          Files: everyone can execute the file for x or t. For t, save the process text image to the swap device so that future loads will be faster (I don't know if this has an effect on LINUX). T has no effect.
For example,

chmod u+x myfile

adds execute permissions for the user of myfile. And,

chmod a-rx myfile

removes read and execute permissions for all—that is, user, group, and other.

The -R option, once again, means recursive, diving into subdirectories as usual.
Permission bits are often represented in their binary form, especially in programs. It is convenient to show the rwxrwxrwx set in octal (see Section 2.1), where each digit fits conveniently into three bits. Files on the system are usually created with mode 0644, meaning rw-r--r--. You can set permissions explicitly with an octal number, for example,

chmod 0755 myfile

gives myfile the permissions rwxr-xr-x. For a full list of octal values for all kinds of permissions and file types, see /usr/include/linux/stat.h.
In Table 14.1 you can see s, the setuid or setgid bit. If it is used without execute permissions, then it has no meaning and is written as a capitalized S. This bit effectively colorizes an x into an s, so you should read an s as execute with the setuid or setgid bit set. t is known as the sticky bit. It also has no meaning if there are no execute permissions and is written as a capital T.

The leading 0 can be ignored, but is preferred for explicitness. It can take on a value representing the three bits setuid (4), setgid (2), and sticky (1). Hence a value of 5764 is 101 111 110 100 in binary and gives -rwsrw-r-T.
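For example, a small sketch of explicit modes using the leading digit (the file and directory names are arbitrary):

chmod 4755 myprog    # rwsr-xr-x: myprog runs with the file owner's UID (setuid)
chmod 2775 mydir     # rwxrwsr-x: files created in mydir take on the directory's group (setgid)
chmod 1777 /tmp      # rwxrwxrwt: the sticky bit, as conventionally used on /tmp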
umask sets the default permissions for newly created files; it is usually 022. This default value means that the permissions of any new file you create (say, with the touch command) will be masked with this number. 022 hence excludes write permissions of group and of other. A umask of 006 would exclude read and write permissions of other, but would allow read and write of group. Try

umask
touch <file1>
ls -al <file1>
umask 026
touch <file2>
ls -al <file2>

026 is probably closer to the kind of mask we like as an ordinary user. Check your /etc/profile file to see what umask your login defaults to, when, and also why.
In addition to permissions, each file has three integers associated with it that represent, in seconds, the last time the file was accessed (read), when it was last modified (written to), and when its permissions were last changed. These are known as the atime, mtime, and ctime of a file, respectively.

To get a complete listing of the file's permissions, use the stat command. Here is the result of stat /etc:

  File: "/etc"
  Size: 4096         Filetype: Directory
  Mode: (0755/drwxr-xr-x)          Uid: (    0/    root)  Gid: (    0/    root)
Device:  3,1   Inode: 14057     Links: 41
Access: Sat Sep 25 04:09:08 1999(00000.15:02:23)
Modify: Fri Sep 24 20:55:14 1999(00000.22:16:17)
The Size: quoted here is the actual amount of disk space used to store the directory listing, and is the same as reported by ls. In this case it is probably four disk blocks of 1024 bytes each. The size of a directory as quoted here does not mean the sum of all files contained under it. For a file, however, the Size: would be the exact file length in bytes (again, as reported by ls).
Very often, a file is required to be in two different directories at the same time. Think for example of a configuration file that is required by two different software packages that are looking for the file in different directories. The file could simply be copied, but to have to replicate changes in more than one place would create an administrative nightmare. Also consider a document that must be present in many directories, but which would be easier to update at one point.
The way two (or more) files can have the same data is with links.
To demonstrate a soft link, try the following:

touch myfile
ln -s myfile myfile2
ls -al
cat > myfile
a few lines of text
^D
cat myfile
cat myfile2
Notice that the ls -al listing has the letter l on the far left next to myfile2, and the usual - next to myfile. This indicates that the file is a soft link (also known as a symbolic link or symlink) to some other file.
A
symbolic link
contains no data of its own, only a reference to another file. It can even contain a reference to a directory. In either case, programs operating on the link will actually see the file or directory it points to.
Try

mkdir mydir
ln -s mydir mydir2
ls -al .
touch ./mydir/file1
touch ./mydir2/file2
ls -al ./mydir
ls -al ./mydir2
The directory mydir2 is a symbolic link to mydir and appears as though it is a replica of the original. Once again the directory mydir2 does not consume additional disk space; a program that reads from the link is unaware that it is seeing into a different directory.
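To see where a link points without reading through it, list the link itself; a small sketch using the names from above:

ls -ld mydir2      # prints something like: mydir2 -> mydir
readlink mydir2    # prints just the target, here: mydir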
Symbolic links can also be copied and retain their value:

cp mydir2 /
ls -al /
cd /mydir2
You have now copied the link to the root directory. However, the link points to a relative path mydir in the same directory as the link. Since there is no mydir here, an error is raised.
Try

rm -f mydir2 /mydir2
ln -s `pwd`/mydir mydir2
ls -al
Now you will see that mydir2 has an absolute path. You can try

cp mydir2 /
ls -al /
cd /mydir2

and notice that it now works.
One of the common uses of symbolic links is to make mounted (see Section 19.4) file systems accessible from a different directory. For instance, you may have a large directory that has to be split over several physical disks. For clarity, you can mount the disks as /disk1, /disk2, etc., and then link the various subdirectories in a way that makes efficient use of the space you have.

Another example is the linking of /dev/cdrom to, say, /dev/hdc so that programs accessing the device file /dev/cdrom (see Chapter 18) actually access the correct IDE drive.
15.2 Hard Links

UNIX allows the data of a file to have more than one name in separate places in the same file system. Such a file with more than one name for the same data is called a hard-linked file and is similar to a symbolic link. Try

touch mydata
ln mydata mydataB
ls -al
The files mydata and mydataB are indistinguishable. They share the same data, and have a 2 in the second column of the ls -al listing. This means that they are hard-linked twice (that there are two names for this file).
The reason why hard links are sometimes used in preference to symbolic links is that some programs are not fooled by a symbolic link: if you have, say, a script that uses cp to copy a file, it will copy the symbolic link instead of the file it points to. (cp actually has an option to override this behavior.) A hard link, however, will always be seen as a real file.
On the other hand, hard links cannot be made between files on different file systems nor can they be made between directories.
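One way to convince yourself that both names refer to the very same file is to compare inode numbers; a small sketch using the files created above:

ls -il mydata mydataB    # -i prints the inode number; both lines show the same inode
rm mydata                # removing one name does not remove the data
cat mydataB              # the contents are still there under the other name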
16. Pre-installed Documentation
This chapter tells you where to find documentation on a common LINUX distribution. The paths are derived from a RedHat distribution, but are no less applicable to other distributions, although the exact locations might be different. One difference between distributions is the migration of documentation from /usr/???? to /usr/share/????, the proper place for it, on account of its being shareable between different machines. See Chapter 35 for the reason documentation goes where it does. In many cases, documentation may not be installed or may be in completely different locations. Unfortunately, I cannot keep track of what the 20 major vendors are doing, so it is likely that this chapter will quickly become out of date.
For many proprietary operating systems, the definitive reference is printed texts. For LINUX, much of the documentation is written by the authors themselves and is included with the source code. A typical LINUX distribution will package documentation along with the compiled binaries. Common distributions come with hundreds of megabytes of printable, hyperlinked, and plain text documentation. There is often no need to go to the World Wide Web unless something is outdated.
If you have not already tried this, run

ls -ld /usr/*/doc /usr/*/*/doc /usr/share/*/*/doc \
    /opt/*/doc /opt/*/*/doc

This is a somewhat unreliable way to search for potential documentation directories, but it gives at least the following list of directories for an official RedHat 7.0 with a complete set of installed packages:
/usr/X11R6/doc
/usr/lib/X11/doc
/usr/local/doc
/usr/share/vim/vim57/doc
/usr/share/doc
/usr/share/gphoto/doc
/usr/share/lout/doc
•
Kernel documentation:
/usr/src/linux/Documentation/
This directory contains information on all hardware drivers except graphic cards. The kernel has built-in drivers for networking cards, SCSI controllers, sound cards, and so on. If you need to find out if one of these is supported, this is the first place to look.
•
X Window System graphics hardware support:
/usr/X11R6/lib/X11/doc/
(This is the same as /usr/X11R6/doc/.) In this directory you will find documentation on all of the graphics hardware supported by XFree86, how to configure XFree86, tweak video modes, cope with incompatible graphics cards, and so on. See Section 43.5 for details.
•
TEX and Meta-Font reference:
/usr/share/texmf/doc/
This directory has an enormous and comprehensive reference to the TEX typesetting language and the Meta-Font font generation package. It is not, however, an exhaustive reference.
•
LaTeX HTML documentation:
/usr/share/texmf/doc/latex/latex2e-html/
This directory contains a large reference to the LaTeX typesetting language. (This book itself was typeset using LaTeX.)
•
HOWTOs:
/usr/doc/HOWTO or /usr/share/doc/HOWTO
HOWTOs are an excellent source of layman tutorials for setting up almost any kind of service you can imagine. RedHat seems to no longer ship this documentation with their base set of packages. It is worth listing the contents here to emphasize the diversity of topics covered. These are mirrored all over the Internet, so you should have no problem finding them from a search engine (in particular, from http://www.linuxdoc.org/
):
3Dfx-HOWTO
AX25-HOWTO
Access-HOWTO
Alpha-HOWTO
Assembly-HOWTO
Bash-Prompt-HOWTO
Benchmarking-HOWTO
Beowulf-HOWTO
BootPrompt-HOWTO
Bootdisk-HOWTO
Busmouse-HOWTO
Finnish-HOWTO
Firewall-HOWTO
French-HOWTO
Ftape-HOWTO
GCC-HOWTO
German-HOWTO
Glibc2-HOWTO
HAM-HOWTO
Hardware-HOWTO
Hebrew-HOWTO
INDEX.html
Modem-HOWTO
Multi-Disk-HOWTO
Multicast-HOWTO
NET-3-HOWTO
NFS-HOWTO
NIS-HOWTO
Networking-Overview-HOWTO
Optical-Disk-HOWTO
Oracle-HOWTO
PCI-HOWTO
PCMCIA-HOWTO
Security-HOWTO
Serial-HOWTO
Serial-Programming-HOWTO
Shadow-Password-HOWTO
Slovenian-HOWTO
Software-Release-Practice-HOWTO
Sound-HOWTO
Sound-Playing-HOWTO
Spanish-HOWTO
TeTeX-HOWTO
Text-Terminal-HOWTO
CD-Writing-HOWTO
CDROM-HOWTO
COPYRIGHT
Chinese-HOWTO
Commercial-HOWTO
Config-HOWTO
Consultants-HOWTO
Cyrillic-HOWTO
DNS-HOWTO
DOS-Win-to-Linux-HOWTO
DOS-to-Linux-HOWTO
DOSEMU-HOWTO
Danish-HOWTO
Distribution-HOWTO
ELF-HOWTO
Emacspeak-HOWTO
Esperanto-HOWTO
Ethernet-HOWTO
INFO-SHEET
IPCHAINS-HOWTO
IPX-HOWTO
IR-HOWTO
ISP-Hookup-HOWTO
Installation-HOWTO
Intranet-Server-HOWTO
Italian-HOWTO
Java-CGI-HOWTO
Kernel-HOWTO
Keyboard-and-Console-HOWTO
KickStart-HOWTO
LinuxDoc+Emacs+Ispell-HOWTO
META-FAQ
MGR-HOWTO
MILO-HOWTO
MIPS-HOWTO
Mail-HOWTO
PPP-HOWTO
PalmOS-HOWTO
Parallel-Processing-HOWTO
Pilot-HOWTO
Plug-and-Play-HOWTO
Polish-HOWTO
Portuguese-HOWTO
PostgreSQL-HOWTO
Printing-HOWTO
Printing-Usage-HOWTO
Quake-HOWTO
README
RPM-HOWTO
Reading-List-HOWTO
Root-RAID-HOWTO
SCSI-Programming-HOWTO
SMB-HOWTO
SRM-HOWTO
Thai-HOWTO
Tips-HOWTO
UMSDOS-HOWTO
UPS-HOWTO
UUCP-HOWTO
Unix-Internet-Fundamentals-HOWTO
User-Group-HOWTO
VAR-HOWTO
VME-HOWTO
VMS-to-Linux-HOWTO
Virtual-Services-HOWTO
WWW-HOWTO
WWW-mSQL-HOWTO
XFree86-HOWTO
XFree86-Video-Timings-HOWTO
XWindow-User-HOWTO
•
Mini HOWTOs:
/usr/doc/HOWTO/mini or /usr/share/doc/HOWTO/mini
These are smaller quick-start tutorials in the same vein (also available from http://www.linuxdoc.org/
):
3-Button-Mouse
ADSL
ADSM-Backup
AI-Alife
Advocacy
Alsa-sound
Apache+SSL+PHP+fp
Automount
Backup-With-MSDOS
Battery-Powered
Boca
BogoMips
Bridge
Bridge+Firewall
Bzip2
Cable-Modem
Cipe+Masq
Clock
Coffee
Colour-ls
Cyrus-IMAP
DHCP
DHCPcd
DPT-Hardware-RAID
Diald
Diskless
Ext2fs-Undeletion
Fax-Server
Firewall-Piercing
GIS-GRASS
GTEK-BBS-550
Hard-Disk-Upgrade
INDEX
INDEX.html
IO-Port-Programming
IP-Alias
IP-Masquerade
IP-Subnetworking
ISP-Connectivity
Install-From-ZIP
Kerneld
LBX
LILO
Large-Disk
Leased-Line
Linux+DOS+Win95+OS2
Linux+FreeBSD
Linux+FreeBSD-mini-HOWTO
Linux+NT-Loader
Linux+Win95
Loadlin+Win95
Loopback-Root-FS
Mac-Terminal
Mail-Queue
Mail2News
Man-Page
Modules
Multiboot-with-LILO
NCD-X-Terminal
NFS-Root
NFS-Root-Client
Netrom-Node
Netscape+Proxy
Netstation
News-Leafsite
Offline-Mailing
PLIP
Partition
Partition-Rescue
Path
Pre-Installation-Checklist
Process-Accounting
Proxy-ARP-Subnet
Public-Web-Browser
Qmail+MH
Quota
RCS
README
RPM+Slackware
RedHat-CD
Remote-Boot
Remote-X-Apps
SLIP-PPP-Emulator
Secure-POP+SSH
Sendmail+UUCP
Sendmail-Address-Rewrite
Small-Memory
Software-Building
Software-RAID
Soundblaster-AWE
StarOffice
Term-Firewall
TkRat
Token-Ring
Ultra-DMA
Update
Upgrade
VAIO+Linux
VPN
Vesafb
Visual-Bell
Windows-Modem-Sharing
WordPerfect
X-Big-Cursor
XFree86-XInside
Xterm-Title
ZIP-Drive
ZIP-Install
• LINUX documentation project:
/usr/doc/LDP or /usr/share/doc/ldp
The LDP project’s home page is http://www.linuxdoc.org/
. The LDP is a consolidation of
HOWTOs, FAQs, several books, man pages, and more. The web site will have anything that is not already installed on your system.
•
Web documentation:
/home/httpd/html or /var/www/html
Some packages may install documentation here so that it goes online automatically if your web server is running.
(In older distributions, this directory was
/home/httpd/html .)
•
Apache reference:
/home/httpd/html/manual or /var/www/html/manual
Apache keeps this reference material online, so that it is the default web page shown when you install Apache for the first time. Apache is the most popular web server.
•
Manual pages:
/usr/man/ or /usr/share/man/
Manual pages were discussed in Section 4.7. Other directory superstructures (see page 137) may contain man pages; on some other UNIX systems man pages are littered everywhere.

To convert a man page to PostScript (for printing or viewing), use, for example (for the cp command),

groff -Tps -mandoc /usr/man/man1/cp.1 > cp.ps ; gv cp.ps
groff -Tps -mandoc /usr/share/man/man1/cp.1 > cp.ps ; gv cp.ps
•
info pages:
/usr/info/ or /usr/share/info/
Info pages were discussed in Section 4.8.
•
Individual package documentation
: /usr/doc/* or /usr/share/doc/*
Finally, all packages installed on the system have their own individual documentation directory. A package foo will most probably have a documentation directory
/usr/doc/foo (or /usr/share/doc/foo ). This directory most often contains documentation released with the sources of the package, such as release information, feature news, example code, or FAQs. If you have a particular interest in a package, you should always scan its directory in /usr/doc (or /usr/share/doc ) or, better still, download its source distribution.
Below are the /usr/doc (or /usr/share/doc ) directories that contained more than a trivial amount of documentation for that package. In some cases, the package had complete references. (For example, the complete Python references were contained nowhere else.)
ImageMagick-5.2.2
LPRng-3.6.24
XFree86-doc-4.0.1
bash-2.04
bind-8.2.2 P5 cdrecord-1.9
cvs-1.10.8
fetchmail-5.5.0
freetype-1.3.1
gawk-3.0.6
gcc-2.96
gcc-c++-2.96
ghostscript-5.50
gimp-1.1.25
glibc-2.1.92
gtk+-1.2.8
gtk+-devel-1.2.8
ipchains-1.3.9
iproute-2.2.4
isdn4k-utils-3.1
krb5-devel-1.2.1
libtiff-devel-3.5.5
libtool-1.3.5
libxml-1.8.9
lilo-21.4.4
lsof-4.47
lynx-2.8.4
ncurses-devel-5.1
nfs-utils-0.1.9.1
openjade-1.3
openssl-0.9.5a
pam-0.72
pine-4.21
pmake-2.1.34
pygtk-0.6.6
python-docs-1.5.2
rxvt-2.6.3
sane-1.0.3
sgml-tools-1.0.9
slang-devel-1.4.1
stylesheets-1.54.13rh
tin-1.4.4
uucp-1.06.1
vim-common-5.7
17. Overview of the UNIX Directory Layout

Here is an overview of how UNIX directories are structured. This is a simplistic and theoretical overview and not a specification of the LINUX file system. Chapter 35 contains proper details of permitted directories and the kinds of files allowed within them.
LINUX systems are divided into hundreds of small packages, each performing some logical group of operations. On LINUX, many small, self-contained packages interoperate to give greater functionality than would large, aggregated pieces of software. There is also no clear distinction between what is part of the operating system and what is an application: every function is just a package.
A software package on a RedHat type system is distributed in a single RedHat Package Manager (RPM) file that has a .rpm extension. On a Debian distribution, the equivalent is a .deb package file, and on the Slackware distribution there are Slackware .tgz files.

Each package will unpack many files, which are placed all over the system. Packages generally do not create major directories but unpack files into existing, well-known, major directories. Note that on a newly installed system there are no files anywhere that do not belong to some package.
17.2 UNIX Directory Superstructure
The root directory on a UNIX system typically looks like this:

drwxr-xr-x    2 root  root    2048 Aug 25 14:04 bin
drwxr-xr-x    2 root  root    1024 Sep 16 10:36 boot
drwxr-xr-x    7 root  root   35840 Aug 26 17:08 dev
drwxr-xr-x   41 root  root    4096 Sep 24 20:55 etc
drwxr-xr-x   24 root  root    1024 Sep 27 11:01 home
drwxr-xr-x    4 root  root    3072 May 19 10:05 lib
drwxr-xr-x    2 root  root   12288 Dec 15  1998 lost+found
drwxr-xr-x    7 root  root    1024 Jun  7 11:47 mnt
dr-xr-xr-x   80 root  root       0 Sep 16 10:36 proc
drwxr-xr-x    3 root  root    3072 Sep 23 23:41 sbin
drwxrwxrwt    5 root  root    4096 Sep 28 18:12 tmp
drwxr-xr-x   25 root  root    1024 May 29 10:23 usr
The /usr directory typically looks like this:

drwxr-xr-x    9 root  root    1024 May 15 11:49 X11R6
drwxr-xr-x    6 root  root   27648 Sep 28 17:18 bin
drwxr-xr-x    2 root  root    1024 May 13 16:46 dict
drwxr-xr-x  261 root  root    7168 Sep 26 10:55 doc
drwxr-xr-x    7 root  root    1024 Sep  3 08:07 etc
drwxr-xr-x    2 root  root    2048 May 15 10:02 games
drwxr-xr-x    4 root  root    1024 Mar 21  1999 i386-redhat-linux
drwxr-xr-x   36 root  root    7168 Sep 12 17:06 include
drwxr-xr-x    2 root  root    9216 Sep  7 09:05 info
drwxr-xr-x   79 root  root   12288 Sep 28 17:17 lib
drwxr-xr-x    3 root  root    1024 May 13 16:21 libexec
drwxr-xr-x   15 root  root    1024 May 13 16:35 man
drwxr-xr-x    2 root  root    4096 May 15 10:02 sbin
drwxr-xr-x   39 root  root    1024 Sep 12 17:07 share
drwxr-xr-x    3 root  root    1024 Sep  4 14:38 src
drwxr-xr-x    3 root  root    1024 Dec 16  1998 var
The /usr/local directory typically looks like this:

drwxr-xr-x    3 root  root    4096 Sep 27 13:16 bin
drwxr-xr-x    2 root  root    1024 Feb  6  1996 doc
drwxr-xr-x    4 root  root    1024 Sep  3 08:07 etc
drwxr-xr-x    2 root  root    1024 Feb  6  1996 games
drwxr-xr-x    5 root  root    1024 Aug 21 19:36 include
drwxr-xr-x    2 root  root    1024 Sep  7 09:08 info
drwxr-xr-x    9 root  root    2048 Aug 21 19:44 lib
drwxr-xr-x   12 root  root    1024 Aug  2  1998 man
drwxr-xr-x    2 root  root    1024 Feb  6  1996 sbin
drwxr-xr-x   15 root  root    1024 Sep  7 09:08 share
The /usr/X11R6 directory also looks similar. What is apparent here is that all these directories contain a similar set of subdirectories. This set of subdirectories is called a directory superstructure or superstructure. (To my knowledge this is a new term not previously used by UNIX administrators.) The superstructure always contains a bin and lib subdirectory, but almost all others are optional.
Each package will install under one of these superstructures, meaning that it will unpack many files into various subdirectories of the superstructure. A RedHat package would always install under the /usr or / superstructure, unless it is a graphical X Window System application, which installs under the /usr/X11R6/ superstructure. Some very large applications may install under an /opt/<package-name> superstructure, and homemade packages usually install under the /usr/local/ superstructure (local means specific to this very machine). The directory superstructure under which a package installs is often called the installation prefix. Packages almost never install files across different superstructures. (Exceptions to this are configuration files, which are mostly stored in /etc/.)

Typically, most of the system is under /usr. This directory can be read-only, since packages should never need to write to this directory; any writing is done under /var or /tmp (/usr/var and /usr/tmp are often just symlinked to /var or /tmp, respectively). The small amount under / that is not part of another superstructure (usually about 40 megabytes) performs essential system administration functions. These are commands needed to bring up or repair the system in the absence of /usr.
The list of superstructure subdirectories and their descriptions is as follows:
bin      Binary executables. Usually all bin directories are in the PATH environment variable so that the shell will search all these directories for binaries.

sbin     Superuser binary executables. These are programs for system administration only. Only the root user will have these executables in their PATH.

lib      Libraries. All other data needed by programs goes in here. Most packages have their own subdirectory under lib to store data files into. Dynamically Linked Libraries (DLLs, or .so files: executable program code shared by more than one program in the bin directory, to save disk space and memory) are stored directly in lib.

etc      Et cetera. Configuration files.

var      Variable data. Data files that are continually being re-created or updated.

doc      Documentation. This directory is discussed in Chapter 16.

man      Manual pages. This directory is discussed in Chapter 16.

info     Info pages. This directory is discussed in Chapter 16.

share    Shared data. Architecture-independent files. Files that are independent of the hardware platform go here. This allows them to be shared across different machines, even though those machines may have a different kind of processor altogether.

include  C header files. These are for development.

src      C source files. These are sources to the kernel or locally built packages.

tmp      Temporary files. A convenient place for a running program to create a file for temporary use.

17.3 LINUX on a Single Floppy Disk
You can get LINUX to run on a 1.44 megabyte floppy disk if you trim all unneeded files off an old Slackware distribution with a 2.0.3x kernel. You can compile a small 2.0.3x kernel to about 400 kilobytes (compressed) (see Chapter 42). A file system can be reduced to 2–3 megabytes of absolute essentials and when compressed will fit into 1 megabyte. If the total is under 1.44 megabytes, then you have your LINUX on one floppy. The file list might be as follows (includes all links):
/bin
/bin/sh
/bin/cat
/bin/chmod
/bin/chown
/bin/cp
/bin/pwd
/bin/dd
/bin/df
/bin/du
/bin/free
/bin/gunzip
/bin/gzip
/bin/hostname
/bin/login
/bin/ls
/bin/mkdir
/bin/mv
/bin/ps
/bin/rm
/bin/stty
/bin/su
/bin/sync
/bin/zcat
/bin/dircolors
/bin/mount
/bin/umount
/bin/bash
/bin/domainname
/bin/head
/bin/kill
/bin/tar
/bin/cut
/bin/uname
/bin/ping
/bin/ln
/bin/ash
/etc
/etc/default
/etc/fstab
/etc/group
/etc/host.conf
/etc/hosts
/etc/inittab
/etc/issue
/etc/utmp
/etc/networks
/etc/passwd
/etc/profile
/etc/protocols
/etc/rc.d
/etc/rc.d/rc.0
/etc/rc.d/rc.K
/etc/rc.d/rc.M
/etc/rc.d/rc.S
/etc/rc.d/rc.inet1
/etc/rc.d/rc.6
/etc/rc.d/rc.4
/etc/rc.d/rc.inet2
/etc/resolv.conf
/etc/services
/etc/termcap
/etc/motd
/etc/magic
/etc/DIR_COLORS
/etc/HOSTNAME
/etc/mtools
/etc/ld.so.cache
/etc/psdevtab
/etc/mtab
/etc/fastboot
/lib
/lib/ld.so
/lib/libc.so.5
/lib/ld-linux.so.1
/lib/libcurses.so.1
/lib/libc.so.5.3.12
/lib/libtermcap.so.2.0.8
/lib/libtermcap.so.2
/lib/libext2fs.so.2.3
/lib/libcom_err.so.2
/lib/libcom_err.so.2.0
/lib/libext2fs.so.2
/lib/libm.so.5.0.5
/lib/libm.so.5
/lib/cpp
/usr
/usr/adm
/usr/bin
/usr/bin/less
/usr/bin/more
/usr/bin/sleep
/usr/bin/reset
/usr/bin/zless
/usr/bin/file
/usr/bin/fdformat
/usr/bin/strings
/usr/bin/zgrep
/usr/bin/nc
/usr/bin/which
/usr/bin/grep
/usr/sbin
/usr/sbin/showmount
/usr/sbin/chroot
/usr/spool
/usr/tmp
/sbin
/sbin/e2fsck
/sbin/fdisk
/sbin/fsck
/sbin/ifconfig
/sbin/iflink
/sbin/ifsetup
/sbin/init
/sbin/mke2fs
/sbin/mkfs
/sbin/mkfs.minix
/sbin/mklost+found
/sbin/mkswap
/sbin/mount
/sbin/route
/sbin/shutdown
/sbin/swapoff
/sbin/swapon
/sbin/telinit
/sbin/umount
/sbin/agetty
/sbin/update
/sbin/reboot
/sbin/netcfg
/sbin/killall5
/sbin/fsck.minix
/sbin/halt
/sbin/badblocks
/sbin/kerneld
/sbin/fsck.ext2
/var
/var/adm
/var/adm/utmp
/var/adm/cron
/var/spool
/var/spool/uucp
/var/spool/uucp/SYSLOG
/var/spool/uucp/ERRLOG
/var/spool/locks
/var/tmp
/var/run
/var/run/utmp
/home/user
/mnt
/proc
/tmp
/dev/<various-devices>
Note that the etc directory differs from that of a RedHat distribution. The system startup files /etc/rc.d are greatly simplified under Slackware.
The /lib/modules directory has been stripped for the creation of this floppy. /lib/modules/2.0.36 would contain dynamically loadable kernel drivers (modules). Instead, all needed drivers are compiled into the kernel for simplicity (explained in Chapter 42).
At some point, try creating a single floppy distribution as an exercise. This task should be most instructive to a serious system administrator. At the very least, you should look through all of the commands in the bin directories and the sbin directories above and browse through the man pages of any that are unfamiliar.
The preceding file system comes from the morecram-1.3 package available from http://rute.sourceforge.net/morecram-1.3.tar.gz. It can be downloaded to provide a useful rescue and setup disk. Note that there are many such rescue disks available which are more current than morecram.
18. UNIX Devices

UNIX was designed to allow transparent access to hardware devices across all CPU architectures. UNIX also supports the philosophy that all devices be accessible using the same set of command-line utilities.
UNIX has a beautifully consistent method of allowing programs to access hardware. Under UNIX, every piece of hardware is a file. To demonstrate this novelty, try viewing the file /dev/hda (you will have to be root to run this command):

less -f /dev/hda
/dev/hda is not really a file at all. When you read from it, you are actually reading directly from the first physical hard disk of your machine.
/dev/hda is known as a device file, and all of them are stored under the /dev directory.
Device files allow access to hardware. If you have a sound card installed and configured, you can try:

cat /dev/dsp > my_recording

Say something into your microphone and then type:

cat my_recording > /dev/dsp
The system will play out the sound through your speakers. (Note that this does not always work, since the recording volume or the recording speed may not be set correctly.)
If no programs are currently using your mouse, you can also try:

cat /dev/mouse
If you now move the mouse, the mouse protocol commands will be written directly to your screen (it will look like garbage). This is an easy way to see if your mouse is working, and is especially useful for testing a serial port. Occasionally this test doesn’t work because some command has previously configured the serial port in some odd way. In that case, also try:
cu -s 1200 -l /dev/mouse
At a lower level, programs that access device files do so in two basic ways:

• They read and write to the device to send and retrieve bulk data (much like less and cat above).

• They use the C ioctl (IO Control) function to configure the device. (In the case of the sound card, this might set mono versus stereo, recording speed, or other parameters.)

Because every kind of device that one can think of (except for network cards) can be twisted to fit these two modes of operation, UNIX's scheme has endured since its inception and is the universal method of accessing hardware.
18.2 Block and Character Devices

Hardware devices can generally be categorized into random access devices like disk and tape drives, and serial devices like mouse devices, sound cards, and terminals.

Random access devices are usually accessed in large contiguous blocks of data that are stored persistently. They are read from in discrete units (for most disks, 1024 bytes at a time). These are known as block devices. Running an ls -l /dev/hdb shows a b on the far left of the listing, which means that your hard disk is a block device:

brw-r-----   1 root  disk   3, 64 Apr 27  1995 /dev/hdb
Serial devices, on the other hand, are accessed one byte at a time. Data can be read or written only once. For example, after a byte has been read from your mouse, the same byte cannot be read by some other program. Serial devices are called character devices and are indicated by a c on the far left of the listing. Your /dev/dsp (Digital Signal Processor, that is, your sound card) device looks like:

crw-r--r--   1 root  sys   14,  3 Jul 18  1994 /dev/dsp

18.3 Major and Minor Device Numbers
Devices are divided into sets called major device numbers. For instance, all SCSI disks are major number 8. Further, each individual device has a minor device number, like /dev/sda, which is minor device 0. Major and minor device numbers identify the device to the kernel. The file name of the device is arbitrary and is chosen for convenience and consistency. You can see the major and minor device number (8, 0) in the ls listing for /dev/sda:

brw-rw----   1 root  disk   8,  0 May  5  1998 /dev/sda
18.4 Common Device Names

A list of common devices and their descriptions follows. The major numbers are shown in parentheses. The complete reference for devices is the file /usr/src/linux/Documentation/devices.txt.
/dev/hd?? (3, 22)  hd stands for hard disk, but refers here only to IDE devices, that is, common hard disks. The first letter after the hd dictates the physical disk drive:

  /dev/hda (3)   First drive, or primary master.
  /dev/hdb (3)   Second drive, or primary slave.
  /dev/hdc (22)  Third drive, or secondary master.
  /dev/hdd (22)  Fourth drive, or secondary slave.

When accessing any of these devices (with, say, less /dev/hda), you would be reading raw from the actual physical disk starting at the first sector of the first track, sequentially, until the last sector of the last track. Partitions are named /dev/hda1, /dev/hda2, etc., indicating the first, second, etc., partition on physical drive a. (With all operating systems, disk drives are divided into sections called partitions. A typical disk might have 2 to 10 partitions. Each partition acts as a whole disk on its own, giving the effect of having more than one disk. For instance, you might have Windows installed on one partition and LINUX installed on another. More details come in Chapter 19.)
/dev/sd?? (8)  sd stands for SCSI disk, the high-end drives mostly used by servers. sda is the first physical disk probed, and so on. Probing goes by SCSI ID and has a system completely different from that of IDE devices. /dev/sda1 is the first partition on the first drive, etc.
/dev/ttyS? (4)  Serial devices numbered from 0 up. /dev/ttyS0 is your first serial port (COM1 under MS-DOS or Windows). If you have a multiport card, these can go to 32, 64, and up.
/dev/psaux (10)  PS/2 mouse.

/dev/mouse  A symlink to /dev/ttyS0 or /dev/psaux. Other mouse devices are also supported.

/dev/modem  A symlink to /dev/ttyS1 or whatever port your modem is on.
/dev/cua? (4)  Identical to ttyS? but now fallen out of use.
/dev/fd? (2)  Floppy disk. fd0 is equivalent to your A: drive and fd1 your B: drive. The fd0 and fd1 devices autodetect the format of the floppy disk, but you can explicitly specify a higher density by using a device name like /dev/fd0H1920, which gives you access to 1.88 MB, formatted, 3.5-inch floppies. Other floppy devices are shown in Table 18.1. See Section 19.3.4 on how to format these devices.
/dev/par? (6)  Parallel port. /dev/par0 is your first parallel port, or LPT1 under DOS.

/dev/lp? (6)  Line printer. Identical to /dev/par?.
/dev/urandom  Random number generator. Reading from this device gives pseudo-random numbers.
/dev/st? (9)  SCSI tape. SCSI backup tape drive.
/dev/zero (1)  Produces zero bytes, and as many of them as you need. This is useful if you need to generate a block of zeros for some reason. Use dd (see Section 18.5.2) to read a specific number of zeros.
/dev/null (1)  Null device. Reads nothing. Anything you write to the device is discarded. This is very useful for discarding output.
/dev/pd?  Parallel port IDE disk.

/dev/pcd?  Parallel port ATAPI CD-ROM.

/dev/pf?  Parallel port ATAPI disk.

/dev/sr?  SCSI CD-ROM.

/dev/scd?  SCSI CD-ROM (identical, alternate name).
Table 18.1 Floppy device names

Floppy devices are named /dev/fd<l><m><nnnn>, where:

l     The drive: 0 is the A: drive, 1 is the B: drive.

m     The density:
        d  "double density" 360 KB, 5.25 inch
        h  "high density" 1.2 MB, 5.25 inch
        q  "quad density" 5.25 inch
        D  "double density" 720 KB, 3.5 inch
        H  "high density" 1.44 MB, 3.5 inch
        E  extra density, 3.5 inch
        u  any 3.5-inch floppy. Note that u now replaces D, H, and E, thus leaving it up to the user to decide if the floppy has enough density for the format.

nnnn  The size of the format: 360, 410, 420, 720, 800, 820, 830, 880, 1040, 1120, 1200, 1440, 1476, 1494, 1600, 1680, 1722, 1743, 1760, 1840, 1920, 2880, 3200, 3520, or 3840. With D, H, and E, 3.5-inch floppies have devices only for the sizes that are likely to work. For instance, there is no /dev/fd0D1440 because double density disks won't manage 1440 KB. /dev/fd0H1440 and /dev/fd0H1920 are probably the ones you are most interested in.
/dev/sg?  SCSI generic. This is a general-purpose SCSI command interface for devices like scanners.
/dev/fb? (29)  Frame buffer. This represents the kernel's attempt at a graphics driver.
/dev/cdrom  A symlink to /dev/hda, /dev/hdb, or /dev/hdc. It can also be linked to your SCSI CD-ROM.
/dev/ttyI?  ISDN modems.
/dev/tty? (4)  Virtual console. This is the terminal device for the virtual console itself and is numbered /dev/tty1 through /dev/tty63.
/dev/tty?? (3) and /dev/pty?? (2)  Other TTY devices used for emulating a terminal. These are called pseudo-TTYs and are identified by two lowercase letters and numbers, such as ttyq3. To nondevelopers, these are mostly of theoretical interest.
The file /usr/src/linux/Documentation/devices.txt also has this to say (quoted verbatim):
Recommended links

It is recommended that these links exist on all systems:

/dev/core      /proc/kcore   symbolic   Backward compatibility
/dev/ramdisk   ram0          symbolic   Backward compatibility
/dev/ftape     qft0          symbolic   Backward compatibility
/dev/bttv0     video0        symbolic   Backward compatibility
/dev/radio     radio0        symbolic   Backward compatibility
/dev/i2o*      /dev/i2o/*    symbolic   Backward compatibility
/dev/scd?      sr?           hard       Alternate SCSI CD-ROM name
Locally defined links

The following links may be established locally to conform to the configuration of the system. This is merely a tabulation of existing practice, and does not constitute a recommendation. However, if they exist, they should have the following uses:

/dev/mouse     mouse port      symbolic   Current mouse device
/dev/tape      tape device     symbolic   Current tape device
/dev/cdrom     CD-ROM device   symbolic   Current CD-ROM device
/dev/cdwriter  CD-writer       symbolic   Current CD-writer device
/dev/scanner   scanner         symbolic   Current scanner device
/dev/modem     modem port      symbolic   Current dialout device
/dev/root      root device     symbolic   Current root file system
/dev/swap      swap device     symbolic   Current swap device
/dev/modem should not be used for a modem which supports dial-in as well as dialout, as it tends to cause lock file problems. If it exists, /dev/modem should point to the appropriate primary TTY device (the use of the alternate callout devices is deprecated).
For SCSI devices, /dev/tape and /dev/cdrom should point to the "cooked" devices (/dev/st* and /dev/sr*, respectively), whereas /dev/cdwriter and /dev/scanner should point to the appropriate generic SCSI devices (/dev/sg*).

/dev/mouse may point to a primary serial TTY device, a hardware mouse device, or a socket for a mouse driver program (e.g. /dev/gpmdata).
Sockets and pipes

Non-transient sockets and named pipes may exist in /dev. Common entries are:

/dev/printer   socket   lpd local socket
/dev/log       socket   syslog local socket
/dev/gpmdata   socket   mouse multiplexer
18.5 dd, tar, and Tricks with Block Devices

dd probably originally stood for disk dump. It is actually just like cat except it can read and write in discrete blocks. It essentially reads and writes between devices while converting the data in some way. It is generally used in one of these ways:
dd if=<in-file> of=<out-file> [bs=<block-size>] \
    [count=<number-of-blocks>] [seek=<output-offset>] \
    [skip=<input-offset>]
dd if=<in-file> [bs=<block-size>] [count=<number-of-blocks>] \
    [skip=<input-offset>] > <outfile>
dd of=<out-file> [bs=<block-size>] [count=<number-of-blocks>] \
    [seek=<output-offset>] < <infile>
To use dd, you must specify an input file and an output file with the if= and of= options. If the of= option is omitted, then dd writes to stdout. If the if= option is omitted, then dd reads from stdin. (If you are confused, remember that dd thinks of in and out with respect to itself.) Note that dd is an unforgiving and destructive command that should be used with caution.
To create a new RedHat boot floppy, find the boot.img file on ftp.redhat.com, and with a new floppy, run:

dd if=boot.img of=/dev/fd0

This command writes the raw disk image directly to the floppy disk. All distributions will have similar disk images for creating installation floppies (and sometimes rescue floppies).
If you have ever tried to repartition a LINUX disk back into a DOS/Windows disk, you will know that DOS/Windows FDISK has bugs in it that prevent it from recreating the partition table. A quick

dd if=/dev/zero of=/dev/hda bs=1024 count=10240

will write zeros to the first 10 megabytes of your first IDE drive. This will wipe out the partition table as well as any file system information and give you a "brand new" disk.
To zero a floppy disk is just as easy:

dd if=/dev/zero of=/dev/fd0 bs=1024 count=1440
Even writing zeros to a floppy may not be sufficient. Specialized equipment can probably still read magnetic media after it has been erased several times. If, however, you write random bits to the floppy, it becomes completely impossible to determine what was on it:

mknod /dev/urandom c 1 9
for i in 1 2 3 4 ; do
    dd if=/dev/urandom of=/dev/fd0 bs=1024 count=1440
done
Here is a nice trick to find out something about a hard drive:

dd if=/dev/hda1 count=1 bs=512 | file -

gives x86 boot sector.
To discover what a floppy disk is, try

dd if=/dev/fd0 count=1 bs=512 | file -

which gives x86 boot sector, system )k?/bIHC, FAT (12 bit) for DOS floppies.
If you have two IDE drives that are of identical size, and provided that you are sure they contain no bad sectors and provided neither is mounted, you can copy the one drive directly onto the other with a single dd command (a sketch follows below) and avoid having to install an operating system from scratch. It doesn't matter what is on the original (Windows, LINUX, or whatever) since each sector is identically duplicated; the new system will work perfectly. (If they are not the same size, you will have to use tar or mirrordir to replicate the file system exactly.)
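The copy itself is a single dd invocation. The device names below are examples only (primary master onto secondary master); substitute your own source and destination and check them very carefully, since the destination is overwritten completely:

# WARNING: destroys everything on the destination drive
dd if=/dev/hda of=/dev/hdc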
You can use tar to back up to any device. Consider periodic backups to an ordinary IDE drive instead of a tape. Here we back up to the secondary slave:

tar -cvzf /dev/hdd /bin /boot /dev /etc /home /lib /sbin /usr /var

tar can also back up across multiple floppy disks:

tar -cvMf /dev/fd0 /home/simon
tar traditionally backs up onto tape drives. The commands

mt -f /dev/st0 rewind
tar -cvf /dev/st0 /home

rewind SCSI tape 0 and archive the /home directory onto it. You should not try to use compression with tape drives because they are error prone, and a single error could make the entire archive unrecoverable. The mt command stands for magnetic tape and controls generic SCSI tape devices. See also mt(1).
If you don't want to see any program output, just append > /dev/null to the command. For example, we aren't often interested in the output of make. (make is discussed later.) Here we absorb everything save for error messages:

make > /dev/null

Then, of course, we can absorb all output including error messages with either

make >& /dev/null

or

make > /dev/null 2>&1

The device /dev/null finds innumerable uses in shell scripting to suppress the output of a command or to feed a command dummy (empty) input. /dev/null is a safe
file from a security point of view. It is often used when a file is required for some feature in a configuration script, and you would like the particular feature disabled. For instance, specifying the user's shell as /dev/null inside the password file will
certainly
prevent insecure use of a shell, and is an explicit way of saying that that account does
not
allow shell logins.
You can also use /dev/null to create a file containing nothing:

cat /dev/null > myfile

or alternatively, to create a file containing only zeros, try

dd if=/dev/zero bs=1024 count=<number-of-kilobytes> > myfile
18.6 Creating Devices with mknod and /dev/MAKEDEV

Although all devices are listed in the /dev directory, you can create a device anywhere in the file system by using the mknod command:

mknod [-m <mode>] <file-name> [b|c] <major-number> <minor-number>
The letters b and c are for creating a block or character device, respectively.
To demonstrate, try

mknod -m 0600 ~/my-floppy b 2 0
ls -al /dev/fd0 ~/my-floppy

my-floppy can be used just like /dev/fd0. Note carefully the mode (i.e., the permissions) of /dev/fd0. /dev/fd0 should be readable and writable only to root and to users belonging to the floppy group, since we obviously don't want an arbitrary user to be able to log in (remotely) and overwrite a floppy disk.
In fact, this is the reason for having devices represented as files in the first place: UNIX files naturally support group access control, and therefore so do devices.
To create devices that are missing from your /dev directory (some esoteric devices will not be present by default), simply look up the device's major and minor number in /usr/src/linux/Documentation/devices.txt and use the mknod command. This procedure is, however, somewhat tedious, and the script /dev/MAKEDEV is usually available for convenience. You must be in the /dev directory before you run this script.
Typical usage of MAKEDEV is to create a complete set of floppy disk devices:

cd /dev
./MAKEDEV -v fd0
./MAKEDEV -v fd1

The man page for MAKEDEV contains more details. In particular, it states:

Note that programs giving the error "ENOENT: No such file or directory" normally means that the device file is missing, whereas "ENODEV: No such device" normally means the kernel does not have the driver configured or loaded.
19. Partitions, File Systems, Formatting, Mounting
19.1 The Physical Disk Structure

Physical disks are divided into partitions. (See /dev/hd?? under Section 18.4.) Information as to how the disk is partitioned up is stored in a partition table, which is a small area of the disk separate from the partitions themselves.
The physical drive itself usually comprises several actual disks of which both sides are used. The sides are labelled 0, 1, 2, 3, and so on, and are also called heads because one magnetic head per side does the actual reading and writing. Each side/head has tracks, and each track is divided into segments called sectors. Each sector typically holds 512 bytes. The total amount of space on the drive in bytes is therefore:

512 x (sectors-per-track) x (tracks-per-side) x (number-of-sides)
A single track and all the tracks of the same diameter (on all the sides) are called a
cylinder
. Disks are normally talked about in terms of “cylinders and sectors” instead of
“sides, tracks, and sectors.” Partitions are (usually) divided along cylinder boundaries.
Hence, disks do not have arbitrarily sized partitions; rather, the size of the partition is usually a multiple of the amount of data held in a single cylinder. Partitions therefore have a definite inner and outer diameter. Figure 19.1 illustrates the layout of a hard disk.
[Figure 19.1: Hard drive platters and sector layout, labeling a partition, a sector, a cylinder, and sides 0 through 5]
The system above is quite straightforward except for the curious limitation that partition tables have only 10 bits in which to store the partition's cylinder offset. This means that no disk can have more than 1024 cylinders. This limitation was overcome by multiplying up the number of heads in software to reduce the number of cylinders (called LBA, Large Block Addressing, mode), hence portraying a disk of impossible proportions. The user, however, need never be concerned that the physical disk is completely otherwise.
The partition table has room for only four partitions. For more partitions, one of these four partitions can be divided into many smaller partitions, called logical partitions. The original four are then called primary partitions. If a primary partition is subdivided in this way, it is known as an extended primary or extended partition. Typically, the first primary partition will be small (/dev/hda1, say). The second primary partition will fill the rest of the disk as an extended partition (/dev/hda2, say). In this case, the entries in the partition table of /dev/hda3 and /dev/hda4 will be blank. The extended partition can be subdivided repeatedly to give /dev/hda5, /dev/hda6, and so on.
19.2 Partitioning a New Disk

A new disk has no partition information. Typing fdisk will start an interactive partitioning utility. The command

fdisk /dev/hda

fdisks your primary master.
What follows is an example of the partitioning of a new hard drive. Most distributions these days have a simpler graphical system for creating partitions, so using fdisk will not be necessary at installation time. However, adding a new drive or transferring/copying a LINUX system to new hardware will require partitioning.
On UNIX, each partition has its own directory. Files under one directory might be stored on a different disk or a different partition to files in another directory. Typically, the /var directory (and all subdirectories beneath it) is stored on a different partition from the /usr directory (and all subdirectories beneath it).
p
¨
If another operating system is already installed in the first partition, you can type and might see:
Command (m for help):
p
¥
Disk /dev/hda: 255 heads, 63 sectors, 788 cylinders
Units = cylinders of 16065 * 512 bytes
5
Device Boot Start
1
End
312
Blocks
2506108+
Id c
System
Win95 FAT32 (LBA)
¦
In such a case, you can just start adding further partitions.
The exact same procedure applies in the case of SCSI drives. The only difference is that /dev/hd? changes to /dev/sd?. (See Chapter 42 for SCSI device driver information.)
Here is a partitioning session with fdisk:

[root@cericon /root]# fdisk /dev/hda
Device contains neither a valid DOS partition table, nor Sun or SGI disklabel
Table 19.1 Which directories should have their own partitions, and their partitions' sizes

swap (twice the size of your RAM)  This is where memory is drawn from when you run out. The swap partition gives programs the impression that you have more RAM than you actually do, by swapping data in and out of this partition. Swap partitions cannot be over 128 MB, but you can have many of them (this limitation has been removed in newer kernels). Disk access is obviously slow compared to direct RAM, but when a lot of idle programs are running, swapping to disk allows more real RAM for needy programs.

/boot (5-10 MB)  This directory need not be on a different partition to your / partition (below). Whatever you choose, there must be no chance that a file under /boot could span sectors that are over the 1024 cylinder boundary (i.e., outside of the first 500 megabytes of your hard drive). This is why /boot (or /) is often made the first primary partition of the hard drive. If this requirement is not met, you get the famous LI prompt on a nonbooting system. See Section 31.2.4.

/var (100-1000 MB)  Here is variable data, like log files, mail spool files, database files, and your web proxy cache (web cache and databases may need to be much bigger though). For newer distributions, this directory also contains any local data that this site serves (like FTP files or web pages). If you are going to be using a web cache, either store the stuff in a separate partition/disk or make your /var partition huge. Also, log files can grow to enormous sizes when there are problems. You don't want a full or corrupted /var partition to affect the rest of your disk. This is why it goes in its own partition.

/tmp (50 MB)  Here is temporary data. Programs access this frequently and need it to be fast. It goes in a separate partition because programs really need to create a temporary file sometimes, and this should not be affected by other partitions becoming full. This partition is also more likely to be corrupted.

/usr (500-1500 MB)  Here is your distribution (Debian, RedHat, Mandrake, etc.). It can be mounted read-only. If you have a disk whose write access can physically be disabled (like some SCSI drives), then you can put /usr on a separate drive. Doing so will make for a much more secure system. Since /usr is stock standard, this is the partition you can most afford to lose. Note however that /usr/local/ may be important to you; possibly link this elsewhere.

/home (remainder of disk)  Here are your users' home directories. For older distributions, this directory also contains any local data that this site serves (like FTP files or web pages).

/ (50-100 MB)  Anything not in any of the other directories is directly under your / directory. These are the /bin (5 MB), (possibly) /boot (3 MB), /dev (0.1 MB), /etc (4 MB), /lib (20 MB), /mnt (0 MB), /proc (0 MB), and /sbin (4 MB) directories. They are essential for the system to start up and contain minimal utilities for recovering the other partitions in an emergency. As stated above, if the /boot directory is in a separate partition, then / must be below the 1024 cylinder boundary (i.e., within the first 500 megabytes of your hard drive).
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

First, we use the p option to print current partitions—

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 788 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot   Start      End    Blocks   Id  System
—of which there are clearly none. Now n lets us add a new partition:

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
We want to define the first physical partition starting at the first cylinder:

Partition number (1-4): 1
First cylinder (1-788, default 1): 1

We would like an 80-megabyte partition. fdisk calculates the last cylinder automatically with:

Last cylinder or +size or +sizeM or +sizeK (1-788, default 788): +80M
Our next new partition will span the rest of the disk and will be an extended partition:

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Partition number (1-4): 2
First cylinder (12-788, default 12): 12
Last cylinder or +size or +sizeM or +sizeK (12-788, default 788): 788
Our remaining logical partitions fit within the extended partition:

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (12-788, default 12): 12
Last cylinder or +size or +sizeM or +sizeK (12-788, default 788): +64M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (21-788, default 21): 21
Last cylinder or +size or +sizeM or +sizeK (21-788, default 788): +100M
Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (34-788, default 34): 34
Last cylinder or +size or +sizeM or +sizeK (34-788, default 788): +200M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (60-788, default 60): 60
Last cylinder or +size or +sizeM or +sizeK (60-788, default 788): +1500M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (252-788, default 252): 252
Last cylinder or +size or +sizeM or +sizeK (252-788, default 788): 788
The default partition type is a single byte that the operating system will look at to determine what kind of file system is stored there. Entering l lists all known types:

Command (m for help): l

 0  Empty            16  Hidden FAT16      61  SpeedStor       a6  OpenBSD
 8  AIX              4d  QNX4.x            82  Linux swap      db  CP/M / CTOS / .
 9  AIX bootable     4e  QNX4.x 2nd part   83  Linux           e1  DOS access
12  Compaq diagnost  56  Golden Bow        a5  BSD/386         ff  BBT
[...]                5c  Priam Edisk
fdisk will set the type to Linux by default. We only need to explicitly set the type of the swap partition:

Command (m for help): t
Partition number (1-9): 5
Hex code (type L to list codes): 82
Now we need to set the bootable flag on the first partition, since BIOSs will not boot a disk without at least one bootable partition:

Command (m for help): a
Partition number (1-9): 1
Displaying our results gives:

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 788 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot   Start      End    Blocks   Id  System
/dev/hda1   *        1       11     88326   83  Linux
/dev/hda2           12      788   6241252+   5  Extended
/dev/hda5           12       20     72261   82  Linux swap
/dev/hda6           21       33    104391   83  Linux
/dev/hda7           34       59    208813+  83  Linux
/dev/hda8           60      251   1542208+  83  Linux
/dev/hda9          252      788   4313421   83  Linux
At this point, nothing has been committed to disk. We write it as follows (Note: this step is irreversible):

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Even having written the partition table, fdisk may give a warning that the kernel does not know about the new partitions. This happens if the disk is already in use. In this case, you will need to reboot. For the above partitioning, the kernel will give the following information at boot time:

Partition check:
 hda: hda1 hda2 < hda5 hda6 hda7 hda8 hda9 >

The < ... > shows that partition hda2 is extended and is subdivided into five smaller partitions.
19.3 Formatting Devices
Disk drives are usually read in blocks of 1024 bytes (two sectors). From the point of view of anyone accessing the device, blocks are stored consecutively (there is no need to think about cylinders or heads) so that any program can read the disk as though it were a linear tape. Try

less /dev/hda1
less -f /dev/hda1
Now a complex directory structure with many files of arbitrary size needs to be stored in this contiguous partition. This poses the problem of what to do with a file that gets deleted and leaves a data "hole" in the partition, or a file that has to be split into parts because there is no single contiguous space big enough to hold it. Files also have to be indexed in such a way that they can be found quickly (consider that there can easily be 10,000 files on a system). UNIX's symbolic/hard links and device files also have to be stored.
To cope with this complexity, operating systems have a format for storing files called the file system (fs). Like MS-DOS with its FAT file system or Windows with its FAT32 file system, LINUX has a file system called the 2nd extended file system, or ext2. Whereas ext2 is the traditional native LINUX file system, three other native file systems have recently become available: SGI's XFS file system, the ext3fs file system, and the reiserfs file system. These three support fast and reliable recovery in the event of a power failure, using a feature called journaling. A journaling file system prewrites disk alterations to a separate log to facilitate recovery if the file system reaches an incoherent state. (See Section 19.5.)
To create a file system on a blank partition, use the command mkfs (or one of its variants). To create a LINUX ext2 file system on the first partition of the primary master, run

mkfs -t ext2 -c /dev/hda1

or, alternatively,

mke2fs -c /dev/hda1

The -c option means to check for bad blocks by reading through the entire disk first.
This is a read-only check and causes unreadable blocks to be flagged as such and not be used. To do a full read-write check, use the badblocks command. This command writes to and verifies every bit in that partition. Although the -c option should always be used on a new disk, doing a full read-write test is probably pedantic. For the above partition, this test would be:

badblocks -o blocks-list.txt -s -w /dev/hda1 88326
mke2fs -l blocks-list.txt /dev/hda1
After running mke2fs, we will find that

dd if=/dev/hda1 count=8 bs=1024 | file -

gives Linux/i386 ext2 filesystem.
New kinds of removable devices are being released all the time. Whatever the device, the same formatting procedure is used. Most are IDE compatible, which means you can access them through /dev/hd?.

The following examples are a parallel port IDE disk drive, a parallel port ATAPI CD-ROM drive, a parallel port ATAPI disk drive, and your "A:" floppy drive, respectively:

mke2fs -c /dev/pda1
mke2fs -c /dev/pcd0
mke2fs -c /dev/pf0
mke2fs -c /dev/fd0
Actually, using an ext2 file system on a floppy drive wastes a lot of space.
Rather, use an MS-DOS file system, which has less overhead and can be read by anyone
(see Section 19.3.4).
You often will not want to be bothered with partitioning a device that is only going to have one partition anyway. In this case, you can use the whole disk as one partition. An example is a removable IDE drive as a primary slave (LS120 disks and Jazz drives, as well as removable IDE brackets, are commercial examples):

mke2fs -c /dev/hdb
Accessing files on MS-DOS/Windows floppies is explained in Section 4.16. The command mformat A: will format a floppy, but this command merely initializes the file system; it does not check for bad blocks or do the low-level formatting necessary to reformat floppies to odd storage sizes.

A command called superformat, from the fdutils package (you may have to find this package on the Internet; see Chapter 24 for how to compile and install source packages), formats a floppy in any way that you like. A more common (but less thorough) command is fdformat from the util-linux package. It verifies that each track is working properly and compensates for variations between the mechanics of different floppy drives. To format a 3.5-inch 1440-KB, 1680-KB, or 1920-KB floppy, respectively, run:

cd /dev
./MAKEDEV -v fd0
superformat /dev/fd0H1440
superformat /dev/fd0H1680
superformat /dev/fd0H1920
Note that these are "long file name" floppies (VFAT), not old 13-character-filename MS-DOS floppies.
Most users would have only ever used a 3.5-inch floppy as a "1.44 MB" floppy. In fact, the disk media and magnetic head can write much more densely than this specification, allowing 24 sectors per track to be stored instead of the usual 18. This is why there is more than one device file for the same drive. Some inferior disks will, however, give errors when trying to format that densely—superformat will show errors when this happens.
See Table 18.1 on page 145 for the naming conventions of floppy devices, and their many respective formats.
The mkswap command formats a partition to be used as a swap device. For our disk, run:

    mkswap -c /dev/hda5

-c has the same meaning as previously—to check for bad blocks.

Once the partition is formatted, the kernel can be signalled to use that partition as a swap partition with

    swapon /dev/hda5
and to stop usage,

    swapoff /dev/hda5

Swap partitions cannot be larger than 128 MB, although you can have as many of them as you like. You can swapon many different partitions simultaneously.
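To confirm which swap areas the kernel is actually using, and how much swap is available, the standard utilities below can be consulted—a small illustrative check, not part of the formatting procedure itself:

    swapon -s      # summarize active swap devices (equivalent to: cat /proc/swaps)
    free           # report total, used, and free memory and swap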
19.4 Device Mounting

The question of how to access files on an arbitrary disk (without C:, D:, etc., notation, of course) is answered here. In UNIX, there is only one root file system that spans many disks. Different directories may actually exist on different physical disks.

To bind a directory to a physical device (like a partition or a CD-ROM) so that the device's file system can be read is called mounting the device.
The mount command is used as follows:

    mount [-t <fstype>] [-o <option>] <device> <directory>
    umount [-f] [<device>|<directory>]

The -t option specifies the kind of file system, and can often be omitted since LINUX can autodetect most file systems. <fstype> can be one of adfs, affs, autofs, coda, coherent, devpts, efs, ext2, hfs, hpfs, iso9660, minix, msdos, ncpfs, nfs, ntfs, proc, qnx4, romfs, smbfs, sysv, ufs, umsdos, vfat, xenix, or xiafs. The most common file systems are discussed below. The -o option is not usually used. See mount(8) for all possible options.
Put your distribution CD-ROM disk into your CD-ROM drive and mount it with

    ls /mnt/cdrom
    mount -t iso9660 -o ro /dev/hdb /mnt/cdrom

(Your CD-ROM might be /dev/hdc or /dev/hdd, however—in this case you should make a soft link /dev/cdrom pointing to the correct device. Your distribution may also prefer /cdrom over /mnt/cdrom.) Now cd to your /mnt/cdrom directory. You will notice that it is no longer empty, but "contains" the CD-ROM's files. What is happening is that the kernel is redirecting all lookups from the directory /mnt/cdrom to read from the CD-ROM disk. You can browse around these files as though they were already copied onto your hard drive. This is one of the things that makes UNIX cool.
When you are finished with the CD-ROM, unmount it with

    umount /dev/hdb
    eject /dev/hdb
Instead of using mtools, you could mount the floppy disk with

    mkdir /mnt/floppy
    mount -t vfat /dev/fd0 /mnt/floppy

or, for older MS-DOS floppies, use

    mkdir /mnt/floppy
    mount -t msdos /dev/fd0 /mnt/floppy

Before you eject the floppy, it is essential to run

    umount /dev/fd0

in order that cached data is committed to the disk. Failing to umount a floppy before ejecting it will probably corrupt its file system.
Mounting a Windows partition can also be done with the vfat file system, and NT partitions (read-only) with the ntfs file system. FAT32 is also supported (and autodetected). For example,

    mkdir /windows
    mount -t vfat /dev/hda1 /windows
    mkdir /nt
    mount -t ntfs /dev/hda2 /nt
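If ordinary users should be able to read such a partition, vfat accepts ownership and permission options at mount time. A hedged example follows—the numeric uid and gid values are invented for illustration and should be replaced with a real user's IDs:

    mount -t vfat -o ro,uid=500,gid=500,umask=022 /dev/hda1 /windows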
19.5 File System Repair: fsck

fsck stands for file system check. fsck scans the file system, reporting and fixing errors. Errors would normally occur only if the kernel halted before the file system was umounted. In this case, it may have been in the middle of a write operation which left the file system in an incoherent state. This usually happens because of a power failure. The file system is then said to be unclean.

fsck is used as follows:

    fsck [-V] [-a] [-t <fstype>] <device>

-V means to produce verbose output. -a means to check the file system noninteractively—meaning to not ask the user before trying to make any repairs.
Here is what you would normally do with LINUX if you don't know a whole lot about the ext2 file system:

    fsck -a -t ext2 /dev/hda1

although you can omit the -t option because LINUX autodetects the file system. Note that you should not run fsck on a mounted file system. In exceptional circumstances it is permissible to run fsck on a file system that has been mounted read-only.

fsck actually just runs a program specific to that file system. In the case of ext2, the command e2fsck (also known as fsck.ext2) is run. See e2fsck(8) for exhaustive details.
During an interactive check (without the -a option, or with the -r option—the default), various questions may be asked of you, as regards fixing and saving things. It's best to save stuff if you aren't sure; it will be placed in the lost+found directory below the root directory of the particular device. In the example system further below, there would exist the directories /lost+found, /home/lost+found, /var/lost+found, /usr/lost+found, etc. After doing a check on, say, /dev/hda9, list the /home/lost+found directory and delete what you think you don't need. These will usually be temporary files and log files (files that change often). It's rare to lose important files because of an unclean shutdown.

If a file system error is reported at boot time, just read Section 19.5 again and run fsck on the file system that reported the error.
19.7 Automatic Mounts: fstab
Manual mounts are explained above for new and removable disks. It is, of course, necessary for file systems to be automatically mounted at boot time. What gets mounted and how is specified in the configuration file /etc/fstab.

/etc/fstab will usually look something like this for the disk we partitioned earlier:
    /dev/hda1     /             ext2     defaults          1 1
    /dev/hda6     /tmp          ext2     defaults          1 2
    /dev/hda7     /var          ext2     defaults          1 2
    /dev/hda8     /usr          ext2     defaults          1 2
    /dev/hda9     /home         ext2     defaults          1 2
    /dev/hda5     swap          swap     defaults          0 0
    /dev/fd0      /mnt/floppy   auto     noauto,user       0 0
    /dev/cdrom    /mnt/cdrom    iso9660  noauto,ro,user    0 0
    none          /proc         proc     defaults          0 0
    none          /dev/pts      devpts   mode=0622         0 0
For the moment we are interested in the first six lines only. The first three fields (columns) dictate the partition, the directory where it is to be mounted, and the file system type, respectively. The fourth field gives options (the -o option to mount).

The fifth field tells whether the file system contains real files. The field is used by the dump command to decide if it should be backed up. This is not commonly used.

The last field tells the order in which an fsck should be done on the partitions. The / partition should come first with a 1, and all other partitions should come directly after. Placing 2's everywhere else ensures that partitions on different disks can be checked in parallel, which speeds things up slightly at boot time.
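As an illustration of the field layout, an entry allowing ordinary users to mount a Windows partition on demand might look like the line below. The device and directory are assumptions for the sake of the example, not part of the partitioning scheme above:

    /dev/hda2    /windows    vfat    noauto,user    0 0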
The floppy and cdrom entries enable you to use an abbreviated form of the mount command. mount will just look up the corresponding directory and file system type from /etc/fstab. Try

    mount /dev/cdrom

These entries also have the user option, which allows ordinary users to mount these devices. The ro option once again tells mount to mount the CD-ROM read only, and the noauto option tells mount not to mount these file systems at boot time. (More comes further below.)

proc is a kernel information database that looks like a file system. For example, /proc/cpuinfo is not any kind of file that actually exists on a disk somewhere. Try cat /proc/cpuinfo. Many programs use /proc to get dynamic information on the status and configuration of your machine. More on this is discussed in Section 42.4.
The devpts file system is another pseudo file system that generates terminal master/slave pairs for programs. This is mostly of concern to developers.
19.8 Manually Mounting /proc

You can mount the proc file system with the command

    mount -t proc /proc /proc

This is an exception to the normal mount usage. Note that all common LINUX installations require /proc to be mounted at boot time. The only times you will need this command are for manual startup or when doing a chroot. (See page 178.)
A RAM device is a block device that can be used as a disk but really points to a physical area of RAM.

A loopback device is a block device that can be used as a disk but really points to an ordinary file somewhere.

If your imagination isn't already running wild, consider creating a floppy disk with file system, files and all, without actually having a floppy disk, and being able to dump this creation to floppy at any time with dd. You can also have a whole other LINUX system inside a 500 MB file on a Windows partition and boot into it—thus obviating having to repartition a Windows machine just to run LINUX. All this can be done with loopback and RAM devices.

The operations are quite trivial. To create an ext2 floppy inside a 1440 KB file, run:

    dd if=/dev/zero of=~/file-floppy count=1440 bs=1024
    losetup /dev/loop0 ~/file-floppy
    mke2fs /dev/loop0
    mkdir ~/mnt
    mount /dev/loop0 ~/mnt
    ls -al ~/mnt
When you are finished copying the files that you want into ~/mnt, merely run

    umount ~/mnt
    losetup -d /dev/loop0

To dump the file system to a floppy, run

    dd if=~/file-floppy of=/dev/fd0 count=1440 bs=1024
A similar procedure for RAM devices is

    dd if=/dev/zero of=/dev/ram0 count=1440 bs=1024
    mke2fs /dev/ram0
    mkdir ~/mnt
    mount /dev/ram0 ~/mnt
    ls -al ~/mnt

When you are finished copying the files that you want into ~/mnt, merely run

    umount ~/mnt

To dump the file system to a floppy or file, respectively, run:

    dd if=/dev/ram0 of=/dev/fd0 count=1440 bs=1024
    dd if=/dev/ram0 of=~/file-floppy count=1440 bs=1024

Another trick is to move your CD-ROM to a file for high-speed access. Here, we use a shortcut instead of the losetup command:

    dd if=/dev/cdrom of=some_name.iso
    mount -t iso9660 -o ro,loop=/dev/loop0 some_name.iso /cdrom
19.10 Remounting

A file system that is already mounted as read-only can be remounted as read-write, for example, with

    mount -o rw,remount /dev/hda1 /

This command is useful when you log in in single-user mode with no write access to your root partition.
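The reverse direction is occasionally useful too: remounting read-only lets you run fsck on a file system that cannot conveniently be unmounted (as noted in Section 19.5, checking a read-only mount is permissible in exceptional circumstances). A sketch only, not a prescribed procedure:

    mount -o ro,remount /dev/hda1 /
    fsck -t ext2 /dev/hda1
    mount -o rw,remount /dev/hda1 /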
19.11 Disk sync

The kernel caches write operations in memory for performance reasons. These flush (physically commit to the magnetic media) every so often, but you sometimes want to force a flush. This is done simply with

    sync
20 Advanced Shell Scripting

This chapter completes our discussion of sh shell scripting begun in Chapter 7 and expanded on in Chapter 9. These three chapters represent almost everything you can do with the bash shell.
The special operators && and || can be used to execute functions in sequence. For example:

    grep '^harry:' /etc/passwd || useradd harry

The || means to only execute the second command if the first command returns an error. In the above case, grep will return an exit code of 1 if harry is not in the /etc/passwd file, causing useradd to be executed.

An alternate representation is

    grep -v '^harry:' /etc/passwd && useradd harry

where the -v option inverts the sense of matching of grep. && has the opposite meaning to ||, that is, to execute the second command only if the first succeeds.

Adept script writers often string together many commands to create the most succinct representation of an operation:

    grep -v '^harry:' /etc/passwd && useradd harry || \
        echo "`date`: useradd failed" >> /var/log/my_special_log
20.2 Special Parameters: $?, $*, ...
An ordinary variable can be expanded with $VARNAME. Commonly used variables like PATH and special variables like PWD and RANDOM were covered in Chapter 9. Further special expansions are documented in the following section, quoted verbatim from the bash man page (the footnotes are mine; thanks to Brian Fox and Chet Ramey for this material).
Special Parameters

The shell treats several parameters specially. These parameters may only be referenced; assignment to them is not allowed.

$*  Expands to the positional parameters (i.e., the command-line arguments passed to the shell script, with $1 being the first argument, $2 the second, etc.), starting from one. When the expansion occurs within double quotes, it expands to a single word with the value of each parameter separated by the first character of the IFS special variable. That is, "$*" is equivalent to "$1c$2c...", where c is the first character of the value of the IFS variable. If IFS is unset, the parameters are separated by spaces. If IFS is null, the parameters are joined without intervening separators.

$@  Expands to the positional parameters, starting from one. When the expansion occurs within double quotes, each parameter expands to a separate word. That is, "$@" is equivalent to "$1" "$2" ... When there are no positional parameters, "$@" and $@ expand to nothing (i.e., they are removed). (Hint: this is very useful for writing wrapper shell scripts that just add one argument.)
$#  Expands to the number of positional parameters in decimal (i.e., the number of command-line arguments).

$?  Expands to the status of the most recently executed foreground pipeline. (I.e., the exit code of the last command.)

$-  Expands to the current option flags as specified upon invocation, by the set builtin command, or those set by the shell itself (such as the -i option).

$$  Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.

$!  Expands to the process ID of the most recently executed background (asynchronous) command. (I.e., after executing a background command with command &, the variable $! will give its process ID.)

$0  Expands to the name of the shell or shell script. This is set at shell initialization. If bash is invoked with a file of commands, $0 is set to the name of that file. If bash is started with the -c option, then $0 is set to the first argument after the string to be executed, if one is present. Otherwise, it is set to the file name used to invoke bash, as given by argument zero. (Note that basename $0 is a useful way to get the name of the current command without the leading path.)
$_  At shell startup, set to the absolute file name of the shell or shell script being executed as passed in the argument list. Subsequently, expands to the last argument to the previous command, after expansion. Also set to the full file name of each command executed and placed in the environment exported to that command. When checking mail, this parameter holds the name of the mail file currently being checked.
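To see a few of these parameters in action, the following throwaway script (the name showparams.sh is made up for this illustration) can be run with a couple of arguments:

    #!/bin/sh
    # showparams.sh -- print some of the special parameters
    echo "Script name         : $0"
    echo "Number of arguments : $#"
    echo "All arguments       : $*"
    echo "Process ID          : $$"
    false
    echo "Exit code of false  : $?"

Running ./showparams.sh one two prints the script name, the argument count 2, the arguments themselves, the shell's process ID, and finally 1, the exit code of false.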
20.3 Expansion

Expansion refers to the way bash modifies the command-line before executing it. bash performs several textual modifications to the command-line, proceeding in the following order:
Brace expansion.  We have already shown how you can use, for example, the shorthand touch file_{one,two,three}.txt to create multiple files file_one.txt, file_two.txt, and file_three.txt. This is known as brace expansion and occurs before any other kind of modification to the command-line.

Tilde expansion.  The special character ~ is replaced with the full path contained in the HOME environment variable or the home directory of the user's login (if $HOME is null). ~+ is replaced with the current working directory and ~- is replaced with the most recent previous working directory. The last two are rarely used.
Parameter expansion.  This refers to expanding anything that begins with a $. Note that $VAR and ${VAR} do exactly the same thing, except in the latter case, VAR can contain non-"whole word" characters that would normally confuse bash.

There are several parameter expansion tricks that you can use to do string manipulation. Most shell programmers never bother with these, probably because they are not well supported by other UNIX systems. (A short demonstration of a few of them appears at the end of this section.)
${VAR:-default}  This will result in $VAR unless VAR is unset or null, in which case it will result in default.

${VAR:=default}  Same as the previous except that default is also assigned to VAR if it is empty.

${VAR:+default}  This will result in an empty string if VAR is unset or null; otherwise it will result in default. This is the opposite behavior of ${VAR:-default}.

${VAR:?message}  This will result in $VAR unless VAR is unset or null, in which case an error message containing message is displayed.

${VAR:offset} or ${VAR:n:l}  This produces the nth character of $VAR and then the following l characters. If l is not present, then all characters to the right of the nth character are produced. This is useful for splitting up strings. Try:

    TEXT=scripting_for_phun
    echo ${TEXT:10:3}
    echo ${TEXT:10}

${#VAR}  Gives the length of $VAR.

${!PRE*}  Gives a list of all variables whose names begin with PRE.

${VAR#pattern}  $VAR is returned with the glob expression pattern removed from the leading part of the string. For instance, ${TEXT#scr} in the above example will return ipting_for_phun.

${VAR##pattern}  This is the same as the previous expansion except that if pattern contains wild cards, then it will try to match the maximum length of characters.

${VAR%pattern}  The same as ${VAR#pattern} except that characters are removed from the trailing part of the string.

${VAR%%pattern}  The same as ${VAR##pattern} except that characters are removed from the trailing part of the string.

${VAR/search/replace}  $VAR is returned with the first occurrence of the string search replaced with replace.

${VAR/#search/replace}  Same as ${VAR/search/replace} except that the match is attempted from the leading part of $VAR.

${VAR/%search/replace}  Same as ${VAR/search/replace} except that the match is attempted at the trailing part of $VAR.

${VAR//search/replace}  Same as ${VAR/search/replace} except that all instances of search are replaced.
Backquote expansion.  We have already shown backquote expansion in Section 7.12. Note that the additional notation $(command) is equivalent to `command` except that escapes (i.e., \) are not required for special characters.

Arithmetic expansion.  We have already shown arithmetic expansion on page 62. Note that the additional notation $((expression)) is equivalent to $[expression].
Finally.  The last modifications to the command-line are the splitting of the command-line into words according to the white space between them. The IFS (Internal Field Separator) environment variable determines what characters delimit command-line words (usually whitespace). With the command-line divided into words, path names are expanded according to glob wild cards. Consult bash(1) for a comprehensive description of the pattern matching options that most people don't know about.
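Here is the short demonstration promised under parameter expansion above. The variable FILE and its value are invented purely for illustration:

    FILE=/usr/local/src/package-1.0.2.tar.gz
    echo ${FILE##*/}         # package-1.0.2.tar.gz   (strip the leading directories)
    echo ${FILE%/*}          # /usr/local/src         (strip the file name)
    echo ${FILE%.tar.gz}     # /usr/local/src/package-1.0.2
    echo ${FILE/src/build}   # /usr/local/build/package-1.0.2.tar.gz
    echo ${#FILE}            # the length of the string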
20.4 Built-in Commands
Many commands operate some built-in functionality of bash or are especially interpreted. These do not invoke an executable off the file system. Some of these were described in Chapter 7, and a few more are discussed here. For an exhaustive description, consult bash(1).

:  A single colon by itself does nothing. It is useful for a "no operation" line such as:

    if <command> ; then
        :
    else
        echo "<command> was unsuccessful"
    fi
. filename args ...  A single dot is the same as the source command. See below.

alias command=value  Creates a pseudonym for a command. Try:

    alias necho="echo -n"
    necho "hello"

Some distributions alias the mv, cp, and rm commands to the same pseudonym with the -i (interactive) option set. This prevents files from being deleted without prompting, but can be irritating for the administrator. See your ~/.bashrc file for these settings. See also unalias.
unalias command  Removes an alias created with alias.

alias -p  Prints the list of aliases.

eval arg ...  Executes args as a line of shell script.

exec command arg ...  Begins executing command under the same process ID as the current script. This is most often used for shell scripts that are mere "wrapper" scripts for real programs. The wrapper script sets any environment variables and then execs the real program binary as its last line. exec should never return.

local var=value  Assigns a value to a variable. The resulting variable is visible only within the current function.

pushd directory and popd  These two commands are useful for jumping around directories. pushd can be used instead of cd, but unlike cd, the directory is saved onto a list of directories. At any time, entering popd returns you to the previous directory. This is nice for navigation since it keeps a history of wherever you have been. (See the short example at the end of this list.)
printf format args ...  This is like echo and like the C printf function. It outputs to the terminal but is useful for more complex formatting of output. See printf(3) for details and try printf "%10.3e\n" 12 as an example.

pwd  Prints the present working directory.

set  Prints the value of all environment variables. See also Section 20.6 on the set command.

source filename args ...  Reads filename into the current shell environment. This is useful for executing a shell script when environment variables set by that script must be preserved.

times  Prints the accumulated user and system times for the shell and for processes run from the shell.

type command  Tells whether command is an alias, a built-in, or a system executable.

ulimit  Prints and sets various user resource limits like memory usage limits and CPU limits. See bash(1) for details.

umask  See Section 14.2.

unset VAR  Deletes a variable or environment variable.

unset -f func  Deletes a function.

wait  Pauses until all background jobs have completed.

wait PID  Pauses until the background process with process ID of PID has exited, then returns the exit code of the background process.

wait %job  Same with respect to a job spec.
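As promised under pushd and popd above, a tiny session might look like this (the directories are arbitrary examples):

    cd /tmp
    pushd /etc          # go to /etc, remembering /tmp
    pushd /var/log      # go to /var/log, remembering /etc
    popd                # back to /etc
    popd                # back to /tmp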
20.5 Trapping Signals — the trap Command

You will often want to make your script perform certain actions in response to a signal. A list of signals can be found on page 86. To trap a signal, create a function and then use the trap command to bind the function to the signal.

    #!/bin/sh

    function on_hangup ()
    {
        echo 'Hangup (SIGHUP) signal received'
    }

    trap on_hangup SIGHUP

    while true ; do
        sleep 1
    done

    exit 0
Run the above script and then send the process ID the -HUP signal to test it. (See Section 9.5.)

An important function of a program is to clean up after itself on exit. The special signal EXIT (not really a signal) executes code on exit of the script:

    #!/bin/sh

    function on_exit ()
    {
        echo 'I should remove temp files now'
    }

    trap on_exit EXIT

    while true ; do
        sleep 1
    done

    exit 0

Breaking the above program will cause it to print its own epitaph.

If - is given instead of a function name, then the signal is unbound (i.e., set to its default value).
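For instance, a trivial made-up sequence to show the syntax:

    trap on_exit EXIT       # bind the function to the pseudo-signal
    trap - EXIT             # unbind it again: the default behavior is restored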
20.6 Internal Settings — the set Command

The set command can modify certain behavioral settings of the shell. Your current options can be displayed with echo $-. Various set commands are usually entered at the top of a script or given as command-line options to bash. Using set +option instead of set -option disables the option. Here are a few examples:

set -e  Exit immediately if any simple command gives an error.

set -h  Cache the location of commands in your PATH. The shell will become confused if binaries are suddenly inserted into the directories of your PATH, perhaps causing a No such file or directory error. In this case, disable this option or restart your shell. This option is enabled by default.
set -n  Read commands without executing them. This command is useful for syntax checking.

set -o posix  Comply exactly with the POSIX 1003.2 standard.

set -u  Report an error when trying to reference a variable that is unset. Usually bash just fills in an empty string.

set -v  Print each line of script as it is executed.

set -x  Display each command expansion as it is executed.

set -C  Do not overwrite existing files when using >. You can use >| to force overwriting.
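A common way of combining a couple of these options at the top of a script—a sketch of ordinary practice rather than anything this chapter prescribes—is:

    #!/bin/sh
    set -eu        # abort on the first failing command or unset variable
    set -x         # additionally trace each command while debugging

    cp "$1" "$1.bak"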
20.7 Useful Scripts and Commands

Here is a collection of useful utility scripts that people are always asking for on the mailing lists. See page 517 for several security check scripts.
The chroot command makes a process think that its root file system is not actually /. For example, on one system I have a complete Debian installation residing under a directory, say, /mnt/debian. I can issue the command

    chroot /mnt/debian bash -i

to run the bash shell interactively, under the root file system /mnt/debian. This command will hence run the command /mnt/debian/bin/bash -i. All further commands processed under this shell will have no knowledge of the real root directory, so I can use my Debian installation without having to reboot. All further commands will effectively behave as though they are inside a separate UNIX machine. One caveat: you may have to remount your /proc file system inside your chroot'd file system—see page 167.

This is useful for improving security. Insecure network services can change to a different root directory—any corruption will not affect the real system.

Most rescue disks have a chroot command. After booting the disk, you can manually mount the file systems on your hard drive, and then issue a chroot to begin using your machine as usual. Note that the command chroot <new-root> without arguments invokes a shell by default.
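Putting the /proc caveat above into practice, a session could look like the following sketch (the directory /mnt/debian is the same example as above; the mount is issued inside the chroot so that it applies to the new root):

    chroot /mnt/debian bash -i
    mount -t proc /proc /proc      # so that ps and friends work inside the chroot
    ps ax
    exit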
The if test ... was used to control program flow in Chapter 7. Bash, however, has a built-in alias for the test function: the left square brace, [.

Using [ instead of test adds only elegance:

    if [ 5 -le 3 ] ; then
        echo '5 < 3'
    fi

It is important at this point to realize that the if command understands nothing of arithmetic. It merely executes a command test (or in this case [) and tests the exit code. If the exit code is zero, then the command is considered to be successful and if proceeds with the body of the if statement block. The onus is on the test command to properly evaluate the expression given to it.

if can equally well be used with any command:

    if echo "$PATH" | grep -qwv /usr/local/bin ; then
        export PATH="$PATH:/usr/local/bin"
    fi

conditionally adds /usr/local/bin if grep does not find it in your PATH.
You may often want to find the differences between two files, for example to see what changes have been made to a file between versions. Or, when a large batch of source code may have been updated, it is silly to download the entire directory tree if there have been only a few small changes. You would want a list of alterations instead.
The diff utility dumps the lines that differ between two files. It can be used as

    diff -u <old-file> <new-file>

You can also use diff to see the differences between two directory trees. diff recursively compares all corresponding files:

    diff -u --recursive --new-file <old-dir> <new-dir> > <patch-file>.diff

The output is known as a patch file against a directory tree, that can be used both to see changes, and to bring <old-dir> up to date with <new-dir>.

Patch files may also end in .patch and are often gzipped. The patch file can be applied to <old-dir> with
    cd <old-dir>
    patch -p1 -s < <patch-file>.diff

which makes <old-dir> identical to <new-dir>. The -p1 option strips the leading directory name from the patch file. The presence of a leading directory name in the patch file often confuses the patch command.
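Since patch files are frequently distributed gzipped, it is convenient to apply them without creating a temporary uncompressed copy. A possible one-liner (the file name is made up):

    cd <old-dir>
    gzip -dc ../<patch-file>.diff.gz | patch -p1 -s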
You may want to leave this example until you have covered more networking theory.

The acid test for an Internet connection is a successful DNS query. You can use ping to test whether a server is up, but some networks filter ICMP messages and ping does not check that your DNS is working.

dig sends a single UDP packet similar to ping. Unfortunately, it takes rather long to time out, so we fudge in a kill after 2 seconds.

This script blocks until it successfully queries a remote name server. Typically, the next few lines of the following script would run fetchmail and a mail server queue flush, or possibly uucico. Do set the name server IP to something appropriate like that of your local ISP; and increase the 2-second timeout if your name server typically takes longer to respond.

    MY_DNS_SERVER=197.22.201.154

    while true ; do
        (
            dig @$MY_DNS_SERVER netscape.com IN A &
            DIG_PID=$!
            { sleep 2 ; kill $DIG_PID ; } &
            sleep 1
            wait $DIG_PID
        ) 2>/dev/null | grep -q '^[^;]*netscape.com' && break
    done
Recursively searching through a directory tree can be done easily with the find and xargs commands. You should consult both these man pages. The following command pipe searches through the kernel source for anything about the "pcnet" Ethernet card, printing also the line number:

    find /usr/src/linux -follow -type f | xargs grep -iHn pcnet

(You will notice how this command returns rather a lot of data. However, going through it carefully can be quite instructive.)

Limiting a search to a certain file extension is just another common use of this pipe sequence:

    find /usr/src/linux -follow -type f -name '*.[ch]' | xargs grep -iHn pcnet

Note that new versions of grep also have a -r option to recursively search through directories.
Often you will want to perform a search-and-replace throughout all the files in an entire source tree. A typical example is the changing of a function call name throughout lots of C source. The following script is a must for any /usr/local/bin/. Notice the way it recursively calls itself.

    #!/bin/sh

    N=`basename $0`

    if [ "$1" = "-v" ] ; then
        VERBOSE="-v"
        shift
    fi

    if [ "$3" = "" -o "$1" = "-h" -o "$1" = "--help" ] ; then
        echo "$N: Usage"
        echo "    $N [-h|--help] [-v] <regexp-search> \
    <regexp-replace> <glob-file>"
        echo
        exit 0
    fi

    S="$1" ; shift ; R="$1" ; shift
    T=$$replc

    if echo "$1" | grep -q / ; then
        for i in "$@" ; do
            SEARCH=`echo "$S" | sed 's,/,\\\\/,g'`
            REPLACE=`echo "$R" | sed 's,/,\\\\/,g'`
            cat $i | sed "s/$SEARCH/$REPLACE/g" > $T
            D="$?"
            if [ "$D" = "0" ] ; then
                if diff -q $T $i >/dev/null ; then
                    :
                else
                    if [ "$VERBOSE" = "-v" ] ; then
                        echo $i
                    fi
                    cat $T > $i
                fi
                rm -f $T
            fi
        done
    else
        find . -type f -name "$1" | xargs $0 $VERBOSE "$S" "$R"
    fi
The cut command is useful for slicing files into fields; try

    cut -d: -f1 /etc/passwd
    cat /etc/passwd | cut -d: -f1

The awk program is an interpreter for a complete programming language called AWK. A common use for awk is in field stripping. It is slightly more flexible than cut—

    cat /etc/passwd | awk -F : '{print $1}'

—especially where whitespace gets in the way:

    ls -al | awk '{print $6 " " $7 " " $8}'
    ls -al | awk '{print $5 " bytes"}'

which isolate the time and size of the file, respectively.
Get your nonlocal IP addresses with:

    ifconfig | grep 'inet addr:' | fgrep -v '127.0.0.' | \
        cut -d: -f2 | cut -d' ' -f1

Reverse an IP address with:

    echo 192.168.3.2 | awk -F . '{print $4 "." $3 "." $2 "." $1 }'
Print all common user names (i.e., users with UID values greater than 499 on RedHat and greater than 999 on Debian):

    awk -F: '$3 >= 500 {print $1}' /etc/passwd
    ( awk -F: '$3 >= 1000 {print $1}' /etc/passwd )
Scripts can easily use bc to do calculations that expr can't handle. For example, convert to decimal with

    echo -e 'ibase=16;FFFF' | bc

to binary with

    echo -e 'obase=2;1234' | bc

or work out the SIN of 45 degrees with

    pi=`echo "scale=10; 4*a(1)" | bc -l`
    echo "scale=10; s(45*$pi/180)" | bc -l
The convert program of the ImageMagick package is a command many Windows users would love. It can easily be used to convert multiple files from one format to another. Changing a file's extension can be done with echo filename | sed -e 's/\.old$/.new/'. The convert command does the rest:

    for i in *.pcx ; do
        CMD="convert -quality 625 $i `echo $i | sed -e 's/\.pcx$/.png/'`"
        # Show the command-line to the user:
        echo $CMD
        # Execute the command-line:
        eval $CMD
    done

Note that the search-and-replace expansion mechanism could also be used to replace the extensions: ${i/%.pcx/.png} produces the desired result.

Incidentally, the above nicely compresses high-resolution pcx files—possibly the output of a LATEX compilation into PostScript rendered with GhostScript (i.e., gs -sDEVICE=pcx256 -sOutputFile='page%d.pcx' file.ps).
Removing a file with rm only unlinks the file name from the data. The file blocks may still be on disk, and will only be reclaimed when the file system reuses that data. To erase a file proper requires writing random bytes into the disk blocks occupied by the file. The following overwrites all the files in the current directory:

    for i in * ; do
        dd if=/dev/urandom \
            of="$i" \
            bs=1024 \
            count=`expr 1 + \
                \`stat "$i" | grep 'Size:' | awk '{print $2}'\` / 1024`
    done

You can then remove the files normally with rm.
Consider trying to run a process, say, the rxvt terminal, in the background. This can be done simply with:

    rxvt &

However, rxvt still has its output connected to the shell and is a child process of the shell. When a login shell exits, it may take its child processes with it. rxvt may also die of its own accord from trying to read or write to a terminal that does not exist without the parent shell. Now try:

    { rxvt >/dev/null 2>&1 </dev/null & } &

This technique is known as forking twice, and redirecting the terminal to dev null. The shell can know about its child processes but not about its "grand child" processes. We have hence created a daemon process proper with the above command.

Now, it is easy to create a daemon process that restarts itself if it happens to die. Although such functionality is best accomplished within C (which you will get a taste of in Chapter 22), you can make do with:
    { { while true ; do rxvt ; done ; } >/dev/null 2>&1 </dev/null & } &

    ps awwwxf
The following command uses the custom format option of ps to print every conceivable attribute of a process:

    ps -awwwxo %cpu,%mem,alarm,args,blocked,bsdstart,bsdtime,c,caught,cmd,comm,\
    command,cputime,drs,dsiz,egid,egroup,eip,esp,etime,euid,euser,f,fgid,fgroup,\
    flag,flags,fname,fsgid,fsgroup,fsuid,fsuser,fuid,fuser,gid,group,ignored,\
    intpri,lim,longtname,lstart,m_drs,m_trs,maj_flt,majflt,min_flt,minflt,ni,\
    nice,nwchan,opri,pagein,pcpu,pending,pgid,pgrp,pid,pmem,ppid,pri,rgid,rgroup,\
    rss,rssize,rsz,ruid,ruser,s,sess,session,sgi_p,sgi_rss,sgid,sgroup,sid,sig,\
    sig_block,sig_catch,sig_ignore,sig_pend,sigcatch,sigignore,sigmask,stackp,\
    start,start_stack,start_time,stat,state,stime,suid,suser,svgid,svgroup,svuid,\
    svuser,sz,time,timeout,tmout,tname,tpgid,trs,trss,tsiz,tt,tty,tty4,tty8,ucomm,\
The output is best piped to a file and viewed with a nonwrapping text editor. More interestingly, the awk command can print the process ID of a process with

    ps awwx | grep -w 'htt[p]d' | awk '{print $1}'

which prints all the processes having httpd in the command name or command-line. This filter is useful for killing netscape as follows:

    kill -9 `ps awx | grep 'netsc[a]pe' | awk '{print $1}'`

(Note that the [a] in the regular expression prevents grep from finding itself in the process list.)
Other useful ps variations are:

    ps awwxf
    ps awwxl
    ps awwxv
    ps awwxu
    ps awwxs

The f option is most useful for showing parent-child relationships. It stands for forest, and shows the full process tree. For example, here I am running a desktop with two windows:
      PID TTY      STAT   TIME COMMAND
        1 ?        S      0:05 init [5]
        2 ?        SW     0:02 [kflushd]
        3 ?        SW     0:02 [kupdate]
        4 ?        SW     0:00 [kpiod]
        5 ?        SW     0:01 [kswapd]
        6 ?        SW<    0:00 [mdrecoveryd]
      262 ?        S      0:02 syslogd -m 0
      272 ?        S      0:00 klogd
      341 ?        S      0:00 xinetd -reuse -pidfile /var/run/xinetd.pid
      447 ?        S      0:00 crond
      480 ?        S      0:02 xfs -droppriv -daemon
      506 tty1     S      0:00 /sbin/mingetty tty1
      507 tty2     S      0:00 /sbin/mingetty tty2
      508 tty3     S      0:00 /sbin/mingetty tty3
      509 ?        S      0:00 /usr/bin/gdm -nodaemon
      514 ?        S      7:04  \_ /etc/X11/X -auth /var/gdm/:0.Xauth :0
      515 ?        S      0:00  \_ /usr/bin/gdm -nodaemon
      524 ?        S      0:18      \_ /opt/icewm/bin/icewm
      748 ?        S      0:08          \_ rxvt -bg black -cr green -fg whi
      749 pts/0    S      0:00          |   \_ bash
     5643 pts/0    S      0:09          |       \_ mc
     5645 pts/6    S      0:02          |           \_ bash -rcfile .bashrc
    25292 pts/6    R      0:00          |               \_ ps awwxf
    11780 ?        S      0:16          \_ /usr/lib/netscape/netscape-commu
    11814 ?        S      0:00              \_ (dns helper)
    15534 pts/6    S      3:12 cooledit -I /root/.cedit/projects/Rute
                   S      6:03  \_ aspell -a -a

The u option shows the useful user format, and the others show virtual memory, signal, and long format.
20.8 Shell Initialization

Here I will briefly discuss what initialization takes place after logging in and how to modify it.

The interactive shell invoked after login will be the shell specified in the last field of the user's entry in the /etc/passwd file. The login program will invoke the shell after authenticating the user, placing a - in front of the command name, which indicates to the shell that it is a login shell, meaning that it reads and executes several scripts to initialize the environment. In the case of bash, the files it reads are: /etc/profile, ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order. In addition, an interactive shell that is not a login shell also reads ~/.bashrc. Note that traditional sh shells only read /etc/profile and ~/.profile.
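A common convention—widespread practice rather than anything bash requires—is to keep personal settings in ~/.bashrc and have ~/.bash_profile source it, so that login and non-login interactive shells behave identically:

    # ~/.bash_profile
    if [ -f ~/.bashrc ] ; then
        . ~/.bashrc
    fi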
Administrators can customise things like the environment variables by modifying these startup scripts. Consider the classic case of an installation tree under /opt/. Often, a package like /opt/staroffice/ or /opt/oracle/ will require the PATH and LD_LIBRARY_PATH variables to be adjusted accordingly. In the case of RedHat, a script,

    for i in /opt/*/bin /usr/local/bin ; do
        test -d $i || continue
        echo $PATH | grep -wq "$i" && continue
        PATH=$PATH:$i
        export PATH
    done

    if test `id -u` -eq 0 ; then
        for i in /opt/*/sbin /usr/local/sbin ; do
            test -d $i || continue
            echo $PATH | grep -wq "$i" && continue
            PATH=$PATH:$i
        done
        export PATH
    fi

    for i in /opt/*/lib /usr/local/lib ; do
        test -d $i || continue
        echo $LD_LIBRARY_PATH | grep -wq "$i" && continue
        LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$i
    done
    export LD_LIBRARY_PATH

can be placed as /etc/profile.d/my_local.sh with execute permissions. This will take care of anything installed under /opt/ or /usr/local/. For Debian, the script can be inserted directly into /etc/profile.

Page 235 of Section 23.3 contains details of exactly what LD_LIBRARY_PATH is. (Unrelated, but you should also edit your /etc/man.config to add man page paths that appear under all installation trees under /opt/.)
20.9 File Locking

Often, one would like a process to have exclusive access to a file. By this we mean that only one process can access the file at any one time. Consider a mail folder: if two processes were to write to the folder simultaneously, it could become corrupted. We also sometimes want to ensure that a program can never be run twice at the same time; this insurance is another use for "locking."

In the case of a mail folder, if the file is being written to, then no other process should try to read it or write to it: we would like to create a write lock on the file. However, if the file is being read from, no other process should try to write to it: we would like to create a read lock on the file. Write locks are sometimes called exclusive locks; read locks are sometimes called shared locks. Often, exclusive locks are preferred for simplicity.

Locking can be implemented by simply creating a temporary file to indicate to other processes to wait before trying some kind of access. UNIX also has some more sophisticated built-in functions.
There are currently four methods of file locking. (The exim sources seem to indicate thorough research in this area, so this is what I am going on.)

1. "dot lock" file locking. Here, a temporary file is created with the same name as the mail folder and the extension .lock added. So long as this file exists, no program should try to access the folder. This is an exclusive lock only. It is easy to write a shell script to do this kind of file locking.

2. "MBX" file locking. Similar to 1, but a temporary file is created in /tmp. This is also an exclusive lock.

3. fcntl locking. Databases require areas of a file to be locked. fcntl is a system call to be used inside C programs.

4. flock file locking. Same as fcntl, but locks whole files.
The following shell function does proper mailbox file locking:

    function my_lockfile ()
    {
        TEMPFILE="$1.$$"
        LOCKFILE="$1.lock"
        echo $$ > $TEMPFILE 2>/dev/null || {
            echo "You don't have permission to access `dirname $TEMPFILE`"
            return 1
        }
        ln $TEMPFILE $LOCKFILE 2>/dev/null && {
            rm -f $TEMPFILE
            return 0
        }
        STALE_PID=`< $LOCKFILE`
        test "$STALE_PID" -gt "0" >/dev/null || {
            rm -f $TEMPFILE
            return 1
        }
        kill -0 $STALE_PID 2>/dev/null && {
            rm -f $TEMPFILE
            return 1
        }
        rm $LOCKFILE 2>/dev/null && {
            echo "Removed stale lock file of process $STALE_PID"
        }
        ln $TEMPFILE $LOCKFILE 2>/dev/null && {
            rm -f $TEMPFILE
            return 0
        }
        rm -f $TEMPFILE
        return 1
    }

(Note how instead of `cat $LOCKFILE`, we use `< $LOCKFILE`, which is faster.)
You can include the above function in scripts that need to lock any kind of file. Use the function as follows:

    # wait for a lock
    until my_lockfile /etc/passwd ; do
        sleep 1
    done

    # The body of the program might go here
    # [...]

    # Then to remove the lock,
    rm -f /etc/passwd.lock
This script is of academic interest only but has a couple of interesting features. Note how the ln function is used to ensure "exclusivity." ln is one of the few UNIX functions that is atomic, meaning that only one link of the same name can exist, and its creation excludes the possibility that another program would think that it had successfully created the same link. One might naively expect that the program

    1  function my_lockfile ()
    2  {
    3      LOCKFILE="$1.lock"
    4      test -e $LOCKFILE && return 1
    5      touch $LOCKFILE
    6      return 0
    7  }

is sufficient for file locking. However, consider if two programs, running simultaneously, executed line 4 at the same time. Both would think that the lock did not exist and proceed to line 5. Then both would successfully create the lock file—not what you wanted.
The kill command is then useful for checking whether a process is running. Sending the 0 signal does nothing to the process, but the signal fails if the process does not exist. This technique can be used to remove a lock of a process that died before removing the lock itself: that is, a stale lock.
The preceding script does not work if your file system is mounted over NFS (network file system—see Chapter 28). This is obvious because the script relies on the PID of the process, which is not visible across different machines. Not so obvious is that the ln function does not work exactly right over NFS—you need to stat the file and actually check that the link count has increased to 2.
The commands lockfile (from the procmail package) and mutt_dotlock (from the mutt email reader but perhaps not distributed) do similar file locking. These commands, however, do not store the PID in the lock file. Hence it is not possible to detect a stale lock file. For example, to search your mailbox, you can run:

    lockfile /var/spool/mail/mary.lock
    grep freddy /var/spool/mail/mary
    rm -f /var/spool/mail/mary.lock

This sequence ensures that you are searching a clean mailbox even if /var is a remote NFS share.
File locking is a headache for the developer. The problem with UNIX is that whereas we are intuitively thinking about locking a file, what we really mean is locking a file name within a directory. File locking per se should only be used on perpetual files, such as database files. For mailbox and passwd files we need directory locking (my own term), meaning the exclusive access of one process to a particular directory entry. In my opinion, lack of such a feature is a serious deficiency in UNIX, but because it will require kernel, NFS, and (possibly) C library extensions, it will probably not come into being any time soon.

Locking inside C programs is certainly outside of the scope of this text, except to say that you should consult the source code of reputable packages rather than invent your own locking scheme.
21 System Services and lpd
This chapter covers a wide range of concepts about the way UNIX services function.

Every function of UNIX is provided by one or another package. For instance, mail is often handled by the sendmail or other package, web by the apache package.
Here we examine how to obtain, install, and configure a package, using lpd as an example. You can then apply this knowledge to any other package, and later chapters assume that you know these concepts. This discussion will also suffice as an explanation of how to set up and manage printing.
Printing under UNIX on a properly configured machine is as simple as typing lpr -Plp <filename> (or cat <filename> | lpr -Plp). The "lp" in -Plp is the name of the printer queue on the local machine you would like to print to. You can omit it if you are printing to the default (i.e., the first listed) queue. A queue belongs to a physical printer, so users can predict where paper will come spewing out, by what queue they print to. Queues are conventionally named lp, lp0, lp1, and so on, and any number of them may have been redirected to any other queue on any other machine on the network.

The command lprm removes pending jobs from a print queue; lpq reports jobs in progress.

The service that facilitates all this is called lpd. The lpr user program makes a network connection to the lpd background process, sending it the print job. lpd then queues, filters, and feeds the job until it appears in the print tray.
Printing typifies the client/server nature of UNIX services. The lpd background process is the server and is initiated by the root user. The remaining commands are client programs, and are run mostly by users.
21.2 Downloading and Installing

The following discussion should relieve the questions of "Where do I get xxx service/package?" and "How do I install it?". Full coverage of package management comes in Section 24.2, but here you briefly see how to use package managers with respect to a real system service.
Let us say we know nothing of the service except that it has something to do with a file /usr/sbin/lpd. First, we use our package managers to find where the file comes from (Debian commands are shown in parentheses):

    rpm -qf /usr/sbin/lpd
    ( dpkg -S /usr/sbin/lpd )

This returns lpr-0.nn-n (for RedHat 6.2, or LPRng-n.n.nn-n on RedHat 7.0, or lpr on Debian). On RedHat you may have to try this on a different machine because rpm does not know about packages that are not installed. Alternatively, if we would like to see whether a package whose name contains the letters lpr is installed:

    rpm -qa | grep -i lpr
    ( dpkg -l '*lpr*' )

If the package is not present, the package file will be on your CD-ROM and is easily installable with (RedHat 7.0 and Debian in braces):

    rpm -i lpr-0.50-4.i386.rpm
    ( rpm -i LPRng-3.6.24-2 )
    ( dpkg -i lpr_0.48-1.deb )
(Much more about package management is covered in Chapter 24.)
The list of files which the lpr package comprises (easily obtained with rpm -ql lpr or dpkg -L lpr) is approximately as follows:

    /etc/init.d/lpd
    /etc/cron.weekly/lpr
    /usr/sbin/lpf
    /usr/sbin/lpc
    /usr/sbin/lpd
    /usr/sbin/pac
    /usr/bin/lpq
    /usr/share/man/man1/lprm.1.gz
    /usr/share/man/man5/printcap.5.gz
    /usr/share/man/man8/lpc.8.gz
    /usr/share/man/man8/lpd.8.gz
    /usr/share/man/man8/pac.8.gz
    /usr/share/man/man8/lpf.8.gz
    /usr/share/doc/lpr/README.Debian
    /usr/bin/lpr
    /usr/bin/lprm
    /usr/bin/lptest
    /usr/share/man/man1/lpr.1.gz
    /usr/share/man/man1/lptest.1.gz
    /usr/share/doc/lpr/copyright
    /usr/share/doc/lpr/examples/printcap
    /usr/share/doc/lpr/changelog.gz
    /usr/share/doc/lpr/changelog.Debian.gz
    /var/spool/lpd/lp
    /var/spool/lpd/remote
21.3 LPRng vs. Legacy lpr-0.nn

(The word legacy with regard to software means outdated, superseded, obsolete, or just old.)

RedHat 7.0 has now switched to using LPRng rather than the legacy lpr that Debian and other distributions use. LPRng is a more modern and comprehensive package. It supports the same /etc/printcap file and identical binaries as did the legacy lpr on RedHat 6.2. The only differences are in the control files created in your spool directories, and a different access control mechanism (discussed below). Note that LPRng has strict permissions requirements on spool directories and is not trivial to install from source.
21.4 Package Elements

A package's many files can be loosely grouped into functional elements. In this section, each element will be explained, drawing on the lpr package as an example. Refer to the list of files in Section 21.2.
Documentation should be your first and foremost interest. Man pages will not always be the only documentation provided.
Above we see that lpr does not install very much into the /usr/share/doc directory. However, other packages, like rpm -ql apache, reveal a huge user manual (in /home/httpd/html/manual/ or /var/www/html/manual/), and rpm -ql wu-ftpd shows lots inside /usr/doc/wu-ftpd-?.?.?.
Every package will probably have a team that maintains it as well as a web page.
In the case of lpd , however, the code is very old, and the various CD vendors do
maintenance on it themselves. A better example is the lprNG package. Go to the LPRng Web Page http://www.astart.com/lprng/LPRng.html with your web browser. There you can see the authors, mailing lists, and points of download. If a particular package is of much interest to you, then you should become familiar with these resources. Good web pages will also have additional documentation like troubleshooting guides and FAQs (Frequently Asked Questions). Some may even have archives of their mailing lists. Note that some web pages are geared more toward CD vendors who are trying to create their own distribution and so will not have packages for download that beginner users can easily install.
User programs are found in one or another bin directory. In this case, we can see lpq, lpr, lprm, and lptest, as well as their associated man pages.
Daemon and administrator commands will be in an sbin directory. In this case we can see lpc, lpd, lpf, and pac, as well as their associated man pages. The only daemon (background) program is really the lpd program itself, which is the core of the whole package.
The file /etc/printcap controls lpd. Most system services will have a file in /etc. printcap is a plain text file that lpd reads on startup. Configuring any service primarily involves editing its configuration file. Several graphical configuration tools are available that avoid this inconvenience (printtool, which is especially for lpd, and linuxconf), but these actually just silently produce the same configuration file.

Because printing is so integral to the system, printcap is not actually provided by the lpr package. Trying rpm -qf /etc/printcap gives setup-2.3.4-1, and dpkg -S /etc/printcap shows it to not be owned (i.e., it is part of the base system).
The files in /etc/rc.d/init.d/ (or /etc/init.d/) are the startup and shutdown scripts to run lpd on boot and shutdown. You can start lpd yourself on the command-line with

    /usr/sbin/lpd

but it is preferable to use the given script:

    /etc/rc.d/init.d/lpd start
    /etc/rc.d/init.d/lpd stop

(or /etc/init.d/lpd). The script has other uses as well:

    /etc/rc.d/init.d/lpd status
    /etc/rc.d/init.d/lpd restart

(or /etc/init.d/lpd).
To make sure that lpd runs on startup, you can check that it has a symlink under the appropriate run level. The symlinks can be explained by running

    ls -al `find /etc -name '*lpd*'`
    find /etc -name '*lpd*' -ls

showing,

    -rw-r--r--   1 root   root   17335 Sep 25  2000 /etc/lpd.conf
    -rw-r--r--   1 root   root   10620 Sep 25  2000 /etc/lpd.perms
    -rwxr-xr-x   1 root   root    2277 Sep 25  2000 /etc/rc.d/init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 21 14:03 /etc/rc.d/rc0.d/K60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 21 14:03 /etc/rc.d/rc1.d/K60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 21 14:03 /etc/rc.d/rc2.d/S60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 24 01:13 /etc/rc.d/rc3.d/S60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 21 14:03 /etc/rc.d/rc4.d/S60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 28 23:13 /etc/rc.d/rc5.d/S60lpd -> ../init.d/lpd
    lrwxrwxrwx   1 root   root      13 Mar 21 14:03 /etc/rc.d/rc6.d/K60lpd -> ../init.d/lpd
The "3" in rc3.d is what we are interested in. Having S60lpd under rc3.d symlinked to lpd means that lpd will be started when the system enters run level 3, which is the system's state of usual operation.

Note that under RedHat the command setup has a menu option System Services. The Services list will allow you to manage what services come alive on boot, thus creating the symlinks automatically. For Debian, check the man page for the update-rc.d command.

More details on bootup are in Chapter 32.
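The symlinks can also be managed from the command line. The following is a hedged illustration of the two usual tools (chkconfig on RedHat-style systems, update-rc.d on Debian); check their man pages before relying on the exact options:

    chkconfig --level 35 lpd on     # RedHat: start lpd in run levels 3 and 5
    chkconfig --list lpd            # show its current settings
    update-rc.d lpd defaults        # Debian: create start/stop links with default priorities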
System services like lpd, innd, sendmail, and uucp create intermediate files in the course of processing each request. These are called spool files and are stored somewhere under the /var/spool/ directory, usually to be processed and then deleted in sequence.

lpd has a spool directory /var/spool/lpd, which may have been created on installation. You can create spool directories for the two printers in the example below with

    mkdir -p /var/spool/lpd/lp /var/spool/lpd/lp0
UNIX has a strict policy of not reporting error messages to the user interface whenever there might be no user around to read those messages. Whereas error messages of interactive commands are sent to the terminal screen, error or information messages produced by non-interactive commands are "logged" to files in the directory /var/log/.

A log file is a plain text file that continually has one-liner status messages appended to it by a daemon process. The usual directory for log files is /var/log. The main log files are /var/log/messages and possibly /var/log/syslog. They contain kernel messages and messages from a few primary services. When a service would produce large log files (think web access with thousands of hits per hour), the service would use its own log file. sendmail, for example, uses /var/log/maillog. Actually, lpd does not have a log file of its own—one of its failings.
View the system log file with the follow option to tail:

    tail -f /var/log/messages
    tail -f /var/log/syslog

Restarting the lpd service gives messages like (not all distributions log this information):

    Jun 27 16:06:43 cericon lpd: lpd shutdown succeeded
    Jun 27 16:06:45 cericon lpd: lpd startup succeeded
Log files are rotated daily or weekly by the file is /etc/logrotate.conf
logrotate package. Its configuration
. For each package that happens to produce a log file, there is an additional configuration file under /etc/logrotate.d/ . It is also easy to write your own—begin by using one of the existing files as an example.
Rotation means that the log file is renamed with a .1 extension and then truncated to zero length. The service is notified by the logrotate program, sometimes with a SIGHUP. Your /var/log/ may contain a number of old log files named .2, .3, etc. The point of log file rotation is to prevent log files from growing indefinitely.
Most user commands of services make use of some environment variables. These can be defined in your shell startup scripts as usual. For lpr, if no printer is specified on the command line, the PRINTER environment variable determines the default print queue. For example, export PRINTER=lp1 will force use of the lp1 print queue.
21.5 The printcap File in Detail

The printcap (printer capabilities) file is similar to (and based on) the termcap (terminal capabilities) file. Configuring a printer means adding or removing text in this file. printcap contains a list of one-line entries, one for each printer. Lines can be broken by a \ before the newline. Here is an example of a printcap file for two printers.
lp:\
:sd=/var/spool/lpd/lp:\
:mx#0:\
:sh:\
:lp=/dev/lp0:\
:if=/var/spool/lpd/lp/filter:
lp0:\
:sd=/var/spool/lpd/lp0:\
:mx#0:\
:sh:\
:rm=edison:\
:rp=lp3:\
:if=/bin/cat:
Printers are named by the first field: in this case lp is the first printer and lp0 the second printer. Each printer usually refers to a different physical device with its own queue. The lp printer should always be listed first and is the default print queue used when no other is specified. Here, lp refers to a local printer on the device /dev/lp0 (first parallel port). lp0 refers to a remote print queue lp3 on the machine edison.
The printcap has a comprehensive man page. However, the following fields are most of what you will ever need:
sd   Spool directory. This directory contains status and spool files.

mx   Maximum file size. In the preceding example, mx#0 means unlimited.

sh   Suppress headers. The header is a few informational lines printed before or after the print job. This option should be set, as in the example, so that header pages are not printed.

lp   Line printer device.

if   Input filter. This is an executable script into which printer data is piped. The output of this script is fed directly to the printing device or remote machine. This filter will translate from the application's output into the printer's native code.

rm   Remote machine. If the printer queue is not local, this is the machine name.

rp   Remote printer queue name. The remote machine will have its own printcap file with possibly several printers defined. This specifies which printer to use.
21.6 PostScript and the Print Filter

On UNIX the standard format for all printing is the PostScript file. PostScript .ps files are graphics files representing arbitrary scalable text, lines, and images. PostScript is actually a programming language specifically designed to draw things on a page; hence, .ps files are really PostScript programs. The last line in any PostScript program is always showpage, meaning that all drawing operations are complete and that the page can be displayed. Hence, it is easy to see the number of pages inside a PostScript file by grepping for the string showpage.
The procedure for printing on UNIX is to convert whatever you would like to print into PostScript. PostScript files can be viewed with a PostScript "emulator," like the gv (GhostView) program. A program called gs (GhostScript) is the standard utility for converting the PostScript into a format suitable for your printer. The idea behind PostScript is that it is a language that can easily be built into any printer. The so-called "PostScript printer" is one that directly interprets a PostScript file. However, these printers are relatively expensive, and most printers only understand the lesser PCL (printer control language) dialect or some other format. In short, any of the hundreds of different formats of graphics and text have a utility that will convert a file into PostScript, whereafter gs will convert it for any of the hundreds of different kinds of printers.
(There are actually many printers not supported by gs at the time of this writing, mainly because manufacturers refuse to release specifications to their proprietary printer communication protocols.) The print filter is the workhorse of this whole operation.
Most applications conveniently output PostScript whenever printing. For example, netscape's print menu selection shows lpr as the print command, which sends PostScript through the stdin of lpr. All applications without their own printer drivers will do the same. This means that we can generally rely on the fact that the print filter will always receive PostScript. gs, on the other hand, can convert PostScript for any printer, so all that remains is to determine its command-line options.
If you have chosen "Print To: File," then you can view the resulting output with the gv program. Try gv netscape.ps, which shows a print preview. On UNIX, most desktop applications do not have their own preview facility because the PostScript printer itself is emulated by gv.
Note that filter programs should not be used with remote queues; remote printer queues can send their PostScript files "as is" with :if=/bin/cat: (as in the example printcap file above). This way, only the machine connected to the device need be specially configured for it.
The filter program we are going to use for the local print queue will be a shell script, /var/spool/lpd/lp/filter. Create the filter with

touch /var/spool/lpd/lp/filter
chmod a+x /var/spool/lpd/lp/filter
and then give it the following contents:
#!/bin/bash
cat | gs -sDEVICE=ljet4 -sOutputFile=- -sPAPERSIZE=a4 -r600x600 -q -
exit 0
The -sDEVICE option describes the printer, in this example a Hewlett Packard
LaserJet 1100. Many printers have similar or compatible formats; hence, there are far fewer DEVICE ’s than different makes of printers. To get a full list of supported devices, use gs -h and also consult one of the following files (depending on your distribution):
/usr/doc/ghostscript-?.??/devices.txt
/usr/share/doc/ghostscript-?.??/Devices.htm
/usr/share/doc/gs/devices.txt.gz
The -sOutputFile=- option sets output to stdout (as required for a filter). -sPAPERSIZE can be set to one of 11x17, a3, a4, a5, b3, b4, b5, halfletter, ledger, legal, letter, note, and others listed in the man page. You can also use -g<width>x<height> to set the exact page size in pixels. -r600x600 sets the resolution, in this case 600 dpi (dots per inch). -q means quiet mode, suppressing any informational messages that would otherwise corrupt the output, and the trailing - means to read from stdin and not from a file.
Our printer configuration is now complete. What remains is to start lpd and test printing. You can do that on the command line with the enscript package. enscript is a program to convert plain text files into nicely formatted PostScript pages. The man page for enscript shows an enormous number of options, but we can simply try:
echo hello | enscript -p - | lpr

21.7 Access Control
You should be very careful about running lpd on any machine that is exposed to the Internet. lpd has had numerous security alerts (see Chapter 44) and should really only be used within a trusted LAN.
To prevent any remote machine from using your printer, lpd first looks in the file /etc/hosts.equiv. This is a simple list of all machines allowed to print to your printers. My own file looks like this:
192.168.3.8
192.168.3.9
192.168.3.10
192.168.3.11
The file /etc/hosts.lpd does the same but doesn't give administrative control by those machines to the print queues. Note that other services, like sshd and rshd (or in.rshd), also check the hosts.equiv file and consider any machine listed there to be equivalent. This means that they are completely trusted, and so rshd will not require user logins between machines to be authenticated. This behavior is hence a grave security concern.
LPRng on RedHat 7.0 has a different access control facility. It can arbitrarily limit access in a variety of ways, depending on the remote user and the action (such as who is allowed to manipulate queues). The file /etc/lpd.perms contains the configuration. The file format is simple, although LPRng's capabilities are rather involved. To make a long story short, the equivalent of the hosts.equiv above becomes, in lpd.perms:
ACCEPT SERVICE=* REMOTEIP=192.168.3.8
ACCEPT SERVICE=* REMOTEIP=192.168.3.9
ACCEPT SERVICE=* REMOTEIP=192.168.3.10
ACCEPT SERVICE=* REMOTEIP=192.168.3.11
DEFAULT REJECT
Large organizations with many untrusted users should look more closely at the LPRng-HOWTO in /usr/share/doc/LPRng-n.n.nn. It explains how to limit access in more complicated ways.
21.8 Printing Troubleshooting
Here is a convenient order for checking what is not working.
1. Check that your printer is plugged in and working. All printers have a way of printing a test page. Read your printer manual to find out how.

2. Check your printer cable.

3. Check your CMOS settings for your parallel port.

4. Check your printer cable.

5. Try echo hello > /dev/lp0 to check that the port is operating. The printer should do something to signify that data has at least been received. Chapter 42 explains how to install your parallel port kernel module.

6. Use the lpc program to query the lpd daemon. Try help, then status lp, and so on.

7. Check that there is enough space in your /var and /tmp devices for any intermediate files needed by the print filter. A large print job may require hundreds of megabytes. lpd may not give any kind of error for a print filter failure: the print job may just disappear into nowhere. If you are using legacy lpr, then complain to your distribution vendor about your print filter not properly logging to a file.

8. For legacy lpr, stop lpd and remove all of lpd's runtime files (files pertaining to the program being in a running state) from /var/spool/lpd and from any of its subdirectories. (New LPRng should never require this step.) The unwanted files are .seq, lock, status, lpd.lock, and any leftover spool files that failed to disappear with lprm (these files are recognizable by long file names with a host name and random key embedded in the file name). Then, restart lpd.

9. For remote queues, check that you can do forward and reverse lookups on both machines of both machines' host names and IP addresses. If not, you may get Host name for your address (ipaddr) unknown error messages when trying an lpq. Test with the command host <ip-address> and also host <machinename> on both machines. If any of these do not work, add entries for both machines in /etc/hosts from the example on page 278. Note that the host command may be ignorant of the file /etc/hosts and may still fail. Chapter 40 will explain name lookup configuration.

10. Run your print filter manually to check that it does, in fact, produce the correct output. For example, echo hello | enscript -p - | /var/spool/lpd/lp/filter > /dev/lp0.

11. Legacy lpd is a bit of a quirky package; meditate.
21.9 Useful Programs
printtool is a graphical printer setup program that helps you very quickly set up lpd . It immediately generates a printcap file and magic filter, and you need not know anything about lpd configuration.
apsfilter stands for any to PostScript filter. The setup described above requires everything to be converted to PostScript before printing, but a filter could foreseeably use the file command to determine the type of data coming in and then invoke a program to convert it to PostScript before piping it through gs. This would enable JPEG, GIF, plain text, DVI files, or even gzipped HTML to be printed directly, since PostScript converters have been written for each of these. apsfilter is one of a few such filters, which are generally called magic filters (this is because the file command uses magic numbers; see page 37).
I personally find this feature a gimmick rather than a genuine utility, since most of the time you want to lay out the graphical object on a page before printing, which requires you to preview it, and hence convert it to PostScript manually. For most situations, the straight PostScript filter above will work adequately, provided users know to use enscript instead of lpr when printing plain text.
mpage is a useful utility for saving the trees. It resizes PostScript input so that two, four or eight pages fit on one. Change your print filter to:
#!/bin/bash
cat | mpage -4 | gs -sDEVICE=ljet4 -sOutputFile=- -sPAPERSIZE=a4 -r600x600 -q -
The package psutils contains a variety of command-line PostScript manipulation programs—a must for anyone doing fancy things with filters.
21.10 Printing to Things Besides Printers
The printcap allows anything to be specified as the printer device. If we set it to
/dev/null and let our filter force the output to an alternative device, then we can use lpd to redirect “print” jobs to any kind of service imaginable.
Here, my_filter.sh is a script that might send the print job through an SMB (Windows NT) print share (using smbclient; see Chapter 39), to a printer previewer, or to a script that emails the job somewhere.
lp1:\
:sd=/var/spool/lpd/lp1:\
:mx#0:\
:sh:\
:lp=/dev/null:\
:if=/usr/local/bin/my_filter.sh:
We see a specific example of redirecting print jobs to a fax machine in Chapter 33.
22 Trivial Introduction to C
C was invented for the purpose of writing an operating system that could be recompiled (ported) to different hardware platforms (different CPUs). Because the operating system is written in C, this language is the first choice for writing any kind of application that has to communicate efficiently with the operating system.
Many people who don't program very well in C think of C as an arbitrary language out of many. This point should be made at once: C is the fundamental basis of all computing in the world today. UNIX, Microsoft Windows, office suites, web browsers, and device drivers are all written in C. Ninety-nine percent of your time spent at a computer is probably spent using an application written in C. About 70% of all "open source" software is written in C, and the remaining 30% is written in languages whose compilers or interpreters are written in C.
(C++ is also quite popular. It is, however, not as fundamental to computing, although it is more suitable in many situations.)
Further, there is no replacement for C. Since it fulfills its purpose almost flawlessly, there will never be a need to replace it. Other languages may fulfill other purposes, but C fulfills its purpose most adequately. For instance, all future operating systems will probably be written in C for a long time to come.
It is for these reasons that your knowledge of UNIX will never be complete until you can program in C. On the other hand, just because you can program in C does not mean that you should. Good C programming is a fine art which many veteran programmers never manage to master, even after many years. It is essential to join a Free software project to properly master an effective style of C development.
22.1 C Fundamentals
We start with a simple C program and then add fundamental elements to it. Before going too far, you may wish to review bash functions in Section 7.7. A simple C program is:
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    printf ("Hello World!\n");
    return 3;
}
Save this program in a file hello.c. We will now compile the program. (Compiling is the process of turning C code into assembler instructions. Assembler instructions are the program code that your 80?86/SPARC/RS6000 CPU understands directly. The resulting binary executable is fast because it is executed natively by your processor; it is the very chip that you see on your motherboard that fetches hello byte for byte from memory and executes each instruction. This is what is meant by million instructions per second (MIPS). The megahertz of the machine quoted by hardware vendors is very roughly the number of MIPS. Interpreted languages (like shell scripts) are much slower because the code itself is written in something not understandable to the CPU. The /bin/bash program has to interpret the shell program. /bin/bash itself is written in C, but the overhead of interpretation makes scripting languages many orders of magnitude slower than compiled languages. Shell scripts do not need to be compiled.) Compile the program with

gcc -Wall -o hello hello.c
The -o hello option tells gcc (the GNU C Compiler, called cc on other UNIX systems) to produce the binary file hello instead of the default binary file named a.out (called a.out for historical reasons). The -Wall option means to report all Warnings during the compilation. This is not strictly necessary but is most helpful for correcting possible errors in your programs. More compiler options are discussed on page 239.
Then, run the program with

./hello
Previously you should have familiarized yourself with bash functions. In C, all code is inside a function. The first function to be called (by the operating system) is the main function.
Type echo $? to see the return code of the program. You will see it is 3, the return value of the main function.
Other things to note are the " on either side of the string to be printed. Quotes are required around string literals. Inside a string literal, the \n escape sequence indicates a newline character. ascii(7) shows some other escape sequences. You can also see a proliferation of ; everywhere in a C program. Every statement in C is terminated by a ;, unlike statements in shell scripts where a ; is optional.
Now try:

#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    printf ("number %d, number %d\n", 1 + 2, 10);
    exit (3);
}

printf can be thought of as the command to send output to the terminal. It is also what is known as a standard C library function. In other words, it is specified that a C implementation should always have the printf function and that it should behave in a certain way.
The %d specifies that a decimal should go in at that point in the text. The number to be substituted will be the first argument to the printf function after the string literal, that is, the 1 + 2. The next %d is substituted with the second argument, that is, the 10. The %d is known as a format specifier. It essentially converts an integer number into a decimal representation. See printf(3) for more details.
With bash, you could use a variable anywhere, anytime, and the variable would just be blank if it had never been assigned a value. In C, however, you have to explicitly tell the compiler what variables you are going to need before each block of code. You do this with variable declarations:
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int x;
    int y;
    x = 10;
    y = 2;
    printf ("number %d, number %d\n", 1 + y, x);
    exit (3);
}
The int x is a variable declaration. It tells the program to reserve space for one integer variable that it will later refer to as x. int is the type of the variable. x = 10 assigns a value of 10 to the variable. There are types for each kind of number you would like to work with, and format specifiers to convert them for printing:
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    char a;
    short b;
    int c;
    long d;
    float e;
    double f;
    long double g;
    a = 'A';
    b = 10;
    c = 10000000;
    d = 10000000;
    e = 3.14159;
    f = 10e300;
    g = 10e300;
    printf ("%c, %hd, %d, %ld, %f, %f, %Lf\n", a, b, c, d, e, f, g);
    exit (3);
}
You will notice that %f is used for both floats and doubles. The reason is that a float is always converted to a double before an operation like this. Also try replacing %f with %e to print in exponential notation, that is, showing fewer significant digits.
Functions are implemented as follows:
#include <stdlib.h>
#include <stdio.h>

void mutiply_and_print (int x, int y)
{
    printf ("%d * %d = %d\n", x, y, x * y);
}

int main (int argc, char *argv[])
{
    mutiply_and_print (30, 5);
    mutiply_and_print (12, 3);
    exit (3);
}
Here we have a non-main function called by the main function. The function is declared with

void mutiply_and_print (int x, int y)

This declaration states the return value of the function (void for no return value), the function name (mutiply_and_print), and then the arguments that are going to be passed to the function. The numbers passed to the function are given their own names, x and y, and are converted to the type of x and y before being passed to the function, in this case int and int. The actual C code that comprises the function goes between curly braces { and }.
In other words, the above function is equivalent to:
void mutiply_and_print ()
{
    int x;
    int y;
    x = <first-number-passed>;
    y = <second-number-passed>;
    printf ("%d * %d = %d\n", x, y, x * y);
}
As with shell scripting, we have the for, while, and if statements:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int x;
    x = 10;

    if (x == 10) {
        printf ("x is exactly 10\n");
        x++;
    } else if (x == 20) {
        printf ("x is equal to 20\n");
    } else {
        printf ("No, x is not equal to 10 or 20\n");
    }

    if (x > 10) {
        printf ("Yes, x is more than 10\n");
    }

    while (x > 0) {
        printf ("x is %d\n", x);
        x = x - 1;
    }

    for (x = 0; x < 10; x++) {
        printf ("x is %d\n", x);
    }

    switch (x) {
    case 9:
        printf ("x is nine\n");
        break;
    case 10:
        printf ("x is ten\n");
        break;
    case 11:
        printf ("x is eleven\n");
        break;
    default:
        printf ("x is huh?\n");
        break;
    }

    return 0;
}
It is easy to see the format that these statements take, although they are vastly different from shell scripts. C code works in statement blocks between curly braces, in the same way that shell scripts have do's and done's. Note that with most programming languages, when we want to add 1 to a variable we have to write, say, x = x + 1. In C, the abbreviation x++ is used, meaning to increment a variable by 1.
The for loop takes three statements between ( ... ): a statement to start things off, a comparison, and a statement to be executed on each completion of the statement block. The statement block after the for is repeatedly executed until the comparison is untrue. The switch statement is like case in shell scripts. switch considers the argument inside its ( ... ) and decides which case line to jump to. In this example it will obviously be printf ("x is ten\n"); because x was 10 when the previous for loop exited. The break tokens mean that we are through with the switch statement and that execution should continue from the statement after the switch block (the return 0;).
Note that in C the comparison == is used instead of =. The symbol = means to assign a value to a variable, whereas == is an equality operator.

The following list is called an array:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int x;
    int y[10];
    for (x = 0; x < 10; x++) {
        y[x] = x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("item %d is %d\n", x, y[x]);
    }
    return 0;
}
If an array is of type character (char), then it is called a string:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int x;
    char y[11];
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("item %d is %d\n", x, y[x]);
    }
    y[10] = 0;
    printf ("string is %s\n", y);
    return 0;
}
Note that a string has to be null-terminated. This means that the last character must be a zero. The code y[10] = 0 sets the 11th item in the array to zero. This also means that strings need to be one char longer than you would think.
(Note that the first item in the array is y[0], not y[1], as with some other programming languages.)
In the preceding example, the line char y[11] reserved 11 bytes for the string. But what if you want a string of 100,000 bytes? C allows you to request memory from the kernel. This is called allocating memory. Any non-trivial program will allocate memory for itself, and there is no other way of getting large blocks of memory for your program to use. Try:
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int x;
    char *y;
    y = malloc (11);
    printf ("%ld\n", y);
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    y[10] = 0;
    printf ("string is %s\n", y);
    free (y);
    return 0;
}
The declaration char *y means to declare a variable (a number) called y that points to a memory location. The * (asterisk) in this context means pointer. For example, if you have a machine with perhaps 256 megabytes of RAM + swap, then y potentially has a range of this much. The numerical value of y is also printed with printf ("%ld\n", y);, but it is of no interest to the programmer.
When you have finished using memory you must give it back to the operating system by using free . Programs that don’t free all the memory they allocate are said to
leak
memory.
Allocating memory often requires you to perform a calculation to determine the amount of memory required. In the above case we are allocating the space of 11 chars. Since each char is really a single byte, this presents no problem. But what if we were allocating 11 ints? An int on a PC is 32 bits, that is, four bytes. To determine the size of a type, we use the sizeof keyword:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    a = sizeof (char);
    b = sizeof (short);
    c = sizeof (int);
    d = sizeof (long);
    e = sizeof (float);
    f = sizeof (double);
    g = sizeof (long double);
    printf ("%d, %d, %d, %d, %d, %d, %d\n", a, b, c, d, e, f, g);
    return 0;
}
Here you can see the number of bytes required by all of these types. Now we can easily allocate arrays of things other than char:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    int x;
    int *y;
    y = malloc (10 * sizeof (int));
    printf ("%ld\n", y);
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("%d\n", y[x]);
    }
    free (y);
    return 0;
}

On many machines an int is four bytes (32 bits), but you should never assume this. Always use the sizeof keyword to allocate memory.
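The same rule applies to any type. A small sketch of my own allocating an array of doubles (again, never hard-code the size; let sizeof work it out):

#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    double *e;
    int i;
    e = malloc (100 * sizeof (double));   /* 100 doubles, whatever their size */
    if (e == 0) {
        perror ("malloc failed");
        abort ();
    }
    for (i = 0; i < 100; i++) {
        e[i] = i / 7.0;
    }
    printf ("%f\n", e[99]);
    free (e);
    return 0;
}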
C programs probably do more string manipulation than anything else. Here is a program that divides a sentence into words:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main (int argc, char *argv[])
{
    int length_of_word;
    int i;
    int length_of_sentence;
    char p[256];
    char *q;

    strcpy (p, "hello there, my name is fred.");
    length_of_sentence = strlen (p);

    length_of_word = 0;
    for (i = 0; i <= length_of_sentence; i++) {
        if (p[i] == ' ' || i == length_of_sentence) {
            q = malloc (length_of_word + 1);
            if (q == 0) {
                perror ("malloc failed");
                abort ();
            }
            strncpy (q, p + i - length_of_word, length_of_word);
            q[length_of_word] = 0;
            printf ("word: %s\n", q);
            free (q);
            length_of_word = 0;
        } else {
            length_of_word = length_of_word + 1;
        }
    }
    return 0;
}
Here we introduce three more standard C library functions.
strcpy stands for string copy. It copies bytes from one place to another sequentially, until it reaches a zero byte (i.e., the end of the string). The strcpy call in this program copies text into the character array p, which is called the target of the copy.
the char acter strlen stands for str
ing
len
gth
. It determines the length of a string, which is just a count of the number of char acters up to the null character.
We need to loop over the length of the sentence. The variable current position in the sentence.
i indicates the
The first if statement says that if we find a character 32 (denoted by ' '), we know we have reached a word boundary. We also know that the end of the sentence is a word boundary even though there may not be a space there. The token || means OR. At this point we can allocate memory for the current word and copy the word into that memory.
The strncpy function is useful for this. It copies a string, but only up to a limit of length_of_word characters (the last argument). Like strcpy, the first argument is the target, and the second argument is the place to copy from.
To calculate the position of the start of the last word, we use p + i - length_of_word. This means that we are adding i to the memory location p and then going back length_of_word counts, thereby pointing strncpy to the exact position. Finally, we null-terminate the string with q[length_of_word] = 0. We can then print q, free the used memory, and begin with the next word. For a complete list of string operations, see string(3).
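For instance (a small sketch of my own, using only functions documented in string(3)), strcat, strlen, and strcmp can be combined like this:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[])
{
    char buf[256];
    strcpy (buf, "hello");
    strcat (buf, " there");                  /* append to the target string */
    printf ("%s is %d characters long\n", buf, (int) strlen (buf));
    if (strcmp (buf, "hello there") == 0) {  /* 0 means the strings match */
        printf ("they are equal\n");
    }
    return 0;
}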
Under most programming languages, file operations involve three steps: opening a file, reading or writing to the file, and then closing the file. You use the function fopen to tell the operating system that you are ready to begin working with a file. The following program opens a file and spits it out on the terminal:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[])
{
    int c;
    FILE *f;
    f = fopen ("mytest.c", "r");
    if (f == 0) {
        perror ("fopen");
        return 1;
    }
    for (;;) {
        c = fgetc (f);
        if (c == -1)
            break;
        printf ("%c", c);
    }
    fclose (f);
    return 0;
}
A new type is presented here: FILE *. It is a file operations variable that must be initialized with fopen before it can be used. The fopen function takes two arguments: the first is the name of the file, and the second is a string explaining how we want to open the file; in this case "r" means reading from the start of the file. Other options are "w" for writing and several more described in fopen(3).
If the return value of fopen is zero, it means that fopen has failed. The perror function then prints a textual error message (for example, No such file or directory). It is essential to check the return value of all library calls in this way. These checks will constitute about one third of your C program.
The function fgetc gets a character from the file. It retrieves consecutive bytes from the file until it reaches the end of the file, when it returns a -1. The break statement says to immediately terminate the for loop, whereupon execution will continue from the line after it (the fclose). break statements can appear inside while loops as well. You will notice that the for statement is empty. This is allowable C code and means to loop forever.
Some other file functions are fread, fwrite, fputc, fprintf, and fseek. See fread(3), fwrite(3), fputc(3), fprintf(3), and fseek(3).
Up until now, you are probably wondering what the (int argc, char *argv[]) are for. These are the command-line arguments passed to the program by the shell. argc is the total number of command-line arguments, and argv is an array of strings of each argument. Printing them out is easy:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[])
{
    int i;
    for (i = 0; i < argc; i++) {
        printf ("argument %d is %s\n", i, argv[i]);
    }
    return 0;
}
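Command-line arguments are always strings; if you need a number you must convert it yourself. A minimal sketch of my own, using the standard atoi function:

#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int n;
    if (argc < 2) {
        printf ("Usage:\n\tdouble <number>\n");
        exit (1);
    }
    n = atoi (argv[1]);    /* convert the first argument to an int */
    printf ("%d doubled is %d\n", n, n * 2);
    return 0;
}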
Here we put this all together in a program that reads in lots of files and dumps them as words. Here are some new notations you will encounter: != is the inverse of == and tests for not-equal-to; realloc reallocates memory, resizing an old block of memory so that any bytes of the old block are preserved; \n and \t mean the newline character, 10, and the tab character, 9, respectively (see ascii(7)).
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void word_dump (char *filename)
{
    int length_of_word;
    int amount_allocated;
    char *q;
    FILE *f;
    int c;

    c = 0;
    f = fopen (filename, "r");
    if (f == 0) {
        perror ("fopen failed");
        exit (1);
    }

    length_of_word = 0;
    amount_allocated = 256;
    q = malloc (amount_allocated);
    if (q == 0) {
        perror ("malloc failed");
        abort ();
    }

    while (c != -1) {
        if (length_of_word >= amount_allocated) {
            amount_allocated = amount_allocated * 2;
            q = realloc (q, amount_allocated);
            if (q == 0) {
                perror ("realloc failed");
                abort ();
            }
        }
        c = fgetc (f);
        q[length_of_word] = c;
        if (c == -1 || c == ' ' || c == '\n' || c == '\t') {
            if (length_of_word > 0) {
                q[length_of_word] = 0;
                printf ("%s\n", q);
            }
            amount_allocated = 256;
            q = realloc (q, amount_allocated);
            if (q == 0) {
                perror ("realloc failed");
                abort ();
            }
            length_of_word = 0;
        } else {
            length_of_word = length_of_word + 1;
        }
    }
    fclose (f);
}

int main (int argc, char *argv[])
{
    int i;
    if (argc < 2) {
        printf ("Usage:\n\twordsplit <filename> ...\n");
        exit (1);
    }
    for (i = 1; i < argc; i++) {
        word_dump (argv[i]);
    }
    return 0;
}
This program is more complicated than you might immediately expect. Reading in a file where we are
sure
that a word will never exceed 30 characters is simple.
But what if we have a file that contains some words that are 100,000 characters long?
GNU programs are expected to behave correctly under these circumstances.
To cope with normal as well as extreme circumstances, we start off assuming that a word will never be more than 256 characters. If it appears that the word is growing over 256 characters, we reallocate the memory space to double its size. When we start with a new word, we can free up memory again, so we realloc back to 256. In this way we are using the minimum amount of memory at each point in time. We have hence created a program that can work efficiently with a 100-gigabyte file just as easily as with a 100-byte file.
This is part of the art of
C
programming.
Experienced C programmers may actually scoff at the above listing because it really isn't as "minimalistic" as is absolutely possible. In fact, it is a truly excellent listing for the following reasons:

• The program is easy to understand.

• The program uses an efficient algorithm (albeit not optimal).

• The program contains no arbitrary limits that would cause unexpected behavior in extreme circumstances.

• The program uses no nonstandard C functions or notations that would prohibit it compiling successfully on other systems. It is therefore portable.
Readability in C is your first priority; it is imperative that what you do is obvious to anyone reading the code.
At the start of each program will be one or more #include statements. These tell the compiler to read in another C program. Now, "raw" C does not have a whole lot in the way of protecting against errors: for example, the strcpy function could just as well be used with one, three, or four arguments, and the C program would still compile. It would, however, wreak havoc with the internal memory and cause the program to crash. These other .h C programs are called header files. They contain templates for how functions are meant to be called. Every function you might like to use is contained in one or another template file. The templates are called function prototypes.
(C++ has something called "templates." This is a special C++ term having nothing to do with the discussion here.)
A function prototype is written the same as the function itself, but without the code. A function prototype for word_dump would simply be:

void word_dump (char *filename);
The trailing ; is essential and distinguishes a function prototype from a function.
After a function prototype is defined, any attempt to use the function in a way other than intended, say, passing it too few arguments or arguments of the wrong type, will be met with fierce opposition from gcc.
You will notice that the #include <string.h> appeared when we started using string operations. Recompiling these programs without the #include <string.h> line gives an implicit declaration of function warning message, which is quite to the point.
The function prototypes give a clear definition of how every function is to be used. Man pages will always first state the function prototype so that you are clear on what arguments are to be passed and what types they should have.
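For example, here is a minimal sketch of my own showing a prototype placed near the top of a file so that the function can be defined after it is used:

#include <stdlib.h>
#include <stdio.h>

/* function prototype: tells the compiler how sqr must be called */
int sqr (int x);

int main (int argc, char *argv[])
{
    printf ("%d\n", sqr (5));
    /* a call like sqr (1, 2) would now be rejected by gcc */
    return 0;
}

int sqr (int x)
{
    return x * x;
}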
A C comment is denoted with /* <comment lines> */ and can span multiple lines. Anything between the /* and */ is ignored. Every function should be commented, and all nonobvious code should be commented. It is a good maxim that a program that needs lots of comments to explain it is badly written. Also, never comment the obvious, and explain why you do things rather than what you are doing. It is not advisable to make pretty graphics between each function, so rather:
/* returns -1 on error, takes a positive integer */
int sqr (int x)
{
    <...>
than:
/***************************----SQR----******************************
 *  x = argument to make the square of                              *
 *  return value =                                                  *
 *      -1 (on error)                                               *
 *      square of x (on success)                                    *
 ********************************************************************/
int sqr (int x)
{
    <...>
which is liable to cause nausea. In C++, the additional comment // is allowed, whereby everything between the // and the end of the line is ignored. It is accepted under gcc, but should not be used unless you really are programming in C++. In addition, programmers often "comment out" lines by placing a #if 0 ... #endif around them, which really does exactly the same thing as a comment (see Section 22.1.12) but allows you to have comments within comments. For example,
<...>
int x;
x = 10;
#if 0
printf ("debug: x is %d\n", x);    /* print debug information */
#endif
y = x + 10;
<...>

comments out the printf line.
Anything starting with a # is not actually C, but a C preprocessor directive. A C program is first run through a preprocessor that removes all spurious junk, like comments, #include statements, and anything else beginning with a #. You can make C programs much more readable by defining macros instead of literal values. For instance,

#define START_BUFFER_SIZE 256

in our example program #defines the text START_BUFFER_SIZE to be the text 256. Thereafter, wherever in the C program we have a START_BUFFER_SIZE, the text 256 will be seen by the compiler, and we can use START_BUFFER_SIZE instead. This is a much cleaner way of programming because if, say, we would like to change the 256 to some other value, we only need to change it in one place. START_BUFFER_SIZE is also more meaningful than a number, making the program more readable.
Whenever you have a literal constant like 256, you should replace it with a macro defined near the top of your program.
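For instance, a small sketch of mine extending the idea: a macro can replace a literal buffer size used in several places at once.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 256

int main (int argc, char *argv[])
{
    char buf[BUFFER_SIZE];
    /* BUFFER_SIZE is substituted with 256 by the preprocessor, so
       changing the value above changes it everywhere at once */
    strcpy (buf, "hello");
    printf ("%s (buffer of %d bytes)\n", buf, BUFFER_SIZE);
    return 0;
}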
You can also check for the existence of macros with the #ifdef and #ifndef directives. # directives are really a programming language all on their own:
/* Set START_BUFFER_SIZE to fine-tune performance before compiling: */
#define START_BUFFER_SIZE 256
/* #define START_BUFFER_SIZE 128 */
/* #define START_BUFFER_SIZE 1024 */
/* #define START_BUFFER_SIZE 16384 */
#ifndef START_BUFFER_SIZE
#error This code did not define START_BUFFER_SIZE. Please edit
#endif
#if START_BUFFER_SIZE <= 0
#error Wooow! START_BUFFER_SIZE must be greater than zero
#endif
#if START_BUFFER_SIZE < 16
#warning START_BUFFER_SIZE too small, program may be inefficient
#elif START_BUFFER_SIZE > 65536
#warning START_BUFFER_SIZE too large, program may be inefficient
#else
/* START_BUFFER_SIZE is ok, do not report */
#endif
void word_dump (char *filename)
{
    <...>
    amount_allocated = START_BUFFER_SIZE;
    q = malloc (amount_allocated);
    <...>

22.2 Debugging with gdb and strace
Programming errors, or
bugs
, can be found by inspecting program execution. Some developers claim that the need for such inspection implies a sloppy development process.
Nonetheless it is instructive to learn
C
by actually watching a program work.
The GNU debugger, gdb, is a replacement for the standard UNIX debugger, db. To debug a program means to step through its execution line-by-line, in order to find programming errors as they happen. Use the command

gcc -Wall -g -O0 -o wordsplit wordsplit.c

to recompile your program above. The -g option enables debugging support in the resulting executable and the -O0 option disables compiler optimization (which sometimes causes confusing behavior). For the following example, create a test file readme.txt with some plain text inside it. You can then run gdb -q wordsplit. The standard gdb prompt will appear, which indicates the start of a debugging session:
(gdb)
At the prompt, many one letter commands are available to control program execution.
The first of these is r (run), which executes the program as though it had been started from a regular shell:
(gdb) r
Starting program: /homes/src/wordsplit/wordsplit
Usage:
        wordsplit <filename> ...
Obviously, we will want to set some trial command-line arguments. This is done with the special command, set args :
¨
set args readme.txt readme2.txt
¥
The b (break) command is used like b [[<file>:]<line>|<function>], and sets a break point at a function or line number:
(gdb) b main
A break point will interrupt execution of the program. In this case the program will stop when it enters the main function (i.e., right at the start). Now we can run the program again:
(gdb) r
Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

Breakpoint 1, main (argc=3, argv=0xbffff804) at wordsplit.c:67
67          if (argc < 2) {

As specified, the program stops at the beginning of the main function at line 67. If you are interested in viewing the contents of a variable, you can use the p (print) command:
(gdb) p argc
$1 = 3
(gdb) p argv[1]

which tells us the value of argc and argv[1]. The l (list) command displays the lines of source code around the current position:
(gdb) l
63
64      int main (int argc, char *argv[])
65      {
66          int i;
67          if (argc < 2) {
68              printf ("Usage:\n\twordsplit <filename> ...\n");
69              exit (1);
70          }
The l (list) command can also take an optional file and line number (or even a function name):

(gdb) l wordsplit.c:1
1       #include <stdlib.h>
2       #include <stdio.h>
3       #include <string.h>
4
5       void word_dump (char *filename)
6       {
7           int length_of_word;
8           int amount_allocated;
Next, we can try setting a break point at an arbitrary line and then using the c (continue) command to proceed with program execution:
(gdb) b wordsplit.c:48
Breakpoint 2 at 0x804873e: file wordsplit.c, line 48.
(gdb) c
Continuing.
Zaphod

Breakpoint 2, word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
48          amount_allocated = 256;
Execution obediently stops at line 48. At this point it is useful to run a bt (backtrace). This prints out the current stack, which shows the functions that were called to get to the current line. This output allows you to trace the history of execution.
(gdb) bt
#0  word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
#1  0x80487e0 in main (argc=3, argv=0xbffff814) at wordsplit.c:73
#2  0x4003db65 in __libc_start_main (main=0x8048790 <main>, argc=3,
    ubp_av=0xbffff814, init=0x8048420 <_init>, fini=0x804883c <_fini>,
    rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff8
The clear command then deletes the break point at the current line:

(gdb) clear
The most important commands for debugging are the n (next) and s (step) commands. The n command simply executes one line of C code:
(gdb) n
49              q = realloc (q, amount_allocated);
(gdb) n
50              if (q == 0) {
(gdb) n
                length_of_word = 0;
This activity is called stepping through your program. The s command is identical to n except that it dives into functions instead of running them as a single line. To see the difference, step over line 73 first with n, and then with s, as follows:
(gdb) set args readme.txt readme2.txt
(gdb) b main
Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.
(gdb) r
Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

Breakpoint 1, main (argc=3, argv=0xbffff814) at wordsplit.c:67
67          if (argc < 2) {
(gdb) n
72          for (i = 1; i < argc; i++) {
(gdb) n
73              word_dump (argv[i]);
(gdb) n
Zaphod
has two heads
72          for (i = 1; i < argc; i++) {
(gdb) n
73              word_dump (argv[i]);
(gdb) s
word_dump (filename=0xbffff993 "readme2.txt") at wordsplit.c:13
13          c = 0;
(gdb) s
15          f = fopen (filename, "r");
An interesting feature of gdb is its ability to attach onto running programs. Try the following sequence of commands:

ps awx | grep lpd
28157 ?        S      0:00 lpd Waiting
28160 pts/6    S      0:00 grep lpd
gdb -q /usr/sbin/lpd
(no debugging symbols found)...
(gdb) attach 28157
Attaching to program: /usr/sbin/lpd, Pid 28157
0x40178bfe in __select () from /lib/libc.so.6
The lpd daemon was not compiled with debugging support, but the point is still made: you can halt and debug any running process on the system. Try running a bt for fun. Now release the process with

(gdb) detach
The debugger provides copious amounts of online help. The help command can be run to explain further. The gdb info pages also elaborate on an enormous number of display features and tracing features not covered here.
If your program has a segmentation violation ("segfault"), then a core file will be written to the current directory. This is known as a core dump. A core dump is caused by a bug in the program: its response to a SIGSEGV signal sent to the program because it tried to access an area of memory outside of its allowed range. These files can be examined using gdb to (usually) reveal where the problem occurred. Simply run gdb <executable> ./core and then type bt (or any gdb command) at the gdb prompt. Typing file ./core will identify the file as a core dump and name the program that produced it.
The strace command prints every system call performed by a program. A system call is a function call made by a C library function to the LINUX kernel. Try

strace ls
If a program has not been compiled with debugging support, the only way to inspect its execution may be with the strace command. In any case, the command can provide valuable information about where a program is failing and is useful for diagnosing errors.
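For example, compiling and then strace-ing a trivial program like the following (a sketch of mine, not from the book) shows the open, write, and close system calls that the C library makes on its behalf:

#include <stdlib.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    FILE *f;
    f = fopen ("/tmp/strace-demo.txt", "w");   /* becomes an open() system call */
    if (f == 0) {
        perror ("fopen");
        return 1;
    }
    fprintf (f, "hello\n");                    /* buffered, flushed via write() */
    fclose (f);                                /* becomes a close() system call */
    return 0;
}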
22.3 C Libraries

We made reference to the Standard C library. The C language on its own does almost nothing; everything useful is an external function. External functions are grouped into libraries. The Standard C library is the file /lib/libc.so.6. To list all the C library functions, run:

nm /lib/libc.so.6
Many of these have man pages, but some will have no documentation and require you to read the comments inside the header files (which are often most explanatory). It is better not to use functions unless you are sure that they are standard functions in the sense that they are common to other systems.
To create your own library is simple. Let's say we have two files that contain several functions that we would like to compile into a library. The files are simple_math_sqrt.c:

#include <stdlib.h>
#include <stdio.h>
static int abs_error (int a, int b)
{
    if (a > b)
        return a - b;
    return b - a;
}

int simple_math_isqrt (int x)
{
    int result;
    if (x < 0) {
        fprintf (stderr,
                 "simple_math_sqrt: taking the sqrt of a negative number\n");
        abort ();
    }
    result = 2;
    while (abs_error (result * result, x) > 1) {
        result = (x / result + result) / 2;
    }
    return result;
}

and simple_math_pow.c:
#include <stdlib.h>
#include <stdio.h>

int simple_math_ipow (int x, int y)
{
    int result;
    if (x == 1 || y == 0)
        return 1;
    if (x == 0 && y < 0) {
        fprintf (stderr,
                 "simple_math_pow: raising zero to a negative power\n");
        abort ();
    }
    if (y < 0)
        return 0;
    result = 1;
    while (y > 0) {
        result = result * x;
        y = y - 1;
    }
    return result;
}
We would like to call the library simple_math. It is good practice to name all the functions in the library simple_math_??????. The function abs_error is not going to be used outside of the file simple_math_sqrt.c, and so we put the keyword static in front of it, meaning that it is a local function.
We can compile the code with:
gcc -Wall -c simple_math_sqrt.c simple_math_pow.c
The -c option means compile only. The code is not turned into an executable. The generated files are simple_math_sqrt.o and simple_math_pow.o. These are called object files.
We now need to archive these files into a library. We do this with the ar command (a predecessor of tar), and then index the archive with ranlib:

ar rc libsimple_math.a simple_math_sqrt.o simple_math_pow.o
ranlib libsimple_math.a
The library can now be used. Create a file mytest.c:

#include <stdlib.h>
#include <stdio.h>
int main (int argc, char *argv[])
{
    printf ("%d\n", simple_math_ipow (4, 3));
    printf ("%d\n", simple_math_isqrt (50));
    return 0;
}

and run

gcc -Wall -c mytest.c
gcc -Wall -o mytest mytest.o -L. -lsimple_math
The first command compiles the file mytest.c into mytest.o, and the second operation is called linking, which assimilates mytest.o and the libraries into a single executable. The option -L. means to look in the current directory for any libraries (usually only /lib and /usr/lib are searched). The option -lsimple_math means to assimilate the library libsimple_math.a (lib and .a are added automatically). This operation is called static linking (nothing to do with the "static" keyword) because it happens before the program is run and includes all object files into the executable.

As an aside, note that it is often the case that many static libraries are linked into the same program. Here order is important: the library with the least dependencies should come last, or you will get so-called symbol referencing errors.
We can also create a header file simple_math.h for using the library:

/* calculates the integer square root, aborts on error */
int simple_math_isqrt (int x);

/* calculates the integer power, aborts on error */
int simple_math_ipow (int x, int y);
Add the line #include "simple_math.h" to the top of mytest.c:

#include <stdlib.h>
#include <stdio.h>
#include "simple_math.h"
This addition gets rid of the implicit declaration of function warning messages. Usually #include <simple_math.h> would be used, but here, this is a header file in the current directory, our own header file, and this is where we use "simple_math.h" instead of <simple_math.h>.
What if you make a small change to one of the files (as you are likely to do very often when developing)? You could script the process of compiling and linking, but the script would build everything, and not just the changed file. What we really need is a utility that only recompiles object files whose sources have changed: make is such a utility.

make is a program that looks inside a Makefile in the current directory and then does a lot of compiling and linking. Makefiles contain lists of rules and dependencies describing how to build a program. Inside a Makefile you need to state a list of what-depends-on-what dependencies that make can work through, as well as the shell commands needed to achieve each goal.
Our first (last?) dependency in the process of completing the compilation is that mytest depends on both the library, libsimple_math.a, and the object file, mytest.o. In make terms we create a Makefile line that looks like:

mytest: libsimple_math.a mytest.o

meaning simply that the files libsimple_math.a and mytest.o must exist and be up to date before mytest. mytest: is called a make target. Beneath this line, we also need to state how to build mytest:

mytest: libsimple_math.a mytest.o
	gcc -Wall -o $@ mytest.o -L. -lsimple_math

The $@ means the name of the target itself, which is just substituted with mytest. Note that the space before the gcc is a tab character and not 8 space characters.
The next dependency is that libsimple_math.a depends on simple_math_sqrt.o and simple_math_pow.o. Once again we have a dependency, along with a shell script to build the target. The full Makefile rule is:

libsimple_math.a: simple_math_sqrt.o simple_math_pow.o
	rm -f $@
	ar rc $@ simple_math_sqrt.o simple_math_pow.o
	ranlib $@
Note again that the left margin consists of a single tab character and not spaces.
The final dependency is that the files simple_math_sqrt.o and simple_math_pow.o depend on the files simple_math_sqrt.c and simple_math_pow.c. This requires two target rules, but make has a short way of stating such a rule in the case of many C source files:

.c.o:
	gcc -Wall -c -o $*.o $<
which means that any .o files needed can be built from a .c file of a similar name by means of the command gcc -Wall -c -o $*.o $<, where $*.o means the name of the object file and $< means the name of the file that $*.o depends on, one at a time.
Makefiles can, in fact, have their rules put in any order, so it's best to state the most obvious rules first for readability.
There is also a rule you should always state at the outset:

all: libsimple_math.a mytest

The all: target is the rule that make tries to satisfy when make is run with no command-line arguments. This just means that libsimple_math.a and mytest are the last two files to be built, that is, they are the top-level dependencies.
Makefiles also have their own form of environment variables, like shell scripts. You can see that we have used the text simple_math in three of our rules. It makes sense to define a macro for this so that we can easily change to a different library name. Our final Makefile is:
# Comments start with a # (hash) character like shell scripts.
# Makefile to build libsimple_math.a and mytest program.
# Paul Sheer <[email protected]> Sun Mar 19 15:56:08 2000

OBJS      = simple_math_sqrt.o simple_math_pow.o
LIBNAME   = simple_math
CFLAGS    = -Wall

all: lib$(LIBNAME).a mytest

mytest: lib$(LIBNAME).a mytest.o
	gcc $(CFLAGS) -o $@ mytest.o -L. -l${LIBNAME}

lib$(LIBNAME).a: $(OBJS)
	rm -f $@
	ar rc $@ $(OBJS)
	ranlib $@

.c.o:
	gcc $(CFLAGS) -c -o $*.o $<

clean:
	rm -f *.o *.a mytest
We can now easily type

make

in the current directory to cause everything to be built. You can see we have added an additional disconnected target, clean:. Targets can be run explicitly on the command line like this:

make clean

which removes all built files.
Makefiles have far more uses than just building C programs. Anything that needs to be built from sources can employ a Makefile to make things easier.
¦
23 Shared Libraries

This chapter follows directly from our construction of static .a libraries in Chapter 22. It discusses creation and installation of Dynamically Linked Libraries (DLLs). Here I show you both so that you have a good technical overview of how DLLs work on UNIX. You can then promptly forget everything except ldconfig and LD_LIBRARY_PATH discussed below.
The .a
library file is good for creating functions that many programs can include. This practice is called
code reuse
. But note how the .a
file is
linked into
(included) in the executable mytest in Chapter 22.
mytest is enlarged by the size of ple math.a
. When hundreds of programs use the same .a
libsimfile, that code is effectively duplicated all over the file system. Such inefficiency was deemed unacceptable long before L
INUX
, so library files were invented that only link with the program when it runs—a process known as
dynamic
linking. Instead of .a
files, similar .so
( s
hared
o
bject
) files live in when it runs.
/lib/ and /usr/lib/ and are automatically linked to a program
5
Creating a DLL requires several changes to the previous Makefile:

OBJS = simple_math_sqrt.o simple_math_pow.o
LIBNAME = simple_math
SONAME = libsimple_math.so.1.0.0
SOVERSION = libsimple_math.so.1.0

CFLAGS = -Wall

all: lib$(LIBNAME).so mytest

mytest: lib$(LIBNAME).so mytest.o
	gcc $(CFLAGS) -o $@ mytest.o -L. -l${LIBNAME}

lib$(LIBNAME).so: $(OBJS)
	gcc -shared $(CFLAGS) $(OBJS) -lc -Wl,-soname -Wl,$(SOVERSION) \
		-o $(SONAME) && \
		ln -sf $(SONAME) $(SOVERSION) && \
		ln -sf $(SONAME) lib$(LIBNAME).so

.c.o:
	gcc -fPIC -DPIC $(CFLAGS) -c -o $*.o $<

clean:
	rm -f *.o *.a *.so mytest
The -shared option to gcc builds our shared library. The -Wl options are linker options that set the version number of the library that linking programs will load at runtime. The -fPIC -DPIC options mean to generate position-independent code, that is, code suitable for dynamic linking.
After running make we have

lrwxrwxrwx   1 root  root     23 Sep 17 22:02 libsimple_math.so -> libsimple_math.so.1.0.0
lrwxrwxrwx   1 root  root     23 Sep 17 22:02 libsimple_math.so.1.0 -> libsimple_math.so.1.0.0
-rwxr-xr-x   1 root  root   6046 Sep 17 22:02 libsimple_math.so.1.0.0
-rwxr-xr-x   1 root  root  13677 Sep 17 22:02 mytest
You may observe that our three .so files are similar to the many files in /lib/ and /usr/lib/. This complicated system of linking and symlinking is part of the process of library versioning. Although generating a DLL is outside the scope of most system admin tasks, library versioning is important to understand.
DLLs have a problem. Consider a DLL that is outdated or buggy: simply overwriting the DLL file with an updated file will affect all the applications that use it. If these applications rely on certain behavior of the DLL code, then they will probably crash with the fresh DLL. UNIX has elegantly solved this problem by allowing multiple versions of DLLs to be present simultaneously. The programs themselves have their required version number built into them. Try

ldd mytest

which will show the DLL files that mytest is scheduled to link with:

libsimple_math.so.1.0 => ./libsimple_math.so.1.0 (0x40018000)
libc.so.6 => /lib/libc.so.6 (0x40022000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
At the moment, we are interested in libsimple_math.so.1.0. Note how it matches the SOVERSION variable in the Makefile. Note also how we have chosen our symlinks. We are effectively allowing mytest to link with any future libsimple_math.so.1.0.? (were our libsimple_math library to be upgraded to a new version) purely because of the way we have chosen our symlinks. However, it will not link with any library libsimple_math.so.1.1.?, for example. As developers of libsimple_math, we are deciding that libraries of a different minor version number will be incompatible, whereas libraries of a different patch level will not be incompatible. (For this example we are considering libraries to be named lib<name>.so.<major>.<minor>.<patch>.)

We could also change SOVERSION to libsimple_math.so.1. This would effectively be saying that future libraries of different minor version numbers are compatible; only a change in the major version number would dictate incompatibility.
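If you want to confirm which soname was actually embedded in the library, you can inspect its dynamic section. A quick check (a sketch, using the file names built above):

objdump -p libsimple_math.so.1.0.0 | grep SONAME

This should print a SONAME entry of libsimple_math.so.1.0, the name that linked programs will request at run time.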
If you run ./mytest, you will be greeted with an error while loading shared libraries message. The reason is that the dynamic linker does not search the current directory for .so files. To run your program, you will have to install your library:

mkdir -p /usr/local/lib
install -m 0755 libsimple_math.so libsimple_math.so.1.0 \
	libsimple_math.so.1.0.0 /usr/local/lib
Then, edit the /etc/ld.so.conf file and add a line

/usr/local/lib

Then, reconfigure your libraries with

ldconfig
Finally, run your program with

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
./mytest

ldconfig configures all libraries on the system. It recreates appropriate symlinks (as we did) and rebuilds a lookup cache. The library directories it considers are /lib, /usr/lib, and those listed in /etc/ld.so.conf. The ldconfig command should be run automatically when the system boots and manually whenever libraries are installed or upgraded.
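To confirm that ldconfig has picked up the new library, you can search its cache listing. A quick check (a sketch, assuming the library name used in this chapter):

ldconfig -p | grep libsimple_math

An entry pointing into /usr/local/lib means the dynamic linker will now find the library without any LD_LIBRARY_PATH setting.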
The LD_LIBRARY_PATH environment variable is relevant to every executable on the system and is similar to the PATH environment variable. LD_LIBRARY_PATH dictates what directories should be searched for library files. Here, we appended /usr/local/lib to the search path in case it was missing. Note that even with LD_LIBRARY_PATH unset, /lib and /usr/lib will always be searched.
In this chapter you will, first and foremost, learn to build packages from source, building on your knowledge of Makefiles in Chapter 22. Most packages, however, also come as .rpm (RedHat) or .deb (Debian) files, which are discussed further below.
Almost all packages originally come as C sources, tarred and available from one of the many public FTP sites, like metalab.unc.edu. Thoughtful developers would have made their packages GNU standards compliant. This means that untarring the package will reveal the following files inside the top-level directory:
INSTALL    This is a standard document beginning with the line "These are generic installation instructions." Since all GNU packages are installed in the same way, this file should always be the same.

NEWS    News of interest to users.

README    Any essential information. This is usually an explanation of what the package does, promotional material, and anything special that need be done to install the package.

COPYING    The GNU General Public License.

AUTHORS    A list of major contributors.

ChangeLog    A specially formatted list containing a history of all changes ever done to the package, by whom, and on what date. Used to track work on the package.
Being GNU standards compliant should also mean that the package can be installed with only the three following commands:

./configure
make
make install
It also usually means that packages will compile on any UNIX system. Hence, this section should be a good guide to getting LINUX software to work on non-LINUX machines.
An example will illustrate these steps. Begin by downloading cooledit from metalab.unc.edu in the directory /pub/Linux/apps/editors/X/cooledit, using ftp. Make a directory /opt/src in which to build such custom packages. Now

cd /opt/src
tar -xvzf cooledit-3.17.2.tar.gz
cd cooledit-3.17.2
You will notice that most sources have the name package-major.minor.patch.tar.gz. The major version of the package is changed when the developers make a substantial feature update or when they introduce incompatibilities to previous versions. The minor version is usually updated when small features are added. The patch number (also known as the patch level) is updated whenever a new release is made and usually signifies bug fixes.
At this point you can apply any patches you may have. See Section 20.7.3.

You can now ./configure the package. The ./configure script is generated by autoconf, a package used by developers to create C source that will compile on any type of UNIX system. The autoconf package also contains the GNU Coding Standards to which all software should comply.
(autoconf is the remarkable work of David MacKenzie. I often hear the myth that UNIX systems have so diverged that they are no longer compatible. The fact that sophisticated software like cooledit (and countless others) compiles on almost any UNIX machine should dispel this nonsense. There is also hype surrounding developers "porting" commercial software from other UNIX systems to LINUX. If they had written their software in the least bit properly to begin with, there would be no porting to be done. In short, all LINUX software runs on all UNIXs. The only exceptions are a few packages that use some custom features of the LINUX kernel.)

./configure --prefix=/opt/cooledit
Here, --prefix indicates the top-level directory under which the package will be installed (see Section 17.2). Always also try

./configure --help

to see package-specific options.
Another trick is to set compile options:

-O2    Sets compiler optimizations to be "as fast as possible without making the binary larger." (-O3 almost never provides an advantage.)

-fomit-frame-pointer    Permits the compiler to use one extra register that would normally be used for debugging. Use this option only when you are absolutely sure you have no interest in analyzing any running problems with the package.

-s    Strips the object code. This reduces the size of the object code by eliminating any debugging data.

-pipe    Instructs the compiler not to use temporary files. Rather, use pipes to feed the code through the different stages of compilation. This usually speeds compilation.
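These options are usually passed to the compiler through the CFLAGS environment variable at configure time. A typical invocation might look like this (a sketch; it assumes the package's configure script honors CFLAGS, as autoconf-generated scripts do):

CFLAGS='-O2 -fomit-frame-pointer -s -pipe' ./configure --prefix=/opt/cooledit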
Compile the package with

make

This can take up to several hours depending on the amount of code and your CPU power. (cooledit will compile in under 10 minutes on any entry-level machine at the time of writing.) You can also recompile with different flags if you decide that you would rather compile with debug support after all.
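For example (a sketch; the exact flags depend on the package, but most autoconf-generated Makefiles let CFLAGS be overridden on the make command line):

make clean
make CFLAGS='-O0 -g'

Here -O0 turns optimization off and -g includes debugging symbols, which make the binary much easier to analyze in a debugger.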
Install the package with

make install

A nice trick to install into a different subdirectory is (not always supported):

mkdir /tmp/cooledit
make install prefix=/tmp/cooledit

You can use these commands to pack up the completed build for untarring onto a different system. You should, however, never try to run a package from a directory different from the one it was --prefixed to install into, since most packages compile in this location and then access installed data from beneath it.

Using a source package is often the best way to install when you want the package to work the way the developers intended. You will also tend to find more documentation, when vendors have neglected to include certain files.
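One way to pack up such a build and carry it to another machine (a sketch of my own; the archive name and target paths are only examples):

cd /tmp/cooledit
tar -czf /tmp/cooledit-build.tar.gz .
# copy the archive to the other machine, then unpack it under the
# same directory that was given to ./configure --prefix:
mkdir -p /opt/cooledit
cd /opt/cooledit
tar -xzf cooledit-build.tar.gz

The final unpack location should match the --prefix given to ./configure, for the reason explained above.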
In this section, we place Debian examples inside parentheses, ( . . . ). Since these are examples from actual systems, they do not always correspond.

The package numbering for RedHat and Debian packages is often as follows (although this is far from a rule):

<package-name>-<source-version>-<package-version>.<hardware-platform>.rpm

For example,

bash-1.14.7-22.i386.rpm

is the Bourne Again Shell you are using, major version 1, minor version 14, patch 7, package version 22, compiled for an Intel 386 processor. Sometimes, the Debian package will have the architecture appended to the version number; in the above case, perhaps bash_2.03-6_i386.deb.
The <source-version> is the version on the original .tar file (as above). The <package-version>, also called the release, refers to the .rpm file itself; in this case, bash-1.14.7-22.i386.rpm has been packed together for the 22nd time, possibly with minor improvements to the way it installs with each new number. The i386 is called the architecture and could also be sparc for a SPARC (the type of processor used in Sun Microsystems workstations) machine, ppc for a PowerPC (another non-Intel workstation), alpha for a DEC Alpha (high-end 64-bit server/workstation) machine, or several others.
To install a package, run the following command on the .rpm or .deb file:

rpm -i mirrordir-0.10.48-1.i386.rpm
( dpkg -i mirrordir_0.10.48-2.deb )
Upgrading (Debian automatically chooses an upgrade if the package is already present) can be done with the following command,

rpm -U mirrordir-0.10.49-1.i386.rpm
( dpkg -i mirrordir_0.10.49-1.deb )

and then completely uninstalling with
rpm -e mirrordir
( dpkg --purge mirrordir )
With Debian, a package removal does not remove configuration files, thus allowing you to revert to its current setup if you later decide to reinstall:

dpkg -r mirrordir
If you need to reinstall a package (perhaps because of a file being corrupted), use

rpm -i --force python-1.6-2.i386.rpm

Debian reinstalls automatically if the package is present.
Packages often require other packages to already be installed in order to work. The package database keeps track of these dependencies. Often you will get an error: failed dependencies: (or dependency problems for Debian) message when you try to install. This means that other packages must be installed first. The same might happen when you try to remove packages. If two packages mutually require each other, you must place them both on the command line at once when installing.
Sometimes a package requires something that is not essential or is already provided by an equivalent package. For example, a program may require sendmail to be installed even though exim is an adequate substitute. In such cases, the option --nodeps skips dependency checking:

rpm -i --nodeps <rpm-file>
( dpkg -i --ignore-depends=<required-package> <deb-file> )

Note that Debian is far more fastidious about its dependencies; override them only when you are sure what is going on underneath.
.rpm and .deb packages are more than a way of archiving files; otherwise, we could just use .tar files. Each package has its file list stored in a database that can be queried. The following are some of the more useful queries that can be done. Note that these are queries on already installed packages only:
To get a list of all packages (query all, list),

rpm -qa
( dpkg -l '*' )
To search for a package name,

rpm -qa | grep <regular-expression>
( dpkg -l <glob-expression> )

Try,

rpm -qa | grep util
( dpkg -l '*util*' )
To query for the existence of a package, say, textutils (query, list),

rpm -q textutils
( dpkg -l textutils )

gives the name and version:

textutils-2.0e-7
( textutils 2.0-2 The GNU text file processing utilities. )
To get info on a package (query info, status),

rpm -qi <package>
( dpkg -s <package> )
To list libraries and other packages required by a package,

rpm -qR <package>
( dpkg -s <package> | grep Depends )
To list what other packages require this one (with Debian we can check by attempting a removal with the --no-act option to merely test),

rpm -q --whatrequires <package>
( dpkg --purge --no-act <package> )
To get a file list contained by a package (once again, not for files but for packages already installed),

rpm -ql <package>
( dpkg -L <package> )
Package file lists are especially useful for finding what commands and documentation a package provides. Users are often frustrated by a package that they “don’t know what to do with.” Listing files owned by the package is where to start.
To find out what package a file belongs to,

rpm -qf <filename>
( dpkg -S <filename> )
For example, rpm -qf /etc/rc.d/init.d/httpd (or rpm -qf /etc/init.d/httpd) gives apache-mod_ssl-1.3.12.2.6.6-1 on my system, and

rpm -ql fileutils-4.0w-3 | grep bin

gives a list of all the other commands from fileutils. A trick to find all the sibling files of a command in your PATH is:

rpm -ql `rpm -qf \`which --skip-alias <command> \``
( dpkg -L `dpkg -S \`which <command> \` | cut -f1 -d:` )
You sometimes might want to query whether a package's files have been modified since installation (possibly by a hacker or an incompetent system administrator). To verify all packages is time consuming but provides some very instructive output:

rpm -V `rpm -qa`
( debsums -a )
However, there is not yet a way of saying that the package installed is the real package (see Section 44.3.2). To check this, you need to get your actual .deb or .rpm file and verify it with:

rpm -Vp openssh-2.1.1p4-1.i386.rpm
( debsums openssh_2.1.1p4-1_i386.deb )

Finally, even if you have the package file, how can you be absolutely sure that it is the package that the original packager created, and not some Trojan substitution? Use the md5sum command to check:

md5sum openssh-2.1.1p4-1.i386.rpm
( md5sum openssh_2.1.1p4-1_i386.deb )
md5sum uses the MD5 mathematical algorithm to calculate a numeric hash value based on the file contents, in this case, 8e8d8e95db7fde99c09e1398e4dd3468. This is identical to the password hashing described on page 103. There is no feasible computational method of forging a package to give the same MD5 hash; hence, packagers will often publish their md5sum results on their web page, and you can check these against your own as a security measure.
To query a package file that has not been installed, use, for example:

rpm -qp --qf '[%{VERSION}\n]' <rpm-file>
( dpkg -f <deb-file> Version )

Here, VERSION is a query tag applicable to .rpm files. Here is a list of other tags that can be queried:
BUILDHOST
BUILDTIME
CHANGELOG
CHANGELOGTEXT
CHANGELOGTIME
COPYRIGHT
DESCRIPTION
DISTRIBUTION
GROUP
LICENSE
NAME
OBSOLETES
OS
PACKAGER
PROVIDES
RELEASE
REQUIREFLAGS
REQUIRENAME
REQUIREVERSION
RPMTAG_POSTIN
RPMTAG_POSTUN
RPMTAG_PREIN
RPMTAG_PREUN
RPMVERSION
SERIAL
SIZE
SOURCERPM
SUMMARY
VENDOR
VERIFYSCRIPT
VERSION
For Debian, Version is a control field. Others are
Conffiles
Conflicts
Depends
Description
Essential
Installed-Size
Maintainer
Package
Pre-Depends
Priority
Provides
Recommends
Replaces
Section
Source
Status
Suggests
Version
It is further possible to extract all scripts, config, and control files from a .deb file with:
dpkg -e <deb-file> <out-directory>
This command creates a directory <out-directory> and places the files in it. Package info and file lists can also be queried directly from an uninstalled package file with

rpm -qip <rpm-file>
( dpkg -I <deb-file> )

rpm -qlp <rpm-file>
( dpkg -c <deb-file> )

which is analogous to similar queries on already installed packages. Finally, you can dump the full contents of a package as an archive.
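For example (a sketch of my own; dpkg --fsys-tarfile writes the contents of a .deb as a tar stream, and rpm2cpio writes the contents of an .rpm as a cpio archive):

dpkg --fsys-tarfile <deb-file> > contents.tar
rpm2cpio <rpm-file> > contents.cpio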
Only a taste of Debian package management was provided above. Debian has two higher-level tools: APT (Advanced Package Tool, which comprises the commands apt-cache, apt-cdrom, apt-config, and apt-get); and dselect, which is an interactive text-based package selector. When you first install Debian, I suppose the first thing you are supposed to do is run dselect (there are other graphical front-ends; search on Fresh Meat, http://freshmeat.net/), and then install and configure all the things you skipped over during installation. Between these you can do some sophisticated time-saving things like recursively resolving package dependencies through automatic downloads, that is, just mention the package and APT will find it and what it depends on, then download and install everything for you. See apt(8), sources.list(5), and apt.conf(5) for more information.
Experience will clearly demonstrate the superiority of Debian packages over most others. You will also notice that where RedHat-like distributions have chosen a selection of packages that they thought
you
would find useful, Debian has hundreds of volunteer maintainers selecting what
they
find useful. Almost every free U
NIX package on the Internet has been included in Debian .
Both RedHat and Debian binary packages begin life as source files from which their binary versions are compiled. Source RedHat packages will end in .src.rpm, and Debian packages will always appear under the source tree in the distribution. The RPM-HOWTO details the building of RedHat source packages, and Debian's dpkg-dev and packaging-manual packages contain a complete reference to the Debian package standard and packaging methods (try dpkg -L dpkg-dev and dpkg -L packaging-manual). The actual building of RedHat and Debian source packages is not covered in this edition.
IP stands for Internet Protocol. It is the method by which data is transmitted over the Internet.

At a hardware level, network cards are capable of transmitting packets (also called datagrams) of data between one another. A packet contains a small block of, say, 1 kilobyte of data (in contrast to serial lines, which transmit continuously). All Internet communication occurs through transmission of packets, which travel intact, even between machines on opposite sides of the world.
Each packet contains a header of 20 bytes or more which precedes the data. Hence, slightly more than the said 1 kilobyte of data would be found on the wire. When a packet is transmitted, the header would obviously contain the destination machine's address. Each machine is hence given a unique IP address, a 32-bit number. There are no machines on the Internet that do not have an IP address.

The header bytes are shown in Table 25.1.
Table 25.1 IP header bytes

Bytes         Description
0             bits 0-3: Version, bits 4-7: Internet Header Length (IHL)
1             Type of service (TOS)
2-3           Length
4-5           Identification
6-7           bits 0-3: Flags, bits 4-15: Offset
8             Time to live (TTL)
9             Type
10-11         Checksum
12-15         Source IP address
16-19         Destination IP address
20-IHL*4-1    Options + padding to round up to four bytes
              Data begins at IHL*4 and ends at Length-1
Version for the mean time is 4, although IP Next Generation (version 6) is in the (slow) process of deployment. IHL is the length of the header divided by 4. TOS (Type of Service) is a somewhat esoteric field for tuning performance and is not explained here. The Length field is the length in bytes of the entire packet including the header. The Source and Destination are the IP addresses from and to which the packet is coming/going.
The above description constitutes the view of the Internet that a machine has. However, physically, the Internet consists of many small high-speed networks (like those of a company or a university) called Local Area Networks, or LANs. These are all connected to each other by lower-speed long distance links. On a LAN, the raw medium of transmission is not a packet but an Ethernet frame. Frames are analogous to packets (having both a header and a data portion) but are sized to be efficient with particular hardware. IP packets are encapsulated within frames, where the IP packet fits within the Data part of the frame. A frame may, however, be too small to hold an entire IP packet, in which case the IP packet is split into several smaller packets. This group of smaller IP packets is then given an identifying number, and each smaller packet will then have the Identification field set with that number and the Offset field set to indicate its position within the actual packet. On the other side of the connection, the destination machine will reconstruct a packet from all the smaller subpackets that have the same Identification field.
The convention for writing an IP address in human readable form is dotted decimal notation like 152.2.254.81, where each number is a byte and is hence in the range of 0 to 255. Hence the entire address space is in the range of 0.0.0.0 to 255.255.255.255. To further organize the assignment of addresses, each 32-bit address is divided into two parts, a network and a host part of the address, as shown in Figure 25.1.
Figure 25.1 IP address classes (bit 0 is the leftmost bit of the 32-bit address):

Class A:  0      | network part (7 bits)  | host part (24 bits)
Class B:  1 0    | network part (14 bits) | host part (16 bits)
Class C:  1 1 0  | network part (21 bits) | host part (8 bits)
The network part of the address designates the LAN, and the host part the particular machine on the LAN. Now, because it was unknown at the time of specification whether there would one day be more LANs or more machines per LAN, three different classes of address were created. Class A addresses begin with the first bit of the network part set to 0 (hence, a Class A address always has the first dotted decimal number less than 128). The next 7 bits give the identity of the LAN, and the remaining 24 bits give the identity of an actual machine on that LAN. A Class B address begins with a 1 and then a 0 (first decimal number is 128 through 191). The next 14 bits give the LAN, and the remaining 16 bits give the machine. Most universities, like the address above, have Class B addresses. Lastly, Class C addresses start with a 1 1 0 (first decimal number is 192 through 223), and the next 21 bits and then the next 8 bits are the LAN and machine, respectively. Small companies tend to use Class C addresses.

In practice, few organizations require Class A addresses. A university or large company might use a Class B address but then would have its own further subdivisions, like using the third dotted decimal as a department (bits 16 through 23) and the last dotted decimal (bits 24 through 31) as the machine within that department. In this way the LAN becomes a micro-Internet in itself. Here, the LAN is called a network and the various departments are each called a subnet.
Some special-purpose IP addresses are never used on the open Internet. 192.168.0.0 through 192.168.255.255 are private addresses perhaps used inside a local LAN that does not communicate directly with the Internet. 127.0.0.0 through 127.255.255.255 are used for communication with the localhost, that is, the machine itself. Usually, 127.0.0.1 is an IP address pointing to the machine itself. Further, 172.16.0.0 through 172.31.255.255 are additional private addresses for very large internal networks, and 10.0.0.0 through 10.255.255.255 are for even larger ones.
Consider again the example of a university with a Class B address. It might have an IP address range of 137.158.0.0 through 137.158.255.255. Assume it was decided that the astronomy department should get 512 of its own IP addresses, 137.158.26.0 through 137.158.27.255. We say that astronomy has a network address of 137.158.26.0. The machines there all have a network mask of 255.255.254.0. A particular machine in astronomy may have an IP address of 137.158.27.158. This terminology is used later. Figure 25.2 illustrates this example.
Figure 25.2 Dividing an address into network and host portions:

                     Dotted IP         Binary
Netmask              255.255.254.0     11111111 11111111 11111110 00000000
Network address      137.158.26.0      10001001 10011110 00011010 00000000
IP address           137.158.27.158    10001001 10011110 00011011 10011110
Host part            0.0.1.158         00000000 00000000 00000001 10011110
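The same arithmetic can be checked from the shell. A minimal sketch using bash arithmetic (the addresses are the ones from the astronomy example above; this is only an illustration):

# AND each octet of the IP address 137.158.27.158 with the netmask 255.255.254.0
echo "$(( 137 & 255 )).$(( 158 & 255 )).$(( 27 & 254 )).$(( 158 & 0 ))"
# prints 137.158.26.0, the network address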
In this section we will use the term LAN to indicate a network of computers that are all more or less connected directly together by Ethernet cables (this is common for small businesses with up to about 50 machines). Each machine has an Ethernet card which is referred to as eth0 throughout all command-line operations. If there is more than one card on a single machine, then these are named eth0, eth1, eth2, etc., and are each called a network interface (or just interface, or sometimes Ethernet port) of the machine.
LANs work as follows. Network cards transmit a frame to the LAN, and other network cards read that frame from the LAN. If any one network card transmits a frame, then all other network cards can see that frame. If a card starts to transmit a frame while another card is in the process of transmitting a frame, then a clash is said to have occurred, and the card waits a random amount of time and then tries again. Each network card has a physical address of 48 bits called the hardware address (which is inserted at the time of its manufacture and has nothing to do with IP addresses). Each frame has a destination address in its header that tells what network card it is destined for, so that network cards ignore frames that are not addressed to them.
Since frame transmission is governed by the network cards, the destination hardware address must be determined from the destination IP address before a packet is sent to a particular machine. This is done through the Address Resolution Protocol (ARP). A machine will transmit a special packet that asks "What hardware address is this IP address?" The guilty machine then responds, and the transmitting machine stores the result for future reference. Of course, if you suddenly switch network cards, then other machines on the LAN will have the wrong information, so ARP has timeouts and re-requests built into the protocol. Try typing the command arp to get a list of hardware address to IP mappings.
Most distributions have a generic way to configure your interfaces. Here, however, we first look at a complete network configuration using only raw networking commands.

We first create a lo interface. This is called the loopback device (and has nothing to do with loopback block devices: /dev/loop? files). The loopback device is an imaginary network card that is used to communicate with the machine itself; for instance, if you are telneting to the local machine, you are actually connecting via the loopback device. The ifconfig (interface configure) command is used to do anything with interfaces. First, run

/sbin/ifconfig lo down
/sbin/ifconfig eth0 down

to delete any existing interfaces, then run

/sbin/ifconfig lo 127.0.0.1

which creates the loopback interface.
Create the Ethernet interface with:

/sbin/ifconfig eth0 192.168.3.9 broadcast 192.168.3.255 netmask 255.255.255.0

The broadcast address is a special address that all machines respond to. It is usually the first or last address of the particular network. Now run

/sbin/ifconfig

to view the interfaces. The output will be
eth0      Link encap:Ethernet  HWaddr 00:00:E8:3B:2D:A2
          inet addr:192.168.3.9  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1359 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1356 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:11 Base address:0xe400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:53175 errors:0 dropped:0 overruns:0 frame:0
          TX packets:53175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

which shows various interesting bits, like the 48-bit hardware address of the network card (hex bytes 00:00:E8:3B:2D:A2).
The interfaces are now active. However, nothing tells the kernel what packets should go to what interface, even though we might expect such behavior to happen on its own. With UNIX, you must explicitly tell the kernel to send particular packets to particular interfaces. Any packet arriving through any interface is pooled by the kernel. The kernel then looks at each packet's destination address and decides, based on the destination, where it should be sent. It doesn't matter where the packet came from; once the kernel has the packet, it's what its destination address says that matters. It is up to the rest of the network to ensure that packets do not arrive at the wrong interfaces in the first place.
We know that any packet having the network address 127.???.???.??? must go to the loopback device (this is more or less a convention). The command

/sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo

adds a route to the network 127.0.0.0, albeit an imaginary one. The eth0 device can be routed as follows:

/sbin/route add -net 192.168.3.0 netmask 255.255.255.0 eth0
The command to display the current routes is

/sbin/route -n

(-n causes route to not print IP addresses as host names) with the following output:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
This output has the meaning: "packets with destination address 127.0.0.0/255.0.0.0 must be sent to the loopback device," and "packets with destination address 192.168.3.0/255.255.255.0 must be sent to eth0." Gateway is zero, hence, it is not set (see the following commands). (The notation network/mask is often used to denote ranges of IP addresses.)
The routing table now routes 127. and 192.168.3. packets. Now we need a route for the remaining possible IP addresses. UNIX can have a route that says to send packets with particular destination IP addresses to another machine on the LAN, from whence they might be forwarded elsewhere. This is sometimes called the gateway machine. The command is:

/sbin/route add -net <network-address> netmask <netmask> gw \
		<gateway-ip-address> <interface>
This is the most general form of the command, but it's often easier to just type:

/sbin/route add default gw <gateway-ip-address> <interface>

when we want to add a route that applies to all remaining packets. This route is called the default gateway. default signifies all packets; it is the same as

/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw <gateway-ip-address> \
		<interface>

but since routes are ordered according to netmask, more specific routes are used in preference to less specific ones.
Finally, you can set your host name with:

hostname cericon.cranzgot.co.za
A summary of the example commands so far is
/sbin/ifconfig lo down
/sbin/ifconfig eth0 down
/sbin/ifconfig lo 127.0.0.1
/sbin/ifconfig eth0 192.168.3.9 broadcast 192.168.3.255 netmask 255.255.255.0
/sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
/sbin/route add -net 192.168.3.0 netmask 255.255.255.0 eth0
/sbin/route add default gw 192.168.3.254 eth0
Although these 7 commands will get your network working, you should not do such a manual configuration. The next section explains how to configure your startup scripts.
Most distributions will have a modular and extensible system of startup scripts that initiate networking.
RedHat systems contain the directory /etc/sysconfig/ , which contains configuration files to automatically bring up networking.
The file /etc/sysconfig/network-scripts/ifcfg-eth0 contains:

DEVICE=eth0
IPADDR=192.168.3.9
NETMASK=255.255.255.0
NETWORK=192.168.3.0
BROADCAST=192.168.3.255
The file /etc/sysconfig/network contains:

NETWORKING=yes
HOSTNAME=cericon.cranzgot.co.za
GATEWAY=192.168.3.254
You can see that these two files are equivalent to the example configuration done above. These two files can take an enormous number of options for the various protocols besides IP, but this is the most common configuration.
The file /etc/sysconfig/network-scripts/ifcfg-lo for the loopback device will be configured automatically at installation; you should never need to edit it.
To stop and start networking (i.e., to bring up and down the interfaces and routing), type (alternative commands in parentheses):
/etc/init.d/network stop
( /etc/rc.d/init.d/network stop )
/etc/init.d/network start
( /etc/rc.d/init.d/network start )

which will indirectly read your /etc/sysconfig/ files.
You can add further files, say, ifcfg-eth1 (under /etc/sysconfig/network-scripts/), for a secondary Ethernet device. For example, ifcfg-eth1 could contain
DEVICE=eth1
IPADDR=192.168.4.1
NETMASK=255.255.255.0
NETWORK=192.168.4.0
BROADCAST=192.168.4.255
ONBOOT=yes
and then run

echo "1" > /proc/sys/net/ipv4/ip_forward

to enable packet forwarding between your two interfaces.
Debian, on the other hand, has a directory /etc/network/ containing a file /etc/network/interfaces (as usual, Debian has a neat and clean approach; see also interfaces(5)). For the same configuration as above, this file would contain:
iface lo inet loopback
iface eth0 inet static
    address 192.168.3.9
    netmask 255.255.255.0
    gateway 192.168.3.254
The file /etc/network/options contains the same forwarding (and some other) options:

ip_forward=no
spoofprotect=yes
syncookies=no
To stop and start networking (i.e., bring up and down the interfaces and routing), type

/etc/init.d/networking stop
/etc/init.d/networking start

which will indirectly read your /etc/network/interfaces file.
Actually, the /etc/init.d/networking script merely runs the ifup and ifdown commands. See ifup(8). You can alternatively run these commands directly for finer control.
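For example, to bounce a single interface without restarting all networking (a sketch, assuming the eth0 configuration shown above):

ifdown eth0
ifup eth0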
We add further interfaces similar to the RedHat example above by appending to the /etc/network/interfaces file. The Debian equivalent is,
iface lo inet loopback
iface eth0 inet static
    address 192.168.3.9
    netmask 255.255.255.0
    gateway 192.168.3.254
iface eth1 inet static
    address 192.168.4.1
    netmask 255.255.255.0

and then set ip_forward=yes in your /etc/network/options file.
Finally, whereas RedHat sets its host name from the line HOSTNAME=... in /etc/sysconfig/network, Debian sets it from the contents of the file /etc/hostname, which, in the present case, would contain just

cericon.cranzgot.co.za
Consider two distant LANs that need to communicate. Two dedicated machines, one on each LAN, are linked by some alternative method (in this case, a permanent serial line), as shown in Figure 25.3.
This arrangement can be summarized by five machines X, A, B, C, and D. Machines X, A, and B form LAN 1 on subnet 192.168.1.0/26. Machines C and D form LAN 2 on subnet 192.168.1.128/26. Note how we use the "/26" to indicate that only the first 26 bits are network address bits, while the remaining 6 bits are host address bits. This means that we can have at most 2^6 = 64 IP addresses on each of LAN 1 and 2. Our dedicated serial link comes between machines B and C.
Machine X has IP address 192.168.1.1. This machine is the gateway to the Internet. The Ethernet port of machine B is simply configured with an IP address of 192.168.1.2 with a default gateway of 192.168.1.1. Note that the broadcast address is 192.168.1.63 (the last 6 bits set to 1).
The Ethernet port of machine C is configured with an IP address of 192.168.1.129. No default gateway should be set until the serial line is configured.

We will make the network between B and C subnet 192.168.1.192/26. It is effectively a LAN on its own, even though only two machines can ever be connected. Machines B and C will have IP addresses 192.168.1.252 and 192.168.1.253, respectively, on their facing interfaces.
Figure 25.3 Two remotely connected networks
This is a real-life example with an unreliable serial link. To keep the link up requires pppd and a shell script to restart the link if it dies. The pppd program is covered in Chapter 41. The script for machine B is:

#!/bin/sh
while true ; do
	pppd lock local mru 296 mtu 296 nodetach nocrtscts nocdtrcts \
		192.168.1.252:192.168.1.253 /dev/ttyS0 115200 noauth \
		lcp-echo-interval 1 lcp-echo-failure 2 lcp-max-terminate 1 lcp-restart 1
done
Note that if the link were an Ethernet link instead (on a second Ethernet card), and/or a genuine LAN between machines B and C (with subnet 192.168.1.192/26), then the same script would be just

/sbin/ifconfig eth1 192.168.1.252 broadcast 192.168.1.255 netmask \
	255.255.255.192

in which case all "ppp0" would change to "eth1" in the scripts that follow.
Routing on machine B is achieved with the following script, provided the link is up. This script must be executed whenever pppd has negotiated the connection and can therefore be placed in the file /etc/pppd/ip-up, which pppd executes automatically as soon as the ppp0 interface is available:

/sbin/route del default
/sbin/route add -net 192.168.1.192 netmask 255.255.255.192 dev ppp0
/sbin/route add -net 192.168.1.128 netmask 255.255.255.192 gw 192.168.1.253
/sbin/route add default gw 192.168.1.1
Our full routing table and interface list for machine B then looks like this (RedHat 6 likes to add redundant explicit routes to each device; these may not be necessary on your system):
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.2     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.1.253   0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
192.168.1.0     0.0.0.0         255.255.255.192 U     0      0        0 eth0
192.168.1.192   0.0.0.0         255.255.255.192 U     0      0        0 ppp0
192.168.1.128   192.168.1.253   255.255.255.192 UG    0      0        0 ppp0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0

eth0      Link encap:Ethernet  HWaddr 00:A0:24:75:3B:69
          inet addr:192.168.1.2  Bcast:192.168.1.63  Mask:255.255.255.192
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
ppp0      Link encap:Point-to-Point Protocol
          inet addr:192.168.1.252  P-t-P:192.168.1.253  Mask:255.255.255.255
On machine C we can similarly run the script

#!/bin/sh
while true ; do
	pppd lock local mru 296 mtu 296 nodetach nocrtscts nocdtrcts \
		192.168.1.253:192.168.1.252 /dev/ttyS0 115200 noauth \
		lcp-echo-interval 1 lcp-echo-failure 2 lcp-max-terminate 1 lcp-restart 1
done

and then create routes with

/sbin/route del default
/sbin/route add -net 192.168.1.192 netmask 255.255.255.192 dev ppp0
/sbin/route add default gw 192.168.1.252
Our full routing table for machine C then looks like:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.129   0.0.0.0         255.255.255.255 UH    0      0        0 eth0
192.168.1.252   0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
192.168.1.192   0.0.0.0         255.255.255.192 U     0      0        0 ppp0
192.168.1.128   0.0.0.0         255.255.255.192 U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.1.252   0.0.0.0         UG    0      0        0 ppp0

eth0      Link encap:Ethernet  HWaddr 00:A0:CC:D5:D8:A7
          inet addr:192.168.1.129  Bcast:192.168.1.191  Mask:255.255.255.192
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
ppp0      Link encap:Point-to-Point Protocol
          inet addr:192.168.1.253  P-t-P:192.168.1.252  Mask:255.255.255.255
Machine D can be configured like any ordinary machine on a LAN. It just sets its default gateway to 192.168.1.129. Machine A, however, has to know to send packets destined for subnet 192.168.1.128/26 through machine B. Its routing table has an extra entry for the 192.168.1.128/26 LAN. The full routing table for machine A is:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.192 U     0      0        0 eth0
192.168.1.128   192.168.1.2     255.255.255.192 UG    0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
To avoid having to add this extra route on machine A, you can instead add the same route on machine X. This may seem odd, but all that this means is that packets originating from A destined for LAN 2 first try to go through X (since A has only one route), and are then redirected by X to go through B.
The preceding configuration allowed machines to properly send packets between machines A and D and out through the Internet. One caveat: ping sometimes did not work even though telnet did. This may be a peculiarity of the kernel version we were using **shrug**.

If you have one network card which you would like to double as several different IP addresses, you can. Simply name the interface eth0:n, where n is from 0 to some large integer. You can use ifconfig as before, as many times as you like, on the same network card:

/sbin/ifconfig eth0:0 192.168.4.1 broadcast 192.168.4.255 netmask 255.255.255.0
/sbin/ifconfig eth0:1 192.168.5.1 broadcast 192.168.5.255 netmask 255.255.255.0
/sbin/ifconfig eth0:2 192.168.6.1 broadcast 192.168.6.255 netmask 255.255.255.0

in addition to your regular eth0 device. Here, the same interface can communicate to three LANs having networks 192.168.4.0, 192.168.5.0, and 192.168.6.0. (The file /usr/src/linux/Documentation/networking/alias.txt contains the kernel documentation on this.)
Don’t forget to add routes to these networks as above.
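A sketch of those routes, following the same pattern as the eth0 route earlier in this chapter (the addresses are the ones from the aliasing example above):

/sbin/route add -net 192.168.4.0 netmask 255.255.255.0 eth0
/sbin/route add -net 192.168.5.0 netmask 255.255.255.0 eth0
/sbin/route add -net 192.168.6.0 netmask 255.255.255.0 eth0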
It is essential to know how to inspect and test your network to resolve problems. The standard UNIX utilities are explained here.

The ping command is the most common network utility. IP packets come in three types on the Internet, represented in the Type field of the IP header: UDP, TCP, and ICMP. (The first two, discussed later, represent the two basic methods of communication between two programs running on different machines.) ICMP stands for Internet Control Message Protocol and is a diagnostic packet that is responded to in a special way. Try:

ping metalab.unc.edu

or specify some other well-known host. You will get output like:
PING metalab.unc.edu (152.19.254.81) from 192.168.3.9 : 56(84) bytes of data.
64 bytes from 152.19.254.81: icmp_seq=0 ttl=238 time=1059.1 ms
64 bytes from 152.19.254.81: icmp_seq=1 ttl=238 time=764.9 ms
64 bytes from 152.19.254.81: icmp_seq=2 ttl=238 time=858.8 ms
64 bytes from 152.19.254.81: icmp_seq=3 ttl=238 time=1179.9 ms
64 bytes from 152.19.254.81: icmp_seq=4 ttl=238 time=986.6 ms
64 bytes from 152.19.254.81: icmp_seq=5 ttl=238 time=1274.3 ms
What is happening is that ping is sending ICMP packets to metalab.unc.edu, which is automatically responding with a return ICMP packet. Being able to ping a machine is often the acid test of whether you have a correctly configured and working network interface. Note that some sites explicitly filter out ICMP packets, so, for example, ping cnn.com won't work.

ping sends a packet every second and measures the time it takes to receive the return packet, like a submarine sonar "ping." Over the Internet, you can get times in excess of 2 seconds if the place is remote enough. On a local LAN this delay will drop to under a millisecond.

If ping does not even get to the line PING metalab.unc.edu . . . , it means that ping cannot resolve the host name. You should then check that your DNS is set up correctly; see Chapter 27. If ping gets to that line but no further, it means that the packets are not getting there or are not getting back. In all other cases, ping gives an error message reporting the absence of either routes or interfaces.
traceroute is a rather fascinating utility to identify where a packet has been. It uses UDP packets or, with the -I option, ICMP packets to detect the routing path. On my machine,

traceroute metalab.unc.edu

gives
traceroute to metalab.unc.edu (152.19.254.81), 30 hops max, 38 byte packets
 1  192.168.3.254 (192.168.3.254)  1.197 ms  1.085 ms  1.050 ms
 2  192.168.254.5 (192.168.254.5)  45.165 ms  45.314 ms  45.164 ms
 3  cranzgate (192.168.2.254)  48.205 ms  48.170 ms  48.074 ms
 4  cranzposix (160.124.182.254)  46.117 ms  46.064 ms  45.999 ms
 5  cismpjhb.posix.co.za (160.124.255.193)
 6  cisap1.posix.co.za (160.124.112.1)
 7  saix.posix.co.za (160.124.255.6)
 8  ndf-core1.gt.saix.net (196.25.253.1)
 9  ny-core.saix.net (196.25.0.238)
10  corerouter1.WestOrange.cw.net (204.70.9.138)
11  bordercore6-serial5-0-0-26.WestOrange.cw.net (166.48.144.105)
12  core6.Washington.cw.net (204.70.4.113)
13  204.70.10.182 (204.70.10.182)
14  mae-brdr-01.inet.qwest.net (205.171.4.201)
15  wdc-core-03.inet.qwest.net (205.171.24.69)
16  atl-core-02.inet.qwest.net (205.171.5.243)
17  atl-edge-05.inet.qwest.net (205.171.21.54)
18  * * *
19  unc-gw.ncren.net (128.109.190.2)
20  unc-gw.ncren.net (128.109.190.2)
21  helios.oit.unc.edu (152.2.22.3)
You can see that there were twenty machines
&
This is actually a good argument for why
“enterprise”-level web servers have no use in non-U.S. markets: there isn’t even the network speed to load such servers, thus making any kind of server speed comparisons superfluous.
(or
hops
) between mine and metalab.unc.edu
.
tcpdump watches a particular interface for all the traffic that passes it, that is, all the traffic of all the machines connected to the same hub (also called the segment or network segment). A network card usually grabs only the frames destined for it, but tcpdump puts the card into promiscuous mode, meaning that the card is to retrieve all frames regardless of their destination hardware address.
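Try it on your own machine; a minimal invocation (a sketch, assuming your Ethernet interface is eth0 and that you are running as root):

tcpdump -i eth0

Press Ctrl-C to stop the capture.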
tcpdump is also discussed in Section 41.5. Deciphering the output of tcpdump is left for now as an exercise for the reader. More on the tcp part of tcpdump in Chapter 26.
In the previous chapter we talked about communication between machines in a generic sense. However, when you have two applications on opposite sides of the Atlantic Ocean, being able to send a packet that may or may not reach the other side is not sufficient. What you need is reliable communication.

Ideally, a programmer wants to be able to establish a link to a remote machine and then feed bytes in one at a time and be sure that the bytes are being read on the other end, and vice versa. Such communication is called reliable stream communication.
If your only tools are discrete, unreliable packets, implementing a reliable, continuous stream is tricky. You can send single packets and then wait for the remote machine to confirm receipt, but this approach is inefficient (packets can take a long time to get to and from their destination); you really want to be able to send as many packets as possible at once and then have some means of negotiating with the remote machine when to resend packets that were not received. What TCP (Transmission Control Protocol) does is to send data packets one way and then acknowledgment packets the other way, saying how much of the stream has been properly received.

We therefore say that TCP is implemented on top of IP. This is why Internet communication is sometimes called TCP/IP.
TCP communication has three stages: negotiation, transfer, and detachment. (This is all my own terminology. This is also somewhat of a schematic representation.)
Negotiation: The client C application (say, a web browser) first initiates the connection by using a connect() (see connect(2)) function. This causes the kernel to send a SYN (SYNchronization) packet to the remote TCP server (in this case, a web server). The web server responds with a SYN-ACK packet (ACKnowledge), and finally the client responds with a final ACK packet. This packet negotiation is unbeknown to the programmer.
Transfer: The programmer will use the send() (send(2)) and recv() (recv(2)) C function calls to send and receive an actual stream of bytes. The stream of bytes will be broken into packets, and the packets sent individually to the remote application. In the case of the web server, the first bytes sent would be the line GET /index.html HTTP/1.0<CR><NL><CR><NL>. On the remote side, reply packets (also called ACK packets) are sent back as the data arrives, indicating whether parts of the stream went missing and require retransmission. Communication is full-duplex, meaning that there are streams in both directions; both data and acknowledgment packets are going both ways simultaneously.
Detachment: The programmer will use the shutdown() and close() C function calls (see shutdown(2) and close(2)) to terminate the connection. A FIN packet will be sent and TCP communication will cease.
TCP packets are obviously encapsulated within IP packets. The TCP packet sits inside the "Data begins at..." part of the IP packet. A TCP packet has a header part and a data part. The data part may sometimes be empty (such as in the negotiation stage). Table 26.1 shows the full TCP/IP header.
Table 26.1 Combined TCP and IP header

Bytes (IP)              Description
0                       Bits 0-3: Version, bits 4-7: Internet Header Length (IHL)
1                       Type of service (TOS)
2-3                     Length
4-5                     Identification
6-7                     Bits 0-3: Flags, bits 4-15: Offset
8                       Time to live (TTL)
9                       Type
10-11                   Checksum
12-15                   Source IP address
16-19                   Destination IP address
20-IHL*4-1              Options + padding to round up to four bytes

Bytes (TCP)             Description
0-1                     Source port
2-3                     Destination port
4-7                     Sequence number
8-11                    Acknowledgment number
12                      Bits 0-3: number of bytes of additional TCP options / 4
13                      Control
14-15                   Window
16-17                   Checksum
18-19                   Urgent pointer
20-(20 + options * 4)   Options + padding to round up to four bytes
                        TCP data begins at IHL * 4 + 20 + options * 4 and ends at Length - 1
The minimum combined TCP/IP header is thus 40 bytes.
With Internet machines, several applications often communicate simultaneously. The Source port and Destination port fields identify and distinguish individual streams. In the case of web communication, the destination port (from the client's point of view) is port 80, and hence all outgoing traffic will have the number 80 filled in this field. The source port (from the client's point of view) is chosen randomly to any unused port number above 1024 before the connection is negotiated; these, too, are filled into outgoing packets. No two streams have the same combination of source and destination port numbers. The kernel uses the port numbers on incoming packets to determine which application requires those packets, and similarly for the remote machine.
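You can watch these port pairs on a live system with netstat. A quick look (an assumption on my part that the netstat tool is installed, as it is on most distributions; -n shows numeric addresses and -t restricts the listing to TCP):

netstat -nt

Each line shows a local address:port and a foreign address:port pair, which together identify one stream.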
Sequence number is the offset within the stream that this particular packet of data belongs to. The Acknowledgment number is the point in the stream up to which all data has been received. Control is various other flag bits. Window is the maximum amount that the receiver is prepared to accept. Checksum is used to verify data integrity, and Urgent pointer is for interrupting the stream. Data needed by extensions to the protocol are appended after the header as options.
It is easy to see TCP working by using telnet. You are probably familiar with using telnet to log in to remote systems, but telnet is actually a generic program to connect to any TCP socket, as we did in Chapter 10. Here we will try to connect to cnn.com's web page.

We first need to get an IP address of cnn.com:
[root@cericon]# host cnn.com
Now, in one window we run

tcpdump \
    '( src 192.168.3.9 and dst 207.25.71.20 ) or ( src 207.25.71.20 and dst 192.168.3.9 )'
Kernel filter, protocol ALL, datagram packet socket

which says to list all packets having source (src) or destination (dst) addresses of either us or CNN.
Then we use the HTTP protocol to grab the page. Type in the HTTP command GET / HTTP/1.0 and then press Enter twice (as required by the HTTP protocol). The session looks like this:
[root@cericon root]# telnet 207.25.71.20 80
Trying 207.25.71.20...
Connected to 207.25.71.20.
Escape character is ’ˆ]’.
GET / HTTP/1.0
HTTP/1.0 200 OK
Server: Netscape-Enterprise/2.01
Date: Tue, 18 Apr 2000 10:55:14 GMT
Set-cookie: CNNid=cf19472c-23286-956055314-2; expires=Wednesday, 30-Dec-2037 16:00:00 GMT; path=/; domain=.cnn.com
Last-modified: Tue, 18 Apr 2000 10:55:14 GMT
Content-type: text/html
<HTML>
<HEAD>
<TITLE>CNN.com</TITLE>
<META http-equiv="REFRESH" content="1800">
<!--CSSDATA:956055234-->
<SCRIPT src="/virtual/2000/code/main.js" language="javascript"></SCRIPT>
<LINK rel="stylesheet" href="/virtual/2000/style/main.css" type="text/css">
<SCRIPT language="javascript" type="text/javascript">
<!--// if ((navigator.platform==’MacPPC’)&&(navigator.ap
..............
..............
</BODY>
</HTML>
The above commands produce the front page of CNN's web site in raw HTML. This is easy to paste into a file and view off-line.

In the other window, tcpdump is showing us what packets are being exchanged. tcpdump nicely shows us host names instead of IP addresses and the letters www instead of the port number 80. The local "random" port in this case was 4064.
tcpdump \
    '( src 192.168.3.9 and dst 207.25.71.20 ) or ( src 207.25.71.20 and dst 192.168.3.9 )'
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
12:52:35.467121 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
S 2463192134:2463192134(0) win 32120 <mss 1460,sackOK,timestamp 154031689 0,nop,wscale 0
12:52:35.964703 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
S 4182178234:4182178234(0) ack 2463192135 win 10136 <nop,nop,timestamp 1075172823 154031
12:52:35.964791 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 1:1(0) ack 1 win 32120 <nop,nop,timestamp 154031739 1075172823> (DF)
12:52:46.413043 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
P 1:17(16) ack 1 win 32120 <nop,nop,timestamp 154032784 1075172823> (DF)
12:52:46.908156 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 1:1(0) ack 17 win 10136 <nop,nop,timestamp 1075173916 154032784>
12:52:49.259870 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
P 17:19(2) ack 1 win 32120 <nop,nop,timestamp 154033068 1075173916> (DF)
12:52:49.886846 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
P 1:278(277) ack 19 win 10136 <nop,nop,timestamp 1075174200 154033068>
12:52:49.887039 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 278 win 31856 <nop,nop,timestamp 154033131 1075174200> (DF)
12:52:50.053628 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 278:1176(898) ack 19 win 10136 <nop,nop,timestamp 1075174202 154033068>
12:52:50.160740 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
P 1176:1972(796) ack 19 win 10136 <nop,nop,timestamp 1075174202 154033068>
12:52:50.220067 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 1972 win 31856 <nop,nop,timestamp 154033165 1075174202> (DF)
12:52:50.824143 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 1972:3420(1448) ack 19 win 10136 <nop,nop,timestamp 1075174262 154033131>
12:52:51.021465 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 3420:4868(1448) ack 19 win 10136 <nop,nop,timestamp 1075174295 154033165>
..............
..............
12:53:13.856919 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 53204 win 30408 <nop,nop,timestamp 154035528 1075176560> (DF)
12:53:14.722584 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 53204:54652(1448) ack 19 win 10136 <nop,nop,timestamp 1075176659 154035528>
12:53:14.722738 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 54652 win 30408 <nop,nop,timestamp 154035615 1075176659> (DF)
12:53:14.912561 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 54652:56100(1448) ack 19 win 10136 <nop,nop,timestamp 1075176659 154035528>
12:53:14.912706 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 58500 win 30408 <nop,nop,timestamp 154035634 1075176659> (DF)
12:53:15.706463 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 58500:59948(1448) ack 19 win 10136 <nop,nop,timestamp 1075176765 154035634>
12:53:15.896639 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 59948:61396(1448) ack 19 win 10136 <nop,nop,timestamp 1075176765 154035634>
12:53:15.896791 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 61396 win 31856 <nop,nop,timestamp 154035732 1075176765> (DF)
12:53:16.678439 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 61396:62844(1448) ack 19 win 10136 <nop,nop,timestamp 1075176864 154035732>
12:53:16.867963 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 62844:64292(1448) ack 19 win 10136 <nop,nop,timestamp 1075176864 154035732>
12:53:16.868095 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 64292 win 31856 <nop,nop,timestamp 154035829 1075176864> (DF)
12:53:17.521019 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
FP 64292:65200(908) ack 19 win 10136 <nop,nop,timestamp 1075176960 154035829>
12:53:17.521154 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
. 19:19(0) ack 65201 win 31856 <nop,nop,timestamp 154035895 1075176960> (DF)
12:53:17.523243 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
F 19:19(0) ack 65201 win 31856 <nop,nop,timestamp 154035895 1075176960> (DF)
12:53:20.410092 eth0 > cericon.cranzgot.co.za.4064 > www1.cnn.com.www:
F 19:19(0) ack 65201 win 31856 <nop,nop,timestamp 154036184 1075176960> (DF)
12:53:20.940833 eth0 < www1.cnn.com.www > cericon.cranzgot.co.za.4064:
. 65201:65201(0) ack 20 win 10136 <nop,nop,timestamp 1075177315 154035895>
The preceding output requires some explanation: Lines 5, 7, and 9 are the negotiation stage. tcpdump uses the format <Sequence number>:<Sequence number + data length>(<data length>) on each line to show the context of the packet within the stream. The Sequence number, however, is chosen randomly at the outset, so tcpdump prints the relative sequence number after the first two packets to make it clearer what the actual position is within the stream. Line 11 is where I pressed Enter the first time, and Line 15 was Enter with an empty line. The "ack 19"s indicate the point to which CNN's web server has received incoming data; in this case we only ever typed in 19 bytes, hence the web server sets this value in every one of its outgoing packets, while our own outgoing packets are mostly empty of data. Lines 61 and 63 are the detachment stage. More information about the tcpdump output can be had from tcpdump(8) under the section TCP Packets.
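If you want to repeat this experiment and study the packets at leisure, tcpdump can also write its capture to a file and read it back later. The following is only a sketch (it assumes your Ethernet interface is eth0 and reuses the CNN address from above; substitute your own addresses):

tcpdump -i eth0 -w /tmp/http.pcap 'host 207.25.71.20 and port 80' &
telnet 207.25.71.20 80           # type GET / HTTP/1.0 and press Enter twice
kill %1                          # stop the background tcpdump
tcpdump -r /tmp/http.pcap | less # replay the saved packets as text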
You don't always need reliable communication. Sometimes you want to directly control packets for efficiency, or because you don't really mind if packets get lost. Two examples are name server communications, for which single packet transmissions are desired, or voice transmissions for which reducing lag time is more important than data integrity. Another is NFS (Network File System), which uses UDP to implement exclusively high bandwidth data transfer.
With UDP the programmer sends and receives individual packets, again encapsulated within IP. Ports are used in the same way as with TCP, but these are merely identifiers and there is no concept of a stream. The full UDP/IP header is listed in
Table 26.2.
Table 26.2 Combined UDP and IP header

Bytes (IP)            Description
0                     bits 0-3: Version, bits 4-7: Internet Header Length (IHL)
1                     Type of service (TOS)
2-3                   Length
4-5                   Identification
6-7                   bits 0-3: Flags, bits 4-15: Offset
8                     Time to live (TTL)
9                     Type
10-11                 Checksum
12-15                 Source IP address
16-19                 Destination IP address
20-(IHL * 4 - 1)      Options + padding to round up to four bytes

Bytes (UDP)           Description
0-1                   Source port
2-3                   Destination port
4-5                   Length
6-7                   Checksum

UDP data begins at IHL * 4 + 8 and ends at Length - 1.
Various standard port numbers are used exclusively for particular types of services.
Port 80 is for web as shown earlier. Port numbers 1 through 1023 are reserved for such standard services and each is given a convenient textual name.
All services are defined for both TCP as well as UDP, even though there is, for example, no such thing as UDP FTP access.
Port numbers below 1024 are used exclusively for root uid programs such as mail, DNS, and web services. Programs of ordinary users are not allowed to bind to ports below 1024. (Port binding is where a program reserves a port for listening for an incoming connection, as do all network services. Web servers, for example, bind to port 80.)
The place where these ports are defined is in the /etc/services file. These mappings are mostly for descriptive purposes—programs can look up port names from numbers and vice versa.
The /etc/services file has nothing to do with the availability of a service.
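As a quick aside (a sketch, not from the original text; it assumes the getent utility from the GNU C library is installed, which is usual), you can query these mappings from the command line:

getent services www        # look up the port number for the www service
grep -w 80 /etc/services   # find which service names mention port 80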
Here is an extract of the /etc/services file:

tcpmux          1/tcp                           # TCP port service multiplexer
echo            7/tcp
echo            7/udp
discard         9/tcp           sink null
discard         9/udp           sink null
systat          11/tcp          users
daytime         13/tcp
daytime         13/udp
netstat         15/tcp
qotd            17/tcp          quote
msp             18/tcp                          # message send protocol
msp             18/udp                          # message send protocol
ftp-data        20/tcp
ftp             21/tcp
fsp             21/udp          fspd
ssh             22/tcp                          # SSH Remote Login Protocol
ssh             22/udp                          # SSH Remote Login Protocol
telnet          23/tcp
smtp            25/tcp          mail
time            37/tcp          timserver
time            37/udp          timserver
rlp             39/udp          resource        # resource location
nameserver      42/tcp          name            # IEN 116
whois           43/tcp          nicname
domain          53/tcp          nameserver      # name-domain server
domain          53/udp          nameserver
mtp             57/tcp                          # deprecated
bootps          67/tcp                          # BOOTP server
bootps          67/udp
bootpc          68/tcp                          # BOOTP client
bootpc          68/udp
tftp            69/udp
gopher          70/tcp                          # Internet Gopher
gopher          70/udp
rje             77/tcp          netrjs
finger          79/tcp
www             80/tcp          http            # WorldWideWeb HTTP
www             80/udp                          # HyperText Transfer Protocol
The TCP stream can easily be reconstructed by anyone listening on a wire who happens to see your network traffic, so TCP is known as an inherently insecure service. We would like to encrypt our data so that anything captured between the client and server will appear garbled. Such an encrypted stream should have several properties:
1. It should ensure that the connecting client really is connecting to the server in question. In other words it should authenticate the server to ensure that the server is not a Trojan.
2. It should prevent any information being gained by a snooper. This means that any traffic read should appear cryptographically garbled.
3. It should be impossible for a listener to modify the traffic without detection.
The above is relatively easily accomplished with at least two packages. Take the example where we would like to use POP3 to retrieve mail from a remote machine.
First, we can verify that POP3 is working by logging in on the POP3 server. Run a telnet to port 110 (i.e., the POP3 service) as follows:
telnet localhost 110
Connected to localhost.localdomain.
Escape character is '^]'.
+OK POP3 localhost.localdomain v7.64 server ready
QUIT
+OK Sayonara
Connection closed by foreign host.
For our first example, we use the OpenSSH package. We can initialize and run the sshd Secure Shell daemon if it has not been initialized before. The following commands are run on the POP3 server machine:

ssh-keygen -b 1024 -f /etc/ssh/ssh_host_key -q -N ''
ssh-keygen -d -f /etc/ssh/ssh_host_dsa_key -q -N ''
sshd
To create an encrypted channel shown in Figure 26.1, we use the ssh client login program in a special way. We would like it to listen on a particular TCP port and then encrypt and forward all traffic to the remote TCP port on the server. This is known as (encrypted) port forwarding. On the client machine we choose an arbitrary unused port to listen on, in this case 12345:

ssh -C -c arcfour -N -n -2 -L 12345:<pop3-server.doma.in>:110 \
    <pop3-server.doma.in> -l <user> -v

where <user> is the name of a shell account on the POP3 server. Finally, also on the client machine, we run:
telnet localhost 12345
Connected to localhost.localdomain.
Escape character is '^]'.
+OK POP3 localhost.localdomain v7.64 server ready
QUIT
+OK Sayonara
Connection closed by foreign host.
Here we get results identical to those above, because, as far as the server is concerned, the POP3 connection comes from a client on the server machine itself, unknowing of the fact that it has originated from sshd , which in turn is forwarding from a remote ssh client. In addition, the -C option compresses all data (useful for low-speed connections). Also note that you should generally never use any encryption besides arcfour and SSH Protocol 2 (option -2 ).
[Figure 26.1 Forwarding between two machines: telnet localhost 12345 on the client connects to the local ssh process listening on port 12345, which forwards the traffic encrypted to sshd (port 22) on the POP3 server, which in turn connects to ipop3d on port 110.]

The second method is the forward program of the mirrordir package. It has a unique encryption protocol that does much of what OpenSSH can, although the protocol has not been validated by the community at large (and therefore should be used with caution). On the server machine you can just type secure-mcserv. On the client run:

forward <user>@<pop3-server.doma.in> <pop3-server.doma.in>:110 \
    12345 --secure -z -K 1024

and then run telnet localhost 12345 to test as before.
With forwarding enabled you can use any POP3 client as you normally would.
Be sure, though, to set your host and port addresses to localhost and 12345 within your POP3 client.
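If the POP3 client cannot connect, it is worth confirming that the local end of the tunnel is actually listening. A quick check (assuming the netstat utility is installed):

netstat -tln | grep 12345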
This example can, of course, be applied to almost any service. Some services will not work if they do special things like create reverse TCP connections back to the client (for example, FTP). Your luck may vary.
We know that each computer on the Internet has its own IP address. Although this address is sufficient to identify a computer for purposes of transmitting packets, it is not particularly accommodating to people. Also, if a computer were to be relocated, we would like to still identify it by the same name.
Hence, each computer is given a descriptive textual name. The basic textual name of a machine is called the unqualified host name (this is my own terminology) and is usually less than eight characters and contains only lowercase letters and numbers (and especially no dots). Groups of computers have a domain name. The full name of a machine is the unqualified host name followed by a dot and the domain name, and is called the fully qualified host name (standard terminology) or the qualified host name (my terminology).
For example, my computer is cericon. The domain name of my company is cranzgot.co.za, and hence the qualified host name of my computer is cericon.cranzgot.co.za, although the IP address might be 160.123.76.9.
Often the word domain is synonymous with domain name, and the host name on its own can mean either the qualified or unqualified host name.
This system of naming computers is called the Domain Name System (DNS). Domains always end in a standard set of things. Here is a complete list of things that the last bit of a domain can be.
.com  A U.S. or international company proper. In fact, any organization might have a .com domain.
.gov  A U.S. government organization.
.edu  A U.S. university.
.mil  A U.S. military department.
.int  An organization established by international treaties.
.org  A U.S. or nonprofit organization. In fact, anyone can have a .org domain.
.net  An Internet service provider (ISP). In fact, any bandwidth reseller, IT company, or any company at all might have a .net domain.
Besides the above, the domain could end in a two-letter country code. The complete list of country codes is given in Table 27.1. The .us domain is rarely used, since in the United States .com, .edu, .org, .mil, .gov, .int, or .net are mostly used.
Within each country, a domain may have things before it for better description.
Each country may implement a different structure. Some examples are:
.co.za   A South African company. ( za = Zuid Afrika, from Dutch.)
.org.za  A South African nonprofit organization.
.ac.za   A South African academic university.
.edu.au  An Australian tertiary educational institution.
.gov.za  A South African government organization.

Note that a South African company might choose a .com domain instead of a .co.za domain. The Internet has become more commercialized than organized, meaning that anyone can pretty much register any domain that is not already taken.
In practice, a user will type a host name (say, www.cranzgot.co.za) into some application like a web browser. The application then has to find the IP address associated with that name, in order to send packets to it. This section describes the query structure used on the Internet so that everyone can find out anyone else's IP address.
An obvious lookup infrastructure might involve distributing a long table of host name vs. IP numbers to every machine on the Internet. But as soon as you have more than a few thousand machines, this approach becomes impossible.
Table 27.1 ISO country codes

.af Afghanistan    .al Albania    .dz Algeria
.as American Samoa    .ad Andorra    .ao Angola
.ai Anguilla    .aq Antarctica    .ag Antigua and Barbuda
.ar Argentina    .am Armenia    .aw Aruba
.au Australia    .at Austria    .az Azerbaijan
.bs Bahamas    .bh Bahrain    .bd Bangladesh
.bb Barbados    .be Belgium    .bz Belize
.bj Benin    .bm Bermuda    .bt Bhutan
.bo Bolivia    .ba Bosnia Hercegovina    .bw Botswana
.bv Bouvet Island    .br Brazil    .io British Indian Ocean Territory
.bn Brunei Darussalam    .bg Bulgaria    .bf Burkina Faso
.bi Burundi    .by Belarus    .kh Cambodia
.cm Cameroon    .ca Canada    .cv Cape Verde
.ky Cayman Islands    .cf Central African Rep.    .td Chad
.cl Chile    .cn China    .cx Christmas Island
.cc Cocos (Keeling) Islands    .co Colombia    .km Comoros
.cg Congo    .ck Cook Islands    .cr Costa Rica
.ci Cote D'Ivoire    .hr Croatia    .cu Cuba
.cy Cyprus    .cz Czech Rep.    .cs Czechoslovakia
.dk Denmark    .dj Djibouti    .dm Dominica
.do Dominican Rep.    .tp East Timor    .ec Ecuador
.eg Egypt    .sv El Salvador    .gq Equatorial Guinea
.ee Estonia    .et Ethiopia    .fk Falkland Islands (Malvinas)
.fo Faroe Islands    .fj Fiji    .fi Finland
.fr France    .gf French Guiana    .pf French Polynesia
.tf French Southern Territories    .ga Gabon    .gm Gambia
.ge Georgia    .de Germany    .gh Ghana
.gi Gibraltar    .gr Greece    .gl Greenland
.gd Grenada    .gp Guadeloupe    .gu Guam
.gt Guatemala    .gn Guinea    .gw Guinea-Bissau
.gy Guyana    .ht Haiti    .hm Heard and McDonald Islands
.hn Honduras    .hk Hong Kong    .hu Hungary
.is Iceland    .in India    .id Indonesia
.ir Iran (Islamic Rep. of)    .iq Iraq    .ie Ireland
.il Israel    .it Italy    .jm Jamaica
.jp Japan    .jo Jordan    .kz Kazakhstan
.ke Kenya    .ki Kiribati    .kp Korea, Demo. People's Rep. of
.kr Korea, Rep. of    .kw Kuwait    .kg Kyrgyzstan
.la Lao People's Demo. Rep.    .lv Latvia    .lb Lebanon
.ls Lesotho    .lr Liberia    .ly Libyan Arab Jamahiriya
.li Liechtenstein    .lt Lithuania    .lu Luxembourg
.mo Macau    .mg Madagascar    .mw Malawi
.my Malaysia    .mv Maldives    .ml Mali
.mt Malta    .mh Marshall Islands    .mq Martinique
.mr Mauritania    .mu Mauritius    .mx Mexico
.fm Micronesia    .md Moldova, Rep. of    .mc Monaco
.mn Mongolia    .ms Montserrat    .ma Morocco
.mz Mozambique    .mm Myanmar    .na Namibia
.nr Nauru    .np Nepal    .nl Netherlands
.an Netherlands Antilles    .nt Neutral Zone    .nc New Caledonia
.nz New Zealand    .ni Nicaragua    .ne Niger
.ng Nigeria    .nu Niue    .nf Norfolk Island
.mp Northern Mariana Islands    .no Norway    .om Oman
.pk Pakistan    .pw Palau    .pa Panama
.pg Papua New Guinea    .py Paraguay    .pe Peru
.ph Philippines    .pn Pitcairn    .pl Poland
.pt Portugal    .pr Puerto Rico    .qa Qatar
.re Reunion    .ro Romania    .ru Russian Federation
.rw Rwanda    .sh St. Helena    .kn Saint Kitts and Nevis
.lc Saint Lucia    .pm St. Pierre and Miquelon    .vc St. Vincent and the Grenadines
.ws Samoa    .sm San Marino    .st Sao Tome and Principe
.sa Saudi Arabia    .sn Senegal    .sc Seychelles
.sl Sierra Leone    .sg Singapore    .sk Slovakia
.si Slovenia    .sb Solomon Islands    .so Somalia
.za South Africa    .es Spain    .lk Sri Lanka
.sd Sudan    .sr Suriname    .sj Svalbard and Jan Mayen Is.
.sz Swaziland    .se Sweden    .ch Switzerland
.sy Syrian Arab Rep.    .tw Taiwan, Province of China    .tj Tajikistan
.tz Tanzania, United Rep. of    .th Thailand    .tg Togo
.tk Tokelau    .to Tonga    .tt Trinidad and Tobago
.tn Tunisia    .tr Turkey    .tm Turkmenistan
.tc Turks and Caicos Islands    .tv Tuvalu    .ug Uganda
.ua Ukraine    .ae United Arab Emirates    .gb United Kingdom
.us United States    .um US Minor Outlying Islands    .uy Uruguay
.su USSR    .uz Uzbekistan    .vu Vanuatu
.va Vatican City State (Holy See)    .ve Venezuela    .vn Viet Nam
.vg Virgin Islands (British)    .vi Virgin Islands (U.S.)    .wf Wallis and Futuna Islands
.eh Western Sahara    .ye Yemen, Rep. of    .yu Yugoslavia
.zr Zaire    .zm Zambia    .zw Zimbabwe
Another imaginary infrastructure might have one huge computer on the Internet somewhere whose IP address is known by everyone. This computer would be responsible for servicing requests for IP numbers, and the said application running on your local machine would just query this big machine. Of course, with billions of machines out there, this approach will obviously create far too much network traffic. (Actually, some Microsoft LANs kind of work this way—that is, not very well.)

The DNS structure on the Internet actually works like this.
There are computers that service requests for IP numbers—millions of them. They are called name servers (or DNS servers), and a request is called a DNS lookup (or just a lookup). However, each name server only has information about a specific part of the Internet, and they constantly query each other.
There are 13 root name servers on the Internet. (This list can be gotten from ftp://ftp.rs.internic.net/domain/named.root.)

a.root-servers.net    198.41.0.4
b.root-servers.net    128.9.0.107
c.root-servers.net    192.33.4.12
d.root-servers.net    128.8.10.90
e.root-servers.net    192.203.230.10
f.root-servers.net    192.5.5.241
g.root-servers.net    192.112.36.4
h.root-servers.net    128.63.2.53
i.root-servers.net    192.36.148.17
j.root-servers.net    198.41.0.10
k.root-servers.net    193.0.14.129
l.root-servers.net    198.32.64.12
m.root-servers.net    202.12.27.33
Each country also has a name server, and in turn each organization has a name server. Each name server only has information about machines in its own domain, as well as information about other name servers. The root name servers only have information on the IP addresses of the name servers of .com, .edu, .za, etc. The .za name server only has information on the IP addresses of the name servers of .org.za, .ac.za, .co.za, etc. The .co.za name server only has information on the name servers of all South African companies, like .cranzgot.co.za, .icon.co.za, .mweb.co.za, etc. The .cranzgot.co.za name server only has info on the machines at Cranzgot Systems, like www.cranzgot.co.za.
Your own machine will have a name server defined in its configuration files, one that is geographically close to it. The responsibilities of this name server are to directly answer any queries about its own domain that it has information about, and to answer any other queries by querying as many other name servers on the Internet as is necessary.
Now our application is presented with www.cranzgot.co.za. The following sequence of lookups takes place to resolve this name into an IP address. This procedure is called host name resolution, and the algorithm that performs this operation is called the resolver.
1. The application checks certain special databases on the local machine. If it can get an answer directly from them, it proceeds no further.
2. The application looks up a geographically close name server from the local machine's configuration file. Let's say this machine is called ns.
3. The application queries ns with "www.cranzgot.co.za?".
4. ns determines whether that IP has been recently looked up. If it has, there is no need to ask further, since the result would be stored in a local cache.
5. ns checks whether the domain is local. That is, whether it is a computer about which it has direct information. In this case, this would only be true if ns were cranzgot.co.za's very own name server.
6. ns strips out the TLD (top level domain) .za. It queries a root name server, asking what name server is responsible for .za. The answer will be ucthpx.uct.ac.za of IP address 137.158.128.1.
7. ns strips out the next highest domain co.za. It queries 137.158.128.1, asking what name server is responsible for .co.za. The answer will be secdns1.posix.co.za of IP address 160.124.112.10.
8. ns strips out the next highest domain cranzgot.co.za. It queries 160.124.112.10, asking what name server is responsible for cranzgot.co.za. The answer will be pizza.cranzgot.co.za of IP address 196.28.123.1.
9. ns queries 196.28.123.1, asking for the IP address of www.cranzgot.co.za. The answer will be 160.123.176.1.
10. ns returns the result to the application.
11. ns stores each of these results in a local cache with an expiration date, to avoid having to look them up a second time.
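You can watch a resolution of this kind happen step by step with the dig utility covered in Section 27.8. This is only a sketch—the +trace option is present in newer versions of dig, and the exact output depends on your name servers:

dig +trace www.cranzgot.co.za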
We referred to "configuration files" above. These are actually the files /etc/host.conf, /etc/hosts, and /etc/resolv.conf. These are the three and only files that specify how all applications are going to look up IP numbers; they have nothing to do with the configuration files of the name server daemon itself, even though a name server daemon might be running on the local machine.
When an application needs to look up a host name, it goes through the following procedure. (What is actually happening is that the application is making a C library call to the function gethostbyname(), hence all these configuration files really belong to the C library packages glibc or libc. However, this is a detail you need not be concerned about.)
The following are equivalent to steps 1, 2, and 3 above, with the details of the configuration files filled in. The configuration files that follow are taken from an actual installation.
1. The application checks the file /etc/host.conf. This file will usually have a line order hosts,bind in it, specifying that it should first ( hosts ) check the local database file /etc/hosts, and then ( bind ) query the name server specified in /etc/resolv.conf. The file /etc/hosts contains a plain text list of IP addresses and names. An example is given below. If the application can get an answer directly from /etc/hosts, it proceeds no further.
2. The application checks in the file /etc/resolv.conf for a line nameserver <nameserver>. There can actually be three of these lines so that if one name server fails, the application can try the next in turn.
3. The application sends to the name server a query with the host name. If the host name is unqualified, then the application, before trying the query, appends to the host name a local domain name. A line search <domain1> <domain2> ... <domainN> may appear in the configuration file to facilitate this. A query is made with each of <domain1>, <domain2>, etc., appended in turn until the query successfully returns an IP. This just saves you having to type in the full host name for computers within your own organization.
4. The name server proceeds with the hierarchical queries described from step 4 onward.
The /etc/hosts file should look something like this:

127.0.0.1      localhost.localdomain      localhost
192.168.3.9    cericon.cranzgot.co.za     cericon
192.168.3.10   pepper.cranzgot.co.za      pepper
192.168.2.1    onion.cranzgot.co.za       onion
The hosts pepper , cericon , and onion are the hosts that this machine has the most communication with, and hence are listed here.
cericon is the local machine and must be listed. You can list any hosts to which you want fast lookups, or hosts that might need to be known in spite of name servers being down.
The /etc/host.conf might look like this. All of the lines are optional:

order hosts, bind, nis
trim some.domain
spoofalert
nospoof
multi on
reorder
order  The order in which lookups are done. Don't try fiddling with this value. It never seems to have any effect. You should leave it as order hosts,bind (or order hosts,bind,nis if you are using NIS—search for the NIS HOWTO on the web). Once again, bind means to then go and check the /etc/resolv.conf file, which holds the name server query options.
trim  Strip the domain some.domain from the end of a host name before trying a lookup. You will probably never require this feature.

spoofalert  Try reverse lookups on a host name after looking up the IP (i.e., do a query to find the name from the IP). If this query does not return the correct result, it could mean that some machine is trying to make it look like it is someone it really isn't. This is a hacker's trick called spoofing. spoofalert warns you of such attempts in your log file /var/log/messages.

nospoof  Disallow results that fail the spoof test.

multi on  Return more than one result if there are aliases. Actually, a host can have several IP numbers, and an IP number can have several host names. Consider a computer that might want more than one name (ftp.cranzgot.co.za and www.cranzgot.co.za are the same machine), or a machine that has several networking cards and an IP address for each. This option should always be turned on. multi off is the alternative. Most applications use only the first value returned.

reorder  If more than one IP is returned by a lookup, then sort that list according to the IP that has the most convenient network route.
Despite this array of options, an /etc/host.conf file almost always looks simply like:

order hosts, bind
multi on
The /etc/resolv.conf file could look something like this:

nameserver 192.168.2.1
nameserver 160.123.76.1
nameserver 196.41.0.131
search cranzgot.co.za ct.cranzgot.co.za uct.ac.za
sortlist 192.168.3.0/255.255.255.0 192.168.2.0/255.255.255.0
options ndots:1 timeout:30 attempts:2 rotate no-check-names inet6
nameserver  Specifies a name server to query. No more than three may be listed. The point of having more than one is to safeguard against a name server being down; the next in the list will then be queried.

search  If given a host name with less than ndots dots (i.e., 1 in this case), add each of the domains in turn to the host name, trying a lookup with each. This option allows you to type in an unqualified host name and have the application work out what organization it belongs to from the search list. You can have up to six domains, but then queries would be time consuming.

domain  The line "domain ct.cranzgot.co.za" is the same as "search ct.cranzgot.co.za cranzgot.co.za co.za". Always use search explicitly instead of domain to reduce the number of queries to a minimum.

sortlist  If more than one host is returned, sort them according to the given network/mask pairs.

options  Various additional parameters can be specified in this one line:

  ndots  Explained under search above. The default is 1.

  timeout  How long to wait before considering a query to have failed. The default is 30 seconds.

  attempts  Number of attempts to make before failing. The default is 2. This means that a down name server will cause your application to wait 1 full minute before deciding that it can't resolve the IP.

  rotate  Try the name servers in round robin fashion. This distributes load across name servers.

  no-check-names  Don't check for invalid characters in host names.

  inet6  The man page for resolv.conf (resolver(5)) says:

    inet6 sets RES_USE_INET6 in _res.options. This has the effect of trying
    a AAAA query before an A query inside the gethostbyname function, and of
    mapping IPv4 responses in IPv6 "tunnelled form" if no AAAA records are
    found but an A record set exists.

An AAAA record is a 128-bit "next generation," or "IPv6," Internet address.
Despite this array of options, an /etc/resolv.conf file almost always looks simply like:

nameserver 192.168.2.254
search cranzgot.co.za
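Having edited these files, you can check that lookups behave as expected. A quick sketch (getent consults the same C library resolver routines that ordinary applications use):

getent hosts pepper              # should be answered from /etc/hosts
getent hosts www.cranzgot.co.za  # should be answered by the name server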
A reverse lookup, mentioned under nospoof, is the determining of the host name from the IP address. The course of queries is similar to forward lookups, using part of the IP address to find out what machines are responsible for what ranges of IP address. A forward lookup is an ordinary lookup of the IP address from the host name.
I have emphasized that name servers only hold information for their own domains. Any other information they may have about another domain is cached, temporary data that has an expiration date attached to it.

The domain that a name server has information about is said to be the domain that a name server is authoritative for. Alternatively we say: "a name server is authoritative for the domain." For instance, the server ns2.cranzgot.co.za is authoritative for the domain cranzgot.co.za. Hence, lookups from anywhere on the Internet having the domain cranzgot.co.za are ultimately the responsibility of ns2.cranzgot.co.za, and originate (albeit through a long series of caches) from the host ns2.cranzgot.co.za.
The command host looks up a host name or an IP address by doing a name server query. Try

host www.cnn.com

for an example of a host with lots of IP addresses. Keep typing host www.cnn.com over and over. Notice that the order of the hosts keeps changing randomly. This reordering distributes load among the many cnn.com servers.

Now, pick one of the IP addresses and type

host <ip-address>

This command will return the host name cnn.com.

Note that the host command is not available on all UNIX systems.
The ping command has nothing directly to do with DNS but is a quick way of getting an IP address and at the same time checking whether a host is responding. It is often used as the acid test for network and DNS connectivity. See Section 25.10.1.
Now enter:

whois cnn.com@rs.internic.net

(Note that the original BSD whois worked like whois -h <host> <user>.) You will get a response like this:

[rs.internic.net]
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be registered with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: CNN.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: NS-01A.ANS.NET
Name Server: NS-01B.ANS.NET
Name Server: NS-02A.ANS.NET
Name Server: NS-02B.ANS.NET
Updated Date: 22-sep-1999
>>> Last update of whois database: Thu, 20 Jan 00 01:39:07 EST <<<
The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
(Internic happens to have this database of .com, .net, .org, and .edu domains.)
nslookup is a program to interactively query a name server. If you run

nslookup

you will get a > prompt at which you can type commands. If you type in a host name, nslookup will return its IP address(es), and vice versa. Also, typing

help

at any time will return a complete list of commands. By default, nslookup uses the first name server listed in /etc/resolv.conf for all its queries. However, the command

server <nameserver>

will force nslookup to connect to a name server of your choice.
The word record refers to a piece of DNS information.
Now enter the command:

set type=NS

This tells nslookup to return the second type of information that a DNS can deliver: the authoritative name server for a domain, or the NS record of the domain. You can enter any domain here. For instance, if you enter

set type=NS
cnn.com

nslookup returns:
Non-authoritative answer:
cnn.com nameserver = NS-02B.ANS.NET
cnn.com nameserver = NS-02A.ANS.NET
cnn.com nameserver = NS-01B.ANS.NET
cnn.com nameserver = NS-01A.ANS.NET

Authoritative answers can be found from:
NS-02B.ANS.NET  internet address = 207.24.245.178
NS-02A.ANS.NET  internet address = 207.24.245.179
NS-01B.ANS.NET  internet address = 199.221.47.8
NS-01A.ANS.NET  internet address = 199.221.47.7

This output tells us that four name servers are authoritative for the domain cnn.com (one plus three backups). It also tells us that it did not get this answer from an authoritative source, but through a cached source. It also tells us what name servers are authoritative for this very information.
Now, switch to a name server that is authoritative for cnn.com:

server NS-02B.ANS.NET

and run the same query:

cnn.com

The new result is somewhat more emphatic, but no different.

There are only a few other kinds of records that you can get from a name server. Try
set type=MX
cnn.com

to get the so-called MX record for that domain. The MX record is the server responsible for handling mail destined to that domain. MX records also have a priority (usually 10 or 20). This tells any mail server to try the 20 one should the 10 one fail, and so on. There are usually only one or two MX records. Mail is actually the only Internet service handled by DNS. (For instance, there is no such thing as a NEWSX record for news, or a WX record for web pages, whatever kind of information we may like such records to hold.)
Also try

set type=PTR
<ip-address>
set type=A
<hostname>
set type=CNAME
<hostname>

So-called PTR records are reverse lookups, or PoinTeRs to host names. So-called A records are forward lookups, or Address lookups—the default type of lookup when you first invoke nslookup, and the type of lookup the first half of this chapter was most concerned with. So-called CNAME records are lookups of Canonical NAMEs. DNS allows you to alias a computer to many different names, even though each has one real name (called the canonical name). A CNAME lookup returns the machine name proper.
dig stands for domain information groper. It sends single requests to a DNS server for testing or scripting purposes (it is similar to nslookup, but non-interactive). It is usually used like

dig @<server> <domain> <query-type>

where <server> is the machine running the DNS daemon to query, <domain> is the domain of interest, and <query-type> is one of A, ANY, MX, NS, SOA, HINFO, or AXFR—of these, you can read about the non-obvious ones in dig(1). dig can also be used to test an Internet connection. See Section 20.7.4.

Useful is the AXFR record. For instance

dig @dns.dial-up.net icon.co.za AXFR

lists the entire domain of one of our local ISPs.
This chapter covers NFS, the file-sharing capabilities of UNIX, and describes how to set up directories shareable to other UNIX machines.
As soon as one thinks of high-speed Ethernet, the logical possibility of sharing a file system across a network comes to mind. MS-DOS, OS/2, Apple Macintosh, and Windows have their own file-sharing schemes (IPX, SMB, etc.), and NFS is the UNIX equivalent.

Consider your hard drive with its 10,000 or so files. Ethernet is fast enough that you should be able to entirely use the hard drive of another machine, transferring needed data as network packets as required; or you should be able to make a directory tree visible to several computers. Doing this efficiently is a complex task. NFS is a standard, a protocol, and (on LINUX) a software suite that accomplishes this task in an efficient manner. It is really easy to configure as well. Unlike some other sharing protocols, NFS merely shares files and does not facilitate printing or messaging.
Depending on your distribution, the following programs may be located in any of the bin or sbin directories. These are all daemon processes. To get NFS working, they should be started in the order given here.
portmap (also sometimes called rpc.portmap)  This maps service names to ports. Client and server processes may request a TCP port number based on a service name, and your portmap handles these requests. It is basically a network version of the /etc/services file.
rpc.mountd (also sometimes called mountd)  This handles the initial incoming request from a client to mount a file system and checks that the request is allowable.

rpc.nfsd (also sometimes called nfsd)  This is the core—the file-server program itself.

rpc.lockd (also sometimes called lockd)  This handles shared locks between different machines on the same file over the network.
The acronym RPC stands for Remote Procedure Call. RPC was developed along with NFS by Sun Microsystems. It is an efficient way for a program to call a function on another machine and can be used by any service that needs to have efficient distributed processing. These days, it's not really used for much except NFS, having been superseded by technologies like CORBA (the "Object-Oriented" version of RPC). You can, however, still write distributed applications with LINUX's RPC implementation.
Sharing a directory with a remote machine requires that forward and reverse DNS lookups be working for the server machine as well as all client machines. DNS is covered in Chapter 27 and Chapter 40. If you are just testing NFS and you are sharing directories to your local machine (which we do now), you may find NFS to still work without a proper DNS setup. You should at least have proper entries in your /etc/hosts file for your local machine (see page 278).
The first step is deciding on the directory you would like to share. A useful trick is to share your CD-ROM to your whole LAN. This is perfectly safe considering that CDs are read-only. Create an /etc/exports file with the following in it:

/mnt/cdrom   192.168.1.0/24(ro)   localhost(ro)
You can immediately see that the format of the /etc/exports file is simply a line for each shareable directory. Next to each directory name goes a list of hosts that are allowed to connect. In this case, those allowed access are all IP addresses having the upper 24 bits matching 192.168.1
, as well as the localhost .
Next, mount your CD-ROM as usual with

mkdir -p /mnt/cdrom
mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom
Now start each of the NFS processes in sequence:
portmap
rpc.mountd
rpc.nfsd
rpc.lockd
Whenever you make changes to your running /etc/exports file you should also follow by

exportfs -r

which causes a rereading of the /etc/exports file. Entering the exportfs command with no options should then show

/mnt/cdrom      192.168.1.0/24
/mnt/cdrom      localhost.localdomain

which lists directories and hosts allowed to access them.
It is useful to test mounts from your local machine before testing from a remote machine. Here we perform the NFS mounting operation proper:

mkdir /mnt/nfs
mount -t nfs localhost:/mnt/cdrom /mnt/nfs
You can see that the mount command sees the remote machine's directory as a "device" of sorts, although the type is nfs instead of ext2, vfat, or iso9660. The remote host name is followed by a colon followed by the directory on that remote machine relative to the root directory. This syntax is unlike that for other kinds of services that name all files relative to some "top level" directory (e.g., FTP and web servers). The acid test now is to run ls on the /mnt/nfs directory to verify that its contents are indeed the same as /mnt/cdrom. Supposing our server is called cdromserver, we can repeat the test from a remote machine:

mkdir /mnt/nfs
mount -t nfs cdromserver:/mnt/cdrom /mnt/nfs
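Before mounting from a remote machine it is also handy to ask the server what it exports. A quick sketch (the showmount utility comes with the NFS support packages on most distributions):

showmount -e cdromserver    # list the server's export list
umount /mnt/nfs             # detach the share again when you are done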
If anything went wrong, you might like to search your process list for all processes with an rpc , mount , nfs , or portmap in them. Completely stopping NFS means clearing all of these processes (if you really want to start from scratch). It is useful to also keep
tail -f /var/log/messages
tail -f /var/log/syslog

running in a separate console to watch for any error (or success) messages (actually true of any configuration you are doing). Note that it is not always obvious that NFS
is failing because of a forward or reverse DNS lookup, so double-check beforehand that these are working—mount will not usually be more eloquent than the classic NFS error message: "mount: <xyz> failed, reason given by server: Permission denied." A faulty DNS is also indicated by whole-minute pauses in operation.
Most distributions will not require you to manually start and stop the daemon processes above. Like most services, RedHat's NFS implementation can be invoked simply with:

/etc/init.d/nfs start
/etc/init.d/nfslock start

(or /etc/rc.d/init.d/). On Debian, similarly,

/etc/init.d/nfs-common start
/etc/init.d/nfs-kernel-server start
Above, we used 192.168.1.0/24(ro) to specify that we want to give read-only access to a range of IP addresses. You can actually put host names with wildcards also; for example:

/mnt/cdrom      *.mynet.mydomain.co.za(ro)

Then also allow read-write access with, say:

/home           *.mynet.mydomain.co.za(rw)

One further option, no_root_squash, disables NFS's special treatment of root-owned files. This option is useful if you are finding certain files strangely inaccessible. no_root_squash is really only for systems (like diskless workstations) that need full root access to a file system. An example is:

*.very.trusted.net(rw,no_root_squash)
The man page for /etc/exports, exports(5), contains an exhaustive list of options.
NFS requires that a number of services be running that have no use anywhere else. Many naive administrators create directory exports with impunity, thus exposing those machines to opportunistic hackers. An NFS server should be well hidden behind a firewall, and any Internet server exposed to the Internet should never run the portmap or RPC services. Preferably uninstall all of these services if you are not actually running an NFS server.
There are actually two versions of the NFS implementation for LINUX. Although this is a technical caveat, it is worth understanding that the NFS server was originally implemented by an ordinary daemon process before the LINUX kernel itself supported NFS. Debian supports both implementations in two packages, nfs-server and nfs-kernel-server, although the configuration should be identical. Depending on the versions of these implementations and the performance you require, one or the other may be better. You are advised to at least check the status of the kernel NFS implementation on the kernel web pages. Of course, NFS as a client must necessarily be supported by the kernel as a regular file system type in order to be able to mount anything.
There are some hundred odd services that a common LINUX distribution supports. For all of these to be running simultaneously would be a strain. Hence, a special daemon process watches for incoming TCP connections and then starts the relevant executable, saving that executable from having to run all the time. This is used only for sparsely used services—that is, not web, mail, or DNS.
The daemon that performs this function is traditionally called inetd : the subject of this chapter.
(Section 36.1 contains an example of writing your own network service in shell script to run under inetd .)
Which package contains inetd depends on the taste of your distribution. Indeed, under RedHat, version 7.0 switched to xinetd, a move that departs radically from the traditional UNIX inetd. xinetd is discussed below. The important inetd files are the configuration file /etc/inetd.conf, the executable /usr/sbin/inetd, the inetd and inetd.conf man pages, and the startup script /etc/init.d/inet (or /etc/rc.d/init.d/inetd or /etc/init.d/inetd). Another important file is /etc/services, discussed in Section 26.4.
Most services can be started in one of three ways: first as a standalone (resource hungry, as discussed) daemon; second, under inetd; or third as an inetd service which is "TCP wrapper"-moderated. However, some services will run using only one method.
Here, we will give an example showing all three methods. You will need to have an ftp package installed for this example (either wuftpd on RedHat or ftpd on Debian).
Try the following (alternative commands in parentheses):

/usr/sbin/in.ftpd -D
( /usr/sbin/in.wuftpd -s )

The -D option instructs the service to start in Daemon mode (or standalone mode). This represents the first way of running an Internet service.
With this method we can let inetd run the service for us. Edit your /etc/inetd.conf file and add or edit the line (alternatives in parentheses):

ftp     stream  tcp  nowait  root  /usr/sbin/in.ftpd    in.ftpd
( ftp     stream  tcp  nowait  root  /usr/sbin/in.wuftpd  in.wuftpd )
Then, restart the inetd service with

/etc/init.d/inet restart
( killall -1 inetd )
( /etc/rc.d/init.d/inet restart )

and then test with:

ps awx | grep ftp
ftp localhost
The fields in the /etc/inetd.conf file have the following meanings:

ftp  The name of the service. Looking in the /etc/services file, we can see that this is TCP port 21.

stream tcp  Socket type and protocol. In this case, a TCP stream socket, and hardly ever anything else.

nowait  Do not wait for the process to exit before listening for a further incoming connection. Compare to wait and respawn in Chapter 32.

root  The initial user ID under which the service must run.

/usr/sbin/in.ftpd ( /usr/sbin/in.wuftpd )  The actual executable.

in.ftpd  The command-line. In this case, just the program name and no options.
With this last method we let inetd run the service for us under the tcpd wrapper command. This is almost the same as before, but with a slight change in the /etc/inetd.conf entry:

ftp     stream  tcp  nowait  root  /usr/sbin/tcpd  /usr/sbin/in.ftpd
( ftp     stream  tcp  nowait  root  /usr/sbin/tcpd  /usr/sbin/in.wuftpd )

Then, restart the inetd service as before. These alternative lines allow tcpd to invoke in.ftpd (or in.wuftpd) on inetd's behalf. The tcpd command does various tests on the incoming connection to decide whether it should be trusted. tcpd checks what host the connection originates from and compares that host against entries in the files /etc/hosts.allow and /etc/hosts.deny. It can refuse connections from selected hosts, thus giving you finer access control to services.
Consider the preceding /etc/inetd.conf entry against the following line in your /etc/hosts.allow file:

in.ftpd: LOCAL, .my.domain
( in.wuftpd: LOCAL, .my.domain )

as well as the following line in the file /etc/hosts.deny:

in.ftpd: ALL
( in.wuftpd: ALL )
This example will deny connections from all machines with host names not ending in .my.domain but allow connections from the local machine (the same machine on which inetd is running). It is useful at this point to try to make an ftp connection from different machines to test access control. A complete explanation of the /etc/hosts.allow and /etc/hosts.deny file format can be obtained from hosts_access(5). Another example is (/etc/hosts.deny):

ALL: .snake.oil.com, 146.168.160.0/255.255.240.0

which would deny access for ALL services to all machines inside the 146.168.160.0 (first 20 bits) network, as well as all machines under the snake.oil.com domain.
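You can also check your wrapper rules without making real connections. A sketch (the tcpdchk and tcpdmatch utilities ship with the TCP wrappers package on most systems; the host name below is only an example):

tcpdchk                                # report mistakes in hosts.allow and hosts.deny
tcpdmatch in.ftpd client.my.domain     # predict what tcpd would do with this client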
Note that the above methods cannot be used simultaneously. If a service is already running one way, trying to start it another way will fail, possibly with a “port in use” error message.
whether to make the service an
Your distribution would have already decided inetd entry or a standalone daemon. In the former case, a line in /etc/inetd.conf
/etc/init.d/<service> (or will be present; in the latter case, a script
/etc/rc.d/init.d/<service> ) will be present to start or stop but there will be the daemon. Typically, there will be no
/etc/init.d/httpd and
/etc/init.d/ftpd
/etc/init.d/named script, scripts. Note that there will
always
be a /etc/init.d/inet script.
All these services are potential security holes. Don’t take chances: disable them all by commenting out all lines in
/etc/inetd.conf
.
A typical /etc/inetd.conf file (without the comment lines) looks something like:

ftp      stream  tcp  nowait  root        /usr/sbin/tcpd       in.ftpd -l -a
telnet   stream  tcp  nowait  root        /usr/sbin/tcpd       in.telnetd
shell    stream  tcp  nowait  root        /usr/sbin/tcpd       in.rshd
login    stream  tcp  nowait  root        /usr/sbin/tcpd       in.rlogind
talk     dgram   udp  wait    nobody.tty  /usr/sbin/tcpd       in.talkd
ntalk    dgram   udp  wait    nobody.tty  /usr/sbin/tcpd       in.ntalkd
pop-3    stream  tcp  nowait  root        /usr/sbin/tcpd       ipop3d
imap     stream  tcp  nowait  root        /usr/sbin/tcpd       imapd
uucp     stream  tcp  nowait  uucp        /usr/sbin/tcpd       /usr/sbin/uucico -l
tftp     dgram   udp  wait    root        /usr/sbin/tcpd       in.tftpd
bootps   dgram   udp  wait    root        /usr/sbin/tcpd       bootpd
finger   stream  tcp  nowait  nobody      /usr/sbin/tcpd       in.fingerd
auth     stream  tcp  wait    root        /usr/sbin/in.identd  in.identd -e -o
The above services have the following purposes (port numbers in parentheses):
ftp (21)
File Transfer Protocol, as shown above.
telnet (23)
Telnet login access.
shell (514)
rsh Remote shell script execution service.
login (513)
rlogin Remote login service.
talk (517), ntalk
User communication gimmick.
pop-3 (110)
Post Office Protocol mail retrieval service—how most people get their mail through their ISP.
imap (143)
Internet Mail Access Protocol—a more sophisticated and dangerously insecure version of POP.
uucp (540)
Unix-to-Unix copy operating over TCP.
tftp (69)
Trivial FTP service used, for example, by diskless workstations to retrieve a kernel image.
bootpd (67)
BOOTP IP configuration service for LANs that require automatic IP assignment.
finger (79)
User lookup service.
auth (113)
A service that determines the owner of a particular TCP connection. If you run a machine with lots of users, administrators of other machines can see which users are connecting to them from your machine. For tracking purposes, some
IRC and FTP servers require that a connecting client run this service. Disable this service if your box does not support shell logins for many users.
Instead of the usual inetd + tcpd combination, RedHat switched to the xinetd package as of version 7.0. The xinetd package combines the features of tcpd and inetd into one neat package. The xinetd package consists of a top-level config file, /etc/xinetd.conf; an executable, /usr/sbin/xinetd; and then a config file for each service under the directory /etc/xinetd.d/. This arrangement allows a package like ftpd control over its own configuration through its own separate file.
The default top-level config file, /etc/xinetd.conf, looks simply like this:

defaults
{
        instances       = 60
        log_type        = SYSLOG authpriv
        log_on_success  = HOST PID
        log_on_failure  = HOST RECORD
}

includedir /etc/xinetd.d

The file dictates, respectively, that xinetd does the following: limits the number of simultaneous connections of each service to 60; logs to the syslog facility, using syslog's authpriv channel; logs the HOST and Process ID for each successful connection; and logs the HOST and RECORD (information about the connection attempt) for each failed connection. In other words, /etc/xinetd.conf really says nothing interesting at all.
The last line says to look in /etc/xinetd.d/ for more (service-specific) files. Our FTP service would have the file /etc/xinetd.d/wu-ftpd containing:

service ftp
{
        socket_type     = stream
        server          = /usr/sbin/in.ftpd
        server_args     = -l -a
        wait            = no
        user            = root
        log_on_success  += DURATION USERID
        log_on_failure  += USERID
        nice            = 10
}
This file is similar to our /etc/inetd.conf line above, albeit more verbose. Respectively, this file dictates these actions: listen with a stream TCP socket; run the executable /usr/sbin/in.ftpd on a successful incoming connection; pass the arguments -l -a on the command-line to in.ftpd (see ftpd(8)); never wait for in.ftpd to exit before accepting the next incoming connection; run in.ftpd as user root; additionally log the DURATION and USERID of successful connections; additionally log the USERID of failed connections; and be nice to the CPU by running in.ftpd at a priority of 10.
The security options of xinetd allow much flexibility. Most important is the only_from option to limit the remote hosts allowed to use a service. The most extreme use is to add only_from 127.0.0.1 to the top-level config file:

defaults
{
        only_from       = 127.0.0.1 mymachine.local.domain
        .
        .
        .
which allows no remote machines to use any xinetd service at all. Alternatively, you can add an only_from line to any of the files in /etc/xinetd.d/ to restrict access on a per-service basis.
only_from can also take IP address ranges of the form nnn.nnn.nnn.nnn/bits, as well as domain names. For example,

only_from = 127.0.0.1 192.168.128.0/17 .somewhere.friendly.com

which in the last case allows access from all machines with host names ending in .somewhere.friendly.com.
Finally there is the no_access option that works identically to only_from, dictating hosts and IP ranges from which connections are not allowed:

no_access = .snake.oil.net
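After changing the top-level file or any file under /etc/xinetd.d/, xinetd must be told to reconfigure itself. A quick sketch (on RedHat the init script below exists; xinetd also rereads its configuration when sent a USR2 signal):

/etc/rc.d/init.d/xinetd restart
( killall -USR2 xinetd )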
It may be thought that using /etc/hosts.deny (or only_from =) to deny access to all remote machines should be enough to secure a system. This is not true: even a local user being able to access a local service is a potential security hole, since the service usually has higher privileges than the user. It is best to remove all services that are not absolutely necessary. For Internet machines, do not hesitate to hash out every last service or even uninstall inetd (or xinetd) entirely. See also Chapter 44.
This chapter effectively explains how to get LINUX up and running as a mail server. I have also included discussion on the process of mail delivery right through to retrieval of mail using POP and IMAP.
exim and sendmail are MTAs (mail transfer agents). An MTA is just a daemon process that listens on port 25 for incoming mail connections, spools that mail in a queue (see page 197 about spooling in general; for exim, the /var/spool/exim/input/ directory, for sendmail, the /var/spool/mqueue/ directory), then resends that mail to some other MTA or delivers it locally to some user's mailbox. In other words, the MTA is the very package that handles all mail spooling, routing, and delivery. We saw in Section 10.2 how to manually connect to an MTA with telnet. In that example, sendmail version 8.9.3 was the MTA running on machine mail.cranzgot.co.za.

sendmail is the original and popular UNIX MTA. It is probably necessary to learn how to configure it because so many organizations standardize on it. However, because exim is so easy to configure, it is worthwhile replacing sendmail wherever you see it—there are at least three MTAs that are preferable to sendmail. I explain the minimum of what you need to know about sendmail later on and explain exim in detail.
Before we get into MTA configuration, a background in mail delivery and indexii MX recordDNSMX record handling is necessary. The sequence of events whereby a mail
299
30.1. Introduction 30.
exim
and
sendmail
message (sent by a typical interactive mail client) ends up on a distant user’s personal workstation is as follows:
1. A user configures his mail client (Outlook Express, Netscape, etc.) to use a particular SMTP host (for outgoing mail, also called the SMTP gateway) and a POP host (or IMAP host) for incoming mail.

2. The user composes a message to, say, rrabbit@toonland.net, and then clicks on "Send."

3. The mail client initiates an outgoing TCP connection to port 25 of the SMTP host. An MTA running on the SMTP host and listening on port 25 services the request. The mail client uses the SMTP protocol exactly as in Section 10.2. It fills in rrabbit@toonland.net as the recipient address and transfers a properly composed header (hopefully) and message body to the MTA. The mail client then terminates the connection and reports any errors.
4. The MTA queues the message as a spool file, periodically considering whether to process the message further according to a retry schedule.

5. Should the retry schedule permit, the MTA considers the recipient address rrabbit@toonland.net. It strips out the domain part of the email address, that is, everything after the @. It then performs a DNS MX query (an MX record lookup) for the domain toonland.net. DNS resolution for toonland.net follows the procedure listed in Section 27.2.2. In short, this means (approximately) that it looks for the name server that is authoritative for the domain toonland.net. It queries that name server for the MX record of the domain toonland.net. The name server returns a host name, say, mail.toonland.net, with a corresponding IP address, say, 197.21.135.82. (Section 27.7.1 shows you how you can manually look up the MX record. Chapter 40 shows you how to set up your name server to return such an MX record.)
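As a quick aside (not from the original text): you can do the same MX lookup by hand from the shell. A sketch, using the fictitious domain of this example (real output will obviously differ):

    host -t mx toonland.net
    # or equivalently:
    dig toonland.net mx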
6. The MTA makes an SMTP connection to port 25 of 197.21.135.82. Another MTA running on mail.toonland.net services the request. A recipient address, message header, and message body are transferred using the SMTP protocol. The MTA then terminates the connection.

7. The MTA running on mail.toonland.net considers the recipient address rrabbit@toonland.net. It recognizes toonland.net as a domain for which it hosts mail (that is, as its own local domain). It recognizes rrabbit as a user name within its /etc/passwd file.

8. The MTA running on mail.toonland.net appends the message to the user's personal mailbox file, say, /var/spool/mail/rrabbit or /home/rrabbit/Maildir/.

The delivery is now complete. How the email gets from the mailbox on mail.toonland.net to Mr Rabbit's personal workstation is not the responsibility of the MTA and does not happen through SMTP.
9. Mr Rabbit would have configured his mail client (running on his personal workstation) to use mail.toonland.net as the POP/IMAP host for incoming mail. mail.toonland.net runs a POP or IMAP service on port 110 or 143, respectively.

10. Mr Rabbit's mail client makes a TCP connection to port 110 (or 143) and communicates using the POP or IMAP protocol. The POP or IMAP service is responsible for feeding the message to the mail client and deleting it from the mailbox file.

11. Mr Rabbit's mail client stores the message on his workstation using its own methods and displays the message as a "new" message.

POP and IMAP services are invoked by inetd or xinetd (see Chapter 29). Except for limiting the range of clients that are allowed to connect (for security reasons), no configuration is required. Client connections authenticate themselves using the normal UNIX login name and password. There are specialized POP and IMAP packages for supporting different mailbox types (like Maildir).
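You can speak POP3 by hand much as Section 10.2 did for SMTP; this is only a sketch (mail.toonland.net and rrabbit are the fictitious host and user from the example above, and the exact greeting will differ):

    telnet mail.toonland.net 110
    USER rrabbit
    PASS mypassword
    LIST
    RETR 1
    QUIT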
The exim home page http://www.exim.org/ gives you a full rundown. Here I will just say that exim is the simplest MTA to configure. Moreover, its configuration file works the same way you imagine mail to work. It's really easy to customize the exim configuration to do some really weird things. The whole package fits together cleanly, logically, and intuitively. This is in contrast to sendmail's sendmail.cf file, which is widely considered to be extremely cryptic and impractical. exim also seems to have been written with proper security considerations, although many people argue that postfix and qmail are the last word in secure mail.

You can get exim as a .rpm or .deb file. After installation, the file /usr/share/doc/exim-?.??/doc/spec.txt (or the same under /usr/doc/) contains the complete exim documentation; there is also an HTML version on the exim web page, whereas the man page contains only command-line information.

exim is a drop-in replacement for sendmail, meaning that for every critical sendmail command, there is an exim command of the same name that takes the same options, so that needy scripts won't know the difference. These are:
    /etc/aliases
    /usr/bin/mailq
    /usr/bin/newaliases
    /usr/bin/rmail
    /usr/lib/sendmail
    /usr/sbin/sendmail

Finally, there is the exim binary itself, /usr/sbin/exim, and its configuration file, /etc/exim/config, /etc/exim.conf, or /etc/exim/exim.conf, depending on your LINUX distribution. Then there are the usual start/stop scripts, /etc/init.d/exim or /etc/rc.d/init.d/exim.
As a preliminary example, here we create a simple spooling mail server for a personal workstation, cericon.cranzgot.co.za. Client applications (especially non-UNIX ones) are usually configured to connect to an MTA running on a remote machine; however, using a remote SMTP host can be irritating if the host or network goes down. Running exim on the local workstation enables all applications to use localhost as their SMTP gateway: that is, exim takes care of queuing and periodic retries.

Here is the configuration. The difference between this and a full-blown mail server is actually very slight.
    #################### MAIN CONFIGURATION SETTINGS #####################
    log_subject
    errors_address = postmaster
    freeze_tell_mailmaster = yes
    queue_list_requires_admin = false
    prod_requires_admin = false
    trusted_users = psheer
    local_domains = localhost : ${primary_hostname}
    never_users = root
    # relay_domains = my.equivalent.domains : more.equivalent.domains
    host_accept_relay = localhost : *.cranzgot.co.za : 192.168.0.0/16
    exim_user = mail
    exim_group = mail
    end

    ###################### TRANSPORTS CONFIGURATION ######################
    remote_smtp:
      driver = smtp
      hosts = 192.168.2.1
      hosts_override

    local_delivery:
      driver = appendfile
      file = /var/spool/mail/${local_part}
      delivery_date_add
      envelope_to_add
      return_path_add
      group = mail
      mode_fail_narrower =
      mode = 0660
    end

    ###################### DIRECTORS CONFIGURATION #######################
    localuser:
      driver = localuser
      transport = local_delivery
    end

    ###################### ROUTERS CONFIGURATION #########################
    lookuphost:
      driver = lookuphost
      transport = remote_smtp

    literal:
      driver = ipliteral
      transport = remote_smtp
    end

    ###################### RETRY CONFIGURATION ###########################
    * * F,2h,15m; G,16h,1h,1.5; F,4d,8h
    end
    ###################### REWRITE CONFIGURATION #########################
    *@cericon.cranzgot.co.za      psheer@icon.co.za
The exim config file is divided into six logical sections separated by the end keyword. The top or MAIN section contains global settings. The global settings have the following meanings:

log_subject  Tells exim to log the subject in the mail log file. For example, T="I LOVE YOU" will be added to the log file.

errors_address  The mail address where errors are to be sent. It doesn't matter what you put here, because all mail will get rewritten by the rewrite rule at the bottom of the file, as we see later.

freeze_tell_mailmaster  Tells errors_address about frozen messages. Frozen messages are messages that could not be delivered for some reason (like a permissions problem, or a failed message whose return address is invalid) and are flagged to sit idly in the mail queue and not be processed any further. Note that frozen messages sometimes mean that something is wrong with your system or mail configuration.
local_domains  Each mail message received is processed in one of two ways: by either a local or a remote delivery. A local delivery is one to a user on the local machine, and a remote delivery is one to somewhere else on the Internet. local_domains distinguishes between these two. For example, according to the config line above, a message destined to psheer@localhost or psheer@cericon.cranzgot.co.za is local, whereas a message to rrabbit@toonland.net is remote. Note that the list is colon delimited.

never_users  Never become this user. Just for security.

exim_user  Specifies the user that exim should run as.

exim_group  Specifies the group that exim should run as.
It is important to understand the host_accept_relay and relay_domains options for security.

host_accept_relay  This option specifies machines that are allowed to use cericon.cranzgot.co.za as a relay. A relay is a host that sends mail on another machine's behalf: that is, we are acting as a relay when we process a mail message that neither originated from our machine nor is destined for a user on our machine. We never want to relay from an untrusted host. Why? Because it may, for example, allow someone to send 100,000 messages to 100,000 different addresses, each with us in the message header. host_accept_relay specifies a list of trusted hosts that are allowed to send such arbitrary messages through us. Note again that the list is colon delimited. In this example, we don't even need to put in addresses of other machines on our LAN, except if we are feeling friendly.

relay_domains  relay_domains gives an additional condition for which an arbitrary host is allowed to use us as a relay. Consider that we are a backup mail server for a particular domain; mail to that domain does not originate from us, nor is it destined for us, yet it must be allowed, only if the destination address matches the domains for which we are a backup. We put such domains under relay_domains.
The transport section comes immediately after the main configuration options. It defines various methods of delivering mail. We are going to refer to these methods later in the configuration file. Our manual telnet-ing to port 25 was transporting a mail message by SMTP. Appending a mail message to the end of a mail folder is also a transport method. These are represented by the remote_smtp: and local_delivery: labels, respectively.

remote_smtp:  This transport has the following suboptions:

  driver  The actual method of delivery. driver = always specifies the kind of transport, director, or router.

  hosts_override and hosts  Using these two options together overrides any list of hosts that may have been looked up by DNS MX queries. By "list of hosts" we mean machines, established from the recipient's email address, to which we might like to make an SMTP delivery, but which we are not going to use. Instead we send all mail to 192.168.2.1, which is this company's internal mail server.
local_delivery:  This transport has the following suboptions:

  driver  The actual method of delivery. driver = always specifies the kind of transport, director, or router.

  file  The file to append the mail message to. ${local_part} is replaced with everything before the @ character of the recipient's address.

  delivery_date_add, envelope_to_add, and return_path_add  Various things to add to the header.

  group, mode_fail_narrower, and mode  Various permission settings.

(It should be obvious at this stage what these two transports are going to be used for. As far as MTAs are concerned, the only two things that ever happen to an email message are that it either (a) gets sent through SMTP to another host or (b) gets appended to a file.)
If a message arrives and it is listed in local_domains, exim will attempt a local delivery. This means exim works through the list of directors until it finds one that does not fail. The only director listed here is the one labeled localuser: with local_delivery as its transport. So, quite simply, email messages having recipient addresses that are listed under local_domains are appended to a user's mailbox file; not very complicated. A director directs mail to a mailbox.

If a message arrives and it is not listed in local_domains, exim attempts a remote delivery. Similarly, this means exim works through the list of routers until it finds one that does not fail. Two routers are listed here. The first is for common email addresses. It uses the lookuphost driver, which does a DNS MX query on the domain part of the email address (i.e., everything after the @). The MX records found are then passed to the remote_smtp transport (and in our case, then ignored). The lookuphost driver will fail if the domain part of the email address is a bracketed literal IP address.

The second router uses the ipliteral driver. It sends mail directly to an IP address in the case of bracketed, literal email addresses, for example, rrabbit@[111.1.1.1]. A router routes mail to another host.
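As a quick aside (not from the original text): exim can show you how it would handle an address, without sending anything, using its address-testing option. A sketch, using the example addresses from this chapter (psheer is assumed to exist as a local user, as in the config above):

    exim -bt psheer@localhost        # expect the localuser director / local_delivery transport
    exim -bt rrabbit@toonland.net    # expect the lookuphost router / remote_smtp transport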
An actual mail server config file contains very little extra. This one is the example config file that comes by default with exim-3.16:
    #################### MAIN CONFIGURATION SETTINGS #####################
    # primary_hostname =
    # qualify_domain =
    # qualify_recipient =
    # local_domains =
    never_users = root
    # host_accept_relay = localhost
    # host_accept_relay = my.friends.host : 131.111.0.0/16
    # relay_domains = my.equivalent.domains : more.equivalent.domains
    host_lookup = 0.0.0.0/0
    # receiver_unqualified_hosts =
    # sender_unqualified_hosts =
    rbl_domains = rbl.maps.vix.com
    no_rbl_reject_recipients
    sender_reject = "*@*.sex*.net:*@sex*.net"
    host_reject = "open-relay.spamming-site.com"
    rbl_warn_header
    # rbl_domains = rbl.maps.vix.com:dul.maps.vix.com:relays.orbs.org
    # percent_hack_domains = *
    end

    ###################### TRANSPORTS CONFIGURATION ######################
    remote_smtp:
      driver = smtp

    # procmail transport goes here <---

    local_delivery:
      driver = appendfile
      file = /var/spool/mail/${local_part}
      delivery_date_add
      envelope_to_add
      return_path_add
      group = mail
      mode = 0660

    address_pipe:
      driver = pipe
      return_output

    address_file:
      driver = appendfile
      delivery_date_add
      envelope_to_add
      return_path_add

    address_reply:
      driver = autoreply
    end

    ###################### DIRECTORS CONFIGURATION #######################
    # routers because of a "self=local" setting (not used in this configuration).
    system_aliases:
      driver = aliasfile
      file = /etc/aliases
      search_type = lsearch
      user = mail
      group = mail
      file_transport = address_file
      pipe_transport = address_pipe

    userforward:
      driver = forwardfile
      file = .forward
      no_verify
      no_expn
      check_ancestor
    # filter
      file_transport = address_file
      pipe_transport = address_pipe
      reply_transport = address_reply

    # procmail director goes here <---

    localuser:
      driver = localuser
      transport = local_delivery
    end

    ###################### ROUTERS CONFIGURATION #########################
    lookuphost:
      driver = lookuphost
      transport = remote_smtp
    # widen_domains = "sales.mycompany.com:mycompany.com"
    # widen_domains =

    literal:
      driver = ipliteral
      transport = remote_smtp
    end

    ###################### RETRY CONFIGURATION ###########################
    * * F,2h,15m; G,16h,1h,1.5; F,4d,8h
    end
For procmail support (see procmail(1), procmailrc(5), and procmailex(5)), simply add

    procmail:
      driver = pipe
      command = "/usr/bin/procmail -Y -d ${local_part}"

after your remote_smtp transport, and then also,

    procmail:
      driver = localuser
      transport = procmail
      require_files = /usr/bin/procmail

after your userforward director.
As with other daemons, you can stop exim, start exim, and cause exim to reread its configuration file with:

    /etc/init.d/exim stop
    /etc/init.d/exim start
    /etc/init.d/exim reload

You should always do a reload to cause config file changes to take effect. The startup script actually just runs exim -bd -q30m, which tells exim to start as a standalone daemon, listening for connections on port 25, and then to execute a runq (explained below) every 30 minutes.
To cause exim (and many other MTAs, for that matter) to loop through the queue of pending messages and consider each one for delivery, run

    runq

which is the same as exim -q.

To list mail that is queued for delivery, use

    mailq

which is the same as exim -bp.

To forcibly attempt delivery on any mail in the queue, use

    exim -qf

and then to forcibly retry even frozen messages in the queue, use

    exim -qff

To delete a message from the queue, use

    exim -Mrm <message-id>

The man page exim(8) contains exhaustive treatment of command-line options. Those above are most of what you will use, however.
It is often useful to check the queue directory /var/spool/exim/input/ for mail messages, just to get an inside look at what's going on. The simple session—

    [root@cericon]# mailq
     0m   320 14Epss-0008DY-00 <psheer@icon.co.za>
              psheer@icon.co.za

     0m   304 14Ept8-0008Dg-00 <psheer@icon.co.za>
              psheer@icon.co.za

    [root@cericon]# ls -l /var/spool/exim/input/
    total 16
    -rw-------   1 root   root    25 Jan  6 11:43 14Epss-0008DY-00-D
    -rw-------   1 root   root   550 Jan  6 11:43 14Epss-0008DY-00-H
    -rw-------   1 root   root    25 Jan  6 11:43 14Ept8-0008Dg-00-D
    -rw-------   1 root   root   530 Jan  6 11:43 14Ept8-0008Dg-00-H

—clearly shows that two messages are queued for delivery. The files ending in -H are envelope headers, and those ending in -D are message bodies. The spec.txt document will show you how to interpret the contents of the header files.
Don't be afraid to manually rm files from this directory, but always delete them in pairs (i.e., remove both the header and the body file), and make sure exim is not running at the time. In the above example, the commands

    exim -Mrm 14Epss-0008DY-00 14Ept8-0008Dg-00
    Message 14Epss-0008DY-00 has been removed
    Message 14Ept8-0008Dg-00 has been removed
    mailq

work even better.
Often, we would like certain local addresses to actually deliver to other addresses. For instance, we would like all mail destined to user MAILER-DAEMON to actually go to user postmaster; or perhaps some user has two accounts but would like to read mail from only one of them.

The /etc/aliases file performs this mapping. This file has become somewhat of an institution; however, you can see that in the case of exim, aliasing is completely arbitrary: you can specify a lookup on any file under the system_aliases: director, provided that file is colon delimited.

A default /etc/aliases file could contain as much as the following; you should check that the postmaster account does exist on your system, and test whether you can read, send, and receive mail as user postmaster.

    # This is a combination of what I found in the Debian
    # and RedHat distributions.
    MAILER-DAEMON:      postmaster
    abuse:              postmaster
    anonymous:          postmaster
    backup:             postmaster
    backup-reports:     postmaster
    bin:                postmaster
    daemon:             postmaster
    decode:             postmaster
    dns:                postmaster
    dns-admin:          postmaster
    dumper:             postmaster
    fetchmail-daemon:   postmaster
    games:              postmaster
    gnats:              postmaster
    ingres:             postmaster
    info:               postmaster
    irc:                postmaster
    list:               postmaster
    listmaster:         postmaster
    lp:                 postmaster
    mail:               postmaster
    mailer-daemon:      postmaster
    majordom:           postmaster
    man:                postmaster
    manager:            postmaster
    msql:               postmaster
    news:               postmaster
    nobody:             postmaster
    operator:           postmaster
    postgres:           postmaster
    proxy:              postmaster
    root:               postmaster
    sync:               postmaster
    support:            postmaster
    sys:                postmaster
    system:             postmaster
    toor:               postmaster
    uucp:               postmaster
    warnings:           postmaster
    web-master:         postmaster
    www-data:           postmaster
    # some users who want their mail redirected:
    arny:               arny@elsewhere.co.za
    larry:              larry@elsewhere.co.za

You can remove a lot of these aliases, since they assume services to be running that might not be installed (games and ingres, for example). Aliases can do two things: firstly, anticipate what addresses people are likely to use if they need to contact the administrator; and secondly, catch any mail sent by system daemons: for example, the email address of the DNS administrator is dictated by the DNS config files, as explained on page 445.

Note that an alias in the /etc/aliases file does not have to have an account on the system; larry and arny need not have entries in the /etc/passwd file.
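A quick way to confirm that an alias resolves the way you expect (a sketch; exim's -bt option tests address handling without delivering anything):

    exim -bt MAILER-DAEMON
    exim -bt arny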
Spam refers to unsolicited ("not looked for or requested; unsought") bulk mail sent to users, usually for promotional purposes. That is, mail is sent automatically to many people with whom the sender has no relationship, and where the recipient did nothing to prompt the mail: all on the chance that the recipient might be interested in the subject matter. Alternatively, spam can be thought of as any mail sent to email addresses where those addresses were obtained without their owners' consent. More practically, anyone who has had an email account for very long will have gotten messages like Subject: Fast way to earn big $$$!, which clutter my mailbox. The longer you have an email address, the more of these messages you will get, and the more irritated you will get.
To send spam is easy. Work your way around the Internet till you find a mail server that allows relaying. Then send it 10,000 email addresses and a message about where to get pictures of naked underage girls. Now you are a genuine worthy-of-being-arrested spammer. Unfortunately for the unsuspecting administrator of that machine, and provided you have even a little clue what you're doing, he will probably never be able to track you down. Several other tricks are employed to get the most out of your $100-for-1,000,000-genuine-email-addresses.

Note that spam is not merely email you are not interested in. People often confuse mail with other types of communication, like telephone calls: if you get a telephone call, you have to pick up the phone then and there; the call is an invasion of your privacy. The beauty of email is that you never need to have your privacy invaded. You can simply delete the mail. If you never want to get email from a particular person again, you can simply add a filter that blocks mail from that person's address (see procmailex(5)). (If you are irritated by the presumption of the sender, then that's your problem. Replying to that person with "Please don't email me..." not only shows that you are insecure, but also that you are clueless, don't get much mail, and are therefore also unpopular.) The point at which email becomes intrusive is purely a question of volume, much like airwave advertisements. Because spam comes from a different place each time, you cannot protect yourself against it with a simple mail filter.

Typical spam mail will begin with a spammer subject like Create Wealth From Home Now!! and then the spammer will audaciously append the footer:

    This is not a SPAM. You are receiving this because you are on a list of
    email addresses that I have bought. And you have opted to receive
    information about business opportunities. If you did not opt in to
    receive information on business opportunities then please accept our
    apology. To be REMOVED from this list simply reply with REMOVE as the
    subject. And you will NEVER receive another email from me.

Need I say that you should be wary of replying with REMOVE, since it clearly tells the sender that your email is a valid address.
You can start by at least adding the following lines to your MAIN configuration section:

    headers_check_syntax
    headers_sender_verify
    sender_verify
    receiver_verify

The option headers_check_syntax causes exim to check all headers of incoming mail messages for correct syntax, failing them otherwise. The next three options check that one of the Sender:, Reply-To:, or From: headers, as well as both the addresses in the SMTP MAIL and RCPT commands, are genuine email addresses. The reasoning here is that spammers will often use malformed headers to trick the MTA into sending things it ordinarily wouldn't. I am not sure exactly how this applies in exim's case, but these are for the good measure of rejecting email messages at the point where the SMTP exchange is being initiated.
To find out a lot more about spamming, banning hosts, reporting spam, and email usage in general, see MAPS (Mail Abuse Prevention System LLC) http://www.mail-abuse.org/, as well as the Open Relay Behavior-modification System http://www.orbs.org/. (If this site is not working, there is also http://www.orbl.org/ and http://www.ordb.org/.)
Real-time Blocking Lists, or RBLs, are a not-so-new idea that has been incorporated into exim as a feature. It works as follows. The spammer has to use a host that allows relays. The IP address of that relay host will be clear to the MTA at the time of connection. The MTA can then check that address against a database of publicly available banned IP addresses of relay hosts. For exim, this means the list under rbl_domains. If one of the rbl_domains has this IP blacklisted, then exim denies it also. You can enable this capability with the following (this example comes from exim's friendly front web page)

    # reject messages whose sending host is in MAPS/RBL
    # add warning to messages whose sending host is in ORBS
    rbl_domains = blackholes.mail-abuse.org/reject : \
                  dialups.mail-abuse.org/reject : \
                  relays.mail-abuse.org/reject : \
                  relays.orbs.org/warn
    # check all hosts other than those on internal network
    rbl_hosts = !192.168.0.0/16:0.0.0.0/0
    # but allow mail to postmaster@my.domain even from rejected hosts
    recipients_reject_except = postmaster@my.domain
    # change some logging actions (collect more data)
    rbl_log_headers         # log headers of accepted RBLed messages
    rbl_log_rcpt_count      # log recipient info of accepted RBLed messages

in your MAIN configuration section. Also remember to remove the line no_rbl_reject_recipients; otherwise, exim will only log a warning message and not actually refuse email.
Mail administrators and email users are expected to be aware of the following:

• Spam is evil.

• Spam is caused by poorly configured mail servers.

• It is the responsibility of the mail administrator to ensure that proper measures have been taken to prevent spam.

• Even as a user, you should follow up spam by checking where it came from and complaining to those administrators.

• Many mail administrators are not aware there is an issue. Remind them.
sendmail's configuration file is /etc/sendmail.cf. This file format was inherited from the first UNIX servers and references simpler files under the directory /etc/mail/. You can do most ordinary things by editing one or another file under /etc/mail/ without having to deal with the complexities of /etc/sendmail.cf.

Like most stock MTAs shipped with LINUX distributions, the sendmail package will work by default as a mailer without any configuration. However, as always, you will have to add a list of relay hosts. This is done in the file /etc/mail/access for sendmail-8.10 and above. To relay from yourself and, say, the hosts on network 192.168.0.0/16, as well as, say, the hosts of the domain .trusted.com, you must have at least:

    localhost.localdomain           RELAY
    localhost                       RELAY
    127.0.0.1                       RELAY
    192.168                         RELAY
    trusted.com                     RELAY

which is exactly what the host_accept_relay option does in the case of exim.

The domains for which you are acting as a backup mail server must be listed in the file /etc/mail/relay-domains, each on a single line. This is analogous to the relay_domains option of exim.

Then, of course, the domains for which sendmail is going to receive mail must also be specified. This is analogous to the local_domains option of exim. These are listed in the file /etc/mail/local-host-names, each on a single line.

The same /etc/aliases file is used by exim and sendmail.
Having configured anything under /etc/mail/, you should now run make in this directory to rebuild the lookup tables for these files. You also have to run the command newaliases whenever you modify the /etc/aliases file. In both cases, you must restart sendmail.
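Putting those steps together, the typical sequence after editing anything under /etc/mail/ or /etc/aliases looks something like this (a sketch; the init script path varies between distributions):

    cd /etc/mail
    make
    newaliases
    /etc/rc.d/init.d/sendmail restart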
/etc/aliases file. In both cases, you must sendmail has received a large number of security alerts in its time. It is imperative that you install the latest version. Note that older versions of sendmail have configurations that allowed relaying by default—another reason to upgrade.
FAQ
A useful resource to for finding out more tricks with http://www.sendmail.org/faq/
.
sendmail is
The Sendmail
31. lilo, initrd, and Booting
lilo stands for linux loader. LILO: is the prompt you first see after boot up, from which you can usually choose the OS you would like to boot and give certain boot options to the kernel. This chapter explains how to configure lilo and kernel boot options, and how to get otherwise non-booting systems to boot.

The lilo package itself contains the files

    /boot/boot.b
    /boot/chain.b
    /boot/message
    /boot/os2_d.b
    /sbin/lilo
    /usr/bin/keytab-lilo
    /usr/share/doc/lilo-<version>

which is not that interesting, except to know that the technical and user documentation is there if hard-core details are needed.
When you first start your LINUX system, the LILO: prompt, at which you can enter boot options, is displayed. Pressing Tab displays a list of things to type. The purpose is to allow the booting of different LINUX installations on the same machine, or different operating systems stored in different partitions on the same disk. Later, you can actually view the file /proc/cmdline to see what boot options (including default boot options) were used.
A UNIX kernel, to be booted, must be loaded into memory from disk and be executed. The execution of the kernel causes it to uncompress itself and then run. (The word boot itself comes from the concept that a computer cannot begin executing without program code, and program code cannot get into memory without other program code; like trying to lift yourself up by your bootstraps, hence the name.) The first thing the kernel does after it runs is initialize various hardware devices. It then mounts the root file system on a specified partition. Once the root file system is mounted, the kernel executes /sbin/init to begin the UNIX operating system. This is how all UNIX systems begin life.
PCs begin life with a small program in the ROM BIOS that loads the very first sector of the disk into memory, called the boot sector of the master boot record or MBR. This piece of code is up to 512 bytes long and is expected to start the operating system. In the case of LINUX, the boot sector loads the file /boot/map, which contains a list of the precise locations of the disk sectors that the LINUX kernel image (usually the file /boot/vmlinuz) spans. It loads each of these sectors, thus reconstructing the kernel image in memory. Then it jumps to the kernel to execute it.

You may ask how it is possible to load a file from a file system when the file system is not mounted. Further, the boot sector is a small and simple program and certainly does not support the many types of file systems and devices that the kernel image may reside in. Actually, lilo doesn't have to support a file system to access a file, as long as it has a list of the sectors that the file spans and is prepared to use the BIOS interrupts (nothing to do with "interrupting" or hardware interrupts; this refers to BIOS functions that are available for use by the operating system; hardware devices may insert custom BIOS functions to provide rudimentary support for themselves at startup, distinct from the support provided by the device drivers of the booted kernel) to read those sectors. If the file is never modified, that sector list will never change; this is how the /boot/map and /boot/vmlinuz files are loaded.

In addition to the MBR, each primary partition has a boot sector that can boot the operating system in that partition. MS-DOS (Windows) partitions have this, and hence lilo can optionally load and execute these partition boot sectors to start a Windows installation in another partition.
BIOSs have inherited several limitations because of the lack of foresight of their designers. First, some BIOSs do not support more than one IDE (at least according to the lilo documentation; I myself have not come across this as a problem). The second limitation is the most important to note. As explained, lilo uses BIOS functions to access the IDE drive, but the BIOS of a PC is often limited to accessing the first 1024 cylinders of the disk. Hence, whatever LILO reads must reside within the first 1024 cylinders (the first 500 megabytes of disk space). Here is the list of things whose sectors are required to be within this space:

1. /boot/vmlinuz

2. Various lilo files /boot/*.b

3. Any non-LINUX partition boot sector you would like to boot

However, a LINUX root partition can reside anywhere, because the boot sector program never reads this partition except for the abovementioned files. A scenario where the /boot/ directory is a small partition below the 500 megabyte boundary and the / partition is above the 500 megabyte boundary is quite common. See page 155.

Note that newer "LBA" BIOSs support more than the first 512 megabytes, even up to 8 gigabytes. I personally do not count on this.
To "do a lilo" means running the lilo command as root, with a correct /etc/lilo.conf file. The lilo.conf file will doubtless have been set up by your distribution (check yours). A typical lilo.conf file that allows booting of a Windows partition and two LINUX partitions is as follows:

    boot=/dev/hda
    prompt
    timeout = 50
    compact
    vga = extended
    lock
    password = jAN]")Wo
    restricted
    append = "ether=9,0x300,0xd0000,0xd4000,eth0 hisax=1,3,5,0xd8000,0xd80,HiSax"

    image = /boot/vmlinuz-2.2.17
            label = linux
            root = /dev/hda5
            read-only

    image = /boot/vmlinuz-2.0.38
            label = linux-old
            root = /dev/hda6
            read-only

    other = /dev/hda2
            label = win
            table = /dev/hda
Running lilo will install into the MBR a boot loader that understands where to get the /boot/map file, which in turn understands where to get the /boot/vmlinuz-2.2.17 file. It gives output like:

    lilo
    Added linux *
    Added linux-old
    Added win

It also backs up your existing MBR, if this has not previously been done, into a file /boot/boot.0300 (where 0300 refers to the device's major and minor number).
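If you ever need to back out, that saved MBR can be written straight back; a sketch (assumes the backup file above and the first IDE disk; the count of 446 bytes restores only the boot code and leaves the partition table untouched):

    dd if=/boot/boot.0300 of=/dev/hda bs=446 count=1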
Let's go through the options:

boot  Device to boot. It will almost always be /dev/hda or /dev/sda.

prompt  Display a prompt where the user can enter the OS to boot.

timeout  How many tenths of a second to display the prompt (after which the first image is booted).

compact  String together adjacent sector reads. This makes the kernel load much faster.

vga  We would like 80 x 50 text mode. Your startup scripts may reset this to 80 x 25; search /etc/rc.d recursively for any file containing "textmode".

lock  Always default to booting the last OS booted. (A very useful feature which is seldom used.)

password  Require a password to boot.

restricted  Require a password only if someone attempts to enter special options at the LILO: prompt.

append  A kernel boot option. Kernel boot options are central to lilo and kernel modules and are discussed in Chapter 42.5. They are mostly not needed in simple installations.

image  A LINUX kernel to boot.

label  The text to type at the boot prompt to cause this kernel/partition to boot.

root  The root file system that the kernel must mount.

read-only  Flag to specify that the root file system must initially be mounted read-only.

other  Some other operating system to boot: in this case, a Windows partition.

table  Partition table info to be passed to the partition boot sector.

Further other = partitions can follow, and many image = kernel images are allowed.
The preceding lilo.conf file assumed a partition scheme as follows:

/dev/hda1  10-megabyte ext2 partition to be mounted on /boot.

/dev/hda2  Windows 98 partition over 500 megabytes in size.

/dev/hda3  Extended partition.

/dev/hda4  Unused primary partition.

/dev/hda5  ext2 root file system.

/dev/hda6  Second ext2 root file system containing an older distribution.

/dev/hda?  LINUX swap, /home, and other partitions.
If LILO is broken or absent, we require an alternative boot method. A floppy disk capable of booting our system must contain a kernel image, the means to load that image into memory, and the means to mount /dev/hda5 as the root file system. To create such a floppy, insert a new floppy disk into a running LINUX system, and overwrite it with the following commands:

    dd if=/boot/vmlinuz-2.2.17 of=/dev/fd0
    rdev /dev/fd0 /dev/hda5

Then simply boot the floppy. This procedure requires a second LINUX installation, at least. If you have only an MS-DOS or Windows system at your disposal, then you will have to download the RAWRITE.EXE utility as well as a raw boot disk image. Many of these are available, and they will enable you to create a boot floppy from a DOS prompt. I will not go into detail about this here.
Some of the following descriptions may be difficult to understand without knowledge of kernel modules, explained in Chapter 42. You may want to come back to this section later.

Consider a system with zero IDE disks and one SCSI disk containing a LINUX installation. There are BIOS interrupts to read the SCSI disk, just as there were for the IDE, so LILO can happily access a kernel image somewhere inside the SCSI partition. However, the kernel is going to be lost without a kernel module that understands the particular SCSI driver. (See Chapter 42. The kernel doesn't support every possible kind of hardware out there all by itself. It is actually divided into a main part, the kernel image discussed in this chapter, and hundreds of modules, loadable parts that reside in /lib/modules/, that support the many types of SCSI, network, sound, etc., peripheral devices.) So although the kernel can load and execute, it won't be able to mount its root file system without loading a SCSI module first. But the module itself resides in the root file system in /lib/modules/. This is a tricky situation to solve, and it is done in one of two ways: either (a) using a kernel with pre-enabled SCSI support or (b) using what is known as an initrd preliminary root file system image.

The first method is what I recommend. It's a straightforward (though time-consuming) procedure to create a kernel with SCSI support for your SCSI card built in (and not in a separate module). Built-in SCSI and network drivers will also autodetect cards most of the time, allowing immediate access to the device; they will work without being given any options (discussed in Chapter 42) and, most importantly, without your having to read up on how to configure them. This setup is known as compiled-in support for a hardware driver (as opposed to module support for the driver). The resulting kernel image will be larger by an amount equal to the size of the module. Chapter 42 discusses such kernel compiles.
The second method is faster but trickier. LINUX supports what is known as an initrd image (initial ram disk image). This is a small, 1.5 megabyte file system that is loaded by LILO and mounted by the kernel instead of the real file system. The kernel mounts this file system as a RAM disk, executes the file /linuxrc, and then only mounts the real file system.

Start by creating a small file system. Make a directory ~/initrd and copy the following files into it:

    drwxr-xr-x    7 root   root      1024 Sep 14 20:12 initrd/
    drwxr-xr-x    2 root   root      1024 Sep 14 20:12 initrd/bin/
    -rwxr-xr-x    1 root   root    436328 Sep 14 20:12 initrd/bin/insmod
    -rwxr-xr-x    1 root   root    424680 Sep 14 20:12 initrd/bin/sash
    drwxr-xr-x    2 root   root      1024 Sep 14 20:12 initrd/dev/
    crw-r--r--    1 root   root      5,  1 Sep 14 20:12 initrd/dev/console
    crw-r--r--    1 root   root      1,  3 Sep 14 20:12 initrd/dev/null
    brw-r--r--    1 root   root      1,  1 Sep 14 20:12 initrd/dev/ram
    crw-r--r--    1 root   root      4,  0 Sep 14 20:12 initrd/dev/systty
    crw-r--r--    1 root   root      4,  1 Sep 14 20:12 initrd/dev/tty1
    crw-r--r--    1 root   root      4,  2 Sep 14 20:12 initrd/dev/tty2
    crw-r--r--    1 root   root      4,  3 Sep 14 20:12 initrd/dev/tty3
    crw-r--r--    1 root   root      4,  4 Sep 14 20:12 initrd/dev/tty4
    drwxr-xr-x    2 root   root      1024 Sep 14 20:12 initrd/etc/
    drwxr-xr-x    2 root   root      1024 Sep 14 20:12 initrd/lib/
    -rwxr-xr-x    1 root   root        76 Sep 14 20:12 initrd/linuxrc
    drwxr-xr-x    2 root   root      1024 Sep 14 20:12 initrd/loopfs/
On my system, the file initrd/bin/insmod is the statically linked (meaning it does not require shared libraries) version copied from /sbin/insmod.static, a member of the modutils-2.3.13 package. initrd/bin/sash is a statically linked shell from the sash-3.4 package. You can recompile insmod from source if you don't have a statically linked version. Alternatively, copy the needed DLLs from /lib/ to initrd/lib/. (You can get the list of required DLLs by running ldd /sbin/insmod. Don't forget to also copy symlinks, and run strip -s <lib> to reduce the size of the DLLs.)
Now copy into the initrd/lib/ directory the SCSI modules you require. For example, if we have an Adaptec AIC-7850 SCSI adapter, we would require the aic7xxx.o module from /lib/modules/<version>/scsi/aic7xxx.o. Then, place it in the initrd/lib/ directory:

    -rw-r--r--    1 root   root    129448 Sep 27  1999 initrd/lib/aic7xxx.o
The file initrd/linuxrc should contain a script to load all the modules needed for the kernel to access the SCSI partition, in this case just the aic7xxx module (insmod can take options such as the IRQ and I/O port for the device; see Chapter 42):

    #!/bin/sash

    aliasall

    echo "Loading aic7xxx module"
    insmod /lib/aic7xxx.o

Now double-check all your permissions and then chroot to the file system for testing:

    chroot ~/initrd /bin/sash
    /linuxrc
Now, create a file system image similar to that in Section 19.9:

    dd if=/dev/zero of=~/file-initrd count=2500 bs=1024
    losetup /dev/loop0 ~/file-initrd
    mke2fs /dev/loop0
    mkdir ~/mnt
    mount /dev/loop0 ~/mnt
    cp -a initrd/* ~/mnt/
    umount ~/mnt

Finally, gzip the file system to an appropriately named file:

    gzip -c ~/file-initrd > initrd-<kernel-version>
Your lilo.conf file can be changed slightly to force use of an initrd file system. Simply add the initrd option. For example:

    boot=/dev/sda
    prompt
    timeout = 50
    compact
    vga = extended
    linear

    image = /boot/vmlinuz-2.2.17
            initrd = /boot/initrd-2.2.17
            label = linux
            root = /dev/sda1
            read-only

Notice the use of the linear option. This is a BIOS trick that you can read about in lilo(5). It is often necessary, but it can make SCSI disks nonportable to different BIOSs (meaning that you will have to rerun lilo if you move the disk to a different computer).

Now that you have learned the manual method of creating an initrd image, you can read the mkinitrd man page. It creates an image in a single command. This command is peculiar to RedHat.
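For reference, a minimal sketch of that one-command equivalent (2.2.17 is the kernel version used in this chapter's examples; check mkinitrd(8) for your distribution's exact options):

    mkinitrd /boot/initrd-2.2.17.img 2.2.17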
32. init, getty, and UNIX Run Levels
This chapter explains how LINUX (and a UNIX system in general) initializes itself. It follows on from the kernel boot explained in Section 31.2. We also go into some advanced uses for mgetty, like receiving faxes.

After the kernel has been unpacked into memory, it begins to execute, initializing hardware. The last thing it does is mount the root file system, which necessarily contains a program /sbin/init, which the kernel executes. init is one of the only programs the kernel ever executes explicitly; the onus is then on init to bring the UNIX system up. init always has the process ID 1.
For the purposes of init, the (rather arbitrary) concept of a UNIX run level was invented. The run level is the current operation of the machine, numbered run level 0 through run level 9. When the UNIX system is at a particular run level, it means that a certain selection of services is running. In this way, the machine could be a mail server or an X Window System workstation depending on what run level it is in.

The traditionally defined run levels are:

    0   Halt.
    1   Single-user mode.
    2   Multiuser, without network file system (NFS).
    3   Full multiuser mode.
    4   Unused.
    5   X Window System workstation (usually identical to run level 3).
    6   Reboot.
    7   Undefined.
    8   Undefined.
    9   Undefined.
The idea here is that init begins at a particular run level that can then be manually changed to any other by the superuser. init uses a list of scripts for each run level to start or stop each of the many services pertaining to that run level. These scripts are /etc/rc?.d/KNNservice or /etc/rc?.d/SNNservice (on some systems, /etc/rc.d/rc?.d/. . . ), where NN, K, or S is a prefix to force the order of execution (since the files are executed in alphabetical order). These scripts all take the options start and stop on the command-line, to begin or terminate the service.
For example, when init enters, say, run level 5 from run level 3, it executes the particular scripts from /etc/rc3.d/ and /etc/rc5.d/ to bring up or down the appropriate services. This may involve, say, executing

    /etc/rc3.d/S20exim stop

and similar commands.
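To see what a given run level consists of on your own machine, simply list the relevant directory; a sketch (the script names and numbers will differ between distributions):

    ls /etc/rc3.d/
    # typical entries look like K20nfs, S10network, S30syslog, S99local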
init has one config file, /etc/inittab, which is scanned once on bootup. A minimal inittab file might consist of the following:

    id:3:initdefault:

    si::sysinit:/etc/rc.d/rc.sysinit

    l0:0:wait:/etc/rc.d/rc 0
    l1:1:wait:/etc/rc.d/rc 1
    l2:2:wait:/etc/rc.d/rc 2
    l3:3:wait:/etc/rc.d/rc 3
    l4:4:wait:/etc/rc.d/rc 4
    l5:5:wait:/etc/rc.d/rc 5
    l6:6:wait:/etc/rc.d/rc 6

    ud::once:/sbin/update

    1:2345:respawn:/sbin/getty 38400 tty1
    2:2345:respawn:/sbin/getty 38400 tty2
    3:2345:respawn:/sbin/getty 38400 tty3
    4:2345:respawn:/sbin/getty 38400 tty4

    S0:2345:respawn:/sbin/mgetty -n 3 -s 115200 ttyS0 57600
    S4:2345:respawn:/sbin/mgetty -r -s 19200 ttyS4 DT19200

    x:5:respawn:/usr/bin/X11/xdm -nodaemon
The lines are colon-separated fields and have the following meaning (lots more can be gotten from inittab(5)):

id:3:initdefault:  This dictates that the default run level is 3. It is the run level that the system will boot up into. This field usually has a 3 or a 5, which are most often the only two run levels that the system ever sits in.

si::sysinit:/etc/rc.d/rc.sysinit  This says to run a script on bootup to initialize the system. If you view the file /etc/rc.d/rc.sysinit, you will see a fairly long script that does the following: mounts the proc file system; initializes the keyboard maps, console font, NIS domain, host name, and swap partition; runs isapnp and depmod -a; cleans the utmp file; as well as other things. This script is run only once on bootup. On Debian this is a script, /etc/init.d/rcS, that runs everything under /etc/rcS.d/. (As usual, Debian gravitated to the most clean, elegant, and extensible solution.)

l3:3:wait:/etc/rc.d/rc 3  The first field is a descriptive tag and could be anything. The second is a list of run levels under which the particular script (last field) is to be invoked: in this case, /etc/rc.d/rc 3 is to be run when entering run level 3. The wait means to pause until /etc/rc.d/rc has finished execution. If you view the file /etc/rc.d/rc, you will see that it merely executes scripts under /etc/rc?.d/ as appropriate for a run level change.

ud::once:/sbin/update  This flushes the disk cache on each run level change.

1:2345:respawn:/sbin/getty 38400 tty1  This says to run the command /sbin/getty 38400 tty1 when in run levels 2 through 5. respawn means to restart the process if it dies.

x:5:respawn:/usr/bin/X11/xdm -nodaemon  This says to run the command /usr/bin/X11/xdm -nodaemon when in run level 5. This is the X Window System graphical login program.
If you modify the inittab file, init will probably not notice until you issue it a SIGHUP. This is the same as typing

    telinit q

which causes init to reread /etc/inittab.

You get a "respawning too fast" error when an inittab line makes no sense: like a getty running on a nonfunctioning serial port. (These errors are common and very irritating when you are doing console work, hence this explicit mention.) Simply comment out or delete the appropriate line and then run

    telinit q
Switching run levels manually is something that is rarely done. The most common way of shutting down the machine is to use

    shutdown -h now

which effectively goes to run level 0, and

    shutdown -r now

which effectively goes to run level 6.

You can also specify the run level at the LILO: prompt. Type

    linux 1

or

    linux single

to enter single-user mode when booting your machine. You change to single-user mode on a running system with

    telinit S

You can forcefully enter any run level with

    telinit <N>
32.4. getty Invocation

The getty man page begins with:

    getty opens a tty port, prompts for a login name and invokes the
    /bin/login command. It is normally invoked by init(8).

Note that getty, agetty, fgetty, and mingetty are just different implementations of getty.

The most noticeable effect of init running at all is that it spawns a login on each of the LINUX virtual consoles. It is the getty (or sometimes mingetty) command, as specified in the inittab lines above, that displays this login prompt. Once the login name is entered, getty invokes the /bin/login program, which then prompts the user for a password. The login program (discussed in Section 11.7) then executes a shell. When the shell dies (as a result of the user exiting the session), getty is just respawned.

Together with Chapter 31, you should now have a complete picture of the entire bootup process:
1. First sector loaded into RAM and executed; the LILO: prompt appears.

2. Kernel loaded from sector list.

3. Kernel executed; unpacks.

4. Kernel initializes hardware.

5. Kernel mounts root file system, say /dev/hda1.

6. Kernel executes /sbin/init as PID 1.

7. init executes all scripts for default run level.

8. init spawns getty programs on each terminal.

9. getty prompts for login.

10. getty executes /bin/login to authenticate the user.

11. login starts shell.
The original purpose of getty was to manage character terminals on mainframe computers. mgetty is a more comprehensive getty that deals with proper serial devices. A typical inittab entry is

    S4:2345:respawn:/sbin/mgetty -r -s 19200 ttyS4 DT19200

which would open a login on a terminal connected to a serial line on /dev/ttyS4. (See page 479 for information on configuring multiport serial cards. The LINUX devices /dev/tty1 through /dev/tty12 as used by getty emulate classic terminals in this way.)

mgetty will log to /var/log/mgetty.log.ttyS?. This log file contains everything you need for troubleshooting. It is worthwhile running tail -f on these files while watching a login take place.
Running mgetty (see mgetty(8)) is a common and trivial way to get a dial login to a LINUX machine. Your inittab entry is just

    S0:2345:respawn:/sbin/mgetty -n 3 -s 115200 ttyS0 57600

where -n 3 says to answer the phone after the 3rd ring. Nothing more is needed than to plug your modem into a telephone. You can then use dip -t, as done in Section 41.1.1, to dial this machine from another LINUX machine. Here is an example session (this example assumes that an initialization string of AT&F1 is sufficient; see Section 3.5):

    dip -t
    DIP: Dialup IP Protocol Driver version 3.3.7o-uri (8 Feb 96)
    Written by Fred N. van Kempen, MicroWalt Corporation.

    DIP> port ttyS0
    DIP> speed 57600
    DIP> term
    [ Entering TERMINAL mode.  Use CTRL-] to get back ]
    AT&F1
    OK
    ATDT5952521
    CONNECT 19200/ARQ/V34/LAPM/V42BIS

    Red Hat Linux release 6.1 (Cartman)
    Kernel 2.2.12-20 on an i686
Note that this is purely a login session, having nothing to do with PPP dialup.

mgetty receives faxes by default, provided your modem supports faxing (if your modem says it supports faxing and this still does not work, you will have to spend a lot of time reading through your modem's AT command set manual, as well as the mgetty info documentation) and provided faxing has not been explicitly disabled with the -D option. An appropriate inittab line passes options that, respectively, set the debug level to 4, answer after 3 rings, set the port speed to 57600, and set the fax ID number to 27 21 7654321. Alternatively, you can use the line

    S0:2345:respawn:/sbin/mgetty ttyS0 57600

and instead put your configuration options in the file mgetty.config under /etc/mgetty+sendfax/:

    debug 4
    rings 3
    speed 57600
    fax-id 27 21 7654321
Faxes end up in /var/spool/fax/incoming/ as useless .g3 format files, but note how the command

    strings /sbin/mgetty | grep new_fax

gives

    /etc/mgetty+sendfax/new_fax

which is a script that mgetty secretly runs when new faxes arrive. It can be used to convert faxes into something (like .gif graphics files; I recommend .png over .gif any day, however) readable by typical office programs. The following example /etc/mgetty+sendfax/new_fax script puts incoming faxes into /home/fax/ as .gif files that all users can access. Note how it uses the CPU-intensive convert program from the ImageMagick package. (Modified from the mgetty contribs.)
    #!/bin/sh

    # you must have pbm tools and they must be in your PATH
    PATH=/usr/bin:/bin:/usr/X11R6/bin:/usr/local/bin

    HUP="$1"
    SENDER="$2"
    PAGES="$3"

    shift 3

    P=1

    while [ $P -le $PAGES ] ; do
        FAX=$1
        BASENAME=`basename $FAX`
        RES=`echo $BASENAME | sed 's/.\(.\).*/\1/'`
        if [ "$RES" = "n" ] ; then
            STRETCH="-s"
        else
            STRETCH=""
        fi
        nice g32pbm $STRETCH $FAX > /tmp/$BASENAME.pbm \
            && rm -f $FAX \
            && nice convert -colorspace gray -colors 16 -geom \
                '50%x50%' /tmp/$BASENAME.pbm /home/fax/$BASENAME.gif \
            && rm -f /tmp/$BASENAME.pbm \
            && chmod 0666 /home/fax/$BASENAME.gif
        shift
        P=`expr $P + 1`
    done
33. Sending Faxes

This chapter discusses the sendfax program, with reference to the specific example of setting up an artificial printer that will automatically use a modem to send its print jobs to remote fax machines. Continuing from Section 21.10. . .

The sendfax command is just one program that sends faxes through the modem. You should go now and read the sendfax section of the info page for mgetty. Like mgetty, sendfax reads a config file in /etc/mgetty+sendfax/. This config file is just sendfax.config and can contain as little as

    verbose y
    debug 5
    fax-devices ttyS0
    fax-id 27 21 7654321
    max-tries 3
    max-tries-continue y
Below, fax_filter.sh is a script that sends the print job through the fax machine after requesting the telephone number through gdialog. (gdialog is part of the gnome-utils package.) An appropriate /etc/printcap entry is:

fax:\
        :sd=/var/spool/lpd/fax:\
        :mx#0:\
        :sh:\
        :lp=/dev/null:\
        :if=/var/spool/lpd/fax/fax_filter.sh:
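With this entry in place, sending a fax is no different from printing: a user simply queues a PostScript file to the fax printer, for example (the file name is only an illustration)

lpr -P fax document.ps

and the filter below takes over from there.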
The file fax_filter.sh itself could contain a script like the following for a modem on /dev/ttyS0. (Remember to rotate the /var/log/fax log file; see page 198.)
#!/bin/sh

exec 1>>/var/log/fax
exec 2>>/var/log/fax

echo
echo
echo [email protected]
echo "Starting fax `date`: I am `id`"

export DISPLAY=localhost:0.0
export HOME=/home/lp

function error()
{
    gdialog --title "Send Fax" --msgbox "$1" 10 75 || \
        echo 'Huh? no gdialog on this machine'
    cd /
    rm -Rf /tmp/$$fax || \
        gdialog \
            --title "Send Fax" \
            --msgbox "rm -Rf /tmp/$$fax failed" \
            10 75
    exit 1
}

mkdir /tmp/$$fax || error "mkdir /tmp/$$fax failed"
cd /tmp/$$fax || error "cd /tmp/$$fax failed"

# the print job arrives on stdin as PostScript
cat > fax.ps

if /usr/bin/gdialog \
    --title "Send Fax" \
    --inputbox "Enter the phone number to fax:" \
    10 75 "" 2>TEL ; then
    :
else
    echo "gdialog failed `< TEL`"
    rm -Rf /tmp/$$fax
    exit 0
fi

TEL=`< TEL`
test -z "$TEL" && error 'no telephone number given'

# convert the PostScript job to G3 fax format with Ghostscript
cat fax.ps | gs -r204x98 -sOutputFile=- -sDEVICE=faxg3 -dBATCH -q - \
    1>fax.ps.g3 || error 'gs failed'

ls -al /var/lock/

/usr/sbin/sendfax -x 5 -n -l ttyS0 $TEL fax.ps.g3 || \
    error "sendfax failed"

rm -Rf /tmp/$$fax
33.2 Setgid Wrapper Binary

The above script is not enough, however. sendfax requires access to the /dev/ttyS0 device as well as to the /var/lock/ directory (to create a modem lock file; see Section 34.4). It cannot do this as the lp user (under which the above filter runs). On RedHat, the command ls -ald /var/lock /dev/ttyS0 reveals that only uucp is allowed to access modems. We can get around this restriction by creating a setgid (see Chapter 14) binary that runs with the group of the uucp user. Do this by compiling the C program,
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main (int argc, char **argv)
{
    char **a;
    int i;

    /* set the real group ID to that of the effective group ID */
    if (setgid (getegid ())) {
        perror ("sendfax_wrapper: setgid failed");
        exit (1);
    }

    /* copy all arguments */
    a = (char **) malloc ((argc + 1) * sizeof (char *));
    for (i = 1; i < argc; i++)
        a[i] = (char *) strdup (argv[i]);
    a[argc] = NULL;

    /* execute sendfax */
    a[0] = "/usr/sbin/sendfax";
    execvp (a[0], a);

    /* exit on failure */
    perror ("sendfax_wrapper: failed to execute /usr/sbin/sendfax");
    exit (1);
}

using the commands,
gcc sendfax_wrapper.c -o /usr/sbin/sendfax_wrapper -Wall
chown lp:uucp /usr/sbin/sendfax_wrapper
chmod g+s,o-rwx /usr/sbin/sendfax_wrapper
Then, replace sendfax with sendfax_wrapper in the filter script. You can see that sendfax_wrapper just executes sendfax after changing the real group ID to the effective group ID (GID), as obtained from the getegid function. The effective group ID is uucp because of the setgid group bit (i.e., g+s) in the chmod command above, and hence sendfax runs under the uucp group with full access to the modem device.
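As a quick check, list the wrapper with

ls -l /usr/sbin/sendfax_wrapper

The listing should show lp and uucp as owner and group, and the permission string should begin with -rwxr-s---; the s in the group permissions is the setgid bit.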
On your own system it may be cleaner to try to implement this without a wrapper. Debian, for example, has a dialout group for the purpose of accessing modems. Also be aware that some distributions may not use the uucp user in the way RedHat does, and you may have to create an alternative user especially for this task.
34. uucp and uux
uucp is a command to copy a file from one UNIX system to another. uux executes a command on another UNIX system, even if that command is receiving data through stdin on the local system. uux is extremely useful for automating many kinds of distributed functions, like mail and news.

The uucp and uux commands both come as part of the uucp package. uucp (Unix-to-Unix Copy) may sound ridiculous considering the availability of modern commands like rcp, rsh, or even FTP transfers (which accomplish the same thing), but uucp has features that these do not, making it an essential, albeit antiquated, utility.

For instance, uucp never executes jobs immediately. It will, for example, queue a file copy for later processing and then dial the remote machine during the night to complete the operation.

uucp predates the Internet: it was originally used to implement a mail system, using only modems and telephone lines. It hence has sophisticated protocols for ensuring that your file/command really does get there, with the maximum possible fault tolerance and the minimum of retransmission. This is why it should always be used for automated tasks wherever there are unreliable (i.e., modem) connections. The uucp version that comes with most LINUX distributions is called Taylor UUCP after its author.

Especially important is that when a uucp operation is interrupted by a line break, the connection time is not wasted: uucp will not have discarded any partially transmitted data. This means that no matter how slow or error prone the connection, progress is always made. Compare this to an SMTP or POP3/IMAP connection: any line break halfway through a large mail message will necessitate that the entire operation be restarted from scratch.
34.1 Command-Line Operation

To copy a file from one machine to another, you enter a command of the form shown below.
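The original example command is missing from this copy of the text; a minimal sketch of the syntax (the file name is only an illustration, and cericon is the remote host used in the next example) would be

uucp /var/spool/uucppublic/README 'cericon!/var/spool/uucppublic/'

Note the quoting of the ! character, for the same reason as with uux below.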
You can also run commands on the remote system, like

echo -n 'Hi, this is a short message\n\n-paul' | \
    uux - 'cericon!rmail' 'john'

which runs rmail on the remote system cericon, feeding some text to the program. Note how you should quote the ! character to prevent it from being interpreted by the shell. (These commands will almost always fail with permission denied by remote. The error will come in a mail message to the user that ran the command.)
34.2 Configuration

uucp comes with comprehensive documentation in HTML format (/usr/doc/uucp-<version>/uucp.html or /usr/share/. . .) on RedHat, and in info format on Debian and RedHat. Here, I sketch a basic and typical configuration.

The uucp package has a long history of revisions, beginning with the first modem-based mail networks. The latest GNU editions that come with LINUX distributions have a configuration file format that will probably differ from that which old uucp hands are used to.

Dialup networks today typically use uucp in combination with normal PPP dialup, probably not using uucp's dial-in facilities at all. For example, if you are deploying a number of remote hosts that are using modems, these hosts should always use uucp to upload and retrieve mail, rather than POP3/IMAP or straight SMTP, because of the retransmission problem discussed above. In other words, uucp really works as an ordinary TCP service, albeit with far more fault tolerance.
To make uucp into a TCP server, place it into /etc/inetd.conf as follows,

uucp stream tcp nowait uucp /usr/sbin/tcpd /usr/lib/uucp/uucico -l

being also very careful to limit the hosts that can connect by using the techniques discussed in Chapter 29. Similarly for xinetd, create a file /etc/xinetd.d/uucp containing,
service uucp
{
        only_from       = 127.0.0.1 192.168.0.0/16
        socket_type     = stream
        wait            = no
        user            = uucp
        server          = /usr/lib/uucp/uucico
        server_args     = -l
        disable         = no
}
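Remember that inetd only rereads /etc/inetd.conf when signaled, so after editing it run

killall -HUP inetd

For xinetd, restarting the service (on RedHat, /etc/rc.d/init.d/xinetd restart) achieves the same thing.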
uucp configuration files are stored under /etc/uucp/. Now we configure a client machine, machine1.cranzgot.co.za, to send mail through server1.cranzgot.co.za, where server1.cranzgot.co.za is running the uucico service above.

uucp has an antiquated authentication mechanism that uses its own list of users and passwords, completely distinct from those of ordinary UNIX accounts. We must first add a common "user" and password to both machines for authentication purposes. For machine1.cranzgot.co.za, we can add to the file /etc/uucp/call the line
server1 machine1login pAsSwOrD123

which tells uucp to use the login machine1login whenever trying to speak to server1. On server1.cranzgot.co.za we can add to the file /etc/uucp/passwd the line

machine1login pAsSwOrD123

Note that the uucp name server1 was chosen for server1.cranzgot.co.za for convenience. uucp machine names, however, have nothing to do with domain names.
Next, we need to tell uucp about the intentions of machine1. Any machine that you might connect to or from must be listed in the /etc/uucp/sys file. Our entry looks like

system machine1
call-login *
call-password *
commands rmail
protocol t

and the file can have as many entries as we like. The only things server1 has to know about machine1 are the user and password and the preferred protocol. The *'s mean to look up the user and password in the /etc/uucp/passwd file, and protocol t means to use a simple non-error-correcting protocol (as appropriate for use over TCP). The commands option takes a space-separated list of permitted commands; for security reasons, commands not in this list cannot be executed. (This is why I stated above that commands will almost always fail with permission denied by remote: they are usually not listed under commands.)
The /etc/uucp/sys file on machine1 will contain:

system server1
call-login *
call-password *
time any
port TCP
address 192.168.3.2
protocol t
Here time any specifies which times of the day uucp may make calls to server1. The default is time Never. (See the uucp documentation under Time Strings for more info.) The option port TCP means that we are using a "modem" named TCP to execute the dialout. All modems are defined in the file /etc/uucp/port. We can add our modem entry to /etc/uucp/port as follows,

port TCP
type tcp

which clearly is not really a modem at all.
Finally, we can queue a mail transfer job with

echo -e 'Hi Jack\n\nHow are you?\n\n-jill' | \
    uux - --nouucico 'server1!rmail' '[email protected]'

and copy a file with a command of the form shown below.
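The original's file copy example is missing from this copy of the text; a sketch of what it would look like (the file name is again only an illustration) is

uucp /var/spool/uucppublic/somefile 'server1!/var/spool/uucppublic/'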
Note that /var/spool/uucppublic/ is the only directory you are allowed access to by default. You should probably keep it this way for security.
Although we have queued a job for processing, nothing will transfer until the program uucico (which stands for
Unix-to-Unix copy in copy out
) is run. The idea is that both server1 and machine1 may have queued a number of jobs; then when uucico is running on both machines and talking to each other, all jobs on both machines are processed in turn, regardless of which machine initiated the connection.
Usually uucico is run from a crond script every hour. (Even having run uucico, nothing will transfer if the time of day does not come within the ranges specified under time . . . .) Here we can run uucico manually, while watching the log with tail -f /var/log/uucp/Log, as follows:

uucico --debug 3 --force --system server1

The higher the debug level, the more verbose output you will see in the Log file. This will forcibly dial server1 (the --system option) regardless of when it last dialed; usually there are constraints on calling again soon after a failed call, and --force overrides this.
If your mail server on server1 is configured correctly, it should now have queued the message on the remote side.
34.3 Modem Dial

If you are really going to use uucp the old-fashioned way, you can use mgetty to answer uucp calls on server1: add an mgetty line for the modem to your /etc/inittab file (as in Section 32.6), and add the line

uucp machine1login /usr/sbin/uucico -l -u machine1login

to the file /etc/mgetty+sendfax/login.config (/etc/mgetty/login.config for Debian). You will then also have to add a UNIX account machine1login with password pAsSwOrD123. This approach works because mgetty and uucico have the same login prompt and password prompt, but mgetty uses /etc/passwd instead of /etc/uucp/passwd to authenticate. Also, for a modem connection, protocol t is error prone: change it to protocol g, which has small packet sizes and error correction.

Note that the above configuration also supports faxes, logins, voice, and PPP (see Section 41.4) on the same modem, because mgetty only starts uucico if the user name is machine1login.
To dial out from machine1, you first need to add a modem device (besides TCP) to your /etc/uucp/port file:

port ACU
type modem
device /dev/ttyS0
dialer mymodem
speed 57600
ACU is antiquated terminology and stands for Automatic Calling Unit (i.e., a modem). We have to specify the usual types of things for serial ports, like the device (/dev/ttyS0 for a modem on COM1; see Section 3.5) and the speed of the serial line. We also must specify a means to initialize the modem: the dialer mymodem option. A file /etc/uucp/dial should then contain an entry for our type of modem matching "mymodem" as follows. (This example assumes that an initialization string of AT&F1 is sufficient.)

dialer mymodem
chat "" AT&F1\r\d\c OK\r ATDT\D CONNECT
chat-fail RING
chat-fail NO\sCARRIER
chat-fail ERROR
chat-fail NO\sDIALTONE
chat-fail BUSY
chat-fail NO\sANSWER
chat-fail VOICE
complete \d\d+++\d\dATH\r\c
abort \d\d+++\d\dATH\r\c
More about modems and dialing is covered with pppd in Chapter 41.
With the modem properly specified, we can change our entry in the sys file to

system server1
call-login *
call-password *
time any
port ACU
phone 555-6789
protocol g

The same uux commands should now work over dialup.
34.4 tty/UUCP Lock Files

I hinted about lock files in Section 33.2. A more detailed explanation follows.
You will have noticed by now that several services use serial devices, and many of them can use the same device at different times. This creates a possible conflict should two services wish to use the same device at the same time. For instance, what if someone wants to send a fax, while another person is dialing in?
The solution is the UUCP lock file. This is a file created in /var/lock/ by a process, of the form LCK..<device>, that indicates the serial port is being used by that process. For instance, when running sendfax through a modem connected on /dev/ttyS0, a file /var/lock/LCK..ttyS0 suddenly appears. This is because sendfax, along with all other mgetty programs, obeys the UUCP lock file convention. The contents of this file are the process ID of the program using the serial device, so it is easy to check whether the lock file is bogus. A lock file belonging to such a dead process is called a stale lock file and can be removed manually.
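A quick way of checking for a stale lock (assuming, as is usual on LINUX, that the PID is stored in the file as plain text) is

cat /var/lock/LCK..ttyS0
ps -p `cat /var/lock/LCK..ttyS0` || echo "lock looks stale"

If ps reports no such process, the lock file can safely be deleted.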
34.5 Debugging uucp

uucp implementations rarely run smoothly the first time. Fortunately, you have available a variety of verbose debugging options. uucico takes the --debug option to specify the level of debug output. You should examine the files /var/log/uucp/Log, /var/log/uucp/Debug, and /var/log/uucp/Stats to get an idea about what is going on in the background. Also important is the spool directory, /var/spool/uucp/.

You can specify the debugging level with --debug <level>, where <level> is in the range of 0 through 11. You can also use --debug chat to see only modem communication details. A full description of the other --debug options follows (credits to the uucp documentation):
--debug abnormal     Output debugging messages for abnormal situations, such as recoverable errors.
--debug chat         Output debugging messages for chat scripts.
--debug handshake    Output debugging messages for the initial handshake.
--debug uucp-proto   Output debugging messages for the UUCP session protocol.
--debug proto        Output debugging messages for the individual link protocols.
--debug port         Output debugging messages for actions on the communication port.
--debug config       Output debugging messages while reading the configuration files.
--debug spooldir     Output debugging messages for actions in the spool directory.
--debug execute      Output debugging messages whenever another program is executed.
--debug incoming     List all incoming data in the debugging file.
--debug outgoing     List all outgoing data in the debugging file.
--debug all          All of the above.
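Debugging types can also be combined; Taylor UUCP accepts a comma-separated list, so a typical invocation while testing a connection might be

uucico --debug chat,handshake --force --system server1

which shows only the modem chat and the initial handshake.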
34.6 Using uux with exim

On machine1 we would like exim to spool all mail through uucp. Doing this merely requires a pipe transport (exim transports are discussed in Section 30.3.2). exim sends mail through stdin of the uux command and then forgets about it. uux is then responsible for executing rmail on server1. The complete exim.conf file is simply as follows.
#################### MAIN CONFIGURATION SETTINGS #####################
log_subject
errors_address = admin
local_domains = localhost : ${primary_hostname} : machine1 : \
      machine1.cranzgot.co.za
host_accept_relay = 127.0.0.1 : localhost : ${primary_hostname} : \
      machine1 : machine1.cranzgot.co.za
never_users = root
exim_user = mail
exim_group = mail
end

###################### TRANSPORTS CONFIGURATION ######################
uucp:
driver = pipe
user = nobody
command = "/usr/bin/uux - --nouucico ${host}!rmail \
      ${local_part}@${domain}"
return_fail_output = true

local_delivery:
driver = appendfile
file = /var/spool/mail/${local_part}
delivery_date_add
envelope_to_add
return_path_add
group = mail
mode_fail_narrower = false
mode = 0660
end

###################### DIRECTORS CONFIGURATION #######################
localuser:
driver = localuser
transport = local_delivery
end

###################### ROUTERS CONFIGURATION #########################
touucp:
driver = domainlist
route_list = "* server1"
transport = uucp
end

###################### RETRY CONFIGURATION ###########################
* * F,2m,1m
On machine server1, exim must however be running as a full-blown mail server to properly route the mail elsewhere. Of course, on server1, rmail is the sender; hence, it appears to exim that the mail is coming from the local machine. This means that no extra configuration is required to support mail coming from a uux command.

Note that you can add further domains to your route list so that your dialouts occur directly to the recipient's machine. For instance:
route_list = "machine2.cranzgot.co.za  machine2 ; \
              machine2                 machine2 ; \
              machine3.cranzgot.co.za  machine3 ; \
              machine3                 machine3 ; \
              *                        server1"
You can then add further entries to your /etc/uucp/sys file as follows:

system machine2
call-login *
call-password *
time any
port ACU
phone 555-6789
protocol g

system machine3
call-login *
call-password *
time any
port ACU
phone 554-3210
protocol g
The exim.conf file on server1 must also have a router to get mail back to machine1. The router will look like this:

###################### ROUTERS CONFIGURATION #########################
touucp:
driver = domainlist
route_list = "machine2.cranzgot.co.za  machine2 ; \
              machine2                 machine2 ; \
              machine3.cranzgot.co.za  machine3 ; \
              machine3                 machine3"
transport = uucp

lookuphost:
driver = lookuphost
transport = remote_smtp
end
This router sends all mail matching our dial-in hosts through the uucp transport while all other mail (destined for the Internet) falls through to the lookuphost router.
34.7 Scheduling Dialouts
Above, we used uucico only manually. uucico does not operate as a daemon process on its own and must be invoked by crond. All systems that use uucp have an /etc/crontab entry or a script under /etc/cron.hourly. A typical /etc/crontab for machine1 might contain:

45 * * * *        uucp    /usr/lib/uucp/uucico --master
40 8,13,18 * * *  root    /usr/bin/uux -r server1!
The option --master tells uucico to loop through all pending jobs and call any machines for which jobs are queued. It does this every hour. The second line queues a null command for server1 three times daily. This will force uucico to dial out to server1 at least three times a day on the appearance of real work to be done. The point of this is to pick up any jobs coming the other way. This process is known as creating a poll file.
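To see what is actually sitting in the queue at any time, the uustat utility from the same package is useful:

uustat -a

lists all jobs queued for all systems.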
Clearly, you can use uucp over a TCP link initiated by pppd. If a dial link is running in demand mode, a uucp call will trigger a dialout and make a straight TCP connection through to the remote host. A common situation occurs when a number of satellite systems are dialing an ISP that has no uucp facility. To service the satellite machines, a separate uucp server is deployed that has no modems of its own. The server has a permanent Internet connection and listens on TCP for uucp transfers.
35. The LINUX File System Standard

This chapter reproduces the Filesystem Hierarchy Standard, translated into LaTeX with some minor formatting changes and the addition of this book's chapter number to all the section headers. An original can be obtained from the FHS home page, http://www.pathname.com/fhs/.

If you have ever asked the questions "Where in my file system does file xxx go?" or "What is directory yyy for?", then consult this document. It can be considered to provide the final word on such matters. Although this is mostly a reference for people creating new LINUX distributions, all administrators can benefit from an understanding of the rulings and explanations provided here.
Filesystem Hierarchy Standard Group
edited by Rusty Russell and Daniel Quinlan
ABSTRACT
This standard consists of a set of requirements and guidelines for file and directory placement under UNIX-like operating systems. The guidelines are intended to support interoperability of applications, system administration tools, development tools, and scripts as well as greater uniformity of documentation for these systems.
May 23, 2001
All trademarks and copyrights are owned by their owners, unless specifically noted otherwise.
Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.
Copyright c 1994-2000 Daniel Quinlan
Copyright c 2001 Paul ‘Rusty’ Russell
Permission is granted to make and distribute verbatim copies of this standard provided the copyright and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this standard under the conditions for verbatim copying, provided also that the title page is labeled as modified including a reference to the original standard, provided that information on retrieving the original standard is included, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this standard into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the copyright holder.
35.1 Introduction
This standard enables
•
Software to predict the location of installed files and directories, and
•
Users to predict the location of installed files and directories.
We do this by
•
Specifying guiding principles for each area of the filesystem,
•
Specifying the minimum files and directories required,
•
Enumerating exceptions to the principles, and
•
Enumerating specific cases where there has been historical conflict.
The FHS document is used by
•
Independent software suppliers to create applications which are FHS compliant, and work with distributions which are FHS compliant,
•
OS creators to provide systems which are FHS compliant, and
•
Users to understand and maintain the FHS compliance of a system.
A constant-width font is used for displaying the names of files and directories.
Components of filenames that vary are represented by a description of the contents enclosed in "<" and ">" characters, <thus>. Electronic mail addresses are also enclosed in "<" and ">" but are shown in the usual typeface.

Optional components of filenames are enclosed in "[" and "]" characters and may be combined with the "<" and ">" convention. For example, if a filename is allowed to occur either with or without an extension, it might be represented by <filename>[.<extension>].

Variable substrings of directory names and filenames are indicated by "*".

35.2 The Filesystem

This standard assumes that the operating system underlying an FHS-compliant file system supports the same basic security features found in most UNIX filesystems.
It is possible to define two independent categories of files: shareable vs. unshareable and variable vs. static. There should be a simple and easily understandable mapping from directories to the type of data they contain: directories may be mount points for other filesystems with different characteristics from the filesystem on which they are mounted.
Shareable data is that which can be shared between several different hosts; unshareable is that which must be specific to a particular host. For example, user home directories are shareable data, but device lock files are not.
Static data includes binaries, libraries, documentation, and anything that does not change without system administrator intervention; variable data is anything else that does change without system administrator intervention.
BEGIN RATIONALE
The distinction between shareable and unshareable data is needed for several reasons:
•
In a networked environment (i.e., more than one host at a site), there is a good deal of data that can be shared between different hosts to save space and ease the task of maintenance.
•
In a networked environment, certain files contain information specific to a single host.
Therefore these filesystems cannot be shared (without taking special measures).
•
Historical implementations of UNIX-like filesystems interspersed shareable and unshareable data in the same hierarchy, making it difficult to share large portions of the filesystem.
The ”shareable” distinction can be used to support, for example:
•
A /usr partition (or components of /usr ) mounted (read-only) through the network
(using NFS).
•
A /usr partition (or components of /usr ) mounted from read-only media. A CD-
ROM is one copy of many identical ones distributed to other users by the postal mail system and other methods. It can thus be regarded as a read-only filesystem shared with other FHS-compliant systems by some kind of ”network”.
The ”static” versus ”variable” distinction affects the filesystem in two major ways:
•
Since / contains both variable and static data, it needs to be mounted read-write.
•
Since the traditional /usr contains both variable and static data, and since we may want to mount it read-only (see above), it is necessary to provide a method to have /usr mounted read-only. This is done through the creation of a /var hierarchy that is mounted read-write (or is a part of another read-write partition, such as /), taking over much of the /usr partition's traditional functionality.
Here is a summarizing chart. This chart is only an example for a common FHS-compliant system; other chart layouts are possible within FHS-compliance.

              shareable           unshareable
static        /usr                /etc
              /opt                /boot
variable      /var/mail           /var/run
              /var/spool/news     /var/lock
END RATIONALE
35.3 The Root Filesystem
The contents of the root filesystem must be adequate to boot, restore, recover, and/or repair the system.
•
To boot a system, enough must be present on the root partition to mount other filesystems.
This includes utilities, configuration, boot loader information, and other essential startup data.
/usr, /opt, and /var are designed such that they may be located on other partitions or filesystems.
•
To enable recovery and/or repair of a system, those utilities needed by an experienced maintainer to diagnose and reconstruct a damaged system must be present on the root filesystem.
•
To restore a system, those utilities needed to restore from system backups (on floppy, tape, etc.) must be present on the root filesystem.
BEGIN RATIONALE
The primary concern used to balance these considerations, which favor placing many things on the root filesystem, is the goal of keeping root as small as reasonably possible.
For several reasons, it is desirable to keep the root filesystem small:
•
It is occasionally mounted from very small media.
•
The root filesystem contains many system-specific configuration files. Possible examples include a kernel that is specific to the system, a specific hostname, etc. This means that the root filesystem isn’t always shareable between networked systems. Keeping it small on servers in networked systems minimizes the amount of lost space for areas of unshareable files. It also allows workstations with smaller local hard drives.
•
While you may have the root filesystem on a large partition, and may be able to fill it to your heart’s content, there will be people with smaller partitions. If you have more files installed, you may find incompatibilities with other systems using root filesystems on smaller partitions. If you are a developer then you may be turning your assumption into a problem for a large number of users.
•
Disk errors that corrupt data on the root filesystem are a greater problem than errors on any other partition. A small root filesystem is less prone to corruption as the result of a system crash.
Software must never create or require special files or subdirectories in the root directory.
Other locations in the FHS hierarchy provide more than enough flexibility for any package.
There are several reasons why introducing a new subdirectory of the root filesystem is prohibited:
•
It demands space on a root partition which the system administrator may want kept small and simple for either performance or security reasons.
•
It evades whatever discipline the system administrator may have set up for distributing standard file hierarchies across mountable volumes.
END RATIONALE
The following directories, or symbolic links to directories, are required in / .
/ ——— the root directory
bin     Essential command binaries
boot    Static files of the boot loader
dev     Device files
etc     Host-specific system configuration
lib     Essential shared libraries and kernel modules
mnt     Mount point for mounting a filesystem temporarily
opt     Add-on application software packages
sbin    Essential system binaries
tmp     Temporary files
usr     Secondary hierarchy
var     Variable data

Each directory listed above is specified in detail in separate subsections below. /usr and /var each have a complete section in this document due to the complexity of those directories.

The following directories, or symbolic links to directories, must be in /, if the corresponding subsystem is installed:

/ ——— the root directory
home         User home directories (optional)
lib<qual>    Alternate format essential shared libraries (optional)
root         Home directory for the root user (optional)

Each directory listed above is specified in detail in separate subsections below.
35.3.4.1 Purpose
/bin contains commands that may be used by both the system administrator and by users, but which are required when no other filesystems are mounted (e.g. in single user mode). It may also contain commands which are used indirectly by scripts. (Command binaries that are not essential enough to place into /bin must be placed in /usr/bin instead. Items that are required only by non-root users, such as the X Window System and chsh, are generally not essential enough to be placed into the root partition.)
35.3.4.2 Requirements
There must be no subdirectories in /bin .
The following commands, or symbolic links to commands, are required in /bin .
cat        Utility to concatenate files to standard output
chgrp      Utility to change file group ownership
chmod      Utility to change file access permissions
chown      Utility to change file owner and group
cp         Utility to copy files and directories
date       Utility to print or set the system date and time
dd         Utility to convert and copy a file
df         Utility to report filesystem disk space usage
dmesg      Utility to print or control the kernel message buffer
echo       Utility to display a line of text
false      Utility to do nothing, unsuccessfully
hostname   Utility to show or set the system's host name
kill       Utility to send signals to processes
ln         Utility to make links between files
login      Utility to begin a session on the system
ls         Utility to list directory contents
mkdir      Utility to make directories
mknod      Utility to make block or character special files
more       Utility to page through text
mount      Utility to mount a filesystem
mv         Utility to move/rename files
ps         Utility to report process status
pwd        Utility to print name of current working directory
rm         Utility to remove files or directories
rmdir      Utility to remove empty directories
sed        The 'sed' stream editor
sh         The Bourne command shell
stty       Utility to change and print terminal line settings
su         Utility to change user ID
sync       Utility to flush filesystem buffers
true       Utility to do nothing, successfully
umount     Utility to unmount file systems
uname      Utility to print system information
If /bin/sh is not a true Bourne shell, it must be a hard or symbolic link to the real shell command.
The [ and test commands must be placed together in either /bin or /usr/bin .
BEGIN RATIONALE
For example, bash behaves differently when called as sh or bash. The use of a symbolic link also allows users to easily see that /bin/sh is not a true Bourne shell.
The requirement for the [ and test commands to be included as binaries (even if implemented internally by the shell) is shared with the POSIX.2 standard.
END RATIONALE
35.3.4.3 Specific Options
The following programs, or symbolic links to programs, must be in /bin if the corresponding subsystem is installed:

csh        The C shell (optional)
ed         The 'ed' editor (optional)
tar        The tar archiving utility (optional)
cpio       The cpio archiving utility (optional)
gzip       The GNU compression utility (optional)
gunzip     The GNU uncompression utility (optional)
zcat       The GNU uncompression utility (optional)
netstat    The network statistics utility (optional)
ping       The ICMP network test utility (optional)

If the gunzip and zcat programs exist, they must be symbolic or hard links to gzip. /bin/csh may be a symbolic link to /bin/tcsh or /usr/bin/tcsh.
BEGIN RATIONALE
The tar, gzip and cpio commands have been added to make restoration of a system possible
(provided that / is intact).
Conversely, if no restoration from the root partition is ever expected, then these binaries might be omitted (e.g., a ROM chip root, mounting /usr through NFS). If restoration of a system is planned through the network, then ftp or tftp (along with everything necessary to get an ftp connection) must be available on the root partition.
END RATIONALE
35.3.5.1 Purpose
This directory contains everything required for the boot process except configuration files and the map installer. Thus /boot stores data that is used before the kernel begins executing user-mode programs. This may include saved master boot sectors, sector map files, and other data that is not directly edited by hand. (Programs necessary to arrange for the boot loader to be able to boot a file must be placed in /sbin. Configuration files for boot loaders must be placed in /etc.)

35.3.5.2 Specific Options

The operating system kernel must be located in either / or /boot. (On some i386 machines, it may be necessary for /boot to be located on a separate partition located completely below cylinder 1024 of the boot device, due to hardware constraints. Certain MIPS systems require a /boot partition that is a mounted MS-DOS filesystem or whatever other filesystem type is accessible for the firmware. This may result in restrictions with respect to usable filenames within /boot, only for affected systems.)
35.3.6.1 Purpose
The /dev directory is the location of special or device files.
35.3.6.2 Specific Options
If it is possible that devices in /dev will need to be manually created, /dev must contain a command named MAKEDEV , which can create devices as needed. It may also contain a
MAKEDEV.local
for any local devices.
If required, MAKEDEV must have provisions for creating any device that may be found on the system, not just those that a particular implementation installs.
35.3.7.1 Purpose
/etc contains configuration files and directories that are specific to the current system. (The setup of command scripts invoked at boot time may resemble System V, BSD or other models. Further specification in this area may be added to a future version of this standard.)
35.3.7.2 Requirements
No binaries may be located under /etc .
The following directories, or symbolic links to directories, are required in /etc:

/etc ——— Host-specific system configuration
opt     Configuration for /opt
35.3.7.3 Specific Options
The following directories, or symbolic links to directories, must be in /etc, if the corresponding subsystem is installed:

/etc ——— Host-specific system configuration
X11     Configuration for the X Window System (optional)
sgml    Configuration for SGML and XML (optional)
The following files, or symbolic links to files, must be in /etc if the corresponding subsystem is installed. (Systems that use the shadow password suite will have additional configuration files in /etc, such as /etc/shadow, and programs in /usr/sbin, such as useradd and usermod.)

csh.login      Systemwide initialization file for C shell logins (optional)
exports        NFS filesystem access control list (optional)
fstab          Static information about filesystems (optional)
ftpusers       FTP daemon user access control list (optional)
gateways       File which lists gateways for routed (optional)
gettydefs      Speed and terminal settings used by getty (optional)
group          User group file (optional)
host.conf      Resolver configuration file (optional)
hosts          Static information about host names (optional)
hosts.allow    Host access file for TCP wrappers (optional)
hosts.deny     Host access file for TCP wrappers (optional)
hosts.equiv    List of trusted hosts for rlogin, rsh, rcp (optional)
hosts.lpd      List of trusted hosts for lpd (optional)
inetd.conf     Configuration file for inetd (optional)
inittab        Configuration file for init (optional)
issue          Pre-login message and identification file (optional)
ld.so.conf     List of extra directories to search for shared libraries (optional)
motd           Post-login message of the day file (optional)
mtab           Dynamic information about filesystems (optional)
mtools.conf    Configuration file for mtools (optional)
networks       Static information about network names (optional)
passwd         The password file (optional)
printcap       The lpd printer capability database (optional)
profile        Systemwide initialization file for sh shell logins (optional)
protocols      IP protocol listing (optional)
resolv.conf    Resolver configuration file (optional)
rpc            RPC protocol listing (optional)
securetty      TTY access control for root login (optional)
services       Port names for network services (optional)
shells         Pathnames of valid login shells (optional)
syslog.conf    Configuration file for syslogd (optional)
mtab does not fit the static nature of /etc: it is excepted for historical reasons. (On some Linux systems, mtab may be a symbolic link to /proc/mounts, in which case this exception is not required.)
35.3.7.4 /etc/opt : Configuration files for /opt
35.3.7.4.1 Purpose
Host-specific configuration files for add-on application software packages must be installed within the directory /etc/opt/<package>, where <package> is the name of the subtree in /opt where the static data from that package is stored.

35.3.7.4.2 Requirements

No structure is imposed on the internal arrangement of /etc/opt/<package>.

If a configuration file must reside in a different location in order for the package or system to function properly, it may be placed in a location other than /etc/opt/<package>.
BEGIN RATIONALE
Refer to the rationale for /opt .
END RATIONALE
35.3.7.5 /etc/X11 : Configuration for the X Window System (optional)
35.3.7.5.1 Purpose
/etc/X11 is the location for all X11 host-specific configuration. This directory is necessary to allow local control if /usr is mounted read only.
35.3.7.5.2 Specific Options
The following files, or symbolic links to files, must be in /etc/X11 if the corresponding subsystem is installed:
Xconfig       The configuration file for early versions of XFree86 (optional)
XF86Config    The configuration file for XFree86 versions 3 and 4 (optional)
Xmodmap       Global X11 keyboard modification file (optional)
Subdirectories of /etc/X11 may include those for xdm and for any other programs (some window managers, for example) that need them. (/etc/X11/xdm holds the configuration files for xdm. These are most of the files previously found in /usr/lib/X11/xdm. Some local variable data for xdm is stored in /var/lib/xdm.)

We recommend that window managers with only one configuration file which is a default .*wmrc file must name it system.*wmrc (unless there is a widely-accepted alternative name) and not use a subdirectory. Any window manager subdirectories must be identically named to the actual window manager binary.
35.3.7.6 /etc/sgml : Configuration files for SGML and XML (optional)
35.3.7.6.1 Purpose
Generic configuration files defining high-level parameters of the SGML or XML systems are installed here. Files with names *.conf indicate generic configuration files. Files with names *.cat are the DTD-specific centralized catalogs, containing references to all other catalogs needed to use the given DTD. The super catalog file catalog references all the centralized catalogs.
35.3.8.1 Purpose
/home is a fairly standard concept, but it is clearly a site-specific filesystem. The setup will differ from host to host. Therefore, no program should rely on this location. (Different people prefer to place user accounts in a variety of places. This section describes only a suggested placement for user home directories; nevertheless we recommend that all FHS-compliant distributions use this as the default location for home directories. On small systems, each user's directory is typically one of the many subdirectories of /home such as /home/smith, /home/torvalds, /home/operator, etc. On large systems, especially when the /home directories are shared amongst many hosts using NFS, it is useful to subdivide user home directories. Subdivision may be accomplished by using subdirectories such as /home/staff, /home/guests, /home/students, etc. If you want to find out a user's home directory, you should use the getpwent(3) library function rather than relying on /etc/passwd, because user information may be stored remotely using systems such as NIS.)
35.3.9.1 Purpose
The /lib directory contains those shared library images needed to boot the system and run the commands in the root filesystem, i.e., by binaries in /bin and /sbin. (Shared libraries that are only necessary for binaries in /usr, such as any X Window binaries, must not be in /lib. Only the shared libraries required to run binaries in /bin and /sbin may be here. In particular, the library libm.so.* may also be placed in /usr/lib if it is not required by anything in /bin or /sbin.)
35.3.9.2 Requirements
At least one of each of the following filename patterns are required (they may be files, or symbolic links):

libc.so.*    The dynamically-linked C library (optional)
ld*          The execution time linker/loader (optional)
If a C preprocessor is installed, /lib/cpp must be a reference to it, for historical reasons. (The usual placement of this binary is /usr/lib/gcc-lib/<target>/<version>/cpp. /lib/cpp can either point at this binary or at any other reference to this binary which exists in the filesystem; /usr/bin/cpp is also often used.)
35.3.9.3 Specific Options
The following directories, or symbolic links to directories, must be in /lib, if the corresponding subsystem is installed:

/lib ——— Essential shared libraries and kernel modules
modules    Loadable kernel modules (optional)
35.3.10.1 Purpose

There may be one or more variants of the /lib directory on systems which support more than one binary format requiring separate libraries. (This is commonly used for 64-bit or 32-bit support on systems which support multiple binary formats, but require libraries of the same name. In this case, /lib32 and /lib64 might be the library directories, and /lib a symlink to one of them.)

35.3.10.2 Requirements

If one or more of these directories exist, the requirements for their contents are the same as the normal /lib directory, except that /lib<qual>/cpp is not required. (/lib<qual>/cpp is still permitted: this allows the case where /lib and /lib<qual> are the same, one being a symbolic link to the other.)
35.3.11.1 Purpose
This directory is provided so that the system administrator may temporarily mount a filesystem as needed. The content of this directory is a local issue and should not affect the manner in which any program is run.
This directory must not be used by installation programs: a suitable temporary directory not in use by the system must be used instead.
35.3.12.1 Purpose
/opt is reserved for the installation of add-on application software packages.
A package to be installed in /opt must locate its static files in a separate /opt/<package> directory tree, where <package> is a name that describes the software package.
35.3.12.2 Requirements
/opt ——— Add-on application software packages
<package>    Static package objects
The directories /opt/bin , /opt/doc , /opt/include , /opt/info , /opt/lib , and
/opt/man are reserved for local system administrator use. Packages may provide ”front-end” files intended to be placed in (by linking or copying) these reserved directories by the local system administrator, but must function normally in the absence of these reserved directories.
Programs to be invoked by users must be located in the directory /opt/<package>/bin. If the package includes UNIX manual pages, they must be located in /opt/<package>/man and the same substructure as /usr/share/man must be used.
Package files that are variable (change in normal operation) must be installed in /var/opt . See the section on /var/opt for more information.
Host-specific configuration files must be installed in /etc/opt. See the section on /etc for more information.
No other package files may exist outside the /opt , /var/opt , and /etc/opt hierarchies except for those package files that must reside in specific locations within the filesystem tree in order to function properly. For example, device lock files must be placed in /var/lock and devices must be located in /dev .
Distributions may install software in /opt , but must not modify or delete software installed by the local system administrator without the assent of the local system administrator.
BEGIN RATIONALE
The use of /opt for add-on software is a well-established practice in the UNIX community.
The System V Application Binary Interface [AT&T 1990], based on the System V Interface
Definition (Third Edition), provides for an /opt structure very similar to the one defined here.
The Intel Binary Compatibility Standard v. 2 (iBCS2) also provides a similar structure for
/opt .
Generally, all data required to support a package on a system must be present within /opt/<package>, including files intended to be copied into /etc/opt/<package> and /var/opt/<package> as well as reserved directories in /opt.
The minor restrictions on distributions using /opt are necessary because conflicts are possible between distribution-installed and locally-installed software, especially in the case of fixed pathnames found in some binary software.
END RATIONALE
35.3.13.1 Purpose
The root account’s home directory may be determined by developer or local preference, but this is the recommended default location.
14
35.3.14.1 Purpose
Utilities used for system administration (and other root-only commands) are stored in /sbin ,
/usr/sbin , and /usr/local/sbin .
/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin .
15
Programs executed after /usr is known to be mounted (when there are no problems) are generally placed into /usr/sbin. Locally-installed system administration programs should be placed into /usr/local/sbin.
35.3.14.2 Requirements
The following commands, or symbolic links to commands, are required in /sbin .
shutdown Command to bring the system down.
35.3.14.3 Specific Options
The following files, or symbolic links to files, must be in /sbin if the corresponding subsystem is installed:
14
If the home directory of the root account is not stored on the root partition it will be necessary to make certain it will default to / if it can not be located.
We recommend against using the root account for tasks that can be performed as an unprivileged user, and that it be used solely for system administration. For this reason, we recommend that subdirectories for mail and other applications not appear in the root account’s home directory, and that mail for administration roles such as root, postmaster, and webmaster be forwarded to an appropriate user.
15
Originally, /sbin binaries were kept in /etc .
16
Deciding what things go into "sbin" directories is simple: if a normal (not a system administrator) user will ever run it directly, then it must be placed in one of the "bin" directories. Ordinary users should not have to place any of the sbin directories in their path. For example, files such as chfn, which users only occasionally use, must still be placed in /usr/bin. ping, although it is absolutely necessary for root (network recovery and diagnosis), is often used by users and must live in /bin for that reason. We recommend that users have read and execute permission for everything in /sbin except, perhaps, certain setuid and setgid programs. The division between /bin and /sbin was not created for security reasons or to prevent users from seeing the operating system, but to provide a good partition between binaries that everyone uses and ones that are primarily used for administration tasks. There is no inherent security advantage in making /sbin off-limits for users.
fastboot    Reboot the system without checking the disks (optional)
fasthalt    Stop the system without checking the disks (optional)
fdisk       Partition table manipulator (optional)
fsck        File system check and repair utility (optional)
fsck.*      File system check and repair utility for a specific filesystem (optional)
getty       The getty program (optional)
halt        Command to stop the system (optional)
ifconfig    Configure a network interface (optional)
init        Initial process (optional)
mkfs        Command to build a filesystem (optional)
mkfs.*      Command to build a specific filesystem (optional)
mkswap      Command to set up a swap area (optional)
reboot      Command to reboot the system (optional)
route       IP routing table utility (optional)
swapon      Enable paging and swapping (optional)
swapoff     Disable paging and swapping (optional)
update      Daemon to periodically flush filesystem buffers (optional)
35.3.15.1 Purpose
The /tmp directory must be made available for programs that require temporary files.
Programs must not assume that any files or directories in /tmp are preserved between invocations of the program.
BEGIN RATIONALE
IEEE standard P1003.2 (POSIX, part 2) makes requirements that are similar to the above section.
Although data stored in /tmp may be deleted in a site-specific manner, it is recommended that files and directories located in /tmp be deleted whenever the system is booted.
FHS added this recommendation on the basis of historical precedent and common practice, but did not make it a requirement because system administration is not within the scope of this standard.
END RATIONALE
35.4 The /usr Hierarchy

/usr is the second major section of the filesystem. /usr is shareable, read-only data. That means that /usr should be shareable between various FHS-compliant hosts and must not be written to. Any information that is host-specific or varies with time is stored elsewhere.
Large software packages must not use a direct subdirectory under the /usr hierarchy.
The following directories, or symbolic links to directories, are required in /usr.

/usr ——— Secondary Hierarchy
bin       Most user commands
include   Header files included by C programs
lib       Libraries
local     Local hierarchy (empty after main installation)
sbin      Non-vital system binaries
share     Architecture-independent data

The following directories, or symbolic links to directories, must be in /usr, if the corresponding subsystem is installed:

/usr ——— Secondary Hierarchy
X11R6        X Window System, version 11 release 6 (optional)
games        Games and educational binaries (optional)
lib<qual>    Alternate Format Libraries (optional)
src          Source code (optional)
An exception is made for the X Window System because of considerable precedent and widelyaccepted practice.
The following symbolic links to directories may be present. This possibility is based on the need to preserve compatibility with older systems until all implementations can be assumed to use the /var hierarchy.
/usr/spool -> /var/spool
/usr/tmp -> /var/tmp
/usr/spool/locks -> /var/lock
Once a system no longer requires any one of the above symbolic links, the link may be removed, if desired.
35.4.4.1 Purpose
This hierarchy is reserved for the X Window System, version 11 release 6, and related files.
To simplify matters and make XFree86 more compatible with the X Window System on other systems, the following symbolic links must be present if /usr/X11R6 exists:
/usr/bin/X11 -> /usr/X11R6/bin
/usr/lib/X11 -> /usr/X11R6/lib/X11
/usr/include/X11 -> /usr/X11R6/include/X11
In general, software must not be installed or managed via the above symbolic links. They are intended for utilization by users only. The difficulty is related to the release version of the X
Window System — in transitional periods, it is impossible to know what release of X11 is in use.
35.4.4.2 Specific Options
Host-specific data in /usr/X11R6/lib/X11 should be interpreted as a demonstration file. Applications requiring information about the current host must reference a configuration file in
/etc/X11, which may be linked to a file in /usr/X11R6/lib. (Examples of such configuration files include Xconfig, XF86Config, or system.twmrc.)
35.4.5.1 Purpose
This is the primary directory of executable commands on the system.
35.4.5.2 Specific Options
The following directories, or symbolic links to directories, must be in /usr/bin, if the corresponding subsystem is installed:

/usr/bin ——— Binaries that are not needed in single-user mode
mh    Commands for the MH mail handling system (optional)

/usr/bin/X11 must be a symlink to /usr/X11R6/bin if the latter exists.

The following files, or symbolic links to files, must be in /usr/bin, if the corresponding subsystem is installed:

perl      The Practical Extraction and Report Language (optional)
python    The Python interpreted language (optional)
tclsh     Simple shell containing Tcl interpreter (optional)
wish      Simple Tcl/Tk windowing shell (optional)
expect    Program for interactive dialog (optional)
BEGIN RATIONALE
Because shell script interpreters (invoked with #!<path> on the first line of a shell script) cannot rely on a path, it is advantageous to standardize their locations. The Bourne shell and C-shell interpreters are already fixed in /bin, but Perl, Python, and Tcl are often found in many different places. They may be symlinks to the physical location of the shell interpreters.
END RATIONALE
35.4.6.1 Purpose
This is where all of the system’s general-use include files for the C programming language should be placed.
35.4.6.2 Specific Options
The following directories, or symbolic links to directories, must be in /usr/include, if the corresponding subsystem is installed:

/usr/include ——— Include files
bsd    BSD compatibility include files (optional)

The symbolic link /usr/include/X11 must link to /usr/X11R6/include/X11 if the latter exists.
35.4.7.1 Purpose
/usr/lib includes object files, libraries, and internal binaries that are not intended to be executed directly by users or shell scripts. (Miscellaneous architecture-independent application-specific static files and subdirectories must be placed in /usr/share.)
Applications may use a single subdirectory under /usr/lib (for example, the perl5 subdirectory for Perl 5 modules and libraries). If an application uses a subdirectory, all architecture-dependent data exclusively used by the application must be placed within that subdirectory.
35.4.7.2 Specific Options
For historical reasons, /usr/lib/sendmail must be a symbolic link to /usr/sbin/sendmail if the latter exists. (Some executable commands such as makewhatis and sendmail have also traditionally been placed in /usr/lib. makewhatis is an internal binary and must be placed in a binary directory; users access only catman. Newer sendmail binaries are now placed by default in /usr/sbin. Additionally, systems using a sendmail-compatible mail transfer agent must provide /usr/sbin/sendmail as a symbolic link to the appropriate executable.)
If /lib/X11 exists, /usr/lib/X11 must be a symbolic link to /lib/X11, or to whatever /lib/X11 is a symbolic link to. (Host-specific data for the X Window System must not be stored in /usr/lib/X11. Host-specific configuration files such as Xconfig or XF86Config must be stored in /etc/X11. This includes configuration data such as system.twmrc, even if it is only made a symbolic link to a more global configuration file, probably in /usr/X11R6/lib/X11.)
35.4.8.1 Purpose
/usr/lib<qual> performs the same role as /usr/lib for an alternate binary format, except that the symbolic links /usr/lib<qual>/sendmail and /usr/lib<qual>/X11 are not required. (In the case where /usr/lib and /usr/lib<qual> are the same, that is, one is a symbolic link to the other, these files and the per-application subdirectories will exist.)
35.4.9.1 Purpose
The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr.
Locally installed software must be placed within /usr/local rather than /usr unless it is being installed to replace or upgrade software in /usr. (Software placed in / or /usr may be overwritten by system upgrades, though we recommend that distributions do not overwrite data in /etc under these circumstances. For this reason, local software must not be placed outside of /usr/local without good reason.)
35.4.9.2 Requirements
The following directories, or symbolic links to directories, must be in /usr/local:

/usr/local ——— Local hierarchy
    bin       Local binaries
    games     Local game binaries
    include   Local C header files
    lib       Local libraries
    man       Local online manuals
    sbin      Local system binaries
    share     Local architecture-independent hierarchy
    src       Local source code

No other directories, except those listed below, may be in /usr/local after first installing a FHS-compliant system.
35.4.9.3 Specific Options
If directories /lib<qual> or /usr/lib<qual> exist, the equivalent directories must also exist in /usr/local.
35.4.10.1 Purpose
This directory contains any non-essential binaries used exclusively by the system administrator.
System administration programs that are required for system repair, system recovery, mounting /usr, or other essential functions must be placed in /sbin instead. (Locally installed system administration programs should be placed in /usr/local/sbin.)
35.4.11.1 Purpose
The /usr/share hierarchy is for all read-only architecture independent data files. (Much of this data originally lived in /usr, such as man and doc, or in /usr/lib, such as dict, terminfo, and zoneinfo.)
This hierarchy is intended to be shareable among all architecture platforms of a given OS; thus, for example, a site with i386, Alpha, and PPC platforms might maintain a single /usr/share directory that is centrally-mounted. Note, however, that /usr/share is generally not intended to be shared by different OSes or by different releases of the same OS.
Any program or package which contains or requires data that doesn’t need to be modified should store that data in /usr/share (or /usr/local/share , if installed locally). It is recommended that a subdirectory be used in /usr/share for this purpose.
Game data stored in /usr/share/games must be purely static data. Any modifiable files, such as score files, game play logs, and so forth, should be placed in /var/games .
35.4.11.2 Requirements
The following directories, or symbolic links to directories, must be in /usr/share:

/usr/share ——— Architecture-independent data
    man    Online manuals
    misc   Miscellaneous architecture-independent data
35.4.11.3 Specific Options
The following directories, or symbolic links to directories, must be in /usr/share , if the corresponding subsystem is installed:
/usr/share ——— Architecture-independent data
    dict       Word lists (optional)
    doc        Miscellaneous documentation (optional)
    games      Static data files for /usr/games (optional)
    info       GNU Info system's primary directory (optional)
    locale     Locale information (optional)
    nls        Message catalogs for Native language support (optional)
    sgml       SGML and XML data (optional)
    terminfo   Directories for terminfo database (optional)
    tmac       troff macros not distributed with groff (optional)
    zoneinfo   Timezone information and configuration (optional)

It is recommended that application-specific, architecture-independent directories be placed here. Such directories include groff, perl, ghostscript, texmf, and kbd (Linux) or syscons (BSD). They may, however, be placed in /usr/lib for backwards compatibility, at the distributor's discretion. Similarly, a /usr/lib/games hierarchy may be used in addition to the /usr/share/games hierarchy if the distributor wishes to place some game data there.
35.4.11.4 /usr/share/dict : Word lists (optional)
35.4.11.4.1 Purpose
This directory is the home for word lists on the system; traditionally this directory contains only the English words file, which is used by look(1) and various spelling programs. words may use either American or British spelling.
BEGIN RATIONALE
The reason that only word lists are located here is that they are the only files common to all spell checkers.
END RATIONALE
35.4.11.4.2 Specific Options
The following files, or symbolic links to files, must be in /usr/share/dict, if the corresponding subsystem is installed:

    words    List of English words (optional)

Sites that require both American and British spelling may link words to /usr/share/dict/american-english or /usr/share/dict/british-english.
Word lists for other languages may be added using the English name for that language, e.g.,
/usr/share/dict/french , /usr/share/dict/danish , etc. These should, if possible, use an ISO 8859 character set which is appropriate for the language in question; if possible the Latin1
(ISO 8859-1) character set should be used (this is often not possible).
Other word lists must be included here, if present.
35.4.11.5 /usr/share/man : Manual pages
35.4.11.5.1 Purpose
This section details the organization for manual pages throughout the system, including
/usr/share/man . Also refer to the section on /var/cache/man .
The primary <mandir> of the system is /usr/share/man. /usr/share/man contains manual information for commands and data under the / and /usr filesystems. (Obviously, there are no manual pages in / because they are not required at boot time nor are they required in emergencies. Really.)
Manual pages are stored in <mandir>/<locale>/man<section>/<arch>. An explanation of <mandir>, <locale>, <section>, and <arch> is given below.
A description of each section follows:
•
man1 : User programs
Manual pages that describe publicly accessible commands are contained in this chapter.
Most program documentation that a user will need to use is located here.
•
man2 : System calls
This section describes all of the system calls (requests for the kernel to perform operations).
•
man3 : Library functions and subroutines
Section 3 describes program library routines that are not direct calls to kernel services.
This and chapter 2 are only really of interest to programmers.
•
man4 : Special files
Section 4 describes the special files, related driver functions, and networking support available in the system. Typically, this includes the device files found in /dev and the kernel interface to networking protocol support.
•
man5 : File formats
The formats for many data files are documented in section 5. This includes various include files, program output files, and system files.
•
man6 : Games
This chapter documents games, demos, and generally trivial programs. Different people have various notions about how essential this is.
•
man7 : Miscellaneous
Manual pages that are difficult to classify are designated as being section 7. The troff and other text processing macro packages are found here.
•
man8 : System administration
Programs used by system administrators for system operation and maintenance are documented here. Some of these programs are also occasionally useful for normal users.
35.4.11.5.2 Specific Options
The following directories, or symbolic links to directories, must be in /usr/share/<mandir>/<locale>, unless they are empty (for example, if /usr/local/man has no manual pages in section 4, then /usr/local/man/man4 may be omitted):

<mandir>/<locale> ——— A manual page hierarchy
    man1   User programs (optional)
    man2   System calls (optional)
    man3   Library calls (optional)
    man4   Special files (optional)
    man5   File formats (optional)
    man6   Games (optional)
    man7   Miscellaneous (optional)
    man8   System administration (optional)

The component <section> describes the manual section.
Provisions must be made in the structure of /usr/share/man to support manual pages which are written in different (or multiple) languages. These provisions must take into account the storage and reference of these manual pages. Relevant factors include language (including geographical-based differences), and character code set.
This naming of language subdirectories of /usr/share/man is based on Appendix E of the POSIX 1003.1 standard which describes the locale identification string — the most well-accepted method to describe a cultural environment. The <locale> string is:

    <language>[_<territory>][.<character-set>][,<version>]
The <language> field must be taken from ISO 639 (a code for the representation of names of languages). It must be two characters wide and specified with lowercase letters only.
The <territory> field must be the two-letter code of ISO 3166 (a specification of representations of countries), if possible. (Most people are familiar with the two-letter codes used for the country codes in email addresses. A major exception to this rule is the United Kingdom, which is 'GB' in ISO 3166, but 'UK' for most email addresses.) It must be two characters wide and specified with uppercase letters only.
The <character-set> field must represent the standard describing the character set. If the <character-set> field is just a numeric specification, the number represents the number of the international standard describing the character set. It is recommended that this be a numeric representation if possible (ISO standards, especially), not include additional punctuation symbols, and that any letters be in lowercase.
A parameter specifying a <version> of the profile may be placed after the <character-set> field, delimited by a comma. This may be used to discriminate between different cultural needs; for instance, dictionary order versus a more systems-oriented collating order.
This standard recommends not using the <version> field, unless it is necessary.
Systems which use a unique language and code set for all manual pages may omit the <locale> substring and store all manual pages in <mandir>. For example, systems which only have English manual pages coded with ASCII may store manual pages (the man<section> directories) directly in /usr/share/man. (That is the traditional circumstance and arrangement, in fact.)
Countries for which there is a well-accepted standard character code set may omit the <character-set> field, but it is strongly recommended that it be included, especially for countries with several competing standards.
Various examples:

Language   Territory        Character Set     Directory
English    —                ASCII             /usr/share/man/en
English    United Kingdom   ASCII             /usr/share/man/en_GB
English    United States    ASCII             /usr/share/man/en_US
French     Canada           ISO 8859-1        /usr/share/man/fr_CA
French     France           ISO 8859-1        /usr/share/man/fr_FR
German     Germany          ISO 646           /usr/share/man/de_DE.646
German     Germany          ISO 6937          /usr/share/man/de_DE.6937
German     Germany          ISO 8859-1        /usr/share/man/de_DE.88591
German     Switzerland      ISO 646           /usr/share/man/de_CH.646
Japanese   Japan            JIS               /usr/share/man/ja_JP.jis
Japanese   Japan            SJIS              /usr/share/man/ja_JP.sjis
Japanese   Japan            UJIS (or EUC-J)   /usr/share/man/ja_JP.ujis
Similarly, provision must be made for manual pages which are architecture-dependent, such as documentation on device-drivers or low-level system administration commands.
These must be placed under an <arch> directory in the appropriate man<section> directory; for example, a man page for the i386 ctrlaltdel(8) command might be placed in /usr/share/man/<locale>/man8/i386/ctrlaltdel.8.
Manual pages for commands and data under /usr/local are stored in /usr/local/man. Manual pages for X11R6 are stored in /usr/X11R6/man. It follows that all manual page hierarchies in the system must have the same structure as /usr/share/man.
The cat page sections (cat<section>) containing formatted manual page entries are also found within subdirectories of <mandir>/<locale>, but are not required nor may they be distributed in lieu of nroff source manual pages.
The numbered sections "1" through "8" are traditionally defined. In general, the file name for manual pages located within a particular section ends with .<section>.
In addition, some large sets of application-specific manual pages have an additional suffix appended to the manual page filename. For example, the MH mail handling system manual pages must have mh appended to all MH manuals. All X Window System manual pages must have an x appended to the filename.
The practice of placing various language manual pages in appropriate subdirectories of
/usr/share/man also applies to the other manual page hierarchies, such as /usr/local/man and /usr/X11R6/man . (This portion of the standard also applies later in the section on the optional /var/cache/man structure.)
35.4.11.6 /usr/share/misc : Miscellaneous architecture-independent data
This directory contains miscellaneous architecture-independent files which don’t require a separate subdirectory under /usr/share .
35.4.11.6.1 Specific Options
The following files, or symbolic links to files, must be in /usr/share/misc, if the corresponding subsystem is installed:

    ascii        ASCII character set table (optional)
    magic        Default list of magic numbers for the file command (optional)
    termcap      Terminal capability database (optional)
    termcap.db   Terminal capability database (optional)

Other (application-specific) files may appear here, but a distributor may place them in /usr/lib at their discretion. (Some such files include airport, birthtoken, eqnchar, getopt, gprof.callg, gprof.flat, inter.phone, ipfw.samp.filters, ipfw.samp.scripts, keycap.pcvt, mail.help, mail.tildehelp, man.template, map3270, mdoc.template, more.help, na.phone, nslookup.help, operator, scsi_modes, sendmail.hf, style, units.lib, vgrindefs, vgrindefs.db, and zipcodes.)
35.4.11.7 /usr/share/sgml : SGML and XML data (optional)
35.4.11.7.1 Purpose
/usr/share/sgml contains architecture-independent files used by SGML or XML applications, such as ordinary catalogs (not the centralized ones, see /etc/sgml), DTDs, entities, or style sheets.
35.4.11.7.2 Specific Options
The following directories, or symbolic links to directories, must be in /usr/share/sgml , if the corresponding subsystem is installed:
/usr/share/sgml ——— SGML and XML data
    docbook   docbook DTD (optional)
    tei       tei DTD (optional)
    html      html DTD (optional)
    mathml    mathml DTD (optional)
Other files that are not specific to a given DTD may reside in their own subdirectory.
35.4.12.1 Purpose
Any non-local source code should be placed in this subdirectory.
/var contains variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
Some portions of /var are not shareable between different systems, for instance, /var/log, /var/lock, and /var/run. Other portions may be shared, notably /var/mail, /var/cache/man, /var/cache/fonts, and /var/spool/news.
/var is specified here in order to make it possible to mount /usr read-only. Everything that once went into /usr that is written to during system operation (as opposed to installation and software maintenance) must be in /var .
If /var cannot be made a separate partition, it is often preferable to move /var out of the root partition and into the /usr partition. (This is sometimes done to reduce the size of the root partition or when space runs low in the root partition.) However, /var must not be linked to
/usr because this makes separation of /usr and /var more difficult and is likely to create a naming conflict. Instead, link /var to /usr/var .
Applications must generally not add directories to the top level of /var . Such directories should only be added if they have some system-wide implication, and in consultation with the FHS mailing list.
The following directories, or symbolic links to directories, are required in /var .
/var ——— Variable data
    cache   Application cache data
    lib     Variable state information
    local   Variable data for /usr/local
    lock    Lock files
    log     Log files and directories
    opt     Variable data for /opt
    run     Data relevant to running processes
    spool   Application spool data
    tmp     Temporary files preserved between system reboots
Several directories are ‘reserved’ in the sense that they must not be used arbitrarily by some new application, since they would conflict with historical and/or local practice. They are:
/var/backups
/var/cron
/var/msgs
/var/preserve
The following directories, or symbolic links to directories, must be in /var, if the corresponding subsystem is installed:

/var ——— Variable data
    account   Process accounting logs (optional)
    crash     System crash dumps (optional)
    games     Variable game data (optional)
    mail      User mailbox files (optional)
    yp        Network Information Service (NIS) database files (optional)
35.5.4.1 Purpose
This directory holds the current active process accounting log and the composite process usage data (as used in some UNIX-like systems by lastcomm and sa).
35.5.5.1 Purpose
/var/cache is intended for cached data from applications. Such data is locally generated as a result of time-consuming I/O or calculation. The application must be able to regenerate or restore the data. Unlike /var/spool, the cached files can be deleted without data loss. The data must remain valid between invocations of the application and rebooting the system.
Files located under /var/cache may be expired in an application specific manner, by the system administrator, or both. The application must always be able to recover from manual deletion of these files (generally because of a disk space shortage). No other requirements are made on the data format of the cache directories.
BEGIN RATIONALE
The existence of a separate directory for cached data allows system administrators to set different disk and backup policies from other directories in /var .
END RATIONALE
35.5.5.2 Specific Options
/var/cache ——— Cache directories
    fonts       Locally-generated fonts (optional)
    man         Locally-formatted manual pages (optional)
    www         WWW proxy or cache data (optional)
    <package>   Package specific cache data (optional)
35.5.5.3 /var/cache/fonts : Locally-generated fonts (optional)
35.5.5.3.1 Purpose
The directory /var/cache/fonts should be used to store any dynamically-created fonts. In particular, all of the fonts which are automatically generated by mktexpk must be located in appropriately-named subdirectories of /var/cache/fonts. (This standard does not currently incorporate the TeX Directory Structure, a document that describes the layout of TeX files and directories, but it may be useful reading. It is located at ftp://ctan.tug.org/tex/.)
35.5.5.3.2 Specific Options
Other dynamically created fonts may also be placed in this tree, under appropriately-named subdirectories of /var/cache/fonts .
35.5.5.4 /var/cache/man : Locally-formatted manual pages (optional)
35.5.5.4.1 Purpose
This directory provides a standard location for sites that provide a read-only /usr partition, but wish to allow caching of locally-formatted man pages. Sites that mount /usr as writable (e.g., single-user installations) may choose not to use /var/cache/man and may write formatted man pages into the cat<section> directories in /usr/share/man directly. We recommend that most sites use one of the following options instead:
•
Preformat all manual pages alongside the unformatted versions.
•
Allow no caching of formatted man pages, and require formatting to be done each time a man page is brought up.
•
Allow local caching of formatted man pages in /var/cache/man .
The structure of /var/cache/man needs to reflect both the fact of multiple man page hierarchies and the possibility of multiple language support.
Given an unformatted manual page that normally appears in <path>/man/<locale>/man<section>, the directory to place formatted man pages in is /var/cache/man/<catpath>/<locale>/cat<section>, where <catpath> is derived from <path> by removing any leading usr and/or trailing share pathname components. For example, /usr/share/man/man1/ls.1 is formatted into /var/cache/man/cat1/ls.1, and /usr/X11R6/man/<locale>/man3/XtClass.3x into /var/cache/man/X11R6/<locale>/cat3/XtClass.3x. (Note that the <locale> component may be missing.)
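A rough shell sketch of that derivation (not part of the standard; the path shown is just the X11R6 example above):

    # Derive <catpath> from <path> by stripping a leading "usr" and a
    # trailing "share" component.
    path=/usr/X11R6
    catpath=`echo "$path" | sed -e 's|^/usr||' -e 's|/share$||' -e 's|^/||'`
    echo "/var/cache/man/$catpath"      # prints /var/cache/man/X11R6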
Man pages written to /var/cache/man may eventually be transferred to the appropriate preformatted directories in the source man hierarchy or expired; likewise formatted man pages in the source man hierarchy may be expired if they are not accessed for a period of time.
If preformatted manual pages come with a system on read-only media (a CD-ROM, for instance), they must be installed in the source man hierarchy (e.g., /usr/share/man/cat<section>). /var/cache/man is reserved as a writable cache for formatted manual pages.
BEGIN RATIONALE
Release 1.2 of the standard specified /var/catman for this hierarchy. The path has been moved under /var/cache to better reflect the dynamic nature of the formatted man pages.
The directory name has been changed to man to allow for enhancing the hierarchy to include post-processed formats other than ”cat”, such as PostScript, HTML, or DVI.
END RATIONALE
35.5.6.1 Purpose
This directory holds system crash dumps. As of the date of this release of the standard, system crash dumps were not supported under Linux.
35.5.7.1 Purpose
Any variable data relating to games in /usr should be placed here.
/var/games should hold the variable data previously found in /usr ; static data, such as help text, level descriptions, and so on, must remain elsewhere, such as /usr/share/games .
BEGIN RATIONALE
/var/games has been given a hierarchy of its own, rather than leaving it merged in with the old /var/lib as in release 1.2. The separation allows local control of backup strategies, permissions, and disk usage, as well as allowing inter-host sharing and reducing clutter in
/var/lib . Additionally, /var/games is the path traditionally used by BSD.
END RATIONALE
35.5.8.1 Purpose
This hierarchy holds state information pertaining to an application or the system. State information is data that programs modify while they run, and that pertains to one specific host. Users must never need to modify files in /var/lib to configure a package’s operation.
State information is generally used to preserve the condition of an application (or a group of inter-related applications) between invocations and between different instances of the same application. State information should generally remain valid after a reboot, should not be logging output, and should not be spooled data.
An application (or a group of inter-related applications) must use a subdirectory of /var/lib for its data. (An important difference between this version of this standard and previous ones is that applications are now required to use a subdirectory of /var/lib.) There is one required subdirectory, /var/lib/misc, which is intended for state files that don't need a subdirectory; the other subdirectories should only be present if the application in question is included in the distribution.
/var/lib/<name> is the location that must be used for all distribution packaging support. Different distributions may use different names, of course.
35.5.8.2 Requirements
The following directories, or symbolic links to directories, are required in /var/lib :
/var/lib ——— Variable state information misc Miscellaneous state data
35.5.8.3 Specific Options
The following directories, or symbolic links to directories, must be in /var/lib, if the corresponding subsystem is installed:

/var/lib ——— Variable state information
    <editor>    Editor backup files and state (optional)
    <pkgtool>   Packaging support files (optional)
    <package>   State data for packages and subsystems (optional)
    hwclock     State directory for hwclock (optional)
    xdm         X display manager variable data (optional)
35.5.8.4 /var/lib/<editor>: Editor backup files and state (optional)
35.5.8.4.1 Purpose
These directories contain saved files generated by any unexpected termination of an editor (e.g., elvis, jove, nvi).
Other editors may not require a directory for crash-recovery files, but may require a well-defined place to store other information while the editor is running. This information should be stored in a subdirectory under /var/lib (for example, GNU Emacs would place lock files in /var/lib/emacs/lock).
Future editors may require additional state information beyond crash-recovery files and lock files — this information should also be placed under /var/lib/<editor>.
BEGIN RATIONALE
Previous Linux releases, as well as all commercial vendors, use /var/preserve for vi or its clones. However, each editor uses its own format for these crash-recovery files, so a separate directory is needed for each editor.
Editor-specific lock files are usually quite different from the device or resource lock files that are stored in /var/lock and, hence, are stored under /var/lib .
END RATIONALE
35.5.8.5 /var/lib/hwclock : State directory for hwclock (optional)
35.5.8.5.1 Purpose
This directory contains the file /var/lib/hwclock/adjtime .
BEGIN RATIONALE
In FHS 2.1, this file was /etc/adjtime, but as hwclock updates it, that was obviously incorrect.
END RATIONALE
35.5.8.6 /var/lib/misc : Miscellaneous variable data
35.5.8.6.1 Purpose
This directory contains variable data not placed in a subdirectory in /var/lib. An attempt should be made to use relatively unique names in this directory to avoid namespace conflicts. (This hierarchy should contain files stored in /var/db in current BSD releases. These include locate.database and mountdtab, and the kernel symbol database(s).)
35.5.9.1 Purpose
Lock files should be stored within the /var/lock directory structure.
Lock files for devices and other resources shared by multiple applications, such as the serial device lock files that were originally found in either /usr/spool/locks or /usr/spool/uucp, must now be stored in /var/lock. The naming convention which must be used is LCK.. followed by the base name of the device file. For example, to lock /dev/ttyS0 the file LCK..ttyS0 would be created. (Then, anything wishing to use /dev/ttyS0 can read the lock file and act accordingly; all locks in /var/lock should be world-readable.)
The format used for the contents of such lock files must be the HDB UUCP lock file format. The HDB format is to store the process identifier (PID) as a ten-byte ASCII decimal number, with a trailing newline. For example, if process 1230 holds a lock file, it would contain the eleven characters: space, space, space, space, space, space, one, two, three, zero, and newline.
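A minimal sketch (not a robust locking scheme) of creating such a lock file in HDB format from a shell script:

    LOCK=/var/lock/LCK..ttyS0
    if [ -f $LOCK ]; then
        echo "ttyS0 is already locked by process `cat $LOCK`" >&2
        exit 1
    fi
    printf '%10d\n' $$ > $LOCK      # ten-byte, space-padded PID plus newline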
35.5.10.1 Purpose
This directory contains miscellaneous log files. Most logs must be written to this directory or an appropriate subdirectory.
35.5.10.2 Specific Options
The following files, or symbolic links to files, must be in /var/log, if the corresponding subsystem is installed:

    lastlog    record of last login of each user
    messages   system messages from syslogd
    wtmp       record of all logins and logouts
35.5.11.1 Purpose
The mail spool must be accessible through /var/mail and the mail spool files must take the form <username>. (Note that /var/mail may be a symbolic link to another directory.)
User mailbox files in this location must be stored in the standard UNIX mailbox format.
BEGIN RATIONALE
The logical location for this directory was changed from /var/spool/mail in order to bring FHS in-line with nearly every UNIX implementation. This change is important for inter-operability since a single /var/mail is often shared between multiple hosts and multiple UNIX implementations (despite NFS locking issues).
It is important to note that there is no requirement to physically move the mail spool to this location. However, programs and header files must be changed to use /var/mail .
END RATIONALE
35.5.12.1 Purpose
Variable data of the packages in /opt must be installed in /var/opt/<package>, where <package> is the name of the subtree in /opt where the static data from an add-on software package is stored, except where superseded by another file in /etc. No structure is imposed on the internal arrangement of /var/opt/<package>.
BEGIN RATIONALE
Refer to the rationale for /opt .
END RATIONALE
35.5.13.1 Purpose
This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process. Programs may have a subdirectory of /var/run; this is encouraged for programs that use more than one run-time file.
Process identifier (PID) files, which were originally placed in /etc, must be placed in /var/run. The naming convention for PID files is <program-name>.pid. For example, the crond PID file is named /var/run/crond.pid.
35.5.13.2 Requirements
The internal format of PID files remains unchanged. The file must consist of the process identifier in ASCII-encoded decimal, followed by a newline character. For example, if crond was process number 25, /var/run/crond.pid would contain three characters: two, five, and newline.
Programs that read PID files should be somewhat flexible in what they accept; i.e., they should ignore extra whitespace, leading zeroes, absence of the trailing newline, or additional lines in the
PID file. Programs that create PID files should use the simple specification located in the above paragraph.
The utmp file, which stores information about who is currently using the system, is located in this directory.
Programs that maintain transient UNIX-domain sockets must place them in this directory.
(/var/run should be unwritable for unprivileged users; only root or users running daemons should be able to write there. It is a major security problem if any user can write in this directory.)
35.5.14.1 Purpose
/var/spool contains data which is awaiting some kind of later processing. Data in /var/spool represents work to be done in the future (by a program, user, or administrator); often data is deleted after it has been processed. (UUCP lock files must be placed in /var/lock; see the section on /var/lock above.)
35.5.14.2 Specific Options
The following directories, or symbolic links to directories, must be in /var/spool , if the corresponding subsystem is installed:
/var/spool ——— Spool directories
    lpd      Printer spool directory (optional)
    mqueue   Outgoing mail queue (optional)
    news     News spool directory (optional)
    rwho     Rwhod files (optional)
    uucp     Spool directory for UUCP (optional)
35.5.14.3 /var/spool/lpd : Line-printer daemon print queues (optional)
35.5.14.3.1 Purpose
The lock file for lpd, lpd.lock, must be placed in /var/spool/lpd. It is suggested that the lock file for each printer be placed in the spool directory for that specific printer and named lock.
35.5.14.3.2 Specific Options

/var/spool/lpd ——— Printer spool directory
    <printer>   Spools for a specific printer (optional)
35.5.14.4 /var/spool/rwho : Rwhod files (optional)
35.5.14.4.1 Purpose
This directory holds the rwhod information for other systems on the local net.
BEGIN RATIONALE
Some BSD releases use /var/rwho for this data; given its historical location in
/var/spool on other systems and its approximate fit to the definition of ‘spooled’ data, this location was deemed more appropriate.
END RATIONALE
35.5.15.1 Purpose
The /var/tmp directory is made available for programs that require temporary files or directories that are preserved between system reboots. Therefore, data stored in /var/tmp is more persistent than data in /tmp .
Files and directories located in /var/tmp must not be deleted when the system is booted. Although data stored in /var/tmp is typically deleted in a site-specific manner, it is recommended that deletions occur at a less frequent interval than /tmp .
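One hypothetical site policy, run from cron, might be (the 30-day interval is purely an example):

    # remove files under /var/tmp not accessed for 30 days
    find /var/tmp -xdev -type f -atime +30 -exec rm -f {} \;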
35.5.16.1 Purpose
Variable data for the Network Information Service (NIS), formerly known as the Sun Yellow
Pages (YP), must be placed in this directory.
BEGIN RATIONALE
/var/yp is the standard directory for NIS (YP) data and is almost exclusively used in NIS documentation and systems. (NIS should not be confused with Sun NIS+, which uses a different directory, /var/nis.)
END RATIONALE
This section is for additional requirements and recommendations that only apply to a specific operating system. The material in this section should never conflict with the base standard.
This is the annex for the Linux operating system.
35.6.1.1 / : Root directory
On Linux systems, if the kernel is located in /, we recommend using the names vmlinux or vmlinuz, which have been used in recent Linux kernel source packages.
35.6.1.2 /bin : Essential user command binaries (for use by all users)
Linux systems which require them place these additional files into /bin .
{ setserial }
35.6.1.3 /dev : Devices and special files
All devices and special files in /dev should adhere to the
Linux Allocated Devices
document, which is available with the Linux kernel source. It is maintained by H. Peter Anvin.
Symbolic links in /dev should not be distributed with Linux systems except as provided in the
Linux Allocated Devices
document.
BEGIN RATIONALE
The requirement not to make symlinks promiscuously is made because local setups will often differ from that on the distributor’s development machine. Also, if a distribution install script configures the symbolic links at install time, these symlinks will often not get updated if local changes are made in hardware. When used responsibly at a local level, however, they can be put to good use.
END RATIONALE
35.6.1.4 /etc : Host-specific system configuration
Linux systems which require them place these additional files into /etc .
{ lilo.conf }
35.6.1.5 /proc : Kernel and process information virtual filesystem
The proc filesystem is the de-facto standard Linux method for handling process and system information, rather than /dev/kmem and other similar methods. We strongly encourage this for the storage and retrieval of process information as well as other kernel and memory information.
35.6.1.6 /sbin : Essential system binaries
Linux systems place these additional files into /sbin .
•
Second extended filesystem commands (optional):
{ badblocks , dumpe2fs , e2fsck , mke2fs , mklost+found , tune2fs }
•
Boot-loader map installer (optional):
{ lilo }
Optional files for /sbin:
•
Static binaries:
{ ldconfig , sln , ssync }
Static ln (sln) and static sync (ssync) are useful when things go wrong. The primary use of sln (to repair incorrect symlinks in /lib after a poorly orchestrated upgrade) is no longer a major concern now that the ldconfig program (usually located in /usr/sbin) exists and can act as a guiding hand in upgrading the dynamic libraries. Static sync is useful in some emergency situations. Note that these need not be statically linked versions of the standard ln and sync, but may be.
The ldconfig binary is optional for /sbin since a site may choose to run ldconfig at boot time, rather than only when upgrading the shared libraries. (It’s not clear whether or not it is advantageous to run ldconfig on each boot.) Even so, some people like ldconfig around for the following (all too common) situation:
1. I've just removed /lib/<file>.
2. I can't find out the name of the library because ls is dynamically linked, I'm using a shell that doesn't have ls built-in, and I don't know about using "echo *" as a replacement.
3. I have a static sln, but I don't know what to call the link.
•
Miscellaneous:
{ ctrlaltdel , kbdrate }
So as to cope with the fact that some keyboards come up with such a high repeat rate as to be unusable, kbdrate may be installed in /sbin on some systems.
Since the default action in the kernel for the Ctrl-Alt-Del key combination is an instant hard reboot, it is generally advisable to disable the behavior before mounting the root filesystem in read-write mode. Some init suites are able to disable Ctrl-Alt-Del, but others may require the ctrlaltdel program, which may be installed in /sbin on those systems.
35.6.1.7 /usr/include : Header files included by C programs
These symbolic links are required if a C or C++ compiler is installed and only for systems not based on glibc.
/usr/include/asm -> /usr/src/linux/include/asm-<arch>
/usr/include/linux -> /usr/src/linux/include/linux
35.6.1.8 /usr/src : Source code
For systems based on glibc, there are no specific guidelines for this directory. For systems based on Linux libc revisions prior to glibc, the following guidelines and rationale apply:
The only source code that should be placed in a specific location is the Linux kernel source code.
It is located in /usr/src/linux .
If a C or C++ compiler is installed, but the complete Linux kernel source code is not installed, then the include files from the kernel source code must be located in these directories:
/usr/src/linux/include/asm-<arch>
/usr/src/linux/include/linux
<arch> is the name of the system architecture.
Note: /usr/src/linux may be a symbolic link to a kernel source code tree.
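For example, on such a system the symbolic link might be created as follows (the kernel version shown is purely illustrative):

    ln -s /usr/src/linux-2.2.19 /usr/src/linux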
BEGIN RATIONALE
It is important that the kernel include files be located in /usr/src/linux and not in
/usr/include so there are no problems when system administrators upgrade their kernel version for the first time.
END RATIONALE
35.6.1.9 /var/spool/cron : cron and at jobs
This directory contains the variable data for the cron and at programs.
The FHS mailing list is located at < >. To subscribe to the list send mail to < > with body "ADD fhs-discuss".
Thanks to Network Operations at the University of California at San Diego who allowed us to use their excellent mailing list server.
As noted in the introduction, please do not send mail to the mailing list without first contacting the FHS editor or a listed contributor.
The process of developing a standard filesystem hierarchy began in August 1993 with an effort to restructure the file and directory structure of Linux. The FSSTND, a filesystem hierarchy standard specific to the Linux operating system, was released on February 14, 1994. Subsequent revisions were released on October 9, 1994 and March 28, 1995.
In early 1995, the goal of developing a more comprehensive version of FSSTND to address not only Linux, but other UNIX-like systems was adopted with the help of members of the BSD development community. As a result, a concerted effort was made to focus on issues that were general to UNIX-like systems. In recognition of this widening of scope, the name of the standard was changed to Filesystem Hierarchy Standard or FHS for short.
Volunteers who have contributed extensively to this standard are listed at the end of this document. This standard represents a consensus view of those and other contributors.
Here are some of the guidelines that have been used in the development of this standard:
•
Solve technical problems while limiting transitional difficulties.
•
Make the specification reasonably stable.
•
Gain the approval of distributors, developers, and other decision-makers in relevant development groups and encourage their participation.
•
Provide a standard that is attractive to the implementors of different UNIX-like systems.
This document specifies a standard filesystem hierarchy for FHS filesystems by specifying the location of files and directories, and the contents of some system files.
This standard has been designed to be used by system integrators, package developers, and system administrators in the construction and maintenance of FHS compliant filesystems. It is primarily intended to be a reference and is not a tutorial on how to manage a conforming filesystem hierarchy.
The FHS grew out of earlier work on FSSTND, a filesystem organization standard for the Linux operating system. It builds on FSSTND to address interoperability issues not just in the Linux community but in a wider arena including 4.4BSD-based operating systems. It incorporates lessons learned in the BSD world and elsewhere about multi-architecture support and the demands of heterogeneous networking.
Although this standard is more comprehensive than previous attempts at filesystem hierarchy standardization, periodic updates may become necessary as requirements change in relation to emerging technology. It is also possible that better solutions to the problems addressed here will be discovered so that our solutions will no longer be the best possible solutions. Supplementary drafts may be released in addition to periodic updates to this document. However, a specific goal is backwards compatibility from one release of this document to the next.
Comments related to this standard are welcome. Any comments or suggestions for changes may be directed to the FHS editor (Daniel Quinlan) or the FHS mailing list. Typographical or grammatical comments should be directed to the FHS editor.
Before sending mail to the mailing list it is requested that you first contact the FHS editor in order to avoid excessive re-discussion of old topics.
Questions about how to interpret items in this document may occasionally arise. If you have need for a clarification, please contact the FHS editor. Since this standard represents a consensus of many participants, it is important to make certain that any interpretation also represents their collective opinion. For this reason it may not be possible to provide an immediate response unless the inquiry has been the subject of previous discussion.
The developers of the FHS wish to thank the developers, system administrators, and users whose input was essential to this standard. We wish to thank each of the contributors who helped to write, compile, and compose this standard.
The FHS Group also wishes to thank those Linux developers who supported the FSSTND, the predecessor to this standard. If they hadn’t demonstrated that the FSSTND was beneficial, the
FHS could never have evolved.
Brandon S. Allbery
Keith Bostic
Drew Eckhardt
Rik Faith
Stephen Harris
Ian Jackson
John A. Martin
Ian McCloghrie
Chris Metcalf
Ian Murdock
David C. Niemi
Daniel Quinlan
Eric S. Raymond
Rusty Russell
Mike Sangrey
David H. Silber
Thomas Sippel-Dau
Theodore Ts’o
Stephen Tweedie
Fred N. van Kempen
Bernd Warken
In this chapter, we will show how to set up a web server running virtual domains and dynamic CGI web pages. HTML is not covered, and you are expected to have some understanding of what HTML is, or at least where to find documentation about it.
In Section 26.2 we showed a simple HTTP session with the telnet command. A web server is really nothing more than a program that reads a file from the hard disk whenever a GET /<filename>.html HTTP/1.0 request comes in on port 80. Here, we will show a simple web server written in shell script. (Not by me. The author did not put his name in the source, so if you are out there, please drop me an email.)
You will need to add

    www stream tcp nowait nobody /usr/local/sbin/sh-httpd

to your /etc/inetd.conf file. If you are running xinetd, then you will need to add a file containing

    service www
    {
        socket_type = stream
        wait        = no
        user        = nobody
        server      = /usr/local/sbin/sh-httpd
    }

to your /etc/xinetd.d/ directory. Then, you must stop any already running web servers and restart inetd (or xinetd).
You will also have to create a log file (/usr/local/var/log/sh-httpd.log) and at least one web page (/usr/local/var/sh-www/index.html) for your server to serve. It can contain, say:
¨
<HTML>
<HEAD>
<TITLE>My First Document</TITLE>
</HEAD>
<BODY bgcolor=#CCCCCC text="#000000">
This is my first document<P>
Please visit
<A HREF="http://rute.sourceforge.net/">
The Rute Home Page
</A> for more info.</P>
</BODY>
</HTML>
§
¥
¦
Note that the server runs as user nobody, so the log file must be writable by nobody and the index.html file must be readable. Also note the use of the getpeername command, which can be changed to PEER="" if you do not have the netpipes package installed. (I am not completely sure if other commands used here are unavailable on other UNIX systems.)
#!/bin/sh

VERSION=0.1
NAME="ShellHTTPD"
DEFCONTENT="text/html"
DOCROOT=/usr/local/var/sh-www
DEFINDEX=index.html
LOGFILE=/usr/local/var/log/sh-httpd.log

log() {
    local REMOTE_HOST=$1
    local REFERRER=$2
    local CODE=$3
    local SIZE=$4

    echo "$REMOTE_HOST $REFERRER - [$REQ_DATE] \
\"${REQUEST}\" ${CODE} ${SIZE}" >> ${LOGFILE}
}

print_header() {
    echo -e "HTTP/1.0 200 OK\r"
    echo -e "Server: ${NAME}/${VERSION}\r"
    echo -e "Date: `date`\r"
}

print_error() {
    echo -e "HTTP/1.0 $1 $2\r"
    echo -e "Content-type: $DEFCONTENT\r"
    echo -e "Connection: close\r"
    echo -e "Date: `date`\r"
    echo -e "\r"
    echo -e "$2\r"
    exit 1
}

guess_content_type() {
    local FILE=$1
    local CONTENT

    case ${FILE##*.} in
        html) CONTENT=$DEFCONTENT ;;
        gz)   CONTENT=application/x-gzip ;;
        *)    CONTENT=application/octet-stream ;;
    esac

    echo -e "Content-type: $CONTENT"
}

do_get() {
    local DIR
    local NURL
    local LEN

    if [ ! -d $DOCROOT ]; then
        log ${PEER} - 404 0
        print_error 404 "No such file or directory"
    fi

    if [ -z "${URL##*/}" ]; then
        URL=${URL}${DEFINDEX}
    fi

    DIR="`dirname $URL`"
    if [ ! -d ${DOCROOT}/${DIR} ]; then
        log ${PEER} - 404 0
        print_error 404 "Directory not found"
    else
        cd ${DOCROOT}/${DIR}
        NURL="`pwd`/`basename ${URL}`"
        URL=${NURL}
    fi

    if [ ! -f ${URL} ]; then
        log ${PEER} - 404 0
        print_error 404 "Document not found"
    fi

    print_header
    guess_content_type ${URL}
    LEN="`ls -l ${URL} | tr -s ' ' | cut -d ' ' -f 5`"
    echo -e "Content-length: $LEN\r\n\r"
    log ${PEER} - 200 ${LEN}
    cat ${URL}
    sleep 3
}

read_request() {
    local DIRT
    local COMMAND

    read REQUEST
    read DIRT

    REQ_DATE="`date +"%d/%b/%Y:%H:%M:%S %z"`"
    REQUEST="`echo ${REQUEST} | tr -s [:blank:]`"
    COMMAND="`echo ${REQUEST} | cut -d ' ' -f 1`"
    URL="`echo ${REQUEST} | cut -d ' ' -f 2`"
    PROTOCOL="`echo ${REQUEST} | cut -d ' ' -f 3`"

    case $COMMAND in
        HEAD) print_error 501 "Not implemented (yet)" ;;
        GET)  do_get ;;
        *)    print_error 501 "Not Implemented" ;;
    esac
}

#
# It was supposed to be clean - without any non-standard utilities
# but I want some logging where the connections come from, so
# I use just this one utility to get the peer address
#
# This is from the netpipes package
PEER="`getpeername | cut -d ' ' -f 1`"

read_request
Now run telnet localhost 80 , as in Section 26.2. If that works and your log files are being properly appended (use tail -f . . . ), you can try to connect to http://localhost/ with a web browser like Netscape.
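If you have not read Section 26.2, the request to type into the telnet session is simply the following, followed by pressing Enter on an empty line to end the request:

    GET /index.html HTTP/1.0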
Notice also that the command getsockname (which tells you which of your own IP addresses the remote client connected to) could allow the script to serve pages from a different directory for each IP address. This is virtual domains in a nutshell. (Groovy, baby, I'm in a giant nutshell.... how do I get out?)
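A very rough sketch of that idea, assuming getsockname (also from the netpipes package) prints the local IP address in its first field the way getpeername does for the peer; the addresses and directories are hypothetical:

    LOCAL_IP="`getsockname | cut -d ' ' -f 1`"
    case $LOCAL_IP in
        196.28.144.1) DOCROOT=/usr/local/var/www-company1 ;;
        196.28.144.2) DOCROOT=/usr/local/var/www-company2 ;;
        *)            DOCROOT=/usr/local/var/sh-www ;;
    esac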
Because all distributions package Apache in a different way, here I assume Apache to have been installed from its source tree, rather than from a .deb
or .rpm
package. You can refer to Section 24.1 on how to install Apache from its source
.tar.gz
file like any other GNU package. (You can even install it under Windows, Windows NT, or OS/2.) The source tree is, of course, available from The Apache Home Page, http://www.apache.org. Here I assume you have installed it with --prefix=/opt/apache/. In the process, Apache will have dumped a huge reference manual into /opt/apache/htdocs/manual/.
Apache has several legacy configuration files: access.conf and srm.conf are two of them. These files are now deprecated and should be left empty. A single configuration file /opt/apache/conf/httpd.conf may contain at minimum:
ServerType standalone
ServerRoot "/opt/apache"
PidFile /opt/apache/logs/httpd.pid
ScoreBoardFile /opt/apache/logs/httpd.scoreboard
Port 80
User nobody
Group nobody
HostnameLookups Off
ServerAdmin [email protected]
UseCanonicalName On
ServerSignature On
DefaultType text/plain
ErrorLog /opt/apache/logs/error_log
LogLevel warn
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /opt/apache/logs/access_log common
DocumentRoot "/opt/apache/htdocs"
DirectoryIndex index.html
AccessFileName .htaccess
<Directory />
Options FollowSymLinks
AllowOverride None
Order Deny,Allow
Deny from All
</Directory>
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
<Directory "/opt/apache/htdocs">
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
<Directory "/opt/apache/htdocs/home/*/www">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
With the config file ready, you can move the index.html
file above to
/opt/apache/htdocs/ . You will notice the complete Apache manual and a demo page already installed there; you can move them to another directory for the time being. Now run
¨
/opt/apache/bin/httpd -X
§
¥
¦ and then point your web browser to http://localhost/ as before.
Here is a description of the options.
Each option is called a
directive
in Apache terminology.
A complete list of basic directives is in the file
/opt/apache/htdocs/manual/mod/core.html
.
ServerType
As discussed in Section 29.2, some services can run standalone or from inetd (or xinetd). This directive can be exactly standalone or inetd. If you choose inetd, you will need to add an appropriate line into your inetd configuration, although a web server should almost certainly choose standalone mode.
ServerRoot
This is the directory superstructure (see page 137) under which Apache is installed. It will always be the same as the value passed to --prefix=.
PidFile
Many system services store the process ID in a file for shutdown and monitoring purposes. On most distributions, the file is /var/run/httpd.pid
.
ScoreBoardFile
This option is used for communication between Apache parent and child processes on some non-UNIX systems.
Port
This is the TCP port for standalone servers to listen on.
User , Group
This option is important for security. It forces httpd to run with the privileges of the nobody user. If the web server is ever hacked, the attacker will not be able to gain more than the privileges of the nobody user.
HostnameLookups
To force a reverse DNS lookup on every connecting host, set this directive to on. To force a forward lookup on every reverse lookup, set this to double. This option is for logging purposes since access control does a reverse and forward reverse lookup anyway if required. It should certainly be off if you want to reduce latency.
ServerAdmin
Error messages include this email address.
UseCanonicalName
If Apache has to return a URL for any reason, it will normally return the full name of the server. Setting to off uses the very host name sent by the client.
ServerSignature
Add the server name to HTML error messages.
DefaultType
All files returned to the client have a type field specifying how the file should be displayed. If Apache cannot deduce the type, it assumes the MIME
Type to be text/plain . See Section 12.6.2 for a discussion of MIME types.
ErrorLog
Where errors get logged, usually /var/log/httpd/error_log.
LogLevel
How much info to log.
LogFormat
Define a new log format and give it a name; here we defined a log format and called it common. Multiple LogFormat lines are allowed. Lots of interesting information can actually be logged: see /opt/apache/htdocs/manual/mod/mod_log_config.html for a full description. (An example of a more detailed format appears after this list.)
CustomLog
The log file name and its (previously defined) format.
DocumentRoot
This directive specifies the top-level directory that client connections will see. The string /opt/apache/htdocs/ is prepended to any file lookup, and hence a URL http://localhost/manual/index.html.en will return the file /opt/apache/htdocs/manual/index.html.en.
DirectoryIndex
This directive gives the default file to try to serve for URLs that contain only a directory name. If a file index.html does not exist under that directory, an index of the directory is sent to the client. Other common configurations use index.htm or default.html.
AccessFileName
Before serving a file to a client, Apache reads additional directives from a file .htaccess in the same directory as the requested file. If a parent directory contains a .htaccess instead, this one will take priority. The .htaccess file contains directives that limit access to the directory, as discussed below.
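As an example of a more detailed log format, the widely used "combined" format adds the Referer and User-Agent request headers to the common format above (a sketch; see mod_log_config.html for the exact escape syntax of your Apache version):

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
    CustomLog /opt/apache/logs/access_log combined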
The above is merely the general configuration of Apache. To actually serve pages, you need to define directories, each with a particular purpose, containing particular
HTML or graphic files. The Apache configuration file is very much like an HTML document. Sections are started with <section parameter> and ended with </section>.
The most common directive of this sort is <Directory /directory>, which does such directory definition. Before defining any directories, we need to limit access to the root directory. This control is critical for security.
¨
<Directory />
Options FollowSymLinks
Deny from All
Order Deny,Allow
AllowOverride None
</Directory>
§
¥
¦
This configuration tells Apache about the root directory, giving clients very restrictive access to it. The directives are (some of these descriptions are extracted from the Apache manual):
Options
The Options directive controls which server features are available in a particular directory. There is also the syntax +option or -option to include the options of the parent directory, for example, Options +FollowSymLinks -Indexes.
FollowSymLinks
The server will follow any symbolic links beneath the directory. Be careful about what symbolic links you have beneath directories with FollowSymLinks. You can, for example, give everyone access to the root directory by having a link ../../../ under htdocs, which is not what you want.
ExecCGI
Execution of CGI scripts is permitted.
Includes
Server-side includes are permitted (more on this later).
IncludesNOEXEC
Server-side includes are permitted, but the #exec command and #include of CGI scripts are disabled.
Indexes
If a client asks for a directory by name and no index.html file (or whatever DirectoryIndex file you specified) is present, then a pretty listing of the contents of that directory is created and returned. For security you may want to turn this option off.
MultiViews
Content-negotiated MultiViews are allowed (more on this later).
SymLinksIfOwnerMatch
The server will only follow symbolic links for which the target file or directory is owned by the same user ID as the link (more on this later).
All
All options except for MultiViews . This is the default setting.
Deny
Hosts that are not allowed to connect. You can specify a host name or IP address, for example, as:
Deny from 10.1.2.3
Deny from 192.168.5.0/24
Deny from cranzgot.co.za
which will deny access to 10.1.2.3, all hosts beginning with 192.168.5., and all hosts ending in .cranzgot.co.za, including the host cranzgot.co.za.
Allow
Hosts that are allowed to connect. This directive uses the same syntax as Deny .
Order
If order is Deny,Allow, then the Deny directives are checked first, and any client that does not match a Deny directive or does match an Allow directive will be allowed access to the server. If order is Allow,Deny, then the Allow directives are checked first, and any client that does not match an Allow directive or does match a Deny directive will be denied access to the server.
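As a hypothetical illustration, the following pair admits only hosts on the 192.168.5.0/24 network and refuses everyone else, because with Allow,Deny anything not explicitly allowed is denied:

Order Allow,Deny
Allow from 192.168.5.0/24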
AllowOverride
In addition to the directives specified here, additional directives will be read from the file specified by AccessFileName, usually called .htaccess. This file would usually exist alongside your .html files or otherwise in a parent directory. If the file exists, its contents are read into the current <Directory ...> directive. AllowOverride says what directives the .htaccess file is allowed to squash. The complete list can be found in /opt/apache/htdocs/manual/mod/core.html.
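As a hypothetical example, a .htaccess file that restricts a subdirectory to hosts on your local network might contain:

Order Deny,Allow
Deny from all
Allow from 192.168.

Such directives only take effect if AllowOverride permits them (for example, AllowOverride Limit).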
You can see that we give very restrictive Options to the root directory, as well as very restrictive access. The only server feature we allow is FollowSymLinks; then we Deny any access, and then we remove the possibility that a .htaccess file could override our restrictions.
The <Files ...> directive sets restrictions on all files matching a particular regular expression. As a security measure, we use it to prevent access to all .htaccess files as follows:
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
We are now finally ready to add actual web page directories. These take a less restrictive set of access controls:
<Directory "/opt/apache/htdocs">
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
Our users may require that Apache know about their private web page directories ~/www/. This is easy to support with the special UserDir directive:
<Directory "/opt/apache/htdocs/home/*/www">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
UserDir /opt/apache/htdocs/home/*/www
For this feature to work, you must symlink /opt/apache/htdocs/home to /home, and create a directory www/ under each user's home directory.
Hitting the URL http://localhost/~jack/index.html will then retrieve the file /opt/apache/htdocs/home/jack/www/index.html. You will probably find that Apache gives a Forbidden error message when you try to do this, because jack's home directory permissions are too restrictive. Your choices vary between making jack's home directory less restrictive and increasing the privileges of Apache. Running Apache under the www group by using Group www, and then running

groupadd -g 65 www
chown jack:www /home/jack /home/jack/www
chmod 0750 /home/jack /home/jack/www

is a reasonable compromise.
Sometimes, HTML documents will want to refer to a file or graphic by using a simple prefix, rather than a long directory name. Other times, you want two different references to source the same file. The Alias directive creates virtual links between directories. For example, adding the following line means that the URL /icons/bomb.gif will serve the file /opt/apache/icons/bomb.gif:

Alias /icons/ "/opt/apache/icons/"
We do, of course, need to tell Apache about this directory:
<Directory "/opt/apache/icons">
Options None
AllowOverride None
Order allow,deny
Allow from all
</Directory>
You will find the directory lists generated by the preceding configuration rather bland. Enabling fancy indexing (with the IndexOptions FancyIndexing directive) causes nice descriptive icons to be printed to the left of the file name. What icons match what file types is a tricky issue. You can start with:
AddIconByEncoding (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType (TXT,/icons/text.gif) text/*
AddIconByType (IMG,/icons/image2.gif) image/*
AddIconByType (SND,/icons/sound2.gif) audio/*
AddIconByType (VID,/icons/movie.gif) video/*
AddIcon /icons/compressed.gif .Z .z .tgz .gz .zip
AddIcon /icons/a.gif .ps .eps
AddIcon /icons/layout.gif .html .shtml .htm
This requires the Alias directive above to be present. The default Apache configuration contains a far more extensive map of file types.
You can get Apache to serve gzipped files with this:

AddEncoding x-compress Z
AddEncoding x-gzip gz

Now if a client requests a file index.html, but only a file index.html.gz exists, Apache decompresses it on the fly. Note that you must have the MultiViews option enabled.
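To try this out, you might compress one of your existing pages (note that gzip removes the original file, leaving only the .gz version; the path below is just an assumed example):

gzip -9 /opt/apache/htdocs/index.html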
The next options cause Apache to serve index.html.language-code when index.html is requested, filling in the preferred language code sent by the web browser. Adding these directives causes your Apache manual to display correctly and will properly show documents that have non-English translations. Here also, the MultiViews option must be present.

AddLanguage en .en
AddLanguage da .dk
AddLanguage nl .nl
AddLanguage et .ee
AddLanguage fr .fr
AddLanguage de .de
AddLanguage el .el
AddLanguage ja .ja
AddLanguage ru .ru
LanguagePriority en da nl et fr de el ja ru
The LanguagePriority directive indicates the preferred language if the browser did not specify any.
Some files might contain a .koi8-r extension, indicating a Russian character set encoding for this file. Many languages have such custom character sets. Russian files are named webpage.html.ru.koi8-r. Apache must tell the web browser about the encoding type, based on the extension. Here are directives for Japanese, Russian, and UTF-8 (UTF-8 is a Unicode character set encoding useful for any language), as follows:

AddCharset ISO-2022-JP .jis
AddCharset KOI8-R .koi8-r
AddCharset UTF-8 .utf8
Once again, the default Apache configuration contains a far more extensive map of languages and character sets.
Apache actually has a built-in programming language that interprets .shtml files as scripts. The output of such a script is returned to the client. Most of a typical .shtml file will be ordinary HTML, which will be served unmodified. However, lines like

<!--#echo var="DATE_LOCAL" -->

will be interpreted, and their output included into the HTML (hence the name server-side includes). Server-side includes are ideal for HTML pages that contain mostly static HTML with small bits of dynamic content. To demonstrate, add the following to your httpd.conf:
AddType text/html .shtml
AddHandler server-parsed .shtml
<Directory "/opt/apache/htdocs/ssi">
Options Includes
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Create a directory /opt/apache/htdocs/ssi with the index file index.shtml:

<HTML>
The date today is <!--#echo var="DATE_LOCAL" -->.<P>
Here is a directory listing:<br>
<PRE>
<!--#exec cmd="ls -al" -->
</PRE>
<!--#include virtual="footer.html" -->
</HTML>

and then a file footer.html containing anything you like. It is obvious how useful this procedure is for creating many documents with the same banner by means of a #include statement. If you are wondering what other variables you can print besides DATE_LOCAL, try the following:
<HTML>
<PRE>
<!--#printenv -->
</PRE>
</HTML>
You can also go to http://localhost/manual/howto/ssi.html to see some other examples.
(I have actually never managed to figure out why CGI is called CGI.) CGI is where a URL points to a script. What comes up in your browser is the output of the script, when executed, instead of the contents of the script itself. To try this, create a file /opt/apache/htdocs/test.cgi:
#!/bin/sh

echo 'Content-type: text/html'
echo

echo '<HTML>'
echo ' <HEAD>'
echo ' <TITLE>My First CGI</TITLE>'
echo ' </HEAD>'
echo ' <BODY bgcolor=#CCCCCC text="#000000">'
echo 'This is my first CGI<P>'
echo 'Please visit'
echo ' <A HREF="http://rute.sourceforge.net/">'
echo ' The Rute Home Page'
echo ' </A>'
echo 'for more info.</P>'
echo ' </BODY>'
echo '</HTML>'
Make this script executable with chmod a+x test.cgi and test the output by running it on the command line. Add the line

AddHandler cgi-script .cgi

to your httpd.conf file. Next, modify your Options for the directory /opt/apache/htdocs to include ExecCGI, like this:

<Directory "/opt/apache/htdocs">
Options Indexes FollowSymLinks MultiViews ExecCGI
AllowOverride All
Order allow,deny
Allow from all
</Directory>
After restarting Apache you should be able to visit the URL http://localhost/test.cgi. If you run into problems, don't forget to run tail /opt/apache/logs/error_log to get a full report.
To get a full list of environment variables available to your CGI program, try the following script:
#!/bin/sh

echo 'Content-type: text/html'
echo

echo '<HTML>'
echo '<PRE>'
set
echo '</PRE>'
echo '</HTML>'
The script will show ordinary bash environment variables as well as more interesting variables like QUERY_STRING. Change your script to
#!/bin/sh

echo 'Content-type: text/html'
echo

echo '<HTML>'
echo '<PRE>'
echo $QUERY_STRING
echo '</PRE>'
echo '</HTML>'

and then go to the URL http://localhost/test/test.cgi?xxx=2&yyy=3. It is easy to see how variables can be passed to the shell script.
The preceding example is not very interesting. However, it gets useful when scripts have complex logic or can access information that Apache can't access on its own. In Chapter 38 we see how to deploy an SQL database. When you have covered SQL, you can come back here and replace your CGI script with:
#!/bin/sh

echo 'Content-type: text/html'
echo

# dump the table list of the template1 database in HTML format
psql -d template1 -H -c '\d'
This script will dump the table list of the template1 database if it exists. Apache will have to run as a user that can access this database, which means changing User nobody to User postgres. (Note that for security you should really limit who can connect to the postgres database. See Section 38.4.)
To create a functional form, use the HTML <FORM> tag as follows. A file /opt/apache/htdocs/test/form.html could contain:
<HTML>
<FORM name="myform" action="test.cgi" method="get">
<TABLE>
<TR>
<TD colspan="2" align="center">
Please enter your personal details:
</TD>
</TR>
<TR>
<TD>Name:</TD><TD><INPUT type="text" name="name"></TD>
</TR>
<TR>
<TD>Email:</TD><TD><INPUT type="text" name="email"></TD>
</TR>
<TR>
<TD>Tel:</TD><TD><INPUT type="text" name="tel"></TD>
</TR>
<TR>
<TD colspan="2" align="center">
<INPUT type="submit" value="Submit">
</TD>
</TR>
</TABLE>
</FORM>
</HTML>
When rendered, the form displays Name, Email, and Tel text fields with a Submit button. Note how this form calls our existing test.cgi script. Here is a script that adds the entered data to a postgres SQL table:

#!/bin/sh

echo 'Content-type: text/html'
echo

opts=`echo "$QUERY_STRING" | \
  sed -e 's/[^A-Za-z0-9 %&+,.\/:=@_~-]//g' -e 's/&/ /g' -e q`

for opt in $opts ; do
  case $opt in
    name=*)
      name=${opt/name=/}
      ;;
    email=*)
      email=${opt/email=/}
      ;;
    tel=*)
      tel=${opt/tel=/}
      ;;
  esac
done

if psql -d template1 -H -c "\
    INSERT INTO people (name, email, tel) \
    VALUES ('$name', '$email', '$tel')" 2>&1 | grep -q '^INSERT ' ; then
  echo "<HTML>Your details \"$name\", \"$email\" and \"$tel\"<BR>"
  echo "have been successfully recorded.</HTML>"
else
  echo "<HTML>Database error, please contact our webmaster.</HTML>"
fi

exit 0
Note how the first lines of the script remove all unwanted characters from QUERY_STRING. Such processing is imperative for security because shell scripts can easily execute commands should characters like $ and ` be present in a string.
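The script also assumes that a people table already exists in the template1 database. A minimal sketch of creating one, run as the postgres user (the column names come from the INSERT above; the text column type is an assumption):

psql -d template1 -c "CREATE TABLE people (name text, email text, tel text)"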
To use the alternative "POST" method, change your FORM tag to

<FORM name="myform" action="test.cgi" method="post">
The POST method sends the query text through stdin of the CGI script. Hence, you also need to change your opts= line to

opts=`cat | \
  sed -e 's/[^A-Za-z0-9 %&+,.\/:=@_~-]//g' -e 's/&/ /g' -e q`
Running Apache as a privileged user has security implications. Another way to get this script to execute as user postgres is to create a setuid binary. To do this, create a file test.cgi by compiling the following C program, similar to that in Section 33.2:

#include <unistd.h>

int main (int argc, char *argv[])
{
    setreuid (geteuid (), geteuid ());
    execl ("/opt/apache/htdocs/test/test.sh", "test.sh", 0);
    return 0;
}
Then run chown postgres:www test.cgi and chmod a-w,o-rx,u+s test.cgi (or chmod 4550 test.cgi). Recreate your shell script as test.sh and go to the URL again. Apache runs test.cgi, which becomes user postgres and then executes the script as the postgres user. Even with Apache as User nobody, your script will still work. Note how your setuid program is insecure: it takes no arguments and performs only a single function, but it takes environment variables (or input from stdin) that could influence its functionality. If a login user could execute the script, that user could send data via these variables that could cause the script to behave in an unforeseen way. An alternative is:
#include <unistd.h>

int main (int argc, char *argv[])
{
    char *envir[] = {0};
    setreuid (geteuid (), geteuid ());
    execle ("/opt/apache/htdocs/test/test.sh", "test.sh", 0, envir);
    return 0;
}
This script nullifies the environment before starting the CGI, thus forcing you to use the POST method only. Because the only information that can be passed to the script is a single line of text (through the -e q option to sed ) and because that line of text is carefully stripped of unwanted characters, we can be much more certain of security.
CGI execution is extremely slow if Apache has to invoke a shell script for each hit. Apache has a number of facilities for built-in interpreters that will parse script files with high efficiency. A well-known programming language developed specifically for the Web is PHP. PHP can be downloaded as source from The PHP Home Page, http://www.php.net, and contains the usual GNU installation instructions.
Apache has the facility for adding functionality at runtime using what it calls DSO (Dynamic Shared Object) files. This feature is for distribution vendors who want to ship split installs of Apache that enable users to install only the parts of Apache they like. This is conceptually the same as what we saw in Section 23.1: To give your program some extra feature provided by some library, you can either statically link the library to your program or compile the library as a shared .so file to be linked at run time. The difference here is that the library files are (usually) called mod_name.so and are stored in /opt/apache/libexec/. They are also only loaded if a LoadModule name_module line appears in httpd.conf. To enable DSO support, rebuild and reinstall Apache starting with:

./configure --prefix=/opt/apache --enable-module=so
Any source package that creates an Apache module can now use the Apache utility /opt/apache/bin/apxs to tell it about the current Apache installation, so you should make sure this executable is in your PATH .
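For example, you might add something like the following to your shell startup file (a sketch assuming a Bourne-style shell):

PATH="/opt/apache/bin:$PATH"
export PATH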
You can now follow the instructions for installing PHP, possibly beginning with

./configure --prefix=/opt/php --with-apxs=/opt/apache/bin/apxs --with-pgsql=/usr

(This assumes that you want to enable support for the postgres SQL database and have postgres previously installed as a package under /usr.) Finally, check that a file libphp4.so eventually ends up in /opt/apache/libexec/.
Your httpd.conf then needs to know about PHP scripts. Add the following:
LoadModule php4_module /opt/apache/libexec/libphp4.so
AddModule mod_php4.c
AddType application/x-httpd-php .php
and then create a file /opt/apache/htdocs/hello.php containing:
<html>
<head>
<title>Example</title>
</head>
<body>
<?php echo "Hi, I'm a PHP script!"; ?>
</body>
</html>
and test by visiting the URL http://localhost/hello.php. Programming in the PHP language is beyond the scope of this book.
Virtual hosting is the use of a single web server to serve the web pages of multiple domains. Although the web browser seems to be connecting to a web site that is an isolated entity, that web site may in fact be hosted alongside many others on the same machine.
Virtual hosting is rather trivial to configure. Let us say that we have three domains: www.domain1.com, www.domain2.com, and www.domain3.com. We want domains www.domain1.com and www.domain2.com to share the IP address 196.123.45.1, while www.domain3.com has its own IP address of 196.123.45.2.
The sharing of a single IP address is called name-based virtual hosting, and the use of a different IP address for each domain is called IP-based virtual hosting.
If our machine has one IP address, 196.123.45.1, we may need to configure a separate IP address on the same network card as follows (see Section 25.9):

ifconfig eth0:1 196.123.45.2 netmask 255.255.255.0 up
For each domain, we now create a top-level directory /opt/apache/htdocs/www.domain?.com/. We need to tell Apache that we intend to use the IP address 196.123.45.1 for several hosts. We do that with the NameVirtualHost directive. Then for each host, we must specify a top-level directory as follows:

NameVirtualHost 196.123.45.1
<VirtualHost 196.123.45.1>
ServerName www.domain1.com
DocumentRoot /opt/apache/htdocs/www.domain1.com/
</VirtualHost>
<VirtualHost 196.123.45.1>
ServerName www.domain2.com
DocumentRoot /opt/apache/htdocs/www.domain2.com/
</VirtualHost>
<VirtualHost 196.123.45.2>
ServerName www.domain3.com
DocumentRoot /opt/apache/htdocs/www.domain3.com/
</VirtualHost>
All that remains is to configure a correct DNS zone for each domain so that lookups of www.domain1.com and www.domain2.com return 196.123.45.1, while lookups of www.domain3.com return 196.123.45.2.
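A sketch of the corresponding address records (BIND zone file syntax; in practice each record would live in its own domain's zone file) might be:

www.domain1.com.    IN    A    196.123.45.1
www.domain2.com.    IN    A    196.123.45.1
www.domain3.com.    IN    A    196.123.45.2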
You can then add index.html files to each directory.
crond and atd are two very simple and important services that everyone should be familiar with. crond does the job of running commands periodically (daily, weekly), and atd's main feature is to run a command once at some future time.
These two services are so basic that we are not going to detail their package contents and invocation.
The /etc/crontab l